Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit
- authored by
- Tim Oberschulte, Jakob Marten, Holger Blume
- Abstract
Field-programmable gate arrays (FPGAs) in space applications come with the drawback of radiation effects, which will inevitably occur in devices with small process sizes. This also applies to the electronics of the Bose-Einstein Condensate and Cold Atom Laboratory (BECCAL) apparatus, which will operate on the International Space Station (ISS) for several years. More than 100 FPGAs distributed throughout the setup will be used for high-precision control of specialized sensors and actuators at nanosecond scale. On the ISS, radiation effects must be taken into account, the functionality of the electronics must be monitored, and errors must be handled properly. Due to the large number of devices in BECCAL, commercial off-the-shelf (COTS) FPGAs are used, which are not radiation hardened. This paper describes the methods and measures used to mitigate the effects of radiation in an application-specific COTS-FPGA-based communication network. Based on the firmware for a central communication network switch in BECCAL, the steps are described to integrate redundancy into the design while optimizing the firmware to stay within the FPGA's resource constraints. A redundant integrity checker module is developed that can notify preceding network devices of data and configuration bit errors. The firmware is validated and evaluated by injecting faults into data and configuration registers in simulation and in real hardware. In the end, the FPGA resource usage of the firmware is reduced by more than half, enabling the use of dual modular redundancy (DMR) for the switching fabric. Together with the integrity checker, which is protected by triple modular redundancy (TMR), this combination completely prevents silent data corruption in the design, as shown in simulation and by injecting faults into hardware using the Intel Fault Injection FPGA IP Core, while staying within the resource limits of a COTS FPGA.
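The difference between the two redundancy schemes named in the abstract can be illustrated with a minimal software model. This is only a sketch, not the authors' firmware, which would be written in a hardware description language; the function names `tmr_vote`, `dmr_check`, and `flip_bit` are hypothetical and chosen for illustration. It assumes a single bit flip in one copy of a data word.

```python
from typing import Tuple


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three copies (TMR).

    Each output bit takes the value held by at least two of the inputs,
    so a fault confined to one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


def dmr_check(a: int, b: int) -> Tuple[int, bool]:
    """Compare two copies (DMR).

    Returns the first copy and a mismatch flag. DMR can detect a
    disagreement but cannot tell which copy is wrong.
    """
    return a, a != b


if __name__ == "__main__":
    word = 0xA5
    faulty = flip_bit(word, 3)

    # TMR: the fault in one copy is outvoted by the other two.
    print(hex(tmr_vote(word, faulty, word)))

    # DMR: the fault is flagged so that it can be reported upstream.
    print(dmr_check(word, faulty)[1])
```

In this model, TMR masks the fault while DMR only raises a flag. A flagged mismatch is no longer silent, which is consistent with the abstract's description of a checker that notifies preceding network devices of errors.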
- Organisation(s)
- Architectures and Systems Section
- Type
- Conference contribution
- Pages
- 19-32
- No. of pages
- 14
- Publication date
- 2023
- Publication status
- Published
- Peer reviewed
- Yes
- ASJC Scopus subject areas
- Theoretical Computer Science, Computer Science (all)
- Electronic version(s)
- https://doi.org/10.1007/978-3-031-46077-7_2 (Access: Closed)