Fault-tolerant computer system with configurable coprocessor component and methods thereof

Inventors

Lameres, Brock Jerome; Major, Christopher Michel; Austin, Hezekiah Ajax

Assignees

Resilient Computing LLC; Montana State University Bozeman

Publication Number

US-12287713-B1

Publication Date

2025-04-29

Expiration Date

2044-11-18

Abstract

A fault-tolerant computer system includes a plurality of redundant processor cores configured to simultaneously execute identical sets of processor-executable instructions, and a coprocessor component including a data storage component and a configurable logic region, where the plurality of processor cores are configured with processor-executable instructions to perform operations including configuring the configurable logic region of the coprocessor component with a first coprocessing module, and controlling the first coprocessing module to perform first processing operations on data located in the data storage component. In various embodiments, the redundant processor cores and the coprocessor component may be implemented on an FPGA, and the redundant processor cores may be configured to swap out different coprocessing modules using Partial Reconfiguration (PR) to perform data processing algorithms using hardware acceleration. Embodiments of the fault-tolerant computer system may be utilized in radiation intensive environments, such as in outer space.

Core Innovation

The invention relates to a fault-tolerant computer system comprising a plurality of redundant processor cores and a coprocessor component, where the processor cores are configured to execute identical sets of processor-executable instructions. The coprocessor component includes a data storage component and a configurable logic region that can be dynamically configured with different coprocessing modules to perform processing operations on data. This configurable logic region allows sequential swapping of coprocessing modules using Partial Reconfiguration (PR), enabling hardware acceleration of data processing algorithms on the same system without the need for large, custom hardware implementations.

The background identifies a persistent need to improve the reliability, performance, and cost-effectiveness of computer systems used in high-radiation environments such as space. Traditional radiation-hardening approaches tend to increase costs and diminish performance, making them less suitable for evolving commercial and scientific space applications. Modern commercial off-the-shelf (COTS) hardware, while inherently more tolerant to total ionizing dose, is increasingly susceptible to single event effects, necessitating robust system-level fault-mitigation strategies focused on cost, computational capacity, and reliable operation in harsh environments.

The core solution introduces a system architecture where multiple redundant processor cores execute in lockstep, monitored by voting and repair logic to detect and correct faults rapidly. The coprocessor component, implemented as a dynamically reconfigurable logic region within an FPGA, supports the real-time deployment of different functional modules tailored to accelerate various algorithms (e.g., floating point, matrix, image processing). The PR capability enables efficient utilization of FPGA resources by partitioning complex algorithms into modules that are loaded as needed, thus supporting the execution of larger or more sophisticated processing tasks using hardware acceleration while maintaining system reliability in adverse environments.

Claims Coverage

There are four independent claims in the patent, each detailing distinctive inventive features of the computer system and coprocessor architecture.

Partial reconfiguration of coprocessor logic region with sequential module swapping

A computer system comprising:
- A plurality of processor cores configured to simultaneously execute identical sets of instructions.
- A coprocessor component with a data storage component and a configurable logic region.
- The processor cores execute instructions to configure the coprocessor's logic region with a first coprocessing module, control it to process data from the data storage component, and then reconfigure the logic region with a second coprocessing module for different processing operations on the data.
- The configurable logic region is reconfigured using a Partial Reconfiguration (PR) process, allowing sequential swapping of multiple coprocessor modules, each forming part of a data processing algorithm.
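
The sequential module swapping described in this claim can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented implementation: the `Coprocessor` class, the `reconfigure` and `run` methods, and the `scale` and `offset` modules are all hypothetical names, and a Python function stands in for a partial bitstream loaded into the configurable logic region.

```python
class Coprocessor:
    """Toy model: one data store plus one configurable logic region."""

    def __init__(self, data):
        self.data = list(data)    # stands in for the data storage component
        self.module = None        # the single configurable logic region
        self.reconfig_count = 0   # how many times the region was reloaded

    def reconfigure(self, module):
        # Stand-in for a Partial Reconfiguration step: only this region
        # changes, the stored data is left untouched.
        self.module = module
        self.reconfig_count += 1

    def run(self):
        if self.module is None:
            raise RuntimeError("logic region is not configured")
        self.data = self.module(self.data)


# Two hypothetical coprocessing modules, each one stage of a larger algorithm.
def scale(data):
    return [2 * x for x in data]


def offset(data):
    return [x + 1 for x in data]


def run_pipeline(coprocessor, modules):
    """Swap modules into the same region one after another."""
    for module in modules:
        coprocessor.reconfigure(module)
        coprocessor.run()
    return coprocessor.data
```

In this model, `run_pipeline(Coprocessor([1, 2, 3]), [scale, offset])` applies both stages to the same stored data while only one module occupies the region at any time, which is the property that lets an algorithm exceed the size of the region itself.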

Direct data input to coprocessor without local memory storage in processor cores

A computer system where:
- Each processor core comprises a CPU and local memory.
- Data is input into the coprocessor's data storage component without being stored in any processor core's local memory.
- This enables processing operations by the coprocessor on data that bypasses the redundant CPUs' memories.

Memory-mapped coprocessor component as a peripheral to redundant processor cores

A computer system where:
- Each processor core includes a CPU and local memory.
- The coprocessor component is memory-mapped as a peripheral accessible by the redundant processor cores.
- The system allows the processor cores to interact with and access the outputs and control/status of the coprocessor component via memory-mapped interfaces.
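
A memory-mapped peripheral is one the cores drive purely through reads and writes at fixed addresses. The sketch below simulates that style of interface in software. The register layout (`CTRL_OFFSET`, `STATUS_OFFSET`, `RESULT_OFFSET`), the `CTRL_START` and `STATUS_DONE` bits, and the summing operation are assumptions made for illustration; the source does not specify a register map.

```python
import struct

# Hypothetical register map, not taken from the patent.
CTRL_OFFSET = 0x00     # write CTRL_START here to launch an operation
STATUS_OFFSET = 0x04   # reads STATUS_DONE once a result is available
RESULT_OFFSET = 0x08   # holds the 32-bit result

CTRL_START = 0x1
STATUS_DONE = 0x1


class MappedCoprocessor:
    """Simulated peripheral exposing three 32-bit little-endian registers."""

    def __init__(self, data):
        self.data = list(data)      # contents of the data storage component
        self.regs = bytearray(12)   # backing store for the register window

    def read32(self, offset):
        return struct.unpack_from("<I", self.regs, offset)[0]

    def write32(self, offset, value):
        struct.pack_into("<I", self.regs, offset, value & 0xFFFFFFFF)
        if offset == CTRL_OFFSET and value & CTRL_START:
            # Model the loaded module finishing immediately: here it sums
            # the stored data, then publishes the result and a done flag.
            result = sum(self.data) & 0xFFFFFFFF
            struct.pack_into("<I", self.regs, RESULT_OFFSET, result)
            struct.pack_into("<I", self.regs, STATUS_OFFSET, STATUS_DONE)
```

A core in this model would write `CTRL_START` to `CTRL_OFFSET`, poll `STATUS_OFFSET` until it reads `STATUS_DONE`, and then read `RESULT_OFFSET`. The raw data never needs to pass through the core's own memory.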

Voting and repair mechanism for register values of redundant processor cores

A computer system comprising:
- At least four redundant processor cores, each with CPU and local memory.
- A voting and repair component that monitors register values across the redundant cores and overwrites any faulty register values with a majority register value.
- This fault-mitigation feature maintains correct execution despite faults in individual core registers.
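
The voting and repair behavior can be sketched as a majority vote over each register, followed by overwriting any dissenting core. This is a minimal software illustration, not the hardware voter the patent describes: the `vote_and_repair` function, the dictionary representation of a register file, and the policy of leaving a register untouched when no strict majority exists are all assumptions.

```python
from collections import Counter


def vote_and_repair(core_registers):
    """Repair minority register values in place.

    core_registers is a list with one dict per core, mapping register
    name to value. Returns a list of (core_index, register_name) pairs
    that were overwritten with the majority value.
    """
    repaired = []
    for reg in core_registers[0]:
        values = [core[reg] for core in core_registers]
        majority, count = Counter(values).most_common(1)[0]
        if count * 2 <= len(values):
            # No strict majority (for example a 2-2 split across four
            # cores). Assumed policy: leave the register as it is.
            continue
        for index, core in enumerate(core_registers):
            if core[reg] != majority:
                core[reg] = majority
                repaired.append((index, reg))
    return repaired
```

With four cores, a single upset in one core's register is outvoted three to one and corrected, while the other three cores continue unchanged.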

The claims define a computer system with core inventive features including partial reconfiguration-based coprocessor module swapping, direct data streaming into coprocessor storage, memory-mapped coprocessor accessibility, and register-value voting and repair across redundant cores to ensure fault tolerance and efficient hardware acceleration.

Stated Advantages

Provides enhanced computation capabilities and efficient hardware resource usage in high-radiation environments.

Enables real-time data processing and hardware acceleration for advanced algorithms, including machine learning, AI, signal processing, and image processing on small spacecraft and in space applications.

Reduces costs compared to traditional radiation-hardened computer systems, supporting broader participation in commercial space missions.

Integrates robust fault tolerance via quad modular redundancy and background repair, increasing reliability over standard triple modular redundancy (TMR) systems.

Facilitates flexible partitioning and execution of large algorithms by dynamically swapping coprocessor modules using partial reconfiguration, thus overcoming FPGA size limitations.

Improves computational performance scalability based on FPGA device, CPU architecture, and available hardware accelerators.

Documented Applications

Computer systems for use in radiation-intensive environments, including space-based applications such as low earth orbit and cislunar missions.

Embedded computers for instruments and subsystems in space vehicles and satellites, supporting in-space processing for sensor data, navigation, and control.

Coprocessor cards for high-performance space computers, operating alongside systems like NASA's High Performance Spaceflight Computing (HPSC) processor.

Hardware acceleration for advanced algorithms, including image processing (filtering, edge detection, pattern recognition), floating-point unit algorithms, matrix operations, machine learning, and artificial intelligence.
