Energy-minimal dataflow architecture with programmable on-chip network
Inventors
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Assignees
Carnegie Mellon UniversityCarnegie Mellon University is a global research institution based in Pittsburgh, Pennsylvania, recognized for interdisciplinary education, research, and innovation in science, engineering, arts, technology, and social sciences. The university leads advancements in artificial intelligence, robotics, digital health, and performing arts. Located in a technology-driven and culturally rich city, CMU powers real-world impact through research centers, industry engagement, workforce training, and initiatives that shape regional and global communities.
Carnegie Mellon University is a global research institution based in Pittsburgh, Pennsylvania, recognized for interdisciplinary education, research, and innovation in science, engineering, arts, technology, and social sciences. The university leads advancements in artificial intelligence, robotics, digital health, and performing arts. Located in a technology-driven and culturally rich city, CMU powers real-world impact through research centers, industry engagement, workforce training, and initiatives that shape regional and global communities.
Abstract
Disclosed herein is a co-designed compiler and CGRA architecture that achieves both high programmability and extreme energy efficiency. The architecture includes a rich set of control-flow operators that support arbitrary control flow and memory access on the CGRA fabric. The architecture is able to realize both energy and area savings over prior art implementations by offloading most control operations into a programmable on-chip network where they can re-use existing network switches.
Core Innovation
The invention relates to a co-designed energy-minimal CGRA and compiler that compiles a program to support arbitrary control flow and irregular memory access. Control-flow primitives are inserted during compilation from an intermediate representation into a dataflow graph, enabling execution on a coarse-grained reconfigurable array. The approach uses ordering-operator-based dataflow concepts to represent dependencies and to drive control flow through the CGRA and its on-chip network.
A memory-ordering mechanism is enforced on the intermediate representation to create an ordering graph encoding dependencies between memory operations as ordering arcs. The compiler iteratively prunes the ordering graph to eliminate redundant ordering arcs already enforced by data and control dependencies, including path-sensitive transductive reduction that transforms a potentially cyclic ordering graph through strongly-connected components.
Control-flow operators are translated into the dataflow graph by inserting control-flow operators including Steer, Carry, Invariant, Merge, Order, and stream operator, and mapping operators onto the CGRA. The control-flow operators utilize logic of an on-chip network of the CGRA to control flow of the program by reusing NoC switches with added control-flow modules, in a bufferless 2D-torus CGRA fabric with heterogeneous processing elements.
Claims Coverage
The independent claim describes a system with a processor and compiler software that performs a compilation and mapping pipeline including intermediate representation generation, memory-ordering via an ordering graph with pruning, conversion to a dataflow graph with inserted control-flow operators, and mapping onto a CGRA where control-flow operators use on-chip network logic. The independent claim’s inventive features are focused on memory-order enforcement with ordering-arc pruning, translation to a dataflow graph with inserted control-flow operators, and CGRA mapping in which control-flow operators utilize on-chip network logic for program control.
High-level to intermediate representation compilation
The compiler software compiles a program written in a high-level language to an intermediate representation.
Memory-order enforcement via ordering graph arcs
The compiler software enforces memory ordering on the intermediate representation to create an ordering graph encoding dependencies between memory operations as ordering arcs.
Iterative pruning of redundant ordering arcs
The compiler software iteratively prunes the ordering graph to eliminate redundant ordering arcs already enforced by data and control dependencies.
Intermediate representation to dataflow graph translation
The compiler software translates the intermediate representation to a dataflow graph.
Insertion of control-flow operators into the dataflow graph
The compiler software inserts control-flow operators into the dataflow graph.
CGRA mapping using on-chip network logic for control flow
The compiler software maps operators in the dataflow graph to a coarse-grained reconfigurable array (CGRA) wherein the control-flow operators utilize logic of an on-chip network of the CGRA to control flow of the program.
CGRA with heterogeneous processing elements and bufferless 2D-torus on-chip network
The system includes a CGRA with a fabric of heterogeneous processing elements connected through a bufferless 2D-torus on-chip network, along with a RISC-V scalar core and SRAM main memory.
Path-sensitive transductive reduction for ordering-graph acyclicity
The system applies path-sensitive transductive reduction to an ordering graph to transform a potentially cyclic ordering graph into an acyclic graph outside of strongly-connected components.
Control-flow operators comprising Steer, Carry, Invariant, Merge, Order, and stream operator
The system defines control-flow operators comprising a steer operator, a carry operator, an invariant operator, a merge operator, an order operator, and a stream operator.
Dataflow graph optimization before mapping
The system optimizes the dataflow graph before mapping the operators in the dataflow graph.
C language to LLVM intermediate representation via clang
The system further limits that the high-level language is C and the intermediate representation is LLVM produced by clang.
Overall, the independent claim’s coverage centers on enforcing memory ordering through an ordering graph of ordering arcs with iterative pruning, converting the intermediate representation to a dataflow graph with inserted control-flow operators, and mapping onto a CGRA in which control-flow operators use on-chip network logic to control program flow. The refinements further specify a heterogeneous CGRA fabric with bufferless 2D-torus connectivity and concrete operator types, along with path-sensitive transductive reduction for handling cyclic ordering graphs, optimization of the dataflow graph prior to mapping, and constraints to compiling C to LLVM via clang.
Stated Advantages
Reported energy savings compared with prior CGRAs when control-flow is mapped in the NoC (e.g., ~40% energy).
Reported area savings compared with prior CGRAs when control-flow is mapped in the NoC (e.g., ~22% area).
Reported less energy from ILP mapping (e.g., ILP ~4.3% less energy).
Reported fast mapping solutions using SAT (e.g., SAT <3 min).
Documented Applications
Mapping and compilation of programs from a high-level language into a CGRA execution flow that supports arbitrary control flow and irregular memory access.
Interested in licensing this patent?