Look-up table containing processor-in-memory cluster for data-intensive applications
Inventors
Ganguly, Amlan • Pudukotai Dinakarrao, Sai Manoj • Connolly, Mark • Sutradhar, Purab Ranjan • Bavikadi, Sathwika • Indovina, Mark Allen
Assignees
George Mason University • Rochester Institute of Technology
Publication Number
US-11775312-B2
Publication Date
2023-10-03
Expiration Date
2042-04-11
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A processing element includes a PIM cluster configured to read data from and write data to an adjacent DRAM subarray, wherein the PIM cluster has a plurality of processing cores, each processing core of the plurality of processing cores containing a look-up table, and a router connected to each processing core, wherein the router is configured to communicate data among each processing core; and a controller unit configured to communicate with the router, wherein the controller unit contains an executable program of operational decomposition algorithms. The look-up tables can be programmable. A DRAM chip including a plurality of DRAM banks, each DRAM bank having a plurality of interleaved DRAM subarrays and a plurality of the PIM clusters configured to read data from and write data to an adjacent DRAM subarray is disclosed.
Core Innovation
The invention discloses a look-up table (LUT) based processor-in-memory (PIM) cluster architecture within a DRAM chip. Each PIM cluster comprises a plurality of processing cores, where each core contains a programmable look-up table and input registers. A router within the cluster enables data communication among the cores, and the PIM cluster is able to read from and write to an adjacent DRAM subarray. The controller unit interacts with the PIM cluster's router and executes operational decomposition algorithms to orchestrate complex operations by coordinating many processing cores.
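The reprogrammability of the cores can be illustrated with a toy software model. The sketch below (an assumption for illustration, not code from the patent) models a single core as a 256-entry table indexed by two concatenated 4-bit operands; reloading the table switches the core's function with no change to the underlying hardware.

```python
# Toy model of a LUT-based processing core: a 256-entry table indexed
# by two concatenated 4-bit operands. Names (program_lut, core_compute)
# are illustrative, not from the patent.

def program_lut(op):
    """Precompute a 256-entry table for a two-operand 4-bit function."""
    return [op(a, b) & 0xFF for a in range(16) for b in range(16)]

def core_compute(lut, a, b):
    """A core 'computes' by a single look-up on the 8-bit index."""
    return lut[(a << 4) | b]

xor_lut = program_lut(lambda a, b: a ^ b)  # core configured as a XOR unit
add_lut = program_lut(lambda a, b: a + b)  # same core, reloaded as an adder

assert core_compute(xor_lut, 0b1010, 0b0110) == 0b1100
assert core_compute(add_lut, 9, 7) == 16
```

Because the function lives entirely in the table contents, changing what a core computes is a data write rather than a hardware change, which is the basis of the dynamic-reprogramming claim.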
The problem addressed by this invention is the performance bottleneck and high data latency of traditional von Neumann computing architectures, particularly for data-intensive applications such as Deep Learning, Data Security, and Multimedia Processing. Prior processor-in-memory approaches, such as bitwise logic-in-DRAM, SRAM/ReRAM crossbars, and large LUT-based DRAM, have suffered from major limitations including poor support for larger operands, limited functionality, high area overhead, or costly analog-to-digital conversions.
Unlike previous approaches, this invention introduces processing cores based on 8-bit LUTs that can be dynamically programmed for various arithmetic and logic operations. By employing a cluster of these cores with a flexible router and a controller to run operation-decomposition algorithms, the architecture enables massively parallel processing at the memory level for complex, high-precision tasks. The in-memory arrangement reduces data communication delays, streamlines parallelization, and allows for the dynamic reprogramming of core functions without scaling hardware overhead.
Claims Coverage
Two independent claims provide the primary inventive coverage: one for a processor-in-memory cluster and another for a DRAM chip featuring multiple such clusters.
Processor-in-memory cluster with programmable look-up table cores and internal router
A PIM cluster is configured to read from and write to an adjacent DRAM subarray. The cluster includes multiple processing cores, each containing a look-up table, and a router that connects to each core and enables communication among them. A controller unit communicates with the router and contains an executable program for operational decomposition algorithms.
DRAM chip containing multiple processor-in-memory clusters with interleaved subarrays
A DRAM chip comprises multiple DRAM banks, each with a plurality of interleaved DRAM subarrays and a plurality of PIM clusters as previously described. Each PIM cluster can communicate with a controller unit and features programmable look-up tables in its processing cores.
The claims provide coverage for a programmable processor-in-memory cluster architecture with internal routing and a controller unit for decomposed operations, as well as for DRAM chips integrating multiple such PIM clusters with interleaved subarrays.
Stated Advantages
The architecture avoids complex CMOS logic, enabling simpler construction compatible with standard DRAM chip manufacturing.
Pre-computed LUT-based operations result in lower dynamic power consumption and higher energy efficiency compared to CMOS logic-based computing.
Dynamic reprogramming of the look-up tables allows the same hardware to support numerous operations without increasing hardware overhead.
The in-memory placement of processing clusters enables high bandwidth, minimal latency, and massively parallel operation on data-intensive applications.
Documented Applications
Deep Neural Network acceleration, including inference with 8-bit fixed-point and 12-bit floating-point precision, as well as binary- and ternary-weight precisions.
Massively parallel data encryption using AES and other algorithms, including support for in-memory key expansion.
Acceleration of data-parallel applications such as Graph Processing, Automata Processing, Image Processing, and Genomic Sequencing.
Real-time computer vision applications for mobile and edge devices, including autonomous driving modules, drones, and industrial robots.