Systems and methods for multi-use error correcting codes

Inventors

Sethumadhavan, LakshminarasimhanManzhosov, Evgeny

Assignees

Columbia University in the City of New York

Publication Number

US-12360845-B2

Publication Date

2025-07-15

Expiration Date

Interested in licensing this patent?

MTEC can help explore whether this patent might be available for licensing for your application.


Abstract

Disclosed are methods, systems, devices, circuits. and other implementations, including a method for error identification and correction that includes obtaining from a memory device coded input data, the coded input data previously encoded by multiplying a source data element by a pre-determined multiplier, and stored in the memory device, and performing a decoding operation on the coded input data obtained from the memory device, with the decoding operation including at least a modulo operation, to derive a resultant decoded data element and a remainder portion. The method further includes determining whether the coded input data includes a corrupted portion based on a value of the remainder portion.

Core Innovation

Disclosed is a method and system for error identification and correction in which coded input data previously encoded by multiplying a source data element by a pre-determined multiplier is obtained from a memory device, and a decoding operation including at least a modulo operation is performed on the coded input data to derive a resultant decoded data element and a remainder portion. The remainder portion is used to determine whether the coded input data includes a corrupted portion, and when nonzero, the remainder portion can be used to correct the decoded data element.

The background problem addressed is that traditional ECC methods suffer a tradeoff between error correction rate and memory usage, and that conventional residue codes face practical limitations because accommodating failure models with multiple bit failures causes the number of remainders to grow and multiplier values to become too large for practical use. The disclosure addresses this problem by providing MUSE (Multi-Use) ECC based on residue codes that achieves protection similar to storage-oriented codes while using fewer bits of storage.

The disclosure further describes deriving the pre-determined multiplier by identifying symbol representations for a code set, computing error values for each possible representation error for each symbol, and determining a multiplier for which resultant remainder values are unique, and introduces optimization techniques including shuffling, aliasing, and atomization to reduce remainder collisions and the number of required remainders. The approaches include performing decoding and encoding with specialized multiplication, division and modulo operations (including multiplying by an inverse multiplier and fast modulo circuits) to reduce latencies and enable practical integration into memory systems.

Claims Coverage

This section identifies 8 independent claims and summarizes their main inventive features.

Encoding by multiplying a source data element by a pre-determined multiplier

Obtaining from a memory device coded input data previously encoded by multiplying a source data element by a pre-determined multiplier and stored in the memory device; performing a decoding operation including at least a modulo operation to derive a resultant decoded data element and a remainder portion; and determining whether the coded input data includes a corrupted portion based on a value of the remainder portion.

Derivation of the pre-determined multiplier from symbol representations and error remainders

Identifying symbol representations for a code set comprising a plurality of code symbols; computing error values for each possible representation error for each of the plurality of code symbols; and determining the pre-determined multiplier for which each of a plurality of resultant remainder values, resulting from the modulo operation applied to the computed error values, is unique.

Controller circuit configured to perform modulo-based decoding and determine corruption

A computing system comprising a memory device and a controller circuit configured to obtain coded input data encoded by multiplying a source data element by a pre-determined multiplier, perform a decoding operation including at least a modulo operation to derive a resultant decoded data element and a remainder portion, and determine whether the coded input data includes a corrupted portion based on the remainder portion.

Computer-readable media with instructions to perform modulo-based decoding and derive the multiplier

Non-transitory computer readable media comprising computer instructions executable to obtain coded input data encoded by multiplying a source data element by a pre-determined multiplier, perform a decoding operation including at least a modulo operation to derive a decoded data element and remainder portion, determine whether coded input data includes a corrupted portion, and derive the pre-determined multiplier by identifying symbol representations, computing error values, and determining a multiplier with unique remainder mappings.

Correction of decoded data using an error correction table

Correcting the decoded data element in response to a nonzero remainder by searching an error correction table comprising entries with a remainder field, an error field, and a sign field, and in response to matching the nonzero remainder to an entry, combining the sign value with the corresponding error value to yield a signed error value and adding the signed error value to the decoded data element to yield a corrected data value.

Use of unused coded bits for non-coding functionality

Coded input data previously encoded and stored in the memory device includes one or more bits not required for decoding the coded input data, and the one or more bits are utilized for non-coding functionality such as memory tagging or rowhammer detection.

Controller configured to perform modulo via multiplication by an inverse multiplier

The controller configured to perform the modulo operation is configured to multiply the coded input data by an inverse multiplier, determined from the pre-determined multiplier, to derive the resultant decoded data element.

System embodiments including correction and multiplier derivation functionality

Computing system embodiments are configured to obtain coded input data encoded by multiplying a source data element by a pre-determined multiplier, perform modulo-based decoding to derive a decoded data element and remainder portion, determine corrupted portions based on the remainder portion, correct decoded data in response to a nonzero remainder, and derive the pre-determined multiplier via identifying symbol representations and computing error values with unique remainder mappings.

The independent claims focus on (1) encoding by multiplying data with a pre-determined multiplier and modulo-based decoding that yields a remainder for error detection and correction, (2) deriving the multiplier by computing symbol error values and ensuring unique remainder mappings, (3) controller and computer-readable media implementations to perform these operations, (4) correction via an error correction table, and (5) using unused coded bits for non-coding functionality.

Stated Advantages

MUSE uses fewer bits of storage than comparable Reed-Solomon codes, enabling saved bits to be used for storing metadata and reducing storage overhead.

MUSE is a single code that can guarantee the correctness of both data in storage and during computation and is a fit for Processing In-Memory, potentially using 2.6× less redundancy bits in HBM2 contexts.

MUSE offers customization of codes to fit error models and can cover multiple classes of errors (for example, single-bit errors and asymmetrical errors), enabling reduced check bits compared to Reed-Solomon in showcased models.

The MUSE approach implements computationally efficient encoding and decoding by using specialized multiplication, division and modulo operations for a fixed known multiplier, substantially reducing latencies and taking operations off the critical path.

Documented Applications

Storing metadata in spare bits of coded data, including ARM Memory Tagging Extensions (MTE).

Rowhammer detection using salvaged bits to store hash codes of cache lines to detect corruptions.

Processing In-Memory (PIM) and HBM2 devices to protect both stored data and computation, including protecting HBM2 MAC units for neural-network applications.

Providing ChipKill-like single device failure correction for DDR4 and DDR5 DIMMs as a replacement for conventional ChipKill ECC schemes.

Integrating MUSE into encrypted memory computation algorithms to improve data security.

JOIN OUR MAILING LIST

Stay Connected with MTEC

Keep up with active and upcoming solicitations, MTEC news and other valuable information.