Systems and methods for a cross-layer key-value store architecture with a computational storage device

Inventors

Bikonda, Naga Sanjana
Kim, Wookhee
Ramanathan, Madhava Krishnan
Min, Changwoo
Maram, Vishwanath

Assignees

Samsung Electronics Co Ltd
Virginia Tech Intellectual Properties Inc
Virginia Polytechnic Institute and State University

Publication Number

US-12019548-B2

Publication Date

2024-06-25

Expiration Date

2042-06-13



Abstract

Provided is a data storage system including a host and a storage device. The host includes a host cache portion of a mirror cache, the host cache portion storing metadata indicating a location of a data node stored in a kernel cache portion of the mirror cache; the storage device includes the kernel cache portion, which is located in a common memory area.

Core Innovation

The invention provides a data storage system architecture featuring a cross-layer key-value store that utilizes both a host and a computational storage device. The system implements a mirror cache divided into a host cache portion, which stores metadata indicating the location of data nodes, and a kernel cache portion located in a common memory area on the storage device. This arrangement allows the host to store metadata corresponding to data nodes present on the storage device for improved management of data operations.

The architecture separates the key-value store into two layers: a search layer located on the host and a data layer on the storage device. The search layer, such as a tree structure, is manipulated by the host processor, while the data layer, potentially implemented as a doubly-linked list of leaf nodes, is managed by the processing circuit on the storage device. This enables the offloading of data plane computations to the storage device, utilizing high-speed peer-to-peer bandwidth for data movement and processing.

The invention addresses bottlenecks common in prior key-value store designs that rely solely on either host or storage device computation, as well as inefficiencies of conventional logging-based crash consistency. The disclosed system further incorporates version-based crash consistency using version numbers for data nodes and extension nodes, enabling atomic updates, efficient recovery, and variable-length key-value support via extension nodes when data nodes run out of space. This results in reduced bottlenecking, improved throughput, scalability, and simplified crash consistency mechanisms.

Claims Coverage

The patent claims define several inventive features centered on a cross-layer key-value store architecture that integrates a host and a computational storage device with a mirror cache and version-based crash consistency.

Mirror cache with host and kernel cache alignment

The system uses a mirror cache comprising:

- A host cache portion on the host to store metadata indicating the location of data nodes.
- A kernel cache portion on a common memory area in the storage device, where each host cache entry's metadata aligns with a corresponding kernel cache entry.
- An ordering of entries in the host cache that mirrors the order in the kernel cache, minimizing data movement and improving cache coherence.
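The alignment described above can be sketched as follows. This is an illustrative model only (class and field names such as `MirrorCache`, `HostCacheEntry`, and `device_offset` are hypothetical, not from the patent): the host half holds only metadata, the kernel half holds the data node itself, and the same slot index describes the same node in both halves.

```python
class HostCacheEntry:
    """Metadata only: identifies the data node in the matching kernel slot."""
    def __init__(self, node_id, device_offset):
        self.node_id = node_id
        self.device_offset = device_offset

class MirrorCache:
    def __init__(self, num_slots):
        self.host = [None] * num_slots     # metadata, resident on the host
        self.kernel = [None] * num_slots   # data nodes, in the common memory area

    def insert(self, slot, node_id, device_offset, node_bytes):
        # Both halves are written at the same slot, so their orders stay aligned.
        self.host[slot] = HostCacheEntry(node_id, device_offset)
        self.kernel[slot] = node_bytes

    def lookup(self, node_id):
        # The host scans only its small metadata array; a hit identifies
        # exactly which kernel-cache slot holds the node, with no extra
        # data movement from the device.
        for slot, entry in enumerate(self.host):
            if entry is not None and entry.node_id == node_id:
                return slot, self.kernel[slot]
        return None, None

cache = MirrorCache(num_slots=4)
cache.insert(0, node_id=17, device_offset=0x1000, node_bytes=b"leaf-17")
slot, data = cache.lookup(17)
```

Because the host never needs to pull node contents across the bus to answer "where is this node cached?", metadata lookups stay host-local while the payload stays on the device.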

Separation into search and data layers

The system features a search layer residing and processed on the host, and a data layer residing and processed on the storage device.

- The search layer may be implemented as a tree structure for storing partial keys.
- The data layer may consist of data nodes, with each corresponding to a leaf node in a doubly-linked list, supporting efficient point and range queries.
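A minimal sketch of this two-layer split, under stated assumptions: a sorted separator array stands in for the host-side tree of partial keys, and device-side leaf nodes are linked into a doubly-linked list so a range query can walk sibling leaves. All names here (`CrossLayerStore`, `LeafNode`, `separators`) are illustrative, not from the patent.

```python
import bisect

class LeafNode:
    """A data node in the data layer; siblings are doubly linked."""
    def __init__(self, items):
        self.items = dict(items)   # key -> value pairs held by this node
        self.prev = None
        self.next = None

class CrossLayerStore:
    def __init__(self, leaves):
        # Data layer: link the leaves into a doubly-linked list.
        for a, b in zip(leaves, leaves[1:]):
            a.next, b.prev = b, a
        self.leaves = leaves
        # Search layer (host-side): smallest key of each leaf, kept sorted.
        self.separators = [min(leaf.items) for leaf in leaves]

    def _find_leaf(self, key):
        i = bisect.bisect_right(self.separators, key) - 1
        return self.leaves[max(i, 0)]

    def get(self, key):
        # Point query: search layer picks the leaf, data layer serves it.
        return self._find_leaf(key).items.get(key)

    def range(self, lo, hi):
        # Range query: find the starting leaf, then walk sibling links.
        out, leaf = [], self._find_leaf(lo)
        while leaf is not None:
            out += [(k, v) for k, v in sorted(leaf.items.items()) if lo <= k <= hi]
            leaf = leaf.next
        return out

store = CrossLayerStore([LeafNode({1: "a", 2: "b"}), LeafNode({5: "c", 9: "d"})])
```

In the patented architecture the leaf traversal would run on the device's processing circuit rather than on the host, which is what allows the data-plane work to be offloaded.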

Peer-to-peer transfer and operation offload

The storage device contains a persistent memory and a processing circuit, both coupled to the common memory area. The processing circuit can:

- Perform data operations on data nodes and write updated nodes to the kernel cache.
- Enable peer-to-peer writes from the common memory area to persistent memory by the host, facilitating high-throughput data movement.
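The offload path above can be sketched as a small model (hypothetical names, assumed interfaces): the host hands a data operation to the device, whose processing circuit applies it to the node in the kernel cache and then persists the result device-to-device, with no round trip through host memory.

```python
class ComputationalStorageDevice:
    """Illustrative model of a device with a common memory area and
    persistent memory coupled to an on-device processing circuit."""
    def __init__(self):
        self.kernel_cache = {}        # common memory area: node_id -> bytes
        self.persistent_memory = {}   # device-resident persistent store

    def offload_update(self, node_id, update):
        # The processing circuit performs the operation on the cached node...
        node = bytearray(self.kernel_cache.get(node_id, b""))
        node.extend(update)
        self.kernel_cache[node_id] = bytes(node)
        # ...then a peer-to-peer write moves it from the common memory
        # area into persistent memory without touching host DRAM.
        self.persistent_memory[node_id] = self.kernel_cache[node_id]
        return self.kernel_cache[node_id]

dev = ComputationalStorageDevice()
dev.kernel_cache["n1"] = b"v1;"
dev.offload_update("n1", b"v2;")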

Dynamic extension nodes for variable-length key-value support

When a data node lacks sufficient space, the host assigns an extension node to accommodate the additional data. The storage device's processing circuit:

- Updates the metadata of the data node to reference the extension node.
- Maintains and synchronizes version numbers between the nodes for crash consistency.
- Updates both the extension node and the data node, incrementing their version metadata accordingly.
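A sketch of the overflow path, with assumed names and an illustrative capacity (`NODE_CAPACITY` and the `put` helper are hypothetical): once the data node is full, an extension node is assigned, linked via the data node's metadata, and kept at the same version number so recovery can compare the two.

```python
NODE_CAPACITY = 2  # illustrative capacity, not from the patent

class Node:
    def __init__(self):
        self.items = {}
        self.version = 0
        self.extension = None  # metadata reference to the extension node

def put(data_node, key, value):
    if len(data_node.items) < NODE_CAPACITY:
        data_node.items[key] = value
    else:
        if data_node.extension is None:
            # Host assigns the extension node; its version starts in
            # sync with the data node that references it.
            ext = Node()
            ext.version = data_node.version
            data_node.extension = ext
        data_node.extension.items[key] = value
    # Both nodes advance together so versions stay synchronized.
    data_node.version += 1
    if data_node.extension is not None:
        data_node.extension.version = data_node.version

n = Node()
put(n, "k1", 1)
put(n, "k2", 2)
put(n, "k3", 3)   # overflows into a freshly assigned extension node
```

Because the extension node is reached through the data node's metadata, variable-length values never force a reorganization of the search layer on the host.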

Version-based crash consistency with atomic updates

The processing circuit is configured to write updated extension nodes to storage in an out-of-place manner and updated data nodes in an in-place manner.

- Updates are managed such that extension nodes are written and made persistent before data nodes are overwritten at their original locations.
- The version metadata is incremented for updated nodes, enabling efficient crash recovery and atomicity.
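The write ordering can be sketched as follows (all names hypothetical; a dictionary stands in for persistent memory, and `persist` is assumed durable): the updated extension node goes to a fresh address first, and only then is the data node overwritten in place with the same new version number. A crash between the two steps leaves the versions unequal, which is exactly what recovery checks.

```python
storage = {}          # stand-in for persistent memory: address -> (version, payload)
next_free = [100]     # trivial out-of-place allocator

def persist(addr, version, payload):
    storage[addr] = (version, payload)   # assumed to be a durable write

def atomic_update(data_addr, new_payload, ext_payload):
    version, _ = storage[data_addr]
    new_version = version + 1
    # Step 1: write the extension node out-of-place and persist it first.
    ext_addr = next_free[0]; next_free[0] += 1
    persist(ext_addr, new_version, ext_payload)
    # Step 2: only now overwrite the data node at its original location,
    # bumping its version and pointing it at the new extension node.
    persist(data_addr, new_version, (new_payload, ext_addr))
    return ext_addr

def consistent(data_addr, ext_addr):
    # Recovery check: a completed update leaves both versions equal;
    # a mismatch means the update was interrupted mid-way.
    return storage[data_addr][0] == storage[ext_addr][0]

persist(0, 0, (b"old", None))
ext = atomic_update(0, b"new", b"overflow")
```

No write-ahead log is needed: the version comparison alone tells recovery whether the in-place data-node write completed, which is the stated advantage over logging-based crash consistency.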

Host-side concurrency control for read and write operations

The host manages concurrent access to storage by:

- Acquiring read locks when processing read requests to data nodes.
- Acquiring write locks when processing write requests to data nodes.
- Handling cache slot locking to ensure integrity during concurrent operations.
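A minimal sketch of this host-side concurrency control as a per-slot readers-writer lock (the `SlotLock` layout is an assumption for illustration, not the patent's implementation): many readers may share a cache slot, while a writer holds it exclusively.

```python
import threading

class SlotLock:
    """Many concurrent readers, or one exclusive writer, per cache slot."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # wait out any active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

slot = SlotLock()
slot.acquire_read(); slot.acquire_read()   # readers share the slot
slot.release_read(); slot.release_read()
slot.acquire_write()                        # writer now holds it exclusively
slot.release_write()
```

Keeping the locks per cache slot rather than global is what lets reads and writes to unrelated data nodes proceed in parallel.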

Method of locating and verifying data nodes using mirror cache and key search

The method includes:

1. Locating, on the host, metadata in a host cache portion to identify a data node in the storage device.
2. Determining presence of the data node in the kernel cache (in the common memory area).
3. Locating a partial key in the search layer to facilitate rapid metadata discovery and access.
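The steps above can be sketched together as one lookup routine (all structures and the two-character partial key are hypothetical stand-ins): the partial-key search layer identifies the node, host-cache metadata says whether it is resident, and a kernel-cache hit serves the node directly from the common memory area.

```python
host_cache = {"n1": {"kernel_slot": 0}}   # host-side metadata only
kernel_cache = [b"data-node-n1"]          # common memory area on the device
search_layer = {"ap": "n1", "ba": "n2"}   # partial key -> data node id

def locate(key):
    # Search layer: a partial key is enough to identify the data node.
    partial = key[:2]                     # illustrative partial-key scheme
    node_id = search_layer.get(partial)
    if node_id is None:
        return None, "not found"
    # Host cache metadata: is the node resident in the kernel cache?
    meta = host_cache.get(node_id)
    if meta is not None:
        return kernel_cache[meta["kernel_slot"]], "kernel cache hit"
    # Otherwise the node must be fetched from persistent memory.
    return node_id, "fetch from persistent memory"

node, status = locate("apple")
```

The common case (a kernel-cache hit) is resolved entirely from host-resident metadata plus one read of the common memory area, with no search inside the device.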

The inventive features collectively establish a system and method for a high-efficiency, cross-layer key-value store, which leverages coordinated caching, logical separation of processing tasks, dynamic data extension, and version-based atomicity for improved scalability, throughput, and crash consistency.

Stated Advantages

Reduces bottlenecks by logically splitting key-value store tasks between the host and storage device, enhancing system bandwidth.

Improves I/O bandwidth utilization through peer-to-peer transfers between processing circuit and persistent memory, reducing data movement and network hops.

Implements version-based crash consistency, eliminating the overhead of logging-based techniques and ensuring atomic updates.

Allows for scalability and avoids synchronization issues by decoupling search and data layers across host and processing circuit.

Enables efficient data locality and decreased latency by using a cross-layered mirror cache structure.

Provides variable-length key-value support and facilitates recovery from crashes using extension nodes and version metadata.

Documented Applications

Integration into computational pipelines for artificial intelligence and machine learning applications to significantly reduce data fetch stalls and offload preprocessing onto computational storage devices.
