Merging multiple sorted lists in a distributed computing system
Inventors
Lieber-Dembo, Adar • Lipcon, Todd
Assignees
Publication Number
US-11301210-B2
Publication Date
2022-04-12
Expiration Date
2040-01-28
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two different heaps based on whether they overlap the merge window. For example, lists that overlap the merge window may be placed into an active or “hot” heap, while lists that do not overlap the merge window may be placed into a separate inactive or “cold” heap. A sorted output can then be generated by iteratively processing the active heap. As the processing of the active heap progresses, the merge window advances, and lists may move between the active and inactive heaps.
Core Innovation
The invention introduces a technique for merging multiple lists of ordinal elements, such as keys, into a sorted output by defining a merge window that represents a portion of an overall element space associated with the multiple lists. Lists are placed into one of at least two heaps, an active or “hot” heap and an inactive or “cold” heap, depending on whether they overlap the merge window. The sorted output is generated by iteratively processing the active heap while the merge window advances, causing lists to move between the active and inactive heaps.
The problem being addressed arises in distributed computing systems where merging multiple sorted lists to produce a single sorted output, such as during ordered scans or merge queries, is computationally intensive and inefficient. Existing approaches, including naïve ordered scans with O(n²) time complexity and single heap-based scans with O(n log n) time complexity, suffer from high memory usage and increased heap motion when scaling to many lists or large data volumes. These issues lead to poor overall system performance, increased latency, and memory strain in distributed, large-scale storage systems.
The introduced technique improves performance by segmenting the multiple lists into multiple heap structures based on a dynamically defined merge window. By separating lists into active and inactive groups depending on overlap with the merge window and using a possible third heap to manage transitions, the approach reduces heap motion and memory usage. This results in more efficient merging operations, lower processing latency, and reduced memory utilization especially when dealing with large datasets distributed across multiple nodes.
Claims Coverage
The patent claims cover methods and systems involving merging multiple per-rowset iterators representing rowsets of a distributed table using multiple heap structures based on a merge window. There are twenty-three main inventive features related to defining, maintaining, and processing these heaps and updating the merge window during iterative merging.
Maintaining two heaps based on merge window overlap
Maintaining a first heap containing per-rowset iterators that overlap a merge window representative of a portion of a keyspace, and a second heap containing the remainder of iterators that do not overlap the merge window.
Merge window definition based on iterator bounds
Basing the merge window on the lower and upper bounds of the runs of keys in each per-rowset iterator, with specific rules for defining start and end of the merge window using smallest lower bound and smallest upper bound among iterators.
Updating iterator bounds and merge window during processing
Updating the lower and/or upper bound of iterators during iterative processing and correspondingly updating the merge window to reflect these changes.
Using a third heap to track upper bounds of iterators in active heap
Maintaining a third heap containing entries based on upper bounds of iterators in the active heap to assist in determining when iterators should move between active and inactive heaps.
Moving iterators between heaps based on bounds comparison
Comparing iterator lower bounds with top entry of the third heap to move iterators from inactive to active heap if they overlap the merge window, or from active to inactive heap if they no longer overlap.
Iterative merge process on active heap
Processing the active heap by popping the top iterator, copying the next non-exhausted key to output, updating its lower bound, returning it to the heap, and repeating until merge completion.
Whole block copy optimization
Copying all non-exhausted keys from an iterator to output when it is the only iterator in the active heap to improve performance.
Ordering of heaps as min-heaps based on lower bounds
Both active and inactive heaps are min-heaps ordered according to lower bounds of the iterators stored therein.
Applying the technique in distributed computing systems
Using the method to process tables with multiple tablets and rowsets stored across multiple nodes in a distributed computing system, specifically referencing Apache Kudu™.
Splitting general lists of ordinal elements into heaps
Splitting a dataset partitioned into multiple lists into first and second groups based on a merge window and iteratively processing the first group to generate sorted output.
Updating merge window dynamically as merging progresses
Adjusting the merge window based on updated bounds of iterators as keys are processed and iterators advance.
Handling partially overlapping keyspaces among rowsets
Managing rowsets with overlapping keyspaces through heap partitioning and iterative merging to optimize ordered scans.
Supporting query conditions in selecting iterators
Accessing per-rowset iterators that correspond to rows satisfying query conditions received from clients to efficiently process ordered scans.
The claims collectively cover the key inventive concepts of defining and using a merge window to split lists or per-rowset iterators into multiple heaps for ordered merging, dynamically maintaining and updating these heaps and the merge window based on iterator bounds, and applying these methods in distributed computing systems to produce sorted outputs efficiently. The patent includes various features to optimize heap operations and reduce memory and processing overhead during ordered scans.
Stated Advantages
Reduces heap motion and processing required to reorder lists in heap compared to existing single heap approaches.
Reduces memory utilization by minimizing the number of elements that must be resident in memory during the merge.
Decreases the number of element comparisons and output operations during merge, improving performance.
Results in significant reductions in processing latency when merging large numbers of lists or large data sets.
Enables efficient ordered scans and merging in distributed computing systems handling large-scale datasets.
Documented Applications
Performing ordered scans in distributed databases such as Apache Kudu™ by merging multiple per-rowset iterators from multiple rowsets into a single sorted output.
Executing merge queries to produce sorted output from multiple lists of distributed data elements across nodes.
Enabling fault tolerance by allowing scans to be resumed with ordering preserved in distributed storage systems.
Increasing data storage efficiency through compaction, merging multiple smaller rowsets into larger ordered rowsets.
Performing incremental backups by identifying changed data between backups using ordered scans that produce multiple versions of rows with the same primary key.
Interested in licensing this patent?