Database compaction in distributed data system

Inventors

Lipcon, Todd

Assignees

Cloudera Inc

Publication Number

US-12169507-B2

Publication Date

2024-12-17

Expiration Date

2036-03-17

Interested in licensing this patent?

MTEC can help explore whether this patent might be available for licensing for your application.


Abstract

A compaction policy imposing soft limits to optimize system efficiency is used to select various rowsets on which to perform compaction, each rowset storing keys within an interval called a keyspace. For example, the disclosed compaction policy results in a decrease in a height of the tablet, removes overlapping rowsets, and creates smaller sized rowsets. The compaction policy is based on the linear relationship shared between the keyspace height and the cost associated with performing an operation (e.g., an insert operation) in that keyspace. Accordingly, various factors determining which rowsets are to be compacted, how large the compacted rowsets are to be made, and when to perform the compaction, are considered within the disclosed compaction policy. Furthermore, a system and method for performing compaction on the selected datasets in a log-structured database is also provided.

Core Innovation

The invention discloses a compaction policy in a distributed database system that imposes soft limits to optimize system efficiency by selecting rowsets on which to perform compaction. Each rowset stores keys within an interval called a keyspace. The compaction policy aims to decrease the height of the tablet, remove overlapping rowsets, and create smaller sized rowsets by leveraging the linear relationship between keyspace height and the cost of performing operations such as inserts within that keyspace.

The compaction policy considers various factors determining which rowsets to compact, how large the compacted rowsets should be, and when compaction should be performed. This involves calculating operational costs based on the number of overlapping rowsets (height) and the width of the rowsets in the keyspace. The policy selects rowsets that optimize operational cost under a given I/O budget, merging those that overlap but are not too large, to maximize performance benefits while minimizing resource consumption.

The problem being solved arises from traditional compaction techniques that waste system resources by either performing compactions too often or compacting datasets that do not optimize tablet efficiency. This leads to inefficient use of storage space and increased I/O costs, adversely impacting existing workloads and increasing service provider costs. The disclosed compaction policy addresses these inefficiencies by applying a cost-based method to select rowsets for incremental compaction, thereby balancing compaction cost with read and write performance improvements in large-scale distributed databases.

Claims Coverage

The patent includes multiple independent claims focusing on methods and systems for organizing data segments and performing compaction in distributed database systems. The main inventive features describe the calculation of operational cost, selection of rowsets based on overlapping key ranges and rowset widths, and performing compaction that reduces tablet height and overlapping rowsets.

Calculation of tablet height based on overlapping rowsets

Determining the height of a tablet within a keyspace based on the number of rowsets that have overlapping key ranges in the plurality of rowsets included in the tablet.

Determination of rowset width based on keyspace percentage

Calculating a rowset’s width according to the percentage of the keyspace to which the rowset corresponds, to aid in cost estimation for compaction.

Operational cost calculation for compaction based on height and widths

Iteratively calculating the operational cost associated with compacting two or more rowsets as a function of the tablet’s height and the rowsets’ widths until a minimum operational cost is reached.

Selection of rowsets for compaction based on minimum operational cost

Selecting two or more particular rowsets for compaction that result in minimizing the operational cost, considering overlapping key ranges and size under a given I/O budget.

Performing distributed compaction via master and slave nodes

Executing the compaction process by communicating instructions between master and slave nodes over a network, merging selected rowsets to reduce tablet height, remove overlaps, and create smaller sized rowsets.

Memory buffering and data flushing with size thresholds

Receiving new data in memory on data nodes and flushing it into new rowsets on tablets, rolling over to new rowsets upon reaching size thresholds, and estimating on-disk data size to flush equally sized rowsets.

Use of interval trees to represent keyspaces and determine heights

Translating the keyspace of a tablet into an interval tree structure to efficiently determine overlapping rowsets and calculate tablet height.

Operational cost modeling for various scan types

Defining operational costs applicable for point queries, short scans, inserts, and random reads as proportional to the number of overlapping rowsets, and for long scans as proportional to rowset size and disk bandwidth.

Bounding rowset sizes to predetermined maximum thresholds

Restricting rowsets to have sizes less than a predetermined size threshold to maintain bounded rowsets covering key ranges less than or equal to the tablet’s keyspace.

The claims collectively define a distributed database compaction system that calculates operational costs based on rowset overlap and size, selects optimal rowsets for compaction under size and I/O constraints, and performs compactions across master and slave nodes to reduce tablet height and overlapping rowsets, improving efficiency and read performance.

Stated Advantages

Maximizes storage utilization while minimizing I/O operation costs in distributed databases.

Improves tablet performance by reducing the height and overlaps of rowsets, leading to more consistent read and write latency.

Balances compaction frequency and size to avoid excessive resource consumption and system performance degradation.

Allows incremental compactions that maintain predictable and substantially constant system performance without spikes.

Reduces unnecessary compactions by selecting only those rowsets whose compaction yields significant cost benefits within an I/O budget.

Documented Applications

Optimizing data storage and retrieval in log-structured distributed databases such as Google BigTable, Apache HBase, Apache Cassandra, and Apache Accumulo.

Managing compaction processes in cloud-based distributed file systems and data storage services to maintain efficient distributed tables and tablets.

Use in distributed non-relational NoSQL databases operating over clusters of data nodes coordinated by master nodes within a distributed file system.

JOIN OUR MAILING LIST

Stay Connected with MTEC

Keep up with active and upcoming solicitations, MTEC news and other valuable information.