Apparatus and method for accelerated query processing using eager aggregation and analytical view matching

Inventors

Betawadkar-Norwood, Anjali; Patel, Priyank

Assignees

Cloudera Inc

Publication Number

US-11341134-B2

Publication Date

2022-05-24

Expiration Date

2038-06-27


Abstract

A system comprises a computer network and worker machines connected to the computer network. The worker machines store partitions of a distributed database. A master machine is connected to the computer network. The master machine includes a query processor to identify a star query that references a fact table and related dimension tables that characterize attributes of facts in the fact table. Eager aggregation is applied to a query plan associated with the star query. The eager aggregation alters the query plan by moving an aggregation operation before a join operation to form an eager aggregated query plan. An analytical view with data responsive to the eager aggregated query plan is identified. The eager aggregated query plan is revised to form a final query plan. The final query plan references the analytical view. The final query plan is executed to produce query results.

Core Innovation

The invention provides a system and method for accelerating query processing in a distributed database by applying eager aggregation and analytical view matching to star or snowflake schema queries. A master machine connected to worker machines storing partitions of the distributed database identifies star queries referencing a fact table and related dimension tables. It applies eager aggregation by moving an aggregation operation before a join operation in the query plan to create an eager aggregated query plan, thereby reducing the number of rows processed in joins and improving efficiency.
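The effect of moving an aggregation ahead of a join can be illustrated with a small sketch. It uses Python's standard `sqlite3` module on a single machine rather than a distributed database, and the `sales` and `store` tables and their contents are hypothetical, invented only for this example. Both plans return the same totals, but in the eager form the join receives one row per store instead of one row per sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table and dimension table (not from the patent).
    CREATE TABLE sales (store_id INTEGER, amount INTEGER);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 5), (3, 7), (3, 3);
    INSERT INTO store VALUES (1, 'east'), (2, 'east'), (3, 'west');
""")

# Original plan: join every fact row to its dimension row, then aggregate.
join_then_aggregate = """
    SELECT s.region, SUM(f.amount)
    FROM sales f JOIN store s ON f.store_id = s.store_id
    GROUP BY s.region ORDER BY s.region
"""

# Eager aggregated plan: aggregate the fact table on the join key first,
# so the join input shrinks from five sale rows to three store rows.
aggregate_then_join = """
    SELECT s.region, SUM(pre.amount)
    FROM (SELECT store_id, SUM(amount) AS amount
          FROM sales GROUP BY store_id) pre
    JOIN store s ON pre.store_id = s.store_id
    GROUP BY s.region ORDER BY s.region
"""

late = conn.execute(join_then_aggregate).fetchall()
eager = conn.execute(aggregate_then_join).fetchall()
print(late == eager, eager)
```

The rewrite is safe here because `SUM` can be computed in stages and each sale joins to exactly one store; other aggregates or join shapes may need extra care.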

The query processor then identifies an analytical view that contains data responsive to the eager aggregated query plan. This analytical view comprises attributes and measures maintained as a separate data unit from the distributed database and is constructed prior to receiving the star query. The eager aggregated query plan is revised to reference the analytical view, forming a final query plan which, when executed, produces the query results more efficiently. This approach supports incremental maintenance of analytical views and enables agile business intelligence operations by allowing dimensions and measures to be modified without fully recomputing joins.
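One way to picture view matching is as a coverage check followed by a roll-up. The sketch below is an assumption-laden illustration, not the patented matching procedure: the `AnalyticalView` class, the `find_matching_view` and `answer_from_view` helpers, and the sample data are all hypothetical. A view is treated as usable when its dimensions and measures cover those of the aggregated query, and the answer is then computed from the view's pre-aggregated rows instead of the fact table.

```python
from dataclasses import dataclass


@dataclass
class AnalyticalView:
    """Hypothetical descriptor of a pre-built view holding SUM measures."""
    name: str
    dimensions: tuple      # ordered dimension names
    measures: frozenset    # names of the pre-aggregated measures
    rows: dict             # dimension-value tuple -> {measure: total}


def find_matching_view(group_by, measures, views):
    """Return the first view whose dimensions and measures cover the query."""
    for view in views:
        covers_dims = set(group_by) <= set(view.dimensions)
        covers_measures = set(measures) <= view.measures
        if covers_dims and covers_measures:
            return view
    return None


def answer_from_view(view, group_by, measure):
    """Roll the view's SUM rows up to the query's coarser grouping."""
    positions = [view.dimensions.index(d) for d in group_by]
    result = {}
    for key, totals in view.rows.items():
        rolled_key = tuple(key[i] for i in positions)
        result[rolled_key] = result.get(rolled_key, 0) + totals[measure]
    return result


sales_by_region_year = AnalyticalView(
    name="sales_by_region_year",
    dimensions=("region", "year"),
    measures=frozenset({"sum_amount"}),
    rows={
        ("east", 2021): {"sum_amount": 30},
        ("east", 2022): {"sum_amount": 5},
        ("west", 2022): {"sum_amount": 10},
    },
)

# A query grouped by region alone is covered by the (region, year) view.
view = find_matching_view(("region",), {"sum_amount"}, [sales_by_region_year])
by_region = answer_from_view(view, ("region",), "sum_amount")

# A query on a dimension the view lacks finds no match.
no_view = find_matching_view(("customer",), {"sum_amount"}, [sales_by_region_year])
print(by_region, no_view)
```

The roll-up step is only valid for measures such as sums and counts that can be re-aggregated from partial totals; that restriction is a simplification made for this sketch.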

Claims Coverage

The patent describes three independent claims covering methods and systems for query processing using eager aggregation and analytical view matching applied to star schema queries in distributed databases.

Identifying star queries and applying eager aggregation to form eager aggregated query plans

The method identifies star queries referencing a fact table and related dimension tables, and applies eager aggregation which alters the query plan by moving an aggregation operation before a join operation to form an eager aggregated query plan.

Using analytical views constructed before query receipt to revise eager aggregated query plans

The method identifies an analytical view containing data responsive to the eager aggregated query plan, maintained as a separate data unit, and revises the eager aggregated query plan to form a final query plan that references the analytical view.

Incrementally maintaining analytical views using a partial function module

The method stores a partial function module that initializes a data dimension, increments aggregates in response to data changes, serializes the last aggregate during refresh operations, and merges partial results of the analytical view.
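The initialize, increment, serialize, and merge steps resemble the interface commonly used for user-defined aggregates. A minimal sketch for an average follows, assuming that interface; the `AvgPartial` class, its method names, and the binary layout are hypothetical and are not taken from the patent. Keeping a running total and count, rather than the average itself, is what lets new rows be folded in and partial results be combined without rescanning old data.

```python
import struct


class AvgPartial:
    """Hypothetical partial-aggregate state for AVG: a running total and count."""

    _LAYOUT = "<dq"  # assumed wire format: float64 total, int64 count

    def __init__(self):
        # Initialize an empty aggregate state.
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Increment the aggregate when a new row arrives.
        self.total += value
        self.count += 1

    def serialize(self):
        # Persist the current state, for example at the end of a refresh.
        return struct.pack(self._LAYOUT, self.total, self.count)

    @classmethod
    def deserialize(cls, blob):
        state = cls()
        state.total, state.count = struct.unpack(cls._LAYOUT, blob)
        return state

    def merge(self, other):
        # Combine two partial results into one.
        self.total += other.total
        self.count += other.count

    def finalize(self):
        return self.total / self.count if self.count else None


# State stored by an earlier refresh of the view.
stored = AvgPartial()
for value in (2.0, 4.0):
    stored.update(value)
blob = stored.serialize()

# A later refresh sees only the newly arrived row.
delta = AvgPartial()
delta.update(6.0)

refreshed = AvgPartial.deserialize(blob)
refreshed.merge(delta)
print(refreshed.finalize())
```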

Performing merge rewrite to optimize the eager aggregated query plan

The method applies a merge rewrite to merge together select blocks introduced by eager aggregation, flattening the query to leverage further database optimizations.
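The general idea of merging a select block into its parent can be shown with two equivalent SQL statements. This summary does not give the patent's actual rewrite rules, so the example below only illustrates the technique in its simplest form, again using `sqlite3` and a hypothetical `sales` table: an inner block that merely filters and projects is folded into the outer block, leaving a single flat query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table (not from the patent).
    CREATE TABLE sales (store_id INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 5), (3, 7), (3, 3);
""")

# Nested form: the aggregation reads from a separate inner select block.
nested = """
    SELECT t.store_id, SUM(t.amount)
    FROM (SELECT store_id, amount FROM sales WHERE amount > 4) t
    GROUP BY t.store_id ORDER BY t.store_id
"""

# Merged form: the inner block's filter is folded into the outer block.
flat = """
    SELECT store_id, SUM(amount)
    FROM sales
    WHERE amount > 4
    GROUP BY store_id ORDER BY store_id
"""

nested_rows = conn.execute(nested).fetchall()
flat_rows = conn.execute(flat).fetchall()
print(nested_rows == flat_rows, flat_rows)
```

A flat query exposes the filter, grouping, and table access in one block, which gives later optimization passes more to work with.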

Executing the final query plan in a distributed architecture

The final query plan is executed by multiple worker machines storing partitions of the distributed database, under coordination of a master machine, producing and securing query results for clients connected via the network.
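The division of work between master and workers can be sketched in a single process. In the example below, threads stand in for worker machines and in-memory lists stand in for database partitions; the function names and data are hypothetical, and no networking, scheduling, or security handling is modeled. Each worker aggregates only its own partition, and the master merges the partial results into the final answer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of (region, amount) rows, one list per worker.
partitions = [
    [("east", 10), ("west", 7)],
    [("east", 20), ("east", 5)],
    [("west", 3)],
]


def worker_partial_sums(partition):
    """Aggregate one partition locally, as a worker machine would."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def master_execute(all_partitions):
    """Fan the plan out to the workers, then merge their partial sums."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(worker_partial_sums, all_partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


results = master_execute(partitions)
print(results)
```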

The independent claims collectively cover a method and system implementing eager aggregation in query plans for star schema queries, leveraging pre-constructed analytical views to accelerate processing, supporting incremental maintenance of analytical views, and executing queries efficiently in distributed database environments.

Stated Advantages

Improved query processing efficiency by reducing the number of input rows to join operations through eager aggregation.

Expedited query processing by leveraging analytical views that store pre-aggregated data.

Reduction of network traffic by utilizing analytical views to avoid unnecessary data movement.

Support for incremental maintenance of analytical views, enabling efficient updates when underlying data changes.

Increased agility for business intelligence applications by allowing modifications in dimensions and measures without recomputing entire joins.

Documented Applications

Supporting business intelligence applications that rely on star and snowflake schema queries in distributed databases.

Accelerating dashboard-time queries with high concurrency requirements in big data environments.

Running decision support benchmark workloads such as TPC-DS.
