Computing 2-body statistics on graphics processing units (GPUs)
Inventors
Tu, Yicheng • Pitaksirianan, Napath
Assignees
University of South Florida • University of South Florida St Petersburg
Publication Number
US-12147806-B2
Publication Date
2024-11-19
Expiration Date
2039-07-25
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Disclosed are various embodiments for computing 2-body statistics on graphics processing units (GPUs). Various types of two-body statistics (2-BS) are regarded as essential components of data analysis in many scientific and computing domains. However, the quadratic complexity of these computations hinders timely processing of data. According, various embodiments of the present disclosure involve parallel algorithms for 2-BS computation on Graphics Processing Units (GPUs). Although the typical 2-BS problems can be summarized into a straightforward parallel computing pattern, traditional wisdom from (general) parallel computing often falls short in delivering the best possible performance. Therefore, various embodiments of the present disclosure involve techniques to decompose 2-BS problems and methods for effective use of computing resources on GPUs. We also develop analytical models that guide users towards the appropriate parameters of a GPU program. Although 2-BS problems share the same core computations, each 2-BS problem however carries its own characteristics that calls for different strategies in code optimization. Accordingly, various embodiments of the present disclosure involve a software framework that automatically generates high-performance GPU code based on a few parameters and short primer code input.
Core Innovation
The invention provides techniques and frameworks for efficiently computing two-body statistics (2-BS) on graphics processing units (GPUs). 2-BS involves evaluating all pairs of points in one or two datasets using a user-defined distance function, with specific application requirements found across many scientific and computing domains. Despite the inherent suitability of 2-BS problems for parallelization, traditional parallel algorithms often fail to exploit the full computing power and memory bandwidth of modern GPUs.
The main challenge addressed is the quadratic computational complexity inherent in pairwise operations, which limits practical performance for large datasets. The patent introduces a set of optimized GPU algorithms, including advanced tiling methods, data caching strategies, register and shared memory management, privatization of output data structures, load balancing, and shuffle-enhanced tiling. These techniques aim to minimize memory bottlenecks, synchronization overheads, and divergence in thread execution, resulting in higher performance and resource utilization.
Additionally, the patent discloses analytical models to guide parameter selection for GPU kernels and a software framework that automates the generation of high-performance GPU code for arbitrary 2-BS problems. The framework incorporates the optimized strategies described, enabling users to input a distance function, data characteristics, output requirements, and device specifications to automatically generate efficient CUDA kernels or equivalent code for other GPGPU frameworks.
Claims Coverage
The patent includes two independent claims that outline the inventive features relating to GPU-based computation and memory management for two-body statistics problems.
Output buffer management with paged allocation and global pointer referencing
The GPU executes machine-readable instructions that: - Determine a plurality of results of a distance function computed between elements of two data blocks. - Allocate an output buffer and divide it into a plurality of pages. - Use a global pointer to reference the first available page in the output buffer. - Assign subsets of results to specific pages, updating the global pointer as pages are filled. - Enable dynamic assignment of results to available pages, supporting flexible output handling during pairwise computation.
Method for 2-BS execution using paged output buffer and pointer updates
A method for GPUs comprising: - Determining results for a distance function across all pairs in two data blocks. - Allocating and dividing an output buffer into pages. - Referencing and updating a global pointer to manage assignment of subsets of results to buffer pages, assigning subsequent subsets as pages fill. - Providing procedural steps for dynamic and concurrent output storage during 2-BS execution.
These inventive features collectively address dynamic output management and efficient distance calculation processes for 2-BS problems on GPUs, leveraging paged memory allocation, global pointer referencing, and instruction sequences for flexible and concurrent output handling.
Stated Advantages
The disclosed techniques and algorithms significantly improve performance and resource utilization on GPUs for two-body statistics computations over known CPU and GPU implementations.
Privatization and efficient output buffer management greatly reduce synchronization overhead and atomic operation collision, enabling higher throughput for large and variable output sizes.
Analytical models guide parameter selection for optimized kernel performance and occupancy, accommodating various workload characteristics automatically.
The software framework automates the generation of optimized GPU code based on user-provided distance functions and data/output specifications, lowering the barrier for high-performance implementation of new 2-BS problems.
Documented Applications
Computation of all-pairs distance functions in problems such as 2-point correlation function, nearest-neighbor search, spatial distance histogram, radial distribution function, and relational joins.
Pairwise comparison-based applications, including ranking, item rating, Cosine similarity, predictive modeling (e.g., music preference prediction), outlier detection, denoising, kernel density regression, and N-body simulation.
Recommendation systems requiring 2-BS computation, including content-based filtering and collaborative filtering, both relying on pairwise calculations between items or users.
Scientific data analysis tasks in domains such as astrophysics and biology, where 2-BS problems like the two-point angular correlation function and kernel methods are essential.
Interested in licensing this patent?