System and method for providing a modern-era retrospective analysis for research and applications (MERRA) data analytic service
Inventors
Schnase, John L. • Duffy, Daniel Q. • Tamkin, Glenn S. • McInerney, Mark • Nadeau, Denis • Thompson, John H. • Sinno, Scott • Strong, Savannah L.
Assignees
National Aeronautics and Space Administration (NASA)
Publication Number
US-10339114-B2
Publication Date
2019-07-02
Expiration Date
2035-05-13
Abstract
A system, method, and computer-readable storage devices for providing an interface for an analytic service for Modern-Era Retrospective Analysis for Research and Applications (MERRA) datasets. An example system for providing the service includes a data analytics platform of an assemblage of compute and storage nodes that provide a compute-storage fabric upon which high-performance parallel operations are performed over a collection of climate data stored in a distributed file system, a sequencer that transforms the climate data, and a desequencer that transforms serialized block compressed sequence files between data formats. The system includes a services library of applications that dynamically create data objects from the data as reduced final results, and a utilities library of software applications that process flat serialized block compressed sequence files. The system also includes a service interface through which a client device can access the climate data via the data analytics platform.
Core Innovation
The invention provides a system, method, and computer-readable storage devices for offering an analytic service interface for the Modern-Era Retrospective Analysis for Research and Applications (MERRA) datasets. The system includes a data analytics platform comprising compute and storage nodes that form a compute-storage fabric enabling high-performance parallel operations over climate data stored in a distributed file system. It incorporates a sequencer to transform climate data into flat serialized block compressed sequence files suitable for MapReduce programs, a desequencer to revert these sequence files back into a second climate data file format, and libraries of software applications for creating reduced final results and processing the sequence files. A service interface allows client devices to access the climate data via this analytics platform.
The problem addressed by the invention is the limited and inefficient access that end users, climate researchers, and the public have to the large MERRA dataset. Existing technologies fall short because the MERRA dataset is too large to be moved from its archival location to end users for convenient analysis and use. An improved approach is needed in which analyses are performed at the storage location and only reduced, more usable data products are transferred to users for further study.
The disclosed system solves these issues by implementing a high-performance data analytics service that leverages a compute-storage fabric enabling parallel processing of MERRA climate data within a distributed file system. The system supports conversion of native MERRA output files to a format suitable for MapReduce computations, preserving metadata and enabling dynamic creation of data objects through a services library implementing functions such as order, status, and download. The analytic service interface exposes these capabilities to external applications via RESTful endpoints, supporting efficient querying, ordering, and retrieval of processed climate data without the need to move the full raw dataset to the end user.
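The order, status, and download functions named above suggest a session-based workflow. The following is a minimal in-memory sketch of that workflow, not the patented implementation: the class name `ToyAnalyticService`, the state string `"COMPLETED"`, and the synchronous computation are all assumptions made for illustration.

```python
import uuid
from statistics import fmean


class ToyAnalyticService:
    """Hypothetical in-memory stand-in for an order/status/download workflow."""

    def __init__(self):
        self._sessions = {}

    def order(self, values, operation):
        """Accept a request, compute the reduced result, return a session ID."""
        operations = {"max": max, "min": min, "avg": fmean}
        session_id = str(uuid.uuid4())
        # Assumption: a real service would start a MapReduce job and return
        # immediately; this toy computes the result synchronously.
        self._sessions[session_id] = {
            "state": "COMPLETED",
            "result": operations[operation](values),
        }
        return session_id

    def status(self, session_id):
        """Report the state of a previously ordered computation."""
        return self._sessions[session_id]["state"]

    def download(self, session_id):
        """Return only the reduced result, never the raw input data."""
        return self._sessions[session_id]["result"]
```

A client would call `order` once, poll `status` with the returned identifier, and then call `download`, so only the reduced value crosses the interface.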
Claims Coverage
The patent contains one independent claim that defines a system with several key inventive features.
Data analytics platform with high-performance parallel operations
A data analytics platform comprising an assemblage of compute and storage nodes that provide a compute-storage fabric to perform high-performance parallel operations over a collection of climate data stored in a distributed file system.
Hardware sequencer for transforming and loading climate data
A hardware sequencer that transforms climate data encoded in a native model output file format to flat serialized block compressed sequence files and loads them into the distributed file system in response to an order service request through a system interface, which maps service requests to MapReduce computations returning session identifiers.
Hardware desequencer for reversing data transformation and facilitating retrieval
A hardware desequencer that transforms the flat serialized block compressed sequence files into a second climate data file format, moves the data out of the distributed file system, and prepares it for retrieval via download service requests.
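The round trip performed by the sequencer and desequencer can be sketched as packing key-value records into compressed blocks and unpacking them again. This is only an illustration of the idea: it uses JSON and zlib rather than the actual sequence file format, and the function names, the `block_size` parameter, and the key layout are hypothetical.

```python
import json
import zlib


def sequence(records, block_size=2):
    """Pack (key, value) records into compressed blocks of block_size records."""
    blocks = []
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        payload = json.dumps(block).encode("utf-8")
        blocks.append(zlib.compress(payload))
    return blocks


def desequence(blocks):
    """Unpack compressed blocks back into a flat list of (key, value) records."""
    records = []
    for blob in blocks:
        for key, value in json.loads(zlib.decompress(blob).decode("utf-8")):
            records.append((key, value))
    return records


# Illustrative records: the key combines a made-up variable name and a month.
records = [
    ("T|1979-01", [271.3, 274.8]),
    ("T|1979-02", [272.1, 275.0]),
    ("T|1979-03", [273.4, 276.2]),
]
blocks = sequence(records)
restored = desequence(blocks)
```

Compressing whole blocks rather than single records keeps the per-record overhead low while still letting a reader decompress one block at a time.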
Services library dynamically creating reduced final data objects
A services library comprising multiple software applications that dynamically create data objects from the second climate data file format as reduced final results.
Utilities library processing sequence files
A utilities library comprising software applications that process the flat serialized block compressed sequence files, including sorting, comparing, partitioning, simplifying sequencing/desequencing, and managing configuration files.
Compliance with ISO Open Archival Information System Reference Model
Services library operations correspond to ISO OAIS Reference Model data flow categories for operational archives, including ingest, query, order, download, execute, and status operations.
Order operation supporting multiple climate variable computations
An order operation method (GetVariableByCollection_Operation_TimeRange_SpatialExtent_VerticalExtent) performing maximum, minimum, sum, count, average, variance, and difference computations on climate variables with user-specified input parameters.
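The listed computations can be sketched as a dispatch table over a sequence of values. This is an assumption-laden illustration: the function name `compute` and the table are hypothetical, variance is taken to be the population variance, and the difference operation is left out because the summary does not say which two quantities are differenced.

```python
from statistics import fmean, pvariance

# Hypothetical mapping from operation names to reducing functions.
OPERATIONS = {
    "maximum": max,
    "minimum": min,
    "sum": sum,
    "count": len,
    "average": fmean,
    "variance": pvariance,  # assumption: population variance
}


def compute(values, operation):
    """Apply one named operation to a non-empty list of numeric values."""
    try:
        reduce_fn = OPERATIONS[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation}") from None
    return reduce_fn(values)


sample = [2.0, 4.0, 4.0, 6.0]
summary = {name: compute(sample, name) for name in OPERATIONS}
```

In the claimed method the values would first be selected by collection, time range, spatial extent, and vertical extent; that selection step is not shown here.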
MapReduce mapper and reducer modules for parallel processing
A mapper module compares sequence file key-value pairs against selection criteria and passes the matching values to a reducer module, which performs the specified operation and writes the results. The mapper splits inputs into smaller sub-problems that are distributed to secondary nodes for parallel processing, and the reduced final results are then collected.
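The mapper and reducer roles can be sketched in a single process as follows. This is a simplified illustration rather than the claimed modules: it runs sequentially instead of across nodes, and the function names, the `(variable, timestamp)` key layout, and the string-based time comparison are assumptions.

```python
from collections import defaultdict


def mapper(record, variable, start, end):
    """Emit (variable, value) when the record's key meets the selection criteria."""
    (name, timestamp), value = record
    if name == variable and start <= timestamp <= end:
        yield name, value


def reducer(key, values, operation):
    """Apply the requested operation to all values collected for one key."""
    return key, operation(values)


def run_job(records, variable, start, end, operation):
    """Run map, group-by-key, and reduce over an in-memory list of records."""
    grouped = defaultdict(list)
    for record in records:  # map phase; a cluster would run this in parallel
        for key, value in mapper(record, variable, start, end):
            grouped[key].append(value)  # shuffle: group values by key
    return [reducer(key, values, operation) for key, values in grouped.items()]


# Illustrative records keyed by (variable, "YYYY-MM"); names are made up.
records = [
    (("T", "1979-01"), 1.0),
    (("T", "1979-02"), 3.0),
    (("T", "1980-01"), 9.0),
    (("U", "1979-01"), 5.0),
]
result = run_job(records, "T", "1979-01", "1979-12", max)
```

Because the mapper discards non-matching records before any values are grouped, only the selected subset reaches the reducer.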
Service interface with adapter and REST server modules
An interface comprising an adapter module mapping external client service requests to the analytics service capabilities and a REST server module that communicatively links the data analytics service to external client applications.
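One way to picture the adapter's role is as a function that translates an incoming request URL into a service operation and its parameters. The sketch below is hypothetical: the URL layout, the function name `adapt`, and the query parameter names are assumptions, and only the six operation names come from the summary.

```python
from urllib.parse import parse_qs, urlparse

# Operation names taken from the services library description.
SERVICE_OPERATIONS = {"ingest", "query", "order", "download", "execute", "status"}


def adapt(url):
    """Translate a request URL into (service_operation, parameters)."""
    parsed = urlparse(url)
    # Assumption: the last path segment names the requested operation.
    operation = parsed.path.rstrip("/").split("/")[-1]
    if operation not in SERVICE_OPERATIONS:
        raise ValueError(f"unknown service operation: {operation!r}")
    # Keep the first value of each query parameter.
    params = {name: values[0] for name, values in parse_qs(parsed.query).items()}
    return operation, params


request = adapt("https://example.invalid/service/order?variable=T&operation=avg")
```

Keeping this translation in one place lets the REST layer stay unaware of how each operation is carried out by the analytics platform.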
Together, these features form an integrated system for efficient, parallel analysis of large MERRA climate datasets at the storage location, with reduced data products created dynamically and retrieved through a structured service interface. The system addresses limitations of prior approaches by preserving metadata, supporting multiple computations, and following a recognized archival data flow model.
Stated Advantages
Improved access to large MERRA climate datasets by enabling data analytics as a service, avoiding the need to move large data volumes to end users.
Dynamic creation of reduced final results through parallel MapReduce operations, increasing efficiency of climate data analysis.
Preservation of metadata within transformed sequence files to maintain data integrity and usability.
Flexibility to process multiple climate data collections and variable spatial and temporal extents, supporting a wide range of queries and computations.
Implementation of a service interface that supports industry standards (ISO OAIS) and RESTful communication for ease of integration with external applications.
Documented Applications
Investigation of climate variability through analyses of MERRA variables that include atmosphere, ocean, and land surface products.
Use in application areas such as national disaster response, civil engineering, ecological forecasting, health and air quality, water resources, and agriculture.