Blockchain solution for harmonized storage of clinical and genetic data
Inventors
Gürsoy, Gamze • Elhussein, Ahmed
Assignees
Columbia University in the City of New York
Publication Number
US-12369010-B2
Publication Date
2025-07-22
Expiration Date
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A method for practicing precision medicine comprising providing, to a blockchain platform, each of clinical data and genetic data, providing the blockchain platform, the blockchain platform having a first data structure comprising clinical data and a second data structure comprising genetic data, harmonizing the first and second data structures, creating at least one cohort based on the harmonized first and second data structures, and identifying at least one relationship between the clinical data and the genetic data in each of the at least one cohort.
Core Innovation
The disclosed subject matter provides methods, systems, and blockchain platforms for an improved practice of precision medicine. An example method described includes providing, to a blockchain platform, each of clinical data and genetic data, providing the blockchain platform having a first data structure comprising clinical data and a second data structure including genetic data, harmonizing the first and second data structures, creating at least one cohort based on the harmonized first and second data structures, and identifying at least one relationship between the clinical data and the genetic data in each of the at least one cohort.
Precision medicine is difficult without a data-sharing framework that integrates clinical and genomic data at large sample sizes. Because the two kinds of data differ in type and raise privacy and data-ownership issues, systems that integrate them are needed to avoid missed opportunities. The disclosed subject matter addresses this need by providing a unified framework that harmonizes storage and querying of clinical and genetic data using blockchain technology.
The platform uses data structures and indexing to organize data into levels including clinical (EHR), genetics, and access logs, and employs mapping streams and sparse indexing of data streams to harmonize multimodal storage and speed queries. The disclosed subject matter also supports combined genotype-phenotype queries, gives users decentralized control of their data, and records user access logs to improve transparency into how and when health information is used.
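The level and mapping-stream organization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: `StreamStore` is a hypothetical in-memory stand-in for append-only blockchain streams, and the stream names, the bucket count, and the integer concept IDs are invented for the example.

```python
from collections import defaultdict

N_BUCKETS = 4  # hypothetical number of condition streams


class StreamStore:
    """Hypothetical in-memory stand-in for append-only blockchain streams."""

    def __init__(self):
        self._streams = defaultdict(list)

    def publish(self, stream, key, value):
        # Items are only ever appended, never modified or removed.
        self._streams[stream].append((key, value))

    def items(self, stream, key):
        return [v for k, v in self._streams[stream] if k == key]


def publish_condition(store, person_id, concept_id):
    """Write one clinical record into a domain view and a person view."""
    key = str(concept_id)
    target = f"clinical/domain/condition/{concept_id % N_BUCKETS}"
    store.publish(target, key, person_id)
    store.publish("clinical/person", person_id, concept_id)
    # The mapping stream records which stream holds each concept.
    if not store.items("mapping", key):
        store.publish("mapping", key, target)


def persons_with_condition(store, concept_id, requester):
    """Route a query through the mapping stream, then search one stream."""
    key = str(concept_id)
    # Every query is written to the access-log level before it runs.
    store.publish("access_log", requester, f"condition={key}")
    targets = store.items("mapping", key)
    if not targets:
        return set()
    return set(store.items(targets[0], key))


store = StreamStore()
publish_condition(store, "p1", 101)
publish_condition(store, "p2", 101)
publish_condition(store, "p3", 202)
cohort = persons_with_condition(store, 101, "alice")
```

Because the mapping stream names the single stream that holds a concept, the query searches only that stream rather than every clinical stream, and the access log keeps a record of who asked for what.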
Claims Coverage
Three independent claims were identified (claims 1, 6, and 11). The following inventive features are extracted from the patent claims.
Providing clinical and genetic data to a blockchain platform
Providing, to a blockchain platform, each of clinical data and genetic data as part of a method for practicing precision medicine.
Separate data structures for clinical and genetic data
A blockchain platform having a first data structure comprising clinical data and a second data structure comprising genetic data or the clinical data.
Harmonizing clinical and genetic data structures
Harmonizing the first data structure and the second data structure to enable unified storage and analysis.
Cohort creation based on harmonized data
Creating at least one cohort based on the harmonized first and second data structures.
Identifying relationships between clinical and genetic data
Identifying at least one relationship between the clinical data and the genetic data or the clinical data in each of the at least one cohort.
Indexing into clinical, genetic, and access log levels
Indexing at least one of the first and second data structures into one of a clinical level, a genetic level, and an access log level.
Genetic level views including person, gene, and MAF
Wherein the genetic level is indexed into one or more of a person view, a gene view, and a Minor Allele Frequency (MAF) view.
Clinical level views including domain and person
Indexing the clinical level into a domain view and a person view.
Searchable mapping stream for query routing
Querying the blockchain platform by directing the query to a mapping stream, wherein the mapping stream returns a query data structure and the query data structure is searched.
Blockchain platform configured for a consortium network
A system comprising a network of institutions each providing clinical and genetic data and a blockchain platform configured to harmonize data structures, create cohorts, and identify relationships across the consortium.
The independent claims define a blockchain-based platform and methods that ingest clinical and genetic data into separate data structures, harmonize those structures, index data into clinical, genetic, and access-log levels with specific sub-views, enable cohort creation and identification of genotype-phenotype relationships, and use mapping streams to support queries across a consortium network.
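As an illustration of the person, gene, and MAF views of the genetic level, the sketch below indexes one biallelic variant from VCF-style genotype strings. The bin boundaries in `MAF_EDGES`, the stream names, and the identifiers `var1` and `GENE1` are assumptions made for the example; the claims do not specify them.

```python
from bisect import bisect_right

MAF_EDGES = (0.01, 0.05, 0.25)  # hypothetical MAF bin boundaries


def split_alleles(gt):
    """Split a VCF GT string such as '0/1' or '0|1', dropping missing calls."""
    return [a for a in gt.replace("|", "/").split("/") if a != "."]


def minor_allele_frequency(genotypes):
    """MAF of a biallelic variant, or None when no allele was called."""
    alleles = [a for gt in genotypes for a in split_alleles(gt)]
    if not alleles:
        return None
    alt_freq = sum(1 for a in alleles if a != "0") / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def index_variant(views, variant_id, gene, genotypes_by_person):
    """Add one variant to the gene, person, and MAF views."""
    views["gene"].setdefault(gene, []).append(variant_id)
    for person, gt in genotypes_by_person.items():
        # The person view lists variants where the person carries an alt allele.
        if any(a != "0" for a in split_alleles(gt)):
            views["person"].setdefault(person, []).append(variant_id)
    maf = minor_allele_frequency(genotypes_by_person.values())
    if maf is not None:
        stream = f"genetics/maf/bin{bisect_right(MAF_EDGES, maf)}"
        views["maf"].setdefault(stream, []).append(variant_id)


views = {"person": {}, "gene": {}, "maf": {}}
genotypes = {"p1": "0/1", "p2": "0/0", "p3": "0|1", "p4": "./."}
index_variant(views, "var1", "GENE1", genotypes)
```

Keeping three views of the same variant lets a query start from whichever key it has (a person, a gene, or a frequency range) without scanning the others.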
Stated Advantages
Supports combined genotype-phenotype queries enabling simultaneous clinical and genetic searches.
Gives both individual users and institutions decentralized control of their data.
Provides user access logs, improving transparency into how and when health information is used.
Blockchain storage is inherently decentralized and secure, providing tamper-resistance and cryptographic access controls.
Sparse indexing and mapping streams can improve query time and enable efficient storage of multimodal data.
Combined clinical and genetic data can increase statistical power for rare disease analysis, enabling discovery of connections between genetics and clinical observations.
Promotes data interoperability by using commonly used data formats such as the OMOP Common Data Model (CDM) and Variant Call Format (VCF).
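The summary does not say how the sparse index is built, so the following only illustrates the general technique the term usually refers to: instead of one index entry per record, the index keeps one entry per block of sorted records. The record layout (genomic position, variant label), the block size, and the assumption of unique keys are all made up for the example.

```python
from bisect import bisect_right


def build_sparse_index(sorted_records, block_size):
    """Split sorted (key, value) records into blocks; index each block's first key."""
    blocks = [
        sorted_records[i:i + block_size]
        for i in range(0, len(sorted_records), block_size)
    ]
    first_keys = [block[0][0] for block in blocks]
    return first_keys, blocks


def lookup(first_keys, blocks, key):
    """Binary-search the sparse index, then scan only the one candidate block."""
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every indexed key
    for k, v in blocks[i]:
        if k == key:
            return v
    return None


# Hypothetical records: (position, variant label), sorted by position.
records = [(100, "a"), (250, "b"), (300, "c"), (720, "d"), (900, "e")]
first_keys, blocks = build_sparse_index(records, block_size=2)
hit = lookup(first_keys, blocks, 720)
```

The index holds three keys for five records here; with larger blocks the index shrinks further, at the cost of a longer scan inside the chosen block.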
Documented Applications
Practicing precision medicine by harmonizing clinical and genetic data for cohort creation and analysis.
Creating cohorts based on combined clinical and genetic characteristics for downstream analysis.
Identifying genotype-phenotype relationships within cohorts, including examining relationships between SNPs and conditions.
A consortium network of biomedical institutions sharing genetic and clinical data for research purposes.
Storing and querying multimodal health data with audit logs to record usage and detect potential misuse.
Demonstrating the platform in a simulated network using publicly available datasets (e.g., MIMIC-IV and 1000 Genomes Project) to showcase querying capabilities.
Providing a user-friendly front-end graphical user interface to allow researchers to access and query the blockchain network.
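The cohort-creation and SNP-to-condition applications listed above can be sketched as a combined query followed by a simple association measure. The summary does not name a statistic, so the odds ratio with a 0.5 continuity correction used here is an assumption, and the person IDs, the condition code `C1`, and the variant label `rsA` are hypothetical.

```python
def build_cohort(clinical, genetic, condition, variant):
    """Persons who have both the condition and the variant."""
    return {
        p for p in clinical.keys() & genetic.keys()
        if condition in clinical[p] and variant in genetic[p]
    }


def contingency(clinical, genetic, condition, variant):
    """2x2 counts over persons present in both the clinical and genetic data.

    a: carrier with condition      b: carrier without condition
    c: non-carrier with condition  d: non-carrier without condition
    """
    a = b = c = d = 0
    for p in clinical.keys() & genetic.keys():
        has_condition = condition in clinical[p]
        is_carrier = variant in genetic[p]
        if is_carrier and has_condition:
            a += 1
        elif is_carrier:
            b += 1
        elif has_condition:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to each cell so empty cells do not divide by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical harmonized data: person -> set of condition codes / variants.
clinical = {"p1": {"C1"}, "p2": {"C1"}, "p3": set(), "p4": set(), "p5": {"C1"}}
genetic = {"p1": {"rsA"}, "p2": {"rsA"}, "p3": set(), "p4": {"rsA"}, "p5": set()}

cohort = build_cohort(clinical, genetic, "C1", "rsA")
table = contingency(clinical, genetic, "C1", "rsA")
ratio = odds_ratio(*table)
```

An odds ratio above 1 suggests the condition is more common among carriers in this toy data; a real analysis would need far larger samples and a significance test.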