Mining all atom simulations for diagnosing and treating disease

Inventors

Jafri, Mohsin Saleet; McCoy, Matthew

Assignees

George Mason University

Publication Number

US-11574702-B2

Publication Date

2023-02-07

Expiration Date

2038-12-19

Abstract

The present disclosure describes methods for determining the functional consequences of mutations. The methods include the use of machine learning to identify and quantify features of all atom molecular dynamics simulations to obtain the disruptive severity of genetic variants on molecular function.

Core Innovation

The invention provides novel computational methods to characterize the functional consequences of genomic mutations. The disclosed methods integrate molecular dynamics (MD) simulations with machine learning (ML) to identify and quantify features from all atom simulations, enabling assignment of the disruptive severity of genetic variants on molecular function. The process involves generating ensembles of structural conformations for macromolecules (such as proteins or nucleic acids) and their variants, extracting quantitative structural and energetic features, and applying principal component analysis to distinguish and rank mutation-induced deviations from wildtype references.
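This summary does not say which structural features are extracted from each simulation frame. As a rough illustration only, the sketch below computes two common global descriptors from raw coordinate arrays: RMSD to a reference after optimal (Kabsch) superposition, and radius of gyration. The choice of these two features, the unweighted (not mass-weighted) atoms, and the function names `kabsch_rmsd`, `radius_of_gyration`, and `frame_features` are all hypothetical assumptions, not the patented feature set.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate sets after optimal rigid superposition."""
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # flip one axis if SVD yields a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_aligned = p @ rotation.T  # rotate p onto q
    return float(np.sqrt(np.mean(np.sum((p_aligned - q) ** 2, axis=1))))


def radius_of_gyration(x: np.ndarray) -> float:
    """Unweighted radius of gyration of an (n_atoms, 3) coordinate set."""
    centered = x - x.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))


def frame_features(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Per-frame [RMSD to reference, radius of gyration] for an (n_frames, n_atoms, 3) trajectory."""
    return np.array([[kabsch_rmsd(frame, ref), radius_of_gyration(frame)] for frame in traj])
```

Both descriptors are invariant to rigid-body motion, so a rotated and translated copy of the reference yields an RMSD of essentially zero and the same radius of gyration. Dedicated MD analysis libraries provide tested equivalents that would normally be preferred for real trajectories.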

The background identifies the problem that prior approaches using MD and ML to model the effects of variants offer only limited insight into the role of mutations and often cannot differentiate between discrete phenotypic outcomes or predict the severity of a mutation in multiple phenotypic contexts. Most current predictive tools focus on sequence conservation or static protein structure comparisons, failing to reflect the full dynamic and functional impact of mutations. There is a need for more accurate and nuanced methods that can use the dynamic structural and energetic features derived from advanced simulations to predict phenotype, severity, and treatment options associated with genetic variants.

The described method proceeds through a series of steps: performing all atom MD simulations for both wildtype and mutant macromolecules; clustering the resulting conformational structures to identify variant-specific structural populations; extracting multidimensional structural and energetic features for each cluster; leveraging unsupervised ML clustering to identify divergent variant populations; applying principal component analysis to map features in a reduced dimensional space; and finally, using the separation of variant centroids from wildtype in this space to associate and quantify phenotypic disruption and functional severity. This approach allows for ranking and classification of the functional effects of different mutations relative to the wildtype, contributing to more accurate genotype-phenotype predictions.
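The conformational clustering step can be illustrated with a neighbor-count clustering in the style of the GROMOS method, applied to a pairwise RMSD matrix. This is a swapped-in choice: the summary only says clustering is based on root mean square deviation and does not name an algorithm. The sketch assumes the frames have already been superposed onto a common reference, so RMSD is computed directly without re-fitting; the function names and the cutoff value are hypothetical, user-chosen parameters.

```python
import numpy as np


def pairwise_rmsd(traj: np.ndarray) -> np.ndarray:
    """Pairwise RMSD matrix for pre-aligned frames of shape (n_frames, n_atoms, 3)."""
    diff = traj[:, None, :, :] - traj[None, :, :, :]
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=-1), axis=-1))


def gromos_cluster(rmsd: np.ndarray, cutoff: float) -> np.ndarray:
    """Label each frame using a GROMOS-style neighbor-count scheme."""
    n = rmsd.shape[0]
    labels = np.full(n, -1, dtype=int)
    neighbors = rmsd <= cutoff  # boolean adjacency; each frame is its own neighbor
    label = 0
    while (labels == -1).any():
        unassigned = labels == -1
        # Count neighbors among frames that are still unassigned
        counts = (neighbors & unassigned[None, :]).sum(axis=1)
        counts[~unassigned] = -1  # already-assigned frames cannot become centers
        center = int(np.argmax(counts))
        members = neighbors[center] & unassigned  # center plus its free neighbors
        labels[members] = label
        label += 1
    return labels
```

Run on a concatenated wildtype-plus-variant trajectory, this gives one label per frame; clusters populated mostly by variant frames could then be read as candidate variant-specific conformational populations.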

Claims Coverage

The claims present one primary inventive feature with detailed steps, followed by dependent claims clarifying its application to proteins, diseases, analytical techniques, and phenotypic ranking.

Method for determining functional effects of variant macromolecules by integrating molecular dynamics simulations and machine learning

The claimed invention provides a method comprising:
1. Identifying at least one structure of a wildtype macromolecule and executing multiple all atom molecular dynamics simulations to create a conformational landscape of the wildtype and its variants.
2. Generating datasets of structural and energetic features for the wildtype and variant structures, including global structural features, subdomain dynamics, energetic interactions, and overall statistical characteristics.
3. Clustering the trajectories with clustering algorithms (e.g., ones based on root mean square deviation) to yield clusters that define variant-specific conformational populations.
4. Further quantifying and clustering the structural and energetic features to define new sets of variant clusters.
5. Performing principal component analysis to create a principal component feature space and plotting the wildtype and variant centroids in it.
6. Calculating Euclidean distances between the wildtype and variant centroids, removing variant clusters that fall within a clustering threshold of the wildtype, and generating the remaining variant centroids.
7. Comparing these centroids to the wildtype centroid and determining a severity ranking of the variants based on the comparison.
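Steps 5 through 7 can be sketched with plain NumPy: project per-sample feature vectors onto the leading principal components, average each variant's points into a centroid, measure the Euclidean distance to the wildtype centroid, drop variants within a threshold, and sort the rest by distance. Z-scoring the features before PCA, treating a larger distance as a more severe variant, and the names `pca_scores` and `rank_variants` are assumptions made for this illustration; the claims as summarized here do not state how features are scaled.

```python
import numpy as np


def pca_scores(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of an (n_samples, n_features) matrix onto the leading principal components."""
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant features
    z = (features - features.mean(axis=0)) / sigma  # assumption: z-score each feature
    # Rows of vt are the principal axes of the centered, scaled data
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


def rank_variants(features, labels, wildtype_label, threshold, n_components=2):
    """Return (variant, distance) pairs farther than `threshold` from wildtype, most distant first."""
    labels = np.asarray(labels)
    scores = pca_scores(np.asarray(features, dtype=float), n_components)
    wt_centroid = scores[labels == wildtype_label].mean(axis=0)
    kept = []
    for lab in np.unique(labels):
        if lab == wildtype_label:
            continue
        centroid = scores[labels == lab].mean(axis=0)
        dist = float(np.linalg.norm(centroid - wt_centroid))  # Euclidean distance in PC space
        if dist > threshold:  # discard variants that sit within the threshold of wildtype
            kept.append((str(lab), dist))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, given a wildtype and three variants whose feature points sit progressively farther from the wildtype cloud, the nearest variant is filtered out and the other two come back ordered from most to least displaced.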

The inventive coverage focuses on a multi-step, integrated computational method combining MD simulations, structural/energetic feature extraction, advanced clustering, principal component analysis, and severity ranking for genetic or protein variants, with applications to disease phenotyping and drug design.

Stated Advantages

Enables accurate prediction and ranking of the functional and phenotypic effects, including severity, of genetic variants relative to wildtype molecules.

Differentiates between divergent phenotypic changes and subtle variants, addressing limitations of existing sequence or structure-based prediction methods.

Provides information that can inform diagnosis, treatment, and management of disease by associating specific variants with disease phenotypes.

Allows for in silico prediction of variant pathogenicity and disease association using molecular dynamics and machine learning, reducing reliance on exhaustive experimental validation.

Documented Applications

Predicting, diagnosing, and treating diseases in humans, veterinary animals, livestock, and research animals by assessing the functional effects of genetic variants.

Characterizing and ranking the phenotype, trait, or disease caused by variants in macromolecules, including proteins or nucleic acids.

Designing drugs and evaluating drug or antibiotic resistance by defining the interaction of candidate agents with variant macromolecules.

Identifying and selecting desirable traits in plants and livestock for agricultural applications.

Determining regulatory and interaction effects, such as binding of proteins to variant nucleic acids or vice versa, with implications for diseases (e.g., transcription factor binding, gene expression regulation).
