Machine learning implementation for multi-analyte assay development and testing
Inventors
Drake, Adam • Delubac, Daniel • Niehaus, Katherine • Ariazi, Eric • Haque, Imran • Liu, Tzu-Yu • Wan, Nathan • Kannan, Ajay • White, Brandon
Assignees
Abstract
Systems and methods that analyze blood-based cancer diagnostic tests using multiple classes of molecules are described. The system uses machine learning (ML) to analyze multiple analytes, for example cell-free DNA, cell-free microRNA, and circulating proteins, from a biological sample. The system can use multiple assays, e.g., whole-genome sequencing, whole-genome bisulfite sequencing or EM-seq, small-RNA sequencing, and quantitative immunoassay. This can increase the sensitivity and specificity of diagnostics by exploiting independent information between signals. During operation, the system receives a biological sample, and separates a plurality of molecule classes from the sample. For a plurality of assays, the system identifies feature sets to input to a machine learning model. The system performs an assay on each molecule class and forms a feature vector from the measured values. The system inputs the feature vector into the machine learning model and obtains an output classification of whether the sample has a specified property.
Core Innovation
The invention provides a classifier-based method that distinguishes a population of individuals having a specified property by assaying a plurality of classes of molecules in a biological sample using a plurality of assays. The plurality of classes include cell-free nucleic acids and polyamino acids, and a first assay is applied to the cell-free nucleic acids and a second assay is applied to the polyamino acids to obtain a plurality of sets of measured values.
A set of features corresponding to properties of each class of molecules is identified to be input to a machine learning model. Feature values are prepared as a feature vector constructed from the plurality of sets of measured values, with each feature value including one or more measured values from the sets and at least one feature value obtained using each set of measured values representative of the classes of molecules.
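The feature-vector construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the feature names (`fragment_length_mean`, `methylation_fraction`, `protein_a_conc`, `protein_b_conc`), the `FEATURE_SETS` mapping, and the sample values are all hypothetical, chosen only to show one feature set per molecule class being concatenated in a fixed order so that every class contributes at least one value.

```python
from typing import Dict, List

# Hypothetical feature sets, one per molecule class. The source does not
# name specific features; these are placeholders for illustration.
FEATURE_SETS: Dict[str, List[str]] = {
    "cell_free_nucleic_acids": ["fragment_length_mean", "methylation_fraction"],
    "polyamino_acids": ["protein_a_conc", "protein_b_conc"],
}


def build_feature_vector(measured: Dict[str, Dict[str, float]]) -> List[float]:
    """Concatenate feature values from every molecule class in a fixed order.

    `measured` maps a molecule class to the measured values produced by the
    assay applied to that class. A missing class raises KeyError, so the
    resulting vector always contains values from every class.
    """
    vector: List[float] = []
    for molecule_class, feature_names in FEATURE_SETS.items():
        if molecule_class not in measured:
            raise KeyError(f"no measured values for class: {molecule_class}")
        values = measured[molecule_class]
        for name in feature_names:
            vector.append(float(values[name]))
    return vector


# Made-up measured values for a single sample (assumption, not real data).
sample = {
    "cell_free_nucleic_acids": {"fragment_length_mean": 167.0, "methylation_fraction": 0.62},
    "polyamino_acids": {"protein_a_conc": 4.1, "protein_b_conc": 0.8},
}
feature_vector = build_feature_vector(sample)
print(feature_vector)
```

Keeping the feature order fixed in one place matters because the training vectors and the vector for a new sample must line up position by position.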
The machine learning model, which includes the classifier, is loaded into memory. The model is trained using training vectors obtained from training biological samples labeled as either having or not having the specified property. The feature vector is input into the model to obtain an output classification of whether the biological sample has the specified property, thereby distinguishing the population of individuals. The same approach is used to determine responsiveness to cancer treatment, with training samples labeled as responding or not responding.
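The training and classification steps can be illustrated with a small sketch. The summary does not say which classifier the model uses, so logistic regression trained by batch gradient descent is used here purely as a stand-in. The training vectors, labels, learning rate, and epoch count are hypothetical; each vector is assumed to hold one scaled feature from the nucleic-acid assay and one from the protein assay.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the sample has the specified property."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_classifier(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 0.5, epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w: Sequence[float], b: float, x: Sequence[float],
             threshold: float = 0.5) -> bool:
    """Output classification: True if the sample is called as having the property."""
    return predict_proba(w, b, x) >= threshold


# Hypothetical training vectors (features scaled to [0, 1]) with labels:
# 1 = has the specified property, 0 = does not.
train_X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.85], [0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_classifier(train_X, train_y)
print(classify(weights, bias, [0.85, 0.9]))
print(classify(weights, bias, [0.1, 0.15]))
```

For the treatment-responsiveness variant, the same code would apply with the labels read as responding (1) and not responding (0).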
Claims Coverage
The independent claims are directed to a pipeline that builds a feature vector from multiple assays applied to multiple molecule classes and feeds it to a machine learning classifier. Two inventive features are emphasized.
Multi-analyte multi-assay classification using cell-free nucleic acids and polyamino acids
Assaying a plurality of classes of molecules in a biological sample using a plurality of assays, where the plurality of classes include cell-free nucleic acids and polyamino acids, obtaining sets of measured values, identifying features for each molecular class, preparing a feature vector from the plurality of sets of measured values, loading a trained machine learning model, and inputting the feature vector to obtain an output classification of whether the biological sample has the specified property.
Treatment responsiveness determination using a multi-class feature vector classifier
Determining responsiveness of an individual to a cancer treatment by assaying a plurality of classes of molecules in a biological sample with a plurality of assays, where the plurality of classes include cell-free nucleic acids and polyamino acids, preparing a feature vector from the plurality of sets of measured values, loading into memory a machine learning model trained using responding and non-responding training vectors, and inputting the feature vector to obtain an output classification of whether the biological sample is associated with treatment response.
Across the independent claims, the core coverage is a classifier pipeline that combines measured values from multiple assays applied to cell-free nucleic acids and polyamino acids, converts extracted properties into a feature vector, and uses a trained machine learning model to output whether a specified property is present or whether the sample is associated with treatment response.
Stated Advantages
Distinguishes a population of individuals having the specified property based on classifier output using multi-assay, multi-molecule-class features.
Determines responsiveness of an individual to a cancer treatment by classifying whether a biological sample is associated with treatment response.
Documented Applications
Cancer prediction framed for multi-analyte liquid-biopsy classifiers, including use of classifier performance metrics such as PPV, NPV, ROC curve, and AUC.
Longitudinal disease progression monitoring.
Tissue-of-origin determination.
Tumor-burden estimation from genetic sequence features.
Stratification of individuals into responders and non-responders to treatment.
Cancer diagnosis or precision-medicine classification for whether a biological sample has a specified property, including cancer contexts described through dependent limitations.
Determining responsiveness of an individual to a cancer treatment, by classifying whether the biological sample is associated with treatment response.
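The classifier performance metrics named in the applications above (PPV, NPV, and AUC of the ROC curve) can be computed as in the following sketch. The labels, scores, and the 0.5 decision threshold are made-up values for illustration only; they are not results from the patent. AUC is computed here through its rank interpretation, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of positive calls that are correct."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def npv(tn: int, fn: int) -> float:
    """Negative predictive value: fraction of negative calls that are correct."""
    return tn / (tn + fn) if (tn + fn) else float("nan")


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and classifier scores (not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("PPV:", ppv(tp, fp))
print("NPV:", npv(tn, fn))
print("AUC:", roc_auc(y_true, scores))
```

PPV and NPV depend on the chosen threshold and on how common the property is in the tested population, while AUC summarizes ranking quality across all thresholds.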