Systems and methods for classifying patients with respect to multiple cancer classes

Inventors

MAHER, M. Cyrus; VALOUEV, Anton; Filippova, Darya; Nicula, Virgil; Jagadeesh, Karthik; Venn, Oliver Claude; GROSS, Samuel S.; Beausang, John F.; Calef, Robert Abe Paine

Assignees

Grail Inc

Publication Number

US-12191000-B2

Patent

Publication Date

2025-01-07

Expiration Date


Abstract

Technical solutions for classifying patients with respect to multiple cancer classes are provided. The classification can be done using cell-free whole genome sequencing information from subjects. A reference set of subjects is used to train classifiers to recognize genomic markers that distinguish such cancer classes. The classifier training includes dividing the reference genome into a set of non-overlapping bins, applying a dimensionality reduction method to obtain a feature set, and using the feature set to train classifiers. For subjects with unknown cancer class, the trained classifiers provide probabilities or likelihoods that the subject has a respective cancer class for each cancer in a set of cancer classes. The present disclosure thus describes methods to improve the screening and detection of cancer class from among several cancer classes. This serves to facilitate early and appropriate treatment for subjects afflicted with cancer.

Core Innovation

The invention relates to classifying a test subject of a given species to a cancer class in a plurality of cancer classes using nucleic acid fragments in a biological sample. For training, the method obtains for each respective reference subject in a first plurality of reference subjects a cancer class and a sequencing construct that includes a first bin count for each respective bin in a plurality of bins collectively representing all or a portion of a reference genome of the species. Each first bin count represents a number of nucleic acid fragments measured from nucleic acids that map onto a different and non-overlapping portion of the reference genome.
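
As a rough illustration of the sequencing construct, the sketch below counts mapped fragments per fixed-width, non-overlapping bin. The function name `bin_counts`, the `(chromosome, position)` fragment representation, and the fixed bin width are assumptions made for this example and are not taken from the patent.

```python
from collections import Counter


def bin_counts(fragments, chrom_lengths, bin_size):
    """Count fragments per fixed-width, non-overlapping genome bin.

    fragments: iterable of (chrom, position) pairs, 0-based mapped positions
               (a hypothetical, simplified fragment representation).
    chrom_lengths: dict mapping chromosome name to its length in bases.
    Returns (counts, bins), where bins lists (chrom, bin_index) labels in a
    stable order and counts[i] is the number of fragments in bins[i].
    """
    bins = []
    for chrom in sorted(chrom_lengths):
        n_bins = -(-chrom_lengths[chrom] // bin_size)  # ceiling division
        bins.extend((chrom, i) for i in range(n_bins))

    # Each position falls in exactly one bin, so bins never overlap.
    tally = Counter(
        (chrom, pos // bin_size)
        for chrom, pos in fragments
        if chrom in chrom_lengths and 0 <= pos < chrom_lengths[chrom]
    )
    return [tally.get(b, 0) for b in bins], bins


# Toy example: two short "chromosomes" and 100-base bins.
counts, bins = bin_counts(
    [("chr1", 5), ("chr1", 99), ("chr1", 100), ("chr1", 249), ("chr2", 50)],
    {"chr1": 250, "chr2": 100},
    bin_size=100,
)
print(counts)  # [2, 1, 1, 1]
```

Fragments that map outside the listed chromosomes are ignored here, which is a simplification chosen for the example.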

To improve computational efficiency, the method collectively subjects the first bin counts of all bins, across all reference subjects, to a dimensionality reduction method, yielding a feature set with fewer features than there are bins. The method then resamples the feature set a plurality of times. In each training iteration, a subset of the feature values is omitted, and a trained component classifier is formed from the remaining feature values, with the cancer classes of the reference subjects as ground truth.
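
The summary does not name the dimensionality reduction method. As one possible reading, the sketch below uses principal component analysis (PCA), computed by power iteration in plain Python, to turn a subjects-by-bins count matrix into a smaller feature set. The choice of PCA, the function name `reduce_bins`, and the toy counts are all assumptions for illustration.

```python
import math
import random


def reduce_bins(rows, n_components, n_iter=200, seed=0):
    """Project bin counts (subjects x bins) onto top principal components.

    Returns (features, components, means). features has one row per subject
    and n_components columns, so it has fewer features than bins whenever
    n_components < number of bins.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # center each bin

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            # Multiply by X^T X without forming the d x d matrix.
            scores = [sum(a * b for a, b in zip(row, v)) for row in X]
            v = [sum(scores[i] * X[i][j] for i in range(n)) for j in range(d)]
            # Remove directions already found (Gram-Schmidt deflation).
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            v = [a / norm for a in v]
        components.append(v)

    features = [
        [sum(a * b for a, b in zip(row, c)) for c in components] for row in X
    ]
    return features, components, means


# Toy example: 4 reference subjects, 6 bins, reduced to 2 features.
bin_count_rows = [
    [10, 12, 9, 30, 31, 29],
    [11, 13, 10, 20, 21, 19],
    [30, 29, 31, 10, 9, 11],
    [29, 30, 32, 21, 20, 22],
]
features, components, means = reduce_bins(bin_count_rows, n_components=2)
print(len(bin_count_rows[0]), "bins ->", len(features[0]), "features")
```

A test subject would be reduced with the same `means` and `components` learned from the reference subjects, so that its features are comparable to the training features.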

The method then constructs a trained first classifier by collectively leveraging the output of the trained component classifier formed in each training iteration, which gives the trained first classifier improved cancer class recognition ability over the untrained first classifier. At classification time, the trained first classifier assigns the test subject to a cancer class in the plurality of cancer classes using nucleic acid fragments in a biological sample obtained from the test subject.
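
The resampling and combination steps can be sketched as a small ensemble. In this example, "omitting a subset of values for features" is interpreted as dropping a random subset of feature columns in each training iteration, each component classifier is a nearest-centroid model, and the combined classifier reports vote fractions. All three choices, along with the function names and toy data, are assumptions for illustration rather than details from the patent.

```python
import random
from collections import Counter, defaultdict


def train_component(X, y, kept):
    """Fit a nearest-centroid classifier on the kept feature indices only."""
    sums = defaultdict(lambda: [0.0] * len(kept))
    counts = Counter(y)
    for row, label in zip(X, y):
        for k, j in enumerate(kept):
            sums[label][k] += row[j]
    centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in counts}
    return kept, centroids


def predict_component(model, row):
    """Return the class whose centroid is closest on the kept features."""
    kept, centroids = model

    def dist(centroid):
        return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(kept))

    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))


def train_ensemble(X, y, n_iterations=25, omit_fraction=0.3, seed=0):
    """Train one component classifier per iteration, omitting random features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    n_keep = max(1, n_features - int(omit_fraction * n_features))
    return [
        train_component(X, y, sorted(rng.sample(range(n_features), n_keep)))
        for _ in range(n_iterations)
    ]


def predict_proba(ensemble, row):
    """Combine component outputs into per-class vote fractions."""
    votes = Counter(predict_component(m, row) for m in ensemble)
    return {lab: n / len(ensemble) for lab, n in votes.items()}


# Toy reduced features for 6 reference subjects with known cancer classes.
X = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
     [10, 10, 10, 10], [11, 10, 9, 10], [10, 11, 10, 9]]
y = ["lung", "lung", "lung", "breast", "breast", "breast"]

ensemble = train_ensemble(X, y)
print(predict_proba(ensemble, [10, 10, 10, 10]))  # {'breast': 1.0}
```

Because each component sees a different subset of features, no single feature dominates the combined output, which is the usual motivation for this kind of resampling.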

The disclosure also covers combining whole genome sequencing (WGS) signals with methylation data, and it describes system-level modules and assay definitions, including methylation state vectors and targeted versus WGS inputs. Its evaluation reports how accuracy varies with cancer stage and with combined sequencing modalities. The disclosure further describes producing probabilities or likelihoods for each cancer class and applying decision rules to them.

Claims Coverage

The patent explicitly contains three independent claims (training an untrained classifier, classifying with a trained classifier, and corresponding computer-system implementations). Across these independent claims, the inventive features consistently include non-overlapping genome bin counts tied to reference cancer classes, dimensionality reduction to a smaller feature set, and resampling/feature omission to form trained component classifiers whose outputs are collectively leveraged to construct an improved trained classifier.

Genome-binned sequencing constructs tied to non-overlapping reference genome portions and reference cancer classes

For each reference subject in a first plurality of reference subjects, obtaining a cancer class and a sequencing construct that includes a first bin count for each bin in a plurality of bins representing all or a portion of a reference genome, where each first bin count represents a number of nucleic acid fragments measured from nucleic acids mapping onto a different and non-overlapping portion of the reference genome.

Dimensionality reduction producing a feature set with fewer features than bins

Collectively subjecting the first bin count of each bin in the plurality of bins for each reference subject to a dimensionality reduction method to obtain a feature set with fewer features than the number of bins.

Resampling with feature omission to form trained component classifiers using ground-truth cancer classes

Resampling the feature set a plurality of times, omitting a subset of values for features in each training iteration, and forming trained component classifiers by inputting remaining feature values together with the cancer class of respective reference subjects as ground truth.

Collective leveraging of component outputs to construct an improved trained classifier

Constructing a trained first classifier by collectively leveraging output generated by the trained component classifier formed in each training iteration, producing improved cancer class recognition ability over the untrained first classifier.

Classification using a trained first classifier on nucleic acid fragments in a biological sample

Using the trained first classifier to classify the test subject to a cancer class in the plurality of cancer classes using nucleic acid fragments in a biological sample obtained from the test subject.

Across the independent claims, the inventive coverage is directed to training and using a classifier with genome-binned sequencing constructs from biological samples, dimensionality reduction to a smaller feature set, resampling with feature omission to form trained component classifiers, and collective leveraging of component outputs to construct an improved classifier for cancer class recognition.

Stated Advantages

Improved computational efficiency of the computer system.

Improved cancer class recognition ability of the trained first classifier over the untrained first classifier.

Documented Applications

Classifying a test subject to a cancer class in a plurality of cancer classes using nucleic acid fragments from a biological sample, including cell-free nucleic acids, with example emphasis on cell-free whole genome sequencing, and in disclosed combinations additionally using methylation data.

Producing multi-class classification outputs including probabilities or likelihoods for each cancer class and using decision rules to form cancer class calls.
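
A decision rule of this kind might look like the sketch below, which turns per-class probabilities into a single call. The function name `call_cancer_class`, the two thresholds, and the "indeterminate" fallback are hypothetical and are not described in the summary.

```python
def call_cancer_class(probabilities, min_probability=0.5, min_margin=0.1):
    """Return the top class if it is confident and well separated.

    probabilities: dict mapping cancer class label to a probability.
    min_probability, min_margin: hypothetical thresholds for this example.
    """
    # Rank by descending probability; break ties by label for determinism.
    ranked = sorted(probabilities.items(), key=lambda kv: (-kv[1], kv[0]))
    top_label, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_p >= min_probability and top_p - runner_up_p >= min_margin:
        return top_label
    return "indeterminate"


print(call_cancer_class({"lung": 0.7, "breast": 0.2, "non-cancer": 0.1}))
# lung
print(call_cancer_class({"lung": 0.45, "breast": 0.40, "non-cancer": 0.15}))
# indeterminate
```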
