Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
The disclosure provides methods for estimating tumor purity from tumor samples without the use of matched-normal controls. A set of genomic regions is identified based on nucleic acid sequence data that is aligned to a reference genome. Each genomic region in the set includes one or more nucleotide-sequence variants relative to the corresponding genomic region of the reference genome. A B-allele frequency distribution for the biological sample is determined from the B-allele frequency of each genomic region in the set. The B-allele frequency distribution is processed using a trained machine-learning model to estimate a metric identifying tumor purity in the biological sample.
Core Innovation
The invention determines the tumor purity of a biological sample from a subject in order to inform a cancer feature and evaluate treatment efficacy. The method obtains, from one or more sequencers, nucleic acid sequence data representing a plurality of nucleic acid molecules of the biological sample, aligns the sequence data to a reference genome, and identifies a set of genomic regions containing one or more nucleotide-sequence variants relative to the reference genome. For each genomic region, the method determines a B-allele frequency, and from these per-region values it determines a B-allele frequency distribution for the biological sample.
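The per-region frequency and sample-level distribution steps can be illustrated with a minimal sketch. It assumes that the B-allele frequency of a region is the fraction of reads supporting the variant allele and that the distribution is a normalized histogram; the summary does not state either definition, and the function names, the `min_depth` filter, and the bin count are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple


def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the variant (B) allele.

    Assumption: BAF = alt / (ref + alt); the patent summary does not
    spell out the formula.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def baf_distribution(
    counts: Sequence[Tuple[int, int]],
    n_bins: int = 10,
    min_depth: int = 1,
) -> List[float]:
    """Normalized histogram of per-region BAF values over [0, 1].

    `counts` holds hypothetical (ref_reads, alt_reads) pairs, one per
    variant-containing region. Regions below `min_depth` are skipped.
    """
    hist = [0] * n_bins
    for ref_reads, alt_reads in counts:
        if ref_reads + alt_reads < min_depth:
            continue
        baf = b_allele_frequency(ref_reads, alt_reads)
        # A BAF of exactly 1.0 is placed in the last bin.
        index = min(int(baf * n_bins), n_bins - 1)
        hist[index] += 1
    total = sum(hist)
    if total == 0:
        return [0.0] * n_bins
    return [count / total for count in hist]
```

Calling `baf_distribution([(50, 50), (80, 20), (20, 80)], n_bins=4)` would place one region in each of three bins. A real pipeline would take the read counts from a variant caller's output rather than from hand-written tuples.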
The invention processes the B-allele frequency distribution using a trained machine-learning model to estimate a probability of a true tumor purity as a function of a predicted tumor purity in the biological sample. The model is trained on a dataset generated from nucleic acid sequence data derived from one or more tumor cells diluted into normal cells. Based on the estimated probability, the method generates a report to inform the cancer feature and evaluate the treatment efficacy for the subject.
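One way to read "a probability of a true tumor purity as a function of a predicted tumor purity" is as a calibration table built from dilution samples whose true purity is known. The sketch below is an assumption, not the patented model: it bins hypothetical (true purity, predicted purity) pairs and reports the empirical share of each true-purity bin within a predicted-purity bin. All names and the bin count are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def purity_bin(purity: float, n_bins: int) -> int:
    """Map a purity in [0, 1] to an equal-width bin index."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    return min(int(purity * n_bins), n_bins - 1)


def fit_calibration(
    pairs: Iterable[Tuple[float, float]],
    n_bins: int = 5,
) -> Dict[int, Dict[int, float]]:
    """Estimate P(true purity bin | predicted purity bin) empirically.

    `pairs` holds hypothetical (true_purity, predicted_purity) values,
    e.g. from tumor cells diluted into normal cells at known ratios.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for true_purity, predicted_purity in pairs:
        pred_bin = purity_bin(predicted_purity, n_bins)
        true_bin = purity_bin(true_purity, n_bins)
        counts[pred_bin][true_bin] += 1
    table: Dict[int, Dict[int, float]] = {}
    for pred_bin, true_counts in counts.items():
        total = sum(true_counts.values())
        table[pred_bin] = {b: n / total for b, n in true_counts.items()}
    return table


def true_purity_probabilities(
    table: Dict[int, Dict[int, float]],
    predicted_purity: float,
    n_bins: int = 5,
) -> Dict[int, float]:
    """Look up the true-purity bin probabilities for one prediction.

    Returns an empty dict when no training pair fell in that bin.
    """
    return table.get(purity_bin(predicted_purity, n_bins), {})
```

With five bins, each bin spans 0.2 of the purity range, so a prediction of 0.32 is looked up in bin 1. A report built on this table could then state how much probability mass sits in each true-purity range for the sample's predicted purity.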
Claims Coverage
The document includes two independent claims: a method claim and a system claim. Both share three inventive features: sequence alignment and B-allele frequency distribution formation, machine-learning-based estimation of tumor purity probability, and report generation for cancer feature and treatment efficacy evaluation.
Determining tumor purity using aligned sequencing variants and a B-allele frequency distribution
Obtaining nucleic acid sequence data; aligning the nucleic acid sequence data to a reference genome; identifying genomic regions with nucleotide-sequence variants; determining a B-allele frequency for each genomic region; and determining a B-allele frequency distribution for the biological sample.
Estimating true tumor purity probability using a machine-learning model trained on data from tumor cells diluted into normal cells
Processing the B-allele frequency distribution using a trained machine-learning model to estimate a probability of a true tumor purity as a function of a predicted tumor purity, wherein the model is trained on a training dataset generated from nucleic acid sequence data derived from one or more tumor cells diluted into normal cells.
Generating a report for cancer feature and treatment efficacy evaluation
Generating a report to inform the cancer feature and evaluate the treatment efficacy for the subject based on the estimated probability of a true tumor purity as a function of a predicted tumor purity in the biological sample.
A system performing sequence alignment, B-allele frequency distribution formation, ML-based tumor purity probability estimation, and report generation
A system comprising data processors and instructions for: obtaining nucleic acid sequence data; aligning it to a reference genome; identifying genomic regions with nucleotide-sequence variants; determining a B-allele frequency for each genomic region; determining a B-allele frequency distribution; processing the distribution using a machine-learning model, trained on data from tumor cells diluted into normal cells, to estimate a probability of true tumor purity as a function of predicted tumor purity; and generating a report to inform a cancer feature and evaluate treatment efficacy.
Across the independent claims, the core coverage is the combination of four steps: aligning nucleic acid sequence data to a reference genome; forming a B-allele frequency distribution from variant-containing genomic regions; using a machine-learning model, trained on data from tumor cells diluted into normal cells, to estimate a probability of true tumor purity as a function of predicted tumor purity; and generating a report for cancer feature and treatment efficacy evaluation.
Stated Advantages
Not explicitly described in the patent.
Documented Applications
Not explicitly described in the patent.