Estimating tumor purity from single samples

Inventors

Phillips, Nicholas • Harris, Jason

Assignees

Personalis Inc

Interested in licensing this patent?

MTEC can help explore whether this patent might be available for licensing for your application.

Publication Number

US-12217830-B2

Publication Date

2025-02-04

Expiration Date

Abstract

The disclosure provides methods for estimating tumor purity from tumor samples without the use of matched-normal controls. A set of genomic regions is identified based on nucleic acid sequence data that are aligned to a reference genome. Each genomic region of the set includes one or more nucleotide-sequence variants relative to a corresponding genomic region of the reference genome. A B-allele frequency distribution for the biological sample is determined based on a B-allele frequency determined for each genomic region of the set. The B-allele frequency distribution is processed using a trained machine-learning model to estimate a metric identifying tumor purity in the biological sample.

Core Innovation

The invention determines the tumor purity of a biological sample from a subject in order to inform a cancer feature and evaluate treatment efficacy. The method obtains, from one or more sequencers, nucleic acid sequence data representing a plurality of nucleic acid molecules of the biological sample. It aligns the sequence data to a reference genome and identifies a set of genomic regions, each containing one or more nucleotide-sequence variants relative to the reference genome. For each genomic region, the method determines a B-allele frequency; from these per-region values it determines a B-allele frequency distribution for the biological sample.
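The per-region B-allele frequency and its sample-level distribution can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each variant site is summarized by a pair of reference and alternate read counts, and it represents the distribution as a normalized histogram. The function names, the input layout, and the bin count are all hypothetical.

```python
from typing import List, Sequence, Tuple


def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at one site that support the non-reference (B) allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no coverage")
    return alt_reads / depth


def baf_distribution(counts: Sequence[Tuple[int, int]], n_bins: int = 10) -> List[float]:
    """Normalized histogram of per-site BAF values over [0, 1].

    `counts` holds (ref_reads, alt_reads) pairs; this layout is an assumption.
    """
    hist = [0] * n_bins
    for ref_reads, alt_reads in counts:
        baf = b_allele_frequency(ref_reads, alt_reads)
        # A BAF of exactly 1.0 is clamped into the last bin.
        idx = min(int(baf * n_bins), n_bins - 1)
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins


# Four made-up sites with BAF values 0.5, 0.3, 0.7 and 1.0.
sites = [(50, 50), (70, 30), (30, 70), (0, 40)]
dist = baf_distribution(sites, n_bins=4)
print(dist)
```

A fixed-length histogram is one convenient way to hand a variable number of sites to a model as a fixed-size feature vector; the summary does not say which representation the patent uses.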

The invention processes the B-allele frequency distribution using a trained machine-learning model to estimate a probability of a true tumor purity as a function of a predicted tumor purity in the biological sample. The model is trained on a dataset generated from nucleic acid sequence data derived from one or more tumor cells diluted into normal cells. Based on the estimated probability, the method generates a report to inform the cancer feature and evaluate the treatment efficacy for the subject.
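To see why diluting tumor cells into normal cells yields useful labeled training data, consider how the expected B-allele frequency shifts with purity. The sketch below uses the standard two-population mixture model at a germline-heterozygous site; it is not taken from the patent. The normal cells are assumed to carry one B copy out of two, and the tumor copy-number states are illustrative assumptions. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_baf(purity: float, tumor_b_copies: int, tumor_total_copies: int) -> float:
    """Expected BAF at a germline-heterozygous site in a tumor/normal mixture.

    Normal cells are assumed diploid and heterozygous (1 B copy out of 2).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    b_copies = purity * tumor_b_copies + (1 - purity) * 1
    total_copies = purity * tumor_total_copies + (1 - purity) * 2
    return b_copies / total_copies


def dilution_series(
    purities: Sequence[float], tumor_b_copies: int = 2, tumor_total_copies: int = 2
) -> List[Tuple[float, float]]:
    """(known purity, expected BAF) pairs; defaults model copy-neutral LOH."""
    return [(p, expected_baf(p, tumor_b_copies, tumor_total_copies)) for p in purities]


# Copy-neutral loss of heterozygosity: BAF moves from 0.5 toward 1.0 as purity rises.
for purity, baf in dilution_series([0.0, 0.25, 0.5, 0.75, 1.0]):
    print(f"purity={purity:.2f}  expected BAF={baf:.3f}")
```

Because each dilution has a known mixing fraction, every resulting B-allele frequency distribution comes with a ground-truth purity label, which is what a supervised model needs.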

Claims Coverage

The document includes two independent claims, a method and a system. Both claims share three inventive features: sequencing alignment and B-allele frequency distribution formation, machine-learning-based tumor purity probability estimation, and report generation for cancer feature and treatment efficacy evaluation.

Determining tumor purity using aligned sequencing variants and a B-allele frequency distribution

Obtaining nucleic acid sequence data; aligning the nucleic acid sequence data to a reference genome; identifying genomic regions with nucleotide-sequence variants; determining a B-allele frequency for each genomic region; and determining a B-allele frequency distribution for the biological sample.

Estimating true tumor purity probability using a machine-learning model trained on tumor-cell dilution data

Processing the B-allele frequency distribution using a trained machine-learning model to estimate a probability of a true tumor purity as a function of a predicted tumor purity, wherein the model is trained on a training dataset generated from nucleic acid sequence data derived from one or more tumor cells diluted into normal cells.
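One way to read "a probability of a true tumor purity as a function of a predicted tumor purity" is as a conditional distribution estimated from samples whose true purity is known, such as a dilution series. The sketch below tabulates that distribution empirically by binning both values. This interpretation, the binning scheme, and every name here are assumptions made for illustration; the summary does not describe how the patent computes the probability.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def true_given_predicted(
    pairs: Iterable[Tuple[float, float]], n_bins: int = 10
) -> Dict[int, Dict[int, float]]:
    """Empirical P(true-purity bin | predicted-purity bin).

    `pairs` holds (true_purity, predicted_purity) values in [0, 1].
    """
    def to_bin(x: float) -> int:
        # A value of exactly 1.0 is clamped into the last bin.
        return min(int(x * n_bins), n_bins - 1)

    counts: Dict[int, Counter] = defaultdict(Counter)
    for true_p, pred_p in pairs:
        counts[to_bin(pred_p)][to_bin(true_p)] += 1

    return {
        pred_bin: {t: c / sum(ctr.values()) for t, c in ctr.items()}
        for pred_bin, ctr in counts.items()
    }


# Made-up (true, predicted) pairs from a hypothetical held-out dilution set.
held_out = [(0.45, 0.5), (0.45, 0.5), (0.65, 0.5), (0.85, 0.9)]
table = true_given_predicted(held_out, n_bins=5)
print(table)
```

Looking up the row for a new sample's predicted bin then gives a distribution over true purity rather than a single point estimate, which is the kind of quantity a report could present with its uncertainty.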

Generating a report for cancer feature and treatment efficacy evaluation

Generating a report to inform the cancer feature and evaluate the treatment efficacy for the subject based on the estimated probability of a true tumor purity as a function of a predicted tumor purity in the biological sample.

A system performing sequencing alignment, B-allele distribution formation, ML-based tumor purity probability estimation, and report generation

A system comprising data processors and instructions to perform the following: obtaining nucleic acid sequence data; aligning the data to a reference genome; identifying genomic regions with nucleotide-sequence variants; determining a B-allele frequency for each genomic region; determining a B-allele frequency distribution; processing the distribution using a machine-learning model, trained on data from tumor cells diluted into normal cells, to estimate a probability of true tumor purity as a function of predicted tumor purity; and generating a report to inform a cancer feature and evaluate treatment efficacy.

Across the independent claims, the core coverage is the combination of four steps: aligning nucleic acid sequence data to a reference genome; forming a B-allele frequency distribution from variant-containing genomic regions; using a machine-learning model, trained on data from tumor cells diluted into normal cells, to estimate a probability of true tumor purity as a function of predicted tumor purity; and generating a report for cancer feature and treatment efficacy evaluation.

Stated Advantages

Not explicitly described in patent.

Documented Applications

Not explicitly described in patent.
