Hierarchical optimized detection of relatives

Inventors

Marciano, Michael; Adelman, Jonathan D.

Assignees

Syracuse University

Publication Number

US-11309062-B2

Publication Date

2022-04-19

Expiration Date

2038-10-01


Abstract

A system for evaluating a DNA sample and determining whether the sample contains related individuals and/or unrelated individuals with high levels of allele sharing. Trained and pre-validated machine learning algorithms are used to rapidly and probabilistically assess the presence of relatives in a DNA mixture. To make a probabilistic determination, the system evaluates aspects of the sample that have not been considered before, such as peak heights, peak height ratios, maximum peak heights, minimum peak heights, ratios of allele heights to one another, number of contributors using the maximum allele count method, and quantitative measures of the amount of DNA contributed by the male and female organisms. The system identifies whether a DNA sample has contributors that are not readily identifiable based on the data and can thus improve downstream analysis.

Core Innovation

The invention provides a system and method that uses machine learning algorithms to probabilistically evaluate whether a DNA sample contains related individuals and/or unrelated individuals with high levels of allele sharing. The approach transforms DNA sample data, such as peak detection signals from sequencing systems, into feature vectors combining both transformed and untransformed data. This data is then input into trained and validated machine learning models to rapidly assess the probability that contributors to the mixture are genetically related.

The problem being addressed is the challenge in identifying the presence of related DNA contributors or contributors with high levels of allele sharing in a DNA mixture. Existing methods do not provide a probability that a sample includes such contributors, leading to potential incorrect or misleading analysis results. The invention aims to improve the accuracy and reliability of DNA mixture analysis, particularly in forensic and research applications where the presence of relatives or high allele sharing complicates contributor identification.

The system evaluates aspects of the sample not previously considered, such as peak heights, peak height ratios, maximum and minimum peak heights, ratios of allele heights, number of contributors using the maximum allele count method, and quantitative measures of DNA from male and female sources. By transforming these features and analyzing them using machine learning, the system assigns probabilities at both the locus and sample level for the likelihood of the presence of genetically related contributors, enhancing both the speed and confidence of downstream DNA analysis.
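As a rough illustration of the per-locus features named above, the following Python sketch computes an allele count, maximum and minimum peak heights, and a min/max peak height ratio, and applies the maximum allele count method (a minimum contributor estimate of the largest allele count at any locus divided by two, rounded up). The function names, the dictionary layout, and the choice of min/max as the ratio are assumptions made for this example; the patent summary does not specify them.

```python
import math

def locus_features(peaks):
    """Summarize one locus. `peaks` maps allele label -> peak height (RFU).

    The feature names and the min/max ratio are illustrative assumptions.
    """
    heights = list(peaks.values())
    if not heights:
        return {"allele_count": 0, "max_height": 0.0,
                "min_height": 0.0, "height_ratio": 0.0}
    hi, lo = max(heights), min(heights)
    return {
        "allele_count": len(heights),
        "max_height": float(hi),
        "min_height": float(lo),
        # Ratio of smallest to largest peak; guarded against a zero maximum.
        "height_ratio": lo / hi if hi > 0 else 0.0,
    }

def min_contributors_mac(profile):
    """Maximum allele count method: each contributor carries at most two
    alleles per locus, so the minimum number of contributors is
    ceil(max alleles observed at any locus / 2)."""
    max_alleles = max((len(p) for p in profile.values()), default=0)
    return math.ceil(max_alleles / 2)

# Hypothetical two-locus profile, values chosen only for illustration.
profile = {
    "D3S1358": {"14": 1200, "15": 800, "16": 400},
    "vWA": {"17": 1500, "18": 1000},
}
features = {locus: locus_features(p) for locus, p in profile.items()}
contributors = min_contributors_mac(profile)
```

The maximum allele count gives only a lower bound on the number of contributors, which is one reason allele sharing between relatives can mask additional contributors.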

Claims Coverage

The patent includes two independent claims covering a method and a system for determining whether a DNA sample contains related contributors using specific data transformations and machine learning.

Method for determining related DNA contributors using transformed peak detection data and machine learning

This method involves:
1. Receiving peak detection signals for a predetermined number of loci in a DNA sample.
2. Establishing a first set of peak detection data to be transformed and a second set to remain untransformed.
3. Transforming the first set of peak detection data using at least one transformation approach, which may include:
   - Signal detection using average baseline noise and thresholding.
   - Trimming signals for errors or artifacts.
   - Calculating ratios involving peak heights of gender-determining markers, counts determined by the maximum allele count method, and other specified quantitative measures.
4. Inputting both transformed and untransformed data into a trained machine learning algorithm to output, for each locus, a probability that multiple, genetically related contributors are present.
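The four steps can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the threshold rule (a multiple of the average baseline noise), the Y/X ratio for the sex-determining marker, the zero padding of raw heights, and the `toy_model` stand-in with its arbitrary weights are all hypothetical. In practice `model` would be a trained and validated classifier.

```python
import math
from statistics import mean

MAX_PEAKS = 6  # assumed fixed length for the untransformed part of the vector

def detection_threshold(baseline_noise, factor=3.0):
    """Assumed rule: a peak is signal if it reaches factor * average noise."""
    return factor * mean(baseline_noise)

def sex_marker_ratio(height_x, height_y):
    """Assumed ratio of Y to X peak heights at a sex-determining marker."""
    return height_y / height_x if height_x > 0 else 0.0

def untransformed_part(peaks):
    """Raw peak heights, zero padded only so every vector has equal length."""
    raw = [float(h) for h in peaks.values()][:MAX_PEAKS]
    return raw + [0.0] * (MAX_PEAKS - len(raw))

def transformed_part(peaks, threshold, sex_ratio):
    """Count of peaks that pass the threshold, plus the sex marker ratio."""
    kept = [h for h in peaks.values() if h >= threshold]
    return [float(len(kept)), sex_ratio]

def locus_probabilities(profile, baseline_noise, sex_marker, model):
    """Steps 1-4: build one feature vector per locus and score it."""
    threshold = detection_threshold(baseline_noise)
    ratio = sex_marker_ratio(sex_marker["X"], sex_marker["Y"])
    return {
        locus: model(transformed_part(peaks, threshold, ratio)
                     + untransformed_part(peaks))
        for locus, peaks in profile.items()
    }

def toy_model(features):
    """Placeholder for a trained classifier; the weights are arbitrary."""
    z = 0.8 * features[0] - 2.0
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical input: one locus with three real peaks and one noise peak.
profile = {"D3S1358": {"14": 1200, "15": 800, "16": 400, "17": 20}}
probs = locus_probabilities(profile, [8, 10, 12], {"X": 1000, "Y": 500},
                            toy_model)
```

Passing the model in as a callable keeps the feature construction separate from whichever trained algorithm is used to score it.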

System for determining related DNA contributors using modular data processing and machine learning

The system comprises:
- A computer processor programmed with modules to:
  - Receive peak detection signals for a set of loci from a sequencing device.
  - Establish transformed and untransformed data sets from these signals.
  - Transform one data set using multiple transformation approaches (such as those described in the method).
  - Input both data sets into a trained machine learning algorithm to produce a probability for each locus indicating presence of multiple, genetically related contributors.
- The processor configuration supports implementation on standard desktop or laptop hardware and may involve additional processing modules for locus-level and overall probability calculations.
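This summary does not state how locus-level probabilities are combined into an overall probability. Purely as an illustration, the Python sketch below shows two common aggregation choices: a simple mean, and a "noisy-OR" that treats loci as independent. Both rules and the function name `sample_probability` are assumptions for this example, not the method claimed in the patent.

```python
import math

def sample_probability(locus_probs, method="mean"):
    """Combine per-locus probabilities into one sample-level value.

    Both aggregation rules here are illustrative assumptions.
    """
    probs = list(locus_probs.values())
    if not probs:
        raise ValueError("at least one locus probability is required")
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if method == "mean":
        return sum(probs) / len(probs)
    if method == "noisy_or":
        # Probability that at least one locus indicates related contributors,
        # assuming the loci are independent.
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown method: {method}")

# Hypothetical locus-level outputs.
example = {"D3S1358": 0.5, "vWA": 0.5}
mean_value = sample_probability(example)
noisy_or_value = sample_probability(example, method="noisy_or")
```

The mean is less sensitive to a single high-scoring locus, while the noisy-OR rises quickly as the number of loci grows, so the choice would need validation against known mixtures.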

In summary, the inventive coverage focuses on a novel combination of specific data transformations applied to DNA peak detection signals, and the use of trained machine learning algorithms to generate probabilistic assessments of related contributors, implemented as either a method or as a configurable computer system.

Stated Advantages

Provides higher-confidence and more rapid analysis of DNA mixtures to determine the presence of related contributors or individuals with high levels of allele sharing.

Enables assignment of probabilities for genetically related contributors at both the locus and sample level, improving accuracy in contributor determination.

Improves the efficiency and accuracy of downstream DNA mixture analysis, including contributor estimation and hypothesis selection for likelihood ratio calculations.

Allows analysis and characterization of unknown DNA samples, which is not possible with conventional genetic relatedness methods.

Documented Applications

Used in forensic DNA analysis to identify the presence of related contributors in DNA mixtures.

Applicable in clinical and medical research for analysis of DNA mixtures with potential related contributors.

Enhances downstream analysis such as contributor estimation (PACE, NOCit), mixture deconvolution (TrueAllele, STRmix, ArmedXpert), and hypothesis setting for likelihood ratio computations (STRmix, TrueAllele, Lab Retriever, LRmix, FST, GenoProof, likeLTD-R).
