DNA sequence assembly methods of short reads
Inventors
Liu, Changsheng Jonathan • Wu, Yiqiong • LeVan, Kevin Jay
Assignees
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Certain embodiments of the invention provide systems and methods for the automated assembly of DNA sequence data into contiguous DNA segments using a computer system. DNA sequence data is entered into the system. The system indexes and groups a plurality of DNA fragment reads utilizing an anchor sequence and consolidates the fragments into larger sequences by merging the fragment reads within a group.
Core Innovation
The invention provides an automated assembly method for transforming DNA sequence data comprising DNA fragment reads into contiguous DNA segments using a computer system. The method inputs a plurality of DNA fragment reads, identifies a plurality of n-mers within each DNA fragment read, compiles an index of the n-mers, and identifies one or more anchor sequences common to a plurality of the reads among the indexed n-mers.
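The indexing and anchor-identification steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_nmer_index` and `find_anchors`, the n-mer length, and the `min_reads` threshold are hypothetical choices made for illustration, and an anchor is assumed here to be any indexed n-mer that occurs in at least two distinct reads.

```python
from collections import defaultdict

def build_nmer_index(reads, n):
    """Map each n-mer to the list of (read_id, offset) positions where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - n + 1):
            index[read[offset:offset + n]].append((read_id, offset))
    return index

def find_anchors(index, min_reads=2):
    """Keep n-mers found in at least min_reads distinct reads (assumed anchor rule)."""
    anchors = {}
    for nmer, hits in index.items():
        distinct_reads = {read_id for read_id, _ in hits}
        if len(distinct_reads) >= min_reads:
            anchors[nmer] = hits
    return anchors

# Toy input: only "GTAC" is shared between two different reads.
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
index = build_nmer_index(reads, n=4)
anchors = find_anchors(index)
print(anchors)
```

Note that "TTTT" occurs three times but only within a single read, so under the assumed rule it is not treated as an anchor.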
Consolidated sequences are generated by using a plurality of reads comprising the same indexed anchor sequence. The reads are sub-grouped according to two or more different shoulder sequences, thereby producing two or more subgroups of reads and one or more consolidated sequences, with each consolidated sequence generated using reads of one of the subgroups.
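The sub-grouping step can be sketched as follows. The summary does not define a shoulder sequence, so this example assumes, purely for illustration, that a shoulder is the fixed-length run of bases immediately following the anchor in a read. The function name `subgroup_by_shoulder` and the `shoulder_len` parameter are hypothetical.

```python
from collections import defaultdict

def subgroup_by_shoulder(reads, anchor, shoulder_len):
    """Group reads containing the anchor by the bases that immediately follow it.

    Assumption: the shoulder is the shoulder_len bases right after the anchor.
    Reads without the anchor, or without a full shoulder, are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        pos = read.find(anchor)
        if pos == -1:
            continue
        start = pos + len(anchor)
        shoulder = read[start:start + shoulder_len]
        if len(shoulder) < shoulder_len:
            continue
        groups[shoulder].append(read)
    return dict(groups)

# Four reads share the anchor "GTAC" but diverge right after it.
shoulder_reads = ["AAGTACTTG", "AGTACTTGC", "CCGTACGAT", "CGTACGATT"]
subgroups = subgroup_by_shoulder(shoulder_reads, anchor="GTAC", shoulder_len=2)
print(subgroups)
```

Separating the reads this way keeps fragments that share an anchor but come from different surrounding contexts, such as repeats, from being merged into the same consolidated sequence.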
The invention also provides a corresponding sequence assembly system that includes a computer processor, memory, and data storage devices. Programming instructions operate the computer processor to consolidate a set of fragment reads by compiling an index of n-mers, identifying one or more anchor sequences common to a plurality of reads, and generating consolidated sequences from reads comprising the same indexed anchor sequence.
Claims Coverage
The available claim text identifies two independent claims: a method claim and a system claim. Each centers on automated consolidation using an index of n-mers and common anchor sequences, refined by sub-grouping reads according to two or more different shoulder sequences to generate one or more consolidated sequences.
Automated consolidation of reads by indexed n-mers and common anchors
A method comprising entering a plurality of DNA fragment reads, identifying a plurality of n-mers within each DNA fragment read and compiling an index of the n-mers, identifying one or more anchor sequences common to a plurality of the reads among the indexed n-mers, and generating a consolidated sequence using reads comprising the same indexed anchor sequence.
Shoulder-based sub-grouping for multiple consolidated sequences
The consolidated sequence is generated by sub-grouping reads comprising the same indexed anchor sequence according to two or more different shoulder sequences to produce two or more subgroups of reads and generating one or more consolidated sequences, where each consolidated sequence is generated using reads of one of the two or more subgroups.
Sequence assembly system for consolidation into contigs using indexed n-mers and common anchors
A sequence assembly system comprising a computer processor, memory, and data storage devices, with programming instructions held in memory to consolidate a set of fragment reads by identifying a plurality of n-mers within each DNA fragment read and compiling an index of the n-mers, identifying one or more anchor sequences common to a plurality of the reads among the indexed n-mers, and generating a consolidated sequence using reads comprising the same indexed anchor sequence.
Shoulder-based sub-grouping in an assembly system
The system generates the consolidated sequence by sub-grouping reads comprising the same indexed anchor sequence according to two or more different shoulder sequences to produce two or more subgroups of reads and generating one or more consolidated sequences, where each consolidated sequence is generated using reads of one of the two or more subgroups.
Across both independent claims, the core coverage combines two elements: indexed n-mers with one or more common anchor sequences, used to select reads for consolidation; and refinement of that consolidation by sub-grouping the selected reads according to two or more different shoulder sequences, so that each consolidated sequence is generated from the reads of a single subgroup.
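The consolidation step named in the claims can be illustrated with a small Python sketch. The summary does not say how reads within a subgroup are merged, so this example swaps in a simple stand-in technique: reads are aligned on the first occurrence of the anchor and each column is resolved by majority vote. The function name `consolidate` is hypothetical.

```python
from collections import Counter

def consolidate(reads, anchor):
    """Merge reads sharing an anchor into one sequence.

    Assumed technique (not taken from the patent): align every read on the
    first occurrence of the anchor, then pick the most common base per column.
    Ties are resolved in favor of the base seen first.
    """
    placed = [(read.find(anchor), read) for read in reads if anchor in read]
    if not placed:
        return ""
    # The read with the longest prefix before the anchor defines column 0.
    left = max(pos for pos, _ in placed)
    columns = {}
    for pos, read in placed:
        shift = left - pos
        for i, base in enumerate(read):
            columns.setdefault(shift + i, Counter())[base] += 1
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))

# Three overlapping reads; the third carries a single mismatch ("A" for "T").
group = ["AAGTACTTG", "AGTACTTGC", "AAGTACTAG"]
merged = consolidate(group, anchor="GTAC")
print(merged)
```

Because every placed read covers the anchor columns, the merged columns form one contiguous span, and the result is longer than any single input read.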
Stated Advantages
Not explicitly described in patent.
Documented Applications
Not explicitly described in patent.