System for assembling a derived nucleotide sequence

Inventors

Blattner, Frederick R.; Baldwin, Schuyler F.; Durfee, Timothy J.; Nash, Daniel A.; Dullea, Kenneth C.; Nelson, Richard D.

Assignees

Dnastar Inc

Publication Number

US-9109861-B1

Patent

Publication Date

2015-08-18

Expiration Date


Abstract

Systems and methods automatically assemble a set of read sequences into one or more larger nucleotide sequences using a set of comparing sequences as a template. The sequences of each set are divided into smaller mer sequences and sorted to arrange the mer sequences in order, and the sets of mers originating from the read sequence set and the comparing sequence set are compared pairwise to determine matching regions between the sequences of the read sequence set and the sequences of the comparing set. The sorting of the sequence sets prior to the pairwise comparison reduces the amount of volatile memory required to assemble the read sequence set and also reduces the overall time to assemble the read sequence set into one or more larger nucleotide sequences.

Core Innovation

The invention provides a layout assembly system for assembling a derived nucleotide sequence from a read sequence set using a comparing sequence set. The system stores a read sequence set with read entries that include read oligonucleotide sequences and a corresponding read sequence index, and stores a comparing sequence set with comparing entries that include comparing oligonucleotide sequences and a corresponding comparing sequence index. The comparing oligonucleotide sequences collectively form a known completely predetermined comparing oligonucleotide sequence.

The system divides read oligonucleotide sequences into read mers and assigns each read mer a read sequence index and a read position index indicating a number of nucleotides from a location within the read oligonucleotide sequence. The system generates a read mer table with entries including the read mer, the read sequence index, and the read position index, and stores the read mer table with entries sorted by ascending or descending order of read mer sequence. The system similarly divides comparing oligonucleotide sequences into comparing mers, assigns each comparing mer a comparing sequence index and a comparing position index, and generates a comparing mer table with entries sorted by ascending or descending order of comparing mer sequence.
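The table-building step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the fixed mer length `k`, the use of overlapping mers taken at every offset, the position index counted from the start of the sequence, and the names `MerEntry` and `build_mer_table` are all assumptions made for the example.

```python
from typing import NamedTuple


class MerEntry(NamedTuple):
    mer: str         # the mer sequence itself
    seq_index: int   # index of the sequence the mer came from
    pos_index: int   # offset of the mer from the start of that sequence (assumed)


def build_mer_table(sequences, k):
    """Split each (seq_index, sequence) pair into mers of length k
    and return the entries sorted in ascending order of mer sequence."""
    table = []
    for seq_index, seq in sequences:
        # Assumption: overlapping mers, one starting at every position.
        for pos in range(len(seq) - k + 1):
            table.append(MerEntry(seq[pos:pos + k], seq_index, pos))
    # Tuples sort by mer first, then by sequence index and position.
    table.sort()
    return table


reads = [(0, "ACGTAC"), (1, "GTACGG")]
read_mer_table = build_mer_table(reads, k=4)
for entry in read_mer_table:
    print(entry.mer, entry.seq_index, entry.pos_index)
```

The same function could be applied to the comparing sequence set to produce the comparing mer table, since both tables have the same three-field layout.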

For each read mer in the read mer table, the system determines whether it matches a comparing mer in the comparing mer table by comparing the read mer sequence to the sequence of a comparing mer that has not previously been compared to a read mer in the read mer table. Using the matches between comparing mers and read mers together with at least one of sequence index, orientation index, frameshift, and position index of comparing mers in the corresponding comparing oligonucleotide sequences, the system orders the read oligonucleotide sequences with respect to each other. The ordered read oligonucleotide sequences collectively specify the assembled derived nucleotide sequence.
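One way to read the match step is as a merge-style pass over the two sorted tables, in which each cursor only moves forward so that no comparing mer is revisited once passed. The sketch below illustrates that reading under stated assumptions: entries are plain `(mer, seq_index, pos_index)` tuples sorted in ascending mer order, repeated mers are paired exhaustively, and the name `find_matches` and the sample data are hypothetical.

```python
def find_matches(read_table, cmp_table):
    """Merge two mer tables sorted ascending by mer.

    Each entry is (mer, seq_index, pos_index). Returns a list of
    (read_seq_index, read_pos, cmp_seq_index, cmp_pos) tuples.
    """
    matches = []
    i = j = 0
    while i < len(read_table) and j < len(cmp_table):
        r_mer, c_mer = read_table[i][0], cmp_table[j][0]
        if r_mer < c_mer:
            i += 1          # read mer has no partner; advance the read cursor
        elif r_mer > c_mer:
            j += 1          # comparing mer has no partner; never revisited
        else:
            # Find the run of identical mers in each table.
            i_end = i
            while i_end < len(read_table) and read_table[i_end][0] == r_mer:
                i_end += 1
            j_end = j
            while j_end < len(cmp_table) and cmp_table[j_end][0] == r_mer:
                j_end += 1
            # Pair every read entry in the run with every comparing entry.
            for r in read_table[i:i_end]:
                for c in cmp_table[j:j_end]:
                    matches.append((r[1], r[2], c[1], c[2]))
            i, j = i_end, j_end
    return matches


# Hypothetical sorted tables for reads "ACGTAC", "GTACGG" and reference "ACGTACGG" (k = 4).
read_table = [("ACGG", 1, 2), ("ACGT", 0, 0), ("CGTA", 0, 1),
              ("GTAC", 0, 2), ("GTAC", 1, 0), ("TACG", 1, 1)]
cmp_table = [("ACGG", 0, 4), ("ACGT", 0, 0), ("CGTA", 0, 1),
             ("GTAC", 0, 2), ("TACG", 0, 3)]
matches = find_matches(read_table, cmp_table)
```

Because both inputs are already sorted, the pass only needs to hold the current position in each table, which is consistent with the memory and time savings the abstract attributes to sorting before comparison.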

Claims Coverage

The independent claims provided are clm-00001 and clm-00007. Each covers a layout assembly system with the four inventive features described below.

Indexed mer-table generation from read and comparing sequences

Divide each respective read oligonucleotide sequence in the read sequence set into one or more read mers, assign each read mer a read sequence index and a read position index, and generate a read mer table sorted by ascending or descending order of read mer sequence; divide each respective comparing oligonucleotide sequence in the comparing sequence set into one or more comparing mers, assign each comparing mer a comparing sequence index and a comparing position index, and generate a comparing mer table sorted by ascending or descending order of comparing mer sequence.

Match determination between mer tables

Determine, for each respective read mer in the read mer table, whether there is a match between the respective read mer and a comparing mer in the comparing mer table by comparing the sequence of the respective read mer to the sequence of a comparing mer in the comparing mer table that has not previously been compared to a read mer in the read mer table.

Ordering read oligonucleotide sequences using match and comparing attributes

Order respective read oligonucleotide sequences with respect to each other by sorting the matches between comparing mers and read mers together with at least one of sequence index, orientation index, frameshift, and position index of comparing mers in the corresponding comparing oligonucleotide sequences, to form an ordered plurality of read oligonucleotide sequences that collectively specify the assembled derived nucleotide sequence.
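As a rough illustration of the ordering step, the sketch below places each read on the comparing sequence using only the sequence index and position index of its matches. It assumes each match is a `(read_seq_index, read_pos, cmp_seq_index, cmp_pos)` tuple, infers a read's start as `cmp_pos - read_pos`, and picks the most frequently supported placement; orientation and frameshift are deliberately ignored. The voting scheme and the name `order_reads` are assumptions for the example, not details taken from the claims.

```python
from collections import Counter, defaultdict


def order_reads(matches):
    """Return (cmp_seq_index, start_offset, read_seq_index) tuples,
    sorted so reads appear in order along the comparing sequence."""
    votes = defaultdict(Counter)
    for read_idx, read_pos, cmp_idx, cmp_pos in matches:
        # Each match implies where the read would start on the comparing sequence.
        votes[read_idx][(cmp_idx, cmp_pos - read_pos)] += 1

    placements = []
    for read_idx, counter in votes.items():
        # Assumption: the placement supported by the most mers wins.
        (cmp_idx, offset), _count = counter.most_common(1)[0]
        placements.append((cmp_idx, offset, read_idx))

    # Sort by comparing sequence index, then by start offset.
    placements.sort()
    return placements


# Hypothetical matches for two reads against one comparing sequence.
matches = [(1, 2, 0, 4), (0, 0, 0, 0), (0, 1, 0, 1),
           (0, 2, 0, 2), (1, 0, 0, 2), (1, 1, 0, 3)]
ordered = order_reads(matches)
print(ordered)
```

In this example read 0 is placed at offset 0 and read 1 at offset 2 of comparing sequence 0, so the reads are laid out in that order.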

Known completely predetermined comparing sequence for layout assembly

Store a comparing sequence set comprising comparing entries, wherein each comparing entry comprises a corresponding comparing oligonucleotide sequence and a corresponding comparing sequence index, and wherein the comparing oligonucleotide sequences in the comparing sequence set collectively form a known completely predetermined comparing oligonucleotide sequence.

Across clm-00001 and clm-00007, the claim coverage centers on constructing sorted, indexed read mer tables and comparing mer tables from a read sequence set and a known completely predetermined comparing oligonucleotide sequence, determining match relationships between the mer tables, and ordering read oligonucleotide sequences using the match relationships together with at least one of sequence index, orientation index, frameshift, and position index to specify the assembled derived nucleotide sequence.

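Stated Advantages

Sorting the sequence sets before the pairwise comparison reduces the amount of volatile memory required to assemble the read sequence set and reduces the overall time to assemble it into one or more larger nucleotide sequences.
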
Documented Applications

No documented applications found
