Systems and methods for analyzing viral nucleic acids

Inventors

Locke, Devin; Szamel, Piotr

Assignees

Seven Bridges Genomics Inc

Publication Number

US-12173374-B2

Patent

Publication Date

2024-12-24

Expiration Date


Abstract

The invention provides systems and methods for analyzing viruses by representing viral genetic diversity with a directed acyclic graph (DAG), which allows genetic sequencing technology to detect rare variations and represent otherwise difficult-to-document diversity within a sample. Additionally, a host-specific sequence DAG can be used to effectively segregate viral nucleic acid sequence reads from host sequence reads when a sample from a host is subject to sequencing. Known viral genomes can be represented using a viral reference DAG, and the viral sequence reads from the sample can be compared to the viral reference DAG to identify the viral species or strains from which the reads were derived. Where the viral sequence reads indicate great genetic diversity in the virus that was infecting the host, those reads can be assembled into a DAG that itself properly represents that diversity.

Core Innovation

The disclosed invention provides systems and methods for analyzing viral nucleic acids using a genomic reference graph represented as nodes and edges, where the nodes are stored as objects in non-transitory memory and each node object comprises pointers to one or more other node objects. Sequence reads are aligned to the genomic reference graph by using the objects representing the nodes and the pointers to other node objects.
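As an illustration of the node-object storage described above, the following is a minimal Python sketch, not the patented implementation. The class name `Node`, its fields `seq` and `next`, and the helper `spell_paths` are hypothetical names chosen for this example; Python object references stand in for the pointers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """One graph node: a sequence fragment plus pointers to successor nodes."""
    seq: str
    # Python object references play the role of the pointer list (assumption).
    next: List["Node"] = field(default_factory=list)


def spell_paths(node: Node, prefix: str = "") -> List[str]:
    """Follow pointers from `node` and return every sequence spelled by a path to a sink."""
    prefix += node.seq
    if not node.next:
        return [prefix]
    paths = []
    for child in node.next:
        paths.extend(spell_paths(child, prefix))
    return paths


# Toy graph: ACG -> (T | C) -> GGA, i.e. a reference with one two-allele branch.
tail = Node("GGA")
ref_allele = Node("T", [tail])
alt_allele = Node("C", [tail])
head = Node("ACG", [ref_allele, alt_allele])

print(spell_paths(head))  # ['ACGTGGA', 'ACGCGGA']
```

Because each node holds direct references to its successors, a traversal moves from object to object without consulting a separate index or lookup table.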

One aspect is identifying one or more variants in one or more non-viral sequence reads that are not represented by the genomic reference graph, where the non-viral sequence reads are obtained from a non-viral sample from a subject. The genomic reference graph is then updated by creating one or more additional nodes representing the identified variants not represented in the original graph, storing the additional nodes as additional objects with respective lists of pointers to other objects, and obtaining an updated genomic reference graph.
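The update step can be sketched as follows, assuming a simplified graph in which a variant is added as one new node between two existing nodes. The names `Node`, `add_variant`, and `spelled` are hypothetical and are not taken from the patent.

```python
class Node:
    def __init__(self, seq):
        self.seq = seq
        self.next = []  # list of pointers (object references) to successor nodes


def add_variant(prev_node, next_node, alt_seq):
    """Create an additional node for a variant and link it between two existing nodes."""
    variant = Node(alt_seq)
    variant.next.append(next_node)  # the new object's own pointer list
    prev_node.next.append(variant)  # the existing object now also points at the new node
    return variant


def spelled(node):
    """Return the set of sequences spelled by all paths from `node` to a sink."""
    if not node.next:
        return {node.seq}
    return {node.seq + rest for child in node.next for rest in spelled(child)}


# Original graph: ACG -> T -> GGA
head, ref, tail = Node("ACG"), Node("T"), Node("GGA")
head.next.append(ref)
ref.next.append(tail)

# Assumed scenario: the subject's non-viral reads revealed a T>C substitution.
add_variant(head, tail, "C")
print(sorted(spelled(head)))  # ['ACGCGGA', 'ACGTGGA']
```

The existing nodes are left in place; only a new object and one extra pointer are added, so the updated graph still spells the original reference path.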

After updating the reference graph, sequence reads from a viral sample containing viral nucleic acid are aligned to the updated genomic reference graph to identify one or more candidate viral sequence reads not represented by the updated genomic reference graph. The aligning to the updated genomic reference graph is performed using the node objects and the additional node objects representing the variants.
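The filtering idea in this step can be illustrated with a deliberately simplified sketch. Exact substring matching against the graph's paths is used here as a stand-in for real graph alignment, which would tolerate mismatches and gaps. All names (`Node`, `path_sequences`, `candidate_viral_reads`) and the toy sequences are hypothetical.

```python
class Node:
    def __init__(self, seq, next_nodes=()):
        self.seq = seq
        self.next = list(next_nodes)  # pointers to successor node objects


def path_sequences(node):
    """Enumerate every sequence spelled by following pointers from `node`."""
    if not node.next:
        return [node.seq]
    return [node.seq + rest for child in node.next for rest in path_sequences(child)]


def candidate_viral_reads(reads, root):
    """Return reads that match no path of the graph (exact-match stand-in for alignment)."""
    paths = path_sequences(root)
    return [r for r in reads if not any(r in p for p in paths)]


# Updated graph: the reference allele T plus the subject-specific C variant node.
tail = Node("GGATTC")
root = Node("ACG", [Node("T", [tail]), Node("C", [tail])])

reads = ["CGCGGA", "GTGGAT", "TTAGGCAT"]
print(candidate_viral_reads(reads, root))  # ['TTAGGCAT']
```

In this toy case the read `CGCGGA` carries the subject's own variant and is still recognized as represented by the graph, because the variant node was added first. Only the read with no match in the updated graph remains as a candidate viral read.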

Claims Coverage

The available claim text contains three independent claims. Each recites the same core workflow: accessing a pointer-linked human genomic reference graph, aligning non-viral reads to identify variants not represented in the graph, updating the graph by adding variant nodes, and aligning viral sample reads to the updated graph to identify candidate viral sequence reads not represented by the updated graph.

Pointer-linked genomic reference graph with node objects and pointer lists

A genomic reference graph is accessed where nodes are stored as objects in non-transitory memory, each object comprises pointers to one or more other node objects, and a first node is stored as a first object whose object comprises a first list of one or more pointers to one or more other objects stored in non-transitory memory.

Align non-viral sequence reads to identify variants not represented in the graph

One or more non-viral sequence reads obtained from a non-viral sample previously obtained from a subject are aligned to the genomic reference graph to identify one or more variants that are not represented by the genomic reference graph, wherein aligning uses the node objects and the pointers to one or more other node objects.
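One simplified way to picture variant detection against the graph is an ungapped search for the placement with the fewest mismatches, sketched below. This is an assumption made for illustration: the patent text here does not specify the alignment algorithm, and the names `Node`, `path_sequences`, and `best_hit` are hypothetical.

```python
class Node:
    def __init__(self, seq, next_nodes=()):
        self.seq = seq
        self.next = list(next_nodes)  # pointers to successor node objects


def path_sequences(node):
    """Enumerate every sequence spelled by following pointers from `node`."""
    if not node.next:
        return [node.seq]
    return [node.seq + rest for child in node.next for rest in path_sequences(child)]


def best_hit(read, root):
    """Return (mismatch indices, path, offset) for the ungapped placement with fewest mismatches.

    Returns None when the read is longer than every path.
    """
    best = None
    for path in path_sequences(root):
        for off in range(len(path) - len(read) + 1):
            window = path[off:off + len(read)]
            mism = [i for i, (a, b) in enumerate(zip(read, window)) if a != b]
            if best is None or len(mism) < len(best[0]):
                best = (mism, path, off)
    return best


# Linear toy reference graph: ACG -> T -> GGATTC
root = Node("ACG", [Node("T", [Node("GGATTC")])])
read = "CGAGGA"  # hypothetical non-viral read carrying a T>A substitution

mism, path, off = best_hit(read, root)
# Each entry: (position on the path, graph base, read base)
variants = [(off + i, path[off + i], read[i]) for i in mism]
print(variants)  # [(3, 'T', 'A')]
```

Each reported mismatch is a candidate variant not represented by the graph, which is the input the subsequent update step needs.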

Update the genomic reference graph by creating additional variant nodes with pointer lists

The genomic reference graph is updated to represent the identified one or more variants, including creating one or more additional nodes representing the one or more variants not represented by the genomic reference graph and storing the additional nodes as additional objects in the non-transitory memory, where each additional object comprises a respective list of pointers to one or more other objects.

Align viral sample reads to the updated graph to identify candidate viral sequence reads not represented

Sequence reads from a viral sample containing viral nucleic acid previously obtained from the subject are aligned to the updated genomic reference graph to identify one or more candidate viral sequence reads not represented by the updated genomic reference graph, wherein aligning uses the node objects and the additional node objects representing the one or more additional nodes.

The claimed coverage is the same across the independent claims: pointer-linked node-object storage of a genomic reference graph, non-viral read alignment to detect variants missing from the reference, updating the graph by adding additional variant nodes stored as pointer-linked objects, and aligning viral sample reads to the updated graph to identify candidate viral sequence reads missing from the updated reference.

Stated Advantages

Not explicitly described in the patent.

Documented Applications

Not explicitly described in the patent.
