Identifying peptides having T-cell-exposed motifs with known frequency of occurrence in a reference database

Inventors

Bremel, Robert D.; Homan, Jane; Imboden, Michael

Assignees

ioGenetics LLC

Publication Number

US-10755801-B2

Patent

Publication Date

2020-08-25

Expiration Date


Abstract

The present invention provides methods and systems for identifying and classifying epitopes and for using that information to analyze proteins and peptides within proteins, especially potential epitopes, to design synthetic peptides and proteins, to analyze biopharmaceutical proteins, and to diagnose autoimmune conditions. Peptides which are bound in MHC grooves comprise two sets of amino acids: those that face inwards into the groove and determine the binding affinity to the MHC molecule (the groove-exposed motifs, or GEM) and those which do not interact with the groove but rather are on the obverse side, exposed outwardly to the T-cells (the T-cell-exposed motifs, or TCEM). The present invention utilizes information related to the identity and physicochemical characteristics of the GEM and TCEM.

Core Innovation

The invention provides a method to identify peptides in a target protein that contain one or more T-cell-exposed motifs with a known frequency of occurrence in a reference database of T-cell-exposed motifs. The method establishes the reference database by assembling a database of reference proteins and extracting sequential 9-mer and 15-mer peptide sequences with a single amino acid displacement from the reference proteins.
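The extraction step described above amounts to a sliding window that advances one residue at a time. The following is a minimal sketch of that idea, not the patented implementation; the function name `sliding_windows` and the example sequence are hypothetical and chosen only for illustration.

```python
def sliding_windows(sequence: str, width: int) -> list[str]:
    """Return every contiguous peptide of length `width`,
    advancing by a single amino acid between windows."""
    if width <= 0:
        raise ValueError("width must be positive")
    # A sequence shorter than `width` yields an empty list.
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


# Made-up 22-residue sequence, used only to exercise the function.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
nine_mers = sliding_windows(protein, 9)
fifteen_mers = sliding_windows(protein, 15)
print(len(nine_mers), len(fifteen_mers))
```

A protein of length n yields n - 8 overlapping 9-mers and n - 14 overlapping 15-mers, so a reference set of tens of thousands of proteins produces a very large number of windows.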

T-cell-exposed motifs are defined as subsets of amino acids within a peptide that, if bound in an MHC molecule, are directed outwards and exposed to T-cell binding. For possible MHC I binding 9-mer peptides, the outward-facing motif positions are amino acid positions 4, 5, 6, 7, and 8. For possible MHC II binding 15-mer peptides, the outward-facing motif positions are either 2, 3, 5, 7, 8 or -1, 3, 5, 7, 8, based on a 15-mer peptide with a central core of 9 amino acids numbered 1-9 and positions outside that core numbered as negative or positive.
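The position lists above can be turned into string indices as sketched below. This is an illustration under stated assumptions: the 9-residue core is taken to be centered in the 15-mer (three flanking residues on each side), and the flank immediately before the core is numbered -1 with no position 0. The names `tcem`, `core_position_to_index`, and `CORE_OFFSET` are hypothetical.

```python
MHC_I_TCEM = (4, 5, 6, 7, 8)        # positions within a 9-mer
MHC_II_TCEM_A = (2, 3, 5, 7, 8)     # first MHC II position set
MHC_II_TCEM_B = (-1, 3, 5, 7, 8)    # second MHC II position set
CORE_OFFSET = 3  # assumption: three flanking residues precede the core


def core_position_to_index(position: int, core_offset: int) -> int:
    """Map core numbering (1-9, flanks negative) to a 0-based string index."""
    if position == 0:
        raise ValueError("position 0 is not used in this numbering (assumption)")
    if position > 0:
        return core_offset + position - 1
    return core_offset + position  # -1 is the residue just before the core


def tcem(peptide: str, positions: tuple, core_offset: int = 0) -> str:
    """Concatenate the residues found at the outward-facing positions."""
    indices = [core_position_to_index(p, core_offset) for p in positions]
    if any(i < 0 or i >= len(peptide) for i in indices):
        raise ValueError("motif position falls outside the peptide")
    return "".join(peptide[i] for i in indices)


# Letters stand in for residues so the selected positions are easy to see.
print(tcem("ABCDEFGHI", MHC_I_TCEM))
print(tcem("ABCDEFGHIJKLMNO", MHC_II_TCEM_A, CORE_OFFSET))
print(tcem("ABCDEFGHIJKLMNO", MHC_II_TCEM_B, CORE_OFFSET))
```

Each motif is a five-residue string, and the two MHC II position sets differ only in their first position (2 versus -1).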

The method identifies T-cell-exposed motifs in the extracted peptide sequences, categorizes the motifs based on the frequency of occurrence in the reference database, and determines the relative frequency of each T-cell-exposed motif to provide the reference database of T-cell-exposed motifs. The method extracts peptide sequences from the target protein, identifies T-cell-exposed motifs in the target protein, and compares those motifs to the frequency of occurrence of the same motifs in the reference database to identify one or more peptides in the target protein containing one or more T-cell-exposed motifs having a known frequency of occurrence in the reference database.
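The two halves of this step, building the motif frequency reference and then looking up target motifs in it, can be sketched as follows. The sketch covers only the MHC I 9-mer motif, and it assumes that "relative frequency" means a motif's count divided by the total number of motif instances in the reference; the source does not spell out the normalization. All function names and the toy sequences are hypothetical.

```python
from collections import Counter


def nine_mers(sequence):
    """All 9-residue windows, shifted one residue at a time."""
    return [sequence[i:i + 9] for i in range(len(sequence) - 8)]


def mhc1_tcem(peptide):
    """Residues at positions 4-8 (1-based) of a 9-mer."""
    return peptide[3:8]


def build_reference(proteins):
    """Count every motif in the reference proteins and convert the
    counts to relative frequencies (count / total, an assumption)."""
    counts = Counter(mhc1_tcem(p) for seq in proteins for p in nine_mers(seq))
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def annotate_target(target, reference):
    """Pair each target 9-mer with its motif and the motif's reference
    frequency, or None when the motif is absent from the reference."""
    rows = []
    for peptide in nine_mers(target):
        motif = mhc1_tcem(peptide)
        rows.append((peptide, motif, reference.get(motif)))
    return rows


# Toy data; a real reference set would hold tens of thousands of proteins.
reference = build_reference(["ACDEFGHIKL", "ACDEFGHIK"])
rows = annotate_target("WCDEFGHIKMN", reference)
known = [row for row in rows if row[2] is not None]
for peptide, motif, freq in rows:
    print(peptide, motif, freq)
```

Peptides in `known` are those whose motif has a frequency of occurrence in the reference; the remaining peptides carry motifs the reference has never seen.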

The identified peptides are cloned into an expression vector and recombinantly synthesized as a biomolecule selected from the group consisting of a protein and nucleic acid encoding the peptides. The reference proteins are selected from immunoglobulin proteins, human proteome proteins other than immunoglobulins, protein allergens, or microorganism proteins.

Claims Coverage

The claim set contains one independent claim with dependent claims that refine or extend reference protein selection, frequency-based categorization, and additional motif and peptide evaluation steps. Across these claims, the core inventive coverage is motif-frequency construction from reference proteins, outward-facing motif definitions for MHC I and MHC II peptide windows, comparison of target motifs to the reference frequency database, and recombinant synthesis of biomolecules encoding the identified peptides.

Reference database construction from reference proteins

Establishing a reference database of T-cell-exposed motifs by assembling a database of reference proteins comprising at least 40,000 proteins, extracting all sequential 9-mer and 15-mer peptide sequences with a single amino acid displacement from said reference proteins, identifying said T-cell exposed motifs in said peptide sequences, categorizing said T-cell exposed motifs based on the frequency of occurrence of said T-cell exposed motifs in said reference database, and determining the relative frequency of each of said T-cell exposed motifs.

Outward-facing T-cell-exposed motif definition for MHC I and MHC II

Defining the T-cell-exposed motifs as those subsets of amino acids within a peptide which, if bound in an MHC molecule, are directed outwards and exposed to T-cell binding, comprising for a possible MHC I binding 9-mer peptide amino acid positions 4, 5, 6, 7, 8 of a 9-mer, and comprising for a possible MHC II binding 15-mer peptide amino acid positions 2, 3, 5, 7, 8 or -1, 3, 5, 7, 8, based on a 15-mer peptide with a central core of 9 amino acids numbered 1-9 and positions outside that core numbered as negative or positive.

Frequency-based identification of peptides from a target protein

Extracting peptide sequences from said target protein and identifying T-cell exposed motifs in said target protein, then comparing said T-cell-exposed motifs from said target protein to the frequency of occurrence of the same T-cell-exposed motifs in said reference database to identify one or more peptides in said target peptide containing one or more T-cell-exposed motifs having a known frequency of occurrence in said reference database.

Cloning and recombinant synthesis of biomolecules encoding identified peptides

Cloning said one or more peptides from said target peptide into an expression vector and recombinantly synthesizing a biomolecule selected from the group consisting of a protein and nucleic acid encoding the one or more peptides.

Reference protein selection categories

The method includes selecting the reference proteins from immunoglobulin proteins, human proteome proteins other than immunoglobulins, protein allergens, or microorganism proteins.

Motif-frequency threshold categorization rules

The method further categorizes by determining whether the T-cell-exposed motifs in the target protein occur with a frequency greater than 1 in 64 in a reference database of T-cell-exposed motifs.

Alternative motif-frequency categorization range or threshold

The method further categorizes T-cell-exposed motifs in a target protein by determining whether they occur with a frequency greater than or less than 1 in 1024 in said reference database of T-cell-exposed motifs.
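The two thresholds named in these dependent claims, 1 in 64 and 1 in 1024, can be combined into a simple three-way classifier, sketched below. The category labels and the function name `categorize` are hypothetical, and treating a frequency exactly equal to a threshold as falling in the middle band is an assumption, since the source only says "greater than" and "less than".

```python
from fractions import Fraction

ONE_IN_64 = Fraction(1, 64)
ONE_IN_1024 = Fraction(1, 1024)


def categorize(frequency):
    """Place a motif's relative frequency against the two thresholds.
    Exact ties fall into the middle band (an assumption)."""
    if frequency > ONE_IN_64:
        return "above 1 in 64"
    if frequency < ONE_IN_1024:
        return "below 1 in 1024"
    return "between 1 in 1024 and 1 in 64"


for freq in (Fraction(1, 32), Fraction(1, 256), Fraction(1, 4096)):
    print(freq, categorize(freq))
```

`Fraction` keeps the threshold comparisons exact; plain floats can also be passed to `categorize` because Python compares `Fraction` and `float` values directly.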

Allele-specific MHC binding affinity prediction and peptide sequence modification

The method further determines the predicted MHC binding affinity of peptide sequences containing T-cell-exposed motifs derived from a target protein by allele-specific binding assessment, and modifies the peptide sequences to increase or decrease their MHC binding affinity.

Design modifications to shift motif representation frequency

The method further includes designing modifications in one or more selected peptides from a target protein to change the frequency of representation of T-cell-exposed motifs compared to a reference database.
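One way to explore such design modifications is to enumerate single-residue substitutions at the outward-facing positions of a 9-mer and look up the resulting motif in the reference. The sketch below is only an illustration of that search, not the procedure the patent describes; it assumes the MHC I motif positions, assumes a motif missing from the reference can be scored as frequency 0.0, and uses hypothetical names and toy reference values throughout.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TCEM_INDICES = (3, 4, 5, 6, 7)        # 0-based indices for positions 4-8


def substitution_candidates(peptide, reference):
    """List every single substitution at a motif position with the
    reference frequency of the resulting motif, most frequent first."""
    results = []
    for i in TCEM_INDICES:
        for aa in AMINO_ACIDS:
            if aa == peptide[i]:
                continue  # skip the unmodified residue
            variant = peptide[:i] + aa + peptide[i + 1:]
            motif = variant[3:8]
            # Assumption: unseen motifs are scored as frequency 0.0.
            results.append((variant, motif, reference.get(motif, 0.0)))
    return sorted(results, key=lambda row: row[2], reverse=True)


# Toy reference frequencies, invented for this example.
reference = {"EFGHI": 0.5, "AFGHI": 0.01}
candidates = substitution_candidates("ACDEFGHIK", reference)
print(candidates[0])
```

Sorting in descending order favors moving a peptide toward more frequent motifs; reversing the order would favor rarer ones. Any chosen substitution would still need a separate check of its effect on MHC binding, which this sketch does not attempt.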

The claim coverage is centered on building a T-cell-exposed motif frequency reference database using outward-facing motif definitions for MHC I and MHC II peptide windows, extracting and identifying those motifs in target proteins, and selecting peptides whose motif or motifs have a known frequency of occurrence in the reference database, followed by cloning into an expression vector and recombinant synthesis of biomolecules encoding the selected peptides. Dependent claims add reference protein selection categories, explicit motif-frequency categorization thresholds, and extensions including allele-specific MHC binding affinity prediction, peptide sequence modification, and peptide design modifications to shift motif representation frequency.

Stated Advantages

Allows identification of peptides in a target protein having one or more T-cell-exposed motifs with a known frequency of occurrence in a reference database of T-cell-exposed motifs.

Enables categorizing T-cell-exposed motifs and determining relative frequency of each T-cell-exposed motif to provide the reference database.

Provides biomolecules (protein or nucleic acid) encoding the identified peptides through cloning into an expression vector and recombinant synthesis.

Documented Applications

Processing and frequency scoring of T-cell-exposed motifs from germline and somatically hypermutated IGHV repertoires, including set-algebra classification across proteins using log2 Pareto-based categorization and unique motif identifiers.

Allele- and motif-frequency-based analysis related to influenza H1N1 narcolepsy association and a conserved HA B-cell epitope region with predicted immunosuppressive T-cell-exposed motifs.

Evaluation of, and proposed substitutions in, influenza hemagglutinin to shift motifs between frequency classes (likely immunosuppressive versus immunostimulatory).

Identification and overlap analysis of T-cell-exposed motifs in immunoglobulin light chains (IGLV) and T-cell receptors (TCR), including quantitative overlap statistics.

Identification of probable Treg motifs in tumor-associated antigens and viral proteins (PMEL, MART-1, MAGE-1; HPV E6/E7; hepatitis B core; EBOV GP1/VP24/VP40), using motif frequency classes and MHC binding predictions, with further discussion of immunogen design constructs and peptide optimization strategies.
