Download Lund_Apr04

Immunological bioinformatics
Ole Lund, Center for Biological Sequence Analysis (CBS), Denmark

World-wide spread of SARS
Status as of July 11, 2003: 8437 infected, 813 dead

SARS
• First severe infectious disease to emerge in the post-genomic era
• Modern societies are vulnerable to epidemics
• Classical containment strategies have been successful in controlling the epidemic, but
  - SARS may resurface (e.g. be seasonal)
  - The suggested existence of an animal reservoir could compromise the containment strategy
• Need to develop a vaccine strategy
• Biotechnology has provided new tools to analyze genome/proteome information and guide vaccine development. The causative virus, the SARS coronavirus (SARS-CoV), has been isolated and its full-length genome sequenced.

Main scientific achievements
• Discovery of the causative agent
• Genome(s)
• 3D structure of the main proteinase
• Origin
  - Similar viruses were found in Himalayan palm civets and other animals, including a raccoon dog, and in humans working at an animal market in Guangdong, China (Guan et al., Sep 4, 2003)

Himalayan (masked) palm civet, ferret-badger, raccoon dog
http://biobase.dk/~david-c/uk-dk-mammmal-list.htm
Source: Michael Buchmeier, Beijing, June 2003

New coronaviruses
• 1978 Porcine epidemic diarrhea virus (PEDV)
  - Probably from humans
• 1984 Porcine respiratory coronavirus
• 1987 Porcine reproductive and respiratory syndrome (PRRS)
• 1993 Bovine coronavirus
• 2003 SARS

Will it be back?
• When?
  - Every year? Like the flu.
  - Every few years? Like measles used to.
  - Sporadic? Like Ebola.
  - Never?
• Lab safety: The patient, a 27-year-old virologist, worked on the West Nile virus in a biosafety level 3 lab at the Environmental Health Institute, where the SARS coronavirus was also studied (Enserink, 2003)

How does the immune system “see” a virus?
The immune system
• The innate immune system
  - Found in animals and plants
  - Fast response
  - Complement, Toll-like receptors
• The adaptive immune system
  - Found in vertebrates
  - Stronger response the 2nd time
  - B lymphocytes
      - Produce antibodies (Abs), which recognize 3D shapes
      - Neutralize virus/bacteria outside cells
  - T lymphocytes
      - Cytotoxic T lymphocytes (CTLs), MHC class I: recognize foreign protein sequences in infected cells; kill infected cells
      - Helper T lymphocytes (HTLs), MHC class II: recognize foreign protein sequences presented by immune cells; activate cells

Weight matrices (Hidden Markov models)
YMNGTMSQV GILGFVFTL ALWGFFPVV ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CVGGLLTMV FIAGNSAYE
[Figure: A2 logo]

Protein sequence information content
• Entropy
  - Average uncertainty in the random variable
  - $H = -\sum_i p_i \log_2 p_i$, range: 0 to $\log_2(20) = 4.3$
  - Logo height: $I = \log_2(20) - H$
• Relative entropy (Kullback-Leibler distance)
  - $D = \sum_i p_i \log_2(p_i / q_i)$, range: 0 to infinity
• Mutual information
  - Reduction in uncertainty due to knowledge of another random variable (corresponds to correlation)
  - $M = \sum_i \sum_j p_{ij} \log_2\big(p_{ij} / (p_i p_j)\big)$

Prediction of MHC binding specificity
• Simple motifs
  - Allowed (non-allowed) amino acids
• Extended motifs
  - Amino acid preferences
• Structural models
  - Limitations: precision of the force field, and speed of calculations
• Neural networks
  - Can take correlations into account

Log odds ratios
• Used for scoring alignments (BLAST), HMMs, matrix methods
• Odds ratio of observing given amino acids
  - Relative probability of observing amino acid i in motif position j
  - $O_j = p(\mathrm{aa}_i \text{ at pos } j) / p(\mathrm{aa}_i)$
• Assumption of independence =>
  - Odds for observing a sequence $= O_1 O_2 \cdots O_n$
• Log odds ratio
  - $LO = \log(O_1 O_2 \cdots O_n) = \log O_1 + \log O_2 + \cdots + \log O_n$
  - $LO$ in half bits $= 2\,LO / \log 2$

Evaluation of prediction accuracy
• Coverage = TP / actual positives
• Reliability = TP / predicted positives

A*1101 performance
154 peptides, 9 binders
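The information content, log-odds and coverage/reliability definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the method used for the results in the talk: the function names are hypothetical, the background distribution is assumed to be uniform over the 20 amino acids, and a pseudocount of 1 per amino acid is assumed to avoid taking the log of zero. The logo height is taken as log2(20) minus the entropy, so it is never negative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The 9-mer peptides listed on the weight-matrix slide.
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]


def column_frequencies(peptides, pseudocount=0.0):
    """Per-position amino acid frequencies p_i, optionally smoothed."""
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    columns = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        columns.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return columns


def information_content(freqs):
    """Logo height I = log2(20) - H, with H = -sum_i p_i log2 p_i."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


def log_odds_half_bits(peptide, columns, background=None):
    """Sum of per-position log odds in half bits (2 * log2 of the odds)."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return sum(
        2 * math.log2(columns[j][aa] / background[aa])
        for j, aa in enumerate(peptide)
    )


def coverage_and_reliability(predicted, actual):
    """Coverage = TP / actual positives; reliability = TP / predicted positives."""
    tp = len(set(predicted) & set(actual))
    return tp / len(actual), tp / len(predicted)


raw = column_frequencies(PEPTIDES)
smoothed = column_frequencies(PEPTIDES, pseudocount=1.0)
for pos, col in enumerate(raw, start=1):
    print(pos, round(information_content(col), 2))
print(round(log_odds_half_bits("GILGFVFTL", smoothed), 1))
```

With so few peptides the unsmoothed frequencies overstate the information content, which is one reason real weight matrices use pseudocounts and sequence weighting.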
[Figure: bar charts comparing the prediction methods SYFPEITHI, Bimas, HMM and NN. Panel labels: 95% Reliability; Pearson correlation coefficient; Coverage (true positive ratio), 50% Coverage.]

The MHC gene region
From Bill Paul, "Fundamental Immunology", 4th Ed.

Human leukocyte antigen (HLA = MHC in humans) polymorphism - alleles
A total of 229 HLA-A, 464 HLA-B and 111 HLA-C class I alleles have been named. A total of 2 HLA-DRA, 364 HLA-DRB, 22 HLA-DQA1, 48 HLA-DQB1, 20 HLA-DPA1 and 96 HLA-DPB1 class II sequences have also been assigned.
As of October 2001 (http://www.anthonynolan.com/HIG/index.html)

HLA polymorphism - supertypes
• Each HLA molecule within a supertype essentially binds the same peptides
• Nine major HLA class I supertypes have been defined
• HLA-A1, A2, A3, A24, B7, B27, B44, B58, B62
Sette et al., Immunogenetics (1999) 50:201-212

HLA polymorphism - frequencies
Phenotype frequencies

Supertypes       Caucasian  Black   Japanese  Chinese  Hispanic  Average
A2, A3, B27        83 %      86 %     88 %     88 %      86 %     86 %
+A1, A24, B44     100 %      98 %    100 %    100 %      99 %     99 %
+B7, B58, B62     100 %     100 %    100 %    100 %     100 %    100 %

Sette et al., Immunogenetics (1999) 50:201-212

Conclusions
• We suggest
  - splitting some of the alleles in the A1 supertype into a new A26 supertype
  - splitting some of the alleles in the B27 supertype into a new B39 supertype
  - that the B8 alleles may define their own supertype
  - that the specificities of the class II molecules can be clustered into nine classes, which only partly correspond to the serological classification
Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, Sylvester-Hvid C, Lamberth K, Roder G, Justesen S, Buus S, Brunak S. Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics. 2004 Feb 13 [Epub ahead of print]

MHC class I binding of SARS peptides
• Predictions for all supertypes
  - Broad population coverage
• Allele-specific neural networks
  - Peptides with associated measured binding affinity
  - A1 (A0101), A2 (A0204), A3 (A1101+A0301), B7 (B0702)
• Weight matrices
  - Peptides from public databases (SYFPEITHI, MHCpep)
  - A24, B27, B44, B58 and B62

Supertype weight matrices
[Figure panels: B27, B44, B58, B62]

Proteasomal cleavage

Epitope predictions
• Binding to MHC class I
• High probability for C-terminal proteasomal cleavage
• No sequence variation

Inside out:
1. Position in RNA
2. Translated regions (blue)
3. Observed variable spots
4. Predicted proteasomal cleavage
5. Predicted A1 epitopes
6. Predicted A*0204 epitopes
7. Predicted A*1101 epitopes
8. Predicted A24 epitopes
9. Predicted B7 epitopes
10. Predicted B27 epitopes
11. Predicted B44 epitopes
12. Predicted B58 epitopes
13. Predicted B62 epitopes

Strategy for the quantitative ELISA assay
C. Sylvester-Hvid et al., Tissue Antigens, 2002, 59:251
• Step I: Folding of MHC class I molecules in solution
  [Diagram labels: b2m, heavy chain, peptide; incubation; peptide-MHC complex]
• Step II: Detection of de novo folded MHC class I molecules by ELISA
  [Diagram label: development]
Christina Sylvester-Hvid, University of Copenhagen, July 2003

Summary of peptide binding assays

                   A1   A2   A3   A24   B7   B27   B44   B58   B62
#tested            15   15   15     0   15    13     0    15    14
#binding <500 nM   13   12   14     -   10     2     -    13    12

Initial polytope (19 HIV epitopes)
• New epitopes: 12
• Poor C-term cleavage: 8
• Cleavage within epitopes: 31
• Linker length: 12

Optimized polytope
• New epitopes: 1
• Weak C-term cleavage: 3
• Cleavage within epitopes: 7
• Linker length: 37

MHC class II molecule

Virtual matrices
• HLA-DR molecules sharing the same pocket amino acid pattern are assumed to have identical amino acid binding preferences.
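The virtual-matrix idea can be sketched as a lookup: each pocket pattern has one shared preference profile, and an allele's matrix is assembled from the profiles of the patterns it carries. The sketch below is only an illustration of that assumption. The pocket positions, pattern names, allele names and all scores are hypothetical, and the sliding 9-residue window is an assumed way of applying the matrix to longer peptides.

```python
# Hypothetical pocket profiles: for each peptide core position that sits in a
# binding pocket, every pocket pattern maps to residue preference scores.
# All names and numbers here are invented for illustration.
POCKET_PROFILES = {
    1: {"pattern_a": {"F": 1.0, "Y": 0.8, "L": 0.5}},
    4: {
        "pattern_b": {"D": 1.2, "E": 0.9},
        "pattern_c": {"K": 1.1, "R": 0.7},
    },
    6: {"pattern_d": {"T": 0.6, "S": 0.4}},
    9: {"pattern_e": {"L": 0.9, "V": 0.7}},
}

# Hypothetical alleles, described only by the pattern found in each pocket.
ALLELE_POCKETS = {
    "allele_X": {1: "pattern_a", 4: "pattern_b", 6: "pattern_d", 9: "pattern_e"},
    "allele_Y": {1: "pattern_a", 4: "pattern_c", 6: "pattern_d", 9: "pattern_e"},
}


def virtual_matrix(allele):
    """Assemble {position: {residue: score}} from the shared pocket profiles."""
    return {
        pos: POCKET_PROFILES[pos][pattern]
        for pos, pattern in ALLELE_POCKETS[allele].items()
    }


def score_core(core, matrix, default=0.0):
    """Score a 9-residue core; residues missing from a profile get `default`."""
    return sum(
        profile.get(core[pos - 1], default) for pos, profile in matrix.items()
    )


def best_core(peptide, matrix, core_length=9):
    """Slide a window over the peptide; return (score, start, core) of the best."""
    windows = [
        (score_core(peptide[i:i + core_length], matrix), i, peptide[i:i + core_length])
        for i in range(len(peptide) - core_length + 1)
    ]
    return max(windows)


matrix_x = virtual_matrix("allele_X")
print(best_core("GGFAADATAALGG", matrix_x))
```

Because allele_X and allele_Y share the same pattern in pockets 1, 6 and 9, they reuse the same profiles there and differ only at position 4, which is the point of the approach: a matrix can be built for an allele without binding data of its own.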
MHC class II binding
• Virtual matrices
  - TEPITOPE: Hammer, J., Current Opinion in Immunology 7, 263-269, 1995
  - PROPRED: Singh H, Raghava GP, Bioinformatics 2001 Dec;17(12):1236-7
• Web interface: http://www.imtech.res.in/raghava/propred
• Prediction results

MHC class II prediction
• Complexity of problem
  - Peptides of different length
  - Weak motif signal
• Alignment crucial
• Gibbs Monte Carlo sampler

RFFGGDRGAPKRG
YLDPLIRGLLARPAKLQV
KPGQPPRLLIYDASNRATGIPA
GSLFVYNITTNKYKAFLDKQ
SALLSSDITASVNCAK
PKYVHQNTLKLAT
GFKGEQGPKGEP
DVFKELKVHHANENI
SRYWAIRTRSGGI
TYSTNEIDLQLSQEDGQTIE

Class II binding motif
[Figure panels: Random, ClustalW]

Alignment by Gibbs sampler
RFFGGDRGAPKRG
YLDPLIRGLLARPAKLQV
KPGQPPRLLIYDASNRATGIPA
GSLFVYNITTNKYKAFLDKQ
SALLSSDITASVNCAK
PKYVHQNTLKLAT
GFKGEQGPKGEP
DVFKELKVHHANENI
SRYWAIRTRSGGI
TYSTNEIDLQLSQEDGQTI
[Figure panel: Gibbs sampler]

MHC class II predictions
[Figure: accuracy (axis 0 to 0.9) of Tepitope and Gibbs for allele DRB1_0401 across several data sets]

Polytope construction
[Diagram labels: NH2, M, epitope, linker, COOH; C-terminal cleavage; cleavage within epitopes; new epitopes]

Prediction of antibody epitopes
• Linear
  - Hydrophilicity scales (average in ~7 window)
      - Hopp and Woods (1981)
      - Kyte and Doolittle (1982)
      - Parker et al. (1986)
  - Other scales & combinations
      - Pellequer and van Regenmortel
      - Alix
• Discontinuous
  - Protrusion (Novotny, Thornton, 1986)
• Neural networks (in preparation)

Secondary structure in epitopes

Sec struct:        H      T      B      E      S      G      I      .
Log odds ratio:  -0.19   0.30   0.21  -0.27   0.24  -0.04   0.00   0.17

H: Alpha-helix (hydrogen bond from residue i to residue i+4)
G: 3-10 helix (hydrogen bond from residue i to residue i+3)
I: Pi helix (hydrogen bond from residue i to residue i+5)
E: Extended strand
B: Beta bridge (one-residue short strand)
S: Bend (five-residue bend centered at residue i)
T: H-bonded turn (3-turn, 4-turn or 5-turn)
.: Coil

Amino acids in epitopes
Frequency by amino acid (rows: e/E and .)

Amino acid   G     A     V     L     I     M     P     F     W     S
e/E         0.09  0.07  0.05  0.08  0.04  0.02  0.06  0.03  0.01  0.08
.           0.07  0.08  0.07  0.10  0.06  0.03  0.05  0.05  0.02  0.07

Amino acid   C     T     Q     N     H     Y     E     D     K     R
e/E         0.03  0.08  0.04  0.04  0.02  0.04  0.06  0.07  0.07  0.04
.           0.03  0.06  0.04  0.05  0.02  0.03  0.04  0.04  0.05  0.04

Dihedral angles in epitopes
Z-scores for the number of dihedral angle combinations in epitopes vs. non-epitopes

Phi\Psi     1      2      3      4      5      6      7      8      9     10     11     12
 1       -0.47   0.44  -0.58   0.45   0.46   0.00   0.00  -0.73  -0.79   0.00  -0.83   1.42
 2       -0.01  -0.12  -1.82   0.52   1.75   0.00   0.00   0.00   1.42  -0.82   0.00   0.00
 3        1.82  -2.26  -1.57   0.48   0.10   0.00  -0.77   0.45   1.77   0.00  -0.82   0.99
 4        1.76   1.15  -0.34   0.75   0.00   0.00   0.97   0.16   0.38   1.03   0.00   0.00
 5       -0.85   0.45  -1.09   0.57   0.00   0.00   0.00   0.13   1.52   0.00   1.02  -0.79
 6        0.60   1.28   1.30   1.73   0.00   0.00   0.00   0.00   1.32  -0.89  -0.76   0.00
 7        0.27  -0.91   1.67  -0.51   0.00   0.00   0.00   0.00  -1.02  -1.09   0.00   0.00
 8        0.93   1.21  -0.23  -3.63   0.49   0.00   0.00   0.00   0.00  -0.19   0.31  -0.82
 9        0.00   0.28  -0.67   0.33   0.01  -0.83   0.00   0.00   0.87   0.23   0.00   0.00
10        0.00   0.95   1.71  -0.70   0.00   0.00   0.00   1.29   1.08   0.00   1.00   0.00
11        0.00   0.00   1.02   0.00   0.00   0.00   0.00   0.86  -0.75   0.00   0.00   0.00
12        0.42   0.83   0.28   1.68   0.00   0.00   0.00   0.00   1.03  -0.21  -0.79   0.93

Immunological bioinformatics
• Classical experimental research
  - Few data points
  - Data recorded by pencil and paper/spreadsheet
• New experimental methods
  - Sequencing
  - DNA arrays
  - Proteomics
• Need to develop new methods for handling these large data sets
• Immunological bioinformatics/immunoinformatics

Acknowledgements
CBS, Technical University of Denmark
• Søren Brunak (Director of CBS)
• Morten Nielsen (Epitope prediction)
• Peder Worning (Genome atlases)
• Claus Lundegaard (Data bases)
• Mette Børgesen (CTL prediction)
• Jesper Schantz (Polytope optimization)
IMMI, University of Copenhagen
• Søren Buus (Professor)
• Christina Sylvester-Hvid (Experimental coordinator)
• Kasper Lamberth (Peptide bank, quality control)
• Erland Johansson, Jeanette Nielsen (Preparation of peptides)
• Hanne Møller (ELISA binding assay)
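The Gibbs Monte Carlo sampler named in the MHC class II prediction slides aligns peptides of different length by searching for the 9-residue core in each one. The slides do not give the details of the sampler, so the sketch below swaps in a generic textbook site sampler: each peptide in turn is held out, a log-odds matrix is built from the current cores of the others, and a new core position is drawn with probability proportional to its odds. The core length of 9, the uniform background, the pseudocount, the number of sweeps and all function names are assumptions.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE = 9  # assumed length of the binding core

# Peptides shown on the "MHC class II prediction" slide.
PEPTIDES = [
    "RFFGGDRGAPKRG", "YLDPLIRGLLARPAKLQV", "KPGQPPRLLIYDASNRATGIPA",
    "GSLFVYNITTNKYKAFLDKQ", "SALLSSDITASVNCAK", "PKYVHQNTLKLAT",
    "GFKGEQGPKGEP", "DVFKELKVHHANENI", "SRYWAIRTRSGGI",
    "TYSTNEIDLQLSQEDGQTIE",
]


def column_log_odds(cores, pseudocount=1.0):
    """Per-position natural-log odds against a uniform background (assumed)."""
    total = len(cores) + pseudocount * len(AMINO_ACIDS)
    background = 1 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(CORE):
        counts = Counter(core[pos] for core in cores)
        matrix.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def gibbs_align(peptides, sweeps=200, seed=0):
    """Return one core start offset per peptide after `sweeps` sampling sweeps."""
    rng = random.Random(seed)
    offsets = [rng.randrange(len(p) - CORE + 1) for p in peptides]
    for _ in range(sweeps):
        for k, pep in enumerate(peptides):
            # Build the matrix from every peptide except the one being resampled.
            others = [
                p[o:o + CORE]
                for i, (p, o) in enumerate(zip(peptides, offsets))
                if i != k
            ]
            matrix = column_log_odds(others)
            starts = list(range(len(pep) - CORE + 1))
            # Weight of a start position = odds of its core under the matrix.
            weights = [
                math.exp(sum(matrix[j][pep[s + j]] for j in range(CORE)))
                for s in starts
            ]
            offsets[k] = rng.choices(starts, weights=weights)[0]
    return offsets


offsets = gibbs_align(PEPTIDES)
for pep, off in zip(PEPTIDES, offsets):
    print(pep[off:off + CORE], pep)
```

With only ten peptides and a weak motif the sampled alignment varies between seeds, so in practice one would run several seeds and keep the alignment with the highest total information content.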
