Predicting immune response from peptide microarrays
Mitja Luštrek 1 (2), Peter Lorenz 2, Felix Steinbeck 2, Georg Füllen 2, Hans-Jürgen Thiesen 2
1 Department of Intelligent Systems, Jožef Stefan Institute
2 University of Rostock

1. Introduction
2. Immune response prediction
3. Interpretation

1. Introduction

Peptide = part of protein = short sequence of amino acids
• Example: SNDIVLT
• = string of letters from a 20-letter alphabet (1 letter = 1 amino acid, 20 standard amino acids)
[Image taken from EMBL website]

[Diagram labels: antigen protein, epitopes, peptide, antibody, antibody binding]

Peptide arrays
• Peptide array: peptides (15 amino acids) on a glass slide
• IVIg antibody mixture
• Red = epitopes (bind antibodies)
• Black = non-epitopes
• Antibody against antibody + dye
[Diagram labels: glass slide, peptide, antibody, antibody against antibody + dye]

Peptide            Class
PGIGFPGPPGPKGDQ    non-ep.
PNMVFIGGINCANGK    non-ep.
DGIGGAMHKAMLMAQ    non-ep.
REDNLTLDISKLKEQ    non-ep.
TPLAGRGLAERASQQ    non-ep.
DQVHPVDPYDLPPAG    non-ep.
...
RRMISRMPIFYLMSG    epitope
LPPGFKRFTCLSIPR    epitope
EFSQMESYPEDYFPI    epitope
...

2. Immune response prediction
Our task
Given a list of peptides, use machine learning to assign a class to each peptide.

Peptide            Class
RRKGGLEEPQPPAEQ    non-ep.
SEDLENALKAVINDK    non-ep.
EDHVKLVNEVTEFAK    non-ep.
GEKIIQEFLSKVKQM    non-ep.
ILVSRSLKMRGQAFV    epitope
YTCQCRAGYQSTLTR    epitope
...

• Training set: 13,638 peptides (3,420 epitopes)
• Test set: 13,640 peptides (3,421 epitopes)
• Balanced until the final testing

Machine learning
• Peptide (e.g. PGIGFPGPPGPKGDQ) with class non-ep. / epitope
• Attribute representation: attribute 1, attribute 2, ... with value 1, value 2, ..., plus the class non-ep. / epitope
• ML classifier: outputs the probability for epitope p
• Eight attribute representations (1, ..., 8), each with its own ML classifier (classifier 1, ..., classifier 8); learning algorithms: SVM (SMO), logistic regression
• The eight classifiers output probabilities for epitope p1, p2, p3, p4, p5, p6, p7, p8
• A meta classifier combines them into the final probability for epitope p
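The data slide above says the sets were kept balanced until the final testing, but not how. The following is a minimal sketch of one common way to balance a two-class set, assuming random undersampling of the larger class. The function name `balance`, its parameters, and the undersampling choice itself are assumptions for illustration, not the authors' documented procedure.

```python
import random

def balance(peptides, labels, seed=0):
    """Return (peptide, label) pairs with equally many epitopes and non-epitopes.

    Assumption: random undersampling of the larger class. The slides only
    state that the data were balanced, not which method was used.
    """
    rng = random.Random(seed)
    epitopes = [p for p, y in zip(peptides, labels) if y == "epitope"]
    others = [p for p, y in zip(peptides, labels) if y != "epitope"]
    n = min(len(epitopes), len(others))
    # Draw n peptides from each class without replacement.
    chosen = [(p, "epitope") for p in rng.sample(epitopes, n)]
    chosen += [(p, "non-ep.") for p in rng.sample(others, n)]
    rng.shuffle(chosen)
    return chosen

# Usage with the six example peptides from the "Our task" slide.
example_peptides = [
    "RRKGGLEEPQPPAEQ", "SEDLENALKAVINDK", "EDHVKLVNEVTEFAK",
    "GEKIIQEFLSKVKQM", "ILVSRSLKMRGQAFV", "YTCQCRAGYQSTLTR",
]
example_labels = ["non-ep."] * 4 + ["epitope"] * 2
balanced = balance(example_peptides, example_labels)
```

Under this assumption, balancing the test set would keep all 3,421 epitopes and 3,421 of the 13,640 - 3,421 = 10,219 non-epitopes.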
Meta classifier: linear regression over the probabilities for epitope p1, ..., p8, giving the final probability for epitope p (class non-ep. / epitope).

Attribute representation 1: Amino-acid counts
Example: RRMISRMPIFYLMSG
Count of:  F 1,  G 1,  I 2,  L 1,  M 3,  P 1,  R 3,  S 2,  Y 1  (no other amino acid occurs in this peptide)

Attribute representation 2: Amino-acid count differences
Example: RRMISRMPIFYLMSG
Difference in counts of:  F–G 0,  F–I -1,  F–L 0,  F–M -2,  F–P 0,  F–R -2,  F–S -1,  F–Y 0,  G–F 0,  G–I -1,  ...

Attribute representation 3: Subsequence counts
Example: RRMISRMPIFYLMSG
Count of:  RR 1,  RM 2,  MI 1,  ...;  RRM 1,  RMI 1,  MIS 1,  ...;  ACDE 0,  ...;  ACDEF 0,  ...

Attribute representation 4: Amino-acid class counts
Example: RRMISRMPIFYLMSG
Size classes:    l l l l t l l s l l l l l t t  (l = large, t = tiny, s = small)
Charge classes:  b b n n n b n n n n n n n n n  (b = basic, n = neutral)
Count of:  tiny 3,  small 1,  large 11,  basic 3,  acidic 0,  neutral 12,  ...

Attribute representation 5: Amino-acid class subsequence counts
Example: RRMISRMPIFYLMSG (same class strings as in representation 4)
Count of:  ll 8,  lt 2,  tl 1,  ls 1,  sl 1,  tt 1,  ...;  bb 1,  bn 2,  nb 1,  nn 10,  ...

Attribute representation 6: Amino-acid pair counts
Rationale: antibodies may bind in two places due to their two-chain structure.
[Diagram labels: antibody, peptide, distances 1, 2, 3]
Example: RRMISRMPIFYLMSG
Count of pairs at distance:  (R,R) at 1: 1,  (R,M) at 2: 1,  (R,I) at 3: 2,  ...,  (A,C) at 1: 0,  (A,C) at 2: 0,  ...

Attribute representation 7: Amino acids at distances from the first + first amino acid
Rationale: antibodies may bind in two places; the first amino acid is the most accessible on the peptide array.
[Diagram labels: antibody, peptide]
Example: R | RMISRMPIFYLMSG
Count of amino acid at distance from the first:  ...,  R at 1: 1,  ...,  M at 2: 1,  ...,  A at 3: 0,  C at 3: 0,  ...
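Several of the count-based attribute representations above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the maximum subsequence length (5) and maximum pair distance (3) are read off the examples on the slides, and treating pairs as ordered (second amino acid after the first) is an assumption. The results can be checked against the example values shown for RRMISRMPIFYLMSG.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def amino_acid_counts(peptide):
    """Representation 1: number of occurrences of each amino acid."""
    counts = Counter(peptide)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

def count_differences(peptide):
    """Representation 2: count(a) - count(b) for every ordered pair a != b."""
    c = amino_acid_counts(peptide)
    return {(a, b): c[a] - c[b]
            for a in AMINO_ACIDS for b in AMINO_ACIDS if a != b}

def subsequence_counts(peptide, max_len=5):
    """Representation 3: occurrences of contiguous subsequences of length 2..max_len.

    Assumption: max_len=5, the longest subsequence shown on the slide.
    """
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(peptide) - n + 1):
            counts[peptide[i:i + n]] += 1
    return counts

def pair_counts(peptide, max_distance=3):
    """Representation 6: pairs (a, b, d) where b occurs d positions after a.

    Assumption: ordered pairs and distances 1..3, as in the slide example.
    """
    counts = Counter()
    for d in range(1, max_distance + 1):
        for i in range(len(peptide) - d):
            counts[(peptide[i], peptide[i + d], d)] += 1
    return counts

def distances_from_first(peptide):
    """Representation 7: the first amino acid, plus (amino acid, distance from it)."""
    counts = Counter()
    for d in range(1, len(peptide)):
        counts[(peptide[d], d)] += 1
    return peptide[0], counts

example = "RRMISRMPIFYLMSG"
```

A `Counter` returns 0 for keys it has never seen, which matches the zero entries on the slides (for example ACDE or (A,C) at distance 1) without enumerating every possible subsequence.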
First amino acid (attribute representation 7 example): R

Attribute representation 8: Average amino-acid properties
Example: RRMISRMPIFYLMSG
Hydrophobicity 0.448,  Size 0.596,  Polarity 0.306,  Flexibility 0.231,  Accessibility 0.376,  ...

Attribute representation 9 (not used): Amino-acid counts with a difference
RRMISRMPIFYLMSG vs. RRMISRMPIWYLMSG: equivalent for epitope prediction?
• Count F as: 1 F, 0.8 W, 0.4 Y, ...
• Count W as: 1 W, 0.7 F, 0.3 Y, ...

Amino-acid substitution matrix:

       A     C     D     ...   F     W     Y
A      1
C            1
D                  1
...
F                              1     0.8   0.4
W                              0.7   1     0.3
Y                                          1

• Optimize with a genetic algorithm to maximize classification accuracy

Results – training set

Attribute representation                    AUC     Accuracy
Amino-acid counts                           0.870   80.7 %
Amino-acid count differences                0.868   80.3 %
Subsequence counts                          0.867   80.5 %
Amino-acid class counts                     0.873   81.2 %
Amino-acid class subsequence counts         0.866   80.5 %
Amino-acid pair counts                      0.865   80.6 %
Amino acids at distances from the first     0.873   81.2 %
Average amino-acid properties               0.863   80.3 %
Combined                                    0.881   83.3 %

Results – test set

Attribute representation / dataset          AUC     Accuracy
Best single / training set (balanced)       0.873   81.2 %
Combined / training set (balanced)          0.881   83.3 %
Combined / test set (balanced)              0.883   83.7 %
Combined / test set (original)              0.884   85.9 %
EL-Manzalawy / test set (balanced)          0.868   82.0 %
EL-Manzalawy / test set (original)          0.874   83.9 %

• Balanced: epitope : non-epitope = 1 : 1
• Original: epitope : non-epitope = 1 : 3
• State of the art: SVM + string kernel (EL-Manzalawy et al., 2008), trained and tested on our data

3. Interpretation

Rules
Interpretable classifier:
• Interpretable attributes (frequencies, properties of amino acids)
• RIPPER (JRip) to induce rules

Property       Low/high   Applies to peptides
Aromaticity    High       53.8 %

If a peptide has high aromaticity, it binds antibodies. This applies to 53.8 % of the peptides that bind antibodies. (Aromaticity is the percentage of aromatic amino acids in the peptide.)
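To make the aromaticity rule concrete, here is a small sketch of how such a rule and its "applies to" percentage could be computed. It assumes the aromatic amino acids are F, W and Y (some definitions also include H) and uses a placeholder threshold; the actual cut-off learned by RIPPER is not given on the slides, and the function names are hypothetical.

```python
AROMATIC = set("FWY")  # assumption: phenylalanine, tryptophan, tyrosine

def aromaticity(peptide):
    """Percentage of amino acids in the peptide that are aromatic."""
    return 100.0 * sum(aa in AROMATIC for aa in peptide) / len(peptide)

def high_aromaticity_rule(peptide, threshold=10.0):
    """Predict 'epitope' if aromaticity is above the threshold, else abstain.

    The threshold of 10 % is a placeholder for illustration only;
    the learned value is not stated in the slides.
    """
    return "epitope" if aromaticity(peptide) > threshold else None

def coverage(rule, epitope_peptides):
    """Percentage of epitope peptides for which the rule fires."""
    fired = sum(rule(p) == "epitope" for p in epitope_peptides)
    return 100.0 * fired / len(epitope_peptides)
```

With this definition, RRMISRMPIFYLMSG contains two aromatic residues (F and Y) out of 15. The 53.8 % figure on the slide corresponds to `coverage` evaluated over all peptides that bind antibodies, using the rule the authors actually induced.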
Rules

Property                   Low/high   Applies to peptides
Aromaticity                High       53.8 %
Polarity                   Low        27.7 %
Frequency of tyrosine      High       26.2 %
Hydrophobicity             Low        22.5 %
Frequency of arginine      High       19.7 %
Summary factor 2           High       16.7 %
Acidity                    Low        11.4 %
Preference for β-sheets    Low        4.3 %
Summary factor 5           High       3.0 %

Epitope propensity
Frequency in peptides with epitopes, divided by frequency in peptides without epitopes.
[Charts: epitope propensity; panels: Aromatic, Non-polar, Tyrosine]

(Un)classifiable peptides
Simplified classifier:
• Interpretable attributes (frequencies, properties of amino acids)
• Logistic regression to train the classifier
• Peptides classified correctly = classifiable; peptides classified incorrectly = unclassifiable

Peptides          AUC     Accuracy
All               0.860   83.0 %
Classifiable      0.999   98.8 %    Expected
Unclassifiable    0.956   91.5 %    Strange?
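The split into classifiable and unclassifiable peptides described above can be sketched as follows. This assumes the simplified classifier's predictions are already available as a list; the function names are hypothetical, and how the authors obtained the predictions (for example with or without cross-validation) is not stated on the slides.

```python
def split_by_correctness(peptides, labels, predictions):
    """Split peptides into those classified correctly and incorrectly.

    Returns (classifiable, unclassifiable), each a list of (peptide, label).
    """
    classifiable, unclassifiable = [], []
    for peptide, label, predicted in zip(peptides, labels, predictions):
        group = classifiable if predicted == label else unclassifiable
        group.append((peptide, label))
    return classifiable, unclassifiable

def accuracy(labels, predictions):
    """Percentage of predictions that match the true labels."""
    correct = sum(y == p for y, p in zip(labels, predictions))
    return 100.0 * correct / len(labels)
```

Each of the two groups can then be used to train and evaluate a separate classifier, which is one way to read the per-group AUC and accuracy figures in the table.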
(Un)classifiable – rules

                                  Classifiable         Unclassifiable
Attribute                         L/h     Applies      L/h     Applies
Aromaticity                       High    74.3 %       Low     53.3 %
Polarity                          Low     58.7 %       High    27.5 %
Frequency of arginine             High    31.5 %       Low     34.0 %
Frequency of tyrosine             High    20.7 %       Low     16.9 %
Summary factor 5                  High    15.1 %       Low     15.2 %
Antigenicity                      High    7.3 %        Low     8.7 %
Hydrophobicity                    Low     4.7 %        High    6.5 %
Frequency of histidine            Low     3.9 %
Frequency of cysteine                                  Low     10.4 %
Preference for reverse turns                           High    10.4 %
Occurrence in turns                                    Low     10.4 %
Frequency of alanine                                   High    8.7 %

For comparison, over all peptides: aromaticity High applies to 53.8 %, polarity Low to 27.7 %.

(Un)classifiable – epitope propensity
[Chart: epitope propensity for classifiable and unclassifiable peptides]

(Un)classifiable peptides
The unclassifiable peptides still reach AUC 0.956 and accuracy 91.5 % (all peptides: 0.860 and 83.0 %; classifiable: 0.999 and 98.8 %).
Strange? Not really! Is it inevitable, or does it mean something?
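The epitope propensity charts referred to on these slides rest on a simple ratio: frequency in peptides with epitopes divided by frequency in peptides without epitopes. A minimal sketch of that ratio follows, assuming "frequency" means the fraction of all residues that belong to a given group of amino acids. The function names and that reading of "frequency" are assumptions, and the six peptides are only the examples listed earlier in the slides, not a meaningful sample.

```python
def group_frequency(peptides, group):
    """Fraction of all residues in the peptides that belong to the group."""
    total = sum(len(p) for p in peptides)
    hits = sum(aa in group for p in peptides for aa in p)
    return hits / total

def epitope_propensity(epitopes, non_epitopes, group):
    """Frequency in epitope peptides divided by frequency in non-epitope peptides.

    Raises ZeroDivisionError if the group never occurs in the non-epitopes.
    """
    return group_frequency(epitopes, group) / group_frequency(non_epitopes, group)

# Toy usage with example peptides from the slides (illustration only).
epitopes = ["RRMISRMPIFYLMSG", "LPPGFKRFTCLSIPR", "EFSQMESYPEDYFPI"]
non_epitopes = ["PGIGFPGPPGPKGDQ", "PNMVFIGGINCANGK", "DGIGGAMHKAMLMAQ"]
aromatic_propensity = epitope_propensity(epitopes, non_epitopes, set("FWY"))
```

A propensity above 1 means the group is more frequent in epitopes than in non-epitopes. The same function can be applied separately to the classifiable and unclassifiable subsets.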
2nd degree (un)classifiable peptides
• Unclassifiable peptides only
• Simplified classifier
• Classified correctly = classifiable unclassifiable; classified incorrectly = unclassifiable unclassifiable

Peptides                         AUC     Accuracy
All unclassifiable               0.956   91.5 %
Classifiable unclassifiable      0.992   97.8 %
Unclassifiable unclassifiable    0.683   65.0 %

(Un)classifiable peptides

Peptides          AUC     Accuracy
All               0.860   83.0 %
Classifiable      0.999   98.8 %
Unclassifiable    0.956   91.5 %

Inevitable or does it mean something? Not inevitable!

2nd degree (un)cl. – epitope propensity
[Chart: epitope propensity for 2nd degree (un)classifiable peptides]

Conclusions
• Epitopes have common characteristics
  – Epitopes are parts of antigens that bind antibodies
  – Our peptides mostly did not come from known antigens
  – Probably partly general and partly antibody-specific binding
• Epitope characteristics are not unexpected
• Two groups of epitopes:
  – around 80 % “typical” (classifiable): mostly general-purpose antibodies?
  – around 20 % “atypical” (unclassifiable): mostly antigen-specific antibodies?