A Neural Network Predictor for Peptide Fragmentation in Mass Spectrometry
  Arunima Ram
  Advisor: Dr. Predrag Radivojac
  Co-Advisor: Dr. Haixu Tang
  Co-Advisor: Dr. Randy J. Arnold
  Indiana University, Bloomington, Indiana

Outline
  Introduction to Proteomics
  Introduction to Neural Networks
  Objective
  Data and Process
  Results
  Future Work
  Acknowledgments

Introduction to Proteomics
  Proteins are the molecules of life, made up of chains of amino acids.
  There are 20 standard amino acids, each represented by a single letter.
  The proteome is the sum of all proteins in an organism, tissue, or sample under study.
  [Figure: Amino Acid]

Introduction to Proteomics
  Proteomics is the study of the protein composition of an organelle, a cell, or an entire organism, with the following goals:
  Identification
  Quantification
  Expression changes
  Modifications
  Interactions with other proteins and molecules
  Mass spectrometers are the instruments used for proteomics studies.

Introduction to Proteomics
  Protein separation
  Protein digestion into peptides by a specific enzyme, trypsin
  Peptides are separated and charged
  The mass spectrometer selects a peptide based on mass
  The mass spectrum (MS) of the peptides is recorded
  Each peptide is fragmented and sent through a second MS to record MS/MS data
  Ruedi Aebersold & Matthias Mann, Nature, vol. 422, 198-207

Introduction to Proteomics
  Fragmentation of peptides follows certain rules
  b ion: N-terminal fragment
  y ion: C-terminal fragment
  The most abundant fragment ions are b and y ions
  A multiply charged peptide can generate multiply charged fragment ions
  Certain residues lose water, ammonia, or both to generate less abundant ions
  http://www.ionsource.com/tutorial/DeNovo/nomenclature.htm

Introduction to Proteomics
  [Figure: b ions]
  http://www.ionsource.com/tutorial/DeNovo/nomenclature.htm

Introduction to Proteomics
  [Figure: y ions]
  http://www.ionsource.com/tutorial/DeNovo/nomenclature.htm

Introduction to Artificial Neural Networks
  Neural networks are composed of interconnected neurons working in unison to solve specific problems, analogous to animal brains.
  ANNs learn from examples to extract patterns and detect trends too complex to be noticed otherwise.
  Benefits:
  Can learn real-valued and discrete-valued functions
  Robust to noise in data
  [Figures: Components of a Neuron cell; Single Neuron]
  http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

Introduction to Artificial Neural Networks
  Training examples are fed to the input layer
  Weights are associated with each input in each layer
  Weighted inputs are combined at each layer to give an output
  The hidden layer computes its output using a logistic function and feeds it to the output layer
  The error between the network output and the desired output is determined
  Weights in each layer are adjusted accordingly, and the process iterates
  [Figure: 2-layered Feed-forward Neural Network]

Objective
  Matching peptide fragmentation spectra through "Database Matching" uses ad-hoc rules or probabilistic models and cannot match proteins that are not present in the database.
  Aim: use machine learning to learn peptide fragmentation rules from a set of examples, predict the fragmentation spectra, and use the predictions to better identify peptides and proteins.

Dataset
  Organism    Charge 1         Charge 2         Charge 3         Search Engine
              Total / Unique   Total / Unique   Total / Unique
  Shewanella  7175 / 7175      17647 / 17647    3489 / 3489      Sequest
  Rat         150 / 58         3047 / 1782      421 / 305        Mascot
  Human       4433 / 472       63012 / 2261     14384 / 775      Sequest
  Drosophila  ---              2331 / 1234      28 / 25          Mascot
  Mouse       1562 / 419       77030 / 8779     31974 / 3961     Sequest

Process
  202 features extracted for 8 ions in charge 1 and 10 ions in charge 2 and charge 3:
  Amino acids in the peptide
  Amino acids on both sides of the cleavage
  Amino acid at the N-terminal
  Amino acid at the C-terminal
  Number of arginine and lysine residues in the peptide
  Mass of the peptide and mass of the fragment ion
  Basicity, hydrophobicity, isoelectric point, and helix propensity for the peptide, for the charged ion, and for neighboring amino acids
  [Figure: example peptide L N V W G K, annotated with the cleavage point (b-3 ion)]
  R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac.
  A machine learning approach to predicting peptide fragmentation spectra. PSB 2006, pp. 219-230

Process
  Target values:
  intensity >= 1% of total intensity: 1
  intensity < 1% of total intensity: 0
  The number of positives is much smaller than the number of negatives, so a class-balanced dataset is created
  10-fold cross-validation:
  Input data is partitioned into 10 disjoint sets
  One set becomes the test set and the remaining 9 become the training set
  Feature set reduction:
  Unrelated features removed using a t-test
  Correlated features removed using dimensionality reduction (Principal Component Analysis)
  The learning task is reduced to a classification problem: the ion exists or not

Process
  Train an ensemble of 10 neural networks, with the best-performing number of hidden neurons, for EACH ion in EACH charge
  Report statistics for each cross-validation fold and the average across folds:
  Sensitivity (Sn): % of correctly identified positive examples
  Specificity (Sp): % of correctly identified negative examples
  Accuracy: (Sn + Sp) / 2
  AUC: area under the ROC curve
  [Figure: Ensemble of Neural Networks]
  Acta Chim. Slov. 2005, 52, 440-449

Process - Predictor
  Final training done with ALL data
  Neural net architecture saved for future use
  Steps:
  Input a peptide with its charge to the predictor
  The peptide is decomposed into features
  Load the saved ANN architecture for each ion in each charge
  Predict with the 10-network ensemble and output the averaged prediction
  Score intensities as: score = p·o / (1 … o … o(2p … 1)), where p = prior probability and o = predicted output
  R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac. A machine learning approach to predicting peptide fragmentation spectra. PSB 2006, pp.
  219-230

Reproducibility Analysis
  Among mouse liver replicates, pick one as the actual spectrum and another as the prediction, and compute AUC values
  This determines the maximum accuracy that any fragmentation predictor can achieve

Results - Reproducibility Analysis
  Ion Name    Charge 1 AUC  Charge 2 AUC  Charge 3 AUC
  b           97.91         90.90         95.53
  b-H2O       96.95         89.66         95.45
  b-NH3       93.38         87.97         93.65
  b-H2O-NH3   96.70         91.35         96.26
  b++         ---           93.80         92.13
  y           97.27         92.90         95.76
  y-H2O       91.89         89.57         93.21
  y-NH3       94.41         86.79         91.11
  y-H2O-NH3   96.41         93.67         96.94
  y++         ---           93.66         93.18

Results - Cross-validation Accuracies
  Ion            Charge 1                  Charge 2                  Charge 3
                 Sn / Sp      Acc / AUC    Sn / Sp      Acc / AUC    Sn / Sp      Acc / AUC
  precursor-H2O  57.6 / 60.2  58.9 / 61.1  64.9 / 66.1  65.5 / 71.4  59.6 / 58.5  59 / 62.7
  b              83.2 / 78.8  81 / 89      78.7 / 75.4  77 / 84.6    81.1 / 75.8  78.4 / 85.9
  b-H2O          79.7 / 76.4  78.1 / 86.1  76.8 / 75.8  76.3 / 83.9  82.5 / 65.5  74 / 81.6
  b-NH3          77.2 / 75.1  76.1 / 83.5  73.1 / 76.9  75 / 82.6    83.2 / 61.2  72.2 / 78.8
  b-H2O-NH3      74.3 / 76.1  75.2 / 82    72.3 / 64.2  68.2 / 75.2  84.2 / 59.2  71.7 / 77.8
  b++            ---          ---          77.4 / 72.8  75.1 / 83.1  80.6 / 72.2  76.4 / 84.4
  y              82.6 / 82.3  82.4 / 90.1  84.4 / 79.6  82 / 89.7    84.4 / 81.1  82.8 / 90
  y-H2O          79.1 / 77.8  78.4 / 86.1  77.8 / 73.2  75.5 / 82.6  82.5 / 64.7  73.6 / 80.5
  y-NH3          76.5 / 68.3  72.4 / 79.5  69.6 / 66.9  68.3 / 75.3  81.7 / 61.9  71.8 / 78.4
  y-H2O-NH3      70.4 / 76.4  73.4 / 80.7  74.2 / 64.3  69.3 / 75.7  82.5 / 62.4  72.4 / 77.3
  y++            ---          ---          86.6 / 81.5  84 / 90.9    85.9 / 74.9  80.4 / 87.9
  Sensitivity / Specificity and Accuracy / AUC for all ions in all charges under cross-validation

Results - Cross-Testing Accuracies on Drosophila data for charge 2
  Ion         Sn    Sp    ACC   AUC   Combined AUC
  b           69.4  78.5  73.9  82.1  84.6
  b-H2O       79.4  72.8  76.1  83.6  83.9
  b-NH3       71.1  75.0  73.1  81.8  82.6
  b-H2O-NH3   69.9  66.0  67.9  76.8  75.2
  b++         65.7  77.0  71.4  77.2  83.1
  y           68.0  86.8  77.4  86.8  89.7
  y-H2O       63.0  70.4  66.7  72.5  82.6
  y-NH3       54.1  75.2  64.6  72.7  75.3
  y-H2O-NH3   53.9  70.2  62.1  67.5  75.7
  y++         90.3  82.2  86.2  92.6  90.9

MassAnalyzer - Peptide Fragmentation Tool
  Uses a mathematical model to predict fragmentation
  Uses one model for charge 1 and charge 2 and a separate model for higher charges
  Z. Zhang, Anal. Chem. 2004, 76(14), 3908-3922
  Z. Zhang, Anal. Chem. 2005, 77(19), 6364-6373

Results - Prediction Comparison
  MA = MassAnalyzer, ANN = neural network predictor
  Ion Name    Charge 1          Charge 2          Charge 3
              AUC MA   AUC ANN  AUC MA   AUC ANN  AUC MA   AUC ANN
  b           90.44    91.74    85.61    86.24    90       90.24
  b-H2O       89.37    91.87    86.84    85.85    88.99    86.97
  b-NH3       85.82    89.01    85.40    85.17    83.61    86.51
  b-H2O-NH3   61.31    90.12    71.20    80.15    77.45    85.58
  b++         ---      ---      85.34    87.20    86.01    88.56
  y           86.65    88.20    85.98    88.57    91.72    91.82
  y-H2O       85.96    81.38    77.58    72.95    82.88    87.89
  y-NH3       87.62    78.43    78.30    75.73    85.10    85.03
  y-H2O-NH3   64.82    77.50    69.51    77.28    76.55    84.47
  y++         ---      ---      90.92    93.34    85.60    87.55

Results - Prediction Comparison
  [Figure: ROC curves, Charge 2]

Results - Spectrum Comparison
  [Figure: spectrum comparison]

Future Work
  Reproducibility analysis on various other datasets, accounting for replicate size (number of replicates for each spectrum)
  Use predicted spectra to build another predictor that would learn to score a given spectrum

Acknowledgements
  Dr. Predrag Radivojac
  Dr. Haixu Tang
  Dr. Randy J. Arnold
  Lab mates: Amrita Mohan, Nils Schimmelmann, Wyatt Clark, Yong Li
  Linda Hostetter
  Bioinformatics faculty at SOI
  School of Informatics
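
Worked sketch: target labels and evaluation statistics
  The Process slides define the target value of a fragment ion (1 if its intensity is at least 1% of the total intensity, else 0) and the reported statistics (sensitivity, specificity, and accuracy as (Sn + Sp) / 2). The short Python sketch below illustrates those definitions only; it is not the code used in the talk. The function names label_ions and sn_sp_acc and the example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def label_ions(intensities: Sequence[float], threshold: float = 0.01) -> List[int]:
    """Label each ion 1 if intensity >= threshold * total intensity, else 0.

    The 1% threshold follows the slide; the function name is hypothetical.
    """
    total = sum(intensities)
    if total <= 0:
        # No signal at all: treat every ion as absent.
        return [0] * len(intensities)
    return [1 if x >= threshold * total else 0 for x in intensities]


def sn_sp_acc(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) in percent.

    Accuracy is (Sn + Sp) / 2, as defined on the slide.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    pos = sum(1 for a in actual if a == 1)
    neg = sum(1 for a in actual if a == 0)
    sn = 100.0 * tp / pos if pos else 0.0
    sp = 100.0 * tn / neg if neg else 0.0
    return sn, sp, (sn + sp) / 2.0


if __name__ == "__main__":
    # Made-up intensities: the middle ion carries 0.5% of the total signal.
    print(label_ions([500.0, 5.0, 495.0]))             # [1, 0, 1]
    # Made-up labels: one of two positives found, both negatives found.
    print(sn_sp_acc([1, 1, 0, 0], [1, 0, 0, 0]))       # (50.0, 100.0, 75.0)
```

  Averaging Sn and Sp rather than counting all correct predictions keeps the accuracy from being dominated by the negative class, which the slides note is much larger than the positive class.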