Using MS/MS Spectrum Libraries for the Detection of PTMs
Markus Müller
Swiss Institute of Bioinformatics, Geneva, Switzerland
QuickMod Tutorial 2011
© 2009 SIB

Outline
• MS/MS peptide identification
  – Spectrum library search versus sequence search
• QuickMod MS/MS workflow
• QuickMod open modification spectrum library search
  – Alignment scoring
  – Statistical validation
  – Positioning of modifications

Spectrum Library Searches

Spectrum Library Searching

Peptide-Spectrum Match (PSM)
Example: p = LREQLGPVTQEFWDNLEK; z = 3

Spectrum Library Search Scoring
• Log-transform the intensities (variance stabilization, i.e. the variance of a peak becomes independent of its intensity).
• Bin the peak (m/z, intensity) lists into bins of width Δ = 0.1–1.0 m/z units.
• Normalized dot-product score:
$$S^1 = \{P_i^1 = (m_i^1, \log I_i^1);\ i = 1, \dots, n_1\} \;\xrightarrow{\text{binning}}\; \mathbf{S}^1 = (s_1^1, s_2^1, \dots, s_N^1)$$
$$S^2 = \{P_i^2 = (m_i^2, \log I_i^2);\ i = 1, \dots, n_2\} \;\xrightarrow{\text{binning}}\; \mathbf{S}^2 = (s_1^2, s_2^2, \dots, s_N^2)$$
$$s_j^k = \sum_{i \,:\, m_{\min} + (j-1)\Delta \,\le\, m_i^k \,<\, m_{\min} + j\Delta} \log I_i^k, \qquad k = 1, 2$$
$$\text{score} = \cos\theta_{12} = \frac{\mathbf{S}^1 \cdot \mathbf{S}^2}{\lVert \mathbf{S}^1 \rVert\, \lVert \mathbf{S}^2 \rVert} = \frac{\sum_{i=1}^{N} s_i^1 s_i^2}{\sqrt{\sum_{i=1}^{N} s_i^1 s_i^1}\,\sqrt{\sum_{i=1}^{N} s_i^2 s_i^2}}$$

Spectral Library Search (Zhang et al., Proteomics 2011)

Spectrum Library Searches
• Spectrum library searches are more accurate than sequence searches.
• Scoring is less critical and easier to implement.
• Spectrum library searches are very fast compared to sequence searches.
• A peptide can only be identified if its spectrum is in the library, so libraries must be as complete as possible. Low-abundance proteins are rarely found in spectrum libraries.
• Different instruments require different libraries.
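The scoring steps above (log-transform, binning, normalized dot product) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not QuickMod's or any library's implementation: the function names are hypothetical, peaks falling into the same bin are summed, and intensities are transformed with log(1 + I) instead of log I so that every bin value stays non-negative.

```python
import math
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: Sequence[Peak], m_min: float, m_max: float,
                 bin_width: float = 0.5) -> List[float]:
    """Map a peak list onto a fixed-length vector of log-transformed intensities.

    Bin j (0-based) covers [m_min + j*bin_width, m_min + (j+1)*bin_width).
    Peaks outside [m_min, m_max) or with non-positive intensity are ignored.
    """
    n_bins = int(math.ceil((m_max - m_min) / bin_width))
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if not (m_min <= mz < m_max) or intensity <= 0:
            continue
        j = min(int((mz - m_min) // bin_width), n_bins - 1)  # guard against float round-off
        # Assumption: log(1 + I) rather than log I keeps bin values >= 0.
        vector[j] += math.log1p(intensity)
    return vector


def cosine_score(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Normalized dot product cos(theta_12) of two binned spectra of equal length."""
    if len(s1) != len(s2):
        raise ValueError("binned spectra must have the same number of bins")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    query = [(175.12, 1200.0), (288.20, 800.0), (401.29, 1500.0)]
    library = [(175.11, 1000.0), (288.21, 900.0), (520.30, 300.0)]
    v_query = bin_spectrum(query, m_min=100.0, m_max=2000.0, bin_width=0.5)
    v_library = bin_spectrum(library, m_min=100.0, m_max=2000.0, bin_width=0.5)
    print(f"score = {cosine_score(v_query, v_library):.3f}")
```

Because both spectra are binned on the same m/z grid, small m/z differences within one bin (here 175.12 vs 175.11) still contribute to the score; a narrower Δ makes the comparison stricter.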
Completeness of Libraries
Yeast data searched against one of the most complete yeast spectrum libraries:
• 20281 of 25348 non-phospho peptides found (80%)
• 14186 of 31120 phospho peptides found (46%)

Completeness of Spectrum Libraries
• Only 2 transcription factors (TFs) are present in the NIST spectrum libraries of human proteins!
  – For a given biological sample, measure the sample repeatedly using inclusion/exclusion lists to maximize the coverage of its peptides in the spectrum library (Schmidt A, et al.)
  – Clone TFs in bacteria, purify, digest and measure them with LC-MS (Bart Deplancke Lab)
  – Create synthetic peptides for all proteins of an organism and measure them with LC-MS (Aebersold lab)
  – Combine sequence search with spectrum library search (Ahrné et al., 2009)
  – Create realistic in silico spectra to complement real spectra (Cannon et al., JPR, 2011)
• Few modified peptides in libraries
  – Use an open modification search (OMS) spectrum library search tool if the unmodified form of the peptide is present (QuickMod, see below)
  – Isolate modified peptides and create spectrum libraries for specific modifications (PhosphoPep, PHOSPHIDA, ...)

Prediction of MS/MS Spectra (Zhang et al., Proteomics 2011; Cannon et al., JPR, 2011)

Spectrum Library Searches (Ahrné et al., Proteomics 2009)

Spectrum Libraries
Spectra identified with SpectraST, but not with Phenyx (Ahrné et al., Proteomics, 2009)

QuickMod Spectral Library Search Workflow (Ahrné et al., Proteomics, 2009)

Combining Search Tools (PepArML)
https://edwardslab.bmcb.georgetown.edu/pymsio/

Random and True Matches
• When searching a large database, most of the candidate peptides are not present at a detectable level in an MS2 spectrum.
• For example, an in silico tryptic digest of 10000 proteins may yield 100 × 10000 = 1’000’000 peptides, but only 300 of these peptides may actually be detectable in MS2 spectra.
• The score distribution will (hopefully) be bimodal: many low scores for the random matches and higher scores for the true matches.
• If the database is large, the random and true score distributions will inevitably overlap.

Statistical Scores
• False discovery rate: FDR = B/(A+B)
• p-value (false positive rate): pValue = B/(B+C)
• Posterior error probability: PEP = b/(a+b) (see TPP)
Here A and B are the numbers of true and random matches scoring above the threshold, C is the number of random matches scoring below it, and a and b are the densities of the true and random score distributions at a given score.

Statistical Scores
• Statistical scores do not depend on the details of the scoring function.
• The underlying scoring function can even be multidimensional, i.e. combine several scores of a search engine.
• Statistical scores have a unified probabilistic interpretation, i.e. they correspond to frequencies and counts.
• This makes the statistical scores of different search engines comparable with each other.

False Discovery Rate (FDR)
• Decoy searches control the FDR at the peptide and protein level.
• This works for both single and combined runs if applied correctly.
• It does not answer questions about modification positioning.
• It does not resolve cases where there is more than one high-scoring PSM.
• The FDR is very sensitive to high-scoring random matches.
• The number of peptides identified at a given FDR depends on how the decoy database is created and how the FDR is calculated.
• Statistically, the FDR is an expectation value, i.e. the mean over many different decoy searches:
$$\mathrm{FDR} = E\left[\frac{FP}{FP + TP} \,\middle|\, FP + TP > 0\right]$$
• Each estimate from a single decoy database is only accurate within its standard error (Granholm & Käll, Proteomics 2011):
$$\sigma_{\mathrm{FDR}} = \sqrt{\frac{(1 + \lambda)\,\mathrm{FDR}}{FP + TP}}$$
  For example, FP + TP = 2400, FDR = 0.01 and λ = 0.5 give σ_FDR = 0.0025.

Robustness of FDR

Creation of Decoy Spectrum Libraries
1. Shuffle the peptide sequence.
2. Move the annotated b, y, c and z ions in accordance with the shuffled sequence (e.g. the y8+ ion of the original sequence becomes the y8+ ion of the shuffled sequence).
3. Resample the m/z values of non-annotated peaks that do not belong to a conserved pattern (their intensities are left intact).
(Ahrné et al., Proteomics, 2011)

Fragment Peak Distribution (ETD, IT)

Controlling FDR: DeLiberator (Ahrné et al., Proteomics, 2011)

MS/MS Spectra of Modified Peptides
• A modification of mass Δ on an amino acid of a peptide induces several important changes in the MS/MS spectrum:
  – The precursor m/z is shifted by Δ/z.
  – The m/z values of all fragment ions that contain the modified amino acid are shifted by Δ/z.
  – The m/z values of all fragment ions that do not contain the modified amino acid remain the same. However, their intensities may change significantly.
  – Multiple modifications induce more complicated changes.

Similarity Between Modified and Unmodified Spectra
• Oxidation of GQGTLSVVTM{16}YHK/2
• Phosphorylation of TY{80}FPHFDLSHGSAQVK/2

QuickMod
• Open modification search: spectral alignment and scoring
• Controlling FDR
(Ahrné et al., Recomb2011/JPR, submitted)

Modification positioning

OMS: Spectrum Libraries Versus Theoretical Spectra

QuickMod Scores
QuickMod score = linear SVM combination of the 3 best scores (panels: z = 2, z = 3)

Benchmarking
Speed: InsPecT 30 min; PTMFinder 5 min; SpectraST 55 min; QuickMod 5 min

Modification Positioning
Example peptide CISK: for each candidate site, the fragment ions that would carry the modification, their evidence and the resulting score:

Candidate site:  C            I            S            K
Shifted ions:    b1,b2,b3     b2,b3,y3     b3,y2,y3     y1,y2,y3
Ion evidence:    -1 -1 -1     -1 -1 +1     -1 +1 +1     +1 +1 +1
Score:           -3           -1           +1           +3

Multiple Modifications
• QuickMod is primarily designed for single modifications.
• Double modifications can also be detected as long as the two modified residues are close together.
• Positioning then yields a region between the two modified amino acids.

Modification Positioning
1) QuickMod workflow
2) Directed MS (inclusion list)
3) Complementary fragmentation: CID/HCD or MS3

QuickMod Tools

Java Proteomics Library (JPL)
http://javaprotlib.sourceforge.net/

Future Work
• Extend the alignment to multiple modifications
• Develop modification-specific scores and positioning algorithms (phosphorylation)
• Work on combined sequence search and spectrum library search
• Apply QuickMod to large datasets for phosphorylation and other modifications
• Use it for verification of MS/MS assignments
• …

Many Thanks to
Proteome Informatics Group, Swiss Institute of Bioinformatics: Swetha Ramagoni, Luc Mottin, Leelapavan Tadoori, Nottania Campbell, Erik Ahrné, Yuki Ohta, Frederic Nikitin, Rostyk Kuzyakiv, Dominique Kadio Koua, Patricia Palagi, Markus Müller, Frederique Lisacek
SIP-CUI: Fokko Beekhof, Oleksiy Koval, Slava Voloshynovskiy
SCAHT: Laurent Geiser, Florent Glück, Paola Antinori, Denis Hochstrasser
BPRG: Alex Scherl, Maria Ramirez-Boo, Xavier Robin, Alex Hainard, Natacha Turck, Jean-Charles Sanchez
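The decoy-based FDR and its standard error from the False Discovery Rate slide can be sketched as follows. This is a hedged illustration, not the DeLiberator or QuickMod implementation: `decoy_fdr` uses the simplest estimator (decoy hits above the threshold divided by target hits above it, assuming a separate search against an equally sized decoy library), which is only one of several ways the FDR can be calculated. `fdr_standard_error` evaluates σ_FDR = sqrt((1 + λ)·FDR/(FP + TP)) as given on that slide; λ is passed in as that slide's parameter (0.5 in its example), and its exact definition should be taken from Granholm & Käll (Proteomics 2011). Both function names are hypothetical.

```python
import math
from typing import Sequence


def decoy_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
              threshold: float) -> float:
    """Estimate the FDR of all target PSMs scoring >= threshold.

    Assumes separate target and decoy searches against libraries of equal
    size, so the number of decoy hits above the threshold estimates the
    number of false positives (FP) among the accepted target hits (FP + TP).
    """
    accepted = sum(1 for s in target_scores if s >= threshold)   # FP + TP
    false_hits = sum(1 for s in decoy_scores if s >= threshold)  # estimate of FP
    if accepted == 0:
        return 0.0  # convention: nothing accepted, nothing falsely discovered
    return min(1.0, false_hits / accepted)


def fdr_standard_error(fdr: float, n_accepted: int, lam: float) -> float:
    """sigma_FDR = sqrt((1 + lam) * FDR / (FP + TP)).

    `lam` is the slide's lambda parameter (see Granholm & Käll, 2011);
    it is an explicit input here, not a recommended default.
    """
    return math.sqrt((1.0 + lam) * fdr / n_accepted)


if __name__ == "__main__":
    targets = [10.0, 9.0, 8.0, 7.0, 1.0]
    decoys = [9.5, 2.0, 1.0]
    print(decoy_fdr(targets, decoys, threshold=8.0))  # 1 decoy hit / 3 target hits
    print(fdr_standard_error(0.01, 2400, 0.5))         # slide example, about 0.0025
```

With the slide's numbers the standard error is a quarter of the FDR itself, which is why a single decoy search gives only a rough estimate at small FDR values.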
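The Modification Positioning example for the peptide CISK can be reproduced with a small scoring sketch. The rule below is inferred from that example's numbers and is only an illustration, not necessarily QuickMod's positioning algorithm: for each candidate site, the b and y ions that would contain the modified residue are listed; each such ion matched at its shifted m/z (by Δ/z) counts +1, each matched at its unshifted m/z counts -1, and unmatched ions are ignored. The function names and the `observed` dictionary are hypothetical.

```python
from typing import Dict, List, Set


def shifted_ions(length: int, site: int) -> Set[str]:
    """Names of the b/y fragment ions that contain residue `site` (1-based).

    b_i holds the first i residues and y_j the last j residues; b_n and y_n
    span the whole peptide and are therefore not counted as fragments.
    """
    b_ions = {f"b{i}" for i in range(site, length)}               # b_i with i >= site
    y_ions = {f"y{j}" for j in range(length - site + 1, length)}  # y_j with j >= n - site + 1
    return b_ions | y_ions


def position_scores(peptide: str, observed: Dict[str, bool]) -> List[int]:
    """Score every residue of `peptide` as the candidate modification site.

    `observed` maps an ion name to True if it was matched at its shifted
    m/z and to False if it was matched at its unshifted m/z.
    """
    scores = []
    for site in range(1, len(peptide) + 1):
        score = 0
        for ion in shifted_ions(len(peptide), site):
            if ion in observed:
                score += 1 if observed[ion] else -1
        scores.append(score)
    return scores


if __name__ == "__main__":
    # b ions seen unshifted and y ions seen shifted, as in the CISK example
    evidence = {"b1": False, "b2": False, "b3": False,
                "y1": True, "y2": True, "y3": True}
    for residue, score in zip("CISK", position_scores("CISK", evidence)):
        print(residue, score)
```

With this evidence the scores are -3, -1, +1 and +3 for C, I, S and K, matching the example; the highest-scoring residue (K) is the most likely site.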