Using MS/MS Spectrum Libraries for the Detection of PTMs
Markus Müller
Swiss Institute of Bioinformatics
Geneva, Switzerland
© 2009 SIB
Outline
• MS/MS peptide identification
– Spectrum library versus sequence search
• QuickMod MS/MS workflow
• QuickMod open modification spectrum library search
– Alignment scoring
– Statistical validation
– Positioning of modifications
QuickMod Tutorial 2011
Spectrum Library Searches
Spectrum Library Searching
Peptide-Spectrum Match (PSM)
p = LREQLGPVTQEFWDNLEK; z = 3
Spectrum Library Search Scoring
• Log-transform intensities (variance stabilization, i.e. the variance of a peak becomes independent of its intensity).
• Bin peak (m/z, intensity) lists into bins of width δ = 0.1-1.0 m/z units.
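• Normalized dot-product score:

$$S^1 = \{P_i^1 = (m_i^1, \log I_i^1);\ i = 1, \dots, n_1\} \xrightarrow{\text{binning}} S^1 = (s_1^1, s_2^1, \dots, s_N^1)$$

$$S^2 = \{P_i^2 = (m_i^2, \log I_i^2);\ i = 1, \dots, n_2\} \xrightarrow{\text{binning}} S^2 = (s_1^2, s_2^2, \dots, s_N^2)$$

$$s_j^k = \sum_{i:\ m_{\min} + \delta j \,\le\, m_i^k \,<\, m_{\min} + \delta (j+1)} \log I_i^k$$

$$\text{score} = \cos\theta = \frac{S^1 \cdot S^2}{|S^1|\,|S^2|} = \frac{\sum_{i=1}^{N} s_i^1 s_i^2}{\left(\sum_{i=1}^{N} s_i^1 s_i^1\right)^{1/2} \left(\sum_{i=1}^{N} s_i^2 s_i^2\right)^{1/2}}$$

Here δ is the bin width and m_min is the lowest m/z considered.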
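The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not the QuickMod implementation: the function names, the 0-based bin index and the example peak lists are assumptions, and intensities are assumed to be greater than 1 so that their logarithms are positive.

```python
import math


def bin_spectrum(peaks, m_min, delta, n_bins):
    """Sum log-intensities of (m/z, intensity) peaks into bins of width delta."""
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        j = int((mz - m_min) // delta)  # 0-based bin index (assumption)
        if 0 <= j < n_bins and intensity > 0:
            bins[j] += math.log(intensity)
    return bins


def normalized_dot_product(s1, s2):
    """Cosine of the angle between two binned spectra."""
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm1 * norm2)


# Hypothetical peak lists: (m/z, intensity)
query = [(100.2, 10.0), (200.7, 100.0), (300.1, 1000.0)]
library = [(100.4, 20.0), (200.9, 50.0), (450.0, 10.0)]
unrelated = [(150.5, 30.0), (250.5, 40.0)]

binned_query = bin_spectrum(query, m_min=100.0, delta=1.0, n_bins=400)
binned_library = bin_spectrum(library, m_min=100.0, delta=1.0, n_bins=400)
binned_unrelated = bin_spectrum(unrelated, m_min=100.0, delta=1.0, n_bins=400)

score = normalized_dot_product(binned_query, binned_library)
```

A spectrum compared with itself scores 1, and two spectra with no shared bins score 0; real matches fall in between.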
Spectral Library Search
Zhang et al., Proteomics 2011
Spectrum Library Searches
• Spectrum library searches are more accurate than
sequence searches.
• Scoring is less critical and easier to implement.
• Spectrum library searches are very fast compared
to sequence searches.
• Libraries must be complete. Low abundance
proteins are rarely found in spectrum libraries.
• Different instruments require different libraries.
Completeness of Libraries
Yeast data and one of the most complete yeast libraries:
20281 of 25348 non-phosphorylated peptides found (about 80%)
14186 of 31120 phosphopeptides found (about 46%)
Completeness of Spectrum Libraries
• Only 2 transcription factors (TFs) in the NIST spectrum libraries of human proteins!
– For a given biological sample, measure the sample repeatedly using inclusion/exclusion lists to get maximum coverage of the peptides in the spectrum library (Schmidt A, et al.)
– Clone TFs in bacteria, purify, digest and measure with LC-MS (Bart Deplancke Lab)
– Create synthetic peptides for all proteins of an organism and measure them with LC-MS (Aebersold lab)
– Combine sequence search with spectrum library search (Ahrné et al., 2009)
– Create realistic in silico spectra to complement real spectra (Cannon et al., JPR, 2011)
• Few modified peptides in libraries
– Use an open modification search (OMS) spectrum library search tool if the unmodified form of the peptide is present (QuickMod, see below)
– Isolate modified peptides and create spectrum libraries for specific modifications (PhosphoPep, PHOSPHIDA, ...)
Prediction of MS/MS Spectra
Zhang et al., Proteomics 2011
Cannon et al, JPR, 2011
Spectrum Library Searches
Ahrne et al., Proteomics 2009
Spectrum Libraries
Spectra identified with SpectraST,
but not with Phenyx
Ahrné et al. Proteomics, 2009
QuickMod Spectral Library Search Workflow
Ahrné et al, Proteomics, 2009
Combining Search Tools (PepArML)
https://edwardslab.bmcb.georgetown.edu/pymsio/
Random and True Matches
• When searching a large database, most of the candidate peptides are not present at a detectable level in an MS2 spectrum.
• For example, an in silico tryptic digest of 10000 proteins may yield 100 × 10000 = 1'000'000 peptides, but only 300 of these peptides may actually be detectable in MS2 spectra.
• The score distribution will (hopefully) be bimodal: many low scores for the random matches and higher scores for the true matches.
• The random and true score distributions will evidently overlap if the database is large.
Statistical Scores
False discovery rate: FDR = FPR = B/(A+B)
P-value: pValue = B/(B+C)
Posterior error probability: PEP = b/(a+b) (see TPP)
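The three ratios can be made concrete with a small Python sketch. It rests on an assumed reading of the letters: A and B count true and random matches scoring at or above a threshold, C counts random matches below it, and a and b count true and random matches in a narrow window around one score. The function names and the score lists are hypothetical, and in practice the true/random labels are not known and must be estimated, for example with decoys.

```python
def region_counts(true_scores, random_scores, threshold):
    """Counts A, B, C under the assumed reading of the three regions."""
    a_count = sum(s >= threshold for s in true_scores)    # A: true, accepted
    b_count = sum(s >= threshold for s in random_scores)  # B: random, accepted
    c_count = sum(s < threshold for s in random_scores)   # C: random, rejected
    return a_count, b_count, c_count


def fdr(a_count, b_count):
    """FDR = B / (A + B): fraction of accepted matches that are random."""
    total = a_count + b_count
    return b_count / total if total else 0.0


def p_value(b_count, c_count):
    """pValue = B / (B + C): fraction of random matches that are accepted."""
    total = b_count + c_count
    return b_count / total if total else 0.0


def local_pep(true_scores, random_scores, score, half_width):
    """PEP = b / (a + b), using counts in a window around one score."""
    a_local = sum(abs(s - score) <= half_width for s in true_scores)
    b_local = sum(abs(s - score) <= half_width for s in random_scores)
    total = a_local + b_local
    return b_local / total if total else 0.0


# Hypothetical labelled scores
true_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
random_scores = [0.1, 0.2, 0.3, 0.35, 0.5, 0.65, 0.2, 0.15]

A, B, C = region_counts(true_scores, random_scores, threshold=0.55)
example_fdr = fdr(A, B)
example_p = p_value(B, C)
example_pep = local_pep(true_scores, random_scores, score=0.65, half_width=0.1)
```

The FDR and p-value describe everything above the threshold, whereas the PEP is local: it describes matches with one particular score.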
Statistical Scores
• Statistical scores do not depend on the details of
the scoring function.
• The underlying scoring function can even be
multidimensional, i.e. include several scores of a
search engine.
• Statistical scores have a unified probabilistic
interpretation, i.e. they correspond to frequencies
and counts.
• This allows comparing the statistical scores of
different search engines with each other.
False Discovery Rate (FDR)
• Decoy search to control FDR on peptide and protein level.
• Works for both single and combined runs if applied correctly.
• Does not provide an answer about modification positioning.
• Does not provide an answer if there is more than one high-scoring PSM.
• FDR is very sensitive to high-scoring random matches.
• The number of peptides identified at a given FDR depends on the way the decoy database is created and the way the FDR is calculated.
• Statistically, the FDR is an expectation value, i.e. the mean of many different decoy searches:

$$\mathrm{FDR} = E\left[\frac{FP}{FP + TP}\ \middle|\ FP + TP > 0\right]$$
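• Each estimate with a single decoy db is only accurate within its standard error (Granholm & Käll, Proteomics 2011):

$$\sigma \approx \sqrt{\frac{(1 + \pi_0)\,\mathrm{FDR}}{FP + TP}}$$

$$FP + TP = 2400,\ \mathrm{FDR} = 0.01,\ \pi_0 = 0.5\ \Rightarrow\ \sigma \approx 0.0025$$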
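A short Python sketch of a decoy-based FDR estimate and its standard error follows. It assumes separate target and decoy searches of equal size, so that the number of accepted decoy matches estimates FP. It also assumes the standard error has the form sqrt((1 + pi0) * FDR / (FP + TP)); the name pi0 is a placeholder for the quantity set to 0.5 in the example, and this form is chosen because it reproduces the quoted value of 0.0025.

```python
import math


def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as accepted decoys / accepted targets (equal-size DBs assumed)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # the expectation is only defined for FP + TP > 0
    return n_decoy / n_target


def fdr_standard_error(fdr_estimate, n_accepted, pi0):
    """Assumed form: sqrt((1 + pi0) * FDR / (FP + TP))."""
    return math.sqrt((1.0 + pi0) * fdr_estimate / n_accepted)


# Numbers from the example: FP + TP = 2400, FDR = 0.01, factor 0.5
sigma = fdr_standard_error(0.01, 2400, 0.5)

# Hypothetical scores for a single threshold
toy_fdr = decoy_fdr([5.0, 4.0, 3.0, 2.0], [3.5, 1.0, 0.5], threshold=3.0)
```

With these numbers a nominal 1% FDR is only known to within roughly a quarter of a percentage point, which is why results from a single decoy database should not be over-interpreted.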
Robustness of FDR
Creation of Decoy Spectrum
Libraries
1. Shuffle sequence
2. Move annotated b,y,c,z-ions in accordance with shuffled sequence (e.g. y8+ -> y8+)
3. Sample non-annotated m/z if they do not belong to a conserved pattern (intensity is left intact)
Ahrné et al, Proteomics, 2011
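Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: only singly or multiply protonated b- and y-ions are handled, the C-terminal residue is kept in place during shuffling (an assumption), step 3 is omitted, and all function names and the annotated peak list are hypothetical. The residue masses are standard monoisotopic values.

```python
import random

# Standard monoisotopic residue masses in Da (rounded to 5 decimals)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def ion_mz(sequence, ion_type, index, charge=1):
    """m/z of b_index or y_index for an unmodified peptide."""
    if ion_type == "b":
        mass = sum(RESIDUE_MASS[aa] for aa in sequence[:index])
    elif ion_type == "y":
        mass = sum(RESIDUE_MASS[aa] for aa in sequence[-index:]) + WATER
    else:
        raise ValueError("only b- and y-ions are handled in this sketch")
    return (mass + charge * PROTON) / charge


def shuffle_sequence(sequence, rng):
    """Step 1: shuffle all residues except the C-terminal one (assumption)."""
    core = list(sequence[:-1])
    rng.shuffle(core)
    return "".join(core) + sequence[-1]


def make_decoy(sequence, annotated_peaks, rng):
    """Step 2: move each annotated peak to the m/z of the same ion in the decoy.

    annotated_peaks holds (ion_type, index, charge, intensity) tuples.
    Intensities are carried over unchanged.
    """
    decoy_seq = shuffle_sequence(sequence, rng)
    decoy_peaks = [
        (ion_mz(decoy_seq, ion_type, index, charge), intensity)
        for ion_type, index, charge, intensity in annotated_peaks
    ]
    return decoy_seq, sorted(decoy_peaks)


rng = random.Random(7)  # fixed seed so the example is repeatable
library_peaks = [("y", 1, 1, 120.0), ("y", 8, 1, 950.0), ("b", 3, 1, 310.0)]
decoy_seq, decoy_peaks = make_decoy("GQGTLSVVTMYHK", library_peaks, rng)
```

Because the annotation (for example y8+) and the intensity are preserved while the m/z follows the shuffled sequence, the decoy spectrum keeps a realistic intensity pattern.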
Fragment Peak Distribution
ETD
IT
Controlling FDR
DeLiberator
Ahrné et al, Proteomics, 2011
MS/MS Spectra of Modified
Peptides
• A modification of mass Δ on an amino acid in a peptide induces several important changes in the MS/MS spectrum:
– The precursor m/z is shifted by Δ/z.
– The m/z values of all fragment ions that contain the modified amino acid are shifted by Δ/z.
– The m/z values of all fragment ions that do not contain the modified amino acid remain the same. However, their intensities may change significantly.
– Multiple modifications induce more complicated changes.
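The shift rules for a single modification can be written down directly. The sketch below is illustrative only: the function names are hypothetical, only b- and y-ions are considered, and the example uses the oxidized peptide GQGTLSVVTM{16}YHK at charge 2 with the nominal mass shift of 16.

```python
def fragment_shifts(length, site, delta, charge=1):
    """m/z shift of each b/y ion for one modification of mass delta.

    site is the 1-based position of the modified residue from the N-terminus.
    b_i holds residues 1..i; y_i holds the last i residues.
    """
    shifts = {}
    for i in range(1, length):
        shifts[f"b{i}"] = delta / charge if i >= site else 0.0
        shifts[f"y{i}"] = delta / charge if i >= length - site + 1 else 0.0
    return shifts


def precursor_shift(delta, charge):
    """Shift of the precursor m/z."""
    return delta / charge


sequence = "GQGTLSVVTMYHK"
site = sequence.index("M") + 1  # position 10
shifts = fragment_shifts(len(sequence), site, delta=16.0, charge=1)
shift_precursor = precursor_shift(16.0, charge=2)
```

Here b1 to b9 and y1 to y3 stay in place, while b10 to b12 and y4 to y12 move by 16 m/z units for singly charged fragments, and the doubly charged precursor moves by 8.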
Similarity Between Modified and
Unmodified Spectra
Oxidation of
GQGTLSVVTM{16}YHK/2
Phosphorylation of
TY{80}FPHFDLSHGSAQVK/2
QuickMod
Open modification search:
Spectral alignment and scoring
Controlling FDR
Ahrné et al. Recomb2011/JPR, submitted
Modification positioning
OMS: Spectrum Libraries Versus
Theoretical Spectra
QuickMod Scores
QuickMod score =
Linear SVM combination of 3 best scores
Z=2
Z=3
Benchmarking
Speed: InsPecT 30 min; PTMFinder 5 min; SpectraST 55 min; QuickMod 5 min
Modification Positioning
Candidate site   Ions containing the site   Score
C                b1, b2, b3                 -1 - 1 - 1 = -3
I                b2, b3, y3                 -1 - 1 + 1 = -1
S                b3, y2, y3                 -1 + 1 + 1 = +1
K                y1, y2, y3                 +1 + 1 + 1 = +3
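One way to read the CISK example is as a vote: for each candidate site, every b- or y-ion that would contain the modified residue contributes +1 if it is observed at its shifted m/z and -1 otherwise. The Python sketch below implements that reading. It is an assumption about the scheme rather than the QuickMod positioning algorithm, and the function names and the set of observed shifted ions are hypothetical.

```python
def ions_containing(length, site):
    """b- and y-ions that contain the residue at 1-based position site."""
    b_ions = [f"b{i}" for i in range(site, length)]
    y_ions = [f"y{i}" for i in range(length - site + 1, length)]
    return b_ions + y_ions


def position_scores(sequence, observed_shifted):
    """Score each candidate site: +1 per supporting ion, -1 per missing one."""
    scores = []
    for site, residue in enumerate(sequence, start=1):
        expected = ions_containing(len(sequence), site)
        score = sum(1 if ion in observed_shifted else -1 for ion in expected)
        scores.append((residue, score))
    return scores


# Assume y1, y2 and y3 are observed at shifted m/z, i.e. the site is K
scores = position_scores("CISK", observed_shifted={"y1", "y2", "y3"})
best_residue, best_score = max(scores, key=lambda item: item[1])
```

With these observations the scores are -3, -1, +1 and +3 for C, I, S and K, so K is selected as the modified residue.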
Multiple Modifications
• QuickMod is primarily designed for single modifications
• Double modifications can also be detected as long as the 2 modified residues are close together
• Positioning yields a region between the two modified amino acids
Modification Positioning
1) QuickMod workflow
2) Directed MS (inclusion list)
3) Complementary fragmentation: CID/HCD or MS3
(Figure labels: B2, Y2; HCD/CID; IK, IF, IH; Y3, Y4, Y7, Y5, Y8; CID)
QuickMod Tools
Java Proteomics Library (JPL)
http://javaprotlib.sourceforge.net/
Future Work
• Extend alignment to multiple modifications
• Develop modification specific scores and
positioning algorithms (phosphorylation)
• Work on combined sequence search and spectrum
library search
• Apply QuickMod to large datasets for phosphorylation and other modifications.
• Use it for verification of MS/MS assignments.
• …
Many Thanks to
Proteome Informatics Group
Swiss Institute of Bioinformatics
Swetha Ramagoni
Luc Mottin
Leelapavan Tadoori
Nottania Campbell
Erik Ahrné
Yuki Ohta
Frederic Nikitin
Rostyk Kuzyakiv
Dominique Kadio Koua
Patricia Palagi
Markus Müller
Frederique Lisacek
SIP-CUI
Fokko Beekhof
Oleksiy Koval
Slava Voloshynovskiy
SCAHT
Laurent Geiser
Florent Glück
Paola Antinori
Denis Hochstrasser
BPRG
Alex Scherl
Maria Ramirez-Boo
Xavier Robin
Alex Hainard
Natacha Turck
Jean-Charles Sanchez