Chapter 2: Target/decoy search strategy for increased

Supplementary Methods
Sample processing
Two experimental data sets were used in this study for illustrative purposes. The
larger, human sample, representing a typical proteome-scale analytical sample, was
derived from a fraction of SDS-PAGE-separated Jurkat lysate. The second, yeast-derived
data set, in which correct identifications were present in a much higher proportion, was
acquired as part of a previous study [1]. For each protein source, whole-cell lysate was
alkylated with iodoacetamide and fractionated by SDS-PAGE (4-12% gradient Tris-glycine
gel, Invitrogen). Proteins were then digested in-gel with trypsin, extracted, and
cleaned up by off-line desalting with Sep-Pak C18 solid-phase resin (Waters, Milford, MA).
Lyophilized samples were redissolved in 7.5% acetonitrile/5% formic acid to a final
concentration of approximately 1 µg/µL.
Liquid chromatography and tandem mass spectrometry (LC-MS/MS)
LC-MS/MS experiments were performed on an LTQ FT mass spectrometer (Thermo
Electron, San Jose, CA) equipped with an Agilent 1100 high-performance liquid
chromatography (HPLC) pump (Agilent Technologies, Palo Alto, CA) and a Famos
autosampler (LC Packings, San Francisco, CA). Peptide mixtures were introduced into
the mass spectrometer via a fused-silica microcapillary column (internal diameter = 125
µm) ending in an in-house pulled needle tip (internal diameter ~5 µm). Columns were
packed to a length of 18 cm with C18 reversed-phase resin (Magic C18AQ, Michrom
Bioresources, Auburn, CA). Approximately 2 µg of sample was loaded onto
the column. Peptides were eluted into the electrospray ionization source of the mass
spectrometer with a linear gradient of 12 to 33% buffer B (2.5% water and 0.1% formic
acid in acetonitrile (v/v)) in buffer A (2.5% acetonitrile and 0.1% formic acid in water
(v/v)) over 96 minutes (Jurkat sample) or 3 to 37% buffer B over 90 minutes (yeast
sample), followed by a high-organic wash (100% buffer B, 6 minutes) and a column
reconditioning wash (100% buffer A, 50 minutes). Eluting peptides were analyzed by
the LTQ FT mass spectrometer operating in data-dependent mode. For the Jurkat
sample, eight ion-trap MS/MS spectra were acquired per data-dependent cycle from a
high-resolution (R = 100,000) FTICR master spectrum (m/z 350–1700). The yeast
sample was analyzed with a SIM3 method as described [1].
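A data-dependent cycle of this kind can be pictured as choosing the most intense survey-scan peaks for fragmentation. The Python sketch below is a deliberately simplified, hypothetical illustration: it ranks peaks only by intensity within the stated m/z 350–1700 survey range and selects eight of them. Real instrument software applies additional precursor-selection rules that the text does not describe, so this is not the LTQ FT acquisition logic.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) from one FTICR survey spectrum


def select_precursors(ms1_peaks: List[Peak], top_n: int = 8,
                      mz_min: float = 350.0, mz_max: float = 1700.0) -> List[float]:
    """Return the m/z of the top_n most intense survey peaks inside the scanned
    mass range, most intense first (simplified data-dependent selection)."""
    in_range = [p for p in ms1_peaks if mz_min <= p[0] <= mz_max]
    in_range.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in in_range[:top_n]]


if __name__ == "__main__":
    # Hypothetical survey scan: one very intense peak below the range plus ten in range.
    peaks = [(300.0, 1e9)] + [(400.0 + 50.0 * i, float(i + 1)) for i in range(10)]
    print(select_precursors(peaks))
```

In the usage example, the peak at m/z 300 is skipped despite its intensity because it lies outside the survey range, and only the eight most intense in-range peaks are returned.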
Data processing
Resulting MS/MS spectra were searched with the SEQUEST [2] or Mascot [3] algorithms (where
noted) against a composite sequence database consisting of sample-appropriate
protein sequences downloaded from the Saccharomyces Genome Database (SGD,
Stanford University, CA) (yeast sample) or the minimally redundant human sequences
stored in the International Protein Index [4] (downloaded February 2006), common
contaminant protein sequences, and reversed versions of these sequences.
Alternatively, where indicated, these data were searched separately against either the
downloaded sequences (target) or the reversed sequences (decoy). Pseudo-reversed
sequences were generated on the fly by the implementation of SEQUEST on the
SEQUEST Sorcerer platform (Sage-N Research, San Jose, CA). Random and Markov-chain
decoy sequence databases were constructed from the amino acid frequencies in the
target database using an in-house algorithm written in Perl. For the Markov database,
each new residue was selected based on the preceding four residues.
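The decoy types named above can be sketched in Python. This is not the in-house Perl program, and it does not reproduce the Sorcerer pseudo-reversal; it is a minimal illustration assuming one decoy per target protein of the same length, a hypothetical `REV_` accession prefix for reversed entries, and a fallback to overall residue frequencies whenever a four-residue context never occurs in the targets (the text does not say how such cases were handled).

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def reverse_decoy(seq: str) -> str:
    """Reversed decoy: the target sequence read from C- to N-terminus."""
    return seq[::-1]


def composite_database(targets: Dict[str, str]) -> Dict[str, str]:
    """Concatenated target/decoy database: each target plus its reversed copy.
    The 'REV_' accession prefix is a hypothetical labeling convention."""
    db = dict(targets)
    for acc, seq in targets.items():
        db["REV_" + acc] = reverse_decoy(seq)
    return db


def aa_frequencies(targets: List[str]) -> Dict[str, float]:
    """Relative frequency of each residue across all target sequences."""
    counts = Counter("".join(targets))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def random_decoy(length: int, freqs: Dict[str, float], rng: random.Random) -> str:
    """Random decoy: residues drawn independently from the target frequencies."""
    residues = list(freqs)
    weights = [freqs[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def train_markov(targets: List[str], order: int = 4) -> Dict[str, Counter]:
    """Count which residue follows each `order`-residue context in the targets."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for seq in targets:
        for i in range(order, len(seq)):
            model[seq[i - order:i]][seq[i]] += 1
    return model


def markov_decoy(length: int, model: Dict[str, Counter], freqs: Dict[str, float],
                 order: int, rng: random.Random) -> str:
    """Markov-chain decoy: each residue is drawn conditional on the preceding
    `order` residues. The first residues, and any context never seen in the
    targets, fall back to overall frequencies (an assumption)."""
    seq: List[str] = []
    for _ in range(length):
        context = "".join(seq[-order:]) if len(seq) >= order else None
        counts = model.get(context) if context is not None else None
        if counts:
            residues, weights = zip(*counts.items())
        else:
            residues, weights = zip(*freqs.items())
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


if __name__ == "__main__":
    # Hypothetical toy targets; real databases hold thousands of proteins.
    targets = {"P1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
               "P2": "MSDKIIHLTDDSFDTDVLKADGAILVDFWAEW"}
    rng = random.Random(42)
    freqs = aa_frequencies(list(targets.values()))
    model = train_markov(list(targets.values()), order=4)
    print(composite_database(targets)["REV_P1"])
    for acc, seq in targets.items():
        print(acc, random_decoy(len(seq), freqs, rng),
              markov_decoy(len(seq), model, freqs, 4, rng))
```

Reversed decoys preserve each target's length and composition exactly; random and Markov-chain decoys match the target statistics only on average, with the fourth-order model additionally mimicking local sequence patterns.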
All SEQUEST searches were performed on the SEQUEST Sorcerer platform. All
Mascot searches were performed on an in-house dual-processor Linux server.
Searches against the human databases used the following parameters: at least one
tryptic terminus for all considered peptides, a precursor mass tolerance of ±50 ppm,
variable oxidation of methionine residues (+15.99491 Da), and static modification of
cysteine residues with iodoacetamide (+57.02146 Da). Fragment ion mass tolerances for
SEQUEST and Mascot searches were left at their default values.
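To make the tolerance and modification settings concrete, the Python sketch below checks whether an observed mass falls within ±50 ppm of a theoretical mass and enumerates the modification states a search would consider for one peptide (every cysteine shifted by +57.02146 Da, each methionine optionally shifted by +15.99491 Da). The function names and the example masses are hypothetical, and real search engines do considerably more (for example, fragment-ion matching), which is not shown.

```python
from typing import List

# Modification mass shifts (Da) taken from the search parameters above.
OXIDATION_MET = 15.99491        # variable: each Met may or may not carry it
CARBAMIDOMETHYL_CYS = 57.02146  # static: every Cys carries it


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float = 50.0) -> bool:
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def candidate_masses(unmodified_mass: float, n_cys: int, n_met: int) -> List[float]:
    """Masses to test for one peptide: static shift on every Cys, plus
    0..n_met oxidized methionines (variable modification)."""
    base = unmodified_mass + n_cys * CARBAMIDOMETHYL_CYS
    return [base + k * OXIDATION_MET for k in range(n_met + 1)]


if __name__ == "__main__":
    # At 1,500 Da, +/-50 ppm corresponds to +/-0.075 Da.
    print(within_tolerance(1500.07, 1500.0))  # True (about 46.7 ppm)
    print(within_tolerance(1500.08, 1500.0))  # False (about 53.3 ppm)
```

Because the window is relative, the absolute tolerance scales with mass: ±50 ppm is ±0.05 Da at 1,000 Da but ±0.15 Da at 3,000 Da.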
Modeling the error associated with measured FP rates was performed as follows.
Software was written to simulate the effect filtering criteria have on FP estimates
derived from set numbers of correct and incorrect PSMs. This program exploited the
target-decoy principle and therefore relied on the same assumptions explained
previously, namely that all decoy hits are incorrect and that there are equal numbers of
incorrect target and decoy hits. The program took as input the total number of hits to
consider and the fraction of them that are actually correct (i.e., precision, Table 1). These
correct hits were assigned a “target” state. Each of the remaining incorrect hits was
randomly assigned a “target” or “decoy” state. Once all hits were assigned a state, the
estimated fraction of incorrect hits was calculated by doubling the number of decoy hits
and dividing by the total number of hits; subtracting this fraction from 1 gave the
estimated precision. The estimated precision was then subtracted from the predetermined
precision to give the deviation between the actual and estimated precision (estimation
error). This process was repeated 100,000 times to create a distribution of estimation
errors from which a standard deviation was derived. Such standard deviations were
computed for many combinations of total hits and precision (Fig. 5a).
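The simulation described above can be sketched in Python with NumPy. Rather than assigning each incorrect hit individually, this sketch draws the number of decoys per trial from a binomial distribution with probability 1/2, which is statistically equivalent and much faster. The function name, the rounding of the correct-hit count, the fixed random seed, and the input grid are illustrative assumptions, not details of the original software.

```python
import numpy as np


def simulate_estimation_error(total_hits: int, precision: float,
                              n_trials: int = 100_000, seed: int = 0) -> float:
    """Standard deviation of (actual precision - target/decoy estimated precision)
    over n_trials simulated data sets."""
    rng = np.random.default_rng(seed)
    n_correct = round(total_hits * precision)  # correct hits are always "target"
    n_incorrect = total_hits - n_correct
    actual_precision = n_correct / total_hits
    # Each incorrect hit is "target" or "decoy" with probability 1/2, so the
    # number of decoys in one trial is Binomial(n_incorrect, 0.5).
    n_decoy = rng.binomial(n_incorrect, 0.5, size=n_trials)
    # Target/decoy estimate: incorrect fraction = 2 * decoys / total hits.
    estimated_precision = 1.0 - 2.0 * n_decoy / total_hits
    estimation_error = actual_precision - estimated_precision
    return float(np.std(estimation_error))


if __name__ == "__main__":
    # Hypothetical grid of inputs, in the spirit of Fig. 5a.
    for total in (100, 1_000, 10_000):
        for prec in (0.5, 0.9, 0.99):
            print(total, prec, round(simulate_estimation_error(total, prec), 4))
```

Because the decoy count is binomial with variance n_incorrect/4, the simulated standard deviation should approach sqrt((1 - p)/N) for N total hits at precision p; for example, 1,000 hits at 90% precision gives about 0.01 (one percentage point), and the error shrinks as either the number of hits or the precision grows.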
For estimating the frequencies of incorrect identifications and establishing PSM
selection criteria, redundant PSMs were first removed, keeping the top-scoring PSM as
the single representative. Confidently assigned peptide hits were selected by an in-house
program similar in principle to one previously described [5] that took into account charge,
tryptic and missed-cleavage states, and SEQUEST's XCorr and ΔCn scores, or
Mascot's Ion Score and homology factor [6]. With this program, low-confidence peptides,
such as those with one tryptic terminus, multiple missed cleavages, and low scores, were
automatically excluded from subsequent analyses.
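A minimal Python sketch of the deduplication, filtering, and target/decoy FP estimate described above follows. It is not the in-house program: it assumes that redundancy means an identical peptide sequence, considers only charge-specific XCorr cutoffs and a ΔCn cutoff (tryptic and missed-cleavage states are omitted), and uses hypothetical threshold values chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class PSM:
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float
    is_decoy: bool


def deduplicate(psms: Iterable[PSM]) -> List[PSM]:
    """Keep only the top-scoring (highest XCorr) PSM for each peptide sequence."""
    best: Dict[str, PSM] = {}
    for psm in psms:
        if psm.peptide not in best or psm.xcorr > best[psm.peptide].xcorr:
            best[psm.peptide] = psm
    return list(best.values())


def passes(psm: PSM, min_xcorr: Dict[int, float], min_delta_cn: float) -> bool:
    """Charge-dependent XCorr cutoff plus a ΔCn cutoff (hypothetical thresholds).
    Charges without a listed cutoff are rejected."""
    return (psm.xcorr >= min_xcorr.get(psm.charge, float("inf"))
            and psm.delta_cn >= min_delta_cn)


def estimated_fp_rate(psms: List[PSM]) -> float:
    """Target/decoy estimate: twice the passing decoy hits divided by all passing hits."""
    if not psms:
        return 0.0
    n_decoy = sum(p.is_decoy for p in psms)
    return 2.0 * n_decoy / len(psms)


if __name__ == "__main__":
    # Hypothetical PSMs and cutoffs.
    data = [PSM("PEPTIDEK", 2, 3.1, 0.30, False), PSM("PEPTIDEK", 2, 2.5, 0.20, False),
            PSM("LGVNDSKR", 2, 2.9, 0.25, False), PSM("KEDITPEP", 2, 2.7, 0.22, True)]
    kept = [p for p in deduplicate(data) if passes(p, {1: 1.8, 2: 2.5, 3: 3.5}, 0.1)]
    print(len(kept), estimated_fp_rate(kept))
```

In practice, cutoffs like these would be tightened or relaxed until the estimated FP rate of the passing set reaches an acceptable level; the toy data set here is far too small for a meaningful estimate.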
Additional computational experiments were conducted using a combination of software
written in Perl and PHP, with additional support from Microsoft Excel and a MySQL
database.
References
1. Haas, W., Faherty, B.K., Gerber, S.A., Elias, J.E., Beausoleil, S.A., Bakalarski, C.E.,
   Li, X., Villen, J. & Gygi, S.P. Optimization and use of peptide mass measurement
   accuracy in shotgun proteomics. Mol Cell Proteomics (2006).
2. Eng, J.K., McCormack, A.L. & Yates, J.R., 3rd. An approach to correlate tandem mass
   spectral data of peptides with amino acid sequences in a protein database.
   J Am Soc Mass Spectrom 5, 976-989 (1994).
3. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Probability-based protein
   identification by searching sequence databases using mass spectrometry data.
   Electrophoresis 20, 3551-3567 (1999).
4. Kersey, P.J., Duarte, J., Williams, A., Karavidopoulou, Y., Birney, E. & Apweiler, R.
   The International Protein Index: an integrated database for proteomics experiments.
   Proteomics 4, 1985-1988 (2004).
5. Kislinger, T., Rahman, K., Radulovic, D., Cox, B., Rossant, J. & Emili, A. PRISM,
   a generic large scale proteomic investigation strategy for mammals.
   Mol Cell Proteomics 2, 96-106 (2003).
6. Elias, J.E., Haas, W., Faherty, B.K. & Gygi, S.P. Comparative evaluation of mass
   spectrometry platforms used in large-scale proteomics investigations.
   Nat Methods 2, 667-675 (2005).