Supplemental Data Text
Statistical calculations
Having established the group dynamics of the dataset of 406 SSPs, we tested how accurately any given subject could be assigned to a specific group based solely on their individual SSP profile. The Q² values for group predictability were 0.78, 0.54, and 0.69 for never, light, and heavy smokers, respectively, where a Q² of 1.00 indicates perfect prediction. We further modeled these results to establish error rates for the weighted contribution of individual SSPs to group separation. To determine the specific linkage of individual SSPs to group distributions, we applied Random Forests, a method that assesses variable importance. Each SSP was ranked for overall discrimination power and for association with each of the three groups. With as few as 12 SSP determinants, the error rate was 27%. Overall, these results showed significant linkage between SSP scores and specific group membership and confirmed the results of the PLS-DA analysis. Random Forest prediction was performed with the R package randomForest (1-4).
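The Q² values above can be illustrated with the standard cross-validated definition Q² = 1 - PRESS/SS, where PRESS is the sum of squared differences between observed values and cross-validated predictions and SS is the total sum of squares of the response around its mean. The Python sketch below shows only that arithmetic for a single group, assuming group membership is encoded as a 0/1 dummy response and that cross-validated predictions are already available. The function name `q_squared` and the toy values are hypothetical, and the sketch does not reproduce the PLS-DA model fitting or the Random Forests analysis.

```python
def q_squared(y_true, y_pred_cv):
    """Cross-validated Q² = 1 - PRESS / SS for one response variable.

    Assumption (not from the source): y_true is a 0/1 group-membership dummy
    (e.g. 1 = never smoker) and y_pred_cv holds cross-validated predictions,
    i.e. each value comes from a model fitted without that subject.
    """
    if not y_true or len(y_true) != len(y_pred_cv):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # PRESS: predictive residual sum of squares
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_pred_cv))
    # SS: total sum of squares of the response around its mean
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    if ss_total == 0:
        raise ValueError("response has no variance; Q² is undefined")
    return 1.0 - press / ss_total


# Toy data for illustration only, not study data.
membership = [1, 1, 0, 0]
cv_predictions = [0.9, 0.8, 0.1, 0.2]
print(round(q_squared(membership, cv_predictions), 2))  # 0.9
print(q_squared(membership, membership))  # perfect prediction: 1.0
```

A Q² near 1.00 means the cross-validated predictions reproduce group membership almost exactly; lower values, such as the 0.54 reported for light smokers, indicate weaker separation.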
Mass spectrometry
Protein identification was achieved by peptide mass fingerprinting using MALDI-TOF MS on a Voyager DE-PRO and an AB 4700 (Applied Biosystems, Framingham, MA, USA). The instrument was operated in reflector mode at an accelerating voltage of 20 kV. Samples were deposited onto stainless steel MALDI target plates according to a previously described procedure (5). Three-cm microcapillaries packed with 4-6 mm POROS-50 beads were used for trace enrichment of the samples, which increased sensitivity and improved protein identification.
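Peptide mass fingerprinting compares observed peptide masses with masses predicted from an in silico digest of candidate protein sequences. The Python sketch below illustrates that idea under simplifying assumptions that are not stated in the source: trypsin cleavage after K or R except before P, no missed cleavages, no modifications (cysteines left unmodified), singly protonated monoisotopic [M+H]+ ions, and a hypothetical 50 ppm tolerance. It only counts matched peaks; MASCOT applies its own probability-based scoring, which this sketch does not reproduce. All function names and the toy sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (N- and C-termini)
PROTON = 1.00728   # charge carrier for the singly protonated [M+H]+ ion


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def count_matches(observed_masses, sequence, tol_ppm=50.0):
    """Number of observed peaks within tol_ppm of any theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        1
        for obs in observed_masses
        if any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


# Toy example (hypothetical sequence and peaks, not study data).
print(tryptic_peptides("PEPTIDEKAR"))  # ['PEPTIDEK', 'AR']
print(round(mh_mass("AR"), 4))  # 246.1561
print(count_matches([246.156, 500.0], "PEPTIDEKAR"))  # 1
```

A candidate protein with many matched peaks is more likely to be the correct identification; real search engines also weigh how many matches would be expected by chance.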
Mass spectrometry identification
Individual protein spots on the 2-D gels were assigned unique standard spot numbers (SSPs). The amount of protein in each spot was assessed as the background-corrected optical density integrated over all pixels in the spot and expressed as integrated optical density (IOD). To normalize for differences in total staining intensity between 2-DE images, the amount of each spot was expressed as the percentage of the individual spot IOD relative to the total IOD of all spots (% IOD). The results were inspected manually and corrected where needed.
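The quantification and normalization steps above reduce to simple arithmetic. The Python sketch below assumes a single constant background optical density per spot and a dictionary keyed by SSP number; the data layout, the function names, and the SSP labels are hypothetical, and the image analysis software actually used may estimate background differently.

```python
def spot_iod(pixel_ods, background_od):
    """Background-corrected optical density summed over all pixels of one spot.

    Assumption: one constant background OD per spot (hypothetical simplification).
    """
    return sum(od - background_od for od in pixel_ods)


def percent_iod(iod_by_ssp):
    """Express each spot's IOD as a percentage of the total IOD on the same gel."""
    total = sum(iod_by_ssp.values())
    if total <= 0:
        raise ValueError("total IOD must be positive")
    return {ssp: 100.0 * iod / total for ssp, iod in iod_by_ssp.items()}


# Toy gel with three spots (hypothetical SSP labels, not study data).
gel = {"SSP 1101": 50.0, "SSP 2203": 150.0, "SSP 3305": 300.0}
print(percent_iod(gel))  # {'SSP 1101': 10.0, 'SSP 2203': 30.0, 'SSP 3305': 60.0}
print(spot_iod([0.75, 1.25, 1.0], background_od=0.25))  # 2.25
```

Because every spot is divided by the same per-gel total, the % IOD values on each gel sum to 100, which removes overall staining differences between gels.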
For detailed analysis of spot identification, we applied a conservative threshold that required a presence rate of ≥70% (Figure 3, red bars) in either the never-smoking or the smoking cohort. Representative spots from the 406 annotated SSPs were identified by MALDI-TOF MS and MALDI-TOF/TOF MS using the MASCOT search engine, with the statistical criteria previously described (6). In total, MS and MS/MS identified 200 proteins in the BAL proteome, of which approximately 15% were either present exclusively in, or significantly regulated within, the smoking subject groups.
Supplemental Data Table 1 shows examples of regulated and non-regulated SSP protein identities found at presence rates of ≥70% in either the never-smoking or the smoking groups. These included a number of well-described proteins associated with redox reactions, immune reactivity, and inflammation.
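The ≥70% presence-rate criterion can be read as a simple filter: a spot is retained if it is detected in at least 70% of the subjects in either cohort. The Python sketch below assumes detection is recorded as one boolean per subject; the function names and data layout are hypothetical.

```python
def presence_rate(detected):
    """Fraction of subjects in which the spot was detected."""
    if not detected:
        raise ValueError("cohort must contain at least one subject")
    return sum(detected) / len(detected)


def passes_presence_filter(never_smokers, smokers, threshold=0.70):
    """True if the spot is present in at least `threshold` of EITHER cohort."""
    return max(presence_rate(never_smokers), presence_rate(smokers)) >= threshold


# Toy detection calls for one spot (illustrative, not study data).
never = [True] * 7 + [False] * 3   # 70% of never smokers
smoke = [True] * 2 + [False] * 8   # 20% of smokers
print(passes_presence_filter(never, smoke))  # True
print(passes_presence_filter(smoke, smoke))  # False
```

Using the maximum of the two cohort rates keeps spots that are common in only one cohort, which is what allows proteins present exclusively in smokers to pass the filter.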
References
1. Eriksson J, Chait BT, Fenyo D. A statistical basis for testing the significance of mass spectrometric protein identification results. Anal Chem 2000;72:999-1005.
2. Breiman L. Random Forests. Machine Learning 2001;45:5.
3. Ihaka R, Gentleman R. R: A language for data analysis and graphics. J Computational and Graphical Statistics 1996;5:299-314.
4. Eriksson L, Johansson L, Kettaneh-Wold N, Wold S. Multi- and Megavariate Data Analysis: Principles and Applications. Umetrics AB, 2001.
5. Plymoth A, Lofdahl CG, Ekberg-Jansson A, Dahlback M, Lindberg H, Fehniger TE, Marko-Varga G. Human bronchoalveolar lavage: biofluid analysis with special emphasis on sample preparation. Proteomics 2003;3:962-72.
6. Pappin DJ, Hojrup P, Bleasby AJ. Rapid identification of proteins by peptide-mass fingerprinting. Curr Biol 1993;3:327-32.