Supplemental Data Text

Statistical calculations
Having established the group dynamics of the dataset of 406 SSPs, we tested how accurately any given subject could be assigned to a specific group based solely on their individual SSP profile. The Q2 measures of group predictability were 0.78, 0.54, and 0.69 for never, light, and heavy smokers, respectively. In this analysis, 1.00 defines perfect prediction. We further modeled the results of these analyses to test and establish error rates for the weighted value of individual SSPs on group separations. We applied the Random Forests test, a method that assesses variable importance, to determine the specific linkage of individual SSPs to group distributions. Each SSP group was ranked for overall discrimination power and association with any of the three groups. We established an error rate of 27% in association with as few as 12 SSP determinants. Overall, these results showed significant linkage between SSP scores and specific group membership, and confirmed the results of the PLS-DA analysis. Random Forest prediction was done in the R package randomForest (1-4).

Mass spectrometry
Protein identification was achieved by peptide mass fingerprinting using MALDI-TOF MS on a Voyager DE-PRO and an AB 4700 (Applied Biosystems, Framingham, MA, USA). The instrument was operated in reflector mode at an accelerating voltage of 20 kV. Samples were deposited by spotting onto stainless steel MALDI target plates, according to a previously described procedure (5). Three-centimeter microcapillaries packed with 4-6 mm POROS-50 beads were used for sample trace enrichment, giving high sensitivity and improved protein identification.
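For readers unfamiliar with the Q2 statistic quoted under Statistical calculations, the conventional cross-validated definition is reproduced here. The text does not state the formula used, so this is the standard textbook form and only an assumption about how the reported values were computed. The symbols are introduced for illustration: y_i is the observed response for subject i (for PLS-DA, typically a 0/1 class-membership indicator), \hat{y}_{(i)} is the prediction for subject i from a model fitted without that subject, and \bar{y} is the mean response.

```latex
Q^2 \;=\; 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}_{\mathrm{tot}}}
    \;=\; 1 - \frac{\sum_{i=1}^{n} \bigl( y_i - \hat{y}_{(i)} \bigr)^2}
                   {\sum_{i=1}^{n} \bigl( y_i - \bar{y} \bigr)^2}
```

Under this definition Q2 equals 1.00 only when every held-out prediction matches the observed value, which is consistent with the statement that 1.00 defines perfect prediction.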

Mass spectrometry identification
Individual protein spots on the 2-D gels were assigned unique standard spot numbers (SSPs). The amount of protein in a spot was assessed as background-corrected optical density, integrated over all pixels in the spot and expressed as integrated optical density (IOD). To normalize for differences in total staining intensity between different 2-DE images, the amount of each spot was expressed as the percentage of the individual spot IOD relative to the total IOD of all spots (% IOD). The results were manually inspected and corrected when needed. We chose to perform further detailed analysis of spot identification using a conservative set point that required a presence rate of ≥ 70% (Figure 3, red bars) in either the never smoking or the smoking cohorts. Representative spots from the 406 SSP annotations were identified by MALDI-TOF MS and MALDI-TOF/TOF MS using the MASCOT search engine, with the statistical criteria as previously described (6). In total, MS and MS/MS identified 200 proteins in the BAL proteome, of which about 15% were exclusively present, or significantly regulated, within the smoking subject groups. Supplemental Data Table 1 shows examples of regulated and non-regulated SSP protein identities that were found at presence rates ≥ 70% in either the never smoking or the smoking groups. These included a number of well-described proteins associated with redox reactions, immune reactivity, and inflammation.

References
1. Eriksson J, Chait BT, Fenyo D. A statistical basis for testing the significance of mass spectrometric protein identification results. Anal Chem 2000;72:999-1005.
2. Breiman L. Random Forests. Machine Learning 2001;45:5.
3. Ihaka R, Gentleman R. R: A language for data analysis and graphics. J Computational and Graphical Statistics 1996;5:299-314.
4. Eriksson L, Johansson L, Kettaneh-Wold N, Wold S. Multi- and Megavariate Data Analysis: Principles and Applications. Umetrics AB, 2001.
5. Plymoth A, Lofdahl CG, Ekberg-Jansson A, Dahlback M, Lindberg H, Fehniger TE, Marko-Varga G. Human bronchoalveolar lavage: biofluid analysis with special emphasis on sample preparation. Proteomics 2003;3:962-72.
6. Pappin DJ, Hojrup P, Bleasby AJ. Rapid identification of proteins by peptide-mass fingerprinting. Curr Biol 1993;3:327-32.
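
The % IOD normalization and the ≥ 70% presence-rate set point described under Mass spectrometry identification can be illustrated with a short sketch. This is not the image-analysis software used in the study. The function names, the SSP labels, and the IOD values are hypothetical, and treating a spot as "present" when its IOD is greater than zero is an assumption made only for this example.

```python
from typing import Dict, List

Gel = Dict[str, float]  # maps an SSP label to its background-corrected IOD


def percent_iod(gel: Gel) -> Gel:
    """Express each spot's IOD as a percentage of the gel's total IOD."""
    total = sum(gel.values())
    if total <= 0:
        raise ValueError("total IOD must be positive")
    return {ssp: 100.0 * iod / total for ssp, iod in gel.items()}


def presence_rate(gels: List[Gel], ssp: str) -> float:
    """Fraction of gels in which the spot is present.

    Assumption: a spot counts as present when its IOD is > 0.
    """
    if not gels:
        return 0.0
    present = sum(1 for gel in gels if gel.get(ssp, 0.0) > 0.0)
    return present / len(gels)


def select_spots(never: List[Gel], smokers: List[Gel],
                 threshold: float = 0.70) -> List[str]:
    """Keep SSPs whose presence rate reaches the threshold in either cohort."""
    all_ssps = set().union(*never, *smokers)
    return sorted(
        ssp for ssp in all_ssps
        if presence_rate(never, ssp) >= threshold
        or presence_rate(smokers, ssp) >= threshold
    )


# Hypothetical data: three gels per cohort, made-up SSP labels and IODs.
never_gels = [
    {"1101": 30.0, "2205": 10.0},
    {"1101": 20.0, "2205": 0.0},
    {"1101": 25.0},
]
smoker_gels = [
    {"1101": 5.0, "3307": 15.0},
    {"3307": 12.0, "2205": 3.0},
    {"3307": 9.0, "1101": 1.0},
]

print(percent_iod(never_gels[0]))
print(select_spots(never_gels, smoker_gels))
```

Normalizing within each gel before comparing gels removes differences in overall staining intensity, and applying the cut-off per cohort retains spots that are consistently seen in only one group.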