Enhancing Sequence Coverage in Proteomics Studies by Using a Combination of Proteolytic Enzymes

Dominic Baeumlisberger², Christopher Kurz³, Tabiwang N. Arrey¹, Marion Rohmer², Carola Schiller³, Thomas Moehring¹, Walter A. Möller³ and Michael Karas²
¹Thermo Fisher Scientific, Bremen, Germany; ²Institute for Pharmaceutical Chemistry, Goethe-University, Frankfurt am Main, Germany; ³Department of Pharmacology, Goethe-University, Frankfurt am Main, Germany

Overview

Purpose: Increase sequence coverage and overall confidence of protein identification using a combination of datasets from three enzyme digests.

Methods: Peptides generated by proteolytic digestion of mitochondrial membrane proteins were analyzed using a hybrid quadrupole-Orbitrap mass spectrometer.

Results: Combining datasets from multiple enzyme digests improved the sequence coverage of proteins, increased the total number of unique peptides and protein groups identified, and minimized the false discovery rate.

Introduction

Besides being the main site of adenosine triphosphate (ATP) production, mitochondria are associated with a range of other processes and diseases, such as cell growth, cellular differentiation, mitochondrial disorders, aging and cardiac dysfunction. To obtain a better understanding of these mitochondrial processes and diseases, we need to identify the proteins and protein modifications involved.

The ability to identify and characterize large numbers of proteins from medium- to high-complexity samples has made mass spectrometry (MS) coupled to reversed-phase high-performance liquid chromatography (HPLC) a common analytical technique in proteomics. Usually, the extracted proteins are digested with a suitable protease and the resulting peptide mixture is separated and analyzed. Trypsin is the usual enzyme of choice for proteomics experiments. Digestion with trypsin (or any single enzyme in general) often results in the identification of large numbers of proteins, but sequence coverage is frequently incomplete. If maximum sequence coverage is desired (e.g. when studying changes in protein modification or different isoforms), then signals covering all or most of the protein sequence are needed. Different approaches have been used to improve protein sequence coverage in proteomics. In this study, data obtained from individual trypsin, chymotrypsin and elastase digests were combined to significantly improve sequence coverage of proteins.

Methods

Sample Preparation
Purified mitochondrial membrane proteins from mouse brain were dissolved in 25 mM triethylammonium bicarbonate buffer. Disulfide bridges were reduced with dithiothreitol, alkylated with iodoacetamide and digested overnight with trypsin, chymotrypsin or elastase. Digestion was stopped by freezing at −20°C. Just before separation, each digest was labeled with the Thermo Scientific Amine-Reactive Tandem Mass Tag (TMT0) Reagent to improve fragmentation, especially of the elastase- and chymotrypsin-generated peptides.

Liquid Chromatography
Samples were loaded onto a Thermo Scientific Acclaim PepMap100 C18 pre-column (100 µm × 2 cm, C18, 5 µm, 100 Å) and separated on a reversed-phase Acclaim PepMap100 C18 column (75 µm × 15 cm, C18, 3 µm, 120 Å) using the Thermo Scientific EASY-nLC 1000 nanoflow HPLC. A 90 min gradient at a flow rate of 300 nL/min was used for the separation. Triplicate runs of the individual enzyme digests were performed.

Mass Spectrometry
All MS and MS/MS spectra were acquired in positive ion mode using a Thermo Scientific Q Exactive hybrid quadrupole-Orbitrap mass spectrometer. Full-scan data were acquired at a resolution of 70,000 (at m/z 200) over the mass range 350–1800 Da, with a target of 1e6 ions. For tandem MS, a target of 1e5 charges was used and the fragment ions were measured at a resolution of 17,500 (at m/z 200). The 10 most intense ions in each spectrum were selected for fragmentation, with a maximum injection time of 200 ms.

Data Analysis
The raw data files were searched using Thermo Scientific Proteome Discoverer software v. 1.3 with the Mascot v. 2.2.1 search engine (Matrix Science Ltd, London, UK). The peptide tolerance was set at 15 ppm for MS and 20 mmu for MS/MS. A high-confidence peptide filter with an FDR of 1% was used.

Results

The Q Exactive mass spectrometer provides not only rich fragmentation but also immonium ions, which are important for peptide correlation. Coupled with the high resolution and high mass accuracy in both MS and MS/MS, this makes reliable identification possible, which is especially important for peptides generated using less-specific enzymes. Figure 1 shows triplicate runs of the individual enzyme digests. At the protein level, reproducibility rates of 69.9%, 62.3% and 58.25% were obtained for trypsin, elastase and chymotrypsin, respectively. At the peptide level, however, reproducibility decreased to 57%, 46.92% and 42.97%, respectively (see Figure 2).

FIGURE 1. Proteins identified in triplicate experiments of each enzyme digest.

FIGURE 2. Peptides identified in triplicate experiments of each enzyme digest.

Overall, 992 protein groups were identified across all enzyme digests, of which 18.25% were mitochondrial membrane proteins. Approximately 33% of the total number of identified proteins were present in the combined dataset (Figure 5). Combining the digests not only led to a significant increase in the number of protein groups identified but also enhanced the overall sequence coverage. However, the sequence coverage varied from protein to protein. For example, 100% or close to 100% sequence coverage was achieved for small proteins (< 100 amino acids) such as NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit or cytochrome b-c1 complex subunit, while for larger proteins (> 400 amino acids) such as cytochrome b-c1 complex subunit 2, shown in Figure 6, sequence coverage above 90% was obtained.

FIGURE 5. Total number of protein groups identified from triplicate runs of all enzymes. The highest number of proteins was identified with trypsin.

FIGURE 6. A) Sequence coverage achieved using different enzymes for the 453-amino-acid protein cytochrome b-c1 complex subunit 2 (panel labels: trypsin 87.86%, elastase 64.90%, chymotrypsin 60.26%, all 3 enzymes 94.26%). Green represents sections of the protein that were identified and white the sections that were not covered by any of the identified peptides. Relative to the single digests, the sequence coverage of the combined dataset increased by 7.3%, 45.2% and 56.4% for trypsin, elastase and chymotrypsin, respectively. Combining all datasets, a net increase of 32.8% is obtained. B) Comparison of sequence coverage from a single enzyme digest (trypsin) with that of the combined dataset for the identified membrane proteins. Dark blue bars represent the coverage obtained with trypsin alone and red bars the coverage from the sum of all enzymes used (x-axis: total number of identified membrane proteins; y-axis: sequence coverage).

In total, 12,007 peptides were identified from the combined triplicate datasets of the three enzyme digests. As expected, no peptide common to all three enzyme digests was identified. Less than 1% of the total number of identified peptides were identified in two enzyme digests. As shown in Figure 3, mostly unique peptides were identified, and the common peptide sequences in most cases cover regions that could not be identified by one enzyme digest. While the peptides shared between trypsin/chymotrypsin and trypsin/elastase mostly carried R or K at their C-termini, 54.05% of those shared between chymotrypsin and elastase were outside the defined cleavage sites of chymotrypsin (Y, W, F, M, L). Most of these peptides have A, V, L or S at their C-termini, which are typical cleavage sites for elastase.

FIGURE 3. Venn diagram showing unique peptides identified from triplicate experiments in all three enzyme digests. As expected, no identified peptide was common to all three enzyme preparations.

A common phenomenon observed with peptides generated by less-specific enzymes such as elastase is the absence of charge localization at either the N- or C-terminus. Fragmentation of these peptides results in a lack of extended b- or y-ion series and an increase in internal fragment ions. Owing to the basic moiety introduced by the TMT0 label, extended b-ion series were generated. Figure 4 shows an example: the tandem mass spectrum of the peptide AIQGGVLAGDVTDVLLLDVTPL, with a monoisotopic mass of 2408.38506.

FIGURE 4. Tandem MS and annotated spectrum of the peptide AIQGGVLAGDVTDVLLLDVTPL generated from the elastase digest (x-axis: m/z; y-axis: intensity). b-/a-type ions are shown in red and y-type ions in blue. The mass deviation of this peptide was 0.01 ppm (IonScore: 136) in MS and below 10 ppm for fragment ions in MS/MS.

The use of multiple enzyme digests in proteomic studies might enable proteolytic cleavage at sites further away from modified residues, thereby overcoming incomplete digestion caused by these protein modifications. For example, with a combination of datasets, peptides covering almost all known modifications (present in UniProt) of ATP synthase subunit beta were identified (Figure 7). This was not true for all the identified proteins; nevertheless, a reasonable number of modified peptides were identified. This shows that, to some extent, some portions of the proteome are simply inaccessible following digestion with a single protease. Therefore, in combination with technical replicates, multiple proteases can be used to significantly improve the sequence coverage of proteins from a proteome and to increase confidence in protein identification. In addition, proteins that were identified with only one of the enzymes would have been missed had a different single enzyme been used in this experiment.

FIGURE 7. Amino acid sequence of ATP synthase subunit beta showing the sections of the protein that were identified, with known modifications annotated (from UniProt). Acetylation is represented by A and phosphorylation by P.

Conclusion

The use of three different enzymes in proteomics studies increased the total number of identified peptides by approximately 227.5% and the number of identified protein groups by about 68.8% on average.
The use of three different enzymes led to an average increase in protein sequence coverage of about 31%.
The use of three different enzymes improved overall confidence in protein identification.
The use of three different enzymes aided the study of changes in protein sequences and post-translational modifications.
The high mass accuracy in both MS and MS/MS minimized the false discovery rate (FDR).
In spite of the increase in sequence coverage with multiple enzyme digests, the highest number of protein and peptide identifications for a single proteolytic digest was obtained with trypsin.

Mascot is a registered product of Matrix Science Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries.
This information is not intended to encourage use of these products in any manner that might infringe the intellectual property rights of others.
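
Worked example: sequence coverage from combined digests. The idea behind Figure 6, that peptides from several digests are merged before coverage is computed, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used for the poster (which relied on Proteome Discoverer and Mascot): the toy protein sequence and peptide lists are invented, and the function names `covered_positions`, `percent_coverage` and `relative_gain` are hypothetical. Only the four Figure 6A panel percentages are taken from the poster.

```python
def covered_positions(protein, peptides):
    """Return the set of 0-based residue indices covered by at least one peptide."""
    covered = set()
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence, not only the first
            covered.update(range(start, start + len(peptide)))
            start = protein.find(peptide, start + 1)
    return covered


def percent_coverage(protein, peptides):
    """Sequence coverage in percent: covered residues / protein length."""
    return 100.0 * len(covered_positions(protein, peptides)) / len(protein)


def relative_gain(single, combined):
    """Relative increase in percent when going from `single` to `combined`."""
    return 100.0 * (combined - single) / single


# Hypothetical toy protein and peptide identifications (not data from the poster).
PROTEIN = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
DIGESTS = {
    "trypsin": ["TAYIAK", "QISFVK"],
    "chymotrypsin": ["SRQLEERL"],
    "elastase": ["GLIEV"],
}

# Figure 6A panel labels from the poster (percent sequence coverage).
FIG6A = {"trypsin": 87.86, "elastase": 64.90, "chymotrypsin": 60.26}
FIG6A_COMBINED = 94.26

if __name__ == "__main__":
    for enzyme, peptides in DIGESTS.items():
        print(f"{enzyme:>12}: {percent_coverage(PROTEIN, peptides):.1f}% (toy data)")
    merged = [p for peptides in DIGESTS.values() for p in peptides]
    print(f"{'combined':>12}: {percent_coverage(PROTEIN, merged):.1f}% (toy data)")

    for enzyme, single in FIG6A.items():
        gain = relative_gain(single, FIG6A_COMBINED)
        print(f"{enzyme:>12}: +{gain:.1f}% relative to the single digest")
```

With the toy data, the merged peptide list covers 25 of 33 residues, more than any single digest. Applied to the Figure 6A values, `relative_gain` gives 7.3%, 45.2% and 56.4% for trypsin, elastase and chymotrypsin, which matches the figures quoted in the caption. Comparing 94.26% with the mean of the three single-digest values gives roughly 32.7%, close to the quoted net increase of 32.8%; whether the net figure was computed this way is an assumption.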
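
Worked example: mass tolerances in ppm and mmu. The database search used a relative tolerance for precursors (15 ppm) and an absolute tolerance for fragments (20 mmu). The sketch below shows how the two units relate; it is a minimal illustration, not a description of how Mascot applies tolerances internally. The function names are hypothetical and all "observed" values are invented; the only number taken from the poster, besides the two tolerances, is the monoisotopic mass quoted for the Figure 4 peptide.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def mmu_error(observed, theoretical):
    """Signed mass error in milli mass units (1 mmu = 0.001 Da)."""
    return (observed - theoretical) * 1e3


def precursor_ok(observed, theoretical, tol_ppm=15.0):
    """True if the precursor is within the relative (ppm) tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_ok(observed, theoretical, tol_mmu=20.0):
    """True if the fragment is within the absolute (mmu) tolerance."""
    return abs(mmu_error(observed, theoretical)) <= tol_mmu


def mmu_as_ppm(tol_mmu, mz):
    """Express an absolute tolerance in mmu as ppm at a given m/z."""
    return tol_mmu * 1000.0 / mz


# Monoisotopic mass quoted for the Figure 4 peptide; observed values are invented.
THEORETICAL = 2408.38506

if __name__ == "__main__":
    for observed in (2408.40, 2408.45):  # hypothetical measurements
        print(observed, f"{ppm_error(observed, THEORETICAL):.2f} ppm",
              precursor_ok(observed, THEORETICAL))
    for mz in (200.0, 1000.0, 2000.0):
        print(f"20 mmu at m/z {mz:.0f} corresponds to {mmu_as_ppm(20.0, mz):.0f} ppm")
```

An absolute tolerance is comparatively loose for small fragments and tight for large ones: 20 mmu corresponds to 100 ppm at m/z 200 but only 10 ppm at m/z 2000.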
