INF380 – Proteomics
Chapter 3 – Protein digestion

• An important part of the identification process is protein digestion: cleaving the proteins into peptides.
• The proteins are experimentally digested (cleaved) into peptides by enzymes called proteases, which are active in all organisms.
• Numerous proteases from numerous species, ranging from humans to bacteria, are known and characterized.
• Because proteases used in proteomic research must have specific properties, only a few of them are routinely used.
• The peptides are analysed by mass spectrometry, producing a mass spectrum. This spectrum is then compared to the theoretical peptide masses from in silico digestion of database sequences.

Protein digestion

• Let PE be the set of masses from a spectrum, assumed to come from one protein.
• Let PT be the set of theoretical masses from in silico digestion (simulating the protease) of the same protein.
• Then, in an ideal situation and without modifications, PE = PT. In the real world, however:
  – Some of the masses in PE come from other proteins or molecules contaminating the sample.
  – Not all peptides are detected by the mass spectrometer, so their masses will not be in PE.
  – The experimental digestion may disagree with the model used for the in silico digestion; for example, not all expected cleavages are performed in the experimental digestion (so-called missed cleavages).
• This results in a set of experimental masses that are not in PT and a set of theoretical masses that are not in PE.
• There can be peptides containing modified residues.

• For peptide mass fingerprinting (PMF), a goal is to obtain as many peptides as possible in common between the experimental and the in silico digestions, and hence as high a sequence coverage as possible.
• Therefore the protein separation steps before digestion are important.
• A single protein in each sample (for example, a single protein in each spot of a 2D gel) is preferable, and contamination should be avoided at every step of sample handling.
• Human keratins (from skin or hair) are a common contaminant.
• In PMF identifications of average proteins, a sequence coverage of 15 to 30 % is frequently achieved, often corresponding to 5 to 15 peptides. This may constitute less than half of the experimental peptides detected.

• There are many reasons why the major part of the sequence is not covered by the experimental peptides.
• As can be understood from the above, the selection of the protease is a very important consideration.
  – The protease should cleave the protein in a unique and predictable way.
  – For two main reasons, the protease should cut the protein neither into too many small peptides nor into a few long peptides:
    • Most mass spectrometry instruments have a limited mass range in which they operate at their optimum.
    • The number of sequences containing a specific peptide mass increases with decreasing mass. For example, if the 13,359 human proteins available in the database SwissProt in January 2006 are digested with trypsin, there are 633 peptides in the m/z range 499.0 to 501.0, but only 195 peptides in the m/z range 1999.0 to 2001.0.
    • The masses of peptides shorter than six amino acids occur in so many sequences that they are less appropriate for use in identification. Thus, the protease should not cut the protein into many small peptides.

[Figure: Experimental digestion]

Cleavage specificity

• Cleavage specificity is a description of the cleavage site of a protease's substrate protein.
• A cleavage site can be described by:
  – Cleavage activator: a set of amino acids that a subsite can bind to.
  – Cleavage point: the position within the site at which the protein is cleaved.
  – Cleavage preventor: a set of amino acids that hinders cleavage if one of them occurs at a specific position, despite the occurrence of the cleavage activators.
• Thus each residue of a cleavage site is part of an activator or a preventor. Note, however, that for some proteases an activator can be X, meaning that any amino acid can occur.

• A notation for describing a cleavage site is to:
  – enclose the cleavage activators in brackets, '[]';
  – enclose the preventors in '<>';
  – specify the cleavage point by a full stop, '.';
  – the length of the cleavage site is equal to the number of activators and preventors.
• For example, a protease that cleaves after arginine or lysine unless the next residue is proline, as described for trypsin, would be written [KR].<P> in this notation.

Trypsin is the protease that best satisfies the desired requirements.

• It has high specificity, few missed cleavages, and rarely or never cleaves at unexpected positions.
• Because arginine and lysine appear with an average distance of approximately 11 residues, and with only a small probability of being followed by a proline, peptides of suitable length are produced.
• It is easily obtained and purified.
• It is applicable in most experimental settings and procedures, and is used to cleave proteins in solution, in gels, or even adsorbed onto surfaces.
• By cleaving after each arginine and lysine, trypsin ensures that each peptide has a site capable of retaining a proton (for ionization).

In silico digestion

• The in silico digestion of a protein sequence is performed by scanning the sequence for cleavage sites. However, one should keep in mind how the experimental data are produced:
  1. There may be missed cleavages.
  2. There can be naturally occurring modifications in some positions of the protein.
  3. There can be chemical modifications intentionally introduced.
  4. There can be unintentional modifications introduced by the sample handling.
  5. There can be unsuspected cleavages during the maturation/life cycle of the protein.
  6. There can be unexpected cleavages occurring during the experimental proteolytic treatment.

• Missed cleavages and different modifications (points 1-4) greatly increase the number of theoretical peptides, thus also increasing the chances of accidental matches with the experimental data.
• If the number of cleavage sites in a sequence is n and the number of missed cleavages allowed in a peptide is k (with k ≤ n), then the number of theoretical peptides is $\sum_{j=0}^{k} (n + 1 - j) = (k+1)(n+1) - \frac{k(k+1)}{2}$. This follows because n cleavage sites split the sequence into n + 1 fragments, and a peptide with j missed cleavages spans j + 1 adjacent fragments, which can be chosen in n + 1 - j ways.
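
The scanning procedure and the effect of missed cleavages can be illustrated with a short Python example. This is only a minimal sketch, assuming a trypsin-like rule (cleave after K or R unless the next residue is P). The function names `cleavage_sites`, `digest` and `expected_count`, and the example sequence, are hypothetical and introduced here for illustration. Modifications and unexpected cleavages (points 2-6) are not modelled.

```python
def cleavage_sites(sequence, activators="KR", preventors="P"):
    """Return positions i where the sequence is cut between residues i-1 and i.

    Assumed rule: cut after an activator residue unless the next residue
    is a preventor. The end of the sequence is not counted as a site.
    """
    sites = []
    for i in range(1, len(sequence)):
        if sequence[i - 1] in activators and sequence[i] not in preventors:
            sites.append(i)
    return sites


def digest(sequence, max_missed=0, activators="KR", preventors="P"):
    """Return (start, end, peptide) tuples with at most max_missed missed cleavages."""
    bounds = [0] + cleavage_sites(sequence, activators, preventors) + [len(sequence)]
    last = len(bounds) - 1
    peptides = []
    for i in range(last):
        # A peptide from bounds[i] to bounds[j] contains j - i - 1 missed sites.
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptides.append((bounds[i], bounds[j], sequence[bounds[i]:bounds[j]]))
    return peptides


def expected_count(n, k):
    """Peptide count for n cleavage sites and up to k missed cleavages (k <= n).

    A peptide with j missed cleavages spans j + 1 adjacent fragments, and
    there are n + 1 - j ways to place it among the n + 1 fragments.
    """
    return sum(n + 1 - j for j in range(k + 1))


# Hypothetical example sequence: the K at position 3 is followed by P,
# so it is not a cleavage site.
sequence = "AAKPGGRCCKDDREE"
print(cleavage_sites(sequence))                      # [7, 10, 13]
print([p for _, _, p in digest(sequence)])           # ['AAKPGGR', 'CCK', 'DDR', 'EE']
print(len(digest(sequence, max_missed=1)))           # 7
print(expected_count(len(cleavage_sites(sequence)), 1))  # 7
```

Allowing one missed cleavage almost doubles the number of theoretical peptides for this short sequence (from 4 to 7), which shows how quickly the search space grows, and with it the chance of accidental matches.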