Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Determination of the Substrate Specificities of Protein Kinases by Prediction Algorithms and Substrate Peptide Microarrays Sommerfeld, D., Lai, S., Safaei, J., Shi, J., Zhang, H. and Pelech, S. 1. Overview Peptide microarrays were employed to experimentally elucidate the optimal substrate amino acid sequence information for 112 human protein kinases to compare with predictions from the Kinase Predictor 1.0 algorithm. These findings will be used to generate training data to improve Kinase Predictor 2.0, to identify novel kinase-substrate interactions, and to map novel phosphorylation-dependent signalling pathways. 2. Introduction Protein kinases play a virtually universal role in the regulation of eukaryotic cellular processes by phosphorylating a plethora of protein (and lipid) substrates. Over two thirds of the proteins encoded by the human genome are subjected to phosphorylation on multiple sites, and there may be in excess of 650,000 phosphorylation sites (P-sites) in the human proteome. To ensure signalling fidelity amidst the vast complement of potential substrate P-sites, protein kinases must exert specificity for their substrates and act only on appropriate cellular targets. Substrate recognition by a particular kinase is largely determined by molecular recognition of the amino acid sequence surrounding a target P-site. An important aspect of substrate peptide specificity is that the target P-site falls within a consensus amino acid sequence motif, which exhibits structural and chemical complementarity to the kinase active site. Consensus P-site motifs for individual kinases have been determined through alignment of known substrate P-sites, screening of substrate peptide libraries in solution and arrays, and systematic mutation of substrate sequences. However, the substrate consensus sequences for most of the 516 human protein kinases remain unknown. To fill this gap in knowledge, there have been several attempts to predict kinase specificities for substrate recognition in the last decade since the sequencing of the human genome. 3. Experimental Approach Figure 1: Overview of input and output of Kinase Predictor 1.0 INPUT (a) Aligned amino acid sequences for P-sites from known substrates for 229 ‘typical’ protein kinases (from 9,125 confirmed kinasesubstrate pairs) (b) Aligned amino acid sequences for 488 catalytic domains from 478 ‘typical’ kinases Determine SDRs and their interaction probabilities for each amino acid position in the P-site region of optimal substrates using deduced consensus P-sites and kinase catalytic domain information. OUTPUT PSSM matrices of all kinases. PSSMs are used to derive an optimal P-site consensus sequence. Algorithm 2 Prediction of PSSM matrices for all kinases Generate Profile and PSSM matrices for each kinase catalytic domain using SDR’s, predicted amino acid interactions, and the unique amino acid sequence of each of the 488 catalytic domains. Figure 2: Synthesis and printing of optimal peptide sequences onto glass slide microarrays We previously developed computer algorithms to predict the specificities for all of the human typical protein kinases (Kinase Predictor 1.0). These algorithms utilize mutual information and charge information theory to identify Specificity Determining Residues (SDRs) in the catalytic domains of kinases and predict the preferred substrate sequences (13 amino acid residues in length) for 488 typical human protein kinase catalytic domains. They generate a frequency table or Position Specific Scoring Matrix (PSSM) that provides the probability of occurrence for each of the 20 amino acids at each of the 12 positions surrounding a target P-site for a specific protein kinase (Figure 1). Algorithm 1 Computing SDRs PEPTIDE MICROARRAY PRINTING Peptides printed onto epoxysilanecoated glass slides. Epoxysilane coating provides an epoxy ring that reacts with and covalently binds amine groups on the spotted peptides. Image below is modified from SCHOTT Nexterion® SlideE Product Information Sheet. MICROARRAY SET-UP Every peptide microarray has 4 fields that each contain 445 peptides (plus control spots and orientation markers) printed in triplicate. This setup allows for the testing of 4 different kinases per slide. While the accuracy of our algorithms have been tested in silico by comparing the computer-predicted consensus sequences with deduced consensus sequences by alignment of known protein substrates, they have yet to be validated empirically by in vitro assays. To test the accuracy of our algorithms, we used the PSSM for each of 488 typical human protein kinase catalytic domains to generate a list of ‘optimal’ substrate peptide sequences. We selected 445 peptide sequences (omitting nearly identical consensus sequences for highly related kinases) and printed them onto glass slide microarrays (Figure 2). Figure 3: Detection of phosphorylation with Pro-Q Diamond phosphoprotein stain Pro-Q Diamond® (Invitrogen) is a proprietary fluorescent stain that binds directly to the highly negatively charged phosphate group on phosphorylated serine, threonine and tyrosine amino acid residues, and allows for the detection of phosphopeptides (or phosphoproteins) without the requirement for antibodies or radioisotopes. Image below is modified from Gao et al., 2010.* Peptide microarrays were incubated with ATP and purified recombinant active kinases (SignalChem Pharmaceuticals). The phosphorylation of the peptides was detected using the fluorescent Pro-Q Diamond® phosphoprotein/peptide stain (Figure 3). Slides were scanned and the images were analyzed using advanced microarray image analysis software (ImaGene® 8.0). *Gao, L., Sun, H. and Yao, S.Q. (2010). Activity-based high-throughput determination of PTPs substrate specificity using a phosphopeptide microarray. Peptide Science 94 (6): 810-819. 4. Results We are aiming to test 204 purified recombinant human protein kinases in high-throughput phosphorylation assays against 445 peptide substrates. To date we have evaluated 112 kinases, and these different kinases exhibit distinct substrate fingerprints/phosphorylation profiles (Figure 4). The Pro-Q Diamond stain binds directly to the phosphate groups on phosphorylated peptides. Therefore, the signal intensity from peptide spots is proportional to the degree of phosphorylation of the respective peptide substrate, and we can use the signal intensity as a measure of the ‘efficiency’ of a peptide as a substrate for a particular kinase. Peptide substrates are ranked based on signal intensities (or degree of phosphorylation), and the sequences of the top ranking peptides are aligned and used to derive consensus sequences for target kinases. Table 1 demonstrates the alignment of top ranking peptide substrates for the AMPKα1 kinase and the comparison between the consensus sequences obtained from our peptide arrays (Peptide Consensus), alignment of known substrates P-sites (Protein Consensus), and our Kinase Predictor 1.0 algorithm (Predicted). Comparison of the consensus sequences shows a close match between the different methods, indicating that our algorithms are capable of predicting specificities of kinases, and that our peptide array can be used to test kinase specificities and validate our algorithms. Our preliminary results indicate that the first generation kinase specificity algorithms are capable of predicting optimal substrate sequences with relatively high accuracy. Table 2 illustrates the comparison of consensus sequences derived from our peptide arrays, computer prediction, and alignment of confirmed substrate proteins for a variety of protein kinases . Figure 4: Phosphorylation profile of the kinase AMPKa1 Different kinases display distinct substrate fingerprints. Red indicates high signal intensity corresponding to higher phosphorylation, while green indicates low signal intensity. 5. Significance and Future Directions Table 1: AMPKα1 peptide substrate ranking and alignment. Peptides exhibiting signal intensities above 20% of the highest signal detected are aligned and used to derive a consensus sequence (Peptide Consensus). Table 2: Comparison of consensus sequences obtained from peptide arrays (Peptide Consensus), Kinase Predictor 1.0 algorithm (Predicted), and alignment of known substrates Psites (Protein Consensus) Shown are the comparisons of derived consensus sequences for protein kinases of various groups and families. Our results also demonstrate the similarities in substrate specificities between members of the same kinase groups. The substrate phosphorylation data obtained from testing all of 204 protein kinases against our peptide microarrays will be used to expand the input datasets for training of our algorithms. In addition, the acquisition of 3D structural information of kinase catalytic domains, the incorporation of a larger number of SDRs, and the adoption of an empirically-derived amino acid interaction table will also be included in the development of second generation Kinase Predictor algorithms with improved accuracy and predictive power for mapping phosphoproteome interactions. Importantly, the in silico prediction of kinase specificities and substrate interactions can significantly facilitate the identification of kinase-substrate interactions in vivo. Understanding the molecular determinants that guide substrate recognition by a kinase is essential to understanding the role of individual protein kinases in particular cellular processes. The knowledge of substrate specificities that can be gained through our work has significant implications to the mapping of phosphorylation sites in the human proteome, the identification of previously uncharacterized substrates, and the generation of model substrates for smallmolecule inhibitor design for drug development. The identification of novel Psites and substrates for kinases may ultimately lead to a high-resolution map of kinase-dependent phosphorylation signalling networks inside the cell, and a greater understanding of their role in the pathology of many human diseases, including cancer, diabetes, neurodegenerative and immune disorders.