Solvent Accessibility Prediction in proteins
Shandar Ahmad
Department of Biosciences, Jamia Millia Islamia University, New Delhi-110025, India

“Surface Was Invented by Devil.”, Fermi. Is that really so?
• The surface of a molecule hides its inner part and blocks visualization of the bulk material, which frustrated Fermi.
• The surface is the most important part of a protein, as enzyme activity and binding are governed by surface residues.
• Definition: solvent accessibility is a measure of the exposed surface of an entire protein, of individual amino acid residues or of constituent atoms.

Importance of solvent accessibility
• Accessible surface area (ASA) determines the stability of proteins, as hydrophobic transfer energy is a direct measure of residue-wise solvent accessible surface.
• Buried residues reside in the core of the protein and are hence crucial to stability, even though they may not be active sites.
• A protein may lose function upon mutation either because the new amino acid does not bind or because the protein loses its structure upon that mutation, which makes stability a crucial factor.
• Accessible residues represent active sites of the protein.
• Secondary structure is not as sensitive to point mutations as ASA is.

DNA-binding probability and ASA
[Figure: fraction of binding residues (0 to 0.25) versus ASA range (%), in 10% bins from 0-10 to 90-100. Source: Ahmad S. et al. 2004, Bioinformatics]

How do we calculate ASA
• There are several free programs to calculate ASA for proteins with known structures.
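As a rough illustration of what such programs compute, the sketch below implements one common numerical scheme, the Shrake-Rupley sphere-point method. It is not claimed to be the algorithm of any particular program named in this talk. The function names, the 1.4 Å probe radius, the number of test points and the atomic radii in the usage note are all illustrative assumptions.

```python
import math

def sphere_points(n):
    """Return n roughly evenly spaced points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n           # z runs from near +1 to near -1
        r = math.sqrt(max(0.0, 1.0 - z * z))    # radius of the circle at height z
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def atom_asa(coords, radii, probe=1.4, n_points=960):
    """Per-atom accessible surface area in square angstroms.

    coords: list of (x, y, z) atom centres in angstroms.
    radii:  list of van der Waals radii in angstroms (caller-supplied assumption).
    probe:  probe (solvent) radius; 1.4 is an assumed, commonly used value.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe                      # radius of the probe-centre sphere
        exposed = 0
        for ux, uy, uz in unit:
            px = ci[0] + big_r * ux
            py = ci[1] + big_r * uy
            pz = ci[2] + big_r * uz
            buried = False
            for j, (cj, rj) in enumerate(zip(coords, radii)):
                if j == i:
                    continue
                reach = rj + probe
                d2 = (px - cj[0]) ** 2 + (py - cj[1]) ** 2 + (pz - cj[2]) ** 2
                if d2 < reach * reach:          # test point lies inside a neighbour
                    buried = True
                    break
            if not buried:
                exposed += 1
        # Exposed fraction of test points times the full sphere area.
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

For example, `atom_asa([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [1.7, 1.7])` returns two areas, each smaller than the area of an isolated atom because the neighbour occludes part of the probe-centre sphere. The nested loops are quadratic in the number of atoms, which is acceptable for a sketch but not for whole-PDB calculations.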
Comparison: characteristic features of the most common methods for calculating ASA

Feature                                        ACCESS   DSSP   NACCESS    ASC   GETAREA
Standalone executable availability             Yes      Yes    Licensed   Yes   No
Online calculations/database                   No       Yes    No         Yes   Yes
Polar and nonpolar area                        No       No     Yes        No    Yes
Atom-wise surface area                         Yes      No     Yes        Yes   Yes
Source code availability                       No       Yes    No         Yes   No
Choice of probe radius                         Yes      No     Yes        Yes   No
Choice of van der Waals and other parameters   Yes      No     Yes        Yes   By manual editing
Secondary structure                            No       Yes    No         No    No

References: ACCESS, Lee and Richards (1971); DSSP, Kabsch and Sander (1983); NACCESS, Hubbard and Thornton (1993); ASC, Eisenhaber and Argos (1993); GETAREA, Fraczkiewicz and Braun (1998).

ASAView: A tool to plot and view Solvent accessibility
• An online service to calculate and view ASA was developed (www.netasa.org/asaview).
• A database of plots for the entire PDB is included.
• The database can be downloaded.
• ASAView is linked from PDB.
ASAView: Database and tool for solvent accessibility representation in proteins. Shandar Ahmad, Michael Gromiha, Hamed Fawareh and Akinori Sarai, BMC Bioinformatics (2004) 5:51

ASAView
• Residues are colored by category: polar, hydrophobic, positively charged and negatively charged.
• Each residue is represented by a solid circle whose radius is proportional to its solvent accessibility.
• Residues are arranged outward in a spiral diagram, with the lowest ASA in the interior of the diagram, emulating the actual three-dimensional environment.

Solvent accessibility and protein interfaces
• Change in ASA is often used to define an interface and identify interacting residues.
• We have defined native and isolated domain ASAs to estimate the extent of unsaturated bonds in a protein sequence.
• Domain-domain and protein-protein interactions may be predictable from native ASAs.
M. Firdaus Raih, Shandar Ahmad, Zheng Rong, Rahmah Mohamed, Biophysical Chemistry 114 (2005) 63-69

Post interface ASA in different secondary structure conformations
• Relative loss of ASA by interfacing is not dependent on secondary structure.
[Figure: native ASA relative to domain ASA (0 to 70) for each residue type (Ala to Trp), with series for Helix, Strand, BetaB, 3-10 helix, Turn, Bend and Coil.]

Charged residues retain most of their ASA even after interfacing
• Surface hydrophobic residues lose more ASA upon interfacing than charged ones.
[Figure: relative number of residues (0 to 35) versus post interface ASA range (native ASA relative to isolated domain ASA), in bins from <10 to >90, with series for Positive, Negative, Hydrophobic and Polar neutral residues.]

Solvent accessibility predictions
• Many more sequences are available than structures.
• Structure prediction requires good templates or extensive computing.
• Knowledge of solvent accessibility ahead of structure is useful.

Methods of ASA prediction
• Goal
– ASA categories/states
– Real relative value of ASA
– Real absolute value of ASA
• Information indices
– Amino acid sequence
– Evolutionary information
– Burial potentials
• Model type
– Information theory
– Neural network (single and two-stage)
– Multiple linear regression

ASA states or categories
• ASA cutoffs:
– ASA values are normalized by either (1) the extended-state ASA of the residue in Ala-X-Ala or Gly-X-Gly, or (2) the highest observed ASA.
– Residues are annotated as buried or exposed based on chosen values of relative ASA.
– Two, three or up to 10 categories are defined. (Ahmad & Gromiha, 2002 Bioinformatics)
• Cutoffs are arbitrary
– Different people use different categories.
– Prediction quality strongly depends on these cutoffs.
– Comparison of performance between methods is difficult to make.
• Recent work by Vardarajan (2006) shows a cutoff at 5%.
– It is shown experimentally that mutations at sites with > 5% ASA most strongly affect protein function, i.e. activity.

A multi-layer neural network
[Diagram: input layer, hidden layer(s) and output layer, joined by connection weights.]
• Connection weights: $W_{ijk}$ (between the jth unit of layer i and the kth unit of layer i+1)
• Unit activation (kth unit of ith layer): $U_{ik} = f\left(\sum_j U_{(i-1)j} W_{(i-1)jk}\right)$

Neural networks and digitization of amino acids
• Binary orthogonal codification.
• Substitution matrices as amino acid codes.
• Dimensionality reduction, using a neural network.
Dimensionality of amino-acid space … , Arauzo Bravo, Ahmad S, and Sarai, Comp. Biol. & Chem. (In Press) 2006

PSSM based predictions

Development of non-redundant databases
• Several data sets are available.
– Barton: 512 proteins
– Rost and Sander: 126
– Yuan: 1260
– Meller: ~800
– Ahmad: ~2300 domains
• The largest data set we have used so far is based on ASTRAL, with entries picked domain-wise instead of protein-wise.
• Redundancy is removed by sequence identity.
• Completeness of structure and quality are checked using WhatIF and ProCheck.

Cross-validation
• Three-fold cross-validation
• Leave-one-out cross-validation for domain vs native ASAs

Real value Prediction of ASA
We gave the first Real Value Prediction. Several other authors followed. (Ahmad et al. Proteins 2003)

Real Value Prediction
[Diagram: input layer connected through weights to an output layer with two units, x1 and x2.]
• Input layer: residue and neighbor information (each residue and its neighbor are coded by 21 bits).
• Connection weights: $W_{ijk}$ (between the jth unit of layer i and the kth unit of layer i+1)
• Unit activation (kth unit of ith layer): $U_{ik} = \sum_j U_{(i-1)j} W_{(i-1)jk}$
• Output: $P = 1/[1 + \exp(x_2 - x_1)]$, where $x_1$ and $x_2$ are the activation values of the units in the output layer. P is multiplied by 100 to get a percentage-scale prediction.

Results of RV Predictions
[Figure: percentage (0 to 70) versus ASA range (%), in 10% bins from 0-10 to 90-100.]
As ASA values increase, so does prediction error (black) due to a corresponding fall in the relative abundance of data (gray).
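The output step of the Real Value Prediction slide can be sketched in a few lines. This is a minimal illustration, not the published network: it assumes a single weight layer from the coded window to the two output units, it assumes the 21st bit marks window positions beyond the sequence ends (and unknown residues), and all function names and weights are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one bit each

def encode_window(sequence, center, half_window=1):
    """Code the residue at `center` and its neighbors with 21 bits per position.

    Bits 0-19 are a one-hot code for the amino acid. Bit 20 is set for
    positions beyond the sequence ends or for unknown residues (an assumption;
    the slide only says that 21 bits are used).
    """
    bits = []
    for pos in range(center - half_window, center + half_window + 1):
        code = [0.0] * 21
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            code[AMINO_ACIDS.index(sequence[pos])] = 1.0
        else:
            code[20] = 1.0
        bits.extend(code)
    return bits

def output_activations(inputs, weights):
    """Linear unit activation U_k = sum_j U_j * W_jk, as written on the slide.

    weights[j][k] connects input unit j to output unit k.
    """
    n_out = len(weights[0])
    return [sum(u * w[k] for u, w in zip(inputs, weights)) for k in range(n_out)]

def predict_relative_asa(x1, x2):
    """P = 1 / (1 + exp(x2 - x1)), scaled by 100 to a percentage."""
    return 100.0 / (1.0 + math.exp(x2 - x1))

# Usage with untrained (all-zero) hypothetical weights: both outputs are 0.
window = encode_window("ACDEF", center=2)
weights = [[0.0, 0.0] for _ in window]
x1, x2 = output_activations(window, weights)
prediction = predict_relative_asa(x1, x2)
```

Because P depends only on the difference x2 - x1, the two output units act like a two-class softmax, and the predicted percentage always lies strictly between 0 and 100.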
Residue-wise variation in prediction error
[Figure: percentage (0 to 35) for each amino acid residue (A to Y).]
Residue-specific prediction error and ASA variability. Dark circles represent the prediction error, and gray squares show the corresponding standard deviation in the experimental ASA for that residue type. A very high correlation (r = 0.97) is observed between the prediction error and the standard deviation in the original data.

Prediction histogram
[Figure: relative number of residues (%), 0 to 45, versus prediction error per residue (%), in 10% bins from 0-10 to 90-100.]

Look-up tables for Solvent Accessibility Predictions
• We recently developed residue pattern libraries to serve as dictionaries of ASA values (www.netasa.org/look-up/).
• 1P, 1N, 2P2N type prediction.
• Smaller patterns give better results due to lack of convergence in longer patterns.
Look up tables for prediction and analysis of nearest neighbor effects on solvent accessibility, Jung-Ying Wang, Shandar Ahmad, Michael Gromiha and Akinori Sarai, Biopolymers 75 (2004) 209-216

Variation between 1P and 1N ASA information
[Figure: Stdev (1P) and Stdev (1N), 0.00 to 0.12, for each residue (A to Z, including X). Y-axis label truncated in the source: "Stdev in ASA with change in ne…".]

Prediction of ASA for each atom
• 167 different atomic groups occur in proteins.
• We have carried out the first large-scale analysis and prediction of ASA for each of these atoms (Ahmad et al. submitted for publication, 2006).
• Interesting observations are made about ASA distribution.

Main results from analysis and prediction of atomic ASA
• Most atoms are primarily distributed in a very small ASA range near 0.
• Proline CB atoms are frequently exposed.
• Some carbon atoms show a sharp second peak, suggesting two stable conformations.
• Acidic and basic residues have exposed nitrogen/oxygen.

Other results
• 167 neural networks were designed, and most atomic ASAs could be predicted with an error close to 1 Å, except those having a bimodal distribution.
• Some atoms were more sensitive to neighbor information than others.
• Some were more sensitive to the C-terminal neighbor and some to the N-terminal neighbor.

Thank you