* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Systematically Assessing the Influence of 3
Survey
Document related concepts
Endogenous retrovirus wikipedia , lookup
Peptide synthesis wikipedia , lookup
Western blot wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup
Interactome wikipedia , lookup
Specialized pro-resolving mediators wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Catalytic triad wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Structural alignment wikipedia , lookup
Biosynthesis wikipedia , lookup
Protein–protein interaction wikipedia , lookup
Genetic code wikipedia , lookup
Proteolysis wikipedia , lookup
Biochemistry wikipedia , lookup
Transcript
Systematically Assessing the Influence of 3-Dimensional Structural Context on the Molecular Evolution of Mammalian Proteomes Sun Shim Choi,* Eric J. Vallender,* and Bruce T. Lahn* *Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago; and Committee on Genetics, University of Chicago The 3-dimensional (3D) structural context of amino acid residues in a protein could significantly impact the level of selective constraint on the residues. Here, by analyzing 767 mammalian proteins, we systematically investigate how various 3D structural contexts influence selective constraint. The structural contexts we examined include solvent accessibility, secondary structure, and intramolecular residue–residue interactions. Through this analysis, we offer quantitative information on how 3D structural contexts affect the level of selective constraint. The functional specificity of a protein stems from the intricate 3-dimensional (3D) structure of the protein. As such, the local structural context of amino acid residues within the protein should significantly affect the level of selective constraint operating on the residues. Although this notion is readily assumed by many investigators (Bao and Cui 2005; Karchin et al. 2005; Porto et al. 2005), only a few studies have directly examined the influence of 3D structural context on selective constraint (Goldman et al. 1998; Bustamante et al. 2000; Mintseris and Weng 2005). Given that these studies typically utilized a small number of proteins, their conclusions are qualitative rather than quantitative and may not represent genome-wide patterns. In this study, we performed a combination of 3D structural analysis and phylogenetic analysis on a large number of mammalian proteins. This large-scale study allowed us, for the first time, to quantitatively assess the extent to which various 3D structural contexts affect levels of selective constraint on amino acid residues in proteins. We collected all available human–rat orthologous pairs for which 3D structural information can be obtained for the human protein and for which there is at least one amino acid difference between human and rat orthologs. This resulted in a set of 767 human–rat orthologous pairs. We will only report results based on human structures because rat structures produced essentially the same findings. We first investigated how solvent accessibility of amino acid residues influences selective constraint. We divided all the residues into 2 categories: likely buried and likely exposed. We found that, on average, buried residues have 26% lower human–rat replacement rate as compared with exposed residues, though both have much lower rates than the neutral expectation (fig. 1A). This finding shows that buried residues evolve under considerable stronger constraint than exposed residues. We next examined the influence of secondary structural context on selective constraint. We considered 4 secondary structures: helix, strand, loop, and turn. Residues in helices and strands showed significantly lower replacement rate as compared with residues in loops and turns, with helices showing a slightly lower rate than strands (fig. 1A). As helices and strands are generally considered to be more Key words: proteome evolution, 3-dimensional structure, selective constraint. E-mail: [email protected]. Mol. Biol. Evol. 23(11):2131–2133. 2006 doi:10.1093/molbev/msl086 Advance Access publication August 10, 2006 Ó The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] ordered than loops and turns, this finding suggests that portions of proteins that are more likely to be ordered tend to evolve under greater constraint. Protein function is critically dependent on 3D protein structure, and the structure of a protein, in turn, is critically dependent on the complex interactions among its amino acid residues. To consider how intramolecular residue– residue interactions affect selective constraint, we divided all the residues into those that are involved in some form of intramolecular interactions and those that are not. We found that interacting residues have 29% lower amino acid replacement rate than noninteracting residues (fig. 1A), arguing that interacting residues are under much stronger constraint. We further noted that the rate difference between interacting and noninteracting residues is greater than that between buried and exposed residues or between any 2 classes of secondary structure. This argues that among all the 3D structural contexts considered thus far, intramolecular residue–residue interactions exert the strongest effect on selective constraint. Using the Grantham classification (Grantham 1974), we divided amino acid replacements into 4 groups based on the physicochemical properties of the replacements, including conservative, moderate, radical, and very radical. We found that interacting residues are strongly enriched for the conservative class of replacements relative to noninteracting residues and are deficient for the moderate, radical, and very radical classes (fig. 1B). This result demonstrates that not only is selective constraint stronger on interacting residues than noninteracting residues, but the stronger constraint on interacting residues also biases amino acid replacements considerably toward conservative ones. We next investigated how chemical properties of residue–residue interactions influence levels of constraint. We categorized residue–residue interactions into 5 classes: ionic interactions, hydrophobic interactions, sidechain– sidechain hydrogen bonds, sidechain–backbone hydrogen bonds, and disulfide bonds. We then calculated the rate of amino acid replacements for each class (fig. 1A). It is noteworthy that cysteine residues in disulfide bonds have a replacement rate of 1.4%, far lower than other types of interactions, demonstrating the extraordinary conservation of such residues. In contrast, of the 3,493 cysteines not involved in disulfide bonding, there are 227 (6.5%) replacements. The traditional metric for selective constraint on a gene is the ratio of the gene’s nonsynonymous substitution rate 2132 Choi et al. A Amino acid replacement rate 0 0.04 0.08 0.12 0.56 0.60 Neutral expectation Residues partitioned by solvent accessibility Likely buried (6473/79206) Likely exposed (17208/156548) Helix (6542/71907) Strand (3679/38293) Loop (9307/87491) Turn (4153/38063) Interacting (12844/147810) Non-interacting (10837/87944) Ionic interaction (1755/20206) Hydrophobic interaction (4557/59323) Residues partitioned by secondary structural context Residues partitioned by whether they participate in residue-residue interaction Interacting residues further partitioned by the type of interaction Fraction of each class B Sidechain-sidechain H bond (3256/40746) Sidechain-backbone H bond (5343/53905) Disulfide bond (21/1502) 60% 50% 40% 30% 20% 10% 0 Conservertive Moderate Radical Very radical FIG. 1.—Effect of 3D structural context on protein sequence evolution. (A) Amino acid replacement rates under various 3D structural contexts. For each structural context, the number of replacements over the total number of residues is given in parentheses. Neutral expectation was calculated by computer simulation and error bars obtained from bootstrap resampling. (B) Breakdown of amino acid replacements into conservative, moderate, radical, and very radical groups. Solid bars represent amino acid replacements of interacting residues; open bars represent amino acid replacements of noninteracting residues. Error bars were obtained from bootstrap resampling. (Ka) to its synonymous substitution rate (Ks) (Li 1993, 1997). Here, we developed an analogous metric based on the ratio of interacting residue replacement rate (Ri) to noninteracting residue replacement rate (Rn). Similar to the Ka/Ks ratio, the Ri/Rn ratio is below 1 for the great majority of genes. We plotted human–rat Ka/Ks against the Ri/Rn, which showed a strong positive correlation between Ka/Ks and Ri/Rn for Ri/Rn values less than 1 (supplementary fig. S1, Supplementary Material online). A simple explanation for this correlation is that, similar to Ka/Ks, low Ri/Rn is consistent with strong constraint, whereas high Ri/ Rn being consistent with weak constraint. In other words, as selective constraint becomes stronger, its effect in limiting the freedom of amino acid replacements tends to fall preferentially on interacting residues more so than on noninteracting residues. To better visualize the relationship between Ka/Ks and Ri/Rn, we binned genes based on Ri/Rn values into 25 genes per bin. This produced a remarkably neat, positive linear relationship between bin-average Ka/Ks and Ri/Rn for Ri/Rn values less than 1. However, this positive relationship breaks down for Ri/Rn values greater than 1. The reason for this breakdown is not immediately obvious. We noted that proteins with Ri/Rn values greater than 1 tend to have small numbers of amino acid replacements, suggesting that stochastic variance may have contributed to the high Ri/Rn values in these proteins. However, additional studies are needed to clarify the relationship between Ka/Ks and Ri/Rn. The significantly greater selective constraint on interacting residues suggests that mutations in interacting residues may be more deleterious to fitness. To test this, we collected all disease-causing missense mutations occurring in our proteins. Of the 2,877 such mutations, 1,933 (67%) strike interacting residues, significantly higher than the 63% expected by chance (P , 0.00001). Because the human lineage and the rat lineage diverged from their common forebears, these 2 lineages were beset by vastly different mutation rates, generation time, demographic histories, and selective regimes. It is, 3D Structural Context and Selective Constraint 2133 therefore, of interest to examine whether the manner by which 3D structural context affects evolutionary constraint is comparable between these 2 lineages. Using dog sequences as the outgroup, we assigned human–rat amino acid replacements onto either the human or the rat lineage. We then analyzed the extent to which various 3D structural contexts affected amino acid replacement rates in each lineage. We found that there is a remarkable symmetry between the 2 lineages (data not shown). Thus, the observations we made for the human and rat lineages may reflect broader patterns of protein evolution in other species. In conclusion, our study provides compelling evidence that 3D structural context of amino acid residues in proteins exerts a strong influence on selective constraint. Although this conclusion should not come as a surprise, our study is the first to definitively demonstrate it on a genomic scale and to offer quantitative measurements on the extent to which various structural contexts affect constraint. Materials and Methods As described previously (Choi et al. 2005), a set of 767 rat–human orthologs was compiled for which 3D structures were available for the human proteins, and there was at least one amino acid difference between the rat and human orthologs. Dog sequences were also obtained for these proteins. Human–rat amino acid replacements were parsed by parsimony onto either the human or the rat lineage (i.e., the identity of the human–rat ancestral residue is assumed to be the residue that occurs in 2 of the 3 species), with residues that differ in all 3 species excluded from the analysis. Structural analysis of the proteins and the classification of 3D structural contexts were as previously described (Choi et al. 2005). Ka and Ks between human and rat orthologs were calculated by the Li (1993) method. Amino acid replacement rate for a given class of residues (e.g., buried residues) was calculated as the fraction of residues in that class that differ between human and rat. We did not correct for multiple hits because they are likely rare given the low replacement rates. Error bars are generated by 1,000 bootstrap resampling (with replacement) of all the proteins and obtaining the standard deviation of the resampled data. Neutral evolution was simulated by evolving the human sequence forward, allowing a random placement of mutations under the observed transition/transversion ratio until the number of simulated synonymous mutations was equal to the number of synonymous mutations observed between human and rat orthologs. This simulated sequence was then compared against the original human sequence to determine the neutral replacement rate. Grantham distances for amino acid replacements were obtained as previously described (Grantham 1974) and classified into conservative, moderate, radical, and very radical groups based on prior convention (Grantham 1974; Wyckoff et al. 2000; Choi and Lahn 2003; Choi et al. 2005). Human disease-causing mutations were obtained from the Human Gene Mutation Database (http://www. hgmd.cf.ac.uk) (Stenson et al. 2003). We only considered missense mutation (i.e., mutations that result in amino acid changes) and disregarded the other types of mutations such as nonsense mutations and deletions. This resulted in a total of 2,877 residues, whose missense mutations are disease causing in humans. Supplementary Material Supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe. oxfordjournals.org/). Literature Cited Bao L, Cui Y. 2005. Prediction of the phenotypic effects of nonsynonymous single nucleotide polymorphisms using structural and evolutionary information. Bioinformatics 21:2185–90. Bustamante CD, Townsend JP, Hartl DL. 2000. Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol Biol Evol 17:301–8. Choi SS, Lahn BT. 2003. Adaptive evolution of MRG, a neuronspecific gene family implicated in nociception. Genome Res 13:2252–9. Choi SS, Li W, Lahn BT. 2005. Robust signals of coevolution of interacting residues in mammalian proteomes identified by phylogeny-aided structural analysis. Nat Genet 37:1367–71. Goldman N, Thorne JL, Jones DT. 1998. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149:445–58. Grantham R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862–4. Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. 2005. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics 21:2814–20. Li WH. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96–9. Li WH. 1997. Molecular evolution. Sunderland, MA: Sinauer Associates. Mintseris J, Weng Z. 2005. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 102:10930–5. Porto M, Roman HE, Vendruscolo M, Bastolla U. 2005. Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences. Mol Biol Evol 22:630–8. Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN. 2003. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 21:577–81. Wyckoff GJ, Wang W, Wu C-I. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304–9. William Martin, Associate Editor Accepted August 4, 2006