Download Systematically Assessing the Influence of 3

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Endogenous retrovirus wikipedia , lookup

Peptide synthesis wikipedia , lookup

Western blot wikipedia , lookup

Metabolism wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Interactome wikipedia , lookup

Specialized pro-resolving mediators wikipedia , lookup

Protein wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Catalytic triad wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Structural alignment wikipedia , lookup

Biosynthesis wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Genetic code wikipedia , lookup

Proteolysis wikipedia , lookup

Biochemistry wikipedia , lookup

Point mutation wikipedia , lookup

Hepoxilin wikipedia , lookup

Metalloprotein wikipedia , lookup

Transcript
Systematically Assessing the Influence of 3-Dimensional Structural Context on
the Molecular Evolution of Mammalian Proteomes
Sun Shim Choi,* Eric J. Vallender,* and Bruce T. Lahn*
*Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago; and Committee on Genetics,
University of Chicago
The 3-dimensional (3D) structural context of amino acid residues in a protein could significantly impact the level of selective constraint on the residues. Here, by analyzing 767 mammalian proteins, we systematically investigate how various
3D structural contexts influence selective constraint. The structural contexts we examined include solvent accessibility,
secondary structure, and intramolecular residue–residue interactions. Through this analysis, we offer quantitative information on how 3D structural contexts affect the level of selective constraint.
The functional specificity of a protein stems from the
intricate 3-dimensional (3D) structure of the protein. As
such, the local structural context of amino acid residues
within the protein should significantly affect the level of
selective constraint operating on the residues. Although this
notion is readily assumed by many investigators (Bao and
Cui 2005; Karchin et al. 2005; Porto et al. 2005), only a
few studies have directly examined the influence of 3D
structural context on selective constraint (Goldman et al.
1998; Bustamante et al. 2000; Mintseris and Weng
2005). Given that these studies typically utilized a small
number of proteins, their conclusions are qualitative rather
than quantitative and may not represent genome-wide patterns. In this study, we performed a combination of 3D
structural analysis and phylogenetic analysis on a large
number of mammalian proteins. This large-scale study
allowed us, for the first time, to quantitatively assess the
extent to which various 3D structural contexts affect levels
of selective constraint on amino acid residues in proteins.
We collected all available human–rat orthologous
pairs for which 3D structural information can be obtained
for the human protein and for which there is at least one
amino acid difference between human and rat orthologs.
This resulted in a set of 767 human–rat orthologous pairs.
We will only report results based on human structures because rat structures produced essentially the same findings.
We first investigated how solvent accessibility of
amino acid residues influences selective constraint. We divided all the residues into 2 categories: likely buried and
likely exposed. We found that, on average, buried residues
have 26% lower human–rat replacement rate as compared
with exposed residues, though both have much lower rates
than the neutral expectation (fig. 1A). This finding shows
that buried residues evolve under considerable stronger
constraint than exposed residues.
We next examined the influence of secondary structural context on selective constraint. We considered 4 secondary structures: helix, strand, loop, and turn. Residues in
helices and strands showed significantly lower replacement
rate as compared with residues in loops and turns, with helices showing a slightly lower rate than strands (fig. 1A). As
helices and strands are generally considered to be more
Key words: proteome evolution, 3-dimensional structure, selective
constraint.
E-mail: [email protected].
Mol. Biol. Evol. 23(11):2131–2133. 2006
doi:10.1093/molbev/msl086
Advance Access publication August 10, 2006
Ó The Author 2006. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
ordered than loops and turns, this finding suggests that
portions of proteins that are more likely to be ordered tend
to evolve under greater constraint.
Protein function is critically dependent on 3D protein
structure, and the structure of a protein, in turn, is critically
dependent on the complex interactions among its amino
acid residues. To consider how intramolecular residue–
residue interactions affect selective constraint, we divided
all the residues into those that are involved in some form of
intramolecular interactions and those that are not. We found
that interacting residues have 29% lower amino acid replacement rate than noninteracting residues (fig. 1A), arguing that interacting residues are under much stronger
constraint. We further noted that the rate difference between
interacting and noninteracting residues is greater than that
between buried and exposed residues or between any 2 classes of secondary structure. This argues that among all the
3D structural contexts considered thus far, intramolecular
residue–residue interactions exert the strongest effect on
selective constraint.
Using the Grantham classification (Grantham 1974),
we divided amino acid replacements into 4 groups based
on the physicochemical properties of the replacements, including conservative, moderate, radical, and very radical.
We found that interacting residues are strongly enriched
for the conservative class of replacements relative to noninteracting residues and are deficient for the moderate,
radical, and very radical classes (fig. 1B). This result demonstrates that not only is selective constraint stronger
on interacting residues than noninteracting residues, but
the stronger constraint on interacting residues also biases
amino acid replacements considerably toward conservative
ones.
We next investigated how chemical properties of
residue–residue interactions influence levels of constraint.
We categorized residue–residue interactions into 5 classes:
ionic interactions, hydrophobic interactions, sidechain–
sidechain hydrogen bonds, sidechain–backbone hydrogen
bonds, and disulfide bonds. We then calculated the rate
of amino acid replacements for each class (fig. 1A). It is
noteworthy that cysteine residues in disulfide bonds have
a replacement rate of 1.4%, far lower than other types of
interactions, demonstrating the extraordinary conservation of such residues. In contrast, of the 3,493 cysteines
not involved in disulfide bonding, there are 227 (6.5%)
replacements.
The traditional metric for selective constraint on a gene
is the ratio of the gene’s nonsynonymous substitution rate
2132 Choi et al.
A
Amino acid replacement rate
0
0.04
0.08
0.12
0.56
0.60
Neutral expectation
Residues partitioned by
solvent accessibility
Likely buried
(6473/79206)
Likely exposed
(17208/156548)
Helix
(6542/71907)
Strand
(3679/38293)
Loop
(9307/87491)
Turn
(4153/38063)
Interacting
(12844/147810)
Non-interacting
(10837/87944)
Ionic interaction
(1755/20206)
Hydrophobic interaction
(4557/59323)
Residues partitioned by
secondary structural
context
Residues partitioned by
whether they participate in
residue-residue interaction
Interacting residues further
partitioned by the type
of interaction
Fraction of each class
B
Sidechain-sidechain H bond
(3256/40746)
Sidechain-backbone H bond
(5343/53905)
Disulfide bond
(21/1502)
60%
50%
40%
30%
20%
10%
0
Conservertive
Moderate
Radical
Very
radical
FIG. 1.—Effect of 3D structural context on protein sequence evolution. (A) Amino acid replacement rates under various 3D structural contexts. For
each structural context, the number of replacements over the total number of residues is given in parentheses. Neutral expectation was calculated by
computer simulation and error bars obtained from bootstrap resampling. (B) Breakdown of amino acid replacements into conservative, moderate, radical,
and very radical groups. Solid bars represent amino acid replacements of interacting residues; open bars represent amino acid replacements of noninteracting residues. Error bars were obtained from bootstrap resampling.
(Ka) to its synonymous substitution rate (Ks) (Li 1993,
1997). Here, we developed an analogous metric based
on the ratio of interacting residue replacement rate (Ri)
to noninteracting residue replacement rate (Rn). Similar
to the Ka/Ks ratio, the Ri/Rn ratio is below 1 for the great
majority of genes. We plotted human–rat Ka/Ks against
the Ri/Rn, which showed a strong positive correlation between Ka/Ks and Ri/Rn for Ri/Rn values less than 1 (supplementary fig. S1, Supplementary Material online). A simple
explanation for this correlation is that, similar to Ka/Ks, low
Ri/Rn is consistent with strong constraint, whereas high Ri/
Rn being consistent with weak constraint. In other words, as
selective constraint becomes stronger, its effect in limiting
the freedom of amino acid replacements tends to fall preferentially on interacting residues more so than on noninteracting residues. To better visualize the relationship between
Ka/Ks and Ri/Rn, we binned genes based on Ri/Rn values into
25 genes per bin. This produced a remarkably neat, positive
linear relationship between bin-average Ka/Ks and Ri/Rn for
Ri/Rn values less than 1. However, this positive relationship
breaks down for Ri/Rn values greater than 1. The reason for
this breakdown is not immediately obvious. We noted that
proteins with Ri/Rn values greater than 1 tend to have small
numbers of amino acid replacements, suggesting that stochastic variance may have contributed to the high Ri/Rn values in these proteins. However, additional studies are
needed to clarify the relationship between Ka/Ks and Ri/Rn.
The significantly greater selective constraint on interacting residues suggests that mutations in interacting residues may be more deleterious to fitness. To test this, we
collected all disease-causing missense mutations occurring
in our proteins. Of the 2,877 such mutations, 1,933 (67%)
strike interacting residues, significantly higher than the 63%
expected by chance (P , 0.00001).
Because the human lineage and the rat lineage
diverged from their common forebears, these 2 lineages
were beset by vastly different mutation rates, generation
time, demographic histories, and selective regimes. It is,
3D Structural Context and Selective Constraint 2133
therefore, of interest to examine whether the manner by
which 3D structural context affects evolutionary constraint
is comparable between these 2 lineages. Using dog sequences as the outgroup, we assigned human–rat amino acid
replacements onto either the human or the rat lineage. We
then analyzed the extent to which various 3D structural contexts affected amino acid replacement rates in each lineage.
We found that there is a remarkable symmetry between the
2 lineages (data not shown). Thus, the observations we
made for the human and rat lineages may reflect broader
patterns of protein evolution in other species.
In conclusion, our study provides compelling evidence
that 3D structural context of amino acid residues in proteins
exerts a strong influence on selective constraint. Although
this conclusion should not come as a surprise, our study is
the first to definitively demonstrate it on a genomic scale
and to offer quantitative measurements on the extent to
which various structural contexts affect constraint.
Materials and Methods
As described previously (Choi et al. 2005), a set of 767
rat–human orthologs was compiled for which 3D structures
were available for the human proteins, and there was at least
one amino acid difference between the rat and human orthologs. Dog sequences were also obtained for these proteins.
Human–rat amino acid replacements were parsed by parsimony onto either the human or the rat lineage (i.e., the identity of the human–rat ancestral residue is assumed to be the
residue that occurs in 2 of the 3 species), with residues that
differ in all 3 species excluded from the analysis. Structural
analysis of the proteins and the classification of 3D structural contexts were as previously described (Choi et al.
2005).
Ka and Ks between human and rat orthologs were calculated by the Li (1993) method. Amino acid replacement
rate for a given class of residues (e.g., buried residues) was
calculated as the fraction of residues in that class that differ
between human and rat. We did not correct for multiple hits
because they are likely rare given the low replacement rates.
Error bars are generated by 1,000 bootstrap resampling
(with replacement) of all the proteins and obtaining the
standard deviation of the resampled data. Neutral evolution
was simulated by evolving the human sequence forward,
allowing a random placement of mutations under the observed transition/transversion ratio until the number of simulated synonymous mutations was equal to the number of
synonymous mutations observed between human and rat
orthologs. This simulated sequence was then compared
against the original human sequence to determine the neutral replacement rate. Grantham distances for amino
acid replacements were obtained as previously described
(Grantham 1974) and classified into conservative, moderate, radical, and very radical groups based on prior convention (Grantham 1974; Wyckoff et al. 2000; Choi and Lahn
2003; Choi et al. 2005).
Human disease-causing mutations were obtained
from the Human Gene Mutation Database (http://www.
hgmd.cf.ac.uk) (Stenson et al. 2003). We only considered
missense mutation (i.e., mutations that result in amino acid
changes) and disregarded the other types of mutations such
as nonsense mutations and deletions. This resulted in a total
of 2,877 residues, whose missense mutations are disease
causing in humans.
Supplementary Material
Supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe.
oxfordjournals.org/).
Literature Cited
Bao L, Cui Y. 2005. Prediction of the phenotypic effects of nonsynonymous single nucleotide polymorphisms using structural
and evolutionary information. Bioinformatics 21:2185–90.
Bustamante CD, Townsend JP, Hartl DL. 2000. Solvent accessibility and purifying selection within proteins of Escherichia
coli and Salmonella enterica. Mol Biol Evol 17:301–8.
Choi SS, Lahn BT. 2003. Adaptive evolution of MRG, a neuronspecific gene family implicated in nociception. Genome Res
13:2252–9.
Choi SS, Li W, Lahn BT. 2005. Robust signals of coevolution of
interacting residues in mammalian proteomes identified by
phylogeny-aided structural analysis. Nat Genet 37:1367–71.
Goldman N, Thorne JL, Jones DT. 1998. Assessing the impact of
secondary structure and solvent accessibility on protein evolution. Genetics 149:445–58.
Grantham R. 1974. Amino acid difference formula to help explain
protein evolution. Science 185:862–4.
Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N,
Haussler D, Sali A. 2005. LS-SNP: large-scale annotation of
coding non-synonymous SNPs based on multiple information
sources. Bioinformatics 21:2814–20.
Li WH. 1993. Unbiased estimation of the rates of synonymous and
nonsynonymous substitution. J Mol Evol 36:96–9.
Li WH. 1997. Molecular evolution. Sunderland, MA: Sinauer
Associates.
Mintseris J, Weng Z. 2005. Structure, function, and evolution of
transient and obligate protein-protein interactions. Proc Natl
Acad Sci USA 102:10930–5.
Porto M, Roman HE, Vendruscolo M, Bastolla U. 2005. Prediction of site-specific amino acid distributions and limits of
divergent evolutionary changes in protein sequences. Mol
Biol Evol 22:630–8.
Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS,
Abeysinghe S, Krawczak M, Cooper DN. 2003. Human Gene
Mutation Database (HGMD): 2003 update. Hum Mutat
21:577–81.
Wyckoff GJ, Wang W, Wu C-I. 2000. Rapid evolution of male
reproductive genes in the descent of man. Nature 403:304–9.
William Martin, Associate Editor
Accepted August 4, 2006