Download Selection at the Wobble Position of Codons Read by the Same tRNA

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Genomic library wikipedia , lookup

Non-coding DNA wikipedia , lookup

Gene desert wikipedia , lookup

Magnesium transporter wikipedia , lookup

Expression vector wikipedia , lookup

Gene expression wikipedia , lookup

Molecular ecology wikipedia , lookup

Point mutation wikipedia , lookup

Community fingerprinting wikipedia , lookup

Gene regulatory network wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Genomic imprinting wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

RNA-Seq wikipedia , lookup

Biosynthesis wikipedia , lookup

Ridge (biology) wikipedia , lookup

Gene wikipedia , lookup

Gene expression profiling wikipedia , lookup

Epitranscriptome wikipedia , lookup

Molecular evolution wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transfer RNA wikipedia , lookup

Genetic code wikipedia , lookup

Transcript
Selection at the Wobble Position of Codons Read by the Same tRNA in
Saccharomyces cerevisiae
Riccardo Percudani and Simone Ottonello
Institute of Biochemical Sciences, University of Parma, Parma, Italy
The transfer RNA gene complement of Saccharomyces cerevisiae was utilized for a whole-genome analysis of the
deviation from a neutral usage of pyrimidine-ending cognate codons, that is, codons read by a single tRNA species
having either inosine or guanosine as the first anticodon base. Mutational pressure at the wobble position was
estimated from the base composition of the noncoding portion of the yeast genome. The selective pressure for
translational efficiency was inferred from the degree of codon adaptation to tRNA gene redundancy and from mRNA
abundance data derived from yeast transcriptome analysis. Amino acid conservation in orthologous comparisons
with wholly sequenced microbial genomes was used to estimate translational accuracy requirements. A close correspondence was observed between the usage of wobble position pyrimidines and the frequency predicted by
mutational bias. However, in the case of four cognate pairs (Gly: ggu/ggc; Asn: aau/aac; Phe: uuu/uuc; Tyr: uau/
uac) all read by guanosine-starting anticodons, we found evidence for a strong selective pressure driven by translational efficiency. Only for the glycine pair, wobble pyrimidine choice also appears to fulfill a translational accuracy
requirement. Wobble pyrimidine selection is strictly related to the number of hydrogen bonds formed by alternative
cognate codons: whenever a different number of hydrogen bonds can be formed at the wobble position, there is
selection against six- or nine-hydrogen-bonded codon-anticodon pairs. Our results indicate that an intrinsic codon
preference, critically dependent on the stability of codon-anticodon interaction and mainly reflecting selection for
the optimization of translational efficiency, is built into the translational apparatus.
Introduction
Mutation bias and natural selection determine a
nonrandom usage of synonymous codons in prokaryotic
and eukaryotic genes (Bulmer 1991; Sharp et al. 1993).
A major driving force responsible for deviations from a
neutral choice of codons is translational selection, and
a noticeable indication of selection toward an optimized
protein synthesis is the adjustment of codon usage to the
abundance of the various tRNA species. This has been
documented for prokaryotic and eukaryotic genes (Ikemura 1981, 1982, 1985, 1992; Bennetzen and Hall
1982) and more recently confirmed by a whole-genome
analysis in the yeast Saccharomyces cerevisiae, for
which synonymous codon usage and even protein amino
acid composition were found to be highly coadapted
with both gene copy number and intracellular content of
individual tRNA species (Percudani, Pavesi, and Ottonello 1997). Although the biased usage of synonymous
codons is to a large extent explained by this mutual
relationship, the preference for one codon over another
may also depend on more intrinsic aspects of the translational machinery. One such aspect is the stability of
codon-anticodon interaction. The contribution of codonanticodon complex stability to translational selection can
be evaluated by measuring the relative usage of cognate
codons (i.e., codons read by a single tRNA species),
whose choice is independent of tRNA abundance. Biased usage of some cognate codons in highly expressed
genes has been reported previously (Grosjean et al.
1978; Ikemura 1981), and it has been shown that some
particular Escherichia coli cognate codons differ in their
Key words: yeast genome, wobble, codon-anticodon interaction,
hydrogen bonds, translational efficiency, translational accuracy.
Address for correspondence and reprints: Simone Ottonello, Institute of Biochemical Sciences, Viale delle Scienze, I-43100 Parma,
Italy. E-mail: [email protected].
Mol. Biol. Evol. 16(12):1752–1762. 1999
q 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
1752
rates of tRNA recruitment in vitro (Thomas, Dix, and
Thompson 1988) and are translated at different rates in
vivo (Curran and Yarus 1989). A general preference for
‘‘intermediate G/C content’’ codons has been recognized
early in bacteria (Grosjean et al. 1978; Grosjean and
Fiers 1982), and a set of rules to predict cognate codon
choice have been formulated taking into account the different types of codon-anticodon interactions (Ikemura
1981, 1982, 1985, 1992). Nevertheless, the existence of
a causal relationship between cognate codon choice and
translational selection has been either negated (Kurland
1987) or variably attributed to translational efficiency or
accuracy factors or both (Grosjean and Fiers 1982; Parker et al. 1983; Thomas, Dix, and Thompson 1988; Curran and Yarus 1989).
The alternative usage of codons within a cognate
pair can entail variations in the energetics of codon-anticodon interaction. These variations depend on the identity of the first anticodon base and on the use of wobbling for a given organism. For example, in the case of
a G-starting anticodon, three or two hydrogen bonds can
be formed at the wobble position with C or U residues,
while the same interactions made by an I-starting anticodon invariably involve the formation of two hydrogen
bonds. A full appreciation of cognate codon selection
and its possible effects on either the efficiency or the
accuracy of translation thus requires complete knowledge not only of the codon frequency in protein coding
genes, but also of the composition and decoding specificity of the tRNA gene complement. The availability of
detailed information on the decoding system of Saccharomyces cerevisiae, together with a systematic analysis
of the deviation from a neutral usage of cognate codons,
has allowed us to delineate on a whole-genome scale
the significance of codon preferences that are independent of tRNA abundance. A key aspect of this work was
the use of two indicators of gene expressivity not in-
Genomic Selection in Cognate Codon Pairs
trinsically biased by the choice of particular codons
within cognate pairs and the evaluation of translational
accuracy requirements through the comparison of orthologous genes from wholly sequenced genomes. This
allowed us to distinguish between efficiency (i.e., the
optimization of the speed and energetic cost) and accuracy (i.e., the optimization of decoding fidelity) as the
driving forces of the translational selective pressure acting on wobble pyrimidine choice in yeast.
Materials and Methods
Protein-coding sequences of S. cerevisiae were extracted from the complete list of yeast open reading
frames (ORFs) available at the MIPS site (ftp://
ftp.mips.embnet.org) as of February 1998. Sequences recognized as ORFs through computer analysis but of uncertain identity were sorted out and discarded by searching individual entries for the key words ‘‘hypothetical
protein,’’ ‘‘questionable ORF,’’ and ‘‘pseudogene.’’ To
minimize redundancy, all the protein-coding sequences
thus selected were internally matched, and single members of highly conserved paralogous gene families were
included in our sequence sample. This filtering step was
carried out with the iterative use of the program FASTA
(Pearson and Lipman 1988) set to standard parameters; a
dedicated C language program was employed to conduct
pairwise homology searches and to eliminate all the sequences having nucleotide sequence similarity scores
with other yeast genes higher than one third of the selfscores. The decoding properties and the gene copy number of yeast tRNAs utilized in this study have been reported previously (Percudani, Pavesi, and Ottonello 1997;
Hani and Feldmann 1998). For the entire set of retained
yeast protein-coding genes, we calculated the usage of
alternative codons read by a single tRNA species, with
the exception of four tRNAs whose decoding specificities
are still uncertain. The expected base frequency at the
wobble position of each cognate pair was estimated from
the base frequency distribution of the noncoding portion
of the yeast genome (i.e., the subset of sequences remaining after the conservative subtraction of all identified
ORFs). Deviations from the expected base frequencies for
each cognate codon pair were estimated with the G-statistic (Sokal and Rohlf 1995, pp. 685–743) and expressed
as the natural logarithm of likelihood, calculated as the
sum of ƒ ln(ƒ/ƒ^),where ƒ and ƒ^ are, respectively, the
observed and the expected base frequencies. Deviations
from the expected base frequencies in the different classes in which yeast genes were subdivided were calculated
by summing cognate codon frequencies for all of the
genes included in each class. Heterogeneity in cognate
codon usage among the various gene classes was evaluated with the heterogeneity G-test.
As shown by Ikemura (1981, 1985) the parameter
values of the regression line ( y 5 a 1 bx) of tRNA
abundance versus codon frequency distributions strictly
depend on the expressivity of genes, and high slope values are associated with genes that encode abundant protein species. In this study, the slope (btRNA) values of
the regression lines that fit the data point distributions
1753
relating yeast tRNA gene multiplicity to codon frequency in individual genes (Percudani, Pavesi, and Ottonello
1997) were used to subdivide yeast protein-coding genes
into 10 classes of translational selective pressure. The
intracellular levels of yeast mRNAs were inferred from
an extensive set of data (V. E. Velculescu, personal communication) derived from yeast transcriptome analysis
(Velculescu et al. 1997). These data, which were obtained under three different growth conditions (logarithmic, G2/M transition, and stationary), refer to a total of
60,633 yeast mRNAs, each represented by a sequence
tag of 10 nt preceded by a four-base N1aIII restriction
site. Tags associated with putative untranslated regions
were not considered. Only tags internal to the coding
sequence and matching no more than one genomic region were used to assign mRNA output values to genes
in our sample. To minimize the number of false positives, tags not represented at least once in all three
growth conditions were similarly discarded. With these
restrictions, which reduce the number of tag-identifiable
genes, especially for those belonging to low-expression
classes, a total of 14,917 tags could be unambiguously
assigned to 673 genes in our sample.
Clusters of orthologous groups (COGs), consisting
of individual orthologous proteins or orthologous sets of
paralogs from wholly sequenced organisms belonging to
at least three lineages (Tatusov, Koonin, and Lipman
1997), were used to identify orthologous relatives of
yeast genes. Within each COG, obtained from the World
Wide Web (http://www.ncbi.nlm.nih.gov/COG), proteins
from yeast and other microorganisms were reciprocally
compared (i.e., each yeast protein against all the other
predicted microbial proteins and vice versa) with the rss
program of the FASTA package (Pearson and Lipman
1988). Each comparison yielded a similarity score, expressed as S 5 (Sobs 2 Srand)/(Sident 2 Srand), where Sobs
is the score for a given amino acid alignment, Srand is
the score for the comparison of two random sequences
(resulting from five independent randomizations) having
the same length and amino acid composition as the real
sequences, and Sident is the average of the score values
produced by self comparisons (Doolittle et al. 1996). For
all yeast proteins within a COG, we took into account
the best score value (BeS) obtained for a given organism. Similarly, the BeS score values of microbial protein
cooling sequences with respect to yeast were recorded.
The relationship between two sequences was considered
orthologous when a yeast BeS in a given organism was
reciprocally BeS for such an organism in yeast. For our
analysis, we selected yeast proteins having at least two
orthologs in two different evolutionary lineages corresponding to either Archaea (Methanococcus jannaschii),
Cyanobacteria (Synechocystis spp.), g-Gram-negative
(Escherichia coli, Haemophilus influenzae), or e-Gramnegative (Helicobacter phylori) bacteria. Amino acid sequence conservation for a total of 402 yeast genes meeting the above criteria was considered to be inversely
related to the mean genetic distance (D), calculated by
the Poissonian relation D 5 2ln(S). Multiple alignments
within this set of orthologous genes were carried out
with the CLUSTAL W program (Thompson, Higgins,
1754
Percudani and Ottonello
and Gibson 1994) set to standard parameters; leaving
out all gaped positions, we distinguished variable residues from invariant ones (i.e., residues corresponding to
positions at which the identity of an amino acid is the
same in all organisms). A dedicated program was used
to locate cognate codons for Gly (ggu/ggc), Asn (aau/
aac), Phe (uuu/uuc), and Tyr (uau/uac) in the corresponding yeast nucleotide sequences. The proportions of
nnu and nnc cognate codons at invariant and variable
sequence positions were compared by calculating the U/
C odds ratios. These were obtained by dividing the proportions of nnu/nnc codons at the two types of residue
positions; the resulting values were tested for significance using the G-test of independence.
Individual btRNA values for the 4,128 yeast genes
considered in the present study, together with mRNA
level and orthologous comparison data for a subset of
these genes, are available at the URL http://biochimica.
unipr.it/wobble.
Results
Deviation from Neutrality at the Wobble Position of
Pyrimidine-Ending Cognate Codons
Wobbling in S. cerevisiae pertains to pyrimidineending codons and involves 15 tRNAs with either guanosine- or inosine-starting anticodons (table 1). Exceptions to this rule are four additional tRNAs for which
the modification state of the first anticodon base is as
yet poorly defined and that are supposed to recognize a
wobbling purine (Percudani, Pavesi, and Ottonello 1997;
Hani and Feldmann 1998). We determined the occurrence of the codons served by the 15 tRNAs reading
pyrimidine-ending cognate pairs in a complete, nonredundant set of yeast protein-coding sequences that was
extracted from the 6,296 ORFs identified by the Yeast
Genome Project (Goffeau et al. 1996) as of February
1998, by excluding 429 ORFs annotated as ‘‘questionable,’’ 19 putative pseudogenes, and 1,242 hypothetical
proteins. A final sample of 4,128 bona fide protein-coding genes was obtained by filtering out 62 identical and
414 highly similar sequences. On the assumption of no
selective pressure, the expected frequencies of cytidineand uridine-ending codons within cognate pairs were
calculated by taking the base composition of the noncoding portion of the yeast genome (63.9% A/T, 36.1%
G/C) as our best estimate of mutational bias in S. cerevisiae. Deviations from these expected frequencies of
the relative usage of C- and U-ending cognate codons
were evaluated using the G-statistic (Sokal and Rohlf
1995, pp. 685–743) and expressed as the natural logarithm of the respective likelihood values. As shown in
table 1, there is generally good agreement between the
observed and the expected partitionings of C- and Uending codons within cognate pairs. If one further considers the average distribution of third-position C and U
residues for all 15 cognate pairs, a value of 63.6% U
and 36.4% C is obtained, which is remarkably closer to
the overall pyrimidine composition of the noncoding
portion than to that of the coding portion of the yeast
genome (58.9% U, 41.1% C). However, there are some
prominent cases of deviation. Cognate pairs that exhibit
the highest deviation from the average C/U distribution
are also characterized by the highest likelihood distance
from expected frequencies and are all decoded by
tRNAs having guanosine residues at the wobble position. A distinguishing feature of G-starting anticodons,
as compared with I-starting anticodons, is their ability
to form a number of hydrogen bonds that is different
for standard (G[C) or wobbling (G5U) interactions.
Wobble Base Choice as a Function of Translational
Selection
Since deviations from neutrality in cognate codon
usage appear to be associated with particular tRNA anticodons, and hence to selection at the translational level, we examined the dependence of these deviations
from the intensity of translational selection. The selective pressure acting on individual protein-coding genes
was inferred from the degree of adaptation between
tRNA gene copy number and tRNA usage. This was
measured by the slope value (btRNA) of the regression
line that fits the data point distribution relating tRNA
usage of individual protein-coding sequences to the multiplicity of the corresponding tRNA genes—a parameter
that, at variance with commonly used expressivity indices (Bennetzen and Hall 1982; Sharp and Li 1987;
Wright 1990), is not intrinsically biased by the choice
of a particular codon within a cognate pair. To probe the
dependence of cognate codon choice on translational selective pressure, the 4,128 protein-coding sequences of
S. cerevisiae were subdivided into 10 equal-sized classes
of increasing btRNA values (table 2). Although arbitrarily
chosen, this subdivision proved to be effective in discriminating between the relatively few highly expressed
genes and the vast majority of genes of low to intermediate expressivity. For a subset of genes comprised
in each of these 10 classes, we could also estimate the
abundance of the corresponding mRNAs on the basis of
yeast transcriptome data (Velculescu et al. 1997). As
shown in table 2, genes belonging to classes with high
btRNA values, and thus predicted to be under strong translational selective pressure, are also characterized by high
mRNA expression levels. This behavior is in keeping
with recent observations pointing to the existence of a
close relationship between translational and transcriptional indicators of gene expression (Pavesi 1999). Both
indicators, however, should be considered when judging
the expressivity of a gene, because codon usage measurements may be inadequate for genes whose compositions are strongly biased by functional or structural
constraints, while mRNA data necessarily reflect the expression profile associated with particular growth conditions. For each class thus delineated, we measured
wobble pyrimidine usage frequencies within cognate
pairs and found them to be heterogeneously distributed
among classes of tRNA usage (table 2). The heterogeneity of wobble pyrimidine usage is higher for cognate
pairs read by G-starting anticodons and is most prominent in a btRNA range that includes genes expressed at
high levels. Indeed, alternative cognate codons are used
Genomic Selection in Cognate Codon Pairs
Table 1
Observed and Expected Frequencies of Wobble Pyrimidine Codons in Saccharomyces
cerevisiae Protein-Coding Genes
Amino
Acid
tRNAa
tRNA
Usage
Tyr . . . . .
tGUA
72,778
Gly . . . . .
tGCC
72,527
Asn . . . . .
tGUU
135,928
Phe . . . . .
tGAA
97,905
Leu . . . . .
tGAG
38,300
Ser . . . . .
tGCU
53,921
Asp . . . . .
tGUC
130,039
Val . . . . .
tIAC
73,006
Pro . . . . .
tIGG
45,282
Thr . . . . .
tIGU
71,498
Ala . . . . .
tIGC
72,067
Ser . . . . .
tIGA
83,181
Ile . . . . . .
tIAU
104,718
His . . . . .
tGUG
47,201
Cys . . . . .
tGCA
27,288
Codon
uau
uac
ggu
ggc
aau
aac
uuu
uuc
cuu
cuc
agu
agc
gau
gac
guu
guc
ccu
ccc
acu
acc
gcu
gcc
ucu
ucc
auu
auc
cau
cac
ugu
ugc
Observed
ƒ
Expectedb
ƒ^
ƒ ln(ƒ/ƒ^)
ln Lc
41,591
31,187
50,833
21,694
81,445
54,483
58,284
39,621
26,618
11,682
32,177
21,744
85,641
44,398
48,418
24,588
30,075
15,207
44,366
27,132
44,881
27,186
51,977
31,204
67,550
37,168
30,522
16,679
17,213
10,075
46,535
26,242
46,375
26,151
86,915
49,012
62,602
35,302
24,489
13,810
34,478
19,442
83,149
46,889
46,681
26,324
28,954
16,327
45,717
25,780
46,081
25,985
53,187
29,993
66,958
37,759
30,181
17,019
17,448
9,839
24,671.539
5,384.129
4,665.719
24,053.561
25,294.144
5,765.585
24,165.535
4,573.042
2,218.973
21,954.910
22,222.450
2,433.201
2,528.975
22,423.628
1,768.923
21,677.456
1,142.426
21,080.675
21,330.838
1,386.848
21,184.238
1,228.337
21,196.131
1,235.125
594.609
2586.351
342.919
2336.581
2233.410
238.808
712.6
612.2
471.4
407.5
264.1
210.8
105.3
91.5
61.8
56.0
44.1
39.0
8.3
6.3
5.4
a For all the listed I-starting anticodons (INN), yeast has a UNN isoacceptor tRNA that reads the corresponding Aending codon (Percudani, Pavesi, and Ottonello 1997; Hani and Feldmann 1998); in these cases, in vivo experiments have
shown that decoding of A by I is negligible (Munz et al. 1981). The first anticodon residue of tRNA Phe GAA and the
second anticodon residue of tRNA Tyr GUA are present, respectively, as 29-O-methylguanosine and pseudouridine modified
bases (Hani and Feldmann 1998).
b Expected codon frequencies within cognate pairs were calculated by partitioning the tRNA usage of 4,128 protein
sequences according to the pyrimidine frequency (63.94% T, 36.06% C) of the noncoding portion of the yeast genome.
c Log-likelihood values were calculated by summing the ƒ-weighted logarithm of the ratio between observed and
expected frequencies for each codon within a cognate pair.
Table 2
Heterogeneity of Wobble Pyrimidine Codon Usage Among Classes of Genes of Different
tRNA Usage
CLASS
1
btRNAa
2
3
4
5
6
7
8
9
10
........
0.33
0.45
0.49
0.53
0.57
0.61
0.66
0.73
0.84
1.19
(412) (414) (414) (411) (413) (412) (414)
(412)
(413)
(413)
mRNA tagsb . . 201
241
172
189
261
417
670
1,138
1,768
9,860
(31)
(28)
(27)
(36)
(42)
(48)
(70)
(109)
(130)
(152)
1.3% 1.6% 1.2% 1.3% 1.7% 2.8% 4.5%
7.6%
11.9%
66.1%
tGNN . . . . . . . .
tINN . . . . . . . . .
Heterogeneity Among Classes 1–5c
Heterogeneity Among Classes 6–10c
53.7
34.2
529.2
112.1
a Average b
tRNA values (Percudani, Pavesi, and Ottonello 1997) are reported for each class; the numbers of genes in
individual classes are shown in parentheses.
b mRNA levels for three growth conditions (logarithmic, G2/M, and stationary) derived from yeast transcriptome data
(Velculescu et al. 1997; see Materials and Methods) are expressed as the absolute number and the percentage of tags
assigned to each class; a higher fraction of tags and a higher number of tag-identified genes (indicated in parentheses) are
associated with classes with high btRNA values.
c Mean heterogeneity G values of wobble pyrimidine usage for cognate codons read by guanosine-starting (tGNN) or
inosine-starting (tINN) anticodons were separately calculated for classes 1–5 and 6–10.
1755
1756
Percudani and Ottonello
in a much more homogeneous manner in classes of
genes subjected to lower translational selective pressure.
A clear-cut relationship between cognate codon usage and translational selection becomes apparent when
we consider departures from the expected frequencies of
individual cognate codons in classes of genes with different tRNA usages. For I-starting anticodons, there is
no marked selection for either C- or U-ending codons
in the different btRNA classes (fig. 1, left panels). In contrast, four cognate codon pairs read by G-starting anticodons display an increasingly strong departure from
compositional bias in classes of increasing translational
selective pressure (fig. 1, right panels). A superimposable deviation pattern was obtained by using a classification scheme based only on mRNA abundance data
(fig. 2). A closer inspection of cognate pairs undergoing
selection reveals that a C residue at the wobble position
is only and always preferred when four hydrogen bonds
are formed at the first two codon positions (figs. 1B and
2B). Conversely, a preference for a wobble U residue is
observed in the single case in which six hydrogen bonds
are formed at the two nonwobble positions (figs. 1F and
2F). In other words, whenever the type of interaction
established by the first anticodon base can determine a
variation in the number of hydrogen bonds, there is selection against codon-anticodon pairs that form a total
of either six or nine hydrogen bonds.
Distinguishing Between Efficiency- and AccuracyDriven Selection
The existence of a close link between the biased
use of some cognate codons and both gene expressivity
and tRNA usage indicates that wobble pyrimidine
choice for these codons is under translational selective
pressure. The force driving this choice can be ascribed
to an optimization of the efficiency and/or accuracy of
translation. While efficiency is an obvious requirement
of highly expressed genes, there is no a priori reason to
expect that the effects of misincorporation events on organism fitness are related to gene expressivity. Instead,
selection for translational accuracy is expected to be related to the evolutionary rate of a protein, as genes coding for proteins with a strong requirement for amino acid
sequence conservation should be subjected to the highest pressure for the choice of accurate codons. A conservation-dependent synonymous codon usage bias is
not present in bacteria (Hartl, Moriyama, and Sawyer
1994), but it has been observed in Drosophila (Akashi
1994; Tautz and Nigro 1998). To distinguish between
the relative contributions of efficiency and accuracy in
determining wobble base selection, we examined the dependence of such selection on amino acid sequence conservation. This was inferred from genetic distance measurements conducted on yeast genes sharing at least two
orthologous relatives in wholly sequenced, evolutionarily distinct microbial genomes (Tatusov, Koonin, and
Lipman 1997; see Materials and Methods for details).
In this case, as opposed to the case of proteins only
represented in yeast and in a single microbial lineage,
the average genetic distance was found not to be appreciably influenced by the composition of a particular phy-
logenetic assembly, thereby providing a more reliable
indicator of amino acid sequence conservation. The resulting set of evolutionarily conserved yeast genes, a
total of 402 sequences extracted from our nonredundant
sample, was then subdivided into 10 equal-sized classes
of increasing amino acid sequence conservation and analyzed for deviations from codon frequencies predicted
by mutational bias. No dependence of cognate codon
bias on amino acid sequence conservation was observed. As shown in figure 3A, this also holds for the
four cognate pairs characterized by marked deviation
from compositional bias. In contrast, the subdivision of
this same set of genes according to expressivity criteria
produced a pattern of deviation (fig. 3B) superimposable
on that previously obtained for the entire set of yeast
protein-coding genes.
By distinguishing between invariant or variable
amino acid residues in the 402 orthologous alignments,
we next examined the dependence of cognate codon
choice on the conservation of the specific amino acids
(Gly, Asn, Phe, and Tyr) that exhibit the strongest cognate bias in highly expressed genes. As shown in table
3, such a dependence is only apparent in the case of
glycine, for which the U/C ratio at the wobble position
of the ggu/ggc cognate pair is significantly higher for
invariant than for variable positions. The extent of such
preference is greater for genes having btRNA values
above the average (U/C odds ratio 5 1.56; P , 0.001,
G-test of independence) than for genes having btRNA values below average (U/C odds ratio 5 1.16; P . 0.1).
However, this conservation-dependent bias does not fully account for the preference of U-ending glycine codons by highly expressed genes, a preference that is also
observed at variable positions. Further evidence against
a selection of U-ending glycine codons exclusively driven by translational accuracy was provided by U/C ratios
measured at invariant and variable sequence positions of
individual protein-coding genes. In fact, an identical
proportion of wobble U and C residues at either position
(U/C odds ratio 5 1) is present in 29 cases; 177 genes
have a higher proportion of wobble uridine residues at
variable positions (U/C odds ratio , 1), while a prevalence of U at invariant positions is observed in 196
genes (U/C odds ratio . 1).
It thus appears that the preference for C-ending codons within aau/c, uuu/c, and uau/c cognate pairs is dictated only by efficiency factors, while the opposite preference in the case of the ccu/c glycine pair reflects both
efficiency and accuracy needs. Such a difference is also
evident when we consider the dependence of wobble
base choice on gene length for these four cognate codon
pairs (fig. 4). In keeping with recent observations, yeast
protein-coding genes of short lengths, which among
highly expressed genes have an especially strong requirement for translational efficiency (Moriyama and
Powell 1998), have a strongly biased usage of all four
of these cognate codon pairs. Wobble pyrimidine selection becomes increasingly relaxed with increasing gene
length for the three cognate pairs that are not influenced
by amino acid residue conservation. Instead, for glycine
ccu/c cognate codons, selective pressure is maintained
Genomic Selection in Cognate Codon Pairs
1757
FIG. 1.—Deviation from compositional bias of wobble pyrimidine base frequencies in classes of genes with different tRNA usages. The
4,128 yeast protein-coding genes were subdivided into 10 classes of increasing btRNA values as specified in table 2. Data points correspond to
log-likelihood distances (ln L) from the expected usage of alternative cognate codons forming either four (A and B), five (C and D), or six (E
and F) hydrogen bonds at the two nonwobble positions; total ln L values were divided by the number of genes in each class. Wobble C and U
frequencies higher than expected are plotted, respectively, above and below the reference line representing the frequencies predicted by mutational
bias.
1758
Percudani and Ottonello
FIG. 2.—Deviation from compositional bias of wobble pyrimidine base frequencies in classes of genes of different expressivities. The 673
mRNA tag-identified yeast genes in our sample were subdivided into 10 equal-sized classes with increasing numbers of tags per gene. The
percentage of the total number of tags represented in each class is indicated. Data points correspond to log-likelihood distances (ln L) from the
expected usage of alternative cognate codons forming either four (A and B), five (C and D), or six (E and F) hydrogen bonds at the two
nonwobble positions; total ln L values were divided by the number of genes in each class. Wobble C and U frequencies higher than expected
are plotted, respectively, above and below the reference line representing the frequencies predicted by mutational bias.
Genomic Selection in Cognate Codon Pairs
1759
FIG. 3.—Dependence of wobble base selection on sequence conservation and tRNA usage. A, Four hundred two yeast genes sharing at
least two orthologous relatives in wholly sequenced, evolutionarily distinct microbial genomes were subdivided into 10 equal-sized classes of
increasing sequence conservation, expressed as the reciprocal of the mean genetic distance (1/D); average 1/D values for each class are given.
B, The same set of 402 genes was subdivided into 10 classes of increasing btRNA values according to the criteria specified in table 2. Data points
correspond to log-likelihood distances (ln L) from the expected usage of codons belonging to the four translationally selected cognate pairs
divided by the number of genes in each class. Wobble C and U frequencies higher than expected are plotted, respectively, above and below the
reference line representing the frequencies predicted by mutational bias.
even in the longest coding sequences, which are expected to have the greatest need for accuracy (EyreWalker 1996).
Discussion
As revealed by the present analysis, there is good
overall correspondence between the observed usage of
wobble pyrimidines in cognate codons and the frequency expected on the assumption of no translational selective pressure. Contrasting with the selective constraints
acting on third-base choice for synonymous codons read
by different tRNA isoacceptors (as for most purine-ending codons), this delineates a rather unique situation of
lack of selection within a coding region. In this scenario,
four particular cases stand out in which wobble base
usage departs markedly from compositional bias. Two
observations indicate that departure from compositional
bias pertains to the physicochemical characteristics of
codon-anticodon interaction and is mostly determined
by a selective pressure for the optimization of translational efficiency. First, all of the prominent cases of
wobble pyrimidine selection are associated with a particular type of tRNA anticodon (G-starting) and thus
with the formation of a different number of hydrogen
bonds with alternative cognate codons. More specifically, we find that third-base choice within cognate pairs
determines an increased or a decreased stability of codon-anticodon interaction when four or six hydrogen
bonds, respectively, are formed at the two nonwobble
positions. Second, all cases of deviation are consistently
associated with the degree of expressivity of individual
genes, while a relationship with amino acid conservation
is only observed for the glycine ggc/ggu pair. This indicates that the selective pressure driving cognate codon
choice pertains mainly to the optimization of the rate
and energetic cost, rather than to the accuracy, of the
translational process. Base stacking, besides hydrogen
bonding, is known to contribute to the stability of intermolecular RNA duplexes. It may thus seem surprising
Table 3
Wobble Pyrimidine Frequencies in Translationally Selected Cognate Codons Encoding Invariant or Variable Amino
Acid Residues
GLYCINE (tRNA GCC)
TYPE OF
RESIDUEa
No.
Invariant . . 2,644
Variable . . . 3,553
a
ggu
ggc
0.81
0.75
0.19
0.25
U/C
Odds
Ratiob
1.41*
ASPARAGINE (tRNA GUU)
No.
aau
aac
675
4,780
0.51
0.54
0.49
0.46
U/C
Odds
Ratiob
0.91
PHENYLALANINE (tRNA GAA)
No.
uuu
uuc
725
4,647
0.54
0.54
0.46
0.46
U/C
Odds
Ratiob
0.98
TYROSINE (tRNA GUA)
No.
uau
uac
599
3,465
0.52
0.51
0.48
0.49
Amino acid residues in 402 sets of aligned orthologous genes were classified as invariant or variable as specified in Materials and Methods.
U/C odds ratios were calculated by dividing the proportions of nnu/nnc codons at invariant and variable sequence positions.
* P , 0.001, G-test of independence.
b
U/C
Odds
Ratiob
1.02
1760
Percudani and Ottonello
FIG. 4.—Relationships between wobble pyrimidine choice and gene length in genes subjected to translational selection. A set of 826 genes
belonging to the two highest btRNA classes in table 2 was subdivided into five gene length (kb) classes; the number of genes in each class is
given in parentheses. Data points correspond to the fractional usage of cytosine at the wobble position (nnc/(nnu1nnc)) of the four translationally
selected cognate codon pairs; the dashed line indicates the fraction of third-position C residues predicted by mutational bias.
that the hydrogen bond balance of different codon-anticodon pairs is by itself sufficient to account for wobble-base selection in S. cerevisiae. It should be noted,
however, that a substantial contribution of binding interactions other than hydrogen bonds to this kind of selection would not explain the clear-cut distinction between codons read by inosine- and guanosine-starting
anticodons. Furthermore, these same interactions do not
account for the nearly equal selection intensity measured
for six-hydrogen-bonded codon-anticodon pairs (figs. 1B
and 2B), regardless of differences in stacking and modification states of the anticodon bases.
The association rate of ternary complexes (Elongation Factor·GTP·aa-tRNA) with the ribosome depends
only on their relative concentrations and does not involve codon recognition (Rodnina et al. 1996). Thus, the
choice among codons read by the same tRNA must rely
on differences in the dissociation constants of the two
alternative codon-anticodon complexes within a given
cognate pair. It is generally accepted that tRNA dissociation along the translational elongation cycle can occur at three distinct stages (Thompson 1988; Yarus and
Smith 1995): (1) initial discharge of the ternary complex
prior to codon-dependent GTP hydrolysis, (2) release of
the aminoacyl-tRNA complex after GTP hydrolysis
(proofreading), and (3) exit of the deacylated tRNA. Because departures from neutral cognate codon usage are
either toward an increased or a decreased stability of
codon-anticodon interaction, it follows that the targets
of such selection, and hence the translational steps involved, must be different.
The reason for the avoidance of six-hydrogenbonded codon-anticodon interactions, observed for the
Asn, Phe, and Tyr cognate pairs regardless of amino
acid residue conservation, conceivably resides in a reduction of the frequency of premature release events,
whose probability is stochastically determined by kinetic
standards built into the translational system (Thompson
1988). Since GTP hydrolysis has recently been shown
to be triggered by a cognate ternary complex at a rate
that is four orders of magnitude higher than the rate for
a noncognate complex (Rodnina et al. 1996), it seems
unlikely that this step has enough sensitivity to discriminate within cognate codon pairs. Such discrimination,
instead, may be afforded by proofreading, a step in
which the fate of an EF·GDP·aa-tRNA complex is stochastically determined by the ratio between the codondependent rate constant for dissociation of the ternary
complex and the codon-independent release of the binary complex (EF·GDP). Acting as an internal kinetic
standard, the latter step must be slow enough to allow
for the release of near-cognate ternary complexes. The
same step, however, can also lead to the premature dissociation of the counterselected six-hydrogen-bonded
ternary complexes. Excess of proofreading of weak interacting cognate codon-anticodon complexes would
thus determine a slowing down and an increased energetic cost of elongation. Selection against nine-hydrogen-bonded codon-anticodon pairs, as observed in the
case of glycine ggu/ggc cognate codons, may pertain to
the release of deacylated tRNAs from the E site. This
step, which in various fungi (including S. cerevisiae) is
promoted by a dedicated, ATP-dependent elongation
factor (Triana Alonso, Chakraburtty, and Nierhaus
1995), is the only one in which the exceedingly slow
dissociation of an otherwise correct tRNA can critically
reduce the efficiency of translation. However, the observation of an even more prominent selection of the
ggu codon at positions at which there is a critical requirement for glycine conservation suggests that an additional reason for this choice is the fulfillment of an
accuracy need. This is consistent with the more stable
interaction of the counterselected ggc codon with the
abundant competitor tRNAAsp GUC through second-po-
Genomic Selection in Cognate Codon Pairs
sition wobbling, as previously proposed by Kato (1990).
In line with this view, since in different organisms the
type and relative concentration of the competitor tRNA
may vary, organism-specific preferences in the usage of
this particular cognate pair are expected.
If the choice of particular codon-anticodon pairs
really results in an optimized translation, it may seem
surprising that only a small fraction of yeast genes actually undergo selection. It should be noted, however,
that although selection within cognate pairs is only apparent in the last 2 of the 10 gene classes we considered,
these 2 classes alone account for the large majority (ca.
80%) of the total mRNA output of yeast cells (table 2).
Genes belonging to these two high-expressivity classes
are additionally characterized by a decreasing relative
usage of all weakly or strongly interacting purine-ending
codons. In fact, in highly expressed genes, we also observed a two- to fivefold reduction in the relative usage
of the five codons (cgg, ggg, aua, uua, aaa) read by
nonwobbling tRNAs and obligatorily leading to six- or
nine-hydrogen-bonded codon-anticodon interactions.
Furthermore, the strikingly reduced copy number of
genes encoding tRNAs with anticodons that obligatorily
form nine hydrogen bonds has been evidenced previously in yeast (Percudani, Pavesi, and Ottonello 1997).
The avoidance of both six- and nine-hydrogen-bonded
cognate codon-anticodon pairs thus appears to be part
of a general strategy which optimizes the biosynthetic
efficiency of most cellular proteins while complying
with a basic need for accuracy that is set by kinetic
standards internal to the system. By allowing the proper
recognition of all 61 sense codons, these kinetic standards have probably emerged as those that best satisfy
the most recurrent cases of hydrogen-bonding interaction resulting from the combinatorial assortment of the
four bases (7–8 hydrogen bonds for 75% of the codons).
Accordingly, the counterselection of the inherently less
frequent six- and nine-hydrogen-bonded codon-anticodon pairs can be viewed as an adaptation to the kinetic
standards of the translational system, relying on both
exploitation of code degeneracy and careful use of wobbling.
Acknowledgments
We are grateful to Gian Luigi Rossi for encouragement and support, to Victor Velculescu for the extended
set of yeast transcriptome data, to Roberto Favilla for
helpful discussions, and to Giorgio Dieci, Alessio Peracchi, and Claudio Rivetti for a critical reading of the
manuscript. Computer assistance by Silvia Giuliodori is
also gratefully acknowledged. This work was supported
by grants from the Ministry of University and of Scientific and Technological Research (MURST, Rome, Italy).
LITERATURE CITED
AKASHI, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935.
1761
BENNETZEN, J. L., and B. D. HALL. 1982. Codon selection in
yeast. J. Biol. Chem. 257:3026–3031.
BULMER, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907.
CURRAN, J. F., and M. YARUS. 1989. Rates of aminoacyl-tRNA
selection at 29 sense codons in vivo. J. Mol. Biol. 209:65–
77.
DOOLITTLE, R. F., D. F. FENG, S. TSANG, G. CHO, and E. LITTLE. 1996. Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271:
470–477.
EYRE-WALKER, A. 1996. Synonymous codon bias is related to
gene length in Escherichia coli: selection for translational
accuracy? Mol. Biol. Evol. 13:864–872.
GOFFEAU, A., B. G. BARRELL, H. BUSSEY et al. (16 co-authors).
1996. Life with 6000 genes. Science 274:563–567.
GROSJEAN, H., and W. FIERS. 1982. Preferential codon usage
in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199–209.
GROSJEAN, H., D. SANKOFF, W. M. JOU, W. FIERS, and R. J.
CEDERGREN. 1978. Bacteriophage MS2 RNA: a correlation
between the stability of the codon:anticodon interaction and
the choice of code words. J. Mol. Evol. 12:113–119.
HANI, J., and H. FELDMANN. 1998. tRNA genes and retroelements in the yeast genome. Nucleic Acids Res. 26:689–696.
HARTL, D. L., E. N. MORIYAMA, and S. A. SAWYER. 1994.
Selection intensity for codon bias. Genetics 138:227–234.
IKEMURA, T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151:389–409.
. 1982. Correlation between the abundance of yeast
transfer RNAs and the occurrence of the respective codons
in protein genes. Differences in synonymous codon choice
patterns of yeast and Escherichia coli with reference to the
abundance of isoaccepting transfer RNAs. J. Mol. Biol.
158:573–597.
. 1985. Codon usage and tRNA content in unicellular
and multicellular organisms. Mol. Biol. Evol. 2:13–34.
. 1992. Correlation between codon usage and tRNA
content in microorganisms. Pp. 88–111 in D. L. HATFIELD,
B. J. LEE, and R. M. PIRTLE, eds. Transfer RNA in protein
synthesis. CRC Press, London.
KATO, M. 1990. Codon discrimination due to presence of abundant non-cognate competitive tRNA. J. Theor. Biol. 142:
35–39.
KURLAND, C. G. 1987. Strategies for efficiency and accuracy
in gene expression. Trends Biochem. Sci. 12:126–128.
MORIYAMA, E. N., and J. R. POWELL. 1998. Gene length and
codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res.
26:3188–3193.
MUNZ, P., U. LEOPOLD, P. AGRIS, and J. KOHLI. 1981. In vivo
decoding rules in Schizosaccharomyces pombe are at variance with in vitro data. Nature 294:187–188.
PARKER, J., T. C. JOHNSTON, P. T. BORGIA, G. HOLTZ, E. REMAUT, and W. FIERS. 1983. Codon usage and mistranslation.
In vivo basal level misreading of the MS2 coat protein message. J. Biol. Chem. 258:10007–10012.
PAVESI, A. 1999. Relationships between transcriptional and
translational control of gene expression in Saccharomyces
cerevisiae: a multiple regression analysis. J. Mol. Evol. 48:
133–141.
1762
Percudani and Ottonello
PEARSON, W. R., and D. J. LIPMAN. 1988. Improved tools for
biological sequence comparison. Proc. Natl. Acad. Sci.
USA 85:2444–2448.
PERCUDANI, R., A. PAVESI, and S. OTTONELLO. 1997. Transfer
RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J. Mol. Biol. 268:322–330.
RODNINA, M. V., T. PAPE, R. FRICKE, L. KUHN, and W. WINTERMEYER. 1996. Initial binding of the elongation factor
Tu.GTP.aminoacyl-tRNA complex preceding codon recognition on the ribosome. J. Biol. Chem. 271:646–652.
SHARP, P. M., and W. H. LI. 1987. The codon adaptation index—a measure of directional synonymous codon usage
bias, and its potential applications. Nucleic Acids Res. 15:
1281–1295.
SHARP, P. M., M. STENICO, J. F. PEDEN, and A. T. LLOYD. 1993.
Codon usage: mutational bias, translational selection, or
both? Biochem. Soc. Trans. 21:835–841.
SOKAL, R. R., and F. J. ROHLF. 1995. Biometry. W. H. Freeman, New York.
TATUSOV, R. L., E. V. KOONIN, and D. J. LIPMAN. 1997. A
genomic perspective on protein families. Science 278:631–
637.
TAUTZ, D., and L. NIGRO. 1998. Microevolutionary divergence
pattern of the segmentation gene hunchback in Drosophila.
Mol. Biol. Evol. 15:1403–1411.
THOMAS, L. K., D. B. DIX, and R. C. THOMPSON. 1988. Codon
choice and gene expression: synonymous codons differ in
their ability to direct aminoacylated-transfer RNA binding
to ribosomes in vitro. Proc. Natl. Acad. Sci. USA 85:4242–
4246.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix. Nucleic
Acids Res. 22:4673–4680.
THOMPSON, R. C. 1988. EF-Tu provides an internal kinetic
standard for translational accuracy. Trends. Biochem. Sci.
13:91–93.
TRIANA ALONSO, F. J., K. CHAKRABURTTY, and K. H. NIERHAUS. 1995. The elongation factor 3 unique in higher fungi
and essential for protein biosynthesis is an E site factor. J.
Biol. Chem. 270:20473–20478.
VELCULESCU, V. E., L. ZHANG, W. ZHOU, J. VOGELSTEIN, M.
A. BASRAI, D. E. BASSETT JR., P. HIETER, B. VOGELSTEIN,
and K. W. KINZLER. 1997. Characterization of the yeast
transcriptome. Cell 88:243–251.
WRIGHT, F. 1990. The ‘effective number of codons’ used in a
gene. Gene 87:23–29.
YARUS, M., and D. SMITH. 1995. tRNA on the ribosome: a
waggle theory. Pp. 443–469 in D. SÖLL and U. RAJBHANDARY, eds. tRNA: structure, biosynthesis, and function.
American Society for Microbiology, Washington, D.C.
STANLEY SAWYER, reviewing editor
Accepted August 17, 1999