* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Selection at the Wobble Position of Codons Read by the Same tRNA
Genomic library wikipedia , lookup
Non-coding DNA wikipedia , lookup
Gene desert wikipedia , lookup
Magnesium transporter wikipedia , lookup
Expression vector wikipedia , lookup
Gene expression wikipedia , lookup
Molecular ecology wikipedia , lookup
Point mutation wikipedia , lookup
Community fingerprinting wikipedia , lookup
Gene regulatory network wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Genomic imprinting wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Biosynthesis wikipedia , lookup
Ridge (biology) wikipedia , lookup
Gene expression profiling wikipedia , lookup
Epitranscriptome wikipedia , lookup
Molecular evolution wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Selection at the Wobble Position of Codons Read by the Same tRNA in Saccharomyces cerevisiae Riccardo Percudani and Simone Ottonello Institute of Biochemical Sciences, University of Parma, Parma, Italy The transfer RNA gene complement of Saccharomyces cerevisiae was utilized for a whole-genome analysis of the deviation from a neutral usage of pyrimidine-ending cognate codons, that is, codons read by a single tRNA species having either inosine or guanosine as the first anticodon base. Mutational pressure at the wobble position was estimated from the base composition of the noncoding portion of the yeast genome. The selective pressure for translational efficiency was inferred from the degree of codon adaptation to tRNA gene redundancy and from mRNA abundance data derived from yeast transcriptome analysis. Amino acid conservation in orthologous comparisons with wholly sequenced microbial genomes was used to estimate translational accuracy requirements. A close correspondence was observed between the usage of wobble position pyrimidines and the frequency predicted by mutational bias. However, in the case of four cognate pairs (Gly: ggu/ggc; Asn: aau/aac; Phe: uuu/uuc; Tyr: uau/ uac) all read by guanosine-starting anticodons, we found evidence for a strong selective pressure driven by translational efficiency. Only for the glycine pair, wobble pyrimidine choice also appears to fulfill a translational accuracy requirement. Wobble pyrimidine selection is strictly related to the number of hydrogen bonds formed by alternative cognate codons: whenever a different number of hydrogen bonds can be formed at the wobble position, there is selection against six- or nine-hydrogen-bonded codon-anticodon pairs. Our results indicate that an intrinsic codon preference, critically dependent on the stability of codon-anticodon interaction and mainly reflecting selection for the optimization of translational efficiency, is built into the translational apparatus. Introduction Mutation bias and natural selection determine a nonrandom usage of synonymous codons in prokaryotic and eukaryotic genes (Bulmer 1991; Sharp et al. 1993). A major driving force responsible for deviations from a neutral choice of codons is translational selection, and a noticeable indication of selection toward an optimized protein synthesis is the adjustment of codon usage to the abundance of the various tRNA species. This has been documented for prokaryotic and eukaryotic genes (Ikemura 1981, 1982, 1985, 1992; Bennetzen and Hall 1982) and more recently confirmed by a whole-genome analysis in the yeast Saccharomyces cerevisiae, for which synonymous codon usage and even protein amino acid composition were found to be highly coadapted with both gene copy number and intracellular content of individual tRNA species (Percudani, Pavesi, and Ottonello 1997). Although the biased usage of synonymous codons is to a large extent explained by this mutual relationship, the preference for one codon over another may also depend on more intrinsic aspects of the translational machinery. One such aspect is the stability of codon-anticodon interaction. The contribution of codonanticodon complex stability to translational selection can be evaluated by measuring the relative usage of cognate codons (i.e., codons read by a single tRNA species), whose choice is independent of tRNA abundance. Biased usage of some cognate codons in highly expressed genes has been reported previously (Grosjean et al. 1978; Ikemura 1981), and it has been shown that some particular Escherichia coli cognate codons differ in their Key words: yeast genome, wobble, codon-anticodon interaction, hydrogen bonds, translational efficiency, translational accuracy. Address for correspondence and reprints: Simone Ottonello, Institute of Biochemical Sciences, Viale delle Scienze, I-43100 Parma, Italy. E-mail: [email protected]. Mol. Biol. Evol. 16(12):1752–1762. 1999 q 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038 1752 rates of tRNA recruitment in vitro (Thomas, Dix, and Thompson 1988) and are translated at different rates in vivo (Curran and Yarus 1989). A general preference for ‘‘intermediate G/C content’’ codons has been recognized early in bacteria (Grosjean et al. 1978; Grosjean and Fiers 1982), and a set of rules to predict cognate codon choice have been formulated taking into account the different types of codon-anticodon interactions (Ikemura 1981, 1982, 1985, 1992). Nevertheless, the existence of a causal relationship between cognate codon choice and translational selection has been either negated (Kurland 1987) or variably attributed to translational efficiency or accuracy factors or both (Grosjean and Fiers 1982; Parker et al. 1983; Thomas, Dix, and Thompson 1988; Curran and Yarus 1989). The alternative usage of codons within a cognate pair can entail variations in the energetics of codon-anticodon interaction. These variations depend on the identity of the first anticodon base and on the use of wobbling for a given organism. For example, in the case of a G-starting anticodon, three or two hydrogen bonds can be formed at the wobble position with C or U residues, while the same interactions made by an I-starting anticodon invariably involve the formation of two hydrogen bonds. A full appreciation of cognate codon selection and its possible effects on either the efficiency or the accuracy of translation thus requires complete knowledge not only of the codon frequency in protein coding genes, but also of the composition and decoding specificity of the tRNA gene complement. The availability of detailed information on the decoding system of Saccharomyces cerevisiae, together with a systematic analysis of the deviation from a neutral usage of cognate codons, has allowed us to delineate on a whole-genome scale the significance of codon preferences that are independent of tRNA abundance. A key aspect of this work was the use of two indicators of gene expressivity not in- Genomic Selection in Cognate Codon Pairs trinsically biased by the choice of particular codons within cognate pairs and the evaluation of translational accuracy requirements through the comparison of orthologous genes from wholly sequenced genomes. This allowed us to distinguish between efficiency (i.e., the optimization of the speed and energetic cost) and accuracy (i.e., the optimization of decoding fidelity) as the driving forces of the translational selective pressure acting on wobble pyrimidine choice in yeast. Materials and Methods Protein-coding sequences of S. cerevisiae were extracted from the complete list of yeast open reading frames (ORFs) available at the MIPS site (ftp:// ftp.mips.embnet.org) as of February 1998. Sequences recognized as ORFs through computer analysis but of uncertain identity were sorted out and discarded by searching individual entries for the key words ‘‘hypothetical protein,’’ ‘‘questionable ORF,’’ and ‘‘pseudogene.’’ To minimize redundancy, all the protein-coding sequences thus selected were internally matched, and single members of highly conserved paralogous gene families were included in our sequence sample. This filtering step was carried out with the iterative use of the program FASTA (Pearson and Lipman 1988) set to standard parameters; a dedicated C language program was employed to conduct pairwise homology searches and to eliminate all the sequences having nucleotide sequence similarity scores with other yeast genes higher than one third of the selfscores. The decoding properties and the gene copy number of yeast tRNAs utilized in this study have been reported previously (Percudani, Pavesi, and Ottonello 1997; Hani and Feldmann 1998). For the entire set of retained yeast protein-coding genes, we calculated the usage of alternative codons read by a single tRNA species, with the exception of four tRNAs whose decoding specificities are still uncertain. The expected base frequency at the wobble position of each cognate pair was estimated from the base frequency distribution of the noncoding portion of the yeast genome (i.e., the subset of sequences remaining after the conservative subtraction of all identified ORFs). Deviations from the expected base frequencies for each cognate codon pair were estimated with the G-statistic (Sokal and Rohlf 1995, pp. 685–743) and expressed as the natural logarithm of likelihood, calculated as the sum of ƒ ln(ƒ/ƒ^),where ƒ and ƒ^ are, respectively, the observed and the expected base frequencies. Deviations from the expected base frequencies in the different classes in which yeast genes were subdivided were calculated by summing cognate codon frequencies for all of the genes included in each class. Heterogeneity in cognate codon usage among the various gene classes was evaluated with the heterogeneity G-test. As shown by Ikemura (1981, 1985) the parameter values of the regression line ( y 5 a 1 bx) of tRNA abundance versus codon frequency distributions strictly depend on the expressivity of genes, and high slope values are associated with genes that encode abundant protein species. In this study, the slope (btRNA) values of the regression lines that fit the data point distributions 1753 relating yeast tRNA gene multiplicity to codon frequency in individual genes (Percudani, Pavesi, and Ottonello 1997) were used to subdivide yeast protein-coding genes into 10 classes of translational selective pressure. The intracellular levels of yeast mRNAs were inferred from an extensive set of data (V. E. Velculescu, personal communication) derived from yeast transcriptome analysis (Velculescu et al. 1997). These data, which were obtained under three different growth conditions (logarithmic, G2/M transition, and stationary), refer to a total of 60,633 yeast mRNAs, each represented by a sequence tag of 10 nt preceded by a four-base N1aIII restriction site. Tags associated with putative untranslated regions were not considered. Only tags internal to the coding sequence and matching no more than one genomic region were used to assign mRNA output values to genes in our sample. To minimize the number of false positives, tags not represented at least once in all three growth conditions were similarly discarded. With these restrictions, which reduce the number of tag-identifiable genes, especially for those belonging to low-expression classes, a total of 14,917 tags could be unambiguously assigned to 673 genes in our sample. Clusters of orthologous groups (COGs), consisting of individual orthologous proteins or orthologous sets of paralogs from wholly sequenced organisms belonging to at least three lineages (Tatusov, Koonin, and Lipman 1997), were used to identify orthologous relatives of yeast genes. Within each COG, obtained from the World Wide Web (http://www.ncbi.nlm.nih.gov/COG), proteins from yeast and other microorganisms were reciprocally compared (i.e., each yeast protein against all the other predicted microbial proteins and vice versa) with the rss program of the FASTA package (Pearson and Lipman 1988). Each comparison yielded a similarity score, expressed as S 5 (Sobs 2 Srand)/(Sident 2 Srand), where Sobs is the score for a given amino acid alignment, Srand is the score for the comparison of two random sequences (resulting from five independent randomizations) having the same length and amino acid composition as the real sequences, and Sident is the average of the score values produced by self comparisons (Doolittle et al. 1996). For all yeast proteins within a COG, we took into account the best score value (BeS) obtained for a given organism. Similarly, the BeS score values of microbial protein cooling sequences with respect to yeast were recorded. The relationship between two sequences was considered orthologous when a yeast BeS in a given organism was reciprocally BeS for such an organism in yeast. For our analysis, we selected yeast proteins having at least two orthologs in two different evolutionary lineages corresponding to either Archaea (Methanococcus jannaschii), Cyanobacteria (Synechocystis spp.), g-Gram-negative (Escherichia coli, Haemophilus influenzae), or e-Gramnegative (Helicobacter phylori) bacteria. Amino acid sequence conservation for a total of 402 yeast genes meeting the above criteria was considered to be inversely related to the mean genetic distance (D), calculated by the Poissonian relation D 5 2ln(S). Multiple alignments within this set of orthologous genes were carried out with the CLUSTAL W program (Thompson, Higgins, 1754 Percudani and Ottonello and Gibson 1994) set to standard parameters; leaving out all gaped positions, we distinguished variable residues from invariant ones (i.e., residues corresponding to positions at which the identity of an amino acid is the same in all organisms). A dedicated program was used to locate cognate codons for Gly (ggu/ggc), Asn (aau/ aac), Phe (uuu/uuc), and Tyr (uau/uac) in the corresponding yeast nucleotide sequences. The proportions of nnu and nnc cognate codons at invariant and variable sequence positions were compared by calculating the U/ C odds ratios. These were obtained by dividing the proportions of nnu/nnc codons at the two types of residue positions; the resulting values were tested for significance using the G-test of independence. Individual btRNA values for the 4,128 yeast genes considered in the present study, together with mRNA level and orthologous comparison data for a subset of these genes, are available at the URL http://biochimica. unipr.it/wobble. Results Deviation from Neutrality at the Wobble Position of Pyrimidine-Ending Cognate Codons Wobbling in S. cerevisiae pertains to pyrimidineending codons and involves 15 tRNAs with either guanosine- or inosine-starting anticodons (table 1). Exceptions to this rule are four additional tRNAs for which the modification state of the first anticodon base is as yet poorly defined and that are supposed to recognize a wobbling purine (Percudani, Pavesi, and Ottonello 1997; Hani and Feldmann 1998). We determined the occurrence of the codons served by the 15 tRNAs reading pyrimidine-ending cognate pairs in a complete, nonredundant set of yeast protein-coding sequences that was extracted from the 6,296 ORFs identified by the Yeast Genome Project (Goffeau et al. 1996) as of February 1998, by excluding 429 ORFs annotated as ‘‘questionable,’’ 19 putative pseudogenes, and 1,242 hypothetical proteins. A final sample of 4,128 bona fide protein-coding genes was obtained by filtering out 62 identical and 414 highly similar sequences. On the assumption of no selective pressure, the expected frequencies of cytidineand uridine-ending codons within cognate pairs were calculated by taking the base composition of the noncoding portion of the yeast genome (63.9% A/T, 36.1% G/C) as our best estimate of mutational bias in S. cerevisiae. Deviations from these expected frequencies of the relative usage of C- and U-ending cognate codons were evaluated using the G-statistic (Sokal and Rohlf 1995, pp. 685–743) and expressed as the natural logarithm of the respective likelihood values. As shown in table 1, there is generally good agreement between the observed and the expected partitionings of C- and Uending codons within cognate pairs. If one further considers the average distribution of third-position C and U residues for all 15 cognate pairs, a value of 63.6% U and 36.4% C is obtained, which is remarkably closer to the overall pyrimidine composition of the noncoding portion than to that of the coding portion of the yeast genome (58.9% U, 41.1% C). However, there are some prominent cases of deviation. Cognate pairs that exhibit the highest deviation from the average C/U distribution are also characterized by the highest likelihood distance from expected frequencies and are all decoded by tRNAs having guanosine residues at the wobble position. A distinguishing feature of G-starting anticodons, as compared with I-starting anticodons, is their ability to form a number of hydrogen bonds that is different for standard (G[C) or wobbling (G5U) interactions. Wobble Base Choice as a Function of Translational Selection Since deviations from neutrality in cognate codon usage appear to be associated with particular tRNA anticodons, and hence to selection at the translational level, we examined the dependence of these deviations from the intensity of translational selection. The selective pressure acting on individual protein-coding genes was inferred from the degree of adaptation between tRNA gene copy number and tRNA usage. This was measured by the slope value (btRNA) of the regression line that fits the data point distribution relating tRNA usage of individual protein-coding sequences to the multiplicity of the corresponding tRNA genes—a parameter that, at variance with commonly used expressivity indices (Bennetzen and Hall 1982; Sharp and Li 1987; Wright 1990), is not intrinsically biased by the choice of a particular codon within a cognate pair. To probe the dependence of cognate codon choice on translational selective pressure, the 4,128 protein-coding sequences of S. cerevisiae were subdivided into 10 equal-sized classes of increasing btRNA values (table 2). Although arbitrarily chosen, this subdivision proved to be effective in discriminating between the relatively few highly expressed genes and the vast majority of genes of low to intermediate expressivity. For a subset of genes comprised in each of these 10 classes, we could also estimate the abundance of the corresponding mRNAs on the basis of yeast transcriptome data (Velculescu et al. 1997). As shown in table 2, genes belonging to classes with high btRNA values, and thus predicted to be under strong translational selective pressure, are also characterized by high mRNA expression levels. This behavior is in keeping with recent observations pointing to the existence of a close relationship between translational and transcriptional indicators of gene expression (Pavesi 1999). Both indicators, however, should be considered when judging the expressivity of a gene, because codon usage measurements may be inadequate for genes whose compositions are strongly biased by functional or structural constraints, while mRNA data necessarily reflect the expression profile associated with particular growth conditions. For each class thus delineated, we measured wobble pyrimidine usage frequencies within cognate pairs and found them to be heterogeneously distributed among classes of tRNA usage (table 2). The heterogeneity of wobble pyrimidine usage is higher for cognate pairs read by G-starting anticodons and is most prominent in a btRNA range that includes genes expressed at high levels. Indeed, alternative cognate codons are used Genomic Selection in Cognate Codon Pairs Table 1 Observed and Expected Frequencies of Wobble Pyrimidine Codons in Saccharomyces cerevisiae Protein-Coding Genes Amino Acid tRNAa tRNA Usage Tyr . . . . . tGUA 72,778 Gly . . . . . tGCC 72,527 Asn . . . . . tGUU 135,928 Phe . . . . . tGAA 97,905 Leu . . . . . tGAG 38,300 Ser . . . . . tGCU 53,921 Asp . . . . . tGUC 130,039 Val . . . . . tIAC 73,006 Pro . . . . . tIGG 45,282 Thr . . . . . tIGU 71,498 Ala . . . . . tIGC 72,067 Ser . . . . . tIGA 83,181 Ile . . . . . . tIAU 104,718 His . . . . . tGUG 47,201 Cys . . . . . tGCA 27,288 Codon uau uac ggu ggc aau aac uuu uuc cuu cuc agu agc gau gac guu guc ccu ccc acu acc gcu gcc ucu ucc auu auc cau cac ugu ugc Observed ƒ Expectedb ƒ^ ƒ ln(ƒ/ƒ^) ln Lc 41,591 31,187 50,833 21,694 81,445 54,483 58,284 39,621 26,618 11,682 32,177 21,744 85,641 44,398 48,418 24,588 30,075 15,207 44,366 27,132 44,881 27,186 51,977 31,204 67,550 37,168 30,522 16,679 17,213 10,075 46,535 26,242 46,375 26,151 86,915 49,012 62,602 35,302 24,489 13,810 34,478 19,442 83,149 46,889 46,681 26,324 28,954 16,327 45,717 25,780 46,081 25,985 53,187 29,993 66,958 37,759 30,181 17,019 17,448 9,839 24,671.539 5,384.129 4,665.719 24,053.561 25,294.144 5,765.585 24,165.535 4,573.042 2,218.973 21,954.910 22,222.450 2,433.201 2,528.975 22,423.628 1,768.923 21,677.456 1,142.426 21,080.675 21,330.838 1,386.848 21,184.238 1,228.337 21,196.131 1,235.125 594.609 2586.351 342.919 2336.581 2233.410 238.808 712.6 612.2 471.4 407.5 264.1 210.8 105.3 91.5 61.8 56.0 44.1 39.0 8.3 6.3 5.4 a For all the listed I-starting anticodons (INN), yeast has a UNN isoacceptor tRNA that reads the corresponding Aending codon (Percudani, Pavesi, and Ottonello 1997; Hani and Feldmann 1998); in these cases, in vivo experiments have shown that decoding of A by I is negligible (Munz et al. 1981). The first anticodon residue of tRNA Phe GAA and the second anticodon residue of tRNA Tyr GUA are present, respectively, as 29-O-methylguanosine and pseudouridine modified bases (Hani and Feldmann 1998). b Expected codon frequencies within cognate pairs were calculated by partitioning the tRNA usage of 4,128 protein sequences according to the pyrimidine frequency (63.94% T, 36.06% C) of the noncoding portion of the yeast genome. c Log-likelihood values were calculated by summing the ƒ-weighted logarithm of the ratio between observed and expected frequencies for each codon within a cognate pair. Table 2 Heterogeneity of Wobble Pyrimidine Codon Usage Among Classes of Genes of Different tRNA Usage CLASS 1 btRNAa 2 3 4 5 6 7 8 9 10 ........ 0.33 0.45 0.49 0.53 0.57 0.61 0.66 0.73 0.84 1.19 (412) (414) (414) (411) (413) (412) (414) (412) (413) (413) mRNA tagsb . . 201 241 172 189 261 417 670 1,138 1,768 9,860 (31) (28) (27) (36) (42) (48) (70) (109) (130) (152) 1.3% 1.6% 1.2% 1.3% 1.7% 2.8% 4.5% 7.6% 11.9% 66.1% tGNN . . . . . . . . tINN . . . . . . . . . Heterogeneity Among Classes 1–5c Heterogeneity Among Classes 6–10c 53.7 34.2 529.2 112.1 a Average b tRNA values (Percudani, Pavesi, and Ottonello 1997) are reported for each class; the numbers of genes in individual classes are shown in parentheses. b mRNA levels for three growth conditions (logarithmic, G2/M, and stationary) derived from yeast transcriptome data (Velculescu et al. 1997; see Materials and Methods) are expressed as the absolute number and the percentage of tags assigned to each class; a higher fraction of tags and a higher number of tag-identified genes (indicated in parentheses) are associated with classes with high btRNA values. c Mean heterogeneity G values of wobble pyrimidine usage for cognate codons read by guanosine-starting (tGNN) or inosine-starting (tINN) anticodons were separately calculated for classes 1–5 and 6–10. 1755 1756 Percudani and Ottonello in a much more homogeneous manner in classes of genes subjected to lower translational selective pressure. A clear-cut relationship between cognate codon usage and translational selection becomes apparent when we consider departures from the expected frequencies of individual cognate codons in classes of genes with different tRNA usages. For I-starting anticodons, there is no marked selection for either C- or U-ending codons in the different btRNA classes (fig. 1, left panels). In contrast, four cognate codon pairs read by G-starting anticodons display an increasingly strong departure from compositional bias in classes of increasing translational selective pressure (fig. 1, right panels). A superimposable deviation pattern was obtained by using a classification scheme based only on mRNA abundance data (fig. 2). A closer inspection of cognate pairs undergoing selection reveals that a C residue at the wobble position is only and always preferred when four hydrogen bonds are formed at the first two codon positions (figs. 1B and 2B). Conversely, a preference for a wobble U residue is observed in the single case in which six hydrogen bonds are formed at the two nonwobble positions (figs. 1F and 2F). In other words, whenever the type of interaction established by the first anticodon base can determine a variation in the number of hydrogen bonds, there is selection against codon-anticodon pairs that form a total of either six or nine hydrogen bonds. Distinguishing Between Efficiency- and AccuracyDriven Selection The existence of a close link between the biased use of some cognate codons and both gene expressivity and tRNA usage indicates that wobble pyrimidine choice for these codons is under translational selective pressure. The force driving this choice can be ascribed to an optimization of the efficiency and/or accuracy of translation. While efficiency is an obvious requirement of highly expressed genes, there is no a priori reason to expect that the effects of misincorporation events on organism fitness are related to gene expressivity. Instead, selection for translational accuracy is expected to be related to the evolutionary rate of a protein, as genes coding for proteins with a strong requirement for amino acid sequence conservation should be subjected to the highest pressure for the choice of accurate codons. A conservation-dependent synonymous codon usage bias is not present in bacteria (Hartl, Moriyama, and Sawyer 1994), but it has been observed in Drosophila (Akashi 1994; Tautz and Nigro 1998). To distinguish between the relative contributions of efficiency and accuracy in determining wobble base selection, we examined the dependence of such selection on amino acid sequence conservation. This was inferred from genetic distance measurements conducted on yeast genes sharing at least two orthologous relatives in wholly sequenced, evolutionarily distinct microbial genomes (Tatusov, Koonin, and Lipman 1997; see Materials and Methods for details). In this case, as opposed to the case of proteins only represented in yeast and in a single microbial lineage, the average genetic distance was found not to be appreciably influenced by the composition of a particular phy- logenetic assembly, thereby providing a more reliable indicator of amino acid sequence conservation. The resulting set of evolutionarily conserved yeast genes, a total of 402 sequences extracted from our nonredundant sample, was then subdivided into 10 equal-sized classes of increasing amino acid sequence conservation and analyzed for deviations from codon frequencies predicted by mutational bias. No dependence of cognate codon bias on amino acid sequence conservation was observed. As shown in figure 3A, this also holds for the four cognate pairs characterized by marked deviation from compositional bias. In contrast, the subdivision of this same set of genes according to expressivity criteria produced a pattern of deviation (fig. 3B) superimposable on that previously obtained for the entire set of yeast protein-coding genes. By distinguishing between invariant or variable amino acid residues in the 402 orthologous alignments, we next examined the dependence of cognate codon choice on the conservation of the specific amino acids (Gly, Asn, Phe, and Tyr) that exhibit the strongest cognate bias in highly expressed genes. As shown in table 3, such a dependence is only apparent in the case of glycine, for which the U/C ratio at the wobble position of the ggu/ggc cognate pair is significantly higher for invariant than for variable positions. The extent of such preference is greater for genes having btRNA values above the average (U/C odds ratio 5 1.56; P , 0.001, G-test of independence) than for genes having btRNA values below average (U/C odds ratio 5 1.16; P . 0.1). However, this conservation-dependent bias does not fully account for the preference of U-ending glycine codons by highly expressed genes, a preference that is also observed at variable positions. Further evidence against a selection of U-ending glycine codons exclusively driven by translational accuracy was provided by U/C ratios measured at invariant and variable sequence positions of individual protein-coding genes. In fact, an identical proportion of wobble U and C residues at either position (U/C odds ratio 5 1) is present in 29 cases; 177 genes have a higher proportion of wobble uridine residues at variable positions (U/C odds ratio , 1), while a prevalence of U at invariant positions is observed in 196 genes (U/C odds ratio . 1). It thus appears that the preference for C-ending codons within aau/c, uuu/c, and uau/c cognate pairs is dictated only by efficiency factors, while the opposite preference in the case of the ccu/c glycine pair reflects both efficiency and accuracy needs. Such a difference is also evident when we consider the dependence of wobble base choice on gene length for these four cognate codon pairs (fig. 4). In keeping with recent observations, yeast protein-coding genes of short lengths, which among highly expressed genes have an especially strong requirement for translational efficiency (Moriyama and Powell 1998), have a strongly biased usage of all four of these cognate codon pairs. Wobble pyrimidine selection becomes increasingly relaxed with increasing gene length for the three cognate pairs that are not influenced by amino acid residue conservation. Instead, for glycine ccu/c cognate codons, selective pressure is maintained Genomic Selection in Cognate Codon Pairs 1757 FIG. 1.—Deviation from compositional bias of wobble pyrimidine base frequencies in classes of genes with different tRNA usages. The 4,128 yeast protein-coding genes were subdivided into 10 classes of increasing btRNA values as specified in table 2. Data points correspond to log-likelihood distances (ln L) from the expected usage of alternative cognate codons forming either four (A and B), five (C and D), or six (E and F) hydrogen bonds at the two nonwobble positions; total ln L values were divided by the number of genes in each class. Wobble C and U frequencies higher than expected are plotted, respectively, above and below the reference line representing the frequencies predicted by mutational bias. 1758 Percudani and Ottonello FIG. 2.—Deviation from compositional bias of wobble pyrimidine base frequencies in classes of genes of different expressivities. The 673 mRNA tag-identified yeast genes in our sample were subdivided into 10 equal-sized classes with increasing numbers of tags per gene. The percentage of the total number of tags represented in each class is indicated. Data points correspond to log-likelihood distances (ln L) from the expected usage of alternative cognate codons forming either four (A and B), five (C and D), or six (E and F) hydrogen bonds at the two nonwobble positions; total ln L values were divided by the number of genes in each class. Wobble C and U frequencies higher than expected are plotted, respectively, above and below the reference line representing the frequencies predicted by mutational bias. Genomic Selection in Cognate Codon Pairs 1759 FIG. 3.—Dependence of wobble base selection on sequence conservation and tRNA usage. A, Four hundred two yeast genes sharing at least two orthologous relatives in wholly sequenced, evolutionarily distinct microbial genomes were subdivided into 10 equal-sized classes of increasing sequence conservation, expressed as the reciprocal of the mean genetic distance (1/D); average 1/D values for each class are given. B, The same set of 402 genes was subdivided into 10 classes of increasing btRNA values according to the criteria specified in table 2. Data points correspond to log-likelihood distances (ln L) from the expected usage of codons belonging to the four translationally selected cognate pairs divided by the number of genes in each class. Wobble C and U frequencies higher than expected are plotted, respectively, above and below the reference line representing the frequencies predicted by mutational bias. even in the longest coding sequences, which are expected to have the greatest need for accuracy (EyreWalker 1996). Discussion As revealed by the present analysis, there is good overall correspondence between the observed usage of wobble pyrimidines in cognate codons and the frequency expected on the assumption of no translational selective pressure. Contrasting with the selective constraints acting on third-base choice for synonymous codons read by different tRNA isoacceptors (as for most purine-ending codons), this delineates a rather unique situation of lack of selection within a coding region. In this scenario, four particular cases stand out in which wobble base usage departs markedly from compositional bias. Two observations indicate that departure from compositional bias pertains to the physicochemical characteristics of codon-anticodon interaction and is mostly determined by a selective pressure for the optimization of translational efficiency. First, all of the prominent cases of wobble pyrimidine selection are associated with a particular type of tRNA anticodon (G-starting) and thus with the formation of a different number of hydrogen bonds with alternative cognate codons. More specifically, we find that third-base choice within cognate pairs determines an increased or a decreased stability of codon-anticodon interaction when four or six hydrogen bonds, respectively, are formed at the two nonwobble positions. Second, all cases of deviation are consistently associated with the degree of expressivity of individual genes, while a relationship with amino acid conservation is only observed for the glycine ggc/ggu pair. This indicates that the selective pressure driving cognate codon choice pertains mainly to the optimization of the rate and energetic cost, rather than to the accuracy, of the translational process. Base stacking, besides hydrogen bonding, is known to contribute to the stability of intermolecular RNA duplexes. It may thus seem surprising Table 3 Wobble Pyrimidine Frequencies in Translationally Selected Cognate Codons Encoding Invariant or Variable Amino Acid Residues GLYCINE (tRNA GCC) TYPE OF RESIDUEa No. Invariant . . 2,644 Variable . . . 3,553 a ggu ggc 0.81 0.75 0.19 0.25 U/C Odds Ratiob 1.41* ASPARAGINE (tRNA GUU) No. aau aac 675 4,780 0.51 0.54 0.49 0.46 U/C Odds Ratiob 0.91 PHENYLALANINE (tRNA GAA) No. uuu uuc 725 4,647 0.54 0.54 0.46 0.46 U/C Odds Ratiob 0.98 TYROSINE (tRNA GUA) No. uau uac 599 3,465 0.52 0.51 0.48 0.49 Amino acid residues in 402 sets of aligned orthologous genes were classified as invariant or variable as specified in Materials and Methods. U/C odds ratios were calculated by dividing the proportions of nnu/nnc codons at invariant and variable sequence positions. * P , 0.001, G-test of independence. b U/C Odds Ratiob 1.02 1760 Percudani and Ottonello FIG. 4.—Relationships between wobble pyrimidine choice and gene length in genes subjected to translational selection. A set of 826 genes belonging to the two highest btRNA classes in table 2 was subdivided into five gene length (kb) classes; the number of genes in each class is given in parentheses. Data points correspond to the fractional usage of cytosine at the wobble position (nnc/(nnu1nnc)) of the four translationally selected cognate codon pairs; the dashed line indicates the fraction of third-position C residues predicted by mutational bias. that the hydrogen bond balance of different codon-anticodon pairs is by itself sufficient to account for wobble-base selection in S. cerevisiae. It should be noted, however, that a substantial contribution of binding interactions other than hydrogen bonds to this kind of selection would not explain the clear-cut distinction between codons read by inosine- and guanosine-starting anticodons. Furthermore, these same interactions do not account for the nearly equal selection intensity measured for six-hydrogen-bonded codon-anticodon pairs (figs. 1B and 2B), regardless of differences in stacking and modification states of the anticodon bases. The association rate of ternary complexes (Elongation Factor·GTP·aa-tRNA) with the ribosome depends only on their relative concentrations and does not involve codon recognition (Rodnina et al. 1996). Thus, the choice among codons read by the same tRNA must rely on differences in the dissociation constants of the two alternative codon-anticodon complexes within a given cognate pair. It is generally accepted that tRNA dissociation along the translational elongation cycle can occur at three distinct stages (Thompson 1988; Yarus and Smith 1995): (1) initial discharge of the ternary complex prior to codon-dependent GTP hydrolysis, (2) release of the aminoacyl-tRNA complex after GTP hydrolysis (proofreading), and (3) exit of the deacylated tRNA. Because departures from neutral cognate codon usage are either toward an increased or a decreased stability of codon-anticodon interaction, it follows that the targets of such selection, and hence the translational steps involved, must be different. The reason for the avoidance of six-hydrogenbonded codon-anticodon interactions, observed for the Asn, Phe, and Tyr cognate pairs regardless of amino acid residue conservation, conceivably resides in a reduction of the frequency of premature release events, whose probability is stochastically determined by kinetic standards built into the translational system (Thompson 1988). Since GTP hydrolysis has recently been shown to be triggered by a cognate ternary complex at a rate that is four orders of magnitude higher than the rate for a noncognate complex (Rodnina et al. 1996), it seems unlikely that this step has enough sensitivity to discriminate within cognate codon pairs. Such discrimination, instead, may be afforded by proofreading, a step in which the fate of an EF·GDP·aa-tRNA complex is stochastically determined by the ratio between the codondependent rate constant for dissociation of the ternary complex and the codon-independent release of the binary complex (EF·GDP). Acting as an internal kinetic standard, the latter step must be slow enough to allow for the release of near-cognate ternary complexes. The same step, however, can also lead to the premature dissociation of the counterselected six-hydrogen-bonded ternary complexes. Excess of proofreading of weak interacting cognate codon-anticodon complexes would thus determine a slowing down and an increased energetic cost of elongation. Selection against nine-hydrogen-bonded codon-anticodon pairs, as observed in the case of glycine ggu/ggc cognate codons, may pertain to the release of deacylated tRNAs from the E site. This step, which in various fungi (including S. cerevisiae) is promoted by a dedicated, ATP-dependent elongation factor (Triana Alonso, Chakraburtty, and Nierhaus 1995), is the only one in which the exceedingly slow dissociation of an otherwise correct tRNA can critically reduce the efficiency of translation. However, the observation of an even more prominent selection of the ggu codon at positions at which there is a critical requirement for glycine conservation suggests that an additional reason for this choice is the fulfillment of an accuracy need. This is consistent with the more stable interaction of the counterselected ggc codon with the abundant competitor tRNAAsp GUC through second-po- Genomic Selection in Cognate Codon Pairs sition wobbling, as previously proposed by Kato (1990). In line with this view, since in different organisms the type and relative concentration of the competitor tRNA may vary, organism-specific preferences in the usage of this particular cognate pair are expected. If the choice of particular codon-anticodon pairs really results in an optimized translation, it may seem surprising that only a small fraction of yeast genes actually undergo selection. It should be noted, however, that although selection within cognate pairs is only apparent in the last 2 of the 10 gene classes we considered, these 2 classes alone account for the large majority (ca. 80%) of the total mRNA output of yeast cells (table 2). Genes belonging to these two high-expressivity classes are additionally characterized by a decreasing relative usage of all weakly or strongly interacting purine-ending codons. In fact, in highly expressed genes, we also observed a two- to fivefold reduction in the relative usage of the five codons (cgg, ggg, aua, uua, aaa) read by nonwobbling tRNAs and obligatorily leading to six- or nine-hydrogen-bonded codon-anticodon interactions. Furthermore, the strikingly reduced copy number of genes encoding tRNAs with anticodons that obligatorily form nine hydrogen bonds has been evidenced previously in yeast (Percudani, Pavesi, and Ottonello 1997). The avoidance of both six- and nine-hydrogen-bonded cognate codon-anticodon pairs thus appears to be part of a general strategy which optimizes the biosynthetic efficiency of most cellular proteins while complying with a basic need for accuracy that is set by kinetic standards internal to the system. By allowing the proper recognition of all 61 sense codons, these kinetic standards have probably emerged as those that best satisfy the most recurrent cases of hydrogen-bonding interaction resulting from the combinatorial assortment of the four bases (7–8 hydrogen bonds for 75% of the codons). Accordingly, the counterselection of the inherently less frequent six- and nine-hydrogen-bonded codon-anticodon pairs can be viewed as an adaptation to the kinetic standards of the translational system, relying on both exploitation of code degeneracy and careful use of wobbling. Acknowledgments We are grateful to Gian Luigi Rossi for encouragement and support, to Victor Velculescu for the extended set of yeast transcriptome data, to Roberto Favilla for helpful discussions, and to Giorgio Dieci, Alessio Peracchi, and Claudio Rivetti for a critical reading of the manuscript. Computer assistance by Silvia Giuliodori is also gratefully acknowledged. This work was supported by grants from the Ministry of University and of Scientific and Technological Research (MURST, Rome, Italy). LITERATURE CITED AKASHI, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935. 1761 BENNETZEN, J. L., and B. D. HALL. 1982. Codon selection in yeast. J. Biol. Chem. 257:3026–3031. BULMER, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907. CURRAN, J. F., and M. YARUS. 1989. Rates of aminoacyl-tRNA selection at 29 sense codons in vivo. J. Mol. Biol. 209:65– 77. DOOLITTLE, R. F., D. F. FENG, S. TSANG, G. CHO, and E. LITTLE. 1996. Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271: 470–477. EYRE-WALKER, A. 1996. Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy? Mol. Biol. Evol. 13:864–872. GOFFEAU, A., B. G. BARRELL, H. BUSSEY et al. (16 co-authors). 1996. Life with 6000 genes. Science 274:563–567. GROSJEAN, H., and W. FIERS. 1982. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199–209. GROSJEAN, H., D. SANKOFF, W. M. JOU, W. FIERS, and R. J. CEDERGREN. 1978. Bacteriophage MS2 RNA: a correlation between the stability of the codon:anticodon interaction and the choice of code words. J. Mol. Evol. 12:113–119. HANI, J., and H. FELDMANN. 1998. tRNA genes and retroelements in the yeast genome. Nucleic Acids Res. 26:689–696. HARTL, D. L., E. N. MORIYAMA, and S. A. SAWYER. 1994. Selection intensity for codon bias. Genetics 138:227–234. IKEMURA, T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151:389–409. . 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158:573–597. . 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13–34. . 1992. Correlation between codon usage and tRNA content in microorganisms. Pp. 88–111 in D. L. HATFIELD, B. J. LEE, and R. M. PIRTLE, eds. Transfer RNA in protein synthesis. CRC Press, London. KATO, M. 1990. Codon discrimination due to presence of abundant non-cognate competitive tRNA. J. Theor. Biol. 142: 35–39. KURLAND, C. G. 1987. Strategies for efficiency and accuracy in gene expression. Trends Biochem. Sci. 12:126–128. MORIYAMA, E. N., and J. R. POWELL. 1998. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 26:3188–3193. MUNZ, P., U. LEOPOLD, P. AGRIS, and J. KOHLI. 1981. In vivo decoding rules in Schizosaccharomyces pombe are at variance with in vitro data. Nature 294:187–188. PARKER, J., T. C. JOHNSTON, P. T. BORGIA, G. HOLTZ, E. REMAUT, and W. FIERS. 1983. Codon usage and mistranslation. In vivo basal level misreading of the MS2 coat protein message. J. Biol. Chem. 258:10007–10012. PAVESI, A. 1999. Relationships between transcriptional and translational control of gene expression in Saccharomyces cerevisiae: a multiple regression analysis. J. Mol. Evol. 48: 133–141. 1762 Percudani and Ottonello PEARSON, W. R., and D. J. LIPMAN. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444–2448. PERCUDANI, R., A. PAVESI, and S. OTTONELLO. 1997. Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J. Mol. Biol. 268:322–330. RODNINA, M. V., T. PAPE, R. FRICKE, L. KUHN, and W. WINTERMEYER. 1996. Initial binding of the elongation factor Tu.GTP.aminoacyl-tRNA complex preceding codon recognition on the ribosome. J. Biol. Chem. 271:646–652. SHARP, P. M., and W. H. LI. 1987. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15: 1281–1295. SHARP, P. M., M. STENICO, J. F. PEDEN, and A. T. LLOYD. 1993. Codon usage: mutational bias, translational selection, or both? Biochem. Soc. Trans. 21:835–841. SOKAL, R. R., and F. J. ROHLF. 1995. Biometry. W. H. Freeman, New York. TATUSOV, R. L., E. V. KOONIN, and D. J. LIPMAN. 1997. A genomic perspective on protein families. Science 278:631– 637. TAUTZ, D., and L. NIGRO. 1998. Microevolutionary divergence pattern of the segmentation gene hunchback in Drosophila. Mol. Biol. Evol. 15:1403–1411. THOMAS, L. K., D. B. DIX, and R. C. THOMPSON. 1988. Codon choice and gene expression: synonymous codons differ in their ability to direct aminoacylated-transfer RNA binding to ribosomes in vitro. Proc. Natl. Acad. Sci. USA 85:4242– 4246. THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix. Nucleic Acids Res. 22:4673–4680. THOMPSON, R. C. 1988. EF-Tu provides an internal kinetic standard for translational accuracy. Trends. Biochem. Sci. 13:91–93. TRIANA ALONSO, F. J., K. CHAKRABURTTY, and K. H. NIERHAUS. 1995. The elongation factor 3 unique in higher fungi and essential for protein biosynthesis is an E site factor. J. Biol. Chem. 270:20473–20478. VELCULESCU, V. E., L. ZHANG, W. ZHOU, J. VOGELSTEIN, M. A. BASRAI, D. E. BASSETT JR., P. HIETER, B. VOGELSTEIN, and K. W. KINZLER. 1997. Characterization of the yeast transcriptome. Cell 88:243–251. WRIGHT, F. 1990. The ‘effective number of codons’ used in a gene. Gene 87:23–29. YARUS, M., and D. SMITH. 1995. tRNA on the ribosome: a waggle theory. Pp. 443–469 in D. SÖLL and U. RAJBHANDARY, eds. tRNA: structure, biosynthesis, and function. American Society for Microbiology, Washington, D.C. STANLEY SAWYER, reviewing editor Accepted August 17, 1999