Amino Acid Cost and Codon-Usage Biases in 6 Prokaryotic Genomes: A Whole-Genome Analysis

Esley M. Heizer Jr,* Douglas W. Raiford, Michael L. Raymer, Travis E. Doom, Robert V. Miller,‡ and Dan E. Krane*

*Department of Biological Sciences, Wright State University; Department of Computer Science and Engineering, Wright State University; and ‡Department of Microbiology and Molecular Genetics, Oklahoma State University

For most prokaryotic organisms, amino acid biosynthesis represents a significant portion of their overall energy budget. The cost of synthesis can differ strikingly between amino acids, by as much as 7-fold. Two prokaryotic organisms, Escherichia coli and Bacillus subtilis, have been shown to preferentially utilize less costly amino acids in highly expressed genes, indicating that parsimony in amino acid selection may confer a selective advantage on prokaryotes. This study confirms those findings and extends them to 4 additional prokaryotic organisms: Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Synechocystis sp. PCC 6803, and Thermus thermophilus HB27. In each of these 6 organisms, adherence to codon-usage biases is inversely correlated with a coding region's average amino acid biosynthetic cost, in a fashion that is independent of chemoheterotrophic, photoautotrophic, or thermophilic lifestyle. The obligate parasites C. trachomatis and C. pneumoniae AR39 are incapable of synthesizing many of the 20 common amino acids. Removing the amino acids for which these organisms are auxotrophic from consideration does not alter the overall trend of preferential use of energetically inexpensive amino acids in highly expressed genes.

Introduction

Advances in sequencing technology have provided an abundance of genomic data from prokaryotic organisms (Ghai et al. 2004). This abundance has in turn facilitated a number of whole- and comparative-genome analyses (Kanaya et al. 1999; Akashi and Gojobori 2002; dos Reis et al. 2003).
Key words: adaptive evolution, codon-usage bias, amino acid cost, prokaryote, genome, bioinformatics. Mol. Biol. Evol. 23(9):1670–1680. 2006. doi:10.1093/molbev/msl029. Advance Access publication June 5, 2006. © The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.

Even before whole-genome analysis was possible, it was known that some organisms exhibit striking biases in their utilization of synonymous triplet codons (Grantham, Gautier, and Gouy 1980; Grantham, Gautier, Gouy, and Pave 1980; Robinson et al. 1984; Kanaya et al. 1999; Akashi 2003) in ways that are correlated with the relative abundance of the isoacceptor tRNAs for each codon (Ikemura 1981a, 1981b) and with the copy number of the individual tRNA genes (Kanaya et al. 1999). Adherence to these codon-usage biases in prokaryotes is selectively advantageous: it has been shown to be responsible for 3- to 6-fold differences in translation rates (Robinson et al. 1984) and for up to 10-fold differences in the accuracy of translation (Precup and Parker 1987).

Some regions of a protein's primary structure are under strong selective pressure (e.g., active sites), so that even conservative substitutions are rarely observed in naturally occurring populations, whereas other regions of proteins are much more likely to display sequence variability (Axe 2000). The primary structure of a protein can also be constrained by a variety of cellular processes, including the organism's metabolic pathways (Craig and Weber 1998), the translation rate of the mRNA (Ikemura 1981a, 1981b), and the production cost of the amino acids (Craig and Weber 1998). Given that the energy expended to biosynthesize amino acids varies considerably, ranging from 11.7 (glycine and alanine) to more than 74 (tryptophan) high-energy phosphate bonds (~PO4) (Akashi and Gojobori 2002), and that energy availability commonly limits prokaryotic growth, it is reasonable to expect natural selection to favor substitutions that result in the use of less energetically costly amino acids where possible. Such a substitution bias should be most pronounced in highly expressed genes, in much the same way that adherence to codon-usage biases tends to be greatest in genes that are expressed at high levels (Grantham, Gautier, and Gouy 1980; Grantham, Gautier, Gouy, and Pave 1980; Ikemura 1981a; Robinson et al. 1984; Kanaya et al. 1999; Lafay et al. 2000; dos Reis et al. 2003). Indeed, Akashi and Gojobori (2002) have demonstrated that in Escherichia coli and Bacillus subtilis, the genes that adhere most strongly to organismal codon-usage biases (and, by inference, are most highly expressed) tend to incorporate lower cost amino acids. Using a Spearman rank correlation (Spearman 1904), they demonstrated a negative correlation between the major codon usage (MCU) of a gene and the average biosynthetic cost of the amino acids incorporated into the expressed protein.

In their analysis, Akashi and Gojobori (2002) performed complete genomic analyses of only 2 organisms (E. coli and B. subtilis), both of which have all the metabolic pathways necessary to biosynthesize each of the 20 common amino acids. Although the codon-usage biases of E. coli and B. subtilis differ (Kanaya et al. 1999), their amino acid biosynthetic costs, pathways, and capabilities are the same (Ogata et al. 1999; Overbeek et al. 2000). Other prokaryotic organisms have significantly different amino acid biosynthetic costs (Craig and Weber 1998), pathways (e.g., Hess 2004), and capabilities (e.g., Razin 1999) yet still exhibit pronounced codon-usage biases (e.g., Zavala et al. 2002). Further, factors other than translation efficiency and accuracy, such as skewed genomic GC content (McHardy et al.
2004) or adaptation to extreme environmental conditions (Zavala et al. 2002), may also contribute to an organism's codon-usage and amino acid utilization biases. If the trends reported by Akashi and Gojobori (2002) are the result of natural selection favoring amino acids that are less energetically costly to biosynthesize, then similar but distinct trends should also be present in prokaryotes with different amino acid biosynthetic costs, pathways, and capabilities, as well as in prokaryotes adapted to extreme environments. To address this issue, we have confirmed the original analysis of Akashi and Gojobori (2002) and expanded it to include organisms with different amino acid biosynthetic costs and pathways (Synechocystis sp. PCC 6803), different amino acid production capabilities (Chlamydia trachomatis and Chlamydophila pneumoniae AR39), and an extreme thermophile (Thermus thermophilus HB27).

Materials and Methods

Genomic Data

Annotated files containing gene location information as well as complete genomic sequences were obtained from http://www.ncbi.nih.gov/genomes/lproks.cgi for all 6 organisms considered in this study (B. subtilis, E. coli K12, C. trachomatis, C. pneumoniae AR39, Synechocystis sp. PCC 6803, and T. thermophilus HB27). Genes described as "phage related" or "transposon related" in the annotated file were removed from further consideration, as were candidates for horizontal gene transfer (Garcia-Vallve et al. 2003), because such genes may not reflect the codon-usage bias of the organism (dos Reis et al. 2003; Garcia-Vallve et al. 2003). Genes shorter than 100 codons (not including the start and stop codons) were also removed from each organism's gene set to minimize sampling effects and potential length biases (Eyre-Walker 1996). All but one version of each paralogous locus was also removed from each organism's gene set. Paralogs were identified with unfiltered Blast (http://www.ncbi.nlm.nih.gov/BLAST/) searches (Altschul et al. 1990) in which each protein of an organism was searched against all other proteins from that organism. Proteins with greater than 60% amino acid identity over a stretch of more than 60 amino acids in these intraorganismal searches were considered paralogous; this criterion made our results directly comparable to those of Akashi and Gojobori (2002). Only the single paralog with the GC content closest to the organism's genomic GC content was retained. The number of genes included for each organism, as well as total codon counts, are shown in table 1.

Table 1
Number of Genes in the Genome and the Number Removed by Each Culling Criterion

                            No. of    % GC  Partial  Genes      Phage-   Sequence Does  HGT  Paralog  No. of       No. of     No. of Amino
                            Genes in        Codons   <100       Related  Not Match                   Genes after  Codons     Acids
Organism                    Genome                   Codons     Genes    Translation                 Culling                 Auxotrophic for
Bacillus subtilis           4,112     42    2        488        89       2              433  58       3,040        1,010,337  0
Escherichia coli K12        4,311     50    0        433        68       3              288  201      3,318        1,127,136  0
Chlamydia trachomatis       895       41    0        66         0        0              33   2        794          297,317    6
Chlamydophila pneumoniae    1,112     40    0        153        0        34             0    14       911          330,395    6
Synechocystis sp. PCC 6803  3,167     60    0        247        107      0              160  61       2,592        923,920    0
Thermus thermophilus        1,982     69    0        156        13       0              312  9        1,492        485,644    0

NOTE.—No. of genes in genome: number of protein-coding genes in the genome; % GC: percentage of the genome composed of guanine and cytosine; partial codons: genes containing a codon of fewer than 3 nt; genes <100 codons: genes shorter than 100 codons, not including start and stop codons; phage-related genes: genes that are phage or transposon related; sequence does not match translation: genes whose translated nucleotide sequence does not match the given protein sequence; HGT: any gene identified as a candidate for horizontal gene transfer; paralog: genes identified as paralogous; and auxotrophic: inability to produce amino acids.

Amino Acid Production Cost

Organismal amino acid biosynthetic costs were determined by adding the number of high-energy phosphate bonds required to synthesize precursor molecules to those expended to convert the precursors to amino acids, as described by Akashi and Gojobori (2002). The potential energy lost by diverting precursors to amino acid production was also included in the total cost estimates (Craig and Weber 1998). The 6 organisms considered in this study use 1 of 2 different pathways for amino acid synthesis (fig. 1): chemoheterotrophic or photoautotrophic. The principal difference between these 2 pathways is that photoautotrophs use the Calvin cycle (Poolman et al. 2000) to feed precursors into the Embden–Meyerhof–Parnas (EMP) pathway. Chemoheterotrophs and photoautotrophs then use 3 different central pathways to synthesize individual amino acids (fig. 1): the EMP pathway, the citric acid cycle, and the pentose phosphate pathway (Ogata et al. 1999; Overbeek et al. 2000).

FIG. 1.—Metabolic pathways involved in amino acid biosynthesis and energy production. penP, ribose 5-phosphate; PRPP, 5-phosphoribosyl pyrophosphate; eryP, erythrose 4-phosphate; 3pg, 3-phosphoglycerate; pep, phosphoenolpyruvate; pyr, pyruvate; acCoA, acetyl-CoA; akg, α-ketoglutarate; oaa, oxaloacetate; RuBP, ribulose bisphosphate; TCA, tricarboxylic acid cycle.

FIG. 2.—Principal component analysis. The codon frequency matrix (cfm) contains the frequency of each codon in each gene in the genome; the first principal component is a new vector that best captures the variability of the cfm; and Z# is the dot product of the cfm and the first principal component.

Blast searches were also performed on the complete genomes of all 6 study organisms to determine whether they were capable of synthesizing each of the 20 common amino acids.
When a protein-coding region was at least 20% identical at the amino acid level to an enzyme known to be involved in the biosynthetic pathways (Atkinson 1977), it was concluded that a homolog of the enzyme used for that metabolic step was available to the organism in question. When 50% or more of a specific amino acid biosynthetic pathway was absent, it was concluded that the organism was unable to synthesize the amino acid in question.

Major Codons and Average Protein Cost

Because protein expression can be difficult to quantify directly, adherence to MCU bias was calculated as an indicator of gene expressivity. Previous research (Akashi and Gojobori 2002) used MCU as an indicator of expression level for E. coli and B. subtilis. These 2 organisms exhibit a codon-usage bias driven by translational efficiency (Kanaya et al. 1999); that is, major codons and tRNA abundance are strongly related, and ribosomal protein–coding genes exhibit high MCU scores. Similarly, 5 of the organisms in this study (B. subtilis, E. coli K12, Synechocystis sp. PCC 6803, C. trachomatis, and C. pneumoniae) have been shown to have a dominant translational bias (Carbone et al. 2005). In the remaining organism (T. thermophilus), ribosomal protein–coding genes exhibited high MCU scores. Carbone et al. (2005) used the codon adaptation index (CAI) in their analysis. Four of the 6 organisms (B. subtilis, E. coli K12, Synechocystis sp. PCC 6803, and T. thermophilus) exhibited strong correlations between MCU and CAI scores; for those that did not (C. trachomatis and C. pneumoniae), CAI was relied upon (Carbone et al. 2005).

For each gene in an organism, the number of occurrences of each codon (excluding start codons, stop codons, and the codons for methionine and tryptophan) was tallied. Start codons were excluded because they are known to vary in prokaryotes, although they generally still code for methionine.
Stop codons were likewise excluded because they have no associated tRNA and thus should not exhibit codon-usage bias. Finally, methionine and tryptophan codons were excluded because each of these amino acids has only a single codon and thus exhibits no usage bias. After these exclusions, each of the resulting 59 codon counts for each gene was normalized to allow comparisons between genes with differing numbers of synonymous codons (relative synonymous codon usage; Sharp et al. 1986). For example, the number of GGT codons (1 of 4 synonymous codons for glycine) was divided by the total number of glycine codons in the gene divided by the number of synonymous codons for glycine (4), that is, by the count expected if all 4 glycine codons were used equally. The normalized values were assembled into a codon frequency matrix, in which each entry, f_ij, is the normalized count of codon j in gene i. The projection of these codon frequencies onto the 1-dimensional space that best captures their variability was obtained by calculating the dot product between the codon frequency matrix and the first principal component of this matrix (fig. 2). This projection yields a single value for each gene that approximates the degree to which the gene contributes to the codon-usage variability of the genome. Intuitively, the 1-dimensional space given by the first principal component represents a trend in codon usage, and codons are considered major if they make a statistically significant positive contribution to this trend. Therefore, a codon was designated major for the organism in question if there was a significant positive correlation between the normalized frequencies of that codon and the projected values. Once the major codons for an organism were identified, the MCU for each gene was calculated by dividing the number of major codons in the gene by the total number of codons (excluding start, stop, methionine, and tryptophan codons).
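The codon-level steps above (RSCU normalization, projection onto the first principal component, major-codon calls, and MCU) can be sketched in plain Python as follows. This is an illustrative reconstruction, not the authors' code. The following are assumptions of the sketch: PCA via power iteration on the covariance matrix, a Pearson correlation with a one-sided Fisher-z test standing in for "significant and positive", an RSCU of 0 for amino acids absent from a gene, and a hypothetical `reference` gene set (for example, ribosomal protein genes) used only to fix the arbitrary sign of the component. Genes are assumed to be in-frame DNA strings with start and stop codons already removed.

```python
import math
import random
from statistics import NormalDist

# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# The 59 informative codons: stops (*), Met (M) and Trp (W) are excluded.
CODONS = sorted(c for c, aa in CODE.items() if aa not in "*MW")
FAMILY = {aa: [c for c in CODONS if CODE[c] == aa]
          for aa in {CODE[c] for c in CODONS}}


def codon_counts(gene):
    """Count the 59 informative codons in an in-frame sequence."""
    counts = dict.fromkeys(CODONS, 0)
    for p in range(0, len(gene) - 2, 3):
        codon = gene[p:p + 3]
        if codon in counts:
            counts[codon] += 1
    return counts


def rscu_row(gene):
    """Relative synonymous codon usage for one gene (one matrix row)."""
    counts = codon_counts(gene)
    row = []
    for c in CODONS:
        family = FAMILY[CODE[c]]
        total = sum(counts[x] for x in family)
        # Observed count / count expected if all synonyms were used equally;
        # 0.0 when the amino acid is absent (an assumption of this sketch).
        row.append(counts[c] / (total / len(family)) if total else 0.0)
    return row


def first_pc(matrix, iters=200):
    """First principal component via power iteration on the covariance."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(r[j] for r in matrix) / n for j in range(m)]
    cen = [[r[j] - means[j] for j in range(m)] for r in matrix]
    cov = [[sum(r[a] * r[b] for r in cen) / (n - 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(0)  # random start avoids an orthogonal initial vector
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def major_codons(genes, reference, alpha=0.05):
    """Codons whose RSCU correlates significantly and positively with PC1."""
    matrix = [rscu_row(g) for g in genes]
    pc = first_pc(matrix)
    scores = [sum(f * w for f, w in zip(row, pc)) for row in matrix]
    # A principal component's sign is arbitrary: orient it so that the
    # hypothetical reference genes score above the genome-wide mean.
    ref_mean = sum(scores[i] for i in reference) / len(reference)
    if ref_mean < sum(scores) / len(scores):
        scores = [-s for s in scores]
    major = set()
    for j, codon in enumerate(CODONS):
        r = pearson([row[j] for row in matrix], scores)
        if r > 0:
            # One-sided test via Fisher's z; assumes len(genes) > 3.
            z = math.atanh(min(r, 1 - 1e-12)) * math.sqrt(len(genes) - 3)
            if 1 - NormalDist().cdf(z) < alpha:
                major.add(codon)
    return major


def mcu(gene, major):
    """Fraction of the gene's informative codons that are major codons."""
    counts = codon_counts(gene)
    total = sum(counts.values())
    return sum(counts[c] for c in major) / total if total else 0.0
```

For a genome-scale run, `genes` would hold every culled coding sequence and `reference` the indices of genes expected to be highly expressed; the pure-Python covariance step is slow but adequate for a sketch, and a numerical library would normally replace it.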
The average biosynthetic cost of the amino acids in the protein encoded by each gene was calculated by totaling the number of high-energy phosphate bonds (~PO4) needed to synthesize each amino acid and dividing by the total number of amino acids in the protein. As before, start and stop codons were excluded and contributed to neither the phosphate-bond nor the amino acid totals; methionine and tryptophan residues, however, were included. Although the start codon codes for an amino acid (usually N-formylmethionine), the energy contribution of this amino acid is constant for all proteins and was not considered.

Statistical Analyses

Once the average biosynthetic cost and adherence to MCU were determined for each of an organism's genes, several statistical tests were performed to determine the degree to which cost and expressivity (as estimated by MCU) are correlated. First, for each organism, genes were independently rank ordered by average amino acid biosynthetic cost and by adherence to organismal MCU. A Spearman rank correlation (Spearman 1904) was performed on these ranked lists, with the threshold for significance set at α = 0.05. To test the effect of biological function as a confounding factor, genes were also ranked by average amino acid biosynthetic cost and by adherence to organismal MCU after being subdivided into the 16 functional categories listed by the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/COG/); genes labeled "poorly characterized," "general function prediction only," or "function unknown" were excluded. Akashi and Gojobori (2002) used the Micado database (Biaudet et al. 1997) and the University of Wisconsin E. coli Genome Web site (http://www.genome.wisc.edu/) as sources of functional category data for B. subtilis and E. coli, respectively.
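The per-gene cost and the rank correlation described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names are our own, the cost table is supplied by the caller, and tied values receive the average of their ranks (a common convention that the text does not specify). The t-statistic is the standard one for a Spearman coefficient with n − 2 degrees of freedom; it is undefined when rS is exactly ±1.

```python
import math

# Standard genetic code (TCAG order), used to translate the coding sequence.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def average_cost(cds, cost):
    """Mean ~PO4 cost per residue, excluding the start and stop codons.

    Internal Met and Trp residues are included, as described in the text.
    """
    codons = [cds[p:p + 3] for p in range(0, len(cds) - 2, 3)]
    residues = [CODE[c] for c in codons[1:-1]]
    return sum(cost[aa] for aa in residues) / len(residues)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rS as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(rs, n):
    """t = rS * sqrt((n - 2) / (1 - rS**2)), referred to t with n - 2 df."""
    return rs * math.sqrt((n - 2) / (1 - rs * rs))


# Toy cost table: 11.7 for Gly and Ala is the value quoted in the
# Introduction; 74.0 for Trp is only a placeholder ("more than 74").
toy_cost = {"G": 11.7, "A": 11.7, "W": 74.0}
print(average_cost("ATG" + "GGT" + "TGG" + "TAA", toy_cost))
```

In use, `spearman(mcu_values, cost_values)` would be computed over all culled genes of an organism (or of one functional category), with `t_statistic` supplying the test statistic for significance.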
The functional categories used in our analyses came from a single database (COG database; Tatusov et al. 2003) to ensure consistent classification across all organisms. Genes above the median MCU value were designated "high adherence," whereas those below the median were designated "low adherence"; these 2 groups were used to calculate, with a Mantel–Haenszel test (Rosner 2000), the probability of divergence from equal amino acid usage in both categories. A Mantel–Haenszel Z-score (standard normal) and a Spearman rank correlation were calculated for each of the 20 amino acids. The threshold for significance of the Mantel–Haenszel test statistic was set at α = 0.05.

Table 2
Spearman Rank Correlation over Whole Genome and Internal, External, and Ambivalent Amino Acids

Organism                      rS      rS int  rS ext  rS amb
Bacillus subtilis             -0.37   -0.29   -0.25   -0.16
Escherichia coli K12          -0.23   -0.07   -0.16   -0.21
Chlamydia trachomatis+        -0.25   -0.22   -0.21   -0.11
Chlamydia trachomatis         -0.28   -0.23   -0.26   -0.08
Chlamydophila pneumoniae+     -0.14   -0.10   -0.08   -0.07
Chlamydophila pneumoniae      -0.08   -0.21    0.05*  -0.05*
Synechocystis sp. PCC 6803    -0.24   -0.24   -0.01*  -0.22
Thermus thermophilus          -0.17   -0.14   -0.13   -0.02*

NOTE.—Asterisk denotes no statistical significance (P > 0.05); rS, Spearman rank correlation overall; rS int, Spearman rank correlation, internal (hydrophobic); rS ext, Spearman rank correlation, external (hydrophilic); rS amb, Spearman rank correlation, ambivalent (can be hydrophobic or hydrophilic). The number of codons does not include start or stop codons, and + denotes values before adjusting for amino acids that the organism is unable to produce.

Data Representation

For visualization purposes, data were grouped into 20 bins such that approximately 1/20th of an organism's codons were placed in each bin. The exact number of codons varies slightly among bins to avoid dividing the codons of a single gene between 2 bins.
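The binning used for the figures and the Mantel–Haenszel comparison can be sketched as below. Several details are assumptions of this sketch rather than statements from the text: bins are filled in order of increasing MCU; each Mantel–Haenszel stratum (for example, one functional category) is a 2×2 table of residue counts, with the amino acid of interest vs. all other amino acids in high- vs. low-adherence genes; no continuity correction is applied; and "sequential Bonferroni" is taken to mean Holm's sequentially rejective procedure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def codon_bins(genes, n_bins=20):
    """Group (mcu, cost, n_codons) records into roughly equal-codon bins.

    Genes are sorted by MCU and never split between bins: a bin is closed
    once the running codon total reaches its next 1/n_bins share.
    """
    genes = sorted(genes, key=lambda g: g[0])
    total = sum(g[2] for g in genes)
    bins, current, seen = [], [], 0
    for g in genes:
        current.append(g)
        seen += g[2]
        if seen >= total * (len(bins) + 1) / n_bins:
            bins.append(current)
            current = []
    if current:
        bins.append(current)
    return bins


def bin_summary(b):
    """Mean MCU, mean cost, and standard error of the mean cost for one bin."""
    costs = [g[1] for g in b]
    mean = sum(costs) / len(costs)
    if len(costs) > 1:
        var = sum((c - mean) ** 2 for c in costs) / (len(costs) - 1)
        se = math.sqrt(var / len(costs))
    else:
        se = 0.0
    return sum(g[0] for g in b) / len(b), mean, se


def mantel_haenszel_z(tables):
    """Mantel-Haenszel Z-score over 2x2 strata (no continuity correction).

    Each table is ((a, b), (c, d)): residues of the amino acid of interest
    and of all other amino acids in high-adherence (a, b) and low-adherence
    (c, d) genes. A positive Z means enrichment in high-adherence genes.
    """
    observed = expected = variance = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return (observed - expected) / math.sqrt(variance)


def two_sided_p(z):
    """Two-tailed P value for a standard normal Z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def holm(pvalues, alpha=0.05):
    """Holm's sequentially rejective Bonferroni procedure."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    reject = [False] * len(pvalues)
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (len(pvalues) - rank):
            break
        reject[i] = True
    return reject
```

Applied per organism, `holm` would take the 20 per-amino-acid P values from `two_sided_p(mantel_haenszel_z(...))`; the bin summaries correspond to the points and error bars plotted in the figures.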
Results

Correlation between MCU and Amino Acid Production Cost

Statistically significant negative Spearman rank correlations (P < 0.05) were found between adherence to codon-usage bias and the average biosynthetic cost per encoded amino acid in all 6 organisms analyzed (table 2 and fig. 3). Chlamydia trachomatis appears to preferentially use the amino acids for which it is auxotrophic in highly expressed genes (average percentage of auxotrophic amino acid usage vs. expressivity yields rS = 0.24, P < 1 × 10^-11). Despite this, it also exhibits a significant negative Spearman rank correlation between expressivity and average amino acid biosynthetic cost (rS = -0.25, P < 1 × 10^-12). After removing from consideration the amino acids for which C. trachomatis is auxotrophic (Thr, Met, Lys, Arg, Ala, Asp), the trend was even more negative (rS = -0.28, P < 1 × 10^-15). Chlamydophila pneumoniae AR39 appears to avoid using auxotrophic amino acids in highly expressed genes (average percentage of auxotrophic amino acid usage vs. expressivity yields rS = -0.17, P < 1 × 10^-7). It also exhibits a significant negative Spearman rank correlation between expressivity and average amino acid biosynthetic cost (rS = -0.14, P < 1 × 10^-6). After removing from consideration the amino acids for which C. pneumoniae AR39 is auxotrophic (His, Leu, Pro, Ala, Glu, Gln), the trend is still negative (rS = -0.08, P = 0.0083).

FIG. 3.—Comparison of average MCU and average cost in high-energy phosphate bonds (~P) in 6 bacterial species: (A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and (F) Thermus thermophilus. Error bars represent standard error of the means of the bins.

Correlation between MCU and Cost in Functional Categories in Organisms

Proteins in different functional categories may have differing compositional constraints and expression levels. NCBI (http://www.ncbi.nih.gov/genomes/lproks.cgi) apportions all genes into 1 of 16 different functional categories (Tatusov et al. 2003). Spearman rank correlations were determined in all 16 functional categories for all genomes in the study except those of C. pneumoniae and T. thermophilus (whose genes have not yet been assigned to functional categories). For each Spearman rank correlation coefficient, a t-statistic was calculated and a P value was obtained to determine significance. No single category appears to be responsible for the correlation between MCU and average amino acid production cost (table 3). All 16 categories for B. subtilis, E. coli K12, and Synechocystis sp. PCC 6803 displayed either negative or statistically insignificant correlations between average amino acid cost and MCU (table 3). The trends observed in our analysis and by Akashi and Gojobori (2002) are similar, though not directly comparable because of subtle differences in the assignment of genes to functional categories. For C. trachomatis, 3 categories (translation, ribosomal structure and biogenesis; energy production and conversion; and lipid metabolism) showed a statistically significant negative correlation. None of the other functional categories in this organism showed a statistically significant correlation (table 3).

Correlation between MCU and Cost in Hydrophilic, Hydrophobic, and Ambivalent Amino Acids

The physicochemical makeup of different amino acids has a strong effect on their structural and functional roles within proteins. Akashi and Gojobori (2002) recognized this relationship as a potential confounding factor in identifying correlations between amino acid cost and protein expressivity. To identify the role of amino acid chemistry, the 20 common amino acids were grouped into 3 physicochemical classes: internal, external, and ambivalent. Internal amino acids are largely hydrophobic and tend to occur within the core of a protein, isolated from solvent.
External amino acids are the polar and charged amino acids that are prevalent at the solvent-exposed surface of the protein. Finally, ambivalent amino acids are amphipathic or borderline residues that are about equally frequent in the core and at the surface of a protein. A Spearman rank correlation test was used to determine the significance of the relationship between MCU and the synthetic cost of amino acids within each of these 3 classes. For the hydrophobic amino acids (Phe, Leu, Ile, Met, Val), statistically significant negative correlations between MCU and amino acid cost were seen in all organisms (fig. 4). All organisms except C. pneumoniae and Synechocystis sp. PCC 6803 showed a significant negative trend between MCU and amino acid cost when only hydrophilic amino acids (His, Arg, Lys, Gln, Glu, Asn, Asp) were considered (see table 2 and fig. 5); the trends in C. pneumoniae (P = 0.16) and Synechocystis sp. PCC 6803 (P = 0.32) were not significant.

FIG. 4.—Comparison of average MCU and average cost in high-energy phosphate bonds (~P) among internal amino acids in 6 bacterial species: (A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and (F) Thermus thermophilus. Error bars represent standard error of the means of the bins.

Negative trends between MCU and amino acid production cost were found for ambivalent amino acids (Trp, Tyr, Cys, Ala, Ser, Gly, Pro, Thr) in B. subtilis, E. coli K12, C. trachomatis, and Synechocystis sp. PCC 6803 (see table 2 and fig. 6). The correlations for C. pneumoniae (P = 0.21) and T. thermophilus (P = 0.10) were not significant.

Amino Acid Utilization and Gene Expression Levels

To examine how the use of individual amino acids changes in highly expressed genes and within functional categories, Spearman rank correlation (Spearman 1904) and Mantel–Haenszel tests (Rosner 2000) were performed for each individual amino acid. Significance was determined by a t-statistic for Spearman rank correlations (P < 0.05) and by a sequential Bonferroni 2-tailed test (P < 0.05) for the Mantel–Haenszel tests. The objective of these tests was to determine whether the use of each amino acid increases or decreases as a function of MCU, both overall and within functional categories. The results demonstrate that low-cost amino acids tend to increase with MCU, both overall and within each of the 16 functional categories examined. Similarly, amino acids with high biosynthetic costs tend to decrease with MCU, both overall and within all 16 functional categories considered. For example, phenylalanine (Phe) in table 4 either decreases or shows statistically insignificant increases in all 4 organisms for which functional categories have been assigned to genes. The same is true of tryptophan (Trp) in all cases. Asparagine (Asn) increases in all organisms except Synechocystis sp. PCC 6803. This trend holds in general, with only a few exceptions: the less energetically costly amino acids tend to increase with MCU, and the energetically costly amino acids tend to decrease (table 4).

Table 3
Number of Genes and Spearman Rank Correlation within Functional Categories of 4 Bacterial Species

                                                               Bacillus       Escherichia    Chlamydia      Synechocystis
                                                               subtilis       coli K12       trachomatis    sp. PCC 6803
                                                                                             D/UW-3/CX
Functional Classification                                      No.   rS       No.   rS       No.   rS       No.   rS
Translation; ribosomal structure and biogenesis                119   -0.42    137   -0.29    89    -0.26    118   -0.21
Transcription                                                  211   -0.07*   188   -0.35    19    -0.41*   61    -0.32
DNA replication; recombination, and repair                     92    -0.26    140   -0.46    52    -0.20*   72    -0.45
Cell division and chromosome partitioning                      31    -0.43    27    -0.26*   9     NA       21    -0.42*
Posttranslational modification; protein turnover; chaperones   75    -0.46    104   -0.41    25    -0.39*   109   -0.33
Cell envelope biogenesis; outer membrane                       145   -0.37    161   -0.14*   29     0.17*   145   -0.29
Cell motility and secretion                                    41    -0.33    117   -0.07*   35    -0.25*   23    -0.15*
Inorganic ion transport and metabolism                         135   -0.29    134    0.02*   15    -0.16*   137   -0.13*
Signal transduction mechanisms                                 108   -0.33    110   -0.39    13    -0.34*   153   -0.37
Energy production and conversion                               134   -0.37    221    0.11*   36    -0.43    128   -0.20
Carbohydrate transport and metabolism                          238   -0.49    302   -0.25    29    -0.31*   95    -0.28
Amino acid transport and metabolism                            241   -0.30    295   -0.20    46    -0.22*   157   -0.33
Nucleotide transport and metabolism                            56    -0.15*   85    -0.30    15     0.25*   54    -0.35
Coenzyme metabolism                                            93    -0.18*   113   -0.11*   31    -0.25*   96    -0.27
Lipid metabolism                                               77    -0.21*   74    -0.37    32    -0.47    42    -0.24*
Secondary metabolites biosynthesis, transport, and catabolism  74    -0.06*   76    -0.30    4     NA       58    -0.02*

NOTE.—Functional categories were obtained from the NCBI Web site; asterisk denotes no statistical significance (P > 0.05); rS is the Spearman rank correlation; No. is the number of genes within each functional category. NA: fewer than 10 genes in this category, so the correlation is not applicable.

Discussion

So much of what we know about the molecular biology and physiology of microorganisms has been learned from batch culture, chemostats, and turbidostats that we forget that this state of balanced growth is totally unnatural and that few if any microorganisms experience such conditions in their natural habitats (Koch 1997). Most natural environments are oligotrophic, and energy supplies are always limited. Microorganisms in these habitats exist in a continuous state of starvation that they must overcome in order to survive. As species and as individuals, they are constantly in competition with other microorganisms for carbon and other essential nutrients. Even E. coli and other species that naturally inhabit the mammalian gut are in a state of starvation for most of their existence (Koch 1971). Only when the host organism has a meal does a transient flood of carbon, nitrogen, energy, and other building blocks become available to these organisms. These conditions have led to the development of many adaptive responses necessary to deal with a life of constant feast and famine (Koch 1971). In fact, many bacteria have adapted so well to these oligotrophic conditions that their introduction into a nutrient-rich environment is deleterious and even lethal (Koch 1979; Poindexter 1981; Dykhuizen 1995; Colwell and Grimes 1997).

Under these conditions of starvation, any energy saving has the potential to translate into an increased capacity for biomass production and, hence, an increased probability that the species will survive. With approximately 20–60 billion high-energy ~PO4 bonds expended in the creation of a single E. coli cell (Stouthamer 1973), any saving of energy in the production of cellular components is evolutionarily desirable. Because highly expressed proteins are often present at 50,000–100,000 copies per cell (Akashi and Gojobori 2002), they represent a major production cost to the cell. The cost of producing these proteins is not limited to the formation of peptide bonds; it also includes the energy needed to synthesize their amino acid building blocks. Many parts of a protein can sustain substitution of synonymous amino acids (those similar in their physical and chemical properties) without detriment to the biological function of the protein. Thus, selection for proteins with less energetically expensive amino acids can save considerable energy. Even at a saving of 1–2 high-energy ~PO4 bonds per protein molecule, a total saving of 0.01% of the energy necessary to make an E. coli cell can easily be realized. Under natural conditions of semistarvation and intense competition, these savings may give a cell just the selective advantage necessary to win the game of natural selection.
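The order of magnitude behind the 0.01% figure can be checked with simple arithmetic. The text does not say how many protein molecules that figure assumes, so this back-of-the-envelope sketch instead back-calculates how many molecules, each saving 1–2 ~PO4, would be needed to reach 0.01% of a 20–60 billion ~PO4 budget, and compares that with the saving from a single protein species present at 50,000–100,000 copies. All inputs are the ranges quoted in the paragraph; the variable and function names are our own.

```python
# Inputs are the ranges quoted in the paragraph above.
BUDGET = (20e9, 60e9)          # ~PO4 bonds to build one E. coli cell
SAVING = (1, 2)                # ~PO4 saved per protein molecule
COPIES = (50_000, 100_000)     # copies per cell of one highly expressed protein
TARGET = 0.01 / 100            # the 0.01% saving quoted in the text


def molecules_needed(budget, saving):
    """Protein molecules needed, at `saving` ~PO4 each, to reach TARGET."""
    return TARGET * budget / saving


def single_protein_fraction(copies, saving, budget):
    """Fraction of the cell's energy budget saved by one protein species."""
    return copies * saving / budget


low = molecules_needed(BUDGET[0], SAVING[1])    # smallest budget, largest saving
high = molecules_needed(BUDGET[1], SAVING[0])   # largest budget, smallest saving
print(f"molecules needed for 0.01%: {low:,.0f} to {high:,.0f}")

lo_frac = single_protein_fraction(COPIES[0], SAVING[0], BUDGET[1])
hi_frac = single_protein_fraction(COPIES[1], SAVING[1], BUDGET[0])
print(f"one protein species saves {lo_frac:.1e} to {hi_frac:.1e} of the budget")
```

With these inputs, reaching 0.01% takes on the order of 1 to 6 million protein molecules, whereas one highly expressed protein species saves at most about 1 × 10^-5 (0.001%) of the budget. Our reading, not a calculation spelled out in the text, is therefore that the 0.01% figure refers to savings summed over many proteins.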
The ecology of the organisms we evaluated (Miller and Day 2004) and their strong codon-usage biases (Carbone et al. 2005) are consistent with all of them having effective population sizes large enough that the observed trends are unlikely to be due to random genetic drift. Akashi and Gojobori (2002) demonstrated that the evolutionary record for the chemoheterotrophic mesophiles E. coli and B. subtilis is consistent with this expectation by showing a negative correlation between the biosynthetic cost of the amino acids a gene encodes and its level of expression; that is, highly expressed genes use less energy-expensive synonymous amino acids more often. Thus, the proteins that are most abundant in the cell have experienced the strongest selective pressure to substitute energy-thrifty amino acids for those that require more energy to produce. The lifestyles and physiologies of E. coli and B. subtilis allow relatively wide latitude for amino acid substitution in the nonessential regions of their proteins. However, environmental conditions such as extreme temperature, and metabolic strategies such as phototrophy, may greatly alter an organism's ability to tolerate (or perhaps even to need) such substitutions. To begin exploring the effects of metabolic and ecological lifestyle on the correlation between high expressivity and the frequency of energy-efficient amino acids, we have confirmed the observations of Akashi and Gojobori (2002) and extended them to 4 additional species with distinctly different metabolic and environmental lifestyles. Our findings for E. coli and B. subtilis are in substantive agreement with those of Akashi and Gojobori (2002). Slight differences between our results and theirs for these 2 organisms are likely due to minor changes in the NCBI sequence files since the original study.
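The comparisons in this section rest on the Spearman rank correlation (rS) between a per-gene expressivity proxy (MCU in this study) and the average biosynthetic cost of the amino acids a gene encodes. As a reference point, the minimal sketch below computes rS as the Pearson correlation of ranks, giving tied values their average rank. That tie convention is the usual one, but whether the original analysis handled ties this way is an assumption. The sketch does not reproduce the study's P values, and the MCU and cost values in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical genes: expressivity proxy (MCU-like score) and mean cost in ~P per residue.
    mcu = [0.2, 0.4, 0.5, 0.7, 0.9]
    avg_cost = [24.1, 23.0, 23.5, 21.8, 20.9]
    print(round(spearman_rs(mcu, avg_cost), 2))  # -0.9
```

A negative rS here means that genes with higher expressivity scores tend to encode cheaper amino acids, which is the direction of the trend discussed in the text.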
Of greater importance, however, is our finding that although the general trend observed for these 2 generalists (with similar metabolic pathways and energy-garnering strategies) also holds for the 4 additional prokaryotic species examined in this study, the specifics reflect the diversity of their metabolic and physiological life strategies.

FIG. 5.—Comparison of average MCU and average cost in high-energy phosphate bonds (~P) among external amino acids in 6 bacterial species: (A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and (F) Thermus thermophilus. Error bars represent standard errors of the bin means.

Synechocystis sp. PCC 6803 is a photoheterotrophic mesophile that derives energy from light and obtains carbon from organic compounds. This free-living cyanobacterial genus is found in various aquatic habitats that are often oligotrophic (Ditty et al. 2005). Interestingly, phototrophy does not lower the cost of amino acid biosynthesis relative to that of a chemoheterotroph such as E. coli (table 4): the photoautotrophic and chemoheterotrophic costs are identical except for Phe, His, Tyr, and Trp, each of which costs 2 ~P more under phototrophy. Consistent with this, Synechocystis sp. PCC 6803 shows a correlation between expressivity and amino acid usage similar to those of Escherichia and Bacillus (table 2 and fig. 3). Although a chemoheterotroph, T. thermophilus is an obligate thermophile. Life at high temperature imposes constraints on protein structure that do not apply to mesophiles (Kumar et al. 2000; Gianese et al. 2002). Thus, T. thermophilus would be expected to face more constraints on the substitution of synonymous amino acids and might be forced to sacrifice energetic economy in favor of stability and function when selecting amino acids for highly expressed proteins. Our data support this hypothesis: the Spearman rank correlation for T.
thermophilus is less negative than for any of the 3 free-living mesophiles (table 2). This difference is most dramatic in the ambivalent class of amino acids, where one might expect mesophiles to have the greatest freedom of substitution. Hence, it appears that constraints imposed by adaptation to its extreme environment have reduced T. thermophilus's ability to evolve proteins for maximal energy savings.

FIG. 6.—Comparison of average MCU and average cost in high-energy phosphate bonds (~P) among ambivalent amino acids in 6 bacterial species: (A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and (F) Thermus thermophilus. Error bars represent standard errors of the bin means.

A weak relationship between genomic AT content and average amino acid biosynthetic cost may exist (Akashi and Gojobori 2002; Rocha and Danchin 2002). Indeed, the organism with the highest genomic GC content in this study (T. thermophilus; table 1) does exhibit generally lower amino acid biosynthetic costs. Similarly, the organism with the highest AT content (C. trachomatis; table 1) exhibits the highest overall amino acid biosynthetic cost (fig. 3) and also a stronger association between biosynthetic cost and expressivity (table 2).

The 2 auxotrophic organisms in this study appear to have adopted different strategies with respect to the amino acids they derive from their environment. Chlamydia trachomatis effectively treats its auxotrophic amino acids as if they cost less than de novo synthesis would (table 2 and fig. 3) and preferentially uses them in highly expressed genes (average percentage of auxotrophic amino acid usage vs. expressivity yields rS = 0.24, P < 1 × 10⁻¹¹). Chlamydophila pneumoniae appears to place greater weight on the reliable availability of its heterotrophic amino acids than on the biosynthetic cost savings associated with its auxotrophic amino acids (average percentage of auxotrophic amino acid usage vs. expressivity yields rS = -0.17, P < 1 × 10⁻⁷). Chlamydophila pneumoniae's transition to auxotrophy is likely to have occurred approximately 700 MYA (Horn et al. 2004), making it unlikely that its avoidance of auxotrophic amino acids simply reflects its ancestral amino acid usage. Despite their different strategies, C. trachomatis and C. pneumoniae both adhere to the general trend of preferentially using inexpensive heterotrophic amino acids in highly expressed genes (fig. 3 and table 2). It is interesting to note that each of these organisms has tended to lose the ability to anabolize the more energetically costly amino acids (see table 4 for individual amino acid costs). Chlamydia trachomatis was determined in these analyses to be auxotrophic for Ala, Thr, Arg, Lys, Met, and Asp. Of these, only alanine is a low-energy amino acid. This trend is even more dramatic in C. pneumoniae, which was determined in these analyses to be auxotrophic for His, Leu, Pro, Ala, Glu, and Gln.

The less oligotrophic nature of C. trachomatis's environment is also reflected in our results. Even among the amino acids that it synthesizes for itself, the Spearman rank correlation is less negative than for the free-living mesophiles (table 4). This may reflect the relative ease with which the organism obtains these amino acids from its environment.

The results presented in this article demonstrate the usefulness of extending studies of amino acid utilization to additional species from diverse habitats and with diverse metabolic strategies for acquiring and synthesizing amino acids and their building blocks. They also suggest several additional lines of investigation.
For instance, it may be possible to determine an organism's “perceived” amino acid biosynthetic cost, which in turn may lead to the discovery of previously unappreciated biosynthetic or transport opportunities. Analysis of substitutions within classes of amino acids (external, internal, and ambivalent) may also facilitate exploration of the contributions of specific amino acids to protein structure and function. Differences in average amino acid biosynthetic cost among paralogs (Miller and Day 2004) might reveal rapid evolution of one or more of these genes toward new functions or, conversely, their drift toward pseudogene status or irrelevance. Conformity to energy conservation may even allow estimation of when a protein (such as the various RecA proteins; Miller 2000) first appeared in evolutionary time. Similarly, it might be possible to gauge the rate at which this trend toward energetic efficiency effects change in an organism's genome through analyses of species such as Coxiella burnetii (Seshadri et al. 2003) that have adopted intracellular lifestyles only recently in evolutionary time (E. Shaw, personal communication).

Efficient energy management of protein synthesis requires tight coupling between the regulation of amino acid biosynthesis and the demand for these amino acids in protein synthesis (Akashi and Gojobori 2002). Conway's group has demonstrated through microarray analysis that the amino acid biosynthetic genes of E. coli are induced in minimal medium but repressed in rich medium (Tao et al. 1999). This tight regulation of amino acid biosynthesis is likely an adaptation to the feast–famine (Koch 1979) mode of existence that E. coli naturally encounters on a daily basis.

Table 4
Amino Acid Abundance across the Genome, Production Costs of Amino Acids (listed by subcategories), Spearman Rank Correlation, and Z-Scores in 4 Bacterial Species (costs in ~P; species entries are rS / Z)

Class | Amino acid | Chemo. cost | Photo. cost | Bacillus subtilis | Escherichia coli K12 | Chlamydia trachomatis D/UW-3/CX | Synechocystis sp. PCC 6803
--- | --- | --- | --- | --- | --- | --- | ---
Internal | Val | 23.3 | 23.3 | 0.08 / 0.07* | 0.19 / 0.24 | 0.29 / 0.03* | 0.18 / 0.14*
Internal | Leu | 27.3 | 27.3 | -0.28 / -0.05* | -0.29 / -0.85 | -0.31 / 0.22* | -0.06 / 0.09*
Internal | Ile | 32.3 | 32.3 | -0.16 / -0.03* | -0.04 / -0.10* | -0.19 / 0.64 | -0.09 / 0.36
Internal | Met | 34.3 | 34.3 | -0.14 / -0.11* | 0.09 / 0.23* | 0.19 / 0.31* | 0.16 / 0.19*
Internal | Phe | 52 | 54 | -0.32 / 0.27 | -0.10 / -0.30 | -0.20 / 0.72 | -0.22 / 0.55
External | Asp | 12.7 | 12.7 | 0.17 / 0.32 | 0.18 / 0.83 | 0.23 / 0.65 | 0.07 / 0.80
External | Asn | 14.7 | 14.7 | 0.21 / 0.03* | 0.04 / 0.40 | 0.00* / 0.29* | -0.18 / 0.06*
External | Glu | 15.3 | 15.3 | 0.31 / 0.76 | 0.20 / 1.00 | 0.34 / 1.15 | 0.09 / 0.93
External | Gln | 16.3 | 16.3 | 0.07 / 0.03* | -0.12 / -0.36 | 0.02* / 0.07* | -0.09 / -0.05*
External | Arg | 27.3 | 27.3 | -0.08 / -0.04* | -0.04 / -0.01* | 0.12 / 0.18* | 0.11 / 0.05*
External | Lys | 30.3 | 30.3 | 0.33 / 0.32 | 0.26 / 1.28 | 0.22 / 0.79 | -0.05 / 0.31
External | His | 38.3 | 40.3 | -0.11 / 0.10* | -0.14 / -0.52 | -0.16 / 0.88 | 0.00* / 0.58
Ambivalent | Ala | 11.7 | 11.7 | 0.05 / 0.08* | 0.08 / -0.15 | 0.10 / 0.36 | 0.26 / 0.64
Ambivalent | Gly | 11.7 | 11.7 | -0.07 / 0.19* | 0.23 / 0.28 | 0.27 / 1.07 | 0.16 / 0.62
Ambivalent | Ser | 11.7 | 11.7 | -0.09 / -0.06* | -0.22 / -0.48 | -0.25 / -0.06* | -0.14 / -0.02*
Ambivalent | Thr | 18.7 | 18.7 | 0.13 / 0.00* | 0.01* / 0.03* | -0.29 / -0.30* | 0.03* / -0.01*
Ambivalent | Pro | 20.3 | 20.3 | -0.05 / 0.00* | -0.07 / -0.30 | -0.33 / 0.10* | -0.01* / 0.06*
Ambivalent | Cys | 24.7 | 24.7 | -0.14 / -0.06* | -0.14 / -1.15 | -0.10 / 0.30* | -0.06 / 0.67
Ambivalent | Tyr | 50 | 52 | -0.14 / 0.00* | -0.01* / 0.30 | -0.11 / 0.00* | -0.09 / 0.00*
Ambivalent | Trp | 74.3 | 76.3 | -0.16 / 0.02* | -0.21 / -1.27 | -0.01* / 0.03* | -0.14 / 0.06*

NOTE.—Asterisk denotes no statistical significance (P > 0.05); rS: Spearman rank correlation; Z: Mantel–Haenszel Z-score; Chemo. cost: chemoheterotrophic costs; Photo. cost: photoautotrophic costs.
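The cost columns of Table 4 can be applied directly to a protein sequence to give its average biosynthetic cost per residue, the per-gene quantity that this article compares against expressivity. The sketch below transcribes both columns into dictionaries keyed by one-letter amino acid codes and averages over residues. Skipping characters outside the 20 standard amino acids (stop symbols, ambiguity codes) is an assumption of this sketch, and the example sequence is a hypothetical toy, not a real gene.

```python
# Per-residue biosynthetic costs in high-energy phosphate bonds (~P), from Table 4.
CHEMO_COST = {
    # internal
    "V": 23.3, "L": 27.3, "I": 32.3, "M": 34.3, "F": 52.0,
    # external
    "D": 12.7, "N": 14.7, "E": 15.3, "Q": 16.3, "R": 27.3, "K": 30.3, "H": 38.3,
    # ambivalent
    "A": 11.7, "G": 11.7, "S": 11.7, "T": 18.7, "P": 20.3, "C": 24.7, "Y": 50.0, "W": 74.3,
}
# Photoautotrophic costs: identical except Phe, His, Tyr and Trp (2 ~P higher each).
PHOTO_COST = {**CHEMO_COST, "F": 54.0, "H": 40.3, "Y": 52.0, "W": 76.3}


def average_cost(protein, costs):
    """Mean ~P cost per residue of a one-letter protein sequence.

    ASSUMPTION: characters outside the 20 standard amino acids
    (e.g. '*' for stop, 'X' for unknown) are skipped, not penalized.
    """
    residues = [aa for aa in protein.upper() if aa in costs]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(costs[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    example = "MAGW*"  # hypothetical toy sequence
    print(f"chemo: {average_cost(example, CHEMO_COST):.1f} ~P/residue")  # 33.0
    print(f"photo: {average_cost(example, PHOTO_COST):.1f} ~P/residue")  # 33.5
```

Because the photoautotrophic column never lists a lower cost than the chemoheterotrophic one, a protein's photoautotrophic average can never fall below its chemoheterotrophic average, in line with the observation that phototrophy does not reduce amino acid biosynthetic costs.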
Comparing amino acid energy costs with microarray data, as such data become available, may ultimately give a more accurate picture of how amino acid usage correlates with expressivity than MCU does. This is especially likely in organisms where MCU is a weak predictor of expressivity (i.e., the organism does not exhibit a dominant codon-usage bias that increases translational efficiency by exploiting tRNA abundance). Relationships between amino acid–altering mutations and MCU can be weak or nonexistent in such organisms (Akashi 2003). The analyses reported here have confirmed the observation of Akashi and Gojobori (2002) that higher gene expressivity is correlated with the substitution, over evolutionary time, of amino acids that are cheap to produce for synonymous amino acids that cost more energy to biosynthesize. They also extend that study to include obligate intracellular pathogens, a thermophilic species, and a photochemotrophic species. Our data demonstrate that, as Akashi and Gojobori (2002) suggested, compliance with this principle varies with lifestyle and habitat. Extending analyses of this kind to other metabolic groups, including chemo- and photolithotrophs, to other extremophiles, including psychrophiles, barophiles, and halophiles, and even to anaerobic fermentative species is likely to provide further insight into proteome evolution. It will be particularly interesting to see whether our observations and predictions about the constraints that unusual environments place on protein structure are borne out in these organisms' patterns of amino acid energy conservation.

Acknowledgments

We wish to thank R. Burnap and E. Shaw for helpful discussions on the subject of amino acid synthesis and auxotrophy. We also thank the many graduate and undergraduate students who contributed to data acquisition and analysis during bioinformatics classes at Wright State University. This study was supported in part by grants no.
MBC-0132097 (to R.V.M.) and no. EIA0122582 (to M.L.R., T.E.D., and D.E.K.) from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Literature Cited

Akashi H. 2003. Translational selection and yeast proteome evolution. Genetics 164:1291–303.
Akashi H, Gojobori T. 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99:3695–700.
Altschul S, Gish W, Miller W, Myers E, Lipman D. 1990. Basic local alignment search tool. J Mol Biol 215:403–10.
Atkinson D. 1977. Cellular energy metabolism and its regulation. New York: Academic Press.
Axe D. 2000. Extreme functional sensitivity to conservative amino acid changes on enzyme exteriors. J Mol Biol 301:585–95.
Biaudet V, Samson F, Bessieres P. 1997. Micado—a network-oriented database for microbial genomes. Comput Appl Biosci 13:431–8.
Carbone A, Képès F, Zinovyev A. 2005. Codon bias signatures, organization of microorganisms in codon space, and lifestyle. Mol Biol Evol 22:547–61.
Colwell R, Grimes D, editors. 1997. Non-culturable organisms in the environment. New York: Chapman & Hall.
Craig C, Weber R. 1998. Selection costs of amino acid substitutions in ColE1 and ColIa gene clusters harbored by Escherichia coli. Mol Biol Evol 15:774–6.
dos Reis M, Wernisch L, Savva R. 2003. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res 31:6976–85.
Ditty JL, Canales SR, Anderson BE, Williams SB, Golden SS. 2005. Stability of the Synechococcus elongatus PCC 7942 circadian clock under directed anti-phase expression of the kai genes. Microbiology 151:2605–13.
Dykhuizen D. 1995. Natural selection and the single cell. Symp Soc Gen Microbiol 52:101–73.
Eyre-Walker A. 1996. Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy? Mol Biol Evol 13:864–72.
Garcia-Vallve S, Guzman E, Montero M, Romeu A. 2003. HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Res 31:187–9.
Ghai R, Hain T, Chakraborty T. 2004. GenomeViz: visualizing microbial genomes. BMC Bioinformatics 5:198–203.
Gianese G, Bossa F, Pascarella S. 2002. Comparative structural analysis of psychrophilic and meso- and thermophilic enzymes. Proteins 47:236–49.
Grantham R, Gautier C, Gouy M. 1980. Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type. Nucleic Acids Res 8:1893–912.
Grantham R, Gautier C, Gouy M, Pave A. 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res 8:r49–62.
Hess W. 2004. Genome analysis of marine photosynthetic microbes and their global role. Curr Opin Biotechnol 15:191–8.
Horn M, Collingro A, Schmitz-Esser S, et al. (13 co-authors). 2004. Illuminating the evolutionary history of Chlamydiae. Science 304:728–30.
Ikemura T. 1981a. Correlation between the abundance of Escherichia coli transfer RNA and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the Escherichia coli translational system. J Mol Biol 151:389–410.
Ikemura T. 1981b. Correlation between the abundance of Escherichia coli transfer RNA and the occurrence of the respective codons in its protein genes. J Mol Biol 146:1–22.
Kanaya S, Yamada Y, Kudo Y, Ikemura T. 1999. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238:143–55.
Koch A. 1971. The adaptive responses of Escherichia coli to a feast and famine existence. Adv Microb Physiol 6:147–217.
Koch A. 1979. Microbial growth in low concentrations of nutrients. In: Shilo M, editor. Strategies in microbial life in extreme environments. Dahlem Konferenzen—1978. Weinheim, Germany: Verlag Chemie. p 261–79.
Koch A. 1997. Microbial physiology and ecology of slow growth. Microbiol Mol Biol Rev 61:305–18.
Kumar S, Tsai C, Nussinov R. 2000. Factors enhancing protein thermostability. Protein Eng 13:179–91.
Lafay B, Atherton JC, Sharp PM. 2000. Absence of translationally selected synonymous codon usage bias in Helicobacter pylori. Microbiology 146:851–60.
McHardy A, Puehler A, Kalinowski J, Meyer F. 2004. Comparing expression level-dependent features in codon usage with protein abundance: an analysis of ‘predictive proteomics’. Proteomics 4:46–58.
Miller R. 2000. recA: the gene and its protein product. In: Luria S, editor. Encyclopedia of microbiology. 2nd ed. Volume 4. San Diego, CA: Academic Press. p 43–54.
Miller R, Day M. 2004. Evolution of the bacterial genome: gene establishment, survival, and exchange. Washington, DC: ASM Press.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. 1999. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27:29–34.
Overbeek R, Larsen N, Pusch G, D’Souza M, Selkov E, Kyrpides N, Fonstein M, Maltsev N, Selkov E. 2000. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 28:123–5.
Poindexter J. 1981. Oligotrophy: fast and famine existence. Adv Microb Ecol 5:63–89.
Poolman M, Fell D, Thomas S. 2000. Modeling photosynthesis and its control. J Exp Bot 51:319–28.
Precup J, Parker J. 1987. Missense misreading of asparagine codons as a function of codon identity and context. J Biol Chem 262:11351–5.
Razin S. 1999. Adherence of pathogenic mycoplasmas to host cells. Biosci Rep 19:367–72.
Robinson M, Lilley R, Little S, Emtage J, Yarranton G, Stephens P, Millican A, Eaton M, Humphreys G. 1984. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res 12:6663–71.
Rocha E, Danchin A. 2002. Base composition bias might result from competition for metabolic resources. Trends Genet 18:291–4.
Rosner B. 2000. Fundamentals of biostatistics. Pacific Grove, CA: Duxbury.
Seshadri R, Paulsen I, Eisen J, et al. (24 co-authors). 2003. Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Natl Acad Sci USA 100:5455–60.
Sharp P, Tuohy T, Mosurski K. 1986. Codon usage in yeast—cluster-analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14:5125–43.
Spearman C. 1904. “General intelligence,” objectively determined and measured. Am J Psychol 15:201–93.
Stouthamer A. 1973. A theoretical study on the amount of ATP required for synthesis of microbial cell material. Antonie Leeuwenhoek 39:545–65.
Tao H, Bausch C, Richmond C, Blattner FR, Conway T. 1999. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181:6425–40.
Tatusov R, Fedorova N, Jackson J, et al. (17 co-authors). 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41.
Zavala A, Naya H, Romero H, Musto H. 2002. Trends in codon and amino acid usage in Thermotoga maritima. J Mol Evol 54:563–8.

Edward Holmes, Associate Editor
Accepted May 25, 2006