Amino Acid Cost and Codon-Usage Biases in 6 Prokaryotic Genomes: A Whole-Genome Analysis
Esley M. Heizer Jr,* Douglas W. Raiford,† Michael L. Raymer,† Travis E. Doom,† Robert
V. Miller,‡ and Dan E. Krane*
*Department of Biological Sciences, Wright State University; †Department of Computer Science and Engineering, Wright State
University; and ‡Department of Microbiology and Molecular Genetics, Oklahoma State University
For most prokaryotic organisms, amino acid biosynthesis represents a significant portion of their overall energy budget.
The difference in the cost of synthesis between amino acids can be striking, differing by as much as 7-fold. Two prokaryotic
organisms, Escherichia coli and Bacillus subtilis, have been shown to preferentially utilize less costly amino acids in highly
expressed genes, indicating that parsimony in amino acid selection may confer a selective advantage for prokaryotes. This
study confirms those findings and extends them to 4 additional prokaryotic organisms: Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Synechocystis sp. PCC 6803, and Thermus thermophilus HB27. Adherence to codon-usage
biases for each of these 6 organisms is inversely correlated with a coding region’s average amino acid biosynthetic cost in
a fashion that is independent of chemoheterotrophic, photoautotrophic, or thermophilic lifestyle. The obligate parasites C.
trachomatis and C. pneumoniae AR39 are incapable of synthesizing many of the 20 common amino acids. Removing
auxotrophic amino acids from consideration in these organisms does not alter the overall trend of preferential use of energetically inexpensive amino acids in highly expressed genes.
Introduction
Advances in sequencing technology have provided an
abundance of genomic data from prokaryotic organisms
(Ghai et al. 2004). This abundance has in turn facilitated
a number of whole- and comparative-genome analyses
(Kanaya et al. 1999; Akashi and Gojobori 2002; dos Reis
et al. 2003). Even before whole-genome analysis was possible, it was known that some organisms exhibit striking
biases in their utilization of synonymous triplet codons
(Grantham, Gautier, and Gouy 1980; Grantham, Gautier,
Gouy, and Pave 1980; Robinson et al. 1984; Kanaya
et al. 1999; Akashi 2003) in ways that are correlated with
the relative abundance of the isoacceptor tRNAs for each
codon (Ikemura 1981a, 1981b) and the copy number of
the individual tRNA genes (Kanaya et al. 1999). Adherence
to these codon-usage biases in prokaryotes is selectively advantageous and has been shown to be responsible for 3- to
6-fold differences in translation rates (Robinson et al. 1984)
and up to 10-fold differences in the accuracy of translation
(Precup and Parker 1987).
Some regions of a protein’s primary structure are under strong selective pressure (e.g., active sites), making the
observation of even conservative substitutions uncommon
in naturally occurring populations, whereas other regions of
proteins are much more likely to display sequence variability (Axe 2000). The primary structure of a protein can also
be constrained by a variety of cellular processes, including
the organism’s metabolic pathways (Craig and Weber
1998), the translation rate of the mRNA (Ikemura 1981a,
1981b), and the production cost of the amino acids (Craig
and Weber 1998).
Key words: adaptive evolution, codon-usage bias, amino acid cost,
prokaryote, genome, bioinformatics.
E-mail: [email protected].
Mol. Biol. Evol. 23(9):1670–1680. 2006
doi:10.1093/molbev/msl029
Advance Access publication June 5, 2006
© The Author 2006. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
Given that the energy expended to biosynthesize
amino acids varies considerably, ranging from 11.7 (glycine and alanine) to more than 74 (tryptophan) high-energy
phosphate bonds (~PO4) (Akashi and Gojobori 2002), and
that energy availability commonly limits prokaryotic
growth, it is reasonable that natural selection would favor
substitutions that resulted in the utilization of less energetically costly amino acids where possible. Manifestation
of such a substitution bias should be greatest in highly
expressed genes in much the same way as adherence to
codon-usage biases tends to be greatest in genes that are
expressed at high levels (Grantham, Gautier, and Gouy
1980; Grantham, Gautier, Gouy, and Pave 1980; Ikemura
1981a; Robinson et al. 1984; Kanaya et al. 1999; Lafay
et al. 2000; dos Reis et al. 2003). Indeed, Akashi and
Gojobori (2002) have demonstrated that genes that adhere
to organismal codon-usage biases most strongly (and, by
inference, are most highly expressed) tend to incorporate
lower cost amino acids in Escherichia coli and Bacillus
subtilis. By performing a Spearman rank correlation
(Spearman 1904), they were able to demonstrate a negative
correlation between the major codon usage (MCU) of a gene
and the average biosynthetic cost of the amino acids incorporated into the expressed protein.
In their analysis, Akashi and Gojobori (2002) performed complete genomic analyses of only 2 organisms
(E. coli and B. subtilis), both of which have all the metabolic pathways necessary for biosynthesis of each of the 20
common amino acids. Although the codon-usage biases of
E. coli and B. subtilis differ (Kanaya et al. 1999), their
amino acid biosynthetic costs, pathways, and capabilities
are the same (Ogata et al. 1999; Overbeek et al. 2000).
Other prokaryotic organisms have significantly different
amino acid biosynthetic costs (Craig and Weber 1998),
pathways (e.g., Hess 2004), and capabilities (e.g., Razin
1999) yet still exhibit pronounced codon-usage biases
(e.g., Zavala et al. 2002). Further, factors other than translation efficiency and accuracy, such as skewed genomic GC
content (McHardy et al. 2004) or adaptation to extreme environmental conditions (Zavala et al. 2002), may also contribute to an organism’s codon-usage and amino acid
utilization biases. If the trends reported by Akashi and
Gojobori (2002) are the result of natural selection favoring
the utilization of amino acids that are less energetically
costly to biosynthesize, then similar but different trends
should also be present in prokaryotes with different amino
acid biosynthetic costs, pathways, and capabilities as well
as adaptation to extreme environments. To address this
issue, we have confirmed the original analysis of Akashi
and Gojobori (2002) and expanded it to include organisms
with different amino acid biosynthetic costs and pathways
(Synechocystis sp. PCC 6803) and amino acid production
capabilities (Chlamydia trachomatis and Chlamydophila
pneumoniae AR39) and an extreme thermophile (Thermus
thermophilus HB27).
Materials and Methods
Genomic Data
Annotated files containing gene location information
as well as complete genomic sequences were obtained from
http://www.ncbi.nih.gov/genomes/lproks.cgi for all 6 organisms considered in this study (B. subtilis, E. coli
K12, C. trachomatis, C. pneumoniae AR39, Synechocystis
sp. PCC 6803, and T. thermophilus HB27). Genes that were
described as ‘‘phage related’’ or ‘‘transposon related’’ in the
annotated file were removed from further consideration as
were candidates for horizontal gene transfer (Garcia-Vallve
et al. 2003) because they may not reflect the codon-usage
bias of the organism (dos Reis et al. 2003; Garcia-Vallve
et al. 2003). Genes that were less than 100 codons in length
(not including the start and stop codons) were also removed
from each organism’s gene set to minimize sampling effects
and potential length biases (Eyre-Walker 1996).
All but one version of each paralogous locus was also
removed from each organism’s gene set. Paralogs were
identified with unfiltered Blast (http://www.ncbi.nlm.nih.gov/BLAST/)
searches (Altschul et al. 1990) that were individually performed with all proteins in each organism
against all other proteins from that organism. Proteins with
greater than 60% amino acid identity over a stretch of more
than 60 amino acids in these intraorganismal searches were
considered to be paralogous. This approach made our
results directly comparable to Akashi and Gojobori
(2002). Only the single paralog with a GC content closest
to the organism’s genomic GC content was retained. The
number of genes included for each organism as well as total
codon counts are shown in table 1.
Table 1
Number of Genes in the Genome and the Number Removed by Each Culling Criterion
| Organism | No. of Genes in Genome | % GC | Partial Codons | Genes <100 Codons | Phage-Related Genes | Sequence Does Not Match Translation | HGT | Paralog | No. of Genes after Culling | No. of Codons | No. of Amino Acids Auxotrophic for |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bacillus subtilis | 4,112 | 42 | 2 | 488 | 89 | 2 | 433 | 58 | 3,040 | 1,010,337 | 0 |
| Escherichia coli K12 | 4,311 | 50 | 0 | 433 | 68 | 3 | 288 | 201 | 3,318 | 1,127,136 | 0 |
| Chlamydia trachomatis | 895 | 41 | 0 | 66 | 0 | 0 | 33 | 2 | 794 | 297,317 | 6 |
| Chlamydophila pneumoniae | 1,112 | 40 | 0 | 153 | 0 | 34 | 0 | 14 | 911 | 330,395 | 6 |
| Synechocystis sp. PCC 6803 | 3,167 | 60 | 0 | 247 | 107 | 0 | 160 | 61 | 2,592 | 923,920 | 0 |
| Thermus thermophilus | 1,982 | 69 | 0 | 156 | 13 | 0 | 312 | 9 | 1,492 | 485,644 | 0 |
NOTE.—No. of genes in genome: number of protein-coding genes in the genome; % GC: percentage of the genome composed of guanine and cytosine; partial codons: a codon containing fewer than 3 nt; genes <100 codons: genes shorter than 100
codons, not including start and stop codons; phage-related genes: genes that are phage or transposon related; sequence does not match translation: the translated nucleotide sequence does not match the given protein sequence; HGT: any
gene identified as a candidate for horizontal gene transfer; paralog: genes identified as paralogous; and auxotrophic: inability to produce amino acids.
Amino Acid Production Cost
Organismal amino acid biosynthetic costs were determined by adding the number of high-energy phosphate
bonds required to synthesize precursor molecules to those
expended to convert the precursors to amino acids as described by Akashi and Gojobori (2002). The amount of potential energy lost by diverting precursors to amino acid
production was also included in the total cost estimates
(Craig and Weber 1998). The 6 organisms considered in
this study utilized 1 of 2 different pathways for amino acid
synthesis (fig. 1): either chemoheterotrophic or photoautotrophic. The principal difference between these 2 pathways
is that photoautotrophs utilize the Calvin cycle (Poolman
et al. 2000) to feed precursors into the Embden–Meyerhof–Parnas
(EMP) pathway. Chemoheterotrophs and photoautotrophs then utilize 3 different central pathways to
synthesize individual amino acids (fig. 1): the EMP
pathway, the citric acid cycle, and the pentose phosphate
pathway (Ogata et al. 1999; Overbeek et al. 2000).
FIG. 1.—Metabolic pathways involved in amino acid biosynthesis and
energy production. penP, ribose 5-phosphate; PRPP, 5-phosphoribosyl
pyrophosphate; eryP, erythrose 4-phosphate; 3pg, 3-phosphoglycerate;
pep, phosphoenolpyruvate; pyr, pyruvate; acCoA, acetyl-CoA; akg,
α-ketoglutarate; oaa, oxaloacetate; RuBP, ribulose bisphosphate; TCA, tricarboxylic acid cycle.
FIG. 2.—Principal component analysis. The codon frequency matrix (cfm) contains the frequency of each codon in each gene in the genome, the first principal component is a new vector that best captures the
variability of the cfm, and Z′ is the dot product of the cfm and the first
principal component.
Blast searches were also performed on the complete
genomes of all 6 study organisms to determine if
they were capable of synthesizing each of the 20 common
amino acids. When the protein-coding regions were at least
20% identical at the amino acid level with the enzymes
known to be involved in the biosynthetic pathways
(Atkinson 1977), it was concluded that a homolog to the
enzyme used for a particular metabolic step was also available to the organism in question. When 50% or more of
a specific amino acid biosynthetic pathway was absent, it
was concluded that the organism was unable to synthesize
the amino acid in question.
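The two thresholds described above can be written as a simple decision rule. The following is only a minimal sketch of that rule, not the procedure actually used in this study: the function names, the input format (one best percent identity per enzymatic step, with `None` for no hit), and the example identities are hypothetical and chosen for illustration.

```python
from typing import Optional, Sequence

MIN_IDENTITY = 20.0         # percent amino acid identity threshold quoted in the text
MAX_ABSENT_FRACTION = 0.5   # pathway judged missing when >= 50% of its steps lack a homolog


def has_homolog(best_identity: Optional[float]) -> bool:
    """True when the best Blast hit for one enzyme reaches the identity threshold."""
    return best_identity is not None and best_identity >= MIN_IDENTITY


def is_auxotrophic(step_identities: Sequence[Optional[float]]) -> bool:
    """Apply the 50% rule to the biosynthetic pathway of one amino acid.

    step_identities holds, for each enzymatic step, the best percent identity
    found in the genome (None when there was no hit at all).
    """
    if not step_identities:
        raise ValueError("pathway must contain at least one step")
    absent = sum(1 for ident in step_identities if not has_homolog(ident))
    return absent / len(step_identities) >= MAX_ABSENT_FRACTION


# Hypothetical four-step pathway: two steps have hits of at least 20% identity,
# two do not, so half of the pathway is absent and the rule reports auxotrophy.
print(is_auxotrophic([35.0, None, 22.1, 10.0]))
```

Keeping the two thresholds as named constants makes it easy to see how sensitive an auxotrophy call is to either cutoff.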
Major Codons and Average Protein Cost
Because protein expression can be difficult to quantify
directly, adherence to MCU bias was calculated as an indicator of gene expressivity. Previous research (Akashi
and Gojobori 2002) used MCU as an indicator of expression
level for E. coli and B. subtilis. These 2 organisms
exhibited a codon-usage bias driven by translational
efficiency (Kanaya et al. 1999) (i.e., major codons and
tRNA abundance were strongly related and ribosomal
protein–coding genes exhibited high MCU scores). Similarly, 5 of the organisms in this study (B. subtilis, E. coli
K12, Synechocystis sp. PCC 6803, C. trachomatis, and
C. pneumoniae) have been shown to have a dominant translational bias (Carbone et al. 2005). The other organism’s
(T. thermophilus) ribosomal protein–coding genes exhibited high MCU scores. Carbone et al. (2005) used codon
adaptation index (CAI) in their analysis. Four of the 6 organisms (B. subtilis, E. coli K12, Synechocystis sp. PCC
6803, and T. thermophilus) exhibited strong correlations
between MCU and CAI scores. CAI was relied upon for
those that did not exhibit such correlations (C. trachomatis
and C. pneumoniae) (Carbone et al. 2005).
For each gene in an organism, the number of occurrences of each codon (excluding those for methionine,
tryptophan, and stop codons) was tallied. Start codons
were excluded because they are known to vary in prokaryotes, although still generally coding for methionine.
Stop codons were likewise excluded because they have
no associated tRNA and thus should not exhibit codon-usage bias. Finally, methionine and tryptophan were excluded because they each have only a single codon and thus
exhibit no usage bias. After these exclusions, each of the
resulting 59 codon counts for each gene was then normalized to allow comparisons between genes with differing
numbers of synonymous codons (relative synonymous
codon usage; Sharp et al. 1986). For example, the number
of GGT codons (1 of 4 synonymous codons for glycine)
was divided by the total number of occurrences of all
glycine codons within the gene, which was divided by
the number of synonymous codons for glycine (4). The
normalized values were assembled into a codon frequency
matrix, in which each entry, fi,j, is the normalized count
of codon j in gene i. The projection of these codon frequencies onto the 1-dimensional space that best captures
their variability was obtained by calculating the dot product
between the codon frequency matrix and the first principal
component of this matrix (fig. 2). This projection results
in a single value for each gene that approximates the degree
to which the gene contributes to the codon-usage variability
of the genome. Intuitively, the 1-dimensional space given
by the first principal component represents a trend in
codon usage, and codons are determined to be major if
they make a statistically significant positive contribution
to this major trend. Therefore, a codon was designated to
be major for the organism in question if there was a significant and positive correlation between the normalized
frequencies for a particular codon and their projected
values. Once the major codons for an organism were identified, the MCU for each gene was calculated by dividing
the number of major codons in the gene by the total number
of codons (excluding start, stop, methionine, and tryptophan codons).
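The normalization and the final MCU ratio described above can be sketched in a few lines. This is an illustrative reading of the text, not the code used in the study: the normalization follows the relative synonymous codon usage definition (count divided by the family total over the family size), the principal component step that selects major codons is not reproduced, and the set of major codons is simply passed in. All function and variable names are our own.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built in TCAG order; '*' marks stop codons.
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Codons that can show synonymous bias: drop Met (M), Trp (W), and stops.
BIASED_CODONS = sorted(c for c, aa in CODON_TO_AA.items() if aa not in "MW*")
FAMILY_SIZE = Counter(CODON_TO_AA[c] for c in BIASED_CODONS)


def normalized_codon_usage(codons):
    """Return {codon: count / (family total / family size)} for one gene.

    `codons` lists the internal codons of the gene (start and stop removed).
    """
    counts = Counter(c for c in codons if CODON_TO_AA[c] not in "MW*")
    family_total = Counter()
    for codon, n in counts.items():
        family_total[CODON_TO_AA[codon]] += n
    usage = {}
    for codon in BIASED_CODONS:
        aa = CODON_TO_AA[codon]
        expected = family_total[aa] / FAMILY_SIZE[aa]
        usage[codon] = counts[codon] / expected if expected else 0.0
    return usage


def major_codon_usage(codons, major_codons):
    """Fraction of counted codons (Met, Trp, and stops excluded) that are major."""
    counted = [c for c in codons if CODON_TO_AA[c] not in "MW*"]
    if not counted:
        return 0.0
    return sum(1 for c in counted if c in major_codons) / len(counted)


# Glycine example from the text: 2 GGT codons among 4 glycine codons.
gene = ["GGT", "GGT", "GGC", "GGA"]
print(normalized_codon_usage(gene)["GGT"])
```

One row of the codon frequency matrix is the 59 values returned by `normalized_codon_usage` for a single gene, taken in a fixed codon order.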
The average biosynthetic cost of the amino acids in the
proteins coded for by each individual gene in each organism
was calculated by totaling the number of high-energy phosphate bonds (~PO4) needed to synthesize each amino acid
and then dividing by the total number of amino acids in the
protein. As before, start and stop codons were excluded
from consideration and did not contribute to either the phosphate bond or amino acid totals, though methionine and
tryptophan residues did. Although the start codon codes
for an amino acid (usually f-methionine), the energy contribution of this amino acid is constant for all proteins and
was not considered.
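The per-gene average cost reduces to a mean over residues. The sketch below assumes a protein given in one-letter codes with the initiator residue already removed. The cost dictionary is deliberately incomplete: the glycine and alanine values are the 11.7 ~PO4 quoted in the Introduction, while the tryptophan entry is a placeholder standing in for the "more than 74" figure. A real analysis would supply the full organism-specific cost table. The optional `exclude` argument is our own illustration of how auxotrophic amino acids might be removed from consideration.

```python
def average_cost(protein, cost_per_residue, exclude=frozenset()):
    """Mean ~PO4 cost per residue of `protein`, skipping residues in `exclude`."""
    residues = [aa for aa in protein if aa not in exclude]
    if not residues:
        raise ValueError("no residues left to average")
    return sum(cost_per_residue[aa] for aa in residues) / len(residues)


# Illustrative costs only (see the note above); not a complete table.
EXAMPLE_COSTS = {"G": 11.7, "A": 11.7, "W": 74.0}

print(round(average_cost("GAW", EXAMPLE_COSTS), 2))
print(round(average_cost("GAW", EXAMPLE_COSTS, exclude={"A"}), 2))
```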
Statistical Analyses
Once the average biosynthetic cost and adherence to
MCU were determined for each of the organism’s genes,
several statistical tests were performed to determine the
degree to which cost and expressivity (as estimated by
MCU) are correlated. First, genes were independently rank
ordered by average amino acid biosynthetic cost and by
adherence to organismal MCU for each organism. Spearman
rank correlation (Spearman 1904) was performed on these
ranked lists, and the threshold for significance was set at
α = 0.05.
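For readers who wish to reproduce the ranking step, a minimal pure-Python sketch of the Spearman rank correlation follows. It assigns average ranks to ties and then computes the Pearson correlation of the two rank lists. Tie handling is our assumption, since the text does not say how ties were treated, and the significance test is omitted. The MCU and cost values in the usage example are invented.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


# Invented values: cost falls as MCU rises, so the coefficient is negative.
mcu = [0.31, 0.42, 0.55, 0.63, 0.78]
cost = [25.1, 24.0, 23.2, 22.9, 21.5]
print(spearman(mcu, cost))
```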
In order to test the effect of biological function as
a confounding factor, ranking by average amino acid biosynthetic cost and by adherence to organismal MCU for
each organism was also performed after subdividing genes
into the 16 functional categories based upon those listed on
National Center for Biotechnology Information (NCBI;
http://www.ncbi.nlm.nih.gov/COG/) (those that were labeled ‘‘poorly characterized,’’ ‘‘general function prediction
only,’’ or ‘‘function unknown’’ were excluded). Akashi and
Gojobori (2002) used the Micado database (Biaudet et al.
1997) and the University of Wisconsin E. coli Genome
Web site (http://www.genome.wisc.edu/) as sources for
functional category data on B. subtilis and E. coli, respectively. The functional categories used in our analyses came
from a single database (COG database; Tatusov et al. 2003)
in order to assure consistency in classification for all organisms. Genes falling above the median MCU value were
designated as ‘‘high adherence,’’ whereas those that were
below the median were designated as ‘‘low adherence’’
for the purpose of calculating the probability of divergence
from equal amino acid usage in both categories by a
Mantel–Haenszel test (Rosner 2000). A Mantel–Haenszel
Z-score (standard normal) and a Spearman rank correlation
were calculated for each of the 20 amino acids. The threshold for significance for the Mantel–Haenszel test t-statistic
was set at α = 0.05.
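A Mantel–Haenszel Z-score pools one 2 × 2 table per stratum. The sketch below assumes that each functional category contributes a table whose rows are high- and low-adherence genes and whose columns are counts of the amino acid of interest and of all other residues. This layout, the absence of a continuity correction, and the function name are our assumptions and not a description of the exact computation performed in the study.

```python
from math import sqrt


def mantel_haenszel_z(tables):
    """Mantel-Haenszel Z-score over stratified 2 x 2 tables.

    Each table is (a, b, c, d):
      a = target amino acid in high-adherence genes
      b = other residues in high-adherence genes
      c = target amino acid in low-adherence genes
      d = other residues in low-adherence genes
    A positive Z means the target is enriched in high-adherence genes.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("all strata are degenerate")
    return (observed - expected) / sqrt(variance)


# Invented counts for a single stratum.
print(round(mantel_haenszel_z([(10, 5, 5, 10)]), 3))
```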
Table 2
Spearman Rank Correlation over Whole Genome and
Internal, External, and Ambivalent Amino Acids
| Organism | rS | rS int | rS ext | rS amb |
| --- | --- | --- | --- | --- |
| Bacillus subtilis | −0.37 | −0.29 | −0.25 | −0.16 |
| Escherichia coli K12 | −0.23 | −0.07 | −0.16 | −0.21 |
| Chlamydia trachomatis+ | −0.25 | −0.22 | −0.21 | −0.11 |
| Chlamydia trachomatis | −0.28 | −0.23 | −0.26 | −0.08 |
| Chlamydophila pneumoniae+ | −0.14 | −0.10 | −0.08 | −0.07 |
| Chlamydophila pneumoniae | −0.08 | −0.21 | 0.05* | −0.05* |
| Synechocystis sp. PCC 6803 | −0.24 | −0.24 | −0.01* | −0.22 |
| Thermus thermophilus | −0.17 | −0.14 | −0.13 | −0.02* |
NOTE.—Asterisk denotes no statistical significance (P > 0.05); rS, Spearman
rank correlation overall; rS int, Spearman rank correlation, internal (hydrophobic);
rS ext, Spearman rank correlation, external (hydrophilic); rS amb, Spearman rank
correlation, ambivalent (can be hydrophobic or hydrophilic). The number of codons
does not include start or stop codons, and plus (+) denotes values before adjusting for
amino acids that the organism is unable to produce.
Data Representation
For visualization purposes, data were grouped into 20
bins such that approximately 1/20th of the codons for an
organism were placed into each bin. The exact number
of codons varies slightly among bins to avoid dividing
the codons for a single gene between 2 bins.
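One way to form such bins is sketched below. It assumes that genes have already been ordered (for example by MCU, which the text implies but does not state) and assigns each gene to the bin in which its first codon falls on the cumulative codon scale, so that no gene is split. The function name, the tie-breaking rule, and the toy gene list are hypothetical.

```python
def bin_genes(genes, n_bins=20):
    """Group ordered (name, n_codons) pairs into bins of roughly equal codon totals."""
    total = sum(n for _, n in genes)
    if total <= 0:
        raise ValueError("genes must contain at least one codon")
    bins = [[] for _ in range(n_bins)]
    running = 0  # codons placed so far
    for name, n_codons in genes:
        # Bin index of the gene's first codon; the whole gene stays in that bin.
        idx = min(n_bins - 1, running * n_bins // total)
        bins[idx].append(name)
        running += n_codons
    return bins


# Toy example: 40 genes of 100 codons each fall 2 per bin.
toy = [(f"gene{i}", 100) for i in range(40)]
print([len(b) for b in bin_genes(toy)])
```

With genes of unequal length the bin totals differ slightly, which matches the statement that the exact number of codons varies among bins.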
Results
Correlation between MCU and Amino Acid
Production Cost
Statistically significant negative Spearman rank correlations (P < 0.05) were found between the adherence to
codon-usage bias and the average biosynthetic cost per
encoded amino acid in all 6 organisms analyzed (table 2
and fig. 3).
Chlamydia trachomatis appears to preferentially utilize
auxotrophic amino acids in highly expressed genes (average
percentage of auxotrophic amino acid usage vs. expressivity
yields an rS = 0.24, P < 1 × 10^−11). Despite that, it also
exhibits a significant negative Spearman rank correlation between expressivity and average amino acid biosynthetic cost
(rS = −0.25, P < 1 × 10^−12). After removing from consideration those amino acids for which C. trachomatis is auxotrophic (Thr, Met, Lys, Arg, Ala, Asp), the trend was even
more negative (rS = −0.28, P < 1 × 10^−15).
Chlamydophila pneumoniae AR39 appears to avoid
utilization of auxotrophic amino acids in highly expressed
genes (average percentage of auxotrophic amino acid usage
vs. expressivity yields an rS = −0.17, P < 1 × 10^−7). It
also exhibits a significant negative Spearman rank correlation between expressivity and average amino acid biosynthetic cost (rS = −0.14, P < 1 × 10^−6). After removing
from consideration those amino acids for which C. pneumoniae AR39 is auxotrophic (His, Leu, Pro, Ala, Glu, Gln),
the trend is still negative (rS = −0.08, P = 0.0083).
Correlation between MCU and Cost in Functional
Categories in Organisms
Proteins in different functional categories may have
differing compositional constraints and expression levels.
NCBI (http://www.ncbi.nih.gov/genomes/lproks.cgi) apportions all genes into 1 of 16 different functional categories
(Tatusov et al. 2003). Spearman rank correlations were determined in all 16 functional categories for all genomes in the
study except for those of C. pneumoniae and T. thermophilus
(whose genes have not yet been assigned to functional categories). For each Spearman rank correlation coefficient, a
t-statistic was calculated, and a P value was obtained to determine significance. No single category seems to be responsible for the correlation between MCU and average amino acid
production cost (table 3). All 16 subcategories for B. subtilis,
E. coli K12, and Synechocystis sp. PCC 6803 displayed either negative correlations or statistically insignificant correlations between the average amino acid cost and MCU (table
3). The trends observed in our analysis and by Akashi and
Gojobori (2002) are similar, though not directly comparable
due to subtle differences in assignment to functional categories. For C. trachomatis, 3 categories (‘‘translation; ribosomal
structure and biogenesis,’’ ‘‘energy production and conversion,’’
and ‘‘lipid metabolism’’) showed a statistically significant negative correlation. The correlations in all the other functional categories in this
organism were not statistically significant (table 3).
FIG. 3.—Comparison of average MCU and average cost in high-energy phosphate bonds (~P) in 6 bacterial species: (A) Bacillus subtilis,
(B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and (F) Thermus
thermophilus. Error bars represent standard error of the means of the bins.
Correlation between MCU and Cost in Hydrophilic,
Hydrophobic, and Ambivalent Amino Acids
The physicochemical makeup of different amino acids
has a strong effect upon their structural and functional roles
within proteins. Akashi and Gojobori (2002) recognized
this relationship as a potential confounding factor in identifying potential correlations between amino acid cost and
protein expressivity. To identify the role of amino acid
chemistry, the 20 common amino acids were grouped into
3 physicochemical classes: internal, external, and ambivalent. Internal amino acids are largely hydrophobic and tend
to occur within the core of a protein isolated from solvent.
External amino acids are those polar and charged amino
acids that are prevalent at the solvent-exposed surface of
the protein. Finally, ambivalent amino acids are amphipathic or borderline residues that are generally equally frequent within the core or at the surface of a protein. A
Spearman rank correlation test was used to determine the
significance of the relationship between MCU and synthetic
cost of amino acids within each of these 3 categories.
For the hydrophobic amino acids (Phe, Leu, Ile, Met,
Val), statistically significant negative correlations between
MCU and amino acid cost for all organisms were seen (fig. 4).
All organisms, except C. pneumoniae and Synechocystis sp. PCC 6803, gave a negative and significant trend
between MCU and amino acid cost when only hydrophilic
amino acids (His, Arg, Lys, Gln, Glu, Asn, Asp) were considered (see table 2 and fig. 5). The trends in C. pneumoniae
(P = 0.16) and Synechocystis sp. PCC 6803 (P = 0.32) were not significant.
FIG. 4.—Comparison of average MCU and average cost in high-energy phosphate bonds (~P) among internal amino acids in 6 bacterial species:
(A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and
(F) Thermus thermophilus. Error bars represent standard error of the means of the bins.
Negative trends between MCU and amino acid production cost were found for ambivalent amino acids
(Trp, Tyr, Cys, Ala, Ser, Gly, Pro, Thr) in B. subtilis, E.
coli K12, C. trachomatis, and Synechocystis sp. PCC
6803 (see table 2 and fig. 6). The Spearman rank correlation
coefficients for C. pneumoniae (P = 0.21) and T. thermophilus
(P = 0.10) were not significant.
Amino Acid Utilization and Gene Expression Levels
In order to look at how individual amino acid usage
changed in highly expressed genes and in functional categories, Spearman rank correlation (Spearman 1904) and
Mantel–Haenszel tests (Rosner 2000) were performed for
each individual amino acid. Significance was determined
by a t-statistic for Spearman rank correlations (P <
0.05) and by a sequential Bonferroni 2-tailed test (P <
0.05) for the Mantel–Haenszel tests. The objective of these
tests was to determine if the amino acid was increasing or
decreasing as a function of MCU, both overall and within
functional categories. The results demonstrate that low-cost
amino acids tend to increase with MCU, both overall and
within each of the 16 functional categories examined. Similarly, amino acids with high biosynthetic costs tend to decrease with MCU, both overall and within all 16 functional
categories considered. For example, phenylalanine (Phe) in
table 4 either decreases or has statistically insignificant increases in all 4 organisms where functional categories have
been assigned to genes. The same is true of tryptophan
(Trp) in all cases. Asparagine (Asn) increases in all organisms except Synechocystis sp. PCC 6803. This trend holds
in general with only a few exceptions. The less energetically costly amino acids tend to increase with MCU, and
the energetically costly amino acids tend to decrease (table 4).
Discussion
So much of what we know about the molecular biology and physiology of microorganisms has been learned
from batch culture, chemostats, and turbidostats; we forget
that this state of balanced growth is totally unnatural and
few if any microorganisms experience such conditions in
their natural habitats (Koch 1997). Most natural environments are oligotrophic (nutrient poor), and energy supplies
are always limited. Microorganisms in these habitats exist
in a continuous state of starvation that must be addressed in
order for them to survive. As species and individuals, they
are constantly in competition with other microorganisms
for carbon and other essential nutrients. Even E. coli and
other species that naturally inhabit the mammalian gut
are in a state of starvation for most of their existence (Koch
1971). Only when the host organism has a meal does a
transient flood of carbon, nitrogen, energy, and other building blocks become available to these organisms. These conditions have led to the development of many adaptive
responses necessary to deal with a life of constant feast
and famine (Koch 1971). In fact, many bacteria have adapted so well to these oligotrophic conditions that their introduction into an environment rich in nutrients is deleterious
and even lethal (Koch 1979; Poindexter 1981; Dykhuizen
1995; Colwell and Grimes 1997).
Table 3
Number of Genes and Spearman Rank Correlation within Functional Categories of 4 Bacterial Species
| Functional Classification | Bacillus subtilis No. Genes | Bacillus subtilis rS | Escherichia coli K12 No. Genes | Escherichia coli K12 rS | Chlamydia trachomatis D/UW-3/CX No. Genes | Chlamydia trachomatis D/UW-3/CX rS | Synechocystis sp. PCC 6803 No. Genes | Synechocystis sp. PCC 6803 rS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Translation; ribosomal structure and biogenesis | 119 | −0.42 | 137 | −0.29 | 89 | −0.26 | 118 | −0.21 |
| Transcription | 211 | −0.07* | 188 | −0.35 | 19 | −0.41* | 61 | −0.32 |
| DNA replication; recombination, and repair | 92 | −0.26 | 140 | −0.46 | 52 | −0.20* | 72 | −0.45 |
| Cell division and chromosome partitioning | 31 | −0.43 | 27 | −0.26* | 9 | NA | 21 | −0.42* |
| Posttranslational modification; protein turnover; chaperones | 75 | −0.46 | 104 | −0.41 | 25 | −0.39* | 109 | −0.33 |
| Cell envelope biogenesis; outer membrane | 145 | −0.37 | 161 | −0.14* | 29 | 0.17* | 145 | −0.29 |
| Cell motility and secretion | 41 | −0.33 | 117 | −0.07* | 35 | −0.25* | 23 | −0.15* |
| Inorganic ion transport and metabolism | 135 | −0.29 | 134 | 0.02* | 15 | −0.16* | 137 | −0.13* |
| Signal transduction mechanisms | 108 | −0.33 | 110 | −0.39 | 13 | −0.34* | 153 | −0.37 |
| Energy production and conversion | 134 | −0.37 | 221 | 0.11* | 36 | −0.43 | 128 | −0.20 |
| Carbohydrate transport and metabolism | 238 | −0.49 | 302 | −0.25 | 29 | −0.31* | 95 | −0.28 |
| Amino acid transport and metabolism | 241 | −0.30 | 295 | −0.20 | 46 | −0.22* | 157 | −0.33 |
| Nucleotide transport and metabolism | 56 | −0.15* | 85 | −0.30 | 15 | 0.25* | 54 | −0.35 |
| Coenzyme metabolism | 93 | −0.18* | 113 | −0.11* | 31 | −0.25* | 96 | −0.27 |
| Lipid metabolism | 77 | −0.21* | 74 | −0.37 | 32 | −0.47 | 42 | −0.24* |
| Secondary metabolites biosynthesis, transport, and catabolism | 74 | −0.06* | 76 | −0.30 | 4 | NA | 58 | −0.02* |
NOTE.—Functional categories were obtained from the NCBI Web site; asterisk denotes no statistical significance (P > 0.05); rS is the Spearman rank correlation; No.
genes is the number of genes within each functional category. NA: not applicable because the category contains fewer than 10 genes.
Under these conditions of starvation, any energy saving has the potential to translate into an increased capability
for biomass production and, hence, an increased probability
that the species will survive. With approximately 20–60
billion high-energy ~PO4 bonds expended in the creation
of a single E. coli cell (Stouthamer 1973), any saving of
energy in the production of cellular components is evolutionarily desirable. Because highly expressed proteins are
often found in concentrations of 50,000–100,000 copies
per cell (Akashi and Gojobori 2002), they are a primary production cost to the cell. The production of these proteins is
not limited to simply the production of peptide bonds but
includes the energy needed to synthesize their amino acid
building blocks. Many parts of a protein can sustain substitution of synonymous amino acids (those similar in their
physical and chemical properties) without detriment to the
biological function of the protein. Thus, selection for proteins with less energetically expensive amino acids can save considerable energy. Even at a saving of 1–2 high-energy
~PO4 bonds per protein molecule, a total saving of
0.01% of the energy necessary to make an E. coli cell
can be easily realized. Under natural conditions of semistarvation and extreme competition, these savings may give
a cell just the selective advantage necessary to win the game
of natural selection. The ecology of the organisms we evaluated (Miller and Day 2004) and their strong codon-usage
biases (Carbone et al. 2005) are consistent with all of them
having sufficiently large effective population sizes such that
the observed trends are unlikely to be due to random genetic
drift.
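The scale of the saving argued for above can be checked with simple arithmetic. The sketch below uses only the ranges quoted in the text (20–60 billion ~P bonds per cell, 50,000–100,000 copies of a highly expressed protein, 1–2 bonds saved per molecule). The function names, and the idea of counting how many equally abundant protein species would be needed to reach 0.01%, are illustrative assumptions and not part of the original analysis.

```python
# Back-of-the-envelope check of the energy-saving argument.
# Constants are the ranges quoted in the text; function names are hypothetical.

CELL_BUDGET = (20e9, 60e9)           # ~P bonds to build one E. coli cell
COPIES_PER_CELL = (50_000, 100_000)  # copies of one highly expressed protein
BONDS_SAVED = (1, 2)                 # ~P bonds saved per protein molecule


def fractional_saving(bonds_saved, copies, cell_budget):
    """Fraction of the cell's energy budget saved by one protein species."""
    return bonds_saved * copies / cell_budget


def species_needed(target_fraction, bonds_saved, copies, cell_budget):
    """Number of equally abundant protein species needed to reach the target."""
    return target_fraction / fractional_saving(bonds_saved, copies, cell_budget)


# Least favourable and most favourable ends of the quoted ranges.
low = fractional_saving(BONDS_SAVED[0], COPIES_PER_CELL[0], CELL_BUDGET[1])
high = fractional_saving(BONDS_SAVED[1], COPIES_PER_CELL[1], CELL_BUDGET[0])
print(f"saving per protein species: {low:.2e} to {high:.2e} of the budget")
print(f"species needed for 0.01%: {species_needed(1e-4, 2, 100_000, 20e9):.0f}")
```

Under these assumptions a single protein species saves at most about 0.001% of the budget, so the 0.01% figure corresponds to savings summed over roughly ten or more such highly expressed proteins.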
Akashi and Gojobori (2002) demonstrated that the
evolutionary record for the chemoheterotrophic mesophiles
E. coli and B. subtilis is consistent with this expectation by
showing a negative correlation between the frequency of
more energetically expensive synonymous amino acids and the
level of gene expression. Thus, the proteins that are most
abundant in the cell have had the greatest selective pressure
to substitute energy-thrifty amino acids for those that require more energy to produce. The lifestyles and physiologies of E. coli and B. subtilis allow for relatively
unrestricted latitude in amino acid substitution in the nonessential regions of their proteins. However, various environmental conditions such as extreme temperature and
metabolic strategies such as phototrophy may greatly alter
an organism’s ability to tolerate (or perhaps to even need)
such substitutions.
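The relationship described here is measured throughout this article with the Spearman rank correlation (rS). As a minimal sketch of that statistic, the code below ranks two variables (averaging ranks for ties) and computes the Pearson correlation of the ranks. The gene values are invented for illustration, and the helper names are hypothetical; this is not the authors' pipeline.

```python
# Spearman rank correlation (rS) with average ranks for ties, standard library only.
# The gene values at the bottom are made up for illustration.

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical genes: expressivity proxy (MCU) and mean cost per residue (~P).
mcu = [0.35, 0.42, 0.50, 0.61, 0.78]
cost = [24.1, 23.0, 23.4, 22.2, 21.5]
print(round(spearman(mcu, cost), 2))
```

A negative value, as in this toy example, means that genes with higher expressivity tend to use less costly amino acids.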
To begin an exploration of the effects of metabolic and
ecological lifestyle on the correlation between high expressivity and frequency of energy-efficient amino acid utilization, we have confirmed the observations of Akashi and
Gojobori (2002) and extended them to 4 additional species
with distinctly different metabolic and environmental lifestyles. Our findings for E. coli and B. subtilis are in substantive agreement with those of Akashi and Gojobori (2002).
The slight differences between the two sets of results are likely
due to minor changes in the NCBI sequence files for
the 2 organisms since the original study. Of greater importance, however, is our finding
that although the general trend observed for these 2 generalists (with similar metabolic pathways and energy-garnering
FIG. 5. Comparison of average MCU and average cost in high-energy phosphate bonds (~P) among external amino acids in 6 bacterial species:
(A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and
(F) Thermus thermophilus. Error bars represent standard error of the means of the bins.
strategies) also holds for the 4 additional prokaryotic
species examined in this study, the specifics reflect their
diversity of metabolic and physiological life strategies.
Synechocystis sp. PCC 6803 is a photoheterotrophic
mesophile that acquires energy by capturing light and obtains carbon from organic compounds. This free-living cyanobacterial genus is found in various aquatic
habitats that are often oligotrophic (Ditty et al. 2005).
Interestingly, phototrophy does not lead to energy savings
in amino acid biosynthesis when compared with a chemoheterotroph such as E. coli (table 4). Consistent with this,
Synechocystis sp. PCC 6803 shows a correlation between
expressivity and amino acid usage similar to that of
E. coli and B. subtilis (table 2 and fig. 3).
Although a chemoheterotroph, T. thermophilus is an
obligate thermophile. Life at high temperature imposes restrictions on protein structures that are not relevant to mesophiles (Kumar et al. 2000; Gianese et al. 2002). Thus, T.
thermophilus would be expected to encounter more constraints on substitution of synonymous amino acids and
might be forced to sacrifice energetic economy in favor
of stability and function in the selection of amino acids
for highly expressed proteins. Our data support this hypothesis, as the Spearman rank correlation for T. thermophilus is
less negative than for any of the 3 free-living mesophiles
(table 2). This difference is most dramatic in the ambivalent
class of amino acids where one might expect the greatest
freedom of substitution in mesophiles. Hence, it appears
that restraints due to adaptations to its extreme environment
have reduced T. thermophilus’s ability to evolve proteins
for maximal energy savings. A weak relationship between
genomic AT content and average amino acid biosynthetic
cost may exist (Akashi and Gojobori 2002; Rocha and
Danchin 2002). Indeed, the organism with the highest
genomic GC content in this study (T. thermophilus; table 1)
does exhibit generally lower costs for its amino acid biosynthesis. Similarly, the organism with the highest AT content
(C. trachomatis; table 1) exhibits the highest overall amino
acid biosynthetic cost (fig. 3) and also a stronger association between biosynthetic cost and expressivity (table 2).
The 2 auxotrophic organisms in this study appear to
have adopted different strategies with respect to the use
of amino acids they derive from their environment. Chlamydia trachomatis effectively treats auxotrophic amino
acids as if their biosynthetic costs were less than the cost
it takes to synthesize them by itself (table 2 and fig. 3)
and preferentially utilizes these amino acids in highly expressed genes (average percentage of auxotrophic amino
acid usage vs. expressivity yields an rS = 0.24, P <
1 × 10⁻¹¹). Chlamydophila pneumoniae appears to place
greater weight on reliable availability of its heterotrophic
amino acids than the biosynthetic cost savings associated
FIG. 6. Comparison of average MCU and average cost in high-energy phosphate bonds (~P) among ambivalent amino acids in 6 bacterial species:
(A) Bacillus subtilis, (B) Escherichia coli K12, (C) Chlamydia trachomatis, (D) Chlamydophila pneumoniae AR39, (E) Synechocystis sp. PCC 6803, and
(F) Thermus thermophilus. Error bars represent standard error of the means of the bins.
with its auxotrophic amino acids (average percentage of
auxotrophic amino acid usage vs. expressivity yields an
rS = −0.17, P < 1 × 10⁻⁷). Chlamydophila pneumoniae’s
transition to auxotrophy is likely to have occurred approximately 700 MYA (Horn et al. 2004), making it unlikely
that its avoidance of auxotrophic amino acids is simply a reflection of its ancestral amino acid usage. Despite the alternative strategies of C. trachomatis and C. pneumoniae, both
adhere to the universal trend to preferentially utilize inexpensive heterotrophic amino acids in highly expressed
genes (fig. 3 and table 2).
It is interesting to note that each of these organisms has
tended to lose the ability to anabolize the more energetically
costly amino acids (see table 3 for individual amino acid
costs). Chlamydia trachomatis was determined in these
analyses to be auxotrophic for the production of Ala,
Thr, Arg, Lys, Met, and Asp. Of these, only alanine is
a low-energy amino acid. This trend is even more dramatic
in C. pneumoniae, which was determined in these analyses
to be auxotrophic for His, Leu, Pro, Ala, Glu, and Gln. The
less oligotrophic nature of C. trachomatis’s environment is
also reflected in our results. Even among those amino acids
that it synthesizes for itself, the Spearman ranking is less
negative than for the free-living mesophiles (table 4). This
may reflect the organism’s increased ease in obtaining these
amino acids from its environment.
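The "average percentage of auxotrophic amino acid usage" referred to above can be illustrated per gene as follows. This is a minimal sketch: the auxotrophic sets are the ones listed in the text for the two intracellular pathogens, while the function name, the dictionary layout, and the example peptide are hypothetical. The resulting per-gene percentages would then be correlated with expressivity.

```python
# Percentage of residues in a protein that belong to an organism's
# auxotrophic amino acid set (one-letter codes).

# Auxotrophic sets as reported in the text.
AUXOTROPHIC = {
    "C. trachomatis": set("ATRKMD"),  # Ala, Thr, Arg, Lys, Met, Asp
    "C. pneumoniae": set("HLPAEQ"),   # His, Leu, Pro, Ala, Glu, Gln
}


def auxotrophic_percentage(protein, organism):
    """Percent of residues in `protein` drawn from the organism's auxotrophic set."""
    if not protein:
        raise ValueError("empty protein sequence")
    aux = AUXOTROPHIC[organism]
    hits = sum(1 for residue in protein.upper() if residue in aux)
    return 100.0 * hits / len(protein)


example = "MKTAYIAKQR"  # hypothetical 10-residue peptide
print(auxotrophic_percentage(example, "C. trachomatis"))
print(auxotrophic_percentage(example, "C. pneumoniae"))
```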
The results presented in this article demonstrate the
usefulness of expanding the studies of amino acid utilization to additional species from various habitats and having
various metabolic strategies for the acquisition and synthesis of amino acids and their building blocks. They also suggest several additional lines of investigation. For instance, it
may be possible to determine an organism’s “perceived”
amino acid biosynthetic cost that, in turn, may lead to
the discovery of previously unappreciated biosynthetic or
transport opportunities. Analysis of substitutions within
classes of amino acids (external, internal, and ambivalent)
may also facilitate explorations of the contributions of specific amino acids to protein structure and function. Differences in the average amino acid biosynthetic cost among
paralogs (Miller and Day 2004) might be revealed in the
event of rapid evolution of one or more of these genes toward new functions or, conversely, their drift toward pseudogene status or irrelevancy. The degree of conformity to energy
conservation may even allow estimation of when a protein (such
as the various RecA proteins; Miller 2000) first appeared in evolutionary time. Similarly, it might also be possible to gauge the rate at which this trend toward energetic
efficiency effects change in an organism’s genome through
analyses of species such as Coxiella burnetii (Seshadri et al.
2003) that have adopted intracellular lifestyles only recently
in evolutionary time (E. Shaw, personal communication).
Table 4
Amino Acid Abundance across the Genome, Production Costs of Amino Acids (listed by subcategories), Spearman Rank
Correlation, and Z-Scores in 4 Bacterial Species
                                      Bacillus subtilis Escherichia coli  Chlamydia         Synechocystis sp.
                                                        K12               trachomatis       PCC 6803
                                                                          D/UW-3/CX
Amino Acid  Chemo. Cost  Photo. Cost   rS       Z        rS       Z        rS       Z        rS       Z
Internal
Val         23.3         23.3          0.08     0.07*    0.19     0.24     0.29     0.03*    0.18     0.14*
Leu         27.3         27.3         −0.28    −0.05*   −0.29    −0.85    −0.31     0.22*   −0.06     0.09*
Ile         32.3         32.3         −0.16    −0.03*   −0.04    −0.10*   −0.19     0.64    −0.09     0.36
Met         34.3         34.3         −0.14    −0.11*    0.09     0.23*    0.19     0.31*    0.16     0.19*
Phe         52           54           −0.32     0.27    −0.10    −0.30    −0.20     0.72    −0.22     0.55
External
Asp         12.7         12.7          0.17     0.32     0.18     0.83     0.23     0.65     0.07     0.80
Asn         14.7         14.7          0.21     0.03*    0.04     0.40     0.00*    0.29*   −0.18     0.06*
Glu         15.3         15.3          0.31     0.76     0.20     1.00     0.34     1.15     0.09     0.93
Gln         16.3         16.3          0.07     0.03*   −0.12    −0.36     0.02*    0.07*   −0.09    −0.05*
Arg         27.3         27.3         −0.08    −0.04*   −0.04    −0.01*    0.12     0.18*    0.11     0.05*
Lys         30.3         30.3          0.33     0.32     0.26     1.28     0.22     0.79    −0.05     0.31
His         38.3         40.3         −0.11     0.10*   −0.14    −0.52    −0.16     0.88     0.00*    0.58
Ambivalent
Ala         11.7         11.7          0.05     0.08*    0.08    −0.15     0.10     0.36     0.26     0.64
Gly         11.7         11.7         −0.07     0.19*    0.23     0.28     0.27     1.07     0.16     0.62
Ser         11.7         11.7         −0.09    −0.06*   −0.22    −0.48    −0.25    −0.06*   −0.14    −0.02*
Thr         18.7         18.7          0.13     0.00*    0.01*    0.03*   −0.29    −0.30*    0.03*   −0.01*
Pro         20.3         20.3         −0.05     0.00*   −0.07    −0.30    −0.33     0.10*   −0.01*    0.06*
Cys         24.7         24.7         −0.14    −0.06*   −0.14    −1.15    −0.10     0.30*   −0.06     0.67
Tyr         50           52           −0.14     0.00*   −0.01*    0.30    −0.11     0.00*   −0.09     0.00*
Trp         74.3         76.3         −0.16     0.02*   −0.21    −1.27    −0.01*    0.03*   −0.14     0.06*
NOTE: Asterisk denotes no statistical significance (P > 0.05); rS: Spearman rank correlation; Z: Mantel–Haenszel Z-score; Chemo. cost: chemoheterotrophic costs; and
Photo. cost: photoautotrophic costs. Costs are in high-energy phosphate bonds (~P).
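To show how the per-amino-acid costs in Table 4 translate into a per-protein figure, the sketch below averages the chemoheterotrophic costs over the residues of a sequence. The cost values are copied from Table 4; the function name, the choice to skip residues without a listed cost, and the example peptide are assumptions made for illustration.

```python
# Mean biosynthetic cost per residue (~P bonds) of a protein, using the
# chemoheterotrophic costs listed in Table 4 (one-letter amino acid codes).

CHEMO_COST = {
    # internal
    "V": 23.3, "L": 27.3, "I": 32.3, "M": 34.3, "F": 52.0,
    # external
    "D": 12.7, "N": 14.7, "E": 15.3, "Q": 16.3, "R": 27.3, "K": 30.3, "H": 38.3,
    # ambivalent
    "A": 11.7, "G": 11.7, "S": 11.7, "T": 18.7, "P": 20.3,
    "C": 24.7, "Y": 50.0, "W": 74.3,
}


def mean_cost(protein, costs=CHEMO_COST):
    """Average cost per residue; residues without a listed cost are skipped."""
    known = [costs[r] for r in protein.upper() if r in costs]
    if not known:
        raise ValueError("no residues with a known cost")
    return sum(known) / len(known)


# Hypothetical peptide: three cheap residues and one expensive one.
print(round(mean_cost("GASW"), 2))
```

A single Trp residue raises the mean of this short peptide well above that of its three cheapest residues, which is the kind of difference that selection on highly expressed genes is proposed to act on.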
Efficient energy management of protein synthesis
requires a tight coupling between regulation of amino acid
biosynthesis and the need for these amino acids in protein
synthesis (Akashi and Gojobori 2002). Conway’s group has
demonstrated through microarray analysis that the amino
acid biosynthetic genes of E. coli are induced in minimal
medium but are repressed in rich medium (Tao et al.
1999). This tight regulation of amino acid biosynthesis is
surely an adaptation to the feast–famine (Koch 1979) mode
of existence that E. coli naturally encounters on a daily basis. The comparison of amino acid energy cost with microarray data as they become available may ultimately provide
a more accurate picture of the correlation of amino acid usage with expressivity than does MCU. This is especially
true in organisms where MCU is a weak predictor of expressivity (i.e., the organism does not exhibit a dominant
codon-usage bias that increases translational efficiency
by exploiting tRNA abundance). Relationships between
amino acid–altering mutations and MCU can be weak or
nonexistent in such organisms (Akashi 2003).
The analyses reported here have confirmed the observations of Akashi and Gojobori (2002) that increased expressivity of genes is correlated with the substitution
over evolutionary time of low-energy production amino
acids for synonymous amino acids that cost more energy
to biosynthesize. They also extend that study to include obligate intracellular pathogens, a thermophilic species, and
a photochemotrophic species. Our data demonstrate that,
as Akashi and Gojobori (2002) suggested, compliance with
this principle varies with lifestyle and habitat. Expansion
of analyses of this kind to organisms of various metabolic
groups including chemo- and photolithotrophs and to other
extremophiles including psychrophiles, barophiles, halophiles,
and even anaerobic fermentative species is likely
to provide even further insights into proteome evolution.
It will be most interesting to see if our observations and
predictions on the restraints that unusual environments
place on protein structure will be borne out in the trends
seen in their amino acid energy conservation patterns.
Acknowledgments
We wish to thank R. Burnap and E. Shaw for helpful
discussions on the subject of amino acid synthesis and auxotrophy. We also thank the many graduate and undergraduate students who have contributed to the data acquisition
and analysis during the course of bioinformatics classes at
Wright State University. This study was supported in part
by grants no. MBC-0132097 (to R.V.M.) and no. EIA-0122582 (to M.L.R., T.E.D., and D.E.K.) from the National
Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are
those of the authors and do not necessarily reflect the views
of the National Science Foundation.
Literature Cited
Akashi H. 2003. Translational selection and yeast proteome evolution. Genetics 164:1291–303.
Akashi H, Gojobori T. 2002. Metabolic efficiency and amino acid
composition in the proteomes of Escherichia coli and Bacillus
subtilis. Proc Natl Acad Sci USA 99:3695–700.
Altschul S, Gish W, Miller W, Myers E, Lipman D. 1990. Basic
local alignment search tool. J Mol Biol 215:403–10.
Atkinson D. 1977. Cellular energy metabolism and its regulation.
New York: Academic Press.
Axe D. 2000. Extreme functional sensitivity to conservative amino
acid changes on enzyme exteriors. J Mol Biol 301:585–95.
Biaudet V, Samson F, Bessieres P. 1997. Micado: a network-oriented database for microbial genomes. Comput Appl Biosci
13:431–8.
Carbone A, Képès F, Zinovyev A. 2005. Codon bias signatures,
organization of microorganisms in codon space, and lifestyle.
Mol Biol Evol 22:547–61.
Colwell R, Grimes D, editors. 1997. Non-culturable organisms in
the environment. New York: Chapman & Hall.
Craig C, Weber R. 1998. Selection costs of amino acid substitutions in ColE1 and ColIa gene clusters harbored by Escherichia coli. Mol Biol Evol 15:774–6.
dos Reis M, Wernisch L, Savva R. 2003. Unexpected correlations
between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic
Acids Res 31:6976–85.
Ditty JL, Canales SR, Anderson BE, Williams SB, Golden SS.
2005. Stability of the Synechococcus elongatus PCC 7942 circadian clock under directed anti-phase expression of the kai genes. Microbiology 151:2605–13.
Dykhuizen D. 1995. Natural selection and the single cell. Symp
Soc Gen Microbiol 52:101–73.
Eyre-Walker A. 1996. Synonymous codon bias is related to gene
length in Escherichia coli: selection for translational accuracy?
Mol Biol Evol 13:864–72.
Garcia-Vallve S, Guzman E, Montero M, Romeu A. 2003. HGT-DB: a database of putative horizontally transferred genes in
prokaryotic complete genomes. Nucleic Acids Res 31:187–9.
Ghai R, Hain T, Chakraborty T. 2004. GenomeViz: visualizing
microbial genomes. BMC Bioinformatics 5:198–203.
Gianese G, Bossa F, Pascarella S. 2002. Comparative structural
analysis of psychrophilic and meso- and thermophilic
enzymes. Proteins 47:236–49.
Grantham R, Gautier C, Gouy M. 1980. Codon frequencies in 119
individual genes confirm consistent choices of degenerate bases according to genome type. Nucleic Acids Res 8:1893–912.
Grantham R, Gautier C, Gouy M, Pave A. 1980. Codon catalog
usage and the genome hypothesis. Nucleic Acids Res 8:
r49–62.
Hess W. 2004. Genome analysis of marine photosynthetic
microbes and their global role. Curr Opin Biotechnol
15:191–8.
Horn M, Collingro A, Schmitz-Esser S, et al. (13 co-authors).
2004. Illuminating the evolutionary history of Chlamydiae.
Science 304:728–30.
Ikemura T. 1981a. Correlation between the abundance of Escherichia coli transfer RNA and the occurrence of the respective
codons in its protein genes: a proposal for a synonymous codon
choice that is optimal for the Escherichia coli translational system. J Mol Biol 151:389–410.
Ikemura T. 1981b. Correlation between the abundance of Escherichia coli transfer RNA and the occurrence of the respective
codons in its protein genes. J Mol Biol 146:1–22.
Kanaya S, Yamada Y, Kudo Y, Ikemura T. 1999. Studies of codon
usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and
species-specific diversity of codon usage based on multivariate
analysis. Gene 238:143–55.
Koch A. 1971. The adaptive responses of Escherichia coli to
a feast and famine existence. Adv Microb Physiol 6:147–217.
Koch A. 1979. Microbial growth in low concentrations of
nutrients. In: Shilo M, editor. Strategies in microbial life in extreme environments. Dahlem Konferenzen—1978. Weinheim,
Germany: Verlag Chemie. p 261–79.
Koch A. 1997. Microbial physiology and ecology of slow growth.
Microbiol Mol Biol Rev 61:305–18.
Kumar S, Tsai C, Nussinov R. 2000. Factors enhancing protein
thermostability. Protein Eng 13:179–91.
Lafay B, Atherton JC, Sharp PM. 2000. Absence of translationally
selected synonymous codon usage bias in Helicobacter pylori.
Microbiology 146:851–60.
McHardy A, Puehler A, Kalinowski J, Meyer F. 2004. Comparing
expression level-dependent features in codon usage with protein abundance: an analysis of ‘predictive proteomics’. Proteomics 4:46–58.
Miller R. 2000. recA: the gene and its protein product. In: Luria S,
editor. Encyclopedia of microbiology. 2nd ed. Volume 4. San
Diego, CA: Academic Press. p 43–54.
Miller R, Day M. 2004. Evolution of the bacterial genome: gene
establishment, survival, and exchange. Washington, DC: ASM
Press.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M.
1999. KEGG: Kyoto encyclopedia of genes and genomes.
Nucleic Acids Res 27:29–34.
Overbeek R, Larsen N, Pusch G, D’Souza M, Selkov E, Kyrpides
N, Fonstein M, Maltsev N, Selkov E. 2000. WIT: integrated
system for high-throughput genome sequence analysis and
metabolic reconstruction. Nucleic Acids Res 28:123–5.
Poindexter J. 1981. Oligotrophy: fast and famine existence. Adv
Microb Ecol 5:63–89.
Poolman M, Fell D, Thomas S. 2000. Modeling photosynthesis
and its control. J Exp Bot 51:319–28.
Precup J, Parker J. 1987. Missense misreading of asparagine
codons as a function of codon identity and context. J Biol
Chem 262:11351–5.
Razin S. 1999. Adherence of pathogenic mycoplasmas to host
cells. Biosci Rep 19:367–72.
Robinson M, Lilley R, Little S, Emtage J, Yarranton G, Stephens
P, Millican A, Eaton M, Humphreys G. 1984. Codon usage can
affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res 12:6663–71.
Rocha E, Danchin A. 2002. Base composition bias might result
from competition for metabolic resources. Trends Genet
18:291–4.
Rosner B. 2000. Fundamentals of biostatistics. Pacific Grove, CA:
Duxbury.
Seshadri R, Paulsen I, Eisen J, et al. (24 co-authors). 2003. Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Natl Acad Sci USA 100:5455–60.
Sharp P, Tuohy T, Mosurski K. 1986. Codon usage in
yeast—cluster-analysis clearly differentiates highly and lowly
expressed genes. Nucleic Acids Res 14:5125–43.
Spearman C. 1904. “General intelligence,” objectively determined and measured. Am J Psychol 15:201–93.
Stouthamer A. 1973. A theoretical study on the amount of ATP
required for synthesis of microbial cell material. Antonie
Leeuwenhoek 39:545–65.
Tao H, Bausch C, Richmond C, Blattner FR, Conway T. 1999.
Functional genomics: expression analysis of Escherichia coli
growing on minimal and rich media. J Bacteriol 181:6425–40.
Tatusov R, Fedorova N, Jackson J, et al. (17 co-authors). 2003.
The COG database: an updated version includes eukaryotes.
BMC Bioinformatics 4:41.
Zavala A, Naya H, Romero H, Musto H. 2002. Trends in codon
and amino acid usage in Thermotoga maritima. J Mol Evol
54:563–8.
Edward Holmes, Associate Editor
Accepted May 25, 2006