Gene 238 (1999) 143–155
www.elsevier.com/locate/gene
Studies of codon usage and tRNA genes of 18 unicellular organisms
and quantification of Bacillus subtilis tRNAs: gene expression
level and species-specific diversity of codon usage based on
multivariate analysis
Shigehiko Kanaya a,b, Yuko Yamada c, Yoshihiro Kudo a, Toshimichi Ikemura d, *
a Department of Electrical and Information Engineering, Faculty of Engineering, Yamagata University, Yonezawa, Yamagata-ken 992-8510, Japan
b Division of Physiological Genetics, Department of Ontogenetics, National Institute of Genetics, Mishima, Shizuoka-ken 411-8540, Japan
c Department of Biochemistry, Jichi Medical School, Kawachi-gun, Tochigi-ken 329-0498, Japan
d Division of Evolutionary Genetics, Department of Population Genetics, National Institute of Genetics, Mishima, Shizuoka-ken 411-8540, Japan
Received 1 March 1999; received in revised form 7 May 1999; accepted 1 June 1999
Abstract
We examined codon usage in Bacillus subtilis genes by multivariate analysis, quantified its cellular levels of individual tRNAs,
and found a clear constraint of tRNA contents on synonymous codon choice. Individual tRNA levels were proportional to the
copy number of the respective tRNA genes. This indicates that the tRNA gene copy number is an important factor in determining
cellular tRNA levels, as is also the case in Escherichia coli and the yeast Saccharomyces cerevisiae. Codon usage in 18 unicellular
organisms whose genomes have been sequenced completely was analyzed and compared with the composition of tRNA genes.
The 18 organisms are as follows: yeast S. cerevisiae, Aquifex aeolicus, Archaeoglobus fulgidus, B. subtilis, Borrelia burgdorferi,
Chlamydia trachomatis, E. coli, Haemophilus influenzae, Helicobacter pylori, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Mycoplasma genitalium, Mycoplasma pneumoniae, Pyrococcus horikoshii, Rickettsia
prowazekii, Synechocystis sp., and Treponema pallidum. Codons preferred in highly expressed genes were related to the codons
optimal for the translation process, which were predicted by the composition of isoaccepting tRNA genes. Genes with specific
codon usage are discussed in connection with their evolutionary origins and functions. The origin and terminus of replication
could be predicted on the basis of codon usage when the usage was analyzed relative to the transcription direction of individual
genes. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Bacillus subtilis; Codon usage; Principal component analysis; tRNA content
1. Introduction
With the progress of genome projects, a vast amount
of nucleotide sequence data is now available, which
makes it possible to study the general characteristics of
codon usage for a wide range of organisms. Multivariate
analyses, such as factor corresponding analysis and
principal component analysis, have been used to study
systematically heterogeneous codon usage in various
species (Grantham et al., 1980; Medigue et al., 1991;
Pouwels and Leunissen, 1994; Andersson and Sharp,
1996; Kanaya et al., 1996a; Guerdoux-Jamet et al., 1997;
Kunst et al., 1997). To characterize the species-specific
heterogeneity of genes in codon usage, we have developed a measure denoted by $Z_1$ based on the widest
range of axis obtained by principal component analysis
of codon frequency patterns of genes (Kanaya et al.,
1996a) and have analyzed several bacterial species
(Kanaya et al., 1996b; Nakayama et al., 1999).
* Corresponding author. Tel.: +81-559-81-6788;
fax: +81-559-81-6794.
E-mail address: [email protected] (T. Ikemura)
The relative proportions of isoaccepting tRNAs in
cells are important factors that determine synonymous
codon choice in individual genes. Codon choice patterns
in unicellular organisms such as Escherichia coli and
Saccharomyces cerevisiae have been studied extensively,
and these studies revealed that codon usage in highly expressed genes
is typically dependent on tRNA content (Ikemura,
1981a, b, 1982). Codons that were optimal for the
translation process (optimal codons) were assigned for
0378-1119/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0378-1119(99)00225-5
S. Kanaya et al. / Gene 238 (1999) 143–155
E. coli and S. cerevisiae based on the cellular contents
of isoacceptors and the strength of interaction between
anticodon and codon. The extent of codon bias for each
gene has been associated with the level of protein
production in E. coli and S. cerevisiae (Ikemura, 1981a,
b, 1982, 1985a, b; Gouy and Gautier, 1982; Medigue
et al., 1991; reviewed in Andersson and Kurland 1990;
Sharp and Matassi, 1994) and in T4 bacteriophage
( Kunisawa, 1992).
Levels of the optimal codon use in E. coli genes and
their protein production levels were correlated with the
aforementioned $Z_1$ (Kanaya et al., 1996a). In the present
study, we examined codon usage in Bacillus subtilis
genes with multivariate analysis, and determined the
optimal codons on the basis of its cellular tRNA
contents. As of 1998, the complete genomic sequences
of the following 18 unicellular species have been determined: Aquifex aeolicus (Deckert et al., 1998),
Archaeoglobus fulgidus (Klenk et al., 1997), B. subtilis
(Kunst et al., 1997), Borrelia burgdorferi (Fraser et al.,
1997), Chlamydia trachomatis (Stephens et al., 1998),
E. coli (Blattner et al., 1997), Haemophilus influenzae
(Fleischmann et al., 1995), Helicobacter pylori (Tomb
et al., 1997), Methanococcus jannaschii (Bult et al.,
1996), Methanobacterium thermoautotrophicum (Smith
et al., 1997), Mycobacterium tuberculosis (Cole et al.,
1998), Mycoplasma genitalium (Fraser et al., 1995),
Mycoplasma pneumoniae (Himmelreich et al., 1996),
Pyrococcus horikoshii (Kawarabayasi et al., 1998),
Rickettsia prowazekii (Andersson et al., 1998),
Synechocystis sp. (Kaneko et al., 1996), Treponema
pallidum (Fraser et al., 1998), and S. cerevisiae (Mewes
et al., 1997). Codon usage heterogeneities for these
species were related to the level of the optimal codon
use that was predicted by the copy numbers of isoaccepting tRNA genes.
2. Materials and methods
2.1. Quantification of B. subtilis tRNAs
To purify and quantify B. subtilis tRNAs, two-dimensional polyacrylamide gel electrophoresis was performed
in two different ways (A and B systems) as described in
Ikemura (1989). The first dimension of electrophoresis
was done in 0.3×TBE on 10% acrylamide gels for
system A and on 14% acrylamide for system B. The
second dimension was separated on 22% acrylamide gels
with 7 M urea for both systems. The separated molecules
were assigned to known tRNAs by RNase-fingerprinting
and their relative contents were measured, as described
by Ikemura and Ozeki (1977).
2.2. Computer analysis of codon usage
We characterized the latent structure of species-specific codon usage heterogeneity in genes with known
functions based on principal component analysis. Since
the methodology has been described previously ( Kanaya
et al., 1996a), we provide an outline here. A measure
based on the widest scale of gene distribution in codon
frequency space was constructed based on principal
component analysis, and this scale is denoted by $Z_1$ in
Eq. (3). The procedure for estimating $Z_1$ is as follows.
First, $Z'_1$ was estimated by principal component analysis
for a data set consisting of codon usage patterns of
genes for a species:
$$Z'_1 = b_1 \cdot X, \tag{1}$$
where $b_1$ is a vector consisting of the first principal
component coefficients $b_{1j}$ ($j = 1, 2, \ldots, M$). Here, $M$
represents the total codon number, and $X$ is the codon
usage vector for a gene. To exclude effects of gene size,
amino acid composition, and codon box number, codon
frequency for the $i(m)$th codon, $x_{i(m)}$, was calculated with
Eq. (2):
$$x_{i(m)} = f_{i(m)} \Big/ \left[ \sum_{i=1}^{M(m)} f_{i(m)} \Big/ M(m) \right]. \tag{2}$$
Here, $f_{i(m)}$ denotes the $i(m)$th synonymous codon number
for the $m$th amino acid, and $M(m)$ denotes its codon
box number.
The variable $Z'_1$ was normalized by Eq. (3):
$$Z_1 = (Z'_1 - \mathrm{Av}[Z'_1]) / \mathrm{SD}[Z'_1]. \tag{3}$$
Here, $\mathrm{Av}[Z'_1]$ and $\mathrm{SD}[Z'_1]$ are the average and standard
deviation of $Z'_1$ for the data set. With this scaling, we
obtained statistical information for genes in the principal
component axes. For example, if a gene has a $Z_1$ larger
than 2, then codon usage is very biased in genes of this
species because, theoretically, only 2.3% of genes are
included in this interval.
The contribution of the $i$th codon frequency to the
first principal component is represented by the factor
loadings in Eq. (4):
$$r(Z'_1, X_i) = \mathrm{Cov}[Z'_1, X_i] / (\mathrm{Var}[Z'_1]\,\mathrm{Var}[X_i])^{1/2}. \tag{4}$$
Here, Cov[A, B] and Var[A] denote the covariance
between two variables, A and B, and the variance of A,
respectively.
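The procedure of Eqs. (1)–(4) can be illustrated with a minimal sketch. It is not the authors' program: it assumes that the principal component analysis is run on the covariance matrix, that a gene lacking an amino acid receives zero frequencies for that codon box, and that the population standard deviation is used in Eq. (3). The function names and the synthetic two-amino-acid data set are hypothetical. The sign of the first principal component is arbitrary in any PCA, so it may need to be flipped so that a chosen reference set of genes scores positive.

```python
import math
import numpy as np


def normalized_codon_usage(counts, boxes):
    """Eq. (2): divide each codon count by the mean count of its synonymous box."""
    counts = np.asarray(counts, dtype=float)
    x = np.zeros_like(counts)
    for idx in boxes:
        box_mean = counts[:, idx].mean(axis=1, keepdims=True)
        # Assumption: if a gene never uses this amino acid, its frequencies stay 0.
        safe_mean = np.where(box_mean > 0, box_mean, 1.0)
        x[:, idx] = counts[:, idx] / safe_mean
    return x


def z1_and_loadings(x):
    """Eqs. (1), (3), (4), using PCA on the covariance matrix (an assumption)."""
    cov = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    b1 = eigvecs[:, -1]                         # first principal component coefficients
    z_raw = x @ b1                              # Eq. (1): Z'_1 = b_1 . X
    z1 = (z_raw - z_raw.mean()) / z_raw.std()   # Eq. (3)
    loadings = np.array(                        # Eq. (4): Pearson r for each codon
        [np.corrcoef(z_raw, x[:, j])[0, 1] for j in range(x.shape[1])]
    )
    return z1, loadings


# Synthetic data: two amino acids with 2 and 4 synonymous codons.
rng = np.random.default_rng(0)
boxes = [[0, 1], [2, 3, 4, 5]]
bias = rng.uniform(0.0, 1.0, size=200)          # hypothetical latent expression level
counts = np.zeros((200, 6))
for g, b in enumerate(bias):
    counts[g, 0:2] = rng.multinomial(100, [0.5 + 0.4 * b, 0.5 - 0.4 * b])
    counts[g, 2:6] = rng.multinomial(100, [0.25 + 0.6 * b] + [0.25 - 0.2 * b] * 3)

x = normalized_codon_usage(counts, boxes)
z1, loadings = z1_and_loadings(x)

# Upper tail of a standard normal beyond 2, i.e. the "2.3%" quoted in the text.
tail_beyond_2 = 0.5 * math.erfc(2.0 / math.sqrt(2.0))
print(f"fraction with Z1 > 2 under normality: {tail_beyond_2:.3f}")
print("|corr(Z1, latent bias)|:", abs(np.corrcoef(z1, bias)[0, 1]))
```

In this toy setting the standardized score tracks the latent bias closely, which is the behaviour the measure is designed to capture; real data would of course use all codon boxes of the species.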
3. Results and discussion
3.1. Synonymous codon choice in B. subtilis genes is
constrained by translation efficiency
Fig. 1. Gene distribution in $Z_1$. All, all genes analyzed; Rp, genes encoding ribosomal proteins; Rp(mit) and Rp(cyt), nuclear ribosomal protein
genes used in mitochondria and cytoplasm in S. cerevisiae; Spo, sporulation genes; Pro, prophage genes; Ret, retrotransposable elements; TN,
transposable elements; cag, genes in cag cluster; Chl, 210 Synechocystis genes with homologues in any chloroplast; PKS, genes encoding polyketide
synthetases; Mit I and II, R. prowazekii genes with homologues in R. americana mitochondrial genome and human mitochondrial genome.
Fig. 2. Factor loadings. Red and blue marks represent optimal codons predicted by the four rules and by Bennetzen and Hall (1982), respectively.
Abbreviations of species are as follows: Aa, A. aeolicus; Af, A. fulgidus; Bs, B. subtilis; Bb, B. burgdorferi; Ct, C. trachomatis; Ec, E. coli; Hi, H.
influenzae; Hp, H. pylori; Mj, M. jannaschii; Mt, M. thermoautotrophicum; Mb, M. tuberculosis; Mg, M. genitalium; Mp, M. pneumoniae; Ph, P.
horikoshii; Rp, R. prowazekii; Ss, Synechocystis sp.; Tp, T. pallidum; Sc, S. cerevisiae. These abbreviations are used also in other figures. Bimodal
distributions of $Z_1$ for all genes of B. burgdorferi and C. trachomatis are observed (Fig. 1N and O). Ribosomal protein genes are concentrated in
the positive $Z_1$ peak (red histogram), and encoded on the leading strands (see also Fig. 6A, B). Codons contributing positively to $Z_1$ do not
correspond to the optimal codons predicted by Rule 4 or the rule proposed by Bennetzen and Hall (1982). Asymmetric base composition between
the leading and lagging strands (Lobry, 1996; McInerney, 1998) may affect $Z_1$ more significantly for these two organisms than for others.
Fig. 1 shows that ribosomal protein genes, which
were assumed to be constitutively highly expressed
genes and were used to obtain the codon adaptation index
(Sharp and Li, 1987; Nakamura and Tabata, 1997), are
distributed in the positive region of $Z_1$ in B. subtilis (red
histogram in Fig. 1A). The cause of the biased use
among synonymous codons can be explained by the
factor loadings in Fig. 2; i.e., genes composed of codons
with positive factor loadings have positive $Z_1$ values,
and genes composed of codons with negative factor
loadings have negative $Z_1$ values. We investigated this
bias with respect to optimization for the translation
process. Ikemura (1985a) proposed four rules for assigning the optimal codons of E. coli and S. cerevisiae.
Codon choices are constrained by the cellular amounts
of isoaccepting tRNAs (Rule 1); modified uridines such
as thiolated uridine and 5-carboxymethyluridine at the
anticodon wobble position produce a preference for A
over G at the codon third position (Rule 2); an inosine
at the anticodon wobble position produces a preference
for U or C over A at the third position (Rule 3); and
in two-codon sets of the (A/U)–(A/U)-pyrimidine type,
C rather than U at the third position promotes an
optimal interaction strength between codon and anticodon (Rule 4; Grosjean and Fiers, 1982).
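Because Rule 4 does not depend on tRNA measurements, it can be applied mechanically. The following is a minimal sketch, with a hypothetical function name, that enumerates the (A/U)–(A/U)-pyrimidine codon sets and returns the C-ending codon of each as the predicted optimal codon. The amino acid labels follow the standard genetic code.

```python
from itertools import product

# Amino acids encoded by the four (A/U)-(A/U)-pyrimidine codon sets
# in the standard genetic code (Y = U or C at the third position).
AMINO_ACID = {"AAY": "Asn", "AUY": "Ile", "UAY": "Tyr", "UUY": "Phe"}


def rule4_optimal_codons():
    """Map each (A/U)-(A/U)-pyrimidine set to its C-ending codon (Rule 4)."""
    optimal = {}
    for first, second in product("AU", repeat=2):
        stem = first + second
        optimal[stem + "Y"] = stem + "C"   # C rather than U at the third position
    return optimal


for codon_set, codon in rule4_optimal_codons().items():
    print(AMINO_ACID[codon_set], codon_set, "->", codon)
```

Rules 1–3, in contrast, need the measured isoacceptor contents and wobble-base modifications, so they cannot be reduced to a lookup of this kind.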
B. subtilis tRNAs were separated by two-dimensional
polyacrylamide gel electrophoresis (Fig. 3), and assigned
to known tRNAs, and their relative contents were
quantified (Table 1). Based on the tRNA quantification
data and the four rules listed above, the optimal codons
in B. subtilis were estimated (Table 1). Fig. 2 shows that
most of the optimal codons (marked in red) contribute
positively to $Z_1$, and Fig. 1 shows that most ribosomal
protein genes have positive $Z_1$ values and thus use the
optimal codons. This is fully consistent with previous
findings that synonymous codon choice in E. coli and
S. cerevisiae genes is constrained by translation efficiency
(Ikemura, 1985a). Highly expressed genes of these
organisms are almost always more dependent on the
tRNA content and tend to have a strong bias of codon
usage. This common characteristic among the three
organisms should reflect the fact that, of all the cellular
processes, the greatest amounts of energy and mass are
required for translation (Maaløe, 1976).
Fig. 3. Two-dimensional gel separation of 32P-labeled B. subtilis
tRNAs. (A, B) Systems A and B described in Materials and methods.
Spot numbers assigned to known tRNAs in each system are presented
in Table 1.
3.2. Copy number of tRNA genes and optimal codons
Fig. 4A illustrates the correlation between B. subtilis
tRNA contents and tRNA gene copy numbers. Here,
the tRNAs with the same anticodon were grouped into
a single isoacceptor species regardless of minor base
differences in other positions. The correlation coefficient
was 0.86, demonstrating a clear gene-dosage effect on
tRNA contents. This is consistent with the previous
findings for E. coli (Ikemura, 1981a; Dong et al., 1996)
and S. cerevisiae (Percudani et al., 1997), and indicates
a possibility that the tRNA contents of other species
can be estimated by the tRNA gene copy number. If so,
this could be a valuable tool to study genome properties
and codon usage for a wide range of species whose
genome has been sequenced completely. Fig. 5 shows
the composition of tRNA genes for the 18 unicellular
species. In H. influenzae, genes for the isoacceptors for
14 amino acids are multiplied. The respective tRNA
genes are also multiplied in B. subtilis, but three of them
are not in E. coli (Fig. 5), indicating that the multiplication pattern of H. influenzae is more similar to that of
B. subtilis, which is a phylogenetically more distant
species than E. coli. The GC contents of H. influenzae
and B. subtilis (38.1 and 43.5%, respectively) are lower
than that of E. coli (50.8%), and this AT-richness may
require multiplication of tRNA species with anticodons
matching the codons with A or U at the third position.
Interestingly, the factor loading of H. influenzae is most
similar to that of B. subtilis (r=0.77) among the 18
unicellular species (Fig. 2). Amplification of isoacceptor
genes and genome GC content should have coevolved.
Codons denoted by red marks in the H. influenzae
column (Fig. 2) are the optimal codons predicted on
the basis of the copy number of tRNA genes. The
predicted optimal codons clearly correspond to preferred
codons in ribosomal protein genes in H. influenzae,
demonstrating the validity of the prediction.
Table 1
tRNA genes in B. subtilis genome
Amino acid   Anticodon   Gene copy No.   tRNA content   Codon recognized
Arg          ACG         4               0.48           CGC, CGU, CGA
Arg          CCG         1               nd             CGG
Arg          CCU         1               nd             AGG
Arg          UCU         1               0.13           AGA
Leu          UAA         3               0.41           UUA, UUG
Leu          CAA         1               0.22           UUG
Leu          GAG         1               0.20           CUC, CUU
Leu          UAG         2               0.20           CUA, CUG
Leu          CAG         1               0.15           CUG
Ser          UGA         2               0.20           UCA, UCG, UCU
Ser          GGA         1               0.13           UCC, UCU
Ser          GCU         2               0.20           AGC, AGU
Ala          UGC         5               1.24           GCA, GCG, GCU
Ala          GGC         1               0.23           GCC, GCU
Gly          UCC         3               0.66           GGA, GGG, GGU
Gly          GCC         4               0.85           GGC, GGU
Pro          UGG         3               1.12           CCA, CCG, CCU
Thr          UGU         4               1.19           ACA, ACG, ACU
Thr          GGU         1               0.16           ACC, ACU
Val          UAC         4               0.90           GUA, GUG, GUU
Val          GAC         1               0.42           GUC, GUU
Ile          GAU         3               1.42           AUC, AUU
Ile          CAU         1               0.19           AUA
Asn          GUU         4               1.20           AAC, AAU
Asp          GUC         4               1.31           GAC, GAU
Cys          GCA         1               np             UGC, UGU
Gln          UUG         4               np             CAA, CAG
Glu          UUC         6               1.52           GAA, GAG
His          GUG         2               np             CAC, CAU
Lys          UUU         4               0.73           AAA, AAG
Phe          GAA         3               0.75           UUC, UUU
Tyr          GUA         2               0.38           UAC, UAU
Trp          CCA         1               0.15           UGG
Met          CAU         2               0.54           AUG
fMet         CAU         3               1.00           AUG
The entries of the two remaining columns are listed in their original order, because their row assignment cannot be recovered from this transcript.
Modified at the wobble position (a): I; mnm5U; *; *; mo5U; mo5U; cmnm5U; mo5U; mo5U; mo5U; L; Q; *; *; Q; cmnm5s2U; Gm; Q
Gel spot ID (systems A and B): 46; 55; 25; 34; 52; 43; 53; 56; 54, 55; 15; 16; 51; 32; 42; 31, 41; 24; 13; 10; 11; 23; 22, 26; 45; 52; 51; 63; 56; 61, 62; 22; 53; 34; 31; 36; 14; 12; 11, 13; 16, 21, 33; 42; 33; 43; 45; 12; 24, 25; 26; 46; 23; 35; 32; 21; 36, 35; 14
a Modified at the wobble position (Sprinzl et al., 1998; Y. Yamada, unpublished); mo5U, 5-methoxyuridine; I, inosine; cmnm5U,
5-carboxymethylaminomethyluridine; cmnm5s2U, 5-carboxymethylaminomethyl-2-thiouridine; Q, queuosine; Gm, 2′-O-methylguanosine; mnm5U,
5-methylaminomethyluridine; L, lysidine; *, unidentified modified base. The content for tRNA fMet was normalized to 1.0; nd, not determined,
presumably because of low contents; np, not obtained in a pure form. Optimal codons are denoted by underline at the column of codon recognized.
Detailed procedures for assigning optimal codons were described in Ikemura (1981b).
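The gene-dosage relation discussed in this section can be checked approximately from the values in Table 1. The sketch below is illustrative only: the copy numbers and contents are transcribed from Table 1, rows reported as nd or np are omitted, and each table row is treated as one data point. This row-wise treatment is an assumption and may differ from the grouping used for Fig. 4A, so the coefficient need not match the reported 0.86 exactly.

```python
import numpy as np

# (anticodon, gene copy number, relative tRNA content) transcribed from Table 1.
# Rows reported as "nd" or "np" are omitted; the three CAU rows are kept separate.
TABLE1 = [
    ("ACG", 4, 0.48), ("UCU", 1, 0.13), ("UAA", 3, 0.41), ("CAA", 1, 0.22),
    ("GAG", 1, 0.20), ("UAG", 2, 0.20), ("CAG", 1, 0.15), ("UGA", 2, 0.20),
    ("GGA", 1, 0.13), ("GCU", 2, 0.20), ("UGC", 5, 1.24), ("GGC", 1, 0.23),
    ("UCC", 3, 0.66), ("GCC", 4, 0.85), ("UGG", 3, 1.12), ("UGU", 4, 1.19),
    ("GGU", 1, 0.16), ("UAC", 4, 0.90), ("GAC", 1, 0.42), ("GAU", 3, 1.42),
    ("CAU(Ile)", 1, 0.19), ("GUU", 4, 1.20), ("GUC", 4, 1.31), ("UUC", 6, 1.52),
    ("UUU", 4, 0.73), ("GAA", 3, 0.75), ("GUA", 2, 0.38), ("CCA", 1, 0.15),
    ("CAU(Met)", 2, 0.54), ("CAU(fMet)", 3, 1.00),
]

copies = np.array([row[1] for row in TABLE1], dtype=float)
contents = np.array([row[2] for row in TABLE1], dtype=float)

# Pearson correlation between gene copy number and relative tRNA content.
r = np.corrcoef(copies, contents)[0, 1]
print(f"n = {len(TABLE1)}, Pearson r = {r:.2f}")
```

A strong positive coefficient here is what motivates using tRNA gene copy number as a proxy for tRNA content in species where the contents have not been measured.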
The copy number of rRNA genes varies between
species. Multiplication of tRNA or rRNA genes can
increase the level of the respective gene product circumventing the limited capacity of a single promoter
(Nomura and Morgan, 1977). To support efficient protein synthesis, the genes for tRNAs and rRNAs should
have concordantly increased in their copy numbers.
Actually, the copy number of rRNA genes is correlated
closely with the total number of tRNA genes ( Fig. 4B).
Fig. 4. (A) Relation between tRNA contents and tRNA gene number of B. subtilis. The level for tRNA fMet was normalized to 1.0. (B) Relation
between rRNA gene number and total number of tRNA genes. Correlation coefficient between rRNA gene number and the square of the total
tRNA gene number was calculated with or without the S. cerevisiae data. (C) Relation between total number of tRNA genes and bias of ribosomal
proteins in $Z_1$. The $Z_1$ bias for ribosomal proteins represents the average of differences between $Z'_1$ for ribosomal proteins and the average of $Z'_1$
for all genes; and log (number of tRNA genes) represents the common logarithm of the number of tRNA genes. (D) Relation between genome
GC content and number of isoaccepting tRNA species.
In most species we analyzed, ribosomal protein genes
tend to have positive $Z_1$ values (Fig. 1), but this tendency
decreases when the gene copy number for individual isoacceptors decreases. In other words, the tendency
is weaker in organisms where only one gene encodes
each isoacceptor species (Figs. 4C and 5). Even in such
organisms, tRNA sets differ between organisms (Fig. 5).
Bacteria with high GC contents such as M. tuberculosis
and T. pallidum tend to have additional tRNA species
that respond solely to a G or C at the codon third
position; i.e., tRNA with C or G at the anticodon first
position. Organisms with high GC contents usually
require a larger number of isoacceptors than organisms
with low GC contents. Therefore, the number of tRNA
species is correlated with GC content (Fig. 4D).
However, in Micrococcus luteus, which has an extremely
high GC content (74 G+C%), tRNAs with the UNN
anticodon are lost, and some NNA-type codons are
absent in the protein genes ( Kano et al., 1991). These
demonstrate that genome GC content affects not only
codon usage (Sueoka, 1962; Bernardi and Bernardi,
1985; Muto and Osawa, 1987; Sueoka, 1988), but also
tRNA composition (Osawa, 1995). In M. luteus and M.
capricolum, while each isoacceptor species is encoded
primarily by a single gene, the relative levels of the
isoacceptors are known to be different and clearly correlated with codon usage patterns ( Kano et al., 1991;
Yamao et al., 1991). This suggests that codon usage,
even in bacteria where each isoacceptor is encoded by a
single gene, is constrained by translation efficiency. The
cellular tRNA levels have been quantified only for a
few of the aforementioned unicellular organisms, and
therefore the optimal codons based on Rules 1–3 cannot
be deduced in most cases. Rule 4, however, is independent of tRNA content and thus is generally applicable;
AUC, AAC, UUC, and UAC are the optimal codons
for Ile, Asn, Phe, and Tyr, respectively. Interestingly,
all four codons contributed positively to $Z_1$ (denoted
by red marks in Fig. 2) in most organisms examined.
This should not be due to genome GC content because,
out of the 18 organisms examined, 12 have an AT-rich
genome and only one has a GC-rich genome (Fig. 5).
Fig. 5. Compositions of tRNA genes. The rRNA-based phylogenetic tree reported by Pace (1997) and Olsen et al. (1994) is depicted in the column of Phylogeny. Of tRNAs annotated in the
GenBank/EMBL/DDBJ database, a few tRNAs whose clover-leaf structures could not be built up were removed from this analysis. Total number of tRNA genes in parentheses represents that of
isoacceptor types. Results of Mycoplasma capricolum tRNAs (denoted by Mc) are obtained from Yamao et al. (1991) and Andachi et al. (1989).
Fig. 6. Profiles of $Z_1$ along individual DNA strands for (A) B. burgdorferi, (B) C. trachomatis, (C) B. subtilis, and (D) E. coli. Blue and red
represent $Z_1$ for genes on Watson and Crick strands. Origin, replication origin; arrows, the direction of replication. In B. subtilis and E. coli,
$Z_1$ was smoothed over ±10 contiguous genes.
By combining Rules 1–4, a total of 117 optimal codons
could be predicted for 16 organisms; B. burgdorferi and
C. trachomatis are discussed separately in the legend to
Fig. 2. Of the 117 optimal codons, 110 contribute positively to $Z_1$ and only 7 contribute negatively to $Z_1$ (red
marks in Fig. 2). Some codons that were not assigned
as optimal codons contributed positively to $Z_1$. One
trivial case is that the respective isoacceptor was not
quantified either because the isoacceptor was not purified
or the sequence was not known. Another case is that
there are two major isoacceptors with an equivalent
abundance (e.g., B. subtilis tRNASer), and therefore the
optimal codon for the amino acid could not be assigned.
To explain codon usage patterns of a wide range of
species, Ikemura and Ozeki (1982) summarized a total
of eight rules, some of which are not related directly
to translation efficiency and are applicable only to a
limited class of species. The preferred codons that were not
assigned as optimal codons may relate to such rules
that are not connected with translation efficiency. The
point to be stressed in the present study is that most of
the optimal codons for the examined species contribute
positively to $Z_1$. This demonstrates that codon usage in
most bacteria, if not all, is constrained by translation
efficiency. This finding is again consistent with the fact
that, of all the cellular processes, the greatest amounts
of energy and mass are required for translation (Maaløe,
1976). Recently, the four rules related to translation
efficiency have been successfully applied for determining
optimal codons in Drosophila melanogaster (Moriyama
and Powell, 1997). Bennetzen and Hall (1982) proposed
another rule concerning a codon preference; a preference
for codons that can form a standard Watson–Crick base
pair at the codon third position over those that would
require wobble pairing. This preference is seen in codon
choice of Asp and His (blue marks in Fig. 2) in most
organisms.
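As a rough check on the count reported above (110 of the 117 predicted optimal codons contributing positively to $Z_1$), a one-sided sign test can be sketched. This test is not part of the original analysis. It assumes, as a simplification, that each codon is an independent trial with an even chance of a positive or negative loading under the null hypothesis, which is unlikely to hold strictly across related codons and species. The function name is hypothetical.

```python
from math import comb


def upper_tail_binomial(n, k, p=0.5):
    """Probability of observing k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_optimal = 117      # optimal codons predicted by Rules 1-4 for 16 organisms
n_positive = 110     # of these, codons with a positive factor loading

fraction = n_positive / n_optimal
p_value = upper_tail_binomial(n_optimal, n_positive)
print(f"positive fraction = {fraction:.3f}, one-sided sign-test p = {p_value:.2e}")
```

Under these assumptions the excess of positive loadings is far larger than chance would produce, which is in line with the qualitative conclusion drawn in the text.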
It is worth noting the correlation between $Z_1$ values
and the gene attrition process from ancestral to present-day chloroplast and mitochondrial genomes. In the
cyanobacterium Synechocystis sp., the genes involved in
photosynthesis and those for which homologues are
present in present-day chloroplasts are of particular
interest, because chloroplasts evolved from cyanobacteria-like endosymbionts (Gray, 1992). Martin et al.
(1998) illustrated the process of gene attrition from
ancestral to present-day chloroplasts focusing on 210
Synechocystis genes and their homologues present in a
wide range of chloroplast genomes. Interestingly, the
Synechocystis genes with homologues in chloroplast
genomes tend to have positive $Z_1$ values (yellow histogram
in Fig. 1J). These findings support the view that
codon usage in Synechocystis genes with homologues in
the present-day chloroplast genomes has been optimized
for translational efficiency, presumably during the
entire course of evolution. This view also appears applicable to Rickettsia genes. Mitochondria evolved from
eubacteria-like endosymbionts (Gray, 1992) whose closest known relative is Rickettsia according to rRNA-based
phylogenetic analyses (Yang et al., 1985; Olsen
et al., 1994). The mitochondrial genome in the freshwater protozoon Reclinomonas americana resembles
most closely the ancestral proto-mitochondrial genome
and is assumed to be a primitive mitochondrial genome
(Lang et al., 1997). Its genome size (69 kbp) is four
times larger than that of vertebrate mitochondrial genomes (16–
18 kbp). Rickettsia prowazekii seems to be the closest
relative of R. americana mitochondria (Yang et al.,
1985; Olsen et al., 1994). R. prowazekii genes with
homologues in the mitochondrial genes tend to have
positive $Z_1$ values (green histograms in Fig. 1P). This
supports the view that the genes with optimal codons in
the R. prowazekii genome tend to have been retained in
present-day mitochondrial genomes during evolution.
3.3. Genes with negative $Z_1$
In the above sections, we primarily considered positive $Z_1$ values and highly expressed genes. We extended
our focus to genes with negative $Z_1$ values in connection
with the life-form of individual organisms and gene
functions. Horizontally transferred genes, such as transposable elements in bacteria, retrotransposable elements
of yeast, and prophage genes in B. subtilis, tend to have
negative $Z_1$ values (blue histograms denoted by ‘TN’,
‘Ret’, ‘Pro’ in Fig. 1). A negative $Z_1$ was also obtained
for the cag cluster in H. pylori (a pathogenicity island
containing genes that stimulate production of
interleukin-8 by gastric epithelial cells) (Censini et al.,
1996; ‘cag’ in Fig. 1I), for most genes in flagellar operons
in E. coli, T. pallidum, and A. aeolicus, and for genes
associated with type III protein secretion in C. trachomatis. Collectively, codon usage in horizontally transferred genes and in those involved in pathogenicity is
clearly distinct from that in highly expressed intrinsic
genes.
In B. subtilis, over 100 genes are involved in the
sporulation process (Stragier and Losick, 1996). The
expression of these genes is regulated by six sigma
factors, σA, σH, σE, σF, σG, and σK. Sporulation genes
regulated by sporulation-specific σ factors (σE, σF, σG,
and σK) tend to have negative $Z_1$ values (‘Spo’ in
Fig. 1A) and do not use the optimal codons identified
on the basis of the present tRNA contents except for
four genes; sspC, sspB, and sspG regulated by σG, and
cotG regulated by σK, show very biased codon usage
($Z_1$ > 2.0). The four gene products are required in large
amounts for spore construction, and therefore, the optimal codons may be used preferentially. This is also the
case for flagellar synthesis genes. In B. subtilis, the
flagellin gene hag uses optimal codons preferentially,
although most other genes involved in flagellar synthesis
tend to have negative $Z_1$ values. In E. coli, the flagellin
gene, which is required in large amounts to assemble
flagellar filaments (Macnab, 1987), has a high $Z_1$,
although other genes have negative $Z_1$ values. For M.
tuberculosis, foreign-type genes (Fig. 1J), such as
polyketide synthesis (Hopwood, 1997), prophages, and
transposable elements tend to have negative $Z_1$ values.
In S. cerevisiae, sporulation and germination genes
also tend to have negative $Z_1$ values (‘Spo’ in Fig. 1B).
Nuclear genes for ribosomal proteins used in mitochondria tend to have negative $Z_1$ values (‘Rp(mit)’ in
Fig. 1B) in common with retrotransposable elements
(‘Ret’), although genes for ribosomal proteins used in
the cytoplasm tend to have high $Z_1$ values (red histogram).
This may be explained by the fact that the former
ribosomal genes are foreign-type for the yeast nuclear
genome and/or by their lower expression levels.
3.4. Prediction of replication origin and terminus by $Z_1$
In E. coli, B. subtilis, and B. burgdorferi, highly
expressed genes usually orient along the genome so that
transcription is in the same direction as replication
(Brewer, 1988; Ziegler and Dean, 1990; McInerney,
1998). Therefore, replication origins may be detected
by $Z_1$-distribution profiles along the reported genome
sequence when the direction of transcription of individual genes is taken into account. Genes transcribed in
the same direction as the leading strand progression
may have positive $Z_1$ values, and genes transcribed in
the opposite direction to the leading strand progression
may have negative $Z_1$ values. If so, a switch of $Z_1$ from
positive to negative in one direction and a switch from
negative to positive in the other direction should be
observed at the replication origin. In the linear chromosome
of B. burgdorferi, a switch of the $Z_1$-profile was observed at
the position of 0.46 Mbp (Fig. 6A); the $Z_1$-profile on
one strand switches from negative to positive (red marks)
and on the other strand switches from positive to
negative (blue marks). This switch site corresponds to
the replication origin reported by Fraser et al. (1997).
In $Z_1$-profiles of the circular chromosome for C. trachomatis,
the replication origin and the terminus appear to
be located at 0.72 Mbp and 0.2 Mbp, respectively
(Fig. 6B). The fact that the replication origin is annotated as being between 71 998 and 72 058 bp in the
database shows the validity of the present method. In
B. subtilis, a similarly clear switch was also observed near
the origin when profiles were analyzed by smoothing
$Z_1$ for contiguous genes (Fig. 6C). However, in E. coli
the strand asymmetry was observed only in the close
vicinity of the replication origin (Fig. 6D). The fact that the
asymmetry in B. subtilis is more evident than that in E.
coli may relate to the previous finding that genetic
organization near the replication origin may have kept
a more ancient pattern in B. subtilis than in E. coli; i.e.,
the genetic organization of E. coli has undergone gross
changes by duplications and translocations during evolution (Ogasawara et al., 1985; Bachmann, 1983). This
property of $Z_1$ should be useful to identify the replication
origin in a wide range of genomes, and presumably the
terminus in small genomes.
Acknowledgements
This work was supported by Grant-in-Aid of
Scientific Research from the Ministry of Education,
Science and Culture of Japan. The computers at the
DDBJ and the Human Genome Center of Japan were
used. The authors are very grateful to Ms Kazuko
Suzuki for technical assistance.
References
Andachi, Y., Yamao, F., Muto, A., Osawa, S., 1989. Codon recognition patterns as deduced from sequences of the complete set of
transfer RNA species in Mycoplasma capricolum. J. Mol. Biol.
209, 37–54.
Andersson, S.G., Kurland, C.G., 1990. Codon preferences in free-living microorganisms. Microbiol. Rev. 54, 198–210.
Andersson, S.G.E., Sharp, P.M., 1996. Codon usage in the Mycobacterium tuberculosis complex. Microbiology 142, 915–925.
Andersson, S.G.E., Zomorodipour, A., Andersson, J.O., Sicheritz-Ponten, T., Alsmark, U.C.M., Podowski, R.M., Naslund, A.K.,
Eriksson, A., Winkler, H.H., Kurland, C.G., 1998. The genome
sequence of Rickettsia prowazekii and the origin of mitochondria.
Nature 396, 133–140.
Bachmann, B.J., 1983. Linkage map of Escherichia coli K-12, Edition
7. Microbiol. Rev. 47, 180–230.
Bennetzen, J.L., Hall, B.D., 1982. Codon selection in yeast. J. Biol.
Chem. 257, 3026–3031.
Bernardi, G., Bernardi, G., 1985. Codon usage and genome composition. J. Mol. Evol. 22, 363–365.
Blattner, F.R., Plunkett III, G., Bloch, C.A., et al., 1997. The complete
genome sequence of Escherichia coli K-12. Science 277, 1453–1462.
Brewer, B.J., 1988. When polymerases collide: replication and the transcriptional organization of the E. coli chromosome. Cell 53,
679–686.
Bult, C.J., White, O., Olsen, G.J., et al., 1996. Complete genome
sequence of the methanogenic Archaeon, Methanococcus jannaschii. Science 273, 1058–1073.
Censini, S., Lange, C., Xiang, Z., Crabtree, J.E., Ghiara, P., Borodovsky, M., Rappuoli, R., Covacci, A., 1996. cag, a pathogenicity
island of Helicobacter pylori, encodes type I-specific and disease-associated virulence factors. Proc. Natl. Acad. Sci. USA 93,
14648–14653.
Cole, S.T., Brosch, R., Parkhill, J., et al., 1998. Deciphering the biology
of Mycobacterium tuberculosis from the complete genome sequence.
Nature 393, 537–544.
Deckert, G., Warren, P.V., Gaasterland, T., et al., 1998. The complete
genome of the hyperthermophilic bacterium Aquifex aeolicus.
Nature 392, 353–358.
Dong, H., Nilsson, L., Kurland, C.G., 1996. Co-variation of tRNA
abundance and codon usage in Escherichia coli at different growth
rates. J. Mol. Biol. 260, 649–663.
Fleischmann, R.D., Adams, M.D., White, O., et al., 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512.
Fraser, C.M., Casjens, S., Huang, W.M., et al., 1997. Genomic
sequence of a Lyme disease spirochaete, Borrelia burgdorferi.
Nature 390, 580–586.
Fraser, C.M., Gocayne, J.D., White, O., et al., 1995. The minimal gene
complement of Mycoplasma genitalium. Science 270, 397–403.
Fraser, C.M., Norris, S.J., Weinstock, G.M., et al., 1998. Complete
genome sequence of Treponema pallidum, the syphilis spirochete.
Science 281, 375–388.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., Pave, A., 1980.
Codon catalog usage and the genome hypothesis. Nucleic Acids
Res. 8, r49–r62.
Gray, M.W., 1992. The endosymbiont hypothesis revisited. Int. Rev.
Cytol. 141, 233–357.
Gouy, M., Gautier, C., 1982. Codon usage in bacteria: correlation
with gene expressivity. Nucleic Acids Res. 10, 7055–7074.
Grosjean, H., Fiers, W., 1982. Preferential codon usage in prokaryotic
genes: the optimal codon-anticodon interaction energy and the
selective codon usage in efficiently expressed genes. Gene 18,
199–209.
Guerdoux-Jamet, P., Henaut, A., Nitschke, P., Risler, J., Danchin, A.,
1997. Using codon usage to predict genes origin: is the Escherichia
coli outer membrane a patchwork of products from different
genomes? DNA Res. 4, 257–265.
Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B., Herrmann,
R., 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449.
Hopwood, D.A., 1997. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465–2497.
Ikemura, T., 1981a. Correlation between the abundance of Escherichia
coli transfer RNAs and the occurrence of the respective codons in
its protein genes. J. Mol. Biol. 146, 1–21.
Ikemura, T., 1981b. Correlation between the abundance of Escherichia
coli transfer RNAs and the occurrence of the respective codons in
its protein genes: a proposal for a synonymous codon choice that
is optimal for the E. coli translational system. J. Mol. Biol. 151,
389–409.
Ikemura, T., 1982. Correlation between the abundance of yeast transfer
RNAs and the occurrence of the respective codons in protein genes:
differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer
RNAs. J. Mol. Biol. 158, 573–597.
Ikemura, T., 1985a. Codon usage and tRNA content in unicellular
and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Ikemura, T., 1985b. Codon usage, tRNA content and rate of synonymous substitution. In: Ohta, T., Aoki, K. (Eds.), Population Genetics and Molecular Evolution. Japan Science Society Press,
pp. 385–406.
Ikemura, T., 1989. Purification of RNA molecules by gel techniques.
Methods Enzymol. 180, 14–25.
Ikemura, T., Ozeki, H., 1977. Gross map location of Escherichia coli
transfer RNA genes. J. Mol. Biol. 117, 419–446.
Ikemura, T., Ozeki, H., 1982. Codon usage and transfer RNA contents:
organism-specific codon-choice patterns in reference to the isoacceptor contents. Cold Spring Harbor Symp. Quant. Biol. 47,
1087–1097.
Kanaya, S., Kudo, Y., Nakamura, Y., Ikemura, T., 1996a. Detection
of genes in Escherichia coli sequences determined by genome projects and prediction of protein production levels, based on multivariate diversity in codon usage. CABIOS 12, 213–225.
Kanaya, S., Kudo, Y., Suzuki, S., Ikemura, T., 1996b. Systematization
of species-specific diversity of genes in codon usage: comparison of
the diversity among bacteria and prediction of the protein production levels in cells. In: Akutsu, T., et al. (Eds.), Genome Informatics Series No. 7. Universal Academy Press, Tokyo, pp. 61–71.
Kaneko, T., Sato, S., Kotani, H., et al., 1996. Sequence analysis of the
genome of the unicellular cyanobacterium Synechocystis sp. strain
PCC6803. II. Sequence determination of the entire genome and
assignment of potential protein-coding regions. DNA Res. 3,
109–136.
Kano, A., Andachi, Y., Ohama, T., Osawa, S., 1991. Novel anticodon
composition of transfer RNAs in Micrococcus luteus, a bacterium
with a high genomic G+C content. Correlation with codon usage.
J. Mol. Biol. 20, 387–401.
Kawarabayasi, Y., Sawada, M., Horikawa, H., et al., 1998. Complete
sequence and gene organization of the genome of a hyper-thermophilic Archaebacterium, Pyrococcus horikoshii OT3. DNA Res. 5,
55–76.
Klenk, H., Clayton, R.A., Tomb, J., et al., 1997. The complete genome
sequence of the hyperthermophilic, sulphate-reducing archaeon
Archaeoglobus fulgidus. Nature 390, 364–370.
Kunisawa, T., 1992. Synonymous codon preferences in bacteriophage
T4: a distinctive use of transfer RNAs from T4 and from its Host
Escherichia coli. J. Theor. Biol. 159, 287–298.
Kunst, F., Ogasawara, N., Moszer, I., et al., 1997. The complete
genome sequence of the Gram-positive bacterium Bacillus subtilis.
Nature 390, 249–256.
Lang, B.F., Burger, G., O’Kelly, C.J., Cedergren, R., Golding, G.B.,
Lemieux, C., Sankoff, D., Turmel, M., Gray, M.W., 1997. An
ancestral mitochondrial DNA resembling a eubacterial genome in
miniature. Nature 387, 493–497.
Lobry, J.R., 1996. Asymmetric substitution patterns in the two DNA
strands of bacteria. Mol. Biol. Evol. 13, 660–665.
Maaløe, O., 1976. Past, present and future trends. In: Kjeldgaard,
N.C., Maaløe, O. (Eds.), Control of Ribosome Synthesis. Alfred
Benzon Symposium IX. Munksgaard, Copenhagen, pp. 15–21.
Macnab, R.M., 1987. Flagella. In: Neidhardt, F.C., Ingraham, J.L., Low,
K.B., Magasanik, B., Schaechter, M., Umbarger, H.E. (Eds.),
Escherichia coli and Salmonella typhimurium, vol. 1. American Society for
Microbiology, pp. 70–83.
Martin, W., Stoebe, B., Goremykin, V., Hansmann, S., Hasegawa, M.,
Kowallik, K.V., 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393, 162–165.
McInerney, J.O., 1998. Replication and transcriptional selection on
codon usage in Borrelia burgdorferi. Proc. Natl. Acad. Sci. USA
95, 10698–10703.
Medigue, C., Rouxel, T., Vigier, P., Henaut, A., Danchin, A., 1991.
Evidence for horizontal gene transfer in Escherichia coli speciation.
J. Mol. Biol. 222, 851–856.
Mewes, H.W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A.,
Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S.G.,
Pfeiffer, F., Zollner, A., 1997. Overview of the yeast genome.
Nature 387, 7–9.
Moriyama, E.N., Powell, J.R., 1997. Codon usage bias and tRNA
abundance in Drosophila. J. Mol. Evol. 45, 514–523.
Muto, A., Osawa, S., 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA
84, 166–169.
Nakamura, Y., Tabata, S., 1997. Codon-anticodon assignment and
detection of codon usage trends in seven microbial genomes. Microbiol. Comp. Genom. 2, 299–312.
Nakayama, K., Kanaya, S., Ohnishi, M., Terawaki, Y., Hayashi, T.,
1999. The complete nucleotide sequence of φCTX, a cytotoxin-converting phage of Pseudomonas aeruginosa: implications for
phage evolution and horizontal gene transfer via bacteriophages.
Mol. Microbiol. 31, 399–419.
Nomura, M., Morgan, E.A., 1977. Genetics of bacterial ribosomes.
Annu. Rev. Genet. 11, 297–347.
Ogasawara, N., Moriya, S., von Meyenburg, K., Hansen, F.G., Yoshikawa, H., 1985. Conservation of genes and their organization in
the chromosomal replication origin region of Bacillus subtilis and
Escherichia coli. EMBO J. 4, 3345–3350.
Olsen, G.J., Woese, C.R., Overbeek, R., 1994. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol.
176, 1–6.
Osawa, S., 1995. Evolution of the Genetic Code. Oxford University
Press, Oxford.
Pace, N.R., 1997. A molecular view of microbial diversity and the
biosphere. Science 276, 734–740.
Percudani, R., Pavesi, A., Ottonello, S., 1997. Transfer RNA gene
redundancy and translational selection in Saccharomyces cerevisiae.
J. Mol. Biol. 268, 322–330.
Pouwels, P.H., Leunissen, J.A.M., 1994. Divergence in codon usage
of Lactobacillus species. Nucleic Acids Res. 22, 929–936.
Sharp, P.M., Li, W., 1987. The codon adaptation index — a measure
of directional synonymous codon usage bias, and its potential
applications. Nucleic Acids Res. 15, 1281–1295.
Sharp, P.M., Matassi, G., 1994. Codon usage and genome evolution.
Curr. Opin. Genet. Dev. 4, 851–860.
Smith, D.R., Doucette-Stamm, L.A., Deloughery, C., et al., 1997.
Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: functional analysis and comparative genomics. J. Bacteriol. 179, 7135–7155.
Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., Steinberg, S., 1998.
Compilation of tRNA sequences and sequences of tRNA genes.
Nucleic Acids Res. 26, 148–153.
Stephens, R.S., Kalman, S., Lammel, C., et al., 1998. Genome sequence
of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282, 754–759.
Stragier, P., Losick, R., 1996. Molecular genetics of sporulation in
Bacillus subtilis. Annu. Rev. Genet. 30, 297–341.
Sueoka, N., 1962. On the genetic basis of variation and heterogeneity
in base composition. Proc. Natl. Acad. Sci. USA 48, 582–592.
Sueoka, N., 1988. Directional mutation pressure and neutral molecular
evolution. Proc. Natl. Acad. Sci. USA 85, 2653–2657.
Tomb, J., White, O., Kerlavage, A.R., et al., 1997. The complete
genome sequence of the gastric pathogen Helicobacter pylori.
Nature 388, 539–547.
Yamao, F., Andachi, Y., Muto, A., Ikemura, T., Osawa, S., 1991.
Levels of tRNAs in bacterial cells as affected by amino acid usage in
proteins. Nucleic Acids Res. 19, 6119–6122.
Yang, D., Oyaizu, Y., Oyaizu, H., Olsen, G.J., Woese, C.R., 1985.
Mitochondrial origins. Proc. Natl. Acad. Sci. USA 82, 4443–4447.
Ziegler, D., Dean, D., 1990. Orientation of genes in Bacillus subtilis
chromosome. Genetics 125, 703–708.