Insights into the evolutionary process of genome degradation
Jan O Andersson* and Siv GE Andersson†
Studies of noncoding and pseudogene sequence diversity,
particularly in Rickettsia, have begun to reveal the basic
principles of genome degradation in microorganisms.
Increasingly, studies of genes and genomes suggest that there
has been an extensive amount of horizontal gene transfer
among microorganisms. As this inflow of genetic material does
not seem generally to have resulted in genome size
expansions, however, degenerative processes must be at the
very least as widespread as horizontal gene transfer. The basic
principles of gene degradation and elimination that are being
explored in Rickettsia are likely to be of major importance for
our understanding of how microbial genomes evolve.
Department of Molecular Evolution, Uppsala University, Box 590,
Biomedical Center, 751 24 Uppsala, Sweden
*e-mail: [email protected]
†e-mail: [email protected]
Current Opinion in Genetics & Development 1999, 9:664–671
0959-437X/99/$ — see front matter © 1999 Elsevier Science Ltd.
All rights reserved.
Abbreviations
Ado-Met  S-Adenosylmethionine
ORF      open reading frame
SFG      spotted fever group
TG       typhus group
Introduction
A large number of obligate intracellular parasites and symbionts – Rickettsia, Chlamydia and Buchnera – have genome
sizes in the 1 Mb range or less [1••–3••,4]. These bacteria have almost certainly evolved from bacteria with larger
genome sizes. Obligate intracellular parasitism has been found in a variety of bacterial phyla, suggesting that
transitions to intracellular environments have occurred a number of times independently in evolution [5•]. Thus, there
seems to be a correlation between intracellular lifestyles, small genome sizes and reductive evolutionary processes.
We expect this similarity in history to have left ‘footprints’ in the genome sequences of modern intracellularly
replicating bacteria.
Indeed, recent work has begun to reveal how genomes of intracellular parasites deteriorate. Fundamental to this
progress has been the publication of complete genome sequence data from two genera of obligate intracellular
parasites: Rickettsia and Chlamydia. The 1.1 Mb genome sequence of Rickettsia prowazekii, a member of the
α-Proteobacteria and the causative agent of epidemic typhus, was published last year [1••]. The 1.0–1.1 Mb genome
sequences of Chlamydia trachomatis and Chlamydia pneumoniae, the causative agents of trachoma and pneumonia,
respectively, were also published during the past year [2••,3••]. A comparative analysis of the gene complements
in R. prowazekii and C. trachomatis has provided examples of reductive convergent evolution associated with the
evolution of metabolic parasitism in response to the intracellular habitat [6•]. For example, both organisms rely
on their host cells for supply of nucleoside monophosphates and seem to have discarded all genes coding for
enzymes involved in de novo purine and pyrimidine biosynthesis. Overall, the relative fraction of genes allocated
to different functional categories is very similar in the genomes of R. prowazekii and C. trachomatis [6•].
The reductive evolutionary processes acting on genomes of intracellular bacteria are likely to also have shaped the
structures of organellar genomes [5•]. As mitochondria and α-proteobacteria are thought to share a common ancestor,
there is also a very close phylogenetic link between mitochondria and Rickettsia [1••,7•,8••]. Needless to say, this
does not mean that mitochondria have evolved from an ancestral bacterium with a genome like that of modern Rickettsia.
More likely, the rickettsial and the mitochondrial genomes have both been reduced in size independently since they
diverged from a common ancestor some 2000 million years ago [5•,7•,8••,9•]. Indeed, comparative studies of protist
mitochondrial genomes suggest that individual genes have been lost many times independently in different lineages
and that the flux of genes from the mitochondrion to the host nucleus is an ongoing process [8••,10].
The chloroplast genomes of non-photosynthetic plants provide a particularly apt model system for studies of
degenerative processes. For example, photosynthesis has
been lost secondarily in Epifagus virginiana, a plant which
lives on the roots of beech trees [11]. Not surprisingly, most
genes for photosynthesis and chlororespiration have been
discarded from this genome [12]. Chloroplast genomes from
this group of plants contain a multitude of pseudogenes
such as the photosynthesis gene rbcL, which has been mutationally destroyed in some lineages, whereas it has either
been retained or completely eliminated in others [12].
Another useful model system for studies of reductive evolutionary processes is provided by the nucleomorph genomes, the vestigial
nuclear remnants of eukaryotic algae that have established
secondary endosymbiotic relationships with marine protists
[13]. The nucleomorph genomes have been reduced in size
to <1 Mb and contain densely packed, co-transcribed genes
which are interrupted by very short but functional introns
[13,14]. Comparative analyses of the residual genes in the
nucleomorph genomes are likely to yield important information about the flux and elimination of genetic information
within and among eukaryotic organisms.
Recent studies on the evolution of pseudogene sequences in Rickettsia have now also started to yield insights into the
principles of degenerative processes in microorganisms. The R. prowazekii genome sequence is unique in this sense: only
76% of the 1.1 Mb genome has a coding function and about a dozen pseudogenes were initially identified [1••]. The
high fraction of noncoding DNA has been speculated to represent remnants of ancient genes that are currently in the
process of being eliminated from the genome. Here, we review recent work on the analysis of pseudogene sequence
variation in the Rickettsia genomes, with additional references to similar studies now being initiated in other species.
Figure 1
A schematic view of gene degradation in Rickettsia. The tree shows the phylogenetic relationship of a subset of species
from the TG and SFG Rickettsia. Thick boxes represent functional genes and thin boxes with a Ψ-sign pseudogenes. The
genes are from left to right: polA, white; hicB, light gray; metK, black; f-orf, gray; dnaE, white. The flanking genes
polA and dnaE are functional and present in all species. The pseudogene status of (a–e), the ancestral species, has
been inferred from the occurrence of pseudogenes in (f–j), the modern Rickettsia species. We hypothesize that (a) in
the common ancestor of the SFG and the TG Rickettsia, all three genes between polA and dnaE were functional and that
the inactivation of the metK gene was triggered by the invention of a transport system for Ado-Met. Early in the branch
leading to the TG Rickettsia, both hicB and the f-orf were (b) inactivated and (c) eliminated. The single internal
termination codon in the metK gene in (f) R. prowazekii is indicative of a recent gene-inactivation event. The spacer
region between metK and dnaE in (f–g) R. prowazekii and R. typhi shows weak sequence similarity to the complete ORF in
(j) R. felis. In the SFG Rickettsia, the three genes were functional at the time when (d) R. felis diverged from the
other two species. However, both the metK and the f-orf genes were inactivated prior to (e) the split between
R. rickettsii and R. montana. A number of deletion/insertion mutations have accumulated in (h–j) the modern lineages of
the SFG Rickettsia. The number of deletions (del), insertions (ins) and internal termination codons (stop) are
indicated. (Data taken from [16••].)
[Figure 1 image: tree of the TG Rickettsia (R. prowazekii, R. typhi) and the SFG Rickettsia (R. rickettsii, R. montana,
R. felis) drawn over the gene order polA, hicB, metK, f-orf, dnaE, with branches annotated by del/ins/stop counts and
the label "Invention of a transport system for Ado-Met?"]
Degradation of the gene coding for S-Adenosylmethionine synthetase…
The first indication that genome degradation is an ongoing process in Rickettsia was obtained from the identification of
an internal termination codon in the metK gene, which codes for S-Adenosylmethionine synthetase [15]. This enzyme
catalyzes the biosynthesis of S-Adenosylmethionine (Ado-Met), an essential co-factor in a variety of very important
cellular processes (e.g. the methylation of DNA sequences). To determine whether the termination codon was an unusual
but conserved feature of a functional gene or if it was the very first sign of gene inactivation, we examined seven
additional Rickettsia species for sequence variation in this region [16••].
The genus Rickettsia can be divided into two major groups: the typhus group (TG), represented by the etiological
agent of epidemic typhus, R. prowazekii, and the spotted fever group (SFG), represented by the etiological agent of
Rocky Mountain spotted fever, Rickettsia rickettsii (Figure 1). Only two TG lineages, R. typhi and
R. prowazekii strain B, had metK genes that comprise complete open reading frames (ORFs). In contrast, the metK
genes in all members of the SFG were found to be disrupted by insertion/deletion mutations and termination
codons (Figure 1; [16••]). These presumably inactivated genes were found to be subjected to increased fixation
rates for substitutions at sites that cause amino acid replacements. The evidence taken together suggests
strongly that metK is a non-functional, neutrally evolving pseudogene in most lineages of Rickettsia [16••].
The inactivation of metK may have been induced by a relaxation of the functional constraints acting on this gene, for
example by the combined utilization of cytosolic and internally produced Ado-Met. To be able to exploit cytosolic
metabolites that are not freely exchangeable over the bacterial cell membrane, however, specific import systems must be
invented. For example, bacteria and eukaryotes are normally impermeable to nucleotides because of the lack of
appropriate transport systems but both Rickettsia and Chlamydia are able to exploit the cytosolic ATP with the help of
a unique transport system for ATP and ADP [1••–3••], which has not yet been found in any other bacteria. By analogy, it
may be speculated that a transport system for Ado-Met was invented prior to the divergence of the TG and the SFG, after
which both of the diverging lineages would have been free to start accumulating nucleotide and frameshift mutations.
It is interesting to note that uptake of Ado-Met has previously been demonstrated in Leishmania as well as in
mitochondria [17,18]. This might be a reflection of a much more general phenomenon: whenever a bacterium makes
a change in its lifestyle and habitat, some genes will become nonessential and thereby act as targets for gene
inactivation events. For example, Lactococcus lactis strains isolated from dairy products have been shown to be
auxotrophs for several amino acids due to frameshifts and nonsense mutations in the corresponding biosynthetic
genes, while strains from nondairy products are prototrophs for the same amino acids [19,20].
… and its downstream gene…
A second pseudogene was found in the region downstream of metK. This region contains a long open reading frame in
Rickettsia felis, which has a nucleotide composition pattern that is characteristic of rickettsial genes but with no
sequence similarities to genes in the public databases [16••]. Remnants of this gene were detected in five additional
members of the SFG. In these species, between 6 and 12 frameshift events were required to recreate ORFs
with the expected codon usage patterns (Figure 1; [16••]). However, as these pseudogenes do not show any sequence
similarities to genes in the public databases, there are no clues as to what the ancestral function of this gene might
have been, or why it is being eliminated from the genome. It may be that its inactivation was an indirect result of a
promoter mutation upstream of metK which simultaneously inactivated both genes [16••].
These inactivated gene sequences serve as a wonderful dataset for studies of neutral sequence evolution in
Rickettsia. A detailed examination of the patterns of changes has shown that deletions are far more common
than insertions, and on the average much larger in size (Figure 2). Whereas the insertions were only 1–2 bp in
size, the deletions ranged in size from 1 bp up to >1000 bp [16••]. Thus, in the long-term the deletions will tend to
override the insertions, which means that once a rickettsial gene has become nonfunctional it will be eliminated from
the genome solely by mutational events.
Genomes and evolution
Figure 2
An illustration of the patterns of changes in pseudogenes in Rickettsia. The relative frequencies (%) have been
plotted against the average sizes (bp) of insertion and deletion mutations in the metK and f-orf pseudogenes.
The figure shows that deletion mutations predominate over insertion mutations, both with respect to occurrences and
average sizes. (Data taken from [16••].)
[Figure 2 image: relative frequency (%) plotted against size in bp for insertion and deletion mutations.]
… and many, many other rickettsial genes!
The genomic regions initially associated with putative
pseudogenes in R. prowazekii have by now been examined
systematically for sequence variation in several other
Rickettsia species. The analysis has shown that seven of the
disrupted genes in R. prowazekii are also defective in one or
more of the other species (JO Andersson, SGE Andersson,
unpublished data). Surprisingly, out of a total of 18 genes
uniquely present in members of the SFG, as many as half
were found to correspond to pseudogenes. As the unique
genes as well as the reconstructed pseudogenes displayed
the characteristic patterns in codon usage, they did not
seem to have been acquired by horizontal transfer. Rather,
the analysis suggested that these genes were present in the
Rickettsia lineage long before the divergence of the two
groups of Rickettsia, implying that their absence from the
TG must be a result of recent gene losses. Indeed, the
pseudogenes in the SFG were occasionally surrounded by
flanking genes the homologs of which were separated by
long intergenic regions in the TG. It is interesting to note
that sequence similarities were detected for some of
these intergenic regions in the TG and the corresponding
genes or pseudogenes in the SFG (JO Andersson,
SGE Andersson, unpublished data). These data support
the notion that some of the noncoding DNA in the
R. prowazekii genome corresponds to genes that have been
so extensively degraded that they are no longer recognizable as genes [1••].
A rough calculation based on ~6% of the R. prowazekii
genome for which sequence data is also available for four
other Rickettsia species suggests that 200–300 genes may
have been lost since the divergence of the TG and SFG
(JO Andersson, SGE Andersson, unpublished data).
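The extrapolation behind such an estimate can be sketched as simple proportional scaling. The sketch below assumes that gene losses are spread evenly along the genome. The number of losses actually observed in the sampled regions is not stated here, so the inputs 12 and 18 are hypothetical values chosen only because they scale to roughly the quoted range.

```python
def extrapolate_gene_losses(observed_losses, sampled_fraction):
    """Scale losses seen in a sampled fraction of a genome to the whole genome.

    Assumes losses are distributed uniformly along the genome, which is an
    illustrative assumption and not a result reported in the text.
    """
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    return observed_losses / sampled_fraction


# Hypothetical counts of lost genes within ~6% of the genome.
low_estimate = extrapolate_gene_losses(12, 0.06)
high_estimate = extrapolate_gene_losses(18, 0.06)
```

If losses are clustered rather than evenly spread, a small sample of this kind could over- or underestimate the genome-wide total considerably.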
Plasmid pseudogenes in Buchnera and
Borrelia burgdorferi
Bacterial pseudogenes have also been identified on naturally occurring plasmids in Buchnera sp. and Borrelia
burgdorferi. Buchnera are obligate endosymbionts of aphids
[21]. The symbiotic relationship is mutualistic; it has not yet
been possible to cultivate the bacteria on artificial media
and the aphids are either sterilized or killed by treatment
with antibiotics [21]. The role of the endosymbionts is to
supply the aphids with essential amino acids. To ensure
that the amino acids are produced efficiently, several
amino acid biosynthetic genes have been amplified on
plasmids [22]. Recently, it has been found that these tandem repeats sometimes contain pseudogene copies that
have accumulated mutations in a seemingly neutral manner, possibly through changes in the exogenous amino acid
supply [23,24].
Additional examples of plasmid pseudogenes have been
found in B. burgdorferi, the etiological agent of Lyme disease. This parasite contains at least 17 different plasmids
[25]. The average coding content for the plasmids, however, is only 71% and putative gene functions could be
assigned for only 16% of the plasmid genes [25]. Even
more surprising, a very large number of the putative genes
were found to contain frameshift and/or internal termination codons [25]. For example, the gene coding for
recombinase/invertase was present as a full-length copy on
one plasmid but as many as seven copies seemed to be in
various stages of degradation on the other plasmids [25].
Although the pseudogene status of these sequences has
yet to be verified by comparative sequence analysis or by
expression studies, it seems likely that the genes with
frameshift and/or internal stop codons have indeed been
inactivated and are no longer under purifying selection.
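A first-pass screen for the two signatures mentioned above, internal termination codons and frameshifts, can be sketched as follows. This is a minimal illustration and not the procedure used in [25]: the function names are hypothetical, the standard bacterial stop codons (TAA, TAG, TGA) are assumed, and a length that is not a multiple of three is used only as a crude hint of a frameshift.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial stop codons (assumed)


def internal_stop_positions(cds):
    """Return 0-based nucleotide offsets of in-frame stop codons that occur
    before the last complete codon of a sequence read in frame 0."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [3 * i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def length_suggests_frameshift(cds):
    """Crude hint only: an intact coding sequence has a length divisible by 3."""
    return len(cds) % 3 != 0


# Toy sequence with a TAG codon in the third codon position.
hits = internal_stop_positions("ATGAAATAGGGCTAA")
```

A screen of this kind only nominates candidates. As noted above, pseudogene status still has to be confirmed by comparative sequence analysis or expression studies.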
The outflows and inflows of DNA sequences
The outflow of DNA sequences by gene inactivation events
can, in principle, be compensated for by a corresponding
inflow of DNA sequences via horizontal transfers. Indeed,
horizontal transfer events in free-living bacteria such as
Escherichia coli, have been suggested to occur much more frequently than was previously thought [26•]. It should be
recalled, however, that the host cell cytoplasm is a very isolated environment, with few opportunities for intracellularly
growing parasites to mix and mingle with other bacteria during their reproductive phase. It is therefore questionable
whether small populations of isolated obligate intracellular
parasites are as prone to horizontal transfers as large populations of free-living bacteria. Phylogenetic studies and
comparative sequence analysis have provided a few examples of putative, ancient horizontal transfers in both Rickettsia
and Chlamydia [1••,2••,26•]. For example, the valyl-tRNA
synthetase and lysyl-tRNA synthetase in R. prowazekii show
a close phylogenetic relationship with the corresponding synthetases in the archaea rather than with their homologs in
bacteria ([1••]; B Canbäck, SGE Andersson, unpublished
data) but a majority of genes display the expected phylogenetic relationships to bacteria (T Sicheritz, SGE Andersson,
unpublished data).
One way of quantifying the relative frequencies of horizontal gene transfers is by estimating the fraction of recently
introduced genes from their atypical codon usage patterns.
Indeed, it was recently inferred from such an analysis that as
much as 18% of the E. coli genome may be of recent foreign
origin [27•]. In striking contrast to the heterogeneity in codon
usage patterns within the E. coli genome, R. prowazekii genes
are extremely homogeneous in their usage of codons [28],
with few, if any, indications of recently introduced genes
([1••]; M Remm, SGE Andersson, unpublished data). An
alternative way of ‘creating’ new DNA sequences is by internal gene duplication events but both the number and sizes of
gene families are much lower in the R. prowazekii genome
than in the genomes of other free-living relatives. Taken
together, the suggestion is that the outflow of DNA
sequences is not compensated for by either externally introduced DNA or by internal gene duplications in R. prowazekii.
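The idea of flagging recently introduced genes by their atypical codon usage can be sketched in a few lines. This is not the method of [27•] or [28]: the distance measure (sum of absolute differences in codon frequencies), the pooled reference, the threshold and all function names are assumptions made for illustration.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons of a coding sequence read in frame 0."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(counts):
    """Convert codon counts to relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(cds, reference_freqs):
    """Sum of absolute frequency differences to the reference (range 0 to 2)."""
    gene_freqs = codon_frequencies(codon_counts(cds))
    codons = set(gene_freqs) | set(reference_freqs)
    return sum(abs(gene_freqs.get(c, 0.0) - reference_freqs.get(c, 0.0))
               for c in codons)


def flag_atypical(genes, threshold):
    """Return names of genes whose codon usage is far from the pooled usage."""
    pooled = Counter()
    for cds in genes.values():
        pooled.update(codon_counts(cds))
    reference = codon_frequencies(pooled)
    return sorted(name for name, cds in genes.items()
                  if usage_distance(cds, reference) > threshold)


# Toy genome: three AT-rich genes and one GC-rich gene.
toy_genes = {
    "g1": "AAAAAATTTAAA",
    "g2": "AAATTTAAATTT",
    "g3": "TTTAAAAAATTT",
    "alien": "GGCGGCCCGGGC",
}
flagged = flag_atypical(toy_genes, threshold=1.0)
```

In a genome with homogeneous codon usage, such as the one described for R. prowazekii, a screen of this type would be expected to flag few genes, whereas a heterogeneous genome would yield many candidates.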
Thus, low rates of gene influx in combination with a mutation bias for deletions will cause a gradual shrinkage in
genome sizes, as expected for obligate intracellular parasites. Furthermore, in organisms with small population
sizes, recurrent bottlenecks and low rates of recombination, even mutations that are slightly deleterious to the
organism may become fixed in the population. This phenomenon, which is known as Muller’s ratchet [29,30], has
been most extensively studied in the genus Buchnera
[31,32•–34•]. Thus, genes may be inactivated and lost
either because they are no longer needed or just by coincidence even though the inactivated genes may be slightly
disadvantageous to the organism. In either case, the lost
gene functions will be difficult or impossible to recover
again in organisms with low rates of gene inflow from other
individuals, strains or species.
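Muller's ratchet can be illustrated with a toy simulation of a small asexual population. The sketch below is not taken from the studies cited above. It assumes no recombination, no back mutation, at most one new deleterious mutation per individual per generation, and multiplicative fitness; all parameter values are hypothetical.

```python
import random


def simulate_ratchet(pop_size, generations, mutation_rate, selection_coeff, seed=0):
    """Track the smallest mutation load in a small asexual population.

    Each generation, parents are sampled in proportion to fitness
    (1 - selection_coeff) ** load, and each offspring gains one new
    deleterious mutation with probability mutation_rate.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each individual
    least_loaded = []
    for _ in range(generations):
        weights = [(1.0 - selection_coeff) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        loads = [k + (1 if rng.random() < mutation_rate else 0) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


# Hypothetical parameters: small population, frequent mildly deleterious mutations.
trace = simulate_ratchet(pop_size=20, generations=200,
                         mutation_rate=0.3, selection_coeff=0.01, seed=1)
```

Because offspring can never carry fewer mutations than their parent in this model, the minimum load in the population can only stay the same or increase. Each increase corresponds to one click of the ratchet, which is why lost functions are hard to recover without gene inflow.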
Intracellular parasites in different stages of
genome degradation
The evolutionary transition to the intracellular environment is likely to have occurred in a series of steps that
successively eliminated most of the initial gene complement. We expect that these degenerative processes are
relatively fast in the early stages and then gradually slow
down as the genome decreases in size. Indeed, the obligate intracellular parasite Mycobacterium leprae has a genome
with a size of 2.8 Mb [35]. This is the largest genome
known for an obligate intracellular parasite and its ancestral genome may have been even larger, possibly as large as
the 4.4 Mb genome of its close relative Mycobacterium tuberculosis [36]. The M. leprae genome seems to be in an early,
rapid phase of degradation, as inferred from both its large
genome size and from the observation that as much as
3.5% of the possible protein coding regions contain multiple frameshift and/or in-frame termination codons [35].
The presence of pseudogenes and a large fraction of noncoding DNA in the R. prowazekii genome suggests that
genes are currently being inactivated at a higher rate than
they are being eliminated ([1••,16••]; JO Andersson,
SGE Andersson, unpublished data). However, once an
equilibrium has been reached, such that the rate of gene
inactivation is significantly lower than the rate at which
genes are being degraded, the coding content should
increase up to a level of ~90%, as seen for a majority of the
bacterial genomes sequenced so far.
Indeed, C. trachomatis has a coding content of 90% and no
identifiable pseudogenes [2••]. This might indicate that
C. trachomatis has already reached the final stage of its
adaptation to the host-cell environment or that the rate of
degradation is much faster in Chlamydia than in Rickettsia.
The finding that C. pneumoniae has 214 protein genes that
are not present in C. trachomatis [3••] suggests that there is
a significant rate of gene turnover also in the Chlamydia
genomes. It is possible, however, that Chlamydia has a
more efficient system for removing nonfunctional genes,
which would make it more difficult to identify pseudogenes at any given time point. Indeed, it is interesting to
note that both metK and spoT/relA, which are present as
pseudogenes in the R. prowazekii genome, have already
been completely eliminated from the C. trachomatis
genome ([1••,2••,16••]; JO Andersson, SGE Andersson,
unpublished data).
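The balance between gene inactivation and elimination discussed in this section can be made concrete with a toy two-compartment model. This is a sketch under strong assumptions, not a model from the cited work: all noncoding DNA is treated as decaying gene remnants, coding DNA is inactivated at a constant per-base rate, remnants are eliminated at a constant per-base rate, and the rate values are hypothetical. Under these assumptions the coding fraction approaches 1 - (inactivation rate / elimination rate) whenever elimination is faster than inactivation.

```python
def coding_fraction_over_time(coding_bp, remnant_bp,
                              inactivation_rate, elimination_rate, steps):
    """Iterate a toy model in which coding DNA becomes remnant DNA
    (pseudogenes and their debris) and remnant DNA is deleted.

    Returns the coding fraction of the genome after each step.
    """
    fractions = []
    for _ in range(steps):
        newly_inactivated = inactivation_rate * coding_bp
        eliminated = elimination_rate * remnant_bp
        coding_bp -= newly_inactivated
        remnant_bp += newly_inactivated - eliminated
        fractions.append(coding_bp / (coding_bp + remnant_bp))
    return fractions


# Hypothetical rates: elimination ten times faster than inactivation.
fast_cleanup = coding_fraction_over_time(1_000_000.0, 0.0, 0.001, 0.01, 3000)
# Hypothetical rates: inactivation closer to the elimination rate.
slow_cleanup = coding_fraction_over_time(1_000_000.0, 0.0, 0.0024, 0.01, 3000)
```

With these illustrative rates the first run settles near a coding fraction of 0.90 and the second near 0.76. The values were chosen to echo the coding contents quoted for C. trachomatis and R. prowazekii and should not be read as estimates of the real rates.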
Genome sequences are only snapshots in
evolutionary time and space!
It is argued increasingly that horizontal transfers occur at
such a high rate that it may not be possible to reconstruct
organismal relationships on the basis of individual gene
sequences [37•,38–40,41•,42•]; but if horizontal transfers
are indeed as common as suggested, the sizes of microbial
genomes would grow indefinitely! As genomes apparently
do not grow in such an uncontrolled fashion, either the
frequencies of horizontal transfers are overestimated or
they are compensated for by an
equally frequent occurrence of degenerative processes.
For simplicity, it can be assumed that the size of a genome
is the net result of the rate at which sequences are being
acquired versus the rate at which sequences are being
eliminated. These depend on the bias for different types
of mutations — horizontal transfers, duplications and deletions — as well as on the strength of any selection on
genome size. Similarly, the coding content of a genome is
determined by the fixation rate for gene inactivation
events and by how long a gene no longer under purifying selection remains in the genome as a pseudogene.
Cleaning up pseudogene sequences solely by random
mutations requires that deletions predominate over insertions, both in frequencies of occurrence and in average
sizes. Indeed, it has been argued that a high rate and large
average size of deletions in Drosophila compared to mammals may explain the lack of pseudogenes in Drosophila, as
well as the differences in genome size between the two lineages [43,44,45••]. Likewise, the sizes and coding contents
of microbial genomes probably reflect the rates and sizes of
horizontal transfers as well as of internal duplication and
deletion events.
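The requirement that deletions outweigh insertions can be expressed as a small calculation of the expected net length change per indel event. The sketch below is illustrative only: the function names are hypothetical, and the example frequencies and mean sizes are assumed values, not measurements from the studies cited above.

```python
def expected_net_change(p_deletion, mean_deletion_bp, mean_insertion_bp):
    """Expected change in sequence length (bp) per indel event.

    p_deletion is the fraction of indel events that are deletions;
    the remainder are insertions. Negative values mean net shrinkage.
    """
    p_insertion = 1.0 - p_deletion
    return p_insertion * mean_insertion_bp - p_deletion * mean_deletion_bp


def events_to_erase(length_bp, p_deletion, mean_deletion_bp, mean_insertion_bp):
    """Rough number of indel events needed to remove a pseudogene of a given
    length, or infinity if there is no net bias towards deletion."""
    net = expected_net_change(p_deletion, mean_deletion_bp, mean_insertion_bp)
    if net >= 0:
        return float("inf")
    return length_bp / -net


# Hypothetical bias: 80% deletions averaging 50 bp, insertions averaging 1.5 bp.
biased = events_to_erase(1191, p_deletion=0.8,
                         mean_deletion_bp=50.0, mean_insertion_bp=1.5)
# No bias: equal frequencies and sizes give no net shrinkage.
unbiased = events_to_erase(1000, p_deletion=0.5,
                           mean_deletion_bp=1.0, mean_insertion_bp=1.0)
```

The second case shows why both conditions matter: without a bias in frequency or size, random indels alone would not be expected to clear a pseudogene from the genome.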
A rigorous phylogenetic study based on a set of 312 orthologous genes from six completely sequenced prokaryotic
genomes has suggested that the transfer of genetic material occurs continuously [39]. Furthermore, the complete
genome sequence of the bacterium Thermotoga maritima
has revealed that as much as 24% of the genes were most
similar to archaeal genes [40]. Many of these were clustered in the genome, which was taken as evidence for
extensive lateral transfer from the Archaea to T. maritima
[40]. Finally, it has been estimated that as much as 18% of
the E. coli genome may be of recent foreign origin [27•].
Indirect evidence for horizontal transfers in E. coli has also
been obtained from the striking differences in genome
sizes of natural isolates, sometimes by as much as 1 Mb
[41•]. Indeed, a sequence analysis of the accessory DNA in
the genomes of different strains of the E. coli reference collection has shown that the strain-specific genes are mostly
genes of exogenous origin [42•]. Thus, lateral gene transfer
seems to be an important mechanism for generating
genomic variants, although the extent to which it occurs in
individual species remains to be determined. This means
that the basic principles of gene inactivation, degradation
and elimination that we have started to glimpse in the
Rickettsia genomes are processes that are probably far more
general than what has been appreciated to date.
Thus, it is important to recognize that the sequence of any
individual genome is only a snapshot in evolutionary time
and space. To really understand the dynamics of genomes,
we need to understand the balance as well as the processes whereby new genes are being acquired and old genes
are being removed. Such information can only be obtained
through vigorous, comparative analyses of closely related
strains and species. Here, the situation is encouraging: the
genomes of several closely related strains and species are
currently under investigation [3••,46••]. This kind of
knowledge will be crucial for how new genome sequence
data is interpreted in general and to evaluate hypotheses of
horizontal gene transfer in particular. Indeed, we are convinced that as scientists begin to inspect genomes from a
comparative, evolutionary perspective, many more examples of degenerative processes will be obtained from a
large variety of different microorganisms.
Conclusions
The cytoplasm of a eukaryotic cell is an extreme growth environment. When a free-living bacterium changes
lifestyle to become an obligate intracellular parasite or symbiont, the genomic consequences are enormous. For
example, the ability to exploit host-cell metabolites will immediately lead to a reduced level of purifying selection
on a large set of the genes involved in small molecule biosynthesis. These genes will disappear from the genome
at a rate set by the balance between the insertion/deletion mutation bias and the strength of selection acting on the
size of the genome. During this process the number of pseudogenes will gradually increase and the coding content
decrease until a new steady state has been reached.
The very first evidence for reductive evolutionary processes acting on the genomes of obligate intracellular parasites
was obtained from the metK pseudogene, which contains an internal termination codon in R. prowazekii and numerous
short insertions and deletions in the SFG Rickettsia. Comparative analyses of several other pseudogenes have
since confirmed that there is a continuous outflow of gene sequences from the Rickettsia genomes. In total, we have
estimated that R. prowazekii may have lost ~200–300 genes since its divergence from the SFG.
The basic principles of gene deterioration upon shifts to intracellular environments may apply to changes of
lifestyles in general. Indeed, it seems likely that long-term shifts to new growth habitats render subsets of
genes nonessential and these will eventually be eliminated. How many pseudogenes can be detected at any
given time-point is largely dependent upon the intrinsic insertion/deletion mutation bias. If insertions and
deletions are rare compared to point mutations, non-functional genes may remain in the genome for a
long period of time.
This process has profound effects on the way in which microbial genomes evolve. The loss of genetic information
may, in principle, be equilibrated by a corresponding level of horizontally transferred genes that are more
beneficial for growth in the new environment. However, the relative rates of gains and losses of genes may vary
substantially in different microbial genomes, which could provide an explanation for the over ten-fold variation in
genome sizes. Unfortunately, single genome sequences provide very few clues about the extent to which genes
are being shuffled into and out of the genome. Resolution of these issues can only be obtained by comparative
sequencing of closely related strains and species. Elegant experiments can then be designed to fully explore the
delicate balance of genome shrinkage and expansion in different microbial lineages.
Acknowledgements
The authors' work is supported by the National Science Research Council, the
Knut and Alice Wallenberg Foundation and the Swedish Foundation for
Strategic Research.
References and recommended reading
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
•• of outstanding interest
1. •• Andersson SGE, Zomorodipour A, Andersson JO, Sicheritz-Ponten T,
Alsmark UCM, Podowski RM, Naslund AK, Eriksson A-S, Winkler HH,
Kurland CG: The genome sequence of Rickettsia prowazekii and
the origin of mitochondria. Nature 1998, 396:133-140.
The complete genome sequence of the obligate intracellular parasite
Rickettsia prowazekii. One of the most remarkable aspects of this genome
is its high non-coding content (24%) and the presence of several pseudogenes. The non-coding DNA is speculated to represent remnants of genes
that are in their final stages of elimination.
2. •• Stephens RS, Kalman S, Lammel C, Fan J, Marathe R, Aravind L,
Mitchell W, Olinger L, Tatusov RL, Zhao Q et al.: Genome sequence
of an obligate intracellular pathogen of humans: Chlamydia
trachomatis. Science 1998, 282:754-759.
The complete genome sequence of the obligate intracellular parasite C. trachomatis. The genome lacks many genes for biosynthetic capabilities but
encodes an intact glycolytic pathway. In addition, it contains genes coding
for a transport system for ATP which enables C. trachomatis to exploit
cytosolic ATP as a source of energy.
3. •• Kalman S, Mitchell W, Marathe R, Lammel C, Fan J, Hyman RW,
Olinger L, Grimwood J, Davis RW, Stephens RS: Comparative
genomes of Chlamydia pneumoniae and C. trachomatis.
Nat Genet 1999, 21:385-389.
The first comparison of two closely related obligate intracellular parasites.
The analysis shows that 214 protein coding sequences are uniquely present
in the larger genome of C. pneumoniae. The unique genes are dispersed
throughout the chromosome.
4. Charles H, Ishikawa H: Physical and genetical map of the genome
of Buchnera, the primary endosymbiont of the pea aphid
Acyrthosiphon pisum. J Mol Evol 1999, 48:142-150.
5. Andersson SGE, Kurland CG: Reductive evolution of resident
genomes. Trends Microbiol 1998, 6:263-268.
Genome evolution of intracellular bacteria resembles the evolution of
organelles in many ways. This review discusses the evolutionary forces acting on genomes that replicate within the cytoplasm of a eukaryotic host cell.
The effects of these reductive forces on genome sizes, architectures and
nucleotide substitution rates are discussed.
6. Zomorodipour A, Andersson SGE: Obligate intracellular parasites:
Rickettsia prowazekii and Chlamydia trachomatis. FEBS Lett
1999, 452:11-15.
This review discusses a comparative analysis of the obligate intracellular parasites R. prowazekii and C. trachomatis. These organisms are not phylogenetically related and it is generally thought that they have adapted to the
intracellular environment independently of each other. Both organisms have
small genome sizes, few biosynthetic genes and similar fractions of genes
allocated to the different functional categories; however, the identity of
genes within the functional categories differs. The most striking difference is
that the C. trachomatis genome has a coding content of 89.5%, whereas the
R. prowazekii genome has a coding content of only 75.4%.
7. Sicheritz-Ponten T, Kurland CG, Andersson SGE: A phylogenetic
analysis of the cytochrome b and cytochrome c oxidase I genes
supports an origin of mitochondria from within the Rickettsiaceae.
Biochim Biophys Acta 1998, 1365:545-551.
This is a phylogenetic study based on cytochrome c oxidase I and cytochrome
b. The analysis reveals a close phylogenetic relationship between mitochondria and α-proteobacteria in general and between mitochondria and the
group of bacteria to which R. prowazekii belongs in particular.
8. •• Gray MW, Burger G, Lang BF: Mitochondrial evolution. Science
1999, 283:1476-1481.
An interesting discussion of mitochondrial origin and evolution. Of special
interest for the purpose of this review is that all sequenced mitochondrial
genomes can be divided into two types: ‘the conserved’ and ‘the derived’.
The implication is that there was a first rapid phase of degradation during
which a majority of the initial genes were lost, resulting in mitochondrial
genomes with similarities to the conserved type of mitochondrial genomes,
such as, for example, those found in protists. In some lineages, a second
phase of degradation occurred, which resulted in additional gene losses,
accelerated mutation rates and non-standard genetic codes. The mammalian
mitochondrial genomes are examples of highly derived genomes.
9. Gray MW: Rickettsia, typhus and the mitochondrial connection.
Nature 1998, 396:109-110.
A ‘News and Views’ piece stressing the striking similarities between
R. prowazekii and modern mitochondria. The loss of genetic information is
most likely a result of convergent reductive evolution, as their common ancestor was almost certainly a free-living microorganism with a larger genome size.
10. Gray MW, Lang BF, Cedergren R, Golding GB, Lemieux C, Sankoff D,
Turmel M, Brossard N, Delage E, Littlejohn TG et al.: Genome
structure and gene content in protist mitochondrial DNAs.
Nucleic Acids Res 1998, 26:865-878.
11. Wolfe KH, Morden CW, Palmer JD: Function and evolution of a
minimal plastid genome from a nonphotosynthetic parasitic plant.
Proc Natl Acad Sci USA 1992, 89:10648-10652.
12. dePamphilis CW, Young ND, Wolfe AD: Evolution of plastid gene
rps2 in a lineage of hemiparasitic and holoparasitic plants: many
losses of photosynthesis and complex patterns of rate variation.
Proc Natl Acad Sci USA 1997, 94:7367-7372.
13. Gilson PR, McFadden GI: The miniaturized nuclear genome of a
eukaryotic endosymbiont contains genes that overlap, genes that
are cotranscribed, and the smallest known spliceosomal introns.
Proc Natl Acad Sci USA 1996, 93:7737-7742.
14. Gilson PR, McFadden GI: Good things in small packages: the tiny
genomes of chlorarachniophyte endosymbionts. Bioessays 1997,
15. Andersson JO, Andersson SGE: Genomic rearrangements during
evolution of the obligate intracellular parasite Rickettsia
prowazekii as inferred from an analysis of 52015 bp nucleotide
sequence. Microbiology 1997, 143:2783-2795.
16. •• Andersson JO, Andersson SGE: Genome degradation is an
ongoing process in Rickettsia. Mol Biol Evol 1999, 16:1178-1191.
The first detailed, comparative analysis of pseudogene sequence evolution
in microorganisms. The analysis shows that genes which have been inactivated by frameshift mutations and/or termination codons in the Rickettsia
genomes have strongly elevated fixation rates for mutations at sites that
cause amino acid replacements, which demonstrates that there is no purifying selection acting on the identified pseudogenes. The analysis also shows
that deletions predominate over insertions in these neutrally evolving
sequences, indicating that an inactivated gene will gradually accumulate
substitutions and short deletions until it is no longer recognizable and/or until
it is totally eliminated.
17. Avila J, Polegre MA: Uptake and metabolism of S-adenosyl-L-methionine by Leishmania mexicana and Leishmania braziliensis
promastigotes. Mol Biochem Parasitol 1993, 58:123-134.
18. Horne DW, Holloway RS, Wagner C: Transport of S-adenosylmethionine in isolated rat liver mitochondria.
Arch Biochem Biophys 1997, 343:201-206.
19. Godon JJ, Delorme C, Bardowski J, Chopin MC, Ehrlich SD,
Renault P: Gene inactivation in Lactococcus lactis: branched-chain
amino acid biosynthesis. J Bacteriol 1993, 175:4383-4390.
20. Delorme C, Godon JJ, Ehrlich SD, Renault P: Gene inactivation in
Lactococcus lactis: histidine biosynthesis. J Bacteriol 1993,
21. Baumann P, Baumann L, Lai C-Y, Rouhbakhsh D, Moran N,
Clark MA: Genetics, physiology, and evolutionary relationships of
the genus Buchnera: intracellular symbionts of aphids.
Annu Rev Microbiol 1995, 49:55-94.
22. Lai CY, Baumann L, Baumann P: Amplification of trpEG: adaptation
of Buchnera aphidicola to an endosymbiotic association with
aphids. Proc Natl Acad Sci USA 1994, 91:3819-3823.
23. Lai CY, Baumann P, Moran N: The endosymbiont (Buchnera sp.) of
the aphid Diuraphis noxia contains plasmids consisting of trpEG
and tandem repeats of trpEG pseudogenes.
Appl Environ Microbiol 1996, 62:332-339.
24. Baumann L, Clark MA, Rouhbakhsh D, Baumann P, Moran NA,
Voegtlin DJ: Endosymbionts (Buchnera) of the aphid Uroleucon
sonchi contain plasmids with trpEG and remnants of trpE
pseudogenes. Curr Microbiol 1997, 35:18-21.
25. Fraser CM, Casjens S, Huang WM, Sutton GG, Clayton R,
Lathigra R, White O, Ketchum KA, Dodson R, Hickey EK et al.:
Genomic sequence of a Lyme disease spirochaete, Borrelia
burgdorferi. Nature 1997, 390:580-586.
26. Wolf YI, Aravind L, Koonin EV: Rickettsiae and Chlamydiae:
evidence of horizontal gene transfer and gene exchange. Trends
Genet 1999, 15:173-175.
This analysis of the genomes of the intracellular parasites R. prowazekii and
C. trachomatis shows that a total of 16 and 26 proteins, respectively, are
most similar to their eukaryotic homologs. The genes coding for these proteins may have been obtained by horizontal transfer. It would be interesting
to examine well-sampled phylogenetic trees based on these proteins to infer
when and from which organisms the putative transfers occurred.
27. Lawrence JG, Ochman H: Molecular archaeology of the
Escherichia coli genome. Proc Natl Acad Sci USA 1998,
This paper discusses frequencies of horizontal transfers in the E. coli
genome. The analysis utilizes parameters such as the G+C contents of the
first and third codon positions, χ² values of codon usage biases and codon adaptation indices to distinguish between ‘native’ E. coli genes and genes which
have been introduced recently from another genome with different base composition and/or codon usage patterns. It is concluded that ~18% of the current E. coli chromosome is of foreign origin and has been introduced recently.
28. Andersson SGE, Sharp PM: Codon usage and base composition in
Rickettsia prowazekii. J Mol Evol 1996, 42:525-536.
29. Muller HJ: The relation of recombination to mutational advance.
Mutat Res 1964, 1:2-9.
30. Felsenstein J: The evolutionary advantage of recombination.
Genetics 1974, 78:737-756.
31. Moran NA: Accelerated evolution and Muller’s rachet in
endosymbiotic bacteria. Proc Natl Acad Sci USA 1996,
32. Brynnel EU, Kurland CG, Moran NA, Andersson SGE: Evolutionary
rates for tuf genes in endosymbionts of aphids. Mol Biol Evol
1998, 15:574-582.
This paper shows that both synonymous and non-synonymous substitution
rates are higher in intracellularly replicating symbionts (Buchnera) than in the
free-living microorganisms (E. coli and S. typhimurium). The intrinsic mutation rates for the two lineages were estimated to be very similar, however,
suggesting that the fixation rates for synonymous and non-synonymous mutations are significantly higher in the endosymbionts. The results are related to
the absence of codon preferences in Buchnera and to the influence of
Muller’s ratchet on small asexual populations.
33. Lambert JD, Moran NA: Deleterious mutations destabilize
ribosomal RNA in endosymbiotic bacteria. Proc Natl Acad Sci USA
1998, 95:4458-4462.
By examining the free energy of the 16S rRNA genes in a number of bacteria, it has been shown that endosymbiotic bacteria, such as Buchnera, have
reduced rRNA stabilities compared to their free-living relatives. The results
suggest that endosymbiotic bacteria may accumulate slightly deleterious
mutations probably as a result of their asexuality and small population sizes.
34. Wernegreen JJ, Moran NA: Evidence for genetic drift in
endosymbionts (Buchnera): analyses of protein-coding genes.
Mol Biol Evol 1999, 16:83-97.
This paper demonstrates that there is either no or only a very weak selection
for codon bias in Buchnera. Furthermore, the authors show that many genes
in Buchnera seem to have accumulated slightly deleterious mutations at
sites that cause amino acid replacements, consistent with a decreased
effectiveness of purifying selection at these sites. The extent to which the
strong codon bias in E. coli and Salmonella typhimurium and the strong
composition bias towards A+T nucleotides in Buchnera may have affected
the results is unclear.
35. Smith DR, Richterich P, Rubenfield M, Rice PW, Butler C, Lee HM,
Kirst S, Gundersen K, Abendschan K, Xu Q et al.: Multiplex
sequencing of 1.5 Mb of the Mycobacterium leprae genome.
Genome Res 1997, 7:802-819.
36. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D,
Gordon SV, Eiglmeier K, Gas S, Barry CE III et al.: Deciphering the
biology of Mycobacterium tuberculosis from the complete
genome sequence. Nature 1998, 393:537-544.
37. Martin W: Mosaic bacterial chromosomes: a challenge en route to
a tree of genomes. Bioessays 1999, 21:99-104.
A commentary on [27•] discussing the profound impact their results will
have if the estimated rates of horizontal transfers in E. coli are correct and if
they can be generalized to all bacterial genomes throughout evolutionary
time. It is argued that bacterial genomes should be viewed as dynamic rather
than static structures in which genes come and go in a continual manner.
38. Doolittle WF: Phylogenetic classification and the universal tree.
Science 1999, 284:2124-2129.
39. Jain R, Rivera MC, Lake JA: Horizontal gene transfer among
genomes: the complexity hypothesis. Proc Natl Acad Sci USA
1999, 96:3801-3806.
40. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH,
Hickey EK, Peterson JD, Nelson WC, Ketchum KA et al.: Evidence
for lateral gene transfer between Archaea and bacteria from
genome sequence of Thermotoga maritima. Nature 1999,
41. Bergthorsson U, Ochman H: Distribution of chromosome length
variation in natural isolates of Escherichia coli. Mol Biol Evol 1998,
A comparative, experimental study which supports the idea that the E. coli
chromosome is a highly dynamic structure with high rates of genetic material inflow and outflow. The genome length variations seen in natural isolates
of E. coli seem to have been generated by multiple changes throughout the
genome. It is argued that the major source of variation is related to horizontal transfer events.
42. Hurtado A, Rodriguez-Valera F: Accessory DNA in the genomes of
representatives of the Escherichia coli reference collection.
J Bacteriol 1999, 181:2548-2554.
Fragments generated by random amplified polymorphic DNA which were not
found in all strains of the E. coli reference collection were analysed. It is
shown that most of this strain-specific DNA has base composition patterns
and sequence similarities which are consistent with an exogenous origin.
43. Petrov DA, Lozovskaya ER, Hartl DL: High intrinsic mutation rate of
DNA loss in Drosophila. Nature 1996, 384:346-349.
44. Petrov DA, Hartl DL: Trash DNA is what gets thrown away: high
rate of DNA loss in Drosophila. Gene 1997, 205:279-289.
45. •• Petrov DA, Hartl DL: High rate of DNA loss in the Drosophila
melanogaster and Drosophila virilis species groups. Mol Biol Evol
1998, 15:293-302.
Non-LTR transposable elements have been used to study patterns of spontaneous mutations in Drosophila. The most remarkable aspect of this paper
is that deletions were found to be much larger and much more frequent than
insertions. It is also shown that deletions in Drosophila are larger and more
frequent than deletions in mammals. The authors have estimated that the
half-life of a pseudogene is 14 million years in Drosophila as compared to
880 million years in mammals. The results may explain the rarity of pseudogenes in Drosophila, as well as the large differences in genome sizes in
46. •• Alm RA, Ling LL, Moir DT, King BL, Brown ED, Doig PC, Smith DR,
Noonan B, Guild BC, deJonge BL et al.: Genomic-sequence
comparison of two unrelated isolates of the human gastric
pathogen Helicobacter pylori. Nature 1999, 397:176-180.
This is the first comparison at the genomic level of two strains of H. pylori.
The overall genomic organization, gene order and coding content of the two
strains are quite similar. The analysis shows that 6–7% of the genes are
uniquely present in each strain, almost half of which are clustered in the
hypervariable region.