Review
TRENDS in Microbiology
Vol.15 No.3
Comparative genomics and the evolution of prokaryotes
Sophie Abby and Vincent Daubin
Université de Lyon, Université Lyon 1, Centre National de la Recherche Scientifique, UMR5558, Laboratoire de Biométrie et
Biologie évolutive, Villeurbanne, F-69622 cedex, France
Although biologists have long recognized the importance
of studying evolution to understand the organization of
living organisms, only with the development of genomics
have evolutionary studies become part of their routine
toolkit. Placing genomes into an evolutionary framework
has proved useful for understanding the functioning of
organisms. It has also substantially increased understanding of the processes by which genomes evolve
and led to a re-evaluation of our representation of the
diversity and the history of life. In this review, we present
some of the most important recent advances and promising leads in the field of microbial evolutionary genomics.
Genomics: a new era for molecular evolution
The science of molecular evolution is in its golden age. The
dawning era of genomics provides invaluable information
for studying the hereditary material in both its micro- and
macro-evolution (see Glossary). Especially in prokaryotes,
where the number of sequenced genomes will soon be
counted in thousands, understanding how contemporary
organisms evolved their range of functions is challenging
and fascinating. Prokaryotes certainly provide an interesting opportunity for studying the mechanisms of evolution: they harbor a previously unsuspected diversity even
within species and populations [1], they are found in small
[2] to very large population sizes [3], they can survive or
prosper in most inhospitable environments from the inside
of eukaryotic cells [2] to hot springs [4] or spaceships [5],
and they frequently acquire and use genetic material from
distantly related organisms.
Corresponding author: Daubin, V. ([email protected]).
Available online 7 February 2007.
A large majority of prokaryotic species have yet to be
sampled but the task of making sense of the exponentially
growing amount of available data is already enormous.
However, it has also become evident that the annotation of
a genome sequence greatly benefits from comparative
genomic analyses. The algorithms used for predicting open
reading frames (ORFs) are essentially based on the search
for start and stop codons along the genome and, although
these algorithms are usually efficient for annotating bacterial genomes, the best way to confirm the functional
status of ORFs is still to study their degree of conservation
among species. Although experimental validation is more
specific in revealing gene functions, its application is much
more complex. In addition, mutants for many evolutionarily
conserved genes have no detectable phenotypes [6], and
would therefore be considered as unimportant based solely
on this approach. Confirmation of functional status by
comparative analysis is particularly crucial for small
ORFs, which can occur by chance [7]. In addition, the
prediction of coding sequences is relatively straightforward, but systematically identifying other functional
regions such as small non-coding (NC) RNAs can be
impractical without the tools of comparative genomics [8].
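As a rough illustration of the start/stop-codon search described above, the following Python sketch scans the forward strand of a sequence for ORFs. It is a minimal sketch and not the algorithm of any particular annotation tool: the function name `find_orfs`, the `min_codons` threshold and the restriction to ATG as the only start codon are assumptions made here for brevity (bacteria also use alternative start codons, and a real gene finder would scan both strands).

```python
# Minimal sketch: locate open reading frames (ORFs) on the forward strand.
# Assumptions (not from the review): ATG is the only start codon and
# min_codons is an arbitrary, user-chosen length threshold.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end) pairs; end is exclusive and includes the stop.

    min_codons counts the codons from the start codon up to, but not
    including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)
```

Lowering `min_codons` rapidly increases the number of hits in arbitrary sequence, which is one way to see why short ORFs in particular need confirmation by conservation across species.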
This review presents recent advances in our understanding of molecular evolution of prokaryotes that arose
from comparative genomic approaches. Comparisons
across genomes from distantly related organisms have
enabled the identification of constraints that determine
the organization of the bacterial chromosome and the identification of mechanisms responsible for their incomparable diversity and adaptability. Evolutionary biologists are
beginning to uncover the processes of gene birth and
death and how newly acquired functions integrate into
an existing complex cellular machinery.
Universal features and diversity of prokaryotic genomes
Comparative studies of genomes have revealed that
bacterial chromosomes are under selective pressures that
have deeply shaped their organization. The processes of
replication, transcription and the regulation of gene
expression all impact how genes are arranged along the
genome. It has long been known that the asymmetric
manner in which the bacterial chromosome is replicated,
with a leading and a lagging strand, is correlated with
many evolutionary features, such as differential mutational bias between the two strands [9] and location of
essential genes [10] (Figure 1). Interestingly, the intensity
of these biases shows a strong phylogenetic inertia, that is,
a good correlation with the tree of species. This could be
related to the nature of the DNA polymerase complex.
Indeed, the group of Firmicutes, which has two different
DNA polymerase α-subunits, exhibits a much stronger bias
than species that have only one DNA polymerase [10]. It
was previously thought that highly expressed genes underwent a selective pressure to be co-oriented with the replication fork to avoid frequent collisions between the DNA
polymerase and RNA polymerase; however, Rocha and
Danchin [11] have shown that this effect is more visible
for essential genes, whether highly expressed or not. Thus,
they proposed that the deleterious effect of the head-on
collisions of polymerases lies more in the production of
truncated transcripts and, consequently, non-functional
proteins than in the disruption of the replication complex.
Glossary
Coalescence: a coalescence event for two lineages is the first event of common
ancestry of the two lineages.
Core genome: set of genes shared by all genomes in a species or taxon.
Macroevolution: any evolutionary change above the level of species.
Microevolution: any evolutionary change under the level of species.
Neo-functionalization: the process by which a gene acquires a new function,
generally after duplication.
ORFan (orphan): gene for which no homolog is found in current databases.
Orthologs: genes with a relationship that arose from a speciation event.
Pan genome: the union of all the genes that can be found in a species.
Paralogy: relationships between two duplicated genes.
Phylogenetic inertia: influence of evolutionary history on the conservation of a
character.
Pseudogene: relic of an ancient functional gene that is no longer functional.
Sub-functionalization: the process by which duplicated genes come to fulfill
complementary functions that were all encoded by ancestral genes.
0966-842X/$ – see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.tim.2007.01.007
Figure 1. Replication constraints on bacterial chromosomal organization. Two kinds of biases are detected along the bacterial chromosome: asymmetries owing to the
existence of a leading and a lagging strand, and biases related to the proximity of the origin and terminus of replication (Ori and Ter). Essential genes are represented by red
arrows and non-essential genes are shown in green. The thickness of an arrow is proportional to the expression rate of the gene it represents. Essential genes are
preferentially located on the leading strand and highly expressed genes, especially those related to transcription and translation, tend to be closer to the origin of replication
in fast-growing bacteria (see main text). The evolutionary rate and the G + C content (gray gradients) are respectively increasing and decreasing with distance to the origin.
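The compositional asymmetry between the leading and lagging strands is often summarized as the GC skew, (G - C)/(G + C), computed in windows along one strand of the chromosome. The sketch below is illustrative only: the function name `gc_skew` and the non-overlapping window scheme are assumptions made here, not the procedure of the studies cited above.

```python
# Minimal sketch: GC skew in consecutive, non-overlapping windows.
# The window size is a hypothetical default; any trailing partial window
# is ignored.
def gc_skew(seq, window=1000):
    """Return a list of (G - C) / (G + C) values, one per full window."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C are reported as 0.0 to avoid division by zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```

In many bacterial genomes the sign of this skew typically changes near the origin and the terminus of replication, so a plot of the windowed values gives a quick, if crude, view of the strand bias.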
The replication of the bacterial chromosome is
orientated from the origin (Ori) to the terminus (Ter).
This is also correlated with several evolutionary features
(Figure 1): in many genomes, genes near the terminus of
replication tend to have lower content in G + C nucleotides
and to exhibit higher rates of evolution [12]. Also, as a
result of the replication process, genes located close to the
Ori can be significantly amplified in dividing cells, in
comparison with those closer to the terminus. Couturier
and Rocha [13] have recently shown that, although doubling time and replication-associated gene dosage are fast-evolving
features, the organization of the genomes of fast-growing bacteria is deeply impacted by the necessity to
overexpress genes related to transcription and translation.
Not only are these genes over-represented in the region of
the origin, but the genomes of fast-dividing bacteria show
evidence that rearrangements that would disrupt this
association are counter-selected [13].
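The size of the replication-associated gene dosage effect can be sketched with the classical steady-state growth model, in which a gene near the origin is present in about 2^(C/τ) times as many copies as a gene near the terminus, where C is the time needed to replicate the chromosome and τ is the doubling time. This model and the names used below (`relative_dosage`, `position`) are assumptions introduced here for illustration; they are not taken from the analysis in [13].

```python
# Minimal sketch of replication-associated gene dosage under the classical
# steady-state growth model (an assumption, not the method of the review).
def relative_dosage(position, replication_time, doubling_time):
    """Copy number of a gene relative to a terminus-proximal gene.

    position: 0.0 at the origin (Ori), 1.0 at the terminus (Ter).
    replication_time and doubling_time must be in the same unit.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 (Ori) and 1 (Ter)")
    if replication_time < 0 or doubling_time <= 0:
        raise ValueError("times must be positive")
    return 2.0 ** (replication_time * (1.0 - position) / doubling_time)
```

With hypothetical values of 40 minutes for replication and 20 minutes for doubling, `relative_dosage(0.0, 40, 20)` gives a fourfold excess of origin-proximal genes, whereas a slow grower whose doubling time far exceeds its replication time shows almost no effect.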
In spite of these common principles of genome
organization, comparative genomics has revealed a previously unexpected degree of diversity among prokaryotic
genomes. One of the most striking examples of this diversity is the comparison of gene contents within and between
species. All forms of life seem to share only a handful of
genes, 60 according to the review by Koonin [14], and
these are mainly dedicated to translation. The genes for
other fundamental functions, such as DNA replication,
transcription or basic metabolism seem to be more sporadically spread in the tree of life. More surprisingly, this
diversity of genome content can be seen at every phylogenetic scale. Lerat et al. [15] have estimated that the core
genome of 13 γ-proteobacteria contains <300 genes and,
although all Escherichia coli and Shigella genomes
sequenced to date have >4000 protein coding genes, these
genomes share <3000 genes [16,17] (Box 1). This variability has raised the question of how to define the genome of
a species, and the concept of ‘pan genome’ was proposed as
the sum of all genes found in a species. Tettelin et al. [18]
have examined the question of how many genomes are
needed to describe fully the pan genome of a species. Their
results showed that in Bacillus anthracis, the pan genome
was found to be fully described with only four genomes (a
probable testimony of the recent emergence of this species);
however, in group B Streptococcus, the pan genome was
‘open’, meaning that the number of new genes contributed
by every new genome sequence was expected to be 30
whatever the number of genomes already present in the
comparison (Box 1). The study of seven strains of E. coli
also suggested an open pan genome for this species, but
with a much more imposing pan genome as each sequenced
genome was reported to contribute >440 genes [17]. These
results suggest that the gene pool available in the
microbial world is far larger than previously thought,
and that the pan genome of a species is typically several
times bigger than its core genome.
Box 1. Core and pan genomes of prokaryotes
The diversity of genome repertoires is illustrated by two complementary concepts: the core genome and the pan genome. The
number of genes per genome varies widely between and within
kingdoms (Figure I). The ranges of genome sizes are given under
kingdom names. The core genome is defined as the set of genes that
is shared by all members of a monophyletic group. It only
represents a minimal estimate of the gene repertoire of the common
ancestor. The small size of core genomes and a comparison with
genome sizes of contemporary organisms suggest that evolution
has repeatedly produced various ways of accomplishing the same
tasks. Estimates of the size of various core genomes are shown in
Figure I at the basis of their group by solid red and blue circles.
Numbers of genes in core genomes are extracted from HOGENOM
database (http://pbil.univ-lyon1.fr/databases/hogenom.html), except for Streptococcus agalactiae [18], Escherichia coli [17] and
Haloquadratum walsbyi [19]. The estimate of the core genome not
only depends on the phylogenetic depth of the group considered
but also on the number of genomes available for comparison
(Bacillales, 14 genomes; Lactobacillales, 10 genomes; γ-proteobacteria, 38 genomes).
The pan genome is defined as the union of all the genes that can be
found in a species. Gray circles represent the pan genomes of three
species of bacteria and one archaeon. Tettelin et al. [18] showed that
the pan genome of the clonal organism Bacillus anthracis can be
described with only four of the eight genomes sequenced to date and
the authors therefore proposed that the pan genome of this species is
closed (solid circle). By contrast, for E. coli and S. agalactiae, the pan
genome is still significantly growing with every new genome
sequenced. Tettelin et al. [18] showed that the number of genes of
the pan genome is far from reaching a plateau, and argued that these
pan genomes were open (dashed circles), which was also confirmed
for E. coli [17]. The pan genome of the archaeon H. walsbyi was
collected and evaluated by a metagenomics approach [19]. Because
only one genome was entirely sequenced for this species, the core
genome is not known but its size is necessarily smaller than the 2800
genes present in the complete genome sequence.
Figure I. Gene repertoires in the tree of life.
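The definitions of core genome (intersection) and pan genome (union) translate directly into set operations. The following is a minimal sketch assuming that genes have already been grouped into families with shared identifiers across strains, which is the hard part in practice; the function names and the toy strain and family labels are hypothetical.

```python
# Minimal sketch: core and pan genome from per-strain sets of gene families.
# Assumes orthologous genes already share one identifier across strains.
def core_and_pan(genomes):
    """genomes: dict mapping strain name -> set of gene-family identifiers."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)  # families present in every strain
    pan = set.union(*gene_sets)          # families present in at least one strain
    return core, pan


def new_genes_per_genome(genomes, order):
    """Count the families first seen when strains are added in the given order."""
    seen = set()
    added = []
    for name in order:
        added.append(len(genomes[name] - seen))
        seen |= genomes[name]
    return added
```

For toy input such as `{"A": {"g1", "g2", "g3"}, "B": {"g1", "g2", "g4"}, "C": {"g1", "g5"}}`, the core is the single family `g1` while the pan genome holds five families. A pan genome behaves as open when the counts returned by `new_genes_per_genome` do not approach zero as strains are added; because the counts depend on the order of addition, a careful analysis would average over many orders.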
Based on the random sequencing of environmental
samples, metagenomics studies have provided the first
glimpse at the tremendous diversity of these genetic
surroundings. In a recent paper, Legault et al. [19]
suggested that the pan genome of the square halophilic
archaeon Haloquadratum walsbyi, analyzed in an environmental genomic assay, was at least twice as big as the
genome of the sequenced strain. Furthermore, most of
these additional sequences exhibited atypical GC content
and were associated with insertion sequence (IS) elements
and phage sequences, suggesting a role for horizontal gene
transfer (HGT) in the maintenance of this accessory gene
pool (see later). Many environments have been analyzed
for their gene content, from the human distal gut, in which
16 novel bacterial phylotypes and 60 uncultured species
within the 151 bacterial phylotypes analyzed were discovered [20], to the Sargasso Sea where more than a billion
nonredundant base pairs were sequenced [21]. Recently,
Edwards et al. [22] analyzed the metagenomes isolated
from two sites of a deep mine and discovered a great
diversity of species and significant differences in metabolic
potential between these neighboring spots. Not only do
these studies support the vision of an outstanding genetic
diversity of microbes, but they also demonstrate the possibility of sequencing unculturable microorganisms and provide elements with which to compare the structure of
ecological systems.
The evolution of gene repertoires
The mechanisms explaining this diversity of gene
repertoires in genomes [i.e. the processes by which genes
are gained and lost (Figure 2)] have been the subject of
numerous studies. Before complete genomes from different
strains of the same species or from closely related species
were available, nonfunctional genes or pseudogenes were
thought to be rare in bacteria. The first reports of a
significant number of pseudogenes were in pathogens
undergoing strong genome reduction such as Rickettsia
prowazekii or Mycobacterium leprae, but free-living bacteria were believed to contain relatively few pseudogenes.
The first release of the E. coli MG1655 genome was
reported to contain only one pseudogene. A recent
approach based on comparisons of closely related genomes
[23,24] has shown that this genome contains 100 genes
that are >80% shorter than their orthologs in other E. coli
strains, and are therefore likely to be pseudogenes. The
most frequent causes of gene disruption are frameshifts
and truncations but some recent pathogens such as Yersinia pestis [25,26] or Shigella flexneri [27,28] exhibit a high
proportion of pseudogenes due to the introduction of IS
elements, probably as a result of relaxed selection pressure
owing to a recent bottleneck in their population size. These
results have shown that pseudogenes are more abundant
than previously thought in bacterial genomes but are
subject to quick elimination once disrupted because only
a small proportion of them are conserved long enough to be
found in several strains.
Figure 2. The dynamics of genome repertoire. Bacterial genomes are dynamic entities that constantly gain (left; blue boxes) and lose genes (right; beige boxes). These
modifications of gene repertoires arise by different mechanisms. First, bacterial genomes can acquire genetic material from other organisms, even distantly related ones.
Horizontal gene transfers are evidenced by different types of approaches that generally identify distinct sets of genes. (i) Analysis of gene composition (GC%, codons)
identifies mostly genes that are rarely found in other species. This generally precludes a confirmation of their foreign status by a phylogenetic analysis. However, a mapping
of gene presence on a phylogenetic tree of complete genomes can confirm that they have been recently acquired in the genome [41]. By contrast, phylogenetic analysis can
reveal HGT for genes that have wider phylogenetic distribution, and these genes only rarely show a striking difference in composition [36]. In this case, HGT can result in the
addition of a completely new gene (i), the replacement of an existing gene (ii) or genetic redundancy (iii) if a homologous gene is already present in the recipient genome.
Genetic redundancy can also arise from gene duplication and only phylogenetic analysis can distinguish between these two origins. Recent analyses have demonstrated
that HGT participates significantly in the degree of redundancy in a bacterial genome [30]. Gene excision and formation of pseudogenes are the mechanisms for gene loss.
Excision occurs when a gene is completely deleted from the genome, and pseudogene formation occurs when mutations (point mutation and/or insertion/deletion)
accumulate, resulting in function loss. The loss of a gene is evidenced by the absence of the gene in the analyzed genome whereas it is present in related species
(phylogenetic mapping). Pseudogenes can be identified by comparisons of closely related genomes [23].
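The comparative approach to pseudogene detection mentioned above rests on a simple idea: a gene that is much shorter than its orthologs in closely related genomes is a candidate for disruption. The sketch below illustrates only that length comparison. The function name, the use of the median ortholog length and the `max_ratio` cutoff are assumptions made here, and the cutoff is deliberately left without a default because the appropriate value depends on the study.

```python
from statistics import median


# Minimal sketch: flag genes that are short relative to their orthologs.
# The threshold and the use of the median are hypothetical choices.
def truncated_candidates(focal_lengths, ortholog_lengths, max_ratio):
    """Return (gene, ratio) pairs with ratio < max_ratio, shortest first.

    focal_lengths: dict gene -> length in the genome under study.
    ortholog_lengths: dict gene -> list of ortholog lengths in related genomes.
    """
    candidates = []
    for gene, length in focal_lengths.items():
        others = ortholog_lengths.get(gene)
        if not others:
            continue  # no orthologs available, so no comparison is possible
        ratio = length / median(others)
        if ratio < max_ratio:
            candidates.append((gene, ratio))
    return sorted(candidates, key=lambda pair: pair[1])
```

A length screen of this kind finds truncations but misses frameshifts that leave the annotated length unchanged, so it would normally be combined with an alignment-based check.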
These recurrent losses of genes and functions must be
compensated by the acquisition of new genetic material. In
eukaryotes, the evolution of new genes is thought to occur
mainly through duplication followed by sub- or neo-functionalization of one or both resulting copies. But prokaryotes can integrate genes of diverse origin into their
genomes through HGT, which is believed to have a crucial
role in speciation and prokaryotic adaptation to new
environments [29]. The question of how much duplication
and HGT are contributing to genetic novelty has thus been
investigated in bacteria. Using maximum likelihood tests
to compare phylogenetic gene trees, Lerat et al. [30] have
shown that a large proportion of the genes that are usually
deemed as duplicates in bacterial genomes are more
likely to be genes that have been acquired by HGT while
they already had a homolog present in the recipient genome. Another study confirmed the dominant role of HGT
over duplication to the evolution of the E. coli metabolic
network [31]. However, the relative role of HGT and duplication might vary significantly among species: recent studies of two large bacterial genomes, Myxococcus xanthus
[32] and Burkholderia xenovorans LB400 [33] (9.14 Mbp
and 9.73 Mbp, respectively), estimated that HGT and
duplication contributed in equal proportions to their gene
repertoires (15–20%). This amount of duplicates is exceptional in bacteria and has been proposed to be correlated
with specific ecological or behavioral needs, such as cellular
communication for the social M. xanthus and the ability to
adapt to different nutrient sources for B. xenovorans.
Interestingly, the contribution of HGT and duplications
to genome content does not only vary among bacterial
groups but also within taxa; for example, other strains
of the species B. xenovorans do not harbor such an amount
of redundancy in their genomes [33].
The number of foreign genes present in the genome of E.
coli was estimated to be >10% before the era of comparative genomics, when Médigue et al. [34] analyzed the codon
composition of about a third of its genes. Later, similar
analyses that searched for genes having atypical features
in a genome revealed that the number of HGT events
varies drastically among species, from zero to >20%
[29,35]. However, confirming the status of these foreign
genes by independent approaches is difficult [36]. It might
be significant that among thousands of HGT detected using
nucleotide compositions, the example chosen by Nakamura
et al. [35] to illustrate a confirmation by phylogenetic
analysis was later pointed out to have been deliberately
introduced in Neisseria meningitidis by genetic modification to reduce virulence [37,38].
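Compositional screens of the kind discussed above flag genes whose sequence features deviate from the genome-wide norm. The sketch below uses only G + C content and a z-score cutoff, which is a deliberate simplification: published analyses also rely on codon usage and other features, and the function names and the `z_cutoff` default are assumptions made here for illustration.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Minimal sketch: flag genes with atypical G + C content.
# z_cutoff is an arbitrary, hypothetical threshold.
def atypical_genes(genes, z_cutoff=2.0):
    """genes: dict name -> nucleotide sequence. Returns sorted flagged names."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    if len(values) < 2:
        return []
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []  # all genes identical in composition, nothing stands out
    return sorted(name for name, v in gc.items() if abs(v - mu) / sd > z_cutoff)
```

As the text stresses, a gene flagged in this way is only a candidate: atypical composition does not by itself establish a foreign origin, and confirmation requires independent evidence such as phylogenetic distribution.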
Nevertheless, most of these genes are probably genuine
HGT, as demonstrated by comparisons of genome content
in a phylogenetic framework [37,39,40]: the distribution of
genes with atypical codon composition on a species phylogeny strongly suggests that most of them are transmitted
horizontally. The origin of these genes can rarely be
confirmed by phylogenetic analyses because most of them
have no known homologs in databases. The fact that HGT
are strongly enriched in these orphan genes or ‘ORFans’
again points at the ‘open’ pan genome and the tremendous
diversity of the available pool of genes. A solution to the
dilemma of this infinite pool of available proteins has been
proposed by Daubin and Ochman [41,42]: many ORFans
show characteristics that are strikingly similar to bacteriophage- and plasmid-specific genes, and could be continuously generated there through their exceptionally high
mutation rates and opportunities for heterologous recombination. Although most of these genes are probably deleterious or useless for the transduced host, evidence exists
that such genes can prove useful for their cellular recipient
and can even become essential and, ultimately, become
incorporated into the core genome of a species [41,42]. This
hypothesis has been tested by searching for homologs of
ORFans in databases of bacteriophage genes [41,43]. In
their recent study, Yin and Fischer [43] found that only a
few genomes show evidence for ORFans having more
homologs in phage genomes than other genes. However,
their study showed that databases of viral genes are
strongly biased toward bacteriophages associated with γ-proteobacteria and Firmicutes, and that ORFans from both
of these groups show significantly higher homology to
bacteriophage than other genes. Although the role of bacteriophage in generating ORFans seems significant in
these groups, a more representative sample of the diversity
of phages would be necessary to generalize this result to
other bacteria.
Although ORFans and uncharacterized genes explain a
significant part of the diversity of gene repertoires in
bacteria, there is also strong evidence that distantly
related bacteria exchange genes and that these transfers
have a key role in the acquisition of new capabilities and
the adaptation to new environments [29,44]. Such genes
are usually less well detected by codon composition
analyses and are more readily found by incongruent trees
or sporadic occurrence in the phylogeny of bacteria [36].
HGT and the evolution of gene networks
One of the important questions raised by HGT is how a
newly acquired gene fits into the complex cellular network
of the recipient organism. Based on an analysis of congruence of gene phylogenies, Jain et al. [45] proposed the
‘complexity hypothesis’ that genes might have different
probabilities of being transferred depending on how many
interacting partners their products have in the cell. Most
notably, genes involved in translation and transcription,
most of them part of protein complexes, were found to show
fewer indications of transfers. More recently, Pal et al.
[31,46] analyzed the metabolic network of E. coli to study
the influence of the metabolic network on the probability of
gene transfers. The success of an HGT was found to depend
on the pathway it affected, with an HGT that intervened in
a peripheral pathway [46] or having physiologically interacting partners already present in the genome [31] being
more likely to be fixed. According to Pal et al. [46], prokaryotic gene networks evolve by continuous addition of peripheral functions that are more directly involved in
interacting with the environment. This view contrasts with
the model proposed by Teichmann and Madan Babu [47]
for the evolution of regulatory networks in which 45% of
the regulatory interactions in E. coli arose by duplication
and inheritance of interaction. However, these views are
not necessarily incompatible because Lerat et al. [30]
showed that many genes traditionally identified as duplicates in bacterial genomes might have arisen from HGT of
a gene that possessed a homolog in the recipient genome, a
possibility not considered in the study by Teichmann and
Madan Babu [47].
Starting from the gene content of the E. coli regulatory
network [48,49], Hershberg and Margalit [50] investigated the conservation of transcription factors (TFs)
and their targets among γ-proteobacteria. They found
that repressors co-occur with their targets, while activators can be lost independently of their targets. This
suggests a differential evolving mechanism to turn off
a regulatory pathway: the loss of TFs is sufficient in
the case of positive regulation whereas in the case of
negatively regulated pathways, the loss of a repressor
can have strong negative effects by constitutively expressing the target function. Madan Babu et al. [51] and
Lozada-Chavez et al. [52] did similar studies at a higher
evolutionary scale – they analyzed 175 (bacterial and
archaeal) and 110 (bacterial) genomes, respectively,
and showed a lower conservation of TFs compared with
their targets.
A limitation of these studies is that the gene networks of
only a few model organisms have been studied experimentally and networks are generally reconstructed by searching for homologous genes in other genomes. However, the
inference of a protein function based on comparative
analysis is not straightforward, especially when divergent
organisms are considered because homologous genes can
encode different functions [53,54]. In these analyses of the
evolution of gene networks, homology of function is generally inferred solely from BLAST searches and the risk of
assigning erroneous functions to proteins can be high.
Because gene histories intermingle evolutionary events
such as duplications, gene transfers, gene losses and speciation, the assessment of the type of homologous relationships among genes is a crucial point in comparative
genomics.
Phylogenomics and the problem of HGT
Comparative genomics has raised the issue of HGT and the
concept of species but it has also provided a large amount of
new phylogenetic markers and stimulated the field of
phylogenetics. Numerous phylogenomic methods were
recently developed to use complete genome data (see
Ref. [55] for a review) and were used to attempt to reconstruct the tree of life [56] or to test for the existence of a
phylogenetic signal.
With the finding of the extent of HGT, the Darwinian
tree-like representation of relationships between species
has been questioned by some authors asserting that HGT
events are so ‘rampant’ that genes cannot be used as
reliable phylogenetic markers. They propose a network
of species [44], arguing that a signal of vertical inheritance
cannot be unraveled from horizontal signals due to HGT.
This idea is hotly controversial because several studies
showed the existence of a predominant signal for an
organismal phylogeny [29,30,57–59]. It was thought that
using a large dataset to reconstruct phylogenetic trees
would ensure that a powerful signal would emerge to
resolve phylogenetic relationships, provided that HGT
had been adequately identified [56,60]. However, a recent
study showed that, even independently of HGT, population
genetics predicts a great deal of incongruence among gene
trees and that combining even large amounts of these data
would not help. Degnan and Rosenberg [61] have simulated the evolution of genes using a coalescence model and
shown that, especially in the case of deep trees, gene trees
can often conflict with species trees even without
exchanges among lineages. This effect would probably be
particularly important in prokaryotes because the time of
coalescence can be greater with larger population sizes.
Therefore, the conflict observed among gene trees might
not be only the result of HGT, phylogenetic artifacts or
hidden paralogy, but also of a genuine vertical descent in
which polymorphic alleles cohabit for a long time in populations. Future attempts to assess the degree of incongruence among gene trees and to reconstruct the tree of life
will have to take into account this possible effect.
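The dependence of gene tree conflict on population size can be made concrete with the simplest textbook case, which is not the simulation design of [61]: for three lineages, the probability that a gene tree differs in topology from the species tree is (2/3)e^(-T), where T is the length of the internal branch in coalescent units. The sketch below assumes a haploid population, so that T is the number of generations divided by the effective population size; the function and parameter names are hypothetical.

```python
import math


# Minimal sketch: three-taxon gene tree discordance under the coalescent.
# Assumes a haploid population with constant effective size and no gene
# exchange among lineages.
def discordance_probability(generations, effective_size):
    """Probability that the gene tree topology differs from the species tree.

    generations: length of the internal species-tree branch, in generations.
    effective_size: effective population size along that branch.
    """
    if generations < 0 or effective_size <= 0:
        raise ValueError("generations must be >= 0 and effective_size > 0")
    t = generations / effective_size  # branch length in coalescent units
    return (2.0 / 3.0) * math.exp(-t)
```

For a fixed number of generations between two speciation events, a larger effective population size gives a shorter branch in coalescent units and therefore a higher chance of conflict, which is consistent with the expectation stated above that the effect matters most for organisms with very large populations.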
Concluding remarks and future perspectives
The amount of data generated by genome projects
stimulates many fields of biology and has a deep impact
on our vision of the evolution and the organization of life.
Bacteria have been found to be far more diverse, complex
and variable than previously suspected but, although their genomes exhibit striking differences in gene content, they
share an organization based on the same principles. The
need to replicate and express their genome simultaneously imposes constraints on gene dosage and
arrangement. How this organization is exploited for new
adaptations in an ever-changing genome constantly
affected by HGT has yet to be understood. However,
comparative genomics has already enabled the identification of some of the mechanisms that determine the
acquisition of new genes and functions, and how they
are integrated into the cellular network. The development of
metagenomics will enable a better description of the
genetic environment of organisms and an understanding
of the possible functional innovations that can arise from
HGT. The role of HGT seems to be crucial, but one should
not conclude that gene exchanges have been so profound as
to preclude the reconstruction of the history of life, in the
sense of understanding how genomes have evolved into what
they are today. More integrative approaches combining information from species phylogenies, gene histories, ecology
and cellular networks will be necessary to tell the chronicles of contemporary genomes. But the comparative
analysis of genomes from the different domains of life, and from both
cellular organisms and viruses, already suggests that this story is a
tale of invasions, exchanges and conflicts turning into
cooperation.
Acknowledgements
We would like to thank Daniel Kahn, Sylvain Mousset, Bastien Boussau
and five anonymous reviewers for their helpful comments on the
manuscript. S.A. is the recipient of a fellowship from the Ministère de
l’Education Nationale de la Recherche et de la Technologie.
Review TRENDS in Microbiology Vol.15 No.3
References
1 Binnewies, T.T. et al. (2006) Ten years of bacterial genome sequencing:
comparative-genomics-based discoveries. Funct. Integr. Genomics 6,
165–185
2 Moran, N.A. (2002) Microbial minimalism: genome reduction in
bacterial pathogens. Cell 108, 583–586
3 Lynch, M. and Conery, J.S. (2003) The origins of genome complexity.
Science 302, 1401–1404
4 Alain, K. et al. (2002) Caminicella sporogenes gen. nov., sp. nov., a novel
thermophilic spore-forming bacterium isolated from an East-Pacific
Rise hydrothermal vent. Int. J. Syst. Evol. Microbiol. 52, 1621–1628
5 Novikova, N. et al. (2006) Survey of environmental biocontamination on
board the International Space Station. Res. Microbiol. 157, 5–12
6 Kobayashi, K. et al. (2003) Essential Bacillus subtilis genes. Proc. Natl.
Acad. Sci. U. S. A. 100, 4678–4683
7 Ochman, H. (2002) Distinguishing the ORFs from the ELFs: short
bacterial genes and the annotation of genomes. Trends Genet. 18,
335–337
8 Vogel, J. and Sharma, C.M. (2005) How to find small non-coding RNAs
in bacteria. Biol. Chem. 386, 1219–1238
9 Lobry, J.R. and Sueoka, N. (2002) Asymmetric directional mutation
pressures in bacteria. Genome Biol. 3, RESEARCH0058
10 Rocha, E.P. (2004) Order and disorder in bacterial genomes. Curr.
Opin. Microbiol. 7, 519–527
11 Rocha, E.P. and Danchin, A. (2003) Gene essentiality determines
chromosome organisation in bacteria. Nucleic Acids Res. 31, 6570–6577
12 Daubin, V. and Perrière, G. (2003) G + C3 structuring along the
genome: a common feature in prokaryotes. Mol. Biol. Evol. 20, 471–483
13 Couturier, E. and Rocha, E.P. (2006) Replication-associated gene
dosage effects shape the genomes of fast-growing bacteria but only
for transcription and translation genes. Mol. Microbiol. 59, 1506–1518
14 Koonin, E.V. (2003) Comparative genomics, minimal gene-sets and the
last universal common ancestor. Nat. Rev. Microbiol. 1, 127–136
15 Lerat, E. et al. (2003) From gene trees to organismal phylogeny in
prokaryotes: the case of the γ-proteobacteria. PLoS Biol. 1, E19
16 Welch, R.A. et al. (2002) Extensive mosaic structure revealed by the
complete genome sequence of uropathogenic Escherichia coli. Proc.
Natl. Acad. Sci. U. S. A. 99, 17020–17024
17 Chen, S.L. et al. (2006) Identification of genes subject to positive
selection in uropathogenic strains of Escherichia coli: a comparative
genomics approach. Proc. Natl. Acad. Sci. U. S. A. 103, 5977–5982
18 Tettelin, H. et al. (2005) Genome analysis of multiple pathogenic
isolates of Streptococcus agalactiae: implications for the microbial
“pan-genome”. Proc. Natl. Acad. Sci. U. S. A. 102, 13950–13955
19 Legault, B.A. et al. (2006) Environmental genomics of “Haloquadratum
walsbyi” in a saltern crystallizer indicates a large pool of accessory
genes in an otherwise coherent species. BMC Genomics 7, 171
20 Gill, S.R. et al. (2006) Metagenomic analysis of the human distal gut
microbiome. Science 312, 1355–1359
21 Venter, J.C. et al. (2004) Environmental genome shotgun sequencing of
the Sargasso Sea. Science 304, 66–74
22 Edwards, R.A. et al. (2006) Using pyrosequencing to shed light on deep
mine microbial ecology. BMC Genomics 7, 57
23 Lerat, E. and Ochman, H. (2005) Recognizing the pseudogenes in
bacterial genomes. Nucleic Acids Res. 33, 3125–3132
24 Lerat, E. and Ochman, H. (2004) Psi-Phi: exploring the outer limits of
bacterial pseudogenes. Genome Res. 14, 2273–2278
25 Deng, W. et al. (2002) Genome sequence of Yersinia pestis KIM.
J. Bacteriol. 184, 4601–4611
26 Parkhill, J. et al. (2001) Genome sequence of Yersinia pestis, the
causative agent of plague. Nature 413, 523–527
27 Jin, Q. et al. (2002) Genome sequence of Shigella flexneri 2a: insights
into pathogenicity through comparison with genomes of Escherichia
coli K12 and O157. Nucleic Acids Res. 30, 4432–4441
28 Wei, J. et al. (2003) Complete genome sequence and comparative
genomics of Shigella flexneri serotype 2a strain 2457T. Infect.
Immun. 71, 2775–2786
29 Ochman, H. et al. (2000) Lateral gene transfer and the nature of
bacterial innovation. Nature 405, 299–304
30 Lerat, E. et al. (2005) Evolutionary origins of genomic repertoires in
bacteria. PLoS Biol. 3, e130
31 Pal, C. et al. (2005) Adaptive evolution of bacterial metabolic networks
by horizontal gene transfer. Nat. Genet. 37, 1372–1375
32 Goldman, B.S. et al. (2006) Evolution of sensory complexity recorded in
a myxobacterial genome. Proc. Natl. Acad. Sci. U. S. A. 103,
15200–15205
33 Chain, P.S. et al. (2006) Burkholderia xenovorans LB400 harbors a
multi-replicon, 9.73-Mbp genome shaped for versatility. Proc. Natl.
Acad. Sci. U. S. A. 103, 15280–15287
34 Médigue, C. et al. (1991) Evidence for horizontal gene transfer in
Escherichia coli speciation. J. Mol. Biol. 222, 851–856
35 Nakamura, Y. et al. (2004) Biased biological functions of horizontally
transferred genes in prokaryotic genomes. Nat. Genet. 36, 760–766
36 Ragan, M.A. (2001) On surrogate methods for detecting lateral gene
transfer. FEMS Microbiol. Lett. 201, 187–191
37 van Passel, M. et al. (2004) Phylogenetic validation of horizontal gene
transfer? Nat. Genet. 36, 1028
38 Tettelin, H. and Parkhill, J. (2004) The use of genome annotation
data and its impact on biological conclusions. Nat. Genet. 36,
1028–1029
39 Daubin, V. et al. (2003) The source of laterally transferred genes in
bacterial genomes. Genome Biol. 4, R57
40 Daubin, V. et al. (2003) Phylogenetics and the cohesion of bacterial
genomes. Science 301, 829–832
41 Daubin, V. and Ochman, H. (2004) Bacterial genomes as new gene
homes: the genealogy of ORFans in E. coli. Genome Res. 14,
1036–1042
42 Daubin, V. and Ochman, H. (2004) Start-up entities in the origin of new
genes. Curr. Opin. Genet. Dev. 14, 616–619
43 Yin, Y. and Fischer, D. (2006) On the origin of microbial ORFans:
quantifying the strength of the evidence for viral lateral transfer. BMC
Evol. Biol. 6, 63
44 Gogarten, J.P. et al. (2002) Prokaryotic evolution in light of gene
transfer. Mol. Biol. Evol. 19, 2226–2238
45 Jain, R. et al. (1999) Horizontal gene transfer among genomes: the
complexity hypothesis. Proc. Natl. Acad. Sci. U. S. A. 96, 3801–3806
46 Pal, C. et al. (2005) Horizontal gene transfer depends on gene content of
the host. Bioinformatics 21 (Suppl 2), ii222–ii223
47 Teichmann, S.A. and Madan Babu, M. (2004) Gene regulatory network
growth by duplication. Nat. Genet. 36, 492–496
48 Shen-Orr, S.S. et al. (2002) Network motifs in the transcriptional
regulation network of Escherichia coli. Nat. Genet. 31, 64–68
49 Salgado, H. et al. (2004) RegulonDB (version 4.0): transcriptional
regulation, operon organization and growth conditions in
Escherichia coli K-12. Nucleic Acids Res. 32, D303–D306
50 Hershberg, R. and Margalit, H. (2006) Co-evolution of transcription
factors and their targets depends on mode of regulation. Genome Biol.
7, R62
51 Madan Babu, M. et al. (2006) Evolutionary dynamics of prokaryotic
transcriptional regulatory networks. J. Mol. Biol. 358, 614–633
52 Lozada-Chavez, I. et al. (2006) Bacterial regulatory networks are
extremely flexible in evolution. Nucleic Acids Res. 34, 3434–3445
53 Eisen, J.A. (1998) Phylogenomics: improving functional predictions for
uncharacterized genes by evolutionary analysis. Genome Res. 8,
163–167
54 Lazareva-Ulitsky, B. et al. (2005) On the quality of tree-based protein
classification. Bioinformatics 21, 1876–1890
55 Delsuc, F. et al. (2005) Phylogenomics and the reconstruction of the tree
of life. Nat. Rev. Genet. 6, 361–375
56 Ciccarelli, F.D. et al. (2006) Toward an automatic reconstruction of a
highly resolved tree of life. Science 311, 1283–1287
57 Ge, F. et al. (2005) The cobweb of life revealed by genome-scale
estimates of horizontal gene transfer. PLoS Biol. 3, e316
58 Beiko, R.G. et al. (2005) Highways of gene sharing in prokaryotes. Proc.
Natl. Acad. Sci. U. S. A. 102, 14332–14337
59 Ochman, H. et al. (2005) Examining bacterial species under the specter
of gene transfer and exchange. Proc. Natl. Acad. Sci. U. S. A. 102,
6595–6599
60 Brown, J.R. et al. (2001) Universal trees based on large combined
protein sequence data sets. Nat. Genet. 28, 281–285
61 Degnan, J.H. and Rosenberg, N.A. (2006) Discordance of species trees
with their most likely gene trees. PLoS Genet. 2, e68