Syst. Biol. 50(4):479–496, 2001
Catalyzing Bacterial Speciation: Correlating Lateral Transfer
with Genetic Headroom
JEFFREY G. LAWRENCE
Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA;
E-mail: [email protected]
Abstract.—Unlike crown eukaryotic species, microbial species are created by continual processes of
gene loss and acquisition promoted by horizontal genetic transfer. The amounts of foreign DNA in
bacterial genomes, and the rate at which this is acquired, are consistent with gene transfer as the primary catalyst for microbial differentiation. However, the rate of successful gene transfer varies among
bacterial lineages. The heterogeneity in foreign DNA content is directly correlated with amount of
genetic headroom intrinsic to a bacterial species. Genetic headroom reflects the amount of potentially dispensable information—reflected in codon usage bias and codon context bias—that can be
transiently sacrificed to allow experimentation with functions introduced by gene transfer. In this
way, genetic headroom offers a potential metric for assessing the propensity of a lineage to speciate.
[Bacteria; codon usage bias; compositional bias; evolution; genome; horizontal transfer; speciation.]
The majority of free-living organisms in
the biosphere are Bacteria and Archaea,
which comprise the two domains of
“prokaryotic” microbes. Among extant organisms, the majority of the genetic variation
at loci encoding rRNA, ATPase, elongation
factors, aminoacyl-tRNA synthases, the protein export translocon, and other universally
distributed and functionally constrained
molecules is found within the prokaryotic
domains (Gogarten et al., 1989; Iwabe
et al., 1989; Woese et al., 1990; Brown et al.,
1997; Gribaldo and Cammarano, 1998). In
addition, the bulk of physiological variation
is found among these microbes as well,
members of which thrive in virtually every
environment examined, including superheated hydrothermal vents, Antarctic ice
floes, salt-saturated alkaline pools, acid hot
springs, distilled water reservoirs, and the
upper atmosphere. Virtually any substance
that can be oxidized, either organic or
inorganic, can be exploited as an energy
source by a microbe. Analyses of reassociation curves for DNA extracted from
soil bacteria—which reflect the diversity
of bacterial species in this sample (Torsvik
et al., 1990)—have led to conservative estimates that Bacteria and Archaea comprise
more than 1,000,000,000 species within the
biosphere (Dykhuizen, 1998) (note that the
term “bacteria” is used to connote both
Bacteria and Archaea).
Over their substantial evolutionary history, microbial taxa have diversified and speciated to exploit an ever-increasing array of
complex environments. Speciation can be defined as the process by which one group of
organisms propagates into two groups of organisms, each distinct from each other and,
in many cases, from their common parental
stock (King, 1993; Vrba, 1985). The creation of
two daughter species from a single parental
species is thought to be correlated with, although not necessarily caused by, the differential partitioning of parental genetic variation in the two descendent groups. This
process allows for phenotypic distinction between incipient species and serves to define
each group of organisms in terms of their
own ecological capabilities. Yet the working
criteria for defining and describing a species
remain delightfully contentious issues, and
many species concepts have been developed
to categorize organisms into groups that reflect some relevant aspects of their biology.
Almost uniformly, mechanisms driving
the speciation process have been postulated
in the context of diploid, obligatorily sexual eukaryotic species. However, Bacterial
and Archaeal cells are typically haploid, bear
single chromosomes (although see Casjens
[1998]), and reproduce by binary fission in
life cycles that lack requisite sexual exchange.
These features preclude the adoption of conventional speciation mechanisms, which rely
on facets of eukaryotic biology that are absent
or irrelevant among Bacterial and Archaeal
taxa. More importantly, most proposed speciation mechanisms fail to include important
aspects of Bacterial evolution that have little or no impact in the evolution of most
eukaryotic species. Broad descriptions that
attempt to address speciation processes both
in Bacteria and in Eukarya are invariably too
vague to be of use in defining speciation in
either class of organisms.
Here I provide a framework for examining speciation in the Bacteria and Archaea, a process that differs fundamentally
from speciation in Eukaryotes. These differences can be attributed to the marked
contrast in reproductive mechanisms, the
frequency and necessity of intraspecific recombination, the frequency and utility of
interspecific recombination, mechanisms of
gene expression, population size, genetic
diversity, and the biochemical foundation
of phenotypic differentiation between these
groups of organisms. As detailed below, the
nature of a Bacterial species provides a context for understanding Bacterial speciation.
THE PROBLEM WITH SPECIES CONCEPTS
The Biological Species Concept
The earliest descriptions of biological
species relied on a “type specimen” concept,
reective of the Platonic idealized form (that
is, all earthly items are projections of their
perfect, heavenly forms), and thereby categorized organisms on the basis of their overall morphological similarity (Linneus, 1742).
However, bacterial cells lack the rich sets
of structural characteristics that allow partitioning of eukaryotic organisms into relevant groupings. As a result, classification and
taxonomy of prokaryotes lagged behind the
description of eukaryotic organisms. These
difficulties were exacerbated by complications in identifying synapomorphic characters, especially when using variable biotypic
markers.
The limitations of taxonomic classification
in rigorously defining the most fundamental
of groups, the species, were somewhat circumvented by the Biological Species Concept
(BSC), which defined a species as a population of organisms that share a common gene
pool (Mayr, 1942, 1954, 1963). This idea is
rooted in the observation that intraspecific
recombination is obligatory upon syngamy
among sexually reproducing eukaryotes; in
those organisms, interspecific recombination
is almost completely absent, in that cross-species syngamy is rarely possible or fruitful. The limitations of the BSC are evident
when considering organisms that do not exhibit obligate genetic exchange. As a result,
factors influencing bacterial speciation remained unexplored, partly because it was not
clear how to define a microbial species.
Yet extensive population genetic surveys
(Hartl and Dykhuizen, 1984; Milkman and
Stoltzfus, 1988; Milkman and Bridges, 1990,
1993; Dykhuizen and Green, 1991; Guttman
and Dykhuizen, 1994a; Milkman, 1997;
Selander et al., 1996) have demonstrated that
intraspecific recombination among strains
of the enteric bacterium Escherichia coli, or
among strains of its sister species Salmonella
enterica, occurs far more often than exchange
between these lineages. Dykhuizen and
Green (1991) proposed that microbial
species could be effectively and empirically
defined as groups within which gene trees
would not be congruent (because of recombination) but between which gene trees would
be congruent. However, the prevalence of
horizontal genetic transfer among Bacterial
and Archaeal taxa (Syvanen and Kado, 1998;
Doolittle, 1999a, 1999b)—and even between
Bacteria and Plants or between Bacteria
and Fungi (Buchanan-Wollaston et al., 1987;
Heinemann and Sprague, 1989; Figge et al.,
1999)—necessitates a more formal definition
of “common gene pool” for prokaryotes,
given that gene transfer across vast phylogenetic distances suggests all organisms share
the same gene pool to some extent. The BSC
could be revised to describe a species as a
group of organisms that mutually exchange
genes substantially more often with each
other than with other organisms. However,
the rates of intraspecific recombination are
tremendously variable among bacterial taxa
(Maynard Smith et al., 1993), rendering a
simple threshold for discriminating between
intraspecific and interspecific recombination
impractical. Thus, additional criteria are
required to distinguish microbial species and
to make inferences regarding their origins.
The Ecological Species Concept
The Ecological Species Concept (ESC) is
practical in describing a species as a group
of organisms that exploit the same ecological niche (Van Valen, 1976). Regardless
of common gene pools, mate recognition
(Paterson, 1985), or evolutionary trajectory
(Wiley, 1978), if two sets of organisms are
ecologically identical—that is, if they are
attempting to occupy the same ecological
niche at the same time in the same place—
stochastic processes will inevitably result in
one group being displaced by the other. Unlike macroscopic eukaryotic taxa, microorganisms are typically not constrained by the
physical or geographic barriers that can allow otherwise identical groups of organisms
to develop ecological differences very slowly,
or not at all. The ESC acts somewhat to move
the task of discriminating between species
to the task of discriminating among distinct,
persistent ecological niches. Yet, parameters
defining ecological niches, such as physiological capabilities and environmental tolerances, can be evaluated empirically in microbial taxa, or can be predicted from genomic
sequence information (for example, the enteric bacterium E. coli grows best at 37°C,
respires to numerous anaerobic electron acceptors, and degrades milk sugar, suiting it
well for life in mammalian guts; in contrast,
the related bacterium Serratia marcescens cannot degrade milk sugar and grows best at
26°C). Moreover, the process of speciation is
readily applied to bacterial taxa when it can
be defined as an organism’s adoption or invasion of a novel ecology.
A Unified Bacterial Species Concept
In practical ways, a bacterial species can
be defined by using the modification of the
BSC detailed above, invoking the ESC as the
arbiter of natural selection. In this fashion, a
bacterial species is defined as a group of organisms that exploit a common ecology and,
as a result, exhibit effective rates of recombination that are greater among members of
the group than with other organisms; that
is, although interspecific recombination may
occur, the resulting recombinants are, on average, less successful because the hybrids do
not effectively exploit either parental environment (see below). Over time, the divergence of nucleotide sequences reduces the
likelihood of homologous DNA exchange between lineages, effectively imposing premating genetic isolation.
This species concept is sufficiently flexible to accommodate the huge variance in the
tempo and mode of recombination among
bacterial species. Moreover, it makes predictions regarding population structure and
functional diversity that are testable in the
laboratory environment. Under this framework, mechanisms of speciation would entail
substantial changes in ecology among
subpopulations of a species; as a result, incipient species would form as the rates of intergroup recombination decreased. These speciation mechanisms are discussed below. The
fine-scale dynamics that mediate the propagation of incipient species, or their demise,
are discussed elsewhere (Cohan, 2001).
FUNCTIONAL ECOLOGY AND METABOLIC DIVERSITY
Metabolism Is Ecology
Because the origin of a distinct bacterial
lineage is contingent on the exploitation of
a novel ecological niche, this process plays
a key role in understanding bacterial speciation. Although a description of an organism’s
ecology includes complex evaluations of environmental tolerances, food sources, habitat
selection, and interactions with other species,
microbial ecologies nonetheless can be described and predicted on a molecular and genetic level. That is, the presence of a gene, and
the implementation of a gene product, can often be equated with ecological capabilities.
This intimacy of microbial physiology
with functional ecology reflects primarily
the size of microorganisms. Bacterial adaptation to the consumption of novel food
sources typically entails the use of novel biochemistries. For example, consumption of
lactose as a carbon source requires the action of a β-galactosidase; expression of this
enzyme indicates an organism has a role
for β-galactoside degradation in its lifestyle,
whereas lack of this enzyme activity precludes exploitation of that resource. The mere
redeployment of existing metabolic pathways, or subtle alteration of morphology, will
not allow a bacterium to hydrolyze an unfamiliar glycosidic bond. Rather, the exploitation of novel environments often requires the
use of novel gene products.
In contrast, consider differentiation
among Darwin’s finches, where food selection can be predicted by the size and shape
of the beak. Although these organisms may
select morphologically different substances
as food, all of them are using the same
underlying physiology and biochemistry to
utilize these substrates (carbohydrates, fats,
and proteins). So, although ecological adaptation among these finches may entail the
consumption of different types of food,
novel biochemistry is not involved. Rather,
one would predict that differential regulation and implementation of developmental
processes would result in the necessary alterations of skeletal, muscular, and digestive
systems to accommodate the novel food
sources.
As expected, the physiological differentiation between bacterial species can be dramatic, with some organisms exhibiting complex characteristics that may be completely
absent from closely related organisms but
shared with distantly related organisms. For
example, the enteric bacterium E. coli is characterized by its ability to ferment lactose
by using the LacZ β-galactosidase, a function found in many bacteria, but absent from
the closely related species Salmonella enterica. Similarly, Salmonella can degrade citrate
and propanediol, synthesize coenzyme B12
de novo, and reduce thiosulfate to H2S—all
characteristics lacking in E. coli. The acquisition of such novel metabolic and physiological capabilities could catalyze the process of
bacterial speciation.
Reinventing the Wheel
How did these novel capabilities arise?
Classic models for the evolution of novel
functions typically invoke gene duplication and divergence. For example, a β-galactosidase may evolve by duplication of
the resident gene, placing one of the genes
under relaxed selection for function. This
duplicated gene would then be free to mutate and to adopt new functions. Yet this
model is somewhat unsatisfying, in that purifying selection cannot prevent the immediate mutation or deletion of the duplicated
gene. Moreover, loss of duplicated genes will
be selectively advantageous if aberrant gene
dosage is problematic, or if mutant forms
confer dominant negative phenotypes.
Models for transient maintenance of duplicate genes have been proposed (Stoltzfus, 1999), but such models seek to prolong
the brief period available for the acquisition
of a selectively advantageous function before one of the copies is inevitably lost. The
persistence of multiple copies of the same
gene—even if both are crippled by mutation
(Stoltzfus, 1999)—is intrinsically unstable,
because mutational processes, intragenomic
gene conversion, or intraspecific recombination will eventually restore one copy, leading
to deletion of the second.
An additional conundrum appears if the
deployment of novel capabilities is important for the invasion of novel ecologies, and
hence speciation. In vitro studies suggest that
change in enzyme specificity would entail
multiple changes, even for subtle alterations
in activity (Matsumura et al., 1999). If a new
environment is competitive, then gradual refinement of novel biochemical characteristics by alteration of endogenous genetic material (by duplication and divergence) will
provide insufficient selective advantages to
promote niche invasion. The immense population sizes of microorganisms make even
mildly deleterious alleles unsuitable for catalyzing niche invasion in competition with
organisms already bearing refined, highly efficient processes for accomplishing similar
tasks.
Inheriting the Wheel
Examination of the phylogenetic relationships among protein sequences shows that
rampant paralogy—the constant reinvention
of specific enzymes from related protein sequences by duplication and divergence—is
not evident among bacteria. Rather, closely
related groups of enzymes typically catalyze
the same biochemical reactions, with the
same substrates and products; enzymes with
different substrate specificities form distinct
groups. This pattern has been evident since
the early 1970s, when analysis of NAD+-binding dehydrogenases revealed clear segregation based on substrate specificity (e.g.,
malate dehydrogenase, lactate dehydrogenase) and demonstrated these enzymes
were distinct from FAD-binding dehydrogenases (Rossman et al., 1974). Analyses
of additional protein families have yielded
similar results. So, although exceptions
have been noted (Wu et al., 1999), genes
encoding lactate dehydrogenases have not
arisen multiple times from parental malate
dehydrogenase (or vice versa), even though
a single mutation can enable this transition
(Wilks et al., 1988; Golding and Dean,
1998). Yet the duplication and divergence
model makes clear and contrary predictions
regarding the assortment of enzymatic
functions among members of a protein
family, in which novel enzymatic functions
(e.g., recognition of a new substrate) are
expected to arise numerous times. According to this model, one would expect that
lactate dehydrogenases would have evolved
multiple times from within the clade of
malate dehydrogenases, and vice versa.
Because analyses of protein families
demonstrate that enzymatic novelties have
arisen very few times, the distribution of
these enzymes among extant organisms—
including both Bacteria and Archaea—must
reect one of two processes. Either genes
encoding all enzymes were present in the
common ancestor of all known life (clearly
a cumbersome and infeasible proposition),
or genes have been mobilized among taxa
after their origin. This second model suggests that rather than reinvent the wheel every time it is needed (by point mutation from
transiently duplicated sequences present in
the same cytoplasm), bacteria can acquire the
wheel from other taxa. That is, horizontal genetic transfer can serve to distribute genes
encoding specific metabolic functions among
diverse bacterial genomes (Lawrence, 1999a;
Lawrence and Roth, 1999). Although adaptation by way of internal genome dynamics does
occur, and has been documented in natural
populations (Sokurenko et al., 1998) and in
the laboratory (Rainey and Travisano, 1998;
Papadopoulos et al., 1999), lateral gene transfer allows fully functional pathways to be acquired and used for efficient exploitation of
novel environments. Below we explore the
feasibility of invoking lateral gene transfer
as the catalyst for Bacterial speciation.
IMPACT OF HORIZONTAL GENE TRANSFER
Although gene acquisition is a powerful
mechanism for gaining new metabolic capabilities, it cannot be responsible for microbial
diversification if it occurs only very rarely.
Therefore, assessing the impact of horizontal gene transfer is tantamount to measuring its rate, which requires two sets of data.
First, the amount of foreign DNA present
in a genome must be assessed. Mere gene-counting cannot establish a rate of gene
transfer, because bacterial genomes are littered with selfish elements—transposons, integrated bacteriophages, and the like—that
are foreign but typically do not contribute
functions that change the ecological character of a species (transposons bearing antibiotic resistance genes [Hall et al., 1999] and
bacteriophages mediating virulence [Waldor,
1998] are notable exceptions; see Campbell
[1981] and Levin and Bergstrom [2000]
regarding the role of accessory elements in
bacterial evolution). Thus, the time of introduction of foreign DNA must also be established. If a gene has persisted in a genome for
a sufficiently long period without accumulating mutations that would abolish its encoded function, then we may infer that it has
evolved under purifying selection.
Identifying Foreign DNA
Three general methods have been used to
identify foreign genes in bacterial genomes
(Ochman and Lawrence, 1996; Ochman et al.,
2000). First, the recent acquisition of a gene
will be reected in its restricted distribution
among sibling species. Such data are readily
collected (albeit tediously if done systematically), but they support only an inference
that a gene may have been acquired. Second, lateral transfer will result in an atypically high degree of similarity between genes
found in otherwise unrelated organisms; for
example, two E. coli genes (f108 and f234) are
90% identical to genes encoding glutathione
transporters in humans and mice. Although
these classes of data can be quite conclusive, detecting foreign genes by this method
clearly depends on the breadth and depth of
the sequence database. Moreover, less striking similarities can result in spurious conclusions of horizontal gene transfer to explain
apparent phylogenetic incongruities, where
convergent evolution, mutational saturation,
long branch attraction, or other processes
may have produced the aberrant phylogenetic pattern.
Lastly, DNA sequences themselves often
provide clues to their ancestry. That is, genes
native to a bacterial genome accumulate
mutations that reect the directional mutation pressures intrinsic to that cytoplasm
(Sueoka, 1962, 1988, 1992, 1993); the resulting mutational biases are reected in the
nucleotide composition, codon usage biases
(Sharp and Li, 1987a, b; Mrazek and Karlin,
1999), and di- and trinucleotide frequencies
(Karlin, 1998; Campbell et al., 1999) within
coding regions. As a result, genes recently
integrated into a bacterial genome will exhibit atypical compositional patterns, having
evolved under a different suite of directional
mutation pressures. Thus, foreign genes
can be identified as those bearing atypical
sequence features that cannot be readily
explained by internal processes (such as an
unusual amino acid composition of the encoded protein) and appear unusual only in
their new genomic context. Hence, encyclopedic phylogenetic comparisons and homology searches are not necessary for the identification of acquired genes, although they
provide clear means for verifying inferences
of gene ancestry. The genes encoding glutathione transporters in E. coli noted above
were first identified as bearing unusual compositional patterns, and their foreign ancestry was verified by their unusual similarity
to mammalian genes.
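The compositional approach described above can be illustrated with a minimal sketch. It assumes that GC content at third codon positions (GC3) is an adequate stand-in for the directional mutation pressure acting on a gene, and that a gene counts as atypical when its GC3 lies more than a chosen number of standard deviations from the genome-wide mean. The function names (gc3, flag_atypical) and the cutoff are hypothetical choices made for illustration; this is not the published procedure, which also draws on codon usage bias and other sequence features.

```python
from statistics import mean, pstdev


def gc3(cds):
    """Fraction of G or C at third codon positions of a coding sequence."""
    third = cds.upper()[2::3]  # every third base, starting at codon position 3
    if not third:
        raise ValueError("coding sequence is shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def flag_atypical(genes, z_cutoff=2.0):
    """Return names of genes whose GC3 deviates strongly from the genome mean.

    genes maps a gene name to its coding sequence. The z-score cutoff is an
    assumed, illustrative threshold, not a published value.
    """
    values = {name: gc3(seq) for name, seq in genes.items()}
    mu = mean(list(values.values()))
    sigma = pstdev(list(values.values()))
    if sigma == 0:
        return []  # every gene has the same composition; nothing stands out
    return [name for name, v in values.items() if abs(v - mu) / sigma > z_cutoff]
```

In a toy genome of six genes with GC3 = 0.5 and one gene with GC3 = 0, only the last gene exceeds the default cutoff. A gene flagged this way is a candidate only; as the text notes, an unusual amino acid composition can produce the same signal, so phylogenetic comparison remains the means of verification.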
Analyses of subsets of genes from the
closely related enteric bacteria E. coli and
Salmonella enterica led to predictions that between 8% and 15% of their genomes was introduced by horizontal processes (Medigue
et al., 1991; Whittam and Ake, 1992; Ochman
and Lawrence, 1996; Lawrence and Ochman,
1997). These estimates were congruent with
the amount of unique DNA predicted from
alignments of their genetic maps (Riley and
Anilonis, 1978). More recent analyses based
on complete genome sequences yielded similar estimates for the E. coli genome (Lawrence
and Ochman, 1998). Similarly high values
have been inferred for the genomes of Aquifex
aeolicus (Aravind et al., 1998) and Thermotoga maritima (Nelson et al., 1999), for each of
which large fractions of the genome have
been inferred to be derived from the Archaea, although alternative interpretations
have been offered (Logsdon and Fuguy,
1999).
Rate of Lateral Gene Transfer Is Substantial
The same sequence features that enable identification of horizontally transferred
genes within a genome also allow an estimation of their time of arrival (Lawrence
and Ochman, 1997). Immediately after transfer, acquired genes will naturally resemble
the genes of their donor genome, reflecting that particular set of directional mutation pressures. Over time, acquired genes
will accumulate mutations that reflect the directional mutation pressures of their recipient genome and will ameliorate to resemble
native genes. The degree to which amelioration has progressed provides an estimate
of the time the acquired DNA has persisted
in the new host genome. Because all bacterial genomes display certain properties along
predictable and quantifiable continua—such
as nucleotide composition at codon positions (Muto and Osawa, 1987)—these measurements provide baselines against which
ameliorating genes can be compared. Genes
that conform to these relationships are not
in the process of amelioration, being either
long-term residents of a genome or having been acquired very recently. In contrast,
foreign genes ameliorating to a new set of
directional mutation pressures will deviate
from these relationships until they converge
on the patterns exhibited by their new host
genome.
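The logic of amelioration can be sketched as a first-order relaxation. This is a deliberate simplification and not the model of Lawrence and Ochman (1997): it assumes that the GC content of an acquired gene decays exponentially from its donor value toward the host equilibrium at a single constant rate, and that the donor value is known. The names ameliorate and time_since_transfer, and the parameter rate_per_my, are hypothetical.

```python
import math


def ameliorate(gc_donor, gc_host, rate_per_my, elapsed_my):
    """GC content of an acquired gene after elapsed_my million years.

    Assumes simple exponential relaxation toward the host equilibrium.
    """
    return gc_host + (gc_donor - gc_host) * math.exp(-rate_per_my * elapsed_my)


def time_since_transfer(gc_now, gc_donor, gc_host, rate_per_my):
    """Invert the relaxation curve to estimate time since acquisition (My)."""
    if gc_donor == gc_host:
        raise ValueError("donor and host compositions are identical")
    remaining = (gc_now - gc_host) / (gc_donor - gc_host)
    if not 0 < remaining <= 1:
        # The gene is either fully ameliorated or lies outside the
        # donor-to-host range, so no time estimate is possible.
        raise ValueError("gene is not between donor and host composition")
    return -math.log(remaining) / rate_per_my
```

The error branch mirrors the point made in the text: a gene that already matches the host composition carries no timing information, whether it is a long-term resident or fully ameliorated.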
Lawrence and Ochman (1997, 1998) identified atypical genes in the E. coli genome
and estimated that 18% of the protein-coding
sequences were atypical and probably had
been introduced by lateral genetic transfer. Quantification of the amelioration times
of these genes established a rate of horizontal transfer of 16 kb per million years
(My) (Lawrence and Ochman, 1998), implying that 1.6 Mb of the E. coli DNA had been
acquired by lateral transfer processes since its
divergence 100 My ago from its sister lineage,
Salmonella enterica (Fig. 1). Although the bulk
of these sequences have had only transient
persistence times, >750 of the 4,288 protein-coding genes in the extant E. coli genome are
readily identified as having been introduced
over the past several hundred million years.
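The arithmetic behind these figures is simple enough to check directly. The sketch below only restates numbers quoted in the text (16 kb per My, 100 My, 750 of 4,288 genes); the function name acquired_dna_mb is a hypothetical helper, and a constant transfer rate over the whole interval is assumed.

```python
def acquired_dna_mb(rate_kb_per_my, elapsed_my):
    """Total DNA introduced (in Mb), assuming a constant rate of transfer."""
    return rate_kb_per_my * elapsed_my / 1000.0  # 1000 kb per Mb


# 16 kb/My sustained over the 100 My since divergence from Salmonella enterica
total_mb = acquired_dna_mb(16, 100)

# Lower bound on the fraction of extant genes identified as foreign
foreign_gene_fraction = 750 / 4288
```

Here total_mb evaluates to 1.6, matching the 1.6 Mb stated above, and foreign_gene_fraction is about 0.175. That lower bound on gene count is broadly in line with the estimate that 18% of protein-coding sequence is atypical, although the two quantities are not measured in the same units.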
The identification of foreign genes by
atypical compositional patterns is suited
only for detecting relatively recent acquisitions, which are those most likely to mediate recent speciation events. After a sufficiently long time, acquired genes will be
fully ameliorated to their recipient genome—
having experienced the mutational biases of
their new host genomes—and will not be
identified as atypical, or foreign. Therefore,
ancient lateral transfer events cannot be identified by this method. Reconstructing ancient lateral transfer events would require
the application of traditional phylogenetic
methods to detect incongruities in sets of
gene trees. Because such inferences are readily confounded by convergent evolutionary
processes, phylogenetic incongruency alone
does not demand invocation of horizontal
processes. Although the rates of lateral gene
transfer may be substantial, and can serve to
confound phylogenetic inference (Doolittle,
1999a, b), analyses of complete genome sequences suggest that rampant recent horizontal gene transfer has not completely
obliterated phylogenetic signal from microbial genomes, where the bulk of chromosomal genes are phylogenetically distributed
in a manner consistent with the rDNA phylogeny (Huynen and Bork, 1998; Snel et al.,
1999).
FIGURE 1. Evolution of bacterial genomes by genomic flux. The old genes from the ancestral chromosome are
lost, while the new genes are acquired by horizontal processes. Flux data are shown for the evolution of the E. coli
genome, taken from Lawrence and Ochman (1998). Acquired DNA is depicted by solid arrows and arcs; ancestral
DNA is depicted as open arrows and arcs.
Horizontal Events and Point Mutations
Confer Different Classes of Changes
Point mutations will almost always produce modest, incremental changes in the
performance of encoded functions. In contrast, horizontal processes can, in a single
step, dramatically extend the cell’s repertoire of metabolic capabilities (Lawrence,
1997, 1999a; Lawrence and Roth, 1998, 1999).
The rate of horizontal gene transfer calculated above, 16 kb/My, is comparable to
the amount of variant information introduced into the E. coli genome by mutational
processes (Drake et al., 1998). However, although mutational processes introduce an
important number of changes into a bacterial
genome, most of these changes are effectively
neutral. In contrast, the information introduced by lateral gene transfer may allow for
immediate and effective exploitation of new
resources. In this way, horizontal processes
have most likely catalyzed the diversification
of enteric bacteria such as E. coli. Among enteric bacteria, all functions that can be used to
discriminate among closely related taxa can
be attributed to genes gained by horizontal
processes or lost by deletion. No functions
that differentiate between these organisms
can be attributed to gene products ancestral to these species that mediate different
processes.
The horizontal transfer of a single gene into
a naive genome will not be successful—that
is, the gene will not persist over time—if its
product does not confer a selectable function.
In many cases, multiple genes are required
for the implementation of a useful function,
such as degradation of a compound for energy or the biosynthesis of a cofactor or other
metabolite. Acquisition of novel functions is
facilitated by the organization of bacterial
genes into operons (clusters of cotranscribed
genes, whose products often contribute to a
single function), which offer highly promiscuous packages of genetic material that can,
in horizontal transmission, confer complex
metabolic capabilities to recipient taxa. In
contrast to the changes conferred by point
mutations, horizontal processes may deliver multiple functions simultaneously (for
example, both an enzyme required for degradation of a new food source and a high-affinity transporter allowing acquisition of
the new food source) in the form of the bacterial operon. Therefore, operons circumvent
the necessity for the ineffectual intermediate stages implicit in the evolution of complex, novel capabilities by point mutational
processes.
Although operons represent convenient
parcels for the mobilization of functions
among organisms, how did they arise? The
Selfish Operon Model proposes that lateral
gene transfer has catalyzed the assembly of
genes into operons by promoting facile transfer to naive genomes (Lawrence and Roth,
1996; Lawrence, 1997, 1999b, 2000). That is,
the physical clustering of genes allows all
information required for implementation of
a selectable function to be cotransferred. The
cluster improves the fitness of a gene by allowing the gene to exploit both vertical and
horizontal inheritance. Because the organization of genes into operons does not immediately benefit the host, it may be considered a
selfish property of the constituent genes. And
because cotranscription of genes allows their
efficient expression in foreign hosts from a
promoter at the site of integration, the coalescence of clustered genes into operons can
also be considered a selfish process.
As predicted by the Selfish Operon Model,
operons in E. coli and Salmonella enterica typically encode nonessential metabolic functions; in many cases (cob, pdu, lac, phn, tct),
operons have clearly been obtained by lateral gene transfer (Lawrence and Roth, 1996).
In contrast, genes less likely to have been
subject to lateral gene transfer—for example, essential genes found in all potential
recipients—are rarely found in operons (notable exceptions to this, such as operons of
ribosomal proteins, are discussed elsewhere
[Lawrence and Roth, 1996]). Moreover, the
inconsistency of operon organization across
genes reflects their constant assembly and
breakdown (Itoh et al., 1999). Hence, the
organization of genes into operons reflects
the history of gene transfer among bacteria and their role in catalyzing bacterial
diversification.
Coupling DNA Acquisition and DNA Loss
Although the rate of horizontal transfer
in E. coli has been substantial, and potentially useful information has been delivered
by way of selfish operons, clearly bacterial genomes are not growing ever larger in
size (Bergthorsson and Ochman, 1995, 1998;
Ochman and Bergthorsson, 1995, 1998). And
far from enabling every possible biochemical
function, the physiology of an individual
bacterium reflects the synergism of a specific, definable subset of metabolic capabilities. What limits bacterial genome size?
A finite population of individuals cannot maintain an infinite amount of information free from mutation. If mutation rates
(μ) are nonzero, individuals will accumulate mutations over time. If the effective population size (Ne) is finite, some mutant
individuals will succeed in reproducing, and
their progeny will not be eliminated from the
population. Whereas intraspecific recombination (r) can recreate individuals free from
deleterious mutations (Muller, 1932), limited recombination, as seen in many bacteria (Dykhuizen and Green, 1991; Maynard
Smith et al., 1993; Guttman and Dykhuizen,
1994a, b), will allow the fixation of deleterious mutations, including those that eliminate potentially useful genes. These factors
all influence the maximum amount of genomic information (G) that can be maintained (Lawrence and Roth, 1999). We can express this relationship as follows, where the
genome size is expressed in terms of functions (f,
g, h) of the recombination rate, effective population size,
and mutation rate, respectively:
$$G \propto \frac{f(r)\,g(N_e)}{h(\mu)} \qquad (1)$$
The amount of genomic information that
can be maintained under purifying selection
must decrease as mutation rate increases, recombination rate decreases, or population
size decreases. As a result, genome size cannot increase indefinitely.
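A minimal numerical sketch of Equation 1 may make the direction of each dependence concrete. The functional forms are assumptions made only for illustration: the text leaves f, g, and h unspecified, so each is taken here to be the identity, and the parameter values are arbitrary.

```python
# Illustrative sketch of Equation 1: G is proportional to f(r) * g(Ne) / h(mu).
# The identity forms used for f, g, and h are assumptions, not part of the model.

def information_capacity(r, n_e, mu, f=None, g=None, h=None):
    """Return a relative (unitless) value for the maintainable information G."""
    identity = lambda x: x
    f = f or identity
    g = g or identity
    h = h or identity
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return f(r) * g(n_e) / h(mu)

# Arbitrary baseline values, chosen only to compare directions of change.
baseline = information_capacity(r=1e-9, n_e=1e8, mu=1e-10)
higher_mutation = information_capacity(r=1e-9, n_e=1e8, mu=2e-10)
lower_recombination = information_capacity(r=5e-10, n_e=1e8, mu=1e-10)
smaller_population = information_capacity(r=1e-9, n_e=5e7, mu=1e-10)

for label, value in [
    ("higher mutation rate", higher_mutation),
    ("lower recombination rate", lower_recombination),
    ("smaller population", smaller_population),
]:
    print(f"{label}: G changes by a factor of {value / baseline:.2f}")
```

Any monotonically increasing choice of f, g, and h gives the same qualitative result: each of the three changes lowers G.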
An integrated model of bacterial genome
evolution would offset chronic gene acquisition by horizontal transfer with gene loss
by deletion (Fig. 1). Empirical evidence supports this limitation to genome size. Despite
high rates of horizontal genetic transfer into
enteric bacterial species, the genomes of E.
coli, Salmonella enterica, and related organisms are notably uniform in size (Bergthorsson and Ochman, 1995, 1998). For example,
although bacterial genomes vary in size from
~500 to >10,000 kb, the genome sizes of natural variants of E. coli vary far less, measuring
4,968 ± 253 kb (Bergthorsson and Ochman,
1998), despite an influx of 16 kb/MY over the
past 100 MY. Although absolute genome size
cannot be equated to the amount of information maintained in a genome (because not
all base pairs actually carry information; see
below), these results demonstrate that gene
gain is indeed offset by gene loss in E. coli.
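The arithmetic behind this argument can be checked directly. The sketch below uses only the figures quoted above (16 kb/MY, 100 MY, 4,968 ± 253 kb); the variable names are illustrative.

```python
# Compare cumulative DNA influx with the observed spread in E. coli genome size.
# All four constants are the values quoted in the text.
INFLUX_KB_PER_MY = 16
ELAPSED_MY = 100
MEAN_GENOME_KB = 4968
SD_GENOME_KB = 253

def cumulative_influx_kb(rate_kb_per_my, elapsed_my):
    """Total DNA introduced if nothing were lost."""
    return rate_kb_per_my * elapsed_my

gained_kb = cumulative_influx_kb(INFLUX_KB_PER_MY, ELAPSED_MY)
fraction_of_genome = gained_kb / MEAN_GENOME_KB
influx_in_sd_units = gained_kb / SD_GENOME_KB

print(f"Cumulative influx: {gained_kb} kb")
print(f"Fraction of mean genome size: {fraction_of_genome:.2f}")
print(f"Influx relative to the observed spread: {influx_in_sd_units:.1f}x")
```

Without compensating loss, the cumulative influx would amount to roughly a third of the present genome and several times the observed variation among natural isolates, which is why the uniformity of genome size implies loss.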
Comparisons among enteric bacteria have
revealed many cases in which gene products have been lost from certain lineages
while being maintained in other lineages.
Genes that confer selectable functions in
one ecological context may fail to provide a
benefit to the cell in another environmental
context; such genes would be subject to loss
by mutation and genetic drift. For example, the phoA gene, encoding alkaline phosphatase, has been lost from the Salmonella lineage but is maintained in the genomes of virtually all other enteric bacteria (DuBose and
Hartl, 1990). In addition, genes may be lost
because their functions may interfere with
the adoption of a novel ecological role. For
example, the surface protease OmpT probably was lost from pathogenic Shigella because its function interferes with virulence
(Nakata et al., 1993). Similarly, Shigella lost
the cadA gene because its product, lysine
decarboxylase, also diminishes virulence
(Maurelli et al., 1998).
HETEROGENEITY IN HORIZONTAL GENE TRANSFER
Lateral Gene Transfer Varies Among Lineages
Can analyses of the E. coli genome
be extended to form a general model of
bacterial innovation and diversification catalyzed by horizontal genetic transfer? Bacterial genomes vary dramatically in the proportions of foreign DNA they harbor (Fig. 2).
Yet the amount of foreign DNA within a
genome is not entirely predictable from the
genome size. For example, among organisms
with large genomes, the pathogen Mycobacterium tuberculosis harbors few foreign genes,
the mammalian commensal E. coli contains
many more acquired genes, and the soil bacterium Bacillus subtilis contains an intermediate amount. These results suggest that the
rate of lateral gene transfer derived for the E.
coli genome is not immediately applicable to
other organisms.
The variation in the amount of acquired
DNA evident in microbial genomes may be
attributable to several sources. First, organisms may differ in their exposure to foreign
DNA. For example, the intracellular lifestyle
of Rickettsia or Mycoplasma may reduce the
opportunity for DNA introduction into their
cytoplasm. Second, methods of DNA integration into the chromosome may differ. Acquired DNA in E. coli is strongly associated
with chromosomally located mobile genetic
elements that probably mediated its integration (Lawrence and Ochman, 1998); this association is also evident in the Synechocystis, Archaeoglobus, and Helicobacter genomes
(Ochman et al., 2000). The smaller amount
of acquired DNA in some organisms may reflect a dearth of mechanisms for allowing integration of foreign DNA once it has been
introduced into the cytoplasm.
FIGURE 2. Horizontally transferred DNA present in bacterial genomes, after Ochman et al. (2000). Grey bars
denote protein-coding sequences native to the bacterial genome (present for at least 100 MY); black bars denote
atypical genes probably acquired recently by horizontal transfer. Atypical genes were identified by the methods of
Lawrence and Ochman (1997, 1998).
Beyond these mechanical constraints,
natural selection is the final arbiter of gene
exchange, and newly acquired genes must
provide a beneficial function for them to
persist. Yet bacterial populations can maintain only a finite amount of information in
their genomes (see above). Even if newly acquired genes provide a potentially beneficial
function, they must confer a sufficiently
strong selective advantage to allow displacement of (that is, the loss of purifying selection on) existing information in the cell. Simply stated, the acquisition of new information necessitates the loss of existing information. Therefore, the rate of gene acquisition
should be inversely correlated with the quality of the information it must confer for the
acquired genes to persist. If an acquired gene
must provide a strong advantage to displace
existing information, the rate of acquisition
will be low, for few genes would make such
a contribution to organismal fitness. Alternatively, if an acquired gene need provide only
a modest benefit to displace existing information, the apparent rate of acquisition will
be commensurately higher.
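The inverse relationship described here can be illustrated with a toy calculation. It is not drawn from the paper: it assumes, purely for illustration, that the benefits conferred by incoming genes are exponentially distributed, and that a gene persists only if its benefit exceeds the advantage needed to displace existing information. All names and values are hypothetical.

```python
import math

def retained_fraction(required_advantage, mean_benefit):
    """Fraction of incoming genes whose benefit exceeds the required advantage,
    under the assumed exponential distribution of benefits."""
    if mean_benefit <= 0:
        raise ValueError("mean benefit must be positive")
    return math.exp(-required_advantage / mean_benefit)

def effective_acquisition_rate(influx, required_advantage, mean_benefit):
    """Raw influx scaled by the fraction of genes that can persist."""
    return influx * retained_fraction(required_advantage, mean_benefit)

# Hypothetical raw influx (arbitrary units) and mean benefit.
RAW_INFLUX = 100.0
MEAN_BENEFIT = 0.01

# A low threshold stands in for a lineage that can cheaply displace information;
# a high threshold stands in for one that cannot.
low_threshold_rate = effective_acquisition_rate(RAW_INFLUX, 0.005, MEAN_BENEFIT)
high_threshold_rate = effective_acquisition_rate(RAW_INFLUX, 0.05, MEAN_BENEFIT)

print(f"modest advantage required: {low_threshold_rate:.1f}")
print(f"strong advantage required: {high_threshold_rate:.1f}")
```

Under any benefit distribution with a decreasing tail, raising the required advantage lowers the apparent acquisition rate even when exposure to foreign DNA is unchanged.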
Classes of Genomic Information
How does variation in information content
lead to variation in the effective rate of horizontal transfer among genomes? Although
one may infer that total genomic information should increase linearly with genome
size, this is not always the case. Efficiency
in operon organization aside, some information does scale linearly with gene number:
signals that allow for its appropriate expression, including transcription initiation and
termination signals; sequences that regulate
promoter activity and, within protein-coding
regions, signals for translation initiation and
termination; and the suite of requisite features that allow the gene product to perform
its function (e.g., appropriate transmembrane domains, export signal sequences, ligand binding domains, catalytic centers, and
other critical features) encoded in nonsynonymous sites. Strong purifying selection
maintains these classes of information, and
is reflected in their high selection coefficients
(Fig. 3).
Additional classes of genomic information are not reflected directly in the presence
or composition of gene products. Nonrandom codon usage demonstrates that some
genomes maintain a substantial amount of
information that influences the expression,
not the composition, of protein products
(Sharp and Li, 1986, 1987a; Sharp et al., 1995).
Not simply the result of mutational biases,
codon usage bias reflects the intervention of
natural selection in preventing the accumulation of certain synonymous substitutions
among highly expressed genes. The degree
of codon usage bias is proportional to the expression level of a gene and is inversely correlated to its synonymous substitution rate
(Sharp and Li, 1987b). For example, AUA
(Ile), UUR (Leu), and AGR (Arg) codons are
underrepresented among highly expressed
genes in E. coli, because their cognate tRNAs
are rare; similarly, the preference of NAC
codons over NAU codons for tyrosine, histidine, asparagine, and aspartate reflects the
differential binding of queuosine-bearing
tRNAs to each pair of codons. In genes with
codon usage bias, synonymous mutations introducing nonpreferred codons are counterselected, even though the protein product is
not affected.
In addition, codon context bias
(Borodovskii et al., 1988) reflects problematic juxtaposition of tRNAs within
the ribosome, which is strongly avoided
(Lawrence, unpubl. results). For example, although GAA glutamate codons are favored
on average 4:1 over GAG among highly
expressed genes in E. coli, GAA is favored by
an extraordinary 53:1 when followed by
guanosine (GAGG is highly avoided), yet
barely favored by 1.65:1 when followed by
a cytosine (GAGC is modestly avoided)
(Lawrence, unpubl. results). Because these
classes of information do not affect the
composition of gene products, only their
expression, one may infer that the purifying
selection maintaining this information is
reflected in smaller selection coefficients
(Fig. 3). That is, nonsynonymous substitutions are expected to impart a more dramatic
average phenotype than are synonymous
substitutions.
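Context-dependent ratios of the kind quoted above (GAA versus GAG, split by the following nucleotide) can be tallied along the following lines. This is a sketch rather than the procedure behind the unpublished results; the function names and the two toy sequences are hypothetical.

```python
from collections import Counter

def codon_context_counts(coding_sequences, codons=("GAA", "GAG")):
    """Count in-frame occurrences of each codon, keyed by (codon, next nucleotide)."""
    counts = Counter()
    for cds in coding_sequences:
        seq = cds.upper()
        # Stop three bases early so that a following nucleotide always exists.
        for i in range(0, len(seq) - 3, 3):
            codon = seq[i:i + 3]
            if codon in codons:
                counts[(codon, seq[i + 3])] += 1
    return counts

def preference_ratio(counts, preferred, alternative, next_base):
    """Ratio of preferred to alternative codon usage before a given nucleotide."""
    numerator = counts[(preferred, next_base)]
    denominator = counts[(alternative, next_base)]
    return float("inf") if denominator == 0 else numerator / denominator

# Two made-up reading frames, used only to exercise the functions.
toy_genes = ["ATGGAAGGCGAAGCTTAA", "ATGGAGCTGGAAGGTTAA"]
toy_counts = codon_context_counts(toy_genes)
for base in "GC":
    ratio = preference_ratio(toy_counts, "GAA", "GAG", base)
    print(f"GAA:GAG before {base}: {ratio}")
```

Applied to the highly expressed genes of a real genome, the same tally would yield the context-specific ratios; the toy input is far too small to be meaningful.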
The information maintained in genomic
codon usage and codon context biases is
qualitatively different from the other classes
of information discussed above. Rather than
providing additional genes—and therefore
additional metabolic capabilities—to the cell,
additional information reflected in codon biases refines the expression of existing genes.
That is, genomes maintaining large amounts
of codon usage bias have invested information in fine-tuning the efficient expression of
a particular suite of genes; they did not invest
this information in the maintenance of additional genes. Because the amount of genomic
information that can be maintained is finite,
this pattern demonstrates a trade-off in how
genomic information is apportioned.
FIGURE 3. Types of information present in microbial genomes. Curves are plotted along arbitrary axes, the
ordinate depicting a gradient of selection coefficients, the abscissa depicting the amount of information maintained.
The areas under the arbitrary curves represent the total genomic information for each class of sites. Information
representing “genetic headroom” is indicated with the gray bar.
Genetic Headroom
Given that genomes are limited in their information capacity, acquisition of additional
information by way of laterally transferred
sequences must be offset by information loss.
A genome cannot maintain both the full complement of ancestral information and the information contained in the newly acquired
genes. Which information is maintained and
which information is discarded? Clearly the
information that maximally augments the fitness of the organism (that associated with
the highest selection coefficients) will be
maintained. Genetic headroom can be defined as information that bears very low
selection coefficients (exhibited as codon
usage bias, codon context bias, and other
sites that do not contribute directly to
metabolic capabilities) and can be removed
from purifying selection without altering the
metabolic capabilities of the organism.
Organisms with large genetic headroom
can explore novel ecologies with impunity,
because the information that is transiently
sacrificed does not handicap the organism
with respect to its metabolic capabilities. That
is, maintenance of the additional metabolic
capabilities encoded by newly acquired
genes is offset by accumulation of mutations that affect codon usage bias and
codon context bias (but not primary amino
acid sequences), the class of information
bearing the lowest selection coefficients
(Fig. 3). If the niche is successful, ancestral
genes may be discarded (e.g., the Shigella
cadA and ompT genes) as the organism adapts
to a new ecological role. Alternatively, if the
new niche is not successful, the acquired
genes will be discarded. In either case, ancestral physiology has been maintained during
the exploratory phase (Fig. 4).
In organisms with little genetic headroom, experimentation with novel ecologies
upon acquisition of additional information
in novel genes cannot be offset by allowing accumulation of synonymous substitutions. Rather, the maintenance of additional
information is offset by loss of protein-coding
sequences or other classes of information
bearing high selection coefficients (Fig. 3).
Without genetic headroom, an organism may
compromise ancestral physiology to pursue
novel ecological routes (Fig. 4). If the lineage is not successful, it cannot return to its
ancestral state. In this way, the magnitude
of genetic headroom can promote microbial
diversification by allowing purifying selection to be reapportioned between counterselecting synonymous substitutions in native
genes and nonsynonymous substitutions in
acquired genes.
FIGURE 4. Models for the reapportionment of genomic information upon acquisition of genes by horizontal
transfer. On the left, organisms with high genetic headroom can offset information gain by transient accumulation
of synonymous substitutions. On the right, organisms with little genetic headroom offset the gain of information
with loss of ancestral coding sequences.
This model predicts that organisms with
greater genetic headroom can maintain
greater numbers of genes acquired by lateral
gene transfer. Because codon usage bias can
be assessed within individual genomes and
does not require extensive comparisons with
homologues for calculation of relative rates
of evolution, quantitation of codon usage
bias offers a cogent vehicle for measuring
overall genomic information. Thus, the
magnitude of codon usage bias may be an
accurate predictor of the rate of effective
horizontal transfer. As expected, the overall
extent of codon usage bias, here quantitated
as the average normalized χ² of codon usage,
is a good predictor of the amount of horizontally transferred DNA in a microbial genome
(Fig. 5); this correlation between average χ²
of codon usage and percentage atypical DNA
is independent of genome size. Organisms
capable of maintaining laterally transferred
genes presumably would be more prone to
speciation, because acquired genes would
promote the exploitation of novel environments. Thus, genetic headroom can be
viewed as a predictor of the potential of a
lineage to diversify and to proliferate.
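The codon usage measure referred to here is described only briefly (in the Figure 5 caption), so the following is one plausible reading rather than the published procedure. It assumes that the expected count of each codon is the product of the gene's own nucleotide frequencies at codon positions 1, 2, and 3, and that χ² is divided by the gene length L in codons. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def normalized_chi2(cds):
    """Length-normalized chi-square of codon usage for one coding sequence.

    Expected codon counts come from position-specific nucleotide frequencies
    computed from the same gene (an assumption; a genome-wide composition
    could be substituted). Codons containing non-ACGT symbols are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence contains no complete codon")
    observed = Counter(codons)
    position_freq = []
    for pos in range(3):
        base_counts = Counter(codon[pos] for codon in codons)
        position_freq.append({base: base_counts[base] / n for base in "ACGT"})
    chi2 = 0.0
    for x, y, z in product("ACGT", repeat=3):
        expected = n * position_freq[0][x] * position_freq[1][y] * position_freq[2][z]
        if expected > 0:
            chi2 += (observed[x + y + z] - expected) ** 2 / expected
    return chi2 / n

def average_codon_bias(coding_sequences):
    """Mean of the per-gene normalized chi-square values."""
    values = [normalized_chi2(cds) for cds in coding_sequences]
    return sum(values) / len(values)

# Toy inputs: a gene using one codon only, and a gene using two codons equally.
print(normalized_chi2("GCT" * 10))
print(normalized_chi2("GCTGCTAAAAAA"))
```

Plotting `average_codon_bias` for each genome against its percentage of atypical DNA would reproduce the kind of comparison the text describes, subject to the assumptions above.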
LIMITATIONS ON HORIZONTAL GENE TRANSFER
Shuffling the Deck
Horizontal transfer mediates a combinatorial process whereby organisms can be assembled in a stepwise fashion from both
existing functions and those acquired from
other organisms. Yet organisms are clearly
constrained in the paths available for realistic diversification. For example, the ancestor of E. coli and Salmonella enterica was
likely a commensal inhabitant of the differentiated, lower intestinal tract. E. coli evolved
as a commensal of mammalian gut environments, optimized to grow at 37°C and able to
degrade milk sugars. Salmonella still inhabits
a variety of gastrointestinal tracts, including
those of reptiles and birds, but has adopted a
pathogenic lifestyle as well. Although horizontal gene transfer was instrumental in
catalyzing the evolution of each of these
lineages, the new ecological roles of these
species were not dramatic leaps from the ancestral ecology. That is, a facultatively anaerobic, mammalian commensal did not evolve
from a photosynthetic cyanobacterium. Similarly, we would not expect that the immediate descendents of an E. coli lineage would
include carnivorous, social predators like the
Myxobacteria. Therefore, although horizontal gene transfer can introduce functions capable of catalyzing bacterial speciation, the
potential ecologies exploited by species-to-be must be reasonably accessible.
FIGURE 5. Correlation between the amount of foreign DNA in a bacterial genome and the amount of genetic headroom. Data points represent complete genome
analyses for the organisms listed in Figure 2. Average codon usage bias was calculated from length (L)-normalized χ² values for each gene by using codon-position-specific nucleotide compositions as expected
values. Atypical DNA was calculated as described
(Lawrence and Ochman, 1997, 1998).
Inventing the Cards
Although horizontal genetic transfer can
reshuffle metabolic capabilities among organisms, allowing for novel combinations of
capabilities to define incipient species, gene
transfer is unlikely to result directly in the
creation of truly novel metabolic capabilities. That is, although a current organism
is likely to gain the ability to degrade β-galactosides by the acquisition of another organism’s β-galactosidase (as did E. coli in obtaining the lacZYA operon), the enzyme must
have evolved at some point by mutational
processes. However, gene transfer can facilitate the evolution of novel functions by allowing for gene duplication and divergence
of function—the classic model for the evolution of novel genes—to occur in different
cytoplasms. That is, the difficulties inherent
in asking for a duplicated gene to be maintained until a fortuitous beneficial mutation
occurs are circumvented if novel functionality evolves in separate cytoplasms; the genes
may be reunited in the same cytoplasm by intraspecific or interspecific recombination after the evolution of distinct functions.
Role of Intraspecific Recombination
In contrast to lateral, interspecific gene
transfer, intraspecific gene transfer is unlikely to catalyze speciation. When genes
are mobilized among conspecific strains, homologous recombination facilitates their integration into the chromosome. This process
has been demonstrated to reassort alleles in
E. coli (Milkman and Stoltzfus, 1988; Milkman and Bridges, 1990, 1993; Dykhuizen and
Green, 1991; Guttman and Dykhuizen, 1994a;
Selander et al., 1996; Milkman, 1997) and
other organisms. Novel functionality is not
introduced directly, but intraspecific recombination will allow propagation of acquired
genes throughout a species. As organisms diverge, homologous recombination becomes
less efficient (Zawadzki et al., 1995; Majewski
et al., 2000), as mismatch correction systems prevent the formation of heteroduplex
strands (Vulic et al., 1997, 1999; Majewski and
Cohan, 1998, 1999). Therefore, homologous
recombination becomes less efficient at transferring traits across species boundaries.
This caveat does not diminish in any way
the role of intraspecific recombination in distributing potentially beneficial alleles within
species boundaries. On the contrary, it has
been estimated that the rate of recombination among E. coli strains is on the order of
the mutation rate (Guttman and Dykhuizen,
1994a, b) and is as much as 10 times the mutation rate in recombinogenic species such
as Streptococcus (Feil et al., 2000); as a result,
a single nucleotide may be 50 times more
likely to change through recombination than
through mutational processes (Guttman and
Dykhuizen, 1994a; Guttman, 1997; Feil et al.,
2000). Hence, intraspecific recombination is
a formidable force behind periodic selection
events that would allow beneficial genes, including those introduced by horizontal genetic transfer, to rise to high frequency in bacterial populations. The population dynamics
mediating the dispersal of beneficial genes
and, hence, speciation, are discussed elsewhere (Cohan, 2001).
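The step from "recombination occurs about as often as mutation" to "a nucleotide is 50 times more likely to change by recombination" follows because each recombination event replaces a tract of many nucleotides. The sketch below shows the arithmetic. The tract length and donor divergence are assumptions chosen only to show how a 50-fold figure can arise; they are not values taken from the cited studies.

```python
def relative_change_by_recombination(event_rate_ratio, tract_length_bp, donor_divergence):
    """Per-nucleotide chance of change by recombination relative to mutation.

    event_rate_ratio: recombination initiations per site per mutation per site
    tract_length_bp: mean length of a replaced segment (assumed)
    donor_divergence: fraction of sites differing between donor and recipient (assumed)
    """
    return event_rate_ratio * tract_length_bp * donor_divergence

# Equal event rates, with a hypothetical 5-kb tract and 1% divergence.
ratio = relative_change_by_recombination(
    event_rate_ratio=1.0,
    tract_length_bp=5000,
    donor_divergence=0.01,
)
print(f"Nucleotide change by recombination relative to mutation: {ratio:.0f}x")
```

Other combinations of tract length and divergence give the same product; the point is only that equal event rates do not imply equal per-nucleotide effects.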
PUSH AND PULL IN THE SPECIATION PROCESS
Pull Speciation
In standard models of speciation in eukaryotes (Vrba, 1985; King, 1993), two events
are crucial for successful speciation to occur. First, gene flow must be reduced between the nascent species (reproductive isolation) to avoid coalescence of the lineages
into a single species. Second, if they are to
coexist, the organisms eventually must play
markedly different ecological roles to avoid
direct competition. Although many mechanisms have been proposed that allow for reproductive isolation, models for the evolution of phenotypic differentiation between
the two diverging lineages share a common
feature: Natural variation found in a parental
population is apportioned differently among
daughter populations. For example, differential response to selection may allow for differential utilization along a resource gradient (Fig. 6, left); these differences then allow
for simultaneous (for sympatric speciation)
or eventual (for allopatric speciation) coexistence of the two taxa.
Intrinsically, this separation of ecological
roles is a gradual process, in which incipient
species-to-be progressively alter their phenotypic character by selection for naturally
arising variants present in the parental population. As the lineages are diversifying, the
average phenotype in at least one daughter population is being slightly altered relative to that in the parental population. In
this way, daughter populations are “pulled”
away from each other and into new niches
by the action of natural selection on variant
traits. Because daughter populations are effectively enriched for variants found in the
parental populations, incipient species initially exhibit characteristics that are effectively subsets of those seen in their parental
populations. That is, parental populations
always contain the seeds of the incipient
species, and the vagaries of population dynamics dictate when species are created from
the preexisting variation within a parental
population.
FIGURE 6. Two models for speciation, pull and push (see text). The distribution of individuals is depicted along
an arbitrary phenotypic axis.
Push Speciation
The prevalence of horizontal transfer in
prokaryotes makes this view incongruent
with bacterial speciation. Acquired genes
have the potential to alter dramatically the
metabolic capabilities of recipient organisms;
cells can suddenly find themselves performing feats that were never within the grasp
of the parent population (Fig. 6, right). In
a way, daughter populations are “pushed”
into new niches beyond the scope of
their parental populations. If new functions
mediate competitive exploitation of this environment, the new lineage will persist. Additional acquired genes will reinforce the ecological distinctiveness of each lineage, and
inevitable gene loss will further differentiate between nascent species and their parent
populations.
In time, much of the steady rain of horizontally transferred DNA into bacterial genomes
will quickly evaporate, not having conferred
useful functions. Eventually, however, some
functions will be introduced, packaged in
selfish operons, that allow the organism to
invade a new niche successfully. The inevitable loss of ancestral functions provides
further ecological differentiation between
parent and daughter populations that serves
to counterselect recombinants between these
populations. In this way, we can view bacterial speciation as an inevitable process, which
must occur in the face of widespread gene
loss and acquisition. From a phylogenetic
standpoint, dendrograms of descent are less
accurately represented as phylogenetic trees
than as intricate webs (Doolittle, 1999a, b),
where the flow of genes among clades has
promoted the diversification of prokaryotic
life.
ACKNOWLEDGMENTS
I thank Ford Doolittle, Matt Kane, and an anonymous reviewer for helpful comments on the manuscript.
This work was supported by grants from the Alfred P.
Sloan Foundation and the David and Lucile Packard
Foundation.
REFERENCES
ARAVIND, L., R. L. TATUSOV, Y. I. WOLF, D. R. WALKER,
AND E. V. KOONIN. 1998. Evidence for massive gene
exchange between archaeal and bacterial hyperthermophiles. Trends Genet. 14:442–444.
BERGTHORSSON, U., AND H. OCHMAN. 1995. Heterogeneity of genome size among natural isolates of Escherichia coli. J. Bacteriol. 177:5784–5789.
BERGTHORSSON, U., AND H. OCHMAN. 1998. Distribution of chromosome length variation in natural isolates of Escherichia coli. Mol. Biol. Evol. 15:6–16.
BORODOVSKII, M. Y., V. A. SHEPELEV, AND A. A.
ALEKSANDROV. 1988. Context-connected shift pattern of the frequencies of synonymous codons in Escherichia coli. Mol. Biol. 22:767–779 (in Russian).
BROWN, J. R., F. T. ROBB, R. WEISS, AND W. F.
DOOLITTLE. 1997. Evidence for the early divergence of
tryptophanyl- and tyrosyl-tRNA synthetases. J. Mol.
Evol. 45:9–16.
BUCHANAN-WOLLASTON, V., J. E. PASSIATORE, AND F.
CANON. 1987. The mob and oriT mobilization functions
of a bacterial plasmid promote its transfer to plants.
Nature 328:170–175.
CAMPBELL, A. 1981. Evolutionary significance of accessory DNA elements in bacteria. Annu. Rev. Microbiol.
35:55–83.
CAMPBELL, A., J. MRAZEK, AND S. KARLIN. 1999. Genome
signature comparisons among prokaryote, plasmid,
and mitochondrial DNA. Proc. Natl. Acad. Sci. USA
96:9184–9189.
CASJENS, S. 1998. The diverse and dynamic structure of
bacterial genomes. Annu. Rev. Genet. 32:339–377.
COHAN, F. M. 2001. Bacterial species and speciation. Syst.
Biol. 50: (this issue).
DOOLITTLE, W. F. 1999a. Lateral genomics. Trends Cell.
Biol. 9:M5–8.
DOOLITTLE, W. F. 1999b. Phylogenetic classification and
the universal tree. Science 284:2124–2129.
DRAKE, J. W., B. CHARLESWORTH, D. CHARLESWORTH,
AND J. F. CROW. 1998. Rates of spontaneous mutation.
Genetics 148:1667–1686.
DUBOSE, R. F., AND D. L. HARTL. 1990. The molecular
evolution of alkaline phosphatase: Correlating variation among enteric bacteria to experimental manipulations of the protein. Mol. Biol. Evol. 7:547–577.
DYKHUIZEN, D. E. 1998. Santa Rosalia revisited: Why
are there so many species of bacteria? Antoine van
Leeuwenhoek 73:25–33.
DYKHUIZEN, D. E., AND L. GREEN. 1991. Recombination in Escherichia coli and the definition of biological
species. J. Bacteriol. 173:7257–7268.
FEIL, E. J., J. M. SMITH, M. C. ENRIGHT, AND B. G. SPRATT.
2000. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing
data. Genetics 154:1439–1450.
FIGGE, R. M., M. SCHUBERT, H. BRINKMANN, AND R.
CERFF. 1999. Glyceraldehyde-3-phosphate dehydrogenase genes in eubacteria and eukaryotes: Evidence
for intra- and inter-kingdom gene transfer. Mol. Biol.
Evol. 16:429–440.
GOGARTEN, J. P., H. KIBAK, P. DITTRICH, L. TAIZ, E.
J. BOWMAN, B. J. BOWMAN, M. F. MANOLSON, R. J.
POOLE, T. DATE, T. OSHIMA, ET AL. 1989. Evolution of
the vacuolar H+-ATPase: Implications for the origin of
eukaryotes. Proc. Natl. Acad. Sci. USA 86:6661–6665.
GOLDING, G. B., AND A. M. DEAN. 1998. The structural
basis of molecular adaptation. Mol. Biol. Evol. 15:355–369.
GRIBALDO, S., AND P. CAMMARANO. 1998. The root of
the universal tree of life inferred from anciently duplicated genes encoding components of the protein-targeting machinery. J. Mol. Evol. 47:508–516.
GUTTMAN, D. S. 1997. Recombination and clonality in
natural populations of Escherichia coli. Trends Ecol.
Evol. 12:16–22.
GUTTMAN, D. S., AND D. E. DYKHUIZEN. 1994a. Clonal
divergence in Escherichia coli as a result of recombination, not mutation. Science 266:1380–1383.
GUTTMAN, D. S., AND D. E. DYKHUIZEN. 1994b. Detecting
selective sweeps in naturally occurring Escherichia coli.
Genetics 138:993–1003.
HALL, R. M., C. M. COLLIS, M. J. KIM, S. R. PARTRIDGE,
G. D. RECCHIA, AND H. W. STOKES. 1999. Mobile gene
cassettes and integrons in evolution. Ann. N.Y. Acad.
Sci. 870:68–80.
HARTL, D. L., AND D. E. DYKHUIZEN. 1984. The population genetics of Escherichia coli. Annu. Rev. Genet.
18:31–68.
HEINEMANN, J. A., AND G. F. J. SPRAGUE. 1989. Bacterial
conjugative plasmids mobilize DNA transfer between
bacteria and yeast. Nature 340:205–209.
HUYNEN, M. A., AND P. BORK. 1998. Measuring
genome evolution. Proc. Natl. Acad. Sci. USA 95:442–444.
ITOH, T., K. TAKEMOTO, H. MORI, AND T. GOJOBORI.
1999. Evolutionary instability of operon structures
disclosed by sequence comparisons of complete microbial genomes. Mol. Biol. Evol. 16:332–346.
IWABE, N., K.-I. KUMA, M. HASEGAWA, S. OSAWA, AND
T. MIYATA. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from
phylogenetic trees of duplicated genes. Proc. Natl.
Acad. Sci. USA 86:9355–9359.
KARLIN, S. 1998. Global dinucleotide signatures and
analysis of genomic heterogeneity. Curr. Opin. Microbiol. 1:598–610.
KING, M. 1993. Species evolution. Cambridge Univ.
Press, Cambridge.
LAWRENCE, J. G. 1997. Selfish operons and speciation by
gene transfer. Trends Microbiol. 5:355–359.
LAWRENCE, J. G. 1999a. Gene transfer, speciation, and the
evolution of bacterial genomes. Curr. Opin. Microbiol.
2:519–523.
LAWRENCE, J. G. 1999b. Selfish operons: The evolutionary impact of gene clustering in the prokaryotes and
eukaryotes. Curr. Opin. Genet. Dev. 9:642–648.
LAWRENCE, J. G. 2000. Clustering of antibiotic resistance
genes: Beyond the selfish operon. ASM News 66:281–286.
LAWRENCE, J. G., AND H. OCHMAN. 1997. Amelioration
of bacterial genomes: Rates of change and exchange.
J. Mol. Evol. 44:383–397.
LAWRENCE, J. G., AND H. O CHMAN. 1998. Molecular archaeology of the Escherichia coli genome. Proc. Natl.
Acad. Sci. USA 95:9413–9417.
LAWRENCE, J. G., AND J. R. ROTH. 1996. Selfish operons:
Horizontal transfer may drive the evolution of gene
clusters. Genetics 143:1843–1860.
LAWRENCE, J. G., AND J. R. ROTH. 1998. Roles of horizontal transfer in bacterial evolution. Pages 208–225
in Horizontal gene transfer (M. Syvanen and C. I. Kado,
eds.). Chapman and Hall, London.
LAWRENCE, J. G., AND J. R. ROTH. 1999. Genomic flux:
Genome evolution by gene loss and acquisition. Pages
263–289 in Organization of the prokaryotic genome (R.
L. Charlebois, ed.). ASM Press, Washington, D.C.
LEVIN, B. R., AND C. T. BERGSTROM. 2000. Bacteria
are different: Observations, interpretations, speculations, and opinions about the mechanisms of adaptive
evolution in prokaryotes. Proc. Natl. Acad. Sci. USA
97:6981–6985.
LINNEUS, C. 1742. Systema Naturale.
LOGSDON, J. M., AND D. M. FAGUY. 1999. Thermotoga
heats up lateral gene transfer. Curr. Biol. 9:R747–R751.
MAJEWSKI, J., AND F. M. COHAN. 1998. The effect of mismatch repair and heteroduplex formation on sexual
isolation in Bacillus. Genetics 148:13–18.
MAJEWSKI, J., AND F. M. COHAN. 1999. DNA sequence
similarity requirements for interspecific recombination in Bacillus. Genetics 153:1525–1533.
MAJEWSKI, J., P. ZAWADZKI, P. PICKERILL, F. M. COHAN,
AND C. G. DOWSON. 2000. Barriers to genetic exchange between bacterial species: Streptococcus pneumoniae transformation. J. Bacteriol. 182:1016–1023.
MATSUMURA, I., J. B. WALLINGFORD, N. K. SURANA,
P. D. VIZE, AND A. D. ELLINGTON. 1999. Directed
evolution of the surface chemistry of the reporter
enzyme beta-glucuronidase. Nat. Biotechnol. 17:696–701.
MAURELLI , A. T., R. E. FERNÁNDEZ, C. A. BLOCH, C. K.
RODE, AND A. FASANO. 1998. “Black holes” and bacterial pathogenicity: A large genomic deletion that
enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. Proc. Natl. Acad. Sci. USA
95:3943–3948.
MAYNARD SMITH, J., N. H. SMITH, M. O’ROURKE, AND B.
G. SPRATT. 1993. How clonal are bacteria? Proc. Natl.
Acad. Sci. USA 90:4384–4388.
MAYR, E. 1942. Systematics and the origin of species.
Columbia Univ. Press, New York.
MAYR, E. 1954. Change of genetic environment and evolution. Pages 156–180 in Evolution as a process (J. S.
Huxley, A. C. Hardy, and E. B. Ford, eds.). Allen and
Unwin, London.
MAYR, E. 1963. Animal species and evolution. Harvard
Univ. Press, Cambridge, Massachusetts.
MEDIGUE, C., T. ROUXEL, P. VIGIER, A. HENAUT, AND A.
DANCHIN. 1991. Evidence for horizontal gene transfer
in Escherichia coli speciation. J. Mol. Biol. 222:851–856.
MILKMAN, R. 1997. Recombination and population
structure in Escherichia coli. Genetics 146:745–750.
MILKMAN, R., AND M. M. BRIDGES. 1990. Molecular evolution of the E. coli chromosome. III. Clonal frames.
Genetics 126:505–517.
MILKMAN, R., AND M. M. BRIDGES. 1993. Molecular evolution of the E. coli chromosome. IV. Sequence comparisons. Genetics 133:455–468.
MILKMAN, R., AND A. STOLTZFUS. 1988. Molecular evolution of the Escherichia coli chromosome. II. Clonal
segments. Genetics 120:359–366.
MRAZEK, J., AND S. KARLIN. 1999. Detecting alien genes
in bacterial genomes. Ann. N.Y. Acad. Sci. 870:314–329.
MULLER, H. 1932. Some genetic aspects of sex. Am. Nat.
66:118–138.
MUTO, A., AND S. OSAWA. 1987. The guanine and cytosine content of genomic DNA and bacterial evolution.
Proc. Natl. Acad. Sci. USA 84:166–169.
NAKATA, N., T. TOBE, I. FUKUDA, T. SUZUKI, K.
KOMATSU, M. YOSHIKAWA, AND C. SASAKAWA. 1993.
The absence of a surface protease, OmpT, determines
the intercellular spreading ability of Shigella: The relationship between the ompT and kcpA loci. Mol. Microbiol. 9:459–468.
NELSON, K. E., R. A. CLAYTON, S. R. GILL, M. L. GWINN,
R. J. DODSON, D. H. HAFT, E. K. HICKEY, J. D.
PETERSON, W. C. NELSON, K. A. KETCHUM, ET AL.
1999. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399:323–329.
OCHMAN, H., AND U. BERGTHORSSON. 1995. Genome
evolution in enteric bacteria. Curr. Opin. Genet. Dev.
5:734–738.
OCHMAN, H., AND U. BERGTHORSSON. 1998. Rates and
patterns of chromosome evolution in enteric bacteria.
Curr. Opin. Microbiol. 1:580–583.
OCHMAN, H., AND J. G. LAWRENCE. 1996. Phylogenetics and the amelioration of bacterial genomes.
Pages 2627–2637 in Escherichia coli and Salmonella typhimurium: Cellular and molecular biology, 2nd edition (F. C. Neidhardt, R. Curtiss III, J. L. Ingraham, E.
C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M.
Riley, M. Schaechter, and H. E. Umbarger, eds.). American Society for Microbiology, Washington, D.C.
OCHMAN, H., J. G. LAWRENCE, AND E. GROISMAN. 2000.
Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304.
PAPADOPOULOS, D., D. SCHNEIDER, J. MEIER-EISS, W.
ARBER, R. E. LENSKI, AND M. BLOT. 1999. Genomic
evolution during a 10,000-generation experiment with
bacteria. Proc. Natl. Acad. Sci. USA 96:3807–3812.
PATERSON, H. E. H. 1985. The recognition concept of
species. Pages 21–29 in Species and speciation (E. S.
Vrba, ed.). Transvaal Museum, Pretoria, South Africa.
RAINEY, P. B., AND M. TRAVISANO. 1998. Adaptive radiation in a heterogeneous environment. Nature 394:69–72.
RILEY, M., AND A. ANILONIS. 1978. Evolution of the bacterial genome. Annu. Rev. Microbiol. 32:519–560.
ROSSMAN, M. G., D. MORAS, AND K. W. OLSEN. 1974.
Chemical and biological evolution of a nucleotide-binding protein. Nature 250:194–199.
SELANDER, R. K., J. LI, AND K. NELSON. 1996. Evolutionary genetics of Salmonella enterica. Pages 2691–2707 in
Escherichia coli and Salmonella typhimurium: Cellular
and molecular biology, 2nd edition (F. C. Neidhardt,
R. Curtiss III, J. L. Ingraham, E. C. C. Lin, K. B. Low, B.
Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter,
and H. E. Umbarger, eds.). American Society for Microbiology, Washington, D.C.
SHARP, P. M., M. AVEROF, A. T. LLOYD, G. MATASSI,
AND J. F. PEDEN. 1995. DNA sequence evolution: The
sounds of silence. Philos. Trans. R. Soc. London B Biol.
Sci. 349:241–247.
SHARP, P. M., AND W.-H. LI. 1986. Codon usage in regulatory genes in Escherichia coli does not reflect selection
for ‘rare’ codons. Nucleic Acids Res. 14:7737–7749.
SHARP, P. M., AND W.-H. LI. 1987a. The codon adaptation index—a measure of directional synonymous
codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281–1295.
SHARP, P. M., AND W.-H. LI. 1987b. The rate of synonymous substitution in enterobacterial genes is inversely
related to codon usage bias. Mol. Biol. Evol. 4:222–230.
SNEL, B., P. BORK, AND M. HUYNEN. 1999. Genome phylogeny based on gene content. Nat. Genet. 21:108–110.
SOKURENKO, E. V., V. CHESNOKOVA, D. E. DYKHUIZEN,
I. OFEK, X. R. WU, K. A. KROGFELT, C. STRUVE, M. A.
SCHEMBRI, AND D. L. HASTY. 1998. Pathogenic adaptation of Escherichia coli by natural variation of the
FimH adhesin. Proc. Natl. Acad. Sci. USA 95:8922–8926.
STOLTZFUS, A. 1999. On the possibility of constructive
neutral evolution. J. Mol. Evol. 49:169–181.
SUEOKA, N. 1962. On the genetic basis of variation and
heterogeneity in base composition. Proc. Natl. Acad.
Sci. USA 48:582–592.
SUEOKA, N. 1988. Directional mutation pressure and
neutral molecular evolution. Proc. Natl. Acad. Sci.
USA 85:2653–2657.
SUEOKA, N. 1992. Directional mutation pressure, selective constraints, and genetic equilibria. J. Mol. Evol.
34:95–114.
SUEOKA, N. 1993. Directional mutation pressure, mutator mutations, and dynamics of molecular evolution.
J. Mol. Evol. 37:137–153.
SYVANEN, M., AND C. I. KADO (eds.). 1998. Horizontal
gene transfer. Chapman & Hall, London.
TORSVIK, V., J. GOKSØYR, AND F. L. DAAE. 1990. High
diversity of DNA in soil bacteria. Appl. Environ. Microbiol. 56:776–781.
VAN VALEN, L. 1976. Ecological species, multispecies,
and oaks. Taxon 25:223–239.
VRBA, E. S. (ed.). 1985. Species and speciation.
Transvaal Museum, Pretoria, South Africa.
VULIC, M., F. DIONISIO, F. TADDEI, AND M. RADMAN.
1997. Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in Enterobacteria. Proc. Natl. Acad. Sci. USA 94:9763–9767.
VULIC, M., R. E. LENSKI, AND M. RADMAN. 1999. Mutation, recombination, and incipient speciation of bacteria in the laboratory. Proc. Natl. Acad. Sci. USA
96:7348–7351.
WALDOR, M. K. 1998. Bacteriophage biology and bacterial virulence. Trends Microbiol. 6:295–297.
WHITTAM, T. S., AND S. AKE. 1992. Genetic polymorphisms and recombination in natural populations of Escherichia coli. Pages 223–246 in Mechanisms of molecular evolution (N. Takahata and
A. G. Clark, eds.). Japan Scientific Society Press,
Tokyo.
WILEY, E. O. 1978. The evolutionary species concept reconsidered. Syst. Zool. 27:17–26.
WILKS, H. M., K. W. HART, R. FEENEY, C. R. DUNN,
H. MUIRHEAD, W. N. CHIA, D. A. BARSTOW, T.
ATKINSON, A. R. CLARKE, AND J. J. HOLBROOK. 1988.
A specific, highly active malate dehydrogenase by redesign of a lactate dehydrogenase framework. Science
242:1541–1544.
WOESE, C. R., O. KANDLER, AND M. L. WHEELIS. 1990. Towards a natural system of organisms: Proposal for the
domains archaea, bacteria, and eucarya. Proc. Natl.
Acad. Sci. USA 87:4576–4579.
WU, G., A. FISER, B. TER KUILE, S. ALI, AND M. MÜLLER.
1999. Convergent evolution of Trichomonas vaginalis
lactate dehydrogenase from malate dehydrogenase.
Proc. Natl. Acad. Sci. USA 96:6285–6290.
ZAWADZKI, P., M. S. ROBERTS, AND F. M. COHAN. 1995.
The log-linear relationship between sexual isolation
and sequence divergence in Bacillus transformation is
robust. Genetics 140:917–932.
Received 14 April 2000; accepted 19 June 2000
Associate Editor: M. Kane