The relative contributions of recombination and point mutation
to the diversification of bacterial clones
Brian G Spratt*, William P Hanage* and Edward J Feil†
Low levels of recombination in bacterial species have often
been inferred from the presence of linkage disequilibrium
between the alleles at different loci in the population.
However, significant linkage disequilibrium is inevitable in
organisms that divide by binary fission, and recombinational
replacements must be very frequent, compared to point
mutation, to dissipate disequilibrium. Recent studies using
data from multilocus sequence typing indicate that, in many
species, recombinational replacements contribute more
greatly to clonal diversification than do point mutations and, in
some species, recombination has been sufficient to eliminate
any phylogenetic signal from gene trees. Recent efforts to
improve understanding of the extent and impact of
homologous recombination in the diversification of bacterial
clones are discussed.
Addresses
*Department of Infectious Disease Epidemiology, Imperial College
School of Medicine, St. Mary’s Hospital, London W2 1PG, UK
† Department of Biology and Biochemistry, University of Bath,
Bath BA2 7AY, UK
Correspondence: Brian G Spratt; e-mail: [email protected]
Current Opinion in Microbiology 2001, 4:602–606
1369-5274/01/$ — see front matter
© 2001 Elsevier Science Ltd. All rights reserved.
Abbreviations
MLST: multilocus sequence typing
r/m: recombination/mutation
SLV: single-locus variant
Introduction
Bacteria divide by binary fission and are therefore considered
to be asexual organisms. This is true in the sense that the
obligate mixing and re-assortment of parental genomes at
each generation, which characterises most higher organisms,
is absent, but sexual processes that bring into the cell
genetic or allelic variation from different sources are
certainly available to bacteria. However, these sexual
processes (transduction, transformation and conjugation)
are very different from those in higher organisms, and
result in localised, unidirectional genetic exchanges in
which a small part of the chromosome of the recipient cell
is replaced by the corresponding region from a co-colonising
donor cell [1]. These replacements may range in size
from a few kilobases in natural transformation to several
tens of kilobases in phage-mediated transduction and,
potentially, hundreds of kilobases in conjugation. In
addition to the mechanisms that promote homologous
recombination and that are therefore effective only in
promoting recombination between similar isolates (the
same species or closely related species), there are illegitimate
(non-homologous) events that can lead to the incorporation
into a bacterial genome of pieces of DNA from distantly
related organisms in the absence of sequence homology.

The impact of homologous and non-homologous recombination
on the evolution of microorganisms has become
increasingly clear in recent years and is being widely
discussed as a feature of microbial evolution at all levels of
the tree of life. At the deepest branches of the tree of life,
the differences in the phylogenetic relationships inferred
from the sequences of different genes have been interpreted
as evidence for frequent lateral gene transfer during early
evolution [2,3]. A more recent history of recombination is
apparent from analyses of bacterial genome sequences, in
which the presence of individual genes, or of blocks of
genes, with highly atypical base compositions suggests
that these parts of the genome have been acquired from
distantly related species [4,5••].

Towards the tips of the tree of life, gene acquisition also
appears to be a feature of bacterial evolutionary processes.
Genome sequences from different strains of the same
bacterial species show marked differences in gene content
[6••]. Some of these differences
may result from differential gene loss since the emergence
of the species, but many are probably due to gene acquisition
by illegitimate recombinational events that, in some cases,
have been responsible for the emergence of individual
strains of a basically commensal species that have acquired
the ability to cause disease (e.g. pathogenic strains of
Escherichia coli [6••,7•]).

There is also growing awareness of the importance of
homologous recombination at the extreme tips of the tree
of life, where individual clones of a bacterial species are
beginning to diversify. In this review, we concentrate on
recent work that attempts to understand the extent and
impact of homologous recombination in the diversification of
bacterial clones. A more detailed account can be found in [8].
Measuring recombination
Although recombination rates can be measured in the laboratory, inferring a history of recombination and estimating
rates of recombination from samples of bacterial isolates
recovered from natural sources is more problematic.
Historically, the presence of linkage disequilibrium (the
non-random association between alleles at different loci in
a population), demonstrated using data from multilocus
enzyme electrophoresis (MLEE), led to the view that rates
of recombination are typically low in bacterial species.
However, whereas the presence of linkage equilibrium (the
random association between alleles) is difficult to explain
other than by high rates of recombination, the reverse is
not necessarily true, and the possible confounding effects
of sampling bias, ecological substructuring and the emergence of transient adaptive clones need to be carefully
considered [8–11]. Analysis of linkage disequilibrium is, in
any case, an insensitive test of recombination, and there
are probably few bacterial species whose rates of recombination are sufficiently high to completely eliminate the
linkage between alleles that arises as a consequence of
reproduction by binary fission. Maynard Smith et al. [9]
estimated that an allele must change at least 20 times
more frequently by recombination than by point mutation
to eliminate linkage disequilibrium within a bacterial
population, and a similar estimate has been obtained by
Hudson [12]. Significant linkage disequilibrium is therefore
expected in bacterial populations even when evolutionary
change occurs much more frequently by recombination
than by point mutation, and its presence only allows the
conclusion that rates of recombination are not extremely
high (Figure 1).
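
The notion of non-random association between alleles at two loci can be made concrete with the standard pairwise coefficient D = p_AB - p_A * p_B, which is zero at linkage equilibrium. The following is a minimal sketch only; the function name and the toy samples are hypothetical and are not taken from the studies cited above.

```python
from collections import Counter

def linkage_disequilibrium(pairs):
    """Return D = p_AB - p_A * p_B for every combination of observed alleles.

    pairs: one (allele_at_locus_1, allele_at_locus_2) tuple per isolate.
    """
    n = len(pairs)
    freq1 = Counter(a for a, _ in pairs)   # allele counts at locus 1
    freq2 = Counter(b for _, b in pairs)   # allele counts at locus 2
    joint = Counter(pairs)                 # counts of two-locus combinations
    return {
        (a, b): joint[(a, b)] / n - (freq1[a] / n) * (freq2[b] / n)
        for a in freq1
        for b in freq2
    }

# Hypothetical samples: two strictly clonal lineages versus freely mixed alleles.
clonal = [(1, 1)] * 5 + [(2, 2)] * 5
mixed = [(1, 1), (1, 2), (2, 1), (2, 2)] * 3
```

In the clonal toy sample every isolate carries one of two fixed allele combinations, so D is far from zero; in the mixed sample all four combinations are equally common and D vanishes.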
Estimates of recombination rates from sequence data
The sequences of the same gene from multiple isolates of
a single bacterial species are now easy to obtain, and a
history of recombination has frequently been inferred from
the analysis of the patterns of nucleotide sequence variation within such datasets, or from the lack of congruence
between gene trees. These approaches can indicate that
recombination has occurred, but they provide little quantitative information on the frequency of recombination,
compared to that of point mutation. More quantitative
approaches have been developed. For example, Maynard
Smith and Smith [13] used an approach based on the level
of homoplasy (the occurrence of the same nucleotide
change in different branches of a phylogenetic tree) within
a set of related sequences. Homoplasies can be introduced
by the chance occurrence of the same mutation in different
branches of a tree or by transfer by recombination of an
existing substitution into an unrelated isolate. The homoplasy test provides a useful index of recombination by
determining the extent to which the observed number
of homoplasies in a set of sequences is greater than the
number expected if recombination were absent.
Although useful, these methods are unable to provide
direct estimates of rates of recombination, compared to
rates of mutation, from nucleotide sequence data. Hudson
[14] presented a method for calculating the neutral-recombination parameter, C, on the basis of the variance in the
number of nucleotide differences between pairs of
sequences in a random sample of alleles from a population
[15,16]. This approach was used by Whittam and Ake [17],
who suggested that changes occur between one and ten
times more frequently by recombination than by point
mutation for several E. coli loci. Such calculations rely on
estimates of key genetic and population variables (such as
homoplasy levels, codon bias, effective population size,
mutation rate and the extent of selection), some of which
are difficult to estimate with confidence [18]. This type of
approach has not often been applied to bacterial sequences
and further attempts to develop and validate these methods
for bacterial populations are required.

Figure 1
[Schematic. Axis: increasing ratio of recombination to mutation. From low to high ratio: highly clonal (stable clones; linkage disequilibrium between alleles), weakly clonal (increasingly transient clones; linkage disequilibrium between alleles), non-clonal (no clones; linkage equilibrium between alleles).]
The impact of recombination on bacterial population structures. In a
bacterial population in which recombinational replacements are absent,
the diversification of clones is slow, as it depends entirely on the
accumulation of point mutations. High levels of linkage disequilibrium
are present, and the population is highly clonal, consisting of
independently evolving lineages. As the contribution of recombination to
evolutionary change at neutral loci increases, clones become
increasingly transient until, at high ratios of recombination to mutation,
clones cannot emerge because their genomes diversify too rapidly.
Significant linkage disequilibrium, and clones or clonal complexes, can
be present even in populations in which recombinational replacements
are far more common than point mutation, and these provide insensitive
indicators of the extent of recombination in bacterial populations.
Estimates of the ratio of recombination to point mutation during clonal diversification
Perhaps the most promising, and also the simplest,
method for estimating the extent to which evolutionary
change is brought about by recombination, compared to
point mutation, is that described by Guttman and
Dykhuizen [19]. This approach elegantly bypasses the
need for difficult estimates of population parameters, by
examining the sequence differences between isolates of a
species that are extremely closely related in genotype and
are therefore likely to be descended from a very recent
common ancestor. As the sequence differences are due to
very recent events, the problems of distinguishing ancient
events from more recent events are avoided. Such problems
can occur when comparing sequences from distantly
related isolates. Guttman and Dykhuizen [19] analysed
sequence variation along a region of the E. coli chromosome
and attempted to assign the observed sequence differences
as the result of recombination or point mutation. From
their analysis of sequence variation in 12 strains, they
proposed that three recombinational events, but no point
mutations, had occurred, and concluded that recombination
was a major force in the diversification of E. coli clones.
A new method for the characterisation of isolates of bacterial
species — multilocus sequence typing (MLST) — provides
large amounts of data that are ideal for measuring the
relative contributions of recombination and mutation
during the initial stages of clonal diversification using the
above approach. MLST characterises each isolate of a bacterial species on the basis of the alleles present at each of
seven house-keeping loci [20]. For each locus, the
sequence of an internal fragment of about 500 bp is
obtained, and each different sequence is assigned as a distinct allele; the alleles present at each of the seven loci
provide an allelic profile that unambiguously defines each
strain. The allelic profiles of large numbers of isolates from
several species, including Neisseria meningitidis [20],
Streptococcus pneumoniae [21], Staphylococcus aureus [22],
Campylobacter jejuni [23] and Streptococcus pyogenes [24], are
available at the MLST website (http://www.mlst.net).
Within each species, clones may be identified as those
isolates that possess the same alleles at all seven loci. As
there are large numbers of alleles at each locus within
most bacterial species, MLST can distinguish billions of
potential allelic profiles, and isolates that have identical
allelic profiles or very closely related allelic profiles can be
assumed to share a recent common ancestor [25].
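
As a rough illustration of how allelic profiles define clones, the sketch below groups isolates that share the same alleles at all seven loci and counts how many profiles are possible for a given number of alleles per locus. The isolate names, allele numbers and the figure of 30 alleles per locus are hypothetical values chosen for illustration, not data from the MLST website.

```python
from collections import defaultdict

def group_clones(isolates):
    """Group isolate names by allelic profile (a tuple of seven allele numbers)."""
    clones = defaultdict(list)
    for name, profile in isolates.items():
        clones[tuple(profile)].append(name)
    return dict(clones)

def potential_profiles(alleles_per_locus):
    """Number of distinct profiles possible given the allele count at each locus."""
    total = 1
    for count in alleles_per_locus:
        total *= count
    return total

# Hypothetical isolates: iso1 and iso2 share a profile and so belong to one clone.
isolates = {
    "iso1": (1, 1, 1, 1, 1, 1, 1),
    "iso2": (1, 1, 1, 1, 1, 1, 1),
    "iso3": (2, 1, 1, 1, 1, 1, 1),
}
clones = group_clones(isolates)

# Assuming 30 alleles at each of the seven loci gives 30**7 possible profiles.
n_profiles = potential_profiles([30] * 7)
```

With these assumed numbers the count of possible profiles is 21,870,000,000, which is consistent with the statement that MLST can distinguish billions of potential allelic profiles.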
Groups of isolates with closely related allelic profiles have
been called ‘clonal complexes’. Within each clonal complex, it is usually possible to identify one predominant
allelic profile and a number of less common variant allelic
profiles. The simplest explanation for this pattern is that
the predominant allelic profile represents the genotype of
the ancestral clone that gave rise to the clonal complex,
and the variants that differ from this allelic profile at a single
locus (single-locus variants [SLVs]) represent the initial
stages of diversification of the clone. The ancestral clone
can be more rigorously defined using the BURST program
(http://www.mlst.net/BURST/burst.htm), which identifies
the allelic profile within each clonal complex that differs
from the largest number of other allelic profiles at only a
single locus and, hence, is most likely to be phylogenetically
central. This approach is robust to sampling errors, and
ancestral clones assigned in this manner typically correspond to the most frequently isolated allelic profile within
the clonal complex, thus lending independent support to
the assignments [26••,27•].
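
The criterion described for assigning the ancestral clone (the profile that differs from the largest number of other profiles at only a single locus) can be sketched as follows. This is a simplified illustration and not the BURST program itself: the function names are hypothetical, ties are broken arbitrarily by sort order, and profile frequencies are ignored.

```python
def count_differences(profile_a, profile_b):
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))

def putative_ancestor(profiles):
    """Return the distinct profile with the most single-locus variants (SLVs)."""
    distinct = sorted(set(tuple(p) for p in profiles))

    def slv_count(profile):
        return sum(count_differences(profile, other) == 1 for other in distinct)

    # max() keeps the first profile in sorted order when counts are tied.
    return max(distinct, key=slv_count)

# Hypothetical clonal complex: three SLVs of the central profile, plus one
# double-locus variant that is itself an SLV of the first variant.
complex_profiles = [
    (1, 1, 1, 1, 1, 1, 1),
    (2, 1, 1, 1, 1, 1, 1),
    (1, 3, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 4),
    (2, 1, 1, 1, 1, 5, 1),
]
ancestor = putative_ancestor(complex_profiles)
```

In this toy complex the first profile has three SLVs and every other profile has at most two, so it is returned as the putative ancestor.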
Once ancestral clones and their associated SLVs have been
identified, it is possible to distinguish SLVs that have
arisen by recombination from those that have arisen by
point mutation. These distinctions are made on the basis
of two criteria: the number of nucleotide sites that differ
between the variant allele in the SLV and the typical allele
in the putative ancestral clone, and the frequency at which
the variant allele is found elsewhere in the database. The
occurrence of multiple point mutations at one locus with
no changes at the other six loci is unlikely, and variant
alleles that differ at multiple nucleotide sites are therefore
considered to be recombinational replacements. However,
differences at a single site could result from either point
mutation or from recombinational replacements between
very similar sequences that introduce only a single
nucleotide difference. These types of events can be
distinguished, as a point mutation within a house-keeping
locus is very likely to result in a variant allele that is unique
within the MLST database. In contrast, an allele that has
been imported by recombination must be present elsewhere in the natural population, although not necessarily
within the isolates in the MLST database.
The probability that an imported allele will be found in
unrelated isolates in the database is dependent on the size
and characteristics of the sample of the population from
which the MLST data were generated. For example, if the
sequences of all alleles in the bacterial population were
known for all seven loci, the donor allele imported by
recombination would always be represented in the MLST
dataset (provided the whole allele was replaced). The
presence of all possible donor alleles in the MLST
database may seem unrealistic, given the large number of
alleles expected within a bacterial species. However,
MLST databases containing a few hundred isolates will
identify most of the alleles that are present at a significant
frequency in the population. Undoubtedly, there are many
more alleles present at very low frequencies, but the fact
that these are not in the database is unimportant as, owing to
their rarity, they will seldom be those that are introduced
by recombination. What is probably more important is
that the sample of isolates in the MLST database is representative of the population from which the recombinant
alleles are likely to be sampled.
For the samples of N. meningitidis and S. pneumoniae from
invasive disease that have been characterised by MLST,
83% and 81% of the variant alleles within SLVs that
differ at multiple sites were present elsewhere in the
MLST database [26••,27•], suggesting that the majority
of the common alleles in each population were represented. The variant alleles differing at a single site that
arose by recombination should be present in unrelated
isolates in the database in the same proportions, and if all
of these alleles arose by recombination, we would expect
<20% of them to be novel. In fact, approximately 80% of
the variant alleles differing at only a single site are novel
within the N. meningitidis database, and 53% are novel
within the S. pneumoniae database. The observation that
alleles differing at a single nucleotide site are significantly
more likely to be novel than alleles differing at multiple
sites reflects the fact that a proportion of the former alleles
arose by point mutation rather than by recombination.
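
A back-of-the-envelope version of this argument can be written down explicitly. The following is an illustrative calculation under stated assumptions, not the estimator used in the cited studies: let f be the observed fraction of single-site variant alleles that are novel, let q be the fraction of recombinant alleles expected to be novel (taken here as one minus the fraction of multiple-site variant alleles found elsewhere in the database), let \mu be the fraction of single-site variant alleles that arose by point mutation, and assume that every allele arising by point mutation is novel.

```latex
\begin{align*}
f &= \mu + (1-\mu)\,q
  \quad\Longrightarrow\quad
  \mu = \frac{f-q}{1-q} \\[4pt]
\text{\textit{N. meningitidis}:}\quad
  q &= 1-0.83 = 0.17,\; f \approx 0.80
  \;\Rightarrow\; \mu \approx \frac{0.63}{0.83} \approx 0.76 \\[4pt]
\text{\textit{S. pneumoniae}:}\quad
  q &= 1-0.81 = 0.19,\; f = 0.53
  \;\Rightarrow\; \mu \approx \frac{0.34}{0.81} \approx 0.42
\end{align*}
```

Under these assumptions, roughly three-quarters of the single-site variant alleles in N. meningitidis and somewhat under half of those in S. pneumoniae would be attributed to point mutation.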
As the great majority (>80%) of variant alleles in SLVs
that differ at multiple nucleotide sites are found in unrelated isolates in the databases, the number of variant
alleles that arise by point mutation can be estimated with
reasonable accuracy as the number of variant alleles differing at a single site that are novel. All other alleles
(those differing at multiple sites and those differing at a
single site, but which correspond to alleles present elsewhere in the database) are considered to be due to
recombinational replacements.
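
The assignment rule in the preceding paragraphs can be expressed as a short function: a variant allele is counted as a point mutation only if it differs from the ancestral allele at a single site and is novel in the database, and as a recombinational replacement otherwise. This is a minimal sketch; the function name, the `seen_elsewhere` flag and the example sequences are hypothetical.

```python
def classify_variant(ancestral_seq, variant_seq, seen_elsewhere):
    """Classify the variant allele of an SLV as 'mutation' or 'recombination'.

    ancestral_seq, variant_seq: aligned allele sequences of equal length.
    seen_elsewhere: True if the variant allele also occurs in unrelated
    isolates in the database, False if it is novel.
    Returns (event_type, number_of_differing_sites).
    """
    if len(ancestral_seq) != len(variant_seq):
        raise ValueError("sequences must be aligned and of equal length")
    n_diff = sum(a != b for a, b in zip(ancestral_seq, variant_seq))
    if n_diff == 0:
        raise ValueError("variant allele is identical to the ancestral allele")
    # Only a novel allele differing at exactly one site is scored as a mutation.
    if n_diff == 1 and not seen_elsewhere:
        return "mutation", n_diff
    return "recombination", n_diff
```

For example, under this rule a single-site difference in an allele that is also present in unrelated isolates is scored as a recombinational replacement, whereas the same difference in a novel allele is scored as a point mutation.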
It is then possible to estimate two recombination parameters:
the ratio of recombination/mutation (r/m) per allele, and
the r/m per site. These are the relative frequencies at
which an allele or an individual nucleotide site changes by
recombination, compared with point mutation. For
N. meningitidis and S. pneumoniae, the ratios of r/m per allele
are similar, between 5:1 and 10:1 in favour of recombination,
indicating that alleles at house-keeping loci change more
frequently by recombination than by point mutation. For
isolates of S. aureus from invasive disease, the r/m per allele
parameter is approximately 1:1.
The ratio of r/m per site varies more markedly between
species and this reflects the variation in the amount of
sequence diversity within each population. For example, the
N. meningitidis population exhibits higher levels of sequence
variation than either S. pneumoniae or S. aureus, and donor
and recipient alleles will typically differ at more sites in the
former species than in the latter. Hence, on average, each
recombinational replacement results in more nucleotide
changes, leading to a higher ratio of r/m per site in
N. meningitidis than in S. pneumoniae or S. aureus [26••,27•,28].
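
The two parameters can be illustrated with a small calculation. The sketch below assumes that the per-allele ratio is the number of recombinational replacements divided by the number of point mutations, and that the per-site ratio is the total number of nucleotide sites changed by recombination divided by the number changed by point mutation. The function name and the example events are hypothetical, not values from the cited studies.

```python
def rm_ratios(events):
    """Compute (r/m per allele, r/m per site) from classified SLV events.

    events: list of (event_type, n_sites) tuples, where event_type is
    'recombination' or 'mutation' and n_sites is the number of nucleotide
    sites that differ from the ancestral allele.
    """
    rec_sites = [n for kind, n in events if kind == "recombination"]
    mut_sites = [n for kind, n in events if kind == "mutation"]
    if not mut_sites:
        raise ValueError("no point mutations observed; ratios are undefined")
    per_allele = len(rec_sites) / len(mut_sites)
    per_site = sum(rec_sites) / sum(mut_sites)
    return per_allele, per_site

# Hypothetical data: six recombinational replacements changing 36 sites in
# total, and two point mutations changing one site each.
events = [("recombination", n) for n in (5, 8, 1, 12, 3, 7)]
events += [("mutation", 1), ("mutation", 1)]
per_allele, per_site = rm_ratios(events)
```

With these made-up events the per-allele ratio is 3:1 and the per-site ratio is 18:1, showing how greater divergence between donor and recipient alleles inflates the per-site ratio without altering the per-allele ratio.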
Recombination, trees and networks
In each of the above three species, evolutionary change
during clonal diversification occurs at least as frequently
by recombination as by point mutation. Is it
therefore valid to represent the relationships between isolates of these species as a phylogenetic tree, or should they
be represented as a network in which each isolate possesses
alleles from many different ancestors? Feil et al. [29••] used
the data from MLST to demonstrate a complete lack of
congruence between gene trees in N. meningitidis, S. pneumoniae and S. aureus. Using a set of about 40 diverse
isolates from each of the three species, the maximum likelihood tree for each of the seven MLST genes was no more
similar to that of the other MLST genes than it was to trees
with randomised topologies. Over the long term, the
observed rates of recombination in these species appear to
be sufficient to eliminate the phylogenetic signal in gene
trees. Intermediate levels of congruence were observed
between encapsulated isolates of Haemophilus influenzae,
and higher levels of congruence were found between isolates
of the pathogenic E. coli clones, confirming a recent report
of detectable phylogenetic signal in sequences from the
latter species [7•].
Conclusions
In this review, we have discussed recent experiments that
use MLST data to estimate the relative contributions of
recombination and point mutation to clonal diversification.
The high levels of recombination inferred for three pathogenic species using this method are supported by statistical
tests of congruence that demonstrate that recombination
has been sufficient to result in complete non-congruence
between gene trees. These results indicate that the evolutionary history of a set of bacterial isolates should not be
inferred from a single gene tree [29••], unless it is clear that
rates of recombination are sufficiently low for this
approach to be meaningful [7•]. Caution is also required
when using linkage disequilibrium to infer low rates of
recombination, as significant linkage can be found even in
species such as N. meningitidis [30], in which recombination
is clearly much more frequent than point mutation [26••].
Although we have described examples of species in which
recombination rates appear to be high, relative to mutation,
the number of species so far examined is small, and the
observed impact of recombination is likely to vary between
species, between subpopulations within species, and
between different regions of the genome. Despite the fact
that meaningful quantitative comparisons have proved
difficult in the past, the simple approach described in this
review illustrates that the rapidly expanding nucleotide
sequence databases can be exploited to provide a more
complete understanding of the evolutionary significance of
recombination in bacteria.
Acknowledgements
We acknowledge the support of the Wellcome Trust.
References and recommended reading
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
•• of outstanding interest
1. Milkman R, Bridges MM: Molecular evolution of the Escherichia coli
chromosome. IV. Sequence comparisons. Genetics 1993, 133:455-468.
2. Katz LA: The tangled web: gene genealogies and the origin of
eukaryotes. Am Nat 1999, 154:S137-S145.
3. Woese CR: Interpreting the universal phylogenetic tree. Proc Natl
Acad Sci USA 2000, 97:8392-8396.
4. Groisman EA, Ochman H: Pathogenicity islands: bacterial evolution
in quantum leaps. Cell 1996, 87:791-794.
5. •• Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and
the nature of bacterial innovation. Nature 2000, 405:299-304.
This is a recent review of the major role of lateral gene transfer in launching
isolates of a bacterial species into a new lifestyle.
6. •• Perna NT, Plunkett G, Burland V, Mau B, Glasner JD, Rose DJ,
Mayhew GF, Evans PS, Gregor J, Kirkpatrick HA et al.: Genome
sequence of enterohaemorrhagic Escherichia coli O157:H7.
Nature 2001, 409:529-533.
This is a key article that illustrates the extent of variation in gene content
between isolates of a single species, and highlights the difficulties that are
likely to arise in using comparative genomics to understand biological differences between strains.
7. • Reid SD, Herbelin CJ, Bumbaugh AC, Selander RK, Whittam TS:
Parallel evolution of virulence in pathogenic Escherichia coli.
Nature 2000, 406:64-67.
By first establishing that recombination has not resulted in the elimination of
all phylogenetic signals, this is one of the very few papers to establish
whether or not phylogenetic inferences are likely to be meaningful.
8. Feil EJ, Spratt BG: Recombination and the population
structures of bacterial pathogens. Ann Rev Microbiol 2001,
55:561-590.
9. Maynard Smith J, Smith NH, O’Rourke M, Spratt BG: How clonal are
bacteria? Proc Natl Acad Sci USA 1993, 90:4384-4388.
10. Guttman DS: Recombination and clonality in natural populations
of Escherichia coli. Trends Ecol Evol 1997, 12:16-22.
11. Smith JM, Feil EJ, Smith NH: Population structure and evolutionary
dynamics of pathogenic bacteria. Bioessays 2000, 22:1115-1122.
12. Hudson RR: Analytical results concerning linkage disequilibrium
in models with genetic transformation and conjugation. J Evol Biol
1994, 7:535-548.
13. Maynard Smith J, Smith NH: Detecting recombination from gene
trees. Mol Biol Evol 1998, 15:590-599.
14. Hudson RR: Estimating the recombination parameter of a finite
population model without selection. Genet Res Camb 1987,
50:245-250.
15. Hey J, Wakeley J: A coalescent estimator of the population
recombination rate. Genetics 1997, 145:833-846.
16. Kuhner MK, Yamato J, Felsenstein J: Maximum likelihood estimation
of recombination rates from population data. Genetics 2000,
156:1393-1401.
17. Whittam TS, Ake SE: Genetic polymorphisms and recombination
in natural populations of Escherichia coli. In Mechanisms of
Molecular Evolution. Edited by Takahata N, Clark AG. Sunderland,
Massachusetts: Sinauer Associates, Inc; 1993:223-245.
18. Holmes EC, Urwin R, Maiden MCJ: The influence of recombination
on the population structure and evolution of the human
pathogen Neisseria meningitidis. Mol Biol Evol 1999,
16:744-749.
19. Guttman DS, Dykhuizen DE: Clonal divergence in Escherichia coli
as a result of recombination, not mutation. Science 1994,
266:1380-1383.
20. Maiden MCJ, Bygraves JA, Feil EJ, Morelli G, Russell JE, Urwin R,
Zhang Q, Zhou J, Zurth K, Caugant DA et al.: Multilocus sequence
typing: a portable approach to the identification of clones within
populations of pathogenic microorganisms. Proc Natl Acad Sci
USA 1998, 95:3140-3145.
21. Enright MC, Spratt BG: A multilocus sequence typing scheme
for Streptococcus pneumoniae: identification of clones
associated with serious invasive disease. Microbiology 1998,
144:3049-3060.
22. Enright MC, Day NP, Davies CE, Peacock SJ, Spratt BG: Multilocus
sequence typing for characterization of methicillin-resistant and
methicillin-susceptible clones of Staphylococcus aureus. J Clin
Microbiol 2000, 38:1008-1015.
23. Dingle KE, Colles FM, Wareing DR, Ure R, Fox AJ, Bolton FE,
Bootsma HJ, Willems RJ, Urwin R, Maiden MCJ: Multilocus
sequence typing system for Campylobacter jejuni. J Clin Microbiol
2001, 39:14-23.
24. Enright MC, Spratt BG, Kalia A, Cross JH, Bessen DE: Multilocus
sequence typing of Streptococcus pyogenes and the relationships
between Emm-type and clone. Infect Immun 2001, 69:2416-2427.
25. Spratt BG: Multilocus sequence typing: molecular typing of
bacterial pathogens in an era of rapid DNA sequencing and the
internet. Curr Opin Microbiol 1999, 2:312-316.
26. •• Feil EJ, Maiden MCJ, Achtman M, Spratt BG: The relative
contributions of recombination and mutation to the divergence of
clones of Neisseria meningitidis. Mol Biol Evol 1999,
16:1496-1502.
The first demonstration of how MLST data can be used to estimate recombination and mutation rates using the approach described in [19].
27. • Feil EJ, Maynard Smith J, Enright MC, Spratt BG: Estimating
recombinational parameters in Streptococcus pneumoniae from
multilocus sequence typing data. Genetics 2000, 154:1439-1450.
This is a more detailed discussion of the use of MLST data to calculate
recombination rates.
28. Feil EJ, Enright MC, Spratt BG: Estimating the relative
contributions of mutation and recombination to clonal
diversification: a comparison between Neisseria meningitidis and
Streptococcus pneumoniae. Res Microbiol 2000, 151:465-469.
29. •• Feil EJ, Holmes EC, Bessen DE, Chan M-S, Day NPJ, Enright MC,
Goldstein R, Hood DW, Kalia A, Moore CE et al.: Recombination
within natural populations of pathogenic bacteria: short-term
empirical estimates and long-term phylogenetic consequences.
Proc Natl Acad Sci USA 2001, 98:182-187.
Statistical tests of congruence are used to show that recombination in many
bacterial species may be sufficiently frequent to eliminate phylogenetic
signals and to make it impossible to understand the true evolutionary
relationships between the major lineages within a bacterial species.
30. Haubold B, Hudson RR: LIAN 3.0: detecting linkage disequilibrium in
multilocus data. Linkage analysis. Bioinformatics 2001, 16:847-848.