The relative contributions of recombination and point mutation to the diversification of bacterial clones
Brian G Spratt*, William P Hanage* and Edward J Feil†

Low levels of recombination in bacterial species have often been inferred from the presence of linkage disequilibrium between the alleles at different loci in the population. However, significant linkage disequilibrium is inevitable in organisms that divide by binary fission, and recombinational replacements must be very frequent, compared to point mutation, to dissipate disequilibrium. Recent studies using data from multilocus sequence typing indicate that, in many species, recombinational replacements contribute more to clonal diversification than do point mutations and, in some species, recombination has been sufficient to eliminate any phylogenetic signal from gene trees. Recent efforts to improve understanding of the extent and impact of homologous recombination in the diversification of bacterial clones are discussed.

Addresses
*Department of Infectious Disease Epidemiology, Imperial College School of Medicine, St. Mary’s Hospital, London W2 1PG, UK
†Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
Correspondence: Brian G Spratt; e-mail: [email protected]

Current Opinion in Microbiology 2001, 4:602–606
1369-5274/01/$ - see front matter
© 2001 Elsevier Science Ltd. All rights reserved.

Abbreviations
MLST multilocus sequence typing
r/m recombination/mutation
SLV single-locus variant

Introduction

Bacteria divide by binary fission and are therefore considered to be asexual organisms. This is true in the sense that the obligate mixing and re-assortment of parental genomes at each generation, which characterises most higher organisms, is absent, but sexual processes that bring into the cell genetic or allelic variation from different sources are certainly available to bacteria. However, these sexual processes (transduction, transformation and conjugation) are very different from those in higher organisms, and result in localised, unidirectional genetic exchanges in which a small part of the chromosome of the recipient cell is replaced by the corresponding region from a co-colonising donor cell [1]. These replacements may range in size from a few kilobases in natural transformation to several tens of kilobases in phage-mediated transduction and, potentially, hundreds of kilobases in conjugation.

In addition to the mechanisms that promote homologous recombination and that are therefore effective only in promoting recombination between similar isolates (the same species or closely related species), there are illegitimate (non-homologous) events that can lead to the incorporation into a bacterial genome of pieces of DNA from distantly related organisms in the absence of sequence homology. The impact of homologous and non-homologous recombination on the evolution of microorganisms has become increasingly clear in recent years and is being widely discussed as a feature of microbial evolution at all levels of the tree of life.

At the deepest branches of the tree of life, the differences in the phylogenetic relationships inferred from the sequences of different genes have been interpreted as evidence for frequent lateral gene transfer during early evolution [2,3]. A more recent history of recombination is apparent from analyses of bacterial genome sequences, in which the presence of individual genes, or of blocks of genes, with highly atypical base compositions suggests that these parts of the genome have been acquired from distantly related species [4,5••]. Towards the tips of the tree of life, gene acquisition also appears to be a feature of bacterial evolutionary processes. Genome sequences from different strains of the same bacterial species show marked differences in gene content [6••]. Some of these differences may result from differential gene loss since the emergence of the species, but many are probably due to gene acquisition by illegitimate recombinational events that, in some cases, have been responsible for the emergence of individual strains of a basically commensal species that have acquired the ability to cause disease (e.g. pathogenic strains of Escherichia coli [6••,7•]).

There is also growing awareness of the importance of homologous recombination at the extreme tips of the tree of life, where individual clones of a bacterial species are beginning to diversify. In this review, we concentrate on recent work that attempts to understand the extent and impact of homologous recombination in the diversification of bacterial clones. A more detailed account can be found in [8].

Measuring recombination

Although recombination rates can be measured in the laboratory, inferring a history of recombination and estimating rates of recombination from samples of bacterial isolates recovered from natural sources is more problematic. Historically, the presence of linkage disequilibrium (the non-random association between alleles at different loci in a population), demonstrated using data from multilocus enzyme electrophoresis (MLEE), led to the view that rates of recombination are typically low in bacterial species. However, whereas the presence of linkage equilibrium (the random association between alleles) is difficult to explain other than by high rates of recombination, the reverse is not necessarily true, and the possible confounding effects of sampling bias, ecological substructuring and the emergence of transient adaptive clones need to be carefully considered [8–11].
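The notion of non-random association between alleles at two loci can be made concrete with a small calculation. The following is a minimal sketch, not a method used by the authors: it computes the classical pairwise disequilibrium coefficient D = p(ab) - p(a)p(b) for every allele pair at two loci. The function name `pairwise_d` and the two toy populations are assumptions made purely for illustration.

```python
from collections import Counter


def pairwise_d(profiles, locus_a, locus_b):
    """Return {(allele_a, allele_b): D}, where D = p_ab - p_a * p_b.

    `profiles` is a list of tuples of allele numbers, one tuple per isolate.
    D is zero for every allele pair when the two loci are in equilibrium.
    """
    n = len(profiles)
    freq_a = Counter(p[locus_a] for p in profiles)
    freq_b = Counter(p[locus_b] for p in profiles)
    freq_ab = Counter((p[locus_a], p[locus_b]) for p in profiles)
    return {
        (a, b): freq_ab[(a, b)] / n - (freq_a[a] / n) * (freq_b[b] / n)
        for a in freq_a
        for b in freq_b
    }


# Hypothetical two-locus populations (illustrative data only).
# Clonal: allele 1 at the first locus always travels with allele 1 at the second.
clonal = [(1, 1)] * 5 + [(2, 2)] * 5
# Shuffled: every combination of alleles is equally common.
shuffled = [(1, 1), (1, 2), (2, 1), (2, 2)] * 3

d_clonal = pairwise_d(clonal, 0, 1)
d_shuffled = pairwise_d(shuffled, 0, 1)
```

In the clonal toy population every allele pair has a non-zero D, whereas in the shuffled population D is zero throughout. Real analyses use multilocus statistics and significance tests; this sketch only illustrates the definition.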
Analysis of linkage disequilibrium is, in any case, an insensitive test of recombination, and there are probably few bacterial species whose rates of recombination are sufficiently high to completely eliminate the linkage between alleles that arises as a consequence of reproduction by binary fission. Maynard Smith et al. [9] estimated that an allele must change at least 20 times more frequently by recombination than by point mutation to eliminate linkage disequilibrium within a bacterial population, and a similar estimate has been obtained by Hudson [12]. Significant linkage disequilibrium is therefore expected in bacterial populations even when evolutionary change occurs much more frequently by recombination than by point mutation, and its presence only allows the conclusion that rates of recombination are not extremely high (Figure 1).

Estimates of recombination rates from sequence data

The sequences of the same gene from multiple isolates of a single bacterial species are now easy to obtain, and a history of recombination has frequently been inferred from the analysis of the patterns of nucleotide sequence variation within such datasets, or from the lack of congruence between gene trees. These approaches can indicate that recombination has occurred, but they provide little quantitative information on the frequency of recombination, compared to that of point mutation.

More quantitative approaches have been developed. For example, Maynard Smith and Smith [13] used an approach based on the level of homoplasy (the occurrence of the same nucleotide change in different branches of a phylogenetic tree) within a set of related sequences. Homoplasies can be introduced by the chance occurrence of the same mutation in different branches of a tree or by transfer by recombination of an existing substitution into an unrelated isolate.
The homoplasy test provides a useful index of recombination by determining the extent to which the observed number of homoplasies in a set of sequences is greater than the number expected if recombination were absent.

Although useful, these methods are unable to provide direct estimates of rates of recombination, compared to rates of mutation, from nucleotide sequence data. Hudson [14] presented a method for calculating the neutral-recombination parameter, C, on the basis of the variance in the number of nucleotide differences between pairs of sequences in a random sample of alleles from a population [15,16]. This approach was used by Whittam and Ake [17], who suggested that changes occur between one and ten times more frequently by recombination than by point mutation for several E. coli loci. Such calculations rely on estimates of key genetic and population variables (such as homoplasy levels, codon bias, effective population size, mutation rate and the extent of selection), some of which are difficult to estimate with confidence [18]. This type of approach has not often been applied to bacterial sequences and further attempts to develop and validate these methods for bacterial populations are required.

Figure 1. The impact of recombination on bacterial population structures. [Figure not reproduced. Labels: increasing ratio of recombination to mutation; highly clonal, weakly clonal, non-clonal; stable clones, increasingly transient clones, no clones; linkage disequilibrium between alleles, linkage equilibrium between alleles.] In a bacterial population in which recombinational replacements are absent, the diversification of clones is slow, as it depends entirely on the accumulation of point mutations. High levels of linkage disequilibrium are present, and the population is highly clonal, consisting of independently evolving lineages. As the contribution of recombination to evolutionary change at neutral loci increases, clones become increasingly transient until, at high ratios of recombination to mutation, clones cannot emerge because their genomes diversify too rapidly. Significant linkage disequilibrium, and clones or clonal complexes, can be present even in populations in which recombinational replacements are far more common than point mutation, and these provide insensitive indicators of the extent of recombination in bacterial populations.

Estimates of the ratio of recombination to point mutation during clonal diversification

Perhaps the most promising, and also the simplest, method for estimating the extent to which evolutionary change is brought about by recombination, compared to point mutation, is that described by Guttman and Dykhuizen [19]. This approach elegantly bypasses the need for difficult estimates of population parameters, by examining the sequence differences between isolates of a species that are extremely closely related in genotype and are therefore likely to be descended from a very recent common ancestor. As the sequence differences are due to very recent events, the problems of distinguishing ancient events from more recent events are avoided. Such problems can occur when comparing sequences from distantly related isolates. Guttman and Dykhuizen [19] analysed sequence variation along a region of the E. coli chromosome and attempted to assign the observed sequence differences as the result of recombination or point mutation. From their analysis of sequence variation in 12 strains, they proposed that three recombinational events, but no point mutations, had occurred, and concluded that recombination was a major force in the diversification of E. coli clones.
A new method for the characterisation of isolates of bacterial species, multilocus sequence typing (MLST), provides large amounts of data that are ideal for measuring the relative contributions of recombination and mutation during the initial stages of clonal diversification using the above approach. MLST characterises each isolate of a bacterial species on the basis of the alleles present at each of seven house-keeping loci [20]. For each locus, the sequence of an internal fragment of about 500 bp is obtained, and each different sequence is assigned as a distinct allele; the alleles present at each of the seven loci provide an allelic profile that unambiguously defines each strain. The allelic profiles of large numbers of isolates from several species, including Neisseria meningitidis [20], Streptococcus pneumoniae [21], Staphylococcus aureus [22], Campylobacter jejuni [23] and Streptococcus pyogenes [24], are available at the MLST website (http://www.mlst.net).

Within each species, clones may be identified as those isolates that possess the same alleles at all seven loci. As there are large numbers of alleles at each locus within most bacterial species, MLST can distinguish billions of potential allelic profiles, and isolates that have identical allelic profiles or very closely related allelic profiles can be assumed to share a recent common ancestor [25]. Groups of isolates with closely related allelic profiles have been called ‘clonal complexes’. Within each clonal complex, it is usually possible to identify one predominant allelic profile and a number of less common variant allelic profiles. The simplest explanation for this pattern is that the predominant allelic profile represents the genotype of the ancestral clone that gave rise to the clonal complex, and the variants that differ from this allelic profile at a single locus (single-locus variants [SLVs]) represent the initial stages of diversification of the clone.
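The idea of a predominant allelic profile and its single-locus variants can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not software from the MLST website: the function names (`differing_loci`, `predominant_profile`, `single_locus_variants`), the isolate names and the allele numbers are all hypothetical.

```python
from collections import Counter


def differing_loci(profile_a, profile_b):
    """Return the indices of the loci at which two allelic profiles differ."""
    return [i for i, (a, b) in enumerate(zip(profile_a, profile_b)) if a != b]


def predominant_profile(isolates):
    """Return the allelic profile shared by the largest number of isolates."""
    return Counter(isolates.values()).most_common(1)[0][0]


def single_locus_variants(isolates, reference):
    """Return the distinct profiles differing from `reference` at exactly one locus."""
    distinct = set(isolates.values())
    return sorted(p for p in distinct if len(differing_loci(p, reference)) == 1)


# Hypothetical clonal complex: isolate name -> alleles at seven house-keeping loci.
isolates = {
    "iso1": (1, 1, 1, 1, 1, 1, 1),
    "iso2": (1, 1, 1, 1, 1, 1, 1),
    "iso3": (1, 1, 1, 1, 1, 1, 1),
    "iso4": (1, 1, 4, 1, 1, 1, 1),  # differs at the third locus only
    "iso5": (1, 1, 1, 1, 1, 9, 1),  # differs at the sixth locus only
    "iso6": (2, 1, 4, 1, 1, 1, 1),  # differs at two loci, so not an SLV
}

ancestral = predominant_profile(isolates)
slvs = single_locus_variants(isolates, ancestral)
```

Here isolates iso1 to iso3 form a clone (identical at all seven loci), and iso4 and iso5 are its SLVs. Taking the most frequent profile as ancestral is the simple explanation given in the text; it is a heuristic and can be misled by biased sampling.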
The ancestral clone can be more rigorously defined using the BURST program (http://www.mlst.net/BURST/burst.htm), which identifies the allelic profile within each clonal complex that differs from the largest number of other allelic profiles at only a single locus and, hence, is most likely to be phylogenetically central. This approach is robust to sampling errors, and ancestral clones assigned in this manner typically correspond to the most frequently isolated allelic profile within the clonal complex, thus lending independent support to the assignments [26••,27•]. Once ancestral clones and their associated SLVs have been identified, it is possible to distinguish SLVs that have arisen by recombination from those that have arisen by point mutation. These distinctions are made on the basis of two criteria: the number of nucleotide sites that differ between the variant allele in the SLV and the typical allele in the putative ancestral clone, and the frequency at which the variant allele is found elsewhere in the database. The occurrence of multiple point mutations at one locus with no changes at the other six loci is unlikely, and variant alleles that differ at multiple nucleotide sites are therefore considered to be recombinational replacements. However, differences at a single site could result from either point mutation or from recombinational replacements between very similar sequences that introduce only a single nucleotide difference. These types of events can be distinguished, as a point mutation within a house-keeping locus is very likely to result in a variant allele that is unique within the MLST database. In contrast, an allele that has been imported by recombination must be present elsewhere in the natural population, although not necessarily within the isolates in the MLST database. 
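The two steps described above, assigning an ancestral clone and then classifying each SLV, can be sketched as follows. This is a simplified illustration only: it is not the BURST program, whose full rules are not given here, and the function names `assign_founder` and `classify_variant`, the profiles and the short sequences are hypothetical. Whether a variant allele occurs in unrelated isolates is passed in as a flag (`seen_elsewhere`) rather than looked up in a real database.

```python
def n_differing_loci(profile_a, profile_b):
    """Count the loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def assign_founder(profiles):
    """Return the profile with the most single-locus variants in the set.

    Simplified stand-in for the BURST criterion; ties are broken by
    taking the first profile in sorted order.
    """
    distinct = sorted(set(profiles))
    return max(
        distinct,
        key=lambda p: sum(n_differing_loci(p, q) == 1 for q in distinct),
    )


def classify_variant(ancestral_seq, variant_seq, seen_elsewhere):
    """Classify a variant allele in an SLV as 'recombination' or 'mutation'.

    Multiple differing sites, or a single differing site in an allele that is
    also found in unrelated isolates, are scored as recombination; a novel
    allele differing at a single site is scored as point mutation.
    """
    if len(ancestral_seq) != len(variant_seq):
        raise ValueError("allele sequences must be aligned to the same length")
    n_sites = sum(a != b for a, b in zip(ancestral_seq, variant_seq))
    if n_sites == 0:
        raise ValueError("variant allele is identical to the ancestral allele")
    if n_sites > 1 or seen_elsewhere:
        return "recombination"
    return "mutation"


# Hypothetical clonal complex of five distinct seven-locus profiles.
profiles = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 2),
    (1, 3, 1, 1, 1, 1, 1),
    (4, 1, 1, 1, 1, 1, 1),
    (4, 5, 1, 1, 1, 1, 1),
]
founder = assign_founder(profiles)

# Hypothetical 8 bp allele fragments (real MLST fragments are about 500 bp).
multi_site = classify_variant("ACGTACGT", "ACGAACGA", seen_elsewhere=False)
single_known = classify_variant("ACGTACGT", "ACGAACGT", seen_elsewhere=True)
single_novel = classify_variant("ACGTACGT", "ACGAACGT", seen_elsewhere=False)
```

The first profile has three SLVs in this toy set, more than any other, so it is returned as the founder. The classification depends on how well the database samples the population, a point taken up in the following paragraphs.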
The probability that an imported allele will be found in unrelated isolates in the database is dependent on the size and characteristics of the sample of the population from which the MLST data were generated. For example, if the sequences of all alleles in the bacterial population were known for all seven loci, the donor allele imported by recombination would always be represented in the MLST dataset (provided the whole allele was replaced). The presence of all possible donor alleles in the MLST database may seem unrealistic, given the large number of alleles expected within a bacterial species. However, MLST databases containing a few hundred isolates will identify most of the alleles that are present at a significant frequency in the population. Undoubtedly, there are many more alleles present at very low frequencies, but the fact that these are not in the database is unimportant as, owing to their rarity, they will seldom be those that are introduced by recombination. What is probably more important is that the sample of isolates in the MLST database is representative of the population from which the recombinant alleles are likely to be sampled. For the samples of N. meningitidis and S. pneumoniae from invasive disease that have been characterised by MLST, 83% and 81% of the variant alleles within SLVs that differ at multiple sites were present elsewhere in the MLST database [26••,27•], suggesting that the majority of the common alleles in each population were represented. The variant alleles differing at a single site that arose by recombination should be present in unrelated isolates in the database in the same proportions, and if all of these alleles arose by recombination, we would expect <20% of them to be novel. In fact, approximately 80% of the variant alleles differing at only a single site are novel within the N. meningitidis database, and 53% are novel within the S. pneumoniae database. 
The observation that alleles differing at a single nucleotide site are significantly more likely to be novel than alleles differing at multiple sites reflects the fact that a proportion of the former alleles arose by point mutation rather than by recombination. As the great majority (>80%) of variant alleles in SLVs that differ at multiple nucleotide sites are found in unrelated isolates in the databases, the number of variant alleles that arise by point mutation can be estimated with reasonable accuracy as the number of variant alleles differing at a single site that are novel. All other alleles (those differing at multiple sites and those differing at a single site, but which correspond to alleles present elsewhere in the database) are considered to be due to recombinational replacements.

It is then possible to estimate two recombination parameters: the ratio of recombination/mutation (r/m) per allele, and the r/m per site. These are the relative frequencies at which an allele or an individual nucleotide site changes by recombination, compared with point mutation. For N. meningitidis and S. pneumoniae, the ratios of r/m per allele are similar, between 5:1 and 10:1 in favour of recombination, indicating that alleles at house-keeping loci change more frequently by recombination than by point mutation. For isolates of S. aureus from invasive disease, the r/m per allele parameter is approximately 1:1. The ratio of r/m per site varies more markedly between species and this reflects the variation in the amount of sequence diversity within each population. For example, the N. meningitidis population exhibits higher levels of sequence variation than either S. pneumoniae or S. aureus, and donor and recipient alleles will typically differ at more sites in the former species than in the latter.
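The two ratios can be illustrated with a short calculation. The sketch below assumes that each classified SLV is recorded as a pair (kind, number of differing sites), that r/m per allele is the ratio of event counts, and that r/m per site is the ratio of nucleotide changes introduced by each process; this reading of the per-site parameter, the function name `rm_ratios` and the event list are assumptions for illustration, not figures from any study.

```python
def rm_ratios(events):
    """Return (r/m per allele, r/m per site) from classified SLV events.

    `events` is a list of (kind, n_sites) pairs, where kind is either
    'recombination' or 'mutation' and n_sites is the number of nucleotide
    sites at which the variant allele differs from the ancestral allele.
    """
    rec_alleles = sum(1 for kind, _ in events if kind == "recombination")
    mut_alleles = sum(1 for kind, _ in events if kind == "mutation")
    rec_sites = sum(n for kind, n in events if kind == "recombination")
    mut_sites = sum(n for kind, n in events if kind == "mutation")
    if mut_alleles == 0 or mut_sites == 0:
        raise ValueError("no point mutations observed; ratios are undefined")
    return rec_alleles / mut_alleles, rec_sites / mut_sites


# Hypothetical tally: ten recombinational replacements and two point mutations.
events = (
    [("recombination", 5)] * 6    # six replacements changing five sites each
    + [("recombination", 1)] * 4  # four replacements changing a single site
    + [("mutation", 1)] * 2       # two novel single-site alleles
)
per_allele, per_site = rm_ratios(events)
```

With these made-up counts the per-allele ratio is 5:1, while the per-site ratio is larger because each replacement can change several sites at once.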
Hence, on average, each recombinational replacement results in more nucleotide changes, leading to a higher ratio of r/m per site in N. meningitidis than in S. pneumoniae or S. aureus [26••,27•,28].

Recombination, trees and networks

In each of the above three species, evolutionary change during clonal diversification occurs as frequently or more frequently by recombination than by point mutation. Is it therefore valid to represent the relationships between isolates of these species as a phylogenetic tree, or should they be represented as a network in which each isolate possesses alleles from many different ancestors? Feil et al. [29••] used the data from MLST to demonstrate a complete lack of congruence between gene trees in N. meningitidis, S. pneumoniae and S. aureus. Using a set of about 40 diverse isolates from each of the three species, the maximum likelihood tree for each of the seven MLST genes was no more similar to that of the other MLST genes than it was to trees with randomised topologies. Over the long term, the observed rates of recombination in these species appear to be sufficient to eliminate the phylogenetic signal in gene trees. Intermediate levels of congruence were observed between encapsulated isolates of Haemophilus influenzae, and higher levels of congruence were found between isolates of the pathogenic E. coli clones, confirming a recent report of detectable phylogenetic signal in sequences from the latter species [7•].

Conclusions

In this review, we have discussed recent experiments that use MLST data to estimate the relative contributions of recombination and point mutation to clonal diversification. The high levels of recombination inferred for three pathogenic species using this method are supported by statistical tests of congruence that demonstrate that recombination has been sufficient to result in complete non-congruence between gene trees.
These results indicate that the evolutionary history of a set of bacterial isolates should not be inferred from a single gene tree [29••], unless it is clear that rates of recombination are sufficiently low for this approach to be meaningful [7•]. Caution is also required when using linkage disequilibrium to infer low rates of recombination, as significant linkage can be found even in species such as N. meningitidis [30], in which recombination is clearly much more frequent than point mutation [26••]. Although we have described examples of species in which recombination rates appear to be high, relative to mutation, the number of species so far examined is small, and the observed impact of recombination is likely to vary between species, between subpopulations within species, and between different regions of the genome. Despite the fact that meaningful quantitative comparisons have proved difficult in the past, the simple approach described in this review illustrates that the rapidly expanding nucleotide sequence databases can be exploited to provide a more complete understanding of the evolutionary significance of recombination in bacteria.

Acknowledgements

We acknowledge the support of the Wellcome Trust.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Milkman R, Bridges MM: Molecular evolution of the Escherichia coli chromosome. IV. Sequence comparisons. Genetics 1993, 133:455-468.
2. Katz LA: The tangled web: gene genealogies and the origin of eukaryotes. Am Nat 1999, 154:S137-S145.
3. Woese CR: Interpreting the universal phylogenetic tree. Proc Natl Acad Sci USA 2000, 97:8392-8396.
4. Groisman EA, Ochman H: Pathogenicity islands: bacterial evolution in quantum leaps. Cell 1996, 87:791-794.
5. •• Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature 2000, 405:299-304.
This is a recent review of the major role of lateral gene transfer in launching isolates of a bacterial species into a new lifestyle.
6. •• Perna NT, Plunkett G, Burland V, Mau B, Glasner JD, Rose DJ, Mayhew GF, Evans PS, Gregor J, Kirkpatrick HA et al.: Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 2001, 409:529-533.
This is a key article that illustrates the extent of variation in gene content between isolates of a single species, and highlights the difficulties that are likely to arise in using comparative genomics to understand biological differences between strains.
7. • Reid SD, Herbelin CJ, Bumbaugh AC, Selander RK, Whittam TS: Parallel evolution of virulence in pathogenic Escherichia coli. Nature 2000, 406:64-67.
By first establishing that recombination has not resulted in the elimination of all phylogenetic signals, this is one of the very few papers to establish whether or not phylogenetic inferences are likely to be meaningful.
8. Feil EJ, Spratt BG: Recombination and the population structures of bacterial pathogens. Ann Rev Microbiol 2001, 55:561-590.
9. Maynard Smith J, Smith NH, O’Rourke M, Spratt BG: How clonal are bacteria? Proc Natl Acad Sci USA 1993, 90:4384-4388.
10. Guttman DS: Recombination and clonality in natural populations of Escherichia coli. Trends Ecol Evol 1997, 12:16-22.
11. Smith JM, Feil EJ, Smith NH: Population structure and evolutionary dynamics of pathogenic bacteria. Bioessays 2000, 22:1115-1122.
12. Hudson RR: Analytical results concerning linkage disequilibrium in models with genetic transformation and conjugation. J Evol Biol 1994, 7:535-548.
13. Maynard Smith J, Smith NH: Detecting recombination from gene trees. Mol Biol Evol 1998, 15:590-599.
14. Hudson RR: Estimating the recombination parameter of a finite population model without selection. Genet Res Camb 1987, 50:245-250.
15. Hey J, Wakeley J: A coalescent estimator of the population recombination rate. Genetics 1997, 145:833-846.
16. Kuhner MK, Yamato J, Felsenstein J: Maximum likelihood estimation of recombination rates from population data. Genetics 2000, 156:1393-1401.
17. Whittam TS, Ake SE: Genetic polymorphisms and recombination in natural populations of Escherichia coli. In Mechanisms of Molecular Evolution. Edited by Takahata N, Clark AG. Sunderland, Massachusetts: Sinauer Associates, Inc; 1993:223-245.
18. Holmes EC, Urwin R, Maiden MCJ: The influence of recombination on the population structure and evolution of the human pathogen Neisseria meningitidis. Mol Biol Evol 1999, 16:744-749.
19. Guttman DS, Dykhuizen DE: Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 1994, 266:1380-1383.
20. Maiden MCJ, Bygraves JA, Feil EJ, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA et al.: Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA 1998, 95:3140-3145.
21. Enright MC, Spratt BG: A multilocus sequence typing scheme for Streptococcus pneumoniae: identification of clones associated with serious invasive disease. Microbiology 1998, 144:3049-3060.
22. Enright MC, Day NP, Davies CE, Peacock SJ, Spratt BG: Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus. J Clin Microbiol 2000, 38:1008-1015.
23. Dingle KE, Colles FM, Wareing DR, Ure R, Fox AJ, Bolton FE, Bootsma HJ, Willems RJ, Urwin R, Maiden MCJ: Multilocus sequence typing system for Campylobacter jejuni. J Clin Microbiol 2001, 39:14-23.
24. Enright MC, Spratt BG, Kalia A, Cross JH, Bessen DE: Multilocus sequence typing of Streptococcus pyogenes and the relationships between Emm-type and clone. Infect Immun 2001, 69:2416-2427.
25. Spratt BG: Multilocus sequence typing: molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the internet. Curr Opin Microbiol 1999, 2:312-316.
26. •• Feil EJ, Maiden MCJ, Achtman M, Spratt BG: The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol Biol Evol 1999, 16:1496-1502.
The first demonstration of how MLST data can be used to estimate recombination and mutation rates using the approach described in [19].
27. • Feil EJ, Maynard Smith J, Enright MC, Spratt BG: Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics 2000, 154:1439-1450.
This is a more detailed discussion of the use of MLST data to calculate recombination rates.
28. Feil EJ, Enright MC, Spratt BG: Estimating the relative contributions of mutation and recombination to clonal diversification: a comparison between Neisseria meningitidis and Streptococcus pneumoniae. Res Microbiol 2000, 151:465-469.
29. •• Feil EJ, Holmes EC, Bessen DE, Chan M-S, Day NPJ, Enright MC, Goldstein R, Hood DW, Kalia A, Moore CE et al.: Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci USA 2001, 98:182-187.
Statistical tests of congruence are used to show that recombination in many bacterial species may be sufficiently frequent to eliminate phylogenetic signals and to make it impossible to understand the true evolutionary relationships between the major lineages within a bacterial species.
30. Haubold B, Hudson RR: LIAN 3.0: detecting linkage disequilibrium in multilocus data. Linkage analysis. Bioinformatics 2001, 16:847-848.