Microbiology (2006), 152, 3245–3259. DOI 10.1099/mic.0.29170-0

Multiple gene genealogical analyses reveal both common and distinct population genetic patterns among replicons in the nitrogen-fixing bacterium Sinorhizobium meliloti

Sheng Sun, Hong Guo and Jianping Xu

Correspondence: Jianping Xu

Center for Environmental Genomics, Department of Biology, McMaster University, 1280 Main St West, Hamilton, ON L8S 4K1, Canada

Received 29 May 2006; revised 8 August 2006; accepted 10 August 2006

Sinorhizobium meliloti is a Gram-negative alpha-proteobacterium that can form symbiotic relationships with alfalfa and fix atmospheric nitrogen. The complete genome of a laboratory strain, Rm1021, was published in 2001, and the genome of this strain is arranged in three replicons: a chromosome of 3.65 million base pairs (Mb), and two megaplasmids, pSymA (1.35 Mb) and pSymB (1.68 Mb). However, the potential difference in genetic variation among the three replicons in natural strains remains poorly understood. In this study, a total of 16 gene fragments were sequenced, four from pSymA and six each from the chromosome and pSymB, for 49 natural S. meliloti strains. The analyses identified significant differences in divergence among genes, with the mean Hasegawa–Kishino–Yano–1985 (HKY85) distance ranging from 0.00157 to 0.04109 between pairs of strains. Overall, genes on pSymA showed the highest mean HKY85 distance, followed by those on pSymB and the chromosome. Although evidence for recombination was found, the population genetic analyses revealed overall significant linkage disequilibria among genes within both pSymA and the chromosome. However, genes on pSymB were in overall linkage equilibrium, consistent with frequent recombination among genes on this replicon. Furthermore, the genealogical comparisons among the three replicons identified significant incongruence, indicating reassortment among the three replicons in natural populations.
The results suggest both shared and distinct patterns of molecular evolution among the three replicons in the genomes of natural strains of S. meliloti.

Abbreviations: HGT, horizontal gene transfer; HKY85, Hasegawa–Kishino–Yano–1985; IA, index of association; IR, incompatibility ratio; Mb, million base pairs; MGGA, multiple gene genealogical analysis; MLEE, multilocus enzyme electrophoresis; MLST, multilocus sequence typing; MP, maximum parsimony; PH, partition homogeneity.

0002-9170 © 2006 SGM

INTRODUCTION

Prokaryotes and many eukaryotic microbes propagate primarily through asexual binary fission. As a result, most microbial populations in nature show evidence of clonality. In population genetic terms, indicators of clonality include overrepresentation of certain genotypes, congruent phylogenies, and significant deviations from Hardy–Weinberg equilibrium and linkage equilibrium (Xu, 2004, 2005). However, most natural microbial populations examined so far have also shown evidence of recombination (Xu, 2004, 2006; Seifert & DiRita, 2006). Recent gene and genome-sequence comparisons have revealed that both homologous recombination and horizontal gene transfer (HGT) are ubiquitous in prokaryotic microbes (e.g. Gogarten et al., 2002; Seifert & DiRita, 2006). One of the most commonly cited examples of microbial recombination and HGT is the spread of pathogenicity islands and antibiotic-resistance genes among human pathogenic microbes (e.g. Davies, 1994). Understanding the roles of clonality and recombination in the genomes of natural microbial populations has significant implications for both basic and applied aspects of microbiology (Xu, 2006; Seifert & DiRita, 2006).

Sinorhizobium meliloti is a Gram-negative alpha-proteobacterium capable of forming symbiotic relationships with alfalfa and occasionally several other plant species (including those in the genera Medicago, Melilotus and Trigonella), and of fixing atmospheric nitrogen.
This species is a model organism for studying plant–microbe interactions and the mechanisms of symbiotic nitrogen fixation (Finan et al., 2002). The genome of a laboratory strain of this species, Rm1021, has been completely sequenced (Galibert et al., 2001). Its genome structure is similar to that of most symbiotic nitrogen-fixing bacteria, in which genetic information is typically partitioned into a chromosome and a variable number of large plasmids that encode genes for symbiosis (Finan et al., 2002). Like all strains of S. meliloti analysed so far, strain Rm1021 contains three replicons (Sobral et al., 1991; Van Sluys et al., 2002; S. Sun & J. Xu, unpublished data). However, the sizes of these replicons may vary among strains. For example, strain Rm1021 has a genome size of 6.7 million base pairs (Mb), with a 3.65 Mb chromosome, a 1.35 Mb megaplasmid called pSymA and a 1.68 Mb megaplasmid called pSymB (Galibert et al., 2001). In contrast, the genome size of the type strain of S. meliloti, ATCC 9930, is about 370 kb larger than that of strain Rm1021 (Guo et al., 2005). Strain ATCC 9930 has a 3.65 Mb chromosome, a 1.63 Mb pSymA and a 1.82 Mb pSymB (Guo et al., 2005).

Using a variety of molecular markers, such as multilocus enzyme electrophoresis (MLEE), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and PCR-RFLP, several studies have shown high levels of genetic diversity within natural populations of S. meliloti (Biondi et al., 2003; Carelli et al., 2000; Hartmann et al., 1998; Jebara et al., 2001; Paffetti et al., 1996; Roumiantseva et al., 2002). However, little is known about the potential similarities and differences among the three replicons in natural strains of S. meliloti.
Based on Southern hybridization data using replicon-specific probes, evidence for recombination has been found between replicons in a soil population of 16 strains of a different symbiotic nitrogen-fixing bacterium, Rhizobium leguminosarum var. trifolii (Schofield et al., 1987). However, a different study found a correlation between genotypes from the sym plasmid and the chromosome in field populations of R. leguminosarum (Young & Wexler, 1988). At present, there is little information about the potential replicon-specific population structure in other prokaryotes with multiple large replicons.

Multiple gene genealogical analysis (MGGA) is a powerful method for inferring strain relationships and for analysing the structure of microbial populations (e.g. Kidd et al., 2005; Lan & Xu, 2006; Silva et al., 2005; Vinuesa et al., 2005; Xu, 2005; Xu et al., 2000). MGGA is similar to the more widely known multilocus sequence typing (MLST) (Cooper & Feil, 2004; Maiden et al., 1998). The major advantage of MGGA over MLST is that MGGA takes the evolutionary relationships among individual alleles into account when inferring relationships among strains and populations. Compared to other molecular markers, data generated by MGGA (and MLST) are unambiguous, can be easily stored in public databases, and are readily shared among researchers.

In this study, we used MGGA to examine the patterns of DNA sequence variation in a collection of natural strains of the nitrogen-fixing bacterium S. meliloti. A total of 16 genes distributed across the three replicons were analysed for each of 49 strains. Eighteen strains belonged to the most frequent MLEE electrophoretic type (ET), ET1, and the remaining 31 strains each had a different ET (Eardly et al., 1990; Table 1). Using this sample, we aimed to address the following questions. First, how much divergence is there among strains of S. meliloti at the individual gene level, at the replicon level, and at the genome level?
Specifically, do genes on all three replicons show similar levels of polymorphism and divergence? Second, will the inferred strain relationships differ from each other depending on the genes and replicons analysed? Specifically, do strains belonging to ET1 cluster together in our gene genealogical analysis? Third, is there evidence for recombination within each replicon and among replicons?

METHODS

Strains and DNA isolation. The 49 isolates of S. meliloti analysed in this study were part of the collection used for the MLEE study reported by Eardly et al. (1990). The ETs, geographic origins and host plant species of these strains are presented in Table 1. This group contains 18 strains of ET1 and 31 strains of other ETs, with each of the 31 strains having a different ET. Dr Bert Eardly of Pennsylvania State University kindly provided us with these strains. For each isolate, the storage culture from a −70 °C freezer was first streaked onto a TY (tryptone/yeast extract) agar plate and incubated at 30 °C. For each strain, a single colony was picked to inoculate liquid LBmc broth (per litre: 10 g pancreatic digest of casein, 5 g NaCl, 5 g yeast extract, 2.5 mM MgSO4 and 2.5 mM CaCl2, pH 7). Cells were incubated at 30 °C with constant agitation at 120 r.p.m. and harvested by centrifugation when the population density reached an OD600 of 0.8–1.0. Genomic DNA was extracted using a method previously described for S. meliloti (Guo et al., 2005). The quantity and quality of DNA were assessed using the UltraSpec 2000 pro spectrophotometer (Fisher Scientific).

Primers, PCR, and DNA sequencing. The 16 genes analysed here were randomly picked from diverse regions of the genome of strain Rm1021. Here, we assumed that strain Rm1021 had a genome structure typical of S. meliloti, and that the 16 genes analysed on different replicons of Rm1021 were on corresponding replicons in other strains (Sobral et al., 1991).
The primers were designed based on the genome sequence of strain Rm1021 (http://bioinfo.genopole-toulouse.prd.fr/annotation/iANT/bacteria/rhime/). The gene names, primer sequences, and their genomic locations are shown in Fig. 1 and Table 2. The information in Fig. 1 and Table 2 is all based on the sequenced laboratory strain Rm1021 (Galibert et al., 2001). For the SmbExoF3 gene, two primer pairs were used, with 44 strains amplified by the ExoF3-1 primer pair, and five strains (M56, M275, N6B9, CC2003 and M294) amplified by the ExoF3-2 primer pair. The change of primers was necessary because the initial ExoF3-1 primer pair did not work for these five strains. PCR for all other genes was successful for all strains using a single primer pair each (Table 2). A typical PCR reaction contained 6 µl diluted genomic DNA template (~20 ng), 0.5 U Taq DNA polymerase, 1 µM each primer and 200 µM of each of the four deoxyribonucleotide triphosphates, in a total volume of 30 µl. The following PCR conditions were used for all amplifications: 4 min at 95 °C, followed by 30 cycles of 30 s at 95 °C, 30 s at 56 °C (50 °C for ExoF3-2) and 45 s at 72 °C, and finally 7 min at 72 °C. After confirmation of the PCR products by agarose gel electrophoresis, PCR products were cleaned using the DiaMed PCR cleanup kit according to the manufacturer's manual. The purified PCR products were then sequenced (Mobix Laboratory, McMaster University, Canada) using an Applied BioSystems Prism 3100 automated sequencer with dRhodamine-labelled terminators (PE Applied BioSystems), following the manufacturer's instructions.

Table 1. Strains of S. meliloti used in this study

Strain | ET* | Original host species (genus Medicago) | Country of origin
ATCC 9930 | 1 | M. sativa | USA
V-7 | 1 | M. sativa | Canada
102F28 | 1 | M. sativa | USA
U45 | 1 | M. sativa | Uruguay
Rm1021 (RCR2011) | 1 | M. sativa | Australia
L5-30 | 1 | M. sativa | Poland
41 | 1 | M. sativa | Hungary
Sa10 | 1 | M. sativa | France
M124 | 1 | Unspecified | Syria
M101 | 1 | M. truncatula | Syria
M94 | 1 | M. truncatula | Syria
M68 | 1 | M. polymorpha | Syria
M44 | 1 | M. minima | Syria
M11 | 1 | M. rigidula | Syria
M6 | 1 | M. rigidula | Syria
M5 | 1 | M. rigidula | Syria
N4A6 | 1 | M. sativa | Nepal
N4A3 | 1 | M. sativa | Nepal
M56 | 2 | M. rotata | Syria
A145 | 3 | M. sativa | Syria
M98 | 4 | M. rotata | Syria
M275 | 5 | M. rigidula | Jordan
M270 | 6 | M. truncatula | Jordan
102F85 | 7 | M. sativa | Canada
74B3 | 8 | M. sativa | Pakistan
N6B1 | 9 | M. falcata | Nepal
M95 | 10 | M. rotata | Syria
128A7 | 11 | M. sativa | Pakistan
56A14 | 12 | M. sativa | Pakistan
M286 | 15 | M. rotata | Jordan
M289 | 16 | M. truncatula | Jordan
15B4 | 17 | M. sativa | Pakistan
128A10 | 18 | M. sativa | Pakistan
N6B5 | 19 | M. falcata | Nepal
N6B11 | 20 | M. falcata | Nepal
17B6 | 21 | M. sativa | Pakistan
N6B9 | 22 | M. falcata | Nepal
1322 | 23 | M. sativa | New Zealand
CC2003 | 24 | M. sativa | Australia
N6B4 | 25 | M. sativa | Nepal
M248 | 26 | M. polymorpha | Jordan
15A5 | 27 | M. sativa | Pakistan
M294 | 28 | M. polymorpha | Jordan
M119 | 29 | Unspecified | Syria
S33 | 30 | M. sativa | USA
102F51 | 31 | M. sativa | USA
74B4 | 32 | M. sativa | Pakistan
74B12 | 33 | M. sativa | Pakistan
74B15 | 34 | M. sativa | Pakistan

*ETs refer to those defined by Eardly et al. (1990).

Data analyses

Phylogenetic analysis.
The analyses of DNA sequence variation within a gene, among genes within a replicon and among replicons were performed using the programs PAUP* (Swofford, 2004) and MULTILOCUS 1.3 (Agapow & Burt, 2001). The maximum parsimony (MP) trees were obtained using heuristic searches with tree-bisection-reconnection (TBR) branch swapping and 100 starting trees obtained by random sequential addition of taxa. Searches for MP trees were conducted for each of the 16 genes, the combined gene sequences on each replicon, and the combined sequences for all 16 gene fragments. Mid-point rooting was used for all phylogenetic trees.

To compare the amount of sequence divergence among genes, the pairwise sequence difference between strains was calculated for each of the 16 genes, for the combined sequence information on each of the three replicons, and for the combined 16 genes from all three replicons using the Hasegawa–Kishino–Yano–1985 (HKY85) distance measure. The HKY85 model was chosen because it treats transitions and transversions differently, and uses observed base-substitution patterns to derive the optimal weighting scheme among various types of transitions and transversions. In addition, HKY85 does not assume equal base frequencies in the analysed genes and takes into account base frequency heterogeneity. Base frequency heterogeneity is commonly observed in rhizobial genes and genomes, including those of S. meliloti (Galibert et al., 2001). The HKY85 distances were obtained through PAUP* and then exported to Microsoft Excel for calculations of means and standard deviations. The statistical significance of differences between replicons in mean pairwise strain divergence was assessed by the non-parametric Mann–Whitney U test (a rank-order test).

Linkage disequilibrium. To examine the potential differences in linkage disequilibrium among genes and among replicons, we implemented three complementary tests.
Because of the large number of haplotypes for each gene in this collection of strains, the use of unique haplotypes as individual alleles for each sequenced gene would make two of the following three tests meaningless [i.e. the index of association (IA) and the phylogenetic incompatibility tests]. Instead, we analysed linkage disequilibria among phylogenetically informative polymorphic nucleotide sites, and treated each site as a locus and different bases at each site as alleles. The phylogenetically informative sites were then assigned to different linkage groups, as required by each individual analysis, using the program MULTILOCUS (Agapow & Burt, 2001).

In the first test, we calculated the standard, most commonly used measure of multilocus linkage disequilibrium, IA. In this test, the observed data were compared against the null hypothesis that alleles (i.e. bases) from different loci (i.e. phylogenetically informative sites) were randomly associating with each other. If the population were clonal, there would be significant linkage disequilibrium and the null hypothesis of random allelic association would be rejected. The underlying assumptions, formulae, and inferences of statistical significance of IA can be found on the MULTILOCUS program homepage (Agapow & Burt, 2001). Because the value of the traditional IA can be influenced by the number of loci analysed (typically the higher the number of loci, the higher the IA value), to make comparisons between genes and replicons easier to interpret, we standardized the IA value by the number of loci (i.e. phylogenetically informative sites). This standardized index is called Rd (Agapow & Burt, 2001).

Fig. 1. Relative positions of the 16 analysed genes on the three replicons of the model laboratory strain Rm1021.

In the second test, the proportion of pairwise loci that were phylogenetically incompatible was calculated.
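The two linkage statistics can be illustrated on toy data. The sketch below is not the MULTILOCUS implementation: it assumes the commonly used definitions IA = V_O/V_E − 1 and Rd = (V_O − Σ var_j) / (2 Σ_{j<k} √(var_j·var_k)), where V_O is the variance of the number of differing sites between strain pairs and var_j is the variance of the mismatch at site j, and it scores a pair of biallelic sites as incompatible when all four two-site genotypes are present. The function names and the toy haplotypes are hypothetical, and the randomization step used to assess significance is omitted.

```python
from itertools import combinations
from math import sqrt


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def association_indices(haplotypes):
    """Return (IA, Rd) for haploid strains scored at polymorphic sites."""
    strain_pairs = list(combinations(haplotypes, 2))
    n_sites = len(haplotypes[0])
    # Mismatch (0 or 1) at each site for every pair of strains.
    per_site = [[int(a[s] != b[s]) for a, b in strain_pairs]
                for s in range(n_sites)]
    # Total number of differing sites for every pair of strains.
    totals = [sum(column) for column in zip(*per_site)]
    v_obs = variance(totals)
    site_vars = [variance(d) for d in per_site]
    v_exp = sum(site_vars)  # expected variance under random association
    ia = v_obs / v_exp - 1
    scale = 2 * sum(sqrt(vj * vk) for vj, vk in combinations(site_vars, 2))
    rd = (v_obs - v_exp) / scale
    return ia, rd


def compatible_fraction(haplotypes):
    """Proportion of site pairs that do NOT show all four genotypes.

    Assumes every site is biallelic.
    """
    site_pairs = list(combinations(range(len(haplotypes[0])), 2))
    incompatible = sum(
        len({(h[i], h[j]) for h in haplotypes}) == 4 for i, j in site_pairs
    )
    return 1 - incompatible / len(site_pairs)


# Hypothetical toy data: two clonal lineages scored at 2 or 4 sites,
# and a sample carrying all four two-site genotypes.
clonal_2 = ["AA", "AA", "GG", "GG"]
clonal_4 = ["AAAA", "AAAA", "GGGG", "GGGG"]
mixed = ["AA", "AG", "GA", "GG"]

for name, data in [("clonal_2", clonal_2), ("clonal_4", clonal_4), ("mixed", mixed)]:
    ia, rd = association_indices(data)
    print(name, round(ia, 3), round(rd, 3), compatible_fraction(data))
```

In the two clonal toy samples IA grows with the number of sites while Rd stays the same, which is the motivation given above for standardizing IA.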
In the simplest case, in a haploid species (such as most bacteria, including S. meliloti), a phylogenetic incompatibility occurs between two loci with two alleles each when all four possible genotypes are found in the population. Phylogenetic incompatibility is an indicator of recombination at the population level. The incompatibility ratio (IR), where IR=(number of incompatible pairs of sites in the test dataset)/(number of incompatible pairs of sites in a randomly shuffled dataset), can be used to infer statistical significance. For each IR test, 1000 randomizations were performed and the 95 % confidence interval was generated and compared with the observed percentage of phylogenetically incompatible pairs of loci. In each comparison, a P value of less than 0.05 would indicate that the hypothesis of random recombination should be rejected for the population (Agapow & Burt, 2001).

The above two analyses compared the observed data against the null hypothesis of random recombination. In small populations with highly skewed allele frequencies, these tests can have a substantial type II error rate, i.e. they can fail to reject a null hypothesis that is false. Because the sample size here is relatively small (n=49) and singleton alleles are common, to minimize the type II error we also used a third complementary test, the partition homogeneity (PH) test. The PH test is also called the incongruence length difference (ILD) test. The null hypothesis for the PH test is strict clonality (Farris et al., 1994). Using phylogenies inferred from different genes, this test assesses whether genealogies from different genes are congruent. Congruent gene genealogies suggest clonality and incongruent gene genealogies indicate recombination. Specifically, when multiple genealogies are compared, the length of the shortest possible tree from the combined dataset is compared to that of observed data.
If the tree length from the observed dataset is significantly longer than that of the shortest possible tree, the genealogies are considered incongruent. In contrast, if the tree length of the observed dataset is not significantly different from that of the shortest possible tree, these genealogies are considered statistically congruent. The statistical significance of this test was derived using 100 randomizations of phylogenetically informative polymorphic nucleotides among genes within the same strain. The starting trees (100) were obtained by random sequential addition of taxa, and branch swapping was done using the tree-bisection-reconnection method. The PH test was conducted among genes within each replicon, between pairs of replicons, and among all three replicons.

Table 2. Genes and primers used in this study

Gene name | Function | Primer* | Primer sequence (5′→3′) | Position in database†
Genes on pSymA
SmaNifH | Nitrogenase Fe protein | F | CCGAACAACCGAAATAGCTTAAAC | 453 517–453 540
 | | R | AAGCATCTGCTCGTCGCTCTTCATG | 454 404–454 380
Sma1440 | 5-Dehydro-4-deoxyglucarate dehydratase | F | CGCCAGTTCCGGCACGAAATT | 794 608–794 628
 | | R | CGAGCGAAAAAACCGATGCG | 795 362–795 343
SmaFdhE | Probable FdhE formate dehydrogenase formation | F | AAGCCGAATTTGGCACGCCT | 6218–6237
 | | R | AGCCCATCAGGAACGGGTCAA | 7064–7044
Sma1821 | Conserved hypothetical protein | F | AGCAACCAACCGAAGAGGCCA | 1 033 058–1 033 078
 | | R | AGGCGCCGCCGAATTTTTTG | 1 032 458–1 032 477
Genes on pSymB
Smb20036 | Putative ABC transporter periplasmic solute-binding protein | F | GGCATGGAGAAATTCGCCGA | 46 790–46 809
 | | R | TTCCATTCCCGTCTTGCGGA | 47 508–47 527
SmbCbbR | Transcriptional regulator | F | AAGGATGGCGCAAAAGGGGA | 212 394–212 413
 | | R | TGATCGTCTCGTTCGAAGCGA | 213 158–213 138
Smb20492 | Short-chain oxidoreductase | F | CGCAACGCGTCCAATGTTGA | 510 333–510 352
 | | R | TGCCCACAACCCGAACAATG | 509 716–509 735
SmbExoF3 | Putative OMA family outer-membrane protein precursor | 1-F | TTCCTTGACGATGCCGAGCTG | 813 783–813 803
 | | 1-R | TGCAAGCTTTGCGAGCTGCA | 814 607–814 588
 | | 2-F | ACTTCCTTGACGATGCCGAG | 813 781–813 800
 | | 2-R | TTCGGCGGAGTGTTTTCCAG | 814 694–814 675
Smb21596 | Conserved hypothetical protein | F | ATCCAGCCAAATCCATCCGC | 1 137 619–1 137 638
 | | R | GTCCAATTGCTGTCGCCGAA | 1 136 909–1 136 928
SmbMinC | Putative cell-division inhibitor | F | CCCTCTAGAAGCGTCCCGTAGATATG | 1 447 462–1 447 487
 | | R | CCCCGGATCCGCTAGCAATAATTAACGAAGATG | 1 448 050–1 448 028
Genes on the chromosome
Smc00408 | Bacitracin-resistance protein, putative undecaprenol kinase | F | TCGGATTCAAATCGCCGGGA | 359 358–359 377
 | | R | AGGATGCGCCAGATCGCAAA | 359 993–360 012
SmcOxyR | Hydrogen peroxide-inducible gene activator | F | AGGCGGATATGGCGTTTGCA | 839 259–839 278
 | | R | TGGAAGAACATCTGGGCGTGA | 840 017–839 997
Smc01261 | Transporter | F | ATGGATTCCGATGACGCGGT | 1 514 466–1 514 485
 | | R | TGGTTTGCGATCCGGCATTG | 1 515 170–1 515 189
SmcAqpz1 | Aquaporin Z (bacterial nodulin-like intrinsic protein) | F | GGCACTCGAGTATGCGTCGAGCCAAGAATGATGAG | 2 339 959–2 339 993
 | | R | TTCAAGATCTGGAAGCTCTCTGTGGAATTTC | 2 340 428–2 340 398
Smc00735 | Hypothetical protein | F | ATTCGAGGCCGCGATCTTCGA | 2 846 121–2 846 141
 | | R | AGCACGAGCCGATGATGGTGA | 2 846 709–2 846 729
SmcMdh | Malate dehydrogenase | F | GCACGCGCTTCTTGTCCTTGA | 3 318 701–3 318 721
 | | R | TTCGGGGATGATTGGTGGCA | 3 318 760–3 318 741

*F, forward; R, reverse.
†http://bioinfo.genopole-toulouse.prd.fr/annotation/iANT/bacteria/rhime/

RESULTS

Haplotype variation among genes and replicons

The number of nucleotides analysed for each gene and each replicon is presented in Table 3. Among the total of 11 189 nucleotides obtained from the 16 genes, 2995 nucleotides were from the four genes on pSymA, 4170 nucleotides from the six genes on pSymB, and 4024 nucleotides from the six genes on the chromosome.
For the 49 strains, the number of unique sequence types (haplotypes) varied widely among the 16 sequenced gene fragments, from six to 24, with a mean of 13.5 haplotypes per gene fragment (Table 3). Overall, more haplotypes were found for genes on the two megaplasmids pSymA (a mean of 18.5 haplotypes per gene) and pSymB (a mean of 14 haplotypes per gene) than for those on the chromosome (a mean of 9.7 haplotypes per gene). Among the 49 strains, the combined gene sequence analyses identified 43 unique haplotypes for pSymA, 46 haplotypes for pSymB, and 34 haplotypes for the chromosome (Fig. 2, Table 3). Analysis of the 16 genes together showed that each of the 49 strains had a unique multilocus genotype (Fig. 3).

In this collection of strains, similar genotypic diversities were observed between strains of ET1 (n=18 strains) and those of other ETs (n=31 strains). Based on the sequences from pSymA, we identified a total of 15 unique genotypes for the 18 ET1 strains and 29 genotypes for the 31 strains of other ETs. While a slightly higher genotypic diversity was observed for the non-ET1 sample based on pSymB sequences (31 genotypes for 31 strains) than for the ET1 sample (16 genotypes for 18 strains), the reverse was true for the chromosomal genes. Specifically, for the combined six genes on the chromosome, 16 genotypes were found for the 18 ET1 strains, while 20 genotypes were found for the 31 strains with other ETs. Of interest are two strains, 74B3 (ET8) and N4A3 (ET1), which showed identical pSymA and pSymB sequences, but had slightly different genotypes based on chromosomal gene sequences (Figs 2 and 3). These two strains differed at one base each for two of the six sequenced genes, Smc01261 and SmcOxyR. Based on the combined DNA sequences from all 16 genes, each of the 49 strains had a unique genotype.
Because the genotypic diversity among the ET1 strains was similar to that of non-ET1 strains, in the following analysis we did not treat them separately but analysed all 49 strains together.

Table 3. Molecular variation within and among genes in natural strains of S. meliloti

Gene name | Number of nucleotides analysed (bp) | Number of unique alleles | No. of MP trees | Tree length | CI* | Pairwise HKY85 distance†
Sma1440 | 721 | 24 | 232 | 52 | 0.865 | 0.753±0.532
Sma1821 | 621 | 12 | 2 | 25 | 0.920 | 0.807±0.755
SmaFdhE | 810 | 24 | 4 | 92 | 0.924 | 2.501±2.078
SmaNifH | 843 | 14 | 150 | 76 | 0.882 | 1.736±2.211
pSymA genes combined | 2995 | 43 | 6144 | 325 | 0.677 | 1.498±1.066
Smb20036 | 738 | 16 | 96 | 34 | 0.853 | 0.888±0.721
Smb20492 | 637 | 17 | 6 | 24 | 1.000 | 0.353±0.249
Smb21596 | 699 | 15 | 6 | 92 | 0.946 | 1.104±1.801
SmbCbbR | 729 | 15 | 35 | 24 | 0.833 | 0.629±0.525
SmbExoF3 | 793 | 13 | 6 | 145 | 0.959 | 4.109±5.813
SmbMinC | 574 | 8 | 1 | 16 | 1.000 | 0.382±0.683
pSymB genes combined | 4170 | 46 | 576 | 479 | 0.658 | 1.285±1.109
Smc00408 | 655 | 6 | 1 | 17 | 1.000 | 0.538±0.627
Smc00735 | 806 | 18 | 12 | 51 | 0.980 | 0.615±0.685
Smc01261 | 724 | 9 | 6 | 12 | 0.917 | 0.325±0.223
SmcAqpz1 | 435 | 8 | 1 | 8 | 1.000 | 0.157±0.164
SmcMdh | 693 | 8 | 4 | 11 | 0.909 | 0.170±0.233
SmcOxyR | 711 | 9 | 1 | 14 | 1.000 | 0.200±0.256
Chromosome genes combined | 4024 | 34 | 5 | 133 | 0.827 | 0.349±0.258
All 16 genes combined | 11 189 | 49 | 8 | 1179 | 0.547 | 1.001±0.513

*CI, consistency index. The "No. of MP trees", "Tree length" and "CI" columns are the results of the phylogenetic analysis.
†Values shown are mean±SD (×10⁻²).

[Fig. 2(a). MP tree based on combined pSymA sequences.]

Mean sequence divergence between strains

The mean pairwise HKY85 distance between strains was calculated for each gene, each replicon and the whole genome, based on available sequences (Table 3, last column). The smallest was found for the SmcAqpz1 gene (mean±SD=0.00157±0.00164; n=1176 pairwise comparisons) located on the chromosome, and the largest was found for the SmbExoF3 gene (mean±SD=0.04109±0.05813; n=1176) located on pSymB, corresponding to a difference of over 26-fold. In general, genes on the chromosome showed significantly less among-strain divergence than genes on pSymA and pSymB. When genes on the same replicon were considered together, the mean divergence for genes on pSymA (mean=0.01498; SD=0.01066) was over four times greater than that for chromosomal genes (mean=0.00349; SD=0.00258). Similarly, the mean strain divergence for genes on pSymB (mean=0.01285; SD=0.01109) was over three times greater than that of the chromosomal genes.
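As a rough arithmetic check on these comparisons, the fold differences and the rank-based U statistic for the replicon contrasts can be recomputed from the per-gene means listed in Table 3. This is a minimal sketch, not the authors' spreadsheet: the helper `u_statistic` is hypothetical, it counts U as the number of gene pairs in which the first replicon's gene is the more divergent (ties counted as one half, an assumed convention), and it does not compute P values.

```python
from math import comb

# Per-gene mean pairwise HKY85 distances (x 10^-2), copied from Table 3.
psyma = [0.753, 0.807, 2.501, 1.736]
psymb = [0.888, 0.353, 1.104, 0.629, 4.109, 0.382]  # 4.109 is SmbExoF3
chromosome = [0.538, 0.615, 0.325, 0.157, 0.170, 0.200]


def u_statistic(x, y):
    """Count (x, y) pairs with x > y; a tie contributes one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Number of strain pairs among 49 strains.
n_pairs = comb(49, 2)

# Fold differences quoted in the text (values x 10^-2 from Table 3).
fold_gene = 4.109 / 0.157    # SmbExoF3 versus SmcAqpz1
fold_psyma = 1.498 / 0.349   # pSymA versus chromosome, combined genes
fold_psymb = 1.285 / 0.349   # pSymB versus chromosome, combined genes

# Same pSymB list with the highly divergent SmbExoF3 gene left out.
psymb_no_exof3 = [d for d in psymb if d != 4.109]

print("strain pairs:", n_pairs)
print("fold differences:", round(fold_gene, 2), round(fold_psyma, 2), round(fold_psymb, 2))
print("U pSymA vs chromosome:", u_statistic(psyma, chromosome))
print("U pSymB vs chromosome:", u_statistic(psymb, chromosome))
print("U pSymA vs pSymB:", u_statistic(psyma, psymb))
print("U pSymB (no ExoF3) vs chromosome:", u_statistic(psymb_no_exof3, chromosome))
```

Because every pSymA gene is more divergent than every chromosomal gene in Table 3, the pSymA versus chromosome U reaches its maximum possible value of 4 × 6 = 24.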
The non-parametric Mann–Whitney U test indicated that while the mean divergence for the sequenced pSymA genes was not significantly different from that of those on pSymB (U=16, one-tailed P=0.238), both showed significantly greater divergences than those on the chromosome. Specifically, the Mann–Whitney U values were 24 and 32 between pSymA and the chromosome, and between pSymB and the chromosome, with P values of 0.005 and 0.013, respectively. Because the gene SmbExoF3 from pSymB showed an abnormally high divergence, we also tested the differences between replicons excluding SmbExoF3. However, the conclusions of the tests were the same with or without the SmbExoF3 gene. Without the SmbExoF3 gene, the Mann–Whitney U value between pSymA and pSymB was 16, with a P value of 0.095, indicating no significant difference. Similarly, excluding the SmbExoF3 gene, the U value between pSymB and the chromosome was 26, with a P value of 0.026, indicating a significant difference and consistent with the comparison that included the SmbExoF3 gene.

[Fig. 2(b). MP tree based on combined pSymB sequences.]

[Fig. 2(c). MP tree based on combined chromosomal sequences.]

Fig. 2. Single representative MP trees for the combined DNA sequences from each of the three replicons. (a) pSymA, (b) pSymB, (c) chromosome. Mid-point rooting was used for all trees. Scale bars represent the number of nucleotide substitutions, and the branch lengths are proportional to the amount of sequence divergence. Bootstrap values greater than 90 % are shown in each tree.

Linkage disequilibria and IR tests

To investigate the relationships among alleles at polymorphic nucleotide sites, we analysed all phylogenetically informative sites (i.e. nucleotide sites that had at least two alleles with each present in at least two strains) using the MULTILOCUS software. Specifically, we calculated the overall IA among alleles at different sites and obtained the percentage of pairwise sites that were phylogenetically incompatible. Within each of the 16 genes, we found no evidence of phylogenetic incompatibility or recombination (data not shown). Therefore, our focus here is on comparing the associations among phylogenetically informative sites between genes on the same replicon and between replicons. To achieve these goals, we performed two sets of tests. The first computed the allelic association between all pairwise gene combinations within each of the three replicons, and the second computed the overall association among all genes within each replicon. The summary results of the analyses are presented in Table 4. Below is a brief overview of the results. Of the six pairwise combinations for genes on pSymA, results from three pairs consistently rejected the null hypothesis of random association among alleles by both the IR test and the IA test.
The hypothesis of random association for two other combinations was rejected by one test each, and both tests failed to reject one combination (i.e. the gene pair Sma1440 versus SmaNifH) (Table 4). Interestingly, all three pairs that showed signatures of random association involved gene Sma1440. Analysing all four genes together, the null hypothesis of random association among polymorphic nucleotide sites was rejected by both methods (Table 4). Overall, the results provided evidence of clonality with localized recombination (most likely involving gene Sma1440) among the four sequenced genes on pSymA.

While a similar pattern to that of pSymA was observed for the six genes on the chromosome (Table 4), a different pattern overall was seen for genes on pSymB. The combined analysis of all phylogenetically informative polymorphic sites among genes on pSymB failed to reject the null hypothesis of random recombination, indicating a population structure not significantly different from random association (Table 4). However, it should be pointed out that significant linkage disequilibria were seen in about one-third of the pairwise gene combinations in pSymB in each of the two tests, a result indicative of some localized clonality on pSymB.

Table 4. IA and phylogenetic incompatibility in S. meliloti

Genomic region   Gene pair                                    PrC†       Rd‡
pSymA            Sma1440/Sma1821                              0.747      0.122*
                 Sma1440/SmaFdhE                              0.831**    0.215
                 Sma1440/SmaNifH                              0.757      0.292
                 Sma1821/SmaFdhE                              0.846**    0.312**
                 Sma1821/SmaNifH                              0.719*     0.452**
                 SmaFdhE/SmaNifH                              0.772**    0.289**
                 All four genes on pSymA combined             0.719**    0.204**
pSymB            Smb20036/Smb20492                            0.877*     0.139
                 Smb20036/Smb21596                            0.767      0.146*
                 Smb20036/SmbCbbR                             0.779*     0.209**
                 Smb20036/SmbExoF3                            0.883*     0.433
                 Smb20036/SmbMinC                             0.773      0.226**
                 Smb20492/Smb21596                            0.883      0.205
                 Smb20492/SmbCbbR                             0.728**    0.216
                 Smb20492/SmbExoF3                            0.94       0.572
                 Smb20492/SmbMinC                             0.766      0.274
                 Smb21596/SmbCbbR                             0.778      0.182
                 Smb21596/SmbExoF3                            0.788      0.419
                 Smb21596/SmbMinC                             0.782      0.225**
                 SmbCbbR/SmbExoF3                             0.926*     0.507
                 SmbCbbR/SmbMinC                              0.56       0.300*
                 SmbExoF3/SmbMinC                             0.959*     0.508
                 All six genes on pSymB combined              0.723      0.244
Chromosome       Smc00408/Smc00735                            0.744      0.308**
                 Smc00408/Smc01261                            0.868**    0.257**
                 Smc00408/SmcAqpz1                            0.83       0.281
                 Smc00408/SmcMdh                              0.789      0.317**
                 Smc00408/SmcOxyR                             0.783      0.225
                 Smc00735/Smc01261                            0.812**    0.225
                 Smc00735/SmcAqpz1                            0.87       0.299
                 Smc00735/SmcMdh                              0.895      0.401**
                 Smc00735/SmcOxyR                             0.824      0.285*
                 Smc01261/SmcAqpz1                            0.924**    0.041
                 Smc01261/SmcMdh                              0.879*     0.089
                 Smc01261/SmcOxyR                             0.78       0.17
                 SmcAqpz1/SmcMdh                              0.894      0.095
                 SmcAqpz1/SmcOxyR                             0.791      0.281*
                 SmcMdh/SmcOxyR                               0.847      0.274*
                 All six genes on the chromosome combined     0.768**    0.167**

†PrC, proportion of pairwise polymorphic nucleotide sites that are phylogenetically compatible. Asterisks refer to the rejection of the random recombination hypothesis between genes: *P<0.05; **P<0.01.
‡Rd, standardized IA by the number of analysed loci (i.e. phylogenetically informative nucleotide sites).

Gene genealogy analyses

We used the MP method to infer the relationships among strains for each of the 16 genes (trees not shown), as well as for the combined sequences for each of the three replicons and for all the combined gene sequences.
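For readers unfamiliar with how the length of a parsimony tree is counted in steps, the following minimal sketch applies Fitch's small-parsimony procedure to a fixed tree. It is only an illustration of the step count and is not the tree-search procedure used in this study. The strain names, the tree and the three-site alignment are hypothetical, and the tree is assumed to be binary and written as nested pairs.

```python
def fitch_steps(tree, states):
    """Minimum number of substitutions at one site on a fixed binary tree.

    tree:   nested 2-tuples whose leaves are strain names (hypothetical format)
    states: dict mapping each strain name to its nucleotide at this site
    """
    def visit(node):
        if isinstance(node, str):          # leaf: its observed state, no cost
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:                         # children agree: no extra step
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Total steps over all sites; alignment maps strain name to sequence."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {name: seq[j] for name, seq in alignment.items()})
        for j in range(n_sites)
    )


# Hypothetical example: four strains, three sites.
tree = (("s1", "s2"), ("s3", "s4"))
alignment = {"s1": "AAC", "s2": "AAT", "s3": "GAC", "s4": "GAT"}
print(tree_length(tree, alignment))
```

On this toy tree the first site needs one step, the invariant second site none, and the third site two, because its alleles conflict with the grouping supported by the first site. Conflict of this kind is what lengthens a tree built from combined sequences.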
The number of MP trees and the lengths of these MP trees for the 16 genes are summarized in Table 3. Fig. 2 shows three representative MP trees, one for each replicon, based on all the combined DNA sequences. Bootstrap values greater than 90 % are labelled for individual branches on the phylogenetic trees. While phylogenetic analyses of genes on each of the three replicons identified several robust clusters of strains that were shared among the three replicons (Figs 2 and 3), overall, the PH test identified limited evidence for phylogenetic congruence among genes within each of the three replicons as well as between replicons (Table 5). All combined MP trees had lengths significantly longer than the summed lengths of individual gene trees (Table 5). The results of the PH tests are consistent with recombination among the analysed genes on the three replicons.

[Fig. 3: MP tree based on combined total sequence]

Fig. 3. One of eight MP trees for the combined DNA sequence from all 16 sequenced genes. Mid-point rooting was used for this tree. The scale bar represents the number of nucleotide substitutions, and the branch lengths are proportional to the amount of sequence divergence. Bootstrap values greater than 90 % are shown.

In addition, we observed incongruence among the phylogenetic trees inferred from the three different replicons. In all three pairwise PH tests among the three replicons, the lengths of the combined MP trees were significantly longer than the summed length of MP trees based on individual replicons (Table 5). Thus, the PH tests rejected the null hypotheses of congruence among MP trees from the sequences of the three replicons. The significantly incongruent phylogenies among the three replicons provided evidence of recombination among the replicons in nature.

Our genealogical analyses identified some geographic or host-specific clusters of strains. For example, five strains (56A14, 15B4, 74B4, 128A10 and 74B12), each with a different ET, originally isolated from M. sativa plants in Pakistan, were consistently clustered together in all phylogenies, based on sequences derived from pSymA, pSymB and the chromosome (Fig. 2). Another strain from Pakistan, 74B15, showed clustering with the above five strains based on the pSymA and chromosomal phylogenies but had a different clustering pattern on the pSymB phylogeny (Fig. 2). Despite the existence of such clusters, overall, there was little consistent pattern of host species- or geographic location-based strain relationships across the three replicons. Furthermore, strains of ET1 were not clustered together in any of the replicon-based MP trees (Fig. 2) or the combined phylogenetic tree that included all the sequences (Fig. 3).
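The between-replicon comparisons can be followed as simple arithmetic on the step counts reported in Table 5. The sketch below copies those counts and checks two things: that each "length of original partition" equals the sum of the separate replicon tree lengths, and how many extra steps the combined tree requires. The variable and function names are hypothetical, and the randomization step that yields the P values is not reproduced here.

```python
# Lengths (steps) of the combined-sequence MP tree for each replicon,
# taken from the "Length of observed MP trees" column of Table 5.
replicon_tree_length = {"pSymA": 325, "pSymB": 479, "chromosome": 133}

# Between-replicon rows of Table 5:
# (length of original partition, length of observed MP tree).
between_replicons = {
    ("pSymA", "pSymB"): (804, 968),
    ("pSymA", "chromosome"): (458, 507),
    ("pSymB", "chromosome"): (612, 695),
    ("pSymA", "pSymB", "chromosome"): (937, 1179),
}


def extra_steps(partition_length, observed_length):
    """Steps added when the partitions are forced onto one tree."""
    return observed_length - partition_length


def partition_matches_sum(replicons, partition_length):
    """True if the partition length is the sum of the separate replicon trees."""
    return partition_length == sum(replicon_tree_length[r] for r in replicons)


summary = {
    replicons: (partition_matches_sum(replicons, part), extra_steps(part, obs))
    for replicons, (part, obs) in between_replicons.items()
}

for replicons, (sums_match, extra) in summary.items():
    print(" vs ".join(replicons), "| sums match:", sums_match, "| extra steps:", extra)
```

In every comparison the combined tree is longer than the sum of the separate trees, which is the pattern the PH test evaluates against randomized partitions.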
DISCUSSION

In this study, we sequenced fragments of 16 genes distributed widely throughout the genome of the model laboratory strain Rm1021 of the nitrogen-fixing bacterium S. meliloti. For each of the 49 strains, we analysed a total of 11 189 nucleotides, representing about 0.167 % of the whole genome of Rm1021. Our analyses revealed a diverse range of molecular divergence among genes (HKY85 distance ranges between 0.00157 and 0.04109) and among replicons (HKY85 distance ranges between 0.00349 and 0.01498). Among the sequenced genes, on average, those on the megaplasmid pSymA showed the highest sequence divergence, followed by genes on pSymB, with genes on the chromosome showing the lowest divergence. Our multilocus linkage disequilibrium analyses using IA identified two replicons, pSymA and the chromosome, having overall clonal population structures. However, limited recombination was also noted for these two replicons based on other tests. In contrast, pSymB showed an overall structure not significantly different from random recombination. Our phylogenetic analyses identified very limited geographic, host species-specific or MLEE-based patterns of sequence variation.

Most MLST studies of prokaryotes and eukaryotic microbes conducted so far have examined four to seven genes with a total of 2–5 kb of sequence (Cooper & Feil, 2004; Maiden et al., 1998). In these studies, to ensure amplification of the genes from all strains, the chosen genes have typically been conserved house-keeping genes, encoding essential cellular functions. In contrast, our criteria were to provide a broad coverage of the physical locations of the genes on the three replicons. Prior to the analysis of these genes, we had no knowledge about the level of DNA sequence variation among strains for any of these 16 gene fragments. As a result, we believe that the large number of genes analysed and the randomness in our selection process ensure that our data are representative of the genome of S.
meliloti and that the data here provide a realistic assessment of the genetic variation among strains. As expected, we found that the variation in mean sequence divergence among genes in this study was greater (a 26-fold difference between the most variable gene SmbExoF3 and the least variable gene SmcAqpz1) than those in most other studies. For example, the sequenced loci in the human pathogen Neisseria meningitidis show a maximum sixfold difference between genes in their mean sequence divergence among strains, typical of most MLST studies of prokaryotes (Cooper & Feil, 2004; Seifert & DiRita, 2006).

Table 5. Summary results of the PH test
Values shown are the number of steps in each analysis.

Analysed sequence combinations          Length of            Length of observed   Lengths of trees based    P
                                        original partition   MP trees             on randomized dataset
Among genes within pSymA                245                  325                  303–320                   <0.001
Among genes within pSymB                335                  479                  444–481                   <0.001
Among genes on the chromosome           113                  133                  122–137                   <0.001
pSymA versus pSymB                      804                  968                  952–975                   <0.001
pSymA versus chromosome                 458                  507                  491–512                   <0.001
pSymB versus chromosome                 612                  695                  675–698                   <0.001
pSymA versus pSymB versus chromosome    937                  1179                 1150–1189                 <0.001
All 16 genes separately                 693                  1179                 1038–1195                 <0.001

Compared to genes on the two megaplasmids, genes on the chromosome showed a lower divergence level among strains. While the exact mechanism(s) for these differences is unknown at present, there are two possibilities. The first is that genes on pSymA and pSymB may be under significantly fewer functional constraints than those on the chromosome. Therefore, genes on the two megaplasmids might be more prone to the accumulation of mutations.
Laboratory studies have shown that significant portions of the genes on pSymA and pSymB can be deleted with few or no fitness consequences (Charles & Finan, 1991; Oresnik et al., 2000), consistent with the lack of functional constraints for genes on these two megaplasmids. The second hypothesis is that genes on pSymA and pSymB might be under positive selection. Signatures of positive selection have been detected for the nodulation receptor kinase (NORK) gene in one of the host plants of S. meliloti, Medicago truncatula (De Mita et al., 2006). It is possible that many genes on pSymA and pSymB are highly niche-specific and that their divergence is associated with host specialization and/or frequent niche switching. It is interesting to note that the gene with the highest mean divergence, SmbExoF3, is located on pSymB and that it encodes a putative outer-membrane protein. Its high rate of divergence might have significant functional implications, e.g. in host recognition and niche specialization. However, a preliminary analysis found little evidence of positive Darwinian selection for this gene, as the ratio of non-synonymous substitutions to synonymous substitutions within the sequenced SmbExoF3 gene fragment was significantly lower than 1 (data not shown). More robust analyses with additional sequences from closely related bacterial species such as Sinorhizobium medicae and Sinorhizobium fredii, as well as their interacting genes from a variety of host species, are needed to critically evaluate these hypotheses.

This study used three different tests to infer allelic associations among polymorphic nucleotide sites within a replicon as well as among replicons: the IA test, the IR test and the PH test. Overall, the three analytical methods showed similar results. However, minor differences and inconsistencies were found (Tables 3 and 4). The inconsistencies likely resulted from differences among the tests themselves.
For example, the three methods have different null models. The tests for IA and IR used the null model of complete random association, while the PH test used strict clonality as the null hypothesis (Agapow & Burt, 2001; Farris et al., 1994). In addition, all three analytical methods are highly sensitive to several factors, such as strain population size, the number of polymorphic nucleotide sites, and the frequency of individual alleles at these polymorphic sites. The number of strains used in this study was relatively small (n=49), and many polymorphic sites had highly skewed allele frequencies, thus potentially contributing to the observed inconsistencies among the analytical methods.

Our analyses provided evidence of recombination between genes within all three replicons, with genes on pSymB showing overall linkage equilibrium based on IA. These results suggested that genetic exchange and recombination have played a significant role in natural populations of S. meliloti. Using a whole-replicon nearest-neighbour analysis, Wong & Golding (2003) have shown that pSymB in strain Rm1021 has a complex evolutionary history, with closest sequence matches coming from diverse groups of organisms. Their analysis is consistent with our observation of frequent recombination for genes on pSymB. However, their theoretical study analysed only one replicon from a single strain; the potential differences among strains and replicons in S. meliloti were not analysed (Wong & Golding, 2003).

We observed differences in the overall allelic associations among the three replicons. While the exact mechanisms are unknown, several non-mutually exclusive mechanisms could help explain the observed differences in the degree of recombination among the replicons. In the first hypothesis, there might be intrinsic differences in the rate of recombination (DNA-strand breakage and repair) among the three replicons.
These breakages and repairs may be related to the activity of insertion sequence (IS) elements and phage-like elements among the three replicons. In the genome of strain Rm1021, abundant IS elements and phage-like elements are found in all three replicons (Galibert et al., 2001). In the second hypothesis, there might be frequent loss and gain of megaplasmids (or portions of megaplasmids) among strains in natural populations. Indeed, the entire megaplasmid pSymA and large portions of megaplasmid pSymB can be deleted from the genome of S. meliloti, with few or no fitness consequences, under certain laboratory conditions (Charles & Finan, 1991; Oresnik et al., 2000). Recent screening for novel DNA sequences (sequences that are absent in the genome of the model laboratory strain Rm1021 but are present in other natural strains of S. meliloti) among natural strains of S. meliloti has also identified frequent gains and losses of DNA fragments on the two megaplasmids and, to a lesser extent, on the chromosome as well (Guo et al., 2005). The frequent gains and losses could have contributed to the observed phylogenetic incongruences among the three replicons. In the third hypothesis, putative conjugative transfer genes have been found on pSymA of strain Rm1021, and these genes could potentially mediate HGT and contribute to random allelic associations among replicons (Galibert et al., 2001). HGT has been found in many natural bacterial populations, including nitrogen-fixing bacteria (Seifert & DiRita, 2006; Silva et al., 2005; Vinuesa et al., 2005; Xu, 2006). For example, in a symbiotic nitrogen-fixing bacterium, Sullivan et al. (2002) found that a 502 kb symbiosis island in Mesorhizobium loti strain R7A was transferred to a non-symbiotic Mesorhizobium strain in the soil, converting the recipient cell to a symbiont. Although no evidence for direct genetic exchange between marked strains of S.
meliloti has been found in nature, a cluster of genes encoding a type IV pilus has been found on pSymA of strain Rm1021 (Galibert et al., 2001). Type IV pili are unique structures on the bacterial surface that are found in many Gram-negative bacteria. They play important roles in adhesion to host cells, in infection by bacteriophages, and in conjugative DNA transfer among strains (Ashelford et al., 2003; Door et al., 1998).

We identified significant sequence variation among the 18 strains of ET1 (Figs 2 and 3). The combined sequence analysis revealed that each of the 18 strains had a unique multilocus genotype. While small clusters of ET1 strains were found in most phylogenetic trees (Figs 2 and 3), overall, strains of ET1 showed a broad distribution on all three replicon-based phylogenies as well as on the phylogeny based on the combined sequences of the 16 genes. Our analyses thus demonstrated the high discriminatory power of MGGA over MLEE. A similar pattern of non-clustered distribution of ET1 strains was also found in a phylogenetic tree based on the analyses of 12 novel DNA fragments (Guo et al., 2005). The patterns from both datasets (i.e. gene sequences and novel DNA distributions) are consistent with the main conclusion of our study, i.e. recombination plays a significant role in the evolution and genome structure in natural strains of S. meliloti. However, like the results obtained from MLEE analysis, the genealogical analyses of these randomly selected genes revealed limited geographic or host species-based patterns of molecular variation (Figs 2 and 3; statistical analysis not shown). Instead, the analyses of these potentially neutral markers suggested significant gene flow between geographic regions and host species.

Dispersion and migration between different geographic areas could have been brought about by wind, water, or human activities such as the widespread cultivation of the host plant alfalfa in many parts of the world. Indeed, extensive gene flow between geographic populations has been found in other symbiotic nitrogen-fixing Rhizobium species associated with agricultural crops (e.g. Moreira et al., 1998; Oyaizu et al., 1993). For example, strains of Rhizobium etli, another common nitrogen-fixing species that can form a symbiotic relationship with legumes, have been found to be capable of dispersal along with the seeds of its host plant, Phaseolus vulgaris (Perez-Ramirez et al., 1998). Similar evidence for long-distance dispersal has also been found for several other nitrogen-fixing bacteria, such as Rhizobium gallicum sensu lato (Silva et al., 2005) and species in the genus Bradyrhizobium (Vinuesa et al., 2005).

The lack of strictly host-specific clades in S. meliloti is also consistent with its lifestyle in nature. S. meliloti is not an obligate symbiont but exists mostly as a free-living bacterium in the soil. As a result, each genotype might be exposed to many potential host species. Indeed, in such situations, an obligate host specialization might be detrimental to the long-term survival of these strains in natural soil environments. Further functional analyses of these strains in diverse environments could help us determine the fitness consequences of these and other genetic variations in S. meliloti.

ACKNOWLEDGEMENTS

We are grateful to Dr Bert Eardly for generously providing us with the strains. Financial support for this study is provided by Genome Canada, the Canadian Foundation for Innovation, the Premier's Research Excellence Award (PREA), the Ontario Innovation Trust, and the Ontario Research and Challenge Development Fund.

REFERENCES

Agapow, P. M. & Burt, A. (2001). Indices of multilocus linkage disequilibrium. Mol Ecol Notes 1, 101–102.

Ashelford, K. E., Day, M. J. & Fry, J. C. (2003). Elevated abundance of bacteriophage infecting bacteria in soil. Appl Environ Microbiol 69, 285–289.

Biondi, E. G., Pilli, E., Giuntini, E. & 8 other authors (2003). Genetic relationship of Sinorhizobium meliloti and Sinorhizobium medicae strains isolated from Caucasian region. FEMS Microbiol Lett 220, 207–213.

Carelli, M., Gnocchi, S., Fancelli, S., Mengoni, A., Paffetti, D., Scotti, C. & Bazzicalupo, M. (2000). Genetic diversity and dynamics of Sinorhizobium meliloti populations nodulating different alfalfa cultivars in Italian soils. Appl Environ Microbiol 66, 4785–4789.

Charles, T. & Finan, T. M. (1991). Analysis of a 1600-kilobase Rhizobium meliloti megaplasmid using defined deletions generated in vivo. Genetics 127, 5–20.

Cooper, J. E. & Feil, E. J. (2004). Multilocus sequence typing – what is resolved? Trends Microbiol 12, 373–377.

Davies, J. E. (1994). Inactivation of antibiotics and the dissemination of resistance genes. Science 264, 375–382.

De Mita, S., Santoni, S., Hochu, I., Ronfort, J. & Bataillon, T. (2006). Molecular evolution and positive selection of the symbiotic gene NORK in Medicago truncatula. J Mol Evol 62, 234–244.

Door, J., Hurek, T. & Reinhold-Hurek, B. (1998). Type IV pili are involved in plant–microbe and fungus–microbe interactions. Mol Microbiol 30, 7–17.

Eardly, B. D., Materon, L. A., Smith, N. H., Johnson, D. A., Rumbaugh, M. D. & Selander, R. K. (1990). Genetic structure of natural populations of the nitrogen-fixing bacterium Rhizobium meliloti. Appl Environ Microbiol 56, 187–194.

Farris, J. S., Källersjö, M., Kluge, A. G. & Bult, C. (1994). Testing significance of incongruence. Cladistics 10, 315–319.

Finan, T. M., O’Brian, M. R., Layzell, D. B., Vessey, J. K. & Newton, W. (2002). Nitrogen-fixation: global perspectives. In Proceedings of the 13th International Congress on Nitrogen Fixation. Wallingford, UK: CABI Publishing.

Galibert, F., Finan, T. M., Long, S. R. & 53 other authors (2001). The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293, 668–672.

Gogarten, J. P., Doolittle, W. F. & Lawrence, J. G. (2002). Prokaryotic evolution in light of gene transfer. Mol Biol Evol 19, 2226–2238.

Guo, H., Sun, S., Finan, T. M. & Xu, J. (2005). Novel DNA sequences from natural strains of the nitrogen-fixing symbiotic bacterium Sinorhizobium meliloti. Appl Environ Microbiol 71, 7130–7138.

Hartmann, A., Giraud, J. J. & Catroux, G. (1998). Genotypic diversity of Sinorhizobium (formerly Rhizobium) meliloti strains isolated directly from a soil and from nodules of alfalfa (Medicago sativa) grown in the same soil. FEMS Microbiol Ecol 25, 107–116.

Jebara, M., Mhamdi, R., Aouani, M. E., Ghrir, R. & Mars, M. (2001). Genetic diversity of Sinorhizobium populations recovered from different Medicago varieties cultivated in Tunisian soils. Can J Microbiol 47, 139–147.

Kidd, S. E., Guo, H., Bartlett, K. H., Xu, J. & Kronstad, J. W. (2005). Comparative gene genealogies indicate that two clonal lineages of Cryptococcus gattii in British Columbia resemble strains from other geographical areas. Eukaryot Cell 4, 1629–1638.

Lan, L. & Xu, J. (2006). Multiple gene genealogical analyses suggest divergence and recent clonal dispersal in the opportunistic human pathogen Candida guilliermondii. Microbiology 152, 1539–1549.

Maiden, M. C., Bygraves, J. A., Feil, E. & 10 other authors (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A 95, 3140–3145.

Moreira, F. M., Haukka, S. K. & Young, J. P. (1998). Biodiversity of rhizobia isolated from a wide range of forest legumes in Brazil. Mol Ecol 7, 889–895.

Oresnik, I. J., Liu, S. L., Yost, C. K. & Hynes, M. F. (2000). Megaplasmid pRm2011a of Sinorhizobium meliloti is not required for viability. J Bacteriol 182, 3582–3586.

Oyaizu, H., Matsumoto, S., Minamisawa, K. & Gamou, T. (1993). Distribution of rhizobia in leguminous plants surveyed by phylogenetic identification. J Gen Appl Microbiol 39, 339–354.

Paffetti, D., Scotti, C., Gnocchi, S., Fancelli, S. & Bazzicalupo, M. (1996). Genetic diversity of an Italian Rhizobium meliloti population from different Medicago sativa varieties. Appl Environ Microbiol 62, 2279–2285.

Perez-Ramirez, N. O., Rogel, M. A., Wang, E., Castellanos, J. Z. & Martinez-Romero, E. (1998). Seeds of Phaseolus vulgaris bean carry Rhizobium etli. FEMS Microbiol Ecol 26, 289–296.

Roumiantseva, M. L., Andronov, E. E., Sharypova, L. A., Dammann-Kalinowski, T., Keller, M. & Young, J. P. (2002). Diversity of Sinorhizobium meliloti from the Central Asian Alfalfa Gene Center. Appl Environ Microbiol 68, 4694–4697.

Schofield, P. R., Gibson, A. H., Dudman, W. F. & Watson, J. M. (1987). Evidence for genetic exchange and recombination of Rhizobium symbiotic plasmids in a soil population. Appl Environ Microbiol 53, 2942–2947.

Seifert, H. S. & DiRita, V. J. (2006). Evolution of Microbial Pathogens. Washington, DC: American Society for Microbiology.

Silva, C., Vinuesa, P., Eguiarte, L. E., Souza, V. & Martínez-Romero, E. (2005). Evolutionary genetics and biogeographic structure of Rhizobium gallicum sensu lato, a widely distributed bacterial symbiont of diverse legumes. Mol Ecol 14, 4033–4050.

Sobral, B. W., Honeycutt, R. J., Atherly, A. G. & McClelland, M. (1991). Electrophoretic separation of the three Rhizobium meliloti replicons. J Bacteriol 173, 5173–5180.

Sullivan, J. T., Trzebiatowski, J. R., Cruickshank, R. W. & 11 other authors (2002). Comparative sequence analysis of the symbiosis island of Mesorhizobium loti strain R7A. J Bacteriol 184, 3086–3095.

Swofford, D. L. (2004). PAUP* 4.10b: Phylogenetic Analysis Using Parsimony and Other Methods. Sunderland, MA: Sinauer Associates.

Van Sluys, M. A., Monteiro-Vitorello, C. B., Camargo, L. E. & 7 other authors (2002). Comparative genomic analysis of plant-associated bacteria. Annu Rev Phytopathol 40, 169–189.

Vinuesa, P., Silva, C., Werner, D. & Martínez-Romero, E. (2005). Population genetics and phylogenetic inference in bacterial molecular systematics: the roles of migration and recombination in Bradyrhizobium species cohesion and delineation. Mol Phylogenet Evol 34, 29–54.

Wong, K. & Golding, G. B. (2003). A phylogenetic analysis of the pSymB replicon from the Sinorhizobium meliloti genome reveals a complex evolutionary history. Can J Microbiol 49, 269–280.

Xu, J. (2004). The prevalence and evolution of sex in microorganisms. Genome 47, 775–780.

Xu, J. (2005). Fundamentals of fungal molecular population genetic analyses. In Evolutionary Genetics of Fungi, pp. 87–116. Edited by J. Xu. Wymondham, UK: Horizon Scientific Press.

Xu, J. (2006). Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances. Mol Ecol 15, 1713–1731.

Xu, J., Vilgalys, R. & Mitchell, T. G. (2000). Multiple gene genealogies reveal recent dispersion and hybridization in the human pathogenic fungus Cryptococcus neoformans. Mol Ecol 9, 1471–1482.

Young, J. P. W. & Wexler, M. (1988). Sym plasmids and chromosomal genotypes are correlated in field populations of Rhizobium leguminosarum. J Gen Microbiol 134, 2731–2739.