Microbiology (2006), 152, 3245–3259
DOI 10.1099/mic.0.29170-0
Multiple gene genealogical analyses reveal both
common and distinct population genetic patterns
among replicons in the nitrogen-fixing bacterium
Sinorhizobium meliloti
Sheng Sun, Hong Guo and Jianping Xu
Correspondence
Jianping Xu
Center for Environmental Genomics, Department of Biology, McMaster University, 1280 Main
St West, Hamilton, ON L8S 4K1, Canada
[email protected]
Received 29 May 2006
Revised 8 August 2006
Accepted 10 August 2006
Sinorhizobium meliloti is a Gram-negative alpha-proteobacterium that can form symbiotic
relationships with alfalfa and fix atmospheric nitrogen. The complete genome of a laboratory strain,
Rm1021, was published in 2001, and the genome of this strain is arranged in three replicons: a
chromosome of 3.65 million base pairs (Mb), and two megaplasmids, pSymA (1.35 Mb) and
pSymB (1.68 Mb). However, the potential difference in genetic variation among the three replicons
in natural strains remains poorly understood. In this study, a total of 16 gene fragments were
sequenced, four from pSymA and six each from the chromosome and pSymB, for 49 natural S.
meliloti strains. The analyses identified significant differences in divergence among genes, with the
mean Hasegawa–Kishino–Yano–1985 (HKY85) distance ranging from 0.00157 to 0.04109
between pairs of strains. Overall, genes on pSymA showed the highest mean HKY85 distance,
followed by those on pSymB and the chromosome. Although evidence for recombination was
found, the authors’ population genetic analyses revealed overall significant linkage disequilibria
among genes within both pSymA and the chromosome. However, genes on pSymB were in overall
linkage equilibrium, consistent with frequent recombination among genes on this replicon.
Furthermore, the genealogical comparisons among the three replicons identified significant
incongruence, indicating reassortment among the three replicons in natural populations. The results
suggest both shared and distinct patterns of molecular evolution among the three replicons in the
genomes of natural strains of S. meliloti.
INTRODUCTION
Prokaryotes and many eukaryotic microbes propagate
primarily through asexual binary fission. As a result, in
nature, most microbial populations show evidence of
clonality. In population genetic terms, indicators of
clonality include overrepresentation of certain genotypes,
congruent phylogenies, and significant deviations from
Hardy–Weinberg equilibrium and linkage equilibrium (Xu,
2004, 2005). However, most natural microbial populations
examined so far have also shown evidence of recombination
(Xu, 2004, 2006; Seifert & DiRita, 2006). Recent gene and
genome-sequence comparisons have revealed that both
homologous recombination and horizontal gene transfer
(HGT) are ubiquitous in prokaryotic microbes (e.g.
Gogarten et al., 2002; Seifert & DiRita, 2006). One of the most commonly cited phenomena of microbial recombination and HGT is the spread of pathogenicity islands and antibiotic-resistance genes among human pathogenic microbes (e.g. Davies, 1994). Understanding the roles of clonality and recombination in the genomes of natural microbial populations has significant implications in both basic and applied aspects of microbiology (Xu, 2006; Seifert & DiRita, 2006).
Abbreviations: HGT, horizontal gene transfer; HKY85, Hasegawa–Kishino–Yano–1985; IA, index of association; IR, incompatibility ratio; Mb, million base pairs; MGGA, multiple gene genealogical analysis; MLEE, multilocus enzyme electrophoresis; MLST, multilocus sequence typing; MP, maximum parsimony; PH, partition homogeneity.
0002-9170 © 2006 SGM
Sinorhizobium meliloti is a Gram-negative alpha-proteobacterium capable of forming symbiotic relationships with
alfalfa and occasionally several other plant species (including species in the genera Medicago, Melilotus and Trigonella), and
fixing atmospheric nitrogen. This species is a model
organism for studying plant–microbe interactions and the
mechanisms of symbiotic nitrogen fixation (Finan et al.,
2002). The genome of a laboratory strain of this species,
Rm1021, has been completely sequenced (Galibert et al.,
2001). Its genome structure has been found to be similar to
that of most symbiotic nitrogen-fixing bacteria, in which
genetic information is typically partitioned into a chromosome and a variable number of large plasmids that encode
genes for symbiosis (Finan et al., 2002). Like all strains of S.
meliloti analysed so far, strain Rm1021 is found to contain
three replicons (Sobral et al., 1991; Van Sluys et al., 2002; S.
Sun & J. Xu, unpublished data). However, the sizes of these
replicons may vary among strains. For example, strain
Rm1021 has a genome size of 6.7 million base pairs (Mb),
with a 3.65 Mb chromosome, a 1.35 Mb megaplasmid
called pSymA and a 1.68 Mb megaplasmid called pSymB
(Galibert et al., 2001). In contrast, the genome size of the
type strain of S. meliloti, ATCC 9930, is about 370 kb larger
than that of strain Rm1021 (Guo et al., 2005). Strain ATCC
9930 has a 3.65 Mb chromosome, a 1.63 Mb pSymA and a
1.82 Mb pSymB (Guo et al., 2005).
Using a variety of molecular markers, such as multilocus
enzyme electrophoresis (MLEE), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and PCR-RFLP, several studies have
shown high levels of genetic diversity within natural
populations of S. meliloti (Biondi et al., 2003; Carelli et al.,
2000; Hartmann et al., 1998; Jebara et al., 2001; Paffetti et al.,
1996; Roumiantseva et al., 2002). However, little is known
about the potential similarities and differences among the
three replicons in natural strains of S. meliloti. Based on
Southern hybridization data using replicon-specific probes,
evidence for recombination has been found between
replicons in a soil population of 16 strains of a different
symbiotic nitrogen-fixing bacterium, Rhizobium leguminosarum var. trifolii (Schofield et al., 1987). However, in a
different study, correlation was found between genotypes
from the sym plasmid and the chromosome in field
populations of R. leguminosarum (Young & Wexler,
1988). At present, there is little information about the
potential replicon-specific population structure in other
prokaryotes with multiple large replicons.
Multiple gene genealogical analysis (MGGA) is a powerful
method for inferring strain relationships and for analysing
the structure of microbial populations (e.g. Kidd et al., 2005;
Lan & Xu, 2006; Silva et al., 2005; Vinuesa et al., 2005; Xu,
2005; Xu et al., 2000). MGGA is similar to the more widely
known multilocus sequence typing (MLST) (Cooper & Feil,
2004; Maiden et al., 1998). The major advantage of MGGA
over MLST is that, using MGGA, the evolutionary relationships among individual alleles are taken into account in
the inferences of relationships among strains and populations. Compared to other molecular markers, data generated
by MGGA (and MLST) are unambiguous, can be easily
stored in public databases, and are readily shared among
researchers.
In this study, we used MGGA to examine the patterns of
DNA sequence variation in a collection of natural strains of
the nitrogen-fixing bacterium S. meliloti. A total of 16 genes
distributed across the three replicons were analysed for each
of 49 strains. Eighteen strains belonged to the most frequent
MLEE type (ET), ET1, and the remaining 31 strains each had
a different ET (Eardly et al., 1990; Table 1). Using this
sample, we aimed to address the following questions. First,
how much divergence is there among strains of S. meliloti at
the individual gene level, at the replicon level, and at the
genome level? Specifically, do genes on all three replicons
show similar levels of polymorphism and divergence?
Second, will the inferred strain relationships differ from
each other depending on the genes and replicons analysed?
Specifically, do strains belonging to ET1 cluster together in
our gene genealogical analysis? Third, is there evidence for
recombination within each replicon and among replicons?
METHODS
Strains and DNA isolation. The 49 isolates of S. meliloti analysed
in this study were part of the collection used for the MLEE study
reported by Eardly et al. (1990). The ETs, geographic origins and
host plant species of these strains are presented in Table 1. This
group contains 18 strains of ET1 and 31 strains of other ETs, with
each of the 31 strains having a different ET. Dr Bert Eardly of
Pennsylvania State University kindly provided us with these strains.
For each isolate, the storage culture from a −70 °C freezer was first
streaked onto a TY (tryptone/yeast extract) agar plate and incubated at
30 °C. For each strain, a single colony was picked to inoculate liquid
LBmc broth (per litre: 10 g pancreatic digest of casein, 5 g NaCl, 5 g
yeast extract, 2.5 mM MgSO4 and 2.5 mM CaCl2, pH 7). Cells were
incubated at 30 °C with constant agitation at 120 r.p.m. and harvested
by centrifugation when the population density reached an OD600 of
0.8–1.0. Genomic DNA was extracted using a method previously
described for S. meliloti (Guo et al., 2005). The quantity and quality of
DNA were assessed using the UltraSpec 2000 pro spectrophotometer
(Fisher Scientific).
Primers, PCR, and DNA sequencing. The 16 genes analysed here
were randomly picked from diverse regions of the genome of strain
Rm1021. Here, we assumed that strain Rm1021 had a genome structure typical of S. meliloti, and that the 16 genes analysed on different
replicons of Rm1021 were on corresponding replicons in other
strains (Sobral et al., 1991). The primers were designed based on the
genome sequence of strain Rm1021 (http://bioinfo.genopole-toulouse.prd.fr/annotation/iANT/bacteria/rhime/). The gene names, primer
sequences, and their genomic locations are shown in Fig. 1 and
Table 2. The information in Fig. 1 and Table 2 is all based on the
sequenced laboratory strain Rm1021 (Galibert et al., 2001). For the
SmbExoF3 gene, two primer pairs were used, with 44 strains amplified by the ExoF3-1 primer pair, and five strains (M56, M275,
N6B9, CC2003 and M294) amplified by the ExoF3-2 primer pair.
The change of primers was necessary because the initial ExoF3-1
primer pair did not work for these five strains. PCR for all other
genes was successful for all strains using a single primer pair each
(Table 2).
A typical PCR reaction contained 6 µl diluted genomic DNA template
(~20 ng), 0.5 U Taq DNA polymerase, 1 µM each primer and
200 µM of each of the four deoxyribonucleotide triphosphates, in a
total volume of 30 µl. The following PCR conditions were used for all
amplifications: 4 min at 95 °C, followed by 30 cycles of 30 s at 95 °C,
30 s at 56 °C (at 50 °C for ExoF3-2), 45 s at 72 °C, and finally 7 min at
72 °C.
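The cycling conditions above can be written down as a small data structure. The sketch below is illustrative only (the function names are hypothetical, not taken from any thermocycler software); it sums the programmed hold times, which come to 3810 s, about 63.5 min, per run at either annealing temperature. Ramp times between temperatures depend on the instrument and are ignored.

```python
# Sketch of the thermocycling programme described above, written as data.
# Only hold times are summed; ramping between temperatures is ignored.


def pcr_program(anneal_c=56.0):
    """Return the steps as (label, temperature in degrees C, seconds, repeats)."""
    cycles = 30
    return [
        ("initial denaturation", 95.0, 4 * 60, 1),
        ("denaturation", 95.0, 30, cycles),
        ("annealing", anneal_c, 30, cycles),  # 50 degrees C for the ExoF3-2 pair
        ("extension", 72.0, 45, cycles),
        ("final extension", 72.0, 7 * 60, 1),
    ]


def total_hold_seconds(program):
    """Total time spent at the programmed temperatures."""
    return sum(seconds * repeats for _, _, seconds, repeats in program)


if __name__ == "__main__":
    for anneal in (56.0, 50.0):
        minutes = total_hold_seconds(pcr_program(anneal)) / 60
        print(f"annealing at {anneal:.0f} C: {minutes:.1f} min of hold time")
```

Changing the annealing temperature alters the programme but not its length, which is why both runs report the same hold time.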
After confirmation of the PCR products by agarose gel electrophoresis, PCR products were cleaned using the DiaMed PCR cleanup kit according to the manufacturer's manual. The purified PCR products were then sequenced (Mobix Laboratory, McMaster University, Canada) using an Applied BioSystems Prism 3100 automated sequencer with dRhodamine-labelled terminators (PE Applied BioSystems), following the manufacturer's instructions.

Table 1. Strains of S. meliloti used in this study
Strain | ET* | Original host species (genus Medicago) | Country of origin
ATCC 9930 | 1 | M. sativa | USA
V-7 | 1 | M. sativa | Canada
102F28 | 1 | M. sativa | USA
U45 | 1 | M. sativa | Uruguay
Rm1021 (RCR2011) | 1 | M. sativa | Australia
L5-30 | 1 | M. sativa | Poland
41 | 1 | M. sativa | Hungary
Sa10 | 1 | M. sativa | France
M124 | 1 | Unspecified | Syria
M101 | 1 | M. truncatula | Syria
M94 | 1 | M. truncatula | Syria
M68 | 1 | M. polymorpha | Syria
M44 | 1 | M. minima | Syria
M11 | 1 | M. rigidula | Syria
M6 | 1 | M. rigidula | Syria
M5 | 1 | M. rigidula | Syria
N4A6 | 1 | M. sativa | Nepal
N4A3 | 1 | M. sativa | Nepal
M56 | 2 | M. rotata | Syria
A145 | 3 | M. sativa | Syria
M98 | 4 | M. rotata | Syria
M275 | 5 | M. rigidula | Jordan
M270 | 6 | M. truncatula | Jordan
102F85 | 7 | M. sativa | Canada
74B3 | 8 | M. sativa | Pakistan
N6B1 | 9 | M. falcata | Nepal
M95 | 10 | M. rotata | Syria
128A7 | 11 | M. sativa | Pakistan
56A14 | 12 | M. sativa | Pakistan
M286 | 15 | M. rotata | Jordan
M289 | 16 | M. truncatula | Jordan
15B4 | 17 | M. sativa | Pakistan
128A10 | 18 | M. sativa | Pakistan
N6B5 | 19 | M. falcata | Nepal
N6B11 | 20 | M. falcata | Nepal
17B6 | 21 | M. sativa | Pakistan
N6B9 | 22 | M. falcata | Nepal
1322 | 23 | M. sativa | New Zealand
CC2003 | 24 | M. sativa | Australia
N6B4 | 25 | M. sativa | Nepal
M248 | 26 | M. polymorpha | Jordan
15A5 | 27 | M. sativa | Pakistan
M294 | 28 | M. polymorpha | Jordan
M119 | 29 | Unspecified | Syria
S33 | 30 | M. sativa | USA
102F51 | 31 | M. sativa | USA
74B4 | 32 | M. sativa | Pakistan
74B12 | 33 | M. sativa | Pakistan
74B15 | 34 | M. sativa | Pakistan
*ETs refer to those defined by Eardly et al. (1990).
Data analyses
Phylogenetic analysis. The analyses of DNA sequence variation
within a gene, among genes within a replicon and among replicons
were performed using the programs PAUP* (Swofford, 2004) and
MULTILOCUS 1.3 (Agapow & Burt, 2001). The maximum parsimony
(MP) trees were obtained using heuristic searches and the tree-bisection reconnection (TBR) branch swapping with 100 starting trees
obtained by a random sequential addition of taxa. Searches for MP
trees were conducted for each of the 16 genes, the combined gene
sequences on each replicon, and the combined sequences for all 16
gene fragments. Mid-point rooting was used for all phylogenetic
trees.
To compare the amount of sequence divergence among genes, pairwise sequence divergence between strains was calculated for each of the 16 genes, for the combined sequence information on each of the three replicons, and for the combined 16 genes from all three replicons using the Hasegawa–Kishino–Yano–1985 (HKY85) distance measure. The HKY85 model was chosen because it treats transitions and transversions differently, estimating the transition/transversion ratio from the observed substitution pattern rather than weighting all substitutions equally. In addition, HKY85 does not assume equal base frequencies in the analysed genes and thus accounts for base-frequency heterogeneity, which is commonly observed in rhizobial genes and genomes, including those of S. meliloti (Galibert et al., 2001). The HKY85 distances were obtained through PAUP* and then exported to Microsoft Excel for calculation of means and standard deviations. The statistical significance of differences between replicons in mean pairwise strain divergence was assessed by the non-parametric Mann–Whitney U test (a rank-order test).
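To illustrate how a model-based distance of this kind corrects for unequal base frequencies and for the transition/transversion difference, the sketch below computes the closed-form Tamura–Nei (1993, TN93) distance, a close relative of HKY85 (HKY85 is the special case of TN93 in which A↔G and C↔T transitions share a single rate). TN93 is a substitute chosen for brevity, not the PAUP* HKY85 calculation, and the function names are hypothetical. The second function summarizes all pairwise distances by their mean and sample standard deviation, the step that was done in Excel.

```python
import math
import statistics
from collections import Counter
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned DNA sequences.

    Sites with anything other than A, C, G or T in either sequence are skipped,
    and base frequencies are estimated from both sequences pooled.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    counts = Counter(base for pair in pairs for base in pair)
    pi = {base: counts[base] / (2 * n) for base in "ACGT"}
    pi_r, pi_y = pi["A"] + pi["G"], pi["C"] + pi["T"]
    ag, ct = pi["A"] * pi["G"], pi["C"] * pi["T"]
    if ag == 0 or ct == 0:
        raise ValueError("all four bases must be present")
    p1 = sum({a, b} == PURINES for a, b in pairs) / n       # A<->G transitions
    p2 = sum({a, b} == PYRIMIDINES for a, b in pairs) / n   # C<->T transitions
    q = sum((a in PURINES) != (b in PURINES) for a, b in pairs) / n  # transversions
    w1 = 1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r)
    w2 = 1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y)
    w3 = 1 - q / (2 * pi_r * pi_y)
    if min(w1, w2, w3) <= 0:
        raise ValueError("sequences too divergent for the TN93 correction")
    return (-2 * ag / pi_r * math.log(w1)
            - 2 * ct / pi_y * math.log(w2)
            - 2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y) * math.log(w3))


def pairwise_summary(alignment):
    """Number of strain pairs, and the mean and sample SD of their TN93 distances.

    alignment: dict mapping strain name to aligned sequence (at least 3 strains).
    """
    distances = [tn93_distance(s1, s2)
                 for s1, s2 in combinations(alignment.values(), 2)]
    return len(distances), statistics.mean(distances), statistics.stdev(distances)
```

With the 49 strains analysed here, `combinations` yields 49 × 48 / 2 = 1176 strain pairs per gene, the n reported in the Results.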
Linkage disequilibrium. To examine the potential differences in
linkage disequilibrium among genes and among replicons, we implemented three complementary tests. Because of the large number of
haplotypes for each gene in this collection of strains, the use of
unique haplotypes as individual alleles for each sequenced gene
would make two of the following three tests meaningless [i.e. the
index of association (IA) and the phylogenetic incompatibility tests].
Instead, we analysed linkage disequilibria among phylogenetically
informative polymorphic nucleotide sites, treating each site as a
locus and each base at a site as an allele. Depending on the analysis,
the phylogenetically informative sites were then grouped into linkage
groups (for example, by gene or by replicon) using the program
MULTILOCUS (Agapow & Burt, 2001).
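Selecting the sites used in these tests can be sketched as follows, using the definition given in the Results: a site is phylogenetically informative when at least two different bases each occur in at least two strains. The function name is hypothetical, and treating gaps and ambiguity codes as missing is an assumption. Each returned column is one locus; reading the columns back strain by strain gives the multilocus haplotypes that the tests operate on.

```python
from collections import Counter


def informative_sites(alignment):
    """Return (position, column) for every phylogenetically informative site.

    alignment: list of equal-length sequences, one per strain. A site is kept
    when at least two different bases each occur in at least two strains.
    Characters other than A, C, G and T are treated as missing (an assumption).
    The returned column has one character per strain, in input order.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        column = "".join(seq[pos] for seq in alignment)
        counts = Counter(base for base in column if base in "ACGT")
        # count the bases that are shared by at least two strains
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append((pos, column))
    return sites
```

For the four toy sequences ACGTA, ACGTT, TCGAA and TCCAT, positions 0, 3 and 4 are kept; position 2 is polymorphic but not informative because C occurs in only one strain.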
In the first test, we calculated the standard, most commonly used
multilocus linkage disequilibrium, IA. In this test, the observed data
were compared against the null hypothesis that alleles (i.e. bases) from
different loci (i.e. phylogenetically informative sites) were randomly
associating with each other. If the population were clonal, there would
be significant linkage disequilibrium and the null hypothesis of random
allelic association would be rejected. The underlying assumptions,
formulae and inferences of statistical significance of IA can be found
on the MULTILOCUS program homepage (Agapow & Burt, 2001).
Fig. 1. Relative positions of the 16 analysed genes on the three replicons of the model laboratory strain Rm1021.
Because the value of the traditional IA can be influenced by the number
of loci analysed (typically, the more loci, the higher the IA value), we
also calculated a standardized form of IA that is largely independent of
the number of loci (here, phylogenetically informative sites), so that
comparisons between genes and replicons are easier to interpret. This
standardized index is called Rd (Agapow & Burt, 2001).
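The two statistics can be sketched for haploid data under the commonly used definitions: IA = V_D/V_E − 1, where V_D is the variance, across all pairs of strains, of the number of loci at which the two strains differ, and V_E is the sum of the per-locus variances (the value expected if loci were independent); Rd (often written r̄d) divides the same excess variance by twice the summed geometric means of pairs of per-locus variances. This is an illustrative re-implementation with a hypothetical function name, not MULTILOCUS itself; it omits significance testing and missing-data handling.

```python
import math
from itertools import combinations
from statistics import pvariance


def ia_and_rd(haplotypes):
    """Index of association (IA) and its standardized form Rd for haploid data.

    haplotypes: equal-length strings, one per strain; character j is the allele
    (base) at locus j (here, a phylogenetically informative site).
    """
    n_loci = len(haplotypes[0])
    if n_loci < 2:
        raise ValueError("need at least two loci")
    pairs = list(combinations(haplotypes, 2))
    # per_locus[j][k] is 1 if the k-th pair of strains differs at locus j
    per_locus = [[int(x[j] != y[j]) for x, y in pairs] for j in range(n_loci)]
    # D for each pair of strains: the number of loci at which they differ
    distances = [sum(per_pair) for per_pair in zip(*per_locus)]
    v_d = pvariance(distances)
    locus_vars = [pvariance(d) for d in per_locus]
    v_e = sum(locus_vars)  # variance of D expected if loci were independent
    cross = sum(math.sqrt(a * b) for a, b in combinations(locus_vars, 2))
    if v_e == 0 or cross == 0:
        raise ValueError("need at least two polymorphic loci")
    ia = v_d / v_e - 1
    rd = (v_d - v_e) / (2 * cross)
    return ia, rd
```

For four strains carrying only two haplotypes, AAAA and TTTT, every locus is perfectly associated and the sketch returns IA ≈ 3 and Rd ≈ 1; for the four haplotypes AA, AT, TA and TT both values are −0.5, at or below the value of 0 expected under random association. Because both statistics are ratios of variances, using population or sample variances gives the same result.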
In the second test, the proportion of pairwise loci that were
phylogenetically incompatible was calculated. In the simplest case, in
a haploid species (such as most bacteria, including S. meliloti), a
phylogenetic incompatibility occurs between two loci with two alleles
each when all four possible genotypes are found in the population.
Phylogenetic incompatibility is an indicator of recombination at the
population level. The incompatibility ratio (IR), defined as IR=(number
of incompatible pairs of sites in the test dataset)/(number of
incompatible pairs of sites in a randomly shuffled dataset), can be
used to assess statistical significance. For each IR test, 1000
randomizations were performed, and the resulting 95 % confidence interval
was compared with the observed percentage of phylogenetically
incompatible pairs of loci. In each comparison, a P value of less than
0.05 would indicate that the hypothesis of random recombination should
be rejected for the population (Agapow & Burt, 2001).
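A minimal version of the pairwise incompatibility count and its randomization can be sketched as below. It assumes biallelic sites (the four-gamete case described above), builds each randomized dataset by shuffling the bases at every site independently across strains, and reports IR as the observed count divided by the mean randomized count, following the definition in the text. The function names, the seed and the exact form of the P value are assumptions, not the MULTILOCUS implementation.

```python
import random
from itertools import combinations


def incompatible(site_a, site_b):
    """Four-gamete test for two biallelic sites (one base per strain, same order)."""
    if len(set(site_a)) != 2 or len(set(site_b)) != 2:
        raise ValueError("this sketch handles biallelic sites only")
    return len(set(zip(site_a, site_b))) == 4


def count_incompatible(sites):
    """Number of pairs of sites that are phylogenetically incompatible."""
    return sum(incompatible(a, b) for a, b in combinations(sites, 2))


def incompatibility_test(sites, n_rand=1000, seed=1):
    """Return (observed count, IR, one-sided P).

    Each randomized dataset shuffles the bases at every site independently
    across strains, which destroys any association between sites. IR is the
    observed count divided by the mean randomized count; P is the fraction of
    randomized datasets with no more incompatible pairs than observed, so a
    small P means fewer incompatibilities than random recombination predicts.
    """
    rng = random.Random(seed)
    observed = count_incompatible(sites)
    null = [count_incompatible(["".join(rng.sample(s, len(s))) for s in sites])
            for _ in range(n_rand)]
    mean_null = sum(null) / n_rand
    ir = observed / mean_null if mean_null > 0 else float("nan")
    p_value = sum(x <= observed for x in null) / n_rand
    return observed, ir, p_value
```

Sites AATT and ATAT show all four combinations (AA, AT, TA, TT) and are therefore incompatible, whereas two sites that split the strains identically never are.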
The above two analyses compared the observed data against the null
hypothesis of random recombination. In small populations with highly
skewed allele frequencies, these tests can have a substantial type II
error rate, i.e. they can fail to reject a null hypothesis that is false.
Because the sample size here is relatively small (n=49) and singleton
alleles are common, to minimize the type II error we also used a third,
complementary test, the partition homogeneity (PH) test, also called the
incongruence length difference (ILD) test. The null hypothesis for the
PH test is strict clonality (Farris et al., 1994). Using phylogenies
inferred from different genes, this test asks whether genealogies from
different genes are congruent. Congruent gene genealogies suggest
clonality, and incongruent gene genealogies indicate recombination.
Specifically, the length of the most parsimonious tree for the combined
dataset is compared with the sum of the lengths of the most parsimonious
trees inferred separately for each gene. If this difference is
significantly larger than that obtained when sites are randomly
reassigned among genes, the genealogies are considered incongruent;
otherwise, they are considered statistically congruent. The statistical
significance of this test was derived from 100 randomizations of
phylogenetically informative polymorphic nucleotides among genes within
the same strain. The starting trees (100) were obtained by random
sequential addition of taxa, and branch swapping was done using the
tree-bisection reconnection method. The PH test was conducted among
genes within each replicon, between pairs of replicons, and among all
three replicons.
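The logic of the PH/ILD test can be illustrated on the smallest case in which it is meaningful: four strains, for which the three possible unrooted trees can be searched exhaustively with Fitch parsimony instead of the heuristic TBR searches used in PAUP*. The sketch below is restricted to four taxa; its names and the form of the P value (the fraction of random repartitions whose ILD is at least as large as the observed one) are assumptions.

```python
import random

# The three possible unrooted trees for four taxa (0-3), each written as its
# two cherries; the single internal edge joins the two cherries.
QUARTETS = (((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2)))


def fitch_steps(site, tree):
    """Fitch parsimony steps for one site (string of 4 bases) on one quartet."""
    steps = 0
    cherry_sets = []
    for a, b in tree:
        set_a, set_b = {site[a]}, {site[b]}
        if set_a & set_b:
            cherry_sets.append(set_a & set_b)
        else:
            cherry_sets.append(set_a | set_b)
            steps += 1
    if not cherry_sets[0] & cherry_sets[1]:
        steps += 1
    return steps


def mp_length(sites):
    """Length of the most parsimonious tree, found by exhaustive search."""
    return min(sum(fitch_steps(s, tree) for s in sites) for tree in QUARTETS)


def ild(partitions):
    """Incongruence length difference: combined length minus summed gene lengths."""
    combined = [s for part in partitions for s in part]
    return mp_length(combined) - sum(mp_length(part) for part in partitions)


def ild_test(partitions, n_rand=100, seed=0):
    """Observed ILD and the fraction of random repartitions with ILD >= observed."""
    rng = random.Random(seed)
    observed = ild(partitions)
    sizes = [len(part) for part in partitions]
    pooled = [s for part in partitions for s in part]
    hits = 0
    for _ in range(n_rand):
        rng.shuffle(pooled)  # reassign sites among genes, keeping gene sizes
        repartitioned, start = [], 0
        for size in sizes:
            repartitioned.append(pooled[start:start + size])
            start += size
        hits += ild(repartitioned) >= observed
    return observed, hits / n_rand
```

For example, three sites grouping strains 0 and 1 (AACC) in one gene and three sites grouping strains 0 and 2 (AGAG) in another give ILD = 3, whereas two genes whose sites all support the same grouping give ILD = 0.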
Table 2. Genes and primers used in this study
Gene name | Function | Primer* | Primer sequence (5′→3′) | Position in database†
Genes on pSymA
SmaNifH | Nitrogenase Fe protein | F | CCGAACAACCGAAATAGCTTAAAC | 453 517–453 540
 | | R | AAGCATCTGCTCGTCGCTCTTCATG | 454 404–454 380
Sma1440 | 5-Dehydro-4-deoxyglucarate dehydratase | F | CGCCAGTTCCGGCACGAAATT | 794 608–794 628
 | | R | CGAGCGAAAAAACCGATGCG | 795 362–795 343
SmaFdhE | Probable FdhE formate dehydrogenase formation | F | AAGCCGAATTTGGCACGCCT | 6218–6237
 | | R | AGCCCATCAGGAACGGGTCAA | 7064–7044
Sma1821 | Conserved hypothetical protein | F | AGCAACCAACCGAAGAGGCCA | 1 033 058–1 033 078
 | | R | AGGCGCCGCCGAATTTTTTG | 1 032 458–1 032 477
Genes on pSymB
Smb20036 | Putative ABC transporter periplasmic solute-binding protein | F | GGCATGGAGAAATTCGCCGA | 46 790–46 809
 | | R | TTCCATTCCCGTCTTGCGGA | 47 508–47 527
SmbCbbR | Transcriptional regulator | F | AAGGATGGCGCAAAAGGGGA | 212 394–212 413
 | | R | TGATCGTCTCGTTCGAAGCGA | 213 158–213 138
Smb20492 | Short-chain oxidoreductase | F | CGCAACGCGTCCAATGTTGA | 510 333–510 352
 | | R | TGCCCACAACCCGAACAATG | 509 716–509 735
SmbExoF3 | Putative OMA family outer-membrane protein precursor | 1-F | TTCCTTGACGATGCCGAGCTG | 813 783–813 803
 | | 1-R | TGCAAGCTTTGCGAGCTGCA | 814 607–814 588
 | | 2-F | ACTTCCTTGACGATGCCGAG | 813 781–813 800
 | | 2-R | TTCGGCGGAGTGTTTTCCAG | 814 694–814 675
Smb21596 | Conserved hypothetical protein | F | ATCCAGCCAAATCCATCCGC | 1 137 619–1 137 638
 | | R | GTCCAATTGCTGTCGCCGAA | 1 136 909–1 136 928
SmbMinC | Putative cell-division inhibitor | F | CCCTCTAGAAGCGTCCCGTAGATATG | 1 447 462–1 447 487
 | | R | CCCCGGATCCGCTAGCAATAATTAACGAAGATG | 1 448 050–1 448 028
Genes on the chromosome
Smc00408 | Bacitracin-resistance protein, putative undecaprenol kinase | F | TCGGATTCAAATCGCCGGGA | 359 358–359 377
 | | R | AGGATGCGCCAGATCGCAAA | 359 993–360 012
SmcOxyR | Hydrogen peroxide-inducible gene activator | F | AGGCGGATATGGCGTTTGCA | 839 259–839 278
 | | R | TGGAAGAACATCTGGGCGTGA | 840 017–839 997
Smc01261 | Transporter | F | ATGGATTCCGATGACGCGGT | 1 514 466–1 514 485
 | | R | TGGTTTGCGATCCGGCATTG | 1 515 170–1 515 189
SmcAqpz1 | Aquaporin Z (bacterial nodulin-like intrinsic protein) | F | GGCACTCGAGTATGCGTCGAGCCAAGAATGATGAG | 2 339 959–2 339 993
 | | R | TTCAAGATCTGGAAGCTCTCTGTGGAATTTC | 2 340 428–2 340 398
Smc00735 | Hypothetical protein | F | ATTCGAGGCCGCGATCTTCGA | 2 846 121–2 846 141
 | | R | AGCACGAGCCGATGATGGTGA | 2 846 709–2 846 729
SmcMdh | Malate dehydrogenase | F | GCACGCGCTTCTTGTCCTTGA | 3 318 701–3 318 721
 | | R | TTCGGGGATGATTGGTGGCA | 3 318 760–3 318 741
*F, forward; R, reverse.
†http://bioinfo.genopole-toulouse.prd.fr/annotation/iANT/bacteria/rhime/
RESULTS
Haplotype variation among genes and replicons
The number of nucleotides analysed for each gene and
each replicon is presented in Table 3. Among the total
of 11 189 nucleotides obtained from the 16 genes,
2995 nucleotides were from the four genes on pSymA,
4170 nucleotides from the six genes on pSymB, and
4024 nucleotides from the six genes on the chromosome.
For the 49 strains, the number of unique sequence types
(called a haplotype) varied widely among the 16 sequenced
gene fragments, from six to 24, with a mean of 13.5
haplotypes per gene fragment (Table 3). Overall, more
haplotypes were found for genes on the two megaplasmids
pSymA (a mean of 18.5 haplotypes per gene) and pSymB
(a mean of 14 haplotypes per gene) than for those on the
chromosome (a mean of 9.7 haplotypes per gene). Among
the 49 strains, the combined gene sequence analyses
identified 43 unique haplotypes for pSymA, 46 haplotypes
for pSymB, and 34 haplotypes for the chromosome (Fig. 2,
Table 3). Analysis of the 16 genes together showed that each
of the 49 strains had a unique multilocus genotype (Fig. 3).
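The per-replicon means quoted above follow directly from the allele counts in Table 3; the short sketch below (the dictionary and function names are illustrative) reproduces them.

```python
from statistics import mean

# Number of unique alleles (haplotypes) per gene among the 49 strains, from Table 3.
HAPLOTYPES = {
    "pSymA": {"Sma1440": 24, "Sma1821": 12, "SmaFdhE": 24, "SmaNifH": 14},
    "pSymB": {"Smb20036": 16, "Smb20492": 17, "Smb21596": 15,
              "SmbCbbR": 15, "SmbExoF3": 13, "SmbMinC": 8},
    "chromosome": {"Smc00408": 6, "Smc00735": 18, "Smc01261": 9,
                   "SmcAqpz1": 8, "SmcMdh": 8, "SmcOxyR": 9},
}


def mean_haplotypes(table):
    """Per-replicon mean, overall mean, minimum and maximum haplotypes per gene."""
    per_replicon = {rep: mean(genes.values()) for rep, genes in table.items()}
    all_genes = [n for genes in table.values() for n in genes.values()]
    return per_replicon, mean(all_genes), min(all_genes), max(all_genes)


if __name__ == "__main__":
    per_replicon, overall, lowest, highest = mean_haplotypes(HAPLOTYPES)
    for replicon, value in per_replicon.items():
        print(f"{replicon}: {value:.1f} haplotypes per gene")
    print(f"all 16 genes: {overall:.1f} (range {lowest}-{highest})")
```

The chromosomal mean is 58/6 ≈ 9.67, which the text rounds to 9.7.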
In this collection of strains, similar genotypic diversities
were observed between strains of ET1 (n=18 strains) and
those of other ETs (n=31 strains). Based on the sequences
from pSymA, we identified a total of 15 unique genotypes
for the 18 ET1 strains and 29 genotypes for the 31 strains of
other ETs. While a slightly higher genotypic diversity was
observed for the non-ET1 sample based on pSymB
sequences (31 genotypes for 31 strains) than for the ET1
sample (16 genotypes for 18 strains), the reverse was true for
the chromosomal genes. Specifically, for the combined six
genes on the chromosome, 16 genotypes were found for the
18 ET1 strains, while 20 genotypes were found for the 31
strains with other ETs. Of interest are two strains, 74B3
(ET8) and N4A3 (ET1), which showed identical pSymA and
pSymB sequences, but had slightly different genotypes based
on chromosomal gene sequences (Figs 2 and 3). These two
strains differed at one base each for two of the six sequenced
genes, Smc01261 and SmcOxyR. Based on the combined
DNA sequences from all 16 genes, each of the 49 strains had
a unique genotype. Because the genotypic diversity among
the ET1 strains was similar to that of non-ET1 strains, in the
following analysis we did not treat them separately but
analysed all 49 strains together.
Mean sequence divergence between strains
The mean pairwise HKY85 distance between strains was calculated for each gene, each replicon and the whole genome, based on available sequences (Table 3, last column). The smallest was found for the SmcAqpz1 gene (mean±SD=0.00157±0.00164; n=1176 pairwise comparisons) located on the chromosome, and the largest was found for the SmbExoF3 gene (mean±SD=0.04109±0.05813; n=1176) located on pSymB, corresponding to a difference of over 26-fold. In general, genes on the chromosome showed significantly less among-strain divergence than genes on pSymA and pSymB.

Table 3. Molecular variation within and among genes in natural strains of S. meliloti
Gene name | Number of nucleotides analysed (bp) | Number of unique alleles | No. of MP trees | Tree length | CI* | Pairwise HKY85 distance†
Sma1440 | 721 | 24 | 232 | 52 | 0.865 | 0.753±0.532
Sma1821 | 621 | 12 | 2 | 25 | 0.920 | 0.807±0.755
SmaFdhE | 810 | 24 | 4 | 92 | 0.924 | 2.501±2.078
SmaNifH | 843 | 14 | 150 | 76 | 0.882 | 1.736±2.211
pSymA genes combined | 2995 | 43 | 6144 | 325 | 0.677 | 1.498±1.066
Smb20036 | 738 | 16 | 96 | 34 | 0.853 | 0.888±0.721
Smb20492 | 637 | 17 | 6 | 24 | 1.000 | 0.353±0.249
Smb21596 | 699 | 15 | 6 | 92 | 0.946 | 1.104±1.801
SmbCbbR | 729 | 15 | 35 | 24 | 0.833 | 0.629±0.525
SmbExoF3 | 793 | 13 | 6 | 145 | 0.959 | 4.109±5.813
SmbMinC | 574 | 8 | 1 | 16 | 1.000 | 0.382±0.683
pSymB genes combined | 4170 | 46 | 576 | 479 | 0.658 | 1.285±1.109
Smc00408 | 655 | 6 | 1 | 17 | 1.000 | 0.538±0.627
Smc00735 | 806 | 18 | 12 | 51 | 0.980 | 0.615±0.685
Smc01261 | 724 | 9 | 6 | 12 | 0.917 | 0.325±0.223
SmcAqpz1 | 435 | 8 | 1 | 8 | 1.000 | 0.157±0.164
SmcMdh | 693 | 8 | 4 | 11 | 0.909 | 0.170±0.233
SmcOxyR | 711 | 9 | 1 | 14 | 1.000 | 0.200±0.256
Chromosome genes combined | 4024 | 34 | 5 | 133 | 0.827 | 0.349±0.258
All 16 genes combined | 11 189 | 49 | 8 | 1179 | 0.547 | 1.001±0.513
*CI, consistency index.
†Values shown are mean±SD (×10⁻²).

[Fig. 2(a): MP tree based on combined pSymA sequences]
When genes on the same replicon were considered together, the mean divergence for genes on pSymA (mean=0.01498; SD=0.01066) was over four times greater than that for chromosomal genes (mean=0.00349; SD=0.00258). Similarly, the mean strain divergence for genes on pSymB (mean=0.01285; SD=0.01109) was over three times greater than that of the chromosomal genes. The non-parametric Mann–Whitney U test indicated that while the mean divergence for the sequenced pSymA genes was not significantly different from that of those on pSymB (U=16, one-tailed P=0.238), both showed significantly greater divergences than those on the chromosome. Specifically, the Mann–Whitney U values were 24 and 32 between pSymA and the chromosome, and between pSymB and the chromosome, with P values of 0.005 and 0.013, respectively.

[Fig. 2(b): MP tree based on combined pSymB sequences]

Because the gene SmbExoF3 from pSymB showed an unusually high divergence, we also tested the differences between replicons excluding SmbExoF3. Excluding this gene did not change the conclusion of any test. With SmbExoF3 excluded, the Mann–Whitney U value between pSymA and pSymB was 16, with a P value of 0.095, indicating no significant difference. Similarly, with SmbExoF3 excluded, the U value between pSymB and the chromosome was 26, with a P value of 0.026, indicating a significant difference, consistent with the comparison that included the SmbExoF3 gene.
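The reported U values are consistent with a test on the per-gene mean distances listed in Table 3 (four genes on pSymA and six each on pSymB and the chromosome). The sketch below assumes exactly that, and further assumes that P is the exact one-tailed probability obtained by enumerating every way of relabelling the pooled genes; both are assumptions rather than a description of the software actually used, and the names are illustrative.

```python
from itertools import combinations

# Per-gene mean pairwise HKY85 distances (x 10^-2), copied from Table 3.
PSYMA = {"Sma1440": 0.753, "Sma1821": 0.807, "SmaFdhE": 2.501, "SmaNifH": 1.736}
PSYMB = {"Smb20036": 0.888, "Smb20492": 0.353, "Smb21596": 1.104,
         "SmbCbbR": 0.629, "SmbExoF3": 4.109, "SmbMinC": 0.382}
CHROM = {"Smc00408": 0.538, "Smc00735": 0.615, "Smc01261": 0.325,
         "SmcAqpz1": 0.157, "SmcMdh": 0.170, "SmcOxyR": 0.200}


def u_statistic(x, y):
    """Mann-Whitney U for sample x: each (x, y) pair with x > y scores 1, ties 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_one_tailed(x, y):
    """U for x and the exact one-tailed P(U >= observed) over all relabellings."""
    x, y = list(x), list(y)
    pooled = x + y
    observed = u_statistic(x, y)
    hits = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        picked = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [v for i, v in enumerate(pooled) if i not in picked]
        hits += u_statistic(group_x, group_y) >= observed
        total += 1
    return observed, hits / total


if __name__ == "__main__":
    psymb_no_exo = {k: v for k, v in PSYMB.items() if k != "SmbExoF3"}
    comparisons = [
        ("pSymA vs pSymB", PSYMA, PSYMB),
        ("pSymA vs chromosome", PSYMA, CHROM),
        ("pSymB vs chromosome", PSYMB, CHROM),
        ("pSymA vs pSymB without SmbExoF3", PSYMA, psymb_no_exo),
        ("pSymB without SmbExoF3 vs chromosome", psymb_no_exo, CHROM),
    ]
    for label, x, y in comparisons:
        u, p = exact_one_tailed(x.values(), y.values())
        print(f"{label}: U = {u:g}, one-tailed P = {p:.3f}")
```

Run as a script, it prints U = 16, 24 and 32 with one-tailed P = 0.238, 0.005 and 0.013, then U = 16 and 26 with P = 0.095 and 0.026 when SmbExoF3 is excluded, matching the values given above. With only 210 to 924 possible relabellings, exact enumeration is cheap, so no normal approximation is needed.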
Linkage disequilibria and IR tests
To investigate the relationships among alleles at polymorphic nucleotide sites, we analysed all phylogenetically
informative sites (i.e. nucleotide sites that had at least two
alleles with each present in at least two strains) using the
MULTILOCUS software. Specifically, we calculated the overall
IA among alleles at different sites and obtained the percentage
of pairwise sites that were phylogenetically incompatible.
Within each of the 16 genes, we found no evidence of
phylogenetic incompatibility or recombination (data not
shown). Therefore, our focus here is on comparing the
associations among phylogenetically informative sites
between genes on the same replicon and between replicons.
To achieve these goals, we performed two sets of tests. The
first computed the allelic association between all pairwise
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Sat, 17 Jun 2017 15:12:39
Microbiology 152
Gene genealogy in S. meliloti
(c) MP tree based on combined chromosomal sequences
90
100
91
1
Sm1021, ET1, sat, AUS
1322, ET23, sat, NZL
M119, ET29, uns, SYR
U45, ET1, sat, URY
N4A6, ET1, sat, NPL
L5-30, ET1, sat, POL
M6, ET1, rig, SYR
CC2003, ET24, sat, AUS
M275, ET5, rig, JOR
17B6, ET21, sat, PAK
N6B9, ET22, fal, NPL
M248, ET26, pol, JOR
15A5, ET27, sat, PAK
M294, ET28, pol, JOR
M124, ET1, uns, SYR
M94, ET1, tru, SYR
M44, ET1, min, SYR
M68, ET1, pol, SYR
9930, ET1, sat, USA
A145, ET3, sat, SYR
128A7, ET11, sat, PAK
N6B5, ET19, fal, NPL
N6B11, ET20, fal, NPL
V-7, ET1, sat, CAN
102F28, ET1, sat, USA
102F85, ET7, sat, CAN
41, ET1, sat, HUN
74B3, ET8, sat, PAK
N4A3, ET1, sat, NPL
M95, ET10, rot, SYR
M289, ET16, tru, JOR
M56, ET2, rot, SYR
M98, ET4, rot, SYR
91
M270, ET6, tru, JOR
M101, ET1, tru, SYR
M11, ET1, rig, SYR
M5, ET1, rig, SYR
M286, ET15, rot, JOR
N6B1, ET9, fal, NPL
N6B4, ET25, sat, NPL
Sa10, ET1, sat, FRA
S33, ET30, sat, USA
102F51, ET31, sat, USA
56A14, ET12, sat, PAK
98
15B4, ET17, sat, PAK
128A10, ET18, sat, PAK
74B4, ET32, sat, PAK
100
74B12, ET33, sat, PAK
74B15, ET34, sat, PAK
Fig. 2. Single representative MP trees for the combined DNA sequences from each of the three replicons. (a) pSymA, (b)
pSymB, (c) chromosome. Mid-point rooting was used for all trees. Scale bars represent the number of nucleotide
substitutions, and the branch lengths are proportional to the amount of sequence divergence. Bootstrap values greater than
90 % are shown in each tree.
gene combinations within each of the three replicons, and
the second computed the overall association among all genes
within each replicon. The summary results of the analyses are
presented in Table 4. Below is a brief overview of the results.
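The site filter described above (at least two alleles, each present in at least two strains) can be sketched as follows. The function name is hypothetical, and treating gaps and ambiguity codes (`-`, `N`, `?`) as missing data is an assumption of this sketch, not something the text specifies.

```python
from collections import Counter

MISSING = set("-N?")  # assumption: gaps and ambiguity codes are ignored


def informative_sites(alignment):
    """Return column indices where at least two alleles each occur in at least two strains."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] not in MISSING)
        # A singleton allele (present in one strain only) does not make a site informative.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(col)
    return sites
```

The alleles at the returned columns, read strain by strain, form the haplotypes on which linkage-disequilibrium statistics such as IA are computed.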
Of the six pairwise combinations of genes on pSymA, three consistently rejected the null hypothesis of random association among alleles in both the IR test and the IA test. The hypothesis of random association was rejected by one test each for two other combinations, and both tests failed to reject it for one combination (the gene pair Sma1440 versus SmaNifH) (Table 4). Interestingly, all three pairs that showed signatures of random association involved gene Sma1440. When all four genes were analysed together, the null hypothesis of random association among polymorphic nucleotide sites was rejected by both methods (Table 4). Overall, the results provided evidence of clonality with localized recombination (most likely involving gene Sma1440) among the four sequenced genes on pSymA.
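To make the statistics in Table 4 concrete, the sketch below computes the index of association, I_A = V_O / Σ var_j − 1, and the standardized rbarD of Agapow & Burt (2001) from haplotypes at informative sites, plus a permutation P value in which alleles are shuffled independently at each locus (the random-association null). Table 4 describes Rd only as IA standardized by the number of loci; that MULTILOCUS computes it exactly as written here, and the function names, are assumptions of this sketch. It requires at least two polymorphic loci.

```python
import random
from itertools import combinations


def _variance(values):
    """Population variance (divisor n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def index_of_association(haplotypes):
    """Return (I_A, rbarD) for equal-length haplotypes, one allele per locus.

    V_O   = variance, over all strain pairs, of the number of loci at which they differ
    var_j = the same variance computed for locus j alone
    I_A   = V_O / sum(var_j) - 1
    rbarD = (V_O - sum(var_j)) / (2 * sum over j<k of sqrt(var_j * var_k))
    """
    n_loci = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    per_locus = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    totals = [sum(col[p] for col in per_locus) for p in range(len(pairs))]
    v_obs = _variance(totals)
    var_j = [_variance(col) for col in per_locus]
    v_exp = sum(var_j)
    cross = 2 * sum((var_j[j] * var_j[k]) ** 0.5
                    for j, k in combinations(range(n_loci), 2))
    return v_obs / v_exp - 1, (v_obs - v_exp) / cross


def permutation_p_value(haplotypes, reps=999, seed=1):
    """P value for rbarD: shuffle alleles independently at each locus; the
    observed data set counts as one of the (reps + 1) data sets."""
    rng = random.Random(seed)
    observed = index_of_association(haplotypes)[1]
    columns = [list(col) for col in zip(*haplotypes)]
    hits = 1
    for _ in range(reps):
        for col in columns:
            rng.shuffle(col)
        if index_of_association(list(zip(*columns)))[1] >= observed - 1e-12:
            hits += 1
    return hits / (reps + 1)
```

On two perfectly linked loci, e.g. `[(0, 0), (0, 0), (1, 1), (1, 1)]`, both I_A and rbarD equal 1; on the four haplotypes `["AA", "AG", "GA", "GG"]` both equal -0.5. Because shuffling preserves allele frequencies, the per-locus variances (and hence the denominator of rbarD) stay fixed across permutations.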
While a similar pattern to that of pSymA was observed for the six genes on the chromosome (Table 4), a different overall pattern was seen for genes on pSymB. The combined analysis of all phylogenetically informative polymorphic sites among genes on pSymB failed to reject the null hypothesis of random recombination, indicating a population structure not significantly different from random association (Table 4). However, significant linkage disequilibria were seen in about one-third of the pairwise gene combinations on pSymB in each of the two tests, a result indicative of some localized clonality on pSymB.
Table 4. IA and phylogenetic incompatibility in S. meliloti
Genomic region / gene pair                  PrC†       Rd‡
pSymA
  Sma1440/Sma1821                           0.747      0.122*
  Sma1440/SmaFdhE                           0.831**    0.215
  Sma1440/SmaNifH                           0.757      0.292
  Sma1821/SmaFdhE                           0.846**    0.312**
  Sma1821/SmaNifH                           0.719*     0.452**
  SmaFdhE/SmaNifH                           0.772**    0.289**
  All four genes on pSymA combined          0.719**    0.204**
pSymB
  Smb20036/Smb20492                         0.877*     0.139
  Smb20036/Smb21596                         0.767      0.146*
  Smb20036/SmbCbbR                          0.779*     0.209**
  Smb20036/SmbExoF3                         0.883*     0.433
  Smb20036/SmbMinC                          0.773      0.226**
  Smb20492/Smb21596                         0.883      0.205
  Smb20492/SmbCbbR                          0.728**    0.216
  Smb20492/SmbExoF3                         0.94       0.572
  Smb20492/SmbMinC                          0.766      0.274
  Smb21596/SmbCbbR                          0.778      0.182
  Smb21596/SmbExoF3                         0.788      0.419
  Smb21596/SmbMinC                          0.782      0.225**
  SmbCbbR/SmbExoF3                          0.926*     0.507
  SmbCbbR/SmbMinC                           0.56       0.300*
  SmbExoF3/SmbMinC                          0.959*     0.508
  All six genes on pSymB combined           0.723      0.244
Chromosome
  Smc00408/Smc00735                         0.744      0.308**
  Smc00408/Smc01261                         0.868**    0.257**
  Smc00408/SmcAqpz1                         0.83       0.281
  Smc00408/SmcMdh                           0.789      0.317**
  Smc00408/SmcOxyR                          0.783      0.225
  Smc00735/Smc01261                         0.812**    0.225
  Smc00735/SmcAqpz1                         0.87       0.299
  Smc00735/SmcMdh                           0.895      0.401**
  Smc00735/SmcOxyR                          0.824      0.285*
  Smc01261/SmcAqpz1                         0.924**    0.041
  Smc01261/SmcMdh                           0.879*     0.089
  Smc01261/SmcOxyR                          0.78       0.17
  SmcAqpz1/SmcMdh                           0.894      0.095
  SmcAqpz1/SmcOxyR                          0.791      0.281*
  SmcMdh/SmcOxyR                            0.847      0.274*
  All six genes on the chromosome combined  0.768**    0.167**
†PrC, proportion of pairwise polymorphic nucleotide sites that are phylogenetically compatible. Asterisks refer to the rejection of the random recombination hypothesis between genes: *P<0.05; **P<0.01.
‡Rd, IA standardized by the number of analysed loci (i.e. phylogenetically informative nucleotide sites).
Gene genealogy analyses
We used the MP method to infer the relationships among
strains for each of the 16 genes (trees not shown), as well as
for the combined sequences for each of the three replicons
and for all the combined gene sequences. The number of MP
trees and the lengths of these MP trees for the 16 genes are
summarized in Table 3. Fig. 2 shows three representative
MP trees, one for each replicon, based on all the combined
DNA sequences. Bootstrap values greater than 90 % are
labelled for individual branches on the phylogenetic trees.
While phylogenetic analyses of genes on each of the
three replicons identified several robust clusters of
strains that were shared among the three replicons (Figs 2
and 3), overall, the PH test identified limited evidence
for phylogenetic congruence among genes within each of
the three replicons as well as between replicons (Table 5).
All combined MP trees had lengths significantly longer
than the summed lengths of individual gene trees
(Table 5). The results of the PH tests are consistent with
recombination among the analysed genes on the three
replicons.
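Bootstrap values such as those labelled in Figs 2 and 3 come from resampling alignment columns with replacement, re-inferring a tree from each replicate and recording how often each clade recurs. The sketch below shows only the resampling and the tallying; the MP tree search on each replicate (for example with a parsimony program such as PAUP*, which appears in the reference list) is deliberately left out, and the function names and the representation of clades as frozensets of strain names are assumptions.

```python
import random
from collections import Counter


def bootstrap_columns(alignment, rng):
    """One nonparametric bootstrap replicate: resample alignment columns with replacement."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def clade_support(replicate_clades):
    """Percentage of replicate trees containing each clade.

    replicate_clades: one set per replicate tree; each clade is a frozenset of strain names.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    return {clade: 100.0 * c / len(replicate_clades) for clade, c in counts.items()}


def well_supported(support, threshold=90.0):
    """Clades whose support exceeds the labelling threshold (> 90 % in Figs 2 and 3)."""
    return {clade for clade, pct in support.items() if pct > threshold}
```

A fixed `random.Random(seed)` makes the replicates reproducible, which is useful when the same resampled data sets must be fed to an external tree-search program.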
Fig. 3. One of eight MP trees for the combined DNA sequence from all 16 sequenced genes (panel titled "MP tree based on combined total sequence"; the tree itself is not reproduced here). Mid-point rooting was used for this tree. The scale bar represents the number of nucleotide substitutions, and branch lengths are proportional to the amount of sequence divergence. Bootstrap values greater than 90 % are shown.
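Mid-point rooting, used for the trees in Figs 2 and 3, places the root halfway along the longest leaf-to-leaf path of the unrooted tree. A minimal sketch, assuming the tree is given as a list of (node, node, branch length) edges with non-negative lengths (the edge-list format and function names are hypothetical): it finds the longest path with the standard two-sweep search (the node farthest from any start is one end of the longest path) and returns the branch and offset at which the root falls.

```python
from collections import defaultdict


def _farthest(adj, start):
    """Depth-first sweep from start: return (farthest node, distances, parent map)."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u].items():
            if v not in dist:
                dist[v] = dist[u] + w
                parent[v] = u
                stack.append(v)
    return max(dist, key=dist.get), dist, parent


def midpoint_root(edges):
    """Return (child, parent, offset): the root lies on branch child-parent,
    `offset` units away from `child`."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    end_a, _, _ = _farthest(adj, next(iter(adj)))   # one end of the longest path
    end_b, dist, parent = _farthest(adj, end_a)     # the other end, with the path back
    half = dist[end_b] / 2.0
    node, travelled = end_b, 0.0
    while parent[node] is not None:                 # walk from end_b towards end_a
        up = parent[node]
        w = adj[node][up]
        if travelled + w >= half:
            return node, up, half - travelled
        travelled += w
        node = up
    raise ValueError("tree has no branches")
```

For example, `midpoint_root([("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0), ("C", "Y", 5.0), ("D", "Y", 1.0)])` returns `("Y", "C", 1.0)`: the longest path runs from B to C (8 units), and its midpoint lies 1 unit from Y on the Y–C branch.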
In addition, we observed incongruence among the phylogenetic trees inferred from the three replicons. In all three pairwise PH tests among the replicons, the combined MP trees were significantly longer than the summed lengths of the MP trees based on the individual replicons (Table 5). Thus, the PH tests rejected the null hypothesis of congruence among MP trees from the sequences of the three replicons. These significantly incongruent phylogenies provided evidence of recombination among the replicons in nature.
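The PH test follows Farris et al. (1994): the summed tree lengths of the original data partitions are compared with the sums obtained after randomly repartitioning the characters, and the extra steps needed when all partitions must share one tree give the incongruence length difference (ILD). Reading the Table 5 columns this way is an assumption (the table gives only its headers), so the short script below simply recomputes the ILD from the published numbers and checks two internal consistencies: every original-partition sum lies below its randomized range, and the replicon comparisons equal sums of the replicon tree lengths. The names `TABLE5`, `REPLICON_TREE` and `ild` are hypothetical.

```python
# Rows of Table 5: (combination, length of original partition,
# length of observed MP tree, range of randomized lengths), in parsimony steps.
TABLE5 = [
    ("Among genes within pSymA", 245, 325, (303, 320)),
    ("Among genes within pSymB", 335, 479, (444, 481)),
    ("Among genes on the chromosome", 113, 133, (122, 137)),
    ("pSymA versus pSymB", 804, 968, (952, 975)),
    ("pSymA versus chromosome", 458, 507, (491, 512)),
    ("pSymB versus chromosome", 612, 695, (675, 698)),
    ("pSymA versus pSymB versus chromosome", 937, 1179, (1150, 1189)),
    ("All 16 genes separately", 693, 1179, (1038, 1195)),
]

# Combined MP tree length for each replicon, read from the "within" rows.
REPLICON_TREE = {"pSymA": 325, "pSymB": 479, "chromosome": 133}


def ild(original_partition, combined_length):
    """Incongruence length difference: extra steps when partitions share one tree."""
    return combined_length - original_partition


for name, original, combined, (low, high) in TABLE5:
    print(f"{name:40s} ILD = {ild(original, combined):4d}  "
          f"original below every randomized value: {original < low}")

# Replicon-versus-replicon partitions are sums of replicon tree lengths.
print("pSymA + pSymB =", REPLICON_TREE["pSymA"] + REPLICON_TREE["pSymB"])
```

For instance, the four pSymA genes need 80 more steps (325 - 245) on a single shared tree than on their own separate trees.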
Our genealogical analyses identified some geographic or
host-specific clusters of strains. For example, five strains
(56A14, 15B4, 74B4, 128A10 and 74B12), each with a
different ET, originally isolated from M. sativa plants in
Pakistan, were consistently clustered together in all
phylogenies, based on sequences derived from pSymA,
pSymB and the chromosome (Fig. 2). Another strain from
Pakistan, 74B15, showed clustering with the above five
strains based on the pSymA and chromosomal phylogenies
but had a different clustering pattern on the pSymB
phylogeny (Fig. 2). Despite the existence of such clusters,
overall, there was little consistent pattern of host species- or
geographic location-based strain relationships across the
three replicons. Furthermore, strains of ET1 were not
clustered together in any of the replicon-based MP trees
(Fig. 2) or the combined phylogenetic tree that included all
the sequences (Fig. 3).
DISCUSSION
In this study, we sequenced fragments of 16 genes distributed widely throughout the genome of the model laboratory strain Rm1021 of the nitrogen-fixing bacterium S. meliloti. For each of the 49 strains, we analysed a total of 11 189 nucleotides, representing about 0.167 % of the whole genome of Rm1021. Our analyses revealed a wide range of molecular divergence among genes (HKY85 distances ranging from 0.00157 to 0.04109) and among replicons (HKY85 distances ranging from 0.00349 to 0.01498). Among the sequenced genes, those on the megaplasmid pSymA showed, on average, the highest sequence divergence, followed by genes on pSymB, with genes on the chromosome showing the lowest divergence. Our multilocus linkage disequilibrium analyses using IA identified two replicons, pSymA and the chromosome, with overall clonal population structures. However, other tests also detected limited recombination on these two replicons. In contrast, pSymB showed an overall structure not significantly different from random recombination. Our phylogenetic analyses identified very limited geographic, host species-specific or MLEE-based patterns of sequence variation.
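The genome fraction quoted above can be checked from the replicon sizes given in the abstract of this article. A minimal sketch, assuming those rounded sizes (3.65, 1.35 and 1.68 Mb) are close enough to the true Rm1021 replicon lengths for this purpose; the variable names are hypothetical.

```python
# Rm1021 replicon sizes as rounded in the abstract (Mb).
REPLICON_MB = {"chromosome": 3.65, "pSymA": 1.35, "pSymB": 1.68}
SEQUENCED_NT = 11_189  # nucleotides analysed per strain

genome_nt = sum(REPLICON_MB.values()) * 1_000_000
percent = 100 * SEQUENCED_NT / genome_nt
print(f"{genome_nt:,.0f} nt in total; {percent:.4f} % sequenced")
```

The result (about 0.17 %) agrees with the value in the text to within the rounding of the replicon sizes.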
Most MLST studies of prokaryotes and eukaryotic microbes
conducted so far have examined four to seven genes with a
total of 2–5 kb of sequence (Cooper & Feil, 2004; Maiden
et al., 1998). In these studies, to ensure amplification of the
genes from all strains, the chosen genes have typically been
conserved house-keeping genes, encoding essential cellular
functions. In contrast, our criterion was broad coverage
of the physical locations of the genes on the three
replicons. Prior to the analysis of these genes, we had no
knowledge about the level of DNA sequence variation
among strains for any of these 16 gene fragments. As a result,
we believe that the large number of genes analysed and the
randomness of our selection process ensure that our data are
representative of the genome of S. meliloti and that they
provide a realistic assessment of the genetic variation
among strains. As expected, the variation in mean sequence
divergence among genes in this study was greater (a 26-fold
difference between the most variable gene, SmbExoF3, and
the least variable gene, SmcAqpz1) than in most other
studies. For example, the sequenced loci in the
human pathogen Neisseria meningitidis show a maximum
sixfold difference between genes in their mean sequence
divergence among strains, typical of most MLST studies of
prokaryotes (Cooper & Feil, 2004; Seifert & DiRita, 2006).
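The 26-fold figure follows from the extremes of the per-gene mean HKY85 distances reported earlier. A one-line check, assuming (as the text implies) that the maximum belongs to SmbExoF3 and the minimum to SmcAqpz1; the dictionary name is hypothetical.

```python
# Per-gene mean HKY85 distances at the two extremes (values from the text).
HKY85_MEAN = {"SmbExoF3": 0.04109, "SmcAqpz1": 0.00157}

fold = HKY85_MEAN["SmbExoF3"] / HKY85_MEAN["SmcAqpz1"]
print(f"{fold:.1f}-fold")
```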
Table 5. Summary results of the PH test
Values shown are the number of steps in each analysis. Columns: (1) length of original partition; (2) length of observed MP trees; (3) lengths of trees based on randomized dataset; (4) P.
Analysed sequence combinations          (1)    (2)    (3)          (4)
Among genes within pSymA                245    325    303–320      <0.001
Among genes within pSymB                335    479    444–481      <0.001
Among genes on the chromosome           113    133    122–137      <0.001
pSymA versus pSymB                      804    968    952–975      <0.001
pSymA versus chromosome                 458    507    491–512      <0.001
pSymB versus chromosome                 612    695    675–698      <0.001
pSymA versus pSymB versus chromosome    937    1179   1150–1189    <0.001
All 16 genes separately                 693    1179   1038–1195    <0.001
Compared to genes on the two megaplasmids, genes on the
chromosome showed a lower divergence level among
strains. While the exact mechanism(s) for these differences
are unknown at present, there are two possibilities. The first is
that genes on pSymA and pSymB may be under significantly
fewer functional constraints than those on the chromosome.
Therefore, genes on the two megaplasmids might be more
prone to the accumulation of mutations. Laboratory studies
have shown that significant portions of the genes on pSymA
and pSymB can be deleted with few or no fitness consequences (Charles & Finan, 1991; Oresnik et al., 2000),
consistent with the lack of functional constraints for genes
on these two megaplasmids. The second hypothesis is that
genes on pSymA and pSymB might be under positive
selection. Signatures of positive selection have been detected
for the nodulation receptor kinase (NORK) gene in one of
the host plants of S. meliloti, Medicago truncatula (De Mita et
al., 2006). It is possible that many genes on pSymA and
pSymB are highly niche-specific and that their divergence is
associated with host specialization and/or frequent niche
switching. It is interesting to note that the gene with the
highest mean divergence, SmbExoF3, is located on pSymB
and that it encodes a putative outer-membrane protein. Its
high rate of divergence might have significant functional
implications, e.g. in host recognition and niche specialization. However, a preliminary analysis found little evidence
of positive Darwinian selection for this gene, as the ratio of
non-synonymous substitutions to synonymous substitutions within the sequenced SmbExoF3 gene fragment was
significantly lower than 1 (data not shown). More robust
analyses with additional sequences from closely related
bacterial species such as Sinorhizobium medicae and
Sinorhizobium fredii, as well as their interacting genes
from a variety of host species, are needed to critically
evaluate these hypotheses.
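The text states that the ratio of non-synonymous to synonymous substitutions in the SmbExoF3 fragment was significantly below 1, but does not say how the ratio was estimated. One common approach is Nei–Gojobori-style counting: count synonymous and non-synonymous sites and differences between two aligned coding sequences, then apply a Jukes–Cantor correction. The sketch below follows that approach under stated assumptions (standard genetic code, gap-free sequences without internal stop codons, changes to stop codons counted as non-synonymous sites); it performs no significance test, the function names are hypothetical, and it is not necessarily the procedure used in the study.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code indexed in TCAG order; '*' marks a stop codon.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Each position contributes the fraction of its three possible changes that are synonymous."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1.0 / 3.0
    return sites


def pathway_differences(c1, c2):
    """Synonymous and non-synonymous differences averaged over every order in which
    the differing positions can change, skipping pathways through stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn = non = 0.0
    paths = 0
    for order in permutations(positions):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            syn, non, paths = syn + s, non + n, paths + 1
    if paths == 0:
        raise ValueError(f"every pathway from {c1} to {c2} passes through a stop codon")
    return syn / paths, non / paths


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(seq1, seq2):
    """Return (pN, pS, dN/dS) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    s_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        sd, nd = pathway_differences(c1, c2)
        s_diff += sd
        n_diff += nd
    n_sites = len(seq1) - s_sites  # three sites per codon in total
    p_s, p_n = s_diff / s_sites, n_diff / n_sites
    if p_s == 0:
        ratio = float("nan") if p_n == 0 else float("inf")
    else:
        ratio = jukes_cantor(p_n) / jukes_cantor(p_s)
    return p_n, p_s, ratio
```

A ratio well below 1 across many sequence pairs would indicate purifying rather than positive selection, which is the pattern the text reports for SmbExoF3.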
This study used three different tests to infer allelic associations among polymorphic nucleotide sites within a replicon as well as among replicons: the IA test, the IR test and the PH test. Overall, the three analytical methods showed similar results. However, minor differences and inconsistencies were found (Tables 3 and 4). These inconsistencies most likely resulted from differences among the tests themselves. For example, the methods differ in their null models: the IA and IR tests use complete random association as the null model, whereas the PH test uses strict clonality as the null hypothesis (Agapow & Burt, 2001; Farris et al., 1994). In addition, all three analytical methods are highly sensitive to factors such as the number of strains sampled, the number of polymorphic nucleotide sites, and the frequencies of individual alleles at these sites. The number of strains used in this study was relatively small (n=49), and many polymorphic sites had highly skewed allele frequencies, which may have contributed to the observed inconsistencies among the analytical methods.
Our analyses provided evidence of recombination between genes within all three replicons, with genes on pSymB showing overall linkage equilibrium based on IA. These results suggest that genetic exchange and recombination have played a significant role in natural populations of S. meliloti. Using a whole-replicon nearest-neighbour analysis, Wong & Golding (2003) showed that pSymB in strain Rm1021 has a complex evolutionary history, with closest sequence matches coming from diverse groups of organisms. Their analysis is consistent with our observation of frequent recombination for genes on pSymB. However, their theoretical study analysed only one replicon from a single strain; potential differences among strains and replicons in S. meliloti were not examined (Wong & Golding, 2003).
We observed differences in the overall allelic associations
among the three replicons. While the exact mechanisms are
unknown, several non-mutually exclusive mechanisms could
help explain the observed differences in the degree of
recombination among the replicons. In the first hypothesis,
there might be intrinsic differences in the rate of recombination (DNA-strand breakage and repair) among the three
replicons. These breakages and repairs may be related to the
activity of insertion sequence (IS) elements and phage-like
elements among the three replicons. In the genome of strain
Rm1021, abundant IS elements and phage-like elements are
found in all three replicons (Galibert et al., 2001). In the
second hypothesis, there might be frequent loss and gain of
megaplasmids (or portions of megaplasmids) among strains
in natural populations. Indeed, the entire megaplasmid
pSymA and large portions of megaplasmid pSymB can be
deleted from the genome of S. meliloti, with few or no fitness
consequences, under certain laboratory conditions (Charles
& Finan, 1991; Oresnik et al., 2000). Recent screening for
novel DNA sequences (sequences that are absent in the
genome of the model laboratory strain Rm1021 but are
present in other natural strains of S. meliloti) among natural
strains of S. meliloti has also identified frequent gains and
losses of DNA fragments on the two megaplasmids and, to a
lesser extent, on the chromosome as well (Guo et al., 2005).
The frequent gains and losses could have contributed to the
observed phylogenetic incongruences among the three
replicons. In the third hypothesis, putative conjugative
transfer genes have been found on pSymA of strain
Rm1021, and these genes could potentially mediate HGT
and contribute to random allelic associations among
replicons (Galibert et al., 2001). HGT has been found in
many natural bacterial populations, including nitrogen-fixing
bacteria (Seifert & DiRita, 2006; Silva et al., 2005; Vinuesa et
al., 2005; Xu, 2006). For example, in a symbiotic nitrogen-fixing bacterium, Sullivan et al. (2002) found that a 502 kb
symbiosis island in Mesorhizobium loti strain R7A was
transferred to a non-symbiotic Mesorhizobium strain in the
soil, converting the recipient cell to a symbiont. Although no
evidence for direct genetic exchange between marked strains
of S. meliloti has been found in nature, a cluster of genes
encoding a type IV pilus has been found on pSymA of strain
Rm1021 (Galibert et al., 2001). Type IV pili are specialized
structures on the bacterial surface that are found in many
Gram-negative bacteria. They play important roles in
adhesion to host cells, in infection by bacteriophages, and
in conjugative DNA transfer among strains (Ashelford et al.,
2003; Door et al., 1998).
We identified significant sequence variation among the 18
strains of ET1 (Figs 2 and 3). The combined sequence
analysis revealed that each of the 18 strains had a unique
multilocus genotype. While small clusters of ET1 strains
were found in most phylogenetic trees (Figs 2 and 3),
overall, strains of ET1 showed a broad distribution on all
three replicon-based phylogenies as well as on the phylogeny
based on the combined sequences of the 16 genes. Our
analyses thus demonstrated the greater discriminatory power
of MGGA compared with MLEE. A similar pattern of non-clustered
distribution of ET1 strains was also found in a phylogenetic
tree based on the analyses of 12 novel DNA fragments (Guo
et al., 2005). The patterns from both datasets (i.e. gene
sequences and novel DNA distributions) are consistent with
the main conclusion of our study, i.e. recombination plays a
significant role in the evolution and genome structure in
natural strains of S. meliloti. However, like the results
obtained from MLEE analysis, the genealogical analyses of
these randomly selected genes revealed limited geographic- or host species-based patterns of molecular variation (Figs 2
and 3; statistical analysis not shown). Instead, the analyses of
these potentially neutral markers suggested significant gene
flow between geographic regions and host species.
Dispersion and migration between different geographic areas could have been brought about by wind, water, or human activities such as the widespread cultivation of the host plant alfalfa in many parts of the world. Indeed, extensive gene flow between geographic populations has been found in other symbiotic nitrogen-fixing Rhizobium species associated with agricultural crops (e.g. Moreiar et al., 1998; Oyaizu et al., 1993). For example, strains of Rhizobium etli, another common nitrogen-fixing species that can form a symbiotic relationship with legumes, have been found to be capable of dispersal along with the seeds of their host plant, Phaseolus vulgaris (Perez-Ramirez et al., 1998). Similar evidence for long-distance dispersal has also been found for several other nitrogen-fixing bacteria, such as Rhizobium gallicum sensu lato (Silva et al., 2005) and species in the genus Bradyrhizobium (Vinuesa et al., 2005).
The lack of strictly host-specific clades in S. meliloti is also consistent with its lifestyle in nature. S. meliloti is not an obligate symbiont but exists mostly as a free-living bacterium in the soil. As a result, each genotype might be exposed to many potential host species. In such situations, obligate host specialization might be detrimental to the long-term survival of these strains in natural soil environments. Further functional analyses of these strains in diverse environments could help determine the fitness consequences of these and other genetic variants in S. meliloti.
ACKNOWLEDGEMENTS
We are grateful to Dr Bert Eardly for generously providing us with the strains. Financial support for this study was provided by Genome Canada, the Canadian Foundation for Innovation, the Premier's Research Excellence Award (PREA), the Ontario Innovation Trust, and the Ontario Research and Challenge Development Fund.
REFERENCES
Agapow, P. M. & Burt, A. (2001). Indices of multilocus linkage disequilibrium. Mol Ecol Notes 1, 101–102.
Ashelford, K. E., Day, M. J. & Fry, J. C. (2003). Elevated abundance of bacteriophage infecting bacteria in soil. Appl Environ Microbiol 69, 285–289.
Biondi, E. G., Pilli, E., Giuntini, E. & 8 other authors (2003). Genetic relationship of Sinorhizobium meliloti and Sinorhizobium medicae strains isolated from Caucasian region. FEMS Microbiol Lett 220, 207–213.
Carelli, M., Gnocchi, S., Fancelli, S., Mengoni, A., Paffetti, D., Scotti, C. & Bazzicalupo, M. (2000). Genetic diversity and dynamics of Sinorhizobium meliloti populations nodulating different alfalfa cultivars in Italian soils. Appl Environ Microbiol 66, 4785–4789.
Charles, T. & Finan, T. M. (1991). Analysis of a 1600-kilobase Rhizobium meliloti megaplasmid using defined deletions generated in vivo. Genetics 127, 5–20.
Cooper, J. E. & Feil, E. J. (2004). Multilocus sequence typing – what is resolved? Trends Microbiol 12, 373–377.
Davies, J. E. (1994). Inactivation of antibiotics and the dissemination of resistance genes. Science 264, 375–382.
De Mita, S., Santoni, S., Hochu, I., Ronfort, J. & Bataillon, T. (2006). Molecular evolution and positive selection of the symbiotic gene NORK in Medicago truncatula. J Mol Evol 62, 234–244.
Door, J., Hurek, T. & Reinhold-Hurek, B. (1998). Type IV pili are involved in plant–microbe and fungus–microbe interactions. Mol Microbiol 30, 7–17.
Eardly, B. D., Materon, L. A., Smith, N. H., Johnson, D. A., Rumbaugh, M. D. & Selander, R. K. (1990). Genetic structure of natural populations of the nitrogen-fixing bacterium Rhizobium meliloti. Appl Environ Microbiol 56, 187–194.
Farris, J. S., Källersjö, M., Kluge, A. G. & Bult, C. (1994). Testing significance of incongruence. Cladistics 10, 315–319.
Finan, T. M., O'Brian, M. R., Layzell, D. B., Vessey, J. K. & Newton, W. (2002). Nitrogen-fixation: global perspectives. In Proceedings of the 13th International Congress on Nitrogen Fixation. Wallingford, UK: CABI Publishing.
Galibert, F., Finan, T. M., Long, S. R. & 53 other authors (2001). The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293, 668–672.
Gogarten, J. P., Doolittle, W. F. & Lawrence, J. G. (2002). Prokaryotic evolution in light of gene transfer. Mol Biol Evol 19, 2226–2238.
Guo, H., Sun, S., Finan, T. M. & Xu, J. (2005). Novel DNA sequences from natural strains of the nitrogen-fixing symbiotic bacterium Sinorhizobium meliloti. Appl Environ Microbiol 71, 7130–7138.
Hartmann, A., Giraud, J. J. & Catroux, G. (1998). Genotypic diversity of Sinorhizobium (formerly Rhizobium) meliloti strains isolated directly from a soil and from nodules of alfalfa (Medicago sativa) grown in the same soil. FEMS Microbiol Ecol 25, 107–116.
Jebara, M., Mhamdi, R., Aouani, M. E., Ghrir, R. & Mars, M. (2001). Genetic diversity of Sinorhizobium populations recovered from different Medicago varieties cultivated in Tunisian soils. Can J Microbiol 47, 139–147.
Kidd, S. E., Guo, H., Bartlett, K. H., Xu, J. & Kronstad, J. W. (2005). Comparative gene genealogies indicate that two clonal lineages of Cryptococcus gattii in British Columbia resemble strains from other geographical areas. Eukaryot Cell 4, 1629–1638.
Lan, L. & Xu, J. (2006). Multiple gene genealogical analyses suggest divergence and recent clonal dispersal in the opportunistic human pathogen Candida guilliermondii. Microbiology 152, 1539–1549.
Maiden, M. C., Bygraves, J. A., Feil, E. & 10 other authors (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A 95, 3140–3145.
Moreiar, F. M., Haukka, S. K. & Young, J. P. (1998). Biodiversity of rhizobia isolated from a wide range of forest legumes in Brazil. Mol Ecol 7, 889–895.
Oresnik, I. J., Liu, S. L., Yost, C. K. & Hynes, M. F. (2000). Megaplasmid pRm2011a of Sinorhizobium meliloti is not required for viability. J Bacteriol 182, 3582–3586.
Oyaizu, H., Matsumoto, S., Minamisawa, K. & Gamou, T. (1993). Distribution of rhizobia in leguminous plants surveyed by phylogenetic identification. J Gen Appl Microbiol 39, 339–354.
Paffetti, D., Scotti, C., Gnocchi, S., Fancelli, S. & Bazzicalupo, M. (1996). Genetic diversity of an Italian Rhizobium meliloti population from different Medicago sativa varieties. Appl Environ Microbiol 62, 2279–2285.
Perez-Ramirez, N. O., Rogel, M. A., Wang, E., Castellanos, J. Z. & Martinez-Romero, E. (1998). Seeds of Phaseolus vulgaris bean carry Rhizobium etli. FEMS Microbiol Ecol 26, 289–296.
Roumiantseva, M. L., Andronov, E. E., Sharypova, L. A., Dammann-Kalinowski, T., Keller, M. & Young, J. P. (2002). Diversity of Sinorhizobium meliloti from the Central Asian Alfalfa Gene Center. Appl Environ Microbiol 68, 4694–4697.
Schofield, P. R., Gibson, A. H., Dudman, W. F. & Watson, J. M. (1987). Evidence for genetic exchange and recombination of Rhizobium symbiotic plasmids in a soil population. Appl Environ Microbiol 53, 2942–2947.
Seifert, H. S. & DiRita, V. J. (2006). Evolution of Microbial Pathogens. Washington, DC: American Society for Microbiology.
Silva, C., Vinuesa, P., Eguiarte, L. E., Souza, V. & Martínez-Romero, E. (2005). Evolutionary genetics and biogeographic structure of Rhizobium gallicum sensu lato, a widely distributed bacterial symbiont of diverse legumes. Mol Ecol 14, 4033–4050.
Sobral, B. W., Honeycutt, R. J., Atherly, A. G. & McClelland, M. (1991). Electrophoretic separation of the three Rhizobium meliloti replicons. J Bacteriol 173, 5173–5180.
Sullivan, J. T., Trzebiatowski, J. R., Cruickshank, R. W. & 11 other authors (2002). Comparative sequence analysis of the symbiosis island of Mesorhizobium loti strain R7A. J Bacteriol 184, 3086–3095.
Swofford, D. L. (2004). PAUP* 4.10b: Phylogenetic Analysis Using Parsimony and Other Methods. Sunderland, MA: Sinauer Associates.
Van Sluys, M. A., Monteiro-Vitorello, C. B., Camargo, L. E. & 7 other authors (2002). Comparative genomic analysis of plant-associated bacteria. Annu Rev Phytopathol 40, 169–189.
Vinuesa, P., Silva, C., Werner, D. & Martínez-Romero, E. (2005). Population genetics and phylogenetic inference in bacterial molecular systematics: the roles of migration and recombination in Bradyrhizobium species cohesion and delineation. Mol Phylogenet Evol 34, 29–54.
Wong, K. & Golding, G. B. (2003). A phylogenetic analysis of the pSymB replicon from the Sinorhizobium meliloti genome reveals a complex evolutionary history. Can J Microbiol 49, 269–280.
Xu, J. (2004). The prevalence and evolution of sex in microorganisms. Genome 47, 775–780.
Xu, J. (2005). Fundamentals of fungal molecular population genetic analyses. In Evolutionary Genetics of Fungi, pp. 87–116. Edited by J. Xu. Wymondham, UK: Horizon Scientific Press.
Xu, J. (2006). Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances. Mol Ecol 15, 1713–1731.
Xu, J., Vilgalys, R. & Mitchell, T. G. (2000). Multiple gene genealogies reveal recent dispersion and hybridization in the human pathogenic fungus Cryptococcus neoformans. Mol Ecol 9, 1471–1482.
Young, J. P. W. & Wexler, M. (1988). Sym plasmids and chromosomal genotypes are correlated in field populations of Rhizobium leguminosarum. J Gen Microbiol 134, 2731–2739.