Sequence Heterogeneities Among 16S Ribosomal RNA Sequences, and Their Effect on Phylogenetic Analyses at the Species Level
Valérie Cilia, Bénédicte Lafay, and Richard Christen
CNRS & Université Paris 6, Observatoire Océanologique, Villefranche sur mer, France
We have analyzed what phylogenetic signal can be derived by small subunit rRNA comparison for bacteria of different but closely related genera (enterobacteria) and for different species or strains within a single genus (Escherichia or Salmonella), and finally how similar the ribosomal operons are within a single organism (Escherichia coli). These sequences have been analyzed by neighbor-joining, maximum likelihood, and parsimony. The robustness of each topology was assessed by bootstrap.

Sequences were obtained for the seven rrn operons of E. coli strain PK3. These data demonstrated differences located in three highly variable domains. Their nature and localization suggest that since the divergence of E. coli and Salmonella typhimurium, most point mutations that occurred within each gene have been propagated among the gene family by conversions involving short domains, and that homogenization by conversions may not have affected the entire sequence of each gene. We show that the differences that exist between the different operons are ignored when sequences are obtained either after cloning of a single operon or directly from polymerase chain reaction (PCR) products. Direct sequencing of PCR products produces a mean sequence in which mutations present in the most variable domains become hidden. Cloning a single operon results in a sequence that differs from that of the other operons and of the mean sequence by several point mutations. For identification of unknown bacteria at the species level or below, a mean sequence or the sequence of a single nonidentified operon should therefore be avoided. Taking into account the seven operons and therefore mutations that accumulate in the most variable domains would perhaps increase tree resolution. However, if gene conversions that homogenize the rRNA multigene family are rare events, some nodes in phylogenetic trees will reflect these recombination events and these trees may therefore be gene trees rather than organismal trees.
Introduction

Ribosomal RNA sequences are now widely used for phylogenetic analyses and identifications of species. For phylogeny, rRNA sequences have become popular because they are universally present and have a conserved function (they are homologous in all organisms), because they are easy to sequence, and finally because they are constituted of an interspersion of highly conserved to very variable domains. Ribosomal RNA sequences are now widely used in bacteriology because it is often easier to identify bacteria by specific nucleic acid sequences rather than by their biochemical or physiological traits. Moreover, the coupling of a polymerase chain reaction (PCR) amplification using universal primers to molecular cloning allows the identification of microorganisms that cannot be easily cultured under laboratory conditions (Giovannoni et al. 1990; Ward, Weller, and Bateson 1990; Britschgi and Giovannoni 1991; Schmidt, DeLong, and Pace 1991; Fuhrman, McCallum, and Davis 1992; Berchtold, Ludwig, and König 1994).
The large ribosomal subunit RNA (LSU rRNA) has a larger information content than the small ribosomal subunit RNA (SSU rRNA), not only because of its larger size but also because it contains domains of rapid expansion and higher mutation rates (Hassouna, Michot, and Bachellerie 1984) that could be more adequate for a distance evaluation between closely related species. Despite their seemingly lower phylogenetic information content, the SSU rRNA sequences have become more widely used for estimating phylogenetic relationships among unicellular organisms (Fox, Wisotzkey, and Jurtshuk 1992). However, SSU rRNA sequences may not be adequate to analyze phylogenetic relationships between closely related species (not to speak of different strains within the same species) because there may not be enough differences between these sequences; this has been suggested by phylogenetic analyses of genera such as Bacillus (Ash et al. 1991; Rössler et al. 1991) or Vibrio (Ruimy et al. 1994).

Key words: 16S rRNA, phylogeny, rrn operons, Escherichia coli.

Address for correspondence and reprints: Richard Christen, CNRS & Université Paris 6, Observatoire Océanologique, Station Zoologique, Villefranche sur mer, 06230 France. Email: [email protected].

Mol. Biol. Evol. 13(3):451-461. 1996
© 1996 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Another problem derives from the organization of rRNA genes (rDNA) as a multigene family, often as rrn operons in bacteria. In eubacteria, for example, there are two rrn operons in Mycobacterium smegmatis (Suzuki and Yamada 1988) and seven rrn operons in Escherichia coli and Salmonella typhimurium (Hill and Harnish 1981); Bacillus subtilis is reported to have 9 or 10 rDNA per genome (Loughney, Lund, and Dahlberg 1982) and a few hundred copies are often present in eukaryotic genomes (Long and Dawid 1980). The members of a multigene family are subject to a homogenization process (Dover 1982, 1987; Ohta 1991) and rDNA sequences tend to evolve in concert. A particular rRNA gene is probably more similar to a paralogous copy in the same organism than to its homologous copy in another species, but we do not know how the necessary homology condition is met when nonhomologous rRNA genes are compared in different species, that is when the sequence of a single gene is obtained by cloning. Finally, the differences that may exist between the different members of the rRNA multigene family are probably ignored when sequences are obtained by direct sequencing of RNA or of PCR products.
In order to analyze whether a lack of resolution in phylogenetic trees of closely related species could result from a lack of divergence between rRNA sequences, from recombination events that may blur the hierarchical phylogenetic information (Harvey et al. 1988; Harvey and Hill 1990; Sneath 1993), or from analyses being performed irrespective of operon homology, the following questions have been investigated: (1) What phylogenetic signal can be derived for bacteria of different but closely related genera? (2) Is it possible to distinguish different species within a genus and to analyze their phylogeny? (3) How similar are the different operons within a single organism? Bacteria belonging to the family Enterobacteriaceae have been chosen as a model system. Within this family, most genera are at least 20% related one to another as measured by reassociation of hybrid genomic DNA as a function of temperature: 20%-60% between Escherichia coli and the genera Citrobacter, Klebsiella, or Erwinia; 40%-50% between E. coli and Salmonella; and finally, more than 70% between E. coli and Shigella (Brenner and Falkow 1971; Krieg and Holt 1984). Sequences have been obtained for bacteria of each of these genera, for several species within two genera (Escherichia and Salmonella), for several strains of E. coli, and finally for the seven rrn operons of a single strain of E. coli. These sequences have been compared and analyzed by three phylogenetic methods: neighbor-joining, maximum likelihood, and parsimony, and the robustness of each topology has been assessed by bootstrap. It is concluded that the method used for obtaining a sequence (i.e., cloning versus PCR product direct sequencing) and the occurrence of recombinations between operons are important parameters that should be taken into account for deriving phylogenies between closely related organisms.
Materials and Methods

Bacterial Strains

Bacterial strains belonging to the family Enterobacteriaceae for which SSU rRNA sequences are now available are listed in table 1. Escherichia sp. and Salmonella sp. are peculiar strains whose identifications by phenotypic methods could not be obtained at the species level, but that were characterized at the genus level. They were therefore considered of interest for our analysis. All phenotypic characterizations of "BioM" strains have been done by the API laboratories at the Institut BioMérieux. Escherichia coli 5 strain MC4100 has been characterized previously (Casadaban 1976). Table 1 summarizes the methods used, when the information was available, for obtaining a sequence: with reverse transcriptase directly from RNAs (rRNA), with T7 DNA polymerase directly from PCR products of full genomic DNAs (PCR) or after cloning (E. coli 2 and 3), or isolation of an operon (rrnX).
DNA Preparation and Hybridization

Genomic DNA of E. coli PK3 strain was purified as previously described (Ausubel et al. 1991). DNA was digested using BamHI and PstI (Hill and Harnish 1981). Restriction fragments were separated in two different lanes on a 0.6% low melting agarose gel (Gibco BRL). Southern blot transfers of DNA were performed (Ausubel et al. 1991) on Nylon membrane Positive® (Appligene, France). One lane was transferred onto the membrane and the other lane was stored at 4°C. DNA fixation was achieved by baking for 15 min.

A DNA probe of 1.4 kb specific for E. coli SSU rDNA was generated by PCR. Unincorporated dNTP were separated from the probe with a Sepharose CL-4B column (Pharmacia). DNA probes were then labeled by random priming with digoxigenin-11-dUTP using a DIG DNA labeling kit (Boehringer Mannheim) as recommended in the instructions. Hybridization and immunological detection of DIG-labeled probes were performed using a DIG-DNA detection kit according to the manufacturer's instructions (Boehringer Mannheim). The membrane was then superimposed on the remaining half of the agarose gel. Bands corresponding to each rrn operon were cut out and used for amplification.
DNA PCR

All PCR reactions were carried out using Taq DNA polymerase supplied by Appligene France, paraffin wax (fusion temperature: 60°C), and a hot start procedure. Reaction buffer (final concentrations: Tris-HCl, 50 mM; (NH4)2SO4, 16 mM; KCl, 50 mM; MgCl2, 7 mM; bovine serum albumin, 0.2 mg/ml; pH 9.0), dNTP 250 µM each and 30 pmol of each primer were added in a 50-µl volume below the wax layer. Taq DNA polymerase (2.5 U), 5 µl of DNA sample in 0.6% low melting agarose, and reaction buffer were in a 50-µl volume above the wax layer. Reactions were carried out using a Techne PHC-2 (Techne UK) with the following set of conditions: 95°C, 3 min; 53°C, 2
Table 1
Strains and rrn Operons Analyzed and Methods Used to Obtain SSU RNA Sequences

Species(a)                 Source (strain)(b)        EMBL(c)   Method(d)
Escherichia coli 1         -                         J01859    rRNA
Escherichia coli 2         -                         J01695    rrnB
Escherichia coli 3         -                         V00348    rrnB
Escherichia coli 4*        (PK3)                     X80731    PCR
Escherichia coli 5*        (MC4100)                  X80732    PCR
Escherichia coli 6*        CIP (ATCC 11775T)         X80725    PCR
Escherichia coli 7*        ATCC 25922                X80724    PCR
Escherichia coli 8A        (K12)                     M87049    rrnA
Escherichia coli 8B        (K12)                     U00006    rrnB
Escherichia coli 8C        (K12)                     L10328    rrnC
Escherichia coli 8D        (K12)                     U18997    rrnD
Escherichia coli 8E        (K12)                     U00006    rrnE
Escherichia coli 9         -                         D15061    rrnH
Escherichia coli 10        -                         M29364    rrnG
Escherichia sp.*           BioM                      X80733    PCR
Escherichia vulneris*      CIP (ATCC 33821T)         X80734    PCR
Escherichia hermannii*     BioM                      X80675    rRNA
Shigella flexneri*         BioM                      X80679    rRNA
Shigella dysenteriae*      BioM                      X80680    rRNA
Shigella sonnei*           BioM                      X80726    rRNA
Salmonella give*           BioM                      X80683    rRNA
Salmonella paratyphi A*    BioM                      X80682    rRNA
Salmonella sp.*            ATCC 9712                 X80676    rRNA
Salmonella sofia*          BioM                      X80677    rRNA
Salmonella typhimurium*    ATCC 13311                X80681    rRNA
Salmonella shomron*        BioM                      X80678    rRNA
Citrobacter freundii       -                         M59291    -
Klebsiella pneumoniae*     Nice, Pasteur hospital    X80684    rRNA
Serratia marcescens        -                         M59160    -
Erwinia carotovora         -                         M59149    -
Escherichia coli 4A*       (PK3)                     X80721    rrnA
Escherichia coli 4B*       (PK3)                     X80722    rrnB
Escherichia coli 4C*       (PK3)                     X80723    rrnC
Escherichia coli 4D*       (PK3)                     X80727    rrnD
Escherichia coli 4E*       (PK3)                     X80728    rrnE
Escherichia coli 4G*       (PK3)                     X80729    rrnG
Escherichia coli 4H*       (PK3)                     X80730    rrnH

a Asterisks indicate strains that have been sequenced for the purpose of this study.
b Bacterial collection from which each strain has been obtained: BioM (BioMérieux, Marcy l'Etoile, France), CIP (Collection de l'Institut Pasteur), ATCC (American Type Culture Collection).
c Accession numbers under which each sequence is available.
d Method by which each sequence has been obtained: rRNA indicates that total rRNA has been sequenced using reverse transcriptase; PCR indicates that total PCR products have been sequenced using T7 DNA polymerase; rrnX indicates that the sequence of a single operon (X) has been obtained; finally, the method of sequencing could not be retrieved for some sequences.
min; 72°C, 1 min followed by 94°C, 30 sec; 53°C, 1 min; 72°C, 1 min for 25 cycles; a final cycle was 94°C, 30 sec; 53°C, 1 min; 72°C, 5 min. Samples were precipitated with PEG 8000 20%, NaCl 2.5 M (Paithankar and Prasad 1991) and purified on 1% low melting agarose in TAE buffer (Tris-acetate pH 7.0, 40 mM; EDTA, pH 8.0, 2 mM).

PCR product direct sequencing was carried out using T7 DNA polymerase (USB). Direct sequencing reactions of total PCR products were performed as previously described (Ruimy et al. 1994).
Direct SSU rRNA Sequencing
Ribosomal RNA purification and sequencing reactions using reverse transcriptase were carried out as previously described (Ruimy et al. 1994).
Primer Positions

Ten DNA primers were used in the sequencing reactions. These primers corresponded to the following positions in E. coli sequence: positions 99-119; positions 342-356; positions 518-534; positions 684-702; positions 906-925; positions 1099-1114; positions 1223-1240; positions 1384-1400; positions 1493-1509. A reverse primer (positions 8-28) was used in conjunction with the last primer to amplify SSU rDNAs.
Phylogenetic Analysis

The phylogenetic data described below were obtained by alignment of the different sequences and phylogenetic analyses. All operations were done using computer programs developed by us and available on request from R. Christen. The SSU rRNA sequences were aligned by eye. Domains used for deriving phylogenies were restricted to parts of the sequences for which homologies were without doubt and did not include too many undetermined nucleotides; for figure 1 they were positions 14-78, 101-858, 862-1124, 1137-1431 (corresponding to E. coli 2 sequence). 5' end and 3' end analyses used the same domains but, respectively, up to and from position 665. For the Escherichia and Shigella species trees, we used positions 14-80, 94-205, 207-418, 420-860, 863-1023, 1029-1381, 1385-1438. Domains used for the Salmonella species trees were: 3-78, 103-205, 207-418, 420-1023, 1029-1431 (corresponding to E. coli 2 sequence). Individual positions that contained undetermined nucleotides have also been removed from analyses. Although all available sequences are mentioned in table 1, they have not all been included in phylogenetic analyses shown in this paper because some of them were identical to other sequences (sequences of E. coli 8 and 9) or because one of them was only a partial sequence (E. coli 10).
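The restriction of an alignment to trusted domains, followed by removal of individual positions that contain undetermined nucleotides, can be sketched as follows. This is only an illustration and not the authors' program: the function name `select_columns`, the toy alignment, and the use of `N` for an undetermined nucleotide are assumptions, and positions are taken to be 1-based and inclusive, as in the ranges quoted above.

```python
def select_columns(alignment, ranges, undetermined="N"):
    """Keep columns inside the 1-based inclusive ranges, dropping any
    column in which at least one sequence has an undetermined base."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must be aligned to the same length")
    kept = []
    for start, end in ranges:
        for pos in range(start, min(end, length) + 1):
            column = [alignment[n][pos - 1] for n in names]
            if undetermined not in column:
                kept.append(pos)
    trimmed = {n: "".join(alignment[n][p - 1] for p in kept) for n in names}
    return trimmed, kept


# Hypothetical toy alignment (not data from the paper).
toy = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTNCGTAC",
    "seq3": "ACCTACGTTC",
}
trimmed, kept = select_columns(toy, [(2, 6), (8, 10)])
print(kept)     # positions retained, in alignment coordinates
print(trimmed)  # the reduced data matrix
```

Returning the list of retained positions alongside the trimmed sequences keeps the link to the reference numbering, which is needed when the 5' and 3' halves are analyzed separately.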
A neighbor-joining algorithm like that developed by Saitou and Nei (1987) was used. The program was rewritten to include inputs and outputs compatible with the ribosomal database and other programs developed in our laboratory. Distances were calculated as uncorrected percentages of divergence or using the correction of Jukes and Cantor (1969) or the two-parameter correction of Kimura (1980). Identical trees were obtained by neighbor-joining whichever distance was used. Topologies shown in this paper were obtained using the correction of Jukes and Cantor for estimating distances. For parsimony, the PAUP program (Swofford 1992) for Macintosh computers was used. All topologies were first obtained by using the heuristic options. When several most-parsimonious trees were obtained, a 100% consensus tree was constructed and treated as the most-parsimonious tree for constructing figures. For the parsimony data matrices only positions that had at least one mutation in one sequence were considered (nonuniform positions). For maximum likelihood, every position was taken into account in order to assess base frequency. Finally, a bootstrap analysis was performed (heuristic option, 100 replications) to check each topology for robustness. For maximum likelihood the fDNAml program rewritten by G. J. Olsen (University of Illinois, Urbana) was used (using F, Y, and G options). All trees were plotted by using a program developed by M. Gouy (University Claude Bernard, Lyon, France) that allows transformation of a formal tree representation (Newick's format) into MacDraw drawings.
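The three distance measures named above can be written out in a few lines. The sketch below uses the standard formulas for the uncorrected proportion of differences p, the Jukes and Cantor correction d = -(3/4) ln(1 - 4p/3), and the Kimura two-parameter correction d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names and the toy sequences are assumptions, and skipping sites with gaps or undetermined bases is a choice made here, not necessarily what the authors' programs did.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (compared sites, transitions, transversions).
    Sites where either sequence is not A, C, G or T are skipped."""
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites."""
    n, ts, tv = count_differences(seq_a, seq_b)
    return (ts + tv) / n


def jukes_cantor(seq_a, seq_b):
    """One-parameter correction (Jukes and Cantor 1969)."""
    p = p_distance(seq_a, seq_b)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(seq_a, seq_b):
    """Two-parameter correction (Kimura 1980)."""
    n, ts, tv = count_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


# Hypothetical toy pair differing by a single transition (C/T).
a = "ACGTACGTAC"
b = "ACGTACGTAT"
print(p_distance(a, b), jukes_cantor(a, b), kimura_2p(a, b))
```

At low divergence the corrected values stay close to the uncorrected ones, which is at least consistent with the observation that the choice of distance did not change the neighbor-joining topology.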
EMBL Accession Numbers

All sequences obtained for the purpose of this study have been deposited in the EMBL databank under accession numbers indicated in table 1.
Results

Phylogenetic analyses of SSU rRNA sequences representative of different enterobacterial genera were performed using three methods: neighbor-joining, maximum parsimony, and maximum likelihood. Considering that E. coli and Shigella in fact belong to the same bacterial species because they share a genomic DNA relatedness greater than 70%, and that Escherichia hermannii is probably misclassified as Escherichia (Hartl 1992), the phylogenetic analyses provided resolution at the genus level as two robust monophyletic units were found in all analyses (fig. 1): (1) the genus Salmonella (S. give, S. paratyphi A, S. sp., S. sofia, S. typhimurium, and S. shomron) and (2) the genera Escherichia (all of seven strains of E. coli, E. vulneris, E. sp.) and Shigella (S. sonnei, S. dysenteriae, S. flexneri). The parsimony analysis with a heuristic search option found 12 trees (for 23 sequences, 146 nonuniform characters, 81 informative positions, tree length 238, consistency index excluding uninformative characters 0.576, retention index 0.822). The two other methods produced very similar topologies as summarized in figure 1. The internal branches for these two monophyletic taxa were well supported, either by maximum likelihood (each branch showed a significantly positive length at P < 0.01) or in a bootstrap analysis using the parsimony method that showed, respectively, a support in 97% and 100% of bootstrap replications (fig. 1). Two other analyses were also undertaken, by excluding the leftmost or the rightmost parts of the sequences. Analyses performed using the 5' ends of the sequences produced a topology identical to that obtained when considering the entire sequences. However, trees obtained with the 3' ends of the sequences showed significant differences. In particular, the two monophyletic units described above were not retrieved (fig. 1). The most significant changes (topologies obtained using neighbor-joining, maximum parsimony and maximum likelihood) were for the positions of E. sp. and S. paratyphi A. Escherichia sp. clustered with Salmonella (but this was not supported in a bootstrap analysis), while S. paratyphi A clustered with
FIG. 1.-Phylogenetic relationships among Enterobacteriaceae inferred from SSU rRNA sequences. The topology shown is an unrooted tree obtained using a neighbor-joining algorithm. Branches also found by maximum likelihood (branches significantly positive at P < 0.01) are labeled with asterisks. Supports from a bootstrap analysis using parsimony are shown as percentages of replications above each branch (only percentages above 50% are indicated); these numbers also indicate branches retrieved in the most parsimonious tree (strict consensus of 12 trees). Internal branches retrieved by all three methods using the 5' or the 3' ends are indicated (as well as percentage of bootstrap replications). For Escherichia coli strain PK3, the mean sequence obtained from direct sequencing of PCR products was used. A resolution at the genus level is achieved, since two robust monophyletic taxa regroup strains that belong to the same genus (see the text for more details).
Escherichia and Shigella (supported at the 75% level of bootstrap replications).

These problems could result either from a lack of data or from inappropriate data. The two reduced data matrices used in parsimony had similar characteristics: for the 3' end: 77 nonuniform characters, 45 informative positions, tree length 118, consistency index excluding noninformative characters 0.612, retention index 0.876; and 69 nonuniform characters, 36 informative positions, tree length 110, consistency index excluding noninformative characters 0.610, retention index 0.790 for the 5' end. A lack of support in bootstrap analyses suggested that there was at least a lack of significant data, as did the low distances between any two sequences used in the neighbor-joining approach. A reduced signal when only half of the sequences was used may however not be the only problem, as the presence of distant outgroups could bring a spurious signal in phylogenetic analyses because multiple mutations occurring at the same site may not be detected during the sequence alignment and the choice of characters retained for the phylogenetic analysis (Smith, Lafay, and Christen 1992). In order to examine this problem in more detail, a phylogenetic study was undertaken, restricted to the genera Escherichia, Shigella, and Salmonella. These phylogenetic analyses revealed the same pattern as analyses with an outgroup (data not shown), therefore excluding the outgroup effect mentioned above.
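The bootstrap percentages quoted in this section come from resampling alignment columns with replacement and counting how often a grouping is recovered. A minimal sketch of that resampling step is given below, under stated assumptions: the tree search itself (PAUP in the paper) is replaced by a caller-supplied predicate, and `ab_closest` is a hypothetical three-taxon stand-in that only asks whether A and B form the closest pair. None of these names come from the paper.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns drawn with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in picks) for n in names}


def support_percentage(alignment, group_found, replicates=100, seed=0):
    """Percentage of replicates for which group_found(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(replicates)
        if group_found(bootstrap_alignment(alignment, rng))
    )
    return 100.0 * hits / replicates


def mismatches(x, y):
    """Number of differing positions between two aligned sequences."""
    return sum(1 for a, b in zip(x, y) if a != b)


def ab_closest(aln):
    # Hypothetical stand-in for tree inference on three taxa:
    # is the A-B pair strictly closer than A-C and B-C?
    d_ab = mismatches(aln["A"], aln["B"])
    return (d_ab < mismatches(aln["A"], aln["C"])
            and d_ab < mismatches(aln["B"], aln["C"]))


# Hypothetical toy alignment with few variable columns.
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACCTAT"}
print(support_percentage(toy, ab_closest))
```

When only a handful of columns carry signal, many replicates miss them altogether, which is one way to read the low support values reported for the reduced data matrices.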
We then analyzed how phylogenetic relationships were resolved within the monophyletic group that can be tentatively identified as the true genus Escherichia. For this taxon, the maximum parsimony analysis (heuristic search) found four trees (for 12 sequences, 34 nonuniform characters, 7 informative positions, tree length 40, consistency index excluding uninformative characters 0.818, retention index 0.800). The branch separating all E. coli strains on the one hand and E. vulneris, S. dysenteriae, and S. flexneri on the other hand was supported at the 70% level of bootstrap replications and was also observed in analyses using the two other methods (fig. 2). But the other internal branches could not be resolved. For the Salmonella taxon, maximum parsimony found three trees (for 6 sequences, 36 nonuniform characters, 7 informative positions, tree length 45, consistency index excluding uninformative characters 0.750, retention index 0.571). A single internal branch was retrieved by all three methods (and supported by 70% of bootstrap replications). In both taxa, internal branches that were retrieved with some robust support also corresponded to the deepest branchings observed in the general analysis of figure 1. The lack of resolution observed at the species level clearly resulted from a lack of significant data in all analyses. In particular, the data matrices used for parsimony had only seven informative positions for, respectively, 6 or 12 species.
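The distinction between nonuniform characters and informative positions used in these counts can be made concrete. In the usual parsimony sense, a site is informative when at least two character states each occur in at least two sequences; a site where a single sequence differs is nonuniform but uninformative. The sketch below applies that definition to a toy matrix. The function name, the toy data, and the decision to ignore `N`, `-` and `*` when counting states are assumptions, not a description of how PAUP was configured in the study.

```python
from collections import Counter


def classify_sites(sequences, ignore="N-*"):
    """Return 1-based positions of nonuniform and parsimony-informative sites."""
    nonuniform = []
    informative = []
    for i in range(len(sequences[0])):
        counts = Counter(s[i] for s in sequences if s[i] not in ignore)
        if len(counts) > 1:
            nonuniform.append(i + 1)
            # Informative: at least two states, each seen in two or more sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative.append(i + 1)
    return nonuniform, informative


# Hypothetical toy matrix of four aligned sequences.
toy_matrix = ["ACGTA", "ACGTC", "ATGTC", "ATGAA"]
nonuniform, informative = classify_sites(toy_matrix)
print(nonuniform, informative)
```

Here position 4 varies in only one sequence, so it adds to the nonuniform count without helping to choose between topologies, which mirrors the gap between 34 or 36 nonuniform characters and 7 informative positions reported above.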
In order to investigate if the lack of differences between species was indeed a lack of divergence or if it resulted from an inappropriate use of sequencing methods (see below), we then compared the SSU rRNA sequences of the different operons within a single organism. Escherichia coli strain PK3 (Kahn 1968) was chosen because it is one of the parental strains of the recombinant bacteria "Salmorichia," which can be used in experimental studies to estimate the rates of recombination between rrn operons (Rayssiguier, Thaler, and Radman 1989). The presence of seven rrn operons has been described for E. coli (Hill and Harnish 1981). These data were confirmed by DNA hybridization of a labeled probe to the genomic DNA of E. coli strain PK3, after digestion by a combination of restriction enzymes (see the Materials and Methods). These seven SSU rRNA genes were sequenced, revealing that there were mutations in the different genes. These sequences were then aligned with the available SSU rRNA sequences of the other enterobacterial species and strains of E. coli retrieved from EMBL (fig. 3). This analysis showed that these mutations were not dispersed over the entire length
FIG. 2.-Phylogenetic relationships among Escherichia and Shigella or among Salmonella. The trees shown in this figure are unrooted trees obtained by the neighbor-joining method (trees A and D), by the maximum likelihood method (trees B and E), and by the maximum parsimony method (trees C and F). Different topologies were obtained with different methods. Branches significantly positive at P < 0.01 in maximum likelihood are labeled with asterisks. When several trees were obtained by parsimony, a strict consensus was computed. Supports in a bootstrap analysis are indicated as percentages of replications above each branch (only percentages above 50% are indicated); these numbers also indicate branches retrieved in the most parsimonious tree. Thicker lines indicate branches that were found in all three methods.
of the sequence but appeared in domains that were also variable in the SSU rRNA sequences of other species (fig. 3), and that corresponded to helices in the secondary structure (De Rijk et al. 1992). Identical mutations were often found when our sequences were compared to published sequences for homologous operons although probably for other strains of E. coli. The most obvious example was the 3' end of the sequence of rrnH (Nakayashiki et al. 1992; Miyamoto 1993), but some minor differences can also be found (fig. 3). These differences are difficult to analyze because strains have often not been mentioned in previously published works. For strain PK3, it is striking that differences between operons do not seem to appear at random. Operons D and G had identical sequences but differed from any other operon at positions 80, 81, 90, 91, and 94. The sequence of operon D was different from any other operon at positions 251, 254, and 274. Finally, the sequence of operon H was different from any other operon at positions 1004, 1008, 1012, 1021, 1022, 1023, 1024, 1025, and 1041. A "bulk" sequence was also obtained by amplification of total genomic DNA of E. coli PK3 followed by a direct sequencing of PCR products using T7 DNA polymerase. The readings of the sequencing gels did not show any strongly ambiguous position; some double bands could be observed in positions that corresponded to differences between rrn operons, but the second band was weak and would have been interpreted as a small sequencing artifact in a usual reading. As a result, the sequence obtained can indeed be considered as a mean rRNA sequence for the organism considered (compare this sequence with that of the seven operons in fig. 3).
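Why a directly sequenced PCR product reads as a mean sequence can be illustrated by treating the gel reading as a per-position majority vote over the operon copies. This is a deliberate simplification (a real gel shows the minority base as a weak second band), and everything in the sketch is hypothetical: the function name, the eight-base toy sequences, and the pattern of differences, which only loosely imitates two copies sharing one variant and one copy carrying its own.

```python
from collections import Counter


def majority_consensus(copies):
    """Most frequent base per column, plus the copies hidden at each position."""
    length = len(next(iter(copies.values())))
    consensus = []
    hidden = {}
    for i in range(length):
        counts = Counter(seq[i] for seq in copies.values())
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
        minority = sorted(name for name, seq in copies.items() if seq[i] != base)
        if minority:
            hidden[i + 1] = minority  # 1-based position -> copies that differ
    return "".join(consensus), hidden


# Hypothetical toy copies, not the PK3 sequences.
toy_copies = {
    "rrnA": "GATTACAG",
    "rrnB": "GATTACAG",
    "rrnC": "GATTACAG",
    "rrnD": "GCTTACAG",
    "rrnE": "GATTACAG",
    "rrnG": "GCTTACAG",
    "rrnH": "GATTACTC",
}
mean_seq, hidden = majority_consensus(toy_copies)
print(mean_seq)
print(hidden)
```

In this toy case the consensus equals the sequence shared by the majority of copies, and every variant carried by only one or two copies disappears from the readout.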
A phylogenetic analysis including the sequences of the seven rrn operons showed that rrnH clustered with E. sp. (fig. 4). The parsimony analysis found seven trees (for 19 sequences, 37 nonuniform characters, 15 informative positions, tree length 46, consistency index excluding noninformative characters 0.696, retention index 0.720). The internal branch linking E. coli rrnH with E. sp. was supported at 78% level of bootstrap replication and showed a significantly positive length at P < 0.01 using the maximum likelihood method.
Discussion

Relationships among organisms are often presented as evolutionary trees. The resolution obtained in a given phylogenetic tree is generally thought to be a schematic picture of our understanding of the evolutionary history of these organisms. A group of species appears as a clade when they all descend from a node from which no species outside the group descends. In figure 1 for example, two clades can be robustly identified, i.e., that of {Escherichia + Shigella} and that of {Salmonella}. However, further analyses could not derive decisive phylogenetic ingroup relationships within the {Escherichia + Shigella} and Salmo-
[Figure 3: sequence alignment; column blocks numbered 80, 90, 251, 261, 271, 1003, 1013, 1023, 1033]
FIG. 3.-Aligned
SSU t-RNA sequences for Enterobacteriaceae
and for the seven operons of E. coli strain PK3. Only parts of the sequences
that showed differences from one operon to another are shown. Positions that were identical to that in the first sequence are indicated by dashes.
Asterisks are used to show deletions necessary to maximize homologies.
For strain PK3, individual operons were isolated and sequenced
(Escherichia coli 4 rrn A-H), but a sequence was also obtained directly from PCR products using full genomic DNA (E. coli 4). Note that for
each domain, a sequence that is different in one operon of strain PK3 matches a mean sequence of another species (underlined). Sequence for
operon B was identical to that obtained after cloning of the same operon: E. coli 2 (Brosius, et al. 1978; Weisburg et al. 1991). Operon H was
different from any other operon at positions strikingly located in the 3’ end of the sequence. Confirmation of the peculiar 3’ end sequence of
operon H can be found elsewhere (Nakayashiki et al. 1992; Miyamoto 1993). Numbers refer to positions in E. coli 2 sequence. Abbreviations:
Eco: Escherichia coli; E.: Escherichia; Sh.: Shigella; S. : Salmonella; C.: Citrobacter, K.: Klebsiella; Se.: Serratia; Er.: Erwinia. Eco4 rrn ABCE
correspond to rrnA, B, C, and E of E. coli 4, which possess identical sequences. rrnA, B, and E of E. coli 8 were also identical and are
referenced as Eco 8 rrn ABE.
nella clades, at least on the basis of SSU t-RNA sequences analysis. This problem is clearly due to a lack of data
because not enough differences exist within these genera
between the SSU rRNA sequences of such closely related species, at least as these sequences are currently
obtained. Usually, a mean sequence of the different copies of the rRNA genes is now obtained by direct sequencing of rRNA molecules (Devereux et al. 1990; this
work) or by direct sequencing of PCR products (Lawson
et al. 1993; Rainey and Stackebrandt
1993; Willems and
Collins 1993; Ruimy et al. 1994; this work). Such sequences are filtered either because mutations
present
only in a single gene copy are not apparent or because
bases are scored as undetermined
when ambiguities are
apparent. The success of this filtering effort was verified
by comparing
bulk sequences
obtained by direct sequencing of PCR products or rRNA with individual sequences obtained after isolation of each operon (fig. 3).
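The filtering that a bulk (mean) sequence applies to single-copy variants can be illustrated with a small sketch. It is not the procedure used in this work: direct sequencing reads mixed signals rather than counting copies, and the function name `bulk_sequence`, the 75% threshold, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def bulk_sequence(operon_seqs, min_fraction=0.75):
    """Column-wise consensus of aligned copies; 'N' where no base reaches min_fraction.

    min_fraction is a hypothetical threshold standing in for the point at which
    a mixed signal would be scored as undetermined.
    """
    length = len(operon_seqs[0])
    if any(len(s) != length for s in operon_seqs):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*operon_seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)

# Hypothetical toy data: seven aligned operon copies, not sequences from the study.
operons = [
    "GAAGCTTGCT",
    "GAAGCTTGCT",
    "GAAGCTTGCT",
    "GAAGCTTGCT",
    "GTAGCATGCT",  # column 5 variant is present in this copy only
    "GTAGCTTGCT",
    "GTAGCTTGCT",
]

mean_seq = bulk_sequence(operons)
print(mean_seq)
```

In this toy case the variant carried by a single copy disappears from the mean sequence, while the column that is split between four and three copies is scored as undetermined.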
Most of the variable positions that have been observed from one operon to another are located in domains of high mutation rates. These highly divergent domains are usually excluded from the data matrix when a phylogenetic analysis is undertaken to resolve distant relationships, because these positions would include too many characters that are obviously homoplasic. It therefore does not matter whether a heterogeneity is unknown or ignored, provided that these characters are removed from phylogenetic analyses. Any of the sequencing methods mentioned above is then probably appropriate for determining the sequences of conserved domains. It is worth noting that the sequences of conserved domains, as determined from different operons in different species, do not respect the homology condition, but they are identical in practice within a clade of related organisms, probably because the rate of fixation of mutations allowed by the selective pressure is lower than the rate of homogenization through conversion (see below). As heterogeneity between the different members of the rrn family has already been described in Mycoplasma mycoides and Mycoplasma sp. (Bascuñana et al. 1994; Pettersson, Johansson, and Uhlen 1994), our results suggest that heterogeneity among rrn operons might be a common phenomenon, at least in bacteria. Because this heterogeneity increases the number of differences that can be analyzed between the rRNA sequences of any two organisms, obtaining a more robust phylogeny between closely related organisms may be considered after sequencing each operon, determining operon homology, and comparing sequences that are homologous. Although this approach may seem sound, it is probably not appropriate because of a likely loss of homology at the nucleotide level that will result from frequent conversions (discussed below).
FIG. 4.-Phylogenetic relationships among the seven rrn operons of Escherichia coli PK3. The topology shown is an unrooted tree obtained using a neighbor-joining algorithm. Branches also found by maximum likelihood (branches significantly positive at P < 0.01) are labeled with asterisks. Support from a bootstrap analysis using parsimony is shown as the percentage of replications above each branch (only percentages above 50% are indicated). These numbers also indicate branches retrieved in the most parsimonious tree.
The differences observed between the seven operons that we have sequenced are not located randomly over the entire SSU rRNA sequence; for each operon, they are localized within a single particular domain of high evolutionary rate (fig. 3). Because two different operons can bear identical mutations all located within a short domain (operons D and G between positions 71 and 100, for example), the independent appearance of point mutations in the sequence of each operon appears to have been obliterated by recombination events. The alternative hypothesis would be that these identical mutations at five different positions appeared independently in two sequences. Although these mutations are compensatory, in that they do not alter the secondary structure, it seems unlikely that five identical mutations occurred in a single variable domain for only two operons and not in the other domains or in the other operons (see operons D and G in positions 251-280). A decisive proof that such domain identity is the result of gene recombination and not of convergent point mutations would require a statistical analysis including data from several closely related strains of E. coli, but we suggest that recombinations have occurred and that they did not affect the entire length of the SSU rRNA sequences. Finally, the very different sequences observed in the 1001-1050 domain of operon H and the 251-280 domain of operon D are very similar to some bulk sequences of species belonging to other genera.
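The kind of observation described above (two operons sharing several identical differences confined to one short domain) can be screened for with a short sketch. This only flags candidate domains; it is not the statistical analysis that a decisive proof would require. The helper names, the 30-column window, and the toy sequences are hypothetical.

```python
def diff_positions(reference, seq):
    """0-based alignment columns where seq differs from reference."""
    return {i for i, (r, s) in enumerate(zip(reference, seq)) if r != s}

def shared_variants(reference, seq_a, seq_b):
    """Columns where both sequences differ from the reference with the same base."""
    both = diff_positions(reference, seq_a) & diff_positions(reference, seq_b)
    return sorted(i for i in both if seq_a[i] == seq_b[i])

def clustered(positions, window):
    """True if all positions fall within a stretch of `window` columns."""
    return bool(positions) and positions[-1] - positions[0] < window

def mutate(seq, changes):
    """Return seq with the given {column: base} substitutions applied."""
    chars = list(seq)
    for pos, base in changes.items():
        chars[pos] = base
    return "".join(chars)

# Hypothetical toy alignment, not sequences from the study.
reference = "ACGT" * 10
shared = {5: "A", 9: "G", 12: "T", 18: "C", 22: "A"}
operon_d = mutate(reference, shared)
operon_g = mutate(reference, {**shared, 35: "A"})  # one extra private change

positions = shared_variants(reference, operon_d, operon_g)
print(positions, clustered(positions, window=30))
```

Five identical differences falling within one window, with no shared differences elsewhere, is the pattern the text interprets as more likely to reflect recombination than independent point mutations.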
The SSU rRNA sequence of each operon within a particular line of descent probably results from the superposition of different phenomena: (1) an initially heterogeneous multigene family when the taxon diverged, (2) fixation of random mutations, (3) gene conversions within the multigene family, and (4) lateral transfers. Gene conversions affecting short domains are suggested by our results, which show small identical domains in different operons. It is still difficult to ascertain the origin of the differences that allow identification of such recombinations within a single organism. They may result from point mutations since the establishment of the lineage, from a heterogeneous rrn family at the time of divergence, or from a lateral transfer. For E. coli, lateral transfers can occur even from distantly related organisms such as S. typhimurium (Rayssiguier, Thaler, and Radman 1989). An interesting hypothesis is that the sequences observed presently in strain PK3 are in part the result of a polymorphic rrn family in the bacterium that was the common ancestor at least of Escherichia and Salmonella, and of an absence of homogenizing conversion between some domains (the 3’ end of rrnH, for example) and the rest of the family during the lapse of time that separates strain PK3 from this ancestor. In this view, it can be predicted that some strains of E. coli have probably experienced a conversion and that these domains have been replaced by a typical E. coli sequence in these strains.
After the completion of homogenization within the rRNA family in a single genome, a direction of homogenization can be defined by considering which particular operonic sequence has replaced the ancestral sequences in all other operons. If the homogenization process does not involve the entire SSU rRNA sequence, different directions of homogenization can occur for different domains of the molecule; as a result, even without sequencing each operon, this phenomenon could be traced from the observation of conflicting trees when, respectively, the 5’ end or the 3’ end of mean sequences, or different variable domains, are used in a phylogenetic analysis. Apart from our observations, other studies have demonstrated this problem either with SSU rRNA sequences (Sneath 1993) or with the multigene families of heat-shock HSP70 (Boorstein, Ziegelhoffer, and Craig 1994) and chorion proteins (Regier et al. 1994), for which trees derived from the C-terminal domain differed from those obtained using the N termini. Inheritance of a polymorphism followed by a subsequent complete homogenization through conversions would cause all gene trees to become coherent and display robust taxa that may reflect a similarity in the directions of conversions rather than a true monophyletic assemblage of species. It is a well-known problem that phylogenetic analyses reflect the genealogy of the genes, not necessarily that of the organisms that possess the sequences in question (Nei 1987). One way to circumvent such chance phylogenetic relationships due to historical recombinations is to use another gene, such as the rRNA gene for the large subunit, which is expected to be differently homogenized by conversion, or to use a different type of gene, such as a single-copy gene not liable to gene conversion (Barcak and Wolf 1988; Nelson, Whittam, and Selander 1991; Nelson and Selander 1992; Boyd et al. 1994). However, lateral transfers have been demonstrated for many single-copy genes, resulting in conflicting phylogenies when different genes are studied (Nelson, Whittam, and Selander 1991; Nelson and Selander 1992, 1994); an example of such conflict can be observed for the position of E. vulneris in our study and that of Lawrence and collaborators (Lawrence, Ochman, and Hartl 1991). Organismal trees need to be derived from comparisons of phylogenies of several genes widely spaced on the genome in order to avoid as much as possible the effect of lateral transfers. However, hierarchical relationships and a rigid concept of species may not always apply at the gene level, because evolutionary processes are probably in part reticulate, as opposed to strictly hierarchical (“modification with descent”), and taxon dependent.
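The conflict between the 5’ end and the 3’ end of mean sequences mentioned above can be illustrated with a deliberately crude sketch. Instead of building a tree for each region, it only asks which sequence is closest to a query in each region, using the uncorrected proportion of differences. The function names, the region boundaries, and the toy sequences are assumptions; a real analysis would build and compare full trees.

```python
def p_distance(a, b):
    """Proportion of differing columns, ignoring columns with gaps or undetermined bases."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)

def nearest(query, others, start, end):
    """Name of the sequence in `others` closest to `query` over columns [start, end)."""
    return min(
        others,
        key=lambda name: p_distance(query[start:end], others[name][start:end]),
    )

# Hypothetical toy mean sequences (20 columns), not data from the study.
query = "ACGTACGTAC" + "GGCCTTAAGG"
others = {
    "taxon_A": "ACGTACGTAC" + "GGACTAAAGC",  # identical to the query in the 5' half
    "taxon_B": "ACTTACGAAC" + "GGCCTTAAGG",  # identical to the query in the 3' half
}

five_prime = nearest(query, others, 0, 10)
three_prime = nearest(query, others, 10, 20)
print(five_prime, three_prime, "conflict" if five_prime != three_prime else "coherent")
```

When the two regions point to different nearest relatives, as in this toy case, the pattern is compatible with different directions of homogenization in different domains, although it does not by itself exclude other explanations.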
Considering the large database of SSU rRNA sequences now available (Maidak et al. 1994), the ease with which these sequences are now obtained for any organism, and their wide use in bacterial systematics, it will be important to determine the rates at which gene homogenizations and lateral transfers occur in the rRNA multigene family. A precise assessment of the relations between the species tree and the gene tree will require knowledge of the respective rates of speciation, recombination, and point mutation. Finally, some care should be taken when deriving species-specific probes, especially for detection by PCR, as a single operon in one species might bear a sequence that is present in the majority of the operons in another species (see fig. 3). How often such problems are encountered is still difficult to assess, because we lack data for individual operonic sequences in closely related organisms and because we have only poor estimates of the frequencies of recombination for rrn operons (Milkman and McKane Bridges 1990, 1993; Medigue et al. 1991; Hartl 1992; Lenski 1993; Guttman and Dykhuizen 1994).
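The caution about species-specific probes suggests checking a candidate probe against each individual operon rather than against mean sequences only. The sketch below uses exact substring matching, which is a simplification: real probes and primers tolerate some mismatches and may target either strand. The function name, species labels, and sequences are hypothetical.

```python
def probe_hits(probe, operons_by_species):
    """Map each species to the names of its operons that contain the probe target."""
    hits = {}
    for species, operons in operons_by_species.items():
        matched = [name for name, seq in operons.items() if probe in seq]
        if matched:
            hits[species] = matched
    return hits

# Hypothetical toy data, not sequences from the study.
operons_by_species = {
    "species_X": {
        "rrnA": "TTGACCGTAGGCTAAC",
        "rrnB": "TTGACCGTAGGCTAAC",
    },
    "species_Y": {
        "rrnA": "TTGACTGAAGGCTAAC",
        "rrnH": "TTGACCGTAGGCTAAC",  # single operon bearing the species_X-like domain
    },
}

probe = "CCGTAGG"  # intended to be specific to species_X
hits = probe_hits(probe, operons_by_species)
print(hits)
```

Here the probe designed for species_X also matches one operon of species_Y, which is the situation the text warns may lead to false positives, particularly with PCR-based detection.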
Acknowledgments
This work was supported by the CNRS, funding from BioMérieux, and an MRT fellowship to V. Cilia. We thank Bernard Michot, Christiane Rayssiguier, Manolo Gouy, and Andrew B. Smith for reading this manuscript.
LITERATURE CITED
ASH, C., J. A. E. FARROW, S. WALLBANKS, and M. D. COLLINS. 1991. Phylogenetic heterogeneity of the genus Bacillus revealed by comparative analysis of small-subunit-ribosomal RNA sequences. Lett. Appl. Microbiol. 13:202-206.
AUSUBEL, F. M., R. BRENT, R. E. KINGSTON, D. D. MOORE, J. G. SEIDMAN, J. A. SMITH, and K. STRUHL, eds. 1991. Current protocols in molecular biology. John Wiley & Sons, New York.
BARCAK, G. J., and J. WOLF. 1988. Comparative nucleotide sequence analysis of growth-rate-regulated gnd alleles from natural isolates of Escherichia coli and from Salmonella typhimurium LT-2. J. Bacteriol. 170:372-379.
BASCUÑANA, C. R., J. G. MATTSSON, G. BÖLSKE, and K. E. JOHANSSON. 1994. Characterization of the 16S rRNA genes from Mycoplasma sp. strain F38 and development of an identification system based on PCR. J. Bacteriol. 176:2577-2586.
BERCHTOLD, M., W. LUDWIG, and H. KÖNIG. 1994. 16S rDNA sequence and phylogenetic position of an uncultivated spirochete from the hindgut of the termite Mastotermes darwiniensis Froggatt. FEMS Microbiol. Lett. 123:269-274.
BOORSTEIN, W. R., T. ZIEGELHOFFER, and E. A. CRAIG. 1994. Molecular evolution of the HSP70 multigene family. J. Mol. Evol. 38:1-17.
BOYD, E. F., K. NELSON, F. S. WANG, T. S. WHITTAM, and R. K. SELANDER. 1994. Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:1280-1284.
BRENNER, D. J., and S. FALKOW. 1971. Molecular relationships among members of Enterobacteriaceae. Adv. Genet. 16:81-118.
BRITSCHGI, T. B., and S. J. GIOVANNONI. 1991. Phylogenetic analysis of a natural marine bacterioplankton population by rRNA gene cloning and sequencing. Appl. Environ. Microbiol. 57:1707-1713.
BROSIUS, J., M. L. PALMER, J. P. KENNEDY, and H. F. NOLLER. 1978. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. USA 75:4801-4805.
CASADABAN, M. J. 1976. Transposition and fusion of the lac genes to selected promoters in Escherichia coli using bacteriophage lambda and mu. J. Mol. Biol. 104:541-555.
DE RIJK, P., J. M. NEEFS, Y. VAN DE PEER, and R. DE WACHTER. 1992. Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res. 20:2075-2089.
DEVEREUX, R., S. HE, C. L. DOYLE, S. ORKLAND, D. A. STAHL, J. LEGALL, and W. B. WHITMAN. 1990. Diversity and origin of Desulfovibrio species: phylogenetic definition of a family. J. Bacteriol. 172:3609-3619.
DOVER, G. 1982. Molecular drive: a cohesive mode of species evolution. Nature 299:111-117.
---. 1987. DNA turnover and the molecular clock. J. Mol. Evol. 26:47-58.
FOX, G. E., J. D. WISOTZKEY, and P. J. JURTSHUK. 1992. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int. J. Syst. Bacteriol. 42:166-170.
FUHRMAN, J. A., K. MCCALLUM, and A. A. DAVIS. 1992. Novel major archaebacterial group from marine plankton. Nature 356:148-149.
GIOVANNONI, S. J., T. B. BRITSCHGI, C. L. MOYER, and K. G. FIELD. 1990. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345:60-63.
GUTTMAN, D. S., and D. E. DYKHUIZEN. 1994. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266:1380-1383.
HARTL, D. 1992. Population genetics of microbial organisms. Curr. Opin. Genet. Dev. 2:937-942.
HARVEY, S., and C. W. HILL. 1990. Exchange of spacer regions between rRNA operons in Escherichia coli. Genetics 125:683-690.
HARVEY, S., C. W. HILL, C. SQUIRES, and C. L. SQUIRES. 1988. Loss of the spacer loop sequence from the rrnB operon in the Escherichia coli K-12 subline that bears the relA1 mutation. J. Bacteriol. 170:1235-1238.
HASSOUNA, N., B. MICHOT, and J. P. BACHELLERIE. 1984. The complete nucleotide sequence of mouse 28S rRNA gene. Implications for the process of size increase of large subunit rRNA in higher eukaryotes. Nucleic Acids Res. 12:3563-3583.
HILL, C. W., and B. W. HARNISH. 1981. Inversion between ribosomal RNA genes of Escherichia coli. Proc. Natl. Acad. Sci. USA 78:7069-7072.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism, Vol. III. Academic Press, New York.
KAHN, P. L. 1968. Isolation of high-frequency recombining strains from Escherichia coli containing the V colicinogenic factor. J. Bacteriol. 96:205-214.
KIMURA, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
KRIEG, N. R., and J. G. HOLT. 1984. Bergey’s manual of systematic bacteriology, Vol. 1. Williams & Wilkins, Baltimore, London.
LAWRENCE, J. G., H. OCHMAN, and D. L. HARTL. 1991. Molecular and evolutionary relationships among enteric bacteria. J. Gen. Microbiol. 137:1911-1921.
LAWSON, P. A., P. LLOP-PEREZ, R. A. HUTSON, H. HIPPE, and M. D. COLLINS. 1993. Towards a phylogeny of the clostridia based on 16S rRNA sequences. FEMS Microbiol. Lett. 113:87-92.
LENSKI, R. 1993. Assessing the genetic structure of microbial populations. Proc. Natl. Acad. Sci. USA 90:4334-4336.
LONG, E. O., and I. B. DAWID. 1980. Repeated genes in eukaryotes. Annu. Rev. Biochem. 49:727-764.
LOUGHNEY, K., E. LUND, and J. E. DAHLBERG. 1982. tRNA genes are found between the 16S and 23S rRNA genes in Bacillus subtilis. Nucleic Acids Res. 10:1607-1625.
MAIDAK, B. L., N. LARSEN, M. J. MCCAUGHEY, R. OVERBEEK, G. J. OLSEN, K. FOGEL, J. BLANDY, and C. R. WOESE. 1994. The ribosomal database project. Nucleic Acids Res. 22:3485-3488.
MÉDIGUE, C., T. ROUXEL, P. VIGIER, A. HÉNAUT, and A. DANCHIN. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851-856.
MILKMAN, R., and M. MCKANE BRIDGES. 1990. Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126:505-517.
---. 1993. Molecular evolution of the Escherichia coli chromosome. IV. Sequence comparisons. Genetics 133:455-468.
MIYAMOTO, K. 1993. Nucleotide sequence of 5’ flanking region of the ribosomal RNA. Unpublished material. EMBL, access number: D15061.
NAKAYASHIKI, T., T. ADACHI, T. MIKI, and H. INOKUCHI. 1992. Nucleotide sequence of the rrnH gene from Escherichia coli. Unpublished material. EMBL, access number: D12649.
NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
NELSON, K., and R. K. SELANDER. 1992. Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J. Bacteriol. 174:6886-6895.
---. 1994. Intergeneric transfer and recombination of the 6-phosphogluconate dehydrogenase gene (gnd) in enteric bacteria. Proc. Natl. Acad. Sci. USA 91:10227-10231.
NELSON, K., T. S. WHITTAM, and R. K. SELANDER. 1991. Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc. Natl. Acad. Sci. USA 88:6667-6671.
OHTA, T. 1991. Multigene families and the evolution of complexity. J. Mol. Evol. 33:34-41.
PAITHANKAR, K. R., and K. S. N. PRASAD. 1991. Precipitation of DNA by polyethylene glycol and ethanol. Nucleic Acids Res. 19:1346.
PETTERSSON, B., K. E. JOHANSSON, and M. UHLEN. 1994. Sequence analysis of 16S rRNA from mycoplasmas by direct solid-phase sequencing. Appl. Environ. Microbiol. 60:2456-2461.
RAINEY, F. A., and E. STACKEBRANDT. 1993. 16S rDNA analysis reveals phylogenetic diversity among the polysaccharolytic clostridia. FEMS Microbiol. Lett. 113:125-128.
RAYSSIGUIER, C., D. S. THALER, and M. RADMAN. 1989. The barrier to recombination between Escherichia coli and Salmonella typhimurium is disrupted in mismatch repair mutants. Nature 342:396-401.
REGIER, J. C., B. M. WIEGMANN, R. F. LECLERC, and T. P. FRIEDLANDER. 1994. Loss of phylogenetic information in chorion gene families of Bombyx mori by gene conversion. Mol. Biol. Evol. 11:72-87.
RÖSSLER, D., W. LUDWIG, K. H. SCHLEIFER, C. LIN, T. J. MCGILL, J. D. WISOTZKEY, J. R. JURTSHUK, and G. E. FOX. 1991. Phylogenetic diversity in the genus Bacillus as seen by 16S rRNA sequencing studies. System. Appl. Microbiol. 14:266-269.
RUIMY, R., V. BREITTMAYER, P. ELBAZE, B. LAFAY, O. BOUSSEMART, and R. CHRISTEN. 1994. Phylogenetic analysis and assessment of the genera Vibrio, Photobacterium, Aeromonas, and Plesiomonas deduced from small-subunit rRNA sequences. Int. J. Syst. Bacteriol. 44:416-426.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SCHMIDT, T. M., E. F. DELONG, and N. R. PACE. 1991. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing. J. Bacteriol. 173:4371-4378.
SMITH, A. B., B. LAFAY, and R. CHRISTEN. 1992. Comparative variation of morphological and molecular evolution through geologic time: 28S ribosomal RNA versus morphology in echinoids. Phil. Trans. R. Soc. Lond. B 338:365-382.
SNEATH, P. H. A. 1993. Evidence from Aeromonas for genetic crossing-over in ribosomal sequences. Int. J. Syst. Bacteriol. 43:626-629.
SUZUKI, Y., and T. YAMADA. 1988. Study on rRNA genes in Mycobacterium smegmatis. Microbiol. Immunol. 32:1259-1262.
SWOFFORD, D. 1992. PAUP: phylogenetic analysis using parsimony, version 3.0s. Illinois Natural History Survey, Champaign.
WARD, D. M., R. WELLER, and M. M. BATESON. 1990. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 345:63-65.
WEISBURG, W. G., S. M. BARNS, D. A. PELLETIER, and D. L. LANE. 1991. 16S ribosomal DNA amplification for phylogenetic study. J. Bacteriol. 173:697-703.
WILLEMS, A., and M. D. COLLINS. 1993. Phylogenetic analysis of rhizobia and agrobacteria based on 16S rRNA gene sequences. Int. J. Syst. Bacteriol. 43:305-313.
PAUL M. SHARP, reviewing editor
Accepted November 8, 1995