Download Molecualr Biology and Evolution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Minimal genome wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Genome (book) wikipedia , lookup

Gene desert wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Non-coding DNA wikipedia , lookup

History of genetic engineering wikipedia , lookup

Genome evolution wikipedia , lookup

Human genome wikipedia , lookup

Quantitative comparative linguistics wikipedia , lookup

Expanded genetic code wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Gene expression programming wikipedia , lookup

Pathogenomics wikipedia , lookup

Genetic code wikipedia , lookup

Gene expression profiling wikipedia , lookup

Designer baby wikipedia , lookup

Gene wikipedia , lookup

Genomics wikipedia , lookup

Genome editing wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

RNA-Seq wikipedia , lookup

Point mutation wikipedia , lookup

Microevolution wikipedia , lookup

Helitron (biology) wikipedia , lookup

Metagenomics wikipedia , lookup

Maximum parsimony (phylogenetics) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Computational phylogenetics wikipedia , lookup

Transcript
Assessing Horizontal Transfer of nifHDK Genes in Eubacteria: Nucleotide
Sequence of nifK from Frankia Strain HFPCcI3
Ann il.4 Hirsch, Heather I. McKhann,’ Amala Reddy,2Jiayu Liao, Yiwen Fang,
and Charles R. Marshall*
Department
of Biology
and *Department
of Earth and Space Sciences, University
of California,
Los Angeles
Introduction
The ability to fix nitrogen is widely distributed
among representatives of the eubacteria, methanogens,
and halobacteria (Young 1992). Nitrogenase is an enzyme complex consisting of two components: dinitrogenase, the molybdenum-iron
protein, which is an a&
tetramer encoded by nifD and niflu, respectively, and
dinitrogen reductase, the iron-containing
protein, a
homodimer encoded by n$H. Ruvkun and Ausubel
( 1980) were the first to demonstrate that the nifgenes
of evolutionary-diverse
diazotrophs are remarkably
conserved. Such conservation, coupled with the fact that
the capacity to fix nitrogen was thought to be uncommon
yet widely dispersed among the major bacterial groups,
led some to hypothesize that nzxgenes have been hori-
Institut
1. Current
address: Centre
des Sciences Vegetales,
2. Current
address:
12 17, Bangladesh.
National de la Recherche
Scientifique,
9 1198 Gif-sur-Yvette,
Cedex, France.
A23 Century
Estate,
Baramaghbazaar,
Key words: nzyK, Frankia,
16s rRNA
phylogeny.
nzyH,
Address
for correspondence
and reprints:
Ann
partment
of Biology,
405 Hilgard
Avenue,
University
Los Angeles, California
90024- 1606.
Mol. Bid. Evol. 12(1):16-27.
0 1995 by The University
0737-4038/95/1201-0003$02.00
1995.
of Chicago.
Dhaka
All rights reserved.
M. Hirsch,
Deof California,
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
The structural genes for nitrogenase, nifK, nifD, and nifr-r, are crucial for nitrogen fixation. Previous phylogenetic
analysis of the amino acid sequence of nifH suggested that this gene had been horizontally
transferred from a
proteobacterium
to the gram-positive/cyanobacterial
clade, although the confounding
effects of paralogous comparisons made interpretation
of the data difficult. An additional test of nifgene
horizontal transfer using nzjD was
made, but the NifD phylogeny lacked resolution. Here nifgene phylogeny is addressed with a phylogenetic analysis
of a third and longer nifgene, nzyK. As part of the study, the nifK gene of the key taxon Frankia was sequenced.
Parsimony and some distance analyses of the nifK amino acid sequences provide support for vertical descent of
nzyK, but other distance trees provide support for the lateral transfer of the gene. Bootstrap support was found for
both hypotheses in all trees; the nifK data do not definitively favor one or the other hypothesis. A parsimony
analysis of NifH provides support for horizontal transfer in accord with previous reports, although bootstrap
analysis also shows some support for vertical descent of the orthologous nifH genes. A wider sampling of taxa and
more sophisticated methods of phylogenetic inference are needed to understand the evolution of nifgenes.
The
nifgenes may also be powerful phylogenetic tools. If nifK evolved by vertical descent, it provides strong evidence
that the cyanobacteria and proteobacteria are sister groups to the exclusion of the firmicutes, whereas 16s rRNA
sequences are unable to resolve the relationships of these three major eubacterial lineages.
zontally transferred among microorganisms (Ruvkul
and Ausubel 1980; Normand and Bousquet 1989; Mu1
lin and An 1990). Others believe that nitrogenase is al
ancient enzyme (Kleiner et al. 1976) and that the ni
genes evolved by vertical descent (e.g., Hennecke et al
1985).
The use of phylogenetic analysis to detect horizonta
transfer involves the detection of discrepancies betweel
the phylogeny of the gene of interest and a reference, o
standard, phylogeny. The strongest evidence for th
horizontal transfer of nifgenes comes from the pioneer
ing investigations of the phylogeny of the nifH gent
(Normand and Bousquet 1989; Normand et al. 1992)
Using the 16s rRNA phylogenies of Woese ( 1987), 01
sen and Woese ( 1993), and Olsen et al. ( 1994) as stan
dards, the nzf.H tree has three unanticipated features
( 1) the thallobacterium Frankia is grouped with the cy
anobacterium Anabaena and not with the other firmi
cute, the firmibacterium Clostridium; (2) this Frankia
Anabaena clade lies within the proteobacterial clade
and (3) the P-proteobacterium Thiobacillus lies withir
the a-proteobacterial
clade. The case for horizonta
transfer has centered on the unusual position of Frankit
and Anabaena within the proteobacteria, and it was sug
gested that the FrankialAnabaena
clade acquired its m
Fmnkia HFPCcI3
nifK Gene
17
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
genes horizontally from a proteobacterium. The case has genes of Frankia strain CeD and was provided by P.
also hinged on the absence of nitrogen fixation in many
Simonet ( University of Lyon I ) .
eubacterial clades (Normand and Bousquet 1989).
However, Young ( 1992) argued that the NifH tree Preparation of DNA and Hybridizations
of Normand and Bousquet ( 1989) does not provide clear
Plasmid DNA was prepared according to estabsupport for the horizontal transfer of nifrr, noting
lished procedures (Sambrook et al. 1989). Total DNA
( 1) that the crucial branches on the nifH tree are not
from Frankia strain HFPCcI3 was isolated as described
defined unambiguously; (2) that if the Clostridium n$H
previously (Reddy et al. 1992). The DNA was digested
is paralogous to the other n&H genes analyzed, then it with restriction enzymes, subjected to electrophoresis,
may be the aberrantly placed sequence, not the Frankia
and blotted onto Nytran membranes (Schleicher and
sequence; and (3) that the G+C contents of the n$H
Schuell) or GeneScreen Plus (New England Nuclear)
genes match the rest of the genomes in which they are according to the manufacturers’ protocol. Hybridizations
found and not the G+C contents of the hypothesized
were performed using a radioactively labeled probe made
donor lineage.
either by nick translation or with an Oligo-Labeling kit
Nonetheless, the subsequently published NifH tree (Pharmacia). Washes were done under stringent conof Normand et al. ( 1992) does provide some support
ditions as described in Sambrook et al. ( 1989). A partial
for horizontal transfer: the FrankialAnabaena
branch
restriction map was made of pAR2 and used to generate
lies within the proteobacterial clade with bootstrap sup- smaller subclones of the n$K region in pUCl19.
port of 80%. Normand et al. ( 1992) provided an additional test of the horizontal transfer hypothesis by ex- Sequencing
amining the phylogeny of another nf gene, n$D.
Initial sequencing was done using a Sequenase kit
Unfortunately, the NifD phylogeny lacks resolution, al(USB) and 35S-dATP (NEN). DNA fragments were
though at face value it is essentially the same as the 16s
separated on an 8% (or 5%) wedge polyacrylamide gel
rRNA topology. If the NifH tree supports horizontal
(BRL) in the presence of 7 M urea (Sanger et al. 1977 ) .
transfer, the NifD tree does not. Given the lack of corInstead of dideoxynucleotides, we used 7-deaza-2 ‘-nuroboration of the NifH tree by the NifD tree, and with
cleotides as well as PCR-based sequencing with a fmolreports of nitrogen fixation in an increasing wider range
sequencing kit (Promega) because the high G+C content
of bacteria, especially among other thallobacteria (highof the DNA often led to unresolvable compressions.
G+C gram-positives), Normand et al. ( 1992) recanted
Synthetic oligonucleotides were used as primers as nectheir earlier position and suggested, in accord with
essary to generate a complete sequence of n$K. Both
Young ( 1992), that vertical descent is the most likely
strands were completely sequenced. For PCR sequencexplanation for the distribution of nifgenes among bacing, reactions were performed on an Ericomp Easy-Cyteria and that the unexpected aspects of the NifH tree
cler thermal cycler. Samples were incubated 2 min at
may be due to the confounding effects of paralogous
95°C and then given 30 cycles of 95°C for 30 s followed
comparisons.
by 70°C for 30 s.
Despite the conclusions of Young ( 1992) and Normand et al. ( 1992), the position of Frankia and AnaSequence Analysis
baena in the nifH tree still requires explanation (as also
All available complete niflv sequences were anadoes the position of the P-proteobacterium Thiobacillus
lyzed.
Evolutionary comparisons were based on deduced
within the a-proteobacterial clade). As noted by Noramino
acid sequences (fig. 1) to reduce the artifacts
mand et al. ( 1992), the lack of signal in the NifD tree
caused
by
the high G+C content of Frankia, which is
renders the lack of conformation of the NifH tree by
especially
high
at third codon positions (Normand and
the NifD tree of little significance. Here we address nif
Bousquet
1989).
Frankia strains have a G+C content
gene phylogeny with a phylogenetic analysis of a third
ranging
from
66%
to 75% (Fernandez et al. 1989). Muland longer nifgene, n$K. No bona fide nifK sequences
tiple-sequence
alignment
was performed by the multifrom the crucial taxon Frankia were to be found in the
alignment
program
CLUSTAL
(Higgens and Sharp
database, so we sequenced nifK from Frankia strain
1988),
followed
by
refinement
by
eye.
HFPCcI3 and subjected the deduced amino acid seThe
deduced
amino
acid
sequences
were subjected
quence to phylogenetic analysis.
to maximum-parsimony
analysis using PAUP version
Material and Methods
3.0s (Swofford 1992) using branch and bound to find
Clones and Vectors
the most parsimonious tree. The aligned amino acid seWe used pUC19 and pUC 119 (Stratagene) as quences were also analyzed using several distance methods provided in PHYLIP 3.5~ (Felsenstein 1993). The
cloning vectors. Plasmid pFQ 167 contains the nifHDK
4
I
ll(
Azo
Thi
Rhi
Brj
Azb
Ana
cc1
Cl0
Azvn
Acvn
Azan
Met
.
MS------Q~IDKINSCYP~FEQDEYQEL~~KRQ-L~-~AHDAQR"Q~:"FAWTTTA~~~NFRR~T~PA~CQ~LGA~CSLGF~~PY"HG~QG~AYFRT~
. . -----. . . . . . . . . . . . . . . . . . . . . V . . . ..-..-.
R . . . . . . ..-...............
H- . . . . . . . . . . . HA . . . . . . . . . . . . . . . . ..I........
. . ------.QV...KAS
. . ..LDQD.KDMLAK..DGF.-.KYPQDKID.-..Q....K..QE...Q......N.............A...EK.M..............S.
------.NA
..
. ..VDHFN..K.PD
. ..M.K..QKTF.-NRLP.W.AR-GQE..K.W..REK..AR-..DG.R.F............St
.A------.SA.HVLDHLE..RGP
. ..QMLAD.-KMF.-NPR.PAE.ER-IR.V.K.P..REK..A-...A.N...........FV.V..EG...F..........Y.St
.P------.SAEHVLDHVE..RGP
. ..QMLAK.-KIF.-NPR.PAE.ER-IKE..K....REK..A....A.N...........FA.V..ER...F..........Y.St
..MSHPVS.SA..VIDHFT..R.P..K...ER.KTEF.YGPTPTKE.AR-.S...K.E..KEK.LPV..WIN.T.....I..M.~Q..EG...F........S.Y..t
.P------.NPERTMHVD..K.P..T...E...KNF.-G..PPEE.ER-.SE..KSWD.REK..A
. . . . . . N . . . G...V..MFAA...EG...F.Q...........t
..GDDDPGNKQFRPAAGPRPQRAVQGRG.PEAV.GQ~VRERQRR.RGRPGPE..PGW..REK..A......N............AG...EG.M.L............St
.-------------------------------------LD.TPKEI.E------------------.K..RIN...T...V..MY~..IH.C..HS......CS.H..~
..NCE------------LTVKPA.VK-.SPRD.EG-----------------------------------IIN.MYD...A..QYAGI.IKDCI.L...G...TM~.~
-.NCE------------LTVKPA.VK-.VKRE.EG----------------------------------IIN.MYD...A..QYAGI.VKDCI.L...G...TM~.~
.T-CE------------VK----------EKG.VG----------------------------------.IN.IFT...A..Q~.I.IKDCIGI...G....M~.Ll
..EVNAGEICYV.------------------KQ.KG----------------------------------.IN.N.I...I..M~TV.VKG.I.F.Q.....TT.V.Y~
Kle
Fat
Azo
Thi
Rhi
Brj
Azb
Ana
cc1
Cl0
Azvn
Acvn
Azan
Met
.....
#
....
I
..
FNRHFKEPIACVSDSMTEDAFGGNNN~LGLQNAS~Y-KPEIIAVSTTC~EVIGDD~AFI~AKK---DGFVDSSIAVPHAHTPSFIGSHVTGWDNMFEGFAK'I
........................................
. .....................
R .....
..---.........
W ..........................
... ..R..V
S .............
..QQ..KD....CK.T.-..D
M ................
N . ..N.S .. ----E..IPDEFP..F......V.............I.R
......
..SS...S.....P.....L...ID..AISYS....KM ................
N . ..KT..E----K.N.PE.FD..F......V...I..Y...MK.ILT
LS ......
ss . ..s ...........
L . ..ID..A.SYNM.-..KM.C-.............N...KTS.E----K.S.RR.-ST.F....A.V......Y..~K.ILE
LS......SS...S...........L...TD..A.SYKM...KM ................
N . ..KTS.E----K.S.PADFD..F....A.V......Y..ALK.ILE
LT ....
..NSA..S...........L...ID...R-Y...-..KM...L............SG..N...N----KES.P~FP..F....A.V...IV.Y...IK.~T
LS..Y...CSA..S...........L...IE.M.VSYQ..-..~...C............G
... T.S.N----A.SIPQDFP..F......V...I..Y...MK.ILS
.A......VPAA
.S .........
..L..LVEA.E..TT..-..KMV............E..F.Y.GA.RD----K.AISAYP..Y......V...L..Y.S.LK.IL
LS ....
L .. ..PTY.SQMED----A.SIPEGKL.I.TN...YV......FA..VQ.I~
..AMASTS.F..G.S....GS.IKTAVK.IFS..-N
.D .... H...LS.T
.AQ .... NFDVA.T.LH.ES
... ..AKRVEE.VLVLARR.PNLRV.PII...ST......IEGS.RVCN~EAE-.P.RK.YL~V.....K......Y~CVKSVF
NFDVA.T.LH.ES
.AQ ....
... ..AKRVEE.VLVLARR.PELRL.PII...ST......IEGT.~CN~~E-.PERK.Y~V.....K......Y~~KSMF
..G .. ..ACGRVEEAVDVLLSR.PDVKWPII...ST
.SQ.Y .. SFELA.S.LH
.I .. ..VDGV.KKLNEGLLKEK.P.REVHLIAM......V..MIS.Y.VAV~W
7777777777777777377777777777777777777777777777777777777777777777777777777777777777777777~
... ..R..VSIATA.FH
.H ...........................................................................................
Kle
Fat
Aro
Thi
Rhi
Brj
Azb
Ana
cc1
Cl0
Atvn
Acvn
Azan
Met
33c
.
.
.
.
.
.
.
.
.
.
.
FTADYQGQPGKL---PKL--NLVTGFETYLG--NFRVLKRMMEOMAVPCSLLSDPSEVLDTPADGHYRMYS-GGTTQQEMKEAPDAIDTLLLQPWQLLKSKKWQEMWNC
. . . . . . . . . . ..---...
KL...P . . . . . ..TG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Q . . . ..- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
..LKSMDDKVVG---SNKKI.I.P.......--....I...LSE.G.GY
. . . . . . E . . . . . . . ..QF...A-.....E...D..N.LN.V.....H.E.T..F.EGT.Kt
.WDGKA.TVPA.ERK.DEKI.FIG..DG.-mG.M.EI,,LFSL.N.DYTI.G.G.D.W......EF...D. . ..FA.AEA.LN.KA.VCM.GISTE.TMAYI..KGQE
.WNGKA.TAP..ERK.NEAI.IIG..DGN-TVG.L.EI..I~.GIKHTI.A.N...F...T..EF...D. ..HVEDTAN.IH.KA.ISM.Q.CTE.TLPF.S.HGQC
. . ..LKDAAN.IH.KA.ISM.Q.CTE.TLSFAA.HGQC
. WDGUA. TAP ..ERK.NGAI.IIG..DG.-TVG.L.EI..IL.L.EI..IL.L.GIQHTV.A.N...F...T..EF...D. ..VH.KA.ISM.EYCTPQ.I.QFIKREGPJ
.W----.TSENFDTPKNETI..IP..DGF-AVG.N.E...IAGLFGIQMTI...V.DNF......E....D-...PL~T
L.EGKKKATS------NGKI.FIP..D..--VG.N.E.....GV.G.DYTI
. ..S.DYF.S.NM.E.E..P-S..KLEDAADSIN.KA.VA..AYTTP.TREYIKTQ.K.
L--SRTADAD.VESPG.PRL.I.P.....
--VG.H.EYR.IL.L.G.DPLI.G.H.SS..S..T.E.EL.P-...P~.~T.RFS~.~..ETTTR.TTEF.~G.K.
LSENTGAK--------NGKI.VIP
..---V.PADM.EI..LF.A.DI.YIMFP.T.G...G.TT.E.K..PE...
KIEDL.DTGNSDL..S.GSYASDLGA.TLEKKCm
I.DAHGKGQ------.SGKL.V---.PGWVNPGDVVL...
YFKE.D.EANIYM.TED-F.S.MLPNKSIETH.R..VEDI
I.EVHGKGQ------.SGKL.V---.PGWVNPGDVVL...
YFKE.G.DATVFM.TED-F.S.MLPNKSIETH.R..VEDI
.AK---REA------.NDKI
..---LTGWVNPGDVKE..HLLGE.DIEANV.FEIES-F.S.ILPDGSAV.H.N..IEDLIDTGN.~.FA.NRYEGT.~YL.KKFEl
~~~~73~~3~~333~~~7~3~777333333777~7777777777777777777777777777~777777~7777777~7~777~777777~77777777~
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..-...................
Kle
FaC
G .a
22c
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
1
t
t
t
t
h
EI
.
..
..
.
44c
Kle
Fat
Azo
Thi
Rhi
Brj
Azb
Ana
cc1
Cl0
Awn
Acvn
Azan
Met
PATEVAIPL;;LAATDELLM~VSQLSGKPIADALTLERG~MMMLD-SH~WLHGKKFGL;GDPDFVMG
. . ..*.......................................... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..-......--.........
EVPKLN..M..DW
. ..F..K..EI..Q..PAS..K.........T.-........R.A.W.........VK.........VH..C..G.-...K..--~AI.A.....~
W-ALHC.I.VTG..HF.QE..GI.....SEE.KK.......AIGT-.ISY......AIK.....CL.VAG......A..VHWCTRG.--------GTPMIHIGF.IHC
W-SFNY.V.VS...D..VAL.RI...E.PEQ.AR.......AIA.-.SAHI.....AI.....LCY..~......A...HV..T.G.-~AGEN---A~FAG..F.E
VL-SFNY.V..S...DFIVAL.RI...E.PEQ.AR.......AIA.-.SAHV.....AI.....LCY..~......A...HV..T.G.-.A..EK--.Q~.AS..F.C
NR?!
GRQAYNY.M.VTG......KLAE.....-SRGVK........AIA.-...H....R.AV......CL.MSK..M...A..VH...TSGS-.K.E.Q--VQ.V...CRS~
ETQVLR-.F.VKG...
F.TA..E.T..A.PEE.EI.......AIT.-.YA.I.....AI.....LIISI.S
. . ..M.A..VH..CN.GD-DTFK.E--.EAI.A...P.H
ETWLET.I.VKN..RF.TE.AR.AEVE.PAE..A.......~T.-..AY....RVAIA....L.I~.G.A....MI.VHL..T..D-NSFAPR--LE.V.ST.KF.E
.FKTLRT.I.VS....
FI.~.EAT..EvPASIEE...Q.I.L.I.-AQQY.Q...VA.L....EIIA.SK.II...AI.KY~GTPG-MKF..E--IDA..A~GI-E
.NAL.NT.Y.IKN..DM.RKIAEVT..E.PES.VR...
IAL.ALA.LA.MFFAN..VAIF.H..L.L..AQ.CM.VEL..VLL.IGDWGNKYK.DPRIEELKNTAHFDl
.NSL.NT.Y.IKN..DM.RKIAEIT..E.PES.VR.PRIAWI~A.LA.MFF~..VAIF.H..L.L..AQ.C..V~..~L.IGDDQGSKYK.DP~Q~K..AHF~
..IIGPT.I.IRN..IF.QNLKKAT....PQS.AH
. ..VAI.ALA.LT.MF.AE.RVAI..A..L.I..AE.C.D.~K.~L.LGDD.-SK~DPRI~QE~..~
~~77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777~77777~
. ... .. .. .. .. .. .. .. ... .. .. .. .. ... .. ... . ... ... .. .. .. ... ... .. ... .. .. ... .. ... ... .. .. .. ... .. .. ... .. .. ... .. .. ... ..
540
.
Kle
Fat
Azo
Thi
Rhi
W
Arb
Ana
cc1
Cl0
Azvn
Acvn
Azan
Met
.
DSEVFINCDLWHFRSLMFTR--QPDFMIGI;SYGKFIQRDDYSFDLVR
RYS.................--................V
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A . . . . . . . . . .
NAT.Y.GK....L...V..D--K..................H...E
. . . . . . . I . . . I . . . . . . ..S..L......Q.L.....SI..R..EE.RGMQA...NH....
RHHHHRYPIWGYQGAVNVLVILDRIHLELDRNTLGIGT.KDWAEK--~~FASS.FGT~.AYPGKDLWHMRSL~TE--P~F.IGNSYTKYLE~..F.Y....
L-PAYPGR....M...L..E--PV..L...TH..YLE...-------GT
. ..Q.GL.VLVKILDKIFDEI.KK..V.........II.
. . . . I...I.....H..FPV
GCQ.YPGR....M...L..E--PV..L...T...YLE...-------AT
. ..Q.GL.VLVKILDKIFDEI.NK.NI.........II.
. . . . I...I.....H..FPI
.
.
..Q..L.VLVRILDRIF.DM.AN.NIV.E.........
SGKAYGAK....L...V..D--KV.YI.......YLE...-------KI.....TY.I.....H..YP
EAK.W.QK........L..E--PV..F.......YLW...-------SI.~.I.Y
. . . . . . . . ..YS.L..Q.GL..LNWV..TL.DEM.RS.NIT....I....I.
AAT.WPGK....L...
V..E--PV.LL..S..L.Y.S.EA-------N...V.V...I
. . . . . . ..FPIV..T.GLHLL.Q...T..DE..RTS-----P.H...V..
G.K.KVEG.FFDVHQWIKNE--GV.LL.S.T.....A.EE-------NI.FV.F...IM..YGHYYNPKV..K..I~.EEIT.VI.D.IEREC-----.EED.EV..
E--1VH . A . ..ELEKRI-NAGL.L.LIM.H.K.RYVAIEA-------NI.~.V...T...AG.Y.KPSI..Q...ELGEMIA..MFAHMEYT~----KEWILN-TW
E--1VH . A . ..ELEKRI-NDGL.L.LIM.H.K.RYVAIEA-------NI.~.V...T...AG.Y.KPSI..Q...~GEMIA..MFAHMEYT~----KEWILN-TW
E--IVT.A.F.ELENRIKNEGLEL.LIL.LIL.H.K.R..SI.Y-------NI.~.V
. ..TY..AG.F.YP.V..G..IWLAEQMA.TLFADMEHKKN-.W
777777777777777777777777777777777777777777777~7777777777777777777777777777~777777777777777777777777777777777
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
..-...I..........................
Frunkia HFPCcI3 n$K Gene
amino acid sequences were first converted to distances
by PROTDIST using either Dayhoff’s ( 1979) PAM 00 1
matrix ( Dayhoff distances) or Kimura’s ( 1983 ) approximation of the Dayhoff distances (Kimura distances):
D = -ln( 1 - p - 0.2 p2),
Results
Isolation of Clones and Sequencing of nifK from
Frankia Strain HFPCcI3
Plasmid pFQ 167, which contains the nz$HDK genes
of Frankia strain CeD (Normand et al. 1988)) was used
for Southern hybridizations against total DNA isolated
from Frankia strain CcI3. A single 5.6-kb BamHI fragment that hybridized to the pFQ167 insert was detected
in the Cc13 genome. A partial genomic library was made
in pUC 19 from Cc13 BamHI-restricted DNA fragments
of 5-9 kb. The plasmid pAR2 was isolated from the
library on the basis of its hybridization to pFQ 167. Plasmids pAR2 and pFQ 167 nifregions were shown to have
similar restriction patterns (not shown). As in other
Frankias (see review by Simonet et al. 1990), the
nzfHDK genes of Frankia strain Cc13 are contiguous
and located on the Frankia chromosome.
The Frankia Cc13 nzfK nucleotide sequence
(GenBank U01238) ranges from a low of 55.5% sequence identity to the Klebsiella pneumoniae nzfK to a
high of 62.5% sequence identity to the Anabaena nzfK,
indicating that the nzfK genes are sufficiently conserved
at the DNA level that they will hybridize; this was the
strategy used by Ruvkun and Ausubel ( 1980) to demonstrate that nzf genes were conserved among evolutionarily diverse diazotrophs. Previously, Simonet et al.
( 1988) indicated on the basis of DNA hybridization that
there is low sequence similarity between K. pneumoniae
and Frankia nzfK. Our sequence data confirm their
findings.
The nucleotide composition of the Frankia Cc13
nzfK gene is highly skewed toward G+C; 9 1.5% G or C
occurs in the third codon position, similar to the average
for the entire genome. The only exception is in amino
acids 10 through 53, which exhibit 67% A or T in the
third position. This stretch of amino acids also shows
minimal similarity with all the other N termini of the
diazotrophs studied.
The protein-coding region of nzfK is 1,356 nucleotides long, including the stop codon, and encodes a
polypeptide of 45 1 amino acids. A potential ribosomal
binding site ( RBS), the tetranucleotide sequence GAGG,
is just upstream of the initial nzfK ATG (Shine and Dalgarno 1974). An intergenic region of 57 nucleotides exists between the start of nzfK and the end of nzfD.
Comparison
Sequences
of Frankia Cc13 nzfK with Published nzjYK
Figure 1 is an alignment of the deduced amino acid
sequence of the Cc13 nzfK with the deduced nzfK polypeptides from published sequences of other diazotrophs.
FIG. I.-Alignment
of available NifK sequences. Regions difficult to align, eliminated before phylogenetic analysis, are indicated by the
horizontal lines. Amino acids that are identical to the reference taxon are indicated by a period and gaps are indicated by a dash; #, the three
cysteine and serine residues of the p subunit involved in the P-cluster pair , ? refers to sequences that are not known. Abbreviation names and
sources: He, Klebsiella pneumoniae (Arnold et al. 1988); Fat, “Frankia” FaCl = Klebsiella pneumoniae (Lignon and Nakas 1988); Azo,
Azotobacter vinelandii (Brigle et al. 1985); Thi, , Thiobacillus ferrooxidans (Rawlings 1988); Rhi, Parasponium Rhizobium (Weinman et al.
1984); Brj, Bradyrhizobium japonicum (Kaluza and Hennecke 1984); Azb, Azospirillum brasilense (Passaglia et al. 199 1); Ana, Anabaena sp.
(Mazur and Chui 1982); Ccl, Frankia strain HFPCcI3, this study; Clo, Clostridium pasteurianum (Wang et al. 1988); Azvn, A. vinelandii vnjK
(Joerger et al. 1990); Acvn, A. chroococcum vnf(Robson et al. 1989); Azan, A. vinelundii anf(Joerger et al. 1989); Met, Methanococcus thermolithotrophicus (Souillard and Silbold 1989).
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
where p is the fraction of amino acids that differ between
two species. The generated distance matrices were then
subjected to Saitou and Nei’s ( 1987) neighbor-joining
(NEIGHBOR)
and to the least squares methods provided in FITCH, with the value of P set to 0.0 (the
method of Cavalli-Sforza and Edwards 1967), 1.O, and
2.0 (the method of Fitch and Margoliash 1967).
Bootstrapping ( Felsenstein 1985 ) was performed
for each analysis. For the parsimony analyses, branchand-bound searches were made using PAUP ( 100 pseudoreplicates were performed). For the distance analyses,
100 pseudoreplicate amino acid sequence datasets were
generated by SEQBOOT in PHYLIP, followed by the
conversion of those 100 datasets to distances according
to the two methods outlined above using PROTDIST.
These pseudoreplicate distance matrices were then analyzed by NEIGHBOR, and the consensus trees were
computed using CONSENSE.
Trees were rooted using either the Methanococcus
thermolithotrophicus nifKgene, the only noneubacterial
sequence analyzed ( Woese 1987 ) (parsimony analysis
only), or with the alternative nitrogenase gene clade
(Azotobacter vinelandii and A. chroococcum vnfKs and
A. vinelandii anfK [see Results]). The justification for
the alternative NifKs being considered as an outgroup
is their paralogous status (Young 1992; Chien and Zinder 1994).
19
20
Hirsch
et al.
I Klebsiella
@
78
, 53
11 “Frankia”
FaCl
111 = (Klebsiella
pneumoniae)
Azot obacter
137
\
pneumoniae
vinelandii
ferrooxidans
Azospirillum
66
77
100
T104
101
Frankia
Clost ridium
‘2
Azot obacter
vinelandii
-
Azotobacter
chroococcum
l9
Azot obact
er vinelandii
sp.
1
HFPCcl3
1
past eurianum
I-
anf
y-prot
p -prot
brasilense
eobact
eria
eria
t hallobact
eria
f irmibact
eobact
eria
I
cyanobact
eobact
ol-prot
japonicum
eria
vnf
vnf
I
eria
1
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
113
Anabaena
eria
Rhfzobium
Bradyrhizobium
*
eobact
l-
Thiobacillus
Parasponfum
y-prot
FIG. 2.-Phylogenetic treeof I3 n[fK sequences
based on deduced amino acid sequences and obtained with parsimony
(PAUP).
The same
topology
is obtained with analysis of Kimura
distances
(see text). The tree is rooted with the alternative
nifK genes. Branch lengths (number
of
amino acid replacements
observed
over entire gene) are given above the horizontal
lines, and bootstrap
values are encircled
at the base of
branches.
All proteobacterial
orthologs
have been highlighted.
The asterisks refer to the positions
of the root for the two equally parsimonious
trees obtained
when A4~~~hanococw.sthermolithotrophiczls is included as the outgroup
(see text).
All are based on complete sequences, except the nzjYKof
Methanococcus thermolithotrophicus,
where only 237
nucleotides were available in the database. Gaps were
introduced by eye to maximize the alignment of the sequences. The three cysteine residues and the serine of
the p subunit that are involved in the P-cluster pair and
conserved among the various taxa are indicated (fig. 1).
Very little amino acid sequence similarity between
Frankia and the other diazotrophs was observed in the
first 52 amino acids of the Frankia Cc13 NifK. Particularly striking is the almost-complete absence of the first
70 amino acids in the Clostridium pasteurianum deduced NifK compared with Frankia, Anabaena, and the
proteobacterial NifK sequences. In this amino terminal
end, large gaps also occur in the sequences of the Azotobacter vinelandii, A. chroococcum vnf Ks (both vanadium based), the A. vinelandii anfK (Fe-only based),
as well as in the M. thermolithotrophicus
n.$K.
The rest of the sequence was generally easy to align,
except for sites 222-236 and 243-253, which, along with
the first 57 amino acids, were eliminated from the phylogenetic analyses (fig. 1) . For most of the other regions
that show length variation, it is almost always the alternative NifKs that contain amino acids that the rest of
the diazotrophs lack (fig. 1).
Phylogeny of NifK:
Parsimony
Initially, parsimony analysis was performed on the
complete sequences with M. thermolithotrophicus used
as the outgroup. Two trees were generated; one rooted
to Clostridium, the other to A. brasilense (asterisks in
fig. 2). However, because of the extensive length variation in the amino terminus of NifK, we eliminated
amino acid positions 1 through 57 from all taxa (fig. 1).
This meant the elimination of almost half the available
M. thermolithotrophicus sequence. For this reason, M.
thermolithotrophicus was removed from all subsequent
analyses, and the alternative Nif polypeptides (deduced
from the Azotobacter vinelandii and A. chroococcum
vnf Ks, and the A. vinelandii anfK> were used as the
outgroup.
The most parsimonious tree found with the removal
of the 57 amino terminal amino acids and the regions
that were difficult to align (fig. 2) was identical to the
tree rooted on Clostridium found with the complete sequences. This tree (fig. 2) shows five distinct groups.
The proteobacterial cluster (Stackebrandt et al. 1988),
supported by 7 1% of the bootstrap replicates, is divided
into two distinct subgroups. One subgroup, supported
by 100% of the bootstrap replicates and consisting of y-
Frankia HFPCcI3
-
Thiobaciiius
Parasponium
Azospiriiium
Azotobacter
Azotobacter
t
Azotobacter
Vineiandii
vineiandii
Vnf
chroococcum
japonicum
brasilense
pasteurianum
anf
0
vnf
FIG. 3.-Phylogenetic
tree of the 13 nifK sequences based on deduced
(see text). Bootstrap
values are encircled
at the base of branches.
distances
All proteobacterial
orthologs
have been highlighted.
proteobacteria, includes the A. vinelandii NifK and the
NifKs from K. pneumoniae and FaC 1. The nucleotide
sequence of FaC 1 was originally published as a Frankia
sequence (Lignon and Nakas 1988), but this is now
known to be the result of contaminating K. pneumoniae
DNA (Lignon and Nakas 1990). It is still listed in the
database as a Frankia nifK sequence, however.
The second proteobacterial subgroup, supported by
59% of the bootstrap replicates, consists of Parasponium
Rhizobium, Bradyrhizobium japonicum, Thiobacillus
ferrooxidans, and A. brasilense NifKs. This clustering
of Thiobacillus, a P-proteobacterium, with members of
the a-proteobacterial Rhizobiaceae is supported by 64%
of the bootstrap replicates. However, if Frankia, or
Clostridium, or both are removed from the analysis, the
most parsimonious tree yields a monophyletic a-proteobacteria. We also note that the last 100 amino acids
of Thiobacillus are highly divergent from all other sequences (fig. 1). However, exclusion of sites 428-528
from the parsimony analysis does not change the topology of the shortest tree or affect the bootstrap values
appreciably. The inclusion of this P-proteobacterial genus with the a-proteobacteria group is also seen in the
NifH trees (Normand and Bousquet 1989; Normand et
al. 1992).
The Frankia and Anabaena NifKs form successive
sister groups to a monophyletic proteobacterial group
ferrooxidans
I
100
’
J
amino acid sequences and most commonly
obtained using Dayhoff
Scale bar indicates
estimated
number
of amino acid replacements.
(supported by 100% and 8 I%, respectively,
strap replicates). The Clostridium NifK, the
positive bacterium included in this study,
group to all other eubacterial sequences,
bootstrap support.
of the bootother gramis the sister
with 100%
Phylogeny of NifK : Distance Analyses
The shortest tree obtained with the distance analyses depended on the specific analysis performed. The
phylogeny computed with the Kimura distances gave
the same topology as the most parsimonious tree (fig.
2 ), whether these distances were analyzed with NEIGHBOR or FITCH (P = 0.0 or 2.0). However, a tree very
similar to the NifH tree of Normand et al. ( 1992) was
found when the Dayhoff distances were used; with
NEIGHBOR and FITCH (P = 1.O, or 2.0)) the Dayhoff
distance tree was essentially the same as the NifH tree
of Normand et al. (fig. 3). With FITCH (P = O.O),
Frankia and Anabaena are still most closely related to
the a- and P-proteobacteria, but in this tree, Frankia is
the sister group to an Anabaena/a+P-proteobacterial
clade.
Phylogeny of NifH
Normand and Bousquet ( 1989) and Normand et
al. ( 1992) only analyzed the complete array of NifH
sequences with distance methods, although Normand
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
Ciostridium
21
Rhlzobium
Bradyrhizobium
-
nifK Gene
22
Hirsch
et al.
latter’s branches are too short by about a factor of three,
unless the branch lengths are number of replacements
per 100 sites.)
Discussion
Review of the Case for and against Horizontal
Transfer of nifGenes
Verification of horizontal transfer is difficult, but
perhaps the most crucial component in establishing
horizontal transfer is that the phylogeny of the sequences
thought to be transferred should conform to a conventional phylogeny except for the radically aberrant position of one member or group (Smith et al. 1992). Using
the 16s rRNA phylogenies of Woese ( 1987), Olsen and
Woese ( 1993), and Olsen et al. ( 1994) as standards, the
Normand and Bousquet ( 1989) NifH tree has three unanticipated features: ( 1) Frankia is grouped with the
cyanobacterium Anabaena and not with the other firmicute, Clostridium; (2) this FrankialAnabaena
clade
lies within the proteobacterial clade; and (3) the P-proteobacterium Thiobacillus lies within the a-proteobacteria clade.
Young ( 1992) gives three major reasons for arguing
that the NifH tree of Normand and Bousquet ( 1989)
Arotobacter
1
I
vineiandii
Azotobacter
vineiandii
Azotobacter
chroococcum
Kiebsieiia
Thiobaciiius
19
11
52
Azotobacter
97
ferrooxidans
Bradyrhizobium
Azospiriiium
!Wzobium
Clostridium
vnf
pneumoniae
14
42
vnf
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
and Bousquet ( 1989) did perform parsimony analyses
on a quartet of taxa, including Anabaena, Frankia,
Klebsiella, and Clostridium. However, if the Clostridium
sequence is paralogous to the others (Young 1992; Chien
and Zinder 1994), then the analysis, with just one orthologous representative from the cyanobacteria, proteobacteria, and gram-positive bacteria, has no bearing
on hypotheses of horizontal transfer.
Given that our analyses of NifK with different
methods produced different trees and that these trees
were consistent either with horizontal transfer or with
vertical descent, we wondered whether NifH might also
produce different topologies under a wider range of
methods of analysis. Hence, we undertook a parsimony
analysis of the complete array of NifH amino acid sequences. Difficult to align regions were eliminated (sites
115-l 17 and 273 to the C-terminal end on Normand
and Bousquet’s [ 19891 alignment), although inclusion
of these regions did not change the topology of the shortest tree. The most parsimonious tree obtained is identical
to the trees of Normand and colleagues ( 1989, 1992)
(fig. 4). (Note that there is a discrepancy in the branch
lengths between the NifH trees reported in Normand
and Bousquet [ 19891 and Normand et al. [ 19921; the
japonicum
brasiiense
nmiiloti
pasteurianum
vineiandii
anf
Methanococcus
thermoiithotrophicus
FIG. 4.-Phylogenetic
tree of 13 nlfH sequences based on deduced amino acid sequences and obtained with parsimony
(PAUP).
The tree
is rooted with Methanococcus thermolithotrophicus. Branch lengths (number
of observed
amino acid replacements
over entire gene) are given
above the horizontal
lines, and bootstrap
values are encircled
at the base of branches.
All proteobacterial
orthologs
have been highlighted.
Sequence sources as in fig. 1 except Thiobacillus ferrooxidans (Pretorius
et al. 1987); Anabaena sp. (Mevarech
et al. 1980); Rhizobium meliloti
(T&ok
and Kondorosi
1981); Bradyrhizobium juponicum (Fuhrmann
and Hennecke
1984); Frankia Ar13 (Normand
et al. 1988).
Frunkia
23
bacterial groups may have diverged >3,500 MYA
(Schopf 1993). If there was even a slight selection pressure toward G+C enrichment (say, with a selection coefficient [s] of 0.0001 ), and if the population sizes (N)
were on the order of 109, then 63% G+C enrichment
could occur in as little as 50,000 yr (the time taken to
acquire 290 substitutions at a substitution rate of K
= Nsp, where K = the substitution rate under selection
[equation modified from Lewontin ( 1974)) to take into
account haploid genomes] . These calculations are simplistic, but they show that the G+C signature test, as
Young ( 1992) implies, lacks power. This is especially
true for the specific hypothesis of horizontal transfer implied by the nifH tree. Note, if the horizontal transfer
occurred from a proteobacterium
to the ancestor of
Frankia and Anabaena, and Frankia later acquired its
G+C-rich signature after it diverged from its common
ancestor with the cyanobacteria, then there is little to
explain-the
G+C contents of the proteobacteria and
cyanobacteria are essentially identical.
While Young seriously undermined the case for
nifH horizontal transfer, the nifH tree of Normand et
al. ( 1992) does provide some support for horizontal
transfer: the Frankia/Anabaena
branch lies within the
proteobacterial clade with bootstrap support of 80% and
the P-proteobacterium lies within the a-proteobacterial
clade with a 99% bootstrap support. Unfortunately,
Normand et al.‘s ( 1992) test of the horizontal-transfer
hypothesis by examining the phylogeny of another nif
gene, nifo, lacks resolution, and the NifD tree will not
be considered further.
Note that the combining of the NifH and NifD
data makes little sense in the context of testing the
hypothesis of horizontal transfer. Insofar as they give
different topologies, it either suggests that one of the
other genes underwent horizontal transfer or that one
or both datasets are contaminated with phylogenetic
noise.
On the Significance of Bootstrap Values
The identification of horizontal transfer clearly requires well-supported phylogenies. The crucial question
is, When is the position of the aberrantly placed group
significant? Typically bootstrap values are used to assess
robustness of individual nodes on a phylogen;. Thus,
for example, the 80% support for the inclusion of the
Frankia / Anabaena clade within the proteobacteria in
the NifH tree (Normand et al. 1992) offers some support
for horizontal transfer. Hillis and Bull ( 1993 ) and Felsenstein and Kishino ( 1993) have discussed the relationship between bootstrap values and the familiar P
values of statistics. Hillis and Bull show that in simulations that the interpretation of bootstrap values as P
values often underestimates the true support for a node.
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
does not provide clear support for the horizontal transfer
of nifrr. First, the crucial branches on the NifH tree are
not defined unambiguously, and, with different algorithms, different NifH trees were obtained. The difficulties associated with assessing the robustness of molecular phylogenies is discussed below. However, it is
noted that the fact that different analyses of the NifH
data by Normand and Bousquet ( 1989) gave different
results is not a concern in this instance. The methods
(UPGMA and PHYLIP’s KITSCH) that gave different
results from all other analyses both assume that sequence
evolution has occurred according to a molecular clock.
Perusal of the phenograms of Normand and Bousquet
( 1989) and Normand et al. ( 1992) suggests this is not
the case. The difficulty is that both UPGMA and
KITSCH will often recover the wrong tree if the underlying data have not evolved according to a clock (see
Marshall 1992a).
Young’s ( 1992) second argument against horizontal transfer is that the Clostridium nfH sequence may
be the aberrantly placed sequence, not the Frankia sequence if it is a paralog of the other n$H genes analyzed.
If the Clostridium n$H sequence is a paralog, then the
support for the separation of Clostridium from the rest
of the Eubacteria analyzed is to be expected and does
not provide support for horizontal transfer. However,
this does not explain inclusion of Frankia and Anabaena
within the proteobacterial clade, or the inclusion of the
P-proteobacterium
Thiobacillus within the a-proteobacterial clade.
Young’s ( 1992) third argument against horizontal
transfer is that the G+C contents of each of the n$H
genes match the rest of the genomes in which they are
found, and not the G+C contents of the hypothesized
donor groups. This certainly argues against a very recent
horizontal transfer. But how recent is recent?
Estimates of the rate of conversion of a set of 290
third positions (the number in nifH) from being entirely
A+T rich to highly G+C rich can be made. If it is assumed that there is no selective advantage to having a
G+C-rich third position, but that observed high G+C
richnesses (e.g., in Frankia) are due to substitution biases
toward G+C, then, using the binomial distribution, it
can be shown that 290 substitutions will, on the average,
convert a set of entirely A+T-rich third positions to
about 63% G+C and that a 97% enrichment would occur
with approximately
1,000 substitutions. Given a bacterial neutral substitution rate (=mutation
rate, p) of
0.7 to 0.8 X lo-* substitutions/site/year
(Ochman and
Wilson 1987), it would take about 145 million yr to get
a 63% enrichment in G+C and about 500 million yr to
see an enrichment of 97% G+C. These time periods are
loner. but not that long if one considers that the maior
HFPCcI3 n(fK Gene
24 Hirsch et al.
certainly tantalizing evidence of horizontal transfer of
that gene, but without further corroboration, it remains
little more.
Orthology, Paralogy, Reference Phylogenies, and Tests
of nifHorizonta1 Transfer
Given the sequences at hand, what is the range of
topologies that would provide evidence for horizontal
transfer among the nifgenes? Clearly, a prerequisite for
demonstrating horizontal transfer is that orthologous
genes are compared (although, of course, paralogs make
excellent outgroups). There is now strong phylogenetic
evidence (Chien and Zinder 1994) that the n$H of
Clostridium is paralogous to the other nifH genes analyzed by Normand et al. ( 1992). Thus, the position of
the Clostridium NifH, as argued by Normand et al.
( 1992) and Young ( 1992), is of little significance to the
hypothesis of the horizontal transfer of the nzjYHgenes
analyzed to date.
There is also strong phylogenetic evidence that the
nzjYDsequence of the gram-positive Clostridium is paralogous to the other eubacterial nifD sequences analyzed
by Normand et al. ( 1992) (Chien and Zinder 1994).
Thus, for NifD too, the position of Clostridium contributes little to the arguments for or against horizontal
transfer.
Like the NifH and NifD trees, the eubacterial lineages in the NifK tree are rooted on the Clostridium
branch (figs. 2, 3). The two genes nifD and nifk’ are
closely linked and encode polypeptides that are two separate components of dinitrogenase. It seems more likely
than not that they have remained closely linked
throughout their history. If this is true and if the Clostridium nifD is paralogous to the other eubacterial nifDs
analyzed, then the Clostridium nifK is also, in all probability, paralogous to the other eubacterial nifKs analyzed. Hence the position of Clostridium in the NifK
tree offers little to the debate on nif gene horizontal
transfer.
For the remaining taxa for which presumed orthologous nifgenes have been sequenced, there is surprisingly little known about their relationships, and thus
there are relatively few topologies that would be viewed
as being sufficiently aberrant to warrant hypothesizing
horizontal transfer. For example, the relationships between the three major groups, the cyanobacteria, proteobacteria, and gram-positives are not well understood
(see below). Hence, any of the three ways of grouping
these taxa would not seem aberrant in a nif gene tree.
In addition, with only one cyanobacterium and one orthologous gram-positive nif gene (the Frankia sequences), the only expected relationships of the known
nifgenes are that the proteobacteria should be monophyletic and that each of the subgroups of the proteo-
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
For example, for groups with bootstrap support of 85%,
greater than 95% may actually be supported. These arguments suggest that the support for the unusual placement of Frankia and Anabaena is probably stronger than
the 80% bootstrap value might suggest.
However, the generalizations made by Hillis and
Bull ( 1993) are based on simple models of sequence
evolution (e.g., no site-to-site variation in rates of sequence evolution, no substitution biases, fixed transversion/ transition ratios). In any situation where a
method of inference is likely to be inconsistent, that is,
where the chances of getting an incorrect tree increases
with increasing amounts of data, the bootstrap values
are difficult to interpret. They may simply measure the
degree of spurious support generated by the evolutionary
process that produced the sequences (see Raff et al.
[ 19941 for a review). For example, an 18s rRNA parsimony phylogeny of amniotes (Hedges et al. 1990) has
an 88% bootstrap support for an unexpected bird /mammal clade, very strong support indeed by the Hillis and
Bull ( 1993) criterion. However, it was demonstrated that
there was extensive lineage-specific substitution bias in
the sites that had varied, and reanalysis with a weighted
parsimony algorithm removed the mammals from the
top of the tree, to the bottom (Marshall 1992b). The
bootstrap value in this case was primarily measuring the
very strong contribution of taxon-specific substitution
biases and elevated rates of substitution (but no more
than is apparent in the nifgene trees) and not the strength
of the phylogenetic signal. Further analysis by Van de
Peer et al. ( 1993) indicates that the 18s rRNA molecule
has site-to-site variation in substitution rates spanning
three orders of magnitude. By breaking the molecule
up into five classes based on the rate of change and
correcting each separately for multiple hits, they were
able to recover an amniote phylogeny (with mammals
at the bottom, not the top of the tree) completely
concordant with morphological
and paleontological
evidence.
For molecules with long histories and highly conserved functions, it is likely that there has been great
heterogeneity in the rates that sites have evolved, as well
as episodes of substitution bias, rate changes, and so on.
In these cases, bootstrap values may not measure the
accuracy or reliability of the derived phylogeny; all they
provide is a measure of the relative strengths of the conflicting signal in the dataset (Marshall 1992b). Some of
that signal may be due to history, but it is also likely
that some, perhaps much of it, will be artifact created
by substitution bias, the proximity of long branches to
short branches, and site-to-site variation in rates of substitution. The 80% bootstrap support for the Frankia/
Anabaena clade within the proteobacterial clade is
Frankia
rzifK Gene
25
horizontally transferred. Note that none of the trees that
support horizontal transfer deviate grossly from topologies expected under the hypothesis of vertical descent:
all can be converted to a tree consistent with vertical
descent by interchanging just one or two adjacent nodes.
To resolve the evolutionary relationships of the nifgenes,
sequences need to be obtained from a considerably wider
range of diazotrophs. There also needs to be development
of methods to determine whether sequences have dramatically different site-to-site nucleotide substitution
rates, substitution biases, and so on, and phylogenetic
methods need to be developed to handle these cases.
Rooting the Eubacterial Tree
The relationships between the three major eubacterial lineages, the cyanobacteria, the proteobacteria, and
the gram-positives are not firmly established. For example, the 16s rRNA sequences give different roots (fig.
5) depending on the taxa included. Figure 1 of Olsen
and Woese ( 1993) provides perhaps the most accurate
16s rRNA tree of the three groups-an
unresolved trichotomy. One of the few other genes sequenced from
sufficient taxa to bear on the relationships of these three
eubacterial groups is the glutamine synthetase 1-p gene
(GSI-0) (Brown et al. 1994)) although different analyses
provide different roots (fig. 5 ) .
The NifK sequences offers one of the best opportunities to establish the relationships among the major
nitrogen-fixing eubacterial lineages, although a larger
database is desirable. If nifK evolved by vertical descent,
then the present data provide evidence for the cyanobacteria and proteobacteria being sister groups ( fig. 5 ).
On the basis of parsimony tree ( fig. 2 ), there is 8 1%
bootstrap support for this topology; this is also the tree
with the strongest support from GSI-P.
16s rRNA
(Olsen et al. 1994)
CYANOBACTERIA
GRAM-POSITIVES
(NifK)
GSI-P (Brown
et al. 1994: MP)
FIG. 5.-Position
of the root for the three eubacterial
lineages
suggested by four phylogenetic
studies. The NifK is enclosed in parenthesis to emphasize that vertical descent is not the favored hypothesis
in all analyses. Brown et al. (1994) presented
two trees, a maximum
parsimony
tree (MP, 97% bootstrap
support
for the root shown) and
a neighbor-joining
tree (NJ, 60% bootstrap support for the root shown),
for the glutamine
synthetase
(GS) genes. The relevant portions
of their
trees are based on the GSI-P member of the family.
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
bacteria with more than one taxon represented should
be monophyletic. What then do the Nif trees have to
tell us about the possibility of horizontal transfer of the
nifgenes?
Taken at face value, the parsimony and Kimura
distance NifK trees (fig. 2) provide support for vertical
descent of the nifsenes, while the Dayhoff distance trees
(fig. 3) provide support for horizontal transfer. However,
in the bootstrap analysis performed for each phylogenetic
analysis, support was found for both hypotheses. For
example, while the most parsimonious tree provides 7 1%
bootstrap support for a monophyletic proteobacteria,
29% of the bootstrap trees do not have a monophyletic
proteobacteria. Accordingly, while the best Dayhoff distance tree supports horizontal transfer, 27% of the
bootstraps support a monophyletic proteobacteria and
vertical descent of nifK.
It is interesting to note that, while there is support
for the P-proteobacterium Thiobacillus lying within the
a-proteobacterial clade, there is also 35% bootstrap support for a monophyletic a-proteobacterial clade ( Thiobacillus lies outside the three a-proteobacteria) in the
parsimony analysis and a 3% and 3 1% support for that
group in the Dayhoff and Kimura distance analysis, respectively. It is also interesting that there is no bootstrap
support for a monophyletic Frankia / Anabaena clade in
the parsimony analysis (although there is 20% support
for Anabaena lying within the proteobacteria and 12%
support for Frankia lying within the proteobacteria),
while there is 40% and 15% support for this grouping in
the Dayhoff and Kimura distance analyses, respectively.
Parsimony analysis of NifH provided support for
horizontal transfer in accord with the distance analyses
of Normand et al. ( 1992). However, in the bootstrap
analyses, there is also some support for vertical descent
of the NifH orthologs. In the parsimony analysis, there
was 16% support, while in the Kimura distance analysis
there was 10% support for a monophyletic proteobacterial clade. However, there was virtually no support for
a monophyletic a-proteobacterial clade; in only 7% of
the bootstrap replicates in the parsimony analysis, and
in only 2% of replicates in the Kimura distance analysis
did the P-proteobacterium fall outside the a-proteobacterium clade. Bootstraps were not performed using Dayhoff distances.
In summary, the NifK data provide support for
either vertical descent or horizontal transfer, depending
on the method of analysis. The NifH sequences provide
stronger support for horizontal transfer, although there
is some signal in the sequence that supports vertical descent. The NifD tree of Normand et al. ( 1992) has insufficient resolution to add substantially to the question
of whether, and to what extent, the nifgenes have been
HFPCcI3
26 Hirsch et al.
Dedication and Acknowledgments
LITERATURE
CITED
W., A. RUMP, W. KLIPP, U. B. PRIEFER, and A.
1988. Nucleotide sequence of a 24,206-base-pair
DNA fragment carrying the entire nitrogen fixation gene cluster
of Klebsiellapneumoniae. J. Mol. Biol. 203:7 15-738.
BRIGLE,
K. E., W. E. NEWTON,
and D. R. DEAN. 1985. Complete nucleotide sequence of the Azotobacter vinelandii nitrogenase structural gene cluster. Gene 37:37-44.
BROWN,
J. R., Y. MASUCHI,
F. T. ROBB, and W. F. DOOLITTLE.
1994. Evolutionary
relationships of bacterial and Archaeal
glutamine synthetase genes. J. Mol. Evol. 38:566-576.
CAVALLI-SFORZA,
L. L., and A. W. F. EDWARDS.
1967. Phylogenetic analysis: models and estimation procedures. Evolution 32:550-570.
CHIEN,
Y. T., and S. H. ZINDER.
1994. Cloning, DNA sequencing and characterization
of a nifD-homologous
gene
from the archaeon Methanosarcinia barkeri 227 which resembles nifDl from the eubacterium Clostridium pasteurianum. J. Bacterial. 176:6590-6598.
DAYHOFF,
M. 0. 1979. Atlas of protein sequence and structure.
Vol. 5. Suppl. 3. National Biomedical Research Foundation,
Washington, D.C.
FELSENSTEIN,
J. 1993. PHYLIP (Phylogeny Inference Package), Version 3.5~. Department of Genetics, University of
Washington, Seattle.
-.
1985. Confidence limits on phylogenies: an approach
using the bootstrap. Evolution 39:783-79 1.
FELSENSTEIN,
J., and H. KISHINO.
1993. Is there somelhing
wrong with the bootstrap on phylogenies? A reply to Hillis
and Bull. Syst. Biol. 42:193-200.
FERNANDEZ,
M. P., H. MEUGNIER,
P. A. D. GRIMONT,
and
R. BARDIN.
1989. Deoxyribonucleic
acid relatedness among
members of the genus Frankia. Inter. J. System. Bacterial.
ARNOLD,
WHLER.
39:424-429.
HENNECKE,
LUDWIG,
H.,
K. KALUZA,
B. THONY,
M.
FUHRMANN,
W.
and E. STACKEBRANDT.
1985. Concurrent evolution of nitrogenase genes and 16s rRNA in Rhizobium
species and other nitrogen fixing bacteria. Arch. Microbial.
142~342-348.
HIGGINS,
D.
G., and P. M. SHARP. 1988. Clustal: a package
for performing
multiple alignment on a microcomputer.
Gene 73:237-244.
HILLIS,
D. M., and J. J. BULL. 1993. An empirical test of
bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42: 182- 192.
JOERGER,
R. D., M. R. JACOBSON,
R. PREMAKUMAR,
E. D.
WOLFINGER,
and P. E. BISHOP. 1989. Nucleotide sequence
and mutational analysis of the structural genes (anfHDGK)
for the second alternative nitrogenase from Azotobacter vinelandii. J. Bacterial. 171:1075-1086.
JOERGER,
R. D., T. L. LOVELESS,
R. N. PAU, L. A. MITCHENALL,
B. H. SIMON, and P. E. BISHOP.
1990. Nucleotide
sequences and mutational analysis of the structural genes
for nitrogenase 2 of Azotobacter vinelandii. J. Bacterial.
172:3400-3408.
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
This paper is dedicated to the memory of Professor
John G. Torrey of the Harvard Forest, Peter-sham, Mass.
He will be remembered not only for his pioneering work
on Frankia but also for his profound influence on his students and postdoctoral associates. We thank Pascal Simonet (Universite Claude Bernard, Lyon I, Villeurbanne,
France) for generously providing plasmid pFQ 167 and also
members of our laboratory group, especially Bettina Niner,
for helpful comments on the manuscript. We are also
grateful to Rob Gunsalus and Imke Schroeder (University
of California, Los Angeles) for helpful discussions and to
Steve Zinder (Cornell University) for conveying information before publication. Our gratitude is also extended
to an anonymous but extremely knowledgeable reviewer.
This research was supported in part by grants from the
United States Department of Agriculture (8%37262-3979),
the UCLA Latin American Center, and by the University
of California Pacific Rim Research Program (MC930209 )
to A.M.H., and by National Science Foundation grant
EAR-9258045 to C.R.M.
W. M., and E. MARGOLIASH.
1967. Construction of
phylogenetic trees. Science 155:279-284.
F~HRMANN,
M., and H. HENNECKE.
1984. Rhizobium japonicum nitrogenase Fe protein gene ( nifH) . J. Bacterial. 158:
1005-1011,
HEDGES,
S. B., IS. D. MOBERG,
and L. MAXSON.
1990. Tetrapod phylogeny inferred from 18s and 28s ribosomal
RNA sequences and a review of the evidence for amniote
relationships. Mol. Biol. Evol. 7:607-633.
FITCH,
K., and H. HENNECKE. 1984. Fine structure analysis
of the nifDK operon encoding the a and B subunits of dinitrogenase from Rhizobium japonicum. Mol. Gen. Genet.
KALUZA,
196~35-42.
M. 1983. The neutral theory of molecular evolution.
Cambridge University Press, Cambridge.
KLEINER,
D., W. LITTKE,
H. BENDER,
and K. WALLENFES.
1976. Possible evolutionary relationships of the nitrogenase
proteins. J. Mol. Evol. 7:159-165.
LEWONTIN,
R. C. 1974. The genetic basis of evolutionary
change. Columbia University Press, New York.
LIGON,
J. M., and J. P. NAKAS.
1988. Nucleotide sequence of
nifK and partial sequence of nifD from Frankia species
FaC 1. Nucl. Acids Res. 16: 11843.
. 1990. Corrigendum.
Nucleotide sequence of nifK and
partial sequence of nzfD from Frankia species FaC 1. Nucleic
Acids. Res. 18: 1097.
MARSHALL,
C. R. 1992a. Character analysis and the integration
of molecular and morphological
data in an understanding
of sand dollar phylogeny. Mol. Biol. Evol. 9:309-322.
~
. 19926. Substitution bias, weighted parsimony, and
amniote phylogeny as inferred from 18s rRNA sequences.
Mol. Biol. Evol. 9:370-373.
MAZUR,
B. J., and C. F. CHUI.
1982. Sequence of the gene
coding for the B-subunit of dinitrogenase from the bluegreen alga Anabaena. Proc. Natl. Acad. Sci. USA 79:6782KIMURA,
6785.
Frankia HFPCcI3 niJKGene 27
J. W. 1993. Microfossils of the early Archean Apex
Chert: new evidence of the antiquity of life. Science 260:
640-646.
SHINE, J., and L. DALGARNO.
1974. The 3’-terminal sequence
of Escherichia coli 16s ribosomal RNA: complementary
to
nonsense triplets and ribosome binding sites. Proc. Natl.
Acad. Sci. USA 71: 1342- 1346.
SIMONET,
P., P. NORMAND,
and R. BARDIN.
1988. Heterologous hybridization of Frankia DNA to Rhizobium meliloti
and Klebsiella pneumoniae nif genes. FEMS Micro. Lett.
55:141-146.
SIMONET,
P., P. NORMAND,
A. M. HIRSCH,
and A. D. L. AKKERMANS.
1990. The genetics of the Frankia-actinorhizal
symbiosis. Pp. 77-109 in P. M. GRESSHOFF,
ed. Molecular
biology of symbiotic nitrogen fixation. CRC, Boca Raton,
Fla.
SMITH,
M. W., D.-F. FENG, and R. F. DOOLITTLE.
1992. Evolution by acquisition: the case for horizontal gene transfers.
TIBS 17:489-493.
SOUILLARD,
N., and L. SIBOLD. 1989. Primary structure, functional organization and expression of nitrogenase structural
genes of the thermophilic
archaebacterium Methanococcus
thermolithotrophicus. Mol. Microbial. 3:44 l-552.
STACKEBRANDT,
E. R., G. E. MURRAY,
and H. G. TRUPER.
1988. Proteobacteria, classis nov., a name for the phylogenetic taxon that includes the “purple bacteria and their
relatives.” Int. J. Syst. Bacterial. 38:32 l-325.
SWOFFORD,
D. L. 1992. Phylogenetic analysis using parsimony
(PAUP),
Version 3.0s. Illinois Natural History Survey,
Champaign.
TOR~K,
I., and A. KONDOROSI
. 198 1. Nucleotide sequence of
the Rhizobium meliloti nitrogenase reductase (nifH) gene.
Nucleic Acids Res. 9:57 1 l-5723.
VAN DE PEER, Y ., J.-M. NEEFS, P. DE RIJK, and R. DE WACHTER. 1993. Reconstructing
evolution from eukaryotic smallribosomal-subunit
RNA sequences: calibration of the molecular clock. J. Mol. Evol. 37:221-232.
YOUNG,
J. P. W. 1992. Phylogenetic classification of nitrogenfixing organisms. Pp. 43-86 in G. STACEY, R. H. BURRIS,
and H. J. EVANS, eds. Biological nitrogen fixation. Chapman
& Hall, New York.
WANG,
S.-Z., J.-S. CHEN, and J. L. JOHNSON.
1988. The presence of five nzyH-like sequences in Clostridium pasteurianum: sequence divergence and transcriptional properties.
Nucleic Acids Res. 16:439-454.
WEINMAN,
J. J., F. F. FELLOWS,
P. M. GRESSHOFF,
J. SHINE,
and K. F. SCOTT.
1984. Structural analysis of the genes
encoding the molybdenum-iron
protein of nitrogenase in
the Parasponia Rhizobium strain ANU289. Nucleic Acids
Res. 12:8329-8344.
WOESE,
C. R. 1987. Bacterial evolution. Microbial. Rev. 51:
221-271.
SCHOPF,
JEFFREY PALMER, reviewing
Received
June 29, 1994
Accepted
August
29, 1994
editor
Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012
M., D. A. RICE, and R. HASELKORN.
1980. Nucleotide sequence of a cyanobacterial nifH gene coding for
nitrogenase reductase. Proc. Natl. Acad. Sci. USA 77:64766480.
MULLIN,
B. C., and C. S. AN. 1990. The molecular genetics
of Frankia. Pp. 196-2 15 in C. R. SCHWINTZER
and J. D.
TJEPKEMA,
eds. The biology of Frankia and actinorhizal
plants. Academic Press, New York.
NORMAND,
P., and J. BOUSQUET.
1989. Phylogeny of nitrogenase sequences in Frankia and other nitrogen-fixing
microorganisms. J. Molec. Evol. 29:436-444.
NORMAND,
P., M. GOUY,
B. COURNOYER,
and P. SIMONET.
1992. Nucleotide sequence of nifD from Frankia alni strain
Ar13: phylogenetic inferences. Mol. Biol. Evol. 9:495-506.
NORMAND,
P., P. SIMONET,
and R. BARDIN.
1988. Conservation of nifsequences in Frankia. Mol. Gen. Genet. 213:
238-246.
OCHMAN,
H., and A. C. WILSON.
1987. Evolution in bacteria:
evidence for a universal rate in cellular genomes. J. Mol.
Evol. 26:74-86.
OLSEN, G. J., and C. R. WOESE.
1993. Ribosomal RNA: a key
to phylogeny. FASEB J. 7:113-123.
OLSEN,
G. J., C. R. WOESE,
and R. OVERBEEK.
1994. The
winds of (evolutionary)
change: breathing new life into microbiology. J. Bacterial. 176: l-6.
PASSAGLIA,
L. M. P., C. P. NUNES,
A. ZAHA,
and I. S.
SCHRANK.
199 1. The nifHDK operon in the free-living nitrogen-fixing bacteria Azospirillum brasilense sequentially
comprises genes H, D, K, an 353 bp orf and gene Y. Braz.
J. Med. Biology Res. 24:649-675.
PRETORIUS,
I.-M., D. E. RAWLINGS,
E. G. O’NEILL,
W. A.
JONES,
R. KIRBY,
and D. R. WOODS.
1987. Nucleotide
sequence of the gene encoding the nitrogenase iron protein
of Thiobacillus ferrooxidans. J. Bacterial. 169:367-370.
RAFF, R. A., C. R. MARSHALL,
and J. M. TURBEVILLE.
1994.
Using DNA sequences to unravel the Cambrian radiation
of the animal phyla. Ann. Rev. Ecol. Syst. 25: 351-375.
RAWLINGS,
D. E. 1988. Sequence and structural analysis of
the a- and P-dinitrogenase
subunits of Thiobacillus ferrooxidans. Gene 69~337-343.
REDDY,
A., B. BOCHENEK,
and A. M. HIRSCH.
1992. A new
Rhizobium meliloti symbiotic mutant isolated after introducing Frankia DNA sequence into a nodA : : Tn5 strain.
Molec. Plant-Microbe
Inter. 5:62-7 1.
ROBSON,
R. L., P. R. WOODLEY,
R. N. PAU, and R. R. EADY.
1989. Structural genes for the vanadium nitrogenase from
Azotobacter chroococcum. EMBO J. 8: 1.217- 1224.
RUVKUN,
G. B., and F. M. AUSUBEL.
1980. Interspecies homology of nitrogenase genes. Proc. Natl. Acad. Sci. USA
77:191-195.
SAITOU,
N., and M. NEI . 1987. The neighbor-joining
method:
a new method for reconstructing
phylogenetic trees. Mol.
Biol. Evol. 4:406-425.
SAMBROOK,
J., E. F. FRITSCH,and T. MANIATIS.
1989. Molecular cloning. A laboratory manual. 2d ed. Cold Spring
Harbor Laboratory Press, Plainville, N.Y.
SANGER,
F., S. NICKLEN,
and A. R. COULSEN.
1977. DNA
sequencing with chain elongation inhibitors.
Proc. Natl.
Acad. Sci. USA 74:5463-5467.
MEVARECH,