* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Download Molecualr Biology and Evolution
Minimal genome wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Genome (book) wikipedia , lookup
Gene desert wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Non-coding DNA wikipedia , lookup
History of genetic engineering wikipedia , lookup
Genome evolution wikipedia , lookup
Human genome wikipedia , lookup
Quantitative comparative linguistics wikipedia , lookup
Expanded genetic code wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Gene expression programming wikipedia , lookup
Pathogenomics wikipedia , lookup
Genetic code wikipedia , lookup
Gene expression profiling wikipedia , lookup
Designer baby wikipedia , lookup
Genome editing wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Point mutation wikipedia , lookup
Microevolution wikipedia , lookup
Helitron (biology) wikipedia , lookup
Metagenomics wikipedia , lookup
Maximum parsimony (phylogenetics) wikipedia , lookup
Assessing Horizontal Transfer of nifHDK Genes in Eubacteria: Nucleotide Sequence of nifK from Frankia Strain HFPCcI3 Ann il.4 Hirsch, Heather I. McKhann,’ Amala Reddy,2Jiayu Liao, Yiwen Fang, and Charles R. Marshall* Department of Biology and *Department of Earth and Space Sciences, University of California, Los Angeles Introduction The ability to fix nitrogen is widely distributed among representatives of the eubacteria, methanogens, and halobacteria (Young 1992). Nitrogenase is an enzyme complex consisting of two components: dinitrogenase, the molybdenum-iron protein, which is an a& tetramer encoded by nifD and niflu, respectively, and dinitrogen reductase, the iron-containing protein, a homodimer encoded by n$H. Ruvkun and Ausubel ( 1980) were the first to demonstrate that the nifgenes of evolutionary-diverse diazotrophs are remarkably conserved. Such conservation, coupled with the fact that the capacity to fix nitrogen was thought to be uncommon yet widely dispersed among the major bacterial groups, led some to hypothesize that nzxgenes have been hori- Institut 1. Current address: Centre des Sciences Vegetales, 2. Current address: 12 17, Bangladesh. National de la Recherche Scientifique, 9 1198 Gif-sur-Yvette, Cedex, France. A23 Century Estate, Baramaghbazaar, Key words: nzyK, Frankia, 16s rRNA phylogeny. nzyH, Address for correspondence and reprints: Ann partment of Biology, 405 Hilgard Avenue, University Los Angeles, California 90024- 1606. Mol. Bid. Evol. 12(1):16-27. 0 1995 by The University 0737-4038/95/1201-0003$02.00 1995. of Chicago. Dhaka All rights reserved. M. Hirsch, Deof California, Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 The structural genes for nitrogenase, nifK, nifD, and nifr-r, are crucial for nitrogen fixation. Previous phylogenetic analysis of the amino acid sequence of nifH suggested that this gene had been horizontally transferred from a proteobacterium to the gram-positive/cyanobacterial clade, although the confounding effects of paralogous comparisons made interpretation of the data difficult. An additional test of nifgene horizontal transfer using nzjD was made, but the NifD phylogeny lacked resolution. Here nifgene phylogeny is addressed with a phylogenetic analysis of a third and longer nifgene, nzyK. As part of the study, the nifK gene of the key taxon Frankia was sequenced. Parsimony and some distance analyses of the nifK amino acid sequences provide support for vertical descent of nzyK, but other distance trees provide support for the lateral transfer of the gene. Bootstrap support was found for both hypotheses in all trees; the nifK data do not definitively favor one or the other hypothesis. A parsimony analysis of NifH provides support for horizontal transfer in accord with previous reports, although bootstrap analysis also shows some support for vertical descent of the orthologous nifH genes. A wider sampling of taxa and more sophisticated methods of phylogenetic inference are needed to understand the evolution of nifgenes. The nifgenes may also be powerful phylogenetic tools. If nifK evolved by vertical descent, it provides strong evidence that the cyanobacteria and proteobacteria are sister groups to the exclusion of the firmicutes, whereas 16s rRNA sequences are unable to resolve the relationships of these three major eubacterial lineages. zontally transferred among microorganisms (Ruvkul and Ausubel 1980; Normand and Bousquet 1989; Mu1 lin and An 1990). Others believe that nitrogenase is al ancient enzyme (Kleiner et al. 1976) and that the ni genes evolved by vertical descent (e.g., Hennecke et al 1985). The use of phylogenetic analysis to detect horizonta transfer involves the detection of discrepancies betweel the phylogeny of the gene of interest and a reference, o standard, phylogeny. The strongest evidence for th horizontal transfer of nifgenes comes from the pioneer ing investigations of the phylogeny of the nifH gent (Normand and Bousquet 1989; Normand et al. 1992) Using the 16s rRNA phylogenies of Woese ( 1987), 01 sen and Woese ( 1993), and Olsen et al. ( 1994) as stan dards, the nzf.H tree has three unanticipated features ( 1) the thallobacterium Frankia is grouped with the cy anobacterium Anabaena and not with the other firmi cute, the firmibacterium Clostridium; (2) this Frankia Anabaena clade lies within the proteobacterial clade and (3) the P-proteobacterium Thiobacillus lies withir the a-proteobacterial clade. The case for horizonta transfer has centered on the unusual position of Frankit and Anabaena within the proteobacteria, and it was sug gested that the FrankialAnabaena clade acquired its m Fmnkia HFPCcI3 nifK Gene 17 Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 genes horizontally from a proteobacterium. The case has genes of Frankia strain CeD and was provided by P. also hinged on the absence of nitrogen fixation in many Simonet ( University of Lyon I ) . eubacterial clades (Normand and Bousquet 1989). However, Young ( 1992) argued that the NifH tree Preparation of DNA and Hybridizations of Normand and Bousquet ( 1989) does not provide clear Plasmid DNA was prepared according to estabsupport for the horizontal transfer of nifrr, noting lished procedures (Sambrook et al. 1989). Total DNA ( 1) that the crucial branches on the nifH tree are not from Frankia strain HFPCcI3 was isolated as described defined unambiguously; (2) that if the Clostridium n$H previously (Reddy et al. 1992). The DNA was digested is paralogous to the other n&H genes analyzed, then it with restriction enzymes, subjected to electrophoresis, may be the aberrantly placed sequence, not the Frankia and blotted onto Nytran membranes (Schleicher and sequence; and (3) that the G+C contents of the n$H Schuell) or GeneScreen Plus (New England Nuclear) genes match the rest of the genomes in which they are according to the manufacturers’ protocol. Hybridizations found and not the G+C contents of the hypothesized were performed using a radioactively labeled probe made donor lineage. either by nick translation or with an Oligo-Labeling kit Nonetheless, the subsequently published NifH tree (Pharmacia). Washes were done under stringent conof Normand et al. ( 1992) does provide some support ditions as described in Sambrook et al. ( 1989). A partial for horizontal transfer: the FrankialAnabaena branch restriction map was made of pAR2 and used to generate lies within the proteobacterial clade with bootstrap sup- smaller subclones of the n$K region in pUCl19. port of 80%. Normand et al. ( 1992) provided an additional test of the horizontal transfer hypothesis by ex- Sequencing amining the phylogeny of another nf gene, n$D. Initial sequencing was done using a Sequenase kit Unfortunately, the NifD phylogeny lacks resolution, al(USB) and 35S-dATP (NEN). DNA fragments were though at face value it is essentially the same as the 16s separated on an 8% (or 5%) wedge polyacrylamide gel rRNA topology. If the NifH tree supports horizontal (BRL) in the presence of 7 M urea (Sanger et al. 1977 ) . transfer, the NifD tree does not. Given the lack of corInstead of dideoxynucleotides, we used 7-deaza-2 ‘-nuroboration of the NifH tree by the NifD tree, and with cleotides as well as PCR-based sequencing with a fmolreports of nitrogen fixation in an increasing wider range sequencing kit (Promega) because the high G+C content of bacteria, especially among other thallobacteria (highof the DNA often led to unresolvable compressions. G+C gram-positives), Normand et al. ( 1992) recanted Synthetic oligonucleotides were used as primers as nectheir earlier position and suggested, in accord with essary to generate a complete sequence of n$K. Both Young ( 1992), that vertical descent is the most likely strands were completely sequenced. For PCR sequencexplanation for the distribution of nifgenes among bacing, reactions were performed on an Ericomp Easy-Cyteria and that the unexpected aspects of the NifH tree cler thermal cycler. Samples were incubated 2 min at may be due to the confounding effects of paralogous 95°C and then given 30 cycles of 95°C for 30 s followed comparisons. by 70°C for 30 s. Despite the conclusions of Young ( 1992) and Normand et al. ( 1992), the position of Frankia and AnaSequence Analysis baena in the nifH tree still requires explanation (as also All available complete niflv sequences were anadoes the position of the P-proteobacterium Thiobacillus lyzed. Evolutionary comparisons were based on deduced within the a-proteobacterial clade). As noted by Noramino acid sequences (fig. 1) to reduce the artifacts mand et al. ( 1992), the lack of signal in the NifD tree caused by the high G+C content of Frankia, which is renders the lack of conformation of the NifH tree by especially high at third codon positions (Normand and the NifD tree of little significance. Here we address nif Bousquet 1989). Frankia strains have a G+C content gene phylogeny with a phylogenetic analysis of a third ranging from 66% to 75% (Fernandez et al. 1989). Muland longer nifgene, n$K. No bona fide nifK sequences tiple-sequence alignment was performed by the multifrom the crucial taxon Frankia were to be found in the alignment program CLUSTAL (Higgens and Sharp database, so we sequenced nifK from Frankia strain 1988), followed by refinement by eye. HFPCcI3 and subjected the deduced amino acid seThe deduced amino acid sequences were subjected quence to phylogenetic analysis. to maximum-parsimony analysis using PAUP version Material and Methods 3.0s (Swofford 1992) using branch and bound to find Clones and Vectors the most parsimonious tree. The aligned amino acid seWe used pUC19 and pUC 119 (Stratagene) as quences were also analyzed using several distance methods provided in PHYLIP 3.5~ (Felsenstein 1993). The cloning vectors. Plasmid pFQ 167 contains the nifHDK 4 I ll( Azo Thi Rhi Brj Azb Ana cc1 Cl0 Azvn Acvn Azan Met . MS------Q~IDKINSCYP~FEQDEYQEL~~KRQ-L~-~AHDAQR"Q~:"FAWTTTA~~~NFRR~T~PA~CQ~LGA~CSLGF~~PY"HG~QG~AYFRT~ . . -----. . . . . . . . . . . . . . . . . . . . . V . . . ..-..-. R . . . . . . ..-............... H- . . . . . . . . . . . HA . . . . . . . . . . . . . . . . ..I........ . . ------.QV...KAS . . ..LDQD.KDMLAK..DGF.-.KYPQDKID.-..Q....K..QE...Q......N.............A...EK.M..............S. ------.NA .. . ..VDHFN..K.PD . ..M.K..QKTF.-NRLP.W.AR-GQE..K.W..REK..AR-..DG.R.F............St .A------.SA.HVLDHLE..RGP . ..QMLAD.-KMF.-NPR.PAE.ER-IR.V.K.P..REK..A-...A.N...........FV.V..EG...F..........Y.St .P------.SAEHVLDHVE..RGP . ..QMLAK.-KIF.-NPR.PAE.ER-IKE..K....REK..A....A.N...........FA.V..ER...F..........Y.St ..MSHPVS.SA..VIDHFT..R.P..K...ER.KTEF.YGPTPTKE.AR-.S...K.E..KEK.LPV..WIN.T.....I..M.~Q..EG...F........S.Y..t .P------.NPERTMHVD..K.P..T...E...KNF.-G..PPEE.ER-.SE..KSWD.REK..A . . . . . . N . . . G...V..MFAA...EG...F.Q...........t ..GDDDPGNKQFRPAAGPRPQRAVQGRG.PEAV.GQ~VRERQRR.RGRPGPE..PGW..REK..A......N............AG...EG.M.L............St .-------------------------------------LD.TPKEI.E------------------.K..RIN...T...V..MY~..IH.C..HS......CS.H..~ ..NCE------------LTVKPA.VK-.SPRD.EG-----------------------------------IIN.MYD...A..QYAGI.IKDCI.L...G...TM~.~ -.NCE------------LTVKPA.VK-.VKRE.EG----------------------------------IIN.MYD...A..QYAGI.VKDCI.L...G...TM~.~ .T-CE------------VK----------EKG.VG----------------------------------.IN.IFT...A..Q~.I.IKDCIGI...G....M~.Ll ..EVNAGEICYV.------------------KQ.KG----------------------------------.IN.N.I...I..M~TV.VKG.I.F.Q.....TT.V.Y~ Kle Fat Azo Thi Rhi Brj Azb Ana cc1 Cl0 Azvn Acvn Azan Met ..... # .... I .. FNRHFKEPIACVSDSMTEDAFGGNNN~LGLQNAS~Y-KPEIIAVSTTC~EVIGDD~AFI~AKK---DGFVDSSIAVPHAHTPSFIGSHVTGWDNMFEGFAK'I ........................................ . ..................... R ..... ..---......... W .......................... ... ..R..V S ............. ..QQ..KD....CK.T.-..D M ................ N . ..N.S .. ----E..IPDEFP..F......V.............I.R ...... ..SS...S.....P.....L...ID..AISYS....KM ................ N . ..KT..E----K.N.PE.FD..F......V...I..Y...MK.ILT LS ...... ss . ..s ........... L . ..ID..A.SYNM.-..KM.C-.............N...KTS.E----K.S.RR.-ST.F....A.V......Y..~K.ILE LS......SS...S...........L...TD..A.SYKM...KM ................ N . ..KTS.E----K.S.PADFD..F....A.V......Y..ALK.ILE LT .... ..NSA..S...........L...ID...R-Y...-..KM...L............SG..N...N----KES.P~FP..F....A.V...IV.Y...IK.~T LS..Y...CSA..S...........L...IE.M.VSYQ..-..~...C............G ... T.S.N----A.SIPQDFP..F......V...I..Y...MK.ILS .A......VPAA .S ......... ..L..LVEA.E..TT..-..KMV............E..F.Y.GA.RD----K.AISAYP..Y......V...L..Y.S.LK.IL LS .... L .. ..PTY.SQMED----A.SIPEGKL.I.TN...YV......FA..VQ.I~ ..AMASTS.F..G.S....GS.IKTAVK.IFS..-N .D .... H...LS.T .AQ .... NFDVA.T.LH.ES ... ..AKRVEE.VLVLARR.PNLRV.PII...ST......IEGS.RVCN~EAE-.P.RK.YL~V.....K......Y~CVKSVF NFDVA.T.LH.ES .AQ .... ... ..AKRVEE.VLVLARR.PELRL.PII...ST......IEGT.~CN~~E-.PERK.Y~V.....K......Y~~KSMF ..G .. ..ACGRVEEAVDVLLSR.PDVKWPII...ST .SQ.Y .. SFELA.S.LH .I .. ..VDGV.KKLNEGLLKEK.P.REVHLIAM......V..MIS.Y.VAV~W 7777777777777777377777777777777777777777777777777777777777777777777777777777777777777777~ ... ..R..VSIATA.FH .H ........................................................................................... Kle Fat Aro Thi Rhi Brj Azb Ana cc1 Cl0 Atvn Acvn Azan Met 33c . . . . . . . . . . . FTADYQGQPGKL---PKL--NLVTGFETYLG--NFRVLKRMMEOMAVPCSLLSDPSEVLDTPADGHYRMYS-GGTTQQEMKEAPDAIDTLLLQPWQLLKSKKWQEMWNC . . . . . . . . . . ..---... KL...P . . . . . ..TG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Q . . . ..- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..LKSMDDKVVG---SNKKI.I.P.......--....I...LSE.G.GY . . . . . . E . . . . . . . ..QF...A-.....E...D..N.LN.V.....H.E.T..F.EGT.Kt .WDGKA.TVPA.ERK.DEKI.FIG..DG.-mG.M.EI,,LFSL.N.DYTI.G.G.D.W......EF...D. . ..FA.AEA.LN.KA.VCM.GISTE.TMAYI..KGQE .WNGKA.TAP..ERK.NEAI.IIG..DGN-TVG.L.EI..I~.GIKHTI.A.N...F...T..EF...D. ..HVEDTAN.IH.KA.ISM.Q.CTE.TLPF.S.HGQC . . ..LKDAAN.IH.KA.ISM.Q.CTE.TLSFAA.HGQC . WDGUA. TAP ..ERK.NGAI.IIG..DG.-TVG.L.EI..IL.L.EI..IL.L.GIQHTV.A.N...F...T..EF...D. ..VH.KA.ISM.EYCTPQ.I.QFIKREGPJ .W----.TSENFDTPKNETI..IP..DGF-AVG.N.E...IAGLFGIQMTI...V.DNF......E....D-...PL~T L.EGKKKATS------NGKI.FIP..D..--VG.N.E.....GV.G.DYTI . ..S.DYF.S.NM.E.E..P-S..KLEDAADSIN.KA.VA..AYTTP.TREYIKTQ.K. L--SRTADAD.VESPG.PRL.I.P..... --VG.H.EYR.IL.L.G.DPLI.G.H.SS..S..T.E.EL.P-...P~.~T.RFS~.~..ETTTR.TTEF.~G.K. LSENTGAK--------NGKI.VIP ..---V.PADM.EI..LF.A.DI.YIMFP.T.G...G.TT.E.K..PE... KIEDL.DTGNSDL..S.GSYASDLGA.TLEKKCm I.DAHGKGQ------.SGKL.V---.PGWVNPGDVVL... YFKE.D.EANIYM.TED-F.S.MLPNKSIETH.R..VEDI I.EVHGKGQ------.SGKL.V---.PGWVNPGDVVL... YFKE.G.DATVFM.TED-F.S.MLPNKSIETH.R..VEDI .AK---REA------.NDKI ..---LTGWVNPGDVKE..HLLGE.DIEANV.FEIES-F.S.ILPDGSAV.H.N..IEDLIDTGN.~.FA.NRYEGT.~YL.KKFEl ~~~~73~~3~~333~~~7~3~777333333777~7777777777777777777777777777~777777~7777777~7~777~777777~77777777~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..-................... Kle FaC G .a 22c Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 1 t t t t h EI . .. .. . 44c Kle Fat Azo Thi Rhi Brj Azb Ana cc1 Cl0 Awn Acvn Azan Met PATEVAIPL;;LAATDELLM~VSQLSGKPIADALTLERG~MMMLD-SH~WLHGKKFGL;GDPDFVMG . . ..*.......................................... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..-......--......... EVPKLN..M..DW . ..F..K..EI..Q..PAS..K.........T.-........R.A.W.........VK.........VH..C..G.-...K..--~AI.A.....~ W-ALHC.I.VTG..HF.QE..GI.....SEE.KK.......AIGT-.ISY......AIK.....CL.VAG......A..VHWCTRG.--------GTPMIHIGF.IHC W-SFNY.V.VS...D..VAL.RI...E.PEQ.AR.......AIA.-.SAHI.....AI.....LCY..~......A...HV..T.G.-~AGEN---A~FAG..F.E VL-SFNY.V..S...DFIVAL.RI...E.PEQ.AR.......AIA.-.SAHV.....AI.....LCY..~......A...HV..T.G.-.A..EK--.Q~.AS..F.C NR?! GRQAYNY.M.VTG......KLAE.....-SRGVK........AIA.-...H....R.AV......CL.MSK..M...A..VH...TSGS-.K.E.Q--VQ.V...CRS~ ETQVLR-.F.VKG... F.TA..E.T..A.PEE.EI.......AIT.-.YA.I.....AI.....LIISI.S . . ..M.A..VH..CN.GD-DTFK.E--.EAI.A...P.H ETWLET.I.VKN..RF.TE.AR.AEVE.PAE..A.......~T.-..AY....RVAIA....L.I~.G.A....MI.VHL..T..D-NSFAPR--LE.V.ST.KF.E .FKTLRT.I.VS.... FI.~.EAT..EvPASIEE...Q.I.L.I.-AQQY.Q...VA.L....EIIA.SK.II...AI.KY~GTPG-MKF..E--IDA..A~GI-E .NAL.NT.Y.IKN..DM.RKIAEVT..E.PES.VR... IAL.ALA.LA.MFFAN..VAIF.H..L.L..AQ.CM.VEL..VLL.IGDWGNKYK.DPRIEELKNTAHFDl .NSL.NT.Y.IKN..DM.RKIAEIT..E.PES.VR.PRIAWI~A.LA.MFF~..VAIF.H..L.L..AQ.C..V~..~L.IGDDQGSKYK.DP~Q~K..AHF~ ..IIGPT.I.IRN..IF.QNLKKAT....PQS.AH . ..VAI.ALA.LT.MF.AE.RVAI..A..L.I..AE.C.D.~K.~L.LGDD.-SK~DPRI~QE~..~ ~~77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777~77777~ . ... .. .. .. .. .. .. .. ... .. .. .. .. ... .. ... . ... ... .. .. .. ... ... .. ... .. .. ... .. ... ... .. .. .. ... .. .. ... .. .. ... .. .. ... .. 540 . Kle Fat Azo Thi Rhi W Arb Ana cc1 Cl0 Azvn Acvn Azan Met . DSEVFINCDLWHFRSLMFTR--QPDFMIGI;SYGKFIQRDDYSFDLVR RYS.................--................V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A . . . . . . . . . . NAT.Y.GK....L...V..D--K..................H...E . . . . . . . I . . . I . . . . . . ..S..L......Q.L.....SI..R..EE.RGMQA...NH.... RHHHHRYPIWGYQGAVNVLVILDRIHLELDRNTLGIGT.KDWAEK--~~FASS.FGT~.AYPGKDLWHMRSL~TE--P~F.IGNSYTKYLE~..F.Y.... L-PAYPGR....M...L..E--PV..L...TH..YLE...-------GT . ..Q.GL.VLVKILDKIFDEI.KK..V.........II. . . . . I...I.....H..FPV GCQ.YPGR....M...L..E--PV..L...T...YLE...-------AT . ..Q.GL.VLVKILDKIFDEI.NK.NI.........II. . . . . I...I.....H..FPI . . ..Q..L.VLVRILDRIF.DM.AN.NIV.E......... SGKAYGAK....L...V..D--KV.YI.......YLE...-------KI.....TY.I.....H..YP EAK.W.QK........L..E--PV..F.......YLW...-------SI.~.I.Y . . . . . . . . ..YS.L..Q.GL..LNWV..TL.DEM.RS.NIT....I....I. AAT.WPGK....L... V..E--PV.LL..S..L.Y.S.EA-------N...V.V...I . . . . . . ..FPIV..T.GLHLL.Q...T..DE..RTS-----P.H...V.. G.K.KVEG.FFDVHQWIKNE--GV.LL.S.T.....A.EE-------NI.FV.F...IM..YGHYYNPKV..K..I~.EEIT.VI.D.IEREC-----.EED.EV.. E--1VH . A . ..ELEKRI-NAGL.L.LIM.H.K.RYVAIEA-------NI.~.V...T...AG.Y.KPSI..Q...ELGEMIA..MFAHMEYT~----KEWILN-TW E--1VH . A . ..ELEKRI-NDGL.L.LIM.H.K.RYVAIEA-------NI.~.V...T...AG.Y.KPSI..Q...~GEMIA..MFAHMEYT~----KEWILN-TW E--IVT.A.F.ELENRIKNEGLEL.LIL.LIL.H.K.R..SI.Y-------NI.~.V . ..TY..AG.F.YP.V..G..IWLAEQMA.TLFADMEHKKN-.W 777777777777777777777777777777777777777777777~7777777777777777777777777777~777777777777777777777777777777777 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..-...I.......................... Frunkia HFPCcI3 n$K Gene amino acid sequences were first converted to distances by PROTDIST using either Dayhoff’s ( 1979) PAM 00 1 matrix ( Dayhoff distances) or Kimura’s ( 1983 ) approximation of the Dayhoff distances (Kimura distances): D = -ln( 1 - p - 0.2 p2), Results Isolation of Clones and Sequencing of nifK from Frankia Strain HFPCcI3 Plasmid pFQ 167, which contains the nz$HDK genes of Frankia strain CeD (Normand et al. 1988)) was used for Southern hybridizations against total DNA isolated from Frankia strain CcI3. A single 5.6-kb BamHI fragment that hybridized to the pFQ167 insert was detected in the Cc13 genome. A partial genomic library was made in pUC 19 from Cc13 BamHI-restricted DNA fragments of 5-9 kb. The plasmid pAR2 was isolated from the library on the basis of its hybridization to pFQ 167. Plasmids pAR2 and pFQ 167 nifregions were shown to have similar restriction patterns (not shown). As in other Frankias (see review by Simonet et al. 1990), the nzfHDK genes of Frankia strain Cc13 are contiguous and located on the Frankia chromosome. The Frankia Cc13 nzfK nucleotide sequence (GenBank U01238) ranges from a low of 55.5% sequence identity to the Klebsiella pneumoniae nzfK to a high of 62.5% sequence identity to the Anabaena nzfK, indicating that the nzfK genes are sufficiently conserved at the DNA level that they will hybridize; this was the strategy used by Ruvkun and Ausubel ( 1980) to demonstrate that nzf genes were conserved among evolutionarily diverse diazotrophs. Previously, Simonet et al. ( 1988) indicated on the basis of DNA hybridization that there is low sequence similarity between K. pneumoniae and Frankia nzfK. Our sequence data confirm their findings. The nucleotide composition of the Frankia Cc13 nzfK gene is highly skewed toward G+C; 9 1.5% G or C occurs in the third codon position, similar to the average for the entire genome. The only exception is in amino acids 10 through 53, which exhibit 67% A or T in the third position. This stretch of amino acids also shows minimal similarity with all the other N termini of the diazotrophs studied. The protein-coding region of nzfK is 1,356 nucleotides long, including the stop codon, and encodes a polypeptide of 45 1 amino acids. A potential ribosomal binding site ( RBS), the tetranucleotide sequence GAGG, is just upstream of the initial nzfK ATG (Shine and Dalgarno 1974). An intergenic region of 57 nucleotides exists between the start of nzfK and the end of nzfD. Comparison Sequences of Frankia Cc13 nzfK with Published nzjYK Figure 1 is an alignment of the deduced amino acid sequence of the Cc13 nzfK with the deduced nzfK polypeptides from published sequences of other diazotrophs. FIG. I.-Alignment of available NifK sequences. Regions difficult to align, eliminated before phylogenetic analysis, are indicated by the horizontal lines. Amino acids that are identical to the reference taxon are indicated by a period and gaps are indicated by a dash; #, the three cysteine and serine residues of the p subunit involved in the P-cluster pair , ? refers to sequences that are not known. Abbreviation names and sources: He, Klebsiella pneumoniae (Arnold et al. 1988); Fat, “Frankia” FaCl = Klebsiella pneumoniae (Lignon and Nakas 1988); Azo, Azotobacter vinelandii (Brigle et al. 1985); Thi, , Thiobacillus ferrooxidans (Rawlings 1988); Rhi, Parasponium Rhizobium (Weinman et al. 1984); Brj, Bradyrhizobium japonicum (Kaluza and Hennecke 1984); Azb, Azospirillum brasilense (Passaglia et al. 199 1); Ana, Anabaena sp. (Mazur and Chui 1982); Ccl, Frankia strain HFPCcI3, this study; Clo, Clostridium pasteurianum (Wang et al. 1988); Azvn, A. vinelandii vnjK (Joerger et al. 1990); Acvn, A. chroococcum vnf(Robson et al. 1989); Azan, A. vinelundii anf(Joerger et al. 1989); Met, Methanococcus thermolithotrophicus (Souillard and Silbold 1989). Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 where p is the fraction of amino acids that differ between two species. The generated distance matrices were then subjected to Saitou and Nei’s ( 1987) neighbor-joining (NEIGHBOR) and to the least squares methods provided in FITCH, with the value of P set to 0.0 (the method of Cavalli-Sforza and Edwards 1967), 1.O, and 2.0 (the method of Fitch and Margoliash 1967). Bootstrapping ( Felsenstein 1985 ) was performed for each analysis. For the parsimony analyses, branchand-bound searches were made using PAUP ( 100 pseudoreplicates were performed). For the distance analyses, 100 pseudoreplicate amino acid sequence datasets were generated by SEQBOOT in PHYLIP, followed by the conversion of those 100 datasets to distances according to the two methods outlined above using PROTDIST. These pseudoreplicate distance matrices were then analyzed by NEIGHBOR, and the consensus trees were computed using CONSENSE. Trees were rooted using either the Methanococcus thermolithotrophicus nifKgene, the only noneubacterial sequence analyzed ( Woese 1987 ) (parsimony analysis only), or with the alternative nitrogenase gene clade (Azotobacter vinelandii and A. chroococcum vnfKs and A. vinelandii anfK [see Results]). The justification for the alternative NifKs being considered as an outgroup is their paralogous status (Young 1992; Chien and Zinder 1994). 19 20 Hirsch et al. I Klebsiella @ 78 , 53 11 “Frankia” FaCl 111 = (Klebsiella pneumoniae) Azot obacter 137 \ pneumoniae vinelandii ferrooxidans Azospirillum 66 77 100 T104 101 Frankia Clost ridium ‘2 Azot obacter vinelandii - Azotobacter chroococcum l9 Azot obact er vinelandii sp. 1 HFPCcl3 1 past eurianum I- anf y-prot p -prot brasilense eobact eria eria t hallobact eria f irmibact eobact eria I cyanobact eobact ol-prot japonicum eria vnf vnf I eria 1 Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 113 Anabaena eria Rhfzobium Bradyrhizobium * eobact l- Thiobacillus Parasponfum y-prot FIG. 2.-Phylogenetic treeof I3 n[fK sequences based on deduced amino acid sequences and obtained with parsimony (PAUP). The same topology is obtained with analysis of Kimura distances (see text). The tree is rooted with the alternative nifK genes. Branch lengths (number of amino acid replacements observed over entire gene) are given above the horizontal lines, and bootstrap values are encircled at the base of branches. All proteobacterial orthologs have been highlighted. The asterisks refer to the positions of the root for the two equally parsimonious trees obtained when A4~~~hanococw.sthermolithotrophiczls is included as the outgroup (see text). All are based on complete sequences, except the nzjYKof Methanococcus thermolithotrophicus, where only 237 nucleotides were available in the database. Gaps were introduced by eye to maximize the alignment of the sequences. The three cysteine residues and the serine of the p subunit that are involved in the P-cluster pair and conserved among the various taxa are indicated (fig. 1). Very little amino acid sequence similarity between Frankia and the other diazotrophs was observed in the first 52 amino acids of the Frankia Cc13 NifK. Particularly striking is the almost-complete absence of the first 70 amino acids in the Clostridium pasteurianum deduced NifK compared with Frankia, Anabaena, and the proteobacterial NifK sequences. In this amino terminal end, large gaps also occur in the sequences of the Azotobacter vinelandii, A. chroococcum vnf Ks (both vanadium based), the A. vinelandii anfK (Fe-only based), as well as in the M. thermolithotrophicus n.$K. The rest of the sequence was generally easy to align, except for sites 222-236 and 243-253, which, along with the first 57 amino acids, were eliminated from the phylogenetic analyses (fig. 1) . For most of the other regions that show length variation, it is almost always the alternative NifKs that contain amino acids that the rest of the diazotrophs lack (fig. 1). Phylogeny of NifK: Parsimony Initially, parsimony analysis was performed on the complete sequences with M. thermolithotrophicus used as the outgroup. Two trees were generated; one rooted to Clostridium, the other to A. brasilense (asterisks in fig. 2). However, because of the extensive length variation in the amino terminus of NifK, we eliminated amino acid positions 1 through 57 from all taxa (fig. 1). This meant the elimination of almost half the available M. thermolithotrophicus sequence. For this reason, M. thermolithotrophicus was removed from all subsequent analyses, and the alternative Nif polypeptides (deduced from the Azotobacter vinelandii and A. chroococcum vnf Ks, and the A. vinelandii anfK> were used as the outgroup. The most parsimonious tree found with the removal of the 57 amino terminal amino acids and the regions that were difficult to align (fig. 2) was identical to the tree rooted on Clostridium found with the complete sequences. This tree (fig. 2) shows five distinct groups. The proteobacterial cluster (Stackebrandt et al. 1988), supported by 7 1% of the bootstrap replicates, is divided into two distinct subgroups. One subgroup, supported by 100% of the bootstrap replicates and consisting of y- Frankia HFPCcI3 - Thiobaciiius Parasponium Azospiriiium Azotobacter Azotobacter t Azotobacter Vineiandii vineiandii Vnf chroococcum japonicum brasilense pasteurianum anf 0 vnf FIG. 3.-Phylogenetic tree of the 13 nifK sequences based on deduced (see text). Bootstrap values are encircled at the base of branches. distances All proteobacterial orthologs have been highlighted. proteobacteria, includes the A. vinelandii NifK and the NifKs from K. pneumoniae and FaC 1. The nucleotide sequence of FaC 1 was originally published as a Frankia sequence (Lignon and Nakas 1988), but this is now known to be the result of contaminating K. pneumoniae DNA (Lignon and Nakas 1990). It is still listed in the database as a Frankia nifK sequence, however. The second proteobacterial subgroup, supported by 59% of the bootstrap replicates, consists of Parasponium Rhizobium, Bradyrhizobium japonicum, Thiobacillus ferrooxidans, and A. brasilense NifKs. This clustering of Thiobacillus, a P-proteobacterium, with members of the a-proteobacterial Rhizobiaceae is supported by 64% of the bootstrap replicates. However, if Frankia, or Clostridium, or both are removed from the analysis, the most parsimonious tree yields a monophyletic a-proteobacteria. We also note that the last 100 amino acids of Thiobacillus are highly divergent from all other sequences (fig. 1). However, exclusion of sites 428-528 from the parsimony analysis does not change the topology of the shortest tree or affect the bootstrap values appreciably. The inclusion of this P-proteobacterial genus with the a-proteobacteria group is also seen in the NifH trees (Normand and Bousquet 1989; Normand et al. 1992). The Frankia and Anabaena NifKs form successive sister groups to a monophyletic proteobacterial group ferrooxidans I 100 ’ J amino acid sequences and most commonly obtained using Dayhoff Scale bar indicates estimated number of amino acid replacements. (supported by 100% and 8 I%, respectively, strap replicates). The Clostridium NifK, the positive bacterium included in this study, group to all other eubacterial sequences, bootstrap support. of the bootother gramis the sister with 100% Phylogeny of NifK : Distance Analyses The shortest tree obtained with the distance analyses depended on the specific analysis performed. The phylogeny computed with the Kimura distances gave the same topology as the most parsimonious tree (fig. 2 ), whether these distances were analyzed with NEIGHBOR or FITCH (P = 0.0 or 2.0). However, a tree very similar to the NifH tree of Normand et al. ( 1992) was found when the Dayhoff distances were used; with NEIGHBOR and FITCH (P = 1.O, or 2.0)) the Dayhoff distance tree was essentially the same as the NifH tree of Normand et al. (fig. 3). With FITCH (P = O.O), Frankia and Anabaena are still most closely related to the a- and P-proteobacteria, but in this tree, Frankia is the sister group to an Anabaena/a+P-proteobacterial clade. Phylogeny of NifH Normand and Bousquet ( 1989) and Normand et al. ( 1992) only analyzed the complete array of NifH sequences with distance methods, although Normand Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 Ciostridium 21 Rhlzobium Bradyrhizobium - nifK Gene 22 Hirsch et al. latter’s branches are too short by about a factor of three, unless the branch lengths are number of replacements per 100 sites.) Discussion Review of the Case for and against Horizontal Transfer of nifGenes Verification of horizontal transfer is difficult, but perhaps the most crucial component in establishing horizontal transfer is that the phylogeny of the sequences thought to be transferred should conform to a conventional phylogeny except for the radically aberrant position of one member or group (Smith et al. 1992). Using the 16s rRNA phylogenies of Woese ( 1987), Olsen and Woese ( 1993), and Olsen et al. ( 1994) as standards, the Normand and Bousquet ( 1989) NifH tree has three unanticipated features: ( 1) Frankia is grouped with the cyanobacterium Anabaena and not with the other firmicute, Clostridium; (2) this FrankialAnabaena clade lies within the proteobacterial clade; and (3) the P-proteobacterium Thiobacillus lies within the a-proteobacteria clade. Young ( 1992) gives three major reasons for arguing that the NifH tree of Normand and Bousquet ( 1989) Arotobacter 1 I vineiandii Azotobacter vineiandii Azotobacter chroococcum Kiebsieiia Thiobaciiius 19 11 52 Azotobacter 97 ferrooxidans Bradyrhizobium Azospiriiium !Wzobium Clostridium vnf pneumoniae 14 42 vnf Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 and Bousquet ( 1989) did perform parsimony analyses on a quartet of taxa, including Anabaena, Frankia, Klebsiella, and Clostridium. However, if the Clostridium sequence is paralogous to the others (Young 1992; Chien and Zinder 1994), then the analysis, with just one orthologous representative from the cyanobacteria, proteobacteria, and gram-positive bacteria, has no bearing on hypotheses of horizontal transfer. Given that our analyses of NifK with different methods produced different trees and that these trees were consistent either with horizontal transfer or with vertical descent, we wondered whether NifH might also produce different topologies under a wider range of methods of analysis. Hence, we undertook a parsimony analysis of the complete array of NifH amino acid sequences. Difficult to align regions were eliminated (sites 115-l 17 and 273 to the C-terminal end on Normand and Bousquet’s [ 19891 alignment), although inclusion of these regions did not change the topology of the shortest tree. The most parsimonious tree obtained is identical to the trees of Normand and colleagues ( 1989, 1992) (fig. 4). (Note that there is a discrepancy in the branch lengths between the NifH trees reported in Normand and Bousquet [ 19891 and Normand et al. [ 19921; the japonicum brasiiense nmiiloti pasteurianum vineiandii anf Methanococcus thermoiithotrophicus FIG. 4.-Phylogenetic tree of 13 nlfH sequences based on deduced amino acid sequences and obtained with parsimony (PAUP). The tree is rooted with Methanococcus thermolithotrophicus. Branch lengths (number of observed amino acid replacements over entire gene) are given above the horizontal lines, and bootstrap values are encircled at the base of branches. All proteobacterial orthologs have been highlighted. Sequence sources as in fig. 1 except Thiobacillus ferrooxidans (Pretorius et al. 1987); Anabaena sp. (Mevarech et al. 1980); Rhizobium meliloti (T&ok and Kondorosi 1981); Bradyrhizobium juponicum (Fuhrmann and Hennecke 1984); Frankia Ar13 (Normand et al. 1988). Frunkia 23 bacterial groups may have diverged >3,500 MYA (Schopf 1993). If there was even a slight selection pressure toward G+C enrichment (say, with a selection coefficient [s] of 0.0001 ), and if the population sizes (N) were on the order of 109, then 63% G+C enrichment could occur in as little as 50,000 yr (the time taken to acquire 290 substitutions at a substitution rate of K = Nsp, where K = the substitution rate under selection [equation modified from Lewontin ( 1974)) to take into account haploid genomes] . These calculations are simplistic, but they show that the G+C signature test, as Young ( 1992) implies, lacks power. This is especially true for the specific hypothesis of horizontal transfer implied by the nifH tree. Note, if the horizontal transfer occurred from a proteobacterium to the ancestor of Frankia and Anabaena, and Frankia later acquired its G+C-rich signature after it diverged from its common ancestor with the cyanobacteria, then there is little to explain-the G+C contents of the proteobacteria and cyanobacteria are essentially identical. While Young seriously undermined the case for nifH horizontal transfer, the nifH tree of Normand et al. ( 1992) does provide some support for horizontal transfer: the Frankia/Anabaena branch lies within the proteobacterial clade with bootstrap support of 80% and the P-proteobacterium lies within the a-proteobacterial clade with a 99% bootstrap support. Unfortunately, Normand et al.‘s ( 1992) test of the horizontal-transfer hypothesis by examining the phylogeny of another nif gene, nifo, lacks resolution, and the NifD tree will not be considered further. Note that the combining of the NifH and NifD data makes little sense in the context of testing the hypothesis of horizontal transfer. Insofar as they give different topologies, it either suggests that one of the other genes underwent horizontal transfer or that one or both datasets are contaminated with phylogenetic noise. On the Significance of Bootstrap Values The identification of horizontal transfer clearly requires well-supported phylogenies. The crucial question is, When is the position of the aberrantly placed group significant? Typically bootstrap values are used to assess robustness of individual nodes on a phylogen;. Thus, for example, the 80% support for the inclusion of the Frankia / Anabaena clade within the proteobacteria in the NifH tree (Normand et al. 1992) offers some support for horizontal transfer. Hillis and Bull ( 1993 ) and Felsenstein and Kishino ( 1993) have discussed the relationship between bootstrap values and the familiar P values of statistics. Hillis and Bull show that in simulations that the interpretation of bootstrap values as P values often underestimates the true support for a node. Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 does not provide clear support for the horizontal transfer of nifrr. First, the crucial branches on the NifH tree are not defined unambiguously, and, with different algorithms, different NifH trees were obtained. The difficulties associated with assessing the robustness of molecular phylogenies is discussed below. However, it is noted that the fact that different analyses of the NifH data by Normand and Bousquet ( 1989) gave different results is not a concern in this instance. The methods (UPGMA and PHYLIP’s KITSCH) that gave different results from all other analyses both assume that sequence evolution has occurred according to a molecular clock. Perusal of the phenograms of Normand and Bousquet ( 1989) and Normand et al. ( 1992) suggests this is not the case. The difficulty is that both UPGMA and KITSCH will often recover the wrong tree if the underlying data have not evolved according to a clock (see Marshall 1992a). Young’s ( 1992) second argument against horizontal transfer is that the Clostridium nfH sequence may be the aberrantly placed sequence, not the Frankia sequence if it is a paralog of the other n$H genes analyzed. If the Clostridium n$H sequence is a paralog, then the support for the separation of Clostridium from the rest of the Eubacteria analyzed is to be expected and does not provide support for horizontal transfer. However, this does not explain inclusion of Frankia and Anabaena within the proteobacterial clade, or the inclusion of the P-proteobacterium Thiobacillus within the a-proteobacterial clade. Young’s ( 1992) third argument against horizontal transfer is that the G+C contents of each of the n$H genes match the rest of the genomes in which they are found, and not the G+C contents of the hypothesized donor groups. This certainly argues against a very recent horizontal transfer. But how recent is recent? Estimates of the rate of conversion of a set of 290 third positions (the number in nifH) from being entirely A+T rich to highly G+C rich can be made. If it is assumed that there is no selective advantage to having a G+C-rich third position, but that observed high G+C richnesses (e.g., in Frankia) are due to substitution biases toward G+C, then, using the binomial distribution, it can be shown that 290 substitutions will, on the average, convert a set of entirely A+T-rich third positions to about 63% G+C and that a 97% enrichment would occur with approximately 1,000 substitutions. Given a bacterial neutral substitution rate (=mutation rate, p) of 0.7 to 0.8 X lo-* substitutions/site/year (Ochman and Wilson 1987), it would take about 145 million yr to get a 63% enrichment in G+C and about 500 million yr to see an enrichment of 97% G+C. These time periods are loner. but not that long if one considers that the maior HFPCcI3 n(fK Gene 24 Hirsch et al. certainly tantalizing evidence of horizontal transfer of that gene, but without further corroboration, it remains little more. Orthology, Paralogy, Reference Phylogenies, and Tests of nifHorizonta1 Transfer Given the sequences at hand, what is the range of topologies that would provide evidence for horizontal transfer among the nifgenes? Clearly, a prerequisite for demonstrating horizontal transfer is that orthologous genes are compared (although, of course, paralogs make excellent outgroups). There is now strong phylogenetic evidence (Chien and Zinder 1994) that the n$H of Clostridium is paralogous to the other nifH genes analyzed by Normand et al. ( 1992). Thus, the position of the Clostridium NifH, as argued by Normand et al. ( 1992) and Young ( 1992), is of little significance to the hypothesis of the horizontal transfer of the nzjYHgenes analyzed to date. There is also strong phylogenetic evidence that the nzjYDsequence of the gram-positive Clostridium is paralogous to the other eubacterial nifD sequences analyzed by Normand et al. ( 1992) (Chien and Zinder 1994). Thus, for NifD too, the position of Clostridium contributes little to the arguments for or against horizontal transfer. Like the NifH and NifD trees, the eubacterial lineages in the NifK tree are rooted on the Clostridium branch (figs. 2, 3). The two genes nifD and nifk’ are closely linked and encode polypeptides that are two separate components of dinitrogenase. It seems more likely than not that they have remained closely linked throughout their history. If this is true and if the Clostridium nifD is paralogous to the other eubacterial nifDs analyzed, then the Clostridium nifK is also, in all probability, paralogous to the other eubacterial nifKs analyzed. Hence the position of Clostridium in the NifK tree offers little to the debate on nif gene horizontal transfer. For the remaining taxa for which presumed orthologous nifgenes have been sequenced, there is surprisingly little known about their relationships, and thus there are relatively few topologies that would be viewed as being sufficiently aberrant to warrant hypothesizing horizontal transfer. For example, the relationships between the three major groups, the cyanobacteria, proteobacteria, and gram-positives are not well understood (see below). Hence, any of the three ways of grouping these taxa would not seem aberrant in a nif gene tree. In addition, with only one cyanobacterium and one orthologous gram-positive nif gene (the Frankia sequences), the only expected relationships of the known nifgenes are that the proteobacteria should be monophyletic and that each of the subgroups of the proteo- Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 For example, for groups with bootstrap support of 85%, greater than 95% may actually be supported. These arguments suggest that the support for the unusual placement of Frankia and Anabaena is probably stronger than the 80% bootstrap value might suggest. However, the generalizations made by Hillis and Bull ( 1993) are based on simple models of sequence evolution (e.g., no site-to-site variation in rates of sequence evolution, no substitution biases, fixed transversion/ transition ratios). In any situation where a method of inference is likely to be inconsistent, that is, where the chances of getting an incorrect tree increases with increasing amounts of data, the bootstrap values are difficult to interpret. They may simply measure the degree of spurious support generated by the evolutionary process that produced the sequences (see Raff et al. [ 19941 for a review). For example, an 18s rRNA parsimony phylogeny of amniotes (Hedges et al. 1990) has an 88% bootstrap support for an unexpected bird /mammal clade, very strong support indeed by the Hillis and Bull ( 1993) criterion. However, it was demonstrated that there was extensive lineage-specific substitution bias in the sites that had varied, and reanalysis with a weighted parsimony algorithm removed the mammals from the top of the tree, to the bottom (Marshall 1992b). The bootstrap value in this case was primarily measuring the very strong contribution of taxon-specific substitution biases and elevated rates of substitution (but no more than is apparent in the nifgene trees) and not the strength of the phylogenetic signal. Further analysis by Van de Peer et al. ( 1993) indicates that the 18s rRNA molecule has site-to-site variation in substitution rates spanning three orders of magnitude. By breaking the molecule up into five classes based on the rate of change and correcting each separately for multiple hits, they were able to recover an amniote phylogeny (with mammals at the bottom, not the top of the tree) completely concordant with morphological and paleontological evidence. For molecules with long histories and highly conserved functions, it is likely that there has been great heterogeneity in the rates that sites have evolved, as well as episodes of substitution bias, rate changes, and so on. In these cases, bootstrap values may not measure the accuracy or reliability of the derived phylogeny; all they provide is a measure of the relative strengths of the conflicting signal in the dataset (Marshall 1992b). Some of that signal may be due to history, but it is also likely that some, perhaps much of it, will be artifact created by substitution bias, the proximity of long branches to short branches, and site-to-site variation in rates of substitution. The 80% bootstrap support for the Frankia/ Anabaena clade within the proteobacterial clade is Frankia rzifK Gene 25 horizontally transferred. Note that none of the trees that support horizontal transfer deviate grossly from topologies expected under the hypothesis of vertical descent: all can be converted to a tree consistent with vertical descent by interchanging just one or two adjacent nodes. To resolve the evolutionary relationships of the nifgenes, sequences need to be obtained from a considerably wider range of diazotrophs. There also needs to be development of methods to determine whether sequences have dramatically different site-to-site nucleotide substitution rates, substitution biases, and so on, and phylogenetic methods need to be developed to handle these cases. Rooting the Eubacterial Tree The relationships between the three major eubacterial lineages, the cyanobacteria, the proteobacteria, and the gram-positives are not firmly established. For example, the 16s rRNA sequences give different roots (fig. 5) depending on the taxa included. Figure 1 of Olsen and Woese ( 1993) provides perhaps the most accurate 16s rRNA tree of the three groups-an unresolved trichotomy. One of the few other genes sequenced from sufficient taxa to bear on the relationships of these three eubacterial groups is the glutamine synthetase 1-p gene (GSI-0) (Brown et al. 1994)) although different analyses provide different roots (fig. 5 ) . The NifK sequences offers one of the best opportunities to establish the relationships among the major nitrogen-fixing eubacterial lineages, although a larger database is desirable. If nifK evolved by vertical descent, then the present data provide evidence for the cyanobacteria and proteobacteria being sister groups ( fig. 5 ). On the basis of parsimony tree ( fig. 2 ), there is 8 1% bootstrap support for this topology; this is also the tree with the strongest support from GSI-P. 16s rRNA (Olsen et al. 1994) CYANOBACTERIA GRAM-POSITIVES (NifK) GSI-P (Brown et al. 1994: MP) FIG. 5.-Position of the root for the three eubacterial lineages suggested by four phylogenetic studies. The NifK is enclosed in parenthesis to emphasize that vertical descent is not the favored hypothesis in all analyses. Brown et al. (1994) presented two trees, a maximum parsimony tree (MP, 97% bootstrap support for the root shown) and a neighbor-joining tree (NJ, 60% bootstrap support for the root shown), for the glutamine synthetase (GS) genes. The relevant portions of their trees are based on the GSI-P member of the family. Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 bacteria with more than one taxon represented should be monophyletic. What then do the Nif trees have to tell us about the possibility of horizontal transfer of the nifgenes? Taken at face value, the parsimony and Kimura distance NifK trees (fig. 2) provide support for vertical descent of the nifsenes, while the Dayhoff distance trees (fig. 3) provide support for horizontal transfer. However, in the bootstrap analysis performed for each phylogenetic analysis, support was found for both hypotheses. For example, while the most parsimonious tree provides 7 1% bootstrap support for a monophyletic proteobacteria, 29% of the bootstrap trees do not have a monophyletic proteobacteria. Accordingly, while the best Dayhoff distance tree supports horizontal transfer, 27% of the bootstraps support a monophyletic proteobacteria and vertical descent of nifK. It is interesting to note that, while there is support for the P-proteobacterium Thiobacillus lying within the a-proteobacterial clade, there is also 35% bootstrap support for a monophyletic a-proteobacterial clade ( Thiobacillus lies outside the three a-proteobacteria) in the parsimony analysis and a 3% and 3 1% support for that group in the Dayhoff and Kimura distance analysis, respectively. It is also interesting that there is no bootstrap support for a monophyletic Frankia / Anabaena clade in the parsimony analysis (although there is 20% support for Anabaena lying within the proteobacteria and 12% support for Frankia lying within the proteobacteria), while there is 40% and 15% support for this grouping in the Dayhoff and Kimura distance analyses, respectively. Parsimony analysis of NifH provided support for horizontal transfer in accord with the distance analyses of Normand et al. ( 1992). However, in the bootstrap analyses, there is also some support for vertical descent of the NifH orthologs. In the parsimony analysis, there was 16% support, while in the Kimura distance analysis there was 10% support for a monophyletic proteobacterial clade. However, there was virtually no support for a monophyletic a-proteobacterial clade; in only 7% of the bootstrap replicates in the parsimony analysis, and in only 2% of replicates in the Kimura distance analysis did the P-proteobacterium fall outside the a-proteobacterium clade. Bootstraps were not performed using Dayhoff distances. In summary, the NifK data provide support for either vertical descent or horizontal transfer, depending on the method of analysis. The NifH sequences provide stronger support for horizontal transfer, although there is some signal in the sequence that supports vertical descent. The NifD tree of Normand et al. ( 1992) has insufficient resolution to add substantially to the question of whether, and to what extent, the nifgenes have been HFPCcI3 26 Hirsch et al. Dedication and Acknowledgments LITERATURE CITED W., A. RUMP, W. KLIPP, U. B. PRIEFER, and A. 1988. Nucleotide sequence of a 24,206-base-pair DNA fragment carrying the entire nitrogen fixation gene cluster of Klebsiellapneumoniae. J. Mol. Biol. 203:7 15-738. BRIGLE, K. E., W. E. NEWTON, and D. R. DEAN. 1985. Complete nucleotide sequence of the Azotobacter vinelandii nitrogenase structural gene cluster. Gene 37:37-44. BROWN, J. R., Y. MASUCHI, F. T. ROBB, and W. F. DOOLITTLE. 1994. Evolutionary relationships of bacterial and Archaeal glutamine synthetase genes. J. Mol. Evol. 38:566-576. CAVALLI-SFORZA, L. L., and A. W. F. EDWARDS. 1967. Phylogenetic analysis: models and estimation procedures. Evolution 32:550-570. CHIEN, Y. T., and S. H. ZINDER. 1994. Cloning, DNA sequencing and characterization of a nifD-homologous gene from the archaeon Methanosarcinia barkeri 227 which resembles nifDl from the eubacterium Clostridium pasteurianum. J. Bacterial. 176:6590-6598. DAYHOFF, M. 0. 1979. Atlas of protein sequence and structure. Vol. 5. Suppl. 3. National Biomedical Research Foundation, Washington, D.C. FELSENSTEIN, J. 1993. PHYLIP (Phylogeny Inference Package), Version 3.5~. Department of Genetics, University of Washington, Seattle. -. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-79 1. FELSENSTEIN, J., and H. KISHINO. 1993. Is there somelhing wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42:193-200. FERNANDEZ, M. P., H. MEUGNIER, P. A. D. GRIMONT, and R. BARDIN. 1989. Deoxyribonucleic acid relatedness among members of the genus Frankia. Inter. J. System. Bacterial. ARNOLD, WHLER. 39:424-429. HENNECKE, LUDWIG, H., K. KALUZA, B. THONY, M. FUHRMANN, W. and E. STACKEBRANDT. 1985. Concurrent evolution of nitrogenase genes and 16s rRNA in Rhizobium species and other nitrogen fixing bacteria. Arch. Microbial. 142~342-348. HIGGINS, D. G., and P. M. SHARP. 1988. Clustal: a package for performing multiple alignment on a microcomputer. Gene 73:237-244. HILLIS, D. M., and J. J. BULL. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42: 182- 192. JOERGER, R. D., M. R. JACOBSON, R. PREMAKUMAR, E. D. WOLFINGER, and P. E. BISHOP. 1989. Nucleotide sequence and mutational analysis of the structural genes (anfHDGK) for the second alternative nitrogenase from Azotobacter vinelandii. J. Bacterial. 171:1075-1086. JOERGER, R. D., T. L. LOVELESS, R. N. PAU, L. A. MITCHENALL, B. H. SIMON, and P. E. BISHOP. 1990. Nucleotide sequences and mutational analysis of the structural genes for nitrogenase 2 of Azotobacter vinelandii. J. Bacterial. 172:3400-3408. Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 This paper is dedicated to the memory of Professor John G. Torrey of the Harvard Forest, Peter-sham, Mass. He will be remembered not only for his pioneering work on Frankia but also for his profound influence on his students and postdoctoral associates. We thank Pascal Simonet (Universite Claude Bernard, Lyon I, Villeurbanne, France) for generously providing plasmid pFQ 167 and also members of our laboratory group, especially Bettina Niner, for helpful comments on the manuscript. We are also grateful to Rob Gunsalus and Imke Schroeder (University of California, Los Angeles) for helpful discussions and to Steve Zinder (Cornell University) for conveying information before publication. Our gratitude is also extended to an anonymous but extremely knowledgeable reviewer. This research was supported in part by grants from the United States Department of Agriculture (8%37262-3979), the UCLA Latin American Center, and by the University of California Pacific Rim Research Program (MC930209 ) to A.M.H., and by National Science Foundation grant EAR-9258045 to C.R.M. W. M., and E. MARGOLIASH. 1967. Construction of phylogenetic trees. Science 155:279-284. F~HRMANN, M., and H. HENNECKE. 1984. Rhizobium japonicum nitrogenase Fe protein gene ( nifH) . J. Bacterial. 158: 1005-1011, HEDGES, S. B., IS. D. MOBERG, and L. MAXSON. 1990. Tetrapod phylogeny inferred from 18s and 28s ribosomal RNA sequences and a review of the evidence for amniote relationships. Mol. Biol. Evol. 7:607-633. FITCH, K., and H. HENNECKE. 1984. Fine structure analysis of the nifDK operon encoding the a and B subunits of dinitrogenase from Rhizobium japonicum. Mol. Gen. Genet. KALUZA, 196~35-42. M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge. KLEINER, D., W. LITTKE, H. BENDER, and K. WALLENFES. 1976. Possible evolutionary relationships of the nitrogenase proteins. J. Mol. Evol. 7:159-165. LEWONTIN, R. C. 1974. The genetic basis of evolutionary change. Columbia University Press, New York. LIGON, J. M., and J. P. NAKAS. 1988. Nucleotide sequence of nifK and partial sequence of nifD from Frankia species FaC 1. Nucl. Acids Res. 16: 11843. . 1990. Corrigendum. Nucleotide sequence of nifK and partial sequence of nzfD from Frankia species FaC 1. Nucleic Acids. Res. 18: 1097. MARSHALL, C. R. 1992a. Character analysis and the integration of molecular and morphological data in an understanding of sand dollar phylogeny. Mol. Biol. Evol. 9:309-322. ~ . 19926. Substitution bias, weighted parsimony, and amniote phylogeny as inferred from 18s rRNA sequences. Mol. Biol. Evol. 9:370-373. MAZUR, B. J., and C. F. CHUI. 1982. Sequence of the gene coding for the B-subunit of dinitrogenase from the bluegreen alga Anabaena. Proc. Natl. Acad. Sci. USA 79:6782KIMURA, 6785. Frankia HFPCcI3 niJKGene 27 J. W. 1993. Microfossils of the early Archean Apex Chert: new evidence of the antiquity of life. Science 260: 640-646. SHINE, J., and L. DALGARNO. 1974. The 3’-terminal sequence of Escherichia coli 16s ribosomal RNA: complementary to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71: 1342- 1346. SIMONET, P., P. NORMAND, and R. BARDIN. 1988. Heterologous hybridization of Frankia DNA to Rhizobium meliloti and Klebsiella pneumoniae nif genes. FEMS Micro. Lett. 55:141-146. SIMONET, P., P. NORMAND, A. M. HIRSCH, and A. D. L. AKKERMANS. 1990. The genetics of the Frankia-actinorhizal symbiosis. Pp. 77-109 in P. M. GRESSHOFF, ed. Molecular biology of symbiotic nitrogen fixation. CRC, Boca Raton, Fla. SMITH, M. W., D.-F. FENG, and R. F. DOOLITTLE. 1992. Evolution by acquisition: the case for horizontal gene transfers. TIBS 17:489-493. SOUILLARD, N., and L. SIBOLD. 1989. Primary structure, functional organization and expression of nitrogenase structural genes of the thermophilic archaebacterium Methanococcus thermolithotrophicus. Mol. Microbial. 3:44 l-552. STACKEBRANDT, E. R., G. E. MURRAY, and H. G. TRUPER. 1988. Proteobacteria, classis nov., a name for the phylogenetic taxon that includes the “purple bacteria and their relatives.” Int. J. Syst. Bacterial. 38:32 l-325. SWOFFORD, D. L. 1992. Phylogenetic analysis using parsimony (PAUP), Version 3.0s. Illinois Natural History Survey, Champaign. TOR~K, I., and A. KONDOROSI . 198 1. Nucleotide sequence of the Rhizobium meliloti nitrogenase reductase (nifH) gene. Nucleic Acids Res. 9:57 1 l-5723. VAN DE PEER, Y ., J.-M. NEEFS, P. DE RIJK, and R. DE WACHTER. 1993. Reconstructing evolution from eukaryotic smallribosomal-subunit RNA sequences: calibration of the molecular clock. J. Mol. Evol. 37:221-232. YOUNG, J. P. W. 1992. Phylogenetic classification of nitrogenfixing organisms. Pp. 43-86 in G. STACEY, R. H. BURRIS, and H. J. EVANS, eds. Biological nitrogen fixation. Chapman & Hall, New York. WANG, S.-Z., J.-S. CHEN, and J. L. JOHNSON. 1988. The presence of five nzyH-like sequences in Clostridium pasteurianum: sequence divergence and transcriptional properties. Nucleic Acids Res. 16:439-454. WEINMAN, J. J., F. F. FELLOWS, P. M. GRESSHOFF, J. SHINE, and K. F. SCOTT. 1984. Structural analysis of the genes encoding the molybdenum-iron protein of nitrogenase in the Parasponia Rhizobium strain ANU289. Nucleic Acids Res. 12:8329-8344. WOESE, C. R. 1987. Bacterial evolution. Microbial. Rev. 51: 221-271. SCHOPF, JEFFREY PALMER, reviewing Received June 29, 1994 Accepted August 29, 1994 editor Downloaded from http://mbe.oxfordjournals.org/ by guest on January 12, 2012 M., D. A. RICE, and R. HASELKORN. 1980. Nucleotide sequence of a cyanobacterial nifH gene coding for nitrogenase reductase. Proc. Natl. Acad. Sci. USA 77:64766480. MULLIN, B. C., and C. S. AN. 1990. The molecular genetics of Frankia. Pp. 196-2 15 in C. R. SCHWINTZER and J. D. TJEPKEMA, eds. The biology of Frankia and actinorhizal plants. Academic Press, New York. NORMAND, P., and J. BOUSQUET. 1989. Phylogeny of nitrogenase sequences in Frankia and other nitrogen-fixing microorganisms. J. Molec. Evol. 29:436-444. NORMAND, P., M. GOUY, B. COURNOYER, and P. SIMONET. 1992. Nucleotide sequence of nifD from Frankia alni strain Ar13: phylogenetic inferences. Mol. Biol. Evol. 9:495-506. NORMAND, P., P. SIMONET, and R. BARDIN. 1988. Conservation of nifsequences in Frankia. Mol. Gen. Genet. 213: 238-246. OCHMAN, H., and A. C. WILSON. 1987. Evolution in bacteria: evidence for a universal rate in cellular genomes. J. Mol. Evol. 26:74-86. OLSEN, G. J., and C. R. WOESE. 1993. Ribosomal RNA: a key to phylogeny. FASEB J. 7:113-123. OLSEN, G. J., C. R. WOESE, and R. OVERBEEK. 1994. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacterial. 176: l-6. PASSAGLIA, L. M. P., C. P. NUNES, A. ZAHA, and I. S. SCHRANK. 199 1. The nifHDK operon in the free-living nitrogen-fixing bacteria Azospirillum brasilense sequentially comprises genes H, D, K, an 353 bp orf and gene Y. Braz. J. Med. Biology Res. 24:649-675. PRETORIUS, I.-M., D. E. RAWLINGS, E. G. O’NEILL, W. A. JONES, R. KIRBY, and D. R. WOODS. 1987. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans. J. Bacterial. 169:367-370. RAFF, R. A., C. R. MARSHALL, and J. M. TURBEVILLE. 1994. Using DNA sequences to unravel the Cambrian radiation of the animal phyla. Ann. Rev. Ecol. Syst. 25: 351-375. RAWLINGS, D. E. 1988. Sequence and structural analysis of the a- and P-dinitrogenase subunits of Thiobacillus ferrooxidans. Gene 69~337-343. REDDY, A., B. BOCHENEK, and A. M. HIRSCH. 1992. A new Rhizobium meliloti symbiotic mutant isolated after introducing Frankia DNA sequence into a nodA : : Tn5 strain. Molec. Plant-Microbe Inter. 5:62-7 1. ROBSON, R. L., P. R. WOODLEY, R. N. PAU, and R. R. EADY. 1989. Structural genes for the vanadium nitrogenase from Azotobacter chroococcum. EMBO J. 8: 1.217- 1224. RUVKUN, G. B., and F. M. AUSUBEL. 1980. Interspecies homology of nitrogenase genes. Proc. Natl. Acad. Sci. USA 77:191-195. SAITOU, N., and M. NEI . 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425. SAMBROOK, J., E. F. FRITSCH,and T. MANIATIS. 1989. Molecular cloning. A laboratory manual. 2d ed. Cold Spring Harbor Laboratory Press, Plainville, N.Y. SANGER, F., S. NICKLEN, and A. R. COULSEN. 1977. DNA sequencing with chain elongation inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467. MEVARECH,