Gene 318 (2003) 185–191
www.elsevier.com/locate/gene

Rampant horizontal gene transfer and phospho-donor change in the evolution of the phosphofructokinase

Eric Bapteste a,*, David Moreira b, Hervé Philippe a,1

a Equipe Phylogénie, Bioinformatique et Génome, UMR CNRS 7622, Université Pierre et Marie Curie, 9 quai St. Bernard, 75005 Paris, France
b Unité d'Ecologie, Systématique et Evolution, UMR CNRS 8079, Université Paris-Sud, 91405 Orsay Cedex, France

Received 1 April 2003; received in revised form 2 June 2003; accepted 24 June 2003
Received by A. Roger

Abstract

Previous work on the evolution of the phosphofructokinase (PFK) has shown that this key regulatory enzyme of glycolysis has undergone an intricate evolutionary history. Here, we have used a comprehensive data set to address the taxonomic distribution of the different types of PFK (ATP-dependent and PPi-dependent ones) and to estimate the frequency of horizontal gene transfer (HGT) events. Numerous HGT events appear to have occurred. In addition, we focused on the analysis of sites 104 and 124 (usually Gly104 + Gly124 or Asp104 + Lys124), known to be involved in catalysis (J. Biol. Chem. 275 (2000) 35677). This analysis revealed numerous sequences from distantly related species carrying atypical combinations of amino acids. Several adaptive changes of phospho-donors, probably requiring a single mutation at position 104, have likely occurred independently in many lineages. The analysis of this gene suggests high rates of both HGT and substitution in its active sites. These rampant HGT events and the flexibility in phospho-donor use illustrate the importance of tinkering in molecular evolution.

© 2003 Published by Elsevier B.V.

Keywords: Phosphofructokinase; Phylogeny; Evolution; Horizontal gene transfer; Adaptability

1. Introduction

Energy metabolism of many organisms is based on glycolysis.
The classical glycolytic pathway (the Embden-Meyerhof pathway) is regulated by phosphofructokinase (PFK). However, at least two alternative pathways for degrading glucose also exist. One of them, the Entner-Doudoroff pathway, does not require PFK and is broadly distributed in bacteria (Conway, 1992). The other is present only in some Archaea and requires an enzyme called ADP-PFK, which does not appear to be homologous to PFK; it belongs instead to the glucokinase (GLK) family of kinases (Verhees et al., 2001). PFK is central to the classical glycolytic pathway, which is present in all three domains of life (Siebers et al., 1998), so one might expect the evolution of this enzyme to be highly constrained. This does not seem to be the case, however, since several types of PFK exist. They use different phospho-donors, such as ATP and inorganic pyrophosphate (PPi), and fulfil different tasks in the cell. The widely distributed ATP-PFK catalyses an irreversible catabolic reaction, the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate, whereas PPi-PFK, also called PPi-phosphofructokinase (PFP), catalyses the same reaction reversibly and can thus function in both glycolysis and gluconeogenesis. Furthermore, in many cases the physiological role of PFP is not obvious (Siebers et al., 1998).

Abbreviations: EST, expressed sequence tag; GLK, glucokinase; HGT, horizontal gene transfer; KDPG, 2-keto-3-deoxy-6-phosphogluconate; ML, maximum likelihood; NJ, neighbour joining; PFK, phosphofructokinase; PFP, PPi-phosphofructokinase.
* Corresponding author. Tel.: +33-1-44-27-34-70; fax: +33-1-44-27-34-45. E-mail address: [email protected] (E. Bapteste).
1 Present address: Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, C.P. 6128 Succursale Centre-Ville, Montréal, QC, Canada H3C 3J7.
0378-1119/$ - see front matter © 2003 Published by Elsevier B.V.
doi:10.1016/S0378-1119(03)00797-2
More precisely, the simultaneous presence of PFP and ATP-PFK (i.e., of a reversible and an irreversible enzyme) in a single organism, although rarely reported, has suggested that PFP may perform an alternative, unknown function (Alves et al., 2001; Van Praag, 1997).

Biochemical approaches, including functional, structural, and mutational analyses, have identified several amino acids essential for ATP-PFK and PFP function (Chi and Kemp, 2000; Moore et al., 2002), in particular those at positions 104 and 124 (numbering of Escherichia coli ATP-PFK) (Chi and Kemp, 2000). First, Gly at position 104, present in all ATP-PFKs, has been shown to be essential for the use of ATP (Chi and Kemp, 2000; Moore et al., 2002; Van Praag, 1997). By contrast, the Asp residue present at this position in PFP prevents ATP from binding through steric hindrance (Moore et al., 2002). Second, when position 124 in E. coli PFK is occupied by Gly, the absence of a side chain leaves room for the alpha-phosphate of ATP. This is not the case when a larger amino acid, Lys, occupies this position, as in all characterised PFPs (Hinds et al., 1998). In fact, this Lys plays an important role in the recognition of PPi, a role that has been tested biochemically in Entamoeba histolytica and in several other organisms (Chi and Kemp, 2000; Hinds et al., 1998; Lopez et al., 2002; Moore et al., 2002). In summary, PFKs working with ATP harbour Gly at positions 104 and 124, whereas PFPs have Asp and Lys (Chi and Kemp, 2000; Claustre et al., 2002; Moore et al., 2002). However, the atypical combination Gly104 + Lys124 has been reported in E. histolytica (Chi and Kemp, 2000), Trypanosoma brucei (Claustre et al., 2002), Leishmania donovani (Lopez et al., 2002), and Chlamydia trachomatis (Moore et al., 2002).
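The residue rule summarised above can be made explicit. The sketch below assumes an alignment that contains the E. coli ATP-PFK sequence as a reference, with E. coli residue numbers counted from the first residue of that aligned reference (an assumption: numbering conventions, e.g. for the initiator methionine, may differ between studies). The function names, labels, and toy sequences are hypothetical. The code finds the alignment columns for the two reference positions and labels the residue pair found in another aligned sequence.

```python
from typing import Tuple

# Labels for the residue pairs discussed in the text (positions 104 + 124, E. coli numbering).
LABELS = {
    "GG": "typical ATP-PFK (Gly104 + Gly124)",
    "DK": "typical PFP (Asp104 + Lys124)",
    "GK": "atypical (Gly104 + Lys124)",
}


def ref_position_to_column(ref_aligned: str, ref_pos: int) -> int:
    """Return the 0-based alignment column holding residue `ref_pos` (1-based)
    of the gapped reference sequence; gap characters ('-') are not counted."""
    count = 0
    for column, char in enumerate(ref_aligned):
        if char != "-":
            count += 1
            if count == ref_pos:
                return column
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def classify_active_site(ref_aligned: str, query_aligned: str,
                         pos_a: int = 104, pos_b: int = 124) -> Tuple[str, str]:
    """Read the query residues at two reference positions and label the pair."""
    col_a = ref_position_to_column(ref_aligned, pos_a)
    col_b = ref_position_to_column(ref_aligned, pos_b)
    pair = query_aligned[col_a].upper() + query_aligned[col_b].upper()
    return pair, LABELS.get(pair, "other combination")


# Toy example: positions 3 and 4 of a short gapped reference stand in for 104 and 124.
print(classify_active_site("M-AG-K", "MDGGLK", 3, 4))
```

Working through reference coordinates rather than raw alignment columns keeps the labels tied to the E. coli numbering even when gaps shift the columns. A gap or an unexpected residue in the query simply yields "other combination".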
In the first three cases (E. histolytica, T. brucei, and L. donovani), biochemical characterisation indicates that these PFKs use ATP, not PPi, as phospho-donor. Moore et al. (2002) argued that ATP is likely used in C. trachomatis as well.

ATP-PFK and PFP share a common ancestry, but phylogenies show a very complex evolutionary pattern (Müller et al., 2001; Siebers et al., 1998). An ancient duplication has been proposed to have given rise to two groups of PFK, using PPi and ATP as phospho-donor, respectively (Alves et al., 1996). However, the monophyly of these two groups has been questioned (Müller et al., 2001; Siebers et al., 1998), as has the validity of the amino acid content at sites 104 and 124 as a phylogenetic signature. In fact, the evolutionary history of PFK conflicts on several points with accepted organismal relationships. Several duplications and horizontal gene transfer (HGT) events blur the phylogeny of PFK (Müller et al., 2001). Some recent duplications were followed by the differentiation of catalytic and regulatory subunits in animals, fungi, and plants (Heinisch et al., 1989; Kemp and Gunasekera, 2002; Poorman et al., 1984; Van Praag, 1997). In addition, the fusion of the catalytic and regulatory subunits in fungi, animals, and Dictyostelium led to large variation in PFK sequence length. All these duplications and HGT events can lead to multiple divergent copies of PFK in a single species.

In this work, we have analysed a large data set of ATP-PFK and PFP sequences to study the distribution of the different types of PFK (ATP-PFK, PFP, and those with atypical active sites) and to estimate the frequency of HGT events. We include a new sequence from the choanoflagellate Monosiga ovata, which provides an additional example of HGT affecting a eukaryote. We also examined positions 104 and 124 closely and detected several atypical active sites (Gly104 + Lys124) in distantly related species.
Hence, we suggest that adaptive changes of phospho-donor, requiring a mutation at position 104, have occurred independently in many lineages. The activity of these particular proteins should be tested biochemically; if they have the expected ATP-PFK activity, species possessing both ATP-PFK and PFP in their genomes would in fact be quite common.

2. Materials and methods

2.1. Sequencing

The sequence of the pfk gene from M. ovata was obtained by random sequencing of a cDNA library (collaboration with Dr. P. Holland, to be published elsewhere). Both the 5′ and 3′ ends of the pfk clone were sequenced, giving an overlap of around 500 nucleotides for a total cDNA length of 897 nucleotides. The sequence has been submitted to GenBank under accession number AY291291.

2.2. Sequence recovery and alignment

Most PFK protein sequences were retrieved from GenBank using the program ALIBABA (Philippe Lopez, unpublished work). The sequences from Dictyostelium discoideum, fungi, and animals (formed by the fusion of the catalytic and regulatory subunits) were split in two so that the catalytic and regulatory regions could be aligned separately. The sequences were aligned with CLUSTAL W (Thompson et al., 1994), and the alignment was refined manually with the program ED of the MUST package (Philippe, 1993). To build a more comprehensive data set, we also included sequences from ongoing expressed sequence tag (EST) and genome projects. PFK homologues were detected by TBLASTN searches (Altschul et al., 1997). All high-scoring segments with a BLAST expectation value below 10^-10 were retained and incorporated into the alignment. Only a few regions, notably those around the active site of the enzyme, could be aligned without ambiguity across the entire data set. Our complete data set contained 227 sequences. The alignment is available upon request.

2.3. Phylogenetic analysis

The complete PFK data set was first analysed with the neighbour joining (NJ) method (Saitou and Nei, 1987). Partial and/or very closely related sequences were then discarded, yielding a final alignment of 152 sequences and 153 unambiguously aligned positions. A phylogenetic tree of these 152 representative sequences was reconstructed using TREE-PUZZLE 5.0 (Schmidt et al., 2002) and Neighbor (Felsenstein, 1999). To handle rate variation among sites, a maximum likelihood (ML) distance matrix under a Γ-law model (eight discrete rate categories) was computed and then used to reconstruct the tree by the NJ method. Bootstrap values were computed from 1000 replicates using PUZZLEBOOT (www.tree-puzzle.de/puzzleboot.sh). An extended majority-rule consensus tree of the replicates was inferred with CONSENSE from the PHYLIP package (Felsenstein, 1999). This NJ approach makes it possible to analyse a large number of sequences under a complex model of sequence evolution, which is not feasible with a standard ML tree search. Alternative tree topologies were compared statistically with Shimodaira's (2002) approximately unbiased (AU) test, as implemented in the program CONSEL (Shimodaira and Hasegawa, 2001).

3. Results

3.1. An odd taxonomic distribution

Our taxonomic sample contained almost exclusively bacteria and eukaryotes, the only exception being the archaeon Thermoproteus tenax, whose sequence was most likely acquired by HGT (Siebers et al., 1998). All completely sequenced eukaryotic genomes contain a pfk gene, whereas it is missing from some completely sequenced bacterial genomes and from all completely sequenced archaeal genomes.
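Two expectation-value cut-offs appear in this study: 10^-10 for retaining TBLASTN hits as PFK homologues (Section 2.2) and 10^-30 for deciding whether PFK is present in a completely sequenced genome. The following is a minimal sketch of applying such cut-offs. It assumes hits in the 12-column tabular format of NCBI BLAST+ (`-outfmt 6`, with the E-value in the 11th column), a format that postdates the 2003 analysis. The function name and the example identifiers are hypothetical.

```python
from typing import Iterable, Set


def subjects_below(lines: Iterable[str], max_evalue: float) -> Set[str]:
    """Return subject IDs having at least one hit with E-value below `max_evalue`.

    Each line is assumed to be BLAST tabular output with 12 tab-separated columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    kept: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) < max_evalue:  # column 11 holds the E-value
            kept.add(fields[1])
    return kept


# Hypothetical TBLASTN hits of one PFK query against three genomic contigs.
hits = [
    "pfkA_Ecoli\tcontig_1\t45.2\t300\t150\t5\t1\t300\t10\t910\t1e-50\t180",
    "pfkA_Ecoli\tcontig_2\t30.1\t120\t80\t3\t5\t125\t40\t400\t1e-20\t70",
    "pfkA_Ecoli\tcontig_3\t25.0\t60\t45\t2\t10\t70\t5\t185\t0.5\t25",
]
homologues = subjects_below(hits, 1e-10)  # inclusion cut-off for the alignment
present = subjects_below(hits, 1e-30)     # stricter cut-off for scoring presence
```

With these toy hits, `contig_2` passes the inclusion cut-off but not the stricter presence cut-off. This illustrates why absence calls at 10^-30 are conservative about what counts as a PFK gene.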
A BLAST search (using standard parameter values and an expectation value threshold of 10^-30) shows that PFK is absent from the complete genome sequences of some alpha-proteobacteria (Brucella melitensis, Caulobacter crescentus, Rickettsia conorii, Rickettsia prowazekii), some beta-proteobacteria (Neisseria meningitidis, Ralstonia solanacearum), some gamma-proteobacteria (Pseudomonas aeruginosa, Xanthomonas campestris, Xanthomonas citri), epsilon-proteobacteria (Campylobacter jejuni, Helicobacter pylori), the Gram-positive Oceanobacillus iheyensis, the green sulfur bacterium Chlorobium tepidum, the fusobacterium Fusobacterium nucleatum, and the cyanobacterium Thermosynechococcus elongatus. Several of these species lacking PFK (H. pylori, N. meningitidis, R. solanacearum, S. typhimurium, and X. citri) possess a KDPG aldolase, a central enzyme of the Entner-Doudoroff pathway.

3.2. Numerous HGT events in the PFK tree

A comprehensive data set (152 sequences, 153 unambiguously aligned positions) was used to construct a phylogenetic tree (Fig. 1). It allowed the definition of eight monophyletic groups, five of them containing both bacterial and eukaryotic sequences. Five of these groups (X, P, SHORT, LONG, and III) were previously named by Siebers et al. (1998) and Müller et al. (2001). In this work, we name three additional groups: B1 (containing only bacteria), B2 (containing Clostridium spp. and several proteobacteria), and E (containing mainly eukaryotes). In addition, four sequences (Aquifex aeolicus, Dictyoglomus thermophilum, T. tenax, and one sequence from Thermotoga maritima) do not belong to any of the eight monophyletic groups. The phylogenetic tree is very complex, suggesting an intricate evolutionary history involving duplications and HGT in addition to vertical inheritance.
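The tree in Fig. 1 was obtained by applying the NJ method (Saitou and Nei, 1987) to an ML distance matrix. The sketch below shows only the NJ step, on a precomputed distance matrix. It does not compute Γ-corrected ML distances, bootstrap replicates, or consensus trees, which the study obtained with TREE-PUZZLE, PUZZLEBOOT, and PHYLIP. The function name and the five-taxon matrix are illustrative.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted tree by neighbour joining and return it in Newick format.

    `dist` is a symmetric matrix with zeros on the diagonal, ordered as `names`.
    """
    if len(names) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    # Distances keyed by node label; merged nodes are labelled by their Newick subtree.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r of each node from all the others.
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distances from u to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, matrix)
print(tree)  # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Each iteration is quadratic in the number of remaining nodes, so the whole procedure is roughly cubic in the number of taxa. This modest cost is what makes distance-based NJ practical for a data set of 152 sequences.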
The recovery of several monophyletic groups with a wide and phylogenetically coherent taxonomic composition (e.g., the bacterial clade B1 and the eukaryotic clade E) is consistent with extensive vertical inheritance. Moreover, relationships within these groups agree with accepted phylogenetic groupings based on ribosomal RNA and protein markers (Baldauf et al., 2000; Brochier et al., 2002; Embley et al., 1992; Van de Peer et al., 2000). For instance, metazoa are the sister group of fungi in group E, and gamma-proteobacteria emerge within a well-supported clade. Within group B1, we recovered both the monophyly of Thermus + Deinococcus and that of Cytophaga + Bacteroides. Other regions of the tree, however, are more puzzling. Some species thought to be closely related are scattered among several groups (e.g., the alpha-proteobacteria in groups B1, B2, P, and III). In other cases, unusual relationships were recovered within a group (e.g., spirochetes forming a monophyletic group with the choanoflagellate M. ovata and plants in group X). This may have resulted from ancient duplication events followed by differential gene losses. However, the number of duplications and losses needed to explain the complete phylogeny would probably be very large, so that, at least in a number of cases, independent HGT events seem a more parsimonious explanation. One of the clearest examples of HGT concerns the choanoflagellate M. ovata, which is expected to branch in clade E as the sister of metazoa (Lang et al., 2002) but instead emerges as a relative of spirochetes in clade X (Fig. 1). HGT may also concern species with several pfk copies (see the numbers to the right of the species names in Fig. 1). For instance, the low GC Gram-positive bacteria Desulfitobacterium hafniense and Clostridium perfringens each harbour three gene copies, branching in groups B1, B2, and III. Some eukaryotes also have multiple pfk gene copies.
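The distinction drawn here can be expressed as a simple tally. Copies that fall in different groups of the tree are candidates for HGT (or for ancient duplication followed by differential loss), whereas paralogues that cluster within one group point to more recent duplication. The sketch below is illustrative only. The (species, group) assignments are taken from the examples given in the text rather than from the full data set, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def species_in_several_groups(assignments: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each species to the sorted groups holding its pfk copies,
    keeping only species whose copies fall in more than one group."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for species, group in assignments:
        groups[species].add(group)
    return {sp: sorted(g) for sp, g in groups.items() if len(g) > 1}


# One (species, group) pair per pfk copy, following the examples in the text.
copies = [
    ("Desulfitobacterium hafniense", "B1"),
    ("Desulfitobacterium hafniense", "B2"),
    ("Desulfitobacterium hafniense", "III"),
    ("Clostridium perfringens", "B1"),
    ("Clostridium perfringens", "B2"),
    ("Clostridium perfringens", "III"),
    ("Entamoeba histolytica", "X"),
    ("Entamoeba histolytica", "LONG"),
    ("Dictyostelium discoideum", "E"),  # two paralogues, both within group E
    ("Dictyostelium discoideum", "E"),
    ("Monosiga ovata", "X"),  # a single copy
]
spread = species_in_several_groups(copies)
print(spread)
```

Dictyostelium is not reported even though it has two copies, because both fall in group E. Such a tally flags candidates for closer inspection; it does not by itself demonstrate transfer.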
In some eukaryotes, the multiple copies arose from duplication events, as in Dictyostelium, fungi, and metazoa, whose sequences form two paralogous clusters within group E. In other cases, one of the copies was probably acquired by HGT (e.g., E. histolytica, present in groups X and LONG). More complex scenarios involving both HGT and gene duplication are also found. For instance, in group LONG, plants, apicomplexa (Plasmodium and Cryptosporidium), and chlamydiales have at least two copies. One possible interpretation is to invoke an ancient duplication in the ancestor of plants and apicomplexa (or two independent duplications in these two lineages), followed by the acquisition of both paralogous copies by the chlamydiales. Finally, species for which a single copy has been identified are also involved in HGT, such as T. tenax and Mastigamoeba balamuthi (Müller et al., 2001; Siebers et al., 1998).

Our phylogenetic tree suggests several HGT events. However, some of them need to be tested because certain regions of the tree are poorly resolved. We therefore analysed alternative tree topologies that minimise the number of HGT events. These topologies concerned nodes supported by bootstrap values below 90%, since we assumed that nodes with higher bootstrap values were correctly inferred. In particular, we analysed three alternative topologies affecting the groups that show a mixture of eukaryotic and prokaryotic sequences, namely groups E, LONG, and X. In group E, we constrained Chloroflexus aurantiacus to the base of the group; in group LONG, we constrained the eukaryotic sequences to be monophyletic; and in group X, we constrained the bacterial sequences to be monophyletic. Shimodaira's approximately unbiased test showed that these constrained topologies minimising HGT were significantly worse than the preferred tree. HGT events therefore appear to be a likely explanation for the complex phylogeny of PFK.

3.3. A complex evolution of PFK function

This complicated PFK phylogeny, largely blurred by HGT events, gene duplications, and gene losses, provides the framework for understanding the relationships between ATP-PFK and PFP, as well as phospho-donor transitions in PFK. Sensu stricto ATP-PFKs, classically defined on the basis of their sequence conservation, notably the two glycines in the active site, are monophyletic: they all fall within groups B1 and E. Similarly, four other monophyletic groups in the tree (B2, P, SHORT, and X) show a homogeneous amino acid composition at the active site. Groups P, B2, and SHORT include only Asp104 + Lys124 PFPs, whereas group X consists exclusively of Gly104 + Lys124 PFKs using ATP. Groups III and LONG, however, contain a mixture of sequences carrying Gly104 + Lys124 and Asp104 + Lys124, meaning that enzymes using ATP and enzymes using PPi are intermixed within these two groups. Therefore, Gly104 + Lys124 PFKs (the species marked **G*K** in Fig. 1) and PFKs using PPi probably appeared several times in evolution. Consequently, neither these amino acid combinations nor the use of ATP or PPi as phospho-donor defines a reliable phylogenetic signature. It is also possible that the transition from one form to the other occurred several times. Nevertheless, the amino acids at these two sites display very little polymorphism, and their evolutionary conservation, covariation, and involvement in phospho-donor recognition have been discussed in several works (Chi and Kemp, 2000; Claustre et al., 2002; Lopez et al., 2002; Moore et al., 2002). In most catalytic PFK sequences, position 104 is either Gly or Asp, while position 124 is either Gly or Lys. Exceptions were rare (see Fig. 1).
In our data set, the three sequences of the regulatory alpha subunits of plants (Van Praag, 1997) and the two closely related sequences from apicomplexa carried other amino acids at these positions, consistent with the fact that alpha subunits do not bind phospho-donors. The other exceptions were the sequences from Chlamydophila psittaci, C. pneumoniae, and Chlamydia muridarum (Asn at position 104), from A. aeolicus (Cys at position 124), and from T. maritima (Ala at position 124).

4. Discussion

Despite being a key enzyme in the regulation of glycolysis, PFK has a very complex evolutionary history. Its presence in several bacterial species with completely sequenced genomes, together with its absence from closely related species (e.g., among the alpha-proteobacteria), strongly suggests that PFK can be lost. However, loss of PFK does not imply loss of glycolysis (for instance, in the species harbouring a KDPG aldolase, a central enzyme of the Entner-Doudoroff pathway). Since PFK can be lost, it is not surprising to find a phylogenetic tree with odd relationships. Such independent losses make paralogues more difficult to identify and could allow subsequent HGT of PFK into species secondarily devoid of it. Moreover, flexibility in the use of PFK also resulted from independent acquisitions of new PFKs by HGT. Our tree illustrates that a key enzyme can be obtained from another species. Indeed, HGT led to a complex phylogeny with groups at odds with classical relationships (Müller et al., 2001). This could occur if the native PFK is replaced by a new one, or if the original copy is retained together with the new copy, leading to species harbouring distantly related PFKs (e.g., Amycolatopsis methanolica).

Fig. 1. Unrooted NJ tree for 152 PFK sequences and 153 amino acid positions based on distances calculated with a Γ-law correction. Eukaryotic species are in bold, while prokaryotic species are in italic. Numbers at nodes are bootstrap values (only values >50% are shown). Monophyletic groups are named according to Müller et al. (2001) and this work. Solid circles indicate the most parsimonious distribution of G104 to D104 mutations on the tree. Numbers after species names indicate species with multiple PFK copies in distantly related groups. The amino acid combinations at positions 104 and 124 are given after each species name, notably **G*G** for the typical ATP-PFK, **D*K** for the typical PFP, and **G*K** for putative atypical PFK using ATP. If the sequence has been biochemically characterised, its phospho-donor is mentioned. Confirmed and putative regulatory subunits are indicated by reg and reg?, respectively. The triangle corresponds to fungal and metazoan regulatory sequences. The scale bar corresponds to the number of substitutions per site.

In many cases, the retention of two distantly related PFKs in a single species could be explained in terms of adaptability, each copy allowing the use of one phospho-donor or the other and potentially enhancing the fitness of the species. Such species have a PFK using ATP and a PFK using PPi, one of which may have been obtained by HGT. Species with both enzymes would be, or would have been, able to initiate glycolysis with either ATP or PPi, which may be adaptive depending on the relative concentrations of these metabolites in the cell and its environment. For example, parasites may benefit from a PFK using PPi to save their limited stock of ATP (Moore et al., 2002). Moreover, additional PFK copies may be recruited into alternative metabolic pathways. For instance, ATP-PFK participates in the ribulose monophosphate (RuMP) cycle in Amycolatopsis (Alves et al., 2001).
As a single-point mutation can change the phospho-donor (Chi and Kemp, 2000), the reverse mutation could presumably restore it, allowing PFK to switch from PPi to ATP and vice versa. Our phylogenetic tree suggests that such changes of phospho-donor have occurred in several species. One example is found in group III, which contains a monophyletic group of Gram-positive bacteria with a mixture of sequences carrying either Gly104 or Asp104. Moreover, the direction of these changes can be inferred. It is most parsimonious to postulate that Gly104 + Lys124 PFK using ATP was ancestral and that PFK using PPi evolved more recently, on seven independent occasions (i.e., once in T. tenax, once in group LONG before the emergence of Borrelia burgdorferi, once in the ancestor of group SHORT, once in the ancestor of groups B2 and P, and three times in group III, in the ancestors of Thermobifida fusca, Streptomyces coelicolor, and Amycolatopsis mediterranei). This suggests that at least seven lineages have evolved from an irreversible towards a reversible glycolysis, whereas the opposite transition does not appear to have occurred (Fig. 1). This result is, however, very sensitive to our still limited taxonomic sample and needs additional confirmation. We show here that the presence of multiple PFK copies with different active sites is common in nature. This is the case for all plants, some apicomplexa (Cryptosporidium parvum, Plasmodium falciparum), some amoebae (E. histolytica), some high GC Gram-positive bacteria (S. coelicolor), some alpha-proteobacteria (Magnetococcus sp., Magnetospirillum magnetotacticum), some low GC Gram-positive bacteria (D. hafniense), some spirochetes (B. burgdorferi, Treponema pallidum), and green non-sulfur bacteria (C. aurantiacus). In these species, the multiple PFK copies may therefore be involved in different metabolic functions.
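The parsimony argument about the direction of phospho-donor changes can be illustrated with Fitch's small-parsimony algorithm. The study does not state how the most parsimonious distribution of G104 to D104 mutations was obtained, so Fitch's method is used here only as a standard stand-in. The six-taxon tree, the leaf names, and the residue assignments are hypothetical. The function returns the candidate root states and the minimum number of changes at position 104.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return the Fitch state set at the root of `tree` and the minimum
    number of state changes needed to explain the leaf `states`."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is needed on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree; leaves carry the residue found at position 104.
toy_tree = ((("t1", "t2"), "t3"), (("t4", "t5"), "t6"))
residue_104 = {"t1": "G", "t2": "D", "t3": "G", "t4": "G", "t5": "G", "t6": "D"}
root_states, changes = fitch(toy_tree, residue_104)
print(root_states, changes)
```

On this toy tree, Gly104 is the only most parsimonious root state, and Asp104 arises twice independently (once in t2 and once in t6). This is the same pattern of repeated Gly-to-Asp changes described above. On an unrooted tree such as Fig. 1, the minimum number of changes does not depend on where the root is placed, but the inferred direction of change does.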
The complex phylogeny of PFK illustrates that tinkering in evolution can affect fundamental molecular processes. Our study emphasises the potential adaptability of this key enzyme, which can be lost, gained, and modified by single-point mutations. Finally, our results complicate the debate on the nature of the ancestral phospho-donor used by PFK, which divides proponents of the emergence of metabolism in an organic context from those favouring an inorganic one (see Siebers et al., 1998; Chi and Kemp, 2000, for two opposing opinions). We suggest that in several cases the use of PPi as phospho-donor may be derived. However, in addition to possible convergent adaptive mutations, HGT events, including transfers of mutated sequences, further complicate the distribution of these enzymes. Hence, it is not possible to conclude from this tree which phospho-donor, ATP or PPi, was ancestrally used by PFK.

Acknowledgements

We thank Simonetta Gribaldo and Miklós Müller for critical reading of the manuscript, and Céline Brochier for advice on bacterial phylogeny. We thank Peter Holland and Elizabeth Snell for the M. ovata PFK sequence. Genome and EST sequence data of E. histolytica, Oryza sativa, Zea mays, C. parvum, and P. falciparum were obtained from databases deposited at GenBank (www.ncbi.nlm.nih.gov/dbGSS/index.html and www.ncbi.nlm.nih.gov/dbEST/index.html). Neurospora crassa EST data were obtained from the Center for Genome Research (www-genome.wi.mit.edu/annotation/fungi/neurospora). Bacteroides fragilis, Clostridium difficile, Clostridium botulinum, Staphylococcus aureus, Streptococcus equi, Yersinia enterocolitica, and Corynebacterium diphtheriae sequence data were obtained from the Sanger Institute (www.sanger.ac.uk/DataSearch/). Treponema denticola and Chlamydophila psittaci sequence data were obtained from the Institute for Genomic Research (www.tigr.org/tdb/mdb/mdbcomplete.html). Cytophaga hutchinsonii, D. hafniense, C. aurantiacus, M. magnetotacticum, Nostoc punctiforme, and T. fusca sequence data were obtained from the DOE Joint Genome Institute (www.science.doe.gov/ober/EPR/mig_cont.html).

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Alves, A.M., Meijer, W.G., Vrijbloed, J.W., Dijkhuizen, L., 1996. Characterization and phylogeny of the pfp gene of Amycolatopsis methanolica encoding PPi-dependent phosphofructokinase. J. Bacteriol. 178, 149–155.
Alves, A.M., Euverink, G.J., Santos, H., Dijkhuizen, L., 2001. Different physiological roles of ATP- and PP(i)-dependent phosphofructokinase isoenzymes in the methylotrophic actinomycete Amycolatopsis methanolica. J. Bacteriol. 183, 7231–7240.
Baldauf, S.L., Roger, A.J., Wenk-Siefert, I., Doolittle, W.F., 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290, 972–977.
Brochier, C., Bapteste, E., Moreira, D., Philippe, H., 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18, 1–5.
Chi, A., Kemp, R.G., 2000. The primordial high energy compound: ATP or inorganic pyrophosphate? J. Biol. Chem. 275, 35677–35679.
Claustre, S., Denier, C., Lakhdar-Ghazal, F., Lougare, A., Lopez, C., Chevalier, N., Michels, P.A., Perie, J., Willson, M., 2002. Exploring the active site of Trypanosoma brucei phosphofructokinase by inhibition studies: specific irreversible inhibition. Biochemistry 41, 10183–10193.
Conway, T., 1992. The Entner-Doudoroff pathway: history, physiology and molecular biology. FEMS Microbiol. Rev. 9, 1–27.
Embley, T.M., Thomas, R.H., Williams, R.A.D., 1992. Reduced thermophilic bias in the 16S rDNA sequence from Thermus ruber provides further support for a relationship between Thermus and Deinococcus. Syst. Appl. Microbiol. 16, 25–29.
Felsenstein, J., 1999. PHYLIP: Phylogeny Inference Package. University of Washington, Seattle, WA.
Heinisch, J., Ritzel, R.G., von Borstel, R.C., Aguilera, A., Rodicio, R., Zimmermann, F.K., 1989. The phosphofructokinase genes of yeast evolved from two duplication events. Gene 78, 309–321.
Hinds, R.M., Xu, J., Walters, D.E., Kemp, R.G., 1998. The active site of pyrophosphate-dependent phosphofructo-1-kinase based on site-directed mutagenesis and molecular modeling. Arch. Biochem. Biophys. 349, 47–52.
Kemp, R.G., Gunasekera, D., 2002. Evolution of the allosteric ligand sites of mammalian phosphofructo-1-kinase. Biochemistry 41, 9426–9430.
Lang, B.F., O'Kelly, C., Nerad, T., Gray, M.W., Burger, G., 2002. The closest unicellular relatives of animals. Curr. Biol. 12, 1773–1778.
Lopez, C., Chevalier, N., Hannaert, V., Rigden, D.J., Michels, P.A., Ramirez, J.L., 2002. Leishmania donovani phosphofructokinase. Gene characterization, biochemical properties and structure-modeling studies. Eur. J. Biochem. 269, 3978–3989.
Moore, S.A., Ronimus, R.S., Roberson, R.S., Morgan, H.W., 2002. The structure of a pyrophosphate-dependent phosphofructokinase from the Lyme disease spirochete Borrelia burgdorferi. Structure (Camb.) 10, 659–671.
Müller, M., Lee, J.A., Gordon, P., Gaasterland, T., Sensen, C.W., 2001. Presence of prokaryotic and eukaryotic species in all subgroups of the PP(i)-dependent group II phosphofructokinase protein family. J. Bacteriol. 183, 6714–6716.
Philippe, H., 1993. MUST, a computer package of Management Utilities for Sequences and Trees. Nucleic Acids Res. 21, 5264–5272.
Poorman, R.A., Randolph, A., Kemp, R.G., Heinrikson, R.L., 1984. Evolution of phosphofructokinase-gene duplication and creation of new effector sites. Nature 309, 467–469.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Schmidt, H., Strimmer, K., Vingron, M., von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.
Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508.
Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247.
Siebers, B., Klenk, H.P., Hensel, R., 1998. PPi-dependent phosphofructokinase from Thermoproteus tenax, an archaeal descendant of an ancient line in phosphofructokinase evolution. J. Bacteriol. 180, 2137–2143.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Van de Peer, Y., Baldauf, S.L., Doolittle, W.F., Meyer, A., 2000. An updated and comprehensive rRNA phylogeny of (Crown) eukaryotes based on rate-calibrated evolutionary distances. J. Mol. Evol. 51, 565–576.
Van Praag, E., 1997. Use of 3-D computer modelling and kinetic studies to analyse grapefruit pyrophosphate-dependent phosphofructokinase. Int. J. Biol. Macromol. 21, 307–317.
Verhees, C.H., Tuininga, J.E., Kengen, S.W., Stams, A.J., van der Oost, J., de Vos, W.M., 2001. ADP-dependent phosphofructokinases in mesophilic and thermophilic methanogenic archaea. J. Bacteriol. 183, 7145–7153.