Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Evolution of Elongation Factor G and the Origins of Mitochondrial and Chloroplast Forms Gemma C. Atkinson* ,1 and Sandra L. Baldauf1 1 Department of Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden Present address: University of Tartu, Centre of Biomedical Technology, Institute of Technology, Tartu, Estonia *Corresponding author: E-mail: [email protected]. Associate editor: Martin Embley Abstract Key words: EF-G, elongation factor G, organelle, xenology, paralogy, ribosome, translation. Introduction Elongation factor G (EF-G) is an ancient translational GTPase (trGTPase) and the bacterial homolog of eukaryotic eEF2 and archaeal aEF2. EF-G/EF2 is a universal protein, one of at least four trGTPases that were present in the last universal common ancestor along with EF1, IF2, and possibly, SelB (Leipe et al. 2002; Margus et al. 2007). The EF2 family altogether consists of at least nine protein subfamilies of which five are bacterial (EF-G, LepA, RF3, Tet, TypA, and EFG-L), one is archaeal (aEF2), and three are eukaryotic (eEF2, Ria1p, and Snu114p) (Leipe et al. 2002; Atkinson and Baldauf 2009). In all three domains of life, EF2 functions in the elongation stage of protein synthesis, promoting the translocation of peptidyl-transfer RNA (tRNA) from the A to the P site of the ribosome (Rodnina et al. 1997). In addition, bacterial EF-G has recently been shown to be an essential component of the ribosome recycling complex, which causes the two ribosomal subunits to dissociate. This final stage of protein synthesis requires EF-G, the ribosome recycling factor (RRF), and initiation factor 3 (IF3) (Hirokawa et al. 2005; Zavialov et al. 2005). EF-G comprises five domains, the G (GTPase) domain and four additional domains referred to by number (II– V). The G domain is responsible for GTP binding and hydrolysis and is common to all GTPases, whereas domain II is common to all trGTPases, and domains III–V are EF2 family specific. Together, the latter three domains are proposed to mimic the structure of aminoacyl-tRNA (Nyborg et al. 1996) and to bind to the same region of the ribosome, the A site. EF-G domains III and IV also interact directly with RRF during ribosome recycling (Ito et al. 2002; Gao et al. 2007). EF-G promotes an interdomain rotation of RRF, which in turn forces the two ribosomal subunits apart (Gao et al. 2007). EF-G is encoded by the gene fusA, which tends to occur in the highly conserved str operon. The operon classically consists of four protein-coding genes in the order 5#-rpsLrpsG-fusA-tufA-3#, an organization found widely in bacteria and also in some archaea (Lathe et al. 2000). In addition, str is often followed immediately downstream by all or part of the ribosomal S10 operon. Together, these two operons are hypothesized to have formed an ancient ‘‘str über-operon’’ (5#-rpsL-rpsG-fusA-tufA-rpsJ-splC-3#) (Lathe et al. 2000). © The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Mol. Biol. Evol. 28(3):1281–1292. 2011 doi:10.1093/molbev/msq316 Advance Access publication November 22, 2010 1281 Research article Protein synthesis elongation factor G (EF-G) is an essential protein with central roles in both the elongation and ribosome recycling phases of protein synthesis. Although EF-G evolution is predicted to be conservative, recent reports suggest otherwise. We have characterized EF-G in terms of its molecular phylogeny, genomic context, and patterns of amino acid substitution. We find that most bacteria carry a single ‘‘canonical’’ EF-G, which is phylogenetically conservative and encoded in an str operon. However, we also find a number of EF-G paralogs. These include a pair of EF-Gs that are mostly found together and in an eclectic subset of bacteria, specifically d-proteobacteria, spirochaetes, and planctomycetes (the ‘‘spd’’ bacteria). These spdEFGs have also given rise to the mitochondrial factors mtEFG1 and mtEFG2, which probably arrived in eukaryotes before the eukaryotic last common ancestor. Meanwhile, chloroplasts apparently use an a-proteobacterial–derived EF-G rather than the expected cyanobacterial form. The long-term comaintenance of the spd/ mtEFGs may be related to their subfunctionalization for translocation and ribosome recycling. Consistent with this, patterns of sequence conservation and site-specific evolutionary rate shifts suggest that the faster evolving spd/mtEFG2 has lost translocation function, but surprisingly, the protein also shows little conservation of sites related to recycling activity. On the other hand, spd/mtEFG1, although more slowly evolving, shows signs of substantial remodeling. This is particularly extensive in the GTPase domain, including a highly conserved three amino acid insertion in switch I. We suggest that subfunctionalization of the spd/mtEFGs is not a simple case of specialization for subsets of original activities. Rather, the duplication allows the release of one paralog from the selective constraints imposed by dual functionality, thus allowing it to become more highly specialized. Thus, the potential for fine tuning afforded by subfunctionalization may explain the maintenance of EF-G paralogs. MBE Atkinson and Baldauf · doi:10.1093/molbev/msq316 Eukaryotes conduct cytoplasmic protein synthesis with eEF2, but most eukaryotic nuclear genomes also encode organelle-targeted EF-Gs that function in mitochondrial and/or plastid protein synthesis (mtEFG and cpEFG, respectively). Organelle targeting of these proteins has been demonstrated by their purification from mitochondrial and chloroplast cellular fractions, respectively (Grandi and Kuntzel 1970; Ciferri and Tiboni 1973), and by whole organelle proteome analyses of both mitochondria and chloroplasts (Sickmann et al. 2003; Heazlewood et al. 2007; Smith et al. 2007). Nonetheless, all organellar EFGs are nuclear encoded. This is presumably the result of endosymbiotic gene transfer early in eukaryotic evolution, as has occurred more recently for its 3# str neighbor tufA in both plastids (Baldauf and Palmer 1990; Martin 2003; Timmis et al. 2004) and mitochondria (Lang et al. 1997). Due to their antiquity, universality, and high levels of sequence conservation, various members of the EF-G protein family have been used to address important evolutionary questions. These include whether or not archaea are monophyletic (Cammarano et al. 1999) and the position of the root of the universal tree (Iwabe et al. 1989; Baldauf et al. 1996). However, the ability of molecular phylogeny to recover taxonomic relationships among bacteria and archaea has come into question due to the discovery of extensive horizontal gene transfer (HGT) among them (Doolittle 1999). On the other hand, it has been argued that genes encoding components of complex networks, such as transcription and translation, should be resistant to functional HGT (Jain et al. 1999). This is supported by an observed low number of detected transfers of essential genes in highly connected interaction networks (Wellner et al. 2007). By this reasoning, the genes encoding EF-G, a core information processing protein and therefore at the heart of a complex network, are predicted to be refractory to HGT. Nonetheless, anomalies in EF-G phylogeny have been noted and interpreted as HGT, both between band c-proteobacteria (Brochier et al. 2002) and between spirochetes and eukaryotes (Doolittle 1999; Nosenko et al. 2006). We have examined EF-G family evolution using molecular phylogeny, consensus sequence comparisons, substitution rate divergence analysis, and genomic context analysis. Our results confirm that most bacteria carry a single slowly evolving EF-G encoded in a canonical str operon. However, operon-free EF-G duplicates are also found, and these are scattered across the tree. All but two of these duplicates have very limited taxonomic distribution, and these two also have very similar taxonomic distributions, including most mitochondriate eukaryotes. The frequent cooccurrence of these two EF-Gs suggests that they are codependent, most likely due to subfunctionalization, as shown experimentally for Borrelia and human EF-Gs (Tsuboi et al. 2009; Suematsu et al. 2010). We investigate the evolution of these two proteins in light of the known functional roles of EF-G. 1282 Methods Data Set Assembly EF-G sequences were retrieved from NCBI using BlastP (Altschul et al. 1997) with an E-value cutoff of e40. Bacterial sequences were identified by searching all completely sequenced genomes in the NCBI database. Due to the large number of these genome sequences, target genomes were limited to one representative for each genus in the order they are listed on the NCBI genomes table. Preliminary phylogenetic analyses identified two paralogous EF-G clades in spirochetes, planctomycetes, and d-proteobacteria (referred to here as ‘‘spd’’ bacteria). Therefore, additional searches were conducted against all spd genomes in the NCBI genome and nr database. Bacterial small subunit ribosomal RNA (SSU rDNA) sequences were downloaded from the ribosomal database project (Cole et al. 2009). Eukaryotic sequences were identified through searches against a taxonomically wide sampling of complete genome sequences (supplementary table S1, Supplementary Material online). Additional eukaryotic EF-G sequences were retrieved from BlastP searches against genomes from the following species: Eimeria tenella (GeneDB, Sanger Institute—http://www.genedb.org/); Aureococcus anophagefferens, Chlamydomonas reinhardtii, Micromonas pusilla, Naegleria gruberi, Phaeodactylum tricornutum, Thalassiosira pseudonana, Volvox carterii, and Emiliania huxleyi (Department of Energy Joint Genome Institute—http:// www.jgi.doe.gov/); Cyanidioschyzon merolae (University of Tokyo—http://merolae.biol.s.u-tokyo.ac.jp/); Toxoplasma gondii (ToxoDB—http://ToxoDB.org/); and Tetrahymena thermophila (NCBI nr database via trGTPbase: www.trGTPbase.org; Atkinson GC and Baldauf SL, unpublished data). Phaeodactylum tricornutum mtEFG1 was assembled from two overlapping gene models: e_gw1.5.154.1_Phatr2:11200 and e_gw1.5.161.1_Phatr2: 11121 (http://www.jgi.doe.gov/). The 5# ends of predicted mtEFG2 sequences from three species of Leishmania were completed by BlastN searches of the genome sequences to retrieve the full-length gene sequences, which were then translated with Transeq (http://www.ebi.ac.uk/Tools/). The EF-G identity of all sequences was confirmed by scanning against a set of EF2 family profile hidden markov models (HMMs) generated from subsets of a curated trGTPase superfamily alignment (trGTPbase; Atkinson GC and Baldauf SL, unpublished). The resulting EF-G data set is referred to here as EFG_dset1. EFG_dset1 was aligned using MAFFT v6.234b with strategy L-INS-I (Katoh et al. 2005). Consensus sequences were then computed from the full alignment using the Python script ConsensusFinder (Atkinson GC and Baldauf SL, unpublished data), which indicates sites with conservation of a single amino acid in uppercase characters and conservation of a substitution group in lowercase. A 70% consensus sequence was constructed and used to identify well-conserved regions These unambiguously aligned regions were confirmed and extracted using the Gblocks server v.0.91b (http://molevol. cmima.csic.es/castresana/Gblocks_server.html; Talavera and Evolution of Chloroplast and Mitochondrial EF-G · doi:10.1093/molbev/msq316 Table 1. The Total Number of Gains, Losses, and Changes in Consensus versus Canonical EF-G for Various Classes of EF-Gs. Change in Numbers of Consensus Sites vs. Canonical EF-G Gain EF-G Domain spd/mtEFGs spdEFG1 mtEFG1 spd1mtEFG1 spdEFG2 mtEFG2 spd1mtEFG2 Change Loss G II III IV V G II III IV V G II III IV V 36 18 8 16 10 9 6 4 0 1 31 14 10 16 8 14 5 4 0 2 22 11 5 11 4 7 3 4 0 1 11 1 5 5 1 1 0 1 0 0 11 4 2 0 1 4 0 1 0 0 4 1 2 0 0 1 0 0 0 0 24 23 13 50 57 37 16 19 14 22 26 20 18 21 17 29 33 24 7 10 4 27 29 26 8 14 7 19 17 11 NOTE.—Consensus sequences were calculated at the 80% level, after excluding highly divergent sequences and are summarized by EF-G structural domains (Ævarsson et al. 1994). Castresana 2007), run with the more lenient ‘‘smaller final blocks’’ option enabled. A trimmed-down version of the large EFG_dset1, EFG_dset2, was created to enable the more computationally demanding Bayesian phylogenetic analysis. Taxa to be trimmed were identified from a maximum likelihood (ML) tree of EFG_dset1 constructed with RaxML (Stamatakis 2006). Representatives were selected to sample within strongly supported clades (.70% ML bootstrap support) and across all bacterial phyla. In order to minimize long-branch attraction affects (Bergsten 2005), only those sequences from the spd/mtEFG1 and spd/mtEFG2 yielding the shortest branches were kept. The dimensions of the final data sets are as follows: EFG_dset1: 343 sequences, 400 positions and EFG_dset2: 93 sequences, 400 positions. Phylogenetic Analyses Phylogenetic analyses were conducted using Bayesian Inference (BI) and ML. BI analysis of EFG_dset2 used MrBayes 3.2 (Ronquist and Huelsenbeck 2003). ML analyses EFG_dset1 and EFG_dset2 used RAxML 7.0.4 (Stamatakis 2006). MrBayes was run with a mixture of all available amino acid substitution models, which converged on the Whelan and Goldman (WAG) model with probability 1.0, and a C distribution with a number of invariant sites estimated by the program. Two simultaneous runs of eight chains were carried out from a random starting tree for 1 million generations, with results saved every ten generations. The standard deviation of split frequencies at termination of the analysis was 0.047. At this point, the two runs had converged on very similar tree topologies. Following summation of parameters over the entire run to confirm burn in, results from the first 100,000 generations per run were discarded and posterior probabilities (biPP) estimated from the remaining 180,000 trees. RAxML analyses were conducted using the PROTMIXWAG model with 25 distinct per site rate categories (Stamatakis 2006). This implements the WAG substitution model with the CAT approximation for rate heterogeneity MBE during the topology search, evaluating the final tree under a C distribution of rates. Optimal tree searches consisted of 100 replicates, each starting from a maximum parsimony starting tree. Clade robustness was assessed with 100 bootstrap replicates, with the bootstrap percentage (mlBP) shown for each branch. Relative evolutionary rates for different EF-G types were computed with the relative rate test as implemented in RRTree (Robinson-Rechavi and Huchon 2000). Site-Specific Sequence Conservation and Functional Diversification Analyses Two methods were used for analyzing patterns of sequence evolution: 1) site-specific sequence conservation and 2) ML identification of sites that have experienced shifts in evolutionary rate as implemented in DIVERGE 2.0 (Gu 2001, 2006; Gu and Vander Velden 2002). To analyze site-specific conservation across subfamilies, the total number of gains, losses, and changes in consensus versus canonical EF-G were calculated for each major EF-G type using 80% strict consensus sequences calculated using the Python script ConsensusFinder (Atkinson GC and Baldauf SL, unpublished data, table 1). The data were divided into five classes based on the EFG_dset1 phylogeny (supplementary fig. S3, Supplementary Material online) excluding highly divergent sequences: ‘‘canonical EF-G,’’ ‘‘spdEFG1,’’ ‘‘mtEFG1,’’ ‘‘spdEFG2,’’ and ‘‘mtEFG2.’’ Gains are sites that meet the 80% conservation threshold within a class but not in canonical EF-G. Changes are sites that are conserved both in the class and canonical EF-G, but the identity of the conserved residue has changed. Losses are sites that are unconserved within a class but are conserved in canonical EF-G. DIVERGE 2.0 (Gu 2001, 2006; Gu and Vander Velden 2002) was used to identify sites that have undergone ‘‘Type I’’ and ‘‘Type II’’ divergence in substitution rates between canonical EF-G, spd/mtEFG1, and spd/mtEFG2 subgroups of EF-G. Type I divergence refers to changes in the amino acid substitution rates of homologous sites among clades of sequences. In Type II divergent sites, different amino acids with radically different properties are fixed in duplicates. EFG_dset2 was used as input to DIVERGE, cut down to contain only canonical EF-G, spd/mtEFG1, and spd/ mtEFG2. A neighbor-joining tree was generated by the software using the Kimura distance correction measure (supplementary fig. S5, Supplementary Material online), and the three monophyletic clusters of spd/mtEFG1, spd/mtEFG2, and canonical EF-G were selected for pairwise comparisons. All possible paired cluster comparisons were used to calculate the coefficient of Type I (hI) and Type II (hII) divergence and the posterior probability (PP) that a site has undergone a shift in substitution rates (excluding columns containing gaps) (table 2). Genomic Context The genomic context of EF-G–encoding genes (fus) was determined by retrieving the identity of up- and downstream flanking genes using the Python script ContextFinder 1283 MBE Atkinson and Baldauf · doi:10.1093/molbev/msq316 Table 2. Functionally Divergence in Pairwise Comparisons between spdEFG1, spdEFG1, and Canonical EF-G (canEFG). Sites by Domain Where PP>0.90 Pairwise Combinations spdEFG1 vs. spdEFG2 spdEFG1 vs. canEFG spdEFG2 vs. canEFG uI 0.32 0.30 0.26 uI SE 60.03 60.03 60.03 Total sites where PP > 0.90 43 47 38 G 9 14 10 II 7 8 5 III 4 10 3 IV 18 9 18 V 5 6 2 NOTE.—hI is the coefficient of functional divergence, and SE is standard error. PP is the Posterior probability of a site being functionally divergent, with sites .0.90 indicating a strong probability of divergence. The number of these sites in total and within each domain are shown. (Atkinson GC, unpublished data). This script uses NCBI GI numbers to find corresponding entries in the Entrez Gene database (Maglott et al. 2007) and retrieve gene ID numbers, gene name, title, strand, and coordinates for each flanking gene from their Entrez Gene record. The identity of the genes was assigned by searching for keywords in gene name, title, and COG identifiers in the annotations of each record. Where identity could not be automatically assigned using annotations, the Entrez Gene record was checked by eye. Transit Peptides All putative mitochondrial and plastid sequences were analyzed using TargetP, which scans the N terminus for the predicted ability of sequences to form functional chloroplast, mitochondrial, and signal peptides (Emanuelsson et al. 2000). Apicomplexan nonmitochondrial-type sequences were also analyzed with PATS (predict apicoplast-targeted sequences) (Zuegge et al. 2001) to predict possible apicoplast transit peptides. Results and Discussion Molecular Phylogeny of EF-G We have reconstructed the phylogeny of bacterial and organellar EF-G, which together form a monophyletic subgroup within the EF2 family (Leipe et al. 2002; Atkinson GC and Baldauf SL, in preparation). Altogether, 218 bacterial and 52 eukaryotic genomes were sampled including at least one representative from each genus of bacteria with a completely sequenced genome. Eukaryotic sampling includes representatives of five of the six recognized eukaryotic supergroups (Keeling et al. 2005) as little molecular data are available for Rhizaria. We find that EF-G is universal in bacteria and mitochondria-carrying eukaryotes. In addition, it has given rise to at least 12 paralogs and xenologs derived by gene duplication or HGT. The molecular phylogeny of EF-G shows three main sections or subtrees (fig. 1 and supplementary fig. S3, Supplementary Material online). The largest subtree includes sequences from nearly every examined major group of bacteria. These are also usually the only EF-G in their host cells (supplementary table S1, Supplementary Material online), and they are almost exclusively found in str operons (fig. 1 and supplementary fig. S3, Supplementary Material online). Therefore, these are referred to here as canonical EF-G or simply EF-G. The other two subtrees consist almost entirely of sequences from an eclectic but similar subset of bacteria. These taxa include all 9 examined species of Spi1284 rochetes, 4 of the 5 examined species of Planctomycetes, and 14 of the 21 examined species of d-Proteobacteria. This diverse set of bacteria is herein referred to as the ‘‘spd’’ bacteria and their EF-G sequences as spdEFG1 and spdEFG2 (fig. 1). Each of the spdEFG1 and spdEFG2 subtrees also includes one of the two mitochondrially targeted EF-Gs, mtEFG1 and mtEFG2, respectively (fig. 1 and supplementary fig. S3, Supplementary Material online). Together these account for all the putative or proven mitochondrial EFGs (supplementary table S1, Supplementary Material online). The monophyly of the spd/mtEFG1 clade is strongly supported by both ML and BI phylogenetic methods (1.0 biPP, 100% bootstrap percentages [mlBP]; fig. 1). The clade is also supported by the presence of a small wellconserved insertion with the consensus sequence GVG, which is found exclusively and universally in all spd/ mtEFG1 sequences (positions 47–49 in fig. 2, see below). The grouping of all spdEFG2 sequences and mtEFG2 is less well or consistently supported (1.0 biPP, 65% mlBP; fig. 1), but these sequences also have a very similar taxonomic distribution to spd/mtEFG1 and are therefore referred to here as spd/mtEFG2. Nearly all species with spdEFG1 possess spdEFG2 and vice versa, and these are usually the only EF-Gs encoded in the genome (supplementary table S1, Supplementary Material online). Only two closely related bacteria were found with all three main EF-G types (Anaeromyxobacter sp. and A. halogenans, supplementary table S1 and fig. S3, Supplementary Material online). Canonical EF-G is by far the most widespread of the three EF-G types. It is found in at least some and, in most cases, all representatives of every major bacterial taxon examined except Spirochetes (supplementary table S1 and fig. S3, Supplementary Material online). Within the canonical EF-G subtree, the bacteria are also roughly grouped into recognized higher order taxa (Ciccarelli et al. 2006; Cole et al. 2009), albeit with varying levels of support (supplementary fig. S3, Supplementary Material online). As expected for a single gene or protein tree of bacteria, there is no confident resolution of any deeper branches (Delsuc et al. 2005). Canonical EF-G also has a slower evolutionary rate (0.37 substitutions/site) compared with both spd/mtEFG1 and spd/mtEFG2 (0.42 and 0.58 substitutions/ site, respectively). DIVERGE 2.0 (Gu 2001, 2006; Gu and Vander Velden 2002) was used to calculate the coefficients of Type I functional divergence (hI) among EF-G subtrees (supplementary fig. S5, Supplementary Material online). Type I divergent sites are those with significantly different amino acid substitution rates among subtrees. hI values Evolution of Chloroplast and Mitochondrial EF-G · doi:10.1093/molbev/msq316 MBE FIG. 1 Phylogeny of EF-G proteins and genomic context. The tree shown was derived by BI using data set EFG_dset2 and the program MrBayes. Branch lengths are drawn to scale as indicated by the scale bar. Numbers on the branches correspond to support values following the format: MrBayes posterior probabilities (biPP)/RAxML bootstrap percentages (mlBP) shown for branches with over 0.65 MrBayes biPP support only. Eukaryotic taxon names are followed with major group designations as follows: Me, metazoa; Ee, euglenozoa; Am, amoebozoa; Al, alveolates; St, stramenopiles; Pl, archaeplastida; He, heterolobosae; and Rh, rhodophyta. Major bacerial groups are abbreviated as follows: Plancto: planctomycetes and alpha/beta/gamma/delta/epsilon: proteobacteria of these respective orders. The unique identifying codes for sequences from eukaryotic genome project databases are shown in supplementary fig. S3, Supplementary Material online. Operon structure is indicated by colored blocks to the right of taxon names following the key shown in the inset box. Inset:The classical str operon (Post et al. 1978) is shown with each block representing one gene. Examples of common genomic contexts for the EF-G–encoding fus gene and those specifically discussed in the text are shown, with gene orientation indicated. Starred taxa can be found in the full phylogeny (supplementary fig. S3, Supplementary Material online). Blocks in the same color represent homologous genes and are labeled with gene names. Gray blocks are unconserved within which ‘‘HP’’ stands for hypothetical protein and ‘‘trans’’ stands for transamidase. were significantly greater than 0 and in the same range for all pairwise comparisons (0.26–0.32 with a standard error of 0.03; table 2), indicating significant differences in substitution rates among all the three groups. In contrast to canonical EF-G, the two spd/mtEFG clades include sequences from only a small subset of bacteria (fig. 1). The eclectic taxonomic distribution of the spdEFGs suggests that the encoding genes have been horizontally transferred, 1285 Atkinson and Baldauf · doi:10.1093/molbev/msq316 MBE FIG. 2 Aligned consensus sequences for major EF-G types. Consensus sequences for each subfamily of EF-G subgroups were calculated at the 70% level using the Python program ConsensusFinder (G.C.A. and S.L.B.). A ‘‘-’’ denotes a consensus gap position, and an uppercase letter represents an amino acid conserved in .70% of the sequences. Lowercase letters denote the most common representative of an amino acid substitution group as defined by the BLOSUM62 matrix, where the group as a whole is present at the designated consensus level. Positions present in all sequences but for which neither conservation threshold is met are denoted by dots (‘‘.’’). Stars (*) show predicted RRF interacting sites (Gao et al. 2007), which cluster into five patches (numbers 1–5 below stars). The G1-5 GTP-interacting loops are indicated under the alignment. The colored numbering above the alignment indicates the five structural domains of EF-G. Regions to which specific functions have been assigned are indicated with boxes, and the P loop, switch 1, and switch 2 regions are also labeled. The turquoise box shows tRNAinteracting loops (Gao et al. 2009). ‘‘GVG’’ shows the location of the GVG three amino acid insertion. Residue highlighting follows the conservation pattern of spd/mtEFG1 and 2 relative to EF-G: yellow—differentially conserved, green—specifically conserved in either mt/ spdEFG1, and pale blue—no conservation. Secondary structure is indicated beneath the alignment, with parentheses denoting helices and arrows denoting sheets. Green, loops protruding into the ribosome; pink, site of interaction with SRL loop of ribosome; turquoise, tRNAinteracting loops; and purple, sites that form a network of interactions to stabilize structure. Sites with a high probability of being functionally divergent (PP . 0.90) as determined by DIVERGE 2.0 are boxed with a black line. possibly multiple times. HGT of spdEFG1, between spirochetes and d-proteobacteria, is also supported by the shared presence of an adjacent downstream gene in Treponema denticola and several d-proteobacteria (fig. 1 and supplementary fig. S1, Supplementary Material online). The two spd/mtEFG clades do not cluster together as a monophyletic group in the tree (fig. 1), suggesting that they were independently transferred. However, intervening branches between the two clades are few and lack strong support (fig. 1). Additionally, the fact that spdEFG1 and spdEFG2 cooccur in three very different groups of bacteria—spirochaetes, d-proteobacteria, and planctomycetes—is most parsimoniously explained by cotransfer. The simplest explanation for this would be that the spdEFG-encoding genes first arose by tandem duplication and that they remained adjacent in the genome long enough to be cotransferred several times. However, these genes are not found at adjacent locations in any examined bacterium. Thus, the co-HGT of these genes, if it did occur, would probably have been largely limited to a narrow window of time when the duplicates were proximal in the genome. Origin of Organelle EF-Gs No a-proteobacterium was found to encode an spdEFG, and there is no apparent affinity of mtEFG for 1286 a-proteobacteria in the EF-G tree (fig. 1). This in itself is not too unusual as only ;20% of nuclear-encoded mitochondrial proteins show an a-proteobacterial affinity (Kurland and Andersson 2000; Brindefalk et al. 2007). Instead, the mtEFGs appear to have arisen from spd bacterial EF-Gs. The fact that both mtEFGs are present in all five of the examined lineages of mitochondriate eukaryotes (Opisthokonts, Amoebozoa, Archaeplastida, Chromalveolates, and Discobids) indicates that these EF-Gs probably arrived in eukaryotes before the eukaryotic last common ancestor. In fact, the simplest scenario is that they were present in the progenitor of the mitochondrion, and thus, the dual nature of mtEFG is an inherited rather than an eukaryote-derived feature. The link between mtEFG and mitochondrial protein synthesis is further strengthened by the fact that the only eukaryotes lacking mtEFGs are taxa that lack respiring mitochondria, none of which encode either protein. This includes Giardia intestinalis, Entamoeba histolytica, Trichomonas vaginalis, and the mitochondrial DNA–lacking Cryptosporidium parvum and C. hominis (Embley and Martin 2006). Because these species represent three widely separated eukaryotic supergroups (Excavates, Opisthokonts, and Alveolates), this also represents at least three parallel independent losses of mitochondrial respiration and mtEFGs (Embley and Martin 2006). Evolution of Chloroplast and Mitochondrial EF-G · doi:10.1093/molbev/msq316 With the single exception of the obligate fungal parasite, Cryptococcus neoformans, the only aerobic eukaryotes with only one mtEFG are the plastid/apicoplast-carrying eukaryotes: Archaeplastida, stramenopile algae, and Apicomplexa (supplementary table S1, Supplementary Material online). Instead of a second mtEFG, these species carry a plastid/apicoplast-targeted EF-G, termed cpEFG or apiEFG (fig. 1 and supplementary fig. S3, Supplementary Material online). Unlike the mtEFGs, cpEFG is a canonical-type EF-G, whereas the apicoplast-targeted apiEFGs are too divergent to classify (fig. 1 and supplementary fig. S3 and table S2). The latter is a common feature in apicoplast proteins (Blanchard and Hicks 1999). All cpEFGs form a single tight clade in the canonical section of the EF-G tree with full support. However, unusual for cpEFGs, this clade shows a moderate to strong affinity for a-proteobacterial EF-G (76% mlBP, 0.99 biPP; fig. 1), whereas most nuclear-encoded proteins of bacterial ancestry that are targeted to the chloroplast show cyanobacterial ancestry (Reyes-Prieto et al. 2010). This makes it tempting to speculate that cpEFG could have originated with the a-proteobacterial progenitor of mitochondria. This would require either that the mitochondrial progenitor had all three EF-G types, an extremely rare situation among the bacteria examined here (supplementary table S1, Supplementary Material online), or that the original canonical mtEFG was replaced later by an spdEFG pair acquired from another bacterial source or sources. However, both mtEFGs appear to have been present in the last common ancestor of eukaryotes, whereas plastids appear to have arisen later, early in the evolution of Archaeplastida (Archibald 2009). Thus, if cpEFG has been resident in eukaryotes as long as mitochondria, it must have been maintained under strong selective pressure until the advent of eukaryotic photosynthesis at which point it suddenly became dispensable in all nonphotosynthetic lines. Alternatively, in a simpler explanation, the progenitor of the chloroplast could have had an a-proteobacterial EF-G acquired by HGT and xenologous replacement. If the two spd/mtEFGs have complementary functions, as the data suggest (see below), then the fact that plastidcarrying eukaryotes have only spd/mtEFG1 suggests that cpEFG is filling in for the function of spd/mtEFG2. If this is the case, then cpEFG must be dually targeted to the chloroplast and the mitochondrion. Most of the cpEFGs appear to have amino-terminal extensions that can be confidently predicted to encode chloroplast-targeting (transit peptide) sequences (supplementary table S2, Supplementary Material online), and the chloroplast targeting of cpEFG has been demonstrated by plastid proteome analyses in Arabidopsis thaliana (Heazlewood et al. 2007). However, dual targeting of plastid and apicoplast proteins to the mitochondrion is known to occur, including other components of the translation machinery, aminoacyl-tRNA synthetases (Duchene et al. 2005; Pino et al. 2007). Additional EF-G Paralogs and Xenologs In addition to spdEFG, there are a number of other paired EF-Gs in the canonical EF-G subtree. In all these cases, the MBE shorter branched EF-G is str operon encoded, and its phylogeny (supplementary fig. S3, Supplementary Material online) roughly follows the species phylogeny as predicted by small SSU rRNA (supplementary fig. S4, Supplementary Material online). Meanwhile, in all cases, the longer branched EF-G form is not str encoded and is often found elsewhere in the tree, suggesting that these are xenologs, that is, they arose by HGT. However, there are two examples where the two EF-Gs group together, which may indicate in-paralogs or very recent HGT from a close relative (supplementary fig. S3, Supplementary Material online). Alpha-Proteobacterial Subtree A mixed grouping of non-str–encoded EF-Gs from subsets of c-proteobacteria and cyanobacteria is embedded in the a-proteobacterial subtree (gcEFG; fig. 1 and supplementary fig. S3, Supplementary Material online) forming a strongly supported clade with a subset of a-proteobacteria. This suggests at least two horizontal transfers of gcEFG, one at its origin and a second transfer some time later. The c-proteobacteria in the gcEFG group are from at least three different divisions of c-proteobacteria (supplementary figs. S3 and S4, Supplementary Material online), suggesting additional possible interphylum transfers or early HGT followed by multiple independent losses. In addition, the EF-Gs in the a-proteobacterial sister group to gcEFG are non-str encoded and group with Hahella chejuensis EFG2 (supplementary fig. S3, Supplementary Material online), which suggests that the a-proteobacterial sequences may themselves have been derived by HGT. Beta-Proteobacterial Subtree Three unusual groupings are found embedded in the b-proteobacterial subtree (fig. 1 and supplementary fig. S3, Supplementary Material online). The bgEFG1 clade consists of non-str–encoded EF-Gs from two c-proteobacteria. This suggests simple HGT, although these two c-proteobacteria are not sister taxa in their home clade in either the EF-G or rRNA trees (supplementary figs. S3 and S4, Supplementary Material online, respectively), suggesting more than one HGT event. The bEFG2 clade consists of non-str–encoded EF-Gs from four related b-proteobacteria, which means that these could be in-paralogs. However, the branching order among the bEFG2 sequences does not entirely match that of the EF-G or rDNA species trees (supplementary figs. S3 and S4, Supplementary Material online, respectively), suggesting some additional HGT. Finally, the bgEFG2 clade consists of two c-proteobacteria EF-Gs, indicating horizontal transfer, as previously reported (Brochier et al. 2002). Interestingly, these sequences are also str-encoded EF-Gs, which would indicate HGT followed by xenologous replacement as with ribosomal protein L29 (Omelchenko et al. 2003). Other Groups The c-proteobacterium Pseudomonas aeruginosa has both an str-encoded EF-G and a longer branched non-str– encoded EF-G that group together. This looks like in-paralogs, although there is substantial distance between the two 1287 MBE Atkinson and Baldauf · doi:10.1093/molbev/msq316 sequences, so it could be HGT between closely related genomes that have not yet been sequenced (100% BP; fig. 1 and supplementary fig. S3, Supplementary Material online). Likewise, Syntrophomonas wolfei (home clade Synergistes) also carries both an str- and non-str–encoded EFG. These both have very long branches and only group together weakly (25% BP; supplementary fig. S3, Supplementary Material online), although again the non-str–EF-G is more divergent. Finally, two actinobacteria, Streptomyces coelicolor and St. avermitilis, have str-encoded EF-Gs in their home clade (fig. 2) plus non-str–encoded EF-Gs that form a separate strongly-supported clade (strepEFG, 100% BP; supplementary fig. S3, Supplementary Material online) lying close to spdEFG2 (51% BP, supplementary fig. S3, Supplementary Material online). These extremely divergent sequences may be in-paralogs that are misplaced in the tree due to long-branch attraction to some of the extremely long spdEFG branches. No second copy EF-Gs were found in any of the other major bacterial groups represented here, but there is also very limited taxonomic sampling from most of these except Firmicutes. Other Possible HGTs Other than the above listed cases, there is general agreement between the rDNA tree (supplementary fig. S4, Supplementary Material online) and canonical EF-G subtree (supplementary fig. S3, Supplementary Material online). That is, all major groups and some subgroups are reconstructed similarly in both trees. However, many branches lack strong statistical support (BP . 70%; Hillis and Bull 1993), which makes inconsistencies difficult to detect with confidence. Nonetheless, there are a number of nontrivial differences between these trees. For example, Maricaulis maris and Caulobacter crescentus (a-proteobacteria) occupy quite different positions in the EF-G and SSU trees (supplementary figs. S3 and S4, Supplementary Material online, respectively), including a number of strongly supported intervening branches. However, all the solo EF-Gs are str encoded (fig. 1), meaning that HGT would need to have been followed by xenologous replacements. This would lead to strong inconsistencies between the EF-G and EF-Tu phylogenies of the same species, which so far we do not see (Atkinson GC and Baldauf SL, unpublished observations). Operon Dynamics Nearly all canonical EF-Gs are encoded in an str operon (5#rpsL-rpsG-fusA-tufA-3#; fig. 2 and supplementary fig. S3, Supplementary Material online) often including a downstream rpsJ and sometimes even rplC, thus forming a complete str über-operon (Lathe et al. 2000). Meanwhile, most spdEFG-encoding genes (referred to here as fusS1 and fusS2) show no signs of flanking str operon structure. There are two exceptions: The fusS1 of Treponema denticola and the fusS2 of Planctomycetes are both found in nearly canonical str operons missing only the downstream tufA, in the case of Blastopirellula marina, including even the farther downstream rpsJ (fig. 1). This presence of str operons 1288 deeply embedded in operon-free clades suggests three possible scenarios. First, the fusS copies originated from a whole operon duplication that was retained through much of spd evolution and then lost independently in multiple lineages. This would require a minimum of four operon losses for spdEFG1 and two for spdEFG2 (fig. 1). Second, ‘‘de novo’’ operon assembly may have occurred in one or both spdEFG lineages and been maintained through selection. Although this seems unlikely based on pure chance, ‘‘de novo’’ assembly would be simplified by the fact that rpsL and rpsG are already linked in most spd species (data not shown). Third, the sudden appearance of operon structure in both spd lineages may be due to multiple independent HGTs of nearly complete str operons into a close ancestor of the Spirochaetes and a common ancestor of the Planctomycetes. This would have been followed by xenologous replacement whereby the incoming str-encoded fusA was replaced by a resident fusS gene. This should lead to anomalies in the rpsL and rpsG phylogenies, but unfortunately, these genes are too small for reliable phylogenetic analysis. Xenologous replacement has been previously demonstrated for at least seven ribosomal proteins (Brochier et al. 2000; Makarova et al. 2001; Omelchenko et al. 2003). At present, there are not enough data to support one of these scenarios over the others; however, this may be resolved in the future with further genome sequencing of spd-type organisms. Functional Evolution of EF-G Types Examination of the distribution of EF-G types among bacteria shows that the two spdEFGs never occur alone. In most cases, they cooccur and are the only EF-Gs in the cell (supplementary table S1, Supplementary Material online). Four bacteria were found with only spdEFG1 and six with only spdEFG2 (supplementary table S1, Supplementary Material online). However, all these carry a second canonical-type EF-G, except for Leptospira spp. Where the second EF-G, while still clearly an EF-G based on HMM screening, is so divergent that its phylogenetic position could not be resolved (data not shown). Likewise, all mitochondriate eukaryotes have either both mtEFGs or one mtEFG and a cpEFG or apiEFG. The single exception to this is the fungal parasite, Cryptococcus neoformans (supplementary table S1, Supplementary Material online). The long-term and widespread comaintenance of spd/ mtEFG suggests that these are performing complementary essential functions. The fact that either can be substituted for by canonical EF-G suggests that these complementary functions are subsets of canonical EF-G function. Recent evidence indicates that EF-G has distinct roles in both translocation and ribosome recycling. These activities occur at different times in the translation cycle, involve different interacting partners, and reside at least partly in different domains of the protein. Ribosomal recycling has especially been shown by Cryo-EM mapping to mainly involve interaction of EF-G domains III and IV with RRF (Ito et al. 2002; Gao et al. 2007). This is also supported by domain swapping experiments on human mtEFGs (Tsuboi Evolution of Chloroplast and Mitochondrial EF-G · doi:10.1093/molbev/msq316 et al. 2009). In fact, human mtEFG1 and 2 have been shown to be specific for translocation and recycling, respectively (Tsuboi et al. 2009), as have spdEFG1 and 2, respectively, of Borelia burgdorferi (Suematsu et al. 2010). If all spd/mtEFGs have indeed become specialized for these two functions, then we would expect to see different patterns of sequence conservation in the two EF-G types consistent with these two roles. Specifically, we expect to see stronger conservation of ribosome recycling– specific sites in spd/mtEFG2 and stronger conservation of translocation-specific sites in spd/mtEFG1. We therefore analyzed patterns of sequence conservation in the two spd/ mtEFGs relative to canonical EF-G by comparing consensus sequences and site-specific evolutionary rates (Gu 2001, 2006; Gu and Vander Velden 2002) and mapping these onto the structural model of EF-G:RRF (Gao et al. 2007) and the structure of EF-G on the ribosome (Gao et al. 2009). Cryo-EM mapping of the E. coli EFG:RRF complex predicts five sets of contact points or ‘‘patches’’ on EF-G. Two highly specific patches are predicted in domain III (positions 421–424 and 492–492, fig. 2) and three less-specific contacts in domain IV (positions 508–510, 558–561 þ 567, and 590–594, fig. 2), where the interactions between EF-G and RRF are predicted to be largely general surface charge repulsion (Gao et al. 2007). Inspection of a semistrict consensus sequence for spd/mtEFG1 (fig. 2) indicates little conservation at these sites or within the expected error limit of the cryo-EM mapping (±1 residue, Gao N, personal communication). However, surprisingly, there is even less conservation of these sites in spd/mtEFG2 (fig. 2). In domain III, patch 1 (PKTK in canonical EF-G; fig. 2), spd/ mtEFG1 conserves only P421, but there is no conservation here in mtEFG2. In fact, there is a wide range of residues at both the second and fourth position of this patch (.13 alternative residues), including extremely dissimilar amino acids. Although patch 2 (TIxk in canonical EF-G, fig. 2) is moderately if not identically well conserved in all three EFG types, mtEFG2 also has several small indels in this region. Thus, the two domain III–predicted RRF interaction sites are highly variable in spd/mtEFG2. In domain IV, Patch 3 (SGG in canonical EF-G; fig. 2) is highly conserved in spd/mtEFG1 (70% consensus TGG), probably because this patch also has an important role in translocation (see below). In contrast, there is again no conservation here in spd/mtEFG2, plus several indels occur here including a five amino acid deletion in S. cerevisiae mtEFG2 (data not shown). Patch 4 (558–561 and G567; fig. 2) is poorly conserved even in canonical EF-G, except for G567, which is more or less conserved in all three EF-G types. Finally, patch 5 (mAFKi in canonical EF-G; fig. 2) is moderately conserved in spd/mtEFG1 but poorly conserved in spd/mtEFG2 including a wide variety of substituted amino acids of all possible chemical categories. Thus, spd/mtEFG1 shows at least as much and, in most cases, much more retention of predicted RRF interacting sites than spd/mtEFG2 In terms of overall sequence conservation, the different EF-G forms also show quite different patterns (table 1). MBE Compared with canonical EF-G, spd/mtEFG1 has lost a total of 87–88 consensus sites but gained 79–88 new ones (table 1; yellow highlighting in fig. 2). In contrast, spd/mtEFG2 has lost 147–162 consensus sites relative to canonical EF-G but added only seven new ones (table 1; blue highlighting in fig. 2). In addition, spd/mtEFG1 has also ‘‘remodeled’’ 16 of the consensus sites shared with canonical EF-G, that is, these sites are still strongly conserved in spd/mtEFG1 but with a new consensus residue, often with very different chemical characteristics (table 1; green highlighting in fig. 2). Meanwhile, spd/mtEFG2 has only remodeled one canonical EF-G consensus site. Thus, it appears that spd/ mtEFG1 has been substantially remodeled relative to canonical EF-G, possibly becoming more specialized or modifying its preexisting role in translocation. Meanwhile, spd/ mtEFG2 has mostly just deteriorated. Although the latter is consistent with subfunctionalization, the actual pattern of conservation (fig. 2) does not appear to be consistent with retention of RRF interaction (Gao et al. 2009). Sites that DIVERGE 2.0 (Gu 2001, 2006; Gu and Vander Velden 2002) predicts at the 0.90 PP level to have experienced a shift in substitution rates (Type I divergence) between spd/mtEFG1 and spd/mtEFG2 are indicated on figure 2. Similar to the consensuses comparison results, the sites with a shift in substitution rate show no correlation with predicted RRF interaction sites. Additionally, DIVERGE failed to identify any substitutions that both shift physiochemical properties and are completely fixed among all members of the subgroups (Type II functionally divergent [hII] sites; hII ML ,0). The lack of signal of subfunctionalization at RRF-interacting sites seems inconsistent with the fact that spdEFG1 and 2 have been shown to be subfunctionalized for translocation and recycling, respectively, in Borelia (Tsuboi et al. 2009) as have mtEFG1 and 2 in human mitochondria (Tsuboi et al. 2009; Suematsu et al. 2010). There are various possible explanations for the lack of RRF interacting site conservation in spd/mtEFG2. 1) It is possible that only some spd/mtEFG2s are subfunctionalized for ribosome recycling. This would require another explanation for the long-term comaintenance of both spd/mtEFGs. 2) There could be balancing mutations in RRF that compensate for spdEFG2 mutations. However, inspection of an RRF alignment shows the same pattern of sequence conservation at EF-G-interacting sites in canonical and spdEFG2carrying bacteria (supplementary fig. S4, Supplementary Material online). 3) EF-G:RRF interactions may be highly species specific and therefore rapidly evolving. This is supported by the observation that mutational defects in one RRF:EF-G intermolecular interface can be compensated for by gain-of-function mutations at another interface (Ito et al. 2002). The latter also suggests that ribosome recycling may not require highly specific interactions between EF-G and RRF but rather simply the maintenance of an EF-G–like structure with ribosome-binding capabilities and appropriate surface charges. This could be sufficient to cause the necessary repulsion of the RRF head domain (Gao et al. 2007). Indeed, many of the sites that are strongly conserved 1289 MBE Atkinson and Baldauf · doi:10.1093/molbev/msq316 in spd/mtEFG2 are internal to the structure and responsible for forming networks of interactions for structural integrity (fig. 2). On the other hand, there is strong evidence of loss of translocation function by spd/mtEFG2. The greatest concentration of sites with substitution rate shifts between spd/mtEFG1 and spd/mtEFG2 is around the tRNA-interacting loops (positions 500–520 and 579–590; fig. 2) located at the very tip of the domain IV structure (Gao et al. 2009). These sites are highly conserved in spd/mtEFG1, but degraded in spd/mtEFG2, consistent with loss of translocation function by the latter. However, the region of EF-G (positions 459–466; fig. 2) that interacts with the sarcin–ricin loop (SRL) of rRNA is 100% conserved in all three EF-G subtypes. The SRL has been proposed to be critical for EF-G and EF-Tu discrimination and EF-G*GTP binding to the ribosome A site (Sergiev et al. 2005; Garcia-Ortega et al. 2010). This suggests that specific interactions on the ribosome may differ among subtypes, but important structural features required for ribosome binding are preserved. Thus, the key to the subfunctionalization of EF-G may lie with spd/mtEFG1. Unlike spd/mtEFG2, the latter shows substantial remodeling relative to canonical EF-G (fig.2; table 1). Much of this remodeling appears to be in the GTPase domain, where 13 canonical consensus sites are lost, 22 new ones are gained, and 7 are conserved but with a new residue (vs. 4–17, 4–11, and 0–4 for the other four domains, respectively, table 1). This domain is also the location of the three amino acid insertion unique to spd/mtEFG1 (fig. 2). Together, these G domain modifications suggest that spd/mtEFG1 evolution has involved changes in how it interacts with and/or is regulated by nucleotides. The three amino acid insertion is particularly interesting as it occurs in the ‘‘switch I’’ region of the G domain (fig. 2). Switch I is a flexible structural element between the G1 and G2 GTP-binding loops, responsible for accommodating of the nucleotide in the G domain. Its length varies among trGTPase families, reflecting family-specific differences in nucleotide interactions (Hauryliuk et al. 2009). In EF-G’s GDP-bound and nucleotide-free states, switch I is disordered in structure, but on binding GTP, it becomes structured by locking onto the gamma phosphate of GTP (Connell et al. 2007; Hauryliuk et al. 2008; Gao et al. 2009; Ticu et al. 2009). After GTP hydrolysis and peptidyl-tRNA translocation, the switch changes conformation again, flipping out from the ribosome and promoting release of GDP and EF-G from the ribosome (Connell et al. 2007; Hauryliuk et al. 2008; Gao et al. 2009; Ticu et al. 2009). In this respect, it is interesting to note that human mtEFG1 (Bhargava et al. 2004) and B. burgdorferi spdEFG1 (Suematsu et al. 2010) are resistant to fusidic acid (FA), an antibiotic that inhibits release of EF-G from the ribosome, thus blocking translation. FA binding appears to require an open disordered conformation of the switch I loop (Gao et al. 2009). Thus, the GVG insertion could change the structure of the open conformation of switch I and make 1290 spd/mtEFG1 inaccessible to FA. Interestingly, spd/mtEFG2 is also resistant to FA, which also inhibits ribosome recycling by E. coli EF-G (Savelsbergh et al. 2009; Suematsu et al. 2010). The cause here might also be switch I, which has also experienced multiple substitutions in spd/mtEFG2 relative to canonical EF-G (fig. 2). However, unlike spdEFG1, this is more of a loss of conservation rather than a strongly differently conserved motif. Therefore, FA resistance of spdEFG2 might be a lineage-specific phenomenon. Our analyses of site-specific conservation and substitution rates clearly show retention and loss of translocation functions in spd/mtEFG1 and spd/mtEFG2, respectively, consistent with experimental data on subfunctionalization (Tsuboi et al. 2009; Suematsu et al. 2010). However, this subfunctionalization appears asymmetric as we find surprisingly little signal in the primary sequence of spd/ mtEFG2 for specialization for ribosome recycling. As spdEFG2 is in fact most strongly conserved in structurally important regions, the advantage of maintaining spdEFG2 may have been as a structural mimic of EF-G, capable of binding the ribosome and promoting recycling primarily through surface charge repulsion. This would have released spdEFG1 from its recycling responsibilities, allowing it to become more specialized for its role in translocation. Substantial remodeling of the G domain suggests that this specialization involved modifying or refining its interactions with nucleotides. The spdEFG pair subsequently entered eukaryotes already subfunctionalized in a frozen accident that survived both mtEFG genes being transferred from the mitochondrial genome to the nucleus. Supplementary Material Supplementary tables 1–2 and figs. 1–5 are available at Molecular Biology and Evolution online (http://www.mbe .oxfordjournals.org/). Acknowledgments We wish to thank S. Andersson, V. Hauryliuk, T. Margus, T. Tenson, O. Fiz-Palacio, and T. Ettema for helpful discussions and critical reading of the manuscript. We also wish to thank the Department of Energy Joint Genome Institute for the use of unpublished sequence data. This work was primarily supported by a Westin Foundation Grant (S.L.B. and G.C.A.) and VR grant 70495101 (S.L.B.), with additional funds from Estonian Science Foundation Mobilitas grant MJD99 (G.C.A.), and the European Regional Development Fund through the Center of Excellence in Chemical Biology (G.C.A.). References Ævarsson A, Brazhnikov E, Garber M, Zheltonosova J, Chirgadze Y, al-Karadaghi S, Svensson LA, Liljas A. 1994. Three-dimensional structure of the ribosomal translocase: elongation factor G from Thermus thermophilus. EMBO J. 13:3669–3677. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new Evolution of Chloroplast and Mitochondrial EF-G · doi:10.1093/molbev/msq316 generation of protein database search programs. Nucleic Acids Res. 25:3389–3402. Archibald JM. 2009. The puzzle of plastid evolution. Curr Biol. 19:R81–R88. Atkinson G, Baldauf S. 2009. Evolution of the Translational GTPase Superfamily. J Biomol Struct Dyn. 26:841–842. Baldauf SL, Palmer JD. 1990. Evolutionary transfer of the chloroplast tufA gene to the nucleus. Nature 344:262–265. Baldauf SL, Palmer JD, Doolittle WF. 1996. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc Natl Acad Sci U S A. 93:7749–7754. Bergsten J. 2005. A review of long-branch attraction. Cladistics 21:163–193. Bhargava K, Templeton P, Spremulli L. 2004. Expression and characterization of isoform 1 of human mitochondrial elongation factor G. Protein Expr Purif. 37:368–376. Blanchard JL, Hicks JS. 1999. The non-photosynthetic plastid in malarial parasites and other apicomplexans is derived from outside the green plastid lineage. J Eukaryot Microbiol. 46:367–375. Brindefalk B, Viklund J, Larsson D, Thollesson M, Andersson S. 2007. Origin and evolution of the mitochondrial aminoacyl-tRNA synthetases. Mol Biol Evol. 24:743–756. Brochier C, Bapteste E, Moreira D, Philippe H. 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18:1–5. Brochier C, Philippe H, Moreira D. 2000. The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome. Trends Genet. 16:529–533. Cammarano P, Creti R, Sanangelantoni AM, Palm P. 1999. The archaea monophyly issue: a phylogeny of translational elongation factor G(2) sequences inferred from an optimized selection of alignment positions. J Mol Evol. 49:524–537. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283–1287. Ciferri O, Tiboni O. 1973. Elongation factors for chloroplast and mitochondrial protein synthesis in Chlorella vulgaris. Nat New Biol. 245:209–211. Cole JR, Wang Q, Cardenas E, et al. (11 co-authors). 2009. The ribosomal database project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37:D141–D145. Connell SR, Takemoto C, Wilson DN, et al. (15 co-authors). 2007. Structural basis for interaction of the ribosome with the switch regions of GTP-bound elongation factors. Mol Cell. 25: 751–764. Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375. Doolittle W. 1999. Phylogenetic classification and the universal tree. Science 284:2124–2129. Duchene AM, Giritch A, Hoffmann B, Cognat V, Lancelin D, Peeters NM, Zaepfel M, Marechal-Drouard L, Small ID. 2005. Dual targeting is the rule for organellar aminoacyl-tRNA synthetases in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 102:16484–16489. Emanuelsson O, Nielsen H, Brunak S, von Heijne G. 2000. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol. 300:1005–1016. Embley TM, Martin W. 2006. Eukaryotic evolution, changes and challenges. Nature 440:623–630. Gao N, Zavalla AV, Ehrenberg M, Frank J. 2007. Specific interaction between EF-G and RRF and its implication for GTP-dependent ribosome splitting into subunits. J Mol Biol. 374:1345–1358. Gao YG, Selmer M, Dunham CM, Weixlbaumer A, Kelley AC, Ramakrishnan V. 2009. The structure of the ribosome with elongation factor G trapped in the posttranslocational state. Science 326:694–699. MBE Garcia-Ortega L, Alvarez-Garcia E, Gavilanes JG, Martinez-DelPozo A, Joseph S. 2010. Cleavage of the sarcin-ricin loop of 23S rRNA differentially affects EF-G and EF-Tu binding. Nucleic Acids Res. 38:4108–4119. Grandi M, Kuntzel H. 1970. Mitochondrial peptide chain elongation factors from Neurospora crassa. FEBS Lett. 10:25–28. Gu X. 2006. A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences. Mol Biol Evol. 23:1937–1945. Gu X. 2001. Maximum-likelihood approach for gene family evolution under functional divergence. Mol Biol Evol. 18:453–464. Gu X, Vander Velden K. 2002. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18:500–501. Hauryliuk V, Mitkevich VA, Draycheva A, Tankov S, Shyp V, Ermakov A, Kulikova AA, Makarov AA, Ehrenberg M. 2009. Thermodynamics of GTP and GDP binding to bacterial initiation factor 2 suggests two types of structural transitions. J Mol Biol. 394:621–626. Hauryliuk V, Mitkevich VA, Eliseeva NA, Petrushanko IY, Ehrenberg M, Makarov AA. 2008. The pretranslocation ribosome is targeted by GTP-bound EF-G in partially activated form. Proc Natl Acad Sci U S A. 105:15678–15683. Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH. 2007. SUBA: the Arabidopsis subcellular database. Nucleic Acids Res. 35:D213–D218. Hillis DM, Bull JJ. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol. 42:182–192. Hirokawa G, Nijman RM, Raj VS, Kaji H, Igarashi K, Kaji A. 2005. The role of ribosome recycling factor in dissociation of 70S ribosomes into subunits. RNA 11:1317–1328. Ito K, Fujiwara T, Toyoda T, Nakamura Y. 2002. Elongation factor G participates in ribosome disassembly by interacting with ribosome recycling factor at their tRNA-mimicry domains. Mol Cell. 9:1263–1272. Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A. 86:9355–9359. Jain R, Rivera M, Lake J. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci U S A. 96:3801–3806. Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518. Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW. 2005. The tree of eukaryotes. Trends Ecol Evol. 20:670–676. Kurland C, Andersson S. 2000. Origin and evolution of the mitochondrial proteome. Microbiol Mol Biol Rev. 64:786–820. Lang BF, Burger G, O’Kelly CJ, Cedergren R, Golding GB, Lemieux C, Sankoff D, Turmel M, Gray MW. 1997. An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387:493–497. Lathe WC 3rd, Snel B, Bork P. 2000. Gene context conservation of a higher order than operons. Trends Biochem Sci. 25:474–479. Leipe D, Wolf Y, Koonin E, Aravind L. 2002. Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol. 317:41–72. Maglott D, Ostell J, Pruitt K, Tatusova T. 2007. Entrez Gene: genecentered information at NCBI. Nucleic Acids Res. 35:D26–D31. Makarova KS, Ponomarev VA, Koonin EV. 2001. Two C or not two C: recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in 1291 Atkinson and Baldauf · doi:10.1093/molbev/msq316 evolution of bacterial ribosomal proteins. Genome Biol. 2:RESEARCH 0033 Margus T, Remm M, Tenson T. 2007. Phylogenetic distribution of translational GTPases in bacteria. BMC Genomics. 8:15. Martin W. 2003. Gene transfer from organelles to the nucleus: frequent and in big chunks. Proc Natl Acad Sci U S A. 100:8612–8614. Nosenko T, Lidie K, Van Dolah F, Lindquist E, Cheng J, Bhattacharya D. 2006. Chimeric plastid proteome in the Florida ‘‘red tide’’ dinoflagellate Karenia brevis. Mol Biol Evol. 23:2026–2038. Nyborg J, Nissen P, Kjeldgaard M, Thirup S, Polekhina G, Clark BF. 1996. Structure of the ternary complex of EF-Tu: macromolecular mimicry in translation. Trends Biochem Sci. 21:81–82. Omelchenko M, Makarova K, Wolf Y, Rogozin I, Koonin E. 2003. Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol. 4:R55. Pino P, Foth BJ, Kwok LY, Sheiner L, Schepers R, Soldati T, SoldatiFavre D. 2007. Dual targeting of antioxidant and metabolic enzymes to the mitochondrion and the apicoplast of Toxoplasma gondii. PLoS Pathog. 3:e115. Post L, Arfsten A, Reusser F, Nomura M. 1978. DNA sequences of promoter regions for the str and spc ribosomal protein operons in E. coli. Cell 15:215–229. Reyes-Prieto A, Yoon HS, Moustafa A, Yang EC, Andersen RA, Boo SM, Nakayama T, Ishida KI, Bhattacharya D. 2010. Differential gene retention in plastids of common recent origin. Mol Biol Evol. 27:1530–1537. Robinson-Rechavi M, Huchon D. 2000. RRTree: relative-rate tests between groups of sequences on a phylogenetic tree. Bioinformatics 16:296–297. Rodnina MV, Savelsbergh A, Katunin VI, Wintermeyer W. 1997. Hydrolysis of GTP by elongation factor G drives tRNA movement on the ribosome. Nature 385:37–41. Ronquist F, Huelsenbeck J. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574. Savelsbergh A, Rodnina MV, Wintermeyer W. 2009. Distinct functions of elongation factor G in ribosome recycling and translocation. RNA 15:772–780. Sergiev PV, Bogdanov AA, Dontsova OA. 2005. How can elongation factors EF-G and EF-Tu discriminate the functional 1292 MBE state of the ribosome using the same binding site? FEBS Lett. 579:5439–5442. Sickmann A, Reinders J, Wagner Y, et al. (13 co-authors). 2003. The proteome of Saccharomyces cerevisiae mitochondria. Proc Natl Acad Sci U S A. 100:13207–13212. Smith DG, Gawryluk RM, Spencer DF, Pearlman RE, Siu KW, Gray MW. 2007. Exploring the mitochondrial proteome of the ciliate protozoon Tetrahymena thermophila: direct analysis by tandem mass spectrometry. J Mol Biol. 374: 837–863. Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690. Suematsu T, Yokobori SI, Morita H, Yoshinari S, Ueda T, Kita K, Takeuchi N, Watanabe YI. 2010. A bacterial elongation factor G homolog exclusively functions in ribosome recycling in the spirochaete Borrelia burgdorferi. Mol Microbiol. 75: 1445–1454. Talavera G, Castresana J. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 56:564–577. Ticu C, Nechifor R, Nguyen B, Desrosiers M, Wilson KS. 2009. Conformational changes in switch I of EF-G drive its directional cycling on and off the ribosome. EMBO J. 28:2053–2065. Timmis JN, Ayliffe MA, Huang CY, Martin W. 2004. Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat Rev Genet. 5:123–135. Tsuboi M, Morita H, Nozaki Y, Akama K, Ueda T, Ito K, Nierhaus KH, Takeuchi N. 2009. EF-G2mt is an exclusive recycling factor in mammalian mitochondrial protein synthesis. Mol Cell. 35:502–510. Wellner A, Lurie MN, Gophna U. 2007. Complexity, connectivity, and duplicability as barriers to lateral gene transfer. Genome Biol. 8:R156. Zavialov A, Hauryliuk V, Ehrenberg M. 2005. Splitting of the posttermination ribosome into subunits by the concerted action of RRF and EF-G. Mol Cell. 18:675–686. Zuegge J, Ralph S, Schmuker M, McFadden GI, Schneider G. 2001. Deciphering apicoplast targeting signals—feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins. Gene 280:19–26.