Evolution of Elongation Factor G and the Origins of
Mitochondrial and Chloroplast Forms
Gemma C. Atkinson*,1 and Sandra L. Baldauf1
1Department of Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
Present address: University of Tartu, Centre of Biomedical Technology, Institute of Technology, Tartu, Estonia
*Corresponding author: E-mail: [email protected].
Associate editor: Martin Embley
Abstract
Key words: EF-G, elongation factor G, organelle, xenology, paralogy, ribosome, translation.
Introduction
Elongation factor G (EF-G) is an ancient translational
GTPase (trGTPase) and the bacterial homolog of eukaryotic eEF2 and archaeal aEF2. EF-G/EF2 is a universal protein,
one of at least four trGTPases that were present in the last
universal common ancestor along with EF1, IF2, and possibly, SelB (Leipe et al. 2002; Margus et al. 2007). The EF2
family altogether consists of at least nine protein subfamilies of which five are bacterial (EF-G, LepA, RF3, Tet, TypA,
and EFG-L), one is archaeal (aEF2), and three are eukaryotic
(eEF2, Ria1p, and Snu114p) (Leipe et al. 2002; Atkinson and
Baldauf 2009).
In all three domains of life, EF2 functions in the elongation stage of protein synthesis, promoting the translocation
of peptidyl-transfer RNA (tRNA) from the A to the P site of
the ribosome (Rodnina et al. 1997). In addition, bacterial
EF-G has recently been shown to be an essential component of the ribosome recycling complex, which causes the
two ribosomal subunits to dissociate. This final stage of
protein synthesis requires EF-G, the ribosome recycling factor (RRF), and initiation factor 3 (IF3) (Hirokawa et al. 2005;
Zavialov et al. 2005).
EF-G comprises five domains, the G (GTPase) domain
and four additional domains referred to by number (II–
V). The G domain is responsible for GTP binding and hydrolysis and is common to all GTPases, whereas domain II is
common to all trGTPases, and domains III–V are EF2 family
specific. Together, the latter three domains are proposed to
mimic the structure of aminoacyl-tRNA (Nyborg et al.
1996) and to bind to the same region of the ribosome,
the A site. EF-G domains III and IV also interact directly
with RRF during ribosome recycling (Ito et al. 2002; Gao
et al. 2007). EF-G promotes an interdomain rotation of
RRF, which in turn forces the two ribosomal subunits apart
(Gao et al. 2007).
EF-G is encoded by the gene fusA, which tends to occur
in the highly conserved str operon. The operon classically
consists of four protein-coding genes in the order 5′-rpsL-rpsG-fusA-tufA-3′, an organization found widely in bacteria
and also in some archaea (Lathe et al. 2000). In addition, str
is often followed immediately downstream by all or part of
the ribosomal S10 operon. Together, these two operons are
hypothesized to have formed an ancient ‘‘str über-operon’’
(5′-rpsL-rpsG-fusA-tufA-rpsJ-rplC-3′) (Lathe et al. 2000).
© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 28(3):1281–1292. 2011 doi:10.1093/molbev/msq316
Advance Access publication November 22, 2010
Research article
Protein synthesis elongation factor G (EF-G) is an essential protein with central roles in both the elongation and ribosome
recycling phases of protein synthesis. Although EF-G evolution is predicted to be conservative, recent reports suggest
otherwise. We have characterized EF-G in terms of its molecular phylogeny, genomic context, and patterns of amino acid
substitution. We find that most bacteria carry a single ‘‘canonical’’ EF-G, which is phylogenetically conservative and
encoded in an str operon. However, we also find a number of EF-G paralogs. These include a pair of EF-Gs that are mostly
found together and in an eclectic subset of bacteria, specifically δ-proteobacteria, spirochaetes, and planctomycetes (the
‘‘spd’’ bacteria). These spdEFGs have also given rise to the mitochondrial factors mtEFG1 and mtEFG2, which probably
arrived in eukaryotes before the eukaryotic last common ancestor. Meanwhile, chloroplasts apparently use an
α-proteobacterial–derived EF-G rather than the expected cyanobacterial form. The long-term comaintenance of the spd/
mtEFGs may be related to their subfunctionalization for translocation and ribosome recycling. Consistent with this,
patterns of sequence conservation and site-specific evolutionary rate shifts suggest that the faster evolving spd/mtEFG2
has lost translocation function, but surprisingly, the protein also shows little conservation of sites related to recycling
activity. On the other hand, spd/mtEFG1, although more slowly evolving, shows signs of substantial remodeling. This is
particularly extensive in the GTPase domain, including a highly conserved three amino acid insertion in switch I. We
suggest that subfunctionalization of the spd/mtEFGs is not a simple case of specialization for subsets of original activities.
Rather, the duplication allows the release of one paralog from the selective constraints imposed by dual functionality, thus
allowing it to become more highly specialized. Thus, the potential for fine tuning afforded by subfunctionalization may
explain the maintenance of EF-G paralogs.
Eukaryotes conduct cytoplasmic protein synthesis with
eEF2, but most eukaryotic nuclear genomes also encode
organelle-targeted EF-Gs that function in mitochondrial
and/or plastid protein synthesis (mtEFG and cpEFG, respectively). Organelle targeting of these proteins has been
demonstrated by their purification from mitochondrial
and chloroplast cellular fractions, respectively (Grandi
and Kuntzel 1970; Ciferri and Tiboni 1973), and by whole
organelle proteome analyses of both mitochondria and
chloroplasts (Sickmann et al. 2003; Heazlewood et al.
2007; Smith et al. 2007). Nonetheless, all organellar EFGs are nuclear encoded. This is presumably the result
of endosymbiotic gene transfer early in eukaryotic evolution, as has occurred more recently for its 3′ str neighbor
tufA in both plastids (Baldauf and Palmer 1990; Martin
2003; Timmis et al. 2004) and mitochondria (Lang et al.
1997).
Due to their antiquity, universality, and high levels of
sequence conservation, various members of the EF-G protein family have been used to address important evolutionary questions. These include whether or not archaea are
monophyletic (Cammarano et al. 1999) and the position
of the root of the universal tree (Iwabe et al. 1989; Baldauf
et al. 1996). However, the ability of molecular phylogeny
to recover taxonomic relationships among bacteria and
archaea has come into question due to the discovery of
extensive horizontal gene transfer (HGT) among them
(Doolittle 1999).
On the other hand, it has been argued that genes encoding components of complex networks, such as transcription and translation, should be resistant to functional
HGT (Jain et al. 1999). This is supported by an observed
low number of detected transfers of essential genes in
highly connected interaction networks (Wellner et al.
2007). By this reasoning, the genes encoding EF-G, a core
information processing protein and therefore at the heart
of a complex network, are predicted to be refractory
to HGT. Nonetheless, anomalies in EF-G phylogeny have
been noted and interpreted as HGT, both between β- and γ-proteobacteria (Brochier et al. 2002) and between
spirochetes and eukaryotes (Doolittle 1999; Nosenko
et al. 2006).
We have examined EF-G family evolution using molecular phylogeny, consensus sequence comparisons, substitution rate divergence analysis, and genomic context analysis.
Our results confirm that most bacteria carry a single slowly
evolving EF-G encoded in a canonical str operon. However,
operon-free EF-G duplicates are also found, and these are
scattered across the tree. All but two of these duplicates
have very limited taxonomic distribution, and these two
also have very similar taxonomic distributions, including
most mitochondriate eukaryotes. The frequent cooccurrence of these two EF-Gs suggests that they are codependent, most likely due to subfunctionalization, as shown
experimentally for Borrelia and human EF-Gs (Tsuboi
et al. 2009; Suematsu et al. 2010). We investigate the
evolution of these two proteins in light of the known
functional roles of EF-G.
Methods
Data Set Assembly
EF-G sequences were retrieved from NCBI using BlastP
(Altschul et al. 1997) with an E-value cutoff of 1e-40. Bacterial sequences were identified by searching all completely
sequenced genomes in the NCBI database. Due to the large
number of these genome sequences, target genomes were
limited to one representative for each genus in the order
they are listed on the NCBI genomes table. Preliminary phylogenetic analyses identified two paralogous EF-G clades in
spirochetes, planctomycetes, and δ-proteobacteria (referred to here as ‘‘spd’’ bacteria). Therefore, additional
searches were conducted against all spd genomes in the
NCBI genome and nr database. Bacterial small subunit
ribosomal RNA (SSU rDNA) sequences were downloaded
from the ribosomal database project (Cole et al. 2009).
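The genus-level sampling rule described above (one representative per genus, taken in the order the genomes are listed) can be sketched as follows. This is only an illustration under stated assumptions: the function name `one_per_genus` is hypothetical, and the genus is assumed to be the first word of each organism name, which would not hold for names such as "Candidatus ...".

```python
def one_per_genus(organism_names):
    """Keep the first organism listed for each genus, preserving list order."""
    seen = set()
    representatives = []
    for name in organism_names:
        genus = name.split()[0]  # assumption: the genus is the first word
        if genus not in seen:
            seen.add(genus)
            representatives.append(name)
    return representatives


# Hypothetical input list, standing in for a genomes table.
genomes = [
    "Escherichia coli K-12",
    "Escherichia fergusonii",
    "Borrelia burgdorferi",
    "Borrelia garinii",
    "Treponema denticola",
]
print(one_per_genus(genomes))
```

Because the first listed genome wins, the result depends on the ordering of the input table, which is the behaviour the text describes.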
Eukaryotic sequences were identified through searches
against a taxonomically wide sampling of complete genome sequences (supplementary table S1, Supplementary
Material online). Additional eukaryotic EF-G sequences
were retrieved from BlastP searches against genomes from
the following species: Eimeria tenella (GeneDB, Sanger Institute—http://www.genedb.org/); Aureococcus anophagefferens, Chlamydomonas reinhardtii, Micromonas
pusilla, Naegleria gruberi, Phaeodactylum tricornutum, Thalassiosira pseudonana, Volvox carterii, and Emiliania huxleyi
(Department of Energy Joint Genome Institute—http://
www.jgi.doe.gov/); Cyanidioschyzon merolae (University
of Tokyo—http://merolae.biol.s.u-tokyo.ac.jp/); Toxoplasma
gondii (ToxoDB—http://ToxoDB.org/); and Tetrahymena
thermophila (NCBI nr database via trGTPbase:
www.trGTPbase.org; Atkinson GC and Baldauf SL, unpublished data). Phaeodactylum tricornutum mtEFG1
was assembled from two overlapping gene models:
e_gw1.5.154.1_Phatr2:11200 and e_gw1.5.161.1_Phatr2:
11121 (http://www.jgi.doe.gov/). The 5′ ends of predicted
mtEFG2 sequences from three species of Leishmania were
completed by BlastN searches of the genome sequences to
retrieve the full-length gene sequences, which were then
translated with Transeq (http://www.ebi.ac.uk/Tools/).
The EF-G identity of all sequences was confirmed by
scanning against a set of EF2 family profile hidden Markov
models (HMMs) generated from subsets of a curated
trGTPase superfamily alignment (trGTPbase; Atkinson
GC and Baldauf SL, unpublished). The resulting EF-G data
set is referred to here as EFG_dset1. EFG_dset1 was aligned
using MAFFT v6.234b with strategy L-INS-I (Katoh et al.
2005). Consensus sequences were then computed from
the full alignment using the Python script ConsensusFinder
(Atkinson GC and Baldauf SL, unpublished data), which
indicates sites with conservation of a single amino acid
in uppercase characters and conservation of a substitution
group in lowercase. A 70% consensus sequence was constructed and used to identify well-conserved regions.
These unambiguously aligned regions were confirmed and
extracted using the Gblocks server v.0.91b (http://molevol.cmima.csic.es/castresana/Gblocks_server.html; Talavera and
Castresana 2007), run with the more lenient ‘‘smaller final
blocks’’ option enabled.
Table 1. The Total Number of Gains, Losses, and Changes in Consensus versus Canonical EF-G for Various Classes of EF-Gs.
Change in Numbers of Consensus Sites vs. Canonical EF-G
              Gain                  Change                Loss
EF-G Domain   G   II  III  IV  V    G   II  III  IV  V    G   II  III  IV  V
spd/mtEFGs
spdEFG1       36  18  8    16  10   9   6   4    0   1    24  16  18   7   8
mtEFG1        31  14  10   16  8    14  5   4    0   2    23  19  21   10  14
spd+mtEFG1    22  11  5    11  4    7   3   4    0   1    13  14  17   4   7
spdEFG2       11  1   5    5   1    1   0   1    0   0    50  22  29   27  19
mtEFG2        11  4   2    0   1    4   0   1    0   0    57  26  33   29  17
spd+mtEFG2    4   1   2    0   0    1   0   0    0   0    37  20  24   26  11
NOTE: Consensus sequences were calculated at the 80% level, after excluding
highly divergent sequences and are summarized by EF-G structural domains
(Ævarsson et al. 1994).
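The consensus convention described above (uppercase for a single conserved amino acid, lowercase for a conserved substitution group) can be illustrated with a minimal sketch. It is not the unpublished ConsensusFinder script: the function name, the substitution groups in `GROUPS`, and the use of a strict "greater than" threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical substitution groups, for illustration only; the groups used by
# ConsensusFinder are not listed in the text.
GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
          set("NQ"), set("ST"), set("AG")]


def consensus(alignment, threshold=0.7):
    """Return a threshold consensus string for equal-length aligned sequences."""
    n = len(alignment)
    out = []
    for column in zip(*alignment):
        counts = Counter(column)
        residue, top = counts.most_common(1)[0]
        if top / n > threshold:
            # A single character dominates: a gap stays "-", a residue is uppercase.
            out.append("-" if residue == "-" else residue.upper())
            continue
        symbol = "."  # neither conservation threshold met
        for group in GROUPS:
            members = {aa: c for aa, c in counts.items() if aa in group}
            if members and sum(members.values()) / n > threshold:
                # Lowercase letter: most common member of the conserved group.
                symbol = max(members, key=members.get).lower()
                break
        out.append(symbol)
    return "".join(out)


print(consensus(["MKIL", "MKVL", "MRLL", "MKMA"]))  # prints MKiL
```

In the toy alignment, the third column has no single dominant residue, but all four residues fall in the same assumed group, so the site is reported in lowercase.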
A trimmed-down version of the large EFG_dset1,
EFG_dset2, was created to enable the more computationally demanding Bayesian phylogenetic analysis. Taxa
to be trimmed were identified from a maximum likelihood (ML) tree of EFG_dset1 constructed with RAxML
(Stamatakis 2006). Representatives were selected to sample within strongly supported clades (>70% ML bootstrap support) and across all bacterial phyla. In order
to minimize long-branch attraction effects (Bergsten
2005), only those sequences from the spd/mtEFG1 and
spd/mtEFG2 yielding the shortest branches were kept.
The dimensions of the final data sets are as follows:
EFG_dset1: 343 sequences, 400 positions and EFG_dset2:
93 sequences, 400 positions.
Phylogenetic Analyses
Phylogenetic analyses were conducted using Bayesian
Inference (BI) and ML. BI analysis of EFG_dset2 used
MrBayes 3.2 (Ronquist and Huelsenbeck 2003). ML analyses
of EFG_dset1 and EFG_dset2 used RAxML 7.0.4 (Stamatakis
2006).
MrBayes was run with a mixture of all available amino
acid substitution models, which converged on the Whelan
and Goldman (WAG) model with probability 1.0, and a Γ
distribution with a number of invariant sites estimated by
the program. Two simultaneous runs of eight chains were
carried out from a random starting tree for 1 million generations, with results saved every ten generations. The
standard deviation of split frequencies at termination of
the analysis was 0.047. At this point, the two runs had converged on very similar tree topologies. Following summation of parameters over the entire run to confirm burn in,
results from the first 100,000 generations per run were
discarded and posterior probabilities (biPP) estimated from
the remaining 180,000 trees.
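The figure of 180,000 retained trees follows from the run settings given above. A short arithmetic check, with a hypothetical helper name and ignoring the initial tree that a sampler may also record at generation 0:

```python
def retained_samples(generations, sample_every, burn_in, runs):
    """Count post-burn-in samples pooled across independent runs."""
    per_run = (generations - burn_in) // sample_every
    return runs * per_run


# Two runs, 1 million generations each, sampled every 10 generations,
# with the first 100,000 generations of each run discarded.
total = retained_samples(1_000_000, 10, 100_000, 2)
print(total)  # prints 180000
```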
RAxML analyses were conducted using the PROTMIXWAG model with 25 distinct per site rate categories
(Stamatakis 2006). This implements the WAG substitution
model with the CAT approximation for rate heterogeneity
during the topology search, evaluating the final tree under
a Γ distribution of rates. Optimal tree searches consisted
of 100 replicates, each starting from a maximum parsimony
starting tree. Clade robustness was assessed with 100
bootstrap replicates, with the bootstrap percentage (mlBP)
shown for each branch. Relative evolutionary rates for different EF-G types were computed with the relative rate test
as implemented in RRTree (Robinson-Rechavi and Huchon
2000).
Site-Specific Sequence Conservation and Functional
Diversification Analyses
Two methods were used for analyzing patterns of sequence
evolution: 1) site-specific sequence conservation and 2) ML
identification of sites that have experienced shifts in evolutionary rate as implemented in DIVERGE 2.0 (Gu 2001,
2006; Gu and Vander Velden 2002). To analyze site-specific
conservation across subfamilies, the total number of gains,
losses, and changes in consensus versus canonical EF-G
were calculated for each major EF-G type using 80% strict
consensus sequences calculated using the Python script
ConsensusFinder (Atkinson GC and Baldauf SL, unpublished data, table 1). The data were divided into five classes
based on the EFG_dset1 phylogeny (supplementary fig. S3,
Supplementary Material online) excluding highly divergent
sequences: ‘‘canonical EF-G,’’ ‘‘spdEFG1,’’ ‘‘mtEFG1,’’
‘‘spdEFG2,’’ and ‘‘mtEFG2.’’ Gains are sites that meet the
80% conservation threshold within a class but not in
canonical EF-G. Changes are sites that are conserved both
in the class and canonical EF-G, but the identity of the
conserved residue has changed. Losses are sites that are
unconserved within a class but are conserved in canonical
EF-G.
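The three categories defined above can be expressed as a small comparison of two aligned consensus strings. This is a sketch under assumptions, not the authors' script: the function name is hypothetical, and only uppercase letters are treated as conserved sites, on the reading that a "strict" consensus requires a single conserved amino acid.

```python
def classify_sites(class_consensus, canonical_consensus):
    """Count gains, changes, and losses of a class relative to canonical EF-G."""
    counts = {"gain": 0, "change": 0, "loss": 0}
    for cls, can in zip(class_consensus, canonical_consensus):
        cls_conserved = cls.isupper()  # assumption: uppercase marks a conserved site
        can_conserved = can.isupper()
        if cls_conserved and not can_conserved:
            counts["gain"] += 1    # conserved in the class only
        elif can_conserved and not cls_conserved:
            counts["loss"] += 1    # conserved in canonical EF-G only
        elif cls_conserved and can_conserved and cls != can:
            counts["change"] += 1  # conserved in both, different residue
    return counts


# Toy consensus strings; "." marks an unconserved site.
print(classify_sites("GK.TV.", "GR.T.D"))
```

In the toy strings, position 2 is a change (K versus R), position 5 is a gain, and position 6 is a loss; sites conserved identically in both are not counted.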
DIVERGE 2.0 (Gu 2001, 2006; Gu and Vander Velden
2002) was used to identify sites that have undergone ‘‘Type
I’’ and ‘‘Type II’’ divergence in substitution rates between
canonical EF-G, spd/mtEFG1, and spd/mtEFG2 subgroups
of EF-G. Type I divergence refers to changes in the amino
acid substitution rates of homologous sites among clades of
sequences. In Type II divergent sites, different amino acids
with radically different properties are fixed in duplicates.
EFG_dset2 was used as input to DIVERGE, cut down to
contain only canonical EF-G, spd/mtEFG1, and spd/
mtEFG2. A neighbor-joining tree was generated by the
software using the Kimura distance correction measure
(supplementary fig. S5, Supplementary Material online),
and the three monophyletic clusters of spd/mtEFG1,
spd/mtEFG2, and canonical EF-G were selected for pairwise
comparisons. All possible paired cluster comparisons were
used to calculate the coefficient of Type I (θI) and Type II
(θII) divergence and the posterior probability (PP) that a site
has undergone a shift in substitution rates (excluding
columns containing gaps) (table 2).
Table 2. Functional Divergence in Pairwise Comparisons between spdEFG1, spdEFG2, and Canonical EF-G (canEFG).
                                                                  Sites by Domain Where PP > 0.90
Pairwise Combinations   θI     θI SE    Total sites where PP > 0.90   G    II   III  IV   V
spdEFG1 vs. spdEFG2     0.32   ±0.03    43                            9    7    4    18   5
spdEFG1 vs. canEFG      0.30   ±0.03    47                            14   8    10   9    6
spdEFG2 vs. canEFG      0.26   ±0.03    38                            10   5    3    18   2
NOTE: θI is the coefficient of functional divergence, and SE is standard error. PP is the posterior probability of a site being functionally divergent, with sites >0.90 indicating
a strong probability of divergence. The number of these sites in total and within each domain are shown.
Genomic Context
The genomic context of EF-G–encoding genes (fus) was determined by retrieving the identity of up- and downstream
flanking genes using the Python script ContextFinder (Atkinson GC, unpublished data). This script uses NCBI GI
numbers to find corresponding entries in the Entrez Gene
database (Maglott et al. 2007) and retrieve gene ID numbers, gene name, title, strand, and coordinates for each
flanking gene from their Entrez Gene record. The identity
of the genes was assigned by searching for keywords in gene
name, title, and COG identifiers in the annotations of
each record. Where identity could not be automatically
assigned using annotations, the Entrez Gene record was
checked by eye.
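The core of the flanking-gene lookup described above can be sketched from coordinates and strand alone. This is not the unpublished ContextFinder script and makes no Entrez queries: the `Gene` record, the function name, and the coordinates are hypothetical, and the sketch assumes a linear contig with non-overlapping genes.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def flanking_genes(genes, target_name):
    """Return (upstream, downstream) neighbours of the target gene.

    Upstream and downstream are defined relative to the target's own strand.
    A neighbour is None when the target lies at the end of the contig.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == target_name)
    left = ordered[idx - 1] if idx > 0 else None
    right = ordered[idx + 1] if idx + 1 < len(ordered) else None
    if ordered[idx].strand == "+":
        return left, right
    return right, left


# Hypothetical coordinates for a classical str operon on the forward strand.
str_operon = [
    Gene("rpsL", 1, 400, "+"),
    Gene("rpsG", 450, 900, "+"),
    Gene("fusA", 950, 3000, "+"),
    Gene("tufA", 3100, 4300, "+"),
]
upstream, downstream = flanking_genes(str_operon, "fusA")
print(upstream.name, downstream.name)  # prints rpsG tufA
```

For a gene on the reverse strand the two neighbours are swapped, so that "downstream" always means the 3′ side of the gene in question.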
Transit Peptides
All putative mitochondrial and plastid sequences were
analyzed using TargetP, which scans the N terminus for
the predicted ability of sequences to form functional chloroplast, mitochondrial, and signal peptides (Emanuelsson et al.
2000). Apicomplexan nonmitochondrial-type sequences
were also analyzed with PATS (predict apicoplast-targeted
sequences) (Zuegge et al. 2001) to predict possible apicoplast
transit peptides.
Results and Discussion
Molecular Phylogeny of EF-G
We have reconstructed the phylogeny of bacterial and
organellar EF-G, which together form a monophyletic subgroup within the EF2 family (Leipe et al. 2002; Atkinson GC
and Baldauf SL, in preparation). Altogether, 218 bacterial
and 52 eukaryotic genomes were sampled including at least
one representative from each genus of bacteria with a completely sequenced genome. Eukaryotic sampling includes
representatives of five of the six recognized eukaryotic
supergroups (Keeling et al. 2005) as little molecular data
are available for Rhizaria. We find that EF-G is universal
in bacteria and mitochondria-carrying eukaryotes. In addition, it has given rise to at least 12 paralogs and xenologs
derived by gene duplication or HGT.
The molecular phylogeny of EF-G shows three main sections or subtrees (fig. 1 and supplementary fig. S3, Supplementary Material online). The largest subtree includes
sequences from nearly every examined major group of
bacteria. These are also usually the only EF-G in their host
cells (supplementary table S1, Supplementary Material online), and they are almost exclusively found in str operons
(fig. 1 and supplementary fig. S3, Supplementary Material
online). Therefore, these are referred to here as canonical
EF-G or simply EF-G. The other two subtrees consist almost
entirely of sequences from an eclectic but similar subset of
bacteria. These taxa include all 9 examined species of Spirochetes,
4 of the 5 examined species of Planctomycetes,
and 14 of the 21 examined species of δ-Proteobacteria. This
diverse set of bacteria is herein referred to as the ‘‘spd’’ bacteria and their EF-G sequences as spdEFG1 and spdEFG2
(fig. 1).
Each of the spdEFG1 and spdEFG2 subtrees also includes
one of the two mitochondrially targeted EF-Gs, mtEFG1
and mtEFG2, respectively (fig. 1 and supplementary fig.
S3, Supplementary Material online). Together these account for all the putative or proven mitochondrial EFGs (supplementary table S1, Supplementary Material
online). The monophyly of the spd/mtEFG1 clade is
strongly supported by both ML and BI phylogenetic methods (1.0 biPP, 100% bootstrap percentages [mlBP]; fig. 1).
The clade is also supported by the presence of a small well-conserved insertion with the consensus sequence GVG,
which is found exclusively and universally in all spd/
mtEFG1 sequences (positions 47–49 in fig. 2, see below).
The grouping of all spdEFG2 sequences and mtEFG2 is less
well or consistently supported (1.0 biPP, 65% mlBP; fig. 1),
but these sequences also have a very similar taxonomic distribution to spd/mtEFG1 and are therefore referred to here
as spd/mtEFG2. Nearly all species with spdEFG1 possess
spdEFG2 and vice versa, and these are usually the only
EF-Gs encoded in the genome (supplementary table S1,
Supplementary Material online). Only two closely related
bacteria were found with all three main EF-G types (Anaeromyxobacter sp. and A. halogenans, supplementary table
S1 and fig. S3, Supplementary Material online).
Canonical EF-G is by far the most widespread of the
three EF-G types. It is found in at least some and, in most
cases, all representatives of every major bacterial taxon examined except Spirochetes (supplementary table S1 and
fig. S3, Supplementary Material online). Within the canonical EF-G subtree, the bacteria are also roughly grouped into
recognized higher order taxa (Ciccarelli et al. 2006; Cole
et al. 2009), albeit with varying levels of support (supplementary fig. S3, Supplementary Material online). As expected for a single gene or protein tree of bacteria,
there is no confident resolution of any deeper branches
(Delsuc et al. 2005). Canonical EF-G also has a slower evolutionary rate (0.37 substitutions/site) compared with both
spd/mtEFG1 and spd/mtEFG2 (0.42 and 0.58 substitutions/
site, respectively). DIVERGE 2.0 (Gu 2001, 2006; Gu and
Vander Velden 2002) was used to calculate the coefficients
of Type I functional divergence (θI) among EF-G subtrees
(supplementary fig. S5, Supplementary Material online).
Type I divergent sites are those with significantly different
amino acid substitution rates among subtrees. θI values
were significantly greater than 0 and in the same range for
all pairwise comparisons (0.26–0.32 with a standard error of
0.03; table 2), indicating significant differences in substitution rates among all three groups.
FIG. 1 Phylogeny of EF-G proteins and genomic context. The tree shown was derived by BI using data set EFG_dset2 and the program MrBayes.
Branch lengths are drawn to scale as indicated by the scale bar. Numbers on the branches correspond to support values following the format:
MrBayes posterior probabilities (biPP)/RAxML bootstrap percentages (mlBP) shown for branches with over 0.65 MrBayes biPP support only.
Eukaryotic taxon names are followed with major group designations as follows: Me, metazoa; Ee, euglenozoa; Am, amoebozoa; Al, alveolates; St,
stramenopiles; Pl, archaeplastida; He, heterolobosae; and Rh, rhodophyta. Major bacterial groups are abbreviated as follows: Plancto:
planctomycetes and alpha/beta/gamma/delta/epsilon: proteobacteria of these respective orders. The unique identifying codes for sequences
from eukaryotic genome project databases are shown in supplementary fig. S3, Supplementary Material online. Operon structure is indicated
by colored blocks to the right of taxon names following the key shown in the inset box. Inset: The classical str operon (Post et al. 1978) is shown
with each block representing one gene. Examples of common genomic contexts for the EF-G–encoding fus gene and those specifically
discussed in the text are shown, with gene orientation indicated. Starred taxa can be found in the full phylogeny (supplementary fig. S3,
Supplementary Material online). Blocks in the same color represent homologous genes and are labeled with gene names. Gray blocks are
unconserved within which ‘‘HP’’ stands for hypothetical protein and ‘‘trans’’ stands for transamidase.
In contrast to canonical EF-G, the two spd/mtEFG clades
include sequences from only a small subset of bacteria (fig. 1).
The eclectic taxonomic distribution of the spdEFGs suggests
that the encoding genes have been horizontally transferred,
possibly multiple times. HGT of spdEFG1, between spirochetes and δ-proteobacteria, is also supported by the shared
presence of an adjacent downstream gene in Treponema denticola and several δ-proteobacteria (fig. 1 and supplementary
fig. S1, Supplementary Material online). The two spd/mtEFG
clades do not cluster together as a monophyletic group in the
tree (fig. 1), suggesting that they were independently transferred. However, intervening branches between the two
clades are few and lack strong support (fig. 1). Additionally,
the fact that spdEFG1 and spdEFG2 cooccur in three very
different groups of bacteria (spirochaetes, δ-proteobacteria,
and planctomycetes) is most parsimoniously explained
by cotransfer. The simplest explanation for this would
be that the spdEFG-encoding genes first arose by tandem
duplication and that they remained adjacent in the genome long enough to be cotransferred several times.
However, these genes are not found at adjacent locations
in any examined bacterium. Thus, the co-HGT of these
genes, if it did occur, would probably have been largely
limited to a narrow window of time when the duplicates
were proximal in the genome.
FIG. 2 Aligned consensus sequences for major EF-G types. Consensus sequences for each subfamily of EF-G subgroups were calculated at the
70% level using the Python program ConsensusFinder (G.C.A. and S.L.B.). A ‘‘-’’ denotes a consensus gap position, and an uppercase letter
represents an amino acid conserved in >70% of the sequences. Lowercase letters denote the most common representative of an amino acid
substitution group as defined by the BLOSUM62 matrix, where the group as a whole is present at the designated consensus level. Positions
present in all sequences but for which neither conservation threshold is met are denoted by dots (‘‘.’’). Stars (*) show predicted RRF interacting
sites (Gao et al. 2007), which cluster into five patches (numbers 1–5 below stars). The G1-5 GTP-interacting loops are indicated under the
alignment. The colored numbering above the alignment indicates the five structural domains of EF-G. Regions to which specific functions have
been assigned are indicated with boxes, and the P loop, switch 1, and switch 2 regions are also labeled. The turquoise box shows tRNA-interacting loops (Gao et al. 2009). ‘‘GVG’’ shows the location of the GVG three amino acid insertion. Residue highlighting follows the
conservation pattern of spd/mtEFG1 and 2 relative to EF-G: yellow, differentially conserved; green, specifically conserved in either mt/spdEFG1;
and pale blue, no conservation. Secondary structure is indicated beneath the alignment, with parentheses denoting helices and
arrows denoting sheets. Green, loops protruding into the ribosome; pink, site of interaction with SRL loop of ribosome; turquoise, tRNA-interacting loops; and purple, sites that form a network of interactions to stabilize structure. Sites with a high probability of being functionally
divergent (PP > 0.90) as determined by DIVERGE 2.0 are boxed with a black line.
Origin of Organelle EF-Gs
No α-proteobacterium was found to encode an
spdEFG, and there is no apparent affinity of mtEFG for
α-proteobacteria in the EF-G tree (fig. 1). This in itself is
not too unusual as only ~20% of nuclear-encoded mitochondrial proteins show an α-proteobacterial affinity (Kurland and
Andersson 2000; Brindefalk et al. 2007). Instead, the mtEFGs
appear to have arisen from spd bacterial EF-Gs. The fact that
both mtEFGs are present in all five of the examined lineages
of mitochondriate eukaryotes (Opisthokonts, Amoebozoa,
Archaeplastida, Chromalveolates, and Discobids) indicates
that these EF-Gs probably arrived in eukaryotes before the
eukaryotic last common ancestor. In fact, the simplest scenario is that they were present in the progenitor of the mitochondrion, and thus, the dual nature of mtEFG is an
inherited rather than a eukaryote-derived feature.
The link between mtEFG and mitochondrial protein synthesis is further strengthened by the fact that the only
eukaryotes lacking mtEFGs are taxa that lack respiring
mitochondria, none of which encode either protein. This
includes Giardia intestinalis, Entamoeba histolytica, Trichomonas vaginalis, and the mitochondrial DNA–lacking Cryptosporidium parvum and C. hominis (Embley and Martin
2006). Because these species represent three widely separated eukaryotic supergroups (Excavates, Opisthokonts,
and Alveolates), this also represents at least three parallel
independent losses of mitochondrial respiration and
mtEFGs (Embley and Martin 2006).
With the single exception of the obligate fungal parasite,
Cryptococcus neoformans, the only aerobic eukaryotes with
only one mtEFG are the plastid/apicoplast-carrying eukaryotes: Archaeplastida, stramenopile algae, and Apicomplexa (supplementary table S1, Supplementary Material
online). Instead of a second mtEFG, these species carry
a plastid/apicoplast-targeted EF-G, termed cpEFG or
apiEFG (fig. 1 and supplementary fig. S3, Supplementary
Material online). Unlike the mtEFGs, cpEFG is a canonical-type EF-G, whereas the apicoplast-targeted apiEFGs
are too divergent to classify (fig. 1 and supplementary
fig. S3 and table S2). The latter is a common feature in
apicoplast proteins (Blanchard and Hicks 1999).
All cpEFGs form a single tight clade in the canonical section of the EF-G tree with full support. However, unusual
for cpEFGs, this clade shows a moderate to strong affinity
for α-proteobacterial EF-G (76% mlBP, 0.99 biPP; fig. 1),
whereas most nuclear-encoded proteins of bacterial ancestry that are targeted to the chloroplast show cyanobacterial
ancestry (Reyes-Prieto et al. 2010). This makes it tempting
to speculate that cpEFG could have originated with the
α-proteobacterial progenitor of mitochondria. This would require either that the mitochondrial progenitor had all three
EF-G types, an extremely rare situation among the bacteria
examined here (supplementary table S1, Supplementary Material online), or that the original canonical mtEFG was replaced later by an spdEFG pair acquired from another
bacterial source or sources. However, both mtEFGs appear
to have been present in the last common ancestor of eukaryotes, whereas plastids appear to have arisen later, early in the
evolution of Archaeplastida (Archibald 2009). Thus, if cpEFG
has been resident in eukaryotes as long as mitochondria, it
must have been maintained under strong selective pressure
until the advent of eukaryotic photosynthesis at which point
it suddenly became dispensable in all nonphotosynthetic
lines. Alternatively, in a simpler explanation, the progenitor
of the chloroplast could have had an α-proteobacterial EF-G
acquired by HGT and xenologous replacement.
If the two spd/mtEFGs have complementary functions,
as the data suggest (see below), then the fact that plastid-carrying eukaryotes have only spd/mtEFG1 suggests that
cpEFG is filling in for the function of spd/mtEFG2. If this
is the case, then cpEFG must be dually targeted to the chloroplast and the mitochondrion. Most of the cpEFGs appear
to have amino-terminal extensions that can be confidently
predicted to encode chloroplast-targeting (transit peptide)
sequences (supplementary table S2, Supplementary
Material online), and the chloroplast targeting of cpEFG
has been demonstrated by plastid proteome analyses in
Arabidopsis thaliana (Heazlewood et al. 2007). However,
dual targeting of plastid and apicoplast proteins to the
mitochondrion is known to occur, including other components of the translation machinery, aminoacyl-tRNA
synthetases (Duchene et al. 2005; Pino et al. 2007).
Additional EF-G Paralogs and Xenologs
In addition to spdEFG, there are a number of other paired
EF-Gs in the canonical EF-G subtree. In all these cases, the
shorter branched EF-G is str operon encoded, and its
phylogeny (supplementary fig. S3, Supplementary Material
online) roughly follows the species phylogeny as predicted
by SSU rRNA (supplementary fig. S4, Supplementary
Material online). Meanwhile, in all cases, the longer
branched EF-G form is not str encoded and is often found
elsewhere in the tree, suggesting that these are xenologs,
that is, they arose by HGT. However, there are two
examples where the two EF-Gs group together, which
may indicate in-paralogs or very recent HGT from a close
relative (supplementary fig. S3, Supplementary Material
online).
Alpha-Proteobacterial Subtree
A mixed grouping of non-str–encoded EF-Gs from subsets
of γ-proteobacteria and cyanobacteria is embedded in the
α-proteobacterial subtree (gcEFG; fig. 1 and supplementary
fig. S3, Supplementary Material online) forming a strongly
supported clade with a subset of α-proteobacteria. This
suggests at least two horizontal transfers of gcEFG, one
at its origin and a second transfer some time later. The
γ-proteobacteria in the gcEFG group are from at least three
different divisions of γ-proteobacteria (supplementary figs.
S3 and S4, Supplementary Material online), suggesting
additional possible interphylum transfers or early HGT
followed by multiple independent losses. In addition, the
EF-Gs in the α-proteobacterial sister group to gcEFG are
non-str encoded and group with Hahella chejuensis EFG2 (supplementary fig. S3, Supplementary Material online),
which suggests that the α-proteobacterial sequences may
themselves have been derived by HGT.
Beta-Proteobacterial Subtree
Three unusual groupings are found embedded in the
β-proteobacterial subtree (fig. 1 and supplementary fig.
S3, Supplementary Material online). The bgEFG1 clade consists of non-str–encoded EF-Gs from two γ-proteobacteria.
This suggests simple HGT, although these two γ-proteobacteria are not sister taxa in their home clade in either the EF-G
or rRNA trees (supplementary figs. S3 and S4, Supplementary Material online, respectively), suggesting more than one
HGT event. The bEFG2 clade consists of non-str–encoded
EF-Gs from four related β-proteobacteria, which means that
these could be in-paralogs. However, the branching order
among the bEFG2 sequences does not entirely match that
of the EF-G or rDNA species trees (supplementary figs. S3
and S4, Supplementary Material online, respectively), suggesting some additional HGT. Finally, the bgEFG2 clade consists of two γ-proteobacterial EF-Gs, indicating horizontal
transfer, as previously reported (Brochier et al. 2002). Interestingly, these sequences are also str-encoded EF-Gs, which
would indicate HGT followed by xenologous replacement as
with ribosomal protein L29 (Omelchenko et al. 2003).
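The qualitative reasoning applied to these clades can be summarized as a small decision rule. The following Python sketch is only an illustration of that logic: the function name and its two boolean inputs are hypothetical simplifications (in practice, both gene context and tree position require judgment and statistical support), and it is not part of the analysis pipeline used here.

```python
def classify_second_copy(str_encoded: bool, groups_with_home_clade: bool) -> str:
    """Hypothetical helper: label an EF-G from its gene context and tree position."""
    if groups_with_home_clade:
        if str_encoded:
            return "resident str-encoded EF-G"
        # Non-str copy that clusters with its own taxonomic group.
        return "in-paralog or recent HGT from a close relative"
    if str_encoded:
        # Sits in the str operon but is placed in a foreign clade.
        return "HGT followed by xenologous replacement"
    # Non-str copy placed elsewhere in the tree.
    return "xenolog (HGT)"


# Inputs paraphrased from the clade descriptions in the text.
examples = {
    "bgEFG1": classify_second_copy(str_encoded=False, groups_with_home_clade=False),
    "bEFG2": classify_second_copy(str_encoded=False, groups_with_home_clade=True),
    "bgEFG2": classify_second_copy(str_encoded=True, groups_with_home_clade=False),
}
for clade, label in examples.items():
    print(f"{clade}: {label}")
```

The rule deliberately returns a hedged label for copies that group with their home clade, because tree position alone cannot separate in-paralogs from transfers between close relatives.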
Other Groups
The γ-proteobacterium Pseudomonas aeruginosa has both
an str-encoded EF-G and a longer branched non-str–
encoded EF-G that group together. This looks like in-paralogs,
although there is substantial distance between the two
sequences, so it could be HGT between closely related
genomes that have not yet been sequenced (100% BP;
fig. 1 and supplementary fig. S3, Supplementary Material
online). Likewise, Syntrophomonas wolfei (home clade Synergistes) also carries both an str- and non-str–encoded EF-G. These both have very long branches and only group together weakly (25% BP; supplementary fig. S3, Supplementary Material online), although again the non-str–EF-G is
more divergent. Finally, two actinobacteria, Streptomyces
coelicolor and St. avermitilis, have str-encoded EF-Gs in
their home clade (fig. 2) plus non-str–encoded EF-Gs that
form a separate strongly-supported clade (strepEFG, 100%
BP; supplementary fig. S3, Supplementary Material online)
lying close to spdEFG2 (51% BP, supplementary fig. S3, Supplementary Material online). These extremely divergent
sequences may be in-paralogs that are misplaced in the tree
due to long-branch attraction to some of the extremely
long spdEFG branches. No second copy EF-Gs were found
in any of the other major bacterial groups represented here,
but there is also very limited taxonomic sampling from
most of these except Firmicutes.
Other Possible HGTs
Other than the above listed cases, there is general agreement between the rDNA tree (supplementary fig. S4, Supplementary Material online) and canonical EF-G subtree
(supplementary fig. S3, Supplementary Material online).
That is, all major groups and some subgroups are reconstructed similarly in both trees. However, many branches
lack strong statistical support (BP > 70%; Hillis and Bull
1993), which makes inconsistencies difficult to detect with
confidence. Nonetheless, there are a number of nontrivial
differences between these trees. For example, Maricaulis
maris and Caulobacter crescentus (α-proteobacteria)
occupy quite different positions in the EF-G and SSU trees
(supplementary figs. S3 and S4, Supplementary Material
online, respectively), including a number of strongly
supported intervening branches. However, all the solo
EF-Gs are str encoded (fig. 1), meaning that HGT would
need to have been followed by xenologous replacements.
This would lead to strong inconsistencies between the EF-G
and EF-Tu phylogenies of the same species, which so far
we do not see (Atkinson GC and Baldauf SL, unpublished
observations).
Operon Dynamics
Nearly all canonical EF-Gs are encoded in an str operon (5′-rpsL-rpsG-fusA-tufA-3′; fig. 2 and supplementary fig. S3,
Supplementary Material online) often including a downstream rpsJ and sometimes even rplC, thus forming a complete str über-operon (Lathe et al. 2000). Meanwhile, most
spdEFG-encoding genes (referred to here as fusS1 and
fusS2) show no signs of flanking str operon structure. There
are two exceptions: The fusS1 of Treponema denticola and
the fusS2 of Planctomycetes are both found in nearly canonical str operons missing only the downstream tufA,
in the case of Blastopirellula marina, including even the farther downstream rpsJ (fig. 1). This presence of str operons
deeply embedded in operon-free clades suggests three possible scenarios. First, the fusS copies originated from a whole
operon duplication that was retained through much of spd
evolution and then lost independently in multiple lineages.
This would require a minimum of four operon losses for
spdEFG1 and two for spdEFG2 (fig. 1). Second, "de novo"
operon assembly may have occurred in one or both spdEFG
lineages and been maintained through selection. Although
this seems unlikely based on pure chance, "de novo" assembly would be simplified by the fact that rpsL and rpsG are
already linked in most spd species (data not shown). Third,
the sudden appearance of operon structure in both spd
lineages may be due to multiple independent HGTs of
nearly complete str operons into a close ancestor of the
Spirochaetes and a common ancestor of the Planctomycetes. This would have been followed by xenologous
replacement whereby the incoming str-encoded fusA
was replaced by a resident fusS gene. This should lead
to anomalies in the rpsL and rpsG phylogenies, but
unfortunately, these genes are too small for reliable phylogenetic analysis. Xenologous replacement has been previously demonstrated for at least seven ribosomal proteins
(Brochier et al. 2000; Makarova et al. 2001; Omelchenko
et al. 2003). At present, there are not enough data to support one of these scenarios over the others; however, this
may be resolved in the future with further genome sequencing of spd-type organisms.
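Whether an EF-G gene sits in an str operon context can be expressed as a simple check on gene order. The sketch below is illustrative only: it assumes gene orders are supplied as lists in 5′ to 3′ orientation, the function name `str_context` is hypothetical, and the toy gene orders (including the placeholder names geneX and geneY) are invented for the example rather than taken from any genome annotation.

```python
from typing import Dict, List


def str_context(gene_order: List[str], ef_g_gene: str) -> Dict[str, bool]:
    """Report which core str operon neighbours flank the given EF-G gene.

    Assumes gene_order lists genes 5' to 3' on the coding strand.
    """
    i = gene_order.index(ef_g_gene)
    return {
        # rpsL-rpsG immediately upstream of the EF-G gene
        "rpsL_rpsG_upstream": gene_order[max(0, i - 2):i] == ["rpsL", "rpsG"],
        # tufA immediately downstream of the EF-G gene
        "tufA_downstream": gene_order[i + 1:i + 2] == ["tufA"],
    }


# Toy gene orders (hypothetical, for illustration only).
canonical = ["rpsL", "rpsG", "fusA", "tufA", "rpsJ"]
no_tufA = ["rpsL", "rpsG", "fusS2", "rpsJ"]   # str-like operon lacking tufA
isolated = ["geneX", "fusS1", "geneY"]         # no str neighbours

for label, order, gene in [
    ("canonical", canonical, "fusA"),
    ("no_tufA", no_tufA, "fusS2"),
    ("isolated", isolated, "fusS1"),
]:
    print(label, str_context(order, gene))
```

A real synteny check would also need to handle strand orientation and intervening open reading frames, which this sketch ignores.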
Functional Evolution of EF-G Types
Examination of the distribution of EF-G types among bacteria shows that the two spdEFGs never occur alone. In
most cases, they cooccur and are the only EF-Gs in the cell
(supplementary table S1, Supplementary Material online).
Four bacteria were found with only spdEFG1 and six with
only spdEFG2 (supplementary table S1, Supplementary
Material online). However, all these carry a second canonical-type EF-G, except for Leptospira spp., where the second
EF-G, while still clearly an EF-G based on HMM screening, is
so divergent that its phylogenetic position could not be
resolved (data not shown). Likewise, all mitochondriate
eukaryotes have either both mtEFGs or one mtEFG and
a cpEFG or apiEFG. The single exception to this is the fungal
parasite, Cryptococcus neoformans (supplementary table
S1, Supplementary Material online).
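The co-occurrence pattern described above amounts to a rule that can be checked against a table of EF-G complements per genome. The following sketch assumes such a table is available as a mapping from genome name to a list of EF-G type labels; the function name, the labels, and the toy genomes are hypothetical and do not reproduce supplementary table S1.

```python
SPD_TYPES = {"spdEFG1", "spdEFG2"}


def solo_spd_genomes(complements):
    """Return genomes whose only EF-G is a single spd-type copy."""
    return sorted(
        name
        for name, types in complements.items()
        if len(types) == 1 and set(types) <= SPD_TYPES
    )


# Toy complements (hypothetical genomes, for illustration only).
toy = {
    "genome_A": ["spdEFG1", "spdEFG2"],     # the usual co-occurring pair
    "genome_B": ["spdEFG1", "canonical"],   # one spdEFG plus a canonical EF-G
    "genome_C": ["spdEFG2", "unresolved"],  # second copy too divergent to place
    "genome_D": ["canonical"],              # canonical EF-G only
}
print(solo_spd_genomes(toy))
```

An empty result for a given table corresponds to the observation that neither spdEFG occurs as the sole EF-G in a genome.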
The long-term and widespread comaintenance of spd/
mtEFG suggests that these are performing complementary
essential functions. The fact that either can be substituted
for by canonical EF-G suggests that these complementary
functions are subsets of canonical EF-G function. Recent
evidence indicates that EF-G has distinct roles in both
translocation and ribosome recycling. These activities
occur at different times in the translation cycle, involve
different interacting partners, and reside at least partly
in different domains of the protein. Ribosomal recycling
has especially been shown by Cryo-EM mapping to mainly
involve interaction of EF-G domains III and IV with RRF (Ito
et al. 2002; Gao et al. 2007). This is also supported by domain swapping experiments on human mtEFGs (Tsuboi
et al. 2009). In fact, human mtEFG1 and 2 have been shown
to be specific for translocation and recycling, respectively
(Tsuboi et al. 2009), as have spdEFG1 and 2, respectively, of
Borrelia burgdorferi (Suematsu et al. 2010).
If all spd/mtEFGs have indeed become specialized for
these two functions, then we would expect to see different
patterns of sequence conservation in the two EF-G
types consistent with these two roles. Specifically, we expect
to see stronger conservation of ribosome recycling–
specific sites in spd/mtEFG2 and stronger conservation of
translocation-specific sites in spd/mtEFG1. We therefore analyzed patterns of sequence conservation in the two spd/
mtEFGs relative to canonical EF-G by comparing consensus
sequences and site-specific evolutionary rates (Gu 2001,
2006; Gu and Vander Velden 2002) and mapping these onto
the structural model of EF-G:RRF (Gao et al. 2007) and the
structure of EF-G on the ribosome (Gao et al. 2009).
Cryo-EM mapping of the E. coli EF-G:RRF complex predicts five sets of contact points or "patches" on EF-G.
Two highly specific patches are predicted in domain III (positions 421–424 and 492–492, fig. 2) and three less-specific
contacts in domain IV (positions 508–510, 558–561 + 567,
and 590–594, fig. 2), where the interactions between EF-G
and RRF are predicted to be largely general surface charge
repulsion (Gao et al. 2007). Inspection of a semistrict consensus sequence for spd/mtEFG1 (fig. 2) indicates little
conservation at these sites or within the expected error
limit of the cryo-EM mapping (±1 residue, Gao N, personal
communication). However, surprisingly, there is even less
conservation of these sites in spd/mtEFG2 (fig. 2). In domain III, patch 1 (PKTK in canonical EF-G; fig. 2), spd/
mtEFG1 conserves only P421, but there is no conservation
here in mtEFG2. In fact, there is a wide range of residues at
both the second and fourth position of this patch (>13
alternative residues), including extremely dissimilar amino
acids. Although patch 2 (TIxk in canonical EF-G, fig. 2) is
moderately if not identically well conserved in all three EF-G types, mtEFG2 also has several small indels in this region.
Thus, the two domain III–predicted RRF interaction sites
are highly variable in spd/mtEFG2.
In domain IV, Patch 3 (SGG in canonical EF-G; fig. 2) is
highly conserved in spd/mtEFG1 (70% consensus TGG),
probably because this patch also has an important role
in translocation (see below). In contrast, there is again
no conservation here in spd/mtEFG2, plus several indels
occur here including a five amino acid deletion in S. cerevisiae mtEFG2 (data not shown). Patch 4 (558–561 and
G567; fig. 2) is poorly conserved even in canonical EF-G,
except for G567, which is more or less conserved in all three
EF-G types. Finally, patch 5 (mAFKi in canonical EF-G; fig. 2)
is moderately conserved in spd/mtEFG1 but poorly conserved in spd/mtEFG2 including a wide variety of
substituted amino acids of all possible chemical categories.
Thus, spd/mtEFG1 shows at least as much and, in most
cases, much more retention of predicted RRF interacting
sites than spd/mtEFG2.
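The patch-by-patch comparison above can be made quantitative by scoring the conservation of each alignment column within a patch. The sketch below is a minimal illustration, not the procedure used for figure 2: the function names are hypothetical, columns are indexed from zero within a toy alignment rather than by E. coli position, and the two groups of four-residue sequences are invented (only the canonical patch 1 motif PKTK comes from the text).

```python
from collections import Counter


def column_conservation(alignment, column):
    """Fraction of sequences sharing the most common non-gap residue in a column."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(alignment)


def patch_conservation(alignment, start, end):
    """Mean column conservation over columns start..end (0-based, inclusive)."""
    columns = range(start, end + 1)
    return sum(column_conservation(alignment, c) for c in columns) / len(columns)


# Invented toy alignments covering a single four-residue patch.
well_conserved = ["PKTK", "PKTK", "PRTK", "PKSK"]
poorly_conserved = ["AETD", "PQ-R", "SKGL", "THVE"]

print(patch_conservation(well_conserved, 0, 3))
print(patch_conservation(poorly_conserved, 0, 3))
```

Gaps count against conservation here because the denominator is the total number of sequences, so indels within a patch lower its score.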
In terms of overall sequence conservation, the different
EF-G forms also show quite different patterns (table 1).
Compared with canonical EF-G, spd/mtEFG1 has lost a total
of 87–88 consensus sites but gained 79–88 new ones (table
1; yellow highlighting in fig. 2). In contrast, spd/mtEFG2 has
lost 147–162 consensus sites relative to canonical EF-G
but added only seven new ones (table 1; blue highlighting
in fig. 2). In addition, spd/mtEFG1 has also "remodeled" 16
of the consensus sites shared with canonical EF-G, that is,
these sites are still strongly conserved in spd/mtEFG1 but
with a new consensus residue, often with very different
chemical characteristics (table 1; green highlighting in
fig. 2). Meanwhile, spd/mtEFG2 has only remodeled one
canonical EF-G consensus site. Thus, it appears that spd/
mtEFG1 has been substantially remodeled relative to canonical EF-G, possibly becoming more specialized or modifying its preexisting role in translocation. Meanwhile, spd/
mtEFG2 has mostly just deteriorated. Although the latter is
consistent with subfunctionalization, the actual pattern of
conservation (fig. 2) does not appear to be consistent with
retention of RRF interaction (Gao et al. 2009). Sites that
DIVERGE 2.0 (Gu 2001, 2006; Gu and Vander Velden
2002) predicts at the 0.90 PP level to have experienced
a shift in substitution rates (Type I divergence) between
spd/mtEFG1 and spd/mtEFG2 are indicated on figure 2.
Similar to the consensus comparison results, the sites
with a shift in substitution rate show no correlation with
predicted RRF interaction sites. Additionally, DIVERGE
failed to identify any substitutions that both shift physicochemical properties and are completely fixed among all
members of the subgroups (Type II functionally divergent
[θII] sites; θII ML < 0).
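The categories of lost, gained, and remodeled consensus sites can be illustrated with a short sketch. This is not the code used to produce table 1: the function names are hypothetical, the 70% consensus threshold is an assumption chosen for the example, and the two toy alignments are invented.

```python
from collections import Counter


def consensus(alignment, threshold=0.7):
    """Per-column consensus residue, or None if no residue reaches the threshold.

    The default threshold of 0.7 is an assumption made for this example.
    """
    result = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        conserved = residue != "-" and count / len(column) >= threshold
        result.append(residue if conserved else None)
    return result


def compare_consensus(reference, derived):
    """Count consensus sites lost, gained, remodeled, or shared relative to reference."""
    counts = {"lost": 0, "gained": 0, "remodeled": 0, "shared": 0}
    for ref, der in zip(reference, derived):
        if ref and not der:
            counts["lost"] += 1          # conserved in reference only
        elif der and not ref:
            counts["gained"] += 1        # conserved in derived group only
        elif ref and der:
            # conserved in both: same residue (shared) or a new one (remodeled)
            counts["remodeled" if ref != der else "shared"] += 1
    return counts


# Invented toy alignments with five aligned columns each.
canonical_group = ["GKSTA", "GKSTV", "GKSTL", "GKSTI"]
derived_group = ["GRQTW", "GRHTW", "GRNTW", "GRETW"]

print(compare_consensus(consensus(canonical_group), consensus(derived_group)))
```

In this scheme a "remodeled" site is one that remains strongly conserved in the derived group but with a different consensus residue, matching the usage in the text.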
The lack of signal of subfunctionalization at RRF-interacting sites seems inconsistent with the fact that spdEFG1
and 2 have been shown to be subfunctionalized for translocation and recycling, respectively, in Borrelia (Suematsu et al.
2010) as have mtEFG1 and 2 in human mitochondria
(Tsuboi et al. 2009). There are various
possible explanations for the lack of RRF interacting site
conservation in spd/mtEFG2. 1) It is possible that only
some spd/mtEFG2s are subfunctionalized for ribosome recycling. This would require another explanation for the
long-term comaintenance of both spd/mtEFGs. 2) There
could be balancing mutations in RRF that compensate
for spdEFG2 mutations. However, inspection of an RRF
alignment shows the same pattern of sequence conservation at EF-G-interacting sites in canonical and spdEFG2-carrying bacteria (supplementary fig. S4, Supplementary
Material online). 3) EF-G:RRF interactions may be highly
species specific and therefore rapidly evolving. This is supported by the observation that mutational defects in one
RRF:EF-G intermolecular interface can be compensated for
by gain-of-function mutations at another interface (Ito
et al. 2002). The latter also suggests that ribosome recycling
may not require highly specific interactions between EF-G
and RRF but rather simply the maintenance of an EF-G–like
structure with ribosome-binding capabilities and appropriate surface charges. This could be sufficient to cause the
necessary repulsion of the RRF head domain (Gao et al.
2007). Indeed, many of the sites that are strongly conserved
in spd/mtEFG2 are internal to the structure and responsible for forming networks of interactions for structural
integrity (fig. 2).
On the other hand, there is strong evidence of loss
of translocation function by spd/mtEFG2. The greatest
concentration of sites with substitution rate shifts
between spd/mtEFG1 and spd/mtEFG2 is around the
tRNA-interacting loops (positions 500–520 and 579–590;
fig. 2) located at the very tip of the domain IV structure
(Gao et al. 2009). These sites are highly conserved in
spd/mtEFG1, but degraded in spd/mtEFG2, consistent with
loss of translocation function by the latter. However, the
region of EF-G (positions 459–466; fig. 2) that interacts with
the sarcin–ricin loop (SRL) of rRNA is 100% conserved in all
three EF-G subtypes. The SRL has been proposed to be
critical for EF-G and EF-Tu discrimination and EF-G·GTP
binding to the ribosome A site (Sergiev et al. 2005;
Garcia-Ortega et al. 2010). This suggests that specific interactions on the ribosome may differ among subtypes, but
important structural features required for ribosome binding are preserved.
Thus, the key to the subfunctionalization of EF-G may lie
with spd/mtEFG1. Unlike spd/mtEFG2, the latter shows
substantial remodeling relative to canonical EF-G (fig. 2;
table 1). Much of this remodeling appears to be in the
GTPase domain, where 13 canonical consensus sites are
lost, 22 new ones are gained, and 7 are conserved but with
a new residue (vs. 4–17, 4–11, and 0–4 for the other four
domains, respectively, table 1). This domain is also the
location of the three amino acid insertion unique to
spd/mtEFG1 (fig. 2). Together, these G domain modifications suggest that spd/mtEFG1 evolution has involved
changes in how it interacts with and/or is regulated by
nucleotides. The three amino acid insertion is particularly
interesting as it occurs in the "switch I" region of the G
domain (fig. 2). Switch I is a flexible structural element
between the G1 and G2 GTP-binding loops, responsible
for accommodating the nucleotide in the G domain.
Its length varies among trGTPase families, reflecting
family-specific differences in nucleotide interactions
(Hauryliuk et al. 2009). In EF-G’s GDP-bound and nucleotide-free states, switch I is disordered in structure, but on
binding GTP, it becomes structured by locking onto the
gamma phosphate of GTP (Connell et al. 2007; Hauryliuk
et al. 2008; Gao et al. 2009; Ticu et al. 2009). After GTP hydrolysis and peptidyl-tRNA translocation, the switch
changes conformation again, flipping out from the ribosome and promoting release of GDP and EF-G from the
ribosome (Connell et al. 2007; Hauryliuk et al. 2008; Gao
et al. 2009; Ticu et al. 2009).
In this respect, it is interesting to note that human
mtEFG1 (Bhargava et al. 2004) and B. burgdorferi spdEFG1
(Suematsu et al. 2010) are resistant to fusidic acid (FA), an
antibiotic that inhibits release of EF-G from the ribosome,
thus blocking translation. FA binding appears to require an
open disordered conformation of the switch I loop (Gao
et al. 2009). Thus, the GVG insertion could change the
structure of the open conformation of switch I and make
spd/mtEFG1 inaccessible to FA. Interestingly, spd/mtEFG2
is also resistant to FA, which also inhibits ribosome
recycling by E. coli EF-G (Savelsbergh et al. 2009; Suematsu
et al. 2010). The cause here might also be switch I,
which has also experienced multiple substitutions in
spd/mtEFG2 relative to canonical EF-G (fig. 2). However,
unlike spdEFG1, this is more of a loss of conservation
rather than a strongly differently conserved motif. Therefore, FA resistance of spdEFG2 might be a lineage-specific
phenomenon.
Our analyses of site-specific conservation and substitution rates clearly show retention and loss of translocation
functions in spd/mtEFG1 and spd/mtEFG2, respectively,
consistent with experimental data on subfunctionalization
(Tsuboi et al. 2009; Suematsu et al. 2010). However, this
subfunctionalization appears asymmetric as we find surprisingly little signal in the primary sequence of spd/
mtEFG2 for specialization for ribosome recycling. As
spdEFG2 is in fact most strongly conserved in structurally
important regions, the advantage of maintaining spdEFG2
may have been as a structural mimic of EF-G, capable of
binding the ribosome and promoting recycling primarily
through surface charge repulsion. This would have released
spdEFG1 from its recycling responsibilities, allowing it to
become more specialized for its role in translocation. Substantial remodeling of the G domain suggests that this specialization involved modifying or refining its interactions
with nucleotides. The spdEFG pair subsequently entered
eukaryotes already subfunctionalized in a frozen accident
that survived both mtEFG genes being transferred from
the mitochondrial genome to the nucleus.
Supplementary Material
Supplementary tables 1–2 and figs. 1–5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We wish to thank S. Andersson, V. Hauryliuk, T. Margus,
T. Tenson, O. Fiz-Palacio, and T. Ettema for helpful discussions and critical reading of the manuscript. We also wish
to thank the Department of Energy Joint Genome Institute
for the use of unpublished sequence data. This work was
primarily supported by a Westin Foundation Grant (S.L.B.
and G.C.A.) and VR grant 70495101 (S.L.B.), with additional
funds from Estonian Science Foundation Mobilitas grant
MJD99 (G.C.A.), and the European Regional Development
Fund through the Center of Excellence in Chemical Biology
(G.C.A.).
References
Ævarsson A, Brazhnikov E, Garber M, Zheltonosova J, Chirgadze Y,
al-Karadaghi S, Svensson LA, Liljas A. 1994. Three-dimensional
structure of the ribosomal translocase: elongation factor G from
Thermus thermophilus. EMBO J. 13:3669–3677.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs. Nucleic Acids
Res. 25:3389–3402.
Archibald JM. 2009. The puzzle of plastid evolution. Curr Biol.
19:R81–R88.
Atkinson G, Baldauf S. 2009. Evolution of the Translational GTPase
Superfamily. J Biomol Struct Dyn. 26:841–842.
Baldauf SL, Palmer JD. 1990. Evolutionary transfer of the chloroplast
tufA gene to the nucleus. Nature 344:262–265.
Baldauf SL, Palmer JD, Doolittle WF. 1996. The root of the universal
tree and the origin of eukaryotes based on elongation factor
phylogeny. Proc Natl Acad Sci U S A. 93:7749–7754.
Bergsten J. 2005. A review of long-branch attraction. Cladistics
21:163–193.
Bhargava K, Templeton P, Spremulli L. 2004. Expression and
characterization of isoform 1 of human mitochondrial elongation factor G. Protein Expr Purif. 37:368–376.
Blanchard JL, Hicks JS. 1999. The non-photosynthetic plastid in
malarial parasites and other apicomplexans is derived from
outside the green plastid lineage. J Eukaryot Microbiol. 46:367–375.
Brindefalk B, Viklund J, Larsson D, Thollesson M, Andersson S. 2007.
Origin and evolution of the mitochondrial aminoacyl-tRNA
synthetases. Mol Biol Evol. 24:743–756.
Brochier C, Bapteste E, Moreira D, Philippe H. 2002. Eubacterial
phylogeny based on translational apparatus proteins. Trends
Genet. 18:1–5.
Brochier C, Philippe H, Moreira D. 2000. The evolutionary history of
ribosomal protein RpS14: horizontal gene transfer at the heart of
the ribosome. Trends Genet. 16:529–533.
Cammarano P, Creti R, Sanangelantoni AM, Palm P. 1999. The
archaea monophyly issue: a phylogeny of translational elongation factor G(2) sequences inferred from an optimized selection
of alignment positions. J Mol Evol. 49:524–537.
Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P.
2006. Toward automatic reconstruction of a highly resolved tree
of life. Science 311:1283–1287.
Ciferri O, Tiboni O. 1973. Elongation factors for chloroplast and
mitochondrial protein synthesis in Chlorella vulgaris. Nat New
Biol. 245:209–211.
Cole JR, Wang Q, Cardenas E, et al. (11 co-authors). 2009. The
ribosomal database project: improved alignments and new tools
for rRNA analysis. Nucleic Acids Res. 37:D141–D145.
Connell SR, Takemoto C, Wilson DN, et al. (15 co-authors). 2007.
Structural basis for interaction of the ribosome with the
switch regions of GTP-bound elongation factors. Mol Cell. 25:
751–764.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the
reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Doolittle W. 1999. Phylogenetic classification and the universal tree.
Science 284:2124–2129.
Duchene AM, Giritch A, Hoffmann B, Cognat V, Lancelin D,
Peeters NM, Zaepfel M, Marechal-Drouard L, Small ID. 2005.
Dual targeting is the rule for organellar aminoacyl-tRNA
synthetases in Arabidopsis thaliana. Proc Natl Acad Sci U S A.
102:16484–16489.
Emanuelsson O, Nielsen H, Brunak S, von Heijne G. 2000. Predicting
subcellular localization of proteins based on their N-terminal
amino acid sequence. J Mol Biol. 300:1005–1016.
Embley TM, Martin W. 2006. Eukaryotic evolution, changes and
challenges. Nature 440:623–630.
Gao N, Zavalla AV, Ehrenberg M, Frank J. 2007. Specific interaction
between EF-G and RRF and its implication for GTP-dependent
ribosome splitting into subunits. J Mol Biol. 374:1345–1358.
Gao YG, Selmer M, Dunham CM, Weixlbaumer A, Kelley AC,
Ramakrishnan V. 2009. The structure of the ribosome with
elongation factor G trapped in the posttranslocational state.
Science 326:694–699.
Garcia-Ortega L, Alvarez-Garcia E, Gavilanes JG, Martinez-DelPozo A, Joseph S. 2010. Cleavage of the sarcin-ricin loop of 23S
rRNA differentially affects EF-G and EF-Tu binding. Nucleic Acids
Res. 38:4108–4119.
Grandi M, Kuntzel H. 1970. Mitochondrial peptide chain elongation
factors from Neurospora crassa. FEBS Lett. 10:25–28.
Gu X. 2006. A simple statistical method for estimating type-II
(cluster-specific) functional divergence of protein sequences.
Mol Biol Evol. 23:1937–1945.
Gu X. 2001. Maximum-likelihood approach for gene family
evolution under functional divergence. Mol Biol Evol.
18:453–464.
Gu X, Vander Velden K. 2002. DIVERGE: phylogeny-based analysis
for functional-structural divergence of a protein family.
Bioinformatics 18:500–501.
Hauryliuk V, Mitkevich VA, Draycheva A, Tankov S, Shyp V,
Ermakov A, Kulikova AA, Makarov AA, Ehrenberg M. 2009.
Thermodynamics of GTP and GDP binding to bacterial initiation
factor 2 suggests two types of structural transitions. J Mol Biol.
394:621–626.
Hauryliuk V, Mitkevich VA, Eliseeva NA, Petrushanko IY,
Ehrenberg M, Makarov AA. 2008. The pretranslocation ribosome
is targeted by GTP-bound EF-G in partially activated form. Proc
Natl Acad Sci U S A. 105:15678–15683.
Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH.
2007. SUBA: the Arabidopsis subcellular database. Nucleic Acids
Res. 35:D213–D218.
Hillis DM, Bull JJ. 1993. An empirical test of bootstrapping as
a method for assessing confidence in phylogenetic analysis. Syst
Biol. 42:182–192.
Hirokawa G, Nijman RM, Raj VS, Kaji H, Igarashi K, Kaji A. 2005. The
role of ribosome recycling factor in dissociation of 70S
ribosomes into subunits. RNA 11:1317–1328.
Ito K, Fujiwara T, Toyoda T, Nakamura Y. 2002. Elongation factor G
participates in ribosome disassembly by interacting with
ribosome recycling factor at their tRNA-mimicry domains. Mol
Cell. 9:1263–1272.
Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989.
Evolutionary relationship of archaebacteria, eubacteria, and
eukaryotes inferred from phylogenetic trees of duplicated genes.
Proc Natl Acad Sci U S A. 86:9355–9359.
Jain R, Rivera M, Lake J. 1999. Horizontal gene transfer among
genomes: the complexity hypothesis. Proc Natl Acad Sci U S A.
96:3801–3806.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5:
improvement in accuracy of multiple sequence alignment.
Nucleic Acids Res. 33:511–518.
Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE,
Roger AJ, Gray MW. 2005. The tree of eukaryotes. Trends Ecol
Evol. 20:670–676.
Kurland C, Andersson S. 2000. Origin and evolution of the
mitochondrial proteome. Microbiol Mol Biol Rev. 64:786–820.
Lang BF, Burger G, O’Kelly CJ, Cedergren R, Golding GB, Lemieux C,
Sankoff D, Turmel M, Gray MW. 1997. An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature
387:493–497.
Lathe WC 3rd, Snel B, Bork P. 2000. Gene context conservation of
a higher order than operons. Trends Biochem Sci. 25:474–479.
Leipe D, Wolf Y, Koonin E, Aravind L. 2002. Classification and
evolution of P-loop GTPases and related ATPases. J Mol Biol.
317:41–72.
Maglott D, Ostell J, Pruitt K, Tatusova T. 2007. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 35:D26–D31.
Makarova KS, Ponomarev VA, Koonin EV. 2001. Two C or not
two C: recurrent disruption of Zn-ribbons, gene duplication,
lineage-specific gene loss, and horizontal gene transfer in
evolution of bacterial ribosomal proteins. Genome Biol. 2:RESEARCH 0033
Margus T, Remm M, Tenson T. 2007. Phylogenetic distribution of
translational GTPases in bacteria. BMC Genomics. 8:15.
Martin W. 2003. Gene transfer from organelles to the nucleus:
frequent and in big chunks. Proc Natl Acad Sci U S A.
100:8612–8614.
Nosenko T, Lidie K, Van Dolah F, Lindquist E, Cheng J,
Bhattacharya D. 2006. Chimeric plastid proteome in the Florida
"red tide" dinoflagellate Karenia brevis. Mol Biol Evol.
23:2026–2038.
Nyborg J, Nissen P, Kjeldgaard M, Thirup S, Polekhina G, Clark BF.
1996. Structure of the ternary complex of EF-Tu: macromolecular mimicry in translation. Trends Biochem Sci. 21:81–82.
Omelchenko M, Makarova K, Wolf Y, Rogozin I, Koonin E. 2003.
Evolution of mosaic operons by horizontal gene transfer and
gene displacement in situ. Genome Biol. 4:R55.
Pino P, Foth BJ, Kwok LY, Sheiner L, Schepers R, Soldati T, Soldati-Favre D. 2007. Dual targeting of antioxidant and metabolic
enzymes to the mitochondrion and the apicoplast of Toxoplasma gondii. PLoS Pathog. 3:e115.
Post L, Arfsten A, Reusser F, Nomura M. 1978. DNA sequences of
promoter regions for the str and spc ribosomal protein operons
in E. coli. Cell 15:215–229.
Reyes-Prieto A, Yoon HS, Moustafa A, Yang EC, Andersen RA,
Boo SM, Nakayama T, Ishida KI, Bhattacharya D. 2010.
Differential gene retention in plastids of common recent origin.
Mol Biol Evol. 27:1530–1537.
Robinson-Rechavi M, Huchon D. 2000. RRTree: relative-rate tests
between groups of sequences on a phylogenetic tree.
Bioinformatics 16:296–297.
Rodnina MV, Savelsbergh A, Katunin VI, Wintermeyer W. 1997.
Hydrolysis of GTP by elongation factor G drives tRNA
movement on the ribosome. Nature 385:37–41.
Ronquist F, Huelsenbeck J. 2003. MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics 19:1572–1574.
Savelsbergh A, Rodnina MV, Wintermeyer W. 2009. Distinct
functions of elongation factor G in ribosome recycling and
translocation. RNA 15:772–780.
Sergiev PV, Bogdanov AA, Dontsova OA. 2005. How can
elongation factors EF-G and EF-Tu discriminate the functional
state of the ribosome using the same binding site? FEBS Lett.
579:5439–5442.
Sickmann A, Reinders J, Wagner Y, et al. (13 co-authors). 2003. The
proteome of Saccharomyces cerevisiae mitochondria. Proc Natl
Acad Sci U S A. 100:13207–13212.
Smith DG, Gawryluk RM, Spencer DF, Pearlman RE, Siu KW,
Gray MW. 2007. Exploring the mitochondrial proteome of
the ciliate protozoon Tetrahymena thermophila: direct
analysis by tandem mass spectrometry. J Mol Biol. 374:
837–863.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models.
Bioinformatics 22:2688–2690.
Suematsu T, Yokobori SI, Morita H, Yoshinari S, Ueda T, Kita K,
Takeuchi N, Watanabe YI. 2010. A bacterial elongation factor
G homolog exclusively functions in ribosome recycling in
the spirochaete Borrelia burgdorferi. Mol Microbiol. 75:
1445–1454.
Talavera G, Castresana J. 2007. Improvement of phylogenies after
removing divergent and ambiguously aligned blocks from
protein sequence alignments. Syst Biol. 56:564–577.
Ticu C, Nechifor R, Nguyen B, Desrosiers M, Wilson KS. 2009.
Conformational changes in switch I of EF-G drive its directional
cycling on and off the ribosome. EMBO J. 28:2053–2065.
Timmis JN, Ayliffe MA, Huang CY, Martin W. 2004. Endosymbiotic
gene transfer: organelle genomes forge eukaryotic chromosomes.
Nat Rev Genet. 5:123–135.
Tsuboi M, Morita H, Nozaki Y, Akama K, Ueda T, Ito K,
Nierhaus KH, Takeuchi N. 2009. EF-G2mt is an exclusive
recycling factor in mammalian mitochondrial protein synthesis.
Mol Cell. 35:502–510.
Wellner A, Lurie MN, Gophna U. 2007. Complexity, connectivity,
and duplicability as barriers to lateral gene transfer. Genome Biol.
8:R156.
Zavialov A, Hauryliuk V, Ehrenberg M. 2005. Splitting of the
posttermination ribosome into subunits by the concerted action
of RRF and EF-G. Mol Cell. 18:675–686.
Zuegge J, Ralph S, Schmuker M, McFadden GI, Schneider G. 2001.
Deciphering apicoplast targeting signals—feature extraction
from nuclear-encoded precursors of Plasmodium falciparum
apicoplast proteins. Gene 280:19–26.