Origin and Evolution of the Mitochondrial Aminoacyl-tRNA Synthetases
Björn Brindefalk, Johan Viklund, Daniel Larsson, Mikael Thollesson, and Siv G. E. Andersson
Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, Uppsala, Sweden
Many theories favor a fusion of 2 prokaryotic genomes for the origin of the Eukaryotes, but there are disagreements on the
origin, timing, and cellular structures of the cells involved. Equally controversial is the source of the nuclear genes for
mitochondrial proteins, although the α-proteobacterial contribution to the mitochondrial genome is well established. Phylogenetic inferences show that the nuclearly encoded mitochondrial aminoacyl-tRNA synthetases (aaRSs) occupy a position in the tree that is not close to any of the currently sequenced α-proteobacterial genomes, despite cohesive and
remarkably well-resolved α-proteobacterial clades in 12 of the 20 trees. Two or more α-proteobacterial clusters were
observed in 8 cases, indicative of differential loss of paralogous genes or horizontal gene transfer. Replacement and retargeting events within the nuclear genomes of the Eukaryotes were indicated in 10 trees, 4 of which also show split α-proteobacterial groups. A majority of the mitochondrial aaRSs originate from within the bacterial domain, but none
specifically from the α-Proteobacteria. For some aaRS, the endosymbiotic origin may have been erased by ongoing gene
replacements on the bacterial as well as the eukaryotic side. For others that accurately resolve the α-proteobacterial divergence patterns, the lack of affiliation with mitochondria is more surprising. We hypothesize that the ancestral eukaryotic
gene pool hosted primordial ‘‘bacterial-like’’ genes, to which a limited set of α-proteobacterial genes, mostly coding for
components of the respiratory chain complexes, were added and selectively maintained.
Introduction
The evolutionary origin of the eukaryotic genome is
debated, particularly the extent to which it is the product
of chimerism, genome fusion, or endosymbiotic events
(Embley and Martin 2006; Kurland et al. 2006). A hypothesis based on a recent analysis of gene content data is that
the eukaryotic genome is the result of a fusion of 2 prokaryotic genomes (Rivera and Lake 2004). Likewise, various
endosymbiotic models for the origin of mitochondria and
chloroplasts from α-Proteobacteria and Cyanobacteria imply transfers of bacterial genes into the nuclear genome of
the eukaryotic host (reviewed in Gray et al. 1999; Dyall
et al. 2004; Kurland et al. 2006). Yet, the question of
whether the eubacterial partner in the fusion event corresponds to the endosymbiont that contributed the mitochondrial genome (Martin and Koonin 2006) is left unresolved
(Martin and Embley 2004; Bapteste and Walsh 2005). Also
debated is whether the host was a nucleus-less archaebacterium (Martin 2005) or a highly compartmentalized
eukaryotic cell (Mans et al. 2003).
Endosymbiotic models for the origin of mitochondria
posit that there was a massive transfer of genes from the
bacterial endosymbiont to the nuclear genome of the host.
Thus, the expectation is that mitochondrial and nuclear genomes of the Eukaryotes contain genes of α-proteobacterial
ancestry, as verified by case studies of mitochondrial proteins involved in aerobic respiration (Gray et al. 1999; Hrdy
et al. 2004; Fitzpatrick et al. 2006). However, broader phylogenomic studies show that less than 20% of the proteins
examined in the mitochondrial proteomes can be traced
back with confidence to the α-Proteobacteria (Gabaldon
and Huynen 2003; Karlberg and Andersson 2003). More
than 50% of the proteins in the mitochondrial proteomes
have no bacterial homologs, and for the remaining circa
30% that have bacterial homologs, a confirmation of the α-proteobacterial descent is lacking (Karlberg and Andersson
2003). The problems in identifying the source of the many
bacterial-like genes in the eukaryotic genome have been attributed to rampant horizontal gene transfer, differential
gene deletion, extensive duplication, and loss of the phylogenetic signal at deep evolutionary divergences (Creevey
et al. 2004; Martin and Embley 2004; Bapteste and Walsh
2005). Indeed, only a small set of essential and broadly
distributed core proteins may retain enough of the phylogenetic signal to support inferences of very deep evolutionary
relationships.
Key words: mitochondria, phylogeny, aminoacyl-tRNA synthetase.
Mol. Biol. Evol. 24(3):743–756. 2007
doi:10.1093/molbev/msl202
Advance Access publication December 20, 2006
The focus of this study is on the evolutionary origin of
the genes for the aminoacyl-tRNA synthetases (aaRS). Because of their ubiquity, conservation, specificity, and defined
interactions in protein synthesis, the aaRS represent important keys to resolve early cellular evolution (Diaz-Lazcoz
et al. 1998; Wolf et al. 1999; Woese et al. 2000; Brown
2001, 2003). Although the canonical patterns have been
partially eroded by duplication, divergence, and horizontal
gene transfers (Brown et al. 2003), traces of ancestral relationships are still evident in many aaRS trees (Woese
et al. 2000). A particular advantage with the aaRSs for the
purpose of this study is that there are normally 2 nuclear genes
of different origins that are targeted to the cytoplasm and the
mitochondrion (Kurland and Andersson 2000). This makes it
easy to identify and exclude cases of intranuclear gene duplication and replacement events, disguised as duplicate genes
of the same origin or as a single gene of either bacterial or
eukaryotic descent (Kurland and Andersson 2000).
Our analysis of the aaRS shows that the phylogenetic
signal is retained in a majority of cases, as indicated by well-resolved α-proteobacterial species divergences. Yet, none of
the mitochondrial aaRS cluster within the α-proteobacterial
clade. This contrasts with the divergence pattern observed for
other proteins in the mitochondrial energy production system, which cluster with the α-Proteobacteria. The implication of these conflicting signals is discussed in the context
of suggested fusion models for the origin of the nuclear
genomes of the Eukaryotes.
Methods
Sequence Data
Complete genome sequence data as well as protein
sequences of the aaRS were directly retrieved from the
National Center for Biotechnology Information database
(http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). In the
very few cases where no previous annotation was available,
the aaRS genes were identified using the BlastP algorithm
(Altschul et al. 1997) with the sequence from the most
closely related species used as a query and the best hit
in the genome presumed to be the homolog. The data sets
used in the analysis consisted of a total of 70 species, selected from completely sequenced genomes (as of 1 October
2005). All published α-proteobacterial genome sequences,
a representative selection of other eubacterial groups, and
a number of Archaea and Eukaryotes were included for
the phylogenetic analysis of the mitochondrial aaRS. In
some cases the number of species was lower when no
apparent homolog could be found.
© 2006 The Authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Sequence Alignment
Protein sequences were aligned using ClustalW 1.81
(Thompson et al. 1994), and gaps were manually edited with
the aid of the Seaview alignment editor (Galtier et al. 1996).
The program SOAP, version 1.1 (Löytynoja and Milinkovitch
2001) was used to check for regions of ambiguous alignment
by realignments using a wide range of different parameter settings. Nucleotide sequences were aligned using the DiAlign
software, version 2.2.1 (Morgenstern 2004), with similarity
calculated at the peptide level.
Model Selection
Appropriate protein models were selected for each of
the aaRS using the software MODELGENERATOR, version 0.82 (Keane et al. 2006). In all cases the proposed
model was WAG (Whelan and Goldman 2001).
Phylogenetic Inference
Neighbor-Joining (NJ) trees were constructed using
the amino acid sequences after global and pairwise gap removal, using Phylo_win, version 2.0 (Galtier et al. 1996) on
Poisson distances. Maximum likelihood (ML) analyses
were done with PHYML, version 2.4.4 (Guindon and
Gascuel 2003) using the most appropriate model of protein
evolution. Bootstrap support values were derived from 500
replicates for the NJ analyses and 100 replicates for the ML
analyses, with the exception of the ML LeuRS tree, for
which 500 bootstrap replicates were done. The above steps
were performed for all the 18 aaRS, unless otherwise stated.
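As an illustration of the distance step only, the following minimal Python sketch computes a Poisson-corrected distance, d = -ln(1 - p), where p is the fraction of differing sites, after either global or pairwise gap removal. It is not the Phylo_win implementation; the function names, the gap symbol "-", and the input format (aligned sequences as plain strings) are assumptions made for the example.

```python
import math

GAP = "-"  # assumed gap symbol in the aligned sequences


def global_gap_removal(alignment):
    """Drop every alignment column in which at least one sequence has a gap."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(n_sites) if all(seq[i] != GAP for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def pairwise_gap_removal(seq_a, seq_b):
    """Keep only the columns in which neither of the two sequences has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [(a, b) for a, b in zip(seq_a, seq_b) if a != GAP and b != GAP]


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) over pairwise ungapped sites."""
    columns = pairwise_gap_removal(seq_a, seq_b)
    if not columns:
        raise ValueError("no ungapped columns shared by the two sequences")
    p = sum(1 for a, b in columns if a != b) / len(columns)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is the kind of input an NJ algorithm takes; the tree construction and bootstrap resampling themselves are not shown here.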
Bayesian analyses were done on the PheRS and
LeuRS alignments using protein models with MrBayes
(MPI version; Huelsenbeck and Ronquist 2001). We applied
a fixed rate (empirical) mixture model with a gamma distributed rate variation; the overwhelmingly dominating
contribution came from the WAG model. Two independent
runs, each with 4 differently heated Markov chains, were
run for 10^6 generations, and the first 10^5 generations were
discarded as burn-in; all free parameters were examined for
convergence. LogDet distances (Lockhart et al. 1994) were
calculated to account for differences in amino acid bias
using LDDist version 1.4 (Thollesson 2004). NJ trees as
well as Neighbor Net networks to visualize conflicting
phylogenetic signals (Bryant and Moulton 2004) were
constructed with SplitsTree, version 4 (Huson 1998). NJ
trees were also constructed using the distance method
of Galtier and Gouy (1995) on nucleotides from first and
second codon positions.
Phylogenetic Hypothesis Testing
To test the hypothesis that the best phylogeny in which
the mitochondrial sequences form a clade with the α-Proteobacteria is not significantly worse than the unconstrained best hypothesis, we used the SOWH test (Swofford
et al. 1996) as described by Goldman and coworkers (as
posPfud; Goldman et al. 2000) on the LeuRS and PheRS
data sets. The test statistic is the (double) difference in
log likelihood between the unconstrained and constrained
optimal trees, and the null distribution is calculated using
parametric bootstrapping. TreeFinder, version of October
2005 (Jobb et al. 2004), was used to find the best unconstrained and constrained trees (the latter forcing mitochondrial and α-proteobacterial sequences to form a clade), and simulated
data sets (100 replicates) were created with Seq-Gen, version
1.3.2 (Rambaut and Grassly 1997) on the best-constrained
hypothesis. The substitution model and parameters used
were the one found by the Bayesian analysis (WAG, gamma
distributed heterogeneity). For the simulated data sets,
amino acid sites corresponding to gaps in the real aligned
data were replaced with gaps. Recombination tests were performed using Topali V2, version 2.09 (Milne et al. 2004).
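The arithmetic of the test described above can be sketched in a few lines of Python. The sketch assumes that the log likelihoods of the unconstrained and constrained optimal trees have already been obtained, for the real alignment and for each simulated data set, with external software; the function names are hypothetical. The P value convention used here (the fraction of simulated statistics at least as large as the observed one) is an assumption for illustration and is not taken from the text.

```python
def sowh_statistic(lnl_unconstrained, lnl_constrained):
    """Twice the log-likelihood difference between the two optimal trees."""
    return 2.0 * (lnl_unconstrained - lnl_constrained)


def sowh_p_value(observed, simulated):
    """Fraction of parametric-bootstrap statistics >= the observed statistic.

    `simulated` holds one statistic per data set simulated under the
    constrained (null) hypothesis, e.g. 100 values for 100 replicates.
    """
    if not simulated:
        raise ValueError("at least one simulated statistic is required")
    n_extreme = sum(1 for value in simulated if value >= observed)
    return n_extreme / len(simulated)
```

Under this convention, with 100 replicates a result of P < 0.01 corresponds to no simulated statistic reaching the observed value.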
Relative Substitution Frequency Estimates
To assess if LeuRS and/or PheRS sequences show an
elevated substitution rate in Eukaryotes compared with the
α-Proteobacteria, we did a separate Bayesian analysis of the
2 groups. The data were composed of 6 selected amino acid
sequences that in Eukaryotes represent mitochondrial and
nuclear genes, as well as mitochondrial and cytoplasmic
proteins, in addition to PheRS and LeuRS. The data were
partitioned into 8 separate character partitions, each with
its own rate heterogeneity parameter, but with the same topology. Each partition, however, had its own rate multiplier
allowing for different relative substitution rates. These
relative rates from the analyses of the eukaryote and α-proteobacterial groups were then compared.
Phylogenomic Analyses
The sequences from the 630 putative protomitochondrial proteins were extracted from the supplementary data
in Gabaldon and Huynen (2003) and blasted against all prokaryotic genomes (as of 1 October 2005,
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi) using BlastP (Altschul
et al. 1997). Best hits (E < 10^-3) were extracted and
aligned using ClustalW 1.81 (Thompson et al. 1994). NJ
trees were generated for our updated set of homologous proteins and compared with the published set of homologous
proteins (Gabaldon and Huynen 2003). The constructed
trees were placed in a database that also included the original trees (Gabaldon and Huynen 2003). A viewer was written in-house to enable visual inspection and comparison of
the 2 sets of trees.
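The best-hit extraction step can be illustrated with a short Python sketch. It assumes that the BlastP results have already been parsed into simple records; the `Hit` record, its field names, and the tie-breaking rule (higher bit score wins when E values are equal) are hypothetical choices for the example, and only the E < 10^-3 cutoff comes from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed BlastP hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    bitscore: float


E_VALUE_CUTOFF = 1e-3  # cutoff stated in the text: E < 10^-3


def best_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return the best hit per query among hits with E value below the cutoff."""
    best = {}
    for hit in hits:
        if hit.evalue >= cutoff:
            continue  # discard hits that fail the E-value threshold
        current = best.get(hit.query)
        # Lower E value is better; ties are broken by the higher bit score.
        if current is None or (hit.evalue, -hit.bitscore) < (current.evalue, -current.bitscore):
            best[hit.query] = hit
    return best
```

Running this once per target genome would give one candidate homolog per query and genome, which could then be passed to the alignment step.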
FIG. 1. The cohesion and divergence patterns of the α-proteobacterial genomes are supported by 12 of the aaRSs. (A) The topology, bootstrap
support values, and genes included in the α-proteobacterial ‘‘species tree’’ were taken from Boussau et al. (2004). (B) The species tree topology is broadly
supported by 12 aaRSs according to a ML analysis of the same set of 13 α-proteobacterial species used for construction of the species tree (supplementary
fig. 2, Supplementary Material online). Bootstrap support values in boxes are based on ML analyses applied to the PheRS, LeuRS, AspRS, ProRS, TyrRS,
TrpRS, AlaRS, AsnRS, CysRS, GlyRS, SerRS, ThrRS, ValRS, MetRS, ArgRS, HisRS, IleRS, and GlnRS alignments in that order, to be read from the left
to the right and from the top to the bottom. Bootstrap support values shown in the box at node A were taken from supplementary figure 1 (Supplementary
Material online) and those inside the boxes at other nodes were taken from supplementary figure 2 (Supplementary Material online). Bootstrap support
values not inside boxes were taken from ML analyses of the PheRS and LeuRS alignments (supplementary fig. 2, Supplementary Material online). Only
bootstrap support values above 50% are shown; ‘‘_’’ refers to nodes that are not resolved. The animals and plants indicate the eukaryotic hosts for modern α-proteobacterial species.
Results
We were interested in examining the α-proteobacterial
contribution to the eukaryotic genome by phylogenetic inference of the mitochondrial aaRS. To this end we inferred
tree topologies for each of the 20 aaRS extracted from more
than 20 α-proteobacterial and 10 eukaryotic genomes among
a total of more than 70 genomes using NJ and ML methods.
We sorted the aaRS trees into 3 broad groups based on the
cohesion of the α-Proteobacteria and the number of nuclear
genes for the mitochondrial and cytoplasmic aaRS. The rationale was that the proteins most likely to disclose the origin
of the mitochondrial aaRS are those that resolve the α-proteobacterial divergence pattern and for which 2 different
nuclear genes are present in the eukaryotic genome, 1 for the
mitochondrial and 1 for the cytoplasmic enzyme.
Cohesion of the α-Proteobacteria
We first tested the retention of the phylogenetic signal for divergences corresponding to the deepest node in
the α-proteobacterial clade (fig. 1). The cohesion of the
α-proteobacterial subdivision was supported in 12 of the
20 examined aaRS trees, 11 with bootstrap support values
above 85% in the NJ and/or the ML analysis (supplementary fig. 1, Supplementary Material online). A species tree
was previously inferred for the α-proteobacterial species for
which complete genome data are available (Boussau et al.
2004; Fitzpatrick et al. 2005). The species topology suggests that members of the Rhizobiales (including human
and animal pathogens such as Bartonella and Brucella
spp. and plant-associated species such as Sinorhizobium
spp.) cluster distinctly from obligate intracellular bacteria
of the order Rickettsiales (Rickettsia, Wolbachia, Anaplasma,
and Ehrlichia) (Boussau et al. 2004; Fitzpatrick et al. 2005).
This divergence pattern within the α-Proteobacteria was
observed in all of the 12 cases, typically with bootstrap support values above 90% (supplementary fig. 2, Supplementary
Material online). Thus, in 12 of the 20 cases (60%), the tree topologies seem not to be distorted by horizontal gene transfer
events that involve the α-Proteobacteria.
FIG. 2. Schematic illustration of the relative placement of α-proteobacterial and mitochondrial taxa in each of the 20 aaRS trees. Letters and colors
of the triangles refer to broad species categories: Red, Alfa = α-Proteobacteria; white, Bac = Bacteria; yellow, Mito = mitochondria; blue, Cyto =
eukaryote cytoplasmic; black, Arc = Archaea; green, Plas = plastid and/or plant mitochondria; and mixed blue and yellow, Euk = eukaryotic cytoplasmic
and mitochondrial; shaded colors indicate mixed groups and a single line in a triangle indicates a single species of a different category. Sizes of the triangles
are proportional to the number of species in each group. Numbers refer to bootstrap support values (>75%) for inferences based on ML and NJ methods in
that order. The rightmost values in the PheRS and LeuRS trees were taken from a Bayesian analysis.
A schematic representation of the 20 aaRS trees is
shown in figure 2 (for the original trees, see supplementary
fig. 1, Supplementary Material online). In 6 of the 12 trees that
supported a monophyletic grouping of the α-Proteobacteria,
2 eukaryotic genes of different origins were identified (PheRS,
LeuRS, ProRS, AspRS, TrpRS, and TyrRS). These represent
our best candidates for tracing the ancestral origin of the
mitochondrial aaRS. Losses and putative replacements of
the nuclear genes were observed in the other 6 of these 12
trees (AlaRS, GlyRS, CysRS, ThrRS, AsnRS, and SerRS).
The remaining 8 trees revealed 2 or more divergent
α-proteobacterial clusters (HisRS, ValRS, IleRS, GluRS,
ArgRS, MetRS, GlnRS, and LysRS), indicative of lineage-specific loss of paralogous genes and/or horizontal gene transfers across the α-proteobacterial borders. Below, we discuss
the placement of the mitochondrial lineage in each of the 20
tree topologies.
Phylogenetic Inference of PheRS, LeuRS, AspRS,
ProRS, TyrRS, and TrpRS
To begin, we examined the topology of the PheRS,
LeuRS, AspRS, ProRS, TyrRS, and TrpRS trees, each of
which resolved the α-proteobacterial divergence pattern
and was represented by 2 nuclear genes for the mitochondrial and cytoplasmic aaRS, respectively. Phylogenetic
hypotheses were constructed with the aid of ML, NJ (supplementary fig. 1, Supplementary Material online), and
Neighbor Net methods (supplementary fig. 3, Supplementary Material online). Additionally, we applied Bayesian
analyses to the PheRS and LeuRS alignments (fig. 3).
All trees were consistent with the 3-domain hypothesis
in that the cytoplasmic and archaeal aaRS formed a group
to the exclusion of the bacterial and mitochondrial proteins
(Wolf et al. 1999; Woese et al. 2000). However, irrespective of the method used and the aaRS analyzed, the results
consistently placed the node of the mitochondrial divergence within the bacterial domain, but distinct from the
α-proteobacterial clade (fig. 2).
The PheRS tree (fig. 3A; supplementary fig. 1, Supplementary Material online) showed similar divergence patterns within the mitochondrial and cytoplasmic clusters,
with each clade being supported by bootstrap support values of 100% in the ML analysis. The mitochondrial aaRS
represented a deeply diverging clade within the bacterial domain, whereas the cytoplasmic aaRS were most similar to
their homologs in the Archaea. The bacterial PheRS consist
of 2 subunits, α and β, which are encoded by the pheS and pheT
genes that are normally situated in the same operon (Brown
2001). The mitochondrial PheRS is a fusion product of the
N-terminal part of the α-subunit and the C-terminal part of
the β-subunit (Roy et al. 2005). We concatenated the 2 subunits in species where the protein was split. However, to
ensure that the PheRS topology was not obscured by different signals from the 2 PheRS subunits, we also inferred
phylogenetic trees separately for the α- and β-chains. Both
subunits individually yielded the same topology as that obtained for the combined sequences (data not shown).
The LeuRS tree (fig. 3B; supplementary fig. 1, Supplementary Material online) showed a similar separation of a
broad bacterial group (that includes mitochondria) from an
archaeal-cytosolic group, except that the LeuRS from Halobacterium sp. was of the bacterial type (Dohm et al. 2006).
As in the analysis of PheRS, identical divergence patterns
were observed for the 2 sets of nuclear-encoded proteins,
with each of the cytoplasmic and mitochondrial clusters being supported by high bootstrap support values. The mitochondrial lineage clustered within the bacterial domain,
but showed no particular affiliation with either the α-Proteobacteria or any other bacterial group included in the analysis.
The ProRS tree (fig. 2; supplementary fig. 1, Supplementary Material online) showed some exceptions to the
universal pattern as also noted previously (Woese et al.
2000), such as the placement of the plastids and some bacterial species within the eukaryotic domain. Nevertheless,
similar divergence patterns were observed for the 2 sets of
nuclear-encoded proteins, with the mitochondrial aaRS representing a deeply diverging branch in the bacterial domain.
The mitochondrial AspRS lineage formed a cluster with
100% bootstrap support and was embedded in the bacterial
domain (fig. 2; supplementary fig. 1, Supplementary Material online), but again was not affiliated with any particular
bacterial group. The plant proteins clustered with the Cyanobacteria, consistent with their endosymbiotic origin. An
interesting detail in this tree is that Chlorobium tepidum
showed a close relationship with the α-Proteobacteria. Also
surprising was that Bacillus anthracis and Deinococcus radiodurans clustered within the archaeal domain.
Five bacterial subtypes were previously suggested for
TrpRS, with D. radiodurans containing 2 variants (Woese
et al. 2000). Like in the previous tree topologies, the mitochondrial enzymes were embedded within the bacterial domain, whereas the cytoplasmic aaRS clustered with those of
the Archaea (fig. 2; supplementary fig. 1, Supplementary
Material online). The plant organelle aaRS clustered with
the Cyanobacteria, suggesting that they were acquired via
the chloroplast endosymbiont. Two or more bacterial subtypes were also suggested for TyrRS (Woese et al. 2000). In
the TyrRS tree presented here, the mitochondrial aaRS from
animals and fungi represent a distinct clade in the bacterial
domain, distantly related to a large group of bacteria including Chlamydia spp., Escherichia coli, and Bacillus subtilis.
The cytoplasmic TyrRS cluster with the Archaea as expected, but there is an interesting split between the animal,
fungal, and Encephalitozoon aaRSs in one clade and the
plant cytoplasmic aaRS together with Trypanosoma cruzi
and Giardia intestinalis in another clade (fig. 2; supplementary fig. 1, Supplementary Material online).
Split α-Proteobacterial Tree Topologies
Deviations from the inferred α-proteobacterial species
tree were observed in the remaining 8 aaRS trees in that not
all α-proteobacterial species formed a monophyletic clade
(fig. 2; supplementary fig. 1, Supplementary Material
online). Rather, the α-proteobacterial genes for HisRS,
ValRS, IleRS, GluRS, GlnRS, ArgRS, MetRS, and LysRS
were split into 2 or more clusters, each of which was supported by bootstrap support values higher than 90%. The
divergence pattern within each cluster was typically congruent with the expected species divergence pattern, suggesting
differential loss of ancient paralogs or rare cases of gene
exchanges across the domain borders that involved the α-Proteobacteria. The mitochondrial aaRS were often but
not always of bacterial origin, but in either case showed
no affiliation with any of the partial α-proteobacterial clades.
FIG. 3. The tree topologies obtained from alignments of (A) PheRS and (B) LeuRS in a representative set of species from Bacteria, Archaea, and
Eukaryotes. Numbers refer to bootstrap support values (>75%) with ML, NJ, and Bayesian methods, respectively. Colors refer to broad species categories
(red, α-Proteobacteria; yellow, mitochondria; blue, Eukaryotes; green, plant mitochondria/chloroplast). The PheRS tree was rooted between the divergence of Bacteria and Eukaryotes–Archaea. As outgroups for the LeuRS tree, we used the ValRS and IleRS. A complete set of trees for all of the 20 aaRSs
is shown in supplementary figure 1 (Supplementary Material online).
For example, 2 highly divergent α-proteobacterial
clades were observed in the HisRS tree, one of which
encompasses the Rickettsiales and additional bacterial species, whereas the other clade contains the Rhizobiales, other
Bacteria, the Eukaryotes, and the Archaea. Recently, we demonstrated
perfect coevolution of the HisRS sequence and the
tRNAHis identity elements for the 2 α-proteobacterial clades
(Ardell and Andersson 2006).
The IleRS and ValRS trees also suggested a split of the
α-Proteobacteria into 2 clades, but unlike the HisRS tree,
members of the Rickettsiales clustered in the same group as
the Archaea. It has previously been shown that 2 bacterial
genes, ileS1 and ileS2, are maintained in the Pseudomonas
fluorescens, Bacillus cereus, and B. anthracis genomes.
The latter gene, ileS2, belongs to the Archaea-cytoplasmic
clade and confers drug resistance to naturally produced
antibiotic compounds (Brown et al. 2003). Here, we show
that members of the Rickettsiales cluster with the ileS2
group, whereas the mitochondrial IleRS clusters with
Bacteria—only of the ileS1 type, including members of
the Rhizobiales.
Likewise, a second gene for MetRS has been identified
in Streptococcus pneumoniae that confers resistance to antibiotic compounds (Brown et al. 2003). Homologs of this
second gene were also observed in gram-positive bacteria,
and it was speculated that antibiotic resistance has served as
the selection agent for horizontal gene transfer of MetRS
genes (Brown et al. 2003). Considering this, it is perhaps
surprising that the α-proteobacterial divergences are nevertheless well resolved in our MetRS tree, with the exception
of the fresh and salt water isolates Caulobacter crescentus
and Silicibacter pomeroyi that cluster with the cytoplasmic
and archaeal homologs (fig. 2; supplementary fig. 1, Supplementary Material online). The MetRS tree topology is consistent with an early diverging mitochondrial clade within
the Bacteria, distinct from the rest of the α-Proteobacteria.
GluRS conforms to the classical 3-domain pattern in
that the archaeal and cytoplasmic aaRS were well separated
from the bacterial group (Woese et al. 2000). This is the only
aaRS that is encoded by 2 or more paralogous genes in
some of the α-proteobacterial species (fig. 2; supplementary
fig. 1, Supplementary Material online). The mitochondrial
aaRS are clearly of bacterial origin, but not affiliated with
any of the 3 partial α-proteobacterial clades. Finally, we
show that a majority of the α-proteobacterial species contain LysRS of class I, whereas the single eukaryotic gene
for LysRS (which is unlikely to represent the ancestral
mitochondrial gene because it clusters with the Archaea)
belongs to class II, as do the plant-associated α-Proteobacteria.
Nuclear Replacements of Mitochondrial and Cytoplasmic
aaRS Genes
In a total of 8 trees, we identified only a single eukaryotic aaRS gene in animals and fungi, suggesting loss and
replacement of either the ancestral mitochondrial or cytoplasmic gene. Four of these aaRS (AlaRS, GlyRS, CysRS,
and ThrRS) revealed a monophyletic grouping of the α-Proteobacteria, whereas the other 4 (HisRS, ValRS, GlnRS,
and LysRS) showed 2 or more divergent α-proteobacterial
clades (fig. 2; supplementary fig. 1, Supplementary Material online). The clustering of the single eukaryotic protein
with archaeal proteins in the AlaRS, GlyRS, and the LysRS
trees might indicate that the ancestral cytoplasmic gene has
been retained. Vice versa, the clustering of the eukaryotic
lineage with bacterial species in the CysRS, ThrRS, ValRS,
and GlnRS trees signals retention of the ancestral mitochondrial aaRS, which now has a dual function in both the cytoplasm and the mitochondrion. None of the single eukaryotic
aaRS clustered with any of the α-proteobacterial clades.
Although we identified 2 nuclear genes for the cytoplasmic and mitochondrial enzymes in the AsnRS and
SerRS trees, neither displayed the characteristic mitochondrial-bacterial and cytoplasmic-archaeal patterns (fig. 2;
supplementary fig. 1, Supplementary Material online).
The AsnRS tree suggests several different subtypes, with
the mitochondrial and the cytoplasmic enzymes belonging
to the same overall broad group, consistent with ancestral
gene duplication and replacement events. We also found
a peculiar pattern for SerRS; the mitochondrial aaRS clustered with the Archaea, whereas the cytoplasmic enzyme
grouped with bacteria other than the α-Proteobacteria.
Taken together, our analysis suggests that the evolution
of these mitochondrial aaRS has been accompanied by loss,
duplication, and intranuclear gene replacement events. As
in all other trees, none clusters with the α-Proteobacteria.
Testing for Systematic Errors
We took great precautions to avoid systematic errors
(e.g., long-branch attraction) by applying phylogenetic
methods designed to minimize such problems. For example, we assessed the effects of removing fast-evolving sites
(as designated by the Shannon–Wiener index) when calculating LogDet distances on the alignments and subjected
them to Neighbor Net analyses (supplementary fig. 3, Supplementary Material online), as exemplified with PheRS
and LeuRS (fig. 4). The mitochondrial grouping essentially
remained stable until the phylogenetic signal was lost due to
the removal of too many sites. The only exception was the
TrpRS tree in which the mitochondrial clade approached that
of the α-Proteobacteria following the gradual removal of variable sites. We also tested the observed ML estimate for the
position of the mitochondrial clade in the PheRS and LeuRS
trees against a forced position within the α-proteobacterial
group with a likelihood ratio test using the SOWH procedure.
The test confirmed in both cases that a position of mitochondria within the α-proteobacterial group had a significantly
lower likelihood than the ML estimate (P < 0.01). Recombination tests revealed no instances of recombination events
that could explain the separate clusterings of mitochondria
and α-Proteobacteria (data not shown).
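The site-removal step mentioned above can be sketched as follows: each alignment column is scored with the Shannon–Wiener index, H = -Σ p_i ln p_i over the residue frequencies p_i in that column, and a chosen fraction of the highest-scoring columns is dropped. This is a minimal Python illustration, not the procedure actually used; the function names, the handling of gaps (ignored when computing frequencies), and the use of the natural logarithm are assumptions.

```python
import math
from collections import Counter


def shannon_index(column):
    """Shannon-Wiener index of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def remove_most_variable_sites(alignment, fraction):
    """Drop the given fraction of columns with the highest Shannon-Wiener index.

    `alignment` is a list of equal-length strings. The number of dropped
    columns is truncated to an integer; ties keep their original column order.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = [shannon_index([seq[i] for seq in alignment]) for i in range(n_sites)]
    ranked = sorted(range(n_sites), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[: int(n_sites * fraction)])
    return ["".join(seq[i] for i in range(n_sites) if i not in dropped)
            for seq in alignment]
```

Calling `remove_most_variable_sites(alignment, 0.3)` would correspond to excluding 30% of the sites; distances and networks would then be recomputed on the reduced alignment with separate software.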
Finally, to examine whether the mitochondrial LeuRS
and/or PheRS showed an elevated substitution rate relative
to the α-proteobacterial aaRS, we did a separate Bayesian
analysis of the 2 groups (fig. 5). For comparison, we also
examined 2 mitochondrially encoded proteins (COX-1 and
COB) as well as 4 mitochondrial components of the pyruvate (PdhA, PdhB, and PdhC) and NADH (NADH dehydrogenase subunit F) dehydrogenase components encoded
by the nuclear genomes. These were selected because they
support a clustering of mitochondria and α-Proteobacteria
(supplementary fig. 4, Supplementary Material online). In
most instances, the amino acid substitution frequencies
were slightly higher for the mitochondrial group, but this
effect was seen irrespective of the different tree topologies. Thus, the relative rate enhancement was no different
for genes that supported a clustering of mitochondria and a-Proteobacteria than for those that failed to do so, suggesting
that rate acceleration alone cannot explain the lack of affiliation of the mitochondrial aaRS with the a-Proteobacteria.
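The normalization behind this comparison (rates scaled so that the mean of the pyruvate dehydrogenase partitions equals 1 within each group, as stated in the legend of fig. 5) can be sketched as follows. The numerical values are hypothetical placeholders chosen only to show the arithmetic; they are not the estimates obtained in this study.

```python
def normalize_rates(rates, reference):
    """Scale relative rates so that the mean over the reference partitions is 1."""
    ref_mean = sum(rates[gene] for gene in reference) / len(reference)
    return {gene: rate / ref_mean for gene, rate in rates.items()}

# Hypothetical raw relative rates for one group (illustration only).
raw = {"PdhA": 2.0, "PdhC": 4.0, "PdhD": 6.0, "LeuRS": 8.0}
reference = ["PdhA", "PdhC", "PdhD"]
normalized = normalize_rates(raw, reference)
```

Applying the same function separately to the mitochondrial and the a-proteobacterial group puts both on a common scale, so that the rate of each aaRS partition can be compared between the groups.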
FIG. 4.—Neighbor net analyses of (A, C) PheRS and (B, D) LeuRS based on LogDet distances. The effects of removing the fastest evolving sites (here
ranked according to the Shannon–Wiener index) from the analyses (C, D) are illustrated with 30% of the sites excluded. Taxonomic groups are indicated
by letters: red = a-Proteobacteria, yellow = mitochondria, and blue = eukaryote. Abbreviations of species names are spelled out in supplementary table
2, Supplementary Material online.
The Protomitochondrial Proteome Revisited
A previous application of phylogenetic methods to
a set of more than 70,000 bacterial and eukaryotic proteins
identified 630 nuclear eukaryotic genes as ancestrally derived from the a-proteobacterial endosymbiont (Gabaldon
and Huynen 2003). That study was based on an automatic survey
at the whole-genome level for clusters that contain mitochondria and proteobacterial species. We reexamined the
630 genes proposed for the protomitochondrion with a 4-fold larger a-proteobacterial species set using more stringent criteria. To avoid spurious relationships across small
sets of taxa, we included only genes that were broadly present in Bacteria and Eukaryotes, defined as gene presence in
at least 10 out of 25 a-proteobacterial species, in 4 out of 9
Eukaryotes, and in 15 additional bacterial species. Using
this cutoff level for inclusion, we reevaluated 120 genes
in the protomitochondrial data set. Circa 40 trees supported
a clustering of a-Proteobacteria with mitochondria, the
most significant of which were observed for proteins involved in the pyruvate dehydrogenase and respiratory chain
complexes (see supplementary fig. 4, Supplementary Material online for a few examples). None of the aaRS trees
showed a similar support for the clustering of mitochondria
and a-Proteobacteria.
FIG. 5.—Relative substitution rates in the a-proteobacterial and the
eukaryotic clades. The relative rates were obtained from a Bayesian analysis of each group where the 8 different partitions (= genes) were allowed
to have different relative rates. The rate is normalized so the mean relative
rate of PdhA, PdhC, and PdhD equals 1 within each group. The 8 partitions
analyzed were cytochrome b (COB), cytochrome oxidase subunit 1 (COX1), NADH dehydrogenase subunit F (NuoF) and subunits of the pyruvate
dehydrogenase complex, E1 component alpha subunit (PdhA), dihydrolipoamide dehydrogenase E2 component (PdhC), and dihydrolipoamide
acyltransferase E3 component (PdhD). The PheRS and LeuRS trees are
shown in supplementary figure 1 (Supplementary Material online). The
COX, COB, NuoF, PdhA, PdhC, and PdhD trees are shown in supplementary figure 4 (Supplementary Material online).
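The inclusion criterion used for the reexamined protomitochondrial data set (presence in at least 10 of 25 a-proteobacterial species, 4 of 9 Eukaryotes, and 15 additional bacterial species) can be expressed as a simple filter. The sketch below is an illustration under assumed data structures; the function name and the placeholder species labels are hypothetical.

```python
def is_broadly_present(present_in, alpha, eukaryotes, other_bacteria,
                       min_alpha=10, min_euk=4, min_other=15):
    """Return True if a gene family passes the three presence thresholds.

    `present_in` is the set of species in which the gene was found; the
    other three arguments are the species sets for each taxonomic category.
    """
    present = set(present_in)
    return (len(present & set(alpha)) >= min_alpha
            and len(present & set(eukaryotes)) >= min_euk
            and len(present & set(other_bacteria)) >= min_other)

# Placeholder species labels (hypothetical), sized as in the text.
alpha = {f"alpha{i}" for i in range(25)}
eukaryotes = {f"euk{i}" for i in range(9)}
other_bacteria = {f"bact{i}" for i in range(40)}

# A gene found in 12 a-Proteobacteria, 5 Eukaryotes, and 20 other Bacteria.
widespread = ({f"alpha{i}" for i in range(12)}
              | {f"euk{i}" for i in range(5)}
              | {f"bact{i}" for i in range(20)})
# A gene found in only 3 Eukaryotes fails the eukaryotic threshold.
sparse = ({f"alpha{i}" for i in range(12)}
          | {f"euk{i}" for i in range(3)}
          | {f"bact{i}" for i in range(20)})
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the size of the retained gene set is to the chosen cutoff.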
Discussion
The early evolution of the eukaryotic cell and the origin of the many bacterial-like genes in the eukaryotic genome remain an enigma that has proven hard to resolve. These
genes may either have been acquired en masse from the
endosymbionts that gave rise to mitochondria (reviewed
in Gray et al. 1999; Dyall et al. 2004; Embley and Martin
2006) and/or obtained via horizontal transfer of individual
bacterial genes (Lester et al. 2005) and/or from a consortium
of bacterial endosymbionts. Unlike most previous attempts
to discern the origin of the eukaryotic genome, such as most
recently in the “Ring of Life” hypothesis (Rivera and Lake
2004), our aim was to place the origin of the bacterial-like
genes for aminoacylation processes in relation to the previously demonstrated a-proteobacterial origin for a subset
of genes coding for components of the mitochondrial respiratory chain system (Gray et al. 1999; Hrdy et al. 2004;
Fitzpatrick et al. 2005).
There is no reason to expect a priori that every single
mitochondrial gene of a truly a-proteobacterial origin
should branch with the a-Proteobacteria in a phylogenetic
analysis. This is because of the inefficiency of single-gene
sequences to recover true evolutionary relationships, which
places constraints on our ability to identify mitochondrial
protein ancestors in retrospect. Such concerns should be
taken seriously, as demonstrated by Esser et al. (2004)
who examined all genes in the mitochondrial genomes
by rigorous phylogenetic methods and found that although
all are likely to share a common ancestor and truly be of a-proteobacterial origin, not every gene supported a clustering
with the a-Proteobacteria. Indeed, many genes have been
proven unsuitable for phylogenetic studies at deep divergences due to duplications, horizontal gene transfers, and
rapid sequence evolution (Martin 1999; Creevey et al.
2004; Martin and Embley 2004; Bapteste and Walsh 2005).
The novelty of our approach is that we have tested for
the retention of the phylogenetic signal at the level of the
individual gene by examining the congruence of the aaRS
tree with the a-proteobacterial species tree (fig. 1). This approach is valid under the condition that an a-proteobacterial
species tree exists and can be inferred with confidence. Indeed, the same a-proteobacterial species tree topology was
inferred in 2 studies independently using different approaches. In one study, the topology was inferred from
a concatenated alignment of 20–40 genes with conserved
gene order structures in the Rhizobiales (Boussau et al.
2004). Another analysis utilized a super-tree approach, in
which a combined set of 418 single-gene families served
as the input data (Fitzpatrick et al. 2005). Incompatibilities
were estimated for less than 20% of the information genes
and 50% of the operational genes, which set the upper limits
for horizontal gene transfer, gene paralogy, or systematic
biases in the inference methods (Fitzpatrick et al. 2005).
We conclude that a meaningful a-proteobacterial species
relationship can be inferred with high confidence (Boussau
et al. 2004; Fitzpatrick et al. 2005).
Concordance between the aaRS trees and the a-proteobacterial species tree was observed in 12 cases, suggesting retention of the phylogenetic signal at the deepest
level of the a-proteobacterial clade. The presence of 2 different genes for the mitochondrial and cytoplasmic aaRS in
6 of these trees makes it unlikely that gene replacements on
either side have obscured the underlying phylogenetic signals. In these cases, the gene genealogy seems not to have
been massively eroded by horizontal gene transfer events or
rapid sequence evolution. Yet, no evidence for a clustering
of mitochondrial and a-proteobacterial aaRS was found.
In the remaining 8 trees, we observed a split of the a-proteobacterial species into 2 unrelated groups, indicative of
horizontal gene transfers and/or differential loss of ancestral
paralogs. Among the 12 that supported a monophyletic a-proteobacterial clade, potential gene replacement events
within the nuclear genome of the Eukaryotes were observed
in 6 cases. Such retargeting events within the eukaryotic genome place constraints on our ability to discern the bacterial
origin of these mitochondrial aaRS. The HisRS and ValRS
trees were particularly complex, signaling gene transfers on
the a-proteobacterial as well as the eukaryotic side.
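The distinction drawn above between a monophyletic a-proteobacterial clade and a split into separate groups can be checked mechanically on a tree. The following minimal sketch assumes a rooted tree written as nested tuples with strings as leaves; it is an illustration with hypothetical taxon labels and not the procedure used in this study. For unrooted trees, the complement of the group would have to be tested as well.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def is_monophyletic(tree, group):
    """True if some subtree contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Hypothetical labels: a1-a3 = a-Proteobacteria, m1 = mitochondria,
# b1-b2 = other Bacteria.
cohesive = ((("a1", "a2"), "a3"), ("m1", ("b1", "b2")))
split = (("a1", "m1"), (("a2", "a3"), "b1"))
alpha_group = {"a1", "a2", "a3"}
```

In the first toy tree the three a-proteobacterial leaves form a single clade, whereas in the second they fall into 2 separate groups, which corresponds to the pattern interpreted here as horizontal gene transfer or differential loss of paralogs.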
We are aware that even aaRS that support the cohesion
of the a-Proteobacteria and that are encoded by 2 different
nuclear genes may fail to reveal the true origin of the mitochondrial proteins because of methodological problems.
The position of the mitochondrial clade in the aaRS trees
could potentially be an artifact of an evolutionary rate acceleration in the mitochondrial lineages relative to the a-proteobacterial branches, as observed for some of the most
rapidly evolving genes in intracellular symbionts and
parasites (Canback et al. 2004; Thomarat et al. 2004). However, given the many tests performed in this study, the tree
topologies seem robust. In particular, mitochondrial proteins that fail to support the a-proteobacterial origin do
not evolve more rapidly than those that support such a relationship. Because the nuclear genes coding for components of the mitochondrial respiratory processes accurately
disclosed an affiliation with the a-Proteobacteria, we are convinced that our methods would be fully capable of identifying an a-proteobacterial origin also for the aaRS, had it
existed.
The different phylogenetic placements of the mitochondrial aaRS and the respiratory chain proteins are difficult to reconcile with hypotheses that try to explain the
origin of the Eukaryotes by the symbioses of 2 partners of
well-defined groups, such as Archaea with a-Proteobacteria,
Thermoplasma with spirochetes, methanogens with
delta-Proteobacteria, Sulfolobus with Clostridium, or Pyrococcus with gamma-Proteobacteria (for a review of suggested partners,
see Embley and Martin 2006). In these hypotheses, the
underlying assumption is that Bacteria and Archaea evolved
as 2 identifiable lineages long before the Eukaryotes emerged
and that the partners involved in the fusion process had
already at this stage separated from their most closely related
sister groups. The explicit suggestion is that the gene set of
the Eukaryotes is the sum of what was present in the 2 partners plus some extra genes added later by horizontal gene
transfer. If so, we expect the majority of nuclear genes for
mitochondrial proteins to show an affiliation that is consistent with their natural history, for example, with
the a-Proteobacteria.
However, despite the essentiality of aaRS in protein
synthesis, we failed to recover the anticipated a-proteobacterial source of these genes. These results are consistent
with previous phylogenetic studies (Brown 2001, 2003)
that have also noted the lack of evolutionary concordance
with the classical endosymbiotic theory. In a previous study
of the PheRS a-subunit that included 2 a-proteobacterial
species (Brown 2001) as well as in our study that included
a larger and more representative set of a-proteobacterial
species (fig. 3A), the mitochondrial lineage was positioned near the base of the bacterial tree rather than with
the a-Proteobacteria. Like the aaRS, eukaryotic genes for
glycolysis are bacterial like but also do not cluster specifically with the a-Proteobacteria (Canback et al. 2002). The
strong support for an affiliation between mitochondria and a-Proteobacteria for some genes and the lack thereof for others
suggest the acquisition of bacterial-like genes into the ancestral eukaryotic genome from various different sources.
One possibility is that the a-proteobacterial ancestor itself contained a naturally diverse collection of genes (Martin
and Koonin 2006) or arose long before the emergence of the
a-proteobacterial species analyzed in this study. Alternatively, some of the aaRS may be remnants of endosymbiotic
attempts that failed (Doolittle 1998; Brown 2003) prior to
the successful establishment of an endosymbiotic relationship with the a-Proteobacteria. A replacement of primary
endosymbiont gene functions by those of secondary endosymbionts has been observed in the case of aphid endosymbionts (Koga et al. 2003; Perez-Brocal et al. 2006). If
replacements of bacterial gene functions have occurred
in endosymbiotic relationships that are only a few hundred
million years old, it is perhaps not unreasonable to think
that remnants of premitochondrial endosymbionts could explain some of the anomalies of the mitochondrial proteins
that have evolved over a billion year time period. If this
holds true, the implication is that a protoeukaryotic cell
might have existed already prior to the acquisition of the
mitochondrion.
Another possibility is that only a limited set of a-proteobacterial genes were transferred into the mitochondrial
genome and processed by endosymbionts or bacterial genes
acquired via other evolutionary routes. The recent identification of intramitochondrial a-proteobacterial symbionts in
ovarian cells of ticks provides conditions under which a-proteobacterial genes may be transferred into the mitochondrial genomes even in modern times (Beninati et al. 2004;
Sassera et al. 2006). In light of this, it is interesting to note
that the sex-ratio distorter Wolbachia
pipientis has transferred some of its genes into the nuclear
genome of its arthropod hosts (Kondo et al. 2002). The
transfer of a-proteobacterial genes for components of the
respiratory chain complexes may hide acquisitions of other
genes into the eukaryotic genome that are more difficult to
trace evolutionarily.
In this context, it is noteworthy that other mitochondrial information processing enzymes, such as RNA polymerase, DNA polymerase, and replicative helicases, are not
of a-proteobacterial origin but are rather similar to homologs of
T-odd bacteriophages (Filée et al. 2003; Filée and Forterre
2005; Shutt and Gray 2005, 2006; Forterre 2006). Most of
these genes are nucleus encoded, although T phage–like
RNA polymerase genes have been identified in mitochondrial genomes and plasmids. Cryptic prophages of the
T-odd type have also been discovered in several proteobacterial genomes, including those of a-Proteobacteria (Filée
and Forterre 2005). Given the recent identification of aaRS
genes in giant Mimivirus (Abergel et al. 2005), a viral origin or a viral transmission route might be considered.
However, the mimiviral sequences are highly diverged
and cluster specifically with neither the mitochondrial aaRS
nor bacterial or cytoplasmic aaRS (data not shown).
The emerging evolutionary scenario is increasingly
complex with mitochondrial contributions from bacterial,
eukaryotic, and viral partners. Ongoing gene loss and replacements may go a long way to explain why there are
so few a-proteobacterial genes with homologs in eukaryotic
genomes (Boussau et al. 2004). Thus, some “noise” is to be
anticipated; however, it is remarkable that none of the
aaRS trees supports an affiliation between mitochondria
and a-Proteobacteria, not even those that otherwise show
a strong retention of the phylogenetic signal over the time
span considered. We can think of several possible explanations, among which we favor the simplest, namely, that the
mitochondrial aaRS have been acquired from sources other
than the a-Proteobacteria.
Supplementary Material
Supplementary figures 1–4 and table 2 are available
at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
This research was supported by grants from the Swedish
Research Council, the Göran Gustafsson Foundation,
the Swedish Foundation for Strategic Research, and the
Wallenberg Foundation.
Literature Cited
Abergel C, Chenivesse S, Byrne D, Suhre K, Arondel V, Claverie
JM. 2005. Mimivirus TyrRS: preliminary structural and functional characterization of the first amino-acyl tRNA synthetase
found in a virus. Acta Crystallogr Sect F Struct Biol Cryst
Commun. 61:212–215.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs. Nucleic Acids
Res. 25:3389–3402.
Ardell DG, Andersson SGE. 2006. TFAM detects co-evolution of
tRNA identity rules with lateral transfer of histidyl-tRNA synthetase. Nucleic Acids Res. 34:893–904.
Bapteste E, Walsh TM. 2005. Does the ‘Ring of Life’ ring true?
Trends Microbiol. 13:256–261.
Beninati T, Lo N, Sacchi L, Genchi L, Noda H, Bandi C. 2004. A
novel alpha-Proteobacterium resides in the mitochondria of
ovarian cells of the tick Ixodes ricinus. Appl Environ Microbiol. 70:2596–2602.
Boussau B, Karlberg EO, Frank AC, Legault B-A, Andersson
SGE. 2004. Computational inference of scenarios for alpha-proteobacterial genome evolution. Proc Natl Acad Sci USA.
101:9722–9727.
Brown JR. 2001. Genomic and phylogenetic perspectives on the
evolution of prokaryotes. Syst Biol. 50:497–512.
Brown JR. 2003. Ancient horizontal gene transfer. Nat Rev Genet.
4:121–132.
Brown JR, Gentry D, Becker AJ, Ingraham K, Holmes DJ,
Stanhope MJ. 2003. Horizontal transfer of drug-resistant
aminoacyl-transfer-RNA synthetases of anthrax and gram-positive pathogens. EMBO Rep. 4:692–698.
Bryant D, Moulton V. 2004. Neighbor-net: an agglomerative
method for the construction of phylogenetic networks. Mol
Biol Evol. 21:255–265.
Canback B, Andersson SGE, Kurland CG. 2002. The global phylogeny of glycolytic enzymes. Proc Natl Acad Sci USA.
99:6097–6102.
Canback B, Tamas I, Andersson SGE. 2004. A phylogenomic
study of endosymbiotic bacteria. Mol Biol Evol. 21:1110–
1122.
Creevey CJ, Fitzpatrick DA, Philip GK, Kinsella RJ, O’Connell
MJ, Pentony MM, Travers SA, Wilkinson M, McInerney
JO. 2004. Does a tree-like phylogeny only exist at the tips
in the prokaryotes? Proc R Soc Lond B Biol Sci. 271:2551–
2558.
Diaz-Lazcoz Y, Aude J-C, Nitschké P, Chiapello H, Landes-Devauchelle C, Risler L. 1998. Evolution of genes, evolution
of species: the case of aminoacyl-tRNA synthetases. Mol Biol
Evol. 15:1548–1561.
Dohm JC, Vingron M, Staub E. 2006. Horizontal gene transfer in
aminoacyl-tRNA synthetases including leucine-specific subtypes. J Mol Evol. 63:437–447.
Doolittle WF. 1998. You are what you eat: a gene transfer ratchet
that could account for bacterial genes in eukaryotic nuclear
genomes. Trends Genet. 14:307–311.
Dyall SF, Brown MT, Johnson PJ. 2004. Ancient invasions: from
endosymbionts to organelles. Science. 304:253–257.
Embley TM, Martin W. 2006. Eukaryotic evolution, changes and
challenges. Nature. 440:623–630.
Esser C, Ahmadinejad N, Wiegand C. (15 co-authors). 2004. A genome phylogeny for mitochondria among alpha-proteobacteria
and a predominantly eubacterial ancestry of yeast nuclear
genes. Mol Biol Evol. 21:1643–1660.
Filée J, Forterre P. 2005. Viral proteins functioning in organelles:
a cryptic origin? Trends Microbiol. 13:510–513.
Filée J, Forterre P, Laurent J. 2003. The role played by viruses in
the evolution of their hosts: a view based on informational protein phylogenies. Res Microbiol. 154:237–243.
Fitzpatrick DA, Creevey CJ, McInerney JO. 2006. Genome phylogenies indicate a meaningful alpha-proteobacterial phylogeny and support a grouping of the mitochondria with the
Rickettsiales. Mol Biol Evol. 23:74–85.
Forterre P. 2006. Three RNA cells for ribosomal lineages and
three DNA viruses to replicate their genomes: a hypothesis
for the origin of cellular domain. Proc Natl Acad Sci USA.
103:3669–3674.
Gabaldon T, Huynen MA. 2003. Reconstruction of the protomitochondrial metabolism. Science. 301:609.
Galtier N, Gouy M. 1995. Inferring phylogenies from DNA
sequences of unequal base compositions. Proc Natl Acad
Sci USA. 92:11317–11321.
Galtier N, Gouy M, Gautier C. 1996. SEAVIEW and PHYLO_WIN:
two graphic tools for sequence alignment and molecular
phylogeny. Comput Appl Biosci. 12:543–548.
Goldman N, Anderson JP, Rodrigo AG. 2000. Likelihood-based tests of topologies in phylogenetics. Syst Biol.
49:652–670.
Gray MW, Burger G, Lang BF. 1999. Mitochondrial evolution.
Science. 283:1476–1481.
Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.
Syst Biol. 52:696–704.
Hrdy I, Hirt RP, Dolezal P, Bardonova L, Foster PG, Tachezy J,
Embley TM. 2004. Trichomonas hydrogenosomes contain the
NADH dehydrogenase module of mitochondrial complex I.
Nature. 432:618–622.
Huelsenbeck JP, Ronquist F. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 17:754–755.
Huson DH. 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics. 14:68–73.
Jobb G, von Haeseler A, Strimmer K. 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol. 4:18.
Karlberg EO, Andersson SGE. 2003. Mitochondrial gene history
and mRNA localization: is there a correlation? Nat Rev Genet.
4:391–397.
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney
JO. 2006. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol.
6:29.
Koga R, Tsuchida T, Fukatsu T. 2003. Changing partners in an
obligate symbiosis: a facultative endosymbiont can compensate for loss of the essential endosymbiont Buchnera in an
aphid. Proc R Soc Lond B. 270:2543–2550.
Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T. 2002. Genome fragment of Wolbachia endosymbiont transferred to X
chromosome of host insect. Proc Natl Acad Sci USA.
99:14280–14285.
Kurland CG, Andersson SGE. 2000. Origin and evolution of the
mitochondrial proteome. Microbiol Mol Biol Rev. 64:
786–820.
Kurland CG, Collins LJ, Penny D. 2006. Genomics and the irreducible nature of eukaryote cells. Science. 312:1011–1014.
Lester L, Meade A, Pagel M. 2005. The slow road to the eukaryotic genome. Bioessays. 28:57–64.
Lockhart PJ, Steel MA, Hendy MD, Penny D. 1994. Recovering
evolutionary trees under a more realistic model of sequence
evolution. Mol Biol Evol. 11:605–612.
Löytynoja A, Milinkovitch MC. 2001. SOAP, cleaning multiple
alignments from unstable blocks. Bioinformatics. 17:573–574.
Mans BJ, Anantharaman V, Aravind L, Koonin EV. 2003. Comparative genomics, evolution and origins of the nuclear envelope and nuclear pore complex. Cell Cycle. 3:1612–1637.
Martin W. 1999. Mosaic bacterial chromosomes: a challenge en
route to a tree of genomes. Bioessays. 21:99–104.
Martin W. 2005. Archaebacteria (Archaea) and the origin of the
eukaryotic nucleus. Curr Opin Microbiol. 8:630–637.
Martin W, Embley TM. 2004. Evolutionary biology: early evolution comes full circle. Nature. 431:134–135.
Martin W, Koonin EV. 2006. Introns and the origin of nucleus-cytosol compartmentalization. Nature. 440:41–45.
Milne I, Wright F, Rowe G, Marshall DF, Husmeier D, McGuire
G. 2004. TOPALi: software for automatic identification of
recombinant sequences within DNA multiple alignments. Bioinformatics. 20:1806–1807.
Morgenstern B. 2004. DIALIGN: multiple DNA and protein sequence alignment at BiBiServ. Nucleic Acids Res. 32:W33–
W36.
Perez-Brocal V, Gil R, Ramos S, Lamela A, Postigo M, Michelena
JM, Silva FJ, Moya A, Latorre A. 2006. A small microbial genome: the end of a long symbiotic relationship? Science.
314:312–313.
Rambaut A, Grassly NC. 1997. Seq-Gen: an application for the
Monte Carlo simulation of DNA sequence evolution along
phylogenetic trees. Comput Appl Biosci. 13:235–238.
Rivera MC, Lake JA. 2004. The ring of life provides evidence for
a genome fusion origin of eukaryotes. Nature. 431:152–155.
Roy H, Ling J, Alfonzo J, Ibba M. 2005. Loss of editing activity
during the evolution of mitochondrial phenylalanyl-tRNA synthetase. J Biol Chem. 280:38186–38192.
Sassera D, Beninati T, Bandi C, Bouman EA, Sacchi L, Fabbi M,
Lo N. 2006. Candidatus Midichloria mitochondrii, an endosymbiont of the tick Ixodes ricinus with a unique intramitochondrial lifestyle. Int J Syst Evol Microbiol. 56:2535–2540.
Shutt TE, Gray MW. 2005. Bacteriophage origins of mitochondrial replication and transcription proteins. Trends Genet.
22:90–95.
Shutt TE, Gray MW. 2006. Twinkle, the mitochondrial replicative
DNA helicase, is widespread in the eukaryotic radiation and
may also be the mitochondrial DNA primase in most eukaryotes. J Mol Evol. 62:588–599.
Swofford DL, Olsen GJ, Waddell PJ, Hillis DM. 1996. Phylogenetic inference. In: Hillis DM, Moritz C, Mable BK, editors.
Molecular systematics. Sunderland (MA): Sinauer Associates,
Inc. p. 407–514.
Thollesson M. 2004. LDDist: a Perl module for calculating LogDet
pair-wise distances for protein and nucleotide sequences. Bioinformatics. 20:416–418.
Thomarat F, Vivares C, Gouy M. 2004. Phylogenetic analysis of
the complete genome sequence of Encephalitozoon cuniculi
supports the fungal origin of microsporidia and reveals a high
frequency of fast-evolving genes. J Mol Evol. 59:780–791.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res.
22:4673–4680.
Whelan S, Goldman N. 2001. A general empirical model of
protein evolution derived from multiple protein families
using a maximum-likelihood approach. Mol Biol Evol.
18:691–699.
Woese CR, Olsen GJ, Ibba M, Söll D. 2000. Aminoacyl-tRNA
synthetases, the genetic code, and the evolutionary process.
Microbiol Mol Biol Rev. 64:202–236.
Wolf YI, Aravind L, Grishin NV, Koonin EV. 1999. Evolution of
aminoacyl-tRNA synthetases—analysis of unique domain
architectures and phylogenetic trees reveals a complex history
of horizontal gene transfer events. Genome Res. 8:689–710.
Martin Embley, Associate Editor
Accepted December 12, 2006