Downloaded from http://rstb.royalsocietypublishing.org/ on June 14, 2017
rstb.royalsocietypublishing.org
The ring of life hypothesis for eukaryote
origins is supported by multiple kinds of
data
James McInerney1,2, Davide Pisani3 and Mary J. O’Connell4
Review
Cite this article: McInerney J, Pisani D,
O’Connell MJ. 2015 The ring of life hypothesis
for eukaryote origins is supported by multiple
kinds of data. Phil. Trans. R. Soc. B 370:
20140323.
http://dx.doi.org/10.1098/rstb.2014.0323
Accepted: 10 July 2015
One contribution of 17 to a theme issue
‘Eukaryotic origins: progress and challenges’.
Subject Areas:
evolution, genomics, microbiology, palaeontology, taxonomy and systematics, theoretical
biology
Keywords:
ring of life, fusion, merger, hybrid, phylogeny,
eukaryote
Author for correspondence:
James McInerney
e-mail: [email protected]
1 Department of Biology, National University of Ireland Maynooth, Co. Kildare, Republic of Ireland
2 Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK
3 School of Biological Sciences and School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TG, UK
4 School of Biotechnology, Dublin City University, Glasnevin, Dublin 9, Republic of Ireland
The literature is replete with manuscripts describing the origin of eukaryotic
cells. Most of the models for eukaryogenesis are either autogenous (sometimes called slow-drip), or symbiogenic (sometimes called big-bang). In
this article, we use large and diverse suites of ‘Omics’ and other data to
make the inference that autogenous hypotheses are a very poor fit to the
data and the origin of eukaryotic cells occurred in a single symbiosis.
1. Introduction
The value indeed of an aggregate of characters is very evident in natural history
— Charles Darwin [1, p. 417].
How we think we should depict evolutionary history has, of course, an enormous effect on how we analyse data, how we write analytical software and how
we depict the final results. If we feel that the evolutionary history of a dataset has
been tree-like, then it is likely that the first, perhaps only, analyses we carry out
will be a phylogenetic analysis using software that generates, as an output, a tree.
We know of course that human populations do not have a tree-like history; therefore, we are usually disinclined to use software for tree reconstruction to depict
these histories [2]. A phylogenetic tree can always be derived based on the complete genomes of two parents and their children. However, we know that the tree
will be meaningless, because a tree-like process did not generate the data. What
this means is that the outcome of an evolutionary analysis is always contingent
on our a priori opinions for how the data have evolved. In some cases, as in
the above example, our knowledge of the process that generated the data is
good enough to let us unambiguously avoid the use of a particularly poorly fitting model (i.e. a tree) to describe the data (i.e. the relationships between the
genomes of two parents and their progeny). However, in most cases, we lack
the knowledge to unambiguously reject a model (or a class of models) based
on previous observations. In such cases, a better course of action would be to
consider a variety of models and ask which fits the data best (if not adequately).
In this article, we try to understand the patterns we observe that speak about
eukaryotic evolutionary history and we assess the goodness-of-fit between the
data collected so far and a variety of models for eukaryotic origins and evolution.
We do not limit our model analysis to simple alternative phylogenetic trees; we
also include processes that are not tree-like.
One of the most profound changes in evolutionary biology has taken place in
the past 10 years. We have had to adjust our thinking on the best means of depicting and analysing evolutionary relationships. Instead of using only phylogenetic
trees of organisms as the organizing principle, we must now think also in terms of
flows of genetic information. Flows of genetic information can be vertical from
ancestor to descendants (following a tree-like process) or horizontal from one lineage to another (following a network-like process) [3]. Horizontal flows can be
facilitated by plasmids [4], phage [5], viruses [6], transposons [7], gene transfer
© 2015 The Author(s) Published by the Royal Society. All rights reserved.
number of genomes [22]) would have been enough to clarify
the early evolution of life on Earth. This narrative remains
omnipresent, despite our current understanding of the pervasive role of horizontal gene transfer (HGT) in evolution [14].
A tree is expected; therefore, a tree is sought out.
2. What is the problem with trees in prokaryotic
evolution?
3. Different data, different viewpoints on
eukaryotic origins?
Most scientists are familiar with the ideas of Karl Popper and
his development of falsifiability [36], id est, that the demarcation line between science and non-science is whether or
not an idea can be tested, or falsified. However, prior to
Popper, a popular philosophy of science was that of William
Whewell, who put forward the idea of consilience [37].
Phylogenetic trees have a very precise meaning [23]. Each
node on a tree is either a contemporary taxon or a hypothetical ancestral taxon and the linkages indicate vertical
inheritance. An evolutionary diagram is a tree if it does not
display any reticulations (loops), otherwise it is not a tree.
Trees depict evolution as a continuously diverging process
and the ramifications on trees indicate speciation or duplication events. A more general diagram is a phylogenetic
network. These are found in two varieties—networks that
display uncertainty and networks that display recombination
[24]. Recombination networks also have a restrictive set of
assumptions: the data are required to be homologous
along their entire length, and reticulations on phylogenetic network
graphs indicate places where recombination events between
homologous regions have occurred.
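The distinction between a tree and a network can be checked mechanically. The following Python sketch assumes a rooted, connected diagram stored as parent-to-child edges, and reports any node with more than one parent, which is how a reticulation appears in such a representation. The function names and the toy lineages are illustrative assumptions and do not come from any of the cited software.

```python
from collections import defaultdict


def reticulation_nodes(edges):
    """Return the nodes that have more than one parent in a rooted diagram."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, found in parents.items() if len(found) > 1)


def is_tree(edges):
    """A rooted, connected diagram is a tree only if no node has two parents."""
    return not reticulation_nodes(edges)


# Hypothetical diverging diagram: every node has exactly one parent.
tree_edges = [("root", "A"), ("root", "anc"), ("anc", "B"), ("anc", "C")]

# The same diagram plus one merger: lineage "H" descends from both "A" and "anc".
network_edges = tree_edges + [("A", "H"), ("anc", "H")]

print(is_tree(tree_edges))                # True
print(is_tree(network_edges))             # False
print(reticulation_nodes(network_edges))  # ['H']
```

A check of this kind only says whether a diagram is a tree; it says nothing about whether a tree is the right model for the data.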
More than 10 years ago, Creevey et al. [25] showed that
strongly supported but completely conflicting phylogenies
existed throughout the prokaryote world. Since then, other
studies have confirmed that the majority of prokaryotic
genome data cannot fit onto a single phylogenetic tree
[26–29] (also see [20]). Only gene flow (both horizontal and
vertical) analysis can adequately account for prokaryotic
evolution [13,30].
More recently, because of our growing realization that so
many molecular sequences are in fact composite entities, we
have been advocating a more realistic model of sequence
evolution where we allow the merging of both homologous
and non-homologous evolving entities [31–33]. In this
view, we can analyse situations where very different evolving
entities, sometimes from the same evolutionary level (e.g.
symbiosis of two organisms, merger of two genes) and sometimes from different levels (e.g. plasmids merging with cells,
domains merging with genes) can merge [14]. This is a
‘goods-thinking’ outlook and can have many benefits [12].
We have recently defined N-rooted fusion graphs that can
instead be used to depict branching relationships with
mergings [31,33].
An over-reliance on tree-thinking and tree-based methods
[34] has led some researchers to proclaim that we have
reached a phylogenetic impasse [35]. However, this is
surely premature, as there is a wealth of characters that
can be used to provide insights into eukaryogenesis.
agents [8], intercellular nanotubes [9], or simply the hybridization of sexual species, followed by re-integration of the
hybrid into one of the ancestral species [10,11]. Contemporary
genomes are, and extinct genomes were, complex mixtures of
genetic mergers [12,13], with the horizontal gene flows being
as normal as vertical gene flows. This presents us with a problem if we are restricted to using tree-like processes that only
depict vertical gene flows to model the data, as we would
not be able to model and understand the impact of horizontal
gene flows in evolution.
Phenotypes, particularly in single-celled organisms,
coalitions of evolving entities [14] and viruses can only be
explained by integrating both a horizontal and vertical
view of evolution. In large-scale genome sequencing of bacterial strains, we see thousands of recombination events
[15], and indeed, on occasion, explaining phenotypes requires
methods that explicitly need to account for and remove vertical transmission signals [16]. One thousand eubacterial genes
have flowed into the stem lineage of halophilic Archaebacteria and identification of this important evolutionary event
required the modelling of horizontal, rather than vertical
gene flows [17]. Indeed, the impossibility of identifying horizontal flows (even when massive) when using only tree-like
models is evident in the history of haloarchaeal studies.
One of the first analyses of complete genomes from a broad
selection of prokaryotes placed the halophiles as deep-branching Archaebacteria [18]. The phylogenetic position of
this lineage was determined through the interaction of two
signals, both present in the genomes of halophiles, one pulling this lineage towards the methanogenic euryarchaeotes,
the other pulling the halophiles towards the Eubacteria. The
existence of two incongruent signals ultimately caused the
halophiles to cluster at the base of the archaebacterial tree—
a phylogenetic position that was not supported by the individual gene trees. A tree-based analysis could not get the
correct answer for the placement of archaebacterial halophiles, because halophile evolution is driven by gene-flow-based
adaptive processes rather than by a pure cladogenetic
process. Indeed, archaebacterial evolution more broadly can
only be fully understood when considered as the result of
large-scale horizontal gene flows [19] interacting with vertical
ones. This is because the origins of most major groups of
Archaebacteria coincide with massive flows of genes from
Eubacteria to Archaebacteria [19,20]. It is becoming increasingly evident that at least some of the genes present in
every known organism or lineage (animals included) underwent part of their evolution in a completely different lineage
[3]. In the case of animals, obvious examples include all genes
of mitochondrial origins (that evolved in the alpha-proteobacterial lineage). To conclude, while one can always force
chimerical data onto a tree, model misspecification will
ensue. In the best-case scenario, this will simply limit our
understanding of the generative process underpinning the
data. However, in the worst-case scenario, model misidentification will lead to misleading conclusions, such as the finding
of support for monophyletic Archaebacteria [21], a topology
that is no longer supported by the data or by better models
(see discussion in §8).
Genomes contain exquisitely detailed information about
evolutionary history; however, because of a deep-rooted tradition in the use of tree-like models to explain biological
evolution, it was felt that a clear understanding of cladogenesis (to be derived using a phylogenetic tree for a sufficient
Eukaryotes presumably did not arise ex nihilo. However, there
are considerable disagreements concerning how we might read
their traits. In the following sections, we first review what is
known about eukaryote traits and how they relate to prokaryote traits. We then ask how these traits map onto
hypotheses of eukaryote origins and we finally ask which mappings are the best fit. In some cases, we find that eukaryotes
have homologous structures with all prokaryotes (membranes, for instance, are homologous). In other cases, we find
that eukaryotes have proteins or structures that are uniquely
eukaryote. In other cases, we find homologies between eukaryotes and some, but not all prokaryotes, and also we find that
there are partial homologies when comparing prokaryotic
and eukaryotic features. The detail of the homologies, the
organelles and the structures is important. Fortunately, the current wave of ‘omics’ studies is providing more and more data
that can speak to eukaryotic origins [39–41].
There are a number of traits that are either uniquely
eukaryotic or are present in all eukaryotes. These are the
traits that most likely trace back to the first eukaryotic cells.
4. What do we know about the last eukaryotic
common ancestor?
The last eukaryote common ancestor (LECA) had a nucleus
and associated with this nucleus were nuclear pore complexes, some of whose proteins appear to be homologous
and some analogous in modern eukaryotes [42,43]. Additionally, comparative analysis of chromatin proteins from protists
has provided insights into the early evolutionary history of
histone and DNA modification, nucleosome assembly and
chromatin-remodelling systems [44]. Many of the individual
domains in these eukaryotic systems can be shown to have
had bacterial precursors, but today they are found in distinctive regulatory complexes that are unique to the eukaryotes
[44]. In general, eukaryotes have linear chromosomes and
possess centromeres [45], though prokaryotes like Borrelia
burgdorferi also contain linear chromosomes and plasmids
[46]. It is unlikely that chromosome linearity is homologous,
as there is no suggestion of a sister-group relationship
between Borrelia and eukaryotes. Similarly, there have been
suggestions of a relationship between Planctomycetes and
eukaryotes, based on the superficial similarity between the
eukaryotic nucleus and the planctomycete membrane that
encloses its chromosome. Any direct (or intermediate)
relationship between Planctomycetes and eukaryotes has
been shown to be spurious or analogous [47]. In conclusion,
though there is some evidence that the eukaryotic nucleus
(excluding the genetic material in this case) has some homologies with eubacterial characters, it is mostly uniquely
eukaryotic, with its own independent evolutionary history.
The nucleolus is uniquely eukaryotic. However, careful
analysis of the phyletic distribution of the proteome of the
nucleolus shows that the nucleolar protein domains confirm
the archaebacterial origin of the core machinery for ribosome
maturation and assembly, but also reveals substantial eubacterial and eukaryotic contributions to nucleolus evolution
[48]. The nucleolus as a whole has no homologue in prokaryotes, though the nucleolus parts have homologies in
prokaryotes. These ‘parts’ or ‘goods’ [12,49] form a unique
structure in eukaryotes. It appears therefore, that the nucleolus
is in some respects chimerical.
Introns are not unique to eukaryotes, but spliceosomal
introns are [50]. While group II (self-splicing) introns are commonly found in eukaryotes, Eubacteria and Archaebacteria,
spliceosomal introns are only found in eukaryotes. It is
thought that spliceosomal introns arose from group II introns
early in eukaryotic evolution [50]. Furthermore, intron positions are seen to be conserved across animals, plants and
many protists, indicating that spliceosomal introns arose
early in eukaryotic evolution and differential loss of introns
has been a typical mode of evolution since then [50]. If the
suggested relationship between group II introns and spliceosomal introns is confirmed, then this suggests a specific flow
of genetic material between eukaryotes and Eubacteria.
Capped and polyadenylated mRNA is also unique to
eukaryotes and while both kinds of cellular processes are distinct, there is a common involvement of RNA polymerase II,
which is a large macromolecular complex involving both noncoding RNAs and 12–14 proteins. Eukaryotes have three different polymerases involved in transcription, while Eubacteria
have only one [51]. There are clear homologies between the
RNA polymerase II components of eukaryotes and RNA polymerase proteins in Eubacteria and Archaebacteria, though
there are more subunits in common between archaebacterial
RNA polymerase and eukaryotic polymerase II (10 subunits in
common) than between the eubacterial version and the others
Consilience means the ‘jumping together’ of pieces of evidence from disparate kinds of data. Consilience does not
mean compromise; rather it is an appeal to look at reciprocal
supports from disparate sets of data. Whewell argued that
support for a particular theory would be made stronger if
multiple lines of evidence that were orthogonal to one
another were seen to agree with the same overall theory. Furthermore, if hypothesis X and hypothesis Y both tended to
support an overall theory T, then if some new evidence supporting hypothesis X arises, this new evidence not only
increases support for the overall theory, but it also strengthens our support for hypothesis Y, even if we have not
collected any new data that speaks to hypothesis
Y. Specifically with regard to the origins of the eukaryotic
cell, we might look to disparate datasets in an effort to see
how they agree with an overall theory (if indeed they do at
all).
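One way to make this argument concrete is a small probabilistic calculation. The sketch below is a modern Bayesian gloss rather than Whewell's own formulation. It assumes that X and Y are conditionally independent given the theory T, and every probability in it is a hypothetical value chosen only for illustration.

```python
# Hypothetical numbers: none of these values come from data.
p_T = 0.5                              # prior belief in the overall theory T
p_X_given = {True: 0.8, False: 0.3}    # P(X holds | T is true / false)
p_Y_given = {True: 0.7, False: 0.2}    # P(Y holds | T is true / false)


def marginal(p_given, p_theory):
    """P(hypothesis) obtained by averaging over T being true or false."""
    return p_given[True] * p_theory + p_given[False] * (1 - p_theory)


p_X = marginal(p_X_given, p_T)
p_Y_before = marginal(p_Y_given, p_T)

# Bayes' rule: learning that X holds raises belief in T ...
p_T_after = p_X_given[True] * p_T / p_X

# ... and the stronger belief in T raises belief in Y, with no new data on Y.
p_Y_after = marginal(p_Y_given, p_T_after)

print(round(p_T_after, 3))   # 0.727
print(round(p_Y_before, 3))  # 0.45
print(round(p_Y_after, 3))   # 0.564
```

With these assumed values, support for X lifts the probability of Y from 0.45 to about 0.56, even though no new evidence bearing directly on Y was collected.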
Consilience was the most popular way of viewing science
at the time when Darwin, Wallace and colleagues were working out the theory of evolution. Indeed in The origin of species,
Darwin noted, ‘The importance, for classification, of trifling
characters mainly depends on their being correlated with several other characters of more or less importance’ [1, p. 417].
Here, we argue that some of the hypotheses for the origins
of eukaryotes have already been falsified (sensu Popper). In
other cases, we rely on an accumulation of evidence from disparate sources for hypotheses that have not yet been rejected
by specific observations in order to evaluate which model
for the origin of the eukaryotes is better supported (sensu Whewell). Indeed, some hypotheses of eukaryogenesis can be
rejected simply because the prediction of the model does not
fit with the empirical data, and in some other cases, while
no individual piece of evidence is enough to unequivocally
define or support a model, when the evidence is taken
together its support for the considered hypothesis will
become overwhelming. We argue that the only current
model that can adequately account for the origin of eukaryotes
is the ring of life model of Lake and Rivera [38].
[63, p. 177]. The ESCRT pathway, which up to now was considered unique to eukaryotes, is involved in the formation of
multivesicular bodies, which are a trafficking system to bring
cargo to the vacuole in eukaryotes. However, the ESCRT
system is also known to have other functions such as eukaryotic cell abscission, viral budding, exosome secretion and
autophagy [64]. These are not typical features of prokaryotes,
so future work will have to properly elucidate their likely
function in Lokiarchaeota. It is worth mentioning that
Lokiarchaeota are, at the time of writing, only known from
metagenomic sequencing [63]. However, the finding of
Lokiarchaeota suggests a gene flow between eukaryotes
and Archaebacteria, at least for these key genes.
Mitochondria are a unique feature of eukaryotes, though
not all eukaryotes possess mitochondria [65–69]. For some
time, it was thought that many lineages of eukaryotes that
lacked mitochondria were primitively amitochondriate [70].
This was because they branched deeply in phylogenetic
trees of eukaryotes using poorly performing phylogenetic
methods [71]. However, using better phylogenetic methods
[72] and the discovery of typical mitochondrial genes in the
genomes of these amitochondriate eukaryotes [73,74], and
furthermore the demonstration that some possessed an
organelle that is homologous to the mitochondrion (the
hydrogenosome), the ‘Archaezoa’ hypothesis has been dismantled [65,66,75]. The implication of this work is that we
now know of no primitively amitochondriate eukaryotes
[76]. That is to say, we know of no eukaryote genomes
that are primitively devoid of a significant gene flow from
Eubacteria through mitochondrial acquisition.
The Walker A-type cytoskeletal ATPase MinD, which is involved in cell division,
is found widely among Eubacteria
and in some eukaryotes, but is absent in Archaebacteria. In
prokaryotes, this protein acts to prevent Z-ring formation
anywhere except at mid-cell [77].
An analysis of 630 genes in eukaryotes with a sister-group
relationship to alpha-proteobacteria on phylogenetic trees
revealed that many biochemical pathways are completely, or
nearly, contained in this cohort. These include the pathways
for beta-oxidation, electron transport chain, fructose/mannose
metabolism and pathways for the synthesis of lipids, biotin,
haem and iron–sulfur clusters [78]. In addition, the eukaryotic
glyceraldehyde-3-phosphate dehydrogenase [79] is of bacterial origin, and the machinery for iron–sulfur cluster
assembly (iron–sulfur clusters are an important part of
eukaryotic metalloproteins) is related to alpha-proteobacterial
genes [80,81]. The implication from this analysis is that much
of eukaryotic metabolism is eubacterial-derived.
Analyses of homologies between eukaryotes and prokaryotes have consistently found a disjoint between the
relationships that have been inferred [69,82–85] (see supporting
information, figure S3, for Alvarez-Ponce et al. [85]: http://archaebacterial.pnas.org/content/suppl/2013/04/01/1211371110.DCSupplemental/sapp.pdf). In general, while there are
many gene families that are found across all three groups,
quite often we find gene families that are found in only two of
the three groups (i.e. eukaryotes and Eubacteria, and eukaryotes
and Archaebacteria). Analysis of ‘omics’ data has shown that
the phyletic affiliation of these genes has a very important
effect on the cellular role of their proteins.
In gene deletion experiments, it is possible to find out
which proteins are essential for a cell to exist. Several such
studies have been conducted for Saccharomyces cerevisiae [86].
(five subunits in common with the others). Eukaryote polymerase II has two subunits (Rpb8 and Rpb9) that are unique to
eukaryotes.
In terms of other processes that are unique to eukaryotes,
the anaphase promoting complex or cyclosome (APC/C) is a
major component of the cell cycle and is not found in prokaryotes. In total, 24 out of 37 known APC/C subunits,
adaptors/co-activators and main targets, could be inferred
to be present in the LECA. For the most part, these components are well conserved in all extant eukaryotic lineages.
Meiosis and mitosis are both eukaryotic features, but two
points are interesting about them. First, it is thought that meiosis might be an early invention in eukaryotic evolution [52];
second, cellular fusion and recombination have been observed in Archaebacteria [53,54]. Cellular fusion and
recombination are a long way from being sexual reproduction;
however, given that meiosis may have arisen early in eukaryotes, it is interesting that prokaryotes can carry out such functions.
One of the most significant differences between prokaryotes
and eukaryotes is to be found in cell size and cell volume. Typically, there is a 1000-fold difference in the sizes of the cells in
the two groups, though there can be huge variation [55]. Biochemical calculations have also found that the energy
requirement per gene is orders of magnitude higher in eukaryotes than in prokaryotes [55], and this additional
requirement in terms of energy means that eukaryotes typically
need a cellular powerhouse with large amounts of energy-generating membranes. Given the clear relationship between the
mitochondrion and alphaproteobacteria, this would only support a model for eukaryote origins that involves a
mitochondrial endosymbiosis.
In addition to eukaryote-specific features, eukaryotes share
a number of traits that seem to resemble those of Archaebacteria more closely
than those of Eubacteria. For instance, eukaryotes and Archaebacteria
both have histones [56,57], whereas Eubacteria have a
non-homologous system of ‘histone-like’ proteins [58].
Eukaryotes have an elaborate cytoskeleton based on tubulin and actins [59,60]; however, many prokaryotes are known
to have homologues of these molecules. FtsZ, which is
involved in septum formation during cell division in Eubacteria, is found in many eukaryotes as well as in Eubacteria and
in many archaebacterial groups, except Crenarchaeota [61].
By contrast, while the Crenarchaeota do not have an FtsZ
equivalent, the crenarchaeotal protein crenactin is related to
eukaryotic actin or ‘actin-related proteins’ and is absent
from Euryarchaeota [59]. Actins are structural proteins
with diverse functions, such as providing structural support to the cell
and, in animals, facilitating muscle contraction [62]. Most recently, it has been shown
that a newly discovered group of Archaebacteria—the
‘Lokiarchaeota’—possess an actin variant that is the most
closely related prokaryotic actin to eukaryotic actins.
This ‘lokiactin’ has a limited distribution within Archaebacteria and is specifically found in Archaebacteria that
themselves—using ribosomal protein phylogenies—are
placed as sister taxa to eukaryotes. Additionally, a group of
proteins that are typically responsible for cell proliferation,
the GTPases of the Ras superfamily, are found in Lokiarchaeota and many of these sequences form sister-group
relationships with eukaryotes on phylogenetic trees. In
addition, the Lokiarchaeota contain ‘[. . .] a primordial version of the eukaryotic ESCRT (endosomal sorting complex
required for transport) vesicle trafficking pathway’
Table 1. Hypothesis as depicted in figure 1 and the corresponding path lengths inferred by these hypotheses. EUB, Eubacteria; EUK, Eukaryota; ARC,
Archaebacteria; EOC, Eocyta.

hypothesis              relationship of path lengths to one another
(a) three domains       EUK-ARC < EUK-EUB = ARC-EUB
(b) eukaryotes early    EUB-ARC < EUK-EUB = EUK-ARC
(c) eocyte              EUK-EOC < EUK-ARC < EUK-EUB = ARC-EUB
(d) ring of life        EUB-ARC = EUB-EOC > EUK-ARC = EUK-EUB = EUK-EOC
5. Analysis of path lengths on four different
hypotheses
Interpretations of evolutionary history usually involve two
components: the data and the model. When looking at
deep evolutionary history, it is possible to employ many
different kinds of data: fossil, cellular, genomic, among
others. The model, on the other hand, is a statement of how
evolution might have occurred. There are usually two parts
to the model: the branching diagram and the process. The
overwhelming majority of branching diagrams in the literature are tree diagrams, and in all likelihood for most
purposes these are the correct kinds of diagram to use [91].
However, if evolutionary history of the evolving objects
under study has contained introgressive events, then a
simple diverging tree cannot be used. To use a tree when
the data were not generated by a tree-like process is an
error in model selection [92]. There are many network-like
alternatives to trees, including phylogenetic networks [93],
sequence-sharing networks [32] and N-rooted fusion networks [31,33]. Each of these diagrams contains inferences of
flows of evolving entities [24]. In this article, we use the relative distances between the major groups of organisms, as
inferred on four different evolutionary scenarios, to ask
how well the data that have been collected can map onto
these distances. The paths we analyse have been defined elsewhere and we are using the available data, together with
Popperian and Whewellian philosophy to analyse how well
they fit the data.
Using path length analysis (see table 1 and figure 1), we
find that there are significant differences in the times that
are estimated since the lineages split. Naturally, HGT will
confuse these times for some genes and make coalescence
seem more recent than it actually is for some genes and
more ancient for other genes. However, all things being
equal, the different hypotheses suggest radically different
path length ratios. When a long separation time exists, we
expect that two taxa will have diverged in most of their features. We expect, for instance, that their PINs will have
diverged, that genome contents will have changed and biochemistry will be different. For shorter paths, we expect
fewer differences. In addition, we expect that the totality of
data will map differently onto the different diagrams. Most
importantly, all four hypotheses imply different path lengths:
none are simply a re-rooting of another hypothesis.
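To illustrate what a path length is, the following Python sketch computes tip-to-tip path lengths on the three tree-like hypotheses by summing branch lengths through the most recent common ancestor. The branch lengths are hypothetical (every tip is placed 4 units from the root), the internal node names are placeholders, and the ring of life is omitted because it is not a tree and needs a network representation.

```python
from itertools import combinations

# Each tree is written as {child: (parent, branch_length)}.
# Branch lengths are hypothetical; internal nodes use lower-case names.
HYPOTHESES = {
    "three-domains": {
        "EUB": ("root", 4), "a": ("root", 1), "ARC": ("a", 3), "EUK": ("a", 3),
    },
    "eukaryotes-early": {
        "EUK": ("root", 4), "p": ("root", 1), "EUB": ("p", 3), "ARC": ("p", 3),
    },
    "eocyte": {
        "EUB": ("root", 4), "a": ("root", 1), "ARC": ("a", 3),
        "b": ("a", 1), "EOC": ("b", 2), "EUK": ("b", 2),
    },
}


def distances_to_ancestors(tree, tip):
    """Map the tip and each of its ancestors to their distance from the tip."""
    dist = {tip: 0}
    node, total = tip, 0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def path_length(tree, x, y):
    """Sum of branch lengths between tips x and y via their closest shared ancestor."""
    up_x = distances_to_ancestors(tree, x)
    up_y = distances_to_ancestors(tree, y)
    return min(up_x[n] + up_y[n] for n in up_x if n in up_y)


for name, tree in HYPOTHESES.items():
    tips = sorted(n for n in tree if n.isupper())
    lengths = {f"{x}-{y}": path_length(tree, x, y) for x, y in combinations(tips, 2)}
    print(name, lengths)
```

Under these assumed branch lengths, the eukaryote-archaebacterial path is the shortest on the three-domains tree, whereas on the eukaryotes-early tree it is as long as the eukaryote-eubacterial path.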
An analysis of table 1 and figure 1 shows some interesting
patterns. The three-domains hypothesis (table 1a and
figure 1a) implies that the shortest path on the tree is the
one between eukaryotes and Archaebacteria—implying that
these should be the most similar organisms, on average.
The eukaryotes-early hypothesis (table 1b and figure 1b)
shows eukaryotes to be equally distant from both kinds of
prokaryote and in both cases this is a long path. The ring
of life hypothesis (table 1d and figure 1d) shows eukaryotes
equally distant to both kinds of prokaryote, though in this
case eukaryotes are closer to the prokaryotes than either is
to the other. The eocyte hypothesis (table 1c and figure 1c)
suggests a short path separating eukaryotes and a subset of
the Archaebacteria—the eocytes, while a longer path
This allows a test of the association between evolutionary history and function. When eukaryotic genes with prokaryote
homologues are partitioned into one class that are homologous
with archaebacterial genes and another class that are homologous to eubacterial genes, we see that the genes that are
homologous to Archaebacteria are more likely to be lethal
upon deletion, they are more highly expressed than the eubacterial genes, the protein products are significantly more central
in protein interaction networks (PINs) and are usually under
more selective constraint (as judged by dN/dS ratios)
[82,83,85]. In addition, there is an interesting difference
between eukaryotic genes with archaebacterial homologues
and those with eubacterial homologues—the eubacterial homologues are more likely to be duplicated in large eukaryotic
genomes and are more likely to be lost in small eukaryotic genomes. The genes with archaebacterial homologues are more
stable in terms of duplication or loss [85].
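A test of association of this kind can be sketched with a 2x2 table of gene counts. The Python example below uses a one-sided Fisher's exact test written with the standard library only. The counts are hypothetical, and this is not necessarily the statistical procedure used in the cited studies.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, under no association, of a top-left count
    at least as large as a (hypergeometric tail).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Hypothetical counts (NOT real data): rows are homologue class,
# columns are lethal / viable upon deletion.
arc_lethal, arc_viable = 30, 70     # genes with archaebacterial homologues
eub_lethal, eub_viable = 15, 135    # genes with eubacterial homologues

odds_ratio = (arc_lethal * eub_viable) / (arc_viable * eub_lethal)
p_value = fisher_exact_greater(arc_lethal, arc_viable, eub_lethal, eub_viable)

print(round(odds_ratio, 2))  # 3.86
print(p_value < 0.01)        # True
```

An odds ratio above 1 with a small p-value would indicate that, in such a table, genes with archaebacterial homologues are lethal upon deletion more often than expected by chance.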
Phylogenetic analyses using sophisticated heterogeneous
models have consistently placed eukaryote informational
genes within archaebacterial clades [87–89]. Phylogenetic
supertrees recognize three main kinds of tree topology—
eukaryotes branch with Cyanobacteria, eukaryotes branch
with alpha-proteobacteria and eukaryotes branch within
Archaebacteria [84]. This supertree analysis found that the
gene trees infer many other kinds of relationships, but all
these topologies were at a low frequency that was no more
than expected by chance [84].
It is clear from the above that there are many eukaryotic
signature proteins (ESPs) [69,83,85], and many eukaryotic-specific features (proteins, organelles and processes) that
are either unique, highly elaborated compared with their prokaryotic equivalents, or completely absent (e.g. simultaneous
transcription and translation), just as there are many eubacterial-specific features (e.g. peptidoglycan cell walls) and
archaebacterial-specific features (e.g. ether-linked lipids, methanogenesis in some groups). If these are considered in isolation,
explaining their existence requires hypothesizing that either
(a) these features evolved in an isolated derived eukaryotic lineage that engaged in sustained gene genesis, or that (b) these
eukaryote-specific features are ancestral and prokaryotes have
lost them as a form of simplifying evolution [90].
7. The eukaryotes-early hypothesis (tree-like,
simplificationist)
[Figure 1 panels: (a) three-domains, (b) eukaryotes-early, (c) eocyte, (d) ring of life. Tips: Eubacteria, Archaebacteria (eocytes and other Archaebacteria), eukaryotes. Inferred path lengths shown: EUK-ARC, EUK-EUB, EUB-ARC, EUK-EOC.]
Figure 1. Phylogenetic hypotheses for the highest level relationships of life.
On the right are the classes of hypothesis described in the text and on the
left are the various inferred path lengths. The colour coding is retained for
each hypothesis. (Online version in colour.)
separates eukaryotes and the rest of the Archaebacteria and a
much longer path separates eukaryotes from Eubacteria. The
eocyte hypothesis is, of course, a pruning of the ring of life
hypothesis, where there is no explicit direct path from Eubacteria to eocytes, except through the root. In reality, the eocyte
hypothesis is a ‘region’ of the overall ring of life hypothesis,
having been based initially on ribosome structures and later
on informational gene phylogenies.
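The path-length expectations of the tree-like hypotheses can be made concrete with a toy calculation. The sketch below is an illustration only: it assumes unit branch lengths (that is, clocklike change), and the internal node names (`root`, `arc_anc`, `eoc_euk_anc`) are hypothetical labels, not taken from figure 1.

```python
from collections import deque


def path_length(edges, start, goal):
    """Sum of branch lengths along the unique path between two tree nodes."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {start: 0.0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt, w in adj[node]:
            if nxt not in dist:  # a tree has exactly one path to each node
                dist[nxt] = dist[node] + w
                queue.append(nxt)
    return dist[goal]


# Three-domains topology: eukaryotes are sister to all Archaebacteria.
three_domains = [
    ("root", "Eubacteria", 1.0),
    ("root", "arc_anc", 1.0),
    ("arc_anc", "Archaebacteria", 1.0),
    ("arc_anc", "eukaryotes", 1.0),
]

# Eocyte topology: eukaryotes are sister to the eocytes within Archaebacteria.
eocyte = [
    ("root", "Eubacteria", 1.0),
    ("root", "arc_anc", 1.0),
    ("arc_anc", "other Archaebacteria", 1.0),
    ("arc_anc", "eoc_euk_anc", 1.0),
    ("eoc_euk_anc", "eocytes", 1.0),
    ("eoc_euk_anc", "eukaryotes", 1.0),
]

for other in ("eocytes", "other Archaebacteria", "Eubacteria"):
    print("EUK to", other, "=", path_length(eocyte, "eukaryotes", other))
```

Under these assumptions the eocyte topology gives the ordering described in the text: the shortest path joins eukaryotes to eocytes, a longer one joins them to the other Archaebacteria, and the longest joins them to Eubacteria.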
6. Three-domains hypothesis (tree-like,
autogenous)
The three-domains model places the eukaryotes as a sister-group to Archaebacteria. This implies a shorter path length,
on average, between eukaryote and archaebacterial genomes.
That is to say, we would expect eukaryotes and Archaebacteria to have more in common with each other than either has in common
with Eubacteria (this assumes a somewhat constant rate of
genomic change, in effect clocklike behaviour). Instead, the
data reveal that eukaryote genomes tend to have more
The eukaryotes-early hypothesis has been seen in several
incarnations. Its first incarnation was known as the ‘introns-early’ hypothesis, which speculated that genes in pieces
were ancestral (perhaps a left-over from the RNA world
that preceded it) and that, in some lineages, these genes were
streamlined to give rise to prokaryotes [94,95]. Rooted
phylogenetic trees were also put forward to support this
idea [96,97], as were genome content phylogenies analysed
using parsimony [98], whose authors called the last
common ancestor of all life ‘incongruously complex’. In this
scenario, prokaryotes are seen as ‘superficially and secondarily’ simple [96], with the eukaryotic cell being a ‘[. . .]
unique cell type that cannot be deconstructed into features inherited
directly from Archaea and Bacteria’ [99]. How this irreducible
complexity emerged, however, is unclear, unless one assumes
a previous unsampled (perhaps extra-terrestrial) history
during which this complexity first arose (which would make
this hypothesis highly unparsimonious when compared
against other much simpler hypotheses), or the existence of a
creator. Irrespective of the above, assuming the tree in
figure 1b, we see that the longest period of independent evolution is the lineage leading to eukaryotes and we see that
Archaebacteria and Eubacteria have a shorter path separating
them. Ancestrally, the cell type is eukaryote and so the
prokaryote cell structure has evolved from a more complex
ancestor and prokaryote evolution is one of predominantly
reductional evolution. Certainly, we know that prokaryote
genome evolution in particular is biased towards deletions
[100], though eukaryotes also manifest this bias.
Following the paths in figure 1b, this would imply that
somehow when Archaebacteria and Eubacteria split, the
Archaebacteria preferentially took with them or retained
more essential genes, more highly expressed genes and more
central proteins in the ancestral PIN and lost the less essential,
more lowly expressed genes and genes whose protein products were less central in PINs. Loss of non-essential genes is
not new to evolutionary biology and such a scenario is well
known, particularly in symbiont or parasite genomes [101].
However, the topology has an additional implication—that
the Eubacteria preferentially took or retained the less essential
genes, the more lowly expressed genes and the genes whose
protein products were less central in PINs, while dispensing
with the more essential genes. It is not immediately obvious
what selective advantage might accrue from deleting genes
that are presumably very important for life. In effect, this
eubacterial genes than archaebacterial genes [67,69,82,83].
This could still be explained on the Woese model, if those
eubacterial homologues were the least likely to be lost
throughout evolutionary history. However, this hypothesis is
falsified by the observation by Alvarez-Ponce et al. [85] of an
association between the likelihood of gene loss (and also of
gene duplication) and whether a eukaryote gene is homologous to a eubacterial gene or an archaebacterial gene. In
other words, despite the longer path between eukaryotes
and Eubacteria in figure 1a, this model requires that a greater
number of dispensable genes were retained from Eubacteria than were retained
from the more closely related Archaebacteria. This finding
makes the three-domains hypothesis highly implausible.
8. Eocyte hypothesis (tree-like, partial
explanation)
The eocyte hypothesis was first put forward in 1984 on the
basis of the similarities in the ultrastructures of ribosomes
found in eukaryotes and some Archaebacteria [103]. This
led Lake and colleagues to propose that there was a sister-group relationship between eukaryotes and a subset of
Archaebacteria, which they called eocytes, and not the relationship that was being suggested from studies of small-subunit
ribosomal RNA sequences, which placed eukaryotes as the
sister-group of all Archaebacteria. This eocyte hypothesis
spent more than 20 years as the lesser-known alternative to
the Woesian tree, but has recently received more attention
as better phylogenetic methods have been applied to collections of informational genes [87–89] and supertrees have
been built from large collections of orthologues [84]. This
hypothesis relates to the specific set of relationships between
eukaryotes and Archaebacteria and centres on whether
Archaebacteria are monophyletic or not. If eukaryotes are
ancestral, then it is necessary to explain how sophisticated
heterogeneous maximum-likelihood phylogenies place these
informational genes within the diversity of Archaebacteria
and reject with significance the hypotheses of monophyletic
Archaebacteria [87,89].
Probably the most conclusive support for a within-Archaebacteria relationship for some genes in eukaryotes has come
from the newly discovered ‘Lokiarchaeota’ sequences [63].
These sequences clearly identify a sister-group relationship
with eukaryotes for some key genes, including members of
the Ras superfamily, eukaryotic actins (lokiactins) and the
ESCRT pathway. They provide functional clues to the first
eukaryotes and provide a genetic toolbox that could explain
the origin of phagotrophy.
9. Ring of life (network, complexificationist)
The ring of life hypothesis places a genomic and cellular
merger event at the centre of eukaryogenesis. In this scenario,
eukaryotes are not ancient: they are a more recent group than
either of the two prokaryotic groups. This scenario also
implies that neither prokaryotic group is monophyletic, as
the eukaryotes arose from within both groups and not as a separate lineage. Furthermore, the expectation from this scenario
is that Archaebacteria (specifically, the eocytes) and the
Eubacteria (specifically, the mitochondrial ancestor) made
similar contributions to the eukaryote. The implication is
that eukaryotes are not simply modified Archaebacteria or
‘Archaebacteria with some eubacterial parts’, rather that the
origin of eukaryotes corresponds to an egalitarian merger
of two distant relatives, and without both parts eukaryotes
would not emerge. This is what we mean by ‘similar’: this
event would not have occurred unless both were present.
Support for this hypothesis comes from genome content
phylogenies [38]. Lake and Rivera [104] have used the
method of conditioned reconstruction to produce a collection
of phylogenetic trees that are incongruent. However, these
trees can be perfectly mapped onto a ring [105]. By taking a
‘signal stripping’ approach, Pisani et al. [84] showed that there
were three and only three phylogenetic signals uniting
prokaryotes and eukaryotes: one specifically uniting eukaryotes and cyanobacteria, one uniting eukaryotes and alpha-proteobacteria and one uniting eukaryotes and Archaebacteria.
An analysis of chaperone systems in eukaryotes showed
that a mix of genes with homologies to both Archaebacteria and Eubacteria [106] is active in eukaryotes and,
furthermore, that folding of proteins with archaebacterial
affiliations by eubacterial chaperones and vice versa was
normal in eukaryotes. A simple analysis of the phylogenetic
affiliations of eukaryotic genes [69,82,83,85] shows there to be
evolutionary affiliations between most functional categories
of eukaryotic proteins.
Calculations from bioenergetics show that the per-gene
energy requirement for eukaryotic genes vastly exceeds the prokaryotic requirement and, therefore, eukaryotes can only exist if
they have a ‘powerhouse’ [55]. In addition, eukaryote proteomes
show evidence of protein interactions being structured according to phylogenetic affiliation [82,83]: eubacterial proteins
preferentially interact with eubacterial proteins, archaebacterial
proteins preferentially interact with archaebacterial proteins and
ESPs preferentially interact with ESPs.
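Whether interactions are structured by phylogenetic affiliation can be checked by comparing the observed share of same-affiliation interactions with the share expected if partners were drawn at random. The sketch below is a toy illustration under that assumption; the protein names, labels and edges are hypothetical, and the cited analyses [82,83] may have used a different statistic.

```python
from collections import Counter


def same_affiliation_fraction(edges, affiliation):
    """Fraction of interactions whose two partners share an affiliation."""
    same = sum(1 for u, v in edges if affiliation[u] == affiliation[v])
    return same / len(edges)


def expected_fraction_random(affiliation):
    """Chance that two distinct proteins drawn at random share an affiliation."""
    counts = Counter(affiliation.values())
    n = sum(counts.values())
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical toy network: three proteins per class.
affiliation = {
    "a1": "archaebacterial", "a2": "archaebacterial", "a3": "archaebacterial",
    "b1": "eubacterial", "b2": "eubacterial", "b3": "eubacterial",
    "e1": "ESP", "e2": "ESP", "e3": "ESP",
}
edges = [
    ("a1", "a2"), ("a2", "a3"),   # archaebacterial with archaebacterial
    ("b1", "b2"), ("b2", "b3"),   # eubacterial with eubacterial
    ("e1", "e2"), ("e2", "e3"),   # ESP with ESP
    ("a1", "b1"), ("b3", "e1"),   # cross-affiliation interactions
]

print("observed:", same_affiliation_fraction(edges, affiliation))
print("expected at random:", expected_fraction_random(affiliation))
```

An observed fraction well above the random expectation is the pattern that would be read as interactions being structured by affiliation; a real analysis would also need a significance test, for example by permuting the labels.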
This model would expect and require that some gene
phylogenies place eukaryotes within the diversity of Eubacteria and also that other phylogenies would place them
within the diversity of Archaebacteria. This is exactly what
we see [67,68,87].
Analyses have not attributed equal roles to archaebacterial
and eubacterial homologues of eukaryotes. For instance,
archaebacterial homologues are more central in PINs (both
closeness centrality and betweenness centrality), more slowly
evolving, more likely to be lethal upon deletion, and more
likely to be involved in informational processes, among
other things [83]. Simply stated, archaebacterial genes seem
to be more important in eukaryotic genomes, but eubacterial
genes seem to be more flexible and more likely to be duplicated
or lost [85]. Given a merger scenario for eukaryogenesis, this
would imply that at the start of eukaryogenesis there were
two genome equivalents in the merger. Over time, one (the
archaebacterial) acted as a more central genome, more
involved in informational processes. The eubacterial genome
could lose the more important informational genes without a
fitness cost if much of the cellular duplication, transcription
and translation was being carried out by the archaebacterial
proteins. The mitochondrion retained some informational processes, but the nucleo-cytosolic informational functions were
the responsibility of the archaebacterial homologues. By contrast, the less central eubacterial homologues were largely
involved in metabolic functions and overall, these genes
were more numerous.
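The notion of a protein being more central in a PIN can be illustrated with closeness centrality, one of the two measures mentioned above. This is a minimal sketch on a hypothetical star-shaped network, assuming an unweighted, connected graph; the node names are invented for illustration.

```python
from collections import deque


def closeness_centrality(adj, node):
    """Closeness of `node`: (reachable nodes) / (sum of shortest-path lengths)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:  # breadth-first search gives shortest paths
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical network: one hub protein linked to four peripheral proteins.
adj = {
    "hub": ["p1", "p2", "p3", "p4"],
    "p1": ["hub"],
    "p2": ["hub"],
    "p3": ["hub"],
    "p4": ["hub"],
}

print("hub:", closeness_centrality(adj, "hub"))
print("peripheral:", closeness_centrality(adj, "p1"))
```

In this toy network the hub reaches every other protein in one step and so scores higher than any peripheral protein, which is the sense in which archaebacterial homologues are described as more central.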
hypothesis expects that Eubacteria were effectively eviscerated,
but somehow thrived and expanded. This speculation is the
opposite of our expectations using standard evolutionary
theories.
In addition to the problem with explaining the eukaryotes-early hypothesis using cellular data, the model fails to
account for the absence of fossil eukaryotes before 1.7 billion
years ago. The oldest known fossil eukaryote Shuiyousphaeridium macroreticulatum is from the Ruyang group (in China)
and is dated at approximately 1.6 to 1.8 Ga [102]. By contrast,
the oldest known prokaryotic fossils are twice as old [102].
Assuming that single-celled prokaryotes and eukaryotes
have been equally amenable to fossilization over time, there
is a question over where the missing fossils have gone.
10. What is a eukaryote?
References
1. Darwin C. 1859 On the origin of species by means of
natural selection. London, UK: John Murray.
2. Excoffier L, Laval G, Schneider S. 2005 Arlequin
(version 3.0): an integrated software package for
population genetics data analysis. Evol. Bioinform.
Online 1, 47–50.
3. McInerney JO. 2013 More than three dimensions: interlineage evolution’s ecological importance. Trends Ecol.
Evol. 28, 624–625. (doi:10.1016/j.tree.2013.09.002)
4. Lydiate DJ, Malpartida F, Hopwood DA. 1985 The
Streptomyces plasmid SCP2*: its functional analysis
and development into useful cloning vectors. Gene
35, 223–235. (doi:10.1016/0378-1119(85)90001-0)
5. Canchaya C, Fournous G, Chibani-Chennoufi S,
Dillmann ML, Brussow H. 2003 Phage as agents of
lateral gene transfer. Curr. Opin. Microbiol. 6,
417–424. (doi:10.1016/S1369-5274(03)00086-9)
6. Xiao X, Li J, Samulski RJ. 1996 Efficient long-term gene
transfer into muscle tissue of immunocompetent
mice by adeno-associated virus vector. J. Virol. 70,
8098–8108.
7. Salyers AA, Shoemaker NB, Stevens AM, Li LY. 1995
Conjugative transposons: an unusual and diverse set
of integrated gene transfer elements. Microbiol. Rev.
59, 579– 590.
8. Lang AS, Zhaxybayeva O, Beatty JT. 2012 Gene
transfer agents: phage-like elements of genetic
exchange. Nat. Rev. Microbiol. 10, 472–482.
(doi:10.1038/nrmicro2802)
9. Dubey GP, Ben-Yehuda S. 2011 Intercellular
nanotubes mediate bacterial communication.
Cell 144, 590–600. (doi:10.1016/j.cell.2011.01.015)
10. Heliconius Genome Consortium. 2012 Butterfly
genome reveals promiscuous exchange of mimicry
adaptations among species. Nature 487, 94 –98.
(doi:10.1038/nature11041)
11. Patterson N, Richter DJ, Gnerre S, Lander ES, Reich
D. 2006 Genetic evidence for complex speciation of
humans and chimpanzees. Nature 441, 1103–1108.
(doi:10.1038/nature04789)
12. McInerney JO, Pisani D, Bapteste E, O’Connell MJ.
2011 The public goods hypothesis for the evolution
of life on Earth. Biol. Direct 6, 41. (doi:10.1186/1745-6150-6-41)
13. Bapteste E et al. 2009 Prokaryotic evolution and the
tree of life are two different things. Biol. Direct 4,
34. (doi:10.1186/1745-6150-4-34)
14. Bapteste E, Lopez P, Bouchard F, Baquero F,
McInerney JO, Burian RM. 2012 Evolutionary
analyses of non-genealogical bonds produced
by introgressive descent. Proc. Natl Acad. Sci.
USA 109, 18 266–18 272. (doi:10.1073/pnas.1206541109)
15. Chewapreecha C et al. 2014 Dense genomic
sampling identifies highways of pneumococcal
recombination. Nat. Genet. 46, 305–309. (doi:10.1038/ng.2895)
Competing interests. We declare we have no competing interests.
Funding. We received no funding for this study.
Eukaryotes are a secondary lineage of life that originated following a symbiotic event involving a eubacterium
and an archaebacterium. As a consequence, the eukaryotic
genome is a complex mixture of archaebacterial and eubacterial genes that were horizontally shaken into the same cocktail
glass. In addition, eukaryotes invented their own genes and
cellular structures. The major gene flows into eukaryotes
have almost certainly been from an archaebacterium (the
eocyte) that would not be hugely different to modern Archaebacteria, from a eubacterium that would be placed
somewhere within the diversity of modern alpha-proteobacteria, and the third gene flow is to be found in the
Archaeplastida and arose from within the diversity of Cyanobacteria. Other prokaryote-to-eukaryote gene flows, though
they are known to exist [107] and might have had significant
phenotypic effects [108], do not seem to have been formative.
Modern eukaryotes are composed largely of archaebacterial homologues that are more highly expressed, more central
in PINs, more likely to be lethal upon deletion, more slowly
evolving and more likely to be informational. By contrast,
eukaryotic genes with eubacterial homologues are more
likely to be duplicated in larger genomes and more likely to
be lost in smaller genomes, they are more likely to be
involved in Mendelian disease in humans, they are less
likely to be lethal upon deletion and are faster evolving
than their archaebacterial counterparts.
The main archaebacterial gene flow into eukaryotes
brought with it the raw materials for the eukaryote cytoskeleton, transcription, vesicle trafficking and some elements of
cell division. The eubacterial gene flow brought with it
beta-oxidation, electron transport chain, fructose/mannose
metabolism and pathways for the synthesis of lipids, biotin,
haem and iron–sulfur clusters [78]. The eubacterial gene
flow is also likely to have brought group II introns, which
evolved into spliceosomal introns [50].
To conclude, it is clear that eukaryotes cannot be correctly
defined as ‘derived’ Archaebacteria, or as ‘derived’ Eubacteria.
Indeed, to view eukaryotes as being from either the archaebacterial or the eubacterial lineages is an over-simplification. Each
human is derived equally from both parents. They would not
exist without a genetic contribution from both, and it does not
matter if they look more like their mother or father, or which surname they carry, if any. The reality is that a human only exists as a
consequence of a contribution from both parents. Analogously,
eukaryotes are equally eubacterial and archaebacterial. A taxonomic debate exists in the literature on the early evolution of
life, whereby hypotheses have been suggested to be characterizable either as three-domains or two-domains based (2D versus
3D hypotheses). This characterization inherently assumes the
existence of a tree-like pattern of evolution, which is misleading. Because eukaryotes arose from both Archaebacteria and
Eubacteria, there are only two (monophyletic) lineages of life:
(i) cellular life and (ii) the eukaryotes. Monophyletic eukaryotes
are nested within monophyletic life. Eukaryotes make domain-based classifications obsolete and we therefore advocate dismissing the use of this term (which can easily be replaced by the term
lineage, for instance) entirely. That is, we advocate a ‘domain-free’ view of the history of life, as debates about whether there
should be two domains or three are essentialist and moot.
In a pluralistic view of cellular life on the planet, we can
see that the merging of eubacterial genes with archaebacterial
genes gave rise to the halophiles and indeed it made an enormous contribution to the origins of most of the major groups
of Archaebacteria. We see that photosynthesis can only be
interpreted as a series of gene flows around the prokaryotic
and eukaryotic worlds. We see that eukaryotes have arisen
as a consequence of major flows between prokaryotes
initially (eukaryogenesis), and later, between a prokaryote
group and a eukaryotic group (plastid origins) [84].
Life’s history is complex and we should not try to
simplify it to suit our need for orderly nomenclatural
systems.
47. McInerney JO, Martin WF, Koonin EV, Allen JF,
Galperin MY, Lane N, Archibald JM, Embley TM.
2011 Planctomycetes and eukaryotes: a case of
analogy not homology. Bioessays 33, 810–817.
(doi:10.1002/bies.201100045)
48. Staub E, Fiziev P, Rosenthal A, Hinzmann B. 2004
Insights into the evolution of the nucleolus by an
analysis of its protein domain repertoire. Bioessays
26, 567 –581. (doi:10.1002/bies.20032)
49. Erwin DH. 2015 A public goods approach to major
evolutionary innovations. Geobiology 13, 308 –315.
(doi:10.1111/gbi.12137)
50. Rogozin IB, Carmel L, Csuros M, Koonin EV. 2012
Origin and evolution of spliceosomal introns. Biol.
Direct 7, 11. (doi:10.1186/1745-6150-7-11)
51. Decker KB, Hinton DM. 2013 Transcription regulation
at the core: similarities among bacterial, archaeal,
and eukaryotic RNA polymerases. Annu. Rev.
Microbiol. 67, 113–139. (doi:10.1146/annurev-micro-092412-155756)
52. Ramesh MA, Malik SB, Logsdon JM Jr. 2005 A
phylogenomic inventory of meiotic genes; evidence
for sex in Giardia and an early eukaryotic origin of
meiosis. Curr. Biol. 15, 185– 191.
53. Naor A, Gophna U. 2013 Cell fusion and hybrids in
Archaea: prospects for genome shuffling and
accelerated strain development for biotechnology.
Bioengineered 4, 126–129. (doi:10.4161/bioe.22649)
54. Gophna AN, Lapierre P, Mevarech M, Papke RT, Gophna
U. 2012 Low species barriers in halophilic Archaea and
the formation of recombinant hybrids. Curr. Biol. 22,
1444–1448. (doi:10.1016/j.cub.2012.05.056)
55. Lane N, Martin W. 2010 The energetics of genome
complexity. Nature 467, 929– 934. (doi:10.1038/
nature09486)
56. Pereira SL, Grayling RA, Lurz R, Reeve JN. 1997
Archaeal nucleosomes. Proc. Natl Acad. Sci. USA 94,
12 633 –12 637. (doi:10.1073/pnas.94.23.12633)
57. Reeve JN, Bailey KA, Li WT, Marc F, Sandman K,
Soares DJ. 2004 Archaeal histones: structures,
stability and DNA binding. Biochem. Soc. Trans. 32,
227–230. (doi:10.1042/BST0320227)
58. Anuchin AM, Goncharenko AV, Demidenok OI,
Kaprel’iants AS. 2011 Histone-like proteins of
bacteria (review). Prikl. Biokhim. Mikrobiol. 47,
635–641.
59. Yutin N, Wolf MY, Wolf YI, Koonin EV. 2009 The
origins of phagocytosis and eukaryogenesis. Biol.
Direct 4, 9. (doi:10.1186/1745-6150-4-9)
60. Hammesfahr B, Kollmar M. 2012 Evolution of the
eukaryotic dynactin complex, the activator of
cytoplasmic dynein. BMC Evol. Biol. 12, 95. (doi:10.
1186/1471-2148-12-95)
61. Makarova KS, Yutin N, Bell SD, Koonin EV. 2010
Evolution of diverse cell division and vesicle
formation systems in Archaea. Nat. Rev. Microbiol. 8,
731–741. (doi:10.1038/nrmicro2406)
62. Gunning PW, Ghoshdastider U, Whitaker S, Popp
D, Robinson RC. 2015 The evolution of
compositionally and functionally distinct actin
filaments. J. Cell Sci. 128, 2009–2019. (doi:10.
1242/jcs.165563)
31. Haggerty LS et al. 2014 A pluralistic account of
homology: adapting the models to the data. Mol. Biol.
Evol. 31, 501–516. (doi:10.1093/molbev/mst228)
32. Halary S, Leigh JW, Cheaib B, Lopez P, Bapteste E.
2010 Network analyses structure genetic diversity
in independent genetic worlds. Proc. Natl Acad.
Sci. USA 107, 127–132. (doi:10.1073/pnas.
0908978107)
33. Coleman O, Hogan R, McGoldrick N, Rudden N,
McInerney JO. 2015 Evolution by pervasive gene
fusion in antibiotic resistance and antibiotic
synthesizing genes. Computation 3, 114 –127.
(doi:10.3390/computation3020114)
34. O’Hara R. 1998 Population thinking and tree
thinking in systematics. Zool. Scr. 26, 323 –329.
(doi:10.1111/j.1463-6409.1997.tb00422.x)
35. Gribaldo S, Poole AM, Daubin V, Forterre P,
Brochier-Armanet C. 2010 The origin of eukaryotes
and their relationship with the Archaea: are we at a
phylogenomic impasse? Nat. Rev. Microbiol. 8,
743 –752. (doi:10.1038/nrmicro2426)
36. Popper K. 1934 The logic of scientific discovery.
Vienna, Austria: Mohr Siebeck.
37. Whewell W. 1840 The philosophy of inductive sciences,
founded upon their history. London, UK: JW Parker.
38. Rivera MC, Lake JA. 2004 The ring of life provides
evidence for a genome fusion origin of eukaryotes.
Nature 431, 152–155. (doi:10.1038/nature02848)
39. Cravatt BF, Kodadek T. 2015 Editorial overview: Omics:
methods to monitor and manipulate biological
systems: recent advances in ’omics’. Curr. Opin. Chem.
Biol. 24, v–vii. (doi:10.1016/j.cbpa.2014.12.023)
40. Fisch KM, Meißner T, Gioia L, Ducom JC, Carland
TM, Loguercio S, Su AI. 2015 Omics Pipe: a
community-based framework for reproducible
multi-omics data analysis. Bioinformatics 31,
1724 –1728. (doi:10.1093/bioinformatics/btv061)
41. Fondi M, Lio P. 2015 Multi-omics and metabolic
modelling pipelines: challenges and tools for
systems microbiology. Microbiol. Res. 171, 52 –64.
(doi:10.1016/j.micres.2015.01.003)
42. Bapteste E, Charlebois RL, MacLeod D, Brochier C.
2005 The two tempos of nuclear pore complex
evolution: highly adapting proteins in an ancient
frozen structure. Genome Biol. 6, R85. (doi:10.1186/
gb-2005-6-10-r85)
43. Mans BJ, Anantharaman V, Aravind L, Koonin EV.
2004 Comparative genomics, evolution and origins
of the nuclear envelope and nuclear pore complex.
Cell Cycle 3, 1612–1637. (doi:10.4161/cc.3.12.1345)
44. Iyer LM, Anantharaman V, Wolf MY, Aravind L. 2008
Comparative genomics of transcription factors and
chromatin proteins in parasitic protists and other
eukaryotes. Int. J. Parasitol. 38, 1–31. (doi:10.
1016/j.ijpara.2007.07.018)
45. Cavalier-Smith T. 2010 Origin of the cell nucleus,
mitosis and sex: roles of intracellular coevolution.
Biol. Direct 5, 7. (doi:10.1186/1745-6150-5-7)
46. McInerney JO. 1998 Replicational and transcriptional
selection on codon usage in Borrelia burgdorferi.
Proc. Natl Acad. Sci. USA 95, 10 698 –10 703.
(doi:10.1073/pnas.95.18.10698)
16. Sheppard SK et al. 2013 Genome-wide association study
identifies vitamin B5 biosynthesis as a host specificity
factor in Campylobacter. Proc. Natl Acad. Sci. USA 110,
11 923–11 927. (doi:10.1073/pnas.1305559110)
17. Nelson-Sathi S, Dagan T, Landan G, Janssen A, Steel
M, McInerney JO, Deppenmeier U, Martin WF. 2012
Acquisition of 1,000 eubacterial genes
physiologically transformed a methanogen at the
origin of Haloarchaea. Proc. Natl Acad. Sci. USA 109,
20 537–20 542. (doi:10.1073/pnas.1209119109)
18. Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV.
2001 Genome trees constructed using five different
approaches suggest new major bacterial clades. BMC
Evol. Biol. 1, 8. (doi:10.1186/1471-2148-1-8)
19. Nelson-Sathi S et al. 2015 Origins of major archaeal
clades correspond to gene acquisitions from bacteria.
Nature 517, 77–80. (doi:10.1038/nature13805)
20. Akanni WA, Siu-Ting K, Creevey CJ, McInerney JO,
Wilkinson M, Foster PG, Pisani D. 2015 Horizontal
gene flow from Eubacteria to Archaebacteria and
what it means for our understanding of
eukaryogenesis. Phil. Trans. R. Soc. B 370,
20140337. (doi:10.1098/rstb.2014.0337)
21. Gouy M, Li WH. 1989 Phylogenetic analysis based
on rRNA sequences supports the archaebacterial
rather than the eocyte tree. Nature 339, 145–147.
(doi:10.1038/339145a0)
22. Eisen JA, Fraser CM. 2003 Phylogenomics:
intersection of evolution and genomics. Science
300, 1706 –1707. (doi:10.1126/science.1086292)
23. Doolittle WF, Bapteste E. 2007 Pattern pluralism and
the Tree of Life hypothesis. Proc. Natl Acad. Sci. USA
104, 2043–2049. (doi:10.1073/pnas.0610699104)
24. Bapteste E et al. 2013 Networks: expanding
evolutionary thinking. Trends Genet. 29, 439–441.
(doi:10.1016/j.tig.2013.05.007)
25. Creevey CJ, Fitzpatrick DA, Philip GK, Kinsella RJ,
O’Connell MJ, Pentony MM, Travers SA, Wilkinson
M, McInerney JO. 2004 Does a tree-like phylogeny
only exist at the tips in the prokaryotes?
Proc. R. Soc. Lond. B 271, 2551 –2558. (doi:10.
1098/rspb.2004.2864)
26. Puigbo P, Wolf YI, Koonin EV. 2010 The tree
and net components of prokaryote evolution. Genome
Biol. Evol. 2, 745–756. (doi:10.1093/gbe/evq062)
27. Puigbo P, Wolf YI, Koonin EV. 2009 Search for a
’Tree of Life’ in the thicket of the phylogenetic
forest. J. Biol. 8, 59. (doi:10.1186/jbiol159)
28. Koonin EV, Wolf YI, Puigbo P. 2009 The
phylogenetic forest and the quest for the elusive
tree of life. Cold Spring Harb. Symp. Quant. Biol. 74,
205–213. (doi:10.1101/sqb.2009.74.006)
29. Koonin EV, Puigbó P, Wolf YI. 2011 Comparison of
phylogenetic trees and search for a central trend in
the ‘Forest of Life’. J. Comput. Biol. 18, 917–924.
(doi:10.1089/cmb.2010.0185)
30. Kloesges T, Popa O, Martin W, Dagan T. 2011
Networks of gene sharing among 329
proteobacterial genomes reveal differences in lateral
gene transfer frequency at different phylogenetic
depths. Mol. Biol. Evol. 28, 1057–1074. (doi:10.
1093/molbev/msq297)
choice of matrix are not justified. BMC Evol. Biol. 6,
29. (doi:10.1186/1471-2148-6-29)
93. Bryant D, Moulton V. 2004 Neighbor-net: an
agglomerative method for the construction of
phylogenetic networks. Mol. Biol. Evol. 21,
255–265. (doi:10.1093/molbev/msh018)
94. Kersanach R, Brinkmann H, Liaud MF, Zhang DX, Martin
W, Cerff R. 1994 Five identical intron positions in ancient
duplicated genes of eubacterial origin. Nature 367,
387–389. (doi:10.1038/367387a0)
95. de Souza SJ, Long M, Klein RJ, Roy S, Lin S, Gilbert W.
1998 Toward a resolution of the introns early/late
debate: only phase zero introns are correlated with
the structure of ancient proteins. Proc. Natl Acad. Sci.
USA 95, 5094–5099. (doi:10.1073/pnas.95.9.5094)
96. Forterre P, Philippe H. 1999 The last universal common
ancestor (LUCA), simple or complex? Biol. Bull. 196,
373–375; discussion 375–377. (doi:10.2307/1542973)
97. Forterre P, Philippe H. 1999 Where is the root of the
universal tree of life? Bioessays 21, 871–879.
(doi:10.1002/(SICI)1521-1878(199910)21:10,871::AID-BIES10.3.0.CO;2-Q)
98. Harish A, Tunlid A, Kurland CG. 2013 Rooted
phylogeny of the three superkingdoms. Biochimie
95, 1593–1604. (doi:10.1016/j.biochi.2013.04.016)
99. Kurland CG, Collins LJ, Penny D. 2006 Genomics and
the irreducible nature of eukaryote cells. Science
312, 1011–1014. (doi:10.1126/science.1121674)
100. Kuo CH, Ochman H. 2009 Deletional bias across
the three domains of life. Genome Biol. Evol. 1,
145–152. (doi:10.1093/gbe/evp016)
101. Moran NA. 1996 Accelerated evolution and Muller’s
rachet in endosymbiotic bacteria. Proc. Natl Acad. Sci.
USA 93, 2873–2878. (doi:10.1073/pnas.93.7.2873)
102. Wacey D, Kilburn M, Saunders M, Cliff J, Brasier M.
2011 Microfossils of sulphur-metabolizing cells in
3.4-billion-year-old rocks of Western Australia. Nat.
Geosci. 4, 698–702. (doi:10.1038/ngeo1238)
103. Lake JA, Henderson E, Oakes M, Clark MW. 1984
Eocytes: a new ribosome structure indicates a
kingdom with a close relationship to eukaryotes.
Proc. Natl Acad. Sci. USA 81, 3786–3790. (doi:10.1073/pnas.81.12.3786)
104. Lake JA, Rivera MC. 2004 Deriving the genomic tree
of life in the presence of horizontal gene transfer:
conditioned reconstruction. Mol. Biol. Evol. 21,
681–690. (doi:10.1093/molbev/msh061)
105. McInerney JO, Wilkinson M. 2005 New methods
ring changes for the tree of life. Trends Ecol. Evol.
20, 105–107. (doi:10.1016/j.tree.2005.01.007)
106. Bogumil D, Alvarez-Ponce D, Landan G, McInerney JO,
Dagan T. 2014 Integration of two ancestral chaperone
systems into one: the evolution of eukaryotic molecular
chaperones in light of eukaryogenesis. Mol. Biol. Evol.
31, 410–418. (doi:10.1093/molbev/mst212)
107. Hirt RP, Alsmark C, Embley TM. 2015 Lateral gene
transfers and the origins of the eukaryote proteome: a
view from microbial parasites. Curr. Opin. Microbiol. 23,
155–162. (doi:10.1016/j.mib.2014.11.018)
108. Marcet-Houben M, Gabaldon T. 2010 Acquisition of
prokaryotic genes by fungal genomes. Trends Genet.
26, 5–8. (doi:10.1016/j.tig.2009.11.007)
78. Gabaldon T, Huynen MA. 2003 Reconstruction of the
proto-mitochondrial metabolism. Science 301, 609.
(doi:10.1126/science.1085463)
79. Martin W, Brinkmann H, Savonna C, Cerff R. 1993
Evidence for a chimeric nature of nuclear genomes:
eubacterial origin of eukaryotic glyceraldehyde-3-phosphate dehydrogenase genes. Proc. Natl Acad. Sci.
USA 90, 8692–8696. (doi:10.1073/pnas.90.18.8692)
80. Tovar J, Leon-Avila G, Sanchez LB, Sutak R, Tachezy
J, van der Giezen M, Hernandez M, Muller M,
Lucocq JM. 2003 Mitochondrial remnant organelles
of Giardia function in iron-sulphur protein
maturation. Nature 426, 172–176.
(doi:10.1038/nature01945)
81. Emelyanov VV. 2003 Phylogenetic affinity of a
Giardia lamblia cysteine desulfurase conforms to
canonical pattern of mitochondrial ancestry. FEMS
Microbiol. Lett. 226, 257–266. (doi:10.1016/S0378-1097(03)00598-6)
82. Alvarez-Ponce D, McInerney JO. 2011 The human
genome retains relics of its prokaryotic ancestry:
human genes of archaebacterial and eubacterial
origin exhibit remarkable differences. Genome Biol.
Evol. 3, 782–790. (doi:10.1093/gbe/evr073)
83. Cotton JA, McInerney JO. 2010 Eukaryotic genes of
archaebacterial origin are more important than the
more numerous eubacterial genes, irrespective of
function. Proc. Natl Acad. Sci. USA 107, 17 252–
17 255. (doi:10.1073/pnas.1000265107)
84. Pisani D, Cotton JA, McInerney JO. 2007 Supertrees
disentangle the chimerical origin of eukaryotic
genomes. Mol. Biol. Evol. 24, 1752–1760.
(doi:10.1093/molbev/msm095)
85. Alvarez-Ponce D, Lopez P, Bapteste E, McInerney JO.
2013 Gene similarity networks provide tools for
understanding eukaryote origins and evolution.
Proc. Natl Acad. Sci. USA 110, E1594–E1603.
(doi:10.1073/pnas.1211371110)
86. Giaever G et al. 2002 Functional profiling of the
Saccharomyces cerevisiae genome. Nature 418,
387–391. (doi:10.1038/nature00935)
87. Cox CJ, Foster PG, Hirt RP, Harris SR, Embley TM.
2008 The archaebacterial origin of eukaryotes. Proc.
Natl Acad. Sci. USA 105, 20 356–20 361.
(doi:10.1073/pnas.0810647105)
88. Williams TA, Embley TM. 2014 Archaeal ‘dark
matter’ and the origin of eukaryotes. Genome Biol.
Evol. 6, 474–481. (doi:10.1093/gbe/evu031)
89. Williams TA, Foster PG, Cox CJ, Embley TM. 2013 An
archaeal origin of eukaryotes supports only two
primary domains of life. Nature 504, 231–236.
(doi:10.1038/nature12779)
90. de Duve C. 2007 The origin of eukaryotes: a
reappraisal. Nat. Rev. Genet. 8, 395–403.
(doi:10.1038/nrg2071)
91. Baldauf SL. 2003 Phylogeny for the faint of heart: a
tutorial. Trends Genet. 19, 345–351.
(doi:10.1016/S0168-9525(03)00112-4)
92. Keane TM, Creevey CJ, Pentony MM, Naughton TJ,
McInerney JO. 2006 Assessment of methods for
amino acid matrix selection and their use on
empirical data shows that ad hoc assumptions for
63. Spang A et al. 2015 Complex Archaea that bridge
the gap between prokaryotes and eukaryotes.
Nature 521, 173–179. (doi:10.1038/nature14447)
64. Henne WM, Buchkovich NJ, Emr SD. 2011 The ESCRT
pathway. Dev. Cell 21, 77–91.
(doi:10.1016/j.devcel.2011.05.015)
65. Hirt RP, Logsdon JM Jr, Healy B, Dorey MW,
Doolittle WF, Embley TM. 1999 Microsporidia are
related to fungi: evidence from the largest subunit
of RNA polymerase II and other proteins. Proc. Natl
Acad. Sci. USA 96, 580–585.
(doi:10.1073/pnas.96.2.580)
66. Embley TM, Horner DA, Hirt RP. 1997 Anaerobic
eukaryote evolution: hydrogenosomes as
biochemically modified mitochondria? Trends Ecol.
Evol. 12, 437–441. (doi:10.1016/S0169-5347(97)01208-1)
67. Fitzpatrick DA, Creevey CJ, McInerney JO. 2006
Genome phylogenies indicate a meaningful alpha-proteobacterial phylogeny and support a grouping of
the mitochondria with the Rickettsiales. Mol. Biol.
Evol. 23, 74–85. (doi:10.1093/molbev/msj009)
68. Esser C, Martin W, Dagan T. 2007 The origin of
mitochondria in light of a fluid prokaryotic
chromosome model. Biol. Lett. 3, 180–184.
(doi:10.1098/rsbl.2006.0582)
69. Esser C et al. 2004 A genome phylogeny for
mitochondria among alpha-proteobacteria and a
predominantly eubacterial ancestry of yeast nuclear
genes. Mol. Biol. Evol. 21, 1643–1660.
(doi:10.1093/molbev/msh160)
70. Cavalier-Smith T. 1989 Molecular phylogeny.
Archaebacteria and Archezoa. Nature 339, 100.
71. Woese CR, Kandler O, Wheelis ML. 1990 Towards a
natural system of organisms: proposal for the domains
Archaea, Bacteria, and Eucarya. Proc. Natl Acad. Sci. USA
87, 4576–4579. (doi:10.1073/pnas.87.12.4576)
72. Foster PG. 2004 Modeling compositional
heterogeneity. Syst. Biol. 53, 485–495.
(doi:10.1080/10635150490445779)
73. Horner DS, Embley TM. 2001 Chaperonin 60
phylogeny provides further evidence for secondary
loss of mitochondria among putative early-branching eukaryotes. Mol. Biol. Evol. 18,
1970–1975. (doi:10.1093/oxfordjournals.molbev.a003737)
74. Horner DS, Hirt RP, Kilvington S, Lloyd D, Embley
TM. 1996 Molecular data suggest an early
acquisition of the mitochondrion endosymbiont.
Proc. R. Soc. Lond. B 263, 1053–1059.
(doi:10.1098/rspb.1996.0155)
75. Horner DS, Hirt RP, Embley TM. 1999 A single
eubacterial origin of eukaryotic pyruvate:ferredoxin
oxidoreductase genes: implications for the evolution
of anaerobic eukaryotes. Mol. Biol. Evol. 16,
1280–1291. (doi:10.1093/oxfordjournals.molbev.a026218)
76. Embley TM, Martin W. 2006 Eukaryotic evolution,
changes and challenges. Nature 440, 623–630.
(doi:10.1038/nature04546)
77. Ghosal D, Trambaiolo D, Amos LA, Lowe J. 2014
MinCD cell division proteins form alternating
copolymeric cytomotive filaments. Nat. Commun. 5,
5341. (doi:10.1038/ncomms6341)