Download Rooting the Ribosomal Tree of Life Research article

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Citric acid cycle wikipedia , lookup

Molecular ecology wikipedia , lookup

Magnesium transporter wikipedia , lookup

Gene wikipedia , lookup

Fatty acid synthesis wikipedia , lookup

Community fingerprinting wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Metalloprotein wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Hepoxilin wikipedia , lookup

Protein wikipedia , lookup

Peptide synthesis wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Metabolism wikipedia , lookup

Point mutation wikipedia , lookup

Proteolysis wikipedia , lookup

Molecular evolution wikipedia , lookup

Amino acid wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Biosynthesis wikipedia , lookup

Biochemistry wikipedia , lookup

Genetic code wikipedia , lookup

Transcript
Rooting the Ribosomal Tree of Life
Gregory P. Fournier1 and J. Peter Gogarten*,1
1
Department of Molecular and Cell Biology, University of Connecticut
*Corresponding author: E-mail: [email protected].
Associate editor: Martin Embley
Abstract
Key words: tree of life, ribosome, genetic code, ancestral reconstruction.
Introduction
In the last few decades, extensive sequencing of genetic material from a broad range of organisms has permitted the
construction of large detailed phylogenetic trees representing their evolutionary relationships. Of particular interest
are the deeper relationships between the three domains
of life: Bacteria, Archaea, and Eukarya. As no organismal
outgroup exists to polarize a universal phylogeny representing all three domains, identifying alternative methods
to root universal trees is critical to understand the deep
evolutionary history of life on Earth. This problem is also
compounded by an increasing realization that universal
trees do not necessarily reflect the organismal tree of life
(ToL) due to extensive horizontal gene transfer. Many
genes have distinct incongruent histories, and therefore,
a single tree is insufficient to explain the full evolutionary
history of life (Bapteste et al. 2009).
Many protein families contain ancestral gene duplications, where divergent paralogous copies of a specific
gene existed at the time of the most recent common ancestor (MRCA). In effect, this allows each paralog to act as
outgroup for the other within a phylogenetic reconstruction, producing a reciprocal rooting of the tree (Schwartz
and Dayhoff 1978). These ancestral gene duplications
have been frequently used to root the ToL, an approach
pioneered using protein sequences of the catalytic and
noncatalytic subunits of the ATP synthase complex
(Gogarten et al. 1989) and elongation factors (Iwabe
et al. 1989). This approach has also been used with other
gene families containing ancestral duplications, including
aminoacyl transfer RNA (tRNA) synthetases (aaRS) (Brown
and Doolittle 1995), an additional analysis of elongation factors (Baldauf et al. 1996), and protein-targeting machinery
components (Gribaldo and Cammarano 1998). Additionally,
this approach has been applied in at least one case to
ancient internal gene duplications using the large subunit
of carbamoyl phosphate synthetase (Schofield 1993;
Lawson et al. 1996; Olendzenski and Gogarten 1998; Islas
et al. 2007). These analyses generally support placing the
root either on the bacterial branch or within the bacterial
domain itself. In most of these analyses, the eukaryotic
nucleocytoplasm groups together with archaeal homologs, and this grouping is further supported by higher
order shared derived molecular characteristics (see discussion in Zhaxybayeva and Gogarten 2007). Molecular
data have provided overwhelming evidence for the endosymbiotic origin of mitochondria and plastids (Gray 1993;
Margulis 1995). Following common practice, we use the
term ‘‘eukaryotes’’ to denote the nucleocytoplasmic component of the eukaryotic cell. Although support for the
eukaryotic nucleocytoplasm grouping with the archaea is
strong, the question of monophyletic or paraphyletic
archaea remains contested. Sometimes, the eukaryotic
nucleocytoplasm is found to have originated from one
particular group of archaea (e.g., Cox et al. 2008); however, the root of the group of Eukaryotes plus Archaea
is difficult to determine, and many argue that the eukaryotic nucleocytoplasm constitutes a deeper branch and
a true sister group to extant archaea (see, e.g., the discussion in Dagan and Martin 2007; Poole and Penny 2007a,
2007b), although recent analyses have also suggested an
eocyte rooting and topology (Cox et al. 2008). Phylogenetic analysis including all three domains of life is further
complicated by the possibility of horizontal gene transfer
events (Zhaxybayeva and Gogarten 2004), as well as artifacts of tree reconstruction, especially those caused by
long branches frequently associated with interdomain
relationships (for review, see Zhaxybayeva et al. 2005).
Other approaches for rooting the ToL focus on specific
physiological characters that are proposed to be primitive,
© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
1792
Mol. Biol. Evol. 27(8):1792–1801. 2010 doi:10.1093/molbev/msq057
Advance Access publication March 1, 2010
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
Research article
The origin of the genetic code and the rooting of the tree of life (ToL) are two of the most challenging problems in the
study of life’s early evolution. Although both have been the focus of numerous investigations utilizing a variety of methods,
until now, each problem has been addressed independently. Typically, attempts to root the ToL have relied on phylogenies
of genes with ancient duplications, which are subject to artifacts of tree reconstruction and horizontal gene transfer, or
specific physiological characters believed to be primitive, which are often based on subjective criteria. Here, we
demonstrate a unique method for rooting based on the identification of amino acid usage biases comprising the residual
signature of a more primitive genetic code. Using a phylogenetic tree of concatenated ribosomal proteins, our analysis of
amino acid compositional bias detects a strong and unique signal associated with the early expansion of the genetic code,
placing the root of the translation machinery along the bacterial branch.
Rooting the Ribosomal Tree of Life · doi:10.1093/molbev/msq057
translation machinery must have evolved in a preprotein
(and likely RNA-based) world, such a complex system could
only have evolved incrementally, likely coevolving with the
emergence of encoded proteins of increasing complexity
and functionality. As such, it is also likely that specific genetically encoded amino acids were incorporated into protein synthesis gradually, up until the code reached its
current retinue of 20 amino acids. Amino acids could
not be fixed at specific positions within proteins until their
establishment in the code; therefore, at the time of the
MRCA, the most ‘‘recent’’ amino acids would have had less
time to become selected for and should have been underrepresented if there was insufficient time for equilibrium to
be attained. Conversely, more ancient amino acids should
have been overrepresented at this time (Fournier and
Gogarten 2007).
Based on this model, at least two distinct approaches
have attempted to use amino acid usage at ancient positions to infer the evolutionary history of the genetic code,
albeit with somewhat differing results (Brooks and Fresco
2002; Brooks et al. 2002, 2004; Fournier and Gogarten 2007).
The work of Brooks et al. 2002, 2004; identifies biases in
overall amino acid usage in ancestral reconstructions of
a wide set of universally conserved genes using an expectation maximization methodology. In contrast, Fournier
and Gogarten (2007) only count completely conserved positions within universal ribosomal proteins, a simpler yet
more stringent model that seeks to avoid overinterpretation of reconstructions at variable positions at the expense
of a large data set. These analyses both agree on the ancient
underrepresentation of Cys, Trp, Phe, and Tyr residues.
However, whereas Brooks et al. (2004) also identify Ser,
Thr, Leu, Gln, and Asp as recent and Val, Ile, His, and
Glu as more ancient, Fournier and Gogarten (2007) identify
Ile, Val, Glu, and Lys as recent and Asn and Gly as ancient.
Interestingly, a subsequent revised analysis (Fournier 2009)
incorporating a 90% conservation probability cutoff for the
inclusion of reconstructed positions produces results more
similar to that of Brooks et al., providing statistical support
for Ser and Gln as more recent additions while removing
support for Ile and Val. As these hydrophobic amino acids
are extremely similar to one another, it is likely that the
initial analysis underestimated their ancestral frequency
due to occasional substitutions in some genomes.
In both these compositional analyses, the location of the
root of the ToL is assumed to be known (as either a midpoint
of the individual molecular phylogenies [Brooks et al. 2004]
or the bacterial branch rooting [Fournier and Gogarten
2007]), with the ancestral genetic code impacting usages
on this particular deep branch. Both these assumptions
are increasingly difficult to justify, given the emerging understanding of the rate heterogeneity of genomic evolution, the
frequency of gene transfer, and the occasionally conflicting
rootings identified using different sets of genes (Zhaxybayeva
et al. 2005; Bapteste et al. 2009). However, assuming that
a strong and unique compositional bias indicates a proximity
to a more primitive genetic code unique to the deepest
branch on the ToL, it becomes apparent that detecting such
1793
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
such as aspects of membrane and cell wall structure
(Cavalier-Smith 2002), split tRNA structures (Di Giulio
2007), or extensive ‘‘RNA-world’’ molecular relics (Penny
and Poole 1999). Branches associated with groups showing
these primitive characters are postulated to contain the
root. These approaches root the tree within the bacterial
or archaeal domains or on the eukaryal branch. Although
logical support can be given to define any of these characters as primitive, ultimately these approaches depend on
subjective and often controversial hypotheses about the
physiological nature of early life. Whereas primitive characters are not useful in phylogenetic reconstruction (Hennig
1966), derived characters (duplicated genes, large insertions, or gaps) can be used to exclude the root from that
part of the ToL where organisms possess the derived
character state, although the root may be located on
the branch where the character transitioned from primitive
to derived (Gogarten et al. 1989; Zhaxybayeva et al. 2005;
Skophammer et al. 2006, 2007; Lake et al. 2007; Zhaxybayeva
and Gogarten 2007).
Many of the aforementioned analyses depend either
directly or indirectly on specific genes that show weak
conservation and/or a history of frequent interdomain
horizontal gene transfer, making them poor proxies for
an organismal ToL. It seems likely that the best genetic
proxy for an organismal ToL are phylogenetic trees generated using the ribosomal machinery, specifically the three
ribosomal RNAs (rRNAs) and about 29 universal ribosomal
proteins that comprise the ‘‘core ribosome.’’ Although
ribosomal protein- and RNA-encoding genes have been
transferred in the past (see examples and discussion in
Gogarten et al. 2002), these genes are resistant to transfer
(Sorek et al. 2007), with most transfers occurring between
close relatives followed by homologous recombination
(e.g., Morandi et al. 2005). These transfers across short
phylogenetic distances have little effect on large-scale tree
topology, especially among deep branches where the root
of the ToL may reside. Additionally, the high level of structural, sequence, and functional conservation in ribosomal
proteins allows for a higher confidence in tree topologies
being free of artifacts, especially those producing long
branches via radical functional divergence following duplication, as proposed for some deep paralogs (Philippe and
Forterre 1999; Cavalier-Smith 2006). For these reasons, a ribosomal ToL has been proposed to be an ideal ‘‘backbone’’
upon which to map horizontal gene transfers, clearly depicting their distinct contribution to genomic evolution
(Gogarten 1995; Dagan et al. 2008; Swithers et al. 2009).
Unfortunately, because there are no confirmed paralogs
for ribosomal genes, rooting the ribosomal ToL using established methods so far has not been possible.
In comparison with many organismal characters used in
other rooting methods, the genetic code is shared by all
cellular life and likely present in its complete form at
the time of the MRCA, apart from slight modifications
within specific lineages (Knight et al. 2001; Miranda
et al. 2006; Fournier and Gogarten 2007). Although this
strongly suggests that the genetic code and its requisite
MBE
MBE
Fournier and Gogarten · doi:10.1093/molbev/msq057
domain. Domain alignments were then combined in
ClustalW v. 1.83 (Thompson et al. 1994) using a profile
alignment. Alignments for each individual protein were
then concatenated.
Phylogeny and Ancestral sequence Reconstruction
a bias is, in fact, a method for empirically locating the root of
the ribosomal ToL.
In this generalized approach, positions conserved within
every branch of a phylogenetic tree can be identified using
ancestral sequence reconstructions (fig. 1). Therefore, if
enough sequence conservation exists within a universal
protein phylogeny, the branch containing the root can
be directly inferred, if positions conserved within it have
retained a strong and unique signature of bias in amino
acid composition independent of other physiological effects. Provided that such a unique compositional signal
is detected, its compatibility with existing hypotheses regarding genetic code evolution provides additional empirical evidence for both the location of the root and the
supported model(s) of code evolution.
Like previous composition-based investigations into genetic code evolution, this hypothesis assumes that newer
amino acids will approach an equilibrium of usage due
to the effects of positive and purifying selection on protein
sequences, thereby sequestering this ‘‘signature’’ to only the
deepest branches of the tree. It is also important to note
that, while relying on the ‘‘primitive’’ character of an ancestral genetic code signature, this approach is distinct from
other attempts to root the tree using primitive characters,
in that the root is being directly inferred along an internal
branch, as opposed to indirectly inferred via the character
states within various taxa at terminal positions on the tree.
Model for Identification of Conserved Positions
Materials and Methods
Usage bias along each branch was calculated for each
amino acid as the difference between the observed and
average simulated usage rate at conserved positions, measured as a relative percentage. For tests of statistical significance, this difference was measured in SD units (standard
deviation of simulated data sets), followed by a two-sided
Z-test. Significance was defined as a corresponding P value of
less than 5% (P , 0.05). To measure combined compositional bias for each branch, usage biases (SD) for all 20 analyzed amino acids were combined by calculating a vector in
Sequence Collection and Alignment
Sequences of 29 universally conserved ribosomal proteins
were collected from GenBank (Benson et al. 2008) from
121 genomes, including 45 completed bacterial genomes
with a wide phylogenetic distribution, 43 completed archaeal genomes, and 33 eukaryal genomes. The MUSCLE
program (Edgar 2004) was used to perform a multiple
sequence alignment for individual proteins within each
1794
Positions within each node of the ancestral reconstruction
were defined as ‘‘conserved’’ if their identity was reported
with at least 90% probability. This filter permits the inclusion of sites that may have infrequent substitutions between similar amino acids at terminal branches, avoiding
any underrepresentation of similar frequently interchanged
residues due to excessive stringency. Assuming in each case
that the branch in question contains the root, positions
were defined as conserved within each branch if identical
positions were conserved as the same amino acid within
both adjacent nodes.
Data Set Simulations
Simulated sequences (ten replicates) were generated using
the EVOLVER program in PAML (Yang 2007), using the
above-described observed tree topology, 4,500 positions
(roughly equal to the average ribosomal protein concatenate length), a 5 1.14 (as estimated for the actual data),
under a WAG model with four rate categories. Initial amino
acid compositions were set to the average observed overall
ribosomal protein amino acid usages from the concatenated data sets. For each replicate, ancestral sequences
were reconstructed using ANCESCON (O-option and no
optimization of P-vector).
Measuring/Combining Compositional Bias
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
FIG. 1. Model for identifying amino acid positions conserved on
branches. For ancestral node reconstructions, shaded positions
indicate .90% probability of amino acid identity, reflecting a high
probability of fixation. The set of conserved positions shared
between adjacent nodes corresponds to those conserved at
a postulated root located along this particular branch. The analysis
is repeated for every branch within the tree. Sequence fragments are
from reconstructions of ribosomal protein L11.
Maximum likelihood (ML) trees were constructed using
PHYML (Guindon and Gascuel 2003) under a WAG þ
C (four rate categories, estimated a) þ PINVAR model,
with 100 bootstrap replicates. Ancestral sequences were
reconstructed using ANCESCON (O-option and no optimization of P-vector, WAG model) (Cai et al. 2004) using an
ML tree constructed from positions with at least 50% conservation (i.e., at least 50% of the extant sequences at the
tip of the branches shared the same amino acid). This marginal reconstruction method does not rely on tree polarity
and determines the likelihood of each amino acid at each
position within ancestral nodes, given the alignment and
unrooted tree topology.
MBE
Rooting the Ribosomal Tree of Life · doi:10.1093/molbev/msq057
20D space. The length of this vector was then used to calculate the geometric mean distance per amino acid (SD).
Long-Branch Compositional Bias Simulation
Results and Discussion
Phylogeny of Concatenated Ribosomal Proteins
The ribosome is one of the most ancient and wellconserved structures in the biological world, with 29 ‘‘core’’
proteins universally present across all cellular life (Harris
et al. 2003). The high level of sequence conservation across
all three domains allows for reliable alignment of individual
ribosomal protein sequences, and as there are high selective
barriers to horizontal transfers of these genes (Sorek et al.
2007), they are more likely to retain a vertical signal of organismal evolution. These features make ribosomal protein
sequences excellent for constructing reliable phylogenetic
trees, especially for distantly related groups; however, because no paralogs of universal ribosomal proteins are currently known, phylogenies of ribosomal proteins (and
ribosomal RNAs) have remained unrooted.
Individual core ribosomal proteins are short in length,
each containing few phylogenetically informative positions.
As such, although universal phylogenies generated from
alignments of individual ribosomal proteins generally do
not show significant conflict, they are largely unresolved
(data not shown). Concatenation of the 29 individually
aligned protein sequences produces an alignment with
9,258 positions, resulting in consistent highly resolved phylogenies using a variety of tree reconstruction methods.
Consistency and resolution is further improved by only including slowly evolving positions, generating trees using an
alignment consisting of 4,571 positions showing at least
50% identity within each site. The resulting ML phylogenetic tree was largely congruent with phylogenies generated using 16S rRNA (Woese and Fox 1977) in that it
showed strong support for the monophyly of Archaea, Bacteria, and Eukarya, as well as the monophyly of other major
groups, such as Crenarchaeota, Euryarchaeota, Proteobacteria, and Firmicutes (supplementary text file S2, Supplementary Material online). Some artifacts due to
composition and long branches are also present, such as
the deep placement of some protist and fungal lineages
within the Eukarya and possibly the locations of the roots
of the bacterial and archaeal domains. Tree reconstructions
using the more sophisticated nonhomogenous site model
in PhyloBayes (Lartillot and Philippe 2004) did not remove
the apparent long-branch attraction artifacts present in
Compositional Bias and Substitution Models
To attribute compositional bias at conserved positions to
a branch-specific biological effect, the model of amino acid
substitutions must be first taken into account. In any biologically relevant substitution model, similar amino acids
more frequently substitute for one another, whereas some
more unique amino acids undergo substitutions much less
often. Based on these characteristics, conserved positions
on any branch should contain an overabundance of amino
acids, which are relatively invariant (e.g., Cys), and an
underabundance of amino acids, which frequently substitute for others (e.g., Ser), relative to their overall frequencies, which remain constant across the reconstruction.
Furthermore, the more evolution separates sequences
(i.e., the longer the branch), the more pronounced these
effects should become.
Testing this assumption by simulating sequence evolution across increasingly long branches shows that this is
indeed the case (fig. 2). Even before a branch length of
1 substitution/site is reached, conserved positions show
a clear enrichment in Gly, Cys, Trp, and Pro. At longer
branch lengths, an underabundance of Asn, Gln, Lys,
and Ser also becomes apparent. Therefore, observed usages
at conserved positions along branches must be compared
with those detected within simulated sequence reconstructions in order to avoid attributing branch-specific biological relevance to any observed bias in these particular
amino acids. Interestingly, these biases seem to reach a maximum at branch lengths of 10–20 and then decrease until
converging with overall amino acid usage rates at branch
lengths around 50. As this is approximately the branch
length at which, for this model, the number of conserved
positions remaining approaches the number expected by
chance between two nonrelated sequences (N/20), this
likely represents the eventual removal of any detectable sequence homology due to extreme saturation with substitutions. At this point, remaining matching positions are
therefore merely a product of overall sequence composition, as would be the case for two random sequences.
Amino Acid Usage Biases
In comparing observed conserved positions with those
within reconstructions using simulated sequences, ten
1795
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
Simulated sequences were generated using the EVOLVER
program in PAML, using a simple three-branch tree topology, 4,500 positions, flat amino acid usage rates (all equal to
0.05), a 5 1.14, under a WAG model with four rate categories. Ten replicates were generated for each branch
length at log-linear intervals (0.135–90.017 substitutions/
site). For each simulation, the usage rates of amino acids
at conserved positions along one branch were calculated
and then averaged across replicates.
our phylogeny (data not shown). However, the observed
phylogenetic artifacts and uncertainties should have little
if any impact on our analysis, given the very short deep
branches within the bacterial and archaeal domains, and
the absence of any hypothesis, placing the root within
the eukaryal domain. In addition, the impact of model misspecification on ancestral state reconstruction is greatly reduced by the limitation of our analyses to sites for which
the ancestral state is determined with more than 90% posterior probability. Future refinement of this methodology
may explore branch heterogeneity with respect to composition and covariation (e.g., Blanquart and Lartillot 2008),
although it is unclear if such an approach would significantly impact our results.
Fournier and Gogarten · doi:10.1093/molbev/msq057
MBE
amino acids showed strong statistically significant biases
within deep branches, with nine showing the strongest bias
within the bacterial branch, that is, the branch leading
to the bacterial root (table 1, figs. 3 and 4, supplementary
fig. S1, Supplementary Material online). Gly, Ala, and Asn all
showed a strong significant overrepresentation in the bacterial branch (35.1%, 45.6%, and 54.7%, respectively). The
branch leading to the archaeal root also showed weaker
yet significant overrepresentation of Ala and Gly. The only
other amino acid showing significant overrepresentation
Table 1. Amino Acid Usage at Conserved Positions on Deep
Phylogenetic Branches.
Amino
Acid
Ala
Asp
Cys
Glu
Phe
Gly
His
Ile
Lys
Leu
Met
Asn
Pro
Gln
Arg
Ser
Thr
Val
Trp
Tyr
Usage within Deep Branches
Bacterial
0.113*
0.043
0.005*
0.045
0.020*
0.185*
0.022
0.068
0.068
0.089
0.021
0.037*
0.067
0.011*
0.086
0.024*
0.042
0.080
0.002*
0.015*
Archaeal
0.093*
0.040
0.007
0.058
0.025*
0.135*
0.023
0.071
0.082
0.085
0.022
0.026
0.070
0.016*
0.096*
0.031*
0.045
0.077
0.008*
0.030
Eukaryal
0.091
0.041
0.009
0.055
0.026*
0.140
0.024
0.063
0.081
0.085
0.019
0.028
0.076
0.017
0.102*
0.028*
0.048
0.073
0.009*
0.025*
Global Ribosomal
Usage
0.082
0.042
0.008
0.064
0.027
0.079
0.021
0.066
0.101
0.075
0.018
0.037
0.043
0.032
0.088
0.051
0.049
0.083
0.007
0.027
* amino acid usage rates significantly different from expected usage in sequence
simulations (2-sided Z-test, p,0.05, see Materials and Methods).
1796
on deep branches was Arg along the branches leading
to the archaeal and eukaryal roots, albeit with a substantially weaker bias (13.9% and 22.2%, respectively). Conversely, strong significant underrepresentation along the
bacterial branch was observed for Gln (47.6%), Phe
(44.0%), Ser (38.2%), Trp (85.1%), Tyr (64.1%),
and Cys (65.4%). Weaker yet significant biases were observed along the branch leading to the archaeal root for
Gln, Phe, Ser, and Trp and along the branch to the eukaryal
root for Phe, Ser, Trp, and Tyr. The strong statistically significant biases observed for Asn, Cys, and Trp along the
bacterial branch are especially compelling as their direction
is the inverse of that expected for conserved positions
within long branches (fig. 2). Therefore, these usage biases
cannot be explained by an underestimation of branch
lengths at deep positions within the tree. Similarly, usages
of Ala, Tyr, and Phe should be immune to any such conservation bias as these are not observed to substantially
fluctuate over increasing branch lengths. Because the usage
biases of Gly, Gln, and Ser are of the same direction as the
expected conservation biases, their significance lies in the
magnitude of the effect, which potentially could be an
artifact of underestimated branch lengths. However, the
magnitude of the observed biases is beyond what the
model can accommodate. For example, at the maximal bias
observed within simulations (branch length ;20), usage of
Gly at conserved positions is less than double the initial
condition (8.7% vs. 5.0%) (fig. 2), a substantially smaller increase than what is observed on the much shorter branch
leading to the bacterial root compared with current average ribosomal usages (18.5% vs. 7.9%) (table 1). The case is
similar for the underrepresentation of Gln and Ser. Therefore, it is unlikely that the strong compositional bias observed at conserved positions on the bacterial branch is
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
FIG. 2. Branch length–induced compositional bias at conserved positions. Bias in the usage of particular AAs increases with evolutionary
distance (branch length) due to the differential substitution rates of AAs as described in biologically relevant substitution models. As labeled,
AAs that tend to occupy conserved positions tend to increase in usage (Gly, Cys, Trp, and Pro), whereas AAs that frequently substitute tend to
decrease (Asn, Gln, Ser, and Lys). At extreme branch lengths, this bias subsides as no informative positions remain due to saturation with
substitutions. Branch lengths are represented on a log scale. AA, amino acid.
MBE
Rooting the Ribosomal Tree of Life · doi:10.1093/molbev/msq057
a product of especially large conservation bias along long
branches. Rare amino acids such as Cys, Trp, and Met show
particularly strong biases across most of the tree; however,
because their usage rates are low with respect to their variance across simulated analyses, most of these biases are
nonsignificant. For example, even though Met shows an
overrepresentation of 40.1% along the bacterial branch, this
value is not statistically significant at the cutoff level used
for this analysis (P , 0.05).
Composite Analysis
Combining individual amino acid biases into an overall
composite bias across the tree (fig. 3), the bacterial branch
clearly shows the strongest composite bias outside of the
haloarchaea, corresponding to 3.44 SD per amino acid. In
comparison, the composite bias in the archaeal branch is
equivalent to 1.95 SD per amino acid and in the eukaryal
branch is 1.71 SD per amino acid. This verifies that the
strong individual biases in amino acid usage along this
branch are indicative of a unique signal and that there
is no strong alternative signal consisting of a combined effect among amino acids with weaker biases, which, on their
own, would not be identified as significant. In general,
composite bias is greater within the Archaea than in the
Bacteria and Eukarya due to the combined effects of widespread physiological factors that influence amino acid composition in different archaeal groups.
Alternative Compositional Signatures
Several physiological characters can impose amino acid
biases within proteins, including thermophily, halophily,
and bias in genomic composition, specifically G þ C content.
The impact of each of these factors is well known and is reflected in the observed amino acid composition across several branches of the concatenated ribosomal tree (table 2).
The overrepresentation of Asp and Glu in halophiles is an
adaptation to the ionic hypersaline environment of their intracellular space (Dennis and Shimmin 1997). Thermophiles
prefer charged residues over polar due to the thermodynamics of protein folding and also tend to favor proline to promote rigidity of protein structures (Watanabe et al. 1991;
Fukuchi and Nishikawa 2001; Zhou et al. 2008). Finally, genomes with strong G þ C biases tend to favor and avoid
specific sets of amino acids based on the nucleotide content
of their codons (Paila et al. 2008). Each of these observed
physiologically imposed compositional biases is incongruent with the set of biases observed on the bacterial branch;
this reinforces our interpretation that this signature is not
simply reflection of an ancestral physiological state but is an
echo of a more primitive genetic code at the deepest branch
of the phylogeny of the ribosome.
An alternative hypothesis to explain the previously reported (Fournier and Gogarten 2007) compositional bias
within the bacterial branch is that it represents a ‘‘mesophilic
1797
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
FIG. 3. Composite amino acid usage bias across universal ribosomal tree. Bias values reflect the geometric mean distances (normalized as SD)
between observed and expected amino acid usages. Increased bias across the archaeal domain is largely due to widespread thermophily,
halophily, and nucleotide composition bias. Aside from branches associated with haloarchaea, the branch leading to the bacteria (the bacterial
root) contains the greatest bias.
Fournier and Gogarten · doi:10.1093/molbev/msq057
MBE
signature’’ for early life, as indicated by an underrepresentation of amino acids typically favored in thermophiles
(Boussau et al. 2008). Although the signature we reconstruct
for the bacterial branch is inconsistent with ancient thermophily, it does not necessarily follow that this signature
is solely caused by the converse physiological state, that
is, ‘‘nonthermophily.’’ Because the majority of sequences
used to provide the initial amino acid frequencies in the expectation simulation are from mesophiles, a branch solely
defined by its mesophilic character should produce no significant signature of compositional bias at all; underrepresentation of ‘‘thermophilic’’ amino acids would therefore not be
expected. Our data therefore are compatible with an underrepresentation of amino acids added late during the expansion of the genetic code and a mesophilic MRCA located on
the bacterial branch. Assuming a mesophilic MRCA alone
cannot explain the observed compositional bias.
Comparison with Other Models for the Early
Expansion of the Genetic Code
The strong unique biases in amino acid composition at
conserved positions present along the bacterial branch
cannot be explained by the effects of substitution models
within long branches, nucleotide composition, or the compositional impacts of any known physiological conditions.
Additionally, the specific amino acid biases detected are
similar to those predicted in several models of genetic code
Table 2. Amino Acid Bias Signatures within the Universal
Ribosomal Tree.
Signature
Thermophiles
Halophiles
Low G 1 C
High G 1 C
Deep branches
1798
Overrepresentation
Glu, Pro, Trp
Asp, Glu
Ile, Lys, Ser, Tyr, Asn
Arg
Ala, Gly, Asn
Underrepresentation
Ser, Gln
Ile, Lys, Tyr, Cys
Pro, Glu
Ile, Asn
Phe, Gln, Ser, Trp, Tyr, Cys
evolution, suggesting that the bacterial branch is closest to
a more primitive state of the genetic code, and therefore
contains the root of the translational machinery.
The set of amino acids showing significant bias along the
bacterial branch is similar to those previously detected by
a related but less sophisticated analysis (Fournier and
Gogarten 2007). However, the incorporation of phylogeny,
ancestral sequence reconstruction, substitution biases, and
a probabilistic definition of conserved positions greatly increase the sample size and utility of the method presented,
correcting artifacts in the previous analysis, such as the previously discussed apparent underrepresentation of Ile, Val,
and Lys.
The absence of a significant bias on this deep branch for
some amino acids (Val, Ile, Leu, Asp, Pro, Thr, Lys, or Arg)
can be explained in two ways. First, an amino acid may actually be within the ‘‘intermediate set’’ of those added to
the code, with sufficient time for usage equilibrium before
the MRCA, but not old enough for an initial overwhelming
excess to exist. Second, some amino acids within this set
may be equally ancient to those showing a significant overrepresentation; however, their initial excess may have rapidly disappeared with the addition of other similar amino
acids that provided a slight selective advantage in many
positions, effectively ‘‘partitioning away’’ their excess. In
some cases, this would also effectively diminish the bias signal of the underrepresented ‘‘new’’ amino acid as sites
would have been ‘‘preselected’’ for their invasion. This preselection effect could explain the flat usage levels of Val,
Leu, and Ile, making it impossible to determine from this
analysis in what order they were added to the code. This
may also be the case for Arg and Lys. However, because Thr
and Asp both have similar alternatives, which are underrepresented on the bacterial branch (Ser and Glu), there is
stronger evidence that these are both indeed more ancient
amino acids. Because Pro is a functionally and structurally
unique amino acid that should not be susceptible to
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
FIG. 4. Amino acid usages at conserved positions along the branch leading to the bacterial root. Error bars represent SD calculated from ten
replicate simulations in the case of expected values (red bars, SD corrected for low sample size) and from 29 universal ribosomal proteins in the
case of observed usages in extant ribosomal protein sequence (green bars).
MBE
Rooting the Ribosomal Tree of Life · doi:10.1093/molbev/msq057
(Mosyak et al. 1995) and undergoes a class I-like aminoacylation reaction (Sprinzl and Cramer 1975); SerRS does not
recognize the anticodon of its cognate tRNAs (as its
uniquely disparate set of six codons makes this impossible),
instead relying on a long variable arm for interaction specificity (Asahara et al. 1994). This may suggest that in each
case, these aaRS are more derived and among the more
recent class II variants to evolve. The identification of
Ser as a late amino acid is especially interesting for additional reasons as this amino acid is generally considered
to be among the most ancient based on biochemical, metabolic, and codon table criteria (Trifonov 2000). Although it
may be the case that the biochemical and metabolic support for Ser being a primordial amino acid simply indicates
its likely participation in primordial intermediate metabolism (predating its incorporation into the genetic code),
this type of speculation requires evidence beyond that provided in the scope of this investigation.
There is at least one model of genetic code evolution predicting Thr preceding Ser (Higgs 2009), which reconstructs
the organization and evolution of the genetic code using
a selection-based model, evaluating the fitness effect of
code expansion via the addition of new amino acids via
stepwise codon space partitioning. The amino acids predicted to be the earliest in this model (Gly, Ala, and Asp)
are also in agreement with the presented analysis, along
with Val, on which our analysis remains agnostic as previously described. In further agreement, Pro is predicted to be
added at an intermediate stage and Cys, Trp, Tyr, and Phe at
a later stage. The only amino acids showing conflict between
these two analyses are Glu and Asn. Higgs (2009) places
Glu among the earlier amino acids to be added (fifth, in
the second stage of code evolution), as opposed to among
the most recent. Conversely, Higgs (2009) places Asn as
an intermediate-late addition, as opposed to an early
one. In both cases, these differences are likely related to
the selection-based model favoring the physiochemical
ionic similarity of the Asp–Glu pair over the metabolic/
chemical similarity of the Glu–Gln and Asp–Asn pairs.
Conclusions
Rooting the tree of the translation machinery using the polarizing ancestral state of its own functional semantics (i.e.,
the genetic code) is a novel approach that avoids many of
the shortcomings of other rooting methods while simultaneously making use of a large set of proteins providing
reliable phylogenetic reconstruction. The results of this
method locate the ribosomal MRCA upon the branch leading to the bacterial domain, in agreement with previous
analyses utilizing sets of genes that underwent ancestral
duplications, including those also related to protein synthesis. As ribosomal proteins are more likely to show a consistent phylogenetic signal indicative of vertical inheritance,
the rooting of the ribosomal ToL provides a polarized phylogenetic scaffold upon which the complex genetic reticulations of evolutionary history can be mapped and better
understood. Additionally, this method identifies a pattern
1799
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
competition, it seems most likely that it was actually added
at an intermediate stage.
Additional inferences about code evolution can be made
by comparing the signatures of sets of metabolically related
amino acids with similar physiochemical properties and assuming that these ‘‘come as a set.’’ For example, Gln is
physiochemically analogous to Asn and shows strong underrepresentation on the bacterial branch, indicating that
it was not a specifically encoded amino acid in an earlier
version of the genetic code. Although its underrepresentation on the bacterial branch is not statistically significant at
the cutoff used in this analysis, Glu does show a trend congruent with the metabolically related Glu–Gln amino acid
pair being a more recent addition to the code. Likewise, the
similar Asp–Asn pair is more likely to be ancient due to the
detected overrepresentation of Asn along the same deep
branch. Based on this inference, at a more primitive state,
the code would still contain a similar diversity of physiochemical properties, being able to synthesize proteins containing hydroxyl, amino, hydrophobic, and both positively
and negatively charged side chains. This suggests that after
its earliest stages, genetic code evolution may have promoted continued optimization of existing protein folds
rather than being responsible for a radical expansion of
protein diversity and functionality. The only exceptions
seem to be unique specialized amino acids such as those
with aromatic structure (Tyr, Trp, and Phe) and Cys.
Aromatic residues may have been advantageous for the
hydrophobic core packing needed as proteins became
larger, whereas Cys would have been advantageous as more
opportunities developed for complex linkages between
proteins, lipids, and other substrates. Alternatively, aromatic amino acids could have been selected for as RNA
‘‘replacements’’ in a dying RNA world, mimicking nucleotides in important structural roles (e.g., stacking interactions). The unique metal-binding properties of Cys may
also have played a role in its selection as this faculty
may have been deficient in an RNA-based physiology.
The specific amino acid biases identified on the bacterial
branch are consistent with several generally accepted principles regarding the evolution of the genetic code, specifically a preference for earlier amino acids to be simple, with
latter ones more complex (Trifonov 2000). This is especially
apparent in the identification of Trp and Tyr as more recent amino acids as both are products of complex metabolic pathways (Klipcan and Safro 2004). These results are
also in agreement with the hypothesis that class II aaRS are
generally associated with earlier amino acids, with class I
aaRS evolving later (Hartman 1995). In fact, every amino
acid identified as ‘‘early’’ in this analysis has a cognate class
II aaRS, whereas most of the amino acids identified as ‘‘late’’
have a cognate class I aaRS. Second-order inferences regarding amino acids without significant bias also favor this
model as Asp and Thr (inferred to be earlier) are both associated with class II aaRS, whereas Glu (inferred to be more
recent) is associated with class I. The exceptions are Phe
and Ser; interestingly, both are associated with unique class
II aaRS: PheRS has a unique heterotetrameric structure
Fournier and Gogarten · doi:10.1093/molbev/msq057
of amino acid usage biases in general agreement with current models of genetic code evolution.
Supplementary Material
Supplementary figure S1 and supplementary text file S2 are
available at Molecular Biology and Evolution online (http://
www.mbe.oxfordjournals.org/).
Acknowledgments
References
Asahara H, Himeno H, Tamura K, Nameki N, Hasegawa T,
Shimizu M. 1994. Escherichia coli seryl-tRNA synthetase
recognizes tRNA(Ser) by its characteristic tertiary structure.
J Mol Biol. 236:738–748.
Baldauf SL, Palmer JD, Doolittle WF. 1996. The root of the universal
tree and the origin of eukaryotes based on elongation factor
phylogeny. Proc Natl Acad Sci U S A. 93:7749–7754.
Bapteste E, O’Malley MA, Beiko RG, et al. (11 co-authors). 2009.
Prokaryotic evolution and the tree of life are two different
things. Biol Direct. 4:34.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2008.
GenBank. Nucleic Acids Res. 37:26–31.
Blanquart S, Lartillot N. 2008. A site- and time-heterogeneous model
of amino-acid replacement. Mol Biol Evol. 25:842–858.
Boussau B, Blanquart S, Necsulea A, Lartillot N, Gouy M. 2008.
Parallel adaptations to high temperatures in the Archaean eon.
Nature 456:942–945.
Brooks DJ, Fresco JR. 2002. Increased frequency of cysteine, tyrosine,
and phenylalanine residues since the last universal ancestor. Mol
Cell Proteomics. 1:125–131.
Brooks DJ, Fresco JR, Lesk AM, Singh M. 2002. Evolution of amino
acid frequencies in proteins over deep time: inferred order of
introduction of amino acids into the genetic code. Mol Biol Evol.
19:1645–1655.
Brooks DJ, Fresco JR, Singh M. 2004. A novel method for estimating
ancestral amino acid composition and its application to proteins
of the Last Universal Ancestor. Bioinformatics 20:2251–2257.
Brown J, Doolittle W. 1995. Root of the universal tree of life based
on ancient aminoacyl-tRNA synthetase gene duplications. Proc
Natl Acad Sci U S A. 92:2441–2445.
Cai W, Pei J, Grishin NV. 2004. Reconstruction of ancestral protein
sequences and its applications. BMC Evol Biol. 4:33.
Cavalier-Smith T. 2002. The neomuran origin of archaebacteria, the
negibacterial root of the universal tree and bacterial megaclassification. Int J Syst Evol Microbiol. 52:7–76.
Cavalier-Smith T. 2006. Rooting the tree of life by transition
analyses. Biol Direct. 1:19.
Cox CJ, Foster PG, Hirt RP, Harris SR, Embley TM. 2008. The
archaebacterial origin of eukaryotes. Proc Natl Acad Sci U S A.
105:20356–20361.
Dagan T, Artzy-Randrup Y, Martin W. 2008. Modular networks and
cumulative impact of lateral transfer in prokaryote genome
evolution. Proc Natl Acad Sci U S A. 105:10039–10044.
1800
Dagan T, Martin W. 2007. Testing hypotheses without considering
predictions. Bioessays 29:500–503.
Dennis PP, Shimmin LC. 1997. Evolutionary divergence and salinitymediated selection in halophilic archaea. Microbiol Mol Biol Rev.
61:90–104.
Di Giulio M. 2007. The tree of life might be rooted in the branch
leading to Nanoarchaeota. Gene 401:108–113.
Edgar RC. 2004. MUSCLE: multiple sequence alignment with
high accuracy and high throughput. Nucleic Acids Res. 32:
1792–1797.
Fournier GP. 2009. Genetic code evolution and amino acid
composition analysis [PhD thesis]. [Storrs (CT)]: Department
of Molecular and Cell Biology, University of Connecticut.
Fournier GP, Gogarten JP. 2007. Signature of a primitive genetic
code in ancient protein lineages. J Mol Evol. 65:425–436.
Fukuchi S, Nishikawa K. 2001. Protein surface amino acid
compositions distinctively differ between thermophilic and
mesophilic bacteria. J Mol Biol. 309:835–843.
Gogarten JP. 1995. The early evolution of cellular life. Trends Ecol
Evol. 10:147–151.
Gogarten JP, Doolittle WF, Lawrence JG. 2002. Prokaryotic evolution
in light of gene transfer. Mol Biol Evol. 19:2226–2238.
Gogarten JP, Kibak H, Dittrich P, et al. (13 co-authors). 1989.
Evolution of the vacuolar Hþ-ATPase: implications for the
origin of eukaryotes. Proc Natl Acad Sci U S A. 86:6661–6665.
Gray MW. 1993. Origin and evolution of organelle genomes. Curr
Opin Genet Dev. 3:884–890.
Gribaldo S, Cammarano P. 1998. The root of the universal tree of life
inferred from anciently duplicated genes encoding components
of the protein-targeting machinery. J Mol Evol. 47:508–516.
Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm
to estimate large phylogenies by maximum likelihood. Syst Biol.
52:696–704.
Harris JK, Kelley ST, Spiegelman GB, Pace NR. 2003. The genetic core
of the universal ancestor. Genome Res. 13:407–412.
Hartman H. 1995. Speculations on the evolution of the genetic code
IV: the evolution of the aminoacyl-tRNA synthetases. Orig Life
Evol Biosph. 25:265–269.
Hennig W. 1966. Phylogenetic systematics. Urbana (IL): University of
Illinois Press.
Higgs PG. 2009. A four-column theory for the origin of the genetic
code: tracing the evolutionary pathways that gave rise to an
optimized code. Biol Direct. 4:16.
Islas S, Hernández-Morales R, Lazcano A. 2007. Question 7:
comparative genomics and early cell evolution: a cautionary
methodological note. Orig Life Evol Biosph. 37:415–418.
Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989.
Evolutionary relationship of archaebacteria, eubacteria, and
eukaryotes inferred from phylogenetic trees of duplicated genes.
Proc Natl Acad Sci U S A. 86:9355–9359.
Klipcan L, Safro M. 2004. Amino acid biogenesis, evolution of the
genetic code and aminoacyl-tRNA synthetases. J Theor Biol.
228:389–396.
Knight RD, Freeland SJ, Landweber LF. 2001. Rewiring the keyboard:
evolvability of the genetic code. Nat Rev Genet. 2:49–58.
Lake JA, Herbold CW, Rivera MC, Servin JA, Skophammer RG. 2007.
Rooting the tree of life using nonubiquitous genes. Mol Biol Evol.
24:130–136.
Lartillot N, Philippe H. 2004. Bayesian phylogenetic software based
on mixture models. Mol Biol Evol. 21:1095–1109.
Lawson F, Charlebois R, Dillon J. 1996. Phylogenetic analysis of
carbamoylphosphate synthetase genes: complex evolutionary
history includes an internal duplication within a gene which can
root the tree of life. Mol Biol Evol. 13:970–977.
Margulis L. 1995. Symbiosis in cell evolution: microbial communities in
the archean and proterozoic eons. New York: W H Freeman & Co.
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
We thank Tim Harlow for writing the program for parsing
ANCESCON results; Pascal Lapierre for writing the program
to identify conserved positions within multiple sequence
alignments; and Kristen Swithers, Nicolas Galtier, and
two anonymous reviewers for comments and discussions.
This work was supported through grants from the NASA
Exobiology (NNX07AK15G) and National Science Foundation Assembling the Tree of Life (DEB 0830024) programs.
MBE
Rooting the Ribosomal Tree of Life · doi:10.1093/molbev/msq057
Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P, Rubin EM. 2007.
Genome-wide experimental determination of barriers to
horizontal gene transfer. Science 318:1449–1452.
Sprinzl M, Cramer F. 1975. Site of aminoacylation of tRNAs from
Escherichia coli with respect to the 2#- or 3#-hydroxyl group of
the terminal adenosine. Proc Natl Acad Sci U S A. 72:3049–3053.
Swithers KS, Gogarten JP, Fournier GP. 2009. Trees in the web of life.
J Biol. 8:54.
Thompson J, Higgins D, Gibson T. 1994. CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through
sequence weighing, position-specific gap penalties and weight
matrix choice. Nucleic Acids Res. 22:4673–4680.
Trifonov EN. 2000. Consensus temporal order of amino acids and
evolution of the triplet code. Gene 261:139–151.
Watanabe K, Chishiro K, Kitamura K, Suzuki Y. 1991. Proline
residues responsible for thermostability occur with high
frequency in the loop regions of an extremely thermostable
oligo-1,6-glucosidase from Bacillus thermoglucosidasius KP1006.
J Biol Chem. 266:24287–24294.
Woese CR, Fox GE. 1977. Phylogenetic structure of the prokaryotic
domain: the primary kingdoms. Proc Natl Acad Sci U S A.
74:5088–5090.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum
likelihood. Mol Biol Evol. 24:1586–1591.
Zhaxybayeva O, Gogarten JP. 2004. Cladogenesis, coalescence and the
evolution of the three domains of life. Trends Genet. 20:182–187.
Zhaxybayeva O, Gogarten JP. 2007. Horizontal gene transfer, gene
histories and the root of the tree of life. In: Pudritz RE, Higgs PG,
Stone J, editors. Astrobiology and the origins of life. Cambridge:
Cambridge University Press p.178–192.
Zhaxybayeva O, Lapierre P, Gogarten JP. 2005. Ancient gene duplications
and the root(s) of the tree of life. Protoplasma 227:53–64.
Zhou XX, Wang YB, Pan YJ, Li WF. 2008. Differences in amino acids
composition and coupling patterns between mesophilic and
thermophilic proteins. Amino Acids. 34:25–33.
1801
Downloaded from mbe.oxfordjournals.org at Libraries-Montana State University, Bozeman on September 30, 2010
Miranda I, Silva R, Santos MA. 2006. Evolution of the genetic code in
yeasts. Yeast 23:203–213.
Morandi A, Zhaxybayeva O, Gogarten JP, Graf J. 2005. Evolutionary
and diagnostic implications of intragenomic heterogeneity in the
16S rRNA gene in Aeromonas strains. J Bacteriol. 187:6561–6564.
Mosyak L, Reshetnikova L, Goldgur Y, Delarue M, Safro MG. 1995.
Structure of phenylalanyl-tRNA synthetase from Thermus
thermophilus. Nat Struct Biol. 2:537–547.
Olendzenski L, Gogarten JP. 1998. Deciphering the molecular record
for the early evolution of life: gene duplication and horizontal
gene transfer. In: Wiegel J, Adams MWW, editors. Thermophiles:
the keys to molecular evolution and the origin of life?
Philadelphia (PA): Taylor & Francis Inc. p. 165–176.
Paila U, Kondam R, Ranjan A. 2008. Genome bias influences amino
acid choices: analysis of amino acid substitution and recompilation of substitution matrices exclusive to an AT-biased
genome. Nucleic Acids Res. 36:6664–6675.
Penny D, Poole A. 1999. The nature of the last universal common
ancestor. Curr Opin Genet Dev. 9:672–677.
Philippe H, Forterre P. 1999. The rooting of the universal tree of life
is not reliable. J Mol Evol. 49:509–523.
Poole AM, Penny D. 2007a. Evaluating hypotheses for the origin of
eukaryotes. Bioessays 29:74–84.
Poole AM, Penny D. 2007b. Response to Dagan and Martin.
Bioessays 29:611–614.
Schofield JP. 1993. Molecular studies on an ancient gene encoding for
carbamoyl-phosphate synthetase. Clin Sci (Lond). 84:119–128.
Schwartz RM, Dayhoff MO. 1978. Origins of prokaryotes, eukaryotes,
mitochondria, and chloroplasts. Science. 199:395–403.
Skophammer RG, Herbold CW, Rivera MC, Servin JA, Lake JA. 2006.
Evidence that the root of the tree of life is not within the
Archaea. Mol Biol Evol. 23:1648–1651.
Skophammer RG, Servin JA, Herbold CW, Lake JA. 2007. Evidence for
a gram-positive, eubacterial root of the tree of life. Mol Biol Evol.
24:1761–1768.
MBE