Download Origin of the eukaryotic cell

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Non-coding DNA wikipedia , lookup

Protein moonlighting wikipedia , lookup

Ridge (biology) wikipedia , lookup

Short interspersed nuclear elements (SINEs) wikipedia , lookup

NEDD9 wikipedia , lookup

Transposable element wikipedia , lookup

Genetic engineering wikipedia , lookup

Genomic library wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Genomic imprinting wikipedia , lookup

Oncogenomics wikipedia , lookup

Gene desert wikipedia , lookup

Extrachromosomal DNA wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Metagenomics wikipedia , lookup

Gene nomenclature wikipedia , lookup

Expanded genetic code wikipedia , lookup

Copy-number variation wikipedia , lookup

RNA-Seq wikipedia , lookup

Public health genomics wikipedia , lookup

Genomics wikipedia , lookup

Human genome wikipedia , lookup

History of genetic engineering wikipedia , lookup

Genome (book) wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Epitranscriptome wikipedia , lookup

Gene expression programming wikipedia , lookup

Gene expression profiling wikipedia , lookup

Pathogenomics wikipedia , lookup

Gene wikipedia , lookup

Designer baby wikipedia , lookup

Minimal genome wikipedia , lookup

Genome editing wikipedia , lookup

Computational phylogenetics wikipedia , lookup

NUMT wikipedia , lookup

Microevolution wikipedia , lookup

Helitron (biology) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Mitochondrial DNA wikipedia , lookup

Genome evolution wikipedia , lookup

Human mitochondrial genetics wikipedia , lookup

Transfer RNA wikipedia , lookup

Transcript
Origin of the eukaryotic cell
Min Wu
Degree Project in Biology, Master of Science (2 years), 2011
Examensarbete I biologi 45 hp till masterexamen, 2011
Biology Education Centre and Department of Molecular Evolution, Uppsala University
Supervisor: Siv Andersson
Abstract
The discovery of 2.1 billion years old Eukaryotic fossils in Gabon July 2010 has brought new insights into
the origin of eukaryotic cell (Albani et al. 2010). The relationship among three domains, Eukaryotes,
Bacteria and Archaea, in the tree of life was investigated in this project through phylogenies constructed
from two mitochondrial transmembrane proteins HSP70 and HSP60. The trees were rooted with a paralog
clade from endoplasmic reticulum, eukaryotes were the earliest derived clade in HSP70 tree, while in
HSP60 tree, it is more related to Archaea as was observed from universal tree of life. A lateral gene transfer
between Bacteria and Archaea was suspected for both genes. All mitochondrial copies of genes including
those from secondary amitochondrial organisms share the same last common ancestor with
alphaproteobacteria, and appear to be particularly affiliated to Rickettsiales. Placement of SAR11 clade
within alphaproteobacteria fluctuated between Rickettsiales and Rhodobacterales; Placement of SAR116
within alphaproteobacteria is identified for the first time ever since the release of its genome, and it is
consistently clustered within Rhodospirillales.
There are 137 Eukaryotic Signature Proteins previously defined in yeast mitochondrial proteome ten years
ago (Karlberg, 2000). Blast for these genes hits de novo prokaryotic homologs for twelve genes, whereas
only five of them possess more than 5 hits.
Patterns of tRNA loss in mitochondrial genome were studied, and a co-evolution between tRNA loss and
their aminoacyl-tRNA synthetase was proved in this project, according to the analysis of 1704
mitochondrial genomes, and falsified the hypothesis proposed by Schneider (Schneider 2001; Lithgow
2010) that the frequency of tRNA gene loss is related to sequence identity of bacterial and eucaryotic
aminoacyl-tRNA synthetase by Pearson Correlation test.
2
Contents
1. Introduction ....................................................................................................... 4
1.1 The latest discovery of 2.1 Gyr old Eukaryotic Fossils ..................................................................................4
1.2 The 3D Scenario verses the 2D Scenario........................................................................................................4
1.3 Endosymbiont Theory.....................................................................................................................................5
1.4 Eukaryotic Signatures .....................................................................................................................................6
1.5 Hsp multigene family and Mitohchondrial Protein Import Machine..............................................................6
1.6 tRNA gene loss in Mitochondrial genome and Aminoacyl-tRNA Synthetase gene .......................................7
2. Material and Methods........................................................................................ 9
2.1 Blast for Eukaryotic Signature Proteins..........................................................................................................9
2.2 Phylogenies for Mitochondrial Transmembrane proteins ...............................................................................9
2.3 Mitochondrial tRNA gene loss......................................................................................................................10
2.3.1 Construction of phylogenetic trees .......................................................................................................10
2.3.2Parsimony inference of tRNA loss patterns...........................................................................................10
2.3.3 Statistics in R ........................................................................................................................................ 11
3. Results ............................................................................................................. 12
3.1 Blast for Eukaryotic Signature Proteins........................................................................................................12
3.2 Phylogenies for HSP70 and HSP60 multigene family..................................................................................12
3.2.1 Trees for three domains of Life ............................................................................................................12
3.2.2 Trees for Mitochondria and Alphaproteobacteria .................................................................................17
3.3 Mitochondrial tRNA gene loss......................................................................................................................17
3.3.1 tRNA gene loss in Viridiplantae Kingdom ...........................................................................................17
3.3.2 tRNA gene loss in Fungi Kingdom.......................................................................................................18
3.3.3 ANOVA for loss tRNA genes of different origin ..................................................................................18
3.3.4 Correlation between tRNA Min-loss and sequence identity.................................................................19
3.3.5 Correlation between tRNA Total-loss and Mitochondrial Genome size...............................................20
3.3.6 Correlation between tRNA Total-loss and Mitochondrial Coding Genome size ..................................20
4. Discussion ....................................................................................................... 21
4.1 Lateral gene transfer between Bacteria and Archaea ....................................................................................21
4.2 Deep node of Cyanobacteria ........................................................................................................................21
4.3 Place of SAR11 and SAR116 sequences in HSP phylogenies ......................................................................21
4.4 Clean-ups of alignment result in variations of phylogenetic topology .........................................................21
4.5 Paralog Rooting ............................................................................................................................................22
4.6 Consistence with Endosymbiont Theory ......................................................................................................22
4.7 Co-evolution of tRNA gene loss and Aminoacyl-tRNA Synthetase gene.....................................................22
Acknowledgement............................................................................................... 23
Reference............................................................................................................. 24
Appendix ............................................................................................................. 27
3
1. Introduction
1.1 The latest discovery of 2.1 Gyr old Eukaryotic Fossils
Ever since Charles Darwin, palaeontological records were accepted as one of the most powerful evidence
for biological evolution. His prediction of the extensive prehistory life, reasoned from Cambrian fossils 542
million years ago, were proved by the latest discovered macroscopic fossils from Paleoproterozoic rocks
dated 2.1 billion years ago from southeast of Gabon. These fossils were believed to be eukaryotes as they
contain the eukaryotic signature compounds, sterane (derived from sterol precursors) (Albani et al. 2010;
Donoghue & Antcliffe 2010).
The time scale of these eukaryotic fossils was dated after the Great Oxidation Event of Earth (Figure 1.1,
Donoghue & Antcliffe 2010). It was astonishing to push the fossil record of eukaryotes 200 million years
back, but still, it will make sense to believe that the origin of eukaryotes occurred shortly after the first rise
of oxygen in the atmosphere 2.4 billion years ago, as cyanobacteria consumed toxic greenhouse gases and
accumulated oxygen. Moreover, there is quite a high possibility that these fossils are evidence for presently
extinct ancestral-eukaryuotes, and more details of the unclear characters of this ancient lineage could be
inferred.
Figure1.1. Fossil Evidence of early life and the corresponding chemical composition of Oceans and Atmosphere on
earth (Donoghue & Antcliffe 2010, with the permission from the publisher).
1.2 The 3D Scenario verses the 2D Scenario
The two domains scenario (2D Scenario), based on those phylogenetic trees that show an Archaeal origin of
Eukaryotes, states that there are only two domains in the universal tree of life: one is the Prokaryotes that is
composed of Bacteria and Archaea, and the other is Eukaryotes, which arose as the descendent of the first
4
domain (Gribaldo, 2010). Within the two domain scenario, the hypothesis suggests that Eukarya was
originated from the fusion of Bacteria and a phylum of Archaea, Crenarchaeota, based on small-subunit
ribosomal RNA (16s rRNA) phylogenies (Lake, et al. 1984; Rivera & Lake, 1992). Alternative hypothesis
indicate that Eukarya was emerged from symbiosis of Bacteria and the other Archaea phyla, Euryarhaeota
(Margulis, 1970; Margulis, 1996; Searcy & Stein, 1978; Searcy, 2003). However, the argument against 2D
scenario is that, despite the example of gammaproteobacterial symbionts hosted in betaproteobacterium, no
proof of endosymbiotic association between Archaeon and Bacterium has been obtained yet (Poole &
Penny, 2007).
The three domains scenario (3D Scenario), based on the observation from classical universal tree of life,
proposes that all the living organisms were divided into three independent domains: Eukaryote, Bacteria
and Archaea (Woese, 1990). In this scenario, a present-extinct ancestral-eukaryotic lineage was suggested
to explain the origin of mitochondria and the evolvement of other complex characters in eukaryotes
(Kurland, et al. 2006; Poole & Penny, 2007; Field & Dacks, 2009).
Seven recently performed large-scale phylogenomic investigations on tree of life, applying different
strategic approaches failed in drawing a consensus conclusion. Three studies show independent clades of
Archaea and Eukaryotes, which is in agreement with 3D Scenario, whereas four studies exhibit a
relationship between Eukaryotes and a particular lineage of Archaea, which supports 2D Scenario, but the
particular lineage of Archaea appears to be variable from one to another. Thus, the significance of a robust
rooted phylogeny of Archaea was emphasized on the way to answering the question of origin of Eukaryotes
(Brochier Armanet, et al. 2008; Robertson, et al. 2005; Gribaldo, et al. 2010).
1.3 Endosymbiont Theory
The endosymbiont theory which states that mitochondria and chloroplasts were derived from free-living
bacteria was first proposed in the nineteenth century, but it was not widely accepted until Margulis
reannounced it with her own molecular experimental evidence in the 1970s (Brindefalk, 2009).
Many characters such as single circular genome, bacteria-type transcription and translation enzynmes and
components, and bacterial-binary-fission-like replication and division have supported the link of
mitochondria and plastids to bacteria (Barton, et al. 2007). The phagotrophic nature of eukaryote also
might have facilitated the occurence of endosymbiont events (Kurland & Penny, 2006). Conclusive proof
of the endosymbiont theory came from phylogenetic analysis.
Phylogenetic analysis of all primary plastid traced back to a single endosymbiont event between a
eukaryotic cell and cyanobacteria, and the origin of mitochondria was phylogenetically traced back to a
common ancestor within alphaproteobacteria. Early genomic studies suggested that the highly reduced
genome of Rickettsia is the most related alphaproteobacteria to mitochondria (Andersson, et al. 1998). A
recent phylogenomic study also placed mitochondria within the alphaproteobacteria clade (unpublished
results), but the exact position within alphaproteobacteria is still unknown.
The discovery of homologs of mitochondrial proteins in amitochondrial protists (Bui, et al., 1996; Clark &
5
Roger, 1995; Germot, et al., 1996; Germot, et al., 1997; Horner, et al., 1996; Mueller, 1998; Roger, et al.,
1996; Roger, et al., 1998) suggested that these organisms, such as Trichomonas, Giardia and Entamoeba
are actually secondarily amitochondrial, and all eukaryotes have, or at least have had mitochondria. Hence,
it was proposed that the endosymbiont event of mitochondria occurred before the emergence of the last
common ancestor of eukaryotes (Kurland & Andersson, 2000). In that, answering the question about origin
of mitochondria is helpful to unravel the mystery of the eukaryotic origin.
Another interesting phenomenon as a result of endosymbionts was the gene migration from the organelle
genome to the nuclear genome, and thus the reduction of the organelle genome, since stable environment
within eukaryotic cells has eliminated the unnecessary genes that used to be essential for free-living
bacteria. For example, the maximum number of protein-coding genes in organelle genomes is only 100
(Barton, et al. 2007). As compensation, materials including metabolites, non-coding RNAs and proteins
were imported from their host. However, many imported proteins or RNAs are not necessarily of
eukaryotic origin; sometimes the genes that encode them were actually transferred from the organelle and
of bacterial origin.
1.4 Eukaryotic Signatures
Features such as nucleus, endomembrane system (Bapteste, et al., 2005; Mans, et al., 2004; Field, et al.,
2009), mitochondrion (Embley, 2006; Giezen & Tovar, 2005), spliceosomal introns (Collins & Penny, 2005;
Roy & Gilbert, 2006), linear chromosomes with telomeres synthesized by telomerases (Nakamura & Cech,
1998), meiotic sex (Ramesh, et al., 2005), sterol synthesis (Desmond & Gribaldo, 2009), unique
cytokinesis structures (Eme, et al., 2009) and the capability of phagocytosis (Cavalier, 2002; Jekely, 2003)
have distinguished eukaryotes from the other two domains of bacteria and archaea (Gribaldo, et al. 2010).
From the perspective of comparative genomics, proteins with no orthologs in the genomes of prokaryotes
are regarded as eukaryotic signature proteins (ESPs) (Kurland, et al. 2006), and 137 proteins were
described as ESPs in the yeast mitochondrial proteome ten years ago (Karlberg, et al., 2000). As there are
much more genomic data available nowadays, it would be interesting to blast these previously defined
ESPs against the current prokaryotic genomic database, looking for homologs that might have been missed
ten years ago due to limited resources.
1.5 Hsp multigene family and Mitohchondrial Protein Import Machine
Heat shock proteins are groups of proteins that are conserved in almost all living organisms (Craig, 1985).
They function as molecular chaperones, preventing proteins from degradation and promoting the folding of
preproteins during their translocation across membranes (Johnston, et al. 1998, Gupta, et al. 1994).
Families of large Hsps were categorized based on their molecular weight, such as Hsp90, Hsp70, Hsp60
and Hsp40, where in particular, the function of mtHsp70 is clear as a central part of an ATP-dependent
preprotein import motor across the inner membrane of mitochondria into the matrix (Figure1.2, Pfanner &
Geissler, 2001; Kang, et al., 1990; Horst, et al., 1997; Voos, et al., 1996).
6
It was recently discovered that, all large Hsps in eukaryotic cells cluster in four different monophyletic
groups according to their subcellular location (Huang, et al. 2008). The cytoplasm and endoplasmic
reticulum versions appear to be evolved from a duplication of the same ancestral gene, while the origin of
mitochondrial and plastid versions are more controversial for different Hsps. However, phylogenies for
both Hsp70 and Hsp60 are consistent with endosymbiont theory, where mitochondrial versions are grouped
with proteobacteria (Hsp60 was shown to be grouped with alpha-proteobacteria) and plastid versions are
grouped with cyanobacteria (Figure 1.3, Boorstein, et al. 1994, Huang, et al. 2008).
Figure1.2. Mitochondrial Two main protein import pathways (Pfanner & Geissler, 2001, with the permission from
the publisher).
Since places of different versions of Hsps in the content of three domains are still mysterious, it would be
interesting to look at the whole picture within tree of life and to try to untangle the evolutionary
relationship among eukaryotes, bacteria and archaea. Questions concerning origin of mitochondria, such as
is mitocondrial version of Hsp70 grouped with alpha-proteobacteria as was observed from Hsp60, which
order, or more specifically, which living species of alpha-proteobacteria would be the best representative
for the ancestor of mitochondria remain to be answered. In addition, mitochondrial version from eukaryotic
species that have loss their mitochondria, might provide another perspective of view for both origin of
mitochondria and origin on eukaryotes.
1.6 tRNA gene loss in Mitochondrial genome and Aminoacyl-tRNA Synthetase gene
In contrast to the monophyletic origin of mitochondrial protein import, tRNA import evolved multiple
times during the evolution of eukaryotes, since some tRNAs were lost from the genome of mitochondria,
and the loss of the same tRNA occurred several times in different eukaryotic lineages independently
(Adams and Palmer 2003; Gissi et al. 2008).
Schneider proposed the hypothesis that the frequency of mitochondrial tRNA gene loss is positively
correlated to the sequence similarity between the bacterial-type (endosymbiont) mitochondrial
aminoacyl-tRNA synthetases (aaRSs) and their eukaryotic-type cytoplasmic homologs (Schneider 2001;
7
Lithgow 2010).
However, the phylogenetic studies of aaRSs have falsified the assumption of Schneiders hypothesis that
mitocohdrial genes are always of bacteria origin and cytoplasmic genes are of eukaryote origin. And the
truth is that the complex evolutionary history of mitochondrial and cytoplasmic aaRSs have got some aaRS
active in both compartment but of single origin, either from bacteria or eukaryote (Brindefalk et al. 2007).
B.
A.
Figure1.3. Multigene copies of HSP70 and HSP60 in Eukaryotic cells (Boorstein, et al. 1994; Huang, et al.
2008, with the permission from the publishers). A: Distance-marix based phylogeny of HSP70 (Boorstein, et al.
1994), B: Neighbor-Joining based phylogeny of HSP60 (Huang, et al. 2008). Both trees show different origin of
genes functioning in different organelles. Mitochondrial copy of HSP70 is grouped with proteobacteria, and
more specifically, HSP60is grouped with Alphaproteobacteria; Plastid copy of HSP70 is grouped with
Cyanobacteria, and so is Chloroplast copy of HSP60.
Hence, it would be interesting to take this study further, untangle the patterns of tRNA loss and raise the
question, since it is not reasoned by the sequence similarity of aaRS homologs, what is the appropriate
explanation for tRNA loss in the mitocondrial genome.
Unpublished evidences in Brindefalks PhD thesis suggested that there was a co-evolution between
mitochondrial tRNA loss and the replacement of corresponding aaRSs, based on all the mitochondrial
genomes available up until November 2006. In this project, we are trying to update Brindefalks analysis
with the latest mitochondrial genome resources in December 2010.
8
2. Material and Methods
2.1 Blast for Eukaryotic Signature Proteins
There were 137 genes from yeast proteome that were defined as Eukaryotic Signature Proteins, ESP, whose
homologous were only found in Eukaryotes (Karlberg, et al., 2000). Since more genomic data is available
nowadays compared with ten years ago when Karlberg did his research, these 137 ESPs were blast against
the prokaryotes genome database that were updated in April 2010 according to NCBI. The database
includes genomes of 1113 species, within which, 78 are Archaea and 992 are Bacteria. The program
BLASTP 2.2.18 was used locally, and the E-value threshold of blast was set at 10-10.
For genes that do have prokaryotic hits, according to blast results, a Maximum Likelihood tree was
constructed for each gene, using CAT matrix and WAG model in RAxML -7.0.4. Sequence alignment was
processed in Kalign version 2.03.
2.2 Phylogenies for Mitochondrial Transmembrane proteins
Protein sequences of HSP60 and HSP70 from three domains were blast from the NCBI dataset, including
52 Bacteria, 8 Archaea, and 31 Eukaryotes. For the 52 bacterial sequences, 33 were from 6 orders of
alphaproteobacteria, including Rickettsiales, Rhodospirillales, Rhizobiales, Rhodobacterales,
Sphingomonadales and Caulobacterales, 2 were environmental sequence SAR11 and SAR116, where
SAR11 clade was represented by Candidatus Pelagibacter. For the 31 eukaryotic sequences, 7 were from
endoplasmid reticulum, 8 were from cytoplasm, 16 were from mitochondria, and for all the three
eukaryotic copies, there were always some species who have lost their mitochondria being taken into our
consideration, such as Giardiai intestinalis, Entamoeba dispar, Trichomonas vaginalis and
Encephalitozoon cuniculi.
Kalign 2.03 was used for sequence alignment, and a maximum likelihood tree was constructed with
RAxML -7.0.4, using CAT matrix and WAG model. Before running Bayesian tree, a protein model test was
performed in Prottest version 2.4, to find the best combination of models and matrix that would give an
ideal topology of trees with the highest support values. And finally, the Bayesian trees were run in
Phylobayes3.2, using models suggested by Prottest results. Two chains were generated in Phylobayes,
running over 10,000 generations. To accept the final phylogeny, the maximum difference between two
chains is below 0.01. One optional treatment of the alignment is to pick up the conserved part of whole
alignment for the construction of phylogenies, using GBLOCKS 0.91b.
Different parameters tested in Bayesian Phylogenies are listed in table 1, for example, we were interested in
the changes of tree topology and relative places of certain species, when the sequence content is within
three domains or only within alphaproteobacteria, when the tree was made of conserved part of the
sequences or with whole alignment, when species without mitochondria were or were not taken into
account, and when the trees were rooted in different ways.
9
Table2.1. Parameters for HSP phylogenies
HSP70
HSP60
Content
Clean-ups
Amito-taxa
Root
3D
-
+
Endoplasmic reticulum copy
3D
+
-
Endoplasmic reticulum copy
3D
+
+
Endoplasmic reticulum copy
Alpha
-
-
Gama-proteobacteria /Aquifiae
Alpha
-
+
Gama-proteobacteria
Alpha
+
+
Gama-proteobacteri
Alpha
+
+
Gama-proteobacteria
3D
-
+
Endoplasmic reticulum copy
3D
+
+
Endoplasmic reticulum copy
Alpha
-
-
Gama/Aquificae
Alpha
-
+
Gama-proteobacteria
Alpha
+
-
Gama-proteobacteria
Alpha
+
+
Gama-proteobacteria
2.3 Mitochondrial tRNA gene loss
2.3.1 Construction of phylogenetic trees
Phylogenetic tree for all Viridiplantae species whose mitochondrial genome was available was indicated
from a highly conserved cox-1 gene, cytochrome c oxidase subunit 1. All the sequences were obtained from
NCBI Genome Organelle Resources. Since species from the same order were not presented in the tree
repeatedly, there are 37 plants shown in the final Bayesian tree. Maximum Likelihood tree was constructed
in RAxML -7.0.4, with CAT matrix and WAG model. Prottest 2.4 was used to find the best protein matrix
and model that fit our data. The Bayesian tree was constructed in Phylobayes3.2. Two chains were running
parallel in phylobayes for more than 10,000 generations. To accept the final phylogeny, the maximum
difference between two chains is below 0.01.
Phylogeny for all Fungi species was made in the same way as for Viridiplantae. In the final tree of Fungi,
there are 47 species present in total.
2.3.2Parsimony inference of tRNA loss patterns
The tRNA genes present and absent matrix was inferred from mitochondrial genomes of all taxa obtained
from Organelle Genome Resource. The parsimony pattern of tRNA gene loss was inferred from Paup4.0
b10. Calculation of minimum number of tRNA gene loss was parsed by Bio Perl script written by our own.
10
2.3.3 Statistics in R
ANOVA for minimum number of mitochondrial tRNA loss and the class of corresponding aminoacyl-tRNA
synthetase gene was performed in R, version 2.7.2. Two types of aaRS gene classification was tested, one
was three group classification from Brindefalk (Bridefalk et al. 2007), the other was two group
classification, where in one group, aaRS genes from mitochondria and cytoplasm were of different origin,
and in the other group, aaRS genes were of single origin (table 2.2).
Table2.2. Classification and Number of Aminoacyl-tRNA Synthetase Genes
Aminoacyl-tRNAsynthetase
Group
Number of genes
TyrRS
A
2
TrpRS
A
2
GluRS
A
2
MetRS
A
2
PheRS
A
2
ProRS
A
2
SerRS
A
2
LeuRS
A
2
AspRS
A
2
IleRS
B
2
ArgRS
B
2
AsnRS
B
2
CysRS
C
1
HisRS
C
1
AlaRS
C
1
ThrRS
C
1
GlyRS
C
1
ValRS
C
1
LysRS
C
2
GlnRS
C
1
Three group classification (Classifications from Brindefalk et al. 2007): A: Mitochondrial aaRS of bacterial affiliation and
cytoplasmic aaRS of archaeal affiliation, B: both mitochondrial and cytoplasmic aaRS of bacgterial affiliation, C: single
aaRS active in both compartments.; Two group classification: A: Mitochondrial aaRS of bacterial affiliation and
cytoplasmic aaRS of archaeal affiliation, BC: both mitochondrial and cytoplasmic aaRS of single origin.
Pearson Correlation between minimum number of mitochondrial tRNA gene loss and the sequence identity
between 5 eukaryotes and 10 bacteria aaRS was analysed in R. Pearson Correlation of total number of
mitochondrial tRNA genes loss to both mitochondrial genome size and the coding genome size was
analysed as well.
11
3. Results
3.1 Blast for Eukaryotic Signature Proteins
Out of 137 eukaryotic signature proteins previously defined in Yeast mitochondrial proteome base on
genome resource available ten years ago in 2000, there are 12 prokaryotic homologs hits in the blast reulsts
when the threshold were set at 10-10, 5 of these 12 genes get more than 5 hits (Table 3.1). PIF1,
mitochondrial DNA repair and recombination protein hits 93 as the highest number of homologs, whereas
ATP16, IMP1, IMP2, MIP1 and PIS1 only got 1 hit each. But Maximun Likelihood tree of the prokaryotic
homologs for these genes appears to be inconsistent with universal patterns for clustering of different
bacterial phylums, which suggests that not all these prokaryotic hits are the real homologs to the
corresponding ESPs. For instance, the identity of query sequence to hit sequences range from 25% to 35%,
the highest sequence identity is 51.11%.
Table3.1 Prokaryotic homologs found out of 137 previously defined Eukaryotic Signature Proteins in Yeast
Mitochondrial Proteome
Genes
Description
Num of Hits
PIF1
mitochondrial DNA repair and recombination protein PIF1 precursor
93
SHY1
out membrane translocase complex
26
MCR1
NADH-cytochrome B5 reductase precursor (P34/P32)
21
ribosomal protein
mitochondrial ribosomal protein of the small subunit; member of E.coli S4 superfamily
8
RPO41
mitochondrial DNA-directed RNA polymerase
7
QCR2
ubiquinol-cytochrome C reductase core protein 2
2
YML025C
putative L4P like ribosomal protein
2
ATP16
ATP syntase delta chain
1
IMP1
mitochondrial inner membrane protease subunit 1
1
IMP2
mitochondrial inner membrane protease subunit 2
1
MIP1
DNA polymerase gamma (mitochondrial DNA polymerase catalytic subunit)
1
PIS1
CDP-diacylglycerol-Inositol 3-phosphatidyltransferase
1
3.2 Phylogenies for HSP70 and HSP60 multigene family
3.2.1 Trees for three domains of Life
In the phylogeny based on Bayesian method for HSP70, eukaryotes and prokaryotes form two independent
clades, when the tree was rooted with the paralog from endoplasmic reticulum of eukaryotes, though the
topology of the tree does not change if it was midpoint-rooted (Figure 3.1). Mitochondrial copy of HSP70
from all lineages as well as the amitochondrial species, Encephalitozoon cuniculi, Giardia lamblia,
Entamoeba histolytica and Trichomonas vaginalis appear to be of single origin with 0.99 support, and the
mitochondrial clade was grouped within alphaproteobacteria as endosymbiont theory suggested, especially
close related to Rickettsiales with one hundred percent support. The rest order of alphaproteobacteria are
well seperaded, and Pelagibacter, representative of SAR11 data is clustered within Rhodobacterales, also
with one hundred percent support.
12
Figure3.1. Tree of Three domains inferred from HSP 70
Bayesian algorithm based on full sequence alignment, Dirichlet process for rates across sites (CAT model), LG distribution
for relative exchangeability (exchange rate), rooted with endoplasmic reticulum copy of gene from eukaryotes. The
phylogeny shows a single clade of mitochondria, and its sister clade to alphaproteobaceria. Archaea is mixed with Firmicutes
and Actinobacteria, which may resulted from a lateral gene transfer. The cytoplasm copy of gene from Eukaryotes appeared
at the base of the tree, which suggests an ancient origin of Eukaryotic genes.
13
Figure3.2. Tree of Three domains inferred from HSP 60
Bayesian algorithm based on the alignment of 456 amino acid after clean-ups of the original alignment. Dirichlet process for
rates across sites (CAT model), LG distribution for relateive exchangeability (exchange rates), rooted with endoplasmic
reticulum copy of gene from eukaryotes. The phylogeny shows a single clade of mitochondria, and its sister clade to
alphaproteobaceria. Archaea appeared at the base of the tree, and the single clade of cytoplasmic copy of gene from
eukaryotes is diverged within the Archae. There is also one Archael linage goes with Firmicutes as in the case of HSP70.
14
The alphaproteobacteria clade is followed by gammaproteobacteria, betaproteobacteria and
deltaproteobacteria one by one as was seen from universal tree of life. Chlamydia, Victivallis and
Planctomyces are grouped close to each other as a PVC Superphylum as previously suggested, but do not
form its own clade. Aquificae appear to be the earliest diverged bacterial phylum in the tree of HSP70, the
node of Cyanobacteria is also place quite deep in the tree.
The most unexpected part of the tree of HSP70 is the mixture of Archaea with firmicultes and
actinobacterias from Bacteria. Thus neither of Bacteria domain or Archaea domain is monophyletic. And a
horizontal gene transfer between Archaea and Bacteria for HSP70 was suspected.
The phylogeny of HSP60 based on Bayesian method is also rooted with Endoplasmic reticulum under the
assumption that HSP70 and HSP60 have the same evolutionary history, though the topology of the tree
remains the same when it was midpoint-rooted (Figure 3.2). But the Endoplasmic reticulum copy of HSP60
is only found in mammals, which may be resulted from a recent duplication of this gene.
Figure3.3. Tree of alphaproteobacteria and Mitochondria inferred from HSP70
(With amitochondrial eukaryotes and clean-ups)
15
In the tree of HSP60, eukaryotic clade shows an affiliation to Archaea, as was seen from universal tree of
life, and it is especially close related to a clade of Archaea that contains two species from Crenarchaeota,
Thermofilum pendens, Staphylothermus marinus and one Korachaeota species Candidatus Korarchaeum.
But neither of Euryarchaeota or Crenachaeota is monophylotic in the tree of HSP60. Furthermore, one
Euryarchaeota species Methanospirillum hugatei shows it affiliation to Bacteria rather than to the other to
the other Archaea lineages.
The rest patterns in the phylogeny of HSP60 is consistently with HSP70, as it sufficiently supports the
endosymbiont of mitochondria and mitochondria derived organelles form Rickettsiales within
alphaproteobacteria. The representative of SAR11 clade is grouped with Rhodobacterales, and the
representative of SAR116 clade is clustered within Rhodospirillales and Caolobacterals.
Figure3.4. Tree of alphaproteobacteria and Mitochondria inferred from HSP60
(Without amitochondrial eukaryotes or clean-ups)
16
3.2.2 Trees for Mitochondria and Alphaproteobacteria
Phylogenies for both HSP70 and HSP60 were constructed within the context of alphaproteobacteria, and
two trees were rooted with gammaproteobacteria (Figure 3.3 & Figure 3.4). Both trees highly support the
endosoymbiont of mitochondria from Rickettsiales, the SAR11 clade was grouped with Rhodobacterales
and Caulobacterales, and the SAR116 clade was clustered within Rhodospirillales.
But Rhodospirillales is not monophyletic in both tree of HSP70 and HSP60. Sphingopyxis alaskensis from
Sphingalmonales is clustered with Rhizobiales in the tree of HSP60, which may due to horizontal gene
transfer or merely because of a coding bias.
3.3 Mitochondrial tRNA gene loss
3.3.1 tRNA gene loss in Viridiplantae Kingdom
Trp
Leu, Arg, Val
Leu, Arg, Val
Gly, Thr
Ile, Leu, Arg, Val
Val
Asp, Trp
Val
Thr
Leu, Arg, Thr, Val
Leu, Arg, Thr, Val
Leu, Arg, Thr, Val
Ser
Ala
His
Phe
Asp, Phe, Gly, Lys, Gln, Ser, Val
Leu, Arg, Thr
Thr
Asn
Asn,
Arg, Ser
Val
Thr
Ile, Lys, Met, Asn, Gln, Arg
Ala, Phe, Gly, Lys, Arg, Thr
Thr
Ala, Cys, Asp, Glu, Phe, Gly, His, Ile,
Lys, Leu, Asn, Pro, Arg, Ser, Val, Tyr
Gln, Trp
Thr
Ala, Asp, Gly, Ile, Lys, Met,
Asn, Pro, Arg, Ser, Thr, Val
Figure3.5. Parsimony inference of tRNA gene loss patterns in Viridiplantae Kingdom
Phylogeny for Viridiplantae kingdom was constructed based on highly conserved COX1 gene, cytochrome
c oxidase subunit 1, and the tree was rooted with a Metazoa species Nematostella, since a Fungi species
Saccharomyces cerevisiae is clustered as a sister node of the other plants, and it would be more
controversial to root the tree with Saccharomyces.
17
The patterns for 95 minimum number of tRNA loss in Viridiplantae kingdom calculated from parsimony
inference were tagged in the constructed phylogeny (Figure 3.5). tRNAs such as Thr and Val was lost more
than ten times, including in internal nodes and in external leaves. But tRNAs such as Cys, Glu and Tyr was
lost only once for each within the whole Viridiplantae kingdom.
3.3.2 tRNA gene loss in Fungi Kingdom
Arg
Ala, Cys
Pro, Val
Cys
Glu
Val
Glu, Tyr
Cys
Cys
Glu, Gly
Ala, Cys,
Phe, His, Ile,
Asn, Arg, Ser, Thr, Val
Asp
Leu
Gly
Figure3.6. Parsimony inference of tRNA gene loss patterns in Fungi Kingdom
Phylogeny for Fungi kingdom was constructed based on COX1 gene as well, and was rooted with a node
containing one specie from Viridiplantae kingdom Zea mays and one species from Metazoa Nematostella.
Those tRNA annotated as tRNA-X, tRNA-Sec, tRNA-Glx were double checked and reannotated according
to their anti-codon (Appendix, Table A2). The pattern of 28 minimum number of tRNA loss was calculated
from parsimony inference was presented in the constructed phylogeny for Fungi.
3.3.3 ANOVA for loss tRNA genes of different origin
The number of deletion for each tRNA are showed in Figure 3.7, nine aminoacyl-tRNA synthetases
classified as group A (two genes of different origins) were less likely tend to loss corresponding tRNAs,
while the rest three aaRSs from group B and nine from group C, or group BC as referred in 2 groups
classification (mitochondrial and cytoplasmic tRNAs of single origin), are more likely to lose their
18
corresponding tRNAs.
ANOVA test has confirmed this tendency for two-group-classification, both for when Gln and Lys was
included and excluded (P=0.03659, P=0.0316). But it is not true for three-group-classification, either when
Gln and Lys were included or not (P=0.0851, P=0.0915).
40
35
Deletions
30
25
20
15
10
5
Arg B
Thr C
Val C
Gly C
Cys C
Asp A
Pro A
Ile B
Asn B
Gln C
Glu A
Ser A
Ala C
Lys C
Leu A
Phe A
Trp A
Tyr A
His C
Met A
0
tRNA
Figure3.7 Minimal number of mitiochondrial tRNA loss inferred from analysis of 1704 mitochondrial genomes
tRNAs are categorized according to the evolutionary history of their aaRSs listed in Table2.2. Black charts: Two genes of
different origins, Grey charts: Two genes of similar origin, White charts: A single gene of either origin.
Table3.2 ANOVA test for Min-tRNA loss
Classification of aminoacyl-tRNA synthetase
tRNA-Gln & tRNA-Lys
P-value
2 groups (A, BC)
+
0.03659 *
2 groups (A, BC)
-
0.0316 *
3 groups (A, B, C)
+
0.0851
3 groups (A. B, C)
-
0.0915
3.3.4 Correlation between tRNA Min-loss and sequence identity
Correlation between minimum number of tRNA loss and the sequence identity between bacterial genes and
eukaryotic genes were plotted in Figure 3.8. The insignificant result of Pearson Correlation test
(Cor=0.3855, P=0.1141) suggested that it is inappropriate to explain the frequency of tRNA loss by
aminoacyl-tRNA synthetase sequence identity.
19
3.3.5 Correlation between tRNA Total-loss and Mitochondrial Genome size
Pearson correlation test performed in R suggested that there is correlation between tRNA total number of
loss and mitochondrial genome size, either when tRNA-Gln and tRNA-Lys were taken into account or not
(Cor=-0.06877, P=0.8234; Cor=-0.05822, P=0.8502).
Figure3.8 Sequence similarity between bacterial and eukaryotic aminoacyl-tRNA synthetases plotted against the
minimum number of tRNA gene loss. Black square: two genes of different origins, Grey triangles: two genes of similar
origins, White diamonds: a single gene of either origin. No correlation was found between sequence identity and number
tRNA loss.
3.3.6 Correlation between tRNA Total-loss and Mitochondrial Coding Genome size
Pearson correlation test performed in R suggested that there is correlation between tRNA total number of
loss and mitochondrial coding genome size, either when tRNA-Gln and tRNA-Lys were taken into account
or not (Cor=-0.04530, P=0.1200; Cor=-0.04524, P=0.1206).
Table 3.3 Pearson Correlation test
Correlation
tRNA-Gln & tRNA-Lys
Cor
P-value
Total-tRNA loss & mito-Genomesize
-
-0.05822
0.8502
Total-tRNA loss & mito-Codingsize
-
-0.4524
0.1206
Total-tRNA loss & mito-Genomesize
+
-0.06877
0.8234
Total-tRNA loss & mito-Codingsize
+
-0.4530
0.1200
Min-tRNA-loss & Sequence-identity
-
0.3855
0.1141
20
4. Discussion
4.1 Lateral gene transfer between Bacteria and Archaea
The mixed pattern of Archaea within phylums of Bacteria, namely Firmiculte, Actinobacteria and
Thermogae, indicated that there was a lateral gene transfer between Archaea and Bacteria. And this
phenomenon was reported by previous researches, and accordingly, only the affiliation between
Actionbacteria and Archaea could be explained by conding bias (Macario, et al. 2006).
4.2 Deep node of Cyanobacteria
The phylogenetic position of Cyanobacteria is placed in a deep node in both trees for HSP70 and HSP60,
which is different from what was observed from universal tree of life based on concatenated protein
sequences (Brown, et al. 2001). However, the early divergent of Cyanobacteria in the phylogeny is
consistent with the discovery of 3.5 billion years old stromatolite fossils of morden-Cyanobacteria-like
ancient life (Barton, et al.2007).
4.3 Place of SAR11 and SAR116 sequences in HSP phylogenies
Placement of SAR11 within alphaproteobacteria was emphasized because this most abundant clade of
free-living bacteria is likely to be the closest representative of ancestors of mitochondria, since the
upper-layer-of-ocean lifestyle perfectly matches the oxygen rich environment required for the arise of
aerobic cellular organelle. However, it is difficult to draw a straight conclusion where does Pelagibacter
belong to. Clustering both within Rickettsiales and Rhodobacterales was observed in different phylogenies
constructed with different parameters, but it tended to give higher support with Rhodobacterales.
SAR116 is another clade whose lifestyle is still unclear. The genome of a Punieispirillum marinum from
this clade was released in June 2010 (Oh, 2010), and the sequence was available online early April.
Therefore, it is the first time to investigate the phylogenetic placement of this lineage within
alphaproteobacteria. And the phylogenetic results from both HSPS70 and HSP60 agreed that SAR116 is
always grouped within Rhodospirillales.
4.4 Clean-ups of alignment result in variations of phylogenetic topology
Topology within alphaproteobacteria for both HSP70 and HSP60 varies depending on the clean-ups of
sequence alignment. For HSP70, the order Rickettsiales is monophyletic only when amitochondria species
were excluded from the tree, and the sequence alignment was used directly without clean-ups. Otherwise,
the order is divided into two subclades, one containing Rickettsia, and the other containing Wolbachia and
the rest species in Rickettsiales, and mitochondrial clade are more related to the Wolbachia subclade of
Rickettsiales. In the alpha tree for HSP70 with amitochondrial lineages and clean-ups of the alignment,
mitochondrial clade shares the common ancestor with the Wolbachia subclade of Rickettsiales.
21
For HSP60, relationship between mitochondrial clade and Rickettsia dose not change whether sequence
alignment was applied. But the relative position of Planctomyces and Chlamydia fluctuated according to
the application of sequence alignment, that PVC phylum was broken down when the clean-up step was
missed. One explanation for that HSP60 prefers alignment clean-ups is the lower identity between paralogs
from endoplasmic reticulum and cytoplasm.
4.5 Paralog Rooting
The extreme conservation and multicopies of the two genes serve as a perfect marker for investigation of
the evolutionary relationships between Eukaryotes, Bacteria and Archaea. The duplication of Endoplasmic
reticulate and cytoplasmic copies gene in Eukatyotes has facilitated a possible way to root the phylogeny
with a eukaryotic paralog.
The ER copy of HSP60 was only found in mammals, the protein sequences, expect the cpn60 domain are
more variable than the cytoplasm copy, and less conserved than either archaeal or bacterial homologs. This
suggests an ancient duplication within this gene. The advantage of rooting in this manner is the
monophyletic of prokaryotes in the phylogeny of HSP70, and the molophyletic of bacteria in the phylogeny
of HSP60.
4.6 Consistence with Endosymbiont Theory
Phylogenies from both HSP70 and HSP60 supported the endosymbiont theory and indicated the single
origin of mitochondria from all lineages including secondary amitochondrial organisms. There is no doubt
that mitochondrial copies of these two genes were derived from alphaproteobacteria, specifically within
Rickettsiales order. And this topology is quite robust, no matter how the tree is rooted, or how many species
of alphaproteobacteria is included.
However, not all proteins from HSP multigene family share the same evolutionary history. The phylogenies
in this project are only single-gene trees, further genome wise analysis would provide a broader view in
answering the question of origin of eukaryotic cells.
4.7 Co-evolution of tRNA gene loss and Aminoacyl-tRNA Synthetase gene
Both tRNA-Gln and tRNA-Lys were excluded in correlation test between tRNA loss and aminoacyl-tRNA
synthetases sequence identity, due to the complex evolution history of GlnRS and LysRS. In correlation test
for tRNA loss to both mitochondrial genome size and coding genome size, statistics for both with and
without consideration of tRNA-Gln and tRNA-Lys were performed, and two set of analysis gave the same
results that neither mitochondrial genome size nor coding genome size is related to the total number of
tRNA loss.
ANOVA test and Pearson Correlation test suggest that aminoacyl-tRNA synthetases with higher sequence
identity between bacterial type and eukaryotic type do not necessarily tend to loss the corresponding tRNAs
in mitochondrial genome. In reality, single origin of aaRSs that functions in both mitochondria and
22
cytoplasm are more likely to lose their corresponding tRNAs.
The minimum number of tRNA loss inferred from parsimony algorithm would be different, as more
mitochondrial genomes are continuously being released. But comparing to Brindefalk’s work in 2008
(Brindefalk, 2009), this research has updated the database during the 3 years, the total number of eukaryotic
taxa has almost doubled itself, increased from 914 to 1704 (Appendix Table A1), the conclusion is still
concrete. Hence, there is the confidence to believe a co-evolution between tRNA genes loss and their
aminoacyl-tRNA synthetases.
Acknowledgement
Thank my supervisor Professor Siv Andersson for her instructions in each progressive step, provide me
with opportunities to learn various bioinformatic methodologies, trust in my capability to finish such a big
project as a Master student and to answer such a fundamental scientific question within six months.
Thank Professor Charles Kurland from Lund University for his concern about this project, thanks for his
visit to my lab, suggestion for my work both practically and theoretically, and thanks for the discussions
even when he was traveling around the world.
Thank Johan Viklund for his magical computation tutorial, and thank him for the data mining work when
we are solving the problems for tRNA loss.
Thank Thijs Ettema for his computer since all my Bayesian trees were constructed in that way, and thank
him for his suggestion and discussion.
Thank Lin Gen for helping me learn how to start with everything in this lab.
Thanks to everybody in Department Molecular Evolution.
And finally, thanks to Nature Publication Group and Springer for their permission of quoting some figures
in this thesis.
23
Reference
Adams K.L., Palmer J. D., Evolution of mitochondrial gene content: gene loss and transfer to the nudcleus. Mol Phylogenet
Evol 2003; 29: 380-95.
Albani A. E., Bengtson S., Canfield D. E., et al. Large colonial organisms with coordinated growth in oxygenated
environments 2.1 Gyr ago. Nature 2010; 466: 100-104.
Andersson S. G. E., Zomorodipour A., Andersson J. O., et al. The genome sequence of Rickettsia prowazekii and the origin of
mitochondria. Nature. 1998; 396: 133-143.
Bapteste E., Charlebois R. L., MacLeod D. & Brochier C. The two tempos of nuclear pore complex evolution: highly
adapting proteins in an ancient frozen structure. Genome Biol 2005; 6: R85.
Barton N. H., Briggs D. E.G., Eisen J. A., Goldstein D. B., Patel N. H., Evolution, Cold Spring Harbor Lab Press, New York.
2007; 8: 202-210.
Boorstein W. R., Ziegelhoffer T., Craig E. A., Molecular Evoluton of the HSP70 Multigene Family. J Mol Evol 1994; 38:
1-17.
Brindefalk B., Mitochondrial and Eukaryotic Origins: a phylogeneic perspective. Uppsala University. 2009; 1: 15-20.
Brindefalk B., Viklund J., Larsson D., Thollesson M., Andersson S. G. E., Origin and evolution of the mitochondrial
aminoacyl-tRNA synthetases. Mol Biol Evol. 2007; 24: 743-56.
Brochier-Armanet C., Boussau B., Gribaldo S., Fortere P., Mesophilic Crenarchaeota: proposal for a third archaeal phylum,
the Thaumarchaeota. Nature Rev. Microbiol. 2008; 6: 245-252.
Brow J. R., Douady C. J., Italia M. J., Marshal W. E., Stanhope M. J., Universal trees based on large combined protein
sequence data sets. Nature Gene. 2001; 28: 281-285.
Bui E. T. N., Bradley P. J., Johnson P. J., A common evolutionary origin for mitochondria and hydrogenosomes. Proc. Natl.
Acad. Sci. USA.1996; 93: 9651-9656.
Cavalier-Smith T., The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol.
Microbiol. 2002; 52: 297–354.
Clark C. G., Roger A. J., Direct evidence for secondary loss of mitochondria in Entamoeba histolytica. Proc.Natl. Acad. Sci.
USA. 1995; 92: 6518-6521.
Collins L. & Penny D., Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol. 2005; 22:
1053–1066.
Craig E.A., The heat shock response. CRC Crit Rev Biochem 1985; 18: 239-280.
Desmond E. & Gribaldo S., Phylogenomics of sterol synthesis: insights into the origin, evolution, and diversity of a key
eukaryotic feature. Genome Biol. Evol. 2009; 2009: 364–381.
Donoghue P. C. J., Antcliffe J. B., Origins of multicellularity. Naure 2010; 466: 41-42.
Embley T. M., Multiple secondary origins of the anaerobic lifestyle in eukaryotes. Philos. Trans. R. Soc. Lond. B Biol. Sci.
2006; 361: 1055–106.
Eme L., Moreira D., Talla E. & Brochier-Armanet C., A complex cell division machinery was present in the last common
24
ancestor of eukaryotes. PLoS ONE 2009; 4: e5021.
Field M. C. & Dacks J. B., First and last ancestors: reconstructing evolution of the endomembrane system with ESCRTs,
vesicle coat proteins, and nuclear pore complexes. Curr. Opin.Cell Biol. 2009; 21: 4–13.
Germot A., Phillip H., Guyader H. L., Presence of a mitochondrial-type 70-kDa heat shock protein in Trichomonas vaginalis
suggests a very early mitochondrial endosymbiosis in eukaryotes. Proc. Natl. Acad. Sci. U.S.A. 93: 14614-14618.
Germot A., Phillip H., Guyader H. L., Evidence for loss of mitochondria in microsporidia from a mitochondrial type HSP70
in Nosema locustae. Mol. Biochem. Parasitol. 1997; 87: 159-168.
Gissi C., Iannelli F., Pesole G., Evolution of the mitochondrial genome of Metazoa as exemplified by comparison of
congeneric species. Heredity 2008; 101: 301-20.
Giezen M. & Tovar J., Degenerate mitochondria. EMBO 2005; Rep. 6: 525–530.
Gribaldo S., Poole A. M., Daubin V., Forterre P., Brochier-Armanet C., The origin of eukaryotes and their relationship with
the Archaea: are we at a phylogenomic impasse? Nature Rev Micr. 2010; 8: 743-752.
Gupta R.S., Aitken K., Falah M., Singh B., Cloning of Giardia lamblia heat shock protein HSP70 homologous: Implications
regarding origin of eukaryotic cells and of endoplamic reticulum. Proc. Natl. Acad.Sci.U.S.A. 1994; 91: 2895-2899.
Horner D. S., Hirt R. P., Kilvington S., Lloyd D., Embley T. M., Molecular data suggest an early acquisition of the
mitochondrion endosymbiont. Proc. R. Soc. London Ser. B 1996; 263: 1053-1059.
Horst M. et al., Sequential action of two hsp70 complexs during protein import into mitochondria. EMBO J. 1997;
16:2668-2677.
Huang L. H., Wang H. S., Kang L., Different evolutionary lineages of large and small heat shock proteins in eukaryotes. Cell
Res 2008; 18: 1074-1076.
Jekely G., Small GTPases and the evolution of the eukaryotic cell. Bioessays 2003; 25: 1129–1138.
Johnston J. A., Ward C. L., Kopitto R. R., Aggesomes: A cellular response to misfolded proteins. J Cell Biol 1998; 143:
1883-1898.
Kang P. J., et al., Requirement for hsp70 in the mitochondrial matrix for translocation and folding of precursor proteins.
Nature 1990; 348: 137-143.
Karlberg O., Canback B., Kurland C. G., Andersson S. G. E., The dual origin of the yeast mitochondrial proteome. Yeast
Comp. Functional Genomics 2000; 17: 170-187.
Kurland C. G., Andersson S. G. E., Origin and Evolution of the Mitochondrial Proteome. Micrbiol. Mol. Biol. Rev. 2000; 64:
786-820.
Kurland C. G., Collins L. J., Penny D., Genomics and the irreducible nature of eukaryote cells. Science 2006; 312:
1011-1014.
Lake J. A., Henderson E., Oakes M., Clark M. W., Eocyte: a new ribosome structure indicates a kingdom with a close
relationship to eukaryotes. Proc. Natl Acad. Sci. USA 1984; 81: 3786-3790.
Lithgow T., Schneider A., Evolution of macromolecular import pathways in mitochondria, hygrogenosoes and mitosomes.
Phil. Trans. R. Soc. B 2010; 365: 799-817.
25
Macario A. J. L., Brocchieri L., Shenoy A. R., Macario E. C., Evolution of a Protein-Folding Machine: Genomic and
Evolutionary Analyses Reveal Three Lineages of the Archaeal hsp70 (dnaK) Gene. J. Mol Evol. 2006; 63: 74-86.
Mans, B. J., Anantharaman, V., Aravind, L. & Koonin, E. V., Comparative genomics, evolution and origins of the nuclear
envelope and nuclear pore complex. Cell Cycle 2004; 3: 1612–1637.
Margulis L., Origin of Eukatyotic Cells Yale Univ, Press, New Haven, 1970.
Margulis L. Archaeal-eubacterial mergers in the origin of Eukaryota: phylogenetic classification of the life. Proc. Natl Acad.
Sci. USA. 1996; 93: 1971-1076.
Mueller M., Enzymes and compartmentation of core energy metabolism of anaerobic protists: a special case in eukatyotic
evolution., Evolutionary relationships among protozoa. Kluwer Academic Publishers, London, U.K. 1998; In Coombs G. H.,
Vickerman K., Sleigh M. A.., Warren A. (ed.): 109-127.
Nakamura T. M. & Cech T. R. Reversing time: origin of telomerase. Cell 1998; 92: 587–590.
Oh H. M., Kwon K. K.. Kang I., Kang S. G., Lee J. H.., Kim S. J., Cho J. C. Complete Genome Sequence of “Candidatus
Puniceispirillum marinum” IMCC1322, a Representative of the SAR116 Clade in the Alphaproteobacteria. Journ. Bacterio.
2010; 192: 3240-3241.
Pfanner N., Geissle A., Versatility of the mitochondrial protein import machinery. Nature Rev. Mol.Biol. 2001; 2: 339-349.
Poole A. M., Penny D. Evaluating hypotheses for the origin of eukaryotes. Bioessays 2007; 29: 74-84.
Ramesh M. A., Malik S. B. & Logsdon J. M. Jr., A phylogenomic inventory of meiotic genes; evidence for sex in Giardia and
an early eukaryotic origin of meiosis. Curr. Biol. 2005; 15: 185–191.
Rivera M. C., Lake J. A., Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 1992; 257: 74-76.
Robertson C. E., Harris J. K., Spear J. R., Pace N. R., Phylogenetic diversityand ecology of environmental archaea. Curr.
Opin. Microbiol. 2005; 8: 638-642.
Roy S. W. & Gilbert W., The evolution of spliceosomal introns: patterns, puzzles and progress. Nature Rev.Genet. 2006; 7:
211–221.
Seacy D. G., Stein D. B., Green G. R., Phylogenetic affinities between eukaryotic cells and a thermophilic mycoplasma.
Biosystems 1978; 10: 19-28.
Searcy D. G., Metablolic intergration during the evolutionary origin of mitochondria. Cell Res. 2003; 13: 229-238.
Schneider A., Dose the evolutionary history of aninoacyl-tRNA snthetasesexplain the loss of mitochondrial tRNA genes?
Trends Genet. 2001; 17: 557-558.
Schneider H. C., Westermann B., Neupert W., Brunner M., The nucleotide exchange factor MGE exerts a key function in th
ATP-dependent cycle of mtHsp70-Tim44 interaction driving mitochondrial protein import. EMBO J. 1996; 15: 5796-5803.
Woese C. R., kandler O., Wheelis M. L., Towards a natural system of organisms: proposal for the domains Archaea, Bacteria,
and Eucarya. Proc. Natl Acad. Sci. USA 1990; 87:4576-4579.
26
Appendix
Table A1. Mitochondrial tRNA gene loss in different eukaryotic groups
Numbera
Genome sizeb
Coding size
tRNA-Totc
1574
16,164
11,338
818
Fungi
54
43,357
18,123
97
Viridiplantae
36
252,019
37,725
174
Stramenopiles
17
42,438
31,096
23
Group
Metazoa
Alveolata
8
15,402
10,596
149
Amoebozoa
5
52,314
35,291
40
Rhodophyta
3
31,600
23,296
3
Cryptophyta
2
54,308
31,932
0
Haptophyceae
1
29,013
16,494
0
Jakobida
1
69,034
55,920
1
Malawimonadidae
1
47,328
31,824
0
Choanoflagellida
1
76,568
24,969
0
Heterolobosea
1
49,843
40,608
5
-
1310
Total
1704
-
a
Number of genomes analyzed
Average mitochondrial genome size of each group a
c
Total: total number of tRNA loss
b
Table A2. Re-annotation of tRNA according to Anti-codon
Taxa ID
Taxa name
Annotated tRNA
Anti-codon
Group
153609
Moniliophora perniciosa
XaatRNA
Terminal
Fungi
5755
Acanthameba castellanii
XaatRNA
tRNA-Lys
Amoebozoa
416842
Saccharina ochotensis
trnaX
tRNA-Lys
Stramenopiles
9913
Bos taurus (Cattle)
Sec_tRNA
tRNA-Trp
Metazoa
9615
Canis lupus familaris (dog)
Sec_tRNA
tRNA-Trp
Metazoa
9598
Pan trolodtes (Chimpanzee)
Sec_tRNA
tRNA-Trp
Metazoa
7668
Strongylocentrotus purpuratus ( purple urchin)
Sec_tRNA
tRNA-Trp
Metazoa
5062
Aspergillus oryzae
tRNA-Sec
tRNA-Trp
Fungi
8226
Katsuwons pelamis
Glx_tRNA
tRNA-Glu
Metazoa
8228
Euthynnus alletteratus
Glx_tRNA
tRNA-Glu
Metazoa
8235
Thunns alaluga
Glx_tRNA
tRNA-Glu
Metazoa
27
Figure A1. Phylogeny for HSP70 with increased number of alphaproteobacteria
Bayesian algorithm based on full sequence alignment, Dirichlet process for rates across sites (CAT model), LG distribution
for relative exchangeability (exchange rate), rooted with endoplasmic reticulum copy of gene from eukaryotes.
28