Download Slides Here

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Genome evolution wikipedia , lookup

Pathogenomics wikipedia , lookup

Gene expression programming wikipedia , lookup

Microevolution wikipedia , lookup

Metagenomics wikipedia , lookup

Maximum parsimony (phylogenetics) wikipedia , lookup

Koinophilia wikipedia , lookup

Quantitative comparative linguistics wikipedia , lookup

Computational phylogenetics wikipedia , lookup

Transcript
National Center for Biotechnology Information
The Tree of Life (TOL) in the age of
Genomics
or a journey through the
Phylogenetic Forest
Eugene Koonin, NCBI / NLM / NIH
RECOM BE, San Diego, May 23, 2010
National Center for Biotechnology Information
A brief history of TOL
Thinking of the history of life in terms of phylogenetic trees is as old as
scientific biology (if not older)
Charles Darwin (1859) Origin of Species [one and only
illustration]: "descent with modification"
Ernst Haeckel (1879)
The Evolution of Man
National Center for Biotechnology Information
A brief history of TOL
Advent of molecular phylogenetics – trees can be made from alignments
of homologous genes - expectations of objectively reconstructed
complete Tree of Life
•ribosomal RNA phylogeny
Woese et al. (1990) Towards a natural system of organisms: proposal for the domains
Archaea, Bacteria, and Eucarya. PNAS 87, 4576-4579 [Figure 1, modified]
National Center for Biotechnology Information
Enter genomics…and things
get complicated
Fleischmann et al.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
Science. 1995 Jul 28;269(5223):496-512.
National Center for Biotechnology Information
Massive horizontal gene transfer (HGT) from archaea to a
hyperthermophilic bacterium
National Center for Biotechnology Information
Significant HGT from eukaryotes to a
(facultative) symbiotic bacterium
National Center for Biotechnology Information
Phylogenetic trees for aaRSs – 20 key enzymes of protein
biosynthesis
diversity of HGT signals
©1999 by Cold Spring Harbor Laboratory Press
Wolf Y I et al. Genome Res. 1999;9:689-710
National Center for Biotechnology Information
Phylogenetic trees for aaRS: diversity of HGT
signals
©1999 by Cold Spring Harbor Laboratory Press
Wolf Y I et al. Genome Res. 1999;9:689-710
National Center for Biotechnology Information
Phylogenetic trees for aaRS: diversity of HGT
signals
Wolf Y I et al. Genome Res. 1999;9:689-710
©1999 by Cold Spring Harbor Laboratory Press
National Center for Biotechnology Information
Phylogenetic trees for aaRS: diversity of HGT
signals
©1999 by Cold Spring Harbor Laboratory Press
Wolf Y I et al. Genome Res. 1999;9:689-710
National Center for Biotechnology Information
Phylogenetic trees for aaRS: diversity of HGT
signals
©1999 by Cold Spring Harbor Laboratory Press
Wolf Y I et al. Genome Res. 1999;9:689-710
National Center for Biotechnology Information
Phylogenetic trees for aaRS: diversity of HGT
signals
©1999 by Cold Spring Harbor Laboratory Press
Wolf Y I et al. Genome Res. 1999;9:689-710
National Center for Biotechnology Information
History of TOL
Troubled times – "uprooting" of TOL for prokaryotes.
•
•
•
•
•
horizontal gene transfer is rampant; no gene is exempt
histories of individual genes are non-coherent with each other
tree-like signal is completely lost (or never existed at all)
there are no species (or other taxa) in prokaryotes
the consistent tree-likesignal we observe is created by biases in HGT
"Standard Model"
Bacteria
Eukaryotes
Archaea
"Net of Life"
Bacteria
Eukaryotes
Archaea
Doolittle WF. (2000) Uprooting the tree of life. Sci. Am. 282, 90-95 [modified]
National Center for Biotechnology Information
So – in the light of HGT - is there a
Tree of Life?
Or was the history of life more like a
net?
To address this question, explore the
Forest of Life – trees for ALL
conserved genes
Exploration of the Forest of Life (FOL): bioinformatic pipeline
> 90% species
National Center for Biotechnology Information
RECONSTRUCTION OF THE FOL
NUTs
FOL
1. SELECTION OF ORTHOLOGOUS GENES
ANALYSIS OF THE FOL
2. MULTIPLE ALIGNMENT
4. TREE COMPARISON METHODS
Matrix of distances between trees
Tree1
1
Tree2
0.492
Tree3
0.591 0.487
…
TreeN
3. CONSTRUCTION OF ML
PHYLOGENETIC TREES
0.2
…
1
…
1
…
0.325 0.485 0.112
Tree
comparison
networks
1
cluster
analysis
Forest of Life (FOL) and Nearly Universal Trees (NUTs)
6901 trees (FOL)
2000
Number of trees
National Center for Biotechnology Information
FOL and NUTs
1000
0
0
20
40
60
80
100
Number of species in tree
most of the trees
contain relatively
few species
102 "nearly universal"
trees (90+ species):
NUTs
National Center for Biotechnology Information
Archaea and Bacteria in NUTs
How well archaea and bacteria are separated?
archaea and
bacteria mixed
A/ B
archaea invade
bacterial subtree
A<-B
A->B
A<->B
8%
archaea and
bacteria separated
13%
23%
56%
bacteria invade
archaeal subtree
56% of NUTs show perfect separation between archaea and bacteria;
92% of NUTs show partial, non-random separation
Inconsistency Score (IS) compares a tree to a set of trees. The score is
based on the frequency of bipartitions derived from the given tree among
the whole set of trees. Range ~[0..1].
1
random trees
0.5
IS
NUTs
COG0009
COG0537
COG0057
COG0198
COG0177
COG0149
COG0519
COG0540
COG0185
COG0094
COG0126
COG0124
COG0089
COG0522
COG0172
COG0452
COG0495
COG0096
COG0525
COG0051
COG0528
COG0100
COG0090
COG0092
COG0541
0
COG0532
National Center for Biotechnology Information
NUTs vs Random Trees
NUTs are much more consistent other than random trees
National Center for Biotechnology Information
NUTs Pattern of Similarity
An edge connects two vertices (trees) if the distance between them is
below the threshold
A single connected cluster appears and gradually grows to encompass all
NUTs as the threshold is lowered.
National Center for Biotechnology Information
Are NUTs Clustered?
The 102x102 matrix of distances between NUTs is projected into a lowerdimensional space using multidimensional scaling (CMDS).
Analysis using gap function approach (Tibshirani et al. 2001) shows lack
of distinct separate clusters in the tree topology space
National Center for Biotechnology Information
NUTs vs FOL
Similarity between NUTs and the rest of the FOL.
The NUTs are connected to 2505 trees (36%) from the FOL at a 0.8
similarity cut-off. The mean similarity between the NUTs and the FOL is
~0.5 (only ~0.1 for random trees).
National Center for Biotechnology Information
NUTs vs FOL
The matrix of distances between all the trees in the FOL is projected into
a lower-dimensional space using CMDS.
0.5
0.4
NUTs
0.3
0.2
2
0.1
3
-0.4
-0.2
-0.1
-0.2
-0.3
(2)
63.34% *
(1)
42.43% *
(3)
62.11 %
**
4
0
-0.6
(6)
48.6% **
1
0
0.2
0.4
0.6
0.8
5
6
7
NUTs
(4)
56.21 %
**
(5)
50.17 %
**
(7)
49.66 %
**
-0.4
-0.5
NUTs vs FOL clusters
FOL/COG trees form 7 distinct clusters in tree topology space.
Clusters differ largely by phyletic patterns. NUTs form a tight group within
one of the clusters and are approximately equidistant to all clusters.
National Center for Biotechnology Information
Interim conclusions
Analysis of the full Forest of Life in comparison to NUTs shows that:
• a considerable fraction of FOL trees are very similar to NUTs: average
FOL-NUTs similarity is dramatically above the random level
• unlike NUTs, topologies of the FOL trees show distinct clustering
largely determined by the phyletic patterns (i.e. set of species present)
• in the tree topology space NUTs form a comparatively tight centrally
located group
• compared to NUTs, FOL trees show much wider diversity of their
topologies; however, the "central" tree-like signal still exists for a large
part of the FOL
• a "consensus" tree make sense at least as a crude representation of
the common trend in the FOL (especially so for the NUTs)
National Center for Biotechnology Information
Distinguishing Tree and Net signals in the FOL
using Quartets of species
The FOL shows a great diversity of phyletic and phylogenetic patterns.
We employ the quartet analysis to measure the vertical and horizontal
evolutionary signal in different areas of the Forest.
• there are C4100  4x106 species quartets from the set of 100 species
• each quartet of species can resolve into 3 different tree topologies
(~12x106 combinations total)
• any tree containing all 4 species from a given quartet resolves them
into one of these 3 topologies
• for any quartet one can compute the support for all 3 topologies within
a set of trees (i.e. relative frequencies of the topologies); ~8x1010
comparisons for the whole FOL (or any subset of trees))
1
2
3
3
f1 + f2 + f3 = 1
Looking for tree-like and net-like evolution in the FOL
National Center for Biotechnology Information
Species Matrix – NUTs – mostly tree-like signal
NUTs
Crenarchaeota
Euryarchaeota
Cyanobacteria
Proteobacteria
together
often
rarely
Archaea
Bacteria
National Center for Biotechnology Information
Species Matrix – Rest of the FOL –
much stronger net-like (horizontal) signal
FOL-NUTs
Crenarchaeota
Euryarchaeota
Cyanobacteria
Proteobacteria
together
often
rarely
Archaea
Bacteria
National Center for Biotechnology Information
Species Matrix vs Function
J
U
K
L
D
F
H
I
N
O
S
M
E
C
G
R
Q
P
T
V
National Center for Biotechnology Information
Conclusions – 3
Quartet-based analysis of species position in the Forest of
Life shows that:
• relationships between species in NUTs roughly follow the
conventional microbial taxonomy – probably, a
consequence of significant contribution of tree-like
evolution
• the “TOL” signal declines with decreasing number of
species in the tree
• different functional classes of genes show substantially
different balance between tree-like and net-like modes of
gene transfer and possibly in preferred routes of HGT
National Center for Biotechnology Information
The Take-Home Message
• There is no single "Tree of Life" describing the evolutionary history of
all or even the majority of the prokaryote genomes
• Yet, there is a central tree-like trend of evolution compatible with a
common history of descent of prokaryotic groups
• This trend is more evident at shallow phylogenetic depths, in more
ubiquitous genes and among some functional categories of genes
(eg, translation)
• Observations are compatible with the ancient divergence between the
bacterial and archaeal clades followed by rapid diversification of major
phyla followed by HGT that distorted but did not destroy the tree-like
signal
• Altogether, HGT might dominate evolution but the tree-like signal is
stronger than the signal from any particular route of HGT
Puigbo P, Wolf YI, Koonin EV. (2009) Search for a 'Tree of Life' in the thicket of the
phylogenetic forest. J. Biol. 8, 59
Koonin EV, Wolf YI, Puigbo P. (2009) The Phylogenetic Forest and the Quest for the
Elusive Tree of Life. Cold Spring Harb Symp Quant Biol. [advanced epub]
Koonin EV, Wolf YI. (2009) The fundamental units, processes and patterns of evolution, and
the tree of life conundrum. Biol Direct. 4, 33
Puigbo P, Wolf YI, Koonin EV. The tree and net signals in the evolution of prokaryotes,
In preparation
National Center for Biotechnology Information
The Take-Home Message
To see the forest for the trees…
…but also to see trees for the forest.
National Center for Biotechnology Information
A quick redux on bioinformatics
•bioinformatics is a toolbox for computational analysis of
biological problems
•the toolbox is already quite large and versatile, so in
many cases, combination of available tools into pipelines
allows researchers to address new questions
•of course, new tools often have to be developed and added
to the toolbox
•biologists must understand the capacities but also the
limitations of the tools
Can bioinformatics – using the vast amounts of data produced by genomics and
systems biology – help transform biology “from stamp collection to physics”?
National Center for Biotechnology Information
Acknowledgments
Fellow explorers of the Forest of Life
Pere Puigbo
Yuri Wolf