Mapping phylogenetic trees
to reveal distinct patterns of evolution
Michelle Kendall
Joint work with Caroline Colijn
Motivation: tree uncertainty
Tree inference is complicated
Large space of trees: (2k − 3)!!
Bayesian methods give posterior
collection of trees
Non tree-like evolution
An approach to compare and ‘map’ trees
tree accuracy, credibility
detect different phylogenetic signals in complex data
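As a rough illustration of how quickly the space of trees grows, the count (2k − 3)!! quoted above can be evaluated directly. This is a minimal sketch, assuming the count refers to rooted, binary, labelled trees on k tips; the function name is illustrative only.

```python
def num_rooted_binary_trees(k: int) -> int:
    """Return (2k - 3)!!, the number of rooted, binary, labelled trees on k >= 2 tips."""
    if k < 2:
        raise ValueError("k must be at least 2")
    count = 1
    # (2k - 3)!! = 1 * 3 * 5 * ... * (2k - 3)
    for odd in range(1, 2 * k - 2, 2):
        count *= odd
    return count


for k in (3, 6, 10):
    print(k, num_rooted_binary_trees(k))
# 3 3
# 6 945
# 10 34459425
```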
Metrics
Let Tk be the set of all rooted, labelled trees on k taxa (leaves).
A distance function d : Tk × Tk → R is a metric if, for all x, y, z ∈ Tk:
1. d(x, y) ≥ 0
2. d(x, y) = 0 ⇔ x = y
3. d(x, y) = d(y, x)
4. d(x, y) ≤ d(x, z) + d(z, y)
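The four conditions can be checked by brute force on any finite collection of objects. The sketch below is illustrative only: `is_metric` is a hypothetical helper, and the points are plain numbers rather than trees, but any distance function could be passed in the same way.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the four metric conditions for d on a finite collection of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:                        # 1. non-negativity
            return False
        if (abs(d(x, y)) <= tol) != (x == y):     # 2. d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:          # 3. symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:     # 4. triangle inequality
            return False
    return True


points = [0.0, 1.0, 3.0]
print(is_metric(points, lambda a, b: abs(a - b)))     # True
print(is_metric(points, lambda a, b: (a - b) ** 2))   # False: 9 > 1 + 4
```

The squared difference fails only the triangle inequality, which is why condition 4 has to be checked separately from the first three.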
Phylogenetic metrics
Robinson Foulds
Designed for unrooted trees
Topological and weighted versions
A lot of trees are all the same distance apart
Distances do not necessarily correspond to biological intuition
RF distances from Tr ∈ T100 to 300 random trees
from T100
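For reference, the topological (unweighted) Robinson-Foulds distance counts the splits of the tip set found in one tree but not the other. The sketch below is a minimal illustration, not the code used for the figure: it assumes trees are written as nested tuples of tip labels (a hypothetical encoding) and ignores the root position, in line with RF being designed for unrooted trees.

```python
def leaf_set(tree):
    """Set of tip labels below a node; a tip is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tip set induced by the edges of the tree."""
    all_tips = leaf_set(tree)
    ref = min(all_tips)              # reference tip used to normalise each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            side = leaf_set(child)
            if ref in side:          # store the side that excludes the reference tip
                side = all_tips - side
            if 1 < len(side) < len(all_tips) - 1:
                found.add(side)
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2))   # 2
```

Because the distance only takes integer values up to twice the number of internal edges, many pairs of trees necessarily share the same distance.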
Phylogenetic metrics
Billera, Holmes and Vogtmann
Make a space by “glueing together” orthants
Hard to visualize and compute
Considers branch lengths
Convex
Distances do not necessarily correspond
to biological intuition
Geometry of the Space of Phylogenetic Trees,
Billera, Holmes, Vogtmann
Phylogenetic metrics
We present a metric which:
has a simple, biologically meaningful definition
can focus on topology or branch lengths, or both
is ‘discerning’ of subtle differences
Our distances from Tr ∈ T100 to 300 random trees from T100
What we compare (topology only)
What we compare (with branch lengths)
Our metric
Let mi,j = the number of edges from the root to the MRCA of i and j,
let Mi,j = the path length from the root to the MRCA of i and j, and
let pi = the length of the pendant edge to tip i.
\[
v_\lambda(T_a) = (1 - \lambda)
\begin{pmatrix} m_{1,2} \\ m_{1,3} \\ \vdots \\ m_{k-1,k} \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
+ \lambda
\begin{pmatrix} M_{1,2} \\ M_{1,3} \\ \vdots \\ M_{k-1,k} \\ p_1 \\ p_2 \\ \vdots \\ p_k \end{pmatrix}
\]
The distance between two trees is
\[
d_\lambda(T_a, T_b) = \lVert v_\lambda(T_a) - v_\lambda(T_b) \rVert
\]
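The definitions of m, M, p, the vector and the distance can be turned into a short sketch. This is not the treescape implementation: the tree encoding, the function names and the branch lengths are assumptions made for illustration, the norm is taken to be Euclidean, and both trees are assumed to share the same tip labels.

```python
import math
from itertools import combinations

# Hypothetical tree encoding (not from the slides): a node is (payload, length),
# where payload is a tip label (str) or a list of child nodes, and length is the
# length of the edge above the node (0.0 for the root).


def tree_vector(tree, lam=0.0):
    """Return v_lambda(T), with tip pairs and tips taken in sorted label order."""
    m, M, pendant = {}, {}, {}

    def visit(node, depth, dist):
        # depth = edges from the root to this node, dist = path length from the root
        payload, length = node
        if isinstance(payload, str):
            pendant[payload] = length
            return [payload]
        groups = [visit(child, depth + 1, dist + child[1]) for child in payload]
        # tips in different child subtrees have this node as their MRCA
        for left, right in combinations(groups, 2):
            for i in left:
                for j in right:
                    pair = tuple(sorted((i, j)))
                    m[pair], M[pair] = depth, dist
        return [tip for group in groups for tip in group]

    visit(tree, 0, 0.0)
    tips = sorted(pendant)
    pair_part = [(1 - lam) * m[p] + lam * M[p] for p in combinations(tips, 2)]
    tip_part = [(1 - lam) * 1 + lam * pendant[t] for t in tips]
    return pair_part + tip_part


def tree_distance(tree_a, tree_b, lam=0.0):
    """Euclidean distance between the two tree vectors."""
    return math.dist(tree_vector(tree_a, lam), tree_vector(tree_b, lam))


# ((a,b),c) and ((a,c),b), with illustrative branch lengths
Ta = ([([("a", 1.0), ("b", 1.0)], 1.0), ("c", 2.0)], 0.0)
Tb = ([([("a", 1.0), ("c", 1.0)], 1.0), ("b", 2.0)], 0.0)

print(tree_vector(Ta, 0.0))          # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(tree_distance(Ta, Tb, 0.0))    # topology only: sqrt(2)
print(tree_distance(Ta, Tb, 1.0))    # branch lengths only: 2.0
```

With λ = 0 only the entries for the pairs (a, b) and (a, c) differ between the two trees; with λ = 1 the pendant lengths of b and c also contribute.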
Our metric
Why is this comparison informative?
compares lineages
sensitive to any changes, but changes deeper in the tree are
emphasised
small differences in shape and/or labelling are similarly prioritised
can include branch lengths to a specified extent
Euclidean distance lends itself to clear visualisation
MDS visualisation of tree space, λ = 0
The space of 6-tip trees: RF and BHV
The space of 6-tip trees: λ = 0 and RF
Detecting alternatives in Bayesian posteriors
Anole lizards: a model system for ecological phenomena
reproductive character displacement, adaptation, behavior and speciation
recent mitochondrial and nuclear DNA analysis of distichus group of trunk ecomorph anoles found two main areas of uncertainty in species tree:
1. Bahamas and the North Paleo-island of Hispaniola (clade with 0.64 support)
2. sister clade containing mainly anoles from the South Paleo-island and dominicensis1 from northern Haiti.
Geneva, A. J. et al. Multilocus phylogenetic analyses of Hispaniolan and Bahamian
trunk anoles (distichus species group). Mol Phylogenet Evol 87, 105–117 (2015)
Mapping 1000 trees from the posterior
Credible alternatives
The posterior typically becomes unimodal as λ → 1
λ = 0.05
λ = 0.1
We have found island structure repeatedly
Chorus frogs, λ = 0
Dengue fever, λ = 0
Summary trees
Heled, J., and Bouckaert, R. R. Looking for trees in the forest: summary tree from
posterior samples. BMC Evolutionary Biology, 13, 221, (2013)
Summary trees, divergence times
Maximum clade credibility tree
Maximum a posteriori tree topology
Consensus tree
Median tree
etc.
Drawbacks:
  summary tree may be different from all posterior trees
  negative branch lengths
HIV within-host consensus tree
Summary trees
We can select a representative tree from each cluster
The geometric median tree for the posterior or a given cluster S ⊂ Tk
is the tree(s) which minimises the distance to the vector:
\[
v_{\mathrm{centre}} = \frac{1}{|S|} \sum_{T_i \in S} v_\lambda(T_i)
\]
We can include weights (for example the likelihood of the trees):
\[
v_{\mathrm{centre}} = \frac{1}{|S|} \sum_{T_i \in S} w_i \, v_\lambda(T_i)
\]
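Selecting the geometric median tree(s) from a set of tree vectors can be sketched as follows. This is an illustrative sketch, not the treescape code: the function names are hypothetical, the vectors are assumed to have been computed beforehand, the distance is assumed Euclidean, and the weighted centre is divided by |S| exactly as in the formula above.

```python
import math


def centre_vector(vectors, weights=None):
    """Sum of (optionally weighted) tree vectors divided by |S|."""
    if weights is None:
        weights = [1.0] * len(vectors)
    size = len(vectors)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / size
            for i in range(dim)]


def median_indices(vectors, weights=None, tol=1e-12):
    """Indices of the tree vector(s) closest to the centre vector."""
    centre = centre_vector(vectors, weights)
    dists = [math.dist(v, centre) for v in vectors]
    best = min(dists)
    return [i for i, d in enumerate(dists) if d <= best + tol]


# Illustrative vectors for three 3-tip trees with lambda = 0:
# two copies of ((a,b),c) and one copy of ((a,c),b)
vectors = [
    [1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 1, 1],
]
print(median_indices(vectors))   # [0, 1]
```

Returning a list of indices reflects the "tree(s)" wording: several trees can tie for the minimum, and the result is always a tree from the sample rather than an averaged tree.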
Application: Trees from different data / methods
Two related questions:
Do different parts of the data give different phylogenetic signals?
  e.g. individual genes / concatenated genes / whole genome
How sensitive is a phylogeny to the method used?
  DNA v protein / SNPs v INDELs etc.
  include regions with missing data?
  choice of software / settings
Trees from different data: HIV
Islands within HIV pol gene trees
Ebola: trees from different genes, λ = 0 (figure panels labelled A to G)
Work in progress: which sites shape the phylogeny most?
HIV in-host longitudinal study
Data: env glycoprotein, in-host over many years pre treatment.
Shankarappa, ... Mullins et al, J. Virology 1999
Figure: distance from the tree with all sites, plotted against site within env glycoprotein C2−V5
Size: topological distance. Colour: Robinson-Foulds distance
Summary
A method to map and explore phylogenetic tree space
Applications:
  Detecting credible alternative evolutionary patterns supported by data
  Selecting representative tree(s)
  Comparing gene trees to each other and whole genome trees
  Comparing methods, software testing
  Sensitivity of tree to input data
  Detecting informative sites
For more details...
Kendall, M. and Colijn, C. A tree metric using structure and length to
capture distinct phylogenetic signals arXiv:1507.05211
Kendall, M. and Colijn, C. Mapping phylogenetic trees to reveal distinct
patterns of evolution bioRxiv http://dx.doi.org/10.1101/026641
Jombart, T., Kendall, M., Almagro-Garcia, J. and Colijn, C. treescape R
package available on CRAN
Thank you