Mapping phylogenetic trees to reveal distinct patterns of evolution
Michelle Kendall
Joint work with Caroline Colijn

Motivation: tree uncertainty
* Tree inference is complicated
  * Large space of trees: $(2k-3)!!$
  * Bayesian methods give a posterior collection of trees
  * Non-tree-like evolution
* An approach to compare and 'map' trees
  * tree accuracy, credibility
  * detect different phylogenetic signals in complex data

Metrics
Let $T_k$ be the set of all rooted, labelled trees on $k$ taxa (leaves). A distance function $d : T_k \times T_k \to \mathbb{R}$ is a metric if, for all $x, y, z \in T_k$:
1. $d(x, y) \geq 0$
2. $d(x, y) = 0 \Leftrightarrow x = y$
3. $d(x, y) = d(y, x)$
4. $d(x, y) \leq d(x, z) + d(z, y)$

Phylogenetic metrics: Robinson-Foulds (RF)
* Designed for unrooted trees
* Topological and weighted versions
* A lot of trees are all the same distance apart
* Distances do not necessarily correspond to biological intuition
Figure: RF distances from $T_r \in T_{100}$ to 300 random trees from $T_{100}$

Phylogenetic metrics: Billera, Holmes and Vogtmann (BHV)
* Make a space by "glueing together" orthants
* Hard to visualize and compute
* Considers branch lengths
* Convex
* Distances do not necessarily correspond to biological intuition
Billera, Holmes, Vogtmann. Geometry of the Space of Phylogenetic Trees.

Phylogenetic metrics
We present a metric which:
* has a simple, biologically meaningful definition
* can focus on topology or branch lengths, or both
* is 'discerning' of subtle differences
Figure: our distances from $T_r \in T_{100}$ to 300 random trees from $T_{100}$

What we compare (topology only)

What we compare (with branch lengths)

Our metric
Let
* $m_{i,j}$ = the number of edges from the root to the MRCA of $i$ and $j$,
* $M_{i,j}$ = the path length from the root to the MRCA of $i$ and $j$,
* $p_i$ = the length of the pendant edge to tip $i$.

\[
v_\lambda(T_a) = (1-\lambda)
\begin{pmatrix} m_{1,2} \\ m_{1,3} \\ \vdots \\ m_{k-1,k} \\ 1 \\ \vdots \\ 1 \end{pmatrix}
+ \lambda
\begin{pmatrix} M_{1,2} \\ M_{1,3} \\ \vdots \\ M_{k-1,k} \\ p_1 \\ \vdots \\ p_k \end{pmatrix}
\]

The distance between two trees is
\[
d_\lambda(T_a, T_b) = \lVert v_\lambda(T_a) - v_\lambda(T_b) \rVert
\]

Our metric: why is this comparison informative?
* compares lineages
* sensitive to any changes, but changes deeper in the tree are emphasised
* small differences in shape and/or labelling are similarly prioritised
* can include branch lengths to a specified extent
* Euclidean distance lends itself to clear visualisation

MDS visualisation of tree space, λ = 0

The space of 6-tip trees
Figure panels: RF, BHV, and our metric with λ = 0

Detecting alternatives in Bayesian posteriors
Anole lizards: a model system for ecological phenomena
* reproductive character displacement, adaptation, behavior and speciation
Recent mitochondrial and nuclear DNA analysis of the distichus group of trunk ecomorph anoles found two main areas of uncertainty in the species tree:
1. Bahamas and the North Paleo-island of Hispaniola (clade with 0.64 support)
2. sister clade containing mainly anoles from the South Paleo-island and dominicensis1 from northern Haiti.
Geneva, A. J. et al. Multilocus phylogenetic analyses of Hispaniolan and Bahamian trunk anoles (distichus species group). Mol Phylogenet Evol 87, 105–117 (2015)

Mapping 1000 trees from the posterior

Credible alternatives

The posterior typically becomes unimodal as λ → 1
Figure panels: λ = 0.05 and λ = 0.1

We have found island structure repeatedly
Figure panels: Chorus frogs, λ = 0; Dengue fever, λ = 0

Summary trees
Heled, J., and Bouckaert, R. R. Looking for trees in the forest: summary tree from posterior samples. BMC Evolutionary Biology 13, 221 (2013)

Summary trees, divergence times
* Maximum clade credibility tree
* Maximum a posteriori tree topology
* Consensus tree
* Median tree
* etc.
Drawbacks:
* summary tree may be different from all posterior trees
* negative branch lengths
Figure: HIV within-host consensus tree

Summary trees
We can select a representative tree from each cluster.
The geometric median tree for the posterior, or for a given cluster $S \subset T_k$, is the tree (or trees) $T$ for which $v_\lambda(T)$ is closest to the vector:
\[
v_{\mathrm{centre}} = \frac{1}{|S|} \sum_{T_i \in S} v_\lambda(T_i)
\]
We can include weights $w_i$ (for example the likelihood of the trees):
\[
V_{\mathrm{centre}} = \frac{1}{|S|} \sum_{T_i \in S} w_i \, v_\lambda(T_i)
\]

Application: Trees from different data / methods
Two related questions:
* Do different parts of the data give different phylogenetic signals?
  * e.g. individual genes / concatenated genes / whole genome
* How sensitive is a phylogeny to the method used?
  * DNA v protein / SNPs v INDELs etc.
  * include regions with missing data?
  * choice of software / settings

Trees from different data: HIV
Islands within HIV pol gene trees

Ebola: trees from different genes, λ = 0
(figures with labels A to G)

Work in progress: which sites shape the phylogeny most?
HIV in-host longitudinal study
Data: env glycoprotein, in-host over many years pre treatment.
Shankarappa, ... Mullins et al, J. Virology 1999
Figure: distance from the tree with all sites (y-axis, 0 to 3) against site within env glycoprotein C2-V5 (x-axis, 0 to 600).
Size: topological distance.
Colour: Robinson-Foulds distance

Summary
A method to map and explore phylogenetic tree space
Applications:
* Detecting credible alternative evolutionary patterns supported by data
* Selecting representative tree(s)
* Comparing gene trees to each other and whole genome trees
* Comparing methods, software testing
* Sensitivity of tree to input data
* Detecting informative sites

For more details...
* Kendall, M. and Colijn, C. A tree metric using structure and length to capture distinct phylogenetic signals. arXiv:1507.05211
* Kendall, M. and Colijn, C. Mapping phylogenetic trees to reveal distinct patterns of evolution. bioRxiv, http://dx.doi.org/10.1101/026641
* Jombart, T., Kendall, M., Almagro-Garcia, J. and Colijn, C. treescape: R package available on CRAN

Thank you
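
Supplementary sketch
The definitions on the "Our metric" and "Summary trees" slides can be illustrated with a short Python sketch. This is not the treescape implementation. The parent-map tree representation, the function names and the two four-tip example trees are all assumptions made for illustration, and the weighted centre is omitted.

```python
from itertools import combinations
from math import dist, isclose

# Assumed representation: {node: (parent, branch_length)}; the root has parent None.

def root_path(tree, node):
    """Nodes from the root down to `node`, inclusive."""
    path = [node]
    while tree[path[-1]][0] is not None:
        path.append(tree[path[-1]][0])
    return path[::-1]

def depth(tree, node):
    """(number of edges, summed branch length) from the root to `node`."""
    path = root_path(tree, node)
    return len(path) - 1, sum(tree[n][1] for n in path[1:])

def mrca(tree, a, b):
    """Most recent common ancestor: last node shared by the two root paths."""
    last = None
    for x, y in zip(root_path(tree, a), root_path(tree, b)):
        if x != y:
            break
        last = x
    return last

def tree_vector(tree, tips, lam=0.0):
    """v_lambda = (1 - lam) * (m_ij..., 1...) + lam * (M_ij..., p_i...)."""
    tips = sorted(tips)
    topo, lengths = [], []
    for i, j in combinations(tips, 2):
        m, M = depth(tree, mrca(tree, i, j))
        topo.append(m)
        lengths.append(M)
    for t in tips:
        topo.append(1)              # one pendant edge per tip
        lengths.append(tree[t][1])  # pendant edge length p_i
    return [(1 - lam) * a + lam * b for a, b in zip(topo, lengths)]

def tree_distance(ta, tb, tips, lam=0.0):
    """Euclidean distance between the two tree vectors."""
    return dist(tree_vector(ta, tips, lam), tree_vector(tb, tips, lam))

def geometric_median(trees, tips, lam=0.0):
    """Indices of the tree(s) whose vector is closest to the unweighted mean vector."""
    vecs = [tree_vector(t, tips, lam) for t in trees]
    centre = [sum(col) / len(vecs) for col in zip(*vecs)]
    d = [dist(v, centre) for v in vecs]
    return [i for i, x in enumerate(d) if isclose(x, min(d))]

# Hypothetical example trees: t1 = ((A,B),(C,D)), t2 = (((A,B),C),D)
tips = ["A", "B", "C", "D"]
t1 = {"R": (None, 0.0), "X": ("R", 1.0), "Y": ("R", 1.0),
      "A": ("X", 1.0), "B": ("X", 1.0), "C": ("Y", 1.0), "D": ("Y", 1.0)}
t2 = {"R": (None, 0.0), "U": ("R", 1.0), "V": ("U", 1.0),
      "A": ("V", 1.0), "B": ("V", 1.0), "C": ("U", 2.0), "D": ("R", 3.0)}

print(tree_distance(t1, t2, tips, lam=0.0))  # topology only: 2.0
print(tree_distance(t1, t2, tips, lam=1.0))  # branch lengths only: 3.0
print(geometric_median([t1, t2, t2], tips))  # [1, 2]
```

At λ = 0 the two example trees differ in four of the six MRCA depths, giving a distance of 2. At λ = 1 the pendant edge lengths of C and D also contribute, giving 3.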