* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download PHYLOGENETIC NETWORKS
Gene expression profiling wikipedia , lookup
Microevolution wikipedia , lookup
Maximum parsimony (phylogenetics) wikipedia , lookup
Designer baby wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Helitron (biology) wikipedia , lookup
Metabolic network modelling wikipedia , lookup
Gene expression programming wikipedia , lookup
A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin Who’s Involved – UT CS: Tandy Warnow, Luay Nakhleh – UT BIO: Randy Linder – UNM CS: Bernard Moret Why Networks? • Lateral gene transfer (LGT) – Ochman estimated that 755 of 4,288 ORF’s in E.coli were from at least 234 LGT events • Hybridization – Estimates that as many as 30% of all plant lineages are the products of hybridization – Fish – Some frogs Phylogenetic Networks • Rooted, directed, acyclic graphs that actually model the evolutionary process • “tree” nodes and “network” nodes • Time constraints Separate Analysis • Analyze individual genes separately • Reconcile the resulting phylogenies • As opposed to combined analysis in which the datasets are combined (via concatenation) and the combined dataset is then analyzed SPR Distances Among Gene Trees A B C D E SPR Distance 1 A B C D E A B C D E Maddison’s Method Given two gene datasets • Construct two gene trees T1 and T2 • If SPR(T1,T2)=0 – Return a tree • If SPR(T1,T2)=1 – Return a network with one reticulation event Open problem: extend to reconstructing a network with m reticulation events Challenges (1) Computational – Computing SPR distances is of unknown computational complexity (probably hard) Solving the Computational Challenge • Galled-networks: reticulation events are independent • For two gene trees T1 and T2 on n leaves we can – Decide whether SPR(T1,T2)=m in O(mn) time, and – Construct network N from T1 and T2 in O(mn) time Challenges (2) Systematic – Obtaining the correct gene trees in practice is very hard (due to missing data, inaccuracy of tree reconstruction methods, wrong assumptions, etc.) Solving the Systematic Challenge: Our Method SpNet Given the sequences of two genes I & II on a set of species • Run MP or ML on gene I and obtain a set U1 of trees, represented by its consensus tree t1 • Run MP or ML on gene II and obtain a set U2 of trees, represented by its consensus tree t2 • Find binary trees T1 and T2, that refine t1 and t2, respectively, and such that SPR(T1,T2)=1 • Build network N from T1 and T2 SpNet: Running Time • We have a linear-time algorithm for the single hybrid case (implementation and experimental results are available as well) • We are working on the general case of arbitrary number of reticulation events Experimental Study • Generated random networks on 10 and 20 taxa, with 0, 1, and 2 hybrids • Evolved sequences under the GTR+Gamma model of evolution with invariant sites • We studies the topological accuracy based on the splits defined by the model and inferred network Evaluation Criteria • Detection Quality – How often did the method infer the correct number of hybrids in the model phylogeny? • Reconstruction Quality – What is the topological accuracy of the inferred phylogeny? Methods • SpNet(i): Our method where we contract i edges • NNet: The method of Bryant and Moulton • NJ Detection Quality of SpNet Model Phylogeny: 20-taxon Tree Detection Quality of SpNet Model Phylogeny: 20-taxon 1-hybrid network Detection Quality of SpNet Model Phylogeny: 20-taxon 2-hybrid network Reconstruction Quality Model Phylogeny: 20-taxon tree Reconstruction Quality Model Phylogeny: 20-taxon 1-hybrid network Reconstruction Quality Model Phylogeny: 20-taxon 1-hybrid network Conclusions • Considering a set of “good” trees rather than a single optimal tree is advantageous in network reconstruction • Separate analysis approaches outperform combined analysis approaches Ongoing research • Using other techniques for obtaining unresolved trees (e.g., Bayesian analyses, bootstrapping, etc.) • Detection vs. reconstruction – visualization and clustering techniques may also be useful (collaboration with St John) • Refining unresolved networks • DCM-like network reconstruction