TreeBASE and Phyloinformatics
Roderic Page
University of Glasgow

At the core of a ToL effort must be a “phyloinformatic infrastructure”
Tools for:
• data and tree storage
• analysis (supertrees, supermatrices)
• collaboration
• meta-analysis

It’s a scandal
• We cannot answer even the most basic question: “what is the phylogeny for group x?”
• GenBank is currently the best phylogenetic database(!)
• Can't even say how many species are in a given group
• Little idea of who is doing what

Tree of Life
tolweb.org
• Provides text and images
• Relies on extensive manual effort (e.g., writing text)
• Can’t do any computations with it
• Limited research value

TreeBASE
www.treebase.org
• Relational database
• Query by author, taxon, study number
• Compute supertrees
• Submit NEXUS data files

TreeBASE and mincut supertrees
• User selects two or more trees
• Clicks a button, and a script on darwin.zoology.gla.ac.uk is run to create the supertree
• Can view as PS, PDF, treefile, or in Java applet (ATV)

Dependencies amongst studies (Gatesy et al.)

What’s wrong with TreeBASE?
• No consistency of taxon names (e.g., Human, Homo sapiens, Homo sapiens X54666-1)
• No consistency of data names (e.g., gene names, morphological characters, etc.)

What needs to be done to TreeBASE?
• Consistency of taxon names
• Consistency of data names (e.g., gene names)

General issues
• Develop tools for rapid construction of supertrees and supermatrices
• Visualisation of trees (and other graphs)
• Queries to highlight areas of uncertainty
• Easy submission of rigorously annotated data
• Resolve centralised versus distributed (one database or many?)
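
The taxon-name problem listed under "What's wrong with TreeBASE?" can be illustrated with a small sketch. This is not how TreeBASE works; it only assumes a hypothetical synonym table (`COMMON_NAMES`) and an assumed pattern for trailing accession-like tokens (`ACCESSION_SUFFIX`), and shows how the three labels from the slide could be collapsed to one name.

```python
import re

# Hypothetical synonym table mapping common names to scientific names.
# A real system would draw on a curated species name database.
COMMON_NAMES = {"human": "Homo sapiens"}

# Assumed pattern for a trailing accession-like token such as "X54666-1".
ACCESSION_SUFFIX = re.compile(r"\s+[A-Z]{1,2}\d{5,6}(?:[-.]\d+)?$")


def canonical_taxon_name(label):
    """Reduce a free-text taxon label to a single canonical name."""
    # Drop a trailing accession-like suffix, if present.
    name = ACCESSION_SUFFIX.sub("", label.strip())
    # Collapse runs of whitespace.
    name = " ".join(name.split())
    # Replace a known common name with its scientific name.
    return COMMON_NAMES.get(name.lower(), name)


# The three labels quoted on the slide.
labels = ["Human", "Homo sapiens", "Homo sapiens X54666-1"]
canonical = {canonical_taxon_name(label) for label in labels}
print(canonical)
```

With labels reduced to one name per taxon, trees from different studies can be matched on shared taxa, which is a precondition for building supertrees across studies.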
The single most important thing we could do is to create a phyloinformatic infrastructure to support ToL studies (IMHO)

[Diagram: PII. Labels, in the order they appear:]
• Collections and Voucher Specimen Databases
• Species Name Databases
• Sequence Databases
• Primary Database
• Comparative Data
• Phylogenetic Trees
• Higher Taxon Name Database
• Secondary Databases
• Synthetic View of Tree of Life (shown more than once, with "....additional syntheses")
• Phylogenetically driven queries
• Biological Databases
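
As a rough illustration of the "phylogenetically driven queries" named in the diagram, and of the earlier complaint that we cannot say how many species are in a given group, the sketch below answers that question against a toy classification. The `CHILDREN` table and the function name are hypothetical and cover only a few extant taxa; a real infrastructure would query the name and tree databases instead.

```python
# Toy parent -> children table (illustrative only, extant taxa,
# not taken from any real database schema).
CHILDREN = {
    "Hominini": ["Homo", "Pan"],
    "Homo": ["Homo sapiens"],
    "Pan": ["Pan troglodytes", "Pan paniscus"],
}


def species_in_group(group):
    """Return the leaf names (species) below `group`, depth first."""
    children = CHILDREN.get(group)
    if not children:
        # A name with no children is treated as a species.
        return [group]
    species = []
    for child in children:
        species.extend(species_in_group(child))
    return species


for group in ("Pan", "Hominini"):
    names = species_in_group(group)
    print(group, len(names), names)
```

The same traversal works for any group once the tree and the taxon names are held in one consistent, queryable structure.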