Download RodPageKewTalk - Taxonomy and Systematics at Glasgow

TreeBASE and Phyloinformatics
Roderic Page
University of Glasgow
At the core of a ToL effort must be a “phyloinformatic infrastructure”
Tools for:
• data and tree storage
• analysis (supertrees, supermatrices)
• collaboration
• meta analysis
It’s a scandal
• We cannot answer even the most basic question:
“what is the phylogeny for group x?”
• GenBank is currently the best phylogenetic database(!)
• Can't even say how many species are in a given group
• Little idea of who is doing what
Tree of Life
tolweb.org
• Provides text and images
• Relies on extensive manual effort (e.g., writing text)
• Can’t do any computations with it
• Limited research value
TreeBASE
www.treebase.org
• Relational database
• Query by author, taxon, study number
• Compute supertrees
• Submit NEXUS data files
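To make the three query routes concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the study and tree identifiers, and the author names are all invented for illustration; they are assumptions and do not describe TreeBASE's actual schema or contents.

```python
import sqlite3

# Hypothetical schema, invented for this sketch (not TreeBASE's real tables).
SCHEMA = """
CREATE TABLE study (study_id TEXT PRIMARY KEY, citation TEXT);
CREATE TABLE author (study_id TEXT, name TEXT);
CREATE TABLE tree (tree_id TEXT PRIMARY KEY, study_id TEXT, newick TEXT);
CREATE TABLE tree_taxon (tree_id TEXT, taxon TEXT);
"""


def build_demo_db():
    """Create an in-memory database holding two made-up studies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO study VALUES (?, ?)",
                     [("S1", "Made-up study one"), ("S2", "Made-up study two")])
    conn.executemany("INSERT INTO author VALUES (?, ?)",
                     [("S1", "Author A"), ("S2", "Author A"), ("S2", "Author B")])
    conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", [
        ("T1", "S1", "((Homo sapiens,Pan troglodytes),Gorilla gorilla);"),
        ("T2", "S2", "((Mus musculus,Rattus norvegicus),Homo sapiens);"),
    ])
    conn.executemany("INSERT INTO tree_taxon VALUES (?, ?)", [
        ("T1", "Homo sapiens"), ("T1", "Pan troglodytes"), ("T1", "Gorilla gorilla"),
        ("T2", "Mus musculus"), ("T2", "Rattus norvegicus"), ("T2", "Homo sapiens"),
    ])
    return conn


def _ids(conn, sql, value):
    """Run a one-parameter query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql, (value,))]


def trees_by_study(conn, study_id):
    return _ids(conn, "SELECT tree_id FROM tree WHERE study_id = ? "
                      "ORDER BY tree_id", study_id)


def trees_by_taxon(conn, taxon):
    return _ids(conn, "SELECT DISTINCT tree_id FROM tree_taxon WHERE taxon = ? "
                      "ORDER BY tree_id", taxon)


def trees_by_author(conn, name):
    return _ids(conn, "SELECT DISTINCT t.tree_id FROM tree t "
                      "JOIN author a ON a.study_id = t.study_id "
                      "WHERE a.name = ? ORDER BY t.tree_id", name)
```

Note that the taxon query is an exact string match, so it only finds a tree if every study spells the taxon name the same way.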
TreeBASE and mincut supertrees
• User selects two or more trees
• Clicks a button, which runs a script on darwin.zoology.gla.ac.uk to create the supertree
• Can view as PS, PDF, treefile, or in Java applet (ATV)
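As a rough illustration of what such a script has to do, the sketch below combines rooted trees in the style of the BUILD recursion that mincut supertree methods extend. It handles only input trees that do not conflict; where they do conflict it stops, which is the point at which a mincut method would instead remove a minimum cut from the graph. The nested-tuple tree encoding and all function names are assumptions made for this example, and this is not the script that runs on darwin.zoology.gla.ac.uk.

```python
def leaves(tree):
    """Leaf set of a tree written as nested tuples of taxon-name strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree, keep):
    """Prune a tree to the taxa in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def components(taxa, trees):
    """Group taxa that share a subtree below the root of any input tree."""
    parent = {t: t for t in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for tree in trees:
        if isinstance(tree, str):
            continue
        for child in tree:
            group = sorted(leaves(child))
            for other in group[1:]:
                parent[find(other)] = find(group[0])
    groups = {}
    for t in taxa:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=min)


def build_supertree(trees):
    """Combine compatible rooted trees; raise ValueError if they conflict."""
    taxa = set().union(*(leaves(t) for t in trees))
    if len(taxa) == 1:
        return next(iter(taxa))
    if len(taxa) == 2:
        return tuple(sorted(taxa))
    comps = components(taxa, trees)
    if len(comps) == 1:
        # A mincut method would cut this connected graph; this sketch stops.
        raise ValueError("input trees conflict")
    subtrees = []
    for comp in comps:
        pruned = [r for r in (restrict(t, comp) for t in trees) if r is not None]
        subtrees.append(build_supertree(pruned))
    return tuple(subtrees)
```

For example, `build_supertree([(("a", "b"), "c"), (("b", "c"), "d")])` combines two three-taxon trees that overlap in b and c into one four-taxon tree.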
Dependencies amongst studies (Gatesy et al.)
What’s wrong with TreeBASE?
• No consistency of taxon names (e.g., Human, Homo sapiens, Homo sapiens X54666-1)
• No consistency of data names (e.g., gene names, morphological characters, etc.)
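The three labels quoted above all refer to the same species, and a small normalisation step shows what reconciling them involves. This is a minimal sketch: the synonym table and the accession-number pattern are assumptions made for illustration, and a real system would draw on a species name database rather than a hard-coded dictionary.

```python
import re

# Hypothetical synonym table (assumption); a real one would come from a
# species name database.
COMMON_NAMES = {"human": "Homo sapiens"}

# Assumed shape of a trailing sequence accession such as "X54666-1".
ACCESSION = re.compile(r"\s+[A-Z]{1,2}\d{5,6}(?:[-.]\d+)?$")


def normalise_taxon(label):
    """Map a tree label to a canonical taxon name where one is known."""
    # Treat underscores and runs of whitespace as single spaces.
    name = re.sub(r"[_\s]+", " ", label).strip()
    # Drop a trailing accession number, if the label ends with one.
    name = ACCESSION.sub("", name)
    # Replace a known common name; otherwise keep the cleaned label.
    return COMMON_NAMES.get(name.lower(), name)
```

With this in place, "Human", "Homo sapiens" and "Homo sapiens X54666-1" all map to "Homo sapiens", so trees from different studies can be matched on shared taxa.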
What needs to be done to TreeBASE?
• Consistency of taxon names
• Consistency of data names (e.g., gene names)
General issues
• Develop tools for rapid construction of supertrees and supermatrices
• Visualisation of trees (and other graphs)
• Queries to highlight areas of uncertainty
• Easy submission of rigorously annotated data
• Resolve centralised versus distributed (one database or many?)
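For the supermatrix half of the first bullet above, the core operation is concatenating per-gene alignments and padding taxa that lack a gene with a missing-data symbol. The sketch below assumes each alignment is a plain dictionary from taxon name to aligned sequence; the function name and data layout are illustrative choices, not an existing tool's interface.

```python
def build_supermatrix(alignments, missing="?"):
    """Concatenate per-gene alignments into one matrix.

    alignments: dict mapping gene name -> dict of taxon -> aligned sequence.
    Returns (matrix, partitions): matrix maps each taxon to its concatenated
    sequence, and partitions lists (gene, first_column, last_column), 1-based.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not one aligned length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa not sampled for this gene get a run of missing-data symbols.
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((gene, start, start + width - 1))
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The partition list records which columns came from which gene, so later analyses can still treat each gene separately.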
The single most important thing we could do is to create a phyloinformatic infrastructure to support ToL studies (IMHO)
[Diagram. Box labels: Collections and Voucher Specimen Databases; Species Name Databases; Sequence Databases; PII; Primary Database; Comparative Data; Phylogenetic Trees; Higher Taxon Name Database; Secondary Databases; Synthetic View of Tree of Life; ....additional syntheses; Phylogenetically driven queries; Biological Databases]