1. Develop Database Requirements to Yield Schema and Interfaces (near term)
2. MoBIoS: Database Management for Data in Metric Spaces
Daniel P. Miranker, Univ. of Texas

What we know for sure: Exploit Commodity Architecture
[Diagram labels: External Data/DB Sources, Curating New Content, Users, Web App Server, Computing Grid, DB Repository]

Schema and Interface Definitions
Issue:
• Database organization and data interchange should be addressed simultaneously
• Once established, difficult to change
Best to get this right the first time.

What we know for sure:
1. Data transfer: XML & Nexus files
2. Curate (manage quality)
[Diagram labels: Curating New Content, Users, Web App Server, Computing Grid, DB Schema]
Both 1 & 2 impact the schema (data provenance).

XML and Bioinformatics
• Taxonomic Markup Language (TML)
• PhyloML
• BEAST: Bayesian Evolutionary Analysis Sampling Trees
• AGAVE: Architecture for Genomic Annotation, Visualization and Exchange

Answers Start with a Requirements Analysis
• Who
• What
• Why
• How
"Use cases": specific examples of what is to be accomplished

A Head Start
Requirements of Phylogenetic Databases (with Nakhleh, Barbancon, Piel & Donoghue) [BIBE ’03]
• Did a requirements analysis
• Proof of concept for a correctly normalized database schema:
1 evolutionary (tree) edge = 1 row in the database

Who is interested in using Phylogenies?
• Casual Users
• Visualization
• Study Development
• Super-tree algorithms
• Simulation Studies
• Parameter Derivation
• Comparative Genomics

Super-Tree Algorithms Use-Cases
Construct phylogenies by assembling existing studies.
Collect those studies by:
• Determining the minimum spanning clade for a set of taxa
• Finding all phylogenies sufficiently similar to a given phylogeny
(Requirements of Phylogenetic Databases)

The MoBIoS Project
Molecular Biological Information System
Daniel P. Miranker, University of Texas

MoBIoS: A Simple Idea
Organize the Storage Manager Around Metric Space Indexing

Database type          Index structure        Dimensionality
Relational Databases   B+ trees               1 dimensional
Spatial Databases      R & K-D trees          2 & 3 dimensions
Metric Databases       VP, M & GNAT trees     No dimensions, or very high dimensions

Biological queries are conducted with sequential scans:
• Sequence (BLAST)
• Phylogenies (Tree of Life)
• Mass Spectra (Proteomics)
• Ligand Docking (Rational Drug Design)

A Metric Space is
• a pair, M = (D, d), where
• D is a set of points
• d is a [metric] distance function with the following properties:
– d(x, y) = d(y, x) (symmetry)
– d(x, y) > 0 for x ≠ y, d(x, x) = 0 (non-negativity)
– d(x, y) <= d(x, z) + d(z, y) (triangle inequality)

Can Biology Be Modeled by Metrics?
• Metrics already exist for:
– Phylogenetic trees
– Ligand docking
• First Biologically Effective Metric Model of Amino Acid Substitution [Xu & Miranker 03]
In effect, precisely the phylogenetic relationships among sequences are exploited to form a database index.
• Metrics for proteomic mass spectra are underway

MoBIoS Architecture (Molecular Biological Information System)
[Diagram; surviving label: phylogenies]

First Application (with Randy Linder)
Compared: {entire Arabidopsis genome} x {"entire" rice genome}
Goal: determine conserved pairs of primer pairs.
Done in O(m log n); the study will be repeated soon, faster.

When biological data is put into an RDBMS
• Primary data is stored in text or BLOB fields
– Annotations may be relational

Organism   Function   Sequence (BLOB)
Yeast      membrane   AACCGGTTT
Yeast      mitosis    TATCGAAA
E. Coli    membrane   AGGCCTA

• Data retrieval
– Filter DB, sequential dump, O(n), to utilities
• E.g. BLAST, TreeBASE, Sequest

Homework: Due tomorrow morning
1. Who are you (generically)?
2. A use case involving the database

Don’t know: A General Web Service
ToL Infrastructure @ SDSC
[Diagram labels: Curating New Content, Computing Grid, Web App Server, Computing Grid, DB Schema]
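
Sketch for the "Metric Space" slide. The three properties listed there are what let an index skip distance computations instead of scanning sequentially. The example below is only an illustration under stated assumptions: it uses Hamming distance on equal-length strings as a stand-in metric (not the amino acid substitution metric of [Xu & Miranker 03]) and a single-pivot filter, which is a simplification of what VP, M and GNAT trees do recursively. The names `hamming`, `is_metric` and `PivotIndex` are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count mismatched positions; a metric on equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_metric(points, d) -> bool:
    """Brute-force check of the metric properties on a finite sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) != d(y, x):            # symmetry
            return False
        if (d(x, y) == 0) != (x == y):    # zero distance only for identical points
            return False
        if d(x, y) < 0:                   # non-negativity
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):   # triangle inequality
            return False
    return True


class PivotIndex:
    """Single-pivot filter (assumed toy structure, not the MoBIoS index)."""

    def __init__(self, points, d, pivot):
        self.d = d
        self.pivot = pivot
        # Distances to the pivot are computed once, at build time.
        self.entries = [(p, d(p, pivot)) for p in points]

    def range_query(self, q, radius):
        """Return (hits, number of distance evaluations at query time)."""
        dq = self.d(q, self.pivot)
        hits, evaluated = [], 1
        for p, dp in self.entries:
            # Triangle inequality gives d(q, p) >= |d(q, pivot) - d(p, pivot)|,
            # so p cannot be within the radius if that lower bound exceeds it.
            if abs(dq - dp) > radius:
                continue
            evaluated += 1
            if self.d(q, p) <= radius:
                hits.append(p)
        return hits, evaluated


seqs = ["AACCGG", "AACCGT", "TTTTTT", "AACCTT", "GGGGGG"]
index = PivotIndex(seqs, hamming, pivot=seqs[0])
hits, evaluated = index.range_query("AACCGA", radius=1)
print(hits, evaluated)
```

The pruning step is only sound because the distance is a true metric; with a similarity score that violates the triangle inequality, the filter could discard real matches.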
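
Sketch for the "1 evolutionary (tree) edge = 1 row" schema and the "minimum spanning clade" use case from the super-tree slide. This is a minimal illustration, not the schema from the BIBE ’03 paper: the table name `edge`, its columns, the node labels and the helper functions are all assumptions, and SQLite stands in for whatever RDBMS would actually be used. The clade is found by walking ancestor paths with a recursive query and taking the deepest node shared by all taxa.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per tree edge (hypothetical layout): parent -> child within a tree.
conn.execute(
    "CREATE TABLE edge (tree_id INTEGER, parent TEXT, child TEXT, "
    "PRIMARY KEY (tree_id, child))"
)
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [  # toy tree, invented for illustration
        (1, "root", "n1"), (1, "root", "rice"),
        (1, "n1", "n2"), (1, "n1", "yeast"),
        (1, "n2", "human"), (1, "n2", "mouse"),
    ],
)

ANCESTORS = """
WITH RECURSIVE anc(node, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT e.parent, anc.depth + 1
    FROM edge e JOIN anc ON e.child = anc.node
    WHERE e.tree_id = ?
)
SELECT node FROM anc ORDER BY depth
"""

DESCENDANTS = """
WITH RECURSIVE sub(node) AS (
    SELECT ?
    UNION ALL
    SELECT e.child
    FROM edge e JOIN sub ON e.parent = sub.node
    WHERE e.tree_id = ?
)
SELECT node FROM sub
"""


def ancestors(conn, tree_id, node):
    """Path from node up to the root, starting with the node itself."""
    return [row[0] for row in conn.execute(ANCESTORS, (node, tree_id))]


def spanning_clade_root(conn, tree_id, taxa):
    """Deepest node whose subtree contains every taxon in taxa."""
    if not taxa:
        raise ValueError("need at least one taxon")
    paths = [ancestors(conn, tree_id, t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # paths[0] runs leaf-to-root, so the first shared node is the deepest one.
    return next(node for node in paths[0] if node in shared)


def clade(conn, tree_id, root):
    """All nodes in the subtree rooted at root (the root included)."""
    return [row[0] for row in conn.execute(DESCENDANTS, (root, tree_id))]


root = spanning_clade_root(conn, 1, ["human", "yeast"])
print(root, sorted(clade(conn, 1, root)))
```

Storing edges as rows keeps the tree queryable inside the database, rather than as an opaque text or BLOB field that must be dumped to an external utility.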