BODHI, A Bio-diversity Database Pla(n)tform
  Jayant Haritsa
  Database Systems Lab
  Supercomputer Education and Research Centre
  Indian Institute of Science

Team
  B. J. Srikanta (next talk)
  Prof. Madhav Gadgil
  Prof. V. Nanjundiah
  (Centre for Ecological Sciences, IISc)
  Several Masters Students
  Funded by DBT

Motivation
  GATT – Patent Laws
    To be in place by 2005
  Loss
    Neem
    Basmati (estimated export value: Rs. 1,198 crore)
    Turmeric
  Global and local efforts
    GBIF (Global Biodiversity Information Facility)
    Karnataka Bio-diversity Board [Deccan Herald - Aug 26 2000]

Bio-diversity Data
  Taxonomy of species
    Phenetic (physical) characteristics
    Phylogenetic (evolutionary) characteristics
  Habitat / Spatial distribution
    Political Layout
    Geographic Layout
    Biospheres
  Genetic information
    Bio-molecular sequences
    Structural information

MULTI-DOMAIN QUERY
  Retrieve all plant species that share a common habitat, have identical Inflorescence characteristics, and have a DNA sequence within BLAST score of 80, with respect to “Michelia-champa”.

Difficulties:
  Complex range of data types
    sets, hierarchies, aggregations, sequences, geometries, maps, audio, images …
  Multidimensional data
    spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates)
  Computationally-intensive operators
    species relationships, spatial distributions, sequence alignments, ...

Current Solutions
  Small-Scale
    MS-Access / FoxPro / Excel / ...
    Pentium PCs
  Large-Scale
    RDBMS: Oracle / DB2 / Informix / Sybase / …
    Unix servers: Sun / SGI / IBM / HP / ...

Limitations:
  RDBMS approach of “the world is a flat collection of tables with simple attributes” suits financial applications, NOT scientific (biological) applications
  In particular, taxonomic / spatial / sequence / multimedia data modeling and processing are very cumbersome and coarse

Limitations (contd)
  Spatial and other applications are not within the database kernel but are connected externally.
    E.g. Many GIS systems have ArcInfo and MS-Access hooked up in a “black-box” manner.
    Or, Blast/FASTA utilizing sequence files generated from Oracle.
  Problem: Slow and ugly!

Is there Hope?
  Object-Oriented DBMS
    “Natural” for biological applications
  High-performance data access methods
    Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ...
  High-performance specialized operators
    spatial join, data mining, sequence processing, …
  XML = HTML + Semantics

Goals of BODHI
  Seamless integration of taxonomic, spatial and genomic data using OO technology
  Latest access methods and operators for all three types of data
  Utilize XML for data exchange
  Low-cost (ideally, free!)

Architecture of BODHI
  The Internet
  Client Interface Framework
  Query Processor
  Spatial Operations / Object Operations / Genome Operations
  Spatial Indexes / Object Indexes / Genome Indexes
  Spatial Model / Taxonomy Model / Genome Model
  Spatial Services / Object Services / Sequence Services
  OBJECT STORAGE MANAGER

Implementation of BODHI
  The Internet
  Client Interface Framework
  λ-DB
  Overlaps, Contains, Closest, Within / Inheritance, Aggregation / Alignment BLAST, FASTA
  R*-tree, Hilbert-Rtree / Multi-Key Type, Path-Dictionary / ??? Indexes (next talk)
  Country, State, City, River, Road / Species, Genera, Family, Order / DNA, Protein
  Spatial Services / Object Services / Sequence Services
  Basic Types (Point, Line, Polygon, Sets, Sequences, ...)
  SHORE MICRO-KERNEL

Query Flow

Project Status
  Prototype (minus Client Interface Framework) is operational since last month!
  Platform: PIII-700MHz running Redhat Linux.
  For Code, contact “[email protected]”

Performance Evaluation
  SEQUOIA 2000 spatial benchmark: Competitive with Paradise GIS from Wisconsin
  Taxonomy + Spatial Queries: Reasonably fast
  But Genomics slows things down a lot due to absence of indexes (next talk)

More details
  “Design and Implementation of a Biodiversity Information System”, Proc. of Intl. Conf. on Management of Data (COMAD), Pune, December 2000
  “The Building of BODHI, A Bio-diversity Database System”, TechRep-2001-02, DSL/SERC, IISc
  Available at http://dsl.serc.iisc.ernet.in

End of Talk
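
Note on the multi-domain query
  The example query in the talk combines three predicates against “Michelia-champa”: a spatial one (shared habitat), a taxonomic one (identical inflorescence) and a genomic one (sequence similarity). A minimal Python sketch of that combination follows. It is not BODHI's implementation. The Species record, the similarity_score function (a toy positional match percentage standing in for BLAST), the reading of “within BLAST score of 80” as a score of at least 80, and every sample record other than the name “Michelia-champa” are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical record touching all three data domains."""
    name: str
    habitats: frozenset   # spatial domain: regions where the species occurs
    inflorescence: str    # taxonomic domain: one phenetic characteristic
    dna: str              # genomic domain: a DNA sequence


def similarity_score(a: str, b: str) -> int:
    """Toy stand-in for a BLAST score (NOT the BLAST algorithm):
    percentage of positions that match, measured over the longer sequence."""
    if not a or not b:
        return 0
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches // max(len(a), len(b))


def multi_domain_query(db, reference_name, min_score=80):
    """Names of species that share a habitat with the reference species,
    have the same inflorescence, and score at least min_score against it."""
    ref = next((s for s in db if s.name == reference_name), None)
    if ref is None:
        raise KeyError(reference_name)
    return [
        s.name
        for s in db
        if s.name != ref.name
        and s.habitats & ref.habitats                       # spatial predicate
        and s.inflorescence == ref.inflorescence            # taxonomic predicate
        and similarity_score(s.dna, ref.dna) >= min_score   # genomic predicate
    ]


# Made-up sample data; only the reference name comes from the talk.
DB = [
    Species("Michelia-champa", frozenset({"region-1", "region-2"}), "type-X", "ACGTACGTAC"),
    Species("species-A", frozenset({"region-1"}), "type-X", "ACGTACGTAA"),
    Species("species-B", frozenset({"region-3"}), "type-X", "ACGTACGTAC"),
    Species("species-C", frozenset({"region-1"}), "type-Y", "ACGTACGTAC"),
    Species("species-D", frozenset({"region-2"}), "type-X", "TTTTACGTAC"),
]

print(multi_domain_query(DB, "Michelia-champa"))  # ['species-A']
```

  The sketch scans every record in application code. The talk's stated goal is instead to have the spatial, object and sequence operators, together with their indexes, evaluated inside the database.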