BODHI, A Bio-diversity Database Pla(n)tform
Jayant Haritsa
Database Systems Lab
Supercomputer Education and Research Centre
Indian Institute of Science

Team
- B. J. Srikanta (next talk)
- Prof. Madhav Gadgil, Prof. V. Nanjundiah (Centre for Ecological Sciences, IISc)
- Several Masters Students
- Funded by DBT

Motivation
- GATT – Patent Laws
  - To be in place by 2005
- Loss
  - Neem
  - Basmati (estimated export value: Rs. 1,198 crore)
  - Turmeric
- Global and local efforts
  - GBIF (Global Biodiversity Information Facility)
  - Karnataka Bio-diversity Board [Deccan Herald - Aug 26 2000]

Bio-diversity Data
- Taxonomy of species
  - Phenetic (physical) characteristics
  - Phylogenetic (evolutionary) characteristics
- Habitat / Spatial distribution
  - Political Layout
  - Geographic Layout
  - Biospheres
- Genetic information
  - Bio-molecular sequences
  - Structural information

Multi-Domain Query
- Retrieve all plant species that share a common habitat, have identical Inflorescence characteristics, and have a DNA sequence within BLAST score of 80, with respect to “Michelia-champa”.

Difficulties
- Complex range of data types
  - sets, hierarchies, aggregations, sequences, geometries, maps, audio, images, ...
- Multidimensional data
  - spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates)
- Computationally-intensive operators
  - species relationships, spatial distributions, sequence alignments, ...

Current Solutions
- Small-Scale
  - MS-Access / FoxPro / Excel / ...
  - Pentium PCs
- Large-Scale
  - RDBMS: Oracle / DB2 / Informix / Sybase / ...
  - Unix servers: Sun / SGI / IBM / HP / ...
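
Sketch: the multi-domain query
The multi-domain query from the earlier slide combines one predicate from each data domain (spatial, taxonomic, genomic). A minimal in-memory sketch in Python follows. It is not BODHI code: the `Species` record, the sample data, and `similarity_score` are all hypothetical. In particular, `similarity_score` is a toy stand-in built on `difflib`, not a real BLAST score, and reading "within BLAST score of 80" as "score of at least 80" is an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Species:
    """Hypothetical record holding one attribute per data domain."""
    name: str
    habitats: frozenset   # spatial domain: names of regions where the species occurs
    inflorescence: str    # taxonomy domain: a single phenetic characteristic
    dna: str              # genome domain: a nucleotide sequence


def similarity_score(a: str, b: str) -> float:
    """Toy stand-in for a BLAST score, scaled to 0..100 (NOT real BLAST)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def multi_domain_query(target, catalogue, min_score=80.0):
    """Names of species that match the target in all three domains."""
    return [
        s.name
        for s in catalogue
        if s.name != target.name
        and s.habitats & target.habitats                       # shares a habitat
        and s.inflorescence == target.inflorescence            # identical inflorescence
        and similarity_score(s.dna, target.dna) >= min_score   # similar sequence
    ]


# Hypothetical sample data, for illustration only.
target = Species("Michelia-champa", frozenset({"Western Ghats"}), "solitary", "ACGTACGTAC")
catalogue = [
    target,
    Species("species-A", frozenset({"Western Ghats", "Nilgiris"}), "solitary", "ACGTACGTAA"),
    Species("species-B", frozenset({"Himalaya"}), "solitary", "ACGTACGTAC"),    # no shared habitat
    Species("species-C", frozenset({"Western Ghats"}), "raceme", "ACGTACGTAC"),  # other inflorescence
    Species("species-D", frozenset({"Western Ghats"}), "solitary", "TTTTTTTTTT"),  # dissimilar DNA
]

print(multi_domain_query(target, catalogue))
```

Each predicate here is a linear scan over the catalogue. The point of the slides that follow is that, at scale, each domain needs its own index and operator inside the database kernel rather than a scan like this one.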

Limitations
- RDBMS approach of “the world is a flat collection of tables with simple attributes” suits financial applications, NOT scientific (biological) applications
- In particular, taxonomic / spatial / sequence / multimedia data modeling and processing are very cumbersome and coarse

Limitations (contd)
- Spatial and other applications are not within the database kernel but are connected externally.
  - E.g. Many GIS systems have ArcInfo and MS-Access hooked up in a “black-box” manner.
  - Or, Blast/FASTA utilizing sequence files generated from Oracle.
- Problem: Slow and ugly!

Is there Hope?
- Object-Oriented DBMS
  - “Natural” for biological applications
- High-performance data access methods
  - Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ...
- High-performance specialized operators
  - spatial join, data mining, sequence processing, ...
- XML = HTML + Semantics

Goals of BODHI
- Seamless integration of taxonomic, spatial and genomic data using OO technology
- Latest access methods and operators for all three types of data
- Utilize XML for data exchange
- Low-cost (ideally, free!)

Architecture of BODHI (diagram labels, in the order they appear)
- The Internet
- Client Interface Framework
- Query Processor
- Spatial Operations | Object Operations | Genome Operations
- Spatial Indexes | Object Indexes | Genome Indexes
- Spatial Model | Taxonomy Model | Genome Model
- Spatial Services | Object Services | Sequence Services
- OBJECT STORAGE MANAGER

Implementation of BODHI (diagram labels, in the order they appear)
- The Internet
- Client Interface Framework
- –DB
- Overlaps, Contains, Closest, Within | Inheritance, Aggregation | Alignment, BLAST, FASTA
- R*-tree, Hilbert-Rtree | Multi-Key Type, Path-Dictionary | ??? Indexes (next talk)
- Country, State, City, River, Road | Species, Genera, Family, Order | DNA, Protein
- Spatial Services | Object Services | Sequence Services
- Basic Types (Point, Line, Polygon, Sets, Sequences, ...)
- SHORE MICRO-KERNEL

Query Flow
[figure]

Project Status
- Prototype (minus Client Interface Framework) is operational since last month!
- Platform: PIII-700MHz running Redhat Linux.
- For Code, contact “[email protected]”

Performance Evaluation
- SEQUOIA 2000 spatial benchmark: Competitive with Paradise GIS from Wisconsin
- Taxonomy + Spatial Queries: Reasonably fast
- But Genomics slows things down a lot due to absence of indexes (next talk)

More details
- “Design and Implementation of a Biodiversity Information System”, Proc. of Intl. Conf. On Management of Data (COMAD), Pune, December 2000
- “The Building of BODHI, A Bio-diversity Database System”, TechRep-2001-02, DSL/SERC, IISc
- Available at http://dsl.serc.iisc.ernet.in

End of Talk