DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENTS
Leon Guzenda
Chief Technology Officer

AGENDA
• Introduction
• Issues and Approaches
• Summary & Resources

DMW2004 Copyright Objectivity, Inc. 2004 3/16/04

Objectivity, Inc. & Objectivity/DB

Objectivity Corporate Information
Object Database Management for:
• Data intensive applications that manipulate complex data
• High throughput systems
• Very large volumes of data

Main Markets
• Government
• Scientific
• Telecommunications
• Engineering
• Manufacturing
• Complex IT

Product Highlights
• High Performance with complex data
• Scalability and High Availability
• Fully Distributed
• Interoperability
  - C++, Java, Smalltalk, SQL and XML
  - Linux, LynxOS, Unix and Windows
• Productivity
  - Eclipse IDE
  - Eliminates the object to DB mapping layer

SCALABILITY
• Data Volume - 890 Terabytes [BaBar]
• Throughput - Ingested 32 Terabytes per Day [Benchmark]
  In a recent benchmark with Objectivity/DB running on 64 Irix processors (600 MHz), CXFS and a 100 Terabyte SAN we achieved:
  - An ingest rate of 32 Terabytes per day (input, correlate and commit)
  - Simultaneous queries from 32 processors running at close to 100% CPU capacity
  - Simultaneous movement and deletion of aged data to a long term repository
• Simultaneous Users - 100s of Thousands [SprintPCS]

Issues and Approaches

ISSUES
• Describing complex data
• Exponentially increasing data volumes
• Sharing data across sites
• Querying huge datasets
• Cost of Ownership

DESCRIBING COMPLEX DATA
Approaches:
• Old Way
  - Definitions buried in header files
  - Language-specific schema language (DDL/SQL)
• Current Approaches
  - Unified Modeling Language [UML]
  - XML
• Trends
  - Java Data Objects [JDO]
  - Grid Database Access and Integration Services
  - Higher level schemas and ONTOLOGIES
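As an illustration of the trend toward higher level schemas, the sketch below describes a small piece of complex data directly as Python classes and reads the schema back from the class definitions, so there is no separate header file or DDL to keep in step. The Track and Event classes, their fields, and the describe helper are hypothetical names chosen for this example; this is not Objectivity/DB, JDO, or UML tooling.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Track:
    """Hypothetical leaf object: one reconstructed particle track."""
    momentum_gev: float
    charge: int


@dataclass
class Event:
    """Hypothetical container object that refers directly to its tracks."""
    run_number: int
    tracks: List[Track] = field(default_factory=list)


def describe(cls):
    """Return {field name: declared type} taken from the class definition."""
    return {f.name: f.type for f in fields(cls)}


# Build an object graph and navigate it by reference, with no mapping layer.
event = Event(run_number=1, tracks=[Track(1.5, +1), Track(0.7, -1)])
total_charge = sum(t.charge for t in event.tracks)

# The schema is recovered from the classes themselves.
event_schema = describe(Event)
```

Because the description lives in the class definitions, a change to Event is automatically reflected in what describe(Event) reports.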

DATA VOLUMES
Approaches:
• Old Way
  - Keep data in compressed files and index them in a DBMS
  - Proprietary tape archives
• Current Approaches
  - Store everything in an ODBMS (lower overheads than an RDBMS)
  - Hierarchical storage systems (HPSS etc.)
• Trends
  - Solid State Disks at the front end, commodity disks at the back end
  - Heterogeneous Storage Area Networks [SAN], e.g. CXFS
  - Fiber Optic processor-to-SAN switches
  - Grid enablement (totally distributed archives)

SHARING DATA ACROSS SITES
Approaches:
• Old Way
  - Transfer files/disks/tapes
  - Filesystem or no security
• Current Approaches
  - Distributed databases and the World Wide Web
  - High bandwidth networks
  - Authentication and secure transport layers
• Trends
  - Grid enablement
  - Federated databases
  - Ultra-high bandwidth networks and remote replication
  - Flexible, localized security mechanisms

Distributed Federations
[Diagram: Organization X (Users X1, X2, X3) and Organization Y (User Y1) sharing database A and its replicas A2 and A3.]

Distributed Federations
[Diagram: as above, with the added label "Mobile and Detached".]

QUERYING HUGE DATASETS
Approaches:
• Old Way
  - Hold metadata (indexes and relationships) in a searchable file
• Current Approaches
  - Hold metadata in an RDBMS and data in files
  - Hold metadata and data in an ODBMS
• Trends
  - Adaptations of text search engines
  - Distributed Parallel Query Engines
  - Specialized search accelerators
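The Distributed Parallel Query Engines trend can be sketched as follows: split the data into partitions, scan each partition concurrently with the same predicate, and merge the matches as they arrive. This is a minimal sketch using Python's standard thread pool; the partitions, the predicate, and the scan_partition and parallel_query names are assumptions for illustration and do not represent Objectivity's PQE.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_partition(partition, predicate):
    """Scan one partition and return the objects that satisfy the predicate."""
    return [obj for obj in partition if predicate(obj)]


def parallel_query(partitions, predicate, max_workers=4):
    """Run the same predicate over every partition concurrently and merge results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scan_partition, p, predicate) for p in partitions]
        # Collect matches in completion order rather than submission order.
        for future in as_completed(futures):
            results.extend(future.result())
    return results


# Hypothetical data: three partitions standing in for three data servers.
partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
matches = parallel_query(partitions, lambda x: x % 50 == 0)
```

The merged list arrives in completion order, so a caller that needs a stable order should sort it. A distributed engine would place the scans next to the data instead of in local threads, but the partition, scan, and merge shape is the same.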

Current Architecture
Queries run synchronously within the client
[Diagram: DBA Tools and APPLICATION over the Language Interfaces; Object & Schema Managers; Query & Index Managers; Storage & Transaction Managers; Networking & Event Managers; two Lock Servers; two Data “Page” Servers; Mass Storage.]

Parallel Query Engine [PQE]
Queries run asynchronously and in parallel, either locally or distributed
[Diagram: DBA Tools and APPLICATION over the Language Interfaces; Object & Schema Managers; Query & Index Managers; PQE; Storage & Transaction Managers; Networking & Event Managers; two Lock Servers; Data “Page” Servers.]

PQE and Search Accelerator
Queries run asynchronously and in parallel, but with Predicate Management within the Search Accelerator
[Diagram: DBA Tools and APPLICATION over the Language Interfaces; Object & Schema Managers; Query Manager; PQE; Search Accelerator (FPGA & RAM); Storage & Transaction Managers; Networking & Event Managers; two Lock Servers; Data Servers.]

COST OF OWNERSHIP
Approaches:
• Old Way
  - Build It Yourself (many hidden costs)
  - Run It Yourself
• Current Approaches
  - Use Commercial Off The Shelf [COTS] software
  - Open Source
  - Commodity hardware & tiered storage
• Trends
  - Heterogeneous storage
  - Grid Enablement
  - Resource and Skill Brokers (Future)

SUMMARY
• Database languages are still evolving
• Data throughput and system latency times are decreasing
• Sharing data across sites still presents many challenges
• Querying vast datasets will become faster and cheaper
• Software vendors are wrestling with Open Source issues
• Startup costs are still high, but the trends are downward
• Grid enablement will help
• Keep working on the Standards!

RESOURCES
• http://www.objectivity.com
  - Technical Overview
  - Data Sheets and White Papers
  - Free downloadable Java and C++ evaluation software and tutorials
• Global Grid Forum
  - http://www.ggf.org
• Email: [email protected]

ANY QUESTIONS?