O-O, What Are They Doing to Relational Databases?
(The Evolution of DB2 Universal Database)
Michael J. Carey
IBM Almaden
January 1999

Plan for Today's Presentation
- The relational DBMS revolution
- The object-relational DBMS evolution
- O-R features in DB2 Universal Database V5.2
- Some O-R implementation tradeoffs (V5.2)
- What lies ahead for DB2 UDB & O-R databases?
- Questions (and possibly answers)
- Please ask questions throughout...!

The Relational DBMS Revolution
- The pre-relational era (1970's)
  - Graph-based data models
    - Hierarchical (IMS), network (Codasyl)
  - Low-level, navigational interfaces
  - Labor-intensive and error-prone
- The relational era (1980's)
  - Simple, abstract data model
    - Database = set of relations ("tables")
    - 3 schema levels: views, base tables, physical schema
    - Algebra of set-oriented operations
  - High-level, declarative interfaces
    - SQL, QBE, et al
    - Embedded languages, 4GLs

The Relational Model (in one slide)
- Employees and departments

  Department
    dno  name
    10   Toy
    20   Shoe

  Employee
    eno  name   salary    dept
    1    Lou    10000000  10
    7    Laura  150000    20
    22   Mike   80000     20

  select E.name, E.salary, D.dno
  from Employee E, Department D
  where E.salary < 100000 and D.name = 'Shoe' and E.dept = D.dno;

Relational Databases: A Success Story
- The relational model has been a big success
  - Simplicity has made research tractable
  - Data independence yields productivity gains
  - Both academia and industry have benefitted
- Relational DBMS "goodies" include
  - Efficient query optimization and execution
  - Well-defined transaction semantics and support
  - Excellent multi-user performance and robustness
  - Views for data independence, authorization
  - Constraints, triggers, and stored procedures for (shared) business rule capture/enforcement
- "The" success story for parallel computing

We've Achieved Nirvana ... Right?
- The world is becoming increasingly complex
  - New data types are appearing (e.g., multimedia)
- Real-world data doesn't fit neatly into tables
  - Entities and relationships (vs. tables)
  - Variance among entities (vs. homogeneity)
  - Set-valued attributes (vs. normalization)
- Advanced applications bring complex data
  - E.g., CAD/CAM data management, web data management, geographic information management, medical data management, (your favorite application goes here)
- So maybe objects are the answer...?
  - Yes, if we can keep all the relational "goodies"!

The Object-Relational DBMS Evolution
- O-R extension #1: Abstract data types (ADTs)
  - New column types and functions/methods
  - E.g., text, image, audio, video, time series, point, line, OLE...
  - For modeling new kinds of facts about enterprise entities
  - Infrastructure for extenders/datablades/cartridges
- O-R extension #2: Row types
  - Types and functions/methods for rows of tables
  - Desirable features include references, inheritance, methods, late binding, and collection-valued attributes
  - For modeling enterprise entities with relationships & behavior
  - Infrastructure for DBMS-native object management
- Recent SQL3 merger: Structured types
  - Can use for types of columns and/or tables

"Not Your Father's Employee Type"
- Beyond name, rank, and serial number
  - New attribute types
    - Location (2-d point), job description (text), photo (image), ...
  - Associated functions
    - Distance(point, point), contains(text, string), ...
- Beyond your basic employee record
  - Employees come in different flavors
    - Emp, RSM, Programmer, Manager, Temp, ...
  - Employees have many known relationships
    - Manager, department, projects, ...
  - Employees have behavior
    - Age(Emp), qualified(Emp, Job), hire(Emp), ...
- An Employee is a simple "business object"

The OSF Project at IBM Almaden
- OSF stands for "Object Strike Force"
  - Semi-autonomous group "outside" UDB development
  - Focus: object-relational extensions for DB2 UDB
  - Both near-term and longer-term interests
  - Collaborate with our Toronto and Santa Teresa labs
- Significant activities to date
  - Prototyped "row type" support for DB2 UDB
  - Delivered in DB2 UDB Version 5.2 (9/98)
  - Significantly revised SQL3 draft standard
  - Working on next step plus future technologies

DB2 Universal Database, Version 5
- DB2 for Common Servers (Version 2)
  - User-defined column types (UDTs/distinct types)
  - User-defined functions (UDFs)
  - Binary/character large objects (BLOBs/CLOBs)
- Distinct types: new data types for columns
  - Ex: create distinct type US_Dollar as Real with comparisons;
  - US_Dollar is an available UDT with functions =, <>, <, <=, >, >=, US_Dollar(Real), Real(US_Dollar)
- User-defined functions: associated operations
  - create function CA_Tax (US_Dollar) returns US_Dollar external name 'money!US_Dollar' language C;

DB2 Universal Database, Version 5 (cont.)
- Lots of other interesting features as well, e.g.:
  - Constraints and triggers
  - Recursive queries
  - OLAP support (cube and rollup)
  - Extenders (based on UDTs/UDFs)
- Wide range of hardware/software platforms
  - PCs: Windows95, NT, OS/2, SCO
  - Unix workstations: AIX, Solaris, HP/UX
  - Parallel platforms: SMPs, MPPs (e.g., SP2)
- Descended from Almaden's Starburst system
  - Extensible query compiler (with rule-based query rewrite and query optimizer components)

New O-R Features in DB2 UDB V5.2
- Structured types and references
  - Named types with attributes, O-O subtyping model
  - Ref(T) for directly modelling relationships
- Typed tables and table hierarchies
  - Oid (user-provided) plus a column per attribute of T
  - Subtables for querying and managing subtype instances
- Query language extensions
  - Substitutability for queries/updates (data independence ++)
  - Path expressions for querying relationships easily
  - Functions/predicates for runtime type inquiries
- Object views (via a novel approach)
  - Virtual table hierarchies for flexible access control
  - Also facilitates O-O views of legacy tables

A Simple Example
- Employee and department tables in the (late) 90's
  [Figure: schema diagram with tables person, emp, exec, student, and dept, and references labeled dept and mgr]

Structured Types and References
- Create structured types (and references)
  create type Person_t as (name Varchar(40), birthyear Integer);
  create type Emp_t under Person_t as (salary Integer, dept Ref(Dept_t));
  create type Exec_t under Emp_t as (bonus Integer);
  create type Student_t under Person_t as (major Varchar(20));

Structured Types and References (cont.)
- Create structured types (cont.)
  create type Dept_t as (name Varchar(20), budget Integer, headcount Integer, mgr Ref(Emp_t));
- Okay, so I lied (a little) on the last slide...
  alter type Emp_t add attribute dept Ref(Dept_t);

Typed Tables and Table Hierarchies
- Now create typed tables (and subtables)
  create table person of Person_t (ref is oid user generated);
  create table emp of Emp_t under person (dept with options scope dept);
  create table exec of Exec_t under emp;
  create table student of Student_t under person;
  create table dept of Dept_t (ref is oid user generated, mgr with options scope emp);

SQL Query Extensions (by example)
- Substitutability
  select E.* from emp E where E.birthyear > 1970 and E.salary > 50000;
- Data modification (insert; update/delete)
  insert into emp values (Emp_t('e100'), 'John Smith', 1968, 65000, (select oid from dept where name = 'Database'));
  update person set birthyear = birthyear + 1 where name = 'John Smith';
- Path expressions
  select E.name, E.dept->name from emp E where E.dept->mgr->dept->mgr->name = 'Lou Gerstner';

Querying Table Hierarchies: An Example
  [Figure: sample rows of the Person, Emp, and Dept tables. Rows with columns oid, name, birthyear: (P1, Harold, 1970), (P2, Carol, 1958), (P3, Hamid, 1956), (P4, Lou, 1940). Dept row D1 is named "Databases".]
  select name, dept->name from Emp where birthyear < 1960
  select * from Person where name like 'H%'

SQL Query Extensions (cont.)
- Support for type-dependent queries
  select * from only (emp) E where dept->budget > 10000000;
  select name from person P where deref(oid) is of type (only Emp_t, Student_t);
  select type_name(deref(E.oid)), E.* from outer (emp) E where e.oid = Emp_t('e13');

Other Data Definition Features
- ref is for object id column
  - Unique, user-generated (on insert)
- scope clause for reference columns
  - Provides critical information to the query optimizer
- not null constraints
  - Definable at any level of a table hierarchy
  - Enforced for indicated table and its subtables
- unique constraints
  - Root level (and columns) only
- create index for physical schema
  - Unique or non-unique index on root table
  - Non-unique index on subtable

Other Data Definition Features (cont.)
- Authorization model for table hierarchies
  - grant and revoke on table or subtables
  - Substitutability: implicit subtable authorization on columns inherited from an authorized supertable
    - Ex #1: select privilege on person table
    - Ex #2: update privilege on salary column of emp table
  - Some operations require authorization everywhere
    - deref function
    - is of type predicate and type_xxx functions
- SQL3 also supports granting of table/subtable privileges with hierarchy option

Object Views in DB2 UDB
- Typed views and view hierarchies
  [Figure: view schema diagram with views vperson, vemp, vstudent, and vdept, and references labeled dept and mgr]
- Requirements: virtual table hierarchies
  - Typed rows with (derived) object ids
  - Views may be quite different from base data
  - Support for interconnected "view schemas"

Types For Object Views
- Create types for use in views
  create type VPerson_t as (name Varchar(40));
  create type VEmp_t under VPerson_t as (dept Ref(VDept_t));
  create type VStudent_t under VPerson_t as (kind Varchar(8));
  create type VDept_t as (name Varchar(20), mgr Ref(VEmp_t));

Typed View Hierarchies
- Now create typed views (and subviews)
  create view vperson of VPerson_t (ref is oid user generated) as
    select VPerson_t(Varchar(oid)), name from only (person);
  create view vemp of VEmp_t under vperson (dept with options scope vdept) as
    select VEmp_t(Varchar(oid)), name, VDept_t(Varchar(dept)) from emp where salary > 0;
  create view vstudent of VStudent_t under vperson as
    select VStudent_t(Varchar(oid)), name,
      case when major like '%Engineer%' then 'Geek' else 'non-Geek' end
    from student;
  create view vdept of VDept_t ...;

O-R Implementation Issues/Tradeoffs
- Some guiding principles for DB2 UDB V5.2
  - Performance must equal/exceed relational equivalents
  - Design amenable to future plans w.r.t. type evolution
  - Structured types must be supported in columns (someday)
  - Localize initial changes to query compiler where possible
  - Want "free" indexing, rewrites, optimization, parallelization, ...
- Influenced by discussions with a CAD/CAM vendor
  - Information on existing approach and installations
  - Requirements for efficiency of new products
- Let's look briefly at two areas
  - Table hierarchy representation
  - References and path query processing

Implementing Table Hierarchies
- Implementation table approach
  - One physical table per table hierarchy with:
    - Type tag column (to distinguish subtable rows)
    - Object id column
    - Columns for all columns of the root table and its subtables
- Vertical partitioning approach
  - One physical root table with:
    - Type tag column
    - Object id column
    - Columns for each root table column
  - N physical delta tables (one per subtable) with:
    - Object id column
    - Columns for each column introduced by this subtable

Implementing Table Hierarchies (cont.)
- Horizontal partitioning approach
  - N separate physical tables with:
    - Object id column
    - Columns for every subtable column (inherited or not)

So what did we do for UDB V5.2...?
- Vertical partitioning approach rejected quickly
  - Too many joins to materialize subtable rows
  - Multi-column constraints and indices problematic
- Horizontal partitioning approach rejected eventually
  - Uniqueness issue for user-generated object ids
  - Query complexity for multi-hierarchy join queries
    - Ex: select p.name, q.name from Person p, Person q where ...
- Implementation table approach taken for V5.2
  - Appeared to give us the most "free" functionality
  - Adopted despite row size (null columns) downside

References and Path Expressions
- Reference values in tables should have a scope
  - "Other end" info for query rewrite and join optimization
  - Ditto for authorization checking (static vs. dynamic)
  - Schema makes overly wide references unnecessary
  - Uniqueness is hierarchy-relative, enforced with an index
- V5.2 self-references (object ids) are user-generated
  - CAD/CAM vendor had "legacy references" in files
  - Different users have different id generation schemes
  - Loading cyclic data (e.g., emp <-> dept) is messy and slow
  - Ditto for creating objects from an object cache

References and Path Expressions (cont.)
- Path expressions are logically equivalent to subqueries
  select E.name, E.dept->name, E.dept->mgr->name from emp E where E.dept->headcount > 10;
- Actual approach: shared subquery generation (QGM)
  - Compute common paths (prefixes) once to save work
  - Not every SQL context accepts an actual subquery
  - Also need to handle non-serializable locking levels
- Efficiency obtained through query rewrite, e.g.:
  - Subquery to outer-join transformation
  - Outer-join to join transformation where possible

Where We Are Today in UDB
- V5.2 of UDB contains new O-R features
  - Structured types with inheritance
  - Object tables and table hierarchies
  - References and path expressions
  - Object views and view hierarchies
- Moreover, so does the SQL3 standard
  - Includes object views and user-defined references
  - IBM, Oracle, Informix heading in same general direction
- Work continuing on O-R extensions
  - Let's have a brief look...

Additional Object Table Support
- Business rule mechanisms for typed tables
  - Check constraints on tables/subtables (w/inheritance)
  - Referential integrity constraints to and from tables/subtables
  - Triggers on tables/subtables
- Object modeling and management extensions
  - User-defined reference types (ref using)
  - More flexible object view definitions
  - Type and instance (i.e., row) evolution
- Structured types for attributes/columns
  - Work in progress at IBM Santa Teresa Lab
  - Functions/methods just around the corner as well

Other Exploratory O-R Work
- Efficient support for collection types
  - Multivalued attributes (e.g., Project.team)
  - Flavors: set, multiset, array, list, ...
  - Need to integrate into SQL, support querying well
  - Some experience from a first prototype
- Other activities (and open problems)
  - Java mappings & bindings for O-R data
  - XML & data-centric web sites ("d-commerce")
  - Business object servers (caching/consistency)
  - Heterogeneous data & O-R database systems
  - User-defined and/or external indexing
  - Optimizer "hooks" for new data types
  - Etc.!

Partial List of UDB O-R Contributors
- Almaden Research Center
  - Mike Carey, Don Chamberlin, Srinivasa Narayanan, Bennet Vance; C.M. Park; Guido Moerkotte
- Santa Teresa Lab
  - Nelson Mattos
  - Gene Fuh, Michelle Jou, Brian Tran
- Toronto Lab
  - Doug Doole, Serge Rielau, Rick Swagerman
  - Leo Lao, Walid Rjaibi, Calisto Zuzarte
  - Cheryl Greene, various other consultants/hecklers
- And as for future versions of UDB
  - Your name could appear here! (MS/PhD)

The End

What About Object-Oriented DBMSs?
- OOPL + DBMS = OO-DBMS
  - Commonly based on C++ or Smalltalk
  - Persistence, collections, queries, versions, ...
- Lots of interesting and useful research results
  - O-O data models and query languages
  - O-O query processing, system architecture, performance
- Various products (O2, Objectstore, Versant, Objectivity, ...)
- No widespread commercial acceptance
  - Many differences across systems (despite ODMG-93)
  - Never really caught up to RDBMS technology
  - Schema compilation, evolution painful
  - Missing many of the relational "goodies"
  - Single-language focus, lack of (relational) tools

Stonebraker Fellow Criteria (found on web)
- Industrial database researcher
- PhD from UC Berkeley
- Must agree with the following motto:
  - Databases are the answer...!
  - What was the question again...?
- At least 6' tall
- Had a PhD thesis advisor with first name Mike
- Produced a PhD student with first name Mike
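
Editor's sketch (not from the original slides): the "implementation table" approach from the Implementing Table Hierarchies slides can be imitated in plain relational terms. The Python example below uses the standard sqlite3 module, not DB2, and everything in it is an assumption made for illustration: the physical table names person_impl and dept_impl, the helper functions, the assignment of the sample people to types, and their salary, bonus, major, budget, and headcount values. It shows one physical table with a type tag column serving substitutable scans, only(...) scans, and a path expression such as E.dept->name written as an outer join.

```python
import sqlite3

# Types stored under each typed table of the person hierarchy
# (the table's own type comes first). Hypothetical helper structure.
SUBTYPES = {
    "person": ("Person_t", "Emp_t", "Exec_t", "Student_t"),
    "emp": ("Emp_t", "Exec_t"),
    "exec": ("Exec_t",),
    "student": ("Student_t",),
}


def build_db():
    """Create one physical table per table hierarchy and load assumed sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table person_impl (
            type_tag  text not null,     -- distinguishes subtable rows
            oid       text primary key,  -- unique across the whole hierarchy
            name      text,
            birthyear integer,
            salary    integer,           -- Emp_t and below, null otherwise
            dept      text,              -- stands in for Ref(Dept_t)
            bonus     integer,           -- Exec_t only
            major     text               -- Student_t only
        );
        create table dept_impl (
            oid text primary key, name text, budget integer,
            headcount integer, mgr text
        );
        """
    )
    # Assumed data: names and birth years echo the slides; types and the
    # remaining values are invented for this sketch.
    conn.executemany(
        "insert into person_impl values (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("Person_t", "P1", "Harold", 1970, None, None, None, None),
            ("Emp_t", "P2", "Carol", 1958, 60000, "D1", None, None),
            ("Student_t", "P3", "Hamid", 1956, None, None, None, "Math"),
            ("Exec_t", "P4", "Lou", 1940, 900000, "D1", 50000, None),
        ],
    )
    conn.execute(
        "insert into dept_impl values ('D1', 'Databases', 10000000, 12, 'P4')"
    )
    return conn


def scan(conn, table, only=False, where="1 = 1", params=()):
    """Return oids of rows visible through `table`; substitutable unless only=True."""
    tags = SUBTYPES[table][:1] if only else SUBTYPES[table]
    marks = ", ".join("?" * len(tags))
    sql = (
        "select oid from person_impl "
        f"where type_tag in ({marks}) and ({where}) order by oid"
    )
    return [row[0] for row in conn.execute(sql, tags + tuple(params))]


def emp_dept_names(conn):
    """E.name, E.dept->name from emp E, written as a left outer join."""
    tags = SUBTYPES["emp"]
    marks = ", ".join("?" * len(tags))
    sql = (
        "select E.name, D.name from person_impl E "
        "left outer join dept_impl D on D.oid = E.dept "
        f"where E.type_tag in ({marks}) order by E.oid"
    )
    return conn.execute(sql, tags).fetchall()


conn = build_db()
print(scan(conn, "person", where="name like ?", params=("H%",)))
print(scan(conn, "emp"))
print(scan(conn, "emp", only=True))
print(emp_dept_names(conn))
```

Every row carries the columns of all subtables, so most rows hold nulls; this is the row-size downside the slides mention. In exchange, a query over any table in the hierarchy is a single-table scan with a type tag filter, and one primary key enforces hierarchy-wide oid uniqueness.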