Download presentation source

O-O, What Are They Doing to Relational Databases? (The Evolution of DB2 Universal Database) Michael J. Carey IBM Almaden January 1999 Plan for Today's Presentation The relational DBMS revolution The object-relational DBMS evolution O-R features in DB2 Universal Database V5.2 Some O-R implementation tradeoffs (V5.2) What lies ahead for DB2 UDB & O-R databases? Questions (and possibly answers) Please ask questions throughout...! The Relational DBMS Revolution The pre-relational era (1970's)  Graph-based data models   Hierarchical (IMS), network (Codasyl) Low-level, navigational interfaces  Labor-intensive and error-prone The relational era (1980's)  Simple, abstract data model Database = set of relations ("tables")  3 schema levels: views, base tables, physical schema  Algebra of set-oriented operations   High-level, declarative interfaces SQL, QBE, et al  Embedded languages, 4GLs  The Relational Model (in one slide) Employees and departments Department Employee ? dno 10 20 eno 1 7 22 name Toy Shoe name Lou Laura Mike salary 10000000 150000 80000 select E.name, E.salary, D.no from Employee E, Department D where E.salary < 100000 and D.name = 'Shoe' and E.dept = D.dno; dept 10 20 20 Relational Databases: A Success Story The relational model has been a big success Simplicity has made research tractable  Data independence yields productivity gains  Both academia and industry have benefitted  Relational DBMS "goodies" include Efficient query optimization and execution  Well-defined transaction semantics and support  Excellent multi-user performance and robustness  Views for data independence, authorization  Constraints, triggers, and stored procedures for (shared) business rule capture/enforcement  "The" success story for parallel computing  We've Achieved Nirvana ... Right? The world is becoming increasingly complex New data types are appearing (e.g., multimedia)  Real-world data doesn't fit neatly into tables  Entities and relationships (vs. tables)  Variance among entities (vs. homogeneity)  Set-valued attributes (vs. normalization)   Advanced applications bring complex data  E.g., CAD/CAM data management, web data management, geographic information management, medical data management, (your favorite application goes here) So maybe objects are the answer...?  Yes, if we can keep all the relational "goodies"! The Object-Relational DBMS Evolution O-R extension #1: Abstract data types (ADTs)  New column types and functions/methods E.g., text, image, audio, video, time series, point, line, OLE...  For modeling new kinds of facts about enterprise entities   Infrastructure for extenders/datablades/cartridges O-R extension #2: Row types  Types and functions/methods for rows of tables Desirable features include references, inheritance, methods, late binding, and collection-valued attributes  For modeling enterprise entities with relationships & behavior   Infrastructure for DBMS-native object management Recent SQL3 merger: Structured types  Can use for types of columns and/or tables "Not Your Father's Employee Type" Beyond name, rank, and serial number  New attribute types   Location (2-d point), job description (text), photo (image), ... Associated functions  Distance(point, point), contains(text, string), ... Beyond your basic employee record  Employees come in different flavors   Employees have many known relationships   Emp, RSM, Programmer, Manager, Temp, ... Manager, department, projects, ... Employees have behavior  Age(Emp), qualified(Emp, Job), hire(Emp), ... An Employee is a simple "business object" The OSF Project at IBM Almaden OSF stands for "Object Strike Force" Semi-autonomous group "outside" UDB development  Focus: object-relational extensions for DB2 UDB  Both near-term and longer-term interests  Collaborate with our Toronto and Santa Teresa labs  Significant activities to date Prototyped "row type" support for DB2 UDB  Delivered in DB2 UDB Version 5.2 (9/98)  Significantly revised SQL3 draft standard  Working on next step plus future technologies  DB2 Universal Database, Version 5 DB2 for Common Servers (Version 2) User-defined column types (UDTs/distinct types)  User-defined functions (UDFs)  Binary/character large objects (BLOBs/CLOBs)  Distinct types: new data types for columns Ex: create distinct type US_Dollar as Real with comparisons;  US_Dollar is an available UDT with functions =, <>, <, <=, >, >=, US_Dollar(Real), Real(US_Dollar)  User-defined functions: associated operations  create function CA_Tax (US_Dollar) returns US_Dollar external name 'money!US_Dollar' language C; DB2 Universal Database, Version 5 (cont.) Lots of other interesting features as well, e.g.: Constraints and triggers  Recursive queries  OLAP support (cube and rollup)  Extenders (based on UDTs/UDFs)  Wide range of hardware/software platforms PCs: Windows95, NT, OS/2, SCO  Unix workstations: AIX, Solaris, HP/UX  Parallel platforms: SMPs, MPPs (e.g., SP2)  Descended from Almaden's Starburst system  Extensible query compiler (with rule-based query rewrite and query optimizer components) New O-R Features in DB2 UDB V5.2 Structured types and references Named types with attributes, O-O subtyping model  Ref(T) for directly modelling relationships  Typed tables and table hierarchies Oid (user-provided) plus a column per attribute of T  Subtables for querying and managing subtype instances  Query language extensions Substitutability for queries/updates (data independence ++)  Path expressions for querying relationships easily  Functions/predicates for runtime type inquiries  Object views (via a novel approach) Virtual table hierarchies for flexible access control  Also facilitates O-O views of legacy tables  A Simple Example Employee and department tables in the (late) 90's person dept mgr emp dept exec student Structured Types and References Create structured types (and references) create type Person_t as ( name Varchar(40), birthyear Integer ); create type Emp_t under Person_t as ( salary Integer, dept Ref(Dept_t) ); create type Exec_t under Emp_t as ( bonus Integer ); create type Student_t under Person_t as ( major Varchar(20) ); Structured Types and References (cont.) Create structured types (cont). create type Dept_t as ( name Varchar(20), budget Integer, headcount Integer, mgr Ref(Emp_t) ); Okay, so I lied (a little) on the last slide... alter type Emp_t add attribute dept Ref(Dept_t); Typed Tables and Table Hierarchies Now create typed tables (and subtables) create table person of Person_t (ref is oid user generated); create table emp of Emp_t under person (dept with options scope dept); create table exec of Exec_t under emp; create table student of Student_t under person; create table dept of Dept_t (ref is oid user generated, mgr with options scope emp); SQL Query Extensions (by example) Substitutability select E.* from emp E where E.birthyear > 1970 and E.salary > 50000; Data modification (insert; update/delete) insert into emp values (Emp_t('e100'), 'John Smith', 1968, 65000, (select oid from dept where name = 'Database')); update person set birthyear = birthyear + 1 where name = 'John Smith'; Path expressions select E.name, E.dept->name from emp E where E.dept->mgr->dept->mgr->name = 'Lou Gerstner'; Querying Table Hierarchies: An Example Dept Person Emp name birthyear P1 Harold 1970 P2 Carol 1958 oid oid name birthyear P3 Hamid 1956 P4 Lou 1940 select name, dept->name from Emp where birthyear < 1960 oid D1 dept _ name ..... Databases ..... select * from Person where name like 'H%' SQL Query Extensions (cont.) Support for type-dependent queries select * from only (emp) E where dept->budget > 10000000; select name from person P where deref(oid) is of type (only Emp_t, Student_t); select type_name(deref(E.oid)), E.* from outer (emp) E where e.oid = Emp_t('e13'); Other Data Definition Features ref is for object id column  Unique, user-generated (on insert) scope clause for reference columns  Provides critical information to the query optimizer not null constraints Definable at any level of a table hierarchy  Enforced for indicated table and its subtables  unique constraints  Root level (and columns) only create index for physical schema Unique or non-unique index on root table  Non-unique index on subtable  Other Data Definition Features (cont.) Authorization model for table hierarchies grant and revoke on table or subtables  Substitutability: implicit subtable authorization on columns inherited from an authorized supertable  Ex #1: select privilege on person table  Ex #2: update privilege on salary column of emp table   Some operations require authorization everywhere deref function  is of type predicate and type_xxx functions   SQL3 also supports granting of table/subtable privileges with hierarchy option Object Views in DB2 UDB Typed views and view hierarchies vperson vdept mgr vemp vstudent dept Requirements: virtual table hierarchies Typed rows with (derived) object ids  Views may be quite different from base data  Support for interconnected "view schemas"  Types For Object Views Create types for use in views create type VPerson_t as ( name Varchar(40) ); create type VEmp_t under VPerson_t as ( dept Ref(VDept_t) ); create type VStudent_t under VPerson_t as ( kind Varchar(8) ); create type VDept_t as ( name Varchar(20), mgr Ref(VEmp_t) ); Typed View Hierarchies Now create typed views (and subviews) create view vperson of VPerson_t (ref is oid user generated) as select VPerson_t(Varchar(oid)), name from only (person); create view vemp of VEmp_t under vperson (dept with options scope vdept) as select VEmp_t(Varchar(oid)), name, VDept_t(Varchar(dept)) from emp where salary > 0; create view vstudent of VStudent_t under vperson as select VStudent_t(Varchar(oid)), name, case when major like '%Engineer%' then 'Geek' else 'non-Geek' end from student; create view vdept of VDept_t ...; O-R Implementation Issues/Tradeoffs Some guiding principles for DB2 UDB V5.2 Performance must equal/exceed relational equivalents  Design amenable to future plans w.r.t. type evolution  Structured types must be supported in columns (someday)  Localize initial changes to query compiler where possible   Want "free" indexing, rewrites, optimization, parallelization, ... Influenced by discussions with a CAD/CAM vendor Information on existing approach and installations  Requirements for efficiency of new products  Let's look briefly at two areas Table hierarchy representation  References and path query processing  Implementing Table Hierarchies Implementation table approach  One physical table per table hierarchy with: Type tag column (to distinguish subtable rows)  Object id column  Columns for all columns of the root table and its subtables  Vertical partitioning approach  One physical root table with: Type tag column  Object id column  Columns for each root table column   N physical delta tables (one per subtable) with: Object id column  Columns for each column introduced by this subtable  Implementing Table Hierarchies (cont.) Horizontal partitioning approach  N separate physical tables with: Object id column  Columns for every subtable column (inherited or not)  So what did we do for UDB V5.2...?  Vertical partitioning approach rejected quickly Too many joins to materialize subtable rows  Multi-column constraints and indices problematic   Horizontal partitioning approach rejected eventually Uniqueness issue for user-generated object ids  Query complexity for multi-hierarchy join queries    Ex: select p.name, q.name from Person p, Person q where ... Implementation table approach taken for V5.2 Appeared to give us the most "free" functionality  Adopted despite row size (null columns) downside  References and Path Expressions Reference values in tables should have a scope "Other end" info for query rewrite and join optimization  Ditto for authorization checking (static vs. dynamic)  Schema makes overly wide references unnecessary  Uniqueness is hierarchy-relative, enforced with an index  V5.2 self-references (object ids) are user-generated CAD/CAM vendor had "legacy references" in files  Different users have different id generation schemes  Loading cyclic data (e.g., emp<-> dept) is messy and slow  Ditto for creating objects from an object cache  References and Path Expressions (cont.) Path expressions are logically equivalent to subqueries select E.name, E.dept->name, E.dept->mgr->name from emp E where E.dept->headcount > 10; Actual approach: shared subquery generation (QGM) Compute common paths (prefixes) once to save work  Not every SQL context accepts an actual subquery  Also need to handle non-serializable locking levels  Efficiency obtained through query rewrite, e.g.: Subquery to outer-join transformation  Outer-join to join transformation where possible  Where We Are Today in UDB V5.2 of UDB contains new O-R features Structured types with inheritance  Object tables and table hierarchies  References and path expressions  Object views and view hierarchies  Moreover, so does the SQL3 standard Includes object views and user-defined references  IBM, Oracle, Informix heading in same general direction  Work continuing on O-R extensions  Let's have a brief look... Additional Object Table Support Business rule mechanisms for typed tables Check constraints on tables/subtables (w/inheritance)  Referential integrity constraints to and from tables/subtables  Triggers on tables/subtables  Object modeling and management extensions User-defined reference types (ref using)  More flexible object view definitions  Type and instance (i.e., row) evolution  Structured types for attributes/columns Work in progress at IBM Santa Teresa Lab  Functions/methods just around the corner as well  Other Exploratory O-R Work Efficient support for collection types Multivalued attributes (e.g., Project.team)  Flavors: set, multiset, array, list, ...  Need to integrate into SQL, support querying well  Some experience from a first prototype  Other activities (and open problems) Java mappings & bindings for O-R data  XML & data-centric web sites ("d-commerce")  Business object servers (caching/consistency)  Heterogeneous data & O-R database systems  User-defined and/or external indexing  Optimizer "hooks" for new data types  Etc.!  Partial List of UDB O-R Contributors Almaden Research Center  Mike Carey, Don Chamberlin, Srinivasa Narayanan, Bennet Vance; C.M. Park; Guido Moerkotte Santa Teresa Lab Nelson Mattos  Gene Fuh, Michelle Jou, Brian Tran  Toronto Lab Doug Doole, Serge Rielau, Rick Swagerman  Leo Lao, Walid Rjaibi, Calisto Zuzarte  Cheryl Greene, various other consultants/hecklers  And as for future versions of UDB  Your name could appear here! (MS/PhD) The End What About Object-Oriented DBMSs? OOPL + DBMS = OO-DBMS Commonly based on C++ or Smalltalk  Persistence, collections, queries, versions, ...  Lots of interesting and useful research results O-O data models and query languages  O-O query processing, system architecture, performance  Various products (O2, Objectstore, Versant, Objectivity, ...)  No widespread commercial acceptance Many differences across systems (despite ODMG-93)  Never really caught up to RDBMS techology  Schema compilation, evolution painful  Missing many of the relational "goodies"  Single-language focus, lack of (relational) tools  Stonebraker Fellow Criteria (found on web) Industrial database researcher PhD from UC Berkeley Must agree with the following motto: Databases are the answer...!  What was the question again...?  At least 6' tall Had a PhD thesis advisor with first name Mike Produced a PhD student with first name Mike

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download presentation source