Download create type - Berkeley Database Group

Object-Relational Database Systems: Evolution Beats Revolution Michael J. Carey IBM Almaden Research Center Plan for the Talk The relational DBMS revolution  Relational model and query language  Why relational succeeded  Why relational isn't enough, and some options The object-oriented DBMS revolution  Object-oriented model(s) and query language(s)  Why object-oriented "failed"  Why wrappers will fail as well The object-relational DBMS evolution  The object-relational model and query language  Current products and examples  Performance and other challenges The Relational DBMS Revolution The pre-relational era (1970's)  Graph-based data models Hierarchical model (e.g., IMS)  Network model (e.g., Codasyl)   Low-level, navigational interfaces  Labor-intensive and error-prone The relational era (1980's)  Simple, abstract data model Database = set of relations ("tables")  3 schema levels: views, base tables, physical schema  Algebra of set-oriented operations   High-level, declarative interfaces SQL, Quel, QBE  Embedded languages, 4GLs  The Relational Model (by example) Employees and departments Department Employee dno 10 20 eno 1 7 22 name Toy Shoe name Lou Laura Mike salary 10000000 150000 80000 ? select E.name, E.salary, D.no from Employee E, Department D where E.salary < 100000 and D.name = 'Shoe' and E.dept = D.dno dept 10 20 20 Relational DBMS "Goodies" Relational query processing  Queries range over tables and/or views  Programmers use a declarative language (SQL)  Query optimizer picks the lowest-cost query plan   Alternative access paths, join orders, join methods, and so on (based on indices and database characteristics) Result: data independence Support for (shared) business logic  Integrity constraints   Check constraints, referential integrity constraints Triggers, stored procedures, views, authorization Performance and robustness  Buffering, locking, crash recovery, replication, ... We've Achieved Nirvana ... Right? Relations are surely the answer!  Simple, high-level model for programmers  Easy to distribute data and parallelize queries But what was the question?  Sometimes difficult to model "real world" data Entities and relationships (versus tables)  Variance among entities (versus homogeneity)  Set-valued attributes (versus normalization)  Demanding new database applications  New applications bring new data types  Complex objects are problematic  "A relational database is like a garage which forces you to take your car apart and store the pieces in little drawers..." What are the Options? Throw in the towel  OOPL + your favorite file system Object-oriented DBMS  Tightly integrated: OOPL w/built-in DBMS Object-oriented client wrapper  Loosely integrated: OOPL + relational DBMS Object-relational DBMS  Newly integrated: Relational model + OO features Which solution is the "right" one...? Let's Examine the Problem Space Stonebraker's 4-quadrant model Complex Simple OO DBMS O-R DBMS File System Relational DBMS Queries Complex The Object-Oriented DBMS Revolution Motivated by new database applications, e.g.:  Computer-aided engineering  Document management  Geographic data management Engineering applications were early drivers  Complex data structures ("pointer spaghetti")  Navigational data access required  Tight coupling between applications and data  Version management support needed Approach: OOPL + DBMS = OO-DBMS  Commonly based on C++ or Smalltalk  Persistence, collections, versions, queries, ... No OO "Ted Codd" Stepped Forward Object-Oriented Database System Manifesto  Mandatory features Complex objects, identity, encapsulation  Inheritance w/substitutability and late binding  Computationally complete methods  Extensible type system, persistence  Secondary storage, concurrency and recovery  Ad hoc queries   Optional features Multiple inheritance, static type checking  Distribution, long transactions, versions   Individual choices Programming paradigm/language  Details and uniformity of object model  OO-DBMS Technology Today Lots of research results  Object data models and features  OO query languages and processing techniques  Client-server architectures and performance Significant commercial progress  Important and innovative systems   E.g., O2, ObjectStore, ODE Quite a few commercial product offerings  GemStone, Objectivity, ObjectStore, Ontos, O2, Matisse, Poet, Versant, others The ODMG-93 standard (release 2.0)  Consortium of OO-DBMS startups  Three key parts: ODL, OQL, C++ binding But the Revolution "Failed" ($0B) Lingering OO-DBMS differences  Query power, API details, implementation twists  Piecewise ODMG standard conformance (ex: OQL!) Still behind R-DBMSs in important ways  Codasyl-like schema compilation cycle  Schema evolution painful, if supported  Typically missing many useful "goodies" Support for multiple application languages  Query optimization, views, authorization, constraints, triggers, multi-user scalability and robustness, ...  Other factors (niche market)  SQL-based application building tools  Architecturally biased towards "fat clients" OO Client Wrappers are the Answer... Available from a number of vendors  Persistence Software, Ontologic, HP, Next, ... Language-specific relational wrappers  Proxy classes for C++ or Smalltalk (or Java)  Mapping of row data into language objects  Client-side (or middle-tier) object caching and method execution Why is this approach attractive?  Can develop OO applications today, against existing enterprise data, for "business objects" ...Not! Paradigm mismatch for querying  C++ or Smalltalk for simple business logic and navigation, against object-oriented schema  SQL for queries, against relational schema Choice forced for business logic & rules  Do on server, using DBMS facilities?   Check constraints, referential integrity constraints, triggers, stored procedures, authorization Do on client, using OO wrapper facilities?  C++ or Smalltalk (or Java) programming This had better be a stop-gap solution  R-DBMS could become a storage manager, throwing away 20+ years of successful R&D! The Object-Relational DBMS Evolution  Third Generation Database System Manifesto  Support rich object structures and rules Rich type system, inheritance, encapsulation  Functions, optional unique ids, rules/trigggers   Subsume second generation database systems High-level query-oriented interface  Stored and virtual collections  Updatable views  Data model/performance feature separation   Open to other subsystems (tools, middleware) Accessible from multiple languages  Layered persistence-oriented language bindings  SQL support ("intergalactic dataspeak")  Query-shipping architecture  "Not Your Father's Employee Type" Beyond name, rank, and serial number  Several new attribute types   Location (2-d point), job description (text), photo (image), ... Associated functions  Distance(point, point), contains(text, string), ... Beyond your basic employee record  Employees come in different flavors   Employees have many known relationships   Emp, RSM, Programmer, Manager, Temp, ... Manager, department, projects, ... Employees have behavior  Age(Emp), qualified(Emp, Job), hire(Emp), ... An Employee is a "business object" Two Flavors of O-R Object Extensions Object extension #1: Abstract data types (ADTs)  New column types and functions E.g.,text, image, audio, video, time series, point, line, OLE...  For modeling new kinds of facts about enterprise entities  Object extension #2: Row types  Types and functions for rows of tables Includes inheritance, references, set-valued attributes  For modeling business objects with relationships & behavior  Impact on schemas and query language: SQL3  Schemas: tables at the top, OO richness within  Queries: extensions to support the added richness  Structured types: support both ADT and row type object modeling needs (unified type system) ADTs (Black Box) To define and use a "black box ADT", a user will  Implement its internal structure and functions in an external programming language (e.g., C/C++, Java)  Use the DDL to register the type with the DBMS Size of an instance of the type  Input (constructor) and output functions  Other functions and operators, including signatures and linkable implementations  Costs and other properties for query optimizer   Use the new type like a built-in data type Now available for defining columns of tables  Functions and operators become available in queries  Example: Illustra Black Box ADT Point as a "black box ADT" (written in C) create type Point ( internallength = 16; input = point_in; output = point_out; ); -- typedef struct {double x, double y} point -- for reading in Point constants -- for displaying Point results create function point_in(Text) returns Point as external name 'MI_HOME/functions/point.so' language C; create function point_out(Point) returns Text as external name 'MI_HOME/functions/point.so' language C; Example: Illustra Black Box ADT (cont.) Now we can put an end to "Pointless" queries...! create function further_west(Point, Point) returns Boolean as external name 'MI_HOME/functions/pointfuns.so' language C; select E1.name, E1.location from Emp E1, Emp E2 where further_west(E1.location, E2.location) and E2.name = 'Mike'; create binary operator binding to further_west; select E1.name, E1.location from Emp E1, Emp E2 where E1.location >> E2.location and E2.name = 'Mike'; ADTs (White Box) To define and use a "white box ADT", a user will  Describe its internal structure using SQL3 DDL Attribute definitions are column-like  Advantages: heterogeneity, nulls, nesting, constraints, ...   Implement its functions either directly in SQL or in his/her favorite external programming language   Finish explaining the type to the DBMS using DDL   For query optimizer, as before Use the new type like a built-in data type   Utilize system-generated accessors and mutators In tables and queries, as before Note: this is just a SQL3 structured type definition that's primarily intended for use in columns Example: DB2 UDB/OSF White Box ADT Point as a "white box ADT" (written in SQL3) create type Point as ( x double, y double, ); create function distance(p1 Point, p2 Point) returns Point language SQL inline not variant return sqrt((p2..y-p1..y)*(p2..y-p1..y) + (p2..x-p1..x)*(p2..x-p1..x)); select E.name from Emp E, City C where C.name = 'San Jose' and distance(E.location, C.center) < 25; Of Extenders, Blades, and Cartridges High performance demands "deep" integration  Optimizer must know about an ADT operator's... Execution cost (especially for expensive functions)  Logical properties (e.g., transitivity, negator, ...)  Selectivity estimates (i.e., filtering/matching power)  Relationship to access methods (both old and new)   DBMS runtime must invoke functions efficiently  Static vs. dynamic loading, fenced vs. unfenced execution Partnerships and third-party packages  E.g., DB2's text, image, and spatial extenders  Package contains types, functions, access methods, optimizer information, and SQL DDL statements for all of the above Row Types To define and use a "row type", a user will  Create the desired structured type using SQL3 DDL   Create functions/methods involving the type   Columns, plus (optional) specification of a supertype Arguments of the new type, w/overloading in the case of methods Create one or more tables of the indicated type  Type hierarchy (if any) yields corresponding table hierarchies Type Hierarchy Person_t Emp_t Kid_t Table Hierarchy IBM_People IBM_Emps IBM_Kids Example: SQL3 Row Types (plus Sets...) Employees are people, so ... create type Person_t as( name Varchar(20), birthdate Date) method age( ) returns Integer language SQL; create method age( ) for Person_t return year(current date) - year(birthdate); create table IBM_People of Person_t (ref is self); (**Note: this is approximate SQL3 syntax) create type Emp_t under Person_t as ( salary Float, job_description Varchar(100), department ref(Dept), projects set(ref(Project) ); create table IBM_Emps of Emp_t under IBM_People (...); Queries Over Row Types SQL's query constructs, extended with the ability to access these features (a la SQL3 plus sets) User-defined functions in queries (w/late method binding)  Dereferencing of references (path expressions)  Queries over nested collections (table expressions)  For example, find unexplainable discrepencies between employees' and managers' salaries: select E.name, E.manager->name, display(E.photo) from IBM_Emps E where E.salary > E.department->manager->salary and E.department->manager->age( ) > E.self->age( ) and not contains(E.job_description, "Java") Other OR-Related Features Support for large objects  Multimedia data types aren't small (e.g., video)  Special handling required for efficiency   Minimal copying, piecewise retrieval, optional logging, movement to/from files, separate storage area from other attributes DB2 has blob, clob, and dbclob types (up to 2GB) Support for active data (triggers and constraints)  Ex: create trigger me_too after insert on IBM_Emps referencing new as newemp foreach row mode db2sql when salary > department->manager->salary begin atomic set newemp.department->manager->salary = newemp.salary; end OR-DBMS Technology Status Many OR-DBMS research results  Postgres, EXODUS, Starburst, ...  OODB query processing research Commercial systems exist today  IBM DB2 CS (V2.1) and CA-Ingres   Illustra, UniSQL/X   User-defined types & functions, large objects, triggers Early providers of ADTs, row objects, inheritance IBM DB2 UDB, Informix, Oracle  "Universal server" products contain subsets of all this stuff Standards right around the corner  SQL3 is "hardening" and has an object part with structured types, table hierachies, user-defined functions and methods, object views, .... Some OR-DBMS Performance Issues Bucky OR-DBMS benchmark from UW-Madison  Based on a hypothetical university schema  Exercised a range of OR-DBMS features Row types, inheritance, late binding, subtables  Queries involving path expressions and/or sets  ADTs (black or white box) and functions   In Proc. 1997 ACM SIGMOD Conference Tested a first-generation OR-DBMS product  OR versus relational simulation, same DB engine Showed benefits of (complex) ADTs, indexes on functions  Indicated areas where query optimization needs schema support: scope for path expressions, inverse relationships  Turned up bugs and performance problems (e.g., sets)  OR Enterprise Scenerio (w/Challenges) Object-relational server managing the database  ADTs w/inheritance and multi-language support  Row types, integrated with all of SQL (OO views, authorization, triggers, constraints, etc.) High-function, OO, caching front-ends  Support for desktop and middle-tier (web!) applications  OR object model at all levels, for queries and navigation  Clean bindings for OOPLs (Java, C++, Smalltalk)  Methods/queries running on client or server  Likewise for triggers and constraints Business rules specified & implemented once!  In SQL (+ OOPL), running where appropriate Multi-Tier Integration Challenges Good mappings and interfaces to provide object-relational objects to OOPLs Java, C++, Smalltalk, others  Full query support in addition to navigation  Challenges in querying and caching  Intelligent querying over cache + database  Correct and efficient caching of view objects Update-related challenges  Triggers and constraints of all types  View objects (both directions) Method execution on client or server  Java should be very useful here Legacy Data Access Challenges Some data will live outside the OR-DBMS  Older DBMSs (both relational & pre-relational)  Specialized data stores (documents, images, ...)  Applications (i.e., legacy transactions) Object-relational middleware is the answer!  Table functions can handle simple cases now  Distributed OR query engine (a la DataJoiner) can mediate between new applications and legacy data  Resulting appearance is that of an integrated OR database, accessible via SQL3 APIs and OO tools Front-End Integration + Legacy Data Access C++ Client Co-op Cache Java Client Co-op Cache Smalltalk Client Co-op Cache Co-op Interface Object-Relational Engine Query Object Wrappers OR-DBMS R-DBMS Image Mgr Text Mgr Conclusions Relational DBMS era: 1980's, early 1990's.  Significantly raised the levels of abstraction & productivity  Only "real" parallel computing success story to date, too! Object DBMS era: Should have been early 1990's...  Never made it out of the (mainstream) starting gate Object-relational DBMS era: You are there!  Object enhancements to relational DBMSs ADTs (white box, black box) and functions  Row types with inheritance, references, sets, ...   Vastly reduces the "impedence mismatch" w/OOPLs Today's OO wrappers are an interim solution  Possibilities abound for nice OO/OR tools   Will have OR middleware as well as engines

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download create type - Berkeley Database Group