* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download create type - Berkeley Database Group
Survey
Document related concepts
Microsoft Access wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Functional Database Model wikipedia , lookup
Clusterpoint wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Versant Object Database wikipedia , lookup
Transcript
Object-Relational Database Systems: Evolution Beats Revolution Michael J. Carey IBM Almaden Research Center Plan for the Talk The relational DBMS revolution Relational model and query language Why relational succeeded Why relational isn't enough, and some options The object-oriented DBMS revolution Object-oriented model(s) and query language(s) Why object-oriented "failed" Why wrappers will fail as well The object-relational DBMS evolution The object-relational model and query language Current products and examples Performance and other challenges The Relational DBMS Revolution The pre-relational era (1970's) Graph-based data models Hierarchical model (e.g., IMS) Network model (e.g., Codasyl) Low-level, navigational interfaces Labor-intensive and error-prone The relational era (1980's) Simple, abstract data model Database = set of relations ("tables") 3 schema levels: views, base tables, physical schema Algebra of set-oriented operations High-level, declarative interfaces SQL, Quel, QBE Embedded languages, 4GLs The Relational Model (by example) Employees and departments Department Employee dno 10 20 eno 1 7 22 name Toy Shoe name Lou Laura Mike salary 10000000 150000 80000 ? select E.name, E.salary, D.no from Employee E, Department D where E.salary < 100000 and D.name = 'Shoe' and E.dept = D.dno dept 10 20 20 Relational DBMS "Goodies" Relational query processing Queries range over tables and/or views Programmers use a declarative language (SQL) Query optimizer picks the lowest-cost query plan Alternative access paths, join orders, join methods, and so on (based on indices and database characteristics) Result: data independence Support for (shared) business logic Integrity constraints Check constraints, referential integrity constraints Triggers, stored procedures, views, authorization Performance and robustness Buffering, locking, crash recovery, replication, ... We've Achieved Nirvana ... Right? Relations are surely the answer! Simple, high-level model for programmers Easy to distribute data and parallelize queries But what was the question? Sometimes difficult to model "real world" data Entities and relationships (versus tables) Variance among entities (versus homogeneity) Set-valued attributes (versus normalization) Demanding new database applications New applications bring new data types Complex objects are problematic "A relational database is like a garage which forces you to take your car apart and store the pieces in little drawers..." What are the Options? Throw in the towel OOPL + your favorite file system Object-oriented DBMS Tightly integrated: OOPL w/built-in DBMS Object-oriented client wrapper Loosely integrated: OOPL + relational DBMS Object-relational DBMS Newly integrated: Relational model + OO features Which solution is the "right" one...? Let's Examine the Problem Space Stonebraker's 4-quadrant model Complex Simple OO DBMS O-R DBMS File System Relational DBMS Queries Complex The Object-Oriented DBMS Revolution Motivated by new database applications, e.g.: Computer-aided engineering Document management Geographic data management Engineering applications were early drivers Complex data structures ("pointer spaghetti") Navigational data access required Tight coupling between applications and data Version management support needed Approach: OOPL + DBMS = OO-DBMS Commonly based on C++ or Smalltalk Persistence, collections, versions, queries, ... No OO "Ted Codd" Stepped Forward Object-Oriented Database System Manifesto Mandatory features Complex objects, identity, encapsulation Inheritance w/substitutability and late binding Computationally complete methods Extensible type system, persistence Secondary storage, concurrency and recovery Ad hoc queries Optional features Multiple inheritance, static type checking Distribution, long transactions, versions Individual choices Programming paradigm/language Details and uniformity of object model OO-DBMS Technology Today Lots of research results Object data models and features OO query languages and processing techniques Client-server architectures and performance Significant commercial progress Important and innovative systems E.g., O2, ObjectStore, ODE Quite a few commercial product offerings GemStone, Objectivity, ObjectStore, Ontos, O2, Matisse, Poet, Versant, others The ODMG-93 standard (release 2.0) Consortium of OO-DBMS startups Three key parts: ODL, OQL, C++ binding But the Revolution "Failed" ($0B) Lingering OO-DBMS differences Query power, API details, implementation twists Piecewise ODMG standard conformance (ex: OQL!) Still behind R-DBMSs in important ways Codasyl-like schema compilation cycle Schema evolution painful, if supported Typically missing many useful "goodies" Support for multiple application languages Query optimization, views, authorization, constraints, triggers, multi-user scalability and robustness, ... Other factors (niche market) SQL-based application building tools Architecturally biased towards "fat clients" OO Client Wrappers are the Answer... Available from a number of vendors Persistence Software, Ontologic, HP, Next, ... Language-specific relational wrappers Proxy classes for C++ or Smalltalk (or Java) Mapping of row data into language objects Client-side (or middle-tier) object caching and method execution Why is this approach attractive? Can develop OO applications today, against existing enterprise data, for "business objects" ...Not! Paradigm mismatch for querying C++ or Smalltalk for simple business logic and navigation, against object-oriented schema SQL for queries, against relational schema Choice forced for business logic & rules Do on server, using DBMS facilities? Check constraints, referential integrity constraints, triggers, stored procedures, authorization Do on client, using OO wrapper facilities? C++ or Smalltalk (or Java) programming This had better be a stop-gap solution R-DBMS could become a storage manager, throwing away 20+ years of successful R&D! The Object-Relational DBMS Evolution Third Generation Database System Manifesto Support rich object structures and rules Rich type system, inheritance, encapsulation Functions, optional unique ids, rules/trigggers Subsume second generation database systems High-level query-oriented interface Stored and virtual collections Updatable views Data model/performance feature separation Open to other subsystems (tools, middleware) Accessible from multiple languages Layered persistence-oriented language bindings SQL support ("intergalactic dataspeak") Query-shipping architecture "Not Your Father's Employee Type" Beyond name, rank, and serial number Several new attribute types Location (2-d point), job description (text), photo (image), ... Associated functions Distance(point, point), contains(text, string), ... Beyond your basic employee record Employees come in different flavors Employees have many known relationships Emp, RSM, Programmer, Manager, Temp, ... Manager, department, projects, ... Employees have behavior Age(Emp), qualified(Emp, Job), hire(Emp), ... An Employee is a "business object" Two Flavors of O-R Object Extensions Object extension #1: Abstract data types (ADTs) New column types and functions E.g.,text, image, audio, video, time series, point, line, OLE... For modeling new kinds of facts about enterprise entities Object extension #2: Row types Types and functions for rows of tables Includes inheritance, references, set-valued attributes For modeling business objects with relationships & behavior Impact on schemas and query language: SQL3 Schemas: tables at the top, OO richness within Queries: extensions to support the added richness Structured types: support both ADT and row type object modeling needs (unified type system) ADTs (Black Box) To define and use a "black box ADT", a user will Implement its internal structure and functions in an external programming language (e.g., C/C++, Java) Use the DDL to register the type with the DBMS Size of an instance of the type Input (constructor) and output functions Other functions and operators, including signatures and linkable implementations Costs and other properties for query optimizer Use the new type like a built-in data type Now available for defining columns of tables Functions and operators become available in queries Example: Illustra Black Box ADT Point as a "black box ADT" (written in C) create type Point ( internallength = 16; input = point_in; output = point_out; ); -- typedef struct {double x, double y} point -- for reading in Point constants -- for displaying Point results create function point_in(Text) returns Point as external name 'MI_HOME/functions/point.so' language C; create function point_out(Point) returns Text as external name 'MI_HOME/functions/point.so' language C; Example: Illustra Black Box ADT (cont.) Now we can put an end to "Pointless" queries...! create function further_west(Point, Point) returns Boolean as external name 'MI_HOME/functions/pointfuns.so' language C; select E1.name, E1.location from Emp E1, Emp E2 where further_west(E1.location, E2.location) and E2.name = 'Mike'; create binary operator binding to further_west; select E1.name, E1.location from Emp E1, Emp E2 where E1.location >> E2.location and E2.name = 'Mike'; ADTs (White Box) To define and use a "white box ADT", a user will Describe its internal structure using SQL3 DDL Attribute definitions are column-like Advantages: heterogeneity, nulls, nesting, constraints, ... Implement its functions either directly in SQL or in his/her favorite external programming language Finish explaining the type to the DBMS using DDL For query optimizer, as before Use the new type like a built-in data type Utilize system-generated accessors and mutators In tables and queries, as before Note: this is just a SQL3 structured type definition that's primarily intended for use in columns Example: DB2 UDB/OSF White Box ADT Point as a "white box ADT" (written in SQL3) create type Point as ( x double, y double, ); create function distance(p1 Point, p2 Point) returns Point language SQL inline not variant return sqrt((p2..y-p1..y)*(p2..y-p1..y) + (p2..x-p1..x)*(p2..x-p1..x)); select E.name from Emp E, City C where C.name = 'San Jose' and distance(E.location, C.center) < 25; Of Extenders, Blades, and Cartridges High performance demands "deep" integration Optimizer must know about an ADT operator's... Execution cost (especially for expensive functions) Logical properties (e.g., transitivity, negator, ...) Selectivity estimates (i.e., filtering/matching power) Relationship to access methods (both old and new) DBMS runtime must invoke functions efficiently Static vs. dynamic loading, fenced vs. unfenced execution Partnerships and third-party packages E.g., DB2's text, image, and spatial extenders Package contains types, functions, access methods, optimizer information, and SQL DDL statements for all of the above Row Types To define and use a "row type", a user will Create the desired structured type using SQL3 DDL Create functions/methods involving the type Columns, plus (optional) specification of a supertype Arguments of the new type, w/overloading in the case of methods Create one or more tables of the indicated type Type hierarchy (if any) yields corresponding table hierarchies Type Hierarchy Person_t Emp_t Kid_t Table Hierarchy IBM_People IBM_Emps IBM_Kids Example: SQL3 Row Types (plus Sets...) Employees are people, so ... create type Person_t as( name Varchar(20), birthdate Date) method age( ) returns Integer language SQL; create method age( ) for Person_t return year(current date) - year(birthdate); create table IBM_People of Person_t (ref is self); (**Note: this is approximate SQL3 syntax) create type Emp_t under Person_t as ( salary Float, job_description Varchar(100), department ref(Dept), projects set(ref(Project) ); create table IBM_Emps of Emp_t under IBM_People (...); Queries Over Row Types SQL's query constructs, extended with the ability to access these features (a la SQL3 plus sets) User-defined functions in queries (w/late method binding) Dereferencing of references (path expressions) Queries over nested collections (table expressions) For example, find unexplainable discrepencies between employees' and managers' salaries: select E.name, E.manager->name, display(E.photo) from IBM_Emps E where E.salary > E.department->manager->salary and E.department->manager->age( ) > E.self->age( ) and not contains(E.job_description, "Java") Other OR-Related Features Support for large objects Multimedia data types aren't small (e.g., video) Special handling required for efficiency Minimal copying, piecewise retrieval, optional logging, movement to/from files, separate storage area from other attributes DB2 has blob, clob, and dbclob types (up to 2GB) Support for active data (triggers and constraints) Ex: create trigger me_too after insert on IBM_Emps referencing new as newemp foreach row mode db2sql when salary > department->manager->salary begin atomic set newemp.department->manager->salary = newemp.salary; end OR-DBMS Technology Status Many OR-DBMS research results Postgres, EXODUS, Starburst, ... OODB query processing research Commercial systems exist today IBM DB2 CS (V2.1) and CA-Ingres Illustra, UniSQL/X User-defined types & functions, large objects, triggers Early providers of ADTs, row objects, inheritance IBM DB2 UDB, Informix, Oracle "Universal server" products contain subsets of all this stuff Standards right around the corner SQL3 is "hardening" and has an object part with structured types, table hierachies, user-defined functions and methods, object views, .... Some OR-DBMS Performance Issues Bucky OR-DBMS benchmark from UW-Madison Based on a hypothetical university schema Exercised a range of OR-DBMS features Row types, inheritance, late binding, subtables Queries involving path expressions and/or sets ADTs (black or white box) and functions In Proc. 1997 ACM SIGMOD Conference Tested a first-generation OR-DBMS product OR versus relational simulation, same DB engine Showed benefits of (complex) ADTs, indexes on functions Indicated areas where query optimization needs schema support: scope for path expressions, inverse relationships Turned up bugs and performance problems (e.g., sets) OR Enterprise Scenerio (w/Challenges) Object-relational server managing the database ADTs w/inheritance and multi-language support Row types, integrated with all of SQL (OO views, authorization, triggers, constraints, etc.) High-function, OO, caching front-ends Support for desktop and middle-tier (web!) applications OR object model at all levels, for queries and navigation Clean bindings for OOPLs (Java, C++, Smalltalk) Methods/queries running on client or server Likewise for triggers and constraints Business rules specified & implemented once! In SQL (+ OOPL), running where appropriate Multi-Tier Integration Challenges Good mappings and interfaces to provide object-relational objects to OOPLs Java, C++, Smalltalk, others Full query support in addition to navigation Challenges in querying and caching Intelligent querying over cache + database Correct and efficient caching of view objects Update-related challenges Triggers and constraints of all types View objects (both directions) Method execution on client or server Java should be very useful here Legacy Data Access Challenges Some data will live outside the OR-DBMS Older DBMSs (both relational & pre-relational) Specialized data stores (documents, images, ...) Applications (i.e., legacy transactions) Object-relational middleware is the answer! Table functions can handle simple cases now Distributed OR query engine (a la DataJoiner) can mediate between new applications and legacy data Resulting appearance is that of an integrated OR database, accessible via SQL3 APIs and OO tools Front-End Integration + Legacy Data Access C++ Client Co-op Cache Java Client Co-op Cache Smalltalk Client Co-op Cache Co-op Interface Object-Relational Engine Query Object Wrappers OR-DBMS R-DBMS Image Mgr Text Mgr Conclusions Relational DBMS era: 1980's, early 1990's. Significantly raised the levels of abstraction & productivity Only "real" parallel computing success story to date, too! Object DBMS era: Should have been early 1990's... Never made it out of the (mainstream) starting gate Object-relational DBMS era: You are there! Object enhancements to relational DBMSs ADTs (white box, black box) and functions Row types with inheritance, references, sets, ... Vastly reduces the "impedence mismatch" w/OOPLs Today's OO wrappers are an interim solution Possibilities abound for nice OO/OR tools Will have OR middleware as well as engines