Object-Relational Database Systems:
Evolution Beats Revolution
Michael J. Carey
IBM Almaden Research Center
Plan for the Talk
The relational DBMS revolution
 Relational model and query language
 Why relational succeeded
 Why relational isn't enough, and some options
The object-oriented DBMS revolution
 Object-oriented model(s) and query language(s)
 Why object-oriented "failed"
 Why wrappers will fail as well
The object-relational DBMS evolution
 The object-relational model and query language
 Current products and examples
 Performance and other challenges
The Relational DBMS Revolution
The pre-relational era (1970's)
 Graph-based data models
Hierarchical model (e.g., IMS)
 Network model (e.g., Codasyl)
Low-level, navigational interfaces
Labor-intensive and error-prone
The relational era (1980's)
 Simple, abstract data model
Database = set of relations ("tables")
 3 schema levels: views, base tables, physical schema
 Algebra of set-oriented operations
High-level, declarative interfaces
SQL, Quel, QBE
 Embedded languages, 4GLs
The Relational Model (by example)
Employees and departments
Department
dno  name
10   Toy
20   Shoe
Employee
eno  name   salary    dept
1    Lou    10000000  10
7    Laura  150000    20
22   Mike   80000     20
select E.name, E.salary, D.dno
from Employee E, Department D
where E.salary < 100000
and D.name = 'Shoe'
and E.dept = D.dno
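The Employee/Department example above can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in relational engine (the choice of SQLite and the column types are assumptions, not part of the talk); the rows and the query are the ones on the slide.

```python
import sqlite3

# In-memory database holding the two relations from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Department (dno integer primary key, name text);
    create table Employee (eno integer primary key, name text,
                           salary integer,
                           dept integer references Department(dno));
    insert into Department values (10, 'Toy'), (20, 'Shoe');
    insert into Employee values (1, 'Lou', 10000000, 10),
                                (7, 'Laura', 150000, 20),
                                (22, 'Mike', 80000, 20);
""")

# Declarative query: the engine, not the programmer, picks the join strategy.
rows = conn.execute("""
    select E.name, E.salary, D.dno
    from Employee E, Department D
    where E.salary < 100000
      and D.name = 'Shoe'
      and E.dept = D.dno
""").fetchall()
print(rows)  # [('Mike', 80000, 20)]
```

Only Mike qualifies: he is the one Shoe-department employee earning under 100000.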
Relational DBMS "Goodies"
Relational query processing
 Queries range over tables and/or views
 Programmers use a declarative language (SQL)
 Query optimizer picks the lowest-cost query plan
Alternative access paths, join orders, join methods, and so
on (based on indices and database characteristics)
Result: data independence
Support for (shared) business logic
 Integrity constraints
Check constraints, referential integrity constraints
Triggers, stored procedures, views, authorization
Performance and robustness
 Buffering, locking, crash recovery, replication, ...
We've Achieved Nirvana ... Right?
Relations are surely the answer!
 Simple, high-level model for programmers
 Easy to distribute data and parallelize queries
But what was the question?
 Sometimes difficult to model "real world" data
Entities and relationships (versus tables)
 Variance among entities (versus homogeneity)
 Set-valued attributes (versus normalization)
Demanding new database applications
 New applications bring new data types
 Complex objects are problematic
"A relational database is like a garage which
forces you to take your car apart and store
the pieces in little drawers..."
What are the Options?
Throw in the towel
 OOPL + your favorite file system
Object-oriented DBMS
 Tightly integrated: OOPL w/built-in DBMS
Object-oriented client wrapper
 Loosely integrated: OOPL + relational DBMS
Object-relational DBMS
 Newly integrated: Relational model + OO features
Which solution is the "right" one...?
Let's Examine the Problem Space
Stonebraker's 4-quadrant model
                  Queries: simple    Queries: complex
Data: complex     OO DBMS            O-R DBMS
Data: simple      File System        Relational DBMS
The Object-Oriented DBMS Revolution
Motivated by new database applications, e.g.:
 Computer-aided engineering
 Document management
 Geographic data management
Engineering applications were early drivers
 Complex data structures ("pointer spaghetti")
 Navigational data access required
 Tight coupling between applications and data
 Version management support needed
Approach: OOPL + DBMS = OO-DBMS
 Commonly based on C++ or Smalltalk
 Persistence, collections, versions, queries, ...
No OO "Ted Codd" Stepped Forward
Object-Oriented Database System Manifesto
 Mandatory features
Complex objects, identity, encapsulation
 Inheritance w/substitutability and late binding
 Computationally complete methods
 Extensible type system, persistence
 Secondary storage, concurrency and recovery
 Ad hoc queries
Optional features
Multiple inheritance, static type checking
 Distribution, long transactions, versions
Individual choices
Programming paradigm/language
 Details and uniformity of object model
OO-DBMS Technology Today
Lots of research results
 Object data models and features
 OO query languages and processing techniques
 Client-server architectures and performance
Significant commercial progress
 Important and innovative systems
E.g., O2, ObjectStore, ODE
Quite a few commercial product offerings
GemStone, Objectivity, ObjectStore, Ontos, O2, Matisse,
Poet, Versant, others
The ODMG-93 standard (release 2.0)
 Consortium of OO-DBMS startups
 Three key parts: ODL, OQL, C++ binding
But the Revolution "Failed" ($0B)
Lingering OO-DBMS differences
 Query power, API details, implementation twists
 Piecewise ODMG standard conformance (ex: OQL!)
Still behind R-DBMSs in important ways
 Codasyl-like schema compilation cycle
 Schema evolution painful, if supported
 Typically missing many useful "goodies"
Support for multiple application languages
 Query optimization, views, authorization, constraints, triggers,
multi-user scalability and robustness, ...
Other factors (niche market)
 SQL-based application building tools
 Architecturally biased towards "fat clients"
OO Client Wrappers are the Answer...
Available from a number of vendors
 Persistence Software, Ontologic, HP, Next, ...
Language-specific relational wrappers
 Proxy classes for C++ or Smalltalk (or Java)
 Mapping of row data into language objects
 Client-side (or middle-tier) object caching and
method execution
Why is this approach attractive?
 Can develop OO applications today, against
existing enterprise data, for "business objects"
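What such a wrapper does can be sketched in a few lines. The example below is a hypothetical illustration only, not any vendor's API: the class names `Employee` and `EmployeeWrapper`, the table layout, and the use of sqlite3 as the relational back end are all assumptions. It shows the three ingredients listed above: a proxy class, row-to-object mapping, and a client-side object cache with client-side method execution.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Employee:
    """Language-level object that a relational row is mapped into."""
    eno: int
    name: str
    salary: int


class EmployeeWrapper:
    """Hypothetical wrapper: maps rows to objects and caches them by key."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}  # eno -> Employee (client-side object cache)

    def get(self, eno):
        # Serve from the cache when possible so object identity is preserved.
        if eno in self.cache:
            return self.cache[eno]
        row = self.conn.execute(
            "select eno, name, salary from Employee where eno = ?", (eno,)
        ).fetchone()
        if row is None:
            return None
        emp = Employee(*row)
        self.cache[eno] = emp
        return emp

    def save(self, emp):
        # Write the object's state back to its row.
        self.conn.execute(
            "update Employee set name = ?, salary = ? where eno = ?",
            (emp.name, emp.salary, emp.eno),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Employee (eno integer primary key, name text, salary integer)"
)
conn.execute("insert into Employee values (22, 'Mike', 80000)")

wrapper = EmployeeWrapper(conn)
mike = wrapper.get(22)
mike.salary += 5000      # business logic runs on the client
wrapper.save(mike)
same = wrapper.get(22)   # second lookup is answered from the cache
```

Note that the raise is applied in client code, outside the reach of any server-side constraint or trigger logic.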
...Not!
Paradigm mismatch for querying
 C++ or Smalltalk for simple business logic
and navigation, against object-oriented
schema
 SQL for queries, against relational schema
Choice forced for business logic & rules
 Do on server, using DBMS facilities?
Check constraints, referential integrity constraints,
triggers, stored procedures, authorization
Do on client, using OO wrapper facilities?
C++ or Smalltalk (or Java) programming
This had better be a stop-gap solution
 R-DBMS could become a storage manager,
throwing away 20+ years of successful R&D!
The Object-Relational DBMS Evolution
 Third Generation Database System Manifesto
 Support rich object structures and rules
Rich type system, inheritance, encapsulation
 Functions, optional unique ids, rules/triggers
Subsume second generation database systems
High-level query-oriented interface
 Stored and virtual collections
 Updatable views
 Data model/performance feature separation
Open to other subsystems (tools, middleware)
Accessible from multiple languages
 Layered persistence-oriented language bindings
 SQL support ("intergalactic dataspeak")
 Query-shipping architecture
"Not Your Father's Employee Type"
Beyond name, rank, and serial number
 Several new attribute types
Location (2-d point), job description (text), photo (image), ...
Associated functions
Distance(point, point), contains(text, string), ...
Beyond your basic employee record
 Employees come in different flavors
Emp, RSM, Programmer, Manager, Temp, ...
Employees have many known relationships
Manager, department, projects, ...
Employees have behavior
Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a "business object"
Two Flavors of O-R Object Extensions
Object extension #1: Abstract data types (ADTs)
 New column types and functions
E.g., text, image, audio, video, time series, point, line, OLE...
 For modeling new kinds of facts about enterprise entities
Object extension #2: Row types
 Types and functions for rows of tables
Includes inheritance, references, set-valued attributes
 For modeling business objects with relationships & behavior
Impact on schemas and query language: SQL3
 Schemas: tables at the top, OO richness within
 Queries: extensions to support the added richness
 Structured types: support both ADT and row type
object modeling needs (unified type system)
ADTs (Black Box)
To define and use a "black box ADT", a user will
 Implement its internal structure and functions in an
external programming language (e.g., C/C++, Java)
 Use the DDL to register the type with the DBMS
Size of an instance of the type
 Input (constructor) and output functions
 Other functions and operators, including signatures and
linkable implementations
 Costs and other properties for query optimizer
Use the new type like a built-in data type
Now available for defining columns of tables
 Functions and operators become available in queries
Example: Illustra Black Box ADT
Point as a "black box ADT" (written in C)
create type Point
(
internallength = 16, -- typedef struct { double x; double y; } point
input = point_in, -- for reading in Point constants
output = point_out -- for displaying Point results
);
create function point_in(Text) returns Point as
external name 'MI_HOME/functions/point.so'
language C;
create function point_out(Point) returns Text as
external name 'MI_HOME/functions/point.so'
language C;
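The same register-an-input-and-output-function pattern can be tried in miniature. The sketch below is an analogy, not Illustra: it uses the adapter/converter hooks of Python's sqlite3 module, where `point_out` turns a Point into its stored text form and `point_in` parses it back. The `"x;y"` text encoding and the sample coordinates are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def point_out(p):
    """Output function: Point -> text representation (assumed 'x;y' format)."""
    return f"{p.x};{p.y}"


def point_in(raw):
    """Input function: stored bytes -> Point."""
    x, y = map(float, raw.split(b";"))
    return Point(x, y)


# Register the type with the engine's driver, much like the DDL above.
sqlite3.register_adapter(Point, point_out)
sqlite3.register_converter("point", point_in)

# PARSE_DECLTYPES makes the driver apply converters based on declared column types.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("create table Emp (name text, location point)")
conn.execute("insert into Emp values (?, ?)", ("Mike", Point(-121.9, 37.3)))

loc = conn.execute("select location from Emp").fetchone()[0]
print(loc)  # Point(x=-121.9, y=37.3)
```

As with a black box ADT, the engine itself sees only an opaque value; everything about its structure lives in the registered functions.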
Example: Illustra Black Box ADT (cont.)
Now we can put an end to "Pointless" queries...!
create function further_west(Point, Point) returns Boolean as
external name 'MI_HOME/functions/pointfuns.so'
language C;
select E1.name, E1.location
from Emp E1, Emp E2
where further_west(E1.location, E2.location) and E2.name = 'Mike';
create binary operator binding >> to further_west;
select E1.name, E1.location
from Emp E1, Emp E2
where E1.location >> E2.location and E2.name = 'Mike';
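A user-defined function used inside a query predicate can be demonstrated with sqlite3's `create_function`. This is only a sketch under stated assumptions: SQLite has no Point type or operator binding, so the location is flattened into `loc_x`/`loc_y` columns, "further west" is taken to mean a smaller x coordinate, and the sample coordinates are made up.

```python
import sqlite3


def further_west(x1, x2):
    """Return 1 if x1 lies west of x2 (assumed: smaller x means further west)."""
    return 1 if x1 < x2 else 0


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it by name.
conn.create_function("further_west", 2, further_west)
conn.executescript("""
    create table Emp (name text, loc_x real, loc_y real);
    insert into Emp values ('Lou', -73.7, 41.1),
                           ('Laura', -122.4, 37.8),
                           ('Mike', -121.9, 37.3);
""")

# Who is located further west than Mike?
rows = conn.execute("""
    select E1.name
    from Emp E1, Emp E2
    where further_west(E1.loc_x, E2.loc_x) and E2.name = 'Mike'
""").fetchall()
print(rows)  # [('Laura',)]
```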
ADTs (White Box)
To define and use a "white box ADT", a user will
 Describe its internal structure using SQL3 DDL
Attribute definitions are column-like
 Advantages: heterogeneity, nulls, nesting, constraints, ...
Implement its functions either directly in SQL or in
his/her favorite external programming language
Finish explaining the type to the DBMS using DDL
For query optimizer, as before
Use the new type like a built-in data type
Utilize system-generated accessors and mutators
In tables and queries, as before
Note: this is just a SQL3 structured type definition that's
primarily intended for use in columns
Example: DB2 UDB/OSF White Box ADT
Point as a "white box ADT" (written in SQL3)
create type Point as
(
x double,
y double
);
create function distance(p1 Point, p2 Point) returns double
language SQL inline not variant
return sqrt((p2..y-p1..y)*(p2..y-p1..y) + (p2..x-p1..x)*(p2..x-p1..x));
select E.name
from Emp E, City C
where C.name = 'San Jose'
and distance(E.location, C.center) < 25;
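For comparison, here is the white box idea in plain Python: the type's attributes are declared openly and the function is written against them. The sample employees, the city center, and the coordinate units are assumptions chosen so the arithmetic is easy to check.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Attributes are visible, as in a white box ADT.
    x: float
    y: float


def distance(p1: Point, p2: Point) -> float:
    """Euclidean distance, mirroring the SQL function body above."""
    return math.sqrt((p2.y - p1.y) ** 2 + (p2.x - p1.x) ** 2)


# Illustrative data: (name, location) and (name, center).
emps = [("Mike", Point(3.0, 4.0)), ("Lou", Point(30.0, 40.0))]
cities = [("San Jose", Point(0.0, 0.0))]

# Same shape as the query: employees within 25 units of San Jose's center.
names = [
    e_name
    for e_name, location in emps
    for c_name, center in cities
    if c_name == "San Jose" and distance(location, center) < 25
]
print(names)  # ['Mike']
```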
Of Extenders, Blades, and Cartridges
High performance demands "deep" integration
 Optimizer must know about an ADT operator's...
Execution cost (especially for expensive functions)
 Logical properties (e.g., transitivity, negator, ...)
 Selectivity estimates (i.e., filtering/matching power)
 Relationship to access methods (both old and new)
DBMS runtime must invoke functions efficiently
Static vs. dynamic loading, fenced vs. unfenced execution
Partnerships and third-party packages
 E.g., DB2's text, image, and spatial extenders
 Package contains types, functions, access methods,
optimizer information, and SQL DDL statements for
all of the above
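Why the optimizer needs both cost and selectivity for ADT functions can be seen with a small calculation. The sketch below uses one commonly cited heuristic for independent predicates, ordering them by rank = (selectivity - 1) / cost; it is an illustration, not a description of any product's optimizer, and the cost and selectivity numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str
    cost: float         # assumed per-row evaluation cost
    selectivity: float  # assumed fraction of rows that pass (0..1)


def order_predicates(preds):
    """Sort by rank = (selectivity - 1) / cost, most negative first."""
    return sorted(preds, key=lambda p: (p.selectivity - 1.0) / p.cost)


def expected_cost(preds):
    """Expected per-row cost of applying the predicates in the given order."""
    total, surviving = 0.0, 1.0
    for p in preds:
        total += surviving * p.cost   # only surviving rows reach this predicate
        surviving *= p.selectivity
    return total


preds = [
    Predicate("contains(job_description, 'Java')", cost=100.0, selectivity=0.1),
    Predicate("salary < 100000", cost=1.0, selectivity=0.5),
]
best = order_predicates(preds)
print([p.name for p in best])
print(expected_cost(preds), expected_cost(best))
```

With these numbers, running the cheap salary test first roughly halves the expected cost, even though the text predicate filters more rows.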
Row Types
To define and use a "row type", a user will
 Create the desired structured type using SQL3 DDL
Columns, plus (optional) specification of a supertype
Create functions/methods involving the type
Arguments of the new type, w/overloading in the case of methods
Create one or more tables of the indicated type
Type hierarchy (if any) yields corresponding table hierarchies
Type Hierarchy    Table Hierarchy
Person_t          IBM_People
Emp_t             IBM_Emps
Kid_t             IBM_Kids
Example: SQL3 Row Types (plus Sets...)
Employees are people, so ...
create type Person_t as(
name Varchar(20),
birthdate Date)
method age( ) returns Integer language SQL;
create method age( ) for Person_t
return year(current date) - year(birthdate);
create table IBM_People of Person_t (ref is self);
(**Note: this is approximate SQL3 syntax)
create type Emp_t
under Person_t as (
salary Float,
job_description Varchar(100),
department ref(Dept),
projects set(ref(Project))
);
create table IBM_Emps of Emp_t
under IBM_People (...);
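The relationship between a type hierarchy and a table hierarchy can be mimicked with ordinary classes. In the sketch below the `Table` class is a hypothetical stand-in for a typed table, and it assumes that a scan of a supertable also returns the rows of its subtables; the sample rows and the fixed "today" date are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    name: str
    birthdate: date

    def age(self, today: date) -> int:
        # Mirrors the slide: year(current date) - year(birthdate).
        return today.year - self.birthdate.year


@dataclass
class Emp(Person):  # Emp_t under Person_t
    salary: float
    job_description: str


class Table:
    """Hypothetical typed table that may sit under a supertable."""

    def __init__(self, row_type, under=None):
        self.row_type = row_type
        self.rows = []
        self.subtables = []
        if under is not None:
            under.subtables.append(self)

    def insert(self, row):
        if type(row) is not self.row_type:
            raise TypeError("row does not match the table's row type")
        self.rows.append(row)

    def scan(self):
        # Own rows first, then the rows of every subtable (assumed semantics).
        yield from self.rows
        for sub in self.subtables:
            yield from sub.scan()


ibm_people = Table(Person)
ibm_emps = Table(Emp, under=ibm_people)
ibm_people.insert(Person("Pat", date(1990, 5, 1)))
ibm_emps.insert(Emp("Mike", date(1960, 1, 1), 80000.0, "DB research"))

today = date(1998, 1, 1)
people = [(p.name, p.age(today)) for p in ibm_people.scan()]
print(people)  # [('Pat', 8), ('Mike', 38)]
```

The inherited `age` method applies to every row returned, whichever table it was inserted into.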
Queries Over Row Types
SQL's query constructs, extended with the ability to
access these features (a la SQL3 plus sets)
User-defined functions in queries (w/late method binding)
 Dereferencing of references (path expressions)
 Queries over nested collections (table expressions)
For example, find unexplainable discrepancies
between employees' and managers' salaries:
select E.name, E.manager->name, display(E.photo)
from IBM_Emps E
where E.salary > E.department->manager->salary
and E.department->manager->age( ) > E.self->age( )
and not contains(E.job_description, 'Java')
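Path expressions behave much like following object references in a programming language. The sketch below restates the salary-discrepancy predicate over in-memory Python objects; the `Emp` and `Dept` classes, the stored `age` field (standing in for the `age()` method), and all sample data are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Emp:
    name: str
    salary: float
    age: int
    job_description: str
    department: Optional["Dept"] = None  # reference, like ref(Dept)


@dataclass
class Dept:
    name: str
    manager: Emp  # reference to the managing employee


laura = Emp("Laura", 150000, 50, "Manages the Shoe department")
shoe = Dept("Shoe", laura)
laura.department = shoe
mike = Emp("Mike", 200000, 30, "C++ guru", shoe)
lou = Emp("Lou", 300000, 28, "Java guru", shoe)
ibm_emps = [laura, mike, lou]

# E.department->manager->salary becomes e.department.manager.salary, and so on.
result = [
    (e.name, e.department.manager.name)
    for e in ibm_emps
    if e.salary > e.department.manager.salary
    and e.department.manager.age > e.age
    and "Java" not in e.job_description
]
print(result)  # [('Mike', 'Laura')]
```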
Other OR-Related Features
Support for large objects
 Multimedia data types aren't small (e.g., video)
 Special handling required for efficiency
Minimal copying, piecewise retrieval, optional logging, movement to/from
files, separate storage area from other attributes
DB2 has blob, clob, and dbclob types (up to 2GB)
Support for active data (triggers and constraints)
 Ex:
create trigger me_too
after insert on IBM_Emps
referencing new as newemp
for each row mode db2sql
when (newemp.salary > newemp.department->manager->salary)
begin atomic
set newemp.department->manager->salary
= newemp.salary;
end
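A purely relational approximation of the me_too trigger can be run in SQLite. This is a sketch under assumptions: SQLite has no references or path expressions, so `department->manager` is simulated with integer keys and subqueries, and the `Dept`/`Emp` layout and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Dept (dno integer primary key, manager integer);
    create table Emp (eno integer primary key, name text,
                      salary integer, dept integer);

    -- Whenever a new employee earns more than the department's manager,
    -- raise the manager's salary to match.
    create trigger me_too
    after insert on Emp
    for each row
    when new.salary > (select M.salary
                       from Emp M, Dept D
                       where D.dno = new.dept and M.eno = D.manager)
    begin
        update Emp set salary = new.salary
        where eno = (select manager from Dept where dno = new.dept);
    end;

    insert into Dept values (20, 7);
    insert into Emp values (7, 'Laura', 150000, 20);  -- the manager
    insert into Emp values (22, 'Mike', 200000, 20);  -- fires the trigger
""")

manager_salary = conn.execute(
    "select salary from Emp where eno = 7"
).fetchone()[0]
print(manager_salary)  # 200000
```

The two subqueries do the work that the single path expression does in the object-relational version.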
OR-DBMS Technology Status
Many OR-DBMS research results
 Postgres, EXODUS, Starburst, ...
 OODB query processing research
Commercial systems exist today
 IBM DB2 CS (V2.1) and CA-Ingres
User-defined types & functions, large objects, triggers
Illustra, UniSQL/X
Early providers of ADTs, row objects, inheritance
IBM DB2 UDB, Informix, Oracle
"Universal server" products contain subsets of all this stuff
Standards right around the corner
 SQL3 is "hardening" and has an object part with
structured types, table hierarchies, user-defined
functions and methods, object views, ...
Some OR-DBMS Performance Issues
Bucky OR-DBMS benchmark from UW-Madison
 Based on a hypothetical university schema
 Exercised a range of OR-DBMS features
Row types, inheritance, late binding, subtables
 Queries involving path expressions and/or sets
 ADTs (black or white box) and functions
In Proc. 1997 ACM SIGMOD Conference
Tested a first-generation OR-DBMS product
 OR versus relational simulation, same DB engine
Showed benefits of (complex) ADTs, indexes on functions
 Indicated areas where query optimization needs schema
support: scope for path expressions, inverse relationships
 Turned up bugs and performance problems (e.g., sets)
OR Enterprise Scenario (w/Challenges)
Object-relational server managing the database
 ADTs w/inheritance and multi-language support
 Row types, integrated with all of SQL (OO views,
authorization, triggers, constraints, etc.)
High-function, OO, caching front-ends
 Support for desktop and middle-tier (web!) applications
 OR object model at all levels, for queries and navigation
 Clean bindings for OOPLs (Java, C++, Smalltalk)
 Methods/queries running on client or server
 Likewise for triggers and constraints
Business rules specified & implemented once!
 In SQL (+ OOPL), running where appropriate
Multi-Tier Integration Challenges
Good mappings and interfaces to provide
object-relational objects to OOPLs
Java, C++, Smalltalk, others
 Full query support in addition to navigation
Challenges in querying and caching
 Intelligent querying over cache + database
 Correct and efficient caching of view objects
Update-related challenges
 Triggers and constraints of all types
 View objects (both directions)
Method execution on client or server
 Java should be very useful here
Legacy Data Access Challenges
Some data will live outside the OR-DBMS
 Older DBMSs (both relational & pre-relational)
 Specialized data stores (documents, images, ...)
 Applications (i.e., legacy transactions)
Object-relational middleware is the answer!
 Table functions can handle simple cases now
 Distributed OR query engine (a la DataJoiner) can
mediate between new applications and legacy data
 Resulting appearance is that of an integrated OR
database, accessible via SQL3 APIs and OO tools
Front-End Integration + Legacy Data Access
C++ Client
Co-op Cache
Java Client
Co-op Cache
Smalltalk Client
Co-op Cache
Co-op Interface
Object-Relational
Engine
Query
Object Wrappers
OR-DBMS
R-DBMS
Image Mgr
Text Mgr
Conclusions
Relational DBMS era: 1980's, early 1990's.
 Significantly raised the levels of abstraction & productivity
 Only "real" parallel computing success story to date, too!
Object DBMS era: Should have been early 1990's...
 Never made it out of the (mainstream) starting gate
Object-relational DBMS era: You are there!
 Object enhancements to relational DBMSs
ADTs (white box, black box) and functions
 Row types with inheritance, references, sets, ...
Vastly reduces the "impedance mismatch" w/OOPLs
Today's OO wrappers are an interim solution
 Possibilities abound for nice OO/OR tools
Will have OR middleware as well as engines