Mirror, Mirror on the Wall, What is the Best <XML> Database Solution of All?
Akmal B Chaudhri
Senior Architect
Informix Labs
Disclaimer

Any opinions expressed are mine and not
necessarily those of my employer
Copyright © 2001 Informix
Acknowledgements

All trademarks are acknowledged

Various people at Informix for some of
the presentation material

Robert Sutor, Lee Kheng Joo and Eve
Maler
Abstract

Managing XML documents is problematic when
document collections grow. How do we
successfully store and query document
collections? One solution is a database. We
will discuss the problem of integrating XML with
databases and examine choices, such as
relational databases, object databases, object-relational databases and native XML servers.
Speaker Biography

The speaker has been working in the
area of Object Databases for 10 years

He has previously worked for Reuters,
Logica and Computer Associates as well
as OODB research at City University
Life at Informix Labs

I build linear accelerators!

Technology Evangelist 

Technology Pragmatist 
Agenda

The Importance of XML

Architectural View of XML

Demonstrations
  - XML to ORDB using “roll-your-own”
  - XML to ORDB using Mapping Tool
The Importance of XML
Waves of Technology
eCommerce
Internet Web Computing
Client Server
Departmental Servers
Mainframes
The Importance of XML
By 2003, more than 75% of
ebusiness applications will include
XML, regardless of which language
the application has been written in.
Tasks/Roles Assumed by XML
Data transfer between applications and
systems
As middleware between an RDBMS and
an e-commerce front end
As a document repository, possibly
replacing SGML repositories
As a centralized database
Other
[Bar chart: percentage for each task/role listed above, axis from 0 to 60%; bar values not recoverable]
Source: [Walker00]
Vendor Market Share 1999
Vendor        Product    Revenue (US$ Million)   Market Share (%)
Sterling/CA   Vision     3                       26.8
SAG           Tamino     2.4                     21.4
Poet          CMS        2.1                     18.8
eXcelon       eXcelon    1.5                     13.4
All Others               2.2                     19.6
Source: [IDC00]
XML DBs Predicted Growth
[Chart: predicted XML database revenue in US$ Million (axis from 0 to 800) by year, 1999 to 2004; values not recoverable]
Source: [IDC00]
Architectural View of XML
Jump Gate Ready …
XML Persistence Options

Indexed File System
Database System
  - Relational
  - Object
  - Native
Dynamic Hashing Libraries
Hybrid
Source: [Edwards01]
XML Database Products
Type
  - Middleware
  - XML-Enabled DBs
  - Native XML DBs
  - XML Servers
  - XML App Servers
  - CMS
  - Persistent DOM
[Table: each type was marked as Data-Centric, Document-Centric or both; the check marks did not survive extraction]
Source: [Bourret00]
Data-Centric
Fine-grained data
Order of elements not significant
Examples
  - Sales Order
  - Flight Schedule
  - Restaurant Menu
  - …
Machine consumption
Source: [Bourret00]
Document-Centric
Large-grained data
Order of elements is significant
Examples
  - Book
  - Email
  - Advertisement
  - …
Human consumption
Source: [Bourret00]
Three Types of XML DBs

XML Generating Database

XML Document Database

XML Component Database
Source: [Chelsom00]
XML Generating Database

XML is generated from the database
[Diagram: database, XML Formatter, XML Document]
XML Document Database

Database stores complete XML
documents or document fragments
[Diagram: several XML Documents held in the database]
XML Component Database

Full XML awareness
[Diagram: an XML Document broken down into its nested <A> and <B> elements in the database]
XML Persistence Options

Database System
  - Relational
  - Object
  - Native
Object Databases
eXtensible Markup Language
Enterprise Java Beans
Java Language
World Wide Web
IEEE Computer, August 2000
Cattell vs. Stonebraker

Cattell
Object-oriented databases
are doing just fine, and the
news of their demise is highly
exaggerated.

Stonebraker
ODBMSs occupy a small
niche market that has no
broad appeal. The technology
is in semi-rigor mortis, …
Source: [Leavitt00]
DB Sales Revenue
            1999 (US$)      2001* (US$)
RDB/ORDB    11.1 Billion    15.6 Billion
OODB        211 Million     265 Million
*Predicted
Source: [IDC00]
XML and OO

XML is not OO
  - No inheritance
  - No encapsulation
  - No behaviour
  - ...
OODB is overkill for structured text
  - Some Content Management Systems are built on top of OODBs
XML Persistence Options

Database System
  - Relational
  - Object
  - Native
Native Databases
Many vendors developing “Native” XML databases
  - Documents needed in original form
  - Structural information is maintained
  - Storage, query and retrieval of structure and content
Good for point solutions
  - Support for non-XML data?
XML Persistence Options

Database System
  - Relational
  - Object
  - Native
Relational Databases

RDB products scale well

Traditional and semi-structured data can
co-exist and be used by multiple
applications

RDBs can process complex XML queries
on large databases within seconds
Source: [Florescu99]
Three Things We Need To Do

Get XML into Database (storage)

Get XML out of Database (retrieval)

Query XML (processing)
XML Storage
Sample document
  <?xml version='1.0'?>
  <ORDER id="abc123" date="27 Oct 1999">
    <PERSON age="50" gender="Male">
      <NAME>
        <FAMILY>Doe</FAMILY>
        <GIVEN>John</GIVEN>
      </NAME>
      <ADDRESS> ... </ADDRESS>
    </PERSON>
    <ITEM id="s1">Shirt</ITEM>
    <ITEM id="j2">Jacket</ITEM>
  </ORDER>
XML DataPort
BLOB/CLOB Storage
Multiple Relational Table Mapping
  - purchase_order
  - customer
  - items
Hierarchical Storage
  1.0 ORDER (id, date)
    1.1 PERSON (gender, age)
      1.1.1 NAME
        1.1.1.1 FAMILY
        1.1.1.2 GIVEN
      1.1.2 ADDRESS
    1.2 ITEM (id)
    1.3 ITEM (id)
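The node numbering used for hierarchical storage can be sketched in Java. This is only an illustration built on the standard JAXP DOM API, not the XML DataPort implementation; the class name HierarchicalIds and the method label are hypothetical. Each element receives its parent's identifier plus its position among its sibling elements, and the root is written as 1.0 as in the diagram.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: labels every element with a dotted path identifier.
class HierarchicalIds {

    // Returns lines such as "1.1.1 NAME" in document order.
    static List<String> label(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            out.add("1.0 " + root.getTagName()); // the root is shown as 1.0
            walk(root, "1", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Numbers the child elements 1, 2, 3, ... under the given prefix.
    private static void walk(Element parent, String prefix, List<String> out) {
        int position = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text and comment nodes
            }
            position++;
            String id = prefix + "." + position;
            out.add(id + " " + ((Element) n).getTagName());
            walk((Element) n, id, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<ORDER id='abc123'><PERSON><NAME><FAMILY>Doe</FAMILY>"
                + "<GIVEN>John</GIVEN></NAME><ADDRESS/></PERSON>"
                + "<ITEM id='s1'>Shirt</ITEM><ITEM id='j2'>Jacket</ITEM></ORDER>";
        label(xml).forEach(System.out::println);
    }
}
```

Storing one row per labelled node keeps both structure and content, at the cost of reassembling the document on retrieval.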
Choosing RDB Storage Model

If Relational schema already exists
  - Consider mapping to multiple tables
If no Relational schema exists
  - Consider BLOB/CLOB model
If documents needed in original form
  - Consider hierarchical model
  - Structural information is maintained
  - Storage, query and retrieval of structure and content
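As a rough illustration of the "mapping to multiple tables" option, the sketch below splits the sample ORDER document into rows for the purchase_order, customer and items tables. The table names come from the storage diagram; the column layout, the class name OrderShredder and its methods are assumptions made for this example. It only builds the rows, which an application could then bind to JDBC insert statements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical shredder: table names are from the slide, columns are assumed.
class OrderShredder {

    // Returns table name -> rows, each row being a list of column values.
    static Map<String, List<List<String>>> shred(String xml) {
        Element order;
        try {
            order = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        String orderId = order.getAttribute("id");
        Map<String, List<List<String>>> tables = new LinkedHashMap<>();

        // purchase_order(order_id, order_date)
        List<String> orderRow = Arrays.asList(orderId, order.getAttribute("date"));
        tables.put("purchase_order", Collections.singletonList(orderRow));

        // customer(order_id, family_name, given_name)
        List<String> customerRow =
                Arrays.asList(orderId, text(order, "FAMILY"), text(order, "GIVEN"));
        tables.put("customer", Collections.singletonList(customerRow));

        // items(order_id, item_id, description): one row per ITEM element
        List<List<String>> itemRows = new ArrayList<>();
        NodeList items = order.getElementsByTagName("ITEM");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            itemRows.add(Arrays.asList(orderId, item.getAttribute("id"), item.getTextContent()));
        }
        tables.put("items", itemRows);
        return tables;
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String text(Element scope, String tag) {
        NodeList found = scope.getElementsByTagName(tag);
        return found.getLength() == 0 ? "" : found.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<ORDER id=\"abc123\" date=\"27 Oct 1999\"><PERSON><NAME>"
                + "<FAMILY>Doe</FAMILY><GIVEN>John</GIVEN></NAME></PERSON>"
                + "<ITEM id=\"s1\">Shirt</ITEM><ITEM id=\"j2\">Jacket</ITEM></ORDER>";
        shred(xml).forEach((table, rows) -> System.out.println(table + " " + rows));
    }
}
```

Note that the markup itself is discarded: the original document can only be rebuilt if the mapping is reversible.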
XML Processing

XML is an SGML derivative
HTML is an SGML derivative
Therefore …
  - Tools used for HTML can be reworked for XML
  - DTDs/XML Schema
  - SELECT query results formatted as XML
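Formatting SELECT results as XML can be sketched as follows. This is a minimal example under stated assumptions: the class RowsToXml, its methods and the <row> wrapper element are hypothetical, and the rows are passed in as plain lists rather than read from a JDBC ResultSet so that the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical formatter: turns tabular query results into an XML string.
class RowsToXml {

    // Escapes the characters that are significant in element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps each row in a <row> element with one child element per column.
    static String format(String rootTag, List<String> columns, List<List<String>> rows) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootTag).append('>');
        for (List<String> row : rows) {
            xml.append("<row>");
            for (int i = 0; i < columns.size(); i++) {
                String tag = columns.get(i);
                xml.append('<').append(tag).append('>')
                   .append(escape(row.get(i)))
                   .append("</").append(tag).append('>');
            }
            xml.append("</row>");
        }
        xml.append("</").append(rootTag).append('>');
        return xml.toString();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("s1", "Shirt"),
                Arrays.asList("j2", "Jacket"));
        System.out.println(format("items", columns, rows));
    }
}
```

Column names are used directly as element names here, so they must already be valid XML names.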
XML Storage/Retrieval

Multiple Relational Tables
  - Roll-your-own
  - Mapping Tool
  - JDBC
BLOB/CLOB
  - Verity/Excalibur
Hierarchical Storage
BLOB/CLOB

BLOB storage for semi-structured data
  - This is the usual approach
Indexing is key to efficient query processing
  - Full-text indexing for semi-structured data
  - Advanced indexing for path queries
BLOB/CLOB
[Diagram: an XML Document stored as a single BLOB/CLOB value]
Indexing Example
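-- Table holding each XML document as a CLOB
create table docs (id serial, xml_doc clob);
-- Load a document from a file on the server (0 lets the serial column assign the id)
insert into docs values (0,
    FileToClob('d:\xml\order_abc123.xml', 'server'));
-- Text index on the CLOB column
create index idx1 on docs (xml_doc vts_clob_ops)
    using vts in sbspace;
-- Find documents in which the GIVEN element contains "John"
select * from docs
    where vts_contains(xml_doc, '(John) <IN> GIVEN');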
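
To make the meaning of the '(John) <IN> GIVEN' condition concrete, here is a small Java sketch that answers the same question with the standard javax.xml.xpath API. It scans every document in turn, so it shows what the query asks for, not how a text index evaluates it; the class ZoneSearch and the method matching are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical linear scan: which documents have <element> containing word?
class ZoneSearch {

    // Returns the positions of the matching documents in the list.
    static List<Integer> matching(List<String> docs, String word, String element) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Assumes word contains no quote characters; real code must handle them.
        String expr = "boolean(//" + element + "[contains(., '" + word + "')])";
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            InputSource source = new InputSource(new StringReader(docs.get(i)));
            try {
                if ((Boolean) xpath.evaluate(expr, source, XPathConstants.BOOLEAN)) {
                    hits.add(i);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("bad query or document " + i, e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "<ORDER><NAME><FAMILY>Doe</FAMILY><GIVEN>John</GIVEN></NAME></ORDER>",
                "<ORDER><NAME><FAMILY>John</FAMILY><GIVEN>Mary</GIVEN></NAME></ORDER>");
        // Only the first document has "John" inside GIVEN.
        System.out.println(matching(docs, "John", "GIVEN"));
    }
}
```

A plain full-text search for "John" would return both documents; restricting the search to an element is what makes this a path query.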
XML Storage/Retrieval

Multiple Relational Tables
  - Roll-your-own
  - Mapping Tool
  - JDBC
BLOB/CLOB
  - Verity/Excalibur
Hierarchical Storage
JAXP Overview

Java API for XML Parsing (JAXP) is currently available for programmatically accessing XML documents
JAXP can be divided into three sets
  - Simple API for XML (SAX)
  - Document Object Model (DOM)
  - Pluggability Layer
JAXP Glossary



SAX - event-driven protocol, with the
programmer providing callback methods that
the parser invokes when parsing a document
DOM - random-access protocol, which
converts an XML document into a collection of
in-memory objects
Pluggability Layer - standardizes access to
SAX/DOM by providing “Factory” methods for
creating and configuring SAX parsers and
creating DOM objects (type “Document”)
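
The event-driven style of SAX can be illustrated with a short sketch that uses the JAXP factory to obtain a parser and records each callback as it fires. The class SaxTrace and the method events are hypothetical names for this example; the factory, parser and handler classes are the standard ones from javax.xml.parsers and org.xml.sax.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: logs the callbacks a SAX parser makes for a document.
class SaxTrace {

    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several calls.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        try {
            // The factory chooses the parser implementation.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<NAME><GIVEN>John</GIVEN></NAME>").forEach(System.out::println);
    }
}
```

Nothing is kept in memory beyond what the handler chooses to record, which is the main contrast with DOM.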
XML in JDBC 2.20

We would like to support users who use JAXP
in their JDBC applications without putting code
that is specifically related to JDBC in the driver

New static methods to facilitate storage and
retrieval of XML data in database columns

These methods not only support users of XML
but also provide flexibility regarding which
JAXP package the user is using
Storing XML Data

The methods used during data storage will assist in
  - Parsing the XML data
  - Verifying that well-formed and/or valid XML data are stored
  - Rejecting invalid XML data
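The idea of parsing before storing can be sketched with the standard JAXP SAX parser: the text is parsed with a do-nothing handler, and a parse error means the data should be rejected. This is an illustration of the principle only, not the driver's own code; the class WellFormedCheck and the method isWellFormed are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-insert check: is the text well-formed XML?
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores content and rethrows fatal parse errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the text
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<ORDER><ITEM/></ORDER>"));
        System.out.println(isWellFormed("<ORDER><ITEM></ORDER>"));
    }
}
```

An application would call such a check before binding the value to an insert statement, and skip the insert when it returns false.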
XMLtoString() Example
-- SQL: table with an lvarchar column to hold the XML
create table tab1 (col1 lvarchar);

// Java: insert an XML file into the lvarchar column
try {
    String cmd = "insert into tab1 values(?)";
    PreparedStatement pstmt = conn.prepareStatement(cmd);
    pstmt.setString(1, UtilXML.XMLtoString("/tmp/x.xml"));
    pstmt.execute();
    pstmt.close();
} catch (SQLException e) { ... }
Retrieving XML Data

The methods used during data retrieval will assist in converting
  - XML data to DOM
  - XML data to type “InputSource”, which is the standard input type for both SAX and DOM methods
getInputSource() Example
// Fetch XML data from an lvarchar column into an InputSource
// for (SAX) parsing
try {
    String sql = "select col1 from tab1";
    Statement stmt = conn.createStatement();
    ResultSet r = stmt.executeQuery(sql);
    // Other SAX parsers can go here if desired
    Parser p =
        ParserFactory.makeParser("com.sun.xml.parser.Parser");
    p.setDocumentHandler(new myHandler());
    p.setErrorHandler(new errHandler());
    while (r.next()) {
        InputSource i = UtilXML.getInputSource(r.getString(1));
        p.parse(i);
    }
    r.close();
} catch (SQLException e) { ... }
DOM Support

The DOM specification does not provide
a standard way to create a DOM object

JAXP provides factory methods that
provide a standard way of creating DOM
objects
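A minimal sketch of the JAXP factory route to a DOM object follows. It uses only the standard javax.xml.parsers and org.w3c.dom classes; the class DomFactoryDemo and its methods are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: two ways to obtain a DOM Document through JAXP.
class DomFactoryDemo {

    // The factory hides which parser implementation is actually in use.
    private static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a DOM tree from XML text, for example a value read from a column.
    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Creates an empty Document and gives it an ORDER root element.
    static Document newOrder(String id) {
        Document doc = builder().newDocument();
        Element order = doc.createElement("ORDER");
        order.setAttribute("id", id);
        doc.appendChild(order);
        return doc;
    }

    public static void main(String[] args) {
        Document parsed = parse("<ORDER id='abc123'><ITEM/><ITEM/></ORDER>");
        System.out.println(parsed.getElementsByTagName("ITEM").getLength());
        System.out.println(newOrder("x1").getDocumentElement().getAttribute("id"));
    }
}
```

Because the whole tree is in memory, any node can be visited in any order, which is the random-access property mentioned in the glossary.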
InputStreamtoDOM() Example
-- SQL: table with a text column
create table tab2 (col1 text);

// Java: fetch XML data from the text column into a DOM object
try {
    String sql = "select col1 from tab2";
    Statement stmt = conn.createStatement();
    ResultSet r = stmt.executeQuery(sql);
    while (r.next()) {
        Document doc =
            UtilXML.InputStreamtoDOM(r.getAsciiStream(1));
    }
    r.close();
} catch (SQLException e) { ... }
XML Parser

JDBC driver uses Sun’s JAXP API and by default a non-validating XML Parser
The default can be changed in two ways, where <new parser> is the alternative parser
  - % java -Dorg.xml.sax.parser=<new parser>
  - System.setProperty("org.xml.sax.parser", "<new parser>");
JAXP Summary

JDBC 2.20 XML support makes it easy to store/retrieve XML documents to/from an Informix Database using Sun’s JAXP 1.0 API
XML documents are checked for validity or well-formedness during insertion, because they are parsed using the SAX protocol
Sun’s non-validating parser is used by default, but any other parser can be specified and used
Demonstrations
Architecture for Demos
Source: Derived from [Plummer99]
Cloudscape

Cloudscape can store Java objects in table columns
  - Not just blobs – objects have structure
Java code can accept different data and store as XML
Embed XML formatter into Cloudscape
  - Extend server
Cloudscape Demo: Tables
create table xml_objects (
    dtd_name char(20)
        constraint dtd_name_primary_key primary key,
    xml serialize(xmlobject));
create table dtd_nodes (
    nodename char(20)
        constraint nodename_primary_key primary key,
    contains_elements varchar(20),
    node_root boolean,
    contains_attributes varchar(20),
    attribute_required boolean,
    contains_data boolean,
    data_required boolean);
Cloudscape Demo: Java (1)
import ...
public class XMLObject implements Serializable {
    public Vector elementNames;
    public Vector elementValues;
    public String rtnString;

    // Constructor: declared without a return type
    public XMLObject(Vector names, Vector values) {
        this.elementNames = names;
        this.elementValues = values;
    }
    ...
Cloudscape Demo: Java (2)
    ...
    public String returnXMLFormat(String DTD) {
        genFromDTD dtd = new genFromDTD(); // XML Formatter
        rtnString = dtd.returnXMLFormat(this, DTD);
        return rtnString;
    }

    public String toString() {
        return "XMLObject Class";
    }
}
Cloudscape Demo: SQL
select xml.returnXMLFormat('BOOKS')
from xml_objects
where dtd_name = 'BOOKS';
Cloudscape XML Demo
1. Start Cloudview
2. View XML
3. Compile Java Files
4. Start Cloudview
5. Start Web Server
6. Start Browser
7. Stop Web Server
Object Translator

Provides an object view of a database
  - Supports Java™/EJB (and VB/MTS)
Builds an object model from a relational schema
  - DBA can focus on the schema, developers focus on Java
Outputs components
Supports Cloudscape, Informix and other JDBC sources
Mapping/Modelling Process
[Diagram: Object Translator Solution; labels: Compile-time, SQL, UML, OR Maps, Runtime, Database Access, Object Model, Forward Engineer, Data Model, Reverse Engineer]
Object Translator 1.1
Developer maps XML documents to map objects
  - Generated Java objects become XML document handlers
  - Store and restore the XML document data in the database
XML markup is not stored or restored
  - Allows applications to use existing schemas for incoming XML documents
Object Translator XML Demo

Use an existing XML document

Create links between elements of XML
document and attributes of map object

Generate Java files and servlet from map
object

Compile and run
Object Translator XML Demo
1. Start Cloudview
2. View XML
3. Start OT
4. Copy Files
5. Start Web Server
6. Start Browser
7. Stop Web Server
What about Performance?

A couple of independent benchmarks are being developed
  - XMach-1
  - XML Store
  - ...
Example Performance Results
We conclude DTD approach is the
best strategy among the six
approaches we studied and there is
no clear need to build an “XML-specific” database system.
Source: [Tian]
Final Thoughts ...

Technology is moving fast

Vendor marketing ahead of product
capabilities

Many “Beta” products available
Software Downloads

Cloudscape
  - http://www.cloudscape.com/
Object Translator
  - http://www.informix.com/idn-secure/webtools/ot/
Resources
  - http://www.oasis-open.org/cover/xmlAndDatabases.html
  - http://www.rpbourret.com/xml/
  - http://www.sees.bangor.ac.uk/~rich/research.html
  - http://www.xml-und-datenbanken.de/
  - http://www.soi.city.ac.uk/~akmal/html.dir/benchmarks.html