Download XML In An RDBMS World

Document related concepts

Oracle Database wikipedia , lookup

Relational algebra wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Database wikipedia , lookup

Functional Database Model wikipedia , lookup

SQL wikipedia , lookup

PL/SQL wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
XML In An RDBMS World
Michael D. Thomas
[email protected]
Data, data, data




In general, computing is never far from
data
XML, RDBMSes deal with data – play in
the same very large playground
Extensive overlap between RDBMSes
and XML
Goal: Learn how to choose between
RDBMS vs. XML, how to integrate the
two technologies
First, some pragmatism…






“When all you have is a hammer,
everything looks like a nail.”
What if all you have is a hammer?
“When all you have is a hammer,
everything must look like a nail.”
Best to have XML and RDBMS in your toolset
Architectural purity is often impossible
But we should know when & what we are
compromising
Tonight’s Agenda




Compare XML and SQL RDBMS
Technologies
Understand Their Sweet Spots
Examine Their Use Through Studies of
Anti-Patterns and Patterns Of Usage
Discuss XQuery and XML Storage In
The RDBMS
Topics









Defining The Playground – XML vs. RDBMS
Anti-Pattern: XML As Transactional Data Store
Data Stores & The ACID Test
Anti-Pattern: Document Storage & The RDBMS
XQuery & RDBMS Doc Storage
Anti-Pattern: XML Declarative Languages &
Gratuitous Encapsulation Of SQL
Understanding Declarative Languages, Such As SQL
Anti-Pattern: XML Meta-data & The Overly Abstract
Database
Managing The Split Between XML-based & RDBMSbased Data Models In The Same Application
XML As Database Anti-Pattern




“I’ll use an XML file as a database”
Advantages: Simple, Cheap
An XML Schema defines the metadata
of the database
Data is right on the filesystem, not
“hidden” in a complicated SQL RDBMS
XML As Database Problems




Updates are hard
Have to re-invent concurrency handling
– multiple users changing data at the
same time
No constraint checking
No optimization, such as indexes,
caching, etc.
Organization of Data





Been formalizing the storage
information for a long time
Age of computing called for new
approaches
Relational model highly successful
XML is relatively new
Good for loosely structured data –
documents -- and data transmission
Transmission &
Interoperability




Electronic transmission is newer than
electronic storage
With the growth of the Internet,
transmission of data is exploding
No Interoperability problems in 1950 –
lots of interoperability problems now
XML is an important standard for
transmission and interoperability
Vocabulary



Datastore – Anything that stores data
SQL Relational Database – A database
that organizes info in tables and
adheres to relational theory
XML – 1) eXtensible Markup Language
2) The XML standard 3) All or part of an
XML document
Vocabulary



XML Database – A database specialized
for the storage of XML documents
Object Database – A database that
stores objects
Object-Relational Database – A
relational database with an extensible
type system
Data Centric vs. Document
Centric


Data Centric approach – the datastore
is focused on handling highly
structured, fine grained data. Favors the
relational model
Document centric approach – the
datastore is focused on handling semistructured data, such as web pages,
books, etc. Favors XML
Data Centric vs. Document
Centric




Best queries are largely a result of structuring
the data well (Messy desk vs. organized file
cabinet)
Documents are semi-structured data with ad
hoc structures
The overhead of defining rigorous structure
for each type of document increases the
overall cost of management
(Would still give you the best queries)
Storage vs. Transmission




Data Storage and Data Transmission are
two different concepts
XML is very strong for interoperable
transmission
Can store by writing XML to a file
You can ‘transmit’ relational data by
exporting a few tables and ftping, but
isn’t a strong solution
Datastore Application Stack
Application
Setup
Metadata
Definition
Data
Persistence
Datastore
Query
XML/Relational Comparison
Relational/SQL
XML
Metadata
Definition
Create Table…
Data
Persistence
Insert …
Update …
Select …
Define XML
Schema
(Optional)
Create XML
Document
XPath, XQuery
Data
Query
Application With XML Transmissions
(X)HTML
XSLT
Style
sheet
CSV
XML
XSLT
Style
sheet
XSLT
Style
sheet
XML
XSLT Processor
XML
Application
XML
Web Service
Producer
Datastore
XML
Web Service
Consumer
Three Data Models



Persistent Model – how data is stored
Active Model (Object Model) – how data
is arranged when it is being
manipulated by a program, usually
written in an imperative language
Presentation/Transmission Model – how
data is transmitted, usually as XML
Datastore Basics

Any Datastore must tackle the following
issues:






Concurrency
Transactions – the ACID test
Locking
Joins
Normalization
Administration Issues
Concurrency


Datastores support concurrency if
multiple users can access the same
datastore at the same time
Datastores must not allow the same
data to be modified at the same time
Transactions




Atomicity – No matter how complex, a
transaction is atomic and indivisible.
Transactions are “all or nothing.”
Consistency – Transactions must leave the
database in a consistent state, i.e., consistent
with the rules
Isolation – Transaction is isolated from other
transactions
Durability – The effects of a transaction
persist
Isolation levels




TRANSACTION_READ_UNCOMMITTED
– Dirty reads, non-repeatable reads,
phantom reads
TRANSACTION_READ_COMMITTED
– Non-repeatable reads, phantom
reads
TRANSACTION_REPEATABLE_READ
– phantom reads
TRANSACTION_SERIALIZABLE
Locking





Data must be locked for transactions to be
isolated
Locking is both a datastore and an application
concern
How much extraneous data is locked? (Page
level locking, document level locking)
Pessimistic locking: prevents reading of
locked data
Optimistic locking: generates an error when
inappropriate data updates are attempted
Joins





A Join joins the data between two
different data entities
E.g., SELECT * from emp, dept
WHERE emp.deptno =
dept.deptno
Joins are the cornerstone of SQL
XPath doesn’t do joins between XML
documents!
XQuery, others can
Normalization





Normalization – organizing data to minimize
redundancy
Normalized data is easier to maintain and
easier to understand conceptually
In relational db design, normalized DBs need
to be denormalized for performance reasons
XML docs can also be normalized, but not as
much support for tying elements together
Normalization is important when designing,
less crucial at implementation
Administrative Issues


Includes backup & recovery, installation,
upgrades, optimization, maintenance, etc.
In general:




Bigger is better (economies of scale, 24x7
support)
More popular, more standard is better (law of
increasing returns)
Whatever is already working in your organization
is better (no need to hire, re-train administrators)
Existing DB vendors have a huge advantage
Datastores

Different types:



Relational Database – highly structured
data such as account balances, inventory
quantities, etc.)
Document Database – used to store
documents, probably in XML format
The same DB can serve as both, e.g.,
Oracle
Overview Of Relational
Databases



The Relational Model
SQL
Entity Relationship Diagrams
Relational Model -- structure





Data is grouped in tables
Tables have columns and rows
Columns are fairly fixed – the set of
columns shouldn’t change much (if at
all) over the life of a table
A table can have any number of rows
Rows change constantly
Relational model – primary
key




In general, a table should have one or
more columns defined as the primary
key
The primary key is unique and non-null
Usually only one column
Can consist of more than one column
(composite primary key)
Relational model – foreign key




A foreign key describes a relationship
between two tables
The foreign key column of tableA points
to a column in tableB
TableB is said to be the parent of tableA
Often, the foreign key points at a
primary key
SQL
Three types:
 Query SELECT * FROM emp WHERE
deptno=10 ORDER BY ename


Data Modification Language (DML):
Update, Insert
Data Definition Language (DDL): Create
Table
Joins
A Join is a Cartesian Cross-Product:
SELECT count(*) FROM emp;
SELECT count(*) FROM emp;
SELECT count(*) FROM emp, dept;
SELECT ename, emp.deptno,
dept.deptno, dname FROM emp, dept;
SELECT ename, emp.deptno,
dept.deptno, dname FROM emp, dept
WHERE emp.deptno = dept.deptno;

Joins
SELECT ename, dname FROM emp, dept
WHERE emp.deptno = dept.deptno
Entity Relationship Diagrams



Used to create a map of your data
Describes tables, attributes of tables
and relationships between tables
Come at it from two directions: lay out
the entities, assign the attributes; group
the attributes into entities
XML vs. SQL Tables






XML – order matters! (Rows in a relational
table are unordered)
XML is a tree structure
XML documents tend to be semi-structured,
SQL tables are highly structured
SQL tables aren’t as flexible or interchangable
XML joins aren’t straightforward
An XML document/element doesn’t serve
multiple purposes as well as DB schema/table
Anti-Pattern: XML Document
Storage As A SQL BLOB





Need to store XML in a database
Better than storing in a filesystem
Make a BLOB (Binary Large Object) and store
a document as an element in a row of a table
Problem: can’t query contents of document
without extracting it from the db!
For simple queries, might have to extract all
of the documents into the application – very
inefficient
Anti-Pattern: One-off SQL
Schema For A Particular XML
Document



BLOB storage is bad, so why not
“shred” the document across multiple
DB tables?
I.e., for XML elements named “Dept”
make a “Dept” table, “Emp” elements
make a “Emp” table, describe the
hierarchy with foreign key constraints
Query performance is much, much
better – probably better than XPath
against XML as a file
Problems




A lot of work! Have to do this for every
XML schema
Hard! XML schemas are inherently
more flexible than SQL schemas. Some
mappings can be difficult
Negates flexibility of XML
Not as learnable – Programmers have
to learn to query your SQL schema, not
just XPath and XML Schema
Anti-Pattern: Developing A
General “Shredding” Solution For
All Of XML




Your Application has several XML
schemas, and it’s time-consuming to
develop “shredding” for all of them
So, you try to do a more generalized
shredding
Is possible, but is a very horizontal
problem. You probably won’t get the
time to solve it completely.
Still might present learnability problems
XML Datastore Architectures







XML Views Of Relational Data
Relational Wrappers Of XML
Independent Storage of XML Documents
(Native XML Database)
Text Storage Of XML In RDBMS
XML Shredding Across Relational Tables
Storing XML as Objects in Object- Relational
DBs (Oracle XMLType)
Everything is XML (XQuery approach)
Vocabulary: XML Collections


XML Collection is a collection of XML
documents
“A row is to a table as an XML
document is to an XML collection.”
XML Derived From RDBMS



Data exists naturally as relational data
Needs to be represented as an XML
document for some reason
The derived XML is usually used for
transmission
Independent XML Store



A specialized database is used to store
XML documents
Typically, an application will either have
two datastores – relational and XML –
or relational data will be stored as XML
In the dual datastore case, the
application code has to join the
different data sets
Text Storage Of XML In
Databases



XML is stored in a column of a relational
table
Allows you to mix structured and semistructured approaches
However, hard to query against the XML
directly
XML Shredding Across Tables





An XML document is stored across
many tables
Is possible to use SQL queries against
the document’s parts
Vendors can implement an XPath-toSQL translator
Can structure the tables based on a
schema
A pain to handle yourself
Exercise

Shred an XML schema across relational
tables
XMLType





Object-Relational databases allow you to define your
own types
You could define an Address object, define functions
for the object, and store instances of the object in a
column in the database
Oracle defines an object-relational type, XMLType, for
the storage of XML documents in the database
SELECT e.poDoc.getClobval() AS poXML
FROM po_xml_tab e WHERE
e.poDoc.existsNode('/PO[PNAME =
"po_2"]') = 1;
Shredding is managed & encapsulated for you
Oracle 9i: XMLType, Text
Everything Is XML (XQuery)



With XMLType, Oracle says that
everything fits in to the ObjectRelational realm
XQuery says that everything can be
represented and queried as XML
Relational data is derived from RDBMS
using SQL/XML
XQuery
SQL/XML
select xmlelement("emp",
'Employee ' ,
xmlelement( "name", e.job || ' ' ||
e.ename),
' was hired on ',
xmlelement("hiredate", e.hiredate)) as
result
from emp e;
------------------------------------------------------------------------------
<emp>Employee
<name>CLERK SMITH</name>
was hired on
<hiredate>17-DEC-80</hiredate>
</emp>
Query Soup




SQL – queries relational data
XPath – queries a particular XML
document
SQL/XML – a standard for deriving XML
from relational data
XQuery – queries a collection of XML
documents and uses XPath to query
particular XML documents
Query Soup



There are other XML query languages,
such as XML-QL and Quilt
XSLT is a transformation, not a query,
language
XSQL & MS XML/SQL are wrapper
languages
Anti-Pattern: Encapsulating SQL
With XML




General Rule: “Encapsulation is good,
but encapsulating good things is bad.”
SQL is good
(Also, HTML is good)
Why hide SQL with XML?
XML encapsulating SQL
SELECT emp.ename, dept.dname FROM emp, dept WHERE
emp.deptno = dept.deptno
<sql-query>
<fields>
<field> ename </field>
<field> dname </field>
</fields>
<tables>
<table> emp </table>
<table> dept </table>
</tables>
<joins>
<join>emp.deptno = dept.deptno </join>
</joins>
</sql-query>
Same concept in XQueryX
“XQueryX is an XML representation of an
XQuery… The result is not particularly
convenient for humans to read and
write, but it is easy for programs to
parse…”
XQuery Example
XQuery:
{ for $b in
doc("http://www.bn.com/bib.xml")/b
ib/book where
$b/publisher = "AddisonWesley" and
$b/@year > 1991 return
<book year =
"{ $b/@year }"> { $b/title }
</book>
}

XQueryX Version
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="xqueryx.xsl"?>
<xqx:module xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xqx="http://www.w3.org/2003/12/XQueryX"
xsi:schemaLocation="http://www.w3.org/2003/12/XQueryX xqueryx.xsd">
<xqx:mainModule> <xqx:queryBody> <xqx:expr xsi:type="xqx:elementConstructor">
<xqx:tagName>bib</xqx:tagName> <xqx:elementContent> <xqx:expr
xsi:type="xqx:flwrExpr"> <xqx:forClause> <xqx:forClauseItem>
<xqx:typedVariableBinding> <xqx:varName>b</xqx:varName>
</xqx:typedVariableBinding> <xqx:forExpr> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr
xsi:type="xqx:functionCallExpr"> <xqx:functionName>document</xqx:functionName>
<xqx:parameters> <xqx:expr xsi:type="xqx:stringConstantExpr">
<xqx:value>bib.xml</xqx:value> </xqx:expr> </xqx:parameters> </xqx:expr>
<xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest>
<xqx:nodeName> <xqx:QName>bib</xqx:QName> </xqx:nodeName> </xqx:elementTest>
</xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis>
<xqx:elementTest> <xqx:nodeName> <xqx:QName>book</xqx:QName>
</xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:expr> </xqx:forExpr>
</xqx:forClauseItem> </xqx:forClause> <xqx:whereClause> <xqx:expr
xsi:type="xqx:operatorExpr" xqx:infix="true"> <xqx:opType>AND</xqx:opType>
<xqx:parameters> <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true">
<xqx:opType>=</xqx:opType> <xqx:parameters> <xqx:expr xsi:type="xqx:pathExpr">
<xqx:expr xsi:type="xqx:variable"> <xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr>
<xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName>
<xqx:QName>publisher</xqx:QName> </xqx:nodeName> </xqx:elementTest>
</xqx:stepExpr> </xqx:expr> <xqx:expr xsi:type="xqx:stringConstantExpr">
<xqx:value>Addison-Wesley</xqx:value> \
XQueryX Version (cont.)
<xqx:QName>publisher</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr>
</xqx:expr> <xqx:expr xsi:type="xqx:stringConstantExpr"> <xqx:value>AddisonWesley</xqx:value> </xqx:expr> </xqx:parameters> </xqx:expr> <xqx:expr
xsi:type="xqx:operatorExpr" xqx:infix="true"> <xqx:opType>&gt;</xqx:opType>
<xqx:parameters> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:variable">
<xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr>
<xqx:xpathAxis>child</xqx:xpathAxis> <xqx:attributeTest> <xqx:nodeName>
<xqx:QName>year</xqx:QName> </xqx:nodeName> </xqx:attributeTest> </xqx:stepExpr>
</xqx:expr> <xqx:expr xsi:type="xqx:integerConstantExpr"> <xqx:value>1991</xqx:value>
</xqx:expr> </xqx:parameters> </xqx:expr> </xqx:parameters> </xqx:expr>
</xqx:whereClause> <xqx:returnClause> <xqx:expr xsi:type="xqx:elementConstructor">
<xqx:tagName>book</xqx:tagName> <xqx:attributeList> <xqx:expr
xsi:type="xqx:attributeConstructor"> <xqx:attributeName>year</xqx:attributeName>
<xqx:attributeValue> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:variable">
<xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr>
<xqx:xpathAxis>child</xqx:xpathAxis> <xqx:attributeTest> <xqx:nodeName>
<xqx:QName>year</xqx:QName> </xqx:nodeName> </xqx:attributeTest> </xqx:stepExpr>
</xqx:expr> </xqx:attributeValue> </xqx:expr> </xqx:attributeList> <xqx:elementContent>
<xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr
xsi:type="xqx:variable"> <xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr>
<xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName>
<xqx:QName>title</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr>
</xqx:expr> </xqx:expr> </xqx:elementContent> </xqx:expr> </xqx:returnClause>
</xqx:expr> </xqx:elementContent> </xqx:expr> </xqx:queryBody> </xqx:mainModule>
</xqx:module>
When To Use Declarative XML
XML As User Interface




Development teams often forget to
write use cases for administrators,
downstream developers
XML declarative languages must be
considered as ways to provide user
interfaces
Documentation, examples important
Usability, Learnability issues important
Declarative XML In App.
Lifecycle
Compile-Time – Best Performance, Least
Flexibility
•
Start-Time – XML Parsed Once, Good
Performance, Good Flexibility. App must be
restarted or manually refreshed to get
flexibility
•
• Run-Time – Can Impact Performance,
But Great Flexibility. Web Services
Approach
Start-Time Configuration
Runtime Configuration
Anti-Pattern: The Overly Abstract
Database





By using XML, you can achieve runtime
configurability
Instead of “hard coding” SQL table names, do
the same job in XML configuration files
Problem: database is so abstract that it is
very hard or impossible to do SQL queries
against it directly
Database is closed – just a persistence
extension to the application
The application (and thus, the app dev team)
assumes all responsibility for any work with
the data
The Split Persistence Data
Model






Applications Have Persistence Data Models
Persistence Data Model – any data stored on
disk
Everything might be in the DB
Everything might be in XML docs
Usually, there is some split – i.e., 95% in a
SQL DB, but 5% in a properties file
Managing the split between DB-based data
and XML-based data is “accidental
complexity” and can be a headache
Traditional Architecture
Table
Application
Table
Table
SQL Database
Table
Typical Architecture
Table
Application
Table
Table
SQL Database
XML
XML
XML
Table
Joins between data models



Can use XQuery to join the two data
models
Could use XPath enhanced SQL to join
the two models
Often, the two models are joined at the
object level
Databases Have Meta Data
Meta Data
Table
Application
Table
Table
Table
SQL Database
XML
XML
XML
XML Datastore
Databases Have Metadata



Column names, table names &
constraints are all metadata
Defined at DB design time, which is
usually app design time
App is often hard coded against the
database meta data
XML as DB Metadata
Meta Data
Table
Table
Table
Table
Application
es
rib
c
s
De
XML
XML
XML
XML Datastore
SQL Database
Pros vs. Cons





More flexible – XML can change at
runtime
More complex at the application level
Database must be more abstract
Can make the application harder to
extend
Don’t forget – a DB table can also serve
the same purpose as an XML meta data
doc
Alternative: Tables As Meta
Data
Meta Data
Table
Table
Table
Table
Describes
Application
XML
XML
Table
Table
XML
XML Datastore
SQL Database