Download XML Publishing - Computer Science, NMSU

XML Publishing

- Introduction
- General approach
- XPERRANTO
- SilkRoute
- Microsoft SQL Server 2000
- Summary
Introduction

What is XML Publishing?
XML publishing is the task of transforming relational data into XML
for the purpose of exchange over the Internet.
More specifically, publishing XML data involves joining tables,
selecting and projecting the data that needs to be exported,
creating XML hierarchies, and processing values in an
application-specific manner.
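The steps named above (join, select and project, build a hierarchy) can be sketched in a few lines. This is only an illustration, not the method of any system discussed in these slides: it uses Python's sqlite3 and ElementTree, and the two sample tables and their rows are assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def publish_orders(conn):
    """Join order and item rows, project the exported columns, nest items under orders."""
    root = ET.Element("orders")
    current_id, order_el = None, None
    # The join and projection run in SQL; ORDER BY keeps each order's rows adjacent.
    rows = conn.execute(
        'SELECT o.id, o.custname, i."desc", i.cost '
        'FROM "order" o JOIN item i ON i.oid = o.id ORDER BY o.id'
    )
    for oid, custname, desc, cost in rows:
        if oid != current_id:  # first row of a new order: open a new <order>
            order_el = ET.SubElement(root, "order", id=str(oid))
            ET.SubElement(order_el, "customer").text = custname
            current_id = oid
        item_el = ET.SubElement(order_el, "item", description=desc)
        ET.SubElement(item_el, "cost").text = str(cost)
    return root

# Hypothetical sample data; "order" and "desc" are SQL keywords, so they are quoted.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "order" (id INTEGER, custname TEXT);
    CREATE TABLE item (oid INTEGER, "desc" TEXT, cost INTEGER);
    INSERT INTO "order" VALUES (10, 'Smith Construction');
    INSERT INTO item VALUES (10, 'generator', 8000), (10, 'backhoe', 24000);
''')
xml_text = ET.tostring(publish_orders(conn), encoding="unicode")
print(xml_text)
```

The relational work stays in the SQL statement; the Python loop only adds tags, which is the division of labor the middleware systems below aim for.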
Introduction

Why do we need XML Publishing?
- Most business data is stored in relational database systems.
- XML is a standard for exchanging business data on the web.
- XML has a simple, platform-independent, Unicode-based syntax
for which simple and efficient parsers are widely available.
- XML can represent not only structured data but also
semi-structured data and marked-up content, all in a uniform syntax.
Introduction
Two data models:

Relational data
- fragmented into many flat relations
- normalized
- proprietary

XML data
- nested
- un-normalized
- public (450 schemas at www.biztalk.org)
General Approach

Create XML views over the relational data. Each XML view
provides an alternative, application-specific view of the
underlying relational data. Through these views, business
partners can access existing relational data as though it were
stored in some industry-standard XML format.
Virtual vs. Materialized

Materialized XML Publishing
Materialize the entire XML view on request
and return the resulting XML document.

Virtual XML Publishing
Support queries over XML views and return
only what user applications actually ask for.
Virtual vs. Materialized

Materialized XML Publishing
- applications can access all the data without interfering
with the relational engine
- XML views need to be refreshed periodically
- inefficient when an application needs only a small part of the view

Virtual XML Publishing
- guarantees data freshness
- leverages the processing power of the relational engine
- translating an XML query over an XML view into
SQL may be complex
Middleware System

The middleware system is the interface between the relational
database and user applications. It
- defines and manages XML views
- translates incoming XML queries into SQL
and submits them to the database system
- receives the query results and translates
them back into XML.
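Figure 1: A high-level architecture of the middleware system.
Applications send user XML queries over the Web/intranet to the
middleware's XML Query Processor and receive result XML documents.
The XML Views Manager takes view definitions and supplies view
descriptions. SQL queries go to the RDBMS, and the XML Tagger turns
the returned tuple streams into the result XML documents.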
XPERRANTO vs. SilkRoute

IBM XPERRANTO
- pure XML, single-query-language approach:
XML views are defined in an XML query language
that uses the type system of XML Schema

SilkRoute
- XML views are defined using a declarative query
language called RXL (Relational to XML
Transformation Language)
XPERRANTO vs. SilkRoute

XPERRANTO
- users only need to be familiar with XML
- both relational data and metadata can be
represented and queried in the same framework
- can publish object-relational structures
- pushes all relational logic down to the database
engine
Figure 2: XPERRANTO architecture.
Query Translation: XML-QL Parser -> XQGM -> Query Rewrite -> XQGM ->
SQL Translation. SQL queries go to the SQL Query Processor of the O-R
database (stored tables, system catalog); the returned data tuples go to
the XML Tagger, which produces the XML result.
XML View Services: view definitions and view descriptions, plus an XML
Schema Generator that produces the XML Schema from catalog information.
Example 1: Relational Schema vs. XML View Schema

DDL (Data Definition Language) for the O-R schema in SQL99 terms:
Create Table book AS (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30))
Create Table publisher AS (name VARCHAR(30), address VARCHAR(255))
Create Type author_type AS (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))
Create Table author OF author_type (REF IS ssn USER GENERATED)

<simpleType name="string255" source="string"> <maxlength value="255"/> </simpleType>
<simpleType name="string30" source="string"> <maxlength value="30"/> </simpleType>

<!-- Create Table book AS... -->
<complexType name="bookTupleType">
  <element name="bookID" type="string30"/>
  <element name="name" type="string255"/>
  <element name="publisher" type="string30"/>
</complexType>
<complexType name="bookSetType">
  <element name="bookTuple" type="bookTupleType" maxOccurs="*"/>
</complexType>
<element name="book" type="bookSetType"/>

<!-- Create Type author_type AS... -->
<complexType name="author_type">
  <element name="bookID" type="string30"/>
  <element name="first" type="string30"/>
  <element name="last" type="string30"/>
</complexType>

<!-- Create Table author OF ... -->
<complexType name="authorTupleType" source="author_type" derivedBy="extension">
  <attribute name="ssn" type="ID"/>
</complexType>
<complexType name="authorSetType">
  <element name="authTuple" type="authorTupleType" maxOccurs="*"/>
</complexType>
<element name="author" type="authorSetType"/>
XML View Schema over Example O-R database
<db>
  <book>
    <row><bookID>…</bookID><name>…</name><publisher>…</publisher></row>
    <row><bookID>…</bookID><name>…</name><publisher>…</publisher></row>
    …
  </book>
  <author>
    <row><ssn>…</ssn><bookID>…</bookID><first>…</first><last>…</last></row>
    <row><ssn>…</ssn><bookID>…</bookID><first>…</first><last>…</last></row>
    …
  </author>
  <publisher>
    …similar to <book> and <author>
  </publisher>
</db>
Default XML View over Example O-R database
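The default view has a mechanical shape: one element per table, one <row> per tuple, one child element per column. A minimal sketch of generating it, assuming an SQLite database stands in for the O-R database; the function name default_view and the sample rows are hypothetical, and NULL handling is ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

def default_view(conn):
    """Build <db><table><row><col>value</col>...</row>...</table>...</db> for every table."""
    db = ET.Element("db")
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        table_el = ET.SubElement(db, table)
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cursor.description]  # column names from the catalog
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
    return db

# Hypothetical rows for two of the example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT);
    CREATE TABLE author (ssn TEXT, bookID TEXT, first TEXT, last TEXT);
    INSERT INTO book VALUES ('b1', 'Sample Title', 'Sample Press');
    INSERT INTO author VALUES ('a1', 'b1', 'Ann', 'Example');
""")
view = default_view(conn)
print(ET.tostring(view, encoding="unicode"))
```

Nothing here is specific to the schema, which is why such a view can be offered by default and user-defined views can then be written as queries over it.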
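Example 2: From XQuery to SQL
Figure: XPERRANTO query engine. An XQuery passes through the XQuery
Parser (producing XQGM), Query Rewrite & View Composition (producing
rewritten XQGM), and Computation Pushdown, which emits a SQL query for
the RDBMS and a tagger graph for the Tagger Runtime. The Tagger Runtime
turns the tuples returned by the RDBMS into the query result.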
order
id   custname            custnum
10   Smith Construction  7734
9    Western Builders    7725

item
oid  desc       cost
10   generator  8000
10   backhoe    24000

payment
oid  due      amt
10   1/10/01  20000
10   6/10/01  12000
<db>
  <order>
    <row><id>10</id><custname>Smith Construction</custname><custnum>7734</custnum></row>
    <row><id>9</id><custname>Western Builders</custname><custnum>7725</custnum></row>
  </order>
  <item>
    <row><oid>10</oid><desc>generator</desc><cost>8000</cost></row>
    <row><oid>10</oid><desc>backhoe</desc><cost>24000</cost></row>
  </item>
  <payment>
    …similar to <order> and <item>
  </payment>
</db>
A Purchase Order Database and its Default View
<order id="10">
  <customer>Smith Construction</customer>
  <items>
    <item description="generator">
      <cost>8000</cost>
    </item>
    <item description="backhoe">
      <cost>24000</cost>
    </item>
  </items>
  <payments>
    <payment due="1/10/01">
      <amount>20000</amount>
    </payment>
    <payment due="6/10/01">
      <amount>12000</amount>
    </payment>
  </payments>
</order>
<order id="9">
  …
</order>
XML Purchase Order
create view orders as (
  for $order in view("default")/order/row
  return
    <order id=$order/id>
      <customer>$order/custname</customer>
      <items>
        for $item in view("default")/item/row
        where $order/id = $item/oid
        return
          <item description=$item/desc>
            <cost>$item/cost</cost>
          </item>
      </items>
      <payments>
        for $payment in view("default")/payment/row
        where $order/id = $payment/oid
        return
          <payment due=$payment/due>
            <amount>$payment/amt</amount>
          </payment>
          sortby(@due)
      </payments>
    </order>)
User-defined XML “orders” view
for $order in view("orders")
where $order/customer/text() like "Smith%"
return $order
XQuery over “orders” view
XQuery Parser
Query Parsing

XQGM (XML Query Graph Model)
- extension of a SQL internal query
representation called Query Graph Model
(QGM).
- consists of a set of operators and functions
that are designed to capture the semantics of
an XML query.
OPERATOR   DESCRIPTION
Table      Represents a table in a relational database
Project    Computes results based on its input
Select     Restricts its input
Join       Joins two or more inputs
Groupby    Applies aggregate functions and grouping
Orderby    Sorts input based on column values
Union      Unions two or more inputs
Unnest     Applies super-scalar functions to input
View       Represents a view
Function   Represents an XQuery function
Part of the XML Functions and Operators in XQGM

     XML FUNCTION               DESCRIPTION                                                                    OPERATORS
1    cr8Elem(Tag, Atts, Clist)  Creates an element with tag name Tag, attribute list Atts, and contents Clist  Project
2    cr8AttList(A1,…,An)        Creates a list of attributes from the attributes passed as parameters          Project
3    cr8XMLFragList(C1,…,Cn)    Creates an XML fragment list from the content (element/text) parameters        Project
4    aggXMLFrags(C)             Aggregate function that creates an XML fragment list from content inputs       Groupby
5    getTagName(Elem)           Returns the element name of Elem                                               Project, Select
6    getAttributes(Elem)        Returns the list of attributes of Elem                                         Project, Select
7    getAttName(Att)            Returns the name of attribute Att                                              Project, Select
8    isElement(E)               Returns true if E is an element, returns false otherwise                       Select
9    isText(T)                  Returns true if T is text, returns false otherwise                             Select
10   Unnest(List)               Superscalar function that unnests a list                                       Unnest
Figure: XQGM for the XML Orders View (operators by box number, bottom-up)
1   table: order ($id, $custname)
2   table: item ($oid, $desc, $cost)
3   select: $oid = $id
4   project: $item = <item> …
5   groupby: $items = aggXMLFrags($item)
6   table: payment ($oid, $due, $amt)
7   select: $oid = $id
8   project: $pmt = <payment> …
9   groupby, orderby (on $due): $pmts = aggXMLFrags($pmt)
10  join (correlated), correlation on order.id; outputs $id, $custname, $items, $pmts
11  project: $order = <order id=$id> <custname>$custname</custname> <items>$items</items> <payments>$pmts</payments> </order> (view result)
for $order in view("orders") where
$order/customer/text() like "Smith%" return $order

Figure: XQGM for the Query over Orders View (operators by box number, bottom-up)
1   View: orders ($order)
2   project: $elems = getContents($order)
3   Unnest: $elem = unnest($elems)
4   select: isElement($elem) and getTagName($elem) = "customer"
5   project: $vals = getContents($elem)
6   Unnest: $val = unnest($vals)
7   select: isText($val) and $val like "Smith%"
8   join (correlated), correlation on $order; outputs $order
View Composition

After the query parsing stage, the XQGM is composed with the
views it references (here, the orders view). Rewrite optimizations
are then performed to eliminate the construction of intermediate
XML fragments and to push down predicates.
View Composition
     FUNCTION       COMPOSES WITH              REDUCTION
1    getTagName     cr8Elem(Tag, Atts, Clist)  Tag
2    getAttributes  cr8Elem(Tag, Atts, Clist)  Atts
3    getContents    cr8Elem(Tag, Atts, Clist)  Clist
4    getAttName     cr8Att(Name, Val)          Name
5    getAttValue    cr8Att(Name, Val)          Val
6    isElement      cr8Elem(Tag, Atts, Clist)  True
7    isElement      Other than cr8Elem         False
8    isText         PCDATA                     True
9    isText         Other than PCDATA          False
10   unnest         aggXMLFrags(C)             C
11   unnest         cr8XMLFragList(C1,…,Cn)    C1 ∪ … ∪ Cn
12   unnest         cr8AttList(A1,…,An)        A1 ∪ … ∪ An
Composition Rules
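The composition rules behave like a small term-rewriting system: whenever an accessor function is applied directly to a constructor function, the pair cancels. The sketch below is an assumption-laden illustration, not XPERRANTO code. It represents XQGM expressions as nested Python tuples (a made-up encoding) and implements only rules 1 to 6 and 10.

```python
def reduce_term(term):
    """Apply composition rules bottom-up; terms are nested tuples ("fn", arg, ...)."""
    if not isinstance(term, tuple):
        return term  # constants and variable names are already reduced
    fn, *args = term
    args = [reduce_term(a) for a in args]
    inner = args[0] if args and isinstance(args[0], tuple) else None
    inner_fn = inner[0] if inner else None
    if inner_fn == "cr8Elem":
        _, tag, atts, clist = inner
        if fn == "getTagName":      # rule 1
            return tag
        if fn == "getAttributes":   # rule 2
            return atts
        if fn == "getContents":     # rule 3
            return clist
        if fn == "isElement":       # rule 6
            return True
    if inner_fn == "cr8Att":
        _, name, val = inner
        if fn == "getAttName":      # rule 4
            return name
        if fn == "getAttValue":     # rule 5
            return val
    if fn == "unnest" and inner_fn == "aggXMLFrags":  # rule 10
        return inner[1]
    return (fn, *args)  # no rule applies: keep the (partially reduced) term

# Hypothetical encoding of the <item> element built by the orders view.
item = ("cr8Elem", "item", ("cr8AttList",), ("cr8XMLFragList", "$cost"))
query_expr = ("getTagName", ("unnest", ("aggXMLFrags", item)))
print(reduce_term(query_expr))
```

After reduction, the expression no longer mentions any XML constructor, which is how view composition avoids building intermediate XML fragments.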
Figure: XQGM after view composition of the query with the orders view,
showing predicate pushdown. The query's predicate, Select: $custname
like "Smith%", starts above the view's top-level project
($order = <order id=$id>…) and is pushed down to sit directly above
table: order ($id, $custname). The rest of the view graph is as before:
the correlated joins (correlation on order.id), the groupby operators
$items = aggXMLFrags($item) and $pmts = aggXMLFrags($pmt) with orderby
(on $due), the projects for <item> … and <payment> …, the selects
$oid = $id, and the tables item ($oid, $desc, $cost) and
payment ($oid, $due, $amt).
Computation Pushdown

The goal in this phase of query processing
is to push all data- and memory-intensive
operations down to the relational engine as
an efficient SQL query. Two techniques
are available:
1. Query Decorrelation
2. Tagger Pull-up
Query Decorrelation

Complex expressions in XQuery can be
represented using correlations. However,
earlier work has shown that executing
correlated XML queries over a relational
database leads to poor performance, so
query decorrelation is a necessary step for
efficient XML query execution.
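The idea can be seen on a plain SQL analogue. In the sketch below, SUM stands in for the XML aggregate aggXMLFrags (an assumption made so the example runs on SQLite), and the table contents are illustrative. Both queries return the same rows, but the second has no nested query that depends on the outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "order" (id INTEGER, custname TEXT);
    CREATE TABLE item (oid INTEGER, "desc" TEXT, cost INTEGER);
    INSERT INTO "order" VALUES (10, 'Smith Construction'), (9, 'Western Builders');
    INSERT INTO item VALUES (10, 'generator', 8000), (10, 'backhoe', 24000);
''')

# Correlated form: the inner query refers to o.id from the outer query,
# so conceptually it is evaluated once per order row.
correlated = conn.execute('''
    SELECT o.id, (SELECT SUM(i.cost) FROM item i WHERE i.oid = o.id)
    FROM "order" o ORDER BY o.id
''').fetchall()

# Decorrelated form: one outer join plus GROUP BY computes the same
# result set-at-a-time; the outer join keeps orders that have no items.
decorrelated = conn.execute('''
    SELECT o.id, SUM(i.cost)
    FROM "order" o LEFT OUTER JOIN item i ON i.oid = o.id
    GROUP BY o.id ORDER BY o.id
''').fetchall()

print(correlated, decorrelated)
```

The outer join matters: with an inner join, order 9 (which has no items) would disappear from the decorrelated result.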
Figure: XQGM after Decorrelation. The correlated joins are replaced by
ordinary joins and outer joins on $id:
- Select: $custname like "Smith%" over table: order ($id, $custname)
- join: $oid = $id with table: item ($oid, $desc, $cost), then
  project: $item = <item> …, then Groupby (on $id): $items = aggXMLFrags($item)
- join: $oid = $id with table: payment ($oid, $due, $amt), then
  project: $pmt = <payment> …, then groupby with orderby (on $due):
  $pmts = aggXMLFrags($pmt)
- right outer join: $id = $id and left outer join: $id = $id, which bring
  $items and $pmts together with $id and $custname
- project: $order = <order> … at the top
Tagger Pull-up

This step comes right after query decorrelation.
It separates the tagger operations from the SQL operations
before the SQL queries are generated.
- Relational operations are pushed to the bottom of
the graph. SQL statements are generated and sent
to the relational engine for execution.
- XML construction functions are pulled up to the
top of the query graph and transformed into a
“tagger run-time” graph, which produces the result
XML documents.
Figure: XQGM after Tagger Pull-up. The tagger run-time graph sits on top
of three SQL inputs:
- Merge: $order = <order> … (correlation on id), combining $custname,
  $items, and $pmts
- aggregate: $items = aggXMLFrags($item), over merge: $item = <item> …
  ($desc, $cost)
- aggregate: $pmts = aggXMLFrags($pmt), over merge: $pmt = <payment> …
  ($due, $amt)

input (orders):
select o.id, o.custname
from order o
where o.custname like 'Smith%'
order by o.id

input (items, $oid = $id):
select i.oid, i.desc, i.cost
from item i, order o
where o.custname like 'Smith%'
and i.oid = o.id
order by o.id

input (payments, $oid = $id):
select p.oid, p.due, p.amt
from payment p, order o
where o.custname like 'Smith%'
and p.oid = o.id
order by o.id, p.due
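Because all three inputs arrive sorted by order id, the tagger can merge them in a single pass without buffering whole results. A minimal sketch of such a merge follows; the helper names and the in-memory tuple lists are assumptions for illustration, not the actual XPERRANTO tagger.

```python
import xml.etree.ElementTree as ET

def make_stream(rows):
    """Wrap an iterable of tuples so the next tuple can be inspected before consuming it."""
    it = iter(rows)
    return {"it": it, "head": next(it, None)}

def take(stream, oid):
    """Consume and return the leading tuples whose first column equals oid."""
    out = []
    while stream["head"] is not None and stream["head"][0] == oid:
        out.append(stream["head"])
        stream["head"] = next(stream["it"], None)
    return out

def tag_orders(orders, items, payments):
    """Merge three tuple streams, each sorted by order id, into <order> elements."""
    item_stream, pmt_stream = make_stream(items), make_stream(payments)
    result = []
    for oid, custname in orders:
        order = ET.Element("order", id=str(oid))
        ET.SubElement(order, "customer").text = custname
        items_el = ET.SubElement(order, "items")
        for _, desc, cost in take(item_stream, oid):
            item = ET.SubElement(items_el, "item", description=desc)
            ET.SubElement(item, "cost").text = str(cost)
        pmts_el = ET.SubElement(order, "payments")
        for _, due, amt in take(pmt_stream, oid):
            pmt = ET.SubElement(pmts_el, "payment", due=due)
            ET.SubElement(pmt, "amount").text = str(amt)
        result.append(order)
    return result

# Tuples as the three SQL inputs would return them for the "Smith%" query.
tagged = tag_orders(
    [(10, "Smith Construction")],
    [(10, "generator", 8000), (10, "backhoe", 24000)],
    [(10, "1/10/01", 20000), (10, "6/10/01", 12000)],
)
print(ET.tostring(tagged[0], encoding="unicode"))
```

The sketch relies on the sort orders requested by the ORDER BY clauses; if a stream were unsorted, tuples for an earlier order would be silently skipped.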
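SilkRoute Approach
Figure: SilkRoute's architecture. Applications send user XML queries
over the Web/intranet and receive result XML documents. Inside
SilkRoute, the Query Composer combines a user query with the RXL view
(virtual or materialized) into an RXL query; the Plan Generator turns it
into SQL queries and an XML template, using an XML source description;
the RDBMS returns tuple streams, which the XML Tagger merges into the
result XML documents.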
SilkRoute Approach
- The database administrator starts by writing an RXL
query that defines the XML view of the database.
It is called the view query.
- For a materialized view, the view query is fed directly
into the Plan Generator, which generates a set of SQL
queries and one XML template.
- For a virtual view, the Query Composer first composes the
view query with a user query, resulting in another RXL query,
which is then fed into the Plan Generator.
- The SQL queries are sent to the RDBMS server, which
returns one sorted tuple stream per SQL query.
- The XML Tagger merges the tuple streams and
produces the XML document, which is returned to
the application.
Query Composer

This component takes a user XML-QL query and
composes it with the RXL view query, producing a
new RXL query that combines fragments of the
view query and the user query. It plays a role similar
to that of the Query Parser and Query Rewrite
components in XPERRANTO.
Plan Generator

This component in SilkRoute uses a greedy
optimization algorithm to choose an efficient set of
SQL queries for a given RXL view definition. The
algorithm bases its decisions on query cost
estimates provided by the relational engine. It
can return more than one plan, so it can be
combined with additional optimization algorithms
that optimize specific parameters, such as network
traffic or server load. Details of the greedy
algorithm can be found in:
M. Fernandez et al., “Efficient Evaluation of XML Middle-ware Queries.”
XML Publishing: SQL Server
Two approaches

SQL-centric approach
Extend SQL queries so that they perform the
transformation themselves. The extension is the
FOR XML clause.

Virtual XML views approach
Use the XML-based XDR (XML-Data Reduced)
schema language to define virtual XML
views over the relational database, then
query the views with XPath.
XML Publishing: SQL Server
SQL-centric approach
Three modes
- RAW mode
- AUTO mode
- EXPLICIT mode
XML Publishing: SQL Server
RAW Mode
SELECT Customers.CustomerID, OrderID
FROM Customers LEFT OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
FOR XML RAW

<row CustomerID="ALFKI" OrderID="10643"/>
<row CustomerID="ALFKI" OrderID="10692"/>
<row CustomerID="ANATR" OrderID="10308"/>
....
• flat XML
• default tag and attribute names
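What RAW mode produces can be imitated outside SQL Server in a few lines. The sketch below runs the same join on SQLite and formats each result row as a flat <row/> element. The function for_xml_raw is hypothetical, the sample rows are illustrative, and leaving NULL columns out of the output is an assumption of this sketch.

```python
import sqlite3
from xml.sax.saxutils import quoteattr

def for_xml_raw(cursor):
    """Render each result row as a flat <row .../> element, one attribute per column."""
    names = [d[0] for d in cursor.description]
    lines = []
    for row in cursor:
        attrs = " ".join(
            f"{name}={quoteattr(str(value))}"
            for name, value in zip(names, row)
            if value is not None  # assumption: NULL columns are simply left out
        )
        lines.append(f"<row {attrs}/>")
    return lines

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID TEXT);
    INSERT INTO Customers VALUES ('ALFKI'), ('ANATR');
    INSERT INTO Orders VALUES (10643, 'ALFKI'), (10692, 'ALFKI'), (10308, 'ANATR');
""")
cursor = conn.execute("""
    SELECT Customers.CustomerID AS CustomerID, Orders.OrderID AS OrderID
    FROM Customers LEFT OUTER JOIN Orders
      ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerID, Orders.OrderID
""")
raw_rows = for_xml_raw(cursor)
print("\n".join(raw_rows))
```

The element name row and the attribute names come straight from the result set, which is why RAW mode offers no control over tags or nesting.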
XML Publishing: SQL Server
Auto Mode
SELECT Customers.CustomerID, OrderID
FROM Customers LEFT OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
FOR XML AUTO

<Customers CustomerID="ALFKI">
  <Orders OrderID="10643"/>
  <Orders OrderID="10692"/>
</Customers>
<Customers CustomerID="ANATR">
  <Orders OrderID="10308"/>
</Customers>
....
• default tag and attribute names
• no differently typed sibling elements
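The nesting in AUTO mode follows the order of the rows: a new parent element starts whenever the parent's columns change. A rough imitation of that behavior for the two-table case, with a hypothetical function name and illustrative rows:

```python
import xml.etree.ElementTree as ET

def for_xml_auto(rows):
    """Nest <Orders> under <Customers>, opening a new parent when CustomerID changes."""
    out, current = [], None
    for customer_id, order_id in rows:
        if current is None or current.get("CustomerID") != customer_id:
            current = ET.Element("Customers", CustomerID=customer_id)
            out.append(current)
        if order_id is not None:  # a customer without orders gets no child element
            ET.SubElement(current, "Orders", OrderID=str(order_id))
    return out

# Rows as the join would return them when sorted by CustomerID.
auto = for_xml_auto([("ALFKI", 10643), ("ALFKI", 10692), ("ANATR", 10308)])
for element in auto:
    print(ET.tostring(element, encoding="unicode"))
```

If the rows were not grouped by customer, the same customer would appear as several separate parent elements, which is why the sort order of the query matters in this mode.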
XML Publishing: SQL Server
Explicit Mode
- Nested XML
- User-defined tags and attributes
- Idea: write SQL queries with complex
column names
- Ad hoc, order-dependent semantics
XML Publishing: SQL Server
Virtual XML Views

The core mechanism for providing XML views
over relational data is the annotated schema,
which consists of a schema description of the
XML view plus annotations that map the XML
schema constructs to relational schema constructs.
An XPath query, together with the annotated
schema, is then translated into a FOR XML query
that returns only the data required by the query.
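A toy version of this translation shows the role of the annotations. Everything here is an assumption for illustration: the dictionary format stands in for an annotated XDR schema, only XPaths of the form /Element[@attr='value'] are handled, and a plain parameterized SELECT is produced where the real system generates a FOR XML query.

```python
import re

# Hypothetical annotations: XML element -> table, XML attribute -> column.
ANNOTATED_SCHEMA = {
    "Customer": {
        "table": "Customers",
        "attrs": {"id": "CustomerID", "name": "CompanyName"},
    },
}

def xpath_to_sql(xpath, schema):
    """Translate /Element or /Element[@attr='value'] into a parameterized SELECT."""
    match = re.fullmatch(r"/(\w+)(?:\[@(\w+)='([^']*)'\])?", xpath)
    if match is None:
        raise ValueError(f"unsupported XPath: {xpath}")
    element, attr, value = match.groups()
    mapping = schema[element]
    columns = ", ".join(mapping["attrs"].values())
    sql = f"SELECT {columns} FROM {mapping['table']}"
    params = []
    if attr is not None:
        # The XPath predicate becomes a WHERE clause on the mapped column.
        sql += f" WHERE {mapping['attrs'][attr]} = ?"
        params.append(value)
    return sql, params

sql, params = xpath_to_sql("/Customer[@id='ALFKI']", ANNOTATED_SCHEMA)
print(sql, params)
```

The predicate travels into the WHERE clause, so only the matching rows leave the database; the full XML view is never materialized.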
Summary

IBM XPERRANTO
Pure XML, single-query-language approach.
XML views are defined in an XML query language that
uses the type system of XML Schema.

SilkRoute
XML views are defined using a declarative query language
called RXL (Relational to XML Transformation Language).

Microsoft SQL Server 2000
Supports queries over XML views, but the support is
limited because queries must be written in XPath, which is a
subset of XQuery.
Future Work

IBM XPERRANTO
- provide support for insertable and updatable XML views
- push tagging inside the database system

SilkRoute
- find better algorithms for translating RXL into
efficient SQL and for minimizing composed RXL views

Microsoft SQL Server 2000
- determine whether query composition and decomposition are
possible for the complete XQuery language or only for a
subset of it