Tools for XML Data Exchange
Dan Suciu
AT&T Labs
Joint work with Mary Fernandez
XML Has Many Facets
• XML for fancier Web pages
– XML generated with structural editors
• XML for messaging
– generated during applications
• XML for Data Exchange
– generated from legacy data
XML in Data Exchange
• communities agree on common DTD
• export their data in XML
• exchange over HTTP protocol
• applications understand only that DTD
An Example of XML Data
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name> </author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book> <publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
XML Exchange Vision
[Figure: data sources (legacy data, relational data, object-relational) and applications exchange XML Data over the WEB (HTTP); operations shown: Transform, Integrate, Warehouse.]
Tools
• export legacy data to XML
– RXL
• query/transform/integrate XML data
– XML-QL
• compress XML data
– XMill
• store/process incoming XML data
– STORED
XML-QL:
A Query Language for XML
• http://www.w3.org/TR/NOTE-xml-ql (8/98)
• W3C new Working Group on QL (9/99)
• XML-QL characteristics:
– relationally complete (like SQL)
– XML input, XML output
– queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]
Querying in XML-QL
Pattern
where <book language="french">
        <publisher>
          <name> Morgan Kaufmann </name>
        </publisher>
        <author> $a </author>
      </book> in "www.a.b.c/bib.xml"
construct $a
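The pattern above binds $a to every author of a French-language book published by Morgan Kaufmann. A minimal Python sketch of the same matching, assuming a small inline document in place of "www.a.b.c/bib.xml" (the sample data, the <bib> root element, and the function name `match_authors` are hypothetical, and this is not how an XML-QL engine is implemented):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for "www.a.b.c/bib.xml".
BIB = """
<bib>
  <book language="french">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>A. Auteur</author>
    <author>B. Auteur</author>
  </book>
  <book language="english">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>C. Author</author>
  </book>
  <book language="french">
    <publisher><name>Freeman</name></publisher>
    <author>D. Auteur</author>
  </book>
</bib>
"""

def match_authors(xml_text, language, publisher):
    """Return every binding of $a for books that match the pattern."""
    root = ET.fromstring(xml_text)
    bindings = []
    for book in root.iter("book"):
        # The attribute in the pattern must match exactly.
        if book.get("language") != language:
            continue
        # The book needs a <publisher><name> child with the given text.
        names = [n.text.strip() for n in book.findall("publisher/name") if n.text]
        if publisher not in names:
            continue
        # Each <author> child yields one binding of $a.
        bindings.extend(a.text.strip() for a in book.findall("author") if a.text)
    return bindings

authors = match_authors(BIB, "french", "Morgan Kaufmann")
print(authors)
```

Only the first book satisfies both the attribute and the publisher condition, so only its two authors are returned.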
Transformations in XML-QL
Template
where <book language = $l>
<author> $a </>
</> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>
Note: </> abbreviates </book> or </result> or ...
<result> <author>. . .</author><lang>. . .</lang></result>
<result> <author>. . .</author><lang>. . .</lang></result>
<result> <author>. . .</author><lang>. . .</lang></result>
Transformations in XML-QL
Skolem Functions in Templates
where <book language = $l>
<author> $a </>
</> in "www.a.b.c/bib.xml"
construct <result> <author id=F($a)> $a</>
<lang> $l </> </>
<result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>
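The grouping effect shown above (one result per author, collecting all of that author's languages) can be imitated in Python with a dictionary keyed by the Skolem term. This is a rough sketch of the effect only, assuming the ($a, $l) bindings have already been extracted and are distinct; the <results> wrapper element, the sample bindings, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

def construct_with_skolem(bindings):
    """Group ($a, $l) bindings so each distinct author yields one <result>."""
    root = ET.Element("results")  # hypothetical wrapper element
    by_term = {}                  # Skolem term F($a) -> its <result> element
    for author, lang in bindings:
        term = f"F({author})"
        if term not in by_term:
            # First time this term is seen: create the element once.
            result = ET.SubElement(root, "result")
            ET.SubElement(result, "author", id=term).text = author
            by_term[term] = result
        # Later bindings with the same term extend the existing element.
        ET.SubElement(by_term[term], "lang").text = lang
    return root

# Hypothetical bindings of ($a, $l).
bindings = [("Hugo", "french"), ("Eco", "italian"), ("Hugo", "english")]
grouped = construct_with_skolem(bindings)
print(ET.tostring(grouped, encoding="unicode"))
```

Without the Skolem term, each binding would produce its own <result>, as on the previous slide.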
Data Integration in XML-QL
{ where <book> <isbn> $n </> <title> $t </> </>
  in "www.books.com"
  construct <result id=F($n)> <title> $t </> </> }
{ where <review> <isbn> $n </> <review> $r </> </>
  in "www.reviews.com"
  construct <result id=F($n)> <review> $r </> </> }
<result id=".."> <title>. . .</title>
<review>. . .</review>
<review>. . .</review>
</result>
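Because both blocks construct <result id=F($n)>, titles and reviews with the same ISBN land in the same result element. A minimal Python sketch of that fusion, assuming the (isbn, title) and (isbn, review) pairs have already been extracted from the two sources; the sample data and the function names are hypothetical, and the values are assumed to contain no characters that need XML escaping:

```python
def integrate(books, reviews):
    """Fuse (isbn, title) and (isbn, review) pairs into one record per ISBN."""
    results = {}  # Skolem term F(isbn) -> fused record

    def result_for(isbn):
        # The same ISBN always maps to the same record, whichever source it came from.
        return results.setdefault(f"F({isbn})", {"title": [], "review": []})

    for isbn, title in books:
        result_for(isbn)["title"].append(title)
    for isbn, review in reviews:
        result_for(isbn)["review"].append(review)
    return results

def to_xml(results):
    """Render each fused record as one <result> line."""
    lines = []
    for term, record in results.items():
        children = [f"<title>{t}</title>" for t in record["title"]]
        children += [f"<review>{r}</review>" for r in record["review"]]
        lines.append(f'<result id="{term}">' + "".join(children) + "</result>")
    return lines

# Hypothetical data from the two sources.
books = [("111", "Book One"), ("222", "Book Two")]
reviews = [("111", "Good"), ("111", "Dense"), ("333", "Unmatched review")]

fused = integrate(books, reviews)
for line in to_xml(fused):
    print(line)
```

An ISBN that appears in only one source still gets a result element, here with only a title or only a review.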
RXL:
Export Legacy Data To XML
• legacy data
– fragmented into many flat relations
– 3rd normal form
– schema is proprietary
• XML data
– nested
– un-normalized
– schema designed by agreement
RXL: An Example
• relational database:
Store(sid, name)
SB(sid, bid)
Book(bid, title)
• virtual XML view:
<store> <name> n1 </name>
<book> ... </book>
<book> ... </book>
...
</store>
<store> <name> n2 </name>
<book> ... </book>
<book> ... </book>
…
</store>
A Simple RXL Query
• specify XML view declaratively
from Store, SB, Book
where Store.sid=SB.sid and
      SB.bid=Book.bid
construct <store ID=f(Store.sid)>
            <name> Store.name </name>
            <book> Book.title </book>
          </store>
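To make the view definition concrete, here is a rough Python sketch that materializes the same view from an in-memory SQLite database. The sample rows, the <view> wrapper element, and the function name `build_view` are hypothetical, and an RXL system would not necessarily materialize the view at all; the point is only that the Skolem term f(Store.sid) yields one <store> element per store, with one <book> child per joined row.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_view(conn):
    """Materialize the XML view: one <store> per sid, one <book> per joined row."""
    rows = conn.execute(
        "SELECT Store.sid, Store.name, Book.title "
        "FROM Store, SB, Book "
        "WHERE Store.sid = SB.sid AND SB.bid = Book.bid "
        "ORDER BY Store.sid, Book.bid"
    )
    root = ET.Element("view")  # hypothetical wrapper element
    stores = {}                # sid -> <store> element, playing the role of f(Store.sid)
    for sid, name, title in rows:
        if sid not in stores:
            store = ET.SubElement(root, "store", ID=f"f({sid})")
            ET.SubElement(store, "name").text = name
            stores[sid] = store
        ET.SubElement(stores[sid], "book").text = title
    return root

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store(sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE SB(sid INTEGER, bid INTEGER);
CREATE TABLE Book(bid INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical sample rows.
INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Foundations of Databases');
INSERT INTO SB VALUES (1, 10), (1, 11), (2, 10);
""")

view = build_view(conn)
print(ET.tostring(view, encoding="unicode"))
```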
RXL: Querying the XML View
• users ask XML-QL queries:
– find stores that sell “The Calculus”
where <store> <name> $n </name>
        <book> The Calculus </book>
      </store>
construct <result> $n </result>
RXL: Query composition
[Figure: RXL maps the relations Store(sid, name), SB(sid, bid), and Book(bid, title) to the virtual XML view of <store> elements; XML-QL queries are posed against that view.]
system composes query with view:
from Store, SB, Book
where Store.sid=SB.sid and
      SB.bid=Book.bid and
      Book.title="The Calculus"
construct <result> Store.name </result>
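The composed query touches only the relations, so the XML view never has to be built. A minimal Python sketch of evaluating it directly in SQL, assuming an in-memory SQLite database with hypothetical sample rows (the function name `stores_selling` is also hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store(sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE SB(sid INTEGER, bid INTEGER);
CREATE TABLE Book(bid INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical sample rows.
INSERT INTO Store VALUES (1, 'n1'), (2, 'n2'), (3, 'n3');
INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Foundations of Databases');
INSERT INTO SB VALUES (1, 10), (1, 11), (2, 10), (3, 11);
""")

def stores_selling(conn, title):
    """Run the composed query and wrap each store name in <result>."""
    rows = conn.execute(
        "SELECT Store.name FROM Store, SB, Book "
        "WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ? "
        "ORDER BY Store.sid",
        (title,),
    )
    return [f"<result>{name}</result>" for (name,) in rows]

results = stores_selling(conn, "The Calculus")
print(results)
```

The title condition from the XML-QL pattern has become an ordinary selection on Book.title, which the relational engine can evaluate with its usual join and index machinery.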
Compressing XML Data
• for exchange and archiving
• can use a general tool (gzip)
• but a specialized tool is twice as good (XMill)
XMill Example: Weblogs
202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478|-|-|http://www02.so-net.or.jp/|Mozilla/3.01 [ja] (Win95; I)
<apache:entry>
<apache:host>202.239.238.16</apache:host>
<apache:requestLine>GET / HTTP/1.0</apache:requestLine>
<apache:contentType>text/html</apache:contentType>
<apache:statusCode>200</apache:statusCode>
<apache:date>1997/10/01-00:00:02</apache:date>
<apache:byteCount>4478</apache:byteCount>
<apache:referer>http://www02.so-net.or.jp/</apache:referer>
<apache:userAgent>Mozilla/3.01 [ja] (Win95; I)</apache:userAgent>
</apache:entry>
XMill Example: Weblogs
weblog.dat: 15.9MB
weblog.xml: 24.2MB
weblog.dat.gz: 1.6MB
weblog.xml.gz: 2.1MB
xmill -p // weblog.xml weblog1.xmi
weblog1.xmi: 1.75MB
xmill weblog.xml weblog2.xmi
weblog2.xmi: 1.33MB
xmill -f settings.pz weblog.xml weblog3.xmi
weblog3.xmi: 0.82MB
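The sizes above can be turned into compression ratios with a few lines of Python. The numbers are copied from the slide; the dictionary and function names are only illustrative.

```python
# Sizes in MB, copied from the slide.
SIZES_MB = {
    "weblog.xml": 24.2,
    "weblog.xml.gz": 2.1,
    "weblog1.xmi": 1.75,
    "weblog2.xmi": 1.33,
    "weblog3.xmi": 0.82,
}

def ratio(original, compressed):
    """How many times smaller `compressed` is than `original`."""
    return SIZES_MB[original] / SIZES_MB[compressed]

for name in ("weblog.xml.gz", "weblog1.xmi", "weblog2.xmi", "weblog3.xmi"):
    print(f"{name}: {ratio('weblog.xml', name):.1f}x smaller than weblog.xml")

# Gain over plain gzip on the same XML file.
default_gain = ratio("weblog.xml.gz", "weblog2.xmi")
tuned_gain = ratio("weblog.xml.gz", "weblog3.xmi")
```

With these figures the default xmill run produces a file about 1.6 times smaller than gzip's, and the run with settings.pz about 2.6 times smaller, which also puts it below the gzipped raw log (1.6MB).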
XMill: Fine Tuning the Compression
-p//apache:host=>seqcomb(u8 "." u8 "." u8 "." u8)
-p//apache:userAgent=>seq(e "/" e)
-p//apache:byteCount=>u
-p//apache:statusCode=>e
-p//apache:contentType=>e
-p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
-p//apache:date=>seq(u "/" u8 "/" u8 "-" u8 ":" di ":" di)
-p//apache:referer=>or(seq("file:" t)
seq("http://" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)
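Each -p line above pairs a path with a compressor for the values found under that path. The Python sketch below illustrates that general idea only: values are grouped into one container per element name, a type-specific encoder is applied where one is known, and each container is then compressed separately with zlib. It is not XMill's implementation or file format. Reading u8 as "unsigned 8-bit integer" is an assumption, the sample log and function names are hypothetical, and the apache: prefix is dropped so the sample parses without a namespace declaration.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def encode_host(text):
    """Pack a dotted IPv4 address into 4 bytes, one unsigned byte per part."""
    return bytes(int(part) for part in text.split("."))

def decode_host(data):
    """Inverse of encode_host for a single 4-byte address."""
    return ".".join(str(b) for b in data)

def containers_by_tag(xml_text):
    """Group element text by tag name, one container (list) per tag."""
    containers = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        if elem.text and elem.text.strip():
            containers[elem.tag].append(elem.text.strip())
    return containers

# Hypothetical two-entry log.
LOG = """<log>
<entry><host>202.239.238.16</host><statusCode>200</statusCode></entry>
<entry><host>202.239.238.17</host><statusCode>404</statusCode></entry>
</log>"""

groups = containers_by_tag(LOG)

# Type-specific encoding: 4 bytes per address instead of up to 15 characters.
packed_hosts = b"".join(encode_host(h) for h in groups["host"])

# Generic fallback: compress each container separately, so similar values sit together.
compressed = {
    tag: zlib.compress("\n".join(values).encode("utf-8"))
    for tag, values in groups.items()
}
```

On a two-entry sample the compressed containers are not smaller than the input; any benefit of grouping similar values would only show on large files such as the weblog above.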
Storing XML Data
• Scenario:
– receive a large XML data instance
– want to store and manage it
• Could build an XML management system from scratch (eXcelon)
• Preferably: use existing database systems
Storing XML:
Ternary Relation
[Figure: data graph. &o1 -paper-> &o2; &o2 -title-> &o3 ("The Calculus"); &o2 -author-> &o4 ("…"); &o2 -author-> &o5 ("…"); &o2 -year-> &o6 ("1986")]
Ref
Source  Label   Dest
&o1     paper   &o2
&o2     title   &o3
&o2     author  &o4
&o2     author  &o5
&o2     year    &o6
Val
Node  Value
&o3   The Calculus
&o4   …
&o5   …
&o6   1986
[Florescu, Kossmann 1999]
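A minimal Python sketch of this storage scheme, using an in-memory SQLite database loaded with the Ref and Val rows from the slide (the author values are elided on the slide and are stored here as the literal "…"). The query and the function name are illustrative assumptions; they show that each step along a path in the XML becomes one self-join on Ref.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ref(Source TEXT, Label TEXT, Dest TEXT);
CREATE TABLE Val(Node TEXT, Value TEXT);
INSERT INTO Ref VALUES
  ('&o1', 'paper',  '&o2'),
  ('&o2', 'title',  '&o3'),
  ('&o2', 'author', '&o4'),
  ('&o2', 'author', '&o5'),
  ('&o2', 'year',   '&o6');
INSERT INTO Val VALUES
  ('&o3', 'The Calculus'),
  ('&o4', '…'),
  ('&o5', '…'),
  ('&o6', '1986');
""")

def titles_of_papers_from(conn, year):
    """Titles of papers with the given year: one Ref self-join per path step."""
    rows = conn.execute(
        """
        SELECT t.Value
        FROM Ref p
        JOIN Ref rt ON rt.Source = p.Dest AND rt.Label = 'title'
        JOIN Val t  ON t.Node = rt.Dest
        JOIN Ref ry ON ry.Source = p.Dest AND ry.Label = 'year'
        JOIN Val y  ON y.Node = ry.Dest
        WHERE p.Label = 'paper' AND y.Value = ?
        """,
        (year,),
    )
    return [value for (value,) in rows]

titles = titles_of_papers_from(conn, "1986")
print(titles)
```

The schema is generic and works for any XML input, at the cost of several joins for even a short path.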
Storing XML:
Derive Schema from DTD
• DTD:
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
• ODMG classes:
class Employee public type tuple
(name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)
• [Christophides et al. 1994, Shanmugasundaram et al. 1999]
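The mapping from element declarations to class fields can be sketched in Python as follows. This is a simplified illustration that handles only flat sequence content models like the two above; it is not the algorithm of either cited paper, and the function names and the "single"/"list"/"optional" labels are hypothetical.

```python
import re

DTD = """
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
"""

def fields_from_content_model(model):
    """Map a flat sequence content model to (field, kind) pairs."""
    fields = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        if item.endswith(("*", "+")):
            fields.append((item[:-1], "list"))      # repeated child -> list-valued field
        elif item.endswith("?"):
            fields.append((item[:-1], "optional"))  # optional child -> nullable field
        else:
            fields.append((item, "single"))         # exactly one child -> plain field
    return fields

def parse_dtd(dtd_text):
    """Return {element name: [(field, kind), ...]} for each <!ELEMENT ...> declaration."""
    pattern = re.compile(r"<!ELEMENT\s+(\S+)\s+(\([^>]*\))\s*>")
    return {
        name: fields_from_content_model(model)
        for name, model in pattern.findall(dtd_text)
    }

schema = parse_dtd(DTD)
for name, fields in schema.items():
    print(name, fields)
```

Here project* becomes a list-valued field, matching project:List(Project) in the ODMG class, while the four children of address become plain fields.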
STORED Approach:
Mine Data to Derive Schema
[Figure: a data graph of paper elements with author (fn, ln), title, and year children, and the two relations mined from it: Paper1(fn1, ln1, fn2, ln2, title, year), whose cells are marked X or -, and Paper2(author, title).]
[Deutsch et al. 1999]
Summary
• XML - simple (?), lightweight syntax
• Challenge: build bridges to existing database tools
• XML in data exchange: YES
• XML as a new data model: NO
More Info
http://www.research.att.com/~suciu
Data on the Web:
From Relational to Semistructured to XML
Morgan Kaufmann, 1999