Tools for XML Data Exchange
Dan Suciu
AT&T Labs
Joint work with Mary Fernandez

XML Has Many Facets
• XML for fancier Web pages
  – XML generated with structural editors
• XML for messaging
  – generated during applications
• XML for Data Exchange
  – generated from legacy data

XML in Data Exchange
• communities agree on a common DTD
• export their data in XML
• exchange over HTTP
• applications understand only that DTD

An Example of XML Data

    <book>
      <publisher> Addison-Wesley </publisher>
      <author> Serge Abiteboul </author>
      <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
      <author> Victor Vianu </author>
      <title> Foundations of Databases </title>
      <year> 1995 </year>
    </book>
    <book>
      <publisher> Freeman </publisher>
      <author> Jeffrey D. Ullman </author>
      <title> Principles of Database and Knowledge Base Systems </title>
      <year> 1998 </year>
    </book>

XML Exchange Vision
[Figure: relational data, object-relational data, and legacy data are exchanged as XML Data over the WEB (HTTP); Transform, Integrate, and Warehouse components sit between the sources and the applications.]

Tools
• export legacy data to XML
  – RXL
• query/transform/integrate XML data
  – XML-QL
• compress XML data
  – XMill
• store/process incoming XML data
  – STORED

XML-QL: A Query Language for XML
• http://www.w3.org/TR/NOTE-xml-ql (8/98)
• W3C new Working Group on QL (9/99)
• XML-QL characteristics:
  – relationally complete (like SQL)
  – XML input, XML output
  – queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]

Querying in XML-QL
[Callout on the where clause: Pattern]

    where <book language="french">
            <publisher> <name> Morgan Kaufmann </name> </publisher>
            <author> $a </author>
          </book> in "www.a.b.c/bib.xml"
    construct $a

Transformations in XML-QL
[Callout on the construct clause: Template]

    where <book language = $l>
            <author> $a </>
          </> in "www.a.b.c/bib.xml"
    construct <result> <author> $a </> <lang> $l </> </>

Note: </> abbreviates </book> or </result> or ...

    <result> <author>. . .</author> <lang>. . .</lang> </result>
    <result> <author>. . .</author> <lang>. . .</lang> </result>
    <result> <author>. . .</author> <lang>. . .</lang> </result>

Transformations in XML-QL
Skolem Functions in Templates

    where <book language = $l>
            <author> $a </>
          </> in "www.a.b.c/bib.xml"
    construct <result> <author id=F($a)> $a </> <lang> $l </> </>

    <result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>
    <result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>

Data Integration in XML-QL

    { where <book> <isbn> $n </> <title> $t </> </> in "www.books.com"
      construct <result id=F($n)> <title> $t </> </> }
    { where <review> <isbn> $n </> <review> $r </> </> in "www.reviews.com"
      construct <result id=F($n)> <review> $r </> </> }

    <result id=".."> <title>. . .</title> <review>. . .</review> <review>. . .</review> </result>

RXL: Export Legacy Data To XML
• legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – schema is proprietary
• XML data
  – nested
  – un-normalized
  – schema designed by agreement

RXL: An Example
• relational database:

    Store           SB             Book
    sid   name      sid   bid      bid   title
    …     …         …     …        …     …
    …     …         …     …        …     …

• virtual XML view:

    <store>
      <name> n1 </name>
      <book> ... </book>
      <book> ... </book>
      ...
    </store>
    <store>
      <name> n2 </name>
      <book> ... </book>
      <book> ... </book>
      …
    </store>

A Simple RXL Query
• specify XML view declaratively

    from Store, SB, Book
    where Store.sid=SB.sid and SB.bid=Book.bid
    construct <store ID=f(Store.sid)>
                <name> Store.name </name>
                <book> Book.title </book>
              </store>

RXL: Querying the XML View
• users ask XML-QL queries:
  – find stores that sell "The Calculus"

    where <store> <name> $n </name> <book> The Calculus </book> </store>
    construct <result> $n </result>

RXL: Query composition
[Figure: the Store(sid, name), SB(sid, bid), and Book(bid, title) relations feed the RXL view, which produces the <store> XML view queried with XML-QL.]
• system composes query with view:

    from Store, SB, Book
    where Store.sid=SB.sid and SB.bid=Book.bid and Book.title="The Calculus"
    construct <result> Store.name </result>

Compressing XML Data
• for exchange and archiving
• can use a general tool (gzip)
• but a specialized tool is twice as good (XMill)

XMill Example: Weblogs

    202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478|-|-|http://www02.so-net.or.jp/|Mozilla/3.01 [ja] (Win95; I)

    <apache:entry>
      <apache:host>202.239.238.16</apache:host>
      <apache:requestLine>GET / HTTP/1.0</apache:requestLine>
      <apache:contentType>text/html</apache:contentType>
      <apache:statusCode>200</apache:statusCode>
      <apache:date>1997/10/01-00:00:02</apache:date>
      <apache:byteCount>4478</apache:byteCount>
      <apache:referer>http://www02.so-net.or.jp/</apache:referer>
      <apache:userAgent>Mozilla/3.01 [ja] (Win95; I)</apache:userAgent>
    </apache:entry>

XMill Example: Weblogs

    weblog.dat:     15.9MB      weblog.xml:     24.2MB
    weblog.dat.gz:   1.6MB      weblog.xml.gz:   2.1MB

    xmill -p // weblog.xml weblog1.xmi            weblog1.xmi: 1.75MB
    xmill weblog.xml weblog2.xmi                  weblog2.xmi: 1.33MB
    xmill -f settings.pz weblog.xml weblog3.xmi   weblog3.xmi: 0.82MB

XMill: Fine Tuning the Compression

    -p//apache:host=>seqcomb(u8 "." u8 "." u8 "." u8)
    -p//apache:userAgent=>seq(e "/" e)
    -p//apache:byteCount=>u
    -p//apache:statusCode=>e
    -p//apache:contentType=>e
    -p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
    -p//apache:date=>seq(u "/" u8 "/" u8 "-" u8 ":" di ":" di)
    -p//apache:referer=>or(seq("file:" t) seq("http://" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)

Storing XML Data
• Scenario:
  – receive a large XML data instance
  – want to store, manage it
• Could build an XML management system from scratch (eXcelon)
• Preferably: use existing database systems

Storing XML: Ternary Relation
[Figure: tree rooted at &o1 with a paper edge to &o2; from &o2, a title edge to &o3 ("The Calculus"), author edges to &o4 ("…") and &o5 ("…"), and a year edge to &o6 ("1986").]

    Ref                          Val
    Source  Label   Dest         Node  Value
    &o1     paper   &o2          &o3   The Calculus
    &o2     title   &o3          &o4   …
    &o2     author  &o4          &o5   …
    &o2     author  &o5          &o6   1986
    &o2     year    &o6

[Florescu, Kossmann 1999]

Storing XML: Derive Schema from DTD
• DTD:

    <!ELEMENT employee (name, address, project*)>
    <!ELEMENT address (street, city, state, zip)>

• ODMG classes:

    class Employee public type tuple (name:string, address:Address, project:List(Project))
    class Address public type tuple (street:string, …)

• [Christophides et al. 1994, Shanmugasundaram et al. 1999]

STORED Approach: Mine Data to Derive Schema
[Figure: several paper trees whose children are author (with fn and ln), title, and year, mapped to two mined relations: Paper1(fn1, ln1, fn2, ln2, title, year) and Paper2(author, title). Cells are marked X where a value is present and - where it is missing.]
[Deutsch et al. 1999]

Summary
• XML - simple (?), lightweight syntax
• Challenge: build bridges to existing database tools
• XML in data exchange: YES
• XML as a new data model: NO

More Info
http://www.research.att.com/~suciu
Data on the Web: From Relational to Semistructured to XML
Morgan Kaufmann, 1999
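
Illustration: the ternary-relation storage scheme from the "Storing XML: Ternary Relation" slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation from the cited work: the function names (shred, load, values_at), the use of SQLite, and the sample author names are all hypothetical, chosen only for illustration. The Ref(Source, Label, Dest) and Val(Node, Value) tables and the &o1 ... &o6 object identifiers follow the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Sample document mirroring the slide; the author names are placeholders
# (the slide elides them as "…").
SAMPLE = (
    "<paper><title>The Calculus</title>"
    "<author>Author A</author><author>Author B</author>"
    "<year>1986</year></paper>"
)


def shred(xml_text):
    """Split an XML string into Ref (edge) rows and Val (leaf value) rows."""
    oids = count(1)
    ref_rows, val_rows = [], []

    def visit(elem, parent_oid):
        oid = f"&o{next(oids)}"
        ref_rows.append((parent_oid, elem.tag, oid))  # edge parent -> element
        children = list(elem)
        if children:
            for child in children:
                visit(child, oid)
        else:
            # Leaf element: keep its text in the Val table.
            val_rows.append((oid, (elem.text or "").strip()))

    root_oid = f"&o{next(oids)}"  # unnamed document root, &o1 on the slide
    visit(ET.fromstring(xml_text), root_oid)
    return ref_rows, val_rows


def load(conn, ref_rows, val_rows):
    """Create the two relations and bulk-insert the shredded rows."""
    conn.execute("CREATE TABLE Ref (Source TEXT, Label TEXT, Dest TEXT)")
    conn.execute("CREATE TABLE Val (Node TEXT, Value TEXT)")
    conn.executemany("INSERT INTO Ref VALUES (?, ?, ?)", ref_rows)
    conn.executemany("INSERT INTO Val VALUES (?, ?)", val_rows)


def values_at(conn, parent_label, child_label):
    """Return leaf values reached by the two-step path parent_label/child_label."""
    sql = """
        SELECT v.Value
        FROM Ref p
        JOIN Ref c ON c.Source = p.Dest
        JOIN Val v ON v.Node = c.Dest
        WHERE p.Label = ? AND c.Label = ?
        ORDER BY v.Node
    """
    return [row[0] for row in conn.execute(sql, (parent_label, child_label))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, *shred(SAMPLE))
    print(values_at(conn, "paper", "title"))
    print(values_at(conn, "paper", "author"))
```

Each step of a path expression becomes one self-join on Ref, which is why a path such as paper/author needs two copies of Ref plus Val. That join cost is one motivation for the schema-deriving alternatives on the following slides.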