Managing XML and Semistructured Data
Part 3: Query Languages

In this section…
Lorel (A Lightweight Object REpository Language developed at Stanford)
XPath specification
• data model
• Examples [xpath, axis]
• syntax
XQuery
• FLWR expressions
• FOR and LET expressions
• Collections and sorting
(XML-QL: the earlier version, from AT&T Labs)
Resources:
• The Lorel Query Language for Semistructured Data, by Abiteboul, Quass, McHugh, Widom, Wiener, in International Journal on Digital Libraries, 1997.
• A formal semantics of patterns in XSLT, by Phil Wadler.
• XML Path Language (XPath): www.w3.org/TR/xpath
• XQuery: A Query Language for XML, by Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/

Querying XML Data
A core query language (extracting + restructuring)
XPath (core expressions) allows simple navigation through the tree
XQuery is used as the SQL of XML
XSLT (Extensible Stylesheet Language Transformations) = recursive traversal based on pattern matching
• will not discuss here

Sample Data for Queries
<biblio>
  <paper> … </paper>
  <book>
    <author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux </author>
    <author> Combalusier </author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>

A Core Query Language
A SQL-like language for querying semi-structured data
Will illustrate with: XML DB =
[Figure: the sample data drawn as an edge-labeled tree. The root &o1 is reached by the edge biblio and has one paper child (&o12) and two book children (&o24, &o29); the books carry author, date and title edges leading to the values Smith, 1999, Database Systems and Roux, Combalusier, 1976, Database Systems.]

Query 1:
  SELECT author: X
  FROM biblio.book.author X
[Figure: the sample tree with an answer node added, whose author edges point to the three author objects.]
Answer = {author: "Smith", author: "Roux", author: "Combalusier"}

Query 2:
  SELECT row: X
  FROM biblio._ X
  WHERE "Smith" in X.author
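As an illustration of how Query 1 and Query 2 could be evaluated, the Python sketch below models the sample data as nested lists of (label, value) pairs. The representation and the helper names `children`, `query1` and `query2` are assumptions made for this example and are not part of the core language. The contents of the paper element are elided in the sample data, so it is modeled here as empty.

```python
# Edge-labeled tree as nested lists of (label, value) pairs.
# Labels may repeat (two author edges), so a dict would not do.
biblio = [
    ("paper", []),  # contents elided in the sample data; modeled as empty
    ("book", [("author", "Smith"), ("date", 1999),
              ("title", "Database Systems")]),
    ("book", [("author", "Roux"), ("author", "Combalusier"),
              ("date", 1976), ("title", "Database Systems")]),
]


def children(node, label=None):
    """Values reached from node by one edge; label=None acts as the wildcard `_`."""
    return [value for (edge, value) in node if label is None or edge == label]


def query1(db):
    # SELECT author: X FROM biblio.book.author X
    return [("author", x)
            for book in children(db, "book")
            for x in children(book, "author")]


def query2(db):
    # SELECT row: X FROM biblio._ X WHERE "Smith" in X.author
    return [("row", x) for x in children(db) if "Smith" in children(x, "author")]


print(query1(biblio))
print(query2(biblio))
```

Because the paper is modeled as empty, this sketch yields a single row for Query 2: the book whose author is Smith.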
[Figure: the sample tree with an answer node added, whose row edges point to the matching objects.]
Answer = {row: {author: "Smith", date: 1999, title: "Database…"}, row: … }

Query 3:
  SELECT row: ( SELECT author: Y
                FROM X.author Y)
  FROM biblio.book X
[Figure: the sample tree with an answer node and two new row nodes (&a1, &a2), whose author edges point to the existing author objects.]
Answer = {row: {author: "Smith"}, row: {author: "Roux", author: "Combalusier"}}

Query 4:
  SELECT ( SELECT row: {author: Y, title: T}
           FROM X.author Y, X.title T)
  FROM biblio.book X
  WHERE "Roux" in X.author
[Figure: the sample tree with an answer node and two new row nodes (&a1, &a2), each with one author edge and one title edge into the second book.]
Answer = {row: {author: "Roux", title: "Database…"}, row: {author: "Combalusier", title: "Database…"}}

Lorel
Minor syntactic differences in regular path expressions (% instead of _, # instead of _*)
Common path convention:
  SELECT biblio.book.author
  FROM biblio.book
  WHERE biblio.book.year = 1999
becomes:
  SELECT X.author
  FROM biblio.book X
  WHERE X.year = 1999

Lorel
Existential variables:
  SELECT biblio.book.year
  FROM biblio.book
  WHERE biblio.book.author = "Roux"
• What happens with books having multiple authors ?
Author is existentially quantified:
  SELECT X.year
  FROM biblio.book X, X.author Y
  WHERE Y = "Roux"

Lorel
Path variables. @P in:
  SELECT @P
  FROM biblio.# @P X
• What happens on graphs with cycles ?
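The question about cycles can be made concrete: on a cyclic graph, a path variable that ranges over all paths would have infinitely many bindings. One way to keep the answer finite, sketched here in Python, is to enumerate only paths that never visit the same object twice. This restriction, the function name `simple_paths`, and the `cites` edges that create the cycle are assumptions for illustration, not a description of Lorel's actual semantics.

```python
def simple_paths(graph, root):
    """Yield (labels, node) for every path from root that visits no object twice."""
    stack = [(root, (), frozenset([root]))]
    while stack:
        node, labels, seen = stack.pop()
        yield labels, node
        for label, target in graph.get(node, []):
            if target not in seen:  # skip edges that would close a cycle
                stack.append((target, labels + (label,), seen | {target}))


# Object ids follow the sample figure; the cites edges are hypothetical
# and make the two books point at each other.
graph = {
    "&o1": [("book", "&o24"), ("book", "&o29")],
    "&o24": [("author", "&o47"), ("cites", "&o29")],
    "&o29": [("cites", "&o24")],
    "&o47": [],
}

for labels, node in simple_paths(graph, "&o1"):
    print(".".join(labels) or "(empty path)", "->", node)
```

Without the `seen` check the loop would never terminate on this graph, since book.cites.cites.cites… can be extended forever.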
Constructing new results
• Several default rules
Casting between datatypes
• Very useful in practice

XPath
http://www.w3.org/TR/xpath (11/99)
Building block for other W3C standards:
• XSL Transformations (XSLT)
• XML Link (XLink)
• XML Pointer (XPointer)
• XML Query
Was originally part of XSL

XPath: Summary
  bib                 matches a bib element
  *                   matches any element
  /                   matches the root (the node above the root element)
  /bib                matches a bib element under root
  bib/paper           matches a paper in bib
  bib//paper          matches a paper in bib, at any depth
  //paper             matches a paper at any depth
  paper|book          matches a paper or a book
  @price              matches a price attribute
  bib/book/@price     matches price attribute in book, in bib
  bib/book[@price<"55"]/author/last-name     matches…

Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

Data Model for XPath
[Figure: the example document drawn as a tree. The root sits above the root element bib; bib has book children, each with publisher, author, … children that end in text nodes such as Addison-Wesley and Serge Abiteboul.]

XPath: Simple Expressions
  /bib/book/year
Result:
  <year> 1995 </year>
  <year> 1998 </year>
  /bib/paper/year
Result: empty (there were no papers)

XPath: Restricted Kleene Closure
  //author
Result:
  <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <author> Jeffrey D. Ullman </author>
  /bib//first-name
Result:
  <first-name> Rick </first-name>

XPath: Text Nodes
  /bib/book/author/text()
Result:
  Serge Abiteboul
  Victor Vianu
  Jeffrey D. Ullman
Rick Hull doesn't appear because his name is split into first-name and last-name elements
Functions in XPath:
• text() = matches the text value
• node() = matches any node (= * or @* or text())
• name() = returns the name of the current tag

XPath: Wildcard
  //author/*
Result:
  <first-name> Rick </first-name>
  <last-name> Hull </last-name>
* matches any element

XPath: Attribute Nodes
  /bib/book/@price
Result: "55"
@price means that price has to be an attribute

XPath: Predicates
  /bib/book/author[first-name]
Result:
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

XPath: More Predicates
  /bib/book/author[first-name][address[//zip][city]]/last-name
Result:
  <last-name> … </last-name>
  <last-name> … </last-name>

XPath: More Predicates
  /bib/book[@price < "60"]
  /bib/book[author/@age < "25"]
  /bib/book[author/text()]

XQuery
Based on Quilt (which is based on XML-QL)
http://www.w3.org/TR/xquery/ 2/2001
XML Query data model
• Ordered !
FLWOR (flower) Expressions
  FOR ... LET ... WHERE ... ORDER BY … RETURN ...

XQuery
Query: Find the titles of all books published after 1995:
  FOR $x IN document("bib.xml")/bib/book
  WHERE $x/year > 1995
  RETURN $x/title
* bib.xml is the document shown under “Example for XPath Queries”
Result:
  <title> Principles of Database… </title>

XQuery
Query: Find book titles by the coauthors of “Foundations of Databases”:
  FOR $x IN bib/book[title/text() = "Foundations …"]/author,
      $y IN bib/book[author/text() = $x/text()]/title
  RETURN <answer> $y/text() </answer>
The answer will contain duplicates !
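Why the duplicates appear can be seen by writing the two FOR bindings as nested loops. The Python sketch below uses the standard-library `xml.etree.ElementTree` on a trimmed copy of the bib.xml example. The helper names `own_text` and `titles_by_coauthors` are made up for this illustration, and the loops only mimic the query; they are not an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml example (publisher and year omitted).
BIB_XML = """<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>"""


def own_text(elem):
    """Stripped text directly inside elem, or None if it has none."""
    text = (elem.text or "").strip()
    return text or None


def titles_by_coauthors(root, title):
    answers = []
    for book in root.findall("book"):
        if own_text(book.find("title")) != title:
            continue
        for x in book.findall("author"):        # outer binding: $x
            if own_text(x) is None:             # author without a text value
                continue
            for other in root.findall("book"):  # inner binding: $y
                authors = other.findall("author")
                if any(own_text(a) == own_text(x) for a in authors):
                    answers.append(own_text(other.find("title")))
    return answers


root = ET.fromstring(BIB_XML)
print(titles_by_coauthors(root, "Foundations of Databases"))
# → ['Foundations of Databases', 'Foundations of Databases']
```

Each coauthor with a text value (Abiteboul and Vianu) contributes one copy of the title, so the list holds two identical entries.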
Result:
  <answer> Foundations … </answer>
  <answer> Foundations … </answer>

XQuery
Same as before, but eliminate duplicates:
  FOR $x IN bib/book[title/text() = "Foundations …"]/author,
      $y IN distinct(bib/book[author/text() = $x/text()]/title)
  RETURN <answer> $y/text() </answer>
distinct = a function that eliminates duplicates
Result:
  <answer> Foundations … </answer>

SQL and XQuery Side-by-side
Product(pid, name, maker)
Company(cid, name, city)
Query: Find all products made in Seattle
SQL:
  SELECT x.name
  FROM Product x, Company y
  WHERE x.maker=y.cid and y.city="Seattle"
XQuery:
  FOR $x IN /db/Product/row,
      $y IN /db/Company/row
  WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = "Seattle"
  RETURN $x/name
Cool XQuery:
  FOR $y IN /db/Company/row[city/text()="Seattle"],
      $x IN /db/Product/row[maker/text()=$y/cid/text()]
  RETURN $x/name

XQuery: Nesting
Query: For each author of a book by Morgan Kaufmann, list all books s/he published:
  FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
  RETURN <result>
           { $a,
             FOR $t IN /bib/book[author=$a]/title
             RETURN $t }
         </result>
Result:
  <result>
    <author> Jones </author>
    <title> abc </title>
    <title> def </title>
  </result>
  <result>
    <author> Smith </author>
    <title> ghi </title>
  </result>

XQuery
FOR $x IN expr -- binds $x to each value in the list expr
LET $x := expr -- binds $x to the entire list expr
• Useful for common subexpressions and for aggregations
  <big_publishers>
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p
  </big_publishers>
count = a (aggregate) function that returns the number of elements

XQuery
Query: Find books whose price is larger than average:
  FOR $a IN /bib/book
  LET $b := avg(/bib/book/price/text())
  WHERE $a/price/text() > $b
  RETURN $a

XQuery
Query: Find all publishers that published more than 100 books:
  <big_publishers>
    { FOR $p IN distinct(//publisher/text())
      LET $b := document("bib.xml")//book[publisher/text() = $p]
      WHERE count($b) > 100
      RETURN <publisher> $p </publisher> }
  </big_publishers>
$b is a collection of elements, not a single element
count = a (aggregate) function that returns the number of elements

FOR vs. LET
FOR
• Binds node variables: iteration
LET
• Binds collection variables: one value
Examples
  FOR $x IN document("bib.xml")/bib/book
  RETURN <result> $x </result>
Returns:
  <result> <book>...</book> </result>
  <result> <book>...</book> </result>
  <result> <book>...</book> </result>
  ...
  LET $x := document("bib.xml")/bib/book
  RETURN <result> $x </result>
Returns:
  <result>
    <book>...</book>
    <book>...</book>
    <book>...</book>
    ...
  </result>

Sorting in XQuery
  <publisher_list>
    FOR $p IN distinct(document("bib.xml")//publisher)
    RETURN <publisher>
             { <name> $p/text() </name> ,
               FOR $b IN document("bib.xml")//book[publisher = $p]
               RETURN <book>
                        { $b/title , $b/@price }
                      </book> SORTBY(price DESCENDING) }
           </publisher> SORTBY(name)
  </publisher_list>

Sorting in XQuery
Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause
To sort on an element you don't want to display, first return it, then remove it with an additional query.
  <publisher_list>
    FOR $p IN distinct(document("bib.xml")//publisher)
    RETURN <publisher>
             { <name> $p/text() </name> ,
               FOR $b IN document("bib.xml")//book[publisher = $p]
               RETURN <book>
                        { $b/title , $b/price }
                      </book> ORDER BY price DESCENDING }
           </publisher> ORDER BY name
  </publisher_list>

Collections in XQuery
Ordered and unordered collections
• /bib/book/author = an ordered collection
• distinct(/bib/book/author) = an unordered collection
  LET $b := /bib/book        $b is a collection
  $b/author                  a collection (several authors...)
  RETURN <result> $b/author </result>
Returns:
  <result>
    <author>...</author>
    <author>...</author>
    <author>...</author>
    ...
  </result>

If-Then-Else
  FOR $h IN //holding
  RETURN <holding>
           { $h/title,
             IF $h/@type = "Journal"
             THEN $h/editor
             ELSE $h/author }
         </holding> ORDER BY title

Quantifiers
Existential Quantifiers
  FOR $b IN //book
  WHERE SOME $p IN $b//para SATISFIES
        contains($p, "sailing") AND contains($p, "windsurfing")
  RETURN $b/title
Universal Quantifiers
  FOR $b IN //book
  WHERE EVERY $p IN $b//para SATISFIES
        contains($p, "sailing")
  RETURN $b/title

Other Stuff in XQuery
BEFORE and AFTER
• for dealing with order in the input
FILTER
• deletes some edges in the result tree
Recursive functions
• Currently: arbitrary recursion
• Perhaps more restrictions in the future ?

Group-By in XQuery ??
No GROUPBY currently in XQuery
A recent proposal (next)
• What do YOU think ?

Group-By in XQuery ??
with GROUPBY:
  FOR $b IN document("http://www.bn.com")/bib/book,
      $y IN $b/@year
  WHERE $b/publisher="Morgan Kaufmann"
  RETURN GROUPBY $y
         WHERE count($b) > 10
         IN <year> $y </year>
Equivalent SQL:
  SELECT year
  FROM Bib
  WHERE Bib.publisher="Morgan Kaufmann"
  GROUP BY year
  HAVING count(*) > 10

Group-By in XQuery ??
with GROUPBY:
  FOR $b IN document("http://www.bn.com")/bib/book,
      $a IN $b/author,
      $y IN $b/@year
  RETURN GROUPBY $a, $y
         IN <result> $a,
                     <year> $y </year>,
                     <total> count($b) </total>
            </result>
Without GROUPBY:
  FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib/book,
                           $a IN $b/author,
                           $y IN $b/@year
                       RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),
      $a IN $Tup/a/node(),
      $y IN $Tup/y/node()
  LET $b := document("http://www.bn.com")/bib/book[author=$a and @year=$y]
  RETURN <result> $a,
                  <year> $y </year>,
                  <total> count($b) </total>
         </result>

Group-By in XQuery ??
  FOR $b IN document("http://www.bn.com")/bib/book,
      $a IN $b/author,
      $y IN $b/@year,
      $t IN $b/title,
      $p IN $b/publisher
  RETURN GROUPBY $p, $y
         IN <result> $p,
                     <year> $y </year>,
                     GROUPBY $a
                     IN <authorEntry> $a,
                                      GROUPBY $t
                                      IN $t
                        </authorEntry>
            </result>
Nested GROUPBY's

XQuery Summary: [Demo]
FOR-LET-WHERE-RETURN = FLWR
FOR/LET Clauses
  List of tuples of bound variables
WHERE Clause
  List of pruned tuples of bound variables
RETURN Clause
  Instance of XQuery data model
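The pipeline in the summary can be sketched in Python as three passes over a list of variable bindings. This is only an analogy built on the standard-library `xml.etree.ElementTree`; the function name `flwr_titles_after`, the dictionary keys, and the choice to return (title, author count) pairs are assumptions for illustration, and the sample document is a copy of the bib.xml example.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""


def flwr_titles_after(root, year):
    # FOR: one tuple of bound variables per book (iteration).
    tuples = [{"x": book} for book in root.findall("book")]
    # LET: bind a whole collection (all authors of the book) in each tuple.
    for t in tuples:
        t["authors"] = t["x"].findall("author")
    # WHERE: prune the tuples that fail the condition.
    kept = [t for t in tuples if int(t["x"].findtext("year")) > year]
    # RETURN: build one result item per surviving tuple.
    return [(t["x"].findtext("title"), len(t["authors"])) for t in kept]


root = ET.fromstring(BIB_XML)
print(flwr_titles_after(root, 1995))
# → [('Principles of Database and Knowledge Base Systems', 1)]
```

The FOR step changes how many tuples exist, while the LET step only adds a variable to each tuple, which mirrors the FOR vs. LET distinction made earlier.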