Managing XML and
Semistructured Data
Part 3: Query Languages
1
In this section…
▪ Lorel (A Lightweight Object REpository Language, developed at Stanford)
▪ XPath specification
  • data model
  • examples [xpath, axis]
  • syntax
▪ XQuery
  • FLWR expressions
  • FOR and LET expressions
  • collections and sorting
  (XML-QL: the earlier version, from AT&T Labs)
▪ Resources:
  • Abiteboul, Quass, McHugh, Widom, Wiener. The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1997.
  • Phil Wadler. A Formal Semantics of Patterns in XSLT.
  • XML Path Language (XPath). www.w3.org/TR/xpath
  • Chamberlin, Florescu, et al. XQuery: A Query Language for XML. W3C recommendation: www.w3.org/TR/xquery/
2
Querying XML Data
▪ A core query language (extracting + restructuring)
▪ XPath (core expressions) allows simple navigation through the tree
▪ XQuery plays the role of SQL for XML
▪ XSLT (Extensible Stylesheet Language Transformations) = recursive traversal based on pattern matching; not discussed here
3
Sample Data for Queries
<biblio>
<paper>
…
</paper>
<book><author> Smith </author>
<date> 1999 </date>
<title> Database Systems </title>
</book>
<book >
<author> Roux</author>
<author> Combalusier</author>
<date> 1976 </date>
<title> Database Systems </title>
</book>
</biblio>
4
A Core Query Language
A SQL-like language for querying semi-structured data
Will illustrate with the following XML DB, viewed as an object graph:
[Figure: the data of slide 4 as a graph rooted at biblio (&o1), with a paper object (&o12) and two book objects (&o24, &o29) whose author, date and title children hold "Smith", 1999, "Database Systems" and "Roux", "Combalusier", 1976, "Database Systems".]
5
Query 1:
SELECT author: X
FROM biblio.book.author X
[Figure: the data graph, with the three author objects reached by biblio.book.author collected under a new answer node.]
Answer =
{author: "Smith",
 author: "Roux",
 author: "Combalusier"}
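Query 1 can be mimicked in Python with the standard xml.etree.ElementTree module, assuming the object graph is stored as the XML of slide 4. This is only an analogy, not Lorel. The path biblio.book.author becomes an ElementTree path evaluated from the root element, and each "label: value" pair of the answer becomes a tuple. BIBLIO and query1 are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Sample data of slide 4 (the elided <paper> contents are left empty).
BIBLIO = """<biblio>
  <paper/>
  <book><author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux</author>
    <author> Combalusier</author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>"""


def query1(xml_text):
    """SELECT author: X FROM biblio.book.author X"""
    biblio = ET.fromstring(xml_text)  # the root element stands for "biblio"
    # X ranges over every object reached by the path book.author;
    # each binding contributes one "author: X" pair to the answer.
    return [("author", x.text.strip()) for x in biblio.findall("book/author")]


print(query1(BIBLIO))
# [('author', 'Smith'), ('author', 'Roux'), ('author', 'Combalusier')]
```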
6
Query 2:
SELECT row: X
FROM biblio._ X
WHERE "Smith" in X.author
[Figure: the data graph, with a new answer node whose row edges point to the matching objects.]
Answer =
{row: {author: "Smith",
       date: 1999,
       title: "Database…"},
 row: …
}
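The same ElementTree encoding (an assumption, as before) shows the two new ingredients of Query 2. The wildcard _ lets X range over every child of biblio. The condition "Smith" in X.author is existential: it holds as soon as one author matches. Atomic values stay strings here, and Lorel's type coercion is not modeled. query2 and text_of are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Sample data of slide 4 (the elided <paper> contents are left empty).
BIBLIO = """<biblio>
  <paper/>
  <book><author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux</author>
    <author> Combalusier</author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>"""


def text_of(e):
    """Trimmed text of a leaf element ('' if it is empty)."""
    return (e.text or "").strip()


def query2(xml_text, name="Smith"):
    """SELECT row: X FROM biblio._ X WHERE name in X.author"""
    biblio = ET.fromstring(xml_text)
    answer = []
    for x in biblio:  # biblio._ : X ranges over every child, whatever its label
        # 'name in X.author' holds if at least one author of X has that value
        if any(text_of(a) == name for a in x.findall("author")):
            # row: X keeps the whole object, listed here as (label, value) pairs
            answer.append(("row", [(c.tag, text_of(c)) for c in x]))
    return answer


print(query2(BIBLIO))
# [('row', [('author', 'Smith'), ('date', '1999'), ('title', 'Database Systems')])]
```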
7
Query 3:
SELECT row: ( SELECT author: Y
              FROM X.author Y)
FROM biblio.book X
[Figure: the answer node has one new row object per book (&a1, &a2), each grouping that book's authors.]
Answer =
{row: {author: "Smith"},
 row: {author: "Roux",
       author: "Combalusier"}
}
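In Query 3 the inner SELECT is evaluated once per binding of X, so each book yields one row that groups its own authors. Below is a Python sketch under the same assumed XML encoding. query3 and text_of are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Sample data of slide 4 (the elided <paper> contents are left empty).
BIBLIO = """<biblio>
  <paper/>
  <book><author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux</author>
    <author> Combalusier</author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>"""


def text_of(e):
    """Trimmed text of a leaf element ('' if it is empty)."""
    return (e.text or "").strip()


def query3(xml_text):
    """SELECT row: (SELECT author: Y FROM X.author Y) FROM biblio.book X"""
    biblio = ET.fromstring(xml_text)
    answer = []
    for x in biblio.findall("book"):  # FROM biblio.book X
        # The inner query is evaluated once for each binding of X ...
        inner = [("author", text_of(y)) for y in x.findall("author")]
        # ... and its result becomes the value of one new row object.
        answer.append(("row", inner))
    return answer


print(query3(BIBLIO))
# [('row', [('author', 'Smith')]), ('row', [('author', 'Roux'), ('author', 'Combalusier')])]
```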
8
Query 4:
SELECT ( SELECT row: {author: Y, title: T}
         FROM X.author Y, X.title T)
FROM biblio.book X
WHERE "Roux" in X.author
[Figure: the answer node has one row object (&a1, &a2) per author/title pair of the matching book.]
Answer =
{row: {author: "Roux",
       title: "Database…"},
 row: {author: "Combalusier",
       title: "Database…"}
}
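Query 4 has two variables in the inner FROM clause, so a matching book contributes one row per (author, title) combination. As the Answer shows, these rows sit directly under the answer because the outer SELECT carries no label. Below is a sketch under the same assumed XML encoding. query4 and text_of are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Sample data of slide 4 (the elided <paper> contents are left empty).
BIBLIO = """<biblio>
  <paper/>
  <book><author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux</author>
    <author> Combalusier</author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>"""


def text_of(e):
    """Trimmed text of a leaf element ('' if it is empty)."""
    return (e.text or "").strip()


def query4(xml_text, name="Roux"):
    """SELECT (SELECT row: {author: Y, title: T} FROM X.author Y, X.title T)
    FROM biblio.book X WHERE name in X.author"""
    biblio = ET.fromstring(xml_text)
    answer = []
    for x in biblio.findall("book"):
        if not any(text_of(a) == name for a in x.findall("author")):
            continue  # the existential WHERE condition fails for this book
        # Two variables in the inner FROM: one row per (Y, T) combination.
        for y in x.findall("author"):
            for t in x.findall("title"):
                answer.append(("row", {"author": text_of(y), "title": text_of(t)}))
    return answer


print(query4(BIBLIO))
# [('row', {'author': 'Roux', 'title': 'Database Systems'}),
#  ('row', {'author': 'Combalusier', 'title': 'Database Systems'})]
```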
9
Lorel
▪ Minor syntactic differences in regular path expressions (% instead of _, # instead of _*)
▪ Common path convention:
SELECT biblio.book.author
FROM biblio.book
WHERE biblio.book.year = 1999
becomes:
SELECT X.author
FROM biblio.book X
WHERE X.year = 1999
10
Lorel
▪ Existential variables:
SELECT biblio.book.year
FROM biblio.book
WHERE biblio.book.author = "Roux"
  • What happens with books having multiple authors? The author is existentially quantified:
SELECT X.year
FROM biblio.book X, X.author Y
WHERE Y = "Roux"
11
Lorel
▪ Path variables. @P in:
SELECT @P
FROM biblio.# @P X
  • What happens on graphs with cycles?
▪ Constructing new results
  • Several default rules
▪ Casting between datatypes
  • Very useful in practice
12
XPath
▪ http://www.w3.org/TR/xpath (11/99)
▪ Building block for other W3C standards:
  • XSL Transformations (XSLT)
  • XML Link (XLink)
  • XML Pointer (XPointer)
  • XML Query
▪ Was originally part of XSL
13
XPath: Summary
bib                 matches a bib element
*                   matches any element
/                   matches the root (the document node, not the root element)
/bib                matches a bib element under the root
bib/paper           matches a paper in bib
bib//paper          matches a paper in bib, at any depth
//paper             matches a paper at any depth
paper|book          matches a paper or a book
@price              matches a price attribute
bib/book/@price     matches a price attribute in a book, in bib
14
bib/book[@price < "55"]/author/last-name matches…
Example for XPath Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>
15
Data Model for XPath
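[Figure: the bib document as a tree. The root (the document node) has one child, the root element bib. Below bib are the book elements, and their children (publisher, author, …) lead to text nodes such as "Addison-Wesley" and "Serge Abiteboul".]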
16
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
<year> 1998 </year>
/bib/paper/year
Result: empty
(there were no papers)
17
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>
/bib//first-name
Result: <first-name> Rick </first-name>
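Python's xml.etree.ElementTree implements a limited subset of XPath, which is enough to try the expressions of the last two slides on bib.xml. Its paths are relative to the element they are called on. So /bib/book/year is written book/year from the bib element, and //author becomes .//author. A minimal sketch:

```python
import xml.etree.ElementTree as ET

# bib.xml from slide 15 (straight quotes, so that it parses).
BIB = """<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>"""

bib = ET.fromstring(BIB)  # the root element <bib>

years = [y.text.strip() for y in bib.findall("book/year")]        # /bib/book/year
paper_years = bib.findall("paper/year")                           # /bib/paper/year
authors = bib.findall(".//author")                                # //author
first_names = [f.text.strip() for f in bib.findall(".//first-name")]  # /bib//first-name

print(years, paper_years, len(authors), first_names)
# ['1995', '1998'] [] 4 ['Rick']
```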
18
XPath: Text Nodes
/bib/book/author/text()
Result:
Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman
! Rick Hull doesn't appear because his author element has first-name and last-name children instead of text
Functions in XPath:
• text() = matches a text node
• node() = matches any node, whatever its type (element, attribute, text, …)
• name() = returns the name of the current tag
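ElementTree has no text() step. An element's .text holds only the text that precedes its first child; later text is stored in the children's .tail. The sketch below therefore approximates /bib/book/author/text() by skipping whitespace-only text, which matches how the slide treats Rick Hull. author_texts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# bib.xml from slide 15 (straight quotes, so that it parses).
BIB = """<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>"""


def author_texts(root):
    """Approximate /bib/book/author/text(), ignoring whitespace-only text."""
    result = []
    for a in root.findall("book/author"):
        text = (a.text or "").strip()  # text before the first child, if any
        if text:
            result.append(text)
    return result


bib = ET.fromstring(BIB)
print(author_texts(bib))
# ['Serge Abiteboul', 'Victor Vianu', 'Jeffrey D. Ullman']
```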
19
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>
* Matches any element
20
XPath: Attribute Nodes
/bib/book/@price
Result: "55"
@price means that price has to be an
attribute
21
XPath: Predicates
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
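The wildcard, attribute and predicate examples map onto ElementTree's XPath subset as shown below. Its paths can only select elements, so the price attribute is read with get(). A sketch on the same bib.xml:

```python
import xml.etree.ElementTree as ET

# bib.xml from slide 15 (straight quotes, so that it parses).
BIB = """<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>"""

bib = ET.fromstring(BIB)

# //author/* : element children of any author
children = [e.tag for e in bib.findall(".//author/*")]
# /bib/book/@price : select books having the attribute, then read it
prices = [b.get("price") for b in bib.findall("book[@price]")]
# /bib/book/author[first-name] : authors that have a first-name child
with_first = bib.findall("book/author[first-name]")

print(children)                         # ['first-name', 'last-name']
print(prices)                           # ['55']
print([e.tag for e in with_first[0]])   # ['first-name', 'last-name']
```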
22
XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name
Result: <last-name> … </last-name>
<last-name> … </last-name>
23
XPath: More Predicates
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
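ElementTree's XPath subset has no < comparison, so the sketch below performs the comparison in Python. It follows two XPath 1.0 rules: the operands of < are converted to numbers, and a comparison involving an empty node-set is false. A book without a price attribute therefore never qualifies. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# bib.xml from slide 15 (straight quotes, so that it parses).
BIB = """<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>"""


def books_cheaper_than(root, limit):
    """/bib/book[@price < limit], comparing numerically."""
    result = []
    for b in root.findall("book"):
        price = b.get("price")
        # No price attribute: empty node-set, so the comparison is false.
        if price is not None and float(price) < limit:
            result.append(b)
    return result


def books_with_author_text(root):
    """/bib/book[author/text()], ignoring whitespace-only text."""
    return [b for b in root.findall("book")
            if any((a.text or "").strip() for a in b.findall("author"))]


bib = ET.fromstring(BIB)
print([b.findtext("title").strip() for b in books_cheaper_than(bib, 60)])
# ['Principles of Database and Knowledge Base Systems']
print(len(books_with_author_text(bib)))  # 2
```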
24
XQuery
▪ Based on Quilt (which is based on XML-QL)
▪ http://www.w3.org/TR/xquery/ (2/2001)
▪ XML Query data model
  • Ordered!
FLWOR ("flower") expressions:
FOR ...
LET ...
WHERE ...
ORDER BY …
RETURN ...
25
XQuery
Query: Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
* bib.xml is shown on slide 15
Result:
<title> Principles of Database…</title>
26
XQuery
Query: Find book titles by the coauthors of
"Foundations of Databases":
FOR $x IN bib/book[title/text() = "Foundations …"]/author,
    $y IN bib/book[author/text() = $x/text()]/title
RETURN <answer> $y/text() </answer>
The answer will
contain duplicates!
Result:
<answer> Foundations … </answer>
<answer> Foundations … </answer>
27
XQuery
Same as before, but eliminate duplicates:
FOR $y IN distinct(FOR $x IN bib/book[title/text() = "Foundations …"]/author,
                       $z IN bib/book[author/text() = $x/text()]/title
                   RETURN $z)
RETURN <answer> $y/text() </answer>
distinct = a function
that eliminates duplicates; it must be applied to the whole result, since each binding of $x alone yields only one title
Result:
<answer> Foundations … </answer>
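The duplicates arise because each coauthor binding of $x reaches the same title. Below is a Python sketch of the query on slide 15's bib.xml. It approximates text() as non-empty trimmed text, so, as on slide 19, Rick Hull does not take part. dict.fromkeys then plays the role of distinct over the whole result. coauthor_titles and texts are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# bib.xml from slide 15 (straight quotes, so that it parses).
BIB = """<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>"""


def texts(elems):
    """Non-empty trimmed text of each element, a stand-in for .../text()."""
    return [t for t in ((e.text or "").strip() for e in elems) if t]


def coauthor_titles(root, title):
    answers = []
    books = root.findall("book")
    for book in books:
        if title not in texts(book.findall("title")):
            continue
        for x in texts(book.findall("author")):   # $x: each author of that book
            for other in books:                    # $y: titles of books by $x
                if x in texts(other.findall("author")):
                    answers.extend(texts(other.findall("title")))
    return answers


bib = ET.fromstring(BIB)
dups = coauthor_titles(bib, "Foundations of Databases")
print(dups)                       # ['Foundations of Databases', 'Foundations of Databases']
print(list(dict.fromkeys(dups)))  # ['Foundations of Databases']
```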
28
SQL and XQuery Side by Side
Product(pid, name, maker)
Company(cid, name, city)
Query: Find all products made in Seattle
SQL:
SELECT x.name
FROM Product x, Company y
WHERE x.maker=y.cid
  and y.city='Seattle'
XQuery:
FOR $x IN /db/Product/row,
    $y IN /db/Company/row
WHERE
  $x/maker/text()=$y/cid/text()
  and $y/city/text() = "Seattle"
RETURN $x/name
Cool XQuery (conditions folded into path predicates):
FOR $y IN /db/Company/row[city/text()="Seattle"],
    $x IN /db/Product/row[maker/text()=$y/cid/text()]
RETURN $x/name
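To check that the formulations agree, the sketch below runs the SQL query with Python's sqlite3 module. It then runs the predicate-style XQuery as nested loops over an ElementTree encoding of the same tables as /db/Product/row and /db/Company/row. The company and product rows are invented for illustration. The ORDER BY is added only to make the SQL output order deterministic.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical rows, invented for this illustration.
COMPANIES = [(1, "Acme", "Seattle"), (2, "Globex", "Portland")]     # (cid, name, city)
PRODUCTS = [(10, "Widget", 1), (11, "Gadget", 2), (12, "Gizmo", 1)]  # (pid, name, maker)


def sql_answer():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Company(cid INTEGER, name TEXT, city TEXT)")
    con.execute("CREATE TABLE Product(pid INTEGER, name TEXT, maker INTEGER)")
    con.executemany("INSERT INTO Company VALUES (?, ?, ?)", COMPANIES)
    con.executemany("INSERT INTO Product VALUES (?, ?, ?)", PRODUCTS)
    rows = con.execute(
        "SELECT x.name FROM Product x, Company y "
        "WHERE x.maker = y.cid AND y.city = 'Seattle' "
        "ORDER BY x.pid"  # only to make the output order deterministic
    ).fetchall()
    con.close()
    return [name for (name,) in rows]


def to_xml():
    """Encode both tables as /db/Product/row and /db/Company/row."""
    db = ET.Element("db")
    for table, cols, rows in (("Product", ("pid", "name", "maker"), PRODUCTS),
                              ("Company", ("cid", "name", "city"), COMPANIES)):
        t = ET.SubElement(db, table)
        for values in rows:
            row = ET.SubElement(t, "row")
            for col, v in zip(cols, values):
                ET.SubElement(row, col).text = str(v)
    return db


def xquery_answer(db):
    # FOR $y IN /db/Company/row[city/text()="Seattle"],
    #     $x IN /db/Product/row[maker/text()=$y/cid/text()]
    # RETURN $x/name
    result = []
    for y in db.findall("Company/row[city='Seattle']"):
        for x in db.findall("Product/row"):
            if x.findtext("maker") == y.findtext("cid"):
                result.append(x.findtext("name"))
    return result


print(sql_answer(), xquery_answer(to_xml()))  # ['Widget', 'Gizmo'] ['Widget', 'Gizmo']
```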
29
XQuery: Nesting
Query: For each author of a book by Morgan Kaufmann, list all books s/he published:
FOR $a IN distinct(document("bib.xml")
                   /bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         { $a,
           FOR $t IN /bib/book[author=$a]/title
           RETURN $t
         }
       </result>
Result:
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
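The nested query can be sketched in Python as an outer loop over the distinct Morgan Kaufmann authors. An inner loop collects every title by each author, whatever the publisher. The bib.xml content below is hypothetical, chosen so that the output mirrors the sample result above. The publisher name is spliced into the ElementTree path, so in this sketch it must not contain a single quote.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml, chosen so that the output mirrors the sample result.
BIB = """<bib>
<book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
<book><publisher>Other Press</publisher><author>Jones</author><title>def</title></book>
<book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
</bib>"""


def authors_with_books(root, publisher):
    results = []
    # distinct(/bib/book[publisher=...]/author), keeping first-seen order
    path = f"book[publisher='{publisher}']/author"
    for a in dict.fromkeys(e.text for e in root.findall(path)):
        # Inner FOR: every title by this author, whatever the publisher.
        titles = [b.findtext("title") for b in root.findall("book")
                  if a in [x.text for x in b.findall("author")]]
        results.append((a, titles))  # one <result> per author
    return results


print(authors_with_books(ET.fromstring(BIB), "Morgan Kaufmann"))
# [('Jones', ['abc', 'def']), ('Smith', ['ghi'])]
```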
30
XQuery
▪ FOR $x IN expr -- binds $x to each value in the list expr
▪ LET $x := expr -- binds $x to the entire list expr
  • Useful for common subexpressions and for aggregations
<big_publishers>
{ FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")/bib/book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
}
</big_publishers>
count = a (aggregate) function that returns the number of elements
31
XQuery
Query: Find books whose price is larger than
average:
FOR $a IN /bib/book
LET $b:=avg(/bib/book/price/text())
WHERE $a/price/text() > $b
RETURN $a
32
XQuery
Query: Find all publishers that published more than 100 books:
<big_publishers>
{ FOR $p IN distinct(//publisher/text())
LET $b := document("bib.xml")/bib/book[publisher/text() = $p]
WHERE count($b) > 100
RETURN <publisher> $p </publisher>
}
</big_publishers>
$b is a collection of elements, not a single element
count = a (aggregate) function that returns the number of elements
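LET binds $b once per publisher to the whole collection of that publisher's books, and count aggregates over that collection. Below is a Python sketch with a hypothetical three-book bib.xml. The threshold is a parameter (min_books, a hypothetical name) so that the small example can produce a non-empty answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml with three books.
BIB = """<bib>
<book><publisher>Addison-Wesley</publisher></book>
<book><publisher>Freeman</publisher></book>
<book><publisher>Freeman</publisher></book>
</bib>"""


def big_publishers(root, min_books):
    """FOR $p IN distinct(//publisher/text())
       LET $b := /bib/book[publisher/text() = $p]
       WHERE count($b) > min_books
       RETURN $p"""
    result = []
    for p in dict.fromkeys(e.text.strip() for e in root.iter("publisher")):
        # LET: $b is bound once, to the whole collection of p's books
        b = [book for book in root.findall("book")
             if book.findtext("publisher", "").strip() == p]
        if len(b) > min_books:  # count($b) > min_books
            result.append(p)
    return result


root = ET.fromstring(BIB)
print(big_publishers(root, 1))    # ['Freeman']
print(big_publishers(root, 100))  # []
```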
33
FOR vs. LET
FOR
▪ Binds node variables → iteration
LET
▪ Binds collection variables → one value
Examples
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book>
         <book>...</book>
         <book>...</book>
         ...
</result>
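The FOR/LET difference can be reproduced with ElementTree. The FOR version builds one result element per book, while the LET version builds a single result holding all of them. The three-book document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical three-book bib.xml.
BIB = ("<bib><book><title>A</title></book>"
       "<book><title>B</title></book>"
       "<book><title>C</title></book></bib>")


def for_version(root):
    """FOR $x IN /bib/book RETURN <result> $x </result>"""
    results = []
    for x in root.findall("book"):  # one iteration per book
        r = ET.Element("result")
        r.append(x)
        results.append(r)
    return results


def let_version(root):
    """LET $x := /bib/book RETURN <result> $x </result>"""
    x = root.findall("book")  # $x is bound once, to the whole list
    r = ET.Element("result")
    r.extend(x)
    return [r]


root = ET.fromstring(BIB)
print(len(for_version(root)), len(let_version(root)))  # 3 1
print(ET.tostring(let_version(root)[0], encoding="unicode"))
```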
34
Sorting in XQuery
<publisher_list>
FOR $p IN distinct(document("bib.xml")//publisher)
RETURN <publisher> {<name> $p/text() </name> ,
FOR $b IN document("bib.xml")//book[publisher = $p]
RETURN <book>
{$b/title ,
$b/@price}
</book> SORTBY(price DESCENDING) }
</publisher> SORTBY(name)
</publisher_list>
35
Sorting in XQuery
▪ Sorting arguments refer to the name space of the RETURN clause, not the FOR clause
▪ To sort on an element you don't want to display, first return it, then remove it with an additional query.
<publisher_list>
FOR $p IN distinct(document("bib.xml")//publisher)
RETURN <publisher> { <name> $p/text() </name> ,
FOR $b IN document("bib.xml")//book[publisher = $p]
RETURN <book>
{ $b/title ,
$b/price
}
</book> ORDER BY price DESCENDING }
</publisher> ORDER BY name
</publisher_list>
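Sorting applies to what the RETURN clause builds, as the first bullet above says. In the sketch below the books are therefore sorted after being turned into (title, price) pairs, and the publishers are sorted by name. The data is hypothetical. Price is assumed to be a child element, as on this slide, and is compared as a number; that numeric ordering is an assumption about the intended sort.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; price is a child element, as on this slide.
BIB = """<bib>
<book><publisher>Freeman</publisher><title>T1</title><price>30</price></book>
<book><publisher>Addison-Wesley</publisher><title>T2</title><price>45</price></book>
<book><publisher>Freeman</publisher><title>T3</title><price>55</price></book>
</bib>"""


def publisher_list(root):
    result = []
    names = dict.fromkeys(p.text for p in root.iter("publisher"))  # distinct(//publisher)
    for name in sorted(names):  # </publisher> ORDER BY name
        # Build what RETURN produces for each book first ...
        books = [(b.findtext("title"), float(b.findtext("price")))
                 for b in root.iter("book") if b.findtext("publisher") == name]
        # ... then sort those results: </book> ORDER BY price DESCENDING
        books.sort(key=lambda pair: pair[1], reverse=True)
        result.append((name, books))
    return result


print(publisher_list(ET.fromstring(BIB)))
# [('Addison-Wesley', [('T2', 45.0)]), ('Freeman', [('T3', 55.0), ('T1', 30.0)])]
```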
36
Collections in XQuery
▪ Ordered and unordered collections
  • /bib/book/author = an ordered collection
  • distinct(/bib/book/author) = an unordered collection
▪ LET $b := /bib/book → $b is a collection
▪ $b/author → a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author>
         <author>...</author>
         <author>...</author>
         ...
</result>
37
If-Then-Else
FOR $h IN //holding
RETURN <holding>
{ $h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
}
</holding> ORDER BY title
38
Quantifiers
Existential quantifiers:
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing")
      AND contains($p, "windsurfing")
RETURN $b/title
Universal quantifiers:
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
RETURN $b/title
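SOME and EVERY correspond to Python's any() and all(). The sketch below uses invented books and approximates contains($p, ...) with a substring test on the paragraph's full text. EVERY over an empty collection is true, which is why the book with no para elements passes the universal test.

```python
import xml.etree.ElementTree as ET

# Invented books for illustration.
BIB = """<bib>
<book><title>Sea</title><para>sailing and windsurfing</para><para>hiking</para></book>
<book><title>Lake</title><para>sailing</para><para>more sailing</para></book>
<book><title>Blank</title></book>
</bib>"""


def para_texts(book):
    """String value of every para below the book ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]


def some_para(root):
    # SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
    return [b.findtext("title") for b in root.iter("book")
            if any("sailing" in t and "windsurfing" in t for t in para_texts(b))]


def every_para(root):
    # EVERY $p IN $b//para SATISFIES contains($p, "sailing")
    return [b.findtext("title") for b in root.iter("book")
            if all("sailing" in t for t in para_texts(b))]


root = ET.fromstring(BIB)
print(some_para(root))   # ['Sea']
print(every_para(root))  # ['Lake', 'Blank']
```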
39
Other Stuff in XQuery
▪ BEFORE and AFTER
  • for dealing with order in the input
▪ FILTER
  • deletes some edges in the result tree
▪ Recursive functions
  • Currently: arbitrary recursion
  • Perhaps more restrictions in the future?
40
Group-By in XQuery ??
▪ No GROUPBY currently in XQuery
▪ A recent proposal (next)
  • What do YOU think?
41
Group-By in XQuery ??
XQuery with GROUPBY:
FOR $b IN document("http://www.bn.com")/bib/book,
    $y IN $b/@year
WHERE $b/publisher="Morgan Kaufmann"
RETURN GROUPBY $y
       WHERE count($b) > 10
       IN <year> $y </year>
Equivalent SQL:
SELECT year
FROM Bib
WHERE Bib.publisher='Morgan Kaufmann'
GROUP BY year
HAVING count(*) > 10
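In both the GROUPBY proposal and the equivalent SQL, WHERE filters books before grouping, and the group-level condition plays the role of HAVING. The sketch below checks this on an invented Bib table, once with sqlite3 and once with collections.Counter. big_years and big_years_py are hypothetical names.

```python
import sqlite3
from collections import Counter


def big_years(rows, min_count=10):
    """SQL version: WHERE before grouping, HAVING on each group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Bib(publisher TEXT, year INTEGER)")
    con.executemany("INSERT INTO Bib VALUES (?, ?)", rows)
    result = [year for (year,) in con.execute(
        "SELECT year FROM Bib WHERE Bib.publisher = 'Morgan Kaufmann' "
        "GROUP BY year HAVING count(*) > ? ORDER BY year", (min_count,))]
    con.close()
    return result


def big_years_py(rows, min_count=10):
    """Same query: filter, group by year with a Counter, then test each group."""
    counts = Counter(year for (publisher, year) in rows if publisher == "Morgan Kaufmann")
    return sorted(year for year, n in counts.items() if n > min_count)


# Invented rows: (publisher, year)
rows = ([("Morgan Kaufmann", 1999)] * 12 + [("Morgan Kaufmann", 2000)] * 3
        + [("Other Press", 2000)] * 15)
print(big_years(rows), big_years_py(rows))  # [1999] [1999]
```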
42
Group-By in XQuery ??
With GROUPBY:
FOR $b IN document("http://www.bn.com")/bib/book,
    $a IN $b/author,
    $y IN $b/@year
RETURN GROUPBY $a, $y
       IN <result> $a,
                   <year> $y </year>,
                   <total> count($b) </total>
          </result>
Without GROUPBY:
FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib/book,
                         $a IN $b/author,
                         $y IN $b/@year
                     RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),
    $a IN $Tup/a/node(),
    $y IN $Tup/y/node()
LET $b := document("http://www.bn.com")/bib/book[author=$a and @year=$y]
RETURN <result> $a,
                <year> $y </year>,
                <total> count($b) </total>
       </result>
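The GROUPBY-free version works in two steps. It first collects the distinct (author, year) tuples, then uses LET to recount the matching books for each tuple. The Python sketch below follows those steps on a hypothetical document and compares the result with a direct Counter-based grouping. The two agree as long as no author is listed twice on the same book.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data; year is an attribute, as in $b/@year.
BIB = """<bib>
<book year="1999"><author>Jones</author><author>Smith</author></book>
<book year="1999"><author>Jones</author></book>
<book year="2000"><author>Smith</author></book>
</bib>"""


def totals_without_groupby(root):
    books = root.findall("book")
    # Inner FOR ... RETURN <Tup>: one (author, year) tuple per binding.
    tuples = [(a.text, b.get("year")) for b in books for a in b.findall("author")]
    result = []
    for a, y in dict.fromkeys(tuples):  # distinct(...), first-seen order
        # LET $b := /bib/book[author=$a and @year=$y]
        matching = [b for b in books
                    if b.get("year") == y and a in [x.text for x in b.findall("author")]]
        result.append((a, y, len(matching)))  # <total> count($b) </total>
    return result


def totals_with_groupby(root):
    # GROUPBY $a, $y ... count($b), assuming no author is listed twice on one book.
    counts = Counter((a.text, b.get("year"))
                     for b in root.findall("book") for a in b.findall("author"))
    return [(a, y, n) for (a, y), n in counts.items()]


root = ET.fromstring(BIB)
print(totals_without_groupby(root))
# [('Jones', '1999', 2), ('Smith', '1999', 1), ('Smith', '2000', 1)]
```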
43
Group-By in XQuery ??
FOR $b IN document("http://www.bn.com")/bib/book,
    $a IN $b/author,
    $y IN $b/@year,
    $t IN $b/title,
    $p IN $b/publisher
RETURN
  GROUPBY $p, $y
  IN <result> $p,
              <year> $y </year>,
              GROUPBY $a
              IN <authorEntry>
                   $a,
                   GROUPBY $t
                   IN $t
                 </authorEntry>
     </result>
(note the nested GROUPBYs)
44
XQuery
Summary: [Demo]
FOR-LET-WHERE-RETURN = FLWR
FOR/LET clauses → list of tuples of bound variables
WHERE clause → the same list, pruned to the tuples that satisfy the condition
RETURN clause → an instance of the XQuery data model
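The summary reads the clauses as operations on a stream of variable bindings. The sketch below models that reading in Python, with each binding tuple as a dict from variable names to values. The clause functions are hypothetical names, and the two book records reuse the titles and years of slide 15.

```python
# A binding tuple maps variable names (without "$") to values.

def for_clause(tuples, var, expr):
    """FOR $var IN expr: one new tuple per item that expr yields."""
    for t in tuples:
        for item in expr(t):
            yield {**t, var: item}


def let_clause(tuples, var, expr):
    """LET $var := expr: extend each tuple with the whole value of expr."""
    for t in tuples:
        yield {**t, var: list(expr(t))}


def where_clause(tuples, pred):
    """WHERE pred: prune the tuples for which pred is false."""
    return (t for t in tuples if pred(t))


def return_clause(tuples, build):
    """RETURN build: one output item per surviving tuple."""
    return [build(t) for t in tuples]


# Titles and years from slide 15, as plain records.
books = [{"title": "Foundations of Databases", "year": 1995},
         {"title": "Principles of Database and Knowledge Base Systems", "year": 1998}]

# FOR $x IN books WHERE $x/year > 1995 RETURN $x/title
result = return_clause(
    where_clause(for_clause([{}], "x", lambda t: books),
                 lambda t: t["x"]["year"] > 1995),
    lambda t: t["x"]["title"])

# LET $b := books RETURN count($b): a single tuple, hence a single result
counts = return_clause(let_clause([{}], "b", lambda t: books),
                       lambda t: len(t["b"]))

print(result)  # ['Principles of Database and Knowledge Base Systems']
print(counts)  # [2]
```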
45