Download Tree Logical Classes for Efficient Evaluation of XQuery

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
$
'
Tree Logical Classes for Efficient Evaluation of
XQuery
Stelios Paparizos, Yuqing Wu, H.V. Jagadish,
Laks V.S. Lakshmanan
Presented by:
Todd J. Green
University of Pennsylvania
October 7, 2004
&
1
%
$
'
Running Example
Here is an XQuery query.
FOR $p IN document(‘‘auction.xml’’)//person
FOR $o IN document(‘‘auction.xml’’)//open_auction
WHERE count($o/bidder) > 5 AND $p/age > 25
AND $p/@id = $o/bidder//@person
RETURN
<person name={$p/name/text()}> $o/bidder </person>
We all can see what the query means. But how do you actually
evaluate it? That is the focus of this paper.
&
2
%
$
'
Evaluating XQuery
Of course, this is not the first paper ever about evaluating XQuery.
There have been many. The approaches fall into three categories:
1. Mapping to relational
2. Native navigational-based (Galax)
3. Native algebraic-based (Tamino, Xyleme, . . . )
This paper is one of those which take the third approach.
Moreover, theirs is “tree-based” rather than “node-based”. This
means something like, intermediate results are collections of trees
rather than collections of vertices.
&
3
%
$
'
Matching
Before we try anything fancy, let’s just consider the fundamental
operation of matching. There are two crucial concepts used in
matching: the pattern tree and the witness tree. The exposition in
the paper is not very clear (witness tree is not even defined at all),
yet these are crucial concepts, so we will spend a little time here.
&
4
%
$
'
Annotated Pattern Tree
First some intuition. We use something called a “pattern tree” to
perform projection. A “match” of the pattern tree on the
“database” D yields an XML tree T which we call the “witness
tree”. It “witnesses” the match by virtue of having a mapping f
back to the pattern P and a mapping g forward to the database D,
like so:
f
g
P ←− T −→ D
The pattern P imposes rules on the allowed mappings f and g to
enforce things like cardinality (‘*’, ‘?’, etc) and ancestry
relationships (‘/’ versus ‘//’).
OK, now on to the details.
&
5
%
$
'
Annotated Pattern Tree
An annotated pattern tree P is a quintuple P = hV, E, m, p, ri,
where:
• (V, E) is a tree;
• m : E → {∗, ?, −, +} is a matching specification function
mapping each edge to a cardinality constraint;
• p is a mapping that associates with each vertex v in V a
predicate pv ;
• r : E → {/, //} is a function specifying for each edge (u, v) in
the pattern the desired structural relationship between u and v
(parent-child or ancestor-descendant).
&
6
%
$
'
Witness Tree
Let D be a set of trees representing the database, and let
P = hV, E, m, p, ri be an annotated pattern tree. We say that a
tree W is a witness tree for D and P if there exists a mapping
f : W → D such that
• f is a homomorphism, that is,
1. f (root(W )) = root(P ),
2. for all (u, v) ∈ W we have (f (u), f (v)) ∈ P .
• f satisfies the cardinality constraints of P , that is, for all
u ∈ P , we have:
1. |f −1 (u)| = 0 implies m(u) = ‘?’ ∨ m(u) = ‘*’,
2. |f −1 (u)| > 1 implies m(u) 6= ‘-’ ∧ m(u) 6= ‘?’.
&
7
%
'
and there exists a mapping g : W → D such that
$
• g is injective (one-to-one)
• g satisfies the ancestry constraint of P , that is, for all
(u, v) ∈ W , we have
1. r(f (u), f (v)) = ‘/’ implies (g(u), g(v)) ∈ D,
2. r(f (u), f (v)) = ‘//’ implies there exists a path t from g(u)
down to g(v) in D.
• g satisfies the predicate constraint of P , that is, for all u ∈ W ,
pf (u) (g(u)) = true
As a corallary, the composition h = g ◦ f −1 : P → 2D is what the
paper calls a match.
&
8
%
$
'
Example
&
9
%
$
'
Logical Classes
OK, we’re almost ready to look at the algebra itself. But first, we
need one more key concept, that of the logical class. Intuitively, a
logical class is just a set of nodes in the database that get matched
by a particular node in the pattern tree. More precisely, let P be a
pattern tree and let W denote the set of witness trees for P on
some database D. Then we define the logical class of a vertex v in
P to be
LC(v) = {u | ∃W ∈ W s.t. u ∈ W ∧ f (u) = v}
For each logical class LC(v) we additionally associate a unique
label LCLv identifying the class.
&
10
%
$
'
The Tree Logical Algebra
With the required machinery under our belts, we are ready to look
at the algebra itself. There are seven algebraic operators in total.
They all map sets of trees to sets of trees. The first six are unary:
1. Select σP : S → S
2. Project πnl : S → S
3. Filter φLCLf ,p,m : S → S
4. Duplicate Elimination δnl,ci : S → S
5. Aggregate Function αf name,LCLa ,LCLb : S → S
6. Construct κc : S → S
There is also one binary infix operator:
7. Join o
nP,p : S × S → S
&
11
%
'
$
&
%
Let’s go over a few of these.
12
$
'
Select
1. Select σP : S → S
P is an annotated pattern tree. For each tree in the input set S,
the operator performs a pattern tree match using P . The output is
the entire set of the matching witness trees for all input trees.
&
13
%
$
'
Project
2. Project πnl : S → S
Here, nl is a list of logical class labels. Roughly speaking, for each
input tree, retain only the nodes in the logical classes identified by
nl.
&
14
%
$
'
Filter
3. Filter φLCLf ,p,m : S → S
Given a set of trees S, a predicate p, and a mode flag m, the
meaning of φLCLf ,p,m (S) is the following: output only the trees in
S that satisfy the predicate p (a) for every node u ∈ S in logical
class LCf (if m = ∀), or (b) for some node u ∈ S in logical class
LCf (if m = ∃). Actually, there are additional modes too, like
“exactly one” and “first element.”
&
15
%
$
'
Join
7. Join o
nP,p : S × S → S
Let S1 and S2 be two sets of trees. Let P be an annotated pattern
tree. Let p be a predicate. Question: what does S1 o
nP,p S2 mean?
&
16
%
$
'
Running Example
Now that we have gotten the flavor of the algebra, let’s revisit the
XQuery example from earlier:
FOR $p IN document(‘‘auction.xml’’)//person
FOR $o IN document(‘‘auction.xml’’)//open_auction
WHERE count($o/bidder) > 5 AND $p/age > 25
AND $p/@id = $o/bidder//@person
RETURN
<person name={$p/name/text()}> $o/bidder </person>
&
17
%
$
'
Running Example
Here is how the TLA query plan looks.
&
18
%
$
'
Optimizations
Performing optimizations means eliminating or avoiding
unnecessary work. Here, the expensive operation that we want to
avoid doing too much is tree pattern matching. To that end, the
authors have developed various techniques to “reuse” pattern tree
matches. These are physical algebraic operators F L (flatten) and
SH/IL (shadow and illuminate).
&
19
%
$
'
Flatten
If the following figure appears blurry, please consider scheduling an
appointment with your eye doctor.
&
20
%
$
'
Shadow/Illuminate
Another illuminating example.
&
21
%
$
'
The End
&
22
%