Management of XML and
Semistructured Data
Lecture 13:
Keys for XML and
Advanced query analysis
Wednesday, May 9th, 2001
Outline
• Keys in XML
• Query analysis
– Query pruning
– Query containment (next time)
Keys in XML Schema
XML:
<purchaseReport>
<regions>
<zip code="95819">
<part number="872-AA" quantity="1"/>
<part number="926-AA" quantity="1"/>
<part number="833-AA" quantity="1"/>
<part number="455-BX" quantity="1"/>
</zip>
<zip code="63143">
<part number="455-BX" quantity="4"/>
</zip>
</regions>
<parts>
<part number="872-AA">Lawnmower</part>
<part number="926-AA">Baby Monitor</part>
<part number="833-AA">Lapis Necklace</part>
<part number="455-BX">Sturdy Shelves</part>
</parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
<selector xpath="parts/part"/>
<field xpath="@number"/>
</key>
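As a rough illustration of what the NumKey constraint demands, the following Python sketch checks a key against the purchase report above. The helper names (`field_value`, `check_key`) are made up for this example, and only attribute fields and simple child paths are handled; it is not how a schema validator is actually implemented.

```python
import xml.etree.ElementTree as ET

REPORT = ET.fromstring("""<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>""")

def field_value(node, field):
    """Evaluate a field: '@attr' reads an attribute, anything else a child path."""
    if field.startswith("@"):
        return node.get(field[1:])
    child = node.find(field)
    return None if child is None else (child.text or "").strip()

def check_key(context, selector, fields):
    """Hypothetical helper: every selected node must have all fields (existence)
    and no two selected nodes may share the same field tuple (uniqueness)."""
    seen = set()
    for node in context.findall(selector):
        values = tuple(field_value(node, f) for f in fields)
        if None in values or values in seen:
            return False
        seen.add(values)
    return True

numkey_ok = check_key(REPORT, "parts/part", ["@number"])
regions_ok = check_key(REPORT, "regions/zip/part", ["@number"])
```

Here `numkey_ok` is true, while the same field would not be a key for `regions/zip/part`, because part 455-BX is ordered in two regions.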
Keys in XML Schema
• In general, two flavors:
<key name="someDummyNameHere">
<selector xpath="p"/>
<field xpath="p1"/>
<field xpath="p2"/>
. . .
<field xpath="pk"/>
</key>
<unique name="someDummyNameHere">
<selector xpath="p"/>
<field xpath="p1"/>
<field xpath="p2"/>
. . .
<field xpath="pk"/>
</unique>
Note: all XPath expressions “start” at the element currently being defined
Each field must identify a single node
Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
– /a/b | /a/c OK for selector
– //a/b/*/c OK for field
– To “help the implementors” (???)
• Note: better than DTD’s ID mechanism
Keys in XML Schema
• Examples
Recall: each person must have a single forename and a single surname
<key name="fullName">
<selector xpath=".//person"/>
<field xpath="forename"/>
<field xpath="surname"/>
</key>
<unique name="nearlyID">
<selector xpath=".//*"/>
<field xpath="@id"/>
</unique>
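The difference between the two flavors shows up on elements that lack the field. The sketch below mimics the nearlyID constraint (selector `.//*`, field `@id`) in Python; the function names and the two tiny documents are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

def id_values(root):
    """Field values for selector './/*' and field '@id' (None when @id is absent)."""
    return [element.get("id") for element in root.findall(".//*")]

def satisfies_unique(values):
    """unique: values that are present must be pairwise distinct."""
    present = [v for v in values if v is not None]
    return len(present) == len(set(present))

def satisfies_key(values):
    """key: like unique, but every selected element must also have the field."""
    return None not in values and satisfies_unique(values)

# Hypothetical documents: <b/> has no id in the first, ids clash in the second.
partial = ET.fromstring('<doc><a id="x"/><b/><c id="y"/></doc>')
clash = ET.fromstring('<doc><a id="x"/><b id="x"/></doc>')

partial_unique = satisfies_unique(id_values(partial))
partial_key = satisfies_key(id_values(partial))
clash_unique = satisfies_unique(id_values(clash))
```

The first document satisfies the unique constraint but would violate a key with the same selector and field; the second violates both.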
Foreign Keys in XML Schema
• Examples
<keyref name="personRef" refer="fullName">
<selector xpath=".//personPointer"/>
<field xpath="@first"/>
<field xpath="@last"/>
</keyref>
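A keyref can be read as "every referring tuple must occur among the tuples of the referenced key". The following Python sketch checks personRef against fullName on a small made-up document; the element names follow the slides, while the data and the helper `dangling_refs` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: two persons and two pointers, one of them dangling.
STAFF = ET.fromstring("""<staff>
  <person><forename>Ada</forename><surname>Lovelace</surname></person>
  <person><forename>Alan</forename><surname>Turing</surname></person>
  <personPointer first="Ada" last="Lovelace"/>
  <personPointer first="Grace" last="Hopper"/>
</staff>""")

def dangling_refs(root):
    """Return the (first, last) pairs of personPointer elements that do not
    match the (forename, surname) pair of any person."""
    keys = {
        (p.findtext("forename"), p.findtext("surname"))
        for p in root.findall(".//person")
    }
    refs = [
        (r.get("first"), r.get("last"))
        for r in root.findall(".//personPointer")
    ]
    return [ref for ref in refs if ref not in keys]

missing = dangling_refs(STAFF)
```

In this instance `missing` contains only the pointer to Grace Hopper, so the keyref is violated.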
Another Proposal for Keys
• Keys for XML, Buneman, Davidson, Fan,
Hara, Tan, in WWW’10, May, 2001.
• Cleaner definition
• Extends with relative keys
• Addresses satisfiability problem
Another Proposal for Keys
• A key is q → {p1, …, pk}
• An instance I satisfies the key if:
– ∀ x1, x2 ∈ q(root): ((∃ z1 ∈ p1(x1). ∃ z2 ∈ p1(x2). z1 = z2) ∧
...
∧ (∃ z1 ∈ pk(x1). ∃ z2 ∈ pk(x2). z1 = z2))
⇒ x1 = x2
Another Proposal for Keys
Examples:
• //person → {@id}
• //person → {name}
• //person → {firstname, lastname}
– What happens with multiple names?
• //person → {ε}
• //person → {}
– What is the difference between these two?
• //* → {id}
– What happens if an element doesn’t have an id child?
Another Proposal for Keys
Intuition for q → {p1, …, pk}
If I have k values, z1, …, zk, then there exists
at most one x ∈ q(root) s.t.
z1 ∈ p1(x), …, zk ∈ pk(x)
Think of retrieving x from z1, …, zk, using a
hash table
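The hash-table intuition can be sketched directly in Python: every combination of values (z1, …, zk) is used as a dictionary key and may map to at most one node x. This is only an illustration under simplifying assumptions: values are compared as attribute strings or element text rather than as subtrees, and the function names and sample data are invented here.

```python
import xml.etree.ElementTree as ET
from itertools import product

def path_values(node, path):
    """Set of values reached from node by path ('@attr' or a child path)."""
    if path.startswith("@"):
        value = node.get(path[1:])
        return set() if value is None else {value}
    return {(n.text or "").strip() for n in node.findall(path)}

def satisfies_key(root, q, paths):
    """Check q -> {p1, ..., pk}: no two distinct nodes of q(root) may share a
    value on every pi. The dict plays the role of the hash table."""
    table = {}
    for index, x in enumerate(root.findall(q)):
        for combo in product(*(path_values(x, p) for p in paths)):
            # setdefault stores the first owner of this value combination
            if table.setdefault(combo, index) != index:
                return False
    return True

# Hypothetical data: the first and third person share the name "Annie".
PEOPLE = ET.fromstring("""<db>
  <person><name>Ann</name><name>Annie</name></person>
  <person><name>Bob</name></person>
  <person><name>Annie</name></person>
</db>""")

name_is_key = satisfies_key(PEOPLE, ".//person", ["name"])
empty_key = satisfies_key(PEOPLE, ".//person", [])
```

With multiple names per person, one shared name is enough to violate the key. With an empty set of paths the only combination is the empty tuple, so the key holds only if q selects at most one node.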
Another Proposal for Keys
• Some inference rules for keys
• q → {p1, …, pk} is a key ⇒ q → {p1, …, pn} is a
key, for k ≤ n
• q.q’ → {p} is a key ⇒ q → {q’.p} is a key
Another Proposal for Keys
Relative key: q: q’ → {p1, …, pk}
An instance I satisfies the relative key
if ∀ x ∈ q(I), q’ → {p1, …, pk} is a key for the
instance rooted at x
Another Proposal for Keys
Examples
• /bible/book/chapter: verse → {number}
• /bible/book: chapter → {number}
• /bible: book → {name}
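A relative key is just an ordinary key checked separately inside each context node. The sketch below illustrates this on a made-up bible fragment in which chapter numbers repeat across books; the helper names and the data are assumptions, and only single-field keys over child elements are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: chapter number 1 occurs in both books.
BIBLE = ET.fromstring("""<bible>
  <book><name>Genesis</name>
    <chapter><number>1</number></chapter>
    <chapter><number>2</number></chapter>
  </book>
  <book><name>Exodus</name>
    <chapter><number>1</number></chapter>
  </book>
</bible>""")

def is_key(context, target, field):
    """Single-field key: no two nodes selected by target share a field value."""
    owner = {}
    for index, node in enumerate(context.findall(target)):
        for child in node.findall(field):
            value = (child.text or "").strip()
            if owner.setdefault(value, index) != index:
                return False
    return True

def is_relative_key(root, q, target, field):
    """The key must hold in the subtree rooted at every node x in q(root)."""
    return all(is_key(x, target, field) for x in root.findall(q))

relative_ok = is_relative_key(BIBLE, "book", "chapter", "number")
absolute_ok = is_key(BIBLE, "book/chapter", "number")
book_name_ok = is_key(BIBLE, "book", "name")
```

Chapter numbers identify a chapter within its book (`relative_ok`), but not within the whole document (`absolute_ok` is false).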
Another Proposal for Keys
• No relative keys in XML-Schema
• But could work around:
<key name="dummyName">
<selector xpath="/bible/book/chapter"/>
<field xpath="number"/>
<field xpath="../number"/>
<field xpath="../../name"/>
</key>
Combining Keys and Schemas
• On XML Integrity Constraints in the
Presence of DTDs, Fan and Libkin,
PODS’2001
• Keys + DTDs sometimes imply unexpected
facts
• Main story: implication is undecidable
Combining Keys and Schemas
<teachers>
<teacher name="Joe">
<subject expert="Jim"> DB </subject>
<subject expert="Karl"> Graphics </subject>
</teacher>
<teacher name="Jim">
<subject expert="Joe"> AI </subject>
<subject expert="Fred"> OS </subject>
</teacher>
....
</teachers>
<!ELEMENT teachers (teacher+)>
<!ELEMENT teacher (subject,subject)>
Combining Keys and Schemas
Keys and foreign keys:
• Keys:
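– //teacher → @name
– //subject → @expert
• Foreign keys:
– //@expert ⊆ //teacher/@name
• But this is impossible! With n ≥ 1 teachers the DTD forces exactly 2n subjects, the key on @expert then requires 2n distinct expert values, but the foreign key allows at most n distinct values (the teacher names)
• In general: it is undecidable to check whether such a specification is satisfiable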
Query Analysis
Generic term to describe:
• Query rewriting based on schema
information
• Query containment and minimization
Query Rewriting
Problem:
• Given a query Q
– Regular path expression
– Or more complex XQuery expression
• Given a schema S
– graph schema
– DTD
– XML-Schema
• Rewrite Q to some QS s.t.
– Q is equivalent to QS over databases conforming to S
– QS is more efficient than Q
Query Rewriting
Optimizing Regular Path Expressions Using
Graph Schemas, M.Fernandez and D.Suciu,
Data Engineering, 98
Simplest setting:
• Regular path expression
• Graph schemas
Example of Query Rewriting
Q = //Department//Project
• Naive evaluation: need to traverse entire
graph (or tree)
Example of Query Rewriting
Graph Schema S:
[Figure: graph schema S with states s1, s2, s3, s4 and edges labeled other, Org, “Project”, and “Member”]
Org = “Department” ∨ “College” ∨ “School”
other = ¬Org ∧ ¬“Project” ∧ ¬“Member”
Example of Query Rewriting
• Schema says: “there can be at most one
Department edge; below, there can be at most one
Project edge”
Q = //Department//Project
QS = (other)*/Department/(other)*/Project
other = ¬“Department” ∧ ¬“College” ∧ ¬“School” ∧ ¬“Project” ∧ ¬“Member”
• QS can be evaluated more efficiently than Q
– Why ?
Example of Query Rewriting
• How to construct QS systematically from Q
and S ?
• Step 1 build the automaton A for Q
• Step 2 build the product automaton S x A
• Step 3 QS = expression of S x A
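The three steps can be sketched in Python for the running example. Several things here are assumptions rather than content of the slides: the exact edges of S (guessed from the schema figure), the treatment of “other” as a single symbol of a finite alphabet, and all function names. Step 3 is only approximated: the code stops at the trimmed product automaton and does not convert it to a regular expression.

```python
from collections import deque

ORG = frozenset({"Department", "College", "School"})
OTHER = frozenset({"other"})
ANY = ORG | OTHER | frozenset({"Project", "Member"})

# Assumed graph schema S as (source, allowed labels, target) edges.
SCHEMA = [
    ("s1", OTHER, "s1"), ("s1", ORG, "s2"), ("s2", OTHER, "s2"),
    ("s2", frozenset({"Project"}), "s3"), ("s2", frozenset({"Member"}), "s4"),
    ("s3", OTHER, "s3"), ("s4", OTHER, "s4"),
]
# Step 1: automaton A for Q = //Department//Project (a3 is accepting).
QUERY = [
    ("a1", ANY, "a1"), ("a1", frozenset({"Department"}), "a2"),
    ("a2", ANY, "a2"), ("a2", frozenset({"Project"}), "a3"),
]

def product(schema, query):
    """Step 2: pair up edges; keep those whose label sets intersect."""
    edges = []
    for s, schema_labels, s_next in schema:
        for a, query_labels, a_next in query:
            common = schema_labels & query_labels
            if common:
                edges.append(((s, a), common, (s_next, a_next)))
    return edges

def reachable(edges, starts, forward=True):
    """States reachable from starts, following edges forward or backward."""
    seen = set(starts)
    todo = deque(seen)
    while todo:
        state = todo.popleft()
        for src, _, dst in edges:
            here, there = (src, dst) if forward else (dst, src)
            if here == state and there not in seen:
                seen.add(there)
                todo.append(there)
    return seen

def trim(edges, start, accepting):
    """Keep only states reachable from start that can still reach acceptance."""
    fwd = reachable(edges, [start])
    final = {state for state in fwd if state[1] == accepting}
    live = fwd & reachable(edges, final, forward=False)
    return [(s, l, d) for s, l, d in edges if s in live and d in live]

trimmed = trim(product(SCHEMA, QUERY), ("s1", "a1"), "a3")
```

Under these assumptions the trimmed product has only the states (s1,a1), (s2,a2), (s3,a3), with an “other” loop on the first two, a Department edge between the first two, and a Project edge into the last, which reads off as (other)*/Department/(other)*/Project.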
Example of Query Rewriting
[Figure: the automaton A for Q (states a1, a2, a3; transitions labeled true, Dept, true, Project), the graph schema S (states s1 to s4), and the product automaton S x A, in which many transition labels simplify to false]
QS = (other)*/Department/(other)*/Project
Query Rewriting
Correctness:
Proposition If the instance I conforms to S,
then Q(I) = QS(I)
That is, Q and QS are equivalent over
databases conforming to S
Query Rewriting
Efficiency
• Given query Q, instance I, define:
cost(Q,I) = | {w(I) | w ∈ prefix(Lang(Q))} |
Proposition If Q and Q’ are equivalent over all
databases conforming to S, and if I conforms to S,
then cost(QS,I) ≤ cost(Q’,I)
Hence, QS is optimal (in a certain sense)
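One way to read the cost measure is as the number of nodes a top-down evaluation has to visit: a node is visited when the label path leading to it is a prefix of some word in Lang(Q). The sketch below compares Q and QS under that reading on a small tree; the reading itself, the tree, the encoding of “other”, and the helper names are all assumptions. Both automata are given in trimmed form, so a non-empty state set means the path can still be extended to a match.

```python
KNOWN = {"Department", "College", "School", "Project", "Member"}
ANY = KNOWN | {"other"}

def symbol(label):
    """Map a concrete label to the alphabet; unknown labels count as 'other'."""
    return label if label in KNOWN else "other"

# Q = //Department//Project and QS = (other)*/Department/(other)*/Project
Q = [("a1", ANY, "a1"), ("a1", {"Department"}, "a2"),
     ("a2", ANY, "a2"), ("a2", {"Project"}, "a3")]
QS = [("q1", {"other"}, "q1"), ("q1", {"Department"}, "q2"),
      ("q2", {"other"}, "q2"), ("q2", {"Project"}, "q3")]

# Hypothetical instance as (label, children); a node's label is its incoming edge.
TREE = ("root", [
    ("College", [("Project", []), ("Member", [])]),
    ("Department", [
        ("Lab", [("Project", [])]),
        ("Member", [("Name", [])]),
    ]),
])

def step(transitions, states, label):
    """Automaton states reached from states after reading one label."""
    sym = symbol(label)
    return {dst for src, labels, dst in transitions
            if src in states and sym in labels}

def cost(transitions, start, tree):
    """Count the nodes visited when pruning as soon as no state survives."""
    visited = 0
    stack = [(tree, {start})]
    while stack:
        (_, children), states = stack.pop()
        visited += 1
        for child in children:
            next_states = step(transitions, states, child[0])
            if next_states:
                stack.append((child, next_states))
    return visited

cost_q = cost(Q, "a1", TREE)
cost_qs = cost(QS, "q1", TREE)
```

On this tree Q visits all nine nodes, whereas QS skips the College subtree and everything below Member, visiting four.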
Query Rewriting
Query Optimization for Structured Documents Based
on Knowledge on the Document Type Definition,
K. Bohm, K. Gayer, K. Aberer, T. Özsu
More complex settings:
• Schema = DTD
• Query = region algebra (think: XPath)
Problem is more complex; this work proposes a
solution
Query Rewriting
Idea: analyze DTD and extract 3 relations:
Exclusivity. Element E1 is exclusively
contained in E2 if every path from the root
to E1 goes through E2
XPath simplification:
E1[ancestor-or-self::E2] → E1
Query Rewriting
Obligation. E1 obligatorily contains E2 if every
E1 element has a child of type E2
E1[E2] → E1
Query Rewriting
Entrance Location. E is an entrance location
for E1, E2 if every path from E1 to E2 goes
through some E
E1[ancestor-or-self::E2] →
E1[ancestor-or-self::E[ancestor-or-self::E2]]
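Both exclusivity and entrance locations are "every path goes through X" properties of the DTD graph, so one simple way to test them is to delete X and see whether the target is still reachable. The sketch below does this for a made-up DTD reduced to a parent-to-children map; it is not the algorithm of the cited paper, all names are hypothetical, and an entrance location is taken to be a single element type, with paths followed from the containing type downward.

```python
from collections import deque

# Hypothetical DTD, reduced to "which element types may occur as children".
DTD = {
    "book": ["front", "chapter"],
    "front": ["title", "author"],
    "chapter": ["title", "section"],
    "section": ["title", "para", "section"],
    "title": [], "author": [], "para": [],
}

def reachable(dtd, start, removed):
    """Element types reachable from start when type `removed` is deleted."""
    if start == removed:
        return set()
    seen = {start}
    todo = deque([start])
    while todo:
        for child in dtd[todo.popleft()]:
            if child != removed and child not in seen:
                seen.add(child)
                todo.append(child)
    return seen

def exclusively_contained(dtd, root, e1, e2):
    """True if every path from the root to e1 goes through e2."""
    return e1 not in reachable(dtd, root, removed=e2)

def is_entrance_location(dtd, e, upper, lower):
    """True if every path from upper down to lower goes through e."""
    return lower not in reachable(dtd, upper, removed=e)

para_in_chapter = exclusively_contained(DTD, "book", "para", "chapter")
title_in_chapter = exclusively_contained(DTD, "book", "title", "chapter")
section_gate = is_entrance_location(DTD, "section", "chapter", "para")
```

In this DTD every para lies inside a chapter, so a test such as para[ancestor-or-self::chapter] could be dropped, while title also occurs under front and its test must stay.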
Query Rewriting
Add these rules, plus variations, to a rule-based
optimizer
• HyperStorM – a Structured Document Database
• On top of VODAK – an object-oriented database system
Open question: does this approach exploit all the
information in a DTD/XML-Schema ? How can
we exploit what is not used ?