Incomplete Answers over Semistructured Data
Kanza, Nutt, Sagiv
PODS 1999
Slides by Yaron Kanza
Dealing with Incomplete Data
Queries with complete answers
Queries with AND Semantics
Queries with Weak Semantics
Queries with OR Semantics
Queries with Incomplete Answers
[Arrow: increasing level of incompleteness]
Queries and Matchings
• The queries are labeled rooted directed graphs
– labels are on the edges
• Query nodes are variables
• Database nodes are objects
• Matchings are assignments of database nodes to
the query variables according to
– the constraints specified in the query, and
– the semantics of the query
Constraints on Exact Matchings
• Root Constraint:
– Satisfied if the query root is mapped to the database root
[Figure: query root r mapped to database root 1]
• Edge Constraint:
– Satisfied if a query edge with label l is mapped to a database edge with label l
[Figure: query edge from x to y with label l, mapped to a database edge from 12 to 25 with label l]
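The two constraints can be sketched in Python as follows. This is only an illustration under stated assumptions: edges are stored as (source, label, target) triples, a matching is a dict from query variables to database nodes (or None), and all function names and the toy data are hypothetical rather than taken from the slides.

```python
# A minimal sketch; the representation and all names are assumptions.
# Edge: (source, label, target). Matching: dict variable -> node or None.


def root_constraint(matching, query_root, db_root):
    """True if the query root is mapped to the database root."""
    return matching.get(query_root) == db_root


def edge_satisfied(matching, query_edge, db_edges):
    """True if both endpoints are mapped and the database contains
    an edge with the same label between their images."""
    src, label, dst = query_edge
    a, b = matching.get(src), matching.get(dst)
    return a is not None and b is not None and (a, label, b) in db_edges


def is_exact_matching(matching, query_root, query_edges, db_root, db_edges):
    """Exact matching: every variable is non-null, the root constraint
    holds, and every edge constraint is satisfied."""
    variables = {query_root}
    for src, _, dst in query_edges:
        variables.update((src, dst))
    return (
        all(matching.get(v) is not None for v in variables)
        and root_constraint(matching, query_root, db_root)
        and all(edge_satisfied(matching, e, db_edges) for e in query_edges)
    )


# Hypothetical toy data, loosely modelled on the movie example.
db_edges = {(1, "Movie", 12), (12, "Director", 25)}
query_edges = {("r", "Movie", "x"), ("x", "Director", "y")}

exact = {"r": 1, "x": 12, "y": 25}
partial = {"r": 1, "x": 12, "y": None}
```

Here `exact` passes all checks, while `partial` fails because y is mapped to a null value.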
An Exact Matching
[Figure: the movie database (root node 1; edges labeled Movie, Producer, Director, Actor, Uncredited Actor, Title, Year, Name and Date of birth; values include Star Wars, 1977, Hook, George Lucas, Dustin Hoffman, Steven Spielberg, Mark Hamill, Harrison Ford and 14 May 1944) next to a query with variables r, x, y, z, u, v and edges labeled Movie, Producer, Director, Uncredited Actor, Date of birth and Name]
The root constraint and all the edge constraints are satisfied
All the nodes are mapped to non-null values
Movie Database
[Figure: the movie database and the query]
Consider the case where Node 35 is removed from the database
No Exact Matching Exists!
Allow Partial Matchings
[Figure: the movie database and the query, with several query variables mapped to NULL]
Not Every Partial Assignment is of Interest
This is not interesting, since the query returns data that has no connection to the query
The Reachability Constraint on Partial Matchings
• A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v, such that all edge constraints along this path are satisfied
[Figure: a query and a database whose edges carry the labels l1 to l6]
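One way to check the reachability constraint is a breadth-first search from the query root that only follows query edges whose constraints are satisfied. The sketch below assumes edges are (source, label, target) triples and a matching is a dict from variables to database nodes or None; the names and the toy data are hypothetical.

```python
from collections import deque


def edge_satisfied(matching, query_edge, db_edges):
    """Both endpoints mapped, and an equally labeled database edge exists."""
    src, label, dst = query_edge
    a, b = matching.get(src), matching.get(dst)
    return a is not None and b is not None and (a, label, b) in db_edges


def reachable_variables(matching, query_root, query_edges, db_edges):
    """Variables connected to the query root by a path of satisfied edges."""
    if matching.get(query_root) is None:
        return set()
    seen = {query_root}
    queue = deque([query_root])
    while queue:
        v = queue.popleft()
        for edge in query_edges:
            if edge[0] == v and edge[2] not in seen and edge_satisfied(
                matching, edge, db_edges
            ):
                seen.add(edge[2])
                queue.append(edge[2])
    return seen


def reachability_holds(matching, query_root, query_edges, db_edges):
    """Every variable mapped to a database node must be reachable."""
    reached = reachable_variables(matching, query_root, query_edges, db_edges)
    return all(v in reached for v, o in matching.items() if o is not None)


# Hypothetical toy data: y has two parents, x and w.
query_edges = {("r", "l1", "x"), ("r", "l2", "w"), ("x", "l3", "y"), ("w", "l4", "y")}
db_edges = {(1, "l1", 2), (2, "l3", 8)}

connected = {"r": 1, "x": 2, "w": None, "y": 8}      # y is reached through x
disconnected = {"r": 1, "x": None, "w": None, "y": 8}  # y has no satisfied path
```

In `disconnected`, node 8 is returned for y although nothing links it to the root, which is the kind of answer the constraint rules out.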
“And” Matchings
• A partial matching is an AND matching if
– The root constraint is satisfied
– The reachability constraint is satisfied by every
query node that is mapped to a database node
– If a query node is mapped to a database node,
all the incoming edge constraints are satisfied
[Figure: query fragment with variables r, x, y, z and edge labels Producer, Actor and Director]
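The three conditions of an AND matching can be checked directly. The following is a sketch under assumptions: the triple-based edge representation, the dict-based matching, the function names and the toy data are all hypothetical.

```python
def edge_satisfied(m, edge, db_edges):
    """Both endpoints mapped, and an equally labeled database edge exists."""
    src, label, dst = edge
    a, b = m.get(src), m.get(dst)
    return a is not None and b is not None and (a, label, b) in db_edges


def reachable(m, q_root, q_edges, db_edges):
    """Variables linked to the query root by a path of satisfied edges."""
    seen, stack = {q_root}, [q_root]
    while stack:
        v = stack.pop()
        for e in q_edges:
            if e[0] == v and e[2] not in seen and edge_satisfied(m, e, db_edges):
                seen.add(e[2])
                stack.append(e[2])
    return seen


def is_and_matching(m, q_root, q_edges, db_root, db_edges):
    if m.get(q_root) != db_root:                      # root constraint
        return False
    reached = reachable(m, q_root, q_edges, db_edges)
    for v, obj in m.items():
        if obj is None:
            continue
        if v not in reached:                          # reachability constraint
            return False
        incoming = [e for e in q_edges if e[2] == v]
        if not all(edge_satisfied(m, e, db_edges) for e in incoming):
            return False                              # all incoming edges
    return True


# Hypothetical toy data: z has two incoming query edges, but the
# database has no Actor edge from node 12 to node 25.
q_edges = {("r", "Movie", "x"), ("x", "Producer", "z"), ("x", "Actor", "z")}
db_edges = {(1, "Movie", 12), (12, "Producer", 25)}

z_mapped = {"r": 1, "x": 12, "z": 25}
z_null = {"r": 1, "x": 12, "z": None}
```

With this data `z_mapped` is rejected, because one incoming edge of z is not satisfied, while `z_null` is accepted.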
Movie Database
[Figure: the movie database and the query; one query variable is mapped to NULL]
An AND Matching
Movie Database
[Figure: the movie database and the query]
Suppose that we remove the edges that are labeled with Uncredited Actor
In an AND matching, Node z must be null!
Weak Satisfaction of Edge Constraints
• An edge constraint is weakly satisfied if either
– it is satisfied (as defined earlier), or
– one (or more) of its nodes is mapped to a null value
[Figure: example mappings of a query edge from x to y with label l onto database nodes 12 and 25 and null values]
Weak Matchings
• A partial matching is a weak matching if
– The root constraint is satisfied
– The reachability constraint is satisfied by
every query node that is mapped to a
database node
– Every edge constraint is weakly satisfied
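A weak matching can be tested in the same style. This sketch assumes edges are (source, label, target) triples and a matching is a dict from variables to database nodes or None; the function names and the labels Award and Winner are made up for illustration.

```python
def edge_satisfied(m, edge, db_edges):
    """Both endpoints mapped, and an equally labeled database edge exists."""
    src, label, dst = edge
    a, b = m.get(src), m.get(dst)
    return a is not None and b is not None and (a, label, b) in db_edges


def edge_weakly_satisfied(m, edge, db_edges):
    """Satisfied, or at least one endpoint is mapped to null."""
    src, _, dst = edge
    return (
        edge_satisfied(m, edge, db_edges)
        or m.get(src) is None
        or m.get(dst) is None
    )


def reachable(m, q_root, q_edges, db_edges):
    """Variables linked to the query root by a path of satisfied edges."""
    seen, stack = {q_root}, [q_root]
    while stack:
        v = stack.pop()
        for e in q_edges:
            if e[0] == v and e[2] not in seen and edge_satisfied(m, e, db_edges):
                seen.add(e[2])
                stack.append(e[2])
    return seen


def is_weak_matching(m, q_root, q_edges, db_root, db_edges):
    if m.get(q_root) != db_root:                      # root constraint
        return False
    reached = reachable(m, q_root, q_edges, db_edges)
    if any(o is not None and v not in reached for v, o in m.items()):
        return False                                  # reachability constraint
    return all(edge_weakly_satisfied(m, e, db_edges) for e in q_edges)


# Hypothetical toy data: y has two parents, x and w.
q_edges = {
    ("r", "Movie", "x"), ("r", "Award", "w"),
    ("x", "Director", "y"), ("w", "Winner", "y"),
}
db_edges = {(1, "Movie", 12), (12, "Director", 25), (1, "Award", 40)}

w_null = {"r": 1, "x": 12, "w": None, "y": 25}   # edges touching w are weakly satisfied
w_mapped = {"r": 1, "x": 12, "w": 40, "y": 25}   # Winner edge: both ends mapped, no db edge
```

`w_null` is a weak matching even though y has an unsatisfied incoming edge, whereas `w_mapped` is not, since the Winner edge has two mapped endpoints and no matching database edge.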
Movie Database
[Figure: the movie database and the query; two query variables are mapped to NULL, and the edges that are weakly satisfied are marked]
A Weak Matching
In a weak matching, all four options are permitted
In an AND matching, only the first three options are permitted
[Figure: four options for mapping a query edge from x to y with label l onto database nodes 12 and 25 and null values]
Movie Database
[Figure: the movie database and the query]
Consider the case where edges labeled with Producer are removed
In a weak matching, Node z must be null!
“OR” Matchings
• A partial matching is an OR matching if
– The root constraint is satisfied
– The reachability constraint is satisfied by
every query node that is mapped to a
database node
Movie Database
[Figure: the movie database and the query; two query variables are mapped to NULL, and an edge which is not weakly satisfied is marked]
An OR Matching
Increasing Level of Incompleteness
• An exact matching is an AND matching
• An AND matching is a weak matching
• A weak matching is an OR matching
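The containment chain can be made concrete with a small classifier that reports the strictest semantics a partial matching satisfies. This is a sketch under assumptions: the matching lists every query variable (with None for null), edges are (source, label, target) triples, and all names and data are hypothetical.

```python
def edge_satisfied(m, edge, db_edges):
    """Both endpoints mapped, and an equally labeled database edge exists."""
    src, label, dst = edge
    a, b = m.get(src), m.get(dst)
    return a is not None and b is not None and (a, label, b) in db_edges


def reachable(m, q_root, q_edges, db_edges):
    """Variables linked to the query root by a path of satisfied edges."""
    seen, stack = {q_root}, [q_root]
    while stack:
        v = stack.pop()
        for e in q_edges:
            if e[0] == v and e[2] not in seen and edge_satisfied(m, e, db_edges):
                seen.add(e[2])
                stack.append(e[2])
    return seen


def classify(m, q_root, q_edges, db_root, db_edges):
    """Return 'exact', 'AND', 'weak', 'OR', or None (not even an OR matching)."""
    if m.get(q_root) != db_root:
        return None
    reached = reachable(m, q_root, q_edges, db_edges)
    if any(o is not None and v not in reached for v, o in m.items()):
        return None
    sat = {e: edge_satisfied(m, e, db_edges) for e in q_edges}
    if all(o is not None for o in m.values()) and all(sat.values()):
        return "exact"
    # AND: every edge entering a mapped node is satisfied.
    if all(sat[e] for e in q_edges if m.get(e[2]) is not None):
        return "AND"
    # Weak: every edge is satisfied or touches a null node.
    if all(sat[e] or m.get(e[0]) is None or m.get(e[2]) is None for e in q_edges):
        return "weak"
    return "OR"


# Hypothetical toy data; the labels Award and Winner are made up.
q_edges = {
    ("r", "Movie", "x"), ("r", "Award", "w"),
    ("x", "Director", "y"), ("w", "Winner", "y"),
}
db = {(1, "Movie", 12), (12, "Director", 25), (1, "Award", 40)}
db_full = db | {(40, "Winner", 25)}

all_mapped = {"r": 1, "x": 12, "w": 40, "y": 25}
y_null = {"r": 1, "x": 12, "w": 40, "y": None}
w_null = {"r": 1, "x": 12, "w": None, "y": 25}
```

Because the checks run from strictest to loosest, any matching reported as exact would also pass the AND, weak and OR tests, which mirrors the three bullets above.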
Maximal Matchings
• A matching is maximal if no other matching
subsumes it, i.e., if there is no other matching that
is equal on all mapped variables, and has
additional mapped variables
• A query result consists of maximal matchings
only
• The maximality of a matching may depend on the semantics considered (i.e., OR, weak, AND)
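Subsumption and the maximality filter can be sketched as follows, assuming a matching is a dict from query variables to database nodes or None. The function names and the sample matchings are hypothetical, and the input list is assumed to contain only matchings that are valid under the chosen semantics.

```python
def subsumes(big, small):
    """True if `big` agrees with `small` on every variable that `small`
    maps to a database node, and maps at least one additional variable."""
    agrees = all(big.get(v) == o for v, o in small.items() if o is not None)
    extra = any(o is not None and small.get(v) is None for v, o in big.items())
    return agrees and extra


def maximal_matchings(matchings):
    """Keep only matchings that no other matching in the list subsumes."""
    return [m for m in matchings if not any(subsumes(o, m) for o in matchings)]


# Hypothetical matchings over variables r, x, y.
m1 = {"r": 1, "x": 12, "y": None}
m2 = {"r": 1, "x": 12, "y": 25}   # extends m1 by mapping y
m3 = {"r": 1, "x": 11, "y": None}  # differs from m2 on x, so not subsumed

result = maximal_matchings([m1, m2, m3])
```

Since the set of valid matchings differs between OR, weak and AND semantics, the same matching may be filtered out under one semantics and kept under another.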
Movie Database
[Figure: the movie database and the query, with a partial matching in which two query variables are mapped to NULL]
Is this an AND matching?
Is it maximal?
Movie Database
[Figure: the movie database and the query, with a partial matching in which two query variables are mapped to NULL]
Is this a Weak matching?
Is it maximal?
Movie Database
[Figure: the movie database and the query, with a partial matching in which two query variables are mapped to NULL]
Is this an OR matching?
Is it maximal?
Movie Database
[Figure: the movie database and the query, with a partial matching in which two query variables are mapped to NULL]
Is this an AND matching? Weak matching? OR matching?
Is it maximal (for each option)?
University
[Figure: a university database (root node 1; edges labeled Course, Teacher, Lab, Title, Instructor and Name; values include A. Cohen, B. Levi, C. Katz, Logic, OS, Compilers and Databases) and a query with variables u, v, w and x]
Find all maximal answers under AND Semantics, OR Semantics and Weak Semantics
Computing Maximal Answers
• How can we systematically compute all
maximal answers?
• Can we compute all answers in polynomial
time?
• We will see an algorithm to compute all
maximal answers of a DAG Query under
AND Semantics
Intuition
• Sort nodes in query by a topological order
• Start with the set of matchings containing the
matching (root of query/root of database)
• Iterate over nodes vi according to order
– extend each matching by all possible images of vi that yield AND matchings
– if there are no appropriate images, then extend it with vi mapped to null
Eval-Dag-Query-AND-Semantics(Q, D)
let v0 < v1 < … < vk be a topological ordering of the nodes of Q
let S0 = {(v0/root(D))}
for i = 1 to k do
    Si = ∅
    for each μ ∈ Si-1 do
        E = { u ∈ D | μ ⊕ (vi/u) is an AND matching }
        if E = ∅
            then Si = Si ∪ { μ ⊕ (vi/null) }
            else Si = Si ∪ { μ ⊕ (vi/u) | u ∈ E }
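The pseudocode above can be turned into a short Python sketch. It rests on several assumptions: edges are (source, label, target) triples, the query is a rooted DAG, the final set Sk is what gets returned, and the test "is an AND matching" is reduced to checking the incoming edges of vi (its parents come earlier in the topological order). All names and the toy data are hypothetical.

```python
from collections import defaultdict


def topological_order(query_root, query_edges):
    """Kahn's algorithm; raises ValueError if the query is not a rooted DAG."""
    nodes = {query_root}
    for s, _, t in query_edges:
        nodes.update((s, t))
    indegree = {v: 0 for v in nodes}
    for _, _, t in query_edges:
        indegree[t] += 1
    ready = sorted(v for v in nodes if indegree[v] == 0)
    order = []
    while ready:
        v = ready.pop(0)
        order.append(v)
        for s, _, t in sorted(query_edges):
            if s == v:
                indegree[t] -= 1
                if indegree[t] == 0:
                    ready.append(t)
    if len(order) != len(nodes) or order[0] != query_root:
        raise ValueError("query must be a rooted DAG")
    return order


def eval_dag_query_and(query_root, query_edges, db_root, db_edges):
    order = topological_order(query_root, query_edges)
    db_nodes = {db_root}
    for s, _, t in db_edges:
        db_nodes.update((s, t))
    incoming = defaultdict(list)          # variable -> [(parent, label)]
    for s, label, t in query_edges:
        incoming[t].append((s, label))

    current = [{query_root: db_root}]     # S0
    for v in order[1:]:
        extended = []                     # Si
        for mu in current:
            # u is a valid image if every incoming edge of v is satisfied.
            images = [
                u for u in sorted(db_nodes)
                if incoming[v] and all(
                    mu[p] is not None and (mu[p], label, u) in db_edges
                    for p, label in incoming[v]
                )
            ]
            if images:
                extended.extend({**mu, v: u} for u in images)
            else:
                extended.append({**mu, v: None})
        current = extended
    return current


# Hypothetical toy data: movie 11 has no producer.
q_edges = {("r", "Movie", "x"), ("x", "Director", "y"), ("x", "Producer", "z")}
db_edges = {
    (1, "Movie", 11), (1, "Movie", 12),
    (11, "Director", 21), (12, "Director", 22), (12, "Producer", 23),
}
answers = eval_dag_query_and("r", q_edges, 1, db_edges)
```

For this data the sketch produces one answer per movie, with z mapped to null for the movie that has no Producer edge. Note that the list of partial matchings can grow with every query node, which is relevant to the runtime and memory questions raised next.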
Analyzing the Algorithm
• Why is the algorithm correct?
• What is the runtime of the algorithm?
• What are the memory requirements of the
algorithm?
• Can this algorithm easily be adapted for
general graph queries (which may contain
cycles)?
AND Semantics – Cyclic Queries
• Determining whether there is an AND matching that maps at least one non-root node to a non-null value is NP-Complete
– why is it in NP?
– NP-hardness by reduction from Hamiltonian cycle
Hamiltonian Cycle
• Given a graph G, a Hamiltonian cycle is a
simple cycle that traverses each node in the
graph exactly once
• Determining if there is a Hamiltonian cycle
is NP-Complete!
Can You Find One Here?
Reduction
• We show how, given a solution to the
matchings under AND-Semantics problem,
we can solve the Hamiltonian cycle
problem
• Given graph G, we
– create database D and query Q such that
– G has a Hamiltonian cycle if and only if there is
an AND-matching that maps a non-root node to
a non-null value
Creating the Database
• Suppose that the graph G has nodes n1,…,nk
• We create a database with nodes u0,u1,…,uk
• u0 is the root of the database
• there is an edge labeled node from u0 to each node ui (i ≥ 1)
• for each pair of nodes ui, uj (i ≥ 1, j ≥ 1, i ≠ j) there is an edge labeled neql from ui to uj
• there is an edge labeled succ from ui to uj if there is an edge from ni to nj in G
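The database construction can be sketched in a few lines of Python. The sketch assumes that graph nodes are numbered 1 to k, that database node ui is represented by the integer i, and that G is given as a set of directed pairs (for an undirected graph, both directions would be listed). The function name is hypothetical.

```python
def build_database(k, graph_edges):
    """Return (root, edges) of the database built from a graph with
    nodes 1..k; edges are (source, label, target) triples."""
    edges = set()
    for i in range(1, k + 1):
        edges.add((0, "node", i))              # u0 -> ui
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i != j:
                edges.add((i, "neql", j))      # ui -> uj for i != j
    for i, j in graph_edges:
        edges.add((i, "succ", j))              # one succ edge per edge of G
    return 0, edges


# Hypothetical example: a directed triangle 1 -> 2 -> 3 -> 1.
root, db_edges = build_database(3, {(1, 2), (2, 3), (3, 1)})
```

For k = 3 this yields 3 node edges, 6 neql edges and 3 succ edges.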
Example: Create the Database for
this Graph
Creating the Query
• Suppose that the graph G has nodes n1,…,nk
• We create a query with nodes v0,v1,…,vk
• v0 is the root of the query
• there is an edge labeled node from v0 to each node vi (i ≥ 1)
• for each pair of nodes vi, vj (i ≥ 1, j ≥ 1, i ≠ j) there is an edge labeled neql from vi to vj
• there is an edge labeled succ from vi-1 to vi (for all i > 1) and an edge labeled succ from vk to v1
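The query depends only on the number of nodes k, not on the edges of G. A minimal sketch, assuming query variables are named by the strings "v0" to "vk" and edges are (source, label, target) triples; the function name is hypothetical.

```python
def build_query(k):
    """Return (root, edges) of the query for a graph with k >= 2 nodes."""
    edges = set()
    for i in range(1, k + 1):
        edges.add(("v0", "node", f"v{i}"))                 # v0 -> vi
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i != j:
                edges.add((f"v{i}", "neql", f"v{j}"))      # vi -> vj for i != j
    for i in range(2, k + 1):
        edges.add((f"v{i - 1}", "succ", f"v{i}"))          # v(i-1) -> vi
    edges.add((f"v{k}", "succ", "v1"))                     # closes the cycle
    return "v0", edges


q_root, q_edges = build_query(3)
```

The succ edges form a single cycle through v1,…,vk, so the query is cyclic and the DAG algorithm does not apply to it.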
Example: Create the Query for this
Graph
How does the Reduction Work?
• Mapping the root of the query to the root of the
database is an AND-matching
– can any additional nodes be mapped?
• If there is a Hamiltonian cycle, then this gives rise
to a complete mapping of the query to the
database
• If there is a matching that maps something other than the root to a non-null value, then:
– it must map all the nodes (because of the cycle of succ edges)
– it must map all query nodes to different database nodes (because of neql edges)
– therefore, the images of the query nodes correspond to a Hamiltonian cycle (because of succ edges)
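The argument can be checked by brute force on small graphs. The sketch below rebuilds the database and the query in compact form and searches for a complete matching by trying every permutation of the database nodes. It assumes graph nodes are numbered 1 to k and that G is a set of directed pairs; only injective assignments are tried, which loses nothing because the neql edges force distinct images. All names are hypothetical, and the exponential search is for illustration only.

```python
from itertools import permutations


def build_database(k, graph_edges):
    """Database edges as (source, label, target); node 0 is the root."""
    edges = {(0, "node", i) for i in range(1, k + 1)}
    edges |= {
        (i, "neql", j)
        for i in range(1, k + 1) for j in range(1, k + 1) if i != j
    }
    edges |= {(i, "succ", j) for i, j in graph_edges}
    return edges


def build_query(k):
    """Query edges as (source, label, target); "v0" is the root."""
    edges = {("v0", "node", f"v{i}") for i in range(1, k + 1)}
    edges |= {
        (f"v{i}", "neql", f"v{j}")
        for i in range(1, k + 1) for j in range(1, k + 1) if i != j
    }
    edges |= {(f"v{i - 1}", "succ", f"v{i}") for i in range(2, k + 1)}
    edges.add((f"v{k}", "succ", "v1"))
    return edges


def has_complete_matching(k, graph_edges):
    """True if some assignment satisfies every query edge constraint."""
    db = build_database(k, graph_edges)
    query = build_query(k)
    for perm in permutations(range(1, k + 1)):
        m = {"v0": 0}
        m.update({f"v{i}": perm[i - 1] for i in range(1, k + 1)})
        if all((m[s], label, m[t]) in db for s, label, t in query):
            return True
    return False


triangle = {(1, 2), (2, 3), (3, 1)}   # has a Hamiltonian cycle
path = {(1, 2), (2, 3)}               # has none
```

On these two inputs the search succeeds for the triangle and fails for the path, as the reduction predicts.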