CSE 544: Lecture 9
Conjunctive Queries, Views, Datalog
Monday, 4/29/2002
1
Conjunctive Queries
• A conjunctive query is an FO formula
containing:
– R(t1, ..., tn), ∧, ∃
(missing are ∨, ¬, ∀)
• CQ = set of conjunctive queries
• Example: q(x,y) = ∃z.(R(x,z) ∧ ∃u.(R(z,u) ∧ R(u,y)))
2
Conjunctive Queries
• Any CQ query can be written as:
q(x1,...,xn) = ∃y1.∃y2...∃yp.(R1(t11,...,t1m) ∧ ... ∧ Rk(tk1,...,tkm))
(why ?)
• Same in Datalog notation (a Datalog rule):
q(x1,...,xn) :- R1(t11,...,t1m), ... , Rk(tk1,...,tkm)
– head: q(x1,...,xn)
– body: R1(t11,...,t1m), ... , Rk(tk1,...,tkm)
3
Examples
Employee(x), ManagedBy(x,y), Manager(y)
• Find all employees having the same manager as
Smith:
A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
4
Examples
Employee(x), ManagedBy(x,y), Manager(y)
• Find all employees having the same director as
Smith:
A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z),
ManagedBy(x,u), ManagedBy(u,z)
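A minimal Python sketch of how the two example queries could be evaluated over a toy instance. The "?x" convention for variables, the helper names match and evaluate, and the sample ManagedBy tuples are assumptions made for illustration; they are not part of the lecture.

```python
def match(term, value, binding):
    """Unify one query term with one database value, extending `binding`."""
    if term.startswith("?"):          # assumed convention: "?x" denotes a variable
        return binding.setdefault(term, value) == value
    return term == value              # a constant must match exactly


def evaluate(head, body, db):
    """Evaluate a conjunctive query by joining the body atoms left to right."""
    bindings = [{}]
    for relation, terms in body:
        extended = []
        for binding in bindings:
            for row in db.get(relation, ()):
                candidate = dict(binding)
                if all(match(t, v, candidate) for t, v in zip(terms, row)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t.startswith("?") else t for t in head) for b in bindings}


# Hypothetical instance: (employee, manager) pairs.
db = {"ManagedBy": {("Smith", "Jones"), ("Lee", "Jones"), ("Jones", "Kim"),
                    ("Park", "Cho"), ("Cho", "Kim"), ("Ann", "Bob")}}

# A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)
same_manager = evaluate(("?x",),
                        [("ManagedBy", ("Smith", "?y")),
                         ("ManagedBy", ("?x", "?y"))], db)

# A(x) :- ManagedBy("Smith",y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
same_director = evaluate(("?x",),
                         [("ManagedBy", ("Smith", "?y")),
                          ("ManagedBy", ("?y", "?z")),
                          ("ManagedBy", ("?x", "?u")),
                          ("ManagedBy", ("?u", "?z"))], db)
```

Note that Smith appears in both answers, since nothing in either rule requires x to differ from "Smith".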
5
CQ and Relational Algebra
Relational Algebra:
• Conjunctive queries correspond precisely to
σC, ΠA, ×
(missing: ∪, –)
A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
Π[$2.name]( σ[name=“Smith”](ManagedBy) ⋈[$1.manager=$2.manager] ManagedBy )
6
CQ and SQL
SQL:
• Conjunctive queries correspond to single
select-distinct-from-where blocks with
equality conditions in the WHERE clause
select distinct m2.name
from ManagedBy m1, ManagedBy m2
where m1.name = 'Smith' AND
m1.manager = m2.manager
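As a hedged illustration, the select-distinct-from-where block can be run with Python's built-in sqlite3 module. The table layout ManagedBy(name, manager) follows the column names used in the query; the sample rows, including the deliberately duplicated one, are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagedBy (name TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO ManagedBy VALUES (?, ?)",
    [("Smith", "Jones"), ("Lee", "Jones"), ("Lee", "Jones"), ("Park", "Cho")],
)

# The conjunctive query A(x) :- ManagedBy("Smith",y), ManagedBy(x,y) as a
# single SELECT DISTINCT block with equality conditions only.
rows = conn.execute(
    """
    SELECT DISTINCT m2.name
    FROM ManagedBy m1, ManagedBy m2
    WHERE m1.name = 'Smith' AND m1.manager = m2.manager
    """
).fetchall()
colleagues = sorted(name for (name,) in rows)

# Same query without DISTINCT: the duplicated input row shows up twice.
bag_rows = conn.execute(
    """
    SELECT m2.name
    FROM ManagedBy m1, ManagedBy m2
    WHERE m1.name = 'Smith' AND m1.manager = m2.manager
    """
).fetchall()
conn.close()
```

The DISTINCT keyword is what gives the block the set semantics of a conjunctive query; without it SQL returns a bag.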
7
Conjunctive Queries
• Main focus of optimization techniques
• Focus of research during 70’s, 80’s
• Still focus of research in the 00’s
• Properties of CQ:
– Containment is decidable [Chandra&Merlin’77]
– Query rewriting using views [Levy et al.’95, Ullman’99]
8
Query Containment
• Query q1 is contained in q2 if for every
database D, q1(D) ⊆ q2(D).
• Notation: q1 ⊆ q2
• Obviously: if q1 ⊆ q2 and q2 ⊆ q1 then q1 = q2.
9
Examples of Query Containments
In which cases is q1 ⊆ q2 ?
q1(x) :- R(x,u), R(u,v), R(v,w)
q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,u)
q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,v), R(v,x)
q2(x) :- R(x,u), R(u,x)
q1(x) :- R(x,u), R(u,“Smith”)
q2(x) :- R(x,u), R(u,v)
10
Query Containment
• Theorem Query containment for FO is
undecidable
• Theorem Query containment for CQ is
decidable and NP-complete.
11
Query Containment Algorithm
How to check q1 ⊆ q2
• Canonical database for q1 is:
Dq1 = (D, R1D, …, RkD)
– D = all variables and constants in q1
– R1D, …, RkD = the body of q1
• Canonical tuple for q1 is:
tq1 (the head of q1)
12
Examples of Canonical
Databases
q1(x,y) :- R(x,u),R(v,u),R(v,y)
• Canonical database: Dq1 = (D, RD)
– D={x,y,u,v}
– RD = {(x,u), (v,u), (v,y)}
• Canonical tuple: tq1 = (x,y)
13
Examples of Canonical
Databases
q1(x) :- R(x,u), R(u,“Smith”), R(u,“Fred”), R(u,u)
• Dq1 = (D, R)
– D={x,u,“Smith”,“Fred”}
– R = {(x,u), (u,“Smith”), (u,“Fred”), (u,u)}
• tq1 = (x)
14
Checking Containment
Theorem: q1 ⊆ q2 iff tq1 ∈ q2(Dq1).
Example:
q1(x,y) :- R(x,u),R(v,u),R(v,y)
q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
• D={x,y,u,v}
• R = {(x,u), (v,u), (v,y)}
• tq1 = (x,y)
• Yes, q1 ⊆ q2: (x,y) ∈ q2(Dq1), taking w = u and t = v
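The canonical-database test can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the lecture's own code: variables are written "?x", queries are (head, body) pairs, and the helper names match, evaluate, canonical_database and contained_in are hypothetical.

```python
def match(term, value, binding):
    """Unify one query term with one database value, extending `binding`."""
    if term.startswith("?"):          # assumed convention: "?x" denotes a variable
        return binding.setdefault(term, value) == value
    return term == value


def evaluate(head, body, db):
    """Evaluate a conjunctive query over db (relation name -> set of tuples)."""
    bindings = [{}]
    for relation, terms in body:
        extended = []
        for binding in bindings:
            for row in db.get(relation, ()):
                candidate = dict(binding)
                if all(match(t, v, candidate) for t, v in zip(terms, row)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t.startswith("?") else t for t in head) for b in bindings}


def canonical_database(body):
    """Freeze the body of q1: each atom becomes a tuple, variables act as constants."""
    db = {}
    for relation, terms in body:
        db.setdefault(relation, set()).add(tuple(terms))
    return db


def contained_in(q1, q2):
    """q1 ⊆ q2 iff the canonical tuple of q1 is in q2(Dq1)."""
    head1, body1 = q1
    head2, body2 = q2
    return tuple(head1) in evaluate(head2, body2, canonical_database(body1))


# q1(x,y) :- R(x,u),R(v,u),R(v,y)
q1 = (("?x", "?y"), [("R", ("?x", "?u")), ("R", ("?v", "?u")), ("R", ("?v", "?y"))])
# q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
q2 = (("?x", "?y"), [("R", ("?x", "?u")), ("R", ("?v", "?u")), ("R", ("?v", "?w")),
                     ("R", ("?t", "?w")), ("R", ("?t", "?y"))])

q1_in_q2 = contained_in(q1, q2)
q2_in_q1 = contained_in(q2, q1)
```

In this sketch the frozen variables of q1 are simply kept as the strings "?x", "?u", and so on; they are opaque values once they sit inside the database.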
15
Query Homomorphisms
• A homomorphism f : q2 → q1 is a function
f: var(q2) → var(q1) ∪ const(q1)
such that:
– f(body(q2)) ⊆ body(q1)
– f(tq2) = tq1
The Homomorphism Theorem q1 ⊆ q2 iff there
exists a homomorphism f : q2 → q1
16
Example of Query Homomorphism
var(q1) = {x, u, v, y}
var(q2) = {x, u, v, w, t, y}
q1(x,y) :- R(x,u),R(v,u),R(v,y)
q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
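A brute-force search for a homomorphism from q2 to q1 can be sketched as below. It simply tries every assignment of q2's variables to terms of q1, so it is exponential in the number of variables; the "?x" variable convention and the name find_homomorphism are assumptions of this sketch.

```python
from itertools import product


def find_homomorphism(q2, q1):
    """Return a dict f: var(q2) -> terms(q1) that is a homomorphism, or None."""
    (head2, body2), (head1, body1) = q2, q1
    variables = sorted({t for _, terms in body2 for t in terms if t.startswith("?")}
                       | {t for t in head2 if t.startswith("?")})
    targets = sorted({t for _, terms in body1 for t in terms})
    atoms1 = {(rel, tuple(terms)) for rel, terms in body1}
    for image in product(targets, repeat=len(variables)):
        f = dict(zip(variables, image))
        # Constants are mapped to themselves via f.get(t, t).
        if tuple(f.get(t, t) for t in head2) != tuple(head1):
            continue                                  # must send tq2 to tq1
        if all((rel, tuple(f.get(t, t) for t in terms)) in atoms1
               for rel, terms in body2):
            return f                                  # f(body(q2)) ⊆ body(q1)
    return None


# q1(x,y) :- R(x,u),R(v,u),R(v,y)
q1 = (("?x", "?y"), [("R", ("?x", "?u")), ("R", ("?v", "?u")), ("R", ("?v", "?y"))])
# q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
q2 = (("?x", "?y"), [("R", ("?x", "?u")), ("R", ("?v", "?u")), ("R", ("?v", "?w")),
                     ("R", ("?t", "?w")), ("R", ("?t", "?y"))])

f = find_homomorphism(q2, q1)        # exists, so q1 ⊆ q2
g = find_homomorphism(q1, q2)        # does not exist, so q2 is not contained in q1
```

Several homomorphisms exist for this pair (for instance x→x, u→u, v→v, w→u, t→v, y→y); the sketch returns whichever one the enumeration reaches first.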
17
Example of Query Homomorphism
var(q1) ∪ const(q1) = {x, u, “Smith”, “Fred”}
var(q2) = {x,u,v,w}
q1(x) :- R(x,u), R(u,“Smith”), R(u,“Fred”), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,“Smith”), R(w,u)
18
The Homomorphism Theorem
• Theorem q1 ⊆ q2 iff there exists a
homomorphism from q2 to q1.
• Theorem Conjunctive query containment is:
(1) decidable (why ?)
(2) in NP (why ?)
(3) NP-hard
• Short: it is NP-complete
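One way to see membership in NP: a candidate mapping f is a short certificate, and checking it takes time polynomial in the size of the two queries. The sketch below checks a given mapping for the q1, q2 pair with constants “Smith” and “Fred”; the "?x" variable convention, the name is_homomorphism and the two candidate mappings are assumptions of this illustration. Decidability follows similarly, because there are only finitely many mappings from var(q2) to var(q1) ∪ const(q1) to try.

```python
def is_homomorphism(f, q2, q1):
    """Check that f (a dict on the variables of q2) is a homomorphism q2 -> q1."""
    (head2, body2), (head1, body1) = q2, q1
    variables2 = {t for _, terms in body2 for t in terms if t.startswith("?")}
    if not variables2 <= set(f):
        return False                                   # f must be total on var(q2)
    atoms1 = {(rel, tuple(terms)) for rel, terms in body1}
    # Constants are mapped to themselves via f.get(t, t).
    head_ok = tuple(f.get(t, t) for t in head2) == tuple(head1)
    body_ok = all((rel, tuple(f.get(t, t) for t in terms)) in atoms1
                  for rel, terms in body2)
    return head_ok and body_ok


# q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q1 = (("?x",), [("R", ("?x", "?u")), ("R", ("?u", "Smith")),
                ("R", ("?u", "Fred")), ("R", ("?u", "?u"))])
# q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)
q2 = (("?x",), [("R", ("?x", "?u")), ("R", ("?u", "?v")),
                ("R", ("?u", "Smith")), ("R", ("?w", "?u"))])

good = {"?x": "?x", "?u": "?u", "?v": "Smith", "?w": "?x"}
bad = {"?x": "?x", "?u": "?u", "?v": "Smith", "?w": "Smith"}   # R("Smith",u) is not in q1

good_ok = is_homomorphism(good, q2, q1)
bad_ok = is_homomorphism(bad, q2, q1)
```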
19
Views
Employee(x), ManagedBy(x,y), Manager(y)
Views:
L(x,y) :- ManagedBy(x,u), ManagedBy(u,y)
E(x,y) :- ManagedBy(x,y), Employee(y)
Query:
Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v),
ManagedBy(v,w), ManagedBy(w,y), Employee(y)
How can we answer Q if we only have L and E ?
20
Views
• Query rewriting using views (when
possible):
Q(x,y) :- L(x,u), L(u,y), E(v,y)
• Query answering:
– Sometimes we cannot express it in CQ or FO,
but we can still answer it
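To check that the rewriting Q(x,y) :- L(x,u), L(u,y), E(v,y) is equivalent to the original Q, one can unfold the view definitions and test containment in both directions with the canonical-database method. The following is a sketch under assumptions: variables are written "?x", the helper names are hypothetical, and fresh variables are generated by appending a counter suffix, which presumes no existing variable already uses such a name.

```python
from itertools import count


def match(term, value, binding):
    if term.startswith("?"):          # assumed convention: "?x" denotes a variable
        return binding.setdefault(term, value) == value
    return term == value


def evaluate(head, body, db):
    bindings = [{}]
    for relation, terms in body:
        extended = []
        for binding in bindings:
            for row in db.get(relation, ()):
                candidate = dict(binding)
                if all(match(t, v, candidate) for t, v in zip(terms, row)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t.startswith("?") else t for t in head) for b in bindings}


def contained_in(q1, q2):
    """q1 ⊆ q2 iff the head of q1 is in q2 evaluated on q1's canonical database."""
    db = {}
    for relation, terms in q1[1]:
        db.setdefault(relation, set()).add(tuple(terms))
    return tuple(q1[0]) in evaluate(q2[0], q2[1], db)


def unfold(body, views):
    """Replace each view atom by the view's body, renaming its local variables."""
    fresh, result = count(), []
    for relation, args in body:
        if relation not in views:
            result.append((relation, tuple(args)))
            continue
        head_vars, view_body = views[relation]
        subst, n = dict(zip(head_vars, args)), next(fresh)
        for rel, terms in view_body:
            result.append((rel, tuple(
                subst.get(t, f"{t}_{n}") if t.startswith("?") else t for t in terms)))
    return result


views = {
    "L": (("?x", "?y"), [("ManagedBy", ("?x", "?u")), ("ManagedBy", ("?u", "?y"))]),
    "E": (("?x", "?y"), [("ManagedBy", ("?x", "?y")), ("Employee", ("?y",))]),
}
Q = (("?x", "?y"), [("ManagedBy", ("?x", "?u")), ("ManagedBy", ("?u", "?v")),
                    ("ManagedBy", ("?v", "?w")), ("ManagedBy", ("?w", "?y")),
                    ("Employee", ("?y",))])

rewriting = [("L", ("?x", "?u")), ("L", ("?u", "?y")), ("E", ("?v", "?y"))]
unfolded = (("?x", "?y"), unfold(rewriting, views))
equivalent = contained_in(Q, unfolded) and contained_in(unfolded, Q)

# Dropping E loses the Employee(y) condition: the result only contains Q.
without_e = (("?x", "?y"), unfold(rewriting[:2], views))
```

The E(v,y) atom is what carries Employee(y) into the rewriting; the extra ManagedBy(v,y) it introduces is already implied by the second L atom.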
21
Views
Applications:
• Using advanced indexes
• Using replicated data
• Data integration [Ullman’99]
22
Expressive Power
• Vocabulary: binary relation R
• The following queries cannot be expressed
in FO:
• Transitive closure:
– ∀x.∀y. there exists x1, ..., xn s.t.
R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
• Parity: the number of edges in R is even
23
Datalog
• Adds recursion, so we can compute transitive
closure
• A datalog program (query) consists of several
datalog rules:
P1(t1) :- body1
P2(t2) :- body2
.. . .
Pn(tn) :- bodyn
24
Datalog
Terminology:
• EDB = extensional database predicates
– The database predicates
• IDB = intensional database predicates
– The new predicates constructed by the program
25
Datalog
EDBs: Employee(x), ManagedBy(x,y), Manager(y)
All higher level managers that are employees:
HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y)
Answer(x) :- HMngr(x), Employee(x)
IDBs: HMngr, Answer
26
Datalog
Employee(x), ManagedBy(x,y), Manager(y)
All persons:
Person(x) :- Manager(x)
Person(x) :- Employee(x)
Manager ∪ Employee
27
Datalog
Graph: R(x,y)
P(x,y) :- R(x,u), R(u,v), R(v,y)
A(x,y) :- P(x,u), P(u,y)
Can “unfold” it into:
A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)
28
Recursion in Datalog
Graph: R(x,y)
Transitive closure:
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), R(u,y)
Transitive closure:
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), P(u,y)
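Both programs can be evaluated by naive bottom-up iteration: start from the facts derived by the non-recursive rule and apply the recursive rule until nothing new appears. The Python sketch below does this for each variant; the function names and the sample graphs are made up for illustration.

```python
def transitive_closure_linear(r):
    """P(x,y) :- R(x,y).   P(x,y) :- P(x,u), R(u,y)."""
    p = set(r)
    while True:
        derived = {(x, y) for (x, u) in p for (v, y) in r if u == v}
        if derived <= p:              # fixpoint reached: no new facts
            return p
        p |= derived


def transitive_closure_nonlinear(r):
    """P(x,y) :- R(x,y).   P(x,y) :- P(x,u), P(u,y)."""
    p = set(r)
    while True:
        derived = {(x, y) for (x, u) in p for (v, y) in p if u == v}
        if derived <= p:
            return p
        p |= derived


# Hypothetical graphs: a path a -> b -> c -> d, and a two-node cycle.
path = {("a", "b"), ("b", "c"), ("c", "d")}
cycle = {(1, 2), (2, 1)}

closure_path = transitive_closure_linear(path)
closure_cycle = transitive_closure_nonlinear(cycle)
```

Both variants compute the same relation; the second joins P with itself, so path lengths can double in each round and it typically needs fewer iterations on long paths.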
29
Recursion in Datalog
Boolean trees:
Leaf0(x), Leaf1(x),
AND(x, y1, y2), OR(x, y1, y2),
Root(x)
• Find out if the tree value is 0 or 1
One(x) :- Leaf1(x)
One(x) :- AND(x, y1, y2), One(y1), One(y2)
One(x) :- OR(x, y1, y2), One(y1)
One(x) :- OR(x, y1, y2), One(y2)
Answer() :- Root(x), One(x)
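The program for One(x) can be evaluated bottom-up in the same naive fashion. The sketch below is a hedged illustration: the function name tree_value_is_one and the small example tree are assumptions, and Leaf0 is not passed in because none of the five rules mentions it.

```python
def tree_value_is_one(leaf1, and_nodes, or_nodes, root):
    """Naive fixpoint for One(x), then Answer() :- Root(x), One(x)."""
    one = set(leaf1)                                   # One(x) :- Leaf1(x)
    while True:
        derived = set()
        # One(x) :- AND(x, y1, y2), One(y1), One(y2)
        derived |= {x for (x, y1, y2) in and_nodes if y1 in one and y2 in one}
        # One(x) :- OR(x, y1, y2), One(y1)   and   One(x) :- OR(x, y1, y2), One(y2)
        derived |= {x for (x, y1, y2) in or_nodes if y1 in one or y2 in one}
        if derived <= one:                             # fixpoint reached
            break
        one |= derived
    return any(x in one for x in root)


# Hypothetical tree: r = AND(a, b), a = OR(c, d), with leaves c = 0, d = 1, b = 1.
and_nodes = {("r", "a", "b")}
or_nodes = {("a", "c", "d")}
root = {"r"}

value_is_one = tree_value_is_one({"d", "b"}, and_nodes, or_nodes, root)

# Same tree with leaf b set to 0: the AND at the root can no longer be derived.
value_with_b_zero = tree_value_is_one({"d"}, and_nodes, or_nodes, root)
```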
30
Exercise
Boolean trees:
Leaf0(x), Leaf1(x),
AND(x, y1, y2), OR(x, y1, y2), Not(x,y),
Root(x)
• Hint: compute both One(x) and Zero(x)
(here you need to use Leaf0)
31
Variants of Datalog
            | without recursion                           | with recursion
without ¬   | Non-recursive Datalog = union of CQ (why ?) | Datalog
with ¬      | Non-recursive Datalog¬ = FO                 | Datalog¬
32
Computational
Complexity Classes
Recall computational complexity classes:
• AC0
• LOGSPACE
• NLOGSPACE
• PTIME
(We care mostly about these)
• NP
• PSPACE
• EXPTIME
• EXPSPACE
• (Kalmar) Elementary Functions
• Turing Computable functions
33
Query Languages and
Complexity Classes
Paper: On the Unusual Effectiveness of Logic in Computer Science
PSPACE: FO(PFP) = datalog¬,*
PTIME: FO(LFP) = datalog¬
AC0: FO = non-rec datalog¬
Important: the more complex a QL, the harder it is to optimize
34