CSE 544: Lecture 9
Conjunctive Queries, Views, Datalog
Monday, 4/29/2002

Conjunctive Queries
• A conjunctive query is an FO formula containing only:
  – atoms R(t1, ..., tn), conjunction ∧, and existential quantification ∃ (missing are ∀, ∨, ¬)
• CQ = set of conjunctive queries
• Example: q(x,y) = ∃z.(R(x,z) ∧ ∃u.(R(z,u) ∧ R(u,y)))

Conjunctive Queries
• Any CQ query can be written as:
  q(x1,...,xn) = ∃y1.∃y2...∃yp.(R1(t11,...,t1m) ∧ ... ∧ Rk(tk1,...,tkm))
  (why?)
• Same in Datalog notation, as a Datalog rule:
  q(x1,...,xn) :- R1(t11,...,t1m), ..., Rk(tk1,...,tkm)
  The atom to the left of :- is the head; the atoms to the right form the body.

Examples
Employee(x), ManagedBy(x,y), Manager(y)
• Find all employees having the same manager as Smith:
  A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)

Examples
Employee(x), ManagedBy(x,y), Manager(y)
• Find all employees having the same director as Smith:
  A(x) :- ManagedBy("Smith",y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)

CQ and Relational Algebra
Relational Algebra:
• Conjunctive queries correspond precisely to σ_C, Π_A, × (missing: ∪, –)
  A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)
  Π_{$2.name}( σ_{name="Smith"}(ManagedBy) ⋈_{$1.manager=$2.manager} ManagedBy )

CQ and SQL
SQL:
• Conjunctive queries correspond to single select-distinct-from-where blocks with equality conditions in the WHERE clause
  select distinct m2.name
  from ManagedBy m1, ManagedBy m2
  where m1.name = 'Smith' AND m1.manager = m2.manager

Conjunctive Queries
• Main focus of optimization techniques
• Focus of research during the 1970s and 1980s
• Still a focus of research in the 2000s
• Properties of CQ:
  – Containment is decidable [Chandra&Merlin'77]
  – Query rewriting using views [Levy et al.'95, Ullman'99]

Query Containment
• Query q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D).
• Notation: q1 ⊆ q2
• Obviously: if q1 ⊆ q2 and q2 ⊆ q1 then q1 = q2.

Examples of Query Containments
In which cases is q1 ⊆ q2?
  q1(x) :- R(x,u), R(u,v), R(v,w)
  q2(x) :- R(x,u), R(u,v)

  q1(x) :- R(x,u), R(u,u)
  q2(x) :- R(x,u), R(u,v)

  q1(x) :- R(x,u), R(u,v), R(v,x)
  q2(x) :- R(x,u), R(u,x)

  q1(x) :- R(x,u), R(u,"Smith")
  q2(x) :- R(x,u), R(u,v)

Query Containment
• Theorem: Query containment for FO is undecidable.
• Theorem: Query containment for CQ is decidable and NP-complete.

Query Containment Algorithm
How to check q1 ⊆ q2
• Canonical database for q1 is: Dq1 = (D, R1^D, …, Rk^D)
  – D = all variables and constants in q1
  – R1^D, …, Rk^D = the body of q1
• Canonical tuple for q1 is: tq1 (the head of q1)

Examples of Canonical Databases
q1(x,y) :- R(x,u), R(v,u), R(v,y)
• Canonical database: Dq1 = (D, R^D)
  – D = {x, y, u, v}
  – R^D = {(x,u), (v,u), (v,y)}
• Canonical tuple: tq1 = (x,y)

Examples of Canonical Databases
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
• Dq1 = (D, R)
  – D = {x, u, "Smith", "Fred"}
  – R = {(x,u), (u,"Smith"), (u,"Fred"), (u,u)}
• tq1 = (x)

Checking Containment
Theorem: q1 ⊆ q2 iff tq1 ∈ q2(Dq1).
Example:
  q1(x,y) :- R(x,u), R(v,u), R(v,y)
  q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
• D = {x, y, u, v}
• R = {(x,u), (v,u), (v,y)}
• tq1 = (x,y)
• Yes, q1 ⊆ q2, because (x,y) ∈ q2(Dq1)

Query Homomorphisms
• A homomorphism f : q2 → q1 is a function f : var(q2) → var(q1) ∪ const(q1) such that:
  – f(body(q2)) ⊆ body(q1)
  – f(tq2) = tq1
The Homomorphism Theorem: q1 ⊆ q2 iff there exists a homomorphism f : q2 → q1

Example of Query Homomorphism
var(q1) = {x, u, v, y}
var(q2) = {x, u, v, w, t, y}
  q1(x,y) :- R(x,u), R(v,u), R(v,y)
  q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
One homomorphism q2 → q1: x→x, u→u, v→v, w→u, t→v, y→y

Example of Query Homomorphism
var(q1) ∪ const(q1) = {x, u, "Smith", "Fred"}
var(q2) = {x, u, v, w}
  q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
  q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)
One homomorphism q2 → q1: x→x, u→u, v→"Smith", w→u

The Homomorphism Theorem
• Theorem: q1 ⊆ q2 iff there exists a homomorphism from q2 to q1.
• Theorem: Conjunctive query containment is:
  (1) decidable (why?)
  (2) in NP (why?)
  (3) NP-hard
• In short: it is NP-complete

Views
Employee(x), ManagedBy(x,y), Manager(y)
Views:
  L(x,y) :- ManagedBy(x,u), ManagedBy(u,y)
  E(x,y) :- ManagedBy(x,y), Employee(y)
Query:
  Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v), ManagedBy(v,w), ManagedBy(w,y), Employee(y)
How can we answer Q if we only have L and E?

Views
• Query rewriting using views (when possible):
  Q(x,y) :- L(x,u), L(u,y), E(v,y)
• Query answering:
  – Sometimes we cannot express the rewriting in CQ or FO, but we can still answer the query

Views
Applications:
• Using advanced indexes
• Using replicated data
• Data integration [Ullman'99]

Expressive Power
• Vocabulary: binary relation R
• The following queries cannot be expressed in FO:
• Transitive closure:
  – ∀x.∀y. there exist x1, ..., xn s.t. R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
• Parity: the number of edges in R is even

Datalog
• Adds recursion, so we can compute transitive closure
• A datalog program (query) consists of several datalog rules:
  P1(t1) :- body1
  P2(t2) :- body2
  ...
  Pn(tn) :- bodyn

Datalog
Terminology:
• EDB = extensional database predicates
  – The database predicates
• IDB = intensional database predicates
  – The new predicates constructed by the program

Datalog
Employee(x), ManagedBy(x,y), Manager(y)
All higher level managers that are employees:
  HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y)
  Answer(x) :- HMngr(x), Employee(x)
EDBs: Manager, ManagedBy, Employee
IDBs: HMngr, Answer

Datalog
Employee(x), ManagedBy(x,y), Manager(y)
All persons:
  Person(x) :- Manager(x)
  Person(x) :- Employee(x)
That is, Person = Manager ∪ Employee

Datalog
Graph: R(x,y)
  P(x,y) :- R(x,u), R(u,v), R(v,y)
  A(x,y) :- P(x,u), P(u,y)
Can "unfold" it into:
  A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)

Recursion in Datalog
Graph: R(x,y)
Transitive closure:
  P(x,y) :- R(x,y)
  P(x,y) :- P(x,u), R(u,y)
Transitive closure:
  P(x,y) :- R(x,y)
  P(x,y) :- P(x,u), P(u,y)

Recursion in Datalog
Boolean trees: Leaf0(x), Leaf1(x), AND(x, y1, y2), OR(x, y1, y2), Root(x)
• Find out if the tree value is 0 or 1
  One(x) :- Leaf1(x)
  One(x) :- AND(x, y1, y2), One(y1), One(y2)
  One(x) :- OR(x, y1, y2), One(y1)
  One(x) :- OR(x, y1, y2), One(y2)
  Answer() :- Root(x), One(x)

Exercise
Boolean trees: Leaf0(x), Leaf1(x), AND(x, y1, y2), OR(x, y1, y2), Not(x,y), Root(x)
• Hint: compute both One(x) and Zero(x); here you need to use Leaf0

Variants of Datalog
             without recursion                             with recursion
without ¬    Non-recursive Datalog = union of CQ (why?)    Datalog
with ¬       Non-recursive Datalog¬ = FO                   Datalog¬

Computational Complexity Classes
Recall computational complexity classes:
• AC0
• LOGSPACE
• NLOGSPACE
• PTIME
• NP
• PSPACE
• EXPTIME
• EXPSPACE
• (Kalmar) Elementary Functions
• Turing Computable functions
Slide annotation beside the NLOGSPACE and PTIME entries: we care mostly about these.

Query Languages and Complexity Classes
Paper: On the Unusual Effectiveness of Logic in Computer Science
  PSPACE    FO(PFP) = datalog¬,*
  PTIME     FO(LFP) = datalog¬
  AC0       FO = non-rec datalog¬
Important: the more complex a QL, the harder it is to optimize
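
Sketch: checking CQ containment
The containment test from the Query Containment Algorithm and Homomorphism Theorem slides can be tried out with a small brute-force search. The following Python sketch is only an illustration: the names is_var, substitute, find_homomorphism and contained_in, the tuple encoding of queries, and the convention that constants are written with double quotes are all assumptions of this sketch and do not come from the lecture. It enumerates every mapping from var(q2) into the terms of q1, so it takes exponential time, which is in line with containment being NP-complete.

```python
from itertools import product


def is_var(term):
    # Convention of this sketch only: constants carry double quotes,
    # e.g. '"Smith"'; every other term is a variable.
    return not term.startswith('"')


def substitute(f, term):
    # Constants map to themselves; variables map through f.
    return f[term] if is_var(term) else term


def find_homomorphism(q2, q1):
    """Return a homomorphism f : q2 -> q1 as a dict, or None if none exists.

    A query is (head, body): head is a tuple of terms, body is a list of
    atoms written as tuples (relation, term, ..., term). Queries are
    assumed safe (every head variable also occurs in the body).
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return None
    canonical_db = set(body1)  # the canonical database of q1
    domain = sorted({t for atom in body1 for t in atom[1:]})
    variables = sorted({t for atom in body2 for t in atom[1:] if is_var(t)}
                       | {t for t in head2 if is_var(t)})
    # Try every mapping var(q2) -> var(q1) + const(q1).
    for image in product(domain, repeat=len(variables)):
        f = dict(zip(variables, image))
        # f must send the head of q2 to the head of q1 ...
        if tuple(substitute(f, t) for t in head2) != tuple(head1):
            continue
        # ... and every body atom of q2 to some body atom of q1.
        if all((a[0],) + tuple(substitute(f, t) for t in a[1:]) in canonical_db
               for a in body2):
            return f
    return None


def contained_in(q1, q2):
    # Homomorphism Theorem: q1 is contained in q2 iff some f : q2 -> q1 exists.
    return find_homomorphism(q2, q1) is not None


# The example from the Checking Containment slide.
q1 = (("x", "y"), [("R", "x", "u"), ("R", "v", "u"), ("R", "v", "y")])
q2 = (("x", "y"), [("R", "x", "u"), ("R", "v", "u"), ("R", "v", "w"),
                   ("R", "t", "w"), ("R", "t", "y")])
print(contained_in(q1, q2))  # True
print(contained_in(q2, q1))  # False
```

The same function can be applied to the four pairs on the Examples of Query Containments slide by encoding each query in the (head, body) form used above.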
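
Sketch: evaluating a recursive Datalog program
The Recursion in Datalog slides give the rules for transitive closure but not a way to run them. One simple way, shown here as a hedged sketch and not as the method used in the course, is naive bottom-up evaluation: apply every rule to the facts derived so far and repeat until nothing new appears. The names match and evaluate and the encoding are hypothetical: atoms are tuples (predicate, term, ...), strings in rule atoms are variables, and any non-string term is a constant.

```python
def match(atom, fact, binding):
    """Extend `binding` so that `atom` matches `fact`; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if isinstance(term, str):
            # Variable: bind it, or check it against its existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must equal the value in the fact.
            return None
    return extended


def evaluate(rules, edb):
    """Naive bottom-up evaluation: iterate all rules to a fixpoint.

    `rules` is a list of (head, body) pairs; `edb` is a set of facts.
    Rules are assumed safe (head variables occur in the body).
    """
    facts = set(edb)
    while True:
        derived = set()
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                # Join the partial bindings with every matching known fact.
                next_bindings = []
                for b in bindings:
                    for fact in facts:
                        b2 = match(atom, fact, b)
                        if b2 is not None:
                            next_bindings.append(b2)
                bindings = next_bindings
            for b in bindings:
                derived.add((head[0],) + tuple(
                    b[t] if isinstance(t, str) else t for t in head[1:]))
        if derived <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= derived


# Transitive closure of the path 1 -> 2 -> 3 -> 4:
#   P(x,y) :- R(x,y)
#   P(x,y) :- P(x,u), R(u,y)
edges = {("R", 1, 2), ("R", 2, 3), ("R", 3, 4)}
tc_rules = [
    (("P", "x", "y"), [("R", "x", "y")]),
    (("P", "x", "y"), [("P", "x", "u"), ("R", "u", "y")]),
]
closure = sorted(f[1:] for f in evaluate(tc_rules, edges) if f[0] == "P")
print(closure)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Because the rules only ever add facts over a finite set of values, the loop must stop. The Boolean tree program with One(x) and Answer() can be run the same way, with a zero-argument head written as ("Answer",).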