Inexact Querying of XML – Technion – Israel Institute of Technology
Incomplete Answers over Semistructured Data
Kanza, Nutt, Sagiv, PODS 1999
Slides by Yaron Kanza

Dealing with Incomplete Data
• Queries with complete answers
• Queries with incomplete answers, in order of increasing level of incompleteness:
  – Queries with AND semantics
  – Queries with weak semantics
  – Queries with OR semantics

Queries and Matchings
• The queries are labeled rooted directed graphs
  – the labels are on the edges
• Query nodes are variables
• Database nodes are objects
• Matchings are assignments of database nodes to the query variables according to
  – the constraints specified in the query, and
  – the semantics of the query

Constraints on Exact Matchings
• Root constraint: satisfied if the query root is mapped to the database root
  [Figure: query root r mapped to database root 1]
• Edge constraint: satisfied if a query edge with label l is mapped to a database edge with label l
  [Figure: query edge x -l-> y, with x mapped to node 12, y mapped to node 25, and a database edge 12 -l-> 25]

An Exact Matching
[Figure: the Movie database (root 1), with the movies Star Wars (1977) and Hook, the people George Lucas, Steven Spielberg, Dustin Hoffman, Mark Hamill and Harrison Ford, and a date of birth, 14 May 1944 (node 35); and a query over the variables r, x, y, z, u, v with edges labeled Movie, Producer, Director, Uncredited Actor, Date of birth and Name]
• The root constraint and all the edge constraints are satisfied
• All the query nodes are mapped to non-null values

Consider the case where node 35 is removed from the database: no exact matching exists!
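The two constraints on exact matchings can be checked mechanically. The Python sketch below is a minimal illustration, not code from the slides. It assumes a database is given as a set of labeled edges (source, label, target), a query as a list of labeled edges over variable names, and a matching as a dict from variables to database nodes, with None standing for null. The function name `is_exact_matching` and the tiny database are hypothetical. The data is only loosely modeled on the Movie example.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, label, target)


def is_exact_matching(q_root: str, q_edges: List[Edge], db_root: Hashable,
                      db_edges: Set[Edge], m: Dict[str, Optional[Hashable]]) -> bool:
    """Exact matching: every variable is non-null, and the root and all edge constraints hold."""
    variables = {q_root} | {x for x, _, _ in q_edges} | {y for _, _, y in q_edges}
    # No query variable may be mapped to null (or left unmapped).
    if any(m.get(v) is None for v in variables):
        return False
    # Root constraint: the query root is mapped to the database root.
    if m[q_root] != db_root:
        return False
    # Edge constraint: a query edge x -l-> y needs a database edge m(x) -l-> m(y).
    return all((m[x], label, m[y]) in db_edges for x, label, y in q_edges)


# Hypothetical toy data: one movie, its director, and the director's name.
db = {("root", "Movie", "m1"), ("m1", "Director", "p1"), ("p1", "Name", "n1")}
query = [("r", "Movie", "x"), ("x", "Director", "y"), ("y", "Name", "v")]
full = {"r": "root", "x": "m1", "y": "p1", "v": "n1"}
print(is_exact_matching("r", query, "root", db, full))  # True
# Without the Name edge, the edge constraint y -Name-> v can no longer be satisfied.
print(is_exact_matching("r", query, "root", db - {("p1", "Name", "n1")}, full))  # False
```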
Allow Partial Matchings
[Figure: a partial matching of the query into the Movie database without node 35, with some query variables mapped to NULL]

Not Every Partial Assignment Is of Interest
[Figure: a partial assignment of the query variables to nodes of the Movie database]
This assignment is not interesting, since the query returns data that has no connection to the query.

The Reachability Constraint on Partial Matchings
• A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v such that all the edge constraints along this path are satisfied
[Figure: a query (root r; variables x, w, y, z, v; edges labeled l1 to l6) and a database with root 1, illustrating a path of satisfied edge constraints from the root to v]

"AND" Matchings
• A partial matching is an AND matching if
  – the root constraint is satisfied,
  – the reachability constraint is satisfied by every query node that is mapped to a database node, and
  – whenever a query node is mapped to a database node, all its incoming edge constraints are satisfied
[Figure: a query fragment over r, x, y, z with edges labeled Producer, Actor and Director]

An AND Matching
[Figure: an AND matching of the query into the Movie database; u is mapped to NULL]

Suppose that we remove the edges labeled Uncredited Actor.
[Figure: the Movie database without the Uncredited Actor edges]
In an AND matching, node z must be null!
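The reachability constraint and the AND conditions can be sketched in Python as follows. This is a hedged illustration under the following assumptions, none of which come from the slides: databases are sets of (source, label, target) triples, matchings are dicts with None for null, and the helper names `satisfied`, `reachable` and `is_and_matching` are hypothetical. Reachability is computed by a breadth-first search from the query root that only follows edges whose constraints are satisfied.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, label, target)
Matching = Dict[str, Optional[Hashable]]  # None plays the role of null


def satisfied(edge: Edge, m: Matching, db_edges: Set[Edge]) -> bool:
    """Edge constraint: both ends are non-null and the labeled database edge exists."""
    x, label, y = edge
    return m.get(x) is not None and m.get(y) is not None and (m[x], label, m[y]) in db_edges


def reachable(q_root: str, q_edges: List[Edge], m: Matching, db_edges: Set[Edge]) -> Set[str]:
    """Query variables reachable from the root along satisfied edge constraints (BFS)."""
    seen, queue = {q_root}, deque([q_root])
    while queue:
        v = queue.popleft()
        for e in q_edges:
            if e[0] == v and e[2] not in seen and satisfied(e, m, db_edges):
                seen.add(e[2])
                queue.append(e[2])
    return seen


def is_and_matching(q_root: str, q_edges: List[Edge], db_root: Hashable,
                    db_edges: Set[Edge], m: Matching) -> bool:
    if m.get(q_root) != db_root:  # root constraint
        return False
    mapped = {v for v, o in m.items() if o is not None}
    if not mapped <= reachable(q_root, q_edges, m, db_edges):  # reachability constraint
        return False
    # Every incoming edge constraint of a non-null query node must be satisfied.
    return all(satisfied(e, m, db_edges) for e in q_edges if m.get(e[2]) is not None)


# Hypothetical query: r -Movie-> x, x -Director-> y, x -Producer-> y.
query = [("r", "Movie", "x"), ("x", "Director", "y"), ("x", "Producer", "y")]
db = {("root", "Movie", "m1"), ("m1", "Director", "p1")}  # no Producer edge
print(is_and_matching("r", query, "root", db, {"r": "root", "x": "m1", "y": "p1"}))  # False
print(is_and_matching("r", query, "root", db, {"r": "root", "x": "m1", "y": None}))  # True
print(is_and_matching("r", query, "root", db, {"r": "root", "x": None, "y": "p1"}))  # False
```

The first matching fails because y is non-null but its incoming Producer edge is not satisfied. The third fails the reachability constraint, because y can only be reached through x, which is null.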
Weak Satisfaction of Edge Constraints
• An edge constraint is weakly satisfied if either
  – it is satisfied (as defined earlier), or
  – one (or both) of its nodes is mapped to a null value
[Figure: four options for a query edge x -l-> y: (1) x → 12 and y → 25, with a database edge 12 -l-> 25 (satisfied); (2) x → 12, y → null; (3) x → null, y → null; (4) x → null, y → 25]

Weak Matchings
• A partial matching is a weak matching if
  – the root constraint is satisfied,
  – the reachability constraint is satisfied by every query node that is mapped to a database node, and
  – every edge constraint is weakly satisfied

A Weak Matching
[Figure: a weak matching of the query into the Movie database; some variables are mapped to NULL, and the highlighted edges are only weakly satisfied]
• In a weak matching, all four options are permitted
• In an AND matching, only options (1) to (3) are permitted: in option (4), y is mapped to a non-null node although its incoming edge constraint is not satisfied

Consider the case where the edges labeled Producer are removed.
[Figure: the Movie database without the Producer edges]
In a weak matching, node z must be null!
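Below is a minimal Python sketch of weak satisfaction and weak matchings. It is our illustration, not the authors' code. It assumes databases are sets of (source, label, target) triples and matchings are dicts with None for null. The function names and the toy query are hypothetical. Note that the reachability check still follows fully satisfied edges, as the definition requires. Only the final condition is relaxed to weak satisfaction.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, label, target)
Matching = Dict[str, Optional[Hashable]]  # None plays the role of null


def weakly_satisfied(edge: Edge, m: Matching, db_edges: Set[Edge]) -> bool:
    """Satisfied, or at least one end of the query edge is mapped to null."""
    x, label, y = edge
    if m.get(x) is None or m.get(y) is None:
        return True
    return (m[x], label, m[y]) in db_edges


def reachable(q_root: str, q_edges: List[Edge], m: Matching, db_edges: Set[Edge]) -> Set[str]:
    """Variables reachable from the root along (fully) satisfied edge constraints."""
    seen, queue = {q_root}, deque([q_root])
    while queue:
        v = queue.popleft()
        for x, label, y in q_edges:
            if x == v and y not in seen and m.get(y) is not None and (m[x], label, m[y]) in db_edges:
                seen.add(y)
                queue.append(y)
    return seen


def is_weak_matching(q_root: str, q_edges: List[Edge], db_root: Hashable,
                     db_edges: Set[Edge], m: Matching) -> bool:
    if m.get(q_root) != db_root:  # root constraint
        return False
    mapped = {v for v, o in m.items() if o is not None}
    if not mapped <= reachable(q_root, q_edges, m, db_edges):  # reachability constraint
        return False
    return all(weakly_satisfied(e, m, db_edges) for e in q_edges)


# The four options for a single query edge x -l-> y, against a database edge 12 -l-> 25.
edge, db1 = ("x", "l", "y"), {(12, "l", 25)}
options = [{"x": 12, "y": 25}, {"x": 12, "y": None}, {"x": None, "y": None}, {"x": None, "y": 25}]
print([weakly_satisfied(edge, o, db1) for o in options])  # [True, True, True, True]
print(weakly_satisfied(edge, {"x": 12, "y": 25}, {(12, "m", 25)}))  # False

# Hypothetical query: r -Movie-> x, x -Director-> y, r -Person-> y.
query = [("r", "Movie", "x"), ("x", "Director", "y"), ("r", "Person", "y")]
db = {("root", "Movie", "m1"), ("root", "Person", "p1")}
print(is_weak_matching("r", query, "root", db, {"r": "root", "x": None, "y": "p1"}))  # True
print(is_weak_matching("r", query, "root", db, {"r": "root", "x": "m1", "y": "p1"}))  # False
```

The matching with x null and y mapped to p1 is an instance of option (4). It is weak, because y is reachable through the Person edge. It is not an AND matching, because y's incoming Director edge is not satisfied.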
"OR" Matchings
• A partial matching is an OR matching if
  – the root constraint is satisfied, and
  – the reachability constraint is satisfied by every query node that is mapped to a database node

An OR Matching
[Figure: an OR matching of the query into the Movie database; some variables are mapped to NULL, and one highlighted edge is not weakly satisfied]

Increasing Level of Incompleteness
• An exact matching is an AND matching
• An AND matching is a weak matching
• A weak matching is an OR matching

Maximal Matchings
• A matching is maximal if no other matching subsumes it, i.e., if there is no other matching that is equal to it on all its mapped variables and maps additional variables
• A query result consists of maximal matchings only
• The maximality of a matching may depend on the semantics considered (OR, weak, or AND)

Exercises
• [Figure: a partial matching of the query into the Movie database, with some variables mapped to NULL] Is this an AND matching? Is it maximal?
• [Figure: another partial matching into the Movie database] Is this a weak matching? Is it maximal?
• [Figure: another partial matching into the Movie database] Is this an OR matching? Is it maximal?
• [Figure: another partial matching into the Movie database] Is this an AND matching? A weak matching? An OR matching? Is it maximal (for each option)?

Exercise: University Database
[Figure: a University database (root 1, nodes 2 to 15) with edges labeled Course, Teacher, Lab, Title, Instructor and Name, holding the names A. Cohen, B. Levi and C. Katz and the titles Logic, OS, Compilers and Databases; and a query over the variables u, v, w, x]
Find all maximal answers under AND semantics, OR semantics and weak semantics.

Computing Maximal Answers
• How can we systematically compute all maximal answers?
• Can we compute all answers in polynomial time?
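One naive answer to the first question is exhaustive search. The Python sketch below is our own baseline, not a method from the slides. It checks OR matchings (root and reachability constraints only), implements subsumption and maximality as defined on the Maximal Matchings slide, and enumerates every assignment of the non-root variables to database nodes or null. It assumes a database is a set of (source, label, target) triples and a matching is a dict with None for null. All function names and the toy Movie and Course/Teacher data are hypothetical and do not reproduce the figures' databases. Because the search tries (|D| + 1)^n assignments for n non-root variables, it sheds no light on the polynomial-time question.

```python
from collections import deque
from itertools import product


def reachable(q_root, q_edges, m, db_edges):
    """Query variables reachable from the root along satisfied edge constraints (BFS)."""
    seen, queue = {q_root}, deque([q_root])
    while queue:
        v = queue.popleft()
        for x, label, y in q_edges:
            if x == v and y not in seen and m.get(y) is not None and (m[x], label, m[y]) in db_edges:
                seen.add(y)
                queue.append(y)
    return seen


def is_or_matching(q_root, q_edges, db_root, db_edges, m):
    """OR semantics: only the root and reachability constraints are required."""
    if m.get(q_root) != db_root:
        return False
    mapped = {v for v, o in m.items() if o is not None}
    return mapped <= reachable(q_root, q_edges, m, db_edges)


def subsumes(a, b):
    """a subsumes b: a agrees with b on every variable b maps, and maps additional variables."""
    na = {v: o for v, o in a.items() if o is not None}
    nb = {v: o for v, o in b.items() if o is not None}
    return len(na) > len(nb) and all(na.get(v) == o for v, o in nb.items())


def maximal(matchings):
    """Keep the matchings that no other matching in the list subsumes."""
    return [m for m in matchings if not any(subsumes(o, m) for o in matchings)]


def all_maximal_or_answers(q_root, q_edges, db_root, db_edges):
    """Naive baseline: try every assignment, keep the OR matchings, drop subsumed ones."""
    variables = sorted(({x for x, _, _ in q_edges} | {y for _, _, y in q_edges}) - {q_root})
    db_nodes = sorted({db_root} | {s for s, _, _ in db_edges} | {t for _, _, t in db_edges}, key=str)
    candidates = []
    # (|D| + 1) ** n assignments for n non-root variables: exponential in the query size.
    for images in product([None] + db_nodes, repeat=len(variables)):
        m = {q_root: db_root, **dict(zip(variables, images))}
        if is_or_matching(q_root, q_edges, db_root, db_edges, m):
            candidates.append(m)
    return maximal(candidates)


# Maximality depends on the semantics. Hypothetical query:
# r -Movie-> x, x -Director-> y, r -Person-> y, over data with no Director edge.
query = [("r", "Movie", "x"), ("x", "Director", "y"), ("r", "Person", "y")]
db = {("root", "Movie", "m1"), ("root", "Person", "p1")}
cands = [{"r": "root", "x": "m1", "y": None},
         {"r": "root", "x": None, "y": "p1"},
         {"r": "root", "x": "m1", "y": "p1"}]  # OR matching, but not a weak matching
print(maximal(cands))      # [{'r': 'root', 'x': 'm1', 'y': 'p1'}]
print(maximal(cands[:2]))  # both survive: neither subsumes the other

# Exhaustive search on a hypothetical Course/Teacher database.
courses = {("root", "Course", "c1"), ("root", "Course", "c2"), ("c1", "Teacher", "t1")}
for answer in all_maximal_or_answers("r", [("r", "Course", "x"), ("x", "Teacher", "y")], "root", courses):
    print(answer)
```

In the Movie toy example, the first two candidates are the weak matchings (the third has a Director edge whose ends are both non-null but that does not occur in the data). Under weak semantics, both of them are maximal. Under OR semantics, the third candidate subsumes them. Replacing `is_or_matching` by an AND or weak predicate would give the other semantics.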
• We will see an algorithm that computes all maximal answers of a DAG query under AND semantics

Intuition
• Sort the nodes of the query in a topological order
• Start with the set of matchings that contains the single matching (root of query / root of database)
• Iterate over the nodes vi according to the order
  – extend each matching by all possible images of vi that yield AND matchings
  – if there are no appropriate images, then extend it with vi mapped to null

Eval-Dag-Query-AND-Semantics(Q, D)
  let v0 < v1 < … < vk be a topological ordering of the nodes of Q
  let S0 = { {v0/root(D)} }
  for i = 1 to k do
    Si = ∅
    for each μ ∈ Si-1 do
      E = { u ∈ D | μ ∪ {vi/u} is an AND matching }
      if E = ∅ then Si = Si ∪ { μ ∪ {vi/null} }
      else Si = Si ∪ { μ ∪ {vi/u} | u ∈ E }
  return Sk

Analyzing the Algorithm
• Why is the algorithm correct?
• What is the runtime of the algorithm?
• What are the memory requirements of the algorithm?
• Can this algorithm easily be adapted for general graph queries (which may contain cycles)?

AND Semantics – Cyclic Queries
• Determining whether there is an AND matching that maps at least one non-root node to a non-null value is NP-complete
  – Why is it in NP?
  – NP-hardness: by a reduction from the Hamiltonian cycle problem

Hamiltonian Cycle
• Given a graph G, a Hamiltonian cycle is a simple cycle that traverses each node in the graph exactly once
• Determining whether there is a Hamiltonian cycle is NP-complete!
[Figure: two example graphs, each captioned "Can you find one here?"]
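Here is a Python sketch of Eval-Dag-Query-AND-Semantics. It is a hedged illustration that assumes databases are sets of (source, label, target) triples, matchings are dicts with None for null, and every non-root query node has at least one incoming edge. The function name `eval_dag_query_and` is hypothetical, and the sketch uses `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` on cyclic queries. Instead of re-checking the whole matching, it tests only the incoming edges of vi. This local test is our simplification. It is valid because, in a topological order, every predecessor of vi is already assigned and no successor of vi has been assigned yet.

```python
from graphlib import TopologicalSorter


def eval_dag_query_and(q_root, q_edges, db_root, db_edges):
    """AND-semantics answers of a DAG query, following the iteration on the slide."""
    # Map each query variable to the set of its predecessors, as graphlib expects.
    preds = {q_root: set()}
    for x, _, y in q_edges:
        preds.setdefault(x, set())
        preds.setdefault(y, set()).add(x)
    order = [v for v in TopologicalSorter(preds).static_order() if v != q_root]
    incoming = {v: [(x, label) for x, label, y in q_edges if y == v] for v in order}
    db_nodes = sorted({db_root} | {s for s, _, _ in db_edges} | {t for _, _, t in db_edges}, key=str)

    S = [{q_root: db_root}]  # S0 = { {v0/root(D)} }
    for v in order:          # v1 < v2 < ... < vk
        next_S = []
        for mu in S:
            # E: images u such that every incoming edge x -l-> v maps onto mu(x) -l-> u.
            E = [u for u in db_nodes
                 if incoming[v] and all(mu[x] is not None and (mu[x], label, u) in db_edges
                                        for x, label in incoming[v])]
            if not E:
                next_S.append({**mu, v: None})       # no appropriate image: map v to null
            else:
                next_S.extend({**mu, v: u} for u in E)  # one extension per image
        S = next_S
    return S


# Hypothetical query: a movie x whose director y is also its producer.
query = [("r", "Movie", "x"), ("x", "Director", "y"), ("x", "Producer", "y")]
db = {("root", "Movie", "m1"), ("root", "Movie", "m2"),
      ("m1", "Director", "p1"), ("m1", "Producer", "p1"), ("m2", "Director", "p2")}
for answer in eval_dag_query_and("r", query, "root", db):
    print(answer)
# {'r': 'root', 'x': 'm1', 'y': 'p1'}
# {'r': 'root', 'x': 'm2', 'y': None}
```

Each Si keeps one matching per combination of chosen images, so its size can multiply from step to step. This is worth keeping in mind for the runtime and memory questions on the Analyzing the Algorithm slide.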
Reduction
• We show how, given a solution to the problem of AND matchings over cyclic queries, we can solve the Hamiltonian cycle problem
• Given a graph G, we create a database D and a query Q such that
  – G has a Hamiltonian cycle if and only if there is an AND matching that maps a non-root node to a non-null value

Creating the Database
• Suppose that the graph G has nodes n1, …, nk
• We create a database with nodes u0, u1, …, uk
• u0 is the root of the database
• There is an edge labeled node from u0 to each node ui
• For each pair of nodes ui, uj (i ≥ 1, j ≥ 1, i ≠ j), there is an edge labeled neql from ui to uj
• There is an edge labeled succ from ui to uj if there is an edge from ni to nj in G
Example: create the database for this graph. [Figure: example graph]

Creating the Query
• Suppose that the graph G has nodes n1, …, nk
• We create a query with nodes v0, v1, …, vk
• v0 is the root of the query
• There is an edge labeled node from v0 to each node vi
• For each pair of nodes vi, vj (i ≥ 1, j ≥ 1, i ≠ j), there is an edge labeled neql from vi to vj
• There is an edge labeled succ from vi-1 to vi (for all i > 1) and an edge labeled succ from vk to v1
Example: create the query for this graph. [Figure: example graph]

How Does the Reduction Work?
• Mapping the root of the query to the root of the database is an AND matching
  – Can any additional nodes be mapped?
• If there is a Hamiltonian cycle, then it gives rise to a complete mapping of the query to the database
• If there is an AND matching that maps some node other than the root to a non-null value, then:
  – it must map all the nodes (because of the cycle of succ edges)
  – it must map all query nodes to different database nodes (because of the neql edges)
  – therefore, the images of the query nodes correspond to a Hamiltonian cycle (because of the succ edges)
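To make the construction concrete, here is a hedged Python sketch that builds D and Q from a small graph and checks the claimed equivalence by brute force on tiny inputs. The sketch makes three assumptions. First, G is treated as directed, with nodes numbered 1..k; an undirected edge would be listed in both directions. Second, the helper names `build_reduction`, `has_nontrivial_and_matching` and `has_hamiltonian_cycle` are hypothetical. Third, both searches are exponential, which is acceptable only because the examples are tiny. The sketch is a sanity check of the construction, not a proof.

```python
from itertools import permutations, product


def build_reduction(k, g_edges):
    """Database D and query Q for a directed graph G with nodes 1..k and edges g_edges."""
    nodes = range(1, k + 1)
    db = {("u0", "node", f"u{i}") for i in nodes}
    db |= {(f"u{i}", "neql", f"u{j}") for i in nodes for j in nodes if i != j}
    db |= {(f"u{i}", "succ", f"u{j}") for i, j in g_edges}
    q = [("v0", "node", f"v{i}") for i in nodes]
    q += [(f"v{i}", "neql", f"v{j}") for i in nodes for j in nodes if i != j]
    q += [(f"v{i - 1}", "succ", f"v{i}") for i in range(2, k + 1)]
    q.append((f"v{k}", "succ", "v1"))
    return db, q


def has_nontrivial_and_matching(k, db, q):
    """Brute force: is there an AND matching mapping some vi (i >= 1) to a non-null node?"""
    targets = [None] + [f"u{i}" for i in range(k + 1)]
    for images in product(targets, repeat=k):
        if all(o is None for o in images):
            continue
        m = {"v0": "u0", **{f"v{i}": images[i - 1] for i in range(1, k + 1)}}
        # Every non-null node needs all incoming edges satisfied. Each vi has an
        # incoming node edge from v0, so reachability then follows automatically.
        if all(m[x] is not None and (m[x], label, m[y]) in db
               for x, label, y in q if m[y] is not None):
            return True
    return False


def has_hamiltonian_cycle(k, g_edges):
    """Brute force over all orderings of the nodes 1..k."""
    edges = set(g_edges)
    return any(all((p[i], p[(i + 1) % k]) in edges for i in range(k))
               for p in permutations(range(1, k + 1)))


square = [(1, 2), (2, 3), (3, 4), (4, 1)]      # has a Hamiltonian cycle
star = [(1, 2), (2, 1), (1, 3), (3, 1)]        # has none
for k, g in [(4, square), (3, star)]:
    db, q = build_reduction(k, g)
    print(has_hamiltonian_cycle(k, g), has_nontrivial_and_matching(k, db, q))
# True True
# False False
```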