Download Presentation_Erick

Cooperative Query Answering
Erick Martinez
Nov. 19, 2002

MOTIVATION:
• Responses to queries posed by a database user do not always contain the information the user requires.
• Databases and information systems are often hard to use because they do not explicitly attempt to cooperate with their users: they answer the queries posed to them literally.
• A user might need more information than requested, or might actually need different information.
• An answer with extra or alternative information may be more useful and less misleading to the user.

Cooperative Answer (CA)
• A CA should be a correct, non-misleading, and useful answer to a query.
Q0: "Which students are enrolled?"
A0: "joana, jacob, shakil, …" (extensional answer: a list of individuals)
A0: "∀X. student(X)" (intensional answer: a description of the answer set)

Grice's maxims
• Maxim of Quality: a system should never give an answer that might mislead the user.
• Maxim of Quantity: an answer should not be more informative, or more detailed, than necessary.
• Maxim of Relation: an answer should always be relevant to the user who asked the question.
• Maxim of Manner: an answer should not be ambiguous, leaving the user to choose among possible meanings.

Database Stonewalling
Q1: "Who passed COSC6115 in the winter semester of 2001?"  A1: "No one."
Q2: "Who failed COSC6115 in the winter semester of 2001?"  A2: "No one."
Q3: "Who taught COSC6115 in the winter semester of 2001?"  A3: "No one."
A stonewalling database answers a yes/no question with a plain yes or no, regardless of whether that answer is misleading. Here, all three "No one" answers are literally true if, for instance, COSC6115 was not offered in winter 2001, yet none of them tells the user so.

QUERY / ANSWER SYSTEMS
• Natural language interfaces
• Databases (relational)
• Logic programming and deductive databases (*)

Deductive Databases (DDB)
A deductive database consists of three parts:
• Facts: the set of all facts constitutes the extensional database (EDB).
• Rules: the set of rules constitutes the intensional database (IDB).
• Integrity constraints (IC): the set of logical formulas that must be true of the database, e.g.
  IC0: ← enrolled_in(X, Y), not student(X).
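The following minimal Python sketch makes the EDB/IC split concrete. It is not part of the original slides. It assumes facts are stored as Python sets and checks the denial constraint IC0 by searching for bindings that make its body true. The facts and the function name `ic0_violations` are hypothetical.

```python
# Hypothetical EDB facts, stored as Python sets (not from the original slides).
student = {"joana", "jacob", "shakil"}
enrolled_in = {("joana", "COSC6115"), ("jacob", "COSC6115")}


def ic0_violations(student, enrolled_in):
    """Check the denial constraint IC0: <- enrolled_in(X, Y), not student(X).

    A denial constraint is violated by every binding that makes its body
    true, so return each (X, Y) with enrolled_in(X, Y) but no student(X).
    """
    return sorted((x, y) for (x, y) in enrolled_in if x not in student)


# The EDB above satisfies IC0.
print(ic0_violations(student, enrolled_in))  # []
# Adding a hypothetical fact enrolled_in(smith, 'COSC6115') violates IC0.
print(ic0_violations(student, enrolled_in | {("smith", "COSC6115")}))  # [('smith', 'COSC6115')]
```

An empty result means the database state is consistent with IC0. Any returned pair is a witness that the constraint is violated.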
Distinction between data and knowledge:
• Data is represented in the EDB, and knowledge in the IDB and the ICs.
• Knowledge is the semantics of the DB: what must be true of the DB's state, and the logical conclusions that must follow from the given data.

TECHNIQUES
• Evaluation of presuppositions in a query (*)
• Detection and correction of misconceptions in a query (*)
• Relaxation and generalization of queries and responses (*)
• Consideration of specific information about a user's state of mind
• Formulation of intensional answers

Presuppositions:
• Asking a query usually presupposes two things: that all components of the query exist, and that the query itself has an answer. For example, "Which employees own red cars?":
  Q4: ← emp(X), owns(X, Y), car(Y), red(Y).
• Two atoms in a query are joined if they share a variable. Two atoms are connected if they are linked by a chain of joined atoms, and a query is connected if every two of its atoms are connected.
• A conjunctive query with n atoms has 2^n - 2 proper, non-empty sub-queries, so checking them all has exponential cost.
• Algorithm: report the smallest sub-queries that fail, considering only connected sub-queries.

Presuppositions: lattice of sub-queries
Q4: "Which employees own red cars?"
The connected sub-queries of Q4, from the full query down to single atoms:
← emp(X), owns(X, Y), car(Y), red(Y).
← emp(X), owns(X, Y), car(Y).
← emp(X), owns(X, Y), red(Y).
← owns(X, Y), car(Y), red(Y).
← emp(X), owns(X, Y).
← owns(X, Y), car(Y).
← owns(X, Y), red(Y).
← car(Y), red(Y).
← emp(X).
← owns(X, Y).
← car(Y).
← red(Y).
• If a sub-query has no answers, the query cannot have any answers either (scalar implicature).
• Finding presuppositions (failed sub-queries) is independent of domain-specific knowledge.

Misconceptions:
Integrity constraints:
IC1: ← professor(X), student(X).
IC2: ← enrolled_in(X, Y), not student(X).
Query: "Which professor is enrolled in COSC6115?"
Q5: ← professor(X), enrolled_in(X, 'COSC6115').
Answer: "No one is both a professor and a student. Anyone who is enrolled in a class is a student.
So no professor is enrolled in a class."
The misconception is detected from the integrity constraints alone, without consulting the EDB. By IC2, any X satisfying Q5 would be a student, and IC1 forbids that for a professor.

Relaxation:
• Taxonomy clause:
  C6: travel(From, To) ← serves_area(A, From), serves_area(B, To), flight(A, B)*.
• Reciprocal clause:
  C6T: relax(flight(A, B)) ← serves_area(A, From), serves_area(B, To), travel(From, To).
Relaxation step: let θ be the substitution obtained by unifying the query atom with the key atom (marked *) of the taxonomy clause.
1. Apply θ across the taxonomy clause.
2. Replace the query atom with the head atom of the taxonomy clause.
3. Add the non-key literals from the body of the taxonomy clause to the new query as constraints on the variables.

Relaxation example:
Original query, Q6: ← flight('Dulles', 'Orly').
Relaxed query, Q6r: ← relax(flight('Dulles', 'Orly')).
Relaxing via the reciprocal clause C6T gives
Q6r′: ← serves_area('Dulles', From), serves_area('Orly', To), travel(From, To).
Resolving with the taxonomy clause C6 gives
Q6r″: ← serves_area('Dulles', From), serves_area('Orly', To), serves_area(A, From), serves_area(B, To), flight(A, B).

(Figure: airport taxonomy relating the areas washington_dc, baltimore and paris, among others, to the airports 'Dulles', 'National', 'BWI', 'Orly' and 'De Gaulle'.)

Evaluating Q6r″:
• When A = 'Dulles' and B = 'Orly', flight('Dulles', 'Orly') is solved again, yielding the same answers as the original query.
• When A ≠ 'Dulles' or B ≠ 'Orly', new answers appear. For example, From = washington_dc, and serves_area(A, washington_dc) is satisfied by A = 'National', A = 'BWI', …

Generalization:
Relaxation is strictly a syntactic notion, a rewrite mechanism. Generalization is its semantic counterpart.
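Below is a minimal Python sketch that evaluates the relaxed query Q6r″ by brute force over a small EDB. It is an illustration, not the system's method. The serves_area facts follow the airports and areas named in the slides; treating 'BWI' as serving both baltimore and washington_dc is an assumption drawn from the text. The flight facts and the function names `flight` and `relaxed_flight` are invented for illustration. A real system would derive Q6r″ by resolution rather than hard-code it.

```python
# serves_area(Airport, Area) facts based on the slides' taxonomy figure;
# 'BWI' serving both areas is an assumption taken from the text.
SERVES_AREA = {
    ("Dulles", "washington_dc"),
    ("National", "washington_dc"),
    ("BWI", "washington_dc"),
    ("BWI", "baltimore"),
    ("Orly", "paris"),
    ("De Gaulle", "paris"),
}
# Hypothetical flight(A, B) facts, invented for illustration.
FLIGHT = {("Dulles", "Orly"), ("BWI", "De Gaulle")}


def flight(a, b):
    """Original query Q6: <- flight(a, b). Literal answers only."""
    return [(a, b)] if (a, b) in FLIGHT else []


def relaxed_flight(a, b):
    """Relaxed query Q6r'':
    <- serves_area(a, From), serves_area(b, To),
       serves_area(A, From), serves_area(B, To), flight(A, B).
    Returns every flight (A, B) whose endpoints serve an area that the
    requested airports a and b also serve.
    """
    from_areas = {area for (airport, area) in SERVES_AREA if airport == a}
    to_areas = {area for (airport, area) in SERVES_AREA if airport == b}
    return sorted(
        (src, dst)
        for (src, dst) in FLIGHT
        if any((src, area) in SERVES_AREA for area in from_areas)
        and any((dst, area) in SERVES_AREA for area in to_areas)
    )


print(flight("Dulles", "Orly"))            # [('Dulles', 'Orly')]
print(relaxed_flight("Dulles", "Orly"))    # [('BWI', 'De Gaulle'), ('Dulles', 'Orly')]
print(flight("National", "Orly"))          # []
print(relaxed_flight("National", "Orly"))  # [('BWI', 'De Gaulle'), ('Dulles', 'Orly')]
```

Both queries return flight('Dulles', 'Orly'), and the relaxed query adds the neighbourhood answer ('BWI', 'De Gaulle'). For flight('National', 'Orly'), which has no literal answer, the relaxed query still offers both flights. Consider an airport that has flights but serves no area. Its serves_area literals could never be satisfied, so the relaxed query would lose that original answer.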
Literal answers to the relaxed query should include the answers to the original query, plus some new "neighbourhood" answers close to the original query. After relaxation, the new query is a generalization only if all non-key atoms are satisfied whenever the key atom is satisfied. A reciprocal clause with this property is called conservative. When all reciprocal clauses are conservative, resolution over a relaxed query produces all the answers of the original query.

USER GOALS AND MODELS
Types of knowledge about a user relevant to CA:
• Interests and preferences
• Needs: user constraints (UC)
• Goals and intent

MY KEY POINTS:
• CA is mostly intended for DDBs as a platform. For an RDB, a deductive database interface should be implemented on top of the relational system.
• The system should support natural language input to some extent, at least for some domains (a natural language translator generates the logical query).
• The system should produce natural language responses.
• CA techniques, in particular relaxation, can be useful for applications such as Internet queries.
• It is not evident that first-order logic can serve as an adequate ontology for CA.

The End
That's that's that's all folks …

A CA SYSTEM (at U of Maryland)
• Uniform system:
  – Defined and implemented through logic
  – Uniform representation and support for all cooperative methods
• Portable:
  – General approach for RDB, DDB and logic programs
  – Domain-independent
• Natural language interface:
  – Accept natural language queries
  – Provide cohesive and coherent responses in natural language

Deductive Database Structure:
• EDB:
  prerequisite('MATH-300', 'MATH-350').
  prerequisite('MATH-350', 'MATH-400').
  teaches(smith, 'MATH-400').
  …
• IDB:
  teaches(X, Y) ← teaches(X, Z), prerequisite(Y, Z).
  …
• IC:
  ← enrolled_in(X, Y), not student(X).
  …
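The recursive IDB rule teaches(X, Y) ← teaches(X, Z), prerequisite(Y, Z) can be evaluated bottom-up. The following is a minimal Python sketch of naive fixpoint evaluation, offered as an assumption rather than the system's actual engine. It uses only the EDB facts shown on the slide; the elided "…" facts are left out. The name `derive_teaches` is hypothetical.

```python
# EDB facts shown on the slide (the elided "…" facts are omitted).
prerequisite = {("MATH-300", "MATH-350"), ("MATH-350", "MATH-400")}
teaches_edb = {("smith", "MATH-400")}


def derive_teaches(teaches, prerequisite):
    """Naive bottom-up evaluation of the IDB rule
    teaches(X, Y) <- teaches(X, Z), prerequisite(Y, Z).
    Applies the rule until no new facts appear (a fixpoint).
    """
    derived = set(teaches)
    while True:
        new = {
            (x, y)
            for (x, z) in derived
            for (y, z2) in prerequisite
            if z2 == z
        }
        if new <= derived:
            return derived
        derived |= new


print(sorted(derive_teaches(teaches_edb, prerequisite)))
# [('smith', 'MATH-300'), ('smith', 'MATH-350'), ('smith', 'MATH-400')]
```

Each pass re-derives every fact from scratch, which is simple but redundant. Semi-naive evaluation, which joins only the facts that are new in each round, is the usual refinement.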
