Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Extensible Storage Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Clusterpoint wikipedia , lookup
Versant Object Database wikipedia , lookup
Principles of the Semantic Web DB or KRDB? Alon Halevy Agenda A spectrum of representation and query formalisms: – From relational databases to description logics and beyond. Questions: – – – – What can we represent? What can we query? How much will it cost? Can we actually use this stuff? 2 Why Do We Care? Need to represent data and knowledge on the semantic web. Need to query. Need to map between different representations of data/knowledge. People are confused, very religious about these issues. 3 Perspectives from the Structure Chasm Authoring Querying Data sharing Writing text Creating a schema keywords Someone else’s schema Easy Committees & standards Specifics The relational model, a few query languages – Representing real data Horn rules, the DB view and the KR view. XML: shattering the myth of no semantics. Description Logics: a logic of descriptions. One shameless plug for my past work. 5 Recalling First-Order Logic A KB is a set of well-formed formulas X (Person(X) Mortal(X)) X (Student(X) Smart(X)) Person(a) v Dog(a) X (Student(x) (x=Aristotle)) Interpretation: a mapping from terms to the universe of discourse. Model: any interpretation that satisfies the formulas. Key idea: infer implicit facts from the explicit ones. 6 Example Inferences The KB: X (Person(X) Mortal(X)) X (Student(X) Smart(X)) Person(a) v Dog(a) X (Student(x) (x=Oren)) – Mortal(a)? don’t know. – Smart(Oren)? Yes. – Mortal(a) v Mortal(a)? Yes –X (Person(x) not Mortal(x))? No 7 Table name Attribute names The Relational Data Model Product PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi Tuples or rows 8 No’s No negation No disjunction Ambiguous support for incomplete information The database represents a single model. Hence, inference is just model checking. 9 Integrity Constraints Very specific forms of logical formulae. Enforced (maybe) by the database system: – Functional dependencies: A B – Foreign key constraints: every tuple in the Purchase table must refer to a product in the Product table. – Multi-valued dependencies. – Tuple-generating dependencies, etc. etc. 10 Inference: Type 1 (Querying) Product (pname, price, category, manufacturer) Company (cname, stockPrice, country) Find all countries that manufacture some product in the ‘Gadgets’ category. SELECT country FROM Product, Company WHERE manufacturer=cname AND category=‘Gadgets’ Q(c) :- Product(w,y,’Gadgets’,x), Company(x,p,c) 11 Query Language Features Fundamental question: what queries can I express with my language? Start with selection, projection, and join. + union and negation (= relational completeness) To deal with real data: – Grouping and aggregation – Dealing with duplicates – Outer joins 12 Inference Type 2: Query Containment • Question: is the result of Q1 always a subset of Q2? Q1(A,B) :- cites(A,B), cites(B,A), sameTopic(A,B) Q2(C,D) :- cites(C,C1), cites(D,D1) • Inference on a very specific type of formula. • Only finite models are considered. 13 Complexity Results Galore For select-project-equi-join: NP-Complete Add comparisons (e.g, <): Pi^p_2 complete. Add negation: – Level 0: still Pi^p_2 complete. – Level 2: undecidable – Level 1: Sagiv and Ullman know but don’t want to tell Allow at most 2 occurrences of every predicate name: polynomial. A lot of papers. 14 Last Week Recap The relational model: ground facts + UNA + CWA. Integrity constraints: expressing more knowledge. Inference Type 1: querying (= model checking). – Note: polynomial time is not good enough. Inference Type 2: query containment – Type 2.5: answering queries using views. – Both are an inference problem of a particular type of formula in first-order logic over finite models. 15 Beyond Containment Answering Queries Using Views Given a query Q and a set of view definitions V1,…,Vn: Is it possible to answer Q using only the V’s? V1(A,B) :- cites(A,B), cites(B,A) V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1) Query: q(x,y) :- sameTopic(x,y), cites(x,y), cites(y,x) Query rewriting: q’(X,Y) :- V1(X,Y), V2(X,Y) 16 Didn’t We Say 590 Semantic Web? Assume a virtual schema of the WWW, e.g., – Course(number, university, title, prof, quarter) Every data source on the web contains the answer to a view over the virtual schema: UW database: SELECT number, title, prof FROM Course WHERE univ=‘UW’ AND quarter=‘2/02’ Stanford database: SELECT number, title, prof, quarter FROM Course WHERE univ=‘Stanford’ User query: find all professors who teach “database systems” 17 Horn Rules / Datalog Easy for KR people. Hard for database people. Add recursion to the query language: – Path (x,y) :- edge(x,y) – Path (x,y) :- Path(x,z), Path(z,y) DB people consider least fixed point semantics. { edge(a,b), Path(a,b), Path(b,c), Path(a,c)}: – Is a model for KR folks, not DB folks. Recursion is not expressible in first-order logic. 18 More on Datalog Many clever algorithms for evaluating datalog queries: – They have fancy names: e.g., magic sets Query containment: undecidable, unless you constrain the queries (ask Surajit) Some ideas made it into SQL and relational systems: – Magic sets (useful even without recursion) – Linear recursion in SQL-3 19 XML <db> <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book> <book> <title>Transaction Processing</title> <author>Bernstein</author> <author>Newcomer</author> </book> <publisher> <name>Morgan Kaufman</name> <state>CA</state> </publisher> </db> 20 XML: Issues Data model: edge-labeled graph (/tree): – The tags can be viewed as binary relations Features: – – – – – The schema is embedded in the data. Can have a predefined schema (XML Schema) Nesting can be arbitrary. Can be irregular (e.g., different formats for elements) Order of elements may be important. 21 Querying XML XPath, XQuery, XSLT. XQuery is based on XPath, XML-QL, SQL. Query languages features: – Path expressions – The Return clause: creates the output XML document. – Standard query language bells and whistles. – Not in XQuery: tag variables – bind variables to schema elements. Query containment: ask Dan and Gerome. 22 Knowledge Representation McCarthy suggested some form of first-order logic. Semantic networks – popular on the east coast. Evolved into: – Frame-based systems – Description logics (a.k.a. terminological logics). 23 Description Logics A subset of first-order logic with a German syntax. No variables. Allows only: – Unary relations (Concepts): Person, Happy – Binary relations (Roles, attributes): childOf A DL Knowledge base: – Abox of ground facts: Person(sue), Happy(bob) – Tbox of definitions. 24 Concept Descriptions Built using a set of constructs: C, D A T ^ CD CUD C R.C R.C | | | | | | | | Primitives Konzept Top Konzept Bottom Konzept Durchschnitt Vereinigung Komplement Rollenquantifikation/Werterestriktion Rollenquant./Existentielle Restriktion 25 Concept Descriptions Built using a set of constructs: C, D A T ^ CD CUD C R.C R.C (> n R) | | | | | | | | | Primitive concepts Top concept Bottom concept Intersection Union Complement Universal restriction Existential restriction number restriction 26 TBox Assertions Concept introduction: – Person Mammal Concept definition: – Parent = Person (> 0 child) – HappyParent = Parent ( child.Smart) Inclusion assertions: – Parent ( child. (= name Karina)) HappyParent – Person (> 5 child) HappyParent ^ 27 Reasoning in DLs. Note: you can assert view facts – HappyParent(bob) – You don’t know who the children are, but you know they’re smart. Classification: C(a)? Consistency: is C necessarily an empty concept? Subsumption: C1 C2? Theoretically, everything boils down to subsumption. Complexity depends on set of constructors allowed. 28 DLs vs. Horn rules Horn rules can handle any variable pattern – DL’s can handle only specific patterns. DL’s can do subsumption with negation, number restrictions, and various other features. They can be combined but decidability is subtle (see CARIN, [Levy and Rousset, 1996]). 29