Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
August 25, VLDB 2009 Believe It or Not – Adding belief annotations to databases Wolfgang Gatterbauer, Magda Balazinska, Nodira Khoussainova, and Dan Suciu University of Washington http://db.cs.washington.edu/beliefDB/ High-level overview • • DBMS: manage consistent data Applications need inconsistent DM – Scientific databases reason: disagreement ! – Internet community databases • Community DBMS: manage inconsistent views • This work: Belief databases – manage data and curation – grounded in modal and default logic – implemented on top of relational model 2 Agenda • Motivating example • Logic foundations • Relational implementation • Discussion 3 Motivating application • NatureMapping project (http://depts.washington.edu/natmap/) – volunter contribute animal observations – one person curates the database problem: does not scale! Observations id uid 2 species Alice Crow date location comment 06-14-08 Lake Placid found feathers Sightings (S) sid uid species s2 Alice Crow date location 06-14-08 Lake Placid Comments (C) cid comment sid c1 found feathers s2 4 1. Distinct database instances S sid uid species date location s2 Alice Crow 06-14-08 Lake Placid sid uid date Alice S Bob species s2 Alice Raven location 06-14-08 Lake Placid D1: Belief worlds: individually consistent, mutually possibly inconsistent 5 1. Distinct database instances S sid uid species date location s2 Alice Crow 06-14-08 Lake Placid sid uid date Alice S Bob species s2 Alice Raven location 06-14-08 Lake Placid BeliefSQL I: Alice believes that she saw a crow. insert into BELIEF ‘Alice’ Sightings values (‘s2’,‘Alice’,‘Crow’,’06-14-08’,‘Lake Placid’) I: Bob believes that she actually saw a raven. insert into BELIEF ‘Bob’ Sightings values (‘s2’,‘Alice’,‘Raven’,’06-14-08’,‘Lake Placid’) Q: Who believes something different than Alice and what? select U2.name, S1.species, S2.species from Users as U, BELIEF ‘Alice’ Sightings as S1, BELIEF U.uid Sightings as S2, where S1.sid = S2.sid and S1.species <> S2.species A: {(‘Bob’, ‘Crow’, ‘Raven’)} 6 2. Open world assumption S sid uid species date location s2 Alice Crow 06-14-08 Lake Placid sid uid date Alice S Bob S Carol species location s2 Alice Raven 06-14-08 Lake Placid sid uid date species location s2 Alice Crow 06-14-08 Lake Placid s2 Alice Raven 06-14-08 Lake Placid − − Adapted key constraints ! D2: Model incomplete knowledge with exlicit negative beliefs 7 2. Open world assumption S Alice S Bob S Carol believe that Alice saw a crow species I: Carol datedoes not location nor a raven. s2 Alice Crow 06-14-08 Lake Placid insert into BELIEF ‘Carol’ not Sightings values (‘s2’,‘Alice’,‘Crow’,’06-14-08’,‘Lake Placid’) insert into BELIEF ‘Carol’ not Sightings Placid’) sid uid species values date(‘s2’,‘Alice’,‘Raven’,’06-14-08’,‘Lake location sid uid s2 Alice Raven 06-14-08 Lake Placid sid uid date species location s2 Alice Crow 06-14-08 Lake Placid s2 Alice Raven 06-14-08 Lake Placid − − 8 2. Open world assumption S sid uid species s2 Alice Crow Alice S species s2 Alice Raven Bob S Carol sid uid sid uid species Q: Who disagrees with a sighting from ‘06-14-08’ that Alice believes? U.name, S1.species date selectlocation from Lake Users as U, 06-14-08 Placid BELIEF ‘Alice’ Sightings as S1, BELIEF U.uid not Sightings as S2 where S1.sid = S2.sid S1.uid = S2.uid date and location and S1.species = S2.species 06-14-08 Lake Placid and S1.date = ‘06-14-08’ and S2.date = ‘06-14-08’ and S1.location = S2.location ‘Crow’), (‘Carol’, ‘Crow’)} date A: {(‘Bob’, location s2 Alice Crow 06-14-08 Lake Placid s2 Alice Raven 06-14-08 Lake Placid − − 9 3. Higher-order beliefs S sid uid species date location s2 Alice Crow 06-14-08 Lake Placid sid uid date Alice S species s2 Alice Raven C Bob Bob Alice C location 06-14-08 Lake Placid cid comment sid c1 plain black feathers s2 cid comment sid c1 purple-black feathers s2 D3: Beliefs about other user’s beliefs: allow discussion between users 10 3. Higher-order beliefs to location Bob, Alice believes that the species I: According date of the Lake sighted animal were plain black. s2 Alice Crow feathers 06-14-08 Placid insert into BELIEF ‘Bob’ BELIEF ‘Alice’ Comments values (‘c1’, ‘plain black feathers’, ‘s2’) S sid uid S sid uid Alice species s2 Alice Raven C Bob Bob Alice C date location 06-14-08 Lake Placid cid comment sid c1 plain black feathers s2 cid comment sid c1 purple-black feathers s2 11 3. Higher-order beliefs S s2 Alice S sid s2 C Bob Bob cid c1 Alice C does Alice believe according species Q: Which date comments location to Bob, which he does not believe himself? Alice Crow 06-14-08 Lake Placid select C1.cid, C1.comment from BELIEF ‘Bob’ BELIEF ‘Alice’ Comments as C1, BELIEF ‘Bob’ not Comments as C2 uid species date location where C1.cid = C2.cid Alice Raven and 06-14-08 Lake Placid C1.comment = C2.comment and C1.sid = C2.sid comment sid A: {(‘c1’,‘plain-black feathers’)} plain black feathers s2 sid uid cid comment sid c1 purple-black feathers s2 12 3. Higher-order beliefs S s2 Alice S sid s2 C Bob Bob cid c1 Alice C does Alice believe according species Q: Which date comments location to somebody, which (s)he does not believe themself? Alice Crow 06-14-08 Lake Placid select U.name, C1.sid, C1.comment from Users as U, BELIEF U.uid BELIEF ‘Alice’ Comments as C1, uid species date location BELIEF U.uid not Comments as C2 Alice Raven where 06-14-08 Placid C1.cid =Lake C2.cid and C1.comment = C2.comment comment sid = C2.sid and C1.sid plain black feathers s2 A: {(‘Bob’, ‘c1’, ‘plain-black feathers’)} sid uid cid comment sid c1 purple-black feathers s2 13 4. Message board assumption S sid uid species date location s2 Alice Crow 06-14-08 Lake Placid sid uid date Alice S species s2 Alice Raven C Bob S Bob Alice 06-14-08 Lake Placid cid comment sid c1 plain black feathers s2 sid uid species s2 Alice Crow C location date location 06-14-08 Lake Placid cid comment sid c1 purple-black feathers s2 D4: Default assumption: models a trusting attitude & avoids repeated inserts 14 4. Message board assumption S sid uid species s2 Alice Crow Alice S sid s2 C Bob cid c1 S sid select S1.sid, S1.species from BELIEF ‘Bob’ BELIEF ‘Alice’ Sightings as S1, uid species dateBELIEFlocation ‘Bob’ not Sightings as S2 S1.sid =Lake S2.sid Alice Raven where 06-14-08 Placid and S1.uid = S2.uid comment sid and S1.species = S2.species and S1.date = S2.date plain black feathers s2 and S1.location = S2.location A: {(‘s2’, ‘Crow’)} uid species date location s2 Alice Crow Bob Alice C Q: Which animal sightings does Alice believe date location according to Bob, which he does not 06-14-08 Lake Placid believe himself? 06-14-08 Lake Placid cid comment sid c1 purple-black feathers s2 15 What we have seen so far • 4 Design decisions for Belief Database model – – – – • Distinct belief worlds Open world assumption (OWA) Higher-order beliefs Message board assumption BeliefSQL – SQL + ‘BELIEF’ (+ ‘not’) 16 Agenda • Motivating example • Logic foundations • Relational implementation • Discussion 17 Logic foundations: Belief statements S Bob Alice Carol … Alice Alice Bob … Bob Alice species s2 Alice Crow Alice … sid uid … … insert into BELIEF ‘Alice’ S values (‘s2’, ‘Alice’, ‘Crow’,…) i: ☐Alice S+(‘s2’,‘Alice’,‘Crow’,…) modal operator & belief path (w) relational tuple (t) sign (s) Carol ε … Carol Alice … belief statement = ☐w ts “annotation of ground tuple” Belief database D = {1, …, n} Bob … 18 Logic foundations: Entailment S Bob Alice Carol … Alice S … Bob Alice Alice Alice Bob … Bob Alice sid uid species … s2 Alice Crow … sid uid … species s2 Alice Crow … select * from BELIEF ‘Bob’ BELIEF ‘Alice’ S Carol ε … … One belief annotation: D = {1} 1=☐Alice S+(…‘Crow’,…) … More than one entailed belief: D ☐Bob Alice S+(…‘Crow’,…) Carol Alice Bob 19 Logic foundations: Message board assumption Message board assumption If D ☐w ts and ☐u w ts consistent with D then D ☐u w ts Default logic : ☐u ☐u non-monotonic reasoning ! D D\D D Explicit beliefs (annotations) Implicit beliefs (assumptions) Entailed beliefs (extension) belief database “contains” more than the explicit belief annotations ! 20 “Semi-formal” problem statement INPUT OUTPUT Belief statements i1: 1 i2: 2 ... in: n Adapted key constraints Message board assumption : ☐u ☐u D ? D ☐w1…wd R+(x1,…xl) ? q(x) :− ☐w Ri+(xi) Belief Conjunctive Queries (BCQ) q(x) :− ☐w1R1s1(x1), …, ☐wgRgsg(xg) 21 Agenda • Motivating example • Logic foundations • Relational implementation • Discussion 22 Canonical Kripke structure Belief statements* i1: s11+ i2: ☐Bob s11− i3: ☐Bob s12− i4: ☐Alice s21+ i5: ☐Alice c11+ i6: ☐Bob s22+ i7: ☐BobAlice c21+ i8: ☐Bob c22+ Message board assumption Carol Carol Alice #0 {s11+} #1 {s11+,s21+,c11+} Carol Bob Carol Bob #2 Alice #3 {s11+,s21+,c11+,c21+} Bob {s11−,s12−,s22+,c22+} : ☐i ☐i * Running example from the paper 23 Relational representation Sightings_INTERNAL tid sid uid species Sightings_V date location wid tid E sid s e wid1 uid wid2 s1.1 s1 Carol Bald eagle 06-14-08 Lake Forest #0 s1.1 s1 + y #0 Alice #1 s1.2 s1 Carol Fish eagle 06-14-08 Lake Forest #1 s1.1 s1 + n #0 Bob s2.1 s2 Alice Crow 06-14-08 Lake Placid #1 s2.1 s2 + y #0 Carol #0 s2.2 s2 Alice Raven 06-14-08 Lake Placid #2 s1.1 s1 − y #1 Bob #2 s1.2 s1 − y #1 Carol #0 #2 s2.2 s2 + y #2 Alice #3 #3 s1.1 s1 + n #2 Carol #0 #3 s2.1 s2 + n #3 Bob #3 Carol #0 Comments_INTERNAL tid cid comment sid c1.1 c1 found feathers s2 c2.1 c2 plain black feathers s2 c2.2 c2 purple-black feathers s2 #2 #2 #2 Comments_V wd tid cid s e #1 c1.1 c1 + y #2 c2.2 c2 + y #3 c1.1 c1 + n #3 c2.1 c2 + y D S wid d wid1 wid2 #0 0 #1 #0 #1 1 #2 #0 #2 1 #3 #1 #3 2 24 Example Translation of a Belief CQ (BCQ) Q: Who disagrees with a sighting from ’06-14-08’ that Alice believes? BeliefSQL Translation into SQL select U.name, S1.species from Users as U, BELIEF ‘Alice’ Sightings as S1, BELIEF U.uid not Sightings as S2 where S1.sid = S2.sid and S1.uid = S2.uid and S1.species = S2.species and S1.date = ‘06-14-08’ and S2.date = ‘06-14-08’ and S1.location = S2.location select into from where and and and E1.uid as uid1, V.tid, V.sid, R.uid, R.species, R.date, R.location, V.s T2 E as E1, Sightings_V as V, Sightings_STAR as R E1.wid1=0 V.wid=E1.wid2 V.tid=R.tid E1.uid='1'; select into from where and and E1.uid as uid1, V.tid, V.sid, R.uid, R.species, R.date, R.location, V.s T1 E as E1, Sightings_V as V, Sightings_STAR as R E1.wid1=0 V.wid=E1.wid2 V.tid=R.tid; select from where and and and T1.uid1, T2.species T1 as T1, T2 as T2 T1.sid=T2.sid ((T1.s=0 and T1.uid=T2.uid and T1.species=T2.species and T1.date='6-14-08' and T1.location=T2.location) or (T1.s=1 and (T1.uid<>T2.uid or T1.species<>T2.species or T1.date<>'6-14-08' or T1.location<>T2.location))) T2.s=1 T2.date='6-14-08'; drop drop table T2; table T1; q(x,y) :− ☐Alice S+(u,v,y,‘06-14-08’,z), ☐x S−(u,v,y,‘06-14-08’,z) 25 Agenda • Motivating example • Logic foundations • Relational implementation • Discussion 26 Experiments Size |R*| Relative overhead ρ := n ρ = O(mdmax) In theory: e.g. 100 users, max. depth 2 ρ 10,000 m … #users dmax … maximum depth of belief annotation Experiments: ρ 21 – 1,009 Size not limitation of semantics, but of relational implementation! Time Depends on type of query (3 types in paper) Experiments on 10,000 annotations (ρ =22.4): Q1: ~0.1 s Q2: ~0.4 s Q3: ~4.5 s Considerable speed-up to come! 27 Inspirations and related work (excerpt) • Annotations – Buneman et al. [ICDT 2001 / ICDT 2007] – Bhagwat et al. [VLDBJ 2005], Geerts et al. [ICDE 2006] – Srivastava & Velegrakis [SIGMOD 2007] • Modal logic – Fagin et al. [1995] – Calvanese et al. [IS 2008] – Nguyen [LJ-IGPL 2008] • Uncertain / incomplete information – Sarma et al. [ICDE 2006] – Green & Tannen [IEEE Data Eng. 2006] – Dalvi & Suciu [PODS 2007] • Inconsistency / key violations – Arenas et al. [PODS 1999] – Fuxman et al. [SIGMOD 2005] • Peer-to-peer computing / collaborative data sharing – Bernstein et al. [WebDB 2002] – Ives et al. [SIGMOD record 2008] 28 Conclusions • Proposed BELIEF databases – Goal: manage, curate inconsistent data • Implementation – Logical foundations – Relational translation • Current work – making it compact and fast 29 BACKUP 30 Relative overhead of relational representation Relative overhed (|R|/n) 1E+4 1E+3 1E+2 1E+1 1E+1 1E+2 1E+3 Number of annotations (n) 1E+4 Distribution of belief path depths (Pr[k=x]) 31 Query types and execution times 32 Belief Conjunctive Queries (BCQ) 33 Revisiting the semantics / the user (3) ? -> Structured discourse (2) BeliefSQL Conflicts in belief worlds: OWA, keys, ML, DA (1) SQL BELIEF ’Alice’ (…,’eagle’,…) -> ’Alice’ASSERTS (…,’eagle’,…) BELIEF ’Bob’ BELIEF ’Alice’ (…,’black feathers’,…) -> ’Bob’SUGGESTS that the ASSUMPTION (…,’black feathers’,…) has led ‘Alice’ to her original observation Standard relational model 34