Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Graph Databases (GDB) Adrian Silvescu Doina Caragea Anna Atramentov Problems And Motivations • The necessity to represent, store and manipulate complex data make RDBMS somewhat obsolete • [P1] Problem 1: Violations of the 1NF – – – • Multi-valued attributes Complex attributes Complex combination of the previous two [P2] Problem 2 : Accommodate Changes – – Appears when acquiring data from autonomous dynamic sources or Web (eg: genexp & restaurants ). RDBMS may require schema renormalization Problems and Motivations (contd) • [P3] Problem 3: Unified representation for: – – – – Data Knowledge (Schemas are a subset of this) Queries (More generally: Goals) [results+def] Models (Concepts are a particular example) • In order to facilitate the application of learning and reasoning methods on these structures The Big Picture Discovery & Modelli ng Information Exploitation USER Query = Goal Specification Information AVAILABLE INFORMA TION IN THE WORLD INFORMA TION INTEGRATION Information Integration ... KNOWLEDGE SOURCES ... LINK SOURCES ... DATA SOURCES Existing Approaches • RDBMS – may need schema renormalisation • Approaches that try to fix the above mentioned problems: – OO Databases [P1], [P2] - graphs [but procedural] – XML Databases [P1] (somewhat [P3]) – trees – OORDBMS [P1] – graphs with foreign keys • Others – Datalog – More Efficient Crippled Prolog – Network Models - graphs – Hierarchical Models – trees • Therefore the motivation for Graph Databases Outline • Graph Databases – – – – – – Examples DDL DML – Queries DML – UPDATES Informal Semantics DB => GDB • GDB vs. OO, XML, OR, … • Conclusions and Further Work Graph Databases • We propose a new kind of Database: Graph Databases (GDB) as a solution to Problems [P1],[P2] and [P3]. • In order to define the GDB we will specify: – The Data Definition Language (DDL) – The Query Language (more generally DML) – Informal Semantics of the above languages • We will also show how to convert existing DBs (RDBMS) into the GDB DDL to facilitate the transition to GDBs Goals and Design Choice • Goals – Declarativity – Change • Design Choice : Have unique instance identifiers vs. having foreign keys – Close in Spirit to OO – Will allow us to cope easier with Change – Declarativity is an issue in OO, but not for GDB as we will show Database Representation • Sailors(sid:integer, sname:char(10), rating: integer, age:real) • Boats(bid:integer, bname:char(10), color:char(10)) • Reserve(sid:integer, bid:integer, day:date) Sailors Reserves sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 Boats sid bid day 22 101 10/10/96 58 103 bid bname color 101 Interlake red 102 Clipper green 103 Marine red 11/12/96 sid 22 name dustin rating 7 Graph Representation IOF Sailor s IO F TB ID1 age 45.0 L IOF 31 name lubbe r rating 8 IOF IOF Boat s Reserves IOF ID2 IOF age 55.5 ID8 IOF ID6 ID5 ID4 : sid : : ID3 ID7 : … Foreign Keys ID1 sid 22 name dustin rating 7 age 45.0 Sailor sid 22 day 10/10/96 ID4 bid 101 Boat bid ID6 bname color 101 Interlake red Data Representation in the GDB DDL Name1 Val 1 Name2 ID Val 2 …… Name N Val N • ID:(Name1=Val1,…,NameN=ValN) Examples: ID1:(sid=22, name=“Dustin”, rating=7, age=45.0) ID4:(sailor=ID1, day=“10/10/96”, boat=ID6) ID6:(bid=101, bname=“Interlake”, color=“red”) Defining New Concepts in GDB DDL– Grandson _ID1 GrSon _ID2 :- Person Person Person IOF IOF IOF _ID1 Son _ID1:(GrSon=_ID2) :_ID1:(IOF=“Person”,Son=_ID3), _ID3:(IOF=“Person”, Son=_ID2), _ID2:(IOF=“Person”). _ID3 Son GrSon _ID2 [DML-QL] Writing simple queries: •The names of all sailors who have reserved a red boat _X Sailors _X Boats Name IOF Name IOF _ID Boat _ID1 _ID :- Color Red _ID:(Name = _X) :_ID:(IOF = Sailors, Boat = _ID1, Name = _X), _ID1:(IOF = Boats, Color = Red). Informal Semantics • Three kinds of Definitions – Facts: • G1. [Extensional definition] – Definitions: • G1 :- G2. [Intensional definition] • G1 :- PROC f(x1,…,xn). [Procedural definition] • Queries = Graphs to be Matched = QG – The same as a definition: Query :- QG. Informal Semantics - Picture Query Query match Facts Extended Graph RDBMS => GDB • Sailors(sid:integer, sname:char(10), rating: integer, age:real) • Boats(bid:integer, bname:char(10), color:char(10)) • Reserve(sid:integer, bid:integer, day:date) Sailors Reserves sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 Boats sid bid day 22 101 10/10/96 58 103 bid bname color 101 Interlake red 102 Clipper green 103 Marine red 11/12/96 DML - Updates • Inline Query – _X : [ _ID: (IOF = Sailor, sname = lubber, rating = _X)] • Updates: MODIFY (QryGraph, UpdList) – Add to all GrandSons the money of the Grandparent as a potential inheritance. – MODIFY ( _ID : (GrSon = _ID1), (=> NEWID:(IOF = POT_INHER, BENFICIARY = _ID1, AMOUNT = _AMNT: [ _ID:(Money = _AMNT )]) )) – . Change Book IOF Title Databases Author Ramakrishan _ID Author Gehrke Change Gene_ex IOF _ID value 0.7 IOF x / Exp IOF _ID1 _ID2 value value x1 x2 : IOF Aggregate Operation _IDn value xN High Order Queries Find all the fields from Tables that contain the name John. _ID:(Name=_X) :_ID1:(IOF=Tables), _ID2:(IOF=_ID1, _X=“John”). GDB vs. OO, XML, OR, CG, … • GDB are close in spirit to OO but not the same (GDB : no encapsulation + more IDs). • Close To Datalog but with IDs(links) vs foreign keys • The same for ORDBMs and somewhat XML • Close to Conceptual Graphs But CG do not have IDs • We can also use foreign keys: _ID:[ _ID(IOF = Sailors, sname = lubber)]. OO vs. GDB OO GDB ID1 Class: Person age: 42 name: john car Person IOF ID1 ID2 Class: Car color: red age 42 Car name John IOF car ID2 color red Conclusions and Further work • Advantages & Disadvantages of GDB • Conclusions – Proposed a new database model DBG that copes with problems existing in previous approaches – Showed how to link existing DB with GDB – Showed advantages of GDB over existing database approaches • Further Work – Use GDB for learning