Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Implementing Database Coordination in P2P Networks * Ilya Zaihrayeu SemPGRID-04, 18 May 2004, New York, USA * work with Fausto Giunchiglia Why P2P Databases • P2P data sharing: files … relational data? • File sharing: KaZaa + Morpheus = more than 460 million downloads (download.com, May 2004) • P2P databases: academia testbeds so far.. • Promises: large-scale fault-tolerant multi-database system with low start-up and maintenance costs, and high “output” for an individual party • Difficulties: data integration solutions are not applicable due to centralized nature • Challenges: new methodologies, theories and algorithms, models, mechanisms and tools need to be developed Why P2P Databases, cont’d • Application: non performance critical domains, where local autonomy of each party is essential • Medical care scenario – John is going for skiing and suffers an accident – John is taken to local clinic for treatment – doctors need to know whether John has contraindication against some drugs – John does not know these details, but his database layer has a link to family doctor’s databases • Cooperating real estate agents example – Agents coordinate their data to push sales – When on the site of a customer who wants to sell, agent updates his database and makes data available for other agents – When on the site of a customer who may want to buy, agent shows details from his database, and may query other agent’s databases • Other examples: scientific databases (genomic data), tourism, etc Data Coordination Model • Interest Groups – group of peers able to answer queries about a certain topic – e.g., group topic – “Tourism in Trentino”, “Real Estate in Scotland”, etc – each Interest Group has group manager (GM) which helps in maintenance of the group • Acquaintances – “known” nodes that contribute data – acquaintance query – a query over the relations of an acquaintance which results satisfy some local relation • Correspondence Rules – solve heterogeneity problem at instance level – semantic heterogeneity at structure level is solved by acquaintance queries • Coordination Rules – coordinate data (queries and updates) with acquaintances Interest Groups • Help to cope with large number of nodes by clustering the network • Nodes self-organize into interest groups • A node may form a child interest group • One node may belong to multiple groups • Use schema matching to monitor group constitution • GM is to support group constitution, “talk” to other GMs and provide information about the group to newcomers All topics Arts Movies … Music Lyrics … Shopping … Publications Computers Books Acquaintance query • Acquaintance query is a conjunctive query: • q(X) :- r1(X1), …, rn(Xn) – q(X) – head, refers to local relation; – r1(X1), …, rn(Xn) – subgols of the body, refers to the relation of an acquaintance; and comparison predicates – X, X1,…, Xn – variables or constants; • E.g., P1: films (title, year, genre) :- P2: movie (title, year, director); genres (title, genre); year>1995 C :- E,F E A loop 1 3 F 2 D :- I,G A B C B :- C,D D I I :- A,B 4 F :- G G Correspondence Rules and Coordination Rules • Correspondence rules define how constants from the local domain are translated into constants in the domain of an acquaintance (forward translation) and vice versa (backward translation) – not necessarily symmetric, e.g. currency translation • Coordination Rules’ goal is data coordination with acquaintances and acquainted nodes – activated by user (user query) or from the network (network query, results, update) Algorithmic notes • Query answering algorithm – Use acquaintance queries and correspondence rules to translate queries and data – Propagate to acquaintances if acquaintance queries are relevant – Compute only new tuples, reconcile results – Process loops in query propagation, define termination point (no propagation using acquaintance queries that have been already used) • “Getting acquainted” protocol – Retrieve database schemas and then apply a matching operator on them – Based on the matching results, generate (with help of user) acquaintance queries, correspondence rules, tune up coordination rules • Updates handling (work with E. Franconi, G. Kuper, A. Lopatenko) – Data may go through a loop more than once, define termination point Implementing P2P databases on top of JXTA • Benefits – system platform, networking protocol independence – IP-independence (location independence) – gives basic blocks for building P2P applications • We implement Interest Groups and Acquaintances in JXTA • We encode database related functionalities into a set of custom JXTA services (DB-related services) DB-related services Node-level services Queries handler … Group-level services DB Screening operations service … GM service Architecture User User Interface (UI) Nodes on the network User-1 Database Manager (DBM) User-2 JXTA Layer Wrapper PDBMS SS Source Database (SDB) A node A P2P database network User-n Architecture, cont’d User Interface (UI) DBM Query Planner P2P Management Coordination Rules Query Propagation Updates Handler Acquaintance queries Acquaintances Results Handler Correspondence Rules Services GM in-pipe adv DB-related services JXTA Core Services JXTA Layer Wrapper Pipes Peer Groups Advertisements Gr. topic SS Peer Gr. Adv Peer Adv Pipe Adv In Out Discovery Demo: toy databases and topology Rendezvous peer Relations: (1) (2) (3) (4) [1,2] Movie (title, year, genre) Credits (name, title, role) Movie2 (title, year, director) Genre (title, genre) (2:-2) [2] 1 (1:-1) Q 0 [1,2] (1:-3,4) 3 (2:-2) (2:-2) (4:-1) (3:-3) 2 4 [2,3,4] (4:-4) Mediator peer [3] [4] 5 Query example 1 “List titles of movies featuring Tom Hanks” [1,2] (2:-2) [2] 1 Q(t) :- Credits (n,t,r); n=“Tom Hanks” (1:-1) Q 0 [1,2] (1:-3,4) 3 (2:-2) (2:-2) (4:-1) (2:-2) (3:-3) 2 [3] 4 [2,3,4] (4:-4) [4] 5 Query example 2 “Titles of drama movies issued after 1995” [1,2] (2:-2) [2] 1 Q(t) :- Movie (t,y,g); g=“Drama”; y>1995; (1:-1) Q 0 [1,2] (1:-3,4) 3 (2:-2) (2:-2) (4:-1) (3:-3) 2 [3] 4 [2,3,4] (4:-4) [4] 5 Query example 3 “Names of actors playing in action movies in 2003” [1,2] (2:-2) [2] 1 Q(n) :- Movie (t,y,g); Credits (n,t,r); r=“Actor”; g=“Action”; y=2003; (1:-1) Q 0 [1,2] (1:-3,4) 3 (2:-2) (2:-2) (4:-1) (3:-3) 2 [3] 4 [2,3,4] (4:-4) [4] 5 References • F. Giunchiglia and I. Zaihrayeu. Making peer databases interact - a vision for an architecture supporting data coordination. 6th International Workshop on Cooperative Information Agents (CIA2002), Madrid, Spain, September 18 -20, 2002. • P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu, “Data management for peer-to-peer computing: A vision,” WebDB, 2002. • A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov, “Schema mediation in a peer data management system,” ICDE, 2003. • V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M. Arenas, “Coordinating peer databases using ECA rules,” DBISP2P, September 2003. • Enrico Franconi, Gabriel Kuper, Andrei Lopatenko, Ilya Zaihrayeu (2004). The coDB Robust Peer-to-Peer Database System. Proc. of the 2nd Workshop on Semantics in Peer-to-Peer and Grid Computing (SemPGrid'04), 2004 • JXTA project, see http://www.jxta.org Announcement Submission deadline: 30 June, 2004 www.p2pkm.org Thank you