A GRID-distributed XML database
Giorgio Ghelli
Pisa University

The project
• (Small) part of GRID.it
• Born very humbly: “let us generalize the GRIS-GIIS structure with the XML data model”
• Now looking a bit like a P2P XML DB
• Still in native phase
• Feedback wanted!

GALT 03, Edinburgh GRID XML Database

The vision
• A system where everybody builds their own repository, decides it will be part of a community, connects to the GRID, and it works
• Canonical application: resource description and discovery
• Challenges:
  – No administrative burden
  – Dynamicity and autonomy

Assumptions
• Each piece of data belongs to a node (may be replicated)
• Nodes come and go
• Node adherence to a known schema is good enough
• Nodes are not malicious
• XML as the data model, a subset of XQuery as the language

Aims
• High node autonomy (except for the protocol)
• No administration
• Scalability
• Resilience

The general idea
• An overlay network with a dynamic hierarchical structure (peers and super-peers)
• A peer receives a query, asks a super-peer for an access plan, and executes the access plan
• No answer from a node = empty query result from that node
• (Update: local)

The challenge
• Query routing (with no central schema)
  – Broadcast: too many messages
  – Sequential scan: too much time
  – D-hash: we prefer data to be where it belongs
• Peer clustering

We ignore, or postpone
• Forever:
  – Schema integration
• For a while:
  – Replication
  – Security

Query routing
• Every node manages and publishes a level-1 schema (a synthetic representation of a superset of its data: a type)
• Super-peers manage:
  – A copy of the level-1 schemas of their sub-peers
  – A summary level-2 schema
• Super-peers use level-i schemas to decide which nodes are involved in a query
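
The routing idea on the Query routing slide can be illustrated with a minimal sketch. It rests on strong simplifying assumptions that are not part of the talk: a level-1 schema is reduced to a set of root-to-node label paths, the level-2 schema is the plain union of those sets, and a query is a single label path. The names `Peer`, `SuperPeer`, `register`, `level2` and `access_plan` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: a schema is just a set of label paths, e.g. ("site", "cpu").
Path = tuple[str, ...]


@dataclass
class Peer:
    name: str
    # Level-1 schema: a superset of the label paths present in the peer's data.
    level1: frozenset[Path]


@dataclass
class SuperPeer:
    name: str
    # Copies of the level-1 schemas of the sub-peers, keyed by peer name.
    level1_copies: dict[str, frozenset[Path]] = field(default_factory=dict)

    def register(self, peer: Peer) -> None:
        """Store a copy of the level-1 schema published by a sub-peer."""
        self.level1_copies[peer.name] = peer.level1

    def level2(self) -> frozenset[Path]:
        """Summary schema: here simply the union of the level-1 copies."""
        return frozenset().union(*self.level1_copies.values())

    def access_plan(self, query_path: Path) -> list[str]:
        """Return the sub-peers whose schema admits data on query_path."""
        if query_path not in self.level2():
            return []  # the summary rules out the whole cluster at once
        return sorted(
            name for name, schema in self.level1_copies.items()
            if query_path in schema
        )


sp = SuperPeer("sp1")
sp.register(Peer("a", frozenset({("site", "cpu"), ("site", "disk")})))
sp.register(Peer("b", frozenset({("site", "cpu")})))
print(sp.access_plan(("site", "cpu")))  # ['a', 'b']
print(sp.access_plan(("site", "gpu")))  # []
```

Because each schema describes a superset of the data, a peer left out of the plan cannot hold matching data, while a peer included in the plan may still return an empty result.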

Issues
• Schema formalism
• Schema management
• Super-peer communication protocol

Schema formalism
• XDuce-like, with intervals:
  T ::= [v1…v2] | T,T | l[T] | T or T | X (guarded by l[…]) | T*
• Equivalent to unranked tree automata
• Subtyping and intersection emptiness are decidable

Schema management
• Level-1 schemas are either declared or inferred from data
• Level-2 schemas have to be synthesized:
  – Take the union of the level-1 schemas
  – Simplify the union
  – Trade-off between precision and size?
• Management of schema freshness

Super-peer communication protocol
• ?

Related work
• P2P systems: we assume each piece of data lives in a fixed node
• Distributed DB / OGSA-DQP: we do not want to assume that a node knows the whole schema

Conclusions
• Are we tackling a meaningful problem?
• So much work is still ahead…
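
As an appendix to the Schema management slide, the synthesis of a level-2 schema (take the union, simplify, trade precision for size) can be sketched as follows. This is only an illustration under assumptions not made in the talk: a schema is flattened to a map from a label to the value intervals allowed under it, which covers only the `l[[v1…v2]]` corner of the type language, and the size bound `max_size` and the function names are hypothetical.

```python
from __future__ import annotations

Interval = tuple[float, float]      # closed interval [v1, v2]
Schema = dict[str, list[Interval]]  # label -> allowed value intervals


def merge(intervals: list[Interval]) -> list[Interval]:
    """Exact simplification: fuse overlapping or touching intervals."""
    out: list[Interval] = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def coarsen(intervals: list[Interval], max_size: int) -> list[Interval]:
    """Lossy simplification: close the smallest gaps until the list fits.

    The result always covers the input, so it remains a valid superset
    description; it only becomes less selective for routing.
    """
    out = merge(intervals)
    while len(out) > max(max_size, 1):
        # Index of the pair of neighbours separated by the smallest gap.
        i = min(range(len(out) - 1), key=lambda k: out[k + 1][0] - out[k][1])
        out[i:i + 2] = [(out[i][0], out[i + 1][1])]
    return out


def level2(level1_schemas: list[Schema], max_size: int = 2) -> Schema:
    """Union the level-1 schemas label by label, then simplify each entry."""
    union: Schema = {}
    for schema in level1_schemas:
        for label, intervals in schema.items():
            union.setdefault(label, []).extend(intervals)
    return {label: coarsen(ivs, max_size) for label, ivs in union.items()}


summary = level2([
    {"cpu": [(1, 4)]},
    {"cpu": [(3, 8), (20, 30)]},
    {"cpu": [(9, 10)], "ram": [(0, 2)]},
])
print(summary)  # {'cpu': [(1, 10), (20, 30)], 'ram': [(0, 2)]}
```

A smaller `max_size` gives a more compact summary at the price of more peers being contacted needlessly; a larger one keeps routing precise but makes the summary grow with the number of sub-peers.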