The Data Ring: Community Content Sharing
Serge Abiteboul (INRIA)
Alkis Polyzotis (UC Santa Cruz)

Data Sharing Communities
Data sharing community: a group of users that share and query information within some domain
• Examples: UCSC genome browser, SwissProt, Flickr
• Interesting data management problem
  – Shared information is heterogeneous
  – Data is distributed and dynamic
  – Lack of central administration
  – Users are not database savvy

The Data Ring
• P2P middleware system that provides:
  – Monitoring
  – Querying
  – …and other database-like services over the distributed information
• Main goal: simplicity of use

Data abstraction in the data ring
• Topological layer
• Physical layer
• External layer

Data abstraction in the data ring: Topological Layer
• Declarative query services
• Data and query model based on XML

Data abstraction in the data ring: Physical Layer
• Basic service is distributed query evaluation
• Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog

Data abstraction in the data ring: External Layer
• Provides semantically richer data models

Data abstraction in the data ring
• Our focus is on the topological and physical layers
• External layer is equally important and an active research area

Thesis #1: formalism for distributed XML data and queries

Distributed XML data and queries
• What made the relational model successful:
  – A logic for describing tables
  – An algebra for query optimization
• We need the equivalent for trees in a distributed context:
  – A logic for describing distributed XML data
  – An algebra for optimizing distributed XML queries

Desiderata for description logic
• Seamless transition between data and services
  – Important for loose data integration
• Support for XML streams
  – Streams are essential for subscription services
  – They are also necessary to support recursion

Starting point: AXML
• AXML: XML tree with embedded web service calls
  – Seamless transition between intensional and extensional data
  – Provides a simple mechanism for loose data integration
• Core concept: XML streams
  – A web service call returns a stream of elements
  – Support for both push and pull semantics

Desiderata for algebra
• Be amenable to rewrites
• Capture the topology of distributed computation
• Allow seamless transition between logical and physical state
  – Plans may need to be re-optimized in mid-flight
  – It may be necessary to perform partial optimization
  – Error recovery

A proposal based on AXML
• A distributed plan is a workflow of web services … which is exactly an AXML tree
• Components:
  – An encoding of distributed plans in AXML
  – Rewrite rules
• A nice bonus: plans can be readily exchanged between nodes

Disclaimer
• AXML is a starting point, not a panacea
• Bottom line: we need formalisms for distributed XML queries

Thesis #2: autonomic administration

Autonomic administration
• Users are not database experts
  – Typically, scientists with computer experience
• Users are averse to too many “knobs”
• No central authority that is
  responsible for administration
• Autonomic administration is a necessity, not a gadget

Facets of autonomy
• Self-monitoring
• Self-tuning
• Self-healing

Some issues
• System integration
• Distribution
• On-line tuning
• Pro-active tuning

Distributed vs. local tuning
• Distributed tuning
  – Based on the global workload
  – Catalog organization, replication
• Local tuning
  – Based on local workload
  – Physical design tuning

Data activation for files
• A large portion of the data is expected to be in files
• We need to develop query processors for data residing in files
• File activation: optimize access to the file based on the local workload
  – E.g., instantiate an index on file contents or materialize a relational view
• Local tuning is essential in this context
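
Illustration: file activation
The idea of file activation can be illustrated with a minimal Python sketch. It is not the Data Ring implementation. It assumes the file is delimited text with a header row, and that the local workload consists only of equality lookups on a column. The class name ActivatedFile, the threshold parameter, and the rule "build an index once a column has been looked up often enough" are all hypothetical choices made for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


class ActivatedFile:
    """Hypothetical sketch: a delimited file whose access paths follow the local workload."""

    def __init__(self, text, threshold=3):
        # Parse the file contents once; each row becomes a dict keyed by column name.
        self.rows = list(csv.DictReader(io.StringIO(text)))
        self.threshold = threshold  # assumed tuning rule: lookups on a column before indexing it
        self.lookups = Counter()    # local workload statistics, one counter per column
        self.indexes = {}           # column -> {value -> [row positions]}

    def select(self, column, value):
        """Return all rows whose `column` equals `value`."""
        self.lookups[column] += 1
        # Self-tuning step: activate the file by indexing a frequently queried column.
        if column not in self.indexes and self.lookups[column] >= self.threshold:
            self._build_index(column)
        if column in self.indexes:
            positions = self.indexes[column].get(value, [])
            return [self.rows[i] for i in positions]
        # No index yet: fall back to a full scan of the file.
        return [row for row in self.rows if row[column] == value]

    def _build_index(self, column):
        # Map each distinct value of the column to the positions of the rows holding it.
        index = defaultdict(list)
        for pos, row in enumerate(self.rows):
            index[row[column]].append(pos)
        self.indexes[column] = dict(index)
```

With threshold=2, the first select on a column scans the file, and the second one builds the index and answers from it. No user has to turn a "knob": the decision is driven by the local workload alone, which is the point of local tuning in this context.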