AmbientDB
Relational Query Processing in a P2P Network
Peter Boncz and Caspar Treijtel
LEE BYUNGIL
PL Lab.
Hongik University
2004.11.14
Outline
1. Introduction
1.1 Goal
1.2 Assumptions
1.3 Example: Collaborative Filtering in a P2P Database
1.4 Overview
2. AmbientDB Architecture
2.1 Data Model
2.2 Query Execution in AmbientDB
2.3 Dataflow Execution
2.4 Executing the Collaborative Filtering Query
3. DHTs in AmbientDB
3.1 Example: Approximated Collaborative Filtering
4. Conclusion
2
1. Introduction (1)
 AmbientDB
 A new peer-to-peer (P2P) DBMS prototype
 Developed at CWI (Centrum voor Wiskunde en Informatica)
 Distributed over an ad-hoc P2P network
 Global query algebra
 Translated into multi-wave stream processing plans
 Ambient Intelligence (AmI)
 Digital environments in which multimedia services are sensitive
to people’s needs
3
Music Playlist Scenario
 amP2P player
 Log - meta-information
 Homogeneous
 Content - AmbientDB instance,
or external sources
 Heterogeneous
 AmbientDB
 Its collection
 Only Meta-information
4
1.1 Goal
Full relational database functionality
Cooperate in an ad-hoc way with other AmbientDB devices
Propose
 A general architecture for AmbientDB
 Complex query processing in ad-hoc P2P network
5
1.2 Assumptions (1)
 Upscaling (flexibility)
 The number of cooperating devices is potentially large
 Home environment and ad-hoc P2P network
 Downscaling
 Devices often have few resources (CPU, memory, network, battery)
 Schema integration
 All devices operate under a common global schema
 Data placement
 Data placement is determined by user
 Network failure
 Resilience of Chord
 While a query runs, the routing tree stays intact
6
Chord
7
1.2 Assumptions (2)
Distributed database
 Assumes a priori knowledge of the participating sites
 Not available in AmbientDB
Federated database
 Static integration of heterogeneous schemas
Mobile database
 Centralized database server and client (mobile node)
P2P file sharing system
 Non-centralized and ad-hoc topologies
 Simple keyword text search
8
Example Music Schema
 The global schema “AMP2P”
in AmbientDB
 distributed table
 On the global level
 The union of all horizontal
fragments of these tables
9
1.3 Example : Collaborative Filtering in a P2P Database (1)
amP2P player
 Access to a local content repository (digital music collection)
 AmbientDB instance
 Share all music content in the “home zone”
 Only share the meta-information in the huge P2P network
10
1.3 Example : Collaborative Filtering in a P2P Database (2)
Memory-based implicit voting scheme
 Predicted vote for the active user for item j
 v_{i,j} = the vote of user i on item j
 w(a,i) = weight function defined on the active user a and user i
 \bar{v}_i = average vote of user i
 k = normalizing factor
weight(user_a, user_i)
 Times the example song has been fully played by user i
Refined form
 Negative information – skipped
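The slide's equation image is missing from the transcript. The quantities listed above match the usual memory-based collaborative filtering prediction, so the following is a reconstruction under that assumption. The symbols p_{a,j} (predicted vote of active user a for item j) and n (number of users with a nonzero weight) are introduced here for illustration:

```latex
p_{a,j} = \bar{v}_a + k \sum_{i=1}^{n} w(a,i)\,\bigl(v_{i,j} - \bar{v}_i\bigr)
```

In this form k is typically chosen so that the absolute values of the weights sum to one.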
11
Collaborative Filtering Query in SQL
12
1.4 Overview
General architecture
 Include Data model
Query execution
 Three-level query execution process
DHT (Distributed Hash Table)
 Global table indices
Optimize the query
Related work & future work
Conclusion
13
AmbientDB Architecture
14
2. AmbientDB Architecture
 Distributed Query processor
 Execute query on all ad-hoc connected devices
 P2P protocol
 Chord
 Scalable lookup and routing scheme
 P2P IP overlay networks made out of unreliable connections
 Query node = root
 A small number of connections per node
 Simultaneous bi-directional communication and query processing
 DHTs – global table indices
 Local DB component
 Local table
 Embedded database
 External data source – wrapper component (distributed database system)
 Schema integration engine
 Meta-data translation
 Using view-based schema mappings
15
AmbientDB Routing Tree Using IP Overlay
16
2.1 Data Model (1)
Standard relational data model and algebra as query language
Queries are formulated against global tables
Answered from the local node, a limited set of nodes, or all reachable nodes
Converging answer
 Query locally
 Re-issue iteratively over more nodes
17
2.1 Data Model (2)
Abstract Table
 LT (Local Table)
 Each node has private schema
 Global schema – global table T
 All participating nodes Ni carry a table instance Ti
 In query node
 Ti may be accessed as a LT
 DT (Distributed Table)
 Q: the set of nodes that participate in some global query
 The union of the local table instances Ti at all nodes in Q
18
2.1 Data Model (3)
 PT (Partitioned Table)
 Specialization of the DT
 All participating tuples in each Ti are disjoint between all nodes
 Advantage over DT
 Exact query answers can often be computed in an efficient distributed fashion
 By broadcasting a query and letting each node compute a local result without
need for communication
 Attaching a bitmap index Ti.Q to each local table Ti
“virtual” column
 #NODEID
 Lets users see at which node each tuple stored in a DT/PT is located
 Location-specific query restrictions
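The relationship between LT, DT and PT can be illustrated with a small in-memory model. This is only a sketch: the class and function names are hypothetical, and a Python list of booleans stands in for the participation bitmap Ti.Q.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTable:
    """One table instance T_i held by a single node (an LT when read locally)."""
    node_id: str
    rows: list
    # participates[k] is True if rows[k] counts toward the partitioned table.
    participates: list = field(default_factory=list)


def distributed_table(instances):
    """DT: union of all local instances; duplicates across nodes are kept.
    The first element of each pair plays the role of the #NODEID column."""
    return [(t.node_id, row) for t in instances for row in t.rows]


def partitioned_table(instances):
    """PT: only rows whose bitmap bit is set, so fragments are disjoint."""
    return [(t.node_id, row)
            for t in instances
            for row, bit in zip(t.rows, t.participates) if bit]


n1 = LocalTable("n1", ["songA", "songB"], [True, True])
n2 = LocalTable("n2", ["songB", "songC"], [False, True])
dt = distributed_table([n1, n2])   # songB appears twice
pt = partitioned_table([n1, n2])   # songB appears once, at n1
```

A location-specific restriction then amounts to filtering on the first element of each pair.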
19
LT, DT and PT
20
2.2 Query Execution in AmbientDB (1)
 Three level translation
 Abstract level
 User query
 Selection, join, aggregation, sort
 Lists
 (List<Type>)
 List instances
 <a,b,c>
 Concrete level
 Table parameters, return value
 Partition, union
 Execution level
 Wave-plans
21
The Abstract Global Algebra
22
The Concrete Global Algebra
23
2.2 Query Execution in AmbientDB (2)
Starting at the leaves
 Abstract query plan -> concrete
 Concrete operators have a concrete result type
 The process continues to the root of the query graph
 Local result table, hence LT
Local concrete variant of all abstract operators
 All tables -> LT
Concrete union
 (T1)-> LT
 More efficient alternative query plans
24
2.2 Query Execution in AmbientDB (3)
select, aggr and order support distributed execution (dist)
 Executed at all nodes on their local partition (LT) of a PT or DT
 Produces a distributed result again (PT or DT)
 Broadcast the query through the routing tree
 The result is again dispersed over all nodes as a PT or DT
aggr^merge = aggr^local(union^merge(DT)) : LT
 Reduce the fragments to be collected in the query node
 Save considerable bandwidth
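One way to picture the bandwidth saving is to let every node aggregate its own fragment and combine partial results on the way up the routing tree, so that only small aggregates reach the query node. The sketch below assumes a count aggregate and a routing tree given as a parent-to-children dictionary; the function names are illustrative and not AmbientDB's API.

```python
from collections import Counter


def aggr_local(rows):
    """Count log rows per song on a single node."""
    return Counter(song for song, _user in rows)


def aggr_merge(node, children, local_rows):
    """Return the aggregate for the subtree rooted at `node`.

    Each node adds its children's partial counts to its own before
    passing one combined result to its parent.
    """
    total = aggr_local(local_rows.get(node, []))
    for child in children.get(node, []):
        total += aggr_merge(child, children, local_rows)
    return total


# Hypothetical routing tree: root -> a, b and a -> c.
children = {"root": ["a", "b"], "a": ["c"]}
local_rows = {
    "root": [("s1", "u1")],
    "a": [("s1", "u2"), ("s2", "u2")],
    "b": [],
    "c": [("s2", "u3")],
}
result = aggr_merge("root", children, local_rows)
```

Each edge of the tree carries at most one entry per distinct song instead of every log row.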
25
2.2 Query Execution in AmbientDB (4)
 join variants
 Broadcast join
(LT, T1)->T1
 Foreign-key join
(T1,DT)->T1
 Referential integrity to minimize communication
 Split join
(LT1,T1)->T1
 Reduce bandwidth consumption
 O(T*N) -> O(T*log(N))
 partition
 A special operator that performs duplicate ("double") elimination
 Create a PT from a DT by creating a tuple participation bitmap at all
nodes
 To be able to use the dist operators
 We should convert a DT to a PT
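The effect of partition can be sketched as follows: every distinct tuple keeps its participation bit at exactly one node and loses it everywhere else. This is a centralized simulation for illustration only; how the real operator coordinates the nodes is not described on the slide, and the rule "first node in sorted order wins" is an assumption.

```python
def partition(instances):
    """Build participation bitmaps that turn a DT into a PT.

    instances: dict mapping node id -> list of tuples (the DT fragments).
    Returns a dict mapping node id -> list of bools, one per tuple.
    """
    seen = set()
    bitmaps = {}
    for node_id in sorted(instances):      # assumed tie-break: lowest node id wins
        bits = []
        for row in instances[node_id]:
            bits.append(row not in seen)   # first occurrence participates
            seen.add(row)
        bitmaps[node_id] = bits
    return bitmaps


bitmaps = partition({"n1": ["a", "b"], "n2": ["b", "c"]})
```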
26
Mappings
27
2.3 Dataflow Execution (1)
 Query processing paradigm
 Routing tree using TCP connections is used to pass bidirectional tuple streams
 Multiple simultaneous such waves (upward and downward)
 Third translation phase
 Concrete query plan -> wave-plans
 Each concrete operator maps to one or more waves (local dataflow algebra operators)
28
2.3 Dataflow Execution (2)
 dist plans for select, aggr, order and foreign-key join
 buffer-to-buffer local operator in each node, without further
communication
 broadcast join
 Propagates a tuple wave through the network
 split
 Split(<true,true>,<c1,c1>)
 Ordered -> effectively forming a DT/PT
 scan-select, quick-sort, merge-join, heap-based top-N,
ordered aggregation
 All stream-based
 Require little memory
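As an illustration of why a stream-based operator needs little memory, here is a minimal heap-based top-N over a tuple stream using Python's heapq. It keeps at most n entries no matter how long the stream is. The function name and the key parameter are choices made for this sketch.

```python
import heapq


def top_n(stream, n, key=lambda t: t):
    """Return the n tuples with the largest key, largest first."""
    if n <= 0:
        return []
    heap = []  # min-heap of (key, sequence number, tuple)
    for seq, item in enumerate(stream):
        entry = (key(item), seq, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New tuple beats the smallest kept one: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _key, _seq, item in sorted(heap, reverse=True)]


plays = iter([("s1", 3), ("s2", 9), ("s3", 5), ("s4", 1)])
best = top_n(plays, 2, key=lambda t: t[1])
```

The sequence number only breaks ties, so the tuples themselves never have to be comparable.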
29
The Dataflow Algebra
30
2.4 Executing the Collaborative Filtering Query (1)
31
2.4 Executing the Collaborative Filtering Query (2)
32
2.4 Executing the Collaborative Filtering Query (3)
Problems
 Query 1
 Large list of all users that have ever listened to the example song
 Hog resources from all nodes in the network
 Query 2
 Basically sends all log records to the query node for aggregation
Both run more efficiently in an AmbientDB enriched with DHTs
33
3. DHTs in AmbientDB (1)
Useful lookup structures for large-scale P2P
applications
Reduce the number of nodes involved in answering a query
 Involving many nodes
 Decrease query performance
 Create an overload in the average query frequency
Gnutella (does not use DHTs or global indices)
 Easy to locate popular music
 Difficult to locate less well-known songs
34
3. DHTs in AmbientDB (2)
DHT indices let the query optimizer automatically accelerate selection (lookup) queries
A DHT is a special form of a PT, as its partitions are disjoint
selectchord(DHT):LT
 Dataflow level
 Route a message to the Chord finger on which the selection key-value
hashes
 Retrieving all corresponding tuples as an LT via a direct TCP/IP transfer
Non-complete index
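The lookup idea behind selectchord can be sketched with a simplified ring: node names and keys are hashed onto the same identifier space, and a key is owned by the first node clockwise from its hash. Real Chord reaches that node in O(log N) hops through finger tables, which are omitted here. The 16-bit ring size and all function names are assumptions for illustration.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed, deliberately small identifier space


def ring_id(value):
    """Hash a node name or key onto the ring."""
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(node_ids, key_id):
    """First node id at or after key_id, wrapping around the ring.
    node_ids must be sorted."""
    idx = bisect_left(node_ids, key_id)
    return node_ids[idx % len(node_ids)]


def build_index(node_names, rows, key_of):
    """Place every row at the node that owns the hash of its key."""
    node_ids = sorted(ring_id(name) for name in node_names)
    index = {nid: {} for nid in node_ids}
    for row in rows:
        key = key_of(row)
        owner = successor(node_ids, ring_id(key))
        index[owner].setdefault(key, []).append(row)
    return node_ids, index


def select_chord(index, node_ids, key):
    """Fetch all tuples for one key from the single node that owns it."""
    owner = successor(node_ids, ring_id(key))
    return index[owner].get(key, [])


log = [("song42", "u1"), ("song42", "u2"), ("song7", "u3")]
node_ids, index = build_index(["n1", "n2", "n3"], log, key_of=lambda r: r[0])
hits = select_chord(index, node_ids, "song42")
```

Only one node is contacted per selection key, which is what keeps the rest of the network out of the query.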
35
DT and DHT in AmbientDB
36
3.1 Example: Approximated Collaborative Filtering (1)
 HISTO
 Static histogram of fully listened-to songs per user
 Reduces the histogram computation cost of the query
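A precomputed histogram of this kind could be built from the play log roughly as follows. The log layout (user, song, fully_played) is a hypothetical schema for this sketch, not the AMP2P schema from the slides.

```python
from collections import Counter


def build_histo(log_rows):
    """Count fully played songs per (user, song) pair.

    log_rows: iterable of (user, song, fully_played) tuples.
    """
    return Counter((user, song)
                   for user, song, fully_played in log_rows
                   if fully_played)


log = [("u1", "s1", True), ("u1", "s1", True),
       ("u1", "s2", False), ("u2", "s1", True)]
histo = build_histo(log)
```

A query can then read these counts directly instead of re-aggregating the raw log each time.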
37
Optimized collaborative filtering query in SQL
38
3.1 Example: Approximated Collaborative Filtering (2)
39
3.1 Example: Approximated Collaborative Filtering (3)
40
Network Bandwidth Compared
41
4. Conclusion
Full query processing architecture
 Executing queries in a declarative, optimizable language, over
an ad-hoc P2P network
DHT
 Efficient global indices
42