Download PPT

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

AppleTalk wikipedia , lookup

Backpressure routing wikipedia , lookup

Airborne Networking wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

List of wireless community networks by region wikipedia , lookup

Distributed operating system wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Everything2 wikipedia , lookup

CAN bus wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Kademlia wikipedia , lookup

Transcript
Other Structured P2P
Systems
CAN, BATON
Lecture 4
1
CAN
 A Scalable Content Addressable Network
2
CAN (content-addressable network)
 hash value is viewed as a point in a D-dimensional cartesian space
 each node responsible for a D-dimensional “cube” in the space
 nodes are neighbors if their cubes “touch” at more than just a point
(more formally, nodes s & t are neighbors when


s contains some
[<n1, n2, …, ni, …, nj, …, nD>, <n1, n2, …, mi, …, nj, … nD>]
and t contains
[<n1, n2, …, ni, …, nj+δ, …, nD>, <n1, n2, …, mi, …, nj+ δ, … nD>])
2
3
1
6
5
4
7
8
3
•
•
•
•
•
Example: D=2
1’s neighbors: 2,3,4,6
6’s neighbors: 1,2,4,5
Squares “wrap around”, e.g., 7 and 8 are neighbors
expected # neighbors: O(D)
CAN routing
 To get to <n1, n2, …, nD> from <m1, m2, …, mD>

choose a neighbor with smallest cartesian distance from
<m1, m2, …, mD> (e.g., measured from neighbor’s center)
•
2
3
1
6
5
4
7
8
X
4
•
•
•
•
e.g., region 1 needs to send to node covering
X
checks all neighbors, node 2 is closest
forwards message to node 2
Cartesian distance monotonically decreases
with each transmission
expected # overlay hops: (DN1/D)/4
CAN node insertion
 To join the CAN:




find some node in the CAN (via
bootstrap process)
choose a point in the space uniformly at
random
10
using CAN, inform the node that
currently covers the space
that node splits its space in half




5
1st split along 1st dimension
if last split along dimension i < D, next
split along i+1st dimension
e.g., for 2-d case, split on x-axis, then yaxis
keeps half the space and gives other half
to joining node
2
3
X
9
7
1
6
5
4
8
Observation: the likelihood of
a rectangle being selected is
proportional to it’s size, i.e.,
big rectangles chosen more
frequently
CAN node removal
4
 Underlying cube structure should remain
intact

2
5
3
1
6
1
6
i.e., if the spaces covered by s & t were not
formed by splitting a cube, then they should
not be merged together
 Sometimes, can simply collapse removed
node’s portion to form bigger rectangle

e.g., if 6 leaves, its portion goes back to 1
 Other times, requires juxtaposition of nodes’
areas of coverage



6
e.g., if 3 leaves, should merge back into
square formed by 2,4,5
cannot simply collapse 3’s space into 4 and/or
5
one solution: 5’s old space collapses into 2’s
space, 5 takes over 3’s space
2
44 2
5
35
CAN (recovery from) removal process
4
7
8
2
12
5
11
9
10
3
1
6
14
13
11
13
7
14
12
1
6
3
8
4
9
10 5
2
 View partitioning as a binary tree of



7
leaves represent regions covered by overlay nodes (labeled by node that
covers the region)
intermediate nodes represent “split” regions that could be “reformed”, i.e.,
a leaf can appear at that position
siblings are regions that can be merged together (forming the region that is
covered by their parent)
CAN (recovery from) removal process
4
3
2
5
1
6
14
13
11
X3
7
12
 Repair algorithm when leaf s is removed

Find a leaf node t that is either




1
6
s’s sibling
descendant of s’s sibling where t’s sibling is also a leaf node
8
4
9
10 5
t takes over s’s region (moves to s’s position on the tree)
t’s sibling takes over t’s previous region
 Distributed process in CAN to find appropriate t w/ sibling:


current (inappropriate) t sends msg into area that would be covered by a sibling
if sibling (same size region) is there, then done. Else receiving node becomes t &
repeat
8
2
BATON
 A Balanced Tree Structure for Peer-to-Peer Networks
9
Overlay Networks
 Add additional routing protocol on top of network routing
(i.e. TCP/IP)
 Need to be resilient to node failures, intermittent
connectivity

Redundancy in data and metadata
 Most popular is Distributed Hash Table (DHT)


Pastry
Chord
Chord Overview





Randomly assign each peer an ID between 0 and 2m
Hash key to find ID for an item
Next peer after ID (mod 2m)is responsible for storing item
Each peer has routing table to subsequent peers
Table is used for incremental routing
Chord Routing Example (m = 6)
Figures poached from Stoica et al., “Chord: A Scalable Peer-to-peer Lookup Service for Internet
Applications,” SIGCOMM 2001.
DHTs: The Good, the Bad and the Ugly
 Good



Efficient routing algorithms
Easy to understand
Simple replication methods
 Bad

Load balancing depends on good choice of hash function
 Ugly


Can only do exact ID lookups, since items are clustered in key
space
Would like to do range and prefix searches
Related Work
 P-Grid (CoopIS’01)
Based on a binary prefix tree structure.
 Can not guarantee log N search step boundary

 P-Tree (WebDB’04)
Based on B+-tree structure and uses CHORD as overlay framework
 Each node maintains a branch of the B+tree
 Expensive cost to keep consistence knowledge among nodes

 Multi-way tree (DBISP2P’04)
Each node maintains links to its parent, its siblings, its neighbors,
and its children
 Can not guarantee log N search step boundary

BATON Architecture
 BATON: BAlanced Tree Overlay Network
 Definition: A tree is balanced if and only if at any node in
the tree, the height of its two subtrees differ by at most one.
Binary Balanced Tree Index Architecture
Theorems
 Theorem 1: The tree is a balanced tree if every node in the
tree that has a child also has both its left and right routing
tables full (*).
 Theorem 2: If a node, say x, contains a link to another node,
say y, in its left or right routing tables, parent node of x must
also contain a link to parent node of y unless the same node
is parent of both x and y.
(*) A routing table is full if none of the valid links is NULL.
Node join
 Example: new node u joins the network
u
a
b
c
d
h
e
i
j
p
f
k
q
l
g
m
r
n
u
o
s
Node join
 Cost of finding a node to join: O(log N)
 When a node accepts a new node as its child




Split half of its content (its range of values) to its new child
Update adjacent links of itself and its new child
Notify both its neighbor nodes and its new child’s neighbor
nodes to update their knowledge
Cost: 6 log N
Node departure
 When a node wishes to leave the network

If it is a leaf node and there is no neighbor node having children, it
can leave the network





Transfer its content to the parent node, and update correspondence adjacent link
Notify its neighbor nodes and its parent’s neighbor nodes to update their
knowledge
Cost: 4 log N
If it is a leaf node and there is a neighbor node having children, it
needs to find a leaf node to replace it by sending a
FINDREPLACEMENTNODE request to a child of that neighbor
node
If it is an intermediate node, it needs to find a leaf node to replace it
by sending a FINDREPLACEMENTNODE to one of its adjacent
nodes
Node departure
 Example: existing node b leaves the network
a
br
c
d
h
p
e
i
j
q
f
k
l
r
g
m
s
n
o
t
u
Node departure
 Cost of finding a leaf node to replace: O(log N)
 When a node comes to replace a leave node



Notify its parent and its neighbor nodes as in case of leaf node
leaving: 4 log N
Notify its new parent node, its new neighbor nodes, and the parent’s
neighbor nodes: 4 log N
Total cost: 8 log N
Fault tolerance
 Node failure



Nodes discovering failure of a node report to that node’s parent.
The failure’s parent node will take responsibility for finding a leaf
node to replace if necessary.
Routing information of the failure node can be recovered by
contacting its neighbor nodes via routing information of its parent.
 Fault tolerance: failure node can be passed by two ways



Through routing tables (similar to CHORD) - horizontal axis
Through parent-child and adjacent links - vertical axis
Specifically, even if all nodes at the same level fail, the tree is not
partitioned
Network restructuring
 Necessary in case of forced join or forced leave that is used
in load balancing scheme
 Network restructuring is triggered when the condition in the
theorem 1 is violated
 Network restructuring is done by shifting nodes via adjacent
links


No data movement is required
Each shifted node requires O(log N) effort to update routing tables
Forced join
 Example 1: network restructure is triggered as a forced join
Forced leave
 Example 2: network restructuring is triggered as a forced leave
Index construction
 Each node is assigned a range of values
 The range of values directly managed by a node is


Greater than the range managed by its left adjacent node
Smaller than the range managed by its right adjacent node
Exact match query
 Example: node h wants to search data belonged to node c,
say 74
a
[45-51)
b
c
[12-17)
[72-75)
d
e
f
g
[5-8)
[23-29)
[54-61)
[81-85)
h
i
j
k
l
m
n
[0-5)
[8-12)
[17-23)
[34-39)
[51-54)
[61-68)
[75-81)
p
q
[29-34) [39-45)
r
[68-72)
o
[89-93)
s
t
[85-89) [93-100)
Range query
 Process similar to exact match query


First, find an intersection with searched range
Second, follow adjacent links to retrieve all results
 Cost


Exact match query: O(log N)
Range query: O(log N + X) where X is the total number of nodes
containing searched results
Data insertion and deletion
 Insertion

Follow the exact match query process to find the node where data
should be inserted except that


If it is the left most node and the inserted value is still less than its lower bound,
or if it is the right most node and the inserted value is still greater than its upper
bound, it expands its range of values to accept the new inserted value.
In this case, additional log N cost is needed for updating routing tables
 Deletion

Follow the exact match query process to find the node containing
data which should be deleted
 Cost: similar to exact match query process

O(log N) for both insertion and deletion
Load balancing
 Load balancing process is initialized when a node is
overloaded or under loaded due to insertion or deletion
 2 load balancing schemes


Do load balancing with adjacent nodes
An overloaded node finds a lightly loaded node to share work load
(only if the overloaded / under loaded node is a leaf node)




A lightly loaded node is found by traveling through neighbor nodes within
O(logN) steps.
Once found, the lightly loaded node transfers its content to one of its adjacent
nodes, forced leaves its current position, and forced joins as a child of the
overloaded node.
Network restructuring is triggered if necessary
Similar process is applied to under loaded nodes
 Cost: O(log N) for each node attending load balancing
process
Load balancing
 Example: node g is an overloaded node while node f is a
lightly loaded node
Experimental study
 Experimental setup




Test the network with different number of nodes N from 1000 to
10000.
For a network of size N, 1000 x N data values in the domain of [1,
1000000000) are inserted in batches
1000 exact queries, and 1000 range queries are executed
CHORD and Multi-way tree are used to compare
Join and leave operations
Cost of finding join node
and replacement node
Cost of updating
routing tables
Insert and delete operations
Cost of insert and delete operations
Search operations
Cost of exact match query
Cost of range query
Access load
Access load for nodes at different levels
Effect of load balancing
Average messages of load
balancing operation
Size of load
balancing process
Effect of network dynamics
Network Dynamics
Conclusion
 BATON


The first P2P overlay network based on a balanced tree structure
Strengths




Incur less cost of updating routing tables compared to other systems
Support both exact match query and range query efficiently
Flexible and efficient load balancing scheme
Scalability (NOT bounded by network size or ID space before hand)
Thank you
Q&A