Peer-to-Peer Structured Overlay Networks
Antonino Virgillito
Background
Peer-to-peer systems
• distribution
• symmetry (communication, node roles)
• decentralized control
• self-organization
• dynamicity
Data Lookup in P2P Systems
• Data items are spread over a large number of nodes
• Which node stores which data item?
• A lookup mechanism is needed
– Centralized directory -> bottleneck/single point of failure
– Query flooding -> scalability concerns
– Need more structure!
More Issues
• Organize, maintain overlay network
– node arrivals
– node failures
• Resource allocation/load balancing
• Resource location
• Network proximity routing
What is a Distributed Hash Table?
• Exactly that: a service, distributed over multiple machines, with hash table semantics
– put(key, value), value = get(key)
• Designed to work in a peer-to-peer (P2P) environment
• No central control
• Nodes under different administrative control
• But of course it can also operate in an “infrastructure” sense
What is a DHT?
• Hash table semantics: put(key, value), value = get(key)
• Key is a single flat string
• Limited semantics compared to keyword search
• put() causes a value to be stored at one (or more) peer(s)
• get() retrieves the value from a peer
• put() and get() are accomplished with unicast routed messages
• In other words, it scales
• Other API calls support the application, like notification when neighbors come and go
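To make these semantics concrete, here is a minimal, single-process sketch of the put()/get() interface above. The class name ToyDHT and the naive hash-modulo placement are illustrative assumptions; a real DHT replaces _peer_for with the routed lookup described in the following slides.

```python
import hashlib

class ToyDHT:
    """Sketch of hash-table semantics over a fixed set of peers (no network)."""

    def __init__(self, peers):
        self.peers = peers                      # peer "addresses"
        self.stores = [dict() for _ in peers]   # one local store per peer

    def _peer_for(self, key):
        # Every peer computes the same owner for a flat string key, so
        # put/get can be sent as unicast routed messages to that owner.
        # (Naive modulo placement; real DHTs use consistent hashing so
        # that a join/leave does not reshuffle every key.)
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return h % len(self.peers)

    def put(self, key, value):
        self.stores[self._peer_for(key)][key] = value

    def get(self, key):
        return self.stores[self._peer_for(key)].get(key)

dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("song.mp3", "held by peer-42")
assert dht.get("song.mp3") == "held by peer-42"
```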
Distributed Hash Tables (DHT)
[Figure: a P2P overlay network of nodes; key/value pairs (k1,v1) … (k6,v6) are stored at different nodes; operations: put(k,v), get(k)]
• p2p overlay maps keys to nodes
• completely decentralized and self-organizing
• robust, scalable
Popular DHTs
• Tapestry (Berkeley)
– Based on Plaxton trees (similar to hypercube routing)
– The first* DHT
– Complex and hard to maintain (hard to understand too!)
• CAN (ACIRI), Chord (MIT), and Pastry (Rice/MSR Cambridge)
– Second wave of DHTs (contemporary with, and independent of, each other)
DHT Basics
• Node IDs can be mapped to the hash key space
• Given a hash key as a “destination address”, you can route through the network to a given node
• Routing always reaches the same node, no matter where you start from
• Requires no centralized control (completely distributed)
• Small per-node state, independent of the number of nodes in the system (scalable)
• Nodes can route around failures (fault-tolerant)
Things to look at
• What is the structure?
• How does routing work in the structure?
• How does it deal with node joins and
departures (structure maintenance)?
• How does it scale?
• How does it deal with locality?
• What are the security issues?
The Chord Approach
• Consistent Hashing
• Logical Ring
• Finger Pointers
The Chord Protocol
• Provides:
– A mapping successor: key -> node
– To look up key K, go to node successor(K)
• successor is defined using consistent hashing:
– Keys hash
– Nodes hash
– Both keys and nodes hash to the same (circular) identifier space
– successor(K) = first node with hash ID equal to or greater than hash(K)
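A minimal sketch of the successor mapping, assuming node hash IDs are kept in a sorted list; bisect finds the first node ID >= hash(K), wrapping around to the smallest ID at the end of the circular space. The assertions mirror the ring example on the next slide.

```python
import bisect

def successor(node_ids, key_hash):
    """First node whose hash ID is equal to or greater than key_hash,
    wrapping around the circular identifier space."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_hash)
    return ids[i % len(ids)]

# Ring with nodes {0, 1, 3} and keys {1, 2, 6} (m = 3):
assert successor([0, 1, 3], 1) == 1
assert successor([0, 1, 3], 2) == 3
assert successor([0, 1, 3], 6) == 0   # wraps around the ring
```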
Example: The Logical Ring
Nodes 0, 1, 3; keys 1, 2, 6 (m = 3)
– successor(1) = 1, successor(2) = 3, successor(6) = 0 (wraps around the ring)
Consistent Hashing
[Karger et al. ‘97]
• Some Nice Properties:
– Smoothness: minimal key movement on node
join/leave
– Load Balancing: keys equitably distributed
over nodes
Mapping Details
• Range of the hash function
– Circular ID space, modulo 2^m
• Compute the 160-bit SHA-1 hash and truncate it to m bits
– Chance of collision is rare if m is large enough
• Deterministic, but hard for an adversary to subvert
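A sketch of this mapping, with m = 32 as an arbitrary illustrative choice; both node addresses and keys go through the same function, landing in the circular space modulo 2^m.

```python
import hashlib

M = 32  # illustrative choice of m

def chord_id(name: str, m: int = M) -> int:
    """160-bit SHA-1 hash of a name (node address or key),
    truncated to m bits, i.e. reduced modulo 2**m."""
    digest = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    return digest % (2 ** m)

node_id = chord_id("198.51.100.7:4000")  # node hash
key_id = chord_id("my-file.txt")         # key hash; same circular space
```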
Chord State
• Successor/Predecessor in the ring
• Finger pointers
– n.finger[i] = successor(n + 2^(i-1))
– Each node knows more about the portion of the circle close to it!
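A sketch of computing the finger table from this definition, over a global sorted list of node IDs (in real Chord each entry is found with a lookup, not a global list). The assertion reproduces node 1's table in the example ring.

```python
def finger_table(n, node_ids, m):
    """finger[i] = successor(n + 2**(i-1)) for i = 1..m,
    on the circular identifier space modulo 2**m."""
    ids = sorted(node_ids)
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)
        # first node ID >= start, wrapping around the ring
        table.append(next((x for x in ids if x >= start), ids[0]))
    return table

# Node 1 in the ring {0, 1, 3} with m = 3: successors of 2, 3, 5
assert finger_table(1, [0, 1, 3], 3) == [3, 3, 0]
```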
Example: Finger Tables
Chord: routing protocol
Notation: n.foo() stands for a remote call to node n.
– A set of nodes towards id is contacted remotely
– Each node is queried for the known node which is closest to id
– The process stops when a node is found whose successor is > id
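A sketch of this routing loop in local form; Node objects stand in for remote peers, and closest_preceding_finger plays the role of the remote query described above. The names and the fallback-to-successor step are illustrative.

```python
def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps past zero

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self   # maintained by join/stabilization
        self.fingers = []       # Node refs; fingers[i-1] ~ successor(id + 2**(i-1))

    def closest_preceding_finger(self, key_id):
        # farthest known node that still precedes the target id
        for f in reversed(self.fingers):
            if in_interval(f.id, self.id, key_id) and f.id != key_id:
                return f
        return self

    def find_successor(self, key_id):
        n = self
        # stop when key_id falls between n and n.successor on the ring
        while not in_interval(key_id, n.id, n.successor.id):
            nxt = n.closest_preceding_finger(key_id)   # remote call in real Chord
            n = nxt if nxt is not n else n.successor   # at worst, walk the ring
        return n.successor
```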
Example: Chord Routing
Finger Pointers for Node 1
Lookup Complexity
• With high probability: O(log(N))
• Proof intuition:
– Let p be the successor of the targeted key; the distance to p reduces by at least half in each step
– In m steps, we would reach p
– Stronger claim: in O(log(N)) steps, the distance is ≤ 2^m/N; thereafter even linear advance suffices to give O(log(N)) lookup complexity
Chord invariants
• Every key in the network can be located
as long as the following invariants are
preserved after joins and leaves:
– Each node’s successor is correctly maintained
– For every key k, node successor(k) is
responsible for k
Chord: Node Joins
• New node B learns of at least one existing node A via external means
• B asks A to look up its finger-table information
– Given that B’s hash-id is b, A does a lookup for B.finger[i] = successor(b + 2^(i-1)), unless the interval is already included in finger[i-1]
– B stores all finger information and sets up pred/succ pointers
Node Joins (contd.)
• Update the finger tables of existing nodes p such that:
1. p precedes b by at least 2^(i-1)
2. the i-th finger of node p succeeds b
– Start from p = predecessor(b - 2^(i-1)) and proceed in counter-clockwise direction while 2. holds
• Transferring keys (see the sketch below):
– Only from successor(b) to b
– Must send a notification to the application
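A sketch of the join-time finger initialization and the interval optimization mentioned above, reusing find_successor and in_interval from the earlier routing sketch. All names are illustrative, and the key transfer is only indicated by a comment.

```python
def join(b, a, m):
    """New node b bootstraps via existing node a (both Node objects)."""
    b.successor = a.find_successor(b.id)
    b.fingers = []
    for i in range(1, m + 1):
        start = (b.id + 2 ** (i - 1)) % (2 ** m)
        if b.fingers and in_interval(start, b.id, b.fingers[-1].id):
            # interval already covered by finger[i-1]: skip the lookup
            b.fingers.append(b.fingers[-1])
        else:
            b.fingers.append(a.find_successor(start))
    # keys in (predecessor(b), b] now move from successor(b) to b,
    # and the application is notified of the transfer
```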
Example: finger table update
Node 6 joins
Example: transferring keys
Node 1 leaves
Concurrent Joins/Leaves
• Need a stabilization protocol to guard against inconsistency
• Note:
– Incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure!
• Nodes periodically run the stabilization protocol:
– Find the successor’s predecessor
– Repair if this isn’t self
• This algorithm is also run at join
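A sketch of one stabilization round as described above, assuming Node also carries a predecessor field; notify is the companion call that lets the successor adopt a new predecessor. In real Chord both are periodic remote calls.

```python
def stabilize(n):
    """Find the successor's predecessor; repair if it isn't n itself."""
    x = n.successor.predecessor        # remote fetch in real Chord
    if (x is not None and x is not n.successor
            and in_interval(x.id, n.id, n.successor.id)):
        n.successor = x                # a new node slipped in between
    notify(n.successor, n)             # tell the (possibly new) successor about n

def notify(n, candidate):
    """n's possible predecessor says hello; adopt it if it is closer."""
    if n.predecessor is None or in_interval(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate
```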
Example: node 25 joins
Example: node 28 joins before 20 stabilizes (1)
Example: node 28 joins before 20 stabilizes (2)
CAN
• Virtual d-dimensional Cartesian coordinate system on a d-torus
– Example: 2-d [0,1] x [0,1]
• Dynamically partitioned among all nodes
• A pair (K,V) is stored by mapping key K to a point P in the space using a uniform hash function and storing (K,V) at the node in the zone containing P
• Retrieve entry (K,V) by applying the same hash function to map K to P and retrieving the entry from the node in the zone containing P
– If P is not contained in the zone of the requesting node or its neighboring zones, route the request to the neighbor node in the zone nearest P
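A sketch of the key-to-point mapping for d = 2; using SHA-1 with a per-dimension salt to get the coordinates is an illustrative assumption, not CAN's prescribed hash.

```python
import hashlib

def can_point(key: str, d: int = 2):
    """Map key K to a point P in the d-dimensional unit space,
    hashing once per dimension with a different salt."""
    coords = []
    for dim in range(d):
        h = int(hashlib.sha1(f"{dim}:{key}".encode()).hexdigest(), 16)
        coords.append((h % 10 ** 6) / 10 ** 6)   # coordinate in [0, 1)
    return tuple(coords)

# put(K, V): store (K, V) at the node whose zone contains can_point(K)
# get(K): recompute the same point and route the request toward it
print(can_point("my-file.txt"))
```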
Routing in a CAN
• Follow the straight-line path through the Cartesian space from source to destination coordinates
• Each node maintains a table of the IP address and virtual coordinate zone of each local neighbor
• Use greedy routing to the neighbor closest to the destination
• For a d-dimensional space partitioned into n equal zones, nodes maintain 2d neighbors
– Average routing path length: (d/4) * n^(1/d)
CAN Construction
• The joining node locates a bootstrap node using the CAN DNS entry
– The bootstrap node provides IP addresses of random member nodes
• The joining node sends a JOIN request to a random point P in the Cartesian space
• The node in the zone containing P splits the zone and allocates “half” to the joining node
• (K,V) pairs in the allocated “half” are transferred to the joining node
• The joining node learns its neighbor set from the previous zone occupant
– The previous zone occupant updates its neighbor set
Departure, Recovery and Maintenance
• Graceful departure: the node hands over its zone and the (K,V) pairs to a neighbor
• Network failure: unreachable node(s) trigger an immediate takeover algorithm that allocates the failed node’s zone to a neighbor
– Detect via lack of periodic refresh messages
– Neighbor nodes start a takeover timer, initialized in proportion to their own zone volume
– Send a TAKEOVER message containing the zone volume to all of the failed node’s neighbors
– If the received TAKEOVER volume is smaller, kill the timer; if not, reply with a TAKEOVER message
– Nodes agree on the live neighbor with the smallest volume (see the sketch below)
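A sketch of how the timer race resolves: since each neighbor's timer is proportional to its zone volume, the net effect (absent message loss) is that the smallest-volume live neighbor claims the zone, which can be computed directly. Names are illustrative.

```python
def zone_volume(zone):
    """Volume of a rectangular zone given as one (lo, hi) per dimension."""
    vol = 1.0
    for lo, hi in zone:
        vol *= hi - lo
    return vol

def takeover_winner(live_neighbors):
    """The neighbor whose takeover timer fires first: its TAKEOVER message
    (carrying its volume) silences all larger-volume contenders."""
    return min(live_neighbors, key=lambda n: zone_volume(n.zone))
```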
Pastry
Generic P2P location and routing substrate
• Self-organizing overlay network
• Lookup/insert an object in < log16 N routing steps (expected)
• O(log N) per-node state
• Network proximity routing
Pastry: Object distribution
• Consistent hashing: 128-bit circular id space (0 to 2^128 - 1)
• nodeIds and objIds are uniform random
• Invariant: the node with the numerically closest nodeId maintains the object
Pastry: Object insertion/lookup
• A message with key X is routed to the live node with the nodeId closest to X
• Problem: a complete routing table is not feasible
Pastry: Routing table (# 65a1fc)
Each entry in row l shares its first l digits with the node’s own id 65a1fc; the column matching the node’s own next digit is left empty, and “x” stands for an arbitrary suffix:

Row 0: 0x 1x 2x 3x 4x 5x __ 7x 8x 9x ax bx cx dx ex fx
Row 1: 60x 61x 62x 63x 64x __ 66x 67x 68x 69x 6ax 6bx 6cx 6dx 6ex 6fx
Row 2: 650x 651x 652x 653x 654x 655x 656x 657x 658x 659x __ 65bx 65cx 65dx 65ex 65fx
Row 3: 65a0x __ 65a2x 65a3x 65a4x 65a5x 65a6x 65a7x 65a8x 65a9x 65aax 65abx 65acx 65adx 65aex 65afx

(log16 N rows in total)
Pastry: Leaf sets
Each node maintains IP addresses of the
nodes with the L/2 numerically closest
larger and smaller nodeIds, respectively.
• routing efficiency/robustness
• fault detection (keep-alive)
• application-specific local coordination
Pastry: Routing procedure
if (destination D is within range of our leaf set)
    forward to the numerically closest member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit of D
    if (routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with D
        (b) is numerically closer to D than this node
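The procedure above, fleshed out as a runnable sketch over fixed-length hex IDs; table[l][d] stands for routing-table entry R at row l, column d (None if absent), and known for any other nodes this node happens to know. All of this scaffolding is assumed, not part of Pastry's API.

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def num_dist(a: str, b: str) -> int:
    return abs(int(a, 16) - int(b, 16))

def next_hop(node_id, dest, leaf_set, table, known):
    """One Pastry routing step: returns the ID to forward the message to."""
    # 1. Destination within the leaf set's range: one final hop.
    leaves = list(leaf_set) + [node_id]
    ids = sorted(int(x, 16) for x in leaves)
    if ids[0] <= int(dest, 16) <= ids[-1]:
        return min(leaves, key=lambda x: num_dist(x, dest))
    # 2. Routing table: match one more digit of dest.
    l = shared_prefix_len(node_id, dest)
    d = int(dest[l], 16)
    if table[l][d] is not None:
        return table[l][d]
    # 3. Rare case: any known node with (a) at least as long a shared
    #    prefix and (b) numerically closer to dest than this node.
    for cand in known:
        if (shared_prefix_len(cand, dest) >= l
                and num_dist(cand, dest) < num_dist(node_id, dest)):
            return cand
    return node_id   # no better candidate: this node is the key's root
```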
Pastry: Routing
[Figure: Route(d46a1c) starting at node 65a1fc; each hop (d13da3, d4213f, d462ba, ...) shares a progressively longer prefix with the key, until the message reaches the live node with nodeId closest to d46a1c (nearby nodes: d467c4, d471f1)]
Properties
• log16 N steps
• O(log N) state
Pastry: Performance
Integrity of overlay message delivery:
• guaranteed unless L/2 simultaneous failures of nodes with adjacent nodeIds
Number of routing hops:
• No failures: < log16 N expected, 128/b + 1 max
• During failure recovery:
– O(N) worst case, average case much better
Pastry Join
• X = new node, A = bootstrap node, Z = node with nodeId numerically closest to X’s
• A finds Z for X
• In the process, A, Z, and all nodes on the path send their state tables to X
• X settles on its own table
– Possibly after contacting other nodes
• X tells everyone who needs to know about itself
Pastry Leave
• Noticed by leaf-set neighbors when the leaving node doesn’t respond
– Neighbors ask the highest and lowest nodes in their leaf set for a new leaf set
• Noticed by routing neighbors when a message forward fails
– They can immediately route to another neighbor
– Fix the entry by asking another neighbor in the same “row” for its neighbor
– If this fails, ask somebody a level up