Introduction to
Structured Overlay
Networks
Seif Haridi
KTH/SICS
5/25/2017
Presentation Overview
Gentle introduction to Structured Overlay Networks and Distributed Hash Tables
General use of SONs and DHTs
Chord algorithms and others
EP2400 Network Algorithms, Seif Haridi
What’s a Distributed Hash Table (DHT)?
An ordinary hash table, which is distributed

Key         Value
Alexander   Berlin
Ali         Stockholm
Marina      Gothenburg
Peter       Louvain la neuve
Seif        Stockholm
Stefan      Stockholm

Every node provides a lookup operation
  Provide the value associated with a key
Nodes keep routing pointers
  If item not found, route to another node
So what?
Characteristic properties
  Scalability
    Number of nodes can be huge
    Number of items can be huge
  Self-management in presence of joins/leaves/failures
    Routing information
    Data items
Time to find data is logarithmic
Size of routing tables is logarithmic
  Example: log2(1000000)≈20
  EFFICIENT!
Store number of items proportional to number of nodes
  Typically: with D items and n nodes
    Store D/n items per node
    Move D/n items when nodes join/leave/fail
  EFFICIENT!
Self-management of routing info:
  Ensure routing information is up-to-date
Self-management of items:
  Ensure that data is always replicated and available
Traditional Motivation (1/2)
Peer-to-Peer file sharing very popular
Napster (central index)
  Completely centralized
  Central server knows who has what
  Judicial problems
Gnutella (decentralized index)
  Completely decentralized
  Ask everyone you know to find data
  Very inefficient
Traditional Motivation (2/2)
Grand vision of DHTs
Provide efficient file sharing
Quote from Chord: “In particular, [Chord] can help avoid single points of failure or control that systems like Napster possess, and the lack of scalability that systems like Gnutella display because of their widespread use of broadcasts.” [Stoica et al. 2001]
Hidden assumptions
Millions of unreliable nodes
User can switch off computer any time (leave=failure)
Extreme dynamism (nodes joining/leaving/failing)
Heterogeneity of computers and latencies
Untrusted nodes
Motivation: DHT overlay as communication infrastructure
Internet communication
  IP/port, TCP and UDP
Not suited for 21st century computing
  Firewalls
  NATs
  Changing IP addresses
Name based communication
DHTs can overcome these
How?
  Use the DHT
  Map names to locations
  Bypass firewalls and NATs by routing through neighbors

Key       Value
seif      130.237.32.51
babak     193.10.64.99
mihhail   18.7.22.83
olle      128.178.50.12
…         …

[Figure: the table above is distributed over nodes A, B, C and D]
Name based communication
What about group communication?
  IP Multicast is not enabled on the Internet
  Use the overlay to broadcast to all nodes
  Create multiple groups, broadcast within each
[Figure: two copies of the name-to-address table (seif, babak, mihhail, olle, …), each distributed over nodes A, B, C and D]
What’s it good for?
Let’s look at 10 applications built using such systems
Distributed Authorization
Defense project at SICS, Swedish Institute of Computer Science
Store certificates in the directory
No central server
Survives even if nodes are attacked
[Figure: directory table (seif, babak, mihhail, olle → IP addresses) distributed over nodes A, B, C and D]
Distributed Backup
Setup
  Clients install the backup tool
  Decide on amount of space to share
  Choose files for backup
Regular backup
  Data is encrypted
  Stored in the directory
[Figure: directory table (seif, babak, mihhail, olle → IP addresses) distributed over nodes A, B, C and D]
Distributed File System
Similar to AFS and NFS
  Files stored in directory
What is new?
  Application logic self-managed
    Add/remove servers on the fly
    Automatically handles failures
    Automatically load-balances
  No manual configuration needed
[Figure: directory table (seif, babak, mihhail, olle → IP addresses) distributed over nodes A, B, C and D]
P2P Cache
A distributed cache
  Every node in an organization runs a client
Want to browse a web page?
  If exists locally -> download it from a peer
  Otherwise, fetch and cache
No central proxy needed
[Figure: directory table (seif, babak, mihhail, olle → IP addresses) distributed over nodes A, B, C and D]
P2P Web Servers
Distributed Web Server
  Pages stored in the directory
What is new?
  Application logic self-managed
    Automatically load-balances
    Add/remove servers on the fly
    Automatically handles failures
[Figure: directory table (seif, babak, mihhail, olle → IP addresses) distributed over nodes A, B, C and D]
P2P SIP
Session Initiation Protocol
  Used to initiate calls on the Internet
  Is being standardized
Use the directory to find end-hosts
Improving Skype
[Figure: directory table (seif, babak, mihhail, olle → IP addresses) distributed over nodes A, B, C and D]
Host Identity Payload (HIP)
Uses the directory to provide seamless mobility
Unlike Mobile IP
  No home agent needed
  Self-managing
PIER (databases)
A relational view of the directory
Use SQL to fetch data
  Standard operations (projection, selection, equijoin)
Summary
DHT is a useful data structure
Assumptions mentioned might not be true
  Moderate amount of dynamism
  Leave not same thing as failure
  Dedicated servers
  Nodes can be trusted
  Less heterogeneity
Chord as Example of DHT
How to construct a DHT (Chord)?
Use a logical name space, called the identifier space, consisting of identifiers {0,1,2,…, N-1}
Identifier space is a logical ring modulo N
Every node picks a random identifier through a hash function H
Example:
  Space N=16 {0,…,15}
  Five nodes a, b, c, d, e
    a picks 6
    b picks 5
    c picks 0
    d picks 11
    e picks 2
[Figure: identifier ring with positions 0 to 15]
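One way a node could obtain its identifier is to hash something unique to it and reduce the result modulo N. The sketch below assumes SHA-1 as the hash H and a name string as input; both are illustrative choices, not something the slides specify, and the function name chord_id is hypothetical. It will not reproduce the picks in the example above (a picks 6, and so on), which are only illustrative values.

```python
import hashlib

def chord_id(name: str, space_size: int = 16) -> int:
    """Map a name to an identifier in {0, ..., space_size - 1}.

    Assumption: SHA-1 stands in for the hash function H.
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % space_size

# Identifiers for five hypothetical node names on a ring with N = 16.
for node_name in ["a", "b", "c", "d", "e"]:
    print(node_name, chord_id(node_name))
```

With only 16 identifiers, collisions between nodes are likely; a real deployment would use a much larger identifier space so that collisions become improbable.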
Definition of Successor
The successor of an identifier is the first node met going in clockwise direction starting at the identifier
Example
  succ(12)=14
  succ(15)=2
  succ(6)=6
[Figure: identifier ring with positions 0 to 15]
Where to store data (Chord)?
Use globally known hash function, H
Each item <key,value> gets identifier H(key) = k
Example
  H(“Marina”)=12
  H(“Peter”)=2
  H(“Seif”)=9
  H(“Stefan”)=14
Store each item at its successor
  The successor node of k is responsible for item k
Store number of items proportional to number of nodes
  Typically: with D items and n nodes
    Store D/n items per node
    Move D/n items when nodes join/leave/fail
  EFFICIENT!

Key         Value
Alexander   Berlin
Marina      Gothenburg
Peter       Louvain la neuve
Seif        Stockholm
Stefan      Stockholm

[Figure: identifier ring with positions 0 to 15]
Where to point (Chord) ?
Each node points to its successor
  The successor of a node n is succ(n+1)
  Known as a node’s succ pointer
Each node points to its predecessor
  First node met in anti-clockwise direction starting at n-1
  Known as a node’s pred pointer
Example
  0’s successor is succ(1)=2
  2’s successor is succ(3)=5
  5’s successor is succ(6)=6
  6’s successor is succ(7)=11
  11’s successor is succ(12)=0
[Figure: identifier ring with positions 0 to 15]
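The pointer definitions above can be checked with a few lines of Python. This is a minimal sketch that assumes global knowledge of the node set {0, 2, 5, 6, 11} on a ring of size 16 (the nodes used in the example); a real node only knows its own pointers. The helper names succ, pred, succ_ptr and pred_ptr are chosen for this sketch.

```python
from bisect import bisect_left, bisect_right

N = 16                            # size of the identifier space
NODES = sorted([0, 2, 5, 6, 11])  # node identifiers from the example

def succ(ident: int) -> int:
    """First node met going clockwise, starting at ident itself."""
    i = bisect_left(NODES, ident % N)
    return NODES[i % len(NODES)]  # wrap around past the largest node

def pred(ident: int) -> int:
    """First node met going anti-clockwise, starting at ident itself."""
    i = bisect_right(NODES, ident % N) - 1
    return NODES[i]               # i == -1 wraps to the largest node

# A node's succ pointer is succ(n+1); its pred pointer starts the search at n-1.
succ_ptr = {n: succ(n + 1) for n in NODES}
pred_ptr = {n: pred(n - 1) for n in NODES}

print(succ_ptr)  # {0: 2, 2: 5, 5: 6, 6: 11, 11: 0}
print(pred_ptr)  # {0: 11, 2: 0, 5: 2, 6: 5, 11: 6}
```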
DHT Lookup
To lookup a key k
  Calculate H(k)
  Follow succ pointers until item k is found
Example
  Lookup “Seif” at node 2
  H(“Seif”)=9
  Traverse nodes: 2, 5, 6, 11 (BINGO)
  Return “Stockholm” to initiator

Key         Value
Alexander   Berlin
Marina      Gothenburg
Peter       Louvain la neuve
Seif        Stockholm
Stefan      Stockholm

[Figure: identifier ring with positions 0 to 15]
DHT Lookup
(a, b] denotes the segment of the ring moving clockwise from but not including a until and including b
n.foo(.) denotes an RPC of foo(.) to node n
n.bar denotes an RPC to fetch the value of the variable bar in node n
We call the process of finding the successor of an id a LOOKUP

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if predecessor ≠ nil and id ∈ (predecessor, n] then
    return n
  else if id ∈ (n, successor] then
    return successor
  else // forward the query around the circle
    return successor.findSuccessor(id)
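The pseudocode above translates almost line by line into Python. The sketch below replaces RPCs with ordinary method calls inside one process and wires up a static ring with the nodes {0, 2, 5, 6, 11} from the earlier example; the names in_half_open and build_ring are helpers introduced here, not part of Chord.

```python
def in_half_open(x, a, b, size=16):
    """True if x lies in the ring segment (a, b], moving clockwise from a."""
    x, a, b = x % size, a % size, b % size
    if a < b:
        return a < x <= b
    return x > a or x <= b  # segment wraps past 0; a == b covers the whole ring

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def find_successor(self, ident):
        # This node itself is responsible for ident.
        if self.predecessor is not None and in_half_open(
                ident, self.predecessor.id, self.id):
            return self
        # The immediate successor is responsible for ident.
        if in_half_open(ident, self.id, self.successor.id):
            return self.successor
        # Otherwise forward the query around the circle.
        return self.successor.find_successor(ident)

def build_ring(ids):
    """Wire up a correct ring; stands in for join and stabilization."""
    nodes = [Node(i) for i in sorted(ids)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
        node.predecessor = nodes[i - 1]
    return {n.id: n for n in nodes}

ring = build_ring([0, 2, 5, 6, 11])
print(ring[2].find_successor(9).id)    # 11, as in the "Seif" lookup example
print(ring[11].find_successor(12).id)  # 0
```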
DHT Lookup and Update
// ask node n to store or fetch an item via the successor of id
procedure n.put(id,value)
  s = findSuccessor(id)
  s.store(id,value)

procedure n.get(id)
  s = findSuccessor(id)
  return s.retrieve(id)

PUT and GET are nothing but lookups!!
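As a rough illustration of "PUT and GET are nothing but lookups", the following sketch stores items in a per-node dictionary and routes both operations through find_successor. It runs in a single process with only successor pointers, uses the identifiers from the earlier example (H("Seif")=9, H("Marina")=12), and the attribute name items is an assumption of this sketch.

```python
class Node:
    def __init__(self, ident, size=16):
        self.id, self.size = ident, size
        self.successor = self
        self.items = {}  # local storage of this node

    def _successor_owns(self, ident):
        """True if ident lies in (self.id, successor.id] on the ring."""
        a, b, x = self.id, self.successor.id, ident % self.size
        return (a < x <= b) if a < b else (x > a or x <= b)

    def find_successor(self, ident):
        node = self
        while not node._successor_owns(ident):
            node = node.successor  # follow succ pointers around the ring
        return node.successor

    def put(self, ident, value):
        self.find_successor(ident).items[ident] = value

    def get(self, ident):
        return self.find_successor(ident).items.get(ident)

# Static ring with the nodes from the example.
ids = [0, 2, 5, 6, 11]
ring = {i: Node(i) for i in ids}
for pos, i in enumerate(ids):
    ring[i].successor = ring[ids[(pos + 1) % len(ids)]]

ring[2].put(9, "Stockholm")    # H("Seif") = 9 is stored at node 11
ring[6].put(12, "Gothenburg")  # H("Marina") = 12 is stored at node 0
print(ring[11].items, ring[0].items)
print(ring[5].get(9))          # Stockholm
```

Any node can serve as the entry point: the put issued at node 2 and the get issued at node 5 both resolve to node 11.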
Speeding up lookups
If only pointer to succ(n+1) is used
  Worst case lookup time is N, for N nodes
Improving lookup time (finger/routing table)
  Point to succ(n+1)
  Point to succ(n+2)
  Point to succ(n+4)
  Point to succ(n+8)
  …
  Point to succ(n+2^(M-1))
Distance always halved to the destination
Time to find data is logarithmic
Size of routing tables is logarithmic
  Example: log2(1000000)≈20
  EFFICIENT!
[Figure: identifier ring with positions 0 to 15]
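To make the finger targets concrete, the sketch below computes succ(n + 2^(i-1)) for i = 1..M on the example ring (nodes {0, 2, 5, 6, 11}, M = 4). It assumes global knowledge of the node set, which a real node does not have; the function name finger_table is chosen for this sketch.

```python
import math
from bisect import bisect_left

def finger_table(n, nodes, m):
    """Return [succ(n + 2^(i-1)) for i = 1..m] on a ring of size 2^m."""
    size = 2 ** m
    ring = sorted(nodes)

    def succ(ident):
        # First node at or after ident, wrapping to the smallest node.
        return ring[bisect_left(ring, ident % size) % len(ring)]

    return [succ(n + 2 ** (i - 1)) for i in range(1, m + 1)]

print(finger_table(0, [0, 2, 5, 6, 11], 4))   # [2, 2, 5, 11]
print(finger_table(11, [0, 2, 5, 6, 11], 4))  # [0, 0, 0, 5]
print(math.ceil(math.log2(1_000_000)))        # 20
```

The last line reproduces the slide's estimate: about 20 routing entries are enough for an identifier space of a million.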
Chord – Routing (1/7)
Routing table size: M, where N = 2^M
Every node n knows successor(n + 2^(i-1)), for i = 1..M
Routing entries = log2(N)
log2(N) hops from any node to any other node
[Figure: identifier ring with positions 0 to 15 and a Get(15) request]
Chord – Routing (2/7)
[Figure: identifier ring with positions 0 to 15 and the Get(15) request; the slide text repeats Chord – Routing (1/7)]
Chord – Routing (3/7)
[Figure: identifier ring with positions 0 to 15 and the Get(15) request; the slide text repeats Chord – Routing (1/7)]
Chord – Routing (4/7)
From node 1, only 2 hops to node 0 where item 15 is stored
For an id space of size 16, the maximum is log2(16) = 4 hops between any two nodes
In fact, if nodes are uniformly distributed, the maximum is log2(# of nodes), i.e. log2(8) = 3 hops between any two nodes
The average complexity is: ½ log2(#nodes)
[Figure: identifier ring with positions 0 to 15 and the Get(15) request]
Chord – Routing (5/7)
Pseudo code findSuccessor(.)

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if predecessor ≠ nil and id ∈ (predecessor, n] then return n
  if id ∈ (n, successor] then
    return successor
  else
    n’ := closestPrecedingNode(id)
    return n’.findSuccessor(id)

// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
  for i = m downto 1 do
    if finger[i] ∈ (n, id) then return finger[i]
  end
  return n
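The finger-based lookup can be simulated in one process as follows. This is a sketch under several assumptions: the ring is static and all finger tables are correct, RPCs are plain method calls, the predecessor shortcut from the pseudocode is omitted (as in the later failure-aware version), and a hop counter is added for illustration. The names between and build_ring are helpers introduced here.

```python
from bisect import bisect_left

M = 4
SIZE = 2 ** M

def between(x, a, b, inclusive_right=False):
    """x in (a, b), or in (a, b] if inclusive_right, clockwise from a."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # wrapping segment; a == b is the ring minus a

class Node:
    def __init__(self, ident):
        self.id = ident
        self.finger = []  # finger[i-1] holds succ(id + 2^(i-1))

    @property
    def successor(self):
        return self.finger[0]

    def closest_preceding_node(self, ident):
        for f in reversed(self.finger):  # i = m downto 1
            if between(f.id, self.id, ident):
                return f
        return self

    def find_successor(self, ident, hops=0):
        if between(ident, self.id, self.successor.id, inclusive_right=True):
            return self.successor, hops
        nxt = self.closest_preceding_node(ident)
        return nxt.find_successor(ident, hops + 1)

def build_ring(ids):
    """Fill all finger tables from global knowledge (static ring)."""
    ring = sorted(ids)
    nodes = {i: Node(i) for i in ring}

    def succ(ident):
        return nodes[ring[bisect_left(ring, ident % SIZE) % len(ring)]]

    for n in nodes.values():
        n.finger = [succ(n.id + 2 ** (i - 1)) for i in range(1, M + 1)]
    return nodes

nodes = build_ring([0, 2, 5, 6, 11])
owner, hops = nodes[2].find_successor(9)
print(owner.id, hops)  # 11 1
```

With successor pointers alone the same lookup visits 2, 5, 6 and 11; with fingers, node 2 jumps directly to node 6, which already knows that 11 is responsible.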
Chord: Discussion
We are basically done
But….
What about joins and failures/leaves?
Nodes come and go as they wish
What about data?
Should I lose my doc because some kid decided to shut
down his machine and he happened to store my file?
What about storing addresses of files instead of files?
What did we gain compared to Gnutella? Increased
guarantees and determinism?
So actually we just started..
Agenda
Handling successor pointers
  Joins, Leaves
Scalability
  Routing table reducing the cost from O(N) to O(log N)
Failures (for all the above)
Handling Successors
Ring maintenance
Everything depends on successor pointers, so we had better have them right all the time!!
In Chord, in addition to the successor pointer, every node has a predecessor pointer as well for ring maintenance
Handling Dynamism
Periodic stabilization is used to make pointers eventually correct
  Try pointing succ to closest alive successor
  Try pointing pred to closest alive predecessor

Periodically at n:
  1. set v := succ.pred
  2. if v≠nil and v is in (n,succ]
  3.   set succ := v
  4. send a notify(n) to succ

When receiving notify(p) at n:
  1. if pred=nil or p is in (pred,n]
  2.   set pred := p
Handling joins
When n joins
  Find n’s successor with lookup(n)
  Set succ to n’s successor
  Stabilization fixes the rest

Periodically at n:
  1. set v := succ.pred
  2. if v≠nil and v is in (n,succ]
  3.   set succ := v
  4. send a notify(n) to succ

When receiving notify(p) at n:
  1. if pred=nil or p is in (pred,n]
  2.   set pred := p

[Figure: ring segment with nodes 11, 13 and 15]
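The join and stabilization rules can be traced with a small simulation. The sketch below assumes a ring of two nodes, 11 and 15, which node 13 joins (the identifiers shown in the figure; the exact scenario of the original figure is not recoverable). Messages are plain method calls, one round of stabilization is run in a fixed order, and the result of lookup(13) is set by hand.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring segment (a, b], moving clockwise."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.succ = self
        self.pred = None

    def stabilize(self):
        """The 'Periodically at n' rule."""
        v = self.succ.pred
        if v is not None and in_half_open(v.id, self.id, self.succ.id):
            self.succ = v
        self.succ.notify(self)

    def notify(self, p):
        """The 'When receiving notify(p) at n' rule."""
        if self.pred is None or in_half_open(p.id, self.pred.id, self.id):
            self.pred = p

# Existing ring: 11 -> 15 -> 11.
n11, n15 = Node(11), Node(15)
n11.succ, n11.pred = n15, n15
n15.succ, n15.pred = n11, n11

# Node 13 joins: lookup(13) would return 15, so succ is set to 15.
n13 = Node(13)
n13.succ = n15

# One round of periodic stabilization fixes the remaining pointers.
for node in (n13, n11, n15):
    node.stabilize()

print(n11.succ.id, n13.succ.id, n15.succ.id)  # 13 15 11
print(n11.pred.id, n13.pred.id, n15.pred.id)  # 15 11 13
```

In a real system the rounds run concurrently and in arbitrary order, so several rounds may be needed before the pointers converge.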
Handling Successors - Chord Algorithm
Handling Join/Leaves For Fingers
Finger Stabilization (1/5)
Periodically refresh finger table entries, and store the index of the next finger to fix
This is also the initialization procedure for the finger table
Local variable next initially 0

procedure n.fixFingers()
  next := next+1
  if next > m then next := 1
  finger[next] := findSuccessor(n + 2^(next-1))   // addition modulo the size of the identifier space
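A minimal sketch of fixFingers follows. It assumes an identifier space of size 64 (m = 6), chosen so that node 21's sixth finger starts at 21 + 32 = 53, and it uses a global sorted list of live nodes as a stand-in for findSuccessor; both are assumptions of this sketch. Because the lookup here is always correct, the transient state in which a freshly joined node is not yet visible through the ring is not modelled.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, ident, m, lookup):
        self.id, self.m = ident, m
        self.lookup = lookup            # stand-in for findSuccessor
        self.finger = [None] * (m + 1)  # 1-based; index 0 is unused
        self.next = 0

    def fix_fingers(self):
        """Refresh one finger per call, cycling through 1..m."""
        self.next += 1
        if self.next > self.m:
            self.next = 1
        start = (self.id + 2 ** (self.next - 1)) % 2 ** self.m
        self.finger[self.next] = self.lookup(start)

live = [21, 26, 32, 48, 60]  # node identifiers, kept sorted

def lookup(ident):
    return live[bisect_left(live, ident) % len(live)]

n21 = Node(21, 6, lookup)
for _ in range(6):           # six calls initialize the whole table
    n21.fix_fingers()
print(n21.finger[1:])        # [26, 26, 26, 32, 48, 60]

insort(live, 56)             # node 56 joins
for _ in range(6):           # the next full cycle repairs finger 6
    n21.fix_fingers()
print(n21.finger[6])         # 56
```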
Example: finger stabilization (2/5)
Current situation: succ(N48) is N60
succ(N21.Finger[j].start) = succ(53) = N21.Finger[j].node = N60
[Figure: ring segment with N21, N26, N32, N48 and N60; N21.Finger[j].start is identifier 53 and N21.Finger[j].node points to N60]
Example: finger stabilization (3/5)
New node N56 joins and stabilizes its successor pointer
Finger j of node N21 is now wrong
N21 eventually tries to fix finger j by looking up 53, which stops at N48; however, nothing changes
[Figure: ring segment with N21, N26, N32, N48, N56 and N60; N21.Finger[j].start is identifier 53 and N21.Finger[j].node still points to N60]
Example: finger stabilization (4/5)
N48 will eventually stabilize its successor
This means the ring is correct now.
[Figure: ring segment with N21, N26, N32, N48, N56 and N60; N21.Finger[j].start is identifier 53]
Example: finger stabilization (5/5)
When N21 tries to fix Finger j again, this time the response from N48 will be correct and N21 corrects the finger
[Figure: ring segment with N21, N26, N32, N48, N56 and N60; N21.Finger[j].start is identifier 53 and N21.Finger[j].node now points to N56]
Agenda
Handling successor pointers
  Joins, Leaves
Scalability
  Routing table reducing the cost from O(N) to O(log N)
Failures (for all the above)
Handling data
  Joins, Leaves
Handling Failures – Replication of Successors
Evidently the failure of one successor pointer means total collapse
Solution: A node has a “successors list” of size r containing the immediate r successors
  How big should r be? log(N) or a large constant should be ok
  Enhance periodic stabilization to handle failures
Dealing with failures
Each node keeps a successor-list
  Pointer to r closest successors
    succ(n+1)
    succ(succ(n+1)+1)
    succ(succ(succ(n+1)+1)+1)
    ...
If successor fails
  Replace with closest alive successor
If predecessor fails
  Set pred to nil
[Figure: identifier ring with positions 0 to 15]
Handling leaves
When n leaves
  Just disappear (like failure)
When pred detected failed
  Set pred to nil
When succ detected failed
  Set succ to closest alive in successor list

Periodically at n:
  1. set v := succ.pred
  2. if v≠nil and v is in (n,succ]
  3.   set succ := v
  4. send a notify(n) to succ

When receiving notify(p) at n:
  1. if pred=nil or p is in (pred,n]
  2.   set pred := p

[Figure: ring segment with nodes 11, 13 and 15]
Handling Failures- Ring (1/5)
Maintaining the ring
  Each node maintains a successor list of length r
  If a node’s immediate successor fails, it uses the second entry in its successor list
  updateSuccessorList copies a successor list from s: removing last entry, and prepending s

// join a Chord ring containing node n’
procedure n.join(n’)
  predecessor := nil
  s := n’.findSuccessor(n)
  updateSuccessorList(s.successorList)
Handling Failures- Ring (2/5)
Check whether predecessor has failed (Failure detector)

procedure n.checkPredecessor()
  if predecessor has failed then
    predecessor := nil
Handling Failures- Ring (3/5)
procedure n.stabilize()
  s := Find first alive node in successorList
  x := s.predecessor
  if x ≠ nil and x ∈ (n, s) then s := x end
  updateSuccessorList(s.successorList)
  s.notify(n)

procedure n.notify(n’)
  if predecessor = nil or n’ ∈ (predecessor, n) then
    predecessor := n’
Failure – Ring (4/5)
Example – Node failure (N26)
Initially
  [Figure: nodes N21, N26 and N32 with the pointers suc(N21,1), suc(N21,2), suc(N26,1) and pred(N32)]
After N21 performed stabilize(), before N21.notify(N32)
  [Figure: nodes N21, N26 and N32 with the pointers suc(N21,1) and pred(N32)]
Failure – Ring (5/5)
Example – Node failure (N26)
After N21 performed stabilize(), before N21.notify(N32)
  N21.notify(N32) has no effect
  [Figure: nodes N21, N26 and N32 with the pointers suc(N21,1) and pred(N32)]
After N32.checkPredecessor()
  [Figure: nodes N21, N26 and N32 with the pointer suc(N21,1)]
Next N21.stabilize() fixes N32’s predecessor
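The N26 failure example can be replayed with the ring-maintenance procedures. The sketch below assumes a three-node ring (21, 26, 32), a successor list of length r = 2, and an alive flag standing in for the failure detector. It also skips a failed node returned as s.predecessor, which the pseudocode does not spell out; that check is an assumption of this sketch.

```python
R = 2  # successor-list length r (value chosen for this sketch)

def between(x, a, b):
    """True if x lies in the open ring segment (a, b), moving clockwise."""
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident):
        self.id, self.alive = ident, True
        self.successor_list, self.predecessor = [], None

    def update_successor_list(self, s):
        # Copy s's list, prepend s, and drop entries beyond length R.
        self.successor_list = ([s] + s.successor_list)[:R]

    def check_predecessor(self):
        if self.predecessor is not None and not self.predecessor.alive:
            self.predecessor = None

    def stabilize(self):
        s = next(n for n in self.successor_list if n.alive)
        x = s.predecessor
        # Assumption: a failed x is ignored instead of being adopted.
        if x is not None and x.alive and between(x.id, self.id, s.id):
            s = x
        self.update_successor_list(s)
        s.notify(self)

    def notify(self, n):
        if self.predecessor is None or between(
                n.id, self.predecessor.id, self.id):
            self.predecessor = n

def build_ring(ids):
    nodes = [Node(i) for i in sorted(ids)]
    for i, n in enumerate(nodes):
        n.predecessor = nodes[i - 1]
        n.successor_list = [nodes[(i + j) % len(nodes)] for j in range(1, R + 1)]
    return {n.id: n for n in nodes}

ring = build_ring([21, 26, 32])
ring[26].alive = False                   # N26 fails
ring[21].stabilize()                     # N21 skips N26; the notify has no effect
print(ring[21].successor_list[0].id, ring[32].predecessor.id)  # 32 26
ring[32].check_predecessor()             # N32 detects the failure
ring[21].stabilize()                     # now the notify sets pred(N32)
print(ring[32].predecessor.id)           # 21
```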
Failure – Lookups (1/5)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else
    n’ := closestPrecedingNode(id)
    try
      return n’.findSuccessor(id)
    catch failure of n’ then
      mark n’ in finger[.] as failed
      return n.findSuccessor(id)

// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
  for i = m downto 1 do
    if finger[i].node is alive and finger[i] ∈ (n, id) then return finger[i]
  end
  return n
Variations of Chord
DKS
Chord#
DKS Routing
Generalization of Chord to provide arbitrary arity
Provide log_k(n) hops per lookup
  k being a configurable parameter
  n being the number of nodes
Instead of only log2(n)
Achieving log_k(n) lookup
Each node has log_k(N) = L levels, N = k^L
Each level contains k intervals
Example: k=4, N=64 (4^3), node 0

Node 0    I0     I1      I2      I3
Level 1   0…15   16…31   32…47   48…63

[Figure: ring of 64 identifiers divided into Interval 0 to Interval 3, with marks at 0, 4, 8, 12, 16, 32 and 48]
Achieving log_k(n) lookup
Each node has log_k(N) levels, N = k^L
Each level contains k intervals
Example: k=4, N=64 (4^3), node 0

Node 0    I0     I1      I2      I3
Level 1   0…15   16…31   32…47   48…63
Level 2   0…3    4…7     8…11    12…15

[Figure: ring of 64 identifiers with the first level-1 interval divided into Interval 0 to Interval 3; marks at 0, 4, 8, 12, 16, 32 and 48]
Achieving log_k(n) lookup
Each node has log_k(N) levels, N = k^L
Each level contains k intervals
Example: k=4, N=64 (4^3), node 0

Node 0    I0     I1      I2      I3
Level 1   0…15   16…31   32…47   48…63
Level 2   0…3    4…7     8…11    12…15
Level 3   0      1       2       3

[Figure: ring of 64 identifiers with marks at 0, 4, 8, 12, 16, 32 and 48]
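The level and interval table for node 0 can be generated mechanically. The sketch below reproduces the k = 4, N = 64 example; the function name dks_levels is hypothetical, and shifting the intervals by the node identifier (modulo N) for nodes other than 0 is an assumption of this sketch rather than something shown on the slides.

```python
def dks_levels(node, k, levels):
    """Return, per level, the k intervals as (first, last) identifier pairs.

    Each level splits the first interval of the previous level into k parts,
    so the identifier space has size N = k ** levels.
    """
    size = k ** levels
    width = size
    table = []
    for _ in range(levels):
        width //= k  # intervals shrink by a factor of k per level
        table.append([
            ((node + i * width) % size, (node + (i + 1) * width - 1) % size)
            for i in range(k)
        ])
    return table

for level, intervals in enumerate(dks_levels(0, 4, 3), start=1):
    print("Level", level, intervals)
# Level 1 [(0, 15), (16, 31), (32, 47), (48, 63)]
# Level 2 [(0, 3), (4, 7), (8, 11), (12, 15)]
# Level 3 [(0, 0), (1, 1), (2, 2), (3, 3)]
```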
Arity is Important
Maximum number of hops can be configured
  $k = N^{1/r} \;\Rightarrow\; \log_k(N) = \log_{N^{1/r}}(N) = \log_{N^{1/r}}\big((N^{1/r})^{r}\big) = r$
Example, a 2-hop system
  $k = \sqrt{N} \;\Rightarrow\; \log_{\sqrt{N}}(N) = 2$
Chord #
The routing table has exponentially
increasing pointers on the node space and
NOT the identifier space (skip-list like
structure)
Routing Table of Chord#
Building the routing table
  log2(N) pointers
  exponentially spaced pointers
[Figure: Chord# routing table]
Chord vs. Chord#
Good for load balancing