Download CS2104 Lecture1 - Royal Institute of Technology

Document related concepts

Cracking of wireless networks wikipedia , lookup

Computer network wikipedia , lookup

Network tap wikipedia , lookup

Backpressure routing wikipedia , lookup

Distributed operating system wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Airborne Networking wikipedia , lookup

List of wireless community networks by region wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Routing wikipedia , lookup

CAN bus wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Kademlia wikipedia , lookup

Transcript
Introduction to
Structured Overlay
Networks
Seif Haridi
KTH/SICS
5/25/2017
1
Presentation Overview



Gentle introduction to Structured Overlay
Networks and Distributed Hash Tables
General use of SONs and DHTs
Chord algorithms and others
5/25/2017
EP2400 Network Algorithms, Seif Haridi
2
What’s a Distributed Hash Table (DHT)?



5/25/2017
An ordinary hash table
, which is distributed
Key
Value
Alexander
Berlin
Ali
Stockholm
Marina
Gothenburg
Peter
Louvain la neuve
Seif
Stockholm
Stefan
Stockholm
Every node provides a lookup operation
Provide the value associated with a key
Nodes keep routing pointers
If item not found, route to another node
3
So what?
Characteristic
Scalability
properties
Time
to findofdata
is
Store
number
items
Self-management
routing info:
logarithmic
proportional
toinformation
number of
• Ensure
routing
Size
of routing tables is
nodes
is up-to-date
logarithmic
Typically:
Example:
of items:
of nodes can be huge Self-management
With D items and n nodes
that data is always
log2(1000000)≈20
Number of items can be huge • Ensure
Number
Store
D/n
items
per node
replicated
and available
EFFICIENT!
in presence joins/leaves/failures
Move D/n items when
nodes join/leave/fail
Routing information
Data items
EFFICIENT!
Self-manage
5/25/2017
4
Traditional Motivation (1/2)

Peer-to-Peer file sharing very
popular
central index

Napster




Completely centralized
Central server knows who has what
Judicial problems
decentralized index
Gnutella



5/25/2017
Completely decentralized
Ask everyone you know to find data
Very inefficient
EP2400 Network Algorithms, Seif Haridi
5
Traditional Motivation (2/2)

Grand vision of DHTs

Provide efficient file sharing

Quote from Chord: ”In particular, [Chord] can help avoid single points
of failure or control that systems like Napster possess, and the lack of
scalability that systems like Gnutella display because of their
widespread use of broadcasts.” [Stoica et al. 2001]

Hidden assumptions
 Millions of unreliable nodes
 User can switch off computer any time (leave=failure)
 Extreme dynamism (nodes joining/leaving/failing)
 Heterogeneity of computers and latencies
 Untrusted nodes
5/25/2017
EP2400 Network Algorithms, Seif Haridi
6
Motivation: DHT overlay as
communication infra-structure

Internet communication


Not suited for 21st century computing



5/25/2017
IP/port, TCP and UDP
Firewalls
NATs
Changing IP addresses
EP2400 Network Algorithms, Seif Haridi
7
Name based communication

DHT’s can overcome these
node A

Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
…
128.178.50.12
…
node B
node D
node C
How?



5/25/2017
Key
Use the DHT
Map names to locations
Bypass firewalls and NATs by routing through
neighbors
EP2400 Network Algorithms, Seif Haridi
8
Name based communication

What about group communication?


IP Multicast is not enabled on the Internet
Use the overlay to broadcast to all nodes

Create multiple groups, broadcast within each
node A
node A
5/25/2017
Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
…
128.178.50.12
…
node B
node D
node C
Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
…
128.178.50.12
EP2400 Network Algorithms, Seif Haridi
…
node B
node D
node C
9
What’s it good for?

Lets look at 10 applications built using such
systems
5/25/2017
EP2400 Network Algorithms, Seif Haridi
10
Distributed Authorization

Defense project at SICS, Swedish Institute
of Computer Science
node A
Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
128.178.50.12
node B

Store certificates in
the directory


node C
node D
…
…
No central server
Survives even if nodes are attacked
5/25/2017
EP2400 Network Algorithms, Seif Haridi
11
Distributed Backup

Setup



Clients installed the backup tool
Decide on amount of space to share
Choose files for backup
node A

Regular backup


Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
128.178.50.12
node B
Data is encrypted
Stored in the directory
node C
node D
…
5/25/2017
EP2400 Network Algorithms, Seif Haridi
…
12
Distributed File System



Similar to AFS and NFS
Files stored in directory
What is new?

Application logic self-managed




node A
Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
128.178.50.12
node B
node C
node D
…
…
Add/remove servers on the fly
Automatically handles failures
Automatically load-balances
No manual configuration needed
5/25/2017
EP2400 Network Algorithms, Seif Haridi
13
P2P Cache

A distributed cache

Every node in an org. runs a client

Want to browse a web page?
 If exists locally -> download it from a peer
 Otherwise, fetch and cache
Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
128.178.50.12
node A
node B

No central proxy needed
…
5/25/2017
EP2400 Network Algorithms, Seif Haridi
…
node C
node D
14
P2P Web Servers

Distributed Web Server
node A


Pages stored in the directory
What is new?
Key
Value
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
128.178.50.12
node B
…

node C
node D
…
Application logic self-managed
 Automatically load-balances
 Add/remove servers on the fly
 Automatically handles failures
5/25/2017
EP2400 Network Algorithms, Seif Haridi
15
P2P SIP
node A

Session Initiation Protocol
Key
Value
node B

Used to initiate calls on the Internet
seif
130.237.32.51
babak
193.10.64.99
mihhail
18.7.22.83
olle
128.178.50.12

Is being standardized

Use the directory to find end-hosts

5/25/2017
node C
node D
…
…
Improving Skype
EP2400 Network Algorithms, Seif Haridi
16
Host Identity Payload (HIP)

Uses the directory to provide seamless
mobility

Unlike Mobile IP


5/25/2017
No home agent needed
Self-managing
EP2400 Network Algorithms, Seif Haridi
17
PIER (databases)

A relational view of the directory

Use SQL to fetch data

5/25/2017
Standard operations (projection, selection, equijoin)
EP2400 Network Algorithms, Seif Haridi
18
Summary

DHT is a useful data structure

Assumptions mentioned might not be true



Moderate amount of dynamism
Leave not same thing as failure
Dedicated servers


5/25/2017
Nodes can be trusted
Less heterogeneity
EP2400 Network Algorithms, Seif Haridi
19
Chord as Example of DHT
5/25/2017
EP2400 Network Algorithms, Seif Haridi
20
How to construct a DHT (Chord)?

Use a logical name space, called the identifier space, consisting of
identifiers {0,1,2,…, N-1}

Identifier space is a logical ring modulo N

Every node picks a random identifier though Hash H

Example:


Space N=16 {0,…,15}
15




a picks 6
b picks 5
c picks 0
d picks 11
e picks 2
1
14
Five nodes a, b, c, d, e

0
2
13
3
12
4
5
11
5/25/2017
EP2400 Network Algorithms, Seif Haridi
6
10
9
8
7
21
Definition of Successor

The successor of an identifier is the
first node met going in clockwise direction
starting at the identifier

15
Example
succ(12)=14
 succ(15)=2
 succ(6)=6

0
1
14
2
13
3
12
4
5
11
5/25/2017
6
10
9
8
7
22
Where to store data (Chord) ?
Use
globally known hash function, H
Store number of items
Each item <key,value> gets
proportional to number of
identifier H(key) = k
nodes
Typically:
Store
is responsible for item k
With D items and nn nodes
Node
EFFICIENT!
 H(“Seif”)=9

5/25/2017
H(“Stefan”)=14
Alexander
Berlin
Marina
Seif
Gothenburg
Louvain la
neuve
Stockholm
Stefan
Stockholm
15
Example
Move D/n items
when
H(“Marina”)=12
nodes join/leave/fail
 H(“Peter”)=2
Value
Peter
each item at its successor
Store D/n items per node
Key
0
1
14
2
13
3
12
4
5
11
6
10
9
8
7
23
Where to point (Chord) ?
Each
node points to its successor
successor of a node n is succ(n+1)
Known as a node’s succ pointer
The
Each
node points to its predecessor
node met in anti-clockwise direction starting at n-1
Known as a node’s pred pointer
First
Example
0’s successor is succ(1)=2
 2’s successor is succ(3)=5
 5’s successor is succ(6)=6
 6’s successor is succ(7)=11
 11’s successor is succ(12)=0
15
1
14

5/25/2017
0
2
13
3
12
4
5
11
6
10
9
8
7
24
DHT Lookup
To
lookup a key k

Calculate H(k)
Follow succ pointers until item k is
found
Key
Value
Alexander
Berlin
Marina
Gothenburg
Peter
Louvain la neuve
Seif
Stockholm
Stefan
Stockholm

Example

Lookup ”Seif” at node 2

H(”Seif”)=9

Traverse nodes:


2, 5, 6, 11 (BINGO)
Return “Stockholm” to initiator
5/25/2017
15
0
1
14
2
13
3
12
4
5
11
6
10
9
8
7
25
DHT Lookup




(a, b] the segment of the ring moving clockwise from but not
including a until and including b
n.foo(.) denotes an RPC of foo(.) to node n
n.bar denotes and RPC to fetch the value of the variable bar in node
n
We call the process of finding the successor of an id a LOOKUP
// ask node n to find the successor of id
procedure n.findSuccessor(id)
if predecessor  nil  id  (predecessor, n] then return n
else if id (n, successor] then
return successor
else // forward the query around the circle
return successor.findSuccessor(id)
5/25/2017
EP2400 Network Algorithms, Seif Haridi
26
DHT Lookup and Update
// ask node n to find the successor of id
procedure n.put(id,value)
s = findSuccessor(id)
s.store(id,value)
procedure n.get(id)
s = findSuccessor(id)
return s.retrieve(id)

PUT and GET are nothing but lookups!!
5/25/2017
EP2400 Network Algorithms, Seif Haridi
27
Speeding up lookups

If only pointer to succ(n+1) is used

Worst case lookup time is N, for N nodes
Time to find data is
 Improving lookup time (finger/routing table)
logarithmic
 Point
to succ(n+1)
15
Size of routing
tables
is
 Point to succ(n+2)
14
logarithmic
Point to succ(n+4)
Example:
 Point to succ(n+8)
log2(1000000)≈20
…
 Point to succ(n+2M-1)
EFFICIENT!
0
1
2


5/25/2017
Distance always halved to
the destination
13
3
12
4
5
11
6
10
9
8
7
28
Chord – Routing (1/7)
15




Routing table size: M,
where N = 2M
Every node n knows
successor(n + 2 i-1) ,
for i = 1..M
Routing entries = log2(N)
log2(N) hops from any
node to any other node
15 0
Get(15
)
1
2
14
13
3
12
4
11
5
10
6
9
5/25/2017
EP2400 Network Algorithms, Seif Haridi
8
7
29
Chord – Routing (2/7)
15




Routing table size: M,
where N = 2M
Every node n knows
successor(n + 2 i-1) ,
for i = 1..M
Routing entries = log2(N)
log2(N) hops from any
node to any other node
15 0
1
2
14
13
3
12
4
11
5
10
Get(15
)
5/25/2017
6
9
EP2400 Network Algorithms, Seif Haridi
8
7
30
Chord – Routing (3/7)
Get(15
)




Routing table size: M,
where N = 2M
Every node n knows
successor(n + 2 i-1) ,
for i = 1..M
Routing entries = log2(N)
log2(N) hops from any
node to any other node
15
15 0
1
2
14
13
3
12
4
11
5
10
6
9
5/25/2017
EP2400 Network Algorithms, Seif Haridi
8
7
31
Chord – Routing (4/7)
15




From node 1, only 2 hops to
14
node 0 where item 15 is
stored
13
For an id space of 16 is, the
maximum is log2(16) = 4
12
hops between any two nodes
In fact, if nodes are uniformly
distributed, the maximum is 11
log2(# of nodes), i.e. log2(8)
hops between any two nodes
10
The average complexity is:
½ log(#nodes)
5/25/2017
15 0
Get(15
)
1
2
3
4
5
6
9
EP2400 Network Algorithms, Seif Haridi
8
7
32
Chord – Routing (5/7)
Pseudo code findSuccessor(.)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
if predecessor  nil  id  (predecessor, n] then return n
if id (n, successor] then
return successor
else
n’ := closestPrecedingNode(id)
return n’.findSuccessor(id)
// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
for i = m downto 1 do
if finger[i] (n, id) then return finger[i]
end
return n
5/25/2017
EP2400 Network Algorithms, Seif Haridi
33
Chord: Discussion


We are basically done
But….


What about joins and failures/leaves?
 Nodes come and go as they wish
What about data?
 Should I lose my doc because some kid decided to shut
down his machine and he happened to store my file?
What about storing addresses of files instead of files?


What did we gain compared to Gnutella? Increased
guarantees and determinism?
So actually we just started..
5/25/2017
EP2400 Network Algorithms, Seif Haridi
34
Agenda

Handling successor pointers


Scalability


Joins, Leaves
Routing table reducing the cost from O(N) to
O(logN)
Failures (for all the above)
5/25/2017
EP2400 Network Algorithms, Seif Haridi
35
Handling Successors
Ring maintenance


Every thing depends on successor pointers,
so, we better have them right all the time!!
In Chord, in addition to the successor pointer,
every node has a predecessor pointer as well
for ring maintenance
5/25/2017
EP2400 Network Algorithms, Seif Haridi
36
Handling Dynamism
Periodic stabilization is used to make pointers
eventually correct


Try pointing succ to closest alive successor

Try pointing pred to closest alive predecessor
Periodically at n:
When receiving notify(p) at n:
1. v:=succ.pred
2. if vnil and v is in (n,succ]
3.
set succ:=v
4. send a notify(n) to succ
1. if pred=nil or p is in
(pred,n]
2.
set pred:=p
5/25/2017
EP2400 Network Algorithms, Seif Haridi
37
Handling joins

When n joins

Find n’s successor with lookup(n)

Set succ to n’s successor

Stabilization fixes the rest
15
13
11
Periodically at n:
When receiving notify(p) at n:
1.
2.
3.
4.
1.
2.
set v:=succ.pred
if v≠nil and v is in (n,succ]
set succ:=v
send a notify(n) to succ
5/25/2017
if pred=nil or p is in (pred,n]
set pred:=p
EP2400
S. Haridi,
Network
ID2210,
Algorithms,
Lecture
Seif
02Haridi
38
Handling Successors - Chord Algorithm
nil
5/25/2017
EP2400 Network Algorithms, Seif Haridi
39
Handling Join/Leaves For Fingers
Finger Stabilization (1/5)
Periodically refresh finger table entries, and
store the index of the next finger to fix
 This is also the initialization procedure for the
finger table
 Local variable next initially 0
procedure n.fixFingers()
next := next+1
if next > m then next := 1
finger[next] := findSuccessor(n  2next-1)

5/25/2017
EP2400 Network Algorithms, Seif Haridi
40
Example
finger stabilization (2/5)


Current situation: succ(N48) is N60
Succ(N21.Fingerj.start) = Succ(53) =N21.Fingerj.node = N60
N21.Fingerj.start
N21
5/25/2017
N26
N32
N48 N53
EP2400 Network Algorithms, Seif Haridi
N21.Fingerj.node
N60
41
Example
finger stabilization (3/5)



New node N56 joins and stabilizes successor pointer
Finger j of node N21 is wrong
N53 eventually try to fix finger j by looking up 53 which stops
at N48, however and nothing changes
N21.Fingerj.start
N21
N26
N32
N21.Fingerj.node
N60
N48 N53
N56
5/25/2017
EP2400 Network Algorithms, Seif Haridi
42
Example
finger stabilization (4/5)


N48 will eventually stabilize its successor
This means the ring is correct now.
N21.Fingerj.start
N21
5/25/2017
N26
N32
N48 N53
EP2400 Network Algorithms, Seif Haridi
N21.Fingerj.node
N56
N60
43
Example
finger stabilization (5/5)

When N21 tries to fix Finger j again, this time the response
from N48 will be correct and N21 corrects the finger
N21.Fingerj.node
N21.Fingerj.start
N21
5/25/2017
N26
N32
N48 N53
EP2400 Network Algorithms, Seif Haridi
N56
N60
44
Agenda

Handling successor pointers


Scalability



Joins, Leaves,
Routing table reducing the cost from O(N) to
O(log N)
Failures (for all the above)
Handling data

5/25/2017
Joins, Leaves
EP2400 Network Algorithms, Seif Haridi
45
Handling Failures –
Replication of Successors




Evidently the failure of one successor pointer
means total collapse
Solution: A node has a “successors list” of size r
containing the immediate r successors
How big should r be? log(N) or a large constant
should be ok
Enhance periodic stabilization to handle failures
5/25/2017
46
Dealing with failures

Each node keeps a successor-list

Pointer to r closest successors
 succ(n+1)
 succ(succ(n+1)+1)
 succ(succ(succ(n+1)+1)+1)
 ...
15
0
1
14
2
13
3
12
4
5
11

If successor fails


6
10
9
Replace with closest alive successor
8
7
If predecessor fails

Set pred to nil
5/25/2017
EP2400 Network Algorithms, Seif Haridi
47
Handling leaves
When n leaves

Just dissappear (like failure)

15
13
When pred detected failed

Set pred to nil

When succ detected failed

Set succ to closest alive in successor
11
list

Periodically at n:
When receiving notify(p) at n:
1.
2.
3.
4.
1.
2.
set v:=succ.pred
if v≠nil and v is in (n,succ]
set succ:=v
send a notify(n) to succ
5/25/2017
if pred=nil or p is in (pred,n]
set pred:=p
EP2400
S. Haridi,
Network
ID2210,
Algorithms,
Lecture
Seif
02Haridi
48
Handling Failures- Ring (1/5)

Maintaining the ring




Each node maintains a successor list of length r
If a node’s immediate successor fails, it uses the second
entry in its successor list
updateSuccessorList copies a successor list from s:
removing last entry, and prepending s
Join a Chord containing node n’
procedure n.join(n’)
predecessor := nil
s := n’.findSuccessor(n)
updateSuccessorList(s.successorList)
5/25/2017
EP2400
S. Haridi,
Network
ID2210,
Algorithms,
Lecture
Seif
02Haridi
49
Handling Failures- Ring (2/5)
Check whether predecessor has failed (Failure
detector)
procedure n.checkPredecessor()
if predecessor has failed then
predecessor := nil

5/25/2017
EP2400 Network Algorithms, Seif Haridi
50
Handling Failures- Ring (3/5)
procedure n.stabilize()
s := Find first alive node in successorList
x := s.predecessor
if x not nil and x  (n, s) then s := x end
updateSuccessorList(s.successorList)
s.notify(n)
procedure n.notify(n’)
if predecessor = nil or n’ (predecessor, n) then
predecessor := n’
5/25/2017
EP2400 Network Algorithms, Seif Haridi
51
Failure – Ring (4/5)
Example – Node failure (N26)

Initially
suc(N21,2)
suc(N21,1)
N26
N21
N32
pred(N32)
pred(N32)

suc(N26,1)
After N21 performed stabilize(), before N21.notify(N32)
suc(N21,1)
N21
N32
N26
pred(N32)
5/25/2017
EP2400 Network Algorithms, Seif Haridi
52
Failure – Ring (5/5)
Example - Node failure (N26)


After N21 performed stabilize(), before N21.notify(N32)
N21.notify(N32) has no effect
suc(N21,1)
N21
N32
N26
pred(N32)

After N32.checkPredecessor()
suc(N21,1)
N21

N26
N32
Next N21.stabilize() fixes N32’s predecessor
5/25/2017
EP2400 Network Algorithms, Seif Haridi
53
Failure – Lookups (1/5)
// ask node n to find the successor of id
procedure n.findSuccessor(id)
if id (n, successor] then
return successor
else
n’ := closestPreceedingNode(id)
return
try
n’.findSuccessor(id)
catch failure of n’ then
mark n’ in finger[.] as failed
n.findSuccessor(id)
// search locally for the highest predecessor of id
procedure closestPreceedingNode(id)
for i = m downto 1 do
if finger[i].node is alive and finger[i] (n, id) then return finger[i]
end
return n
5/25/2017
EP2400 Network Algorithms, Seif Haridi
54
Variations of Chord

DKS

Chord#
5/25/2017
EP2400 Network Algorithms, Seif Haridi
55
DKS Routing



Generalization of Chord to provide arbitrary
arity
Provide logk(n) hops per lookup
 k being a configurable parameter
 n being the number of nodes
Instead of only log2(n)
5/25/2017
EP2400 Network Algorithms, Seif Haridi
56
Achieving logk(n) lookup

Each node logk(N)=L levels, N=kL
Each level contains k intervals,

Example, k=4, N=64 (43), node 0

0
4
8
Interval 3
Interval 0
Node 0
I0
I1
I2
I3
12
Level 1 0…15 16…31 32…47 48…63
48
16
Interval 2
Interval 1
5/25/2017
EP2400 Network Algorithms, Seif Haridi
32
57
Achieving logk(n) lookup

Each node logk(N) levels, N=kL
Each level contains k intervals,

Example, k=4, N=64 (43), node 0

0
4
Interval 0
8
Node 0
Interval 1
I0
I1
I2
I3
12
Level 1 0…15 16…31 32…47 48…63
Interval 2
48
Interval 3
5/25/2017
16
Level 2
EP2400 Network Algorithms, Seif Haridi
32
0…3
4…7
8…11 12…15
58
Achieving logk(n) lookup

Each node logk(N) levels, N=kL
Each level contains k intervals,

Example, k=4, N=64 (43), node 0

0
4
8
Node 0
I0
I1
I2
I3
12
Level 1 0…15 16…31 32…47 48…63
48
16
5/25/2017
Level 2
0…3
4…7
Level 3
0
1
EP2400 Network Algorithms, Seif Haridi
32
8…11 12…15
2
3
59
Arity is Important

Maximum number of hops can be configured
k N
1
r
1

r
log k (N )  log 1 (N )  log 1  
N

Nr
N r 






r

  r


Example, a 2-hop system
k 
log
5/25/2017
N
N
(N )  2
EP2400 Network Algorithms, Seif Haridi
60
Chord #

The routing table has exponentially
increasing pointers on the node space and
NOT the identifier space (skip-list like
structure)
5/25/2017
EP2400 Network Algorithms, Seif Haridi
61
Routing Table of Chord#
Building the routing table


log2N pointers
exponentially spaced
pointers
Chord#
5/25/2017
EP2400 Network Algorithms, Seif Haridi
62
Chord vs. Chord#
Good for load balancing
5/25/2017
EP2400 Network Algorithms, Seif Haridi
63