Structured P2P Overlays
Classification of the P2P File Sharing Systems
• Hybrid (broker-mediated)
  – Unstructured + centralized
    • Ex.: Napster
  – Unstructured + super-peer notion
    • Ex.: KaZaA, Morpheus
• Unstructured decentralized (or loosely controlled)
  + Files can be anywhere
  + Support for partial-name and keyword queries
  – Inefficient search (some heuristics exist) & no guarantee of finding
  • Ex.: Gnutella
• Structured (or tightly controlled, DHT)
  + Files are rigidly assigned to specific nodes
  + Efficient search & guarantee of finding
  – Lack of partial-name and keyword queries
  • Ex.: Chord, CAN, Pastry, Tapestry
Resource Discovery
Comparing Some File Sharing Methods
• Centralized
  – One or a few central coordinator(s)
  – e.g. Napster, instant messengers
• Fully decentralized
  – All peers (or none) contain routing information
  – e.g. Freenet, Gnutella
• Hybrid
  – Some superpeers carry indexing information
  – e.g. FastTrack (Kazaa, Morpheus), Gnutella derivatives
Resource Discovery in P2P Systems
• 1st generation: central server, central index (Napster)
• 2nd generation: no central server, flooding between peers (Gnutella)
• 3rd generation: Distributed Hash Table (self-organizing overlay network: topology, document routing) – structured (CAN, Chord, Pastry, etc.)
(Figures: a Napster-style central index with peers issuing a search query and then a direct GET of the file; a Gnutella-style flood across peers; and a Chord-style 3-bit identifier circle 0-7 with example finger tables, e.g. node 0: start 1, interval [1,2), successor 1; start 2, [2,4), successor 3; start 4, [4,0), successor 0.)
Challenges
• Duplicated messages
  – caused by loops and flooding
• Missing content
  – caused by loops and TTL expiry
• Oriented to files only
• Why?
  – The network is unstructured
  – Too specific
Structured P2P
• Second-generation P2P overlay networks
• Self-organizing
• Load balanced
• Fault-tolerant
• Scalability guarantees on the number of hops to answer a query
  – Major difference from unstructured P2P systems
• Based on a distributed hash table interface
Distributed Hash Tables (DHT)
• Distributed version of a hash table data structure
• Stores (key, value) pairs
  – The key is like a filename
  – The value can be file contents
• Goal: efficiently insert/lookup/delete (key, value) pairs
• Each peer stores a subset of the (key, value) pairs in the system
• Core operation: find the node responsible for a key
  – Map key to node
  – Efficiently route the insert/lookup/delete request to this node
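To make this interface concrete, here is a minimal single-process sketch in Python (the class name ToyDHT and the node IDs are purely illustrative, not from the slides): it hashes each key into the ID space, picks the responsible node, and routes insert/lookup/delete to that node's local store.

```python
import hashlib

class ToyDHT:
    """Toy single-process sketch of the DHT interface: every key is hashed
    into the ID space and stored at the node responsible for that ID."""

    def __init__(self, node_ids):
        self.ring = sorted(node_ids)             # node identifiers
        self.store = {n: {} for n in self.ring}  # each node holds a subset

    def _key_id(self, key):
        # map the key (e.g. a file name) into the identifier space
        digest = hashlib.sha1(key.encode()).hexdigest()
        return int(digest, 16) % (self.ring[-1] + 1)

    def _responsible(self, key):
        # first node whose ID is >= the key ID, wrapping to the smallest
        kid = self._key_id(key)
        for n in self.ring:
            if n >= kid:
                return n
        return self.ring[0]

    def insert(self, key, value):
        self.store[self._responsible(key)][key] = value

    def lookup(self, key):
        return self.store[self._responsible(key)].get(key)

    def delete(self, key):
        self.store[self._responsible(key)].pop(key, None)

dht = ToyDHT(node_ids=[10, 42, 77, 120])
dht.insert("LetItBe", "MP3 data")
print(dht.lookup("LetItBe"))   # -> 'MP3 data'
```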
DHT Applications
• Many services can be built on top of a DHT interface
  – File sharing
  – Archival storage
  – Databases
  – Naming, service discovery
  – Chat service
  – Rendezvous-based communication
  – Publish/Subscribe
DHT Desirable Properties
• Keys mapped evenly to all nodes in the network
• Each node maintains information about only a few other nodes
• Messages can be routed to a node efficiently
• Node arrivals/departures only affect a few nodes
DHT Routing Protocols
• DHT is a generic interface
• There are several implementations of this interface:
  – Chord [MIT]
  – Pastry [Microsoft Research UK, Rice University]
  – Tapestry [UC Berkeley]
  – Content Addressable Network (CAN) [UC Berkeley]
  – SkipNet [Microsoft Research US, Univ. of Washington]
  – Kademlia [New York University]
  – Viceroy [Israel, UC Berkeley]
  – P-Grid [EPFL Switzerland]
  – Freenet [Ian Clarke]
• These systems are often referred to as P2P routing substrates or P2P overlay networks
Structured Overlays
• Properties
  – Topology is tightly controlled
    • Well-defined rules determine to which other nodes a node connects
  – Files placed at precisely specified locations
    • A hash function maps file names to nodes
  – Scalable routing based on file attributes
Second generation P2P systems
• They guarantee a definite answer to a query in a bounded number of network hops.
• They form a self-organizing overlay network.
• They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.
Approach to Structured P2P Network
• Contributes a way
  – to construct a structured, general P2P network without loops and TTL
  – to obtain knowledge about the constructed P2P network
• 2-D space
  – Each node's network identifier is mapped into a 2-D space
  – Zone
    • Each node occupies an allocated area
    • Nodes with the same network identifier are aggregated into a zone
    • Each zone maintains a binary tree
  – Core
    • Represents its zone
    • Manages its zone
    • Acts as the gateway between neighboring zones and its members
  – Member
    • Belongs to a zone
    • Each message should be sent to its zone and to the members in its zone
Resource Discovery
Document Routing, Shortly
• Chord, CAN, Tapestry, Pastry model
• Benefits:
  – More efficient searching
  – Limited per-node state
• Drawbacks:
  – Limited fault-tolerance vs. redundancy
(Figure: a query for ID 212 is forwarded across nodes 001, 012, 332, 305 until it reaches node 212.)
Scalability of P2P Systems
• Peer-to-peer (P2P) file sharing systems are now one of the most popular Internet applications and have become a major source of Internet traffic
• Thus, it is extremely important that these systems be scalable
• Unfortunately, the initial designs for P2P systems have significant scaling problems:
  – Napster has a centralized directory service
  – Gnutella employs a flooding-based search mechanism that is not suitable for large systems
Motivation
How to find data in a distributed file sharing system?
(Figure: a publisher stores Key="LetItBe", Value=MP3 data somewhere among nodes N1-N5 on the Internet; a client issues Lookup("LetItBe").)
• Lookup is the key problem
Centralized Solution
• Central server (Napster)
(Figure: the publisher registers Key="LetItBe", Value=MP3 data in a central DB; the client sends Lookup("LetItBe") to the DB.)
• Requires O(M) state
• Single point of failure
Distributed Solution (1)
• Flooding (Gnutella, Morpheus, etc.)
(Figure: the client's Lookup("LetItBe") is flooded across nodes N1-N5 until it reaches the publisher.)
• Worst case O(N) messages per lookup
Distributed Solution (2)
• Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
(Figure: the client's Lookup("LetItBe") is routed hop by hop across nodes N1-N5 to the publisher.)
• Only exact matches
Distributed Hash Table (DHT) Based Systems
• In response to these scaling problems, several research groups have proposed a new generation of scalable P2P systems that support a DHT functionality:
  – Tapestry
  – Pastry
  – Chord
  – Content-Addressable Networks (CAN)
• In these systems:
  – files are associated with a key (produced, e.g., by hashing the file name), and
  – each node in the system is responsible for storing a certain range of keys
Structured P2P Applications
• A fundamental problem that confronts P2P applications is to efficiently locate the node that stores a particular data (file) item
• Data location can be easily implemented by associating a key with each data item, and storing the key/data-item pair at the node to which the key maps
• The algorithms support the following operation: given a key, map the key onto a node
• Hash tables are used to map keys onto values that represent nodes
Example P2P problem: lookup
(Figure: a publisher stores Key="title", Value=file data at one of the nodes N1-N6; a client issues Lookup("title").)
• Lookup is at the heart of all P2P systems
Structured P2P Applications
• P2P routing protocols like Chord, Pastry, CAN, and Tapestry induce a connected overlay network across the Internet, with a rich structure that enables efficient key lookups
• Such protocols have two parts:
  – Looking up a file item in a specially constructed overlay structure
  – A protocol that allows a node to join or leave the network, properly rearranging the ideal overlay to account for its presence or absence
Looking Up
• It is the basic operation in these DHT systems
• lookup(key) returns the identity (e.g., the IP address) of the node storing the object with that key
• This operation allows nodes to put and get files based on their key, thereby supporting the hash-table-like interface
Document Routing
• The core of these DHT systems is the routing algorithm
• The DHT nodes form an overlay network with each node having several other nodes as neighbors
• When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key
• The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms
Document Routing Algorithms
• They take a key as input and, in response, route a message to the node responsible for that key
  – The keys are strings of digits of some length
  – Nodes have identifiers taken from the same space as the keys (i.e., the same number of digits)
• Each node maintains a routing table consisting of a small subset of the nodes in the system
• When a node receives a query for a key for which it is not responsible, it routes the query to the neighbour node that makes the most "progress" towards resolving the query
  – The notion of progress differs from algorithm to algorithm, but in general it is defined in terms of some distance between the identifier of the current node and the identifier of the queried key
Content-Addressable Network (CAN)
• A typical document routing method
• A virtual Cartesian coordinate space is used
• The entire space is partitioned amongst all the nodes
  – every node "owns" a zone in the overall space
• Abstraction
  – can store data at "points" in the space
  – can route from one "point" to another
  – Point = node that owns the enclosing zone
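As a rough sketch of this abstraction (the helper names are hypothetical, and a real CAN also tracks neighbors and splits zones on joins), two hash values map a key to an (x, y) point, and the node whose zone encloses that point stores the key:

```python
import hashlib

SPACE = 2 ** 16   # assumed side length of the virtual coordinate space

def key_to_point(key):
    """Hash a key to a point in the 2-D coordinate space."""
    h = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(h[:4], "big") % SPACE
    y = int.from_bytes(h[4:8], "big") % SPACE
    return (x, y)

def owner(point, zones):
    """Return the node whose zone (x1, y1, x2, y2) encloses the point."""
    x, y = point
    for node, (x1, y1, x2, y2) in zones.items():
        if x1 <= x < x2 and y1 <= y < y2:
            return node
    raise ValueError("zones do not cover the point")

# Example: two nodes that have split the space once along the x axis.
zones = {"n1": (0, 0, SPACE // 2, SPACE), "n2": (SPACE // 2, 0, SPACE, SPACE)}
point = key_to_point("LetItBe")
print(point, "is stored at", owner(point, zones))
```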
Basic Concept of CAN
Data stored in the CAN is addressed by name (i.e., key), not by location (i.e., IP address).
Task of the routing: how to find the location of a file.
CAN Example: Two Dimensional Space
• The space is divided between the nodes
• Together, the nodes cover the entire space
• Each node covers either a square or a rectangular area of ratio 1:2 or 2:1
• Example: node n1:(1, 2) is the first node that joins → it covers the entire space
(Figure: a 7x7 coordinate grid owned entirely by n1.)

CAN Example: Two Dimensional Space
• Node n2:(4, 2) joins → the space is divided between n1 and n2

CAN Example: Two Dimensional Space
• Node n3:(3, 5) joins → the space is divided between n1 and n3

CAN Example: Two Dimensional Space
• Nodes n4:(5, 5) and n5:(6, 6) join

CAN Example: Two Dimensional Space
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

CAN Example: Two Dimensional Space
• Each item is stored by the node that owns its mapping in the space
(Figure: after all joins, the 7x7 grid shows the zones of n1-n5 and the items f1-f4 placed at their coordinates.)
CAN: Query Example
• Each node knows its neighbours in the d-space
• Forward the query to the neighbour that is closest to the query ID
• Example: assume node n1 queries file item f4
(Figure, repeated over several animation slides: the query is forwarded zone by zone from n1 toward the zone that contains f4.)
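A minimal sketch of this greedy forwarding rule, under the simplifying assumption that each node knows its neighbors' zone boundaries (the toy topology below is invented for illustration):

```python
import math

def center(zone):
    """Center of a zone given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = zone
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route(query_point, start, zones, neighbors):
    """Greedy CAN-style forwarding: at each hop move to the neighbor
    whose zone is closest to the query point, until the owner is reached."""
    node = start
    path = [node]
    while True:
        x1, y1, x2, y2 = zones[node]
        if x1 <= query_point[0] < x2 and y1 <= query_point[1] < y2:
            return path                      # this node owns the point
        node = min(neighbors[node],
                   key=lambda n: dist(center(zones[n]), query_point))
        path.append(node)

# Toy topology: four equal quadrants of an 8x8 space.
zones = {"a": (0, 0, 4, 4), "b": (4, 0, 8, 4),
         "c": (0, 4, 4, 8), "d": (4, 4, 8, 8)}
neighbors = {"a": ["b", "c"], "b": ["a", "d"],
             "c": ["a", "d"], "d": ["b", "c"]}
print(route((7, 5), "a", zones, neighbors))  # -> ['a', 'b', 'd']
```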
Resource Discovery
Document Routing – CAN
• Associate with each node and each item a unique ID (nodeId and fileId) in a d-dimensional space
• Goals
  – Scales to hundreds of thousands of nodes
  – Handles rapid arrival and failure of nodes
• Properties
  – Routing table size O(d)
  – Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
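For intuition about the d*n^(1/d) bound, a quick back-of-the-envelope computation (the values of n and d are illustrative only):

```python
# Upper bound on the CAN routing path length from the slide: d * n**(1/d).
def can_hops(n, d):
    return d * n ** (1 / d)

for d in (2, 3, 4):
    print("d =", d, "->", round(can_hops(1_000_000, d), 1), "hops")
# d = 2 -> 2000.0, d = 3 -> 300.0, d = 4 -> 126.5 hops for n = 1,000,000
```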
Overview of Structured P2P Network
(Figure: a 2^16 x 2^16 coordinate space divided into zones; each zone has a core node and member nodes connected by a member tree; transmissions occur within a zone and between zones.)
Elements of Structured P2P Network
• Core/member nodes
  – Neighboring zone information
    • core info, zone info, direction
  – Member information
    • member node information, routing table
• Strategies
  – Routing messages
  – Constructing the structured P2P network
  – Managing zones
  – Constructing the member tree
  – Discovering contents
Core/Member Nodes
• 7 neighboring-zone information entries
  – Core node (IP, port #)
  – Zone range (x1, y1)~(x2, y2)
• Numbering zones
  – 4 bits
  – 00: less than
  – 01: belongs to
  – 10: greater than
• Member information
  – IP, port #
• Member tree
  – Uplink node info (only 1)
  – Downlink node info (limited to 2)
(Figure: neighboring zones numbered 0000, 0001, 0010, 0100, 0110, 1000, 1001, 1010 around the node's own zone.)

Routing Messages
• Within a zone
  – Depends on the member tree (binary tree)
• Between zones
  – If not a core, a node just sends the message to its core
  – The core then routes the message along the X coordinate until reaching the destination x
  – After that, it routes the message along the Y coordinate
• Every message should carry the originator's IP and port
Constructing Structured P2P Network (JOIN)
(Figure: join sequence between the new node, the rendezvous point (RP), and the core: bootstrapping via the RP, JOIN/(JOIN_FWD) routed as a message, zone management at the core, Join-as-Core or Join-as-Member, then informing neighboring zones and members.)
Managing Zone (1)
(Flow chart, roughly: when a join request arrives, the core checks whether the new node has the same network identifier. If yes, it accepts the node as a member and informs its members (reply: Msg AsMember). If no, it splits the zone, rearranges neighbors, and informs members and neighbors (reply: Msg AsCore). On receiving the reply, the joining node sets itself up as a core or a member accordingly and informs its neighbors; the join is then completed.)
Managing Zone (2)
• Splitting a zone
  – The network ID of the new node is within the zone's range, but the network identifier is different
• Direction of the split
  – X or Y direction
  – Depends on the difference in X and Y between the two network IDs
• Rearrange the neighboring zones
• The two nodes inform their neighbors of this change
(Figure: a zone being split along the X or Y axis.)
Constructing Member Tree
• Each node
  – Maintains information about all members
  – Creates a binary tree using the sorted IP addresses
• Rules
  – One link between the core and a member
  – Only one uplink per node
  – Downlinks limited to 2
(Figure: a member tree rooted at the core with members numbered 1-7.)
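One possible way to realize these rules, shown purely as an illustration (this is not necessarily the lecture's exact construction): split the sorted member IP list recursively so that each member has one uplink and at most two downlinks, with a single link between the core and the first member.

```python
def build_member_tree(core_ip, member_ips):
    """One-uplink / at-most-two-downlinks member tree rooted at the core:
    the core links to a single member, and the remaining members are
    arranged as a balanced binary tree over the sorted IP addresses."""
    children = {core_ip: []}

    def attach(parent, members):
        if not members:
            return
        mid = len(members) // 2               # middle member becomes the child
        child = members[mid]
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
        attach(child, members[:mid])          # left half hangs below the child
        attach(child, members[mid + 1:])      # right half hangs below the child

    attach(core_ip, sorted(member_ips))       # string sort is fine for this toy data
    return children

tree = build_member_tree("10.0.0.1",
                         ["10.0.0.%d" % i for i in range(2, 9)])
for node, kids in tree.items():
    print(node, "->", kids)
```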
Discovering Content
• Content discovery
  – Send the message to the node's members and to its core
  – Core: on receiving it, sends it to neighbor zones along the X coordinate, and also floods it to the neighboring Y zones
  – DiscoveryHit
(Figure: discovery messages propagating across the coordinate space from 0 to 2^16-1 in both X and Y.)
Other Types of P2P Storage Systems
(Figure: Pastry-style prefix routing of Route(d46a1c) from node 65a1fc through nodes d13da3, d4213f, d462ba, d467c4, d471f1 toward the key d46a1c.)
• Examples: Chord, Pastry or Tapestry P2P systems
• Every node is responsible for a subset of the data
• The routing algorithm locates data, with small per-node routing state
• Volunteer nodes join and leave the system at any time
• All nodes have identical responsibilities
• All communication is symmetric
P2P Ring
• Nodes are arranged in a ring based on ID
• IDs are assigned randomly
• Very large ID space

P2P Storage
• Data items also have IDs
• Every node is responsible for a subset of the data
• Assignment is based on f(id), e.g., the ID's successor
• Supports reconfiguration

Routing State
• Nodes have limited information
• Small routing state that changes slowly
• Each node knows its k successors
• Each node knows log(N) fingers

Routing
• Route via binary search
• Use fingers first, then successors
• Cost is O(log N)
Chord
• Provides a peer-to-peer hash lookup service: Lookup(key) → IP address
• Chord does not store the data
• Efficient: O(log N) messages per lookup, where N is the total number of servers
• Scalable: O(log N) state per node
• Robust: survives massive changes in membership

Chord: Lookup Mechanism
• Lookups take O(log N) hops
(Figure: Lookup(K19) issued at N32 on a ring of nodes N5, N10, N20, N32, N60, N80, N99, N110; key K19 is stored at N20.)
Document Routing – Chord
• MIT project
• Uni-dimensional ID space
• Keep track of log N nodes
• Search through log N nodes to find the desired key
(Figure: the same ring of nodes N5-N110 with key K19.)
Routing Challenges
• Define a useful key-nearness metric
• Keep the hop count small
• Keep the routing tables the "right" size
• Stay robust despite rapid changes in membership
• The authors claim that Chord emphasizes efficiency and simplicity
Chord Overview
• Provides a peer-to-peer hash lookup service:
  – Lookup(key) → IP address
  – Chord does not store the data
• How does Chord locate a node?
• How does Chord maintain routing tables?
• How does Chord cope with changes in membership?
Chord properties
• Efficient: O(log N) messages per lookup
  – N is the total number of servers
• Scalable: O(log N) state per node
• Robust: survives massive changes in membership
• Proofs are in the paper / tech report
  – assuming no malicious participants
Chord IDs
• m-bit identifier space for both keys and nodes
• Key identifier = SHA-1(key)
  – e.g., Key="LetItBe" → SHA-1 → ID=60
• Node identifier = SHA-1(IP address)
  – e.g., IP="198.10.10.1" → SHA-1 → ID=123
• Both are uniformly distributed
• How to map key IDs to node IDs?
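The ID derivation above can be reproduced with a few lines of Python; m = 7 is chosen here only to match the 7-bit examples on these slides, and the printed values will not equal the illustrative 60 and 123.

```python
import hashlib

m = 7   # identifier bits; 7 only to match the slides' 7-bit examples

def chord_id(text, bits=m):
    """SHA-1 hash truncated to a `bits`-bit Chord identifier."""
    digest = int(hashlib.sha1(text.encode()).hexdigest(), 16)
    return digest % (2 ** bits)

print("key  ID:", chord_id("LetItBe"))       # key identifier = SHA-1(key)
print("node ID:", chord_id("198.10.10.1"))   # node identifier = SHA-1(IP)
```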
Consistent Hashing [Karger 97]
(Figure: a circular 7-bit ID space with nodes N32, N90, N123 and keys K5, K20, K60, K101; Key="LetItBe" hashes to K60 and IP="198.10.10.1" to N123.)
• A key is stored at its successor: the node with the next-higher ID
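A minimal sketch of the successor rule, using the node and key IDs from the figure (a sorted node list plus a bisect lookup with wrap-around):

```python
import bisect

def successor(key_id, node_ids):
    """A key is stored at its successor: the first node whose ID is >= the
    key ID, wrapping around the circular ID space."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]

nodes = [32, 90, 123]                 # N32, N90, N123 from the figure
for k in (5, 20, 60, 101):            # K5, K20, K60, K101
    print("K%-3d -> N%d" % (k, successor(k, nodes)))
# K5 -> N32, K20 -> N32, K60 -> N90, K101 -> N123
```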
Consistent Hashing
• If every node knows of every other node → requires global information
• Routing tables are large: O(N)
• Lookups are fast: O(1)
(Figure: N10 asks "Where is 'LetItBe'?"; Hash("LetItBe") = K60; the answer "N90 has K60" is returned directly.)
Chord: Basic Lookup
• Every node knows its successor in the ring
(Figure: the query "Where is 'LetItBe'?" (Hash("LetItBe") = K60) is forwarded successor by successor from N10 around the ring until N90, which holds K60.)
• Requires O(N) time
"Finger Tables"
• Every node knows m other nodes in the ring
• The distance to them increases exponentially
(Figure: node N80's fingers target 80 + 2^0, 80 + 2^1, ..., 80 + 2^6, reaching nodes such as N96, N112, and N16.)
"Finger Tables"
• Finger i points to the successor of n + 2^i
(Figure: node N80's fingers for 80 + 2^0, ..., 80 + 2^6 point to successors such as N96, N112, N120, and N16 on the ring.)
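Applying the rule on this slide to node N80, with a node set assumed from the surrounding figures, gives the following finger targets:

```python
import bisect

def successor(ident, ring, m=7):
    i = bisect.bisect_left(ring, ident % 2 ** m)
    return ring[i % len(ring)]

def finger_table(n, node_ids, m=7):
    """Finger i points to successor(n + 2^i), i = 0..m-1, arithmetic mod 2^m."""
    ring = sorted(node_ids)
    return [(i, (n + 2 ** i) % 2 ** m, successor(n + 2 ** i, ring, m))
            for i in range(m)]

nodes = [5, 10, 16, 20, 32, 60, 80, 96, 99, 110, 112, 120]
for i, target, node in finger_table(80, nodes):
    print("finger %d: 80 + 2^%d = %3d -> N%d" % (i, i, target, node))
# fingers of N80: N96 (five times), then N112, then N16 (wrapping around)
```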
Lookups are Faster
• Lookups take O(log N) hops
(Figure: the Lookup(K19) example again: issued at N32, it now reaches N20, which stores K19, in a few finger hops.)
Joining the Ring
• Three-step process:
  – Initialize all fingers of the new node
  – Update the fingers of existing nodes
  – Transfer keys from the successor to the new node
• Less aggressive mechanism (lazy finger update):
  – Initialize only the finger to the successor node
  – Periodically verify the immediate successor and predecessor
  – Periodically refresh finger table entries
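The lazy mechanism is essentially the classic stabilize/notify pair; the sketch below is a simplification (successor pointers only, no fingers, no failures) rather than the paper's exact pseudocode.

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self      # starts pointing at itself
        self.predecessor = None

    def between(self, x, a, b):
        """True if x lies on the circular interval (a, b)."""
        return (a < x < b) if a < b else (x > a or x < b)

    def stabilize(self):
        """Periodically verify the immediate successor and notify it."""
        x = self.successor.predecessor
        if x is not None and self.between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate thinks it might be our predecessor."""
        if (self.predecessor is None or
                self.between(candidate.id, self.predecessor.id, self.id)):
            self.predecessor = candidate

# A new node joins lazily: it only learns its successor, and periodic
# stabilization rounds repair the ring pointers over time.
a, b, c = Node(10), Node(50), Node(90)
a.successor, b.successor, c.successor = b, c, a     # existing ring
n = Node(60)
n.successor = c                                     # join: only the successor
for _ in range(3):
    for node in (a, b, c, n):
        node.stabilize()
print(b.successor.id, n.successor.id)               # -> 60 90
```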
Joining the Ring - Step 1
• Initialize the new node's finger table
  – Locate any node p in the ring
  – Ask node p to look up the fingers of the new node N36
  – Return the results to the new node
(Figure: N36 asks an existing node to Lookup(37, 38, 40, …, 100, 164) on a ring containing N5, N20, N40, N60, N80, N99.)
Joining the Ring - Step 2
• Update the fingers of existing nodes
  – The new node calls an update function on existing nodes
  – Existing nodes can recursively update the fingers of other nodes
(Figure: the ring with N5, N20, N36, N40, N60, N80, N99.)
Joining the Ring - Step 3
• Transfer keys from the successor node to the new node
  – Only keys in the new node's range are transferred
(Figure: keys 21..36, e.g. K30, are copied from N40 to N36; K38 stays at N40.)
Handling Failures
• The failure of nodes might cause incorrect lookups
(Figure: N80 issues Lookup(90), but the nodes between N80 and N120 (N85, N102, N113) have failed.)
• N80 doesn't know its correct successor, so the lookup fails
• Successor fingers are enough for correctness
Handling Failures
• Use a successor list
  – Each node knows its r immediate successors
  – After a failure, it will know the first live successor
  – Correct successors guarantee correct lookups
• The guarantee holds with some probability
• r can be chosen to make the probability of lookup failure arbitrarily small
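A quick sanity check of that last claim, assuming (only for illustration) that each of the r successors has failed independently with probability p:

```python
# A lookup step gets stuck only if all r successors in the list have failed.
def stuck_probability(p, r):
    return p ** r

for r in (1, 4, 8, 16):
    print("r =", r, "->", stuck_probability(0.5, r))
# Even if half of all nodes fail at once (p = 0.5), r = 16 gives ~1.5e-5.
```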
Evaluation Overview
• Quick lookup in large systems
• Low variation in lookup costs
• Robust despite massive failure
• Experiments confirm the theoretical results

Cost of Lookup
• Cost is O(log N), as predicted by theory; the constant is 1/2
(Figure: average messages per lookup vs. number of nodes.)
Robustness
• Simulation results: static scenario
• A failed lookup means the original node holding the key failed (no replicas of keys)
• The result implies a good balance of keys among nodes!

Robustness
• Simulation results: dynamic scenario
• A failed lookup means the finger path has a failed node
• 500 nodes initially
• stabilize() called every 30 s on average
• 1 lookup per second (Poisson)
• x joins/failures per second (Poisson)
Strengths
• Based on theoretical work (consistent hashing)
• Proven performance in many different aspects
  – "with high probability" proofs
• Robust (is it?)

Weaknesses
• NOT that simple (compared to CAN)
• Member joining is complicated
  – the aggressive mechanism requires too many messages and updates
  – there is no analysis of convergence for the lazy finger mechanism
• The key management mechanism is mixed between layers
  – the upper layer does insertion and handles node failures
  – Chord transfers keys when a node joins (there is no leave mechanism!)
• The routing table grows with the number of members in the group
• Worst-case lookups can be slow
Chord (content based search)
• Chord is a lookup service, not a search service
  – Based on binary search trees
• Provides just one operation: a peer-to-peer hash lookup
  – Lookup(key) → IP address
  – Chord does not store the data
• Uses hash functions:
  – Key identifier = SHA-1(key)
  – Node identifier = SHA-1(IP address)
  – Both are uniformly distributed
  – Both exist in the same ID space
• How to map key IDs to node IDs?
(Figure: a circular ID space on which nodes (e.g., N1, N10) and items (e.g., K0, K4, K7, K11) are placed together.)
Chord (content based search)
• The goal of Chord is to provide the performance of a binary search, which means an O(log N) query path length
• In order to maintain a maximum path length of O(log N), each node maintains a routing table (called a "finger table") with at most m entries (where m = log N)
• The ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle (all arithmetic modulo 2^m)
  – i.e., s = successor(n + 2^(i-1)), 1 ≤ i ≤ m
• Note that the first finger of n is its immediate successor on the circle
(Figure: a 3-bit identifier circle 0-7; the finger table of node 0 lists start (n + 2^(i-1)), interval of responsibility, and successor: 1, [1,2), 1; 2, [2,4), 3; 4, [4,0), 0. Unfilled positions are possible values in the ID space rather than existing nodes.)
Chord (content based search)
Important characteristics:
• Each node stores info only about a small number of possible IDs (at most log N)
• A node knows more about the nodes closely following it on the identifier circle
• A node's table does not generally contain enough info to locate the successor of an arbitrary key k
(Figure: finger tables of the existing nodes 0, 1, and 3 on the 3-bit circle, e.g. node 1: start 2, [2,3), successor 3; start 3, [3,5), successor 3; start 5, [5,1), successor 0.)
Chord (content based search)
"Finger tables" allow log(N)-time lookups.
How do we locate the successor of a key k?
• If n can find a node whose ID is closer than its own to k, that node will know more about the identifier circle in the region of k than n does
• Thus n searches its finger table for the node j whose ID most immediately precedes k, and asks j for the node it knows whose ID is closest to k
• By repeating this process, n learns about nodes with IDs closer and closer to k
• Gradually we will find the immediate predecessor of k
(Figure: Lookup(K19) starting at N32 hops via finger table entries, e.g. entries such as start 100, [100,101), successor 110 and start 9, [9,13), successor 10; start 13, [13,21), successor 20, toward N20, which holds K19.)
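This procedure translates almost directly into code; below is a hedged sketch of an iterative lookup built on a global view of the ring (a real Chord node would query remote nodes instead of computing their fingers locally):

```python
import bisect

M = 7   # identifier bits (2^7 = 128 IDs, matching the slides' examples)

def successor(ident, ring):
    i = bisect.bisect_left(ring, ident % 2 ** M)
    return ring[i % len(ring)]

def fingers(n, ring):
    """Finger i of node n points to successor(n + 2^i)."""
    return [successor(n + 2 ** i, ring) for i in range(M)]

def between(x, a, b):
    """True if x lies on the circular interval (a, b)."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(start, key, ring):
    """Repeatedly jump to the closest preceding finger of the key until the
    current node is the key's immediate predecessor; the key then lives at
    that node's successor."""
    n, path = start, [start]
    while (not between(key, n, successor(n + 1, ring))
           and key != successor(n + 1, ring)):
        # fingers are ordered by clockwise distance from n, so the last one
        # that still lies in (n, key) is the closest preceding node
        preceding = [f for f in fingers(n, ring) if between(f, n, key)]
        nxt = preceding[-1] if preceding else successor(n + 1, ring)
        if nxt == n:            # safety stop for this toy version
            break
        n = nxt
        path.append(n)
    return successor(key, ring), path

ring = sorted([5, 10, 20, 32, 60, 80, 99, 110])
print(lookup(32, 19, ring))     # K19 resolves to N20, path N32 -> N99 -> N5 -> N10
```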
Distributed Hash Tables
(Figure: distributed applications call Insert(key, data) and Lookup(key) → data on the distributed hash table layer, which spans many nodes.)
• Nodes are the hash buckets
• The key identifies the data uniquely
• The DHT balances keys and data across nodes
• The DHT replicates, caches, routes lookups, etc.
Why DHT?
• Demand pulls
– Growing need for security and robustness
– Large-scale distributed apps are difficult to build
– Many applications use location-independent data
• Technology pushes
– Bigger, faster, and better: every PC can be a server
– Scalable lookup algorithms are available
– Trustworthy systems from untrusted components
DHT is a good interface
• DHT: lookup(key) → data; insert(key, data)
• UDP/IP: send(IP address, data); receive(IP address) → data
• Supports a wide range of applications, because it imposes few restrictions
  – Keys have no semantic meaning
  – The value is application dependent
• Minimal interface
DHT is a good shared infrastructure
• Applications inherit some security and robustness from the DHT
  – The DHT replicates data
  – Resistant to malicious participants
• Low-cost deployment
  – Self-organizing across administrative domains
  – Can be shared among applications
• Large scale: supports Internet-scale workloads
1. Scalable lookup
• Map keys to nodes in a load-balanced way
  – Hash keys and nodes into a string of digits
  – Assign each key to the "closest" node
• Forward a lookup for a key to a closer node
• Insert: lookup + store
• Join: insert the node in the ring
• Examples: CAN, Chord, Kademlia, Pastry, Tapestry, Viceroy, …
(Figure: a circular ID space with nodes N32, N60, N90, N105 and keys K5, K20, K80.)
Chord's routing table: fingers
(Figure: node N80's fingers cover 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.)
Lookups take O(log(N)) hops
(Figure: Lookup(K19) issued at N32 reaches N20, which stores K19, in a few hops.)
• Lookup: route to the closest predecessor
2. Balance load
• The hash function balances keys over nodes
• For popular keys, cache along the path
(Figure: in the Lookup(K19) example, K19 is cached at N5 and N10 along the lookup path.)
3. Handling failures: redundancy
• Each node knows the IP addresses of the next r nodes
• Each key is replicated at the next r nodes
(Figure: K19 is replicated at N20, N32, and N40.)

Lookups find replicas
• Tradeoff between latency and bandwidth [Kademlia]
(Figure: Lookup(K19) proceeds in numbered steps and can be answered by any of the nearby replicas, e.g. via N40 or N50.)
5. Optimize routing to reduce latency
(Figure: nodes N20, N40, N41, N80 are adjacent on the ring but are hosted at widely separated sites such as MIT, CMU, Cornell, NYU, Utah, Lulea.se, and vu.nl.)
• Nodes close on the ring may be far away in the Internet
• Goal: put nodes in the routing table that result in few hops and low latency
Chord Autonomy
• When new keys are inserted the system is not affected: it just finds the appropriate node and stores the key there
• When nodes join or leave, the finger tables must be correctly maintained and some keys must be transferred to other nodes
• Also, every key is stored on only one node, which means that if that node becomes unavailable the key is also unavailable
• Maintaining the finger tables and assuring the correctness of the system while nodes join/leave incurs an O(log^2 N) cost
• This implies a restricted autonomy of the system
• The only replicated information is (implicitly) the finger tables, because each node has to maintain its own
Distributed Hash Tables
• A large shared memory implemented by P2P nodes
• Addresses are logical, not physical
  – This means applications can select them as desired
  – Typically a hash of other information
• The system looks them up
Drawbacks of DHTs
• Structured solution
  – Given a filename, find its location
  – Tightly controlled topology and file placement
• Can DHTs do file sharing?
  – Probably, but with lots of extra work: caching, keyword searching
  – Poorly suited for keyword searches
  – Transient clients cause overhead
  – Can find rare files, but that may not matter
• General evaluation of structured P2P systems
  – Great at finding rare files, but most queries are for popular files