Introduction to
Peer-to-Peer Systems
Ozalp Babaoglu
ALMA MATER STUDIORUM – UNIVERSITA’ DI BOLOGNA
Complex Systems
© Montresor/Babaoglu 2010
Outline
• Introduction
• Peer-to-Peer vs. Client / Server
• Common Topologies
• Data location
• Churn
• Security
Introduction
• Peer-to-peer (P2P) systems have become extremely popular and contribute to vast amounts of Internet traffic
• P2P basic definition:
  - A P2P system is a distributed collection of peer nodes
  - Each node is both a server and a client:
    ▴ May provide services to other peers
    ▴ May consume services from other peers
• Very different from the client-server model
P2P History: 1969 - 1990
• The origins:
  - In the beginning, all nodes in Arpanet/Internet were peers
  - Each node was capable of:
    ▴ Performing routing (locate machines)
    ▴ Accepting ftp connections (file sharing)
    ▴ Accepting telnet connections (distributed computation)
P2P History: 1999 - today
• The advent of Napster:
  - Jan 1999: the first version of Napster was released by Shawn Fanning, a student at Northeastern University
  - July 1999: Napster Inc. founded
  - Feb 2001: Napster closed down
Napster
[Figure: Napster Clients 1 to 4 around the Napster Central Index Server; one client sends "Query: song.mp3" to the index server, then "Copy: song.mp3" takes place directly between two clients]
- Client/Server search
- P2P download
- Napster is not “pure P2P”
Client/Server vs. Peer-to-Peer
Client-server:
• Servers well connected to the “core” of the Internet
• Servers carry out critical tasks
• Clients only talk to servers
Peer-to-peer:
• Only nodes located at the “periphery of the Internet”
• Tasks distributed across all nodes
• Clients talk to other clients
Example – Video sharing
Client-Server: YouTube
[Figure: client-server; one uploader and several downloaders]
Advantages
• Client can disconnect after upload
• Uploader needs little bandwidth
• Other users can find the file easily (just use search on server webpage)
Disadvantages
• Server may not accept file or remove it later (according to content policy)
• Whole system depends on the server (what if shut down like Napster?)
• Server storage and bandwidth are expensive!
Example – Video sharing
Peer-to-peer: BitTorrent
[Figure: peer-to-peer; one seeder and several downloaders]
Advantages
• Does not depend on a central server
• Bandwidth shared across nodes (downloaders also act as uploaders)
• High scalability, low cost
Disadvantages
• Seeder must remain on-line to guarantee file availability
• Content is more difficult to find (downloaders must find .torrent file)
• Freeloaders cheat in order to download without uploading
P2P vs. client-server
Client-server | Peer-to-peer
Asymmetric: client and servers carry out different tasks | Symmetric: no distinction between nodes; they are peers
Global knowledge: servers have a global view of the network | Local knowledge: nodes only know a small set of other nodes
Centralization: communications and management are centralized | Decentralization: no global knowledge, only local interactions
Single point of failure: a server failure brings down the system | Robustness: several nodes may fail with little or no impact
Limited scalability: servers easily overloaded | High scalability: high aggregate capacity, load distribution
Expensive: server storage and bandwidth capacity is not cheap | Low-cost: storage and bandwidth are contributed by users
P2P and Overlay Networks
• Peer-to-Peer networks are usually “overlays”
• Logical structures built on top of a physical routed communication infrastructure (IP) that creates the illusion of a completely-connected graph
• Links based on logical relationships (“knows”) rather than physical connectivity
Overlay networks
[Figure] Physical network: “who has a communication link to whom”
Overlay networks
[Figure] Logical network: “who can communicate with whom”
Overlay networks
[Figure] Overlay network (ring): “who knows whom”
Overlay networks
[Figure] Overlay network (binary tree): “who knows whom”
P2P Environment
• High latency, low bandwidth communication
• Churn
  - Nodes may disconnect temporarily
  - New nodes are continuously joining the system, while others leave permanently
• Security
  - P2P clients run on machines under the total control of their owners
  - Malicious users may try to bring down the system
• Selfishness
  - Users may run hacked clients in order to avoid contributing resources
Why P2P?
• Decentralization enables deployment of applications that are:
  - Highly available
  - Fault-tolerant
  - Self organizing
  - Scalable
  - Difficult or impossible to shut down
• This results in a “grassroots” approach and “democratization” of the Internet
P2P Problems
• Overlay construction and maintenance
  - e.g., random, two-level, ring, etc.
• Data location
  - locate a given data object among a large number of nodes
• Data dissemination
  - propagate data in an efficient and robust manner
• Per-node state
  - keep the amount of state per node small
• Tolerance to churn
  - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures
P2P Applications
• Sharing of content:
  - File sharing, content delivery networks
  - Gnutella, eMule, Akamai
• Sharing of storage:
  - Distributed file systems
• Sharing of CPU time:
  - Parallel computing
  - Grid, Seti@home, Folding@home, FightAids@home
P2P Topologies
• Centralized
• Hierarchical
• Decentralized
• Hybrid
Evaluating topologies
• Manageability
  - How hard is it to keep working?
• Information coherence
  - How authoritative is info?
• Extensibility
  - How easy is it to grow?
• Fault tolerance
  - How well can it handle failures?
• Censorship
  - How hard is it to shut down?
Centralized
• Manageable
  - System is all in one place
• Coherent
  - Information is centralized
• Extensible
  - No
• Fault tolerance
  - Single point of failure
• Censorship
  - Easy to shut down
Hierarchical
• Manageable
  - Chain of authority
• Coherent
  - Cache consistency
• Extensible
  - Add more leaves, rebalance
• Fault tolerance
  - Root is vulnerable
• Censorship
  - Just shut down the root
Decentralized
• Manageable
  - Difficult, many owners
• Coherent
  - Difficult, unreliable peers
• Extensible
  - Anyone can join in
• Fault tolerance
  - Redundancy
• Censorship
  - Difficult to shut down
Centralized + Decentralized
• Manageable
  - Same as decentralized
• Coherent
  - Better than decentralized
• Extensible
  - Anyone can join in
• Fault tolerance
  - Redundancy
• Censorship
  - Difficult to shut down
Some Common Topologies
• Flat unstructured: a node can connect to any other node
  - only constraint: maximum degree dmax
  - fast join procedure
  - good for data dissemination, bad for location
• Two-level unstructured: nodes connect to a superpeer
  - superpeers form a small overlay
  - used for indexing and forwarding
  - high load on superpeers
• Flat structured: constraints based on node ids
  - allows for efficient data location
  - constraints require long join and leave procedures
Data location: Flooding
• Problem: find the set of nodes S that store a copy of object O
• Flooding: send a search message to all nodes (first Gnutella protocol)
  - A search message contains either keywords or an object id
  - Advantages:
    ▴ simplicity
    ▴ no topology constraints
  - Disadvantages:
    ▴ high network overhead (huge traffic generated by each search request)
    ▴ flooding stopped by TTL (which produces search horizon)
    ▴ only applicable to small number of nodes
Data location: Flooding
• Flooding (cont.)
  - Flooding in a flat unstructured network:
[Figure: search horizon for TTL = 2 around the requesting node, with copies of obj inside and outside the horizon]
  - Objects that lie outside of the horizon are not found
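The TTL-bounded flooding described above can be sketched as a breadth-first traversal of the overlay. This is a minimal illustration, not the Gnutella protocol itself: the function name `flood_search`, the six-node `overlay` graph and the `holders` set are assumptions made up for the example.

```python
from collections import deque

def flood_search(neighbors, holders, source, ttl):
    """Flood a search from `source`; return (holders reached, messages sent).

    neighbors: dict mapping each node to the list of nodes it knows
    holders:   set of nodes that store a copy of the object
    ttl:       maximum number of hops a search message may travel
    """
    found = set()
    seen = {source}                      # nodes that already handled the search
    queue = deque([(source, ttl)])
    messages = 0
    while queue:
        node, remaining = queue.popleft()
        if node in holders:
            found.add(node)
        if remaining == 0:
            continue                     # TTL expired: this is the search horizon
        for peer in neighbors[node]:
            messages += 1                # every forwarded copy costs one message
            if peer not in seen:         # duplicates are received but not re-forwarded
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return found, messages

# Hypothetical overlay: A - {B, C} - D - E - F, with copies of the object on D and F.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}
holders = {"D", "F"}
```

With `ttl=2` the search started at A reaches D but not F, which lies outside the horizon; raising the TTL to 4 finds both copies at the cost of more messages.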
Data location: Superpeers
• Two-level overlay: use superpeers to track the locations of an object [eMule, Gnutella 2, BitTorrent]
  - Each node connects to a superpeer and advertises the list of objects it stores
  - Search requests are sent to the superpeer, which forwards them to other superpeers
  - Advantages:
    ▴ highly scalable
  - Disadvantages:
    ▴ superpeers must be reliable, powerful and well connected to the Internet (expensive)
    ▴ superpeers must maintain large state
    ▴ the system relies on a small number of superpeers
Data location: Superpeers
• Two-Level Overlay (cont.)
[Figure: a node sends a request for obj to its superpeer and receives a response]
  - A two-level overlay is a partially centralized system
  - In some systems, superpeers may be disconnected (e.g., BitTorrent)
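The advertise-and-search interaction with superpeers can be sketched as follows. The `Superpeer` class and its method names are hypothetical, chosen only to mirror the two steps on the slide; a real system would add timeouts, authentication and cleanup of stale entries.

```python
class Superpeer:
    """Toy superpeer: indexes the objects advertised by its attached nodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # object id -> set of nodes that advertised it
        self.others = []     # the other superpeers in the small top-level overlay

    def advertise(self, node, object_ids):
        # A node attaches to this superpeer and lists the objects it stores.
        for oid in object_ids:
            self.index.setdefault(oid, set()).add(node)

    def search(self, oid, forward=True):
        # Answer from the local index, then forward once to the other superpeers.
        hits = set(self.index.get(oid, set()))
        if forward:
            for sp in self.others:
                hits |= sp.search(oid, forward=False)
        return hits

sp1, sp2 = Superpeer("sp1"), Superpeer("sp2")
sp1.others, sp2.others = [sp2], [sp1]
sp1.advertise("n1", ["song.mp3"])
sp2.advertise("n2", ["song.mp3", "video.avi"])
```

The per-object index is the "large state" listed among the disadvantages: it grows with the number of attached nodes and the objects they advertise.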
Data location: Key-Based Routing
• Structured networks: use a routing algorithm that implements Key-Based Routing (KBR) [Chord, Pastry, Overnet, Kad, eMule]
• KBR (also known as Distributed Hash Tables) works as follows:
  - nodes are (randomly) assigned unique node identifiers (nodeId)
  - given a key k, the node whose nodeId is numerically closest to k among all nodes in the network is known as the root of key k
  - given a key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log n)
  - the location of an object with id objectId is tracked by the root of k = objectId
  - thus, one can find the location of an object by routing a message to the root of k = objectId and querying the root for the location of the object
Pastry
• Pastry is a substrate for wide-area peer-to-peer location and routing middleware
• Any host that runs Pastry middleware and has proper credentials can participate in the overlay network
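The notion of the root of a key, and how the root tracks object locations, can be illustrated with a small sketch. It cheats by using a global list of nodeIds (a real KBR algorithm reaches the root hop by hop without global knowledge) and it ignores wrap-around of the identifier space. The helper names `root_of`, `publish` and `locate` are assumptions; the 16-bit nodeIds are the ones that appear in the routing example of these slides.

```python
# nodeIds taken from the Pastry routing example in these slides (16-bit, hex).
NODE_IDS = [0x04F2, 0x3A79, 0x5230, 0x620F, 0x85E0, 0x8821,
            0x8909, 0x8954, 0x8957, 0xAC78, 0xC52A, 0xE25A]

def root_of(key, node_ids=NODE_IDS):
    """Node whose nodeId is numerically closest to key (ties go to the smaller id)."""
    return min(node_ids, key=lambda n: (abs(n - key), n))

# Each node keeps a directory for the keys it is root of:
# objectId -> set of nodes that store a copy.
directories = {n: {} for n in NODE_IDS}

def publish(object_id, holder):
    # Tell the root of object_id that `holder` stores a copy.
    directories[root_of(object_id)].setdefault(object_id, set()).add(holder)

def locate(object_id):
    # Ask the root of object_id where the copies are.
    return directories[root_of(object_id)].get(object_id, set())

publish(0x8955, 0x620F)
publish(0x8955, 0xC52A)
```

Here `root_of(0x8955)` is node 8954, and `locate(0x8955)` returns the two nodes that published a copy.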
Pastry Characteristics
• Each node in Pastry has a unique 128-bit nodeId
• Keys (of items or messages) and nodeIds are in the same address space
• Given a message with a key, Pastry routes the message to a live node whose nodeId is numerically closest to the key (its root)
• The routing will be done in O(log n) hops where n is the number of nodes in the network
• Takes into account network locality
  - Seeks to minimize the distance the message travels
Design of Pastry
• Node Identifiers
  - Assigned randomly
  - Uniformly distributed over some large ID space (e.g., 128 bits) through a hash function (e.g., SHA1)
  - IDs are thought of as a sequence of base-$2^b$ digits
    ▴ b=1, IDs are binary numbers
    ▴ b=4, IDs are hexadecimal numbers
    ▴ Typically, b is 4
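A possible way to derive such identifiers and split them into base-$2^b$ digits is sketched below. What gets hashed (here an address string) and the truncation of the 160-bit SHA-1 digest to 128 bits are assumptions for illustration; the slides only say that a hash function such as SHA1 is used.

```python
import hashlib

def node_id(name: str, bits: int = 128) -> int:
    """Hash an arbitrary name into a `bits`-bit identifier (assumed scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()   # 160-bit digest
    return int.from_bytes(digest, "big") >> (160 - bits)   # keep the top `bits` bits

def digits(ident: int, b: int = 4, bits: int = 128) -> list[int]:
    """Split an identifier into base-2^b digits, most significant first.

    Assumes `bits` is a multiple of `b`.
    """
    count = bits // b
    mask = (1 << b) - 1
    return [(ident >> (b * (count - 1 - i))) & mask for i in range(count)]
```

For example, `digits(0x8955, b=4, bits=16)` yields `[8, 9, 5, 5]`, the hexadecimal digits of the id, while `b=1` yields its binary digits.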
Design of Pastry
• Routing characteristics
  - Route the message to the numerically closest node
  - Done in approximately $\log_{2^b} n$ steps
  - n is the number of nodes
  - Eventual delivery of messages unless $L/2$ or more adjacent nodes fail simultaneously
  - L is typically $2^4$
Pastry Node State
• Routing table
  - $\log_{2^b} N$ rows
  - $2^b - 1$ entries in each row
  - N is the item space size
  - The entry at row x, column y represents a node which shares an x-digit prefix with the current node, and whose (x+1)-th digit is y
  - Each entry in the routing table refers to any node whose nodeId has the appropriate prefix
Pastry Node State
• Each entry maps the nodeId to its IP address
• Only the nodes which are likely to be close to the current node are selected
• If no node is suitable for the entry, it will be left blank
• Selection of b involves a trade off
  - When b decreases, the number of hops to route a message will increase
  - When b increases, the populated portion in the routing table will be low
Pastry Node State
Routing table example
[Figure: example node state, with one part annotated "used to find next hop with longer shared prefix" and another annotated "used to find the nodeid closest to a key that is close to the local nodeid"]
• In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes.
• The average route length in this case is 4 hops.
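The 4 x 15 = 60 figure follows directly from the routing table dimensions given earlier ($\log_{2^b} N$ rows of $2^b - 1$ entries). A small helper, whose name `routing_table_shape` is made up for this sketch, makes the trade-off in the choice of b easy to explore:

```python
def routing_table_shape(b, id_space):
    """Return (rows, entries per row, total entries) for digit size b.

    rows = log base 2^b of the id space size, entries per row = 2^b - 1.
    """
    id_bits = (id_space - 1).bit_length()   # bits needed to write any identifier
    rows = -(-id_bits // b)                 # ceil(id_bits / b)
    cols = 2 ** b - 1
    return rows, cols, rows * cols
```

For N = 65536, b = 4 gives 4 rows of 15 entries, whereas b = 1 gives 16 rows of a single entry: a smaller table, but up to 16 prefix-matching hops instead of 4.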
Pastry Node State: Leaf set
• Contains the entries of the nodes numerically closest to the current node
• The $L/2$ nodes with the numerically closest larger nodeIds, and
• The $L/2$ nodes with the numerically closest smaller nodeIds
• $L = 2^b$
Pastry Routing
• A particular node receives the message
• If the message key falls within the leaf set
  - Directly forward the message to the target node
• Otherwise use the routing table
  - The current node forwards the message to a node whose nodeId shares with the message key a prefix that is one digit longer than the prefix shared with the current nodeId
Pastry Routing
• If the entry cannot be found in the leaf set nor in the routing table
  - Among all the entries that the current node has, identify the ones that share with the message key a prefix at least as long as the current nodeId does
  - Choose an entry whose nodeId is numerically closer to the message key than the current node’s
  - Such an entry must exist in the leaf set if the message hasn’t reached the destination
  - Forward the message to that node
Pastry Routing
• The routing will finally converge
  - At each step, the message is sent to a node with ID closer to the message key
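The three routing cases (leaf set, routing table, fallback) can be put together in a short sketch. It assumes 16-bit ids written as 4 hex digits (b = 4), ignores wrap-around of the id space, and uses tiny hand-made leaf sets and routing tables in `STATE`; those per-node contents are invented for illustration, chosen so that the route for key 8955 from node 04F2 follows the hop sequence of the routing example in these slides.

```python
DIGITS = 4  # assumed 16-bit ids, i.e. 4 hex digits

def hexd(ident):
    return format(ident, "04X")

def shared_prefix(a, b):
    """Number of leading hex digits that a and b have in common."""
    sa, sb = hexd(a), hexd(b)
    n = 0
    while n < DIGITS and sa[n] == sb[n]:
        n += 1
    return n

def next_hop(current, key, leaf_set, table):
    """Return the node to forward to, or `current` if the message stays here."""
    p = shared_prefix(current, key)
    if p == DIGITS:
        return current                                   # exact match
    # Case 1: key falls within the leaf set range, pick the closest node.
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        return min(leaf_set + [current], key=lambda n: abs(n - key))
    # Case 2: routing table entry sharing one more digit with the key.
    entry = table.get((p, hexd(key)[p]))
    if entry is not None:
        return entry
    # Case 3: any known node with an equally long prefix that is numerically closer.
    known = leaf_set + list(table.values())
    closer = [n for n in known
              if shared_prefix(n, key) >= p and abs(n - key) < abs(current - key)]
    return min(closer, key=lambda n: abs(n - key)) if closer else current

def route(source, key, state):
    """Follow next_hop until a node keeps the message; return the path."""
    path = [source]
    while True:
        leaf_set, table = state[path[-1]]
        nxt = next_hop(path[-1], key, leaf_set, table)
        if nxt == path[-1]:
            return path
        path.append(nxt)

# Hypothetical node state: nodeId -> (leaf set, {(row, digit): nodeId}).
STATE = {
    0x04F2: ([0x3A79], {(0, "8"): 0x85E0}),
    0x85E0: ([0x620F, 0x8821], {(1, "9"): 0x8909}),
    0x8909: ([0x8821, 0x8954], {(2, "5"): 0x8957}),
    0x8957: ([0x8954, 0xAC78], {}),
    0x8954: ([0x8909, 0x8957], {}),
}
```

Calling `route(0x04F2, 0x8955, STATE)` visits 04F2, 85E0, 8909, 8957 and ends at 8954: the first three forwards use the routing table, and the last one uses the leaf set of 8957.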
Pastry Routing Example
route(k=8955,msg)
Source nodeid: 04F2
k = 8955
Hop # | Hop id | Shared prefix length
0 | 04F2 | 0
1 | 85E0 | 1
2 | 8909 | 2
3 | 8957 | 3
4 | 8954 | 3 (root of k)
Key-Based Routing [Pastry]
[Figure: overlay address space [0000,FFFF] with nodes 04F2, 3A79, 5230, 620F, 85E0, 8821, 8909, 8954, 8957, AC78, C52A, E25A; copies of obj 8955 are held by 620F and C52A, and node 8954 records "obj 8955 stored at nodes 620F,C52A"]
Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A
Data location: KBR (cont.)
• Structured networks (cont.)
  - Advantages:
    ▴ completely decentralized (no need for superpeers)
    ▴ routing algorithm achieves low hop count for large network sizes
  - Disadvantages:
    ▴ each object must be tracked by a different node
    ▴ objects are tracked by unreliable nodes (which may disconnect)
    ▴ keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid)
    ▴ the overlay must be structured according to a given topology in order to achieve a low hop count
    ▴ routing tables must be updated every time a node joins or leaves the overlay
Data location - Loosely structured overlays
Loosely structured networks: use hints for the location of objects [Freenet]
• Nodes locate objects by sending search requests containing the objectid
• Requests are propagated using a technique similar to flooding
• Objects with similar identifiers are grouped on the same nodes
[Figure: nodes A to F holding objects AE5J, 5B20 and AF02; a request for AE5J is propagated and a response returns to the source]
Data location - Loosely structured overlays
Loosely structured networks (cont.)
• A search response leaves routing hints on the path back to the source
• Hints are used when propagating future requests for similar object ids
[Figure: nodes A to F with hint tables ("AE5J: B"; "AE5J: D, 5B20: E"; "AE5J: F"); a request for AF02 is propagated using the hints]
Data location - Loosely structured overlays
• Advantages:
  - no topology constraints, flat architecture
  - searches are more efficient than with plain flooding
• Disadvantages:
  - does not support keyword-based searches
  - search requests have a TTL
  - contrary to structured overlays, loosely structured overlays do not guarantee a low number of hops, nor that the object will be found
Data location - Summary
The location schemes described previously can be classified according to two dimensions: structure and decentralization
Degree of Decentralization | Degree of Structure: low | loose | high
partial | eMule, Gnutella 2, BitTorrent | - | -
total | Gnutella | Freenet | Chord, Pastry
Effects of Churn
• Churn can have several effects on a P2P system:
  - data objects may become unavailable if all replicas disconnect
  - routing tables may become inconsistent (e.g., entries may point to disconnected nodes)
  - the overlay may become partitioned if many nodes suddenly disconnect
Churn - Preventing Partitions
• A naïve approach to preventing partitions is to increase the average node degree
• Ring partitions can be avoided by keeping a list of successor nodes
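The successor-list idea for rings can be sketched in a few lines. The helper names and the list length `r` are assumptions for illustration; the slides do not prescribe a particular list size.

```python
def successor_lists(ring, r):
    """For each node on the ring, record its next r successors in ring order."""
    n = len(ring)
    return {ring[i]: [ring[(i + j) % n] for j in range(1, r + 1)]
            for i in range(n)}

def first_alive_successor(node, succ, alive):
    """First successor of `node` that is still alive, or None if all r failed."""
    for s in succ[node]:
        if s in alive:
            return s
    return None   # the ring is broken at this node

ring = [1, 2, 3, 4, 5, 6]
alive = {1, 4, 5, 6}          # nodes 2 and 3 disconnect at the same time
```

With `r = 2`, node 1 only knows 2 and 3 and loses its way around the ring; with `r = 3` it can skip both failed nodes and reconnect to node 4. In general the ring survives as long as fewer than r consecutive nodes fail together.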
Churn Tolerance
• Node arrivals and departures must not disrupt the normal behavior of the system
  - system invariants must be maintained
    ▴ connected overlay (i.e., no partitions), low average path length
    ▴ data objects accessible from anywhere in the network
  - two types of churn tolerance:
    ▴ dynamic recovery: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions)
    ▴ static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)
Security
• Security in P2P systems is hard to enforce:
  - Users have full control of their computers
  - Modified clients may not follow the standard protocol
  - Data may be corrupted
  - Private data stored on remote computers may be disclosed
Security - Weak identities
• The user may leave the system and rejoin it with a new identity (different user id)
• If an attack is detected, the attacker can re-enter the system with a new id
• An attacker may create a large number of false identities (Sybil attack)
• Example of Sybil attack:
  - Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine
  - The attacker can intercept all traffic coming from or going to node A
[Figure: node A surrounded by nodes S1 to S6]
Security - Strong identities
• The user cannot change its identity
• Solution: use a centralized, trusted Certification Authority (CA)
  - Each new user must obtain an identity certificate
  - The certificate is digitally signed by the CA, whose public key is known by all users
  - A certificate cannot be forged (forging requires the CA’s private key)
  - To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA
• Strong identities prevent Sybil Attacks
• If an attacker is caught, it cannot easily rejoin the system
Security - Weak vs. Strong identities
• Strong identities require a centralized CA
  - New nodes must contact the CA before joining the network:
    ▴ The CA response may be slow
    ▴ If the CA is unavailable, new nodes cannot join
  - The security of the system depends on the CA:
    ▴ The CA must correctly verify the identity of the requester
    ▴ The CA’s private key must be secret
• Many P2P systems use weak identities
  - IP addresses already give some identity information
  - Some systems ensure anonymity