Download node - Open Learning Environment

Document related concepts

Peering wikipedia , lookup

Distributed firewall wikipedia , lookup

Backpressure routing wikipedia , lookup

Airborne Networking wikipedia , lookup

List of wireless community networks by region wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Distributed operating system wikipedia , lookup

CAN bus wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Kademlia wikipedia , lookup

Peer-to-peer wikipedia , lookup

Transcript
Varia
• Assignment?
• Nächsten Dienstag 4 Stunden (8:30-12:30)
Lab, beide Labgruppen zusammen
Distributed Systems
20. Peer-to-Peer Systems
Simon Razniewski
Faculty of Computer Science
Free University of Bozen-Bolzano
A.Y. 2016/2017
Outline
1.
2.
3.
4.
Distributed System Architectures
P2P - Overview
Unstructured P2P - Gnutella
Structured P2P – Chord
1. (Distributed) System Architectures
Reminder
• In general Distributed Systems (DS) are designed
to share resources.
• What is a resource?
– e.g., programs, data, CPUs
• Advantages of DS
–
–
–
–
–
Fault tolerance
Cost reduction (e.g., by pooling resources)
Scalability
Load Balancing
…
• Disadvantages?
Architectures
• DS are software architectures.
– It is important to organize the various software
modules.
– This organization can happen according to
different architectural styles.
•
•
•
•
Layered architectures
Object-based architectures
Data-centric architectures
Event-based architectures
Architectural Styles
Layered architectures
• Example: TCP/IP protocol stack
Architectural Styles
Object-based architectures
• Components are connected in a RPC style
• Example:
– Client-Server architecture, Java RMI
Architectural Styles
data-centric architectures
• Processes communicate via a shared repository
• Example:
– Distributed File Systems
Architectural Styles
Event-based architectures
• Processes communicate through a mechanism of event
propagation.
• Publish/Subscribe systems
– Processes publish events
– Only those processes that subscribed will be notified
 Publishers and subscribers don’t need to explicitly refer each
other (MOM)
2. P2P - Overview
Peer-to-Peer: History
• ICQ
– 1996: first P2P instant messaging tool
– Server maintains the status of the users, communication happens P2P
(till 2011, also Skype worked that way)
• Napster
– 1999 first file sharing system for music
– 2001 shutdown over legal issues
• Gnutella
– 2000 first truly decentralized P2P filesharing system
• Bittorrent
– Developed 2001, most used system today
• Bitcoin
– Distributed virtual currency, released 2009
Peer-to-Peer Systems
Motivation
• A huge number of nodes participating in the
network:
– Have demands towards the use of resources
– Classical client/server systems: Server are the
bottlenecks
– But: Clients have a lot of compute power by
themselves
– E.g., Bitcoin network in 2013: 8x the compute power
of the top 500 supercomputers
• Main questions:
– How can we utilize the power of the clients?
– How can we achieve fairness?
Peer-to-Peer Systems
Overlay Networks
• P2P networks form an overlay network on top of the
Internet (TCP/IP network)
• TCP/IP networks form an overlay network politically and
technically over the underlying telecom
• Both:
– introduce their own addressing scheme (e.g. peer IDs, IP
addresses)
– emphasize redundancy
• Complexity of P2P systems lies in the design of the
overlay network + the message exchange protocol
• Super-peer based (e.g. Napster, (Bittorrent))
• Unstructured (e.g. Gnutella)
• Structured (e.g. Chord)
Peer-to-Peer
Overlay Networks
Client Server vs. P2P content discovery
Index
Overlay
Network
Client
Network
Network
Provider
Provider
Provider
Provider
Node
Node
• Clients interacts with Servers after finding useful
information provided by Indexing services
• No intelligence in the network
Node
• Search in the network
• “Content”-based
• Intelligence in the network
Peer-to-Peer
Features
• Resources (location, sharing)
– Relevant resources located at (private) nodes
(peers)
• uncontrolled, voluntary offers
– Resources of peers are heterogeneous
• bandwidth, CPU power, storage space, …
• quality depends on device / connectivity
– Distributed resource location
• widely spread: requires proper mechanism to find and
use
Peer-to-Peer
Features
• Networking
– High churn rate (ratio of peers joining/leaving a
service over time)
• peers are online for an unpredictable limited time
• heterogeneous connection types
• often operating behind firewalls or NAT gateways
• Self-organizing system
– Relevant mechanisms performed by peers in the
network
• Resource search, allocation and scheduling
• No central control
Peer-to-Peer
vs Grid Computing
• A computing paradigm to interconnect the
existing data processing centers to Virtual
Organizations and operate on data in a
distributed way.
• Commonly in commercial applications
• Still some centralized components
Peer-to-Peer
vs Cloud Computing
• A computing paradigm to access distributed pools of
resources.
– Resources: storage, bandwidth, CPU
• Resource providers: typically companies
• Controlled environment
– No malicious users
– No (very little) churns
– Homogeneous devices
• Parts of the architecture are centralized
• Different goal
– Software as a service (e.g. Google docs)
– Platform as a service (e.g. Google app engine, Windows Azure)
– Infrastructure as a service (e.g. Amazon EC2)
Peer-to-Peer
Popular systems
Peer-to-Peer
Traffic
http://www.exfo.com/PageFiles/37706/Robert%20Fitts_Cisco%20Graph.jpg
(Since 2012, reported slight decrease vs. streaming)
Structured P2P architectures
• The overlay is constructed by a deterministic
procedure.
• Distributed Hash Tables (DHTs) are commonly
used to organize data.
• DHTs:
– Each data item is assigned a key in a key space.
– Each node is assigned a key in the same way.
– Idea: map the key of a data item to a unique node key.
The mapping is done by using a distance metric
Structured P2P architectures
• When looking for a piece of data, it is
important to return the id of the node
responsible for that piece of data.
• This is achieved by performing routing of the
request from the node that originated the
request to the node responsible for the key
associated to the data one is looking for.
Structured P2P architectures - Chord
• Nodes are logically organized in a ring.
• Node ids can be obtained e.g., by hashing the
IP address of the node.
• A piece of data having key k is assigned to a
node having id>k.
– This node is called successor of k denoted by
succ(k).
Structured P2P architectures- Chord
N63
The distance to peer nodes
increases exponentially.
N2
N6
N56
N11
N50
put ( 28, A )
performed on
N48
N42
N42
Key 28
…
N33
N33 is the target node
N27
N17
Data in this area
is stored into Node 27
Unstructured P2P architectures
• The network looks like a random graph.
• Each node maintains a list of neighbors.
– How to construct this list ?
• e.g., by contacting popular nodes.
• Each node has a partial view of the whole
network.
• When a piece of data has to be found the
network will be flooded.
– Each node contacts its neighbors and so forth.
Unstructured P2P architectures
• Centralized P2P
– e.g., Napster
Unstructured P2P architectures
• Fully decentralized
– e.g., Gnutella 0.2
Super-peer-based P2P architectures
• Some nodes will act as servers for a subset of
peers.
– These nodes are called super-peers (SPs).
• SPs are organized in a P2P way.
• Problems:
– Leader election (i.e., which nodes are good
candidates for being SPs ?)
Super-peer-based P2P architectures
• Combine centralized and decentralized aspects
– e.g., KaZaa
Overlay Networks
Types of queries
• Lookup
– Key-value as in the case of hash tables
• Given a key, returns a single values/file
• Match
– Given a sequence of keywords
• Return all documents matching the terms
• Structured networks have 100% probability of success for
Lookup
• Unstructured networks do not have the same guarantee
Overlay Networks
Classification
Homogeneous/fully
decentralized
4. No guarantee of
success
Heterogeneous/
super-peer-based
4. No guarantee of
success
3. Unstructured P2P
Unstructured Overlay Networks
Types
• Homogeneous (all nodes are assumed equal)
• Heterogeneous (nodes have various roles)
– Decentralized File Sharing with Distributed Servers
– Decentralized file Sharing with Super Nodes
• Flat: all nodes are in one overlay
• Hierarchical: more than one overlay exists
Unstructured Overlay Networks
Centralized Networks
• Central index maintaining information about
– shared object (e.g., file name)
– location (IP address where the object is located)
• Normal peers maintain the objects
– each peer maintains its own objects
– decentralized storage
– file transfer between peers
• Issues:
– Central point of failure
– Unbalanced cost: the server is a bottleneck
Unstructured Overlay Networks
Centralized Networks
SERVER
Node G
d1, B
Node A
Node D
Node E
Node B
Node C
d1
transfer of d1
Node F
Unstructured Overlay Networks
Centralized Networks
• Advantages?
– Search complexity O(1)
– Simple and fast
• Disadvantages?
– No intrinsic scalability
– Single point of failure/attack
– Complex queries difficult
– A single server cannot handle a large number of
peers
Unstructured Overlay Networks
Distributed Networks
• All peers play the same role
• Search is carried out by a cooperation among
peers
• Each peer has a local view of the network
• Main motivation
– Provide robustness
– No single point of failure
– Scalability
Unstructured Overlay Networks
Distributed Networks - Challenges
• How to join the network ?
– No central index
• Knowing at least 1 node that already is in the network
– Constructing the local view of the network
• How to perform search ?
– Flooding
– Random walk
– ..
• How to deliver contents?
– Peer to Peer communication
Unstructured Overlay Networks
Distributed Networks
• PROs?
– Each peer can fail/leave the network
– Scalable
• CONs?
– Slow and expensive search
– No guarantee of completeness (i.e., finding ALL
the objects that satisfy a request)
Unstructured Overlay Networks
Distributed Networks - Search
• Flooding (Breadth First Search)
– Send the message to all neighbor peers apart from
the peer that delivered the message
– Keep track of the messages processed to avoid
cycles
Query
Unstructured Overlay Networks
Distributed Networks - Flooding
Source
node
Message Count
Length of the path
5
Target
node
2615
113
1
Unstructured Overlay Networks
Distributed Networks - Search
• Expanding Ring
– Successive floods with increasing TTL
• Start with small TTLs, if no success, then increase it
– A.k.a. iterative deepening (e.g. games)
• Properties
– Improved performance
• If content follow a Zipf distribution
– Message overhead is high
Unstructured Overlay Networks
Distributed Networks - Search
• Random walk
– Forward the query to a randomly selected
neighbor
• Message overhead is reduced significantly
• Increased latency
• Multiple random walks
– Reduce latency
– Generate more load
• Termination mechanism
– TTL based
Unstructured Overlay Networks
Distributed Networks – Random walk
Assume k=2, each message is
sent to only 2 neighbors
Target
node
Source
node
Gnutella Protocol 0.4
Unstructured Overlay Networks
Gnutella protocol
• Messages have a GUID (globally unique identifier) to
avoid loops
• TTL max. equal to 7
• Phases:
– Connecting
• PING
• PONG
– Querying
• QUERY
• QUERY HIT
– Transfer
• HTTP GET or PUSH (in case of firewall)
Unstructured Overlay Networks
Gnutella protocol – Phase 1
• To connect to a Gnutella network, peer must initially
know
– (at least) one member node of the network and connect to
it
• This/these first member node(s) must be found by
other means
– find first member using other medium (Web, chat, ...)
– nowadays host caches are usually used
• Share further neighbours to get more connections
Unstructured Overlay Networks
Gnutella algorithm – Phase 2
• A node that receives a QUERY message
– increases the HOP count field of the message and
• IF(HOP <= TTL && ! QUERY.GUID already received) THEN
forwards it to all nodes except the one the peer received it from
– Nodes also checks whether they can answer and in case
• Send a QUERY HIT message to the requestor
– QUERY HIT messages contain
• The IP address of the sender
• The message is routed through the same path it arrived
Unstructured Overlay Networks
Gnutella algorithm – Phase 3
• A peer sets up a HTTP connection
– actual data transfer is not part of the Gnutella protocol
– HTTP GET is used
• Special case: peer with the file located behind a
firewall/NAT gateway the downloading peer
– cannot initiate a TCP/HTTP connection
• can instead send the PUSH message asking the other peer to
initiate a TCP/HTTP connection to it and then transfer (push) the
file via it
• does not work if both peers are behind firewalls
Unstructured Overlay Networks
Gnutella algorithm – Scalability
• A TTL (Time To Live) of 4 hops for the PING messages leads to
a known topology of roughly 8000 nodes
– TTL in the original Gnutella client was 7 (not 4)
• Gnutella in its original version (V 0.4) suffers from a range of
scalability issues due to
– fully decentralized approach
– flooding of messages
• Measured message frequency
(Portman et al.):
• Gnutella 0.6: Superpeers
Unstructured Overlay Networks
Gnutella algorithm – free riders
• Study results (since e.g. Adar/Hubermann 2000):
– 70% of the Gnutella users share no files
– 90% do not answer to queries
• Some solutions
– incentives for sharing: peers only accept connections /
forward messages from peers that share contents
– Micro-payment (e.g., Mojo Nation)
Bittorrent
• BitTorrent is a system for downloading files.
• It does not provide all the functionalities of a
typical P2P system (e.g., search).
• To share a content a user create a .torrent file
containing:
– Metadata about the content to be shared
– Information about the tracker, that is, the node
that coordinates the distribution
Bittorrent - Features
• The user that provides the file chops it into
pieces of size between 64K and 1 MB.
• When a client is looking for a file it locates the
.torrent file pointing to the tracker.
• The tracker tells which other peers (called
leechers) are downloading the file.
Bittorrent - Functionality
• Replicas of the downloaded parts are created.
– More leechers downloading more replicas.
– As soon as the leecher has completed a piece, it can share
it with others
– Share rarest first
• All the pieces are put together to assemble the file.
• The main objective of BitTorrrent is to guarantee
cooperation.
– Avoiding free riding (i.e., nodes that only download
and do not provide contents)
– Tracker hands addresses of pieces only to peers that
share received pieces (tit-for-tat)
Bittorrent - Illustration
(click)
Bittorrent - Remarks
• Tracker normally does not host the file
• Torrent file hosts: Same applies
• From thepiratebay.org:
No torrent files are saved at the server. That means no copyrighted
and/or illegal material are stored by us. It is therefore not possible to
hold the people behind The Pirate Bay responsible for the material that
is being spread using the site.
• Today: Trackerless torrents (see structured P2P next)
Legal and economical aspect of filesharing
What is legal?
What is illegal?
Should filesharing be forbidden?
What should the government do?
What should producers of digital content do?
Assignment 3
• Many nice solutions!
• Why no security for ports needed?
– Security at system level (firewall)
– No exposure to remote code (in contrast to remote objects)
• Concurrency is hidden, nevertheless an issue!
– E.g., what if two clients register at the same time?
• Structure of documentation
– Title, subheadings, paragraphs, boldfacing, enumerations, bullets,
screenshots, tables of text cases/differences, …
– Not just monolithic text blocks
4. Structured P2P Networks Chord
Distributed Hash Tables
• Hash tables allow very quick access to large
sets of objects
– How fast?
• Can we distribute them?
• Why would we want to?
Standard hashing and why it does not work
System architectures
Distributed Hash Tables
• Distributed Hash Tables (DHTs):
– Each data item is assigned a key in a key space.
– Each node is assigned a key in the same way.
– Idea: map the key of a data item to a unique node
key by using a distance metric
Structured P2P architectures
Chord
• Developed in 2001 at MIT
• Efficient node localization
– provable performance, proven correctness
• Chord is a distributed lookup protocol
– Distributed version of traditional hash tables
• Support of just one operation:
– given a key, Chord maps the key onto a node
• More concretely
– Chord provides the IP address of the node
responsible for the sought key
Structured P2P architectures
Chord – consistent hashing
• The Hash function assigns each node and key
an m-bit identifier using a base hash function
such as SHA-1 (Secure Hash Standard)
– ID(node) = hash(IP, Port)
– ID(key) = hash(key)
• Ensures that when an n-th node joins (or
leaves) the network, only an O(1/N) fraction of
the keys are moved to a different location
Structured P2P architectures
Chord – consistent hashing
• In an m-bit space there are 2m identifiers
• How to build the ring?
– Identifiers are ordered in an identifier circle
modulo 2m
– The circle is called the Chord
• The key k is assigned to the node whose
identifiers is equal to or follows k in the
identifier space
• This node is called the successor of k
Structured P2P architectures
Chord – consistent hashing
identifier
node
6
1
0
successor(6) = 0
6
identifier
circle
6
5
2
2
3
4
key
successor(1) = 1
1
7
X
2
successor(2) = 3
Structured P2P architectures
Chord – node join and departure
• When a node n joins the network
– certain keys assigned to n’s successor become
assigned to n
• When node n leaves the network
– all of its assigned keys are reassigned to n’s
successor.
Structured P2P architectures
Chord – node join (6)
keys
5
7
Certain keys assigned to
succ(6) (i.e., 0) now are
assigned to 6
keys
1
0
1
7
keys
6
2
5
3
4
keys
2
Structured P2P architectures
Chord – node departure (1)
keys
7
Certain keys are assigned
to succ(1) (i.e., 3) when 1
leaves
keys
1
0
1
7
keys
6
6
2
5
3
4
keys
2
Structured P2P architectures
Chord – (simple) lookup
• A very small amount of routing information
suffices to implement consistent hashing in a
distributed environment
• If each node knows only how to contact its current
successor node on the identifier circle, all node
can be visited in linear order.
• Queries for a given identifier could be passed
around the circle via these successor pointers until
they encounter the node that contains the key.
– Is this efficent ?
Structured P2P architectures
Chord – (simple) lookup
• Pseudo code for finding the successor:
// ask node n to find the successor of id
n.find_successor(id)
if (id  [n, successor])
return successor;
else
// forward the query around the circle
return successor.find_successor(id);
Structured P2P architectures
Chord – (simple) lookup
• The path taken by a query from node 1 for key
7:
keys
7
lookup (7)
0
1
7
6
2
5
3
4
Structured P2P architectures
Chord – efficient lookup
• To accelerate lookups, each Chord node
maintains additional routing information
• Each node maintains a routing table with (at
most) m entries (where N=2m) called the
finger table
• The ith entry in the table at node n will be:
FTn[i]=successor(n+2i-1)
– The distance from n increases exponentially
Structured P2P architectures
Chord – finger table
finger table
start
For.
0+20
0+21
0+22
1
2
4
1
6
succ.
1
3
0
finger table
For.
start
0
7
keys
6
0
1+2
1+21
1+22
2
3
5
succ.
keys
1
3
3
0
2
5
3
4
finger table
For.
start
0
3+2
3+21
3+22
4
5
7
succ.
0
0
0
keys
2
Structured P2P architectures
Chord – finger table
• Each node stores information about:
– Its direct predecessor
– A small number of successor nodes
– Knows more about nodes closely following it than about
nodes farther away
• A node’s finger table generally does not contain
enough information to determine the successor
of an arbitrary key k
• Repetitive queries to nodes that immediately
precede the given key will lead to the key’s
successor eventually
Structured P2P architectures
Chord – lookup with finger table
• How to perform lookup by exploiting the
finger table?
– Search in finger table for the highest node q that
precedes id
q=MAX(FT≤id)
– Invoke find_successor from that node q
=> Number of messages O(log N)!
Structured P2P architectures
Chord – finger table
63
The distance to peer nodes
increases exponentially.
finger table
distance succ.
N11+1
N17
N11+2
N17
N11+4
N17
N11+8
N27
N11+16 N27
N11+32 N48
2
6
56
11
50
finger table
Distance succ.
N42+1
N48
N48
N42+2
N48is responsible
N42+4
The
node that
N50 28 is N11
N42+8
for the key
N42+16 N63
N42+32 N11
get( 28)
performed on
48
42
N42
Key 28
27
33
N33 is the target node
17
finger table
distance succ.
N27+1
N33
N27+2
N33
N27+4
N33
N27+8
N42
N27+16 N48
N27+32 N63
79
Structured P2P architectures
Chord – maintenance
• Basic “stabilization” protocol is used to keep
nodes’ successor pointers up to date
– this is sufficient to guarantee correctness of
lookups
• Those successor pointers can then be used to
verify the finger table entries
• Every node runs stabilize periodically to find
newly joined nodes
Structured P2P architectures
Chord – stabilization
• “Stabilization” protocol contains 6 functions:
– create()
– join()
– stabilize()
– notify()
– fix_fingers()
– check_predecessor()
Structured P2P architectures
Chord – join
• When node n first starts, it calls n.join(n’),
where n’ is any known Chord node.
• The join() function asks n’ to find the
immediate successor of n.
• join() does not make the rest of the network
aware of n.
Structured P2P architectures
Chord – stabilization
// create a new Chord ring.
n.create()
predecessor = null;
successor = n;
// join a Chord ring containing node n’.
n.join(n’)
predecessor = null;
successor = n’.find_successor(n);
Structured P2P architectures
Chord – stabilization
• Each time node n runs stabilize()
– it asks its successor for its predecessor p,
– it decides whether p should be n’s successor instead.
• stabilize() notifies node n’s successor of n’s
existence, giving the successor the chance to
change its predecessor to n.
• The successor does this only if it knows of no
closer predecessor than n.
Structured P2P architectures
Chord – stabilization
// called periodically. verifies n’s immediate
// successor, and tells the successor about n.
n.stabilize()
x = successor.predecessor;
if (x < n.successor)
successor = x;
successor.notify(n);
// n’ thinks it might be our predecessor.
n.notify(n’)
if (predecessor is null or n’>predecessor)
predecessor = n’;
Structured P2P architectures
Chord – join and stabilization

np
succ(np) = ns
succ(np) = n
n
nil

predecessor = nil

n acquires ns as successor via some n’
n runs stabilize

n notifies ns being the new predecessor

ns acquires n as its predecessor

np runs stabilize

pred(ns) = n
pred(ns) = np
ns
n joins




np asks ns for its predecessor (now n)
np acquires n as its successor
np notifies n
n will acquire np as its predecessor

all predecessor and successor pointers are now
correct

fingers still need to be fixed, but old fingers will
still work
Structured P2P architectures
Chord – fix finger
• This is how new nodes initialize their finger
tables
• Each node also periodically calls fix fingers to
make sure its finger table entries are up-todate (crashes)
Structured P2P architectures
Chord – fix finger
// called periodically, refreshes finger table entries.
n.fix_fingers()
for (i=0..m-1)
finger[i] = find_successor(n + 2i);
Structured P2P architectures
Chord – node failures
• Key step in failure recovery is maintaining correct
successor pointers
– Fingers are “only” shortcuts
• To help achieve this, each node maintains a
successor-list of its r nearest successors on the ring
• If node n notices that its successor has failed, it
replaces it with the first live entry in the list
Structured P2P architectures
Chord – experimental results
• Latency grows slowly with
the total number of nodes
• Path length for lookups is
about ½ log2N (212 nodes)
• Chord is robust in the face
of multiple node failures
• Original paper: Ion Stoica,
Robert Morris, David R.
Karger, M. Frans Kaashoek,
and Hari Balakrishnan.
Chord: A scalable peer-topeer lookup service for
internet applications.
SIGCOMM 2001
Bittorrent – Trackerless torrents
• Keys are filehashes, values are sets of IP addresses
• Replication among neighbours
• Protocol for building the overlay and routing: Kademlia
–
–
–
–
Very similar to Chord
Finger tables also contain exponentially growing entries
But: Distance calculated based on XOR (unintuitive)
Advantages
• Easier to calculate
• Symmetric distance function  Neighbourhood relation is symmetric  Easier
communication on join and leave
• Further reading:
– http://stackoverflow.com/questions/1332107/how-does-dht-intorrents-work
– http://tutorials.jenkov.com/p2p/peer-routing-table.html
Learned today
• Peer-to-Peer
– Structured (Chord)
•
•
•
•
•
Logarithmic lookup via finger tables
100% success for lookup
Node join and leave
How to do lookup
Similar implementations underly Bittorrent (“Trackerless torrents”)
– Unstructured
• Centralized (Napster, Bittorrent (?))
• Decentralized (Gnutella)
– Only flooding/random walk possible
– Sometimes with super peers