Slides for Chapter 10:
Peer-to-Peer Systems
From Coulouris, Dollimore and Kindberg
Distributed Systems:
Concepts and Design
Edition 4, © Addison-Wesley 2005
Peer to Peer Systems - Introduction
“Applications that exploit resources available at the
edges of the Internet including storage, cycles,
content and human presence” [Shirky 2000]
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4
© Pearson Education 2005
Peer to Peer Systems - Introduction
Placement of data objects across many hosts
Ability to access data with minimum overhead
Exploit existing services such as naming, routing,
data replication and security
Provide file sharing, web caching, information
distribution
Best used for fairly static data
Peer to Peer Systems - Scalability
Not feasible to have a single service provider
Administration and fault recovery dominate
Large bandwidth requirement with high dependence
Difference between server and desktop machines
diminishing
Peer to Peer - Characteristics
Design must ensure each user contributes resources
to the system
All nodes need the same functional capabilities and
responsibilities
Correct operation does not depend on the existence
of any centrally administered system
The system offers a degree of anonymity
Workload must be balanced and availability must be
ensured with minimum overhead
Peer to Peer: Issues
Computers and networks are volatile resources and
owners are under no obligation to keep them
switched on.
Users cannot be guaranteed access to a resource.
However, if there are multiple copies of a data object,
the probability of not being able to access the object
is low.
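A rough way to quantify this claim: assume (purely for illustration) that each host holding a copy is offline independently with the same probability p. Then all n copies are unreachable at once with probability p^n. Real hosts do not fail independently, so treat this as an optimistic sketch rather than a model of any real system.

```python
def unavailability(p_offline: float, replicas: int) -> float:
    """Probability that every replica is offline at once.

    Assumes hosts go offline independently with the same probability,
    which is an illustrative simplification.
    """
    if not 0.0 <= p_offline <= 1.0:
        raise ValueError("p_offline must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_offline ** replicas


# Even with hosts that are offline half the time, a few copies help a lot.
for n in (1, 2, 4, 8):
    print(f"{n} replica(s): P(object unreachable) = {unavailability(0.5, n):.4f}")
```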
Peer to Peer - Examples
First generation
Napster – Music exchange (launched 1999)
Second Generation
Freenet - file exchange (2000)
Gnutella, Kazaa, BitTorrent (2000-2004)
Third generation
Pastry, Tapestry, CAN, Chord, Kademlia. Middleware
layers (2001-2004)
Peer to Peer – Third Generation (Overlay routing)
First and second generation peer-to-peer
applications are based on IPv4 and IPv6.
Large chunks of IPv4 and IPv6 are pre-allocated and
thus unavailable.
Third generation:
Uses a uniform 2^128 name space
Derives a GUID from the resource's state, which must be
immutable (e.g. by using a secure hash function).
Peer to Peer: Secure Digest Function
Given a file f, a function H should be used to
calculate a fixed-length hash (e.g. 128 bits)
h = H(f)
h should be randomly distributed over the entire
address space
It should be difficult to invert H, i.e. to calculate f = H^-1(h)
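A minimal sketch of deriving a 128-bit GUID from immutable content. The slides do not name a particular hash function; truncating SHA-256 to 128 bits is an assumption made here for illustration, and `guid_for` is a hypothetical helper name.

```python
import hashlib

GUID_BITS = 128  # size of the GUID name space used in this sketch


def guid_for(content: bytes) -> int:
    """Return a GUID in the range [0, 2**128) derived from the content.

    Assumption: SHA-256 truncated to its first 16 bytes. Any secure
    digest with well-spread output would serve the same purpose.
    """
    digest = hashlib.sha256(content).digest()
    return int.from_bytes(digest[: GUID_BITS // 8], "big")


a = guid_for(b"some immutable file contents")
b = guid_for(b"some immutable file contents")
c = guid_for(b"some immutable file contents!")

print(f"{a:032X}")  # the same content always yields the same GUID
print(f"{c:032X}")  # a one-byte change yields an unrelated GUID
```

Because the GUID depends only on the content, the content must not change after it is published; a modified file is simply a different object with a different GUID.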
Figure 10.1: Distinctions between IP and overlay routing for peer-to-peer
applications

Scale
IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is pre-allocated according to administrative requirements.
Application-level routing overlay: Peer-to-peer systems can address more objects. The GUID name space is very large and flat (>2^128), allowing it to be much more fully occupied.

Load balancing
IP: Loads on routers are determined by network topology and associated traffic patterns.
Application-level routing overlay: Object locations can be randomized and hence traffic patterns are divorced from the network topology.

Network dynamics (addition/deletion of objects/nodes)
IP: IP routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour.
Application-level routing overlay: Routing tables can be updated synchronously or asynchronously with fractions of a second delays.

Fault tolerance
IP: Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly.
Application-level routing overlay: Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections.

Target identification
IP: Each IP address maps to exactly one target node.
Application-level routing overlay: Messages can be routed to the nearest replica of a target object.

Security and anonymity
IP: Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable.
Application-level routing overlay: Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided.
Figure 10.2: Napster: peer-to-peer file sharing with a centralized,
replicated index
[Diagram: a set of peers and replicated Napster servers, each server holding an index]
1. File location request
2. List of peers offering the file
3. File request
4. File delivered
5. Index update
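The five steps in the figure can be sketched as a toy in-memory model. This is not Napster's actual protocol or message format; the class names `IndexServer` and `Peer`, the `download` helper, and the addresses are all hypothetical, and the network is replaced by direct method calls.

```python
class IndexServer:
    """Central index: maps a file name to the peers that offer it."""

    def __init__(self):
        self._index = {}  # file name -> set of peer addresses

    def locate(self, name):
        # Steps 1-2: answer a location request with a list of peers.
        return sorted(self._index.get(name, set()))

    def update(self, name, peer_address):
        # Step 5: record that a peer now offers the file.
        self._index.setdefault(name, set()).add(peer_address)


class Peer:
    def __init__(self, address, files=None):
        self.address = address
        self.files = dict(files or {})

    def serve(self, name):
        # Step 4: deliver the file to a requesting peer.
        return self.files[name]


def download(requester, name, server, peers_by_address):
    holders = server.locate(name)                    # steps 1-2
    if not holders:
        return False
    source = peers_by_address[holders[0]]
    requester.files[name] = source.serve(name)       # steps 3-4
    server.update(name, requester.address)           # step 5
    return True


server = IndexServer()
alice = Peer("alice:6699", {"song.mp3": b"audio bytes"})
bob = Peer("bob:6699")
peers = {p.address: p for p in (alice, bob)}

server.update("song.mp3", alice.address)
ok = download(bob, "song.mp3", server, peers)
print(ok, server.locate("song.mp3"))
```

Note that only the index is centralised: the file itself travels directly between peers, and after the download the requester is listed as a further source.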
Napster - 1999
- First application for globally scalable information
storage
- Centralised indexes
- User-supplied files
- “Fairness Policy” - if you use then contribute
- Shut down in 2001 following legal proceedings
- Napster claimed they were not participating in the copying
of files
- BUT the index servers had well known addresses and
were unable to remain anonymous
Napster – 1999 Lessons learnt
- Feasibility of using end user clients to implement a
distributed application
- Identified need for fair distribution of requests to
share data
- Use network locality where possible
- Limitations
- Napster used replicated databases but the nature of the
application did not require high consistency. Most
applications need greater consistency
- Music files never updated
- No guarantees of availability – wait for music to appear
Figure 10.3: Distribution of information in a routing overlay
[Diagram: nodes A, B, C and D, each holding partial routing knowledge (A's, B's, C's and D's routing knowledge) about the objects and nodes in the overlay]
Figure 10.4: Basic programming interface for a distributed hash table (DHT)
as implemented by the PAST API over Pastry
put(GUID, data)
The data is stored in replicas at all nodes responsible for the object
identified by GUID.
remove(GUID)
Deletes all references to GUID and the associated data.
value = get(GUID)
The data associated with GUID is retrieved from one of the nodes
responsible for it.
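The three operations above can be sketched with a toy, single-process hash table. This is not the PAST or Pastry implementation: the class name `ToyDHT`, the rule that the `replicas` nodes numerically closest to the GUID are "responsible", and the small 16-bit identifier space in the usage example are all assumptions chosen for readability.

```python
class ToyDHT:
    """In-memory stand-in for a DHT offering put / get / remove."""

    def __init__(self, node_ids, replicas=3, bits=128):
        self.space = 2 ** bits
        self.replicas = replicas
        self.stores = {nid: {} for nid in node_ids}  # node id -> local store

    def _distance(self, a, b):
        # Distance on a circular identifier space.
        d = abs(a - b)
        return min(d, self.space - d)

    def _responsible(self, guid):
        # Assumption: the nodes closest to the GUID hold its replicas.
        ranked = sorted(self.stores, key=lambda nid: self._distance(nid, guid))
        return ranked[: self.replicas]

    def put(self, guid, data):
        for nid in self._responsible(guid):
            self.stores[nid][guid] = data

    def get(self, guid):
        for nid in self._responsible(guid):
            if guid in self.stores[nid]:
                return self.stores[nid][guid]
        raise KeyError(guid)

    def remove(self, guid):
        for store in self.stores.values():
            store.pop(guid, None)


dht = ToyDHT([0x1000, 0x4000, 0x8000, 0xC000, 0xF000], replicas=2, bits=16)
dht.put(0x4100, b"hello")
holders = sorted(nid for nid, store in dht.stores.items() if 0x4100 in store)
print([hex(h) for h in holders], dht.get(0x4100))
```

In a real deployment each node would hold only its own store and requests would be routed across the overlay; the single dictionary here just makes the placement rule visible.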
Figure 10.5: Basic programming interface for distributed object location and
routing (DOLR) as implemented by Tapestry
publish(GUID)
GUID can be computed from the object (or some part of it, e.g. its
name). This function makes the node performing a publish operation the
host for the object corresponding to GUID.
unpublish(GUID)
Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
Following the object-oriented paradigm, an invocation message is sent to
an object in order to access it. This might be a request to open a TCP
connection for data transfer or to return a message containing all or part
of the object’s state. The final optional parameter [n], if present, requests
the delivery of the same message to n replicas of the object.
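A minimal sketch of how the three DOLR calls relate to each other. It is not Tapestry: location records live in one local dictionary rather than being distributed across the overlay, and the extra `host` and `handler` parameters are hypothetical stand-ins for "the node performing the publish" and "the object that receives the message".

```python
class ToyDOLR:
    """Single-process stand-in for publish / unpublish / sendToObj."""

    def __init__(self):
        self._hosts = {}  # GUID -> list of (host name, message handler)

    def publish(self, guid, host, handler):
        # The publishing node becomes a host for the object.
        self._hosts.setdefault(guid, []).append((host, handler))

    def unpublish(self, guid, host):
        # The object is no longer reachable at this host.
        remaining = [(h, f) for h, f in self._hosts.get(guid, []) if h != host]
        if remaining:
            self._hosts[guid] = remaining
        else:
            self._hosts.pop(guid, None)

    def send_to_obj(self, msg, guid, n=1):
        # Deliver the message to up to n replicas and collect the replies.
        replicas = self._hosts.get(guid, [])[:n]
        return [handler(msg) for _, handler in replicas]


dolr = ToyDOLR()
dolr.publish(42, "nodeA", lambda m: f"nodeA got {m}")
dolr.publish(42, "nodeB", lambda m: f"nodeB got {m}")

one = dolr.send_to_obj("read", 42)
two = dolr.send_to_obj("read", 42, n=2)
print(one, two)
```

The contrast with the DHT interface is that the data stays at the publishing hosts; the overlay only stores where the object can be found and routes messages to it.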
Figure 10.6: Circular routing alone is correct but inefficient
Based on Rowstron and Druschel [2001]
[Diagram: circular GUID space running from 0 to FFFFF....F (2^128 - 1); labelled nodes: 65A1FC, D13DA3, D467C4, D46A1C, D471F1]
The dots depict live nodes. The space is considered as circular: node 0 is adjacent to node (2^128 - 1). The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l = 4). This is a degenerate type of routing that would scale very poorly; it is not used in practice.
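To see why leaf-set-only routing scales poorly, here is a small simulation under simplifying assumptions: node identifiers are evenly spaced, every node knows only its l neighbours on each side, and each hop forwards to the known neighbour closest to the destination. The function names and the 1024-value identifier space are hypothetical.

```python
def ring_distance(a, b, space):
    """Shortest distance between two identifiers on a circular space."""
    d = abs(a - b) % space
    return min(d, space - d)


def leaf_set(nodes, index, l):
    """The l neighbours on each side of nodes[index] on the sorted ring."""
    n = len(nodes)
    return [nodes[(index + k) % n] for k in range(-l, l + 1) if k != 0]


def route(nodes, source, dest, l, space):
    """Greedy routing that uses leaf sets only; returns the path taken."""
    path = [source]
    current = source
    while current != dest:
        candidates = leaf_set(nodes, nodes.index(current), l)
        nxt = min(candidates, key=lambda n: ring_distance(n, dest, space))
        if ring_distance(nxt, dest, space) >= ring_distance(current, dest, space):
            break  # no neighbour is closer; stop rather than loop
        current = nxt
        path.append(current)
    return path


SPACE = 1024
nodes = [i * 16 for i in range(64)]          # 64 evenly spaced live nodes
path = route(nodes, nodes[0], nodes[30], l=4, space=SPACE)
print(len(path) - 1, "hops:", path)
```

Each hop advances at most l positions around the ring, so the hop count grows linearly with the number of nodes (roughly N / 2l in the worst case), which is what makes this scheme impractical at scale.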
Figure 10.8: Pastry routing example
Based on Rowstron and Druschel [2001]
[Diagram: circular GUID space running from 0 to FFFFF....F (2^128 - 1); labelled nodes: 65A1FC, D13DA3, D4213F, D462BA, D467C4, D46A1C, D471F1]
Routing a message from node 65A1FC to D46A1C. With the aid of a well-populated routing table the message can be delivered in ~ log16(N) hops.
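The prefix-matching idea behind the ~ log16(N) figure can be sketched using the node labels from the diagram. This is a simplification, not Pastry's algorithm: the sketch picks the next hop from one global list of nodes, whereas a real node consults its own routing table and leaf set. The helper names are hypothetical.

```python
import math


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hexadecimal digits that two GUIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_route(source, dest, nodes):
    """At each hop, move to a node sharing a longer prefix with dest.

    Choosing the smallest available improvement mimics resolving the
    destination roughly one hexadecimal digit at a time.
    """
    path = [source]
    current = source
    while current != dest:
        p = shared_prefix_len(current, dest)
        better = [n for n in nodes if shared_prefix_len(n, dest) > p]
        if not better:
            break
        current = min(better, key=lambda n: shared_prefix_len(n, dest))
        path.append(current)
    return path


nodes = ["D13DA3", "D4213F", "D462BA", "D467C4", "D46A1C", "D471F1"]
path = prefix_route("65A1FC", "D46A1C", nodes)
print(" -> ".join(path))

# Rough hop estimate for a network of a million nodes.
print(math.ceil(math.log(10**6, 16)))
```

Since every hop fixes at least one more base-16 digit of the destination, the number of hops grows with the number of digits needed to tell N nodes apart, which is about log16(N).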
Figure 10.7: First four rows of a Pastry routing table
BitTorrent
Enables efficient transfer of (large) files from a remote
server to a client machine.
The µTorrent client was originally developed by Ludvig Strigeus
Many other clients exist
Just another protocol!
References http://bittorrent.org
BitTorrent Features
Small footprint
Bandwidth prioritization
Scheduling
RSS scheduling (see below)
Cuts files into small chunks (e.g. 256 KB)
Uses SHA-1 hashes to verify the integrity of each chunk
Protocol Encryption
Really Simple Syndication (1)
Family of web feed formats
Typically used for information that changes regularly
e.g. Blog entries, weather updates, audio, video
User subscribes to ‘web feeds’; data is updated
regularly.
Web feed includes data and metadata (data about
data) that, for example, may describe how the data is
displayed.
Web feeds can be customized to the receiving
appliance
Really Simple Syndication (2)
An RSS reader can run on a desktop computer (or
similar device) and request updates at regular
intervals.
The Basics of BitTorrent (1)
A Peer to Peer architecture is one that shares
data between different systems, i.e. there is no
central controller.
BitTorrent defines a Peer-to-Peer protocol
Peer-to-Peer systems should still be able to perform
tasks even if part of the network is down.
The BitTorrent protocol is attributed to Bram Cohen
The basics of BitTorrent (2)
Basic premise was to distribute data in such a way
that the original distributor would be able to
decrease the bandwidth usage while still being able
to reach at least the same number of people.
The idea was to break the file into smaller segments
called pieces.
To save bandwidth, each person downloading (a
peer) would have the pieces they acquired available
to other peers in the swarm (the entire network of
people connected to a torrent).
The basics of BitTorrent (3)
A seed is a peer that has every piece.
Over time a peer will acquire all pieces and become
a seed themselves
The availability of a torrent is the number of
complete copies of the file in the swarm.
Trackers provide a centralised service that keeps
track of the IP addresses of the other peers (those holding the
files). For each given swarm a tracker only needs to
collect a peer's IP address and port number.
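A minimal sketch of the tracker's bookkeeping as described above: one set of (IP address, port) pairs per swarm. Real trackers are reached over the network and exchange more fields; the `Tracker` class, its `announce` method and the sample addresses are hypothetical.

```python
class Tracker:
    """Keeps, for each swarm, the addresses of the peers taking part."""

    def __init__(self):
        self._swarms = {}  # torrent id -> set of (ip, port)

    def announce(self, torrent_id, ip, port):
        """Register a peer and return the other peers already in the swarm."""
        swarm = self._swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {(ip, port)})
        swarm.add((ip, port))
        return others


tracker = Tracker()
first = tracker.announce("ubuntu.iso", "10.0.0.1", 6881)
second = tracker.announce("ubuntu.iso", "10.0.0.2", 6881)
third = tracker.announce("ubuntu.iso", "10.0.0.3", 51413)
print(first, second, third)
```

The tracker never handles file data; it only introduces peers to one another, after which pieces are exchanged directly between them.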
The basics of BitTorrent (4)
BitTorrent speeds are not guaranteed for any torrent
swarm.
Point to point protocols can be slow depending on
network speeds and usage.
Swarms that contain more seeds and peers are not
necessarily faster.
Trackers are often the bottleneck
The basics of BitTorrent (5)
Installation of client
Install BitTorrent
Surf the web for the required file
Choose where to save the file locally
Wait for download to complete
Exit downloader