Dr. Sumi Helal & Dr. Choonhwa Lee
Computer & Information Science & Engineering Department
University of Florida, Gainesville, FL 32611
{helal, chl}@cise.ufl.edu



Outline
1. Introduction to peer-to-peer networking protocols (Nov. 9)
2. BitTorrent protocol (Nov. 9)
3. Peer-to-peer streaming protocols (Nov. 18)

Readings
1. I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, “Chord: A Peer-to-Peer Lookup Service for Internet Applications,” in Proc. of the ACM SIGCOMM Conference, September 2001.
2. B. Cohen, “Incentives Build Robustness in BitTorrent,” in Proc. of the Workshop on Economics of Peer-to-Peer Systems, 2003.
3. M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, and A. Venkataramani, “Do Incentives Build Robustness in BitTorrent?” in Proc. of the 4th USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2007.
4. J. Liu, S. G. Rao, B. Li, and H. Zhang, “Opportunities and Challenges of Peer-to-Peer Internet Video Broadcast,” Proc. of the IEEE, vol. 96, no. 1, pp. 11-24, January 2008.
5. M. Zhang, Q. Zhang, L. Sun, and S. Yang, “Understanding the Power of Pull-Based Streaming Protocol: Can We Do Better?” IEEE Journal on Selected Areas in Communications, vol. 25, no. 9, pp. 1678-1694, December 2007.
Slide courtesy:
Prof. Jehn-Ruey Jiang, National Central University, Taiwan
Prof. Dah Ming Chiu, Chinese University of Hong Kong, China
Chun-Hsin, National University of Kaohsiung, Taiwan
Prof. Shiao-Li Tsao, National Chiao Tung University, Taiwan
Prof. Shuigeng Zhou, Fudan University, China

P2P file sharing
  Napster, FreeNet, Gnutella, KaZaA, eDonkey/eMule, ezPeer, Kuro, BT
P2P communication
  NetNews (NNTP), Instant Messaging (IM), Skype (VoIP)
Proxies and Content Distribution Networks
  Squid, Akamai, LimeLight
P2P lookup services and applications (DHTs and global repositories)
  IRIS, Chord/CFS, Tapestry/OceanStore, Pastry/PAST, CAN
P2P multimedia streaming
  CoopNet, Zigzag, Narada, P2Cast, Joost, PPStream
Overlay testbeds
  PlanetLab, NetBed/EmuLab
Other areas
  P2P Gaming, Grid Computing





More than 200 million users registered with Skype, with around 10 million online at any time (2007)
Around 4.7 million hosts participate in SETI@home (2006)
BitTorrent accounts for about one third of Internet traffic (2007)
More than 200,000 simultaneous online users on PPLive (2007)
More than 3,000,000 users have downloaded PPStream (2008)



The client-server model
  A well-known, powerful, reliable server is the data source
  Clients request data from the server
  A very successful model: WWW (HTTP), FTP, Web Services, etc.
But it has drawbacks
  Scalability
  A single point of failure
  System administration burden
  Unused resources at the network edge

“Peer-to-Peer (P2P) is a way of structuring distributed applications such that individual nodes have symmetric roles. Rather than being divided into clients and servers each with quite distinct roles, in P2P applications a node may act as both a client and a server.”
  Excerpt from the Charter of the Peer-to-Peer Research Group, IETF/IRTF, June 24, 2003
Peers play similar roles; there is no distinction of responsibilities.

In a P2P network, every node is both a client and a server
  Each node both provides and consumes data
  Any node can initiate a connection
  No centralized data source
  The ultimate form of democracy on the Internet
As the number of clients increases, the number of servers also increases
  Highly scalable
  Distributed costs
  Increased privacy

Efficient use of resources
  Bandwidth, storage, and processing power at the edge of the network
Scalability
  Consumers of resources also donate resources
  Aggregate resources grow naturally as more peers join
Reliability
  Replicas
  Geographic distribution
  No single point of failure
Ease of administration
  Self-organization
  No need for server deployment and provisioning
  Built-in fault tolerance, replication, and load balancing







1999: Napster
2000: Gnutella, eDonkey
2001: Kazaa
2002: eMule, BitTorrent
2003: Skype
2004: Coolstreaming, GridMedia, PPLive
2004~: TVKoo, TVAnts, PPStream, SopCast, …

Two classification axes:
Whether or not the protocol relies on central indexing servers to facilitate the interactions between peers:
  Centralized
  Decentralized
  Hybrid
Whether the overlay network has some structure or is created in an ad-hoc fashion:
  Unstructured
  Structured (i.e., precise control over network topology or data placement)

Unstructured networks
  Centralized: Napster
  Decentralized: Gnutella
  Hybrid: KaZaA, Gnutella
Structured networks
  Chord, Pastry, CAN



Napster
  First P2P file sharing application
  Centralized directory to help find content
History
  In 1999, S. Fanning launches Napster
  Peaked at 1.5 million simultaneous users
  July 2001, Napster shuts down
[Diagram: a peer at 123.2.21.23 publishes “I have X, Y, and Z!” to the central directory via insert(X, 123.2.21.23); another peer asks “Where is file A?” via search(A), the directory replies 123.2.0.18, and the file is then fetched directly from that peer.]


Pros:
  Simple
  Search cost is O(1)
  Controllable (pro or con?)
Cons:
  Server maintains O(N) state
  Server does all processing
  A single point of failure
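As a concrete illustration of this directory design, here is a minimal Python sketch (class and method names are invented for illustration; this is not Napster's actual protocol):

from collections import defaultdict

class CentralIndex:
    """Napster-style central index: O(1) search at the cost of O(N) server state."""

    def __init__(self):
        # file name -> set of peer addresses that have published it
        self.index = defaultdict(set)

    def publish(self, peer_addr, files):
        """A peer announces the files it is willing to serve."""
        for name in files:
            self.index[name].add(peer_addr)

    def search(self, name):
        """Return the peers that claim to have the file (O(1) lookup)."""
        return list(self.index.get(name, ()))

    def depart(self, peer_addr):
        """Remove a departing peer from every entry it appears in."""
        for peers in self.index.values():
            peers.discard(peer_addr)

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
server.publish("123.2.0.18", ["A"])
print(server.search("A"))   # ['123.2.0.18'] -- the requester then fetches directly

Note how the server is involved only in publish and search; the actual file transfer happens peer-to-peer, which is why the directory becomes the single point of failure.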



Gnutella
  Completely distributed P2P file sharing
  Each peer floods its request to all other peers, which incurs prohibitive overheads
History
  In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
  Soon many other clients: Bearshare, Morpheus, LimeWire, etc.
  In 2001, many protocol enhancements, including “ultrapeers”
Naming: Gnutella = GNU + Nutella
  GNU is a recursive acronym: “GNU’s Not Unix”
  Nutella is a hazelnut chocolate spread produced by the Italian confectioner Ferrero
[Diagram: a query “Where is file A?” floods hop by hop through the overlay; peers holding file A route a reply back along the query path.]

Pros:
  Fully decentralized
  Search cost distributed across peers
Cons:
  Search cost is O(N) messages
  Search time is O(???)
  Nodes leave often; the network is unstable
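To make the flooding mechanics concrete, here is a toy Python simulation of TTL-limited query flooding (the peer-table layout and TTL default are illustrative assumptions, not Gnutella's wire format):

def flood_query(peers, start, filename, ttl=4):
    """Gnutella-style flooding over an unstructured overlay.
    peers: {peer_id: {'files': set, 'neighbors': [peer_id]}}"""
    hits, seen = [], {start}
    frontier = [start]
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for p in frontier:
            for n in peers[p]["neighbors"]:
                if n in seen:
                    continue            # already visited: drop the duplicate query
                seen.add(n)
                messages += 1
                if filename in peers[n]["files"]:
                    hits.append(n)      # a QueryHit is routed back to the origin
                next_frontier.append(n)
        frontier = next_frontier
        ttl -= 1                        # TTL decremented at each hop
    return hits, messages               # message count grows roughly O(N)

The TTL bounds how far a query spreads, which is exactly the trade-off the cons above describe: a small TTL may miss rare files, while a large TTL floods most of the network.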

Hierarchical supernodes (i.e., ultra-peers)
  Each supernode services a small sub-part of the network
  Indexes and caches the files in its assigned part
  Selected for sufficient bandwidth and processing power
KaZaA & Morpheus are proprietary systems
Hybrid protocol
  More efficient than old Gnutella
  More robust than Napster
[Diagram: an ordinary peer at 123.2.21.23 publishes “I have X!” to its supernode via insert(X, 123.2.21.23); a query “Where is file A?” is forwarded among supernodes (search(A) → 123.2.22.50, search(A) → 123.2.0.18) and the replies return to the requester.]
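A minimal Python sketch of this two-tier scheme, assuming a made-up in-memory interface (KaZaA is proprietary, so this is only the general idea, not its protocol):

class SuperNode:
    """Hybrid (KaZaA-style) overlay: ordinary peers index with a supernode;
    supernodes answer locally, then ask the other supernodes."""

    def __init__(self):
        self.index = {}        # file name -> peer address (this sub-part only)
        self.other_sns = []    # the other supernodes

    def publish(self, peer_addr, files):
        """An ordinary peer registers its files with its supernode."""
        for name in files:
            self.index[name] = peer_addr

    def query(self, name, forwarded=False):
        """Answer from the local index, else forward one hop to peers."""
        if name in self.index:
            return self.index[name]
        if not forwarded:      # avoid forwarding loops between supernodes
            for sn in self.other_sns:
                hit = sn.query(name, forwarded=True)
                if hit:
                    return hit
        return None

Flooding is confined to the small supernode tier, which is why the hybrid design is cheaper than flat Gnutella flooding yet avoids Napster's single central server.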

So far (n: number of participating nodes)

Centralized:
  Directory size: O(n)
  Number of hops: O(1)
Flooded queries:
  Directory size: O(1)
  Number of hops: O(n)

What we want:
  Efficiency: O(log n) messages per lookup
  Scalability: O(log n) state per node
  Robustness: surviving massive failures
Distributed hash tables: an object “y” is published under its hash key (Publish(H(y))), and a peer “x” joins the network at the position given by its own hash key (Join(H(x))). Objects and peer nodes have hash keys in the same hash space, and each object is placed at the peer with the closest hash key.

[Diagram: a hash table over the key space 0 to 2^128 - 1; nodes and data objects are mapped into the same space, and each peer node across the Internet owns a region of it.]

Each peer also tracks peers that allow it to move quickly across the hash space: a peer p tracks the peers responsible for hash keys (p + 2^(i-1)), i = 1, ..., m.

[Diagram: skip pointers from peer p to the peers at p + 2^2, p + 2^4, ..., p + 2^8 on the key space 0 to 2^128 - 1.]
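A minimal Python sketch of this placement rule, shrinking the 2^128 hash space to 2^16 for readability and reading “closest” as the successor convention used later by Chord:

import hashlib
from bisect import bisect_right

M = 2 ** 16                      # tiny stand-in for the 2**128 hash space

def h(s):
    """Hash a key or peer name into the shared circular key space."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % M

def place(object_key, peer_ids):
    """Store an object at the first peer clockwise from its hash key."""
    ring = sorted(peer_ids)
    i = bisect_right(ring, h(object_key)) % len(ring)   # wrap past 0
    return ring[i]

peers = [h(f"peer-{i}") for i in range(8)]
print(place("y", peers))         # the peer responsible for object "y"

Because peers and objects share one hash space, a node join or departure only moves the objects in that node's region, instead of rehashing everything.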


Chord
  I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, MIT, 2001
Identifiers
  m-bit identifier space for both keys and nodes
  Key identifier = SHA-1(key): Key = “LetItBe” → SHA-1 → ID = 5
  Node identifier = SHA-1(IP address): IP = “198.10.10.1” → SHA-1 → ID = 105
  Both are uniformly distributed
  How are key IDs mapped to node IDs?
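The identifier scheme in a few lines of Python (the slide's tiny IDs 5 and 105 come from a truncated toy space; actual SHA-1 values mod 2^m will differ):

import hashlib

def chord_id(s, m=7):
    """Map a key or an IP address into the m-bit identifier space."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** m)

key_id  = chord_id("LetItBe")       # key identifier  = SHA-1(key) mod 2^m
node_id = chord_id("198.10.10.1")   # node identifier = SHA-1(IP)  mod 2^m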
[Diagram: a circular 7-bit ID space with nodes N10, N32, N60, N90, N105, N120 and keys K5, K20, K80. As nodes enter the network, they are assigned unique IDs by hashing their IP address. A key is stored at its successor, the node with the next-higher ID, so N90 stores K80. Every node knows its successor in the ring; a query “Where is key 80?” is forwarded around the ring until the answer “N90 has K80” comes back.]

Finger table (FT)
  Each node keeps m additional entries
  The i-th entry points to the successor of (n + 2^(i-1))
To look up key k at node n:
  In the FT, identify the highest node n' whose ID is between n and k
  If such a node exists, the lookup is repeated starting from n'
  Otherwise, the successor of n is returned
finger[k]: the first node on the circle that succeeds (n + 2^(k-1)) mod 2^m, 1 ≤ k ≤ m
  Example (m = 6, n = 8): N42 is the first node that succeeds (8 + 2^(6-1)) mod 2^6 = 40, and N14 is the first node that succeeds (8 + 2^(1-1)) mod 2^6 = 9

Lookup(my-id, key-id)
  look in local finger table for
    highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done
[Diagram: Lookup(K19) issued at N32 on a ring with nodes N5, N10, N20, N32, N60, N80, N99, N110. Finger tables: N5: N10, N20, N32, N60, N80; N10: N20, N32, N60, N80; N20: N32, N60, N99; N32: N60, N80, N99; N99: N110, N5, N60. The query hops N32 → N99 → N5 → N10, whose successor N20 holds K19.]
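The pseudocode and the trace above can be reproduced with a small runnable Python simulation (node IDs and the 7-bit space come from the slides; the helper names are mine):

M = 7                                  # bits; the ID space is 0..2**M - 1
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def successor(k):
    """First node clockwise from key k."""
    for n in NODES:
        if n >= k:
            return n
    return NODES[0]                    # wrap around the ring

def fingers(n):
    """finger[i] = successor((n + 2^(i-1)) mod 2^m), i = 1..m."""
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

def between(x, a, b):
    """True if x lies on the clockwise arc (a, b), excluding endpoints."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(n, key):
    hops = [n]
    while True:
        # highest finger strictly between n and key becomes the next hop
        nxt = None
        for f in reversed(fingers(n)):
            if between(f, n, key):
                nxt = f
                break
        if nxt is None:
            hops.append(successor(key))   # done: the successor holds the key
            return hops
        n = nxt
        hops.append(n)

print(lookup(32, 19))   # [32, 99, 5, 10, 20] -- N20 stores K19

Each hop roughly halves the remaining clockwise distance to the key, which is where the O(log n) lookup cost comes from.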

Summary
Centralized / decentralized / hybrid
  Napster, Gnutella, KaZaA
Unstructured / structured
  Unstructured P2P: no control over topology and file placement
    Gnutella, Morpheus, KaZaA, etc.
  Structured P2P: topology is tightly controlled and placement of files is not random
    Chord, CAN, Pastry, Tapestry, etc.

Open issues
  P2P overlay topology
  Free riding: incentive mechanisms
  Topological awareness / ISP-friendliness
  NAT traversal
  Fault resilience
  P2P traffic monitoring and detection
  Security: spurious content, anonymity, trust & reputation management
  Search: full index, partial index, semantic search
  Non-technical issues: copyright infringement, intellectual property
Slide courtesy:
Prof. Dah Ming Chiu, Chinese University of Hong Kong, Hong Kong
Dr. Iqbal Mohomed, University of Toronto, Canada



Approaches to large-scale video delivery:
  IP multicast
  CDN (Content Distribution Network)
  Application layer multicast
Overlay structures
  Tree-based (push)
  Data-driven (pull)
P2P swarming
  BitTorrent, CoolStreaming


BitTorrent
  Released in the summer of 2001
  Uses basic ideas from game theory to largely eliminate the free-rider problem
    All preceding systems dealt with this problem poorly
  Unlike DHTs, no strong guarantees
  Unlike DHTs, works extremely well in practice




Basic idea
  A file is chopped into small pieces, called chunks
  Pieces are disseminated over the network
  As soon as a peer acquires a piece, it can trade it for missing pieces with other peers
  Each peer hopes to be able to assemble the entire file at the end
Components
  Web server
  The .torrent file
  Tracker
  Peers

Content discovery (i.e., file search) is handled outside of BitTorrent, using a Web server
  The Web server provides the “meta-info” file by HTTP (for example, http://bt.btchina.net)
  The information about each movie or content item is stored in a metafile such as “supergirl.torrent”

The .torrent metafile: a static file storing the necessary meta-information
  Name
  Size
  Checksums
    The content is divided into many “chunks” (e.g., 1/4 megabyte each)
    Each chunk is hashed to a checksum value
    When a peer later gets a chunk (from another peer), it can check its authenticity by comparing checksums
  IP address and port of the tracker
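A Python sketch of the per-chunk checksum scheme just described. Real .torrent files are bencoded and store binary SHA-1 piece hashes, so the field names and dict layout here are simplified stand-ins:

import hashlib

PIECE_LEN = 256 * 1024                   # 1/4 megabyte chunks

def make_metainfo(name, data, tracker):
    """Build a metafile-like dict: name, size, per-chunk checksums, tracker."""
    pieces = [data[i:i + PIECE_LEN] for i in range(0, len(data), PIECE_LEN)]
    return {
        "name": name,
        "size": len(data),
        "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
        "tracker": tracker,              # IP address and port of the tracker
    }

def verify_piece(meta, index, piece):
    """Check a chunk received from another peer against the metafile."""
    return hashlib.sha1(piece).hexdigest() == meta["piece_hashes"][index]

Because each chunk is verified independently, a peer can safely assemble a file from many untrusted peers and discard any corrupted chunk without re-downloading the rest.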
The tracker
  Keeps track of peers
  Allows peers to find one another
  Returns a random list of active peers

Two types of peers:
  Downloader (leecher): a peer who has only a part (or none) of the file
  Seeder: a peer who has the complete file and chooses to stay in the system to allow other peers to download
[Diagram: user Bob downloads Matrix.torrent from the Web server, contacts the tracker, and joins a swarm containing seeder Chris and downloaders Alice and David.]
[Diagram sequence: seven frames of a swarm with seed C and leechers A and B, each registering with the tracker, learning of one another, and trading chunks while the Web server serves the metafile.]



A file is split into chunks of fixed size, typically 256 KB
Each peer maintains a bitmap that indicates which chunks it has
Each peer reports to all of its neighboring peers (obtained from the tracker) which chunks it has
  This is the information used to build the implicit delivery trees
[Diagram: seeder Alice holds chunks {1,2,3,4,5,6,7,8,9,10}; downloaders Bob and Joe start from {} and advertise growing sets such as {1,2,3}, {1,2,3,4}, {1,2,3,5}, and {1,2,3,4,5} as chunks propagate.]



Rarest first
  Rarer pieces are given priority in downloading, with the rarest being the first candidate
  The most common pieces are postponed towards the end
  This policy ensures that a variety of pieces is downloaded from the seeder, resulting in quicker chunk propagation
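A minimal Python sketch of rarest-first selection over the piece sets that neighbors advertise (the data layout is illustrative):

from collections import Counter

def pick_rarest(my_pieces, neighbor_pieces):
    """my_pieces: set of piece indices we already have.
    neighbor_pieces: {peer: set of piece indices that peer advertises}."""
    counts = Counter()
    for pieces in neighbor_pieces.values():
        counts.update(pieces)
    # among pieces we still need and someone has, take the least replicated
    candidates = [(n, i) for i, n in counts.items() if i not in my_pieces]
    if not candidates:
        return None
    return min(candidates)[1]            # index of the rarest needed piece

neighbors = {"Bob": {1, 2, 3, 5}, "Joe": {1, 2, 3}, "Alice": set(range(1, 11))}
print(pick_rarest({1, 2, 3}, neighbors))  # 4: held only by the seeder so far

Pieces held only by the seeder get fetched first, so the swarm quickly holds a full copy even if the seeder later leaves.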
Basic idea of the tit-for-tat strategy in BitTorrent:
  Maintain 4-5 “friends” with which to exchange chunks
  If a friend is not exchanging enough chunks, get rid of him/her
    Known as “choking” in BT
  Periodically, randomly select a new friend
    Known as “optimistic unchoking” in BT
  If you have no friends, randomly select several new friends
    Known as “anti-snubbing” in BT
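One choking round under this policy might look like the following Python sketch (rates and peer names are illustrative; real BitTorrent re-evaluates choking about every 10 seconds and rotates the optimistic slot about every 30 seconds):

import random

def choking_round(upload_rate, n_friends=4):
    """upload_rate: {peer: observed rate at which that peer sends us data}.
    Unchoke the n_friends best uploaders, plus one random optimistic slot."""
    ranked = sorted(upload_rate, key=upload_rate.get, reverse=True)
    friends = set(ranked[:n_friends])            # keep peers that reciprocate
    rest = [p for p in ranked if p not in friends]
    if rest:
        friends.add(random.choice(rest))         # optimistic unchoke
    return friends                               # everyone else is choked

rates = {"Bob": 70, "Chris": 110, "David": 40, "Ed": 15, "Joe": 10}
print(choking_round(rates, n_friends=2))         # e.g. {'Chris', 'Bob', 'Ed'}

The optimistic slot is what lets newcomers with nothing to trade bootstrap into the swarm and eventually become reciprocating friends.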
[Diagram: downloader Joe exchanges chunks with Alice, Bob, Chris, David, and Ed at asymmetric rates between 5 kb/s and 110 kb/s; Joe keeps uploading to the neighbors that upload to him fastest.]