Peer-to-Peer Applications
Course on Computer Communication and Networks, CTH/GU
The slides are adaptations of slides by the authors of the course's main textbook, by the authors listed in the bibliography section, and by Jeff Pang.

P2P file sharing example
- Alice runs a P2P client application on her notebook computer
- She connects to the Internet intermittently and gets a new IP address for each connection
- She asks for "Hey Jude"
- The application displays other peers that have a copy of "Hey Jude"
- Alice chooses one of the peers, Bob
- The file is copied from Bob's PC to Alice's notebook via HTTP
- While Alice downloads, other users upload from Alice
- Alice's peer is both a Web client and a transient Web server
All peers are servers = highly scalable!

Intro
- P2P has quickly grown in popularity
  - dozens or hundreds of file-sharing applications
  - many millions of people worldwide use P2P networks
  - audio/video transfer now dominates traffic on the Internet
- But what is P2P?
  - searching or location?
  - computers "peering"?
  - taking advantage of resources at the edges of the network
    - end-host resources have increased dramatically
    - broadband connectivity is now common

The Lookup Problem
[Figure: a publisher at node N4 stores (key = "title", value = MP3 data); a client at N6 issues Lookup("title") into an Internet of nodes N1-N6 — where should the query go?]

The Lookup Problem (2)
Common primitives:
- Join: how do I begin participating?
- Publish: how do I advertise my file?
- Search: how do I find a file?
- Fetch: how do I retrieve a file?

Overview
- Centralized database: Napster
- Query flooding: Gnutella
- Hierarchical query flooding: KaZaA
- Swarming: BitTorrent
- Unstructured overlay routing: Freenet
- Structured overlay routing: distributed hash tables (DHTs)
- Other

P2P: centralized directory
Original "Napster" design (1999, S. Fanning):
1. When a peer connects, it informs the central directory server of its IP address and its content
2. Alice queries the directory server for "Hey Jude"
3. Alice requests the file directly from Bob
Problems?
- single point of failure
- performance bottleneck
- copyright infringement

Napster: Publish
[Figure: a peer at 123.2.21.23 tells the central server "I have X, Y, and Z!"; the server records insert(X, 123.2.21.23), and so on.]

Napster: Search
[Figure: a peer asks the central server "Where is file A?"; the server replies search(A) -> 123.2.0.18, and the peer fetches the file from 123.2.0.18.]

Napster: Discussion
- Pros:
  - simple
  - search scope is O(1)
  - controllable (pro or con?)
- Cons:
  - server maintains O(N) state
  - server does all the processing
  - single point of failure

Gnutella: History
- In 2000, J. Frankel and T. Pepper of Nullsoft released Gnutella
- Soon many other clients appeared: Bearshare, Morpheus, LimeWire, etc.
- In 2001, many protocol enhancements, including "ultrapeers"

Gnutella: Overview
Query flooding:
- Join: on startup, the client contacts a few other nodes; these become its "neighbors"
- Publish: no need
- Search: ask neighbors, who ask their neighbors, and so on... when/if the file is found, reply to the sender
- Fetch: get the file directly from the peer

Gnutella: Search
[Figure: a query "Where is file A?" floods from neighbor to neighbor; nodes holding file A send a Reply back along the query path.]

Gnutella: protocol
- Query messages are sent over existing TCP connections
- peers forward Query messages
- a QueryHit is sent back over the reverse Query path
- file transfer: HTTP
- scalability: limited-scope flooding
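To make limited-scope flooding concrete, here is a minimal sketch (hypothetical class and method names; in-memory objects stand in for TCP connections) of a TTL-bounded query whose hit travels back along the reverse query path:

```python
import uuid

class GnutellaNode:
    """Toy Gnutella-style peer: floods queries to neighbors with a TTL,
    and routes hits back along the reverse path."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)       # files stored locally
        self.neighbors = []           # overlay edges (stand-ins for TCP links)
        self.seen = set()             # query ids already handled (loop avoidance)

    def query(self, filename, ttl=4, qid=None, from_node=None):
        """Returns the name of a node holding `filename`, or None."""
        qid = qid or uuid.uuid4()
        if qid in self.seen:          # duplicate of a query we forwarded: drop it
            return None
        self.seen.add(qid)
        if filename in self.files:    # the QueryHit propagates back up the
            return self.name          # call chain, i.e., the reverse query path
        if ttl == 0:                  # limited-scope flooding: stop here
            return None
        for n in self.neighbors:
            if n is not from_node:
                hit = n.query(filename, ttl - 1, qid, self)
                if hit:
                    return hit
        return None

# Tiny overlay a - b - c, where only c holds the file.
a, b, c = GnutellaNode("a", []), GnutellaNode("b", []), GnutellaNode("c", ["heyjude.mp3"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
print(a.query("heyjude.mp3"))  # -> "c"; the fetch itself would then use HTTP
```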
Gnutella: More on peer joining
1. A joining peer X must find some other peer already in the Gnutella network: it uses a list of candidate peers (e.g., from gnutellahosts.com)
2. X sequentially attempts TCP connections to peers on the list until a connection is set up with some peer Y
3. X sends a Ping message to Y; Y forwards (floods) the Ping message
4. All peers receiving the Ping message respond with a Pong message
5. X receives many Pong messages; it can then (later on, if needed) set up additional TCP connections

Query flooding: Gnutella
- fully distributed: no central server
- public-domain protocol
- many Gnutella clients implement the protocol
Overlay network:
- there is an edge between peers X and Y if there is a TCP connection between them
- all active peers and edges together form the overlay net
- an edge is not a physical link
- a given peer is typically connected to fewer than 10 overlay neighbors
What is routing in p2p networks?

Gnutella: Discussion
- Pros:
  - fully decentralized
  - search cost is distributed
- Cons:
  - search scope is O(N)
  - search time is O(???)
  - nodes leave often, so the network is unstable
- Improvements:
  - limiting the depth of search
  - random walks instead of flooding

Aside: Search Time?

Aside: All Peers Equal?
[Figure: peers with very different access links — 56 kbps modems, 1.5 Mbps DSL lines, a 10 Mbps LAN — all participate in the same overlay.]

KaZaA: History
- In 2001, KaZaA was created by the Dutch company Kazaa BV
- It used a single network called FastTrack, shared by other clients as well: Morpheus, giFT, etc.
- Eventually the protocol changed so other clients could no longer talk to it
- A popular file-sharing network with >10 million users (the number varies)

KaZaA: Overview
"Smart" query flooding:
- Join: on startup, the client contacts a "supernode" ... and may at some point become one itself
- Publish: send the list of files to the supernode
- Search: send the query to the supernode; supernodes flood the query among themselves
- Fetch: get the file directly from peer(s); a peer can fetch simultaneously from multiple peers

KaZaA: Network Design
[Figure: a two-tier overlay — ordinary peers attach to "super nodes", which are interconnected among themselves.]

KaZaA: File Insert
[Figure: a peer at 123.2.21.23 publishes "I have X!" to its supernode, which records insert(X, 123.2.21.23).]

KaZaA: File Search
[Figure: a peer asks its supernode "Where is file A?"; the query floods among supernodes, and replies come back, e.g. search(A) -> 123.2.0.18 and search(A) -> 123.2.22.50.]

KaZaA: Discussion
- Pros:
  - tries to take node heterogeneity into account:
    - bandwidth
    - host computational resources
    - host availability (?)
  - rumored to take network locality into account
- Cons:
  - the mechanisms are easy to circumvent
  - still no real guarantees on search scope or search time
- This is the P2P architecture used by Skype
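As a rough illustration of the two-tier idea (hypothetical names and structures; this is not FastTrack's actual wire protocol), a supernode can be modeled as a node that indexes its leaf peers' files and floods queries only within the supernode tier:

```python
class SuperNode:
    """Toy KaZaA-style supernode: indexes its leaves' files and floods
    queries among supernodes only, never down to the leaves."""

    def __init__(self, name):
        self.name = name
        self.index = {}          # filename -> address of the leaf peer holding it
        self.peers = []          # other supernodes

    def publish(self, filename, leaf_addr):
        # Leaves upload their file lists on join; only supernodes keep state.
        self.index[filename] = leaf_addr

    def search(self, filename, ttl=3):
        if filename in self.index:
            return self.index[filename]
        if ttl == 0:             # TTL keeps the supernode flood bounded
            return None
        for sn in self.peers:
            hit = sn.search(filename, ttl - 1)
            if hit:
                return hit
        return None

sn1, sn2 = SuperNode("sn1"), SuperNode("sn2")
sn1.peers, sn2.peers = [sn2], [sn1]
sn2.publish("heyjude.mp3", "123.2.0.18")   # a leaf of sn2 advertises its file
print(sn1.search("heyjude.mp3"))           # -> "123.2.0.18"; fetch goes peer-to-peer
```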
BitTorrent: History
- In 2002, B. Cohen debuted BitTorrent
- Key motivation:
  - popularity exhibits temporal locality (flash crowds)
  - e.g., the Slashdot effect, CNN on 9/11, a new movie/game release
- Focused on efficient fetching, not searching:
  - distribute the same file to all peers
  - single publisher, multiple downloaders
- Has some "real" publishers:
  - Blizzard Entertainment uses it to distribute games

BitTorrent: Overview
Swarming:
- Join: contact a centralized "tracker" server and get a list of peers
- Publish: run a tracker server
- Search: out-of-band — e.g., use Google to find a tracker for the file you want
- Fetch: download chunks of the file from your peers; upload chunks you have to them

File distribution: BitTorrent
P2P file distribution:
- tracker: tracks the peers participating in the torrent
- torrent: the group of peers exchanging chunks of a file
- a joining peer obtains a list of peers from the tracker and starts trading chunks with them

BitTorrent (1)
- the file is divided into 256 KB chunks
- a peer joining the torrent:
  - has no chunks, but will accumulate them over time
  - registers with the tracker to get a list of peers, and connects to a subset of them ("neighbors")
- while downloading, a peer uploads chunks to other peers
- peers may come and go
- once a peer has the entire file, it may (selfishly) leave or (altruistically) remain

BitTorrent (2)
Pulling chunks:
- at any given time, different peers have different subsets of the file's chunks
- periodically, a peer (Alice) asks each neighbor for the list of chunks it has
- Alice sends requests for her missing chunks, rarest first
Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  - the top 4 are re-evaluated every 10 secs
- every 30 secs: randomly select another peer and start sending it chunks ("optimistic unchoke")
  - the newly chosen peer may join the top 4

BitTorrent: Tit-for-tat
(1) Alice "optimistically unchokes" Bob
(2) Alice becomes one of Bob's top-four providers; Bob reciprocates
(3) Bob becomes one of Alice's top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!

BitTorrent: Publish/Join
[Figure: a joining peer contacts the tracker.]

BitTorrent: Fetch
[Figure: the peer exchanges chunks with the peers on the list returned by the tracker.]

BitTorrent: Sharing Strategy
"Tit-for-tat" sharing strategy:
- "I'll share with you if you share with me"
- be optimistic: occasionally let freeloaders download
  - otherwise no one would ever start!
  - it also lets you discover better peers to download from, when they reciprocate
- approximates Pareto efficiency
  - game theory: "no change can make anyone better off without making others worse off"

BitTorrent: Summary
- Pros:
  - works reasonably well in practice
  - gives peers an incentive to share resources; avoids freeloaders
- Cons:
  - a central tracker server is needed to bootstrap the swarm

BTW: File distribution: server-client vs P2P
Question: how much time does it take to distribute a file from one server to N peers?
Notation (the network is assumed to have abundant bandwidth):
- u_s: server upload bandwidth
- u_i: peer i upload bandwidth
- d_i: peer i download bandwidth
- F: file size

BTW: File distribution time: server-client
- the server sequentially sends N copies: NF/u_s time
- client i takes F/d_i time to download
Time to distribute F to N clients using the client/server approach:

    d_cs = max { NF/u_s , F/min_i d_i }

This increases linearly in N (for large N).

BTW: File distribution time: P2P
- the server must send one copy: F/u_s time
- client i takes F/d_i time to download
- NF bits must be downloaded in aggregate
- fastest possible total upload rate: u_s + Σ_i u_i

    d_P2P = max { F/u_s , F/min_i d_i , NF/(u_s + Σ_i u_i) }

Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s.
[Figure: minimum distribution time vs. N, for N up to 35 — the client-server time grows linearly to 3.5 hours at N = 35, while the P2P time stays below and approaches F/u = 1 hour.]
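The two formulas are easy to evaluate directly. A small sketch reproducing the example's setup (the function names are ours; nothing here is BitTorrent-specific):

```python
def dist_time_cs(F, u_s, d_min, N):
    # Client-server: the server pushes N copies; the slowest client also limits.
    return max(N * F / u_s, F / d_min)

def dist_time_p2p(F, u_s, d_min, uploads):
    # P2P: the server sends one copy; N*F bits in total must be uploaded by someone.
    N = len(uploads)
    return max(F / u_s, F / d_min, N * F / (u_s + sum(uploads)))

# Example from the slide: F/u = 1 hour, u_s = 10u, d_min >= u_s. Take u = 1, F = 1.
u, F = 1.0, 1.0
u_s, d_min = 10 * u, 10 * u
for N in (5, 15, 35):
    print(N,
          round(dist_time_cs(F, u_s, d_min, N), 2),         # grows linearly in N
          round(dist_time_p2p(F, u_s, d_min, [u] * N), 2))  # approaches F/u = 1 hour
# -> 5 0.5 0.33 / 15 1.5 0.6 / 35 3.5 0.78
```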
Freenet: History
- In 1999, I. Clarke started the Freenet project
- Basic idea:
  - employ shortest-path-like routing on the overlay network to publish and locate files
- Additional goals:
  - provide anonymity and security
  - make censorship difficult

Freenet: Overview
Routed queries:
- Join: on startup, the client contacts a few other nodes it knows about; it gets a unique node id
- Publish: route the file contents toward the file id; the file is stored at the node with the id closest to the file id
- Search: route a query for the file id toward the closest node id (DFS + caching in Freenet vs. BFS + no caching in Gnutella)
- Fetch: when the query reaches a node containing the file id, that node returns the file to the sender

Freenet: Routing Tables
Each node keeps a table of entries (id, next_hop, file):
- id: a file identifier (e.g., a hash of the file)
- next_hop: another node that stores the file with identifier id
- file: a file, identified by its id, stored on the local node
Forwarding of a query for a file id:
- if the file id is stored locally, stop
- if not, search for the "closest" id in the table and forward the message to the corresponding next_hop
- if the data is not found, failure is reported back
  - the requestor then tries the next-closest match in its routing table
- found data is forwarded back to the upstream requestor

Freenet: Routing
[Figure: an example run of query(10) through nodes n1-n6 and their (id, next_hop, file) tables, showing greedy forwarding toward the closest id, a failure backtrack, and the data returning along the query path — DFS + caching in Freenet vs. BFS + no caching in Gnutella.]

Freenet: Routing Properties
- "close" file ids tend to be stored on the same node
  - why? publications of similar file ids route toward the same place
- the network tends to be a "small world"
  - a small number of nodes have a large number of neighbors (i.e., ~"six degrees of separation")
- consequence:
  - most queries traverse only a small number of hops to find the file (cf. also the 80-20 rule)
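A minimal sketch of the greedy "route toward the closest id" step just described (hypothetical structures; real Freenet adds caching, backtracking, and anonymity machinery on top):

```python
class FreenetNode:
    """Toy Freenet-style node: stores some (id, file) pairs locally and keeps a
    routing table of hints pointing toward other ids."""

    def __init__(self, name, stored):
        self.name = name
        self.stored = dict(stored)   # file id -> file data held locally
        self.table = {}              # file id -> neighboring FreenetNode

    def query(self, fid, hops_left=10):
        if fid in self.stored:                  # found: return data to the sender
            return self.stored[fid]
        if hops_left == 0 or not self.table:
            return None                         # failure reported back upstream
        # Greedy step: forward toward the table entry whose id is closest to fid.
        closest = min(self.table, key=lambda k: abs(k - fid))
        return self.table[closest].query(fid, hops_left - 1)

n3 = FreenetNode("n3", {9: "file-9 bytes"})
n2 = FreenetNode("n2", {12: "file-12 bytes"}); n2.table = {9: n3}
n1 = FreenetNode("n1", {4: "file-4 bytes"});  n1.table = {12: n2}
print(n1.query(9))   # routed n1 -> n2 -> n3; real Freenet would also cache
                     # the result at n1 and n2 on the way back
```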
Freenet: Discussion
- Pros:
  - intelligent routing makes queries relatively short
  - search scope is small (only the nodes along the search path are involved); no flooding
- Cons:
  - still no provable guarantees!
  - +/- anonymity:
    - anonymity properties may give you "plausible deniability"
    - anonymity features make the system hard to measure and debug

DHT: History
- In 2000-2001, academic researchers said "we want to play too!"
- Motivation:
  - frustrated by the popularity of all these "half-baked" P2P apps :)
  - we can do better! (so we said)
  - guaranteed lookup success for files in the system
  - provable bounds on search time
  - provable scalability to millions of nodes
- A hot topic in networking ever since

Motivation
How do we find data in a distributed file-sharing system?
[Figure: a publisher stores (key = "LetItBe", value = MP3 data) somewhere among nodes N1-N5; a client asks Lookup("LetItBe").]
Lookup is a key problem.

Centralized Solution
Central server (Napster):
[Figure: all publish and lookup messages go through a single database node.]
- requires O(M) state
- single point of failure

Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.):
[Figure: the lookup is flooded through the overlay.]
- worst case: O(N) messages per lookup

Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.):
[Figure: the lookup is routed along the overlay toward the key.]
- only exact matches

Routing Challenges
- define a useful key-nearness metric
- keep the hop count small
- keep the routing tables the "right size"
- stay robust despite rapid changes in membership

DHT: Overview
Abstraction: a distributed "hash table" (DHT) data structure:
    put(id, item);
    item = get(id);
Implementation: the nodes in the system form a distributed data structure
- it can be a ring, tree, hypercube, skip list, butterfly network, ...

DHT: Overview (2)
Structured overlay routing:
- Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
- Publish: route the publication for a file id toward a close node id along the data structure
- Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication
- Fetch: two options:
  - the publication contains the actual file => fetch from where the query stops
  - the publication says "I have file X" => the query tells you that 128.2.1.3 has X; use IP routing to get X from 128.2.1.3

Chord Overview
- Provides a peer-to-peer hash lookup service:
  - Lookup(key) -> IP address
- How does Chord locate a node?
- How does Chord maintain routing tables?
- How does Chord cope with changes in membership?

Chord IDs
- m-bit identifier space for both keys and nodes
- key identifier = SHA-1(key), e.g. Key = "LetItBe" -> SHA-1 -> ID = 60
- node identifier = SHA-1(IP address), e.g. IP = "198.10.10.1" -> SHA-1 -> ID = 123
- both are uniformly distributed
- how do we map key IDs to node IDs?

Consistent Hashing [Karger 97]
[Figure: a circular 7-bit ID space with nodes N32, N90, N123 and keys K5, K20, K60, K101.]
- a key is stored at its successor: the node with the next-higher ID
  - e.g., Key = "LetItBe" with ID K60 is stored at node N90

Consistent Hashing
If every node knows of every other node:
- requires global information
- routing tables are large: O(N)
- lookups are fast: O(1)
[Figure: N10 resolves 'Where is "LetItBe"?' (Hash("LetItBe") = K60) in a single step: "N90 has K60".]
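A minimal sketch of consistent hashing itself — the successor rule on the ring — assuming SHA-1-derived identifiers truncated to m bits (this is only the key-to-node mapping, not Chord's routing):

```python
import hashlib

M = 7                                   # bits in the identifier space (ids 0..127)

def chord_id(text):
    """SHA-1-based identifier, truncated onto the m-bit ring."""
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(key_id, node_ids):
    """The node storing key_id: the first node clockwise at or after key_id."""
    candidates = [n for n in sorted(node_ids) if n >= key_id]
    return candidates[0] if candidates else min(node_ids)   # wrap past zero

nodes = {chord_id(ip) for ip in ("198.10.10.1", "198.10.10.2", "198.10.10.3")}
k = chord_id("LetItBe")
print(k, "->", successor(k, nodes))
# Adding or removing one node moves only the keys between it and its
# predecessor -- the property that makes the hashing "consistent".
```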
Chord: Basic Lookup
- every node knows its successor in the ring
[Figure: N10 resolves 'Where is "LetItBe"?' (Hash("LetItBe") = K60) by following successor pointers around the ring until "N90 has K60".]
- this requires O(N) time

"Finger Tables"
- every node knows m other nodes in the ring
- the distance to them increases exponentially
- finger i points to the successor of n + 2^i
[Figure: node N80's fingers point to the successors of 80 + 2^0, 80 + 2^1, ..., 80 + 2^6; on the example ring these land on nodes such as N96, N112, N120, and N16.]

Lookups are Faster
- lookups take O(log N) hops
- the process resembles binary search
[Figure: Lookup(K19) issued at N80 uses fingers to halve the remaining ring distance on each hop, reaching K19's successor N20 in a few steps.]

Chord properties
- efficient: O(log N) messages per lookup, where N is the total number of servers
- scalable: O(log N) state per node
- robust: survives massive changes in membership
- proofs are in the paper / tech report
  - assuming no malicious participants

DHT: Chord Summary
- routing table size?
  - log N fingers
- routing time?
  - each hop is expected to halve the distance to the desired id => expect O(log N) hops
(a code sketch of finger-based lookup follows below)

DHT: Discussion
- Pros:
  - guaranteed lookup
  - O(log N) per-node state and search scope
- Cons:
  - no one uses them? (only one file-sharing app)
  - supporting non-exact-match search is hard
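A minimal sketch of finger-based lookup on a static ring (ids assigned by hand rather than by SHA-1, no joins or failures); `find_successor` follows the standard "closest preceding finger" rule:

```python
M, RING = 7, 2 ** 7

def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the clockwise ring interval (a, b) or (a, b]."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b           # the interval wraps around zero

class ChordNode:
    def __init__(self, nid):
        self.id, self.successor, self.fingers = nid, None, []

    def closest_preceding(self, key):
        # The farthest finger that still lies strictly between us and the key.
        for f in reversed(self.fingers):
            if in_interval(f.id, self.id, key):
                return f
        return self

    def find_successor(self, key):
        if in_interval(key, self.id, self.successor.id, inclusive_right=True):
            return self.successor   # key is in (us, our successor]
        nxt = self.closest_preceding(key)
        return self.successor if nxt is self else nxt.find_successor(key)

# A tiny static ring.
ids = sorted([16, 32, 60, 80, 112])
nodes = {i: ChordNode(i) for i in ids}
succ_of = lambda x: nodes[next((i for i in ids if i >= x % RING), ids[0])]
for k, i in enumerate(ids):
    nodes[i].successor = nodes[ids[(k + 1) % len(ids)]]
    nodes[i].fingers = [succ_of(i + 2 ** j) for j in range(M)]  # finger j -> succ(id + 2^j)

print(nodes[80].find_successor(19).id)   # -> 32, reached in O(log N) finger hops
```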
Other P2P networks
- Content-Addressable Network (CAN): topological routing on a k-dimensional grid
- Tapestry, Pastry, DKS: the same ring organization as Chord, but a different measure of key closeness and k-ary search instead of the binary-search paradigm
- Viceroy: butterfly-type overlay network
- ...
- SETI@home: for sharing computational resources
- Skype
- ...

P2P Case study: Skype
- inherently P2P: pairs of users communicate
- proprietary application-layer protocol (inferred via reverse engineering)
- hierarchical overlay with supernodes (SNs)
- an index maps usernames to IP addresses; it is distributed over the SNs
[Figure: Skype clients (SC) attach to supernodes (SN); a Skype login server sits alongside the overlay.]

Peers as relays
- Problem: both Alice and Bob are behind "NATs"
  - a NAT prevents an outside peer from initiating a call to an inside peer
- Solution:
  - using Alice's and Bob's SNs, a relay is chosen
  - each peer initiates a session with the relay
  - the peers can now communicate through the NATs via the relay

P2P: Summary
- Many different styles of organization (of data and routing), each with pros and cons
- Lessons learned:
  - single points of failure are bad
  - flooding messages to everyone is bad
  - the underlying network topology is important
  - not all nodes are equal
  - incentives are needed to discourage freeloading
  - privacy and security are important
  - structure can provide bounds and guarantees
- Observe: application-layer networking (sublayers...)

Bibliography
- Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. Proceedings of ACM SIGCOMM, 2001.
- Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of ACM SIGCOMM, 2001.
- M.A. Jovanovic, F.S. Annexstein, and K.A. Berman. Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella. University of Cincinnati, Laboratory for Networks and Applied Graph Theory, 2001.
- Frank Dabek, Emma Brunskill, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan. Building Peer-to-Peer Systems with Chord, a Distributed Lookup Service. Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), 2001.
- Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS 2009, Springer Verlag, 2001.

Some extra notes

Chord: Joining the Ring
Three-step process:
1. initialize all fingers of the new node
2. update the fingers of existing nodes
3. transfer keys from the successor to the new node
Less aggressive mechanism (lazy finger update):
- initialize only the finger to the successor node
- periodically verify the immediate successor and predecessor
- periodically refresh the finger-table entries
(a sketch of this lazy stabilization appears at the end of these notes)

Joining the Ring - Step 1
Initialize the new node's finger table:
- locate any node p already in the ring
- ask node p to look up the fingers of the new node N36
- return the results to the new node
[Figure: N36 joins a ring containing N5, N20, N40, N60, N80, N99 and issues Lookup(37, 38, 40, ..., 100, 164) through an existing node.]

Joining the Ring - Step 2
Update the fingers of existing nodes:
- the new node calls an update function on existing nodes
- existing nodes can recursively update the fingers of other nodes

Joining the Ring - Step 3
Transfer keys from the successor node to the new node:
- only the keys in the new node's range are transferred
[Figure: when N36 joins between N20 and N40, keys 21..36 (e.g., K30) are copied from N40 to N36; K38 stays at N40.]

Chord: Handling Failures
- use a successor list:
  - each node knows its r immediate successors
  - after a failure, it will know the first live successor
  - correct successors guarantee correct lookups
- the guarantee holds with some probability
  - r can be chosen to make the probability of lookup failure arbitrarily small

Chord: Evaluation Overview
- quick lookup in large systems
- low variation in lookup costs
- robust despite massive failure
- experiments confirm the theoretical results
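A minimal sketch of the lazy maintenance loop described in the joining notes (Chord's stabilize/notify idea, heavily simplified: single-threaded, no failures or key transfer, hypothetical names):

```python
def between(x, a, b):
    """x in the clockwise open interval (a, b) on the ring."""
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, nid):
        self.id, self.successor, self.predecessor = nid, self, None

    def stabilize(self):
        # Periodically verify the immediate successor: if some node has
        # slipped in between us and it, adopt that node as our successor.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # candidate believes it may be our predecessor.
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# Existing two-node ring 20 <-> 40; N36 joins lazily, learning only a successor.
n20, n40 = Node(20), Node(40)
n20.successor, n40.successor = n40, n20
n20.predecessor, n40.predecessor = n40, n20
n36 = Node(36)
n36.successor = n40                          # lazy join: successor pointer only
for node in (n36, n20, n36, n40):            # a few periodic stabilization rounds
    node.stabilize()
print(n20.successor.id, n36.successor.id)    # -> 36 40: the ring has converged
```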