Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
UCDavis, ecs251 Fall 2007 ecs251 Fall 2007: Operating System Models #4: Peer-to-Peer Systems Dr. S. Felix Wu Computer Science Department University of California, Davis http://www.cs.ucdavis.edu/~wu/ [email protected] 10/23/2007 P2P 1 UCDavis, ecs251 Fall 2007 The role of service provider.. Centralized management of services – DNS, Google, www.cnn.com, Blockbuster, SBC/Sprint/AT&T, cable service, Grid computing, AFS, bank transactions… Information, Computing, & Network resources owned by one or very few administrative domains. – Some with SLA (Service Level Agreement) 10/23/2007 P2P 2 UCDavis, ecs251 Fall 2007 Interacting with the “SP” Service providers are the owner of the information and the interactions – Some enhance/establish the interactions 10/23/2007 P2P 3 UCDavis, ecs251 Fall 2007 Incentives for SP Business model By law 10/23/2007 P2P 4 UCDavis, ecs251 Fall 2007 Let’s compare … Google Blockbuster CNN MLB/NBA LinkIn e-Bay 10/23/2007 P2P Skype Bittorrent Blog Youtube BotNet Cyber-Paparazzi 5 UCDavis, ecs251 Fall 2007 Toward P2P More participation of the end nodes (or their users) – More decentralized Computing/Network resources available – End-user controllability and interactions – Security/robustness concerns 10/23/2007 P2P 6 UCDavis, ecs251 Fall 2007 Peer’s role Shifting functionality/service/information from the SP to the peers. – Peer’s CPU cycles – Peer’s Internet access – Peer’s storage 10/23/2007 P2P 7 UCDavis, ecs251 Fall 2007 Service Providers in P2P We might not like SP, but we still can not avoid SP entirely. – Who is going to lay the fiber and switch? – Can we avoid DNS? – How can we stop “Cyber-Bullying” and other similar? – Copyright enforcement? – Internet becomes a junkyard? 10/23/2007 P2P 8 UCDavis, ecs251 Fall 2007 We will discuss… P2P system examples – Unstructured, structured, incentive Architectural analysis and issues Future P2P applications and why? 10/23/2007 P2P 9 UCDavis, ecs251 Fall 2007 Questions to ask Peer’s role (or SP’s role) Peer’s controllability and vulnerability Incentives to contribute Peer’s mobility and dynamics Scalability 10/23/2007 P2P 10 UCDavis, ecs251 Fall 2007 Challenge to you… Define a new P2P-related application, service, or architecture. Justify why it is practical, useful and will scale well. – Example: sharing cooking recipes, experiences & recommendations about restaurants and hotels 10/23/2007 P2P 11 UCDavis, ecs251 Fall 2007 Napster P2P File sharing “Unstructured” 10/23/2007 P2P 12 UCDavis, ecs251 Fall 2007 Napster peers Napster serv er Index 1. Fil e lo cation req uest Napster serv er Index 3. Fil e req uest 2. Lis t of pee rs offerin g th e fi le 5. Ind ex upd ate 4. Fil e de livered 10/23/2007 P2P 13 UCDavis, ecs251 Fall 2007 Napster Advantages? Disadvantages? 10/23/2007 P2P 14 UCDavis, ecs251 Fall 2007 Napster: Questions to ask Peer’s role (or SP’s role) Peer’s controllability and vulnerability Incentives to contribute Peer’s mobility and dynamics Scalability 10/23/2007 P2P 15 UCDavis, ecs251 Fall 2007 10/23/2007 P2P 16 UCDavis, ecs251 Fall 2007 10/23/2007 P2P 17 UCDavis, ecs251 Fall 2007 Originally conceived of by Justin Frankel, 21 year old founder of Nullsoft March 2000, Nullsoft posts Gnutella to the web A day later AOL removes Gnutella at the behest of Time Warner The Gnutella protocol version 0.4 http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf and version 0.6 http://rfc-gnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm there are multiple open source implementations at http://sourceforge.net/ including: – Jtella – Gnucleus Software released under the Lesser Gnu Public License (LGPL) the Gnutella protocol has been widely analyzed 10/23/2007 P2P 18 UCDavis, ecs251 Fall 2007 Gnutella Protocol Messages Broadcast Messages – Ping: initiating message (“I’m here”) – Query: search pattern and TTL (time-to-live) Back-Propagated Messages – Pong: reply to a ping, contains information about the peer – Query response: contains information about the computer that has the needed file Node-to-Node Messages – GET: return the requested file – PUSH: push the file to me 10/23/2007 P2P 19 UCDavis, ecs251 Fall 2007 Steps: • Node 2 initiates search for file A 7 1 A 4 2 6 3 5 10/23/2007 P2P 20 UCDavis, ecs251 Fall 2007 A Steps: • Node 2 initiates search for file A • Sends message to all neighbors 7 1 4 2 3 A 6 A 5 10/23/2007 P2P 21 UCDavis, ecs251 Fall 2007 A A Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message 7 1 4 2 6 3 A 10/23/2007 A 5 P2P 22 UCDavis, ecs251 Fall 2007 A:7 A 7 1 4 2 6 3 A:5 10/23/2007 A A Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message 5 P2P 23 UCDavis, ecs251 Fall 2007 7 1 4 2 3 A:7 A:5 A 6 A 10/23/2007 Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is backpropagated 5 P2P 24 UCDavis, ecs251 Fall 2007 7 1 A:7 2 4 A:5 6 3 Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is backpropagated 5 10/23/2007 P2P 25 UCDavis, ecs251 Fall 2007 Limited Scope Flooding Reverse Path Forwarding download A 1 7 4 2 6 3 5 Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is backpropagated • File download • Note: file transfer between clients behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y 10/23/2007 P2P 26 UCDavis, ecs251 Fall 2007 Gnutella Advantages? Disadvantages? 10/23/2007 P2P 27 UCDavis, ecs251 Fall 2007 Gnutella: Questions to ask Peer’s role (or SP’s role) Peer’s controllability and vulnerability Incentives to contribute Peer’s mobility and dynamics Scalability 10/23/2007 P2P 28 UCDavis, ecs251 Fall 2007 Gnutella vs. Napster 10/23/2007 P2P 29 UCDavis, ecs251 Fall 2007 GUID: Short for Global Unique Identifier, a randomized string that is used to uniquely identify a host or message on the Gnutella Network. This prevents duplicate messages from being sent on the network. GWebCache: a distributed system for helping servants connect to the Gnutella network, thus solving the "bootstrapping" problem. Servants query any of several hundred GWebCache servers to find the addresses of other servants. GWebCache servers are typically web servers running a special module. Host Catcher: Pong responses allow servants to keep track of active gnutella hosts On most servants, the default port for Gnutella is 6346 10/23/2007 P2P 30 10/23/2007 Gnutella Network Growth P2P 05/12/01 05/16/01 05/22/01 05/24/01 05/29/01 50 02/27/01 03/01/01 03/05/01 03/09/01 03/13/01 03/16/01 03/19/01 03/22/01 03/24/01 11/20/00 11/21/00 11/25/00 11/28/00 Number of nodes in the largest network component ('000) UCDavis, ecs251 Fall 2007 . 40 30 20 10 - 31 UCDavis, ecs251 Fall 2007 “Limited Scope Flooding” Ripeanu reported that Gnutella traffic totals 1Gbps (or 330TB/month). – Compare to 15,000TB/month in US Internet backbone (December 2000) – this estimate excludes actual file transfers Reasoning: QUERY and PING messages are flooded. They form more than 90% of generated traffic predominant TTL=7 >95% of nodes are less than 7 hops away measured traffic at each link about 6kbs network with 50k nodes and 170k links 10/23/2007 P2P 32 UCDavis, ecs251 Fall 2007 A B F D E C G H Perfect Mapping 10/23/2007 P2P 33 UCDavis, ecs251 Fall 2007 A B F D E C G H Inefficient mapping Link D-E needs to support six times higher traffic. 10/23/2007 P2P 34 UCDavis, ecs251 Fall 2007 Topology mismatch The overlay network topology doesn’t match the underlying Internet infrastructure topology! 40% of all nodes are in the 10 largest Autonomous Systems (AS) Only 2-4% of all TCP connections link nodes within the same AS Largely ‘random wiring’ Most Gnutella generated traffic crosses AS border, making the traffic more expensive May cause ISPs to change their pricing scheme 10/23/2007 P2P 35 UCDavis, ecs251 Fall 2007 Scalability Whenever a node receives a message, (ping/query) it sends copies out to all of its other connections. existing mechanisms to reduce traffic: – TTL counter – Cache information about messages they received, so that they don't forward duplicated messages. 10/23/2007 P2P 36 UCDavis, ecs251 Fall 2007 70% of Gnutella users share no files 90% of users answer no queries Those who have files to share may limit number of connections or upload speed, resulting in a high download failure rate. If only a few individuals contribute to the public good, these few peers effectively act as centralized servers. 10/23/2007 P2P 37 UCDavis, ecs251 Fall 2007 Anonymity Gnutella provides for anonymity by masking the identity of the peer that generated a query. However, IP addresses are revealed at various points in its operation: HITS packets includes the URL for each file, revealing the IP addresses 10/23/2007 P2P 38 UCDavis, ecs251 Fall 2007 Query Expressiveness Format of query not standardized No standard format or matching semantics for the QUERY string. Its interpretation is completely determined by each node that receives it. String literal vs. regular expression Directory name, filename, or file contents Malicious users may even return files unrelated to the query 10/23/2007 P2P 39 UCDavis, ecs251 Fall 2007 Superpeers Cooperative, long-lived peers typically with significant resources to handle very high amount of query resolution traffic. 10/23/2007 P2P 40 UCDavis, ecs251 Fall 2007 10/23/2007 P2P 41 UCDavis, ecs251 Fall 2007 Gnutella is a self-organizing, large-scale, P2P application that produces an overlay network on top of the Internet; it appears to work Growth is hindered by the volume of generated traffic and inefficient resource use since there is no central authority the open source community must commit to making any changes Suggested changes have been made by – Peer-to-Peer Architecture Case Study: Gnutella Network, by Matei Ripeanu – Improving Gnutella Protocol: Protocol Analysis and Research Proposals by Igor Ivkovic 10/23/2007 P2P 42 UCDavis, ecs251 Fall 2007 Freenet before v0.7 Essentially the same as Gnutella: – Limited-scope flooding – Reverse-path forwarding Difference: – Data objects (I.e., files) are also being delivered via “reverse-path forwarding” Freenet v0.7 -- “Darknet routing” 10/23/2007 P2P 43 UCDavis, ecs251 Fall 2007 P2P Issues Scalability & Load Balancing Anonymity Fairness, Incentives & Trust Security and Robustness Efficiency Mobility 10/23/2007 P2P 44 UCDavis, ecs251 Fall 2007 Incentive-driven Fairness P2P means we all should contribute.. – Hopefully fair, but the majority is selfish… “Incentive for people to contribute…” 10/23/2007 P2P 45 UCDavis, ecs251 Fall 2007 10/23/2007 P2P 46 UCDavis, ecs251 Fall 2007 File Organization File 1 2 3 4 Piece 256KB Block 16KB Incomplete Piece 10/23/2007 P2P 47 UCDavis, ecs251 Fall 2007 Initialization HTTP GET MYFILE.torrent webserver tracker MYFILE.torrent http://mytracker.com:6969/ S3F5YHG6FEB FG5467HGF367 “register” F456JI9N5FF4E … list of peers ID1 169.237.234.1:6881 ID2 190.50.34.6:5692 ID3 34.275.89.143:4545 … ID50 231.456.31.95:6882 … Peer 40 10/23/2007 user P2P Peer 2 Peer 1 48 UCDavis, ecs251 Fall 2007 Peer/Seed 1 10/23/2007 2 3 P2P 4 49 UCDavis, ecs251 Fall 2007 “On the Wire” Protocol (Over TCP) 0 BitField Remote Peer Interested = 0 choked = 1 10/23/2007 Non-keepalive messages: 0 – choke 11 – unchoke 0 1 2 – interested 3 – not interested ID/Infohash 4 – haveHandshake BitField 5 – bitfield 6 – request 7 – piece 8 – cancel P2P Local Peer Interested = 0 choked = 1 50 UCDavis, ecs251 Fall 2007 Choking By default, every peer is “choked” – stop “uploading” to them, but the TCP connection is still there. Select 4~6 peers to “unchoke” ?? – “Re-choke” every 30 seconds – How to decide? Optimistic Unchoking – What is this? 10/23/2007 P2P 51 UCDavis, ecs251 Fall 2007 “Interested” A request for a piece (or its sub-pieces) 10/23/2007 P2P 52 UCDavis, ecs251 Fall 2007 Get a piece/block!! Download: – Which peer? (download from whom? Does it matter?) – Which piece? How about “upload”? – Which peer? – Which piece? 10/23/2007 P2P 53 UCDavis, ecs251 Fall 2007 Piece Selection Pipelining (5 requests) Strict Priority (incomplete pieces first) Rarest First What is the problem? 10/23/2007 P2P 54 UCDavis, ecs251 Fall 2007 Rarest First Exchanging bitmaps with 20+ peers – Initial messages – “have” messages Array of buckets – Ith buckets contains “pieces” with I known instances – Within the same bucket, the client will randomly select one piece. 10/23/2007 P2P 55 UCDavis, ecs251 Fall 2007 Piece Selection Pipelining (5 requests) Strict Priority 3 stages: – Random first piece – Rarest First – Endgame mode 10/23/2007 P2P 56 UCDavis, ecs251 Fall 2007 Piece Selection Piece (64K~1M) Sub-piece (16K) – Piece-size: trade-off between performance and the size of the torrent file itself – A client might request different sub-pieces of the same piece from different peers. Strict Priority - sub-pieces and piece Rarest First – Exception: “random first” – Get the stuff out of Seed(s) as soon as possible.. 10/23/2007 P2P 57 UCDavis, ecs251 Fall 2007 Get a piece/block!! Download: – Which peer? – Which piece? How about “upload”? – Which peer? – Which piece? 10/23/2007 P2P 58 UCDavis, ecs251 Fall 2007 Peer Selection Focus on Rate Upload to 4~6 peers Random Unchoke Global rate cap only 10/23/2007 P2P 59 UCDavis, ecs251 Fall 2007 Bittorrent: “Tit for Tat” Equivalent Retaliation (Game theory) – A peer will “initially” cooperate, then respond in kind to an opponent's previous action. If the opponent previously was cooperative, the agent is cooperative. If not, the agent is not. 10/23/2007 P2P 60 UCDavis, ecs251 Fall 2007 Choking By default, every peer is “choked” – stop “uploading” to them, but the TCP connection is still there. Select 4~6 peers to “unchoke” ?? – Best “upload rates” and “interested”. – Uploading to the unchoked ones and monitor the download rate for all the peers – “Re-choke” every 30 seconds Optimistic Unchoking (6+1) – Randomly select a choked peer to unchoke 10/23/2007 P2P 61 UCDavis, ecs251 Fall 2007 Bittorrent Fairness of download and upload between a pair of peers Every 10 seconds, estimate the download bandwidth from the other peer – Based on the performance estimation to decide to continue uploading to the other peer or not 10/23/2007 P2P 62 UCDavis, ecs251 Fall 2007 Properties Bigger “%” = better chance of unchoked Bigger “%” ~= better UL and DL rates ?! 10/23/2007 P2P 63 UCDavis, ecs251 Fall 2007 Who to Unchoke? Peer/Seed 1 10/23/2007 2 3 P2P 4 64 UCDavis, ecs251 Fall 2007 Seed unchoking old algorithm – unchoke the fastest peers (how?) – problem: fastest peers may monopolize seeds new algorithm periodically sort all peers according to their last unchoke time prefer the most recently unchoked peers; on a tie, prefer the fastest (presumably) achieves equal spread of seed bandwidth 10/23/2007 P2P 65 UCDavis, ecs251 Fall 2007 Seed unchoking old algorithm – unchoke the fastest peers (how?) – problem: fastest peers may monopolize seeds new algorithm periodically sort all peers according to their last unchoke time prefer the most recently unchoked peers; on a tie, prefer the fastest (presumably) achieves equal spread of seed bandwidth 10/23/2007 P2P 66 UCDavis, ecs251 Fall 2007 Attacks to BT ??? 10/23/2007 P2P 67 UCDavis, ecs251 Fall 2007 Attacks to BT Download only from the seeds Download only from fastest peers Announcing false pieces Privacy -- (Torrent, source IP addresses) 10/23/2007 P2P 68 UCDavis, ecs251 Fall 2007 BitTorrent: Questions to ask Peer’s role (or SP’s role) Peer’s controllability and vulnerability Incentives to contribute Peer’s mobility and dynamics Scalability 10/23/2007 P2P 69 UCDavis, ecs251 Fall 2007 Bittorrent “Tic-for-Tat” incentive model within the same torrent Piece/Peer selection and choking The need for tracker and torrent file 10/23/2007 P2P 70 UCDavis, ecs251 Fall 2007 Client implementations mainline: written in Python; right now, the only one employing the new seed unchoking algorithm Azureus: the most popular, written in Java; implements a special protocol between clients (e.g. peers can exchange peer lists) other popular clients: ABC, BitComet, BitLord, BitTornado, μTorrent, Opera browser various non-standard extensions – retaliation mode: detect compromised/malicious peers – anti-snubbing: ignore a peer who ignores us – super seeding: seed masquerading as a leecher 10/23/2007 P2P 71 UCDavis, ecs251 Fall 2007 Resources Basic BitTorrent mechanisms [Cohen, P2PECON’03] BitTorrent specification Wiki http://wiki.theory.org/BitTorrentSpecification Measurement studies [Izal et al., PAM’04], [Pouwelse et al., Delft TR 2004 and IPTPS’05], [Guo et al., IMC’05], and [Legout et al., INRIA-TR-2006] Theoretical analysis and modeling [Qiu et al., SIGCOMM’04], and [Tian et al., Infocom’06] Simulations [Bharambe et al., MSR-TR-2005] Sharing incentives and exploiting them [Shneidman et al., PINS’04], [Jun et al., P2PECON’05], and [Liogkas et al., IPTPS’06] 10/23/2007 P2P 72 UCDavis, ecs251 Fall 2007 Trackerless Bittorrent Every BT peer is a tracker! But, how would they share and exchange information regarding other peers? Similar to Napster’s index server or DNS 10/23/2007 P2P 73 UCDavis, ecs251 Fall 2007 Pure P2P Every peer is a tracker Every peer is a DNS server Every peer is a Napster Index server How can this be done? – We try to remove/reduce the role of “special servers”! 10/23/2007 P2P 74 UCDavis, ecs251 Fall 2007 Unstructured P2P P2P network topology is arbitrary! – Gnutella – BitTorrent and Napster 10/23/2007 P2P 75 UCDavis, ecs251 Fall 2007 Unstructured P2P P2P network topology is arbitrary! – Gnutella – BitTorrent and Napster Mapping - [Content/Object, Peer] is arbitrary! – How to search the content in Gnutella? – BitTorrent and Napster? 10/23/2007 P2P 76 UCDavis, ecs251 Fall 2007 Disadvantages of u-P2P Flooding cost Rare contents – Also, the issue of “All or 1” 10/23/2007 P2P 77 UCDavis, ecs251 Fall 2007 From u-P2P to s-P2P P2P network topology is formed according to the content ownership Unique “naming/keying” for the content object as well as the peer 10/23/2007 P2P 78 UCDavis, ecs251 Fall 2007 Peer The requirements of Peer? 10/23/2007 P2P 79 UCDavis, ecs251 Fall 2007 Structured Peering Peer identity and routability 10/23/2007 P2P 80 UCDavis, ecs251 Fall 2007 Structured Peering Peer identity and routability Key/content assignment – Which identity owns what? (Google Search?) 10/23/2007 P2P 81 UCDavis, ecs251 Fall 2007 Structured Peering Peer identity and routability Key/content assignment – Which identity owns what? Napster: centralized index service Skype/Kazaa: login-server & super peers DNS: hierarchical DNS servers Two problems: (1). How to connect to the “topology”? (2). How to prevent failures/changes? 10/23/2007 P2P 82 UCDavis, ecs251 Fall 2007 DHT Most s-P2P systems are DHT-based. Distributed hash tables (DHTs) – decentralized lookup service of a hash table – (name, value) pairs stored in the DHT – any peer can efficiently retrieve the value associated with a given name – the mapping from names to values is distributed among peers 10/23/2007 P2P 83 UCDavis, ecs251 Fall 2007 HT as a search table (BitTorrent, Napster) “160 bits” Index key Content Object/ Peer naming 10/23/2007 Information/content is distributed, and we need to know where? Where is this piece of music? Is this BT piece available? What is the location of this type of content? What is the current IP address of this skype user? P2P 84 UCDavis, ecs251 Fall 2007 DHT as a search table ??? Index key 10/23/2007 P2P 85 UCDavis, ecs251 Fall 2007 DHT as a search table ??? Index key 10/23/2007 P2P 86 UCDavis, ecs251 Fall 2007 DHT segment ownership ??? Index key 10/23/2007 P2P 87 UCDavis, ecs251 Fall 2007 DHT Scalable Peer arrivals, departures, and failures Unstructured versus structured 10/23/2007 P2P 88 UCDavis, ecs251 Fall 2007 DHT (Name, Value) How to utilize DHT to avoid Trackers in Bittorrent? 10/23/2007 P2P 89 UCDavis, ecs251 Fall 2007 DHT-based Tracker FreeBSD 5.4 CD images Publish the key on the class web site. Index key Whoever owns this hash entry is the tracker for the corresponding key! Seed’s IP address PUT & GET 10/23/2007 P2P 90 UCDavis, ecs251 Fall 2007 Chord Given a key (content object), it maps the key onto a peer -- consistent hash Assign keys to peers. Solves problem of locating key in a collection of distributed peers. Maintains routing information as peers join and leave the system 10/23/2007 P2P 91 UCDavis, ecs251 Fall 2007 Chord Consistent Hashing A Simple Key Lookup Algorithm Scalable Key Lookup Algorithm Node Joins and Stabilization Node Failures 10/23/2007 P2P 92 UCDavis, ecs251 Fall 2007 Consistent Hashing Consistent hash function assigns each peer and key an m-bit identifier (e.g., 140 bits). SHA-1 as a base hash function. A peer’s identifier is defined by hashing the peer’s IP address. (other possibilities?) A content identifier is produced by hashing the key: – ID(peer) = SHA-1(IP, Port) – ID(content) = SHA-1(related to the content object) – Application-dependent! 10/23/2007 P2P 93 UCDavis, ecs251 Fall 2007 Peer, Content In an m-bit identifier space, there are 2 identifiers (for both peer and content). Which peer handles which content? 10/23/2007 P2P m 94 UCDavis, ecs251 Fall 2007 Peer, Content In an m-bit identifier space, there are 2 identifiers (for both peer and content). Which peer handles which contents? m – We will not have 2m peers/contents! – Each peer might need to handle more than one contents. – In that case, which peer has what? 10/23/2007 P2P 95 UCDavis, ecs251 Fall 2007 Consistent Hashing m In an m-bit identifier space, there are 2 identifiers. m an identifier circle modulo 2 . The identifier ring is called Chord ring. Content X is assigned to the first peer whose identifier is equal to or follows (the identifier of) X in the identifier space. This peer is the successor peer of key X, denoted by successor(X). 10/23/2007 P2P 96 UCDavis, ecs251 Fall 2007 Successor Peers identifier node 6 1 0 6 identifier circle 6 5 2 2 successor(2) = 3 3 4 10/23/2007 key successor(1) = 1 1 7 successor(6) = 0 X 2 P2P 97 UCDavis, ecs251 Fall 2007 Join and Departure When a node N joins the network, certain contents previously assigned to N’s successor now become assigned to N. When node N leaves the network, all of its assigned contents are reassigned to N’s successor. 10/23/2007 P2P 99 UCDavis, ecs251 Fall 2007 Join keys 5 7 keys 1 0 1 7 keys 6 2 5 3 keys 2 4 10/23/2007 P2P 100 UCDavis, ecs251 Fall 2007 Departure keys 7 keys 1 0 1 7 keys 6 6 2 5 3 keys 2 4 10/23/2007 P2P 101 UCDavis, ecs251 Fall 2007 Join/Depart What information must be maintained? 10/23/2007 P2P 102 UCDavis, ecs251 Fall 2007 Join/Depart What information must be maintained? – Pointer to successor(s) – Content itself (but application dependent) 10/23/2007 P2P 103 UCDavis, ecs251 Fall 2007 Tracker gone? FreeBSD 5.4 CD images Publish the key on the class web site. Index key Whoever owns this hash entry is the tracker for the corresponding key! Seed’s IP address PUT & GET 10/23/2007 P2P 104 UCDavis, ecs251 Fall 2007 How to identify the tracker? And, its IP address, of course? 10/23/2007 P2P 105 UCDavis, ecs251 Fall 2007 A Simple Key Lookup A very small amount of routing information suffices to implement consistent hashing in a distributed environment If each node knows only how to contact its current successor node on the identifier circle, all node can be visited in linear order. Queries for a given identifier could be passed around the circle via these successor pointers until they encounter the node that contains the key. 10/23/2007 P2P 106 UCDavis, ecs251 Fall 2007 A Simple Key Lookup Pseudo code for finding successor: // ask node n to find the successor of id N.find_successor(id) if (id (N, successor]) return successor; else // forward the query around the circle return successor.find_successor(id); 10/23/2007 P2P 107 UCDavis, ecs251 Fall 2007 A Simple Key Lookup The path taken by a query from node 8 for key 54: 10/23/2007 P2P 108 UCDavis, ecs251 Fall 2007 Successor Each active node MUST know the IP address of its successor! – N8 has to know that the next node on the ring is N14. Departure N8 => N21 But, how about failure or crash? 10/23/2007 P2P 109 UCDavis, ecs251 Fall 2007 Robustness Successor in R hops – N8 => N14, N21, N32, N38 (R=4) – Periodic pinging along the path to check, & also find out maybe there are “new members” in between 10/23/2007 P2P 110 UCDavis, ecs251 Fall 2007 Is that good enough? 10/23/2007 P2P 111 UCDavis, ecs251 Fall 2007 Without Periodic Ping…?? Triggered only by dynamics (Join/Depart)! 10/23/2007 P2P 112 UCDavis, ecs251 Fall 2007 Complexity of the search Time/messages: O(N) – N: # of nodes on the Ring Space: O(1) – We only need to remember R IP addresses Stablization depends on “period”. 10/23/2007 P2P 113 UCDavis, ecs251 Fall 2007 Scalable Key Location To accelerate lookups, Chord maintains additional routing information. This additional information is not essential for correctness, which is achieved as long as each node knows its correct successor. 10/23/2007 P2P 114 UCDavis, ecs251 Fall 2007 Finger Tables Each node N’ maintains a routing table with up to m entries (which is in fact the number of bits in identifiers), called finger table. The ith entry in the table at node N contains the identity of the first node s that succeeds N by at i-1 least 2 on the identifier circle. i-1 s = successor (n+2 ). s is called the ith finger of node N, denoted by N.finger(i) 10/23/2007 P2P 115 UCDavis, ecs251 Fall 2007 Finger Tables i-1 s = successor (n+2 ). finger table start For. 0+20 0+21 0+22 1 2 4 1 6 1 3 0 0 1+2 1+21 1+22 2 3 5 succ. keys 1 3 3 0 2 5 finger table For. start 3 0 3+2 3+21 3+22 4 10/23/2007 succ. finger table For. start 0 7 keys 6 P2P 4 5 7 succ. keys 2 0 0 0 116 UCDavis, ecs251 Fall 2007 Finger Tables A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node. The first finger of N is the immediate successor of N on the circle. 10/23/2007 P2P 117 UCDavis, ecs251 Fall 2007 Example query The path a query for key 54 starting at node 8: 10/23/2007 P2P 118 UCDavis, ecs251 Fall 2007 Scalable Key Location Since each node has finger entries at power of two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between the node and the target identifier. From this intuition follows a theorem: Theorem: With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(logN). 10/23/2007 P2P 119 UCDavis, ecs251 Fall 2007 Complexity of the Search Time/messages: O(logN) – N: # of nodes on the Ring Space: O(logN) – We need to remember R IP addresses – We need to remember logN Fingers Stablization depends on “period”. 10/23/2007 P2P 120 UCDavis, ecs251 Fall 2007 An Example M = 140 (identifier size), ring size is 2140 N = 216 (# of nodes) How many entries we need to have for the Finger Table? Each node n’ maintains a routing table with up to m entries (which is in fact the number of bits in identifiers), called finger table. The ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2i-1 on the identifier circle. s = successor(n+2i-1). 10/23/2007 P2P 121 UCDavis, ecs251 Fall 2007 Complexity of the Search Time/messages: O(M) – M: # of bits of the identifier Space: O(M) – We need to remember R IP addresses – We need to remember M Fingers Stablization depends on “period”. 10/23/2007 P2P 122 UCDavis, ecs251 Fall 2007 Structured Peering Peer identity and routability – 2M identifiers, Finger Table routing Key/content assignment – Hashing Dynamics/Failures – Inconsistency?? 10/23/2007 P2P 123 UCDavis, ecs251 Fall 2007 Joins and Stabilizations The most important thing is the successor pointer. If the successor pointer is ensured to be up to date, which is sufficient to guarantee correctness of lookups, then finger table can always be verified. Each node runs a “stabilization” protocol periodically in the background to update successor pointer and finger table. 10/23/2007 P2P 124 UCDavis, ecs251 Fall 2007 Node Joins – stabilize() Each time node N runs stabilize(), it asks its successor for the it’s predecessor p, and decides whether p should be N’s successor instead. stabilize() notifies node N’s successor of N’s existence, giving the successor the chance to change its predecessor to N. The successor does this only if it knows of no closer predecessor than N. 10/23/2007 P2P 125 UCDavis, ecs251 Fall 2007 Node Joins – stabilize() // called periodically. verifies N’s immediate // successor, and tells the successor about N. N.stabilize() x = successor.predecessor; if (x (N, successor)) successor = x; successor.notify(N); // N’ thinks it might be our predecessor. n.notify(N’) if (predecessor is nil or N’ (predecessor, N)) predecessor = N’; 10/23/2007 P2P 126 UCDavis, ecs251 Fall 2007 Stabilization – – pred(ns) = n nil succ(np) = ns n predecessor = nil n acquires ns as successor via some n’ n runs stabilize – – succ(np) = n pred(ns) = np ns n notifies ns being the new predecessor ns acquires n as its predecessor np runs stabilize – – – – np asks ns for its predecessor (now n) np acquires n as its successor np notifies n n will acquire np as its predecessor all predecessor and successor pointers are now correct fingers still need to be fixed, but old fingers will still work np 10/23/2007 n joins P2P 127 UCDavis, ecs251 Fall 2007 fix_fingers() Each node periodically calls fix fingers to make sure its finger table entries are correct. It is how new nodes initialize their finger tables It is how existing nodes incorporate new nodes into their finger tables. 10/23/2007 P2P 128 UCDavis, ecs251 Fall 2007 Node Joins – fix_fingers() // called periodically. refreshes finger table entries. N.fix_fingers() next = next + 1 ; if (next > m) next = 1 ; finger[next] = find_successor(N + 2next-1); // checks whether predecessor has failed. n.check_predecessor() if (predecessor has failed) predecessor = nil; 10/23/2007 P2P 129 UCDavis, ecs251 Fall 2007 Node Failures Key step in failure recovery is maintaining correct successor pointers To help achieve this, each node maintains a successor-list of its r nearest successors on the ring If node n notices that its successor has failed, it replaces it with the first live entry in the list Successor lists are stabilized as follows: – node n reconciles its list with its successor s by copying s’s successor list, removing its last entry, and prepending s to it. – If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor. 10/23/2007 P2P 131 UCDavis, ecs251 Fall 2007 Chord – The Math Every node is responsible for about K/N keys (N nodes, K keys) When a node joins or leaves an N-node network, only O(K/N) keys change hands (and only to and from joining or leaving node) Lookups need O(log N) messages To reestablish routing invariants and finger tables after node joining or leaving, only O(log2N) messages are required 10/23/2007 P2P 132 UCDavis, ecs251 Fall 2007 Chord: Questions to ask Peer’s role (or SP’s role) Peer’s controllability and vulnerability Incentives to contribute Peer’s mobility and dynamics Scalability 10/23/2007 P2P 133 UCDavis, ecs251 Fall 2007 Pros/Cons about DHT 10/23/2007 P2P 134