CS522: Algorithmic and Economic Aspects of the Internet
Instructors: Nicole Immorlica, Mohammad Mahdian

Peer-To-Peer Networks
A network in which nodes employ distributed resources to accomplish a critical task
Nodes are typically equals, i.e. (approximately) indistinguishable in functionality
The system is highly dynamic: nodes frequently come and go

P2P Definition
Distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.
Source: A Survey of Peer-To-Peer Content Distribution Technologies, Androutsellis-Theotokis and Spinellis

Peer-To-Peer Applications
Direct real-time communication: instant messaging
Combining the processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime computation
Storing and distributing digital content: mp3 file sharing

Peer-To-Peer Benefits
Self-organized and adaptive
Easily scalable
Fault-tolerant and load balanced
Resistant to censorship

P2P Construction
[Figure: clients and servers forming a P2P overlay network on top of the underlying networks (SPRINT, AOL, MIT, UUNET)]

P2P Classification
Rows give the degree of centralization, columns the data organization.
Centralization   Unstructured   Loosely structured   Highly structured
Hybrid           Napster, IM
Partial          Kazaa, Gia
None             Gnutella       Freenet              Chord, CAN

Napster, IM
[Figure: peers 1 to 7 connected to a central server, with a file transfer between peers]
Centralized servers maintain a list of files and the peers at which each file is stored
Peers join, leave, and query the network via direct communication with the servers
File transfers occur directly between peers

Napster, IM
Advantages:
  Highly efficient data lookup
  Rapidly adapts to changes in the network
Disadvantages:
  Questionable scalability
  Vulnerable to censorship, failure, attack

Gnutella
All peers, called
servents, are identical and function as both servers and clients
A peer joins the network by contacting existing servents (chosen from online databases) using PING messages
A servent receiving a PING message replies with a PONG message and forwards the PING to other servents
The peer connects to the servents that sent a PONG

Gnutella
A servent queries the network by sending a QUERY message
A servent receiving a QUERY message replies with a QUERYHIT message if it can answer the query. If not, it forwards the QUERY message to other servents

Routing in Gnutella
How PING/QUERY messages are forwarded affects network topology, search efficiency/accuracy, and scalability
Proposals:
  Breadth-first search: flooding, iterative deepening, modified random BFS
  Depth-first search: random walk, k-walker random walks, two-level random walk, dominating-set-based search

Hybrid Search
Random walk with lookahead: short random walks with shallow local flooding
Takes advantage of "supernodes", i.e. nodes of high degree
The stationary distribution of a random walk is naturally biased towards supernodes
Lookahead allows the search to quickly discover content stored at all neighbors of these high-degree nodes

Supernodes
Improve scalability and performance of Gnutella-like systems via supernodes
Supernodes are special peers with high degree, elected dynamically according to bandwidth and other considerations
Supernodes maintain a list of content stored at peers
Advantages:
  Searches propagate on supernodes, 3 to 5 times faster
  Takes advantage of heterogeneity in the network

Gnutella
Advantages:
  Entirely decentralized, pure P2P network
  Highly resistant to failure
Disadvantages:
  Search is time-consuming
  Network typically scales poorly

Chord
Distributed hash table (DHT) implementation
Each node and each piece of content has an ID
Content IDs are deterministically mapped to node IDs, so a searcher knows exactly where data is located (a content-addressable network)
Efficient: O(log n) messages per lookup
Scalable: O(log n) state per node

Keys in Chord
m-bit
identifier space for both nodes and content keys
Content ID = hash(content)
Node ID = hash(IP address)
Both are uniformly distributed
How to map content IDs to node IDs?

Mapping Content to Nodes
[Figure: circular 7-bit ID space starting at 0, with nodes N32, N90, N123 and keys K5, K20, K60, K101; the labels IP="198.10.10.1" and Content="U2" illustrate hashing an address to a node ID and content to a key (K60). Figure adapted from Stoica et al.]
Content is stored at its successor node: the first node whose ID is equal to or follows the content ID on the ring

Routing
Every node knows of every other node
Routing tables O(n), lookup O(1)
[Figure: ring with nodes N10, N32, N55, N90, N123. N10 asks: Where is "U2"? Hash("U2") = K60. Answer: "N90 has K60". Figure adapted from Stoica et al.]

Routing
Every node knows only its successor in the ring
Routing tables O(1), lookup O(n)
[Figure: ring with nodes N10, N32, N55, N90, N123. N10 asks: Where is "U2"? Hash("U2") = K60. Answer: "N90 has K60". Figure adapted from Stoica et al.]

Routing
Every node knows m others
Distances increase exponentially: node i points to the node whose ID is the successor of $(i + 2^j) \bmod 2^m$, for j from 0 to m - 1
These pointers are called fingers
The finger (routing) table and search time are both O(log n)

Finger Tables
[Figure: finger table of node N80 in a 7-bit ID space with nodes N16, N80, N96, N112. Targets $80 + 2^0$ through $80 + 2^4$ resolve to N96, $80 + 2^5$ to N112, and $80 + 2^6$ (which wraps around to 16) to N16. Figure adapted from Stoica et al.]

Routing with Finger Tables
[Figure: Lookup(K19) routed over a ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; K19 lies between N10 and N20. Figure adapted from Stoica et al.]

Chord Dynamics
When a node joins:
  Initialize all fingers of the new node
  Update fingers of existing nodes
  Transfer content from the successor to the new node
When a node leaves:
  Transfer content to the successor

Chord Failures
Churn rate is very high (on average, nodes are in the system for only 60 minutes) and events happen concurrently
Churn (especially ungraceful departures or simultaneous joins/departures) can cause failure states, e.g. inconsistencies in successor relationships or, worse, loopy states
Requires a lot of maintenance messages to preserve the ideal state

Maintenance in P2P
A maintenance protocol ensures global connectivity and efficient lookup by continuously repairing the overlay network and routing tables
Maintenance is essential, e.g.
when a node
  Joins and announces its presence
  Updates its routing table to ensure efficient search
  Monitors neighbors for failures/departures
Cost of the maintenance protocol can be measured in terms of the rate of maintenance messages

Half Life
Defn. Suppose there are $N_t$ nodes at time $t$. Let the doubling time $\tau_d$ be such that at time $t + \tau_d$, $N_t$ new nodes have arrived. Similarly, let the halving time $\tau_h$ be such that at time $t + \tau_h$, $N_t/2$ of the nodes have departed. Then the half life of the system is $\min_t \min(\tau_d, \tau_h)$.
The half life is the average amount of time until half the system has been replaced. It measures the rate of change of the system.

Example
Nodes arrive according to a Poisson process with rate $\lambda$: the probability of $k$ arrivals in time $t$ is $e^{-\lambda t} (\lambda t)^k / k!$
Nodes remain for an exponentially distributed duration with rate $\mu$: the probability that a node stays for at least time $t$ is $e^{-\mu t}$
If the system is in steady state, then the arrival rate $\lambda$ must equal the departure rate $\mu N$, so $N = \lambda / \mu$
Doubling time $= N/\lambda = 1/\mu$, halving time $= (\ln 2)/\mu$, and so half life $= (\ln 2)/\mu$

Bounding Maintenance Costs
Thm. There exists a sequence of joins and leaves such that any node that, at any time, has received an average of fewer than $k$ notifications per half life will be disconnected from the network with probability at least $(1 - 1/e)^k$.
Cor. Any $N$-node P2P network that remains connected with probability at least $1 - 1/N$ must generate an average of $\Omega(\log N)$ notifications per node per half life.

Proof of Bound
Consider Poisson arrivals with rate $\lambda$ and exponential waiting time in the system with rate $\mu = 1$.
Suppose node $n$ averages fewer than $k$ notifications per half life, so there is a minimum time $t$ such that at time $t$, $n$ has received fewer than $kt$ notifications.
Observe that $n$ is isolated at time $t$ with probability at least $(1 - 1/(e-1))^k$.

Maintenance in Chord
Liben-Nowell, Balakrishnan, Karger: (Modified) Chord requires only $O(\log^2 n)$ maintenance messages per half life to maintain efficiency and correctness of search.
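
As a rough illustration of the finger-table routing described in the Chord slides, the following Python sketch builds finger tables on a 7-bit ring and forwards a lookup to the closest preceding finger until the key's successor is found. It is a simulation under stated assumptions, not the Chord implementation: the function names (`successor`, `finger_table`, `lookup`) are hypothetical, every finger table is precomputed from a global list of node IDs, and joins, departures, and failures are ignored.

```python
from bisect import bisect_left

M = 7            # identifier bits; the figures in the slides use a 7-bit ring
RING = 2 ** M


def successor(ident, nodes):
    """First node ID >= ident on the ring, wrapping around; `nodes` must be sorted."""
    i = bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def finger_table(node, nodes):
    """Finger j points at successor(node + 2^j) for j = 0 .. M-1."""
    return [successor(node + 2 ** j, nodes) for j in range(M)]


def in_open(x, a, b):
    """True if x lies strictly between a and b when walking clockwise from a."""
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start, key, nodes):
    """Route a lookup for `key` from node `start`; returns (owner, path of node IDs)."""
    nodes = sorted(nodes)
    # Assumption: a global view of all finger tables, for simulation only.
    tables = {n: finger_table(n, nodes) for n in nodes}
    current, path = start, [start]
    while True:
        succ = tables[current][0]
        # The key belongs to succ if it lies in the interval (current, succ].
        if key == succ or in_open(key, current, succ):
            if succ != current:
                path.append(succ)
            return succ, path
        # Otherwise jump to the farthest finger that still precedes the key.
        nxt = succ
        for f in reversed(tables[current]):
            if in_open(f, current, key):
                nxt = f
                break
        current = nxt
        path.append(current)


if __name__ == "__main__":
    ring = [5, 10, 20, 32, 60, 80, 99, 110]
    print(finger_table(80, [16, 80, 96, 112]))
    print(lookup(80, 19, ring))
```

On the ring from the "Routing with Finger Tables" figure, `lookup(80, 19, ring)` returns owner 20 via the path 80, 5, 10, 20; each hop at least halves the remaining clockwise distance to the key, which is where the O(log n) lookup bound comes from.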

Chord
Advantages:
  Highly efficient search
  Good load balancing
Disadvantages:
  Locality of data is destroyed
  Only handles exact-match queries, but keyword queries are more prevalent
  Most requests are for highly replicated files (needles vs haystack)

Conclusion
Saw several representative P2P systems, each with advantages and disadvantages
Many important issues:
  Efficiency of search
  Ability to adapt to dynamics of the system
  Security: malicious peers, spread of worms
  Free riding: reputation mechanisms, micropayment mechanisms
  Legality