Brief Overview of Academic Research on P2P
Pei Cao

Relevant Conferences
- IPTPS (International Workshop on Peer-to-Peer Systems)
- ICDCS (IEEE International Conference on Distributed Computing Systems)
- NSDI (USENIX Symposium on Networked Systems Design and Implementation)
- PODC (ACM Symposium on Principles of Distributed Computing)
- SIGCOMM

Areas of Research Focus
- Gnutella-inspired: the "directory service" problem
- BitTorrent-inspired: the "file distribution" problem
- P2P live streaming
- P2P and net neutrality

Gnutella-Inspired Research Studies: The Applications and the Problems
- Napster, Gnutella, KaZaA/FastTrack, Skype
- Look for a particular content/object and find which peer has it: the "directory service" problem
  - Challenge: how to offer a scalable directory service in a fully decentralized fashion
- Arrange a direct transfer from that peer: the "punch a hole in the firewall" problem

Decentralized Directory Services
- Structured networks: DHTs (Distributed Hash Tables)
  - A very active research area from 2001 to 2004
  - Limitation: lookup by keys only
  - Multi-attribute DHTs offer limited support for query-based lookup
- Unstructured networks: various improvements to basic flooding-based schemes

What Is a DHT?
- Single-node hash table:
  - key = Hash(name)
  - put(key, value)
  - get(key) -> value
- How do I do this across millions of hosts on the Internet? A Distributed Hash Table

Distributed Hash Tables
- Chord, CAN, Pastry, Tapestry, Symphony, Koorde, etc.

The Problem
- A publisher at one node does Put(key="title", value=file data...); a client at another node does Get(key="title")
- Two questions: key placement, and routing to find the key

Key Placement
- Traditional hashing: nodes numbered from 1 to N; a key is placed at node (hash(key) % N)
- Why traditional hashing has problems: when N changes (nodes join or leave), nearly every key maps to a different node and must move

Consistent Hashing: IDs
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
- SHA-1 distributes both uniformly
- How to map key IDs to node IDs?
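The key-to-node mapping can be sketched in a few lines. This is a minimal illustration of consistent-hashing placement, not any particular system's implementation; the helper names (`sha1_id`, `successor`) and the 7-bit ID space are assumptions taken from the slides' example.

```python
import hashlib
from bisect import bisect_left

def sha1_id(s, bits=7):
    """Hash a string into the circular ID space (7 bits here; 160 in practice)."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % (2 ** bits)

def successor(node_ids, key_id):
    """A key is stored at its successor: the first node ID >= the key ID,
    wrapping around the circle."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id)
    return ids[i % len(ids)]

# Matching the slides' example ring {N10, N32, N90, N105}:
# K20 lands on N32, K80 lands on N90, and K120 wraps around to N10.
home = successor([10, 32, 90, 105], 20)
```

Unlike hash(key) % N, when a node joins or leaves only the keys on one arc of the circle move to a new home; everything else stays put.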
Consistent Hashing: Placement
- Circular 7-bit ID space; key 5 becomes K5, node 105 becomes N105
- A key is stored at its successor: the node with the next-higher ID
  (e.g., K20 is stored at N32, and K80 at N90)

Basic Lookup
- "Where is key 80?" is forwarded around the ring (N10 -> N32 -> N60 -> N90)
  until a node can answer "N90 has K80"
- Following successors alone takes a linear number of hops in the worst case

"Finger Table" Allows log(N)-Time Lookups
- Finger i of node n points to the successor of n + 2^i
- Each finger roughly halves the remaining distance to the target:
  1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring
- Lookups therefore take O(log N) hops

Chord Lookup Algorithm Properties
- Interface: lookup(key) -> IP address
- Efficient: O(log N) messages per lookup, where N is the total number of servers
- Scalable: O(log N) state per node
- Robust: survives massive failures
- Simple to analyze

Related Studies on DHTs
- Many variations of DHTs
  - Different ways to choose the fingers
  - Ways to make them more robust
  - Ways to make them more network-efficient
- Studies of different DHTs: what happens when peers leave (a.k.a. churn)?
- Applications built using DHTs
  - Trackerless BitTorrent
  - Beehive, a P2P-based DNS system

Directory Lookups: Unstructured Networks
- Example: Gnutella
- Supports more flexible queries
  - Typically, precise "name" search is only a small portion of all queries
- Simplicity, and high resilience against node failures
- Problem: scalability -- flooding generates O(N*E) messages

Flooding-Based Searches
- Duplication increases as the TTL increases in flooding
- Worst case: a node A is interrupted by N * q * degree(A) messages

Problems with Simple TTL-Based Flooding
- Hard to choose the TTL:
  - For objects that are widely present in the network, small TTLs suffice
  - For objects that are rare in the network, large TTLs are necessary
- The number of query messages grows exponentially as the TTL grows

Idea #1: Adaptively Adjust the TTL
- "Expanding ring": multiple floods, starting with TTL=1 and incrementing the
  TTL by 2 each time until the search succeeds
- Success varies by network topology

Idea #2: Random Walk
- A simple random walk takes too long to find anything!
- Multiple-walker random walk:
  - N agents, each walking T steps, visit as many nodes as 1 agent walking N*T steps
  - When to terminate the search: check back with the query originator once every C steps

Flexible Replication
- In unstructured systems, search success is essentially about coverage: visiting
  enough nodes to probabilistically find the object => replication density matters
- Limited node storage => what is the optimal replication density distribution?
- In Gnutella, only nodes that query an object store it => r_i is proportional to p_i
- What if we use different replication strategies?

Optimal r_i Distribution
- Goal: minimize sum_i (p_i / r_i), subject to sum_i r_i = R
- Calculation: introduce a Lagrange multiplier lambda, and find the r_i and lambda
  that minimize sum_i (p_i / r_i) + lambda * (sum_i r_i - R)
  => -p_i / r_i^2 + lambda = 0 for all i
  => r_i is proportional to sqrt(p_i)

Square-Root Distribution
- General principle: to minimize sum_i (p_i / r_i) under the constraint
  sum_i r_i = R, make r_i proportional to the square root of p_i
- Other application examples:
  - Bandwidth allocation to minimize expected download times
  - Server load balancing to minimize expected request latency

Achieving Square-Root Distribution
- Suggested heuristic: store an object at a number of nodes proportional to the
  number of nodes visited in order to find the object, with each node using
  random replacement
- Two implementations:
  - Path replication: store the object along the path of a successful "walk"
  - Random replication: store the object randomly among the nodes visited by the agents

KaZaA
- Uses supernodes; regular nodes : supernodes = 100 : 1
- A simple way to scale the system by a factor of 100

BitTorrent-Inspired Research Studies

Modeling and Understanding BitTorrent
- Analysis based on modeling
  - Views BitTorrent as a type of gossip algorithm
  - Usually does not model the tit-for-tat aspects; assumes perfectly connected networks
  - Statistical modeling techniques; mostly published in PODC or SIGMETRICS
- Simulation studies
  - Different assumptions about bottlenecks; varying detail in the modeling of the data transfer
  - Published in ICDCS and SIGCOMM

Studies on the Effect of BitTorrent on ISPs
- Observation: P2P contributes to cross-ISP traffic
  - SIGCOMM 2006 publication studying Japanese backbone traffic
- Attempts to improve the network locality of BitTorrent-like applications
  - ICDCS 2006 publication
- Academic P2P file-sharing systems: Bullet, Julia, etc.

Techniques to Alleviate the "Last Missing Piece" Problem
- Apply network coding to the pieces exchanged between peers
  - Pablo Rodriguez Rodriguez, Microsoft Research (recently moved to Telefonica Research)
- Use a different piece-replication strategy
  - Dahlia Malkhi, Microsoft Research, "On Collaborative Content Distribution
    Using Multi-Message Gossip": associate an "age" with file segments

Network Coding
- Main feature: allowing intermediate nodes to encode packets, making optimal
  use of the available network resources
- Similar technique: erasure codes -- reconstructing the original content of
  size n from roughly any subset of n symbols drawn from a large universe of
  encoded symbols

Network Coding in P2P: The Model
- Server: divides the file into k blocks and uploads blocks at random to different clients
- Clients (users): collaborate with each other to assemble the blocks and
  reconstruct the original file
  - Exchange information and data with only a small subset of others (neighbors)
  - Symmetric neighborhoods and links

Network Coding in P2P
- Assume a node holds blocks B1, B2, ..., Bk
- Pick random coefficients C1, C2, ..., Ck
- Construct a new block E = C1*B1 + C2*B2 + ... + Ck*Bk
- Send E and (C1, C2, ..., Ck) to a neighbor
- Decoding: collect enough linearly independent E's and solve the linear system
- If all nodes pick the vector C randomly, the chances are high that after
  receiving roughly k blocks, a node can recover B1 through Bk

P2P Live Streaming
- Motivation: Internet applications such as PPLive, PPStream, etc.
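The encode/decode steps from the network coding slides above can be sketched with coefficients over GF(2), where "multiply and add" reduces to XOR. The field choice and all names here are illustrative assumptions; real systems typically work over a larger field such as GF(2^8).

```python
import random

def encode(blocks):
    """Random linear combination over GF(2): XOR a random subset of blocks.
    Returns (coefficient bitmask C, combined payload E)."""
    coeffs = 0
    while coeffs == 0:                      # skip the useless all-zero vector
        coeffs = random.getrandbits(len(blocks))
    payload = 0
    for i, b in enumerate(blocks):
        if coeffs >> i & 1:
            payload ^= b
    return coeffs, payload

def decode(packets, k):
    """Gaussian elimination over GF(2): returns the k original blocks once
    k linearly independent combinations have arrived, else None."""
    basis = {}                              # pivot bit -> (coeffs, payload)
    for c, p in packets:
        for pivot in sorted(basis, reverse=True):
            if c >> pivot & 1:              # reduce against existing rows
                bc, bp = basis[pivot]
                c, p = c ^ bc, p ^ bp
        if c:
            basis[c.bit_length() - 1] = (c, p)
    if len(basis) < k:
        return None                         # not enough independent packets yet
    for pivot in sorted(basis):             # back-substitute to unit vectors
        c, p = basis[pivot]
        for other in list(basis):
            oc, op = basis[other]
            if other != pivot and oc >> pivot & 1:
                basis[other] = (oc ^ c, op ^ p)
    return [basis[i][1] for i in range(k)]
```

A receiver simply keeps collecting `encode()` packets and retrying `decode()`; once it holds k independent combinations, the linear system is solvable and all blocks are recovered, regardless of which particular combinations arrived.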
- Challenge: QoS issues
  - Raw bandwidth constraints
  - Example: PPLive exploits the significant bandwidth disparity between
    "university nodes" and "residential nodes"
  - Satisfying the demands of content publishers

P2P Live Streaming Can't Stand on Its Own
- P2P as a complement to IP multicast
- P2P as a way to reduce server load
  - Used where IP multicast isn't enabled
  - By sourcing parts of streams from peers, server load might be reduced by 10%
- P2P as a way to reduce backbone bandwidth requirements
  - When core network bandwidth isn't sufficient

P2P and Net Neutrality

It's All TCP's Fault
- TCP enforces per-flow fairness
- Browsers: 2-4 TCP flows per web server, contacting only a few web servers at
  a time, with short flows
- P2P applications: a much higher number of TCP connections, many more
  endpoints, and long flows

When and How to Apply Traffic Shaping
- Current practice: application recognition
- Needs:
  - An application-agnostic way to trigger traffic shaping
  - A clear statement to users about what happens
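The per-flow fairness point above can be seen with simple arithmetic. This is an idealized model of a single bottleneck link, and the flow counts are illustrative assumptions, not measurements.

```python
def per_flow_share(link_capacity_mbps, flows_per_host):
    """Idealized TCP per-flow fairness: each flow gets an equal share of the
    bottleneck, so a host's bandwidth is proportional to its flow count."""
    total_flows = sum(flows_per_host.values())
    return {host: link_capacity_mbps * n / total_flows
            for host, n in flows_per_host.items()}

# Hypothetical 100 Mbps bottleneck shared by a browser (4 flows)
# and a P2P application (40 flows).
shares = per_flow_share(100.0, {"browser": 4, "p2p": 40})
```

Even though every individual flow is treated "fairly," the P2P host ends up with ten times the browser's bandwidth, which is what pushes ISPs toward application-level traffic shaping.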