INF5071 – Performance in Distributed Systems: Peer-to-Peer Systems
27/10 & 10/11 2006, Carsten Griwodz & Pål Halvorsen

Client-Server
- Traditional distributed computing
- Successful architecture, and will continue to be so (adding proxy servers)
- Tremendous engineering is necessary to make server farms scalable and robust
- (Figure: backbone network connecting several local distribution networks)

Distribution with proxies
- Hierarchical distribution system: root servers, regional servers, local servers, end-systems
- Completeness of available content at the root servers
- E.g. proxy caches that consider popularity
- Popular videos are replicated and kept close to clients; unpopular ones stay close to the root servers
- Popular videos are replicated more frequently

Peer-to-Peer (P2P)
- Really an old idea: a distributed system architecture
- No centralized control
- Nodes are symmetric in function
- Typically many nodes, but unreliable and heterogeneous

Overlay networks
- (Figure: overlay nodes and overlay links built on top of IP links and LANs; one overlay link may span an entire IP path determined by IP routing)

P2P
- Many aspects are similar to proxy caches:
  - Nodes act as both clients and servers
  - Distributed storage; content is brought closer to clients
  - Storage limitation of each node
  - Number of copies often related to content popularity
  - Necessary to make replication and de-replication decisions
  - Redirection
- But:
  - No distinguished roles
  - No generic hierarchical relationship (at most a hierarchy per data item)
  - Clients do not know where the content is and may need a discovery protocol
  - All clients may act as roots (origin servers)
  - Members of the P2P network come and go

P2P Systems
- New considerations for distribution systems
- Considered here: scalability, fairness and load balancing; content location; failure resilience; routing (application-layer routing, content routing, request routing)
- Not considered here: copyright, privacy, trading

Examples: Napster
- Program for sharing (music) files over the Internet
- Approach taken: central index; distributed storage and download; all downloads are shared
- P2P aspects: client nodes also act as file servers

Napster: Joining
- The client connects to Napster with login and password and transmits its current listing of shared files to the central index
- Napster registers the username, maps the username to the client's IP address, and records the song list

Napster: Query
- The client sends a song request to the Napster server, which checks its song database
- Napster returns the matching songs with usernames and IP addresses (plus extra statistics)

Napster: Download
- The user selects a song; the download request is sent straight to the owning user's machine, which is contacted directly if available
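The central-index interaction above (register on join, keyword query, then direct download between peers) can be made concrete with a small sketch. This is not Napster's actual protocol; the class and method names are illustrative only.

```python
# Minimal sketch of a Napster-style central index (illustrative, not the real protocol).
class CentralIndex:
    def __init__(self):
        self.peers = {}        # username -> IP address
        self.songs = {}        # song title -> set of usernames sharing it

    def join(self, username, ip, shared_files):
        """Peer logs in and uploads its current list of shared files."""
        self.peers[username] = ip
        for title in shared_files:
            self.songs.setdefault(title, set()).add(username)

    def leave(self, username):
        """Remove a departing peer and all its index entries."""
        self.peers.pop(username, None)
        for holders in self.songs.values():
            holders.discard(username)

    def query(self, keyword):
        """Return (username, ip) pairs whose song titles match the keyword."""
        results = []
        for title, holders in self.songs.items():
            if keyword.lower() in title.lower():
                results.extend((user, self.peers[user]) for user in holders)
        return results

# The file transfer itself then happens directly between the two peers,
# outside the index server.
index = CentralIndex()
index.join("alice", "198.10.10.1", ["let it be.mp3"])
print(index.query("let it be"))   # -> [('alice', '198.10.10.1')]
```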
Napster: Assessment
- Scalability, fairness, load balancing:
  - Replication to querying nodes; the number of copies increases with popularity
  - Large distributed storage
  - Unavailability of files with low popularity
  - Network topology is not accounted for at all; latency may be increased
- Content location:
  - Simple, centralized search/location mechanism
  - Can only query by index terms
- Failure resilience:
  - No dependencies among normal peers
  - Index server is a single point of failure

Examples: Gnutella
- Program for sharing files over the Internet
- Approach taken: purely P2P, nothing centralized; dynamically built overlay network; query for content by overlay broadcast; no index maintenance
- P2P aspects: peer-to-peer file sharing and querying; entirely decentralized architecture; many iterations were needed to fix the poor initial design (lack of scalability)

Gnutella: Joining
- Connect to one known host and send a broadcast ping; this can be any host, and hosts are spread through word-of-mouth or host caches
- The ping is broadcast through the overlay with a TTL of 7
- Hosts that are not overwhelmed respond with a routed pong; Gnutella caches the IP addresses of the replying nodes as neighbors
- In the example, the grey nodes do not respond within a certain amount of time because they are overloaded

Gnutella: Query
- Query by broadcasting in the overlay (a sketch follows the Gnutella assessment below): the query is sent to all overlay neighbors, who forward it to all their neighbors, up to 7 hops deep (TTL 7)
- Responses are routed back to the overlay node that originated the broadcast query
- The querying client receives several responses; the user gets a list of files that matched the query and a corresponding IP address

Gnutella: Transfer
- File transfer uses direct communication between the peers
- The file transfer protocol is not part of the Gnutella specification

Gnutella: Assessment
- Scalability, fairness, load balancing:
  - Replication to querying nodes; the number of copies increases with popularity
  - Large distributed storage
  - Unavailability of files with low popularity
  - Bad scalability: uses a flooding approach
  - Network topology is not accounted for at all; latency may be increased
- Content location:
  - No limits to query formulation
  - Less popular files may be outside the TTL
- Failure resilience:
  - No single point of failure
  - Many known neighbors
  - Assumes quite stable relationships
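A minimal sketch of the TTL-limited overlay flooding described above. The graph representation and function names are illustrative; this is not the Gnutella wire protocol, and the routed pong/query-hit responses are only noted in a comment.

```python
def flood_ping_or_query(overlay, start, ttl=7):
    """TTL-limited broadcast over the overlay.
    overlay: dict mapping node -> list of neighbor nodes.
    Returns the set of nodes reached; in Gnutella each reached node would
    send a routed pong (or query hit) back along the reverse path."""
    reached = {start}
    frontier = [start]
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for neighbor in overlay.get(node, []):
                if neighbor not in reached:          # drop duplicate messages
                    reached.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1                                     # one overlay hop consumed
    return reached

overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
print(flood_ping_or_query(overlay, "A", ttl=2))      # nodes within 2 overlay hops of A
```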
Examples: Freenet
- Program for sharing files over the Internet, with a focus on anonymity
- Approach taken: purely P2P, nothing centralized; dynamically built overlay network; queries use hashed keys and best-first search; caching of hash values and content; content forwarding in the overlay
- P2P aspects: peer-to-peer file sharing and querying; entirely decentralized architecture; anonymity

Freenet: Nodes and Data
- Nodes keep routing tables containing the IP addresses of other nodes and the hash values they hold (or held)
- Data is indexed with hash values: "identifiers" (keywords, author IDs, or the content itself) are hashed
- The Secure Hash Algorithm (SHA-1) produces a "one-way" 160-bit key; the content-hash key is CHK = SHA-1(content)
- Typically stores blocks

Freenet: Storing and Retrieving Data
- Storing data: data is moved to a server with arithmetically close keys
  1. The key and data are sent to the local node
  2. The key and data are forwarded to the node with the nearest key
  Repeat step 2 until the maximum number of hops is reached
- Retrieving data: best-first search
  1. An identifier is hashed into a key
  2. The key is sent to the local node
  3. If the data is not in the local store, the request is forwarded to the best neighbor
  Repeat step 3 with the next-best neighbor until the data is found or the request times out
  4. If the data is found, or the hop count reaches zero, the data or an error is returned along the chain of nodes (if the data is found, intermediary nodes create entries in their routing tables)

Freenet: Best First Search
- Heuristics for selecting the direction: >RES, returned most results; <TIME, shortest satisfaction time; <HOPS, minimum hops for results; >MSG, sent us most messages (all types); <QLEN, shortest queue; <LAT, shortest latency; >DEG, highest degree
- (A sketch of the best-first lookup follows the Freenet assessment below.)

Freenet: Routing Algorithm
- (Figure only)

Freenet: Assessment
- Scalability, fairness, load balancing:
  - Caching in the overlay network; access latency decreases with popularity
  - Large distributed storage
  - Fast removal of files with low popularity, but a lot of storage is wasted on highly popular files
  - Network topology is not accounted for
- Content location:
  - Search by hash key: limited ways to formulate queries
  - Content placement changes to fit the search pattern
  - Less popular files may be outside the TTL
- Failure resilience:
  - No single point of failure
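A rough sketch of the best-first retrieval walk described above, steering by arithmetic key closeness and caching results on the return path. Freenet combines several heuristics and adds anonymity mechanisms; the Node structure and function names here are illustrative only.

```python
import hashlib

class Node:
    def __init__(self):
        self.store = {}      # key -> data held locally
        self.routing = {}    # key -> neighbor Node believed to hold it

def freenet_key(identifier: str) -> int:
    """Freenet-style key: SHA-1 of the identifier as a 160-bit integer."""
    return int(hashlib.sha1(identifier.encode()).hexdigest(), 16)

def best_first_lookup(node, key, max_hops=10, visited=None):
    """Forward the request to the neighbor advertising the arithmetically
    closest key; on success, cache the data and the route on the way back."""
    visited = visited if visited is not None else set()
    if key in node.store:
        return node.store[key]
    if max_hops == 0:
        return None
    for known_key, neighbor in sorted(node.routing.items(),
                                      key=lambda kv: abs(kv[0] - key)):
        if id(neighbor) in visited:
            continue
        visited.add(id(neighbor))
        data = best_first_lookup(neighbor, key, max_hops - 1, visited)
        if data is not None:
            node.store[key] = data          # intermediary nodes cache the data
            node.routing[key] = neighbor    # ... and remember who served it
            return data
    return None

a, b, c = Node(), Node(), Node()
a.routing = {freenet_key("music"): b}
b.routing = {freenet_key("music"): c}
c.store[freenet_key("Let It Be")] = b"...song bytes..."
print(best_first_lookup(a, freenet_key("Let It Be")) is not None)   # True
```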
Examples: FastTrack, Morpheus, OpenFT
- Peer-to-peer file sharing protocol with three different node roles:
  - USER: normal nodes
  - SEARCH: keep an index of "their" normal nodes and answer search requests
  - INDEX: keep an index of search nodes and redistribute search requests
- (Figures: USER nodes register their content at SEARCH nodes; queries travel via SEARCH and INDEX nodes)

FastTrack, Morpheus, OpenFT: Assessment
- Scalability, fairness, load balancing:
  - Large distributed storage
  - Avoids broadcasts
  - Load is concentrated on the super nodes (index and search)
  - Network topology is partially accounted for
  - Efficient structure development
- Content location:
  - Search by hash key: limited ways to formulate queries
  - All indexed files are reachable
  - Can only query by index terms
- Failure resilience:
  - No single point of failure, but the overlay networks of index servers (and search servers) reduce resilience
  - Relies on very stable relationships; content is registered at search nodes
  - Relies on a partially static infrastructure

Examples: BitTorrent
- Distributed download system; content is distributed in segments
- Tracker: one central download server per content
- Approach to fairness (tit-for-tat) per content
- No approach for finding the tracker; no content transfer protocol included

BitTorrent: Segment Download Operation
- The tracker tells a peer the source and number of a segment to get
- The peer retrieves the content in pull mode and reports the availability of the new segment back to the tracker
- Rarest-first strategy for choosing segments (a sketch follows the BitTorrent assessment below)
- A peer that has not contributed enough gets no second input stream
- All nodes: at most 2 concurrent streams in and out

BitTorrent: Assessment
- Scalability, fairness, load balancing:
  - Large distributed storage
  - Avoids broadcasts
  - Transfers content segments rather than complete content
  - Does not rely on clients staying online after download completion
  - Contributors are allowed to download more
- Content location:
  - Central server (tracker) approach
- Failure resilience:
  - The tracker is a single point of failure
  - Content holders can lie
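A small sketch of the rarest-first segment selection mentioned above. Real BitTorrent clients combine this with tit-for-tat choking and an endgame mode; the data structures here are illustrative.

```python
from collections import Counter
import random

def pick_rarest_segment(have, peers_have):
    """have: set of segment indices this peer already holds.
    peers_have: list of sets, one per connected peer, of segments they hold.
    Returns the index of the rarest still-needed segment, or None."""
    availability = Counter()
    for peer_set in peers_have:
        for seg in peer_set:
            if seg not in have:
                availability[seg] += 1
    if not availability:
        return None
    rarest = min(availability.values())
    # Break ties randomly so peers do not all fetch the same segment.
    candidates = [seg for seg, count in availability.items() if count == rarest]
    return random.choice(candidates)

have = {0, 3}
peers_have = [{0, 1, 2}, {1, 2, 4}, {2, 4}]
print(pick_rarest_segment(have, peers_have))   # -> 1 or 4 (held by 2 peers; segment 2 is held by 3)
```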
Comparison: Napster, Gnutella, FreeNet, FastTrack, BitTorrent (1/2)
- Scalability: Gnutella is limited by flooding; BitTorrent uses separate overlays per file
- Routing information: Napster, one central server; Gnutella, neighbour list; FreeNet, uses caching; FastTrack, index servers; BitTorrent, one tracker per file
- Lookup cost: Napster O(1); Gnutella O(#nodes); FreeNet O(log(#nodes)); FastTrack O(1); BitTorrent O(#blocks)
- Physical locality: FastTrack, by search server assignment

Comparison: Napster, Gnutella, FreeNet, FastTrack, BitTorrent (2/2)
- Load balancing: Napster and Gnutella, many replicas of popular content; FreeNet, content placement changes to fit the search; FastTrack, load concentrated on supernodes; BitTorrent, rarest-first copying
- Content location: Napster, all files reachable, uses the index server, search by index term; Gnutella, uses flooding, unpopular files may be outside the TTL; FreeNet, search by hash, unpopular files may be outside the TTL; FastTrack, all files reachable, search by hash; BitTorrent, external issue
- Failure resilience: Napster, index server as single point of failure; Gnutella and FreeNet, no single point of failure; FastTrack, overlay network of index servers; BitTorrent, tracker as single point of failure

Peer-to-Peer Systems: Distributed Directories

Examples: Chord
- Approach taken:
  - Only concerned with efficient indexing
  - Distributed index: a decentralized lookup service
  - Inspired by consistent hashing (SHA-1 hash)
  - Content handling is entirely an external problem: no relation to content, no included replication or caching
- P2P aspects:
  - Every node must maintain keys
  - Adaptive to membership changes
  - Client nodes also act as file servers

Lookup Based on Hash Tables
- lookup(key) returns data; insert(key, data) stores it
- A hash function maps the key to a position (hash bucket) in the hash table (buckets 0 .. N)

Distributed Hash Tables (DHTs)
- The distributed application calls insert(key, data) and lookup(key) on the DHT layer, which spans all nodes
- Define a useful key nearness metric
- Keep the hop count small
- Keep the routing tables the "right size"
- Stay robust despite rapid changes in membership
- Nodes are the hash buckets; a key identifies data uniquely; the DHT balances keys and data across nodes

Chord IDs & Consistent Hashing
- m-bit identifier space for both keys and nodes
- Key identifier = SHA-1(key), e.g. key "LetItBe" hashes to ID 54
- Node identifier = SHA-1(IP address), e.g. IP 198.10.10.1 hashes to ID 123
- Both are uniformly distributed; identifiers are ordered on a circle modulo 2^m
- A key is mapped to the first node whose ID is equal to or follows the key ID (see the sketch after the routing slides below)

Routing: Everyone Knows Everyone
- Every node knows every other node; this requires global information
- Routing tables are large: N entries
- Where is "LetItBe"? Hash("LetItBe") = K54; the answer ("N56 has K60") is found directly, requiring O(1) hops

Routing: All Know Their Successor
- Every node only knows its successor in the ring; small routing table (one entry)
- Where is "LetItBe"? Hash("LetItBe") = K54; the query is forwarded around the ring until the answer ("N56 has K60") is found, requiring O(N) hops
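The identifier circle and successor mapping can be sketched directly from the definitions above. The helper names are illustrative, and a real Chord node would of course not hold the full sorted node list; avoiding that is exactly what the finger tables below are for.

```python
import hashlib
from bisect import bisect_left

M = 8                                   # identifier bits (2^M positions); kept small for the example

def chord_id(value: str) -> int:
    """SHA-1 based identifier truncated to the M-bit circle."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** M)

def successor(key_id: int, node_ids: list) -> int:
    """First node whose id is equal to or follows the key id (mod 2^M)."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]          # wrap around the circle

nodes = [chord_id(ip) for ip in ("198.10.10.1", "10.0.0.2", "10.0.0.3")]
key = chord_id("LetItBe")
print(f"key {key} is stored at node {successor(key, nodes)}")
```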
Routing: "Finger Tables"
- Every node knows m other nodes in the ring, with distances increasing exponentially
- Finger i points to the successor of n + 2^i (a lookup sketch follows the Chord assessment below)

Joining the Ring
- Three-step process:
  1. Initialize all fingers of the new node, by asking another node for help
  2. Update the fingers of existing nodes
  3. Transfer keys from the successor to the new node

Handling Failures
- Failure of nodes might cause incorrect lookups: in the example ring (N10, N80, N85, N102, N113, N120), N80 does not know the correct successor for Lookup(90), so the lookup fails
- One approach: successor lists. Each node knows its r immediate successors; after a failure, the first known live successor is used; this increases the routing table size

Chord Assessment
- Scalability, fairness, load balancing:
  - Large distributed index
  - Logarithmic search effort; routing tables contain log(#nodes) entries
  - Quick lookup in large systems, low variation in lookup costs
  - Network topology is not accounted for
- Content location:
  - Search by hash key: limited ways to formulate queries
  - All indexed files are reachable
  - log(#nodes) lookup steps
  - Not restricted to file location
- Failure resilience:
  - No single point of failure in the basic approach
  - Successor lists allow the use of neighbors of failed nodes
  - Salted hashes allow multiple indexes
  - Relies on well-known relationships, but fast awareness of disruption and rebuilding
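A sketch of finger-table construction and the resulting logarithmic lookup, following the rule above (finger i = successor of n + 2^i). The interval arithmetic and the global node list are illustrative simplifications; a real node only knows its own fingers and successor.

```python
from bisect import bisect_left

M = 8                                   # identifier bits, as in the previous sketch

def successor(key_id, node_ids):
    """First node id equal to or following key_id on the 2^M circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

def in_between(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b) or (a, b] going clockwise."""
    if x == b:
        return inclusive_right
    if a < b:
        return a < x < b
    return x > a or x < b               # the interval wraps past zero

def fingers(n, node_ids):
    """finger[i] = successor of (n + 2^i) mod 2^M, for i = 0 .. M-1."""
    return [successor((n + 2 ** i) % (2 ** M), node_ids) for i in range(M)]

def find_successor(n, key, node_ids, hops=0):
    """Greedy Chord lookup: hop via the closest preceding finger, O(log N) hops."""
    succ = successor((n + 1) % (2 ** M), node_ids)      # this node's ring successor
    if in_between(key, n, succ, inclusive_right=True):
        return succ, hops
    for f in reversed(fingers(n, node_ids)):            # farthest finger first
        if in_between(f, n, key):
            return find_successor(f, key, node_ids, hops + 1)
    return succ, hops + 1                                # fall back to the successor

nodes = [10, 80, 85, 102, 113, 120]                      # node ids on an 8-bit ring
print(find_successor(10, 90, nodes))                     # -> (102, 2): key 90 lives on node 102
```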
Examples: Pastry
- Approach taken:
  - Only concerned with efficient indexing
  - Distributed index: a decentralized lookup service using DHTs
  - Content handling is entirely an external problem: no relation to content, no included replication or caching
- P2P aspects:
  - Every node must maintain keys
  - Adaptive to membership changes
  - Leaf nodes are special
  - Client nodes also act as file servers

Pastry
- DHT approach: each node has a unique 128-bit nodeId, assigned when the node joins and used for routing; each message has a key
- NodeIds and keys are in base 2^b, where b is a configuration parameter with typical value 4 (base 16, hexadecimal digits)
- A Pastry node routes a message to the node with the nodeId numerically closest to the key; the number of routing steps is O(log N)
- Pastry takes network locality into account
- Each node maintains:
  - A routing table organized into log_(2^b) N rows with 2^b - 1 entries each
  - A neighborhood set M: nodeIds and IP addresses of the M closest nodes, useful to maintain locality properties
  - A leaf set L: the set of L nodes with the closest nodeIds

Pastry Routing Table (example with b = 2, so nodeIds are base 4, 16 bits; local node 10233102)
- The leaf set contains the nodes numerically closest to the local node (a smaller and a larger half)
- The routing table has log_(2^b) N rows with 2^b - 1 entries per row; each row follows the pattern "common prefix, next digit, rest"
- Entries in row n share the first n - 1 digits with the current node; entries in the m-th column have m as the n-th digit
- Entries with no suitable nodeId are left empty
- The neighborhood set contains the nodes closest to the local node according to the proximity metric

Pastry Routing (a sketch of the next-hop choice follows the Pastry assessment below)
1. Search the leaf set for an exact match
2. Otherwise, search the routing table for an entry that shares at least one more digit of the prefix with the key
3. Otherwise, forward the message to a node with an equally long shared prefix but a nodeId numerically closer to the key, taken from the leaf set
- Example from the slide: a message from node 2331 for key 1221 is routed 2331, 1331, 1211, 1223, and finally 1221

Pastry Assessment
- Scalability, fairness, load balancing:
  - Distributed index of arbitrary size
  - Support for physical locality and locality by hash value
  - Stochastically logarithmic search effort
  - Network topology is partially accounted for, given an additional metric for physical locality
  - Stochastically logarithmic lookup in large systems, variable lookup costs
- Content location:
  - Search by hash key: limited ways to formulate queries
  - All indexed files are reachable
  - Not restricted to file location
- Failure resilience:
  - No single point of failure
  - Several possibilities for backup routes
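A sketch of the prefix-based next-hop rule above, for base-4 digit strings. The routing table here is a plain dict keyed by (row, digit); a real Pastry node also consults its leaf and neighborhood sets and its proximity metric when filling and choosing entries.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two nodeIds have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(local: str, key: str, routing_table: dict, leaf_set: list) -> str:
    """Pastry-style next hop: leaf set first, then a longer-prefix routing
    table entry, otherwise the numerically closest known node."""
    if key in leaf_set:
        return key                                   # exact match in the leaf set
    p = shared_prefix_len(local, key)
    entry = routing_table.get((p, key[p]))           # row p, column = next digit of the key
    if entry is not None:
        return entry                                 # one more matching prefix digit
    # Fall back: forward to the numerically closest node we know about.
    candidates = leaf_set + list(routing_table.values()) or [local]
    return min(candidates, key=lambda n: abs(int(n, 4) - int(key, 4)))

# Example in base 4 with nodeIds from the slide:
table_2331 = {(0, "0"): "0130", (0, "1"): "1331", (0, "3"): "3001"}
print(next_hop("2331", "1221", table_2331, leaf_set=[]))   # -> '1331'
```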
Examples: Tapestry
- Approach taken:
  - Only concerned with self-organizing indexing
  - Distributed index: a decentralized lookup service using DHTs
  - Content handling is entirely an external problem: no relation to content, no included replication or caching
- P2P aspects:
  - Every node must maintain keys
  - Adaptive to changes in membership and value changes

Routing and Location
- Namespace (nodes and objects): 160-bit SHA-1 hash
- Each object has its own hierarchy rooted at RootID = hash(ObjectID)
- Prefix routing [JSAC 2004]:
  - The router at the h-th hop shares a prefix of length at least h digits with the destination
  - Local tables at each node (neighbor maps); route digit by digit: 4*** to 42** to 42A* to 42AD
  - Neighbor links are kept in levels
- Suffix routing [tech report 2001]:
  - The router at the h-th hop shares a suffix of length at least h digits with the destination
  - Example: 5324 routes to 0629 via 5324, 2349, 1429, 7629, 0629
- Tapestry routing:
  - Caches pointers to all copies; caches are soft state
  - UDP heartbeat and TCP timeout to verify route availability
  - Each node has 2 backup neighbors; failing primary neighbors are kept for some time (days)
  - Multiple root nodes are possible, identified via hash functions; a value is searched in a root if its hash is that of the root
  - Choosing a root node: choose a random address and route towards it; if no route exists, choose a surrogate deterministically
- Object location (a sketch follows the comparison below):
  - The root is responsible for storing the object's location (but not the object)
  - Publish and search both route incrementally towards the root
  - Objects are key/value pairs, e.g. filename/file
  - Keys are replicated automatically; values are not, but they may be replicated or stored as erasure-coded fragments

Tapestry (figures)
- Insert(key K, value V): the publisher routes towards K's root, leaving (#K, pointer) entries along the path
- A query ?K routes towards the root until it hits a cached (#K, pointer) entry; resolved pointers are cached along the way back to the querier
- Move(key K, value V): after the value moves, stale pointers stay wrong until they time out

Tapestry Assessment
- Scalability, fairness, load balancing:
  - Distributed index(es) of arbitrary size
  - Limited physical locality of key access via caching and nodeId selection
  - Variable lookup costs
  - Independent of content scalability
- Content location:
  - Search by hash key: limited ways to formulate queries
  - All indexed files are reachable
  - Not restricted to file location
- Failure resilience:
  - No single point of failure
  - Several possibilities for backup routes
  - Caching of key resolutions
  - Use of hash values with several salt values

Comparison: Chord, Pastry, Tapestry
- Routing information: Chord, log(#nodes) routing table size; Pastry, log(#nodes) x (2^b - 1) routing table size; Tapestry, at least log(#nodes) routing table size
- Lookup cost: Chord, log(#nodes); Pastry, approximately log(#nodes); Tapestry, variable lookup cost
- Physical locality: Chord, not accounted for; Pastry, by neighbor list; Tapestry, in mobile Tapestry
- Failure resilience: Chord, no resilience in the basic version, additional successor lists provide resilience; Pastry, no single point of failure, several backup routes; Tapestry, no single point of failure, several backup routes, alternative hierarchies
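A sketch of the publish/locate idea above: publishing walks from the storing node towards the object's root and drops location pointers, while a query walks towards the same root and stops at the first pointer it meets. The root selection by closest hash and the explicit route lists are stand-ins for Tapestry's real prefix routing and surrogate routing.

```python
import hashlib

def node_hash(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def tapestry_root(object_id: str, nodes: list) -> str:
    """Illustrative root choice: the node whose hashed id is closest to hash(object_id)."""
    target = node_hash(object_id)
    return min(nodes, key=lambda n: abs(node_hash(n) - target))

def publish(object_id, storing_node, route, pointers):
    """Walk towards the root, leaving (object_id -> storing_node) pointers."""
    for hop in route:                       # route: nodes from the storing node to the root
        pointers.setdefault(hop, {})[object_id] = storing_node

def locate(object_id, route, pointers):
    """Walk towards the root and stop at the first cached pointer."""
    for hop in route:
        if object_id in pointers.get(hop, {}):
            return pointers[hop][object_id]
    return None

nodes = ["A", "B", "C", "D"]
pointers = {}                               # per-node soft-state location caches
root = tapestry_root("movie.avi", nodes)
publish("movie.avi", storing_node="A", route=["A", "B", root], pointers=pointers)
print(locate("movie.avi", route=["D", root], pointers=pointers))   # -> 'A'
```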
Applications: Streaming in Peer-to-Peer Networks

Promise
- Video streaming in peer-to-peer systems
- The video is segmented into many small segments; segments are pulled from several sources at once
- Based on Pastry and CollectCast
- CollectCast:
  - Adds rate/data assignment
  - Evaluates node capabilities and overlay route capabilities
  - Uses topology inference: detects shared path segments using ICMP, similar to traceroute, and tries to avoid them
  - Labels path segments with a quality (or "goodness") value

Promise: Operation
- Each active sender receives a control packet specifying which data segments, the data rate, etc., and pushes data to the receiver as long as no new control packet arrives
- The receiver sends a lookup request using the DHT, selects some active senders, and sends them control packets; it receives data as long as no errors or changes occur; if a change or error is detected, new active senders may be selected (e.g. from the standby senders)
- Thus, Promise is a multiple-sender to one-receiver P2P media streaming system that (1) accounts for different capabilities, (2) matches senders to achieve the best quality, and (3) dynamically adapts to network fluctuations and peer failure

SplitStream
- Video streaming in peer-to-peer systems
- Uses layered video and overlay multicast; push operation; builds disjoint overlay multicast trees
- Based on Pastry

SplitStream: Operation
- Each movie is split into K stripes, and each stripe is multicast using a separate tree (a sketch of the stripe split follows below)
- Each node joins as many multicast trees as there are stripes (K), and may specify the number of stripes it is willing to act as a router for, i.e., according to the amount of resources available
- Thus, SplitStream is a multiple-sender to multiple-receiver P2P system that distributes the forwarding load while respecting each node's resource limitations, but some effort is required to build the forest of multicast trees
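A minimal sketch of splitting a stream into K stripes so that each stripe can be pushed down its own multicast tree. Round-robin striping over data blocks is an assumption here, not necessarily SplitStream's exact packetization.

```python
def split_into_stripes(blocks: list, k: int) -> list:
    """Round-robin assignment of consecutive blocks to k stripes."""
    stripes = [[] for _ in range(k)]
    for i, block in enumerate(blocks):
        stripes[i % k].append(block)
    return stripes

def reassemble(stripes: list) -> list:
    """Interleave the stripes back into the original block order;
    a receiver missing one stripe still gets k-1 of every k blocks."""
    blocks = []
    for i in range(max(len(s) for s in stripes)):
        for s in stripes:
            if i < len(s):
                blocks.append(s[i])
    return blocks

blocks = list(range(10))                     # stand-in for encoded video blocks
stripes = split_into_stripes(blocks, k=4)    # each stripe goes down its own tree
assert reassemble(stripes) == blocks
```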
Some References
1. M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP '03, Lake Bolton, New York, October 2003
2. Mohamed Hefeeda, Ahsan Habib, Boyan Botev, Dongyan Xu and Bharat Bhargava, "PROMISE: Peer-to-Peer Media Streaming Using CollectCast", ACM MM '03, Berkeley, CA, November 2003
3. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan, "Chord: A scalable peer-to-peer lookup service for Internet applications", ACM SIGCOMM '01
4. Ben Y. Zhao, John Kubiatowicz and Anthony Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing", UCB Technical Report CSD-01-1141, 2001
5. John Kubiatowicz, "Extracting Guarantees from Chaos", Comm. ACM, 46(2), February 2003
6. Antony Rowstron and Peter Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware '01, November 2001

Adaptive Multi-Source Streaming in Heterogeneous Peer-to-Peer Networks
Vikash Agarwal, Reza Rejaie
Computer and Information Science Department, University of Oregon
http://mirage.cs.uoregon.edu
January 19, 2005

Introduction
- P2P streaming is becoming increasingly popular: participating peers form an overlay to cooperatively stream content among themselves
- The overlay-based approach is the only way to efficiently support multiparty streaming applications without multicast
- Two components: overlay construction and content delivery
- Each peer wants to receive the maximum quality that can be streamed through its access link
- Peers have asymmetric and heterogeneous bandwidth connectivity
- Each peer should therefore receive content from multiple parent peers, i.e. multi-source streaming, with a multi-parent overlay structure rather than a tree

Benefits of Multi-source Streaming
- Higher bandwidth to each peer, and thus higher delivered quality
- Better load balancing among peers
- Less congestion across the network
- More robust to the dynamics of peer participation
- But multi-source streaming introduces new challenges ...

Multi-source Streaming: Challenges
- Congestion-controlled connections from different parent peers exhibit independent variations in bandwidth and different RTT, bandwidth and loss rate
- The aggregate bandwidth changes over time, so the streaming mechanism should be quality adaptive; a static "one layer per sender" approach is inefficient
- There must be a coordination mechanism among senders to efficiently utilize the aggregate bandwidth and gracefully adapt the delivered quality to bandwidth variations
- This paper presents a receiver-driven coordination mechanism for multi-source streaming called PALS

Previous Studies
- Congestion control was often ignored
- Server/content placement for streaming MD content [Apostolopoulos et al.]
- Resource management for P2P streaming [Cui et al.]
- Multi-sender streaming [Nguyen et al.], but they assumed the aggregate bandwidth is more than the stream bandwidth
- RLM is receiver-driven, but it tightly couples coarse quality adaptation with congestion control; PALS only determines how the aggregate bandwidth is used
- P2P content distribution mechanisms cannot accommodate "streaming" applications, e.g. BitTorrent, Bullet
Overall Architecture
- Overall architecture for P2P streaming:
  - PRO: bandwidth-aware overlay construction, i.e. identifying good parents in the overlay
  - PALS: multi-source adaptive streaming, i.e. streaming content from the selected parents
  - Distributed multimedia caching
- Decoupling overlay construction from delivery provides a great deal of flexibility
- PALS is a generic multi-source streaming protocol for non-interactive applications

Assumptions & Goals
- Assumptions:
  - All peers/flows are congestion controlled
  - Content is layered encoded
  - All layers are CBR with the same consumption rate (not a requirement)
  - All senders have all layers (relaxed later; not a requirement)
  - A limited window of future packets is available at each sender
  - Live but non-interactive
- Goals:
  - Fully utilize the aggregate bandwidth to dynamically maximize delivered quality
  - Deliver the maximum number of layers
  - Minimize variations in quality

P2P Adaptive Layered Streaming (PALS)
- Receiver: periodically requests an ordered list of packets/segments from each sender
- Sender: simply delivers the requested packets in the given order at the congestion-controlled rate
- Benefits of ordering the requested list: flexibility for the receiver to closely control the delivered packets; graceful degradation in quality when the bandwidth suddenly drops
- Periodic requests give stability and less overhead

Basic Framework
- The receiver passively monitors the EWMA bandwidth from each sender and the EWMA aggregate bandwidth
- It estimates the total number of packets (K) that can be delivered during the next window
- It allocates the K packets among the active layers (quality adaptation) and assigns a subset of packets to each sender (packet assignment)
- It controls the per-layer rates bw0(t), bw1(t), ... and thereby the evolution of the per-layer receiver buffers (buf 0, buf 1, ...) feeding the decoder

Key Components of PALS
- Sliding window (SW): keeps all senders busy and loosely synchronized with the receiver's playout time
- Quality adaptation (QA): determines the quality of the delivered stream, i.e. the required packets for all layers during one window
- Packet assignment (PA): properly distributes the required packets among the senders

Sliding Window
- Buffering window: the range of timestamps for packets that must be requested in one window; the window slides forward in a step-like fashion (a sketch of the per-window request generation follows below)
- Requested packets in a window can come from (1) the playing window (loss recovery), (2) the buffering window (main group), and (3) future windows (buffering)
- The window size determines the tradeoff between smoothness or signaling overhead and responsiveness; it should be a function of RTT, since RTT specifies the timescale of variations in bandwidth (a multiple of the maximum smoothed RTT among senders)
- The receiver might receive duplicates by re-requesting a packet that is already in flight; the ratio of duplicates is very low and can be reduced by increasing the window
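A sketch of how a receiver could build one window's ordered request list from the rules above: estimate how many packets the aggregate bandwidth allows, then order loss-recovery packets before the main buffering window and finally future-window packets. All names and the packet bookkeeping are illustrative; the precise PALS algorithms are specified in the paper.

```python
def build_request_list(agg_bw_bps, window_sec, pkt_size_bytes,
                       playing_losses, buffering_pkts, future_pkts):
    """Return the ordered list of packets to request for the next window.
    Packets are (layer, timestamp) tuples; lower layers are more important."""
    budget = int(agg_bw_bps * window_sec / (8 * pkt_size_bytes))   # packets we can expect
    ordered = []
    ordered += sorted(playing_losses)    # 1) loss recovery for the window being played
    ordered += sorted(buffering_pkts)    # 2) main buffering window, lowest layers first
    ordered += sorted(future_pkts)       # 3) buffering from future windows, if budget remains
    return ordered[:budget]

reqs = build_request_list(
    agg_bw_bps=800_000, window_sec=2.0, pkt_size_bytes=1000,
    playing_losses=[(0, 17)],                                    # a base-layer packet lost earlier
    buffering_pkts=[(l, t) for l in range(3) for t in range(20, 30)],
    future_pkts=[(0, t) for t in range(30, 35)])
print(len(reqs), reqs[:3])
```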
Coping with Bandwidth Variations
- The sliding window alone is insufficient
- Coping with a sudden drop in bandwidth: overwriting requests at the senders; ordering the requested packets
- Coping with a sudden increase in bandwidth: requesting extra packets

Quality Adaptation
- Determines the required packets from future windows
- Coarse-grained adaptation: add/drop a layer
- Fine-grained adaptation: controlling bw0(t), bw1(t), ... and thereby loosely controlling the evolution of the receiver buffer state/distribution
- The buffer distribution determines what degree of bandwidth variations can be smoothed; what is a proper buffer distribution?

Buffer Distribution
- Impact on delivered quality: a conservative buffer distribution achieves long-term smoothing, an aggressive one achieves short-term improvement; PALS leverages this tradeoff in a balanced fashion
- The window size affects buffering: the amount of future buffering and the slope of the buffer distribution
- Multiple opportunities to request a packet (see the paper) give implicit loss recovery

Packet Assignment
- How to assign an ordered list of selected packets from different layers to the individual senders?
- The number of packets assigned to each sender must be proportional to its bandwidth contribution, and the more important packets should be delivered
- Weighted round-robin packet assignment strategy (a sketch follows below); the strategy is extended to support partially available content at each peer (see the paper for details)
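A sketch of a weighted round-robin split of the ordered request list across senders, proportional to their measured bandwidth. The deficit-counter bookkeeping is one common way to realize such weighting and is an assumption here, not the exact PALS algorithm.

```python
def assign_packets(ordered_pkts, sender_bw):
    """Distribute an ordered packet list among senders proportionally to bandwidth.
    ordered_pkts: most important first; sender_bw: dict sender -> bandwidth (bps).
    Returns dict sender -> list of packets, each list kept in the original order."""
    total = sum(sender_bw.values())
    credit = {s: 0.0 for s in sender_bw}          # deficit counters
    assignment = {s: [] for s in sender_bw}
    for pkt in ordered_pkts:
        # Accumulate each sender's fair share of one packet, then give the
        # packet to the sender with the largest credit, so the assigned
        # counts track the bandwidth ratio over time.
        for s in credit:
            credit[s] += sender_bw[s] / total
        target = max(credit, key=credit.get)
        credit[target] -= 1.0
        assignment[target].append(pkt)
    return assignment

pkts = [("L0", t) for t in range(8)]
print(assign_packets(pkts, {"peer1": 600_000, "peer2": 200_000}))
# peer1 gets roughly 3 of every 4 packets, peer2 the remaining 1.
```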
Performance Evaluation
- Uses ns simulation to control bandwidth dynamics
- Focused on three key dynamics in P2P systems: bandwidth variations, peer participation, content availability
- Senders with heterogeneous RTT and bandwidth; the underlying congestion-control mechanism is decoupled from PALS
- Performance metrics: bandwidth utilization and delivered quality
- Two strawman mechanisms with static layer assignment to each sender: Single Layer per Sender (SLS), where sender i delivers layer i, and Multiple Layers per Sender (MLS), where sender i delivers layers j < i

Necessity of Coordination
- SLS and MLS exhibit high variations in quality: no explicit loss recovery, no coordination, and inter-layer dependency magnifies the problem
- PALS effectively utilizes the aggregate bandwidth and delivers stable quality in all cases

Delay-Window Tradeoff
- The average delivered quality depends only on the aggregate bandwidth (heterogeneous senders)
- Higher delay gives smoother quality
- Duplicates decrease exponentially with the window size
- The average per-layer buffering increases linearly with delay; increasing the window leads to an even buffer distribution
- See the paper for more results

Conclusion & Future Work
- PALS is a receiver-driven coordination mechanism for streaming from multiple congestion-controlled senders
- Simulation results are very promising
- Future work: further simulation to examine more details; a prototype implementation for real experiments; integration with the other components of the P2P streaming architecture

Partially Available Content
- Effect of segment size and redundancy

Packet Dynamics
- (Figure only)