CS234 – Application Layer Multicasting
Tuesdays, Thursdays 3:30-4:50 p.m., ICS 243
Prof. Nalini Venkatasubramanian
[email protected]
(with slides adapted from Dr. Animesh Nandi and Dr. Ayman Abdel-Hamid)

Groupware applications
• Collaborative applications require 1-many communications
  - Video conference
  - Interactive distributed simulations
  - Stock exchange (NYSE)
  - News dissemination, file updates
  - Online games
• Dynamic peer groups
  - Dynamic membership (group size)
  - Non-hierarchical

Logical Multipoint Communications
• Two basic logical organizations
  - Rooted: hierarchy (perhaps just two levels) that structures communications
  - Non-rooted: peer-to-peer (no distinguished nodes)
• Different structures could apply to the control and data "planes"
  - Control plane determines how a multipoint session is created
  - Data plane determines how data is transferred between hosts in the multipoint session

Multicasting © Dr. Ayman Abdel-Hamid, CS4 254 Spring 2006

Logical Multipoint Communications: Control Plane
• The control plane manages creation of a multipoint session
• Rooted control plane
  - One member of the session is the root, c_root
  - Other members are the leaves, c_leafs
  - Normally c_root establishes a session
    - Root connects to one or more c_leafs
    - c_leafs join c_root after the session is established
• Non-rooted control plane
  - All members are the same (c_leafs)
  - Each leaf adds itself to the session

Logical Multipoint Communications: Data Plane
• The data plane is concerned with data transfer
• Rooted data plane
  - Special root member, d_root
  - Other members are leaves, d_leafs
  - Data is transferred between d_leafs and d_root
    - d_leaf to d_root
    - d_root to d_leaf
  - There is no direct communication between d_leafs
• Non-rooted data plane
  - No special members, all are d_leafs
  - Every d_leaf communicates with all other d_leafs

Multicast Communication
• Multicast abstraction is peer-to-peer
  - Application-level multicast
  - Network-level multicast
    - Requires router support (multicast-enabled routers)
    - Multicast provided at network protocol level: IP multicast

Multicast Communication (continued)
• Transport mechanism and network layer must support multicast
• Internet multicast is limited to UDP (not TCP)
  - Unreliable: no acknowledgements or other error recovery schemes (perhaps at application level)
  - Connectionless: no connection setup (although routing information is provided to multicast-enabled routers)
  - Datagram: message-based multicast

IP multicast
• Using unicast to replicate packets is not efficient; thus, IP multicast is needed
• Built around the notion of a group of hosts
  - Senders and receivers need not know about each other
  - Sender simply sends packets to a "logical" group address
  - No restriction on number or location of receivers
  - Applications may impose limits
• Normal, best-effort delivery semantics of IP
• Unlike broadcast, allows a proper subset of hosts to participate
• Standards: IP Multicast (RFC 1112)

IP Multicast
• IP supports multicasting
  - Uses only UDP, not TCP
  - Special IP addresses (Class D) identify multicast groups
  - Internet Group Management Protocol (IGMP) provides group routing information
  - Multicast-enabled routers selectively forward multicast datagrams
  - IP TTL field limits the extent of multicast
• Requires underlying network and adapter to support broadcast or, preferably, multicast
  - Ethernet (IEEE 802.3) supports multicast

IP Multicast: Group Address
• How to identify the receivers of a multicast datagram?
• How to address a datagram sent to these receivers?
  - Should each multicast datagram carry the IP addresses of all recipients?
    - Not scalable for a large number of recipients
  - Use address indirection
    - A single identifier is used for a group of receivers

IP Multicast: IGMP Protocol
• RFC 3376 (IGMP v3): operates between a host and its directly attached router
• Host informs its attached router that an application running on the host wants to join or leave a specific multicast group
• Another protocol is required to coordinate multicast routers throughout the Internet
  - Network layer multicast routing algorithms
• Network layer multicast: IGMP and multicast routing protocols
• IGMP enables routers to populate multicast routing tables
• Carried within an IP datagram

IP Multicast: IGMP Protocol (continued)
• Joining a group
  - Host sends a group report when the first process joins a given group
  - Application requests join, service provider (end-host) sends report
• Maintaining the table at the router
  - Multicast router periodically queries for group information
  - Host (service provider) replies with an IGMP report for each group
  - Host does not notify the router when the last process leaves a group; this is discovered through the lack of a report for a query

IP Multicast: Multicast Routing
• Multicast routers do not maintain a list of individual members of each host group
• Multicast routers do associate zero or more host group addresses with each interface

IP Multicast: Multicast Routing (continued)
• Multicast router maintains a table of multicast groups that are active on its networks
• Datagrams are forwarded only to those networks with group members

IP Multicast: Multicast Routing (continued)
• How do multicast routers route traffic amongst themselves to ensure delivery of group traffic?
• Find a tree of links that connects all of the routers that have attached hosts belonging to the multicast group
  - Group-shared trees
  - Source-based trees
[Figure: shared tree vs. source trees]

MBONE: Internet Multicast Backbone
• The MBone is a virtual network on top of the Internet (section B.2)
  - Routers that support IP multicast
  - IP tunnels between such routers and/or subnets

Issues with IP Multicast
• IP Multicast islands
  - ISPs are not always interested in mutual cooperation.
  - Multicast islands are interconnected with tunnels: MBONE.
  - Manual setup of tunnels, administrative overhead.
• Forwarding IP Unicast traffic is "easier" (stateless) than forwarding IP Multicast.
• Billing issues
  - Violates ISP input-rate-based billing model
  - ISPs fear the additional traffic induced through IP Multicast (DoS attacks – join high traffic volume Multicast groups).
  - No indication of group size (again needed for billing)
• Security issues
• No incentive for ISPs to enable multicast!
• Result: usability is reduced

Application Layer Multicast
• A promising alternative: ALM
• Attempts to mimic multicast routing at the application level
• Instead of high-speed routers, use programmable end hosts as interior nodes in the ALM system

ALM System
• ALM Structure
• Dissemination Method

Data Dissemination in ALM: Concerns
• What is the target data?
  - Streaming, large files, notification messages
• How fast?
  - Tolerant to delay, emergency case
• How reliably?
• How efficiently?
  - Overhead, bandwidth

What you expect with ALM
• System perspective
  - Utilize all available bandwidth
• User perspective
  - Perfect continuity
  - Low startup delay
  - Low lag

Design space of CEM ALM
• Data Plane
  - Tree
  - Forest/Multiple Tree
  - Mesh
  - Hybrid (Tree with Mesh)
• Control Plane
  - SAAR [17]
  - Random Network

Data Plane: Tree
• Loop-free path
• Capacity of each tree link >= the streaming rate
• Push-based forwarding
  - Low latency
• Unable to utilize the forwarding bandwidth of all participating nodes
  - Only interior nodes contribute
• Highly sensitive to node failure
  - Subtrees are significantly affected by a node failure
  - Requires sophisticated recovery techniques
• ESM (End-System Multicast)[2], Overcast[3], ZIGZAG[4], NICE[5]

Points of failure
• Points on an ALM structure whose failure can disrupt message dissemination to the nodes behind them
• Each node of a single tree structure is a point of failure

Is failure recovery feasible?
• Tree repair takes time, which causes delay
  - Tens of seconds
• Detection of failure takes time
  - e.g., TCP timeout, heartbeat interval
• Tree reconstruction cost is high
• Eager techniques that frequently monitor for failure incur significant overhead
• Goal: reliable dissemination without failure recovery?
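To illustrate why each node of a single tree is a point of failure, the following is a minimal Python sketch, not taken from the slides. It assumes a tree given as a parent-to-children dictionary and push-based forwarding; the function name `reachable_after_failures` and the seven-node example tree are hypothetical.

```python
from collections import deque


def reachable_after_failures(children, root, failed):
    """Return the live nodes that still receive data pushed from root.

    children: dict mapping a node to the list of its children (assumed layout).
    failed:   set of nodes that are down and therefore forward nothing.
    """
    if root in failed:
        return set()
    delivered = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in failed:  # a failed node neither receives nor forwards
                delivered.add(child)
                queue.append(child)
    return delivered


# Hypothetical 7-node binary tree: S -> a, b; a -> c, d; b -> e, f
tree = {"S": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
live = reachable_after_failures(tree, "S", failed={"a"})
print(sorted(live))  # c and d are alive but cut off, because their only parent a failed
```

In this sketch a single interior failure silences the whole subtree below it, which is the behaviour the multi-parent structures on the following slides try to avoid.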
Increasing Reliability
• Increasing the number of parents
• N-parent ALM structures
  - SplitStream[2003], K-DAG [2004], Multicast Forest [2005]
• An N-parent ALM endures N-1 failures among a node's parents
• But each node may get N-1 redundant messages

Data Plane: Forest/Multiple Tree
• K trees
  - K loop-free paths
• Data is separated into multiple (K) chunks
  - Capacity of each tree link >= the streaming rate/K
• Each node can have more children than in a single tree
  - Average depth of trees is lower than in a single tree
  - Reduces delay and increases fault resilience
• High reliability
  - Each chunk is disseminated in one of the trees
  - Sending the same data to K paths
  - Endures K-1 failures
• SplitStream[6], CoopNet[7], Chunkyspread[8]

Random Multicast Forest
• Use multiple multicast trees (K)
• Each node gets K parents
• Sometimes a node can have the same parent in different trees
  - With some probability, it cannot endure K-1 failures
[Figure: example forest with source S and nodes a to i]

K-DAG (Directed Acyclic Graph)
• Each node selects each parent randomly
  - Select the "M" closest candidate nodes (M > K)
  - Select K parents randomly among the M
• Each node gets K parents
• Still cannot guarantee message delivery
[Figure: example K-DAG with source S and nodes a to i]

SplitStream[6]
• Forest-based dissemination
• Basic idea
  - Split the stream into K stripes (with MDC coding)
  - For each stripe create a multicast tree such that the forest
    - Contains interior-node-disjoint trees
    - Respects nodes' individual bandwidth constraints

SplitStream: MDC coding
• Multiple Description Coding
  - Fragments a single media stream into M substreams (M ≥ 2)
  - K packets are enough for decoding (K < M)
  - Fewer than K packets can be used to approximate the content
  - Useful for multimedia (video, audio) but not for other data
  - Cf. erasure coding for large data files

SplitStream: Interior-node-disjoint tree
• Each node in a set of trees is an interior node in at most one tree and a leaf node in the other trees.
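The interior-node-disjoint property can be checked mechanically. The sketch below is only an illustration under stated assumptions: each stripe tree is a parent-to-children dictionary, the source is exempt because it is the root of every stripe, and the names `interior_nodes`, `is_interior_node_disjoint`, `stripe0`, `stripe1` and `bad_stripe` are hypothetical. It does not reproduce SplitStream's actual tree construction.

```python
def interior_nodes(tree):
    """Nodes with at least one child in a tree given as {parent: [children]}."""
    return {parent for parent, kids in tree.items() if kids}


def is_interior_node_disjoint(forest, source):
    """True if every non-source node is interior in at most one tree of the forest."""
    seen = set()
    for tree in forest:
        interior = interior_nodes(tree) - {source}  # the source roots every stripe
        if interior & seen:
            return False  # some node forwards data in two different trees
        seen |= interior
    return True


# Hypothetical two-stripe forest over nodes a, b, c, d
stripe0 = {"S": ["a"], "a": ["b", "c", "d"]}  # a is the only interior node
stripe1 = {"S": ["b"], "b": ["a", "c", "d"]}  # b is the only interior node
print(is_interior_node_disjoint([stripe0, stripe1], "S"))

# Counter-example: a is interior in both trees
bad_stripe = {"S": ["a"], "a": ["b"], "b": ["c", "d"]}
print(is_interior_node_disjoint([stripe0, bad_stripe], "S"))
```

With disjoint interiors, the failure of one node interrupts forwarding in at most one stripe, so the remaining stripes keep flowing.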
[Figure: each substream is disseminated over subtrees; source S and nodes a to i with ID prefixes 0x…, 1x…, 2x…]

Data Plane: Mesh
• A random graph
  - Node's degree is proportional to the node's forwarding bandwidth
  - Minimum degree: ensures the mesh remains connected in the presence of churn
• Gossip-based forwarding
  - Buffer (b): most recently received blocks
  - Every r seconds, nodes exchange information about missing blocks and buffered blocks.
  - Randomly request an available block for a missing block
• CoolStreaming[9], Chainsaw[10], PRIME[11], PULSE[12], CREW[13]

Pros and Cons of Mesh
• Fully utilizes bandwidth of all nodes
• High failure resilience
• High overhead
• Long delay/start-up

Gossip-based Broadcast
• Probabilistic approach with good fault-tolerance properties
  - Choose a destination node, uniformly at random, and send it the message
  - After Log(N) rounds, all nodes will have the message w.h.p.
  - Requires N*Log(N) messages in total
  - Needs a 'random sampling' service
• Usually implemented as
  - Rebroadcast 'fanout' times
  - Using UDP: fire and forget
• BiModal Multicast (99), Lpbcast (DSN 01), Rodrigues'04 (DSN), Brahami '04, Verma'06 (ICDCS), Eugster'04 (Computer), Koldehofe'04, Periera'03

Data Plane: Hybrid
• Tree/Forest + Mesh
  - Tree/Forest: push-based dissemination
  - Mesh: pull-based dissemination
• Low delay
• Instant failure recovery
• Bullet[14], PRM[15], FaReCast[16]

Bullet
• Layers a mesh on top of an overlay tree to increase overall bandwidth
• Basic idea
  - Use a tree as a basis
  - In addition, each node continuously looks for peers to download from
  - In effect, the overlay is a tree combined with a random network (mesh)

PRM (Probabilistic Resilient Multicasting)
• Tree-based multicasting with a few random mesh links.
• Cross-link redundancy
  - Add (random) cross links to the forwarding topology for reliability

PRM (Probabilistic Resilient Multicasting): Recovery Mechanism
• Basic assumption
• Ephemeral forwarding – reactive
  - Nodes maintain a set of mesh neighbors
  - Upon detecting a disconnection, the head of the disconnected node tries to obtain the stream from one of the mesh neighbors
  - Before data plane recovery
• Randomized forwarding – proactive
  - A node forwards the stream to a random mesh neighbor with very small probability (1~3%)

Case Study: FaReCast for Flash Dissemination

Early Warning System
• Earthquake Early Warning System
• Earthquake event: a rare event, but it spreads very fast
• Give the recipients a little more time to prepare for the disaster
• A simple alert message should be delivered to all recipients within a very short time

Catastrophe Resilience Use-Case
• Earthquakes may damage infrastructure in unpredictable and multiple ways
• If an earthquake disrupts the major routers/links in LA
  - Power, telecommunication, servers
  - Can assume 10-25% of all overlay nodes in California will be "down"
• Usually, people have studied node failures on a smaller scale: Fault Tolerance
• We are looking at a scenario where a large fraction of the network is simultaneously affected: Catastrophe Tolerance
• Design systems that are resilient to 50% simultaneous node failures

Characteristic of Recipients
• The number of recipients is huge
  - Hundreds of thousands of recipients may be interested in these warning messages
• Recipients are unreliable
  - Normal machines administered by normal users
  - Free to join/leave at will
  - Internal lagging for message forwarding
• Need to consider reliable dissemination to all online recipients

Event characteristic
Number of Earthquakes in the United States for 2000 - 2009
Located by the US Geological Survey National Earthquake Information Center

Magnitude      2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
8.0 to 9.9        0     0     0     0     0     0     0     0     0     0
7.0 to 7.9        0     1     1     2     0     1     0     1     0     0
6.0 to 6.9        6     5     4     7     2     4     7     9     9     0
5.0 to 5.9       63    41    63    54    25    47    51    72    73     5
4.0 to 4.9      281   290   536   541   284   345   346   366   446    49
3.0 to 3.9      917   842  1535  1303  1364  1475  1213  1137  1483   171
2.0 to 2.9      660   646  1228   704  1336  1738  1145  1173  1570   247
1.0 to 1.9        0     2     2     2     1     2     7    11    14     4
0.1 to 0.9        0     0     0     0     0     0     1     0     0     0
No Magnitude    415   434   507   333   540    73    13    22    20     6
Total          2342  2261  3876  2946  3552  3685  2783  2791  3615   482

( http://neic.usgs.gov/neis/eqlists/eqstats.html )
• A major earthquake (Magnitude > 6) is a very rare event.
• Need to consider the maintenance overhead of the early warning system.

Forest-based M2M ALM structure
• Multiple Fan-In – increases reliability with rich path diversity
• Multiple Fan-Out – for efficient and fast dissemination
• ↑ Path diversity, ↑ reliability, ↑ speed

Properties of M2M ALM structure
• Fan-in constraint
  - Each node has F_i distinct parents (except level 0/1)
  - All parents of a node are on the previous level
  - Root node reliability
• Loop-free transmission
  - Level: length in the overlay from a root node
  - No loop in parent-to-children dissemination
  - All paths from the root to a node have the same length, so data from different parents arrives close together in time.
• Parent set uniqueness
  - Parent sets of any two nodes at a level (level ≥ 2) differ by at least one parent.
  - Increases path diversity, more reliable delivery

Realizing parent set uniqueness
• Example: fan-out factor F_o = 2, fan-in F_i = 2
• N_L: number of nodes at level L, given in terms of F_o, F_i and L (formula not recoverable from the source)
[Figure: Level 0: A; Level 1 (N_L = 4): B, F, C, I; Level 2 (N_L = 6): D, E, G, H, J, K]

Bypassing Problem with Simple Multicasting Method
[Figure: 2-parent ALM structure with nodes a to i]
(1) Bypassing Problem: Type 1
  - The message may bypass a live node when all of its parents fail
  - A child (node h) of a bypassed node (node d) can receive the message from other parents
(2) Bypassing Problem: Type 2
  - A child of a bypassed node cannot receive the message when its other parents fail

Probability of Bypassing
• Probability of bypassing (type 1): $\prod_{j=1}^{n} P_{f_j}$
  - $P_f$: probability that a node fails
  - n parents: there are n paths from which the message can come
• Probability of bypassing (type 2)
  - Multiply the type 1 probability by the $P_f$ of the node's other parents
  - It is smaller than type 1, but can still happen

Prevent Bypassing
• Use the Child-to-Parents relationship
  - General dissemination in tree/forest ALM structures uses only the Parent-to-Children relationship
• Get an additional m paths, where m is the number of children: $\prod_{j=1}^{n} P_{f_j} \cdot \prod_{j=1}^{m} P_{f_j}$

Backup Dissemination
[Figure: 2-parent ALM structure with nodes a to i]
• Each node can expect duplicated messages
• If a node does not get a duplicated message within some time, it sends the message to the parents that did not send it

Limitation of Backup Dissemination
[Figure: 2-parent ALM structure with nodes a to j]
• Nodes g, h and i are leaf nodes with no backup paths from children
• Nodes d and h can get the message through Backup Dissemination
• Node i still cannot get the message
• Leaf nodes are the problem

Leaf-to-Leaf Dissemination
[Figure: 2-parent ALM structure with leaf-to-leaf links]
• Each leaf node has links to other leaf nodes sharing the same parents
• If a leaf node does not get a duplicated message from a specific parent within some time, it sends the message to this parent as well as to that parent's children

EXTRA SLIDES

Control Plane: Random Network
• Bounce Protocol in CREW[13]
• A kind of K-DAG
• Random walk through a tree to get new parents
• Construct a DAG with random parents
[Figure: neighbor request; source S and nodes a to g]

Control Plane: SAAR
• Isolated control plane to support efficient and flexible selection of data dissemination peers
• Multiple overlays share a control overlay
  - Reduces overhead and improves scalability
• Anycast primitive used to build different overlays

References
[1] Understanding the design tradeoffs for cooperative streaming multicast. Max Planck Institute for Software Systems Technical Report MPI-SWS-2009-002.
[2] Early Experience with an Internet Broadcast System Based on Overlay Multicast. In Proceedings of USENIX Annual Technical Conference 2004.
[3] Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of OSDI 2000.
[4] ZIGZAG: An efficient peer-to-peer scheme for media streaming. In Proceedings of INFOCOM 2003.
[5] Scalable Application Layer Multicast. In Proceedings of SIGCOMM 2002.
[6] SplitStream: High-bandwidth multicast in a cooperative environment. In Proceedings of SOSP 2003.
[7] Distributing streaming media content using cooperative networking. In Proceedings of NOSSDAV 2002.
[8] Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast. In Proceedings of ICNP 2006.
[9] CoolStreaming/DONet: A data-driven overlay network for peer-to-peer live media streaming. In Proceedings of INFOCOM 2005.
[10] Chainsaw: Eliminating trees from overlay multicast. In Proceedings of IPTPS 2005.
[11] PRIME: Peer-to-peer Receiver-drIven MEsh-based Streaming. In Proceedings of INFOCOM 2007.
[12] PULSE: An adaptive, incentive-based, unstructured p2p live streaming system. IEEE Transactions on Multimedia, Vol. 9, Nov. 2007.
[13] CREW: A Gossip-based Flash-Dissemination System. In Proceedings of ICDCS 2006.
[14] Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh. In Proceedings of SOSP 2003.
[15] Resilient multicast using overlays. In Proceedings of ACM SIGMETRICS 2003.
[16] FaReCast: Fast, Reliable Application Layer Multicast for Flash Dissemination. In Proceedings of Middleware 2010.
[17] SAAR: A shared control plane for overlay multicast. In Proceedings of NSDI 2007.

EXTRA SLIDES (Not covered)

Constraint Model for CEM

Continuity
• Mesh achieves almost perfect continuity after T=45
• Trees are fast, but lose continuity without recoveries
• T-C: the proportion of the streamed bits that have arrived at the receiving node within T seconds of their first transmission by the source

Join Delay

Effect of stream rate
• With a high stream rate (e.g. Ri=1.2), the tree loses continuity substantially.

Reducing Mesh Lag
• By decreasing the streaming block size
• By decreasing the swarming interval (r)
• Overhead!!

Closer view of buffer size and swarming interval
• Swarming overhead hampers delivery delay of blocks.

Improving Tree Continuity
• Recovery mechanisms improve continuity, but incur delays
• Under high churn and packet loss
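As an illustration of the T-continuity (T-C) metric defined on the Continuity slide, here is a small Python sketch. It assumes a hypothetical per-block trace of (size in bits, time first sent by the source, arrival time at the receiver or None if never received); the function name `t_continuity` and the sample values are made up for the example and are not measurements from the slides.

```python
def t_continuity(blocks, T):
    """Fraction of streamed bits that arrived within T seconds of first transmission.

    blocks: list of (size_bits, sent_time, arrival_time) tuples; arrival_time is
            None when the block never reached this receiver (assumed trace format).
    """
    total_bits = sum(size for size, _, _ in blocks)
    if total_bits == 0:
        return 0.0
    on_time_bits = sum(
        size
        for size, sent, arrived in blocks
        if arrived is not None and arrived - sent <= T
    )
    return on_time_bits / total_bits


# Hypothetical trace: four equal-sized blocks, one lost and one late
trace = [
    (1000, 0.0, 0.5),   # arrives 0.5 s after it was sent
    (1000, 1.0, 4.0),   # arrives 3.0 s after it was sent
    (1000, 2.0, None),  # never arrives
    (1000, 3.0, 3.2),   # arrives 0.2 s after it was sent
]
print(t_continuity(trace, T=2))  # 0.5: only the first and last blocks are on time
print(t_continuity(trace, T=5))  # 0.75: the late block now counts, the lost one never does
```

A larger T tolerates slower delivery, which is consistent with the slide's note that a mesh reaches almost perfect continuity only for large T while trees deliver quickly but lose blocks without recovery.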