CS234 – Application Layer Multicasting
Tuesdays, Thursdays 3:30-4:50 p.m.
ICS 243
Prof. Nalini Venkatasubramanian
[email protected]
(with slides adapted from Dr. Animesh Nandi and Dr. Ayman Abdel-Hamid)
Groupware applications
• Collaborative applications require 1-to-many communications
  - Video conferencing
  - Interactive distributed simulations
  - Stock exchange (NYSE)
  - News dissemination, file updates
  - Online games
• Dynamic peer groups
  - Dynamic membership (group size)
  - Non-hierarchical
Logical Multipoint Communications
• Two basic logical organizations
  - Rooted: a hierarchy (perhaps just two levels) that structures communications
  - Non-rooted: peer-to-peer (no distinguished nodes)
• Different structures can apply to the control and data "planes"
  - The control plane determines how a multipoint session is created
  - The data plane determines how data is transferred between hosts in the multipoint session
Logical Multipoint Communications: Control Plane
• The control plane manages creation of a multipoint session
• Rooted control plane
  - One member of the session is the root (c_root); the other members are leaves (c_leafs)
  - Normally the c_root establishes the session and connects to one or more c_leafs
  - c_leafs can join the c_root after the session is established
• Non-rooted control plane
  - All members are the same (c_leafs)
  - Each leaf adds itself to the session
Logical Multipoint Communications: Data Plane
• The data plane is concerned with data transfer
• Rooted data plane
  - A special root member (d_root); the other members are leaves (d_leafs)
  - Data is transferred between d_leafs and the d_root (d_leaf to d_root, d_root to d_leaf)
  - There is no direct communication between d_leafs
• Non-rooted data plane
  - No special members; all are d_leafs
  - Every d_leaf communicates with all other d_leafs
Multicast Communication
• The multicast abstraction is peer-to-peer
  - Application-level multicast
  - Network-level multicast
    - Requires router support (multicast-enabled routers)
    - Multicast provided at the network protocol level → IP multicast
Multicast Communication
• The transport mechanism and network layer must support multicast
• Internet multicast is limited to UDP (not TCP)
  - Unreliable: no acknowledgements or other error-recovery schemes (perhaps at the application level)
  - Connectionless: no connection setup (although routing information is provided to multicast-enabled routers)
  - Datagram: message-based multicast
IP multicast
• Using unicast to replicate packets is not efficient → thus, IP multicast is needed
• Built around the notion of a group of hosts:
  - Senders and receivers need not know about each other
  - The sender simply sends packets to a "logical" group address
  - No restriction on the number or location of receivers
    - Applications may impose limits
  - Normal, best-effort delivery semantics of IP
• Unlike broadcast, allows a proper subset of hosts to participate
• Standards: IP Multicast (RFC 1112)
IP Multicast
• IP supports multicasting
  - Uses only UDP, not TCP
  - Special IP addresses (Class D) identify multicast groups
  - The Internet Group Management Protocol (IGMP) provides group routing information
  - Multicast-enabled routers selectively forward multicast datagrams
  - The IP TTL field limits the extent of the multicast
• Requires the underlying network and adapter to support broadcast or, preferably, multicast
  - Ethernet (IEEE 802.3) supports multicast
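As a concrete illustration of the points above (UDP-only, Class D group addresses, the TTL limit, and the kernel issuing IGMP joins on the application's behalf), the following is a minimal sketch using standard Berkeley-socket options in Python. The group address 239.1.1.1 and port 5007 are arbitrary example values, not part of the lecture material.

```python
# Minimal sketch of sending/receiving on an IP multicast group with UDP sockets.
import socket
import struct

GROUP = "239.1.1.1"   # example administratively scoped Class D address
PORT = 5007

# Receiver: bind, then ask the kernel (which issues the IGMP join) to join the group.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Sender: plain UDP socket; the TTL bounds how far the multicast datagram propagates.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
tx.sendto(b"hello group", (GROUP, PORT))

print(rx.recvfrom(1024))   # -> (b'hello group', (sender_ip, sender_port))
```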
IP Multicast: Group Address
• How to identify the receivers of a multicast datagram?
• How to address a datagram sent to these receivers?
  - Should each multicast datagram carry the IP addresses of all recipients? → Not scalable for a large number of recipients
  - → Use address indirection: a single identifier is used for a group of receivers
IP Multicast: IGMP Protocol
• RFC 3376 (IGMPv3): operates between a host and its directly attached router
• A host informs its attached router that an application running on the host wants to join or leave a specific multicast group
• Another protocol is required to coordinate multicast routers throughout the Internet → network-layer multicast routing algorithms
• Network-layer multicast → IGMP plus multicast routing protocols
• IGMP enables routers to populate multicast routing tables
• IGMP messages are carried within IP datagrams
IP Multicast: IGMP Protocol
• Joining a group
  - The host sends a group report when the first process joins a given group
  - The application requests the join; the service provider (end host) sends the report
• Maintaining the table at the router
  - The multicast router periodically queries for group information
  - The host (service provider) replies with an IGMP report for each group
  - The host does not notify the router when the last process leaves a group → this is discovered through the lack of a report for a query
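The query/report exchange above is soft state: reports refresh a group entry, and silence eventually expires it. The toy sketch below only illustrates that idea; the class name, timer value, and API are made up for illustration and are not taken from the RFC or the slides.

```python
# Toy sketch of the soft-state idea behind IGMP group maintenance: hosts answer
# periodic queries with reports, and a group is dropped if no report arrives
# before the timeout. The timeout value and names here are illustrative only.
import time

MEMBERSHIP_TIMEOUT = 260.0          # example value: drop a group if silent this long

class ToyRouterInterface:
    def __init__(self):
        self.groups = {}            # group address -> time of the last report seen

    def on_report(self, group):
        self.groups[group] = time.time()      # a host (re)confirmed membership

    def expire_silent_groups(self):
        now = time.time()
        for group, last in list(self.groups.items()):
            if now - last > MEMBERSHIP_TIMEOUT:
                del self.groups[group]        # no report since the recent queries

    def forward(self, group, datagram, send):
        if group in self.groups:              # forward only onto member networks
            send(datagram)

iface = ToyRouterInterface()
iface.on_report("239.1.1.1")
iface.forward("239.1.1.1", b"data", send=print)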
IP Multicast: Multicast Routing
• Multicast routers do not maintain a list of the individual members of each host group
• Multicast routers do associate zero or more host group addresses with each interface
IP Multicast: Multicast Routing
• A multicast router maintains a table of the multicast groups that are active on its networks
• Datagrams are forwarded only to those networks with group members
IP Multicast: Multicast Routing
• How do multicast routers route traffic amongst themselves to ensure delivery of group traffic?
  - Find a tree of links that connects all of the routers that have attached hosts belonging to the multicast group
  - Group-shared trees
  - Source-based trees
[Figure: a group-shared tree vs. a source-based tree rooted at source s]
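A source-based tree can be pictured as a shortest-path tree rooted at the source, pruned back to the routers that have group members. The sketch below computes that on a made-up router graph with BFS; real protocols such as DVMRP or PIM build these trees with distributed join/prune mechanisms, so this is only a centralized illustration, and the topology and member set are assumptions.

```python
# Sketch: a source-based tree as a BFS shortest-path tree over a toy router
# graph, pruned to branches that lead to routers with attached group members.
from collections import deque

graph = {                      # adjacency list of an example router topology
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}
members = {"c", "d"}           # routers with attached group members (example)

def bfs_tree(source):
    parent, seen, q = {}, {source}, deque([source])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = u
                q.append(v)
    return parent              # child -> parent edges of the shortest-path tree

def prune(parent):
    keep = set()
    for m in members:          # keep only edges on some member-to-source path
        while m in parent:
            keep.add((parent[m], m))
            m = parent[m]
    return keep

print(prune(bfs_tree("s")))    # edges of the pruned source-based tree
```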
MBONE: Internet Multicast Backbone
• The MBone is a virtual network on top of the Internet (section B.2)
  - Routers that support IP multicast
  - IP tunnels between such routers and/or subnets
Issues with IP Multicast
• IP Multicast islands
  - ISPs are not always interested in mutual cooperation.
  - Forwarding IP unicast traffic is "easier" (stateless) than forwarding IP multicast.
  - Multicast islands are interconnected with tunnels: MBONE.
  - Manual setup of tunnels, administrative overhead.
• Billing issues
  - Violates the ISPs' input-rate-based billing model.
  - No indication of group size (again needed for billing).
• Security issues
  - ISPs fear the additional traffic induced through IP multicast (DoS attacks: joining high-traffic-volume multicast groups).
• No incentive for ISPs to enable multicast!
• Result: usability is reduced.
Application Layer Multicast
• A promising alternative: ALM
  - Attempts to mimic multicast routing at the application level
  - Instead of high-speed routers, use programmable end hosts as interior nodes in the ALM system
• An ALM system = an ALM structure + a dissemination method
Data Dissemination in ALM
• Concerns
  - What is the target data?
    - Streaming, large files, notification messages
  - How fast?
    - Tolerant to delay vs. emergency cases
  - How reliably?
  - How efficiently?
    - Overhead, bandwidth
What you expect with ALM
• System perspective
  - Utilize all available bandwidth
• User perspective
  - Perfect continuity
  - Low startup delay
  - Low lag
Design space of CEM
• ALM data plane: Tree, Forest/Multiple Tree, Mesh, Hybrid (Tree with Mesh)
• ALM control plane: SAAR [17], Random Network
Data Plane: Tree
• Loop-free path
• Capacity of each tree link >= the streaming rate
• Push-based forwarding
  - Low latency
• Unable to utilize the forwarding bandwidth of all participating nodes
  - Only interior nodes contribute
• Highly sensitive to node failure
  - Subtrees are significantly affected by a node failure
  - Requires sophisticated recovery techniques
• Examples: ESM (End-System Multicast) [2], Overcast [3], ZIGZAG [4], NICE [5]
Points of failure
• Points on an ALM structure that can disturb message dissemination behind (downstream of) these points
  - Every node of a single-tree structure is a point of failure
• Is failure recovery feasible?
  - Tree repair takes time, which causes delay (tens of seconds)
  - Detection of failure takes time (e.g., TCP timeout, heartbeat duration)
  - Tree reconstruction cost is high
  - Eager techniques that frequently monitor for failures incur significant overhead
• Goal: reliable dissemination without failure recovery?
Increasing Reliability
• Increase the number of parents
  - N-parent ALM structures: SplitStream [2003], K-DAG [2004], Multicast Forest [2005]
  - N parents → the ALM structure endures N-1 failures among a node's parents
  - But each node may get N-1 redundant messages
Data Plane: Forest / Multiple Trees
• K trees → K loop-free paths
• Data is separated into multiple (K) chunks
  - Capacity of each tree link >= the streaming rate / K
  - Each node can have more children than in a single tree
  - The average depth of the trees is lower than a single tree
  - Reduces delay and increases fault resilience
• High reliability
  - Each chunk is disseminated in one of the trees
  - Sending the same data over K paths endures K-1 failures
• Examples: SplitStream [6], CoopNet [7], Chunkyspread [8]
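To make the striping idea above concrete, here is a small sketch in which packets are assigned round-robin to K stripes and stripe k is pushed down tree k. The tree shapes, node names, and round-robin rule are made-up illustrations, not the mechanism of any particular system above.

```python
# Sketch of forest-based dissemination: packet seq -> stripe (round-robin),
# and stripe k is pushed down tree k. Topologies here are illustrative only.
K = 3
trees = {                                   # stripe index -> {parent: [children]}
    0: {"S": ["a", "b"], "a": ["c"]},
    1: {"S": ["b", "c"], "b": ["a"]},
    2: {"S": ["c", "a"], "c": ["b"]},
}
received = {}

def deliver(node, packet):
    received.setdefault(node, []).append(packet)

def push(tree, node, packet):
    deliver(node, packet)                   # every visited node receives the packet
    for child in tree.get(node, []):
        push(tree, child, packet)

for seq in range(6):                        # disseminate 6 packets over the forest
    k = seq % K                             # round-robin packet-to-stripe mapping
    push(trees[k], "S", ("stripe", k, "seq", seq))

print({n: len(p) for n, p in received.items()})   # packets seen per node
```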
Random Multicast Forest
• Uses multiple (K) multicast trees
• Each node gets K parents
  - Sometimes it can have the same parent in different trees
  - → With some probability, it cannot endure K-1 failures
[Figure: an example random forest rooted at S over nodes a-i]
K-DAG (Directed Acyclic Graph)
• Each node selects each parent randomly
  - Select the "M" closest candidate nodes (M > K)
  - Select K parents randomly among the M
• Each node gets K parents
  - Still, it cannot guarantee message delivery
[Figure: an example K-DAG rooted at S over nodes a-i]
SplitStream [6]
• Forest-based dissemination
• Basic idea
  - Split the stream into K stripes (with MDC coding)
  - For each stripe, create a multicast tree such that the forest
    - Contains interior-node-disjoint trees
    - Respects nodes' individual bandwidth constraints
SplitStream: MDC coding
• Multiple Description Coding
  - Fragments a single media stream into M substreams (M >= 2)
  - K packets are enough for decoding (K < M)
  - Fewer than K packets can be used to approximate the content
    - Useful for multimedia (video, audio) but not for other data
    - Cf. erasure coding for large data files
SplitStream: Interior-node-disjoint trees
• Each node in the set of trees is an interior node in at most one tree and a leaf node in the other trees
• Each substream is disseminated over its own tree
[Figure: three trees rooted at S over nodes a-i, with interior nodes grouped by ID prefix (ID = 0x…, 1x…, 2x…)]
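One way to picture the interior-node-disjoint property, loosely following the ID-prefix grouping in the figure, is to let a node forward only in the tree whose stripe index matches the first digit of its identifier and join every other tree as a leaf. The IDs and the grouping rule below are simplified assumptions for illustration, not SplitStream's exact mechanism.

```python
# Sketch of the interior-node-disjoint idea: each node is interior (forwarding)
# in exactly the tree whose stripe index matches its first ID digit, and is a
# leaf in all other trees. Node IDs and the rule are illustrative only.
node_ids = {"a": "0x3f", "b": "1x71", "c": "2x0a", "d": "0x99", "e": "1x42"}
NUM_STRIPES = 3

def interior_tree(node):
    return int(node_ids[node][0])           # first ID digit picks the one tree

roles = {k: {node: ("interior" if interior_tree(node) == k else "leaf")
             for node in node_ids}
         for k in range(NUM_STRIPES)}

for k, r in roles.items():
    print("tree", k, r)
```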
Data Plane: Mesh
• A random graph
  - A node's degree is proportional to the node's forwarding bandwidth
  - Minimum degree: ensures the mesh remains connected in the presence of churn
• Gossip-based forwarding
  - Buffer (b): the most recently received blocks
  - Every r seconds, nodes exchange information about missing blocks and buffered blocks
  - Randomly request an available block for a missing block
• Examples: CoolStreaming [9], Chainsaw [10], PRIME [11], PULSE [12], CREW [13]
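The pull-based exchange described above can be sketched as a round-based simulation: each "round" stands for one swarming interval r, in which every node compares buffer maps with a random neighbor and pulls one missing block. The topology, sizes, and round cap are made-up parameters, not taken from any of the systems cited.

```python
# Toy round-based sketch of mesh (swarming) dissemination.
import random

random.seed(1)
NUM_NODES, NUM_BLOCKS, DEGREE = 20, 10, 4
neighbors = {n: random.sample([m for m in range(NUM_NODES) if m != n], DEGREE)
             for n in range(NUM_NODES)}
buffers = {n: set() for n in range(NUM_NODES)}
buffers[0] = set(range(NUM_BLOCKS))            # node 0 is the source with all blocks

rounds = 0
while any(len(b) < NUM_BLOCKS for b in buffers.values()) and rounds < 100:
    rounds += 1
    for n in range(NUM_NODES):
        peer = random.choice(neighbors[n])
        missing = buffers[peer] - buffers[n]   # learned from the peer's buffer map
        if missing:
            buffers[n].add(random.choice(sorted(missing)))  # pull one missing block

done = all(len(b) == NUM_BLOCKS for b in buffers.values())
print("complete" if done else "incomplete", "after", rounds, "swarming rounds")
```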
Pros and Cons of Mesh
• Fully utilizes the bandwidth of all nodes
• High failure resilience
• High overhead
• Long delay / startup time
Gossip-based Broadcast
• A probabilistic approach with good fault-tolerance properties
  - Choose a destination node uniformly at random and send it the message
  - After log(N) rounds, all nodes will have the message w.h.p.
  - Requires N*log(N) messages in total
  - Needs a 'random sampling' service
• Usually implemented as
  - Rebroadcast 'fanout' times
  - Using UDP: fire and forget
• Bimodal Multicast ('99), Lpbcast (DSN '01), Rodrigues '04 (DSN), Brahami '04, Verma '06 (ICDCS), Eugster '04 (Computer), Koldehofe '04, Pereira '03
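The log(N)-rounds claim is easy to see in a small simulation: every informed node rebroadcasts to a constant fanout of uniformly random peers each round. The population size, fanout, and seed below are arbitrary example values.

```python
# Small simulation of gossip broadcast: fire-and-forget to `FANOUT` uniformly
# random peers per round; the message reaches all N nodes in O(log N) rounds w.h.p.
import random

random.seed(7)
N, FANOUT = 1000, 3
informed = {0}                               # the source starts with the message

rounds = 0
while len(informed) < N and rounds < 50:
    rounds += 1
    newly = set()
    for node in informed:
        for target in random.sample(range(N), FANOUT):   # uniform random peers
            newly.add(target)                            # fire-and-forget send
    informed |= newly

print(f"{len(informed)}/{N} nodes informed after {rounds} rounds")
```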
Data Plane: Hybrid
• Tree/Forest + Mesh
  - Tree/Forest: push-based dissemination
    - Low delay
  - Mesh: pull-based dissemination
    - Instant failure recovery
• Examples: Bullet [14], PRM [15], FaReCast [16]
Bullet
• Layers a mesh on top of an overlay tree to increase overall bandwidth
• Basic idea
  - Use a tree as a basis
  - In addition, each node continuously looks for peers to download from
  - In effect, the overlay is a tree combined with a random network (mesh)
PRM (Probabilistic Resilient Multicasting)
• Tree-based multicasting with a few random mesh links
• Cross-link redundancy
  - Add (random) cross links to the forwarding topology
  - For reliability
PRM: Recovery Mechanisms
• Basic assumption
  - Nodes maintain a set of mesh neighbors
• Ephemeral forwarding (reactive)
  - Upon detecting a disconnection, the head of the disconnected subtree tries to obtain the stream from one of its mesh neighbors
  - Used before data-plane recovery completes
• Randomized forwarding (proactive)
  - A node forwards the stream to a random mesh neighbor with a very small probability (1~3%)
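The proactive part of the mechanism is simple enough to sketch directly: a node pushes a packet to its tree children as usual and, independently, to each mesh neighbor with a small probability. The node names, topologies, and the 2% value below are illustrative assumptions, not PRM's published parameters.

```python
# Sketch of PRM-style proactive randomized forwarding.
import random

FORWARD_PROB = 0.02                         # "very small probability (1~3%)"

def forward(node, packet, tree_children, mesh_neighbors, send):
    for child in tree_children.get(node, []):
        send(child, packet)                 # normal push down the tree
    for peer in mesh_neighbors.get(node, []):
        if random.random() < FORWARD_PROB:  # occasional random cross-link push
            send(peer, packet)

tree_children = {"a": ["b", "c"], "b": ["d"]}
mesh_neighbors = {"a": ["d", "e"], "b": ["e"]}
forward("a", "pkt-1", tree_children, mesh_neighbors,
        send=lambda dst, pkt: print("send", pkt, "to", dst))
```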
Case Study: FaReCast for Flash Dissemination
• Early warning systems
  - Earthquake Early Warning System
    - An earthquake is a rare event, but its spreading speed is very fast
    - Give the recipients a little more time to prepare for the disaster
    - A simple alert message should be delivered to all recipients within a very short time
Catastrophe Resilience Use-Case
• Earthquakes may damage infrastructure in unpredictable and multiple ways
  - Power, telecommunications, servers
• If an earthquake disrupts the major routers/links in LA
  - We can assume 10-25% of all overlay nodes in California will be "down"
• Usually, people have studied node failures on a smaller scale: fault tolerance
• We are looking at a scenario where a large fraction of the network is simultaneously affected: catastrophe tolerance
  - Design systems that are resilient to 50% simultaneous node failures
Characteristics of Recipients
• The number of recipients is huge
  - Hundreds of thousands of recipients may be interested in these warning messages
• Recipients are unreliable
  - Normal machines administered by normal users
  - Free will to join/leave
  - Internal lagging for forwarding
• Need to consider reliable message dissemination to all online recipients
Event characteristics
Number of Earthquakes in the United States for 2000 - 2009
(Located by the US Geological Survey National Earthquake Information Center, http://neic.usgs.gov/neis/eqlists/eqstats.html)

Magnitude      2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
8.0 to 9.9        0     0     0     0     0     0     0     0     0     0
7.0 to 7.9        0     1     1     2     0     1     0     1     0     0
6.0 to 6.9        6     5     4     7     2     4     7     9     9     0
5.0 to 5.9       63    41    63    54    25    47    51    72    73     5
4.0 to 4.9      281   290   536   541   284   345   346   366   446    49
3.0 to 3.9      917   842  1535  1303  1364  1475  1213  1137  1483   171
2.0 to 2.9      660   646  1228   704  1336  1738  1145  1173  1570   247
1.0 to 1.9        0     2     2     2     1     2     7    11    14     4
0.1 to 0.9        0     0     0     0     0     0     1     0     0     0
No Magnitude    415   434   507   333   540    73    13    22    20     6
Total          2342  2261  3876  2946  3552  3685  2783  2791  3615   482

→ A major earthquake (magnitude > 6) is a very rare event.
→ Need to consider the maintenance overhead of the early warning system.
Forest-based M2M ALM structure
• Multiple fan-in: increases reliability with rich path diversity
• Multiple fan-out: for efficient and fast dissemination
• ↑ path diversity → ↑ reliability, ↑ speed
Properties of the M2M ALM structure
• Fan-in constraint
  - Each node has Fi distinct parents (except levels 0/1)
  - All parents of a node are on the previous level
  - Root node reliability
• Loop-free transmission
  - Level: distance in the overlay from a root node
  - No loop in parent-to-children dissemination
  - All paths from the root to a node have the same length → data reaches a node from its different parents close by in time
• Parent set uniqueness
  - The parent sets of any two nodes at a level (level >= 2) differ by at least one parent
  - Increases path diversity → more reliable delivery
Realizing parent set uniqueness
[Figure: example structure with fan-out factor Fo = 2 and fan-in Fi = 2. Level 0: root A; Level 1: NL = 4 nodes (B, F, C, I, plus a joining node marked "?"); Level 2: NL = 6 nodes (D, E, G, H, J, K).]
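One simple way to realize the uniqueness property is to give every node at a level a distinct Fi-sized combination of the previous level's nodes as its parent set. The sketch below mirrors the sizes in the example figure (4 nodes at level 1, 6 at level 2, Fi = 2), but the assignment procedure itself is an illustrative assumption, not FaReCast's exact join algorithm.

```python
# Sketch: assign each level-2 node a distinct Fi-sized subset of level-1 nodes,
# so any two nodes at level 2 differ in at least one parent (uniqueness).
from itertools import combinations, islice

FAN_IN = 2
level1 = ["B", "F", "C", "I"]               # example previous level (4 nodes)
level2 = ["D", "E", "G", "H", "J", "K"]     # example current level (6 nodes)

# C(4, 2) = 6 distinct combinations, exactly enough for the 6 level-2 nodes.
parent_sets = list(islice(combinations(level1, FAN_IN), len(level2)))
assignment = dict(zip(level2, parent_sets))

for node, parents in assignment.items():
    print(node, "<-", parents)
```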
Bypassing Problem with the Simple Multicasting Method
[Figure: a 2-parent ALM structure over nodes a-i]
• (1) Bypassing problem, type 1
  - The message may bypass an alive node when all of its parents fail
  - A child (node h) of a bypassed node (node d) can still receive the message from its other parents
• (2) Bypassing problem, type 2
  - A child of a bypassed node cannot receive the message when its other parents also fail
Probability of Bypassing
• Probability of bypassing (type 1)
  - Pf: probability that a node fails
  - n parents means there are n paths from which the message can come
  - P(type-1 bypass) = ∏_{j=1..n} Pf_j (the node is bypassed only if all n parents fail)
• Probability of bypassing (type 2)
  - Multiply the Pf of the child's other parents into the probability of bypassing
  - It is smaller than type 1, but can still happen
Prevent Bypassing
• Use the child-to-parent relationship
  - The general dissemination in tree/forest ALM structures uses only the parent-to-children relationship
  - Get m additional paths, where m is the number of children
  - P(bypass) = ∏_{j=1..n} Pf_j × ∏_{j=1..m} Pf_j
Backup Dissemination
[Figure: a 2-parent ALM structure over nodes a-i]
• Each node can expect duplicated messages
• If a node does not get a duplicated message within some time, it sends the message to those of its parents that did not send it
Limitation of Backup Dissemination
[Figure: a 2-parent ALM structure over nodes a-j]
• Nodes g, h, and i are leaf nodes, which have no backup paths from children
• Nodes d and h can get the message via backup dissemination
• Node i still cannot get the message
• → Leaf nodes are the problem
Leaf-to-Leaf Dissemination
[Figure: a 2-parent ALM structure over nodes a-j]
• Each leaf node has links to other leaf nodes sharing the same parents
• If a leaf node does not get a duplicated message within some time from a specific parent, it sends the message to this parent as well as to its children
EXTRA SLIDES
Control Plane: Random Network
• Bounce protocol in CREW [13]
  - A kind of K-DAG
  - Random walk through a tree to get new parents
  - Constructs a DAG with random parents
[Figure: a node issues a Neighbor Request and random-walks through the tree rooted at S]
Control Plane: SAAR
• An isolated control plane to support efficient and flexible selection of data dissemination peers
• Multiple overlays share one control overlay
  - Reduces overhead and improves scalability
• An anycast primitive is used to build different overlays
References
[1] Understanding the design tradeoffs for cooperative streaming multicast, In Max Planck
Institute for Software Systems Technical Report MPI-SWS-2009-002.
[2] Early Experience with an Internet Broadcast System Based on Overlay Multicast. In
Proceedings of USENIX Annual Technical Conference 2004.
[3] Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of OSDI 2000.
[4] ZIGZAG: An efficient peer-to-peer scheme for media streaming. In Proceedings of
INFOCOM 2003.
[5] Scalable Application Layer Multicast. In Proceedings of SIGCOMM 2002.
[6] Splitstream: High-bandwidth multicast in a cooperative environment. In Proceedings of
SOSP 2003.
[7] Distributing streaming media content using cooperative networking. In Proceedings of
NOSSDAV 2002.
[8] Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast. In
Proceedings of ICNP 2006.
[9] Coolstreaming/DONet: A data-driven overlay network for peer-to-peer live media streaming.
In Proceedings of INFOCOM 2005.
[10] Chainsaw: Eliminating trees from overlay multicast. In Proceedings of IPTPS 2005.
[11] PRIME: Peer-to-peer Receiver-drIven MEsh-based Streaming. In Proceedings of INFOCOM
2007.
[12] PULSE: An adaptive, incentive-based, unstructured P2P live streaming system. In IEEE
Transactions on Multimedia, Vol. 9, Nov. 2007.
[13] CREW: A Gossip-based Flash-Dissemination System. In Proceedings of ICDCS 2006.
[14] Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh. In Proceedings of SOSP
2003.
[15] Resilient multicast using overlays. In Proceedings of ACM SIGMETRICS 2003.
[16] FaReCast: Fast, Reliable Application Layer Multicast for Flash Dissemination. In
Proceedings of Middleware 2010.
[17] SAAR: A shared control plane for overlay multicast. In Proceedings of NSDI 2007.
EXTRA SLIDES (Not covered)
Constraint Model for CEM
Continuity
• Mesh achieves almost perfect continuity after T = 45
• Trees are fast, but lose continuity without recoveries
• T-C: the proportion of the streamed bits that have arrived at the receiving node within T seconds of their first transmission by the source
Join Delay
Effect of stream rate
• With a high stream rate (e.g., Ri = 1.2), the tree loses continuity substantially.
Reducing Mesh Lag
• By decreasing the streaming block size
• By decreasing the swarming interval (r)
  - Overhead!!
Closer view of buffer size and swarming interval
• Swarming overhead hampers the delivery delay of blocks.
Improving Tree Continuity
• Recovery mechanisms improve continuity, but incur delays
• Under high churn and packet loss