Download Tema 1: Tecnologías LAN. - GRC

Document related concepts

RapidIO wikipedia , lookup

Backpressure routing wikipedia , lookup

Computer network wikipedia , lookup

Asynchronous Transfer Mode wikipedia , lookup

CAN bus wikipedia , lookup

Airborne Networking wikipedia , lookup

Deep packet inspection wikipedia , lookup

Internet protocol suite wikipedia , lookup

Wake-on-LAN wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Serial digital interface wikipedia , lookup

IEEE 1355 wikipedia , lookup

I²C wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Zero-configuration networking wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Routing wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Multiprotocol Label Switching wikipedia , lookup

Transcript
Tema 2:
Aspectos de encaminamiento
Algoritmos básicos de encaminamiento
 Link state
 Distance Vector
Encaminamiento en Internet
 RIP
 OSPF
 BGP
Multi-Protocol Label Switching (MPLS).
IP multicast
Transmisión de Datos Multimedia –
http://www.grc.upv.es/docencia/tdm
– Master IC 2007/2008
Transmisión de Datos Multimedia - Master IC 2007/2008
Interplay between routing, forwarding
Computer Networking: A Top
Down Approach Featuring the
Internet,
3rd edition.
Jim Kurose, Keith Ross
Addison-Wesley, July 2004.
routing algorithm
local forwarding table
header value output link
0100
0101
0111
1001
3
2
2
1
value in arriving
packet’s header
0111
1
3 2
2
Transmisión de Datos Multimedia - Master IC 2007/2008
Graph abstraction
5
2
u
2
1
Graph: G = (N,E)
v
x
3
w
3
1
5
z
1
y
2
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
Remark: Graph abstraction is useful in other network contexts
Example: P2P, where N is set of peers and E is set of TCP connections
3
Transmisión de Datos Multimedia - Master IC 2007/2008
Graph abstraction: costs
5
2
u
v
2
1
x
• c(x,x’) = cost of link (x,x’)
3
w
3
1
5
z
1
y
- e.g., c(w,z) = 5
2
• cost could always be 1, or
inversely related to bandwidth,
or inversely related to
congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
4
Transmisión de Datos Multimedia - Master IC 2007/2008
5
Routing Algorithm classification
Global or decentralized information?
Static or dynamic?
Global:
 all routers have complete topology, link cost
info
 “link state” algorithms
Decentralized:
 router knows physically-connected
neighbors, link costs to neighbors
 iterative process of computation, exchange
of info with neighbors
 “distance vector” algorithms
Static:
 routes change slowly over time
Dynamic:
 routes change more quickly
 periodic update
 in response to link cost
changes
Transmisión de Datos Multimedia - Master IC 2007/2008
6
A Link-State Routing Algorithm: Dijsktra’s Algorithm
 net topology, link costs known to all
nodes
 accomplished via “link state
broadcast”
 all nodes have same info
 computes least cost paths from one
node (‘source”) to all other nodes
 gives forwarding table for
that node
 iterative: after k iterations, know least
cost path to k destinations
Notation:
 c(x,y): link cost from node x to y; = ∞ if not
direct neighbors
 D(v): current value of cost of path from source to
destination v
 p(v): predecessor node along path from source to
v
 N': set of nodes whose least cost path definitively
known
1 Initialization:
2 N' = {u}
3 for all nodes v
4
if v adjacent to u
5
then D(v) = c(u,v)
6
else D(v) = ∞
7
8 Loop
9 find w not in N' such that D(w) is a minimum
10 add w to N'
11 update D(v) for all v adjacent to w and not in N' :
12
D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N'
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
5
B
2
A
2
1
7
SPT
A
C
C
F
2
E
1
5
1
3
D
B
step
0
3
D
E
F
D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)
2, A
5, A
1, A
~
~
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
5
B
2
A
2
1
8
SPT
A
AD
C
C
F
2
E
1
5
1
3
D
B
step
0
1
3
D
E
F
D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)
2, A
5, A
1, A
~
~
2, A
4, D
2, D
~
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
5
B
2
A
2
1
9
SPT
A
AD
ADE
C
C
F
2
E
1
5
1
3
D
B
step
0
1
2
3
D
E
F
D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)
2, A
5, A
1, A
~
~
2, A
4, D
2, D
~
2, A
3, E
4, E
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
5
B
2
A
2
1
1
0
SPT
A
AD
ADE
ADEB
C
C
F
2
E
1
5
1
3
D
B
step
0
1
2
3
3
D
E
F
D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)
2, A
5, A
1, A
~
~
2, A
4, D
2, D
~
2, A
3, E
4, E
3, E
4, E
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
5
B
2
A
2
1
1
1
SPT
A
AD
ADE
ADEB
ADEBC
C
C
F
2
E
1
5
1
3
D
B
step
0
1
2
3
4
3
D
E
F
D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)
2, A
5, A
1, A
~
~
2, A
4, D
2, D
~
2, A
3, E
4, E
3, E
4, E
4, E
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
5
B
2
A
2
1
1
2
SPT
A
AD
ADE
ADEB
ADEBC
C
D(b), P(b)
2, A
2, A
2, A
C
F
2
E
1
5
1
3
D
B
step
0
1
2
3
4
3
D
E
D(c), P(c) D(d), P(d) D(e), P(e)
5, A
1, A
~
4, D
2, D
3, E
3, E
F
D(f), P(f)
~
~
4, E
4, E
4, E
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm example
Resulting shortest-path tree from A:
B
C
A
F
D
E
Resulting forwarding table in A:
destination
1
3
link
B
D
(A,B)
(A,D)
E
(A,D)
C
(A,D)
F
(A,D)
Transmisión de Datos Multimedia - Master IC 2007/2008
Dijkstra’s algorithm, discussion
Algorithm complexity: n nodes
 each iteration: need to check all nodes, w, not in N
 n(n+1)/2 comparisons: O(n2)
 more efficient implementations possible: O(nlogn)
Oscillations possible:
 e.g., link cost = amount of carried traffic
D
1
1
0
A
0 0
C
e
1+e
e
initially
1
4
B
1
2+e
A
0
D 1+e 1 B
0
0
C
… recompute
routing
0
D
1
A
0 0
C
2+e
B
1+e
… recompute
2+e
A
0
D 1+e 1 B
e
0
C
… recompute
Transmisión de Datos Multimedia - Master IC 2007/2008
1
5
Distance Vector Algorithm
 Bellman-Ford Equation (dynamic programming)
 Define: dx(y) := cost of least-cost path from x to y
 Then
dx(y) = min {c(x,v) + dv(y) }
v
 where min is taken over all neighbors v of x
Transmisión de Datos Multimedia - Master IC 2007/2008
Bellman-Ford example
Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
5
2
u
v
2
1
x
3
w
3
1
5
z
1
y
B-F equation says:
2
du(z) = min { c(u,v) + dv(z),
c(u,x) + dx(z),
c(u,w) + dw(z) }
= min {2 + 5,
1 + 3,
5 + 3} = 4
Node that achieves minimum is next
hop in shortest path ➜ forwarding table
1
6
Transmisión de Datos Multimedia - Master IC 2007/2008
1
7
Distance Vector Algorithm
 Node x maintains the Distance vector:
 Dx = [Dx(y): y є N ]
 Where Dx(y) = estimate of least cost from x to y
 Node x also maintains its neighbors’ distance vectors
 For each neighbor v, x maintains
Dv = [Dv(y): y є N ]
 Node x knows the cost to each neighbor v: c(x,v)
Transmisión de Datos Multimedia - Master IC 2007/2008
1
8
Distance vector algorithm
Basic idea:
 Each node periodically sends its own distance vector estimate to
neighbors
 When a node x receives new DV estimate from neighbor, it updates
its own DV using B-F equation:
Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∊ N
 Under minor, natural conditions, the estimate Dx(y) converge to the
actual least cost dx(y)
Transmisión de Datos Multimedia - Master IC 2007/2008
Distance Vector Algorithm
 Iterative, asynchronous: each local
iteration caused by:
 local link cost change
 DV update message from
neighbor
 Distributed:
 each node notifies neighbors
only when its DV changes
 neighbors then notify their
neighbors if necessary
Each node:
wait for (change in local link
cost of msg from neighbor)
recompute estimates
if DV to any dest has
changed, notify neighbors
1
9
node x table
cost to
x y z
from
from
from
x 0 2 7
y 2 0 1
z 7 1 0
x 0 2 3
y 2 0 1
z 3 1 0
cost to
x y z
x 0 2 3
y 2 0 1
z 3 1 0
cost to
x y z
cost to
x y z
x 0 2 7
y 2 0 1
z 3 1 0
x 0 2 3
y 2 0 1
z 3 1 0
from
from
x ∞∞ ∞
y ∞∞ ∞
z 71 0
x 0 2 3
y 2 0 1
z 7 1 0
cost to
x y z
cost to
x y z
from
x ∞ ∞ ∞
y 2 0 1
z ∞∞ ∞
node z table
cost to
x y z
cost to
x y z
from
from
x 0 2 7
y ∞∞ ∞
z ∞∞ ∞
node y table
cost to
x y z
from
Transmisión de Datos Multimedia - Master IC 2007/2008
2
0
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2
x
2
time
y
7
1
z
Transmisión de Datos Multimedia - Master IC 2007/2008
Comparison of LS and DV algorithms
 Message complexity
 LS: with n nodes, E links, O(nE)
msgs sent
 DV: exchange between
neighbors only
 convergence time varies
 Speed of Convergence
 LS: O(n2) algorithm requires
O(nE) msgs
 may have oscillations
 DV: convergence time varies
 may be routing loops
 count-to-infinity problem
2
1
 Robustness: what happens if router
malfunctions?
 LS:
 node can advertise incorrect link
cost
 each node computes only its
own table
 DV:
 DV node can advertise incorrect
path cost
 each node’s table used by
others
– error propagate thru network
Tema 2:
Aspectos de encaminamiento
Algoritmos básicos de encaminamiento
 Link state
 Distance Vector
Encaminamiento en Internet
 RIP
 OSPF
 BGP
Multi-Protocol Label Switching (MPLS).
IP multicast
Transmisión de Datos Multimedia –
http://www.grc.upv.es/docencia/tdm
– Master IC 2007/2008
Transmisión de Datos Multimedia - Master IC 2007/2008
Hierarchical Routing
 Our routing study thus far - idealization
 all routers identical
 network “flat”
 … not true in practice
 scale: with 200 million destinations:
 can’t store all dest’s in routing tables!
 routing table exchange would swamp links!
 administrative autonomy
 internet = network of networks
 each network admin may want to control routing in its own network
2
3
Transmisión de Datos Multimedia - Master IC 2007/2008
2
4
Hierarchical Routing
Gateway router
 Direct link to router in another AS
 aggregate routers into regions, “autonomous systems” (AS)
 routers in same AS run same routing protocol
 “intra-AS” routing protocol
 routers in different AS can run different intra-AS routing protocol
Transmisión de Datos Multimedia - Master IC 2007/2008
Interconnected ASes
3c
3a
3b
AS3
1a
2a
1c
1d
1b
Intra-AS
Routing
algorithm
AS2
AS1
Inter-AS
Routing
algorithm
Forwarding
table
2
5
2c
2b
 Forwarding table is configured by
both intra- and inter-AS routing
algorithm
 Intra-AS sets entries for
internal dests
 Inter-AS & Intra-As sets
entries for external dests
Transmisión de Datos Multimedia - Master IC 2007/2008
Inter-AS tasks
 Suppose router in AS1 receives
datagram for which dest is outside of
AS1
 AS1 needs:
 to learn which dests are
reachable through AS2 and
which through AS3
 to propagate this reachability
info to all routers in AS1
 Job of inter-AS routing!
 Router should forward packet
towards one of the gateway
routers, but which one?
3c
3b
3a
AS3
1a
2
6
2a
1c
1d
1b
2c
AS2
AS1
2b
Transmisión de Datos Multimedia - Master IC 2007/2008
Example: Choosing among multiple ASes
 Now suppose AS1 learns from the inter-AS protocol that subnet x is
reachable from AS3 and from AS2.
 To configure forwarding table, router 1d must determine towards
which gateway it should forward packets for dest x.
 This is also the job on inter-AS routing protocol!
 Hot potato routing: send packet towards closest of two routers.
Learn from inter-AS
protocol that subnet
x is reachable via
multiple gateways
2
7
Use routing info
from intra-AS
protocol to determine
costs of least-cost
paths to each
of the gateways
Hot potato routing:
Choose the gateway
that has the
smallest least cost
Determine from
forwarding table the
interface I that leads
to least-cost gateway.
Enter (x,I) in
forwarding table
Transmisión de Datos Multimedia - Master IC 2007/2008
2
8
Intra-AS Routing
 Also known as Interior Gateway Protocols (IGP)
 Most common Intra-AS routing protocols:
 RIP: Routing Information Protocol
 OSPF: Open Shortest Path First
 IGRP: Interior Gateway Routing Protocol (Cisco proprietary)
Transmisión de Datos Multimedia - Master IC 2007/2008
RIP ( Routing Information Protocol)
 Distance vector algorithm
 Included in BSD-UNIX Distribution in 1982
 Distance metric: # of hops (max = 15 hops)
From router A to subsets:
u
v
A
z
2
9
C
B
D
w
x
y
destination hops
u
1
v
2
w
2
x
3
y
3
z
2
Transmisión de Datos Multimedia - Master IC 2007/2008
RIP advertisements
 Distance vectors: exchanged among neighbors every 30 sec via
Response Message (also called advertisement)
 Each advertisement: list of up to 25 destination nets within AS
 Table processing:
 RIP routing tables managed by application-level process called route-d
(daemon)
 advertisements sent in UDP packets, periodically repeated
routed
routed
Transprt
(UDP)
network
(IP)
link
physical
3
0
Transprt
(UDP)
forwarding
table
forwarding
table
network
(IP)
link
physical
Transmisión de Datos Multimedia - Master IC 2007/2008
3
1
RIP: Link Failure and Recovery
 If no advertisement heard after 180 sec --> neighbor/link declared
dead





routes via neighbor invalidated
new advertisements sent to neighbors
neighbors in turn send out new advertisements (if tables changed)
link failure info quickly propagates to entire net
poison reverse used to prevent ping-pong loops (infinite distance = 16
hops)
Transmisión de Datos Multimedia - Master IC 2007/2008
3
2
OSPF (Open Shortest Path First)
 “open”: publicly available
 Uses Link State algorithm
 LS packet dissemination
 Topology map at each node
 Route computation using Dijkstra’s algorithm
 OSPF advertisement carries one entry per neighbor router
 Advertisements disseminated to entire AS (via flooding)
 Carried in OSPF messages directly over IP (rather than TCP or UDP
Transmisión de Datos Multimedia - Master IC 2007/2008
3
3
OSPF “advanced” features (not in RIP)
 Security: all OSPF messages authenticated (to prevent malicious
intrusion)
 Multiple same-cost paths allowed (only one path in RIP)
 For each link, multiple cost metrics for different TOS (e.g., satellite
link cost set “low” for best effort; high for real time)
 Integrated uni- and multicast support:
 Multicast OSPF (MOSPF) uses same topology data base as OSPF
 Hierarchical OSPF in large domains.
Transmisión de Datos Multimedia - Master IC 2007/2008
3
4
Hierarchical OSPF
 Two-level hierarchy: local area,
backbone.
 Link-state advertisements only
in area
 each nodes has detailed area
topology; only know direction
(shortest path) to nets in other
areas.
 Area border routers: “summarize”
distances to nets in own area,
advertise to other Area Border routers.
 Backbone routers: run OSPF routing
limited to backbone.
 Boundary routers: connect to other
AS’s.
Transmisión de Datos Multimedia - Master IC 2007/2008
3
5
Internet inter-AS routing: BGP
 BGP (Border Gateway Protocol): the de facto standard
 BGP provides each AS a means to:
1. Obtain subnet reachability information from neighboring ASs.
2. Propagate the reachability information to all routers internal to the AS.
3. Determine “good” routes to subnets based on reachability information
and policy.
 Allows a subnet to advertise its existence to rest of the Internet: “I
am here”
Transmisión de Datos Multimedia - Master IC 2007/2008
BGP basics
 Pairs of routers (BGP peers) exchange routing info over semipermanent TCP connections: BGP sessions
 Note that BGP sessions do not correspond to physical links.
 When AS2 advertises a prefix to AS1, AS2 is promising it will
forward any datagrams destined to that prefix towards the prefix.
 AS2 can aggregate prefixes in its advertisement
3c
3a
3b
AS3
1a
AS1
2a
1c
1d
1b
2c
AS2
2b
eBGP session
iBGP session
3
6
Transmisión de Datos Multimedia - Master IC 2007/2008
3
7
BGP route selection
 Router may learn about more than 1 route to some prefix. Router
must select route.
 Elimination rules:




Local preference value attribute: policy decision
Shortest AS-PATH
Closest NEXT-HOP router: hot potato routing
Additional criteria
Transmisión de Datos Multimedia - Master IC 2007/2008
3
8
Why different Intra- and Inter-AS routing ?
 Policy:
 Inter-AS: admin wants control over how its traffic routed, who routes
through its net.
 Intra-AS: single admin, so no policy decisions needed
 Scale:
 hierarchical routing saves table size, reduced update traffic
 Performance:
 Intra-AS: can focus on performance
 Inter-AS: policy may dominate over performance
Tema 2:
Aspectos de encaminamiento
Algoritmos básicos de encaminamiento
 Link state
 Distance Vector
Encaminamiento en Internet
 RIP
 OSPF
 BGP
Multi-Protocol Label Switching (MPLS).
IP multicast
Transmisión de Datos Multimedia –
http://www.grc.upv.es/docencia/tdm
– Master IC 2007/2008
Transmisión de Datos Multimedia - Master IC 2007/2008
MPLS - The Motivation






IP Protocol Suite - the most predominant networking technology.
Voice & Data convergence on a single network infrastructure.
Continual increase in number of users.
Demand for higher connection speeds.
Increase in traffic volumes.
Ever-increasing number of ISP networks.
 MPLS Working Groups and Standards




4
0
Standardized by the IETF - currently in Draft stage.
MPLS recommendations are done by IP players for IP services
MPLS core components are generic
MPLS doesn’t use specific technology process
Transmisión de Datos Multimedia - Master IC 2007/2008
MPLS and ISO model
IETF main goal is that
when a layer is added,
no modification is
needed on the existing
layers.
All new protocol must
be backward
compatible
7
to
5
Applications
TCP
PPP
PPP
UDP
IP
MPLS
Frame
3
ATM (*)
ATM (*)
2
Physical (Optical - Electrical)
1
Frame
Relay
Relay
(*) ATM overlay model
(without addressing and
P-NNI) is considered as
an ISO layer 2 protocol.
4
1
4
Transmisión de Datos Multimedia - Master IC 2007/2008
Some MPLS Terms...









4
2
LER - Label Edge Router
LSR - Label Switch Router
FEC - Forward Equivalence Class
Label - Associates a packet to a FEC
Label Stack - Multiple labels containing information on how a packet
is forwarded.
Shim - Header containing a Label Stack
Label Switch Path - path that a packet follows for a specific FEC
LDP - Label Distribution Protocol, used to distribute Label
information between MPLS-aware network devices
Label Swapping - manipulation of labels to forward packets towards
the destination.
Transmisión de Datos Multimedia - Master IC 2007/2008
MPLS Architecture
LSP (Label-Switched Paths)
Routing protocol
Classification
Label assignment
OSPF
Label removal
OSPF
OSPF
Local table
Local table
Local table
Local table
Local table
Layer 2
Layer 2
Layer 2
Layer 1
Layer 1
Layer 1
Core
Node
Egress
Node
Forward Equivalence Class
FEC table
Attributes
Label table
Label Switch
Local table
Precedence
Ingress
Node
4
3
Label swapping
Transmisión de Datos Multimedia - Master IC 2007/2008
MPLS process
Label Switch Path
OSPF / RIP / IS-IS
Label removal
Label swapping
Classification
Label assignment
FEC
FEC
FEC
Label table
Label table
Label table
Layer 2
Layer 2
Layer 2
Layer 1
Layer 1
Layer 1
Core
Node
Egress
Node
Precedence
Ingress
Node
4
4
Transmisión de Datos Multimedia - Master IC 2007/2008
MPLS Cloud
LER: Label Edge Router
L3 Routing
LER
L3 Routing
L3 Routing
LER
LSR
Label Swapping
L3 Routing
IP Packet
IP Packet w/ Label
4
5
LER
LSR
Label Swapping LER
L3 Routing
LSR: Label Switch(ing) Router
Transmisión de Datos Multimedia - Master IC 2007/2008
FEC Classification
 A packet can be mapped to a particular FEC based on the following criteria:








destination IP address,
source IP address,
TCP/UDP port,
in case of inter AS-MPLS, Source-AS and Dest-AS,
class of service,
application used,
…
any combination of the previous criteria.
Ingress Label
6
FEC
Egress Label
138.120.6/24 - xxxx
9
•FECs are manually initiated by the operator
•A FEC is associated to at least one Label
Ingress
Label
Ingress
Label
4
6
FEC FEC
Attribute
Egress
Label
Attribute
Egress
Label
6
138.120.6/24 - xxxx
A
9
6
138.120.6/24 - xxxx
B
12
Transmisión de Datos Multimedia - Master IC 2007/2008
What is a Label?
 A label is a short, fixed length, locally significant identifier used to
identify a FEC.
 The label can be identified by the L2 technology identifier (e.g.
VPI/VCI for ATM, DLCI for FR or MPLS label for PPP/Ethernet).
L2 Type
Port Ingress Label FEC
L2 Type
Egress Label Port
ATM
1-1
12 (i.e. 4/65)
F1
ATM
22 (i.e. 5/65)3-4
ATM
1-1
15 (i.e. 0/25)
F4
FR
9 (i.e. 101) 5-1
Gig Eth
5-1
F1
ATM
22 (i.e. 4/65)3-4
7
 MPLS Label Assignment Schemes
 Topology Driven
 Label assignment in response to routing protocols (OSPF and BGP) updates
 Control Driven
 Label assignment in response to RSVP, CR-LDP requests
 Traffic Driven
 Label assignment in response to flow detection & triggering
4
7
Transmisión de Datos Multimedia - Master IC 2007/2008
The MPLS Shim Header
 The Label (Shim Header) is represented as a sequence of Label Stack Entry
 Each Label Stack Entry is coded by 4 bytes (32 bits) as described
 20 Bits is reserved for the Label Identifier (also named Label)
Label
(20 bits)
Exp
(3 bits)
Label :
Exp :
S:
TTL :
S
(1 bit)
TTL
(8bits)
Label value (0 to 15 are reserved for special use)
Experimental Use
Bottom of Stack (set to 1 for the last entry in the label)
Time To Live
 Based on the contents of the label a swap, push (impose) or pop
(dispose) operation can be performed on the packet's label stack
4
8
Transmisión de Datos Multimedia - Master IC 2007/2008
Label Switched Path
Ingress Ingress
Interface Label
1
5
Ingress Ingress
Interface Label
FEC Egress Egress
Interface Label
138.120
3
1
138.120
4
12
MPLS switch
3
1
4
138.120
1
127.20
MPLS switch
2
3
1
3
2
3
1
2
2
MPLS switch
Ingress Ingress
Interface Label
1
4
9
12
FEC Egress Egress
Interface Label
x
FEC Egress Egress
Interface Label
5
3
138.120
MPLS switch
192.168
x
Transmisión de Datos Multimedia - Master IC 2007/2008
Hop by Hop IP forwarding
Ingress Ingress
Interface Label
1
Default
Ingress Ingress
Interface Label
FEC Egress Egress
Interface Label
None
3
1
MPLS switch
??
3
1
1
2
MPLS switch
1
3
2
138.120.6.12
??
1
2
MPLS switch
Ingress Ingress
Interface Label
1
x
FEC Egress Egress
Interface Label
None
3
Default
4
138.120
138.120.6.12
3
3
2
4
None
Default
??
127.20
5
0
Default
FEC Egress Egress
Interface Label
MPLS switch
192.168
x
Transmisión de Datos Multimedia - Master IC 2007/2008
IP forwarding using Label Switch Path
Ingress Ingress
Interface Label
1
5
Ingress Ingress
Interface Label
FEC Egress Egress
Interface Label
138.120
3
1
138.120
4
12
MPLS switch
3
1
4
1
127.20
2
MPLS switch
1
3
2
3
2
1
2
MPLS switch
Ingress Ingress
Interface Label
1
x
FEC Egress Egress
Interface Label
138.120
3
5
138.120
138.120.6.12
3
138.120.6.12
5
1
12
FEC Egress Egress
Interface Label
MPLS switch
192.168
x
Transmisión de Datos Multimedia - Master IC 2007/2008
5
2
MPLS Label Distribution Protocol
 LDP - a set of procedures by which one LSR informs the other of
the FEC-to-Label binding it has made.
 Currently, several protocols used as Label Distribution Protocol
(LDP) are available:
 RSVP-TE (MPLS extension)
 LDP and CR-LDP
 BGP-4 MPLS extensions
 Label Distribution schemes
Transmisión de Datos Multimedia - Master IC 2007/2008
Downstream stream on demand
Ingress Ingress
Interface Label
1
5
Ingress Ingress
Interface Label
FEC Egress Egress
Interface Label
138.120
3
1
138.120
4
x
12
MPLS switch
3
1
4
138.120
1
127.20
2
MPLS switch
1
3
3
2
3
2
1
Ingress Ingress
Interface Label
1
x
FEC Egress Egress
Interface Label
138.120
3
MPLS switch
192.168
2
MPLS switch
5
3
12
FEC Egress Egress
Interface Label
5
The label is requested by the
upstream node and the
downstream node defines the label
used.
Transmisión de Datos Multimedia - Master IC 2007/2008
Unsolicited Downstream
Ingress Ingress
Interface Label
1
5
Ingress Ingress
Interface Label
FEC Egress Egress
Interface Label
138.120
3
1
138.120
4
x
12
MPLS switch
3
1
4
138.120
1
2
MPLS switch
127.20
1
3
3
2
3
2
1
Ingress Ingress
Interface Label
1
x
FEC Egress Egress
Interface Label
138.120
3
MPLS switch
192.168
2
MPLS switch
5
4
12
FEC Egress Egress
Interface Label
5
The downstream node defines
the label and advertises it to the
upstream node.
Transmisión de Datos Multimedia - Master IC 2007/2008
5
5
Edge LSR Features








Routing protocols
FEC Classification
Initiates LSP setup for Downstream On Demand method
Adaptation of non-MPLS data to MPLS data
Layer 2 translation for MPLS data
Terminated MPLS-VPN
At least one LDP protocol
Edge LSR is counted into the TTL count as a regular router
Transmisión de Datos Multimedia - Master IC 2007/2008
5
6
Core LSR Features





Routing protocols
Propagates Downstream On Demand method (request and mapping)
Layer 2 translation
High speed label forwarding/switching
At least one LDP protocol
Transmisión de Datos Multimedia - Master IC 2007/2008
5
7
MPLS Advantages







Simplified Forwarding
Efficient Explicit Routing
Traffic Engineering
QoS Routing
Mappings from IP Packet to Forwarding Equivalence Class (FEC)
Partitioning of Functionality
Common Operation over Packet and Cell media
Tema 2:
Aspectos de encaminamiento
Algoritmos básicos de encaminamiento
 Link state
 Distance Vector
Encaminamiento en Internet
 RIP
 OSPF
 BGP
Multi-Protocol Label Switching (MPLS).
IP multicast
Transmisión de Datos Multimedia –
http://www.grc.upv.es/docencia/tdm
– Master IC 2007/2008
5
9
Transmisión de Datos Multimedia - Master IC 2007/2008
Multicast = Efficient Data Distribution
Src
Src
Transmisión de Datos Multimedia - Master IC 2007/2008
6
0
Why Multicast ?
 Need for efficient one-to-many delivery of same data
 Applications:










News/sports/stock/weather updates
Distance learning
Configuration, routing updates, service location
Pointcast-type “push” apps
Teleconferencing (audio, video, shared whiteboard, text editor)
Distributed interactive gaming or simulations
Email distribution lists
Content distribution; Software distribution
Web-cache updates
Database replication
Transmisión de Datos Multimedia - Master IC 2007/2008
6
1
Why Not Broadcast or Unicast?
 Broadcast:





Send a copy to every machine on the net
Simple, but inefficient
All nodes must process packet even if they don’t care
Wastes more CPU cycles of slower machines (“broadcast radiation”)
Network loops lead to “broadcast storms”
 Replicated Unicast:




Sender sends a copy to each receiver in turn
Receivers need to register or sender must be pre-configured
Sender is focal point of all control traffic
Reliability => per-receiver state, separate sessions/processes at sender
Transmisión de Datos Multimedia - Master IC 2007/2008
Multicast Apps Characteristics
 Number of (simultaneous) senders to the group
 The size of the groups
 Number of members (receivers)
 Geographic extent or scope
 Diameter of the group measured in router hops




The longevity of the group
Number of aggregate packets/second
The peak/average used by source
Level of human interactivity
 Lecture mode vs interactive
 Data-only (eg database replication) vs multimedia
6
2
Transmisión de Datos Multimedia - Master IC 2007/2008
6
3
Reliable Multicast vs. Unreliable Multicast
 When a multicast message is sent by a process, the runtime
support of the multicast mechanism is responsible for delivering the
message to each process currently in the multicast group.
 As each participating process may be on a separate host, due to
factors such as failures of network links and/or network hosts,
routing delays, and differences in software and hardware, the time
between when a message is sent and when it is received may vary
among the recipient processes.
 Moreover, a message may not be received by one or more of the
processes at all.
Transmisión de Datos Multimedia - Master IC 2007/2008
6
4
Classification of multicasting mechanisms in terms of message
delivery
 Unreliable multicast:
 The arrival of the correct message at each process is not guaranteed.
 Reliable multicast:
 Guarantees that each message is eventually delivered in a noncorrupted form to each process in the group.
 The definition of reliable multicast requires that each participating
process receives exactly one copy of each message sent. It does
not put any restriction of the order the messages delivered.
 Reliable multicast can be further classified based on the order of the
delivery of the messages: unordered, FIFO, causal order, atomic
order.
Transmisión de Datos Multimedia - Master IC 2007/2008
6
5
Classification of reliable multicast -- unordered
 An unordered reliable multicast system guarantees the safe delivery
of each message, but it provides no guarantee on the delivery order
of the messages.
 Example: Processes P1, P2, and P3 have formed a multicast group.
Three messages, m1, m2, m3 have been sent to the group. An
unordered reliable multicast system may deliver the messages to
each of the three processes in any of these:
m1-m2-m3,
m1-m3-m2,
m2-m1-m3,
m2-m3-m1,
m3-m1-m2,
m3-m2-m1
Transmisión de Datos Multimedia - Master IC 2007/2008
6
6
Classification of reliable multicast - FIFO
 If process P sent messages mi and mj, in that order, then each
process in the multicast group will be delivered the messages mi
and mj, in that order.
 Note that FIFO multicast places no restriction on the delivery order
among messages sent by different processes. For example, P1
sends messages m11 then m12, and P2 sends messages m21 then
m22. It is possible for different processes to receive any of the
following orders:
m11-m12-m21-m22,
m11-m21-m12-m22,
m11-m21-m22-m12,
m21-m11-m12-m22
m21-m11-m22-m12
m21-m22-m11-m12.
Transmisión de Datos Multimedia - Master IC 2007/2008
6
7
Classification of reliable multicast – Causal order
 If message mi causes (results in) the occurrence of message mj,
then mi will be delivered to each process prior to mj. Messages mi
and mj are said to have a causal or happen-before relationship.
 For example, P1 sends a message m1, to which P2 replies with a
multicast message m2. Since m2 is triggered by m1, the two
messages share a causal relationship of m1-> m2. A causal-order
multicast message system ensures that these two messages will be
delivered to each of the processes in the order of m1- m2.
Transmisión de Datos Multimedia - Master IC 2007/2008
Classification of reliable multicast –
Atomic order
 In an atomic-order multicast system, all messages are guaranteed
to be delivered to each participant in the exact same order. Note
that the delivery order does not have to be FIFO or causal, but
must be identical for each process.
 Example:
 P1 sends m1, P2 sends m2, and P3 sends m3.
 An atomic system will guarantee that the messages will be delivered
to each process in only one of the six orders:
m1-m2- m3, m1- m3- m2, m2- m1-m3,
m2-m3-m1, m3-m1- m2, m3-m2-m1.
6
8
Transmisión de Datos Multimedia - Master IC 2007/2008
IP Multicast Architecture
Service model
Host-to-router protocol
(IGMP)
Routers
Multicast routing protocols
(various)
6
9
Hosts
Transmisión de Datos Multimedia - Master IC 2007/2008
IP Multicast model: RFC 1112
 Message sent to multicast “group” (of receivers)
 Senders need not be group members
 A group identified by a single “group address”
 Use “group address” instead of destination address in IP packet sent to
group




Groups can have any size;
Group members can be located anywhere on the Internet
Group membership is not explicitly known
Receivers can join/leave at will
 Packets are not duplicated or delivered to destinations outside the
group
 Distribution tree constructed for delivery of packets
 No more than one copy of packet appears on any subnet
 Packets delivered only to “interested” receivers => multicast delivery
tree changes dynamically
 Network has to actively discover paths between senders and receivers
7
0
Transmisión de Datos Multimedia - Master IC 2007/2008
7
1
IP Multicast Addresses
 Class D IP addresses
 224.0.0.0 – 239.255.255.255
1 110
Group ID
 Address allocation:
 Well-known (reserved) multicast addresses, assigned by IANA:
224.0.0.x and 224.0.1.x Transient multicast addresses, assigned and
reclaimed dynamically, e.g., by “sdr” program
 Each multicast address represents a group of arbitrary size, called a
“host group”
 There is no structure within class D address space like subnetting
=> flat address space
Transmisión de Datos Multimedia - Master IC 2007/2008
IP Multicast Service
 Sending
 Uses normal IP-Send operation, with an IP multicast address specified
as the destination
 Must provide sending application a way to:
 Specify outgoing network interface, if >1 available
 Specify IP time-to-live (TTL) on outgoing packet
 Enable/disable loop-back if the sending host is/isn't a member of the
destination group on the outgoing interface
 Receiving
 Two new operations
 Join-IP-Multicast-Group(group-address, interface)
 Leave-IP-Multicast-Group(group-address, interface)
 Receive multicast packets for joined groups via normal IP-Receive
operation
7
2
Transmisión de Datos Multimedia - Master IC 2007/2008
7
3
Link-Layer Transmission/Reception
 Transmission
 IP multicast packet is transmitted as a link-layer multicast, on those
links that support multicast
 Link-layer destination address is determined by an algorithm specific to
the type of link
 Reception
 Necessary steps are taken to receive desired multicasts on a particular
link, such as modifying address reception filters on LAN interfaces
 Multicast routers must be able to receive all IP multicasts on a link,
without knowing in advance which groups will be used
Transmisión de Datos Multimedia - Master IC 2007/2008
Using Link-Layer Multicast Addresses
 Ethernet and other LANs using 802 addresses:
 Direct mapping! Simpler than unicast! No ARP etc.
IP multicast address
1110
28 bits
Group bit
0000000100000000010111100
23 bits
LAN multicast address
 32 class D addresses may map to one MAC address
 Special OUI for IETF: 0x01-00-5E.
 No mapping needed for point-to-point links
7
4
Transmisión de Datos Multimedia - Master IC 2007/2008
Multicast over LANs & Scoping
 Multicasts are flooded across MAC-layer bridges along a spanning
tree
 But flooding may steal sending opportunity for non-member stations
which want to transmit
 Almost like broadcast!
 Scope: How far do transmissions propagate?
 Implicit scoping: Reserved Mcast addresses => don’t leave subnet.
 Also called “link-local” addresses
 TTL-based scoping:
 Multicast routers have a configured TTL threshold
 Multicast datagram dropped if TTL <= TTL threshold
 Useful as a blanket parameter.
 Administrative scoping:
 Use a portion of class D address space (239.0.0.0 thru 239.255.255.255)
 Truly local to admin domain; address reuse possible.
 In IPv6, scoping is an internal attribute of an IPv6 multicast address
7
5
Transmisión de Datos Multimedia - Master IC 2007/2008
Multicast Scope Control – Small TTLs
 TTL expanding-ring search to reach or find a nearby subset of a
group
 Rings can be nested, but not overlapping
s
1
2
3
7
6
Transmisión de Datos Multimedia - Master IC 2007/2008
IP Multicast Architecture
Service model
Host-to-router protocol
(IGMP)
Routers
Multicast routing protocols
(various)
7
7
Hosts
Transmisión de Datos Multimedia - Master IC 2007/2008
7
8
Internet Group Management Protocol
 IGMP: “signaling” protocol to establish, maintain, remove groups on
a subnet.
 Objective: keep router up-to-date with group membership of entire
LAN
 Routers need not know who all the members are, only that members
exist
 Each host keeps track of which mcast groups are subscribed to
 Socket API informs IGMP process of all joins
Transmisión de Datos Multimedia - Master IC 2007/2008
How IGMP Works
 On each link, one router is elected the “querier”
 Querier periodically sends a Membership Query message to the allsystems group (224.0.0.1), with TTL = 1
 On receipt, hosts start random timers (between 0 and 10 seconds)
for each multicast group to which they belong
Routers:
Hosts:
7
9
Q
Transmisión de Datos Multimedia - Master IC 2007/2008
How IGMP Works (cont.)
 When a host’s timer for group G expires, it sends a Membership
Report to group G, with TTL = 1
 Other members of G hear the report and stop (suppress) their
timers
 Routers hear all reports, and time out non-responding groups
Routers:
Hosts:
8
0
Q
G
G
G
G
Transmisión de Datos Multimedia - Master IC 2007/2008
8
1
How IGMP Works (cont.)
 Normal case: only one report message per group present is sent in
response to a query
 Query interval is typically 60-90 seconds
 When a host first joins a group, it sends immediate reports, instead
of waiting for a query
 IGMPv2: Hosts may send a “Leave group” message to “all routers”
(224.0.0.2) address
 Querier responds with a Group-specific Query message: see if any
group members are present
 Lower leave latency
Transmisión de Datos Multimedia - Master IC 2007/2008
8
2
The Java Basic Multicast API
 At the transport layer, the basic multicast supported by Java is an
extension of UDP (the User Datagram Protocol)
 For the basic multicast, Java provides a set of classes which are
closely related to the datagram socket API classes
Transmisión de Datos Multimedia - Master IC 2007/2008
Datagram - recap
sender process
receiver process
a byte array
a byte array
receiver's
address
a DatagramPacket object
a DatagramPacket object
send
receive
a DatagramSocket
object
a DatagramSocket
object
object reference
data flow
8
3
Transmisión de Datos Multimedia - Master IC 2007/2008
8
4
The Java Basic Multicast API - 2
 There are four major classes in the API, the first three of which we
have already seen in the context of datagram sockets.
 InetAddress: In the datagram socket API, this class represents the
IP address of the sender or receiver. In multicasting, this class
can be used to identify a multicast group.
 DatagramPacket: As with datagram sockets, an object of this class
represents an actual datagram; in multicast, a DatagramPacket
object represents a packet of data sent to all participants or
received by each participant in a multicast group.
 DatagramSocket: In the datagram socket API, this class represents
a socket through which a process may send or receive data.
 MulticastSocket : A MulticastSocket is a DatagramSocket, with
additional capabilities for joining and leaving a multicast group. An
object of the multicast datagram socket class can be used for
sending and receiving multicast packets. In the Java API, a
MulticastSocket object is bound to a port address, e.g. 3456, and
methods of the object allows for the joining and leaving of a
multicast address, e.g. 239.1.2.3
Transmisión de Datos Multimedia - Master IC 2007/2008
8
5
Joining a multicast group
To join a multicast group at IP address m and UDP port p, a
MulticastSocket object must be instantiated with p, then the object’s
joinGroup method can be invoked specifying the address m:
// join a Multicast group at IP address
// 239.1.2.3 and port 3456
InetAddress group =
InetAddress.getByName("239.1.2.3"); MulticastSocket s =
new MulticastSocket(3456); s.joinGroup(group);
Transmisión de Datos Multimedia - Master IC 2007/2008
8
6
Sending to a multicast group
A multicast message can be sent using syntax similar with the datagram
socket API.
String msg = "a multicast message.";
InetAddress group =
InetAddress.getByName("239.1.2.3");
MulticastSocket s =
new MulticastSocket(3456);
s.joinGroup(group);
// optional
DatagramPacket hi =
new DatagramPacket(msg.getBytes( ),
msg.length( ),group, 3456);
s.send(hi);
Transmisión de Datos Multimedia - Master IC 2007/2008
8
7
Receiving messages sent to a multicast group
A process that has joined a multicast group may receive messages sent to the group
using syntax similar to receiving data using a datagram socket API.
byte[] buf = new byte[1000];
InetAddress group =
InetAddress.getByName("239.1.2.3");
MulticastSocket s =
new MulticastSocket(3456);
s.joinGroup(group);
DatagramPacket recv =
new DatagramPacket(buf,buf.length);
s.receive(recv);
Transmisión de Datos Multimedia - Master IC 2007/2008
8
8
Leaving a multicast group
 A process may leave a multicast group by invoking the leaveGroup
method of a MulticastSocket object, specifying the multicast address
of the group.
 s.leaveGroup(group);
Transmisión de Datos Multimedia - Master IC 2007/2008
8
9
Setting the “time-to-live”
 The runtime support needs to propagate a multicast message from
a host to a neighboring host in an algorithm which, when executed
properly, will eventually deliver the message to all the participants.
 Under some anomalous circumstances, however, it is possible that
the algorithm which controls the propagation does not terminate
properly, resulting in a packet circulating in the network indefinitely.
 Indefinite message propagation causes unnecessary overhead on
the systems and the network.
 To avoid this occurrence, it is recommended that a “time to live”
parameter be set with each multicast datagram.
 The time-to-live (ttl) parameter, when set, limits the count of
network links or hops that the packet will be forwarded on the
network.
Transmisión de Datos Multimedia - Master IC 2007/2008
9
0
Setting the “time-to-live”
The recommended ttl settings are:
0 if the multicast is restricted to processes on the same host
1 if the multicast is restricted to processes on the same subnet
32 if the multicast is restricted to processes on the same site
64 if the multicast is restricted to is processes on the same region
128 is if the multicast is restricted to processes on the same
continent
 255 is the multicast is unrestricted






Transmisión de Datos Multimedia - Master IC 2007/2008
9
1
Setting the “time-to-live”
String msg = "Hello everyone!";
InetAddress group =
InetAddress.getByName("239.1.2.3");
MulticastSocket s = new MulticastSocket(3456);
s.setTimeToLive(1);
// set time-to-live to 1 hop
DatagramPacket hi =
new DatagramPacket(msg.getBytes( ),
msg.length( ),group, 3456);
s.send(hi);
The value specified for the ttl must be in the range 0 <= ttl <= 255; an
IllegalArgumentException will be thrown otherwise.
Transmisión de Datos Multimedia - Master IC 2007/2008
9
2
The C version: Joining Multicast Groups
 To join a group, you use the setsockopt() kernel service call with a
new parameter. The new parameter is the ip_mreq structure:
/************************************************************/
/*** The ip_mreq structure for selecting a multicast addr ***/
/************************************************************/
struct ip_mreq
{
struct in_addr imr_multiaddr;
/* known multicast group */
struct in_addr imr_interface;
/* network interface */
} ;
 The imr_multiaddr field specifies the multicast group you want to
join. It is the same format as the sin_addr field in the sockaddr_in
structure. The imr_interface field lets you choose a particular host
interface. This is similar to a bind(), which lets you specify the host
interface (or leave the host option wide open with an INADDR_ANY
value).
Transmisión de Datos Multimedia - Master IC 2007/2008
The C version: Joining Multicast Groups
 The following code snippet shows you how to join a group using the
ip_mreq structure. It sets the imr_interface field to INADDR_ANY
merely for demonstration. Do not use it unless you have only one
interface on your host; the results can be unpredictable .
/************************************************************/
/*** Join a multicast group
***/
/************************************************************/
const char *GroupID = "224.0.0.10";
struct ip_mreq mreq;
if ( inet_aton(GroupID, &mreq.imr_multiaddr) == 0 )
panic("address (%s) bad", GroupID);
mreq.imr_interface.s_addr = INADDR_ANY;
if ( setsockopt(sd, SOL_IP, IP_ADD_MEMBERSHIP,&mreq,sizeof(mreq))!= 0)
panic("Join multicast failed");
/************************************************************/
/*** Drop a multicast group
***/
/************************************************************/
if ( setsockopt(sd, SOL_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)) != 0 )
panic("Drop multicast failed");
9
3
Transmisión de Datos Multimedia - Master IC 2007/2008
IP Multicast Architecture
Service model
Host-to-router protocol
(IGMP)
Routers
Multicast routing
protocols
9
4
Hosts
Transmisión de Datos Multimedia - Master IC 2007/2008
9
5
Multicast Routing
 Basic objective – build distribution tree for multicast packets
 The “leaves” of the distribution tree are the subnets containing at least
one group member (detected by IGMP)
 Multicast service model makes it hard
 Anonymity
 Dynamic join/leave
Transmisión de Datos Multimedia - Master IC 2007/2008
9
6
Simple Multicast Routing Techniques
 Flood and prune




Begin by flooding traffic to entire network
Prune branches with no receivers
Examples: DVMRP, PIM-DM
Unwanted state where there are no receivers
 Link-state multicast protocols
 Routers advertise groups for which they have receivers to entire
network
 Compute trees on demand
 Example: MOSPF
 Unwanted state where there are no senders
Transmisión de Datos Multimedia - Master IC 2007/2008
How to Flood Efficiently ?
 A router forwards a packet from source (S) iff it arrives via the
shortest path from the router back to S
 Reverse path check!
 Packet is replicated out all but the incoming interface
 Reverse shortest paths easy to compute  just use info in DV
routing tables
 DV gives shortest reverse paths
 Efficient if costs are symmetric
S
y
x
z
Forward packets that arrive
on shortest path from “t” to “S”
(assume symmetric routes)
t
a
9
7
Transmisión de Datos Multimedia - Master IC 2007/2008
Problem
 Flooding can cause a given packet to be sent multiple times over the same
link: can filter better than this!
S
x
y
a
duplicate packet
z
b
 Solution: Reverse Path Broadcasting
9
8
Transmisión de Datos Multimedia - Master IC 2007/2008
Reverse Path Broadcasting (RPB)
 Basic idea: forward a packet from S only on child links for S
 Child link of router x for source S: link that has x as parent on the
shortest path from the link to S
S
5
x
forward only
to child link
child link of x
for S
6
y
a
z
b
9
9
Transmisión de Datos Multimedia - Master IC 2007/2008
How to Find Child Links?
 Routing updates !
 If z tells x that it can reach S through y,
 and if the cost of this path is >= the cost of the path from z to S
through x,
 then x knows that the link to z is a child link
 In case of tie, lower address wins
S
5
x
forward only
to child link
child link of x
for S
6
y
a
z
b
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
Truncated RPB
 This is still a broadcast algorithm – the traffic goes everywhere –
lousy filtering!
 First order solution: Truncated RPB
 Don't forward traffic onto networks with no receivers
 Identify leaves
 Leaf links are the child links that no other router uses to reach source S
 Use periodic updates of form:
– “this is my next-link to source S”
 If child is not the “next-link” for anyone, it is a leaf
 Detect group membership in leaf (IGMP)
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
Reverse Path Multicast (RPM)
 Prune back transmission so that only absolutely necessary links
carry traffic
 Use on-demand pruning so that router group state scales with
number of active groups
S
data message
prune message
v
1
0
a
b
x
y
t
a
b
Transmisión de Datos Multimedia - Master IC 2007/2008
Basic RPM Idea
 Prune (Source,Group) at leaf if no members
 Send Non-Membership Report (NMR) up the tree
 If all children of router R prune (S,G)
 Propagate prune for (S,G) to parent R
 On timeout:




Prune dropped
Flow is reinstated
Down stream routers re-prune
Note: this is a soft-state approach
 Grafting: Explicitly reinstate sub-tree when
 IGMP detects new members at leaf, or when a child asks for a graft.
1
0
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
Putting it together: Topology
G
G
S
G
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
Flood with Truncated Broadcast
G
G
S
G
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
Pruning
G
S
G
Prune (s,g)
Prune (s,g)
G
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
Grafting
G
S
G
G
Report (g)
Graft (s,g)
Graft (s,g)
G
1
0
Transmisión de Datos Multimedia - Master IC 2007/2008
After Grafting Complete
G
G
G
S
G
Transmisión de Datos Multimedia - Master IC 2007/2008
1
0
Reliable Multicast: The Goal
 Implement reliability on top of IP multicast
 Why is this hard ?
 Sender cannot keep state for unknown number of dynamic receivers
 Remember open & dynamic group semantic?
 Algorithms like TCP that estimate path properties such as RTT and
congestion window don’t generalize to trees.
 Remember: TCP is only for a unicast session!
 Has to address (N)ACK implosions
Transmisión de Datos Multimedia - Master IC 2007/2008
Implosion
Packet 1 is lost
All 4 receivers request a resend
Resend request
S
1 2
R1
R1
R2
R3
1
1
S
R2
R4
R3
R4
Transmisión de Datos Multimedia - Master IC 2007/2008
1
1
Retransmission
 Re-transmitter
 Options: sender, other receivers
 How to retransmit
 Unicast, multicast, scoped multicast, retransmission group, …
 Problem: retransmissions (aka repairs) may reach destinations that
don’t require a retransmission
 A.k.a “exposure” problem
 Solution: subcast the re-transmission only to receivers that need it.
Transmisión de Datos Multimedia - Master IC 2007/2008
Why Subcast? Exposure problem…
Packet 1 does not reach R1;
Receiver 1 requests a resend
Resend request
Packet 1 resent to all 4 receivers
S
Resent packet
1
1 2
R1
1
R1
R2
R2
R3
1
1
S
R4
R3
R4
Transmisión de Datos Multimedia - Master IC 2007/2008
Ideal Recovery Model
Packet 1 reaches R1 but is lost
before reaching other Receivers
Only one receiver sends NACK to
the nearest S or R with packet
Resend request
S
S
Resent packet
1 2
1
R1
Repair sent
only to
those that
need packet
R1
1
R2
R3
1
1
R2
R4
R3
R4
Transmisión de Datos Multimedia - Master IC 2007/2008
Reliable Multicast Transport: Issues
 Retransmission can make reliable multicast as inefficient as
replicated unicast
 (N)ACK-implosion if all destinations ack at once
 “Crying baby”: a bad link affects entire group
 Heterogeneity: receivers, links, group sizes
 Anonymous/Open/Dynamic Group Model:
 Source does not know # of destinations, and destinations may vanish
 Multicast applications do not need strong reliability of the type
provided by TCP.
 Can tolerate some reordering, delay, etc
1
1