Graduate Program in Telecommunications
George Mason University
Technical Report Series
4400 University Drive MS#2B5
Fairfax, VA 22030-4444 USA
http://telecom.gmu.edu/
703-993-3810
Multicast Virtual Private Networks
CHRISTOPHER LENART
[email protected]
Technical Report GMU-TCOM-TR-09
Abstract
Multicast has long been a popular technology in computer networks for the efficient distribution of data, such as patches
or live video, to multiple users simultaneously. The early implementations were always restricted to a single network, and
a remote office would need its own multicast distribution system separate from a main office, for example. This report
describes Next-Generation Multicast Virtual Private Networks (NG-MVPN). NG-MVPN is a popular technology used by
service providers to connect the multicast networks of several locations over their network. This report
begins by describing the building blocks of NG-MVPN. These are Multicast, Multiprotocol Label Switching (MPLS),
Border Gateway Protocol (BGP) and BGP/MPLS VPNs. The report assumes the reader already has an understanding
of these technologies. For brevity, the essential parts of these technologies required for NG-MVPN are discussed. The
service provider multicast technology, MVPN (mVPN), written by Eric Rosen and also called Draft Rosen MVPN, is
also discussed as background. Lastly, this report also discusses Global Table Multicast (GTM), which is an extension of
NG-MVPN that uses the global routing table rather than the segregated routing tables used for BGP Virtual Private
Networks. Resources for this report are mainly IETF Requests for Comments, but also include technical books, technical
articles, and personal communication. All references are cited and listed at the end of the report.
Contents

Introduction

1 Building Blocks: Multicast, BGP, and MPLS
  1.1 Multicast
    1.1.1 Multicast Addressing
      1.1.1.1 Types of Multicast Addresses
    1.1.2 Multicast Distribution Trees
      1.1.2.1 Reverse Path Forwarding
    1.1.3 Internet Group Management Protocol
    1.1.4 Protocol Independent Multicast
      1.1.4.1 PIM Sparse-Mode
      1.1.4.2 PIM Dense-Mode
      1.1.4.3 PIM Single-Source Mode
  1.2 MPLS
    1.2.1 MPLS Signaling
      1.2.1.1 LDP
      1.2.1.2 RSVP-TE
  1.3 BGP
    1.3.1 UPDATE Message
    1.3.2 Multiprotocol BGP
  1.4 BGP/MPLS Virtual Private Networks
    1.4.1 Network Topology and Terminology
    1.4.2 Virtual Routing and Forwarding Tables
    1.4.3 BGP Addressing and Advertisement
      1.4.3.1 VPNv4 Address Family
    1.4.4 Forwarding
    1.4.5 Inter-AS Considerations
    1.4.6 BGP/MPLS VPN Summary
  1.5 Generic Routing Encapsulation
  1.6 Control Plane vs Forwarding Plane

2 Draft Rosen Multicast Virtual Private Networks
  2.1 Overview of MVPNs
  2.2 MVPN Operation
    2.2.1 Multicast Distribution Trees
      2.2.1.1 MDTs and Generic Routing Encapsulation
      2.2.1.2 Default MDT
      2.2.1.3 Data MDT
    2.2.2 Auto-Discovery in MVPNs
    2.2.3 RPF
  2.3 Considerations for Inter-AS and BGP Free Core
    2.3.1 PIM MVPN Join Attribute
    2.3.2 BGP Connector

3 BGP/MPLS Multicast Virtual Private Networks
  3.1 Next-Generation Multicast VPN Overview
  3.2 PMSI
    3.2.1 Instantiating PMSIs
  3.3 PIM and BGP Control Plane
    3.3.1 PIM Control Plane for CE-PE Information
    3.3.2 MP-BGP Control Plane for PE-PE Information
      3.3.2.1 New BGP Path Attributes and Extended Communities
      3.3.2.2 MCAST-VPN NLRI
    3.3.3 MP-BGP for PE-PE Upstream Multicast Hop
      3.3.3.1 BGP for Upstream Multicast Hop Selection
      3.3.3.2 Upstream Multicast Hop Selection
  3.4 Forwarding Plane Considerations
    3.4.1 Tunnel Type 1 - RSVP-TE P2MP LSP
    3.4.2 Tunnel Type 2 - mLDP P2MP LSP
    3.4.3 Tunnel Type 3 - PIM-SSM
    3.4.4 Tunnel Type 4 - PIM-SM
    3.4.5 Tunnel Type 6 - Ingress Replication
    3.4.6 P-Tunnel Aggregation
  3.5 Global Table Multicast
    3.5.1 Use of NG-MVPN BGP Procedures in GTM
      3.5.1.1 Route Distinguishers and Route Targets
      3.5.1.2 UMH-Eligible Routes
      3.5.1.3 BGP Autodiscovery Routes
      3.5.1.4 BGP C-Multicast Routes
    3.5.2 Inclusive and Selective Tunnels

4 Summary
  4.1 Compare and Contrast
  4.2 Receiver Sites: All or Some
  4.3 NG-MVPN vs GTM
  4.4 Conclusion

List of Figures

1.1 Basic Modes of Network Transmission
1.2 Unicast vs Multicast Trees
1.3 PIM-DM vs PIM-SM
1.4 MPLS LSPs
1.5 Point-to-Multipoint MPLS LSPs
1.6 LDP Signaling
1.7 Multicast LDP Signaling
1.8 RSVP-TE Signaling
1.9 Multicast RSVP-TE Signaling
1.10 Service Provider Network with Customer Sites
1.11 VRFs and Attachment Circuits
1.12 MP-BGP VPNv4 BGP UPDATE Message Example
1.13 VPN Label Advertisements
1.14 VPN Forwarding
1.15 Control Plane vs Forwarding Plane
2.1 MVPN Overview
2.2 MVPN Details
2.3 MVPN C-Instance LAN
2.4 MVPN Default MDT Operation
2.5 MVPN Data MDT Signaling
2.6 MVPN Data MDT Operation
3.1 BGP/MPLS Multicast VPN
3.2 Provider Multicast Service Interface
3.3 Shared Tree to Source Tree Switchover using Source Active A-D Routes
3.4 GTM Network Topology
Introduction
Every day more technology is utilizing digital methods of communication. The popular example of this is television,
where a handful of channels were sent using analog radio waves directly to an antenna on a house. There was nothing
in between. Today television content is created digitally then packaged digitally to be sent to a television provider’s
head-end. From there the content is sent over a private network to the home or even over the Internet. Between all
these points are finite sized communication channels. The content is growing in size too. Standard Definition Television
(SDTV) was upgraded to High Definition Television (HDTV). HDTV bandwidth is increasing even further with 4K and
8K HDTV, the nomenclature coming from the approximate number of horizontal pixels. All of this extra bandwidth is challenging those
finite communication channels and they must be constantly updated to keep up. Television isn’t the only use case that
is choking networks. Large enterprise networks have servers that maintain software updates, or may also stream an
executive message video.
Multicast steps in by allowing a network to send one copy of a packet over a link from a source to many receivers. Rather
than having to send a stream to each server, which is the case with unicast, a source can send one stream and let the
network do the work in getting that stream to anyone who wants to receive it. Multicast also keeps track of where
the interested receivers are, so unlike broadcast, the stream only goes to parts of the network rather than all of the
network.
Companies have embraced the use of Virtual Private Networks over Service Provider networks for a number of years,
which allow them to distribute traffic between remote sites without having to build their own infrastructure. These Virtual
Private Networks have been extended to distribute Multicast across them in a scalable manner. This report explores the
network technologies that provide the Virtual Private Networks and how they have been updated and modified to care for
multicast traffic.
Approach
The intention of this technical report is to walk the reader through the various Multicast VPN technologies. Rather than
jump straight into the multicast technologies and describe each underlying technology involved, the approach is to present
the underlying technologies up front and then put them together when discussing the Multicast VPNs. The report starts
with basic concepts that are then built on for the various approaches to doing Multicast VPNs. It is assumed as well that
the reader already has a background in various computer network technologies. The report is laid out as follows:
• Building Blocks
• Draft Rosen Multicast VPNs
• Next-Generation Multicast VPNs
• Global Table Multicast
• Summary
Building Blocks This chapter explains the basics of multicast, MPLS, and BGP that are relevant to multicast VPNs.
The topics are cherry-picked so that there is an understanding of the underlying mechanisms for the various multicast
VPN technologies. The information from BGP and MPLS is combined to discuss Layer 3 VPNs (L3VPNs) which are a
major component of each multicast VPN technology discussed in this report. Much information regarding each technology
is omitted for brevity and simplicity.
Draft Rosen Multicast VPNs One of the first widespread implementations for multicast VPNs, or MVPN, was created
by Eric Rosen at Cisco. It was implemented while it was in draft status at the IETF, hence the name Draft Rosen
mVPN. Even though it was only released in draft status it had wide acceptance among the various telecommunications
vendors.
Next-Generation Multicast VPNs Draft Rosen MVPNs evolved to Next-Generation Multicast VPNs (NG-MVPNs)
which overcame some of the limitations of Draft Rosen MVPNs. This section focuses on the two IETF RFCs that were
used to establish the standard, building on the BGP and MPLS concepts established in the Building Blocks section.
Global Table Multicast is another Multicast VPN technology that relies on the mechanisms and semantics established by
the NG-MVPN standards. While NG-MVPN has routing table isolation for customers as a key characteristic, GTM relies
on the global routing table to reduce operational overhead when that isolation isn’t necessary. This part of the chapter
explores the differences between NG-MVPN and GTM.
Resources
This paper utilizes mainly the documents from the Internet Engineering Task Force (IETF) standards body. The IETF
releases standards in the form of Request For Comments (RFCs) which are allowed unlimited distribution. The initial
stage of an RFC is a draft which has many versions over its lifetime as it is edited, reviewed, and updated. Eventually
the draft is ratified as a standard to become an RFC and is assigned a number. Telecommunication vendors use these
standards to ensure interoperability with products created by other vendors. Each RFC referenced is mentioned in the
main body of the text as a plain-sight reference. Also where applicable the page number is referenced to assist in
identifying the location of a particular piece of information. Some information was taken from various texts as they have
additional illustrations or more elegant explanations of the technology at hand, or the amount of detail in an RFC was
not required.
Chapter 1
Building Blocks: Multicast, BGP, and MPLS
This chapter introduces the relevant concepts of Multicast, BGP, MPLS, and the combinations of BGP and MPLS that
are used in Multicast VPNs. Not all aspects of each technology will be covered. The reader is encouraged to follow the
references for a more in depth understanding of all the technologies.
1.1 Multicast
The familiar method of transmitting data or a message is unicast. This is the common model of one source node and
one destination node. An instant message that goes from one computer to another computer is a familiar example.
Another example is a single web server sending the contents of a web page to just one node at a time. A file download goes
from one server to the single user that needs it. Another transmission model is broadcast. In the case of a broadcast a
message is sent to all of the nodes on a network, and is generally limited to that local network. Broadcasts, if not used
properly, can overwhelm a network. The last model is multicast. Not everyone needs a file at the same time and not
everyone is watching the same channel at the same time. Multicast solves this problem by only sending the data to the
nodes that request it [1, p. 69–71].
Another problem that multicast solves is the escalating bandwidth problem. In the unicast model each person requesting
the data gets a copy. If 100 people request it, the source server will send 100 copies. With multicast the server only
needs to send one copy, and this copy gets replicated in the network by an intermediate node, such as a router or switch,
until each requesting user gets a copy. Each link in the network only has to forward one copy, even if 100 users are
requesting it [2, p. 1].
[Figure: three panels, Unicast, Broadcast, and Multicast, each showing a source, end nodes, and receivers.]
Figure 1.1: Basic Modes of Network Transmission
Figure 1.1 gives a graphical representation of the three main modes of transmission. The right-most graphic implies that
the source is sending one transmission but it is sent to multiple receivers that request the content. The mechanisms of
how a receiver requests data will be described later in this chapter. The figure also shows two major components of
the multicast network, the source and the receiver. In between are the nodes that replicate and forward the multicast
traffic.
1.1.1 Multicast Addressing
Internet Protocol (IP) Addresses are defined by five classes, A-E. Classes A, B and C are used predominately for
unicast, although certain addresses are used for broadcast. The addresses in each class can be further broken down
using subnetting, with the last IP address in a subnet reserved as the broadcast address for that subnet, typically a
particular Local Area Network (LAN). Class E addresses are reserved for future or experimental use, but have not had
any widespread implementation. Class D addresses are reserved for multicast, and are defined by the range 224.0.0.0
through 239.255.255.255. The exact specifications for the addressing are defined in RFC 1112. The addresses in this
range are also referred to as group addresses [3, p. 2]. Because they are part of the IP Protocol domain they still follow
the dotted decimal notation used for the other classes.
1.1.1.1 Types of Multicast Addresses
Within the Class D range, the addresses are further broken down into various groups, and may either be permanently
assigned or transient addresses. The assignments of the permanent addresses are maintained by the IANA after they are
specified in the IETF RFCs [4, p. 28].
Link Local-Scope Link local scope is within the range 224.0.0.0 through 224.0.0.255. This range contains addresses
specifically assigned to a function, such as routing protocol updates. The Time-to-Live (TTL) of packets sent to these addresses is
set to 1 so they are never forwarded off the local link. The addresses 224.0.0.1 and 224.0.0.2 have the
important assignments of being the “all hosts on subnet” and “all routers on subnet.”
Globally Scoped This is the large range of 224.0.1.0 through 238.255.255.255. These aren’t limited like the link-local
addresses and can be used to transmit information across large networks and the Internet. Some addresses have been
reserved for specific network functions, such as 224.0.1.1 for Network Time Protocol (NTP), as well as ranges assigned
to organizations (all within the 224.0.0.0/8 range).
Both the link-local scope and the globally scoped assignments were originally maintained in RFCs; however,
they are now maintained on the IANA website.
Limited Scope These fall within the range 239.0.0.0 through 239.255.255.255. These are analogous to private
addresses used for unicast, such as 10.0.0.0/8. Networks are required to use policies to prevent any traffic from this
range from leaving an autonomous system (AS). These are defined in RFC 2365.
GLOP Addressing GLOP addressing isn't an initialism or acronym; it's simply the name of the 233.0.0.0/8 range (233.0.0.0 through
233.255.255.255). Established in RFC 2770, this group of addresses was created for organizations that already had an AS
number assigned by the IANA. The AS number is inserted into the second and third octets of the address to create a
unique address range for the organization. This leaves the last octet as the assignable range [4, p. 28–30]. An example
of a GLOP address for AS 789 is 233.3.21.1 [5, p. 2].
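As a quick illustration of the GLOP arithmetic, the short sketch below derives an organization's /24 block by placing the high and low octets of its 16-bit AS number into the second and third octets of the address (the helper is our own illustrative example, not something defined in the RFC).

def glop_block(as_number: int) -> str:
    """Return the GLOP /24 multicast block for a 16-bit AS number."""
    if not 0 <= as_number <= 0xFFFF:
        raise ValueError("GLOP addressing only covers 16-bit AS numbers")
    high, low = as_number >> 8, as_number & 0xFF   # AS number split into octets 2 and 3
    return f"233.{high}.{low}.0/24"

# AS 789 -> 233.3.21.0/24, which contains the example address 233.3.21.1
print(glop_block(789))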
Source-Specific Multicast Well after multicast was created specific addresses were reserved solely for Source-Specific
Multicast (SSM). The range is 232.0.0.0/8 and any group using this address uses SSM. SSM requires special modifications
to Internet Group Management Protocol and Protocol Independent Multicast, which will be discussed in sections 1.1.3
and 1.1.4. RFC 4607 declares that the use of any address outside of this range is called Any-Source Multicast (ASM) [6,
p. 3]. This report will follow this convention.
1.1.2 Multicast Distribution Trees
An important part of forwarding multicast traffic through the network is the ability for a network node to build distribution
trees so it can do routing and forwarding. A network node with this capability can be referred to as a multicast-enabled
node, and since it is doing multicast routing these nodes will be referred to as a multicast-enabled router, or just multicast
router. Each multicast router is connected to other multicast routers and shares information with the use of special
multicast protocols to build trees.
There are two main types of trees: shared-based trees and source-based trees. Shared-based trees can be referred to as
shared trees. Source-based trees can be referred to as source trees or Shortest Path Trees (SPTs). In this report, to
prevent confusion, the terms shared trees and source trees will be used.
Both trees are based on a common notation referred to as (S,G) notation (pronounced “ess comma gee”) to represent a
set of sources and groups. The S represents the source of the stream and is the unicast IP address of the server that is
sending the traffic. The G represents the multicast group and it is the identification of a specific stream of traffic. A
source can have multiple groups associated with it. A group address could represent something like a specific file or a
channel in IP based TV. As discussed in the addressing section, the group address comes from the class D range of IPv4
addresses. An example of a source and group set would be (1.1.1.1,239.1.1.1) where 1.1.1.1 is the multicast source
server and 239.1.1.1 is the multicast group address. In shared trees the source is denoted by an asterisk and means “all
sources.” The notation is (*,G), and using the previous example is written as (*,239.1.1.1) to represent a specific group,
but no specific source.
Shared trees utilize a central point in the tree, referred to as a Rendezvous Point (RP). Sources send their traffic to the
RP then the RP forwards the traffic to all of the active receivers for a group. Shared trees use the (*,G) notation since
the source is unknown to the receiver and the traffic is sent to the RP. Source trees are simpler than shared trees since
the root of the tree is at the source. The tree then spans the multicast enabled network to all the receivers. This type
of tree makes use of the shortest path between the source and the receiver, and different trees may exist for different
groups. The source tree uses the (S,G) notation since the source is known [4, p. 41–43].
[Figure: three panels, Unicast, Source Tree, and Shared Tree, showing streams from sources S1 and S2 across intermediate nodes 1-7, with node 4 acting as the RP in the shared tree.]
Figure 1.2: Unicast vs Multicast Trees
Figure 1.2 compares unicast distribution to the source and shared mode multicast distribution trees. With unicast, the
source needs to send one copy per receiver for the same content. Contrast that to the source tree where source 1 (S1)
only needs to send one copy even though it has two receivers. The copy is replicated at intermediate node 5 and each
downstream node only receives one copy. Even if a downstream node, such as 7, had dozens of receivers attached to it
(directly or indirectly) node 5 would only have to send one copy to 7. In the shared tree intermediate node 4 is configured
to be the RP. The stream from source S2 is unchanged since it passed through that node anyway, but the stream
from S1 no longer takes the shortest path to node 5; it is instead sent to node 4 before being passed along to node 5 to then be
replicated.
1.1.2.1 Reverse Path Forwarding
Multicast routing co-exists with unicast routing in a network. Unicast routing is responsible for looking at the destination
of an IP packet1 and forwarding it out the interface that was determined to be on the best path by a unicast routing
protocol. When forwarding multicast packets the router needs to know the best path to the root or source of the tree in
the upstream direction in addition to which interfaces are toward the receivers in the downstream direction. Reverse
Path Forwarding (RPF) is employed by the router to ensure that there is a loop free topology. It does this by ensuring
that the multicast traffic is arriving on the same interface that is also the best path to the source. If the traffic arrives on
a different interface it is possible that there is a loop in the topology. RPF knows which interface is the best path to the
source by utilizing the unicast routing table, since the source of a multicast stream is a unicast address. When a multicast packet
arrives at a router it will check to make sure it arrived on the upstream interface. If it did, the router will forward it; if it
did not, the router will drop it [4, p. 47]. Referencing figure 1.2, intermediate node 5 will only forward traffic from S1 if
the traffic is coming from intermediate node 1; otherwise it will be dropped.
1 Datagram is the original technical term for an IP packet; however the common vernacular is to use packet when referring to IP encapsulated data.
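The RPF check itself amounts to comparing the interface a packet arrived on with the interface the unicast routing table would use to reach the packet's source. A minimal sketch follows, with an invented routing-table structure and interface names:

# Illustrative unicast routing table: source prefix -> best interface back toward that source.
unicast_rib = {
    "1.1.1.0/24": "ge-0/0/1",
}

def rpf_check(source_prefix: str, arrival_interface: str) -> bool:
    """Accept multicast traffic only if it arrived on the interface that is the best
    unicast path back toward the source; anything else may indicate a loop."""
    return unicast_rib.get(source_prefix) == arrival_interface

print(rpf_check("1.1.1.0/24", "ge-0/0/1"))  # True  -> forward downstream
print(rpf_check("1.1.1.0/24", "ge-0/0/2"))  # False -> drop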
1.1.3 Internet Group Management Protocol
At its most fundamental level, Internet Group Management Protocol (IGMP) is used by IP hosts (receiving nodes) to
announce they would like to receive traffic from a specific group or multiple groups, also referred to as dynamic host
registration. Multicast routers listen for these messages as well as send out queries to discover if hosts are active or idle.
IGMP was originally specified in RFC 1112, then was enhanced in RFC 2236 as IGMPv2 [4, p. 51]. One of the major
enhancements in IGMPv2 is to allow a host to leave a group rather than just timing out. The latest is IGMPv3 and is
specified in RFC 3376, and was updated by RFC 4604. RFC 3376 added the ability to filter by source [7, p. 1], while RFC
4604 adds wording for SSM.2 [8, p. 1].
IGMP messages are embedded into IP packets. There are three types of messages that are germane to the interaction
between the hosts and multicast routers: Membership Query, Membership Report, and Leave Group. The message
is distinguished by the type field in an IGMP message which is the payload within an IP packet. Queries are sent by
routers either to learn whether an attached network has any groups with active hosts, in the case of a general query, or to
learn whether a specific group has any active hosts, in the case of a group-specific query. The membership report is used by hosts to either respond
to a query, or to send an unsolicited report when an application is launched. The leave group message is used by hosts to
explicitly notify a router that it is leaving a group. In each case the group address is referenced in the message, except in
the case of a general query where the address is set to zero. In all cases the TTL of the packet is set to 1 so the router
cannot forward the message [9, p. 2–5].
RFC 3376 describes IGMPv3 and modifies the membership query and introduces a new membership report for version
3. The membership query is modified to support a list of one or more specific sources in the message. The group
format is still the same where the group address is set to zero for a general query and a group address is provided for a
group-specific query. The version 3 membership report is modified so that the IGMP message has one or more records,
and each group record can list one or more specific sources. The message itself specifies the number of group records,
and each group record specifies the number of sources for that record [7, p. 7–15]. The same RFC also specifies the
mechanism of INCLUDE and EXCLUDE modes. The INCLUDE mode specifies a list of sources that the host would like
to receive traffic from, and EXCLUDE specifies a list of sources that the host should not receive multicast traffic from.
These INCLUDE and EXCLUDE lists tell the router exactly which sources a host does, or does not, want traffic from [4, p. 55].
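To make the IGMPv3 report structure concrete, the sketch below models a version 3 membership report as a list of group records, each with its own record type and source list. It is a simplified data model of the fields described above, not the exact wire encoding.

from dataclasses import dataclass, field
from typing import List

@dataclass
class GroupRecord:
    group: str                                         # multicast group address
    mode: str                                          # "INCLUDE" or "EXCLUDE"
    sources: List[str] = field(default_factory=list)   # zero or more source addresses

@dataclass
class MembershipReportV3:
    records: List[GroupRecord]                         # a report carries one or more group records

# A host asking for group 232.1.1.1 only from source 1.1.1.1 (an SSM-style INCLUDE),
# and for 239.1.1.1 from any source except 10.0.0.5 (EXCLUDE).
report = MembershipReportV3(records=[
    GroupRecord(group="232.1.1.1", mode="INCLUDE", sources=["1.1.1.1"]),
    GroupRecord(group="239.1.1.1", mode="EXCLUDE", sources=["10.0.0.5"]),
])
print(len(report.records), "group records")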
RFC 4604 builds on RFC 3376 to add language regarding the source-specific multicast rules established in RFC 4607 (written
by the same authors and published at the same time). Specifically this references the 232.0.0.0/8 range and
establishes the concept of “SSM-aware” hosts and routers that recognize this address space [8, p. 1–6]. RFC 4607
states that when a host joins an SSM group the router should use SSM methods and does not need to use shared-tree
distribution (i.e. a source-tree can be used instead) [6, p. 3–4].
1.1.4 Protocol Independent Multicast
IGMP handles multicast signaling between a host and a multicast router. However, a separate protocol is needed between
the multicast routers themselves. Although there are several multicast routing protocols available, such as
Distance Vector Multicast Routing Protocol (DVMRP) and Multicast OSPF (MOSPF), this report focuses on Protocol
Independent Multicast (PIM), and its three modes: Sparse-Mode, Dense-Mode, and Single-Source Mode. PIM gets its
name from the fact that it does not rely on any specific routing protocol for it to function. It can use BGP, OSPF, IS-IS,
static routes, etc. This is in contrast to a protocol like MOSPF which requires OSPF as the routing protocol. PIM also
does not build its own routing topology, instead relying on the unicast routing tables provided by the aforementioned
routing protocols to build its distribution trees. Using the unicast routing table PIM can do reverse path checks and build
reverse path tables to maintain the interface used to most optimally reach a known source. PIM-DM is regarded to be
better when there is expected to be a large number of active receivers compared to the total number of receivers in the
network, and when the traffic is constantly being forwarded. PIM-SM is regarded as the better choice when the number
of active users will be a small percentage of the total receivers, or when the traffic for a group will be used sporadically
[4, p. 78–79].
2 Some recent texts mention only RFC 3376 as the reference for SSM; however the semantics specific to SSM are expanded in RFC 4604.
RFC 3376 does establish the message formatting for reports and queries with specific sources.
Note: From this point onward an IGMP membership report will be referred to as an IGMP Join. This is in line with
various other texts, articles, and sources regarding IGMP and PIM interaction.
1.1.4.1 PIM Sparse-Mode
PIM Sparse-Mode (SM) was originally specified in RFC 2117 which was later updated by RFC 2362. More recently RFC
4601 was created which obsoletes RFC 2362, fixes any errors from RFC 2362, as well as adds rules regarding how to
handle traffic using SSM addresses [10, p. 4]. PIM-SM relies on shared-trees for multicast distribution. At the center
of the tree is the Rendezvous Point (RP) which functions as an intermediary for the multicast routers attached to the
source and receivers. Another name for the shared tree is the RP Tree (RPT) since the tree for the receivers is rooted at
the RP. The location of the RP is either statically configured or learned dynamically by various methods, one of which is
the Bootstrap Router (BSR) method.
Each router builds a Multicast Routing Information Base (MRIB) which stores the best interface to use as a next-hop
for forwarding PIM messages. These messages are typically sent in the opposite direction of the multicast traffic being
forwarded, as is the case for a PIM Join or Prune message. The MRIB is based on reverse-path forwarding rules, meaning
it knows the best path back towards a source. Each source and receiver has a Designated Router (DR)3 that acts on its
behalf for various PIM related actions.
Each router also has a Tree Information Base (TIB) which contains the state of a multicast router by collecting all the
messages received via PIM and IGMP. It stores the state of all the multicast trees on the router [10, p. 5].
When a receiver sends an IGMP Join to its directly connected multicast router a PIM Join is sent to the RP. The
notation of this join is a (*,G) message meaning the source is undefined. The PIM Join will be propagated toward the
RP by each intermediate multicast router until it reaches the RP or another multicast router with a (*,G) entry for that
group already established. All routers with receivers for that group will be part of a tree that is rooted at the RP. PIM
Join messages are sent periodically as long as the DR has active receivers to prevent that section of the tree from timing
out. A source will always send its traffic to its local multicast router (DR). The source DR will encapsulate the traffic
in unicast packets and forward it to the RP, which decapsulates it and forwards it onto the tree for that group. This
source-to-RP mechanism is facilitated by a Register Message.
This method is inefficient however, and only needs to be used to establish an initial source-receiver relationship. When
the RP starts receiving the encapsulated packets from the source DR it will begin building a source tree path back
toward the source using (S,G) Joins that specifically contain the source address. Eventually the source specific (S,G)
Joins will make it back to the source DR. At this point, the source DR will forward unencapsulated packets toward the
RP. The RP will then be receiving two copies of the multicast traffic - encapsulated and unencapsulated. The RP will
drop the encapsulated packets and send a PIM Register-Stop to the source DR, and at this point the DR will stop
sending encapsulated packets to the RP for that group.
So far some efficiency has been gained in that the RP is now receiving unencapsulated native multicast traffic and forwarding it natively to the receivers as well. However, further efficiency is created by allowing the router attached
to the receiver to join a source based tree. With the traffic hitting the receiver’s router natively, this router now knows
the source for the group. It will initiate an (S,G) Join back toward the source (based on the MRIB, as it contains the
best path toward the source based on reverse-path forwarding built on the unicast tree) until it reaches the source router
or an intermediate router that already has an entry for that specific (S,G) pair. At some point in the tree a router will be
receiving traffic from the source on the shortest-path/source tree and the RP simultaneously. The router will drop the
traffic from the RP as well as send a special PIM Prune message toward the RP, denoted as an (S,G,rpt) Prune.4 [10,
p. 4–8].
3 The DR is one of several routers that exists on a LAN, and is selected through an election process.
4 The PIM Join and Prune message are actually the same message, referred to as a PIM Join/Prune Message. They are distinguished
based on whether the group address is in the Join or Prune field of the message [11, p. 708].
Another message used in PIM-SM is the Hello Message. The Hello Message is used by PIM to discover neighbors,
maintain adjacencies, and elect DRs in a LAN environment. The Hello messages contain a holddown timer which tells
the router how long to wait before determining a neighbor is down. The message is sent at a regular interval, typically a
number of seconds. The well known address used for Hello Messages is the ALL-PIM-ROUTERS address of 224.0.0.13
[10, p. 21].
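As a rough summary of the state described in this section, a PIM-SM router that holds both shared-tree and source-tree state prefers the source-specific entry. The sketch below shows that lookup order; the state table contents and interface names are invented for illustration.

# Multicast state table: (source, group) keys, with "*" standing for "all sources".
# Each entry maps to its list of downstream (outgoing) interfaces.
mroute_table = {
    ("1.1.1.1", "239.1.1.1"): ["ge-0/0/2"],   # source tree built after the switchover
    ("*",       "239.1.1.1"): ["ge-0/0/3"],   # shared tree rooted at the RP
}

def lookup_oil(source: str, group: str):
    """Prefer source-specific (S,G) state; fall back to the shared (*,G) state."""
    return mroute_table.get((source, group)) or mroute_table.get(("*", group)) or []

print(lookup_oil("1.1.1.1", "239.1.1.1"))  # ['ge-0/0/2'] - the (S,G) entry wins
print(lookup_oil("2.2.2.2", "239.1.1.1"))  # ['ge-0/0/3'] - only the (*,G) entry matches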
1.1.4.2 PIM Dense-Mode
PIM also has a source tree mode where the router with receivers immediately builds a shortest-path tree back to the
source. In contrast to PIM-SM, PIM-DM uses a “push” method rather than a “pull” method[4, p. 80]. PIM-DM is
described in RFC 3973. The basic operation of PIM-DM is to flood multicast traffic throughout the network, then
“prune” back the links that do not have any active receivers. The prune is sent upstream toward the source. Another
message called a PIM Graft is used when a link needs to be re-added to the multicast tree. The Prune state is based on
a timer. When the Prune timer expires traffic will once again be transmitted down a link that was previously pruned
toward potential receivers. A router can also send a Graft message toward the source when a receiver joins an area that
was originally pruned from the source tree. PIM-DM uses (S,G) notation only, and each (S,G) pair has a timer associated
with it to maintain state and does not rely on keepalive messages [12, p. 5-6]. PIM-DM also uses the Join message only
to override a prune [12, p. 13].
[Figure: two panels, PIM-DM and PIM-SM, showing source S1, intermediate nodes 1-7 (node 4 as the RP in PIM-SM), receivers, and Prune, Join, and traffic arrows.]
Figure 1.3: PIM-DM vs PIM-SM
Figure 1.3 makes a basic comparison between PIM-DM and PIM-SM. The graphic on the left shows S1 sending out
traffic to all active receivers. Since node 6 does not have any active receivers it sends a prune message back toward S1
via node 4. Node 2 also does not have an active receiver so it sends a prune toward node 3. In contrast, with PIM-SM a
PIM Join is sent by any router that’s aware of an active receiver. The Join is sent in the opposite direction of the traffic
flow. A dash-dotted arrowed line from node 4 to 1 is a source-specific Join that the RP sends to the source once it
starts receiving the encapsulated traffic. As described in section 1.1.4.1 (Sparse Mode) eventually the traffic to each
receiver will evolve into a source based tree similar to the PIM-DM tree, where all traffic is native (unencapsulated) from
the source to the receiver, whether it goes through the RP or not. The graphic on the right only shows the initial stages
of PIM-SM.
1.1.4.3 PIM Single-Source Mode
As laid out in RFC 4607 some extra considerations are required when a receiver joins a group in the 232.0.0.0/8 range [6,
p. 4]. IGMP was expanded so it can handle source-specific messages. PIM wasn't expanded, but RFC 4601 mentions
specific semantics and rules to be applied for SSM groups that makes PIM Single-Source Mode (PIM-SSM) a subset
of PIM-SM. Mainly, it specifies that when the SSM range is used the (*,G) Join cannot be utilized and the tree must
be built using a source tree with (S,G) Joins. Also, there is no need for an RP. This means that the PIM Register
and Register-Stop processes are not used, and there is no need for the special (S,G,rpt) Prune since the source tree is
always built. Otherwise, the mechanics for building a tree in PIM-SSM are the same as PIM-SM by utilizing (S,G) Joins
directly to the source in the opposite direction of the traffic flow. The same RPF and MRIB constructs are used [10,
p. 80–81].
1.2 MPLS
Multiprotocol Label Switching (MPLS) is an IP technology that uses one or more shim headers (called labels) to forward
packets rather than the address information contained in an IP header. The shim sits between the IP header and the
payload in the packet. A network that is MPLS enabled consists of two main types of routers: Label Edge Routers (LERs)
and Label Switch Routers (LSRs). Throughout the MPLS network are Label Switched Paths (LSPs), which are unidirectional
tunnels that carry packets5 through the network. An LSP begins at an LER and passes through LSRs in the middle of the
network. The LER can create many LSPs, and it decides on which LSP to place a packet using a Forwarding Equivalence
Class (FEC). A basic example of a FEC is packets that all have the same destination IP address [13, p. 6–7]. The LER
is either an ingress router, where the LSP begins, or an egress router where the LSP ends.
A label is 4 bytes in size and consists of a 20-bit value, a 3-bit traffic-class value (commonly referred to as EXP bits), a
bottom of stack bit which has a value of one when it is the bottom (or only label) in a “stack” of labels between the
header and the payload, and an 8-bit TTL field which has the same function as an IP TTL. An MPLS router forms many
mappings of an ingress label to an egress label and an associated interface. An LER or LSR will either “push” (add
a new label), “swap” (exchange one label for another), or “pop” (remove a label). The ingress LSR will push one or
more labels onto an IP packet based on FEC information to form the LSP. The router exchanges the incoming label,
based on the mappings it already established, with the egress label and then sends the entire packet with its labels to the
next router for a similar operation, or a pop operation since it’s the last router in the LSP (the LER). This exchange
operation is called label swapping. Basically the router is selecting the interface to the next-hop based on the inner label.
There also is an additional operation called Penultimate Hop Popping (PHP) where the penultimate router will pop a
label exposing either another label or the IP header itself. The former is a common operation in Layer 3 VPNs and is
discussed in section 1.4 [13, p. 7–9].
5 An LSP can also carry Layer 2 information without an IP header, such as plain Ethernet, with a technology called Layer 2 VPNs. These
are outside the scope of this report.
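Because the label is a fixed 32-bit layout, packing and unpacking one is simple bit manipulation. The sketch below follows the field sizes given above; the helper names are our own.

def encode_label(label: int, tc: int, bottom_of_stack: bool, ttl: int) -> int:
    """Pack a 20-bit label, 3-bit traffic class (EXP), the S bit, and an 8-bit TTL into 32 bits."""
    return ((label & 0xFFFFF) << 12) | ((tc & 0x7) << 9) | ((1 if bottom_of_stack else 0) << 8) | (ttl & 0xFF)

def decode_label(entry: int) -> dict:
    """Unpack a 32-bit MPLS label stack entry into its four fields."""
    return {
        "label": entry >> 12,
        "tc": (entry >> 9) & 0x7,
        "bottom_of_stack": bool((entry >> 8) & 0x1),
        "ttl": entry & 0xFF,
    }

entry = encode_label(label=50, tc=0, bottom_of_stack=True, ttl=255)
print(hex(entry), decode_label(entry))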
[Figure: nodes 1 through 7, with nodes 1 and 7 as LERs and node 5 as an LSR, connected by a single unidirectional LSP.]
Figure 1.4: MPLS LSPs
The line in figure 1.4 represents a unidirectional LSP. Its origin is at the LER, transits an LSR, and terminates at
another LER. In one of many scenarios the LER, node 1, will have pushed a label onto the IP packet, node 5 will do a
swap operation, and it knows to send that packet through the interface that connects it to node 7 based on the label it
gets from node 1.
Each MPLS router contains a database of labels which need to be populated. These are done by MPLS signaling
protocols. The following sections will discuss the two main signaling protocols, LDP and RSVP-TE, as well as their
additional mechanisms for Point-to-MultiPoint (P2MP). P2MP forwarding has a single ingress router with multiple egress
routers for the same LSP. A router in the middle will copy the traffic and send it out two or more interfaces with a
separate label for each interface. A router that does replication is also referred to as a branch node. Downstream from a
replication point, each copy of the traffic follows its own branch of the LSP. As with regular LSPs, the P2MP LSP is unidirectional [13, p. 165–166].
[Figure: the same topology as figure 1.4, with node 5 acting as a branch node replicating the LSP toward nodes 3 and 7.]
Figure 1.5: Point-to-Multipoint MPLS LSPs
Compare figure 1.5 to figure 1.4. Figure 1.5 has node 5 as a branch node which replicates the traffic to both node 7 and
node 3. In this case, node 7 is a branch node while node 3 is a branch node and a transit node.
1.2.1 MPLS Signaling
An association between an IP subnet and a label is called a label binding. A signaling protocol is required to build and
distribute these bindings. To accomplish this the engineering community created a new protocol called Label Distribution
Protocol (LDP) and also extended an existing protocol called Resource Reservation Protocol (RSVP). RSVP was
extended to become RSVP Traffic Engineering (RSVP-TE) [13, p. 11]. BGP was also extended to distribute labels. This
will be covered more in section 1.4.3.1.
1.2.1.1 LDP
LDP was defined in RFC 5036, which obsoletes RFC 3036, as a specific protocol for handling labels in MPLS networks.
LDP uses message exchanges between directly connected peers or through targeted sessions that span multiple hops.
In either case, the peer that exchanges messages is an LDP neighbor. These messages are used for session setup and
information exchange. Once a session is setup the neighbors exchange label binding information between the labels and
FECs (e.g. IP subnet). LDP has a fundamental rule that the LSP it is creating will always follow the shortest path
of the Interior Gateway Protocol (IGP) such as IS-IS or OSPF. LDP relies on the IGP to determine the shortest path
throughout a network based on its routing metrics. LDP distributes its labels from egress to ingress. The egress router
will advertise a label {L1} for a given FEC to its upstream neighbor. The upstream neighbor will decide, based on the
IGP shortest path, if it should use L1 to forward downstream to that FEC on the egress router. If this check passes, the
upstream neighbor will use that label to forward traffic to the egress router that initiated it. The upstream neighbor will
then apply label L2 for that FEC, and advertise that label to its upstream neighbors. This process continues with all
routers throughout the network [13, p. 12–13].
An LSP creation in LDP is demonstrated simply in figure 1.6 where node 7 advertises label {100} back toward node 5 for
a given FEC. Node 5 installs this label in its forwarding table (assuming it’s the shortest path based on the IGP) then
advertises label {50} back to node 1 which also installs the label. For an LSP, the ingress router will now push label {50}
and forward the packet to node 5 which swaps {50} for {100}, then forwards it on to node 7 where the label is finally
popped. The LSP now consists of labels {50} and {100}.
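The per-router result of this signaling is simply a label forwarding table. The sketch below mimics node 5's swap behavior from figure 1.6; the table structure and interface names are illustrative.

# Node 5's label forwarding state after the LDP exchange in figure 1.6:
# incoming label -> (operation, outgoing label, outgoing interface)
lfib_node5 = {
    50: ("swap", 100, "to-node-7"),
}

def forward(incoming_label: int) -> str:
    """Swap the incoming label per the table and report where the packet goes next."""
    op, out_label, out_if = lfib_node5[incoming_label]
    return f"{op} {incoming_label} -> {out_label}, send out {out_if}"

print(forward(50))   # swap 50 -> 100, send out to-node-7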
[Figure: node 1 pushes label {50}; node 5 swaps {50} for {100}; node 7 pops {100}.]
Figure 1.6: LDP Signaling
RFC 6388 describes the extensions for multicast LDP (mLDP). The LDP message has an extension added so that a
label can be associated with a “P2MP FEC” value, which is the combination of the source address of the tree and a
unique identifier. A router must be able to understand mLDP labels and the capability is advertised during LDP neighbor
initialization. Using the P2MP FEC an mLDP enabled router can associate the labels as part of the same tree [14,
p. 6–11]. As a result when the mLDP router receives two labels that contain the same P2MP FEC it knows to only
advertise one label upstream toward the source. The procedure for advertising a label is slightly different from regular
LDP. In regular LDP a router will only use the label for forwarding that matches the IGP best path. In the case of mLDP,
the router will only advertise a label that follows the IGP best path toward the source [13, p. 173–174]. In essence,
mLDP is doing its own RPF check in order to advertise a label. Figure 1.7 illustrates two labels, {100} and {200} that
are being advertised up the shortest path toward source A. A new P2MP FEC is used which consists of source A and the
unique identifier of 1 (this is just an arbitrarily picked value). Since both labels belong to the same P2MP FEC the mLDP
router, node 5, advertises only a single label back toward the source. Node 5, when receiving label {50} will replicate the
traffic toward nodes 7 and 3 using labels {100} and {200} respectively.
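At the branch node the only real difference from the point-to-point case is that one incoming label maps to several outgoing label/interface pairs. The sketch below models node 5's replication state from figure 1.7; the structure and names are illustrative.

# Node 5's P2MP forwarding state from figure 1.7: one upstream label, two downstream copies.
p2mp_fib_node5 = {
    50: [(100, "to-node-7"), (200, "to-node-3")],
}

def replicate(incoming_label: int):
    """Return one (outgoing label, interface) action per downstream branch."""
    return [f"swap {incoming_label} -> {label}, send out {iface}"
            for label, iface in p2mp_fib_node5[incoming_label]]

for copy in replicate(50):
    print(copy)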
[Figure: source A behind node 1; node 5 advertises label {50} upstream for P2MP FEC (A, 1) and replicates traffic downstream using label {100} toward node 7 and label {200} toward node 3.]
Figure 1.7: Multicast LDP Signaling
1.2.1.2 RSVP-TE
Resource Reservation Protocol (RSVP) was originally created with Quality of Service (QoS) in mind. It had mechanisms
that allowed for reserving bandwidth in a network for a specific flow. Scalability concerns doomed it from ever becoming
widespread but the mechanisms for bandwidth reservation proved useful in MPLS networks and it evolved into RSVP
Traffic Engineering (RSVP-TE), and was originally defined in RFC 3209. RSVP-TE is di↵erent from LDP in that it
doesn’t necessarily follow the best path provided by an IGP and therefore doesn’t rely on the IGP for shortest path
information. Also, the LSP is set up from the ingress router, also called the headend router. The ingress router sends a
Path Message toward the egress router, which is defined by an IP address (such as a loopback interface) on the egress
router. Once the Path Message makes it to the egress router it responds with an Resv Message (“reserve message”)
back toward the initiating ingress router. The Resv Message is only addressed to the next-hop back toward the ingress,
and each subsequent Resv Message along the path is also one hop. This is because each Resv Message contains a label
along with bandwidth reservation information. The path that the ingress router sets can be dynamic, which utilizes a
traffic engineering database, or statically configured6 [13, p. 21–27].
6 RSVP-TE allows for more than just label reservation as it also has traffic engineering capabilities as well as Fast Reroute capabilities
allowing for SONET-like failover times in a packet switched network. The mechanics for setup of RSVP-TE such as path computation are
outside the scope of this report.
[Figure: node 1 sends a Path Message toward node 7; node 7 returns a Resv Message with label {100} to node 5, and node 5 returns a Resv Message with label {50} to node 1; node 1 pushes {50}, node 5 swaps {50} for {100}, node 7 pops {100}.]
Figure 1.8: RSVP-TE Signaling
Figure 1.8 shows how RSVP-TE accomplishes the same task of building an LSP from node 1 to node 7 but
with a different method. Node 1 initiates the LSP by sending a Path Message toward node 7 using an IP address for
node 7. Once node 7 receives the path message it responds with a Resv Message to node 5, its upstream router back
toward node 1. The Resv Message toward node 5 contains the label {100} and also traffic reservation information (not
shown). Node 5 then repeats this process to node 1, advertising label {50}. At this point node 1 will push 50 onto a
packet then forward it to node 5, where label {50} is swapped for {100} and sent to node 7.
The mechanisms for P2MP RSVP-TE are mostly the same as regular RSVP-TE. The P2MP version uses the same Path
and Resv Messages to set up the path, and each egress LER gets its own sub-LSP [13, p. 167–169]. A new identifier
called a P2MP SESSION Object, defined in RFC 4875, is used to relate the multiple sub-LSPs together so that the
router knows that they are the part of the same P2MP LSP. The session object contains three fields: P2MP ID, a
Tunnel ID, and an Extended Tunnel ID. In the P2MP SESSION Object the P2MP ID is the IP address of the destination
LSR. The Tunnel ID is a unique 16-bit number, and the Extended Tunnel ID is either blank or the IP address of the
ingress LSR [15, p. 5].
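Conceptually, the P2MP SESSION Object is a three-field key shared by all sub-LSPs of one P2MP LSP. The sketch below groups sub-LSPs under such a key; the field values are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class P2MPSession:
    p2mp_id: str              # identifier that ties the sub-LSPs of one P2MP LSP together
    tunnel_id: int            # unique 16-bit tunnel identifier
    extended_tunnel_id: str   # either blank or the ingress LSR's address

# Two sub-LSPs (one per egress LER) that carry the same SESSION object and therefore
# belong to the same P2MP LSP.
session = P2MPSession(p2mp_id="p2mp-1", tunnel_id=1, extended_tunnel_id="10.0.0.1")
sub_lsps = {session: ["sub-LSP toward node 7", "sub-LSP toward node 3"]}
print(len(sub_lsps[session]), "sub-LSPs tied to one P2MP SESSION Object")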
[Figure: the topology of figure 1.8 with two Path/Resv exchanges from node 1 setting up sub-LSPs toward nodes 7 and 3, using labels {50}, {100}, and {200}.]
Figure 1.9: Multicast RSVP-TE Signaling
Figure 1.9 is very similar to figure 1.8 except that two separate Path and Resv Messages are used resulting in label {50}
being advertised twice, one for each sub-LSP. Recall that for a P2MP LSP there is a P2MP SESSION Object that “ties”
the two sub-LSPs together.
1.3 BGP
Border Gateway Protocol (BGP) was originally created to be a new Exterior Gateway Protocol (EGP) for IP networks.
BGP was originally conceived during the 12th meeting of the IETF in 1989 and eventually evolved into RFC 1771, later
obsoleted by RFC 4271. BGP creates loop free topologies between and through various autonomous systems using a path
vector methodology that analyzes the path a route takes through networks rather than simply using the lowest cost path like an IGP [16,
p. 1–9]. The usefulness of BGP isn’t limited to just its scalability, especially as it pertains to multicast VPNs. The
construction of BGP allows it to be extensible. This versatility was leveraged to support additional protocols and gave
the foundation for services such as multicast VPNs which exchange information beyond IPv4.
1.3.1 UPDATE Message
BGP consists of OPEN, NOTIFICATION, KEEPALIVE, and UPDATE Messages for setup and session control. However
the UPDATE message will be the focus of this report as it is the message that carries, with some modifications discussed
in section 1.3.2, the multicast information needed in multicast VPNs. An UPDATE message is used to exchange feasible
IPv4 prefixes, or to withdraw them, between BGP speakers (BGP-enabled routers). The UPDATE message contains,
among a few other things, a field for withdrawn prefixes, Path Attributes, and a field for Network Layer Reachability
Information (NLRI) which carries the feasible prefixes that a BGP speaker knows about.
Below the encoding of the UPDATE message is shown.
+-----------------------------------------------------+
| Withdrawn Routes Length (2 octets)                  |
+-----------------------------------------------------+
| Withdrawn Routes (variable)                         |
+-----------------------------------------------------+
| Total Path Attribute Length (2 octets)              |
+-----------------------------------------------------+
| Path Attributes (variable)                          |
+-----------------------------------------------------+
| Network Layer Reachability Information (variable)   |
+-----------------------------------------------------+
Within an UPDATE Message there are several Path Attributes defined, only one of which will be discussed in detail in
this report (NEXT HOP). BGP uses Path Attributes to add information to a set of prefixes that a BGP speaker can use
to manage and control how the prefixes are added to its Route Information Base (RIB) and the global routing table.
Certain attributes can also be used in policies for greater administrative control over how the prefix is stored or sent to
other routers. The NEXT HOP attribute contains an IPv4 unicast address that is used as the next-hop for the prefixes
contained in the NLRI field and represents the router that either has these prefixes directly connected or knows how to
reach them. A BGP speaker MUST be able to process the NEXT HOP Path Attribute7.
The NLRI field in the original BGP implementation is fairly straightforward as it contains a list of IP address prefix and
their lengths (subnet size). The number of prefixes contained in an UPDATE message is variable. An UPDATE message
can contain only one set of Path Attributes. If only one IP prefix pertains to that set, then there will only be one prefix
7 BGP defines characteristics for Path Attributes as follows: Well Known Mandatory, Well Known Discretionary, Optional Transitive, and
Optional Non-Transitive. The NEXT HOP Path Attribute is Mandatory Well Known and must be handled by the BGP speaker. Optional
Transitive on the other hand does not need to be handled by the BGP speaker and can be forwarded to another BGP speaker. For more
details refer to RFC 4271 section 5.
contained in the NLRI [17, p. 14–21]. Prefixes matching another set of Path Attributes need to be sent in a separate
UPDATE message [16, p. 13].
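To make the legacy NLRI layout concrete, the short Python sketch below (illustrative only; the prefixes and the encode_nlri helper are invented for this report) packs each feasible prefix as a one-octet length in bits followed by only the significant octets of the prefix, which is the encoding described above.

import ipaddress

def encode_nlri(prefixes):
    # Pack each prefix as <length in bits><minimal prefix octets>, per the legacy NLRI encoding.
    out = bytearray()
    for p in prefixes:
        net = ipaddress.ip_network(p)
        nbytes = (net.prefixlen + 7) // 8      # only the significant octets are sent
        out.append(net.prefixlen)              # one-octet length field, in bits
        out += net.network_address.packed[:nbytes]
    return bytes(out)

print(encode_nlri(["10.1.0.0/16", "192.0.2.0/24"]).hex())   # '100a0118c00002'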
1.3.2 Multiprotocol BGP
Originally BGP was created with IPv4 addressing in mind [16, p. 35]. In order to carry more than just IPv4 information
Multiprotocol BGP (MP-BGP) was defined in RFC 2858, and was later obsoleted by RFC 4760. To extend the capabilities
of what BGP can carry two new Path Attributes were created, called MP REACH NLRI and MP UNREACH NLRI.
Unlike, for example the NEXT HOP Path Attribute, these two new Path Attributes are not required to be processed by
the router. Therefore if the router does not understand or support the new Path Attributes the router can simply ignore
them8 . MP UNREACH NLRI functions similarly to the field for withdrawn prefixes in the UPDATE message. If anything
other than IPv4 needs to be sent by a BGP speaker it uses the MP REACH NLRI Path Attribute. It has a similar
role to the legacy NLRI but it has been extended to identify other protocols as well as carry their information. The
MP REACH NLRI also contains its own Next Hop field. The NLRI is encoded depending on the protocol being carried. To
identify what protocol is being carried MP-BGP defines an Address Family Identifier (AFI) and Subsequent Address Family
Identifier (SAFI). The formatting of the Next Hop is also dependent on the AFI and SAFI of the MP REACH NLRI Path Attribute.
+---------------------------------------------------+
| Address Family Identifier (2 octets)              |
+---------------------------------------------------+
| Subsequent Address Family Identifier (1 octet)    |
+---------------------------------------------------+
| Length of Next Hop Network Address (1 octet)      |
+---------------------------------------------------+
| Network Address of Next Hop (variable)            |
+---------------------------------------------------+
| Reserved (1 octet)                                |
+---------------------------------------------------+
| Network Layer Reachability Information (variable) |
+---------------------------------------------------+
Above the encoding of the MP REACH NLRI Path Attribute is shown, which is a part of the UPDATE Message
encoding shown on page 13. Note that the MP REACH NLRI Path Attribute has its own Next Hop and NLRI fields, the
structures of which are determined by the AFI and SAFI combination[18, p. 1–5]. As it will be seen in this chapter and
the following chapters the MP-BGP MP REACH NLRI and MP UNREACH NLRI Path Attributes will be used to enable
extensions to unicast routing and multicast routing by reserving their own AFI and SAFI numbers and creating unique
NLRI encodings for each extension.
Sometimes a route will be described as carrying certain attributes. This is just another way of describing an UPDATE
Message that has a certain set of attributes associated with a particular route or set of routes.
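As a rough illustration of the framing just described, the following Python sketch (names and values invented; not a complete BGP implementation) lays out the MP REACH NLRI attribute value: AFI, SAFI, next-hop length, next hop, the reserved octet, and then whatever NLRI encoding the AFI/SAFI pair calls for.

import struct

def mp_reach_nlri(afi, safi, next_hop: bytes, nlri: bytes) -> bytes:
    # AFI (2 octets), SAFI (1), next-hop length (1), next hop, reserved octet, NLRI
    return struct.pack("!HBB", afi, safi, len(next_hop)) + next_hop + b"\x00" + nlri

# Plain IPv4 unicast (AFI 1, SAFI 1) with a 4-octet next hop and one /24 prefix.
value = mp_reach_nlri(1, 1, bytes([192, 0, 2, 1]), bytes([24, 198, 51, 100]))
print(value.hex())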
1.4 BGP/MPLS Virtual Private Networks
BGP/MPLS Virtual Private Networks or BGP/MPLS VPNs, also known as Layer 3 VPNs (L3VPNs), create a method
for service providers (SPs) to provide IP VPN services to their customers. The method was originally described in RFC
2547 (revised in the draft commonly known as 2547bis), which was obsoleted by RFC 4364. As we will see later in this report, BGP/MPLS VPNs are very important
components for multicast VPNs, since multicast VPNs borrow the mechanisms that are defined in RFC 4364. As the name implies,
BGP/MPLS VPNs utilize the concepts of the previous two sections of this report.
8 MP REACH NLRI and MP UNREACH NLRI are optional non-transitive, meaning that a router that does not recognize them can quietly ignore them
and must not pass them on to other BGP speakers.
The major components of BGP/MPLS VPNs that will be discussed are as follows: Network topology and terminology,
virtual routing and forwarding tables, BGP addressing and advertisement, and forwarding.
1.4.1 Network Topology and Terminology
BGP/MPLS VPNs come with their own set of terms describing network components. In the world of VPNs the network
is broken up into Customer Edge (CE) routers, Provider Edge (PE) routers, and Provider (P) routers. The P routers sit
in the core of the SP network and in the path of the VPN there can be one or more of them (and in some rare cases
none). As the name implies the PE routers sit at the edge of the SP network and connect to one or more CE routers
that sit at the customer’s location. The connection between the PE and CE routers is called an attachment circuit (AC).
Figure 1.10 shows an example topology. Nodes 1, 2, 6 and 7 are the PE routers, each with a CE router attached to it.
The red CE routers belong to one customer, CE1 being at site 1 and CE2 being at site 2 for that particular customer.
The same applies to the blue CE routers, which belong to a different customer [19, p. 5–9]. Two separate customers can
also connect to the same PE and remain isolated. Virtual Routing and Forwarding Tables make customer separation
within a router possible.
[Figure: the SP network with P routers 3, 4, and 5 in the core, PE routers 1, 2, 6, and 7 at the edge, and customer CE routers A1, B1, C, and D1 on one side and A2, D2, and B2 on the other.]
Figure 1.10: Service Provider Network with Customer Sites
1.4.2 Virtual Routing and Forwarding Tables
In a PE model the PE is responsible for keeping the routing information separate between customers. A Virtual Routing
and Forwarding Table (VRF) is used to accomplish this. The VRF is a routing table that is kept separate from the main
routing table, which will be referred to as the global routing table in this report, and other VRFs on the same PE. The PE
router also maintains independent forwarding information for each VRF. In essence a VRF behaves like a router within a
router using the same mechanisms to learn prefixes and forward traffic over a network. The AC between a CE and a PE
is associated with a specific VRF for only that customer. The PE router learns prefixes from the CE by using any IGP
or BGP, and static routes can also be configured within a specific VRF9 . The PE router maintains these prefixes in a
separate logical table that indicates which interface to use for the prefixes learned from the CE [19, p. 9–12].
9 See section 7 from RFC 4364 for more details
1.4.3 BGP Addressing and Advertisement
The purpose of a BGP/MPLS VPN is to connect remote customer sites over a Service Provider network. Figure 1.10
shows two customers, each with two sites, on opposite sides of the network. The previous section mentioned that a
CE will exchange prefixes with a PE and the prefixes will be placed in a particular VRF. BGP has been updated using
multiprotocol extensions discussed in section 1.3.2 so that the prefixes in one PE can be sent to another PE on
the other side of the network.
1.4.3.1 VPNv4 Address Family
RFC 4364 introduces the VPN IPv4 (VPNv4) Address Family in section 4.1.
Route Distinguisher The key part of the VPN-v4 Address Family is an 8-byte Route Distinguisher (RD) that is
prepended to an IPv4 Address. The purpose of the RD is not to convey any additional information about a subnet, but
to make any address unique when it is in the domain of the service provider network [19, p. 12–13]. The RD has two
formats defined by a Type field, either 0 or 1. In addition to the two byte Type field are the Administrator and Assigned
Number subfields, both of which add up to six bytes. The first variation is when the type field is 0, which means the
Administrator Subfield is 2 bytes and the Assigned Number subfield is 4 bytes. In this case the Administrator Subfield is
an Autonomous System Number (ASN) assigned by the IANA to a Service Provider. The Assigned Number is assigned
by the Service Provider and is an arbitrary number. The second variation
is when the Type field is 1, which means the Administrator subfield is four bytes and the Assigned Number is two bytes. In
this case the Administrator field is an IPv4 IP address, and is recommended to be a public IP address. The Assigned
Number is assigned by the Service Provider to which the IPv4 address is assigned [20, p. 116–117]. Because of the RD
and the VRF route table isolation, customers can advertise the same address space over the service provider network,
including RFC 1918 private IP addresses (e.g. 192.168.0.0) which are not allowed to be advertised over the Internet [19,
p. 12–13]. An example of a Type 0 RD is 65000:100 [21, p. 435]. Type 0 RDs (and Route Targets discussed next) will
be the convention used throughout this report.
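As a worked illustration of the two layouts (a sketch only; the helper names are invented, and the values follow the 65000:100 example above plus a hypothetical Type 1 RD), the eight bytes can be packed as follows:

import socket, struct

def rd_type0(asn: int, assigned: int) -> bytes:
    # Type 0: 2-octet Type, 2-octet ASN administrator, 4-octet assigned number
    return struct.pack("!HHI", 0, asn, assigned)

def rd_type1(ipv4: str, assigned: int) -> bytes:
    # Type 1: 2-octet Type, 4-octet IPv4 administrator, 2-octet assigned number
    return struct.pack("!H", 1) + socket.inet_aton(ipv4) + struct.pack("!H", assigned)

print(rd_type0(65000, 100).hex())      # '0000fde800000064' for RD 65000:100
print(rd_type1("192.0.2.1", 7).hex())  # hypothetical Type 1 RD 192.0.2.1:7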
Route Target Although VRFs keep the routing information separate for different CEs on the PE router the same BGP
session is used to forward the prefixes to other BGP speakers/PEs throughout the Service Provider network. The prefixes
in the VRF are converted to VPNv4 prefixes when they are exported from the VRF to the PE BGP table. BGP will
then use its knowledge of the network to distribute the route to the other PEs that need to know about it. The far
end PE will then import the VPNv4 addresses into the VRF associated with the same customer as the VRF on the
advertising PE. To control which VRF is allowed to import which prefixes, a new Path Attribute is created called a Route
Target (RT) [19, p. 15–16]. The Route Target uses the same structure as the RD, however it is not prepended to an
IPv4 address. The RT is actually defined in RFC 4360 which defines several Extended Communities for use in BGP, and
mentions BGP/MPLS VPNs as a possible use for RTs. The RT is a specific form of the Extended Community BGP
Path Attribute which is an eight byte value. Like the RD, a Type field defines whether or not an ASN or IPv4 address is
used as the Administrator Field, and the Assigned Number field is an arbitrary number assigned by the Service Provider
to which the ASN or IP address is assigned [22, p. 2–6]. The RT acts as an identifier for a prefix advertised via BGP. As
the prefix is exported from the VRF to the BGP table an export RT is configured for that VRF. When BGP sends an
UPDATE Message it eventually makes it to a PE that is connected to the same customer. This PE has an import RT
configured for the VRF. For the prefixes to be imported to the VRF the RT must match the value that was set on the
other PE that exported the prefixes into BGP. A VRF must have at least one export RT and at least one import RT, but
the export and import values do not need to be the same within the same VRF.
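To make the RT's on-the-wire form concrete, here is a minimal sketch (illustrative; the helper is invented) of the two-octet-AS-specific Extended Community encoding from RFC 4360 for the 789:2 value used in this report: a type octet, the Route Target sub-type 0x02, the ASN, and the assigned number.

import struct

def route_target_as2(asn: int, assigned: int) -> bytes:
    # Transitive two-octet-AS-specific Extended Community, Route Target sub-type (0x02)
    return struct.pack("!BBHI", 0x00, 0x02, asn, assigned)

print(route_target_as2(789, 2).hex())   # '0002031500000002' for RT 789:2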
[Figure: a single PE with VRF B (RD 789:201, Export RT 789:2, Import RT 789:2) attached to CE B1, VRF D attached to CE D1, and CE C attached to the global table.]
Figure 1.11: VRFs and Attachment Circuits
Figure 1.11 shows two customers, B1 and D1, each at their own site, connected to a PE. Both customers have an
attachment circuit that is associated with a single VRF. There is a third customer that connects to the global routing
table. In this report attachment circuit will only refer to interfaces (physical and logical) that are associated with a
VRF even though the same transport technology (such as frame relay, SONET, or Ethernet VLAN) is used to connect
all the customers to the same router. Also, more than one CE can connect to the same VRF, either using separate
physical interfaces or the same physical interface and multiple logical interfaces such as VLAN subinterfaces. Furthermore
two separate logical interfaces can be in separate VRFs even if there is only one physical circuit. In any case, a VRF
is associated with only a certain set of prefixes that come from the customer via an IGP, external BGP, or statically
configured in the VRF on the PE, and these prefixes remain separate from the global table and any other VRF on the
same PE. VRF B, the VRF associated with customer B, also has an RD of 789:201 and an RT of 789:2. It exports and
imports the same RT, so it will accept prefixes from any VRF exporting 789:2 and any VRF importing 789:2 will accept
prefixes from Customer B site 1. Any prefixes within VRF B at site 1 will be prepended with 789:201 when being sent via
BGP to other PEs. The RT and RD values are assigned by the SP. Note that site 1 is configured with RD 789:201,
while site 2 can be configured with 789:202 as shown in figure 1.13 on page 19. Each VRF can have its own RD value.
The 789 portion is the AS number of the SP.
MPLS/BGP VPNv4 NLRI Formatting While various protocols may be used to connect the PE and the CE, the
PE-PE communication is carried by BGP. Each PE is a BGP speaker and forms a BGP session with the other PEs
with the capability of advertising VPNv4 addresses within the BGP UPDATE message. For VPNv4 an AFI of 1 is used
(IPv4) and a SAFI of 128, which signifies it’s a labeled VPNv4 NLRI. Recall from section 1.3.2 that this information
is contained within the MP REACH NLRI Path Attribute of the BGP update. The structure of the NLRI field within
the MP REACH NLRI Path Attribute is defined in RFC 3107 [19, p. 22] as follows: a length field, a label field, and an
address field [23, p. 3]. The address field in the VPNv4 NLRI is a combination of the RD and IPv4 address from a VRF
[19, p. 22]. The VPNv4 message also contains a Next Hop field which contains the address of the PE that is advertising
and an RD of 0:0. The Next Hop is formatted this way because MP-BGP requires that the address format of the Next
Hop is the same as the format of the prefixes in the NLRI. This Next Hop is also referred to as the BGP Next Hop. [19,
p. 17].
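A hedged sketch of that NLRI layout follows (illustrative only; the one-octet bit-length, the 3-octet label field with the bottom-of-stack bit, and the helper name are assumptions following RFC 3107 conventions, and the values reuse the 789:202 and 10.2.2.0/24 example used later in this chapter):

import ipaddress, struct

def vpnv4_nlri(label: int, rd: bytes, prefix: str) -> bytes:
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    label_field = struct.pack("!I", (label << 4) | 0x1)[1:]   # 20-bit label, bottom-of-stack bit set
    length_bits = 24 + 64 + net.prefixlen                     # label + RD + prefix, in bits
    return bytes([length_bits]) + label_field + rd + net.network_address.packed[:nbytes]

rd = struct.pack("!HHI", 0, 789, 202)                         # Type 0 RD 789:202
print(vpnv4_nlri(123, rd, "10.2.2.0/24").hex())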
An UPDATE Message sent between two PEs using the VPNv4 Address Family is summarized in Figure 1.12
on the following page.
[Figure: a BGP UPDATE carrying the Withdrawn Routes Length, Withdrawn Routes, and Total Number of Path Attributes fields; the NEXT-HOP and other legacy Path Attributes; an MP_REACH_NLRI Path Attribute with AFI 1, SAFI 128, a Next Hop, and an NLRI made up of Length, Label, and RD fields; an Extended Community Path Attribute with Flags and the Route Target value; and the legacy Network Layer Reachability Information field.]
Figure 1.12: MP-BGP VPNv4 BGP UPDATE Message Example
The extensibility of the BGP protocol, and the concept that allows MP-BGP to exist, is the Path Attribute. In the above
figure a VPNv4 BGP UPDATE Message is shown, showing how the Path Attributes and their respective fields are nested
within the UPDATE message. The values in the AFI, SAFI, the fields in the NLRI field of the MP REACH NLRI, and
the presence of the Extended Community of Route Target type Path Attribute are unique to the VPNv4 message. If the
SAFI number were different the NLRI field of the MP REACH NLRI Path Attribute may be formatted differently, and
the Route Target Extended Community may not be there at all, replaced with a different Extended Community.
1.4.4 Forwarding
The forwarding used for MPLS/BGP VPNs is MPLS using a combination of the label sent using BGP and another label
using LDP or RSVP-TE. The label carried in BGP is referred to as the VPN Label or the BGP Label and the one learned
by LDP or RSVP-TE is the IGP Label or the Tunnel Label since this label is used to tunnel the VPN traffic through
the Service Provider network. The IGP Label is associated with the IP address that the PE used to advertise the BGP
message, and can be referred to as the IGP Next Hop [19, p. 23–24]. The BGP Next Hop and the IGP Next Hop are
typically the same IP address and assigned using the address on a loopback interface [24, p. 115]. The IGP Next Hop is
advertised throughout the network using an IGP, and a label is associated with it and advertised hop-by-hop using the
MPLS mechanisms discussed in section 1.2.
[Figure: PE 7 (loopback 1.1.1.7/32, RD 789:202, RT export/import 789:2) sends an MP-BGP message with VPNv4 address 789:202:10.2.2.0/24, BGP Next Hop 1.1.1.7/32, label {123}, and Route Target 789:2 toward PE 2 (RT export/import 789:2, attached to CE B1), while LDP advertises labels {Imp-Null}, {200}, and {100} hop by hop for 1.1.1.1/32; CE B2 with 10.2.2.0/24 attaches to PE 7.]
Figure 1.13: VPN Label Advertisements
Figure 1.13 provides a summary for the BGP VPN advertisement. It shows labels being advertised hop-by-hop by LDP
for the loopback address 1.1.1.1/32. Node 2 is advertising an Implicit Null label which tells the upstream router to pop
the top label rather than leave it on. This signals a penultimate hop pop. A label of {123} is also being advertised by a
BGP UPDATE message which also contains the VPNv4 prefix 789:202:10.2.2.0/24. The 789:202 is the RD of Customer
B Site 2. RT information is also included as 789:2, and the VRFs at both sites for Customer B are configured to import
and export that RT. These two labels combine to form a label stack. The BGP Label is the inner label of the stack and
is therefore sometimes referred to as the “Inner Label.” The IGP Label is on top of the stack and is used to forward the
packet through the Service Provider network. As the packet traverses through the network each subsequent hop swaps
the top IGP label while the inner BGP label remains the same. Once the packet reaches the far-end PE the IGP label is
popped (or it is popped at the penultimate hop using PHP). The PE is then able to use the BGP Label to forward the
packet to the correct CE router using a standard label lookup and forwarding process [25, p. 204–206].
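The following small Python simulation (entirely illustrative; the node names, label tables, and values are invented, loosely following the {300}/{200}/{123} labels of figures 1.13 and 1.14) walks a two-label packet through the swap, penultimate-hop pop, and final VPN-label lookup just described:

packet = {"labels": [300, 123], "payload": "IP packet toward 10.2.2.1"}   # [IGP label, VPN label]

lfib = {
    "P_a": {300: ("swap", 200)},   # P router swaps the top (IGP) label
    "P_b": {200: ("pop", None)},   # penultimate hop pops it, exposing the VPN label
}

def forward(node, pkt):
    action, out_label = lfib[node][pkt["labels"][0]]
    if action == "swap":
        pkt["labels"][0] = out_label
    elif action == "pop":
        pkt["labels"].pop(0)
    return pkt

for node in ("P_a", "P_b"):
    packet = forward(node, packet)
    print(node, packet["labels"])

# Egress PE: the remaining label {123} is the VPN (BGP) label and selects the VRF/CE.
vpn_label_table = {123: "VRF B -> CE B2"}
print("egress PE:", vpn_label_table[packet["labels"][0]])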
[Figure: the two-label stack changing hop by hop between the PEs: {300}{123} on the IP packet, the top label swapped to {200}, then the IGP label popped so that only {123} remains when the packet reaches the far-end PE (loopback 1.1.1.7/32), which forwards it toward 10.2.2.0/24 at B2.]
Figure 1.14: VPN Forwarding
Figure 1.14 is another look at figure 1.13 showing the label stack and how it changes hop by hop between the two PEs.
Since the Imp-Null label was advertised by node 2 to node 3 the IGP Label is popped.
1.4.5 Inter-AS Considerations
In some situations, depending on the operator, a VPN may extend beyond a single AS. This section briefly describes
the terminology and options that support this scenario. In each case, eBGP is used to communicate between the two
networks.
Option A: Back-to-Back VRFs In this option an Autonomous System Border Router (ASBR) has a single interface to
the ASBR in the other network. The interface has multiple subinterfaces, at least one per VRF, that is used to
exchange routes for that VPN/VRF.
Option B: Labeled VPNv4 Routes In this method an ASBR will receive VPNv4 routes using iBGP, and will then
exchange them to another ASBR in another network using eBGP. That ASBR will distribute the labeled VPNv4
routes within the network to another ASBR in another network. This option should only be used between trusted
networks. An LSP is required end-to-end over both networks, and Route Targets must be agreed upon.
Option C: Multihop eBGP for VPNv4 For this scenario two separate networks exchange /32 host addresses representing the BGP process for a router. The PE routers in the different networks create a multi-hop eBGP session
(default for eBGP is only 1 hop as the default TTL for a BGP message is set to 1) to exchange the VPNv4 routes.
This requires three labels in a stack. The bottom label is the one found in the VPNv4 update. The middle label is
the one bound to the /32 host address for the edge PE. The top label is bound to the /32 address of the ASBR.
This way from the perspective of a packet from a particular PE, it uses the top label to get to the edge of the
network, the label is then popped and the packet is forwarded to the other network where the middle label (now
the top label of a two label stack) is used to reach the other PE, then the bottom label is used for the specific
VRF as in normal BGP/MPLS operation.
1.4.6 BGP/MPLS VPN Summary
The important takeaways for BGP/MPLS VPN, as it relates to Multicast VPNs, are that each PE has one or more VRFs,
and that the VRFs on all the PEs in the SP network are linked by their Route Targets which determine which VRF can
accept which routes. The Route Target is configured for a VRF, and is carried in the Route Target Extended Community
of the BGP UPDATE Message. In a simple case all the VRFs for a single customer use the same Route Target. Also,
each VRF can be uniquely identified by its Route Distinguisher. As will be seen in the next chapters a single IPv4 address
configured on the PE, usually on a loopback interface, should be used as the BGP Next Hop. The same IPv4 address
can be used by extensions to other protocols and the Multicast VPN mechanisms can then map messages within those
protocols to messages within BGP and a specific VRF/VPN.
1.5 Generic Routing Encapsulation
Generic Routing Encapsulation (GRE) is defined in RFC 2784 as an attempt to create a generic description of how to
create tunnels transport IPv4 packets using another IPv4 header. The encapsulation is described as a payload packet
being encapsulated by a GRE packet. This GRE packet is then encapsulated by another protocol and is referred to as the
delivery packet. The defined values for the delivery protocol and the payload packet are both IP, therefore GRE currently
describes a method for IP-in-IP encapsulation [26, p. 1–5]. This technology is important in Draft Rosen VPNs. GRE
should not be confused with IP-in-IP encapsulation defined in RFC 1853.
1.6 Control Plane vs Forwarding Plane
An important concept in this report will be the control plane mechanisms in contrast to the forwarding plane mechanisms.
The idea of separate control and forwarding planes doesn't have a single formal definition and the idea can vary depending on what
technology is in focus. For example within the specific protocol MPLS the control plane can be thought of as RSVP-TE
or LDP label signaling, while the forwarding plane can be thought of as the router process of swapping the advertised
labels and forwarding the traffic throughout the network. Looking at BGP/MPLS VPNs, extended to multicast VPNs,
there is a suite of protocols such as BGP and PIM as well as RSVP-TE. This report defines the former, the protocols
involved in session setup, as the control plane, such as BGP advertising an MP-BGP UPDATE message. The protocols
responsible for forwarding the traffic through the network, such as RSVP-TE or LDP, will be defined as the forwarding
plane.
One example of this is the concept of a BGP Free Core. Each PE can form a BGP session with a BGP Route Reflector
(RR), which can be a dedicated router for distributing BGP routes only. This is in contrast to having a mesh of BGP
sessions throughout the Service Provider network. BGP connections between routers in the same AS (Internal BGP
or iBGP) do not need to be directly connected, as is required for BGP connections to a different AS (External
BGP or eBGP). This means that the RR can sit anywhere in the network and not need to be directly connected to the
PEs and can be centralized. In this configuration BGP does not need to be configured on the P routers since the BGP
communication is PE-RR-PE. The BGP distribution across the network is the Control Plane.
As discussed previously the BGP UPDATE message carries the Next Hop address of the originating PE and this address
is also distributed by an IGP. MPLS labels are distributed through the network for each Next Hop address since each
one can be considered a FEC. When a PE sets up the forwarding path for traffic for a specific VRF it associates the
packet with a BGP label with an IGP label on top. The traffic is then forwarded hop by hop using only the IGP label.
The distribution of the IGP label via LDP or RSVP-TE and the forwarding process of swapping labels hop-by-hop is the
Forwarding Plane. The P routers have no knowledge of the routes on the PEs yet traffic can be forwarded through the
network. In effect, the data is tunneled through the network in a VPN model whether BGP is along the forwarding path
or not.
[Figure: two PEs exchange MP-iBGP with a central RR (dashed lines), while labels are distributed hop by hop PE to P to P to PE (solid lines).]
Figure 1.15: Control Plane vs Forwarding Plane
Figure 1.15 shows MP-iBGP communication between two PEs and an RR. The RR could be physically connected to one
or both of the PEs or it could be connected by any number of routers between it and the PEs. For this reason it does
not have any lines representing interfaces. A dashed line is used to represent the communication between the PE and RR
and is representative of the Control Plane communication. Traffic does not need to be forwarded through the RR and in
this scenario it is for exchanging BGP information only. The PEs are physically connected to the P routers, and labels
are distributed hop-by-hop. The label distribution is represented by the solid lines and is representative of the Forwarding
Plane communication. Note that the Forwarding Plane can have its own control communication such as PIM adjacency
establishment or LDP neighbor communication. However these are still considered part of the Forwarding Plane.
Chapter 2
Draft Rosen Multicast Virtual Private Networks
The BGP/MPLS VPNs discussed in the previous chapter were designed to carry unicast traffic. With the growing
popularity of multicast services, various enterprises began to require multicast support between their sites over a Service
Provider (SP) network. Initial implementations of GRE tunnels or Layer 2 VPNs (aka pseudowires) provided results that
are not scalable. The Multicast Virtual Private Network (MVPN) solution, developed by Cisco, is a way to address these
issues. The IETF draft was written by Eric Rosen at Cisco and stayed in draft status, hence the name Draft Rosen
VPNs. Eventually the draft was turned into a historical RFC, number 6037, which will be used as one of the sources for
this chapter. For the rest of this report the solution will be referred to as Multicast VPN (MVPN). Although the MVPN
solution is built off of unicast BGP/MPLS VPNs, there is a large difference between the two. However, certain elements
from the unicast model are reused, such as VPNs, tunneling traffic through the network (with GRE instead of MPLS),
and the use of Multiprotocol BGP [13, p. 279–280].
2.1 Overview of MVPNs
Standard BGP/MPLS VPNs hide per-VPN state information from the P routers. They are not aware of how many VRFs
are on the PE routers in the network. For optimal multicast routing the P routers would need to maintain some sort of
per-VRF state information for the multicast replication trees. Even if the P routers did support this information they
would need to maintain multicast state information for every group for every customer so that the multicast tree is only
built to the PEs which require the traffic. This is not scalable. Multicast VPN provides a solution to the scalability issue
by allowing the SP to maintain a multicast tree only for each VPN rather than every group inside every VPN.
The solution has the following prerequisites:
• PIM-SM is used in the PE VRF instance.
• PIM is used in the SP network.
• The SP network supports multicast forwarding natively.
It is helpful to first define some terms used in the specification.
Customer Element vs. Provider Element The convention in MVPN documentation, and the convention that will
be used in this report, is to add C- for customer or P- for provider before the various technical terms that describe
MVPN.
Multicast VRF This is a VRF on a PE that the service provider configures to be multicast enabled. Within each VRF
is its own multicast routing table and PIM-SM adjacencies with a PIM capable CE router. The CE related PIM instances,
whether directly to the CE router or to the far end PE, will be referred to as C-Instances [27, p. 6]. The Multicast VRF
also participates in MP-BGP for VPNv4 addresses for unicast routes specific to the VPN as well as a new MDT-SAFI
address structure created for MVPN.
Multicast Domain
A Multicast Domain (MD) is a set of multicast VRFs that belong to the same MVPN.
Multicast Distribution Tree The tunnel that is used to carry multicast traffic across the SP network is referred to as
the Multicast Distribution Tree (MDT). The MDT is the MVPN mechanism that allows the C-Instance PIM sessions
between the PEs to appear as if they are directly connected, hiding the core of the network. At a high level each PE sees
the PIM adjacencies of the C-Instance as if they were directly connected via a LAN [13, p. 281–282]. MDTs are created
using the P-Instances of PIM in the SP network and are used to encapsulate the C-Packets of a multicast VRF. There
are two types of MDTs: Default and Data. The Default MDT is used to encapsulate all customer multicast traffic and
forward the traffic to each PE, at least initially. If the traffic volume becomes large and not all sites within the MD want
to receive the traffic one or more Data MDTs can be created. Each MD has at least a Default MDT and can have zero
or more Data MDTs [27, p. 5].
Multicast Tunnel The Multicast Tunnel or Multicast Tunnel Interface (MT or MTI) is an abstract concept as there is
no actual physical tunnel. From the perspective of the multicast VRF the MT is the interface for the path to the other
VRFs in an MD via an MDT. Depending on the router vendor or platform the tunnel will be displayed as “tunnelx” or
“MT” to represent the encapsulation or decapsulation interface [2, p. 67–69][2, p. 80–81].
[Figure: Customers A and B each have three sites (A1, A2, A3 and B1, B2, B3) attached to PEs 1 through 7; each customer has its own set of Multicast VRFs forming its own Multicast Domain, connected by its own MDT.]
Figure 2.1: MVPN Overview
A high level overview of Multicast VPNs is shown in figure 2.1. Both customers have three sites and a Multicast VRF
represented by the solid circle. Each customer also has its own Multicast VRFs which are part of the MD, each connected
by an independent MDT. Note how each customer has its own MD and MDT. Also recall that each MD can have
multiple MDTs (one Default and multiple Data) even though only one is depicted. Figure 2.2 shows a little more detailed
view correlating the terms discussed above using Customer A as an example.
[Figure: Multicast Domain A spans PEs 1, 5, and 6, each holding an M-VRF for Customer A (sites A1, A3, A2); PIM C-Instances run PE to CE at the edges and are tunneled PE to PE over the PIM P-Instance in the core, where the C-IP header and C-payload are encapsulated in a P-IP (GRE) header via the MTI.]
Figure 2.2: MVPN Details
From left to right, the C-Packets are encapsulated with GRE as they enter the MDT via the MTI. From this point on all
C-Groups are hidden and they are all transported through the network using the P-Group that’s assigned to the MDT.
Using that mechanism there could be 100 C-Groups but the SP network only needs to build a tree for the one P-Group
using P-Instance PIM. At PEs 5 and 6 the P-IP Header is removed along with the P-Group and the multicast traffic
is forwarded to the CE. The PEs use the P-Group to identify which VRF the MDT belongs to. This creates a LAN
environment from the perspective of the C-Instance as shown below in figure 2.3.
[Figure: from the perspective of the C-Instance, PEs 1, 5, and 6 (each with M-VRF A, attached to A1, A3, and A2) appear as if connected to a common LAN.]
Figure 2.3: MVPN C-Instance LAN
2.2 MVPN Operation
2.2.1 Multicast Distribution Trees
The Multicast Distribution Trees (MDTs) in MVPN are used to carry the customer multicast control and data traffic,
already defined as C-Packets. These can be further broken down into C-PIM Join, C-Traffic, etc. The C-Packets are
encapsulated within the MDT and from the SP perspective the traffic becomes P-Packets. The MDTs can be shared
trees established using PIM-SM, source trees using PIM-SSM, or a combination of the two. Which is used is up to the
carrier [2, p. 61].
2.2.1.1 MDTs and Generic Routing Encapsulation
The tunneling aspect of MVPNs and MDTs is very important and therefore is explained before the MDT operational
details. When a customer sends multicast traffic (C-Packets) it first reaches the PE in the Multicast VRF where it is
part of the PIM C-Instance. If the traffic needs to be forwarded across the network it is encapsulated by the PE via
the logical MTI by Generic Routing Encapsulation (GRE) and decapsulated at the far-end PE by its logical MTI. The
encapsulation is what allows MVPN to scale. When the C-Packets from a customer enter the Multicast VRF and are
forwarded they are encapsulated by GRE so that the C-Source address and the C-Group address are encapsulated by
another IP Header forming a P-Packet. This header contains an address of the PE as the source address (typically the
address used for MP-BGP as well) and a unique-per-MDT address referred to as the P-Group address [2, p. 61][27, p. 13].
The SP network only uses the outer header to forward the traffic and build the MDT. Because of this encapsulation
the SP network can build the multicast trees mostly the same way as described in the first chapter using just the P-IP
Header. Some extra considerations for building the trees are necessary and are described in the following sections.
2.2.1.2 Default MDT
The Default MDT is used by every PE that is part of an MD as well as each Multicast VRF that is part of that MD.
The Default MDT is identified by an MDT Group Address, also known as the VPN Group address and defined earlier
as a P-Group Address. MDT Group Address and P-Group address will be used interchangeably. A CE router uses its
C-Instance PIM to exchange multicast routing information with the PE within its VRF. The routing information is then
sent across the MDT via the MTI from PE to PE. At the destination PE the information within the VRF is propagated
to the CE using its C-Instance PIM. The PE-PE multicast traffic that is carried across the MDT is also part of the
C-Instance, but is tunneled. Refer again to figure 2.2 where there is a PE-CE C-Instance on both sides, with a tunneled
C-Instance in the middle. This can also be thought of as one contiguous C-Instance where part of it is tunneled. Any
traffic that enters the Default MDT is sent to all PEs participating in that MDT [2, p. 62–66].
The Default MDT is created and maintained by the P-Instance of PIM in the SP network using standard PIM setup
procedures and using the global routing table of the SP’s IGP. If PIM-SM is used the MDT for a specific MDT Group
joins the shared tree that is rooted at the Rendezvous Point (RP). Just like standard trees in PIM, each MDT has a
separate tree built, defined by where the receivers/Multicast VRFs are located. A PE router in an MDT is both a source
and receiver. Using the P-Packets the PIM P-Instance can do normal RPF checks via the global IGP as it builds the
tree.
Figure 2.4 summarizes the operation of the Default MDT. Customer CE A1 sends traffic over the Default MDT to A2
while A2 is sending a PIM Hello over the MDT as well. The Default MDT is connected to all three PE’s with Multicast
VRFs for Customer A. While the customer is using Group Address 239.0.0.1 for the C-Traffic, it is encapsulated in the
MDT and the SP network forwards the traffic using the P-Group Address 233.3.21.100¹. A1 could also be sending
traffic using groups 239.0.0.2 and 239.0.0.3 and so on, but the same MDT Group Address of 233.3.21.100 is used.
Note that the same P-Group Address is used in both directions, while the C-Group for the Customer Join uses the
ALL-PIM-ROUTERS Group Address. Referencing figure 2.2 on page 24 the MDT for Customer B could have an MDT
Group Address of 233.3.21.200.
1 In this example the P-Group address is a GLOP Multicast Group Address. The second and third octets of 3.21 are derived from the
AS Number 789 using the method described in the GLOP Addressing paragraph on page 3. The last octet is arbitrary with .100 used for
Customer A.
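A quick check of that derivation (a throwaway Python snippet, not part of any standard tooling):

asn = 789
hi_octet, lo_octet = divmod(asn, 256)    # the 16-bit AS number split into two octets
print(f"233.{hi_octet}.{lo_octet}.100")  # -> 233.3.21.100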
[Figure: A1 sends C-traffic (C-Source 10.1.1.1, C-Dest 239.0.0.1), which PE 1 encapsulates with P-Source 1.1.1.1 and the MDT P-Group address onto the Default MDT toward A2; A2's PIM Hello toward ALL-PIM-ROUTERS is encapsulated by PE 6 (P-Source 1.1.1.6) onto the same MDT; join and traffic paths are shown across nodes 1 through 7.]
Figure 2.4: MVPN Default MDT Operation
The Default MDT has a single P-Group address but is carrying multiple customer (S,G) streams, which use a C-Source
Address and a C-Group Address. The customer streams can be referred to as (C-S,C-G). Each MDT may be denoted as
(P-S,P-G).
2.2.1.3 Data MDT
The Default MDT always sends all traffic to PEs that are participating in that particular MDT. When the amount of
traffic gets larger this method becomes more and more inefficient. To regain efficiency of delivering multicast traffic
to only the PEs that have active receivers the Data MDT is used. The Data MDT can be created when a configured
bandwidth threshold is crossed for the Default MDT. One or more Data MDTs can be created in addition to the Default
MDT and each MDT receives a unique group address which can be obtained from a pool of P-Group addresses. The
Data MDT also only handles data traffic; control traffic is only sent over the Default MDT.
The PE router tracks the amount of bandwidth for each (C-S,C-G) customer stream and creates a new Data MDT if
that particular group exceeds the user configured bandwidth threshold. The PE does not create a new Data MDT based
solely on the aggregate traffic amount for all groups traversing a Default MDT. Each (C-S,C-G) stream gets its own
Data MDT if it crosses the bandwidth threshold. However, if the pool of available P-Group addresses is exhausted, the PE
router will put more than one customer (C-S,C-G) stream onto a Data MDT. The trade-off is that a smaller P-Group
pool means fewer MDTs, and therefore less P-Instance PIM state, while a larger pool allows for more optimization at the
cost of more P-Instance state.
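The following sketch (purely illustrative; the threshold, pool addresses, and reuse policy are invented, and real implementations expose these as configuration) captures the per-(C-S,C-G) decision just described: streams below the threshold stay on the Default MDT, streams above it are given a Data MDT from the configured pool, and streams are doubled up once the pool runs out.

THRESHOLD_KBPS = 1000
pool = ["233.3.21.101", "233.3.21.102"]        # hypothetical configured Data MDT P-Groups
data_mdts = {}                                  # (C-S, C-G) -> Data MDT P-Group

def place_stream(c_source, c_group, rate_kbps):
    if rate_kbps <= THRESHOLD_KBPS:
        return "default MDT 233.3.21.100"
    if pool:                                    # a free P-Group: dedicate a new Data MDT
        data_mdts[(c_source, c_group)] = pool.pop(0)
    else:                                       # pool exhausted: share an existing Data MDT
        data_mdts[(c_source, c_group)] = next(iter(data_mdts.values()))
    return "data MDT " + data_mdts[(c_source, c_group)]

for c_s, c_g, kbps in [("10.1.1.1", "239.0.0.1", 200),
                       ("10.1.1.1", "239.0.0.2", 4000),
                       ("10.1.1.1", "239.0.0.3", 6000),
                       ("10.1.1.1", "239.0.0.4", 9000)]:
    print((c_s, c_g), "->", place_stream(c_s, c_g, kbps))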
Just like the Default MDT the Data MDT is created using P-Instance PIM. The PE router with active receivers can
send a PIM P-Join message, but first it needs to learn of the P-Group address of the Data MDT. To facilitate this a new
control message is created called a Data MDT Join. The PE with an active source sends the Data MDT Join to all
the PEs participating in the Default MDT using a destination address of 224.0.0.13, the ALL-PIM-ROUTERS Group
Address. The message payload consists of the customer’s (C-S,C-G) information (the customer’s source address and
group address for a stream) along with the Data MDT's P-Group address. A PE router with receivers for that particular
(C-S,C-G) stream will then join that Data MDT. PEs that do not have active receivers will still store the Data MDT
Join information in case an active receiver does want to join that (C-S,C-G) stream. The source PE that initiated the
Data MDT will wait several seconds before putting traffic onto the Data MDT to allow for time for the receiving PEs to
set up the tunnel [2, p. 66–67].
The Data MDT can be set up by using either PIM-SM or PIM-SSM. If PIM-SM is used the PE routers, upon receipt of
the Data MDT Join, will send a P-Join back toward the P-RP of the shared tree. If PIM-SSM is used the receiving PE
will send a P-Join back to the source PE router creating a source tree. RFC 6037 recommends the use of PIM-SSM [27,
p. 16–17].
[Figure: PE 1 (P-Source 1.1.1.1) advertises a Data MDT Join for the customer stream (C-Source 10.1.1.1, C-Group 239.0.0.2) with the new Data MDT P-Group address over the Default MDT; PE 5, which has active receivers, sends a P-Join back toward the source PE.]
Figure 2.5: MVPN Data MDT Signaling
Figure 2.5 shows the source PE advertising the Data MDT Join over the Default MDT. In contrast to figure 2.4,
a new P-Group Address of 233.3.21.101 is used for the Data MDT instead of .100, which is already used for the Default
MDT. The (C-S,C-G) of (10.1.1.1,239.0.0.2) is the customer stream that crossed the bandwidth threshold configured
on the PE. Only PE 5 has active receivers for this customer stream so it sends a P-Join back to the source PE using the
PIM-SSM method. Figure 2.6 below shows the customer traffic traversing the new Data MDT. Note that the traffic for
this group is encapsulated using the new P-Group Address specific to the Data MDT of .101.
[Figure: the customer stream (C-Source 10.1.1.1, C-Group 239.0.0.2) encapsulated with P-Source 1.1.1.1 and the Data MDT P-Group address, traversing only the branch toward the PE with active receivers.]
Figure 2.6: MVPN Data MDT Operation
2.2.2 Auto-Discovery in MVPNs
The P-Group Address for an MDT is manually configured on a router. When PIM-SM is used to build the trees the
standard mechanisms are used, where the source and receiver PEs can discover each other through the RP. The receiver
PE is sending (*,G) PIM P-Joins toward the RP while the source is sending Register Messages toward the RP. Because
of the use of (*,G) the receiver PE does not need to know the source PE’s unicast source address for that particular
group [2, p. 105]. Each PE only needs to know the P-Group address for the MDT [27, p. 8].
Using PIM-SSM for MDT setup requires an additional mechanism for auto-discovery since the receiver PE does not know
the source PE’s IP address2 . The mechanism created is a new BGP Address Family called MDT SAFI. This Address
Family uses an AFI of 1 and a SAFI of 66. The NLRI field contains one or more of the 2-tuple of an RD prepended to
the IPv4 address used as the source address plus the P-Group Address.
+------------------------------------------+
| RD:IPv4 Source Address (12 octets)       |
+------------------------------------------+
| P-Group Address (4 octets)               |
+------------------------------------------+
A Route Target (RT) is also included in the same UPDATE message that contains the MP-BGP Address Family for
MDT SAFI. Using normal BGP VPN mechanisms the route information can be associated with the correct VRF. The
P-Group could also be used, but this would require that all P-Groups are unique across a multi-provider network. This is
difficult, so the RFC specifies that RTs must always be used to facilitate the use of multi-provider networks [27, p. 8–10].
Each BGP speaker participating in MVPN receives the MDT SAFI information and uses the Route Targets to install
the information into the correct VRF. Each PE router can then join the (S,G) tree using normal PIM processes [2,
p. 105–106].
2 Contrast this to a typical SSM case in a non-MVPN network where a host is trying to join a specific group and source: the IGMPv3
Membership Report (IGMP Join) has the source included in it along with the group it is trying to join. The router then turns this into an (S,G)
PIM Join. In the MVPN case in the P-Instance there is no host joining a group using a specific source; only the P-Group Address is known
from manual configuration.
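For illustration, the MDT SAFI NLRI shown above can be packed as in the following sketch (the helper and the RD value for Customer A are invented; the layout is simply the RD-prefixed source address followed by the P-Group address):

import socket, struct

def mdt_safi_nlri(rd: bytes, source_pe_ip: str, p_group: str) -> bytes:
    # 8-octet RD + 4-octet source PE address, then the 4-octet P-Group address
    return rd + socket.inet_aton(source_pe_ip) + socket.inet_aton(p_group)

rd = struct.pack("!HHI", 0, 789, 100)   # hypothetical Type 0 RD 789:100 for Customer A
print(mdt_safi_nlri(rd, "1.1.1.1", "233.3.21.100").hex())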
2.2.3 RPF
Reverse Path Forwarding (RPF) checks are a fundamental part of multicast and are still needed in an MVPN environment.
In a typical PIM network the check occurs by making sure traffic is arriving over the interface that is part of the shortest
path back to the source according to the global unicast routing table. This check needs to be handled a little differently
when in a Multicast VRF that consists of MDT MTIs. The check can occur normally for the PIM P-Instance since this is
part of the global table. Within the VRF the C-Instance traffic can either be sourced from a CE interface or from the
MDT’s MTI for the MVPN. If it is received from the CE interface a normal RPF check can occur since that interface is
participating in the VRF’s routing table. However if the packets are received from another PE on the other side of the
MDT the VRF doesn’t automatically have the route toward the other PE. In this case, the routes within a VRF for the
other PEs in the MDT are provided by VPNv4 BGP. The RPF check within the Multicast VRF will set the upstream
interface as the MTI if the VPNv4 message contains a C-Source address. The RPF neighbor address is set to be the
BGP Next Hop address within the VPNv4 message, and PIM will use this address when sending Hello Messages across
the MDT. With these modifications the MTI is treated just like a physical interface on the router, and PIM simply uses
the BGP Next Hop as the PIM neighbor on the other side of the MDT [2, p. 70].
2.3 Considerations for Inter-AS and BGP Free Core
When a BGP free core is used, or in Inter-AS scenarios, extra information is necessary for RPF checks or PIM signaling
to occur. RFC 6037 specifies two new methods to allow for communication in these scenarios.
2.3.1 PIM MVPN Join Attribute
The PIM MVPN Join Attribute, also called the PIM RPF Vector or PIM Vector, is used to assist with Inter-AS
communication or BGP-Free Core communication. The PIM Vector is a new PIM Join Attribute, an extension of PIM.
The PIM Vector contains the IP address of the router that has reachability to the source (the IP address that the PIM
Join/Prune should be forwarded to), and an RD. The RD is taken from the BGP MDT SAFI UPDATE Message [27,
p. 11–13][2, p. 122–123].
Using MVPN and BGP MDT advertisements the PE will be aware of the source address, but it is kept outside of the IGP
table and is in the special BGP MDT SAFI Table. The PIM Vector helps in a BGP free core, or in an inter-AS scenario,
where the source address isn’t known because it’s not in the IGP table. PIM relines on the IGP table, and there is no
BGP MDT Table, since that is only on the PE routers or ASBR router. Instead, P-PIM can use the IP address of the
RPF Vector, which is an IP address of a router that knows how to reach the source PE. The source PE is aware of the
BGP MDT Table and the global IP address used in the PIM Vector. The RD is required so that the PE can associate
the PIM message with the appropriate BGP MDT Table [2, p. 123].
2.3.2 BGP Connector
With each VPNv4 UPDATE message that a PE distributes from a Multicast VRF it must carry the BGP Connector
Attribute. It is an optional transitive attribute [27, p. 15–16]. The value of the attribute is the IP address of the PE (likely
the loopback). For Intra-AS communication it doesn’t have much purpose, but for Inter-AS “Option-B” communication
it has significance because the ASBR changes the next-hop of the UPDATE message. The attribute preserves the originating PE's
router address, which allows the far-end PE in the other AS to fulfill its RPF check [2, p. 116–117].
Chapter 3
BGP/MPLS Multicast Virtual Private Networks
While Draft Rosen MVPNs allowed Service Providers (SPs) to create scalable Multicast Virtual Private Networks,
Draft Rosen does have its limitations. For one, the SPs are not able to leverage the MPLS technology already deployed
in their network. The Draft Rosen method utilized GRE to create the tunnel between the edges through the core which
created an overlay network. This results in the SP having to maintain a PIM/GRE topology in addition to a BGP/MPLS
topology for the traditional RFC 4364 Unicast VPNs. In a large SP network with many customers a large amount of PIM
state also had to be maintained by the routers in the core, when there is the preference by many SPs to keep their cores
simple and only label-switch traffic. The Default MDT was inefficient in the sense that all PEs had to receive C-Packets
even if there weren’t any receivers, and higher amounts of traffic caused more state to be created to support multiple
Data MDTs [2, p. 153–154].
BGP/MPLS Multicast VPNs were created to extend the use of Unicast VPNs, as defined in RFC 4364, to carry customer
multicast traffic. The RFC defines a framework to allow an SP to carry multiple C-Multicast streams without requiring
the amount of state in the SP network to increase proportionally. The primary method for accomplishing this is by
aggregating multiple customer streams into a single distribution tree throughout the backbone P routers. Multiple
aggregation methods are defined [28, p. 5–7]. BGP/MPLS Multicast VPNs are defined in RFC 6513, which provides the
overview and framework, and RFC 6514 which includes detailed information about the BGP encodings defined within RFC
6513. Eric Rosen co-authored RFC 6513 along with Rahul Aggarwal and both are the main authors for RFC 6514.
RFC 6513 defines a Multicast VPN (MVPN) as two sets of sites, a Sender Site set and a Receiver Site set. The traffic
originated by a Sender Site should only be received by its corresponding set of Receiver Sites, and not any other Receiver
Site not in that set. In other words Customer A Sender traffic should only be received by receivers at Customer A sites.
Or, Customer A can send traffic to another customer if it allows that to happen, which would imply that the other
customer is in Customer A’s receiver set. This would be the case in an extranet. The MVPN capabilities are carried out
using RFC 4364 mechanisms.
In this chapter the Draft Rosen MVPNs will be referred to as DR-MVPNs and the BGP/MPLS Multicast VPNs described
in RFCs 6513 and 6514 will be referred to as Next-Generation MVPNs (NG-MVPNs). This chapter will also use the
same convention of distinguishing customer and provider elements with the C- and P- prefix. Some terms will be carried
over as well, such as P-Group Address.
3.1 Next-Generation Multicast VPN Overview
In an NG-MVPN network the role of BGP is to convert PIM messages from a customer on a PE into special BGP
messages, send them across the network, and convert them back to PIM at the far end PE for hando↵ to the customer
at another site. Using the Unicast BGP/MPLS procedures defined in RFC 4364 the PE can map these messages to
a specific Multicast enabled VRF. BGP is also responsible for autodiscovery using a set of special BGP messages and
binding C-Multicast routes to whichever provider tunnel is chosen. Using information carried within BGP the PEs can
also establish a variety of P-Tunnels. One option is PIM/GRE based tunnels as in DR-MVPNs. However, there are
also MPLS based options, including RSVP-TE which can allow for traffic engineering of the multicast traffic. With
the inclusion of MPLS technologies for transport and the use of BGP for control plane the technology has the name
BGP/MPLS Multicast VPNs [13, p. 287-292].
[Figure: PEs 1, 5, and 6 each hold M-VRF A for sites A1, A3, and A2; PIM is the control plane on the PE to CE edges while MP-BGP is the PE to PE control plane, and the C-IP header and C-payload are tunneled across the SP core over a P-Tunnel entered via a PMSI, inside whatever transport encapsulation is chosen.]
Figure 3.1: BGP/MPLS Multicast VPN
The above figure is purposefully similar to figure 2.2 on page 24 to compare and contrast the two technologies. As in
DR-MVPN there is a P2MP P-Tunnel; however, the P-Tunnel can be instantiated in a variety of ways using PIM/GRE and MPLS.
Rather than an MTI, NG-MVPN uses a somewhat similar concept of a PMSI at the endpoints of the P-Tunnel. Also,
the control plane is no longer solely PIM but is now MP-BGP within the SP network. In both cases, the C-Traffic is
tunneled throughout the multicast network. NG-MVPNs can be broken up into two parts: control plane and forwarding
plane. The control plane is the combination of PIM and BGP while the forwarding plane comprises the various options for
transporting the customer multicast traffic across the network, such as MPLS.
3.2 PMSI
As with DR-MVPN, NG-MVPN also has multicast distribution trees. The two types are Inclusive Trees and Selective
Trees. An Inclusive Tree includes all of the C-Multicast Traffic of the PEs that are members of the same MVPN. The
number of Inclusive Trees is bound by the number of VPNs on a PE router, not by the number of C-Multicast groups.
Selective Trees carry only one or more C-Multicast Groups for a given MVPN. In other words, they don’t carry all of
the C-Multicast groups for a customer. A PE can by default carry all traffic on an Inclusive Tree and elect to only
put higher bandwidth flows onto separate Selective Trees. The Selective Trees should be configured so that they only
terminate on PEs that actually have active receivers [28, p. 7–8]. Inclusive trees also have two subtypes: Multidirectional
and Unidirectional (MI-PMSI and UI-PMSI). The Multidirectional tree is akin to a broadcast network where any PE
that sends a message will have that message sent to any other PE on the MI-PMSI. The Unidirectional PMSI allows a
particular PE to send traffic to any other PE in that MVPN [28, p. 15–16]. The difference between an MI-PMSI and a
UI-PMSI may not be obvious. An MI-PMSI can be thought of as a set of UI-PMSIs that together create full-mesh connectivity in an
MVPN. This may become clearer when the instantiation of PMSIs is explained in section 3.2.1.
The MI-PMSI is used in special circumstances not covered in this report (such as PIM as the PE-PE Control Plane or the
use of PIM-BIDIR), so only the term I-PMSI will be used.
The Inclusive Tree and Selective Tree are akin to the Default MDT and Data MDT of DR-MVPNs respectively. Both
inclusive and selective trees can be aggregated into another tunnel as an aggregated inclusive tree and/or an aggregated
selective tree. This is discussed more in depth in section 3.4.6.
A PE needs the ability to send packets over one or more trees that belong to an MVPN. This concept is realized by
Provider Multicast Service Interfaces (PMSIs). A C-Packet sent via a PMSI will be delivered to some or all of the PEs
participating in the MVPN, and any receiver will be able to determine which VPN the C-Packet resides in. The PMSI is
the entry point for a P-Tunnel, which is the transport mechanism used for delivering C-Packets. RFC 6513 clarifies that
a PMSI is not necessarily part of a P-Tunnel, as a single P-Tunnel can carry multiple PMSIs [28, p. 14–15]. The PMSI is
also an abstract concept. When a PE gives a packet to the PMSI it will arrive at one or all of the PEs that belong to a
given MVPN. A PE may send C-Traffic to the PE routers that have receivers for that traffic or to all of the PE routers
in that MVPN. BGP is used to signal which type of PMSI should be used by including a PMSI Tunnel Attribute that is
included in a NG-MVPN BGP UPDATE [2, p. 157–158].
There are two types of PMSIs. The first is an Inclusive PMSI (I-PMSI). The I-PMSI is used when a PE can send a
message that will be received by all the PEs for that MPVN. Another type of PMSI is the Selective PMSI (S-PMSI).
The S-PMSI is used so that a message will be sent to only selected PEs participating in an MVPN [28, p. 15–16]. It
is possible to send traffic only on S-PMSIs and never use an I-PMSI for carrying C-Multicast Traffic which allows for
further optimization [28, p. 19].
[Figure: two panels, I-PMSI and S-PMSI, showing Customer A's MVRFs on PEs 1, 5, and 6 (sites A1, A2, A3); the I-PMSI P-Tunnel reaches all of the customer's PEs while the S-PMSI P-Tunnel reaches only one.]
Figure 3.2: Provider Multicast Service Interface
Figure 3.2 shows an I-PMSI and an S-PMSI. The PMSI can be thought of as the interface to the P-Tunnel; however, for
each P-Tunnel there may be more than one PMSI. The I-PMSI connects to all the PEs for Customer A, while the S-PMSI
connects to only one PE. The S-PMSI may also carry only a subset of the multicast groups for the MVPN.
3.2.1 Instantiating PMSIs
A PMSI is instantiated by P-Tunnels, which are the encapsulation and forwarding method for multicast traffic in NG-MVPN.
The P-Tunnels can be created by PIM, mLDP, RSVP-TE, or replication over P2P Unicast P-Tunnels. In the PIM case,
as in DR-MVPN, there is a P-Instance of PIM that is used to create the tunnels. These can be either source tree
or shared tree methods, but an S-PMSI is best created using source tree methods. A P2MP mLDP LSP can create an
S-PMSI or a UI-PMSI, and an MP2MP mLDP LSP can create an MI-PMSI. An MI-PMSI can also be created by a set of P2MP
mLDP LSPs. RSVP-TE can instantiate an S-PMSI or a UI-PMSI with a single P2MP LSP, while a set of such LSPs (one
rooted at each PE in the MVPN) can instantiate an MI-PMSI. Unicast P-Tunnels are either a partial or full mesh for UI-PMSI and S-PMSI or
MI-PMSI respectively.
P-Tunnels are discussed in detail in section 3.4.
3.3 PIM and BGP Control Plane
NG-MVPN requires that a PE maintains at most one BGP peering session with all the other PEs in the network, or with
a Route Reflector (RR), for carrying the NG-MVPN control information [28, p. 11]. This report only considers using
BGP for PE-PE control information and not PIM. In other words, the report only considers translating, for example,
PIM C-Join messages into BGP C-Multicast Routes, and not forwarding the PIM Join over a PMSI. The description for
PE-CE PIM and PE-PE BGP components are covered below.
3.3.1 PIM Control Plane for CE-PE Information
Similar to Unicast BGP/MPLS VPNs, NG-MVPNs have the CE peer only with the directly attached PE using a multicast
routing protocol over the attachment circuit (AC). The CE does not peer with the remote CE on the other side of the
SP network. The AC is part of a VRF that is configured to be multicast enabled. As with DR-MVPNs these multicast
peering sessions between the CE and PE are referred to as multicast C-Instances. The VRF that the AC is attached
to contains both unicast and multicast routing instances. RFC 6513 specifies the use of PIM-SM, PIM-SSM, and
Bidirectional PIM (BIDIR-PIM) as the PE-CE protocols [28, p. 13]. The PE-PE support methodology for BIDIR-PIM will
not be discussed in this report.
3.3.2 MP-BGP Control Plane for PE-PE Information
New Path Attributes, Extended Communities, and NLRI Encodings (referred to as Route Types) were created to support
NG-MVPNs and are included in NG-MVPN BGP UPDATE Messages. The following sections describe in detail each
addition.
3.3.2.1 New BGP Path Attributes and Extended Communities
RFC 6514 defines three new path attributes that are used in conjunction with the new NLRI encodings described in the
next section.
PMSI Attribute The PMSI Tunnel Attribute in a BGP UPDATE message identifies
which type of P-Tunnel is used to send traffic. This is an optional transitive attribute. The PMSI Attribute is made up of
four fields as follows [29, p. 10–11]:
+------------------------------------------+
| Flags (1 octet)                          |
+------------------------------------------+
| Tunnel Type (1 octet)                    |
+------------------------------------------+
| MPLS Label (3 octets)                    |
+------------------------------------------+
| Tunnel Identifier (variable)             |
+------------------------------------------+
The Flags field has only one flag, which indicates whether leaf information is required. The MPLS Label field is either set to zero to indicate there is no label, or a label value is encoded in the high-order 20 bits of the three octets [29, p. 10]. The MPLS Label field is used when the ingress PE uses “upstream label allocation” to distribute a label to an egress router [30,
p. 9]. The Tunnel Type field has the following values [29, p. 10]:
• 0 - No Tunnel Information Present
• 1 - RSVP-TE P2MP LSP
• 2 - mLDP P2MP LSP
• 3 - PIM-SSM Tree
• 4 - PIM-SM Tree
• 5 - BIDIR-PIM Tree
• 6 - Ingress Replication
• 7 - mLDP MP2MP LSP
Depending on the value in the Tunnel Type field the Tunnel Identifier includes the following information [29, p. 10–
13]:
No Tunnel Information Present No tunnel information is included. This setting can be used when a PE needs to know
the receivers before it establishes a tunnel. The “Leaf Information Required Bit” is set in this case, which will
prompt the other PEs to send Leaf A-D route messages [28, p. 52].
RSVP-TE P2MP LSP The same information in the P2MP Session Object is included. This is the Extended Tunnel ID,
Tunnel ID, and P2MP ID.
mLDP P2MP LSP The P2MP FEC Element is included. This is the combination of the source address of the LSP
tree and a unique value.
PIM-SSM Tree The P-Root Node Address (P-Source Address of the PE) and the P-Group Address. The P-Group
address is an address from the P-Instance of PIM running in the service provider network.
PIM-SM Tree The Sender Address and the P-Group Multicast Address.
BIDIR-PIM Tree BIDIR-PIM uses the same Tunnel Information as PIM-SM.
Ingress Replication The unicast IP address of the tunnel endpoint.
mLDP MP2MP LSP The MP2MP FEC Element, which is similar to the P2MP FEC in concept, and is not discussed
in this report.
Section 3.4 discusses the various types of P-Tunnels in depth, except for BIDIR-PIM and mLDP MP2MP, which will not
be covered in this report.
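As a concrete illustration of the layout above, the short Python sketch below packs the four fields of a PMSI Tunnel Attribute. It is a minimal sketch only: the helper name, the dictionary of tunnel-type code points, and the example values (Ingress Replication, label 300, endpoint 192.0.2.1) are illustrative assumptions, not taken from any router implementation.

import struct
import socket

# Tunnel Type code points from the list above
PMSI_TUNNEL_TYPES = {
    "no-info": 0, "rsvp-te-p2mp": 1, "mldp-p2mp": 2, "pim-ssm": 3,
    "pim-sm": 4, "bidir-pim": 5, "ingress-replication": 6, "mldp-mp2mp": 7,
}

def build_pmsi_tunnel_attribute(tunnel_type, mpls_label, tunnel_id, leaf_info_required=False):
    """Pack the Flags, Tunnel Type, MPLS Label, and Tunnel Identifier fields.

    mpls_label is the 20-bit label value (0 means no label); it is carried in
    the high-order 20 bits of the 3-octet MPLS Label field. tunnel_id is the
    already-encoded Tunnel Identifier (its layout depends on the tunnel type).
    """
    flags = 0x01 if leaf_info_required else 0x00          # only one flag is defined
    label_field = (mpls_label << 4) & 0xFFFFF0            # label in high-order 20 bits
    return (struct.pack("!BB", flags, PMSI_TUNNEL_TYPES[tunnel_type])
            + label_field.to_bytes(3, "big") + tunnel_id)

# Example: Ingress Replication with tunnel endpoint 192.0.2.1 and label 300
attr = build_pmsi_tunnel_attribute("ingress-replication", 300,
                                   socket.inet_aton("192.0.2.1"))
print(attr.hex())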
Source AS Extended Community This BGP Extended Community is set to the AS Number (ASN) of the SP network
that the PE belongs to. It is used for identifying the ASN, and has particular use for Inter-AS updates. It is an optional
transitive attribute. A unicast BGP/MPLS UPDATE Message must carry this Extended Community [29, p. 13].
VRF Route Import Extended Community Every Multicast VRF is required to have an import Route Target configured, which is similar in use to the import/export Route Targets of Unicast BGP/MPLS VPNs. This Route Target is referred to as
the C-Multicast Import RT. It contains two fields. One is the “Global Administrator Field” which contains an IP address
of the PE that is the same across all VRFs (e.g. a loopback address on the PE). The other is the “Local Administrator
Field” which is set to a unique 16-bit number that can identify a VRF. The combination of the Global and Local Fields
can uniquely identify a VRF [29, p. 14].
An important difference from unicast BGP/MPLS RTs is that the C-Multicast Import RT is dynamic in the sense that the Global Admin Field always contains the IP address of the active sender, which can change [2, p. 166].
The C-Multicast Import RT is just the value that is configured for a particular VRF, and is carried to other PEs by
putting the value into the RT Extended Community of a BGP UPDATE message. Of the special BGP/MPLS MVPN
Routes, which are described in section 3.3.2.2, C-Multicast Import RTs are only carried by the Route Target Extended
Communities of C-Multicast Routes (Type 6 and 7) [2, p. 166]. Outside of these special routes, the C-Multicast RT
value must also be carried in the VRF Route Import Extended Community of a BGP UPDATE Message for a unicast
BGP/MPLS VPN Route. These unicast routes represent the source of a particular C-Multicast flow. However, if it is
known that none of the unicast routes are capable of being a source, then the route should not carry the VRF Route
Import EC [29, p. 14].
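A brief sketch of how the two administrator fields described above could be composed into an on-the-wire extended community. The leading type/sub-type octets shown (the IPv4-address-specific transitive encoding) and the example PE address and VRF number are assumptions made for illustration.

import struct
import socket

def vrf_route_import_ec(pe_loopback_ip, vrf_id):
    """Compose a VRF Route Import Extended Community.

    Global Administrator field: an IP address of the PE (e.g. its loopback),
    the same across all of that PE's VRFs.
    Local Administrator field: a 16-bit number identifying one VRF on the PE.
    The 0x01/0x0b type and sub-type octets are assumptions of this sketch.
    """
    return (struct.pack("!BB", 0x01, 0x0b)
            + socket.inet_aton(pe_loopback_ip)
            + struct.pack("!H", vrf_id))

# PE loopback 1.1.1.1 and VRF number 100 -> C-Multicast Import RT "1.1.1.1:100"
print(vrf_route_import_ec("1.1.1.1", 100).hex())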
3.3.2.2 MCAST-VPN NLRI
RFC 6514 defines a new MP-BGP NLRI with a set of NLRI encodings for two purposes: MVPN auto-discovery (A-D) and binding, as well as advertisement of C-Multicast Routes. Each NLRI encoding is known as a Route Type. One of these Route Types may indicate the type of PMSI that is going to be signaled, or it may indicate that a PE has a receiver ready to receive traffic. As discussed earlier there are multiple types of PMSIs, and BGP is used to signal which types are used for an MVPN. The first five Route Types, which carry auto-discovery and binding information, are as follows:
• Intra-AS I-PMSI A-D route
• Inter-AS I-PMSI A-D route
• S-PMSI A-D route
• Leaf A-D route
• Source Active A-D route
The last two Route Types are for carrying C-Multicast Route information, “C-Multicast Routes”. Each VRF contains a
unique Tree Information Base (MVPN-TIB) containing the C-Multicast Routes for that particular VRF. The two Route
Types are as follows:
• Shared Tree Join Route
• Source Tree Join Route
The NLRI is identified by AFI 1 and SAFI 5 (MCAST-VPN) and consists of three fields. The first is the Route Type field, which identifies which Route Type is encoded in the NLRI. The next is the Length field, which specifies how many octets make up the Route Type specific field. The last is the Route Type specific field, which carries the actual encoding [29, p. 4–6].
+-----------------------------------+
| Route Type (1 Octet)              |
+-----------------------------------+
| Length (1 octet)                  |
+-----------------------------------+
| Route Type specific (Variable)    |
+-----------------------------------+
Each NLRI Route Type encoding is described below along with its behavior in an NG-MVPN network.
Route Type 1 - Intra-AS I-PMSI A-D The Intra-AS I-PMSI A-D route is advertised by any PE that wishes to
participate in NG-MVPN auto-discovery and binding.
+-----------------------------------+
| Route Distinguisher (8 Octets)    |
+-----------------------------------+
| Originating Router’s IP Address   |
+-----------------------------------+
The NLRI contains an RD that is configured for the VRF that the route originated from along with the same IP address
that it uses in the VRF Route Import EC that was used in a Unicast BGP/MPLS advertisement for that VRF (e.g. a
loopback address). The combination of the RD and the Originating Router’s IP address uniquely identifies a Multicast
VRF. The advertisement only contains the Tunnel Attribute field if an I-PMSI is being created (remember that an
I-PMSI does not need to be used and the network can use solely S-PMSIs). In other words, the PE sends this type of advertisement in any case. If the I-PMSI is being used then the advertisement must contain the PMSI Attribute, and if
Ingress Replication is used it must contain a label for demultiplexing at the receiver end. The Next Hop field of the
MP REACH NLRI that contains the MCAST-VPN Route must be set to the same address as the Originating Router’s
IP Address field. The advertisement also uses the same Route Target values as the Unicast BGP/MPLS export routes
for that VRF.
Upon receipt of the I-PMSI Intra-AS advertisement, the receiving PE will import the routes into the VRF if the Route
Target in the RT EC of the route matches the RT value configured for the VRF. When the receiving PE receives the
Intra-AS Route advertisement and it does not have the PMSI Tunnel Attribute and Ingress Replication is not used the
receiving PE can assume that (1) only an S-PMSI will be used, or (2) that the originating PE of the advertisement
cannot send multicast traffic (i.e. it is only a receiver). To determine whether it is case 1 or 2, the VRF Route Import EC is used. If the VRF Route Import EC is not present for a unicast BGP/MPLS route, then the PE that originated the route cannot be selected as a source PE (as it does not have routes with a source). Therefore it is case (1), and this PE will
only be used for originating S-PMSI routes.
If a Tunnel Attribute is carried and Ingress Replication is used then the MPLS Label and the Address in the Tunnel
Identifier should be used when the local PE sends traffic to the PE that originated the route. In all other cases the local
PE should join the P-Tunnel (if RSVP-TE is used then the sender PE is responsible for building the tunnel to the local
PE).
The only time an Intra-AS I-PMSI Route is not originated by a PE is when a MVPN site will not be receiving any
multicast traffic (i.e. it is only a sender) and Ingress Replication is used.
An example of an Intra-AS I-PMSI A-D route as it is shown in a router’s routing table:
1:789:100:1.1.1.1, where 1 is the Route Type, 789:100 is the RD, and 1.1.1.1 is the IP address of the originating router
[31].
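The sketch below, again only illustrative, shows how the Route Type 1 NLRI fields fit together and how the routing-table string from the example can be rendered. Details not stated in the text, such as the type-0 RD layout, are assumptions.

import struct
import socket

def type1_intra_as_ad_nlri(rd_asn, rd_number, originator_ip):
    """Build a Route Type 1 MCAST-VPN NLRI: header (type, length) + RD + originator IP."""
    rd = struct.pack("!HHI", 0, rd_asn, rd_number)     # type-0 RD: 2-octet ASN, 4-octet number
    body = rd + socket.inet_aton(originator_ip)
    return struct.pack("!BB", 1, len(body)) + body     # Route Type = 1, Length in octets

def routing_table_string(rd_asn, rd_number, originator_ip):
    """Render the route the way the example shows it, e.g. 1:789:100:1.1.1.1."""
    return f"1:{rd_asn}:{rd_number}:{originator_ip}"

print(routing_table_string(789, 100, "1.1.1.1"))
print(type1_intra_as_ad_nlri(789, 100, "1.1.1.1").hex())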
Route Type 2 - Inter-AS I-PMSI A-D This Route Type is only used when Inter-AS segmented tunnels are used
between AS networks. Only an ASBR originates this route.
+-----------------------------------+
| Route Distinguisher (8 Octets)    |
+-----------------------------------+
| Source AS (4 Octets)              |
+-----------------------------------+
The RD is encoded the same as it is in Unicast BGP/MPLS VPNs. The Source AS contains an AS Number of the
originating router, and occupies the low-order 16 bits of the field. The high-order bits are set to zero. This Route Type is
originated when an ASBR determines, using Type 1 Routes, that there is an active receiver in its own AS. The Inter-AS
I-PMSI A-D Route also carries an import Route Target called “ASBR Import RT” (which is the unicast RT), which
allows for the acceptance of Leaf A-D routes and C-Multicast routes from an ASBR. The ASBR sends the advertisement
via external BGP to the neighboring AS. It sends the message with the “Leaf Information Required” flag set, and does
not send any label. The Next Hop field of the MP REACH NLRI field is set to an IP address that is reachable by a
router in the other AS. In the network that is on the other side of the ASBR the identification of a source becomes the
pair of AS and RD, rather than PE and RD. This means that even with multiple trees on the source AS side, the other
AS may have just one MVPN for all of the MVPNs in the source AS.
Upon receipt of the I-PMSI Inter-AS advertisement, the receiving PE will import the routes into the VRF if the Route
Target in the RT EC of the route matches the RT value configured for the VRF. If the router is an ASBR it will pass the
routes along in external BGP. If the PMSI Attribute carries a Tunnel Type for PIM-SM/SSM or mLDP P2MP Tree, the
receiving router should join the tree using the identifying information carried in the Tunnel Identifier field of the attribute.
If the Tunnel Type is set to RSVP-TE P2MP LSP, then the originating router is required to build the sub-LSP to
the receiving router (this may have been done already as the headend is responsible for initiating the LSP construction in
RSVP-TE). If the “Leaf Information Required” bit was set then the receiving router will originate a Leaf A-D Route.
The Leaf A-D Route Key is populated with the MCAST VPN NLRI information from the Inter-AS I-PMSI advertisement
[29, p. 20–30].
An example of an Inter-AS I-PMSI A-D route as it is shown in a router’s routing table:
2:789:100:789, where 2 is the Route Type, 789:100 is the RD, and 789 is the source AS Number of the originating
router [31].
Route Type 3 - S-PMSI A-D The S-PMSI A-D Route Type is only used when the C-Multicast stream has a specific C-Source address (C-S,C-G).
+-----------------------------------+
| Route Distinguisher (8 Octets)    |
+-----------------------------------+
| Multicast Source Length (1 Octet) |
+-----------------------------------+
| Multicast Source (variable)       |
+-----------------------------------+
| Multicast Group Length (1 octet)  |
+-----------------------------------+
| Multicast Group (variable)        |
+-----------------------------------+
| Originating Router’s IP Address   |
+-----------------------------------+
The RD is the same as in the Inter-AS and Intra-AS I-PMSI Routes. The Multicast Source field contains the C-Multicast source IP address. The Multicast Group contains the C-Multicast Group Address or the mLDP P2MP FEC
values when P2MP mLDP is used. The Originating Router’s IP Address is that of the PE, not the CE, as with the
Intra-AS I-PMSI A-D message, and it needs to be the same as the address used in the VRF Route Import Extended
Community (e.g. a loopback address). This Route Type carries the PMSI Tunnel Attribute which contains the identity
of the P-Multicast Tree used for the P-Tunnel. If the originating PE needs to learn about the leaves of the P-Multicast
tree it can set the “Leaf Information Required” bit. An ASBR in certain circumstances may convert one or more received
S-PMSIs from another AS into one I-PMSI and distribute it toward the receiver in its own AS.
The process when receiving an S-PMSI A-D route is the same as described for the Inter-AS I-PMSI A-D Route. If
the “Leaf Information Required” bit is set then the receiving PE originates a Leaf A-D route. The Route Key Field is
populated with the MCAST VPN NLRI information from the S-PMSI A-D Route [29, p. 40–45].
An example of an S-PMSI A-D route as it is shown in a router’s routing table:
3:789:100:32:10.1.1.1:32:239.0.0.1:1.1.1.1, where 3 is the Route Type, 789:100 is the originating router’s RD, 32 is the
length of the address (indicating IPv4) in both locations, and 10.1.1.1 is the C-Source Multicast Address, 239.0.0.1 is
the C-Group Address, and 1.1.1.1 is the Originating Router’s IP Address [31].
Route Type 4 - Leaf A-D Route The previous three Route Types mentioned the Leaf A-D Route. The Leaf A-D
Route is sent in response to an advertisement that contains the PMSI Tunnel Attribute with the “Leaf Information
Required” bit set to 1 in an Inter-AS I-PMSI A-D Route or in an S-PMSI A-D Route.
+-----------------------------------+
| Route Key (variable)              |
+-----------------------------------+
| Originating Router’s IP Address   |
+-----------------------------------+
The Route Key field carries the MCAST VPN NLRI information from whichever type of PMSI A-D Route it received
(either Inter-AS Inclusive or Selective). If the Tunnel Type from the received advertisement is Ingress Replication then
the Leaf A-D needs to set Ingress Replication in its PMSI Tunnel Attribute Tunnel Type field, and it also needs to
carry a label. This label will be placed on the stack by the ingress PE (the same one that originated the PMSI A-D
advertisement) so the MVPN traffic can be demultiplexed into the correct Multicast VRF by the egress PE (the same one
that originated the Leaf A-D advertisement). The Next Hop of the MP REACH NLRI in the Leaf A-D Message must be
set to the same IP that is in the Originating Router’s IP Address field. The Leaf A-D advertisement also contains an
IP-Based RT EC that is based on the IP address carried in the Next Hop field of the received PMSI A-D advertisement
(the sender PE’s IP address) in the Global Admin Field. The Local Admin field is set to zero [29, p. 29]. Zero is used
because the correct VRF can be determined by the corresponding Route information in the Route Key field [32].
An example of a Leaf A-D route as it is shown in a router’s routing table:
4:3:32:10.1.1.1:32:239.0.0.1:1.1.1.1:1.1.1.7, where 4 is the Route Type. In this example, after the 4:, the S-PMSI
MCAST VPN NLRI information is copied, which makes the Route Key field. The trailing 1.1.1.7 is the Originating
Router’s IP Address (of the PE that is sending the Leaf A-D advertisement) [31]. The scenario in this example is that
the PE 1.1.1.1 originated an S-PMSI A-D Route and the PE 1.1.1.7 is responding with a Leaf A-D Advertisement.
In a common scenario, an ingress (source) PE will originate a Type 3 S-PMSI A-D Route with the “Leaf Information
Required” bit set. Receiver PEs that have active receivers will respond with a Type 4 Leaf A-D Route. This is the
standard process when using S-PMSIs [30, p. 17].
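The exchange can be sketched from the egress PE’s point of view as follows. The dictionary fields and helper name are hypothetical stand-ins for the BGP and PIM machinery; they only mirror the decision described in the text.

def handle_spmsi_ad(route, local_receivers, local_ip):
    """Sketch of an egress PE reacting to a Type 3 S-PMSI A-D route.

    route holds the fields discussed above; local_receivers is the set of
    (C-S, C-G) flows this PE currently has receivers for.
    """
    flow = (route["c_source"], route["c_group"])
    if flow not in local_receivers:
        return None                      # no local interest: do not respond
    if not route["leaf_info_required"]:
        return None                      # sender is not asking for leaves
    # Respond with a Type 4 Leaf A-D route: Route Key = the received NLRI,
    # Originating Router's IP Address = this PE. (With Ingress Replication the
    # response must also carry a PMSI Tunnel Attribute and a label, omitted here.)
    return {"route_type": 4, "route_key": route["nlri"], "originator": local_ip}

leaf = handle_spmsi_ad(
    {"c_source": "10.1.1.1", "c_group": "239.0.0.1", "leaf_info_required": True,
     "nlri": "3:789:100:32:10.1.1.1:32:239.0.0.1:1.1.1.1"},
    {("10.1.1.1", "239.0.0.1")}, "1.1.1.7")
print(leaf)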
Route Type 5 - Source Active A-D Route The Source Active A-D Route is used to advertise if a PE has an active
source. The Source Active A-D Route is only used for groups outside the 232/8 range for SSM and only in conjunction
with Source Tree C-Multicast Join (Route Type 7) [29, p. 9]. When using the SSM range a PE will simply use the
Source Tree C-Multicast Route [32].
+-----------------------------------+
| Route Distinguisher (8 Octets)    |
+-----------------------------------+
| Multicast Source Length (1 Octet) |
+-----------------------------------+
| Multicast Source (variable)       |
+-----------------------------------+
| Multicast Group Length (1 octet)  |
+-----------------------------------+
| Multicast Group (variable)        |
+-----------------------------------+
The Source Active A-D Route is only used in conjunction with C-Trees when they switch from a shared tree to a source
tree, or when the C-Tree is only a source tree. Depending on the scenario the fields are populated differently, except the
RD field which takes the standard RD encoding from the Multicast VRF in Unicast BGP/MPLS format. In both cases
the Source and Group fields are the C-Source and C-Group addresses. However in the procedure that is solely source
tree the C-Source and C-Group are received from PIM Register messages¹. The MP REACH NLRI Next Hop is the
same as the address carried in the VRF Route Import EC of the unicast BGP/MPLS routes that are advertised by the
PE, and should carry the same Route Targets as the Intra-AS I-PMSI A-D Route the PE originates. The Source Active
A-D Route is propagated to all of the PEs of the MVPN [29, p. 46–47].
¹ It can also come from an MSDP Source-Active Message, but that is outside the scope of this report.
Source Tree Only
There are three ways that a PE can learn about an active multicast source in this scenario. One is for the PE to be a
C-RP. A second way is to use PIM Anycast RP procedures. Another way is to use MSDP to exchange the information
from the C-RP to the PE. Once a new source is learned using any of these methods the PE will send a Source Active A-D
route to all PEs within the same MVPN [29, p. 49–52]. This is the default method for NG-MVPN. PEs with receivers
for the C-Group in the Source Active message will respond with a Type 7 C-Multicast route toward the ingress PE [2,
p. 162].
Shared Tree changing to Source Tree
In certain situations the default method is not suitable. One such situation is when the C-RP is not on a PE and MSDP
is not used. In this case a Shared Tree method is used where Joins are sent to the RP. In NG-MVPN the Type 6 Shared
Tree C-Multicast Route is used instead of a Type 7 Route. These Type 6 messages contain the (C-*,C-G) information
and are forwarded from the PE with a receiver to the PE that is attached to the Customer VPN site of the C-RP [2,
p. 164]. At this point the C-RP is sending traffic to its PE and the PE is forwarding this traffic to all the PEs on that
I-PMSI. The PE with the C-Source then sends its packets to the C-RP with PIM register messages. The PE with the
C-RP attached will then send (C-S,C-G) messages over the I-PMSI to all the PEs. Any C-Receiver off the other PEs will send (S,G) PIM Joins to their respective PE, which will then forward them as (C-S,C-G) C-Multicast Routes (Type 7
Source Tree) to the PE with the C-Source. This PE will then start sending traffic onto the I-PMSI, while the C-RP is
also sending traffic. Recall that the I-PMSI includes all PEs. As a result a PE may receive traffic from both the C-RP
PE and the C-S PE over the PMSI. To prevent this the Source Active A-D route is used. Whenever a PE creates a (C-S,C-G) state within its VRF, because of reception of the Source Tree C-Multicast Route, it originates the Source Active
route to all the PEs of that MVPN. As a result, the PEs that receive the Source Active advertisement, that have active
receivers, will accept traffic from the PE with the C-Source instead of the PE with the C-RP. The PE connected to the
C-RP will stop forwarding any traffic for that specific (C-S,C-G) as a result of receiving the Source Active advertisement
[28, p. 63–67].
[Figure 3.3: Shared Tree to Source Tree Switchover using Source Active A-D Routes — PEs 1–7 with the C-S, C-RP, and C-R hosts at customer sites A1–A4]
Consider the simple topology in figure 3.3. PE1 is attached to the C-Source and PE3 is connected to the C-RP. PE1 is
forwarding the traffic to PE3 which is then forwarding the traffic to PEs 6 and 7 which have C-Receivers attached to
them over a PMSI. The C-Receiver attached to PE6 may send an (S,G) PIM Join that gets translated to a Type 7
(C-S,C-G) Source Tree C-Multicast Route by PE6 and then forwarded to PE1. Upon reception PE1 will start forwarding
the traffic onto the PMSI. To prevent the scenario described above where PE6 and PE7 receive traffic from both the
C-Source and the C-RP, PE1 will send a Source Active A-D Route to all the PEs. PE6 and PE7 will select PE1 as their
sender, and PE3 will cease forwarding traffic onto the PMSI for that particular (C-S,C-G).
Handling a Source Active A-D Route For Both Methods
When a PE receives a Source Active A-D Route it will put the route in the Multicast VRF with the corresponding RTs. It
will also check to see if a matching (C-*,C-G) entry is present. If one is present it will use the tunnel of the corresponding
Source Active A-D Route in the forwarding path to receive traffic. When the PE receives a C-Multicast PIM Join from
the CE it will install the (C-*,C-G) state in the MVPN TIB and check if there is a corresponding Source Active A-D
Route. If there is one present it will set up the forwarding path to receive traffic from the tunnel of corresponding Source
Active A-D Route. In both cases the (C-*,C-G) entry must have an associated PE-CE Attachment Circuit within that
Multicast VRF [29, p. 47–48].
An example of a Source Active A-D route as it is shown in a router’s routing table: 5:789:100:32:10.1.1.1:32:239.0.0.1, where 5 is the Route Type, 789:100 is the originating router’s RD, 32 is the length
of the address and 10.1.1.1 is the C-Source Multicast Address and 239.0.0.1 is the C-Group Address [31]. Regardless of
how the fields were populated they will appear the same in the Multicast VRF routing table.
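The two receive-side checks just described can be summarized with a small, purely illustrative state-keeping sketch; the class and attribute names are invented for the example and do not come from any implementation.

class MvpnTibFragment:
    """Toy fragment of a Multicast VRF's TIB for Source Active handling."""

    def __init__(self):
        self.shared_state = {}      # C-G -> forwarding entry for (C-*, C-G)
        self.source_active = {}     # (C-S, C-G) -> P-Tunnel of the SA route's originator

    def on_source_active(self, c_source, c_group, tunnel):
        # Store the route, then check for a matching (C-*, C-G) entry.
        self.source_active[(c_source, c_group)] = tunnel
        entry = self.shared_state.get(c_group)
        if entry is not None:
            entry["upstream_tunnel"] = tunnel   # receive from the SA route's tunnel

    def on_pim_shared_join(self, c_group, attachment_circuit):
        # Install (C-*, C-G) state, then check for a matching Source Active route.
        entry = {"oif": {attachment_circuit}, "upstream_tunnel": None}
        self.shared_state[c_group] = entry
        for (c_s, c_g), tunnel in self.source_active.items():
            if c_g == c_group:
                entry["upstream_tunnel"] = tunnel
        return entry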
Route Type 6 and Route Type 7 - Shared and Source C-Multicast Route C-Multicast Routes are created in
response to the creation of C-PIM states on a PE within a Multicast VRF. The encodings for Route Types 6 and 7 are the same, differing only in the Customer Source Address field.
+-----------------------------------+
| Route Distinguisher (8 Octets)    |
+-----------------------------------+
| Source AS (4 Octets)              |
+-----------------------------------+
| Multicast Source Length (1 Octet) |
+-----------------------------------+
| Multicast Source (variable)       |
+-----------------------------------+
| Multicast Group Length (1 octet)  |
+-----------------------------------+
| Multicast Group (variable)        |
+-----------------------------------+
The RD field consists of the standard Unicast BGP/MPLS encoding. The Source AS field contains the AS Number of
the PE that originated the advertisement. The Multicast Group is always the C-Multicast Group Address. If it is a Type
6 Shared Tree C-Multicast Route the C-Multicast Source is the address of the C-RP. If it is a Type 7 Source Tree
C-Multicast Route the address consists of the C-Source Address for that group.
A PE creates a Shared Tree Join C-Multicast Route when the C-PIM instance creates a (C-*,C-G) state. If this state is
deleted the PE can send a C-Multicast advertisement using the MP UNREACH NLRI attribute. A PE will create and
delete a Source Tree C-Multicast Route once the C-PIM instance creates a (C-S,C-G) state using similar methods to
the (C-*,C-G) state. Again, the difference is that with the (C-*,C-G) Shared Tree state the C-Source Address of the
advertisement is the C-RP, and in the (C-S,C-G) case it is the C-Source Address. There is a special case where mLDP is
the C-Instance Protocol (between the CE and PE). In that case there will be an mLDP state with the P2MP FEC, and
the C-Source Address is the P2MP FEC.
All three cases (Shared, Source, and mLDP) are the same for constructing the rest of the C-Multicast Route. The
local PE will select the best Upstream Multicast Hop (UMH) route and pull the following information: the ASN that is
carried in the Source AS Extended Community of the UMH route and the C-Multicast Import RT of the upstream PE
(which is from the value of the VRF Route Import EC of the UMH route). The UMH route was also described as the Unicast BGP/MPLS VPN Route that represents the source of the C-Multicast flow. UMH routes and selection are
discussed in detail in section 3.3.3. The RD of the C-Multicast Route is set to the RD of the UMH route that contains
the subnet for the C-Multicast Source Address. The C-Multicast Route also constructs an RT that is set to the value of
the C-Multicast Import RT (the value of the C-Multicast Import RT, the VRF Route Import EC, and the last RT are the
same).
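A compact sketch of that construction, with the selected UMH route modeled as a dictionary; every field name here is illustrative rather than an actual data structure from the RFC or from any implementation.

def build_type7_route(c_source, c_group, umh_route):
    """Turn a local (C-S, C-G) state into a Type 7 Source Tree C-Multicast route."""
    return {
        "route_type": 7,
        "rd": umh_route["rd"],                       # RD taken from the UMH route
        "source_as": umh_route["source_as_ec"],      # from the Source AS EC of the UMH route
        "c_source": c_source,
        "c_group": c_group,
        # The route is tagged with an RT equal to the upstream PE's C-Multicast
        # Import RT, i.e. the value of the VRF Route Import EC of the UMH route.
        "route_target": umh_route["vrf_route_import_ec"],
    }

route = build_type7_route("10.1.1.1", "239.0.0.1",
                          {"rd": "789:100", "source_as_ec": 789,
                           "vrf_route_import_ec": "1.1.1.1:100"})
print(route)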
If the local and source PEs are in different AS networks then the AS number of the source PE is used, and the RD is
taken from the Inter-AS I-PMSI A-D route for the corresponding C-Multicast Route. An ASBR can use the RD and
Originating IP Address information to propagate the C-Multicast Route.
When a PE receives a Shared Tree or Source Tree C-Multicast Route it will check to see if any of the RTs in the
Extended Communities of the route match the C-Multicast Import RT of the VRF. It will then create the (C-*,C-G) or
(C-S,C-G) state in the VRF (assuming the RTs match for that VRF) and then bind either an I-PMSI or S-PMSI to that
route depending on the PE’s configuration. If a withdrawal message (MP UNREACH NLRI) is received then the PE
must remove the (C-*,C-G) or (C-S,C-G) state in the VRF. If the C-Group is in the non-SSM range then a timer is used
to delay the removal. This is done so that the PE will continue forwarding traffic over the PMSI until all the PEs have
received the withdrawal of the Source Active A-D route for a given (C-S,C-G) [29, p. 32–39].
Examples of the routes for both Shared and Source C-Multicast Routes: 6:789:100:789:32:1.1.1.4:32:239.0.0.1, where 6
is the Route Type, 789:100 is the originating router’s RD, the following 789 is the Source AS, 32 is the length of the
address and 1.1.1.4 is the C-Source Multicast Address as the C-RP and 239.0.0.1 is the C-Group Address.
7:789:100:789:32:10.1.1.1:32:239.0.0.1, where 7 is the Route Type, 789:100 is the originating router’s RD, the following
789 is the Source AS, 32 is the length of the address and 10.1.1.1 is the C-Source Multicast Address as the C-Source
and 239.0.0.1 is the C-Group Address [31].
3.3.3 MP-BGP for PE-PE Upstream Multicast Hop
When a PE receives a PIM C-Join or C-Prune message from a CE, the message contains a (*,G) or (S,G) flow. If the source of
this flow, or the RP, is across the MVPN of the SP network then the PE needs to find the “Upstream Multicast Hop”
(UMH). The UMH is the PE where the traffic enters the network. This could be the PE where the (*,G) packets enter the network in the case of a shared tree and an RP, the actual source in the case of an (S,G) source tree, or an ASBR.
RFC 6513 refers to both the (*,G) RP source or the (S,G) source as the C-Root. This report will follow the same
convention. The process of selecting the UMH for a given C-Root is called the “upstream multicast hop selection.”
UMH selection can be done by PIM or BGP, but this report only focuses on the BGP method.
3.3.3.1 BGP for Upstream Multicast Hop Selection
In a simple case the PE does the UMH selection by checking the unicast routing table of the VRF that the PE-CE
Attachment Circuit is in. However sometimes a customer will choose to use a separate set of unicast routes. In this case
the PE-CE relationship may share unicast routes using MP-BGP and SAFI 2² or OSPF with a Multi-Topology Identifier
(the cases are not limited to these two protocols). In this case an MVPN can have two separate VRFs, one for the
unicast and one for the routes used for UMH. While the same BGP SAFI can be used to send this traffic to both VRFs
across the backbone³, RFC 6513 uses a new MP-BGP Address Family (AF), referred to as “Multicast for BGP/MPLS
IP Virtual Private Networks (VPNs)” [28, p. 25–26]. This AF should not be confused with the MVPN Address Family
from section 3.3.2.2 used for the various autodiscovery/binding and C-Multicast Routes.
The SAFI for this AF is 129. The NLRI of this MP REACH NLRI is a Length field and a Prefix field. The length field
determines whether the prefix is IPv4 or IPv6, and the prefix is an RFC 4364 RD prepended to the IP address. These routes must
also carry the Source AS Extended Community and the VRF Route Import Extended Community, as with the Unicast
BGP/MPLS Routes [29, p. 31–32].
3.3.3.2 Upstream Multicast Hop Selection
After a PE receives a C-Join message it looks in the Multicast VRF. In the VRF it looks at all the UMH routes and
determines the best match for the C-Root from within that C-Join (matching the source Address or the RP address).
For the matching routes the PE determines the Upstream PE and RD. The Upstream PE is determined from the VRF
Route Import EC, or if that is not included, the route’s BGP Next Hop. In both cases the RD is taken from the route’s
² SAFI 2 is the value for Multicast Routes. However, these are just unicast routes that are used specifically for multicast purposes and are kept in their own routing table.
³ In which case RFC 6514 recommends using the same RD between the unicast and UMH VRF on the same PE, but a different RD for the set on different PEs.
NLRI. This creates a set of 3-tuples of Route, Upstream PE, and Upstream RD. All of the routes in this set are called the “UMH Route Candidate Set”. A router must choose the best Route out of the set, which results in the “Selected UMH Route,” and the corresponding “Selected Upstream PE” and “Selected Upstream RD” [28, p. 27]. When Inter-AS methods are used the UMH and the Selected Upstream PE are different. In this case the UMH is the ASBR IP address
[28, p. 29].
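A sketch of the candidate-set construction and selection is shown below. The longest-prefix match and the tie-break (picking the numerically highest Upstream PE address) are simplifying assumptions made for the example; the specification allows other deterministic selection rules, and the route dictionaries are illustrative.

import ipaddress

def umh_candidate_set(vrf_umh_routes, c_root):
    """Build the UMH Route Candidate Set of (Route, Upstream PE, Upstream RD) tuples."""
    root = ipaddress.ip_address(c_root)
    matches = [r for r in vrf_umh_routes
               if root in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return []
    best = max(ipaddress.ip_network(r["prefix"]).prefixlen for r in matches)
    candidates = []
    for r in matches:
        if ipaddress.ip_network(r["prefix"]).prefixlen != best:
            continue
        # Upstream PE from the VRF Route Import EC, else the route's BGP Next Hop
        upstream_pe = r.get("vrf_route_import_ip") or r["bgp_next_hop"]
        candidates.append((r, upstream_pe, r["rd"]))   # RD comes from the route's NLRI
    return candidates

def select_umh(candidates):
    """Pick the Selected UMH Route (one deterministic tie-break shown)."""
    return max(candidates, key=lambda c: ipaddress.ip_address(c[1]))

routes = [{"prefix": "10.1.0.0/16", "rd": "789:100", "bgp_next_hop": "1.1.1.1",
           "vrf_route_import_ip": "1.1.1.1"},
          {"prefix": "10.1.1.0/24", "rd": "789:100", "bgp_next_hop": "1.1.1.4",
           "vrf_route_import_ip": "1.1.1.4"}]
print(select_umh(umh_candidate_set(routes, "10.1.1.1")))   # picks the /24 via PE 1.1.1.4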
3.4 Forwarding Plane Considerations
As in RFC 4364 for Unicast BGP/MPLS VPNs, RFC 6513 decouples the methods for exchanging control/routing
information from the methods for encapsulating and forwarding the traffic. The P-Tunnels supported can be encapsulated
in MPLS, IP, or GRE and can be signaled by PIM (using GRE encapsulation) and MPLS (RSVP-TE and mLDP) [28,
p. 11]. In line with the separation of control and forwarding, the PMSI is the control plane component that binds the traffic to a P-Tunnel (as a P-Tunnel can carry more than one PMSI). The P-Tunnel forwarding plane is the component that
handles the encapsulation and forwarding of the traffic through the network. In the case of MPLS the concepts discussed
in Chapter 1 are used to build the tunnel. No new extensions are required for NG-MVPN. In the case of PIM the
concepts discussed in Chapter 2 are used to build the tunnel. PIM P-Tunnels in NG-MVPN are very similar to the ones in
DR-MVPN. A PE router will use the PMSI information from the BGP A-D routes in conjunction with the PMSI Tunnel
Attribute to determine which P-Tunnel is used for a particular customer stream [2, p. 159].
3.4.1 Tunnel Type 1 - RSVP-TE P2MP LSP
Only the headend PE for an RSVP-TE LSP sends Intra-AS I-PMSI A-D Routes with the Tunnel Attribute included. All
other PEs send Intra-AS I-PMSI A-D Routes without the PMSI tunnel attributes. The headend PE, after receiving the
Intra-AS I-PMSI A-D Routes without the PMSI Attribute, will build the RSVP-TE sub-LSPs of the P2MP LSP to each
PE that originated the routes. If an S-PMSI is being used then the headend PE will send an S-PMSI A-D Route with the
“Leaf Information Required” bit set. This will result in a Leaf A-D Route and the headend router will use this to bind a
C-Flow to that S-PMSI and build the LSP. The PMSI Tunnel Attribute contains the Tunnel Type set to RSVP-TE P2MP,
the RSVP-TE P2MP Session Object, and optionally a P2MP Sender Template Object⁴ [28, p. 39–40]. Penultimate Hop
Popping (PHP) must be disabled so that the MPLS label is carried all the way to the PE. This is because the label is
used to correlate the traffic carried by the LSP to its VRF.
3.4.2 Tunnel Type 2 - mLDP P2MP LSP
When using mLDP the A-D Routes carry a PMSI Tunnel Attribute identifying the use of an mLDP P2MP LSP. The
Tunnel Identifier is set to the mLDP P2MP FEC Element [28, p. 42]. The setup process for I-PMSI and S-PMSI tunnels
is the same as the RSVP-TE case. However, the egress PE initiates the LSP construction [2, p. 248–250].
3.4.3 Tunnel Type 3 - PIM-SSM
When PIM-SSM is used to create the P-Tunnel the PMSI Tunnel Attribute states that PIM-SSM is used [28, p. 40]. The
Tunnel Identifier is the IP Address of the PE that is attached to the C-Source, which is used as the P-Source Address for
the IP/GRE encapsulation, and the P-Group Address. When S-PMSIs are being created the PE routers should have a
set of P-Group Addresses that can be used to create the tunnels [28, p. 41].
3.4.4 Tunnel Type 4 - PIM-SM
When PIM-SM is used to create the P-Tunnel the PMSI Tunnel Attribute states that PIM-SM is used and uses the
P-Group Address. The PE at the root of the shared tree sends out the Intra-AS I-PMSI A-D Routes [28, p. 41]. The
⁴ This is used to identify a particular P2MP TE LSP.
information in the Tunnel Identifier field of the PMSI Attribute is the Sender Address (the IP address of the originating
PE) and the P-Group address. The Sender Address is used as the P-Source Address for the IP/GRE encapsulation [29,
p. 12]. As is the case with PIM-SSM, when S-PMSIs are being created the PE routers should have a set of P-Group
Addresses that can be used to create the tunnels. However in the PIM-SM case each PE must have a unique set of
addresses [28, p. 41].
3.4.5 Tunnel Type 6 - Ingress Replication
In this type of P-Tunnel the ingress PE replicates C-Traffic and then puts it onto point-to-point unicast tunnels, one to each PE. IP/GRE or MPLS can be used as the tunnel technology. The PE routers still send Intra-AS I-PMSI A-D Routes. The PMSI Tunnel Attribute will identify Ingress Replication, and in this case must also carry an
MPLS label. This label is used to identify the proper VRF at the egress PE [28, p. 42–43].
3.4.6 P-Tunnel Aggregation
As mentioned earlier in the report, multiple PMSIs can be aggregated into one P-Tunnel using MPLS. In essence, an
outer tunnel is built using the processes described earlier in the report. These are built using downstream allocated labels.
This is because the downstream LSR (with traffic flowing from ingress PE to egress PE as upstream to downstream in
the context of VPN) originally advertised the label toward the upstream LSR. To support aggregation, a new concept
called “upstream label allocation” is used, which is defined in RFC 5331. In this model the upstream LSR allocates and
advertises the label being used [33, p. 1-11].
In NG-MVPN, BGP is used to send the upstream allocated label. The label is contained within the PMSI Tunnel
Attribute. Intra-AS I-PMSI [28, p. 17], Inter-AS I-PMSI [28, p. 22], and S-PMSI A-D routes [28, p. 42] all can distribute
the upstream allocated label. This MPLS label is below the downstream allocated MPLS label used to build the outer
LSP, which is the aggregate LSP. The egress PE uses this label to demultiplex the traffic to the correct VRF. The outer
LSP must advertise a regular MPLS label at the last hop. It cannot advertise an Implicit Null or Explicit Null label [28,
p. 35–38].
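The two-label lookup that results at the egress PE can be pictured with the toy sketch below; the label values and the table layout are invented for the example and simply mirror the behaviour described above.

def egress_lookup(label_stack, aggregate_lsp_label, upstream_label_to_vrf):
    """Demultiplex an aggregated P-Tunnel at the egress PE.

    label_stack is the received MPLS label stack, outermost label first.
    The outer label identifies the aggregate LSP (so it must be a real label,
    not implicit/explicit null); the next label was upstream-allocated by the
    ingress PE and selects the Multicast VRF.
    """
    outer, inner = label_stack[0], label_stack[1]
    if outer != aggregate_lsp_label:
        return None                       # not traffic from this aggregate LSP
    return upstream_label_to_vrf.get((outer, inner))

print(egress_lookup([2001, 30], 2001, {(2001, 30): "Customer-A-MVRF"}))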
3.5 Global Table Multicast
Global Table Multicast is an IETF specification, currently in draft status at the time of this writing, that uses the
NG-MVPN methodology to create multicast provider tunnels in an SP network without the use of VRFs. A common
name for the main table outside of VRFs is called the “global table,” hence the name Global Table Multicast (GTM).
GTM is sometimes also called “Internet Multicast” but the GTM IETF draft (“Global Table Multicast with BGP-MVPN
Procedures”) avoids the use of the term since the use of Internet implies that the multicast streams carried by the
provider are available to the entire public Internet.
GTM separates the network into a “core network” that is surrounded by one or more non-core parts of the network
called “attachment networks.” Between the core and attachment networks is the Protocol Border Router (PBR). The
PBR translates the protocols used in the core network (e.g. BGP) to the protocols used in the attachment network (e.g.
PIM), and it gets its name as it sits at the protocol boundary. The routers in the attachment network that attach to
the PBRs are referred to as Attachment Routers (ARs). A PBR isn’t necessarily an edge router in the PE sense, as in
NG-MVPN and regular Unicast BGP/MPLS VPNs. The PBR does mark the border of any tunnels that are used to
transport multicast traffic across the core [34, p. 4–5].
3.5.1 Use of NG-MVPN BGP Procedures in GTM
Global Table Multicast PBRs use the same procedures described in NG-MVPN for PE routers. The PE-CE Attachment Circuit (AC) should be considered to be any circuit that attaches an AR to a PBR (PBR-AR), and the backbone network in NG-MVPN should be considered to be the core network between the PBRs. Some adaptations are required [34, p. 6].
[Figure 3.4: GTM Network Topology — ARs peer with PBRs via PIM; the PBRs exchange MP-iBGP and carry a P-Tunnel across the “core”]
Figure 3.4 shows a high level diagram of the separation between the “core,” where the GTM procedures are carried out,
and the AR routers that attach to the PBRs. The AR can simply be another router within the same AS; it does not need to be a CE router, and a source can also simply connect directly to a PBR.
3.5.1.1 Route Distinguishers and Route Targets
The MCAST-VPN BGP Routes (SAFI 5 MP REACH NLRI Path Attribute) from NG-MVPN have a Route Target (RT)
field and a Route Distinguisher (RD) field in the NLRI. The RD must be set to zero.
Recall that NG-MVPN has two types of RTs: The C-Multicast RT Extended Community (EC) and the Unicast
BGP/MPLS VPN Import/Export RT. The C-Multicast RT is carried by Extended Communities in C-Multicast Shared Tree Routes, C-Multicast Source Tree Routes, and Leaf A-D Routes, and identifies the PE router
that has been selected by the route’s originator as the Upstream PE or UMH. This RT has a Global Admin Field, which
identifies the Upstream PE or UMH and a Local Admin Field which is a unique value that identifies a specific VRF. GTM
requires the use of the C-Multicast RT, however with the Local Admin field set to zero to imply that the Global Table is
being used and not a VRF. The Global Admin Field remains the same. This version of the C-Multicast RT is referred
to as the PBR-Identifying RT. The Unicast BGP/MPLS VPN Import/Export RT is optional. If this RT is used and
configured for the Global Table, then the values must match, and should be unique from any Import/Export RTs used
for NG-MVPN [34, p. 6-8].
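As a final illustration, the PBR-Identifying RT could be composed as below; the leading octets (the IPv4-address-specific Route Target encoding) and the example address are assumptions of this sketch.

import struct
import socket

def pbr_identifying_rt(pbr_ip):
    """GTM form of the C-Multicast RT: Global Administrator field = the PBR's
    IP address, Local Administrator field = 0 to mean 'global table, not a VRF'."""
    return (struct.pack("!BB", 0x01, 0x02)     # assumed IPv4-address-specific RT encoding
            + socket.inet_aton(pbr_ip)
            + struct.pack("!H", 0))

print(pbr_identifying_rt("1.1.1.7").hex())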
3.5.1.2 UMH-Eligible Routes
NG-MVPN specified that UMH-Eligible Routes use SAFI 128 (Unicast BGP/MPLS VPN) or SAFI 129 (Multicast
BGP/MPLS VPN). These are the VPN specific routes that are contained within a VPN and require the use of RTs.
GTM specifies that the UMH-Eligible Routes are of SAFI 1 (Unicast), 2 (Multicast) or 4 (MPLS Labeled), and they do
not require the use of RTs. No new procedures are required for these routes to be imported into the Global Table of a
PBR.
Recall that NG-MVPN described that the PE looks up the C-Root address (either the C-Source or the C-RP) in the
Global Table and finds the best matches and these are the UMH-Eligible Routes. This is done to determine the UMH,
Upstream PE, Upstream RD, and Source AS of the flow. GTM will use the routes of SAFI 2 if available; if not, it will use
routes from SAFI 1 or SAFI 4 (which are considered equal according to BGP best path selection). The same NG-MVPN
procedures are used to find the Selected UMH Route. The Upstream RD is always assumed to be zero.
The UMH-Eligible Routes in GTM may carry the VRF Route Import EC and/or the Source AS EC. If these are carried
then the Upstream PBR and Source AS are identified from these ECs respectively. If the UMH-Eligible Route is not
carrying the Source AS EC the AS is considered to be the local AS. If the UMH-Eligible Route does not carry the VRF
Route Import EC, then the following optional procedure is used: a PBR advertises a route to itself carrying a VRF Route
Import EC with an IP address in its Global Administrator field that is set to the same IP address as the Next Hop and the
NLRI address in that route that its advertising to itself. Refer to this as “Route R”. The PBR then advertises “Route R”
to other PBRs within the network. When a PBR looks up a route that does not contain the VRF Route Import EC it
looks up a route that contains the Next Hop, and should find “Route R” that was advertised by all of the PBRs. From
“Route R” it can determine the upstream PBR from the PBR-Identifying RT found within. Each PBR will perform this
process.
In some cases the UMH-Eligible Route can be learned outside of BGP. For example, the C-Root address may be found in
the IGP links state database, or the C-Root next-hop interface may be a Traffic Engineering tunnel [34, p. 9-12].
3.5.1.3 BGP Autodiscovery Routes
Some special considerations may be needed for the various A-D Routes [34, p. 14–17].
Intra-AS I-PMSI A-D Routes In addition to the conditions when an NG-MVPN implementation does not need to
distribute Intra-AS I-PMSI A-D Routes, GTM specifies that these routes do not need to be distributed when I-PMSIs are
not being used, and when Shared and Source Tree C-Multicast Routes never have their Next Hop field change. The changes to RD and RT usage in section 3.5.1.1 also apply.
Inter-AS I-PMSI Routes There are no additional procedures for GTM, except for the sections on RD and RT usage.
S-PMSI Routes There are no additional procedures for GTM, except for the sections on RD and RT usage.
Leaf A-D Routes There are no additional procedures for GTM, except for the sections on RD and RT usage.
Source Active A-D Routes The changes in section 3.5.1.1 apply. In NG-MVPN there is the assumption that
no two routes will have the same RD unless they come from the same PE. However in GTM the RD is always set to
zero, so all RDs will match. A special procedure is used for GTM. A PBR can attach a VRF Route Import EC to the
route. If this is the case, a BGP speaker distributing the route can change the Next Hop, otherwise the BGP speaker
may not change the Next Hop. An egress PBR that receives the route can either use the VRF Route Import EC if it is
available, or it may use the Next Hop of the originating PBR if it is not available (hence the requirement for a BGP speaker
to not change the Next Hop if there is no VRF Route Import EC for that route).
3.5.1.4 BGP C-Multicast Routes
In GTM environments when it is known in advance that the Next Hop of a route will not change as it propagates through
the BGP speakers, the procedure for creating the IP-Address-Specific RT is to just use the IP address of the Upstream
PBR in the Global Admin field of the RT. Otherwise the process from NG-MVPN is used, where the IP-Address-Specific
RT is based on the Next Hop of a Type 1 or Type 2 I-PMSI Route [34, p. 17].
3.5.2 Inclusive and Selective Tunnels
GTM allows the use of both Inclusive and Selective Tunnels. The specification does advise that using Inclusive Tunnels
should be carefully considered for reasons of scale. If there is a large set of PBRs then the exclusive use of Selective
Tunnels may be a better approach [34, p. 14].
Chapter 4
Summary
The previous two chapters explored Draft Rosen MVPNs (DR-MVPNs) and BGP/MPLS MVPNs (NG-MVPNs). Both of
these utilized concepts from Chapter 1, “Building Blocks,” which discussed the various Protocol Independent Multicast
(PIM) technologies, Border Gateway Protocol (BGP), Multiprotocol Label Switching (MPLS), and the combination of
BGP and MPLS to form Unicast Virtual Private Networks.
4.1 Compare and Contrast
Both DR-MVPNs and NG-MVPNs allow customer multicast traffic to be carried across an SP network.
Selecting the ideal method is up to the operator. If a network does not already utilize MPLS, then DR-MVPNs may be
the better choice over deploying MPLS. However, in a network that uses MPLS and already deploys Unicast BGP/MPLS
VPNs, NG-MVPNs are the better choice. NG-MVPNs are a newer technology, so networks that use older equipment
may need to use DR-MVPNs until upgrades can be made.
DR-MVPN relies heavily on PIM to set up the P-Tunnels within the Service Provider (SP) network. BGP is mainly used
for special cases, for example when PIM-SSM is used and the source needs to be advertised across the network. In
contrast, NG-MVPNs specify the use of BGP/MPLS Unicast VPNs to build the P-Tunnels.
Encapsulation in DR-MVPN uses GRE to encapsulate the customer PIM messages into a new IP packet using a different IP
address, one that is part of the SP network. NG-MVPN also allows the use of GRE but also other encapsulation methods
using MPLS as well as optionally using pre-existing point-to-point tunnels in the case of Ingress Replication.
4.2 Receiver Sites: All or Some
Both DR-MVPNs and NG-MVPNs have methods of sending traffic to all sites using the same multicast interface or
only select sites. These are Default MDTs and Data MDTs for DR-MVPNs or Inclusive PMSIs and Selective PMSIs for
NG-MVPNs. In the case of Default MDTs and Inclusive PMSIs, traffic may be sent to sites that do not have active
receivers. Considering that the point of multicast is to only send traffic to sites with active receivers these methods may
seem excessive. However, both have their place.
DR-MVPNs require the Default MDT to build the connectivity to the various sites. Data MDTs cannot send control
traffic. In this case the Default MDT is mandatory. The Data MDTs can then be used, after being signaled over the
Default MDT, to better scale larger traffic flows. NG-MVPNs allow for only Selective PMSIs to be established. Even
though an Inclusive PMSI is not mandatory for signaling, it still has uses. An example is a customer with low bandwidth
requirements. In this case there isn’t much burden being placed on the network by sending the traffic to all customer
sites, even if not all sites have active multicast receivers. Some extra use of resources is traded for
avoiding the need to add more multicast state to the SP network. The same is true for Default MDTs. Another case is
simply that all sites actually do need the traffic, in which case the Default MDT and Inclusive PMSI make sense. Both
DR-MVPNs and NG-MVPNs allow for the dynamic creation of Data MDTs or Selective PMSIs, respectively.
For both technologies, the use of Data MDTs or Selective PMSIs comes down to the operator’s scaling preferences.
NG-MVPNs can further increase scale in the SP network by allowing for the aggregation of P-Tunnels.
4.3 NG-MVPN vs GTM
The methods used in NG-MVPN were extended or modified to create Global Table Multicast (GTM). NG-MVPNs allow
for the multicast traffic to be tunneled through the network using MPLS, which allows for the multicast traffic to traverse
a network that does not have PIM or BGP in the core. The need for GTM over NG-MVPN becomes apparent in very
large networks that are also not carrying external customer traffic. In a larger network the configuration of VRFs and the
parameters for building P-Tunnels required for NG-MVPN can become burdensome. If the operator is trying to distribute
its own traffic and not customer traffic, VRFs are likely not necessary. However, the mechanics of NG-MVPNs
require information tied to VRFs to operate. With GTM these mechanics are modified so that the information isn’t
required and the Global Table of a router can be used to originate and accept the BGP MVPN routes. A good use case
for this is Internet Protocol Television (IPTV) for a cable company. The content can be originated on the company’s own
routers and does not need to be isolated or distributed to only specific routers for a specific business customer. If the
traffic is going to all of the potentially thousands of routers that terminate TV subscribers, having to build VPNs to each
router is an enormous task. With GTM this task can be eliminated and the technology simply needs to be enabled on
the edge routers connected to the source and the subscribers. The TV content can then be pushed to all routers using
efficient multicast replication through a core that does not have PIM or BGP state. Alternatively, PIM and BGP can be
configured in the core for uses independent from the VPN or MVPN services, allowing for the subscriber TV content to
be distributed independently from the core PIM and BGP instances.
4.4 Conclusion
DR-MVPNs and NG-MVPNs give an SP the ability to offer new services to its customers in a scalable manner,
while GTM builds on NG-MVPNs to allow an SP to lower operational burden if per-customer isolation is not required.
Each technology can be considered when multicast needs to be deployed in a network. With these technologies the reach of multicast extends further than ever, enabling more multicast applications to reach even more people.
References
[1] Pete Loshin. TCP/IP Clearly Explained. Morgan Kaufmann, 2002.
[2] Vinod Joseph and Srinivas Mulugu. Deploying Next Generation Multicast-Enabled Applications. Morgan Kaufmann,
2011.
[3] S. Deering. Host Extensions for IP Multicasting. RFC 1112, RFC Editor, August 1989.
[4] Daniel Minoli. IP Multicast With Applications To IPTV and Mobile DVB-H. John Wiley & Sons, Inc., 2008.
[5] D. Meyer and P. Lothberg. GLOP Addressing in 233/8. RFC 3180, RFC Editor, September 2001.
[6] H. Holbrook and B. Cain. Source-Specific Multicast for IP. RFC 4607, RFC Editor, August 2006.
[7] B. Cain, S. Deering, I. Kouvelas, B. Fenner, and A. Thyagarajan. Internet Group Management Protocol, Version 3.
RFC 3376, RFC Editor, October 2002.
[8] H. Holbrook, B. Cain, and B. Haberman. Using Internet Group Management Protocol Version 3 (IGMPv3) and
Multicast Listener Discovery Protocol Version 2 (MLDv2) for Source-Specific Multicast. RFC 4604, RFC Editor,
August 2006.
[9] W. Fenner. Internet Group Management Protocol, Version 2. RFC 2236, RFC Editor, November 1997.
[10] B. Fenner, M. Handley, H. Holbrook, and I. Kouvelas. Protocol Independent Multicast - Sparse Mode (PIM-SM):
Protocol Specification (Revised). RFC 4601, RFC Editor, August 2006.
[11] Wendell Odom, Rus Healy, and Denise Donohue. CCIE Routing and Switching Certification Guide. Cisco Press,
Fourth edition, 2010.
[12] A. Adams, J. Nicholas, and W. Siadak. Protocol Independent Multicast - Dense Mode (PIM-DM): Protocol
Specification (Revised). RFC 3973, RFC Editor, January 2005.
[13] Ina Minei and Julian Lucek. MPLS-Enabled Applications. John Wiley & Sons, Inc., Third edition, 2011.
[14] IJ. Wijnands, I. Minei, K. Kompella, and B. Thomas. Label Distribution Protocol Extensions for Point-to-Multipoint
and Multipoint-to-Multipoint Label Switched Paths. RFC 6388, RFC Editor, November 2011.
[15] R. Aggarwal, D. Papadimitriou, and S. Yasukawa. Extensions to Resource Reservation Protocol - Traffic Engineering
(RSVP-TE) for Point-to-Multipoint TE Label Switched Paths (LSPs). RFC 4875, RFC Editor, May 2007.
[16] Russ White, Danny McPherson, and Srihari Sangli. Practical BGP. Pearson Education, Inc., 2005.
[17] Y. Rekhter, T. Li, and S. Hares. A Border Gateway Protocol 4 (BGP-4). RFC 4271, RFC Editor, January 2006.
[18] T. Bates, R. Chandra, D. Katz, and Y. Rekhter. Multiprotocol Extensions for BGP-4. RFC 4760, RFC Editor,
January 2007.
[19] E. Rosen and Y. Rekhter. BGP/MPLS IP Virtual Private Networks (VPNs). RFC 4364, RFC Editor, February 2006.
[20] Peter Tomsu and Gerhard Wieser. MPLS-Based VPNs. Prentice Hall PTR, 2002.
[21] Randy Zhang and Micah Bartell. BGP Design and Implementation. Cisco Press, 2004.
[22] S. Sangli, D. Tappan, and Y. Rekhter. BGP Extended Communities Attribute. RFC 4360, RFC Editor, February
2006.
[23] Y. Rekhter and E. Rosen. Carrying Label Information in BGP-4. RFC 3107, RFC Editor, May 2001.
[24] Ivan Pepelnjak and Jim Guichard. MPLS and VPN Architectures. Cisco Press, 2000.
[25] Luc De Ghein. MPLS Fundamentals. Cisco Press, 2006.
[26] D. Farinacci, T. Li, S. Hanks, D. Meyer, and P. Traina. Generic Routing Encapsulation (GRE). RFC 2784, RFC
Editor, March 2000.
[27] E. Rosen, Y. Cai, and I. Wijnands. Cisco Systems Solution for Multicast in BGP/MPLS IP VPNs. RFC 6037, RFC
Editor, October 2010.
[28] E. Rosen and R. Aggarwal. Multicast in BGP/MPLS IP VPNs. RFC 6513, RFC Editor, February 2012.
[29] R. Aggarwal, E. Rosen, T. Morin, and Y. Rekhter. BGP Encodings and Procedures for Multicast in BGP/MPLS IP
VPNs. RFC 6514, RFC Editor, February 2012.
[30] Understanding JUNOS OS Next-Generation Multicast VPNs. https://kb.juniper.net/library/CUSTOMERSERVICE/GLOBAL_JTAC/technotes/2000320-en.pdf, January 2014. Accessed: July 15, 2014.
[31] NG MVPN BGP Route Types and Encodings. http://www.juniper.net/us/en/local/pdf/app-notes/3500142-en.pdf, 2010. Accessed: July 15, 2014.
[32] Personal Communication, July 2014. Jeffrey Zhang, Juniper Networks.
[33] R. Aggarwal, Y. Rekhter, and E. Rosen. MPLS Upstream Label Assignment and Context-Specific Label Space.
RFC 5331, RFC Editor, August 2008.
[34] J. Zhang, L. Giuliano, E. Rosen, Karthik Subramanian, D. Pacella, and J. Schiller. Global Table Multicast with
BGP-MVPN Procedures. Draft 04, IETF Tools, May 2014.