* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Document
Point-to-Point Protocol over Ethernet wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
Asynchronous Transfer Mode wikipedia , lookup
Network tap wikipedia , lookup
Internet protocol suite wikipedia , lookup
Backpressure routing wikipedia , lookup
Computer network wikipedia , lookup
List of wireless community networks by region wikipedia , lookup
Spanning Tree Protocol wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Deep packet inspection wikipedia , lookup
Airborne Networking wikipedia , lookup
Wake-on-LAN wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
IEEE 802.1aq wikipedia , lookup
Chapter 4 Internetworking 4.1 Simple Internetworking (IP) 4.2 Routing 4.3 Global Internet 4.4 Multicast 4.5 Multiprotocol Label Switching (MPLS) 1 4.1 Simple Internetworking (IP) Best Effort Service Model Global Addressing Scheme ARP (Address Resolution Protocol ICMP (Internet Message Control Protocol) 2 IP Internet Netw ork 1 (Ethernet) Concatenation of Networks H1 H2 H7 H3 R3 H8 Netw ork 4 (point-to-point) Netw ork 2 (Ethernet) R1 R2 H4 Protocol Stack Netw ork 3 (FDDI) H6 H5 H1 H8 TCP R1 IP ETH R2 IP ETH R3 IP FDDI FDDI IP PPP PPP TCP IP ETH ETH 3 Service Model Connectionless (datagram-based) Best-effort delivery (unreliable service) packets are lost packets are delivered out of order duplicate copies of a packet are delivered packets can be delayed for a long time Datagram format 0 4 Version 8 HLen 16 TOS 31 Length Ident TTL 19 Flags Protocol Offset Checksum SourceAddr DestinationAddr Options (variable) Pad (variable) Data 4 Fragmentation and Reassembly Each network has some MTU Design decisions fragment when necessary (MTU < Datagram) try to avoid fragmentation at source host re-fragmentation is possible fragments are self-contained datagrams use CS-PDU (not cells) for ATM delay reassembly until destination host do not recover from lost fragments 5 Start of header Ident = x Example 0 Offset = 0 Rest of header (a) 1400 data bytes Start of header Ident = x 1 Offset = 0 Rest of header H1 R1 R1 R2 R2 R3 R3 H8 512 data bytes (b) Start of header ETH IP (1400) FDDI IP (1400) Ident = x 1 Offset = 64 PPP IP (512) ETH IP (512) PPP IP (512) ETH IP (512) Rest of header PPP IP (376) ETH IP (376) 512 data bytes Start of header Ident = x 0 Offset = 128 Rest of header 376 data bytes 6 Global Addresses Properties globally unique hierarchical: network + host Dot Notation 10.3.2.4 128.96.33.81 192.12.69.77 7 (a) 0 24 Netw ork Host 14 (b) 1 0 16 Netw ork Host 21 (c) 1 1 0 Netw ork 8 Host 7 Datagram Forwarding Strategy every datagram contains destination’s address if connected to destination network, then forward to host if not directly connected, then forward to some router forwarding table maps network number into next hop each host has a default router each router maintains a forwarding table Example (R2) Network Number 1 2 3 4 Next Hop R3 R1 interface 1 interface 0 8 Address Translation Map IP addresses into physical addresses destination host next hop router Techniques encode physical address in host part of IP address table-based ARP table of IP to physical address bindings broadcast request if IP address not in table target machine responds with its physical address table entries are discarded if not refreshed 9 ARP Details Request Format HardwareType: type of physical network (e.g., Ethernet) ProtocolType: type of higher layer protocol (e.g., IP) HLEN & PLEN: length of physical and protocol addresses Operation: request or response Source/Target-Physical/Protocol addresses Notes table entries timeout in about 10 minutes update table with source when you are the target update table if already have an entry do not refresh table entries upon reference 10 ARP Packet Format 0 8 16 Hardware type = 1 HLen = 48 PLen = 32 31 ProtocolType = 0x0800 Operation SourceHardwareAddr (bytes 0― 3) SourceHardwareAddr (bytes 4- 5) SourceProtocolAddr (bytes 0 - 1) SourceProtocolAddr (bytes 2 - 3) TargetHardwareAddr (bytes 0 – 1) TargetHardwareAddr (bytes 2 - 5) TargetProtocolAddr (bytes 0 - 3) 11 Internet Control Message Protocol (ICMP) Echo (ping) Redirect (from router to source host) Destination unreachable (protocol, port, or host) TTL exceeded (so datagrams don’t cycle forever) Checksum failed Reassembly failed Cannot fragment 12 Redirect G1 Network (1) H1 Network (2) G2 Network H2 G2 finds that H1 is directly connected and will inform H1 to redirect the IP datagrams to G2. 4.2 Routing Forwarding vs Routing forwarding: to select an output port based on destination address and routing table routing: process by which routing table is built Network as a Graph A 3 4 C 6 1 2 1 B 9 E F 1 D Problem: Find lowest cost path between two nodes Factors static: topology dynamic: load 14 Distance Vector Each node maintains a set of triples (Destination, Cost, NextHop) Directly connected neighbors exchange updates periodically (on the order of several seconds) whenever table changes (called triggered update) Each update is a list of pairs: (Destination, Cost) Update local table if receive a “better” route smaller cost came from next-hop Refresh existing routes; delete if they time out 15 Routing Table Example (Node B) B C A D E F G Destination Cost NextHop A 1 A C 1 C D 2 C E 2 A F 2 A G 3 A 16 Routing Loops Example 1 F detects that link to G has failed F sets distance to G to infinity and sends update to A A sets distance to G to infinity since it uses F to reach G A receives periodic update from C with 2-hop path to G A sets distance to G to 3 and sends update to F F decides it can reach G in 4 hops via A B C A D E F G 17 Routing Loops Example 2 link from A to E fails A advertises distance of infinity to E B and C advertise a distance of 2 to E B decides it can reach E in 3 hops; advertises this to A A decides it can read E in 4 hops; advertises this to C C decides that it can reach E in 5 hops… B C A D E F G 18 Distance Vector: link cost changes Link cost changes: node detects local link cost change 1 updates routing info, recalculates distance vector if DV changes, notify neighbors “good news travels fast” x 4 y 50 1 z At time t0, y detects the link-cost change, updates its DV, and informs its neighbors. At time t1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV. At time t2, y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z. 19 Distance Vector: link cost changes “good news Travels fast” Dy 1 x 4 y 50 1 z algorithm terminates Dz 20 Distance Vector: link cost changes Link cost changes: bad news travels slow - “count to infinity” problem! 44 iterations before algorithm stabilizes z (y) does not know that the least distance from y (z) 60 X 4 Y 1 Z 50 to x that y (z) tells z (y) is the distance of the path yz-y-x (z-y-x) algorithm continues on! 21 Distance Vector: poisoned reverse If Z routes through Y to get to X : Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z) will this completely solve count to infinity problem? Loops involving three or more nodes cannot be solved using the technique 60 X 4 Y 50 1 Z algorithm terminates 22 RIP ( Routing Information Protocol) Distance vector algorithm Included in BSD-UNIX Distribution in 1982 Distance metric: # of hops (max = 15 hops) Source node: A u v A z C B D w x y destination hops u 1 v 2 w 2 x 3 y 3 z 2 23 RIP advertisements Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement) Each advertisement: a list of up to 25 destination subnets within AS 0 8 Command 16 Version Family of net 1 31 Must be zero Address of net 1 Address of net 1 Distance to net 1 Family of net 2 Address of net 2 Address of net 2 Distance to net 2 24 RIP: Example z w A x D B y C Destination Network w y z x …. Next Router Num. of hops to dest. …. .... A B B -- 2 2 7 1 Routing table in D 25 RIP: Example Dest w x z …. Next C … w hops 4 ... A Advertisement from A to D z x Destination Network w y z x …. D B C y Next Router Num. of hops to dest. …. .... A B B A -- Routing table in D 2 2 7 5 1 26 RIP: Link Failure and Recovery If no advertisement heard after 180 sec --> neighbor or link declared dead routes via neighbor invalidated new advertisements sent to neighbors neighbors in turn send out new advertisements (if tables changed) link failure info quickly propagates to entire net poison reverse used to prevent ping-pong loops (infinite distance = 16 hops) 27 RIP Table processing RIP routing tables managed by application-level process called route-d (daemon) advertisements sent in UDP packets, periodically repeated routed routed Transprt (UDP) network (IP) link physical Transprt (UDP) forwarding table forwarding table network (IP) link physical 28 Link State Strategy send to all nodes (not just neighbors) information about directly connected links (not entire routing table) Link State Packet (LSP) id of the node that created the LSP cost of link to each directly connected neighbor sequence number (SEQNO) time-to-live (TTL) for this packet 29 Link State (cont) Reliable flooding store most recent LSP from each node forward LSP to all nodes but one that sent it generate new LSP periodically increment SEQNO start SEQNO at 0 when reboot decrement TTL of each stored LSP discard when TTL=0 30 Reliable Flooding X A C B D X A C B (a) X A C B (c) D (b) D X A C B D (d) 31 Route Calculation Dijkstra’s shortest path algorithm Let N denotes set of nodes in the graph l (i, j) denotes non-negative cost (weight) for edge (i, j) s denotes this node M denotes the set of nodes incorporated so far C(n) denotes cost of the path from s to node n M = {s} for each n in N - {s} C(n) = l(s, n) while (N != M) M = M union {w} such that C(w) is the minimum for all w in (N - M) for each n in (N - M) C(n) = MIN(C(n), C (w) + l(w, n )) 32 A Link-State Routing Algorithm Dijkstra’s algorithm net topology, link costs known to all nodes accomplished via “link state broadcast” all nodes have same info computes least cost paths from one node (‘source”) to all other nodes gives forwarding table for that node iterative: after k iterations, know least cost path to k destinations Notation: c(x,y): link cost from node x to y; = ∞ if not direct neighbors D(v): current value of cost of path from source to destination v p(v): predecessor node along path from source to v N': set of nodes whose least cost path definitively known 33 Dijsktra’s Algorithm 1 Initialization: u: source node 2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u,v) 6 else D(v) = ∞ 7 8 Loop 9 find w not in N' such that D(w) is a minimum 10 add w to N' 11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N' 34 Dijkstra’s algorithm: example Step 0 1 2 3 4 5 N' u ux uxy uxyv uxyvw uxyvwz D(v),p(v) D(w),p(w) 2,u 5,u 2,u 4,x 2,u 3,y 3,y D(x),p(x) 1,u D(y),p(y) ∞ 2,x D(z),p(z) ∞ ∞ 4,y 4,y 4,y 5 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 35 Dijkstra’s algorithm: example 5 5 2 u 3 v 2 1 x w 3 5 2 y 1 z 1 2 u 2 1 2 u 2 1 x w 5 z 1 3 x 2 y 1 5 5 v 3 v 3 w 3 1 5 z 1 y 2 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 36 Dijkstra’s algorithm: example 5 5 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 37 Dijkstra’s algorithm, discussion Algorithm complexity: n nodes each iteration: need to check all nodes, w, not in N n(n+1)/2 comparisons: O(n2) more efficient implementations possible: O(nlogn) Oscillations possible: e.g., link cost = amount of carried traffic D 1 1 0 A 0 0 C e 1+e e initially B 1 2+e A 0 D 1+e 1 B 0 0 C … recompute routing 0 D 1 A 0 0 C 2+e B 1+e … recompute 2+e A 0 D 1+e 1 B e 0 C … recompute 38 OSPF (Open Shortest Path First) “open”: publicly available – defined in RFC 2328 Uses Link State algorithm Link-State packet dissemination Topology map at each node Route computation using Dijkstra’s algorithm OSPF advertisement carries one entry per neighbor router Advertisements disseminated to entire AS (via flooding) Carried in OSPF messages directly over IP (rather than TCP or UDP) 39 OSPF “advanced” features (not in RIP) Security: all OSPF messages authenticated (to prevent malicious intrusion) Load Balancing: Multiple same-cost paths allowed (only one path in RIP) For each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort; high for real time) Integrated uni- and multicast support: Multicast OSPF (MOSPF) uses same topology data base as OSPF Hierarchical OSPF in large domains. 40 Hierarchical OSPF An OSPF autonomous system (AS) can be configured into areas Exactly one OSPF area in the AS is configured to be the backbone area Each area runs its own OSPF link-state routing algorithm Two-level hierarchy: local area, backbone. Link-state advertisements only in area each nodes has detailed area topology; only know direction (shortest path) to nets in other areas. 41 Hierarchical OSPF 42 Hierarchical OSPF Four types of routers Internal routers: perform only intra AS routing Area border routers: belong to both an area and the backbone Backbone routers: run OSPF routing limited to backbone. Boundary routers: connect to other AS’s. 43 OSPF Advertisement Format 0 8 16 31 LS Age Version Type Message length SourceAddr AreaId Checksum Authentication type Authentication Options Link-state ID Advertising router Type=1 LS sequence number LS checksum Length 0 Flags 0 Number of links Link ID Link data Link type Num_TOS Metric Optional TOS information More links Header Format Link-State Advertisement 44 Comparison of LS and DV algorithms Message complexity LS: with n nodes, E links, O(nE) messages sent DV: exchange between neighbors only convergence time varies Speed of Convergence LS: O(n2) algorithm requires O(nE) messages may have oscillations DV: convergence time varies may be routing loops count-to-infinity problem Robustness: what happens if router malfunctions? LS: node can advertise incorrect link cost each node computes only its own table DV: DV node can advertise incorrect path cost each node’s table used by others error propagate thru network 45 Metrics Original ARPANET metric measures number of packets queued on each link took neither latency or bandwidth into consideration New ARPANET metric stamp each incoming packet with its arrival time (AT) record departure time (DT) when link-level ACK arrives, compute Delay = (DT - AT) + Transmit + Latency if timeout, reset DT to departure time for retransmission link cost = average delay over some time period 46 Metrics Still has problems Under light load, it works well since the two static factors of delay dominated the cost. Under heavy load, a congested link would start to advertise a very high cost. This caused all the traffic to move off that link, leaving it idle, so then it advertise a low cost,… The range of link values was much too large. Fine Tuning compressed dynamic range replaced Delay with link utilization 47 Revised ARPANET routing metric versus link utilization 225 9.6-Kbps satellite link 9.6-Kbps terrestrial link 140 56-Kbps satellite link 56-Kbps terrestrial link 90 75 60 30 25% 50% Utilization 75% 100% 48 Revised ARPANET routing metric versus link utilization A highly loaded link never shows a cost of more than three times its cost when idle The most expensive link is only seven times the cost of least expensive A high-speed satellite link is more attractive than a low-speed terrestrial link Cost is a function of link utilization only at moderate to high loads. 49 4.3 Global Internet Structure Tree Structure of the Internet in 1990 NSFNET backbone Stanford BARRNET regional Berkeley Westnet regional PARC MidNet regional ■■■ UNM NCAR ISU UNL KU UA 50 Global Internet One of the salient features of this topology is that it consists of “end user” sites (e.g, Stanford university) that connect to “service provider” networks (e.g, BARRNET) Each provider and end user is likely to be an administratively independent entity – Autonomous System (AS). Scalability problems Scalability of routing Address utilization Subnetting – deals with address space utilization Classless routing or supernetting – tackles both address utilization and routing scalability 51 Subnetting Inefficient use of Hierarchical Address Space class C with 2 hosts (2/255 = 0.78% efficient) class B with 256 hosts (256/65535 = 0.39% efficient) Still Too Many Networks routing tables do not scale route propagation protocols do not scale Subnetting provides an elegantly simple way to reduce the total number of networks that are assigned The idea is to take a single IP network number and allocate the IP addresses with that network number to several physical networks – subnets. 52 Subnetting Add another level to address/routing hierarchy: subnet Subnet masks define variable partition of host part A single network number can be shared among multiple networks involves configuring all the nodes on each subnet with a subnet mask. Subnets visible only within site Netw ork number Host number Class B address 111111111111111111111111 00000000 Subnet mask (255.255.255.0) Netw ork number Subnet ID Host ID 53 Subnetted address Subnet Example H1 H2 255.255.255.128 128.96.34.139 128.96.34.128 Subnet mask: 255.255.255.128 Subnet number: 128.96.34.0 128.96.34.15 128.96.34.1 R1 H1 Subnet mask: 255.255.255.128 Subnet number: 128.96.34.128 128.96.34.130 128.96.34.139 128.96.34.129 H3 R2 H2 R1 255.255.255.128 128.96.34.139 128.96.34.128 128.96.33.1 128.96.33.14 Forwarding table at router R1 Subnet mask: 255.255.255.0 Subnet number: 128.96.33.0 Subnet Number 128.96.34.0 128.96.34.128 128.96.33.0 Subnet Mask 255.255.255.128 255.255.255.128 255.255.255.0 Next Hop interface 0 interface 1 R2 54 Forwarding Algorithm D = destination IP address for each entry (SubnetNum, SubnetMask, NextHop) D1 = SubnetMask & D if D1 = SubnetNum if NextHop is an interface deliver datagram directly to D else deliver datagram to NextHop Use a default router if nothing matches Not necessary for all 1s in subnet mask to be contiguous Can put multiple subnets on one physical network Subnets not visible from the rest of the Internet 55 Classless Routing (CIDR) Supernetting CIDR: Classless Inter-Domain Routing A technique that addresses two scaling concerns: the growth of backbone routing tables, and the potential for the 32-bit IP address space to be exhausted well before the 4 billionth host is attached to the Internet. Even though subnetting can help to assign addresses carefully, it does not get around the fact that any AS with more than 255 hosts wants a class B address – exhaustion of IP address space. 56 Classless Routing (CIDR) Supernetting CIDR tries to balance the desire to minimize the number of routes that a router needs to know against the need to hand out addresses efficiently Assign block of contiguous network numbers to nearby networks Represent blocks with a single pair (first_network_address, count) Restrict block sizes to powers of 2 Use a bit mask (CIDR mask) to identify block size All routers must understand CIDR addressing 57 Route aggregation with CIDR Customers 128.112.128/24 Advertise 128.112.128/21 ISP .. . 128.112.135/24 Since all of the customers are reachable through the same Provider network, it can advertise a single route to all of Them by just advertising the common 21-bit prefix they share 58 IP Forwarding Revisited Find the network number in a packet and then lookup that number in a forwarding table. Reexamine this assumption with CIDR Prefixes length 2-32 bits Prefixes may “overlap” Some addresses may match more than one prefix. Longest Prefix Matching (LPM) For example 171.69 (16-bit prefix) 171.69.10 (24-bit prefix) 171.69.10.5 matches both 171.69.20.5 only matches 171.69 59 Interdomain Routing (BGP) • AS = routing domain • Routing Policies • Two major Interdomain routing protocols -- Exterior gateway Protocol (EGP) -- Border gateway Protocol (BGP-4) R1 R3 R2 Autonomous system 1 R4 Autonomous system 2 R5 R6 60 BGP-4: Border Gateway Protocol AS Types stub AS: has a single connection to one other AS carries local traffic only multihomed AS: has connections to more than one AS refuses to carry transit traffic transit AS: has connections to more than one AS carries both transit and local traffic Each AS has: one or more border routers one BGP speaker that advertises: local networks other reachable networks (transit AS only) gives path information 61 Today’s multibackbone Internet Large corporation “Consumer” ISP Peering point Backbone service provider “Consumer” ISP Large corporation Peering point “Consumer” ISP Small corporation 62 BGP Example Speaker for AS2 advertises reachability to P and Q network 128.96, 192.4.153, 192.4.32, and 192.4.3, can be reached directly from AS2 Customer P (AS 4) 128.96 192.4.153 Customer Q (AS 5) 192.4.32 192.4.3 Customer R (AS 6) 192.12.69 Customer S (AS 7) 192.4.54 192.4.23 Regional provider A (AS 2) Backbone netw ork (AS 1) Regional provider B (AS 3) Speaker for backbone advertises networks 128.96, 192.4.153, 192.4.32, and 192.4.3 can be reached along the path (AS1, AS2). Speaker can cancel previously advertised paths 63 Internet inter-AS routing: BGP BGP (Border Gateway Protocol): the de facto standard BGP provides each AS a means to: 1. Obtain subnet reachability information from neighboring ASs. 2. Propagate the reachability information to all routers internal to the AS. 3. Determine “good” routes to subnets based on reachability information and policy. Allows a subnet to advertise its existence to rest of the Internet: “I am here” 64 BGP basics Pairs of routers (BGP peers) exchange routing information over semi-permanent TCP connections: BGP sessions Note that BGP sessions do not correspond to physical links. When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix. AS2 can aggregate prefixes in its advertisement 3c 3a 3b AS3 1a AS1 2a 1c 1d 1b 2c AS2 2b External BGP (eBGP) session Internal BGP (iBGP) session 65 Aggregation of prefixes 138.16.64/24 138.16.65/24 138.16.66/24 => 138.16.64/22 138.16.67/24 66 Distributing reachability info With eBGP session between 3a and 1c, AS3 sends prefix reachability information to AS1. 1c can then use iBGP to distribute this new prefix reachability information to all routers in AS1 1b can then re-advertise the new reachability information to AS2 over the 1b-to-2a eBGP session When router learns about a new prefix, it creates an entry for the prefix in its forwarding table. 3c 3a 3b AS3 1a AS1 2a 1c 1d 1b 2c AS2 2b eBGP session iBGP session 67 Path attributes & BGP routes When advertising a prefix, advertisement includes BGP attributes. prefix + attributes = “route” Two important attributes: AS-PATH: contains the ASs through which the advertisement for the prefix passed: AS 67 AS 17 used to detect and prevent looping advertisement also use in choosing among multiple path to the same prefix NEXT-HOP: Indicates the specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hopAS.) When gateway router receives route advertisement, uses import policy to accept/decline. 68 BGP route selection Router may learn about more than 1 route to any one prefix. Router must select route. Elimination rules invoked sequentially until one route remains: 1. Local preference value attribute: policy decision – AS’s network administrator 2. Shortest AS-PATH 3. Closest NEXT-HOP router: hot potato routing 4. Additional criteria 69 BGP messages BGP messages exchanged using TCP. BGP messages: OPEN: opens TCP connection to peer and authenticates sender UPDATE: advertises new path (or withdraws old) KEEPALIVE keeps connection alive in absence of UPDATES; also ACKs OPEN request NOTIFICATION: reports errors in previous message; also used to close connection 70 BGP routing policy legend: B W provider network X A customer network: C Y Figure 4.5-BGPnew: a simple BGP scenario A,B,C are provider networks X,W,Y are customer (of provider networks) X is dual-homed: attached to two networks X does not want to route from B via X to C .. so X will not advertise to B a route to C 71 BGP routing policy (2) legend: B W provider network X A customer network: C Y Figure 4.5-BGPnew: a simple BGP scenario A advertises to B the path AW B advertises to X the path BAW Should B advertise to C the path BAW? No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers B wants to force C to route to w via A B wants to route only to/from its customers! 72 Why different Intra- and Inter-AS routing ? Policy: Inter-AS: administrator wants control over how its traffic routed, who routes through its net. Intra-AS: single admin, so no policy decisions needed Scale: hierarchical routing saves table size, reduced update traffic Performance: Intra-AS: can focus on performance Inter-AS: policy may dominate over performance 73 IP Version 6 Features 128-bit addresses (classless) multicast real-time service authentication and security autoconfiguration end-to-end fragmentation protocol extensions Header 40-byte “base” header extension headers (fixed order, mostly fixed length) fragmentation source routing authentication and security other options 74 4.4 Broadcast/Multicast routing Broadcast routing –- deliver a packet from a source node to all other nodes Multicast routing – deliver a packet from a source node to a subset of other nodes 75 Source-duplication versus in-network duplication duplicate R1 duplicate creation/transmission R1 duplicate R2 R2 R3 R4 (a) R3 R4 (b) (a) source duplication, (b) in-network duplication 76 Broadcast routing algorithms Uncontrolled flooding Controlled flooding Spanning-tree broadcast 77 Uncontrolled flooding The source node sends a copy of the packet to all of its neighbors When a node receives a broadcast packet, it duplicates the packet and forwards it to all of its neighbors (except the neighbor from which it receives the packet) Problems: If the graph has cycles, then one or more copies of each broadcast packet will cycle indefinitely Broadcast storm 78 Controlled flooding Sequence-number-controlled flooding Source node puts its address and a broadcast sequence number into a broadcast packet Each node maintains a list of the source address and sequence number of each packet it has received When a node receives a broadcast packet If the packet is in the list, the packet is dropped Otherwise, the packet is duplicated and forwarded 79 Controlled flooding Reverse path forwarding When a router receives a broadcast packet, it duplicates and forwards the packet only if the packet arrives on the link that is on its own shortest unicast path back to the source A B c F E Packet will be forwarded D G Packet not forwarded beyond receiving router 80 Controlled flooding Drawback Some of the nodes receive redundant packets A B c F E D G Redundant packets Ideally, every node should receive only one copy of the broadcast packet. 81 Spanning-tree broadcast Spanning tree – a tree that contains all nodes in a graph Minimum spanning tree – a spanning tree whose cost is the minimum among all the spanning trees of a graph Broadcast along a spanning tree A B c F A E B c D F G (a) Broadcast initiated at A E D G (b) Broadcast initiated at D 82 Construction of Spanning-tree Many algorithms have been developed Center-based approach Select a center node (rendezvous or core) Each node unicasts tree-join message to the center node A A 3 B c 4 F 2 E 1 Center node B c D F 5 E D G G (a) Stepwise construction of spanning tree (b) Constructed spanning tree 83 Multicast Routing: Problem Statement Goal: find a tree (or trees) connecting routers having local multicast group members tree: not all paths between routers used source-based: different tree from each sender to receivers shared-tree: same tree used by all group members Source-based trees Shared tree 84 Approaches for building multicast trees source-based tree: one tree per source shortest path trees reverse path forwarding group-shared tree: group uses one tree minimal spanning (Steiner) center-based trees …we first look at basic approaches, then specific protocols adopting these approaches 85 Shortest Path Tree multicast forwarding tree: tree of shortest path routes from source to all receivers Dijkstra’s algorithm S: source LEGEND R1 1 2 R4 R2 3 R3 router with attached group member 5 4 R6 router with no attached group member R5 6 R7 i link used for forwarding, i indicates order link added by algorithm 86 Reverse Path Forwarding rely on router’s knowledge of unicast shortest path from it to sender each router has simple forwarding behavior: if (multicast datagram received on incoming link on shortest path back to sender) then flood datagram onto all outgoing links else ignore datagram 87 Reverse Path Forwarding: example S: source LEGEND R1 R4 router with attached group member R2 R5 R3 R6 R7 router with no attached group member datagram will be forwarded datagram will not be forwarded • result is a source-specific reverse SPT – may be a bad choice with asymmetric links 88 Reverse Path Forwarding: pruning forwarding tree contains subtrees with no multicast group members no need to forward datagrams down subtree “prune” messages sent upstream by router with no downstream group members LEGEND S: source R1 router with attached group member R4 R2 P R5 R3 R6 P R7 P router with no attached group member prune message links with multicast forwarding 89 Shared-Tree: Steiner Tree Steiner Tree: minimum cost tree connecting all routers with attached group members problem is NP-complete excellent heuristics exists not used in practice: computational complexity information about entire network needed monolithic: rerun whenever a router needs to join/leave 90 Center-based trees single delivery tree shared by all one router identified as “center” of tree to join: edge router sends unicast join-message addressed to center router join-message “processed” by intermediate routers and forwarded towards center join-message either hits existing tree branch for this center, or arrives at center path taken by join-message becomes new branch of tree for this router 91 Center-based trees: an example Suppose R6 chosen as center: LEGEND R1 3 R2 router with attached group member R4 2 R5 R3 1 R6 1 router with no attached group member path order in which join messages generated R7 92 Internet Multicasting Routing: DVMRP DVMRP: distance vector multicast routing protocol, RFC1075 flood and prune: source-based tree, reverse path forwarding, RPF tree based on DVMRP’s own routing tables constructed by communicating DVMRP routers no assumptions about underlying unicast initial datagram to multicast group flooded everywhere via RPF routers not wanting group: send upstream prune messages 93 DVMRP: continued… soft state: DVMRP router periodically (1 min.) “forgets” branches are pruned: multicast data again flows down unpruned branch downstream router: reprune or else continue to receive data routers can quickly regraft to tree following IGMP join at leaf odds and ends commonly implemented in commercial routers Mbone routing done using DVMRP 94 Tunneling Q: How to connect “islands” of multicast routers in a “sea” of unicast routers? physical topology logical topology multicast datagram encapsulated inside “normal” (non-multicast- addressed) datagram normal IP datagram sent thru “tunnel” via regular IP unicast to receiving multicast router receiving multicast router decapsulates to get multicast datagram 95 PIM: Protocol Independent Multicast Not dependent on any specific underlying unicast routing algorithm (like RIP, OSPF, works with all) Two different multicast distribution scenarios : Dense: Sparse: group members # of routers with group densely packed, in “close” proximity. members is small wrt total # of routers group members “widely dispersed” 96 Consequences of Sparse-Dense Dichotomy: Dense Sparse: group membership by no membership until routers explicitly join routers assumed until routers explicitly prune receiver-driven data-driven construction construction of multicast tree (e.g., center-based) of multicast tree (e.g., RPF) bandwidth and nongroup-router processing bandwidth and nonconservative group-router processing profligate 97 PIM- Dense Mode Flood-and-prune RPF, similar to DVMRP but underlying unicast protocol provides RPF information for incoming datagram less complicated (less efficient) downstream flood than DVMRP reduces reliance on underlying routing algorithm has protocol mechanism for router to detect if it is a leaf-node router 98 PIM - Sparse Mode Center-based approach R1 router sends join message to rendezvous point (RP) intermediate routers update state and forward join after joining via RP, router can switch to source-specific tree R4 join R2 R3 join R5 join R6 all data multicast from rendezvous point R7 rendezvous point 99 PIM - Sparse Mode Sender(s): unicast data to RP, which distributes down RP-rooted tree RP can extend multicast tree upstream to source RP can send stop message to the source if no attached receivers R1 R4 join R2 R3 join R5 join R6 all data multicast from rendezvous point R7 rendezvous point “no one is listening!” 100 4.5 MultiProtocol Label Switching (MPLS) Prior Work MPLS Overview MPLS Architecture Prior Work Tag Switching (Cisco) Aggregate Route-Based IP Switching (ARIS, IBM) IP Navigator IFMP-IP Switching (Ipsilon) Cell Switching Router (CSR, Toshiba) 102 Prior Work Tag switching is based on the control-driven approach. The set up of LSPs (Label Switched Paths) closely follows control messages such as routing updates and RSVP messages. Aggregate route-based IP switching (ARIS) is based on the control-driven approach. Very similar to tag switching. ARIS introduces the concept of an “egress identifier” (FECs) to express the granularity of LSPs. IP Navigator is again a control-driven protocol. Use OSPF as the internal routing protocol used within a routing domain. Explicit routing is used to setting up the VCs. 103 Prior Work Ipsilon Flow Management Protocol (IFMP) is a traffic driven protocol. When the number of packets from a flow exceeds a predetermined threshold, the controller uses IFMP to set up an LSP for the particular flow. Cell switch router (CSR) proposal is similar to IP switching. CSR is primarily designed as a device for interconnecting ATM clouds. Within an LIS (logical IP subnet), ATM forum standards are used to connection hosts and switched together. Multiple LISs are then interconnected with CSRs that are capable of running both IP forwarding and cell forwarding. The setup of LSPs is data-driven for best effort traffic and RSVPdriven for flows that require resource reservation. 104 MPLS Overview RFC 3812 The IETF MPLS working group is to standardize a base technology that integrates the label swapping forwarding paradigm with network layer routing. Cisco is the major contributor to the MPLS working group. substitute “Label” for “Tag” in Tag Switching @ MPLS Core mechanisms of MPLS Semantics assigned to a stream label Labels are associated with specific streams of data. Forwarding Methods Forwarding is simplified by the use of the short fixed length labels to identify streams. Forwarding may require simple functions such as looking up a label in a table, swapping labels, and possibly decrementing and checking a TTL. Label Distribution Methods Allow nodes to determine which labels to use for specific streams. Native IP Forwarding IP routing: both the packet forwarding and route determination process in an IP network. Native IP forwarding (NIF): hop-by-hop, destination-based packet forwarding. Each packet’s next hop and output port are determined by a longest-prefix-match forwarding table lookup. Additional packet classification may also be performed to derive output port queuing and scheduling rules. 107 A Simplified NIF forwarding engine Longest Prefix Match lookup Forwarding Table Next hop + port Packet Classification Input Ports IP Header Queuing and Scheduling rules Output Ports IP payload Packet Classification keys: IP source and destination addresses, IP protocol type, DiffServ (DS) or TOS byte, and TCP/UDP port numbers. 108 Per-Hop classification, queuing, and scheduling Classify Queue Port 1 Port N S Port M 109 A Simplified LSR forwarding engine Switching Table Next hop + port Queuing and Scheduling rules Output Ports Input Ports MPLS label MPLS payload 110 Traffic Engineering Conventional IP routing attempts to find the shortest path between a packet’s current location and its intended destination. “Hot spots” and packet loss rates, latency, and jitter increase as the average load on a router rises. Solutions: (1) Faster routers, (2) Alternate routes. Routing policy may also require traffic engineering. For example, the external link between R6 and A3 may have been funded solely by A2 and A3. Therefore, A1’s traffic must not be allowed to traverse it. 111 Traffic Engineering -- Override the shortest path route IP Backbone Access 1 R1 R6 Access 3 R5 Access 2 R2 R3 R4 Route from A2 to D Desired route from A1 to D Actual route from A1 to D Destination D 112 Signaling and Provisioning Signaling: when network (re)configuration can be requested by users at any time and achieved within milliseconds or seconds. Provisioning: When the reaction time for (re)configuration becomes measured in minutes or hours. In either case, the (re)configuring action involves establishing (or modifying) information used by routers or switches to control their forwarding actions, including forwarding (routing) information, classification rules, and/or queuing and scheduling parameters. 113 Core MPLS Components The basic routing approach Routing is accomplished through the use of standard L3 routing protocols (e.g. OSPF and BGP). The information maintained by the L3 routing protocols is then used to distribute labels to neighboring nodes that are used in the forwarding of packets. Labels Label semantics, Label granularity, Label assignment, Label stack and forwarding operations. Label Semantics The label is nothing more than a shorthand for an aggregate stream of user data. The meaning of the label is a strictly local issue between two neighboring nodes. MPLS could be employed between any two neighboring nodes, even if no other nodes in the network participate in MPLS. When MPLS is used between more than two nodes, then the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes. Label Granularity The device uses the label to forward packets will forward all packets with the same label in the same way. A Forwarding Equivalence Class (FEC) is a set of L3 packets which are all forwarded in the same manner by a particular Label Switching Router (LSR). For unicast IP traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB). For IP multicast, the natural binding of a label would be to a multicast tree. Label assignment Label assignment involves allocating a label, and then binding a label to a route. Label assignment can be driven by control traffic or data traffic. (discussed later.) Label withdrawal is primarily a matter of garbage collection, that is collecting up unused labels so that they may be reassigned. 117 Routing Aggregation R6 Access 1 4 1 R1 R5 Access 3 2 Access 2 R2 R3 3 5 R4 Destination D 118 Forwarding Component Label Stack and Forwarding Operations label swap : looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream such as a particular queue or other QoS related treatment. label push : When a packet first enters an MPLS domain, the packet is associated with a label. label pop : When a packet leaves an MPLS domain, the label is removed. The label stack is useful within hierarchical routing domain. Encapsulation Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. “MPLS encapsulation” : encapsulate the label information and information used for label based forwarding. An encapsulation scheme may make use of the following fields: label, TTL, class of service, stack indicator, next header type indicator, and checksum MPLS label stack encoding Stack bottom Stack top COS Label (20 bits) Label (20 bits) Label (20 bits) Exp (3 bits) Exp (3 bits) Exp (3 bits) S (1 bit) S (1 bit) S (1 bit) TTL (8 bits) TTL (8 bits) TTL (8 bits) ... Original Packet MPLS frame delivered to link layer 121 Label Assignment Topology driven (Tag) In response to normal processing of routing protocol control traffic Labels are pre-assigned; no label setup latency at forwarding time. Request driven (RSVP) In response to normal processing of request based control traffic May require a large number of labels to be assigned. Traffic driven (Ipsilon) The arrival of data at an LSR triggers label assignment and distribution. Label setup latency; potential for packet reordering. 122 Label Distribution Explicit Label Distribution Downstream label allocation label allocation is done by the downstream LSR most natural mechanism for unicast traffic Upstream label allocation label allocation is done by the upstream LSR may be used for optimality for some multicast traffic A unique label for an egress LSR within the MPLS domain Any stream to a particular MPLS egress node could use the label of that node. 123 Label Distribution Explicit Label Distribution Protocol (LDP) Reliability : by transport protocol or as part of LDP. Separate routing computation and label distribution. Piggybacking on Other Control Messages Use existing routing/control protocol for distributing routing/control and label information. OSPF, BGP, RSVP, PIM Combine routing and label distribution. Label purge mechanisms By time out Exchange of MPLS control packets 124 Label Distribution Protocol LDP Peer: Two LSRs that exchange label/stream mapping information via LDP LDP messages Discovery messages (via UDP) announce and maintain the presence of LSR Session messages maintain session between LDP peers Advertisement message label operation (Label distribution) Notification message advisory information and signal error information Error notification: signal fatal errors Advisory notification: status of the LDP session or some previous message received from the peer. 125 Label Swapping Labeled Packet Example : Forwarding a Labeled Packet Map the incoming label to a next hop label, determines where to forward the packet. Encodes the new label stack into the packet, and then forwards it. Incoming Label Map (ILM) Input Port Label 1 4 Unlabeled Packet LSR analyzes the L3 header, to determine the packet’s stream. Map the stream to a next hop, determines where to forward the packet. Encodes the new label stack into the packet, and then forwards it. Output Port Label 2 6 Label Switching Router (LSR) L3 Header L2 Header IP Router Module Label Dat H3 4 H2 Dat 1 H3 6 2 126 H2 Use of MPLS in a Hierarchy Swap L1 L4 OSPF R2 R1 L2 Push IN OUT IN OUT L2 L3 L3 L1 Swap OUT L1 R3 IN OUT L1 L4 R5 R6 R4 Pop L2 L3 L3 L1 L1 L1 L1 BGP L2 L1 Domain 1 Domain 2 127 Conclusion MPLS improves the scalability of hop-by-hop routing and forwarding, and provides traffic engineering capabilities for better network provisioning. It decouples forwarding from routing and allows multi-protocol support without requiring changes to the basic forwarding paradigm. Generalized MPLS (GMPLS) λMPLS (Optical wavelength-based) 128