Internet Routing
Renata Teixeira
Laboratoire LIP6, CNRS and Université Pierre et Marie Curie

What is routing?
A famous quotation from RFC 791:
"A name indicates what we seek. An address indicates where it is. A route indicates how we get there." -- Jon Postel

Challenges for Internet routing
– Scale: with over 1 billion destinations, routers can't store every destination in their routing tables, and exchanging full routing tables would swamp the links.
– Administrative autonomy: the Internet is a network of networks, and each network admin may want to control routing in its own network.

Scalability: Classless InterDomain Routing (CIDR)
– An IP prefix represents a network (a group of destinations), represented by two 32-bit numbers (address + mask).
– Hierarchy in address allocation: addresses are allocated by ARIN/RIPE/APNIC and by ISPs.
– Today, routing tables contain roughly 150,000-200,000 prefixes.
– Example: IP address 12.4.0.0 = 00001100 00000100 00000000 00000000, IP mask 255.254.0.0 = 11111111 11111110 00000000 00000000. The first 15 bits are the network prefix, the remaining bits identify hosts; usually written as 12.4.0.0/15.

Reduce routing table size
– Without CIDR: the Big ISP announces 232.71.0.0, 232.71.1.0, 232.71.2.0, ..., 232.71.255.0 to the Internet individually.
– With CIDR: the Big ISP announces the single aggregate 232.71.0.0/16.

Internet autonomy: network of networks
– Internet = interconnection of Autonomous Systems (ASes).
– Distinct regions of administrative control: routers and links managed by a single "institution" (service provider, company, university, etc.).
[Figure: example ASes DT, AS 1, AS 2, AS 3, and the LIP6 network.]

Hierarchical routing
– Inter-AS routing (Border Gateway Protocol) determines the AS path and the egress point.
– Intra-AS routing (Interior Gateway Protocol), most commonly OSPF or IS-IS, determines the path from ingress to egress.

Outline
– Intra-domain routing: shortest-path routing; link state (OSPF/IS-IS); distance vector (RIP).
– Inter-domain routing: challenges and goals; AS relationships; path vector (Border Gateway Protocol, BGP).

Intra-domain Routing

Shortest-path routing
– Path-selection model: destination-based; load-insensitive (e.g., static link weights); minimum hop count or minimum sum of link weights.

Two types of routing algorithms
– Link-state algorithm: uses global information; based on Dijkstra's algorithm.
– Distance-vector algorithm: information is distributed; based on Bellman-Ford.

Link-state routing
– Each router keeps track of its incident links: whether each link is up or down, and the cost on the link.
– Each router broadcasts its link state, to give every router a complete view of the graph.
– Each router runs Dijkstra's algorithm to compute the shortest paths and construct the forwarding table.
– Example protocols: Open Shortest Path First (OSPF), Intermediate System-Intermediate System (IS-IS).

Detecting topology changes
– Beaconing: periodic "hello" messages in both directions; detect a failure after a few missed "hellos".
– Performance trade-offs: detection speed; overhead on link bandwidth and CPU; likelihood of false detection.

Broadcasting the link state
– Flooding: a node sends link-state information out its links, and then the next node sends it out all of its links, except the one where the information arrived.

Broadcasting the link state: reliable flooding
– Ensure that all nodes receive the link-state information, and that they use the latest version.
– Challenges: packet loss; out-of-order arrival.
– Solutions: acknowledgments and retransmissions; sequence numbers; a time-to-live for each packet.
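The link-state database assembled by reliable flooding is the input to the shortest-path computation described above. As a minimal sketch (not from the slides; the four-router topology and the use of Python are purely illustrative), Dijkstra's algorithm over such a database might look like this:

```python
import heapq

def dijkstra(lsdb, source):
    """Compute least-cost paths from `source` over a link-state database.

    `lsdb` maps each router to a dict {neighbor: link cost}, i.e. the global
    view every router obtains once flooding completes.  Returns the distance
    table and the first hop toward each router, which is what the forwarding
    table needs."""
    dist = {source: 0}
    first_hop = {}
    pq = [(0, source, None)]          # (cost so far, node, first hop from source)
    while pq:
        cost, node, hop = heapq.heappop(pq)
        if cost > dist.get(node, float("inf")):
            continue                   # stale queue entry
        if hop is not None:
            first_hop[node] = hop
        for neighbor, weight in lsdb[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # the first hop is the neighbor itself when leaving the source
                heapq.heappush(pq, (new_cost, neighbor,
                                    neighbor if node == source else hop))
    return dist, first_hop

# Illustrative topology: symmetric links with made-up weights
lsdb = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 3, "D": 4},
    "C": {"A": 1, "B": 3, "D": 1},
    "D": {"B": 4, "C": 1},
}
print(dijkstra(lsdb, "A"))   # next hop toward each router, as seen from A
```

Real implementations of OSPF and IS-IS layer equal-cost multipath and incremental recomputation on top of this core loop, but the forwarding table is built from the same (distance, first hop) result.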
When to initiate flooding
– Topology change: link or node failure; link or node recovery.
– Configuration change: link cost change.
– Periodically: refresh the link-state information, typically (say) every 30 minutes; corrects for possible corruption of the data.

Convergence
– Getting consistent routing information to all nodes, e.g., all nodes having the same link-state database.
– Consistent forwarding after convergence: all nodes have the same link-state database, all nodes forward packets on shortest paths, and the next router on the path forwards to the next hop.

Transient disruptions
– Detection delay: a node does not detect a failed link immediately and forwards data packets into a "blackhole"; the delay depends on the timeout for detecting lost hellos.
– Inconsistent link-state database: some routers know about the failure before others, so the shortest paths are no longer consistent; this can cause transient forwarding loops.

Convergence delay
– Sources of convergence delay: detection latency; flooding of link-state information; shortest-path computation; creating the forwarding table.
– Performance during the convergence period: lost packets due to blackholes and TTL expiry; looping packets consuming resources; out-of-order packets reaching the destination.
– Very bad for VoIP, online gaming, and video.

Reducing convergence delay
– Faster detection: smaller hello timers; link-layer technologies that can detect failures.
– Faster flooding: flooding immediately; sending link-state packets with high priority.
– Faster computation: faster processors on the routers; incremental Dijkstra algorithms.
– Faster forwarding-table update: data structures supporting incremental updates.

Bellman-Ford algorithm
– Define distances at each node x: dx(y) = cost of the least-cost path from x to y.
– Update distances based on neighbors: dx(y) = min over all neighbors v of {c(x,v) + dv(y)}.
– Example: du(z) = min{c(u,v) + dv(z), c(u,w) + dw(z)}.

Distance vector algorithm
– c(x,v) = cost of the direct link from x to v; node x maintains the costs of its direct links.
– Dx(y) = estimate of the least cost from x to y; node x maintains the distance vector Dx = [Dx(y): y ∈ N].
– Node x also maintains its neighbors' distance vectors: for each neighbor v, x maintains Dv = [Dv(y): y ∈ N].
– Each node v periodically sends Dv to its neighbors, and the neighbors update their own distance vectors: Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∈ N.
– Over time, the distance vector Dx converges to the actual least costs.

Distance vector algorithm: operation
– Iterative and asynchronous: each local iteration is caused by a local link cost change or by a distance-vector update message from a neighbor.
– Distributed: each node notifies its neighbors only when its distance vector changes; neighbors then notify their neighbors if necessary.
– Each node: wait for a change in a local link cost or a message from a neighbor; recompute its estimates; if the distance vector to any destination has changed, notify the neighbors.

Changes in a link cost
– Example: three nodes x, y, z with c(x,y) = 4, c(y,z) = 1, c(z,x) = 50. When c(x,y) increases to 60, y computes
  Dy(x) = min{c(y,x) + Dx(x), c(y,z) + Dz(x)} = min{60 + 0, 1 + 5} = 6,
  but z's advertised cost of 5 to x goes through y, so y and z keep relying on each other's stale estimates: a routing loop.
– This process will continue for 44 iterations! Usually called the count-to-infinity problem.
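To see the count-to-infinity behaviour concretely, here is a small synchronous simulation of the update Dx(y) ← minv{c(x,v) + Dv(y)} on the three-node example above. The round structure is my own simplification (real protocols are asynchronous), but the costs 4, 1, 50 and the jump to 60 are the ones from the example.

```python
def dv_round(costs, vectors):
    """One synchronous Bellman-Ford round: every node recomputes its
    distance vector from its neighbors' previously advertised vectors."""
    new = {}
    for x in costs:
        new[x] = {}
        for dest in costs:
            if dest == x:
                new[x][dest] = 0
            else:
                new[x][dest] = min(costs[x][v] + vectors[v][dest]
                                   for v in costs[x])
    return new

# Triangle from the example: c(x,y) = 4, c(y,z) = 1, c(z,x) = 50
costs = {"x": {"y": 4, "z": 50},
         "y": {"x": 4, "z": 1},
         "z": {"x": 50, "y": 1}}
# Converged vectors before the change
vectors = {"x": {"x": 0, "y": 4, "z": 5},
           "y": {"x": 4, "y": 0, "z": 1},
           "z": {"x": 5, "y": 1, "z": 0}}

# Link cost change: c(x,y) jumps from 4 to 60
costs["x"]["y"] = costs["y"]["x"] = 60

for i in range(1, 6):
    vectors = dv_round(costs, vectors)
    print(f"round {i}: Dy(x) = {vectors['y']['x']}, Dz(x) = {vectors['z']['x']}")
# Dy(x) starts at 6 = min{60 + 0, 1 + 5}; the two estimates then creep upward
# a little each round toward 50 and 51 -- the count-to-infinity problem.
```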
Routing Information Protocol (RIP)
– Distance-vector protocol: nodes send distance vectors every 30 seconds, or when an update causes a change in routing.
– Link costs in RIP: all links have cost 1; valid distances are 1 through 15, with 16 representing infinity. A small "infinity" means a smaller "counting to infinity" problem.
– RIP is limited to fairly small networks, e.g., it is used in the Princeton campus network.

Link state vs. distance vector
– Message complexity: LS: with n nodes and E links, O(nE) messages are sent; DV: exchange between neighbors only.
– Speed of convergence: LS: an O(n^2) algorithm requiring O(nE) messages; DV: convergence time varies, with possible routing loops and the count-to-infinity problem.
– Robustness (what happens when a router malfunctions?): LS: a node can advertise an incorrect link cost, but each node computes only its own table. DV: a node can advertise an incorrect path cost, and each node's table is used by others, so the error propagates.

Conclusions: Intra-domain routing
– Routing is a distributed algorithm: react to changes in the topology and compute the shortest paths.
– Two main shortest-path algorithms: Dijkstra, used in link-state routing (e.g., OSPF and IS-IS); Bellman-Ford, used in distance-vector routing (e.g., RIP).
– Convergence: moving from one topology to another involves transient periods of inconsistency across routers.

Inter-domain routing

Outline
– Inter-domain routing: AS relationships; path-vector routing.
– Border Gateway Protocol (BGP): an incremental, prefix-based, path-vector protocol; internal vs. external BGP; business relationships and traffic engineering with BGP; BGP convergence delay.

Relationship between Autonomous Systems

Hierarchy of autonomous systems
– Large, tier-1 providers with a nationwide backbone: at the "core" of the Internet, they don't have providers.
– Medium-sized regional providers with a smaller backbone.
– Small networks run by a single company or university.
[Figure: Large at the top, Big and Medium1/Medium2 below, Univ at the bottom.]

Connections between networks
[Figure: dial-in access, access routers, gateway routers, private peering, an Internet exchange point (IXP), and commercial customers connecting the networks.]

Single-homed customers
– A single-homed customer has only one connection to the Internet.

Multi-homed customers
– Multiple links to the same provider: e.g., Medium2 to Big.
– Links to different providers: e.g., Medium2 to Big and to Huge.

Customer-provider relationship
– The customer needs to be reachable from everyone: the provider exports routes learned from the customer to everyone.
– The customer does not want to provide transit service: the customer does not export routes from one provider to another.
– Example: Univ is a customer of Medium1; Medium2 is a customer of Big and Large. The providers carry traffic to and from their customer, but transit traffic through the customer is not allowed.

Peer-peer relationship
– Peers exchange traffic between their customers: an AS exports only customer routes to a peer, and exports a peer's routes only to its customers.
– Example: Big and Large are peers, and Big and Medium1 are peers; their customers exchange traffic, but Big doesn't provide transit for its peers.

Peering provides shortcuts
– Peering also allows connectivity between the customers of "Tier 1" providers.
[Figure: peer, provider, and customer links forming a shortcut between two customer networks.]
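The customer-provider and peer-peer rules above boil down to one export predicate, often described as the Gao-Rexford export rule. A minimal sketch (my own naming, not from the slides): a route is exported to a neighbor only if it was learned from a customer, or if the neighbor is a customer.

```python
# Relationship of this AS with each neighbor, from this AS's point of view.
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"

def should_export(learned_from, export_to):
    """Export rule implied by the slides:
    - routes learned from customers are exported to everyone;
    - routes learned from peers or providers are exported only to customers."""
    return learned_from == CUSTOMER or export_to == CUSTOMER

# Example: Big learned a route from its peer Large.
print(should_export(learned_from=PEER, export_to=CUSTOMER))  # True: customers get it
print(should_export(learned_from=PEER, export_to=PEER))      # False: no peer-to-peer transit
print(should_export(learned_from=PEER, export_to=PROVIDER))  # False: no transit for providers
```

Under this rule an AS never carries traffic between two of its peers, or from a peer to a provider, which is exactly the "no transit for peers" behaviour described above.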
How are peering decisions made?
– Reasons to peer: peering reduces upstream transit costs; it can increase end-to-end performance; it may be the only way to connect your customers to some part of the Internet ("Tier 1").
– Reasons not to peer: you would rather have customers; peers are usually your competition; peering relationships may require periodic renegotiation.

Inter-domain Routing

Interconnected ASes
– The forwarding table is configured by both the intra- and the inter-AS routing algorithm: intra-AS routing sets the entries for internal destinations, while inter-AS and intra-AS routing together set the entries for external destinations.
[Figure: AS1 (routers 1a-1d), AS2 (2a-2c), and AS3 (3a-3c), with the intra-AS and inter-AS routing algorithms feeding each router's forwarding table.]

Inter-AS tasks
– Suppose a router in AS1 receives a datagram whose destination is outside of AS1: the router should forward the packet towards one of the gateway routers, but which one?
– AS1 needs: (1) to learn which destinations are reachable through AS2 and which through AS3; (2) to propagate this reachability information to all routers in AS1.
– This is the job of inter-AS routing!

Example: Setting the forwarding table in router 1d
– Suppose AS1 learns (via the inter-AS protocol) that prefix x is reachable via AS3 (gateway 1a) but not via AS2.
– The inter-AS protocol propagates the reachability information to all internal routers.
– Router 1d determines from intra-AS routing information that its interface I is on the least-cost path to 1a, and installs the forwarding entry (x, I).

Example: Choosing among multiple ASes
– Now suppose AS1 learns from the inter-AS protocol that prefix x is reachable from AS3 and from AS2.
– To configure its forwarding table, router 1d must determine towards which gateway it should forward packets for destination x.
– This, too, is the job of the inter-AS routing protocol!

Challenges for inter-domain routing
– Scale: 150,000-200,000 prefixes, and growing; 30,000 visible ASes, and growing.
– Privacy: ASes don't want to divulge their internal topologies, or their business relationships with neighbors.
– Policy: there is no Internet-wide notion of a link cost metric; ASes need control over where they send traffic, and over who can send traffic through them.

Shortest-path routing is restrictive
– All traffic must travel on shortest paths.
– All nodes need a common notion of link costs.
– Incompatible with commercial relationships.
[Figure: national and regional ISPs with customers; one path is acceptable (YES), while a shortest path through another provider's network is not (NO).]

Link-state routing is problematic
– Topology information is flooded: high bandwidth and storage overhead; forces nodes to divulge sensitive information.
– The entire path is computed locally at each node: high processing overhead in a large network.
– It minimizes some notion of total distance: this works only if the policy is shared and uniform.
– Typically used only inside an AS: e.g., OSPF and IS-IS.

Distance vector is on the right track
– Advantages: hides the details of the network topology; nodes determine only the "next hop" toward the destination.
– Disadvantages: minimizes some notion of total distance, which is difficult in an inter-domain setting; slow convergence due to the counting-to-infinity problem ("bad news travels slowly").
– Idea: extend the notion of a distance vector.

Path-vector routing
– Extension of distance-vector routing: supports flexible routing policies; avoids the count-to-infinity problem.
– Key idea: advertise the entire path. A distance vector sends a distance metric per destination d; a path vector sends the entire path for each destination d.
– Example: node 1 advertises "d: path(1)"; node 2 then advertises "d: path(2,1)" to node 3, and data traffic for d flows along the reverse of the advertised path.
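A minimal sketch of the path-vector idea (my own data layout and node identifiers, not from the slides): each node stores the full path per destination, discards any advertisement that already contains its own identifier, and prepends itself before re-advertising the chosen route. Here the local policy is simply "shortest path"; BGP replaces that with the richer policies discussed below.

```python
def process_advertisement(my_id, dest, path, rib):
    """Handle a path-vector advertisement `path` (e.g. [2, 1]) for `dest`.

    Returns the path to re-advertise (with my_id prepended), or None if the
    advertisement is discarded or the best route did not change."""
    if my_id in path:
        return None                      # loop detected: my own id is in the path
    rib.setdefault(dest, []).append(path)
    # local policy would pick among rib[dest]; here: shortest path wins
    best = min(rib[dest], key=len)
    if best == path:
        return [my_id] + path            # prepend myself and advertise onward
    return None                          # best route unchanged, nothing to send

rib = {}
print(process_advertisement(3, "d", [2, 1], rib))   # -> [3, 2, 1]
print(process_advertisement(1, "d", [3, 2, 1], {})) # -> None: node 1 sees itself
```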
Faster loop detection
– A node can easily detect a loop: it looks for its own node identifier in the path. E.g., node 1 sees itself in the path "3, 2, 1".
– The node can simply discard paths with loops: e.g., node 1 simply discards that advertisement.

Flexible policies
– Each node can apply local policies: path selection (which path to use?) and path export (which paths to advertise?).
– Examples: node 2 may prefer the path "2, 3, 1" over "2, 1"; node 1 may not let node 3 hear the path "1, 2".

Border Gateway Protocol (BGP)
– The inter-domain routing protocol for the Internet: a prefix-based path-vector protocol, with policy-based routing based on AS paths.
– Evolved during the past 15 years:
  • 1989: BGP-1 [RFC 1105], replacement for EGP (1984, RFC 904)
  • 1990: BGP-2 [RFC 1163]
  • 1991: BGP-3 [RFC 1267]
  • 1995: BGP-4 [RFC 1771], support for Classless Interdomain Routing (CIDR)
  • 2006: BGP-4 [RFC 4271]

BGP session operation
– Establish a session on TCP port 179.
– Exchange all active routes.
– While the connection is alive, exchange incremental route UPDATE messages.

Incremental protocol
– A node learns multiple paths to a destination: it stores all of the routes in a routing table, applies policy to select a single active route, and may advertise that route to its neighbors.
– Incremental updates: announcement (upon selecting a new active route, add the node id to the path); withdrawal (if the active route is no longer available).

BGP route
– Destination prefix (e.g., 128.112.0.0/16).
– Route attributes, including the AS path (e.g., "2 1") and the next-hop IP address (e.g., 12.127.0.121).
– Example: AS1 advertises 128.112.0.0/16 to AS2 with AS path "1" and next hop 192.0.2.1; AS2 advertises it to AS3 with AS path "2 1" and next hop 12.127.0.121.

An AS is not a single node
– AS path length can be misleading: an AS may have many router-level hops.
– BGP says that the path "4 1" is better than the path "3 2 1", even if the shorter AS path crosses more routers.

Two types of BGP neighbor relationships
– External BGP (eBGP): a session between routers in different ASes.
– Internal BGP (iBGP): needed to distribute BGP information within the AS; iBGP sessions are routed using the IGP.

iBGP mesh doesn't scale
– Configuration overhead: N border routers means N(N-1)/2 sessions, and one new router requires configuring all the others.
– Routing overhead: each router has to listen to updates from all neighbors, and routing tables grow because of the alternate routes.

Route reflectors
– A route reflector (RR) acts like a route server: routes from clients are distributed to the other RRs, and routes from other RRs are distributed to the clients.
– Only the best route is sent.

BGP path selection
– Simplest case: shortest AS path, with an arbitrary tie break.
– Example: a three-hop AS path is preferred over a four-hop AS path; AS 7 prefers the path through AS 6 (AS path "6 2 1" over "5 4 3 2 1").
– But BGP is not limited to shortest-path routing: it does policy-based routing.
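A toy sketch of the incremental announce/withdraw behaviour and of the "shortest AS path, arbitrary tie break" selection described above. The class and field layout are my own (this is not a real BGP implementation), but the prefix, AS paths, and next hops are the ones from the example.

```python
class SimpleRib:
    """Toy per-prefix RIB: keeps all routes learned per neighbor and
    re-selects the active route on every announcement or withdrawal."""

    def __init__(self):
        self.routes = {}   # prefix -> {neighbor: (as_path, next_hop)}

    def announce(self, prefix, neighbor, as_path, next_hop):
        self.routes.setdefault(prefix, {})[neighbor] = (as_path, next_hop)
        return self.best(prefix)

    def withdraw(self, prefix, neighbor):
        self.routes.get(prefix, {}).pop(neighbor, None)
        return self.best(prefix)

    def best(self, prefix):
        candidates = self.routes.get(prefix, {})
        if not candidates:
            return None   # no route left: we would send a withdrawal downstream
        # shortest AS path, with the neighbor id as an arbitrary tie break
        neighbor = min(candidates, key=lambda n: (len(candidates[n][0]), n))
        return candidates[neighbor]

rib = SimpleRib()
print(rib.announce("128.112.0.0/16", "AS2", ["2", "1"], "12.127.0.121"))
print(rib.announce("128.112.0.0/16", "AS3", ["3", "2", "1"], "192.0.2.1"))
print(rib.withdraw("128.112.0.0/16", "AS2"))   # falls back to the longer path
```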
BGP policy: Applying policy to routes
– Import policy: filter unwanted routes from a neighbor (e.g., a prefix that your customer doesn't own); manipulate attributes to influence path selection (e.g., assign a local preference to favored routes).
– Export policy: filter routes you don't want to tell your neighbor (e.g., don't tell a peer a route learned from another peer); manipulate attributes to control what they see (e.g., make a path look artificially longer than it is).

BGP route processing
– Receive BGP updates → apply import policies (filter routes and tweak attributes, based on attribute values) → best-route selection → best-route table → apply export policies (filter routes and tweak attributes) → transmit BGP updates.
– The best routes are also installed as forwarding entries in the IP forwarding table.

Import policy: Filtering
– Discard some route announcements: detect configuration mistakes and attacks.
– Examples, on a session to a customer: discard a route if the prefix is not owned by the customer; discard a route that contains another large ISP in its AS path.

Export policy: Filtering
– Discard some route announcements: limit the propagation of routing information.
– Examples: don't announce routes from one peer to another; don't announce routes for network-management hosts.

BGP policy configuration
– Routing policy languages are vendor-specific: they are not part of the BGP protocol specification, and Cisco, Juniper, etc. each have their own.
– Still, all languages share some key features: a policy is a list of clauses; each clause matches on route attributes and either discards or modifies the matching routes.
– Configuration is done by human operators implementing the policies of their AS: business relationships, traffic engineering, security, ...

Implementing business relationships with BGP policies

Best route selection: Simplified BGP decision process
– Ignore routes whose next hop is unreachable.
– Highest local preference.
– Shortest AS path.
– Lowest MED (with the same next-hop AS).
– Prefer eBGP-learned over iBGP-learned routes.
– Lowest IGP cost to the egress router.
– Lowest router ID of the egress router.

Import policy: Local preference
– Favor one path over another: override the influence of AS path length; apply local policies to prefer a path.
– Example: prefer customer routes over peer routes, e.g., local-pref = 100 for the route learned from the customer and local-pref = 90 for the route learned from the peer.

Example: Customer to provider
– Prefix 132.239.17.0/24 is announced by the customer Univ to its provider Medium1 (routers A and B):
  • Router A: import policy: local pref = 100; route selection: select the Univ route; export policy: send to the other iBGP neighbors.
  • Router B: route selection: select A's route; export policy: send to the other eBGP neighbors.

Example: Peers
– Prefix 132.239.0.0/16 is learned from the peer Medium1 (suppose Medium1, Big, and Large are peers):
  • Router A: import policy: local pref = 90; route selection: select Medium1's route; export policy: send to the other iBGP routers and to customers, but don't send to peers.
  • Routers B and C: route selection: select A's route.
– How do B and C know that A's route is a peer route?

Community attribute
– The community attribute is used to tag routes: a list of four-byte values, with the format <AS number>:<value>.
– Import policy: set a community. E.g., Internet2 tags commercial routes with "community add 11537:2001".
– Export policy: match a community and apply a policy. E.g., don't export commercial routes: "from community 11537:2001 then reject".
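The simplified decision process above maps naturally onto a lexicographic comparison: each step becomes one component of a sort key, evaluated only when all earlier steps tie. A sketch with made-up field names (my own, not router code):

```python
def decision_key(route):
    """Rank a candidate route according to the simplified BGP decision
    process from the slides; min() over these keys picks the best route.

    `route` is assumed to be a dict with the fields used below."""
    return (
        0 if route["next_hop_reachable"] else 1,  # ignore unreachable next hops
        -route["local_pref"],                     # highest local preference
        len(route["as_path"]),                    # shortest AS path
        route["med"],                             # lowest MED (same next-hop AS)
        0 if route["ebgp"] else 1,                # prefer eBGP-learned over iBGP
        route["igp_cost"],                        # lowest IGP cost to the egress
        route["egress_router_id"],                # lowest router ID (tie break)
    )

routes = [
    dict(next_hop_reachable=True, local_pref=90, as_path=["6", "2", "1"],
         med=0, ebgp=True, igp_cost=7, egress_router_id="10.0.0.3"),
    dict(next_hop_reachable=True, local_pref=100, as_path=["5", "4", "3", "2", "1"],
         med=0, ebgp=True, igp_cost=10, egress_router_id="10.0.0.1"),
]
best = min(routes, key=decision_key)
print(best["as_path"])   # local_pref 100 wins despite the longer AS path
```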
Example: Customers vs. peers
– Prefix 132.239.0.0/16; suppose Medium1 is a customer of Big and Large, and Big and Large are peers:
  • Router A: import policies: local pref (Medium1) = 100, local pref (Large) = 80; route selection: select Medium1's route; export policies: send to the other iBGP and eBGP neighbors.

Engineering traffic with BGP policies

Traffic engineering with BGP
– Inbound traffic is controlled with outbound routes: filter outbound routes, and tweak attributes in outbound routes in the hope of influencing the neighbor's selection.
– Outbound traffic is controlled with inbound routes: filter inbound routes, and tweak attributes in inbound routes to influence best-route selection.

Implementing backup paths for outbound traffic with local prefs
– Set local-pref = 100 for all routes received from AS 1 on the primary link, and local-pref = 50 for all routes received from AS 1 on the backup link.
– This forces outbound traffic to take the primary link, unless the link is down.

Implementing backup paths for inbound traffic with AS prepending
– AS 2 announces 132.239.17.0/24 with AS path "2" on the primary link and with AS path "2 2 2" on the backup link.
– Usually, this forces inbound traffic to take the primary link, unless the link is down.

AS prepending may not work
– Suppose AS 2 is a customer of AS 1 and of AS 0, announcing AS path "2" on the primary link (to AS 1) and "2 2 2 2 2 2 2" on the backup link (to AS 0).
– AS 0 still prefers the backup route, because local preference is considered before AS path length.

BGP communities can help
– Use the community attribute to signal which local-pref value should be used (e.g., announce 132.239.17.0/24 with community 3:70 on the backup link).
– If the customer local pref is 100 and the peer local pref is 90, setting the local pref to 70 solves the problem.

Example: Multiple egress points
– A route to UPMC is learned by Medium2 at routers A and B (from Big) and at router C (from Large):
  • Routers A, B, and C: import policy: local pref = 80; route selection: A and B select Big's route, C selects Large's route; export policy: send to the other iBGP routers.
– What will router D choose?

Hot-potato (early-exit) routing
– Example: router D has a route to Univ via egress points A, B, and C, with IGP distances D-A = 10, D-B = 8, D-C = 7; D chooses C.
– Hot-potato routing = route to the closest egress point when there is more than one route to the destination.

Asymmetric routing
[Figure: traffic from A to B and from B to A takes different paths through Big and Large (via Paris and London), because each AS picks its own early exit.]

Some routers don't need BGP
– A customer that connects to a single upstream ISP: the ISP can introduce the customer's prefixes into BGP, and the customer can simply default-route to the ISP.
– Example: Big nails up routes for 130.132.0.0/16 pointing to Univ, and Univ nails up a default route 0.0.0.0/0 pointing to Big.
– Routers inside a "stub" network: the border router may speak BGP to the upstream ISPs, but the internal routers can simply use a default route.

Joining BGP and IGP information
– Border Gateway Protocol (BGP): announces reachability to external destinations; maps a destination prefix to an egress point (e.g., 128.112.0.0/16 is reached via 192.0.2.1).
– Interior Gateway Protocol (IGP): used to compute paths within the AS; maps an egress point to an outgoing link (e.g., 192.0.2.1 is reached via 10.1.1.1).
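A minimal sketch of the BGP/IGP join described above (my own table layout; the addresses other than 128.112.0.0/16 and 192.0.2.1 are made up): BGP supplies prefix → candidate egress points, the IGP supplies egress point → (outgoing interface, IGP cost), and when several egress points remain the router applies the hot-potato rule and picks the closest one.

```python
# BGP: prefix -> list of candidate egress points (next-hop addresses)
bgp_table = {"128.112.0.0/16": ["192.0.2.1"],
             "132.239.0.0/16": ["10.10.1.1", "10.10.2.1"]}

# IGP: egress point -> (outgoing interface, IGP cost from this router)
igp_table = {"192.0.2.1": ("if-10.1.1.1", 12),
             "10.10.1.1": ("if-east", 8),
             "10.10.2.1": ("if-west", 7)}

def build_forwarding_table(bgp_table, igp_table):
    fib = {}
    for prefix, egresses in bgp_table.items():
        # hot-potato routing: choose the egress point with the lowest IGP cost
        egress = min(egresses, key=lambda e: igp_table[e][1])
        interface, _cost = igp_table[egress]
        fib[prefix] = interface
    return fib

print(build_forwarding_table(bgp_table, igp_table))
# {'128.112.0.0/16': 'if-10.1.1.1', '132.239.0.0/16': 'if-west'}
```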
BGP convergence

Causes of BGP routing changes
– Topology changes: equipment going up or down; deployment of new routers or sessions.
– BGP session failures: due to equipment failures, maintenance, etc., or due to congestion on the physical path.
– Changes in routing policy: reconfiguration of preferences; reconfiguration of route filters.
– Persistent protocol oscillation: conflicts between policies in different ASes.

BGP session failure
– BGP runs over TCP: BGP only sends updates when changes occur, and TCP doesn't detect lost connectivity on its own.
– Detecting a failure: keep-alives every 60 seconds; a hold timer of 180 seconds.
– Reacting to a failure: discard all routes learned from the neighbor; send new updates for any routes that change.

Routing change: Before and after
– Before: AS 1 uses path (1,0), AS 2 uses (2,0), AS 3 uses (3,1,0).
– After: AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0).

Routing change: Path exploration
– AS 1: deletes the route (1,0); switches to the next route (1,2,0); sends the route (1,2,0) to AS 3.
– AS 3: sees (1,2,0) replace (1,0); compares it to the route (2,0); switches to using AS 2, i.e., (3,2,0).

Routing change: Path exploration on a withdrawal
– Initial situation: destination 0 is alive, and all ASes use their direct path: (1,0), (2,0), (3,0).
– When the destination dies: all ASes lose the direct path, all switch to longer paths, and eventually the routes are withdrawn.
– E.g., AS 2 explores (2,0) → (2,1,0) → (2,3,0) → (2,1,3,0) → null.

BGP converges slowly, if at all
– The path vector avoids count-to-infinity, but ASes still must explore many alternate paths to find the highest-ranked path that is still available.
– Fortunately, in practice most popular destinations have very stable BGP routes, and most instability lies in a few unpopular destinations.
– Still, lower BGP convergence delay is a goal: convergence can take tens of seconds to tens of minutes, which is high for important interactive applications, or even for conventional applications like Web browsing.
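Why exploration can take so long: in the worst case an AS visits every simple path through its neighbors before the final withdrawal. A small illustrative sketch (my own, not a protocol simulation) that enumerates those alternatives for AS 2 in the three-AS example above:

```python
from itertools import permutations

def candidate_paths(asn, others, dest=0):
    """All simple AS paths from `asn` to `dest` through any subset of the
    other ASes: the alternatives BGP may explore in the worst case."""
    paths = []
    for r in range(len(others) + 1):
        for middle in permutations(others, r):
            paths.append((asn, *middle, dest))
    return paths

print(candidate_paths(2, [1, 3]))
# [(2, 0), (2, 1, 0), (2, 3, 0), (2, 1, 3, 0), (2, 3, 1, 0)]
# The observed exploration sequence from the slides is a prefix of this list,
# ending with a withdrawal once no alternative remains.
```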
Why different intra- and inter-AS routing?
– Policy: inter-AS, the admin wants control over how its traffic is routed and over who routes through its network; intra-AS, there is a single admin, so no policy decisions are needed.
– Scale: hierarchical routing saves table size and reduces update traffic.
– Performance: intra-AS routing can focus on performance; inter-AS, policy may dominate over performance.

Conclusions
– BGP is solving a hard problem: a routing protocol operating at a global scale, among tens of thousands of independent networks that each have their own policy goals, and that all want fast convergence.
– Key features of BGP: prefix-based path-vector protocol; incremental updates (announcements and withdrawals); policies applied at import and export of routes; internal BGP to distribute information within an AS; interaction with the IGP to compute forwarding tables.

Acknowledgements
Many of the slides presented here are borrowed from Tim Griffin, Jim Kurose, and Jennifer Rexford.

Routing measurements
Renata Teixeira
Laboratoire LIP6, CNRS and Université Pierre et Marie Curie

Outline
– Measurements in an AS: IGP routing measurements; iBGP measurements; impact of IGP changes on BGP.
– BGP measurements from multiple ASes: public BGP data (RIPE and RouteViews); diagnosis of inter-domain routing changes.

Impact of hot-potato routing changes in IP networks
with Aman Shaikh (AT&T), Tim Griffin (University of Cambridge), and Jennifer Rexford (Princeton)

Internet routing architecture
– Inter-domain routing (BGP) connects ASes; intra-domain routing (OSPF, IS-IS) runs inside each AS.
– Changes in one AS may impact traffic and routing in other ASes; end-to-end performance depends on all the ASes along the path.

Inter-domain routing: Border Gateway Protocol (BGP)
– Exchanges reachability information between ASes: an IP prefix is a block of IP addresses (e.g., "I can reach 132.239.17.0/24").
– Uses policies to determine which paths to use and which to export to neighbors.
– BGP selects the egress point among the available choices of egress points.

Intra-domain routing: Interior Gateway Protocol (IGP)
– Most common: OSPF and IS-IS.
– Flood link-state information and compute shortest paths based on link weights; the link weights are set by network operators.
– Example: the cost of the path from A to F is 1 + 2 + 1 + 3 = 7.

Interaction between IGP and BGP
– Example: router D has routes to Univ via egress points A, B, and C, with IGP distances D-A = 9, D-B = 8, D-C = 7.
– Hot-potato routing = route to the closest egress point when there is more than one route to the destination; D picks C.

Hot-potato routing changes
– Causes: failures, planned maintenance, traffic engineering.
– Example: the IGP distances change to D-B = 11 and D-C = 10 while D-A stays 9, so D switches its egress point from C to A.
– Consequences: transient forwarding instability; traffic shifts; BGP routing changes.

Impact of hot-potato routing changes?
– How often do hot-potato changes happen?
– How many destinations do they affect?
– How much traffic do they affect?
– What are the convergence delays?

Outline
– Measurement methodology: collection of OSPF and BGP data from AT&T; identification of hot-potato routing changes.
– BGP impact.
– Traffic impact.
– Minimizing hot-potato disruptions.

Routing measurements
– Passive measurement of routing protocols: a monitor acts as a regular router and stores all messages with a timestamp.
– IS-IS or OSPF monitoring: flooding gives a complete view of topology dynamics.
– BGP monitoring: only the best route for each prefix from a given router, so data must be collected from multiple vantage points.

Collecting input data
– A BGP monitor collects updates from nine vantage points in the AT&T backbone.
– An OSPF monitor observes the flooding of link-state advertisements.
– Replay the routing decisions from vantage points A and B to identify hot-potato changes.

Challenges for identifying hot-potato changes
– A single event may cause multiple messages: group related routing messages in time.
– Router implementation affects message timing: run controlled experiments on a router in the lab.
– Many BGP updates are caused by external events: classify BGP routing changes by possible causes.

Algorithm for correlating routing changes
– Compute distance changes: group OSPF messages that are close in time; compute the distance changes from each vantage point.
– Classify BGP changes by possible OSPF cause: group updates that are close in time; compare the old and new route according to the decision process.
– Determine the causal relationship: consistent BGP and OSPF changes that are close in time.
[Figure: an IGP distance change from 9 to 10 between San Francisco and Dallas triggers a BGP update that moves the egress from SF to NY.]
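A sketch of the basic check behind the replay: recompute the hot-potato egress choice for each prefix before and after an IGP distance change, and flag the prefixes whose egress point flips. This is my own simplification of the methodology; the distances are the ones from the example (D-A = 9, D-B = 8 → 11, D-C = 7 → 10), and the prefix names are made up.

```python
def closest_egress(igp_dist, egresses):
    """Hot-potato rule: among the egress points that announce the prefix,
    pick the one with the smallest IGP distance from this router."""
    return min(egresses, key=lambda e: igp_dist[e])

def hot_potato_changes(old_igp, new_igp, bgp_egresses):
    """Return the prefixes whose selected egress point changes when the IGP
    distances change (the BGP routes themselves are unchanged)."""
    changed = {}
    for prefix, egresses in bgp_egresses.items():
        before = closest_egress(old_igp, egresses)
        after = closest_egress(new_igp, egresses)
        if before != after:
            changed[prefix] = (before, after)
    return changed

# Router D's distances from the example: B and C get farther after a link change
old_igp = {"A": 9, "B": 8, "C": 7}
new_igp = {"A": 9, "B": 11, "C": 10}
bgp_egresses = {"univ-prefix": ["A", "B", "C"], "other-prefix": ["B", "C"]}
print(hot_potato_changes(old_igp, new_igp, bgp_egresses))
# {'univ-prefix': ('C', 'A')}  -- 'other-prefix' still exits via C
```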
Outline
– Measurement methodology.
– BGP impact: how often do hot-potato changes happen? which fraction of prefixes do they affect?
– Traffic impact.
– Minimizing hot-potato disruptions.

BGP impact of an OSPF change
– The vast majority of OSPF changes have no impact on these routers, but a few have a very big impact.
[Figure: fraction of the BGP table affected per OSPF change, for routers A and B.]

Variation across routers
– Router A, with nearly equal IGP distances (9 and 10) to the two egress points, switches egress points for dst after small changes; router B, with distances 1 and 1000, is much more robust to intra-domain routing changes.
– The significance of hot-potato routing depends on network design and router location.

Outline
– Measurement methodology.
– BGP impact.
– Traffic impact: how long are the convergence delays? what is the impact on traffic?
– Minimizing hot-potato disruptions.

Delay for a BGP routing change
– Steps between an OSPF change and a BGP update: the OSPF message is flooded through the network (t0, seen by the OSPF monitor); OSPF updates the distance information; the BGP decision process is rerun (timer driven); the first BGP update is sent (t1, seen by the BGP monitor); updates for the other prefixes follow (t).
– Metrics: the time for BGP to revisit its decision (t1 - t0); the time to update the other prefixes (t - t1).

BGP reaction time
– The time to revisit the BGP decision is roughly uniform between 5 and 80 seconds.
– Worst-case scenario: up to 80 seconds to revisit the BGP decision, plus a transfer delay of 50-110 seconds to send the multiple updates; the last prefix may take 3 minutes to converge!
[Figure: cumulative distribution of the delay for the first BGP update and for all BGP updates.]

Transient data disruptions
– 1: the BGP decision process runs in R2; 2: R2 starts using egress E1 to reach dst; 3: R1's BGP decision can take up to 60 seconds to run, so in the meantime packets can loop between R1 and R2.
– Disastrous for interactive applications (VoIP, gaming, web).

Challenges for active measurements
– Problem: single-homed probe machines: the probes do not experience the loop, and they do not illustrate the customer experience.
[Figure: customer traffic loops between R1 and R2 while the operator's probes P1 and P2 follow unaffected paths.]

Traffic shifts
– For ingress i and egress e1, the change in the traffic-matrix entry between consecutive intervals decomposes as
  ΔTM(i, e1, t) = TM(i, e1, t) - TM(i, e1, t-1) = L(i, e1, t) + R(i, e1, t),
  where L(i, e1, t) is the load variation of the traffic that still uses e1, and R(i, e1, t) is the routing shift: traffic that moved to e1 minus traffic that moved out of e1.
[Figure: traffic (MB per second) at egresses e1 and e2 over time; a routing shift moves traffic from e1 to e2, causing a decrease at e1.]

Large shifts caused by routing changes
– A routing shift can be 70 times the normal traffic variation.
– The vast majority (99.88%) of the ΔTM caused by load lies within [-4, 4] relative to normal variations.
[Figure: distributions of ΔTM caused by load and ΔTM caused by routing, relative to normal variations.]

Hot-potato vs. external BGP routing changes
– Hot-potato changes have the biggest impact on ΔTM.
[Figure: distributions of ΔTM, relative to normal variations, for hot-potato and eBGP-triggered changes.]
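To make the decomposition ΔTM(i, e1, t) = L(i, e1, t) + R(i, e1, t) concrete, here is a toy sketch with synthetic per-flow data (my own construction, not the paper's measurement pipeline): flows that keep the same egress contribute to L, and flows that change egress contribute to R.

```python
def delta_tm(flows_before, flows_after, egress):
    """Toy decomposition of the change in one traffic-matrix entry into load
    variation L and routing shift R, for flows entering at one ingress.

    `flows_before` / `flows_after` map a flow id to (egress point, volume)."""
    load, shift = 0.0, 0.0
    for flow in set(flows_before) | set(flows_after):
        e0, v0 = flows_before.get(flow, (None, 0.0))
        e1, v1 = flows_after.get(flow, (None, 0.0))
        if e0 == egress and e1 == egress:
            load += v1 - v0          # same egress in both intervals: load variation
        elif e1 == egress:
            shift += v1              # traffic that moved onto this egress
        elif e0 == egress:
            shift -= v0              # traffic that moved off this egress
    return load + shift, load, shift # total change = L + R

before = {"f1": ("e1", 10.0), "f2": ("e1", 5.0), "f3": ("e2", 8.0)}
after  = {"f1": ("e1", 12.0), "f2": ("e2", 5.0), "f3": ("e2", 8.0)}
print(delta_tm(before, after, egress="e1"))
# (-3.0, 2.0, -5.0): the load on e1 grew by 2, but a 5-unit flow shifted to e2
```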
Summary of measurement analysis
– Convergence can take minutes: forwarding loops lead to packet loss and delay; fixes include event-driven implementations or tunnels.
– The frequency of hot-potato changes depends on location: once a week on average for the more affected routers.
– Internal events can have a big impact: some events affect over half of the BGP table, and they are responsible for the largest traffic variations.

Implications of hot-potato routing changes
– For end users: transient disruptions; new end-to-end path characteristics.
– For network administrators: instability in the traffic matrix.

Outline
– Measurement methodology.
– BGP impact.
– Traffic impact.
– Minimizing hot-potato disruptions: what can operators do today?

What can operators do today?
– Network design: design networks that minimize hot-potato changes; implement a fixed ranking of egress points (e.g., MPLS tunnels injected in the IGP).
– Maintenance: plan maintenance activities considering the impact of the changes on BGP routes.

Comparison of network designs
[Figure: three designs with different IGP distances from Dallas to the NY and SF egress points. With nearly equal distances, small changes make Dallas switch egress points; a much larger gap between the two distances, or carefully chosen in/out link costs, is more robust to intra-domain routing changes.]

Conclusion
– Hot-potato routing is too disruptive: small changes inside an AS can lead to big disruptions of BGP and of transit traffic.
– In addition, hot potato is too restrictive (the egress-selection mechanism dictates a policy) and too convoluted (IGP metrics determine BGP egress selection).
– Introduce a more flexible egress-selection mechanism: TIE, Tunable Interdomain Egress selection.

More info: http://rp.lip6.fr/~teixeira
– BGP impact: R. Teixeira, A. Shaikh, T. Griffin, and J. Rexford, "Dynamics of Hot-Potato Routing in IP Networks", in Proc. of ACM SIGMETRICS, June 2004.
– Traffic impact: R. Teixeira, N. Duffield, J. Rexford, and M. Roughan, "Traffic Matrix Reloaded: Impact of Routing Changes", in Proc. of PAM, March 2005.
– Model of network sensitivity to IGP changes: R. Teixeira, T. Griffin, A. Shaikh, and G. M. Voelker, "Network Sensitivity to Hot-Potato Disruptions", in Proc. of ACM SIGCOMM, August 2004.
– New egress-selection mechanism: R. Teixeira, T. Griffin, M. Resende, and J. Rexford, "TIE Breaking: Tunable Interdomain Egress Selection", IEEE/ACM Transactions on Networking, August 2007.

Pin-pointing routing changes
with Jennifer Rexford (Princeton)

Why diagnose routing changes?
– Routing changes cause service disruptions: convergence delay; traffic shifts; changes in path properties (RTT, available bandwidth, or lost connectivity).
– Operators need to know why (for diagnosing and fixing problems) and where (for accountability, e.g., to guarantee service-level agreements).

Limitations of active measurements
– Active measurements (traceroute-like tools) can't probe in the past, and they show the effect, not the cause.

Passive BGP measurements
– Passive measurements use public BGP data: RIPE RIS (http://www.ripe.net/ris/) and RouteViews (http://www.routeviews.org/), which collect eBGP update feeds.

Many uses for public BGP data
– Build the AS-level topology of the Internet; infer AS relationships; find the origin of IP prefixes; predict AS-level paths; map IP addresses to ASes; understand routing instabilities; locate the origin of routing changes; detect prefix hijacks; ...

Appealing to locate the origin of routing changes
– BGP update feeds → data collection (RouteViews, RIPE) → data mining (across time, views, and prefixes) → root cause.

Each router's view is unique
– Myth: the BGP updates from a single router accurately represent the AS.
– The measurement system needs to capture the BGP routing changes from all border routers.

Internal routing changes matter
– Myth: routing changes visible in eBGP have greater end-to-end impact than changes with local scope.
– The measurement system needs to capture internal changes inside an AS.

Route aggregation hides information
– Myth: BGP data from a router accurately represents changes on that router.
– The measurement system needs to know all the routes the router knows.
[Figure: router A aggregates 12.1.1.0/24 into 12.1.0.0/16, so the BGP collector sees no change when the more-specific route changes.]
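The aggregation effect above is easy to check with Python's standard ipaddress module (a small sketch, not part of any measurement system): a change to the more-specific 12.1.1.0/24 is invisible to a collector that only receives the aggregate 12.1.0.0/16.

```python
import ipaddress

aggregate = ipaddress.ip_network("12.1.0.0/16")      # what the collector sees
more_specific = ipaddress.ip_network("12.1.1.0/24")  # what changed inside the AS

# The /24 is covered by the /16, so withdrawing or re-routing it does not
# change the aggregate announcement that the collector receives.
print(more_specific.subnet_of(aggregate))                     # True
print(aggregate.num_addresses, more_specific.num_addresses)   # 65536 256
```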
Policy obscures event location
– Myth: the AS responsible for the change appears in the old or the new AS path.
– Example: the old AS path is "1, 2, 8, 9, 10" and the new AS path is "1, 4, 5, 6, 7, 10"; accurate troubleshooting may require measurement data from each AS.

Conclusion
– Routing monitoring is essential for network operation: it helps diagnose and fix network failures.
– Both BGP and IGP monitoring are passive: they listen to routing messages, with little overhead.
– Public BGP data can be misleading: there are only a few vantage points per AS, and the inaccuracies depend on the goal of the study.