Internet Routing: Measurement, Modeling, and Analysis
ACM Sigmetrics 2005 Tutorial

Dr. Jia Wang
[email protected]
AT&T Labs Research, Florham Park, NJ 07932, USA
http://www.research.att.com/~jiawang/

Prof. Zhuoqing Morley Mao
[email protected]
Department of EECS, University of Michigan, Ann Arbor, MI 48109, USA
http://www.eecs.umich.edu/~zmao/

Outline
1. Overview of inter-domain routing
2. Measuring inter-domain paths
3. BGP measurement
4. BGP modeling

Our opinions should not be taken to represent AT&T policies.

Part I: Overview of Interdomain Routing

Internet
• Loose cooperative effort of Internet Service Providers (ISPs)
  • E.g., AT&T, Sprint, UUNet, AOL
• Best-effort service
• Connectedness
  • Anyone connected to the Internet can exchange traffic with anyone else connected to the Internet

Internet routing
• Control plane: exchange routes over routing sessions
• Data plane: forward IP traffic
[Figure: routes and IP traffic between rusty.cs.berkeley.edu (IP=169.229.62.116, Prefix=169.229.0.0/16) and www.cnn.com (IP=64.236.16.52, Prefix=64.236.16.0/20), with fail-over to an alternate route]

Internet routing domain
• Autonomous routing domain
  • Network devices under the same technical and administrative control
  • Common routing policy
  • E.g., ISPs, enterprise networks
• Autonomous system
  • Autonomous routing domain with an AS number (ASN)
  • AS numbers: 16-bit integers
  • Public AS numbers: 1 – 64511
  • Private AS numbers: 64512 – 65535
• Examples
  • AT&T: 7018, 6431, …
  • Sprint: 1239, 1240, …
  • MIT: 3
• More than 20,000 ASes today

Internet
[Figure: the Internet as interconnected autonomous systems (ISPs such as Level3, Qwest, AT&T, Sprint, UUnet; Calren and Berkeley; a university, a business, a company, GNN, CNN) carrying IP traffic]

Internet routing architecture
[Figure: intra-domain routing inside a network (Calren, Berkeley, Level3) and inter-domain routing across the Internet, carrying IP traffic toward GNN and CNN]

Intra-domain routing
• Runs within a certain network infrastructure
• Optimizes routes taken between points within a network
• Internal Gateway Protocols (IGPs)
• Metrics based
  • OSPF (Open Shortest Path First)
  • RIP (Routing Information Protocol)
  • IS-IS (Intermediate System to Intermediate System)

Inter-domain routing
• Runs between networks
• Provides full connectivity of the entire Internet
• External Gateway Protocol (EGP)
• Policy based
  • BGP (Border Gateway Protocol)

Link state protocols
• Examples: OSPF, IS-IS
• Based on Dijkstra's shortest path computation
• Each router periodically floods immediate reachability information to other routers
• Fast convergence
• High communication and computation overhead
• Not scalable for large networks
• Requires periodic refreshes

Vectoring protocols
• Distance vs. path vector
  • Distance: hop count (RIP)
  • Path: entire path (BGP)
    • Helps identify loops
    • Supports policy-based routing based on path
• Minimal communication overhead
• Takes longer to converge, i.e., in proportion to the maximum path length

Link state vs. vectoring
        Link state     Vectoring
  IGP   OSPF, IS-IS    RIP
  EGP                  BGP
• BGP is a path vector protocol

Classful addressing
• IPv4: 32 bits
• Five classes of networks
  Class   Address   Mask             # of networks   # of hosts
  A       0*        255.0.0.0        128             ~16M
  B       10*       255.255.0.0      16384           65535
  C       110*      255.255.255.0    ~2.1M           255
  D       Used for multicast
  E       Reserved and currently unused
• Improve scaling factor of routing in the Internet => classless

CIDR: Classless Inter-domain Routing (RFC1519)
• No implicit mask based on the class of the network
• Explicit masks passed in the routing protocol
• Allows aggregation and hierarchical routing
  IP address: 12.70.0.0       00001100 01000110 00000000 00000000
  Mask:       255.255.252.0   11111111 11111111 11111100 00000000
  The bits covered by the mask form the network prefix; the remaining bits are the host identifier.
  CIDR representation: 12.70.0.0/22

Address aggregation
[Figure: 12.70.0.0/24, 12.70.1.0/24, 12.70.2.0/24, and 12.70.3.0/24 are announced to the Internet as the aggregate 12.70.0.0/22, alongside 12.71.0.0/16; ISP A and ISP B shown]

Routing and forwarding
• Routing
  • The decision process of choosing the optimal path that is consistent with the administrative or technical policy
• Forwarding
  • The act of receiving a packet, doing a lookup, and copying the packet to the next hop

Classless forwarding
[Figure: IP traffic destined to 12.70.0.20 arriving from the Internet at a router with neighbors 10.20.128.10, 10.20.128.1, 10.20.0.1, 10.20.1.1, and 135.120.0.1]
  Prefix          Next hop
  12.70.0.0/24    10.20.0.1
  12.70.0.0/16    10.20.1.1
  12.0.0.0/8      10.20.128.1
  0.0.0.0         10.20.128.10

Inter-domain routing with CIDR support
• BGP-4 [RFC1771]
• De facto EGP
• Carries routing information between ASes
• Path vector protocol
• Policy-based routing
• Runs on top of TCP for reliability
• Basic operations
  • Set up BGP session
  • Exchange all candidate routes
  • Send incremental updates

Establish BGP session
• Establish neighboring session between 12.10.0.1 and 12.10.0.2 (TCP 179)
  Router 12.10.0.1
  Prefix            Next hop
  135.120.0.0/24    10.128.0.1
  68.35.0.0/16      10.192.1.1

  Router 12.10.0.2
  Prefix            Next hop
  12.70.0.0/24      10.20.0.1
  12.9.0.0/16       10.20.1.1

Exchange all candidate routes
• 12.10.0.2 sends 12.70.0.0/24 and 12.9.0.0/16; 12.10.0.1 sends 135.120.0.0/24 and 68.35.0.0/16
  Router 12.10.0.1
  Prefix            Next hop
  135.120.0.0/24    10.128.0.1
  68.35.0.0/16      10.192.1.1
  12.70.0.0/24      10.20.0.1
  12.9.0.0/16       10.20.1.1

  Router 12.10.0.2
  Prefix            Next hop
  12.70.0.0/24      10.20.0.1
  12.9.0.0/16       10.20.1.1
  135.120.0.0/24    10.128.0.1
  68.35.0.0/16      10.192.1.1

Send incremental updates
• Withdraw 12.9.0.0/16
  Router 12.10.0.1
  Prefix            Next hop
  135.120.0.0/24    10.128.0.1
  68.35.0.0/16      10.192.1.1
  12.70.0.0/24      10.20.0.1
  12.9.0.0/16       10.20.1.1

  Router 12.10.0.2
  Prefix            Next hop
  12.70.0.0/24      10.20.0.1
  12.9.0.0/16       10.20.1.1
  135.120.0.0/24    10.128.0.1
  68.35.0.0/16      10.192.1.1

BGP messages
• OPEN: set up a peering session
• UPDATE: announce new routes or withdraw previously announced routes
• NOTIFICATION: shut down a peering session
• KEEPALIVE: confirm active connection at regular intervals

Internal vs. external BGP
[Figure: AS A, AS B, and AS C attached to the Internet; E-BGP updates between ASes, I-BGP updates within an AS]

Scaling I-BGP for large AS
• Route reflectors
• Confederations
[Figure: AS 1000 receiving an E-BGP update, with route reflectors (RR) and IBGP sessions; sub-ASes AS 65010 and AS 65020 connected by EBGP]
• Only best paths are sent by the RR

Establish connectivity
[Figure: AS 1 originates 135.120.0.0/16. EBGP runs between 12.10.0.1 (AS 1) and 12.10.0.2 (AS 2), IBGP inside AS 2, and EBGP between 12.10.0.5 (AS 2) and 12.10.0.6 (AS 3)]
  Router in AS 3
  Prefix            Next hop     AS path
  135.120.0.0/16    12.10.0.5    2 1

  Routers in AS 2
  Prefix            Next hop     AS path
  135.120.0.0/16    12.10.0.1    1

IGP and BGP working together
[Figure: same topology as the previous slide, with internal address 10.10.0.1 inside AS 2]
  BGP table
  Prefix            Next hop     AS path
  135.120.0.0/16    12.10.0.1    1

  Forwarding table after IGP resolution
  Prefix            Next hop
  12.10.0.0/30      10.10.0.1
  135.120.0.0/16    10.10.0.1

Policy routing
[Figure: ISP1, ISP2, ISP3, ISP4, Cust1, and Cust2, with the traffic flows allowed between them]
• Connectivity DOES NOT imply reachability!
• Policy determines how traffic can flow on the Internet

BGP routing process
• Routes received from peers
• Apply input policy
• Select best route
• Best routes are installed in the routing table and forwarding table
• Apply output policy
• Routes advertised to peers
• BGP is not shortest path routing!
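
The routing process above (input policy, best-route selection, output policy) can be sketched in a few lines. This is a minimal illustration, not a router implementation: the `Route` type, the `CUSTOMERS` set, the local-preference values, and all AS numbers (taken from the private range) are assumptions made for the example, and selection is reduced to local preference followed by AS-path length.

```python
from dataclasses import dataclass, replace

# Hypothetical set of neighbor ASNs that are our customers (private ASNs).
CUSTOMERS = {65001}


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple          # first element is the neighbor that sent the route
    local_pref: int = 100


def apply_input_policy(route):
    """Assumed policy: routes learned from customers get a higher local preference."""
    neighbor = route.as_path[0]
    return replace(route, local_pref=200 if neighbor in CUSTOMERS else 100)


def select_best(routes):
    """Reduced selection: highest local preference, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


def apply_output_policy(route, my_as):
    """Prepend our own AS number before advertising the route."""
    return replace(route, as_path=(my_as,) + route.as_path)


def bgp_process(received, my_as):
    """Run input policy, per-prefix selection, and output policy."""
    by_prefix = {}
    for route in map(apply_input_policy, received):
        by_prefix.setdefault(route.prefix, []).append(route)
    best = {prefix: select_best(rs) for prefix, rs in by_prefix.items()}
    advertised = [apply_output_policy(r, my_as) for r in best.values()]
    return best, advertised


received = [
    Route("135.120.0.0/16", (65002, 65010)),                # short path via a non-customer
    Route("135.120.0.0/16", (65001, 65020, 65030, 65010)),  # longer path via a customer
]
best, advertised = bgp_process(received, my_as=64999)
print(best["135.120.0.0/16"].as_path)
```

In this sketch the longer customer-learned path wins because local preference is compared before AS-path length, which is one way to see why BGP is not shortest path routing.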
Best route selection
• Highest local preference
  • To enforce economic relationships between domains
• Shortest AS path
  • Compare the quality of routes, assuming shorter AS-path length is better
• Lowest MED (Multi-Exit-Discriminator)
  • To implement "cold potato" routing between neighboring domains
• I-BGP < E-BGP
  • Prefer EBGP routes to IBGP routes
• Lowest I-BGP cost to E-BGP egress
  • Prefer routes via the nearest IGP neighbor
  • To implement "hot potato" routing
• Tie breaking rules
  • Router ID based: lowest router ID
  • Age based: oldest route

BGP route propagation
• Not all possible routes propagate
• Commercial relationships determine policies for
  • Route import
  • Route selection
  • Route export

Typical AS relationships
• Provider-customer
  • Customer pays money for transit
• Peer-peer
  • Typically exchange respective customers' traffic for free
• Siblings
  • Mutual transit agreement
  • Provide connectivity to the rest of the Internet for each other

AS relationships translate into BGP export rules
• Export to a provider or a peer
  • Allowed: its routes and routes of its customers and siblings
  • Disallowed: routes learned from other providers or peers
• Export to a customer or a sibling
  • Allowed: its routes, the routes of its customers and siblings, and routes learned from its providers and peers

Which AS paths are legal?
• Valley-free:
  • After traversing a provider-customer or peer-peer edge, a path cannot traverse a customer-provider or peer-peer edge
• Invalid path: >= 2 peer links, downhill-uphill, downhill-peer, peer-uphill

Example of valley-free paths
[Figure: example AS graph with ASes 1 to 6]
• [1 2 3], [1 2 6 3] are valley-free
• [1 4 3], [1 4 5 3] are not valley-free

Inferring AS relationships
• Identify the AS-level hierarchy of the Internet
  • Not shortest path routing
• Predict AS-level paths
• Traffic engineering
• Understand the Internet better
• Correlate with and interpret BGP updates
• Identify BGP misconfigurations
  • E.g., errors in BGP export rules

Existing approaches
• On Inferring Autonomous System Relationships in the Internet, by L. Gao, IEEE Global Internet, 2000.
• Characterizing the Internet Hierarchy from Multiple Vantage Points, by L. Subramanian, S. Agarwal, J. Rexford, and R. Katz, IEEE Infocom, 2002.
• Computing the Types of the Relationships between Autonomous Systems, by G. Battista, M. Patrignani, and M. Pizzonia, IEEE Infocom, 2003.
• On AS-level Path Inference, by Z. Mao, L. Qiu, J. Wang, and Y. Zhang, ACM Sigmetrics, 2005.

Policy routing causes path inflation
• End-to-end paths are significantly longer than necessary
• Why?
  • Topology and routing policy choices within an ISP, between pairs of ISPs, and across the global Internet
  • Peering policies and interdomain routing lead to significant inflation
  • Interdomain path inflation is due to the lack of BGP policy to provide convenient engineering of good paths across ISPs

Path inflation
• Based on [Mahajan03]
• Comparing actual Internet paths with a hypothetical "direct" link

Part II: Measuring Interdomain Forwarding Paths

Why do we care?
• Characterize end-to-end network paths
  • Latency
  • Capacity
  • Link utilization
  • Loss rate
• Diagnose routing anomalies
  • Forwarding loops, blackholes, routing changes, unexpected paths, main component of end-to-end latency
• Discover Internet topology
• Server placement

Key challenge
• Need to understand how packets flow through the Internet without real-time access to proprietary routing data from each domain
  • Identify accurate packet forwarding paths
  • Characterize the performance metrics of each hop along the paths

Existing approaches
• With access to the source: AS-level traceroute
  • Towards an Accurate AS-Level Traceroute Tool, by Z. Mao, J. Rexford, J. Wang, and R. Katz, ACM Sigcomm, 2003.
  • Scalable and Accurate Identification of AS-Level Forwarding Paths, by Z. Mao, D. Johnson, J. Rexford, J. Wang, and R. Katz, IEEE Infocom, 2004.
• Without access to the source: Routescope
  • On AS-level Path Inference, by Z. Mao, L. Qiu, J. Wang, and Y. Zhang, ACM Sigmetrics, 2005.

AS-Level Traceroute
• Traceroute gives the IP-level forwarding path
  • IP addresses of the router interfaces on a forwarding path
  • RTT statistics for each hop along the way

Traceroute from AT&T Research to www.cnn.com
    traceroute to cnn.com (64.236.24.12), 30 hops max, 40 byte packets
     1  oden (135.207.16.1)  1 ms  1 ms  1 ms
     2  * * *
     3  attlr-gate (192.20.225.1)  2 ms  2 ms  2 ms
     4  12.119.155.157 (12.119.155.157)  3 ms  4 ms  4 ms
     5  gbr6-p52.n54ny.ip.att.net (12.123.192.18)  4 ms  4 ms  4 ms
     6  tbr2-p012401.n54ny.ip.att.net (12.122.11.29)  4 ms (ttl=249!)  5 ms (ttl=249!)  5 ms (ttl=249!)
     7  ggr2-p390.n54ny.ip.att.net (12.123.3.62)  4 ms  5 ms  4 ms
     8  att-gw.ny.aol.net (192.205.32.218)  4 ms  4 ms  4 ms
     9  bb2-nye-P1-0.atdn.net (66.185.151.66)  4 ms  4 ms  4 ms
    10  bb2-vie-P8-0.atdn.net (66.185.152.201)  13 ms (ttl=245!)  12 ms (ttl=245!)  12 ms (ttl=245!)
    11  bb1-vie-P11-0.atdn.net (66.185.152.206)  10 ms  10 ms  10 ms
    12  bb1-cha-P7-0.atdn.net (66.185.152.28)  20 ms  20 ms  20 ms
    13  bb1-atm-P6-0.atdn.net (66.185.152.182)  25 ms  25 ms  25 ms
    14  pop1-atl-P4-0.atdn.net (66.185.136.17)  25 ms (ttl=243!)  24 ms (ttl=243!)  24 ms (ttl=243!)
    15  * * *
    (hops 16 through 30 likewise return * * *)
• Destination unreachable!
• Who is responsible for the forwarding problem?

Need to know the inter-domain level path
• Obtain AS-level paths
  • BGP AS path
  • Traceroute AS path

BGP AS path
[Figure: the signaling path carries control traffic for prefix d (d: path=[C], then d: path=[B C]); the forwarding path carries data traffic toward prefix d; the routing table maps prefix d to AS path A B C]
• Is the BGP AS path the answer? No!

BGP AS path is not the answer
• Requires timely access to BGP data
• Signaling path may differ from forwarding path
  • Route aggregation and filtering
  • Routing anomalies: e.g., deflections, loops [Griffin2002]
  • BGP misconfigurations: e.g., incorrect AS prepending
• The two paths may differ precisely when operators most need accurate data to diagnose a problem!

Traceroute AS path
• Obtain the IP-level path using traceroute
• Map IP addresses to ASes
[Figure: hops a, b, c, d, e between source and destination, mapped to AS A, AS B, AS C, AS D]
• Is the traceroute AS path the answer? NO!

Traceroute AS path is not the answer
• Identifying ASes along the forwarding path is surprisingly difficult!
  • Internet route registry
  • Origin AS in BGP routes

Internet route registry
• Whois database
  • E.g., NANOG traceroute, prtraceroute
• Out-of-date, incomplete
  • Address allocation to customers
  • Acquisitions, mergers, break-ups

Origin AS in BGP routes
• Last AS in the AS path for each prefix
  Prefix   AS path
  d        A B C
  …        …
• More accurate and complete than whois data

Limitations of BGP origin AS
• Multiple Origin AS (MOAS)
  • Multi-homing
  • Misconfiguration
  • Internet eXchange Points (IXPs)
• Infrastructure addresses may not be advertised
  • Are not required to be announced publicly
  • Security concerns
• Addresses announced by someone else
  • Statically routed customers
  • Shared equipment at the boundary between ASes
• Need accurate IP-to-AS mapping!

Accurate AS-level traceroute
• Combine BGP and traceroute data to find a better answer!

Assumptions
• IP-to-AS mapping
  • Mappings from BGP tables are mostly correct
  • Change slowly
• BGP paths and forwarding paths mostly match
  • 70% of the BGP paths and traceroute paths match

BGP path and traceroute path could differ!
• Inaccurate IP-to-AS mapping
  • Internet eXchange Points (IXPs)
  • Sibling ASes
  • Unannounced infrastructure addresses
• Traceroute problems
• Legitimate mismatches

Internet eXchange Points (IXPs)
• Shared infrastructure connected to multiple service providers
• Exchange BGP routes and data traffic
• May have its own AS number or be announced by participating ASes
• Dedicated BGP sessions between pairs of participating ASes
• E.g., Mae-East, Mae-West, PAIX
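
The baseline mapping discussed above (origin AS from BGP tables, applied to traceroute hops by longest-prefix match) can be sketched as follows. This is an illustrative sketch only: the function names, the tiny BGP table, and the private AS numbers 65001 to 65003 are made up for the example, while 7018 is the AT&T ASN quoted earlier in the tutorial.

```python
import ipaddress


def build_ip_to_as(bgp_table):
    """Map each prefix to the set of origin ASes (last AS in each AS path)."""
    mapping = {}
    for prefix, as_path in bgp_table:
        net = ipaddress.ip_network(prefix)
        mapping.setdefault(net, set()).add(as_path[-1])
    return mapping


def lookup(mapping, ip):
    """Longest-prefix match; returns a set of origin ASes, or None if unmapped."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in mapping if addr in net]
    if not matches:
        return None
    return mapping[max(matches, key=lambda net: net.prefixlen)]


def traceroute_as_path(mapping, hops):
    """Convert IP hops to AS-level hops, collapsing consecutive repeats."""
    path = []
    for ip in hops:
        origins = lookup(mapping, ip)
        label = None if origins is None else tuple(sorted(origins))
        if not path or path[-1] != label:
            path.append(label)
    return path


# Hypothetical BGP table: (prefix, AS path as seen from some vantage point).
table = [
    ("12.0.0.0/8", [65001, 7018]),
    ("12.70.0.0/22", [65001, 65003]),
    ("66.185.128.0/19", [65001, 65002]),
]
mapping = build_ip_to_as(table)
hops = ["12.123.192.18", "12.123.3.62", "192.205.32.218", "66.185.151.66"]
print(traceroute_as_path(mapping, hops))
```

A hop that maps to `None` corresponds to an unannounced address, and a label with more than one AS corresponds to a MOAS prefix; these are exactly the cases the following slides set out to repair.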
IXPs cause extra AS hop
• Extra AS hop in traceroute path
• Large number of fan-in and fan-out ASes
• Non-transit AS, small address block, likely MOAS
[Figure: traceroute AS paths through ASes A to G that traverse the IXP, compared with the corresponding BGP AS paths]

Sibling ASes
• Single organization owns and manages multiple ASes
• May share address space
• Cause extra AS hop
• Large fan-in and fan-out for the "sibling AS pair"
[Figure: traceroute AS paths through ASes A to H that traverse the sibling pair, compared with the corresponding BGP AS paths]

Unannounced infrastructure addresses
• ASes do not necessarily announce infrastructure via BGP
• Lead to "unmapped" addresses
• Sometimes fall into a supernet announced by the AS's provider or sibling
• Resulting mismatches
  • AS loop in traceroute path
  • Substitute AS hop
  • Missing AS hop in traceroute path
  • Extra AS hop in traceroute path
[Figure: AS A, AS B, and AS C with example paths 1. A,C  2. A  3. B,A  4. A,C,A]

BGP path and traceroute path could differ!
• Inaccurate IP-to-AS mapping
• Traceroute problems
  • Forwarding path changing during traceroute
  • Interface numbering at AS boundaries
  • ICMP response refers to outgoing interface
• Legitimate mismatches

Forwarding path changing during traceroute
• Route flaps between A B C and A D E
[Figure: ASes A, B, C, D, E]
• AS hop B is substituted by AS D in the traceroute path

Interface numbering at AS boundaries
[Figure: AS A, AS B, AS C]
• Missing AS hop B in traceroute path

ICMP response refers to outgoing interface
[Figure: AS A, AS B, AS C, and the ICMP message]
• Extra AS hop B in traceroute path

BGP path and traceroute path could differ!
• Inaccurate IP-to-AS mapping
• Traceroute problems
• Legitimate mismatches
  • Route aggregation and filtering
  • Routing anomalies, e.g., deflections

Route aggregation/filtering
[Figure: AS A holds 8.0.0.0/8 with path B C; AS B holds 8.0.0.0/8 with path C; AS C holds 8.64.0.0/16 with path C D]
• Extended traceroute path due to filtering by AS B

Mismatch patterns and causes
[Table: mismatch patterns (columns: Extra AS, Miss AS, AS Loop, Subst AS, Other) against causes (rows: IXP, Sibling ASes, Unannounced IP, Aggregation/filtering, Inter-AS interface, ICMP source address, Routing anomaly); an X marks each pattern that a cause can produce]

BGP and traceroute data collection
• Initial mappings from the origin AS of a large set of BGP tables
• For each location: local BGP paths (ignoring unstable paths)
• For each location: traceroute paths from multiple locations, converted to traceroute AS paths
• Combine all locations:
  •Compare
  •Look for known causes of mismatches (e.g., IXP, sibling ASes)
  •Edit IP-to-AS mappings (a single change explaining a large number of mismatches)

Measurement setup
• Eight vantage points
  • Upstream providers: US-centric tier-1 ISPs
• Sweep all routable IP address space
  • About 200,000 IP addresses, 160,000 prefixes, 15,000 destination ASes

Preprocessing BGP paths
• Discard prefixes with BGP paths containing
  • Routing changes based on BGP updates
  • Private AS numbers (64512 - 65535)
  • Empty AS paths (local destinations)
  • AS loops from misconfiguration
  • AS SET instead of AS sequence
• Less than 1% of prefixes affected

Preprocessing traceroute paths
• Resolving incomplete traceroute paths
  • Unresolved hops within a single AS map to that AS
  • Unmapped hops between ASes
    • Try to match to a neighboring AS using DNS, Whois
  • Trim unresponsive (*) hops at the end
• Compare with the beginning of local BGP paths
• MOAS at the end of paths
  • Assume multi-homing without BGP
• Validation using AT&T router configurations
  • More than 98% of cases validated

Initial IP-to-AS Mapping
             Whois    Combined BGP tables   Resolving incompletes
  Match      44.7%    73.2%                 78.0%
  Mismatch   29.4%    8.3%                  9.0%
  Ratio      1.5      8.8                   9.0

Heuristics to improve mappings
• Overall modification to mappings
  • 10% of IP-to-AS mappings modified
  • 25 IXPs identified
  • 28 pairs of sibling ASes found
  • 1150 of the /24 prefixes shared
             IXP      Sibling ASes   Unannounced address space
  Match      84.4%    85.9%          90.6%
  Mismatch   8.7%     7.8%           3.5%
  Ratio      9.7      11.0           26.0

Systematic optimization
• Dynamic programming and iterative improvement
  • Initial IP-to-AS mapping derived from BGP routing tables
  • Identify a small number of modifications that significantly improve the match rate
  • 95% match ratio, less than 3% changes, very robust

Optimization results
  Input mapping                                                Mismatch
  Full initial mapping                                         5.23%
  Heuristically optimized mapping                              3.08%
  Omit 10% initial mapping                                     6.57%
  Omit 4 probing sources                                       6.34%
  Omit probing destinations (one probe per unique BGP path)    7.12%

AS-level path inference
• Without access to the source
• Challenges
  • Asymmetric routes: 60%
  • Complicated routing policies
  • Multihomed networks
• Find the shortest policy path that conforms with AS relationships

Routescope
• Assumptions
  • Explicit AS relationship
    • Peer-peer
    • Provider-customer
  • Shortest policy AS path preferred
    • Valley-free
  • Uniform routing policy within an AS
  • AS destination based uniform routing
  • Stability
• These assumptions are mostly correct.

AS path inference algorithm
• Compose AS graph based on BGP tables
• Infer AS relationships
• Classify edges based on AS relationship
  • Customer-provider (UP) link
  • Provider-customer (DOWN) link
  • Peering (FLAT) link
• Compute the shortest policy path conforming to the "valley-free" rule using a modified Dijkstra's algorithm
• Infer the first AS hop if multiple paths are returned

AS path inference accuracy
                      Total    Match   Match length   Exact match   Shorter   Longer
  AS7018 (tier-1)     18085    82%     83%            35%           17%       0%
  AS2152 (tier-2)     11990    64%     64%            10%           35%       0%
  AS8121 (tier-3)     15757    16%     27%            3%            69%       4%
  All BGP gateways    2457     70%     73%            30%           22%       4%
  US BGP gateways     1907     60%     62%            27%           34%       4%
• If the first hop is known, 15% of mismatches can be eliminated.
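
The "shortest policy path" idea can be illustrated with a small search over a labeled AS graph. The tutorial describes a modified Dijkstra's algorithm; the sketch below swaps in a plain breadth-first search over (AS, phase) states, which finds a minimum-hop valley-free path when every link counts as one hop. The graph, the AS numbers, and the helper names are hypothetical.

```python
from collections import deque


def add_provider_customer(links, provider, customer):
    """Label both directions of a provider-customer edge."""
    links[(customer, provider)] = "UP"
    links[(provider, customer)] = "DOWN"


def add_peering(links, a, b):
    """Label both directions of a peer-peer edge."""
    links[(a, b)] = "FLAT"
    links[(b, a)] = "FLAT"


def shortest_valley_free_path(links, src, dst):
    """BFS over (AS, descending) states.

    Once a FLAT or DOWN link has been traversed (descending=True),
    only DOWN links may follow, which enforces the valley-free rule.
    """
    adjacency = {}
    for (a, b), kind in links.items():
        adjacency.setdefault(a, []).append((b, kind))

    start = (src, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, descending = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, kind in adjacency.get(node, []):
            if descending and kind != "DOWN":
                continue  # would create a valley or a second peer link
            nxt_state = (nxt, descending or kind != "UP")
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


links = {}
add_provider_customer(links, provider=2, customer=1)
add_provider_customer(links, provider=2, customer=3)
add_provider_customer(links, provider=1, customer=4)
add_provider_customer(links, provider=3, customer=4)
print(shortest_valley_free_path(links, 1, 3))
```

In this made-up graph, AS 4 is a customer of both 1 and 3, so the path 1 4 3 exists physically but goes downhill and then uphill; the search rejects it and returns the path through the common provider instead.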
First hop inference
[Figure: source AS S, destination AS D, candidate first-hop ASes T1 and T2, AS C, and transition point T1; we only have access to D]
• Gather candidate first-hop ASes from S by launching traceroutes to S from multiple vantage points
• Identify the transition point T that is likely to be on the path from S to D by testing
  hop_count(S,T) + hop_count(T,D) = hop_count(S,D)

Hop count inference
• hop_count(S,T) = hop_count(T,S)
• To infer hop_count(H,D), where H = T or S
  • Send a ping packet to H
  • Guess the initial TTL value TTL0 set by H
  • Get the TTL value TTL1 in the ICMP response packet received from H
  • hop_count(H,D) = TTL0 - TTL1 + 1
• Common values for TTL0:
  • 32 (Win95/98/Me)
  • 64 (Linux, Compaq Tru64)
  • 128 (Win NT/2000/XP)
  • 255 (most UNIX systems)

Improvement with known first AS hop
                      Total    Match length   Improvement
  AS7018 (tier-1)     18085    86%            3%
  AS2152 (tier-2)     11990    76%            12%
  AS8121 (tier-3)     15757    48%            21%
  All BGP gateways    1907     70%            8%
  US BGP gateways     2457     88%            15%

Possible causes of inaccuracy
• Complicated AS relationships: 15% of paths
  • Two consecutive FLAT links
  • DOWN link followed by a FLAT link
  • FLAT link followed by UP link
• Routing policies
  • Shortest path vs. customer routes
  • Inconsistent advertisement to different peering locations
  • BGP tie-breaking rules
• AS prepending: >28% of ASes

Part III: BGP Measurement

BGP routing updates
• Route updates at prefix level
• No activity in "steady state"
• Routing messages indicate changes, no refreshes

Internet routing instability
• Large number of BGP updates
  • Failures
  • Policy changes
  • Redundant messages
• Routing instability
  • Route keeps changing, e.g., routes keep going up and down

Implications
• Router overhead
• Transient delay and loss
  • Unreachable hosts
  • High loss rate
  • High jitter
  • Long delays
  • Significant packet reordering
• Poor predictability of traffic flow
• How do we know if the instability is due to routing or network congestion?

Measure BGP stability
• First work by Labovitz et al.
• Methodology
  • Collect routing messages from five public exchange points
• BGP information considered
  • AS path
  • Next hop: next hop to reach a network
  • Two routes are the same if they have the same AS path and next hop
• Other attributes (e.g., MED, communities) ignored
  • Focus on forwarding path stability

Measurement methodology
[Figure: measurement methodology]

BGP information exchange
• Announcements: a router has either
  • Learned of a new route, or
  • Made a policy decision that it prefers a new route
• Withdrawals: a router concludes that a network is no longer reachable
  • Explicit: associated with the withdrawal message
  • Implicit: (in effect an announcement) when a route is replaced as a result of an announcement message
• In steady state, BGP updates should only be the result of infrequent policy changes
  • BGP is stateful and requires no refreshes
  • Update rate: indication of network stability

Example of delayed convergence
[Figure: example topology with nodes 1, 2, 3, 4 and destination d]
[Table: routes held by nodes 2, 3, and 4 at successive stages, e.g., node 2: [1], [41], [431]; node 3: [1], [41], [241]; node 4: [1], [31], --]
Assuming node 1 has a route to a destination, and it withdraws the route:
  Stage   Msg processed       Msg queued
  0:                          1->{2,3,4}W
  1:      1->{2,3,4}W         2->{3,4}A[241], 3->{2,4}A[341], 4->{2,3}A[431]
  2:      2->{3,4}A[241]      3->{2,4}A[341], 4->{2,3}A[431]
  3:      3->{2,4}A[341]      4->{2,3}A[431], 4->{2,3}W
  4:      4->{2,3}A[431]      MinRouteAdver timer expires: 4->{2,3}W, 3->{2,4}A[3241], 2->{3,4}A[2431]
  … (omitted)
  9:      3->{2,4}W
Note: In response to a withdrawal from 1, node 3 sends out 3 messages: 3->{2,4}A[341], 3->{2,4}A[3241], 3->{2,4}W

Types of inter-domain routing updates
• Forwarding instability
  • May reflect topology changes
• Policy fluctuations (routing instability)
  • May reflect changes in routing policy information
• Pathological updates
  • Redundant updates that are neither routing nor forwarding instability
• Instability
  • Forwarding instability and policy fluctuation, which change the forwarding path

Routing successive events (instability)
• WADiff
  • W: a route is explicitly withdrawn as it becomes unreachable
  • A: is later replaced with an alternative route
  • Forwarding instability
• AADiff
  • A: a route is implicitly withdrawn
  • A: then replaced by an alternative route as the original route becomes unavailable or a new preferred route becomes available
  • Forwarding instability

Routing successive events (pathological instability)
• WADup
  • W: a route is explicitly withdrawn
  • A: then reannounced later
  • Forwarding instability or pathological behavior
• AADup
  • A: a route is implicitly withdrawn
  • A: then replaced with a duplicate of the original route
  • Pathological behavior or policy fluctuation
• WWDup
  • The repeated transmission of BGP withdrawals for a prefix that is currently unreachable (pathological behavior)

Measurement findings: overview
• Year 2000
• BGP updates more than one order of magnitude larger than expected
• Routing information dominated by pathological updates
  • Implementation problems
  • BGP self-synchronization
  • Unconstrained routing policies

Routing problem findings
• Implementation problems
  • Redundant updates
  • Routers do not maintain the history of the announcements sent to neighbors
• Self-synchronization
  • BGP routers exchanging information simultaneously may lead to periodic link/router failures
• Unconstrained routing policies may lead to persistent route oscillations

Instability measurement
• Instability and redundant updates exhibit strong correlation with load (30-second, 24-hour, and seven-day periods)
• Instability usually exhibits high frequency
• Pathological updates exhibit both high and low frequencies

Non-localized instability
• No single AS dominates instability statistics
• No correlation between the size of an AS and its impact on instability statistics
• There is no small set of paths that dominates instability statistics

Measurement conclusions
• Routing in the Internet exhibits many undesirable behaviors
  • Instability over a wide range of time scales
  • Asymmetric routes
  • Network outages
  • Problem seems to worsen
• Many problems are due to software bugs or inefficient router architectures

Lessons
• Even after decades of experience, routing in the Internet is not a solved problem
• This attests to the difficulty and complexity of building distributed algorithms in the Internet, i.e., in a heterogeneous environment with products from various vendors
• Simple protocols may increase the chance of being
  • Understood
  • Implemented right

Better understanding of BGP dynamics
• Difficulties
  • Multiple administrative domains
  • Unknown information (policies, topologies)
  • Unknown operational practices
  • Ambiguous protocol specs
• Proposal: a controlled active measurement infrastructure for continuous BGP monitoring, namely BGP Beacons

What is a BGP Beacon?
• An unused, globally visible prefix with a known Announce/Withdrawal schedule
• For long-term, public use

Who will benefit from BGP Beacon?
• Researchers: study BGP dynamics
  • To calibrate and interpret BGP updates
  • To study convergence behavior
  • To analyze routing and data plane interaction
• Network operators
  • Serve to debug reachability problems
  • Test effects of configuration changes, e.g., flap damping settings

Related work
• Differences from Labovitz's "BGP fault-injector"
  • Long-term, publicly documented
  • Varying advertisement schedule
  • Beacon sequence number (AGG field)
  • Enabler for much research in routing dynamics
• RIPE RIS Beacons
  • Set up at 9 exchange points

Active measurement infrastructure
[Figure: a stub AS hosting BGP Beacon #1 (198.133.206.0/24) sends route updates through its upstream providers into the Internet, where many observation points record them: 1. Oregon RouteViews, 2. RIPE, 3. AT&T, 4. Verio, 5. MIT, 6. Berkeley]

Deployed PSG Beacons
  Prefix              Src AS   Start date   Upstream provider AS   Beacon host     Beacon location
  198.133.206.0/24    3130     8/10/02      2914, 1239             Randy Bush      WA, US
  192.135.183.0/24    5637     9/4/02       3701, 2914             Dave Meyer      OR, US
  203.10.63.0/24      1221     9/25/02      1221                   Geoff Huston    Australia
  198.32.7.0/24       3944     10/24/02     2914, 8001             Andrew Partan   MD, US
  192.83.230.0/24     3130     06/12/03     2914, 1239             Randy Bush      WA, US

Deployed PSG Beacons
• B1, 2, 3, 5:
  • Announced and withdrawn with a fixed period (2 hours) between updates
  • 1st daily ANN: 3:00AM GMT
  • 1st daily WD: 1:00AM GMT
• B4: varying period
• B5: fail-over experiments
• Software available at: http://www.psg.com/~zmao

Beacon 5 schedule
• Live host behind the beacon for data analysis
• Study fail-over behavior for multi-homed customers

Beacon terminology
• Beacon prefix: 198.133.206.0/24, announced from the Beacon AS into the Internet
• Input signal: Beacon-injected change
  • 3:00:00 GMT: Announce (A0)
  • 5:00:00 GMT: Withdrawal (W)
• Output signal (observed at, e.g., RouteView or AT&T):
  • 5:00:10 A1
  • 5:00:40 W
  • 5:01:10 A2
• Signal length: number of updates in the output signal (3 updates)
• Signal duration: time between the first and last update in the signal (5:00:10 - 5:01:10, 60 seconds)
• Inter-arrival time: time between consecutive updates

Process Beacon data
• Identify output signals, ignore external events
  • Data cleaning
  • Anchor prefix as reference
    • Same origin AS as beacon prefix
    • Statically nailed down
• Minimize interference between consecutive input signals
  • Beacon period is set to 2 hours
• Time stamp and sequence number
  • Attach additional information in the BGP updates
  • Make use of a transitive attribute: Aggregator fields

Beacon data cleaning process
• Goal
  • Clearly identify updates associated with the injected routing change
  • Discard beacon events influenced by external routing changes

Cumulative Beacon statistics: significant noise
• Current observation points: 111 peers
  • RIPE, Route-View, Berkeley, MIT, MIT-RON nodes, ATT-Research, AT&T, AMS-IXP, Verio
• Example response to ANN-beacon at peer p
  • R1: ASpath = 286 209 1 3130 3927
  • R2: ASpath = 286 209 2914 3130 3927
  • No. of transient routes = 2
  • Out-signal length = 1
  • 100 events: 20 are R1 followed by R2, 80 are R2 only
  • Avg expansion: 2*0.2 + 1*0.8 = 1.2
  Beacon   Max no. transient routes   Max ANN-out-signal length   Max WD-out-signal length   Max ANN-avg expansion   Max WD-avg expansion
  1        186                        11                          14                         9.7                     11.2
  2        179                        9                           15                         7.0                     10.8
  3        117                        16                          13                         5.8                     11.4
  4        307                        18                          15                         8.8                     16.3

Cisco vs. Juniper update rate-limiting
• Known last-hop Cisco and Juniper routers from the same AS and location
• Average signal length: average number of updates observed for a single beacon-injected change

"Cisco-like" last-hop routers
• Linear increase in signal duration with respect to signal length
  • Slope = 30 seconds
  • Due to Cisco's default rate-limiting setting

"Juniper-like" last-hop routers
• Signal duration relatively stable with respect to increase in signal length
• Shorter signal duration compared to "Cisco-like" last hops

Route flap damping
• A mechanism to punish unstable routes by suppressing them
  • Reduce router processing load due to instability
  • Prevent sustained routing oscillations
  • Do not sacrifice convergence times for well-behaved routes
• There is a conjecture that a single announcement can cause route suppression.

RFC2439: Route flap damping
• Scope
  • Inbound external routes
  • Per neighbor, per destination
• Penalty
  • Flap: route change
  • Increases for each flap
  • Decays exponentially: $P(t') = P(t)\,e^{-\lambda (t' - t)}$
[Figure: penalty vs. time (min) under the Cisco default setting, exponentially decayed, with penalty ticks at 750, 1000, 2000, and 3000, the suppress and reuse thresholds marked, and time ticks at 0, 2, 4, and 32]

Route flap damping analysis
• Strong evidence for withdrawal- and announcement-triggered suppression.
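
The penalty mechanism above can be made concrete with a short calculation. This is a sketch under stated assumptions: the per-flap penalty of 1000, suppress threshold of 2000, reuse threshold of 750, and 15-minute half-life are the commonly cited Cisco defaults rather than values stated in the text, and the function names are made up. The decay constant is λ = ln 2 / half-life.

```python
import math


def decayed_penalty(penalty, elapsed_min, half_life_min=15.0):
    """P(t') = P(t) * exp(-lambda * (t' - t)), with lambda = ln(2) / half-life."""
    lam = math.log(2) / half_life_min
    return penalty * math.exp(-lam * elapsed_min)


def damping_trace(flap_times_min, per_flap=1000.0, suppress=2000.0,
                  half_life_min=15.0):
    """Return (time, penalty, exceeds_suppress_threshold) after each flap."""
    penalty, last, trace = 0.0, None, []
    for t in flap_times_min:
        if last is not None:
            penalty = decayed_penalty(penalty, t - last, half_life_min)
        penalty += per_flap
        last = t
        trace.append((t, penalty, penalty > suppress))
    return trace


def minutes_until_reuse(penalty, reuse=750.0, half_life_min=15.0):
    """Time for a penalty to decay down to the reuse threshold."""
    if penalty <= reuse:
        return 0.0
    return half_life_min * math.log2(penalty / reuse)


# Three flaps two minutes apart, as a beacon with a short period might cause.
trace = damping_trace([0, 2, 4])
for t, penalty, over in trace:
    print(f"t={t:>2} min  penalty={penalty:7.1f}  over suppress threshold: {over}")
print(f"reusable about {minutes_until_reuse(trace[-1][1]):.1f} min after the last flap")
```

Under these assumed parameters the third flap is the first to push the penalty over the suppress threshold, and the route then stays suppressed for tens of minutes, which is why closely spaced updates during convergence can trigger suppression.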
Distinguish between announcement and withdrawal
Summary:
• WD-triggered suppression is more likely than ANN-triggered suppression
• Cisco: overall more likely to trigger suppression than Juniper (AAAW pattern)
• Juniper: more aggressive for the AWAW pattern

Convergence analysis
Summary:
• Withdrawals converge more slowly than announcements
• Most beacon events converge within 3 minutes

Output signal duration
[Figure: axis ticks 30, 60, 90, 120]

Beacon 1’s upstream change
• Single-homed (AS2914)
• Multi-homed (AS1, 2914)
• Multi-homed (AS1239, 2914)

Beacon for identifying router behavior
• Rate-limiting timer: 30 seconds
• Beacon 2 seen from RouteView data
• Different rate-limiting behavior: Cisco vs. Juniper

Inter-arrival time analysis

Inter-arrival time modeling
• Geometric distribution (body): update rate-limiting behavior, every 30 sec; Prob(missing update train) is independent of how many were already missed
• Mass at 1: discretization of timestamps for times < 1
• Shifted exponential distribution (tail): most likely due to route flap damping

Motivation
[Figure labels: source, destination, AS1 (border routers A, B, C, D), AS2, AS3, AS4, Failure, Disruption, Congestion, Mitigation]
A backbone network is vulnerable to routing changes that occur in other domains.

Goal
Identify important routing anomalies
• Lost reachability
• Persistent flapping
• Large traffic shifts
Contributions:
• Build a tool to identify a small number of important routing disruptions from a large volume of raw BGP updates in real time.
• Use the tool to characterize routing disruptions in an operational network

Capturing Routing Changes
• A large operational network (8/16/2004 to 10/10/2004)
[Figure labels: BR, C, CPE, BGP Monitor]

Challenges
• Large volume of BGP updates
  Millions daily, very bursty
  Too much for an operator to manage
• Different from root-cause analysis
  Identify changes and their effects
  Focus on actionable events rather than diagnosis
  Diagnose causes in/near the AS

System Architecture
[Figure: processing pipeline with approximate item counts]
BGP Updates (10^6) → BGP Update Grouping → Events (10^5) → Event Classification → “Typed” Events → Event Correlation → Clusters (10^3) → Traffic Impact Prediction (with Netflow Data) → Large Disruptions (10^1)
Side outputs: Persistent Flapping Prefixes (10^1), Frequent Flapping Prefixes (10^1)
From millions of updates to a few dozen reports

Grouping BGP Updates into Events
Challenge: a single routing change
• leads to multiple update messages
• affects routing decisions at multiple routers
Approach:
• Group together all updates for a prefix with inter-arrival < 70 seconds
• Flag prefixes with changes lasting > 10 minutes.
[Figure: BGP Updates → BGP Update Grouping → Events, Persistent Flapping Prefixes]

Grouping Thresholds
• Based on our understanding of BGP and data analysis
• Event timeout: 70 seconds
  2 * MRAI timer + 10 seconds
  98% of inter-arrival times < 70 seconds
• Convergence timeout: 10 minutes
  BGP usually converges within a few minutes
  99.9% of events < 10 minutes

Persistent Flapping Prefixes
A surprising finding: 15.2% of updates were caused by persistent-flapping prefixes even though flap damping is enabled.
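The grouping rule from the preceding slides (70-second event timeout, 10-minute convergence timeout) is what singles out persistent-flapping prefixes, and it can be sketched in a few lines. This is an offline illustration under stated assumptions, not the tool described in the tutorial, which runs online: updates are reduced to hypothetical `(timestamp_seconds, prefix)` pairs, and the function name and return shape are invented for the example.

```python
from collections import defaultdict

EVENT_TIMEOUT_S = 70          # from the slides: 2 * MRAI timer + 10 seconds
CONVERGENCE_TIMEOUT_S = 600   # from the slides: 10 minutes


def group_updates(updates):
    """Group per-prefix updates into events.

    updates: iterable of (timestamp_seconds, prefix) pairs (assumed format).
    Returns (events, persistent) where events is a list of
    (prefix, start, end, n_updates) and persistent is the set of prefixes
    with at least one event lasting longer than the convergence timeout.
    """
    by_prefix = defaultdict(list)
    for ts, prefix in updates:
        by_prefix[prefix].append(ts)

    events = []
    persistent = set()
    for prefix, times in by_prefix.items():
        times.sort()
        start = prev = times[0]
        count = 1
        for ts in times[1:]:
            if ts - prev < EVENT_TIMEOUT_S:
                count += 1                      # same event continues
            else:
                events.append((prefix, start, prev, count))
                start, count = ts, 1            # a new event begins
            prev = ts
            if prev - start > CONVERGENCE_TIMEOUT_S:
                persistent.add(prefix)          # still changing after 10 min
        events.append((prefix, start, prev, count))
    return events, persistent


if __name__ == "__main__":
    quiet = [(0, "192.0.2.0/24"), (30, "192.0.2.0/24"), (200, "192.0.2.0/24")]
    flappy = [(t, "198.51.100.0/24") for t in range(0, 720, 60)]
    events, persistent = group_updates(quiet + flappy)
    print(sorted(events))
    print(persistent)
```

In the usage above, the first prefix yields two short events, while the second, updated every 60 seconds for 11 minutes, forms a single long event and is flagged.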
Types of persistent flapping
• Conservative damping parameters (78.6%)
• Protocol oscillations due to MED (18.3%)
• Unstable interfaces or BGP sessions (3.0%)

Example: Unstable eBGP Session
[Figure labels: ISP, AE, BE, CE, DE, Peer, Customer, p]
• Flap damping parameters are session-based
• Damping is not implemented for iBGP sessions

Event Classification
Challenge: major concerns in network management
• Changes in reachability
• Heavy load of routing messages on the routers
• Change of flow of the traffic through the network
Solution: classify events by severity of their impact
[Figure: Events → Event Classification → “Typed” Events, e.g., Loss/Gain of Reachability]

Event Category – “No Disruption”
“No Disruption”: no border routers have any traffic shift. (50.3%)
[Figure labels: p, AS1, AS2, ISP, AE, BE, CE, DE, EE, No Traffic Shift]

Event Category – “Internal Disruption”
“Internal Disruption”: all traffic shifts are internal. (15.6%)
[Figure labels: p, AS1, AS2, ISP, AE, BE, CE, DE, EE, Internal Traffic Shift]

Event Category – “Single External Disruption”
“Single External Disruption”: only one of the traffic shifts is external. (20.7%)
[Figure labels: p, AS1, AS2, ISP, AE, BE, CE, DE, EE, External Traffic Shift]

Statistics on Event Classification
Category | Events | Updates
No Disruption | 50.3% | 48.6%
Internal Disruption | 15.6% | 3.4%
Single External Disruption | 20.7% | 7.9%
Multiple External Disruption | 7.4% | 18.2%
Loss/Gain of Reachability | 6.0% | 21.9%
• First 3 categories have significant day-to-day variations
• Updates per event depend on the type of event and the number of affected routers

Event Correlation
Challenge: a single routing change affects multiple destination prefixes
Solution: group the same-type, close-occurring events
[Figure: “Typed” Events → Event Correlation → Clusters]

EBGP Session Reset
• Caused most of the “single external disruption” events
• Check if the number of prefixes using that session as the best route changes dramatically
• Validation with Syslog router report (95%)
[Figure: number of prefixes vs. time, with session failure and session recovery marked]

Hot-Potato Changes
[Figure: example ISP topology with destination P]
“Hot-potato routing” = route to closest egress point
• Caused “internal disruption” events
• Validation with OSPF measurement (95%) [Teixeira et al., SIGMETRICS ’04]

Traffic Impact Prediction
Challenge: routing changes have different impacts on the network, depending on the popularity of the destinations
Solution: weigh each cluster by traffic volume
[Figure: Clusters and Netflow Data → Traffic Impact Prediction → Large Disruptions]

Traffic Impact Prediction
• Traffic weight
  Per-prefix measurement from Netflow
  10% of prefixes account for 90% of traffic
• Traffic weight of a cluster: the sum of the “traffic weight” of its prefixes
• A small number of large clusters have large traffic weight
  Mostly session resets and hot-potato changes

Performance Evaluation
• Memory
  Static memory: “current routes”, 600 MB
  Dynamic memory: “clusters”, 300 MB
• Speed
  99% of 1-second intervals of updates can be processed within 1 second
  Occasional execution lag
  Every 70-second interval of updates can be processed within 70 seconds
• Measurements were based on a 900 MHz CPU

Conclusion of BGP Troubleshooting Tool
• BGP troubleshooting system
  Fast, online fashion
  Operators’ concerns (reachability, flapping, traffic)
• Significant information reduction
  Millions of updates → a few dozen large disruptions
• Uncovered important network behavior
  Hot-potato changes
  Session resets
  Persistent-flapping prefixes

Part IV: BGP Modeling

BGP Is Not Guaranteed to Converge!
• BGP is not guaranteed to converge to a stable routing.
• Policy inconsistencies can lead to “livelock” protocol oscillations.
• Goal: design a simple, tractable, and complete model of BGP
• Example application: a sufficient condition to guarantee convergence.

BGP is Solving What Problem?
Underlying problem | Distributed means of computing a solution
Shortest Paths | RIP, OSPF, IS-IS
X? |
BGP
X can
• aid in the design of policy analysis algorithms and heuristics,
• aid in the analysis and design of BGP and extensions,
• help explain some BGP routing anomalies,
• provide a fun way of thinking about the protocol

Separate Dynamic and Static Semantics
• Static semantics: BGP policies, modeled as the Stable Paths Problem (SPP)
• Dynamic semantics: BGP, modeled as SPVP
• SPVP: Simple Path Vector Protocol, a distributed algorithm for solving the Stable Paths Problem

What is the Stable Paths Problem?
• A graph of nodes and edges,
• Node 0, called the origin,
• For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.
• A ranking of permitted paths at each node. The null path is always least preferred.
Example (permitted paths listed from most preferred to least preferred, null path not shown):
node 1: 130, 10
node 2: 210, 20
node 3: 30
node 4: 420, 430
node 5: 5210

A Solution to SPP
A solution is an assignment of permitted paths to each node such that
• node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,
• each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors

A Solution to SPP
A solution need not represent a shortest path tree or a spanning tree
[Figure: the same example (node 1: 130, 10; node 2: 210, 20; node 3: 30; node 4: 420, 430; node 5: 5210)]

There can be Multiple Solutions to an SPP
DISAGREE: node 1: 120, 10; node 2: 210, 20
[Figure panels: DISAGREE, First solution, Second solution]

Multiple Solutions Can Occur Due to Recovery:
node 1: 10, 1230; node 2: 230, 210; node 3: 3210, 30
[Figure panels: network with primary link and backup link; Remove primary link; Restore primary link]

Ranking BGP Paths
1. Highest local preference
2. Shortest AS path length
3. Origin: IGP < EGP < INCOMPLETE
4. Lowest MED value
5. eBGP-learned routes preferred over iBGP-learned routes
6. Lowest IGP cost
7. Tie breaking

Bad Gadget: No Solution
[Figure: node 1: 130, 10; node 2: 210, 20; node 3: 320, 30; node 4; origin 0]
Stage 1: 1: [10] 2: [210] 3: [30]
Stage 2: 1: [130] 2: [20] 3: [320]
Back to stage 1

Bad Gadget: No Solution
Stage 1: 1: [10] 2: [20] 3: [320]
Stage 2: 1: [130] 2: [210]
3: [30]
Back to stage 1
[Figure: node 1: 130, 10; node 2: 210, 20; node 3: 320, 30; node 4; origin 0]

Has A Solution, But Can Get Trapped:
node 1: 120, 10
node 2: 210, 20
node 3: 310, 3120
node 4: 4310, 453120, 43120
node 5: 5310, 563120, 53120
node 6: 6310, 643120, 63120
• Nodes 3 to 6: this part has a solution only when node 1 is assigned the direct path (1 0).
• Nodes 1 and 2: as with DISAGREE, this part has two distinct solutions

How To Solve An SPP?
• Exponential complexity: just enumerate all path assignments and check the stability of each.
• NP-complete: 3-SAT can be reduced to SPP

Distributed Algorithms to Solve SPP
• OSPF-like:
  Distribute topology, path ranks
  Solve SPP locally
  Exponential worst case
  How to avoid loops if multiple solutions exist?
• RIP-like:
  Pick the best path from neighbors’ paths
  Tell neighbors about changes
  Can diverge
  Not guaranteed to find a solution even if one exists
  No bound on convergence time

SPVP Protocol
Pick the best path available at any time
process spvp[u] {
  receive P from w {
    rib-in(u <- w) := u P
    if rib(u) != best(u) {
      rib(u) := best(u)
      foreach v in peers(u) {
        send rib(u) to v
      }
    }
  }
}

SPVP and SPP
• SPVP wanders around the assignment space
[Figure regions: SPP solvable; SPVP can diverge; must converge; must diverge]

A sufficient condition for sanity
If an instance of SPP has an acyclic dispute digraph, then
Static (SPP) | Dynamic (SPVP)
solvable | safe (can’t diverge)
unique solution | predictable restoration
all sub-problems uniquely solvable | robust with respect to link/node failures

Dispute Digraph Example
[Figure: BAD GADGET II (node 1: 130, 10; node 2: 210, 20; node 3: 3420, 30; node 4: 420, 430) and its dispute digraph, with a CYCLE marked]

Dispute Wheels
[Figure: dispute wheel with nodes u_0, ..., u_k and paths Q_0, ..., Q_k and R_0, ..., R_k]
• At u_i, the rank of Q_i is less than or equal to the rank of R_i Q_(i+1)
• There exists a dispute wheel iff there exists a cycle in
the dispute digraph

Dispute Wheel Example
node 1: 1230, 120, 10
node 2: 2310, 230, 20
node 3: 3120, 310, 30
[Figure: network and dispute wheel for this example]

A Dynamic Solution
• Extend SPVP with a history attribute,
• A route’s history contains a path in the dispute digraph that “explains” how the route was obtained,
• A route history will contain a dispute cycle if and only if a policy dispute is dynamically realized.
• If a route’s history contains a cycle, then suppress it ….
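The brute-force approach mentioned under "How To Solve An SPP?" (enumerate all path assignments and check the stability of each) is easy to try on the small gadgets from the slides. The following is a minimal sketch under assumptions: paths are written as tuples such as `(1, 2, 0)` for the slide notation 120, graph edges are taken to be implied by the permitted paths, node 4 of the BAD GADGET figure is omitted because the slides list no paths for it, and the names `solutions` and `best` are hypothetical helpers rather than part of any BGP tool.

```python
from itertools import product

NULL = ()  # the "null path"; always permitted and always least preferred


def best(u, permitted, assign):
    """Highest-ranked permitted path of u consistent with its neighbors.

    A path (u, w, ...) is available only if w is currently assigned
    exactly the remainder (w, ...).
    """
    for path in permitted[u]:          # most preferred first
        if assign.get(path[1]) == path[1:]:
            return path
    return NULL


def solutions(permitted):
    """Enumerate every assignment and keep the stable ones.

    permitted: {node: [paths, most preferred first]}, each path a tuple
    starting at the node and ending at the origin 0.
    """
    nodes = sorted(permitted)
    found = []
    for choice in product(*[permitted[u] + [NULL] for u in nodes]):
        assign = dict(zip(nodes, choice))
        assign[0] = (0,)               # the origin's trivial path
        if all(assign[u] == best(u, permitted, assign) for u in nodes):
            found.append({u: assign[u] for u in nodes})
    return found


# Gadgets transcribed from the slides (most preferred path first).
DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
BAD_GADGET = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}

if __name__ == "__main__":
    print(len(solutions(DISAGREE)))    # number of stable assignments
    print(len(solutions(BAD_GADGET)))
```

The enumeration visits every combination of permitted paths, so it is exponential in the number of nodes, in line with the complexity noted in the slides; it is only meant for checking toy instances such as DISAGREE (two stable assignments) and BAD GADGET (none).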