FlowRoute: Inferring Forwarding Table Updates Using Passive Flow-level Measurements
Amogh Dhamdhere (CAIDA/UCSD) [email protected]
with Lee Breslau, Nick Duffield, Cheng Ee, Alexandre Gerber, Carsten Lund and Shubho Sen (AT&T Labs-Research)

Motivation
• Routing protocol performance during routing events can affect end-to-end performance
• Transient loops and packet losses may occur during routing reconvergence
• Network operators need to monitor routing protocol performance
• Do routers respond as expected?
  – Update their forwarding tables in a timely manner?
  – Update their forwarding tables to the expected state?
5/25/2017 IMC 2010, Melbourne Australia

Monitoring Routing Events
• Control plane monitors (e.g., OSPFmon, BGPmon)
  – Monitor the control plane
  – Cannot measure when a router implemented a change in its forwarding table
• Active probing
  – Can only monitor paths that are probed
  – Spatial and temporal resolution limited by placement of probes and probing frequency

FlowRoute
• A data-plane monitoring tool to work in conjunction with control plane monitors
• Infer forwarding table updates using flow-level measurements
• Works offline, for after-the-fact forensics and analysis
• No additional overhead on routers
  – Uses flow-level measurements (e.g., Netflow) that are already collected

Basic Method
[Figure: router R forwards flow f1 to next hop N1 at time T1, and flow f2 to next hop N2 at time T2]
• Single-packet flows f1 and f2 towards D
• f1 seen at N1: R is previous hop at time T1
• N1 is R’s next hop towards D at T1
• f2 seen at N2: R is previous hop at time T2
• N2 is R’s next hop towards D at T2
R’s next hop towards D changed in [T1, T2]

Routing Flow Records
[Figure: a flow enters R from previous hop Rp on incoming interface i and leaves on outgoing interface o towards next hop Rn; δ is the link propagation delay]
• R sees flow towards destination D from tf to tl
• Netflow: (R, i, o, tf, tl, D)
• Map incoming interface i to previous hop router; subtract link propagation delays: (Rp, tf-δ, tl-δ, D, R)
• Map outgoing interface o to next hop router; duplicate first packet timestamp: (R, tf, tf, D, Rn)
One flow record at R produces two routing flow records, giving the routing state of R and Rp

Inferring Forwarding Table Updates
• Collect netflow records from all routers
• Convert to Routing Flow Records (RFRs) for offline processing
(R, T1, T2, N1, D) (R, T3, T4, N2, D) with T2 < T3
[Figure: timeline with N1 over [T1, T2], followed by N2 over [T3, T4]]
R changed next hop towards D in the time window [T2, T3], the “range” of the forwarding table update

Inferring Forwarding Table Updates
(R, T1, T2, N1, D) (R, T3, T4, N2, D) with T2 > T3
[Figure: timeline with N1 over [T1, T2] overlapping N2 over [T3, T4]]
Overlapping routing flow records could be due to Equal Cost Multi-Path (ECMP)

ECMP
[Figure: R forwards flow f1 via N1 during [T1, T2] and flow f2 via N2 during [T3, T4], both towards D]
• Router R can forward flows destined to D to either N1 or N2
• RFRs generated at N1 and N2 can overlap, which appears as an inconsistency
• Non-overlapping RFRs can appear as a routing change for every flow

Filtering ECMP
• Observation: In 99% of next hop changes due to ECMP, a router routes fewer than 20 flows towards one next hop, before routing a flow towards an equal-cost next hop
• Filtering heuristic: Declare routing change only if >20 flows were routed to the old next hop before a flow is routed to the new next hop
• Conservative: May miss routing changes before 20 flows are forwarded to the old next hop

Sampling
• Both packet and flow sampling occur in high-speed networks
• Sampling does not affect correctness of inferred ranges
• Sampling affects the width of ranges; more sampling means lower temporal resolution
• More discussion in the paper

Timely Forwarding Table Updates
[Figure: forwarding table update ranges plotted against an OSPF event “cluster”]
All ranges overlap with OSPF event cluster
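The conversion to routing flow records and the range inference with the ECMP filter, as described on the slides above, can be sketched roughly as follows. This is a minimal illustration, not FlowRoute's actual implementation: the names `RFR`, `to_rfrs`, `infer_updates`, `neighbor_of` and `delay_of` are hypothetical, the field order follows the (R, T1, T2, N1, D) form, each RFR is counted as one flow, and overlapping records with different next hops are simply not reported (treated as possible ECMP).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RFR:
    """`router` forwarded traffic for `dest` to `next_hop` during [start, end]."""
    router: str
    start: float
    end: float
    next_hop: str
    dest: str


def to_rfrs(router: str, in_if: str, out_if: str, tf: float, tl: float, dest: str,
            neighbor_of: Dict[Tuple[str, str], str],
            delay_of: Dict[Tuple[str, str], float]) -> List[RFR]:
    """Turn one NetFlow record (R, i, o, tf, tl, D) into two RFRs.

    neighbor_of[(router, interface)] -> adjacent router (hypothetical topology map)
    delay_of[(router, interface)]    -> link propagation delay (hypothetical)
    """
    prev_hop = neighbor_of[(router, in_if)]
    next_hop = neighbor_of[(router, out_if)]
    delta = delay_of[(router, in_if)]
    return [
        # Routing state of the previous hop, shifted back by the link delay.
        RFR(prev_hop, tf - delta, tl - delta, router, dest),
        # Routing state of R itself; the first-packet timestamp is duplicated.
        RFR(router, tf, tf, next_hop, dest),
    ]


def infer_updates(rfrs: List[RFR], ecmp_threshold: int = 20):
    """Return (router, dest, old_hop, new_hop, range_start, range_end) tuples."""
    by_key: Dict[Tuple[str, str], List[RFR]] = {}
    for r in rfrs:
        by_key.setdefault((r.router, r.dest), []).append(r)

    updates = []
    for (router, dest), records in by_key.items():
        records.sort(key=lambda r: (r.start, r.end))
        hop, run, last_end = records[0].next_hop, 1, records[0].end
        for r in records[1:]:
            if r.next_hop == hop:
                run += 1                      # one more flow towards the same next hop
                last_end = max(last_end, r.end)
                continue
            # Next hop differs: report a change only if enough flows used the
            # old next hop (ECMP filter) and the records do not overlap.
            if run > ecmp_threshold and last_end < r.start:
                updates.append((router, dest, hop, r.next_hop, last_end, r.start))
            hop, run, last_end = r.next_hop, 1, r.end
    return updates
```

With the default threshold, a next hop change is reported only after more than 20 consecutive RFRs towards the old next hop, which mirrors the conservative behavior noted on the Filtering ECMP slide: changes that occur earlier are missed.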
Delayed Forwarding Table Updates
[Figure: forwarding table update ranges plotted against OSPF events; some ranges are labelled “Forwarding table updates consistent with OSPF events”, others “Forwarding table updates delayed w.r.t. OSPF events”]
Such behavior is not detectable using a control plane monitor alone!

Delayed Forwarding Table Updates
• Used FlowRoute on a 2-month dataset
• 2666 OSPF event clusters
• 97010 time ranges consistent with OSPF event clusters
• 117 ranges that showed delayed forwarding table updates
• Two routers showed delayed updates 14 times in the 2-month dataset
  – Subsequently retired from the network

Loops
• Delayed forwarding table updates can cause transient loops
  – Example in the paper of how this can happen
• 392 instances of 1-hop loops during 2-month dataset
• Mostly short-lived (sub-second)
• A few loops lasted tens of seconds
  – Long-lived loops were due to delayed updates by one or more routers

Summary
• FlowRoute: A data plane monitor to work in conjunction with control plane monitors for forensics and analysis of forwarding table updates
• Used to study forwarding table updates in a tier-1 ISP network
• Found cases of delayed forwarding table updates due to buggy routers
• Also found transient loops during routing convergence and spikes in link utilization

Thanks!
[email protected]
www.caida.org/~amogh

Practical Issues
• What should be the destination? Can be either destination IP address, prefix, or MPLS tunnel endpoint
  – Need to observe sufficient flow volume
  – We choose MPLS tunnel endpoint
• Sampling
  – Both packet and flow sampling occur in high-speed networks
  – Sampling does not affect correctness of inferred ranges
  – Affects the width of the ranges; more sampling means lower temporal resolution

Existing Approaches
• Control plane monitors (e.g., OSPFmon, BGPmon)
  – Monitor the control plane, cannot measure when a router implemented a change in its forwarding table
• Collect and process router logs
  – Large volume of data, transporting and processing is hard
  – Limited by polling frequency, e.g., 5 minutes with SNMP
• Active probing
  – Spatial and temporal resolution limited by placement of probes and probing frequency

Delayed Forwarding Table Updates
• Used FlowRoute on a 2-month dataset
  – 2666 OSPF event clusters
• 97010 time ranges consistent with OSPF event clusters
• 58 clusters, 117 ranges that showed delayed forwarding table updates
• Two routers showed delayed updates 14 times in the 2-month dataset
  – Subsequently retired from the network
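The distinction between update ranges that are consistent with an OSPF event cluster and ranges that are delayed can be illustrated with a small interval check. This is only a sketch under assumptions: the slides do not say how a range is matched to a cluster, so the function below takes an already matched pair, and the names `overlaps`, `classify` and the label `"precedes"` are hypothetical. The reasoning is that the update happened somewhere inside its range, so a range that starts after the cluster ends implies the update itself came after the cluster.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), with start <= end


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify(update_range: Interval, cluster: Interval) -> str:
    """Compare one inferred update range with one OSPF event cluster.

    Assumes the caller has already paired the range with the cluster.
    """
    if overlaps(update_range, cluster):
        # The update may have happened while the OSPF events were occurring.
        return "consistent"
    if update_range[0] > cluster[1]:
        # The whole range lies after the cluster, so the update was late.
        return "delayed"
    # The whole range lies before the cluster (hypothetical extra label).
    return "precedes"


if __name__ == "__main__":
    cluster = (11.0, 15.0)
    for rng in [(10.0, 12.0), (20.0, 22.0), (1.0, 2.0)]:
        print(rng, classify(rng, cluster))
```

Because sampling widens the ranges, a wide range is more likely to overlap a cluster, so a check of this kind tends to under-report delays rather than over-report them.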