A Framework for Measuring and Predicting the Impact of Routing Changes
Ying Zhang, Z. Morley Mao, Jia Wang

Internet routing changes
- Various causes: link failures, configuration changes, topology changes, etc.
- Direct influence on the data plane
- Transient data-plane disruption: packet loss, increased delay, forwarding loops
[Figure: old path and new path between a source and a destination across the Internet]

Motivation
- Frequent routing dynamics can cause transient disruption in the data plane
  - Routes are inconsistent during convergence
  - Real-time applications can be affected
- Predicting the performance impact can enable more intelligent route selection

Measuring and predicting the impact
- Comprehensively measure the impact of routing changes
- Characterize the properties of routing changes that cause traffic disruption
- Search for patterns that help prediction

Outline
- Motivation
- Methodology
- Characterization of data-plane failures
- Failure prediction model

Methodology
- Data collection
  - Control plane: local real-time BGP updates
  - Data plane: ping and traceroute probes for each update
- A lightweight active-probing methodology
- A coarse-grained performance metric: reachability
  - A destination is reachable if any ping reply is received
  - Scalable to many destinations with live IPs
- A measurement-based approach
  - No simplifying assumptions
  - Empirical evidence

Our approach
- Focus: measure data-plane failures caused by routing changes
- Coarse-grained performance metric
- Methodology: lightweight active probing
  - Triggered by locally observed routing updates
  - Probing target: a live IP within the updated prefix
[Figure: an update for prefix P with AS path A D B switches traffic from an old path to a new path across ASes A, B, C, and D; the measurement host sends ping and traceroute probes to a live IP within prefix P]

Framework
- Probing control
  - Background probing: identifying persistent failures and verifying the live IP's response
- Resource control
  - Ignoring updates due to routing-table transfers
  - Imposing a maximum probing duration
- Accuracy control
  - Imposing a maximum waiting duration

Characterization of data-plane failures
- Failure types
  - Reachability failure: no ping reply is received because of network problems
  - Forwarding loop: a subset of reachability failures in which transient loops are observed in the path
- Failure properties
  - Affected networks
  - Failure duration
  - Failure predictability

Overall reachability failure statistics (Internet experiments over 11 weeks)
[Table: categories Loop, Unreachable, Other, All, Reachable, measured by incidence, prefix, and AS; values in the order given: 6%, 23%, 33%, 36%, 72%, 38%, 42%, 73%, 63%, 57%, 83%, 98%]

Affected network locations
- Goal: understand which networks are affected by routing changes
- Most affected ASes are near the network edge and in foreign countries
- A small fraction of destinations experience many unreachable incidences

Failure durations
- Most failures are short, lasting less than 300 seconds
  - Attributed to transient routing failures and convergence delay
- 10% of incidences last longer
  - Attributed to configuration errors or path failures

Failure predictability
- Destination prefix information
  - Appearance probability: the probability of an unreachable incidence for prefix D
- Destination prefix and AS path segments
  - Conditional probability on AS path segments: the probability that an unreachable event occurs given a particular AS path segment
  - Responsible AS: the AS where traceroute stops

Prediction model
- Uses prefix and AS-segment information
- Data-plane failure likelihood ratio:
$$\Lambda(Y) = \frac{P(Y=1 \mid R; D)}{P(Y=0 \mid R; D)}$$
  where P(Y=1 | R; D) is the conditional probability of a data-plane failure given a routing update R for prefix D
- Assuming that failures at different ASes are independent:
$$P(Y=1 \mid R = (x_1, x_2, \ldots, x_n); D) = 1 - \prod_{i=1}^{n} \bigl(1 - P(Y=1 \mid x_i; D)\bigr)$$
  where x_i is the responsible AS in the historical data

Evaluation
- The decision threshold sets the trade-off between selectivity and sensitivity, determining the rates of false positives and false negatives
- Receiver operating characteristic (ROC) curve
- Result: 60% detection rate with 18% false positives

Conclusion
- Developed an efficient framework for measuring and predicting data-plane failures caused by routing changes
- Identified patterns that accurately predict data-plane failures
- Provided suggestions for more intelligent route selection
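The Methodology and Failure durations slides treat a destination as reachable when any ping reply arrives and report how long unreachability lasts after a routing change. A minimal Python sketch of turning post-update probe results into failure intervals is shown here; the `Probe` record, the function names, and the rule for closing an interval still open at the maximum probing duration are hypothetical choices for illustration, not the authors' tooling. Only the 300-second cut-off comes from the Failure durations slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Cut-off taken from the slide's observation that most failures last < 300 s.
SHORT_FAILURE_SECONDS = 300


@dataclass
class Probe:
    time: float    # seconds since the triggering routing update (hypothetical field)
    replied: bool  # True if any ping reply came back, i.e. destination reachable


def unreachable_periods(probes: List[Probe], max_duration: float) -> List[Tuple[float, float]]:
    """Return (start, end) intervals in which the destination was unreachable.

    A probe without a reply opens an interval; the next probe that gets a reply
    closes it. Probes later than max_duration are ignored (hypothetical handling
    of the maximum probing duration), and an interval still open at that point is
    closed at the last probe considered, so its length is only a lower bound.
    """
    periods: List[Tuple[float, float]] = []
    start = None
    last_time = None
    for probe in sorted(probes, key=lambda p: p.time):
        if probe.time > max_duration:
            break
        last_time = probe.time
        if not probe.replied and start is None:
            start = probe.time
        elif probe.replied and start is not None:
            periods.append((start, probe.time))
            start = None
    if start is not None:
        periods.append((start, last_time))
    return periods


def classify_durations(periods: List[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Split interval lengths into short (< 300 s) and longer failures."""
    durations = [end - start for start, end in periods]
    short = [d for d in durations if d < SHORT_FAILURE_SECONDS]
    longer = [d for d in durations if d >= SHORT_FAILURE_SECONDS]
    return short, longer


# Toy probe sequence (made-up data) for one destination after one update.
probes = [Probe(0, True), Probe(10, False), Probe(20, False),
          Probe(40, True), Probe(100, False), Probe(500, True)]
print(unreachable_periods(probes, max_duration=3600))          # [(10, 40), (100, 500)]
print(classify_durations(unreachable_periods(probes, 3600)))   # ([30], [400])
```

Sorting by time first means probes can be recorded out of order; closing an open interval at the last observed probe avoids claiming unreachability that was never measured.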
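The Prediction model and Evaluation slides combine per-AS failure probabilities under an independence assumption and then apply a decision threshold. Below is a minimal Python sketch of that pipeline. The combination formula 1 - prod(1 - P(Y=1 | x_i; D)) and the likelihood ratio P(Y=1 | R; D) / P(Y=0 | R; D) follow the slides; everything else is an assumption for illustration: the history record layout, the counting estimator for P(Y=1 | x_i; D) (failures blamed on x_i divided by updates for D whose path contains x_i), treating unseen ASes as probability 0, the toy data, and all function names.

```python
from collections import defaultdict
from math import prod
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical history record: (prefix D, AS path of the update,
# responsible AS where traceroute stopped, or None if no failure was seen).
Record = Tuple[str, Tuple[str, ...], Optional[str]]


def estimate_per_as(history: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Estimate P(Y=1 | x; D) as (failures blamed on x) / (updates whose path contains x)."""
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    blamed: Dict[Tuple[str, str], int] = defaultdict(int)
    for prefix, path, responsible in history:
        for asn in set(path):
            seen[(prefix, asn)] += 1
        # Only count blame on ASes actually on the path, so every ratio stays <= 1.
        if responsible is not None and responsible in path:
            blamed[(prefix, responsible)] += 1
    return {key: blamed[key] / count for key, count in seen.items()}


def failure_probability(path: Tuple[str, ...], prefix: str,
                        per_as: Dict[Tuple[str, str], float]) -> float:
    """P(Y=1 | R; D) = 1 - prod_i (1 - P(Y=1 | x_i; D)), assuming independent ASes.

    ASes never seen for this prefix are assumed to contribute probability 0.
    """
    return 1.0 - prod(1.0 - per_as.get((prefix, asn), 0.0) for asn in set(path))


def likelihood_ratio(p: float) -> float:
    """Lambda(Y) = P(Y=1 | R; D) / P(Y=0 | R; D), with P(Y=0) = 1 - P(Y=1)."""
    return float("inf") if p >= 1.0 else p / (1.0 - p)


def evaluate(cases: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (detection rate, false-positive rate) when a failure is predicted
    for every case whose likelihood ratio reaches the threshold."""
    predicted = [(likelihood_ratio(p) >= threshold, failed) for p, failed in cases]
    tp = sum(1 for pred, failed in predicted if pred and failed)
    fp = sum(1 for pred, failed in predicted if pred and not failed)
    positives = sum(1 for _, failed in cases if failed)
    negatives = len(cases) - positives
    return (tp / positives if positives else 0.0,
            fp / negatives if negatives else 0.0)


# Toy history for prefix "P" (made-up data, not measurement results).
history = [
    ("P", ("A", "D", "B"), "D"),
    ("P", ("A", "D", "B"), None),
    ("P", ("C", "B"), None),
    ("P", ("A", "C", "B"), None),
]
per_as = estimate_per_as(history)
p = failure_probability(("A", "D", "B"), "P", per_as)
print(p, likelihood_ratio(p))  # 0.5 1.0
print(evaluate([(0.5, True), (0.0, False), (0.5, False), (0.0, True)], threshold=1.0))  # (0.5, 0.5)
```

Sweeping `threshold` over a range of values and plotting the resulting (false-positive rate, detection rate) pairs traces an ROC curve, which is how the selectivity/sensitivity trade-off on the Evaluation slide is usually visualized.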