Download wespy_1 (dec 2007)

A victim-centric peer-assisted framework for monitoring and troubleshooting routing problems

How to Monitor? Four schemes

- Monitor devices such as routers
  - Pros: No overhead
  - Cons: Device status does not directly translate into user-perceived performance
- Monitor BGP updates
  - Pros: No overhead; know what happens in other networks
  - Cons: Do not see some data-plane anomalies
- Monitor flow-level traffic
  - Pros: No overhead; real traffic; witness direct impact of failures
  - Cons: Do not witness failures directly
- Active probing
  - Pros: Witness direct impact of failures
  - Cons: Extra overhead; may not mimic the real traffic

What Constraint to Monitor?

- Network meets ISP’s goals
  - Resource utilization
  - Routing goes as specified by policy
  - …
- Network meets users’ goals
  - Reachability
    - Most fundamental end-to-end property
    - Easy to define and formulate
  - Delay, loss
    - Less easy to define and formulate
  - Application level: bulk transfer, VoIP
    - Depends on reachability, delay, loss, etc.

Our Monitor Scheme

- Monitor reachability using active probing
  - Focus on reachability
  - Use ping: no need for remote cooperation
  - Trade-off between probing efficiency and probing coverage (challenges)
- Disclaimers
  - Do not monitor delay or loss
  - Do not consider ISP’s goals

Troubleshooting: Next Step to Monitoring

- Goal of troubleshooting
  - Localize the root cause
  - How local?
    Depends on the nature of the cause
- Purpose of troubleshooting
  - Local root cause
    - Pin-point the problem and fix it
  - Remote root cause
    - Contact the responsible networks to solve the problem
    - By-pass the faulty network

Localize the Root Cause

[Figure: root-cause localization along two dimensions.
 Topology dimension: AS 1 (a->b, b->c, c->d), AS 2 (m->n, m->l, n->l), AS 3 (x->y, y->z, z->x).
 Protocol dimension: firewalls (those who prevent forwarding), forwarding paths (those who do forwarding), control plane, physical and link layers.
 Labels: "Localize the cause at AS level", "Link level", "Localize the cause at protocol", "Both AS level and protocol level".]

Troubleshooting: Three building blocks

- Tool
  - traceroute, ping, netflow, looking glass, etc.
- Data: generated by tool
  - e2e reachability, BGP updates, traffic profile, etc.
- Brain: the intelligent part, usually network operator
  - Digest the data, make inference, leverage dependency, draw from past experience
  - The key of troubleshooting. Hard problem

What Can We Do to Improve?

- Improve the tool
  - Promote the cooperation among networks
  - Traceroute -> resilient remote traceroute
  - BGP feed -> resilient remote BGP feed
- Improve the automation of brain
  - Unify previous work

Automatic Brain

- It’s a challenging problem
  - Fault may occur at multiple levels
  - Involve machine learning
  - Example work: Enterprise network services, SIGCOMM’07, by Paramvir Bahl et al.

Dependency Graph Approach

- Decompose a large system into components
- Infer the dependencies among components
  - A depends on B: If B fails, A fails
  - Lead to a hierarchy of dependencies: dependency graph (like Makefile)
- A set of observations on some components
  - For example, F, H, X work but G fails
- Infer the status of other components using dependencies, finally locate the root cause component

Dependency Graph Example 1

Multi-tier dependency graph. Diagnoses multi-level fault but needs automated construction.
[From Paramvir Bahl et al, SIGCOMM’07]

Dependency Graph Example 2

Flat dependency graph. Diagnoses simple fault.
[From Ramana Kompella et al, INFOCOM’07]

Trade-off in Decomposition

- The granularity of decomposition determines how specific the troubleshooting is
- Fine-grained decomposition
  - Advantage: more specific
  - Disadvantage: graph is more complex, constructing and solving it is challenging
- Coarse-grained decomposition
  - Advantage: graph is simple, constructing and solving it is less challenging
  - Disadvantage: less specific

Dependency Graph Regarding Internet Routing

[Figure: dependency graph; an arrow A -> B means A depends on B. Nodes, in the order they appear:
 - p can ping q
 - p can send packets to q
 - q can send packets to p
 - Forwarding path p->q is OK
 - Physical path p->q is OK
 - Link u_i->u_{i+1} is up ……
 - Control plane info is correctly propagated
 - AS N_i has correct route N_{i+1}
 - AS N_i imports routes of prefix p
 Path p->q before failure: IP hops u_0, u_1, …, u_n; AS hops N_0, N_1, …, N_m]

Dependency Graph Regarding Internet Routing (cont.)

- Account for three common root causes
  - Link/router failure
  - Router misconfiguration leading to missing route (i.e. does not import route)
  - Router misconfiguration or attack leading to prefix hijacking
- Topology-wise locate the root cause, and also tell among the three root causes
  - Reasonably specific

Recent Work on Network Troubleshooting

- INFOCOM’07, Detection and Localization of Network Black Holes, by Ramana R. Kompella et al
  - Automate the “brain”. Consider only physical failure. Mainly for intra-domain. Flat dependency graph.
- CoNEXT’07, NetDiagnoser: Troubleshooting network unreachabilities using end-to-end probes and routing data, by Amogh Dhamdhere et al
  - Automate the “brain”. Consider both physical failure and control plane fault. For inter-domain. Flat dependency graph.
- SIGCOMM’07, Automating Cross-layer Diagnosis of Enterprise Wireless Networks, by Cheng et al
  - Improving the “tool”. Measure and infer various delays in a wireless environment
- SIGCOMM’07, Towards Highly Reliable Enterprise Network Services Via Inference of Multi-level Dependencies, by Paramvir Bahl et al
  - Automate the “brain”. Mainly for enterprise network and services. Deal with multi-level faults. Automatically generate multi-tier dependency graph.

NetDiagnoser: Overview

- Troubleshooting unreachability
- Fault assumption:
  - Link failure, router misconfiguration causing partial link failure (in particular BGP export filter misconfiguration)
  - Deal with filtered traceroute
  - More comprehensive than previous work
- Infrastructure: sensors, all pair-wise traceroute
- Mechanisms:
  - Binary tomography
  - Per-neighbor-basis logical links modeling the control plane
  - Combining BGP withdraw messages

NetDiagnoser: Logical Links

[Figure]

NetDiagnoser: Dependency Assumption

[Figure: dependency graph; an arrow A -> B means A depends on B. Nodes, in the order they appear:
 - p can send packets to q
 - Forwarding path p->q is OK
 - Physical path p->q is OK
 - Link u_i->u_{i+1} is up
 - Control plane info is correctly propagated
 - AS N_{i+1} exports prefix q to AS N_i
 Path p->q: IP hops u_0, u_1, …, u_n; AS hops N_0, N_1, …, N_m]
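
The binary tomography mechanism listed in the NetDiagnoser overview can be illustrated with a small sketch. This is not the NetDiagnoser algorithm itself, only a greedy approximation of the general idea: every link on a still-working path is assumed healthy, and a small set of the remaining links is chosen so that each failed path contains at least one suspect. The function name `locate_failed_links`, the sensor-pair names, and the example links are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # directed link (u_i, u_{i+1})


def locate_failed_links(paths: Dict[str, List[Link]],
                        reachable: Dict[str, bool]) -> Set[Link]:
    """Greedy binary tomography (illustrative, hypothetical helper).

    paths:     sensor-pair name -> links on its pre-failure path
    reachable: sensor-pair name -> True if the pair still communicates
    """
    # Assumption: any link on a working path is up.
    good: Set[Link] = set()
    for name, links in paths.items():
        if reachable[name]:
            good.update(links)

    # For each failed path, the candidate links are those not proven good.
    failed = {name: set(links) - good
              for name, links in paths.items() if not reachable[name]}

    suspects: Set[Link] = set()
    # Failed paths with no candidates are inconsistent with the
    # assumption above and are skipped.
    unexplained = {name for name, cand in failed.items() if cand}
    while unexplained:
        # Count how many unexplained failed paths each candidate link is on.
        counts: Dict[Link, int] = {}
        for name in unexplained:
            for link in failed[name]:
                counts[link] = counts.get(link, 0) + 1
        # Pick the link covering the most paths (ties broken by sort order).
        best = max(sorted(counts), key=lambda l: counts[l])
        suspects.add(best)
        unexplained = {n for n in unexplained if best not in failed[n]}
    return suspects


# Hypothetical measurements from three sensor pairs.
paths = {
    "s1->s2": [("a", "b"), ("b", "c")],
    "s1->s3": [("a", "b"), ("b", "d")],
    "s4->s2": [("e", "b"), ("b", "c")],
}
reachable = {"s1->s2": False, "s1->s3": True, "s4->s2": False}
print(locate_failed_links(paths, reachable))
```

In this made-up example the working path s1->s3 clears link a->b, so b->c is the single link that explains both failed paths. Greedy covering does not guarantee the smallest possible suspect set in general.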

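The inference step of the dependency-graph approach (A depends on B: if B fails, A fails) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any of the cited papers: it assumes a single root cause, and the graph below is invented to match the slide's example in which F, H, X work but G fails. The names `closure` and `find_root_causes` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def closure(deps: Dict[str, List[str]], node: str) -> Set[str]:
    """Return node plus everything it transitively depends on."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(deps.get(cur, []))
    return seen


def find_root_causes(deps: Dict[str, List[str]],
                     observed: Dict[str, bool]) -> Set[str]:
    """Locate candidate root causes, assuming a single fault.

    deps:     component -> components it depends on
    observed: component -> True (works) or False (fails)
    """
    # If A works, everything A depends on must work
    # (contrapositive of "if B fails, A fails").
    healthy: Set[str] = set()
    for node, ok in observed.items():
        if ok:
            healthy |= closure(deps, node)

    # A single root cause must lie in the dependency closure of every
    # failed observation, and cannot be a component known to be healthy.
    candidates: Optional[Set[str]] = None
    for node, ok in observed.items():
        if not ok:
            suspects = closure(deps, node) - healthy
            candidates = suspects if candidates is None else candidates & suspects
    if not candidates:
        return set()

    # Keep only candidates that do not depend on another candidate,
    # i.e. the deepest components in the graph.
    return {c for c in candidates
            if not (closure(deps, c) - {c}) & candidates}


# Invented graph: G depends on A and B, F on A, H and X on C.
deps = {"G": ["A", "B"], "F": ["A"], "H": ["C"], "X": ["C"]}
observed = {"F": True, "H": True, "X": True, "G": False}
print(find_root_causes(deps, observed))
```

Here F working clears A, so the failure of G is attributed to B, the only remaining component G depends on.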