Download Scalable and Deterministic Overlay Network Diagnosis

Scalable and Deterministic Overlay Network Diagnosis Yao Zhao, Yan Chen Northwestern Lab for Internet and Security Technology (LIST) Dept. of Computer Science Northwestern University http://list.cs.northwestern.edu David Bindel Computer Science Division Dept. of EECS University of California at Berkeley When something breaks in the Internet, the Internet's very decentralized structure makes it hard to figure out what went wrong and even harder to assign responsibility. ̶ ̶ “Looking Over the Fence at Networks: A Neighbor's View of Networking Research”, by Committees on Research Horizons in Networking, National Research Council, 2001. Motivation  Internet diagnosis very important      To end users To overlay network service providers (e.g., Akamai) To Internet service providers (ISP) But a very challenging problem due to the privacy of network administration Solution  E2E measurements by end users -- overlay networks Related Works  Router based approaches [SOSP03]     Mostly ICMP based, ICMP rate limiting Unscalable for simultaneous diagnosis Cannot deterministically separate forward/backward path loss Statistical approaches [MINC, INFOCOM03]     Non-deterministic: fundamentally under-constrained system Inference based on temporal correlation in a multicast tree Have to compromise for unicast, then sensitive to cross traffic Optimization based on assumptions: # of lossy links small   Random sampling, linear programming, and Bayesian inference. Unscalable: iterative refinement slow to converge for large networks Problem Formulation Given an overlay of N end hosts and O(N2) paths, to what granularity can we deterministically diagnosis the network fault? Assumptions:  Topology measurable  Can only measure the E2E path, not the link Outlines     Architecture and algebraic model Identifying virtual links Evaluation with simulations Internet experiments Our Approach    Monitor a basis set of O(n·logn) paths that fully describe the O(n2) paths Decompose the paths into minimal deterministically identifiable segments Compute the loss rate for each segment for diagnosis topology Overlay Network Operation Center measurements End hosts Trouble spots location Diagnosis results: Qwest access link: 63.232.180.230->63.232.33.134 Peering between UUNET and AOL: 64.45.216.154->172.139.89.74 Linear algebraic model Path loss rate p, link loss rate l 1  p1  (1  l1)(1  l 2)  log( 1  l1)  log( 1  p1)  log( 1  l1)  log( 1  l 2)  1 1 0 log( 1  l 2) log( 1  l 3)   x1  1 1 0 x2  b1  x3 1 p1 3 D 2 B C Putting All Paths Together Gx  b, where path matrix G {0 | 1}rs link loss rate vector x   s1 , path loss rate vector b  r1 … k  rank (G)  s Usually an under - constrained system! = Identifiable and Unidentifiable   Vectors in the row space of G are identifiable Otherwise, unidentifiable x2 A 1 p1 3 D 2 B p2 C 1 1 0 G  1 1 1   (1,-1,0) (1,1,0) Row(path) space (identifiable) (1,1,1) (0,0,1) x3 x1 Outlines     Architecture and algebraic model Identifying virtual links Evaluation with simulations Internet experiments Definition of Virtual Links  Uniquely identified shortest path segments    Identifiable Consecutive Undecomposable b 1 1’ 2 3’ 4 3 2’ 4’ 5 4 paths, 5 links  a e c d 5 virtual links One More Example   6 paths, 8 links 4 virtual links:  Corresponding to links 1, 2, 3+4+7 and 5+6+8 respectively 3’ 1 1’ 2 3 4 2’ 5 4’ 6 5’ 7 6’ 8 1 0  1 G 1 0  0 1 1 0 0 1 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 1 0 0 1 0 0 0  1 1  1 Computing Virtual Links in Undirected Graph  Check if a vector is a virtual link    QR decomposition: G  QR v  Qv ? O(l·k) to check if a vector of length l is in row space of G  O(l2) 0 1 0  QG  1 0 0 x3 (1,1,1) potential virtual links in a (1,0,0) x1 path of length l (0,1,0) (1,1,0)  Total complexity x2 Row space 2 3 2 O(l·k·l ·k)=O(l ·k )  Small constant: only 4.2 sec for 135-node network Undirected vs. Directed Graphs  Directed graph  Any linear combination => incoming   outgoing Theorem: In a directed graph, no end-to-end path contains an identifiable subpath. Rescue: Good Path Algorithm  Identifying virtual links in undirected graphs   Use topology only For directed graphs: additional info needed   Path loss rate Use the link property constraint to break the deadlock   All the links in a good path are good links, i.e. no or little loss. Most of the paths on the Internet are good paths System Flowchart   Monitors O(n·logn) paths that can fully describe all the O(n2) paths (SIGCOMM04) Inherit load balancing, monitoring adaptation, etc. Optimization steps: find the minimal basis for identifiability test Measure topology to get G Select a basis of G, G, for monitoring Stage 1: set up scalable monitoring system for diagnosis Good path algorithm on G Estimated loss rates for all paths in G Reduced paths G’’ Good path algorithm on G Select a basis of G’’: G '  QR Reduced paths G’ Find all lossy virtual links in G Stage 2: online update the measurements and diagnosis Outlines     Architecture and algebraic model Identifying virtual links Evaluation with simulations Internet experiments Metrics   Avg length of lossy virtual links in all lossy paths Diagnosis granularity   The avg number of potential lossy links in a lossy path Example (Path 1 w/ lossy VL 1 of length 5, path 2 and 3 w/ lossy VL 2 of length 2)    Avg lossy VL length: (5+2)/2 = 3.5 Avg diagnosis granularity: (5+2+2)/3 = 3 Accuracy   Absolute error |p – p’ | Relative error  p( ) p' ( )  F ( p, p' )  max  ,  p ' (  ) p (  )   where p( )  max(  , p), p' ( )  max(  , p' ) Simulation Methodology  Topology type    Topology size   10% ~ 50% Link loss rate distribution   1000 ~ 20000 or 184k nodes Fraction of end hosts on the overlay network   Three types of BRITE router-level topologies Mecator topology LLRD1 and LLRD2 models Loss model  Bernoulli and Gilbert Sample of Simulation Results (Barabasi+Gilbert) Sample of Simulation Results (Barabasi+Gilbert) Results using Mercator Topology # of end hosts on OL Avg LP # of LP # of links Avg LP in LP Length Avg VLL in LP Avg BVLL in LP 50 8.86 1459 1304 3.55(4.86) 2.56(3.54) 2.97(4.18) 100 8.8 5625 3182 3.22(4.5) 1.76(2.36) 2.21(3.11) 200 8.85 22303 7065 3.2(4.21) 1.6(2.07) 1.99(2.74) Gibbs Sampling (Infocom03)  D      Ensemble of loss rates of links in the network Goal   Observed packet transmission and loss at the clients Determine the posterior distribution P(|D) Approach   Use Markov Chain Monte Carlo with Gibbs sampling to obtain samples from P(|D) Draw conclusions based on the samples Comparison with Bayesian Inference using Gibbs Sampling (1) Comparison with Bayesian Inference using Gibbs Sampling (2) Outlines     Architecture and algebraic model Identifying virtual links Evaluation with simulations Internet experiments Methodology  Planetlab   Topology measured by Traceroute   135 end hosts Avg path length is 17.2 Path loss rate by active UDP probing   300 40-byte UDP packets per measured path in 90 sec Small overhead: 17.9kb if even measuring all paths Diagnosis Results Loss rate [0, lossy path [0.05, 1.0] (15.8%) 0.05) [0.05, 0.1) [0.1, 0.3) [0.3, 0.5) [0.5, 1.0) 1.0 % 84.2 17.2 Total end-to-end paths 15.6 24.9 15.8 26.5 18,090 Avg Path Length 17.2 After removing 79.5% good paths w/ 80.5% good links … Avg lossy path (>5% loss rate) length 11.5 (9.0) Avg lossy virtual link length 4.3 (3.1) Avg Granularity 4.0 (2.7) The numbers in () are those after removing sequential link chains. Speed Results    On a Pentium-IV 3.2GHz PC Average setup time (selecting 5,706 paths for monitoring): 109.3 seconds Diagnosis of 2,858 lossy paths: 4.2 seconds Validation  Cross Validation      Divide 5720 paths into two sets (2860 each) Get 571 virtual links from the first set Check consistency with the second path set 99.1% paths in the second set are consistent with virtual links computed by the first set. IP Spoofing based Validation   D:b, TTL=255 ICMP:S:a, S:r3, D:c, TTL=255 UDP: S:c, TTL=2 D:c, c r1 a r2 r3 b IP Spoofing based Consistency Checking    Use the function of source routing of IP Spoofing to create new path segments Validation is the same as cross validation Results:    1000 new path including part of segments in potential lossy paths 94.1% loss spoofed paths are consistent with 361 out of 1664 lossy virtual links 5.9% paths are inconsistent with 45 virtual links Conclusions   Propose the first deterministic and scalable overlay diagnosis system based on a linear algebraic approach Diagnosis with virtual links:     Identifiable, consecutive and minimal path segments Directed topology indecomposable to VL Good path algorithms for rescue Both simulation and Internet experiments show fast & accurate diagnosis w/ optimal granularity Backup Slides Previous Work  “Computing the unmeasured: An algebraic approach to Internet mapping,” INFOCOM’01   “User-level internet path diagnosis,” SOSP’03    Need the support of routers Not accurate “Multicast-based inference of network-internal loss characteristics,” IEEE Transactions in Information Theory, 1999.   Can’t work on directed graph Multicast support or unicast approximation “Server-based inference of Internet link lossiness,” INFOCOM'03  Can only determine whether a link is lossy or not Distribution of Length of lossy Virtual Links IP Spoof Based Diagnosis

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Scalable and Deterministic Overlay Network Diagnosis