Efficient Network Reachability Analysis using a Succinct Control Plane Representation
Seyed K. Fayaz, Tushar Sharma, Ari Fogel, Ratul Mahajan, Todd Millstein, Vyas Sekar, George Varghese

Network configuration is hard
• Network operator's reachability policy: A can talk to B
• Reality: what the network does
[Figure: hosts A and B connected through routers R1–R4]
Does the network do what we want it to do?

State of the art in network verification: Data plane verification
• Prior work:
  – HSA, NSDI'12
  – ATPG, CoNEXT'12
  – NOD, NSDI'15
  – …
• Data plane (forwarding table): Can A talk to B?
[Figure: routers R1–R4 with data planes DP1–DP4 between A and B; the reachability policy "A can talk to B" is checked against the data plane (✔)]
Are we done?

The data plane keeps changing!
• Reachability policy: A can talk to B
[Figure: data plane snapshots DP1–DP4 at Time = t1, t2, t3 along a time axis; each snapshot must be asked again "Can A talk to B?" for traffic from A to B]

Motivating example: Reachability bug triggered by a BGP announcement
[Figure: routers W, A, and B with data centers DCA and DCB. Before the incident: DCA services in 10.10.0.0/16; /16 ✔. After the incident: new service 10.10.1.160/28; 10.10.0.0/16 at B marked as culprit; /28 ✔, /16 ✗]
• Root cause: Router B's config had an aggregate route 10.10.0.0/16 pointing to DCB
• The /28 advertisement activated the aggregate route!
How can we proactively find such latent reachability bugs?

A data plane is just the current incarnation of the control plane!
[Figure: a router's control plane (implementation of BGP, OSPF, etc.) takes the router configuration and route advertisements 1, 2, 3 (e.g., BGP, OSPF, and RIP advertisements) as input; it emits advertisements to neighbors and produces data planes 1, 2, 3 with entries <prefix P, port1>, <prefix P, port2>, <prefix P, port3>, …]
• Prior work on control plane verification:
  – rcc, NSDI'05
  – Bagpipe, OOPSLA'16
  – ARC, SIGCOMM'16
  – Batfish, NSDI'15
  – …
• Limitations:
  – Incomplete: Focus on just one routing protocol
  – Unscalable: Detailed modeling of message passing
To find latent reachability bugs, we should focus on the control plane!

Our contributions
• ERA: A tool for finding latent router configuration bugs in seconds based on control plane analysis
  – Expressive-yet-tractable control plane model
  – Scalable exploration of control plane model
• Implementation as an open source tool

ERA: System overview
[Figure: the operator supplies router configurations and reachability policies to ERA]
• ERA: A tool to find latent reachability bugs due to router misconfiguration
• Scope: Reachability bugs occurring in the steady state

Outline
• Background and motivation
• Design of ERA
• Implementation and evaluation

Challenges in control plane analysis
[Figure: the operator's router configurations and reachability policies feed ERA, which builds a control plane model and then performs model exploration]
• Challenge 1: Expressive and tractable model?
• Challenge 2: Scalable exploration

Challenge 1: Expressive and tractable control plane model

A route as a succinct bit-vector

Control plane I/O model                                Expressive   Tractable
Actual protocol's messages (e.g., Batfish, NSDI'15)    ✔            ✗
Protocol agnostic I/O model (e.g., ARC, SIGCOMM'16)    ✗            ✔
Route as a compact bit vector                          ✔            ✔

• Bit-vector fields: Dst IP (32 bits), Dst mask (5 bits), Administrative distance (4 bits), Protocol attributes (87 bits)
A route as a succinct and unifying control plane I/O unit

Control plane as a fast pipeline of boolean operators: Intuition
• Why not actual router's code? Hard to explore
• Router as a fast route processing pipeline
• An example: admin. distance (RIP=1, BGP=0)
[Figure: a route is encoded as bits X3 X2 X1 X0 (prefix, protocol attribute). A router config with a RIP input and a static route is applied stage by stage as Boolean operations on the input route set, including a step that sets the RIP attribute to 1; the resulting output is a disjunction of terms over X3, X2, X1, X0]

Control plane as a fast pipeline of boolean operators: Complete pipeline
BDD of input routes
  1. Apply input filters
  2. AND with supported protocols
  3. OR with routes originated by router
  4. OR with redistributed routes
  5. AND with NEG. of static routes
  6. OR with aggregate routes
  7. Select best route per dst prefix
  8. Apply output filters
BDD of output routes
• Compact representation of a collection of routes using Binary Decision Diagrams (BDDs)
• The pipeline captures key control plane behaviors that are the source of many bugs

Challenge 2: Scalable control plane exploration

Reachability analysis by exploring the control plane model
• Intuition: To see what traffic can reach from A to B, just find out what route prefixes advertised by B can reach A!
[Figure: traffic flows from A to B through routers R1–R5; route advertisements (represented as BDDs) are explored between them, with inputs from the network environment set to True]
• Prepare for the worst!
• Optimizations to scale control plane exploration:
  – Equivalence classes of routes
  – Fast AVX2 instructions to implement conjunction/disjunction

Outline
• Background and motivation
• Design of ERA
• Implementation and evaluation

Implementation
• Router config. parser from Batfish (NSDI'15), parsing Cisco and Juniper
• Network topology (custom format)
• Reachability policies (e.g., A→B, valley-free, blackhole)
• Environment assumptions (default: "all routes")
• Control plane model (custom Java code and BDD library)
• Model exploration (Java and Intel AVX2 optimizations)
https://github.com/Network-verification/ERA

Evaluation
• ERA is effective in finding latent reachability bugs
  – Found known and new bugs in synthetic scenarios
  – Found known and new bugs in real scenarios
  – These bugs were caused by router misconfiguration wrt
    • Incorrect route redistribution
    • Incorrect route aggregation
    • Unintended cross-protocol effects
    • Interaction between SDN and traditional routing protocols
    • …
• ERA is fast and scalable
  – ERA analyzes networks with over 1,600 routers in < 7 seconds
  – Finding a latent bug using state of the art data plane analysis techniques in a 2-router network would take up to 1022 days!

Conclusions
• Problem: How to find latent network reachability bugs?
• Data plane verification is fundamentally limited
• Current control plane analysis tools are incomplete or unscalable
• ERA: A fast control plane analysis tool:
  – Modeling control plane's I/O as compact BDDs
  – Modeling control plane processing logic using fast boolean arithmetic
• ERA can help find latent bugs and is scalable
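
Worked sketch: the trigger in the motivating example
The motivating example turns on one control plane behavior: a configured aggregate route stays dormant until a more-specific route shows up, at which point the router starts advertising the aggregate. The Python sketch below illustrates only that rule. It is not ERA's code, which models routes as BDDs in Java. The function names `aggregate_is_active` and `advertised_prefixes` are hypothetical, the prefix 10.20.0.0/16 is made-up sample data, and the activation rule (at least one strictly more-specific route must be present) is a simplifying assumption about how aggregation behaves.

```python
from ipaddress import ip_network


def aggregate_is_active(aggregate, rib):
    """Return True if `rib` holds a strictly more-specific route inside `aggregate`.

    Assumption: an aggregate is advertised only when at least one
    contributing (more-specific) route is present.
    """
    agg = ip_network(aggregate)
    for prefix in rib:
        net = ip_network(prefix)
        if net.prefixlen > agg.prefixlen and net.subnet_of(agg):
            return True
    return False


def advertised_prefixes(rib, aggregates):
    """Routes the router would advertise: its RIB plus every active aggregate."""
    active = {a for a in aggregates if aggregate_is_active(a, rib)}
    return set(rib) | active


# Router B's configured aggregate, taken from the motivating example.
AGGREGATES = {"10.10.0.0/16"}

# Hypothetical RIB contents before the incident (sample data only).
rib_before = {"10.20.0.0/16"}
# After the incident the new /28 service prefix is present.
rib_after = rib_before | {"10.10.1.160/28"}

print(sorted(advertised_prefixes(rib_before, AGGREGATES)))
print(sorted(advertised_prefixes(rib_after, AGGREGATES)))
```

In this sketch the set of advertised prefixes gains 10.10.0.0/16 only once the /28 is in the RIB. No data plane snapshot taken before the /28 appears would contain the /16, which is why the bug is latent and why the analysis has to reason about the control plane.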