Efficient Network Reachability Analysis using
a Succinct Control Plane Representation
Seyed K. Fayaz, Tushar Sharma, Ari Fogel
Ratul Mahajan, Todd Millstein, Vyas Sekar
George Varghese
Network configuration is hard
[Figure: a network operator states the reachability policy "A can talk to B"; in reality, what the network (routers R1 to R4 between A and B) does may differ.]
Does the network do what we want it to do?
State of the art in network verification
Data plane verification
• Prior work:
– HSA, NSDI’12
– ATPG, CoNext’12
– NOD, NSDI’15
– …
[Figure: routers R1 to R4 each hold a data plane (forwarding table), DP1 to DP4. The network operator checks the reachability policy "A can talk to B" against these data planes ("Can A talk to B?"), and the check passes (✔).]
Are we done?
The data plane keeps changing!
[Figure: the data planes DP1 to DP4 of routers R1 to R4 differ at Time = t1, t2, t3, …, so for traffic from A to B the network operator has to ask "Can A talk to B?" against the reachability policy "A can talk to B" again at every point in time.]
Motivating example: Reachability bug
triggered by a BGP announcement
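[Figure, two panels with routers W, A, and B and data centers DCA and DCB.
Before the incident: services in 10.10.0.0/16 are reachable (✔).
After the incident: a new service 10.10.1.160/28 appears; the /28 is reachable (✔) but the 10.10.0.0/16 services are not (✗). Router B is marked as the culprit.]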
Root cause:
Router B’s config. had an aggregate route 10.10.0.0/16 pointing to DCB
The /28 advertisement activated the aggregate route!
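The activation step can be illustrated with a minimal sketch. It assumes the common semantics that a configured aggregate only becomes active once the router holds at least one more-specific contributing route; the helper name `active_aggregates` is hypothetical and not taken from ERA.

```python
import ipaddress


def active_aggregates(configured_aggregates, learned_routes):
    """Return the configured aggregates that have at least one
    strictly more-specific learned route (a "contributing" route)."""
    active = []
    for agg in configured_aggregates:
        if any(r != agg and r.subnet_of(agg) for r in learned_routes):
            active.append(agg)
    return active


# Prefixes from the slide; everything else is illustrative.
aggregate = ipaddress.ip_network("10.10.0.0/16")      # configured on router B
new_service = ipaddress.ip_network("10.10.1.160/28")  # announced after the incident

before = active_aggregates([aggregate], [])            # no contributing route yet
after = active_aggregates([aggregate], [new_service])  # the /28 activates the /16
```

In this sketch the aggregate stays dormant until the /28 shows up, which is why the bug is latent: no data plane snapshot taken before the announcement contains it.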
How can we proactively find such latent reachability bugs?
A data plane is just the current incarnation of the control plane!
[Figure: a router’s control plane (implementation of BGP, OSPF, etc.) takes the router configuration and incoming route advertisements 1, 2, 3, … (e.g., BGP, OSPF, and RIP advertisements), sends advertisements to neighbors, and produces a corresponding data plane 1, 2, 3, … (<prefix P, port1>, <prefix P, port2>, <prefix P, port3>, …).]
Prior work on control plane verification
– rcc, NSDI’05
– Bagpipe, OOPSLA’16
– ARC, SIGCOMM’16
– Batfish, NSDI’15
– …
Limitations:
• Incomplete: Focus on just one routing protocol
• Unscalable: Detailed modeling of message passing
To find latent reachability bugs, we should focus on the control plane!
Our contributions
• ERA: A tool for finding latent router configuration
bugs in seconds based on control plane analysis
– Expressive-yet-tractable control plane model
– Scalable exploration of control plane model
• Implementation as an open source tool
ERA: System overview
[Figure: the operator gives ERA the router configurations and the reachability policies.]
ERA: A tool to find latent reachability bugs due to router
misconfiguration.
Scope: Reachability bugs occurring in the steady state
Outline
• Background and motivation
• Design of ERA
• Implementation and evaluation
Challenges in control plane analysis
[Figure: the operator gives ERA router configurations and reachability policies; ERA builds a control plane model and then explores it.]
Challenge 1: Expressive and tractable model?
Challenge 2: Scalable exploration
Challenge 1: Expressive and tractable
control plane model
A route as a succinct bit-vector
Control plane I/O model                              Expressive  Tractable
Actual protocol’s messages (e.g., Batfish, NSDI’15)  ✔           ✗
Protocol agnostic I/O model (e.g., ARC, SIGCOMM’16)  ✗           ✔
Route as a compact bit vector                        ✔           ✔

Route bit vector fields: Dst IP (32 bits) | Dst mask (5 bits) | Administrative distance (4 bits) | Protocol attributes (87 bits)
A route as a succinct and unifying control plane I/O unit
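The four field widths add up to 128 bits. A minimal sketch of packing and unpacking such a route follows; the field order (Dst IP in the most significant bits), the integer encodings of each field, and the names `pack_route` and `unpack_route` are assumptions for illustration, not ERA's actual layout.

```python
import ipaddress

# Field widths from the slide: 32 + 5 + 4 + 87 = 128 bits.
DST_IP_BITS, MASK_BITS, AD_BITS, ATTR_BITS = 32, 5, 4, 87


def pack_route(dst_ip, dst_mask, admin_distance, attrs):
    """Pack the four fields into one integer (assumed order: IP, mask, AD, attrs)."""
    fields = ((dst_ip, DST_IP_BITS), (dst_mask, MASK_BITS),
              (admin_distance, AD_BITS), (attrs, ATTR_BITS))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_route(word):
    """Inverse of pack_route: peel the fields off from the least significant end."""
    attrs = word & ((1 << ATTR_BITS) - 1)
    word >>= ATTR_BITS
    admin_distance = word & ((1 << AD_BITS) - 1)
    word >>= AD_BITS
    dst_mask = word & ((1 << MASK_BITS) - 1)
    word >>= MASK_BITS
    return word, dst_mask, admin_distance, attrs


# Illustrative values only: 10.10.0.0 with mask field 16, AD 1, no attributes.
ip = int(ipaddress.ip_address("10.10.0.0"))
route = pack_route(ip, 16, 1, 0)
fields = unpack_route(route)
```

A fixed-width vector like this is what lets a set of routes be described as a Boolean function over 128 variables, regardless of which protocol produced the route.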
Control plane as a fast pipeline of
boolean operators: Intuition
• Why not the actual router’s code? Hard to explore
• Router as a fast route processing pipeline
• An example:
[Figure: a toy 4-bit route X3 X2 X1 X0 with prefix, admin. distance (RIP=1, BGP=0), and protocol attribute fields. The router config. enables RIP, contains "static 10", and sets the RIP attribute to 1. The input route set is transformed stage by stage by Boolean operators, ending in an output of the form X3X1X0 ∨ X2X1X0 (overbars marking negated literals are not shown).]
Control plane as a fast pipeline of
boolean operators: Complete pipeline
BDD of input routes
1. AND with supported protocols
2. Apply input filters
3. OR with routes originated by router
4. OR with redistributed routes
5. OR with aggregate routes
6. AND with NEG. of static routes
7. Select best route per dst prefix
8. Apply output filters
BDD of output routes
• Compact representation of a collection of routes using Binary
Decision Diagrams (BDDs)
• The pipeline captures key control plane behaviors that are the
source of many bugs.
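The pipeline idea can be sketched on a toy 4-bit route. This sketch swaps BDDs for truth-table bitmasks (a 16-bit integer with one bit per possible route), so that AND, OR, and NEG become plain bitwise operations. The bit assignment (X3 X2 = prefix, X1 = admin. distance, X0 = attribute), the stage order, and all function names are assumptions for illustration.

```python
NUM_VARS = 4
FULL = (1 << (1 << NUM_VARS)) - 1  # the set of all 16 possible routes


def var(i):
    """Set of all routes whose bit X_i is 1."""
    s = 0
    for route in range(1 << NUM_VARS):
        if (route >> i) & 1:
            s |= 1 << route
    return s


def neg(s):
    """Complement of a route set."""
    return FULL & ~s


def select_best(s):
    """Per prefix (X3 X2), keep only routes with the lowest admin. distance (X1)."""
    best = 0
    for prefix in range(4):
        cands = [r for r in range(16) if (s >> r) & 1 and (r >> 2) == prefix]
        if cands:
            lowest = min((r >> 1) & 1 for r in cands)
            for r in cands:
                if (r >> 1) & 1 == lowest:
                    best |= 1 << r
    return best


def control_plane(inputs, supported, in_filter, originated,
                  redistributed, aggregates, static, out_filter):
    r = inputs & supported   # 1. AND with supported protocols
    r &= in_filter           # 2. apply input filters
    r |= originated          # 3. OR with routes originated by router
    r |= redistributed       # 4. OR with redistributed routes
    r |= aggregates          # 5. OR with aggregate routes
    r &= neg(static)         # 6. AND with NEG. of static routes
    r = select_best(r)       # 7. select best route per dst prefix
    return r & out_filter    # 8. apply output filters


X0, X1, X2, X3 = (var(i) for i in range(4))

# Toy router: accepts only RIP (X1 = 1), has a static route for prefix 10.
out = control_plane(inputs=FULL, supported=X1, in_filter=FULL, originated=0,
                    redistributed=0, aggregates=0, static=X3 & neg(X2),
                    out_filter=FULL)
```

With these inputs the result equals (¬X3 ∧ X1) ∨ (X2 ∧ X1): every RIP route except those for the statically routed prefix. A real BDD keeps the same algebra but avoids enumerating all 2^128 routes.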
Challenge 2: Scalable control plane
exploration
Reachability analysis by exploring the
control plane model
• Intuition: To see what traffic can reach from A to B, just find
out what route prefixes advertised by B can reach A!
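[Figure: route advertisements (represented as BDDs) originated at B propagate through routers R1 to R5 toward A, while traffic flows in the opposite direction from A to B. Advertisements coming from the environment outside the network are set to True, i.e., any route may arrive.]
• Prepare for the worst!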
• Optimizations to scale control plane exploration:
– Equivalence classes of routes
– Fast AVX2 instructions to implement conjunction/disjunction
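The intuition of following B's advertisements toward A can be sketched as a worklist propagation. This is a simplified stand-in: route sets are plain Python sets of prefix strings rather than BDDs, each router is reduced to a single export policy, and best-route selection is left out. The topology, the policies, and the name `routes_reaching` are hypothetical.

```python
from collections import deque


def routes_reaching(topology, export_policy, origin, originated, target):
    """Propagate advertisements from `origin` until nothing changes and
    return the set of prefixes that `target` ends up learning."""
    learned = {router: set() for router in topology}
    learned[origin] = set(originated)
    work = deque([origin])
    while work:
        router = work.popleft()
        advertised = export_policy[router](learned[router])
        for neighbor in topology[router]:
            new = advertised - learned[neighbor]
            if new:  # only revisit a neighbor when it learned something new
                learned[neighbor] |= new
                work.append(neighbor)
    return learned[target]


def identity(routes):
    return set(routes)


def drop_28s(routes):
    """Hypothetical output filter: do not advertise /28 prefixes."""
    return {p for p in routes if not p.endswith("/28")}


# Hypothetical three-node line: A -- R1 -- B.
topology = {"A": ["R1"], "R1": ["A", "B"], "B": ["R1"]}
export_policy = {"A": identity, "R1": drop_28s, "B": identity}

at_a = routes_reaching(topology, export_policy, origin="B",
                       originated={"10.10.0.0/16", "10.10.1.160/28"}, target="A")
```

Here A only learns 10.10.0.0/16, so under this sketch A has no route of its own for the /28. The loop terminates because each router's learned set only grows and the universe of prefixes is finite.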
Implementation
• Inputs from the operator:
– Router configurations, parsed with the router config. parser from Batfish (NSDI’15), which parses Cisco and Juniper
– Network topology (custom format)
– Reachability policies (e.g., A→B, valley-free, blackhole)
– Environment assumptions (default: “all routes”)
• Control plane model (custom Java code and BDD library)
• Model exploration (Java and Intel AVX2 optimizations)
• https://github.com/Network-verification/ERA
Evaluation
• ERA is effective in finding latent reachability bugs
– Found known and new bugs in synthetic scenarios
– Found known and new bugs in real scenarios
– These bugs were caused by router misconfiguration w.r.t.:
  • Incorrect route redistribution
  • Incorrect route aggregation
  • Unintended cross-protocol effects
  • Interaction between SDN and traditional routing protocols
  • …
• ERA is fast and scalable
– ERA analyzes networks with over 1,600 routers in < 7 seconds
– Finding a latent bug using state of the art data plane analysis
techniques in a 2-router network would take up to 1022 days!
Conclusions
• Problem: How to find latent network reachability bugs?
• Data plane verification is fundamentally limited
• Current control plane analysis tools are incomplete or
unscalable
• ERA: A fast control plane analysis tool:
• Modeling control plane’s I/O as compact BDDs
• Modeling control plane processing logic using fast
boolean arithmetic
• ERA can help find latent bugs and is scalable