Traffic Engineering for ISP Networks
Jennifer Rexford
Computer Science Department
Princeton University
http://www.cs.princeton.edu/~jrex
A Challenge in ISP Backbone Networks
• Finding a good way to route the data packets
– Given the current network topology and offered traffic
– For good performance and efficient use of resources
Why Is the Problem Hard?
• IP traffic varies, and the service is best effort
– The offered traffic is not known in advance
– The resources in the network are not reserved
• The routers do not adapt on their own
– Load-sensitive routing is not widely deployed
– Due to control overhead and stability challenges
• Routing protocols were not designed to be managed
– At best, indirect control over the flow of traffic
• Fine-grain traffic measurements are often unavailable
– E.g., only coarse-grain link load statistics are available
In This Talk…
• TE with traditional IP routing protocols
– Shortest-path protocols with configurable link weights
• Two main research challenges
– Optimization: tuning link weights to the offered traffic
– Tomography: inferring the offered traffic from link load
• Deployed solutions in AT&T’s U.S. backbone
– Our experiences working with the network operators
– And how we improved the tools over time
• Ongoing research on traffic management
Optimization: Tuning Link Weights
Routing Inside an Internet Service Provider
• Routers flood information to learn the topology
– Routers determine the “next hop” to reach other routers…
– By computing shortest paths based on the link weights
• Routers forward packets via the “next hop” link(s)
[Figure: example backbone topology with a weight on each link]
Link Weights Control the Flow of Traffic
• Routers compute paths
– Shortest paths as the sum of the link weights
• Operators set the link weights
– To control where the traffic goes
[Figure: the same topology; changing a link weight changes the shortest paths and shifts the traffic]
Heuristics for Setting the Link Weights
• Proportional to physical distance
– Cross-country links have higher weights than local ones
– Minimizes end-to-end propagation delay
• Inversely proportional to link capacity
– Smaller weights for higher-bandwidth links
– Attracts more traffic to links with more capacity
• Tuned based on the offered traffic
– Network-wide optimization of weights based on traffic
– Directly minimizes key metrics like maximum link utilization
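As an aside on the inverse-capacity heuristic: it is essentially the same rule as the common OSPF practice of setting a link's cost to a reference bandwidth divided by the link bandwidth. A minimal sketch in Python, where the reference value and function name are illustrative rather than taken from the talk:

```python
# Sketch of the inverse-capacity heuristic; the reference bandwidth is an
# assumed value, chosen so the fastest links get the smallest weights.
REFERENCE_BW_MBPS = 100_000  # treat a 100 Gb/s link as weight 1

def inverse_capacity_weight(capacity_mbps: float) -> int:
    """Smaller weight for higher-bandwidth links, so they attract more traffic."""
    return max(1, round(REFERENCE_BW_MBPS / capacity_mbps))

# Example: a 10 Gb/s link gets weight 10, a 2.5 Gb/s link gets weight 40.
```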
Why Are the Link Weights Static?
• Strawman alternative: load-sensitive routing
– Link metrics based on traffic load
– Flood dynamic metrics as they change
– Adapt automatically to changes in offered load
• Reasons why this is typically not done
– Delay-based routing unsuccessful in the early days
– Oscillation as routers adapt to out-of-date information
– Most Internet transfers are very short-lived
• Research and standards work continues…
– … but operators have to work with what they have
Big Picture: Measure, Model, and Control
[Figure: the operational network is measured (topology/configuration and offered traffic) to build a network-wide “what-if” model; the model drives control decisions, which are pushed back to the operational network as changes]
Traffic Engineering in an ISP Backbone
• Topology
– Connectivity and capacity of routers and links
• Traffic matrix
– Offered load between points in the network
• Link weights
– Configurable weights for shortest-path routing
• Performance objective
– Balanced load, low latency, service level agreements, …
• Question: given the topology and traffic matrix in an IP network, which link weights should be used?
Key Ingredients of Our Approach
• Measurement
– Topology: monitoring of the routing protocols
– Traffic matrix: widely deployed traffic measurement
• Network-wide models
– Representations of topology and traffic
– “What-if” models of shortest-path routing
• Network optimization
– Efficient algorithms to find good configurations
– Operational experience to identify key constraints
Formalizing the Optimization Problem
• Input: graph G(R,L)
– R is the set of routers
– L is the set of unidirectional links
– c_l is the capacity of link l
• Input: traffic matrix
– M_i,j is the traffic load from router i to j
• Output: setting of the link weights
– w_l is the weight on unidirectional link l
– P_i,j,l is the fraction of traffic from i to j traversing link l
Multiple Shortest Paths With Even Splitting
[Figure: traffic from i to j is split evenly over ties at each hop; the resulting link fractions (1.0, 0.5, 0.25) are the values of P_i,j,l]
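To make the P_i,j,l values concrete, here is a minimal sketch of shortest-path routing with even splitting over ties, splitting evenly at each router (ECMP-style). The graph representation and helper names are illustrative, not taken from the talk or the underlying papers:

```python
# Sketch: P[(u, v)] = fraction of i->j traffic on link (u, v) under
# shortest-path routing with even splitting over ties at each hop.
import heapq
from collections import defaultdict

def distances_to(links, weights, dst):
    """Shortest distance from every node to dst (Dijkstra on reversed links)."""
    rev = defaultdict(list)
    for (u, v) in links:
        rev[v].append(u)
    dist = {dst: 0.0}
    heap = [(0.0, dst)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue
        for u in rev[v]:
            nd = d + weights[(u, v)]
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def splitting_fractions(links, weights, src, dst):
    """Push one unit of src->dst traffic, splitting evenly among next hops."""
    dist = distances_to(links, weights, dst)
    next_hops = defaultdict(list)
    for (u, v) in links:
        if u in dist and v in dist and dist[u] == weights[(u, v)] + dist[v]:
            next_hops[u].append(v)
    frac = {l: 0.0 for l in links}
    inflow = defaultdict(float)
    inflow[src] = 1.0
    # Visit nodes from farthest to closest to dst, so inflow is complete first.
    for u in sorted(dist, key=dist.get, reverse=True):
        if u == dst or inflow[u] == 0.0:
            continue
        share = inflow[u] / len(next_hops[u])
        for v in next_hops[u]:
            frac[(u, v)] += share
            inflow[v] += share
    return frac

# Toy example: two equal-cost paths from A to D split the traffic 50/50.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
weights = {l: 1.0 for l in links}
print(splitting_fractions(links, weights, "A", "D"))
```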
Defining the Objective Function
• Computing the link utilization
– Link load: u_l = Σ_i,j M_i,j · P_i,j,l
– Utilization: u_l / c_l
• Objective functions
– min( max_l ( u_l / c_l ) )
– min( Σ_l f(u_l / c_l) )
[Figure: a convex, increasing penalty function f(x) of the link utilization x, with the x-axis marked at 1]
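A minimal sketch of evaluating these quantities, assuming M, P, and c in the dictionary form suggested by the definitions above; the quadratic penalty is only a stand-in for the convex f in the figure, not the function used in the papers:

```python
# Sketch: link utilizations u_l / c_l and the two objective functions above.
def link_utilizations(M, P, c):
    """M[(i, j)] = demand, P[(i, j, l)] = splitting fraction, c[l] = capacity."""
    load = {l: 0.0 for l in c}
    for (i, j), demand in M.items():
        for l in c:
            load[l] += demand * P.get((i, j, l), 0.0)
    return {l: load[l] / c[l] for l in c}

def max_utilization(util):
    return max(util.values())                 # the min(max_l u_l/c_l) objective

def penalty_sum(util, f=lambda x: x ** 2):
    return sum(f(x) for x in util.values())   # the min(sum_l f(u_l/c_l)) objective
```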
Complexity of the Optimization Problem
• NP-hard optimization problem
– No efficient algorithm to find the best link weights
– Even for the simple convex objective functions
• Why can’t we just do multi-commodity flow?
– E.g., solve the multi-commodity flow problem…
– … and the link weights pop out as the dual
– Because IP routers cannot split traffic arbitrarily over ties
• What are the implications?
– Have to resort to searching through weight settings
Optimization Based on Local Search
• Start with an initial setting of the link weights
– E.g., the same integer weight on every link
– E.g., weights inversely proportional to link capacity
– E.g., the existing weights in the operational network
• Compute the objective function
– Compute the all-pairs shortest paths to get P_i,j,l
– Apply the traffic matrix M_i,j to get the link loads u_l
– Evaluate the objective function from the u_l / c_l
• Generate a new setting of the link weights, and repeat
Making the Search Efficient
• Avoid repeating the same weight setting
– Keep track of past values of the weight setting
– … or keep a small signature (e.g., a hash) of past values
– Do not evaluate a weight setting if the signatures match
• Avoid computing the shortest paths from scratch
– Explore weight settings that change just one weight
– Apply fast incremental shortest-path algorithms
• Limit the number of unique values of link weights
– Do not explore all 2^16 possible values for each weight
• Stop early, before exploring the whole search space
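Putting the last two slides together, here is a minimal sketch of the local-search loop: change one weight at a time, skip weight settings already seen (via a hashed signature), and keep the best setting found. The evaluate() argument stands for the objective computation sketched earlier; the move rule and stopping rule are deliberately simplified compared with the deployed tool:

```python
# Sketch of local search over link weights; not the deployed algorithm,
# just the loop structure described on the slides.
import random

def local_search(links, initial_weights, evaluate, max_weight=20, iters=1000):
    best = dict(initial_weights)
    best_score = evaluate(best)
    seen = {hash(tuple(sorted(best.items())))}    # signatures of visited settings
    for _ in range(iters):
        candidate = dict(best)
        l = random.choice(links)                  # change just one weight
        candidate[l] = random.randint(1, max_weight)
        signature = hash(tuple(sorted(candidate.items())))
        if signature in seen:                     # already evaluated: skip it
            continue
        seen.add(signature)
        score = evaluate(candidate)
        if score < best_score:                    # keep improvements only
            best, best_score = candidate, score
    return best, best_score
```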
Incorporating Operational Realities
• Minimize the number of changes to the network
– Changing just 1 or 2 link weights is often enough
• Tolerate failure of network equipment
– Weight settings usually remain good after a failure
– … or can be fixed by changing one or two weights
• Limit dependence on measurement accuracy
– Good weights remain good, despite random noise
• Limit the frequency of changes to the weights
– Joint optimization for day and night traffic matrices
Application to AT&T’s Backbone Network
• Performance of the optimized weights
– The search finds a good solution within a few minutes
– Much better than weights based on link capacity or physical distance
– Competitive with the multi-commodity flow solution
• How AT&T changes the link weights
– Maintenance is done every night from midnight to 6am
– Predict the effects of removing link(s) from the network
– Reoptimize the link weights to avoid congestion
– Configure the new weights before disabling the equipment
Example from My Visit to AT&T’s Operations Center
• Amtrak repairing/moving part of the train track
– Need to move some of the fiber-optic cables
– Or, heightened risk of the cables being cut
– Amtrak notifies us of the time the work will be done
• AT&T engineers model the effects
– Determine which IP links go over the affected fiber
– Pretend the network no longer has these links
– Evaluate the new shortest paths and traffic flow
– Identify whether link loads will be too high
Example Continued
• If the load will be too high
– Reoptimize the weights on the remaining links
– Schedule the time for the new weights to be configured
– Roll back to the old weight setting after Amtrak is done
• The same process is applied to other cases
– Assessing the network’s risk from possible failures
– Planning for maintenance of existing equipment
– Adapting the link weights to the installation of new links
– Adapting the link weights in response to traffic shifts
Conclusions on Traffic Engineering
• IP networks do not adapt on their own
– Routers compute shortest paths based on static weights
• Service providers need to adapt the weights
– Due to failures, congestion, or planned maintenance
• Leads to an interesting optimization problem
– Optimize the link weights based on topology and traffic
• The optimization problem is computationally difficult
– Forces the use of efficient local-search techniques
• Results of the local search are pretty good
– Near-optimal solutions that minimize disruptions
Extensions
• Robust link-weight assignments
– Link/node failures
– Range of traffic matrices
• More complex routing models
– Destinations reachable via multiple “egress points”
– Interdomain routing policies
• Interaction between ISPs
– Inter-ISP negotiation for joint optimization
– Grappling with scalability and trust issues
Tomography: Inferring the Traffic Matrix
Computing the Traffic Matrix M_i,j
• Hard to measure the traffic matrix
– IP networks transmit data as individual packets
– Routers do not keep traffic statistics, except link utilization on (say) a five-minute time scale
• Need to infer the traffic matrix M_i,j from
– The current topology G(R,L)
– The current routing P_i,j,l
– The current link loads u_l
– The link capacities c_l
Inference: Network Tomography
From link counts to the traffic matrix
[Figure: sources send traffic across the network to destinations; only the per-link loads (e.g., 5 Mbps, 3 Mbps, 4 Mbps, 4 Mbps) are observed]
Tomography: Formalizing the Problem
• Ingress-egress pairs
– p is an ingress-egress pair of nodes (i,j)
– x_p is the (unknown) traffic volume M_i,j for this pair
• Routing
– P_l,p is the proportion of p’s traffic that traverses link l
• Links in the network
– l is a unidirectional edge
– u_l is the observed traffic volume on this link
• Relationship: u = Px (work backwards to get x)
Tomography: One Observation Not Enough
• The linear system for n nodes is underdetermined
– The number of links e is around O(n)
– The number of ingress-egress pairs c is O(n^2)
– The dimension of the solution subspace is at least c - e
• Multiple observations are needed
– k independent observations (over time)
– Stochastic model with Poisson i.i.d. counts
– Maximum-likelihood estimation to infer the matrix
• Doesn’t work all that well in practice…
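A tiny numerical illustration of the underdetermination, using numpy on an invented example with two links and three ingress-egress pairs: the link loads admit infinitely many traffic matrices, and plain least squares simply returns the minimum-norm one.

```python
# Toy illustration: 2 link-load equations, 3 unknown pair volumes, so the
# system u = P x is underdetermined; lstsq picks the minimum-norm solution.
import numpy as np

# Columns = pairs p1, p2, p3; rows = links l1, l2.
# p1 uses only l1, p2 uses only l2, p3 uses both links.
P = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
u = np.array([5.0, 4.0])            # observed link loads (invented)

x, *_ = np.linalg.lstsq(P, u, rcond=None)
print(x)                            # one consistent answer: [2. 1. 3.]
print(P @ x)                        # reproduces the link loads: [5. 4.]
```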
Approach Used at AT&T: Tomo-gravity
• Gravitational assumption
– Ingress point a has traffic volume v_a^in
– Egress point b has traffic volume v_b^out
– Pair (a,b) has traffic proportional to v_a^in · v_b^out
[Figure: a small example with ingress and egress traffic volumes used in the gravity calculation]
Approach Used at AT&T: Tomo-gravity
• Problem with the gravity model
– The gravity model ignores the load on the inside links
– The gravity assumption isn’t always 100% correct
– The resulting traffic matrix might not satisfy the link loads
• Combining the two techniques
– Gravity: find a traffic matrix using the gravity model
– Tomography: find the family of traffic matrices consistent with all the link load statistics
– Tomo-gravity: find the tomography solution that is closest to the output of the gravity model
• Works extremely well (and fast) in practice
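A minimal sketch of the tomo-gravity idea with numpy, on invented toy numbers: form the gravity estimate, then move to the nearest traffic matrix (in a plain least-squares sense) that matches the observed link loads. The deployed method uses a weighted formulation with further refinements that this sketch does not reproduce:

```python
# Sketch of tomo-gravity on toy data: gravity estimate, then projection
# onto the set of traffic matrices consistent with the link loads u = P x.
import numpy as np

v_in  = np.array([9.0, 6.0])        # total traffic entering at ingresses a1, a2
v_out = np.array([10.0, 5.0])       # total traffic leaving at egresses b1, b2

# Gravity model: pair (a, b) gets traffic proportional to v_in[a] * v_out[b].
x_gravity = np.outer(v_in, v_out).ravel() / v_out.sum()   # [6, 3, 4, 2]

# Invented routing matrix (links x pairs) and observed link loads.
P = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
u = np.array([11.0, 4.0])           # differs from the gravity prediction [10, 5]

# Tomo-gravity: closest point to x_gravity in the affine set {x : P x = u}.
x = x_gravity + P.T @ np.linalg.pinv(P @ P.T) @ (u - P @ x_gravity)
print(x.reshape(2, 2))              # inferred traffic matrix, rows = ingresses
```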
Conclusions
• Managing IP networks is challenging
– Routers don’t adapt on their own to congestion
– Routers don’t reveal much information about traffic
• Measurement provides a network-wide view
– Topology
– Traffic matrix
• Optimization enables the network to adapt
– Inferring the traffic matrix from the link loads
– Optimizing the link weights based on the traffic matrix
New Research Direction: Design for Manage-ability
• Two main parts of network management
– Control: optimization
– Measurement: tomography
• Two research approaches
– Bottom-up: do the best with what you have
– Top-down: design systems that are easier to manage
• Design for manage-ability
– “If you are both the professor and the student, you create exam questions that are easy to answer.”
Example: Changing the Path Computation
• Routers split traffic over multiple paths
– More traffic on shorter paths, less on longer ones
– In proportion to the exponential of the path cost
[Figure: the example topology with link weights; traffic is spread over several paths between the endpoints]
• Exciting result
– Can achieve optimal distribution of the traffic
– With polynomial-time algorithm for setting the weights
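A minimal sketch of the splitting rule just described, assuming the traffic is divided over a fixed set of candidate paths whose costs are given (the numbers are invented): each path's share is proportional to the exponential of the negated path cost, so cheaper paths carry exponentially more of the traffic.

```python
# Sketch: split traffic over candidate paths in proportion to exp(-cost),
# so shorter (cheaper) paths receive most of the traffic.
import math

def exponential_split(path_costs):
    raw = [math.exp(-c) for c in path_costs]
    total = sum(raw)
    return [r / total for r in raw]

print(exponential_split([4.0, 5.0, 7.0]))   # roughly [0.71, 0.26, 0.04]
```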
New Research Direction: Logically-Central Control
• Traditional division of labor
– Routers: real-time, distributed protocols
– Management system: offline, centralized algorithms
• Example: routing protocols and traffic engineering
– Routing: routers react automatically to link failures
– TE: the management system sets the link weights
• The case for separating routing from routers
– Better decisions with network-wide visibility
– Routers only collect measurements and forward packets
Example: Routing Control Platform (RCP)
• Logically-centralized server
– Collects measurement data from the network
– Pushes forwarding tables into the routers
• Benefits
– Network-wide policies
– Flexible, easy to customize
– Fewer nodes to upgrade
[Figure: an RCP server interacting with the routers of an ISP]
• Feasibility
– A high-end PC can compute routes for a large ISP
– Simple replication to survive failures
References
• Traffic engineering using traditional protocols
– http://www.cs.princeton.edu/~jrex/papers/ieeecomm02.pdf
– http://www.cs.princeton.edu/~jrex/papers/opthand04.pdf
– http://www.cs.princeton.edu/~jrex/papers/ton-whatif.pdf
• Tomo-gravity to infer the traffic matrix
– http://www.cs.utexas.edu/~yzhang/papers/mmi-ton05.pdf
– http://www.cs.utexas.edu/~yzhang/papers/tomogravitysigm03.pdf
– http://www.cs.princeton.edu/~jrex/papers/sfi.pdf
References
• Design for manage-ability
– http://www.cs.princeton.edu/~jrex/papers/pefti.pdf
– http://www.cs.princeton.edu/~jrex/papers/optimizability.pdf
– http://www.cs.princeton.edu/~jrex/papers/tie-long.pdf
• Routing Control Platform
– http://www.cs.princeton.edu/~jrex/papers/rcp.pdf
– http://www.cs.princeton.edu/~jrex/papers/ccr05-4d.pdf
– http://www.cs.princeton.edu/~jrex/papers/rcp-nsdi.pdf
– http://www.research.att.com/~kobus/docs/irscp.inm.pdf