Download lockheed-jan05 - Princeton University

Route Control Platform
Making the Network Act Like One Big Router
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex
http://www.cs.princeton.edu/~jrex/papers/rcp.pdf

Outline
• Internet architecture
  – Complexity of network management
• Moving control from routers to servers
  – Reducing complexity and increasing flexibility
• Traffic engineering example
  – Today's approach vs. the RCP
• Making the RCP real
  – Deployability, scalability, and reliability
• Example applications
  – Security, maintenance, and customer control

Internet Architecture
• The Internet is
  – Decentralized: loose confederation of peers
  – Self-configuring: no global registry of topology
  – Stateless: limited information in the routers
  – Connectionless: no fixed connection between hosts
• These attributes contribute
  – To the success of the Internet
  – To the rapid growth of the Internet
  – … and the difficulty of controlling the Internet!
[Figure: a sender and a receiver communicating across the Internet]

A Well-Studied Architecture Question
• Smart hosts, dumb network
• Network moves IP packets between hosts
• Services implemented on hosts
• Keep state at the edges
[Figure: edge hosts connected through the IP network]
How to partition function vertically?

Inside a Single Network
Management Plane
• Figure out what is happening in the network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Each router with a different configuration program
• Huge number of control knobs: metrics, ACLs, policy
Data Plane
• Distributed routers
• Forwarding, filtering, queuing
[Figure: management-plane tools (shell scripts, planning tools, databases) working from configs, SNMP, NetFlow, and modems; OSPF and BGP processes on each router, driven by link metrics, routing policies, and traffic engineering; FIBs and packet filters in the data plane]

Inside a Single Network: State Everywhere!
• Dynamic state in routing processes and forwarding tables
• Configured state in settings, policies, packet filters
• Programmed state in magic constants, timers
• Many dependencies between bits of state
• State updated in an uncoordinated, decentralized way!
• Data plane forwarding, filtering, and queuing based on FIB or labels

How Did We Get in This Mess?
• Initial IP architecture
  – Bundled packet handling and control logic
  – Distributed the functions across routers
  – Didn't anticipate the need for management
• Rapid growth in features
  – Sudden popularity and growth of the Internet
  – Increasing demands for new functionality
  – Incremental extensions to protocols & router software
• Challenges of distributed algorithms
  – Some functions are hard to do in a distributed fashion

What Does the Operator Want?
• Network-wide views
  – Network topology
  – Mapping to lower-level equipment
  – Traffic matrix
• Network-level objectives
  – Load balancing
  – Survivability
  – Reachability
  – Security
• Direct control
  – Explicit configuration of data-plane mechanisms

What Architecture Would Achieve This?
• Management plane → Decision plane
  – Responsible for all decision logic and state
  – Operates on network-wide view and objectives
  – Directly controls the behavior of the data plane
• Control plane → Discovery plane
  – Responsible for providing the network-wide view
  – Topology discovery, traffic measurement, etc.
• Data plane
  – Queues, filters, and forwards data packets
  – Accepts direct instruction from the decision plane

Example Application: Traffic Engineering
• Problem: Adapt routing to the traffic demands
  – Inputs: network topology and traffic matrix
  – Outputs: routing of traffic that balances load
• Three ways to solve the problem
  – Extend the control plane to adapt to load
  – Management plane, with today's control plane
  – Decision plane

Interior Gateway Protocol (OSPF/IS-IS)
• Routers flood information to learn the topology
  – Determine "next hop" to reach other routers…
  – Compute shortest paths based on the link weights
• Link weights configured by the network operator
[Figure: example topology annotated with operator-configured link weights]

Control Plane: Let the Routers Adapt
• Strawman alternative: load-sensitive routing
  – Link metrics based on traffic load
  – Flood dynamic metrics as they change
  – Adapt automatically to changes in offered load
• Reasons why this is typically not done
  – Delay-based routing was unsuccessful in the early days
  – Oscillation as routers adapt to out-of-date information
  – Most Internet transfers are very short-lived
• Research and standards work continues…
  – … but operators have to do what they can today

Management Plane: Measure, Model, Control
[Figure: control loop: measure the operational network (offered traffic, topology/configuration), optimize using a network-wide "what if" model, and control the network through changes to link weights]

Management Plane Approach
• Topology
  – Connectivity and capacity of routers and links
• Traffic matrix
  – Offered load between points in the network
• Link weights
  – Configurable parameters for the routing protocol
• Performance objective
  – Balanced load, low latency, service agreements …
• Question: Given the topology and traffic matrix, which link weights should be used?
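Answering that question needs a "what-if" model: given candidate link weights, predict where each demand in the traffic matrix flows and how heavily each link is loaded. A minimal sketch follows, assuming a topology stored as nested dicts of link weights, a traffic matrix keyed by (ingress, egress) pair, and a single shortest path per pair. Real OSPF/IS-IS splits traffic evenly across equal-cost paths, which this sketch does not model. All function names and the three-router example are hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_paths(graph, src):
    """Dijkstra from src over graph = {node: {neighbor: link_weight}}.

    Returns (dist, pred), where pred[v] is v's predecessor on ONE shortest
    path from src. Equal-cost ties keep the first path found (no ECMP split).
    """
    dist = {src: 0}
    pred = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                pred[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, pred


def link_loads(graph, demands):
    """Route each demand {(src, dst): volume} on its shortest path and return
    the total offered load on every directed link that carries traffic."""
    load = defaultdict(float)
    for (src, dst), volume in demands.items():
        dist, pred = shortest_paths(graph, src)
        if dst not in dist:
            raise ValueError(f"{dst} unreachable from {src}")
        node = dst
        while node != src:
            load[(pred[node], node)] += volume
            node = pred[node]
    return dict(load)


def max_utilization(graph, capacity, demands):
    """Objective: load on the busiest link as a fraction of its capacity."""
    loads = link_loads(graph, demands)
    return max(loads[link] / capacity[link] for link in loads)


# Hypothetical three-router example: same traffic, two candidate weight settings.
weights = {"A": {"B": 1, "C": 3}, "B": {"A": 1, "C": 1}, "C": {"A": 3, "B": 1}}
what_if = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "C": 1}, "C": {"A": 1, "B": 1}}
capacity = {(u, v): 20.0 for u in weights for v in weights[u]}
demands = {("A", "C"): 10.0, ("B", "C"): 5.0}

print(max_utilization(weights, capacity, demands))  # 0.75: A->C detours via B
print(max_utilization(what_if, capacity, demands))  # 0.5: A->C goes direct
```

Lowering the A–C weight from 3 to 1 moves the A-to-C demand off link B→C, cutting peak utilization from 75% to 50%. A local search would repeat this evaluation over many candidate weight settings, which is why the cost of each "what-if" run matters.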
Management Plane Solution
• Measure
  – Topology: monitoring of the routing protocols
  – Traffic matrix: widely deployed traffic measurement
• Model
  – Representations of topology and traffic
  – "What-if" models of the routing protocol
• Optimize
  – Efficient local-search algorithms to find good settings
  – Operational experience to identify key constraints
http://www.cs.princeton.edu/~jrex/papers/ieeecomm02.pdf

This Works, But Has Some Limitations
• "What-if" model
  – Repeats the logic implemented in the control plane
  – Duplication of functionality, and of debugging
• Optimization techniques
  – Local search, because the problem is intractable
  – Too much computation to explore all possibilities
• Network effects
  – Link-weight changes are disruptive
  – Routers must converge after each change
  – Leads to transient packet loss and delay

Decision Plane Solution
• Measure
  – Topology: monitoring of the routing protocols
  – Traffic matrix: widely deployed traffic measurement
• Optimize the routing
  – Compute desired forwarding paths directly
  – Simpler than optimizing the link weights
• Instruct the routers
  – Could change one router at a time to gradually switch to the new routes
  – Avoid transient packet loss and delays

More Network-Level Objectives
• Survivability
  – Routing that can tolerate any single equipment failure
  – Incorporate knowledge of shared risk groups
• Reachability policies
  – Control which pairs of hosts can communicate
  – Install packet filters and forwarding-table entries
• Security
  – Install "blackhole" routes that drop attack traffic
  – Keep routing tables within router storage limits
• Etc.

Is the Decision Plane Feasible?
• Deployability: any path from here to there?
  – Must be compatible with today's routers
  – Must provide incentives for deployment
• Speed: can it run fast enough?
  – Must respond quickly to network events
  – Needs to be as fast as a router
• Reliability: single point of failure?
  – Must be replicated to tolerate failure
  – Replicas must behave consistently

Deployability
• Take a lesson from Ethernet
  – Change anything but the message format
• Border Gateway Protocol (BGP)
  – Interdomain routing protocol for the Internet
    • Widely implemented on existing routers
    • Widely used, especially in backbone networks
  – Three main aspects of BGP
    • Protocol: standard messages sent between routers
    • Vendors: path-selection logic on individual routers
    • Operators: configuration of policies for path selection
  – Logic and policies are complex, but the messages are simple

Deployment in a Single Network
• Before: conventional use of BGP in a backbone network
[Figure: routers speak eBGP to neighboring networks and iBGP to each other]
• After: RCP learns external routes and sends answers to the routers
[Figure: eBGP sessions unchanged; RCP connected to the routers via iBGP]
• Only one AS has to change its architecture!

Longer-Term, Widespread Deployment
• Represents an AS as a single logical entity
  – Complete view of the AS's routes
  – Computes routes for all routers inside the AS
• Exchanges routing information with other ASes
  – Using BGP or a new inter-AS protocol
  – While still using BGP to talk to the routers
[Figure: the RCPs of AS 1, AS 2, and AS 3 exchange routes over an inter-AS protocol and use iBGP to reach their own routers; the ASes remain connected by physical peering]

RCP Architecture
[Figure: two RCP replicas, each made up of a Route Control Server, a BGP engine, and an OSPF viewer, connected to the routers (R); components labeled "Brain" and "Brawn"]
• Scalability through decomposition; reliability through replication

Scalability: Three-Part RCP Architecture
• OSPF viewer
  – Continuous view of network topology
  – Passive monitoring of link-state advertisements
• BGP engine
  – Collecting BGP updates from border routers
  – Sending chosen routes to the routers
  – Lots of TCP connections, like a Web server
• Route Control Server
  – Logic for computing answers for the routers
  – Configuration for controlling the logic
  – Operates on real-time feeds from the monitors

Scalability: Initial Prototype
• Implementation platform
  – 3.2 GHz Pentium 4
  – 8 GB memory
  – Linux 2.6.5 kernel
• Workload
  – Routing/topology changes in AT&T's network
• RCP performance
  – Memory usage: less than 2 GB
  – Speed, BGP changes: less than 40 msec
  – Speed, topology changes: 0.1-0.8 seconds
• System is able to keep up…

Reliability
• Replication: avoid single points of failure
  – Multiple RCPs per network
  – Connected at different places
• Consistency: replicas act as one
  – Replicas performing the same algorithm on the same input get the same answer (eventually)
  – Each replica has a complete view of each partition it sees
[Figure: replicas in a partitioned network, each labeled with the partitions it sees (A, B, or both)]

Application: DDoS Blackholing
• Blackholing of denial-of-service attacks
  – Preconfigure a "null" route on each router
  – Identify address of attack victim (from DoS system)
  – RCP assigns the destination address to the null route
[Figure: attack traffic toward victim 1.2.3.4, detected by traffic analysis; RCP tells the routers over iBGP: "Use null route for 1.2.3.4/32"]

Application: Maintenance Dry-Out
• Dry-out of traffic before maintenance
  – Plan to take a router out of service
  – RCP assigns routes via new egress points in advance
[Figure: router r is about to undergo maintenance; before, traffic to destination d leaves via r; after, RCP tells the routers over iBGP: "Use route via s for d"]

Application: Customized Egress Selection
• Customer-controlled selection of egress points
  – Customer with two data centers and many sites
  – Customer wants to control the load balance
  – RCP customization (not simply closest egress)
[Figure: RCP tells the routers serving Site #1 "Use route via s for d" and those serving Site #2 "Use route via r for d", over iBGP]

Conclusion
• Managing IP networks is too hard
  – IP architecture not designed for management
  – Complex, distributed operation of routers
• Reducing complexity is the key
  – Network-wide views & objectives, and direct control
  – Removing control logic and state from the routers
• New architecture is feasible
  – RCP is deployable, scalable, reliable
  – RCP solves important operations problems
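The blackholing, dry-out, and customized-egress applications above all reduce to one decision the RCP sends to each router over iBGP: which egress point to use for each destination. A minimal sketch of that per-router decision follows, assuming every candidate route is equally preferred by the earlier steps of the BGP decision process (local preference, AS-path length, and so on), so only the closest-egress ("hot potato") step by IGP distance is modeled. The names `RcpPolicy`, `choose_egress`, `compute_assignments`, and `NULL_ROUTE` are hypothetical and are not taken from the RCP prototype.

```python
from dataclasses import dataclass, field

NULL_ROUTE = "null"  # hypothetical name for the null route preconfigured on every router


@dataclass
class RcpPolicy:
    """Hypothetical operator inputs to the route computation."""
    blackholed: set = field(default_factory=set)  # victim prefixes (DDoS blackholing)
    drained: set = field(default_factory=set)  # egress routers being dried out
    custom: dict = field(default_factory=dict)  # (router, prefix) -> preferred egress


def choose_egress(router, prefix, candidates, igp_dist, policy):
    """Pick the egress point `router` should use for `prefix`.

    candidates: egress routers that learned an (equally preferred) BGP route.
    igp_dist[router][egress]: IGP path cost from router to that egress.
    """
    if prefix in policy.blackholed:
        return NULL_ROUTE
    usable = [e for e in candidates if e not in policy.drained]
    if not usable:
        return None  # no route left for this prefix
    preferred = policy.custom.get((router, prefix))
    if preferred in usable:
        return preferred
    # Default: closest egress by IGP cost ("hot potato"), ties broken by name.
    return min(usable, key=lambda e: (igp_dist[router][e], e))


def compute_assignments(routers, routes, igp_dist, policy):
    """Answer for every (router, prefix); routes maps prefix -> candidate egresses."""
    prefixes = set(routes) | policy.blackholed
    return {
        (r, p): choose_egress(r, p, routes.get(p, []), igp_dist, policy)
        for r in routers
        for p in prefixes
    }


# Hypothetical example: routers a and b, egress points s and r for destination d.
igp = {"a": {"s": 1, "r": 5}, "b": {"s": 4, "r": 2}}
routes = {"d": ["s", "r"]}
routers = ["a", "b"]

print(compute_assignments(routers, routes, igp, RcpPolicy()))
# {('a', 'd'): 's', ('b', 'd'): 'r'}
```

Each application only changes the policy input: `RcpPolicy(blackholed={"1.2.3.4/32"})` sends the victim prefix to the null route, `RcpPolicy(drained={"r"})` moves router b's traffic for d from r to s before maintenance, and `RcpPolicy(custom={("a", "d"): "r"})` overrides the closest-egress default for one customer site.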

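The consistency claim on the Reliability slide ("same algorithm on the same input get the same answer (eventually)") can be illustrated with a small sketch. It assumes each BGP session delivers its own updates in order, as a TCP connection does, while different replicas may see updates from different sessions interleaved differently. Because stored routes are keyed by (session, prefix) and route selection is a deterministic function of the stored routes, every interleaving ends with the same answer, although intermediate answers can differ. A single integer "cost" stands in for the full set of BGP attributes; all function names and update streams are hypothetical.

```python
import random


def apply_updates(updates):
    """Build a replica's stored routes from a sequence of BGP-like updates.

    Each update is (session, prefix, cost); cost None means withdrawal.
    Routes are keyed by (session, prefix), so only updates from the SAME
    session can overwrite each other.
    """
    rib = {}
    for session, prefix, cost in updates:
        if cost is None:
            rib.pop((session, prefix), None)
        else:
            rib[(session, prefix)] = cost
    return rib


def select(rib):
    """Deterministic best route per prefix: lowest cost, then session name."""
    best = {}
    for (session, prefix), cost in rib.items():
        if prefix not in best or (cost, session) < best[prefix]:
            best[prefix] = (cost, session)
    return {prefix: session for prefix, (_, session) in best.items()}


def interleave(streams, rng):
    """Merge per-session update streams in a random order, keeping each
    session's own order intact (as an in-order TCP session would)."""
    queues = [list(s) for s in streams if s]
    merged = []
    while queues:
        i = rng.randrange(len(queues))
        merged.append(queues[i].pop(0))
        if not queues[i]:
            queues.pop(i)
    return merged


# Hypothetical update streams from two border routers, s and r.
from_s = [("s", "d", 5), ("s", "d", 3)]
from_r = [("r", "d", 4), ("r", "e", 2), ("r", "d", None)]

# Two replicas see the sessions interleaved differently but end up agreeing.
replica_1 = select(apply_updates(interleave([from_s, from_r], random.Random(1))))
replica_2 = select(apply_updates(interleave([from_s, from_r], random.Random(2))))
print(replica_1 == replica_2)  # True
```

The "(eventually)" matters: a replica that has processed only part of the streams may briefly pick a different route, and agreement is reached once both replicas have seen the same complete input.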