Coping with Link Failures in
Centralized Control Plane
Architecture
Maulik Desai, Thyagarajan Nandagopal
Introduction
• Traditional network
- Routers identify link failures and establish alternate routes
• Centralized control plane architectures – SoftRouter, 4D
- Controller sends updates to switches
- Switches have the least amount of intelligence
• Link failure between switches and controller
- Switches attempt to find a controller
- Routing loops may occur
• Naïve solution – flooding the news of the link failure
- Creates a lot of unnecessary traffic
• Better solution
- Only the necessary switches are informed of the link failure
- Switches still maintain the minimum amount of intelligence
- Implemented in a network formed by OpenFlow switches
- The news of a link failure can reach the switches sooner than the controller can identify the failure and send out updates
Related Work
• SoftRouter
- Control and packet forwarding functions are separated
- Increasing reliability in the network
- Elements of a SoftRouter network include
Forwarding Element (FE): switches performing packet forwarding
Control Element (CE): controllers running control plane functions
Network Element (NE): logical grouping of some CEs and a few FEs
• 4D
- Four logical planes
Decision Plane: makes all the decisions about network control
Dissemination Plane: ensuring communication between decision plane and switches
Discovery Plane: identifying physical components of a network
Data Plane: handling individual packets, controlled by decision plane
• Both maintain a control plane that is separate from the data plane
Related Work - Cont.
• OpenFlow switches
- Controlled by a remotely located controller
- Maintaining multiple data paths in the same network
- Flow table (see the sketch below):
Header Fields: packet header values used to match incoming packets
Counters: maintaining statistics for the switch
Actions: if the header of a received packet matches the header fields, the action defined in the entry is applied
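
A minimal Python sketch of this flow-table behaviour, assuming illustrative class and field names rather than the OpenFlow wire format:

from dataclasses import dataclass

@dataclass
class FlowEntry:
    header_fields: dict        # e.g. {"ingress_port": 2, "ip_dst": "10.0.0.5"}
    action: str                # e.g. "forward:port3" or "drop"
    packet_count: int = 0      # Counters: per-entry statistics
    byte_count: int = 0

    def matches(self, pkt: dict) -> bool:
        # A packet matches when every specified header field agrees.
        return all(pkt.get(k) == v for k, v in self.header_fields.items())

def lookup(flow_table, pkt):
    # Apply the action of the first matching entry and update its counters.
    for entry in flow_table:
        if entry.matches(pkt):
            entry.packet_count += 1
            entry.byte_count += pkt.get("length", 0)
            return entry.action
    return "send_to_controller"    # table miss: defer to the controller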
Link Failure (1)
• A simple link failure scenario
Link between A and B fails – informing all relevant switches about the failed link
and asking them to refrain from sending B messages that travel towards A, until
the controller sends them an update
Link Failure (2)
• Island
Link between A and B fails – forming an island
- B could inform C and D of the failed link, avoiding unnecessary traffic
- A could inform the controller of the failed link, preventing it from attempting to reach the island
Link Failure (3)
• Routing Loop
Link between B and C fails – forming a routing loop
- B could inform A about the failed link
- This process can be completed a lot sooner than the controller could identify the link failure
and update the switches – preventing routing loops
Coping with Link Failures
• A scheme to reduce the damage caused by link failure
- In case of a link failure, all the switches that could send flows in the direction of
the failed link should be informed of this event
- Link failure messages should not propagate in the network indefinitely, and
unless required, these messages should not be flooded in the network
- The scheme should provide enough information to the network switches
regarding the flows that are affected by the failed link. At the same time, it
should make sure that the flows that are not affected by this event do not get
modified
- The proposed scheme should not violate the basic premise of keeping the
minimum amount of intelligence available at the switches
Solutions to Link Failures
• Goal
- Informing all the switches that could send flows in the direction of the failed link about the link failure event
- Making sure Link Failure Messages (LFMs) do not get flooded in the entire network
• Outline
- A proper way to define a flow
- Computations for the switch experiencing link failure
- Computations for the switch receiving a LFM
- Importance of specifying ingress port in the flow definition
- Flow tables without ingress ports
Solutions to Link Failures – A proper way to define a flow
• One way to define the flow:
• Better way to define the flow: (see the sketch below)
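
A minimal sketch of the two flow definitions, assuming (based on the later slide about the importance of the ingress port) that the better definition adds the ingress port to the header fields; the field names are illustrative:

# Hypothetical flow defined by packet header fields alone.
flow_without_ingress = {
    "ip_src": "10.0.0.1",
    "ip_dst": "10.0.0.5",
    "tcp_dst": 80,
}

# Presumed better definition: the same header fields plus the ingress port,
# which tells a switch which upstream neighbours can actually feed this flow.
flow_with_ingress = {
    "ingress_port": 2,
    "ip_src": "10.0.0.1",
    "ip_dst": "10.0.0.5",
    "tcp_dst": 80,
}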
Solution to Link Failures – Computations for the switch experiencing link failure (1)
Solution to Link Failures – Computations for the switch experiencing link failure (2)
• LFM Structure (see the sketch below)
- Source Address: IP address of the switch that initiates the LFM
- Message ID: ensures that the same LFM does not get forwarded multiple times by the same switch
- Flow Definition: a subset of header fields that make up the definition of the flow
- Flow Count: indicates the total number of flow specifications that are attached to the LFM
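
A minimal Python sketch of the LFM structure listed above, assuming illustrative names rather than the paper's exact wire format:

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LinkFailureMessage:
    source_address: str           # Source Address: IP of the switch that initiates the LFM
    message_id: int               # Message ID: stops the same LFM being forwarded twice
    flow_definitions: List[Dict]  # Flow Definition(s): header-field subsets of affected flows
    flow_count: int               # Flow Count: number of flow specifications attached

    def __post_init__(self):
        # Flow Count must agree with the attached flow specifications.
        assert self.flow_count == len(self.flow_definitions)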
Solution to Link Failures – Computations for the switch receiving a LFM (1)
• Upon receiving a LFM (see the sketch below)
- Making a note of the interface (rxInterface) from where the message came in
- Detaching the list (flowList) of flow definitions attached to the LFM
- Looking up the flow table and locating ingress ports
- Sending out a new LFM (why?)
- Modifying the Action field of the affected flow table entries (the tricky part)
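
A minimal Python sketch of these steps, assuming simplified rules for deciding which entries are affected and how the new LFM is built; helper names such as overlaps and handle_lfm are illustrative:

from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class FlowEntry:
    header_fields: Dict          # may include "ingress_port"
    action: str

@dataclass
class LFM:
    source_address: str
    message_id: int
    flow_definitions: List[Dict]

def overlaps(entry: FlowEntry, flow_def: Dict) -> bool:
    # Assumed rule: an entry is affected if it agrees with the flow definition
    # on every field that both of them specify.
    common = set(entry.header_fields) & set(flow_def)
    return bool(common) and all(entry.header_fields[k] == flow_def[k] for k in common)

def handle_lfm(flow_table: List[FlowEntry], lfm: LFM, rx_interface: int,
               my_ip: str, next_msg_id: int, seen_ids: Set[int]) -> List[Tuple[int, LFM]]:
    # Message ID check: never process (or forward) the same LFM twice.
    if lfm.message_id in seen_ids:
        return []
    seen_ids.add(lfm.message_id)

    # Detach the list (flowList) of flow definitions attached to the LFM.
    flow_list = lfm.flow_definitions

    # Look up the flow table and locate the affected entries and their ingress ports.
    affected = [e for e in flow_table if any(overlaps(e, f) for f in flow_list)]
    upstream = {e.header_fields.get("ingress_port") for e in affected}
    upstream.discard(None)
    upstream.discard(rx_interface)   # do not send the news back where it came from

    # Modify the Action field of the affected entries so traffic stops flowing
    # towards the failed link (the later slide discusses splitting an entry so
    # unaffected traffic keeps its original action).
    for e in affected:
        e.action = "drop"

    # Send out a new LFM (rather than forwarding the received one) towards the
    # ingress ports that could still feed the affected flows.
    out = LFM(source_address=my_ip, message_id=next_msg_id, flow_definitions=flow_list)
    return [(port, out) for port in upstream]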
Solution to Link Failures – Computations for the switch receiving a LFM (2)
• Why send out a new LFM instead of forwarding the same LFM?
Solution to Link Failures – Computations for the switch receiving a LFM (3)
• Modifying the Action field of the affected flow table entries
- Splitting a flow table entry into two
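
A minimal Python sketch of the splitting step, assuming the affected sub-flow becomes a more specific, higher-priority entry with a modified action while the original entry keeps handling the unaffected traffic; the names and the priority convention are assumptions:

from dataclasses import dataclass
from typing import Dict

@dataclass
class FlowEntry:
    header_fields: Dict
    action: str
    priority: int = 0            # higher priority wins when both entries match

def split_entry(entry: FlowEntry, affected_flow: Dict, new_action: str):
    """Return (affected_entry, remaining_entry) replacing the original entry."""
    # The affected entry matches the original fields plus the LFM's flow
    # definition, so it is strictly more specific than the original.
    affected = FlowEntry(header_fields={**entry.header_fields, **affected_flow},
                         action=new_action,
                         priority=entry.priority + 1)
    # Traffic that does not match the more specific entry falls through to the
    # original entry and keeps its original action.
    return affected, entry

# Example: only flows arriving on ingress port 2 towards 10.0.0.5 are affected.
original = FlowEntry({"ip_dst": "10.0.0.5"}, action="forward:port3")
affected, remaining = split_entry(original, {"ingress_port": 2}, new_action="drop")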
Solution to Link Failures – Importance of specifying ingress port in the flow definition
• Specifying the ingress port is the most helpful in a topology that is similar to a perfect graph
• Specifying the ingress port is the least helpful in a chain topology
Solution to Link Failures – Flow tables without ingress port
• Sometimes it may not be possible to specify the ingress port for the flow table entries in all the switches
- The LFM has to be flooded to all the switches in the network
- The LFM may float around in the network indefinitely
• Solution – including a “Hop count” or “Time to live” field (see the sketch below)
- “Hop count” decreases by one at every hop as the LFM gets forwarded – stop forwarding a LFM if “Hop count” reaches 0
- “Time to live” is a timestamp – stop forwarding a LFM once the “Time to live” expires
- These values have to be chosen carefully
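
A minimal Python sketch of the hop-count / time-to-live check, assuming illustrative field names and initial values:

import time

HOP_COUNT_LIMIT = 8          # assumed initial value; must be chosen carefully
TTL_SECONDS = 2.0            # assumed lifetime; must be chosen carefully

def should_forward(lfm: dict) -> bool:
    """Decide whether a flooded LFM may be forwarded any further."""
    if lfm["hop_count"] <= 0:                 # hop budget exhausted
        return False
    if time.time() >= lfm["expires_at"]:      # “Time to live” timestamp expired
        return False
    return True

def forward(lfm: dict) -> dict:
    """Return the copy that is actually forwarded, with one hop consumed."""
    return {**lfm, "hop_count": lfm["hop_count"] - 1}

# Example: a freshly created LFM that is flooded because ingress ports are unknown.
lfm = {"hop_count": HOP_COUNT_LIMIT, "expires_at": time.time() + TTL_SECONDS}
if should_forward(lfm):
    lfm = forward(lfm)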
Performance Analysis (1)
• Environment
- A small network of kernel-based OpenFlow switches
- Switches are installed on VMs that run Debian Lenny Linux
- A chain topology is used
- VMs share a 2.6 GHz processor, with 64 MB of RAM assigned to each of them
Performance Analysis (2)
• Results
- Since the VMs are not very well time-synchronized, it is difficult to calculate the total amount of time taken for the LFMs to reach all the switches
- Instead, the time difference between receiving a LFM and sending out a new LFM is calculated at each switch
- The sum is 394 ms plus the time taken between transmitting and receiving LFMs
- The total time taken to send a LFM to every switch is negligible compared to the interval between the controller’s connectivity probes, which may vary from tens of seconds to a few hundred seconds
Performance Analysis (3)
• Processing time vs. number of flow table entries
Conclusion
• In a centralized control plane architecture, link failures can create many problems
• To address the problems, a solution is proposed
- Informing relevant switches to refrain from sending traffic towards the failed
link without flooding
- Simplicity – maintaining the basic premise of keeping the minimum amount of
intelligence available at all switches
- All the relevant switches are informed of the failed link significantly sooner
than a controller learns about the link failure and sends out an update