SEATTLE and Recent Work
Jennifer Rexford, Princeton University
http://www.cs.princeton.edu/~jrex
Joint work with Changhoon Kim, Minlan Yu, and Matthew Caesar

Challenges in Edge Networks
• Large number of hosts
  – Tens or even hundreds of thousands of hosts
• Dynamic hosts
  – Host mobility
  – Virtual-machine migration
• Cost conscious
  – Equipment costs
  – Network-management costs
• Need a scalable and efficient self-configuring network

An All-Ethernet Solution?
• Self-configuration
  – Hosts: MAC addresses
  – Switches: self-learning (a minimal sketch appears at the end of this motivation section)
• Simple host mobility
  – Location-independent flat addresses
• But, poor scalability and performance
  – Flooding frames to unknown destinations
  – Large forwarding tables (one entry per address)
  – Broadcast for discovery (e.g., ARP, DHCP)
  – Inefficient delivery of frames over a spanning tree

An All-IP Solution?
• Scalability
  – Hierarchical prefixes (smaller tables)
  – No flooding
• Performance
  – Forwarding traffic over shortest paths
• But, several disadvantages
  – Complex joint configuration of routing and DHCP
  – Clumsy mobility (location-dependent addresses)
  – Expensive longest-prefix match in the data plane

Compromise: Hybrid Architecture
• Ethernet-based IP subnets interconnected by routers
• Ethernet bridging
  – Flat addressing
  – Self-learning
  – Flooding
  – Forwarding along a tree
• IP routing
  – Hierarchical addressing
  – Subnet configuration
  – Host configuration
  – Forwarding along shortest paths
• Sacrifices Ethernet's simplicity and IP's efficiency for scalability

Can We Do Better?
• Shortest-path routing on flat addresses
  – Shortest paths: scalability and performance
  – MAC addresses: self-configuration and mobility
  [Diagram: hosts (H) attached to a mesh of switches (S)]
• Scalability without hierarchical addressing
  – Limit dissemination and storage of host info
  – Sending packets on slightly longer paths
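To make the scalability concerns on the "All-Ethernet Solution" slide concrete, here is a minimal, illustrative sketch of a self-learning switch; the class and variable names are invented, not from the slides. It keeps one forwarding-table entry per host MAC address and floods frames with unknown destinations, which is exactly what limits plain Ethernet at scale.

    # Minimal sketch (not from the slides) of Ethernet self-learning:
    # one table entry per host MAC, flooding for unknown destinations.

    class LearningSwitch:
        def __init__(self, ports):
            self.ports = set(ports)
            self.mac_table = {}          # MAC address -> outgoing port (one entry per host)

        def handle_frame(self, src_mac, dst_mac, in_port):
            # Self-learning: remember which port the source MAC came from.
            self.mac_table[src_mac] = in_port
            if dst_mac in self.mac_table:
                return [self.mac_table[dst_mac]]               # unicast to the learned port
            # Unknown destination: flood on every port except the incoming one.
            return [p for p in self.ports if p != in_port]

    sw = LearningSwitch(ports=[1, 2, 3, 4])
    print(sw.handle_frame("aa:aa", "bb:bb", in_port=1))        # flood (ports 2, 3, 4)
    print(sw.handle_frame("bb:bb", "aa:aa", in_port=2))        # learned: [1]

The per-host table and the flooding step are what SEATTLE, described next, replaces with a one-hop DHT and shortest-path forwarding.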
SEATTLE: Scalable Ethernet Architecture for Large Enterprises
(joint work with Changhoon Kim and Matt Caesar)

SEATTLE Design Decisions
• Objectives and approaches
  1. Avoiding flooding: never broadcast unicast traffic
  2. Restraining broadcasting: bootstrap hosts via unicast
  3. Reducing routing state: populate host info only when and where it is needed
  4. Shortest-path forwarding: allow switches to learn the topology
  (* Meanwhile, avoid modifying end hosts)
• Solutions
  – Traffic-driven resolution with caching
  – L2 link-state routing maintaining only switch-level topology
  – Network-layer one-hop DHT

Network-Layer One-hop DHT
• Maintains <key, value> pairs with a hash function F
  – Consistent hash mapping a key to a switch, over an identifier ring from 0 to 2^128 - 1
  – F is defined over the set of live switches
• One-hop DHT
  – Link-state routing ensures switches know each other, so any switch can evaluate F locally
  (a minimal sketch of this hash appears after the host-dynamics slides below)
• Benefits
  – Fast and efficient reaction to changes
  – Reliability and capacity naturally grow with the size of the network

Location Resolution
• <key, val> = <MAC addr, location>
• Steps (from the slide's diagram: host x attached to switch A, host y attached to switch D, resolver B):
  – Host discovery: switch A learns that host x is attached, hashes F(MACx) = B, and publishes <MACx, A> to resolver B, which stores it
  – Host y sends traffic to x: its switch D hashes F(MACx) = B and tunnels the packets to B
  – B tunnels the traffic to A and notifies D of <MACx, A>
  – Subsequent traffic is forwarded directly from D to A
• Diagram legend: switches, end hosts, control messages, data traffic

Address Resolution
• <key, val> = <IP addr, MAC addr>
• Steps: host y broadcasts an ARP request for IPx; its switch D hashes F(IPx) = B and sends a unicast look-up to B; B, which stores <IPx, MACx, A>, returns a unicast reply <IPx, MACx, A>
• Traffic following the ARP exchange takes a shortest path, without a separate location resolution

Handling Network and Host Dynamics
• Network events
  – Switch failure/recovery: changes the <key, value> pairs handled by DHT neighbors; fortunately, switch failures are not common
  – Link failure/recovery: link-state routing finds new shortest paths
• Host events
  – Changes in host location, MAC address, or IP address
  – Must update stale host-information entries

Handling Host Information Changes
• Dealing with host mobility: when host x moves from its old switch A to a new switch D, the resolver and any switches caching <x, A> are updated to <x, D>, so switches talking with x redirect their traffic to the new location
• MAC- or IP-address changes can be handled similarly
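Before the evaluation, here is a minimal sketch of the consistent hash F behind the one-hop DHT described above. This is not the SEATTLE implementation; the hash choice (MD5), ring size, and class names are illustrative. Because link-state routing gives every switch the full list of live switches, each switch can evaluate F locally and reach the resolver in one hop.

    # Illustrative consistent hashing over live switches (names invented).
    import hashlib
    from bisect import bisect_right

    RING = 2 ** 128

    def h(value):
        # Map any string (switch ID or host MAC) onto the 2^128 identifier ring.
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING

    class OneHopDHT:
        def __init__(self, switches):
            # Link-state routing lets every switch know the set of live switches,
            # so each one can compute this table locally (the "one hop" property).
            self.points = sorted((h(s), s) for s in switches)

        def resolver(self, key):
            # The resolver is the first live switch clockwise from the key's hash.
            hashes = [p for p, _ in self.points]
            i = bisect_right(hashes, h(key)) % len(self.points)
            return self.points[i][1]

    dht = OneHopDHT(["switch-A", "switch-B", "switch-C", "switch-D"])
    print(dht.resolver("02:00:00:00:00:01"))   # switch storing <MAC, location> for this host

When a switch fails or joins, only the keys adjacent to it on the ring move to a different resolver, which is what makes the reaction to changes fast and keeps the rest of the mappings stable.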
Packet-Level Simulations
• Large-scale packet-level simulation
  – Event-driven simulation of the control plane
  – Synthetic traffic based on LBNL traces
  – Campus, data-center, and ISP topologies
• Main results
  – Much less routing state than Ethernet
  – Only slightly more stretch than IP routing
  – Low overhead for handling host mobility

Amount of Routing State
• [Plot: routing state for Ethernet, SEATTLE without caching, and SEATTLE with caching]
• SEATTLE reduces the amount of routing state by more than an order of magnitude

Cache Size vs. Stretch
• Stretch = actual path length / shortest path length (in latency)
• [Plot: stretch vs. amount of routing state for ROFL and SEATTLE]
• SEATTLE offers near-optimal stretch with a very small amount of routing state

Sensitivity to Mobility
• [Plot: control overhead under host mobility for Ethernet, SEATTLE with caching, and SEATTLE without caching]
• SEATTLE rapidly updates routing state with very low overhead

Prototype Implementation
• [Diagram: user-space XORP (OSPF daemon and network map) exchanges link-state advertisements and passes a routing table through a XORP-Click interface to a kernel-level Click SeattleSwitch, whose ring manager and host-info manager handle host-info registration and notification messages alongside the data frames]
• Throughput: 800 Mbps for 512 B packets, or 1400 Mbps for 896 B packets

Conclusions on SEATTLE
• SEATTLE is
  – Self-configuring
  – Scalable
  – Efficient
• Enabling design decisions
  – One-hop DHT with link-state routing
  – Reactive location resolution and caching
  – Shortest-path forwarding

VL2 Architecture (Microsoft)

VL2 Data Center Architecture
• VL2 work at Microsoft
  – Focus on data centers, rather than enterprise networks
• Similarities to SEATTLE
  – Flat addressing
  – Mapping of destination address to egress switch
  – Packet encapsulation to the egress switch
• Differences from SEATTLE
  – Separate directory servers, not an in-network DHT
  – Lookup and encapsulation performed by the servers
  – Load balancing over many paths (ECMP + anycast)

Scalable Flow-Based Networking with DIFANE
(joint work with Minlan Yu, Mike Freedman, and Jia Wang)

Flexible Policies in Enterprises
• Access control
  – Drop packets from malicious hosts
• Customized routing
  – Direct Skype calls onto a low-latency path
• Measurement
  – Collect detailed HTTP traffic statistics

Flow-based Switches
• Install rules in flow-based switches
  – Store rules in high-speed memory (TCAM)
• Perform simple actions based on rules
  – Rules: match on bits in the packet header (e.g., source and destination fields of the flow space)
  – Actions: drop, forward (e.g., via link 1), count
  (a small sketch of this wildcard matching appears below, after the two DIFANE stages)

Challenges of Policy-Based Management
• Policy-based network management
  – Specify high-level policies in a management system
  – Enforce low-level rules in the switches
• Challenges
  – Large numbers of hosts, switches, and policies
  – Limited TCAM space in switches
  – Support for host mobility
  – No hardware changes to commodity switches

Pre-install Rules in Switches
• The controller pre-installs the rules; packets hit the rules and are forwarded
• Problems:
  – No host-mobility support
  – Switches do not have enough memory

Install Rules on Demand (Ethane, NOX)
• The first packet misses the rules; the switch buffers the packet and sends its header to the controller, which installs rules before the packet is forwarded
• Problems:
  – Delay of going through the controller
  – Switch complexity
  – Misbehaving hosts

DIFANE Architecture (two stages)
• DIstributed Flow Architecture for Networked Enterprises

Stage 1
• The controller proactively generates the rules and distributes them to authority switches.

Partition and Distribute the Flow Rules
• [Diagram: the controller partitions the flow space among authority switches A, B, and C and distributes the partition information to all switches; packets travel from an ingress switch through the appropriate authority switch (accept or reject) toward the egress switch]

Stage 2
• The authority switches keep packets always in the data plane and reactively cache rules.
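As a reference point for the rule tables that follow, here is a hedged sketch of the flow-rule model from the "Flow-based Switches" slide: wildcard patterns over header fields, with the highest-priority matching rule deciding the action. The field widths, rule values, and function names are illustrative, not taken from the slides.

    # Illustrative priority lookup over wildcard rules (a software stand-in for TCAM).
    def bits_match(pattern, value):
        # '*' matches either bit; otherwise the bits must be equal.
        return all(p == '*' or p == v for p, v in zip(pattern, value))

    def lookup(rules, field1, field2):
        # rules: list of (priority, field1_pattern, field2_pattern, action)
        best = None
        for prio, pat1, pat2, action in rules:
            if bits_match(pat1, field1) and bits_match(pat2, field2):
                if best is None or prio > best[0]:
                    best = (prio, action)
        return best[1] if best else "drop"     # default action if nothing matches

    rules = [
        (210, "00**", "111*", "forward via link 1"),
        (209, "1110", "11**", "drop"),
    ]
    print(lookup(rules, "0010", "1110"))       # matches the priority-210 rule

A hardware TCAM performs this match in a single lookup; the point here is only the semantics: wildcards plus priorities, which is what DIFANE's cache, authority, and partition rules all rely on.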
Packet Redirection and Rule Caching
• The first packet is redirected from the ingress switch through an authority switch to the egress switch; the authority switch installs a cache rule at the ingress switch, so following packets hit the cached rules and are forwarded directly
• A slightly longer path in the data plane is faster than going through the control plane

Locate Authority Switches
• Partition information in ingress switches
  – Using a small set of coarse-grained wildcard rules
  – ... to locate the authority switch for each packet
• Distributed directory service, but not a DHT
  – Hashing does not work for wildcards
  – Keys can have wildcards in arbitrary bit positions
• Example partition: X:0-1, Y:0-3 -> authority switch A; X:2-5, Y:0-1 -> authority switch B; X:2-5, Y:2-3 -> authority switch C

Three Sets of Rules in TCAM
• Cache rules: in ingress switches, reactively installed by authority switches
  – e.g., priority 210: 00**, 111* -> forward to switch B (timeout 10 sec)
  – e.g., priority 209: 1110, 11** -> drop (timeout 10 sec)
• Authority rules: in authority switches, proactively installed by the controller
  – e.g., priority 110: 00**, 001* -> forward and trigger the cache manager (timeout: infinity)
  – e.g., priority 109: 0001, 0*** -> drop and trigger the cache manager
• Partition rules: in every switch, proactively installed by the controller
  – e.g., priority 15: 0***, 000* -> redirect to the authority switch
  (a small sketch of this lookup order appears after the evaluation slides below)

DIFANE Switch Prototype
• Built with an OpenFlow switch; only a software modification for authority switches
• Data plane holds the cache rules, authority rules, and partition rules
• Control plane (only in authority switches): a cache manager receives notifications and sends cache updates to install cache rules

Caching Wildcard Rules
• Overlapping wildcard rules
  – Cannot simply cache the matching rules

Caching Wildcard Rules (cont.)
• Multiple authority switches
  – Contain independent sets of rules
  – Avoid cache conflicts in the ingress switch

Partition Wildcard Rules
• Partition rules
  – Minimize the TCAM entries in switches
  – Decision-tree-based rule partition algorithm
• [Diagram: of two candidate cuts of the flow space, cut B is better than cut A]

Handling Network Dynamics

  Network dynamics              Cache rules   Authority rules   Partition rules
  Policy changes at controller  Timeout       Change            Mostly no change
  Topology changes at switches  No change     No change         Change
  Host mobility                 Timeout       No change         No change

Prototype Evaluation
• Evaluation setup
  – Kernel-level Click-based OpenFlow switch
  – Traffic generators, switches, and the controller run on separate 3.0 GHz 64-bit Intel Xeon machines
• Compare delay and throughput
  – NOX: buffer packets and reactively install rules
  – DIFANE: forward packets through authority switches

Delay Evaluation
• Average delay (RTT) of the first packet
  – NOX: 10 ms
  – DIFANE: 0.4 ms
• Reasons for the performance improvement
  – Always keep packets in the data plane
  – Packets are delivered without waiting for rule caching
  – Easily implemented in hardware to further improve performance

Peak Throughput
• Setup: one authority switch; single-packet flows
• [Plot: throughput (flows/sec) vs. sending rate (flows/sec) for DIFANE and for NOX with 1 to 4 ingress switches]
• Observed bottlenecks: DIFANE reaches about 800K flows/sec, while NOX is limited by an ingress-switch bottleneck of about 20K flows/sec and a controller bottleneck of about 50K flows/sec
• DIFANE further increases the throughput linearly with the number of authority switches
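The following sketch ties together the packet redirection and the three rule sets described above: an ingress switch first tries its reactively installed cache and otherwise uses a coarse-grained partition rule to send the packet to the owning authority switch, which resolves the action and installs a cache entry. The classes and method names are invented for illustration; real DIFANE caches wildcard rules in TCAM rather than exact-match entries, and the partition ranges below simply mirror the X/Y example partition shown earlier.

    # Illustrative ingress-switch lookup order: cache rules, then partition rules.
    def in_range(lo_hi, value):
        lo, hi = lo_hi
        return lo <= value <= hi

    class AuthoritySwitch:
        def __init__(self, rules):
            self.rules = rules                   # exact-match (x, y) rules, for simplicity

        def resolve(self, x, y):
            return self.rules.get((x, y), "drop")

    class IngressSwitch:
        def __init__(self, partition_rules):
            # partition_rules: list of ((x_range, y_range), authority_switch)
            self.partition_rules = partition_rules
            self.cache = {}                      # (x, y) -> action, installed reactively

        def process(self, x, y):
            if (x, y) in self.cache:             # hit a cached rule: stay on the fast path
                return self.cache[(x, y)]
            for (xr, yr), authority in self.partition_rules:
                if in_range(xr, x) and in_range(yr, y):
                    # Redirect the packet (conceptually still in the data plane) to the
                    # authority switch; it returns the action, cached for later packets.
                    action = authority.resolve(x, y)
                    self.cache[(x, y)] = action
                    return action
            return "drop"

    auth_a = AuthoritySwitch({(1, 2): "accept"})
    ingress = IngressSwitch([(((0, 1), (0, 3)), auth_a)])
    print(ingress.process(1, 2))                 # first packet: resolved via authority switch A
    print(ingress.process(1, 2))                 # later packets: hit the cached rule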
Scaling with Many Rules
• How many authority switches do we need?
  – Depends on the total number of rules
  – ... and the TCAM space in these authority switches
• Estimates for two rule sets:

                                        Campus      IPTV
    # Rules                             30K         5M
    # Switches                          1.7K        3K
    Assumed authority-switch TCAM size  160 KB      1.6 MB
    Required # authority switches       5 (0.3%)    100 (3%)

Conclusion
• DIFANE is a scalable way to support policy
  – Mix of reactive and proactive rule installation
  – Proactively compute and partition the rules
  – Reactively cache rules at ingress switches
• Generalized SEATTLE
  – SEATTLE: destination-based forwarding; rules based (only) on flat MAC addresses; lookup using hashing, in a one-hop DHT
  – DIFANE: multidimensional packet classification; rules based on many fields, with possible wildcards; lookup based on coarse-grained partition rules
• Scalable solution for flow-based networking

Backup Slides

BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations

Data Plane Scaling Challenge
• Large layer-two networks
  – Many end-host MAC addresses
  – Many switches
• Forwarding-table size becomes a problem
  – Requires a large, fast memory
  – Expensive and power hungry
  – Over-provisioning to avoid running out
• BUFFALO's goal
  – Given a small, fast memory
  – ... make the most of it!

Bloom Filters
• Bloom filters in fast memory
  – A compact data structure for a set of elements
  – Calculate s hash functions h1(x), ..., hs(x) to store element x by setting the corresponding bits of a bit vector V0 ... Vm-1
  – Easy to check set membership
  – Reduce memory at the expense of false positives

Bloom Filter Forwarding
• One Bloom filter (BF) per next hop
  – Store all addresses forwarded to that next hop
  – A packet's address is queried against the Bloom filters for next hops 1 ... T, and the packet follows the one that reports a hit
  (a small sketch of this structure appears below, after the one-false-positive case)

BUFFALO Challenges
• How to optimize memory usage?
  – Minimize the false-positive rate
• How to handle false positives quickly?
  – No memory/payload overhead
• How to handle routing dynamics?
  – Make it easy and fast to adapt the Bloom filters

1. Optimize Memory Usage
• Goal: minimize the overall false-positive rate
  – The probability that any one BF has a false positive
• Input:
  – Fast memory size M
  – Number of destinations per next hop
  – The maximum number of hash functions
• Output: the size of each Bloom filter
  – Larger BF for a next hop with more destinations

1. Optimize Memory Usage (cont.)
• Constraints
  – Memory constraint: the sum of all BF sizes <= fast memory size M
  – Bound on the number of hash functions, to bound CPU calculation time; the Bloom filters share the same hash functions
• Convex optimization problem
  – An optimal solution exists
  – Solved with the IPOPT optimization tool
  – Runs in about 50 msec

1. Minimize False Positives
• Forwarding table with 200K entries, 10 next hops, 8 hash functions
• [Plot: overall false-positive fraction (from 10^0 down to 10^-6) vs. fast-memory size from 0 to 1000 KB]

1. Comparing with Hash Table
• [Plot: fast-memory size vs. number of forwarding-table entries (up to 2M), for a hash table and for Bloom filters with false-positive rates of 0.01%, 0.1%, and 1%]
• Saves 65% memory relative to a hash table at a 0.1% false-positive rate

2. Handle False Positives
• Design goals
  – Should not modify the packet
  – Never go to slow memory
  – Ensure timely packet delivery
• BUFFALO solution
  – Exclude the incoming interface: avoids loops in the one-false-positive case
  – Random selection among the rest: guarantees reachability with multiple false positives

2. One False Positive
• Most common case: one false positive
  – When there are multiple matching next hops
  – Avoid sending to the incoming interface
• We prove that there is at most a 2-hop loop, with a stretch <= l(AB) + l(BA)
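To make the per-next-hop Bloom filters and the false-positive handling above concrete, here is a small illustrative sketch; the filter size, hash choice, and class names are assumptions, not the BUFFALO implementation. Each next hop gets one Bloom filter over the addresses routed to it, and when a false positive produces several hits, the switch excludes the incoming interface and picks randomly among the rest, as on the "Handle False Positives" slide.

    # Illustrative Bloom-filter-per-next-hop forwarding table.
    import hashlib
    import random

    class BloomFilter:
        def __init__(self, m_bits=1024, s_hashes=4):
            self.m, self.s = m_bits, s_hashes
            self.bits = [0] * m_bits

        def _positions(self, key):
            for i in range(self.s):
                digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
                yield int(digest, 16) % self.m

        def add(self, key):
            for pos in self._positions(key):
                self.bits[pos] = 1

        def __contains__(self, key):
            return all(self.bits[pos] for pos in self._positions(key))

    class BuffaloFIB:
        def __init__(self, next_hops):
            self.filters = {hop: BloomFilter() for hop in next_hops}

        def install(self, mac, next_hop):
            self.filters[next_hop].add(mac)      # no per-address entry, just BF bits

        def forward(self, mac, in_port):
            hits = [hop for hop, bf in self.filters.items() if mac in bf]
            # Prefer hops other than the incoming one; fall back if it is the only hit.
            candidates = [hop for hop in hits if hop != in_port] or hits
            return random.choice(candidates) if candidates else None

    fib = BuffaloFIB(next_hops=[1, 2, 3])
    fib.install("02:00:00:00:00:01", next_hop=2)
    print(fib.forward("02:00:00:00:00:01", in_port=1))   # 2, barring a false positive

The random selection among remaining hits is what the next slides analyze as a random walk, giving the provable stretch bound.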
2. Multiple False Positives
• Handle multiple false positives
  – Random selection from the matching next hops
  – Equivalent to a random walk on the shortest-path tree for the destination, plus a few false-positive links
  – ... so the packet eventually finds a way to the destination

2. Multiple False Positives (cont.)
• Provable stretch bound
  – With k false positives
  – ... the expected stretch is at most O(k^2 * 3^(k/3))
  – Proved using random-walk theory
• The stretch bound is actually not that bad
  – False positives are independent: different switches use different hash functions
  – k false positives for one packet are rare: the probability of k false positives drops exponentially in k
• Tighter bounds in special cases, e.g., trees

2. Stretch in Campus Network
• [Plot: CCDF of stretch normalized by shortest-path length, for false-positive rates of 0.5%, 0.1%, and 0.001%]
• When fp = 0.5%, 99.9% of the packets have no stretch, and only 0.0003% of packets have a stretch of 6 times the shortest-path length
• When fp = 0.001%, only 0.0002% of packets have a stretch of 6 times the shortest-path length

3. Update on Routing Change
• Use a counting Bloom filter (CBF) in slow memory
  – Assists the BF in handling forwarding-table updates
  – Easy to add or delete a forwarding-table entry
• [Diagram: deleting a route decrements the corresponding counters in the slow-memory CBF; a bit in the fast-memory BF is cleared only when its counter reaches zero]

3. Occasionally Resize BF
• Under significant routing changes
  – The number of addresses in the BFs changes significantly
  – Re-optimize the BF sizes
• Use the CBF to assist in resizing the BF
  – Keep a large CBF and a small BF
  – A filter can easily be contracted to a smaller size, but a small BF is hard to expand, so a resized BF is obtained by contracting the large CBF

BUFFALO Switch Architecture
• Prototype implemented in kernel-level Click

Prototype Evaluation
• Environment
  – 3.0 GHz 64-bit Intel Xeon
  – 2 MB L2 data cache (the fast-memory size M)
  – 200K forwarding-table entries and 10 next hops
• Peak forwarding rate
  – 365 Kpps, i.e., 1.9 μs per packet
  – 10% faster than hash-based EtherSwitch
• Performance with forwarding-table updates
  – 10.7 μs to update a route
  – 0.47 s to reconstruct the BFs based on the CBFs

Conclusion on BUFFALO
• BUFFALO provides
  – A small, bounded memory requirement
  – Small stretch
  – Fast reaction to routing updates
• Enabling design decisions
  – One Bloom filter per next hop
  – Optimization of Bloom filter sizes
  – Prevention of forwarding loops
  – Dynamic updates using counting Bloom filters

SeaBuff: Seattle + Buffalo

Seattle + Buffalo
• Seattle
  – Shortest-path routing and a scalable control plane
  – Fewer host MAC addresses stored at switches
• Buffalo
  – Less memory for a given number of routable addresses
  – Graceful handling of an increase in the number of addresses
• Combined data plane: Seattle's destination cache maps the destination to an egress switch, and Buffalo's Bloom filters map the egress switch to an outgoing link (a small sketch appears below, before the conclusions)

Two Small Sources of Stretch
• [Diagram: traffic from host y toward host x enters at switch D and may be diverted through relay switch B before reaching x's switch A]
• Seattle: diverting some packets through the relay B
• Buffalo: extra stretch from D to B, or from B to A

Choosing the Right Solution
• Spatial distribution of traffic
  – Sparse: caching in Seattle is effective
  – Dense: caching is not as effective
• Temporal distribution of traffic
  – Stable: shortest-path routing is effective
  – Volatile: forwarding through the relay spreads traffic
• Topology
  – Arbitrary: false positives have more impact
  – Tree-like: false positives have less impact
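Finally, a rough sketch of the combined SeaBuff data plane described above, with invented names and sizes: a SEATTLE-style destination cache maps the destination MAC to its egress switch, and BUFFALO-style per-link Bloom filters then map that egress switch to an outgoing link. How the cache and the filters get populated (DHT resolution, route installation) is only stubbed out here.

    # Illustrative SeaBuff pipeline: destination cache, then per-link Bloom filters.
    import hashlib
    import random

    def bf_positions(key, m_bits=512, s_hashes=4):
        for i in range(s_hashes):
            yield int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % m_bits

    class SeaBuffSwitch:
        def __init__(self, links):
            self.dest_cache = {}                           # Seattle: dst MAC -> egress switch
            self.bf = {link: [0] * 512 for link in links}  # Buffalo: one Bloom filter per link

        def learn_location(self, dst_mac, egress_switch):
            # Populated reactively by Seattle's one-hop DHT resolution (not shown).
            self.dest_cache[dst_mac] = egress_switch

        def install_route(self, egress_switch, link):
            # Shortest path to egress_switch leaves via `link`: set its Bloom-filter bits.
            for p in bf_positions(egress_switch):
                self.bf[link][p] = 1

        def forward(self, dst_mac, in_link):
            egress = self.dest_cache.get(dst_mac)
            if egress is None:
                return "tunnel to resolver"                # Seattle: unresolved destination
            hits = [l for l, bits in self.bf.items()
                    if all(bits[p] for p in bf_positions(egress))]
            candidates = [l for l in hits if l != in_link] or hits
            return random.choice(candidates) if candidates else None

    sw = SeaBuffSwitch(links=[1, 2, 3])
    sw.learn_location("02:00:00:00:00:05", egress_switch="switch-E")
    sw.install_route("switch-E", link=3)
    print(sw.forward("02:00:00:00:00:05", in_link=1))      # 3, barring a false positive

Because the Bloom filters index egress switches rather than host MAC addresses, their contents stay small even as the number of hosts grows, which is the point of combining the two designs.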
Conclusions
• Growing importance of edge networks
  – Enterprise networks
  – Data-center networks
• Shortest-path routing and flat addressing
  – Self-configuring and efficient
  – Scalability in exchange for stretch
• Ongoing work
  – SeaBuff = Seattle + Buffalo
  – Theoretical analysis of stretch in Buffalo
  – Efficient access control in OpenFlow