
Paper Review
Building a Robust Software-based Router Using Network Processors

ABSTRACT
•Need for more services → software-based routers
•Router: IXP1200 network processor development board + PC
•Forwards 3.47 Mpps (minimum-sized packets), or 1.77 Gbps of aggregate link bandwidth
•Hierarchical architecture:
  •Guarantees line speed for forwarding of simple packets
  •Leaves extra capacity for exceptional packets on the P3 (310 Kpps, with 1510 cycles for each packet)

INTRODUCTION
•Most network processors use parallelism.
•IXP1200: 6 MicroEngines (MEs), each supporting up to 4 hardware contexts.
•Router with a data plane (MEs) and a control plane (P3).
•Processor hierarchy, from more cycles per packet to fewer:
  •OSPF, updating routing tables, … [more cycles]
  •Packets that miss in the cache
  •Minimum packet processing, forwarding, … [fewer cycles]

ARCHITECTURE-Software
•Classifier
•Forwarder
•Scheduler
Two default forwarders:
•Minimal IP forwarding fast path
•Full IP protocol (IP options)
Two main attributes:
•Explicit support for adding new forwarders at run time
•Does not specify where in the processor hierarchy a forwarder runs

ARCHITECTURE-Hardware
IXP Evaluation System (200 MHz):
•32 MB DRAM (64-bit, 100 MHz); rate of DRAM = 6.4 Gbps
•2 MB SRAM (32-bit, 100 MHz)
•4 KB on-chip scratch
•64-bit 66 MHz IX bus; capacity of IX bus = 4 Gbps
•Ethernet ports (8 × 100 Mbps + 2 × 1 Gbps); send/receive BW = 2 × (8 × 100 Mbps + 2 × 1 Gbps) = 5.6 Gbps
•32-bit 100 MHz PCI bus
•4 KB ISTORE for each ME
•4 KB I-cache for StrongARM
•A pair of FIFOs (16 slots × 64 bytes)

Forwarding Pipeline
•The common unit is the 64-byte MAC-packet (MP).
•The MAC breaks each packet into MPs and tags each one as the first, an intermediate, the last, or the only MP of the packet.
•Remaining work: allocating FIFO slots to MACs, draining the input FIFO, and filling the output FIFO.
•Can MEs move an MP from the input FIFO to the output FIFO in a single step?
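The bandwidth figures on the hardware slide follow from bus width times clock rate. The short Python sketch below recomputes them. The helper name `bus_mbps` and the assumption of one transfer per clock cycle are illustrative and are not taken from the paper.

```python
# Recompute the peak bandwidths quoted on the hardware slide.
# Assumption (not from the slides): one full-width transfer per clock cycle.

def bus_mbps(width_bits: int, clock_mhz: int) -> int:
    """Peak rate of a parallel bus in Mbps: bits per transfer x million transfers/s."""
    return width_bits * clock_mhz

dram_mbps = bus_mbps(64, 100)    # 64-bit DRAM interface at 100 MHz
ix_bus_mbps = bus_mbps(64, 66)   # 64-bit IX bus at 66 MHz

# 8 x 100 Mbps ports plus 2 x 1 Gbps ports, counted once for send and once for receive.
ports_mbps = 2 * (8 * 100 + 2 * 1000)

for name, mbps in [("DRAM", dram_mbps), ("IX bus", ix_bus_mbps), ("Ports", ports_mbps)]:
    print(f"{name}: {mbps / 1000:.2f} Gbps")
```

Under these assumptions the IX bus comes out at about 4.2 Gbps, which the slide rounds to 4 Gbps. That is below the 5.6 Gbps of combined send and receive port bandwidth, while the DRAM rate of 6.4 Gbps is above it.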
2-stage pipeline: Input Processing
INPUT_LOOP:
    acquire_input_mutex()
    if (!port_rdy(p)) goto INPUT_LOOP
    load IN_FIFO[c]
    release_input_mutex()
    mp_addr = calculate_mp_addr()
    copy reg_mp_data ← IN_FIFO[c]
    state = protocol_processing(reg_mp_data)
    copy reg_mp_data → DRAM[mp_addr]
    if (at_start_of_packet(state))
        enqueue(state, state.queue)
    goto INPUT_LOOP
•Strict FIFO slots and context binding
For IP, protocol processing covers:
•Validating the header
•Updating the TTL
•Re-computing the checksum
•Setting the source and destination MAC addresses
•Selecting the destination queue
•Minimum forwarder: one-cycle hardware hash

Scheduling & Buffering
•A queue that is serviced by the StrongARM
•Statically allocates a set of contexts to run the input loop: 16 input contexts
•Token passing (a hardware signaling mechanism) serializes DMA access.
•Buffer scheduling: 16 MB of DRAM (8192 buffers of 2 KB), consumed in a circular fashion
•A shared state variable

Output Processing
Scheduling: select a non-empty queue from that port's queues.
OUTPUT_LOOP:
    acquire_output_mutex()
    release_output_mutex()
    if (finished_last_packet)
        qid = select_queue()
        state = dequeue(qid)
        mp_addr = first_mp(state)
    else
        mp_addr = next_mp(state)
    fifo_addr = calculate_fifo_addr()
    copy DRAM[mp_addr] → OUT_FIFO[fifo_addr]
    enable OUT_FIFO[fifo_addr]
    finished_last_packet = at_end_of_packet(state)
    goto OUTPUT_LOOP

Queuing
•Queues: circular arrays of 32-bit entries in SRAM.
•Queues are assigned statically to output contexts; an output context keeps its queues in 16 registers, not in scratch memory.
•Multiple queues: which one is served next? Decide by prioritizing queues.
•Contention: 1. Use mutexes. 2.
Have a queue for each input at each output → single priority level

Queuing [cont]
•I.2 + O.1
•I.2 + O.3: maximum flexibility
•I.1 + O.3: slower rate

Evaluation
For one MP:
•280 cycles for register operations
•180 (DRAM) + 90 (SRAM) + 160 (Scratch) = 430 cycles for memory
•Sum = 710 cycles = 3550 ns (at 200 MHz)
•3.47 Mpps → one packet is processed every 288 ns
•Result: the system can forward 12 packets in parallel (3550 ns / 288 ns ≈ 12)

Switching Paths
•Path C: forwards packets at 534 Kpps (500 cycles per packet). The StrongARM is involved too. No additional tasks for MEs.
•Path B: forwards packets at 526 Kpps.
•Path A: forwards packets at the maximum rate of 3.47 Mpps.

StrongARM
Complicated to decide which forwarders to run on it:
•It supports the Pentium.
•It shares resources with the MEs and can act like them.
OS on the StrongARM:
1. Acts as a bridge that forwards packets to the P3
2. Supports a small collection of local forwarders
Simple priority scheme: gives precedence to packets being passed to the P3 over packets that are to be processed locally.

Virtual Router Processor
MEs statically have 2 tasks:
•A router infrastructure (RI) that is able to forward minimum-sized packets
•A virtual router processor (VRP) that runs additional code on behalf of each packet
protocol_processing runs on an abstract machine.

Interfacing & Implementation
The StrongARM interacts with the MEs through:
    fid = install(key, fwdr, size, where)
    remove(fid)
    data = getdata(fid)
    setdata(fid, data)
•key: (src addr, src port, dst addr, dst port)
•where:
  •ME: loads from the StrongARM into the ME's ISTORE
  •SA: loads into DRAM
  •PE: loads into the Pentium jump table
•install installs the forwarder fwdr for packets that match key, with the specified flow size; where indicates the processor.

Interfacing
[Table: some data forwarders]

Conclusions
•Shows how to program the processor hierarchy with a fixed forwarding infrastructure that fully exploits the parallelism available on the IXP1200 MicroEngines.
•Demonstrates how new functionality can be injected into all three levels of the processor hierarchy.
•Statically partitions the processing capacity of the MicroEngines into a fixed routing infrastructure and a programmable VRP.
•Can be used in many designs.
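The cycle budget from the Evaluation slide can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, and reading "12 packets in parallel" as per-MP latency divided by the inter-packet time (Little's law) is an assumption about how the slide arrived at that figure.

```python
# Recheck the per-MP cycle budget and the parallelism it implies.
CLOCK_MHZ = 200                      # IXP1200 evaluation system clock
REGISTER_CYCLES = 280                # register operations per MP
MEMORY_CYCLES = {"DRAM": 180, "SRAM": 90, "Scratch": 160}
FORWARDING_RATE_MPPS = 3.47          # measured rate for minimum-sized packets

cycle_ns = 1000 / CLOCK_MHZ                       # ns per clock cycle
memory_cycles = sum(MEMORY_CYCLES.values())       # total memory cycles per MP
total_cycles = REGISTER_CYCLES + memory_cycles    # cycles to process one MP
latency_ns = total_cycles * cycle_ns              # time one MP spends in the pipeline

interarrival_ns = 1000 / FORWARDING_RATE_MPPS     # time between completed packets
packets_in_flight = latency_ns / interarrival_ns  # assumed reading: Little's law

print(f"memory cycles: {memory_cycles}")
print(f"total cycles:  {total_cycles}")
print(f"latency:       {latency_ns:.0f} ns")
print(f"inter-packet:  {interarrival_ns:.0f} ns")
print(f"in flight:     {packets_in_flight:.1f}")
```

The ratio comes out slightly above 12, which matches the slide's statement that about 12 packets must be in flight at once to sustain 3.47 Mpps when each MP takes 710 cycles.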
