Commercial Network Processor Architectures
Agere PayloadPlus
Vahid Tabatabaee
Fall 2007
ENTS689L: Packet Processing and Switching

References
• Network Processors: Architectures, Protocols, and Platforms, Panos C. Lekkas, McGraw-Hill.
• Agere PayloadPlus family white papers.
• "Payload+: Fast Pattern Matching & Routing for OC-48", David Kramer, Roger Bailey, David Brown, Sean Mcgee, Jim Greene, Robert Corley, David Sonnier (Agere Systems), Hot Chips: A Symposium on High Performance Chips, Aug. 19-21, 2001.
• Agere product brief documents for FPP, RSP, ASI and FPL.
• Agere white paper: "The Case for a Classification Language", Feb. 2003.

General Information
• Agere PayloadPlus is a comprehensive network processor solution for OC-48.
• It was expanded to support OC-192 through the NP10/TM10 (renamed APP750NP and APP750TM).
• This product has since been discontinued.
• Originally this was a 3-chip solution; it was later integrated into a single chip.
• We review the original solution and the APP550 (single chip), whose documentation is on the Agere website.

The Big Picture
The network processor family has a pipeline architecture and, in the original 3-chip solution, includes:
• Fast Pattern Processor (FPP)
  • Takes data from the PHY chip
  • Protocol recognition
  • Classification based on layers 2 to 7
  • Table lookup with millions of entries and variable lengths
  • Reassembly
• Routing Switch Processor (RSP)
  • Queueing
  • Packet modification
  • Traffic shaping
  • QoS processing
  • Segmentation
• Agere System Interface (ASI)
  • Management
  • Tracks state information
  • Support for RMON (Remote Monitoring)

The 3 Chip Solution
• POS-PHY: Packet Over SONET - PHYsical
• UTOPIA: Universal Test & Operations PHY Interface for ATM
• FBI: Functional Bus Interface
[Figure: block diagram of the 3-chip solution. Labels: Physical Interface, FPP, RSP, Fabric Interface Controller, Switch Fabric, ASI, Configuration Bus, FBI, 8-bit POS-PHY (two links), microP, PCI to Host CPU]
Source: http://nps.agere.com/support/non-nda/docs/FPP_Product_Brief.pdf

Main Responsibilities and Interfaces
• The FPP receives data from the PHY over a standard interface, which can be POS-PHY Level 3 (POS-PL3) or UTOPIA 2 or 3.
• The FPP classifies traffic based on the information contained at layers 2 to 7.
• The FPP sends packets over POS-PL3 to the RSP.
• The RSP is responsible for queueing, packet modification, shaping, QoS tagging, and segmentation.
• The ASI chip handles exceptions, maintains state information, interfaces to the host processor, and configures the FPP and RSP over the CBI interface.
• The Management-Path Interface (MPI) enables the FPP to receive management frames from the local host.
• The Functional Bus Interface (FBI) connects the FPP to the ASI so that function calls can be processed externally.

Memory
• 64-bit standard PC-133 synchronous dynamic random access memory (SDRAM)
• 133 MHz pipelined zero bus turnaround (ZBT) synchronous static random access memory (SSRAM)
• PayloadPlus can use standard off-the-shelf DRAM for table lookups and does not need expensive, power-hungry content-addressable memory (CAM).
• The typical power limit for a line card is 150 W.

FPP Features
• Programmable classification from layer 2 to 7
• Pipelined multi-threaded processing of PDUs
• High-level Functional Programming Language (FPL) that implicitly takes care of multiple threads
• ATM reassembly at OC-48 rates (eliminates an external SAR)
• Table lookup with millions of entries
• Eliminates the need for external CAMs
• Deterministic performance regardless of the table size
• Configurable UTOPIA/POS interfaces

FPP Protocol Data Unit (PDU)
• The FPP is a pipelined multithreaded processor that can simultaneously analyze and classify up to 64 protocol data units (PDUs).
• Each incoming PDU is assigned its own processing thread, which is called a context.
• Each PDU consists of one or more 64-byte blocks.
• The context is a processing path that keeps track of:
  • All blocks of the PDU
  • Input port number of the PDU
  • Data offset for the PDU
  • The last block information
  • Program variables associated with the PDU
  • Classification information of the PDU

FPP Block Diagram
Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf
Source: http://nps.agere.com/support/non-nda/docs/FPP_Product_Brief.pdf

FPP Functional Description
• The input framer frames incoming data into 64-byte blocks.
• It writes the blocks into the data buffer (SDRAM) and into the block buffers and context memory.
• The block buffers store the data being processed, together with the associated context data needed to execute FPP operations on the incoming data.
• The output interface sends the PDUs and their classification information to the RSP.
• The Pattern Process Engine (PPE) performs pattern matching to determine how incoming PDUs are classified.
• The Queue Engine manages FPP replay contexts, provides addresses for the block buffers, and maintains information on blocks, PDUs, and connection queues.

FPP Functional Description (two pass)
• The FPP processes bit streams in two passes.
• In the first pass, the PDU blocks are read into the queue engine memory:
  • The data is stored as separate 64-byte blocks.
  • The data offset of each block is determined.
  • Links between the individual blocks that compose a PDU are established.
  • The PDU type is identified.
• In the second pass (the replay phase), the PDU is replayed from the queue engine:
  • The PDU is processed as a whole entity.
  • Pattern matching is executed.
  • At the same time, the PDU is transmitted toward the output unit.
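
The two passes described above can be illustrated with a minimal Python sketch. It is not Agere's implementation: the Context fields loosely follow the context slide, while the names first_pass and replay, the toy IPv4 check (which assumes the PDU starts directly with an IP header), and the prefix-based pattern table are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64  # block size stated in the slides


@dataclass
class Context:
    """Hypothetical per-PDU context; field names are illustrative only."""
    input_port: int
    blocks: list = field(default_factory=list)   # linked 64-byte blocks
    offsets: list = field(default_factory=list)  # data offset of each block
    pdu_type: str = "unknown"
    classification: str = ""


def first_pass(pdu: bytes, input_port: int) -> Context:
    """First pass: split the PDU into blocks, record offsets, identify type."""
    ctx = Context(input_port=input_port)
    for offset in range(0, len(pdu), BLOCK_SIZE):
        ctx.blocks.append(pdu[offset:offset + BLOCK_SIZE])
        ctx.offsets.append(offset)
    # Toy type identification (assumption): IPv4 headers start with nibble 4.
    if pdu and pdu[0] >> 4 == 4:
        ctx.pdu_type = "ipv4"
    return ctx


def replay(ctx: Context, patterns: dict) -> bytes:
    """Second pass: handle the PDU as a whole and match it against patterns."""
    whole = b"".join(ctx.blocks)
    for name, prefix in patterns.items():
        if whole.startswith(prefix):
            ctx.classification = name
            break
    return whole  # the PDU forwarded toward the output unit
```

A 100-byte PDU, for example, yields two blocks at offsets 0 and 64 in the first pass, and the replay step returns the reassembled bytes while recording which pattern matched.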

FPP Top Level Flow
Source: http://nps.agere.com/support/non-nda/docs/FPL_Product_Brief.pdf

RSP (Traffic Manager) Features
• 64K queues
• Programmable shaping (such as VBR, UBR, CBR)
• Programmable discard policies (RED, WRED, EPD)
• Programmable QoS (CBR, VBR, UBR)
• Programmable CoS (Fixed Priority, Round Robin, WRR, WFQ, GFR)
• Programmable packet modification
• Support for multicast
• Generates required checksums/CRC

RSP Overview
Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf
Source: http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf

RSP Functional Description
• The RSP handles the FPP's classification and analysis results for the incoming PDUs.
• It supports up to 64 logical input ports.
• For each PDU, a command from the FPP instructs the RSP how to handle the PDU.
• The PDU is added to a queue and stored in the PDU SDRAM.
• The RSP supports up to 64K programmable queues.
• Processed data is output on a configurable 32-bit interface.
• There is also an 8-bit POS-PHY Level 3 management interface.
• The RSP uses custom logic and three Very Long Instruction Word (VLIW) compute engines to process PDUs.

VLIW Compute Engines
• The compute engines operate in a pipelined fashion.
• Each compute engine is dedicated to one processing function:
  • The Traffic Management Engine enforces discard policies and keeps queue statistics.
  • The Traffic Shaper Engine ensures QoS and CoS for each queue.
  • The Stream Editor Engine performs the necessary PDU modifications.
• Each queue definition in the RSP includes the destination, scheduling information, and pointers to programs for each of the three VLIW compute engines.
• Therefore, the RSP can run multiple protocols at the same time.
• The external CPU can also add queue definitions, for example to set up ATM virtual circuits.

RSP Data Flow
The RSP has 3 major processing stages:
1. Prepares and queues the PDU for scheduling
   1. Assembles the blocks into a PDU in SDRAM
   2. Determines the destination queue
   3. Determines whether the PDU should be queued. If so, it is added to the appropriate queue for scheduling
2. Selects the next PDU block to be scheduled
   1. Selects the physical port
   2. Selects the logical port
   3. Selects the scheduler
   4. Selects the QoS queue
   5. Selects the CoS queue
3. Modifies and transmits the PDU on the appropriate output ports
   1. Adjusts the QoS transmit intervals and CoS priority
   2. Performs PDU modifications
   3. Performs AAL5 CRC if necessary
Source: http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf

Hierarchical Scheduling (Internal Scheduling Logic)
• Channels: The output interface has a 32-bit data channel that supports 1 to 4 POS-PHY or UTOPIA channels. It also has an 8-bit management output.
• Physical ports: Physical output ports are assigned to channels. There are up to 32 physical ports, since there are 32 back-pressure signals.
• Logical ports: The RSP supports up to 256 logical output ports.
• Schedulers: A set of schedulers is defined for each logical port. The RSP supports CBR, VBR and UBR schedulers.
• QoS queues: Each QoS queue is assigned to a single scheduler.
• CoS queues: Up to 16 CoS queues feed a single QoS queue.
Source: http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
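
The scheduling hierarchy above (physical port, logical port, scheduler, QoS queue, CoS queue) can be pictured as a tree that is walked from the top each time the next PDU block is selected. The Python sketch below is only an illustration: the Node and CosQueue classes are hypothetical, and the strict-priority rule (first child with data wins) is an assumption standing in for the programmable CBR/VBR/UBR and CoS policies of the real device. Only the limits of 32 physical ports and 16 CoS queues per QoS queue come from the slides.

```python
from collections import deque

MAX_PHYSICAL_PORTS = 32  # limits quoted in the slides
MAX_COS_PER_QOS = 16


class CosQueue:
    """Leaf of the hierarchy: a FIFO of PDU blocks."""

    def __init__(self):
        self.blocks = deque()

    def next_block(self):
        return self.blocks.popleft() if self.blocks else None


class Node:
    """Hypothetical inner level (port, scheduler or QoS queue)."""

    def __init__(self, name, max_children=None):
        self.name = name
        self.max_children = max_children
        self.children = []

    def add(self, child):
        if self.max_children is not None and len(self.children) >= self.max_children:
            raise ValueError(f"{self.name}: at most {self.max_children} children")
        self.children.append(child)
        return child

    def next_block(self):
        # Assumed policy: strict priority, the first child with data wins.
        for child in self.children:
            block = child.next_block()
            if block is not None:
                return block
        return None


def build_example():
    """Build one branch: port -> logical port -> scheduler -> QoS -> 2 CoS."""
    rsp = Node("rsp", max_children=MAX_PHYSICAL_PORTS)
    physical = rsp.add(Node("physical-port-0"))
    logical = physical.add(Node("logical-port-0"))
    scheduler = logical.add(Node("ubr-scheduler"))
    qos = scheduler.add(Node("qos-queue-0", max_children=MAX_COS_PER_QOS))
    high = qos.add(CosQueue())
    low = qos.add(CosQueue())
    return rsp, qos, high, low
```

Calling next_block() on the root walks the same five selection steps listed under stage 2 of the RSP data flow, and attaching a 17th CoS queue to one QoS queue raises an error.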

ASI
• The ASI seamlessly integrates the FPP and RSP with the host processor.
• It makes it possible for the designer to:
  • Centrally initialize and configure the NP system and its physical interfaces.
  • Send routing and VPI/VCI updates to the system.
  • Implement various routing and management protocols.
  • Handle any exceptions that occur.
• The ASI enables high-speed, flow-oriented state maintenance:
  • Gathering Remote Network Monitoring (RMON) statistics
  • Time-stamping packets
  • Checking packet sequence
  • Policing ATM and frame relay at up to OC-48 rates
• An 8-bit POS-PHY interface over which the ASI sends packets to the FPP and receives them from the RSP

How Does ASI Work?
• It has a PCI interface for communication with the host processor.
• A 32-bit high-speed interface (FBI) receives function calls from the FPP.
• Two ALUs process FPP external function requests for:
  • Maintaining state and statistics
  • Policing (leaky bucket)
• Two SSRAM interfaces allow memory access for different tasks without contention.
Source: http://nps.agere.com/support/non-nda/docs/ASIProductBrief.pdf

ASI Configuration Capabilities
• The ASI enables the host processor to configure up to 8 devices.
• The configuration bus is compatible with both Intel and Motorola bus formats.
• It is used to:
  • Initialize and configure the FPP and RSP
  • Load the program code for the FPP and RSP
  • Load dynamic updates to the FPP tables and RSP queues
  • Configure third-party external framers and physical interfaces

Policy and Conformance Checking
• The ASI performs conformance checking, or policing, for up to 64K connections at OC-48 rate.
• It only does marking, not scheduling or shaping.
• Several variations of the GCRA (leaky-bucket) algorithm can be used.
• For the dual leaky bucket case, the ASI indicates whether cells or frames are compliant and, if not, which bucket the nonconformance came from.

FPL
• FPL is a functional language for classification.
• In a functional language, the programmer tells the computing resources what to do rather than how to do it.
• In FPL you describe the protocols and the actions to process them.
• In C you have to say how to process the protocols.
• FPL code would be much shorter and easier to debug and modify.

FPL Main Features
• Fast pattern matching and classification of the data stream
• Defining functions for the FPP to execute based on the recognized patterns
• Easy-to-read semantics
• Dynamic updating of the code in the FPP
• Software development tool set

Two Pass Processing
• Recall the two-pass processing in the FPP.
• The first pass does preliminary processing, such as identifying the PDU type.
• In the second pass (replay), it can simply transmit the PDU and the conclusions, or process a higher-level protocol.
• The queue engine allows you to process PDUs embedded in higher-layer protocols in the replay phase.
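
As an illustration of the dual leaky-bucket policing mentioned under Policy and Conformance Checking, the Python sketch below uses the textbook virtual-scheduling form of GCRA with two buckets. It is not the ASI's actual logic: the class names, the bucket labels "peak" and "sustained", and the parameter values in the usage note are assumptions. Like the ASI, it only marks arrivals and does no shaping.

```python
class Gcra:
    """Virtual-scheduling GCRA with increment I and limit L."""

    def __init__(self, increment: float, limit: float):
        self.increment = increment
        self.limit = limit
        self.tat = 0.0  # theoretical arrival time of the next cell

    def conforms(self, t: float) -> bool:
        # A cell is too early if it arrives more than L before the TAT.
        return t >= self.tat - self.limit

    def update(self, t: float) -> None:
        self.tat = max(t, self.tat) + self.increment


class DualLeakyBucket:
    """Two GCRA instances; state is updated only for conforming cells."""

    def __init__(self, peak: Gcra, sustained: Gcra):
        self.peak = peak
        self.sustained = sustained

    def police(self, t: float):
        """Return None if the cell conforms, else the failing bucket's name."""
        if not self.peak.conforms(t):
            return "peak"
        if not self.sustained.conforms(t):
            return "sustained"
        self.peak.update(t)
        self.sustained.update(t)
        return None
```

With a peak bucket of Gcra(1, 0) and a sustained bucket of Gcra(4, 6), arrivals at times 0, 0.5, 1, 2 and 3 are marked conforming, peak violation, conforming, conforming and sustained violation respectively.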

Sample FPL Program Flow
Source: http://nps.agere.com/support/non-nda/docs/FPL_Product_Brief.pdf

FPL Code Example
Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf

Dynamic FPL Program Changes
• You can dynamically add and delete certain types of FPL statements in the FPL code image.
• FPL supports two types of pattern statement structures:
  • Single-rule patterns have a single pattern to match, with one or two functions to perform.
    • These are called flows.
    • These cannot be added or removed dynamically.
  • Multiple-rule pattern statements allow you to define tables.
    • This is used to define IP routing tables.
    • These are called trees.
    • You can add or delete statements from existing trees.
    • You cannot add a tree dynamically.

Performance of the Network Processor
• Drop in performance due to the N+1 problem

Network Processor Performance
• Performance evaluation for a mixture of packet sizes
• Performance drops when the number of computations per packet increases
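
The slides do not spell out the N+1 problem. Assuming it refers to the usual fixed-size-block effect, where a packet one byte longer than a block boundary occupies a whole extra block, the arithmetic can be sketched as follows. The 64-byte block size is taken from the FPP slides; the function names are hypothetical, and whether the plotted performance drop is measured against this exact block size is an assumption.

```python
BLOCK_SIZE = 64  # FPP block size from the slides


def blocks_needed(packet_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks a packet occupies (ceiling division)."""
    return (packet_bytes + block_size - 1) // block_size


def block_utilization(packet_bytes: int, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of the allocated block bytes that carry packet data."""
    return packet_bytes / (blocks_needed(packet_bytes, block_size) * block_size)
```

A 64-byte packet fills one block completely, while a 65-byte packet needs two blocks and uses only 65/128 (about 51%) of them, so the useful throughput for that packet size is roughly halved.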