Download Introduction to Network Processors

CS 260 Lecture 1: Introduction to Network Processors Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan/CS260 1 2003 ©UCR Outline °Introduction to NP Systems °Relevant Applications °Design Issues and Challenges °Relevant Software and Benchmarks °A case study: Intel IXP network processors 2 2003 ©UCR What are Network Processors °Any device that executes programs to handle packets in a data network °Examples • Processors on router line cards • Processors in network access equipment 3 2003 ©UCR Why Network Processors ° Current Situation • Data rates are increasing • Protocols are becoming more dynamic and sophisticated • Protocols are being introduced more rapidly ° Processing Elements • GP(General-purpose Processor) - Programmable, Not optimized for networking applications • ASIC(Application Specific Integrated Circuit) - high processing capacity, long time to develop, Lack the flexibility • NP(Network Processor) 4 - achieve high processing performance - programming flexibility - Cheaper than GP 2003 ©UCR Typical NP Architecture SDRAM Bus (Packet buffer) SRAM (Routing table) Bus Output ports Input ports multi-threaded processing elements Co-processor Network Processor 5 2003 ©UCR TCP/IP Model OSI TCP/IP 7 Application Application 6 Pre. 5 Session 4 Transport TCP 3 Network IP 2 Data Link 1 Physical Host-to-Net ° ISO OSI (Open Systems Interconnection) not fully implemented ° Presentation and Session layers not present in TCP/IP 6 2003 ©UCR Processing Tasks Policy Applications Control Plane Network Management Signaling Topology Management Queuing / Scheduling Data Transformation Data Plane Classification Data Parsing Media Access Control Physical Layer Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik 7 2003 ©UCR Application Categorization ° Control-Plane tasks • Less time-critical • Control and management of device operation - Table maintenance, port states, etc. ° Data-Plane tasks • Operations occurring real-time on “packet path” • Core device operations - Receive, process and transmit packets 8 2003 ©UCR Data Plane Tasks ° Media Access Control • Low-level protocol implementation - Ethernet, SONET framing, ATM cell processing, etc. ° Data Parsing • Parsing cell or packet headers for address or protocol information ° Classification • Identify packet against a criteria (filtering / forwarding decision, QoS, accounting, etc.) ° Data Transformation • Transformation of packet data between protocols ° Traffic Management • Queuing, scheduling and policing packet data 9 2003 ©UCR Applications: IPv4 Routing P A P P B Router C ° Routers determine next hop and forward packets 10 2003 ©UCR URL-based switching – My NSF Project www.yahoo.com Internet Image Server IP TCP APP. DATA Application Server GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com… Switch HTML Server ° Increase efficiency ° Tasks • Traverse the packet data (request) for each arriving packet and classify it: - Contains ‘.jpg’ -> to image server - Contains ‘cgi-bin/’ -> to application server 11 2003 ©UCR Organizing Processor Resources °Design decisions: • High-level organization • ISA and micro architecture • Memory and I/O integration °Today’s commercial NPs: • Chip multiprocessors • Most are multithreaded • Exploit little ILP (Cisco does) • No cache • Micro-programmed 12 2003 ©UCR Architectural Comparisons °High-level organizations • Aggressive superscalar (SS) • Fine-grained multithreaded (FGMT) • Chip multiprocessor (CMP) • Simultaneous multithreaded (SMT) 13 2003 ©UCR Multithreading °Basic idea • multiple register sets in the processor • fast context switch • switch thread on a cache access (How is this different than non-blocking cache?) • tolerating local latency vs remote in CCNUMA multiprocessors • hybrids - switch on notice - simultaneous multithreading 14 2003 ©UCR Architectural Comparisons (cont.) Fine-Grained Coarse-Grained Multiprocessing Time (processor cycle) Superscalar Simultaneous Multithreading 15 Thread 1 Thread 3 Thread 5 Thread 2 Thread 4 Idle slot 2003 ©UCR Tasks and Services Three Benchmarks used in the experiment 16 2003 ©UCR Some Challenges ° Intelligent Design • Given a selection of programs, a target network link speed, the ‘best’ design for the processor - Least area - Least power - Most performance ° Write efficient multithreaded programs • NPs have - Heterogeneous computer resources Non-uniform memory Multiple interacting threads of execution Real-time constraints • Make use of resources - How to use special instructions and hardware assists – – Compilers Hand-coded • Multithreaded programs 17 Manage access to shared state Synchronization between threads 2003 ©UCR Benchmarks for Network Processors • NetBench - 10 applications - http://cares.icsl.ucla.edu/NetBench • CommBench - 8 networking and communications applications - http://ccrc.wustl.edu/~wolf/cb/ • EEMBC - http://www.eembc.org/benchmark • MediaBench - Transcoders - Some communications applications 18 2003 ©UCR IXP1200 Block Diagram ° StrongARM processing core ° Microengines introduce new ISA ° I/O • PCI • SDRAM • SRAM • IX : PCI-like packet bus ° On chip FIFOs • 16 entry 64B each 19 2003 ©UCR IXP1200 Microengine ° 4 hardware contexts • Single issue processor • Explicit optional context switch on SRAM access ° Registers • All are single ported • Separate GPR • 256*6 = 1536 registers total ° 32-bit ALU • Can access GPR or XFER registers ° Shared hash unit • 1/2/3 values – 48b/64b • For IP routing hashing ° Standard 5 stage pipeline ° 4KB SRAM instruction store – not a cache! ° Barrel shifter 20 2003 ©UCR IXP 2400 Block Diagram ° XScale core replaces StrongARM DDR DRAM controller ° Microengines ME0 ME1 ME3 ME2 Scratch /Hash /CSR XScale Core PCI QDR SRAM controller • Faster • More: 2 clusters of 4 microengines each ° Local memory ME4 ME7 ME5 ME6 MSF Unit ° Next neighbor routes added between microengines ° Hardware to accelerate CRC operations and Random number generation ° 16 entry CAM 21 2003 ©UCR Different Types of Memory Type Width Size Approx Notes unloaded (byte) (bytes) latency (cycles) Local 4 2560 1 Indexed addressing post incr/decr On-chip 4 Scratch 16K 60 Atomic ops SRAM 4 256M 150 Atomic ops DRAM 8 2G 300 Direct path to/fro MSF 22 2003 ©UCR IXA Software Framework External Processors Control Plane Protocol Stack Control Plane PDK XScale Core C/C++ Language Core Components Core Component Library Resource Manager Library Microengine Pipeline Microblock Library Micro block Micro block Protocol Library Micro block Microengine C Language Utility Library Hardware Abstraction Library 23 2003 ©UCR Example Toaster System: Cisco 10000 ° Almost all data plane operations execute on the programmable XMC ° Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS • Classic SW load balancing problem ° External SDRAM shared by common pipe stages 24 2003 ©UCR IBM PowerNP ° 16 pico-procesors and 1 powerPC ° Each pico-processor • Support 2 hardware threads • 3 stage pipeline : fetch/decode/execute ° Dyadic Processing Unit • Two pico-processors • 2KB Shared memory • Tree search engine ° Focus is layers 2-4 ° PowerPC 405 for control plane operations • 16K I and D caches ° Target is OC-48 25 2003 ©UCR SRAM Fabric Processor Table Lookup Unit Executive Processor text Buffer Mngt Unit Queue Mngt Unit 60Gbps Busses Cluster 26 SRAM PROM Switch Fabric PCI SRAM CONTROL Motorola C-Port C-5 Chip Architecture Cluster CP-0 CP-1textCP-2 CP-3 CP12 CP- CP13 text 14 CP15 PHY PHY PHY PHY PHY PHY PHY PHY 2003 ©UCR References ° [NPT] W. H. Mangione-Smith, G. Memik Network Processor Technologies ° [NPRD] Patrick Crowley, Raj Yavatkar An Introduction to Network Processor Research & Design, HPCA-9 Tutorial 27 2003 ©UCR

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Introduction to Network Processors