An Architecture and Prototype Implementation for TCP/IP Hardware Support
Mirko Benz
Dresden University of Technology, Germany
TERENA 2001

Motivation – Demanding Services
[Figure: Required Processing versus Application Complexity. Labels: Switching, Routing, Quality of Service Support, Internet Security Provision; 1000+ Instructions per Packet]

Motivation – MIPS versus Bandwidth
• Trend: MIPS Performance / Bandwidth
• Hardware Support for Protocol Processing Acceleration
• Case Study: TCP
[Figure: Processor Performance Evolution (~100% / 18 months) and Available Bandwidth, plotted over Technological Progress / Time]

Project Overview
• Assumptions & Preconditions
  - Restriction to Local Area Networks (e.g. Gigabit Ethernet)
  - High Bandwidth and Low Error Probability
  - Concentration on Host Implementations
[Figure labels: Protocol Analysis, TCP/IP Partitioning, System Simulation, Prototype Variants, Evaluation, Optimisation, Efficient OS Integration, Flexible Protocol Engine, Domain Specific Methodology]

Talk Outline
• TCP Protocol Performance Evaluation
• TCP Acceleration Approach
• System Simulation Environment
• Operating System Integration
• Hardware Implementation Directions
• Myrinet Implementation and Results
• Conclusions and Outlook

TCP Protocol Performance Evaluation
• TCP Software Implementation Structure
• Sources of Protocol Processing Overhead
  - Communication, Synchronisation
  - Operating System Call Overhead
  - Copy Operation
  - Classification: Per-Byte / Per-Packet
• Optimisation Opportunities
  - Interrupt Suppression
  - Zero Copy Mechanisms
  - User Level Networking
  - Checksum Offloading (e.g. Task Offload)
  - Extending frame sizes (e.g. Jumbo Frames)
[Figure: software stack layers: Application, Socket, TCP, IP, Driver, Network]

TCP Protocol Performance Evaluation
• Performance TCP versus Myrinet GM:
  - Throughput 335/967 Mbit/s (TCP/Myrinet)
  - Latency 81/29 µs (TCP/Myrinet)
  - 100% CPU Utilisation
  - (RedHat Linux 6.2 / PIII 500 MHz)
[Figure: throughput [Mbit/s], 0 to 1000, over [byte], 0 to 30000; curves: GM 1.2, TCP/Myrinet]

Goals?
• Software Implementation as a Foundation
• Achieve On Wire Compatibility
• Consider Different Target Architectures
• Develop Reusable Hardware Components
• Integration of High Level Tools
• System Wide Optimisation
• Efficient, Transparent Operating System Integration
[Figure labels: Flexible Protocol Engine, Domain Specific Methodology]

TCP Acceleration Approach
• TCP SW Stack Complexity
  - General Purpose Protocol
  - Not Designed for High End Networking
  - Many Interdependent Algorithms
  - Often Modified, Adapted, Optimised
  - ~15,000 Lines of C
• Approach
  - TCP Partitioning -> Fast Path Extraction
  - Hardware Support -> Acceleration
  - Operating System Bypass
• HW/SW Synchronisation
  - Initialisation, Termination/Error
• Transparent Integration
  - Socket Level Switch
[Figure: stack layers: Application, Socket, TCP, IP, Driver, Network, PE]

Fast Path Protocol Processing
• Only for User Data Exchange
• No Connection Management
• No Error Recovery – Only Detection
• Complexity ~10% of SW Stack
[Figure: Sender and Receiver, each with a Connection Context, TCP Send, TCP Recv and Send Ack; Data and Ack exchanged over the Network]

System Simulation Environment
• Complex Communication System
• Real Applications, Operating System (User Mode Linux)
• Network Simulation – Error Injection
• Fast Path Implementation: Hardware/Software
• System Evaluation: Functionality & Performance
[Figure: Netperf and Netserver on Sockets in User Mode Linux instances, connected via CORBA to a Network Simulator; Evaluation; TCP Fast Path SW (ISA Simulator); TCP Fast Path HW (VHDL Simulator)]

Fast Path Hardware Implementation Directions
• Embedded RISC Processor
  - LEON Sparc 33 MHz, INTEL StrongARM 200 MHz
  - OS: ucLinux, GNU C Environment
• Intelligent Network Adapter (Myrinet)
  - RISC Core with User/Network Interface, DMA Engines
  - Control Program Modification, no Operating System
• Network Processor (INTEL IXP1200)
  - 6 multithreaded microengines
  - Development: IXP Assembler, Simulator
• Specific Hardware
  - High Level FPGA Design Flow, XILINX Virtex
  - SYNOPSYS Protocol Compiler
[Figure: directions ordered from Software to Hardware]

Myrinet Implementation Platform
• Technology
  - Packet-Communication and Switching Technology
  - High-Performance, Highly Reliable
  - System-Area Network, Cluster Interconnect
• Intelligent Network Adapter
[Figure: LANai 7 adapter block diagram: Local SRAM (64 bit), RISC, Host Interface, DMA Controller, PCI Bridge (64 bit, 33 MHz PCI), Packet Interface, 1280 Mbit/s Myrinet Link]

TCP Fast Path / Myrinet
• Development Environment
  - Host SW GM (message passing), Firmware MCP – open source
  - GNU C Suite, no OS, one context only, no Interrupts
• Implementation
  - MCP: 4 Event Driven State Machines
  - Fast Path Integration within Network Send & Recv Code
  - Exploitation of Hardware Support for Checksum Computation
  - No specific Optimisations, Some Limitations

TCP Fast Path / Myrinet Performance Results
• Performance
  - Test Setup: INTEL PIII/500 MHz, Myrinet LAN Adapter, Linux OS
  - Netperf Benchmark Throughput/Delay
  - Throughput Peak: 967, 816, 333 Mbit/s (GM, Fast Path, TCP)
  - Delay Minimum: 16.5, 49, 81 µs (GM, Fast Path, TCP)
[Figure: throughput [Mbit/s], 0 to 1000, over [byte], 0 to 30000; curves: GM 1.2, Fast Path, TCP/Myrinet]

Summary & Outlook
• Integrated Architecture and Design Flow for Protocol Processing Acceleration
  - TCP Partitioning
  - System Simulation Environment
  - Integration with existing SW TCP Stack & OS
• Prototype with Promising Performance
• Present Work:
  - Fast Path HW Implementation and SoC Integration
[Figure labels: Protocol Analysis, TCP/IP Partitioning, System Simulation, Prototype Variants, Evaluation, Optimisation, Efficient OS Integration, Flexible Configurable Protocol Engine]
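
Illustration: Fast Path Partitioning
The partitioning described under "TCP Acceleration Approach" and "Fast Path Protocol Processing" (user data exchange only, no connection management, error detection without recovery) can be sketched as a per-segment classifier. The slides do not give the prototype's actual conditions, so this is only a minimal sketch under assumptions: the names ConnectionContext, Segment, classify and receive are hypothetical, and the specific checks (established connection, no SYN/FIN/RST/URG flags, valid checksum, in-order sequence number) are one plausible reading of the slides, not the author's implementation.

```python
from dataclasses import dataclass

# TCP header flag bits as defined by the TCP specification.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


@dataclass
class ConnectionContext:
    """Hypothetical per-connection state shared by fast path and software stack."""
    established: bool  # set by the software stack after connection setup
    rcv_nxt: int       # next sequence number expected from the peer


@dataclass
class Segment:
    """Hypothetical, already-parsed view of a received TCP segment."""
    seq: int
    flags: int
    payload_len: int
    checksum_ok: bool  # assumed to be verified by checksum hardware


def classify(ctx: ConnectionContext, seg: Segment) -> str:
    """Return "fast" if the segment is plain in-order data, else "slow"."""
    if not ctx.established:
        return "slow"  # connection management stays in the software stack
    if seg.flags & (SYN | FIN | RST | URG):
        return "slow"  # setup, teardown and exceptional cases
    if not seg.checksum_ok:
        return "slow"  # error is detected here, recovery is left to software
    if seg.seq != ctx.rcv_nxt:
        return "slow"  # loss, reordering or retransmission
    return "fast"


def receive(ctx: ConnectionContext, seg: Segment) -> str:
    """Classify a segment and advance the context on the fast path."""
    path = classify(ctx, seg)
    if path == "fast":
        # TCP sequence numbers wrap around at 2**32.
        ctx.rcv_nxt = (ctx.rcv_nxt + seg.payload_len) % 2**32
    return path
```

In this sketch every segment that fails a check is handed to the unmodified software stack, which mirrors the slide's point that the fast path covers only a small part (~10%) of the stack's complexity while the full stack remains responsible for everything else.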