UNIC, a Linux Framework to Reach Wire Speed Performances on Ethernet Networks
Alain NINANE for the CMS DAQ Group
University of Louvain
DESY, 20/09/04
[email protected]

Outline
• Introduction to CMS DAQ
  – Trigger and DAQ architecture
  – Network requirements
• Review of the internals of modern OS architecture
  – Memory management & network protocols
• The UNIC architecture
  – User level access to Network Interface Card
• Measurements
• Conclusions & Prospects

DESY 20 Sept 2004 UNIC - Wire Speed Performances on Ethernet 2 of 45

Experiments at LHC
• CMS and ATLAS
  – pp collisions
• LHCb
  – CP violation in B-meson decay
• ALICE
  – Heavy-ion collisions

Compact Muon Solenoid
[Figure: CMS detector. Labels: Tracker (Inner, Outer; Pixel, Silicon), Calorimeter (Electromagnetic, Hadron), Muon Detector]
• Diameter 15 m
• Length 21 m
• Weight 12500 t

CMS Physics Rates
• 40 MHz bunch crossing frequency
• 10³⁴ cm⁻²s⁻¹ luminosity
• 20 pp interactions every 25 ns
• 10⁹ Hz pp collision rate
• Powerful event selection of 1 over 10¹³
• “Interesting” physics ... new particles: 10⁻⁴ Hz

CMS Event Data
Sub-detector                   KB
Tracker pixel                  72
Tracker silicon               300
Preshower                     110
Electromagnetic calorimeter   100
Hadronic calorimeter           64
Muon system                    22
Trigger                        10

• ~ 1 MB of data every 25 ns
• ~ 40,000,000 MB/s
• Powerful data rate reduction of 1 over 400 × 10³
• Disk/tape storage capacity 100 MB/s
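The rate figures quoted on the CMS Physics Rates and CMS Event Data slides can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and not part of the original talk: the function and variable names are assumptions, the event size is taken as the rounded 1 MB quoted on the slide, and the quoted 10⁹ Hz collision rate is treated as a rounding of 40 MHz × 20 interactions.

```python
# Values quoted on the slides (names are illustrative, not from the talk).
BUNCH_CROSSING_HZ = 40e6          # 40 MHz bunch crossing frequency
INTERACTIONS_PER_CROSSING = 20    # ~20 pp interactions every 25 ns
QUOTED_COLLISION_RATE_HZ = 1e9    # rounded collision rate from the slide
EVENT_SELECTION = 1e-13           # "1 over 10^13"
EVENT_SIZE_MB = 1.0               # "~1 MB of data every 25 ns" (rounded)
STORAGE_MB_PER_S = 100.0          # disk/tape storage capacity

# Per-event data volume by sub-detector, in KB, as listed on the slide.
SUBDETECTOR_KB = {
    "Tracker pixel": 72,
    "Tracker silicon": 300,
    "Preshower": 110,
    "Electromagnetic calorimeter": 100,
    "Hadronic calorimeter": 64,
    "Muon system": 22,
    "Trigger": 10,
}


def collision_rate_hz():
    """pp collisions per second before rounding."""
    return BUNCH_CROSSING_HZ * INTERACTIONS_PER_CROSSING


def interesting_rate_hz():
    """Rate of 'interesting' events after the quoted selection."""
    return QUOTED_COLLISION_RATE_HZ * EVENT_SELECTION


def raw_data_rate_mb_per_s():
    """Data volume if every bunch crossing were read out in full."""
    return BUNCH_CROSSING_HZ * EVENT_SIZE_MB


def required_reduction():
    """Factor by which the raw rate must shrink to fit the storage rate."""
    return raw_data_rate_mb_per_s() / STORAGE_MB_PER_S


if __name__ == "__main__":
    print("collision rate [Hz]:", collision_rate_hz())
    print("interesting events [Hz]:", interesting_rate_hz())
    print("raw data rate [MB/s]:", raw_data_rate_mb_per_s())
    print("required reduction factor:", required_reduction())
    print("sum of sub-detector sizes [KB]:", sum(SUBDETECTOR_KB.values()))
```

The sub-detector sizes add up to 678 KB, which the slide rounds to about 1 MB, and the reduction factor comes out at 400 × 10³, matching the quoted figure.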
[Figure label from the CMS Event Data slide: Offline computing power]

CMS Trigger and DAQ
• Level 1 Trigger
  – Filtering by custom hardware
  – 3.2 µs processing time
  – Data stored in pipeline memories
  – Maximum output rate 100 kHz
• Event Builder (EVB)
• High Level Trigger
  – Filtering by “COTS” computers
      Near offline software algorithms
      Full event data
  – ~1 s processing time
  – Data stored in RAM
  – Maximum output rate 100 Hz
[Figure: event rate along the chain: 40 MHz, 100 kHz, 100 Hz]

CMS Event Builder Architecture
• Event data fragments from subdetectors are read and stored in ~650 FED memory systems
• Switching network connecting data sources to data destinations
• Full data set of one event stored in the memory system of a single unit for HLT processing

CMS Event Builder Throughput
• 100 kHz x 1 MByte
  – From ~500 readout links
  – Links at 200 MByte/s
• 100 GByte/s (1 Tbit/s)
  – ~500 links at 200 MByte/s
  – To HLT filtering system
• 100 Hz x 1 MByte to storage by computing services

CMS Event Builder Baseline
• Host capabilities
  – Receive, send, process data at 200 MByte/s
• Network switch capabilities
  – Handle 500 data sources and 500 data sinks
  – Aggregate throughput of 100 GByte/s

Technological Choice (I)
• Commercial solution
  – Use commercially available hardware & software
  – Use widely accepted standards
• Private solution
  – Custom-made hardware/software
  – Application-dedicated protocol

Technological Choice (II)
[Table with headings Topic, Public, Commercial/Standard, Private/Custom; the cell contents are not preserved in the transcript]
• Topics compared
  – Reliability
  – Flexibility
  – Performance
  – Ability to evolve
  – Vendor independence

Intermediate solution
• Use widely available hardware like Ethernet and publicly available Open Source software like Linux
  – Open Source … source code available
  – Widely documented
      Can be modified/adapted to fit particular needs
      Avoids reinventing the whole wheel

Review of Internals of Modern Operating Systems
• Memory Management
• Network Protocols

Modern OS Architecture
[Figure: layered block diagram of a modern OS.
 User level: user programs, libraries.
 Kernel level: socket, network protocols, network interface drivers; plain file, filesystem; cooked and raw disk interfaces, block buffer cache, block device drivers; cooked tty and raw tty interfaces, line discipline, character device drivers; process control subsystem (scheduler, memory management, inter-process communication); hardware control.
 Hardware level: hardware]

Kernel / User Mode
• CPU in user mode
  – Unprivileged CPU instruction set
  – Code written by users and software programmers (libraries)
  – User process owned memory space
• CPU in kernel mode
  – Unrestricted (privileged) CPU instruction set
  – Code written by kernel developers and privileged users
  – Kernel and user memory space

Roles of a Device Driver
• In the bottom half of the kernel
  – Control and command the hardware
• In the top half of the kernel
  – Manage data transfer between applications (user space) and devices (kernel space)

Memory Management in Linux
[Figure: address spaces in Linux. User virtual addresses (Process 123, Process 345), kernel virtual addresses and kernel logical addresses are shown mapped onto physical addresses covering physical memory and device memory, with example hexadecimal addresses]

Data Transfer Overhead (I)
• Problem 1
  – Copy of the data between user and kernel space
  – Can’t be avoided easily
      Synchronous for the user application
      Asynchronous in the kernel
• Solution
  – Use the capability of device drivers to remap memory spaces (ioremap)
  – Requires careful programming

Memory Mapping in Linux
[Figure: the same address-space diagram for Process 345, showing user virtual addresses, kernel virtual addresses and kernel logical addresses mapped onto physical memory and device memory]

Network Protocols
• Role of network protocols
  – Provide communication and interoperability between different applications running on different computers and operating systems
  – Provide communication reliability, even for applications running on top of unreliable network layers
  – Isolate network details from the application

A Real Life Example
[Figure: Mail Text File (a) on a Linux DEC Alpha workstation and Mail Text File (b) on an MS Hotmail web server, connected by numbered steps 1a to 4c]

Transmission Control Protocol
• TCP - a reliable stream transport service
  – Stream oriented
  – Virtual circuit connection
  – Buffered transfer
  – Unstructured stream
  – Full duplex connection
• Reliability
  – Provided by a positive acknowledgement with retransmission method
• TCP itself is built on top of another protocol
  – Best known as IP, the Internet Protocol

Protocols Layering
[Figure: protocol stack, top to bottom: Application; Reliable Stream (TCP) and User Datagram (UDP); Internet Protocol (IP); Network; Physical Medium]
• SMTP, HTTP, NFS, …
• TCP - connected stream
• UDP - connectionless
• IP datagram
• Ethernet, ATM, VMEbus, …
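The layering described above can be pictured as nested encapsulation: each layer prepends its own header to whatever it receives from the layer above. The following sketch is only an illustration and not code from the talk. It assumes the minimum header sizes without options (Ethernet 14 bytes, IPv4 20 bytes, TCP 20 bytes); the addresses, port numbers and function names are made up for the example, and the checksums are left at zero, so the resulting frame is not meant to be sent on a real network.

```python
import struct

# Fixed-size header layouts in network byte order (no options).
ETH_HEADER = struct.Struct("!6s6sH")          # dst MAC, src MAC, EtherType
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes
TCP_HEADER = struct.Struct("!HHIIBBHHH")      # 20 bytes

ETHERTYPE_IPV4 = 0x0800  # "next protocol" field of the Ethernet header
IP_PROTO_TCP = 6         # "next protocol" field of the IP header


def tcp_segment(payload, src_port=40000, dst_port=25, seq=0, ack=0):
    """Prepend a TCP header (checksum left at 0 in this sketch)."""
    data_offset = (TCP_HEADER.size // 4) << 4  # header length in 32-bit words
    flags = 0x18                               # PSH + ACK
    header = TCP_HEADER.pack(src_port, dst_port, seq, ack,
                             data_offset, flags, 65535, 0, 0)
    return header + payload


def ip_datagram(segment, src=bytes([10, 0, 0, 1]), dst=bytes([10, 0, 0, 2])):
    """Prepend an IPv4 header (checksum left at 0 in this sketch)."""
    version_ihl = (4 << 4) | (IPV4_HEADER.size // 4)
    total_length = IPV4_HEADER.size + len(segment)
    header = IPV4_HEADER.pack(version_ihl, 0, total_length, 0, 0,
                              64, IP_PROTO_TCP, 0, src, dst)
    return header + segment


def ethernet_frame(datagram,
                   src=bytes.fromhex("020000000001"),
                   dst=bytes.fromhex("020000000002")):
    """Prepend an Ethernet header (the trailing CRC is not modelled)."""
    return ETH_HEADER.pack(dst, src, ETHERTYPE_IPV4) + datagram


def encapsulate(payload):
    """Application data -> TCP segment -> IP datagram -> Ethernet frame."""
    return ethernet_frame(ip_datagram(tcp_segment(payload)))


def header_overhead():
    """Header bytes added in front of the application data."""
    return ETH_HEADER.size + IPV4_HEADER.size + TCP_HEADER.size


if __name__ == "__main__":
    frame = encapsulate(b"hello")
    print(len(frame), "bytes on the wire for 5 bytes of application data")
```

In this sketch every payload carries 54 bytes of headers, which is small next to a full-size frame but significant for short messages.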
Protocols Headers
• TCP header
  – Data fragment number
  – Acknowledgement number
  – Source/destination port numbers
  – Checksum
• IP header
  – ‘Next’ protocol number
  – Size and checksum
  – Time to live
  – Source/destination IP addresses
• Ethernet header
  – ‘Next’ protocol number
  – Size and CRC
  – Source/destination Ethernet addresses
• Encapsulation, layer by layer
  – Application data
  – TCP header + application data
  – IP header + TCP header + application data
  – Ethernet header + IP header + TCP header + application data

Data Transfer Overhead (II)
• Problem 2
  – Protocols have been designed to be general
      Few buttons to tune to get higher performance
  – Overhead of network protocols
      Headers
      Checksumming
      Relies on the quality of the software implementation of the protocol
      Copy of the data between different layers
• Solution
  – Be less general
      No need for a flexible addressing system if the domain of application is local
      Benefit from the homogeneity of your hardware
  – Implement an application-specific protocol
      Avoid copying the data between many layers
      Be fault tolerant

The UNIC Framework

UNIC - User Level Access to NIC
• Avoid useless overhead in data copy
  – Between user and kernel spaces
  – Inside protocols
• Avoid overhead by protocols
  – Allows the event builder task to access the Ethernet frames directly
• The UNIC solution
  – Use memory mapping between the event builder task and the Ethernet frames in the kernel
  – Patch the Ethernet device driver to use the memory mapped frames

Network Subsystem in Linux
[Figure: STANDARD Arch. versus UNIC Arch.
 STANDARD: Application; system calls across the kernel boundary; TCP, UDP and IP protocol layers (kernel top half); device drivers e1000, acenic, syskonnect, eepro100 (kernel bottom half); NICs (hardware).
 UNIC: Application + Protocol; system calls across the kernel boundary; UNIC device driver, glue code to the patched Ethernet device driver; patched drivers e1000, acenic, syskonnect, eepro100; NICs (hardware)]

Zero-Copy Layer 2 Device Driver
[Figure: STANDARD Arch. versus UNIC Arch.]

Patched Device Drivers
• Problem!
  – Patch a device driver
• However
  – The network subsystem is standardized
  – Task is nearly repetitive on existing drivers
• “Augment” the standard control structure
  – Socket buffer (skbuff) -----> unic slot
• Work done for:
  – Becker’s driver for Intel 100 Mbit (eepro100)
  – Syskonnect Gigabit (sk98lin)
  – Sorensen’s driver for Alteon Gigabit (acenic)
  – Intel 1000 Mbit (e1000)

Standard Ethernet Device Driver
• Frames allocated “on the fly” by the device driver
• Control structures are called socket buffers (skbuff)
• Tx: sendto()
• Rx: recvfrom()

Patched Ethernet Device Driver
• Frames allocated statically and mapped by the application
• Control structures are called unicslots
• Tx: ioctl()
• Rx: polling thread

Measurements & Performances

Event Builder Demonstrator
• 64 PCs - Supermicro 370 DLE - Serverworks LE chipset
• Pentium III 750 MHz, 1000 MHz
• PCI 64 bit/66 MHz
• Linux kernel 2.4
• Gigabit Ethernet
  – NIC: Alteon AceNIC (Copper UTP)
  – Switch: 64 ports, FastIron-8000 from Foundry Networks

Streaming Tests
• 1 way point-to-point streaming
  – 1 host sender to 1 host receiver
      1 rail: 1 NIC / host
      2 rails: 2 NICs / host
  – Varying packet size up to MTU
• Drivers and protocols
  – Standard: TCP/IP, Layer 2 sockets
  – Patched: Layer 2 zero-copy
• Measurements
  – Total saturation throughput measured at the receiver side
  – Bottleneck is the receiver
  – Packet losses: ~10 % with Layer 2 protocols (standard and patched drivers)

Streaming - 1 rail
[Figure: streaming throughput vs packet size. Curves: TCP/IP, Layer 2 sockets, Layer 2 zero-copy. Axes: packet size [bytes], 0 to 1600; throughput [MB/s], 0 to 140]

Streaming - UNIC - 1 & 2 rails
[Figure: time/packet for the Layer 2 zero-copy driver. Curves: 1 rail (annotated 116 MB/s) and 2 rails (annotated 230 MB/s). Axes: packet size [bytes], 0 to 1536; time/packet [µs], 0 to 14]

EVB Protocol
• Protocol with destination-based traffic shaping
• Builder units request events from the event manager
• Builder units read fragments from readout units sequentially
• Builder units process several events simultaneously
• Application-level reliability accounts for packet losses, ...
• Acronyms
  – RU: readout units
  – BU: builder units
  – EVM: event manager

EVB - TCP/IP Performances
• Event building performance measurements
  – 31 x 31 (N x N setup)
  – Fragment sizes generated according to a log-normal distribution
      average 16 kB
      rms 8 kB
• Performance results
  – 75 MB/s for 16 kB
  – Scalable with N

EVB - UNIC Performances
• Event building performance measurements
  – 31 x 31 (N x N setup)
  – Fragment sizes generated according to a log-normal distribution
      average 16 kB
      rms 8 kB
• Performance results
  – Maximum between 8 and 20 kB
  – 1 rail: 115 MB/s
  – 2 rails: 220 MB/s
  – Scalable with N

Conclusion & (Current & Future) Developments

Conclusion
• Goal of 200 MB/s is reached!
  – However, maintenance has to be carried out over years and years
• TCP/IP is still being investigated together with the UNIC driver

Current Developments
• TCP/IP implementations are improving
  – Kernel 2.4 -> 2.6: zero copy inside the protocol
  – More “buttons”: Linux Advanced Routing & Traffic Control
  – Jumbo frames (MTU now up to 9 kB)
      Support of jumbo frames in switches ???
• UNIC has been ported to Intel e1000
  – Tests on a NIC with 4 rails
      327 MByte/s with standard frames
      364 MByte/s with jumbo frames

The Complete Story
• Ethernet is not the only networking technology planned for use in the Event Builder
  – Myrinet
      Higher performance (native 250 MByte/s)
      Cheaper switches but expensive NICs
      Depends on a single manufacturer/vendor
  – This Ethernet and TCP/IP study shows that they are still valuable candidates … maybe just as a backup solution!!
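To put the measured throughputs in perspective, the theoretical ceiling of Gigabit Ethernet can be estimated. This is a sketch under stated assumptions and not a calculation from the talk: 1 MB is taken as 10⁶ bytes, the packet sizes in the measurements are assumed to count payload only, and every frame is charged the standard on-wire overhead of 38 bytes (8 for preamble and start delimiter, 14 for the Ethernet header, 4 for the frame check sequence, 12 for the inter-frame gap). The function names are made up for the example.

```python
LINE_RATE_BYTES_PER_S = 1e9 / 8        # Gigabit Ethernet: 125e6 bytes/s
FRAME_OVERHEAD_BYTES = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, gap
MIN_PAYLOAD_BYTES = 46                  # shorter payloads are padded


def max_payload_rate_mb_s(payload_bytes, rails=1):
    """Upper bound on payload throughput in MB/s (1 MB = 1e6 bytes)."""
    padded = max(payload_bytes, MIN_PAYLOAD_BYTES)
    wire_bytes = padded + FRAME_OVERHEAD_BYTES
    frames_per_s = LINE_RATE_BYTES_PER_S / wire_bytes
    return rails * frames_per_s * payload_bytes / 1e6


def time_per_packet_us(payload_bytes, throughput_mb_s):
    """Microseconds per packet: bytes / (MB/s) is µs when 1 MB = 1e6 bytes."""
    return payload_bytes / throughput_mb_s


def fraction_of_wire_speed(measured_mb_s, payload_bytes, rails=1):
    """Measured throughput relative to the theoretical ceiling."""
    return measured_mb_s / max_payload_rate_mb_s(payload_bytes, rails)


if __name__ == "__main__":
    print("ceiling, 1 rail, 1500 B [MB/s]:", max_payload_rate_mb_s(1500))
    print("time/packet at 116 MB/s [us]:", time_per_packet_us(1500, 116))
    print("1 rail, 116 MB/s:", fraction_of_wire_speed(116, 1500))
    print("2 rails, 230 MB/s:", fraction_of_wire_speed(230, 1500, rails=2))
```

Under these assumptions the ceiling for 1500-byte payloads is about 122 MB/s per rail, so the 116 MB/s measured on one rail corresponds to roughly 95 % of wire speed, and the matching time per packet is close to 13 µs.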