Slide 1: Broadband Protocols, WP 1.2.1 - IP Protocols, Lambda Switching, Multicasting
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then "Talks"
FABRIC Meeting, Poznan, Poland, 25 Sep 2006

Slide 2: Protocols Document

Slide 3: Protocols Document 1
- "Protocol Investigation for eVLBI Data Transfer", document JRA-WP1.2.1.001
- Written by the Jodrell Bank and Manchester people, with hard work from Matt
- Completed and on the EXPReS WIKI
- Introduces e-VLBI and its networking requirements:
  - Continuously streamed data; individual packets are not particularly valuable, but maintenance of the data rate is important
  - Quite different from applications where bit-wise correct transmission is required, e.g. file transfer
  - Forms a valuable use case for the GGF GHPN-RG
- Presents the actions required to make an informed decision and to implement suitable protocols in the European VLBI Network. A strategy document.

Slide 4: Protocols Document 2
- Protocols considered for investigation include: TCP/IP, UDP/IP, DCCP/IP, VSI-E (RTP/UDP/IP), Remote Direct Memory Access, TCP Offload Engines
- Very useful discussions at the Haystack VLBI meeting:
  - Agreement to make joint Haystack-Jodrell tests
  - Use of the ESLEA 1 Gbit transatlantic link
- Work in progress, linked to ESLEA (UK e-science):
  - vlbi_udp (Simon): UDP/IP stability and the effect of packet loss on correlations
  - tcpdelay (Stephen): TCP/IP and constant bit rate (CBR) data

Slide 5: tcpdelay

Slide 6: tcpdelay - VLBI Application Protocol
- Want to examine how TCP moves constant bit rate data
- tcpdelay is a test program: an instrumented TCP program that emulates sending CBR data
- Records the relative 1-way delay of each message
- Records TCP stack activity with web100
- (Diagram: messages of n bytes sent with a fixed wait time between them.)

Slide 7: VLBI Application Protocol
- VLBI data is produced at a constant bit rate
- (Diagram: Sender, TCP & network, Receiver; timestamped messages Data1, Data2, ... flow in time; a packet loss delays the later messages.)

Slide 8: Visualising the Results (Stephen Kershaw)
- When packet loss is detected, TCP reduces Cwnd and halves the sending rate
- So expect a delay in the message arrival time
- (Diagram: arrival time vs. message number; after a packet loss the stream falls behind the expected arrival time at CBR.)

Slide 9: Arrival Times - UKLight JB-JIVE-Manc
- Effect of loss rate on message arrival time; TCP buffer 32 Mbytes
- Message size 1448 bytes, wait time 22 us, data rate 525 Mbit/s
- Route: JB - UKLight - JIVE - UKLight - Man, RTT ~27 ms
- BDP at 512 Mbit/s is ~1.8 Mbyte
- Estimate that catch-up is possible if loss < 1 in 1.24M
- (Plot: time in seconds vs. message number for drop rates of 1 in 5k, 10k, 20k, 40k and no loss.)

Slide 10: TCP Web100 - JB-Manc, Large Buffer
- Standard TCP, TCP buffer 930 kbytes, drop 1 in 40,000 packets
- Message size 1448 bytes, wait time 22 us, data rate 525 Mbit/s
- Route: JB - UKLight - JIVE - UKLight - Man, RTT ~27 ms
- Classic Cwnd behaviour
- Limited by ssthresh! TCP requires much care!
- (Web100 plots vs. time in ms: DataBytesOut (Delta), DataBytesIn (Delta), CurCwnd (Value); number of duplicate ACKs; packets re-transmitted.)
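The CBR-over-TCP measurement idea behind tcpdelay (slides 6 to 10) can be sketched in a few lines of Python. This is only a minimal illustration, not the real tcpdelay tool, which is a C program instrumented with web100; the message size and wait time follow slide 9, while the host address, port and message count are made-up values.

import socket
import struct
import time

MSG_SIZE = 1448                    # bytes per message, as on slide 9
WAIT_S = 22e-6                     # wait time between messages, as on slide 9
HEADER = struct.Struct("!Qd")      # sequence number + send timestamp

def send_cbr(host="192.168.0.2", port=5001, n_messages=100000):
    """Send fixed-size messages over one TCP stream at a constant bit rate."""
    sock = socket.create_connection((host, port))
    padding = b"\0" * (MSG_SIZE - HEADER.size)
    next_send = time.time()
    for seq in range(n_messages):
        next_send += WAIT_S
        while time.time() < next_send:   # busy-wait: sleep() is too coarse for 22 us
            pass
        sock.sendall(HEADER.pack(seq, time.time()) + padding)
    sock.close()

def recv_cbr(port=5001):
    """Accept one sender and record the relative 1-way delay of each message.
    The clocks are not synchronised, so only changes in the delay are
    meaningful, which is what the arrival-time plots on slides 8-10 show."""
    srv = socket.socket()
    srv.bind(("", port))
    srv.listen(1)
    conn, _ = srv.accept()
    delays, buf = [], b""
    while True:
        chunk = conn.recv(65536)
        if not chunk:                     # sender closed the connection
            break
        buf += chunk
        while len(buf) >= MSG_SIZE:       # fixed-size framing over the byte stream
            seq, t_send = HEADER.unpack_from(buf)
            delays.append((seq, time.time() - t_send))
            buf = buf[MSG_SIZE:]
    conn.close()
    return delays                         # plot delay vs. message number

Plotting the returned delays against message number reproduces the kind of step after each loss event that slides 9 and 10 discuss.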
Slide 11: iBOB

Slide 12: Prototype iBOB
- Prototype iBOB with two sampler boards attached (photo)
- FPGA-based signal processing board from UC Berkeley

Slide 13: iBOB Block Diagram (Bryan Anderson)
- (Block diagram: RAM, 10GE CX4 port and Station VSI board on the iBOB; connections via a CX4-to-fibre media converter to a 10GE VSI or headstack and to a disk-based system.)
- 10 Gigabit Ethernet now available; a UDP/IP module exists
- Use for a demonstration of FPGA-driven IP networking
- Link to a PC NIC for diagnostics
- Test over GÉANT, Onsala to Jodrell

Slide 14: Multi-Gigabit Trials on GÉANT
- Collaboration with Dante
- What is inside GÉANT2, and why is the collaboration interesting?
- 10 Gigabit Ethernet: UDP memory-to-memory flows and TCP flows with allocated bandwidth
- Options using the GÉANT Development Network (10 Gbit SDH network)
- Options using the GÉANT LightPath service
- PoP locations for network tests

Slide 15: GÉANT2 Topology

Slide 16: GÉANT2 - The Convergence Solution
- (Diagram: EXPReS PC with 10 GE NIC, NREN access, 1678 MCC at GÉANT2 PoP A with existing IP router, L2 matrix and TDM matrix, managed lambdas via 1626 LM to the 1678 MCC at GÉANT2 PoP B, and an EXPReS PC with 10 GE NIC at the far end.)

Slide 17: From PoS to Ethernet
- More economical architecture
- Highest overall network availability
- Flexibility (VLAN management)
- Highest network performance (latency)
- (Diagram: IP links and VLANs from a router into the 1678 MCC transport node; VC-4-nv channels and 1/10 Gigabit Ethernet through the L2 and TDM matrices.)

Slide 18: What Do We Want to Do?
- Set up a 4 Gigabit lightpath between GÉANT PoPs
- Collaboration with Dante: PCs in their PoPs with 10 Gigabit NICs
- VLBI tests (a sketch of such a UDP test follows slide 20 below):
  - UDP performance: throughput, jitter, packet loss, 1-way delay, stability
  - Continuous (days-long) data flows: VLBI_UDP and multi-Gigabit TCP performance with current kernels
  - Experience for FPGA Ethernet packet systems
- Dante interests:
  - Multi-Gigabit TCP performance
  - The effect of (Alcatel) buffer size on bursty TCP when using bandwidth-limited lightpaths
- Need a collaboration agreement

Slide 19: Options Using the GÉANT Development Network
- 10 Gigabit SDH backbone, Alcatel 1678 MCC
- Node locations: London, Amsterdam, Paris, Prague, Frankfurt
- Traffic routing can be arranged to make long-RTT paths
- Available Dec/Jan 07
- Less pressure for long-term tests

Slide 20: Options Using the GÉANT LightPaths
- Set up a 4 Gigabit lightpath between GÉANT PoPs
- Collaboration with Dante, PCs in Dante PoPs
- 10 Gigabit SDH backbone, Alcatel 1678 MCC
- Node locations: Budapest, Geneva, Frankfurt, Milan, Paris, Poznan, Prague, Vienna
- Traffic routing can be arranged to make long-RTT paths
- Ideal: London - Copenhagen
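Slide 18 lists the UDP quantities to be measured over the lightpaths (throughput, jitter, packet loss, 1-way delay, stability). The sketch below is only an illustration of how a paced, sequence-numbered UDP stream can give throughput and packet-loss figures; it is not the actual vlbi_udp code, and the address, port, packet size and spacing are assumed values. Measuring 1-way delay and jitter would additionally need a send timestamp in each packet, as in the tcpdelay sketch above.

import socket
import struct
import time

SEQ = struct.Struct("!Q")        # 8-byte sequence number at the head of each packet

def udp_send(host="192.168.0.2", port=5002, pkt_size=8192, spacing_s=20e-6, n_pkts=500000):
    """Send sequence-numbered UDP packets with a fixed inter-packet spacing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\0" * (pkt_size - SEQ.size)
    next_send = time.time()
    for seq in range(n_pkts):
        next_send += spacing_s
        while time.time() < next_send:   # busy-wait: sleep() is too coarse here
            pass
        sock.sendto(SEQ.pack(seq) + payload, (host, port))

def udp_recv(port=5002, idle_timeout_s=2.0):
    """Count received and lost packets and report the achieved user-data rate."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(idle_timeout_s)
    n_recv = n_bytes = lost = 0
    last_seq = -1
    t_first = t_last = None
    while True:
        try:
            data, _ = sock.recvfrom(65536)
        except socket.timeout:
            break                         # sender has stopped
        t_last = time.time()
        if t_first is None:
            t_first = t_last
        seq = SEQ.unpack_from(data)[0]
        if seq > last_seq + 1:
            lost += seq - last_seq - 1    # gap in sequence numbers = lost packets
        last_seq = seq
        n_recv += 1
        n_bytes += len(data)
    duration = (t_last - t_first) if n_recv > 1 else 0.0
    rate_mbit = 8 * n_bytes / duration / 1e6 if duration else 0.0
    print("received %d packets, lost %d, ~%.0f Mbit/s user data" % (n_recv, lost, rate_mbit))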
Slide 21: 4 Gigabit GÉANT LightPath
- Example of a 4 Gigabit lightpath between GÉANT PoPs, with PCs in Dante PoPs
- 26 * VC-4s, 4180 Mbit/s

Slide 22: PCs and Current Tests

Slide 23: Test PCs Have Arrived
- Boston/Supermicro X7DBE
- Two dual-core Intel Xeon Woodcrest 5130 CPUs, 2 GHz
- Independent 1.33 GHz front-side buses
- 530 MHz FD memory (serial)
- Chipsets: Intel 5000P MCH (PCIe & memory), ESB2 (PCI-X, GE, etc.)
- PCI: three 8-lane PCIe buses, three 133 MHz PCI-X buses
- 2 Gigabit Ethernet ports, SATA

Slide 24: Lab Tests - 10 Gigabit Ethernet
- 10 Gigabit test lab being set up in Manchester:
  - Cisco 7600
  - Cross-campus lambda, <1 ms
  - Server-quality PCs
  - Neterion NICs; Myricom and Chelsio NICs being purchased
- Back-to-back performance so far (SuperMicro X6DHE-G2):
  - Kernel (2.6.13) and driver dependent!
  - One iperf TCP data stream: 4 Gbit/s
  - Two bi-directional iperf TCP data streams: 3.8 & 2.2 Gbit/s
  - UDP disappointing
- Propose to install Fedora Core 5 with kernel 2.6.17 on the new Intel dual-core PCs

Slide 25: Any Questions?

Slide 26: Backup Slides

Slide 27: Bandwidth on Demand - Our Long-Term Vision
- (Diagram: applications, e.g. GRID, make a bandwidth request to policy middleware and a network resource manager (research activity), which drives the 1678 MCC nodes via UNI-C commands and GMPLS to set up Ethernet paths between the applications.)

Slide 28: 10 Gigabit Ethernet - UDP Throughput
- A 1500-byte MTU gives ~2 Gbit/s; used a 16144-byte MTU, max user length 16080 bytes
- DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPUs, FSB 400 MHz, PCI-X mmrbc 512 bytes, giving a wire-rate throughput of 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPUs, FSB 400 MHz, PCI-X mmrbc 4096 bytes, giving a wire rate of 5.7 Gbit/s
- SLAC Dell PCs: dual 3.0 GHz Xeon CPUs, FSB 533 MHz, PCI-X mmrbc 4096 bytes, giving a wire rate of 5.4 Gbit/s
- (Plot "an-al 10GE Xsum 512kbuf MTU16114 27Oct03": received wire rate in Mbit/s vs. spacing between frames in us, for packet sizes from 1472 to 16080 bytes.)

Slide 29: 10 Gigabit Ethernet - Tuning PCI-X
- 16080-byte packets every 200 us, Intel PRO/10GbE LR adapter
- PCI-X bus occupancy vs. mmrbc: measured times, and times based on PCI-X timings from the logic analyser
- Expected throughput ~7 Gbit/s, measured 5.7 Gbit/s (DataTAG Xeon 2.2 GHz; kernel 2.6.1#17, HP Itanium, Intel 10GE, Feb 04)
- (Logic-analyser traces of the PCI-X sequence (data transfer, CSR access, interrupt & CSR update) for mmrbc of 512, 1024, 2048 and 4096 bytes, plus plots of PCI-X transfer time in us and measured vs. expected rate in Gbit/s against max memory read byte count; mmrbc 4096 bytes gives 5.7 Gbit/s.)
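The throughput plot on slide 28 shows received wire rate against the spacing between frames. A small helper, assuming standard Ethernet, IPv4 and UDP per-frame overheads and that each datagram fits in a single jumbo frame (true with the 16144-byte MTU quoted above), converts payload size and inter-frame spacing into the corresponding offered wire rate; the example values at the end are illustrative, not measurements.

ETH_OVERHEAD = 8 + 14 + 4 + 12      # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
IP_UDP_HEADERS = 20 + 8             # IPv4 + UDP headers (bytes)
LINE_RATE_MBIT = 10000              # 10 Gigabit Ethernet line rate

def wire_rate_mbit(payload_bytes: int, spacing_us: float) -> float:
    """Expected on-the-wire rate (Mbit/s) for one UDP datagram every spacing_us."""
    frame_bits = 8 * (payload_bytes + IP_UDP_HEADERS + ETH_OVERHEAD)
    return min(LINE_RATE_MBIT, frame_bits / spacing_us)   # bits per microsecond = Mbit/s

# 16080-byte datagrams every 22 us would offer roughly 5.9 Gbit/s on the wire;
# at the 200 us spacing used for the PCI-X tuning runs (slide 29) the offered
# load is only about 0.65 Gbit/s.
print(wire_rate_mbit(16080, 22.0), wire_rate_mbit(16080, 200.0))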
Slide 30: Bandwidth Challenge Wins Hat Trick
- The maximum aggregate bandwidth was >151 Gbit/s: 130 DVD movies in a minute, or serving 10,000 MPEG-2 HDTV movies in real time
- 22 10 Gigabit Ethernet waves to the Caltech and SLAC/Fermi booths
- In 2 hours, 95.37 TBytes transferred; over 24 hours, ~475 TBytes moved
- Showed real-time particle event analysis
- SLAC/Fermi/UK booth:
  - One 10 Gbit Ethernet link to the UK over NLR and UKLight: transatlantic HEP disk-to-disk transfers and VLBI streaming
  - Two 10 Gbit links to SLAC: rootd low-latency file access application for clusters, and Fibre Channel StorCloud
  - Four 10 Gbit links to Fermi: dCache data transfers
- (Plot: SC2004 aggregate of 101 Gbit/s across the FNAL-UltraLight, SLAC-ESnet-USN, UKLight, FermiLab-HOPI and SLAC-ESnet links, into and out of the booth.)

Slide 31: SC|05 Seattle-SLAC 10 Gigabit Ethernet
- Two lightpaths: one routed over ESnet, one Layer 2 over Ultra Science Net
- Six Sun V20Z systems per lambda
- dCache remote disk data access: 100 processes per node, each node sends or receives, one data stream is 20-30 Mbit/s
- Used Neterion NICs and Chelsio TOE
- Data also sent to StorCloud using Fibre Channel links
- Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk

Slide 32: 10 Gigabit Ethernet - TCP Data Transfer on PCI-X
- Sun V20z 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509
- XFrame II NIC, PCI-X mmrbc 4096 bytes, 66 MHz
- Two 9000-byte packets back to back; average rate 2.87 Gbit/s
- Bursts of packets 646.8 us long, gaps of 343 us between bursts, 2 interrupts per burst
- (Logic-analyser trace: data transfer and CSR access phases on the PCI-X bus.)

Slide 33: 10 Gigabit Ethernet - UDP Data Transfer on PCI-X
- Sun V20z 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509
- XFrame II NIC, PCI-X mmrbc 2048 bytes, 66 MHz
- One 8000-byte packet: 2.8 us for CSRs, 24.2 us for the data transfer, effective rate 2.6 Gbit/s
- 2000-byte packets, wait 0 us: ~200 ms pauses
- 8000-byte packets, wait 0 us: ~15 ms between data blocks
- (Logic-analyser trace: data transfer and CSR access phases on the PCI-X bus.)
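As a quick sanity check of the PCI-X figures on slide 33: moving an 8000-byte packet in a 24.2 us data-transfer phase corresponds to about 2.6 Gbit/s, consistent with the quoted effective rate, and the 2.8 us of CSR accesses per packet is overhead on top of that. The snippet below simply restates this arithmetic.

def bus_rate_gbit(payload_bytes: int, transfer_time_us: float) -> float:
    """Data rate (Gbit/s) implied by moving payload_bytes in transfer_time_us."""
    return 8 * payload_bytes / transfer_time_us / 1e3   # bits per us, divided by 1000, gives Gbit/s

print(bus_rate_gbit(8000, 24.2))         # ~2.6 Gbit/s, data-transfer phase only
print(bus_rate_gbit(8000, 24.2 + 2.8))   # ~2.4 Gbit/s once CSR accesses are included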