Mapping of scalable RDMA protocols to ASIC/FPGA platforms
Yosef Gavriel Tirat-Gefen, PhD, Senior Member IEEE
Chief Scientist, Castel Systems Inc.
and Dept. of Physics and Astronomy, George Mason University, Fairfax, VA
[email protected]
2004 MAPLD/205

Presentation Overview
• Motivation
• TCP Off-loading
• Zero-copying
• RDMA protocol
• RDMA protocol stack
• Structure of an RDMA card
• Results
• Conclusion

Motivation
[Figure: supercomputers or server farms, terabyte storage, and a workstation connected over a WAN]
Enabling high-bandwidth WAN applications

Applications
• Distributed command and control
• Signal processing (e.g. RADAR)
• Real-time sharing of intelligence data
• Distributed large-scale computation/simulation of aerospace problems
• Extension of storage area networks over a wide area network (WAN)
• Enabling technology for modern supercomputing installations

Traditional TCP/IP Networking
[Figure: two end hosts, each with Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY), connected through routers that implement only Layers 1 to 3]

Standard Data Flow on TCP/IP
[Figure: on each host, data passes between the application memory space and a separate TCP buffer/stack memory space, then through L3, L2, and L1 onto the WAN/LAN]

Standard Data Flow on TCP/IP
• Traditional TCP/IP copies data from the application into a TCP memory buffer
• The CPU loses cycles to this buffer copying
• The CPU becomes overwhelmed at rates above 2.5 Gbps
• TCP/IP off-loading helps, but it does not solve the problem on the receiver side

TCP/IP Off-load Processing
[Figure: the TCP, Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY) layers below the Application/O.S. are mapped to hardware as a TCP/IP offload processor (TOE)]
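The receiver-side argument above can be made concrete with a small accounting model. This is a sketch under stated assumptions, not the author's analysis: the `RxPath` class, the `cpu_copy_bytes_per_second` helper, and the one-copy-per-received-byte figure for each path are hypothetical simplifications. It shows that moving TCP/IP onto a TOE takes protocol processing off the host, yet leaves the host CPU copying every received byte from the receive buffer into application memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RxPath:
    """Hypothetical model of one receive path, as seen by the host CPU."""
    name: str
    stack_on_host: bool  # does the host CPU run the TCP/IP protocol processing?
    cpu_copies: int      # software copies of each received byte made by the host CPU


# Assumed figures, simplified from the slides:
# - traditional: the NIC delivers data into a TCP buffer, and the CPU copies it
#   into the application memory space;
# - TOE: TCP/IP runs on the offload card, but data still lands in a receive
#   buffer in host memory that the CPU copies into the application.
PATHS = [
    RxPath("traditional TCP/IP", stack_on_host=True, cpu_copies=1),
    RxPath("TCP/IP offload (TOE)", stack_on_host=False, cpu_copies=1),
]


def cpu_copy_bytes_per_second(path: RxPath, line_rate_gbps: float) -> float:
    """Bytes per second the host CPU must copy in software at a given line rate."""
    bytes_per_second = line_rate_gbps * 1e9 / 8
    return path.cpu_copies * bytes_per_second


if __name__ == "__main__":
    for rate in (2.5, 10.0):
        for path in PATHS:
            mb_s = cpu_copy_bytes_per_second(path, rate) / 1e6
            print(f"{path.name:<22} {rate:>4} Gbps  stack on host: {path.stack_on_host!s:<5}  "
                  f"CPU copy load: {mb_s:.1f} MB/s")
```

Under these assumptions both paths give 312.5 MB/s of software copying at 2.5 Gbps and 1250 MB/s at 10 Gbps; only the protocol processing moves. The model ignores cache effects and per-packet overheads, so it illustrates the trend rather than predicting CPU load.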
Zero-copying and TCP Offload Processing
[Figure: host CPU and cache; host main memory containing a receive buffer; a TOE/NIC card with a TCP off-load processor and network buffer, attached to the WAN/LAN]

Zero-copying and TCP Offload Processing
• Zero-copying is still not achieved, because the receive buffer must still be copied back into the application memory space
• TCP/IP off-loading is not scalable
• RDMA protocols provide a solution

RDMA Data Flow for WAN Applications
[Figure: hosts A and B, each with an RDMA NIC card that moves data across the WAN to and from the application memory space in host memory]

Scalable WAN-RDMA for Bandwidths above 10 Gbps
[Figure: RDMA NIC card for WAN between the host (DMA channel) and the WAN; blocks: Tx buffer, Rx buffer, RDMA engine, MAC, PHY; labels: 10 Gbps links, > 10 Gbps]

The RDMA Protocol Layers and Our Prototype
• Running on the host CPU: ULP (e.g. iSCSI, NFS)
• FPGA implementation: RDMA, DDP, MPA, SCTP, TCP, Layer 3 (e.g. IP)
• FPGA and off-the-shelf MAC/PHY chips: Layer 2 (MAC), Layer 1 (PHY)

Overall Hardware/Firmware Organization of the WAN RDMA Card
[Figure: PCI-Express/HyperTransport interface and IP/firmware module; RDMA protocol engine; SCTP protocol engine; Layer 3 (IP) processor; Rx and Tx memory controllers with memory banks; a data stream split/join unit feeding four SAR blocks, each with a 10GE/OC-192 framer and PHY]

Present Results
• We currently use Xilinx Virtex-II/Virtex-II Pro devices as targets for our cores
• Data indicate that most of the key cores will fit in one FPGA device (Virtex-II)
• The aggregate of all cores spans several FPGAs
• Inter-device communication is an issue, so the PCB design needs care
• We are currently trying to accommodate most of the cores in one FPGA
• Most of the cores will be made available free of charge to researchers in non-profit or government organizations

Conclusion
• The advent of HyperTransport/PCI-Express and VITA (embedded computing) standards will enable local I/O bandwidths above 10 Gbps
• Extending the RDMA protocol enables large bandwidths over wide area networks
• The proposed cores will meet the natural growth of bandwidth requirements in commercial, defense, and aerospace applications
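The zero-copy idea behind the RDMA data flow can be sketched in the style of the tagged buffer model of DDP (RFC 5041): the application registers a buffer under a steering tag (STag), and each arriving segment carries that tag plus an offset, so the NIC can write the payload straight into application memory. This is a heavily simplified, hypothetical model: `TaggedSegment`, `PlacementEngine`, and their fields are illustrative names, real DDP headers, MPA framing, and CRCs are omitted, and it is not the design of the cores described in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSegment:
    """Hypothetical, reduced stand-in for a DDP tagged segment."""
    stag: int       # steering tag naming a registered application buffer
    offset: int     # tagged offset: where the payload belongs in that buffer
    payload: bytes


class PlacementEngine:
    """Hypothetical receive-side engine that writes payloads directly into registered buffers."""

    def __init__(self) -> None:
        self._buffers: dict[int, bytearray] = {}

    def register(self, stag: int, buf: bytearray) -> None:
        # The application exposes its own buffer ahead of time; no copy is made here.
        self._buffers[stag] = buf

    def place(self, seg: TaggedSegment) -> None:
        buf = self._buffers.get(seg.stag)
        if buf is None:
            raise KeyError(f"unknown STag {seg.stag:#x}")
        end = seg.offset + len(seg.payload)
        if seg.offset < 0 or end > len(buf):
            raise ValueError("segment falls outside the registered buffer")
        # Direct placement: a single write into application memory,
        # with no intermediate receive buffer for the host CPU to copy from.
        buf[seg.offset:end] = seg.payload


if __name__ == "__main__":
    app_buf = bytearray(12)
    engine = PlacementEngine()
    engine.register(0x1234, app_buf)
    # Segments arrive out of order; each one is placed by its own offset.
    for seg in (
        TaggedSegment(0x1234, 8, b"IJKL"),
        TaggedSegment(0x1234, 0, b"ABCD"),
        TaggedSegment(0x1234, 4, b"EFGH"),
    ):
        engine.place(seg)
    print(app_buf.decode())  # ABCDEFGHIJKL
```

Because each segment names its own destination, the engine can place segments in whatever order they arrive, with no reassembly buffer. That property may be useful when a stream is spread over several links, but the slides do not state how the card's split/join unit orders data.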