VIA and Its Extension to TCP/IP Network
Yingping Lu ([email protected])
Based on Paper “Queue Pair IP, …”
by Philip Buonadonna
Outline
Motivation
VIA Overview
QP/IP Architecture
QP/IP Performance
Summary
Motivation
High-performance computing and clustering applications require a high-throughput, low-latency communication facility
Traditional TCP/IP is not designed for high-throughput, low-latency communication
Application software has not kept pace with the increase in I/O speed:
- Memory copy
- Checksum computation
- Interrupts
- Context switching
Typical Communication Data Path
Bandwidth Comparison
[Figure: throughput (MB/s) versus message length (bytes) for TCP/IP and VIA]
VIA Solution
VIA (Virtual Interface Architecture) is an industry standard developed by Microsoft, Compaq, and Intel.
Key features of VIA:
- Reduce memory copies (zero-copy)
- Direct user-level access to NIC hardware
- Eliminate the OS kernel from the critical path
- Collapse the ISO/OSI model
- Offload CPU processing to an intelligent NIC
VIA Architecture
VIA Components
Consumer
- The end entity that uses VIA functions to communicate; it can run at user level or in the kernel
- Uses VIPL for programming
VI User Agent
- Implements the OS-bypass agent
Kernel Agent
- Device driver; handles security and OS-related issues
VIA-capable NIC (Channel Adapter)
- Implements VIA communications
Programming Abstraction
Queue Pairs
- Components
  - Send queue
  - Receive queue
  - Completion queue (status)
- Data Movement Operations
  - Send/Receive
  - RDMA Read
  - RDMA Write
Virtual Interface (Queue Pair)
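The queue pair abstraction can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `Op`, `QueuePair`, `post`, `complete_next`, and `poll` are hypothetical and are not VIPL calls, and the NIC is replaced by a method that marks the oldest request as finished. It assumes that receives wait on the receive queue while Send and RDMA requests are issued from the send queue.

```python
from collections import deque
from enum import Enum


class Op(Enum):
    """The data movement operations named on the slide."""
    SEND = "send"
    RECEIVE = "receive"
    RDMA_READ = "rdma_read"
    RDMA_WRITE = "rdma_write"


class QueuePair:
    """Toy model of a VI: send queue, receive queue, completion queue."""

    def __init__(self):
        self.send_queue = deque()        # requests the consumer wants performed
        self.recv_queue = deque()        # buffers waiting for incoming data
        self.completion_queue = deque()  # status entries for finished requests

    def post(self, op: Op, buffer) -> None:
        # Assumption: only RECEIVE goes to the receive queue; SEND and the
        # RDMA operations are initiated from the send queue.
        queue = self.recv_queue if op is Op.RECEIVE else self.send_queue
        queue.append((op, buffer))

    def complete_next(self, from_recv_queue: bool, status: str = "ok") -> None:
        """Stand-in for the NIC finishing the oldest request on one queue."""
        queue = self.recv_queue if from_recv_queue else self.send_queue
        op, _buffer = queue.popleft()
        self.completion_queue.append((op, status))

    def poll(self):
        """Return the oldest completion entry, or None if none is available."""
        return self.completion_queue.popleft() if self.completion_queue else None
```

A consumer in this model posts work, continues with other processing, and later calls `poll()` to learn which requests have finished.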
Memory Access
Memory Registration
- Memory must be registered before use
- The system pins the memory region
- The NIC uses DMA to transfer data from memory to the NIC
Memory Protection
- Registered memory is associated with a VI consumer and is valid only for that consumer
Gather/Scatter list
- Gather list: a list of registered source data buffers (read)
- Scatter list: a list of registered destination data buffers (write)
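Registration, protection, and gather/scatter lists can be sketched together in a few lines. The following is a simplified software model, not a VIA implementation: `RegistrationTable`, `gather`, and `scatter` are hypothetical names, consumers are identified by a plain string, and page pinning is only noted in a comment because it cannot be shown in user-level Python.

```python
import itertools


class RegistrationTable:
    """Toy model of memory registration with a per-consumer protection check."""

    def __init__(self):
        self._regions = {}                  # handle -> (owner, bytearray)
        self._handles = itertools.count(1)

    def register(self, owner: str, buffer: bytearray) -> int:
        # A real system would also pin the pages so they stay resident for DMA.
        handle = next(self._handles)
        self._regions[handle] = (owner, buffer)
        return handle

    def lookup(self, owner: str, handle: int, offset: int, length: int) -> bytearray:
        region_owner, buffer = self._regions[handle]  # KeyError if unregistered
        if region_owner != owner:
            raise PermissionError("region is registered to a different consumer")
        if offset < 0 or length < 0 or offset + length > len(buffer):
            raise ValueError("segment falls outside the registered region")
        return buffer


def gather(table, owner, segments) -> bytes:
    """Concatenate (handle, offset, length) source segments into one message."""
    out = bytearray()
    for handle, offset, length in segments:
        buffer = table.lookup(owner, handle, offset, length)
        out += buffer[offset:offset + length]
    return bytes(out)


def scatter(table, owner, segments, data: bytes) -> int:
    """Spread data over (handle, offset, length) destinations; return bytes written."""
    written = 0
    for handle, offset, length in segments:
        buffer = table.lookup(owner, handle, offset, length)
        chunk = data[written:written + length]
        buffer[offset:offset + len(chunk)] = chunk
        written += len(chunk)
    return written
```

The ownership check in `lookup` mirrors the protection rule above: a handle registered by one consumer is rejected when another consumer presents it.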
Memory Model
[Figure: registered memory region, virtual memory space, and physical memory pages (Page 0, Page 1, ..., Page n-1)]
Descriptor
A work queue element that is placed into a queue pair (send or receive queue)
Contains a control segment and a list of address segments
Specifies the operation command, memory address, and size
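As a rough illustration of that layout, a descriptor can be modelled as one control segment plus a list of address segments. The field names below (`operation`, `status`, `memory_handle`, `address`, `length`) are assumptions chosen for readability and do not reproduce the actual VIA descriptor format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlSegment:
    operation: str            # e.g. "send", "receive", "rdma_write"
    status: str = "pending"   # updated when the request completes


@dataclass
class AddressSegment:
    memory_handle: int        # identifies a registered memory region
    address: int              # start address within that region
    length: int               # number of bytes covered by this segment


@dataclass
class Descriptor:
    """A work queue element: one control segment and its address segments."""
    control: ControlSegment
    segments: List[AddressSegment] = field(default_factory=list)

    def total_length(self) -> int:
        """Total number of bytes described by all address segments."""
        return sum(seg.length for seg in self.segments)
```

For example, `Descriptor(ControlSegment("send"), [AddressSegment(1, 0, 64)])` describes a 64-byte send from the region registered under handle 1.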
Door Bell
An asynchronous mechanism to notify the VI NIC of a newly posted work queue element
The Door Bell can be a register in the NIC that is accessed by both the CPU and the NIC
[Figure: VIPL, descriptor, and VI NIC]
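The doorbell idea can be modelled as a shared counter that the consumer increments and the NIC reads and clears. This is a minimal single-threaded sketch, assuming a counting doorbell; `Doorbell`, `ToyNic`, `ring`, `take`, `post`, and `service` are hypothetical names, and a real NIC would expose the doorbell as a memory-mapped register rather than a Python attribute.

```python
from collections import deque


class Doorbell:
    """Toy doorbell register: counts posts the NIC has not yet seen."""

    def __init__(self):
        self.pending = 0

    def ring(self) -> None:
        self.pending += 1                       # written by the CPU side

    def take(self) -> int:
        count, self.pending = self.pending, 0   # read and cleared by the NIC side
        return count


class ToyNic:
    def __init__(self):
        self.work_queue = deque()
        self.doorbell = Doorbell()
        self.processed = []

    def post(self, descriptor) -> None:
        """Consumer side: enqueue the descriptor, then ring the doorbell."""
        self.work_queue.append(descriptor)
        self.doorbell.ring()

    def service(self) -> int:
        """NIC side: process as many descriptors as the doorbell reports."""
        count = self.doorbell.take()
        for _ in range(count):
            self.processed.append(self.work_queue.popleft())
        return count
```

Posting returns immediately and the descriptors are handled only when `service()` runs, which is the asynchronous behaviour the slide describes.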
Operation Example - Send/Receive
Sender
Consumer:
- Register the send buffer
- Post a send work queue element
Channel Adapter:
- Send out the data and header; the data are retrieved directly from consumer memory
Receiver
Consumer:
- Register the receive buffer
- Post a receive buffer in the receive queue
Channel Adapter:
- Receive packets from the sender
- Find a receive queue element in the receive queue
- Move the data directly to the buffer specified in the receive queue element
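The Send/Receive steps can be traced in a short simulation. It is a sketch only: `Endpoint`, `adapter_send`, and `adapter_receive` are hypothetical names, the packet is a plain dictionary, and raising an exception when no receive buffer is posted is an assumption of this model rather than a statement about VIA error handling.

```python
from collections import deque


class Endpoint:
    """One side of the connection: registered buffers plus two work queues."""

    def __init__(self):
        self.registered = set()   # ids of buffers that have been registered
        self.send_queue = deque()
        self.recv_queue = deque()

    def register(self, buffer: bytearray) -> bytearray:
        self.registered.add(id(buffer))
        return buffer

    def post_send(self, buffer: bytearray) -> None:
        self._require_registered(buffer)
        self.send_queue.append(buffer)

    def post_recv(self, buffer: bytearray) -> None:
        self._require_registered(buffer)
        self.recv_queue.append(buffer)

    def _require_registered(self, buffer: bytearray) -> None:
        if id(buffer) not in self.registered:
            raise ValueError("buffer must be registered before it is posted")


def adapter_send(sender: Endpoint) -> dict:
    """Sender's channel adapter: read the posted buffer and build a packet."""
    data = sender.send_queue.popleft()
    # The simulation copies the bytes; a real adapter would fetch them by DMA
    # from the registered buffer without a CPU copy.
    return {"header": {"op": "send", "length": len(data)}, "data": bytes(data)}


def adapter_receive(receiver: Endpoint, packet: dict) -> int:
    """Receiver's channel adapter: place the data in the next posted buffer."""
    if not receiver.recv_queue:
        raise RuntimeError("no receive buffer posted")
    buffer = receiver.recv_queue.popleft()
    length = packet["header"]["length"]
    if length > len(buffer):
        raise RuntimeError("posted receive buffer is too small")
    buffer[:length] = packet["data"]
    return length
```

Note that the receiver must post its buffer before the packet arrives; in this model the adapter has nowhere to place data otherwise.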
Operation Example - RDMA Write
Initiator
Consumer:
- Register the sending buffer address
- Get the receiver's address
- Post an RDMA Write
Channel Adapter:
- Send out the data with a header (the operation, receiving address); the data are retrieved directly from the sender buffer
Receiver
Consumer:
- Register the receiving buffer address
- Send the address, R-key, and length to the initiator
Channel Adapter:
- Receive the data
- Check the validity of the address in the RDMA header
- Move the data directly to the memory specified in the RDMA header
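The RDMA Write exchange, including the receiver-side validity check, can be sketched as follows. The names `RdmaTarget`, `register`, `adapter_write`, and `rdma_write` are hypothetical, R-keys are simple counters rather than hardware-generated keys, and addresses are modelled as offsets into the registered buffer.

```python
import itertools


class RdmaTarget:
    """Receiver side: registered regions addressed by an R-key."""

    def __init__(self):
        self._regions = {}               # r_key -> registered bytearray
        self._keys = itertools.count(1)

    def register(self, buffer: bytearray) -> dict:
        """Register a buffer; return the (address, R-key, length) to advertise."""
        r_key = next(self._keys)
        self._regions[r_key] = buffer
        return {"address": 0, "r_key": r_key, "length": len(buffer)}

    def adapter_write(self, header: dict, data: bytes) -> None:
        """Receiver's channel adapter: validate the header, then place the data."""
        buffer = self._regions.get(header["r_key"])
        if buffer is None:
            raise PermissionError("unknown R-key")
        start = header["address"]
        end = start + len(data)
        if start < 0 or end > len(buffer):
            raise PermissionError("write falls outside the registered region")
        buffer[start:end] = data         # no receive descriptor is consumed


def rdma_write(target: RdmaTarget, advert: dict, source: bytearray, offset: int = 0) -> None:
    """Initiator side: build the RDMA header from the advertised address and R-key."""
    header = {
        "op": "rdma_write",
        "address": advert["address"] + offset,
        "r_key": advert["r_key"],
    }
    target.adapter_write(header, bytes(source))
```

Unlike Send/Receive, the receiving consumer posts nothing per message here; it only registers memory once and tells the initiator where to write.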
Summary of VIA
Goal: low latency and high throughput through direct access to the NIC and zero-copy transfers
Architecture components: consumer (VIPL), UA, KA, VI-NIC
Main concepts: queue pairs, memory pinning, gather/scatter, descriptor, door bell
Operations: Send/Receive, RDMA Read, RDMA Write
Why QP/IP
The TCP/IP network is robust and ubiquitous
However, TCP/IP is not designed for high-performance, low-latency use
The Queue Pair abstraction provides a way to offload CPU processing, shorten the critical data path, and provide zero-copy memory access
Integrating QP and IP may reduce latency and improve throughput between applications on end nodes connected through a TCP/IP network
Challenges to QP/IP
Provide a VIPL that supports QP/IP
Integrate connection setup
Handle message segmentation
Implement the TCP/IP mechanisms on the NIC
Handle message boundaries for TCP
Handle zero-copy in the event of packet loss
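The message-boundary challenge arises because TCP delivers a byte stream, while queue pair operations work on whole messages. One common way to recover boundaries, shown here purely as an illustration and not as the mechanism QP/IP uses, is to prefix each message with its length. The names `frame` and `Deframer` and the 4-byte header are assumptions of this sketch.

```python
import struct

HEADER = struct.Struct("!I")   # 4-byte big-endian message length


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message


class Deframer:
    """Rebuilds whole messages from arbitrarily split stream chunks."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes; return every message that is now complete."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break                      # the rest of this message has not arrived
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

Feeding the deframer with chunks that split a message anywhere, even inside the length header, still yields the original messages in order.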
QP/IP Architecture
QPIP Components
FSMs:
- Doorbell FSM
- Sched/XMT FSM
- RECV FSM
- Mgmt FSM
Major Data Abstractions:
- QPs
- CQs
- TCP Control Block (TCB)
QP/IP State Machines
QPIP Prototype
Three components
- Application Library
  - PostSend(), PostRecv(), Poll(), Wait()
- Kernel driver
  - Initialization
  - Address mapping mechanism
  - Interrupt service
- Network interface firmware
  - Implements the TCP, UDP, and IPv6 protocols
Application-Application RTT
Application Throughput & CPU Utilization
Network Interface Processing Cost
QPIP Based on NBD
NBD Client Throughput and CPU Effectiveness
Summary
Integrates the QP concept from VIA with the ubiquitous TCP/IP network
Provides low latency and high throughput for SAN
QP/IP contains the Doorbell FSM, Sched/XMT FSM, RECV FSM, and Mgmt FSM, along with the QP, CQ, and TCB data structures.
Demonstrates comparable performance and much lower CPU utilization with modest hardware.
Its programmability also adds flexibility to adapt to the evolution of TCP/IP and to scheduling requirements.
Issues
How to integrate TOE into the mechanism?
How to handle message boundaries in TCP effectively to support upper-level applications, e.g. iSCSI? How to handle segmentation?
How to support zero-copy in the case of packet loss?
How to extend this to a WAN environment (more unpredictability, fluctuating latency and available bandwidth, congestion, LFN)?
How to effectively support OSD communication?
Questions?