TCP Offload through Connection Handoff
Hyong-Youb Kim, Scott Rixner
[Figure: Network stack performance. Counts per packet (cycles, instructions, uops) by stack layer (System Call, TCP, IP, Eth, Driver, Total) for the Base, Zero-copy, +Checksum Offload, ZC+CO, +1024 Connections, +20ms Latency, and SPECweb99 configurations.]
Network Stack Performance
• Well-known issues: data copies and TCP checksum calculations.
• New observation: a large number of connections and long network latencies degrade processor performance, which reduces overall system performance.
• Connection data structures (protocol control blocks, sockets, etc.) overwhelm the L2 cache and cause cache misses.
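A rough working-set estimate makes the cache-pressure observation concrete. The sketch below is illustrative only: the per-connection state size and the L2 capacity are hypothetical parameters, not measurements from the poster.

```python
def connection_working_set(num_connections: int, bytes_per_connection: int) -> int:
    """Total bytes of per-connection state (protocol control blocks, sockets, etc.)."""
    return num_connections * bytes_per_connection


def fits_in_cache(num_connections: int, bytes_per_connection: int, cache_bytes: int) -> bool:
    """True if all connection state could, at best, be resident in the cache at once."""
    return connection_working_set(num_connections, bytes_per_connection) <= cache_bytes


# Hypothetical values chosen only for illustration.
STATE_BYTES = 1024        # assumed state per connection
L2_BYTES = 256 * 1024     # assumed L2 cache capacity

for n in (64, 256, 1024, 2048):
    ws_kib = connection_working_set(n, STATE_BYTES) // 1024
    print(n, "connections:", ws_kib, "KiB, fits:", fits_in_cache(n, STATE_BYTES, L2_BYTES))
```

Under these assumed sizes the state stops fitting somewhere between a few hundred and a thousand connections. That is the regime in which the poster reports additional cache misses.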
Performance Issues of TCP/IP Stack
[Figure: Web Server Throughput. HTTP content throughput (Mb/s) versus number of connections (4 to 2048) and versus one-way latency (0 to 40 ms), for the CS, IBM, NASA, SPECWEB, and WC workloads.]
TCP Offload
[Diagram labels: CPU, Chipset, DDR DRAM, PCI, Offload Processor, NIC, DRAM.]
• The offload processor runs TCP. It has fast memory for storing connections and can process packets more efficiently than the host CPU.
• The offload processor must communicate with the host CPU, so a software interface between them is needed.
Connection Handoff Interface
Design considerations:
1. NIC has finite compute power
2. NIC has finite memory
3. Do not want to complicate host stack software architecture
4. Do not want to modify the sockets API
Connection handoff: OS establishes a connection and hands it off to NIC
Advantages of connection handoff:
1. Can control the amount of work
2. Minimal impact on stack architecture
Offload Policies:
NIC is a non-trivial resource. The host OS must manage it through policies.
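The design considerations (finite NIC compute power and memory) and the idea that the host OS controls how much work is handed off can be sketched as a simple admission check. This is a minimal sketch under stated assumptions, not the authors' policy: `NicResources`, `try_handoff`, `restore`, and the packet-rate budget are all hypothetical names and parameters introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NicResources:
    """Hypothetical model of a NIC with finite connection memory and compute."""
    max_connections: int            # assumed limit on connections the NIC can store
    max_packet_rate: float          # assumed packets/s the NIC can process
    offloaded: Dict[str, float] = field(default_factory=dict)  # conn id -> expected rate

    def load(self) -> float:
        """Sum of the expected packet rates of all offloaded connections."""
        return sum(self.offloaded.values())


def try_handoff(nic: NicResources, conn_id: str, expected_rate: float) -> bool:
    """Host-side decision: hand off only if the NIC has memory and compute headroom."""
    if conn_id in nic.offloaded:
        return True
    if len(nic.offloaded) >= nic.max_connections:
        return False                # NIC memory is full
    if nic.load() + expected_rate > nic.max_packet_rate:
        return False                # NIC compute budget would be exceeded
    nic.offloaded[conn_id] = expected_rate
    return True


def restore(nic: NicResources, conn_id: str) -> bool:
    """Return a connection to the host stack, freeing its NIC resources."""
    return nic.offloaded.pop(conn_id, None) is not None


nic = NicResources(max_connections=2, max_packet_rate=100.0)
print(try_handoff(nic, "conn-a", 60.0), try_handoff(nic, "conn-b", 50.0))
```

Because the host makes the decision per connection, connections that are not handed off keep using the unmodified host stack.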
[Diagram: network stack layers on the host OS (User Application, Socket, TCP, IP, Bypass, Ethernet, Driver) and on the NIC (Socket, TCP, IP, Ethernet, with Lookup, Transmit, and Receive).]
[Diagram: Data Structures. Labels: File; Socket; Socket Buffer; Events; Common Protocol Control Block; TCP Control Block; Cached Route; Process information. The handoff (offload) interface synchronizes sockets.]
Standard interface for offload firmware:
[Diagram labels: CPU, SRAM, DRAM, DMA, MAC.]
Experimental Results
[Figure: SPECweb99. Counts per packet (cycles, instructions) and L2 misses per packet, No Handoff versus Handoff, by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total.]
Writing firmware involves too many low-level hardware details, and firmware is not portable. A standard API for firmware can help.
Components:
1. Send/receive through MAC
[Figure: TCP Send: 256 total, 256 offloaded. Counts per packet (cycles, instructions) and L2 misses per packet, No Handoff versus Handoff.]
[Figure: Messages per packet, Web (No Handoff) versus Web (Handoff), by message type: Receive, Transmit, ch_nic_handoff, ch_nic_restore, ch_nic_send, ch_nic_recvd, ch_nic_ctrl, ch_nic_forward, ch_os_recv, ch_os_ack, ch_os_ctrl, ch_os_resource, ch_os_restore.]
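The message names read off the figure's axis labels (ch_nic_handoff, ch_nic_restore, ch_nic_send, ch_nic_recvd, ch_nic_ctrl, ch_nic_forward, ch_os_recv, ch_os_ack, ch_os_ctrl, ch_os_resource, ch_os_restore) can be collected into an enumeration, with a helper that computes a "messages per packet" ratio like the one the figure plots. This is only a sketch: the poster does not define the messages' semantics, and treating the `ch_nic_`/`ch_os_` prefix as the side that handles the message is an assumption, as are the helper names.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class HandoffMessage(Enum):
    """Message names as they appear on the figure's axis."""
    CH_NIC_HANDOFF = "ch_nic_handoff"
    CH_NIC_RESTORE = "ch_nic_restore"
    CH_NIC_SEND = "ch_nic_send"
    CH_NIC_RECVD = "ch_nic_recvd"
    CH_NIC_CTRL = "ch_nic_ctrl"
    CH_NIC_FORWARD = "ch_nic_forward"
    CH_OS_RECV = "ch_os_recv"
    CH_OS_ACK = "ch_os_ack"
    CH_OS_CTRL = "ch_os_ctrl"
    CH_OS_RESOURCE = "ch_os_resource"
    CH_OS_RESTORE = "ch_os_restore"


def handled_by(msg: HandoffMessage) -> str:
    """Assumed convention: the name prefix identifies the side handling the message."""
    return "NIC" if msg.value.startswith("ch_nic_") else "host OS"


def messages_per_packet(log: Iterable[HandoffMessage], packets: int) -> Dict[str, float]:
    """Average number of messages of each type per network packet."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    counts = Counter(m.value for m in log)
    return {name: count / packets for name, count in counts.items()}


# Hypothetical trace: three messages observed while four packets were processed.
trace = [HandoffMessage.CH_NIC_SEND, HandoffMessage.CH_NIC_SEND, HandoffMessage.CH_OS_ACK]
print(messages_per_packet(trace, packets=4))
```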
Prototype system:
Athlon XP CPU, 2 GB DRAM, FreeBSD 4.7
Alteon programmable Gigabit Ethernet NIC
Socket operations occur less frequently than transmit and receive, which reduces PCI message traffic.
Policy objectives: maximize packet rates, ensure fair allocation of NIC resources, etc.
[Diagram: Policy. Labels: NIC information, Connection information, Network Stack, Socket, Socket Buffer, Yes/No.]
Advantages of connection handoff (continued):
3. Socket interface unchanged
4. Can achieve zero-copy I/O
[Figure: TCP Send: 512 total, 256 offloaded. Legend: TCP Send (No Handoff), TCP Send (Handoff). Breakdown by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total.]
SPECweb99 simulation with a faster NIC: 26% increase in HTTP throughput
Standard interface for offload firmware, components (continued):
2. Read/write through DMA
3. Message exchange through driver
4. CPU and memory abstraction
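One way to picture a standard firmware API built from the components named on the poster (send/receive through MAC, read/write through DMA, message exchange through driver) is an abstract interface with a software-only implementation for testing. Everything below is a hypothetical sketch: the class names, method names, and the `tx_done` message are invented for illustration, and the fourth component (CPU and memory abstraction) is not modeled.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List, Optional


class OffloadFirmwareAPI(ABC):
    """Hypothetical portable interface that firmware would program against."""

    @abstractmethod
    def mac_send(self, frame: bytes) -> None: ...

    @abstractmethod
    def mac_receive(self) -> Optional[bytes]: ...

    @abstractmethod
    def dma_read(self, host_addr: int, length: int) -> bytes: ...

    @abstractmethod
    def dma_write(self, host_addr: int, data: bytes) -> None: ...

    @abstractmethod
    def post_message(self, msg: dict) -> None: ...


class InMemoryNic(OffloadFirmwareAPI):
    """Software stand-in for real hardware, useful for exercising firmware logic."""

    def __init__(self, host_memory_size: int) -> None:
        self.host_memory = bytearray(host_memory_size)
        self.tx_frames: List[bytes] = []
        self.rx_frames: Deque[bytes] = deque()
        self.driver_messages: List[dict] = []

    def mac_send(self, frame: bytes) -> None:
        self.tx_frames.append(bytes(frame))

    def mac_receive(self) -> Optional[bytes]:
        return self.rx_frames.popleft() if self.rx_frames else None

    def dma_read(self, host_addr: int, length: int) -> bytes:
        return bytes(self.host_memory[host_addr:host_addr + length])

    def dma_write(self, host_addr: int, data: bytes) -> None:
        self.host_memory[host_addr:host_addr + len(data)] = data

    def post_message(self, msg: dict) -> None:
        self.driver_messages.append(msg)


def transmit_from_host(api: OffloadFirmwareAPI, host_addr: int, length: int) -> None:
    """Firmware routine written only against the abstract API."""
    frame = api.dma_read(host_addr, length)                  # fetch data from host memory
    api.mac_send(frame)                                      # put it on the wire
    api.post_message({"type": "tx_done", "length": length})  # notify the driver


nic = InMemoryNic(host_memory_size=64)
nic.dma_write(8, b"hello")
transmit_from_host(nic, host_addr=8, length=5)
```

Because `transmit_from_host` touches hardware only through the interface, the same routine could in principle run on any NIC that supplies an implementation. That is the portability argument the poster makes.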