TCP Offload Through Connection Handoff
Hyong-youb Kim and Scott Rixner
Rice University
April 20, 2006

Full TCP Offloading

Move all TCP/IP processing to the network interface
Computation
  Saves processing resources on the host
  NIC can be customized for TCP/IP processing
Memory
  Reduces host memory references
  Network interface can exploit small, fast, local memory
Problems
  Network interface can become a performance bottleneck
    Limited computation on NIC
    Limited memory capacity on NIC
  Complicates global resource management in the stack

Solution: Connection Handoff

Only hand off established connections to NIC
Operating system controls division of work
Only TCP send and receive on the NIC
OS performs connection establishment, routing, …
No changes to sockets API
SPECweb99 performance
  17% and 32% reduction in cycles per packet
  15% and 27% improved throughput

Unmodified Network Stack

[Figure: unmodified network stack]
User Application: issues user requests
Host OS: Socket, TCP, IP, Ethernet, Driver (protocol/socket operations, packet generation, receive processing)
NIC: transmits and receives Ethernet frames

~3100 instructions per packet
~50% of all operations are memory references
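
To put the per-packet figures in perspective, here is a rough calculation of the host load they imply at line rate. The link speed and frame size are assumptions (a saturated 1 Gb/s link carrying 1500-byte frames, framing overhead ignored), and "operations" is read here as instructions; none of these come from the slide.

```python
# Back-of-the-envelope host load implied by the per-packet figures.
INSTRUCTIONS_PER_PACKET = 3100          # from the slide (approximate)
MEMORY_REF_FRACTION = 0.50              # from the slide (approximate)
LINK_BITS_PER_SECOND = 1_000_000_000    # assumption: saturated 1 Gb/s link
FRAME_BYTES = 1500                      # assumption: full-size frames, overhead ignored


def packets_per_second(link_bps: int, frame_bytes: int) -> float:
    """Packets per second needed to fill the link with frames of this size."""
    return link_bps / (frame_bytes * 8)


def per_second_load(pps: float) -> tuple:
    """Return (instructions per second, memory references per second)."""
    instructions = pps * INSTRUCTIONS_PER_PACKET
    memory_refs = instructions * MEMORY_REF_FRACTION
    return instructions, memory_refs


pps = packets_per_second(LINK_BITS_PER_SECOND, FRAME_BYTES)
instructions, memory_refs = per_second_load(pps)
print(f"{pps:,.0f} packets/s")
print(f"{instructions / 1e6:,.0f} M instructions/s")
print(f"{memory_refs / 1e6:,.0f} M memory references/s")
```

Under these assumptions the host handles roughly 83,000 packets per second, which is why the fraction of instructions that touch memory matters as much as the raw instruction count.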

Network Stack with Connection Handoff

[Figure: network stack with connection handoff]
User Application
Host OS: Socket, then TCP, IP, Ethernet (connection in OS) or Bypass, then Driver
NIC: Socket, TCP, IP, Ethernet (connection on NIC), Lookup, Transmit/Receive
Labels: protocol/socket operations, packet generation, receive processing

Packet send, same as unmodified stack
Packet receive now goes through lookup
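
The receive-side lookup can be sketched as a table keyed by the connection 4-tuple: a hit means the connection lives on the NIC, a miss means the packet goes up to the host stack as before. This is a minimal illustration only. The slides do not describe how the lookup is implemented, and `ConnKey`, `NicConnectionTable`, and the `"nic"`/`"host"` return values are hypothetical names.

```python
from typing import Dict, NamedTuple


class ConnKey(NamedTuple):
    """Hypothetical connection identifier: the usual TCP 4-tuple."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class NicConnectionTable:
    """Connections the OS has handed off to the NIC (illustrative only)."""

    def __init__(self) -> None:
        self._conns: Dict[ConnKey, dict] = {}

    def handoff(self, key: ConnKey) -> None:
        # Real TCP state (sequence numbers, windows, timers) is omitted.
        self._conns[key] = {"rx_bytes": 0}

    def restore(self, key: ConnKey) -> None:
        # Moving a connection back to the OS removes it from the table.
        self._conns.pop(key, None)

    def receive(self, key: ConnKey, payload: bytes) -> str:
        """Return where the packet is processed: 'nic' on a hit, 'host' on a miss."""
        conn = self._conns.get(key)
        if conn is None:
            return "host"   # not handed off: pass the packet to the host stack
        conn["rx_bytes"] += len(payload)
        return "nic"        # handed off: the NIC runs TCP receive processing


table = NicConnectionTable()
offloaded = ConnKey("10.0.0.1", 80, "10.0.0.2", 40000)
other = ConnKey("10.0.0.1", 80, "10.0.0.3", 40001)
table.handoff(offloaded)
print(table.receive(offloaded, b"GET /"))
print(table.receive(other, b"GET /"))
```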

Handoff Interface

Extend driver/OS API
Move connections
  Handoff (OS): move connection from OS to NIC
  Restore (OS, NIC): move connection from NIC to OS
Relay socket operations between OS and NIC
  Send (OS): insert send data into NIC's socket
  Acknowledge (NIC): remove ack'ed data from OS's socket
  Receive (NIC): insert received data into OS's socket
  Received (OS): remove received data from NIC's socket
  Control (OS, NIC): change socket states, etc.
Misc.
  Forward (OS), Post (OS), Resource (NIC)
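
The command list above can be written down as a small table for quick reference. This sketch assumes the parenthesized names on the slide indicate which side issues each command; `Issuer`, `HANDOFF_COMMANDS`, and `may_issue` are hypothetical names, and the three miscellaneous commands carry no description because the slide gives none.

```python
from enum import Enum


class Issuer(Enum):
    OS = "OS"
    NIC = "NIC"


# Command name -> (sides assumed to issue it, description transcribed from the slide).
HANDOFF_COMMANDS = {
    "handoff": ({Issuer.OS}, "move connection from OS to NIC"),
    "restore": ({Issuer.OS, Issuer.NIC}, "move connection from NIC to OS"),
    "send": ({Issuer.OS}, "insert send data into NIC's socket"),
    "acknowledge": ({Issuer.NIC}, "remove ack'ed data from OS's socket"),
    "receive": ({Issuer.NIC}, "insert received data into OS's socket"),
    "received": ({Issuer.OS}, "remove received data from NIC's socket"),
    "control": ({Issuer.OS, Issuer.NIC}, "change socket states, etc."),
    "forward": ({Issuer.OS}, None),     # not described on the slide
    "post": ({Issuer.OS}, None),        # not described on the slide
    "resource": ({Issuer.NIC}, None),   # not described on the slide
}


def may_issue(command: str, issuer: Issuer) -> bool:
    """True if the given side is listed as an issuer of the command."""
    issuers, _description = HANDOFF_COMMANDS[command]
    return issuer in issuers


print(may_issue("handoff", Issuer.OS))
print(may_issue("handoff", Issuer.NIC))
```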

Example Use

Accept connection, receive request, send response, close connection

Host OS               Handoff Command   NIC
Accept                handoff           Allocate connection
Enqueue data          receive           Receive data
Read data             received          Dequeue data
Write data            send              Enqueue data
Drop sent data        acknowledge       Receive ACK
Change socket state   control           Receive FIN
Close                 control           Send FIN
Destroy connection    control           Destroy connection
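
The exchange in this example can be replayed as a toy model to check that both sides end with empty queues and a destroyed connection. This is a sketch under simplifying assumptions: `SocketHalf` and `replay_example` are hypothetical names, the queues hold whole messages rather than byte ranges, and the state names are standard TCP states chosen for illustration rather than taken from the slides.

```python
from collections import deque
from typing import List, Tuple


class SocketHalf:
    """One side's view of a connection's socket (hypothetical, much simplified)."""

    def __init__(self) -> None:
        self.send_queue: deque = deque()
        self.recv_queue: deque = deque()
        self.state = "ESTABLISHED"


def replay_example() -> Tuple[List[str], SocketHalf, SocketHalf]:
    """Replay the request/response exchange and log each handoff command."""
    commands: List[str] = []
    host = SocketHalf()

    # Accept: the OS hands the established connection off; the NIC allocates it.
    commands.append("handoff")
    nic = SocketHalf()

    # Request arrives: the NIC receives data, the OS enqueues it in its socket.
    nic.recv_queue.append(b"request")
    commands.append("receive")
    host.recv_queue.append(b"request")

    # Application reads: the OS reports consumption, the NIC dequeues the data.
    host.recv_queue.popleft()
    commands.append("received")
    nic.recv_queue.popleft()

    # Application writes: the OS passes the response, the NIC enqueues it.
    host.send_queue.append(b"response")
    commands.append("send")
    nic.send_queue.append(b"response")

    # ACK arrives: the NIC reports it, the OS drops the sent data.
    nic.send_queue.popleft()
    commands.append("acknowledge")
    host.send_queue.popleft()

    # Peer's FIN arrives: the NIC reports it, the OS changes the socket state.
    nic.state = "CLOSE_WAIT"
    commands.append("control")
    host.state = "CLOSE_WAIT"

    # Application closes: the OS asks the NIC to send a FIN.
    host.state = "LAST_ACK"
    commands.append("control")
    nic.state = "LAST_ACK"

    # Teardown: both sides destroy the connection.
    commands.append("control")
    host.state = nic.state = "CLOSED"

    return commands, host, nic


commands, host, nic = replay_example()
print(" -> ".join(commands))
```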

Real Prototype

Modified FreeBSD 4.7
AMD Athlon XP 2800+ CPU
Alteon programmable Gigabit Ethernet NIC
  1 MB memory
    Limited to 256 connections
    Actual socket buffer data only in main memory
  88 MHz processor
    Limits maximum throughput
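
Because the NIC holds a bounded number of connections, the OS has to decide which connections to hand off. The sketch below shows one simple possibility, first come, first served, with the remainder staying on the host. It is a hypothetical policy for illustration, not the one used in the prototype; `HandoffPolicy` and its methods are invented names, and only the default limit of 256 comes from the slide.

```python
class HandoffPolicy:
    """First-come-first-served handoff under a fixed NIC connection limit."""

    def __init__(self, nic_capacity: int = 256) -> None:
        self.nic_capacity = nic_capacity
        self.on_nic: set = set()
        self.on_host: set = set()

    def accept(self, conn_id: int) -> str:
        """Place a newly established connection; return 'nic' or 'host'."""
        if len(self.on_nic) < self.nic_capacity:
            self.on_nic.add(conn_id)
            return "nic"
        # NIC is full: the connection stays in the host stack.
        self.on_host.add(conn_id)
        return "host"

    def close(self, conn_id: int) -> None:
        """Release whichever slot the connection occupied."""
        self.on_nic.discard(conn_id)
        self.on_host.discard(conn_id)


policy = HandoffPolicy(nic_capacity=2)   # small limit to keep the demo short
placements = [policy.accept(conn_id) for conn_id in (1, 2, 3)]
policy.close(1)                          # frees a NIC slot
placements.append(policy.accept(4))
print(placements)
```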

TCP Send

[Figure: cycles per packet (axis 0 to 7000) and L2 misses per packet (axis 0 to 9), broken down by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total. Series: No Handoff with 1 connection, No Handoff with 256 connections, Handoff with 256 connections.]

Simulated Machine

Prototype NIC is too slow
Simics full-system simulator
  Boots unmodified operating systems
  Use same software as real prototype
Simulated processor
  1 GHz functional x86 processor
  Timed memory to mimic Athlon XP 2800+
Simulated NIC
  450 MHz functional processor
  Timed 1 Gb/s Ethernet wire

SPECweb99, 1024 Connections

[Figure: cycles per packet (axis 0 to 14000), broken down by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total. Series: No Handoff, Handoff 1024 connections, Static (No Handoff), Static (Handoff 1024 connections). Annotations: 15% increase in HTTP throughput (Mb/s); 27% increase in HTTP throughput (Mb/s).]

Summary

Memory behavior limits TCP performance
Connection state accesses cause cache pressure
Offload can help, but full offload is problematic
Connection handoff: offloading made practical
  OS in charge of division of work
  Host network stack largely unaffected
Ongoing work: OS handoff policies