How to Minimize Transport Protocol Processing:
Implementation and Evaluation of
Network Level Framing
Pål Halvorsen, Thomas Plagemann, and Vera Goebel
Institute for Informatics, University of Oslo
Norway
4th International Workshop on
Multimedia Network Systems and Applications (MNSA ’02),
Vienna, Austria, July 2002
Overview
Application scenario
The INSTANCE project
Network Level Framing (NLF)
design and implementation
performance evaluation
Summary and conclusions
MNSA’02, Vienna, Austria, July 2002
© 2002 Pål Halvorsen
Application Scenario
Media-on-Demand server:
Used in applications such as News- or Video-on-Demand provided by city-wide cable or pay-per-view companies
[Figure: multimedia storage server connected to a network]
Retrieval is the bottleneck:
Some important factors:
• Memory management
• Communication protocol processing
• Error management
The INSTANCE Project
We try to make optimal use of a given set of resources:
• memory architecture
• integrated error management
• network level framing (NLF)
Project goals:
Optimize performance within a single server:
• Reduce resource requirements
• Maximize number of clients
Traditional Approach
[Figure: protocol stacks (transport, network, link) traversed on upload and download]
Upload to server: frequency low (1)
Download from server: frequency very high
Network Level Framing (NLF): Basic Idea
[Figure: protocol stacks (transport, network, link) for upload and download with NLF]
Upload to server: frequency low (1)
Download from server: frequency very high
When to Store Packets
[Figure: protocols used at each layer]
Transport layer: TCP or UDP/FEC, and UDP
Network layer: IP
Link layer
Splitting the UDP Protocol
Traditional udp_output():
• Temporarily connect
• Prepend UDP and IP headers
• Prepare pseudo header for checksum
• Calculate checksum
• Fill in other IP header fields
• Hand over datagram to IP
• Disconnect connected socket
udp_PreOut():
• Prepend UDP and IP headers
• Prepare pseudo header for checksum, clear unknown fields
• Precalculate checksum
udp_QuickOut():
• Update UDP and IP headers
• Update checksum, i.e., only add checksum of previously unknown fields
• Fill in some other IP header fields
• Hand over datagram to IP
Traditional Checksum Operations – I
The UDP checksum covers three fields:
A 12-byte pseudo header containing fields from the IP header
The 8-byte UDP header
The UDP data (payload)
Simplified checksum calculation function (in_cksum):
u_int16_t *w;          /* points at the current 16-bit word */
int checksum = 0;      /* running sum; carry folding and complement omitted */
for each mbuf in packet {
    w = mbuf->m_data;
    while data in mbuf {
        checksum += *w;    /* add the next 16-bit word */
        w++;
    }
}
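The simplified in_cksum loop can be made concrete as follows. This is a minimal sketch, not the NetBSD implementation: struct seg is a hypothetical stand-in for an mbuf, the name cksum_chain is illustrative, and the code sums byte by byte (assuming big-endian word order) so that segments of odd length are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: one segment of a packet. */
struct seg {
    const uint8_t *data;
    size_t len;
    const struct seg *next;
};

/* Internet checksum over a whole segment chain (value in host order). */
uint16_t cksum_chain(const struct seg *s)
{
    uint64_t sum = 0;
    size_t off = 0;   /* byte offset within the whole packet */

    for (; s != NULL; s = s->next) {
        for (size_t i = 0; i < s->len; i++, off++) {
            /* Even offsets hold the high byte of a 16-bit word. */
            sum += (off & 1) ? s->data[i] : (uint64_t)s->data[i] << 8;
        }
    }
    /* Fold carries back into the low 16 bits (one's complement sum). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Every byte of every segment is touched on each transmission, which is why the cost grows with the payload size.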
Traditional Checksum Operations – II
Traditional checksum operation:
u_int16_t *w;          /* points at the current 16-bit word */
int checksum = 0;      /* running sum; carry folding and complement omitted */
for each mbuf in packet {
    w = mbuf->m_data;
    while data in mbuf {
        checksum += *w;    /* add the next 16-bit word */
        w++;
    }
}
Modified Checksum Operations
NLF checksum operation:
[Figure: partial checksums are added to give the packet checksum, so the precalculated payload checksum is reused instead of being recomputed]
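The addition of partial checksums can be sketched as below. This is an illustration under stated assumptions rather than the NLF kernel code: the names oc_sum, oc_add, and nlf_checksum are hypothetical, words are taken in big-endian order, and the header part is assumed to have even length (true for the 12-byte pseudo header and the 8-byte UDP header), which is what makes the two sums combinable.

```c
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit one's complement sum of buf (not yet inverted).
 * An odd trailing byte is padded with a zero byte. */
uint16_t oc_sum(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's complement addition of two folded partial sums. */
uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFF) + (s >> 16));
}

/* Transmission-time step: only the even-length header is summed;
 * payload_sum was computed earlier, e.g. when the data was stored. */
uint16_t nlf_checksum(const uint8_t *hdr, size_t hdr_len, uint16_t payload_sum)
{
    return (uint16_t)~oc_add(oc_sum(hdr, hdr_len), payload_sum);
}
```

Because one's complement addition is associative, the result equals the checksum computed over header and payload in one pass, while the per-packet work no longer depends on the payload size.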
Implementation – I
Straightforward implementation:
[Figure: data with precalculated header (meta-data)]
To allow flexibility, we have one data file and one meta-data file:
[Figure: data file and meta-data file feeding UDP]
Implementation – II
NLF version 1:
• most of the UDP/IP processing time is spent on checksum calculation
• precalculate checksum over data payload
• during transmission time:
    • generate header
    • calculate checksum over header and add precalculated payload checksum
NLF version 2:
• several reports show increased performance using header templates
• precalculate checksum over data payload
• during stream open:
    • generate header template
    • calculate header checksum
• during transmission time:
    • block copy header template
    • add header template checksum, payload checksum, and packet length field
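The version 2 steps can be sketched for the UDP header alone. Everything here is an assumption made for illustration: struct udp_template, tmpl_open, tmpl_fill, and payload_sum16 are hypothetical names, IP addresses and ports are passed in host order, the IP header is left out, and payloads are assumed to be at most 65527 bytes. The UDP length is added twice because it appears in both the pseudo header and the UDP header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { UDP_HDR_LEN = 8, UDP_PROTO = 17 };

/* Hypothetical per-stream state, built once at stream open. */
struct udp_template {
    uint8_t  hdr[UDP_HDR_LEN]; /* ports filled in; length and checksum zero */
    uint16_t sum;              /* folded sum of the constant checksum input */
};

static uint16_t fold16(uint32_t s)
{
    while (s >> 16)
        s = (s & 0xFFFF) + (s >> 16);
    return (uint16_t)s;
}

/* Off-line step: folded sum of the payload (odd byte zero-padded). */
uint16_t payload_sum16(const uint8_t *p, size_t len)
{
    uint32_t s = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        s += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        s += (uint32_t)p[i] << 8;
    return fold16(s);
}

/* Stream open: generate the header template and its checksum. */
void tmpl_open(struct udp_template *t, uint32_t src_ip, uint32_t dst_ip,
               uint16_t sport, uint16_t dport)
{
    memset(t->hdr, 0, sizeof t->hdr);
    t->hdr[0] = (uint8_t)(sport >> 8); t->hdr[1] = (uint8_t)sport;
    t->hdr[2] = (uint8_t)(dport >> 8); t->hdr[3] = (uint8_t)dport;
    /* Pseudo header addresses and protocol, plus the two ports. */
    t->sum = fold16((src_ip >> 16) + (src_ip & 0xFFFF) +
                    (dst_ip >> 16) + (dst_ip & 0xFFFF) +
                    UDP_PROTO + sport + dport);
}

/* Transmission time: block copy the template, then add the template
 * checksum, the payload checksum, and the packet length field. */
void tmpl_fill(const struct udp_template *t, uint16_t payload_len,
               uint16_t payload_sum, uint8_t out[UDP_HDR_LEN])
{
    uint16_t udp_len = (uint16_t)(UDP_HDR_LEN + payload_len);
    uint16_t ck;

    memcpy(out, t->hdr, UDP_HDR_LEN);
    out[4] = (uint8_t)(udp_len >> 8); out[5] = (uint8_t)udp_len;
    ck = (uint16_t)~fold16((uint32_t)t->sum + payload_sum + 2u * udp_len);
    if (ck == 0)
        ck = 0xFFFF;  /* UDP sends an all-zero checksum as all ones */
    out[6] = (uint8_t)(ck >> 8); out[7] = (uint8_t)ck;
}
```

A receiver summing the pseudo header, the filled header, and the payload should obtain 0xFFFF, which is the usual validity check for this checksum.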
Performance: Test Setup
• Implemented in NetBSD 1.5.2
• Dell Precision Workstation 620
    • PIII 933 MHz CPU
    • 3Com 1 Gbps NIC
• Software probe
    • RDTSC instruction
    • CPUID instruction
    • probe overhead 206 cycles
• Performed tests using 1 KB, 2 KB, 4 KB, and 8 KB UDP packets
• Transmitting 225 MB of data
• Data is transmitted using the zero-copy data path
Performance: Checksum
[Figure: checksum overhead in CPU cycles (axis 0 to 7000) versus packet size (1 KB, 2 KB, 4 KB, 8 KB); legend: Traditional, UDP data, UDP data + header; off-scale bars labelled 11899 and 23674]
Annotations:
• Overhead increases linearly with payload size
• Overhead is constant regardless of payload
• ~50 cycles less
Performance: Header Overhead
[Figure: header overhead in CPU cycles (axis 0 to 1000) versus packet size (1 KB, 2 KB, 4 KB, 8 KB); legend: NLF v1, NLF v2; annotation: ~25 cycles more]
NLF version 3: use header template checksum, but generate header instead of block copy
Performance: UDP
[Figure: UDP processing in CPU cycles (axis 0 to 7000) versus packet size (1 KB, 2 KB, 4 KB, 8 KB); legend: Traditional, NLF v1, NLF v2, NLF v3; off-scale bars labelled 12304 and 24108]
Conclusions and Future Work
Network Level Framing reduces communication system processing by precalculating
• payload checksum (off-line)
• header checksum (at stream open)
Gain per packet depends on packet payload size, e.g., 97.3 % for 1 KB and 99.6 % for 8 KB
Our mechanisms (at least) double the number of concurrent clients
Ongoing and future work:
• NLF in lower protocols (ongoing)
• On-board processing
Questions?
Related Work
Checksum caching in memory
• at high data rates, cached elements will be removed before they can be reused
Header templates
• block-copying is time consuming
On-board processing
• useful and becoming “off-the-shelf” hardware
• may be nice to combine with NLF