Profiling Grid Data Transfer
Protocols and Servers
George Kola, Tevfik Kosar and Miron Livny
University of Wisconsin-Madison
USA
Motivation

- Scientific experiments are generating large amounts of data
- Education research & commercial videos are not far behind
- Data may be generated and stored at multiple sites
- How to efficiently store and process this data?

Application  First Data  Data Volume (TB/yr)  Users
SDSS         1999        10                   100s
LIGO         2002        250                  100s
ATLAS/CMS    2005        5,000                1000s
WCER         2004        500+                 100s

Source: GriPhyN Proposal, 2000

2/33
Motivation

- Grid enables large scale computation
- Problems:
  - Data intensive applications have suboptimal performance
  - Scaling up creates problems
  - Storage servers thrash and crash
- Users want to reduce the failure rate and improve throughput

3/33
Profiling Protocols and Servers

- Profiling is a first step
- Enables us to understand how time is spent
- Gives valuable insights
- Helps:
  - computer architects add processor features
  - OS designers add OS features
  - middleware developers optimize the middleware
  - application designers design adaptive applications

4/33
Profiling

- We (middleware designers) are aiming for automated tuning
  - Tune protocol parameters and the concurrency level
  - Depends on the dynamic state of the network and storage server
- We are developing low overhead online analysis
- Detailed offline + online analysis would enable automated tuning (a rough sketch of the idea follows)

5/33
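As a rough illustration of the automated tuning described on this slide, here is a minimal sketch in Python of a feedback loop that searches over protocol parameters and keeps the best observed throughput. The `run_transfer` hook, the candidate values, and the simple grid-search strategy are all illustrative assumptions, not the authors' method.

```python
import itertools


def run_transfer(block_size, tcp_buffer, streams):
    """Placeholder: perform one test transfer with the given parameters
    and return the measured rate in MB/s. Wire this to the real tool."""
    raise NotImplementedError


def tune(block_sizes, tcp_buffers, stream_counts):
    """Try every parameter combination and return (best_rate, best_params)."""
    best_rate, best_params = 0.0, None
    for params in itertools.product(block_sizes, tcp_buffers, stream_counts):
        rate = run_transfer(*params)
        if rate > best_rate:
            best_rate, best_params = rate, params
    return best_rate, best_params


# Example candidate grid (illustrative values only):
# tune(block_sizes=[64 * 1024, 1024 * 1024],
#      tcp_buffers=[256 * 1024, 1024 * 1024],
#      stream_counts=[1, 2, 4, 8])
```

An online tuner would replace the exhaustive grid with an incremental search driven by the live measurements the slide mentions.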
Profiling

- Requirements:
  - Should not alter system characteristics
  - Full system profile
  - Low overhead
- Used OProfile (see the sketch below)
  - Based on the Digital Continuous Profiling Infrastructure
  - Kernel profiling
  - No instrumentation
  - Low overhead / tunable overhead

6/33
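For concreteness, a minimal sketch of driving OProfile's legacy opcontrol/opreport command-line interface around a workload so the sample covers the whole system, kernel included. The wrapper function is hypothetical, `--no-vmlinux` trades kernel symbol detail for convenience, and the commands require root privileges.

```python
import subprocess


def profile(workload_cmd):
    """Run `workload_cmd` (a list, e.g. a globus-url-copy invocation)
    under system-wide OProfile sampling and return the report text."""
    subprocess.run(["opcontrol", "--no-vmlinux"], check=True)  # kernel profiled without symbols
    subprocess.run(["opcontrol", "--reset"], check=True)       # clear any old samples
    subprocess.run(["opcontrol", "--start"], check=True)       # begin sampling
    try:
        subprocess.run(workload_cmd, check=True)               # the transfer being profiled
    finally:
        subprocess.run(["opcontrol", "--stop"], check=True)    # stop sampling
    # Per-image breakdown: kernel, libc, the server binary, drivers, etc.
    return subprocess.run(["opreport"], capture_output=True, text=True).stdout
```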
Profiling Setup

- Two server machines
  - Moderate server: 1660 MHz Athlon XP CPU with 512 MB RAM
  - Powerful server: dual 2.4 GHz Pentium 4 Xeon CPUs with 1 GB RAM
- Client machines (dual Xeons) were more powerful than the servers, to isolate server performance
- 100 Mbps network connectivity
- Linux kernel 2.4.20, GridFTP server 2.4.3, NeST prerelease

7/33
GridFTP Profile

[Bar chart: percentage of CPU time spent in Idle, Ethernet Driver, Interrupt Handling, Libc, Globus, OProfile, IDE, File I/O, and Rest of Kernel, for reads from and writes to the GridFTP server]

Read Rate = 6.45 MB/s, Write Rate = 7.83 MB/s
=> Writes to the server are faster than reads from it

8/33
GridFTP Profile

[Same CPU-time chart as slide 8]

Writes to the network more expensive than reads
=> Interrupt coalescing

9/33
GridFTP Profile

[Same CPU-time chart as slide 8]

IDE reads more expensive than writes

10/33
GridFTP Profile

[Same CPU-time chart as slide 8]

File system writes costlier than reads
=> Need to allocate disk blocks

11/33
GridFTP Profile

[Same CPU-time chart as slide 8]

More overhead for writes because of the higher transfer rate

12/33
GridFTP Profile Summary

- Writes to the network more expensive than reads
  - Interrupt coalescing
  - DMA would help
- IDE reads more expensive than writes
  - Tuning the disk elevator algorithm would help (see the sketch below)
- Writing to the file system is costlier than reading
  - Need to allocate disk blocks
  - Larger block size would help

13/33
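A side note on the elevator suggestion above: the study's Linux 2.4 kernel tuned the elevator with the elvtune tool; on current kernels the I/O scheduler is selected through sysfs instead. A minimal illustrative sketch, assuming a modern kernel and a device named sda (both assumptions, not the paper's setup):

```python
from pathlib import Path

# Inspect the available I/O schedulers; the active one is bracketed,
# e.g. "[mq-deadline] kyber bfq none".
sched = Path("/sys/block/sda/queue/scheduler")  # "sda" is an assumed device name
print(sched.read_text().strip())

# Switching schedulers requires root:
# sched.write_text("mq-deadline")
```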
NeST Profile

[Bar chart: percentage of CPU time spent in Idle, Ethernet Driver, Interrupt Handling, Libc, NeST, OProfile, IDE, File I/O, and Rest of Kernel, for reads from and writes to the NeST server]

Read Rate = 7.69 MB/s, Write Rate = 5.5 MB/s

14/33
NeST Profile

[Same CPU-time chart as slide 14]

Similar trend as GridFTP

15/33
NeST Profile

[Same CPU-time chart as slide 14]

More overhead for reads because of the higher transfer rate

16/33
NeST Profile

[Same CPU-time chart as slide 14]

Metadata updates (space allocation) make NeST writes more expensive

17/33
GridFTP versus NeST

- GridFTP: Read Rate = 6.45 MB/s, Write Rate = 7.83 MB/s
- NeST: Read Rate = 7.69 MB/s, Write Rate = 5.5 MB/s
- GridFTP is 16% slower on reads
  - GridFTP's I/O block size is 1 MB (NeST's is 64 KB)
  - Disk I/O and network I/O do not overlap (see the sketch below)
- NeST is 30% slower on writes
  - Lots (NeST's space reservation/allocation mechanism) add metadata overhead

18/33
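To make the overlap point concrete, here is a minimal sketch (mine, not the authors' code, and assuming a plain connected TCP socket) of pipelining disk reads with network sends through a two-slot queue. Without such overlap, each 1 MB block read leaves the network idle for the whole read, which is one way a larger block size can hurt.

```python
import queue
import threading

BLOCK_SIZE = 1024 * 1024  # 1 MB, the GridFTP block size in this study


def send_file(path, sock):
    blocks = queue.Queue(maxsize=2)  # at most two blocks in flight: double buffering

    def reader():
        with open(path, "rb") as f:
            while chunk := f.read(BLOCK_SIZE):
                blocks.put(chunk)    # blocks when the queue is full
        blocks.put(None)             # sentinel: end of file

    threading.Thread(target=reader, daemon=True).start()
    while (chunk := blocks.get()) is not None:
        sock.sendall(chunk)          # network send overlaps the next disk read
```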
Effect of Protocol Parameters

- Different tunable parameters (two are sketched below):
  - I/O block size
  - TCP buffer size
  - Number of parallel streams
  - Number of concurrent transfers

19/33
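A minimal sketch of how two of these knobs map onto the plain sockets API: the TCP buffer size via SO_SNDBUF, and parallel streams by simply opening several connections. The host, port, and sizes below are illustrative assumptions; a real GridFTP client negotiates its data channels rather than using a helper like this.

```python
import socket


def open_streams(host, port, n_streams, tcp_buffer):
    """Open `n_streams` TCP connections with the requested send-buffer size."""
    streams = []
    for _ in range(n_streams):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Request a send-buffer size; the kernel may round or cap the value.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, tcp_buffer)
        s.connect((host, port))
        streams.append(s)
    return streams


# e.g. four parallel streams with 256 KB send buffers (illustrative values):
# streams = open_streams("server.example.org", 5000, 4, 256 * 1024)
```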
Read Transfer Rate
20/33
Server CPU Load on Read
21/33
Write Transfer Rate
22/33
Server CPU Load on Write
23/33
Transfer Rate and CPU Load
24/33
Server CPU Load and L2 DTLB misses
25/33
L2 DTLB Misses

Parallelism triggers the kernel to use a larger page size
=> lower DTLB miss rate
26/33
Profiles on Powerful Server

The next set of graphs was obtained using the powerful server.
27/33
Parallel Streams versus Concurrency
28/33
Effect of File Size (Local Area)
29/33
Transfer Rate versus Parallelism in
Short Latency (10 ms) Wide Area
30/33
Server CPU Utilization
31/33
Conclusion

- Full system profiles give valuable insights
- Larger I/O block size may lower the transfer rate
  - Network and disk I/O are not overlapped
- Parallelism may reduce CPU load
  - May cause the kernel to use a larger page size
  - Processor support for variable sized pages would be useful
  - Operating system support for variable page size would be useful
- Concurrency improves throughput at the cost of increased server load

32/33
Questions

Contact:
- [email protected]
- www.cs.wisc.edu/condor/publications.html

33/33