Tuning Windows 2003 and 2008 for Symantec NetBackup
Introduction
Symantec NetBackup is an enterprise-class data protection product with a huge portfolio of
features and broad platform and application support. In heterogeneous environments,
however, the out-of-the-box operating system and NetBackup settings are often not
sufficient. NetBackup supports many platforms for the master servers, media servers, and
clients, including Windows 2003 and Windows 2008 (and 2008 R2 as of version 7). Unix and
Linux platforms almost always seem to cope better with the I/O load, whereas the
Windows platform is not equally suited.
Fortunately, there are many tuning parameters available in the NT kernel, and some are
relevant to NetBackup. This article will cover a few of these parameters and settings that I
have discovered over the years working with NetBackup. Microsoft decided, for some
reason, not to use good default values for high I/O load, although with Windows 2008, a lot
more parameters are auto-tuned for this type of load. This article covers Windows 2003 as
well as Windows 2008, and differences in tuning will be pointed out.
As NetBackup is a backup product, the I/O paths are the primary concern: we need to move the
data as efficiently as possible to minimize the strain on the infrastructure and keep the
backup windows short. What not everyone considers is that we also have to provide
efficient means of restoring all the data, so fast I/O from tape and disk is an
important part as well.
Note: All numeric values used for Windows registry parameters are in decimal mode,
and not the default hexadecimal.
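Because the registry editor displays DWORD values in hexadecimal by default, it can help to convert the decimal values used in this article before entering them. The following Python sketch is only an illustration; the helper name dword_hex is an assumption and not part of any Windows or NetBackup tool.

```python
def dword_hex(value: int) -> str:
    """Return a decimal value as the 8-digit hex form of a 32-bit DWORD."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a 32-bit DWORD")
    return f"0x{value:08X}"


if __name__ == "__main__":
    # Decimal values that appear later in this article.
    for decimal in (1, 65535, 900000):
        print(decimal, "->", dword_hex(decimal))
```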
General considerations
I/O paths
For an application such as NetBackup, a key factor for success is to design the I/O paths in
such a way that maximum throughput is possible on the server’s backplane.
Typically the data enters through the network interfaces, passes through the CPU, and is
then sent on to the tape drives or disks.
With regard to network interfaces, it is preferable to team multiple interfaces for incoming
traffic. This requires the network switch to be configured for IEEE 802.3ad link
aggregation. It is important to allow the switch to distribute incoming packets in order to
fully utilize the bandwidth.
Normal host-based teaming usually supports only failover and outbound traffic load balancing.
For NetBackup, balancing outbound traffic is seldom useful, unless vaulting between media
servers or using a network-based Disk Pool appliance such as PureDisk, NetApp or DataDomain.
With regard to HBAs for SAN connectivity, the I/O for disk and tape should be split. Tape I/O
is synchronous and can impact the disk I/O severely. Also, use several HBA ports in order to
distribute traffic to the tape drives. For example, a 4 Gbit HBA port can serve up to four
LTO3 drives, but real-world experience shows that a maximum of two drives per port works
better, due to I/O interrupt handling and other hardware and kernel constraints. Also, if
possible, use several single-port HBAs rather than one multi-port HBA, and distribute them
over the available I/O slots in the server. This
typically improves the balancing of the I/O on the backplane, CPU, and memory.
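The drives-per-port rule of thumb can be sanity-checked with simple arithmetic. The Python sketch below uses assumed round figures that are not taken from this article: roughly 400 MB/s of usable payload on a 4 Gbit Fibre Channel port, 80 MB/s native speed for an LTO3 drive, and 160 MB/s when the data compresses 2:1. It ignores interrupt handling and protocol overhead, so it gives an upper bound only.

```python
import math

# Assumed round figures (not from the article), in MB/s.
PORT_MB_S = 400.0        # usable payload of one 4 Gbit FC port
LTO3_NATIVE_MB_S = 80.0  # LTO3 native streaming rate
LTO3_2TO1_MB_S = 160.0   # LTO3 rate with 2:1 compressible data


def drives_per_port(port_mb_s: float, drive_mb_s: float) -> int:
    """Number of drives one port can keep streaming at full speed."""
    return math.floor(port_mb_s / drive_mb_s)


if __name__ == "__main__":
    print("native speed:", drives_per_port(PORT_MB_S, LTO3_NATIVE_MB_S))
    print("2:1 compression:", drives_per_port(PORT_MB_S, LTO3_2TO1_MB_S))
```

Under these assumptions the port is saturated by two drives as soon as the data compresses well, which is in line with the two-drives-per-port experience described above.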
Persistent bindings
It is of the utmost importance to configure every HBA with persistent bindings in order not to
“confuse” the Windows kernel if a path disappears and later becomes active again, or if
the server is rebooted. In many cases the kernel will allocate a new internal path name,
making the path NetBackup is using non-functional. The symptom seen in NetBackup is
MISSING PATH in the Device Monitor. The various HBA vendors have their own tools for
configuring the settings on the HBA, so refer to the respective vendor’s documentation and
tools when configuring persistent bindings.
Software
Windows 2003 R2 with all applicable updates is preferred. Additional software for SAN
connectivity is required when disk and/or tape drives are SAN attached, and should be the
latest recommended version from the respective vendor.
Anti-virus software is common on Windows, and in order to maintain performance
it must be configured to exclude NetBackup processes and directories. For instance, see
tech note 295599 (Symantec, 2008b) for further information on excluding directories and
processes for McAfee.
Services
On any Windows server there are many services started automatically. Some can safely be
left running, but one service in particular should always be disabled: the Removable Storage
service.
This service tends to interfere with NetBackup’s device management and should be disabled
in the Services section of the Computer Management Console. Follow the instructions in tech
note 245559 (Symantec, 2003).
As a consequence of disabling the Removable Storage service, the system may log events in
the system log regarding DCOM errors. These errors are harmless and the workaround is
presented in tech note 240378 (Symantec, 2008a).
Another consequence is that, when backing up the NetBackup servers, the bpbkar process logs an
error because the RSM is not running. This can be solved by excluding the
<system_drive>:\WINDOWS\system32\ntmsdata directory on those servers. Please see tech
note 247001 (Symantec, 2004) for more information.
Device drivers
Always strive to use the latest supported combination of device drivers and firmware for
network adapters and HBAs. Today, most drivers come with default settings that are close to
optimal. We can, however, further configure the OS kernel in order to remove some
overhead and issues.
Disable device driver verification
Windows 2003 and 2008 by default perform random testing of device drivers. By
disabling this we can gain better performance, as we really don't want the kernel to spend
time randomly testing, for debugging purposes, drivers that we know are working fine. This
parameter is documented for Windows 2003 and 2008, but no conclusive evidence has been
found yet for Windows 2008R2.
Parameter: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\DontVerifyRandomDrivers
Value: 1
Disable Test Unit Ready
When using tape libraries and tape drives on a SAN where the tape drives are shared among
several media servers, it is highly recommended to disable Test Unit Ready (TUR)
functionality for the tape device drivers. Follow the procedure documented by Microsoft
(2009a). The impact is primarily where NetBackup is configured with the Shared Storage Option
(SSO) for tape drives, as any Windows-based media server may send SCSI
commands to the drives to check whether they are ready. In SSO configurations, a tape drive
may very well be in use by another host, and SCSI commands sent from another server
would interfere with it, so that backup and restore operations suffer from slow
performance or even failures.
Virtual memory
It is important to size the virtual memory swap file properly prior to installing NetBackup. A
general recommendation is to have a swap file at least two times the size of physical memory.
It must be preset to that size, and not auto-extended.
The reason for this is that when a swap file must be extended automatically, the I/O operation
in memory is denied and a failure is reported in NetBackup (most likely a status 81 for the
jobs). This in turn effectively aborts the backup job on the media server. This is a
behavior of the Windows operating system, and can only be avoided by pre-sizing the swap
file.
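The sizing rule above is simple arithmetic. As a minimal Python sketch (the function name and the example memory size are illustrative assumptions), the initial and maximum size are set to the same value so that the file never has to grow:

```python
def swap_file_mb(physical_memory_gb: int, factor: int = 2) -> int:
    """Fixed swap file size in MB: factor times the physical memory."""
    return physical_memory_gb * 1024 * factor


if __name__ == "__main__":
    size = swap_file_mb(16)  # hypothetical server with 16 GB of RAM
    # Use the same number for both fields to prevent auto-extension.
    print(f"Initial size (MB): {size}")
    print(f"Maximum size (MB): {size}")
```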
Storage system
Cache
By default the Windows 2003 operating system is optimized for file services, and thus will
prioritize the file system cache in memory. For Media servers sending data directly to tape,
NAS device, or other OpenStorage devices it may be better to optimize the kernel for
applications instead. Media servers having Disk Storage Units (DSU) of Basic or Enterprise
type may be better off with the default setting though, in order to have a file cache.
Two registry variables are of interest when tuning the file system cache:
Parameter: HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\Size
Value: 3
The default for the Size variable is 3, which maximizes throughput for both file sharing
and network applications in general.
Parameter: HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache
Value: 0
The LargeSystemCache variable should be set to 0 in order to minimize the file system cache
and thus allow more memory for network applications. On servers with plenty of memory,
say 8GB or more, the settings may very well be left unchanged.
Disabling Last accessed
The NTFS file system records the last accessed time for each file and directory, adding to the
I/O operations required when accessing files. An access is defined as any type of operation,
such as listing a directory, or reading, writing or otherwise updating a file or directory.
If the last access information is not required by company or audit policies, the NetBackup
master server can benefit from disabling it. As the catalog database consists of many
thousands if not millions of files, having the kernel update the access time of each file
adds overhead.
Parameter: HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\NTFSDisableLastAccessUpdate
Value: 1
The variable has to be added, using the DWORD type. Set the value to 1 in order to disable
last access time stamping.
Disabling 8.3 file names
The NTFS file system keeps a short (8.3) name for every file in order to maintain compatibility
with older operating systems. However, this is not required for a NetBackup master
server, and by disabling it we decrease the number of I/O operations needed per file
creation. Note that once it is disabled, 16-bit applications must not be run on the master
server.
Parameter: HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\NTFSDisable8dot3NameCreation
Value: 1
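If the two file system settings are to be applied to several master servers, they can be collected in a .reg file and imported. The Python sketch below only generates the text of such a file; the function name reg_file is an assumption made for illustration. Note that .reg files use the full root key name and store DWORD data as eight hexadecimal digits, so the decimal value 1 becomes 00000001.

```python
# Full root key name, as required inside .reg files.
FS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"


def reg_file(entries) -> str:
    """Build .reg file text from (key, value name, decimal DWORD) tuples."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, value in entries:
        lines.append(f"[{key}]")
        lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(reg_file([
        (FS_KEY, "NtfsDisableLastAccessUpdate", 1),
        (FS_KEY, "NtfsDisable8dot3NameCreation", 1),
    ]))
```

Review the generated text before importing it, and keep a backup of the registry key.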
Networking
There are a number of TCP parameters that can be tuned in order to accommodate typical
NetBackup I/O. The I/O pattern for a Windows server is normally not a sustained data
transfer, but rather short bursts of I/O.
TCP keepalive time
In certain situations, there can be a delay before a NetBackup master server detects that
the connection to a media server has been lost. For example, if a
media server goes down while running a backup, the master server may take some time
to notice that the media server is no longer available. While at first it may appear that
there is a problem with the NetBackup master server, this delay is actually the result of a
TCP/IP configuration parameter called KeepAliveTime, which is set to 7,200,000 (two
hours, in milliseconds) by default. Decrease the value to 900000 (15 minutes).
The effect of this delay is that NetBackup jobs running on that media server appear to be
active for a period of time after the connection to the media server has gone down. In some
cases this can result in an undesirable delay before the current backup job fails and is
subjected to the normal NetBackup retry logic for execution on a different media server, if
one is available.
Another scenario where it is important to use a low timeout is where a firewall is in the I/O
path. Typically this is the case in secure networks or when taking backup of servers in a DMZ
or otherwise untrusted network.
Firewalls typically drop the session if no traffic occurs for a set time. NetBackup does not
respond very well to this, and the jobs will fail. This usually happens during incremental
backups, as it can take a very long time before the client sends data to the
media server. Set KeepAliveTime to a value lower than the firewall's timeout, and this
problem is solved.
Parameter: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime
Value: 900000 (0xDBBA0 in hexadecimal)
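KeepAliveTime is given in milliseconds, so a value below the firewall's idle timeout has to be converted from minutes. A minimal Python sketch, assuming a hypothetical firewall idle timeout of 30 minutes and a safety factor of one half (both are illustrative choices, not recommendations from any vendor):

```python
DEFAULT_KEEPALIVE_MS = 7_200_000  # Windows default: two hours


def keepalive_ms(firewall_idle_timeout_min: int, divisor: int = 2) -> int:
    """KeepAliveTime in ms, set to a fraction of the firewall idle timeout."""
    return firewall_idle_timeout_min * 60 * 1000 // divisor


if __name__ == "__main__":
    value = keepalive_ms(30)  # hypothetical 30-minute firewall timeout
    print(f"KeepAliveTime = {value} ms ({value // 60000} minutes)")
```

With these assumed inputs the result is the 900000 ms (15 minutes) suggested above.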
TCPWindowSize and Window Scaling
In Windows 2003, TcpWindowSize should be raised for gigabit network interfaces. Without
window scaling, the maximum value is 65535.
Windows 2008 (and R2): this parameter is obsolete and disregarded by the kernel.
Parameter: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize
Value: 65535
For Windows 2003, it may also be useful to enable TCP window scaling in order to allow
windows larger than 64 KB. Tuning this may not actually be necessary; only testing will
show whether it improves the I/O throughput. Windows supports the RFC 1323
option.
Parameter: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts
Value: 1
With window scaling enabled, the TcpWindowSize variable can be set to a value of up to 1 GB.
Once the variables are set and the system has been rebooted, the TCP/IP stack will support
large windows.
Windows 2008/2008R2: Like TcpWindowSize, Tcp1323Opts is deprecated in Windows 2008
(and R2).
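Whether window scaling is worth testing depends on the bandwidth-delay product of the link: the amount of data that can be in flight during one round trip. The Python sketch below computes it and the smallest RFC 1323 shift count that covers it. The link speed and round-trip times are assumed example figures, and the function names are illustrative.

```python
MAX_UNSCALED_WINDOW = 65535  # largest window without RFC 1323 scaling
MAX_SHIFT = 14               # largest shift count allowed by RFC 1323


def bandwidth_delay_product(bits_per_second: int, rtt_microseconds: int) -> int:
    """Bytes in flight during one round trip at the given link speed."""
    return bits_per_second * rtt_microseconds // 8_000_000


def window_scale_shift(window_bytes: int) -> int:
    """Smallest shift count whose scaled 16-bit window covers window_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if MAX_UNSCALED_WINDOW << shift >= window_bytes:
            return shift
    raise ValueError("window exceeds the RFC 1323 maximum")


if __name__ == "__main__":
    gigabit = 1_000_000_000
    for rtt_us in (500, 1000, 10000):  # assumed round-trip times
        bdp = bandwidth_delay_product(gigabit, rtt_us)
        print(f"RTT {rtt_us} us: BDP {bdp} bytes, shift {window_scale_shift(bdp)}")
```

If the computed value stays below 65535 bytes, as on a fast local network with sub-millisecond round trips, scaling is unlikely to make a difference.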
MaxHashTableSize
On media servers with many concurrent connections, for example with high multiplexing and
many concurrent sessions to disk, it may be useful to set this variable to a higher
value than the default. The default is calculated as 128 * CPUs^2. The maximum value is 65535
(DWORD).
Windows 2008/2008R2: this parameter is obsolete and disregarded by the kernel.
Parameter: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\MaxHashTableSize
Value: 65535
NumTcbTablePartitions
By default this variable is calculated as CPUs^2. This may not be the best setting for servers
with 8 or more CPUs. For most large servers it is better to use a value equal to 4 x CPUs.
Windows 2008/2008R2: this parameter is obsolete and disregarded by the kernel.
Parameter: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\NumTcbTablePartitions
Value: 16
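To see how the defaults for MaxHashTableSize and NumTcbTablePartitions grow with the CPU count, the formulas quoted above can be tabulated. This Python sketch only restates those formulas; the function names are illustrative.

```python
def default_max_hash_table_size(cpus: int) -> int:
    """Default MaxHashTableSize: 128 * CPUs^2."""
    return 128 * cpus ** 2


def default_tcb_partitions(cpus: int) -> int:
    """Default NumTcbTablePartitions: CPUs^2."""
    return cpus ** 2


def suggested_tcb_partitions(cpus: int) -> int:
    """Value suggested in this article for large servers: 4 x CPUs."""
    return 4 * cpus


if __name__ == "__main__":
    print("CPUs  hash default  partitions default  partitions suggested")
    for cpus in (2, 4, 8, 16):
        print(f"{cpus:>4}  {default_max_hash_table_size(cpus):>12}"
              f"  {default_tcb_partitions(cpus):>18}"
              f"  {suggested_tcb_partitions(cpus):>20}")
```

At four CPUs the default and the suggested partition count are both 16; the two only diverge on larger servers.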
MaxUserPort
The default number of ports per IP address is only 5000. For a large NetBackup domain this
may not be sufficient to allow a large number of concurrent connections
between the master server, media servers and clients. The variable is really only useful on
master and media servers, unless the client is heavily loaded as well, such as when it
serves as a web or database server.
Windows 2003 supports up to 65534 concurrent ports per IP address. The variable does not
exist by default, and must be created manually. The first 1024 ports are reserved, so it
makes little sense to set it to the maximum value. If a host has more than 60000 concurrent
connections, we probably have other problems such as CPU and disk bottlenecks, but a value of
60000 would at least leave us ample room.
Parameter: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort
Value: 60000
In Windows 2008, including Windows 2008R2, the way of setting this has changed, and we
use the netsh command to configure the start port and the size of the range. By default, the
start port is 49152 and the end port is 65535, which leaves us with 16384 usable dynamic
ports. If the NetBackup environment is very large, we may still have to tune the available
range. This is done by entering the following commands, which allow 50000 dynamic ports
(10000 through 59999):
netsh int ipv4 set dynamicport tcp start=10000 num=50000
netsh int ipv4 set dynamicport udp start=10000 num=50000
netsh int ipv6 set dynamicport tcp start=10000 num=50000
netsh int ipv6 set dynamicport udp start=10000 num=50000
The UDP ports are just set to have the same range, but NetBackup does not really use UDP
ports.
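The start and num arguments of the netsh commands define an inclusive port range, which is easy to get wrong by one. A minimal Python sketch of the arithmetic (the function name is an illustrative assumption; the only check made is that the range ends at or below port 65535):

```python
def dynamic_port_range(start: int, num: int) -> tuple:
    """Return (first, last) port of a range given as start port and count."""
    last = start + num - 1
    if num < 1 or last > 65535:
        raise ValueError("range must be non-empty and end at or below 65535")
    return start, last


if __name__ == "__main__":
    # Windows 2008 default range, then the range used in the commands above.
    print(dynamic_port_range(49152, 16384))
    print(dynamic_port_range(10000, 50000))
```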
Processes
Kernel threads
By default, the Windows operating system does not optimize the kernel settings for many
concurrent threads. When the OS is started the kernel allocates structures for the kernel
worker threads which will carry out the actual work that the running processes require, such
as device driver I/O, the kernel itself and other internal components.
NetBackup puts a very high load on the master and media servers, as many processes are
started on them for each active job. Typically, the master server is
maxed out with the default kernel thread settings when the domain reaches approximately
300 clients.
We could spread the backup window for the clients, but that may not always be possible due
to other constraints. What we can do is to allocate the maximum possible kernel threads, so
that the kernel can serve as many processes as possible at any time.
We are interested in three variables covering kernel threads:
• DefaultNumberofWorkerThreads
• AdditionalDelayedWorkerThreads
• AdditionalCriticalWorkerThreads
DefaultNumberofWorkerThreads controls the number of threads allocated for each work
queue in the kernel. Note: Allocating too many threads may use more system resources than
is optimal.
Delayed worker threads are used for work that is not real-time or otherwise time-critical.
Memory for these threads may be swapped out from CPU cache and memory while in queue.
Critical worker threads, used for time-critical work, have high priority, and their memory
pages must stay in CPU cache or memory.
Parameter: HKLM\SYSTEM\CurrentControlSet\Services\RpcXdr\Parameters\DefaultNumberofWorkerThreads
Value: 64
Parameter: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive\AdditionalDelayedWorkerThreads
Value: 16
Parameter: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive\AdditionalCriticalWorkerThreads
Value: 16
All three variables use DWORD as type. The AdditionalDelayedWorkerThreads and
AdditionalCriticalWorkerThreads variables should already exist, but the
RpcXdr\Parameters\DefaultNumberofWorkerThreads path and variable will have to be
created.
The AdditionalDelayedWorkerThreads and AdditionalCriticalWorkerThreads variables
should be set to a value of 16, and DefaultNumberofWorkerThreads to 64.
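The three values can be set with the reg add command instead of the registry editor. The Python sketch below only prints the command lines so that they can be reviewed before being run in an elevated command prompt; the helper name reg_add is an assumption for illustration. reg add creates a missing key, which covers the RpcXdr\Parameters path, and /f overwrites an existing value without prompting.

```python
EXECUTIVE = r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive"
RPCXDR = r"HKLM\SYSTEM\CurrentControlSet\Services\RpcXdr\Parameters"

# (key, value name, decimal data) as listed in this section.
THREAD_SETTINGS = [
    (RPCXDR, "DefaultNumberofWorkerThreads", 64),
    (EXECUTIVE, "AdditionalDelayedWorkerThreads", 16),
    (EXECUTIVE, "AdditionalCriticalWorkerThreads", 16),
]


def reg_add(key: str, name: str, value: int) -> str:
    """Build a 'reg add' command line that sets one REG_DWORD value."""
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'


if __name__ == "__main__":
    for key, name, value in THREAD_SETTINGS:
        print(reg_add(key, name, value))
```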
CPU affinity
On media servers with many CPUs it can be beneficial to the I/O throughput to control which
CPUs handle network I/O and which CPUs handle tape or disk I/O. By controlling this we
can tell the OS kernel thread scheduler not to do unnecessary context switches, but to let
the various I/O threads stay on their respective CPUs. Context switching and memory page
faults are very expensive in high I/O load applications such as NetBackup.
The CPU affinity can be configured by using the Interrupt Filter Configuration Tool
(intfiltr.exe) available in the Windows 2003 Resource Kit Tools.
NOTE: Use great care when using this tool, and work from the physical console. The tool
allows selecting the various devices present in the system. Select a network device and add it
to the interrupt filter. Note: It may be necessary to select the “Don’t Restart Device when
Making Changes” option prior to adding it to the filter in order to avoid service
interruption or a crashed system.
Once the device is present in the filter, the CPU masking can be set by clicking on the “Set
Mask” button in the “Interrupt Affinity Mask box”.
NOTE: Some devices may not work with the affinity setting. A reboot may be necessary,
and if the device still does not work after a reboot, removal of the filter is required, and
no CPU affinity can be used for that device.
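An affinity mask is a bit mask in which bit n stands for CPU n. As a small Python sketch (the function name and the example split of CPUs between network and tape I/O are illustrative assumptions), the mask for a set of CPUs can be computed like this:

```python
def affinity_mask(cpus) -> int:
    """Return the bit mask in which bit n is set for every CPU n in cpus."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers start at 0")
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Hypothetical split on a four-CPU media server.
    print("network adapters:", hex(affinity_mask([0, 1])))
    print("tape HBAs:", hex(affinity_mask([2, 3])))
```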
On Windows 2008R2, the kernel provides better control of resources using the NUMA
(non-uniform memory access) architecture. Applications that demand high performance
can be written so that their threads are distributed over several cores or kept on one CPU.
In general, using the principle of locality generates fewer context switches on the CPUs.
In Windows 2008, the intfiltr.exe tool has been replaced by the IntPolicy tool (Microsoft,
2007).
References
Microsoft (2003) Performance Tuning Guidelines for Windows Server 2003. [Online].
Available from: http://download.microsoft.com/download/2/8/0/2800a... (Accessed: 22 July,
2010)
Microsoft (2007) Interrupt-Affinity Policy Tool. [Online]. Available from:
http://www.microsoft.com/whdc/system/sysperf/IntPo... (Accessed: 2 August, 2010)
Microsoft (2009a) Windows Server 2003 cannot perform backup jobs to
tape devices on a storage area network. [Online]. Available from:
http://support.microsoft.com/kb/842411 (Accessed: 22 July, 2010)
Microsoft (2009b) Performance Tuning Guidelines for Windows Server 2008 R2. [Online].
Available from: http://www.microsoft.com/whdc/system/sysperf/Perf_... (Accessed: 21 July,
2010)
Symantec (2003) How to disable the Removable Storage Manager service to avoid conflict
with VERITAS NetBackup. [Online]. Available from:
http://seer.entsupport.symantec.com/docs/245559.htm
Symantec (2004) Problems report showing Removable Storage Management Win32 1058
error. [Online]. Available from: http://seer.entsupport.symantec.com/docs/247001.htm
Symantec (2008a) GENERAL ERROR: After disabling Removable Storage Management
(RSM) services on Windows 2000 and 2003, the system event viewer log reports Evt ID:
10005. NtmsSvc DCOM errors. [Online]. Available from:
http://seer.entsupport.symantec.com/docs/240378.htm
Symantec (2008b) 3RD PARTY: NetBackup Services are randomly shutting down on
Windows servers. [Online]. Available from:
http://seer.entsupport.symantec.com/docs/295599.htm
Symantec (2010a) Symantec NetBackup™ Backup Planning and Performance Tuning Guide
- UNIX, Windows, and Linux - Release 6.5. [Online]. Available from:
ftp://exftpp.symantec.com/pub/support/products/Net... (Accessed: 21 July, 2010)