A Case Study: Fast and Reliable Backup and Restore of a VLDB over the Network

SQL Server Technical Article

Writer: Thomas H. Grohser (=tg=)
Contributor: Lindsey Allen
Technical Reviewers: Sanjay Mishra, Lubor Kollar, Stuart Ozer, Thomas Kejser, Juergen Thomas, James Podgorski, Burzin Patel
Published: June 2009
Applies to: SQL Server 2008

Summary: Database sizes increase constantly, as do requirements for access and availability. At the same time, it is more important than ever to have a fast and reliable backup and recovery plan in place. This document discusses the challenges of designing a robust backup and restore solution for very large databases (VLDBs). Using a real-world example, it demonstrates how to make the best use of the backup and restore features of SQL Server 2008 to help create a backup and restore plan for VLDBs over the network.

Copyright

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document.
Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

© 2009 Microsoft Corporation. All rights reserved.

Microsoft, SQL Server, and Windows are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Contents

Introduction
Solution Overview
    Service-Level Agreement
    Solution Overview
How to Use This Document
Implement a Reliable Backup over the Network
    When Is a Backup Valid?
    Basic Options with the Built-in Backup and Restore Features
    What Can Go Wrong?
Performing Reliable Backups over the Network
    Handling File Storage Corruption on the Backup Files
    Handling Database Server Failure During Backup
    Handling Backup Server Failure
Making Sure the Backup Is Valid
    Restoring the Database to Test the Backup
    Running DBCC CHECKDB
    Automate the Process
Implement Fast Backup over the Network
    Getting Backup to Perform Faster
    Using Multiple Disks and Files in Parallel
    Using Multiple NICs
    Recommendations for the Total Number of Files to Use
Other Optimizations and Recommendations
    Separating the Backup and the Access Network
    Jumbo Frames
    BLOCKSIZE
    BUFFERCOUNT and MAXTRANSFERSIZE
    Backup Compression
Hardware Recommendations for the Backup File Servers
    Disk Drives
    RAID Controller Settings
    HBA Settings
    Network Interface Cards
    NUMA-Based Systems
Calculating the Time to Back Up and Restore a Database
Conclusion
Appendix: Hardware Used

Introduction

Backup and restore is like many things in life; you may not appreciate it until you need it for the first time. Over the years we have all seen cases where there was no valid backup when one was needed; it has resulted in companies going out of business, people getting fired, and data being lost. Companies often have what seem like valid reasons for not implementing a backup and restore plan. We often hear quotes like: “We have a cluster, so we do not need backup” or “We don’t need a restore strategy, because we make backups every day.” What we have learned throughout the years is that you always need a backup and restore plan.

Solution Overview

Service-Level Agreement

The most important step for a successful backup is to establish the required Service-Level Agreement (SLA). If you don’t know what your SLA is, you can’t create a backup and restore strategy. The SLA should answer the following questions:

- The size of the database. This includes the current size and expected growth per month or year.
- Workload volume. This is how much transaction log is created on average and at peak times.
- Time to recover. The time to recover is the maximum amount of time a restore from backup after a disastrous failure can take. Do not confuse time to recover with high availability; time to recover addresses the situation in which high availability measures fail and recovery from backup is the only option.
- Worst-case scenario. This is the worst case for this organization’s particular hardware and software configuration. It could be anything from complete data center failure with loss of all data to the failure of a single server.

There are many other SLA points that should be agreed on, but they are not relevant here. For this case study, the SLA and the requirements were as follows:

- The size of the database:
2 terabytes, expected to grow to 4 terabytes within the lifetime of the solution.
- Workload volume: very high, up to 5 gigabytes (GB) per minute at peak, and about 0.5 GB per minute during the time of the backup window.
- Time to recover: four hours.
- Worst-case scenario: complete data center failure and loss of all data in a data center.

Solution Overview

Out-of-the-box SQL Server backup and restore functionality is straightforward. The real challenge is to design a disaster-proof solution that meets all the business requirements while maintaining high performance. Very large databases (VLDBs) pose special problems when it comes to backup operations, particularly if they support a great deal of activity. For the purposes of disaster recovery, you might want to back up the databases to a different location over the network, which introduces another level of challenges. Imagine a two-terabyte online transaction processing (OLTP) database with a heavy workload, high transactional throughput, and demanding availability requirements, for which it is not feasible to use the Virtual Backup Device Interface (VDI) API of the Microsoft® SQL Server® database software to perform SAN-based backups. Additionally, imagine that the Service-Level Agreement (SLA) requires a full restore in less than four hours (including applying logs). This white paper describes a solution for this very problem implemented using SQL Server 2008.

A complete backup solution includes a full backup, potentially differential backups, and frequent log backups. In this case study, differential backup actually increases the overall solution duration (from backup to restore), so differential backups are not used. Log backup is an integral part of the solution and is executed every minute. The setup and configuration of log backup, shipping, and restore is very complex for a large-scale, highly available deployment, so it will be described in a different paper.
This paper focuses on techniques for full backup performance and scalability.

How to Use This Document

This document gives you an overview of how to make reliable and fast full backups over the network. After it outlines the requirements in the SLA, this white paper shows how the solution was implemented and which options and parameters were chosen for the various settings involved in backup and restore. It also provides tips and considerations that will help you design your own solution. This document is meant to provide a practical case study of how to efficiently use the backup and restore features in SQL Server 2008 for full backups (most of the features are available in SQL Server 2005 as well, but all sample scripts use SQL Server 2008 syntax and capabilities). The same principles apply for copy-only, differential, and transaction log backups, but samples and details for these are outside the scope of this white paper. This document does not provide you with a backup or restore plan; it shows you how to perform a reliable backup that you can use for data recovery later. Because how often you back up your data depends on your business, the question of data backup frequency is not covered here.

Implement a Reliable Backup over the Network

First and foremost, the solution needs to be reliable. A reliable backup implies that the backup can be successfully restored and can pass an application integrity check.

When Is a Backup Valid?

This is one of the questions where if you ask ten different people you will get eleven different answers. In this case, we defined a valid backup as one that meets the following criteria:

- It must exist in a remote location, such as a remote data center.
- It must be successfully restored.
- The restored database must show no errors when DBCC CHECKDB is run.

So basically a complete backup cycle consists of the following steps:

1. Back up the database.
2. Copy the files to a remote location for storage.
The storage could be a file server, a tape device, or both. (This step can be combined with the first one by directly backing up over the network.)
3. Restore the database from the remote location.
4. Run DBCC CHECKDB.
5. Run sanity checks on your data to make sure it’s valid. This step is optional, but we recommend it. A basic application sanity check is to run ETL from the restored database into a data warehouse environment and then run a predefined set of reports to verify data integrity.

Note: Not all companies have a remote data center; if remote restoration is not possible, at least store the backups in a different room from the original data. Never store the backups on the same rack, server, or disk as the original data.

Basic Options with the Built-in Backup and Restore Features

With the built-in backup and restore commands in SQL Server 2008, you have the following options:

- Back up to a local disk and copy the files over the network later.
- Back up directly to a backup share over the network.

If you back up to a local disk, the actual backup runs faster but requires extra disk space on the database server. If you back up to a network share, overall performance improves, and no extra disk space is required on the database server. The process for the second option is simpler than the process for the first and is therefore less error-prone. Unless you need to store the backup files locally, you should always back up to a backup share. We will focus on this scenario for the rest of this paper.

What Can Go Wrong?

It’s easier to find out how to do something correctly by first looking at what can go wrong. Identify possible failure conditions, and then design ways to protect your organization from these conditions. For example, you might identify the following possibilities:

- Bits become corrupted on the backup file during the backup.
- The source server fails during backup.
- Backup storage fails during backup.
- Backup storage fails after the backup completes.
The following sections describe the measures we took to protect our backup from these four failure conditions.

Performing Reliable Backups over the Network

Let’s get started. We have a database, MyVLDB, that we want to back up. MyVLDB is on a server, SQL01; the file server we want to back up to is BAK01. The name of the file share is backup. Figure 1 illustrates this configuration.

Figure 1: Backing up over the network

The following command backs the database up to the file server.

BACKUP DATABASE MyVLDB
TO DISK = '\\BAK01\backup\MyVLDB.bak'
WITH INIT;

Although this is a valid command, it does not include any protection against the conditions identified earlier as possible areas of weakness. The following sections walk through some of the things that can go wrong and provide some solutions you can adapt to your own environment.

Handling File Storage Corruption on the Backup Files

Files can become corrupt during storage. The first and easiest step you can take is to put a checksum in the backup files to make sure any kind of file corruption is detected.

BACKUP DATABASE MyVLDB
TO DISK = '\\BAK01\backup\MyVLDB.bak'
WITH CHECKSUM, INIT;

The option WITH CHECKSUM places a checksum on every page in the backup file, which enables detection of corrupt pages in the backup file during restore. The WITH CHECKSUM option adds 2%-3% overhead, but it is definitely worth it for reliability reasons.

Handling Database Server Failure During Backup

The next problem to solve is what happens if your database server fails during a backup. If the server fails during a backup, that backup file is not usable. Restarting the backup after the server is either restarted or repaired is the only way to recover. You should definitely avoid overwriting existing backup files in case a failure happens during backup, so that you will at least have the last successful backup to fall back on. One of the ways to do this is to keep a backup sequence number.
You can set this number to increase each time a backup is performed, or you can build it on the time and date of the backup, which works if you back up on a regular basis, such as once an hour or once a day.

BACKUP DATABASE MyVLDB
TO DISK = '\\BAK01\backup\MyVLDB<BackupSequenceNumber>.bak'
WITH CHECKSUM, INIT;

The following example creates a backup sequence number based on the Coordinated Universal Time (UTC) date.

-- use GETUTCDATE() as backup sequence number
DECLARE @FileName AS nvarchar(4000) =
    '\\BAK01\backup\MyVLDB' +
    CONVERT(nvarchar(32), GETUTCDATE(), 112) + '.bak';
BACKUP DATABASE MyVLDB
TO DISK = @FileName
WITH CHECKSUM, INIT;

The following example creates a backup sequence number that increments by one each time a backup is run.

-- use a sequential number as backup sequence number
-- the following code must be executed only once per database to create the
-- required table to store the sequence number
USE MyVLDB;
GO
CREATE SCHEMA DBBackup;
GO
CREATE TABLE DBBackup.Sequence(
    Number int NOT NULL,
    ID int NOT NULL,
    CONSTRAINT PK_Seq PRIMARY KEY CLUSTERED (ID ASC)
);
ALTER TABLE DBBackup.Sequence WITH CHECK
    ADD CONSTRAINT CK_Seq CHECK (ID = 1);
ALTER TABLE DBBackup.Sequence CHECK CONSTRAINT CK_Seq;
INSERT INTO DBBackup.Sequence (Number, ID) VALUES (1, 1);

-- this is the corresponding backup command
DECLARE @SeqNumber AS int;
UPDATE MyVLDB.DBBackup.Sequence
    SET @SeqNumber = Number = Number + 1 WHERE ID = 1;
DECLARE @FileName AS nvarchar(4000) =
    '\\BAK01\backup\MyVLDB' +
    RIGHT('0000000' + CAST(@SeqNumber AS nvarchar(8)), 8) + '.bak';
BACKUP DATABASE MyVLDB
TO DISK = @FileName
WITH CHECKSUM, INIT;

Although this second method is more complex, it is also more flexible. The other examples in this white paper build on it.

Handling Backup Server Failure

Backup file servers or their storage are another point of failure from which you need to protect yourself. The way to prevent this failure is to use a second server (BAK02, in this example).
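One way to drive the alternation between the two backup file servers is to reuse the DBBackup.Sequence table from the previous example and pick the target from the parity of the sequence number. The following is a minimal sketch of that idea, assuming the same server names and share used throughout this example:

```sql
-- Sketch: alternate the backup target between BAK01 and BAK02 based on
-- the parity of the backup sequence number (odd -> BAK01, even -> BAK02).
DECLARE @SeqNumber AS int;
UPDATE MyVLDB.DBBackup.Sequence
    SET @SeqNumber = Number = Number + 1 WHERE ID = 1;

DECLARE @Server AS nvarchar(16) =
    CASE WHEN @SeqNumber % 2 = 1 THEN N'BAK01' ELSE N'BAK02' END;
DECLARE @FileName AS nvarchar(4000) =
    N'\\' + @Server + N'\backup\MyVLDB' +
    RIGHT('0000000' + CAST(@SeqNumber AS nvarchar(8)), 8) + '.bak';

BACKUP DATABASE MyVLDB
TO DISK = @FileName
WITH CHECKSUM, INIT;
```

With this sequence, odd-numbered backups accumulate on BAK01 and even-numbered ones on BAK02.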
Placing this server in a second data center provides the best protection against our worst-case scenario. The following figure shows a backup sequence that uses a second server and applies the other techniques we’ve discussed.

Figure 2: Protecting against backup server failure (SQL01 backs up MyVLDB00000001.bak, MyVLDB00000003.bak, MyVLDB00000005.bak, ... to BAK01 and MyVLDB00000002.bak, MyVLDB00000004.bak, ... to BAK02)

The backups are distributed round-robin across the two servers, so after two completed backup cycles, you can survive a complete server failure because you have a valid backup on the second server. If you don’t have a second backup file server, you should try to separate the logical unit numbers (LUNs) so that if you keep multiple generations of backups, a partial failure of the disk subsystem affects only one backup set.

Making Sure the Backup Is Valid

The next logical step after making the backup is to test whether the backup files are valid. The best way to do this is to restore the database.

Restoring the Database to Test the Backup

You need to have an instance of SQL Server installed on the backup server and a LUN on your backup server to restore the databases to (it should be big enough to hold all the files of your largest database). RAID 0 is an acceptable compromise between performance, cost, and availability, because only temporary data is stored there, which can be easily restored if required. RAID 1/0 is also an option, if you can afford it. Here is a basic layout of the backup server file system.

C:\backup
    MyVLDB00000001.bak
    MyVLDB00000003.bak
C:\restore

Figure 3: Backup server file system

To retrieve the disk layout of the database backed up in the file MyVLDB00000001.bak, you can issue the following command.
RESTORE FILELISTONLY FROM DISK = 'C:\backup\MyVLDB00000001.bak';

From this information, build a RESTORE command that places the database data files in the available locations under the restore mount point on your backup server, and then execute it.

RESTORE DATABASE MyVLDB
FROM DISK = 'C:\backup\MyVLDB00000001.bak'
WITH
    MOVE 'MyVLDB' TO 'C:\restore\MyVLDB.mdf',
    MOVE 'MyVLDB_log' TO 'C:\restore\MyVLDB.ldf',
    CHECKSUM;

If the restore completes without errors, you know that the backup files were not corrupted.

Running DBCC CHECKDB

To check the integrity of the restored database, run the following command.

DBCC CHECKDB(MyVLDB)
WITH NO_INFOMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY;

If the command returns no errors, you know that you have a valid backup of the database.

Automate the Process

Here is the code for a script that takes the file name of the first backup file as input, performs the whole restore job automatically, and then runs DBCC CHECKDB at the end.
DECLARE @dbName AS sysname = 'MyVLDB';
DECLARE @BackupFileName AS nvarchar(4000) = 'C:\backup\MyVLDB00000001.bak';

-- table variable to capture the output of RESTORE FILELISTONLY
DECLARE @DBFiles AS TABLE (
    LogicalName nvarchar(128),
    PhysicalName nvarchar(260),
    Type char(1),
    FileGroupName nvarchar(128),
    Size numeric(20,0),
    MaxSize numeric(20,0),
    FileID bigint,
    CreateLSN numeric(25,0),
    DropLSN numeric(25,0) NULL,
    UniqueID uniqueidentifier,
    ReadOnlyLSN numeric(25,0) NULL,
    ReadWriteLSN numeric(25,0) NULL,
    BackupSizeInBytes bigint,
    SourceBlockSize int,
    FileGroupID int,
    LogGroupGUID uniqueidentifier NULL,
    DifferentialBaseLSN numeric(25,0) NULL,
    DifferentialBaseGUID uniqueidentifier,
    IsReadOnly bit,
    IsPresent bit,
    TDEThumbprint varbinary(32) NULL
);
INSERT INTO @DBFiles
    EXEC sp_executesql
        N'RESTORE FILELISTONLY FROM DISK = @FileName',
        N'@FileName nvarchar(4000)',
        @FileName = @BackupFileName;

-- Find the last valid verified backup so that we don't get confused at the end
DECLARE @lastRestoreID AS bigint = ISNULL(
    (SELECT TOP(1) restore_history_id
     FROM msdb.dbo.restorehistory
     WHERE restore_type = 'D'
       AND destination_database_name = @dbName
     ORDER BY restore_history_id DESC), 0);

-- Build the restore command, adding a MOVE clause for every file;
-- each clause ends with ', ' so CHECKSUM can be appended after the last one
DECLARE @SQLcmd AS nvarchar(4000) =
    'RESTORE DATABASE @RestoreName FROM DISK = @FileName WITH ';
WHILE ((SELECT COUNT(*) FROM @DBFiles) > 0)
BEGIN
    DECLARE @LN AS sysname =
        (SELECT TOP(1) LogicalName FROM @DBFiles ORDER BY LogicalName);
    SET @SQLcmd += 'MOVE ''' + @LN + ''' TO ''C:\Restore\' + @dbName + @LN + '.dbf''';
    DELETE @DBFiles WHERE LogicalName = @LN;
    SET @SQLcmd += ', ';
END
SET @SQLcmd += ' CHECKSUM';

DECLARE @RC AS int;
EXEC @RC = sp_executesql @SQLcmd,
    N'@RestoreName sysname, @FileName nvarchar(4000)',
    @RestoreName = @dbName,
    @FileName = @BackupFileName;
IF (@RC <> 0 OR NOT EXISTS(
        SELECT TOP(1) * FROM msdb.dbo.restorehistory
        WHERE restore_type = 'D'
          AND destination_database_name = @dbName
          AND restore_history_id > @lastRestoreID))
    RAISERROR ('Database restore NOT successful', 16, 0);

DBCC CHECKDB(@dbName)
WITH NO_INFOMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY;
IF @@ERROR <> 0
    RAISERROR ('DBCC CHECKDB of restored database NOT successful', 16, 0);

High Performance Backup over the Network

Backing up a 2-terabyte database onto a local hard drive and then restoring it meets the time-to-recover requirements of our SQL Server deployments, but this simple process does not provide adequate protection against our SLA’s worst-case scenario. On the other hand, backing up the database over the network to another location provides protection against our worst-case scenario, but with a typical 1-gigabit-per-second (Gbps) connection, performance suffers. When we looked at this situation, as a baseline test, we backed up the 2-terabyte database over a 1-Gbps network link to another data center 10 miles away. This took more than 24 hours, which was far from acceptable. We needed to design a solution that would remove the bottleneck, enabling us to complete the backup within the time-to-recover window specified in our SLA.

By assembling all the small pieces of the backup puzzle and running endless tests, we were able to reduce the backup of the 2-terabyte database to 36 minutes. The solution, which we termed “multistreamed backups over the network”, used eight 1-Gbps network connections. Jumbo frames were configured for each network card, and the network cards were then aggregated through layer 2 backup switches onto one 10GE (10 Gigabit Ethernet) line that ran to the second site. Backup now took 2 hours and 15 minutes. Enabling SQL Server 2008 backup compression finally brought the time down to 36 minutes.

The database consisted of 32 data files and 1 log file, spanning about 2.5 terabytes across 9 LUNs (data files on an enterprise-class SAN and log files on direct attached storage). The table below shows the backup durations of the baseline and the two faster backup approaches. The details of each implementation are described in later sections.
Approach                                        Network                            Duration
Baseline                                        Single 1-Gbps NIC, default config  >24 hours
Multistreamed backup over the network           8x 1-Gbps NICs, jumbo frames       2 hours 15 minutes
Multistreamed backup with backup compression    8x 1-Gbps NICs, jumbo frames       36 minutes

Table: Duration of backing up 2 TB to a server 10 miles away

Speed Up the Backup

There are two ways to speed things up: optimization and parallelization. First, let’s take a look at how backup operates internally. The number of backup threads depends upon the number of logical volumes used for the database files and the number of backup devices. This is very important information, because it allows us to control the degree of parallelism of the backup. Up until now we worked on the disk layout shown in the diagram below. One technique to improve performance is to increase the number of disks and files used, as described in the next section.

Figure 4: Sample database file and backup file disk layout (SQL01: MyVLDB.mdf, MyVLDB.ldf; BAK01: MyVLDB.bak)

Using Multiple Disks and Files in Parallel

One approach is to increase speed by increasing the number of disks and files used. Let’s see what happens if we add another LUN on each of the servers and spread both the database and the backup over the LUNs.

Figure 5: Adding LUNs (SQL01: MyVLDB_1.mdf, MyVLDB_2.ndf, MyVLDB.ldf; BAK01: MyVLDB00000001_1.bak, MyVLDB00000001_2.bak)

This is the command to back up the database to the two files.

BACKUP DATABASE MyVLDB
TO DISK = '\\BAK01\backup\MyVLDB00000001_1.bak',
   DISK = '\\BAK01\backup\MyVLDB00000001_2.bak'
WITH CHECKSUM, INIT;

This command results in some performance gain, but not the fourfold increase you would expect (double from read plus double from write). To fully realize this gain, you need to add network bandwidth.

Using Multiple NICs

For backups over the network, the most likely bottleneck is the network.
Increasing the bandwidth of the network above 1 Gbps is not cheap, but adding more 1-Gbps network interface cards (NICs) to your server is a relatively inexpensive option. If you add two NICs for the backup on both the database and the backup file server, your configuration now looks like figure 6.

Figure 6: Adding NICs (SQL01 and BAK01, each with one access NIC and two backup NICs)

Some hardware vendors offer teaming software to bundle multiple physical network cards into one logical network card. This works well for servers where hundreds of clients access the server over the network, because the load-balancing algorithms used all depend on having many different clients. In our case, however, we want to increase the speed between two individual servers. This is done by placing the NICs in different logical subnets. The following table lists the subnets for each network card and server.

NIC        Server SQL01                     Server BAK01
Access     192.168.1.1 MASK 255.255.255.0   192.168.1.2 MASK 255.255.255.0
Backup 01  192.168.2.1 MASK 255.255.255.0   192.168.2.2 MASK 255.255.255.0
Backup 02  192.168.3.1 MASK 255.255.255.0   192.168.3.2 MASK 255.255.255.0

Table 1: Subnets

Each network card is in a different subnet (192.168.1.0/24 through 192.168.3.0/24). If we now modify our backup command to specify the IP address instead of the server name, we can control which subnet, and therefore which NIC, is used to transport the data. The fact that all logical subnets reside on the same physical layer 2 network has no influence.

BACKUP DATABASE MyVLDB
TO DISK = '\\192.168.2.2\backup\MyVLDB00000001_1.bak',
   DISK = '\\192.168.3.2\backup\MyVLDB00000001_2.bak'
WITH CHECKSUM, INIT;

A restore works the same way.

RESTORE DATABASE MyVLDB
FROM DISK = '\\192.168.2.2\backup\MyVLDB00000001_1.bak',
     DISK = '\\192.168.3.2\backup\MyVLDB00000001_2.bak'
WITH CHECKSUM, NORECOVERY;

This also works with more than two extra network cards. So far I have tested it with up to 16 network cards in parallel.
There is also a performance gain if you send more than one file at a time over a network card.

BACKUP DATABASE MyVLDB
TO DISK = '\\192.168.2.2\backup\MyVLDB00000001_1.bak',
   DISK = '\\192.168.3.2\backup\MyVLDB00000001_2.bak',
   DISK = '\\192.168.2.2\backup\MyVLDB00000001_3.bak',
   DISK = '\\192.168.3.2\backup\MyVLDB00000001_4.bak'
WITH CHECKSUM, INIT;

Empirical tests have shown that, depending on the hardware used (mostly the server and NICs), two to four files per NIC are a good choice. Testing will tell you what the optimal number for your particular system is.

Recommendations for the Total Number of Files to Use

Based on many experiments, I can give you the following advice on how to configure your backups to perform optimally. The bottom line is that backup works best if every resource is used equally. All of the following equations should be true for optimal results. They are listed in order of importance, with the recommended value for n shown in parentheses.

NumberOfBackupFiles / NumberOfCores = n,        n ∈ {1, 2, 4, 8, 16, ...}                                (recommended: n = 1)
NumberOfBackupFiles / NumberOfNICs = n,         n ∈ {1, 2, 3, 4, 5, ...}                                 (recommended: n = 3 or 4)
NumberOfDatabaseFiles / NumberOfLUNs = n,       n ∈ {1, 2, 3, 4, ...}                                    (recommended: n = 1)
NumberOfDatabaseFiles / NumberOfBackupFiles = n, n ∈ {..., 1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16, ...}     (recommended: n = 1)
NumberOfDatabaseFiles / NumberOfCores = n,      n ∈ {..., 1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16, ...}      (recommended: n = 1)

The database files should be evenly distributed across LUNs, and the backup files should be evenly distributed across network cards. For example, place two files on each LUN; do not place two files on one LUN and one on the other. Table 2 shows examples of balanced files and disks for good backup performance. These numbers are for backup performance only; you will not see a performance increase in normal database I/O between a database that has four files and a database that has eight files on the same number of LUNs. Putting these formulas in context, here are some examples of good combinations.
CPU cores   LUNs for data files   NICs for backup   Data files   Backup files
 2            1                     1                 2             2
 4            2                     2                 2             4
 4            4                     2                 4             4
 8            2                     2                 2             8
 8            4                     2                 4             8
16            2                     2                 8             8
16            4                     4                 8             8
16            4                     4                16            16
16            8                     4                16            16
32            8                     4                16            16
32            8                     8                16            32
32           16                     8                16            32
64           16                     8                32            32
64           32                    16                32            64

Table 2: Balanced file and disk combinations

Other Optimizations and Recommendations

Separating the Backup and the Access Network

In general, it is a good idea to physically separate the access network and the backup network. Due to the totally different nature of access and backup traffic, switches may not always work optimally if both kinds of traffic are handled at the same time. Also, the high-bandwidth requirements of the backup traffic often require more switch capacity. It is also a good idea in general to keep high-bandwidth traffic within one switch fabric. (If the switch has multiple units, keep the traffic within one unit, if possible.)

Figure 7: Efficient use of network switch hardware (backup NICs of SQL01 and BAK01 connected within one unit of a stacked switch)

The configuration in figure 7 uses the network switch hardware much more efficiently than the one in figure 8.

Figure 8: Less efficient use of network switch hardware (backup NICs of SQL01 and BAK01 spread across units of a stacked switch)

Jumbo Frames

The normal maximum size of an Ethernet network packet (the frame size) is 1,500 bytes. This means that if you want to transfer 1 megabyte over the network, it has to be split into about 700 packets, which are then transferred one after the other. There are NICs and switches available that support Ethernet packets with larger frame sizes, known as jumbo frames. The larger the frame size, the faster the transfer of the data, because fewer round trips between the servers are required. Typically supported jumbo frame sizes are around 4,088 and 9,016 bytes (including the Ethernet and jumbo frame headers). For example, with the 9,016-byte frame size, 1 megabyte can be transferred in 117 packets.
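The packet counts quoted above are straightforward arithmetic. A quick sketch in Python (illustrative only; like the text, it divides by the full frame size) reproduces both numbers:

```python
import math

def frames_to_transfer(megabytes, frame_bytes):
    """Number of Ethernet frames needed to move the given amount of
    data, assuming every frame is filled to the stated frame size."""
    return math.ceil(megabytes * 1024 * 1024 / frame_bytes)
```

frames_to_transfer(1, 1500) gives 700 packets and frames_to_transfer(1, 9016) gives 117, matching the figures in the text.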
Empirical tests have shown that with a frame size of 9,016 bytes, network performance almost doubles.

BLOCKSIZE

The BLOCKSIZE parameter of the BACKUP command should correspond to the block size of the medium the data is written to. When you write to single disks, the default of 512 is just fine; if you use RAID arrays or a SAN, you must test to see whether the default or 65536 is better. If you back up over the network, choose a value that fills the network packets with as little overhead as possible. Keep in mind that splitting the data works in both directions. A 512-byte block size will result in 2 blocks per network packet (assuming the 1,500-byte frame size of Ethernet), or 8 network packets to transfer a single database page. On the other hand, writing a block of 4,096 bytes will use 3 network packets per block, or 6 network packets for a single database page. To give you another example, with a jumbo frame size of 9,016 bytes, I got the best results with a block size of 8,192 bytes, and with a jumbo frame size of 4,088 bytes, 65,536 delivered the best result.

BUFFERCOUNT and MAXTRANSFERSIZE

The BUFFERCOUNT and MAXTRANSFERSIZE options of the BACKUP command can also have a huge impact on backup performance. Unfortunately, even with weeks of testing, I could not find a rule to determine the best numbers, so you have to test. For MAXTRANSFERSIZE, my advice is that if you run on 64-bit (either x64 or IA64) and have enough memory available, going with the maximum buffer size of 4 MB (= 4194304) is fine. For more information about optimizing BUFFERCOUNT and other parameters, see the technical note Tuning the Performance of Backup Compression in SQL Server 2008 (http://sqlcat.com/technicalnotes/archive/2008/04/21/tuning-the-performance-of-backup-compression-in-sql-server-2008.aspx). In some cases, much smaller values produced better results, but not with any predictability. You should either test your configuration or just keep the default settings.
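The page-splitting arithmetic above can be sketched the same way. This helper is my own, and a simplification: it assumes each backup block is either packed whole into a frame or split across frames independently of its neighbors.

```python
import math

PAGE_SIZE = 8192  # bytes in one SQL Server database page

def packets_per_page(block_size, frame_payload=1500):
    """Estimate network packets needed to move one database page for a
    given BLOCKSIZE: whole blocks share a frame when they fit, and a
    block larger than a frame is split across several frames."""
    if block_size <= frame_payload:
        # several whole blocks share one frame
        packets_per_block = 1.0 / (frame_payload // block_size)
    else:
        # one block spans several frames
        packets_per_block = math.ceil(block_size / frame_payload)
    blocks_per_page = math.ceil(PAGE_SIZE / block_size)
    return math.ceil(blocks_per_page * packets_per_block)
```

packets_per_page(512) yields 8 and packets_per_page(4096) yields 6, the numbers given above.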
Backup Compression

Backup compression (a new feature in SQL Server 2008) enables you to increase performance and at the same time dramatically reduce the disk space required to store your backups. To enable backup compression, specify the WITH COMPRESSION option of the BACKUP command. The following example shows how to enable backup compression in our scenario.

BACKUP DATABASE MyVLDB
TO DISK = '\\BAK01\backup\MyVLDB.bak'
WITH CHECKSUM, INIT, COMPRESSION;

The amount of compression really depends on the data you store. For most databases (numbers, monetary amounts, dates, simple text), the compression ratio will be between 1:4 and 1:6. You can expect lower results if your database contains certain other types of data, such as images that have already been compressed. For more information about using compression with different types of data, see Backup Compression in SQL Server Books Online. In the real world, I have seen backup times shrink from 125 to 36 minutes and files shrink to 20 percent of their original size.

There is one drawback: compression needs CPU resources. SQL Server performs the compression on the thread that writes the data for each backup file, so the number of backup files defines the number of CPUs (cores) that work on compression in parallel. It is possible to use Resource Governor to limit the CPU usage of backups, giving other connections more resources.

If you use Transparent Data Encryption (TDE) on a database, you should not attempt to back it up with compression, because encrypted data does not compress very well.

If you do not want to specify the compression option with every backup command, you can use sp_configure to enable backup compression for all backup operations, server-wide.

sp_configure 'backup compression default', 1;
RECONFIGURE;

Be careful if you enable compression server-wide; doing so enables backup compression for differential and log backups as well.
While the CPU resources required for compressing the data might not be a problem for full backups or backups that are performed at off-peak times, this usage could cause major problems during peak times.

Hardware Recommendations for the Backup File Servers

Disk Drives

Disks that are used to store backup files and to verify the backups (by performing test restores) have fewer requirements than production servers do, because almost all I/O operations performed on them are sequential rather than random. Therefore, Serial ATA (SATA) drives are fine for most cases.

RAID Controller Settings

For the LUNs that store the backup files, you should configure the largest stripe size available (256 KB, 128 KB, or 64 KB), 100 percent write cache, and 0 percent read cache. It is also possible to activate the write cache of the individual spindles; if the power fails during a backup, the backup is not usable anyway, so in the end it does not matter whether the bytes in the write cache are lost or not.

For the LUN on which the restore tests are performed, you should use a stripe size of 64 KB, 100 percent write cache, and 0 percent read cache.

The best RAID level (1, 10, 5, 6, ...) really depends on the RAID controller or storage system you use. Because only sequential writes and reads are performed on the backup file server, if the controller caches the data until it can write one whole stripe at a time, you can use any RAID level. If the controller cannot provide this behavior and performance is an issue, RAID 1 and RAID 10 are your only options.

HBA Settings

If you use a SAN for the backup file disks, the maximum queue depth of the adapters that are used to connect to the SAN should be set to the maximum supported value. The default is 32, but backup and restore perform much better at settings around 256.

Network Interface Cards

You should be very selective about the NICs you use.
A certain number of ports does not guarantee excellent I/O performance on all of them at the same time; there are times when two two-port adapters might provide better performance than one four-port adapter. The amount of kernel time used by the driver is also very important. There are NICs on the market with drivers that use up to 50 percent of a single CPU, while others use somewhere between 3 percent and 5 percent. Finally, if you use multiple NICs to perform a backup, it is very important that they use different CPUs to handle their interrupts.

NUMA-Based Systems

If you have a non-uniform memory access (NUMA) based system, make sure that all the I/O cards (such as NIC, RAID, and HBA) are distributed across all the NUMA nodes of the system.

Calculating the Time to Back Up and Restore a Database

One of the key elements of an SLA is the timeframe, so when you design a backup solution for your organization, you will need to plan for the amount of time the whole process will take. This helps you meet the time-to-restore specification of your SLA, and it also helps you shape other important areas of your plan, such as how often to take backups and the compression settings to use. There are several steps in determining how much time the backup and restore process will take.

The first thing to do is calculate how much data there is to be backed up:

DatabaseSize[GB] = SpaceUsedInDatafiles[GB] + SizeOfTransactionLog[GB]

If you expect constant growth over time, the total size of the backup after a period of time can be estimated using this formula:

BackupFileSize[GB] = DatabaseSize[GB] + (ExpectedGrowthPerYear[GB] × Years)

If, on the other hand, the growth is relative, you can use this formula to estimate:

BackupFileSize[GB] = DatabaseSize[GB] × (1 + ExpectedGrowthPerYear[%] / 100) ^ Years

You should always add at least 10 percent as a safety buffer.
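Putting the size formulas together, a small Python sketch (the function name and defaults are my own) estimates the future backup size, including the 10 percent safety buffer:

```python
def backup_size_gb(db_size_gb, years, growth_gb_per_year=0.0,
                   growth_pct_per_year=0.0, safety=0.10):
    """Estimate the backup size in GB after `years` of growth.

    Pass growth_gb_per_year for constant growth, or
    growth_pct_per_year for relative (compounding) growth.
    `safety` is the recommended buffer (at least 10 percent).
    """
    if growth_pct_per_year:
        size = db_size_gb * (1 + growth_pct_per_year / 100.0) ** years
    else:
        size = db_size_gb + growth_gb_per_year * years
    return size * (1 + safety)
```

For example, a 1,000 GB database growing by 100 GB per year plans out to (1,000 + 200) x 1.1 = 1,320 GB after two years.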
Next, you need to determine the sustained concurrent maximum sequential read and write performance of your disk subsystems. You can use Performance Monitor (known as System Monitor in some versions of the Windows® operating system) to measure the values while performing test backups and restores. This should result in five values. If you have multiple LUNs or NICs and the values differ, always use the slowest result for the calculation.

1. Network adapter (NIC) throughput [MB/s]
2. Backup LUN sequential read performance [MB/s]
3. Backup LUN sequential write performance [MB/s]
4. Database data LUN sequential read performance [MB/s]
5. Database data LUN sequential write performance [MB/s]

Using the number of LUNs used in parallel for the database files, the number of LUNs used in parallel for the backup files, and the number of network cards used, we can come up with the following performance indicators:

NetworkPerformance = NICThroughput × NumberOfNICsUsed
FileReadPerformance = BackupLUNReadPerformance × NumberOfLUNsUsed
FileWritePerformance = BackupLUNWritePerformance × NumberOfLUNsUsed
DatabaseReadPerformance = DataLUNReadPerformance × NumberOfLUNsUsed
DatabaseWritePerformance = DataLUNWritePerformance × NumberOfLUNsUsed

The backup compression factor is defined as:

CompressionFactor = BackupFileSizeUncompressed[GB] / BackupFileSizeCompressed[GB]

If you do not use compression, CompressionFactor = 1.

The performance of a backup is limited by the following:

BackupPerformance[MB/s] = min( DatabaseReadPerformance[MB/s],
                               NetworkPerformance[MB/s] × CompressionFactor,
                               FileWritePerformance[MB/s] × CompressionFactor )

In most cases, without backup compression, the network performance number will be the smallest of the three, but with compression, the limiting factor could be either DatabaseReadPerformance or a new component not taken into consideration so far: the CPU doing the compression.
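The min() expression translates directly into code. Here is a sketch (function and stage names are mine) that also reports which stage is the bottleneck:

```python
def backup_performance(db_read, network, file_write, compression=1.0):
    """Effective backup throughput in MB/s: the slowest stage wins.
    Data is compressed before it leaves the server, so the network and
    backup-disk stages move fewer bytes per database MB and are scaled
    up by the compression factor."""
    stages = {
        "database read": db_read,
        "network": network * compression,
        "backup file write": file_write * compression,
    }
    bottleneck = min(stages, key=stages.get)
    return stages[bottleneck], bottleneck
```

Without compression, backup_performance(800, 400, 500) is limited to 400 MB/s by the network; with a compression factor of 4, the same hardware is limited to 800 MB/s by the database read stage.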
So the estimated time to back up the database becomes:

BackupTime[min] = (BackupFileSize[GB] × 1024) / (BackupPerformance[MB/s] × 60)

For the restore, the calculation is a little more complex. First you must know whether your system supports instant file initialization. This feature of SQL Server allows the creation of data files on NTFS volumes without zeroing them out at the time of creation. Because there are security risks associated with this feature, it can be used only if you run SQL Server under an account that has been granted the Perform Volume Maintenance Tasks right. If the service account is a member of the local administrators group, this right is granted by default. (Note that the time to initialize the transaction log may still be a limiting factor, because the log cannot be instant-initialized and must be zeroed.) If instant file initialization is enabled, use FileInitFactor = 1; otherwise the data files are effectively written twice (once zeroed, once with the data), so use FileInitFactor = 2.

RestorePerformance[MB/s] = min( DatabaseWritePerformance[MB/s] / FileInitFactor,
                                NetworkPerformance[MB/s] × CompressionFactor,
                                FileReadPerformance[MB/s] × CompressionFactor )

And as with the backup, you can calculate the estimated time for the restore:

RestoreTime[min] = (BackupFileSize[GB] × 1024) / (RestorePerformance[MB/s] × 60)

So, if either RestoreTime or BackupTime is higher than you expected, or if either value is higher than your SLA permits, you can experiment with the techniques described earlier to reduce these values. Doing things in parallel usually speeds things up much more than trying to speed up a single component in the chain. In very high-end scenarios, you must apply both.
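Both time formulas are the same unit conversion applied to a different throughput, so one helper covers them. A sketch (my own function names), with the restore-side min() alongside it:

```python
def transfer_minutes(size_gb, throughput_mb_s):
    """Minutes needed to move size_gb at throughput_mb_s."""
    return size_gb * 1024 / (throughput_mb_s * 60)

def restore_performance(db_write, network, file_read,
                        compression=1.0, file_init_factor=1.0):
    """Effective restore throughput in MB/s: the slowest stage wins.
    Without instant file initialization, pass file_init_factor=2."""
    return min(db_write / file_init_factor,
               network * compression,
               file_read * compression)
```

For example, a 900 GB backup restored at an effective 256 MB/s takes 900 x 1024 / (256 x 60) = 60 minutes.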
For the entire backup cycle (including verification), the time is:

BackupCycleTime[min] = BackupTime[min] + LocalRestoreTime[min] + CheckDBTime[min]

Use the value for BackupTime as defined earlier, and use the following to determine the other two values:

LocalRestoreTime[min] = (BackupFileSize[GB] × 1024) / ( min( DatabaseWritePerformance[MB/s] / FileInitFactor, FileReadPerformance[MB/s] × CompressionFactor ) × 60 )

CheckDBTime[min] = (BackupFileSize[GB] × 1024) / (DataLUNReadPerformance[MB/s] × 60)

Note: DBCC CHECKDB might require much more than just the time it takes to read the data from disk; that depends on the complexity of the database schema. It is certainly not going to be faster than this estimate.

Conclusion

Microsoft and the SQL Server team are committed to providing technology that helps customers and partners address their backup and restore concerns. SQL Server 2008 provides the technology and capabilities that organizations count on to help address the growing challenges of managing data and its backups. With significant advances in key areas, SQL Server 2008 provides a robust platform for VLDBs. This white paper provides only an overview of the backup and restore features and functionality in SQL Server 2008. For more information, see the SQL Server Web site (http://www.microsoft.com/sql).

For more information:

http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5 (excellent) how you would rate this paper and why you have given it this rating. For example: Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason? Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing? This feedback will help us improve the quality of white papers we release. Send feedback.
Appendix: Hardware Used

Database server:

HP Superdome with 16 dual-core CPUs (1.6 GHz, 24 MB cache) and 256 GB memory
8 dual-port HBAs for the SAN connectivity of the database data files
8 SAS RAID controllers for the transaction log
32 network cards
The SAN LUNs were each 512 GB, composed of 256 15K RPM spindles as RAID 10

Backup file servers:

HP Integrity rx7640 servers with 8 dual-core CPUs (1.6 GHz, 18 MB cache) and 32 GB memory
4 SAS RAID controllers (each with a 512 MB cache)
96 SATA drives with 7,200 RPM, configured as 32 RAID 1 LUNs; the remaining drives were used to build the 4 RAID 0 LUNs for the restore of the database
8 dual-port network cards (8 ports used for the actual backup file share)

Network equipment:

Cisco Catalyst 3750-E switch with a jumbo frame size of 4,088 bytes

Backup settings:

BLOCKSIZE: 65536
BUFFERCOUNT: 256
MAXTRANSFERSIZE: 4 MB
Number of backup files: 32 (4 per network card)

Database server        Backup file server
MyVLDB_01.mdf          MyVLDB00000001_01.bak
MyVLDB_02.ndf          MyVLDB00000001_02.bak
MyVLDB_03.ndf          MyVLDB00000001_03.bak
MyVLDB_04.ndf          MyVLDB00000001_04.bak
MyVLDB_05.ndf          MyVLDB00000001_05.bak
MyVLDB_06.ndf          MyVLDB00000001_06.bak
MyVLDB_07.ndf          MyVLDB00000001_07.bak
MyVLDB_08.ndf          MyVLDB00000001_08.bak
...                    ...
MyVLDB_29.ndf          MyVLDB00000001_29.bak
MyVLDB_30.ndf          MyVLDB00000001_30.bak
MyVLDB_31.ndf          MyVLDB00000001_31.bak
MyVLDB_32.ndf          MyVLDB00000001_32.bak

Figure 9: Disk layout and file mapping between database server and backup file server

The network cards that were used to perform the backup were evenly distributed across the server, and the CPUs handling the interrupts of the NICs were assigned so that the NUMA node that hosted the NIC also handled its I/O.
Figure 10: Database server backup network configuration (hardware diagram: four cell boards, each with four 1.6 GHz dual-core Itanium2 CPUs (24 MB cache) and 64 GB RAM; 4 Gb FC HBAs, P800 SAS controllers, and dual-port gigabit NICs distributed across the I/O chassis; backup NICs connected to a Cisco Catalyst 3750-E switch stack with a 10GE uplink to the other datacenter)

Figure 11: Backup file server network and disk configuration (hardware diagram: rx7640 with two cell boards, each holding four 1.6 GHz dual-core Itanium2 CPUs (18 MB cache); dual-port 1000T NICs and Smart Array P800 controllers spread across both partitions; eight MSA60 enclosures of 750 GB 7.2K SATA drives, each connected on a single SAS bus)