PROFESSIONAL DBA SERIES
Christopher Kempster
SQL Server
A Practical Guide to
Backup, Recovery & Troubleshooting
SQL Server – Professional DBA Series
Dedicated to my dearest friend and wife Veronica
and my kids Carl and Cassidy.
Special thanks go to Trevor Williams for editing the e-book.
Copyright © 2004, 2005 Christopher Kempster
Perth, Western Australia
Copying, selling, duplication or reproduction of this work is expressly forbidden without the copyright holder’s written consent.
All scripts and examples are used at your own risk.
The author does not assume any liability for errors or omissions anywhere in this ebook.
Always backup before performing system changes or attempting a system recovery.
Never test recovery procedures on a production server, be it on a separate database, instance or node in a cluster.
Microsoft Word 2000 is a registered trademark of the Microsoft Corporation.
SQL Server and SQL Server 2000 are registered trademarks of the Microsoft Corporation.
CutePDF v3.07 is a registered product of Acro Software Inc.
FOREWORD
Dear Readers,
I was delighted when Chris asked me to write a foreword for his second book, which
covers the important topics of backup, recovery and high availability with Microsoft SQL
Server. This is an exciting release that will fill an important gap in the Database
Administration book market. Microsoft SQL Server is being increasingly used for large
mission critical enterprise systems, which require robust backup and recovery systems.
Providing high availability solutions requires careful planning and implementation and
Chris covers each topic in detail so that the reader is guided every step of the way.
Chris enjoyed good sales with his previous ebook, entitled "SQL Server 2000 for the
Oracle DBA", both in Australia and internationally and I look forward to him achieving
further success with this exciting new release.
ASG welcomes the opportunity to encourage and grow staff excellence wherever
possible. Chris has enjoyed an extensive working relationship with ASG. Chris is always
highly motivated and enthusiastic, and has impressed us all with his in depth knowledge
of both Microsoft SQL Server and Oracle, and importantly his ability to apply this
knowledge to the maximum benefit of our clients. We are delighted to see that Chris is
willing to share his knowledge and experiences with others in the IT Community
through the release of his second ebook. This is very much in line with one of ASG’s key
objectives of “contributing to the development of the IT community”.
Finally, I would like to thank Chris for the opportunity to provide some excellent
international coverage with respect to our world-class technical capabilities in the area
of Microsoft SQL Server administration.
Steve Tull
Chief Solutions Officer
ASG Group Limited.
Table of Contents
PLANNING AND PREPARATION ..........................6
WHAT IS DISASTER RECOVERY PLANNING?..............6
Disaster Recovery Plans (DRP) ............................7
DRP for SQL Server Databases ............................9
Example - Disaster Recovery Planning...............11
FRAMEWORKS FOR IT SERVICE MANAGEMENT ......17
CoBIT (Control Objectives for Best IT Practices) ......17
ITIL (IT Infrastructure Library) ......18
Microsoft Enterprise Services Framework (ESF)20
Balanced Scorecard .............................................21
SERVICE LEVEL METRICS ........................................21
The Scale of Nines................................................21
Other availability metrics ....................................22
What is achievable with SQL Server? .................23
RESPONSIBILITY VS. ACCOUNTABILITY ..................23
BUILDING STAFF CAPABILITY..................................24
Consider an Emergency Response Team (ERT)..24
DBA Training and Skills (building ERT expertise) ......25
CHANGE CONTROL...............................................28
MANAGING CHANGE CONTROL BY EXAMPLE .........28
Environment Overview ........................................28
Pre-change window resource meeting ................31
Visual Source Safe (VSS) .....................................32
Managing Servers ................................................33
Development Server .............................................33
Test Server............................................................35
Refreshing TEST from PRODUCTION..........36
Production Support..............................................37
Production............................................................37
Hot Fixes ..............................................................38
Smarten up your applications (Autonomic Computing) ......39
MRAC of IR/Task Completion ......40
Summary ......40
USING VSS BY EXAMPLE - PART 1 ......41
Terminology ......41
Build Phase – Moving to the new change control structure ......41
First Change Control...........................................43
Moving Code to TEST..........................................45
Overwriting existing code in TEST from DEV ....46
Taking a Change Control into Production ..........47
USING VSS BY EXAMPLE - PART 2 ..........................49
How do I move files to next week’s change control? ......49
What does VSS look like after 2 iterations of change? ......51
I forgot to take a file into production for a scheduled change ......52
I have a special project that will span many weeks, now what? ......52
USING VSS BY EXAMPLE - PART 3 ......54
Re-iterating our chosen VSS structure ......54
What does my VSS look like to date? ......55
What do you do with /development after each change control? ......56
What do you branch into /development in terms of VB6 COM code? ......56
VSS Issues ......57
Share/Branching files from /test into /production ......57
Building the new /production project ..............57
Adding/Removing Project Source Files ..........57
Error on Renaming Projects ............................58
Use Labels where appropriate .........................59
The "guest" user...............................................59
Security Issues in /production for Branching files ......59
Welcome to .Net ......61
Initial Configuration of Visual Studio.Net ......61
VS.Net Solutions and Projects ......62
Important Notes before we continue ......62
Adding a new (simple) Solution to Source Control - Example ......63
VSS FOR THE DBA...................................................66
THEORY AND ESSENTIAL SCRIPTS.................67
UNDO & REDO MANAGEMENT ARCHITECTURE ......67
AUDIT THE SQL SERVER INSTANCE ........................73
META DATA FUNCTIONS ..........................................75
LISTING SQL SERVER INSTANCES ...........................76
INFORMATION SCHEMA VIEWS ................................76
DATABASE, FILE AND FILE GROUP INFORMATION ..77
Extracting Basic Database Information ..............77
Determining Database Status Programmatically77
USING O/ISQL .........................................................78
RETRIEVING LICENSING INFORMATION ...................79
Alter licensing mode after install?.......................79
ALLOWING WRITES TO SYSTEM TABLES ..................80
COUNT ROWS & OBJECT SPACE USAGE ..................81
Space and Memory Usage ...............................82
BLACK BOX TRACING ..............................................82
SCAN ERROR LOG FOR MESSAGES?.........................85
DATABASE LAST RESTORED AND FROM WHERE?.....85
WHAT STORED PROCEDURES WILL FIRE WHEN MY INSTANCE STARTS? ......86
WHEN THE DATABASE WAS LAST ACCESSED? ......86
ESSENTIAL TRACE FLAGS FOR RECOVERY & DEBUGGING ......86
Example of setting and verifying the trace flags ......87
“TRACE OPTION(S) NOT ENABLED FOR THIS CONNECTION”? ......89
BULK COPY OUT ALL TABLE DATA FROM DATABASE ......89
SQLSERVR BINARY COMMAND LINE OPTIONS.....89
SQL SERVER LOG FILES ..........................................90
Read Agent Log Example................................91
How and When do I switch sql server logs?........91
DETECTING AND DEALING WITH DEADLOCKS .........91
Example Deadlock Trace.................................94
ORPHANED LOGINS ..................................................95
ORPHANED SESSIONS - PART 1 ................................96
ORPHANED SESSIONS - PART 2 ................................97
CHANGE DB OWNER ................................................97
TRANSFER DIAGRAMS BETWEEN DATABASES .........98
TRANSFER LOGINS BETWEEN SERVERS ...................99
KILLING SESSIONS ....................................................99
The ALTER Statement ........................................100
How do I trace the session before Killing it ? ...100
SETTING UP AND SENDING SQL ALERTS VIA SMTP ......101
The Stored Procedure ........................................103
Creating the Alert ..............................................103
Testing the Alert .................................................106
Recommended Backup and Restore Alerts ........106
HIGH AVAILABILITY..........................................107
PURCHASING THE HARDWARE ...............................107
So what hardware should I buy? .......................107
What is the HCL or the “Windows Catalog”....110
HIGH AVAILABILITY USING CLUSTERS ..................110
VMWARE SQL Cluster - by Example................111
Using VMWARE in Production? .......................112
Step 1. Software & Licensing............................112
Step 2. Create the virtual servers .....................113
Step 3. Build your domain controller ...............115
Step 4. Build member server 1 (node 1 of the cluster) ......118
Adding SCSI Disks ......118
Adding another NIC for the Private Network ......120
Prepare your SCSI disks ......121
Install Cluster Services on Server (node) 1 ......122
Validate Node 1 in the cluster via Cluster Administrator ......124
Step 5. Build member server 2 ......125
Step 6. Install SQL Server 2k in the cluster (Active/Passive) ......126
Test Connectivity ...........................................134
HIGH AVAILABILITY USING LOG SHIPPING............135
Manual Log Shipping - Basic Example .............136
Custom Logshipping – Enterprise Example ......140
Log Shipping Example 1 - Setup and Running ......143
Server 1 (source) ......143
Server 2 (destination) ......144
Log Shipping Example 2 - Finalising Recovery / Failover ......146
Concluding thoughts ......................................146
TROUBLESHOOTING SQL CLUSTERS...........147
TROUBLESHOOTING AND MANAGING CLUSTERS ......147
How many MSMQ’s can I have per cluster? ......147
I am having trouble starting the cluster (Win2k) ......147
Why can’t I backup to C Drive? ......148
Move SQL Server cluster between SANs ......148
Should I change the Service Dependencies in Cluster Administrator? ......149
How can I stop Full Text indexing affecting the whole cluster group? ......149
Diagnosing Issues, where to look? ......149
Can I delete the BUILTIN\Administrators group in SQL? ......150
Correct way of stopping a clustered SQL instance ......151
How do I keep the Instance offline but start it locally for maintenance? ......151
Can I automatically schedule a fail-over? ......152
Correct way to initiate a failure in Cluster Administrator ......152
Any Windows 2003 restrictions with clustering? ......153
Changing Service Account Logins/Passwords ......153
Event logs between cluster nodes – can I sync them also? ......153
Nodes in a cluster via Query Analyser? ............153
Failed to obtain TransactionDispenserInterface: Result Code = 0x8004d01b ......153
Altering Server Network Properties for the Instance ......153
Add Disk E: to our list of disk resources for SQL Server ......154
Cluster Network Name resource 'SQL Network Name(SQLCLUSTER1)' cannot be brought online because the name could not be added to the system ......155
I renamed my sql server virtual cluster name – now I am getting errors and the instance will not start? ......156
How do I alter the IP address of the virtual server? ......157
The Microsoft Clustering Service failed to restore a registry key for resource SQL Server ......157
Reinstall SQL Server on a Cluster Node ......157
How to remove a SQL Server Instance from the cluster ......158
Remove/add a single sqlserver node from the clustered instance (not evicting a node from the cluster service itself) ......158
COMCLUST and Windows 2003 Server ......158
Try to run service pack setup.bat and tells me “Setup initialization error. Access Denied” ......158
Applying a service pack to the SQL Server clustered instance ......159
BACKUP...................................................................160
BACKUP FUNDAMENTALS ......................................160
Importance of Structure and Standards ............160
Directory Structures.......................................161
Naming Rules ................................................162
Database File Names .....................................163
Logical Filenames and File group Names .....163
Default properties ..........................................164
Recovery Interval ...............................................165
Recovery Models................................................166
What privileges do I need to backup databases? ......168
Backup and Restore between Editions of SQL 2k ......168
Backup Devices ..................................................168
Database Maintenance Plans ............................168
Data Dictionary Views.......................................171
Removing Backup History from MSDB .......172
Full (complete) Backups ....................................172
Differential Backups ..........................................174
Transaction Log Backups ..................................175
Log backups failing when scheduled via Database Maintenance Plan ......176
Filegroup Backups .............................................176
OLAP Backups ...................................................177
Can I compress backups? ..................................177
Can I backup and restore over a UNC path?....177
Logon failure: unknown user name or bad password ......178
What is the VDI? ................................................178
WHAT, WHEN, WHERE, HOW TO BACKUP .............178
What is the DBA responsible for? .....................178
What do I backup? .............................................179
How do I backup? ..............................................179
When do I backup?.............................................180
Where do I backup? ...........................................180
HOW BIG WILL MY BACKUP FILE BE? .....................180
Full .....................................................................180
Differential .........................................................181
Transaction Log .................................................181
Using the MSDB to view historical growth .......181
HOW DO I BACKUP/COPY DTS PACKAGES? ..........182
SOME BACKUP (AND RECOVERY) BEST PRACTICE 183
BACKUP CLUSTERS - DBA.....................................185
BACKUP PERFORMANCE.........................................185
CUSTOM BACKUP ROUTINES – HOW TO ................186
RECOVERY & TROUBLESHOOTING..............187
IMPORTANT FIRST STEPS ........................................187
CONTACTING MS SUPPORT ....................................188
WHAT PRIVILEGES DO I NEED TO RESTORE A DATABASE? ......189
REVISITING THE RESTORE COMMAND .................190
AUTO-CLOSE OPTION & TIMEOUTS ON EM ..........192
CAN I RE-INDEX OR FIX SYSTEM TABLE INDEXES? 192
CONNECTIONREAD (WRAPPERREAD()). [SQLSTATE 01000] ......193
SPACE UTILISATION NOT CORRECTLY REPORTED? 194
GUEST LOGIN MISSING ON MSDB DATABASE .......194
TROUBLESHOOTING FULL TEXT INDEXES (FTI)....194
General FTI Tips................................................195
LOCKED OUT OF SQL SERVER? .............................195
INSTANCE STARTUP ISSUES....................................196
“Could not start the XXX service on Local Computer” ......196
SSPI Context Error – Example with Resolution 197
Account Delegation and SETSPN......................200
I’M GETTING A LOCK ON MODEL ERROR WHEN CREATING A DB? ......201
TRANSACTION LOG MANAGEMENT ......202
Attempt backup but get “transaction log is full” error ......202
Alter recovery model, backup and shrink log file ......203
Shrinking Transaction Log Files ......203
Step 1. Get basic file information ......203
Step 2. I don’t mind losing transaction log data (point in time recovery is not important to me), just shrink the file ......204
Step 3. I need the transaction log file for recovery ......204
Step 4. Shrink the transaction log ......205
Rebuilding & Removing Log Files ......206
Removing log files without detaching the database ......206
Re-attaching databases minus the log? ......207
Using DBCC REBUILD_LOG() ......209
CAN I LISTEN ON MULTIPLE TCP/IP PORTS? ......210
OPERATING SYSTEM ISSUES ......210
I see no SQL Server Counters in Perfmon? ......210
Server hostname has changed ......211
Use DFS for database files? ......214
Use EFS for database files? ......214
Use Compressed Drives for database files? ......216
“previous program installation created pending file operations” ......216
DEBUGGING DISTRIBUTED TRANSACTION COORDINATOR (MSDTC) PROBLEMS ......216
Failed to obtain TransactionDispenserInterface: Result Code = 0x8004d01b ......216
Essential Utilities ......216
Check 1 - DTC Security Configuration ......217
Check 2 - Enable network DTC access installed? ......217
Check 3 - Firewall separates DB and Web Server? ......218
Check 4 - Win 2003 only - Regression to Win 2000 ......218
Check 5 - Win 2003 only - COM+ Default Component Security ......219
COMMON DEVELOPMENT/DEVELOPER ISSUES ......219
I’m getting a TCP bind error on my SQL Server’s Startup? ......219
Error 7405: Heterogeneous queries ......219
Linked server fails with enlist in transaction error? ......220
How do I list tables with Identity Column property set? ......220
How do I reset Identity values? ......220
How do I check that my foreign key constraints are valid? ......220
I encrypted my stored proc and I don’t have the original code! ......220
How do I list all my procs and their parameters? ......221
Query Analyser queries time out? ......221
“There is insufficient system memory to run this query” ......221
My stored procedure has different execution plans? ......222
Using xp_enum_oledb_providers does not list all of them? ......223
The columns in my Views are out of order/missing? ......224
PRINT statement doesn’t show results until the end? ......225
PRINT can result in Error Number 3001 in ADO ......225
Timeout Issues ......225
ADO ......225
COM+ ......226
OLEDB Provider Pooling Timeouts ......227
IIS ......227
SQL Server ......229
Sample Error Messages ......230
Is the timeout order Important? ......231
DBCC COMMANDS ......231
What is – dbcc dbcontrol()? ......231
What is - dbcc rebuild_log()? ......232
TROUBLESHOOTING DTS AND SQL AGENT ISSUES ......233
Naming Standards ......233
I’m getting a “deferred prepare error”? ......233
Debugging SQLAgent Startup via Command Line ......233
Don’t forget your package history! ...................234
Where are my packages stored in SQL Server? 234
DTS Package runtime logging...........................235
I get an “invalid class string” or “parameter is not correct” ......236
I lost the DTS package password ......236
I lost a DTS package - can I recover it? ......236
Access denied error on running scheduled job ......237
Changing DTS package ownership ......237
I have scheduled a package, if I alter it do I recreate the job? ......237
xpsql.cpp: Error 87 from GetProxyAccount on line 604 ......237
DTSRUN and Encrypted package call ..............238
TEMPDB IN RAM – INSTANCE FAILS TO START ...238
RESTORE A SINGLE TABLE FROM A FILE GROUP ....239
Pre-Recovery Steps ............................................239
Recovery Steps ...................................................240
Can I do a partial restore on another server and still get the same result? ......243
Can I do a partial restore over the live database instead? ......243
Restore over database in a loading status?.......244
MOVING YOUR SYSTEM DATABASES ......................244
Moving MASTER and Error Logs .....................244
Moving MSDB and MODEL..............................245
Moving TEMPDB...............................................246
Moving User Databases.....................................247
Some issues with MODEL and MSDB databases ......251
Fixed dbid for system databases ....................251
Scripting Database Objects ...............................252
Verifying Backups ..............................................254
RECOVERY ..............................................................255
A quick reminder about the order of recovery ..255
Killing User Connections and Stopping Further Connects ......256
Using the GUI for Recovery ......256
Options - Leave database in non-operational state but able to restore additional logs ......257
Options – Using the Force restore over existing database option ......258
Restoring a database’s backup history from backup files ......259
SQLServer Agent must be able to connect to SQLServer as SysAdmin ......259
Restore cannot fit on disk ......260
“Exclusive access could not be obtained..” ......260
Restore uses “logical” names ......260
UNABLE TO READ LOCAL EVENT LOG. THE EVENT LOG IS CORRUPTED ......261
WHAT IS A “GHOST RECORD CLEANUP”? ......261
HOW DO I SHRINK TEMPDB? ......261
Shutdown and re-start ......262
Use DBCC SHRINKDATABASE ......262
Use DBCC SHRINKFILE ......263
HOW DO I MIGRATE TO A PREVIOUS SERVICE PACK? ......265
Full Rollback ......265
Service Pack install hangs when “checking credentials” ......267
OLAP .....................................................................268
Recovery of OLAP cubes to another server ......268
Non-interface error: CoCreate of DSO for MSOLAP ......268
What TCP port does Analysis Services use? .....269
RESTORATION SCENARIOS .....................................269
Dealing with Database Corruption ...................269
How do I detect it?.........................................269
How do I recover from it? .............................270
“Database X cannot be opened, it’s in the middle of a restore” ......272
Installing MSDB from base install scripts ......272
Model and MSDB databases are de-attached (moving db files)? ......272
Restore Master Database ......275
Restore MSDB and Model Databases ......277
No backups of MODEL? ......277
No backups of MSDB? ......277
Recovery of System Databases and NORECOVERY option ......277
Collation Issues - Restores from other Instances or v7 Upgrades ......278
Suspect Database (part 1) ......280
Suspect Database (part 2) and the 1105 or 9002 error ......281
Suspect Database (part 3) – restore makes database suspect? ......283
Suspect Database (part 4) – Cannot open FCB for invalid file X in database XYZ ......284
Suspect Database (part 5) – drop index makes database suspect? ......285
How do I rename a database and its files? .......288
Database is in “Loading” Mode ? ....................290
Restore with file move........................................290
Restore to a network drive.................................290
Restore a specific File Group ............................291
Adding or Removing Data Files (effect on recovery) ......293
Emergency Mode ......293
Restore Full Backup ......294
Partial (stop at time) PITR Restore on a User Database ......295
Corrupt Indexes (DBMS_REPAIR) ...................295
Worker Thread Limit of N has been reached? ..296
Reinstall NORTHWIND and PUBS...................296
Some of my replicated text/binary data is being truncated? ......296
Other Recovery Scenarios ......296
Scenario 1 - Lost TEMPDB Database ......296
Scenario 2 - Rebuildm.exe ......297
Scenario 4 - Disks lost, must restore all system and user databases from backup to new drive/file locations ......301
INDEX.......................................................................302
APPENDIX A ...........................................................306
UNDERSTANDING THE DISK, TAPE AND STORAGE MARKET ......306
SAN (Storage Area Network).............................306
Example SAN Configuration.........................310
What is NAS (Network Attached Storage) ? ......310
What is iSCSI? ...................................................311
Anything else apart from iSCSI? .......................313
Using Serial ATA over SCSI ..............................314
SCSI, Fiber or iSCSI? ........................................315
Hard Disk Availability - Overview of RAID......316
Summary ........................................................316
Performance/Cost/Usage ...............................316
Disk Performance...........................................317
Database File Placement - General Rules......318
Example RAID Configurations .....................319
Virtualising Storage Management – the end game ......322
So as a DBA - what storage scheme do I pick?.323
TAPE Drives ......................................................326
Building a Backup Server..............................329
Who needs tapes when I can go to Disk? ..........331
IN THE DATA CENTRE ............................................332
Understanding server Racks..............................332
What are Blade Servers and Blade Centers? ....334
REFERENCES.........................................................337
Chapter 1
“..how you develop [it] is at least as important as the final result”
– A.M. Schneiderman
Planning and Preparation
The role of DBA is undoubtedly an important one, but many DBAs tend to be somewhat blasé about backup and recovery. This e-book attempts to bridge the knowledge gap and provide working scenarios, policy/procedure and best practice for database backup and recovery.
As with my previous e-book, I assume a reasonable knowledge of DBMS architecture and
in particular some general DBA experience with SQL Server.
This first chapter covers a range of planning and preparation strategies that will ultimately define your system design, backups and recovery procedures. We will focus on disaster recovery planning and frameworks for IT service management, then take this further in chapter two with change management (by example) and in chapter three with alternatives for high availability.
What is Disaster Recovery Planning?
This is a complex question. It conjures thoughts of business continuity, data and database recovery, network and server availability, staff expertise/training/availability, and policy and procedures. So the question is probably not so much “what is disaster recovery?” (the title tends to be self-explanatory), but “at what point do you draw the line?”, and how much time and money are you prepared to spend curbing the multitude of possibilities?
That said, let us define disaster recovery planning.
Planning for Disaster Recovery is synonymous with contingency planning; it is “a plan for
backup procedures, emergency response and post-disaster recovery”. This plan
encompasses the who/how/where/when of “emergency response, backup operations and
post-disaster” procedures to “ensure the availability of critical resources and to facilitate
the continuity of” business “operations in an emergency situation” (2).
It is very interesting reading the variety of thoughts in this space (3); one particularly interesting view was the division of DR from that of business continuity planning:
a) Disaster Recovery (DR) - the process of restoring systems [including manual and
automated business process] to an operational [state] after a catastrophic
systems failure leading to a complete loss of operational capability. (3a)
b) Business Continuity (BC) - is the forethought to prevent loss of operational
capability even though there may be a catastrophic failure in some parts of the
system. BC includes DR plans to restore failed system components. (3a)
Here the two key elements are forethought for prevention and the process of recovery/resumption of business – both are essential partners in building, maintaining and sustaining capability at a technical and business services level (we understand the risks, decrease the risks, and manage the risks). Only at this point can we, through a fine balance of money and persistent capability, be confident in our ongoing DR planning (DRP).
Disaster Recovery Plans (DRP)
Disaster recovery is divided into two distinct processes:
a) IT recovery planning or IT system recovery planning
b) business continuity planning – business and IT risk assessment and mitigation planning, covering manual, automated, physical and logical elements. This is an overarching feeder to a) in terms of where the bulk of the focus will be for IT disaster plans, based upon known business imperatives (i.e. we do only what is relevant to the business and its overarching strategy).
For simplicity’s sake, we will use the acronym DRP to encompass a). Although important, b) will not be covered any further in this ebook.
The focus of DRP is on the “recovery of the essential [IT] functions and [IT] systems of an
organization”, and “emphasizes recovery within the shortest possible time” (51). The
DRP provides the road-map that details the actions performed before, during and after a
disaster. The plan(s) are a comprehensive set of statements that address any IT disaster
that could damage the business and its [core business] services. (51)
The process of planning is an iterative one, based upon:
a) efficiency : doing the right thing at the right time, before, during and after a
disaster (speed, sustainability and thoroughness)
b) effectiveness : cost-effective resumption and business service recovery, effective
coordination of recovery (cost, coordination, end-game achieved)
c) [professional, legal] obligations, [formal] contractual commitments and
accountabilities
In order to effectively write, and measure IT performance against, SLAs, underpinning contracts with external providers, and operational level agreements (between your IT sections, i.e. comms, prod support, in-house developers, help desk etc.), the DRP is a fundamental driver for defining the contracts and outlining areas of concern. The commitment to high quality through legally (and financially) bound commitments ensures efforts are made to honor them. (51)
The DRP documentation is context based, typically in one of three strategic views: (51)
1) Mitigation
What measures are in place to minimize the impact of identified disaster
scenarios.
2) Anticipation
What are the measures to respond to and recover from a disaster?
3) Preventative
What measures are in place to prevent disasters from happening? This includes problem and known error management, and baselining your existing environment to secure and manage the business and IT dependencies on which prevention strategies are forged.
Our disaster plan “table of contents”, its ownership and iterative management are very much based on the document’s strategic view. Be aware that, to be effective working documents, DRPs need to be:
• Prescriptive;
• Simple to follow; and
• Fact orientated.
This may require professional (third party) assistance.
The process of DRP definition can be broken down as an iterative cycle: (52)
Initiate → Analysis → Create → Test → Maintain
Initiate - form the team; identify scope, resources required, timeline, stakeholders, budget and management buy-in. Link the DRP’s statement of work to existing initiatives and, most importantly, to the business continuity requirements. The identification of the overarching strategic view is required.
IMPORTANT – The DBAs may need to drive the process if nothing is currently in place or it is not actively pursued by management. The process of drafting the first DRP document may be the process initiator.
Analysis – requirements gathering, scope item drill through, build the activity list, work through concerns and highlight issues, set priorities and deliverables.
Create – create plan, document all processes/procedures/steps required to meet stated
scope and activities from the analysis and initiation phases.
Test (iterative) – evaluate and walk through the DRP; of key importance is its relevancy and effectiveness for the stated objective. The plan should be appropriate for the reader and simple to implement, with care taken over the workflow of steps, identifying contact points and highlighting weaknesses as early as possible. Test plans are retained for audit, and to assist with managed re-iteration.
Maintain (iterative) – ongoing plan maintenance, review and active ownership; measures
should be applied at this stage to ensure the plan is persisted. Assign group
responsibility and accountability.
Avoid merging DRP documents into one or two overarching papers for the entire IT shop, or for all databases. I highly recommend dividing them based on the context in which each is written. The plans should be a key part of the Intranet website (secured appropriately of course), versioned and burnt to CD. There is little point in planning unless the results are documented and made available.
DRP for SQL Server Databases
Depending on the site, the DBA may be more or less prescriptive with the steps taken to restore a service (i.e. list all steps for restoring the master db, for example); this may be based on in-house expertise, team size and site attendance.
Based on our discussion of strategy and the iterative approach to DRP definition, the DBA
needs to consider:
• Impact on system users if the database is not available
o The DBA needs to understand the user and business impacts of the database not being available, both for read/write and read-only. You need to be somewhat pragmatic with the choices made to keep the system available, and be sure the business users are heavily involved; a 24x7 system with a 1hr maximum downtime may consider a one day loss of data an acceptable compromise, for example; an online shopping system may not.
o Communication plan – who is contacted, and when? How? Apart from your technical staff, does the communication plan encompass business users?
• Storage of installation disks, associated physical media and license keys
o Physical access (lockdown and security procedures) and storage of SQL Server installation media, especially license keys
o Third party utilities must be considered
o OS installation disks and license keys
o How the change management process ensures media is updated (in the right areas and in a timely fashion)
• Restoration
o Can we restore in time to meet SLAs? What can we do to achieve them? Consider the cost of trying to do so (even at the risk of human error due to the complexities involved), and whether we need to revisit the SLA.
o Recovery scenarios (server, full instance and binaries, databases, tables, replication, full text indexes, OLAP cubes)
o User account/login details (consider essential OS and domain level logins, DBMS service accounts, dialup procedures and access rights, software required etc.)
o Access to backup tapes, time to restore and responsibilities, tape re-call procedure (the cost and signatories to receive media)
o Processes for dealing with corrupt backups, missing tapes (or overwritten ones) and/or database files
o Process for system database restoration
o Checkpoints before recovery will begin
• Staff capability and availability
o Key staff contact and escalation list, phone numbers (and physical phones - never use private phones for business work)
o Microsoft Support contact numbers and keys/credit information
o New training requirements based on proposed HW/SW selection or base capability to date
o External vendor support and underpinning contracts for DBA expertise
o Reference books/manuals and how-to’s
o Dialup/remote access procedures (includes after hours system access, taxi/resource expense claims, minimum hardware required for remote access, what staff “can’t” do whilst on call).
• Inter-system dependencies and distributed transactions or replicas
o Dependencies on other business components, such as web servers, middle tier, distributed systems, especially where transactions span time and service.
o Startup order, processing sequence of events (i.e. data resend or replication re-publishing steps etc.). This is very important in sites using replication where the time taken to restore such services and re-push or pull replicas may be significant.
• Backups
o Backup file access and file retention periods, both tape and on-disk
o Speed and simplicity of accessing backups. Does it require others to intervene?
o Full backup cycle, including regular system database backups
o Verify backups. Test restore procedures, and measure to ensure this is occurring
o Combinations of Full, Differential and Log backups (illustrated after this list)
o Log shipping – including the security and compression/encryption of log shipped files and the complexity this may bring
o System dependencies that will affect the backup, namely the Windows registry, OS binaries (system state) etc.
• Database hooks, associated modules and links
o Installed extended stored procedures
o Database links and their usage/security/properties
o Full text catalogs
• Hardware and Software Spares
• Change Management Procedures
• Audit of existing environment
• Fail-safe tasks – I recommend the following at a minimum (a worked sketch follows this list)
o Full db script monthly
o Check for DB corruption daily (when possible – dbcc checkdb)
o SQL Diagnostic (sqldiag.exe) dump daily to assist with debugging major system crashes
o Maintenance of global instance and database parameters (configuration and initialization settings, trace flags, startup stored procedures, database properties and file locations etc.)
With these and others, the DBA is well on track to provide a solid framework on which to base DRP; that being database and associated service recovery in an efficient and effective manner.
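On the “Combinations of Full, Differential and Log backups” point above, the following T-SQL sketches a typical full/differential/log cycle. It is illustrative only – the database name, file paths and frequencies are assumptions to adapt to your own schedule and recovery model.

  -- Weekly (e.g. Sunday): full backup, the recovery baseline.
  BACKUP DATABASE MyAppDB
  TO DISK = 'E:\SQLBackup\MyAppDB_full.bak' WITH INIT

  -- Nightly: differential backup, all changes since the last full.
  BACKUP DATABASE MyAppDB
  TO DISK = 'E:\SQLBackup\MyAppDB_diff.bak' WITH DIFFERENTIAL, INIT

  -- Every 15 minutes: transaction log backup for point-in-time recovery
  -- (requires the FULL or BULK_LOGGED recovery model). NOINIT appends
  -- each log backup to the same file between differentials.
  BACKUP LOG MyAppDB
  TO DISK = 'E:\SQLBackup\MyAppDB_log.bak' WITH NOINIT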
Example - Disaster Recovery Planning
It will be rare that your existing company has no DR plans in some form that you can leverage and build upon – so my advice is to go and look for them. It is important that you blend in with the initiatives of other team members and of management, to gain support in the time you will spend writing and maintaining the plans (which can be significant).
The DR documentation may be similar to the organizational chart, where all management and executive members are signatories to, and part of the communication of, service based recovery plans. I say service based in that the databases you manage from day to day support core applications and deliver critical services to the business of which you are a part.
The diagram below provides an example of the DR documentation produced for an
organization, and its context in terms of the services it applies to:
[Diagram: DR document hierarchy – a Master Disaster Recovery Plan sits above a Corporate App. System Plan for each core service, which in turn references a DBA – Database Recovery Plan and an AP – Application Recovery Plan]
a) “Crisis and Disaster Management” document contents
a. Provides general guidance for management of a disaster.
b. Defines the roles and responsibilities associated with the management of a crisis. It is not unusual to have four core roles - crisis manager, communications manager, recovery manager and the finance/purchasing manager.
[Diagram: crisis management organization chart – the Crisis Manager directs the Recovery Manager (recovery teams: operations, app support, vendors etc.), the Comms Manager (external firms/contractors, help desk, media coordinator – media, public, users) and Purchasing & Finance (vendors), while the Disaster Coordinator liaises with company management, legal reps and the board/directors]
c. Communication initiation “sequence of events” and associated flow-charts, including:
i. Initial Notification
ii. Ongoing Updates
iii. Communication Method(s)
iv. Logging of communications and time based events
v. Communication Milestones (dependencies) – this will include the when, responsibility and action information.
d. Reference to the “Crisis and Disaster Contact Details” document and its use and maintenance.
e. Crisis Management section that defines
i. When a crisis is declared (pre-conditions)
ii. Crisis Management coordination process(es) – includes the use of a central communications center, crisis meetings, record keeping, involvement of the business, crisis closure
iii. Disaster Management coordination process(es) – escalation to a disaster (triggers), disaster management center (where?, staffing?, point of contact?, notifications, record keeping, requests for resources, public/media questions)
iv. Service Restoration Priority List – list of core services (i.e. IT applications/services)
v. Release of Funds
vi. Templates for
1. Crisis and Disaster Review Meetings
2. Actions Lists (notifications/escalations)
3. Contact Lists
4. Sample email and SMS messages
5. Broadcast messages
6. Activity Logs
7. Crisis and Disaster Declaration Memos
b) “Crisis and Disaster Contact List” document contents
a. Simple tabular form divided into sections representative of the contact’s “role”, for example management, external partners, help desk/others
b. Includes confidentiality notices and reference to the above plan on its use
c) “Backup and Disaster Recovery Testing” document contents
a. This document is an overarching statement of work that spells out the
process of backup and disaster recovery testing. The document lays out
the ground rules and what is expected by all parties identified as
responsible and accountable for the application. The document will
include schedules for annual/monthly tests, including signoff forms and
registers.
b. Specifically, the document should include the following content items:
i. Definition of Application class levels and their frequency of disaster recovery testing
ii. List of core applications, their business priority and the class of application
iii. Description of the disaster recovery testing cycle (an iterative cycle consisting of walk-through, simulation testing, parallel testing and full interruption testing (running prod on DR machines))
iv. Detailed summary and flow charts of the implementation of a disaster recovery test, including references to template documentation, staff contacts etc.
v. Backup test procedures – basic summary of what a backup is, what should be considered when testing a backup.
d) “[AppName] – Disaster Recovery Plan” document contents
a. This document is based upon a template from the master recovery plan and is used for each and every core service. Typical content items include:
i. Introduction, scope and audience
ii. Recovery strategy – including a summary of the availability times (from the SLA if you have one), invocation (who is authorized to), guidance as to how to initiate and control the situation, system dependencies (system, doc reference, contact), recovery team (name, title, contact), how to request additional resources, recovery team checklist (i.e. confirmed invocation?, established recovery team?, arranged for backup/SW media? etc.)
iii. Recovery procedure – Infrastructure (locations, media), Data Recovery (OS, file systems, databases), Application Recovery (interfaces, user interface), Assumptions/Risks, references to low level documentation (see next).
e) “[AppName] - Setup and Configuration” document contents
a. This document provides step by step instructions for complete reinstallation of all components making up the service. The document should NOT refer to lengthy installation sheets from vendors where possible, but if it does, they need to be well clarified with environment specific notes.
b. In writing the document, consider colour coding based on responsibility (sys admin, DBA, help desk) and function (web server, database, middle tier etc). The document should be executed serially where possible.
c. Based on the Master Recovery Plan, this document will include the communication and escalation paths in a flow chart format. Contact details are also included which are specific to the application being delivered.
f) “[AppName] - [DBA] - Disaster Recovery Plan” document contents
a. Your specific database backup plan may be part of a wider (enterprise) backup and restore strategy for all corporate databases. A classic example is backups driven via centrally managed backup software, where the process for restore is the same no matter the underlying DBMS (to some degree).
b. The expertise of DBAs may dictate the prescriptive nature of this document; for example, will you describe at length the process for restoring the system databases? Consider this when developing the contents, for example:
i. Backup schedule
1. Types of backups taken and their retention periods; for example, you may do a nightly full, but also mention a monthly full that is retained for 6 months.
2. Backups start when, and normally run for how long? (include the reasoning behind the times chosen and whether any other job may impact the schedule)
3. Special conditions apply? i.e. is there any reason why the schedule will differ for a specific part of the month or year?
4. Standard backup file retention period
5. How the backups are performed (written to disk first then taken to tape? log shipped? etc.)
ii. Backup Maintenance
1. Monitoring of backups
2. Archival of older backups
3. Backup testing
4. Assumptions and risks of testing
iii. Recovery process
1. Initiation and Communication Procedures (a timely reminder of the overarching process that must be followed)
2. Media and Passwords
3. Requesting File Restoration
4. Database and Instance Configuration (a capture sketch follows this list)
a. Server level properties (applicable to the DBMS)
b. Instance level properties
c. Database level properties
d. Replication Configuration
e. Full Text Indexing
f. Logins and Users (including their database security properties)
g. DTS Packages and their Job schedules
5. Order of Recovery
6. System Interface Recovery Procedures
7. Database Recovery
a. Pre-Conditions (recovery of media etc)
b. Recovery Scenarios (may include a wide variety of scenarios, from system databases, suspect databases and lost DTS jobs, to moving to the DR server etc).
c. Post-Conditions (including steps to be taken if only partial recovery was possible)
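For item 4 above (Database and Instance Configuration), here is a hedged T-SQL sketch of the kind of capture you might schedule and file with the plan. The statements are standard SQL Server 2000 fare; what you record, and where you store it, is your own choice.

  -- Record instance-wide settings, including advanced options.
  EXEC sp_configure 'show advanced options', 1
  RECONFIGURE
  EXEC sp_configure

  -- Record every database file and its physical location -
  -- essential when files must be moved or restored elsewhere.
  SELECT dbid, name, filename FROM master..sysaltfiles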
Frameworks for IT Service Management
What I discuss throughout this book is very much technical, covering the how and why of backup and recovery at the DBMS. At a higher level, this should simply be part of a Corporate and IT Governance framework that actively translates business objectives to IT objectives through a common language, roles/responsibilities and accountabilities, and helps drive the same strategic goals for the business.
This section will provide a very short summary of frameworks in relation to IT service
management processes.
I cannot stress enough the importance of such frameworks within your organization.
Later in this book I discuss a customised version of change management for a small
applications development group, but many governance models take this much further in
terms of a complete solution for service management and delivery.
CoBIT (Control Objectives for Best IT Practices)
CoBIT is an open standard outlining good practice for the management of IT processes,
and most of which is free to download (www.isaca.org). The “ISACA and its affiliated IT
Governance Institute lead the information technology control community and serve its
practitioners by providing the elements needed by IT professionals in an ever-changing
worldwide environment.” (4)
The key items of focus in terms of DR are found under the Delivery and Support control
objective (aka Domain):
a) Manage Changes – outlines the process of change, requests for change, SW release policies, maintenance and documentation etc., all essential components that can assist in further DR planning.
b) Ensure Continuous Service – the establishment of a continuity framework with
business process owners.
c) Manage third party services – providers of third party services are controlled,
documented and interfaces managed. This encompasses continuity of service
risks.
d) Educate and train users
e) Manage problems and incidents
f) Ensure system security
The reader should download and read the Framework documentation related to Delivery
and Support, as we have only touched on a small number of processes.
NOTE – Many (if not all) of the COBIT processes map 1:1 to the ITIL framework
discussed next.
ITIL (Information Technology Infrastructure Library)
ITIL has developed into a defacto world standard for IT Service Management from its
beginnings in the late 1980’s by the British CCTA (Central Computer &
Telecommunications Agency – now called the Office of Government Commerce). From
its original library of some 32 books, the latest revision (2000/2001) sees a complete
restructure down to two core condensed volumes, concerning itself with the delivery and
support of IT services appropriate to the requirements of the business.
The framework itself provides proven methods for the planning of common processes,
roles and activities and their inter-relationships (communication lines). More importantly,
the framework (like others) is goal orientated around modules, so each process can be
used on its own or as part of a larger model.
In terms of DR within the ITIL framework (www.itil.co.uk), both the service delivery and
service support sides provide complementary processes for the delivery of IT services to
the business:
Illustration: the ITIL process model. At the Operational Level (Service Support) sit the
Service Desk, Incident Management, Problem Management, Configuration Management,
Change Management and Release Management, centred on the CMDB. At the Tactical
Level (Service Delivery) sit Service Level Management, Availability Management,
Capacity Management, Financial Management and Continuity Management.
Some of these are:
a) Availability Management – deals with and guarantees the demands of systems
availability. It is focused on the reliability and flexibility of operational systems,
not only from internal staff with hard/soft problems, but includes contractual
stipulations by suppliers.
b) IT Service Continuity – also known as contingency planning, manages all
information required to recover an IT service. It notes the importance of both a
business and an IT focus to continuity management, and that both are part of the
same fabric. Through risk analysis, assessment and measurement, it will stipulate
the how and when of the recovery cycle in terms of real business need.
c) Configuration Management – a register of configurable items and their relationships
(not just an IT asset register) within a database known as the CMDB. This
provides the fundamental basis for other processes, with the registration of not
only hardware and software, but SLA’s, known errors and incidents etc. The key
is relationship management, driving a corporate repository of knowledge that
assists all key ITIL processes in some form.
d) Release Management – manages the planned and applied changes to components
in the IT infrastructure.
e) Problem Management – deals with the root causes of disruptions to IT services.
The process attempts to distinguish, recognize, research and rectify issues with
the aim of minimizing recurrence of such problems.
f) Incident Management – first line support processes/applications in place for
customers where they experience problems with IT services.
g) Change Management – is accountable for all service changes within the IT
environment. It is driven via a formal set of steps and management processes to
manage change and coordinate the deployment of change with release
management; updating the CMDB along the way. The change management
processes are driven via RFC’s (requests for change) and typically a CAB meeting
to track, accept and evaluate proposed changes. Forward schedules of change
clearly define the proposed release dates and keep all parties well informed.
All of the processes naturally encompass the daily working practices of any IT shop, no
matter the size, and ITIL can be effectively adapted from the guiding principles it
provides. Remember this is a service management system of core processes to assist in
aligning IT to the business, and it is far from being a technical or prescriptive how-to for
IT management.
Microsoft Enterprise Services Framework (ESF)
There are three components to the cyclic ESF framework:
1) Prepare – Microsoft Readiness Framework (MRF)
2) Plan and Build – Microsoft Solutions Framework (MSF)
3) Manage – Microsoft Operations Framework (MOF)
The MOF part of ESF may be regarded as an extension to ITIL in terms of distributed
IT environments and the latest IT trends (web transactional based systems etc). In
practice, it is a Microsoft implementation around Microsoft technologies.
The MOF has two core elements:
a) Team Model – describes in detail how to structure operations teams, the key
activities, tasks and skills of each of the role functions, and what guiding
principles to uphold in running a Microsoft platform.
b) Process Model – Is based around four basic concepts
a. IT service management has a life cycle
b. The cycle has logical phases that run concurrently
c. Operational reviews are release and time based
d. IT service management touches all aspects of the enterprise
With that in mind, the process model has four integrated phases: changing,
operating, supporting, optimizing, and all forming a spiral life cycle. Each phase
follows with a review milestone tailored to measure the effectiveness of
preceding phases.
The diagram below shows the high level roles and four process phases within MOF:
Illustration: MOF Process Model and Team Model, Roles and Process Phases,
http://www.serview.de/content/english/itsm/4mof/7abbildung/view
Balanced Scorecard
Although not a service delivery framework, the balanced scorecard
(www.balancedscorecard.org) is a measurement-based management approach that
works on the idea that “all business processes should be part of a measurement system
with feedback loops”. (12) The system takes into consideration both strategic and
technical plans that are deployed along with a measurement system; more importantly,
this is a continuous cycle that is “aimed at continuous improvement and continuous
adjustment of strategy to new business conditions” (12).
It should be noted that the balanced scorecard is a corporate “strategy into action” tool
rather than an IT Governance framework (please be aware that I have used “strategy
into action” very loosely, it is much more than simply this). Such a strategy is a key
element in which governance models can reside and is well worth understanding at a
high level.
Service Level Metrics
In order to improve you need something to measure against. Simple as this sounds, it is
rarely done in many fields of business, including IT. This section covers the scale of nines
as one of the many forms of measurement, primarily used in service level agreements to
define a level of system availability in its most simplistic form.
The Scale of Nines
Many service level availability measures talk in terms of the “nines”, as the table below
shows:
Percentage      Downtime/Year
100             None.
99.9999         <= 30 secs
99.999          <= 5.2m
99.99           <= 52.22m
99.9            <= 8hrs 46m
99.0            <= 87hrs 36m
90.0            <= 36days 12hrs
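As a quick cross-check of these figures, the downtime budget can be derived directly from the rating. A minimal sketch, assuming a 365-day year (525,600 minutes):

-- A minimal sketch: convert an availability rating into allowable downtime
-- per year, assuming a 365-day year of 525,600 minutes.
DECLARE @availability DECIMAL(8,4)
SET @availability = 99.9
SELECT (1.0 - @availability / 100.0) * 525600.0 AS downtime_minutes_per_year
-- Returns 525.6 minutes, i.e. roughly 8 hours 46 minutes, matching the table.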
The five and six nines are a formidable and complex requirement for equipment vendors
to deliver (and therefore they ask a premium price), and even more so within your
organization. There are no hard and fast rules, other than to say the higher the nines the
more costly the solution, both in raw capital and operationally. Generally speaking,
select the rating that represents the comfort level of the organization (culturally,
economically, politically) – be reminded that it is a predictive measure and not a hard and
fast rule, as its mathematical calculation in even the simplest environment is absurdly
complicated. (38)
For the SLA, the measurement is outside the scope of scheduled downtime, commonly
known as a change window. The window must be relatively flexible in terms of its size to
encompass large changes, but a fixed maximum period may be unavoidable. In order for
this to occur, management must revisit current change practices, underpinning contracts
with external suppliers and operational (internal) contracts between business entities.
Other availability metrics
Who you are establishing a service level with will ultimately determine the technical
detail, or lack thereof, in terms of the availability measures we use.
For example, if we are establishing service levels between an IT shop and an external
data carrier for your privately connected data centers, then availability metrics based
upon packet loss, allocation errors and minimum transfer speeds are effective measures. If
the IT shop is dealing with the application owners using the data center services, then
these metrics mean (and measure) nothing of any value and cannot be related to the
overall experience of service.
Generally speaking there is no one definitive measure that can be applied to any one
service, client or business. The choice of a metric depends upon the nature of the service
and the technical sophistication of the users. (39).
All said, there are some common formulas and definitions, the value of which is tough to
crack and more importantly, must be justified and documented if the final figures are
challenged. They are:
%Availability = (MTBF / (MTBF + MTTR)) * 100
- or -
%Availability = ((total_units_of_time – downtime) / total_units_of_time) * 100
where MTBF = mean time between failures
      MTTR = mean time to recover
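As a worked example of the first formula, a minimal sketch with purely illustrative figures:

-- A minimal sketch with illustrative figures only: a system averaging 720
-- hours between failures (MTBF) and 2 hours to recover (MTTR).
DECLARE @mtbf DECIMAL(10,2), @mttr DECIMAL(10,2)
SET @mtbf = 720.0
SET @mttr = 2.0
SELECT (@mtbf / (@mtbf + @mttr)) * 100.0 AS pct_availability
-- Returns approximately 99.72%, i.e. between the two- and three-nines ratings.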
Be aware that availability metrics are very different to latency / response time metrics, or
throughput or capacity metrics, let alone utilization metrics. Be warned that these types
of metrics require the utmost confidence in terms of measurement and how they pan out
for ALL users of the service.
Remember that availability is purely “how much time a system is available for normal
operation”, be that a fixed figure or one that varies throughout the SLA term. Where
possible, utilize past statistics, trial availability measures and orientate the SLA to whom
you are dealing with.
NOTE – Without effective IT service measures in place, never attempt a charge
back model.
What is achievable with SQL Server?
In order to achieve anything more than 99.99% uptime outside of a standard change
window for the DBMS, the DBA should consider the following:
Percentage   Downtime/Year   Considerations
100          None.           Impossible. Possible in a perfect world with no change.
99.9999      <= 30 secs      Improbable. In a highly redundant environment using
                             Microsoft clustering or a third party product the system
                             will always fault for a short period of time (in the order
                             of 30 secs to one minute). The key here is the services
                             around the DBMS and eliminating all single points of
                             failure. Change management and access to the servers is
                             critically important. Such a system cannot test “live
                             failovers” unless it is part of the change window, and
                             even then it is extremely risky.
99.999       <= 5.2m         Possible, with multiple redundant services in play. There
                             is no time for inline hardware replacement during an
                             emergency.
99.99        <= 52.22m       Possible and easily sustained (at a price $). Hardware
                             spares must be easily available if we choose not to cover
                             all single points of failure. Reboots of hardware are not
                             possible in most cases.
Responsibility vs. Accountability
When establishing any plan it is of utmost importance to not only define a person’s
“role(s)”, but to clearly identify them in terms of responsibility and accountability.
A responsibility is a basic requirement when performing an activity, i.e. when there is
something we are required to do (13). For example, a DBA is responsible for making and
validating backups of the database to facilitate complete recovery to a point in time.
An accountability stems from actions we (or others we are managing as a Senior DBA) do
or don’t take, regardless of whether they are our direct responsibility or not. Simply
put, we are answerable for actions and their results. (13) Clearly identifying them is
difficult, but extremely important as a measure of service delivery and professionalism.
One interesting flow-on topic is that of authority; “if you make someone responsible but
do not give them authority, can they be held accountable”? (14). Looked at
pragmatically, writing down and agreeing upon tasks that are in effect promises tends
to require a level of authority over those tasks. Management need to
ensure this remains in focus and does not fragment into the realms of shared
accountability to a point where being accountable is no different from having
responsibility.
It is critically important for business continuity, disaster planning and systems recovery
that time is taken to identify responsibilities and those accountable for the actions taken.
The assignment of accountability and responsibility builds upon the fabric of process,
procedure and control; without it, no one will be answerable for actions taken (or not
taken, as it seems in many cases), which can be a disastrous situation for the business.
NOTE – A classic case from ITIL with the terminology applied is that of change and
release management – the change manager is accountable for all changes within
the IT environment, the release manager is responsible for making and
communicating the change. At all points in time the final outcome of the change
in production rests with the change manager.
Having a shared, global understanding of these terms is an important step forward in the
allocation and measurement of work that forms the hygiene factors of any business. This
is especially the case when dealing with disaster and change management.
Building Staff Capability
Consider an Emergency Response Team (ERT)
An emergency response team (ER for IT – your dedicated production support team,
signatory body and owners of all disaster recovery plans) will:
a) verify (testing) backups and associated recovery procedure checklists
b) assist customers and management in recognising risks and steps (and costs) to
mitigate them
c) respond to, manage, and coordinate emergency recovery procedures
d) ensure staff capability training policies are in place to better facilitate systems
recovery, especially as services change over time, be they new technologies or
upgraded ones.
e) attend change management meetings and be well advised of new system
development initiatives or purchases/evaluations
f) be actively involved in business continuity planning; to ensure staff have a solid
understanding of the procedures in place, and what manual procedures need to
be followed if system downtime is in the order of hours or days.
As a very basic example, I worked in a small development team (25 staff in total) for a
Government Department. The business unit was responsible for all business systems
development and subsequently included some core business applications. The ER-team
was formed based upon the following:
Illustration: the ER-team structure. The Manager (accountable) owns the overarching
plan, coordinates customer communications and the impact on other services via the
team Help Desk, and initiates direct customer communications and business continuity
procedures via the Development Manager. Lead Developers A, B and C work to the
DR documentation, systems recovery procedures and communication plans for the
applications and their customers. The Database Administrator (responsible) coordinates
the activities of network and server recovery and problem determination across the
Service Desk, external vendor support and the server/network administrators, covering
the DBMS and business continuity procedures.
This sort of model will differ based on the underlying IT governance model your business is
pursuing, its culture and political boundaries, and of course budgetary constraints. That
said, the driver here is the coordination of team members via a strict set of recovery
processes, with the responsibilities defined and documentation in place to monitor,
measure and coordinate actions during an emergency.
DBA Training and Skills (building ERT expertise)
In terms of database administration roles, I divide them into the following streams based
upon different skill sets and IT focuses, which of course mean a different style of training
and self improvement (we do not cover mixed roles, namely Developer/DBA etc.):
DBA stream: Production Support DBA

Skill requirements:
• Highly skilled pure DBA with expert knowledge of all facets of the DBMS.
• Has little involvement with developers apart from tuning on request.
• Provides expert advice and skills for system upgrades, replication, advanced SQL
and DBMS features.
• Expert in maintaining multiple, large and small scale database systems in a
production environment.
• Expert in database and systems recovery scenarios.
• Highly skilled at “thinking outside the box”.
• Applies systems recovery calmly and methodically.
• 24x7 support is often required.

Training plan: The production support DBA is a specialist role. The DBA is typically
found in large data centers, ISPs or businesses with numerous mission critical databases
that rely on a fulltime production presence. The DBA must focus on certification as the
key driver for ongoing skills development, along with seminars and short term training
courses relevant to the technologies planned or being implemented. The DBA must have
a solid grasp of recovery and high availability scenarios; as such, time must be given to
sustain these skills. Consider jobs that encompass a wide range of database vendors and
numerous production instances, or where uptime is critical and achieved through
enterprise-class hardware and software features.

Differentiating yourself: This role requires the DBA to cross skill in Oracle or DB2 at an
absolute minimum, with certification. Consider future roles such as Chief DBA,
Operations Manager and beyond.

DBA stream: Application Development DBA

Skill requirements:
• Highly skilled in writing, tuning and passing on expert SQL and DBMS design
knowledge to developers.
• Highly skilled in logical and physical database modeling, including classic
patterns, (de)normalization and modeling tools.
• Advanced DTS.
• Good understanding of the implications and development requirements or issues
of using replication, clustering, XML in the DB, blob data storage, large database
design etc.
• OLAP management and good MDX skills.
• Average system administration skills, including the OS, IIS, FTP, encryption, SSL.
• Database performance tuning.
• Skilled at backup/recovery (typically due to the fact that developers require
numerous copies of databases).
• Server performance management, marrying up figures to the underlying DBMS.
• Expert in profiler, user and DB tracing, especially blocking, locking and deadlocks.
• Has a good understanding of the SQL Server engine, but is a little rough in some
areas due to the wide range of skills required.
• Can tend to be a little on the “cowboy” DBA side and care must be taken to pull
them into line to follow process and procedures.
• Regarded as a “Jack of all trades”.

Training plan: To remain effective in this role the DBA needs to thoroughly embrace
and cross skill in the languages and development technologies in play against the DBMS
– but at the same time not become part of the development team, locking you in as a
“Developer/DBA”, which is a very different role. That said, back-end development in
T-SQL etc. is a core requisite, including complex DTS builds. These skills should be
continually developed; don’t alienate yourself from the development team. The DBA is
asked frequently about indexing, fragmentation management and
moving/exporting/importing/restoring databases throughout the day. More importantly,
the DBA must keep on top of any new DBMS feature, determining the costs/benefits and
issues with its use and implementation for the sake of the team and its agility. Major
decisions will be made off the DBA’s recommendations; research skills and the ability to
sell and pursue ideas are important.

Differentiating yourself: The DBA should really take a team leadership role where
possible; bringing the team together and driving the importance of standards, procedure,
change management, DBMS design and SQL improvements etc. Consider running
Special Interest Group sessions during lunch times to drive this goal. Keep highly
informed; it is one thing understanding the technology, but applying critique and
cost/benefit analysis backed by the wisdom of yourself, others and research is the magic
value add. Strive for this in the role. Consider IT consulting courses over certification.

DBA stream: Applications DBA

Skill requirements:
• Good DBA skills, solid daily administration, SQL tuning, profiling and tracing.
• Good understanding of OLAP, DTS, MSDTC, XML, TSQL etc.
• Good to expert knowledge of two or more core vendor applications (Microsoft,
SAP, Oracle etc), especially for application customization and extension.
• Advanced TSQL and DTS skills. May include 3rd party language and excellent
report writing skills.

Training plan: The applications DBA is a skilled DBA role with specialist knowledge of
an enterprise class application, such as SAP over SQL Server for example. Apart from a
thorough understanding of the underlying database structures, the DBA has expert
knowledge of deployment, setup and administration of the application over the chosen
DBMS platform. Training should be clearly orientated around these skills and
maintaining them. At the same time, looking at vendor hooks for adhoc reporting and
application extension should be considered and made part of the person’s work plan.

Differentiating yourself: Consider enterprise class applications only, namely SAP, CRM,
Portal, BizTalk and other integration technologies, Oracle Applications etc. Specialist
skills in such products are highly sought after, but watch the market carefully, especially
your local one, and adapt accordingly.
Chapter 2
Change Control
The need for a managed change control procedure is a fundamental requirement for
any IT department. In terms of disaster recovery, it allows the business to analyse
the risk a change will have on business as usual, and clearly spells out the pre and
post tasks to be performed when applying a change.
The idea here is managed change to reduce human error and impact on the business.
This chapter will cover the policies and procedures of an example change management
system, and detail the use of Microsoft Visual SourceSafe for source code management.
Managing Change Control by Example
This section will discuss the processes I used from day to day for change management in
a relatively large development team. We will cover:
1. formalizing the process
a. document
b. agree upon and prepare the environment
c. build and maintain list of definitive software and server install audit log
d. management support
2. database script management
3. developer security privileges in extreme programming environments
4. going live with change
5. managing ad-hoc (hot fix) changes
Environment Overview
With any serious, mission critical applications development, we should always have three
to five core environments in which the team is operating. They include:
1. Development
a. rarely is the environment (database) rebuilt from production. Its servers
are generally busy and reflect any number of pending change controls,
some of which never get to test while others go all the way through to
production.
2. Test
a. refreshed from production on a regular basis and in sync with a “batch” of
change controls that are going to production within a defined change
control window.
b. ongoing user acceptance testing
c. database security privileges reflect what will (or is) in production
3. Production Support (optional, some regard as Test) / Maintenance
a. mirror of production at a point in time for user testing and the testing of
fixes or debugging of critical problems rather than working in production.
4. Pre-production or Compile Server (optional)
a. a copy of production as of “now”
b. used when “compiling code” into production and the final pre-testing of
production changes
c. locked down image to ensure maximum compatibility on go live
5. Production
The cycle of change is shown in the diagram below through some of these servers:
The whole change management system, be it in-house built or a third party product, has
seen a distinct shift towards the whole CRM (relationship management) experience, tying
in a variety of processes to form (where possible) this:
This ties in a variety of policies and procedures to provide end-to-end service delivery for
the customer. The “IR (incident record) database” shown does not quite meet all
requirements, but covers resource planning, IR and task management, and subsequent
change window planning. With good policy and practice, paper based documentation of
server configuration and application components will assist in other areas of the service
delivery and maintenance.
See the ITIL framework for complete coverage of the underlying concepts presented in
the diagram above.
Pre-change window resource meeting
Every fortnight the team leaders, DBAs and the development manager discuss planned
and continuing work over the next two weeks. The existing team of 20 contract
programmers works on a variety of tasks, from new development projects extending
current application functionality (long term projects and mini projects) to standard bug
(incident request) fixing and system enhancements. All of this is tracked in a small SQL
Server database with an Access front end, known as the “IR (incident reporting)”
system.
The system tracks all new developments (3 month max cycle), mini projects (5-10 days),
long term projects (measured and managed in 3 month blocks) and other enhancements
and system bugs. This forms the heart and soul of the team in terms of task
management and task tracking. As such, it also drives the change control windows and
what of the tasks will be rolled into production each week (we have a scheduled
downtime of 2 hours each Wednesday for change controls).
The resource meeting identifies and deals with issues within the environments, tasks to
be completed or nearing completion, and the work schedule over the next two weeks.
The Manager will not dictate the content of the change window but guide resource and
task allocation issues. The team leaders and the development staff will allocate their
tasks to a change control window with a simple incrementing number representing the
next change window. This number and associated change information in the IR database
is linked to a single report that the DBA will use on Tuesday afternoon to “lock” the
change control away and use it to prepare for a production rollout.
Visual Source Safe (VSS)
The key item underpinning any development project is source control software. There is
a variety on the market but most sites I have visited to date use Microsoft VSS.
Personally, I dislike the product; with its outdated interface, lack of functionality and
unintuitive design, it’s something most tend to put up with (it’s better than nothing!).
Even so, a well managed and secured VSS database is critical to ongoing source
management.
Consider these items when using VSS:
• Spend time looking around for better change management front-ends that
leverage the VSS API / automation object model, if possible. Web-based
applications that allow remote development would be handy.
• Consider separate root project folders for each environment:
      $/
         development
         test (unit test)
         production
• Understand what labeling and pinning mean in detail, along with the process of
sharing files and repinning. These fundamentals are often ignored and people
simply make complete separate copies for each working folder or, worse still, have
a single working folder for dev, test and production source code (i.e. one copy of
the source).
• All developers should check in files before leaving for the day to ensure backups
cover all project files.
• Take time to review the VSS security features and allocate permissions
accordingly.
• If pinning, labeling, branching etc is too complex, get back to basics with either
three separate VSS databases covering development, test and production source
code, or even three project folders. Either way the development staff need to be
disciplined in their approach to source control management.
• Apply the latest service packs.
We will discuss VSS in depth later in the chapter.
Managing Servers
There are few development teams that I have come across that have their own server
administrators. It is also rare that the servers fall under any SOE or contractual
agreement in terms of their ongoing administration on the LAN and responsibility of the
IT department. As such, the DBA should take the lead and be responsible for all server
activities where possible, covering:
• server backups – including a basic 20 tape cycle (daily full backups) and an
associated audit log. Insist that tapes are taken off site and keep security in
mind.
• software installed – the DBA should log all installations and de-installations of
software on the server. The process should be documented and proactively
tracked. This is essential for the future rollout of application components in
production and for server rebuilds.
• licensing and terminal server administration
• any changes to active-directory (where applicable)
• user management and password expiration
• administrator account access
On the Development and Test servers I allow Administrator access to all developers to
simplify the whole process. Before going live, security is locked down on the application
and its OS access to mimic production as best we can. If need be, we will contact the
company’s system administrators to review work done and recommend changes.
In terms of server specifications, aim for these at a minimum:
• RAID-1, 0+1 or RAID-5 for all disks – I had 4 disks fail on my development and
test servers over a one year period. These servers really take a beating at times
and contractor downtime is expensive. I recommend:
      o 2+ Gb RAM minimum with expansion to 4+ Gb
      o a dual PIII 900 MHz CPU box
Allowing administrative access to any server usually raises concerns, but in a managed
environment with strict adherence to responsibilities and procedure, this sort of flexibility
with staff is appreciated and works well with the team.
Development Server
The DBA maintains a “database change control form”, separate from the IR management
system and any other change management documentation. The form includes the three
core server environments (dev, test and prod) and associated areas for developers to
sign in order for generated scripts from dev to make their way between server
environments. This form is shown below:
In terms of security and database source management, the developers are fully aware
of:
• naming conventions for all stored procedures and views
• the DBA is the only person to make any database change
• database roles to be used by the application database components
• DBO owns all objects
• database roles will be verified and re-checked before code is promoted to test
• developers are responsible for utilizing Visual SourceSafe for stored procedure
and view management
• the DBA manages and is responsible for all aspects of database auditing via
triggers and their associated audit tables
• production server administrators must be contacted when concerned with file
security (either the DBA if they have the responsibility, or system administrators)
and associated proxy user accounts setup to run COM+ components, ftp access
and security shares, and to remove virtual directory connections via IIS used by
the application.
• strict NTFS security privileges
With this in mind, I am quite lenient with the server and database environment, giving
the following privileges. Be aware that I am a change control nut and refuse to move
any code into production unless the above is adhered to and standard practices are met
throughout the server change cycle. There are no exceptions.
1. Server
a. Administrator access is given to all developers via terminal services to
manage any portion of the application
b. DBA is responsible for server backups to tape (including OS, file system
objects applicable to the application and the databases)
2. Database
a. db_ddladmin access – to add, delete or alter stored procedures, views, and
user defined functions.
b. db_securityadmin access – to deny/revoke security as need be to their
stored procedures and views.
No user has db_owner or sysadmin access; DBA’s should be aware that developers may
logon to the server as administrator and use the built-in/administrator account to attain
sysadmin access. I do not lock it down, but make people fully aware of the
consequences (i.e. changes don’t go to production and may be promptly removed).
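As a rough sketch of these grants, assuming a database myapp_db and a developer login dev_jsmith that already exists at the server level (all names are hypothetical):

-- A minimal sketch (hypothetical names); the fixed database roles used are
-- db_ddladmin and db_securityadmin as described above.
USE myapp_db
GO
EXEC sp_grantdbaccess 'MYDOMAIN\dev_jsmith', 'dev_jsmith'  -- add the login as a database user
EXEC sp_addrolemember 'db_ddladmin', 'dev_jsmith'          -- add/alter/drop procs, views, UDFs
EXEC sp_addrolemember 'db_securityadmin', 'dev_jsmith'     -- grant/deny/revoke on their objects
GO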
Database changes are scripted and the scripts stored in Visual SourceSafe. The form is
updated with the script and its run order or associated pre/post manual tasks to be
performed. To generate the scripts, I am relatively lazy: I alter all structures via the
diagrammer, generate the script, and alter those that can be better scripted. This
method (with good naming conventions) is simple, relatively fail-safe and, may I say,
very quick. All scripts are stored in VSS.
The database is refreshed during “quiet” times from production. This may only be a data
refresh but when possible (based on the status of changes between servers), a full
database replacement from a production database backup is done. The timeline varies,
but on average a data refresh occurs every 3-5 months and a complete replacement
every 8-12 months.
Test Server
The test server database configuration, in relation to security, user accounts, OS
privileges and database settings, is as close to production as we can get it. Even so, it’s
difficult to mimic the environment in its entirety as many production systems include web
farms, clusters, disk arrays etc that are too expensive to replicate in test.
Here the DBA will apply scripts generated from completed change control forms that alter
database structure, namely tables, triggers, schema bound views, full-text indexing, user
defined data types and changes in security (and which have already run successfully in
development). The developers will ask the DBA to move up stored procedures and views
from development into test as need be to complete UAT (user acceptance testing).
The DBA will “refresh” the test server database on a regular basis from production. This
tends to coincide with a production change control window rollout. On completion of the
refresh, the DBA might need to re-apply database change control forms still “in test”.
All scripts are sourced from VSS.
Refreshing TEST from PRODUCTION
We will only dot point one of many possibilities from a database perspective; the process
can be very complex with large numbers of application components and runtime libraries.
• Notify developers of your intention – check that important user or change testing
is not underway. The DBA doesn’t really want to synchronise the table data as it
can be an overly complex task in large database schemas (100+ tables for
example – my last DB had 550!)
• Check free space on the test server to restore the database backup from
production (backup file) and accommodate the expanded storage after the restore.
      o If the production databases are huge and the only free space is in
        production, then consider restoring a recent copy of production as
        “test_<prodname>” within the production instance, deleting/purging
        appropriate records and shrinking before taking a backup of this database
        over to test.
• Restore the database into the test database instance as “new_<db-name>” (for
example), allowing the developers to continue using the existing test database
(a minimal restore/re-link sketch appears after this list).
• Fix the database users to the logins (sp_change_users_login) as required – retain
the existing default database.
• Notify developers that no further changes to DBMS structure will be accepted.
• Use a scripting tool (SQL Compare from Red Gate software is very good) to
compare the existing database structures. You may not take over all existing
changes. Go back over your change control documentation and the changes raised
and marked as “in test”, as these will be the key changes the developers will
expect to see. This process can take a full day to complete and you may have to
restore again if you get it wrong (very easy to do).
• Email development staff notifying them of the cutover.
• Switch the databases by renaming them.
• Check users and logins carefully, changing the default database as required.
• Fix full text indexes as required.
• Fix any internal system parameter or control tables that may be referring to
production. Copy the data from the old test database as required.
• Notify developers that the database is available.
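A minimal sketch of the restore, orphaned-user re-link and rename steps above; the backup path, logical file names, database names and login are all hypothetical:

-- Restore the production backup side-by-side in the test instance
RESTORE DATABASE new_myapp_db
FROM DISK = 'e:\restores\myapp_db_prod.bak'
WITH MOVE 'myapp_db_data' TO 'e:\mssql\data\new_myapp_db.mdf',
     MOVE 'myapp_db_log'  TO 'e:\mssql\log\new_myapp_db.ldf'
GO
USE new_myapp_db
GO
-- Report users orphaned by the move between instances, then re-link one
EXEC sp_change_users_login 'Report'
EXEC sp_change_users_login 'Update_One', 'myapp_user', 'myapp_user'
GO
-- At cutover, switch the databases by renaming them (requires exclusive access)
USE master
GO
EXEC sp_renamedb 'myapp_db', 'old_myapp_db'
EXEC sp_renamedb 'new_myapp_db', 'myapp_db'
GO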
Production Support
The production support server box is similar to that of test, but is controlled by the
person who is packaging up the next production release of scripts and other source code
ready for production. This server is used for:
• production support – restoring the production database to it at a point in time and
debugging critical application errors, or pre-running end of month/quarter jobs.
• pre-production testing – final test before going live with code, especially handy
when we have many DLLs with interdependencies and binary compatibility
issues.
All database privileges are locked down, as is the server itself.
Production
The big question here is, “who has access to the production servers and databases?”.
Depending on your SLAs, this can be wide and varied, from full access for the development
team via internally managed processes, all the way to having no idea where the servers
are, let alone getting access to them. I will take the latter approach with some mention of
stricter access management.
If the development team has access, it’s typically under the guise of a network/server
administration team that oversees all servers, their SOE configuration and network
connectivity, OS/server security and more importantly, OS backups and virus scanning.
From here, the environment is “handed over” to the apps team for application
configuration, set-up, final testing and “go live”.
In this scenario, a single person within the development team should manage change
control in this environment. This tends to be the application architect or the DBA.
When rolling out changes into production:
a. application messages are shown (users notified)
b. web server is shutdown
c. MSDTC is stopped
d. Crystal reports and other batch routines scheduled to run are closed and/or
disabled during the upgrade
e. prepare staging area “c:\appbuild” to store incoming CC window files
f. backup all components being replaced, “c:\appatches\<system>\YYYYMMDD”
a. I tend to include entire virtual directories (even if only 2 files are being
altered)
b. COM+ DLL’s are exported and the DLL itself is also copied just in case the
export is corrupt
g. a full backup of the database is done if any scripts are being run (a minimal
sketch follows this list)
h. consider a system state backup and registry backup, emergency disks are a must
and should always be kept up to date.
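For step (g), a hedged sketch of the pre-change full backup; the database name and backup path are hypothetical:

-- A minimal sketch of the pre-change full backup (hypothetical names/paths):
BACKUP DATABASE myapp_db
TO DISK = 'c:\appatches\myapp\20040107\myapp_db_prechange.bak'
WITH INIT, NAME = 'Pre change-control full backup'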
Take care with service packs of any software (never bundle an application change with a
service pack). The change (upgrade or downgrade) of MDAC, and the slight changes in
system stored procedures and system catalogs with each SQL Server update, can grind
parts (or all) of your application to a halt.
Here is an example of an apppatches directory on a server to which we are about to apply
an application change. The directory is created, and all files to be replaced are copied:
The DBA may choose to copy these files to tape or to another server before running
the upgrade. If the change is to a virtual directory, I would copy the entire directory
rather than selective files; it simplifies the backup process and avoids human error.
Hot Fixes
Unless you are running a mission critical system, there will always be minor system bugs
that result in hot fixes in production. The procedure is relatively simple but far from ideal
in critical systems.
a. Warn all core users of the downtime. Pre-empt with a summary of the errors
being caused and how to differentiate the error from other system messages.
b. If possible, re-test the hot fix on the support server
c. Bring down the application in an orderly fashion (e.g. web-server, component
services, sql-agent, database etc).
d. Backup all core components being replaced/altered
Database hot fixes, namely statements rolling back the last change window, are tricky.
Plan not to disconnect users if possible, but careful testing is critical to prevent having
to do a point-in-time recovery if things get worse.
Finally, any hot fix should end with a ½ page summary of the reasons why the change
was made in the monthly production system report. Accountability is of key importance
in any environment.
Smarten up your applications (Autonomic Computing)
Autonomic computing “is an approach to self-managed computing systems with a
minimum of human interference” (IBM). In other words, self repairing, reporting and
managing systems that look after the whole of the computing environment. So what
has this got to do with change management? Everything, actually.
The whole change management process is about customers and the service we provide
them as IT professionals. To assist in problem detection and, ideally, resolution, the
system architects of any application should consider the following:
a. API for monitoring software to plug in error trapping/correcting capability
b. Application consists of single entry point for all system messages (errors, warning,
information) related to daily activity
c. The logging system is relatively fault tolerant itself, i.e. if it cannot write
messages to a database it will try a file system or event log.
d. Where possible, pre-allocate a range of codes with a knowledge base description,
resolution and rollback scenario if appropriate. Take care that the numbers
allocated don’t impose on sysmessages (and its ADO errors) and other OS related
error codes, as you don’t want to skew the actual errors being returned.
A simplistic approach we have taken is shown below; it’s far from self healing but meets
some of the basic criteria so we can expand in the future:
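As a rough illustration of a single entry point for system messages (points b and c), a minimal sketch follows; all object names are hypothetical and the design is deliberately simple so it can be expanded:

-- A minimal sketch of a central application message log (hypothetical names):
CREATE TABLE dbo.app_system_message (
    message_id   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    logged_at    DATETIME NOT NULL DEFAULT GETDATE(),
    severity     VARCHAR(12) NOT NULL,   -- error, warning, information
    message_code INT NOT NULL,           -- from the pre-allocated code range in (d)
    message_text VARCHAR(1000) NOT NULL
)
GO
CREATE PROCEDURE dbo.log_system_message
    @severity     VARCHAR(12),
    @message_code INT,
    @message_text VARCHAR(1000)
AS
    -- Single entry point for all application components; callers fall back to a
    -- file or the event log if this insert fails (point c).
    INSERT dbo.app_system_message (severity, message_code, message_text)
    VALUES (@severity, @message_code, @message_text)
GO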
It is not unusual for commercial applications to use web-services to securely connect to
the vendors support site for automatic patch management, error logging and self
correction. This adds a further complexity for the system administrators in terms of
firewall holes and their ability to control the application.
MRAC of IR/Task Completion
This is going off track a little in terms of change control but I felt its worth sharing with
you. The MRAC (Manage, Resource, Approve, Complete) principle is a micro guide to
task management for mini projects and incident requests spanning other teams/people
over a short period of time. The idea here is to get developers who own the task to
engage in basic project management procedures. This not only assists in documenting
their desired outcome, but communicating this to others involved and engaging the
resources required to see the entire task through to its completion.
The process is simple enough as shown in the table below. The development manager
may request this task breakdown at any time based on the IR’s complexity. The
developer is expected to meet with the appropriate resources and drive the task and its
processes accordingly. This is not used in larger projects in which a skilled project
manager will take control and responsibility of the process.
Task or deliverable   Planned Completion Date   Managed by   Resourced to   Approved by   Completed by
Requirements
Design
Build
Test
Implement
Audit (optional)
The tasks of course will vary, but rarely sway from the standard requirements, design,
build, test, implement life-cycle. Some of the key definitions related to the process are
as follows:
Managed – Each task or deliverable is managed by the person who is given the
responsibility of ensuring that it is completed.
Resourced – The person or persons who are to undertake a task or prepare a deliverable.
Accepted – The recorded decision that a product or part of a product has satisfied the
requirements and may be delivered to the Client or used in the next part of the process.
Approved – The recorded decision that the product or part of the product has satisfied
the quality standards.
Authorized – The recorded decision that the record or product has been cleared for use
or action.
Variation – A formal process for identifying changes to the Support Release or its
deliverables and ensuring appropriate control over variations to the Support Release
scope, budget and schedule. It may be associated with one or more Service Requests.
This simple but effective process allows developers and associated management to better
track change and its interdependencies throughout its lifecycle.
Summary
No matter the procedures and policies in place, you still need commitment from
management. Accountability and strict adherence to the defined processes are critical to
avoid the nightmare of any project, that being source code versions that we can never
re-create, or a production environment for which we do not have the source code.
Failure to lay down the law with development staff (including the DBA) is a task easily
put in the too hard basket. It is not easy, but you need to start somewhere.
This section has presented a variety of ideas on the topic that may prompt you to take
further action.
Using VSS by Example - Part 1
This section is divided into multiple parts to introduce you to a VSS (Microsoft Visual
SourceSafe) framework that I have successfully used for a large applications development
team. It provides a working framework for source code management in Microsoft Visual
SourceSafe. The framework presented supports a system currently under support and
maintenance, as well as other development initiatives (mini projects) that may be adding
new functionality.
For those working with multi-client software in which “production source code” is at
different patch levels (and may include custom code), you will need to rework VSS to
adopt the strategy – namely due to multiple side-by-side releases with different custom
changes to the same product.
The article is scenario driven, starting with a very basic scenario. We then move to more
complex changes.
Terminology
CCM – Change Control Manager.
Project – Similar to a folder in Explorer. It contains configuration items (files);
VSS will automatically version projects and files with an internal value.
Branch – Synonymous with a copy in Explorer, but VSS maintains links between
versions of branches to facilitate merging at some later stage. VSS, via its
internal version numbering, will maintain the “logical branch history” in
order to facilitate merging with any other branch.
Share – A shared file is a “link” to its master, or in other terms, a shortcut. The
shared file will be automatically pinned (this action cannot be changed in
VSS); to alter it you must unpin it. If the shared file is altered its master
will also be changed, along with any other “shortcut” (share).
Merge – The process of merging the changes from two branched files. The VSS GUI will
show the hierarchies of branches and associated links/paths, broken
branches that cannot be merged to, and branches from incompatible
parents (it can merge the branches from two different trees!).
Build Phase – Moving to the new change control structure
The project folder layout of VSS is as follows:
This structure maps to your equivalent development, test and production server
environments. Source code for the development and test environments is built from the
VSS production code project (projects are shown as Windows Explorer folders in VSS)
and the subsequent development/test project code applied over the top (roll forward) to
form the environment.
The developers need to manage themselves when there are multiple “versions” of a DLL
or other file source moving into test over time. Either way, the VSS environment allows
the full “build” of development or test servers from a known image of production source
code (which is critical):
The /production project folder includes the names of other applications currently under
the change control process. To start the process of change management using this
structure, we will be introducing a new application called “MYAPP”. The CCM will:
a. create a new project in /production called “myapp”
b. create sub-projects to hold myapp’s source code
c. label the MYAPP project as “MYAPP Initial Release”
d. arrange a time with the developer(s) and check-in source code into the projects
created
This will give the structure:
The production folder is “read only” for all but the change control manager (CCM).
First Change Control
All application project changes in the /production project “folder” follow a change control
window path. As such, in the development project folder the CCM will:
a. create a new project in the development folder representing the change control
window (if none exists)
b. notify developers of VSS development project status
For example:
The MYAPP developer needs to alter his single ASP and COM files for this change control
window. The developer will need to:
a. Create a “MYAPP” project under the CC001 project in development
Expand the production source code project, navigate to MYAPP and the COM
Libraries project, then right click and drag the project to
/development/cc001/myapp and, from the menu displayed, select Share and
Branch.
Check the “recursive” check box if any sub-project folders are also required.
b. Do the same for Web Sites. Important – If there are multiple websites and you
only want selected projects, then manually create the same structure in dev then
use VSS to share the projects from there.
c. REVISE WHAT YOU HAVE DONE – ONLY SHARE/BRANCH THE FILES REQUIRED
FOR THE CHANGE CONTROL. IF YOU FORGET FILES OR REQUIRE OTHERS, THEN
REPEAT THIS PROCESS FROM THE SAME PRODUCTION SOURCE PROJECT.
ALWAYS KEEP THE SAME PROJECT FOLDER STRUCTURE AS IN THE PRODUCTION
PROJECT
d. If you make a mistake, delete your project and start again.
The above steps will leave us with the following project structure:
To review where the original source came from in the production folder:
a. select a file
b. right click properties
c. select the paths tab
d. click links if the GUI is hard to read
We can now check in/out code as need be in the development project folder. The DEV
(development) server environment is then built (i.e. refreshed) from this source code in
VSS.
Moving Code to TEST
The CCM will:
a. create a CC001 project in the /test project “folder”
The developer(s) will:
b. remove any reference to “myapp” in the /test/cc001 project (if applicable)
c. expand the cc001/myapp project, select myapp then right click and drag to the
/test/cc001 project, selecting the share and branch option
d. Select “recursive” check box (as all code for MYAPP is going up as a single unit of
testing).
This will give us:
This scenario will assume all code will transition from DEV to TEST and into PROD.
The TEST server environment is then built (refreshed) from this source in VSS.
Overwriting existing code in TEST from DEV
During testing we found a range of problems with our ASP page. We fixed the problem in
/development and now want to overwrite/replace the VSS /test copy for the CC001
change control.
The developer will:
a. Check-in code in the development folder for the change control
b. Navigate to the /test/cc001/myapp/Web Sites/ project and delete from this point.
IMPORTANT – You can do individual files if you need to, rather than a whole project.
c. Go to the /development/cc001/myapp project folder, then right click and drag the
websites project.
Test now has the fixes made in dev. Use the files to build/update the test server
environment.
Taking a Change Control into Production
The developers will:
a. ensure all code in the /test/cc001 folder is OK to go live
b. remove any files that are not ready
The CCM will:
c. create a new project in the production project folder to represent the change
control going “live”
d. share and branch the myapp application (entire structure) into the new project
folder in production:
e. Navigate to the /test/cc001/myapp project, get a list of files to “go live”
f. Navigate back to the /production/cc001/myapp project and locate these files; for
each file:
a. Select the file, then choose SourceSafe > Merge Branches
b. Pick the appropriate branch to merge to and press the Merge button.
c. Check conflicts if need be.
The /production project structure is:
Once all files are merged into the new production project folder, the production server
environment can be built from these altered files.
The branching in VSS provides an effective solution to managing versions and the
inter-relationships between components (files) and physical projects. The above scenario,
for example, gave us this tree-like structure to work from (versions in the diagram are
VSS internal versioning numbers – use labels to assist with source identification if you
are having trouble with it):
Using VSS by Example - Part 2
In this section we expand on other source control scenarios and how our defined project
structure and VSS branching functionality accommodate them.
How do I move files to next week’s change control?
In the previous example, we only processed the single ASP file; for some reason, the
developers wanted to delay the DLL until the next round of change controls. Therefore we
need to get the COM file in /development/cc001 into the new project
/development/cc002 for it to be included in that change window.
The CCM will:
a. create /cc002 in /development folder
b. The developer will branch the DLL code from /production/cc001 into
/development/cc002
c. Select the file(s) to be merged in /development/cc002.
d. Resolve the differences (we know what they are and are fine with all of them)
e. After the merge, double check your code before moving on with development.
What does VSS look like after 2 iterations of change?
After two complete iterations of changes, we end up with this structure in VSS:
We have a snapshot of production for all iterations:
a. initial snapshot
b. change control 1
c. change control 2
We can also rebuild the test server from a production build and roll forward through
change controls. In the development project, we have a well managed source
environment for each change and its progression through the environments.
I forgot to take a file into production for a scheduled change
The CCM needs to:
a. check files are in the equivalent /test project folder
b. if not, the developer should update this project folder by branching the missing
files from the equivalent development folder.
c. As the /production/ccxxx folder already exists (had to for other code to go into
prod), the CCM simply merges this missing file into the project and takes the files
from here to the production server environment.
I have a special project that will span many weeks, now what?
The scenario is this:
a. we do weekly change controls
b. a new special project will take 2 standard change control iterations to move into production
c. the standard change controls will be altering files the special project will be using
d. we need to ensure that the standard change controls are not affected, and vice versa for the special project, until the special project is ready to go live.
To manage this we need to ensure these rules apply:
a. strict branching from a single point in time from the /production project folder
b. the special project files will merge with some future change control and go live
with this change control window to ensure VSS consistency.
At CC003, we start a new special MYAPP project that affects the single ASP file we have.
The CCM will:
a. create /development/CC003 project
b. create /development/MYAPP Mini Project
c. The developer branches in the files to be altered into both /development/cc003
and /myapp mini project
d. Now both project folders in dev are branched from the single “tree root” (for want of a better word). Developers continue to work in DEV and branch to test as need be to facilitate standard app testing.
Merging “MYAPP Mini Project” back into the standard change control.
Here we assume the CC003 and CC004 changes over the last 2 weeks are in production; we need to get MYAPP Mini Project live, and this will be done in CC005. The VSS structure is:
a. CCM creates /development/CC005
b. CCM branches code used in “MYAPP Mini Project” from /production/cc004 (current
production source)
c. CCM then, working with the developers, merges into /development/MYAPP Mini
Project with the /development/CC005 project files.
NOTE – Merging is not straightforward in VSS. Take care, and where possible attempt to do this “offline” from the product (using 3rd party source compare tools for example).
d. On completion of the merge, remove MYAPP Mini Project or rename it to “use CC005 – MYAPP Mini Project”
e. Go through the standard dev to test to prod VSS practices as outlined throughout this document.
Using VSS by Example - Part 3
Re-iterating our chosen VSS structure
The structure we have chosen, i.e. /development, /test, /production VSS project folders,
is perhaps seen by many as just a file system with undelete and some labeling capability,
and that's fine. We have divided off the "environments" from one another and know that
source for production at some known point in time is at location XYZ in VSS and not part
of a single copy that is being used for ongoing development. The same goes for the /test project, and this was the main reason why we went down this path.
It takes some time to get developers used to the fact that we do have three environments in VSS between which code is shared/branched (copied); for all intents and purposes, developers should see no real difference in the way they currently develop and interact with VSS via their development software IDE.
The development staff, in liaison with a senior developer (the ultimate owner of code
branched from /dev to /test then /prod), can setup the local PC development
environment, and hook in the /development folder to their IDE with little effort, and
continue working in this environment.
The developers need to use the VSS GUI to add code as required into /development folders. The only real complaint here is the need to branch code into /test, which some complain is time consuming (but is just darn lazy in my opinion).
The key in /test is the naming of our folders. The folders represent the change control
numbers, which means that your testing of component XYZ is planned to be released into
production as change control number 20041201. If it isn’t, then we branch/move it to
another change control. All said and done, we are not overly strict in /test within VSS,
we do allow other folders to be created to suit the needs of the team, but nothing rolls
into /production that isn’t inside of a /test/cc folder.
Finally the /production folder. The fundamental concept here is the full duplication of all
source that makes up the production server before the change control was rolled into
production. We copy this source to a new project folder and the change manager then
merges code from our /test/cc folder into this copy, thus forming the new production
source code image.
This method is simplistic and easy to use. As the change manager of VSS becomes more
adept with the product, one may use labeling as the core identifier of source at a single
point in time; be warned that it is not simple and a mistake can be very costly.
What does my VSS look like to date?
Here is the screen shot of my VSS project after seven iterations of scheduled change
controls:
I will discuss /development next. As we see, /production has an “as of now” project folder for each application, and then copies of the same folder “as at YYYYMMDD”. So for the STARS application we see:
/production/stars              (developers branch into /development from here for any changes to prod code)
/production/stars20030917      (the STARS system and all its source code as at 17/09/2003)
I will archive off the projects in production as soon as the size of VSS gets too large. It
grows quickly with such a scheme. Even so, it doesn’t slow VSS and disk space is
plentiful (200Gb free so I am not stressed over it).
As we can see in the /test project folder, we have projects representative of the change
control. It is important to remember that this folder is used to build the test server
before formal testing of your changes to go up for that change control. It takes some
discipline for developers to do this and not copy files from the development server to the
test server without going via VSS. The developers are responsible at the end of the day,
and I have solid backing from the development manager.
On a change control go-live, I email the team notifying them of the lock down of the /test change control folder. I then check for checked-out files in this project, then use VSS security to lock the project. I then do a “get latest version” to my pre-production server and compile all DLL's; the developers test once I have emailed a “please test pre-prod” request, and I wait for final approvals from the developers before 5.30pm that night.
Generally, it is all working well to date.
What do you do with /development after each change control?
The general thinking here is the removal of all /development code that was taken into
production for a change control and is not being worked on further. I let the senior
developers take responsibility here and they manage this well. I have yet to see any
issues caused by poor management of the /development project folder and its contents.
The developers use the compare and merge facilities of VSS extensively between /test,
/dev and at times /production.
What do you branch into /development in terms of VB6 COM code?
You basically have two options here: branch all your COM project code into development, or branch only selected class files for the COM project. The development team opts for selected classes only, and does a get latest version on the production source to their local computer for compilation. This allows team members to more effectively track what classes are being altered in development without relying on the versioning of VSS, file dates or leaving files checked out. The approach taken in the /development projects for each application tends to be senior developer specific.
As shown above, in the /development/cms application folder holding our COM component (cms.dll), we find only 4 of the 16 class files for this component.
VSS Issues
Share/Branching files from /test into /production
You cannot overwrite files with share/branch. If you attempt to share/branch the file myfile.txt from one project to another and the file exists at the destination, you are told this, but the branch will give you no option to overwrite the file.
To get around this, delete the files at the destination project first, then share/branch from your source file. An alternative method is merging, but it can be very time consuming and error prone.
Building the new /production project
When a change control is complete and all code is now in production from our /test/CCYYYYMMDD project, we need to re-establish our production source environment again. For example, for the STARS application we would see this:
/test/CC20031204/STARS/<other projects and altered source taken into prod>
/production
  /STARS              (copy all source from /STARS20031204 and merge /test/CC20031204/STARS/ here)
  /STARS20031204      (before the change control)
So the /STARS project under production is an image of all source currently on the production server.
To do the copy of /STARS20031204 and paste it as /STARS, one would think you could simply share/branch this folder to /production and be prompted to rename? Well, it doesn't work. You need to manually create the /STARS project, then for each sub-project folder under /STARS20031204 share/branch it to /STARS. This is extremely inconvenient and does not follow standard GUI interactivity as per other MS products.
Adding/Removing Project Source Files
If you remove files from a project folder in VSS, you will find that VSS tracks this removal
for subsequent recovery, all fine and good. When you add files back into this project
folder, you will get something like this:
In this case I am adding 45 files that are part of a single large DLL. There is no "yes all"
option, so I need to click on NO forty-five times.
When deleting files, you are prompted with the list and a small dialog; check the “destroy permanently” option:
This will prevent the previous message from popping up forty-five times, but there is no rollback.
IMPORTANT - If you delete a range of source files from a project, then check new files in over the top, you may find all history of previous actions against these files is lost. Consequently, always branch from /test back into /production to retain this history. Developers rely on this history and may get very upset if it’s lost.
Error on Renaming Projects
Removing large projects from VSS is an issue. When I rename, say, /production/stars to /production/stars20031204, it sits there for 20+ sec then I repeatedly get this message:
This will carry on for all sub-project folders. If I kill the process via task manager and
return to VSS, the project has been successfully renamed and I have no further issues
with the project.
Use Labels where appropriate
This is not a gripe on VSS, but a feature I encourage. Anytime code is checked in, VSS will assign it an internal version number. The developers can complement this with labels that act as textual version numbers to provide better clarity on what a specific version encompasses. It is not unusual to label a single project and all its files; once done, we can get latest version based on this label to retrieve all source files marked with it. This is done by right clicking on the VSS project - show history - check the labels only check box in the project history dialog shown. A list of labels is shown; click on the label to be retrieved and click get, and this will do a get latest version to the project’s marked working directory.
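The same can be done from the VSS command line; a minimal sketch (the project path and label text are hypothetical, and options can vary between VSS releases):

REM label the change control project so it can be retrieved later by name
ss Label $/production/cc20041201 -L"CC20041201 go-live" -I-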
The "guest" user
A very quick one - the guest account cannot be granted check-in/out, add/rename/delete
or destroy privileges on VSS project folders. As such, when bulk changing user access, unselect the guest account to allow you to bulk change the remainder of the users.
Security Issues in /production for Branching files
In the /production project folder, a developer cannot branch files into the /development
folder unless they have the check-in/out security option enabled for the /production
project. This is a real problem for me. Why? Developers can now check in/out over production source and may also branch incorrectly (i.e. share rather than branch). As
shown below, the developer requires read and check in/out for the branch option to be
available.
Here are some possible options to get around the issues:
a) Ask developers to email me what files are required for branching into development
(far from ideal)
b) Pre-branch from /production back into /development and the developers can sort out
what to remove or keep for their next phase of development
c) Give developers the check-in/out access in production and trust that all will be fine, that source will be branched correctly, and never checked in/out of the production folder directly.
d) Change the way to name /production folders so we always have a "copy" of current
production (locked down) and another for developers to branch from.
e) Microsoft upgrades VSS (not likely - come on guys, all other software has seen major
change except VSS). That said, the change has been in the VS.Net GUI space as we will
see later.
Me? I have opted for c) for the time being to remain flexible.
NOTE - Do not forget that if, when using option c), developers incorrectly share rather than branch code, or check in/out of the /production folder rather than /development, you have all the VSS features to resolve the issue, namely the inbuilt versioning of source (i.e. we can roll back), the viewing of this version history, and the fact that in /production we have denied the “destroy” (delete) option.
Welcome to .Net
Initial Configuration of Visual Studio.Net
You require Visual SourceSafe SP6d or greater for Visual Studio .Net 2003 to complete the following steps. I have found that under some circumstances VSS 6c will not work; the source control options are shown but are all grayed out (disabled). Very strange indeed.
Start VS.Net and select Tools then Options to open the IDE options dialog below. From
here we are interested in two key areas in relation to VSS, the source control options:
Click on SCC Provider to connect to your source safe database, for example:
From here we configure the VSS database connection string, some core check in/out
options and login details. These are automatically populated based on my existing VSS
client setup.
Equally important are the projects and solutions settings under Environment. All developers should map to the same physical location on their development machines; in this example it is c:\AppProjects. I do not like to prefix the company’s name or division/team name as, like all things, they tend to change often. The directory will be automatically created.
IMPORTANT - If you repair your VS.Net installation you may need to re-install
VSS 6d as the source safe option may "disappear" from your File menu.
VS.Net Solutions and Projects
Important Notes before we continue
Here are some general rules before we continue on with examples:
a) Use VS.Net source control options where possible - don’t run the VSS GUI,
checkout code, then go about opening the source. Use the VS.Net options
and the open from source control menu option.
b) Developers need to standardize a local PC directory structure for the checkout of code from source safe. It is not difficult, but means everyone gets together and sorts out the base working directory structure for checked out code. You may find VSS and VS.Net hard coding specific directories that will make life very difficult for you later in the development phase.
c) Keep your projects and solutions simple. Avoid multiple solutions for multiple sub-components of an application; keep the structure flat and simple. Only “play” when you (and your team) are sufficiently experienced and comfortable with it.
Adding a new (simple) Solution to Source Control - Example
Here is a very simple (default) solution called cktest1. No rocket science here by any
means.
From the file menu, select Source Control and Add to source; you are presented with this warning in relation to FrontPage. If all is fine, press continue.
You are prompted for the root location of this solution within VSS. You can name your
VSS project folder different to that of the solution, but I do not recommend it. Here we
select our /development folder:
The /cktest1 VSS project is then created and the lock icons within VS.Net represent the
source control activation against this solution and its project(s).
The VSS structure is a little messy though - we get this by default:
/cktest1 is the solution, and /cktest1_1 is the project, so lesson learnt - name your solution and projects beforehand. Either a prefix or postfix standard is recommended here, for example:
cktest1Solution and cktest1Project
but does get a little more complicated with many projects.
If you want to change your solution and project names you will get this message:
Cancel this change. If you don’t like the VSS standard structure, and prefer a more
logical hierarchy like this:
/development/cktest1 (aka. the system name!)
/cktest1Solution
/cktest1Project
It gets a little tricky.
To do this, check-in all code into VSS and close your solution. Open the VSS GUI, and
navigate to your newly created project.
Now share & branch your code from:
/cktest1/cktest1 == to ==> /cktest1/cktest1Solution
and
/cktest1/cktest1_1 == to ==> /cktest1/cktest1Solution/cktest1Project
Do not remove the old project folders.
Go back to VS.Net and open the solution once again (do not checkout code!).
Select the solution, then select File, Source Control, and pick the Change Source Control
option.
Under the server bindings area, we select the new project folders as created previously.
The solution will be automatically checked-out to complete the new VSS bindings.
Open the VSS GUI and remove the old /cktest1 and /cktest1_1 projects. I would
recommend a get latest version on the root project just in case.
Whether you remain with a flat structure, or re-bind as we have done above, is an issue
for the change manager more than anything in terms of best practice.
VSS for the DBA
The examples have presented a complete solution for change control management using
VSS.
The DBA may choose to store scripts in VSS created from development, test, and those applied to production. The examples apply in all cases; as scripts are simply files, they need to be managed between environments, and VSS is the tool for the job.
The DBA should also consider VSS as a backup utility. The DBA should script all production databases on a monthly basis, and store the scripts within VSS. VSS and its merge functionality can be handy in this respect, for example to locate differences between months, or to retrieve the definitions of lost database objects, constraints, views etc rather than recovering full databases from tape.
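A minimal sketch of such a monthly job (the server, VSS paths and file names are hypothetical, and the schema-generation script itself is assumed to already exist):

REM snapshot the schema to file, then version it in VSS
ss Workfold $/dba/scripts c:\dbscripts
ss Checkout $/dba/scripts/mydb_schema.sql -I-
osql -E -S MYSERVER -i c:\dbscripts\gen_schema.sql -o c:\dbscripts\mydb_schema.sql
ss Checkin $/dba/scripts/mydb_schema.sql -C"monthly schema snapshot" -I-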
The DBA should not forget objects outside of the user database, namely:
• Full text index definitions
• Logins
• Important passwords
• Scripted DTS jobs
• DTS Packages saved as VB files
• Linked Server Definitions
• Publication and Subscription scripts
Chapter 3
Theory and Essential Scripts

The DBA must have a solid understanding of the instance and the databases. The DBA can then better plan system backup and recovery procedures. This chapter provides the knowledge to better understand your SQL environment.
As a reminder, this ebook is not a beginners guide to SQL databases. This chapter
provides value added knowledge to those already familiar with installation, setup and
configuration of SQL Server 2000.
Undo & Redo Management Architecture
The key component for rollback and redo in SQL Server is the transaction log that is
present in each database within the instance. The transaction log is a serial record of all
transactions (DML and DDL) executed against the database. It is used to store:
• start of each transaction
• before and after changes made by the transaction
• allocation and de-allocation of pages and extents
• commit and rollback of transactions
• all DDL and DML
The transaction log itself consists of one or more physical database files. The size of the
first must be greater than or equal to 512Kb in size. SQL Server breaks down the
physical file into two or more virtual transaction logs. The size of the file and its autogrowth settings will influence the number of virtual logs and their size. The DBA cannot
control the number of, or sizing of, virtual logs.
Physical transaction log file (min size 512Kb, with 2 virtual logs); each virtual log file is at least 256Kb.
The log-writer thread manages the writing of records to the transaction log and the
underlying data/index pages. As pages are requested and read into the buffer cache,
changes are sent to the log-cache as a write-ahead operation which log-writer must
complete with a write to the transaction log before the committed change is written to
the data files as part of a check-point. In terms of an update, the before and after images are written to the log; for a delete, the before image is written; an insert tracks only the new records (not including many other record entries, as mentioned previously, as part of the transaction). The log-writer (or log manager) allocates a unique LSN (log sequence number – a 32bit#) to each DML/DDL statement, including the transaction ID that links related log entries together.
NOTE – The log manager (writer) alone does not do all of the writing; the
logwriter thread will write pages that the worker threads don’t handle.
Log Cache: the log is flushed, an LSN is allocated, and the transaction ID is stamped with the before/after images of the records updated. The write-ahead log cache write must complete against the physical transaction log file before buffer cache pages are written back to the physical data files.
Buffer Cache / Log Manager: a select and update of a database record dirties buffer pages and a log-write is requested; the checkpoint then completes by writing buffer cache entries back to the database data files only after the log write is complete.
The transaction entries (committed and uncommitted) in the transaction log are doubly
linked lists and each linked entry may contain a range of different (sequential) LSN’s.
The buffer manager guarantees the log entries are written before database changes are
written, this is known as write-ahead logging; this is facilitated via the LSN and its
mapping to a virtual log. The buffer manager also guarantees the order of log page
writes.
NOTE – every database page has a LSN in its header; this is compared to the log
entry LSN to determine if the log entry is a redo entry.
The transaction itself remains in the active portion of the transaction log until it is either
committed or rolled back and the checkpoint process successfully completes.
Physical transaction log file: the doubly linked list of log entries for the transaction forms the active portion of the log, with free space remaining in the virtual logs.
NOTE – The checkpoint process keeps around 100 pending I/O operations
outstanding before it will yield to the User Mode Scheduler (UMS). The
performance monitor counter checkpoint pages/sec is an effective measure of the
duration of checkpoints.
Although not shown above, space is also allocated for each log entry record for rollback
purposes; therefore actual space utilisation can be significantly more than expected.
Microsoft defines the active part of the log to be the portion of the log file from the
MinLSN to the last written log record (end of logical log).
The MinLSN is the LSN of the first of possibly many yet uncommitted (or rolled back) transactions.
In the cyclic and serial transaction log, for example:

LSN 112   BEGIN TRAN T1
LSN 113   UPD TRAN T1
LSN 114   BEGIN TRAN T2
LSN 115   COMMIT TRAN T1
LSN 116   CHECKPOINT
LSN 117   DEL TRAN T2

The active portion runs from the Min LSN through to the last log record (LSN 117), followed by free virtual log space and the end/start of the next virtual log file.
The log records for a transaction are written to disk before a commit acknowledgement is
sent to the client. Even so, the physical write to the data files may have not occurred.
Writes to the log are synchronous but the writes to data pages are asynchronous. The
log contains all the necessary information for redo in the event of failure and we don’t
have to wait for every I/O request to complete. (33)
When the database is using a full or bulk-logged recovery model, the non-active portion
of the log will only become “free” (can be overwritten) when a full backup or transaction
log backup is executed. This ensures that recovery is possible if need be via the backup
and the DBMS can happily continue and overwrite the now free log space. If the database is using the simple recovery model, then at a database checkpoint any committed (and checkpointed) or rolled back (and checkpointed) transaction’s log space will become immediately free for other transactions to use. Therefore, point in time recovery is impossible.
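A minimal illustration of the two behaviours (the database name and backup path are hypothetical):

ALTER DATABASE mydb SET RECOVERY FULL
BACKUP LOG mydb TO DISK = 'c:\backup\mydb_log.bak'  -- frees the non-active log space for reuse
ALTER DATABASE mydb SET RECOVERY SIMPLE             -- log space freed at checkpoint instead; no point in time recovery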
The checkpoint process is key to completing the committed transactions and writing the
dirty buffers back to disk. In relation to the transaction log, a checkpoint will:
a) write a log entry for the start of the checkpoint
b) write the start LSN for the checkpoint chain to the database for subsequent
recovery on instance failure
c) write a list of active (outstanding) transactions to the log
d) write all dirty log and data pages to disk for all transactions
e) write a log file record marking the end of the checkpoint
The checkpoint will occur:
a) On issue of the CHECKPOINT or ALTER DATABASE statement
b) On instance shutdown (SHUTDOWN statement)
c) On SQL service shutdown
d) On automatic checkpointing:
   a. The DBMS calculates timing based on the recovery interval setting
   b. “Fullness” of the transaction log and number of transactions
   c. Based on timing set with the recovery interval parameter
   d. Database using the simple recovery mode?
      i. If the log becomes 70% full
      ii. Based on the recovery interval parameter
NOTE – dirty page flushing is performed by the lazywriter thread. A commit does not
trigger an immediate checkpoint.
You can see the checkpoint operation in action via the ::fn_virtualfilestats (SQL 2k) system function. The function has two parameters, the database ID and the file ID. The statistics returned are cumulative. The test is simple enough: we run insert and update DML, calling ::fn_virtualfilestats between each operation, and at the end force a checkpoint.
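A sketch of the call pattern used (the database ID of 7 and the table name are specific to this test; substitute your own):

INSERT mytable (col1) VALUES ('test row')   -- some DML
SELECT * FROM ::fn_virtualfilestats(7, 2)   -- log file statistics
SELECT * FROM ::fn_virtualfilestats(7, 3)   -- data file statistics
-- ...repeat for each DML operation, then...
CHECKPOINT
SELECT * FROM ::fn_virtualfilestats(7, 2)
SELECT * FROM ::fn_virtualfilestats(7, 3)

Here is what we see: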
DbId FileId TimeStamp   NumberReads NumberWrites BytesRead  BytesWritten
---- ------ ----------  ----------- ------------ ---------- ----------------------
7    2      1996132578  317         154          17096704   2127360
7    3      1996132578  30          42           819200     540672
7    2      1996132578  317         156          17096704   2128384 [1024 diff]
7    3      1996132578  30          42           819200     540672  [0 diff]
7    2      1996132578  317         158          17096704   2129408 [1024 diff]
7    3      1996132578  30          42           819200     540672  [0 diff]
7    2      1996132593  317         160          17096704   2130432 [1024 diff]
7    3      1996132593  30          42           819200     540672  [0 diff]
7    2      1996132593  317         162          17096704   2131456 [1024 diff]
7    3      1996132593  30          42           819200     540672  [0 diff]
7    2      1996132625  317         165          17096704   2140672 [9216 checkpt]
7    3      1996132625  30          43           819200     548864  [8192 written]

File ID 2 = Log File
File ID 3 = Data File
Each pair of rows represents a new DML operation being performed. The last operation we perform is the CHECKPOINT; here we see BytesWritten increase as the dirty pages are forced to flush to disk (file ID 3).
Running the LOG command you will see the checkpoint operations:
DBCC LOG(3, TYPE=-1)
where 3 is the database ID (select name, dbid from master..sysdatabases order by 1)
Apart from the recovery interval and recovery mode parameters, the DBA has little
control of checkpointing. It can be forced via the CHECKPOINT statement if need be and
I recommend this before each backup log statement is issued.
To drill into the virtual logs and the position, status, size, MinLSN within the
transaction log files, use the commands:
dbcc loginfo
dbcc log(<db#>, TYPE=-1)
The 3rd party tool “SQL File Explorer” includes a simple but effective GUI view of your
transaction log file and its virtual logs. This can be very handy when resolving log file
shrink issues.
The tool also offers a textual view of the same information.
NOTE – To get even more details about log entries, consider the command:
select * from ::fn_dblog(null, null)
The process of recovery involves both:
a) Redo (rolling forward, or repeating history to the time of the crash); and
b) Undo (rolling back, or undoing un-committed transactions) operations.
NOTE – In SQL Server 2k, the database is only available after undo completes (except when the STANDBY restore clause is used). In SQL Server Yukon, the database is available when undo begins, providing faster restore times!
The redo operation is a check; we are asking “for each redo log entry, check the physical files and see if the change has already been applied” – if not, the change is applied via this log entry. The undo operation requires the removal of changes. The running of these tasks is based upon the last checkpoint record in the transaction log. (33)
The management of transactions is a complex task, and is dealt with not by the buffer manager, but via the transaction manager within the SQL engine. Its tasks include:
a) isolation level lock coordination with the lock manager, namely when locks can be released to protect the isolation level requested; and
b) management of nested and distributed transactions and the boundaries in which they span; this includes coordination with the MSDTC service using RPC calls. The manager also persists user defined savepoints within transactions, as sketched below.
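For example, the savepoint persistence mentioned in (b) is exposed through plain T-SQL (the table and column names here are hypothetical):

BEGIN TRAN
UPDATE mytable SET col1 = 'x' WHERE id = 1
SAVE TRAN before_delete                  -- user defined savepoint
DELETE mytable WHERE id = 2
IF @@ERROR <> 0
    ROLLBACK TRAN before_delete          -- undo back to the savepoint only
COMMIT TRAN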
REMINDER – The transaction log is a write-ahead log. Log writes are synchronous and single threaded; actual DB writes are asynchronous, multi-threaded and, as shown in the diagrams, optimistic. Also note that compressed drives and their associated algorithms disable the write-ahead logging (WAL) protocol, and can effectively stall checkpoints and the timing of calls.
Audit the SQL Server Instance
Walking into a new server environment is never easy, and understanding the databases running on your instances at a high level is a basic task that needs to be completed quickly and accurately. The DBA should collect the following – all are important in some way to systems recovery (DR) and system troubleshooting in general:
a) Instance version and clustered node information (if applicable)
b) Base instance properties covering switches, memory, security settings etc.
c) Service startup accounts, including the SQL instance service, SQL Agent and the SQL Agent proxy account (if used)
d) Select snapshot of configuration settings
e) The existence of performance counters and statistics from a select few
f) List of sysadmin and dbowner users/logins
g) List of databases, options set, total size and current transaction log properties, if the database has no users other than DBO:
   a. Files used and their drive mappings
   b. Count of non-system tables with no indexes
   c. Count of non-system tables with no statistics
   d. List of duplicate indexes
   e. Count of non-system tables with no clustered index
   f. Tables with instead-of triggers
   g. List of schema bound views
   h. Count of procedures, views, functions that are encrypted
   i. List of any user defined data types
   j. List of PINNED tables
   k. For each user, a count of objects owned by them rather than DBO
h) Last database full backup times – see section on backups.
i) Publications currently in existence
j) Subscriptions currently in existence
k) Startup stored procedures in effect
l) Currently running traces and flags set on startup
m) List of linked servers and their properties
n) If the database is currently experiencing blocking
o) Login accounts with no password, and those with sysadmin access
p) Non-standard users in the master and msdb databases (consider model as well, to determine if any special “generic” user exists when other databases are created), and those users with xp_cmdshell access
q) Current free space on local disks
r) Full text index statistics/summary
This e-book provides many of these answers based on the problem at hand. The internet is a huge source of scripts; I have no single script, but use many crafted by fellow DBA’s. Rather than deal with the issues of copyright, I charge you with the task of accumulating and adapting suitable scripts. Feel free to email me if you require assistance.
As a general guide for systems recovery, collect the following:
DBCC MEMORYSTATUS
SELECT @@VERSION
exec master..xp_msver
exec sp_configure
select * from master..syslogins
select * from master..sysaltfiles
select * from master..sysdatabases
select * from master..sysdevices
select * from master..sysconfigures
select * from master..sysservers
select * from master..sysremotelogins
select * from master..sysfilegroups
select * from master..sysfiles
select * from master..sysfiles1
sp_MSforeachtable and sp_MSforeachdb stored procedures
execute master..sp_helpsort
execute master..sp_helpdb -- for every database within the instance.
To determine the actual “edition” of the SQL instance, search the internet for a site listing
the major and minor releases. A very good one can be found at:
http://www.krell-software.com/mssql-builds.htm
The SERVERPROPERTY command is a handy one, giving you the ability to ping small but
important information from the instance. For example:
SELECT SERVERPROPERTY('ISClustered')
SELECT SERVERPROPERTY('COLLATION')
SELECT SERVERPROPERTY('EDITION')
SELECT SERVERPROPERTY('ISFullTextInstalled')
SELECT SERVERPROPERTY('ISIntegratedSecurityOnly')
SELECT SERVERPROPERTY('ISSingleUser')
SELECT SERVERPROPERTY('NumLicenses')
The database level equivalent is DATABASEPROPERTY and is covered well in the BOL.
To locate the install path (undocumented command):
DECLARE @sql SYSNAME
DECLARE @data SYSNAME
EXEC master..sp_msget_setup_paths @sql OUTPUT, @data OUTPUT
SELECT @sql, @data
Do not forget the SQLDIAG.EXE command, which dumps core system and SQL configuration information into a single ASCII file. I highly recommend that you run this daily to assist with systems recovery:
cd "C:\Microsoft SQL Server\MSSQL$CORPSYS\binn"
sqldiag -icorpsys -oc:\sqldiag_corpsys.txt
-- corpsys is the named instance
Go to the binn dir for the instance and run sqldiag. Remove the –i<myinstancename> as necessary. I always run named instances of SQL Server (a personal preference more than anything – especially if you want to run an older DBMS version of SQL on the box), so the default instance will be:
C:\Program Files\Microsoft SQL Server\MSSQL\Binn>sqldiag -oc:\sqldiag.txt
Meta Data Functions
It is easy to forget about the SQL Server meta data functions, all of which are essential to making life a damn sight easier with more complex scripts. I recommend you spend some time exploring the following at a minimum:
DB_NAME – Given the database ID from the master..sysdatabases table, returns the name of the database.
DB_ID – As above, but you pass the name and the ID is returned.
FILE_NAME – Given the fileid number from sysfiles for the current database, returns the logical name of the file. The value is equivalent to the name column in sysfiles. Be aware that databases in the instance can share the same ID number, so file_name(1) can work equally well in any database; this ID value is not unique for the entire instance.
FILEGROUP_NAME – Pass in the groupid value as represented in sysfilegroups for each database and the groupname column is returned. Remember that we can have primary and user defined filegroups; transaction logs do not have a group, and there is a maximum of 256 filegroups per database.
FILEGROUPPROPERTY – We can use the previous command as the first input into this routine; the next parameter includes three basic options, IsReadOnly, IsUserDefinedFG and IsDefault. This will undoubtedly change with future releases of SQL Server.
FILEPROPERTY – Will return file specific property information. We can use the FILE_NAME function as the first input; the next parameter can be one of the following: IsReadOnly, IsPrimaryFile, IsLogFile, SpaceUsed. The space is in pages.
FULLTEXTSERVICEPROPERTY – Returns the MSSEARCH (Microsoft Search Service) service properties specific to SQL Server full text engine integration.
OBJECT_NAME – Returns the object name for a given object ID, based on the current database of the connection.
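As a quick hedged example combining a few of these, run from the database of interest:

SELECT DB_NAME() AS [database],
       FILE_NAME(1) AS [first file],
       FILEPROPERTY(FILE_NAME(1), 'SpaceUsed') AS [space used (pages)]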
Listing SQL Server Instances
To get a list of SQL Instances on the network, consider this command:
isql –L
- or -
osql –L
There are a range of third party alternatives available on the internet, such as
sqlping.exe from www.sqlsecurity.com. Also check my website under hints/tips for a
bunch of SQL tools from a variety of authors.
Information Schema Views
It is not uncommon for system tables to change between releases; so significant can the changes be that the impact on existing scripts can result in complete re-writes. To ease the pain a little, a range of what I call Metadata++ views is available; these tend to be more consistent, with items added rather than taken away.
The metadata views are referred to as the information_schema.
information_schema.tables – Catalog (database), Owner, Object name, Object type (includes views)
information_schema.views – Catalog (database), Owner, Object name, View Source (if not encrypted), Check Option?, Is Updatable?
information_schema.columns – Catalog (database), Owner, Table name, Col Name, Col Position, Col Default, Is Nullable, Data Type, Char Min Len, Char Octal Len, Numeric Prec <etc>
information_schema.schemata – Database Name, Schema Name, Schema Owner, Def Char Set Catalog, Def Char Set Schema, Def Char Set Name
information_schema.referential_constraints – Catalog (database), Owner, Constraint Name, Unique Constraint Database, Unique Constraint Owner, Unique Constraint Name, Update Rule, Delete Rule
Note that the views return data for the currently active (set) database. To query another
database use the db-name(dot) prefix:
select * from mydb.information_schema.tables
There are no special security restrictions for the views and they are accessible through the public role. Be aware, though, that the schema views will only return data to which the user has access. This is determined within the views themselves via the permissions() function.
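For example, a sketch listing the columns of a (hypothetical) table through the views:

select table_name, column_name, ordinal_position, data_type, is_nullable
from mydb.information_schema.columns
where table_name = 'mytable'
order by ordinal_position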
To view the code of any of the views:
use master
exec sp_helptext 'information_schema.Referential_Constraints'
As you have probably noticed, the views are very much DDL orientated and lack items
such as databases or linked servers for example; these would prove very handy rather
than having to query system tables. (34)
Database, File and File Group Information
Extracting Basic Database Information
Use the system stored procedure found in the master database:
exec sp_helpdb mydatabase
Determining Database Status Programmatically
The sysdatabases table in the master database includes the status column of type
int(eger). This is an essential column for many scripts to use to determine the state of
instance databases before continuing on with a specific recovery scenario etc. Here is an
example:
alter database pubs set read_only
go
select name, 'DB is Read Only'
from master..sysdatabases
where status & 0x400 = 1024
go
where 0x400 is equivalent to 10000000000 binary, (BOL tells us that bit position 11 (or
1024 decimal) tells us the database is in read-only mode). Including more than one
option is a simple matter of addition; the DBA should recognize that 0x (zero x) is the
prefix for a hexadecimal value.
Status   Hex      Binary                 Meaning
32       0x20     100000                 Loading
64       0x40     1000000                Pre-Recovery
128      0x80     10000000               Recovering
256      0x100    100000000              Not Recovered
512      0x200    1000000000             Offline
1024     0x400    10000000000            Read Only
2048     0x800    100000000000           DBO use only
4096     0x1000   1000000000000          Single User
32768    0x8000   1000000000000000       Emergency Mode

Common Status Code Bits.
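For example, to test two of these states at once, add (OR) the bit values; a sketch:

select name, 'DB is Read Only or Offline'
from master..sysdatabases
where status & (0x400 | 0x200) > 0   -- 1024 = read only, 512 = offline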
An alternate method is a call to the DATABASEPROPERTY function, and is much more understandable:
select name, 'DB is Read Only'
from master..sysdatabases
where DATABASEPROPERTY(name, N'IsReadOnly') = 1
go
The DATABASEPROPERTYEX function is a drill-through function, providing more specific
property information about the database itself. These functions are called meta-data
functions.
Remember – multiple bits can be set at any one time. The list above is not the
definitive set of values, but are the most important to recognize in terms of Backup
and Recovery.
Using O/ISQL
The osql and isql command line routines are very handy and well worth exploring. For a majority of DBA work the commands can go unused, but occasionally they prove essential, especially for complex scripts where dumping data to flat files is required. For example:
osql -E -h-1 -w 158 -o test.txt -Q "SET NOCOUNT ON SELECT name FROM
master..sysdatabases WHERE name <> 'model'"
or run an external SQL file:
exec master..xp_cmdshell 'osql -S MySQLServer -U sa -P mypwd -i c:\dbscripts\myscript.sql'
Rather than embedding the username/password, especially within stored procedures via
xp_cmdshell, consider –E for integrated login via the service user account. The SQL
script may contain multiple statements, be they in a global transaction or not.
NOTE – to represent a TAB delimiter for output files, use –s “ “, where the space represents a TAB character.
Taking this further, Mike Labosh at www.devdex.com shows how to process many .sql scripts in a directory using a single batch file:
RunScripts.bat
@ECHO OFF
IF "%1"=="" GOTO Syntax
FOR %%f IN (%1\*.sql) DO osql -i %%f <--- also add other switches for osql.exe; note %%f (not %f) inside a .bat file
GOTO End
:Syntax
ECHO Please specify a folder like this:
ECHO RunScripts c:\scripts
ECHO to run all the SQL scripts in that folder
:End
ECHO.
Retrieving Licensing Information
To locate information about the existing licensing scheme, run the following within Query
Analyzer:
select SERVERPROPERTY('LicenseType')
select SERVERPROPERTY('NumLicenses')
where 'NumLicenses' is not applicable for the per CPU scheme. If you are running the
Developer Edition, 'LicenseType' returns “disabled” and 'NumLicenses' is NULL.
The License Manager program (control-panel -> administrative tools) cannot display
information related to per-processor licensing for SQL Server (Q306247). To get around
this issue the installation will create a new administrative tool program to track such
information as shown in the screen shots below.
The SQL Server 2k licence applet is very different under v7. The Settings -> Control Panel -> Licensing applet shows only server licensing for SQL Server 2000 installations (unlike v7); the applet above is used instead. Notice that you cannot switch to a per-processor license; you must re-install the instance to do this.
NOTE – Unless specified in your license contract or SQL Server version on
installation, the installed instance will not automatically expire or end.
Alter licensing mode after install?
Once the licensing mode is set on instance installation, that’s it. You cannot change it
back. In control panel we have the icon:
This will allow you to administer your licenses, but you will see one of the options is grayed
out, such as:
To get around this go to the following registry key entry:
Change the mode to 1 (default is zero) and you get both options:
Microsoft may not support you if this method is used.
Allowing writes to system tables
Use the following code whilst logged in as SA or with sysadmin privileges:
exec sp_configure "allow updates", 1 -- 0 (zero) to disable
go
reconfigure with override -- force immediate change in config item
Updating system tables directly is not supported by Microsoft, but there are some
occasions where it is required. In terms of DR, it is important to understand the process
if you need to apply changes.
Count Rows & Object Space Usage
The DBA has a number of methods here to return a table rowcount and space utilization
details. Ensure your scripts return table objects and not schema bound views or views
themselves:
OBJECTPROPERTY(<object-id>, 'IsUserTable') = 1
If you want to run a SQL command for each table in the database, consider the
command master..sp_MSforeachtable, for example:
exec sp_MSforeachtable @command1="select count(*) from ?"
The only other MS options are master..sp_MSforeachdb or
master..sp_MSforeach_worker. It is a pity we have no others for views, functions, stored
procedures etc. Search Google and you will find many examples:
http://www.sqlservercentral.com/columnists/bknight/sp_msforeachworker.asp
Do note that the @command1 parameter itself can accept multiple physical commands. For example, this is valid:
@command1="declare @eg int print '?' select count(*) from ?"
- or -
@command1="declare @name varchar(50) set @name=parsename('?', 1) dbcc updateusage(0, @name) with no_infomsgs select count(*) from ?"
Here are some methods of row counting and returning object space usage:
Method:
    select count(*) from mytable
General notes: Will not return the total number of bytes used by the table. IO intensive operation in terms of table and/or index scanning.

Method:
    SELECT object_name(id), rowcnt, dpages * 8
    FROM mydb..sysindexes
    WHERE indid IN (1,0)
    AND OBJECTPROPERTY(id, 'IsUserTable') = 1
    -- OR --
    SELECT o.name, i.[rows], i.dpages * 8
    FROM sysobjects o
         INNER JOIN sysindexes i ON o.id = i.id
    WHERE (o.type = 'u') AND (i.indid = 1)
    AND o.name not like 'dt%'
    ORDER BY o.name
General notes: Patch or version changes in the DBMS can break the script. Returns row count and bytes/space used. Does not factor in multiple indexes. May not be 100% accurate (based on the last statistics update) – consider DBCC UPDATEUSAGE before-hand. The indid column value may be zero (no clustered index) or one (clustered); note that 0 and 1 will not exist together.

Method:
    exec sp_spaceused 'mytable'
General notes: Returns data and index space usage. Will auto-update the sysindexes table. Will scan each data block, therefore can be slow on larger tables – use the @updateusage parameter set to 'false' to skip this scan. Prefix with master.. and the routine will search this DB for the object name passed in.

Method:
    dbcc checktable('mytable')
General notes: Performs a physical data integrity check on tables or indexed views; CHECKALLOC is more thorough in terms of all allocation structure validation. Slow and DBMS engine intensive. Will return structural error information with the table (if any), along with a row and pages used count. Can repair errors (see BOL for options). Will validate all indexes unless specified.

Method:
    exec sp_msforeachtable @command1="declare @name varchar(50) set @name=parsename('?', 1) dbcc updateusage(0, @name) select count(*) from ?"
General notes: Another example using the msforeachtable routine (undocumented system routine).
Space and Memory Usage
To get the current database size along with data/index/reserved space:
use mydb
exec sp_spaceused
-- or just the basic file information --
select fileid, size from sysfiles
-- or via FILEPROPERTY --
USE master
go
SELECT FILEPROPERTY('master', 'SpaceUsed')
To retrieve fixed disk space usage:
exec xp_fixeddrives
To get the space used on a table:
exec sp_spaceused largerow, @updateusage = true
To retrieve transaction log space usage:
dbcc sqlperf (logspace)
To determine the amount of free space for a database, consider the following SQL:
select name, sum(size), sum(maxsize)
from mydb..sysfiles
where status & 64 = 0
group by name     -- size is in 8Kb pages
select sum(reserved) from mydb..sysindexes where indid in (0,1,255)
NOTE – The DBCC UPDATEUSAGE command should be run before the sp_spaceused command is run, to avoid inconsistencies in the reported figures.
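For example (the database name is hypothetical):

DBCC UPDATEUSAGE ('mydb') WITH NO_INFOMSGS
exec mydb..sp_spaceused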
Black Box Tracing
Microsoft Support recommends the black box trace when dealing with instance lockups
and other strange DBMS errors that are difficult to trace. The trace itself will create the
files blackbox.trc, blackbox_0n.trc (switches every 5Mb). These return critical system
error messages. The trace is activated via:
declare @traceID int
exec sp_trace_create @traceID OUTPUT, 8   -- Create the trace
exec sp_trace_setstatus @traceID, 1       -- Start the trace
and produces the blackbox.trc file.
If you want to start the trace every time SQL Server starts, then place the code into a stored procedure within the master database and set the option:
exec sp_procoption 'mystoredprocnamehere', 'startup', true
As a general rule, do not do this unless you have a specific need. To verify the trace:
USE master
GO
SELECT * FROM ::fn_trace_getinfo(1)
GO
Do not run the trace as a regular enabled trace as it may degrade system performance.
Here is a more complete script to startup the blackbox trace. It includes logging
messages to the error log.
use master
go
CREATE PROC dbo.sp_blackbox
AS
declare @traceID int
declare @errid int
declare @logmessage varchar(255)
exec @errid = sp_trace_create @traceID OUTPUT, 8  -- Create the TRACE_PRODUCE_BLACKBOX trace
exec sp_trace_setstatus @traceID, 1               -- Start the trace
if @errid <> 0 begin
    set @logmessage = 'Startup of Black box trace failed - sp_blackbox - error code - ' + cast(@errid as varchar)
    exec xp_logevent 60000, @logmessage, ERROR
end
else begin
    set @logmessage = 'Startup of Black box trace success - trace id# - ' + cast(@traceID as varchar)
    exec xp_logevent 60000, @logmessage, INFORMATIONAL
end
GO
exec sp_procoption N'sp_blackbox', N'startup', N'true'
GO
To stop the trace:
declare @traceID int
set @traceID = 1
exec sp_trace_setstatus @traceID, 0
If you are really keen, run Process Explorer from sysinternals.com and look through the
handles for the sqlserver process. It will list the black box trace file.
To read a trace file, the easiest approach is to run the Profiler GUI. Here is our example
file showing commands running at the time of the crash:
Be very careful with re-starting your instance after a crash. The instance will overwrite
the file if you leave the default filename. In your stored procedure, write some smarter
code to better control the filename, for example:
set @v_filename = 'BlackTrace_' + convert(varchar(20), getdate(), 112) + '_' + replace(convert(varchar(20), getdate(), 108), ':', '')   -- strip the colons; they are invalid in file names
giving you files like:
BlackTrace_20030512_150000.trc
If you want to create a table from the trace file, rather than opening it via profiler, then use the command ::fn_trace_gettable():
SELECT *
INTO myTraceResults
FROM ::fn_trace_gettable('c:\sqltrace\mytrace_20040101.trc', default)
If the single trace created many files (split by size), and this is the first, then the default parameter will ensure ALL files are loaded into the table (a very handy feature).
Scan Error Log for Messages?
Our friends at Microsoft released a great support document 115519, “INF: How to Scan
SQL Errorlog or DBCC Output for Errors”, namely with the findstr DOS command.
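A sketch of the approach (the log path shown is for a default instance; adjust for named instances):

findstr /i /n "error severity" "C:\Program Files\Microsoft SQL Server\MSSQL\LOG\ERRORLOG"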
Database last restored and from where?
When you select properties of a database within an instance, it pops up a window with a range of tabs. The General tab provides a high level summary of the database, including the last full backup and transaction log backup. But for development/test servers, the question is often posed: “where and when was this database restored from?”.
From a backup perspective, the table msdb..backupset stores the dates used within EM,
namely the column backup_finished_date where type =’D’ for full backups and ‘L’ for
transaction logs.
To retrieve restoration history, query the table msdb..restorehistory:
select
destination_database_name as DBName,
user_name as ByWhom,
max(restore_date) as DateRestored
from
msdb..restorehistory
where
destination_database_name = 'mydb'
and
restore_type = 'D' -- full
group by
destination_database_name, user_name
Join this query to msdb..restorefile over the restore_history_id column to retrieve the
physical file name for each database file restored.
If you want to get the backup files dump location and name from where the restore
occurred, then look at the msdb..backup* tables. The restorehistory table column
backup_set_id joins to msdb..backupset and is the key for locating the necessary
information. This table is an important one as it stores all the essential information
related to the database restored, namely its initial create date, last LSN’s etc. This
restored history is the main reason why many prefer EM for restoration to graphically
view this meta data.
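A hedged sketch of those joins:

select rh.destination_database_name,
       rh.restore_date,
       rf.destination_phys_name,   -- physical file restored
       bs.backup_finish_date       -- when the source backup was taken
from msdb..restorehistory rh
     inner join msdb..restorefile rf on rf.restore_history_id = rh.restore_history_id
     inner join msdb..backupset bs on bs.backup_set_id = rh.backup_set_id
where rh.destination_database_name = 'mydb'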
Also consider the command:
restore headeronly from disk='c:\mydb.bak'
to view backup file header information.
What stored procedures will fire when my instance starts?
The DBA can specify custom stored procedures kept in the master database to run on
service startup via the command:
exec sp_procoption 'mystoredproc', 'startup', 'on'
It can also be set within Enterprise Manager when editing your stored procedures (see
checkbox at the bottom of the edit stored proc window).
The DBA can “skip” the procedures on instance startup via the trace flag:
-T4022
You can check if the option is enabled for a stored procedure using the OBJECTPROPERTY
function:
SELECT OBJECTPROPERTY(OBJECT_ID('mystoredproc'), 'ExecIsStartup')
Alternatively run this command to get a listing. Care is required when using the system
tables between SQL Server version changes:
select name
from sysobjects
where category & 16 = 16
order by name
When was the database last accessed?
The best command I have seen to date is this:
EXEC master..xp_getfiledetails 'C:\work\ss2kdata\MSSQL$CKTEST1\Data\pubs_log.ldf'
Essential Trace Flags for Recovery & Debugging
The following trace flags are essential for a variety of recovery scenarios. These flags are
referred to throughout this handbook. The use of trace flags allows the DBA to gain a
finer granularity of control over the DBMS than is normally given.
260    Show version information on extended stored procedures
1200   Prints lock information (process ID and lock requested)
1204   Lock types participating in deadlocking
1205   Detailed information on commands being run at time of deadlock
1206   Complements 1204
1704   Show information about the creation/deletion of temporary tables
3502   Prints information about start/end of a checkpoint
3607   Skip auto-recovery for all instance databases
3608   As above, except master database
3609   Skip creation of the tempdb database
4022   Bypass master database stored procedures that run on instance startup
7300   Get extended error information on running a distributed query
7502   Disable cursor plan caching for extended stored procedures
Remember that each flag requires its own –T<trace#> entry (e.g. –T1204 –T3605). Do not
try to use spaces or commas for multiple flags within a single –T command. Always review
trace output in the SQL Server error log to ensure startup flags have taken effect.
Example of setting and verifying the trace flags
The trace flag can be enabled via EM as shown below:
This particular trace will force the logging of checkpoints:
checkpoint    -- force the checkpoint
Important – Trace flags set with the –T startup option affect all connections.
The DBA can also enable trace flags on the command line via the sqlservr.exe
binary:
Rather than at the command line, we can also enable traces via the services applet:
To view the current enabled session and database wide traces, use these commands:
DBCC TRACEON(3604)
DBCC TRACESTATUS(-1)
Or rather than using –1, enter the specific trace event#.
For the current connection we can use the DBCC TRACEON and DBCC TRACEOFF
commands. You can set multiple flags on and off via DBCC TRACEON(8722,
8602[,etc]). Use -1 as the last parameter to affect all connections:
DBCC TRACEON (xxxxx, -1)
Use profiler to trace existing connections. Remember that these trace flags are not the
same as trace events as used by profiler (sp_trace_setevent).
IMPORTANT – To check startup parameters without connecting to the instance,
run regedit, navigate to HKLM/Software/Microsoft/Microsoft SQL
Server/<instance>/MSSQLServer/Parameters, and review the SQLArgX list of
string values.
“Trace option(s) not enabled for this connection”?
You may get this message when attempting to view trace flag status via the command:
DBCC TRACESTATUS(-1)
You may know that XYZ flags are enabled, but this routine returns the above message,
leaving you in the dark. You could go back over your SQL Server logs and hope to catch
the DBCC TRACEON output for the flags used. Otherwise run this first (before
tracestatus):
DBCC TRACEON(3604)
Bulk copy out all table data from database
Basically I am not going to reinvent the wheel here. There is a MS support document
176818 titled “INF: How to bulk copy out all the tables in a database”. Do note that it
simply does a select * from tables and uses BCP via xp_cmdshell to dump the files to
disk. This may be problematic with TEXT fields. If so, consider the textcopy.exe utility
that came with SQL Server 6.5 and 7.0. An excellent coverage of this routine can be
found on Alexander Chigrik’s website at – “Copy text or image into or out of SQL Server”,
http://www.mssqlcity.com/Articles/KnowHow/Textcopy.htm
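If you only need something quick, here is a minimal hedged sketch that generates the bcp
commands per user table; the dump path and the -T (trusted connection) switch are
assumptions to adjust for your site:
select 'bcp "' + db_name() + '..' + name + '" out c:\dump\' + name + '.dat -c -T'
from sysobjects
where type = 'U'    -- user tables only
order by name
Run the generated commands from the command line (or via xp_cmdshell).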
SQLSERVR Binary Command Line Options
Starting the SQL Server instance (default or named) manually via the command line is an
essential skill for the DBA to master. When I say master, I mean that you should be
familiar with the options and have experienced first hand the outcome of each option.
The options as of SQL Server 2000 are:
Option                 Summary / Use
-I<IO affinity mask>   Introduced in SP1 of SS2k
-c                     Do not run as a service
-d<path\filename>      Fully qualified path to the master database primary data file
-l<path\filename>      Fully qualified path to the master database log file
-e<path\filename>      Fully qualified path to the error log files
-m                     Start SQL Server in single user (admin) mode; fundamental for
                       master database recovery from backup files
-f                     Minimum configuration mode. Tempdb will be of size 1Mb for its
                       data file and the log will be 0.5Mb; this will only occur if you
                       have not altered tempdb as it only works over the tempdev and
                       templog logical files
-Ttrace-number         Startup the instance with the specified trace flag. Use multiple
                       –T commands for each trace number used
-yerror-number         If error-number is encountered, SQL will write a stack trace out
                       to the SQL Server error log file
NOTE – Although not mandatory, the –c and –f parameters are typically used
together when starting an instance in minimum configuration mode.
SQL Server Log Files
In SQL Server the error log and its destination is defined by the –e startup
parameter. This is also stored within the registry during instance start-up.
From within EM, we can right click the instance for global properties and view the
startup parameters rather than searching the registry:
The management folder in EM allows the DBA to view the contents of the log files. I say
files because SQL Server will cycle through six different log files (default setting). The
cycling of the files will occur on each re-start of the SQL Server instance, or via exec
sp_cycle_errorlog.
The DBA can control the number of cycled logs via a registry change. This change
can be done within Query analyser if you like or via EM by right click for properties in
the SQL Server Logs item under the Management folder.
Exec xp_instance_regwrite
N'HKEY_LOCAL_MACHINE',
N'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer',
N'NumErrorlogs', REG_DWORD, 6
The DBA can view error logs within query analyser using:
exec master.dbo.xp_readerrorlog [number of log file, values 0 to 6]
The valid list can be retrieved via:
exec sp_enumerrorlogs
Archive#   Date               Log File Size (Byte)
0          06/15/2002 23:59   3646
1          06/14/2002 21:25   21214
2          06/03/2002 21:36   3063
3          06/03/2002 10:29   49852
4          05/26/2002 14:25   31441
5          05/12/2002 16:27   5414
6          05/11/2002 15:54   2600
The error log for SQL*Agent is best managed with EM. With SQL*Agent shutdown, select
its properties and you can control a range of log settings:
• Location & filename of the error log
• Include full error trace with log entries
• Use non-unicode file format
Read Agent Log Example
The website “SQLDev.Net” (http://sqldev.net/sqlagent.htm) has a fantastic stored
procedure that really simplifies the reading of the SQL Agent logs. Well worth a look.
How and when do I switch SQL Server logs?
I switch them manually at the end of each day (around midnight). The files can get huge
and make life overly difficult when looking for errors, especially with deadlock tracing
enabled. To switch them I created a small SQL Agent job which can be invoked with the
following SQL command:
exec sp_cycle_errorlog
You may feel the need to retain 10 or more logs, rather than the standard 6 (see
previous section item). I find the default is more than enough when cycled daily.
Detecting and dealing with Deadlocks
The NT performance monitor is a good place to start to determine the extent of a
problem. We use the counter:
SQLServer:Locks \ Number of Deadlocks/sec
Ideally its value is zero, or deadlocks are at least a rare event. There are situations where
this is difficult, especially with third party applications or an OLTP database that is also
being used for reporting and other batch type events out of your control. The DBA should
follow up with SQL Profiler to better trace the deadlocks occurring.
Profiler is a powerful tracing tool, but it does have some problems when tracing
deadlocks as we will see later. On starting a new trace, the DBA should include the
events:
Errors and Warnings
    Exception
Locks
    Lock: Deadlock
    Lock: Deadlock Chain
If you stopped with this and waited for your expected deadlock to occur, you would get
very little information about the objects affected or statements executed unless you
select the data column Object Id. From there you need to manually use OBJECT_NAME
to determine the object affected.
Why is this a problem? To get more information you typically include the event T-SQL:
SQL: Batch Completed. If you run the trace with this option then you will be tracing ALL
completed batches, and in a busy system this can mean thousands of entries within
minutes; this makes tracing a difficult and time consuming task. Even so, if you can
deal with this, you will get a thorough list of statements related to the deadlock; stop the
trace after the deadlock has occurred and use the find dialog to search the columns for
the deadlock event, then search backwards from the SPIDs involved in the trace to get a
summary of the commands before the deadlock.
NOTE – Running profiler whilst locking is already underway and a problem, will do
you no good. You may only get a small amount of relevant information about the
issue (i.e. profiler doesn’t magically trace already running processes before
continuing on its way with current events).
The client application involved in a deadlock will receive the error# 1205, as shown
below:
Server: Msg 1205, Level 13, State 50, Line 1
Transaction (Process ID 54) was deadlocked on {lock} resources with another process
and has been chosen as the deadlock victim. Rerun the transaction.
To assist in this circumstance, utilise EM or run the following commands:
exec sp_who2                                  -- view all sessions
dbcc inputbuffer (52)                         -- get SQL buffer for 52
exec sp_MSget_current_activity 56,4,@spid=52  -- get extended locking information
Finally, the DBA can utilise trace flags. This is an effective method for debugging
deadlocks and provides some excellent error log data. The flags are:
1204  Get lock type and current command affected by deadlock
1205  Get extended information about the command being executed (e.g. graph)
1206  Complements 1204, get other locks also participating in the deadlock
3605  Send trace output to the error log (optional, will go there anyhow)
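As a minimal sketch, the same flags can be switched on for all connections without an
instance restart (the startup equivalent being –T1204 –T1205 –T1206 –T3605):
DBCC TRACEON (1204, 1205, 1206, 3605, -1)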
The screen shots below illustrate the output from a deadlock with the traces enabled. I
have no statistics on the adverse effect on DBMS performance, but this is a very effective
method for debugging problem systems that are deadlocking frequently but for which you
can never get a comprehensive set of data.
The actual deadlock is around the customer and employer tables. Two processes have
updated one of the two separately and have yet to commit the transaction; they
attempted to select each other's locked resources, resulting in the deadlock. This is not
reflected in the log dump.
The ECID is the execution context ID of a thread for the SPID. The value of zero
represents the parent thread and other ECID values are sub-threads.
Check with http://support.microsoft.com for some excellent scripts to monitor blocking in
SQL Server.
Example Deadlock Trace
We have a large COM+ based application that was experiencing deadlocking issues. The
key issue here is that COM+ transactions use an isolation level of serialisable. As such,
locks of any sort can be a real problem in terms of concurrency. To start resolving the
problem we:
a) Worked with the developers in determining how to replicate the error
   a. this allowed us to identify the code segments possibly causing the error
      and assisted of course with testing.
b) Set instance startup parameters -T1204 -T1205 -T1206, re-started the instance
c) Ran Profiler
a. Filter the database we are concerned with
b. Include event classes
i. Lock:Deadlock
ii. Lock:Deadlock Chain
iii. SQL:StmtCompleted
iv. RPC:Completed
c. Included standard columns, namely TextData and SPID
d) Ran the code to cause the deadlock.
Search the profiler trace:
Lock:Deadlock identifies that SPID 67 was killed. Go back through the trace to locate the
commands executed in sequence for the two SPIDS in the chain. Take some time with
this, you need to go back through the chain of transaction begins (in this case they are
COM+ transactions) to clearly determine what has happened for each SPID’s transaction
block.
To assist with further debugging, go to your instance error log and locate the deadlock
dump chain. The dump shows, for each SPID, the current lock being held, the IX lock
wanting to be taken out, and the last command from the buffer (a stored procedure or
DML statement). Running select object_name(918450496) against the object id in the
dump will give you the table name, to help identify the possible problem statement to
start looking for in the batch of SQL being executed.
Orphaned Logins
At times, the DBA will restore a database from one instance to another. In doing so,
even though the login exists for the instance, the SID (varbinary security ID) for the
login is different to that in the other instance. This effectively “orphans” the
database user from the database login due to this relationship between
master..syslogins and mydb..sysusers.
SELECT SUSER_SID('user1')
0x65B613CE2A01B04FB5E2C5310427D5D5 -- SID of user1 login, instance A
0x680298C78C5ABC47B0216F035B3ED9CC -- SID of user1 login, instance B
In most cases, simply running the command below will fix the relationship and allow the
login to access the user database (must run against every database in which the login is
valid).
exec sp_change_users_login <see books online>
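For example, using the documented sp_change_users_login actions ('user1' is a
placeholder):
exec sp_change_users_login 'Report'                        -- list orphaned users
exec sp_change_users_login 'Update_One', 'user1', 'user1'  -- re-link user to login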
This will only work for SQL logins and not fully integrated logins (which is a downright
pain). Write your own script to resolve this problem.
If you are still getting errors, consider removing the user from the sysusers table in the
restored database, and re-add the user.
The DBA can validate NT login accounts via the command:
EXEC sp_validatelogins
NOTE – Do not use ALIASES; this allowed the DBA to map a single login to many
database users. This is a dated feature that may disappear in newer versions.
Microsoft released a great support document related to the scripting of logins, which
includes the original password. The script is not complete in terms of all possible options,
but is very handy:
Example: (http://support.microsoft.com/default.aspx?scid=kb;[LN];Q246133)
/* sp_help_revlogin script
** Generated Nov 17 2002 12:14PM on SECA\MY2NDINSTANCE */
DECLARE @pwd sysname
-- Login: BUILTIN\Administrators
EXEC master..sp_grantlogin 'BUILTIN\Administrators'
-- Login: user1
SET @pwd = CONVERT (varbinary(256),
0x0100420A7B5781CB9B7808100781ECAC953CB1F115839B9248C3D489AC69FA8D5C4BE3B11B1ED1A3
0154D955B8DB)
EXEC master..sp_addlogin 'user1', @pwd, @sid = 0x680298C78C5ABC47B0216F035B3ED9CC,
@encryptopt = 'skip_encryption'
-- Login: user2
SET @pwd = CONVERT (varbinary(256),
0x01006411A2058599E4BE5A57528F64B63A2D50991BC14CC59DB0D429A9E9A24CA5606353B317F4D4C
A10D19A2E82)
EXEC master..sp_addlogin 'user2', @pwd, @sid = 0xF1C954D9C9524C41A9ED3EA6E4EA82F4,
@encryptopt = 'skip_encryption'
Orphaned Sessions - Part 1
I rarely have to deal with orphaned sessions and worry little about them unless of course
they result in system degradation or chew server CALs. The trick here is the
identification of the database session. This can be confusing, especially with connection
sharing via COM+ components - the database process that you believe is related to a
COM+ connection could be different when you next refresh the screen.
We can assume the session is orphaned or killable by:
a) Processes Status = 'awaiting command'
b) Processes Last Batch date – Getdate() is longer than what we typically expect
c) We know the XYZ application and its logins/db connect properties crashed, but
   processes remain.
d) If the SPID is -2 then assume the session is orphaned
The process information can be gleaned from sp_who or sp_who2 or by querying
sysprocesses; the utmost care must be taken. As such, I highly recommend you:
a) Run – DBCC INPUTBUFFER against the spid to determine its last statement of
work
b) Run – sp_lock to determine if the SPID is locking objects
To kill the SPID, use the KILL command.
The System Administrator may consider altering the TCP/IP keep alive timeout setting in
the registry:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime
The default is 2hrs and is measured in milliseconds. As a general guide, consider a lower
value in the order of one hour.
Orphaned Sessions - Part 2
An orphaned session has a SPID of –2. Orphaning may be caused by a variety of things
(though they are rare) and is typically linked to the distributed transaction coordinator (DTC).
A DTC related problem will show up in the SQL Server log files with an error such as "SQL
Server detected a DTC in-doubt transaction for UOW <value>". If the transaction issue
cannot be resolved, then a kill statement can be issued over the UOW code. For
example:
Kill 'FD499C76-F345-11DA-CD7E-DD8F16748CE'
The table syslockinfo has the UOW column.
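A hedged sketch for locating the UOW values of orphaned (-2) sessions via that column:
select distinct req_transactionUOW
from master..syslockinfo
where req_spid = -2    -- orphaned DTC sessions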
NOTE – Use Component Services when appropriate to drill into COM+ classes and
their instantiations to locate UOW to assist with killing the correct SPID at the
database.
Change DB Owner
You can change the database owner with the command:
USE MyDatabase
EXEC sp_changedbowner 'MyUser'
GO
This requires the sysadmin instance privilege, and the user will logically replace the dbo
user (dbo can never be revoked/removed altogether). You cannot run this command
against the system databases. There is seldom any need to use this command.
Transfer Diagrams between databases
To transfer a database diagram from one database to another:
a) Determine the Object Id (or diagram) to be copied from database A to B.
The id column for dtproperties is an identity column. I have two
diagrams, one called “CKTEST” and the other called “Another Diagram”.
select objectid, value from dtproperties
b) To transfer the diagram, we need to ensure the objectid column is unique.
If not, we can set the value manually to whatever value we like.
INSERT INTO B.dbo.dtproperties
(objectid, property, value, lvalue, version)
SELECT objectid, property, value, lvalue, version
FROM A.dbo.dtproperties
The databases should be identical of course. If not, opening diagrams will result in the
loss of objects within the diagram OR you will simply see no diagrams listed at the
destination.
IMPORTANT – No diagrams in the destination database? The dtproperties table
will not exist, which will, of course, require changes to the steps above.
Transfer Logins between Servers
To transfer logins between servers and retain the logins’ passwords (SQL Logins),
consider utilizing the DTS task to transfer logins between servers, or use the following
SQL statement:
select 'sp_addlogin @loginame = ' + name + ', @passwd = "' + password
+ '", @encryptopt = skip_encryption, @deflanguage = "' + language + '"'
+ char(13) + 'go' from syslogins
where name in ('user1', 'user2')
The key to this script is the skip_encryption option. Note that we still need to:
a) set up the login to database user relationship (sp_adduser)
b) assign database user privileges
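A minimal hedged sketch of steps a) and b); 'mydb' and 'user1' are placeholders:
use mydb
exec sp_adduser 'user1'                         -- link the login to a database user
exec sp_addrolemember 'db_datareader', 'user1'  -- assign privileges via a role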
Killing Sessions
Within SQL Server, each connection (physical login) is allocated a SPID (server
process identifier) and worker thread. To identify them we can execute the
system stored procedures:
exec sp_who
or
exec sp_who2
The DBA can also use the current activity option under the Management Group folder
and select Process Info.
NOTE - Be warned. I have experienced major performance problems when
forcing the refresh of current activity via Enterprise Manager to a point where the
CPU’s hit a solid 50% for 5 minutes before returning back to me. This is not
acceptable when running against a hard working production system.
Once the SPID has been identified, we use the KILL command to remove the session.
For example:
select @@spid            -- get my session's SPID, in this case it is 51
exec sp_who2             -- from output, determine SPID to be killed
Kill 55                  -- issue kill request to DBMS
Kill 55 with statusonly  -- get status of the request
SPID 55: transaction rollback in progress. Estimated rollback completion: 100%.
Estimated time remaining: 0 seconds.
The DBA should reissue sp_who2 to monitor the SPID after the kill to ensure
success. Also consider looking at and joining over the tables below (a sketch follows
the list):
• sysprocesses
• syslocks
• syslockinfo
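A minimal hedged sketch over sysprocesses, listing blocked sessions and their blockers
before deciding what to kill:
select spid, blocked, waittime, lastwaittype,
    db_name(dbid) as dbname, status
from master..sysprocesses
where blocked <> 0    -- non-zero = SPID of the blocker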
You cannot kill your own processes. Be very careful you do not kill system
processes. The processadmin fixed system role will allow a user to kill SQL Server
sessions.
NOTE - Killing SPIDs running extended stored procedures or which did a call-out
to a user created DLL may take some time to kill and, in some cases, seem to have
stopped but remain as a running process.
The ALTER Statement
An alternative method to the KILL command is the alter database statement. The only
issue here is the alter database statement requires the database status to be altered, you
cannot simply kill user connections without making the database read only, for DBO use
only for example. Therefore, it may not suite your specific requirements.
Here is a practical example of its use:
alter database northwind set restricted_user with rollback immediate
The termination clause will perform the session removal and rollback of transactions. If
the termination clause is omitted, the command will wait until all current transactions
have been committed/rolled back.
See BOL for more information about the command. Try and use this command over KILL
when you need to disconnect all databases sessions AND change the database status.
How do I trace the session before killing it?
To get further information about a session, consider the command:
DBCC PSS (suid, spid, print-option)
For example:
--Trace flag 3604 must be on
DBCC TRACEON (3604)
--Show all SPIDs
DBCC PSS
DBCC TRACEOFF (3604)
GO
Another option apart from utilizing profiler is:
exec sp_who2                                  -- view all sessions
dbcc inputbuffer (52)                         -- get SQL buffer for 52
exec sp_MSget_current_activity 56,4,@spid=52  -- get extended locking information
Taking this a step further, adapt the code fragment below to iterate through SPIDs
and capture their event data returned from INPUTBUFFER for further checking.
DECLARE @ExecStr varchar(255)
CREATE TABLE #inputbuffer
(
EventType nvarchar(30),
Parameters int,
EventInfo nvarchar(255)
)
SET @ExecStr = 'DBCC INPUTBUFFER(' + STR(@@SPID) + ')'
INSERT INTO #inputbuffer
EXEC (@ExecStr)
SELECT EventInfo
FROM #inputbuffer
NOTE – DBCC INPUTBUFFER only shows a maximum of 255 characters and the
first statement only if the process is executing a batch. Consider ::fn_get_sql()
instead.
As of SQL Server 2000 SP3 (or 3a), the DBA can use the system function
::fn_get_sql() in association with the master..sysprocesses table and its three
new columns:
• sql_handle – handle to the currently running query, batch or stored proc,
a value of 0x0 means there is no handle
• stmt_start – starting offset within the handle
• stmt_end – end of the statement within the handle (-1 = end handle)
For a single script to save your time try this site:
http://vyaskn.tripod.com/fn_get_sql.htm
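As a hedged sketch of the raw approach (assuming SPID 52 as per the earlier examples):
declare @handle binary(20)
select @handle = sql_handle
from master..sysprocesses
where spid = 52
select [text] from ::fn_get_sql(@handle)    -- full statement text, beyond 255 characters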
Setting up and Sending SQL Alerts via SMTP
This section will show you, through code, how to enable SQL Agent alerts and using a
simple stored procedure and DLL, send emails via an SMTP server rather than using SQL
Mail. We will cover these items:
• SQL Agent Event Alerts
• SQL Agent Tokens
• Calling DLL's via sp_OA methods
• Using RAISERROR
• The SMTP DLL
Here is the code for our simplecdo DLL (some items have been cut to make the code
easier to read). The routine is coded in VB and makes use of the standard CDO library:
Public Function SendMessage(ByVal ToAddress As String, _
ByVal FromAddress As String, _
ByVal SubjectText As String, _
ByVal BodyText As String, _
ByVal Server As String, _
ByRef ErrorDescription As String) As Long
'This is the original function (no attachments).
'
Dim lngResult As Long
lngResult = Send(ToAddress, FromAddress, SubjectText, BodyText, Server, "", ErrorDescription)
SendMessage = lngResult
End Function
Private Function Send(ByVal ToAddress As String, _
ByVal FromAddress As String, _
ByVal SubjectText As String, _
ByVal BodyText As String, _
ByVal Server As String, _
ByVal AttachmentFileName As String, _
ByRef ErrorDescription As String)
'Simple function for sending email from an SQL Server stored procedure.
'Returns 0 if OK and 1 if FAILED.
'
Dim Result As Long
Dim Configuration As CDO.Configuration
Dim Fields As ADODB.Fields
Dim Message As CDO.Message
On Error GoTo ERR_HANDLER
'Initialise variables.
Result = 0
ErrorDescription = ""
'Set the configuration.
Set Configuration = New CDO.Configuration
Set Fields = Configuration.Fields
With Fields
.Item(CDO.CdoConfiguration.cdoSMTPServer) = Server
.Item(CDO.CdoConfiguration.cdoSMTPServerPort) = 25
.Item(CDO.CdoConfiguration.cdoSendUsingMethod) = CdoSendUsing.cdoSendUsingPort
.Item(CDO.CdoConfiguration.cdoSMTPAuthenticate) = CdoProtocolsAuthentication.cdoAnonymous
.Update
End With
'Create the message.
Set Message = New CDO.Message
With Message
.To = ToAddress
.From = FromAddress
.Subject = SubjectText
.TextBody = BodyText
Set .Configuration = Configuration
'Send the message.
.Send
End With
EXIT_FUNCTION:
'Clean up objects.
Set Configuration = Nothing
Set Fields = Nothing
Set Message = Nothing
Send = Result
Exit Function
ERR_HANDLER:
Result = Err.Number
ErrorDescription = "Number [" & Err.Number & "] Source [" & Err.Source & "] Description
[" & Err.Description & "]"
Me.LastErrorDescription = ErrorDescription
GoTo EXIT_FUNCTION
End Function
Copy the compiled DLL to your DB server and run the command below to install:
regsvr32 simplecdo.dll
The Stored Procedure
The routine below makes the simple call to the SMTP email DLL. We have hard coded
the IP (consider making it a parameter). I was also sloppy with the subject heading for
the email. Again this should be a parameter or better still a SQL Agent Token (see later).
CREATE PROCEDURE usp_sendmail (@recipients varchar(200), @message varchar(2000)) AS
declare @object int, @hr int, @v_returnval varchar(1000), @serveraddress varchar(1000)
set @serveraddress = '163.232.xxx.xxx'
exec @hr = sp_OACreate 'SimpleCDO.Message', @object OUT
exec @hr = sp_OAMethod @object, 'SendMessage', @v_returnval OUT, @recipients, @recipients, 'test',
@message, @serveraddress, @v_returnval
exec @hr = sp_OADestroy @object
GO
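A quick smoke test from Query Analyser (the recipient address is a placeholder):
exec master.dbo.usp_sendmail
    @recipients = 'dba@mydomain.com',
    @message = 'test message from usp_sendmail'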
Creating the Alert
Run Enterprise Manager, under the Management folder expand SQL Agent and right click
Alerts - New Alert.
In this case our alert is called CKTEST. We are going to send the DBA an email whenever
a logged severity 16 message occurs for any databases (not really practical, but this is
just an example).
Click on the Response tab next.
Uncheck the email, pager and net send options (where applicable for your system).
Check the execute job checkbox, drop down the list box and scroll to the top, and select
<New Job>.
Enter the Name of the new SQL Agent Job, then press the Steps button to create a step
that will call our stored procedure.
Here we enter the step name, it is a t-sql script of course, and the command which is:
exec master.dbo.usp_sendmail @recipients = '[email protected]',
@message = '
Error: [A-ERR]
Severity: [A-SEV]
Date: [STRTDT]
Time: [STRTTM]
Database: [A-DBN]
Message: [A-MSG]
Check the [SRVR] SQL Server ErrorLog and the Application event log on the
server for additional details'
This is where the power of Agent Tokens comes into play. The tokens will be
automatically filled in. There are numerous tokens you can leverage; here are some
examples:
[A-DBN]   Alert database name
[A-SVR]   Alert server name
[DATE]    Current date
[TIME]    Current time
[MACH]    Machine name
[SQLDIR]  SQL Server root directory
[STRTDT]  Job start date
[STRTTM]  Job start time
[LOGIN]   SQL login ID
[OSCMD]   Command line prefix
[INST]    Instance name (blank if default instance)
Click OK twice. The Job and its single step are now created. In the Response windows
press Apply, then press OK to exit the Alert creation window and return back to
enterprise manager.
Go to Jobs under SQL Server Agent to confirm the new Job and its step we have just
created.
Testing the Alert
Run Query Analyser and run the following:
RAISERROR ('Job id 1 expects the default level of 10.', 16, 1) with log
The with log clause is important. The alert will not fire without it.
Shortly I receive the following email:
Recommended Backup and Restore Alerts
The following alerts are highly recommended for monitoring SQL Server backups and
recovery (59):
a) Error – 18264, Severity – 10, Database successfully backed up
b) Error – 18204 and 18210, Severity – 16, Backup device failed
c) Error – 3009, Severity – 16, Cannot insert backup/restore history in MSDB
d) Error – 3201, Severity – 16, Cannot open backup device
e) Error – 18267, Severity – 10, Database restored successfully
f) Error – 18268, Severity – 10, Database log restored successfully
g) Error – 3443, Severity – 21, Database marked as standby or readonly but has
   been modified, restore log cannot be performed.
The DBA can alter the text, and include the alert tokens as necessary.
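As a hedged sketch, alerts like these can also be scripted via msdb..sp_add_alert rather
than EM; the job name below is a placeholder for the SMTP job created earlier, and
@severity must be 0 when @message_id is supplied:
exec msdb.dbo.sp_add_alert
    @name = 'Cannot open backup device',
    @message_id = 3201,
    @severity = 0,
    @job_name = 'CKTEST - send SMTP alert'    -- placeholder job name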
Chapter 4
High Availability
One of the most important issues for many organisations revolves around disaster
recovery (DR), which goes hand-in-hand with the topic of high availability.
When we talk about high availability, we are primarily focused on almost
seamless failover of our servers and the applications they host, and the
technologies to support the continuation of service with as little interruption to the
business as possible. The solution you come up with will be dictated by the realization of
its:
a) value-add to the business and customer expectations (latent and blatant)
b) cost of system downtime and manual cutover
c) issues of business continuity and system reliance
The problem you tend to have is that systems typically grow into this realization rather
than being born with it. As such, the DBA and system architects must carefully consider
the overarching issues with application state, server configuration, OS and DBMS editions
purchased, technologies being used (clusterable?) and, to some degree, "braille the future
environment" in which the application will live and breathe.
Throughout this chapter, we will compare and contrast the high availability options to
support your DR plan and identify issues in advance. We also cover clustering with
VMWARE so you can repeat the same tests using your home PC.
Purchasing the Hardware
So what hardware should I buy?
The selection of hardware, be it for a cluster or not, is a tough job. The DBA relies on past
performance indicators, systems growth, your own wisdom and that of fellow IT
professionals and of course, restrictions outlined by enterprise architecture initiatives
(and how much of the budget you have). These and other factors can turn a great system
into one that has mediocre performance with great disaster recovery, vice versa, or none
of these. This section discusses some of the many issues related to hardware selection.
Here are some general “rules”:
a) Enterprise quality applications require hardware from enterprise hardware
vendors (HP/COMPAQ, IBM, DELL etc) - but shop around
b) Try and picture your future production environment and the business applications
hosted within it. Attempt to marry it with your enterprise architecture and its
possible visions for systems consolidation; this may also assist in building your
dev/test environment. Looking forward is an essential planning skill that takes
some time to master, especially with IT.
c) Consider cheaper “components”, i.e. RAM, prices can be hugely inflated from the
larger hardware vendors. An absolute minimum for RAM is 2Gb ECC. Be very
careful with desupport dates on hardware, and again, looking into the future
through research may save you thousands of dollars.
d) Check OS compatibility lists carefully; you may find Windows 2003 is not on the
supported OS list for example – never take the risk.
e) Plan to use RAID in your DEV/TEST servers - but watch out for the number of free
drive bays, size your disks carefully!
f) Be pragmatic with your RAID configuration; instances with multiple databases will
not mean a single RAID-1 set for each transaction log. Be concerned with
logical/physical read/writes and the tuning of SQL to reduce overall system load.
Be aware that RAID-5, with suitable HBA backup cache and read/write cache, will
meet most expectations.
g) Go SCSI-320 for internal disks. Read the next section on RAID and Storage for a
further insight. In production we should try to leverage enterprise class shared
mass storage devices (SANs) over direct attached storage solutions or very large
internal system storage.
h) Plan to hook into your enterprise backup software rather than buying the
hardware and tapes – networking issues? size of backups? impact on the
business? time to backup and restore? responsibilities and accountabilities? agent
OS compatibility? Tape drive per server is a very costly solution in the longer
term.
i) When you make a decision on the hardware specs, take time to research the
choice – any issues? compatibility problems? potential install nightmares?
j) Always pay for longer term parts/labour warranties – but read the conditions and
turn around times very carefully.
k) Consider 64bit computing as the end-game for future hardware infrastructure.
Taking this a little further, here is a simple list to review and add to when selecting
hardware:
Server Check List
Operating systems supported by the hardware have been checked? (and you know the
OS requirements of the services to be run?)
Existing order will cover the services to be provisioned?
Enterprise backup agents/software supported on the chosen HW and OS?
SAN connection required? Dual HBA’s? and is there a standard HBA to suit existing
switch technology?
Existing racks can house proposed servers (in terms of free space and existing rack
dimensions)?
Warranty and support has been factored in? restrictions based on distance, public
holidays, where parts are located etc
CPU, RAM (including stick configurations and free slots), HBA’s, NIC’s, Power
(swappable?)
BTU and power draw?
Installation and shipping included? Insurance covered?
Monitor/CD-ROM/USB/Mouse required?
SAN bootable? (any specific restrictions?)
Additional RAID card to achieve RAID-5?
Power slots required?
Management network interface?
Specific cluster support/restrictions, i.e. excluding public holidays, parts stored on the
other side of the country?
Can the OS you plan to load onto the hardware support the features you plan to
purchase? i.e. all the RAM/CPUs, multi-clustered nodes etc.
What is the desupport date for the hardware? Is it old technology?
Do you plan to use virtualization technology? and if so, will the vendor support it?
What are the limits of a virtual machine that will adversely affect the services delivered
on it?
Is vertical scalability important? what criteria does this impose?
Can the power requirements of the server or rack be met in terms of its placement?
Can the equipment be moved to the location? (physical restrictions)
UPS required?
Dual Ethernet and HBA cards required? Communications team have standardized on
specific models?
So you are ordering a 4-CPU machine – what are the database licensing implications?
Will you purchase software assurance?
Hardware listed in the Windows HCL? (discussed in the next section)
I often see the consolidation of servers (incl. storage) and their hosted applications onto
one of the following architectures:
1) Medium to large scale co-located servers in clusters within a data center;
this typically represents the same set of servers for each application but
with their co-location to a single managed environment.
2) SAN (fiber or iSCSI) mass storage, leveraged by racked 1U and 2U blade
servers. The blade servers are of course co-located in a data
center; this is similar to 1) but sees the introduction of storage
consolidation and commodity blade servers.
3) SAN (fiber or iSCSI) mass storage, leveraged by consolidated large scale
servers hosting numerous virtual servers using virtual server software/or
alternatively virtualized at the BIOS/OS level where supported (typically
Unix implementations). This is taking 1) a step further with the reduction
in the total number of servers into clustered large-scale enterprise servers
that virtually host a reduced number of environments where possible.
4) SAN with VMWARE or other server virtualization technology (such as
LPAR) on highly scalable server infrastructure.
See Appendix A for further information on SAN, NAS, iSCSI, RAID and TAPE
issues. We also cover the basics of data center based hardware such as racks and
blade servers.
No matter the project’s budget or the timeframe for production rollout, make every
attempt to align with your enterprise vision for hosted production systems and seek
active buy-in and support from executive management.
What is the HCL or the “Windows Catalog”
The Quality Online Service offered by Microsoft is a quality endorsement policy and
procedure that allows hardware and software vendors to use the “designed for windows”,
“certified for windows” and “.net connected” logos. The vendor must pass a range of
tests, typically managed by a third party company, which at last view was
VeriTest (www.veritest.com). Apart from the fact that software and hardware are
re-tested independently, the certification typically means you will (potentially) have fewer
issues when purchasing the goods and will be better supported by Microsoft. This is
particularly important in clustered server implementations.
As a general recommendation, read before you buy! Do not purchase hardware without
some prior understanding of HCL (hardware compatibility list) and the Microsoft Windows
Catalog support; especially in large scale enterprise solutions and associated operating
systems (like data centre server). I personally do not believe Microsoft would provide
any different support to that given to HCL hardware/software buyers, but will certainly
make your life harder if you do have serious problems.
A classic case in point was iSCSI (discussed later) support within Windows 2003 and MS
Exchange. Although fully supported, the HCL was very sparse in terms of vendor
compliance as of March 2004.
Read more at: https://winqual.microsoft.com/download/default.asp
High Availability using Clusters
In my previous e-book, “SQL Server 2k for the Oracle DBA”, I discussed SQL Clustering
in some depth, but not a lot about recovery in this environment. This chapter will
cover a large number of day-to-day issues you may experience using this technology,
and provide a walkthrough using VMWARE.
Please download the free chapter on High Availability from my website for more
information on SQL Clusters.
Let us clarify first the following availability terminology:
a) Hot Standby (passive SQL Cluster) – immediate restoration of IT services
following an irrecoverable incident. The delay will be less than 2-4hrs.
b) Warm Standby (bring online passive DR servers) – re-establishment of a
service within 24 to 72hrs, be it local or remote in nature. The full service
is typically restored.
c) Cold Standby (establishment of new servers/environment) – longer than
72hr restoration of full IT services.
Identification of SPOFs (single points of failure) and their risk/mitigation and contingency
strategies is of key importance. The design for high availability needs to consider the
elimination of single points of failure or the provision of alternate components to minimize
business impact. The ITIL framework suggests these definitions:
a) High Availability – mask or minimizes the effect of IT component failure.
b) Continuous Operation – mask or minimize the effect of planned downtime.
c) Continuous Availability - mask or minimize the effect of ALL failures.
From a SQL perspective, the DBA can utilize a number of strategies to mitigate the risks:
a) Windows Clustering and SQL Server cluster technology;
b) Log Shipping;
c) Replication - can be complex to configure and manage. Also remember
that changes are not automatically provisioned into the published
replica(s), requiring ongoing administrative effort and careful change
control practices. Due to this, I do not regard it as an overly effective form
of high availability; and
d) Federated Databases - primarily for performance over availability.
This chapter will cover a) and b) only. There are many third party high availability
solutions not covered in this ebook, including:
a) Legato Costandby Server
b) PolyServe Matrix HA
c) SteelEye LifeKeeper
d) Veritas Clustering Service (VCS) and ClusterX products
Using VMWARE we will cover SQL Server Clustering and troubleshooting techniques.
VMWARE SQL Cluster - by Example
The VMWARE software from VMWARE the company (www.vmware.com) allows us to run
virtual machines (VMs) over an existing operating system, creating an interface layer
between physical devices and the guest, and allowing the user to define virtual hardware.
The software goes further though, allowing us to build new devices that physically do not
exist, such as new disk drives, network cards etc.
The version we will be using is called VMWARE Workstation v4.0; check the vendor's
website for new editions. The direct competitor to VMWARE in the Microsoft space is
Microsoft Virtual PC (and server when it is released). Although a good product, I found it
slow, and it did not support SQL clustering during the writing of this particular chapter.
IMPORTANT – Installing VMWARE virtual machines will NOT replace your existing
host operating system in any way. VMWARE itself depends on the host OS
to run.
The aim here is to:
a) allow the DBA to replicate SQL Cluster testing with a relatively low spec PC – the
machine used for this example is an AMD 2.6GHz, 1Gb RAM, single 80Gb drive;
b) act as a catalyst to discuss cluster design issues; and
c) allow DBAs to realize the benefits of virtual server environments as another form
of high-availability and server consolidation.
Using VMWARE in Production?
To be right up front, I am a BIG advocate for VMWARE, not only in your
development/test environments, but also in production. As an example, I worked with a
customer that ran mission critical applications on highly scalable IBM x445 servers (16
CPUs) over VMWARE ESX.
Four servers were provisioned in a geographically dispersed environment
connected to large SANs via fiber interconnects. The servers ran a mix of Linux and
Windows 2003 operating systems, all provisioned from golden templates (discussed
later), running as SQL clusters and a variety of other smaller applications.
The essential value added extras with the virtualized server environment were:
a) speed in which new virtual servers (and application services) can be provisioned;
b) virtual servers could be moved in near realtime (whilst running) between physical
hardware through VMOTION technology;
c) test environments could be established as mirrors of production within a matter of
minutes;
d) VMWARE Virtual Center provides a single unified management interface to all
systems, no matter their underlying OS; and
e) large capacity (and relatively cheap x86 architecture) hardware can be better
utilized.
Step 1. Software & Licensing
In the following steps we are using VMWare Workstation 4.0.5; you can download a trial
from www.vmware.com. Install the software as per installation guide. As a general idea I
run Windows XP Pro with 1Gb RAM on a 2.6Ghz AMD chip; reserve approximately 1.8Gb
of HDD space per server then add around 500Mb per SCSI disk in the fake disk array
thereafter (we install 3 non-raid disks in the array).
At the same time, think about the server operating system you plan to install. We require
three nodes in a virtual network, being:
1) Domain Controller
2) Cluster Node 1
3) Cluster Node 2
Take care with this step. The OS must support the cluster service (of course). The
following example is using Microsoft Windows 2000 Advanced Server edition. Check
carefully with your local Microsoft vendor/re-seller regarding personal use licensing
issues, as we are installing three servers.
The DBA should be aware of the following limits:
Windows 2000 Advanced Server      Max 2 nodes, 8 CPUs, 8Gb RAM
Windows 2000 Data Center Server   Max 4 nodes, 32 CPUs, 64Gb RAM
Windows 2003 Enterprise Ed        Max 8 nodes, 8 CPUs, 32Gb RAM
Windows 2003 Data Center Ed       Max 8 nodes, 32 CPUs, 64Gb RAM
Paul Thurrott has a great website for feature comparisons and Windows in general:
http://www.winsupersite.com/showcase/winserver2003_editions.asp
http://www.winsupersite.com/reviews/win2k_datacenter.asp
IMPORTANT – “Microsoft does not support issues that occur in Microsoft
operating systems or programs that run in a virtual machine until it is determined
that the same issue can be reproduced outside the virtual machine environment.”,
See MS Support #273508 for the full support note. Do note that many corporate
resellors of VMWARE will provide this level support.
Step 2. Create the virtual servers
Run VMWARE, select File → New Virtual Machine, select typical machine configuration
and from the drop down list pick Windows 2000 Advanced Server. Enter the name of
the virtual server (it makes no difference to the host name of the server itself) and the
location of your server on disk. I have placed all virtual servers at:
C:\work\virtualservers\
VMWare will pick up all current PC hardware and provide a summary, along with the RAM
(memory) it plans to use on the virtual machine's startup. Generally no changes are
required. Do not alter the network unless you are very comfortable with VMWARE
network bridging.
Below is a screen shot of the server before startup:
You can create all of your virtual servers now if you like, to save repeating the
process, but we only start off with the domain controller. The three servers I have
created are:
a) winas2kdc – domain controller
b) winas2kcn1 – cluster node 1
c) winas2kcn2 – cluster node 2
NOTE – We do not use VMWARE server templates (also known as golden
templates) as it requires extensive coverage by system administrators (not me!).
The golden template is a “known image” (or copy) of a virtual server at some point
in time. The image includes the base software any of your normally provisioned
servers would include and is carefully crafted to remove networking issues when
new servers are created from the template and put online.
Click on edit virtual machine settings, and reduce the Guest Size (MB): value down to
around 150 to 220Mb of RAM. No other change is required.
The “end game” is this configuration:
The diagram (not reproduced here) shows the MYDOMAIN.COM domain with the MSCS
service "MYCLUSTER" on virtual IP 192.168.1.75 and the SQL virtual server
"SQLCLUSTER1" on 192.168.0.78. The domain controller winas2kdc (192.168.0.132)
runs DNS, AD and DHCP over a single NIC. The cluster nodes winas2kcn1
(192.168.0.134) and winas2kcn2 (192.168.0.136) each have a public NIC plus a second
private NIC (10.1.1.2 and 10.1.1.3) for the heartbeat, and share the Q:, E: and F: disks.
NOTE – My naming conventions for the servers - please do not take them to
heart; they are only examples and care must be taken using the OS installed as
part of the hostname for obvious reasons.
Step 3. Build your domain controller
Place the Windows server disk in the cd-rom and start the virtual machine. From here
complete a standard installation of the operating system. The VMWare software will not
affect your current PC operating system, so don’t concern yourself about the steps
related to formatting disks. Ensure NTFS is selected for the file system.
• When asked, you have no domain or work group to join. If prompted to enter a
  new one, enter the word: MYDOMAIN
• Leave all network options standard (typical). Do not alter them yet.
• When the install is complete, login to the server as administrator.
By default, the Windows 2000 Configure Your Server dialog is shown with a range of
options. First step is to install active directory. Select this option and follow the prompts.
When asked for the domain name, enter MYDOMAIN.COM and carry on with the defaults
from there.
Run the DNS management tool. We need to configure a zone for mydomain.com.
Expand the tree in the left hand pane, right click Forward Lookup Zones
and select new. I have selected active directory integrated. Once created, select the
zone, right click for properties and click add host. We are going to add a host entry
representative of the virtual IP of the cluster itself. This host/server will be called
mycluster (or mycluster.mydomain.com). One of the member servers that are active at
the time in the cluster will take on this IP as we will see later. The IP is 192.168.1.75
Exit from the DNS management utility.
Before we continue, minimize all windows, and on the desktop, select properties of the
My Network Places, I have only one NIC defined for this virtual server so I see:
Select properties of the LAN connection, my TCP/IP properties are as follows:
The default gateway was actually picked up from my local PC settings, and this is the
gateway used for my internet connection sharing with another PC on the
network. Note the IP address and the DNS entries, as expected for the role of the
server.
Under the networking option in Windows 2000 Configure Your Server, install DHCP in
order for the domain controller to allocate IPs to the two member servers when they
come online. The IPs will be reflected in the DNS as the servers are built and come
online into the domain, as one would expect of the DNS.
Create a new scope within DHCP; I called it mynetwork. I have allocated a DHCP address
pool with a range between 192.168.0.133 and 192.168.0.140 (any member server
will get an IP from this range).
Clicking next will ask for any exclusions (or reservations). Ignore this, and retain the
default lease period of 8 days to complete the DHCP configuration.
Close the DHCP management utility. Go back to Windows 2000 Configure Your Server
and select Active Directory and Manage. We need to create another user with
administrative rights. This is our cluster admin user:
The username is cluster (for want of a better word), and is a member of the
administrators group. Exit once this is done.
NOTE – Using DHCP for your cluster nodes is NOT recommended and is far from
best practice. I have used it as a demonstration; revert to fixed IPs and remember
to update the DHCP IP range as need be.
Step 4. Build member server 1 (node 1 of the cluster)
Leave the domain controller up and available. We now configure a member server. This
will be node 1 of our cluster. Before we start this virtual server and carry on with the
installation, we need to setup a number of fake SCSI disks (to simulate a disk array in
which the cluster will add as disk resources) and another NIC for the clusters private
network (remember I only have 1 physical NIC in my server, so I need to add a virtual
one in VMWARE).
IMPORTANT – try and keep installation directories and servers identical in the cluster node
installation to avoid any later issues.
Adding SCSI Disks
To simulate a disk array/SAN, or a “bunch of disks” for the cluster, VMWARE allows us to
create virtual disks (simply files in a directory) and add them to both servers in the
cluster via a disk.locking=false clause in the .VMX file for each node.
Before we continue, the solution presented here does not use VMWARE edit server
options to add a hard disk:
After some effort, I found the VMWARE SCSI drivers did not load on the node, and I could
not re-install them. To get around this, visit Rob Bastiaansen's website to download plain
disks:
http://www.robbastiaansen.nl/tools/tools.html#plaindisks
and extract the disks into a folder called:
C:\work\virtualservers\disks
Edit the .PLN file for the disk and change the path to the disk (.DAT) file; do not alter any
other setting in the .PLN file. I copied the file twice, giving me three 500Mb plain disks.
For each .VMX file on our virtual server cluster nodes (not our DC):
a) shutdown the node
b) locate and edit with notepad the .VMX file for the node
c) Add the following:
<<after the memsize option add..>>
disk.locking = "FALSE"
<<add these near the bottom of the file>>
scsi1.present="true"
scsi1:0.present= "true"
scsi1:0.deviceType= "plainDisk"
scsi1:0.fileName = "C:\work\virtualservers\disks\plainscsi1500mb.pln"
scsi1:1.present= "true"
scsi1:1.deviceType= "plainDisk"
scsi1:1.fileName = "C:\work\virtualservers\disks\plainscsi2500mb.pln"
scsi1:2.present= "true"
scsi1:2.deviceType= "plainDisk"
scsi1:2.fileName = "C:\work\virtualservers\disks\plainscsi3500mb.pln"
Before we start the first server of our cluster, we need to add another NIC.
Adding another NIC for the Private Network
For each server that will be a node in the cluster, we require two NIC’s:
1) For the private network between cluster nodes
2) For the public facing communication to the node
I assume that most people have at least one NIC; if not, use VMWARE and the edit
hardware option to add the two NICs. Leave options at their defaults.
For each node when booted, set up IP properties as such:
Node 1
Node 2
For each node, disable NetBIOS over TCP/IP for DHCP:
The actual networking and connection of the server's NICs can be varied, and I suggest
that the Network Administrators / Communications Specialists be approached well before
the purchase of any hardware. At a high level, we tend towards this sort of configuration:
[Diagram: each server/node connects via teamed network cards (software/hardware teaming and trunking) to multiple switches for multi-switch redundancy; the "private" network runs via a switch configured as a VLAN.]
Also, be aware of the following:
"By agreement with the Internet Assigned Numbers Authority (IANA), several IP networks are always left
available for private use within an enterprise. These reserved numbers are:
• 10.0.0.0 through 10.255.255.255 (Class A)
• 172.16.0.0 through 172.31.255.255 (Class B)
• 192.168.0.0 through 192.168.255.255 (Class C)
You can use any of these networks or one of their subnets to configure a private interconnect for a cluster.
For example, address 10.0.0.1 can be assigned to the first node with a subnet mask of 255.0.0.0. Address
10.0.0.2 can be assigned to a second node, and so on. No default gateway or WINS servers should be
specified for this network."
Prepare your SCSI disks
With the disks and NICs added, we need to partition and format the disks.
a) Start your DC server first
b) Start server 1 (winas2kcn1), leave the other server down for the time being.
c) When booting the server, notice in the bottom right hand corner the list of
resources for this server (ignore floppy disk drive messages):
d) Login to the server using the domain administrator account
e) When you login, the server will detect new hardware and install the VMWARE
SCSI drivers for you.
f) Right click My Computer and select Manage
g) Navigate to Storage, then Disk Management
h) For each of the three disks listed, DO NOT MAKE THEM DYNAMIC DISKS (also
known as disk signing). Simply create extended partitions and format them using NTFS.
a. My disks are mapped to Q:\ (disk1, will be the Quorum disk), E:\ and F:\
i) Repeat this for the other cluster node - winas2kcn2
j) Once done, shutdown server 2 - winas2kcn2.
Install Cluster Services on Server (node) 1
Now the disks have been added and prepared, and we have two NICs defined for our
server nodes:
a) Start your DC server first – already started from previous step
b) Start server 1 (winas2kcn1) – already started from previous step
c) Leave server 2 (winas2kcn2) down
d) Login with your domain administrator account
e) Navigate to control panel, add/remove programs, add/remove windows
components. The cluster services option will be shown. Run this option
f) This will be the first node in the cluster:
g) Enter the name of your cluster: MYCLUSTER (the virtual host entry we made in
DNS)
h) Enter the username and password of our cluster administrator user. We called it
“cluster”, and the domain name MYDOMAIN.
i) Your SCSI disks are shown. We will retain all three disks for this managed cluster
resource.
j) Pick our Q:\ as the quorum drive (holds all cluster checkpoint and log files
essential to the cluster resource group).
k) Next we are asked about our network configuration. At the first prompt pick your
private network card and select the radio button option “internal cluster
communications only” (private network).
l) Then we are asked about our public network. Select "all communications (mixed
network)" in the radio button. Addressing is self explanatory.
m) Click through and the cluster service should be up and running:
NOTE – The following examples are using Windows 2000. Do note that in
Windows 2003, if no shared disks are detected, then a local quorum will be
automatically created; you can also create a local quorum resource after cluster
installation. The local quorum information will be stored in
%systemroot%\cluster\MSCS. Of course, being local it will remain a single node
cluster installation.
To create the quorum with a local resource, use the /fixquorum switch. The
creation is also covered in MS Support article #283715.
Validate Node 1 in the cluster via Cluster Administrator
Open the cluster administrator program:
And we see this. Notice our node 1 (winas2kcn1) is the owner of the resources? Click
through options to familiarize yourself with the cluster.
Run ipconfig from a command prompt, and notice that this server has the IP
address of mycluster (as defined in DNS).
Step 5. Build member server 2
With the DC (domain controller) and node 1 online, we have added the disks to both
servers and configured their private networks. Now we complete the installation by installing
cluster services on node 2.
Boot node 2. Using the control panel wizard continue with cluster service installation by
selecting the second or next node in the cluster:
You will be asked the name of the cluster to join. DO NOT USE “mycluster”. Use the
name of server 1 (winas2kcn1) that currently owns the cluster. Use our cluster user
account to authenticate.
Run cluster administrator to verify your node:
Step 6. Install SQL Server 2k in the cluster (Active/Passive)
Our VMWARE cluster is now up and running. We will install SQL Server 2k in
active/passive mode in the cluster (i.e. a single named instance on the cluster, not
multiple instances running on both nodes). Remember that only one server (node) in
the cluster can open and read/write to/from the database files at any one time, no
matter how many nodes you have in the cluster. An active/active cluster simply means
two separate instances with their OWN disks and database files are running on both
nodes of a two server cluster (for failover reasons, DON'T install two default instances on
each node – a simple but common mistake).
IMPORTANT – Only SQL Server 2000 SP3 or higher is supported under a
Windows 2003 cluster.
I will not discuss the wide and varying issues with clusters and SQL Server 2k installation.
Please visit Microsoft Support and MSDN for detailed articles covering a majority of them.
Also take the time to visit www.sql-server-performance.com and
www.sqlservercentral.com for their featured (and free) articles on clustering SQL Server.
REMEMBER – You can only install SQL Server Enterprise Edition on a cluster.
You will require:
a) SQL Server 2k Enterprise Edition
b) Latest service pack
IMPORTANT – The setup of COM+ on Windows 2003 is completely different to
running comclust.exe on Windows 2000. The administrator must manually create
the MSDTC group, including IP, Network Name and the DTC resource via the
Cluster Administrator.
Follow these steps to install the DBMS (assumes all servers are up and
installed/configured as described earlier).
1) On each cluster node, run comclust.exe:
2) From the primary node in the cluster, run cluster administrator
3) Verify MSDTC is clustered correctly by navigating to the resources folder. Check
each node to ensure the distributed transaction coordinator service is running.
Nodes with a DTC issue are shown in the administrator tool:
My node 2 locked up during the CLB setup. A CTRL-C did the trick, and manually
bringing the resource online via Cluster Administrator worked without failure. This ebook does not cover such issues.
4) It is generally recommended you turn off all services except the following before
installation (confirm between OS releases):
Alerter
Cluster Service
Computer Browser
Distributed File System
Distributed Link Tracking Client
Distributed Link Tracking Server
DNS Client
Event Log
IPSEC Policy Agent
License Logging Service
Logical Disk Manager
Messenger
Net Logon
Plug and Play
Process Control
Remote Procedure Call (RPC) Locator
Remote Procedure Call (RPC) Service
Remote Registry Service
Removable Storage
Security Accounts Manager
Server
Spooler
TCP/IP NetBIOS Helper
Windows Management Instrumentation Driver Extensions
Windows NT LM Security Support Provider
Windows Time Service
Workstation
5) Before we install SQL Server 2k – use cluster administrator to offline DTC, right
click for resource properties:
6) Place your SQL Server disk in the CD-ROM, and begin installation on node 1
(win2kascn1)
7) You are prompted, by default, for the virtual server name; we will call this
sqlcluster1:
8) We are prompted for the virtual IP of our SQL Server instance within the cluster.
The resource will be failed over as required in the cluster. Before I enter the IP
here, go to your domain controller server, run DNS, and add a new Host record
for this server. Double check your DHCP settings also to ensure the integrity of
the allocated IP:
9) Pick our default data file disk, don’t use your Q:\ (quorum drive); we will use F:\
10) Our two nodes will be shown. We want all servers to be used; no changes
required.
11) Go back to your domain controller. Create a new domain user called
“SQLServerAdmin” and add it to the administrators group. This is the service
account that will run our SQL Server clustered instance.
For each node, add this user to the administrators group, and continue on with the SQL Server installation with this domain account:
12) Enter the instance name, we will use a named instance:
13) You are shown the standard summary screen:
We do not install the binaries on the clustered disks. They remain local to the
node. Check Custom and continue.
14) Select your components. Note that Full Text Indexing is not on the list. This will
be installed by default as a cluster resource.
15) Select your service startup account:
16) Then run through your authentication mode, collation, and network library
properties. I leave all default but choose mixed mode.
IMPORTANT – For each instance, ensure you fix the PORT properties over TCP/IP.
Do not let SQL Server automatically assign the port, as it can change between nodes in
the cluster.
17) Setup begins, node 1 then automatically on node 2:
18) Complete
19) Reboot Node 1, then Node 2
20) Run Cluster Administrator on either node to verify the cluster:
Note that all resources are under disk group 2. Move them with caution as all are
dependent on one another for a successful failover.
For the active node (node 1), check its IP configuration: you should see the SQL Server
virtual IP, the cluster virtual IP, and node 1's DHCP IP.
You should be able to PING the:
a) server cluster IP address - 192.168.1.75
b) server cluster name (network name of the cluster) – 192.168.0.78
IMPORTANT – In an active/active cluster, remember that the total memory usage
of all instances should be less than the total memory of a single node. If all
instances fail to one node then it must be able to support, in terms of memory, all
databases. If not you will experience severe paging and very poor performance.
Also be reminded that a maximum of 8 instances can run on a single node, and
only one default instance may exist within the entire cluster.
Test Connectivity
Our SQL Server named instance is referred to as: SQLCLUSTER1\SS2KTESTC1
The port and this instance name can be verified by running the server network utility on
either node in the cluster.
From my PC (that is running VMWARE), install the client tools (client network utility,
query analyzer), and create an alias via the client network utility as such:
If you cannot connect, double check the server name, the port and the enabled protocols,
and attempt to ping the SQL Server virtual IP and the cluster IP.
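As a quick end-to-end check from the client, you can also connect from the command line (the -E switch uses Windows authentication; the query simply echoes the server name):

osql -S SQLCLUSTER1\SS2KTESTC1 -E -Q "SELECT @@SERVERNAME"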
High Availability using Log Shipping
The process in SQL Server works at a database level, not at a global instance level as you
will see later. The aim here is to create a warm standby server on which one or more
databases are permanently in recovery mode. The source database ships transaction log
backup files to the destination server and we restore (with the no recovery option) the logs
sequentially (in order). If there is a failure on the source server we attempt to recover
the last of the transaction logs, ship it to the destination server and complete the
database recovery. Once done, we point client connections to the new server and they
continue to work with little or no data loss.
[Diagram: within the Windows domain, client connections point to Server A (primary), which hosts DatabaseA and backs up its transaction log to a network share; Server B (standby) restores each log with norecovery into its copy of DatabaseA.]
NOTE - we have not diagrammed the full database backups that must also be log-shipped to start the process.
The DBA can use a custom written script or the SQL Server Wizard to do the shipping.
The shipping in most cases will be to a server on the same Windows 2k network domain,
but there is no reason why VPN tunnels over large distances or other remote scenarios
cannot be utilized.
IMPORTANT – You do not require enterprise edition of SQL Server for log
shipping.
If you need further help beyond what has been provided in this ebook for custom shipping,
then get hold of the SQL Server 2000 Resource Kit from Microsoft.
The supplied wizard provides all necessary steps to setup log shipping for selected
databases. Before doing so remember the following:
• document the process before running the wizard
• how will the client application components detect the failover?
• databases must be in full or bulk-logged recovery mode
• ensure instance names are the same on source and destination servers
• pre-determine the log backup schedule for the source database
• pre-setup the transaction log backup directory via a network UNC path (referred
to by both source and destination DTS jobs)
• create DTS packages on both servers and use the transfer logins job to facilitate
the transferring of login information between shipped servers
• run the wizard using a login that has sysadmin system privileges
Once the log shipping DTS (maintenance plan) has been created, go to the Management
Folder in EM and select properties of the log-shipping job to view the process/steps.
See reference (56) for a documented “How-To” for log shipping in a SQL Server 2k
environment using the Wizard.
Manual Log Shipping - Basic Example
Here we will discuss a working example that was coded using two stored procedures,
a linked server, two DTS packages and (optionally) pre-defined backup devices.
This example revolves around a single server with two named instances:
Primary Database Instance - SECA\MY2NDINSTANCE
Standby Database Instance - SECA\MY3RDINSTANCE
i. Setup disk locations for primary and destination backups and test with the SQL
Server service account.
Backups on the primary server are dumped to \primary and are then "shipped" via
a simple xcopy to the \standby directory from which they are restored.
ii. Pre-determine what account will be used for doing the backups, establishing the
linked server and restoring backups on the primary and standby databases. I
recommend that you create a new account with sysadmin rights on both
servers to facilitate this.
iii. Setup Linked Server on primary server to destination server
We will link from the primary server over to the standby server via the account
specified in step ii. Data source is the server/instance name we are connecting to.
In this case we are using the SA account, which is not best practice, but will
suffice for this example.
Apart from the SA account mapping above, no other mapping will be valid.
Ensure remote procedure call options are
set to facilitate the calling of t-sql stored
procedures on the standby server from the
primary server.
* If you are using EM, and have registered the server under an account not mapped in the linked
server, then the linked server will not work for you. Check your EM registration before attempting
to view tables/view under the linked server within EM.
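As an aside, the same linked server setup can be scripted rather than clicked through in EM; a minimal sketch (the linked server name and password are illustrative) is:

EXEC sp_addlinkedserver @server = 'STANDBYSERVER', @srvproduct = '',
    @provider = 'SQLOLEDB', @datasrc = 'SECA\MY3RDINSTANCE'
EXEC sp_addlinkedsrvlogin @rmtsrvname = 'STANDBYSERVER', @useself = 'false',
    @locallogin = NULL, @rmtuser = 'sa', @rmtpassword = 'sa-password-here'
-- enable RPC so the standby's restore procedures can be called from the primary
EXEC sp_serveroption 'STANDBYSERVER', 'rpc', 'true'
EXEC sp_serveroption 'STANDBYSERVER', 'rpc out', 'true'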
iv. Setup database backup devices on the primary server
This is completely optional and really depends on the backup model you are
using. SQL Server allows you to pre-create backup devices; once created, we
can use the logical name for the backup device (which maps to a physical file)
rather than using the physical path and filename for the backup. This
makes scripting much friendlier and easier to read/change.
Here we create two devices, one for full backups and the other for logs. See the
management folder within Enterprise Manager and the backup folder item
to create these:
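If you prefer to script the devices rather than use EM, a sketch along these lines will do (the physical paths are illustrative):

EXEC sp_addumpdevice 'disk', 'logship_primaryserver_full_backup',
    'e:\backups\primary\logship_primaryserver_full_backup.BAK'
EXEC sp_addumpdevice 'disk', 'logship_primaryserver_log_backup',
    'e:\backups\primary\logship_primaryserver_log_backup.BAK'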
v. Check the primary server database recovery model. Select properties of the
database via EM. Ensure the model is in line with the backups you will be log-shipping.
vi. Write two stored procedures that will reside in the master database on the
standby server for recovery of the FULL and LOG backups. The standby restore
option is what tells SQL Server that the database is in warm standby mode.
CREATE PROCEDURE dbo.StandbyServer_restore_full_database_backup AS
SET NOCOUNT ON
RESTORE DATABASE logshiptest
FROM DISK = 'e:\backups\standby\logship_primaryserver_full_backup.BAK'
WITH
RESTRICTED_USER, -- leave db in DBO only use
REPLACE, -- ensure overwrite of existing
STANDBY = 'e:\backups\standby\undo_logshiptest.ldf', -- holds uncommitted trans
MOVE 'logshiptest_data' TO 'e:\standbydb.mdf',
MOVE 'logshiptest_log' TO 'e:\standbydb.ldf'
GO
CREATE PROCEDURE dbo.StandbyServer_restore_log_database_backup AS SET NOCOUNT ON
RESTORE LOG logshiptest
FROM DISK = 'e:\backups\standby\logship_primaryserver_log_backup.BAK'
WITH
RESTRICTED_USER,
STANDBY = 'e:\backups\standby\undo_logshiptest.ldf' -- holds uncommitted trans
GO
vii. Write two DTS packages on the primary server, one to do a full backup and the
other a log backup.
Package 1 – Primary Server, Full Database Backup
BACKUP LOG logshiptest WITH TRUNCATE_ONLY
WAITFOR DELAY '00:00:05'
BACKUP DATABASE logshiptest TO logship_primaryserver_full_backup WITH INIT
WAITFOR DELAY '00:00:05'
exec standbyserver.master.dbo.StandbyServer_restore_full_database_backup

Package 2 – Primary Server, Log Database Backup
BACKUP LOG logshiptest TO logship_primaryserver_log_backup WITH INIT, NO_TRUNCATE
WAITFOR DELAY '00:00:05'
exec standbyserver.master.dbo.StandbyServer_restore_log_database_backup
viii. Test Full then Log DTS routines and debug as required
ix. Schedule the DTS packages
x. Monitor
xi. On failure of the primary, do the following:
-- Login to primary server (depends on failure), and attempt to backup last database
log file
BACKUP LOG logshiptest TO logship_primaryserver_log_backup WITH INIT,
NO_TRUNCATE
-- Login into standby server
restore database logshiptest with recovery
Deleting database file 'e:\backups\standby\undo_logshiptest.ldf'.
RESTORE DATABASE successfully processed 0 pages in 4.498 seconds (0.000 MB/sec).
-- Ensure client connections are connecting to the now live "standby" server.
Some thoughts about this setup:
a) The full and log backups are being appended to the same file; consider writing
a better backup routine on the primary server that produces separate files for
each backup with a date/time stamp. Do this in a T-SQL stored procedure on
the primary database and consider replacing the DTS copy command with a
call to xp_cmdshell within the stored procedure (see the sketch after this list).
b) If using a), parameterize the two recovery procedures on the standby server
to accept the file path/filename of the file to be recovered. The routine in a)
will have all this information and can pass it to the standby server without
any problems.
c) Consider email on failure DTS tasks.
d) Consider how you will remove backups N days old.
e) Will the standby ever become the primary and effectively swap server roles?
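To give a feel for a) and b), here is a minimal sketch of a timestamped log backup with an xp_cmdshell copy (all paths and the share name are hypothetical):

DECLARE @fname varchar(260), @cmd varchar(512)
-- build a date/time stamped file name, e.g. ..._20040101_060000.bak
SET @fname = 'd:\dbbackup\logshiptest_log_'
    + CONVERT(varchar(8), GETDATE(), 112) + '_'
    + REPLACE(CONVERT(varchar(8), GETDATE(), 108), ':', '') + '.bak'
BACKUP LOG logshiptest TO DISK = @fname WITH INIT
-- replace the DTS copy step with a shell copy to the standby share
SET @cmd = 'copy ' + @fname + ' \\standbyserver\backups\'
EXEC master..xp_cmdshell @cmd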
Custom Logshipping – Enterprise Example
The Enterprise Edition of SQL Server 2k includes the ability to configure and run log
shipping; I find the process overly complex and a little restrictive in terms of control (i.e.
I want to zip files, or FTP them to remote sources – we have no such options in the
supplied method).
The scenarios I have implemented the custom log ship routines on are:
a) Heavily used OLTP database hosted on our “source server”
b) We currently backup to two destinations (called duplexing) on disk
c) We have a reporting requirement as well (destination server), and have chosen to
log ship the database to this server.
The architecture is summarized below:
We take full and transaction log backups via a custom written stored procedure. The
routine will dump the files to a disk array on the source server and optionally gzip
(compress) them. The stored procedure will then copy the file to the remote server over
a network share with appropriate service user privileges for the instance. Any failures
are reported to the DBA via email. Also note that the copy to the remote server can be
easily changed to an FTP, and that servers must be time-synchronized, otherwise you will get
the error "There is a time difference between the client and the server".
At the destination server, we manually run the Initialize stored procedure. This routine
will search the dump directory for specific pre and post fixed files specified by the
procedure's incoming parameters. Using xp_cmdshell and the dir command, we build up
a list of files in the directory, locate the last FULL backup (unzipping it), then restore. We then
search for a differential backup, apply this, and carry on applying the subsequent transaction log
files. All restore commands used for the database are logged into a user defined master
database table.
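To illustrate the directory-listing technique only (the temp table and path are illustrative, not the actual routine):

CREATE TABLE #dirlist (fname varchar(512) NULL)
-- capture a bare, name-ordered listing of the shipped backup files
INSERT #dirlist EXEC master..xp_cmdshell 'dir /b /on e:\dbbackup\mydb\*.bak'
DELETE FROM #dirlist WHERE fname IS NULL OR fname = 'File Not Found'
SELECT fname FROM #dirlist -- the restore logic works through this list in order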
Finally, we call a simple monitoring stored procedure that looks for errors in the log table
every N minutes (as defined by the DBA); emailing errors via CDO-SYS and your local
SMTP server.
Advantages
• No requirement for linked servers
• Simple scripts with no complex lookups over the MSDB database; very easy
to change and enhance
• Easy to setup and configure
• Will not, by default, force the overwriting of existing database files used by
other databases
• Will search for full, differential and log backups and apply them in correct order, so long as
files copy OK
Disadvantages/Issues
• Requires a network share to copy files (can be a security issue)
• Can't pre-detect missed or missing files (as you could if you utilized the
MSDB)
• Can't pre-detect invalid file sizes
• Does not do a quick header and file check to compare the DBA's passed-in
parameters with the backup files themselves
• Relies on the DBA to supply move commands for the restore as a parameter
(see later); does not dynamically pick up database files/filegroups from the
backup files themselves
• User sessions must be killed on the log shipped database before attempting
the restore command
  o The only way I can see around this is via a physical log reader/parser
program where the DBA runs SQL scripts rather than applying the log
itself.
Configuration & Installation
All scripts are custom written and are typically stored in the master database of the
instance.
Source Server (server-1)
The source server utilizes the following database objects:
• DBBackup_sp - master database - dumps full, log or differential backups to a
specified directory on the source server, optionally zips files, emails on error,
copies (duplexes) backups to the destination server via a UNC path, and deletes files
older than N days.
• SendMail_sp - master database - utilises simplecdo.dll (a custom written VB
COM that uses CDOSYS to send emails; can use JMail instead) to email the
administrator on backup errors.
• dtdelete.exe - c:\scripts\ - command line executable that will remove files
from a directory (recursively if so desired) that are N days old from the
backup destination directory
• gzip.exe - c:\scripts\ - command line file compression utility for backup files
Destination Server (server-2)
The destination server utilises the following database objects:
• usp_LogShipping_Init - master database - run manually (1st time only or
on serious error). Searches the incoming backup directory, applies the most
recent FULL backup, then the last differential (if any) and applies subsequent
transaction log files. Leaves the database in norecovery or standby mode. Logs all
recoveries to its audit table.
• usp_LogShipping_Continue - master database - as above, but searches for
differentials and transaction logs only. If a full is found then Init is recalled to
reapply the full backup again. Logs all recoveries to its audit table.
• usp_LogShipping_Finish - master database - manually called by the DBA,
will finalise the recovery of a database and make it available for read/write.
IMPORTANT - the DBA must turn off log shipping jobs on the destination server
before attempting this command.
• usp_LogShipping_Monitor - master database - reads the audit table below
and emails the errors found to the DBA.
• LogShipping_Audit - master database - table to which all recovery
attempts are logged.
• SendMail_sp - master database - utilises simplecdo.dll to email the
administrator on backup errors.
• gzip.exe - c:\scripts\ - command line file de-compression utility for backup
files
• usp_KillUsers - master database - kills all users connected to a database for
the instance
IMPORTANT - Consider using the command alter database <dbname> set restricted_user
with rollback immediate rather than utilizing the usp_KillUsers stored procedure. The
routines themselves use the alter database command as required.
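For example (the database name is illustrative):

ALTER DATABASE mydb SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE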
Log Shipping Example 1 - Setup and Running
Server 1 (source)
Here we assume the server has been configured, instances and databases all running
nicely and the DBA is now ready to sort out the backup recovery path for the database
MYDB. This database has four file groups (1 data file per group) and a single log file.
The database itself is approx 3.6GB in size, in full recovery mode, and requires point-in-time
recovery in 10 minute cycles.
The DBA creates the following directory on the source server to hold backups:
d:\dbbackup\mydb\
There is ample space for 4 days' worth of backups. Litespeed or other 3rd party backup
products are not being used. The DBA wants the full backup files zipped, and files older
than 4 days automatically removed.
On the remote server, the DBA creates the duplex directory: e:\dbbackup\mydb\
A share is created on the dbbackup directory called standbydest, for want of a better word,
and NT security is configured accordingly.
The DBA configures the following stored procedure to run 2 x daily for FULL backups via a
DTS job:
exec DBBackup_sp 'full', 'mydb', 'c:\scripts', 4, 'c:\scripts', 1,
'd:\dbbackup\mydb\','\\server2\standbydest\mydb\', '[email protected]', 'Y'
We are running full backups at 6am and 8pm to cover ourselves nicely in terms of
recovery (not shown here, in DTS). We chose not to run differentials and are happy with
recovery times in general. The script above and its parameters tells the backup where
our gzip and dtdelete.exe files are (c:\scripts), the backup destination, and the duplex
destination on server-2. We are retaining files less than 4 days old and the value one (1)
tells the routine to zip the file created.
Next we schedule the transaction log file backups:
exec DBBackup_sp 'log', 'mydb', 'c:\scripts', 4, 'c:\scripts', 0,
'd:\dbbackup\mydb\','\\server2\standbydest\mydb\', '[email protected]', 'N'
The script above and its parameters tells the backup where our gzip and dtdelete.exe
files are (c:\scripts), the backup destination, and the duplex destination on server-2. We are
again retaining files less than 4 days old. The value zero (0) represents the email flag;
when zero the DBA is only notified on backup failure, not success.
The DBA should actively monitor the backups for a good two or three days, ensuring full
and log backups are copied successfully, the backups can be manually restored on
server-2, and the deletion of files older than N days is working fine.
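A quick way to sanity-check a shipped file on server-2 before relying on it (the file name shown is hypothetical):

RESTORE HEADERONLY FROM DISK = 'e:\dbbackup\mydb\mydb_20040101_0600_trn.bak'
RESTORE VERIFYONLY FROM DISK = 'e:\dbbackup\mydb\mydb_20040101_0600_trn.bak'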
Server 2 (destination)
As mentioned in server 1 (source) setup, the DBA has already created the duplex
directory e:\dbbackup\mydb\ and configured a share on the \dbbackup directory called
standbydest using NT security. For server-2, we schedule three jobs that execute stored
procedure routines to initialise, continue and monitor log-shipping.
Initialise
The main stored procedure is log shipping initialize. We supply the routine a number of
parameters, being the name of the database to be restored, the location of the backups
(remember - files were copied from server-1), the standby redo file, the pre and post-fix
file extensions so the routine can build a list of files from disk to restore from, and finally,
the MOVE command for each database filegroup.
Here is an example:
exec usp_LogShipping_Init
'mydb'
,'e:\dbbackup\mydb\'
,'e:\dbbackup\mydb_standby.rdo'
,'mydb_'
,'.bak*'
,'_full.bak'
,'_dif.bak'
,'_trn.bak'
,'
MOVE ''MYDB_SYSTEM'' TO ''c:\dbdata\mydb\mydbstandby_system01.mdf'',
MOVE ''MYDB_DATA'' TO ''c:\dbdata\mydb\mydbstandby_data01.mdf'',
MOVE ''MYDB_INDEX'' TO ''c:\dbdata\mydb\mydbstandby_index01.mdf'',
MOVE ''MYDB_AUDIT'' TO ''c:\dbdata\mydb\mydbstandby_audit01.mdf'',
MOVE ''MYDB_LOG'' TO ''c:\dbdata\mydb\mydbstandby_log01.mdf''
'
,'c:\scripts'
The DBA may consider further customization of this script, namely the building of the
MOVE statement by reading the backup file header. You can, but I work on the KISS
principle and in this scenario we can’t go wrong.
The initialise routine is NOT SCHEDULED and is only run manually if we need to force the
re-initialization of log shipping from the last full backup.
The master..LogShipping_Audit is updated accordingly with the files applied by the
routine or any failure/error information.
NOTE - This routine will locate the last full backup and apply it, then the last
differential (if any) and all subsequent transaction logs.
Continue
This is the crux of the log shipping routines and is scheduled to run every two hours from
7am to 10pm. The routine has identical parameters to that of the initialise procedure.
When run, the routine will determine if Initialise must be called (missing standby
database), or a new full backup file has been found and we need to start from scratch
with the full and differentials. This routine is basically the driver for log shipping, in both
its initializing and continuing to apply transaction logs as they arrive in the backup
directory from server-1.
exec usp_LogShipping_Continue
'mydb'
,'e:\dbbackup\mydb\'
,'e:\dbbackup\mydb_standby.rdo'
,'mydb_'
,'.bak*'
,'_full.bak'
,'_dif.bak'
,'_trn.bak'
,'
MOVE ''MYDB_SYSTEM'' TO ''c:\dbdata\mydb\mydbstandby_system01.mdf'',
MOVE ''MYDB_DATA'' TO ''c:\dbdata\mydb\mydbstandby_data01.mdf'',
MOVE ''MYDB_INDEX'' TO ''c:\dbdata\mydb\mydbstandby_index01.mdf'',
MOVE ''MYDB_AUDIT'' TO ''c:\dbdata\mydb\mydbstandby_audit01.mdf'',
MOVE ''MYDB_LOG'' TO ''c:\dbdata\mydb\mydbstandby_log01.mdf''
'
,'c:\scripts'
Monitor
Using the simplecdo custom DLL, a scheduled job running on the same schedule as
the continue log shipping job calls this:
exec usp_LogShipping_Monitor '[email protected]', 'LOGSHIP', 15
The DBA is emailed error rows from the master..LogShipping_Audit table. Here is an
example of its contents:
Log Shipping Example 2 - Finalising Recovery / Failover
The DBA will attempt the following in order, I say "attempt" as the first steps may fail
and must be carefully monitored.
1) Re-evaluate the need to cutover and double check that the situation is such
that cut over is required
2) Attempt to run a backup on server-1 using your scheduled DBBackup_sp
command or via Query Analyser
3) Verify backup and file copy, manually copy if required
4) Run usp_LogShipping_Continue (or its scheduled job) on server-2
5) Disable above job
6) Manually run exec master..usp_LogShipping_Finish 'db-name-here'
Concluding thoughts
The method presented is simple, easy to implement and does not rely on linked servers
or numerous system table lookups. One of the big disadvantages with log shipping is
more to do with the recovery process (albeit short – consider this when testing), that
being "user sessions must be killed before attempting to recover the database". If users
need to be kicked from the standby database to apply further logs, then it's tough setting
a specific recovery time if the database is also used for corporate reporting.
Chapter 5
Troubleshooting SQL Clusters
Following on from the previous chapter on High Availability, we will cover some
hints and tips for SQL Server Cluster administration and troubleshooting.
Where possible, leverage the VMWARE environment to test all scenarios before
attempting any maintenance work in production. It is important to note that this
chapter is not designed as a start to finish read; many of the solutions are scenario based
and do not expand on terminology used (such as the definition and usage of "MSMQ" for
example).
Trouble Shooting and Managing Clusters
How many MSMQ’s can I have per cluster?
Only one instance per cluster.
I am having trouble starting the cluster (Win2k)
Cluster start problems tend to reside with the service account that is running the cluster
service. Before doing anything further, check this account by logging into any of the
cluster nodes with the service account user; the event logs may highlight authentication
issues as well.
If this is not the problem, then consider these steps:
a) Verify/check all networking properties on all nodes, and ping each node in the cluster.
b) Node has a valid cluster database? Check the file clusdb in %systemroot%\cluster
exists. If the file does not exist, then refer to this MS support document for
detailed recovery - http://support.microsoft.com/default.aspx?scid=kb;EN-US;224999
c) Check the registry key HKLM\Cluster exists.
d) The cluster.log file must not be read-only; verify the service account has full
access to this file.
e) Can the node see the quorum disk?
f) Check the system event log and cluster.log file carefully; the quorum disk may be
corrupted. If this is suspected, or you get an error related to the cluster logs,
then review this MS Support document carefully:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;245762
Why can't I backup to C Drive?
The SQL Server cluster may only see drives that are in its cluster group; local disks
are not in this group and never will be, as not all nodes can access them. Choose another resource.
Move SQL Server cluster between SANs
This tip is provided by Loay Shbeilat from Microsoft; the tip can be found on the MS SQL
groups. As it is very handy, and with permission, I have included it here:
Assumptions:
1) The machines will not change
2) The storage will be changed
3) The 2 SANs will be accessible to the cluster for the migration purpose.
4) Assume the Old disk drive is O: and the New disk Drive is N:
Steps I followed:
1) Backup the disks
2) Backup the disk signatures/geometry. You can use "confdisk.exe" to do that.
3) On the new SAN create a new partition that you will use for the SQL. Name the
disk N:\
4) Create a Disk Resource for the new disk and include it with the SQL group.
5) Offline the SQL cluster resource (so that no one would be writing to the disk
anymore)
6) Keep the disk resources online.
7) Using a copy utility, replicate the data from the old drive to the new drive; make
sure to copy the correct ACLs/attributes/etc. The " /o " switch with xcopy does
copy the ACLs. You can also ntbackup then restore the data.
8) Now add the new disk as a dependency for the SQL resource. The SQL
resource at this point of time will have 2 disk dependencies: Disk O: and Disk N:
9) Go to disk management. Rename the Old disk drive from O: to X:
10) Rename the New disk drive from N: to O:
11) Back to cluster administrator, rename the resource from "Disk O:" to "Disk
X:"
12) Rename the resource from "Disk N:" to "Disk O:"
13) Remove the "Disk X:" dependency from the SQL resource. Now it should only
have one disk dependency "disk O:"
14) I would go to the advanced properties of the SQL resource, and set it to "Do
not restart" (just in case things don't go well, you don't want the resource failing
back and forth between the nodes)
15) Try to online the SQL resource
Does it work? Then go back to the Advanced tab in properties and set it to "Restart".
Does it fail? Go to the event viewer and check the system and the application events.
Does it shed any light on the problem?
Should I change the Service Dependencies in Cluster Administrator?
Generally NO. Microsoft support states that the only time would be to add an additional
disk resource or when configuring Analysis Services (OLAP) for clustering. The default
SQL Server 2k dependency tree is:
See MS Support document - http://support.microsoft.com/default.aspx?kbid=835185
How can I stop Full Text indexing affecting the whole cluster group?
Uncheck the “affect the group” option for the properties of the full text resource via the
cluster administration GUI (third TAB).
Diagnosing Issues, where to look?
The first and most obvious place to look is the event log, particularly the application
event log section. Refer to these log files:
a) cluster.log - %systemroot%\cluster\ - cluster service log
b) sqlclstr.log - %systemroot%\ - log written when the clustered instance starts
c) sqlspN.log - %systemroot%\ - SQL service pack and setup logs
The logs above are standard ASCII files. Always check Microsoft support and search
Google Groups before calling MS support. Many issues are covered in service packs.
You can view the destination of the main log via:
Other variables include:
a) ClusterLogLevel=2 (0=none, 1=errors, 2=errors and warnings, 3=all events)
b) ClusterLogSize=20 (size in megabytes)
c) ClusterLogOverwrite=0 (1 = overwrite each time the cluster service starts)
Refer to KB#168801 on the Microsoft Support website.
Can I delete the BUILTIN\Administrators group in SQL?
No. This account is used for the isalive ping between nodes in the cluster.
On deleting the account, I get these errors (it may differ between installs):
[sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message =
[Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user 'MYDOMAIN\cluster'.
[sqsrvres] ODBC sqldriverconnect failed
It is only when I attempt to restart that I get the stream of messages.
If you run Cluster Administrator and attempt a refresh you may find the screen locks up;
no need to reboot, the control will eventually be returned to you; this can happen with
the SQL Service Control Manager (SCM).
From your active node, go to the services applet and check the instance is up. If not, look
over the SQL Server logs to further clarify the issue. Run EM. You may need to
alter your registration properties to the SA account. Once in, re-create the
BUILTIN\Administrators login and attempt to re-start via the cluster administrator utility.
You may notice that the instance comes back online within the cluster administrator as
soon as the user is created.
If the error above persists and this solution doesn’t resolve your issue:
a) Check the event log and its application log area is not full
b) Ensure all nodes are rebooted
c) Ensure cluster administrator and SQL Server Instance administrator users have
administrator privileges
d) You may find there is a SQL instance dependency on the quorum drive. If your
cluster resources are split between the cluster group and your SQL Server group, you
may find the SQL Server instance fails to come online (status pending). If you
failover the cluster group then you should see the SQL Server instance come
online as well. You cannot create dependencies between groups, only between
resources within groups; also note the groups themselves have a server owner.
I regard this as a major problem, and have experienced ongoing systems errors; as such,
I would do a complete re-install of your instance.
Correct way of stopping a clustered SQL instance
It is important to remember that a single instance in a clustered environment is running
on one and only one node at any time (share nothing cluster); as such, we need to first
of all use the cluster administrator to determine the active node.
In SQL Server 2k use the Service Control Manager (SCM) which is cluster aware unlike
SQL Server v7. As an example, using SCM to shutdown the instance on my active node I
see it cleanly takes the instance offline:
The startup via SCM is also fine, taking the group items back online.
IMPORTANT – Taking the SQL Server virtual instance offline via the Cluster
Administrator will shutdown the instance. If you want to keep it offline, but start
the instance on the active node then don’t use Enterprise Manager (EM) - it is
cluster aware and will start the instance within the cluster!
How do I keep the Instance offline but start it locally for maintenance?
Taking the SQL Server virtual instance offline via the Cluster Administrator will shutdown
the instance. If you want to keep it offline, but start the instance on the active node,
then do not use Enterprise Manager (EM).
If I offline the instance via Cluster Administrator on the active node, I see the service
completely shutdown. Run net start:
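For a named instance the service name takes the MSSQL$ prefix; for our instance, assuming the default service naming, something like:

net start MSSQL$SS2KTESTC1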
When I run Enterprise Manager, it tells me the instance is down. But this is the virtual
instance, and therefore a little confusing at first. As I know the instance IS up, I drill
through via EM:
...and in Cluster Administrator we confirm it's offline:
If I right click the EM instance registration, select properties and then START, the instance will
come online within Cluster Administrator.
Can I automatically schedule a fail-over?
Use the cluster command via the DOS command line. It can be scheduled via AT on the
current active node. This is an effective way to fail the SQL Server over:
cluster MySQLGroup group “SQLCLUSTER1” /MOVETO:Server2
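For instance, you could wrap the command above in a batch file (the file name here is hypothetical) and schedule it with AT on the active node:

rem c:\scripts\failover.cmd contains the cluster /MOVETO command above
at 02:00 /every:Su c:\scripts\failover.cmd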
Correct way to initiate a failure in Cluster Administrator
Open the cluster administrator, navigate to your SQL Server group and initiate failure of
the “SQL IP…” address resource item three times.
Avoid stopping the SQL Server service outside the cluster administrator as a way of
initiating failover. There is a possibility of corrupting or completely shutting down the
SQL cluster or the cluster service(s) itself.
Any Windows 2003 restrictions with clustering?
Be aware that the maximum number of nodes is 8 and the maximum number of SQL
instances within the cluster is 16 (8 per node in an active/active two node cluster). Read
the Microsoft documentation carefully between releases.
Changing Service Account Logins/Passwords
Use Enterprise Manager at the Active node in all cases to avoid problems at other nodes.
Event logs between cluster nodes – can I sync them also?
Primarily for Windows 2000 installations you can enable/disable event log replication
between nodes via:
cluster [name of node] /prop EnableEventLogReplication={0,1}
Nodes in a cluster via Query Analyser?
Use the following command:
SELECT * FROM ::FN_VIRTUALSERVERNODES()
Failed to obtain TransactionDispenserInterface: Result Code = 0x8004d01b
This is not necessarily a sql cluster only issue. You may get this error when:
a) MSDTC has been forcibly stopped or restarted (or crashed and re-started)
b) The SQL Server service started before MSDTC
Altering Server Network Properties for the Instance
First of all, locate the active node and run the server network utility; do not offline the
virtual instance at this point in time via cluster administrator as the server network utility
requires it up to determine the instance’s currently supported protocols and their
properties.
You will notice that named pipes cannot be removed. Make the changes as required and
you will be given the standard warning about having to restart the instance.
The Server Network Utility is supposedly cluster aware, but I had trouble with this under
my SQL Server 2k SP3 instance. The port changes I made for the TCPIP protocol were
not reflected between nodes. Consequently, node 1 was under 1456 and node 2 was
using 2433.
To sort this out, I edited the registry of Node 1 and altered the tcpport key under the
supersocketsnetlib folder:
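For reference, for a SQL Server 2000 named instance the TcpPort value is normally found under a key of this form (instance name from our install):

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\SS2KTESTC1\MSSQLServer\SuperSocketNetLib\Tcp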
Take time to check your nodes after any change, and do allow a few minutes (5+mins)
for the replication to occur between nodes.
IMPORTANT – If you are using the force protocol encryption option, make sure
you have a certificate for each node for the virtual instance.
Add Disk E: to our list of disk resources for SQL Server
On installation, we selected F:\ for the default data directory. As such all system and
example databases in the instance refer to F:\
To add E:\ to the list of valid drives for our clustered SQL Server instance:
1) Open cluster administrator
2) Under the groups folder, look through each group and locate “Disk E:”
3) Move this resource to the same group that your SQL Server instance
resources are in
4) Rename this group to make it more readable. Here is what I see:
5) Take “SQL Server (<name>)” offline:
6) Select properties of “SQLServer (<name>)” and the dependencies tab,
add Disk E: as a dependency.
7) Apply and bring the resource back online.
8) From enterprise manager or query analyzer, you should be able to select
E:\ along with our original F:\ drive.
This was also covered by Microsoft support in article 295732.
Cluster Network Name resource 'SQL Network Name(SQLCLUSTER1)' cannot be
brought online because the name could not be added to the system.
This is a minor problem typically caused by incorrect DNS settings for forward and
reverse (PTR) lookups.
Go to your domain controller and run the DNS manager utility.
We have a single forward lookup zone defined called "mydomain", which lists the
DHCP hosts' allocated IPs and the virtual entries for:
a) mycluster (MSCS service virtual IP) – 192.168.1.75
b) sqlcluster1 (SQL Server virtual cluster IP) – 192.168.0.78
For correct name resolution, we also require reverse lookup zones for these subnets.
The reverse zone lookup asks for the first three parts of the IP (aka the subnet). Once
created, right click on the reverse lookup zone and select new pointer. Browse the
forward zone and select the appropriate entry for the subnet created.
In my configuration I have this:
Reboot your nodes and the change should take effect, with the names resolving.
I renamed my SQL Server virtual cluster name – now I am getting errors and the
instance will not start?
This is a nasty problem. Basically – don’t change it!
The issue is described in a MS support article:
http://support.microsoft.com/?id=307336
With errors such as (typically in the cluster administrator GUI):
MSSQLSERVER Error (3) 17052 N/A BOPDEV1 [sqsrvres]
checkODBCConnectError: sqlstate = 08001; native error = 11; message =
[Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or
access denied.
MSSQLSERVER Error (3) 17052 N/A BOPDEV1 [sqsrvres] ODBC sqldriverconnect
failed
and event log messages such as:
Cluster resource 'SQL Server (SS2KTESTC1)' failed.
You can repeat this error by selecting properties of the “SQL Network Name (<name>)”
resource and in the parameters tab changing the virtual name here. This resource will
come up fine, but will error on the next resource, that being the instance itself. If you
can't remember the old virtual name, even from DNS, then you are in trouble (revisit
Windows Event logs and SQL Server logs).
I have found no way of renaming this virtual name and believe the only safe way is
reinstallation.
How do I alter the IP address of the virtual server?
This is documented in Microsoft Support KB#Q244980. I have validated their approach
and the steps taken are fine. In our VMWARE example, we have created a DNS entry for
sqlcluster1 (192.168.0.78). If we want to move IPs:
• Run the setup disk on the active node for the instance
• Click advanced options
• Click maintain virtual servers, and enter the virtual server name to manage
• In the screen below, we see our existing IP. Add the new IP and remove the
existing one
• Continue on from here to complete the operation.
The Microsoft Clustering Service failed to restore a registry key for resource SQL
Server
This is a nasty problem, I simulated this same error a few times by rebooting the node
within a few seconds (around 50) of completing the installation of my SQL instance. Each
time node 2 reported this error when attempting to start the instance.
The error is indicative of a registry corruption and/or mismatch.
Reinstall SQL Server on a Cluster Node
The full procedure can be found in the BOL. The general tasks are:
a) Ensure the node is not active for the SQL Server instance
b) Run SQL Server setup
c) Remove the node from the configuration
d) Do whatever is required to the server (node)
e) Run SQL Server setup once again
f) Add the node back into the virtual instance
g) Reboot the node after installation
h) Reinstall service packs as required.
Read on for more specific information/scenarios.
How to remove a SQL Server Instance from the cluster
Pick the active node. Run your SQL Server setup, enter the name of the virtual sql
server to be removed, then pick the instance under this virtual server. Supply the
login/password for the cluster administrator user, not your SQL Server instance startup
user. The instance will take time to remove and will clean up effectively. You will be
asked to reboot all nodes in the cluster.
NOTE – When running Setup, the “upgrade, remove, or add components to an
existing instance of SQL Server” will be unavailable if all nodes are not available,
only the advanced options radio button will be available.
Remove/add a single sqlserver node from the clustered instance (not evicting a
node from the cluster service itself)
Run the setup disk, install database server, select advanced options (if not shown
recheck the virtual server name you entered), maintain the cluster, skip past the IP
configuration (should already be there), and select the host to remove / add. Enter the
cluster administrator user account credentials. Go to the node in question after a
removal; you will find the binaries/registry entries for this instance are completely removed.
Reboot as required.
When adding a node remember to reapply the current service pack/patch level before
attempting a failover.
COMCLUST and Windows 2003 Server
There is no need to run comclust under Windows 2003. This will be done during the
Cluster installation. You may need to add the DTC to the SQL Server group as a
dependency to ensure the instance starts without ongoing client connectivity issues.
I try to run the service pack setup.bat and it tells me "Setup initialization error. Access
Denied"
For some very strange reason I had this error on one node but not the other. The
problem for me was the node that didn’t have the error wasn’t the current active node in
the cluster. Now I could have failed the instance over, but I wanted to sort out this
problem.
When running setup.bat, I was shown a dialog with the message:
***
Setup initialization error.
Access denied.
Source: 'C:\sql2ksp3\x86\setup\sqlspre.ini'
Target: 'C:\DOCUME~\ADMINI~1.000\LOCALS~1\Temp\setupsql.ini'
***
To get around this problem:
a) Ensure you login to the node as the domain or local administrator
b) Create a new folder called c:\temp (if it does not exist)
c) Ensure Everyone group has full control
d) Alter the TMP and TEMP environment variables to use this directory, ensure there
are no embedded spaces or special characters.
e) Re-try the setup.
Applying a service pack to the SQL Server clustered instance
Apart from the pre-planning and backup stages before applying any patch or service
pack, follow these steps:
a) Ensure all nodes in the cluster are up and available that host your instance
b) From the current primary node, run the service pack setup.exe from this server
a. You will be told to reboot your node if you have not done so from a recent
sql server instance installation. You can bypass this via the next button
(may not be present on all service pack releases!).
c) Enter the name of the virtual sql cluster
d) You may be presented with other options related to the service pack or if multiple
instances for the virtual SQL cluster are being hosted
e) Enter the sa account or use windows authentication
f) Enter the cluster administrator login details
g) OK to complete
h) SQL Server will upgrade all nodes
i) Verify via cluster administrator (group up?), the event log, SQL Server log and the upgrade
log file
j) Reboot one node at a time to maintain maximum uptime.
IMPORTANT – For replicated instances, install service pack at the distributor first,
then the publisher and finally the subscribers.
Chapter 6
Backup
The database backup is a simple operation but, funnily enough, it is what we take
for granted most that comes back to haunt us sooner rather than later.
Throughout this chapter we reiterate the fundamentals of SQL Server backups,
then look at more advanced options to customize your backup regime.
Backup Fundamentals
Importance of Structure and Standards
From a backup and recovery perspective, it is important the DBA clearly defines and
documents:
a) the sql server binaries directory structure (for single or multiple instances)
b) database file locations
c) instance and database naming standards
d) base instance and database properties that are regarded as “default”
values for any installation, including
i. server, instance and database collation settings
ii. security properties – integrated or mixed mode, service user
runtime account, domain (and its trusts to other domains as
required to facilitate logins)
iii. SA account passwords and their security, use of the sysadmin
system role
iv. instance memory – will you fix min/max memory?
e) the most basic system backups to be performed and a how-to and where-to
summary
f) location of other external files, such as full-text index catalogs and instance
error log files.
So why is this important? Well, it is all about consistency of management, no matter
the underlying hardware or the database services running within it. Consistency in this
form means no surprises, ease of DBMS navigation and problem determination,
simplified systems recovery, and the ability to quickly establish some rudimentary but
important knowledge of any new service support requirement.
At its most basic level, the DBA should prepare a single document that clearly defines the
above elements. The document should be readily available on the technical intranet for
the support team, and most importantly, be adapted over time to include new features
and simple changes as the team determines what fits within their business.
We will discuss some of these elements in more detail.
Directory Structures
Having a standard directory structure for your SQL installations is very important. If you
are attempting to recover specific database files from tape onto a server you know little
about, there is nothing more frustrating than wasting time restoring them to temporary
locations only to move them later (as you discover more of the system), having to
search for files, or removing the wrong files that happened to belong to another instance
that was down at the time (it can happen).
Taking a page from Oracle’s book of best practice, we have the importance of a “flexible
architecture” for directory and file creation. The OFA, or Optimal Flexible Architecture
basically describes:
Establish a documented and orderly directory structure for installation binaries
and database files that is compatible with any disk resource
eg. c:\SQLServer2kBin\<instance-name>\bin\{binaries}
c:\SQLServer2kData\<instance-name>\<db-name>\{data files}
c:\SQLServer2kLog\<instance-name>\<db-name>\{log files}
c:\SQLServer2kErr\<instance-name>\{error log files}
Separate segments of different behavior into different file-groups:
Consider separation based on usage, creating for user’s databases a
separate file-group for DATA, INDEXES, AUDITING, and not using the
PRIMARY group for user objects.
Consider file-group separation based on disk resource contention
Consider file-group separation based on their historical data content
Consider separation based on file management – size of files, write
intensity etc.
Separate database components across different disk resources for reliability and
performance
NOTE – Be aware of SQL Server data striping (not to mention RAID striping). If
you have two or more database files for a single file-group, SQL Server will
automatically stripe data across them based on the percentage of free space within
each data file. Allow for this if you are also planning disk striping “by hand” during
installation/restore.
The example above (i.e. c:\SQLServer2kData) is probably not optimal if you plan to
upgrade the instance and the files stay exactly where they are – no DBA likes to create
additional work for themselves. So here are two examples I have used that attempt to
define a working OFA style structure for DBMS directories:
(Figure: two example directory trees. Naming conventions for instances (i<name>)
assist in grouping DB files and identifying services, and contextual naming of each
directory clearly states its DB purpose.)
Although not shown above, we can add in the database binaries (dbbin), separate the
error log files (dberrlog) and others with relative ease.
NOTE – On installation you are prompted for the system directory data and log file
destination (for all system databases). We can adapt the above structure to include
a \mssql\system directory outside of the dbdata\master etc for easier file
identification at this point.
Naming Rules
Here are some example naming suggestions:
• Use 01 to 99 to ID the file-number in the group.
• Try and stay within the 256 character boundary for directory depth, just in case
some restore scenario or internally documented/undocumented SQL command
has this restriction (you never know). Besides, long names are very inconvenient
when performing command line restores.
• Do not use spaces and avoid under-scores (_), but use capitalization to your
advantage. If you do use underscores, then use them consistently.
• Apply restrictions to the size of names, but leave them flexible. For example, impose
a 10 character limit on your instance and user database names, but allow 4 or 5
letter names as well.
• Names should be context or service driven. For example, migration databases
copied from production onto your development server may be named MIGMyApp
where possible. Avoid the temptation to call a database oldcopyprd or dbatestdb.
• Avoid the prefixes “dev”, “test”, “prod” for instances.
Database File Names
For a new database (which includes a single data file for the primary file group and a
single transaction log), SQL Server will name them:
<db-name>_Data   e.g. mydb_Data, file extension is .MDF to default data dir
<db-name>_Log    e.g. mydb_Log, file extension is .LDF to default log dir
I name them as follows during DB creation:
<instance-name>_<db-name>_SYSTEM
<instance-name>_<db-name>_LOG01
If it is the default instance then leave <instance-name> blank. If you used the directory
conventions described earlier then you may choose to omit the instance name from the
files.
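For example, a create statement following these conventions might look like the sketch
below (the instance name iSALES and the drive paths are hypothetical):

CREATE DATABASE [mydb]
ON PRIMARY
( NAME = N'iSALES_mydb_SYSTEM',
  FILENAME = N'd:\dbdata\isales\mydb\iSALES_mydb_SYSTEM.mdf' )
LOG ON
( NAME = N'iSALES_mydb_LOG01',
  FILENAME = N'e:\dblog\isales\mydb\iSALES_mydb_LOG01.ldf' )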
The DBA should retain the file-extension standards from Microsoft for overall consistency:
.mdf   master data file
.ndf   next data file (for any other files created for the database)
.ldf   transaction log file
Logical Filenames and File group Names
The logical database filename is a unique name that identifies the file within the
file-group. I like to use it to identify the general contents of the file within the
file-group to simplify recovery, for example MYDB_DATA or MYDB_AUDIT. The same can
be said for file-group names. Here is an example:
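A sketch of such a split, using hypothetical file names and paths:

ALTER DATABASE [MYDB] ADD FILEGROUP [MYDB_AUDIT]
ALTER DATABASE [MYDB] ADD FILE
( NAME = N'MYDB_AUDIT01',                             -- logical filename
  FILENAME = N'f:\dbdata\imyinst\mydb\MYDB_AUDIT01.ndf' )
TO FILEGROUP [MYDB_AUDIT]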
Try and be practical with the number of files, and the file groups. I only do the above
file-group split when the disk capacity and volumes provide a clear advantage for me to
do so.
It is important to remember that a file-group with two or more database files causes SQL
Server to stripe writes over the files based on the percentage of free space available
within each. Depending on your application this may provide a performance
enhancement as SQL Server creates a new IO thread for each file. This is not the case
though with transaction log files, which are written sequentially. Best practice is to
have one and only one database file per file-group.
Default properties
The DBA should, where possible, clearly document the basic instance and database
settings to avoid potential show stoppers at a later stage, especially with collation
settings. They are simple things, but easily missed.
Some of the items to cover in a checklist:

• Instance Level
  o Instance runs as a known SQLServerAdmin (or similar) domain account user
  o Named instances are used; the default instance should be avoided
  o Consider fixing named instances to specific ports
  o Min/Max memory settings for the instance
  o Server and instance installation collation
  o Directory structures (as above)
  o Sysadmin role security
  o Security settings (Mixed/Integrated)
  o Licensing mode and its validation
  o Naming convention
  o Auto-start services (Instance, SQL Agent, MSDTC)
  o Disable NT fiber usage, disable boost SQL Server priority
  o Recovery Interval (eg. 2+ minutes) – requirement dependent
  o Default user language
  o Document the SQL Server log file destination, consider moving it to a
    more appropriate location than the default
  o Access-to/documentation-of SA account
• Database Level
  o Database collation (use <server default> where possible)
  o Cross ownership chaining OFF (SP3)
  o Auto close database is OFF
  o Auto-update statistics is ON
  o Auto-shrink database is OFF
  o Simple recovery model – alter as the business requires
  o At an absolute minimum do full backups of all databases via maintenance
    plans and retain the last 2-3 days if possible
  o No business defined user account has been given db_owner privilege
  o Move to a fixed size file growth over % (percentage growth can
    exponentially grow files)
• SQL Agent
  o Set service account
  o Set/alter the SQL Agent log file destination
  o Auto-restart enabled
  o Set proxy account as required
  o Alter SQL Server authentication connection account as required
NOTE – The DBA should consider moving the default filegroup away from the
primary if you are creating other file-groups – the primary filegroup should store
system related tables only in this case. For example:
alter database mydb MODIFY FILEGROUP mydb_data DEFAULT
Recovery Interval
The recovery interval is set at an instance level and affects the checkpoint timeout period
for all databases in the SQL Server instance (it will not accurately dictate how long a
recovery will take in terms of applying files, roll-back or roll-forward). This of course has
flow-on effects for instance recovery in terms of the number of possible transactions SQL
Server must roll back (uncommitted transactions) and roll forward (committed but not
written to the physical database data files) during its recovery cycle on instance startup.
The default value is zero (SQL Server managed). Any value greater than zero sets the
recovery interval in minutes and, when altered, its value takes effect immediately. (In a
majority of circumstances leave the setting at the default.)
The value can be set via Enterprise Manager, or via a SQL statement:
exec sp_configure N'recovery interval (min)', 2
reconfigure with override
Predicting the actual goings-on within the DBMS architecture in terms of this value is
difficult, and the SQL Server documentation is somewhat vague. Use performance
monitor counters to monitor checkpoints and alter the recovery interval to review the
impact on the instance; this may take some time to be reflected. It is important to
remember that performance monitor won’t measure instance recovery time itself, and
note that in some circumstances the setting can affect your SLA (service level
agreement).
(Screenshot: a performance monitor counter showing the number of pages flushed by
checkpoint or other operations that require all dirty pages to be flushed. In this case we
are monitoring the default instance; other instances will have their own counters.)
Recovery Models
In SQL Server, each database has a recovery model which determines what statements
are logged and if point in time recovery is possible. The models are:
a) Simple – transaction log (redo log) entries are truncated on completion of a
checkpoint. Point in time recovery is not possible.
b) Full – transaction log entries are retained and the log file will grow until the
DBA backs up the transaction log and the committed log data to disk
(archived log mode in Oracle).
c) Bulk Logged – as per full, but selected commands are not fully logged
(therefore the commands are not recoverable). These commands include
select into, bcp and bulk insert, create index and indexed view creation, text
and image operations (writetext and updatetext).
Mapping these models back to the SQL Server 7 days:

Select Into/Bulk Copy   Truncate Log on Chkpt   SS2k Recovery Model
Off                     Off                     Full
On                      Off                     Bulk Logged
Off                     On                      Simple
On                      On                      Simple
The default system database recovery models are:

MASTER   Simple (only full backups can be performed on master)
MSDB     Simple
MODEL    Simple
TEMPDB   Simple (recovery properties cannot be set for this DB)
Normally, do not alter recovery model properties for any system database. The DBA of
course can alter the recovery model properties for user databases via Enterprise Manager
or a SQL statement:
ALTER DATABASE [BUYSMART]
SET RECOVERY [SIMPLE | FULL | BULK_LOGGED]

For backward compatibility the sp_dboption command is also available.
The alteration will take immediate effect. The DBA should issue a full backup and, if using
the full or bulk-logged options, continue with planned transaction log backups as necessary.
What privileges do I need to backup databases?
In order to backup a database the DBA requires the db_owner privilege. If this is not
suitable from a security perspective then db_backupoperator also grants the permission.
The sysadmin fixed server role spans all databases and will of course grant the
permission.
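A sketch granting backup-only rights to a login within a single database (the login name
is hypothetical):

USE mydb
GO
exec sp_grantdbaccess N'MYDOMAIN\backupop'
exec sp_addrolemember N'db_backupoperator', N'MYDOMAIN\backupop'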
Backup and Restore between Editions of SQL 2k
The DBA can backup/restore without any problem between standard and enterprise
editions of SQL Server 2k, so long as the service packs are identical or the destination
instance is at a higher service pack than that of the source instance.
Backup Devices
A “backup device” is simply a logical name (alias) for a physical file that may be a disk
location (physical or UNC) or tape device. The device is visible to all databases within
the instance.
The device is not necessarily required, but is there for convenience and does allow
backup scripts to separate themselves from the physical location of data files. Altering a
script to backup elsewhere can be done by changing the destination of the backup
device.
exec sp_addumpdevice N'disk',
N'mydevice',
N'e:\dbbackups\mydevice.BAK'
Behind the scenes, the Enterprise Manager dialog will run exec xp_fileexist
"e:\dbbackups\mydevice.BAK" to verify the location and warn the DBA accordingly.
The device has some flow-on effects within Enterprise Manager in terms of viewing
device contents and selecting the device as a drop down item when backing up databases
via the EM.
IMPORTANT – The Database Maintenance Plan wizards do not utilise these
backup devices for some strange reason.
Database Maintenance Plans
If you are just starting out with SQL Server and want to get backups up and running
quickly, along with some integrity checking, then consider database maintenance plans.
NOTE – Maintenance Plans are found under the Management folder within
Enterprise Manager.
The maintenance plan is simply a wizard that generates a series of MSDB jobs that are
scheduled and run by SQL*Agent. These jobs may include the following against one or
more databases:
a) Database backups (full and log backups only)
   i. Can specify auto-removal of media sets that are N
      days/weeks/months/seconds/minutes old
   ii. Can specify the destination directory and the option to auto-create
      sub-directories for each database to be backed-up
b) Database re-organisation
c) Update database statistics
d) Shrink databases (remove un-used space)
e) Database integrity checks
For backups, SQL Server will create one media set per backup set. This means one
physical disk file (media set) backup and inside it, a single log or full backup (backup
set). It will NOT append backup sets to existing media.
NOTE – Many of these items can be scheduled individually, away from the backup
job times. Note that SQL Server will not warn you of overlapping jobs or the fact
that another maintenance job already exists of that type.
With respect to backups, the maintenance plan can be a little restrictive though; consider
some of these:
• no support for differential backups
• many of the backup options are not available, such as the password parameter
• cannot duplex backup files (copy to another disk or server as part of the backup)
• does not support appended backups to an existing backup device (media set)
NOTE – Natively, SQL Server has no built in mechanism for compressing or
zipping backup files. Consider writing your own backup t-sql stored procedure and
using the xp_cmdshell extended stored procedure.
The maintenance plan backup screens are shown below. Remember that at this point we
have already selected the databases to be included in this overall maintenance plan (first
screen of the wizard).
(Screenshot: the FULL backup screen for the selected databases. Points to note:
• we can optionally verify the backup
• disk or pre-defined tape backups can be chosen
• the full backup schedule is set here
• always use a different directory to that recommended – the SQL Server default path
  is too deep in the directory structure
• a nice feature will remove media sets that are N days (or other periods) old and
  auto-create sub-directories for each database.)
The next screen is related to transaction log backups. Be warned here that not all
databases you have selected in the first screen may be privy to log backups, which can
result in failure of the scheduled maintenance plan.
NOTE - Check the job carefully, it may try and backup the logs for all databases.
(Screenshot: the transaction log backup screen. This screen and those following it are
very similar to the FULL backup screens; the default file extension is TRN rather than
BAK.)
The DBA can review and alter the maintenance plan at any time by simply selecting
properties for the generated plan and editing it as necessary within Enterprise Manager.
Data Dictionary Views
It is important to understand the difference between a media set and a backup set.
These concepts are used throughout the following sections and within the online help
for SQL Server.
A physical backup device is the media set. Within the media we can store one or
more logical backup sets of one or more databases (typically it is all the same
database). This is shown below with their properties:

MEDIA SET (NAME, DESCRIPTION, PASSWORD, MEDIA-SEQUENCE)
  BACKUP SET (NAME, SERVER, DATABASE, BACKUP TYPE, DATE, EXPIRES, SIZE, DESC)
  BACKUP SET (NAME, SERVER, DATABASE, BACKUP TYPE, DATE, EXPIRES, SIZE, DESC)

BACKUP DATABASE [mydb]
TO DISK = 'e:\dbbackups\mydb\mydb_20020624_full.bak'  -- the media set
WITH INIT,
NAME = 'Full Backup of MYDB on 24/06/2002'            -- the backup set
The backup information is recorded in the MSDB database, here is a snapshot of the
physical data model:
(Diagram: a snapshot of the physical data model – the msdb tables backupmediaset,
backupmediafamily, backupset, backupfile, restorehistory, restorefile, restorefilegroup
and logmarkhistory, linked via the media_set_id, backup_set_id and restore_history_id
keys.)
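To see how these tables hang together, a query such as the following lists recent
backups and the physical media files they were written to:

SELECT bs.database_name,
       bs.type,                 -- D = full, I = differential, L = log
       bs.backup_finish_date,
       bmf.physical_device_name
FROM   msdb..backupset bs
       JOIN msdb..backupmediafamily bmf
         ON bmf.media_set_id = bs.media_set_id
ORDER BY bs.backup_finish_date DESC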
NOTE – rather than using msdb.. (which tells SQL Server that it will find the stored
procedure in the msdb system database and use the dbo owner), we could have
entered use [msdb] before we ran the procedure.
If you append backups to a media set then refer to each appended backup via the
FILE option (backup and restore commands) as you will see in the examples
presented throughout the chapter.
Removing Backup History from MSDB
The DBA should purge this information on a regular basis. I have personally found that
recovering a database via the GUI with a large number of backup records can result in a
huge delay (4+ minutes at times) as you wait for the GUI to return control back to you.
set dateformat dmy
exec msdb..sp_delete_backuphistory '15/06/2002'  -- remove records older than date specified
Full (complete) Backups
A full backup in SQL Server is a hot backup. The database does not come offline or
become unavailable to the end-user during a full backup. In terms of the entire SQL
Server instance though, full backups should encompass all system databases in order to
successfully recover the entire instance. There is no single backup or recovery
statement that will cover all databases within the instance.
At a bare minimum the DBA should consider:
MASTER     Full backups, nightly
MSDB       Full backups, nightly
MODEL      Full backups, nightly
<User DB>  Full backups, nightly
The tempdb system database is rebuilt automatically to the destination defined in the
sysdatabases system table in the master database on instance start-up.
The GUI is very simple to understand. In most cases the DBA will create a Database
Maintenance Plan to schedule and manage the full database backup.
An example full backup statement is:
BACKUP DATABASE [mydb]
TO DISK = 'e:\dbbackups\mydb\mydb_20020624_full.bak'
WITH INIT,
PASSWORD = 'my db password',
NAME = 'Full Backup of MYDB on 24/06/2002'
Processed 112 pages for database 'mydb', file 'mydb_Data' on file 1.
Processed 1 pages for database 'mydb', file 'mydb_Log' on file 1.
BACKUP DATABASE successfully processed 113 pages in 0.534 seconds (1.720 MB/sec).
BACKUP DATABASE [ANOTHERDB]
TO DISK = 'e:\anotherdb_20020603_full.bak'
WITH INIT,
NAME = 'Full Backup of ANOTHERDB on 03/06/2002',
EXPIREDATE = '03/06/2002'
If we try to run the above command again we get the error below, due to the expiration
date we have set. To get past this and still use the INIT option we need to use the
SKIP option as well.
Server: Msg 4030, Level 16, State 1, Line 1
The medium on device 'e:\aa_20020624_full.bak' expires on Jun 3 2002 12:00:00:000AM and
cannot be overwritten.
Server: Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.
NOTE – take a close look at the WITH clause syntax. The books online cover this
command very well and should be reviewed thoroughly.
The DBA should have a good understanding of all backup and recovery options, but
some of the key items are:
• TO [DISK, TAPE] = '<backup device name or physical location>'
  o Logical or physical location for the database backup to be placed
• WITH INIT
  o Force overwrite of the backup file if it exists
• WITH NOINIT
  o Will “append” the backup to the existing backup set within the media
• MEDIA[NAME, PASSWORD, DESCRIPTION]
  o These options set the name, description and password for the entire
    media. A backup media (disk or tape) can contain one or more backup
    sets
• FORMAT, NOFORMAT
  o Format renders the entire media set un-usable and ready for new
    backup sets. Does NOT preserve the media header.
  o No-format tells SQL Server to retain the existing media header. It will
    not overwrite the media device unless INIT is also used.
• EXPIREDATE = <dd/mm/yyyy>, RETAINDAYS = <number of days>
  o Prevents the overwriting of a backup based on the expire date and
    retain days parameters
• BLOCKSIZE = <bytes>
  o Used if the media requires backups to be of a specific block size in order
    for a restore from that media to be successfully read
The Enterprise Manager GUI is a little restrictive when it comes to restoring database
backups when the PASSWORD option has been used; it does not give you the option to
specify the password and displays an error instead.
NOTE – Use passwords on backups only as a deterrent, not as a security feature.
Differential Backups
The differential backup will backup any 64Kb extent within the database that contains an
altered page. Remember this when viewing the backup size of the media set as you may
be surprised. The tracking is managed by the SQL Server storage engine using the DCM
(differential change map) page present in each non-log data file.
A differential backup is not supported in Database Maintenance Plans (this should change
in the next version of SQL Server). Therefore DBAs need to resort to writing their own
scripts, which can be a right pain. In many cases full and log backups will suffice, but this
may slow the recovery process when applying large numbers of archived log files.
Differentials are used to speed the recovery process. The DBA will need to do their own
performance tuning and measurements to determine if differentials are required to meet
the recovery SLA.
Here is an example differential backup:
BACKUP DATABASE [mydb]
TO DISK = 'e:\dbbackups\mydb\mydb_20020624_diff.bak'
WITH DIFFERENTIAL,
INIT,
NAME = 'Differential Backup of MYDB on 24/06/2002'
The differential backup will backup all extents modified since the last full backup, and
NOT since the last differential. This is very important to understand, especially during
recovery. The last differential backup done must be used on recovery; they are
cumulative unlike log backups.
To get these backups up and running quickly, write a T-SQL stored procedure and use a
DTS package to call it with an email notification for error tracking. Simply schedule the
package to run as required.
Important – You cannot use differential backups to do point-in-time recovery (i.e.
the STOP AT clause is not valid to recover to a point in time for the period the
differential backup covers).
Transaction Log Backups
Transaction log backups are a fundamental requirement for “point in time recovery”
(PITR) of a database.
Remember that a transaction log exists for each database within the SQL Server instance
and is a mandatory requirement for the database to exist. The log backup is supported
via Maintenance Plans, making it very simple for the DBA to quickly set up full backups
with scheduled log backups.
The database must be in full or bulk-logged recovery modes before attempting a
transaction log backup. If not you will receive the error:
Server: Msg 4208, Level 16, State 1, Line 1
The statement BACKUP LOG is not allowed while the recovery model is SIMPLE. Use BACKUP
DATABASE or change the recovery model using ALTER DATABASE.
Server: Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.
Attempting to backup the log file for the master database will result in the error:
Server: Msg 4212, Level 16, State 1, Line 1
Cannot back up the log of the master database. Use BACKUP DATABASE instead.
Server: Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.
You can alter the recovery mode of the MSDB database if you like and do transaction log
backups, but it is not recommended unless there is an important requirement to do so.
Attempting to backup the log file for the tempdb database will result in the error:
Server: Msg 3147, Level 16, State 1, Line 1
Backup and restore operations are not allowed on database tempdb.
Server: Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.
Microsoft documentation states that concurrent full and log backups are compatible.
After some testing I concur, and after many months I have yet to experience any backup
or recovery issues.
IMPORTANT – Before you attempt to restore an individual file or filegroup, you
must backup the transaction log.
There are several important parameters the DBA should understand when using log
backups. Note that many of these parameters are NOT available when creating
maintenance plans. Therefore, for the advanced DBA this may be too restrictive.
BACKUP LOG [mydb]
TO DISK = 'c:\master.bak'
WITH <see books online for comprehensive list of parameters>
Parameter     Notes
NO_TRUNCATE   Special parameter used when the database is in a damaged state;
              allows us to attempt a log backup without truncating the virtual log
              files. This is important if we are still trying to recover the instance
              whilst we attempt to build another (i.e. standby database).
NO_LOG        Synonymous with TRUNCATE_ONLY (see below) and available for
              backward compatibility only.
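For example, a last-gasp log backup of a damaged database might look like this sketch
(the destination path is hypothetical):

BACKUP LOG [mydb]
TO DISK = 'e:\dbbackups\mydb\mydb_taillog.bak'
WITH NO_TRUNCATE,
NAME = 'Tail log backup of MYDB'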
NOTE – Remember that the database's transaction log will continue to fill as
committed and non-committed transactions execute against the database. The
transaction log backup will write all committed transactions to your selected
transaction log backup file (an archived log).

The DBA can truncate the transaction log via the WITH NO_LOG or WITH
TRUNCATE_ONLY options. This is used in a variety of situations, the classic being when
you accidentally used the full or bulk-logged recovery model when you didn’t want
transactions permanently logged for point in time recovery. The log then grows and
typically results in full transaction log errors. This command will remove all non-active
transactions from the log; from there, the DBA can shrink the log files and change
the recovery model as need be.
BACKUP LOG [mydb] WITH TRUNCATE_ONLY
Remember – you cannot selectively truncate transactions in the log file, it is all or
nothing. The DBA must do a full backup immediately as you cannot recover
through a truncate (as you would expect).
Log backups failing when scheduled via Database Maintenance Plan
Take care with the integrity check before backup option with transaction log backups
done via maintenance plans. The job may simply fail and the log backup not start. This
can be related to permissions, because the database must be in single user mode whilst
the integrity check runs. Uncheck and re-test; run integrity checks outside of, or
separate to, your backup schedule.
Filegroup Backups
The DBA can also do tablespace (file-group) backups, although I rarely use them as
they typically complicate recovery. For very large databases this may be the only
option though. Here is an example:

BACKUP DATABASE [mydb]
FILE = N'myprimarydatafile'                      -- logical filename of physical file
TO DISK = N'C:\mydb_fg_myprimarydatafile.bak'    -- backup destination
WITH INIT,
     NOUNLOAD,
     NOSKIP,
     STATS = 10,
     NOFORMAT
NOTE – Your database must be using the full or bulk-logged recovery model
OLAP Backups
To backup Analysis Services, the DBA must:
a) Backup the Registry (\Microsoft\OLAP Server)
b) Backup the repository data files. Even if you migrate the repository to SQL
Server you should backup the bin directory to ensure maximum
recoverability. This includes the msmdrep.mdb database (unless you have
migrated the repository to SQL Server).
c) Backup the OLAP data-files
The ROLAP storage model for OLAP cubes can complicate your backup, as the
aggregations will be stored in the data-source your cube is utilizing to source its
fact data. This may be problematic with very large cubes.
Within Analysis Services manager you can export your cube database; this is the primary
and probably the most reliable backup method. It will export the aggregations and
security privileges, but not the actual processed cubes with their data. On restoring the
OLAP database you will need to do a complete re-process of the OLAP database
(repository).
Use the command line executable msmdarch.exe to archive a specific database into a
single .cab file. The DBA should extend this backup to include the items discussed above.
Can I compress backups?
You cannot (natively) compress SQL backups via Maintenance Plans or through the
native BACKUP command. To get around this, consider a custom stored procedure that
shells out to the command line (xp_cmdshell) and calls a third party zip/compression
program. Most of the popular vendors like WinZip and RAR have command line options.
For example:
SELECT @cmdline = @p_zippath + '\gzip.exe ' + @v_filename
EXEC @v_error = master..xp_cmdshell @cmdline, NO_OUTPUT
See a full script at: http://www.chriskempster.com/scripts/dbbackup_ss2k.sql
Can I backup and restore over a UNC path?
Yes you can, but the service account user must have the NTFS permission to do so;
check this carefully when debugging. Here is a working example I did some time back to
prove that it is possible:
restore database northwind_unc
from disk = '\\pc-124405\unctest\northwind.bak'
WITH MOVE 'Northwind' TO 'c:\testdb.mdf',
MOVE 'Northwind_log' TO 'c:\testdb.ldf'
Processed 320 pages for database 'northwind_unc', file 'Northwind' on file 1.
Processed 1 pages for database 'northwind_unc', file 'Northwind_log' on file 1.
RESTORE DATABASE successfully processed 321 pages in 0.247 seconds (10.621 MB/sec).
Logon failure: unknown user name or bad password
You may find developers getting this error when attempting commands like:
exec xp_cmdshell ‘dir \\myserver\sharename’
The service account must have the NTFS permission to successfully complete this
command.
What is the VDI?
From version 7 of SQL Server the VDI (virtual device interface or specification) was
introduced to backup and restore database instances. It is essential that any 3rd party
backup software leverages VDI as its core API for SQL backups (unless explicitly
underwritten by Microsoft).
On backup via VDI, the files are read remotely via the API and data is passed to the 3rd
party application. The VDI also supports split-mirror and copy-on-write technologies.
The VDI is free-threaded.
Note that VDI has not been especially adapted or optimized for Windows 2003 volume
shadow copy function.
What, When, Where, How to Backup
What is the DBA responsible for?
The DBA is responsible, in terms of backups, for:
• Ensuring the instance and its databases are fully recoverable to a known point in
time. At a minimum this point should be daily.
• Notifying system/backup administrators as to what directories should be backed up
• Verifying daily backups are valid (recoverable)
• Ensuring appropriate log backups occur if point in time recovery is required
• Ensuring full text indexes, OLAP cubes and DTS packages/jobs are backed up and
recoverable
• Working with system/backup administrators in testing recovery scenarios
• Checking and correcting database corruption (database integrity)
• Determining the need to log ship or copy backup files to other servers, and if so,
configuring, testing and managing this environment
• Ensuring recovery documentation is kept up to date
What do I backup?
The DBA should backup at the most primitive level:
a) all databases – full backup each night
b) sql server binaries – in other words the installation directory
c) system registry – via a system state backup using NTBackup or equivalent (see
the sketch below)
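For c), the system state backup can be scripted via NTBackup; a sketch (the job name
and destination file are hypothetical):

ntbackup backup systemstate /j "SystemState" /f "e:\osbackups\systemstate.bkf"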
The DBA needs to liaise with the system administrators regarding further OS and system
backups, especially if physical database files are being read and how these may or may
not affect the DBMS itself. If you have complete responsibility over the servers (typically
in DEV and TEST) then stay simple where possible. Use NTBACKUP to take files off to tape
or duplex (copy) backups between DEV and TEST servers. In all cases, your source safe
environment is the critical component here and should be managed by server
administration professionals.
NOTE - Software and software agents like Tivoli Storage Manager and its TDP
agents (for SQL backups) will typically replace SQL Server backup routines and do
the work for you. As a DBA, you will be responsible for what and when this is
backed up. Ensure you document it well and communicate this back to the Backup
Administrator.
If point in time recovery (requiring the FULL or BULK-LOGGED recovery models) is
expected on a database, then do FULL backups once per day and LOG backups once per
hour, or whatever time span is acceptable in terms of recovery to the last backup.
Backups are not CPU intensive, but take care in terms of IO performance.
If you don’t require point in time recovery, and do not mind losing all work between the
last FULL backup and the last differential (if applicable), then do a FULL backup each day.
Test a database recovery at least once every month.
Ensure your recovery model is set correctly for each user database. Finally as a general
rule backup your system databases daily no matter if you are experiencing little change.
How do I backup?
Often I simply use a Database Maintenance Plan – it is simple and effective for a
majority of instances. For very large databases, the more experienced DBA may
choose to customize with their own routines (typically stored procedures run via SQL
Agent scheduled jobs). Custom routines may do a mixture of differentials and log
backups with specific filegroups, then compress, copy (to another server), email the
administrator and possibly encrypt the backup.
The business may leverage 3rd party software; this is fine but simply requires testing,
especially between service packs. Very large databases may require specialist backup
software such as that from Lightspeed Systems; this software creates very small backup
files, encrypted and at double (or more) the speed (half the time) of a standard SQL
Server backup.
When do I backup?
Most instances require daily full backups. Ensure that the daily backup gets copied to tape
in a timely manner and is not a day behind in terms of the physical tape's date stamp
(test your recovery!).
The backup in SQL Server is hot, meaning you will not experience locking issues during a
full, log or differential backup. As such, synchronize your timings with the system
administrators and avoid peak disk IO periods. Typically we see full backups
running very early in the morning or later in the evening. Always monitor the average
time taken for backups and factor in your batch jobs.
Where do I backup?
Where possible, to disk then to tape. Be aware of:
a) disk capacity for FULL backups, can you store multiple full backups on disk? If you
can, try and store a number of backup days on disk.
b) additional IO resulting from full, log or differential backups – use perfmon.exe and
other IO monitoring tools to carefully check Disk queue lengths, and contention
around the backup destination disks.
c) security – who has access and why?
Tapes should be taken offsite where possible, with an SLA monitored tape recall process
in place and solid vendor commitment. The responsibilities and accountabilities around
inserting the correct tapes into drives for backup should be in place and well understood.
How big will my backup file be?
SQL Server (natively) will not compress or encrypt your backups. Consequently you may
find them overly large at times. The size of a backup is in direct relation to:
a) the database's recovery model (and the supported backup methods)
b) the type of backup being performed
c) the amount of change incurred since the last backup
d) the ALU format size (see the format MSDOS command) and disk partition strategy
Full
A full backup will write out all database file data to the backup, including the transaction
log's virtual log files not currently free.
Although the total size of the backup does not measure one to one with the total used
space of the database's files, the restore of the backup file will ensure the physical
database files within it are returned to their size at the time of the backup.
If my log file was 4Gb at the time of the full backup, and the resulting backup file is 1Gb,
then a restore will create the log file of size 4gb; it will never shrink/data-compress the
files for you based on the data physically backed-up at the time.
To give you a broad idea of full backup size:
a) Total DB size (all files) = 3Gb, Backup size = 1.9Gb
b) Total DB size (all files) = 9.5Gb, Backup size = 5.4Gb
c) Total DB size (all files) = 34Gb, Backup size = 22Gb
To view the space required (in bytes) to perform a restore from a full backup:
RESTORE FILELISTONLY FROM DISK = 'myfull.bak'
GO
This can be applied to all backup file types.
Differential
A differential backup will include all extents altered since the last FULL backup. As an
extent is 64k, even a small 1 byte change in a single page will result in
the extent in which the page resides being backed-up. The differential is of course
significantly larger than a transaction log backup, but can speed recovery time as it is
a replacement for all transaction log backups (and differentials) taken between
the last FULL backup and the point when the differential was run.
Transaction Log
Some of the smallest log backup files will be around 56k, covering basic header
information to facilitate effective recovery using the file even though no change may
have occurred within the database (files are always created for a log backup regardless
of data changes). Changes are typically logged at page level, with additional records as
needed for rollback/forward information.
Using the MSDB to view historical growth
A good method of tracking backup files sizes is via the MSDB database backup tables,
namely msdb..backupset and msdb..backupfile. A great script was written by “Lila”, a
member of www.sqlservercentral.com that is well worth trying:
/***********************************************************
Check growth of .LDF and .MDF from backuphistory.
Lines returned depends on the frequency of full backups
Parameters: database name
fromdate (date from which info is required in smalldatetime)
Results best viewed in grid
***********************************************************/
--- Change these vars for your database
declare @dbname varchar(128)
declare @fromdate smalldatetime
select @dbname = 'mydatabase'
select @fromdate = getdate()-30 ---filegrowth last 30 days
create table #sizeinfo
(
filedate datetime null,
dbname nvarchar(128) null,
Dsize numeric (20,0) null,
Lsize numeric (20,0) null,
backup_set_id int null,
backup_size numeric (20,0) null
)
--- tmp pivot table to get mdf and ldf info in one line
insert #sizeinfo
select
filedate=bs.backup_finish_date,
dbname=bs.database_name,
SUM(CASE file_type WHEN 'D' THEN file_size ELSE 0 END) as Dsize,
SUM(CASE file_type WHEN 'L' THEN file_size ELSE 0 END) as Lsize,
bs.backup_set_id,
bs.backup_size
from
msdb..backupset bs, msdb..backupfile bf
where
bf.backup_set_id = bs.backup_set_id
and
rtrim(bs.database_name) = rtrim(@dbname)
and
bs.type = 'D' -- database
and
bs.backup_finish_date >= @fromdate
group by
bs.backup_finish_date, bs.backup_set_id, bs.backup_size, bs.database_name
order by
bs.backup_finish_date, bs.backup_set_id, bs.backup_size, bs.database_name
select
Date=filedate,
Dbname=dbname,
MDFSizeInMB=(Dsize/1024)/1024,
LDFSizeInMB=(Lsize/1024)/1024,
TotalFIleSizeInMB=((Dsize+Lsize)/1024)/1024,
BackupSizeInMB=(backup_size/1024)/1024
from
#sizeinfo
order by
filedate
drop table #sizeinfo
We can export this easily to Microsoft Excel and graph for regular monthly meetings and
ongoing capacity planning. Third party tools like Diagnostic Manager from NetIQ have
this sort of functionality built in.
How do I Backup/Copy DTS packages?
When you create and save a DTS package, you have these options:
a) save as a local package (also known as saving to SQL Server) – in the
sysdtspackages table of the MSDB system database
b) save to Meta Data Services – RTbl prefixed tables that use the r_iRTbl prefixed
stored procedures in the MSDB system database. Does have some security
implications.
c) save as a Structure Storage File – file name required, stored on disk
d) save as a Visual Basic File – file name required, stored on disk
NOTE – Unless there is a need for capturing lineage information on package
execution, do not use meta data services for storage. It’s slow and there are some
security issues.
For a) and b) the DBA needs to:
a) backup the MSDB database on a regular basis
   a. for not just the package, but also the schedules, which can be just as
      important for some applications.
b) consider exporting (bcp) out the msdb..sysdtspackages table for non-meta data
   services stored packages for added protection (see the sketch below).
For c) and d) make sure your file system backup encompasses the file.
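A sketch of the export in b) using a trusted connection (the server name and destination
path are hypothetical):

bcp msdb..sysdtspackages out e:\dbbackups\sysdtspackages.dat -N -T -S MYSERVER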
To move packages between servers, consider the above routine or the 3rd party products
below. Another option is to simply save-as the packages to the other server.
A large number of 3rd party backup products include “DTS” specific operations, but test
carefully. My concerns in this space are primarily with:
a) recovery of all packages and their history
b) persistence of the package owner properties from a security perspective
c) persistence of scheduled packages (jobs)
d) persistence of job tasks
Here are some 3rd party DTS specific export products to evaluate:
a) RobotiQ.com - http://robotiq.com/index.asp?category=sqldtscreator
b) SQLDTS.com - http://www.sqldts.com/default.aspx?272
Some Backup (and Recovery) Best Practice
The following should be carefully considered when establishing the DR plan for your SQL
Server databases. You will probably find that some are more security driven than
anything, and that is appropriate; DR is not simply backup and restore, but establishes a
range of scenarios and contingency plans that will undoubtedly cover many other aspects
of DBMS management.
a) Do not use the DBO (db_owner) privilege for any non-dba user; no user should
be granted sysadmin or db_owner privileges that may result in “accidental”
damage to your instance or its databases outside normal change management
procedures.
b) Do not make it a habit of connecting to production as sa to view data on an ad hoc
basis or even to check on the system. The DBA must be very careful as a simple
mistake can prove fatal to the business.
c) Use the native SQL Server backup commands / maintenance plan rather than
3rd party products where possible. If you do use 3rd party products to
enhance speed, security or functionality, then run them on your test server for a few
months and test recovery scenarios carefully before moving forward into
production. Understand the implications of version changes (can the new version
still read your old backup tapes?) and even expired registration keys (what will
happen if your key expires on Friday night and the system fails on Sunday? is
support available? can you recover?)
d) Avoid streaming directly to tape.
e) “Duplex” (copy) backups to another server. Do this as the last operation in the
backup and add a step to email the DBA on failure. The system administrators
should ensure server times are synchronized to avoid “time out of sync” errors on
copy/xcopy.
f) Try and store at least 2 days worth of backups on disk; consider compressing
backups via winzip/gzip commands. This will assist in faster recovery cycles and
avoid requesting backup tapes.
g) Monitor disk queue lengths carefully if you are sharing disks between your backups
and database files, especially during peak loads when transaction log backups are
being dumped.
h) Run DBCC CHECKDB and CHECKCATALOG on a regular basis; take care with very
large databases and test carefully. This is essential in locating database
corruption that may go unnoticed for days, especially in lightly used or rarely hit
tables/databases. Run at off-peak times.
i) Who has access to your backup destination? How do backups get to your
development/test servers? Do not forget about the security implications.
j) Are backups encrypted? Is the process fully documented in terms of restoration
and de-encryption? Where are the private and public keys stored?
k) Where are the system passwords stored? Do you have an emergency contact
list? What is your change policy?
l) Ensure custom backup scripts are well tested and flexible, ensuring that changes
in database structure do not affect the recoverability of your backup regime.
m) Choose your recovery model carefully to match business expectations. Re-affirm
the commitment made and test thoroughly, especially during bulk inserts etc.
n) Manually script (or automate if possible) your entire database on a regular basis;
include all objects, users, the database create statement etc.
o) Run SQLDIAG weekly.
p) Monitor log file growth carefully and match it with an appropriate recovery model
and associated backups. Plan to shrink as required. Take care with disk space.
Keep a careful eye on transaction log sizes, for example after a DBREINDEX etc.
q) Use mirrored RAID arrays where possible. If write cache is enabled, cover yourself
with a UPS.
Backup Clusters - DBA
Backing up a cluster is no different from backing up a non-clustered installation. In order
to backup to disk, the disk resource must be added as a resource in the cluster group.
The rest is routine in terms of SQL*Agent jobs scheduling the write etc and the backup
command itself.
Microsoft have released a support document detailing NTBACKUPs over Windows 2003
within a cluster - http://support.microsoft.com/default.aspx?scid=kb;en-us;286422. In
summary, the system administrator should backup:
a) OS on each node
b) Registry on each node (system state)
c) Cluster configuration database
d) Quorum drive
e) Local drive data/SQL binaries.
If copying files between nodes, consider the /o option (e.g. xcopy /o) to retain ACLs.
Backup Performance
To check backup performance and speed, consider the following performance monitor
(perfmon.exe) counters:
a) SQL Server Backup Device: Device Throughput Bytes/sec
   Gives you an idea of the raw bytes transferred over the period, and the
   length of time taken if perfmon was monitoring for the entire period.
b) Physical Disk: % Disk Time
   Generally speaking, the value should be <= 55%; anything in the order of
   90% for a sustained period (5+ sec) will be a problem. Drill into c) to
   clarify the findings.
c) Physical Disk: Avg. Disk [Write] Queue Length
   Any value of 2 or higher sustained over a continuous period (5+ seconds
   etc) tends to highlight an IO bottleneck. You must divide the figure by the
   number of spindles in the array.
To accurately measure and understand what the figures mean, you must have an
intimate understanding of the underlying disk subsystem.
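As a cross-check against perfmon, the effective throughput of past backups can also be
derived from msdb (a sketch; zero second durations are excluded to avoid a divide by
zero):

SELECT database_name,
       backup_size / 1048576.0 AS size_mb,
       DATEDIFF(ss, backup_start_date, backup_finish_date) AS seconds,
       (backup_size / 1048576.0)
         / DATEDIFF(ss, backup_start_date, backup_finish_date) AS mb_per_sec
FROM   msdb..backupset
WHERE  DATEDIFF(ss, backup_start_date, backup_finish_date) > 0
ORDER BY backup_finish_date DESC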
Custom Backup Routines – How to
There comes a time when the mainstream maintenance plan backup doesn't give you
the freedom to achieve your backup requirements, be they compressed or encrypted
files, or even the need to move to a custom log shipping scenario. As the DBA works
through the multitude of recovery scenarios and becomes familiar with the backup and
restore commands, the ability to streamline the process also grows.
With appropriate sysadmin privilege, the stored procedure code is relatively simple. To
view a sample (working) routine, go here on my website:
http://www.chriskempster.com/scripts/dbbackup_ss2k.sql
The logic is:
a) Check parameter validity
b) Check dump destination paths, estimate free space required from current
database and log size (full backups)
c) Check database status before attempting the backup
d) Determine if the backup operation requested is valid against the database
in question (ie. a log backup against master will not work)
e) Create the backup device
f) Set backup parameters as per the requested backup type
g) Run the backup
h) Determine if a request has been made to zip/compress the file generated,
and attempt the zip
i) Determine if a request has been made to copy the file generated to another
location, and attempt the copy
j) Check for files older than N days, and remove them
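The full script is at the link above; a heavily trimmed sketch of steps f) and g) only,
with all checking removed (the procedure and parameter names are hypothetical):

CREATE PROCEDURE dbo.usp_backup_full
  @p_dbname sysname,
  @p_dest   nvarchar(260)
AS
BEGIN
  DECLARE @v_file nvarchar(400)
  -- build a date-stamped file name, e.g. mydb_20040624_full.bak
  SET @v_file = @p_dest + N'\' + @p_dbname + N'_'
              + CONVERT(nvarchar(8), GETDATE(), 112) + N'_full.bak'
  BACKUP DATABASE @p_dbname
  TO DISK = @v_file
  WITH INIT, NAME = N'Full backup'
END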
The second routine will do a dir (directory listing) of a database backup file dump location
and based on the filters supplied generate the restore commands (in order). To make
this routine a lot more complex, consider RESTORE with FILEHEADERONLY etc for each
file, then based on the result set build the appropriate restore commands without the
need for file filters to determine file types etc.
http://www.chriskempster.com/scripts/restoredbfromfiles.sql
Chapter 7
Recovery & Troubleshooting

Throughout this chapter we will cover recovery scenarios. This hands-on approach
will provide detailed information about the steps taken, areas to watch out for
and how previously discussed backup plans and procedures are put to use.
The chapter is not designed for start to finish reading, it is very much a hands on
reference in which the DBA, meeting a problem, can use this chapter as a key reference
to determine the next steps and ideally an immediate solution.
Important first steps
The first thing that will strike you about this chapter is the multitude of problems the DBA
can face from day to day. Keeping yourself abreast of standard recovery scenarios
through staged scenario repetition is an essential task for all DBAs, but no matter how
comfortable you feel about recovery, it is of utmost importance that you take the time to:
a) Evaluate the situation
The DBA needs to determine what is in error, and more importantly why.
This involves simple note taking of the sequence of events and times
before and after the issue was detected, and, to the best of our ability, the
internal and external factors surrounding the events. From here, pull
together key staff and mind map around this; do not talk recovery paths
or strategies as yet or fall into the trap of initiating a known immediate
recovery scenario.
b) Research and review possible recovery scenarios
With a solid picture of the situation, we begin to brainstorm the recovery
process. If this is a relatively simple recovery or a known problem, the
review may be as quick as revisiting the process as it applies to the
situation. The steps should be bullet pointed and tentatively agreed by
those present. The services SLA may dictate a specific route to be taken
or whether further information is required.
c) Pull in existing DRP plans and determine their relevancy and map the
action plan. Depending on the thoroughness of the plan, it will tend to be
the key driver for recovery and communication.
d) Plan, review and communicate the strategy
Team leaders/management will be ultimately responsible for the actions
taken. This process will commit the resources required to initiate the plan.
If not already done, detailed steps and communication points are added to
the plan (paper or electronic based) and controlled via the technical
services manager.
e) Define the backup and rollback strategy
This is done in parallel to c). The DBAs and other support staff will define
the initial backups to be done before the recovery is started (very
important) and the rollback strategy on failure. Some recovery scenarios
will consist of multiple checkpoint steps with different rollback strategies.
f) Audit your environment
The environment should be quickly audited. From a DBA perspective,
record and validate basic server properties (for example OS
patch level, DB collation/versions, file locations etc). This “yard-stick”
information can prove very handy throughout the recovery.
g) Take action - execute the plan.
h) Review and repeat cycle
IMPORTANT – As a manager, do not take the “we have a cluster, we can simply
fail over now and be up and running within minutes” line as the immediate solution. It
is of utmost importance that we talk through the impact with technical staff. For
whatever the reason, failover may be the biggest mistake you can make. Deal
with the issue after the fact and try and work the problem through a no-blame
culture.
The first chapter highlighted some of the important management elements to build upon
within your IT shop. A structured approach to systems recovery is of utmost importance
as the pressure builds to return services back to working order. Here, effective mitigation
of human error is the key.
Contacting MS Support
Microsoft support services are highly recommended, primarily in clustered or replicated
database scenarios, or when you feel uncomfortable with the recovery scenario (never
hesitate; don't take a gamble in production). The website for support numbers can be
found at:
http://support.microsoft.com/default.aspx?scid=fh;EN-US;CNTACTMS
A more effective costing structure for most would be the “SQL Database Support
Package” at around $1900 US (shop around, many gold partners of Microsoft offer
equivalent services and special rates). This will cover 12 non-defect support incidents
with a variety of other features.
Before you ring Microsoft, collect the following information:
a) Have a quick scan through the fix list of SQL Server service packs (depending on
the version you are running) http://support.microsoft.com/default.aspx?kbid=290211
b) Did you try a Google search over your specific problem/error? – there may well be
a very simple solution.
c) SQL Product version and current build
d) Collect server information and dump to a standard text file
e) Run SQLDIAG where possible and dump to a file
f) Run DBCC CHECKDB and dump results to a file
Include diagrams of your architecture, and information about applications running against
the instance etc.
The Microsoft support team may also direct you to the utility, PSSDIAG. It is covered
well on MS Support:
http://support.microsoft.com/default.aspx?kbid=830232
What privileges do I need to Restore a Database?
The instance wide (server) roles that allow backup and restore privileges are sysadmin
and dbcreator only. The db_owner database privilege allows backup privileges only. You
cannot restore databases with only this privilege, even if you are the “owner” of the
database within the instance.
The DBA can work around the restore problem for db_owner users (and other logins) via
the xp_cmdshell extended stored procedure running a simple isql command via a trusted
login. For example, the stored procedure may include this statement:
set @sqlstring = 'RESTORE DATABASE mydb ' +
    'FROM DISK = ''' + @p_path + ''' ' +
    'WITH MOVE ''corpsys_raptavetmiss_Data'' TO ''d:\dbdata\mydb\mydb_data01.mdf'', ' +
    'MOVE ''corpsys_raptavetmiss_Log'' TO ''d:\dbdata\mydb\mydb_log01.ldf'', RECOVERY'
set @isqlstring = 'isql -S' + @@SERVERNAME + ' -E -Q "' + @sqlstring + '"'
exec master..xp_cmdshell @isqlstring
Of course you may need to set up the sql*agent proxy account to leverage xp_cmdshell
(see later), and secure this routine appropriately. The isql command which connects to
the instance uses the service account user that is running the instance. This user has
sysadmin privileges.
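As a fuller illustration, the statements above might sit in a wrapper procedure along these lines. This is a minimal sketch only; the procedure name, parameter and logical/physical file names are all hypothetical and must be substituted with your own:

create procedure usp_restore_mydb @p_path varchar(255)
as
begin
    -- build the restore statement; logical/physical file names are illustrative
    declare @sqlstring varchar(1000), @isqlstring varchar(1200)
    set @sqlstring = 'RESTORE DATABASE mydb ' +
        'FROM DISK = ''' + @p_path + ''' ' +
        'WITH MOVE ''mydb_data'' TO ''d:\dbdata\mydb\mydb_data01.mdf'', ' +
        'MOVE ''mydb_log'' TO ''d:\dbdata\mydb\mydb_log01.ldf'', RECOVERY'
    -- shell out via isql so the restore runs as the (sysadmin) service account
    set @isqlstring = 'isql -S' + @@SERVERNAME + ' -E -Q "' + @sqlstring + '"'
    exec master..xp_cmdshell @isqlstring
end
go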
Revisiting the RESTORE command
To be honest the BOL covers the RESTORE command very well and it's hardly worth the
effort reiterating the syntax. That said, we will cover the more common syntax
statements used throughout the ebook.
The restore command has two core functions:
a) Restoration and/or re-application of database backup files against the
instance to restore to a known point in time
b) Verification of, and meta-data dumps of, backup file information
For a), the command is simply RESTORE [DATABASE | LOG] <options>, whereas with b)
we have:
1) restore filelistonly
2) restore headeronly
3) restore labelonly
4) restore verifyonly
These commands can prove very useful when determining the contents of backup files to
facilitate more effective and timely recovery. Perhaps the most helpful is the headeronly
option and its databaseversion column returned. See the BOL for a comprehensive
summary.
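For example, assuming a backup file at c:\mydb_full.bak (the path is illustrative):

-- who produced the backup, when, and for which database/version?
restore headeronly from disk = 'c:\mydb_full.bak'
-- which data and log files does it contain, and where did they originally live?
restore filelistonly from disk = 'c:\mydb_full.bak'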
The restore command is broken down into:
a) restoration of full and differential backup files only via the restore database
<options> command; or
b) restoration of log backup files via the restore log <options> command.
I believe most DBAs will utilize a small number of options with the command. The most
essential is the FROM clause (where the backup is), and the MOVE clause (where the files
are restored to on the file system), for example:
restore database mydb from disk = 'c:\mydb_full.bak' with recovery
- or -
exec sp_addumpdevice 'disk', 'mydb_fullbackup', 'c:\mydb_full.bak'
restore database mydb from mydb_fullbackup with recovery
The MOVE command is essential to restore files to another location (note the MOVE
option is repeated for each file with a single WITH option):
RESTORE DATABASE mydb FROM DISK = 'c:\mydb_full.bak'
WITH MOVE 'mydb' TO 'c:\mydb_test.mdf',
MOVE 'mydb_log' TO 'c:\mydb_test.ldf'
Whether you use the LOG or DATABASE keyword, the options are basically the same.
The final part of restoration is the state in which the command leaves us when it's run,
namely:
a) WITH STANDBY = <filename>
The STANDBY clause is essential for log-shipping, but also allows the DBA
to recover a database file backup AND open the database in read only
mode; this is very handy when trying to determine the point in which to
end recovery or when corruption began etc. The filename specified will
include undo information to “continue the recovery where it was stopped”.
restore database mydb from disk = 'c:\mydb_full.bak' with
standby = 'c:\mydb_undo.bak'
b) WITH NORECOVERY
Leaves the database in a state in which further backup files can be
applied; in other words, the restore will NOT roll back any uncommitted
transactions. The command is classically used when rolling forward from
a full backup, then subsequently applying one or more differential or log
backups. For example:
restore database mydb from disk = 'c:\mydb_full.bak' with norecovery
restore log mydb from disk = 'c:\mydb_log1.bak' with norecovery
restore log mydb from disk = 'c:\mydb_log2.bak' with recovery
go
c) WITH RECOVERY
Completes the recovery process and marks the database as open and
available for user connections. The command completes recovery with the
rollback of uncommitted transactions. This option is the default option in
this list.
restore database mydb with recovery
d) WITH STOPAT = <datetime>
The STOPAT clause can only be specified with differential and log backup
files. This option will roll forward to a specific date/time only within the
backup file, if the time specified is outside of or not encompassed by the
file, then you will be told. You cannot use this option to skip
time/transactions or backup files. Once specified it's basically the end point
of recovery, and thus it tends to be used hand-in-hand with the WITH
RECOVERY option.
restore database mydb from disk = 'c:\mydb_full.bak' with norecovery
restore log mydb from disk = 'c:\mydb_log1.bak' with norecovery
restore log mydb from disk = 'c:\mydb_log2.bak' with recovery, stopat = 'May 20, 2004 1:01 PM'
From this quick overview, we will, in no particular order, tackle the multitude of recovery
scenarios. At the end of the day, practice makes perfect (so they say).
Auto-Close Option & Timeouts on EM
Enabling this database option is bad practice and should be avoided (hopefully the option
will be removed completely in the future). Basically the option opens (mounts) and
closes (dismounts) the database (and its files) on a regular basis (which is typically
connection based, or in other words how busy the database is at a point in time).
An adverse effect from having this option on for larger databases (or numerous
databases within an instance) is a very slow enterprise manager and slow OLEDB
connections. Expanding the databases folder can take an agonizing amount of time.
This can also happen if ODBC tracing options are enabled.
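To check and turn the option off via T-SQL (the database name is illustrative):

-- returns 1 if auto close is on for mydb
select databasepropertyex('mydb', 'IsAutoClose')
-- turn the option off
alter database mydb set auto_close off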
Can I re-index or fix system table indexes?
Generally speaking, system object problems should be treated with utmost caution, and
dbcc checkdb should be run over other databases to determine the extent of the problem
at hand. Personally, I would restore from backup or call Microsoft support to ensure I am
covering all bases.
If you experience errors with a system object and its index, the undocumented command
sp_fixindex may assist you. The routine exists in the master database and you can view
the code. Do note (as per the code), that the database must be in single user mode for
the command to run. The command itself will run a:
a. dbcc dbrepair – with repairindex option; or
b. dbcc dbreindex
This command will not repair the clustered index on sysindexes or sysobjects.
NOTE – The instance can be started in Single User Mode via the –m startup
parameter.
The steps may be:
-- Backup mydb database first! (the mydb database and its sysfilegroups table is corrupt)
use master
go
exec sp_dboption mydb, single, true
go
use mydb
go
checkpoint
go
exec sp_fixindex mydb, sysfilegroups, 2
go
dbcc checkdb
go
dbcc checkalloc
go
exec sp_dboption mydb, single, false
go
-- Backup again!
If the command refuses to run, for whatever reason, then consider running the DBCC
equivalents yourself.
ConnectionRead (WrapperRead()). [SQLSTATE 01000]
This is a strange error that I have seen in two distinct cases:
a) Windows 2000/2003 – Confirmed bug as per 827452
b) Windows NT 4.0
If you have multiple SQL Server listener protocols enabled, such as tcpip and named
pipes, the error may result due to malformed or overfilled TCP/IP packets for the
instance. The user will experience the error and an immediate disconnection from the
instance. The error will persist for further connections to the instance but not every
operation.
We have found that a simple reboot under Windows NT 4.0 resolves the error. I found
no support documents related to known problems. Other options:
a) for the code segment – try and split over two distinct transactions or
blocks of code (between BEGIN END statements)
b) add SET NOCOUNT ON
c) force TCP over named pipes (especially) or other protocols
d) did you change the network packet size option? reset if you can back to
4096 (default). The error may also be related to the default being too
small, but I would highly recommend that you do not alter it without
thorough testing.
Space utilisation not correctly reported?
A DBA comes to you puzzled about the size of the database. He had run sp_spaceused to
determine how much space the database was using and it had returned an unexpected
result. The space was substantially incorrect. The most likely approach to fix this problem
is to run:
DBCC UPDATEUSAGE (0)
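Alternatively, sp_spaceused can correct the figures as it reports them:

-- recalculates usage data before reporting (a targeted form of updateusage)
exec sp_spaceused @updateusage = 'TRUE'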
Guest Login missing on MSDB database
If the guest user is removed from MSDB you may experience a variety of symptoms,
such as:
a) List of packages under Data Transformation Services in EM are
blank/missing
b) Cannot create package error – “server user ‘mylogin’ is not a valid user in
database ‘msdb’”
The guest user has public role permissions only to MSDB. The guest user is not mapped
to a specific login as such (virtual login user), and has these base properties as defined in
msdb..sysusers:
a) uid = 2
b) status = 2
c) name = guest
d) gid = 0
e) hasdbaccess = 1, islogin = 1, issqluser = 1 (the other "is" columns are zero)
If you are logged in as a specific user, we could add this user to the MSDB database, grant
public role access, and the errors disappear. Otherwise, we need to re-create the guest user
in the MSDB database.
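Re-creating the guest user should be as simple as:

use msdb
go
-- re-create the guest user; it automatically receives public role permissions
exec sp_grantdbaccess 'guest'
go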
Troubleshooting Full Text Indexes (FTI)
There is a very good Microsoft Support document on this. The article number is
240867, titled “INF: How to Move, Copy, and Backup Full-Text Catalog Folders and
Files”.
Although fine and well, the document is lengthy and involves numerous registry changes
– all are high risk changes; consider a rebuild of the catalog if it's not overly large and you
can deal with the wait (and added CPU load). My general approach to FTI is to code two
alternatives in your application logic:
a) you can actively “turn off” FTI via a single parameter (row in a table); or
b) some other variable embedded in each stored procedure that uses FTI and will
execute an equivalent non-FTI SQL (typically using like %__%). This requires
more maintenance/coding (two sets of SQL), but it can really save your bacon.
If you do copy a database to another server, FTI is solid enough to rebuild its catalogs in
its default FTI destination folder without too many complaints and reconfiguration from
your side. I have a classic case in production where my catalogs are on e:\. This path
doesn’t exist on my DEV and TEST servers. I simply select the catalog and ask it to
rebuild, and it will move to d:\ (its default path). If this does not work, then run the
rebuild at the table level instead, and repeat for each table using FTI.
General FTI Tips
Here are some general tips:
• "Language database/cache file could not be found" – can be caused by missing
cache files (wdbase.*), that can be copied from the SQL CD; there should be 8
files.
• Check the value for 'default full-text language' option returned by sp_configure;
check this language code file exists, US_English = 1033, UK_English = 2057 for
example.
• If the catalog is not building, check your language breaker and try NEUTRAL – run
a full populate after the change. Check all columns carefully.
• Ensure the MSSEARCH service is starting correctly. Review the services applet and
event logs.
• Use a completely different catalog to the existing (working) FTI'ed table columns.
On build via EM, at the table level force a full refresh, then check the index's
population and current row counts.
• Try very simple full text queries first. The indexing may be working but your
query isn't.
• The incremental update of the catalog can only work if the table has a timestamp
column; depending on your server and how busy the system is, the catalog can
take between 20 seconds and a minute to update. Take care with large updates; the
catalog can take some time to catch up.
• If you want to be accent insensitive, then set the language settings appropriately
on the table columns being indexed. I have had problems with this. It simply did
not work under SQL Server 2k and I believe it to be a bug.
Locked out of SQL Server?
This problem is typically related to your authentication mode for SQL Server (32), that
being:
a) SQL & Windows (also known as mixed mode)
b) Windows – which effectively disables the sa account
DBAs may find themselves locked out after removing the BUILTIN\Administrators login.
This login allows all NT users with administrator privilege to connect to the instance with
sysadmin privileges. As such, it's not uncommon to remove this account and possibly re-assign it. In the process, the unsuspecting DBA may be locked out.
If you cannot login via Windows security but know the SA account, then first check the
LoginMode entry in the registry:
SQL Server 7.0:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\LoginMode
SQL Server 2000:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<instance_name>\MSSQLServer\LoginMode
NOTE – Also check that the administrator account and/or the service account running
your SQL Server instance has access to this key via regedt32.exe
A value of 1 (one) is Windows authentication only, and 2 (two) for SQL & Windows
(mixed mode). If this value is 1, then alter it to 2 and restart the instance. If you know
the SA password, you should be able to login (remember – the default install PW is
typically blank).
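If you can still connect under another account, a quick sketch to confirm the current mode from T-SQL (the default instance path is shown, and xp_regread is undocumented):

-- returns the LoginMode value: 1 = Windows only, 2 = SQL & Windows (mixed)
exec master..xp_regread
    'HKEY_LOCAL_MACHINE',
    'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer',
    'LoginMode'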
If you cannot remember your SA password, then leverage the
BUILTIN\Administrators account to login to the instance and alter it. This of course
assumes you have not removed it, or that you have an alternative Windows or SQL login to get in.
Another idea (and more drastic solution) is to restore the master database, either from a
previous backup or using rebuildm.exe
If you still have issues, consider a password cracking organization such as Password
Crackers Inc found at www.pwcrack.com, or try Microsoft Support Services.
Instance Startup Issues
“Could not start the XXX service on Local Computer”
This is a classic problem in which the SQL Server service is failing to start. Unless
reported otherwise in the SQL Server logs or the SQL Server Agent logs, the service itself
is an issue and not the underlying DBMS and its data files/binaries. Here is a check list of
points to consider when debugging the problem:
a) check the service account user – is the:
a. password expired / account locked?
i. If the user is a domain account, this may be the case, and you
need to check carefully with the systems administrators what
account/group policies are being applied
b. is the user account part of the BUILTIN\Administrators group in SQL
Server, through which access to the instance is being granted?
i. If not:
1. does the user have an instance login account with the
sysadmin (sa) privilege?
2. is the default database for the login valid?
c. If the service account user is anything but the system account, login locally
to the server – did it work? Get your administrator to verify your
underlying OS privileges if you have customized the account and not
given it administrator access.
d. Try the built-in SYSTEM account and re-start the instance – same
error? or no problems?
e. Has the domain name changed? Is the login account for the user (if not
using BUILTIN\Administrators) still valid?
f. Does the instance name have special characters in it? (typically picked from the
name of the host). Even "-" may affect the connection.
g. Has the 1434 UDP port been blocked?
If you are desperate, create a new local administrator account and start the service with
it. Debug follow-on issues regarding SQL Server Agent and possibly replication problems
thereafter.
SSPI Context Error – Example with Resolution
In this example, I have created a domain user called “SQLServerAdmin” that will run the
SQL Server Service including SQL Agent for a named instance. The software was
installed as this user, we did not select the named pipes provider though on installation,
only opting for TCP/IP over port 2433.
The instance started OK, but when attempting to connect via EM (enterprise manager)
using pure windows authentication whilst logged in as the SQLServerAdmin domain user,
we received an SSPI context error.
The SQL Agent service was also failing to start with the following message:
SQL Agent Error:
[165] ODBC Error: 0, Cannot generate SSPI context [SQLSTATE HY000]
If we changed the services startup user to local system then we had no issues. Also, if
we re-enabled named-pipes and kept SQLServerAdmin user, again we had no issues with
SSPI context. Even so, I didn’t like it and took action.
First of all we needed to check the SPN via the setspn command downloadable from this
site: http://www.petri.co.il/download_free_reskit_tools.htm
Get the hostname of the server, run hostname.exe from the DOS command line, and
pass it through to setspn as follows:
C:\Program Files\Resource Kit>setspn -L royntsq02
Registered ServicePrincipalNames for
CN=ROYNTSQ02,CN=Computers,DC=corpsys,DC=training,DC=wa,DC=gov,DC=au:
MSSQLSvc/royntsq02.corpsys.training.wa.gov.au:2433
HOST/ROYNTSQ02
HOST/royntsq02.corpsys.training.wa.gov.au
The key item is:
MSSQLSvc/royntsq02.corpsys.training.wa.gov.au:2433
All seems fine here. As we are using TCP/IP, we need to ensure the nslookup command
is successful:
C:\Program Files\Resource Kit>nslookup
DNS request timed out.
timeout was 2 seconds.
*** Can't find server name for address 163.232.6.19: Timed out
DNS request timed out.
timeout was 2 seconds.
*** Can't find server name for address 163.232.6.22: Timed out
*** Default servers are not available
Default Server: UnKnown
Address: 163.232.6.19
Again, if we enabled named pipes we don’t have an issue. Therefore the error must be
related to the nslookup results. These were resolved by the systems administrator,
giving us:
C:\Documents and Settings\SQLServerAdmin>nslookup
Default Server: roy2kds1.corpsys.training.wa.gov.au
Address: 163.232.6.19
Finally, we shutdown the instance, and set it back to the SQLServerAdmin user account
once more, only to receive a further error – this time relating to time synchronization.
The server administrators were contacted, the time resynced and the server rebooted.
On startup the service account started without any further issues and access via EM was
successful using windows authentication.
NOTE - using the SQL Server option to register the instance in AD (active
directory) was also unsuccessful.
Account Delegation and SETSPN
If you work in a multi-server environment, namely a separate web server and database
server, you "may" experience a problem when relying on integrated security to connect
through to the underlying DBMS. The error is similar to this:
Server: Msg 18456, Level 14, State 1, Line 1
Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'
This error message may also pop up when connecting between databases on different
servers.
To get around this issue, try to activate account delegation. This allows the retention of
authentication credentials from the original client. For this to work, all servers must be
running Windows 2000 with Kerberos support enabled, and of course using Active
Directory Services.
To activate delegation, shutdown your SQL Server instance, and for the service user
(don’t use the system account), select properties of the account and check the box
"account is trusted for delegation". This is found with other options such as reset
password, account locked etc. The user requesting the delegation must not have the
"account is sensitive and cannot be delegated" option set.
Once done, the SQL Server requires a SPN (service principal name) assigned by the
administrator domain account. This must be assigned to the SQL Server service account.
This is done via the setspn utility found in the Windows 2k resource kit, for example:
setspn -a MSSQLSvc/myserver.chriskempster.com sqlserveradmin
You must be using TCP/IP, as setspn can only target TCP/IP sockets. Multiple ports? Create
an SPN for each.
setspn -a MSSQLSvc/myserver.chriskempster.com:2433 sqlserveradmin
I’m getting a lock on MODEL error when creating a DB?
The DBA may experience a lock error against the model database when creating a new
database (CREATE DATABASE uses model as its template, so it takes a lock on it). You will
get this error when another SPID has an open session against the model database: close
or kill the session and try again.
If two create database commands attempt to run simultaneously, you will also receive
the error.
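A quick sketch to find the offending session:

-- who is currently sitting in the model database?
select spid, status, loginame, hostname
from master..sysprocesses
where dbid = db_id('model')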
Transaction Log Management
Attempt backup but get “transaction log is full” error
The backup command itself will attempt a checkpoint which may fail if the transaction log
is full and cannot extend/grow to accommodate the operation. Use the command DBCC
SQLPERF(LOGSPACE) against the database to view current usage properties. The DBA
should check for any open transactions via the command DBCC OPENTRAN or
::fn_get_sql(), determining the SPID via the master..sp_who2 command.
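For example:

-- per-database log size and percentage used
dbcc sqlperf(logspace)
-- oldest active transaction in mydb, if any
dbcc opentran ('mydb')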
The DBA needs to:
a) Check free disk space
Determine if more disk space can be allocated for the database file extension and
its auto-grow value at an absolute minimum.
This is a simple operation and requires no explanation. Be aware though of the
database transaction log file's growth properties, which may need to be altered; if
you attempt this you may get the same error again when all you need to do
is make more free space on disk.
b) Check database recovery model (full, simple or bulk logged?)
Was the database in a bulk-logged or full recovery model by mistake? But do not
change it for the sake of simply resolving your immediate problem.
a. I only do full backups at night. I’m using full / bulk logged recovery, but
it’s mid-day and my transaction logs are huge and now full ! I don’t want
to lose any transactions, now what?
i. You have no choice but to backup the transaction log file, or
attempt a full backup. If disk space is an issue but the standard
SQL Server backup is too large, then consider a third party backup
routine like Litespeed. Once backed up, we will cover transaction
log shrinking later in the section.
b. Whoops, I wanted a simple recovery model not a full or bulk-logged !
i. Discussed in the next section.
c) Simply hit the file's growth limit?
Alter as required via enterprise manager with a right click and properties on the
database and altering the max-file-size properties.
d) Can another database log file be added against another disk?
IMPORTANT - Before selecting an option above, consider a quick re-test backing
up the transaction log via the command:
backup log MyDb to disk = 'c:\mydb.bak' with init
The error itself will be something like:
The log file for database 'mydb' is full. Back up the transaction log for the database to free up some log space.
Alter recovery model, backup and shrink log file
If you believe b) is the best option then:
a) Run enterprise manager
b) Alter the recovery model of the database to simple. You may get this
error message:
c) Ignore this message. Double check via Enterprise Manager by checking the
recovery model once again, you will find it is set to simple.
d) Backup the log file with truncate only option
backup log mydb with truncate_only
e) Check the file size; its large size will remain. To shrink this file, select properties of
the database, navigate to the transaction log tab, and remember the (logical)
filename; this is used in the command below to shrink the now truncated file.
Once done, re-check the size via Enterprise Manager. If there is no change, close the
properties window of the database and take another look.
dbcc shrinkfile (mydb_log, truncateonly)
IMPORTANT – You will lose all ability to perform a point in time recovery after
step d) is performed. I recommend that you do a full backup of the database if
necessary after e).
Shrinking Transaction Log Files
Step 1. Get basic file information
Before we attempt any shrink command, we need to collect some basic information on
the transaction log file:
exec sp_helpdb
select name, size from mydb..sysfiles
DBCC LOGINFO('mydb')
The DBCC command will return the virtual logs within the physical log. Be warned that it
may be large:
The transaction log of course expands itself in units of the virtual log file (VLF) size, and
may only compress itself to a VLF boundary. A VLF itself can be in a state of:
a) active – starts from the minimum LSN (log sequence number) of an active
or uncommitted transaction. The last VLF ends at the last written LSN.
b) recoverable – portion of the log that precedes the oldest active transaction
c) reusable
The key here is based on the recovery model of the database; SQL will maintain the LSN
sequence for log backup consistency within the VLFs, ensuring the minimum LSN cannot
be overwritten until we backup or truncate the log records.
Look at the STATUS and position of the most active log (status = 2). Also check
for uncommitted or open transactions and note their SPIDs.
use mydb
go
dbcc opentran
select spid, blocked, open_tran, hostname, program_name, cmd, loginame, sql_handle
from master..sysprocesses
where open_tran > 0
or blocked <> 0
Step 2. I don’t mind losing transaction log data (point in time recovery is not important to
me), just shrink the file
Run the following command to free the full virtual logs in preparation for shrinking:
BACKUP LOG mydb WITH TRUNCATE_ONLY
-- or --
BACKUP DATABASE mydb TO DISK = 'NUL'
Once done, alter the recovery model of the database as need be.
Skip to step 4.
Step 3. I need the transaction log file for recovery
Then simply backup the transaction log file, free disk/tape space may be your problem
here. Also, be conscious of your log shipping database and its recovery if you are using a
standby database.
Step 4. Shrink the transaction log
The DBCC SHRINKFILE command is the command we will use. Be aware that in SQL
Server v7 the DBA would need to generate dummy transactions to move the active
virtual log to the start of the file (see results of the DBCC command in step 1). This is
not required in 2000.
DBCC shrinkfile (mydb_log, 10)
.. and revisit file sizing.
If you expect log growth, then pre-empt the growth to some degree by pre-allocating the
appropriate storage rather than letting SQL do the work for you.
Consider a fixed growth rate in Mb over a percentage of the current total. The files can
“run-away” in terms of used space. Always be aware that auto-growth options should be
used as a contingency for unexpected growth.
Be aware that shrinking data files will result in a large number of transaction log entries
being generated as pages are moved. Keep a close eye on this when running the
command for these files.
Here is another example:
backup log corpsys_raptavetmiss with truncate_only
dbcc shrinkfile (corpsys_raptavetmiss_log, truncateonly)
DbId   FileId CurrentSize MinimumSize UsedPages   EstimatedPages
------ ------ ----------- ----------- ----------- --------------
10     2      128         128         128         128

(1 row(s) affected)
Rebuilding & Removing Log Files
Removing log files without detaching the database
To remove (or consolidate) transaction log files from many to a single file, without detaching the database from the instance, follow these steps. The scenario is based
on a test database with three log files.
a) Backup or truncate the transaction log file(s) – all files, though physically
separate, are treated as a single logical file written to serially.
b) Check the position of the active portion of the log file using the DBCC
LOGINFO('cktest') command, and look at cktest..sysfiles to marry up the
file_id to the physical filename:
dbcc loginfo('cktest')
select fileid, name, filename from cktest..sysfiles
c) Be warned that active transactions will impact the success of the following
commands. We will use DBCC shrinkfile and its emptyfile option to tell
SQL not to write to log files two and three:
DBCC SHRINKFILE ('cktest_log2', EMPTYFILE )
DBCC SHRINKFILE ('cktest_log3', EMPTYFILE )
d) Remove the files:
ALTER DATABASE cktest REMOVE FILE cktest_log2
ALTER DATABASE cktest REMOVE FILE cktest_log3
e) Check via dbcc loginfo:
The physical files are also deleted.
Re-attaching databases minus the log?
Do not action this method. It is more of an informational process, or warning if you will,
to demonstrate that it is not possible:
a. Backup the database transaction log before detaching
b. Via EM or the command line, detach the database (remove sessions of
course):
c. Copy or rename the log files to be removed
d. Use the command line, or EM to attach the database:
Note that OK is greyed out. Also note that the command line alternative
will give you this error:
Server: Msg 1813, Level 16, State 2, Line 1
Could not open new database 'cktest'. CREATE DATABASE is aborted.
Device activation error. The physical file name 'c:\cktest_log2_Log.LDF'
may be incorrect.
Device activation error. The physical file name 'c:\cktest_log3_Log.LDF'
may be incorrect.
So what next?
e. The sp_attach_single_file_db command shown below will not work, as this
simple command relies on a LDF file only, which is not our situation:
EXEC sp_attach_single_file_db @dbname = 'cktest', @physname =
'C:\cktest_data.mdf'
Server: Msg 1813, Level 16, State 2, Line 1
Could not open new database 'cktest'. CREATE DATABASE is aborted.
Device activation error. The physical file name 'c:\cktest_Log.LDF' may be
incorrect.
Device activation error. The physical file name 'c:\cktest_log2_Log.LDF'
may be incorrect.
Device activation error. The physical file name 'c:\cktest_log3_Log.LDF'
may be incorrect.
You can only work around this with sp_attach_db; the
sp_attach_single_file_db is not intended for multi-log file databases.
As you can see here, we can use the detach and attach method only for moving files, and
not to consolidate them.
Using DBCC REBUILD_LOG()
The DBCC REBUILD_LOG command can be used to re-create the database's transaction
log file (or consolidate from many log files down to one), dropping all extents and writing
a new log of a single new page. Do note that this is an undocumented command.
In order for this command to work, you need to:
a) Kill (or ask people to logoff) user sessions from the DB
b) Have the master database active - use master – otherwise you will get the
error “User must be in the master database.”
c) The database must be put in bypass recovery (emergency) mode to
rebuild the log.
d) Stop/Start SQL Server – may cause you to look at an alternate method?
e) Run the DBCC command
No specific trace flag is required for this command to run.
WARNING – Do not take this command lightly. Before any rebuild ask yourself
the question “do I need to backup the transactions I am possibly about to remove
in rebuilding the log file?”. Especially if the DB is part of replication and the
transactions have yet to be pushed/pulled. Consider another solution if all you are
trying to do is shrink the log file.
Here is an example:
-- do a full backup, and ideally backup the physical database files as well
use master
go
-- so we can set the DB in bypass recovery mode (or emergency mode)
exec sp_configure 'allow updates', 1
reconfigure with override
go
select * from sysdatabases where name = '<db_name>'
-- remember the STATUS and DBID column values
begin tran
-- set DB into emergency mode
update sysdatabases
set status = 32768
where name = '<db_name>'
-- only 1 row is updated? If so, commit, query again if need be
commit tran
-- STOP the SQL Server instance now.
-- Delete or rename the log file
-- START the SQL Server instance.
-- Run the DBCC command
DBCC REBUILD_LOG (trackman,'c:\work\ss2kdata\MSSQL$CKTEST1\data\testapp_log.ldf' )
Warning: The log for database 'trackman' has been rebuilt. Transactional consistency has been lost.
DBCC CHECKDB should be run to validate physical consistency. Database options will have to be reset,
and extra log files may need to be deleted.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
-- optional - run Check DB as mentioned to validate the database, check SQL Server logs as well.
exec sp_configure 'allow updates', 0
reconfigure with override
go
Do note that if the database was, say, in DBO use only mode before the rebuild, it will be
returned to this mode after a successful DBCC REBUILD_LOG. The DBA should re-check
the STATUS of the database carefully, and update the sysdatabases table accordingly if
the status has not returned to its original value. If the old log file was renamed, remove it.
The file size will be the default 512Kb. Check growth properties, and resize as you see
fit. Revisit your backup strategy for this database as well to be on the safe side.
Can I listen on multiple TCP/IP ports?
Run the server network utility GUI, select the appropriate instance. Highlight the enabled
protocol (TCP/IP) and click the properties button; for the default port enter your port
numbers separated by commas, for example: 1433,2433,2432. Restart the instance for
the changes to take effect and check your SQL Server logs.
You will see something like:
SQL server listening on 163.232.12.3:1433, 163.232.12.3:2433, 163.232.12.3:2432, 127.0.0.1:1433,
127.0.0.1:2433, 127.0.0.1:2432.
SQL Server is ready for client connections
Remember not to use 1434, which is a reserved UDP port for SQL Server instance "ping".
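If you prefer to confirm the configured port(s) from T-SQL rather than the GUI, a sketch such as the following should work; the registry path shown assumes a SQL Server 2000 named instance called MY2NDINSTANCE (adjust for your own), and xp_regread is undocumented:

-- read the configured listener port(s) from the registry
exec master..xp_regread
    'HKEY_LOCAL_MACHINE',
    'SOFTWARE\Microsoft\Microsoft SQL Server\MY2NDINSTANCE\MSSQLServer\SuperSocketNetLib\Tcp',
    'TcpPort'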
Operating System Issues
I see no SQL Server Counters in Perfmon?
This problem is a real pain and I have yet to determine why this issue occurs. The
problem “seems” to manifest itself on installation but this of course may vary per site.
Follow the steps below to assist in recovering them.
The SQL Server counters are automatically installed with the instance. If they are
not, then try to re-register sqlctr80.dll and run the file sqlctr.ini, both located in the binn
directory for the instance. The DBA should also try the command lodctr.exe sqlctr.ini,
and the unlodctr command. Always issue the unlodctr command before lodctr, eg:
C:\>unlodctr MSSQL$MY2NDINSTANCE
Removing counter names and explain text for MSSQL$MY2NDINSTANCE
Updating text for language 009
C:\Chris Kempster>
C:\Program Files\Microsoft SQL Server\MSSQL$MY2NDINSTANCE\Binn>
lodctr MSSQL$MY2NDINSTANCE sqlctr.ini
Re-start your server.
Some important items to remember though with the counters:
a) each database instance has its own set of counters as per the \binn directory
for each instance, (C:\<my ss2k install
path>\<instance>\BINN\SQLCTR80.DLL)
b) the system table master..sysperfinfo includes many of the database
performance counters. Many of these values are cumulative. See this
Microsoft article for example views over this table:
http://support.microsoft.com/search/preview.aspx?scid=kb;en-us;Q283886
c) you can monitor a server remotely, if you are having problems, map a drive
or authenticate to the remote domain first then try again, try the server IP
then the host-name (\\163.222.12.11 for example). You may need to restart performance monitor after authenticating as you can continue to have
authentication problems.
If you still have problems review the following registry keys via regedit or regedt32
(allows altering of key permissions) and ensure that:
a) they exist
b) service account running the instance has permission to create/manage the key
/HKEY_LOCAL_MACHINE/SYSTEM/ControlSet001/Services/MSSQLSERVER
/HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/MSSQLSERVER
/HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/MSSQL$<instance-name>/performance
Server hostname has changed
My old PC name was SECA, and was altered via xxxx to OLDCOMP. This is confirmed
(after rebooting) with the hostname DOS command:
When I run query analyser I get this:
The hostname change is fine and has filtered through service manager (services
themselves are not affected, named instances don’t pre or postfix hostnames or depend
upon them), but the list of instances in Query Analyser is a problem.
The DBA should be aware that it is not only this utility that is affected. Run Query Analyser,
connect to your instances, then run this command:
SELECT @@SERVERNAME
I get the output SECA\MY2NDINSTANCE, and not OLDCOMP\MY2NDINSTANCE as one
would expect.
For v7.0 instances:
a) After rebooting the server you may find that instances will not start or you fail to
connect. If so, re-run your SQL Server setup disk (remember your edition? – look
at past log files in notepad if not). Don’t worry about service packs etc.
b) The setup runs as per normal and will upgrade your instance.
c) Start your instances, run query analyser and type in the commands:
a. SELECT @@SERVERNAME -- note the old name
b. exec sp_dropserver 'SECA\MY2NDINSTANCE' -- old name
c. exec sp_addserver 'OLDCOMP\MY2NDINSTANCE','local' -- new name
d. re-start instance
e. SELECT @@SERVERNAME -- new name?
For v2000 instances simply run step c) above.
NOTE – If you run Enterprise Manager, take note of the registered instance
names, you will need to re-register the instances before you can successfully
connect.
If you run exec sp_dropserver and get the message:
Server: Msg 15190, Level 16, State 1, Procedure sp_dropserver, Line 44
There are still remote logins for the server 'SECA\MY2NDINSTANCE'.
If replication is enabled, then disable replication in EM by connecting to the instance and
selecting properties of the replication folder:
This can be a real drama for some DBAs as this option states:
Therefore, consider the Generate SQL Script option beforehand over this EM group.
On dropping, you may get this error:
In my case this didn’t seem to matter. Tracing the output of the drop actually does a
majority of the work related to the remote users, therefore exec sp_dropserver
'SECA\MY2NDINSTANCE' and exec sp_addserver 'OLDCOMP\MY2NDINSTANCE','local'
worked fine. As for the jobs, the problem is related to the originating_server column of
msdb..sysjobs table. Consider a manual update to this column where appropriate:
DECLARE @server sysname
SET @server = CAST(SERVERPROPERTY('ServerName') AS sysname)
UPDATE msdb..sysjobs
SET originating_server = @server
WHERE originating_server = 'seca\my2ndinstance'
Use DFS for database files?
Using DFS (distributed file system – Windows 2000 and 2003), is not recommended as
generally it is not overly different to storing your database files over standard network
shares. The issue here is more to do with the DFS replication and overall reliability of
replication end-points within the DFS; this is especially the case if you plan to use DFS as
a form of high availability.
I have not been able to find any official MS document that states that you cannot use
DFS. For best practice sake though, do it at your own peril.
MS Support document 304261 discusses the support of network database files with SQL
7 and 2000. The recommendation is SAN or NAS attached storage over any network file
shares (which require the –T1807 trace flag to enable access). This is again interesting as I
have seen numerous NAS implementations managed at a higher level through DFS,
albeit SQL Server was not involved.
Use EFS for database files?
The DBA may consider the Windows 2000 (or above) Encrypted File System (EFS) for
database files on NTFS volumes. To use this option:
• Backup all instance databases
• Shutdown your database instance – if you don't, the cipher command will
report that the file is currently in use
• Login with the service account the SQL Server instance is using
• Select properties of the folder(s) in which the database files reside via
Windows explorer
• Select the advanced option button and follow the prompts to encrypt the files/folders
• Re-start the SQL Server service
• Verify the successful start-up of the instance and the databases affected by the
encryption (or create databases after the fact over the encrypted directories)
• Verify encryption of the database files via cipher.exe
IMPORTANT – Always test this process BEFORE applying it to your production
instance. Don’t laugh, I have seen it happen. Take nothing for granted. Also, do
not treat EFS as a “data encryption” scheme for your DBMS. It’s purely at a file
system level.
For an internals overview of EFS read “Inside Encrypting File System, Part 1 and 2” from
www.winntmag.com.
To check if EFS is enabled via the command line, use the cipher command:
E:\dbdata\efs>cipher

Listing E:\dbdata\efs\
New files added to this directory will be encrypted.

E EFSTEST_Data.MDF
E EFSTEST_Log.LDF
Depending on the version of your OS and its Window Explorer options, the files and/or
folders may appear a different colour when encrypted.
As EFS is based on public/private keys, we can look at the certificate and export as
required:
On encrypting, the DBA should export the certificate and store it for emergency recovery
procedures if the need arises:
The importing/exporting of key information is a fundamental requirement when using
EFS – do not forget it.
If you attempt to start the service as any user other than the user that encrypted the
database data files, the instance will not start and/or your database will be marked
suspect. Here is an example SQL Server error log entry:
udopen: Operating system error 5(error not found) during the creation/opening of physical device
E:\cktemp\efs\efs_Data.MDF.
FCB::Open failed: Could not open device E:\cktemp\efs\efs_Data.MDF for virtual device number
(VDN) 1.
udopen: Operating system error 5(error not found) during the creation/opening of physical device
E:\cktemp\efs\efs_Log.LDF.
FCB::Open failed: Could not open device E:\cktemp\efs\efs_Log.LDF for virtual device number
(VDN) 2.
If the service cannot be started and, subsequently, no error is reported in the logs, check
if encryption is (one of) the issues via the cipher command.
IMPORTANT – Install service pack two or greater of SQL Server 2000 to avoid
errors such as “there are no EFS keys in sqlstp.log for error 6006”. See support
document 299494.
Also be aware that, depending on your Windows operating system, you may experience
the problem quoted in support document 322346 (“You cannot access protected data
after you changed your password”). Specifically, EFS uses your domain password to
create a hash value for the encryption algorithm; each time a file is saved (or written to)
the system encrypts it using this hash key value. When the password is altered Windows
will not re-encrypt EFS files/folders, only when they are accessed (20). Simply be aware
of this potential problem.
Use Compressed Drives for database files?
Sure, but Microsoft may not support you; SQL Server does not support writable,
database file storage on a compressed drive. The compression algorithms disable the
Write-Ahead Logging (WAL) protocol and can affect the timing of the WriteFile calls. As
such, this can also lead to stalling conditions during checkpoint.
“previous program installation created pending file operations”
Check to see if the following key exists, if so delete it and re-try. I have successfully
done this numerous times over a Windows 2000 server with no further issues.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations
Debugging Distributed Transaction Coordinator (MSDTC)
problems
Failed to obtain TransactionDispenserInterface: Result Code = 0x8004d01b
The DBA may receive this error when:
a) MSDTC is forcibly stopped and re-started whilst the instance is running or starting
b) The SQL Server service has started before the MSDTC service has
The error can be common in clustered environments where services can lag behind one
another during startup.
Essential Utilities
Microsoft support tends to use three core utilities for debugging MSDTC transactions and
associated errors:
1) DTCPing - download from and documented at
http://support.microsoft.com/default.aspx?scid=kb;en-us;306843
2) DTCTester - download from and documented at
http://support.microsoft.com/default.aspx?scid=kb;en-us;293799
3) NetMon - found on Windows setup disks or resource kit
Check 1 - DTC Security Configuration
This is a mandatory check on Windows 2003 boxes (all of them, if you run separate web and
database servers) if the MSDTC service is intended to be used.
In administrative tools, navigate down through Component Services -> Computers, and
right-click on My Computer to get properties. There should be an MSDTC tab, with a
"Security Configuration" button. Click on that, and make sure network transactions are
enabled.
Check 2 - Enable network DTC access installed?
Navigate via the Control Panel and Add/Remove Programs, Add/Remove Windows
Components, select Application Server and click details. Ensure the Enable network DTC
access is checked. Verify whether you also require COM+ access.
Check 3 - Firewall separates DB and Web Server?
The transaction coordinator uses random RPC (remote procedure call) ports. By default
the RPC service randomly selects port numbers above 1024 (note that port 135 is
fixed and is the RPC endpoint port).
To alter the registry entries, see MS Support document:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;250367
The keys are located at:
HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc
and include key items:
a. Ports
b. PortsInternetAvailable
c. UseInternetPorts
The document states that ports 5000 to 5020 are recommended, but can range from
1024 to 65535.
This is required on the DB server and Web server. Reboot is required.
Check 4 - Win 2003 only - Regression to Win 2000
Ensure checks 1 and 2 are complete before reviewing this scenario. Once done, run
through the following items as discussed on this support document:
http://support.microsoft.com/?kbid=555017
If you have success, add in/alter the following registry key, where 1 is ON:
HKLM\Software\Microsoft\MSDTC\FallbackToUnsecureRpcIfNecessary, DWORD, 0/1
Apply to all of the servers involved in the DTC conversation. You need to restart the
MSDTC service.
Check 5 - Win 2003 only - COM+ Default Component Security
New COM+ containers created in COM+ 1.5 (Windows 2003) will have the "enforce access
checks for this application" option enabled.
Uncheck this option if you are experiencing component access errors, or cannot
instantiate object errors on previously running DLLs. Upgraded operating systems and
their containers will not have this option checked.
Also refer to MS support article http://support.microsoft.com/?id=810153
Common Development/Developer Issues
I’m getting a TCP bind error on my SQL Servers Startup?
For developers using personal edition of SQL Server, they may experience this error:
2004-06-14 16:01:02.12 server SuperSocket Info: Bind failed on TCP port 1433.
Basically port 1433 is already in use and another must be selected, or use dynamic port
selection. We can try identifying the process using the port with the PortQry command
line utility available from Microsoft (or change your SQL listener port):
http://support.microsoft.com/default.aspx?scid=kb;EN-US;310099
Error 7405 : Heterogeneous queries
Here is a classic error scenario - I have added a SQL 2000 linked server on one of the
servers. When I try to write a SP that inserts data into the local server from the linked
server table it doesn’t allow me to compile the SP and gives the error below.
"Error 7405 : Heterogeneous queries require the ANSI_NULLS and
ANSI_WARNINGS options to be set for the connection. This ensures consistent
query symantics.Enable these options and then reissue your query."
To get around the problem you need to set those ANSI settings outside the text of the
procedure.
set ANSI_NULLS ON
set ANSI_WARNINGS ON
go
create procedure <my proc name here> as
<etc>
go
Linked server fails with enlist in transaction error?
When attempting a transaction over a linked server, namely between databases on
disparate domains, you may receive this:
Server: Msg 7391, Level 16, State 1, Line 1
The operation could not be performed because the OLE DB provider 'SQLOLEDB' was unable to begin a
distributed transaction.
[OLE/DB provider returned message: New transaction cannot enlist in the specified transaction
coordinator. ]
OLE DB error trace [OLE/DB Provider 'SQLOLEDB' ITransactionJoin::JoinTransaction returned
0x8004d00a].
The DBA should review the DTC trouble shooting section in this book (namely DTCPing).
If you are operating between disparate domains in which a trust or an indirect transitive
trust does not exist you may also experience this error; SQL queries will run fine over db
links.
For COM+ developers, you may find that moving to supported rather than required for
the transaction model resolves the problem, namely for COM+ components on a
Windows 2003 domain server which are accessing a Windows 2000 SQL Server instance.
Again, the DTC trouble shooting tips will assist. Also check the firewall carefully, and
consider the following registry entry:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\Internet]
"PortsInternetAvailable"="Y"
"UseInternetPorts"="Y"
"Ports"=hex(7):35,00,30,00,35,00,30,00,2d,00,35,00,31,00,30,00,30,00,00,00,00,\00
Update the hosts file and include the HOST/IP combination if netbios and/or DNS
resolution is a problem.
How do I list tables with the Identity Column property set?
Consider the following query:
select TABLE_NAME, COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS where
COLUMNPROPERTY(object_id(TABLE_NAME), COLUMN_NAME, 'IsIdentity') = 1
How do I reset Identity values?
Use the command DBCC CHECKIDENT to view current seeded values. This command
is also used to re-seed (see BOL), but it will NOT fill in partially missing values. For
example, it will not add 5 to this list 1,2,3,4,6,7… but will carry on from 8. Do note that
truncation of a table (TRUNCATE command) will reset the seed.
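For example (the table name is illustrative):

-- report the current identity value without correcting it
dbcc checkident ('mytable', noreseed)
-- re-seed so the next inserted row receives identity value 100
dbcc checkident ('mytable', reseed, 99)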
How do I check that my foreign key constraints are valid?
Use the command: DBCC CHECKCONSTRAINTS
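For example, to check a single (illustrative) table rather than the current database:

-- rows violating any enabled foreign key or check constraint are returned
dbcc checkconstraints ('mytable')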
I encrypted my stored proc and I don’t have the original code!
I have seen this happen a few times and it’s a nasty problem to solve. I have used the
script written by SecurityFocus and Jamie Gama. The idea was developed by Joseph
Gama. Search on www.planet-source-code.com for “Decrypt SQL Server 2000 Stored
Procedures, Views and Triggers (with examples)”, or locate it at:
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=505&lngWId=5
This original set of code has some SP size restrictions, so consider this instead:
http://www.planet-source-code.com/URLSEO/vb/scripts/ShowCode!asp/txtCodeId!728/lngWid!5/anyname.htm
How do I list all my procs and their parameters?
Use the command:
exec sp_sproc_columns.
To get a row set listing for a specific procedure consider this command:
exec sp_procedure_params_rowset dt_addtosourcecontrol
Or query the information schema: INFORMATION_SCHEMA.ROUTINES
Query Analyser queries time out?
Run query analyzer, select tools and options from the menu. Change query time-out via
the connections tab to zero.
“There is insufficient system memory to run this query”
This particular error can happen for a variety of reasons; one of which was fixed with
service pack 3 of SQL Server 2000. Refer to bug# 361298. This error was caused when
a query used multiple full outer joins followed by a possible 701 error in the SQL Server
log.
The other time I have experienced this problem is on machines with relatively little RAM
but a large number of running queries, or queries that are pulling large record sets from
numerous tables.
The problem with this error is that it may render the instance completely useless, not
accepting any further connections to the instance in a timely manner.
If you experience this problem:
a) Recheck your SQL Server memory allocation, namely the max server
memory value. I generally recommend that all DBAs physically set the
maximum allowable RAM. Be aware though that this value only limits the
size of the SQL Server buffer pool, it does not limit remaining unreserved
(private) memory (i.e. COMs, extended stored procs, MAPI etc).
b) Check carefully what tables you are PINNING; alter as required, especially
when in a development environment vs production.
c) Consider trapping large SQL statements via profiler and working with your
developers to reduce IO.
d) Consider freeing the procedure and buffer caches every few days in
development – it can cause some adverse performance issues, but at the
same time I have found it to resolve the problem.
e) Buy more RAM ! ☺ (to decrease buffer pool thrashing)
f) If you do have a lot of memory, did you forget the /3GB and /PAE settings
in your boot INI?
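For point d), here is a minimal sketch of the cache-freeing commands; use them
sparingly outside development:
-- flush compiled plans from the procedure cache
DBCC FREEPROCCACHE
-- write dirty pages to disk first, then drop the clean pages from the buffer pool
CHECKPOINT
DBCC DROPCLEANBUFFERS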
As a general reminder for your boot.ini:
4GB RAM       /3GB (AWE support is not used)
8GB RAM       /3GB /PAE
16GB RAM      /3GB /PAE
16GB+ RAM     /PAE
Memory greater than 4Gb - use the AWE enabled database option:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'awe enabled', 1
RECONFIGURE
Note that unless max server memory is specified with this option in SQL Server, the
instance will take all but 128Mb of RAM for itself.
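As a hedged example of capping the buffer pool when AWE is enabled (the 6144Mb
figure is an assumption for an 8Gb server; size it to suit your environment):
sp_configure 'max server memory', 6144
RECONFIGURE
GO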
My stored procedure has different execution plans?
There is a nasty “feature” with SQL Server related to parameter sniffing by the SQL
Server engine.
When you have a stored procedure with one or more parameters, the SQL engine will
generate a plan based on the incoming values and how they influence the execution
path; if the parameter value is a date for example, the optimizer may experience some
difficulty generating an optimal plan that will suffice for ALL the possible values passed
into this parameter. This can result in very poor performance, even when table statistics
are up to date.
To get around this:
a) Assign incoming parameters to local variables and use them, rather than
the parameters (see the sketch after this list).
b) Use sp_executesql to run the actual SQL statements.
c) Consider with recompile in the procedure definition (does not always help
though).
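A minimal sketch of workaround a); the procedure, table and column names here are
hypothetical:
CREATE PROC usp_get_orders @from_date datetime
AS
SET NOCOUNT ON
-- copy the parameter to a local variable so the optimizer cannot "sniff"
-- the incoming value at compile time
DECLARE @local_from_date datetime
SET @local_from_date = @from_date
SELECT order_id, order_date
FROM orders
WHERE order_date >= @local_from_date
GO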
A classic example I had was an infrequently run stored procedure. The stored proc took
30 seconds to run, but taking the embedded SQL out and running it separately in query
analyzer brought this down to 2 seconds. What was worse, the 30 second version was
doing 8 million logical reads versus 1000 for the raw SQL. Using a) resolved the
problem.
Using xp_enum_oledb_providers does not list all of them?
To get a list of OLE-DB providers on the server, run the following command:
exec master..xp_enum_oledb_providers
You can search on the parse name in the registry to locate the underlying DLL.
The list will not show providers that:
a. may not be able to partake in linked servers
b. have been superseded by a later installed version of the driver.
To get a definitive list, consider the VB Script below:
'The script writes all installed OLEDB providers.
Option Explicit
Dim OutText, S, Key
'Create a server object
Set S = CreateObject("RegEdit.Server")
'
'Optionally connect to another computer
S.Connect "muj"
OutText = OutText & "OLEDB providers installed on " & _
s.Name & ":" & vbCrLf
OutText = OutText & "************************************" & vbCrLf
For Each Key In S.GetKey("HKCR\CLSID").SubKeys
If Key.ExistsValue("OLEDB_SERVICES") Then
OutText = OutText & Key.Values("").Value & vbTab & _
" : " & Key.SubKeys("OLE DB Provider").Values("") & vbCrLf
End If
Next
Wscript.Echo OutText
This script was developed by Antonin Foller, © 1996 to 2004; find it at:
http://www.pstruh.cz/help/RegEdit/sa117.htm
There is also a bug in MDAC 2.7, “FIX: MSOLAP Providers Not Displayed in SQL Server
Enterprise Manager After You Upgrade Data Access Components 2.7”, Q317059.
The columns in my Views are out of order/missing?
A problem with SQL Server is the updating of view metadata in relation to the underlying
schema. Only on the creation of the view (or subsequent alter view) command or
sp_refreshview will the view meta-data be altered.
If your view includes select * statements (which, as best practice, it never should!), this
can be a real problem when the underlying tables are altered. The problem shows its
ugly head as new columns go missing from your views and, worse still, columns suddenly
inherit another’s data (columns seem to have shifted), crashing your applications or
populating fields with incorrect values.
To get around this issue – always run sp_refreshview for all views (or selected ones if
you are confident with the impact of the change) when a new table column and/or other
view column (if embedding calls to other views) has been created or its name altered.
The routine is very quick so I would not trouble yourself with analyzing the dependencies
and trying to formulate some fancy query logic to filter out specific views.
Here is a great example script from Dan Guzman MVP that I have adapted slightly.
DECLARE @RefreshViewStatement nvarchar(4000)
DECLARE RefreshViewStatements
CURSOR LOCAL FAST_FORWARD READ_ONLY FOR
SELECT
'EXEC sp_refreshview N''' +
QUOTENAME(TABLE_SCHEMA) +
N'.' +
QUOTENAME(TABLE_NAME) +
''''
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'VIEW' AND
OBJECTPROPERTY(
OBJECT_ID(QUOTENAME(TABLE_SCHEMA) +
N'.' +
QUOTENAME(TABLE_NAME)),
'IsMsShipped') = 0
OPEN RefreshViewStatements
FETCH NEXT FROM RefreshViewStatements INTO @RefreshViewStatement
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC(@RefreshViewStatement)
FETCH NEXT FROM RefreshViewStatements INTO @RefreshViewStatement
END
CLOSE RefreshViewStatements
DEALLOCATE RefreshViewStatements
GO
The refresh view procedure will stop when a view is invalid (i.e. columns are missing in
the underlying tables and/or referenced views have been removed, causing it to become
invalid).
PRINT statement doesn’t show results until the end?
The PRINT statement works at the TDS level. Within a stored procedure run via query
analyzer, you may see the results of your statements if the buffer fills; otherwise you will
not see the flush until the completion of the statement.
As an alternative, use RAISERROR. The results will show up in the output screen of
query analyser immediately, rather than waiting for the buffer to fill or dumping the
output to you on error or at the end of the routine.
RAISERROR('A debug or timing message here', 10, 1) WITH NOWAIT
This does not affect @@ERROR.
PRINT can result in Error Number 3001 in ADO
If you are executing T-SQL stored procedures then double check that you have removed
all PRINT statements (typically used when debugging your code).
Timeout Issues
The application development DBA needs a good understanding of the overarching
application architecture and subsequent technologies (ADO/OLE-DB, COM+, MSMQ, IIS,
and ASP etc) to more proactively debug and track down database performance problems.
A good place to start is the common timeout errors. This section will provide a brief overview
of where to look and how to set the values.
ADO
Within ADO, the developer can set:
a) connection timeout (default 15 seconds)
a. if the connection cannot be established within the timeframe specified
b) command timeout (default 30 seconds)
a. cancellation of the executing command for the connection if it does not
respond within the specified time.
These properties also support a value of zero, representing an indefinite wait.
Here is some example code:
Dim MyConnection as ADODB.Connection
Set MyConnection = New ADODB.Connection
MyConnection.ConnectionTimeout = 30
MyConnection.Open
- and -
Set MyConnection = New ADODB.Connection
<<set strMyConn>>
MyConnection.Open strMyConn
Set myCommand = New ADODB.Command
Set myCommand.ActiveConnection = MyConnection
myCommand.CommandTimeout = 15
Take care with command timeouts. They are described by Microsoft:
http://support.microsoft.com/default.aspx?scid=KB;en-us;q188858
COM+
Any component service DLLs partaking in COM+ transactions are exposed to two timeout
values:
a) global transaction timeout (default 60 seconds)
b) class level transaction timeout (default 60 seconds)
Open component services, select properties of “My Computer”. The screen shots shown
below may differ based on the server and of course the MSDTC version being run. The
options tab allows us to set this global value for all registered COM+ DLLs:
The next layer down is not at the individual COM component, but at the class level within
each component. This cannot be programmatically altered. Once installed, drill through
to the specific COM and select properties for the class:
Again this option is only available for those classes partaking in COM+ transactions. The
GUI also allows transaction support properties to be altered, but this can be fully
controlled (and should be) by the developer.
OLEDB Provider Pooling Timeouts
This is somewhat off track for the ebook, but the DBA should know that control over the
pooling of unused open sessions can be controlled at the OLEDB provider level. This is
applicable from MDAC v2.1 onwards. Examples of this can be found at:
http://support.microsoft.com/default.aspx?scid=kb;en-us;237977
IIS
Within IIS we can set session timeout values:
a) globally for the website
b) for each virtual directory for a website
c) in a virtual directory global.asa
d) at the individual ASP page
For the website, open IIS, right click for properties and select the home directory tab.
From here click the configuration button. This is very similar for each individual virtual
directory:
Where session state is applicable the default value is 60 minutes, and ASP script timeout
default is 90 seconds. The values represent:
a) session timeout
a. user sessions - the IIS server can create per user session objects and can
be used to maintain basic state information for the “session” via user
defined session variables. The session object is created on storing or
accessing a session object in ASP code, which will fire the session_onstart
event
i. Timeout controlled programmatically via Session.Timeout = [n-minutes]
ii. User page refresh will reset the timeout period
iii. Get the unique session ID via Session.SessionID
iv. Control locality that affects date displays etc via Session.LCID = [value]
v. Create a session variable and show contents:
Session("myData") = "some value here"
response.write Session("myData")
b. application sessions – similar to user sessions but are accessible to all user
sessions for the virtual directory. They cannot timeout as such but will
reset on IIS being re-started. Their values are typically initialised via the
global.asa
i. initialised on first page served by IIS
ii. use: Application("myData") = "some value here"
b) script timeout
a. limit for page execution time
b. default of 90 seconds
c. also set via the following for a specific ASP page
i. <%@ LANGUAGE="VBSCRIPT"%>
<% Server.ScriptTimeout = 1800 %>
Be aware that IIS has a restricted maximum value for session timeout, that being 24hrs
(1440 minutes). In most cases, developers will proactively use a single point of
reference for application wide variables, session and script timeouts.
SQL Server
For SQL Server we have:
a) LOCK_TIMEOUT
a. Will timeout the executing command after N milliseconds if it is waiting for
locks.
b. Typically called at the top of stored procedures (along with set nocount)
c. Is no silver bullet for solving deadlock issues
d. Check value via select @@LOCK_TIMEOUT
e. Over linked servers, test carefully. You may find that a 4-part naming
convention does not work. Worse still setting the value before using
OPENQUERY may also not work. If you experience this problem, try this
syntax:
select *
from openquery([myremoteserver],
'set lock_timeout 1000 select col1 from myremotetable')
b) ‘remote login timeout’
a. linked server login timeout.
b. OLE-DB providers, default is 20 seconds
c. exec sp_configure N'remote login timeout (s)', 1000
c) ‘remote query timeout’
a. linked server query timeout.
b. default 600 seconds (ten minutes)
c. exec sp_configure N'remote query timeout (s)', 1000
d) 'query wait'
a. default is -1 (25 times the estimated query cost); the value is in seconds
b. the wait occurs when resources are not available and the process has to be
queued
c. if used incorrectly, can hide other errors related to deadlocking
d. will not stop/cancel blocking issues
e. set at instance level only
f. don't use in an attempt to stop a query after N seconds; it is resource
related only.
g. exec sp_configure 'query wait', 5
e) 'query governor cost limit'
a. default zero; the value is in seconds, an upper time for DML to run
b. execute/runtime, not parse time
c. is an estimated figure by the optimizer
d. globally for the instance
i. sp_configure 'query governor cost limit', 1
e. manually set per connection
i. SET QUERY_GOVERNOR_COST_LIMIT 1
Remember that deadlocks are not the same as LOCK_TIMEOUT issues. The DBA should
have a careful look at the DTS timeout options; select properties at a variety of levels
and active-x objects to determine where values can be set.
Sample Error Messages
Here is a table of sample error messages propagated from a basic web based application
to the end user. Be careful, as the actual error message can depend on many factors
such as the network providers being used and much more.
ASP Script Timeout (website or the virtual directory, check both carefully when debugging)

COM+ Transaction Timeout or ADO Connection Timeout

ADO Command Timeout
Runtime error -2147217871 (80040e31)
[Microsoft][ODBC SQL Server Driver]Timeout expired

SQL – Query Governor cost limit reached
Server: Msg 8649, Level 17, State 1, Line 1
The query has been cancelled because the estimated cost of this query (7) exceeds the
configured threshold of 1. Contact the system administrator.

SQL – Lock Timeout
Server: Msg 1222, Level 16, State 54, Line 2
Lock request time out period exceeded.

SQL – Query Wait
Server: Msg 1204, Level 19, State 1, Line 1
The SQL Server cannot obtain a LOCK resource at this time. Rerun your statement when
there are fewer active users or ask the system administrator to check the SQL Server
lock and memory configuration.
Is the timeout order Important?
The order really depends on the developer and their specific requirements. Even so, the
DBA and developer should have a good understanding of the settings being used to
better assist in tracking and resolving timeout issues.
In most cases, the developer is inclined to start debugging from the DBMS and work up
(not necessarily for session timeouts though). Doing so allows the developer to better
understand errors related to DBMS connectivity and command execution versus higher
level problems in the business and interface layer tiers.
Care must be taken with long running batch jobs or complex SQL. It is not unusual for
developers to fix class level transaction timeouts in COM+ and associated IIS level
timeout values. Unless you have done a lot of testing, command timeouts are difficult to
manage due to spikes in user activity.
DBCC Commands
What is – dbcc dbcontrol() ?
This DBCC option is the same as sp_dboption; it takes two parameters, setting the
database online or offline:
USE master
GO
DBCC DBCONTROL (pubs,offline)
GO
--View status of database
SELECT CASE DATABASEPROPERTY('pubs','IsOffline')
WHEN 1 THEN 'Offline'
ELSE 'Online'
END AS 'Status'
This command does not work under service pack three (a) of SQL Server 2000.
What is - dbcc rebuild_log() ?
This is an undocumented DBCC command. It will basically drop all virtual log extents and
create a new log file. The command requires the database to be in bypass recovery
mode before it can be run:
server: Msg 5023, Level 16, State 2, Line 1
Database must be put in bypass recovery mode to rebuild the log.
Another name for bypass recovery mode is Emergency Mode, therefore:
use master
go
sp_configure 'allow updates', 1
reconfigure with override
go
update sysdatabases set status = 32768 where name = '<db-name-here>'
We now stop the SQL Server instance and delete the database log file before re-starting the
instance, otherwise you get this error:
Server: Msg 5025, Level 16, State 1, Line 1
The file 'F:\Program Files\Microsoft SQL
Server\MSSQL$SS2KTESTC1\data\cktest_Log.LDF' already exists. It should be
renamed or deleted so that a new log file can be created.
With the instance restarted, run the command:
dbcc rebuild_log(cktest)
-- cktest is the name of the database
Warning: The log for database 'cktest' has been rebuilt. Transactional consistency
has been lost. DBCC CHECKDB should be run to validate physical consistency.
Database options will have to be reset, and extra log files may need to be
deleted.
If you need to specify a different filename use:
dbcc traceon (3604)
dbcc rebuild_log(cktest, 'c:\dblog\cktest_log.ldf')
If successful, the db status is dbo use only. The file will be created in the default data
directory; if the previous database had multiple log files, they will be lost (but not
physically deleted from the file-system – do this manually). Only a single log file is
created, 1Mb in size.
Use EM or the alter database command to restore its status. Use the sp_detach_db and
sp_attach_db commands (or EM) if you need to move the database log file to another
location.
Troubleshooting DTS and SQL Agent Issues
Naming Standards
A quick note. The DBA should follow a strict naming standard for DTS packages and stick
with it. The convention used should make the purpose and context of the package easy
to identify. For example:
<application/module> - <function or task> [ - <part N or app versioning, sub-task>]
eg:
EIS – Load HR Data – v1.223.1
DBA – Log Shipping - Init
I’m getting a “deferred prepare error” ?
A SQL comment embedded in a stored procedure or SQL statement that is edited within
the DTS designer and then run can cause this error. I experienced this error on SQL
Server 2000 SP2.
Debugging SQLAgent Startup via Command Line
On http://www.sqlservercentral.com, Andy Warren discussed a problem with SQL Agent
and his success with Microsoft support in resolving the startup issues with the agent. In
the article, he mentions some command line options well worth knowing to assist you in
debugging the agent.
For example:
cd c:\Program Files\Microsoft SQL Server\MSSQL$CKDB\Binn
sqlagent.exe -i ckdb -c -v > c:\logfile.txt
where:
-i [instancename], my named instance. If there is a problem here you will be told
about it
2003-07-09 15:45:28 - ! [246] Startup error: Unable to read SQLServerAgent
registry settings (from SOFTWARE\Microsoft\Microsoft SQL
Server\SQLAGENT$CKDB\SQLServerAgent)
in the error above, I used sqlagent$ckdb, rather than the straight CKDB.
-c, command line startup
-v, verbose error mode
> c:\filename.txt, send errors to this file
So on startup I get this:
Microsoft (R) SQLServerAgent 8.00.760
Copyright (C) Microsoft Corporation, 1995 - 1999.
2003-07-09 15:45:59 - ? [094] SQLServerAgent started from command line
2003-07-09 15:46:02 - ? [100] Microsoft SQLServerAgent version 8.00.760 (x86 unicode retail build) :
Process ID
2003-07-09 15:46:02 - ? [100] Microsoft SQLServerAgent version 8.00.760 (x86 unicode retail build) :
Process ID 840
2003-07-09 15:46:02 - ? [101] SQL Server PC-124405\CKDB version 8.00.760 (0 connection limit)
2003-07-09 15:46:02 - ? [102] SQL Server ODBC driver version 3.81.9031
2003-07-09 15:46:02 - ? [103] NetLib being used by driver is DBMSLPCN.DLL; Local host server is PC-124405\CKDB
2003-07-09 15:46:02 - ? [310] 1 processor(s) and 256 MB RAM detected
2003-07-09 15:46:02 - ? [339] Local computer is PC-124405 running Windows NT 5.0 (2195) Service
Pack 3
2003-07-09 15:46:02 - + [260] Unable to start mail session (reason: No mail profile defined)
2003-07-09 15:46:02 - + [396] An idle CPU condition has not been defined - OnIdle job schedules will
have no effect
CTRL-C to stop, Y response shuts down immediately, no enter required.
Don’t forget your package history!
Every time you save a DTS package to SQL Server, a version or copy is created in the
msdb..sysdtspackages system table (packagedata column) in the MSDB database:
Deleting the only version of a package will of course remove the entire package and
subsequent scheduled jobs will not run. To remove a version use the command below
(or EM):
exec msdb..sp_drop_dtspackage NULL, NULL, 'CA49CFB3-60D1-4084-ABCF-32BD7F93E766'
where the GUID string is from the versionid column in msdb..sysdtspackages. Packages
saved to disk or file will not carry with them the SQL Server versioning.
Where are my packages stored in SQL Server?
A SQL Server saved package will be stored in the sysdtspackages table within the MSDB
database. The key here is the packagedata column which is a binary (image) blob of the
package itself:
The DBA can use the textcopy.exe utility to dump the contents of a package (note that
the table holds multiple versions of a single package) into a .DTS file. Once dumped,
you can open the file via Enterprise Manager (right click properties of data transformation
services).
Here is an example of textcopy.exe:
textcopy /S MYSERVER\CKTEST /U sa /P /D msdb /T sysdtspackages /C
packagedata /W "where name='TEST'" /F c:\dtsfile.dts /O
DTS Package runtime logging
To enable logging within your package, right click on any white space within the design
screen for the package, and select package properties:
Under the logging tab, we can log runtime data of the package to:
a) SQL Server – to your current or another SQL Server instance
b) file on disk (at the server)
Option a) will call two MSDB stored procedures:
exec msdb..sp_log_dtspackage_begin
and
exec msdb..sp_log_dtspackage_end
which write to the table sysdtspackagelog. If you are suspicious of MSDB growth, check
this table first before moving onto the sysdtspackages table and reviewing the history of
packages.
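As a quick, hedged check on MSDB growth (simple row counts only):
select count(*) from msdb..sysdtspackagelog
select count(*) from msdb..sysdtspackages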
Option b) will generate something like:
The execution of the following DTS Package succeeded:
Package Name: (null)
Package Description: (null)
Package ID: {1F84517D-1D5C-4AB0-AE0C-D7EC364F5052}
Package Version: {1F84517D-1D5C-4AB0-AE0C-D7EC364F5052}
Package Execution Lineage: {FA5B1E8F-A786-4C24-9B91-232D1D155D7A}
Executed On: NEWCOMP
Executed By: admin
Execution Started: 7/03/2004 9:29:01 AM
Execution Completed: 7/03/2004 9:29:01 AM
Total Execution Time: 0 seconds
Package Steps execution information:
There is no API call exposed (or option within the DTS designer) to overwrite the file, so it
will grow and grow with each execution of the package. One possible solution is to
leverage the filesystemobject (fso) API calls and use a small active-x script as the first
step in your package to clear this file (31).
I get an “invalid class string” or “parameter is not correct”
With each release of SQL Server comes new DTS controls, such as transfer-database or
logins for example. If you attempt to open a package that utilizes these objects and the
corresponding DLL (active-x) is not part of your clients (or servers) EM, then you may get
this error. Be very careful in this case when custom objects are used within DTS
packages.
So what can you do?
First of all, it’s technically not a SQL Server problem, it’s yours, so don’t plan on ringing up
Microsoft support and asking for a free call (not that I would of course). A classic case in
which this error can occur is if you have old SQL Server 7 packages that used the OLAP
Manager add-in to process OLAP cubes, found at:
http://www.microsoft.com/sql/downloads/dtskit.asp?SD=GN&LN=EN-AU&gssnb=1
As a fix, download the package and install it on the server and your client; the package will
then open without error as it will find the class-id (pointing to the underlying DLL).
I lost the DTS package password
If the package has been scheduled, the DTSRUN command will use either an
encrypted or unencrypted runtime string. This may use the user password of the
package, not the owner password (which can also view the package definition). Be aware
that once a user password is set, the owner password must also be specified.
The password is also stored in msdb..sysdtspackages. For meta-data-services packages
the password is stored in msdb..rtbldmbprops (col11120).
I believe that a decryption program is available; consider MS support.
I lost a DTS package - can I recover it?
Consider the following in order:
a) Can you retrieve an older version – assuming you made an edit error and need to
rollback
b) Consider restoring your MSDB database from a recent backup
Access denied error on running scheduled job
Check the following:
a) If the package is accessing UNC paths, specific files or directories, ensure the SQL
Agent service account user has the correct privileges.
b) With trial and error, and the inherent versioning of DTS packages, break down the
package to determine the failing task
c) Ensure the service account running SQL Agent has the appropriate privilege
d) An alternative for b), consider running the SQL Agent service account under
localsystem; did you remove builtin/administrator account from SQL?
Changing DTS package ownership
For non-sysadmin users, it is not unusual to receive the message “only the
owner of DTS Package ‘MyDTSPackage’ or a member of the sysadmin role may create
new versions of it”. This comes from the call to msdb..sp_add_dtspackage.
If multiple people need to work on the package, and most are not sysadmin users (we
hope not), then we have a problem of package ownership.
To resolve this we can use the undocumented command:
exec sp_reassign_dtspackageowner 'package-name', 'package-id', 'newloginname'
A nice wrapper script can be found at : http://www.sqldts.com/default.aspx?271
I have scheduled a package, if I alter it do I re-create the job?
No. The scheduled job is a 1:1 map to the package and the most recently saved version
of the package. You can alter the package, save it, and the job will run this version
without the need to re-create the job.
xpsql.cpp: Error 87 from GetProxyAccount on line 604
Only system administrators (sysadmin users) can run xp_cmdshell commands. Even
if you have granted a non-sysadmin user execute privilege, you will still receive
this error. To resolve it, we need to set the proxy account under which xp_cmdshell
will run on the database server.
To set, right click and select properties for SQL Agent. Under the Job System tab, the
bottom group allows us to specify/reset the proxy account:
Uncheck the box, and click the reset proxy account button:
Alter the account to a domain user with the appropriately restricted privileges to the
server, press OK. This GUI is equivalent to the command xp_sqlagent_proxy_account.
Stop and start SQL Agent for the proxy user to take effect.
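The underlying command is undocumented; as a hedged sketch (the parameter order is
as reported in Microsoft KB articles – verify on your build, and the domain, user and
password values below are assumptions):
-- set the proxy account
exec master.dbo.xp_sqlagent_proxy_account N'SET', N'MYDOMAIN', N'ProxyUser', N'password'
-- view the current proxy account
exec master.dbo.xp_sqlagent_proxy_account N'GET'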
DTSRUN and Encrypted package call
When you schedule a package to run via Enterprise Manager, the step properties will
show you something like this:
DTSRun
/~Z0x6D75AA27E747EB79AC882A470A386ACEB675DF1E7CB370A93244AA80916653FC9F13B50CA6F
6743BB5D6C31862B66195B63F4EBEE17F6D4E824F6C4AD4EADD8C323C89F3D976BC15152730A8AF
5DB536B84A75D03613D6E9AF2DD5BC309EB9621F56AF
This is equivalent to this:
DTSRun /S MySQLServer /E /N "MyDTSPackage"
Review the parameter set carefully to determine if further runtime items are required.
Generally, this makes for a much more readable job to package relationship.
IMPORTANT – If you do alter the package, any subsequent save will not
prevent the job from picking up and running this latest version.
TEMPDB in RAM – Instance fails to start
This option is not supported in SQL Server 7 and 2000. If you do experience the error in
earlier versions, attempt the following:
a) Stop all SQL Server services, exit from Enterprise Manager and Query Analyser.
b) Go to the \binn directory for your instance and at the command line type in:
sqlservr -c -f
c) The above command will start the Instance in single user mode with minimal
configuration. You will see the SQL Server error log output to this command
window, wait until SQL Server completes recovery and the instance has started.
d) Run Query Analyser (isql) and connect with sysadmin privileges (aka the SA
account)
e) Run:
sp_configure tempdb, 0
go
reconfigure
go
f) Go back to the command line; type shutdown OR CTRL-D and you will be asked to
shutdown the instance.
g) Re-start the instance as per normal.
h) Check TEMPDB status and more importantly, validate that its size is adequate for
your particular instance.
Restore a single table from a File Group
A table was dropped. You know the filegroup it comes from. The application can still run,
albeit with some errors.
This scenario can be easily adapted to include any DBMS object.
Pre-Recovery Steps
The DR plan is initiated. The team meets to determine:
a) cause of the problem (if possible at this early stage)
b) impact of the tables removal/corruption/missing data
c) time to repair vs full database restore to a point in time, or, is it a
reporting/generated table that will simply be recreated later or can be rebuilt
overnight?
d) Amount of DML activity occurring within the DBMS – hot tables?
e) time the error was noticed or occurred at (to facilitate accurate point in time
recovery)
f) a copy of the database resides elsewhere? (to assist with scripting if required)
Time is of the essence with this recovery scenario. The longer you delay in making a
decision the more work application users may need to repeat if you decide on a complete
point in time recovery before the error occurred.
To ensure maximum availability a decision is made to restore this single table and its
associated indexes, constraints, triggers.
Recovery Steps
We have the following backups:
FULL   6am    fgtest_full.bak
LOG    9am    fgtest_log1.bak
LOG    10am   fgtest_log2.bak
LOG    11am   fgtest_log3.bak
missing table detected around this point
All files have been checked, uncompressed and are on disk ready for use. To confirm the
database structure you can run:
restore filelistonly from disk='c:\fgtest_full.bak' with nounload
Logical F.Name    Physical Filename/Path                                                          Type  FileGrp
fgtest_primary    C:\Program Files\Microsoft SQL Server\MSSQL$CKDB\data\fgtest_primary_Data.MDF   D     PRIMARY
fgtest_data01     C:\Program Files\Microsoft SQL Server\MSSQL$CKDB\data\fgtest_data01_Data.NDF    D     DATA01
fgtest_data02     C:\Program Files\Microsoft SQL Server\MSSQL$CKDB\data\fgtest_data02_Data.NDF    D     DATA02
fgtest_Log        C:\Program Files\Microsoft SQL Server\MSSQL$CKDB\data\fgtest_Log.LDF            L     NULL
We will restore the database to the same instance but, of course, with a different
database name. Later we will discuss the issues of recovery over the “live” database. The
partial clause is required to start the recovery; I have had limited success via EM so I
would highly recommend doing this via query analyser.
RESTORE DATABASE [fgtest_temp]
FILE = N'fgtest_data02'
FROM DISK = N'C:\fgtest_full.bak'
WITH PARTIAL, RESTRICTED_USER, REPLACE,
MOVE 'fgtest_primary' TO 'c:\temp1.ndf',
MOVE 'fgtest_data02' TO 'c:\temp2.ndf',
MOVE 'fgtest_log' TO 'c:\temp3.ndf',
STANDBY = 'c:\undo.ldf'
-- norecovery option is not applicable
To complete the partial recovery of the database and fg_data02 filegroup via the creation
of a new database, you of course must include:
a) primary file group – system tables/meta-data for the database
b) log files – to facilitate the recovery process
The primary file group files are also recovered with our specific file group. This is
mandatory, as the dictionary information for the database and its objects exists only in
the sys tables in the primary file group. Take this into consideration if the primary file
group is large.
To confirm the restore and recovery of fgtest_data02 and primary filegroups, run the
command:
select * from fgtest_temp..sysfiles
The file not restored still refers to the existing database; notice its zero size and growth, and its status.
If you attempt to query an object that resides in fgtest_data01 you will get this message:
Server: Msg 8653, Level 16, State 1, Line 1
Warning: The query processor is unable to produce a plan because the table 'aa' is
marked OFFLINE.
The STANDBY clause allows us to inspect the database after each restore, providing us
with the opportunity to check the status of our missing table. The standby clause of
course is used for log shipping scenarios, creating a warm standby database for high
availability.
RESTORE LOG fgtest_temp
FROM DISK='c:\fgtest_log1.bak'
WITH STANDBY = 'c:\undo.ldf', RESTRICTED_USER
select * from fgtest_temp..bb
-- exists.
RESTORE LOG fgtest_temp
FROM DISK='c:\fgtest_log2.bak'
WITH STANDBY = 'c:\undo.ldf', RESTRICTED_USER
select * from fgtest_temp..bb
-- exists.
RESTORE LOG fgtest_temp
FROM DISK='c:\fgtest_log3.bak'
WITH STANDBY = 'c:\undo.ldf', RESTRICTED_USER
select * from fgtest_temp..bb
-- doesnt exist!
Get the end time of this log file and, between this time and that of log2, try the STOPAT
clause to determine how close you can get to the point the drop was issued, OR use a
3rd party log reader application to assist you in locating the STOPAT time.
-- run this first:
RESTORE LOG fgtest_temp
FROM DISK='c:\fgtest_log3.bak'
WITH STANDBY = 'c:\undo.ldf', RESTRICTED_USER,
STOPAT = 'May 30, 2003 02:05:12 PM'
select * from fgtest_temp..bb
-- doesnt exist!
-- then run:
RESTORE LOG fgtest_temp
FROM DISK='c:\fgtest_log3.bak'
WITH STANDBY = 'c:\undo.ldf', RESTRICTED_USER,
STOPAT = 'May 30, 2003 02:05:05 PM'
select * from fgtest_temp..bb
-- exists!
The current live database is in a state where a range of constraints (foreign keys),
triggers, views and stored procedures may be invalid due to the missing table. This
operation is somewhat tricky and the DBA needs to be confident with the underlying
relationships the object has to other objects, let alone its own properties such as triggers
etc (object dependent of course). There are few sites where the database does not
already exist in some known state, such as a test or development database server;
therefore, you should have a good idea as to the additional restore steps to take.
In this particular example, to restore the table, its constraints, and the table and column
descriptions, we need to do the following. Remember that, as we use the standby file, we
can utilise EM to open the database during each recovery step and script out the
necessary code ready to run on the “live” production database.
REMEMBER – You cannot create diagrams on a read only database, which is the
mode our standby database is in during the recovery process.
a) Go to EM and script the table, with all options selected:
b) From the script
a. run the create table statement on the live database
b. copy the contents of the fgtest_temp..bb (standby database) to fgtest..bb
(live database). The data is now restored
c. Re-create triggers
d. for all views on the live production database run sp_refreshview. No need
to take action on stored procedures unless some were manually altered to
cater for the temporary loss of the table.
e. Re-create table constraints (primary and foreign keys) on fgtest..bb from
the script, include defaults and check constraints.
f. Re-create other non-keyed indexes from the script
c) Although fine, we might be missing foreign keys from other objects to the newly
restored table. Remember that SQL Server will not allow you to drop a table if it’s
referenced by foreign keys (with referring data). Even so, you need to double
check and re-create any missing keys.
EM’s display dependencies option runs the master..sp_MSdependencies stored
procedure; or use query analyser:
exec sp_MSdependencies '?'
For all tables shown, select them in the standby database and script them with only
the foreign/primary key option selected. Before running the script, re-check the
production database. Don’t include the drop statements in the script; always run it
without them, otherwise you will end up having a very bad day with your recovery.
On completion, drop your partially recovered database and notify all users via the ER
Team.
Can I do a partial restore on another server and still get the same result?
It’s important to remember that all meta-data related to the database is restored with the
primary file-group. SQL Server EM will allow you to script any portion of the database,
but you will not be able to view table data for those file groups not physically restored via
the partial restore option.
Can I do a partial restore over the live database instead?
Of course - but typically as a last resort. The key issues here are:
a) restore without standby - is restricted to recovery commands only
b) restore with standby – database will be in read-only mode between restores
c) primary file group is part of the recovery
So in general, the database will be unavailable to end-users throughout the restore. This
tends to defeat the objective of ensuring as little downtime to the end-user whilst we
recover single objects.
If you decide to continue with this option, remember what is happening with the partial
restore. In this example, only the fgtest_data02 filegroup is being partially restored, the
primary and other user defined file-groups remain at the current LSN. If we issue the
recover command over fgtest_data02 then the whole database goes into recovery mode
and its objects (tables, views etc) are inaccessible. Therefore, we use the standby
option after killing existing database user sessions:
IMPORTANT – Do a full backup before attempting this operation.
RESTORE DATABASE [fgtest]
FILE = N'fgtest_data02'
FROM DISK = N'C:\fgtest_full.bak'
WITH PARTIAL, RESTRICTED_USER,
STANDBY = 'c:\undo.ldf'
-- over the live prod database !
This example doesn’t show it, but the database objects we needed to restore were not part
of the initial full backup. The above command therefore resulted in an empty database with
no user objects; had they been included, the objects would have been there. Either way,
this is what we would expect to ensure consistency with the data dictionary in the
partially recovered file-group.
Even though fgtest_data01 is not part of the partial recovery, its tables (eg. [aa]) are still
not accessible, giving us the error:
Server: Msg 8653, Level 16, State 1, Line 1
Warning: The query processor is unable to produce a plan because the table 'fgtest..aa' is
marked OFFLINE.
To complete the recovery, the DBA needs to restore all database files for the database to
a single known point in time. The DBA must decide when this point will be. As
mentioned earlier, this scenario is a bad one and should not be attempted unless you
have very good reason to do so.
Restore over database in a loading status?
Recently I came across an interesting issue with restoring over a database in a state of
"loading". Any attempt to restore over the file resulted in strange IO errors. It should be
noted that removing a loading database within EM will not remove the physical data files.
If you get this issue and it’s applicable, simply remove the files completely and the
restore will succeed, unless of course you really do have a physical disk problem.
Moving your system databases
Moving MASTER and Error Logs
The instance start-up parameters in the registry can be altered, effectively allowing you
to move the master database files to another location.
Alter the registry entries, stop the instance services, including SQL Agent, move the
master database files to the new location and re-start the instance. These settings can
be changed within Enterprise Manager by selecting properties of the instance and altering
the start-up parameters. This will call an extended stored procedure to alter the registry
settings above.
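As a hedged reminder, the three start-up parameters involved are shown below; the
paths are assumptions – substitute your new locations:
-dE:\SQLData\master.mdf   (master data file)
-lE:\SQLData\mastlog.ldf  (master log file)
-eE:\SQLLogs\ERRORLOG     (error log location)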
Moving MSDB and MODEL
The DBA can move the MSDB and MODEL databases via:
a) Full backup then restore to a new destination
b) Use the sp_attach_db and sp_detach_db commands
The backup and restore option is more appropriate; the instance will require less
downtime and it is a safer option overall.
To move these databases via the sp_attach_db and sp_detach_db commands, a little
more work needs to be done.
1. Stop the instance, then open the service control window for the instance, and
add the start parameter -T3608
2. Start the instance
3. Run query analyser, login in with sysadmin privileges
4. Run the following command, writing down the name, fileid and filename
displayed
use msdb
go
sp_helpfile
go
5. De-attach the database from the instance with the command
use master
go
sp_detach_db 'msdb'
go
6. Refresh enterprise manager, you will notice that MSDB has now disappeared.
7. Copy or Move the database files listed in 4 to their new destination
8. Run a command similar to the following to re-attach the MSDB database files
use master
go
sp_attach_db 'msdb','C:\msdbdata.mdf','c:\msdblog.ldf'
go
9. Go back to the service control window. Shutdown services, clear the trace flag
and re-start the service.
10. Refresh or re-start Enterprise Manager.
11. Re-run step 4 to verify the new file location
Repeat the process above for the MODEL database.
WARNING - if doing both databases at the same time attach the MODEL
database before MSDB. If not you can come across a situation where MSDB takes
on the role of MODEL which is far from ideal (57).
Moving TEMPDB
The TEMPDB database is akin to the temporary tablespace option within Oracle.
This database is used for sorting and temporary tables. It is common practice,
where possible, to move this database to its own RAID 1 or RAID 10 set. This needs
to be carefully evaluated to determine if the use of the database and the set of disks
it is residing on is a bottleneck. To move the database files for this system
database:
1. Run query analyzer. Check existing location of the tempdb database files
use tempdb
go
sp_helpfile
go
2. Get the destination directory ready and document the results above. Recheck the locations.
3. Issue the alter statement to logically move the files
alter database tempdb
modify file (name = tempdev, filename = 'c:\dbdata\sys\tempdb.dbf')
alter database tempdb
modify file (name = templog, filename = 'd:\dblog\sys\templog.ldf')
Restart the instance for the files to be re-created at the new location.
Consider trace flag 3608 or 3609 (skips tempdb creation) if you have issues with the
new destination or with the model database (from which it’s created).
You can also resize the tempdb database via the SIZE option in the alter database
statement.
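A hedged example of the resize (the 500Mb figure is an assumption; size to suit):
alter database tempdb
modify file (name = tempdev, size = 500MB)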
Moving User Databases
The DBA can move or copy databases via the sp_attach and sp_deattach commands.
This works on all database files, not selected file-groups. We have a variety of options:
a) Shutdown the instance, copy the database files, and re-attach at the destination server
b) Offline the database, copy the files, and re-attach at the destination server
c) De-attach the database, copy the files, and re-attach at the destination server
d) Run a split mirror, offline or read-only the database, break the mirror and use
the files from the mirrored disk
Some of these methods are described below.
Remember - copying a database will not take the logins with it, as this
information is stored in the master database.
Remember – If you do not have database backups but still have all the database
files, then re-attaching the database will be your last remaining hope of recovering
your database.
Shutdown instance method
Simply shutdown the SQL Server instance, taking care when running multiple instances
on the same server. When down, copy the database files to the other server (or
copy/rename/move if the database will be attached to the same server). As the database
was cleanly shutdown there will be no issues with re-attaching, so long as the copy did not
fail unexpectedly. If the instance did fail unexpectedly and you have no backups,
re-attaching may still be possible (with the added risk of data corruption).
When using this method, the database will of course remain on the source server with no
change whatsoever to the source database.
To shutdown the instance use one of the following:
a) use NET STOP service command from the operating system
b) use Enterprise Manager and their GUI option
c) issue the SHUTDOWN transact-SQL command
Offline a Database method
Once the database is “offline”, you can copy its database files to a new server and
re-attach. Use this method when shutting down the SQL Server instance is not
desirable and you want to retain the database on the source server.
Reminder – User sessions will not be disconnected; this is applicable for
sp_dboption and the ALTER database command.
To take the instance offline:
exec sp_dboption N'mydb', N'offline', N'true'
or
alter database [mydb] set offline with rollback after 60 seconds
or
alter database [mydb] set offline with rollback immediate
or
DBCC DBCONTROL (mydb,offline)
Using the alter database statement (SQL Server 2k and beyond) is the preferred method.
The rollback after statement will force currently executing statements to rollback after N
seconds. The default is to wait for all currently running transactions to complete and for
the sessions to be terminated. Use the rollback immediate clause to rollback transactions
immediately.
When running the command with users connected you will get something like:
sp_dboption (does not wait like the alter database command, see below)
Server: Msg 5070, Level 16, State 2, Line 1
Database state cannot be changed while other users are using the database 'mydb'
Server: Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.
sp_dboption command failed.
alter database [aa] set offline [any parameter combination]
This command will run forever, waiting for sessions to disconnect. When it completes you will get
something like:
Nonqualified transactions are being rolled back. Estimated rollback completion: 100%.
See the script http://www.sqlservercentral.com/scripts/scriptdetails.asp?scriptid=271 to
kill off all connections for a database.
To confirm the offline status:
SELECT DATABASEPROPERTY('mydb','IsOffline')   -- 1 if yes
or
SELECT DATABASEPROPERTYEX('mydb', 'Status')
Attempting to connect to the database will give you:
Server: Msg 942, Level 14, State 4, Line 1
Database 'mydb' cannot be opened because it is offline.
De-Attaching the database
If you want to completely remove the database from the master database and the
SQL Server instance, use the detach command rather than offlining the database.
When attempting to de-attach with Enterprise manager it will warn you when:
a) there are users connected to the database
b) replication is active
All user sessions must be disconnected and replication disabled before attempting the de-attachment.
The command is:
exec sp_detach_db N'mydb', N'false'
The second parameter denotes whether to run a statistics collection before de-attaching
the database. You must be a member of the sysadmin system role to issue this
command. Also note the error:
Server: Msg 7940, Level 16, State 1, Line 1
System databases master, model, msdb, and tempdb cannot be detached.
Funnily enough, statistics are still updated before receiving this error.
The de-attachment will remove the database from the sysdatabases table in the master
database. The sysxlogins table will retain references to the de-attached database,
therefore, you will need to either remove the login(s) or alter their default database
connections:
exec sp_defaultdb N'myuser', N'master'
-- change the default db for login myuser to the master database
exec sp_droplogin N'mytest'
Dropping logins is not straightforward. You need to either orphan the login from its
associated database user or drop the user, otherwise you will get this message:
Server: Msg 15175, Level 16, State 1, Procedure sp_droplogin, Line 93
Login 'myuser' is aliased or mapped to a user in one or more database(s). Drop the
user or alias before dropping the login.
You cannot remove users that own database objects. The standard drop user command
is:
use [mydb]
exec sp_dropuser N'myuser'
Checking Files before Attaching
You should note that you cannot attach more than 16 files for a single database. Before
attaching the database, issue the following commands over the primary file-group data
file to get a listing of files that make up the database structure:
--Is the file a primary file-group MDF file?
dbcc checkprimaryfile (N'E:\SQLServerData\MSSQL\Data\mydb_Data.MDF', 0)
--Get me the database name, version and collation
dbcc checkprimaryfile (N'E:\SQLServerData\MSSQL\Data\mydb_Data.MDF', 2)
--Get a list of all files associated with the database. (original name)
dbcc checkprimaryfile (N'E:\SQLServerData\MSSQL\Data\mydb_Data.MDF', 3)
Attaching the database
The sp_attach_db command allows you to re-attach your database onto the SQL Server
instance. For example:
exec sp_attach_db
N'mydb' ,
N'E:\SQLServerData\MSSQL\Data\new_aa_Data.MDF',
N'E:\SQLServerData\MSSQL\Data\new_aa_Log.LDF'
The syntax is simple enough: the first parameter is the name of the database to attach,
followed by its associated database files. The database being attached must not already
exist. You can also attach databases not previously de-attached, so long as the database
was closed and the files were copied successfully.
Server: Msg 1801, Level 16, State 3, Line 1
Database 'mydb' already exists.
After re-attaching, especially if it’s on different server, you will need to fix orphaned
logins via the command:
exec sp_change_users_login <see SQL Server BOL for parameter list>
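For example ('Report' lists the orphans, 'Update_One' maps a single user to a login; the
user/login name below is an assumption):
exec sp_change_users_login 'Report'
exec sp_change_users_login 'Update_One', 'myuser', 'myuser'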
Attaching a single file
The sp_attach_single_file_db command is quite powerful. It allows you to re-attach a
database by specifying only its initial master data file. If your database had other data
files (even in the primary file-group) they will be automatically re-attached for you (only
to their previous destination though) by reading sysfiles within the primary MDF. This is
all fine if you want the data files restored to the same location and physical file names
the database once had; apart from that you have no control and will need to opt for
sp_attach_db.
When re-attaching with this command, SQL Server can automatically recreate your log
file, so long as the original is not available for SQL Server to automatically re-attach
when it looks up sysfiles. This method is handy when you have a large log file and want
to shrink it back to a manageable size. For example:
exec sp_attach_single_file_db N'MyxxDb' ,
N'E:\SQLServerData\MSSQL\Data\xx_Data.MDF'
<..shows the message below, replace XX with the required name..>
Device activation error. The physical file name
'e:\sqlserverdata\MSSQL\data\xx_Log.LDF' may be incorrect.
New log file 'E:\SQLServerData\MSSQL\Data\xxxx_log.LDF' was created.
The new file size E:\SQLServerData\MSSQL\Data\xxxx_log.LDF will be 512k.
This command will not work if you have multiple log files:
Server: Msg 1813, Level 16, State 2, Line 1
Could not open new database 'mytest'. CREATE DATABASE is aborted.
Device activation error. The physical file name 'e:\sqlserverdata\MSSQL\data\mt_Log.LDF' may be incorrect.
Device activation error. The physical file name 'e:\sqlserverdata\MSSQL\data\mt_2_Log.LDF' may be incorrect.
Some issues with MODEL and MSDB databases
To detach the model or msdb system databases, you need to set the trace flag –T3608
on instance startup. In all cases you must attach the model before the msdb database,
remembering that SQL Agent for the instance must be stopped. As a side note, the
attach command executes something like the following:
CREATE DATABASE [mydb] ON
(FILENAME = 'C:\dbdata\mydb\mydb_data.mdf'),
(FILENAME = 'C:\dbdata\mydb\mydb_log.ldf')
FOR ATTACH
The create database command has dependencies on the model database, therefore
affecting its re-attachment.
Fixed dbid for system databases
The DBA should also be aware of the master..sysdatabases system table and its dbid
value for the system databases. On some very rare occasions, it is possible that a
restore results in a corruption, or “mixup”, in the dbid for the database; this may occur
when restoring databases in the wrong order. The flow-on effect is some very strange
errors and confusion all round. See reference (57) for a great example of this.
The dbid for system databases are:
1   Master
2   Tempdb
3   Model
4   Msdb
Scripting Database Objects
The scripting of databases is an important task for the DBA. The features of EM,
the database diagrammer and profiler (to a lesser degree) assist the DBA in building
scripts for new system changes and, most importantly, scripting is a form of recovery.
Using Enterprise Manager - right click properties on any database and the following GUI
is shown:
The screen is simplistic and requires no explanation but there are a few things to
remember:
a) You must select objects (tables, views) in order to script
indexes/triggers/constraints/permissions. You cannot “generically” script all
indexes, for example, without selecting all tables/views first; you need to filter
out what you need from the generated script.
b) You cannot script multiple databases at once
c) You cannot script logins specific to database (i.e. logins that map to a single
user in one database – typically the one you are scripting). You cannot script
the sa login.
d) You cannot script linked or remote servers.
e) The options tab is the key here. Remember to select permissions, probably
the most basic option under this tab.
Use the preview option to view the generated script in mini-dialog windows (from
which you can copy to the clipboard).
The diagrammer is also another handy place to generate scripts. For example - if you
need to make a variety of database changes and need a script to run then:
a) create a new (or use an existing) diagram and save it
b) make the changes within the diagrammer
c) press the script button (see below).
d) Copy the script generated to notepad or equivalent.
e) Don’t SAVE the diagram (we don’t want to apply the changes as yet – the script
will do it for us) and exit the diagrammer.
You can then use the saved script to apply on other identical databases (i.e. test /
support / prod databases) to mimic the changes and/or new objects.
You can cut the scripted text from here into notepad.
One of the key problems with the diagrammer is that you cannot allocate object
permissions whilst editing tables. This can adversely complicate your script generation
ideas.
NOTE – Be careful with generated scripts from the diagrammer. Always review
the generated script before running. In my experience EM has never generated a
script with errors.
If you select the “design table” option and alter the table, the same script option is
available to the DBA. Note that this is not the case for “design view” although the SQL
statement is selectable.
Another method for scripting is via EM and its listing of tables and views, for
example:
Select objects in Enterprise Manager and press CTRL-C to copy. Then run Query Analyser, open a new connection, and paste; a script is generated for the selected objects.
Verifying Backups
To verify a backup, use the command:
restore verifyonly from disk = 'c:\myfullbackup.bak'
The DBA can also load the backup history in the backup file into the MSDB database.
This can be handy when analyzing the backup before attempting a recovery. Apart from this, SQL Server has no method as such for validating the backup until recovery.
Recovery
In SQL Server the DBA has a range of methods to facilitate recovery:
a) rebuildm.exe (from setup CD for rebuilding the system databases)
b) Enterprise Manager and its GUI wizards
c) Query Analyser (GUI or command line version)
d) SQL Server Service Control Manager, Windows Services applet itself or the
command line options for the sqlservr.exe
Many of the scenarios in this section refer to trace flags to control system database
recovery and lookup.
Recovery is potentially more complex than in other DBMS systems due to the fact that we are not dealing with only one or more user databases, but with several system databases as well as many user databases which depend on them for the single instance. This section provides a summary by example from which the DBA can then base further tests to drill down into this very important topic.
NOTE – Many of the examples use the GUI tools and at times reference the
equivalent T-SQL command.
A quick reminder about the order of recovery
It is critical that you remember which backup files are to be applied when recovering a database from SQL backups. It is simple enough but often forgotten. The diagram shows which backup files must be used to recover to the point of failure.
Restore order (left to right along the time line): the full backup, then the differential, then each transaction log backup, up to the failure point.
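As a sketch, for a database with one full, one differential and two log backups, the sequence translates to something like this (file names are illustrative):

restore database mydb from disk = N'c:\mydb_full.bak' with norecovery
restore database mydb from disk = N'c:\mydb_diff.bak' with norecovery
restore log mydb from disk = N'c:\mydb_log1.bak' with norecovery
restore log mydb from disk = N'c:\mydb_log2.bak' with recovery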
If you are running in transaction log mode, and you want to recover a specific database file only: do a log file backup immediately after the failure and before the recovery begins for the file. This log backup should be applied to complete the database file's recovery.
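A minimal sketch of this file-level recovery, assuming a logical file name of mydb_data2 (use restore filelistonly to confirm the real names):

backup log mydb to disk = N'c:\mydb_taillog.bak'
restore database mydb file = N'mydb_data2' from disk = N'c:\mydb_full.bak' with norecovery
restore log mydb from disk = N'c:\mydb_taillog.bak' with recovery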
Killing User Connections and Stopping Further Connects
Killing off user connections is simple enough and there are many scripts on the internet
to do the job. An example script by [email protected] is shown below:
CREATE PROC Kill_Connections (@dbName varchar(128))
AS
  DECLARE @ProcessId varchar(4)
  DECLARE CurrentProcesses SCROLL CURSOR FOR
    SELECT spid FROM sysprocesses
    WHERE dbid = (SELECT dbid FROM sysdatabases WHERE name = @dbName)
    ORDER BY spid FOR READ ONLY
  OPEN CurrentProcesses
  FETCH NEXT FROM CurrentProcesses INTO @ProcessId
  WHILE @@FETCH_STATUS <> -1
  BEGIN
    -- print 'Kill ' + @ProcessId
    EXEC ('KILL ' + @ProcessId)
    -- Kill @ProcessId
    FETCH NEXT FROM CurrentProcesses INTO @ProcessId
  END
  CLOSE CurrentProcesses
  DEALLOCATE CurrentProcesses
GO
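A usage sketch (the database name is illustrative):

exec Kill_Connections 'northwind'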
Also consider the following command, which more elegantly terminates users and closes off the connections:
ALTER DATABASE mydb
SET SINGLE_USER WITH [<termination clause>]
eg:
ALTER DATABASE mydb
SET SINGLE_USER WITH ROLLBACK IMMEDIATE
To stop further connections, alter the database to dbo access only, or disable the
database logins via sp_denylogin (NT logins only).
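A minimal sketch of both approaches (the database and login names are illustrative):

-- restrict new connections to members of db_owner, dbcreator or sysadmin
alter database mydb set restricted_user with rollback immediate
-- deny an NT login at the instance level
exec sp_denylogin 'MYDOMAIN\someuser'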
Remember – you cannot recover a database whilst users are connected.
Using the GUI for Recovery
Unless you have major system database problems (which require additional steps before running EM), the DBA will find that using EM for recovery is the simplest approach. The best thing about the GUI when it comes to recovery is its reading of the MSDB backup history tables and its correctly listing out the backup files to be used in a recovery. Later, we will discuss a script I wrote that does a similar thing.
IMPORTANT – This section will use restore and recovery [of databases] to mean
the same thing. Always check the context in which it is being used.
Name of the database. We can enter a new name if required; if you do, click on the options tab to double-check the names of the database files and the destination.
For the database selected in the drop-down above, the dates of all full backups are listed.
From the full backup selected above, MSDB is searched and the proposed restore list for a complete recovery is shown in hierarchical order. We can select the option to restore to a point in time, if available.
The DBA can uncheck the appropriate backup files as need be. Note that we cannot alter the source of the backups listed, which can be very restrictive.
HINT – When using EM for recovery, run profiler at the same time to trace the T-SQL recovery routines being executed. This is the best way to learn the recovery commands and the context in which they are being used.
WARNING – If you restore a database, say, Northwind, and restore it as a
different name (database), then be careful when removing the new database. It
will ask if you want to remove all backup history. If you say yes then kiss good-bye
to the Northwind database’s MSDB backup entries.
We will cover some of the GUI options in brief. Remember that virtually ALL restore operations require that no users be connected to the database.
Options - Leave database in non-operational state but able to restore additional logs
This option allows us to restore the database to any specific point, but leave it in a state where we can apply further backups as need be.
Selecting properties of the restored database in its loading state gives us this error:
If you realize that you have no further backups and want to complete the recovery of the database, then (note that exec sp_helpdb will not show the database):
SELECT DATABASEPROPERTY('aa', N'IsInRecovery')
SELECT DATABASEPROPERTYEX('aa', 'status')
restore database aa with recovery
RESTORE DATABASE successfully processed 0 pages in 1.178 seconds (0.000 MB/sec).
An instance re-start will also issue the recovery statement. The non-operational
state simply executes the with norecovery option on restore of the last specified
backup file.
Options – Using the Force restore over existing database option
Using EM can be a tad strange when restoring databases. If you attempt to restore the currently selected database, it will never prompt you that you are trying to overwrite the existing database's data files, even though (technically speaking) you are! If we attempt to restore, say, the northwind database as the pubs database, we will be prompted with the following dialog:
It seems to be something related to the MSDB backup and restore tables that determines whether or not this dialog is shown. Anyhow, to get around this, we click on the options tab and select the Force restore over existing database option.
The command is no different from a standard restore. There is no magical restore option related to the prevention of file overwrites.
RESTORE DATABASE [bb]
FROM DISK = N'c:\northwind_full.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, RECOVERY ,
MOVE N'Northwind_log' TO N'C:\dblog\bb_log.ldf',
MOVE N'Northwind' TO N'C:\dbdata\bb_data.mdf'
Be very careful with this option in EM. Personally, I never use it unless I am 100%
sure that the files I am writing over are fine and I have already backed them up.
Restoring a database's backup history from backup files
In this example we have the following database, with associated backups:
Database: mydb
Data and Log files: c:\mydb.mdf, c:\mydb.ldf
Backups:
Full  c:\mydb_full.bak
Diff  c:\mydb_diff.bak
Log   c:\mydb_log1.bak
Log   c:\mydb_log2.bak
On selecting the restore in EM for the database, it magically lists all backups for a
successful restoration of my database up to the point of mydb_log2.bak from MSDB.
If we lost this information, then suddenly our nice GUI dialog is not so helpful
anymore.
To re-populate the MSDB database tables with the backup history I recommend that
you do not use the GUI. It is overly time consuming for such a simple task:
RESTORE VERIFYONLY FROM DISK = N'C:\mydb_full.bak' WITH NOUNLOAD, LOADHISTORY
RESTORE VERIFYONLY FROM DISK = N'C:\mydb_diff.bak' WITH NOUNLOAD, LOADHISTORY
RESTORE VERIFYONLY FROM DISK = N'C:\mydb_log1.bak' WITH NOUNLOAD, LOADHISTORY
RESTORE VERIFYONLY FROM DISK = N'C:\mydb_log2.bak' WITH NOUNLOAD, LOADHISTORY
NOTE - if your backup media had multiple, appended backups, then you may also need to use the WITH FILE = option.
Once done, using the EM restore option, we select the database and work off the
restore history to pick the best path for restoration.
Remember, before restoring always double check the database name and the options,
ensuring paths and names are correct.
SQLServer Agent must be able to connect to SQLServer as SysAdmin
It is important to remember that the SQL Server service, along with the SQL Server Agent service, can be started under an NT user account other than the default system account. This tends to be best practice, for security reasons and the ability to assign strict NTFS privileges to the user account.
The DBA needs to be careful with the privileges this user account has within the SQL
Server instance. The base system role privilege must be sysadmin. This must be
allocated for the SQL or Agent service accounts (typically the same account).
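A sketch of granting this, assuming a hypothetical service account of MYDOMAIN\sqlservice:

exec sp_grantlogin 'MYDOMAIN\sqlservice'
exec sp_addsrvrolemember 'MYDOMAIN\sqlservice', 'sysadmin'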
If you don’t, you may receive this error:
SQLServerAgent could not be started (reason: SQLServerAgent must be able to connect to SQLServer as SysAdmin, but '(Unknown)' is not a member of the SysAdmin role).
The DBA should check the SQL Server and SQL Agent log files at a minimum in any case.
If the error persists with the Agent, then ask: did you remove the BUILTIN\Administrators group login? This is often the case if you have reverted your agent service account back to run under the system account but the group has been removed. If so, you need to add the BUILTIN\Administrators group login back in to use the system account for SQL Agent startup.
Restore cannot fit on disk
This is a classic problem. Basically, your attempt to restore a backup results in an
out of space error and asks you to free more space before re-attempting the restore.
In this particular scenario, SQL Server wants to restore the database files to the
same size as at the time when they were backed up. There is no option to alter the
physical file size (i.e. shrink) during the restore.
The general recommendation here is to shrink the database before any full backup to reduce the possibility of this error. If that doesn't work, restore with the MOVE option to distribute the files amongst as many disks as you can, then shrink after the restore. Finally, backup and restore again with a more appropriate placement of database files.
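As a sketch, a restore that relocates files across drives might look like this (logical and physical names are illustrative; run restore filelistonly first to confirm the logical names):

restore filelistonly from disk = N'c:\mydb_full.bak'
restore database mydb
from disk = N'c:\mydb_full.bak'
with move N'mydb_data' to N'd:\dbdata\mydb_data.mdf',
     move N'mydb_log' to N'e:\dblog\mydb_log.ldf',
     recovery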
“Exclusive access could not be obtained..”
As a general reminder - you cannot restore a database whilst users or SPIDs are connected; this message is related to this fact. Check master..sysprocesses or sp_who2 carefully, as system SPIDs attempting to clean up from a large operation or complete an internal SQL task should not be forcibly killed without a thorough investigation as to what is happening.
Restore uses “logical” names
In the examples presented below, the restore operations work over the logical name
for each database file being restored (where this is appropriate of course). If you do
not know the logical name of the physical files or the name of the file-group, then
you will have some problems successfully restoring. Apart from using the GUI, we
can execute the command:
restore filelistonly from disk='c:\mydb.bak'
Also review the commands:
restore headeronly from disk='c:\mydb.bak'
restore labelonly from disk='c:\mydb.bak'
Unable to read local event log. The event log is corrupted
I have only had this error once, as shown below:
Why? The hard disk was full, simple as that. There were no errors in the SQL Server logs, but I did notice my custom backup scripts were no longer running; these returned no errors and their run time was virtually instantaneous.
Freeing space on the drive was enough to kick-start the DTS jobs once again.
What is a “Ghost Record Cleanup”?
Running profiler, or querying sysprocesses, you may see “error:602, severity:21, state:13” (16); this is related to a background process running a ghost record cleanup.
Depending on the statement being run (typically a bulk delete), SQL Server will mark the affected records as ghosts, which is the same as marking them for pending deletion. A background process (seen as “TASK MANAGER” in sysprocesses) removes the records asynchronously (17).
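A rough way to spot these background tasks, assuming the cmd column carries the TASK MANAGER label as described above:

select spid, status, cmd, waittype
from master..sysprocesses
where cmd like 'TASK MANAGER%'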
How do I Shrink TEMPDB?
There are numerous ways to shrink TEMPDB. In each case we have a named instance called CKTEST1, and the TEMPDB data file is 80Mb in size. Our target size in all cases is 1Mb for our single data and log file.
The following solutions are fully tested scenarios as per MS Support doc 307487.
Shutdown and re-start
The drawback here of course is the shutdown of the instance, far from ideal in many production systems. That said, Microsoft do warn that the two alternatives (discussed next) may result in physical corruption of the database if it is in use at the time.
a) Shutdown the named instance
b) Restart the service via the command line using -c and -f:
C:\Program Files\Microsoft SQL Server\MSSQL$CKTEST1\Binn>
sqlservr.exe -c -f -scktest1
c) Connect to the instance via query analyzer or other and run:
ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev', SIZE = 1)
ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog', SIZE = 1)
d) Check the physical file size. You may notice the file size is not immediately
reflected.
e) CTRL-C to stop the service.
f) Re-start the service as per normal
g) Check file size via EM and on disk.
Use DBCC SHRINKDATABASE
This command is a good one to quickly shrink the tempdb data and log files. The shrink
is % used space based (as you will see) and not a physical value. This can be somewhat
frustrating. Also be aware that if ALTER DATABASE MODIFY FILE was used against tempdb to set the minimum size of the data and log files, this command will treat the value specified as an absolute minimum. Use sp_helpfile against TEMPDB beforehand and review the size column to confirm this.
a) Check existing file size via sp_spaceused
b) Determine the percentage of free space to be left after the shrink. Note that the target percentage specified in c) is based on the current space used only.
c) Run the shrink with the percentage from b)
d) Check file size via EM and on disk, or use sp_spaceused again
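A minimal sketch of the above steps (the 10% free space target is illustrative):

use tempdb
exec sp_spaceused
dbcc shrinkdatabase (tempdb, 10)  -- leave 10% free space, based on used space
exec sp_spaceused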
During a large tempdb operation along with shrinking the database, we may experience
the following locks:
Use DBCC SHRINKFILE
Here we repeat the operations as per shrinkdatabase, namely:
a) Check existing file size via sp_spaceused
b) Determine the large files for shrinking via tempdb..sysfiles
c) Attempt to shrink. The command has three parameters:
1) the file name or file id as per sysfiles
2) an integer value, representing the target size in Mb
3) one of three options: EMPTYFILE (see below), NOTRUNCATE (reallocate pages below the specified size, empty pages not released), TRUNCATEONLY (release unused space up to the last allocated extent)
dbcc shrinkfile (tempdev, 10)
The command will not shrink a file less than the data currently allocated
within the file.
d) Check file size via EM and on disk, or use sp_spaceused again:
The shrinkfile command takes out the following locks:
The EMPTYFILE option is typically used for multifile file-groups, where the DBA wants to
migrate data and heap structures from one file in a filegroup to another, and prevent
further writes to the file (as writes are typically dispersed evenly amongst files in the
filegroup based on free space). There was a problem (Q279511) in SQL Server 7 that
was resolved in SP3 and SQL Server 2k.
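A sketch of the other two options (mydb_data2 is a hypothetical logical file in a user database; run the EMPTYFILE example from within that database):

dbcc shrinkfile (tempdev, truncateonly)   -- release unused space only
dbcc shrinkfile (mydb_data2, emptyfile)   -- migrate data off the file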
IMPORTANT – Shrinkfile cannot make the database smaller than the size of the
model database.
The DBA may experience this error if the database is in use:
Server: Msg 2501, Level 16, State 1, Line 1 Could not find table named '1525580473'. Check sysobjects.
-or-
Server: Msg 8909, Level 16, State 1, Line 0 Table Corrupt: Object ID 1, index ID 0, page ID %S_PGID. The PageId in the page header = %S_PGID.
Under SP3 of SQL Server 2k, I could not replicate the error. The target size was
restricted to the last extent currently in use.
How do I migrate to a previous service pack?
The applying of SQL service packs may result in a twofold change:
a) they may alter the physical binaries of the SQL instance and system-wide DLLs (e.g. MDAC)
b) they alter your system databases and possibly all user databases for the instance being upgraded
Before you attempt to apply a service pack follow these general rules:
1) Retrieve the current version of your SQL Instance being migrated back
and check other instances and their current versions as well.
2) Run MDAC checker to get the current MDAC version
3) Run SQLDiag.exe against your instance to collect other global information about your instance (for recovery reference)
4) Full backup your instance's system databases (master, msdb, model)
5) Full backup all user databases
IMPORTANT – Always double check the instance you are connecting to, and
ensure that utilities (sqldiag/query-analyser) are run against the correct instance.
Never skip backups, no matter how confident you are.
When making a decision to roll back, have a good look over the SQL Server installation logs. Pass the errors through www.google.com (Google Groups). If possible, call MS
support, but take care as immediate resolution may not be possible and may be very
costly in terms of downtime.
Full Rollback
A complete rollback from a service pack is time consuming and at times risky (more in
terms of forgetting something, either by mistake or through poor documentation). The
rollback complexity is exponentially increased when dealing with clusters, replicated
instances, or where multiple applications share the same server.
To return back to the previous services pack I have successfully used the following
process (we assume the system databases were backed up before the service pack was
applied):
a) Check and record the SQL Server version
b) Check and record the MDAC version (MDAC checker from Microsoft)
c) Stop all SQL Server services and physically backup all database files.
Alternatively, complete a full backup of all database files for the instance,
including the binaries directory.
d) Re-start the instance
e) Disable replication if it's currently in use (publications, and the distribution)
f) Restore the backed-up master, msdb and model databases that were taken before you applied the service pack
g) Re-apply the service pack that preceded the one you are rolling back from (of course).
h) Re-build fulltext catalogs (if necessary; I have had no issues in the past)
i) Re-start replication as required
j) Check MDAC again. Determine the need to reapply as required.
k) Check and record the SQL Server version
l) Start/Stop the instance and check the Windows event log and SQL Server logs; debug as required.
Be aware that service packs vary considerably in terms of system change. The rollback will not remove new DLLs, nor DLLs that are not overwritten by the service pack you are re-applying.
If you are not happy with this, the only path you can take is a complete un-install, reboot, registry check and cleanup, then re-install of SQL Server. This is tough work, as we need to return the instance to the proper collation, re-create replication, DTS jobs and packages, users and logins, along with the restoration of past system databases (which covers off DTS, logins etc). The process is summarised below, as per MS Support document 314823:
NOTE – DLLs added to the %system root% directory are not guaranteed to be
removed
a) Check and record the SQL Server version
b) Check and record the MDAC version (MDAC checker from Microsoft)
c) Script all database logins, script replication setup where possible
d) Record collation/language of instance and databases
e) De-attach all user databases – it may be rare for changes to be made in user
databases, but they do host system tables and as such, are fair game during
upgrades.
f) Stop all SQL Server services, and physically backup all database files.
Alternatively, complete a full backup of all database files for the instance,
including the binaries directory and full text indexes and log files.
g) Uninstall SQL Server via Add/Remove programs
h) Reboot the server
i) Check MDAC level, and re-install SQL Server
j) Reboot the server
k) Apply service pack as required
l) Reboot the server if asked to do so. Check MDAC level
m) Restore the master, msdb and model system databases from a recent backup, in this order (user databases will be automatically re-attached)
n) Check logins, user databases, DTS packages and jobs
o) Restore and/or resynchronize full text indexes
p) Reapply replication
NOTE – The DBA may consider a complete restore from tape backup, including system state, Windows and SQL binaries and of course the database files. Be warned that a small mistake in this process will be disastrous.
The readme.txt files from service packs are a good reference point in terms of what’s
changed and may provide some guidance on cleaning your server without a complete
rebuild. Also, refer to the chapter on High Availability regarding clusters.
You may be asked to reboot after applying the service pack. Do so before continuing to
reduce the possibility of further error.
Finally, as a general rule, always read the documentation accompanying service packs no
matter how simple the upgrade seems.
Service Pack install hangs when “checking credentials”
To fix this issue change the "DSQuery" value under:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo\DSQuery
to "DBNETLIB". The installation should complete successfully.
OLAP
Recovery of OLAP cubes to another server
For very large OLAP cubes and their associated repository, it is one thing to restore the cubes or their meta-data, but it is another to reprocess the cubes and still meet your SLA. The files are broken down into:
\%analysis-path%\<db-name>\<dimension-name>.*
..and..
\%analysis-path%\<db-name>\<cube-name>.*
For dimensions the extensions are:
.dim       Dimension meta-data
.dimcr     Custom rollups
.dimprop   Properties
.dimtree   Member data
For cubes the extensions are:
#.agg.flex.map    changing dimension aggregation data
#.agg.rigid.map   aggregation data
#.fact.map        aggregation data
.agg.flex.data    changing dimension aggregation data (partitions)
.agg.rigid.data   aggregation data (partitions)
.pdr              partition meta-data
.prt              partition meta-data
.fact.data        cube data
The need to re-process is based on how much of the above you have backed-up and
what period you are refreshing from. The files can be restored “inline” without the need
to stop the service.
Non-interface error: CoCreate of DSO for MSOLAP
If you create a linked server to OLAP services (using the OLEDB provider for OLAP
Services X.X), and get the above error then set the “allowinprocess” option for the
provider:
As the dialog states, all linked servers using the provider will be affected. Click on the provider option for the above settings when creating or editing a linked server:
What TCP port does Analysis Services use?
It uses port 2725. It also uses 2393 and 2394 if you are connecting via a OLAP Services
(v7) client. If you are using HTTP via IIS then it will be port 80 or 443 for SSL based
connections.
Restoration Scenarios
Dealing with Database Corruption
How do I detect it?
It is of utmost importance that data corruption be detected as soon as possible. The
longer it goes undetected (and it can be a long time) the harder your time will be for
recovery. In the worst case, your backups become unusable and may span many hours
or days of lost work.
As a general best practice rule, I highly recommend you run DBCC CHECKDB once per
day. Ideally, write the results of the command to disk and store the output for a week.
The command can be a handy reference point for yourself and MS Support Services. The
command can take its toll on the tempdb. For large databases, we can estimate tempdb
usage:
DBCC CHECKDB with ESTIMATEONLY
NOTE – I had a classic database corruption where very selective queries would
simply timeout for no apparent reason; via query analyzer you would get a
disconnection if you queried between rows 120,000 and 128,000 only on a single
table. This sort of database corruption can go undetected for some time if the DBA
is not actively checking system integrity.
A fellow DBA, Rodrigo Acosta, wrote a great script to do just this. It can be downloaded
from my website. The command calls isql via xp_cmdshell to connect back to the
instance and run CHECKDB with a redirect of the output to a text file:
set @osql='EXEC master.dbo.xp_cmdshell '+''''+'isql -E -S' + @@servername + ' -Q"DBCC Checkdb
("'+@dbname+'")" -oC:\CheckDB\'+@date+'\'+@dbname+'_'+@date+'.log'+''''
EXEC (@osql)
To avoid the detailed analysis, use no_infomsgs. This may reduce the tempdb work required for large schemas.
dbcc checkdb with no_infomsgs
If you suspect HW related errors, consider the PHYSICAL_ONLY option:
dbcc checkdb with physical_only
Use the NO_INDEX option to gain overall execution speed, though I tend to find indexes to be more of an issue than heap structures.
Taking the command further, we can add a check for allocation and consistency errors.
If found, the DBA is emailed with the file attached:
set @osql='EXEC master.dbo.xp_cmdshell ' + ''' echo "Line_Text" > C:\CheckDB\tem.txt'''
exec (@osql)
set @osql='EXEC master.dbo.xp_cmdshell ' + ''' more ' + @pattachmentfilename + ' >> C:\CheckDB\tem.txt'''
exec (@osql)
set @status = -1
select @status = 1 from OpenRowset('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=C:\CheckDB;','select Line_Text from "tem.txt"') where Line_Text like 'CHECKDB found 0
allocation errors and 0 consistency errors%'
Note that SQL 2k will apply a schema lock only against the object. If you experience an
access violation when run, review MS support article 293292.
The DBA may also review other integrity checking commands:
• DBCC TEXTALL
• DBCC CHECKTABLE
• DBCC CHECKCATALOG
• DBCC CHECKALLOC
The DBA will find that CHECKTABLE does not verify the consistency of all the allocation
structures in the object; consider using CHECKALLOC as well.
If you suspect that the statistic objects (text blobs) are corrupt (_wa objects), attempt to
script them before using DROP STATISTICS table-name.statistic-name. As a guide use
the DBCC SHOW_STATISTICS (table-name, index-name) command, or query
sysindexes.
These are covered extensively in the BOL.
How do I recover from it?
The “potentially” corrupt database can be a pain to deal with. A big problem here is “fake” corruption; that's right, I have experienced it a few times, for no apparent reason, where checkdb would return different results on each execution but generally settle on a set of objects, only to find a simple reboot of the server saw the instance and database mount clear of all errors. Very strange.
Before you run any repair command, or attempt to restore or detach the database,
always attempt to backup the database, either logically (via SQL backup command) or
physically copy the file. Do not delete files, no matter how right you think you are in the
path you’re about to execute.
Generally speaking messages related to corruption will provide information about the
object affected, and the index identifier:
Object ID 13451531, index ID 0: Page (1:21112) could not be processed. See other errors for details.
Where the index ID:
Indid 0 is a data page with no clustered index.
Indid 1 is a data page with a clustered index.
Indid 2 to 254 is a non-clustered index page.
Indid 255 is a text page.
The DBA can use OBJECT_NAME(id) to get the name of the table, or DBCC PAGE(dbid, pagenum) to inspect the page. Set trace flag 3604 via DBCC TRACEON(3604) before running the command.
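A sketch against the error shown above (DBCC PAGE is undocumented; the dbid of 5 and print option of 3 are illustrative):

dbcc traceon (3604)
select object_name(13451531)   -- run this in the affected database
dbcc page (5, 1, 21112, 3)     -- dbid, file id, page number, print option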
The DBA should place the database in single user mode and reconfirm the database
integrity:
-- Disconnect/kill all user sessions or wait till they disconnect
exec sp_dboption 'northwind', 'SINGLE_USER', 'on'
use northwind
DBCC CHECKDB
After a complete database backup, attempt to recover:
DBCC CHECKDB('northwind', REPAIR_REBUILD)
Then check system integrity again. The repair_allow_data_loss option, as per the BOL, should be used sparingly. If the issues persist, move to standard backup file recovery procedures.
If you suspect major hardware issues, stop the instance, copy the database files to
another SQL Server and attempt to attach the database files (sp_attach_db). The event
or SQL Server logs “should” include some valuable information related to physical IO
problems.
NOTE – A suspect database may be a function of corrupt pages. Check the events
logs and SQL Server logs carefully.
If worst comes to worst, also consider third party tools; for example:
http://www.mssqlrecovery.com/mssql/index.htm
“Database X cannot be opened, it is in the middle of a restore”
This may occur when the last backup applied during a restore option used the WITH
NORECOVERY command. If so, we can complete recovery at this point and open the
database via:
restore database mydb with recovery
See the BOL for the RESTART command if the restores were directed via tape media at
the time.
Installing MSDB from base install scripts
If you have no backups of MSDB, one may consider the instmsdb.sql script to re-create the MSDB database; this of course will completely remove any packages, jobs, alerts etc you defined previously. The MSDB re-create script is found in the Install directory under the instance's base directory; also note the northwind and pubs database scripts there.
Shutdown the instance and use trace flag -T3608 to recover only the master database on startup. You will see this is a common step for many database restore scenarios.
Detach the database and run the script (see the next section regarding MSDB detachment).
As a side note, you can also copy the MSDB database files off the installation CD and reattach using these files; simple and effective.
Model and MSDB databases are de-attached (moving db files)?
You cannot de-attach system databases:
Server: Msg 7940, Level 16, State 1, Line 1
System databases master, model, msdb, and tempdb cannot be detached.
To get around this, start SQL Server with the trace flag -T3608 then re-run the detach command again:
The commands below run without error:
If you still have issues with MSDB, then stop SQL Agent.
On starting the instance (minus the trace flag) we get this error:
2004-01-19 23:07:42.04 spid5 Could not find database ID 3. Database may not be activated yet or may be in transition.
The default ID’s for the system databases are as follows:
MASTER  1
MODEL   3
MSDB    4
TEMPDB  2
The DBA should, at all times, re-attach in order of these identifiers to avoid possible issues after restoration. In our case, the instance is now down. We can use the services applet or run SQL Server via the command line with trace flag 3608. I also start the instance with -m:
sqlservr -m -sCKTEST1 -f -T3608
where cktest1 is the instance name.
The instance starts successfully. Run Enterprise Manager. Notice that the list of databases is blank:
Go back to your command line. Notice that sqlservr has exited and shutdown the
instance:
Once when starting the instance using trace flag 3609 (skip creation of tempdb) and then
invoking EM, I had a process dump which ended with:
2004-02-18 22:47:46.07 spid51 Error: 3313, Severity: 21, State: 2
2004-02-18 22:47:46.07 spid51 Error while redoing logged operation in
database 'tempdb'. Error at log record ID (5:19:22)..
Therefore it is probably best we stick with using Query Analyser to complete the re-attachment (note that -m or -f will have no effect either). Re-start via the command line and connect via Query Analyser:
sqlservr -sCKTEST1 -f -T3608
Querying the master..sysdatabases table, we see this:
master     1
Northwind  6
pubs       5
tempdb     2
Re-attach MODEL then MSDB:
use master
go
sp_attach_db 'model',
'c:\work\ss2kdata\MSSQL$CKTEST1\data\model.mdf',
'c:\work\ss2kdata\MSSQL$CKTEST1\data\modellog.ldf'
go
sp_attach_db 'msdb',
'c:\work\ss2kdata\MSSQL$CKTEST1\data\msdbdata.mdf',
'c:\work\ss2kdata\MSSQL$CKTEST1\data\msdblog.ldf'
go
Shutdown SQL Server via a CTRL-C at the command prompt. Use Service Control to
start the instance and re-check the log. The instance should start without error.
Remove the trace flag before you re-start the instance once you are satisfied all is well.
Restore Master Database
Restoring the master database is not fun but it is necessary in rare circumstances. In
this scenario we need to restore back to the last full backup of the master database as a
variety of logins have disappeared and some configuration changes have been made, so
we are sure that the restore will assist in resolving the problem.
1. Backup the existing master database and verify the backup
i. Copy the backup to another server and also to tape where possible
2. Attempting to restore from EM will give you this:
3. Kick off all users, and shutdown the instance.
4. Alter the service properties to force instance startup in single user
mode by entering -m in the startup options for the service.
5. Leave the service window open
6. Run EM. Connect to the instance and open the restore database dialog
for the master database. Here, we have selected the backup to be
restored and ensured beforehand that the file is ready and available.
7. On successful restore, the following dialog is shown. Go back to the
service control window and remove the –m single user mode option
and re-start the service.
8. Close and reopen EM, connecting to the instance. Check the SQL
Server error logs on failure to start the instance.
This example is simplistic and there are scenarios where this operation can create further
problems. The key issue here is that the master database includes a variety of system
tables, with the file paths for the model and msdb and tempdb system databases. If you
restore the master (which stops your instance immediately), and attempt to re-start,
unless those paths are still valid, the instance will not start.
Consider the rebuildm.exe command (rebuild master) to assist in restoring back to a
state where at least the instance starts and then you can recover each system database
thereafter.
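For reference, once the instance is started in single user mode, the T-SQL equivalent of the EM restore is something like this (the backup path is illustrative; the instance shuts down once the restore of master completes):

restore database master from disk = N'c:\master_full.bak' with replace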
Restore MSDB and Model Databases
For a system database, this is simple and painless. The DBA must shutdown SQL Server Agent before attempting a restore. Once done, double check, via exec sp_who2, for connections to the MSDB database. They must be disconnected before attempting the restore.
Restoring the MODEL database is like any other user database.
The DBA should restore MODEL before MSDB (if it requires restoration of course).
No backups of MODEL ?
Another option the DBA has is to copy the model.mdf and modellog.ldf files from the SQL
Server Installation CD. Read the next section for more information on collation issues
and how this can be done.
No backups of MSDB ?
For MSDB, consider the instmsdb.sql script.
Recovery of System Databases and NORECOVERY option
Microsoft support released a note that explains how the restoration of a system database using the NORECOVERY restore option can result in instance startup problems. Where the model database has been left in this non-operational state, on instance re-start the database cannot be opened and tempdb cannot be created.
To get around the issue:
a) Start SQL Server with the following command line options:
-c, -m, -T3608, -T4022
b) Attempt to end recovery of the database:
restore database model with recovery
c) Otherwise, update the sysdatabases table and the status column to 16 for the model
database only.
d) Restart the instance minus the parameters in a)
Collation Issues - Restores from other Instances or v7 Upgrades
The system databases, namely master, msdb, tempdb and model, do not necessarily
require the same collation for the instance to start. Here is an example. We have
installed a new named instance with a Greek collation as shown below, the default was
Latin1_General with Accent sensitivity.
On confirming the installation with a simple connect, we shutdown the instance and
delete the model database files.
On starting the instance we get the following error:
2004-01-19 13:36:59.70 spid5 FCB::Open failed: Could not open device C:\Program Files\Microsoft SQL Server\MSSQL$CKTEST1\data\model.mdf for virtual device number (VDN) 1.
2004-01-19 13:36:59.75 server SQL server listening on TCP, Shared Memory, Named Pipes.
2004-01-19 13:36:59.75 server SQL Server is ready for client connections
2004-01-19 13:36:59.75 spid5 Device activation error. The physical file name 'C:\Program Files\Microsoft SQL Server\MSSQL$CKTEST1\data\model.mdf' may be incorrect.
2004-01-19 13:36:59.81 spid5 Device activation error. The physical file name 'C:\Program Files\Microsoft SQL Server\MSSQL$CKTEST1\data\modellog.ldf' may be incorrect.
2004-01-19 13:36:59.84 spid5 Database 'model' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.
We copy the model database files back from CD (see previous scenario), alter the files’
read-only property, and re-start the instance. The instance will start fine.
Checking the system database collations we see this:
master – Greek_CS_AS_KS
model – SQL_Latin1_General_CP1_CI_AS
msdb - Greek_CS_AS_KS
tempdb - SQL_Latin1_General_CP1_CI_AS
NOTE - Select properties of the database in EM, or run exec sp_helpdb via query
analyzer to get the database collation.
So now we can alter the model database (and therefore the tempdb collation on instance
re-start) and its collation right? Wrong:
alter database model collate greek_CS_AS_KS
Server: Msg 3708, Level 16, State 5, Line 1
Cannot alter the database 'model' because it is a system database.
This is actually a SS2k feature. Previous SQL Server versions prevented system
database restores of a different character set / sort order. This has been brought on by
the ability to set collation at install time, for each user database, and at the column/t-sql
variable & SQL statement level. At the same time though, you cannot alter the collation
of any system database via the simple alter command, even though a restore from a
backup may change it from the installed default for the instance.
The flow on effect can be this error within your database applications:
'cannot resolve collation conflict for equal to operation'
If you utilize temporary tables (# / ##), or tempdb is used to complete a large sort
operation, having tempdb (built from your model database on startup) with say
SQL_Latin1 and your user databases in say Greek_CS may result in this error, preventing
the operation from running until you explicitly state the conversion via the COLLATE
command in the DML operation. This is far from ideal and can render applications close to useless (are you going to re-write the app code? I don't think so).
Therefore, be very wary when restoring database files from other instances to complete
your recovery; especially where collation is concerned.
To get around the collation issue, take the following into consideration:
a) Use rebuildm.exe (rebuild master) and restore with the appropriate collation.
From here retain the model database and re-apply your “typical user database”
settings to model for future databases, along with the specific initial properties for
tempdb. If MSDB is still an issue for you, export DTS packages, jobs, and reapply these on the new MSDB database.
b) ALTER DATABASE mydb COLLATE – this command will alter the user database
collation, but will not alter any existing string column collations for existing
database tables. Consider looking at information_schema.columns to determine
what tables are affected and altering the column collation. Always test carefully
to ensure the change has taken effect. The worst case is having to import/export
the altered table data to take up the new collation.
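A sketch of option b), using the Greek collation from this example (the second query simply lists columns still on a different collation):

alter database mydb collate Greek_CS_AS_KS

select table_name, column_name, collation_name
from information_schema.columns
where collation_name is not null
  and collation_name <> 'Greek_CS_AS_KS'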
IMPORTANT – Get the MODEL database collation correct and TEMPDB will follow.
Suspect Database (part 1)
A database may become suspect for a variety of reasons such as device errors,
missing files etc, or another process (like a 3rd party backup program) has a lock on
the database files during instance startup etc.
Within EM you will see this:
NOTE – You can confirm the database status via the command:
select databasepropertyex(‘northwind’, ‘status’)
First of all, check the error logs to better gauge the extent of the problem. In this
particular case the error is:
Starting up database 'Northwind'.
udopen: Operating system error 32(error not found) during the creation/opening of physical
device C:\Program Files\Microsoft SQL Server\MSSQL$MY2NDINSTANCE\data\northwnd.mdf.
FCB::Open failed: Could not open device C:\Program Files\Microsoft SQL
Server\MSSQL$MY2NDINSTANCE\data\northwnd.mdf for virtual device number (VDN) 1.
If the physical device is missing, a simple restore with the MOVE option may be required (this is assuming we cannot quickly resolve the error otherwise). The DBA may need to use a third party utility to determine if another process has the file open. There are many available on the internet (for example www.sysinternals.com).
If the file is “open” but the process is orphaned for whatever reason, we can
attempt:
a) If the instance is UP, attempt to backup the database (may fail, but is well
worth the try). Also, check disk space available for all drives used by system
databases.
b) If the instance is down, physically backup all system database files to another
disk
c) Attempt to kill off the rogue operating system processes holding the files open
and stop/start the instance with the –m parameter
d) Attempt to run:
a. exec sp_resetstatus ‘northwind’
Database 'northwind' status reset!
WARNING: You must reboot SQL Server prior to accessing this database!
e) Run DBCC CHECKDB or DBCC CHECKCATALOG if possible and copy the results to an ASCII file (for reference).
f) If all is fine, shutdown the instance and re-start without -m
g) Reboot the server (or restart the instance) or attempt to run DBCC
DBRECOVER (northwind)
h) If you decide to place the database in emergency mode, then do so as a last
resort. You will have the ability to BCP out data (even corrupt data), but it is far from ideal.
SP_CONFIGURE 'allow updates', 1
RECONFIGURE WITH OVERRIDE
GO
UPDATE master..sysdatabases set status = -32768 WHERE name = 'mydb'
GO
SP_CONFIGURE 'allow updates', 0
RECONFIGURE WITH OVERRIDE
WARNING – Before attempting any recovery or change of database status,
always shutdown the instance and backup the database files.
On any change of DB status related to recovery, the DBA should run the following on
the database and use the CHECKDB parameters accordingly to recover corruption.
dbcc checkdb
dbcc newalloc
dbcc textall
Be aware that using REPAIR_ALLOW_DATA_LOSS option for CHECKDB should be a
last resort.
IMPORTANT - I should reiterate that suspect databases must be carefully analysed; in some cases I have found that, for some unexplained reason (i.e. no error log entries), the instance starts and a user database is in suspect mode. If you have verified the existence of all database files, then attempt to re-attach the database via the sp_detach_db and sp_attach_db commands. Always backup the database files before attempting any sort of recovery. See Part 2 for some further insight.
The DBA may also consider detaching the suspect database (via EM is fine). Go to
your file system, move the missing files, then return to EM and run the attach
database wizard. In the wizard window, you will see red crosses where the file
name/path is invalid. Alter the path/filenames, set the “attach as” and set the owner
to “sa” and the database should be successfully re-attached and operational.
Suspect Database (part 2) and the 1105 or 9002 error
From the vaults of MSDN, Microsoft mention that under rare circumstances (I personally
have never had the error) the automatic recovery of a database on instance startup may
fail, typically due to insufficient disk space. Message 1105 and/or 9002 will be
generated.
The database will be:
a) Marked as suspect
b) Database is taken offline
The resolution to both the 1105 and 9002 errors is detailed on MSDN as:
NOTE - The DBA should check free disk space carefully, and check the log file to determine if it is size restricted (auto-growth disabled). Where possible we want to avoid adding more log files to the database but, if the existing file is suspected to be corrupt or in error, we can add log files via the ALTER DATABASE command with the ADD LOG FILE option, or enlarge the existing file via the MODIFY FILE option.
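A sketch of both options (the logical names, path and sizes are illustrative):

alter database northwind
add log file (name = northwind_log2, filename = 'c:\dblog\northwind_log2.ldf', size = 100MB)

alter database northwind
modify file (name = northwind_log, size = 500MB)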
If you reset the status via sp_resetstatus, it will ask you to complete recovery before
accessing the database, if you don’t you will get something like this:
Prior to updating sysdatabases entry for database 'northwind', mode = 0 and status = 256 (status suspect_bit = 256).
For row in sysdatabases for database 'northwind', the status bit 256 was forced off and mode was forced to 0.
Warning: You must recover this database prior to access.
As the documentation above states, we can use DBCC DBRECOVER. If you attempt to
use the command restore database XXX with recovery you will get the same message
above.
Ensure you visit the website for more/updated information before you attempt these
steps. As a general rule, if the database is down, then backup the database files before
you attempt any form of recovery.
Suspect Database (part 3) – restore makes database suspect?
I have not experienced this error myself, but it was discussed on the www.lazydba.com news group and a solution was offered by Bajal Mohammed. The people involved attempted the following without success:
a) after making sure that all the files are physically there, we tried to reset the status
and restart the SQL Server service, but the server was not able to recover the database.
Hence it was marked suspect again. The error that we were getting was "Failed
Assertion"
b) we created a new dummy database with same file structure and file groups and gave
the filenames as *_new.mdf, *_new.ndf, *_new.ndf & *_new.ldf (in the same locations
as the original database). Files were 1 GB each (log file 10 MB). Then we took the new
database offline, renamed the files of the Original production database to the new file
names (after renaming them to old) and tried to restart SQL Service, but when it tried to
restore the database, gave a strange error that MS was not able to explain either. It gave
the filename (with path) of the *.NDF files, saying that this is not a primary file... etc.
c) finally we decided to restore from backup. Since EMC took a backup (scheduled)
around 1am, of the corrupt Databases, we had to restore from Tape. The tape restore
finished, but the database is still suspect. When we reset the status using sp_resetstatus,
it came up with the same error in b) above.
The presented solution was as follows:
1) create one database with name of "mytestdb". The database file should reside in the
same directory as the user database. For example,
F:\Program Files\Microsoft SQL Server\MSSQL$xyz\Data
2) Offline SQL server.
3) rename mytestdb.mdf to mytestdb.mdf.bak. Rename your userdatabase.mdf to
mytestdb.mdf. userdatabase.mdf is the name of the user database MDF file.
4) Online SQL server. Now mytestdb may be in suspect mode.
5) run the below script to put mytestdb to emergency mode:
use master
go
sp_configure 'allow updates', 1
reconfigure with override
go
update sysdatabases set status = 32768 where name = 'mytestdb'
6) offline and online SQL server again.
7) rebuild the log for mytestdb:
DBCC TRACEON (3604)
DBCC REBUILD_LOG('mytestdb','mytestlog1.ldf')
8) Set the database in single-user mode and run DBCC CHECKDB to validate physical
consistency:
sp_dboption 'mytestdb', 'single user', 'true'
DBCC CHECKDB('mytestdb')
go
9) Check the database is no longer suspect.
Suspect Database (part 4) – Cannot open FCB for invalid file X in database XYZ
This is a nasty error; I have experienced it when system indexes in the primary file
group data file are corrupt. The error shown may be something like:
The database may still be accessible, but this seems to be for a finite time and is directly
related to any further IO against the database.
On running DBCC CHECKDB it reports no allocation or consistency errors. If you profile the database and the SQL against it, you may see random errors such as the following:
I have also noted that on re-start of the instance the DB may come up correctly with no error, but there will come a time when you receive this error, and the database is marked as suspect:
Using trace flags to override recovery is not effective. Following the standard approach to dealing with suspect databases (see contents page) also failed. I also tried copying the database files and re-attaching under a new name, but again we receive the error:
EXEC sp_attach_db @dbname = N'recovertest',
@filename1 = N'c:\temp\yyyy.mdf',
@filename2 = N'c:\temp\xxxx.ldf'
Server: Msg 5180, Level 22, State 1, Line 1
Could not open FCB for invalid file ID 0 in database 'recovertest'.
Connection Broken
Attempting to detach the now suspect database tells us the database does not exist, as does a drop database command. Even so, dropping the database via Enterprise Manager was fine.
In the end, a simple database restore from the last full backup was fine. Rolling forward
on the logs requires careful analysis of the SQL Server log files to determine at what
point to stop the recovery before the problem occurred. Take the time to check for
physical device corruption.
Suspect Database (part 5) – drop index makes database suspect?
A good friend of mine running developer edition of SQL Server, found his database in
suspect mode after dropping an index from a very large database table (DB was over
40Gb on this low spec’ed PC). Unfortunately he had no record of the sql server error log
entries, and there was nothing within the Windows event log. He also set the database
to emergency mode.
To effectively resolve suspect databases, you really do need the error entries from around the time of the suspected problem; without them it can be difficult to determine the path to take. In this case, the database files were not deleted, but the dropping of the index may have resulted in corruption of the data or log files (only two files for this database) OR the classic 9002 out-of-space errors.
Attempting to run CHECKDB over the 40Gb database was taking a huge amount of time, in the order of 8hrs plus from our estimates. Due to the urgency of the fix, this was deemed not an option. The final solution was a restore from a previous backup; in the mean time, we attempt to get the database back online.
Let us sidetrack for a moment. Before we attempt any action, the DBA must:
a) full backup the master database, or better still the master data and log files
b) backup/copy the suspect database's data and log files
The DBA can attempt to set the database mode from suspect to emergency via:
update sysdatabases
set status = status | -32768
where name = 'northwind'
-- Status now = -32740
and set it back to “normal” by setting the STATUS column to a value of 24. There is no
guarantee of course that playing with status fields will achieve the database property you
desire (i.e. from suspect to open and normal!). Note that bit 256 in the status column
represents a suspect database (not recovered); to see a list of all possible bit values look
up the BOL or query:
select * from master..spt_values
So setting the status to a raw 256 only forces the database into suspect mode.
If the database is truly suspect then any SQL will leave it as suspect. Attempt to
checkpoint the database and re-check the database. If no change/still-suspect, attempt
to run dbcc checkdb or restart the SQL instance and check the error log. Classically, suspect databases in which the database files are available and named correctly indicate either:
a) free disk space problem preventing effective transaction logging
b) corrupt virtual transaction log(s)
An effective way to test transaction log validity is to:
a) sp_resetstatus
b) dbcc dbrecover (see part 3)
Back to our example now ☺
The database is in emergency mode, and we suspect either a data or log file corruption.
The above steps are recommended over what we are going to do below as an example.
In an attempt to fix the problem, we will sp_detach_db the database. On doing this we
found the database currently in use:
To get an idea of SQL 2k wait types, consider http://sqldev.net/misc/waittypes.htm. This latch type is a shared latch typically taken whilst allocating a new page, perhaps due to a space issue OR high contention for the resource. The resource column holds the value 2:1:24, which can be decoded to DBID, file and page ID. The issue here is not so much SPID 55, but SPID 53; note its wait-type, referencing DBID 7, which is our suspect database.
NOTE – look at the sysprocesses table carefully, especially for blocking process that
may relate to the underlying reason for the suspect. This MS support document
contains queries for viewing blocking data:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;283725
In this case we killed the SPID 55 process and attempted the command once again:
The message is not overly encouraging, and this turned out to be a HUGE mistake as we
will see.
When we attempted to attach the database, the following Windows event messages
appeared in the application log (note that we are attaching the database with a new name):
From the messages we believed the transaction log was corrupt. We attempted to use
the command sp_attach_single_file_db but with no luck:
With no way to re-attach the database without error, we have no other choice but to:
a) shutdown the instance
b) copy back the database files from backup (master and our problem database)
c) re-start the instance
Once in emergency mode, we can BCP out the data etc. We did not try it, but as we
knew the log file was corrupt, we could have tried the sp_add_log_file_recover_db
command and attempted to remove the old log file (covered elsewhere in this ebook).
How do I rename a database and its files?
Here is an example where we have changed the name of our prototype application from
“testapp” to “trackman”. We also want to:
a) rename its database
b) rename the database's logical and physical filenames
c) fix/check logins and their default database property
NOTE – I do not cover it, but linked servers, publications, cross database chains
etc may be invalidated with the change. Replication should be addressed before
you rename the database.
To rename the database:
-- Remove users before attempting the rename, to avoid the error:
-- The database could not be exclusively locked to perform the operation.
alter database testapp set restricted_user with rollback immediate
exec sp_renamedb 'testapp', 'trackman'
The good thing about this command is that the rename will take the default database
property for selected logins with it. So steps a) and c) are now complete.
Next we will attempt to modify the file, filegroup and logical names of the database files.
Be aware that the alter database command is the key, but for some strange reason the
filename clause of the rename only works for tempdb files and no others, so this
command:
alter database trackman
modify file
(name = 'testapp_system',
newname='trackman_system',
filename='c:\work\trackman_system.mdb')
Gives you this error:
Server: Msg 5037, Level 16, State 1, Line 1
MODIFY FILE failed. Do not specify physical name.
So rename each logical file name for each file:
alter database trackman
modify file
(name = 'testapp_system',
newname='trackman_system')
The file name 'trackman_system' has been set.
Repeat this for each logical file.
To rename the filegroups we run:
alter database trackman modify filegroup DATA name = trackman_data
Repeat for each filegroup. Remember the transaction log does not have one.
To rename the database files:
a. Detach the database using Enterprise Manager (right click the database, All Tasks,
Detach Database), or use sp_detach_db
b. Rename the physical files via Windows Explorer or the command line
c. Re-attach the database using EM (right click the Databases folder) or use sp_attach_db
Alter the database to MULTI_USER mode as required.
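Pulling the steps together, here is a minimal T-SQL sketch of the detach/rename/attach
cycle (the file paths and xp_cmdshell rename lines are illustrative assumptions only;
adjust to your own locations):
use master
go
exec sp_detach_db 'trackman'
go
-- rename the physical files at the OS level, for example via xp_cmdshell:
-- exec xp_cmdshell 'rename c:\work\testapp_system.mdf trackman_system.mdf'
-- exec xp_cmdshell 'rename c:\work\testapp_log.ldf trackman_log.ldf'
exec sp_attach_db @dbname = 'trackman',
    @filename1 = 'c:\work\trackman_system.mdf',
    @filename2 = 'c:\work\trackman_log.ldf'
go
alter database trackman set multi_user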
Database is in “Loading” Mode ?
The DBA may see something like this:
This typically occurs when the database has been restored to an inconsistent state in
which it is still pending full recovery. Attempting complete recovery may give you
something like:
restore database nonefs with recovery
Server: Msg 4331, Level 16, State 1, Line 1
The database cannot be recovered because the files have been restored to
inconsistent points in time.
Verify your restore order carefully before attempting the restoration again, and use the
NORECOVERY and RECOVERY clauses appropriately.
Restore with file move
Here is a simple example:
RESTORE DATABASE [nonefs] FROM DISK = N'C:\aa.bak'
WITH
FILE = 2,
NOUNLOAD ,
STATS = 10,
RECOVERY , MOVE N'nonefs_Log' TO N'f:\nonefs_Log.LDF'
Restore to a network drive
To use database files over a network device, start the instance with trace flag 1807.
Otherwise you will receive the error:
"File mydb.mdf is on a network device not supported for database files."
Restore a specific File Group
Database: mydb

File-group name    Physical file-name
mydb_primary       c:\mydb_system.bak
mydb_data          c:\mydb_data.bak
mydb_index         c:\mydb_index.bak
N/A (log)          c:\mydb_log.ldf

Backups:
C:\mydb_full.bak    Full
C:\mydb_log1.bak    Log
C:\mydb_diff1.bak   Differential
C:\mydb_log2.bak    Log
C:\mydb_log3.bak    Log
{failure occurred}
IMPORTANT – You cannot do file/file-group backups for databases using a simple
recovery model
If the mydb_data file-group failed/is corrupt (the logical name of the filegroup and the
logical name of the file are the same in this case), we need to restore it.
If you attempt the restore and there is no current transaction log backup, you will get
this error:
Therefore, begin by running a transaction log backup against the database. So our
backup list changes to this:
Backups:
C:\mydb_full.bak    Full
C:\mydb_log1.bak    Log
C:\mydb_diff1.bak   Differential
C:\mydb_log2.bak    Log
C:\mydb_log3.bak    Log
{failure occurred}
C:\mydb_log4.bak    Log
Before attempting the restore (and possibly getting the same message again), you should
alter the database and place it in restricted mode, so users cannot connect whilst the
database recovery is completed.
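For example, a minimal sketch of those two preparatory steps (file names follow the
scenario above):
-- stop users connecting whilst the recovery is completed
alter database mydb set restricted_user with rollback immediate
-- take the current (tail) transaction log backup
backup log mydb to disk = 'C:\mydb_log4.bak'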
If we attempt to restore, say up to mydb_log3.bak, you will get something like this:
Why? Basically, all other filegroups are further forward in time (LSN) relative to the
filegroup we are attempting to restore. As such, the DBA must select the option:
or in other words, NORECOVERY. Alternatively, use the STANDBY clause. The entire
database is effectively read-only at this point due to the incomplete recovery of this single
file-group.
To complete the recovery, the restore list is:
a) mydb_full (mydb_data filegroup only)
b) mydb_log1
c) mydb_log2
d) mydb_log3
e) mydb_log4 (with RECOVERY)
IMPORTANT – Note that we don’t use the differential backup to complete the
recovery in this scenario.
-- File group from FULL backup
RESTORE DATABASE [mydb]
FILE = N'mydb_data',        -- logical name of the file in the FG
FILEGROUP = N'mydb_data'    -- optional if only 1 file in the FG
FROM DISK = N'C:\mydb_full.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, NORECOVERY
-- Log backup @ time 1, restore logs as normal
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log1.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, NORECOVERY
-- Log backup @ time 2
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log2.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, NORECOVERY
-- Log backup @ time 3
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log3.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, NORECOVERY
-- Log backup @ time 4
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log4.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, RECOVERY
Once complete, do a final LOG or FULL backup.
Adding or Removing Data Files (effect on recovery)
Consider the situation where a database file has been added to the database between
transaction logs. Therefore we have this scenario:
Backups:
C:\mydb_full.bak    Full backup
C:\mydb_log1.bak    Log backup

-- new file added to database
ALTER DATABASE mydb
ADD FILE
(
NAME = mydb_newfile,
FILENAME = 'c:\mydb_newfile.mdf',
SIZE = 1MB,
FILEGROWTH = 10%
)
GO

C:\mydb_log2.bak    Log backup
{failure occurred}
To restore we need to:
RESTORE DATABASE [mydb]
FROM DISK = N'C:\mydb_full.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, NORECOVERY
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log1.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, NORECOVERY
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log2.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, RECOVERY
The completed restore will show the newly added file with no further issues. Be aware,
though, that Microsoft Support document Q286280 states otherwise, and there may be a
scenario where the above does not work. Revisit this support document for assistance.
Emergency Mode
This mode is undocumented and is technically unsupported, but is required on very
rare occasions. This mode allows the DBA to access a database without the log file
being present.
-- Allow updates to sys tables
exec sp_configure N'allow updates', 1
reconfigure with override
-- If possible, attempt to set the db in DBO-only access mode (for safety's sake)
exec sp_dboption N'Northwind', N'dbo use only', N'true'
-- Record the existing record entry for the database
SELECT * FROM master..sysdatabases WHERE NAME='northwind'
-- Set DB into emergency mode
UPDATE master..sysdatabases SET STATUS=32768 WHERE NAME='northwind'
-- Stop and re-start MSDTC, then refresh Enterprise Manager
Attempting a backup or any other operation that uses transactions will result in the error:
To export out the data and associated objects, create a blank database in the same or
another database instance. Once done, run the Export wizard, select the database in
emergency mode and follow the prompts. A DTS package will be created and will happily
export the database, typically without error, so long as there are no underlying permission
issues.
Drop the source database as need be.
This is a very simplistic example but provides some direction towards dealing with the
problem.
NOTE – Setting a database to emergency mode is very handy when a suspect
database won't allow you to investigate the problem via DBCC commands etc.
Altering the status to emergency mode and then running, say, DBCC CHECKDB will
give you access to the database and let you execute a variety of commands to resolve
the problem.
Restore Full Backup
For user databases, I tend to opt for EM as it’s simple and quick. Before restoring by
any method always check:
a) can I backup the database before the restore? (i.e. yes)
b) notification of end-users and killing sessions
c) database name
d) location and name of the files
e) remembering to fix orphaned logins if restoring to another server
f) re-checking the database recovery model and associated options
g) verifying subsequent backups will still operate as per normal
h) always write down in a log what you did, why and the files used.
No example is required for this scenario.
Partial (stop at time) PITR Restore on a User Database
To restore to a point in time, ending at a specific transaction log backup in your backup
sequence, we use the STOPAT command, for example:
RESTORE LOG [mydb]
FROM DISK = N'C:\mydb_log2.bak'
WITH FILE = 1, NOUNLOAD , STATS = 10, RECOVERY ,
STOPAT = N'8/08/2002 9:42:02 PM'
Use the GUI, or the commands:
restore headeronly from disk = 'C:\mydb_log1.bak'
restore headeronly from disk = 'C:\mydb_log2.bak'
and the backupfinishdate column to determine the most appropriate log files to be
used.
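Alternatively, a hedged sketch against the msdb backup history tables returns the same
information for all backups of the database in one hit:
select bs.database_name, bmf.physical_device_name,
       bs.type, bs.backup_finish_date
from msdb..backupset bs
join msdb..backupmediafamily bmf on bs.media_set_id = bmf.media_set_id
where bs.database_name = 'mydb'
order by bs.backup_finish_date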
Corrupt Indexes (DBMS_REPAIR)
The DBA should be regularly running the following against all databases:
DBCC CHECKDB
DBCC TEXTALL
DBCC CHECKCATALOG
DBCC CHECKALLOC
These routines will report on allocation inconsistencies with tables and indexes that
typically point at data corruption. Even so, don’t be too quick to react. Before doing
anything always full backup the existing databases and try the following:
DBCC CHECKDB('mydatabase', REPAIR_REBUILD)
a. Kill off all users or wait till they disconnect
b. exec sp_dboption 'northwind', 'single user', 'true'
c. DBCC CHECKDB('northwind', REPAIR_REBUILD)
d. exec sp_dboption 'northwind', 'single user', 'false'
Also try DBCC CHECKALLOC.
IMPORTANT – Do not use dbcc dbrepair
If you are getting desperate, Microsoft has an undocumented command (typically
suggested by Microsoft support) called sp_fixindex. Restart the instance in single user
mode, checkpoint, run sp_fixindex, checkpoint again and backup once more. Re-start
the instance and re-run the DBCC routines.
See Microsoft support document Q106122 for more information.
Worker Thread Limit of N has been reached?
The DBA can configure the number of worker threads available to core SQL processes
such as handling checkpoints, user connections etc. The threads are pooled and released
quickly, therefore the system default of 255 is rarely changed. If the value is exceeded,
you will receive the limit message in the SQL Server log.
To resolve the issue:
a) review why so many threads are being used and be convinced it is not simply an
application in error.
b) use the sp_configure command to change the value
exec sp_configure                              -- check the current value
exec sp_configure 'max worker threads', 300    -- set the new value
reconfigure                                    -- force the change
Reinstall NORTHWIND and PUBS
Run the scripts found in the /install directory for the instance:
Instnwnd.sql
Instpubs.sql
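A minimal sketch using osql (the server name and install path are illustrative
assumptions):
osql -E -S myserver -i "C:\Program Files\Microsoft SQL Server\MSSQL\Install\instnwnd.sql"
osql -E -S myserver -i "C:\Program Files\Microsoft SQL Server\MSSQL\Install\instpubs.sql"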
Some of my replicated text/binary data is being truncated?
The Max Text Repl Size option allows you to specify the size (in bytes) of text and image
data that can be replicated to subscription servers. The DBA can change the default
value via the Max Text Repl Size option:
a) Run Query Analyser
b) Connect to the SQL Server.
c) Run the following
exec sp_configure 'max text repl size', 6000000
go
reconfigure
go
Other Recovery Scenarios
Scenario 1 - Lost TEMPDB Database
If you delete the tempdb and templog database files, they are simply re-created on
instance startup, assuming of course the model database is available and the disk
sub-system has sufficient free space:
It is created based on the entry in master..sysdatabases
use tempdb
go
sp_helpfile
go
The DBA can move this location via the commands below and re-starting the
instance.
use master
go
alter database tempdb modify file (name = tempdev, filename = 'c:\tempdb.mdf')
go
alter database tempdb modify file (name = templog, filename = 'c:\templog.ldf')
Go
File 'tempdev' modified in sysaltfiles. Delete old file after restarting SQL Server.
File 'templog' modified in sysaltfiles. Delete old file after restarting SQL Server.
Note that immediately after the alter statement, the entries in master..sysaltfiles,
master..sysdatabases and master..sysdevices remain unchanged. On restart, the
tempdb files have moved to their new location and sysaltfiles and sysdatabases have
been altered; only the entry in master..sysdevices remains unchanged.
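A quick sketch to verify the new file locations after the restart:
select name, filename
from master..sysaltfiles
where dbid = db_id('tempdb')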
If the device in which the tempdb datafiles are created is no longer available, the
instance will not start, as there is no other default value SQL Server will magically use.
To resolve this problem we need to use the rebuildm.exe (see Scenario 2.)
Scenario 2 - Rebuildm.exe
There comes a time in every DBA’s life where the rebuildm.exe (rebuild master) utility is
used, either to change the instances global collation or due to a disaster in which one or
more system databases need to be restored and we don’t have a valid or any full backup
(this should never happen for any reason).
The rebuildm.exe is found on the installation CD, cd-rom:\x86\binn. In the following
example we will run the command and highlight the subsequent steps to complete the
recovery.
NOTE – If copying the CD to disk, make sure the files in ?:\x86\data\ are not
read-only or have their archive bit set.
A digression - when using disk two and running rebuildm.exe, I received the
following error:
To get around this unforeseen problem I copied it to disk and renamed the directory
c:\x86\binn\res\1033 to c:\x86\binn\Resources\3081. The utility then ran without a
problem.
REMEMBER – DON’T restore your master database after running rebuildm.exe if
the objective was to alter the server collation. Always backup as much as possible,
and consider scripting logins before attempting this process.
The steps involved are:
1. Shutdown the instance we plan to rebuild.
2. Run the rebuildm.exe from the CD-ROM or do the above and copy to disk (not a
bad idea generally during emergency recovery scenarios). The following dialog
is shown:
[Dialog callouts: the instance whose system databases will be restored over; the source
of the CD-ROM default database files (the DBA can also set the new collation here); the
default data directory for the installation, which cannot be altered.]
3. Press the rebuild button and respond yes to the prompt
4. The database files are copied to the new destination and the “server
configuration progress” dialog is shown, this takes around 1-2mins maximum.
Try This – Run FileMonitor from www.sysinternals.com to view the file IO and
thread calls during this process.
5. Don’t be fooled. This process affects ALL system databases not just the master
database.
6. Check data file properties before re-starting to ensure they are not read-only.
7. Start your instance
8. Review the previous and current error log. The previous log has some good
information about the tasks undertaken with the system databases rebuild.
9. Optionally re-apply service packs
10. Optionally restore your master, model, msdb databases as need be
Before you re-start the instance with the files copied by rebuildm.exe, double check
they are not read-only. This is a common problem when the files are copied off the
CD-ROM. If this problem affects the use of rebuildm.exe itself, then copy the files to disk
and refer to point two above.
Be careful when only restoring one or two of the system databases. All system
databases should be current with a single service pack; you do not want, for example, a
restored master database with SP2 applied sitting alongside an MSDB database with no
service packs. The DBA should think very carefully about this and apply the service pack
as required to ensure a minimal amount of error.
Scenario 3 – Lost all (or have no) backups, only have database files
To recover from this scenario:
1. Backup all database files (if available) to another server and/or to tape.
2. Check the registry for the MASTER database, and alter and/or ensure the
files are in the correct registry entry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server\<instance name>\MSSQLServer\Parameters {SQLArg0 and 1}
3. Attempt to re-start the instance
4. If there are still errors with system databases, namely MSDB, MASTER or
MODEL, check error log carefully and attempt to place database files at
these locations.
5. If you have no luck, run rebuildm.exe (see previous scenario)
6. The instance should successfully start
7. For the MSDB database:
8. Shutdown the SQL Agent service
9. Drop the MSDB database
10. Re-attach from your original database files
11. For each user database re-attach database files
12. Fix orphaned logins as need be (if any)
13. Run DBCC checkdb and checkalloc against all databases
14. Check database recovery models
15. Backup databases
The DBA should revise trace flags on instance startup to assist in the task.
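For example, a hedged sketch of an emergency command-line start (the path is an
illustrative default; -c runs outside the service control manager, -m starts single user
mode, and trace flag 3608 recovers only the master database):
cd "C:\Program Files\Microsoft SQL Server\MSSQL\Binn"
sqlservr.exe -c -m -T3608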
Scenario 4 - Disks lost, must restore all system and user databases from backup to new
drive/file locations
This is a difficult scenario. In order to start the instance, we require a valid master
database; this database also defines the subsequent location of the MSDB, MODEL
and TEMPDB database data files. If we restore the master database from our full
backup (with the move option to another disk), the sysdatabases, sysaltfiles and
sysdevices system tables will still contain invalid paths for the other system and
user databases, as we lost those particular disks. This is made even worse because
any time you restore the master database the instance shuts down immediately;
an instance re-start will then result in numerous file-missing errors and the instance
will fail to start.
This may bring mixed comments from DBA’s, but consider the following:
1. Run rebuildm.exe to restore the system databases onto the new disk(s)
2. Recover the MSDB database from the last full backup
3. Recover the MODEL database (if very different from the original)
4. Restore the master database from the full backup as master_old
5. Alter the system to allow changes to system tables
6. Transfer the contents of syslogins from master_old to the master database
7. Re-start the instance
8. Check the system error log
9. Recover user databases from full backups
INDEX
:
::fn_get_sql() · 203

A
accountability · 23
active/passive mode · 126
air conditioning · 330
authority · 23
Autonomic computing · 38
availability measures · 21

B
Backing Up with Tape Drives · 328
backup: verify · 256
Backup: history of · 261; no more space · 262
BACKUP DATABASE · 173
BACKUP LOG · 175
Backup schedule · 15
backup set · 171
blade center · 335
blade server · 335
Business Continuity · 7
business priority · 14

C
cable management · 335
change control form · 32
change management system · 29
CHECKALLOC · 298
CHECKDB · 273, 298
checkpoint · 87
CoBIT · 17
comclust · 158
communications manager · 12
compressed drive: database support · 217
Corrupt: indexes · 297
Counters: performance monitor · 211
Crisis and Disaster Contact Details · 13
crisis manager · 12

D
DAS · 325
Database: Database ID's · 253; Diagrammer · 255; Scripting · 254
database administration roles · 25
database diagram: transfer · 98
Database Maintenance Plans: differential backups · 174
database recovery models · 167
DATABASEPROPERTYEX · 78, 250
DBCC DBCONTROL · 233
dbcc inputbuffer · 100
DBCC INPUTBUFFER · 97
DBCC LOG · 70
dbcc loginfo · 71
DBCC OPENTRAN · 203
dbcc rebuild_log · 234
DBCC REBUILD_LOG · 210
DBCC SHRINKFILE · 265
dbcc sqlperf · 82
DBCC TRACEON · 88
DBCC TRACESTATUS · 88
DBCC UPDATEUSAGE · 194
DBCONTROL · 250
dbrepair · 298
Deadlocking · 92
detach · 250
detach system databases · 275
DFS · 215
diagrammer: scripting · 255
differential backup · 174, 181
directory structure: standards · 161
Disaster recovery · 7
disaster recovery planning · 6
distributed file system: for database files · 215
DR documentation · 12
DTS: dtsrun · 239; troubleshooting · 234
DTS package: backup · 182

E
ECID · 94
EFS · 215
emergency response team · 23
EMPTYFILE · 266
Encrypted File Systems: for database files · 215
error log · 90

F
FCB · 286
filegroup: recovery · 240
FILELISTONLY · 181
full backup · 173, 180
Full Text Indexing: cluster · 132

G
ghost record cleanup · 263

H
HBA · 307
HCL · 110
High Availability · 110
host bus adapter · 307
hostname change · 213
hot backup · 172
hot fixes · 37

I
iFCP · 309
isalive ping · 150
iSCSI · 309, 312
isql · 78
ITIL · 18

K
KILL command · 99
Killing users · 258
KVM switch · 334

L
License Manager · 79
licensing mode · 79
linked server: enlist error · 221
locked out: sql server · 196
log writer · 67
logical database filename · 163
LSN · 68
LUN configuration · 310

M
maintenance plan: backups · 169
maintenance plan backup · 169
Master Recovery Plan · 15
media set · 171
meta data functions · 75
Microsoft support services · 189
MOF · 19
Moving: master database · 246; MSDB and MODEL databases · 246; TEMPDB database · 248; User Databases · 249
MRAC · 39
MSDB: install script · 274
MSDTC · 217

N
naming standards · 234
NAS · 311, 312, 326
Network Attached Storage · 311, 312
Northwind: re-install · 298

O
OBJECTPROPERTY · 81
olap: backup · 177
OLAP cubes · 270
OLE-DB providers · 224
orphaned session · 97
orphaned sessions · 96
osql · 78

P
parallel testing · 14
parameter sniffing · 223
personal edition · 220
physical backup device · 171
power distribution units (PDU's) · 334
PRINT · 226
production server · 36
Profiler · 92

Q
Quality Online Service · 110

R
rack heights · 334
racks · 333
racks side rails · 334
RAID · 108
RAID configurations · 320
RAISERROR · 226
re-attach · 251
reconfigure · 80
Recovery · 257
recovery interval · 165
recovery manager · 12
recovery scenarios · 188
Recovery strategy · 14
REMOVE FILE · 208
rename a database · 291
REPAIR_REBUILD · 298
responsibility · 23
restoration history · 85
Restore: emergency mode · 296; file added · 295; file group · 293; filelistonly · 262; full backup · 297; loading mode · 292; logical names · 262; master database · 277; MSDB and MODEL databases · 279; point in time · 297; rebuildm.exe · 300; scenarios · 271; suspect database · 282; tempdb · 299; with file move · 292
RETAINDAYS · 173
robotic tape racks · 329
rollback and redo · 67

S
SAN · 109, 307, 325
SATA · 315
Scripting · 254
serial advanced technology attachment · 315
SERVERPROPERTY · 79
Service Control Manager · 151
service level: metrics · 21
service pack: hangs · 269
service packs: rollback · 267
setspn · 202
shrink: log files · 204
Shrink Database · 264
shrink TEMPDB · 263
Shutdown instance · 249
simulation testing · 14
sp_attach_db · 251
sp_attach_single_file_db · 252
sp_change_users_login · 96
sp_configure · 80
sp_cycle_errorlog · 91
sp_delete_backuphistory · 172
sp_dropserver · 214
sp_enumerrorlogs · 91
sp_MSdependencies · 244
sp_MSforeachtable · 81
sp_procoption · 83
sp_refreshviews · 225
sp_spaceused · 82
spt_values · 288
SQL Server Named instance: cluster · 134
SQL Server Wizard · 135
storage virtualization · 323
streaming directly to tape · 184
Striping · 162
support services · 189
Suspect Database · 282
sysdtspackage · 236

T
tablespace backup · 176
Tape drive · 327
tempdb: in RAM · 240
TEMPDB: shrinking · 263
test server · 34
time difference · 141
timeout: ado · 226; com+ · 227; IIS · 229; oledb · 229; SQL Server · 230
TOE · 331
trace flags · 86
Tracing: black box · 82
transaction coordinator · 219
transaction log: consolidate files · 207; shrink · 205
transaction manager · 72
transfer logins · 99
truncate: transaction log · 176
TRUNCATE_ONLY · 176

U
User: database ownership · 97; orphaned · 95, 96

V
VDI · 178
VERIFYONLY · 261
VeriTest · 110
virtual device interface · 178
virtual IP · 128
VSAN's · 323
VSS · 31, 40: .Net · 60; branching · 48

W
walk-through · 14
warm standby · 135
write log entry · 70

X
xp_fileexist · 168
xp_fixeddrives · 82
xp_readerrorlog · 91
Appendix A
Understanding the Disk, Tape and Storage Market
Throughout this section we will cover some theory of SAN and NAS based storage
solutions and revisit RAID. Why? Well, clustering and high availability in general are
based around these fundamentals, and it is important for the DBA to be aware of the
technologies in play, and of how large scale systems architectures may affect the way
you build, manage and performance tune your database servers.
SAN (Storage Area Network)
A SAN is a high-speed sub-network (separate from your existing network infrastructure,
or direct fibre) of shared storage devices. The peripherals (drives) are interconnected by
Fibre Channel (FC) or SCSI.
The storage devices themselves are cabinets with a large number of interconnected drive
bays, supported RAID levels, power, IO controllers, and network adapters or host bus
adapter cards (SCSI or fibre), plus management software and an operating system with a
variety of interfaces, be they web based, terminal services or other API calls over TCP.
The SAN device is connected to front-facing switches in which servers connect via host
bus adapter (HBA) cards. For example:
[Diagram: a server's dual HBAs (for multi-path IO) connect via fibre cable through 8-port
2Gb/sec switches (the SAN fabric) to the SAN, alongside the regular network. Sourced
from http://www.bellmicro.com/product/asm/spotlights/sanplicity/]
The HBA and operating system drivers provide the host/server with access to the SAN
and offload block-level storage I/O processing from the host's CPU(s). The devices are
highly intelligent, high throughput IO processors. The HBAs exist at both the SAN
storage array and the server:
Chapter 13 – Architecture Overview, Storage Networks: The Complete Reference, R.Spalding, 2003, Figure 13-3.
Chapter 13 – Architecture Overview, Storage Networks: The Complete Reference, R.Spalding, 2003.
The HBAs connect the server to the SAN. Two or more interconnected switches create a
SAN Fabric. The fabric is designed for redundancy, performance and scalability. The
switches themselves include intelligent operating systems for management, monitoring
and security.
New switch technology allows for iFCP and iSCSI connectivity (discussed later) from the
client's standard Ethernet adapters over IP, rather than fibre- or SCSI-specific HBAs; this
offers greater flexibility in terms of connectivity and cost, which is a major issue in
fibre networking.
The SAN itself is managed by highly intelligent software coupled with a large internal
cache that tends to grow with SAN capacity. The SAN typically requires specialist
training for administration and performance work. Be aware that vendors may not
bundle all core administrative software with the SAN; it can be a costly addition at a
later date.
The following diagram provides a good logical overview of the SAN internals:
[Diagram: logical overview of SAN internals. Callouts: vendors may restrict physical
loops to a specific number of drives, such as a maximum of 7 disks in a RAID-5 – RAID
restrictions may apply; the logical drives are otherwise known as LUNs; cache of e.g.
8Gb read, 300Mb write. Sourced from Server Clusters: Storage Area Networks – For
Windows 2000 and Windows Server 2003, figure 10]
The virtualization of storage is the key, through LUNs (logical units) that are typically
seen as basic disks under the Windows OS disk management applet.
Server Clusters: Storage Area Networks – For Windows 2000 and Windows Server 2003,
figure 15
The administrator should take the time to evaluate vendors in terms of:
• Licensing and maintenance model
• Cache size and upgrade paths
  o Be very careful here, as upgrades can result in a new per-terabyte or other
    licensing model that can be very costly
  o Maximum disk and cache capacity
  o Disk costs and vendor buy restrictions
• SCSI and Fibre Channel support, along with switch compatibility
• LUN configuration
  o Internal limits on size? Minimum and maximum sizes?
  o Channel and/or loop restrictions in terms of physical disk connectivity
• Ability to inter-connect SANs for added resilience and storage
• Technologies like PPRC and FlashCopy to replicate, in near real time, block-level
  storage from one SAN to another
• RAID types
  o RAID types supported? Many support RAID-0 or 5 only.
NOTE – Microsoft supports booting from SANs but do note the restrictions in KB
305547.
Generally speaking, I tend to lean heavily on the senior systems administrators in terms
of actual configuration. Even so, the DBA should be confident in understanding the
performance issues of RAID arrays, how your database files will be created over the
array, striping size issues, LUN configuration and HCL issues (especially in MSCS
clusters), and most importantly the effect of large SAN disk cache on performance.
Example SAN Configuration
The following is an example of a SAN configuration for high availability. This is based on
a dual data center in which the primary SAN is duplicated to the standby SAN via PPRC
(point to point remote copy).
[Diagram: dual data center SAN configuration. Server A, with dual SCSI or fibre HBAs
for multi-path I/O and teamed NICs, connects through gigabit Ethernet switches and
redundant FC switches; backup agents (via SNMP etc.) feed enterprise backup software
and a SCSI DLT/LTO tape library; shortwave fibre links the fabric to SAN 1; PPRC
transmits block-level storage over single-mode duplex fibre (9 micron) via converters to
SAN 2; FlashCopy maintains a real-time replica of stored files (for dev/test).]
What is NAS (Network Attached Storage) ?
A Network Attached Storage device (NAS) is a dedicated server with a very large [SCSI,
SATA] hard disk capacity, a cut-down OS, management tools via a web interface, and
teamed network adapters to facilitate direct connection to your existing Ethernet network
infrastructure, supporting numerous protocols including iSCSI, NFS, SMB and others.
The client may be able to:
a) map drives directly to the NAS
b) overlay additional storage virtualization technology over the NAS devices
such as Windows DFS or hardware based virtualization. Therefore the
clients know nothing of the physical NAS.
c) talk iSCSI directly or via a SCSI/iSCSI gateway
d) Mixture of the above.
The NAS device has been a god-send for many businesses that do not have the money to
deploy large scale SANs, but still want the advantages of a consolidated and scalable
storage infrastructure. The NAS is typically a plug and play solution from most vendors,
with a variety of pre-packaged or purchased add-ons for backup, fault tolerance, different
RAID configurations and expandability. But NAS performance is clearly far below that of
SAN solutions (every passing month this changes, of course), so take care if you are
performance conscious. You do require expertise to assist in the NAS device selection
and its associated impact on the network infrastructure.
Another item worth considering is that of expandability and management of the NAS.
Multiple NAS may require individual administration, and may also result in vendor lock-in
when purchasing more capacity or clustering the NAS.
The NAS is not recommended for heavy IO database solutions, but is a very cost effective
mass storage solution for many small to mid-sized companies. The real value-add comes
with its relatively "plug-and-play" setup and configuration, its ease of extensibility, and
its ability to leverage your existing network investment without the issues of DAS (direct
attached storage) becoming inaccessible because the server it is connected to is down,
or of investing in expensive switch technology or HBAs.
With virtualization of disk resource and management, the NAS will have a well earned life
within the organization for many years to come.
What is iSCSI?
The iSCSI (internet small computer system interface) protocol (ratified by the IETF) is all
about the (un)packing of SCSI commands in IP packets. The packets hold block-level
data commands, are decoded by the appropriate end-point driver, and are interpreted as
if the SCSI interface was directly connected. This technology is a key driver for NAS
(Network Attached Storage) devices.
Benefits of iSCSI:
• No need to invest in another cable network (aka fibre)
• No investment required in dedicated, protocol-specific switches; we can use standard
  Ethernet based network cards with iSCSI drivers
• Does not have the distance issues experienced with Fibre Channel (10km reach)
• Can be scaled in terms of the speed of the IP network (100Mbps to 1Gbps to 10+Gbps)
• Stable and familiar standards
• High degree of interoperability
NOTE - The iSCSI device, like DAS and SAN attached storage, knows nothing
about the "files," but only about "raw I/O" or blocks. The iSCSI appliances are well
suited to general purpose storage applications, including file I/O applications.
Some issues to be aware of:
• Current iSCSI SAN and NAS support is questionable; check vendors carefully
  o Particularly iSCSI host adapters, disk arrays and tape-aware libraries
• Network management staff will start to take on many of the storage QoS functions in
  terms of packet loss, latency, bandwidth and performance etc.; a role shift for storage
  professionals?
• Impact on CPU performance at the server/NAS device
  o TOE (TCP offload engine) cards may be used
• Impact on your existing network infrastructure and switch capacity; such
  implementations typically share your existing core IP infrastructure and are not
  running separate physical cabling
• Latency may occur over your IP network; a 2 or 3% error/latency rate will significantly
  impact your iSCSI implementation, and the underlying drivers and network OSs must
  be able to manage the data loss
Like fiber and iSCSI connects, the storage can be virtualized (logical rather than physical)
for redundancy. The Windows 2003 family of servers is iSCSI aware.
NOTE – The only real requirement for database file creation over iSCSI storage is
the server’s ability to “see” the storage via the underlying OS.
Anything else apart from iSCSI?
In terms of IP storage network transport protocols, we have three core models that work
over block level storage:
a) iSCSI (iSCSI/IP end device, IP fabric)
SCSI commands in TCP packets over an IP network, interconnected via a
gateway (switch); be they local or remote connections to fibre SANs or
other NAS devices, even replacing the SCSI connect of DAS disk array
devices to the gateway, and the HBAs within the server.
[Diagram: a DAS array or iSCSI device connects over SCSI to an iSCSI gateway and
then over IP (WAN/Internet/LAN/MAN) direct to any other IP-enabled server or storage
device; a fibre device reaches the same IP network via its own iSCSI gateway.]
The iSCSI interface is limited by the Ethernet connection speed, many of which
are 1Gb channels, while fibre can run at 2Gbps to 4Gbps (10Gbps
is on the horizon – along with 10Gbps Ethernet).
b) FCIP (fibre end device, fibre fabric)
Tunnels fibre channel over an IP network and can push past fibre's
existing distance restrictions (at some cost in raw speed), actively
relying upon the network's packet congestion management, resend and
in-order delivery.
c) iFCP (fibre end device, IP fabric)
Fibre channel layer 4 FCP over a TCP/IP network via a gateway-to-gateway protocol.
The lower layer FC transport is replaced with TCP/IP via gigabit Ethernet.
[Diagram: an FC device establishes an FC "session" through an iFCP gateway; iFCP
over TCP/IP replaces the fibre SAN fabric components for device-to-device,
device-to-SAN and SAN-to-SAN communications.]
”… Cisco FCIP only works with Cisco FCIP, Brocade FCIP only works with
Brocade FCIP, CNT FCIP only works with CNT FCIP, and McDATA iFCP only
works with McDATA iFCP.”
There are numerous gateway (switch) vendors where you can expect to pay anything
from $17k to over $75k. The devices typically include virtualization technology
(discussed later) along with the FC-to-FC, FC-to-iSCSI, FC-Gb-Ethernet etc bridging.
NOTE - A variety of researchers are looking at alternatives to block level storage
protocols, namely something at a higher level of abstraction, perhaps at the
“object” level. (45)
Using Serial ATA over SCSI
A "new kid" on the block that evolved from the parallel ATA (or IDE) storage interface
is serial advanced technology attachment (SATA) storage. The interface is a single, thin
cable with a minimum of four wires in differential pairs in a point-to-point connection.
The key here is its small form factor, reduced voltage requirement, thin cabling up to a
one metre span, and 1.5Ghz clock rate giving around 150Mb/sec, at a comparable cost
to ATA drives (in your home PC).
NOTE - Serial ATA 300 has a 3Gb/s signaling speed, with a theoretical maximum
speed of 300Mb/s.
Many storage vendors are jumping onto the SATA band-wagon, offering NAS based SATA
storage solutions, typically with a mix of SCSI for added resilience. The key here is more
drives for your dollar, driving down the possible concerns with drive resilience and
increasing your spindle count per RAID array for even greater performance.
It is difficult to sum up the differences, but this table can be a guide:

Comparison                            SCSI                                Serial ATA
Cost per megabyte                     3 – 5c                              1 – 2c
MTBF                                  1.2 million hrs                     500k to 600k hrs
Exposure/Market penetration (2003)    80%                                 20%
Emerging/complementing technologies   SAS                                 Serial ATA II, III
Tagged command queuing                Since the 1990's                    Serial ATA II, specific vendors
Example pricing                       $1356 ($339 x 4 Cheetahs) + $379    $876 ($219 x 4 Raptors) + $159
                                      (AcceleRaid 170) = $1735            (FastTrak TX4200) = $1035
                                      (4-drive SCSI RAID array)           (4-drive SATA RAID array)
CPU usage                             Good                                Poor to Moderate

For a full review based on pricing and performance of SCSI vs SATA, see the article "TCQ, RAID, SCSI,
SATA", www.storagereview.com
NOTE – SATA is a CPU hog. Consider a TCP offload engine (TOE) NIC with
appropriate storage protocol drivers (like iSCSI) to offload CPU time. The SATA
controllers can be a major performance bottleneck.
At www.computerworld.com, L. Mearian provides this general performance summary. It
is somewhat broad at 150Mb/sec for SATA v1.0; you may find actual raw performance
somewhat less, as this figure tends to state multi-channel sustained performance over a
number of drives.
Serial ATA Takes on SCSI, L.Mearian, www.computerworld.com
NOTE – SCSI has used TCQ (tagged command queuing) since the 1990's; the
feature intelligently reorders requests to minimize HDD actuator movement.
Without TCQ a drive can only accept a single command at a time (first come,
first served). The host adapter adds commands which the controller and disk work in
unison to optimize; this is transparent to the OS. The SATA II standard includes
the provisioning of native TCQ, also known as NCQ (native command queuing).
Please remember that the drive AND controller must support TCQ.
There are a large number of vendors on the market, some of which are:
• EMC Symmetrix, EMC Centera storage appliance (uses parallel ATA drives that
  require a dedicated ATA backplane and controller)
• Hitachi Ltd Lightning arrays
• EMC CLARiiON with ATA
• Adaptec
• Sun StorEdge 3511 FC Array with SATA
• NetApp (NearStore, gFiler)
Look over the article from Meta Group titled “SAN/NAS Vendor Landscape”, 7 June 2004,
P.Goodwin. This report takes a “midterm” look at the various vendors in the context of
new technologies and current strategies.
SCSI, Fiber or iSCSI?
Well, it really depends on your specific requirements and underlying infrastructure,
budget, and service requirements. There is a very good article on the internet that is
well worth reading:
“iSCSI Total Cost of Ownership” found at:
http://www.adaptec.com/worldwide/product/markeditorial.html?prodkey=ips_tco
_whitepaper&type=Common&cat=%2FCommon%2FIP+Storage
Hard Disk Availability - Overview of RAID
Understanding RAID is a basic high availability requirement. The DBA should be savvy
with the RAID levels, and understand what they mean in terms of performance and
recoverability. In this section we cover the core RAID levels, and drill into some example
RAID configurations over SAN based implementations.
Summary
RAID 0 – Striping (no parity). The file is broken into stripes (of a user-defined size) and
sent to each disk in the array. Capacity: size of smallest disk * number of drives.
Minimum disks: 2.

RAID 1 – Mirroring/Duplexing. Each disk has a copy or replica of itself; can also
incorporate duplexing of the RAID controller card for each drive for added protection.
Capacity: size of the smaller drive. Minimum disks: 2.

RAID 2 – Bit-level striping with hamming code ECC (error checking and control) disks.
Data is striped over the data disks at the bit level and also onto redundancy disks;
redundancy bits are calculated via hamming codes (ECC) that are written and read as
the data disks are written/read. Bit errors can be effectively corrected on the fly via the
ECC. Capacity: varies, e.g. 10 data disks + 4 ECC disks (vendor specific). Minimum
disks: varies.

RAID 3 – Byte-level striping with dedicated parity. Data is striped at the byte level
across disks, typically 1024 bytes per stripe. Parity data is sent to a dedicated parity
disk; any other disk can fail and the parity disk will manage the failure. The parity disk
can be a bottleneck. Capacity: size of smallest disk * (number of drives – 1). Minimum
disks: 3.

RAID 4 – Block-level striping with dedicated parity. As per RAID 3 but at a block level
instead of bytes. Capacity: size of smallest disk * (number of drives – 1). Minimum
disks: 3.

RAID 5 – Block-level striping with distributed parity. As per RAID 4 but with no
dedicated parity disk; parity is also striped across the disks, removing the
dedicated-disk bottleneck. Capacity: size of smallest disk * (number of drives – 1).
Minimum disks: 3.

RAID 6 – Block-level striping with 2x distributed parity. As per RAID 5 but two sets of
parity information are generated for each parcel of data. Capacity: size of smallest disk
* (number of drives – 2). Minimum disks: 4.

RAID 7 – Asynchronous cached striping with dedicated parity. Not an open standard.
Capacity: varies. Minimum disks: varies.

RAID 0+1 (10) – Mirrored stripes. A mixture of RAID 1 and RAID 0; RAID 0+1 is a
mirrored config of 2x striped sets, RAID 1+0 is a stripe across a number of mirrored
disks. Capacity: (size of smallest disk * number of drives) / 2. Minimum disks: 4.
Performance/Cost/Usage
RAID 0 – Random read: V.Good; Random write: V.Good; Seq read: V.Good; Seq write:
V.Good; Fault tolerance: None; Cost: Lowest; Example usage: TEMPDB database.
RAID 1 – Random read: Good; Random write: Good; Seq read: Fair; Seq write: Good;
Fault tolerance: V.Good; Cost: High; Example usage: SYSTEM databases, LOG file groups.
RAID 2 – Random read: Fair; Random write: Poor; Seq read: V.Good; Seq write:
Fair/Avg; Fault tolerance: Fair; Cost: V.High; Example usage: Not recommended.
RAID 3 – Random read: Good; Random write: Poor; Seq read: V.Good; Seq write:
Fair/Avg; Fault tolerance: Good; Cost: Moderate; Example usage: Not recommended.
RAID 4 – Random read: V.Good; Random write: Poor/Fair; Seq read: Good/V.Good; Seq
write: Fair/Avg; Fault tolerance: Good; Cost: Moderate; Example usage: Rarely used.
RAID 5 – Random read: V.Good; Random write: Fair; Seq read: Good/V.Good; Seq
write: Poor (avg to good with caching); Fault tolerance: Ok–Good; Cost: Moderate;
Example usage: DATA or INDEX filegroups, take care with heavy writes. Most economical.
RAID 6 – Random read: V.Good; Random write: Poor; Seq read: Good/V.Good; Seq
write: Fair; Fault tolerance: V.Good/Excellent; Cost: High; Example usage: Rarely used.
RAID 7 – Random read: V.Good; Random write: V.Good; Seq read: V.Good; Seq write:
V.Good; Fault tolerance: V.Good; Cost: High/V.High; Example usage: Specialized high
end only.
RAID 0+1 (10) – Random read: V.Good/Excellent; Random write: Good/V.Good; Seq
read: V.Good/Excellent; Seq write: Good/V.Good; Fault tolerance: V.Good (RAID 0+1),
Excellent (RAID 1+0); Cost: High/V.High; Example usage: Any, high performance
READ/WRITE.

For more information consider - www.pcguide.com/ref/hdd/perf/raid/levels as at 26 Nov 2003
Disk Performance
It is typically measured by:
a) Interface type (SCSI, fibre) and its theoretical and physical limitations
b) Disk speed in RPM (10k, 15k)
c) Read and write IO queue lengths (how "busy" the drive can be in terms of raw IO)
d) Random vs serial disk read/write performance
e) Sustained vs burst data transfer in Mb/sec
f) Array type and the working spindles as a divisor or multiplier to some of the
   performance figures returned
NOTE - Measuring the raw speed of your disk sub-system is an important task. I
recommend IOMETER from Intel. The key measure here is IOs per second; this
measure can be extrapolated to Gb/hr when reviewing the speed of backups to
disk, for example:
XX IOs/sec * #-disks * stripe-size = XXX,XXX Kb/sec, which converts to XXX.X Gb/hr
Different RAID configurations and stripe sizes may show some staggering differences
in raw speed. Take care when measuring read vs write between RAID sets.
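As a hedged worked example (illustrative figures only): 150 IOs/sec * 8 disks * 64Kb
stripe = 76,800 Kb/sec (around 75Mb/sec), or roughly 264Gb/hr of sustained throughput.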
You should also take into consideration:
• External issues
  o Interface type, mode and speed (theoretical maximums, sustained transfer rate,
    burst speed)
  o System bus
  o Network interface
  o Specific RAID level support
• Internal issues
  o Controller cache size and type
  o Write cache – configuration, setup and control
  o Thermal properties of the device(s)
  o Integrated vs additional card controllers
  o Channel distribution against the RAID set
Where possible, opt for open systems for vendor independence to lower (potential) costs.
Take care not to rush your disk installation; make sure you spend sufficient time with
SCSI channel to array configuration, read/write cache settings, specific bus jumper
settings, stripe (format) size, RAID selection etc.
Database File Placement - General Rules
Consider the following:
• Try to separate your transaction logs from data filegroups where possible, reducing
  the erratic random access behaviour exhibited by database files against the serial
  transaction logs.
• Don't fall into the trap of creating multiple transaction log files in an attempt to
  stripe writes; log files do not work this way.
• Mirror your transaction log file disks (aka RAID-1, RAID-10) for maximum
  performance and recovery purposes.
• Retain a set of internal disks on which to store backups. Consider the impact of
  large backup file writes for OLTP systems against disks also shared by database
  files.
• The system databases are typically small and rarely read/write intensive, so
  consider using a single mirrored disk (RAID-1) or a RAID-5 array. For maximum
  space per $, RAID-5 with read/write cache enabled will suffice for a majority of
  systems. Generally speaking, RAID-5 is inappropriate for heavy log file writes
  (sequential in nature) and should be avoided for them.
• Large/heavily used TEMPDB – the ideal configuration is RAID-0.
• For larger databases where division of IO is important, use file-groups to break the
  database down into logical disk units, namely "data", "index", "audit", for example,
  so like database objects can be easily moved to appropriately configured arrays.
Example RAID Configurations
So let us talk about some real world RAID configurations. The DBA needs to be fully
aware of the read/write characteristics of the services you plan to run on the DBMS.
That said, take care with raw statistical figures, or even worse, the so called perfect RAID
configurations many DBA’s banter around on newsgroups and articles – many are based
on specific scenarios or the nirvana configuration for a single database within a single
instance on a server and disk pack all on their lonesome – very rare in the real world!
Over anything disk related, the DBA needs to focus primarily on system performance
gains by enhancing buffer cache utilization on reads, and enhancing writes through
effective use of indexing (not over-indexing), batching commits, and completing
transactions as quickly as possible to minimize writers blocking readers. Only through
ongoing performance tuning of SQL, stored procedures, statistics/histograms, view
management and indexing will you achieve maximum gain over the shuffling of disks,
file groups and RAID arrays.
For RAID and database files, do not be overly concerned about the transaction log files
being on their own RAID array or many database log files sharing the same array. The
log files are written serially as we know, but the key here is a RAID configuration that
does not suffer the added penalties of additional writes to maintain availability (aka
RAID-5 parity bits); we generally want the writes to complete as fast as possible, and as
such RAID-1 or RAID-10 is highly recommended where possible. Many database logs
sharing the same array is not a problem. The key here is little or no disk fragmentation
and not sharing the array with data files that may be experiencing a multitude of writes
over many pages, from many users, over many parts of the disk. Separating the logs
from the rest of the database files reduces this potential disk latency.
For the rest of the database, simply remember that RAID-5 is not as bad as many make
out – BUT – it will be the first to experience performance issues under heavy writes. The
examples below utilize RAID-5 extensively for a majority of database files. The systems
get away with this through:
a) enabling read/write cache for the RAID-5 array, at the risk of database corruption
   (a very small risk that is mitigated through effective backup/recovery)
b) keeping transactions as small as possible (in terms of records and objects affected
   and the time to run)
c) splitting indexes away from data where possible, to increase the spindle count on
   reads and writes – in parallel
d) not dumping backups to the same array or using the disks for other non-database
   related activities
e) effective SQL tuning and RAM
f) ongoing monitoring of disk queue lengths and batch jobs
g) understanding that read performance is excellent and reads will, in a majority of
   cases, be the higher percentage over writes; writes are then enhanced through
   performance tuning
Christopher Kempster
319
S Q L
S E R V E R
B A C K U P , R E C O V E R Y
&
T R O U B L E S H O O T I N G
Example System
System: 2120 users, 40-90 concurrent, 80 trans/sec avg, 8 databases, 1 instance
Hardware: Dual XEON 2.8Ghz with hyper-threading, 4Gb RAM
Disks: 8 x 15k SCSI 320 36Gb disks (external disk array); 2 x 36Gb SCSI 320 disks
(local disk); dual channel SCSI 320 RAID controller, 128Mb battery-backed cache;
read/write cache enabled
[Diagram: the dual channel controller (battery cache backup + 128Mb cache with
RAID-5, e.g. 100Gb/sec/channel over 4 channels) fronts three RAID-1 arrays and a
RAID-5 array, hosting MASTER DATA/LOG, MSDB DATA/LOG, MODEL DATA/LOG,
TEMPDB, MYDB DATA (audit tables), MYDB DATA (indexes), MYDB DATA (tables) and
MYDB LOG.]
Example System
System: 3892 users, OLTP, 5 user databases, 1 instance, using a SQL cluster (hence
the quorum disks below)
Hardware: Quad XEON 1.9Ghz with hyper-threading, 8Gb RAM
Disks: IBM Shark SAN connected, PPRC'ed to a remote backup SAN
[Diagram: LUNs or logical disks, which may span many physical RAID arrays.]
Each SAN vendor can have wide and varying physical disk configuration limitations, such
as no RAID, RAID-5 only, or a minimum set of 5 disks per array; either way, ongoing
license costs are a concern as storage grows. Be careful that you are not locked into
vendor-only drives with little avenue for long term negotiation.
To distribute IO amongst the SAN, one may adapt the scenario above such as:
Be aware that the physical array may be used by other logical LUNs, adding to the
complexity of drive and IO utilization. In any case, work with the vendor closely to
monitor and manage system configuration and performance; consider under-pinning
contracts with the vendor to persist ongoing support with suitable response times.
Virtualising Storage Management – the end game
One of the many buzz words in the storage market is that of "storage virtualization".
This falls into the market space of vendors like Cisco, Brocade and Sun Microsystems,
just to name a few. The solutions tend to be a specialized switch that supports a variety
of protocols and connection types (fibre, gigabit Ethernet, SCSI, iSCSI etc.). The
switch includes a complex API set and associated management software that allows
storage vendors to "plug in" their existing devices and translate their specific storage API
set to that of the virtualization switch, effectively creating a single virtualization
platform for a multiplicity of storage devices.
A classic example is that of the Cisco MDS 9509: with 112 ports of 2Gbps fibre channel,
it delivers a single management interface and QoS (quality of service) provisioning
within its embedded software for SAN routing. The devices themselves include hot
swappable power, are typically clusterable, and include redundant fabric controller cards.
Where this gets interesting is using VLANS for your SAN, also known as VSAN’s:
VSANs separate groups of ports into discrete “virtual fabrics”, up to 1000 per
switch. This isolates each VSAN group from the disruptive effects of fabric reconvergence that may occur in another VSAN. And, as with VLANs, routing is used
to forward frames between initiator and target (SAN source and destination) pairs
in different VSANs. Cisco has integrated VLANs and VSANs effectively: The IP
Storage Services Module, which extends the SAN fabric into an IP network, can
map 802.11q VLAN tags to VSAN identifiers. (42)
The main point here is the simplicity of storage management and depending on the
vendor, even more separation from the physical storage for a multitude of services. But
it is more than that. Consolidation through a storage integration engine brings reduced
TCO via:
a) single point monitoring and global storage management
b) active (de)provisioning
c) security
d) multi-protocol support
e) focused staff capability and management
Here is a visualization of what we have discussed:

[Diagram: any server with a compatible HBA connects through a storage virtualisation gateway – via direct fibre, direct SCSI or iSCSI – to 1st-tier storage (SAN), 2nd-tier storage (NAS, DAS) and virtualised access to legacy DAS resources. Data life cycle management offloads corporate data (in real time) via the virtualised VSAN interface layer.]
The end-game here is not so much SAN vs NAS, or fibre over iSCSI etc. These are all decisions made from your specific performance, environment and budgetary requirements. The key is the ease with which mass storage can be provisioned effectively using a variety of protocols and underlying storage technologies (the enterprise, and even the smaller business, should avoid DAS). The Cisco solution, along with VSANs, is an important step forward for the large enterprise.
So as a DBA – what storage scheme do I pick?
One of the mistakes DBAs make early in the piece when determining server and storage requirements is being overly concerned with the need to use a specific type of RAID array for a transaction log file, and that all data files need to be striped this way and that over yet another RAID with 128Mb cache and dual channels etc. Who really cares, to be honest! What we do need to be concerned with is: what are the availability, security, capacity and growth estimations for the services (applications and their databases) I am trying to deliver to the business?
From here we make decisions based on the cost effectiveness and efficiency of solutions
we propose to meet the business need. Reminding ourselves that:
a) Effectiveness = doing the right thing
b) Efficiency = cost based utilization
The DBA needs to engage the enterprise and technical architects, or system administrators, to determine whether, through server and storage consolidation, we can meet this need while making IT work well for the business.
If you lock in the need for specific RAID types and huge storage too early, along with all your preconceived ideas about backups, tapes and procedures, you will always come out
with the “I need my own dedicated server and storage because of…” argument, which, funnily enough, holds up in business cases because the system owners simply want closure and their service running (and hardware can be relatively cheap).
This all comes down to a simple answer: where possible, engage technical staff with the business requirements and follow enterprise initiatives of server consolidation and shared or virtual computing environments, typically over clusters using large shared disk resources.
Some general considerations:

• DAS (Direct Attached Storage) – be it a server with large disk capacity or a directly connected disk array using a SCSI or fibre HBA
  o Business ownership is unclear or segmented (un-sharable resources)
  o No existing (consolidated) infrastructure to work with; you plan to host a variety of databases from the server and storage selected, with space to grow
  o Segregated application hosting domain; storage is not a shared resource
  o SCSI 160 or 320 only
  o Very specific HDD disk layout requirements to meet a high performance need
  o You do not mind if storage becomes unavailable when the attached server is down; not a sharable/clusterable resource (some storage arrays are multi-homed/self-powered and can remain available)
  o Limited scalability, where limited scalable storage is fine
  o Per-server storage administration and maintenance model
  o Good system administrator skills required

• SAN (storage area network)
  o Fibre connections, or virtualized through iSCSI or iFCP gateways to broaden access to the SAN
  o Very large scale (4+TB) consolidated shared disk storage for numerous services
  o May require specialist administration knowledge
  o Ability to replicate the entire storage in real time to a remote SAN
  o Server boot from SAN disks; shared disk resource through virtualization
  o Dynamic storage provisioning on an as-needs basis, and highly scalable
  o Single point of global storage administration and monitoring
  o Typically fibre HBA and switch based (FC-AL or switched fabric)
  o Performance sensitive with low latency (over NAS)
  o Large IOs or data transfers (over NAS)
  o Expertise required
  o Limited by distance (<=10km)

• NAS (network attached storage)
  o Relatively cheap and easy to install, but can be a single point of administration with higher staff overheads
  o Need for storage consolidation
  o iSCSI or similar access, no fibre
  o Reduced overall performance, traded for simplicity and low maintenance
  o File and print sharing
  o Cluster ready
  o Simple interface (Ethernet, FDDI, ATM etc)
  o No distance limitations
  o Regarded as 2nd-tier mass storage
  o Typically SATA based, but watch performance carefully; consider TOE cards
Do note that many in the storage world believe NAS and SAN will eventually converge as technologies, through virtualization, to get the benefits of both worlds.
From raw experience to date, DAS and SAN are the only real alternatives for database-driven OLTP or DSS based applications. NAS is perfect for what I call second-tier storage, such as file and print services; test performance very carefully if using NAS for database files. The choice of SAN is typically an enterprise one and should be treated as such in terms of responsibility to provide a storage solution for your database service (i.e. performance, availability, scalability and capacity).
From a disk array configuration perspective, be pragmatic about the decisions made. The DBA should spend a fair amount of time tuning applications with developers, or trying to catch major vendor-based application bottlenecks as early as possible. The key here is to reduce overall physical IO through optimized queries, maximizing buffer cache usage and minimizing large reads that cause large cache flushes. The purchase of suitable RAM (2+Gb at a minimum) is very important (from a performance perspective more than DR).
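To see where that physical IO is actually landing, SQL Server 2000's fn_virtualfilestats function reports cumulative reads, writes and IO stalls per database file. A minimal sketch (the MYDB database name and file id 1 are illustrative only):

    -- Sample cumulative IO statistics for one file of a user database.
    -- IoStallMS is the total time (ms) users waited on IO for that file.
    DECLARE @dbid int
    SET @dbid = DB_ID('MYDB')
    SELECT NumberReads, NumberWrites, BytesRead, BytesWritten, IoStallMS
    FROM ::fn_virtualfilestats(@dbid, 1)

Sample the figures twice, some minutes apart, and difference them; the deltas tell you which files (and therefore which arrays) are doing the work.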
The big issue here with RAID is raw storage. We used to deal in 9Gb drives, and would purchase a lot of them with a multitude of RAID arrays to fit our individual database files. With cheap 145Gb disk drives priced the same as, if not less than, the 72s or 36s, filling valuable disk bays just to increase spindle counts can be a hard ask. Do not be afraid of using RAID-5 (3+ disks) with a large (128Mb+) write-enabled cache and battery backup on the controller – but where possible avoid large log files on RAID-5 arrays (SANs excepted, as their huge write caches generally nullify the penalty for a majority of applications).
Try as you may, the creation of a perfect file-to-array layout can quickly come unstuck as more databases come on line, or the actual IO distribution differs from what you originally estimated. With effective tuning, large RAID-5 or RAID-10 arrays will be the best bet for many solutions. The guide to RAID in this chapter has provided some examples.
In terms of backup, try to:
a) utilize enterprise-wide backups. Watch out for backup software locking database files or skipping open files. Monitor times carefully to ensure backups are not streaming during peak periods if a private backup network is not in use.
b) avoid direct attached tapes where possible.
c) backup to disk and store as many days as you can. Avoid peak times for daily full backups. Avoid sharing backups with database files over the same RAID array/disk spindles; log backups are very quick (typically, if regular) and you will see little impact on a shared RAID-5 array in a majority of cases.
d) copy backup files to remote servers where possible; compress and encrypt the files if you can.
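As a concrete illustration of points c) and d), a scheduled job can dump a log backup to local disk and then push the file to a remote server. A minimal sketch, assuming a MYDB database in full recovery, a local D:\SQLBackups folder and a \\REMOTESRV\sqldumps share reachable by the SQL Server service account (all names are illustrative; a production job would uniquely name each dump rather than overwrite):

    -- Dump the transaction log to local disk, overwriting the previous dump.
    BACKUP LOG MYDB TO DISK = 'D:\SQLBackups\MYDB_log.trn' WITH INIT
    -- Copy the dump to a remote server for safe keeping.
    EXEC master..xp_cmdshell 'copy D:\SQLBackups\MYDB_log.trn \\REMOTESRV\sqldumps\'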
TAPE Drives
Tape drive technology is wide and varying, with more convoluted acronyms and associated technologies than you can poke a stick at. It is therefore not unusual for system engineers/architects to take a half-hearted look at tape selection (be it a consolidated solution or a tape drive per server). Either way, this section attempts to cover some of the key tape technologies and what questions to ask on selection and implementation.
Some of the many tape technologies are listed below. Although not listed, interface technology and tape library architecture are equally important (i.e. SCSI 160/320 etc):

Technology   Sustained Transfer (GB/Hr)   Capacity Range (varies with compression)   Notes
DAT          1 to 4                       20 to 40Gb                                 e.g. HP DAT 12/24Gb
SLR          1 to 3                       20 to 100Gb                                e.g. Tandberg 100
DLT          10 to 15                     up to 300Gb                                8000, VS80, VS160, Super DLT
LTO*         5 to 12                      100 to 200Gb                               * Ultrium LTO 1
SDLT         129 to 250                   300 to 600Gb                               SDLT600
SAIT         108 to 280                   500Gb to 1.3Tb                             Sony SAIT-1 - Ultra160 SCSI
LTO**        Up to 245                    Up to 400Gb                                ** HP 2nd Generation LTO, LTO-2

“Due in 2003, Tandberg's first-generation O-Mass offering will have an uncompressed storage capacity of 600GB, with succeeding releases rising up to an amazing 10 terabytes (TB) on a single cartridge. Transfer rates on O-Mass' first generation cartridge is expected to be 64MBps, accessing data in less than 3.5 seconds.” (1x)
NOTE – The above figures are sourced from vendor documentation and may not reflect real-world results. I highly recommend testing on your chosen hardware platform, or researching as best you can. The maximum capacity is based on a compression ratio that is typically 2:1, but again varies per vendor.
Be aware of the underlying interface and the raw throughput it can support – namely SCSI (160, 320 etc) or fibre channel (measured in Gb). Price will typically be measured in $ per Gb.
Speed can vary significantly based on the number of files being backed up, file fragmentation, the size of files (numerous small files vs a small number of large files), the type and number of network cards, other processes running at the time, etc.
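As a rough worked example (my figures, purely illustrative): a 250Gb database streamed to a drive sustaining 125GB/hr needs around 2 hours of pure streaming time; halve the sustained rate to allow for small files, network contention and load/seek overheads and the realistic window is closer to 4 hours. Always validate against your own measured throughput.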
Here is another example from DELL in a year 2000 article:
Backup and Restore Strategies with SQL Server 2000,
http://www1.us.dell.com/content/topics/global.aspx/power/en/ps4q00_martin?c=us&cs=555&l=en&s=biz
I highly recommend the Toms Hardware Guide website for recent tape performance tests
and recommendations http://www6.tomshardware.com/storage/20030225/index.html,
refer to “Backing Up with Tape Drives: Security is what counts” for a starter.
Apart from raw transfer rate (be aware that the server, interface and connectors also play a part in the stated figures), other metrics include:
a) metres/second
b) load time to beginning of tape (BOT)
c) unload time from BOT, and average file access time from BOT
d) the connected interface, being SCSI or fibre, and its respective throughput (Mb/s), plus source disk performance (to a lesser degree).
Take into careful consideration the procurement of tapes, as their cost can vary markedly; and be aware of supported operating systems and hardware, as market penetration can differ significantly between the larger vendors (which tend to sell their own tape technologies). At the other end of the scale, the evaluator needs to consider MTBF (mean time between failures), which is typically measured as a percentage of duty cycles and represented in hours.
Taking the above a little further, we should consider the following questions when purchasing tape solutions:
a) What is the overarching systems architecture for tape backups within your organization? Will you serve all server backups via a single tape array? Or have a single drive per server, or perhaps group tape backup units per domain or application requirement?
   a. Be aware that global enterprise backup solutions can be tiered, namely a 1st-tier solution for your SAN, and 2nd-tier solutions for NAS, DAS or simply internal disk storage. The second tier is typically managed by cheaper software solutions and their agents, pulling files over the IP network rather than SCSI or fibre connections.
b) Will you consider highly redundant tape solutions? If your enterprise-class solution goes down, what is your mitigation strategy to continue processing the following night's backups, identifying the backup tapes or reading existing tapes?
c) Do you have an accurate history of space usage? Can you see over the horizon, and how confident do you feel with the figures? This brings with it questions of system extensibility and long-term maintenance.
d) Do you have overarching documentation that records what/why/where data is stored to tape? Are there restrictions in terms of the times backups can be made? If you don't, do you really understand a) and b)?
e) How is your IP network infrastructure impacted by large data volumes? Do you collect definitive figures of network bandwidth usage during key backup times, and know what areas are experiencing lag? Are server NICs bottlenecks? Do your business applications suffer in performance at these times, and do you know what is being affected?
f) Are you being locked into vendor-specific tapes? What is the TCO, in terms of the drive's supporting infrastructure and the tapes required, to meet your medium to long term needs? Where are they sourced from, and can you wrap SLAs around this? (do you need to?)
g) Have you considered off-site tape storage? If you do, ensure tapes are available locally where possible, visit 3rd-party vendors and make enquiries with their clients, and ensure costs are well defined in terms of tape retrieval, loss of tapes and insurance to cover such issues. Take care with TCO measures here.
h) Do you require robotic tape racks/libraries for large-scale backup tape management?
   • This typically requires enterprise-class storage software such as Tivoli Storage Manager from IBM. This software supports a wide gamut of remote agents, operating systems and interfaces. The software resides on a central backup server, on which CPU and network connectivity will be your greatest concern.
   • Take time to check the software licensing options (typically per-CPU based), and how the tape library can cross-support different tape types (i.e. LTO and LTO-2 for example).
   • Finally, check the backup schedule very carefully, and how tapes are chosen from the rack. As data is streamed into the library you may find a single application's content spans multiple tapes. The dispersed data may result in skewed restore times and difficulty in recalling tapes from offsite storage.
i) Does your backup software support the tapes and their lengths/formats?
j) Calculate air conditioning requirements to ensure an optimal run-time environment for your drives. Tape writes where the temperature is outside the drive's (and tape's) limits pose a major risk.
From a DBA perspective we would consider the following:
a) What is your backup strategy? When will you run full backups? Will you do differentials each day and a single full once per week? What is the impact on your recovery plans and SLAs (especially with tape recovery and restore time)? Think carefully about multiple concurrent backups and how the business strategy for backups will affect the use of native SQL backups.
b) What are you [really] backing up? The all-drives approach is typically overkill and will cost you down-stream with large storage requirements, backup server bottlenecks and the need for more network throughput and overall time. The DBA should consider “open file” database backups and include the SQL binaries, full-text catalogs, OLAP cubes, error logs etc at a minimum.
c) How will the backup meta-data be stored and checked? It is not uncommon for DBAs to schedule daily native SQL backups with an email to ping the DBA on failure. A sketch of such a meta-data check follows.
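Native backups record their meta-data in msdb automatically, so a simple query can flag databases whose last full backup is stale. A hedged sketch (the 24-hour threshold is illustrative):

    -- List databases whose most recent full backup (type 'D') is over a
    -- day old, or that have never been backed up at all.
    SELECT d.name, MAX(b.backup_finish_date) AS last_full_backup
    FROM master..sysdatabases d
    LEFT JOIN msdb..backupset b
      ON b.database_name = d.name AND b.type = 'D'
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) < DATEADD(hh, -24, GETDATE())
        OR MAX(b.backup_finish_date) IS NULL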
Building a Backup Server
Many organizations invest large sums of money in building and maintaining a single backup server, and rightly so; supporting 40+ computers each night, each with individual tape requirements, represents a significant TCO for the business. Here we will present some strategies for system design, rather than physical solutions, for enterprise backups:
1) Revisit and audit your server application backup requirements. Application vendors and/or your development team should be approached in all cases, and recovery specialists in your firm made part of this team – don't take the backup-everything approach.
   a. Review not only the size of the backup, but the breakdown of the files. Are we talking thousands of small files? If so, we really need to test the backup software agents. Small files have a tendency (en masse) to increase CPU and IO resource usage and bottleneck the software itself. Consider more RAM, and review IO and network card utilization carefully during your tests.
   b. If CPU usage is an issue and identifiable during backup, consider a TOE (TCP offload engine) card. Such cards offload TCP/IP processing from the host CPU(s).
2) How will the data be transferred from the source to the destination server?
   a. The hardware and network infrastructure is critical here, with attention paid to routers/switches, cabling, current bandwidth and lag issues, the agents and their service packs/updates, network cards, crossing network domain boundaries, and SLAs in terms of availability and responsibility.
   b. Consider the connectivity between servers carefully. Using the shared IP network means possible congestion and significant performance loss for the services running on the machine. Also be aware of the performance impact backup agents have on the server.
3) The backup server (destination)
   a. Streaming data from numerous servers, serially and more often asynchronously, significantly impacts the performance of the host's CPU, HBA cards, connected tape drives and internal hard drives. Managing the bottleneck is the real challenge here. As discussed, we can install TOE cards (OS supported?) to reduce CPU throttling due to streamed traffic, but two more frequently used solutions to maximize throughput and enhance growth are:
      i. Build a SAN or consider a NAS – the SAN is typically used to buffer and/or queue incoming backup streams before going off to tape (and subsequently offsite). A SAN-based mass storage device is an expensive solution for pre-hosting backups before offlining to tape, so it must be compared with similar NAS devices or even large-scale internal disk capacity (which can easily stretch to 3+Tb at as little as $400 for a 146Gb 10k SCSI-320 drive).
         1. There is a range of re-packaged “disk to disk” (as they are called) SAN backup solutions; such systems are packaged as custom appliances, a 500Gb system costing around $11,200 US from NexSAN Technologies Ltd for example. Such systems utilize the ATA interface (common in desktop PCs) internally, but present SCSI, fibre or gigabit interfaces externally.
            a. Such systems should not be used to replace the data archiving or high availability requirements of very large data centers.
      ii. Consider direct SCSI or fibre data streaming direct to tape rather than over IP (via the agents) – appropriately configured switches and routers can assist in congestion management and bottlenecks.
      iii. Check CPU utilization performance and load test.
4) Test the market, the vendor and product availability and support; audit other sites and their experiences.
   a. Be very careful with licensing. For example, you may find the backup software requires a CPU licence for the server, then another per-CPU licence for the agent, and yet another if the server is talking to a SAN, and another again if it also manages database backups! Ouch!
   b. Site references are very important and must be timely and relevant.
5) Make a decision; build SLAs around business imperatives.
6) Install, refine your source/destination server connections, document the process and procedures, and test recovery scenarios frequently (it's not backed up until it's been restored!).
7) Build a series of metrics to measure performance, capacity and utilization. The administrator should report against these on a monthly basis.
8) Train your staff and define the boundaries for responsibility and accountability.
The backup software is really the end-game here. A product like IBM Tivoli Storage Manager is a large-scale, enterprise solution that virtualizes and wraps a solid foundation around corporate backup via server agents, the backup server (disk buffer) and the tape infrastructure or library.
Who needs tapes when I can go to Disk?
There is no doubt that backing up to disk is faster and more convenient. Large enterprise backup solutions do just that. With the installation of backup agents, and over a separate backup or management network infrastructure (if you're lucky), the backup management server directs the agents to stream (typically in parallel) data from the servers to the “backup server”. This tends to be a NAS or SCSI-based disk storage solution, with TCP offload cards and large internal disk capacity (in the order of 500Gb+); dual P4s and 4+Gb RAM is nothing unusual, as the job is system-resource intensive.
Generally speaking, it is not unusual to see a range of disk-backup approaches taken:
a) ntbackup, xcopy or other software over a file share, typically to a physically remote server on the same domain or a 2nd-tier storage device (NAS, serial ATA disk farm)
b) Backup to a SAN or other storage system that is not part of the source application – can be relatively expensive in terms of $/Mb, and restrictive in distance from the SAN for fibre-based HBAs
c) FTP the file remotely – as the standard Windows (IIS) FTP service does not encompass encryption, third-party software that does (128-bit SSL) is used
d) Streaming backups over HTTPS – rarely used
e) Log shipping to remote servers, with a combination of the above
f) Split-mirrors and removable disks
g) Enterprise backups via agents to a shared robotic tape array
Here are some interesting points to consider:
1) 1 terabyte of data – 10 x AIT-3 tapes or 10 x IDE 120Gb drives – we must compare hardware infrastructure in each case and the cost of potential growth
2) MTBF for a tape vs a disk – consider the numerous additional electronic components in a drive; you will be surprised at the MTBF figures of hard drives vs tapes
3) Disk storage is very adaptable and can (generally) be easily moved between servers; tape drive failure can result in longer downtime and costly replacement
4) A mix of drive sizes can be utilized with ease
5) SCSI sub-systems typically have a 5+ year warranty; consider this when looking at other interfaces
No matter whether you are hooked to a SAN or leveraging your enterprise backup solution, I recommend budgeting for large internal disk capacity so that local backups can be dumped to disk via the native solution(s) provided by the DBMS. The DBA then has the flexibility to quickly reapply stored backups from disk in emergency scenarios, and the free space to back up databases without pulling in resources from the enterprise backup team to assist with a recovery.
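In such an emergency, having last night's full backup on local disk means recovery is a single command away. A minimal sketch (database name, path and file name are illustrative):

    -- Reapply the local full backup over the damaged database.
    RESTORE DATABASE MYDB
    FROM DISK = 'D:\SQLBackups\MYDB_full.bak'
    WITH REPLACE, STATS = 10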
In the Data Centre
Understanding Server Racks
The use of racks (server cabinets) for hosting a large number of “rack savvy” servers has been around for years. As a DBA, it's worthwhile understanding the basic “system admin speak” of rack components.
The rack itself is simply a large steel cabinet (enclosed or simple framed). The cabinet may include cable conduits, front and rear lockable doors (degree of swing, locks etc), ventilated side panels, cut-safe steel, floor or wall bolt and bracket provisioning, an anti-tilt floor tray, wheels, rack dividers etc, with a standard (usually) width and depth (600mm, 800mm, 900mm, 1m).
Many vendors sell fully equipped racks, but they are typically component based over the rack frame. Panels may include ventilation fans, doors of a variety of types etc. The buyer may add racked ventilation fans, for example.
IMPORTANT – Standardization with existing and proposed server hardware, security and accessibility are key considerations in determining the best rack for your needs.
The rack's side rails are evenly drilled into what are called RUs (rack units), or simply Us; compatible vendor servers are measured by the RUs taken within the rack housing (1RU or 1U = 1.75 inches), and these are typically numbered bottom up. You may find that cage nuts or screws are priced separately. Rack heights (24U, 30U, 38U, 45U) can of course vary to suit a wide range of requirements.
NOTE – many hardware vendors include rack mount kits for existing tower-cased machines.
The servers in the rack share a common keyboard, video and mouse via a device called a KVM switch. The size and port configuration of the KVM will vary, but it is very similar in function; only one server can be managed at any one time from the console. The KVM will come with a rack mounting kit, but may be sold with a special extending keyboard and tray, along with a monitor and its rack trays/dividers to hold it in place. The switch is a smart device, taking a double-ctrl key click and showing you a character-based menu to pick the server to connect to. Note that the KVM itself may have a maximum screen resolution and may include added security features. If you are running low on ports, most KVMs can be connected together.
To get power to the rack servers, one or more power distribution units (PDUs) are installed. The PDUs are installed on the sides of the rack (within the space provided) or horizontally racked. A large server may include a number of redundant power supplies, and you may find a single PDU cannot serve its full complement of outlets, so it's not unusual to see four or five PDUs within a rack. The PDUs may be distributed in nature, i.e. half of the PDUs serviced by one power source and the other half by another, for redundancy. Take care in determining the power (amp) draw required by the rack's PDUs, and be aware of your mains power connectors.
To connect servers to the network, we may run dual or quad ethernet cables from the individual servers out to a switch/router. Another alternative is to install the switch within the rack itself. Multiple redundant switches may be used.
The same can be said for the racked servers' host bus adapters (HBAs). The HBA cards facilitate connectivity to separate/detached storage, such as a SAN or direct attached storage (DAS) device. The HBAs may connect to one or more racked switches off to external disk storage.
Finally, cable management is a right pain within racks. No matter the vendor, you will always experience cable management problems. To ease your pain, consider a labeling and documentation strategy early; color coding is effective but can be difficult with numerous 1U servers, for example. To pull cables together, heat-shrink tubing may be used to bind like cables. The racks themselves may include additional cable management kits, but take the time to position the KVM, PDUs, switches etc beforehand. This is especially important when you need to move or replace servers, especially in production racks. Also note that cable management trays or cable conduits can be purchased separately in most cases.
The rack servers themselves should include rack rails or a rack mounting kit (this may not include screws). The modern rack-mounted server offers state-of-the-art technology and high performance, from 1 to 16 or even more CPUs, terabytes of internal disk storage and multiple redundant power supplies.
Be very careful of server depth. Some servers require 1m-deep racks, moving away from the 800 and 900mm racks (the 1m servers fit minus the rack's back door or panel).
What are Blade Servers and Blade Centers?
The blade server is a thin, hot-swappable server, independent of others in terms of CPU, storage (typically a maximum of two disks), OS and network controllers (typically via extension cards), but sharing (with other like blades) power, fans, floppy disks, core ethernet and HBA switches, KVM ports etc – this is managed via a backplane, and the sharing function is served by the blade center. Let's break down the components.
The blade center, or chassis, is the key component, housing the blades within the rack and providing the essential services of power, serial/parallel/SCSI ports, ethernet switching, SAN switching, KVM connectors etc.
[Diagram: blade center chassis components – power supply modules, Gigabit Ethernet and fibre channel switch modules (ESMs), fan/blower modules, and a management module for fault detection, remote deployment software etc.]
[Diagram: an example configuration with active and standby links via NIC teaming.]

The chassis houses a range of blades, with a backplane that determines the connectors, its supporting modules, and the types of server it supports. The servers themselves can be a range of sizes, typically from 1 to 4U. The servers are densely packed with minimal internal storage. Even so, many vendor blades are enterprise class in terms of raw performance, with dual or quad CPUs and large RAM capacity. Quad-CPU blades are less common than dual-CPU blades but are available; their footprint increases from 1U to 2 or even 3U, though.

[Diagram: a typical blade – backplane connectors, dual internal disks (one can be supplemented for a SAN/NAS HBA), 8Gb ECC RAM, and dual integrated Ethernet. Ethernet connections can be teamed for added redundancy, but this tends to require additional software/drivers.]
The true value of blades comes in the form of service virtualization within the blade center. What do I mean by this? The blade takes utility, or dense, computing a step further in terms of little space for high performance gain, all without the need to buy single, large-CPU (clustered) servers with fortune-500 price tags. In order to take advantage of blades, the software that runs business services needs to be aware of this virtual environment. The move to grid computing by a variety of vendors is a classic example.
The Oracle 10g suite of products is a good example of this in play, where any number of blades can be provisioned to serve a single Oracle Application Server (or database) hosted business application, maintaining state and of course stability and scalability.
In the Microsoft space, using MSCS (Microsoft Cluster Service), component load balancing, or network load balancing coupled with a .NET state server or database for session management is an excellent equivalent/competing technology. All said, blades are more frequently used for web and application servers than databases at present – this I believe is more a stigma of the technology in play than any specific reason why you wouldn't take the infrastructure seriously for large production
systems. If this is the case, I would highly recommend two blade chassis, dividing your SQL Cluster amongst blades within them to reduce the single point of failure (one that can otherwise take out 14 or so servers in one hit).
Images above are from IBM corporate website and their blade server product range – June 2004.
References
1. How to consolidate physical files and rename the logical file name of a database in SQL Server 2000. Microsoft Support Services, http://support.microsoft.com/?kbid=814576
2. Definition of “Contingency Plan”,
http://www.atis.org/tg2k/_contingency_plan.html
3. Comments from members on the article “From the soapbox – Does anyone know
what disaster recovery is?”. James Luetkehoelter, 2004.
www.SQLServerCentral.com
4. CoBIT framework overview, as at 12 February 2004, http://www.isaca.org/Content/NavigationMenu/About_ISACA/Overview_and_History/Overview_and_History.htm
5. Orphaned Sessions, Rahul Sharma, printed from www.dbazine.com as at
6/3/2003.
6. Help! My Database is Marked Suspect, Brian Knight, 31/3/2004 @
www.sqlservercentral.com
7. Step-by-step guide to clustering Windows 2000 and SQL Server 2000, Brian
Knight, 12/7/2002 @ www.sqlservercentral.com
8. Clustering SQL Server 2000 from 500 feet, Brian Knight, 2/7/2001 @
www.sqlservercentral.com
9. Quantum SDLT-600, ZDNet Australia, product review,
http://www.zdnet.com.au/reviews/hardware/peripherals/0,39023417,391159274,00.htm
10. Beware of mixing collation with SQL Server 2000 – Part 1, Gregory Larsen,
19/2/2003,
http://www.databasejournal.com/features/mssql/article.phpr/1587631
11. “LTO 2 verses SDLT 600”, www.openstore.com, printed @ 6/4/2004
12. Background and History of Measurement-Based Management, Paul Arveson,
1998, http://www.balancedscorecard.org/bkgd/bkgd.html
13. Child Support Issues – Accountability vs Responsibility, 29 Dec 2003,
http://www.childcustody.org/childsupport/_disc84/00003786.htm
14. Responsibility vs Accountability, Forum Posts, http://world.std.com/~lo/95.09/0304.html
Christopher Kempster
337
S Q L
S E R V E R
B A C K U P , R E C O V E R Y
&
T R O U B L E S H O O T I N G
15. Tape vs Disk: Another View…, Computer Technology News, Feb 2002,
http://www.filetek.com/press/media/tapevrsdisk2.htm
16. Ghost Record Cleanup and Error 602, SQL Server Magazine, 6 August 2003,
http://www.microsoft.com/sql/techinfo/tips/administration/ghostrecords.asp
17. BUG – A failed assertion is generated during a bulk insert statement, Microsoft
Support, http://support.microsoft.com/default.aspx?scid=kb;en-us;815594
18. Brian Cryers Glossary of IT Terms with Links,
http://www.cryer.co.uk/glossary/D.htm
19. C.Russel, S.Crawford, J.Gerend, MS Press 2003, Chapter 19, Microsoft Windows
Server 2003 Administrators Companion.
20. P.Sharick, SQL Server SP2 Problems, Slow Open/Save Operations, EFS and
Password Changes, www.winntmag.com
21. B.Hollinshead, Reader to Reader - Shrinking Active Log Files Revisited,
www.winntmag.com
22. Experts Exchange, Shrink and move transaction log – easy, http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/Q_20823435.html
23. Backing Up Disk to Disk – Computer World, Robert Mitchell, 3 June 2002
24. How to: Shrink the tempdb database in SQL Server, Microsoft Support Services,
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q307487
25. INF: Creating Database or Changing Disk File Locations on a Shared Cluster
Drive on Which SQL Server 2000 was not Originally Installed,
http://support.microsoft.com/default.aspx?scid=kb;en-us;295732
26. How to: Remove a SQL Server Service Pack, Microsoft Support Services,
http://support.microsoft.com/?kbid=314823
27. How to install SQL Server 2000 Clustering: Preparing for SQL Server 2000
clustering, B.McGehee, www.sql-server-performance.com
28. How can I fix a corruption in a system table?, N.Pike, 5/3/1999,
http://www.winnetmag.com/Article/ArticleID/14051/14051.html
29. Why you want to be restrictive with shrink of database files, T.Karaszi,
http://www.karaszi.com/SQLServer/info_dont_shrink.asp
30. My DTS package is stored inside SQL Server. Now I cannot open it.,
http://www.mssqlcity.com/FAQ/Trouble/OpenDTSPack.htm
31. How to overwrite DTS package log everytime the package executes,
N.V.Kondreddi, http://vyaskn.tripod.com/sql_server_dts_error_file.htm
32. Locked out of SQL Server, Microsoft,
http://www.microsoft.com/sql/techinfo/tips/administration/May3.asp
Christopher Kempster
338
S Q L
S E R V E R
B A C K U P , R E C O V E R Y
&
T R O U B L E S H O O T I N G
33. Inside Microsoft SQL Server 2000, K.Delaney, 2001, Microsoft Press 2001.
34. Automatically gathering server information part 2, S.Jones, 25/12/2003,
www.sqlservercentral.com
35. SQL Server Central, Questions of the Day, www.sqlservercentral.com
36. List Available DBCC Commands, www.extremeexperts.com
37. Technical Resources – DBCC commands, www.umachandar.com
38. Five Nines – but the book, K.Percy, 14/4/2003,
http://www.nwfusion.com/columnists/2003/0414testerschoice.html
39. Most wanted service metrics (answer), R.Sturm,
http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=009E21
40. Computer Performance Metrics, FrontRunner Computer Performance Consulting,
http://www.frontrunnercpc.com/info/metrics.htm
41. Team development with visual studio.net and visual sourcesafe, Microsoft
Corporation, Jan 2002,
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/tdlg_ch6.asp
42. Cisco’s flawless director of traffic, R.Birdsall & E.Mier, 16/6/2004,
http://www.arnnet.com.au/index.php?id=981505022
43. First serial ATA drive: Expectations Exceed Results,
http://www.extremetech.com/article2/0,1558,1150388,00.asp
44. “TCQ, RAID, SCSI and SATA”,
http://www.storagereview.com/articles/200406/20040625TCQ_6.html
45. Future of storage: is disk dead?, J.Mehlman, July 2004,
www.technologyandbusiness.com.au
46. Storage – Hard and Tape Drive Market First Quarter 2004, Evans Research,
http://www.evansresearch.com/snapshots/hardtape.html
47. AIT Tape and SAIT Tape – Advanced Intelligent Tape,
http://www.aittape.com/roadmap.html
48. Sony’s S-AIT Tape Gets Jump on Competitors, L.Mearian, Dec 2002,
http://www.computerworld.com/hardwaretopics/storage/story/0,10801,77003,00
.html
49. Disaster Recovery Plan Template – version 2.7, http://www.e-janco.com/drp_template.htm
50. Learn about SQL Server Disaster Recovery from Greg Robidoux of Edgewood
Solutions – Interview, B.McGehee, http://www.sql-server-performance.com/greg_robidoux_interview.asp
Christopher Kempster
339
S Q L
S E R V E R
B A C K U P , R E C O V E R Y
&
T R O U B L E S H O O T I N G
51. Disaster Recovery Planning, Chapt 1. An Overview, R.J.Sandhu, Premier Press ©
2002
52. Disaster Recovery Planning, Chapt 1. Disaster Recovery Planning Strategies,
R.J.Sandhu, Premier Press © 2002
53. INF: sp_attach_single_file_db Does not work for databases with multiple log
files, Microsoft Support, http://support.microsoft.com/default.aspx?scid=kb;en-us;271223
54. Finding SQL Servers running on a network, J.Deere, www.sqlteam.com
55. Collecting System Information in SQL Server 2000, R.Sharma, 30/11/2000,
www.databasejournal.com
56. Simple Log Shipping in SQL Server 2000 Standard Edition, R.Talmage, SQL
Server Magazine.
57. The features of the local quorum resource on Windows Server 2003 Cluster,
Microsoft Support.
58. How to troubleshoot cluster service startup issues, Microsoft Support,
http://support.microsoft.com/?kbid=266274
59. Alert! Alert! Alert! Backup and Restore – Baby!, J.Kadlec, 12/4/2004,
http://www.edgewoodsolutions.com/resources/BackupAndRestoreBaby.asp