Digging Out From Corruption
Data protection and loss recovery with SQL Server
Eddie Wuerch, MCM
Principal, Database Performance - Salesforce Marketing Cloud
I am a DBA
I am a steward of my company’s data
Data loss can close my company
Data loss can ruin my career
Data loss shall not occur
Hi, I’m Eddie! And I’m a DBA.
Over 15 years SQL Server
Microsoft Certified Master
Salesforce Marketing Cloud
◦ Trillions of rows … 10+ billion tx/day … PBs data & indexes…
◦ …24x7, no downtimes
What is “Corruption”?
Logical Corruption
DELETE dbo.BigTable
--------
A bazillion rows affected.
What is “Corruption”?
Physical Corruption
SELECT id,… FROM dbo.BigTable
--------
Error 824
Corruption
LOGICAL – HUMAN ERROR
◦ Incorrect data mods
◦ Detection is up to you
◦ Manually fix data/restore DB
PHYSICAL – DAMAGED MEDIA
◦ File damage
◦ Incomplete writes
◦ SQL Errors: 823, 824, 825
◦ DBCC CHECKDB
◦ Discrete restore options
◦ AG Auto-repair (!!)
Physical Corruption - Detection
CHECKSUM Page Verification
◦ Always use this. Every database.
Agent alerts: 823, 824, 825
msdb.dbo.suspect_pages
Detection on page access.
Corruption may lie dormant for a long time
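The detection pieces on this slide could be set up roughly as follows. This is a minimal sketch, not a complete monitoring setup: YourDb is a placeholder database name and the alert names are invented for illustration.

```sql
-- 1. Find databases that are NOT using CHECKSUM page verification.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> N'CHECKSUM';

-- 2. Turn it on. YourDb is a placeholder name.
--    A page only gets its checksum the next time it is written,
--    so pages that are never rewritten stay unprotected.
ALTER DATABASE YourDb SET PAGE_VERIFY CHECKSUM;

-- 3. SQL Server Agent alerts for the three I/O errors.
--    Alert names are hypothetical. @severity must be 0 when @message_id is used.
EXEC msdb.dbo.sp_add_alert
     @name = N'Error 823 - OS-level I/O error',
     @message_id = 823, @severity = 0,
     @include_event_description_in = 1;   -- 1 = include error text in e-mail

EXEC msdb.dbo.sp_add_alert
     @name = N'Error 824 - logical consistency I/O error',
     @message_id = 824, @severity = 0,
     @include_event_description_in = 1;

EXEC msdb.dbo.sp_add_alert
     @name = N'Error 825 - read-retry warning',
     @message_id = 825, @severity = 0,
     @include_event_description_in = 1;

-- 4. Pages SQL Server has already flagged as suspect.
SELECT DB_NAME(database_id) AS database_name,
       file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;
```

An alert notifies nobody until it is tied to an operator, for example with msdb.dbo.sp_add_notification and an operator that already exists on the server.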
823/824/825 - DON’T PANIC
DBCC CHECKDB
◦ Get used to this BEFORE disaster
◦ Run without repair opts
◦ Let it complete
◦ Your problem may be fixed by dropping an index
◦ Investigate performance techniques
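A rough sketch of those bullets in T-SQL. YourDb and IX_BigTable_Example are placeholder names; the index step assumes CHECKDB reported errors only against a nonclustered index (index ID greater than 1).

```sql
-- Full check with no repair option; show every error,
-- suppress the informational messages.
DBCC CHECKDB (N'YourDb') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- One performance technique: a faster, less thorough pass that checks
-- page-level physical integrity and skips most logical checks.
DBCC CHECKDB (N'YourDb') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- If ALL reported errors are in a nonclustered index, the data itself
-- may be intact. Disabling first forces the rebuild to read the base
-- table rather than the damaged index pages.
-- IX_BigTable_Example is a hypothetical index name.
ALTER INDEX IX_BigTable_Example ON dbo.BigTable DISABLE;
ALTER INDEX IX_BigTable_Example ON dbo.BigTable REBUILD;
```

Disabling an index that enforces a constraint has side effects, so this is worth rehearsing on a restored copy first. Run CHECKDB again afterwards to confirm the errors are gone.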
Preparation
A backup never saved anybody’s job.
The restore did.
Plan for the restore, not the backup
The Restore Strategy
RPO & RTO: What are your goals?
Layers of disaster / layers of recovery
◦ Disk, Server, Network, Datacenter…
Time = money
◦ Lower downtime = higher cost of equipment and labor
◦ Higher RPO/RTO = higher potential cost of fines, loss of business, refunds, etc.
◦ RPO/RTO determined by cost
Backup Options
Full Backup
◦ Database, Filegroup, File
◦ All recovery models
Differential
◦ Database, Filegroup, File
◦ All recovery models
Transaction log
◦ Database only
◦ Not available in SIMPLE recovery model
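For reference, the three backup types look roughly like this. The database name, filegroup name, and X:\Backups paths are all placeholders, and WITH INIT (overwrite the target file) is just one possible choice.

```sql
-- Full backup of the whole database.
BACKUP DATABASE YourDb
TO DISK = N'X:\Backups\YourDb_full.bak'
WITH CHECKSUM, INIT;

-- Full backup of a single filegroup (FG_Archive is a hypothetical name).
BACKUP DATABASE YourDb
FILEGROUP = N'FG_Archive'
TO DISK = N'X:\Backups\YourDb_FG_Archive.bak'
WITH CHECKSUM, INIT;

-- Differential backup: extents changed since the last full backup.
BACKUP DATABASE YourDb
TO DISK = N'X:\Backups\YourDb_diff.bak'
WITH DIFFERENTIAL, CHECKSUM, INIT;

-- Transaction log backup (FULL or BULK_LOGGED recovery model only).
BACKUP LOG YourDb
TO DISK = N'X:\Backups\YourDb_log.trn'
WITH CHECKSUM, INIT;

-- Basic readability check of a backup file. This is not a substitute
-- for a real test restore.
RESTORE VERIFYONLY
FROM DISK = N'X:\Backups\YourDb_full.bak'
WITH CHECKSUM;
```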
The Full Database Backup
Restore an entire database
Begin a point-in-time restore
Begin point of a FG/file/page restore (pull 8 KB from last week’s backup, place it in running database)
Does NOT break the log chain
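A point-in-time restore that starts from a full backup might look like the sketch below. File names and the STOPAT time are invented, and it assumes either that the database does not currently exist or that the tail of its log has already been backed up.

```sql
-- Start from the full backup, but stay in the restoring state.
RESTORE DATABASE YourDb
FROM DISK = N'X:\Backups\YourDb_full.bak'
WITH NORECOVERY;

-- Roll forward through the log backups in order.
RESTORE LOG YourDb
FROM DISK = N'X:\Backups\YourDb_log_01.trn'
WITH NORECOVERY;

-- Stop just before the bad change, then bring the database online.
-- The timestamp is a hypothetical example.
RESTORE LOG YourDb
FROM DISK = N'X:\Backups\YourDb_log_02.trn'
WITH STOPAT = N'2024-01-15T14:29:00', RECOVERY;
```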
The Full Backup File(s)
Contains every allocated page
Plus enough tx log to make the restored DB consistent
Tx log will not be cleared during full backup (space planning)
Differential Backups
All changed extents since last Full backup
Plus enough tx log for consistency
Can save lots of time on restores
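The time saving comes from skipping every log backup taken before the differential. A sketch of the sequence, with placeholder file names:

```sql
-- 1. The full backup the differential is based on.
RESTORE DATABASE YourDb
FROM DISK = N'X:\Backups\YourDb_full.bak'
WITH NORECOVERY;

-- 2. The most recent differential. Log backups taken between the full
--    and this differential do not need to be restored.
RESTORE DATABASE YourDb
FROM DISK = N'X:\Backups\YourDb_diff.bak'
WITH NORECOVERY;

-- 3. Only the log backups taken AFTER the differential.
RESTORE LOG YourDb
FROM DISK = N'X:\Backups\YourDb_log_after_diff.trn'
WITH NORECOVERY;

-- 4. Run recovery and bring the database online.
RESTORE DATABASE YourDb WITH RECOVERY;
```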
Log Backups
Changes since last log backup
Sequential record of all changes
Can be taken after loss of data file(s), if log file is available (Full Recovery Model only)
N/A for Simple Recovery Model
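Backing up the log after data files are lost is usually called a tail-log backup. A minimal sketch, assuming the log file is still readable and using a placeholder path:

```sql
-- NO_TRUNCATE lets the log backup run even though the database is
-- damaged or its data files are missing. It captures the log that
-- has not yet been backed up.
BACKUP LOG YourDb
TO DISK = N'X:\Backups\YourDb_tail.trn'
WITH NO_TRUNCATE, INIT;
```

Restoring this file last, after the full/differential/log sequence, is what can bring the recovery point up to the moment of failure.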
The Transaction Log
One file per DB is enough
Write-ahead logging
Both redo and undo tracked
ACID
The Transaction Log
Recovery Model vs. Logging Model
Crash recovery
Bulk-Logged Recovery Model
Full recovery model, with exceptions:
◦ Minimally-logged transactions (ML) only record allocations
◦ Log can’t redo – CHECKPOINT on commit (ouch!)
Log backups of ML transactions include all changed data pages
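One common pattern is to switch to bulk-logged only for the duration of a bulk operation. The sketch below assumes an index rebuild as the minimally-logged operation; object names and paths are placeholders.

```sql
-- Log backup right before the switch, to narrow the exposure window.
BACKUP LOG YourDb
TO DISK = N'X:\Backups\YourDb_log_before_bulk.trn' WITH INIT;

ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED;

-- An operation that can be minimally logged under BULK_LOGGED.
ALTER INDEX ALL ON dbo.BigTable REBUILD;

ALTER DATABASE YourDb SET RECOVERY FULL;

-- This log backup contains the changed data pages, so expect it to be
-- large. Point-in-time restore to a moment inside it is not possible.
BACKUP LOG YourDb
TO DISK = N'X:\Backups\YourDb_log_after_bulk.trn' WITH INIT;
```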
The Log Chain
Each log backup = changes since last log backup
The sequential collection of restorable transaction log backups = log chain
Starts with a full backup
Is not tied to the most recent full backup
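The backup history in msdb gives one way to eyeball the chain. This sketch assumes the history has not been purged and uses a placeholder database name.

```sql
-- List backups in order. In an unbroken chain, each log backup's
-- first_lsn equals the previous log backup's last_lsn.
SELECT bs.backup_finish_date,
       CASE bs.type
            WHEN 'D' THEN 'Full'
            WHEN 'I' THEN 'Differential'
            WHEN 'L' THEN 'Log'
            ELSE bs.type
       END AS backup_type,
       bs.first_lsn,
       bs.last_lsn,
       bs.database_backup_lsn,   -- LSN of the full backup in effect at the time
       bs.is_copy_only,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
     ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = N'YourDb'
ORDER BY bs.backup_finish_date;
```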
BACKUP… MIRROR TO
Enterprise Edition only
Specify additional copies of backup file(s)
Up to 3 mirrors
Works with Full, Diff, and Log backups
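A sketch of a mirrored log backup. The database name and both paths are placeholders; FORMAT is required when the mirrored media set is first created.

```sql
-- Writes two identical copies of the backup in one operation.
-- If either destination fails, the whole backup fails.
BACKUP LOG YourDb
TO DISK = N'X:\Backups\YourDb_log.trn'
MIRROR TO DISK = N'Y:\BackupMirror\YourDb_log.trn'
WITH FORMAT, CHECKSUM;
```

At restore time either copy can be used on its own.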
Restore Options
Entire database in one operation
Partial (Ent.Ed.)
◦ Restore PRIMARY FG, bring DB online
◦ Restore additional FGs, bring online one-by-one (partitioning bonus)
Corruption Fixes (Online if Ent.Ed.):
◦ Restore damaged files
◦ Repair damaged pages
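Repairing a damaged page from backup might look like the sketch below. The page ID '1:57' (file_id:page_id, as listed in msdb.dbo.suspect_pages), the database name, and all file names are invented for illustration; it assumes the full recovery model and an unbroken log chain.

```sql
-- 1. Pull just the damaged page from the full backup.
RESTORE DATABASE YourDb
PAGE = '1:57'
FROM DISK = N'X:\Backups\YourDb_full.bak'
WITH NORECOVERY;

-- 2. Apply every log backup taken since that full backup, in order.
RESTORE LOG YourDb
FROM DISK = N'X:\Backups\YourDb_log_01.trn'
WITH NORECOVERY;

-- 3. Back up the current log so the page can be rolled forward to now.
BACKUP LOG YourDb
TO DISK = N'X:\Backups\YourDb_log_pagefix.trn'
WITH INIT;

-- 4. Restore that final log backup and recover.
RESTORE LOG YourDb
FROM DISK = N'X:\Backups\YourDb_log_pagefix.trn'
WITH RECOVERY;
```

On Enterprise Edition the rest of the database can stay online while this runs; on other editions the restore is offline.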
Restore Options
[Timeline diagram, shown in three builds: Full A, Diff A-1, Diff A-2, Full B, Diff B-1, over a continuous run of Log backups]
BACKUP LOG … TO DISK = 'V:\Logs\...' MIRROR TO DISK = 'W:\Logs\...'
Demo
Physical Corruption: detection and different restore/repair types
(Let’s break stuff!)
◦ Disk corruption – non-clustered index
◦ Disk corruption – clustered index
◦ Lost data file
Document, Practice, Drill, Repeat
At restore time: panic, anguish, and unhappy executives
Crises don’t honor vacation schedules or work hours
Script, automate, document
Thanks for attending!
Please fill out the survey.
Download these slides and scripts
at SQLSaturday.com
Stick around for the raffle!
Then join us at the afterparty at Champps Americana