Digging Out From Corruption
Data protection and loss recovery with SQL Server
Eddie Wuerch, MCM - Principal, Database Performance - Salesforce Marketing Cloud

I am a DBA
I am a steward of my company’s data
Data loss can close my company
Data loss can ruin my career
Data loss shall not occur

Hi, I’m Eddie! And I’m a DBA.
Over 15 years SQL Server
Microsoft Certified Master
Salesforce Marketing Cloud
◦ Trillions of rows … 10+ billion tx/day … PBs data & indexes…
◦ …24x7, no downtimes

What is “Corruption”? Logical Corruption
DELETE dbo.BigTable
--------
A bazillion rows affected.

What is “Corruption”? Physical Corruption
SELECT id,… FROM dbo.BigTable
--------
Error 824

Corruption
LOGICAL – HUMAN ERROR
◦ Incorrect data mods
◦ Detection is up to you
◦ Manually fix data/restore DB
PHYSICAL – DAMAGED MEDIA
◦ File damage
◦ Incomplete writes
◦ SQL Errors: 823, 824, 825
◦ DBCC CHECKDB
◦ Discrete restore options
◦ AG Auto-repair (!!)

Physical Corruption - Detection
CHECKSUM Page Verification
◦ Always use this. Every database.
Agent alerts: 823, 824, 825
msdb.dbo.suspect_pages
Detection on page access. Corruption may lie dormant for a long time

823/824/825 - DON’T PANIC
DBCC CHECKDB
◦ Get used to this BEFORE disaster
◦ Run without repair opts
◦ Let it complete
◦ Your problem may be fixed by dropping an index
◦ Investigate performance techniques

Preparation
A backup never saved anybody’s job. The restore did.
Plan for the restore, not the backup

The Restore Strategy
RPO & RTO: What are your goals?
Layers of disaster / layers of recovery
◦ Disk, Server, Network, Datacenter…
Time = money
◦ Lower downtime = higher cost of equipment and labor
◦ Higher RPO/RTO = higher potential cost of fines, loss of business, refunds, etc.
◦ RPO/RTO determined by cost

Backup Options
Full Backup
◦ Database, Filegroup, File
◦ All recovery models
Differential
◦ Database, Filegroup, File
◦ All recovery models
Transaction log
◦ Database only
◦ Not available in SIMPLE recovery mode

The Full Database Backup
Restore an entire database
Begin a point-in-time restore
Begin point of a FG/file/page restore (pull 8kb from last week’s backup, place it in running database)
Does NOT break the log chain

The Full Backup File(s)
Contains every allocated page
Plus enough tx log to bring DB consistent
Tx log will not be cleared during full backup (space planning)

Differential Backups
All changed extents since last Full backup
Plus enough tx log for consistency
Can save lots of time on restores

Log Backups
Changes since last log backup
Sequential record of all changes
Can be taken after loss of data file(s), if log file is available (Full Recovery Model only)
N/A for Simple Recovery Model

The Transaction Log
One file per DB is enough
Write-ahead logging
Both redo and undo tracked
ACID

The Transaction Log
Recovery Model vs. Logging Model
Crash recovery

Bulk-Logged Recovery Model
Full recovery model, with exceptions:
◦ Minimally-logged transactions (ML) only record allocations
◦ Log can’t redo – CHECKPOINT on commit (ouch!)
Log backups of ML transactions include all changed data pages

The Log Chain
Each log backup = changes since last log backup
The sequential collection of restorable transaction log backups = log chain
Starts with a full backup
Is not tied to the most recent full backup

BACKUP… MIRROR TO
Enterprise Edition only
Specify additional copies of backup file(s)
Up to 3 mirrors
Works with Full, Diff, and Log backups

Restore Options
Entire database in one operation
Partial (Ent.Ed.)
◦ Restore PRIMARY FG, bring DB online
◦ Restore additional FGs, bring online one-by-one (partitioning bonus)
Corruption Fixes (Online if Ent.Ed.):
◦ Restore damaged files
◦ Repair damaged pages

Restore Options
[Diagram, repeated across three slides: a backup timeline with Full A, Diff A-1, Diff A-2, Full B, and Diff B-1 above a continuous sequence of Log backups]
BACKUP … TO DISK = 'V:\Logs\...'
MIRROR TO DISK = 'W:\Logs\...'

Demo
Physical Corruption: detection and different restore/repair types
(Let’s break stuff!)
◦ Disk corruption – non-clustered index
◦ Disk corruption – clustered index
◦ Lost data file

Document, Practice, Drill, Repeat
At restore time: panic, anguish, and unhappy executives
Crises don’t honor vacation schedules or work hours
Script, automate, document

Thanks for attending!
Please fill out the survey.
Download these slides and scripts at SQLSaturday.com
Stick around for the raffle!
Then join us at the afterparty at Champps Americana
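
Companion sketch (not part of the original slides)
The detection and page-repair steps named in the slides (CHECKSUM page verification, msdb.dbo.suspect_pages, DBCC CHECKDB without repair options, and repairing a damaged page from a full backup plus the log chain) could look roughly like the T-SQL below. This is a minimal sketch, not the presenter's demo script. It assumes the Full recovery model and an edition that allows online page restore. The database name SalesDB, the X:\Backups\ paths, the file names, and page 1:57 are all hypothetical placeholders.

```sql
-- Hypothetical names throughout: SalesDB, the X:\Backups\ paths, page 1:57.

-- 1. Detection: turn on CHECKSUM page verification, then look at the
--    pages SQL Server has already recorded as damaged.
ALTER DATABASE SalesDB SET PAGE_VERIFY CHECKSUM;

SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;

-- 2. Assessment: run CHECKDB with no repair option and let it complete.
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- 3. Page restore: pull only the damaged page from a full backup.
RESTORE DATABASE SalesDB
    PAGE = '1:57'
    FROM DISK = N'X:\Backups\SalesDB_full.bak'
    WITH NORECOVERY;

-- 4. Apply every log backup taken since that full backup, in order.
--    (One RESTORE LOG per backup file; only one is shown here.)
RESTORE LOG SalesDB
    FROM DISK = N'X:\Backups\SalesDB_log_01.trn'
    WITH NORECOVERY;

-- 5. Back up the current log, then restore that backup WITH RECOVERY
--    so the restored page is brought up to date with the rest of the database.
BACKUP LOG SalesDB
    TO DISK = N'X:\Backups\SalesDB_log_tail.trn';

RESTORE LOG SalesDB
    FROM DISK = N'X:\Backups\SalesDB_log_tail.trn'
    WITH RECOVERY;
```

If the damaged page belongs to a non-clustered index, dropping and recreating that index may be simpler than a restore, which is why the slides suggest reading the CHECKDB output in full before choosing a fix.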