Part 2 -- RMAN in the Trenches: To Go Forward, We Must Backup
Philip Rice, Univ. of California Santa Cruz
NoCOUG: August 16, 2007

Overview
- Motivation: Few RMAN sessions, & Giving Back
- Experience Level: Intermediate & Beginner
- Corruption Detection
- Metadata Management and Reporting
- The Good, The Bad, The Ugly (a sampling)
- Flashback
- Performance/Tuning (safety first!)
- Plenty of material today; please ask if something needs clarifying, otherwise best to save questions for the end

Corruption Detection
- Default is to stop the backup as soon as corruption is detected
- SET MAXCORRUPT for each datafile would override that
- But MAXCORRUPT should only be used when the priority is finishing the rest of the backup vs. repairing corruption (seldom)
- BACKUP VALIDATE will expose other corruptions, and repair can be done

Corruption Detection
- Can use RESTORE...VALIDATE to check on backups; this is not checking datafiles (.dbf)
- From Oracle Press book: "...validation is not a comprehensive test."
- RESTORE DATABASE looks at headers in the level 0 backup, which is used to get datafiles
- Level 1 has changes applied on top of those datafiles, so level 1 would not come into play until doing RECOVER rather than RESTORE

Corruption Detection
- RECOVER...VALIDATE n/a, but can use VALIDATE instead of RESTORE...VALIDATE
- Use KEY column values from LIST BACKUP SUMMARY
- Testing shows we can examine any or all backups, including level 1 and archivelog backups
- Alternate: RECOVER DATABASE TEST, but docs say that the TEST clause can be used "only if you have restored a backup taken since the last RESETLOGS operation."
- I tried it: said system datafile in use

Corruption Detection
- The init parameters db_block_checking and db_block_checksum will detect datafile corruptions, as reads and writes are occurring
- Similar, but not interdependent:
- When block checking is on, blocks are examined for internal consistency -- always enabled for the system tablespace, but off by default for other tablespaces.
- When checksum is on, corruption caused by underlying I/O systems can be detected. If set to FULL, it also catches in-memory corruptions and stops them from making it to the disk. Default is TYPICAL, same as TRUE: 9i backward compatibility

Corruption Detection
- For strongest possible corruption protection with RMAN backups, a White Paper (http://www.oracle.com/technology/deploy/availability/pdf/corruption_wp.pdf) recommends:
- In the initialization parameter file, set DB_BLOCK_CHECKSUM=TRUE (default setting; default is TYPICAL for 10g, TRUE for backward compatibility)
- In BACKUP and RESTORE commands, do not specify the MAXCORRUPT option, do not specify the NOCHECKSUM option, but do specify the CHECK LOGICAL option

Corruption Detection
- Turn on db_block_checking (for non-system tablespaces) with LOW, MEDIUM, FULL in 10g, with 1-10% overhead
- TRUE (backward compatible from 9i) is the same as FULL
- In docs for this parameter: "You should set DB_BLOCK_CHECKING to FULL if the performance overhead is acceptable."

Corruption Detection
- FULL option for db_block_checksum (10g): Extra 4-5% overhead
- 10.2 docs: "catches in-memory corruptions and stops them from making it to the disk. [...]. Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL."
- Steve Adams says I/O intensive queries with moderate to high CPU use can be worse than the estimate indicated in the docs
- Testing is advisable

Corruption Detection
- DB_BLOCK_CHECKSUM FULL setting or not:
- Prior job: CPU glitch discovered after months in production, introduced corruption due to heavy batch job use.
- Financial repercussion $1M+ (1999 $$), vendor essentially gave a top end machine to compensate
- Cost for not capturing in-memory corruption could be high.

Metadata Basics
- Data Dictionary for backup work
- Always in controlfile in virtual tables (V$ views)
- Optionally in separate catalog DB, comparable info in real tables (& RC* views)
- Catalog: Open ended time span, multiple DBs

Metadata: Create New Controlfile
- All previous metadata lost; asking for fresh start
- How does RESYNC affect catalog?
- Testing shows resync is 1-way street, controlfile to catalog
- Nice surprise: Testing shows resync does not wipe out catalog entries in control_file_record_keep_time period; metadata not lost in catalog DB
- Catalog recommended by Oracle; less critical with newer features

Metadata Management
- Safety: With CONFIGURE command, turn on autobackup of the controlfile
- Controlfile plus catalog: extra layer for safety
- "High Availability Best Practices" section 2.5.3.2: Run source backups in nocatalog mode to reduce dependency on the catalog database being available. At a later point, do a resync
- Feature idea: be able to connect to two catalogs; like mirrored disks; alternatively, do standby of catalog DB

Metadata Reporting: Runtime trends for disk/tape
Do crosstab from RC_BACKUP_PIECE view. Make anything before 4PM part of previous overnight run:

    CREATE OR REPLACE VIEW ucsc_bkup_trend_insert_vw (...column aliases...)
    AS SELECT
      CASE WHEN to_char(p.START_TIME,'HH24') < 16
           THEN trunc(p.START_TIME - 1)
           ELSE trunc(p.START_TIME) END AS bkup_date,
      sdl.SERVER_NAME, d.name, p.device_type,
      nvl(max(CASE WHEN backup_type = 'D' THEN p.ELAPSED_SECONDS END),0) AS LVL0_secs,
      nvl(max(CASE WHEN backup_type = 'I' THEN p.ELAPSED_SECONDS END),0) AS LVL1_secs,
      nvl(max(CASE WHEN backup_type = 'L' THEN p.ELAPSED_SECONDS END),0) AS ARCH_secs,
      nvl(max(CASE WHEN backup_type = 'D' THEN p.BYTES END),0) AS LVL0_bytes,
      nvl(max(CASE WHEN backup_type = 'I' THEN p.BYTES END),0) AS LVL1_bytes,
      nvl(max(CASE WHEN backup_type = 'L' THEN p.BYTES END),0) AS ARCH_bytes,
      p.START_TIME, p.COMPLETION_TIME
    FROM rc_backup_piece p, rc_database d, ucsc_server_db_list sdl
    [...]

Metadata Reporting: Runtime trends for disk/tape

    [crosstab from RC view...]
    WHERE p.DB_KEY = d.DB_KEY
      AND d.NAME = sdl.DB_NAME
      AND p.backup_type in ('D','I','L')
    GROUP BY
      (CASE WHEN to_char(p.START_TIME,'HH24') < 16
            THEN trunc(p.START_TIME - 1)
            ELSE trunc(p.START_TIME) END),
      d.NAME, sdl.SERVER_NAME, p.DEVICE_TYPE, p.START_TIME, p.COMPLETION_TIME
    ORDER BY
      (CASE WHEN to_char(p.START_TIME,'HH24') < 16
            THEN trunc(p.START_TIME - 1)
            ELSE trunc(p.START_TIME) END),
      sdl.SERVER_NAME, d.name, p.device_type, p.START_TIME;

Make persistent table so we have trend info beyond retention period:

    create table ucsc_bkup_trend_details as
    select * from ucsc_bkup_trend_insert_vw;

Do scheduled inserts so trend info is available long term.

Metadata Reporting: Runtime trends for disk/tape
In next 4 slides, we see OPTIMIZATION ON in effect for last several days on graphs. For OPTIMIZATION OFF in earlier days, two factors:
1. Archives on disk for 3 days, used in transition at our site: results in 3 copies of archive backups. This was known/expected.
2. BACKUP BACKUPSET gives multiple tape copies, including Level 0 each day!
   The repository knows about Incremental Level, but behavior is different from making the original backupset. This was not expected, and metadata reporting brought out this difference.

Metadata Reporting: Schedule Planning -- Disk Time
- Archive backups greatly reduced in last few days
[Chart: disk backup time in Minutes (axis 0 to 250) per day, 7/20 through 8/6; series ARCH, LVL1, LVL0]

Metadata Reporting: Tape to Disk Size Ratio
- Size greatly reduced, no extra Lvl0 copies
- Ratio reduced from Max of 16:1, down to 1:1
[Chart: Tape to Disk Ratio (axis 0 to 18) per day, 7/21 through 8/6]

Metadata Reporting: Tape Runtime
- Multiple tape processes from MML
- Execution Time higher than Clock Time
- Not cumulative (not stacked) line chart
[Chart: tape runtime in Minutes (axis 0 to 1200) per day, 7/19 through 8/6; series CLOCK, EXEC]

Metadata Reporting: Disk/Tape Cumulative Runtime -- stacked line chart
[Chart: runtime in Minutes (axis 0 to 700) per day, 7/19 through 8/6; series TAPE, DISK]

Metadata Reporting: Compression Ratio
- COMPRESSION_RATIO column is in 10 _summary and _details views, but these are "primarily intended to be used internally by Enterprise Manager."
- Before finding that caveat, I had already seen results that led me not to trust RC_BACKUP_SET_DETAILS; good to avoid.
- In following chart (10.2.0.2 for testing), Input Bytes is same for Level 0 and 1, so calc is from total DB used space. Level 1 ratio is distorted.
- The BDF table in RC_BACKUP_DATAFILE view knows about Block Change Tracking, and has count of blocks scanned, so a better ratio could be calculated.

Metadata Reporting: Compression Ratio from RC_BACKUP_SET_DETAILS
- For device_type = '*', MAX Ratio for Level 1 of 690 to 1
[Chart: compression Ratio (axis 0 to 800) by Bkup Type, bars for MAX, AVG, MIN. Data labels: LVL1 690, 391, 47; LVL0 7, 7, 6; ARCH 6, 6, 4; CTLFL 1, 1, 1]

The Good, The Bad, The Ugly (a sampling)
- Gradual improvements in each release: e.g. binary compression in 10g
- I requested a couple: separate retention periods for disk/tape, ability to display connection information at the RMAN prompt
- Corruption detection during block scanning
- History in the RMAN catalog, e.g. disk and tape/MML runtimes for planning purposes

The Good, The Bad, The Ugly (a sampling)
- Unavoidable: When testing, we can't modify metadata in the controlfile to alter behavior for our purposes, e.g. the shortest retention window is 1 day (use ALTER SYSTEM SET FIXED_DATE)
- No command editing or buffer display in RMAN comparable to sqlplus; LIST command without summary clause can be copious, can be off the terminal buffer, so cmd not retrievable
- NLS_DATE_FORMAT variable must be set before starting RMAN (no SET cmd)

The Good, The Bad, The Ugly (a sampling)
- RMAN is tied into SQL engine, but no SELECT; for catalog query (sometimes better than LIST), need separate sqlplus session
- RMAN will make a new backup file, but for backups in separate directories based on database name (%d in the format), it won't make a new directory for us; causes backup failure

The Good, The Bad, The Ugly (a sampling)
- Can't reconnect a different way after command requiring repository connection (e.g. BACKUP), must exit and start over
- GLOBAL stored script in 10g is a step forward: But no variables and language
- PL/SQL is inherently in engine, e.g. could execute correct RMAN syntax based on query to determine DB version, allow generic script

Cover Your Sixes ...

...so you don't get caught by surprise!

Cover Your Sixes
- In 10.2 docs for ALLOCATE CHANNEL: "You must use a recovery catalog when backing up a standby database." -- another benefit of catalog
- "When using Flashback Database with a target time at which a NOLOGGING operation was in progress, block corruption is likely in the database objects and datafiles affected by the NOLOGGING operation."

Cover Your Sixes: Syntax
Syntax can be similar with different meanings:

    # We're doing a 'normal' backup here, not an image copy:
    RMAN> backup as backupset ...;
    # The backupset that was created before is copied to another destination:
    RMAN> backup backupset ...;
    -------------------------------------------------
    # These two will deal with all types: controlfile, datafile, and archivelog
    RMAN> CROSSCHECK BACKUP;
    RMAN> LIST EXPIRED BACKUP;
    -------------------------------------------------
    # These two affect archivelogs, not backups of archivelogs
    RMAN> CROSSCHECK ARCHIVELOG ALL;
    RMAN> LIST EXPIRED ARCHIVELOG ALL;

Cover Your Sixes: tape
- The "BACKUP BACKUPSET" command did not pick up format from CONFIGURE, it used the default of "%U", not what I specified for tape:
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%T_%U' SEND [...]
- But using the format in ALLOCATE CHANNEL in the script was successful.
- Docs say default of "%U" is unique, but it gave us occasional duplicate tape file names.
- Virtual Tape Library -- "BACKUP BACKUPSET" command can not copy from tape to tape. We will still want VTL as secondary storage, not as a replacement for our disk backup area.

Flashback
- 9i was logical only, using Undo
- 10g Flashback Database is physical, using Flashback Logs; rewind DB faster than PIT recovery
- When to use? Business can not lose transactions for a number of hours!!

Flashback
Scenarios for Flashback Database:
- "...save the SCN to a spool file, for example, before running a high-risk batch job."
- "Easy conversion of a physical standby database to a reporting database and back to a standby. [...] reverse the activation of a standby database."
- Test/Dev DB: known starting point for tests
- Standby can be reverted to an earlier time, which could allow examination and manipulation in two different time periods. This would allow recovery of corrupted objects.

Flashback
- Flash Recovery Area (FRA), many file types recommended: redo, archive, ctlfile, backupsets
- Potential for more disk contention when all in one area
- Example: RAID10 for DB plus redo, archive, ctlfile now; RAID5 for backupsets (write penalty not significant enough for off hours batch job); vendor app generated 25GB in a half hour for temp tables in reporting(!)
- Do FRA for a reason, not on a whim

Tuning/Performance
- Minimize what needs to be read and written:
- 10.2.0.2 can skip empty blocks in addition to unused blocks
- 10g can do binary compression to disk, so much less needs to be taken to tape
- Concern about backups adversely affecting online use? RATE clause can limit disk I/O
- Look at max bytes/sec of disk system, e.g. for 10M max possible, RATE of 5M would allow 1/2 of disk capability for non-backup purposes

Tuning/Performance
- For sites using a physical standby, docs say that turning off db_block_checking in a physical standby "can provide as much as a twofold increase in the apply rate" for redo logs, but db_block_checking should not be turned off at the primary database
- Metalink Note 311068.1 -- RMAN Performance Tuning Diagnostics

Tuning/Performance
References:
- http://www.oracle.com/technology/deploy/availability/pdf/rman_performance_wp.pdf
- http://www.oracle.com/technology/deploy/availability/pdf/br_optimization.pdf
- Chapter in 10.2 RMAN Docs -- Advanced User's Guide: http://downloadwest.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmtunin.htm#sthref1057

A&Q
- Acknowledgements: Timothy Chien, Oracle Product Mgr.; Bill Wagman, UC Davis
- Presenter: Philip Rice, price [at] ucsc.edu
- A&Q Answers: Wisdom to share? Questions?
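
Supplementary sketch for the Metadata Reporting slides: the trend view assigns any backup piece that starts before 4PM to the previous overnight run. The Python below is only an illustration of that one CASE expression, not part of the original scripts; the function name `backup_night` and the sample timestamps are made up for the example.

```python
from datetime import date, datetime, timedelta

CUTOFF_HOUR = 16  # 4PM, the same cutoff the view's CASE expression uses


def backup_night(start_time: datetime) -> date:
    """Return the date of the overnight run a backup piece belongs to.

    Mirrors the view logic: a piece that starts before 4PM is counted
    with the previous calendar day; otherwise with its own day.
    """
    if start_time.hour < CUTOFF_HOUR:
        return (start_time - timedelta(days=1)).date()
    return start_time.date()


# Hypothetical start times, for illustration only.
evening = datetime(2007, 8, 1, 22, 30)        # run starts in the evening
after_midnight = datetime(2007, 8, 2, 3, 15)  # same run, next calendar day

print(backup_night(evening))         # prints 2007-08-01
print(backup_night(after_midnight))  # prints 2007-08-01
```

Both pieces land on the same bkup_date, which is what lets a run that crosses midnight show up as a single point on the trend charts.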