Download IT 21003 Database Administration

IT 21003 Database Administration Section 08 Protecting the Database from Failure Backup and Recovery  Questions to ask:      Who has the responsibility? Who does it? Who plans the procedures? Who takes the precautions? Who makes it happen?  Possibilities:  End Users  Operators  Power Users  Application or Systems Programmers  Systems Managers Tape Librarians  DBA Protecting the Database from Failure Backup and Recovery: DBA Activities  The DBA has the responsibility for planning backup and recovery procedures/strategies to cope with hardware and software failure  Disasters cause high-stress situtations  The plan must be simple, preferably automated  Should minimize human intervention  Can use the Recovery Manager to help automate these procedures  The Recovery Plans must be well documented  The recovery procedures must:  Bring the database back in a consistent state at the time of the crash  Recover the database in a reasonable time period so that downtime is minimized Protecting the Database from Failure Backup and Recovery Requirements  Recovery plans should be discussed in the early stages of database development  Some key issues: Is the database volatile? Can the users afford to lose a day’s work? Can the database be taken down on a regular basis?  It is important to establish a Service-Level Agreement (SLA) with the user departments  Backup and recovery strategies are much simpler if users understand the process  7x24 databases require complex backup and recovery routines and carefully planned downtime for maintenance Protecting the Database from Failure Backup and Recovery Techniques  Dumping and Logging  Most sites cannot allow the lose of any work  File dumps of the database are taken at set intervals, and logs of the changes between the dumps are produced  These files can then be used to perform full recovery of a database  If the database (or parts of it) are lost  Restore from the latest dump  Roll Forward in time by applying the logs of all the changes  All these facilities are available in the Oracle Recoverer Protecting the Database from Failure Possible Failures  User Errors – hard to plan for  Transaction Errors  User Process Failures  Network Failures  Instance Failures  CPU (Host) Failures  Media Failures Protecting the Database from Failure User Errors  Typical user errors An unwanted change is committed to the database, but the mistake is not realized until some time later Accidental drop of a table (cannot be rolled back) Performing wholesale deletes or updates without a WHERE clause  Prevention Users should be trained in database and application principles  Recovery  Committed data cannot simply be backed out  Other transactions may have already used the bad data  Even if the user knows the correct value, user management should probably be consulted before any change is made  Recovery Manager can be used to recover a tablespace to the point before the error occured Protecting the Database from Failure Transaction Errors  Typical transactions errors Lack of space when a table needs to expand Lack of privileges on underlying objects Deadlock with conflicting transactions Programming errors within and application  Prevention  Cannot guarantee prevention of any of the above  Careful planning of storage needs will help  Recovery  Intervention by the DBA is not normally required  Recovery performed by Oracles automatic rollback processes Protecting the Database from Failure User Process Failures  Typical causes of user process failures Abnormal disconnection of user process Loss of client due to user action Addressing exception  Prevention Not much can be done by the DBA  Recovery  Intervention by the DBA is not normally required  Aborted processes are detected by PMON which performs a rollback of the active transaction and releases all resources  PMON cleans out all caches and frees up memory occupied by the failed process Protecting the Database from Failure Network Failures  Typical types of network failures Network or phone connections lost Client workstation becomes disconnected  Recovery  Intervention by the DBA is not normally required  Loss of the connection is detected automatically by PMON  PMON forces a rollback of the client’s transaction Protecting the Database from Failure Instance Failures  Typical causes of instance failure Loss of an Oracle background process  Prevention Not much can be done Monitor the trace files for clues to the cause Recovery  Intervention by the DBA is not normally required except to shutdown and restart the database  SMON process automatically applies redo log information and rolls back all uncommitted changes on instance startup Protecting the Database from Failure CPU (Host) Failure  Typical causes of CPU failure Loss of power  Prevention Utilize a UPS Recovery  Intervention by the DBA is not normally required except to have already created scripts that restart the database on computer bootup  SMON process automatically applies redo log information and rolls back all uncommitted changes on instance startup Protecting the Database from Failure Media Failure – with no data loss  Typical failures Loss of disk controller  Prevention Utilize a UPS Recovery  Intervention by the DBA is not normally required except to have already created scripts that restart the database on computer bootup  SMON process automatically applies redo log information and rolls back all uncommitted changes on instance startup Protecting the Database from Failure Media Failure – with data loss  Typical failures  Disk Head crash  Disk or data destruction  Fire  Natural disasters  Accidental datafile deletion  Serious Failures Requires DBA action Requires plans and procedures to be in place Recovery  A strategy should be in place to handle all possible losses of data  Data loss requires both rollforward and rollback procedures  Same steps as instance recovery but probably from an earlier point in time  Need to establish which files are lost  Backup method and type of files lost dictate the recovery method used Protecting the Database from Failure Media Failure – with data loss (further protections)  Mirroring  Some sites are able to use mirror disks  Offers fault tolerance against media failure  Cannot protect against user and software errors  Mirrored Systems Expensive Whole systems mirrored for full fault tolerance  Hot standby databases  Oracle can keep a standby database in permanent recovery mode  It is a clone of the target database  Archived redo logs generated by the target are applied to the standby database as they are produced  RAID  Allows the use of additional disks to protect database files  Numerous configurations possible Protecting the Database from Failure Backup Strategies  All files need to be protected Datafiles Redo Log Files Control Files A backup strategy must be developed A choice of techniques is available to consider Rehearsals should be carried out periodically of both backup and recovery procedures  Need to consider:  The volatility of the database?  Can users afford to lose any data?  Can the database have planned downtime?  How large is the database? Protecting the Database from Failure Archive Logging  When active online redo log file is full, it is written out to an archive log file  Requires an additional background process ARCH – Archiver This is an optional process  Archiving allows the following features: Full recovery of the database to a state of consistency as of the time of the crash No loss of any committed data Important for transaction-oriented volatile databases  Oracle databases can run in either  ARCHIVELOG mode or  NOARCHIVELOG mode Protecting the Database from Failure Archive Logging  ARCHIVELOG mode  Allows full recovery of database  Hot backups can be performed while the database remains available  Management overhead concerns because the archive disk can become full  Writing to tape can be slow so archive to disk first  Highly active systems can fill redo log files faster than the archive to tape process  This will halt the instance until the archiver catches up Protecting the Database from Failure Archive Logging  NOARCHIVELOG mode  Allows the fastest performance of database  Advantages  Redo log files are cyclically reused because no archive logs are produced  No management overhead  Mass updates possible without fear of filling up the archive disk  Disadvantages  Cold, whole database backup only allowed  Damaged tablespaces stop the instance until dropped or restored  Recovery from media failures is generally not possible  Recovery Options  Restore from a previous backup  Restore from a previous export using the import utility Protecting the Database from Failure Switch to Archive Logging  The database can be switched to ARCHIVELOG mode after creation  Steps: Shutdown Normal Startup Mount db1 Alter Database db1 Archivelog; Alter Database Open;  The database is placed into ARCHIVELOG mode while in the MOUNTed state  This does not automatically enable archiving Protecting the Database from Failure Switch to Archive Logging  When a database is put into ARCHIVELOG mode the redo log files are not automatically archived  Two methods to turn on automatic archiving  Use the parameter file to enable permanent automatic archiving  Log_Archive_Start = True (False is the default)  Or while the instance is running issue:  Alter System Archive Log Start;  This starts automatic archiving but does not survive a shutdown and startup  Remember this only activates automatic archiving if the database is already in ARCHIVELOG mode  Stopping automatic archiving:  Alter System Archive Log Stop; Protecting the Database from Failure Setting a destination for Archive Log Files  The Log_Archive_Dest parameter  Log_Archive_Dest = d:\archive  This can be a single point of failure so be careful  Log_Archive_Dest_1 = d:\archive  Log_Archive_Dest_2 = e:\archive  Log_Archive_Dest_3 = e:\backup  Log_Archive_Dest_4 = f:\archive  Up to five destinations can be specified  No single point of failure Protecting the Database from Failure Archive Summary Information  Archive Log List;  This statement generates the following output:  Database log mode  Automatic archival  Archive destination  Oldest online log sequence  Next log sequence to archive  Current log sequence number Protecting the Database from Failure Redo Log Management  How many?  How big?  Where are they stored?  How are they protected/recovered?  What do they contain?  How will they be used?  Which mode of operation? Protecting the Database from Failure Redo Log Management  Minimum of two log files are required Redo log files are used in a cyclic nature Should use enough log groups to allow archiving to take place before log files need to be reused  Default size is 500 KB Minimum size is 50 KB There is no maximum size Usual size is 1 - 10 MB All log files should be the same size  Should be positioned on separate disks Protecting the Database from Failure Redo Log Management  Should be duplexed (Mirrored)  Oracle is vulnerable to the loss of redo logs  Database stops and full recovery is impossible  Redo logs contain both before and after images of changed data Redo logs provide protection for both data and rollback segments to allow total recovery  Redo logs are used to recover from instance and media failures Can be applied to restored backup copies of the database files Database changes are written to redo logs on transaction commits When redo log is full, a log switch and checkpoint occurs  Redo logs can be archived as they are filled Protecting the Database from Failure Redo Log File Switches  A checkpoint occurs at each switch and each new log is assigned a sequence number Redo Log Group 1 Log 1 Redo Log Group 2 Archive Redo Logs LGWR ARCH Log 1 LGWR 1 Log 2 ARCH Log 3 LGWR 2 Log 2 ARCH Log 3 LGWR 1 3 Log 4 2 1 Protecting the Database from Failure Redo Log Group  Number the groups 1, 2, 3 not 10, 20 , 30 Takes up less space in the control file which references the redo log files  Use just enough groups to prevent waits for checkpoints and archive operations  If Oracle loses a member of a group, it is recorded as STALE and an error is recorded in the ALERT file and LGWR trace file System is not halted, LGWR continues to write to the other members of the group  The archiver can archive any one member of a group  If the log file is unavailable archiver simply chooses an alternative  LGWR cannot reuse a log file currently being archived Protecting the Database from Failure Redo Log Group  Adding a new redo log group  Alter Database db_name Add Logfile Group x ‘filespec’ Size n (K | M);  Adding a new redo log member  Alter Database db_name Add Logfile Member ‘filespec’ To Group n;  Dropping Redo Log Groups  Alter Database db_name Drop Logfile Group n;  At least three log file groups must be present  Must have at least two for LGWR to function Protecting the Database from Failure Managing and Protecting Control Files  Control Files  Relatively small binary files that are critical to the operation of a database  Contain database definitions:  Database name  Names and locations of redo log files and datafiles  Current log sequence numbers and status  Checkpoint timestamp information  Database creation timestamp  Beginning and ending blocks of rollback segments  Information required by the Recovery Manager  All control files must be available while the database is open  A database cannot be opened if a control file is lost Protecting the Database from Failure Control Files: Current Advice  Store at least two copies on separate disks  Better to have a copy on each disk  Oracle mirrors any changes to the control files on all copies  Control files are updated at each checkpoint with a slight impact on performance  If you have only one disk available place the control files in different areas to avoid problems from bad sectors  Backup the control file whenever a change is made to database or redo log files  If primary control files are lost than a full recovery is possible from backup control files Protecting the Database from Failure Adding a New Control File  Shutdown the database  Copy the existing (good) control file to a file(s) on another disk(s) using the operating system  Specify the new control file name in the CONTROL_FILES parameter in the parameter file  Start up the instance and open the database using the changed parameter file Protecting the Database from Failure Safeguarding Control Files  Perform a control file backup each time a database change is made Alter Database Backup Controlfile To ‘filespec’;  Trace out the control file to produce a humanreadable file  Alter Database Backup Controlfile To Trace;  This produces a file in the trace directory that can be read, edited and used to reconfigure the control file if needed Protecting the Database from Failure Loss of a Control File  If any control files are lost, the database cannot function correctly  Shutdown the database (may require a Shutdown Abort) Using the OS make a copy of the good control file but not in the same location as the problem control file  Specify the location of the new control file in the parameter file and remove reference to the bad control file  Start up the database and return to normal function Protecting the Database from Failure Loss of All Control Files  Serious Problem Call Oracle Support  Media recovery is likely necessary  Can restore from a backup and then use an appropriate backup control file to recover the database  It is sometimes possible to re-create a control file using the CREATE CONTROLFILE statement Protecting the Database from Failure The Alert Log File  Written by Oracle and stored in oraclehome/admin/oracle_sid/bdump directory  Contains information on  Database creation  Startup of each background process  Log switches  Datafile and logfile manipulation  Internal errors  Block corruption  Deadlocks  The file grows without bound  Must decide what to keep online and offline Protecting the Database from Failure Questions?

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download IT 21003 Database Administration