Download PPT - Ajay Ardeshana

Unit - 2 Database Backup And Recovery

Introduction :
• Concurrency control and database recovery are both part of transaction management.
• Recovery is required to protect the database from data inconsistencies and data loss.
• It ensures the atomicity and durability properties of transactions.
• These characteristics of the DBMS help it to recover from a failure and restore the database to a consistent state.

Database Recovery Concepts :
• Database recovery is the process of restoring the database to a correct state in the event of a failure.
• It is the process of restoring the database to the most recent consistent state that existed shortly before the time of the system failure.
• The failure may be the result of a system crash due to hardware or software errors, a media failure such as a head crash, or a software error in the application, such as a logical error in a program that is accessing the database.
• A number of recovery techniques are used, all based on the atomicity property of transactions.
• A transaction is considered a single unit of work in which all operations must be applied and completed to produce a consistent database.
• If, for some reason, any transaction operation cannot be completed, the transaction must be aborted and any changes to the database must be rolled back (undone).
• Thus, transaction recovery reverses all the changes that the transaction has made to the database before it was aborted.
• If the entire database needs to be recovered to a consistent state, the recovery uses the most recent backup copy of the database in a known consistent state.
• The backup copy is then rolled forward to restore all subsequent transactions by using the transaction log information.
• If the database needs to be recovered but the committed portion of the database is still usable, the recovery process uses the transaction log to undo all the transactions that were not committed.
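The roll-forward step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log is modelled as an in-memory list of tuples, and the record layout and the name roll_forward are hypothetical, not taken from any particular DBMS.

```python
def roll_forward(backup, log):
    """Return a copy of `backup` with the writes of committed transactions reapplied.

    Hypothetical record layout (an assumption for this sketch):
      ("BEGIN", txn), ("WRITE", txn, item, after_value), ("COMMIT", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    db = dict(backup)  # work on a copy; the backup itself stays untouched
    for rec in log:    # replay in the original execution order
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _txn, item, after_value = rec
            db[item] = after_value
    return db


backup = {"A": 100, "B": 200}
log = [
    ("BEGIN", "T1"),
    ("WRITE", "T1", "A", 150),
    ("COMMIT", "T1"),
    ("BEGIN", "T2"),
    ("WRITE", "T2", "B", 50),  # T2 never committed, so this write is skipped
]
print(roll_forward(backup, log))  # {'A': 150, 'B': 200}
```

Only the writes of T1 are reapplied; the uncommitted transaction T2 leaves no trace in the restored copy.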
 Database Backup : Some DBMSs provide functions that allow the database administrator to schedule automatic database backups to secondary storage devices such as, disks, CDs, tapes and so on.  The level of database backups can be taken as follows :  A full backup or a dump of database.  A differential backup of the database on which only the last modifications done to the database, when compare with the previous backup copy, are copied.  A backup of transaction log only. This level backs up all the transactions log operations that are not reflected ion a previous backup copy of the dataabse. Types of Database Failure : There many types of failure that can affect the database processing  Some failure affect the main memory only, while others involve secondary storage. 1. Hardware Failure : Hardware failure may include memory errors, disk crashes, bad disk sectors, disk full error and so on.  Hardware failure can also be attributed to design errors, poor quality control during fabrication, overloading and wear out of mechanical parts. 2. Software Failure : Software failure may include a failure a failures related to software such as, operating systems, DBMS software, application Programs and so on. 3. System Crashes : System crashes are due to hardware or software errors, resulting in the loss of main memory.  This could be the situation that the system has entered an undesirable state, such as Dead Lock, which prevent the program form continuing with normal processing. 4. Network Failure : Network failure can occur while using a Client-server configuration or distributed database system where multiple database servers are connected y common network.  Network failure such as communication software failure or aborted asynchronous connections will interrupt the normal operation of the database system. 5. Media Failure : Such failures are due to head crashes or unreadable media, resulting in the loss of parts of secondary storage. 
• They are the most dangerous failures.
6. Application Software Error :
• These are logical errors in the program that is accessing the database, which cause one or more transactions to fail.
7. Natural Physical Disasters :
• These are failures such as fire, floods, earthquakes or power failures.
8. Carelessness :
• These are failures due to unintentional destruction of data or facilities by operators or users.
9. Sabotage :
• These are failures due to intentional corruption or destruction of data, hardware or software facilities.

Types of Database Recovery :
• In case of any type of failure, a transaction must be either aborted or committed to maintain data integrity.
• The transaction log plays an important role in database recovery and in bringing the database to a consistent state.
• During recovery from failure, the recovery manager ensures that either all the effects of a given transaction are permanently recorded in the database or none of them are recorded.
• A transaction begins with the successful execution of a BEGIN TRANSACTION statement and ends with the successful execution of a COMMIT statement.
• The following two types of transaction recovery are used :
– Forward Recovery.
– Backward Recovery.

Forward Recovery (or REDO) :
• Forward recovery is the recovery procedure used in case of physical damage, for example failure of secondary storage, failure during writing of data to the database buffers, or failure while transferring buffers to secondary storage.
• The intermediate results of a transaction are written to the database buffers, which occupy an area in main memory. From these buffers, data are transferred to the secondary storage of the database.
• An update operation is regarded as permanent only when the buffers are flushed to the secondary storage of the database.
• The flushing operation can be triggered by the COMMIT operation of the transaction, or automatically when the buffers become full.
• If a failure occurs between writing to the buffers and flushing the buffers to secondary storage, the recovery manager must determine the status of the transaction that performed the WRITE at the time of failure.
• If the transaction had already issued COMMIT, the recovery manager performs a redo, so that the transaction's updates are applied to the database.
• This redoing of transaction updates is also known as roll-forward.
• Forward recovery guarantees the durability property of transactions.
• To recreate a lost disk, the system begins by reading the most recent copy of the lost data and the transaction log of the changes made to it.
• A program then reads the log entries, starting from the first one recorded after the copy of the database was made and continuing through to the last one recorded just before the failure.
• For each of these entries, the program changes the data value concerned in the copy of the database to the 'after value' shown in the log entry.
• This means that, whatever processing took place in the transaction that caused the log entry to be made, the net result of that transaction is stored in the database.
• This operation is performed for each log entry that caused a change in the database since the copy was taken, in the same order in which the transactions were originally executed.
• This brings the database copy up to date with the database that was destroyed.

Backward Recovery (or UNDO) :
• Backward recovery is the recovery procedure used when an error occurs in the middle of normal operation on the database.
• If the transaction had not committed at the time of failure, it leaves the database in an inconsistent state, and other programs may read the incorrect data and make use of it.
• The recovery manager must then undo (roll back) any effects of the transaction on the database.
• Backward recovery guarantees the atomicity property of transactions.
• In backward recovery, the recovery starts with the database in its current state and the transaction log positioned at the last entry that was made in it.
• A program then reads 'backward' through the log, resetting each updated data value in the database to its previous value as recorded in the transaction log, until it reaches the point where the error was made.
• Thus the program undoes each transaction in the reverse order from that in which it was made.

Example :-
– ts : Starting time of transaction
– tc : Time of disk crash
– tf : Time of transaction failure
• In this example all the transactions T1, T2, … T6 are executing concurrently.
• Let us assume that the data for transactions T2 and T4 have already been written to disk before the failure at time tf.
• It can be observed that transactions T1 and T6 had not committed at the point of the disk crash. Therefore the recovery manager must undo transactions T1 and T6 at restart time.
• However, it is not clear to what extent the changes made by the other, already committed transactions T2, T3, T4 and T5 have been propagated to the database on secondary storage.
• This uncertainty arises because the buffers may or may not have been flushed to secondary storage.
• Thus, the recovery manager is forced to redo transactions T2, T3, T4 and T5.

Recovery Techniques :
• The database recovery technique depends on the type and extent of the damage that has occurred to the database.
• These techniques are based on the atomic transaction property.
• The following two types of damage can occur to the database.

Physical Damage :
• If the database has been physically damaged, for example by a disk crash, then the last backup copy of the database is restored and the update operations of committed transactions are reapplied using the transaction log file.
• Note that restoration in this case is possible only if the transaction log has not been damaged.

Non-Physical or Transaction Failure :
• If the database has become inconsistent due to a system crash during the execution of transactions, then the changes that caused the inconsistency are rolled back (undone).
• It may also be necessary to roll forward (redo) some transactions to ensure that the updates performed by them have reached secondary storage.
• In this case the database is restored to a consistent state using the before-images and after-images held in the transaction log file.
• This technique is also known as the log-based recovery technique.
• For this, the following two techniques are used :
– Deferred Update
– Immediate Update

Deferred Update :
• In the deferred update technique, updates are not written to the database until after a transaction has reached its COMMIT point. In other words, the updates to the database are deferred (postponed) until the transaction completes its execution successfully and reaches its commit point.
• During transaction execution the updates are recorded only in the transaction log and in the cache buffers.
• After the transaction reaches its commit point and the transaction log is force-written to disk, the updates are recorded in the database.
• If a transaction fails before it reaches this point, it will not have modified the database, so no undoing of changes is necessary. However, it may be necessary to redo the updates of committed transactions, as their effects may not have reached the database.
• In the case of deferred update, the transaction log file is used in the following ways :
– When a transaction T begins, a transaction begin record (<T, BEGIN>) is written to the transaction log.
– During the execution of transaction T, each write operation adds a new log record containing all the log data specified previously. E.g. writing the new value ai for attribute A is recorded as "<WRITE(A, ai)>". Each record consists of the transaction name T, the attribute name A and the new value ai for attribute A.
– When all the operations comprising transaction T have completed successfully, we say that transaction T partially commits, and the record "<T, COMMIT>" is written to the transaction log.
– After transaction T partially commits, the records associated with T in the transaction log are used to execute the actual updates by writing to the appropriate records in the database.
– If transaction T aborts, its transaction log records are ignored and the writes are not performed.

Time      Transaction          Action
Time-1    READ(A, a1)          Read current loan balance
Time-2    a1 := a1 + 20000     Increase loan balance by 20000
Time-3    WRITE(A, a1)         Write new (updated) loan balance
Time-4    READ(B, b1)          Read current loan cash balance
Time-5    b1 := b1 - 20000     Reduce loan cash balance by 20000
Time-6    WRITE(B, b1)         Write new (updated) loan cash balance
Normal Execution of Transaction T

• The transaction updates an attribute called the employee's loan balance (EMP_LOAN_BAL) in the table EMPLOYEE.
• Assume that the current value of EMP_LOAN_BAL is 70000 and that CUR_LOAN_CASH_BAL is 80000.
• The transaction makes a loan payment of 20000 to the employee.
• After a failure has occurred, the DBMS examines the transaction log to determine which transactions need to be redone.
• If the transaction log contains both the start record <T, BEGIN> and the commit record <T, COMMIT> for transaction T, then transaction T must be redone.
• That is, the database may have been corrupted, but the transaction execution was completed and the new values for the relevant data items are contained in the transaction log.
• Therefore the transaction needs to be reprocessed.
• Redo sets the value of all data items updated by transaction T to the new values recorded in the transaction log.
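The deferred update rules above can be sketched as a small Python class. This is a toy model under stated assumptions: the class name DeferredUpdateDB, its methods and the crash_before_apply flag are hypothetical, and plain Python objects stand in for the database and the log on disk.

```python
class DeferredUpdateDB:
    """Toy model of deferred update: writes reach the database only after COMMIT."""

    def __init__(self, data):
        self.data = dict(data)  # stands in for the database on secondary storage
        self.log = []           # stands in for the transaction log on disk

    def begin(self, txn):
        self.log.append((txn, "BEGIN"))

    def write(self, txn, item, new_value):
        # Deferred: only the log records the new value, the database is untouched.
        self.log.append((txn, item, new_value))

    def commit(self, txn, crash_before_apply=False):
        self.log.append((txn, "COMMIT"))
        if not crash_before_apply:  # the flag simulates a failure right after COMMIT
            self.redo(txn)

    def redo(self, txn):
        # Set every item written by txn to the new value stored in the log.
        for rec in self.log:
            if rec[0] == txn and len(rec) == 3:
                _, item, new_value = rec
                self.data[item] = new_value

    def recover(self):
        # Redo committed transactions; uncommitted ones are simply ignored.
        committed = [r[0] for r in self.log if len(r) == 2 and r[1] == "COMMIT"]
        for txn in committed:
            self.redo(txn)


db = DeferredUpdateDB({"A": 70000, "B": 80000})
db.begin("T")
db.write("T", "A", 90000)
db.write("T", "B", 60000)
db.commit("T", crash_before_apply=True)  # failure just after <T, COMMIT>
print(db.data)  # {'A': 70000, 'B': 80000}
db.recover()
print(db.data)  # {'A': 90000, 'B': 60000}
```

No undo procedure appears in the sketch, because an uncommitted transaction has never touched the database in this scheme.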

Time                           Log Entries       Database Stored Value
Before start of transaction                      A = 70000, B = 80000
Time – 1                       <T, BEGIN>
Time – 2                       <T, A, 90000>
Time – 3                       <T, B, 60000>
Time – 4                       <T, COMMIT>
After transaction                                A = 90000, B = 60000
Deferred Update Log Entries for Transaction T

• Now let us assume that a database failure occurs under the following conditions :
– Just after the COMMIT record is entered in the transaction log and before the updated records are written to the database.
– Just before the execution of the WRITE operation.

Time                           Log Entries       Database Value
Before start of transaction                      A = 70000, B = 80000
Time – 1                       <T, BEGIN>
Time – 2                       <T, A, 90000>
Time – 3                       <T, B, 60000>
Time – 4                       <T, COMMIT>
Failure occurs just after the COMMIT record is entered and before the updated records are written to the database.

• Suppose the failure occurs just after the <T, COMMIT> record is entered in the transaction log and before the updated records are written to the database.
• When the system comes back up, the transaction log contains both the <T, BEGIN> and <T, COMMIT> records for transaction T.
• The REDO operation is therefore executed, resulting in the values 90000 and 60000 being written to the database as the updated values of A and B.

Time                           Log Entries       Database Value
Before start of transaction                      A = 70000, B = 80000
Time – 1                       <T, BEGIN>
Time – 2                       <T, A, 90000>
Time – 3                       <T, B, 60000>
Time – 4                       <T, COMMIT>
Failure occurs before the execution of the WRITE operation (the <T, COMMIT> record at Time – 4 is never written).

• In this case, when the system comes back up, no action is necessary because no COMMIT record for transaction T appears in the transaction log.
• So the values of A and B in the database remain 70000 and 80000.
• In this case the transaction must be restarted.

Immediate Update :
• In the immediate update technique, all updates to the database are applied immediately as they occur, without waiting for the COMMIT point, and a record of all changes is kept in the transaction log.
• In this technique, when the transaction begins, a record <T, BEGIN> is written to the transaction log, and each update operation is written to the transaction log on disk before it is applied to the database.
• This type of recovery method requires two procedures, namely :
– Redoing of transaction T : REDO(T), and
– Undoing of transaction T : UNDO(T).
• The first procedure redoes the same operations as in deferred update.
• The second one restores the values of all attributes updated by transaction T to their old values.

Time                           Log Entries               Database Value
Before start of transaction                              A = 70000, B = 80000
Time – 1                       <T, BEGIN>
Time – 2                       <T, A, 70000, 90000>
Time – 3                                                 A = 90000
Time – 4                       <T, B, 80000, 60000>
Time – 5                                                 B = 60000
Time – 6                       <T, COMMIT>
Immediate Update Log Entries for Transaction T

• In the case of immediate update, the transaction log file is used in the following way :
– When a transaction T begins, <T, BEGIN> is written to the log file.
– When a write operation is performed, a record containing the necessary data (the old value and the new value of the attribute) is written to the transaction log file.
– Once the transaction log record is written, the update is written to the database buffers.
– The updates to the database itself are written when the buffers are next flushed to secondary storage.
– When transaction T commits, a <T, COMMIT> record is written to the transaction log file.
• If the transaction log contains the record <T, BEGIN> but does not contain <T, COMMIT>, transaction T is undone: the old values of the affected data items are restored and transaction T is restarted. If the log contains both records, T is redone.
• Now suppose that a database failure occurs under the following conditions :
– Just before the WRITE action "WRITE(B, b1)".
– Just after "<T, COMMIT>" is written to the transaction log but before the new values are written to the database.
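The UNDO/REDO decision for immediate update can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory log whose write records hold (transaction, item, old value, new value); the function name recover_immediate is an assumption of this sketch, not part of any DBMS.

```python
def recover_immediate(db, log):
    """Restart recovery for immediate update (sketch).

    Assumed record layout: (txn, "BEGIN"), (txn, "COMMIT"),
    and (txn, item, old_value, new_value) for each write.
    """
    began = {r[0] for r in log if len(r) == 2 and r[1] == "BEGIN"}
    committed = {r[0] for r in log if len(r) == 2 and r[1] == "COMMIT"}
    uncommitted = began - committed

    # UNDO: read the log backward and restore old values of uncommitted transactions.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in uncommitted:
            _, item, old_value, _new = rec
            db[item] = old_value

    # REDO: read the log forward and reapply new values of committed transactions.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, item, _old, new_value = rec
            db[item] = new_value
    return db


# Failure just before WRITE(B, b1): A was already updated, T has no COMMIT record.
log1 = [("T", "BEGIN"), ("T", "A", 70000, 90000)]
print(recover_immediate({"A": 90000, "B": 80000}, log1))  # {'A': 70000, 'B': 80000}

# Failure just after <T, COMMIT>, before the new values reached the database.
log2 = log1 + [("T", "B", 80000, 60000), ("T", "COMMIT")]
print(recover_immediate({"A": 70000, "B": 80000}, log2))  # {'A': 90000, 'B': 60000}
```

Keeping the old value in every write record is what makes UNDO possible here; the deferred update log needs only the new value.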

Time                           Log Entries               Stored Value
Before start of transaction                              A = 70000, B = 80000
Time – 1                       <T, BEGIN>
Time – 2                       <T, A, 70000, 90000>
Time – 3                                                 A = 90000
Transaction T fails before the WRITE action to the database in Immediate Update

• When the failure occurs just before the execution of the WRITE operation, the system comes back up and finds the record <T, BEGIN> but no corresponding <T, COMMIT>.
• This means that transaction T must be undone, so an UNDO(T) operation is executed. This restores the value of A to 70000, and the transaction can be restarted.

Time                           Log Entries               Stored Value
Before start of transaction                              A = 70000, B = 80000
Time – 1                       <T, BEGIN>
Time – 2                       <T, A, 70000, 90000>
Time – 3                                                 A = 90000
Time – 4                       <T, B, 80000, 60000>
Time – 5                                                 B = 60000
Time – 6                       <T, COMMIT>
Immediate Update Log Entries for T when failure occurs after the COMMIT action

• The table above shows the transaction log when a failure occurs just after <T, COMMIT> is written to the transaction log but before the new values are written to the database.
• When the system comes back up, a scan of the transaction log shows corresponding <T, BEGIN> and <T, COMMIT> records.
• Thus a REDO(T) operation is executed.
• This results in the values of A and B being 90000 and 60000 respectively.

Shadow Paging :
• The shadow paging technique does not require the use of a transaction log in a single-user environment.
• However, in a multiuser environment a transaction log may be needed for the concurrency control method.
• In the shadow paging scheme, the database is considered to be made up of logical units of storage called fixed-size disk pages (or blocks).
• The pages are mapped into physical blocks of storage by means of a page table, with one entry for each logical page of the database.
• This entry contains the block number of the physical storage where the page is stored.
• Thus, the shadow paging scheme is one possible form of indirect page allocation.
• The shadow paging scheme is similar to the one used by the operating system for virtual memory management.
• In virtual memory management, the memory is divided into pages that are assumed to be of a certain size.
• The virtual or logical pages are mapped onto physical memory blocks of the same size as the pages.

[Figure: page table address, page table, physical blocks]

• The mapping is provided by means of a table known as the page table.
• The page table contains one entry for each logical page of the process's virtual address space.
• The shadow paging technique maintains two page tables during the life of a transaction that is going to modify the database, namely the current page table and the shadow page table.
• The shadow page table is the original page table, and the transaction addresses the database using the current page table.
• At the start of the transaction the two tables are the same and both point to the same blocks of physical storage.
• The shadow page table is never changed thereafter, and is used to restore the database in the event of a system failure.
• However, the current page table entries may change during the execution of the transaction.
• The current page table is used to record all updates to the database. When the transaction completes, the current page table becomes the shadow page table.
• The pages that are affected by the transaction are copied to new blocks of physical storage, and these blocks, along with the blocks not modified, are accessible to the transaction via the current page table.
• The old versions of the changed pages remain unchanged, and these pages continue to be accessible via the shadow page table.
• The shadow page table contains the entries that existed in the page table before the start of the transaction and points to blocks that were never changed by the transaction.
• The shadow page table remains unaltered by the transaction and is used for undoing the transaction.
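The two-table mechanism described above can be sketched in Python. This is a simplified single-transaction model under stated assumptions: the class ShadowPagedDB and its method names are hypothetical, a Python list stands in for the physical blocks, and each page table is a list mapping a page number to a block number.

```python
class ShadowPagedDB:
    """Toy shadow paging: copy-on-write pages behind two page tables."""

    def __init__(self, pages):
        self.blocks = list(pages)               # physical blocks of storage
        self.shadow = list(range(len(pages)))   # shadow page table: page -> block
        self.current = None                     # current page table (during a transaction)

    def begin(self):
        # At the start both tables point to the same physical blocks.
        self.current = list(self.shadow)

    def read(self, page):
        table = self.current if self.current is not None else self.shadow
        return self.blocks[table[page]]

    def write(self, page, value):
        if self.current[page] == self.shadow[page]:
            # First update of this page: place the new version in a fresh block.
            self.blocks.append(value)
            self.current[page] = len(self.blocks) - 1
        else:
            self.blocks[self.current[page]] = value

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow = self.current
        self.current = None

    def abort(self):
        # Undo (or crash recovery): discard the current table, keep the shadow table.
        self.current = None


db = ShadowPagedDB(["a", "b", "c"])
db.begin()
db.write(1, "B")
db.abort()
print(db.read(1))  # b
db.begin()
db.write(1, "B2")
db.commit()
print(db.read(1))  # B2
```

Blocks that no page table points to any more are never reclaimed in this sketch, so storage only grows as pages are rewritten.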
 Advantages : Overhead of maintaining the transaction log file is eliminated.  Since there is no need for UNDO or REDO operation, recovery is significantly faster.  Disadvantages : Data fragmentation or scattering.  Need of periodic garbage collection to reclaim inaccessible block Checkpoints : The point of synchronization between the database and transaction log file is called checkpoint.  General method of database recovery is using information in the transaction log file. But the main difficulty in this recovery is of m=knowing how far to go back in the transaction log to search in case of failure.  In the absence of this exact information, we may end up redoing transactions that have already been safely written to the database. Also this is very time-consuming and wasteful.  A batter way is to find a point that sufficiently far back to ensure that any time written before that point has been done correctly and stored safely.  This method is called checkpointing.  In checkpointing, all buffers are force-written to secondary storage.  The checkpoint technique is used to limit :  The volume of log information  Amount of searching  Subsequent processing that is need to carry out on the transaction log file.  During the execution of transaction, the DBMS maintain the transaction log but periodically perform the checkpoints.  Checkpoints are scheduled at predetermined intervals and involve the following operations :  Writing the start-of-checkpoint record along with the time and date to the log on the stable storage device giving the identification that it is a checkpoint.  Writing all transaction log file records in main memory to secondary storage (SS).  Writing the modified blocks in the database buffer to SS.  Writing a checkpoint record to the transaction log file. This record contains the identifier of all transactions that are active at the time of the checkpoint. 
– Writing an end-of-checkpoint record and saving the address of the checkpoint record in a file accessible to the recovery routine on start-up after a system crash.
• At the time of a checkpoint, the identifiers of the active transactions are recorded, and their database modifications, which at that time are reflected only in the database buffers, are propagated to the appropriate storage.
• A checkpoint can be taken at fixed intervals of time.
• In case of failure during the serial operation of transactions, the transaction log file is checked to find the last transaction that started before the last checkpoint.
• Any earlier transactions would have committed previously and would have been written to the database at the checkpoint.
• Therefore it is necessary to redo only :
a) The transaction that was active at the checkpoint, and
b) Any subsequent transactions for which both start and commit records appear in the transaction log.
• If a transaction is active at the time of failure, that transaction must be undone.
• If transactions are performed concurrently, redo all transactions that have committed since the checkpoint and undo all transactions that were active at the time of failure.
• In the example, only transaction T1 needs no action. Transactions T2 and T4 are redone, and T3 and T5 are undone.

Buffer Management :
• The buffers are reserved blocks of main memory.
• DBMS application programs require I/O operations, which are performed by a component of the OS.
• These I/O operations normally use buffers to match the speed of the processor and the relatively fast main memory with the slower secondary storage, and also to minimize the number of I/O operations between main and secondary memory.
• The assignment and management of memory blocks is called buffer management, and the component of the OS that performs this task is called the buffer manager.
• The buffer manager is responsible for the efficient management of the database buffers that are used to transfer pages between the buffers and secondary storage.
• It ensures that as many of the data requests made by programs as possible are satisfied from data already copied from secondary storage into the buffers.
• The buffer manager reads pages from disk into the buffers until the buffers become full, and then uses a replacement strategy to decide which buffer to force-write to disk to make space for new pages that need to be read from disk.
• Some of the replacement strategies used by the buffer manager are :
– First-In-First-Out (FIFO), and
– Least Recently Used (LRU).
• A computer system uses buffers that are in fact virtual memory buffers. Thus, a mapping is required between a virtual memory buffer and physical memory.
• The physical memory is managed by the memory management component of the OS.
• Under virtual memory management, the buffers containing pages of the database that are being modified by a transaction could be written out to secondary storage.
• The timing of this premature writing of a buffer is decided by the memory management component of the OS, and it is independent of the state of the transaction.
• To decrease the number of buffer faults, the LRU algorithm is used for buffer replacement.
• Buffer management effectively provides a temporary copy of a database page.
• Therefore, it is used in database recovery systems in which the modifications are made in this temporary copy and the original page remains unchanged in secondary storage.
• Both the transaction log and the database pages are written to buffer pages in virtual memory.
• The COMMIT operation of a transaction takes place in two phases, and thus it is called a two-phase commit.
• In the first phase of the COMMIT operation, the transaction log buffers are written out, and in the second phase the data buffers are written out.
• This causes no problem for recovery, because the log is always forced during the first phase of COMMIT.
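
The LRU replacement strategy and the force-writing of a modified buffer described in this section can be sketched in Python. This is a minimal model under stated assumptions: the class BufferManager, its methods and the disk_reads counter are hypothetical, and a Python dict stands in for secondary storage.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer manager with LRU replacement (sketch)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: page id -> contents (secondary storage)
        self.capacity = capacity      # number of buffer frames in main memory
        self.frames = OrderedDict()   # page id -> (contents, dirty), oldest first
        self.disk_reads = 0

    def _load(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
            return
        if len(self.frames) >= self.capacity:
            # Evict the least recently used page, force-writing it if modified.
            victim, (contents, dirty) = self.frames.popitem(last=False)
            if dirty:
                self.disk[victim] = contents
        self.frames[page_id] = (self.disk[page_id], False)
        self.disk_reads += 1

    def read(self, page_id):
        self._load(page_id)
        return self.frames[page_id][0]

    def write(self, page_id, contents):
        self._load(page_id)
        self.frames[page_id] = (contents, True)  # change only the buffered copy


disk = {1: "p1", 2: "p2", 3: "p3"}
bm = BufferManager(disk, capacity=2)
bm.read(1)
bm.write(2, "p2*")
bm.read(1)       # page 1 becomes the most recently used page
bm.read(3)       # buffers are full: page 2 is evicted and force-written
print(disk[2])   # p2*
```

Until the eviction, the modified page 2 exists only in the buffer, which is why the log has to be forced before the data buffers at COMMIT.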
