PART 5 TRANSACTION MANAGEMENT
Chapter 17 Recovery System
(Database System Concepts - Chapter 17 Recovery System, June 2008)

Introduction
- The recovery component of a DBMS ensures atomicity and durability despite failures, and thus provides high availability.
- Recovery schemes include
  - actions taken during normal transaction processing to record enough information about transaction execution to recover from failures, e.g. the log in a DBS
  - actions taken after a failure to recover the database contents to a state that ensures atomicity and durability
- Backup is another approach taken by a DBS to ensure high availability.

§17.1 Failure Classification
Three types of failures may occur in a DBS:
- Transaction failure
  - logical errors: the transaction cannot complete due to some internal error condition
  - system errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)
- System crash
  - a hardware malfunction (e.g. power failure or other hardware failures), or a bug in the DBS software or the operating system, causes the system to crash
  - fail-stop assumption: data items in non-volatile storage are assumed not to be corrupted by a system crash, e.g. database systems have mechanisms, at hardware and software levels, to prevent corruption of disk data
- Disk failure (storage medium failure)
  - a head crash or similar disk failure destroys all or part of disk storage
  - destruction is assumed to be detectable, because disk drives use checksums to detect failures

§17.2 Storage Structure
17.2.1 Storage Types
Three categories of storage medium in computer systems:
- Volatile storage: does not survive system crashes, e.g. main memory, cache memory, registers
- Nonvolatile (permanent) storage: survives system crashes, e.g. disk, tape, flash memory, non-volatile (battery backed up) RAM
- Stable (reliable) storage: a mythical form of storage that survives all failures; approximated by maintaining multiple copies on distinct nonvolatile media, e.g. RAID

17.2.3 Data Access
- The data accesses in transactions, i.e. write(Q) and read(Q) operations on data item Q, are implemented by transferring data among disks, disk buffers in main memory, and the transactions' private work areas; refer to Fig.17.0.1 and Fig.17.0.2.

[Fig.17.0.1 Data access in DBS (I): data accesses on x and y issued by Ti (select, insert, delete, update, ...); local variables xi, yi in the work area specific to Ti; read(x)/write(x) and read(y)/write(y) move values between the work area and the buffer blocks BX, BY, ..., BZ in the disk buffer; input(BX) and output(BY) move blocks between the disk buffer and the DB file on disk]

[Fig.17.0.2 Data access in DBS (II): main memory holds buffer block A (containing X) and buffer block B (containing Y) plus the work areas of T1 (x1, y1) and T2 (y2); input(A) and output(B) connect the buffer to the disk; read(X) and write(Y) connect the buffer to the work areas]

- In main memory, each transaction Ti has a private work area in which local copies of all data items accessed and updated by Ti are kept.
  - the local copy of a data item X in Ti is called xi, e.g. xi, yi in Fig.17.0.1 and Fig.17.0.2
  - for simplicity, assume that each data item X fits in, and is stored inside, a single block
- In main memory there are also disk buffers, also named system block buffers, in which the buffer blocks for data items X reside temporarily, e.g. BX and BY in Fig.17.0.1, X and Y in Fig.17.0.2.
- The DBMS and OS transfer buffer blocks between disk and the disk buffers in main memory through the following two operations:
  - input(B): transfers the physical block B to main memory
  - output(B): transfers the buffer block B to the disk, and replaces the appropriate physical block there
- Transaction Ti transfers data items X between the disk buffer and its private work area using the following operations:
  - read(X): assigns the value of data item X to the local variable xi. It is executed as follows:
    - if block BX on which X resides is not in main memory, input(BX) is issued
    - the value of X from BX in the disk buffer is assigned to xi
  - write(X): assigns the value of local variable xi to data item X in the buffer block BX. It is executed as follows:
    - if block BX on which X resides is not in main memory, input(BX) is issued
    - the value of xi is assigned to X in BX in the disk buffer
- read and write are similar to API calls in an OS.
- output(BX) need not immediately follow write(X); the system can perform the output operation when it deems fit, for example when the transaction is in the partially committed state.

§17.3 Recovery and Atomicity
Problem: when a system failure occurs, the modification of the database by a transaction may leave the database in an inconsistent state.
- E.g. consider transaction Ti that transfers $50 from account A to account B, with initial values of A and B being $1000 and $2000.
  - a system crash occurs during the execution of Ti, after output(BA) has taken place but before output(BB) is executed
  - the values of A and B in the work area, the disk buffer and the disk are shown in Fig.17.0.3

[Fig.17.0.3 Contents of memory and DB when the system crash occurs: work area of Ti: ai = 950, bi = 2050; disk buffer: BA = 950, BB = 2050; disk: A = 950, B = 2000]

- After the DBS recovers from the system crash,
  - if Ti is not re-executed, the DB stays in an inconsistent state in which A=950 and B=2000
  - if Ti is re-executed, the DB enters an inconsistent state in which A=900 and B=2050
- With respect to the two modifications of the DB, i.e. output(BA) and output(BB), inconsistency arises once output(BA) has been made but before both modifications are made, so recovery actions need to be taken.
- To ensure DB consistency, the goal is to perform either all database modifications made by Ti or none at all: atomicity of Ti should be guaranteed despite failures.
- To ensure transaction atomicity despite failures, the DBMS records information describing the modifications made to the DB by the transactions (e.g. by means of log records) and outputs this information to stable storage before modifying the database itself. When the system crashes, the DBS can be recovered on the basis of this descriptive information.

§17.4 Log-Based Recovery
17.4-1 Log
- A log is a sequence of records, recording all the update activities in the DBS, and is kept on stable storage.
- There are four types of log records. As transaction Ti executes, the recovery scheme writes different types of log records into the log according to the operations issued by Ti; refer to Fig.17.0.4 and Fig.17.0.5.
  - when Ti starts, i.e. begin-transaction appears and Ti enters the active state, Ti is registered by writing a <Ti start> record into the log
  - before Ti executes write(X), an update log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write and V2 is the value to be written to X. This record notes that Ti has performed a write on data item X, that X had value V1 before the write, and that X will have value V2 after the write
  - when Ti finishes its last statement, i.e. the commit statement appears in Ti and Ti enters the partially committed state, the log record <Ti commit> is written into the log
  - if Ti is aborted, i.e. an abort/rollback statement appears in Ti and Ti enters the failed state, the log record <Ti abort> is written into the log

[Fig.17.0.4 Log records for a committed transaction Ti: begin-trans. ... write(X) ... commit; the DBMS allocates resources and creates the transaction; local buffer Xi: V1→V2; disk buffer BX = V2, reflected to disk; states: active, partially committed, committed; log file: <Ti start> <Ti, X, V1, V2> <Ti commit>; finally resources are released and the transaction ends]

[Fig.17.0.5 Log records for an aborted transaction Ti: begin-trans. ... write(X) ... abort/rollback; the previous operations are undone; local buffer Xi: V1→V2; disk buffer BX = V2; disk BX restored to V1; states: active, failed, aborted; log file: <Ti start> <Ti, X, V1, V2> <Ti abort>]

- The log contains a complete record of all database update activities. On the basis of the update log information <Ti, X, V1, V2>:
  - if a DB modification (i.e. writing V2 to data item X in the DB) recorded by an update log record is desirable, it is output/reflected to the database on disk, e.g. Fig.17.0.4
  - if a system failure occurs, a DB modification that is recorded in <Ti, X, V1, V2> and has already been output/reflected to the database on disk should be cancelled by an undo operation, which restores the value of data item X to its old value V1, e.g. Fig.17.0.5
- Three log-based recovery approaches for a single transaction or a set of serial transactions:
  - deferred database modification
  - immediate database modification
  - checkpoint

17.4-2 Deferred Database Modification
Principles
- This scheme records all modifications issued by the transactions in the log, but defers all the write operations until the partially committed state.
- In the active state, the modifications to the DB issued by the transaction are not reflected to disk, and are temporarily stored in the disk/block buffer in main memory.
- When Ti partially commits, the log records associated with Ti are used in executing the deferred writes: the modified values of the data items are reflected to the DB on disk.
- If a failure occurs, the DBMS recovers the data items of the DB on the basis of the log records.
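The four log record types of 17.4-1 and the undo step they enable can be sketched as follows. This is a minimal illustration, not the slides' implementation: the class names `Start`, `Update`, `Commit`, `Abort` and the helper `undo_transaction` are hypothetical, and the database on disk is modelled as a plain dictionary. The crash state is the one of Fig.17.0.3 (A already output, B not yet).

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Start:            # <Ti start>
    txn: str


@dataclass(frozen=True)
class Update:           # <Ti, X, V1, V2>
    txn: str
    item: str
    old: int            # V1, value before the write
    new: int            # V2, value to be written


@dataclass(frozen=True)
class Commit:           # <Ti commit>
    txn: str


@dataclass(frozen=True)
class Abort:            # <Ti abort>
    txn: str


LogRecord = Union[Start, Update, Commit, Abort]


def undo_transaction(txn: str, log: List[LogRecord], db: Dict[str, int]) -> None:
    """Restore every item written by txn to its old value, newest record first."""
    for rec in reversed(log):
        if isinstance(rec, Update) and rec.txn == txn:
            db[rec.item] = rec.old


# Log at the moment of the crash: Ti has written A and B but has not committed.
crash_log: List[LogRecord] = [
    Start("Ti"),
    Update("Ti", "A", 1000, 950),
    Update("Ti", "B", 2000, 2050),
]
disk = {"A": 950, "B": 2000}      # output(BA) happened, output(BB) did not

undo_transaction("Ti", crash_log, disk)
crash_log.append(Abort("Ti"))     # record that Ti was rolled back
print(disk)
```

Because each update record carries the old value V1, the rollback needs nothing beyond the log itself, which is why the log must reach stable storage before the database is modified.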
- When a write(X) operation is issued by transaction Ti (refer to Fig.17.0.6):
  - the log record <Ti, X, V2> is written into the log file, where V2 is the new value for X
  - the write is not performed on X on disk at this time, but is deferred: after write(X) is issued, only the value of X in the disk/block buffer is changed to the new value V2, while the value of X on disk remains the old value V1
  - note: the old value V1 of X is not needed in the log. When a failure occurs and X needs to be recovered to its old value, the value of X on disk is still the unchanged V1 from before Ti began

[Fig.17.0.6 Deferred Database Modification without failure: begin-trans. ... write(X) ... commit; local buffer Xi: V1→V2; disk buffer BX: V2 after input(BX); disk BX stays V1 during the active state and becomes V2 through output(BX) in the partially committed state; states: active, partially committed, committed; log file: <Ti start> <Ti, X, V2> <Ti commit>]

- In the partially committed state, the DBMS uses the log record <Ti, X, V2> to carry out the previously deferred write(X), that is, it reflects the new value V2 into data item X on disk.
- When a failure occurs while Ti is still in the active state, Ti is aborted and <Ti abort>, not <Ti commit>, appears in the log (Fig.17.0.7).
  - to roll back Ti and restore the value of data item X, the recovery subsystem simply ignores the log information associated with Ti
  - the DB state (defined as the values of the data items on disk) after recovery is the same as before Ti started
  - in other words, after recovery the database keeps the state it had before Ti executed; from the atomicity point of view, none of the operations in Ti have been performed

[Fig.17.0.7 Deferred Database Modification with failures in active state: abort or system crash during write(X) ...; Ti is rolled back and its log records are ignored; local buffer Xi: V1→V2; disk buffer BX: V2, unknown after the crash; disk BX remains V1; states: active, failed, aborted; log file: <Ti start> <Ti, X, V2> <Ti abort> or null]

- When a failure occurs while Ti is in the partially committed state, both <Ti start> and <Ti commit> are in the log (Fig.17.0.8-1).
  - to guarantee the durability and atomicity of the transaction, the recovery subsystem conducts redo(Ti)
  - redo(Ti) sets the value of all data items X updated by Ti to the new values V2, on the basis of <Ti, X, V2>
  - from the user's point of view, after recovery all the operations in Ti are guaranteed to be complete
  - note: the state transition from failed to committed is permitted, as shown in Fig.17.0.8-2

[Fig.17.0.8-1 Deferred Database Modification with failures in partially committed state: the failure occurs after commit is issued; recovery by redo; local buffer Xi: V1→V2; disk buffer BX: V2, unknown after the crash; disk BX: V1 before redo, V2 after; states: active, partially committed, failed, redo, committed; log file: <Ti start> <Ti, X, V2> <Ti commit>]

[Fig.17.0.8-2 Extended transaction state diagram, with transitions labelled begin-transaction, commit and abort]

An Example
- Transactions T0 and T1 are executed serially as <T0; T1>, i.e. T0 executes before T1.
- Initially, A=1000, B=2000, C=700.

  T0: read(A)            T1: read(C)
      A := A - 50            C := C - 100
      write(A)               write(C)
      read(B)
      B := B + 50
      write(B)
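The redo-only rule of deferred modification can be sketched on the transactions T0 and T1 defined above. This is an illustrative sketch under stated assumptions: the function name `recover_deferred` and the tuple encoding of log records are hypothetical, and the three log prefixes are reconstructed from the definitions of T0 and T1 (A: 1000→950, B: 2000→2050, C: 700→600), since the corresponding figure is not reproduced in the text.

```python
def recover_deferred(log, disk):
    """Redo-only recovery: apply the writes of committed transactions, ignore the rest."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(disk)                       # do not mutate the caller's state
    for rec in log:                       # forward scan, in log order
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, new_value = rec
            db[item] = new_value          # <Ti, X, V2>: set X to V2
    return db


# Complete log of the serial schedule <T0; T1> (assumed, see lead-in).
FULL_LOG = [
    ("start", "T0"),
    ("write", "T0", "A", 950),
    ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 600),
    ("commit", "T1"),
]
DISK = {"A": 1000, "B": 2000, "C": 700}   # values before T0 starts

case_a = recover_deferred(FULL_LOG[:3], DISK)   # crash while T0 is active
case_b = recover_deferred(FULL_LOG[:6], DISK)   # T0 committed, T1 active
case_c = recover_deferred(FULL_LOG, DISK)       # both committed

print(case_a, case_b, case_c)
```

Only the new value is stored in each write record, which is enough here because an uncommitted transaction never touched the disk and therefore never needs an undo.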
- At the time of a system crash, the log may appear as it does at one of three instances of time; refer to Fig.17.4.

[Fig.17.4 Logs when failure occurs at three instances of time]

- In case (a), the system crashes when T0 is in the active state; therefore, no redo actions need to be taken.
  - data items A=1000, B=2000 and C=700 in the DB do not change
  - refer to Fig.17.0.7
- In case (b), the system crashes when T0 has been committed and T1 is still in the active state; redo(T0) must be performed, since <T0 commit> is present.
  - A changes to 950 and B to 2050, but C=700 does not change
  - refer to Fig.17.0.8
- In case (c), the system crashes when T0 and T1 have both been committed; redo(T0) and redo(T1) must be performed, since <T0 commit> and <T1 commit> appear in the log.
  - A changes to 950, B to 2050 and C to 600

17.4-3 Immediate Database Modification
- The scheme allows database modifications to be output/reflected to the database on disk while the transaction is still in the active state: the output of updated blocks BX can take place before the transaction commits.
- In this scheme, update log records must contain both the old value and the new value, because undo may be needed if a failure occurs in the active state, e.g. the update log record <Ti, X, V1, V2>.
- Refer to Fig.17.0.9.

[Fig.17.0.9 Immediate Database Modification without failure: begin-trans. ... write(X) ... commit; local buffer Xi: V1→V2; disk buffer BX: V2; disk BX: V1, then V2 once reflected during the active state; states: active, partially committed, committed; log file: <Ti start> <Ti, X, V1, V2> <Ti commit>]

An Example

  Log                        Database on disk
  <T0 start>
  <T0, A, 1000, 950>         A = 950
  <T0, B, 2000, 2050>        B = 2050
  <T0 commit>
  <T1 start>
  <T1, C, 700, 600>          C = 600
  <T1 commit>

  Fig.17.6 State of system log and database for T0 and T1

- The recovery scheme uses two recovery procedures:
  - undo(Ti) restores the value of all data items X updated by Ti to their old values V1
  - redo(Ti) sets the value of all data items updated by Ti to the new values, e.g. redo(T0) and redo(T1) in Fig.17.6
- Both operations must be idempotent: even if the operation is executed multiple times, the effect is the same as if it were executed once. This is needed because operations may get re-executed during recovery.
- When a failure occurs while Ti is still in the active state, the record <Ti start> appears in the log but the record <Ti commit> does not.
  - Ti is not committed and should be rolled back by undo(Ti)
  - refer to Fig.17.0.10

[Fig.17.0.10 Immediate database modification with failures in active state: abort or system crash during write(X) ...; local buffer Xi: V1→V2; disk buffer BX: V2, unknown after the crash; disk BX: V2 at the failure, restored to V1 by undo(Ti); states: active, failed, undo, aborted; log file: <Ti start> <Ti, X, V1, V2> <Ti abort> or null]
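The undo and redo procedures of immediate modification, and the reason they must be idempotent, can be sketched as follows. The function names and the tuple layout `("update", txn, item, old, new)` are hypothetical. The log is the one of Fig.17.6 cut off before <T1 commit>, and the disk state at the crash (A and C already output, B not yet) is an assumption chosen for illustration. Uncommitted transactions are undone first, then committed ones are redone.

```python
def undo(txn, log, db):
    """Restore the old values written by txn, scanning the log backward."""
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] == txn:
            db[rec[2]] = rec[3]           # old value V1


def redo(txn, log, db):
    """Reapply the new values written by txn, scanning the log forward."""
    for rec in log:
        if rec[0] == "update" and rec[1] == txn:
            db[rec[2]] = rec[4]           # new value V2


def recover_immediate(log, db):
    """Undo every started-but-uncommitted transaction, then redo the committed ones."""
    started = [rec[1] for rec in log if rec[0] == "start"]
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for txn in reversed(started):
        if txn not in committed:
            undo(txn, log, db)
    for txn in started:
        if txn in committed:
            redo(txn, log, db)
    return db


CRASH_LOG = [
    ("start", "T0"),
    ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("update", "T1", "C", 700, 600),      # crash before <T1 commit>
]
DISK_AT_CRASH = {"A": 950, "B": 2000, "C": 600}   # assumed crash state

state = recover_immediate(CRASH_LOG, dict(DISK_AT_CRASH))
again = recover_immediate(CRASH_LOG, dict(state))  # e.g. a crash during recovery
print(state, again)
```

Both procedures assign absolute values taken from the log rather than applying increments, so running recovery a second time leaves the result unchanged.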
- When a failure occurs while Ti is in the partially committed or committed state, both the record <Ti start> and the record <Ti commit> appear in the log. Ti should then be recovered from the failure and completed successfully by means of redo(Ti), which sets the value of all data items updated by Ti to the new values.
  - refer to Fig.17.0.11-1
  - note: the state transition from failed to committed is permitted, as shown in Fig.17.0.8-2
- Undo operations are performed first, then redo operations.

[Fig.17.0.11-1 Immediate Database Modification with failures in partially committed state: the failure occurs after commit is issued; recovery by redo; local buffer Xi: V1→V2; disk buffer BX: V2, unknown after the crash; disk BX: V1, then V2; states: active, partially committed, failed, redo, committed; log file: <Ti start> <Ti, X, V1, V2> <Ti commit>]

An Example
- Transactions T0 and T1 are executed serially as <T0; T1>. Initially, A=1000, B=2000, C=700.
- Fig.17.7 shows the log as it appears at three instances of time.

[Fig.17.7 The same log, shown at three different times]

- In case (a), undo(T0) is performed: data items A and B are restored to their initial values 1000 and 2000 respectively.
- In case (b), undo(T1) and redo(T0) are performed: C is restored to its initial value 700, and A and B are set to the updated values 950 and 2050 respectively.
- In case (c), redo(T0) and redo(T1) are performed: A and B are set to the modified values 950 and 2050 respectively, then C is set to 600.

§17.4-4 Checkpoints
- Drawbacks of the recovery procedure discussed earlier:
  - searching the entire log is time-consuming
  - we might unnecessarily redo transactions which have already output their updates to the database, e.g. Fig.17.0.11-1
- To reduce these overheads, the recovery system streamlines the recovery procedure by periodically performing checkpoints, which require the following sequence of actions to take place:
  - output all log records currently residing in main memory (refer to log record buffering in 17.7.1) onto the log file on stable storage
  - output/reflect to the disk all buffer blocks in the disk buffer that were modified prior to the checkpoint, i.e. periodically write the modified data blocks in the in-memory disk buffer back to the DB file on disk
  - write a log record <checkpoint> onto the log file on stable storage
- Refer to Fig.17.0.12 for an illustration of checkpoints:
  - the updated value of data item X, i.e. V2, is reflected/output to the disk by checkpoint(i)
  - the updated value of data item Y, i.e. U2, is reflected/output to the disk while Ti is in the partially committed state

[Fig.17.0.12 Illustration of Ti and checkpoints: begin-trans. ... write(X) ... write(Y) ... commit; local buffer Xi: V1→V2, Yi: U1→U2; disk buffer BX: V2, BY: U2; disk: BX: V1, BY: U1 before chk(i); BX: V2, BY: U1 after output(BX) at chk(i); BX: V2, BY: U2 after output(BY); checkpoints chk(i-1), chk(i), chk(i+1); states: active, partially committed, committed; log file: <Ti start> <Ti, X, V1, V2> checkpoint(i) <Ti, Y, U1, U2> <Ti commit>]

[Fig.17.0.13 Checkpoints: serial transactions T1 to T5 on a time line with checkpoints cpk(i-1), cpk(i), cpk(i+1) and the time point at which the failure occurs; log: <T1 start>; ...; <T1 commit>; <T2 start>; ...; <T2 commit>; <T3 start>; ... <cpk>; ... <T3 commit>; <T4 start>; ... <cpk>; ... <T4 commit>; <T5 start>]

An example
The recovery system periodically performs checkpoints that require the following sequence of actions to take place, except:
  A. Output onto stable storage all log records currently residing in main memory
  B. Output to the disk all modified buffer blocks
  C. Output onto stable storage a log record <checkpoint>
  D. Redo some failed transactions
Answer: D

- If the deferred DB modification scheme is employed, the write() operations issued by a transaction can be reflected onto the DB file on disk at a checkpoint, or prior to the checkpoint while Ti is in the partially committed state, e.g. Fig.17.0.12 and Fig.17.0.13.
- SQL provides mechanisms to define savepoints, e.g. the savepoint definition in T-SQL described in Fig.17.0.14.

Checkpoint vs Savepoint
- The checkpoint is similar to the savepoint in real DBSs, such as DB2 or Sybase.
- But they differ somewhat in operating semantics: the savepoint need not be issued periodically.

  BEGIN TRANSACTION
  USE [student-DB]
  INSERT INTO student VALUES ('03402', '王菲', 'CS', '1985/05/15')
  SAVE TRAN My_savepoint            /* defining the savepoint */
  DELETE FROM student WHERE stname = '王菲' OR stname = '章立'
  ROLLBACK TRAN My_savepoint
  COMMIT TRAN
  GO

  Note: the rollback returns the transaction to the savepoint My_savepoint. The delete is rolled back, while the insert is not. The DB is restored to the state before the delete was executed; the newly inserted tuple ('03402', '王菲', 'CS', '1985/05/15') is not deleted and remains in the database.

  Fig.17.0.14 An example of savepoint in T-SQL

  (a) DB before the transaction starts
  st-id   stname  department  birthdate
  03405   章立    CS          1985/01/25
  ...
  03409   李龙    CS          1984/12/20
  ...
  03411   赵新    CS          1985/06/18

  (b) DB after insert is issued
  st-id   stname  department  birthdate
  03402   王菲    CS          1985/05/15
  ...
  03405   章立    CS          1985/01/25
  ...
  03409   李龙    CS          1984/12/20
  ...
  03411   赵新    CS          1985/06/18

  Fig.17.0.15-I DB instances at different timepoints

  (c) DB after delete is issued
  st-id   stname  department  birthdate
  ...
  03409   李龙    CS          1984/12/20
  ...
  03411   赵新    CS          1985/06/18

  (d) DB after rollback is issued
  st-id   stname  department  birthdate
  03402   王菲    CS          1985/05/15
  03405   章立    CS          1985/01/25
  ...
  03409   李龙    CS          1984/12/20
  ...
  03411   赵新    CS          1985/06/18

  Fig.17.0.15-II DB instances at different timepoints

- With the help of checkpoints, the recovery schemes of 17.4-2 and 17.4-3 can be refined as follows, assuming the transactions still run serially.
- After a failure occurs, the recovery scheme searches the log to determine the most recent transaction Ti that started executing before the most recent checkpoint took place, e.g. T4 and cpk(i) in Fig.17.0.13:
  - scan the log backward, from the end of the log, until the first <checkpoint> record is found (this is the final <checkpoint> record in the log and corresponds to the most recent checkpoint)
  - continue scanning backward until the next <Ti start> record is found; this Ti is the most recent transaction
  - in Fig.17.0.13, the most recent checkpoint is cpk(i), and the most recent Ti found is T4, which started executing just before cpk(i) took place
- Once the system has identified this Ti, the recovery scheme applies redo and undo only to Ti and to all transactions that started executing after Ti. Denote these transactions by the set T, and assume immediate modification is used. Then:
  - for all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk) to roll back the uncommitted Tk; in Fig.17.0.13, undo is applied to T5
  - for all transactions Tk in T such that the <Tk commit> record appears in the log, execute redo(Tk); in Fig.17.0.13, redo is applied to T4
  - all transactions that are not in the set T are ignored; in Fig.17.0.13, T1, T2 and T3 are ignored
- When deferred modification is employed, the undo operation does not need to be applied.

An Example
- A set of serial transactions T1, T2, T3 and T4 is shown in Fig.17.0.16.
- The most recent transaction that started executing just before the checkpoint took place is T2.
- The set of transactions to be considered is T = {T2, T3, T4}.
- T1 can be ignored, because the updates issued by T1 have already been output to disk by the checkpoint.
- If immediate modification is employed, then
  - redo is applied to T2 and T3
  - undo is applied to T4
  - T1 is ignored

[Fig.17.0.16 An example of checkpoint and undo/redo: serial transactions T1 to T4 on a time line with the checkpoint and the system failure marked, and the considered part of the log highlighted; log: <T1 start>; ...; <T1 commit>; <T2 start>; ... <cpk> ...; <T2 commit>; <T3 start>; ...; <T3 commit>; <T4 start>]

§17.6 Recovery with Concurrent Transactions
- The log-based recovery scheme is extended to deal with multiple concurrent transactions.
- It is assumed that
  - the immediate modification scheme is used
  - the system has a single log
  - the system has a single disk buffer shared by all transactions
  - a block in the disk buffer is permitted to hold data items updated by one or more transactions

Pitfalls in Recovery with Concurrent Transactions
- As shown in Fig.17.0.17, the concurrent transactions T0 and T1 update data item Q one after the other (V1→V2→V3), and T0 has to be rolled back due to a failure.
- With log-based recovery, undoing T0 restores Q to its initial value V1 from before T0 and T1 started. The update on Q performed by T1 (i.e. V3) is therefore lost when T0 is rolled back, although T1 is a successfully committed transaction.

[Fig.17.0.17 Illustration of concurrent updates in case of rollback: T0 issues write(Q) (Q0: V1→V2), then T1 issues write(Q) (Q1: V2→V3) and commits, then T0 aborts and undo(T0) restores BQ to V1; log: <T0 start> <T1 start> <T0, Q, V1, V2> <T1, Q, V2, V3> <T1 commit> <T0 abort>]

17.6-1 Interaction with Concurrency Control
- To avoid the pitfall described in Fig.17.0.17, it is required that if a transaction T has updated a data item Q, no other transaction may update the same data item Q until T has committed or rolled back.
- This requirement can be ensured by the strict two-phase locking protocol.

17.6-2 Transaction Rollback
- When a failed transaction Ti is rolled back, the recovery scheme scans the log backward, finds each log record of the form <Ti, Q, V1, V2> and restores the data item Q to its old value V1.
- Scanning the log backward is important, because Ti may update the data item Q several times; the "oldest" update <Ti, Q, Vi, Vj> by Ti must be the one that finally determines the restored value of Q.

An Example
- The log is as follows, and is scanned backward:
  <Ti start> <Ti, Q, V0, V1> <Ti, Q, V1, V2> <Ti, Q, V2, V3> <Ti abort>
- Scanning backward over the updates on Q: V0←V1←V2←V3
- Q is restored to V0

17.6-3 Checkpoints
- As described in 17.4-4, when checkpoints are used in the recovery of a single transaction or a set of serial transactions, only the following transactions need to be considered during recovery:
  - the one transaction, if any, that was active at the time of the most recent checkpoint
  - those transactions that started after the most recent checkpoint
- For a set of concurrent transactions, several transactions may have been active at the time of the most recent checkpoint, so
  - the checkpoint log record should be of the form <checkpoint L>, where L is a list of the transactions active at the time of the checkpoint
  - it is also assumed that transactions do not perform updates either on the buffer blocks or on the log while the checkpoint is in progress; this constraint can be relaxed in the fuzzy checkpoint scheme

An Example

[Fig.17.0.18 An example of checkpoint log: concurrent transactions T1 to T6 on a time line t with checkpoint(k), checkpoint(k+1) and the system failure marked; checkpoint records <checkpoint(k), {T1, T3}> and <checkpoint(k+1), {T1, T6}>; log: ...; <checkpoint(k+1), {T1, T6}>; ...; <T5 start>; ...; <T1 commit>; ...]

17.6-4 Restart Recovery
When a system in which a set of transactions executes concurrently recovers from a crash, it first constructs two lists: the undo-list, consisting of incomplete transactions which must be undone, and the redo-list, consisting of finished transactions that must be redone. The lists are constructed as follows:
- initialize undo-list and redo-list to empty
- scan the log backward from the end, stopping when the first <checkpoint L> record is found, i.e. when the most recent checkpoint is found, e.g. <checkpoint(k+1), {T1, T6}> in Fig.17.0.18
- for each record found during the backward scan:
  - if it is of the form <Ti commit>, add Ti to redo-list (Ti finished before the failure occurred)
  - if it is of the form <Ti start> and Ti is not in redo-list, add Ti to undo-list (Ti started after the most recent checkpoint but did not finish before the failure occurred)
- for every Ti in the transaction list L of the most recent checkpoint, if Ti is not in redo-list, add Ti to undo-list

At this point, undo-list consists of incomplete transactions which must be undone, and redo-list consists of finished transactions that must be redone.

Example One
- With respect to the concurrent transactions in Fig.17.0.18, when the failure occurs the log is of the following form:
  ...; <checkpoint(k+1), {T1, T6}>; ...; <T5 start>; ...; <T1 commit>; ...
- During the backward scan, T1 is added to redo-list and T5 is added to undo-list:
  redo-list = {T1}
  undo-list = {T5}
- T6, in the list L of the most recent checkpoint(k+1), is not in redo-list and so is added to undo-list:
  undo-list = {T6, T5}

After the redo-list and undo-list have been constructed, the recovery proceeds as follows:
- scan the log backward from the most recent record, i.e. the end of the log, and perform an undo for each log record, e.g. <Ti, X, V1, V2>, that belongs to a transaction Ti in the undo-list
  - the scan stops when the <Ti start> records have been encountered for every Ti in the undo-list
  - that is, the backward scan uses undo operations to cancel, one by one, the updates made before the failure point by each uncommitted transaction Ti recorded in the undo-list
- locate the most recent <checkpoint L> record in the log, possibly by scanning the log forward if the checkpoint record was passed in the previous step
  3. scan the log forward from the most recent <checkpoint L> record, and perform redo for each log record, such as <Ti, X, V1, V2>, that belongs to a transaction Ti in the redo-list, until the end of the log
     /* forward scan starting from the most recent checkpoint: using redo operations, redo in turn the updates made after the most recent checkpoint by each committed transaction Ti recorded in redo-list */
Transactions that are in neither redo-list nor undo-list are ignored
Notes
  - during the recovery procedure, it is important to undo before redoing
  - undoing should proceed backward from the end of the log
  - redoing should be performed forward from the most recent checkpoint

Example Two
For the concurrent transactions T0, T1, T2 and T3, the initial values of data items A, B, C and D are all 0
  - assume that the immediate database modification technique is used for log-based recovery
When a failure occurs, the log is as given in Fig.17.0.19
Go over the steps of the recovery algorithm and give the values of data items A, B, C and D after recovery

log (oldest record first)      values of data items
                               A=0, B=0, C=0, D=0
<T0 start>
<T0, A, 0, 10>
<checkpoint {T0}>              A=10, B=0, C=0, D=0
<T0 commit>
<T1 start>
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>          A=10, B=10, C=20, D=0
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
failure occurs                 A=20, B=10, C=20, D=10
recovery                       A=20, B=0, C=0, D=10
Annotations in the figure
  Step 1: scanning backward to construct redo-list and undo-list
  Step 2: scanning backward to undo T2 and T1
  Step 3: scanning forward to redo T3
Fig.17.0.19 The log for concurrent transactions

The recovery algorithm proceeds in three steps
Step 1. Construct the redo-list and the undo-list by scanning backward from the end of the log to the most recent checkpoint <checkpoint {T1, T2}>
  redo-list = {T3}
  undo-list = {T1, T2}
  T0 is ignored
Step 2.
Scan the log backward from the most recent record, i.e. the end of the log, and perform undo for T1 and T2 in undo-list (all of their update records precede <checkpoint {T1, T2}>)
  - after undoing T2, data item C is restored to 0, that is 20 → 10 → 0
  - after undoing T1, data item B is restored to 0, that is 10 → 0
Step 3. Scan the log forward from the most recent checkpoint record <checkpoint {T1, T2}>, and perform redo for T3 in redo-list
  - after redoing T3, data item A is set to 20, that is 10 → 20, and data item D is set to 10, that is 0 → 10
After recovery, the values of the data items in the DB are A=20, B=0, C=0, D=10

Example Three
Fig.17.0.20 shows the time scale for concurrent transactions T1, T2, T3, T4 and T5; a checkpoint is set at Tc. After a system failure occurs at Tf, which recovery operation (i.e. redo, undo, or ignore) should the DBMS recovery subsystem apply to each transaction?
  - assume that the immediate database modification technique is used for log-based recovery
Solution:
  undo: T1, T5
  ignored: T2
  redo: T3, T4
[Figure: time scale showing the checkpoint at Tc and the system failure at Tf, with transactions T1 to T5; the legend distinguishes a transaction Ti that succeeds in being committed from one that fails and is aborted]
Fig.17.0.20 Time scale for concurrent transactions

Example Four
The initial values of data items A and B are 1000 and 2000 respectively
If immediate modification is employed, give the log that the recovery scheme constructs for the concurrent execution of transactions T1 and T2 shown in Fig.17.0.21

T1                   T2                   Recovery log
begin-transaction                         <T1 start>
read(A)
                     begin-transaction    <T2 start>
A := A-50
write(A)                                  <T1, A, 1000, 950>
                     read(A)
                     temp := A*0.1
        checkpoint                        <checkpoint {T1, T2}>
                     A := A-temp
                     write(A)             <T2, A, 950, 855>
read(B)
B := B+50
write(B)                                  <T1, B, 2000, 2050>
commit                                    <T1 commit>
                     read(B)
        checkpoint                        <checkpoint {T2}>
                     B := B+temp
                     write(B)             <T2, B, 2050, 2145>
                     abort                <T2 abort>
Fig.17.0.21 Concurrent execution of T1 and T2

Exercise: Example Four + Example Two
  concurrent schedule → log file → recovery actions (e.g. undo) → the recovered values of data items

§17.7 Buffer Management
17.7.1 Log Record Buffering
With the help of a log record buffering scheme, it is not necessary to output each log record to stable storage at the time it is created
Log record buffering
  - each log record is buffered in main memory, instead of being output directly to stable storage
  - log records are output to stable storage when a block of log records in the buffer is full
  - several log records can thus be output using a single output operation, reducing the I/O cost

17.7.2 Database Buffering
The DBMS employs a two-tier storage hierarchy and maintains an in-memory buffer of data blocks, named the disk/block/system buffer
  - when a new block B2 is needed and the buffer is full, an existing block B1 needs to be removed from the buffer
  - if the block B1 has been updated, it must be output to disk before B2 is input to the buffer
  - this is similar to page replacement in virtual memory in operating systems
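The two buffers of §17.7 can be illustrated with a minimal Python sketch. It is not taken from the slides: the names LogBuffer, BlockBuffer and force, the least-recently-used choice of the block to remove, and the use of a list and a dict to stand in for stable storage and the disk are all assumptions made for illustration, and each block holds a single value to keep the sketch short. Before an updated block is output, the sketch first outputs the buffered log records; for simplicity it forces the whole log buffer rather than only the records for that block.

```python
from collections import OrderedDict


class LogBuffer:
    """Buffers log records in memory and outputs them one block at a time."""

    def __init__(self, records_per_block):
        self.records_per_block = records_per_block
        self.pending = []      # records still only in main memory
        self.stable = []       # stands in for stable storage (assumption)
        self.output_ops = 0    # number of output operations performed

    def append(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.records_per_block:
            self.force()       # a full block goes out in one operation

    def force(self):
        """Output all pending records with a single output operation."""
        if self.pending:
            self.stable.extend(self.pending)
            self.pending.clear()
            self.output_ops += 1


class BlockBuffer:
    """Fixed-capacity buffer of data blocks in front of a simulated disk."""

    def __init__(self, disk, capacity, log):
        self.disk = disk                # dict: block id -> contents
        self.capacity = capacity
        self.log = log
        self.frames = OrderedDict()     # block id -> [contents, dirty flag]

    def _fetch(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[block_id] = [self.disk[block_id], False]
        return self.frames[block_id]

    def _evict(self):
        # LRU victim choice is an assumption; the slides name no policy.
        victim, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.log.force()                # log records first ...
            self.disk[victim] = contents    # ... then the updated block

    def read(self, block_id):
        return self._fetch(block_id)[0]

    def write(self, txn, block_id, new_value):
        frame = self._fetch(block_id)
        self.log.append(f"<{txn}, {block_id}, {frame[0]}, {new_value}>")
        frame[0], frame[1] = new_value, True


disk = {"B1": 1000, "B2": 2000}
log = LogBuffer(records_per_block=4)
buf = BlockBuffer(disk, capacity=1, log=log)

buf.write("T1", "B1", 950)   # B1 is updated in the buffer only
buf.read("B2")               # buffer is full: B1 is output, then B2 is input
print(disk["B1"], log.stable, log.output_ops)
```

With a capacity of one block, reading B2 removes the updated B1 from the buffer, so the single log record and then B1 itself reach the simulated disk.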
If the input of block B2 causes B1 to be chosen for output, then before B1 is output, all log records pertaining to the data in B1 must be output to stable storage
The sequence of actions taken by the system would be
  1. output log records to stable storage until all log records pertaining to the data in B1 have been output
  2. output block B1 to the disk
  3. input block B2 from disk to the disk buffer
No updates/writes should be in progress on a block B when B is output to disk; this requirement can be met by using a special means of locking

17.7.3 OS Roles in Buffer Management
The database buffer can be implemented either
  - in an area of real main memory reserved for the database and managed by the DBMS rather than the OS, or
  - in virtual memory provided by the OS
Both approaches to buffer implementation have merits and drawbacks
The approach of implementing and managing the database buffer by the DBMS is the more popular one
  - e.g. database systems such as DB2, Oracle, and SQL Server

Appendix A Backup
Backup: dump the data in the database (DB) to stable storage
  - backup copies of the DB are made regularly so that, when a failure occurs, the contents of the DB can be restored from the log and the backup files
  - regular, periodic backup of the database is a commonly used means of supporting database recovery and assists the DBS recovery process
The DBS recovery component uses backup files, log record files, and checkpoints, together with a recovery mechanism, to recover the DB and thereby ensure high availability of the system (Fig.17.0.18)
[Figure: DBMS, DB, log file (log records, checkpoints), backup file (backup), recovery mechanism]
Fig.17.0.18 The DBS recovery system

A database recovery system can be examined from the following aspects
Backup capability
  - backup copies of the entire database are made at specified intervals
  - e.g. an important database system may be copied every day, typically after the nightly batch update jobs have completed
  - backup files are kept on cartridge tape or optical disc media, rather than on the disk media where the database normally resides
Logging capability
  - a log file is maintained; it describes the data updates of all transactions executed since the last full database backup
  - log records: <Ti, Q, X1, X2>, and the various timestamps of Ti
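Checkpoint capability
  - checkpoint records are written to the log file to speed up recovery; during recovery, the recovery manager module only needs to go back to the most recent checkpoint record
Recovery manager module
  - applies a recovery algorithm to restore the database, from the backup files and the log file, to the most recent consistent database state

The main tasks of routine backup and recovery activities are as follows
  - schedule backups regularly, e.g. once a day, or more often if needed
  - ensure that logging is carried out correctly and that all necessary details are written to the log records
  - monitor how often checkpoint records are written
      - writing checkpoint records incurs system overhead, so the frequency must be controlled and tuned to balance that overhead against the time saved during recovery

Appendix B DBS Maintaining
The main tasks of DBS administration and maintenance are shown in Fig.17.0.19
[Figure: the DBA, the DBMS and the database, with four groups of tasks]
  - routine maintenance: backup and recovery, security maintenance, space management, concurrency control
  - monitoring and analysis: collecting statistics, analyzing operations, using benchmark programs
  - system evolution: enhancing application programs, schema modification, DBMS version upgrades
  - performance optimization/tuning: tuning indexes, tuning queries, tuning transactions, tuning the schema
Fig.17.0.19 Main tasks of DBS administration and maintenance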