PART 5
TRANSACTION MANAGEMENT
Chapter 17
Recovery System
Introduction



Recovery component in a DBMS
- ensures atomicity and durability despite failures, and thus provides high availability
Recovery schemes include
- actions taken during normal transaction processing to record enough information about transaction execution to recover from failures, e.g. the log in a DBS
- actions taken after a failure to recover the database contents to a state that ensures atomicity and durability
Backup is another approach taken by a DBS to ensure high availability
June 2008
Database System Concepts - Chapter 17 Recovery System -
3
§17.1 Failure Classification



Three types of failures may occur in a DBS
Transaction failure
- logical errors: the transaction cannot complete due to some internal error condition
- system errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)
System crash
- a hardware malfunction (e.g. a power failure or other hardware failure) or a bug in the DBS software or the operating system causes the system to crash
§17.1 Failure Classification (cont.)


- fail-stop assumption: data items in non-volatile storage are assumed not to be corrupted by a system crash
  - e.g. database systems have efficient mechanisms, at the hardware and software levels, to prevent corruption of disk data
Disk failure (storage medium failure)
- a head crash or similar disk failure destroys all or part of disk storage
- destruction is assumed to be detectable, because disk drives use checksums to detect failures
§17.2 Storage Structure
17.2.1 Storage Types
Three categories of storage media in computer systems
Volatile storage
- does not survive system crashes
- e.g. main memory, cache memory, registers
Nonvolatile storage
- survives system crashes
- e.g. disk, tape, flash memory, non-volatile (battery backed up) RAM
17.2.1 Storage Types (cont.)
Stable storage
- a mythical form of storage that survives all failures
- approximated by maintaining multiple copies on distinct nonvolatile media, e.g. RAID
17.2.3 Data Access

The data accesses in transactions, i.e. the write(Q) and read(Q) operations on a data item Q, are implemented by transferring data among disks, disk buffers in main memory, and the transactions' private work areas
- refer to Fig.17.0.1 and Fig.17.0.2
Fig.17.0.1 Data access in DBS (I)
[figure: transaction Ti issues data accesses on data items x and y (e.g. select, insert, delete, update); read(x)/read(y) and write(x)/write(y) move values between the local variables xi, yi in the buffer/work area specific to Ti and the blocks BX, BY, … BZ in the disk buffer (block buffer); input(BX) and output(BY) move blocks between the disk buffer and the DB file on disk]
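Fig.17.0.2 Data access in DBS (II)
[figure: main memory holds the buffer with Buffer Block A (containing X) and Buffer Block B (containing Y), and the local buffers/work areas of T1 (x1, y1) and T2 (y2); input(A) and output(B) move blocks A and B between the disk and the buffer; read(X) and write(Y) move values between the buffer blocks and the work areas]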
17.2.3 Data Access (cont.)
In main memory, each transaction Ti has a private work area in which it keeps local copies of all data items, e.g. X, that it accesses and updates
- the local copy of a data item X in Ti is called xi
- for simplicity, each data item X is assumed to fit in, and be stored inside, a single block
- e.g. xi, yi in Fig.17.0.1 and Fig.17.0.2
In main memory there are also disk buffers, also called system block buffers, in which the buffer blocks holding data items X reside temporarily
- e.g. Bx and By in Fig.17.0.1, X and Y in Fig.17.0.2
17.2.3 Data Access (cont.)
The DBMS and OS transfer blocks between the disk and the disk buffers in main memory through the following two operations:
- input(B): transfers the physical block B to main memory
- output(B): transfers the buffer block B to the disk, and replaces the appropriate physical block there
Transaction Ti transfers data items X between the disk buffer and its private work area using the following operations
- read(X): assigns the value of data item X to the local variable xi
17.2.3 Data Access (cont.)
read(X) is executed as follows
- if the block Bx on which X resides is not in main memory, input(Bx) is issued
- the value of X from Bx in the disk buffer is assigned to xi
write(X): assigns the value of the local variable xi to data item X in the buffer block Bx; it is executed as follows
- if the block Bx on which X resides is not in main memory, input(Bx) is issued
- the value of xi is assigned to X in Bx in the disk buffer
17.2.3 Data Access (cont.)

read and write are similar to API calls in an OS
output(Bx) need not immediately follow write(X)
- the system can perform the output operation when it deems fit
- for example, when the transaction is in the partially committed state
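The four operations above can be sketched in a few lines of Python. This is only an illustrative model, not how a real DBMS is implemented: the dictionaries `disk`, `disk_buffer` and `work_area`, the function names `input_block` and `output_block`, and the value 1000 are assumptions made for the example.

```python
# Illustrative model only: each block is a dict of data items.
disk = {"BX": {"X": 1000}}   # physical blocks on disk
disk_buffer = {}             # buffer blocks in main memory
work_area = {}               # private work area of one transaction Ti

def input_block(block):
    """input(B): copy physical block B from disk into the disk buffer."""
    disk_buffer[block] = dict(disk[block])

def output_block(block):
    """output(B): copy buffer block B to disk, replacing the physical block."""
    disk[block] = dict(disk_buffer[block])

def read(item, block):
    """read(X): bring Bx into memory if needed, then copy X into the local xi."""
    if block not in disk_buffer:
        input_block(block)
    work_area[item] = disk_buffer[block][item]

def write(item, block):
    """write(X): bring Bx into memory if needed, then copy xi into X in Bx."""
    if block not in disk_buffer:
        input_block(block)
    disk_buffer[block][item] = work_area[item]

read("X", "BX")
work_area["X"] -= 50                            # Ti updates its local copy xi
write("X", "BX")
value_on_disk_before_output = disk["BX"]["X"]   # output(BX) has not run yet
output_block("BX")
value_on_disk_after_output = disk["BX"]["X"]
```

After write(X) the new value exists only in the buffer block; the disk copy changes only when output(BX) is eventually issued, which is exactly the window in which a crash can lose or split updates.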
§17.3 Recovery and Atomicity

Problem
- when a system failure occurs, the modifications made to the database by a transaction may leave the database in an inconsistent state
E.g. consider a transaction Ti that transfers $50 from account A to account B, with the initial values of A and B being $1000 and $2000
- a system crash occurs during the execution of Ti, after output(BA) has taken place but before output(BB) is executed
- the values of A and B in the work area, the disk buffer and the disk are shown in Fig.17.0.3
Fig.17.0.3 Contents of memory and DB when system crash occurs
[figure: the work area of Ti in main memory holds ai = 950 and bi = 2050; the disk buffer holds buffer block BA = (950) and buffer block BB = (2050); output(A) has taken place but output(B) has not, so the disk holds A = 950 and B = 2000]
17.3 Recovery and Atomicity (cont.)
After the DBS recovers from the system crash, if
- Ti is not re-executed, the DB is left in an inconsistent state in which A=950 and B=2000
- Ti is re-executed, the DB enters an inconsistent state in which A=900 and B=2050
- with respect to the two modifications of the DB, i.e. output(A) and output(B), inconsistency arises once output(A) has been made but before both modifications have been made
- so recovery actions need to be taken
To ensure DB consistency, the goal is to perform either all database modifications made by Ti or none at all
- atomicity of Ti should be guaranteed despite failures
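The arithmetic behind the two inconsistent outcomes can be checked directly. The snippet below is a small sketch using the values from the example; the variable names are chosen for illustration.

```python
# State on disk after the crash: output(BA) happened, output(BB) did not.
A_disk, B_disk = 950, 2000
expected_total = 1000 + 2000     # a transfer must preserve A + B

# Option 1: Ti is not re-executed; the disk state is left as it is.
total_without_rerun = A_disk + B_disk

# Option 2: Ti is re-executed from the start on the crashed state.
A_rerun = A_disk - 50
B_rerun = B_disk + 50
total_with_rerun = A_rerun + B_rerun
```

In both cases the sum of the two accounts is 2950 instead of 3000, so neither ignoring the crash nor blindly re-running Ti restores consistency.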
17.3 Recovery and Atomicity (cont.)

To ensure transaction atomicity despite failures, the DBMS records information describing the modifications made to the DB by the transactions (e.g. by means of log records) and outputs this information to stable storage before modifying the database itself
- when the system crashes, the DBS can be recovered on the basis of this descriptive information
§17.4 Log-Based Recovery
17.4-1 Log
A log is a sequence of records that records all the update activities in the DBS and is kept on stable storage
There are four types of log records in a log; when transaction Ti executes, the recovery scheme writes different types of log records into the log according to the operations issued by Ti, refer to Fig.17.0.4 and Fig.17.0.5
- when Ti starts, i.e. begin-transaction appears and Ti enters the active state, Ti is registered by writing a
  <Ti start>
  record into the log
Fig.17.0.4 Log records for a committed transaction Ti
[figure: timeline of Ti (begin-trans., write(X), …, commit) through the states active, partially committed and committed; the DBMS allocates resources and creates the transaction, the local buffer changes Xi from V1 to V2, the disk buffer holds Bx = V2, the data is reflected to disk, and resources are released at the end; the log file contains <Ti start> <Ti, X, V1, V2> <Ti commit>]
17.4-1 Log (cont.)


- before Ti executes write(X), an update log record
  <Ti, X, V1, V2>
  is written, where V1 is the value of X before the write and V2 is the value to be written to X
  - this record notes that Ti has performed a write on data item X, that X had value V1 before the write, and that X will have value V2 after the write
- when Ti finishes its last statement, i.e. the commit statement appears in Ti and Ti enters the partially committed state, the log record
  <Ti commit>
  is written into the log
Fig.17.0.5 Log records for an aborted transaction Ti
[figure: timeline of Ti (begin-trans., write(X), …, abort/rollback) through the states active, failed and aborted; the local buffer changes Xi from V1 to V2 and the disk buffer holds Bx = V2; the DBMS undoes or redoes the previous operations, leaving Bx = V1 on disk, then releases resources; the log file contains <Ti start> <Ti, X, V1, V2> <Ti abort>]
17.4-1 Log (cont.)


- if Ti is aborted, i.e. an abort/rollback statement appears in Ti and Ti enters the failed state, the log record
  <Ti abort>
  is written into the log
The log contains a complete record of all database update activities. On the basis of the update log record <Ti, X, V1, V2>,
- if a DB modification (i.e. writing V2 to data item X in the DB) recorded by the update log record is desirable, it is output/reflected to the database on disk
  - e.g. Fig.17.0.4
17.4-1 Log (cont.)


- if a system failure occurs, a DB modification that is recorded in <Ti, X, V1, V2> and has already been output/reflected to the database on disk by an uncommitted Ti should be cancelled by an undo operation
  - restore the value of data item X to its old value V1
  - e.g. Fig.17.0.5
Three log-based recovery approaches for a single transaction or a set of serial transactions
- deferred database modification
- immediate database modification
- checkpoint
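The four log record types of 17.4-1 and the undo of an update record can be sketched as follows. The tuple types `Start`, `Update`, `Commit` and `Abort`, and the values 100 and 200 standing in for V1 and V2, are hypothetical names and numbers chosen for illustration only.

```python
from collections import namedtuple

# One tuple type per log record type: <Ti start>, <Ti, X, V1, V2>,
# <Ti commit>, <Ti abort>.
Start = namedtuple("Start", "txn")
Update = namedtuple("Update", "txn item old new")
Commit = namedtuple("Commit", "txn")
Abort = namedtuple("Abort", "txn")

db = {"X": 100}                                   # data item X on disk, V1 = 100
log = [Start("Ti"), Update("Ti", "X", 100, 200)]  # written before the DB changes

# The update is output to the database on disk while Ti is still active.
db["X"] = log[1].new

# A failure occurs before <Ti commit> is written: undo restores the old value.
for record in reversed(log):
    if isinstance(record, Update) and record.txn == "Ti":
        db[record.item] = record.old
log.append(Abort("Ti"))
```

Because the update record carries both V1 and V2, the same record supports restoring the old value (undo) or re-applying the new one (redo).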
17.4-2 Deferred Database Modification

Principles
- this scheme records all modifications issued by the transactions in the log, but defers all the write operations until the partially committed state
  - in the active state, the modifications to the DB issued by the transaction are not reflected to disk, and are temporarily stored in the disk/block buffer in main memory
- when Ti partially commits, the log records associated with Ti are used to execute the deferred writes, and the modified values of the data items are reflected to the DB on disk
- if a failure occurs, the DBMS recovers the data items in the DB on the basis of the log records
17.4-2 Deferred Database Modification (cont.)

When a write(X) operation is issued by transaction Ti, refer to Fig.17.0.6
- the log record <Ti, X, V2> is written into the log file, where V2 is the new value for X
- the write is not performed on X at that time, but is deferred, i.e.
  - after write(X) is issued, only the value of X in the disk/block buffer is changed to the new value V2, while the value of X on disk remains the old value V1
- note: the old value V1 of X is not needed in the log; an old value is used only for recovery, and when a failure occurs and X needs to be restored to its old value, the value of X on disk is still the unchanged V1 from before Ti began
Fig.17.0.6 Deferred Database Modification without failure
[figure: timeline of Ti (begin-trans., write(X), …, commit) through the states active, partially committed and committed; after input(Bx) the local buffer changes Xi from V1 to V2 and the disk buffer holds Bx: V2, while Bx on disk stays at V1 during the active state; in the partially committed state output(Bx) reflects the new value V2 to the disk; the log file contains <Ti start> <Ti, X, V2> <Ti commit>]
17.4-2 Deferred Database Modification (cont.)


- in the partially committed state, the DBMS uses the log record <Ti, X, V2> to carry out the previously deferred write(X), that is, it reflects the new value V2 into data item X on disk
When a failure occurs while Ti is still in the active state, Ti is aborted and <Ti abort>, not <Ti commit>, appears in the log
- to roll back Ti and restore the value of data item X, the recovery subsystem simply ignores the log records associated with Ti
- the DB state (defined as the values of the data items on disk) after recovery is just the same as that before Ti started
- refer to Fig.17.0.7
Fig.17.0.7 Deferred Database Modification with failures in active state
[figure: timeline of Ti (begin-trans., write(X), …, abort or system crash) through the states active, failed and aborted; the local buffer changes Xi from V1 to V2 and the disk buffer holds Bx: V2, which becomes unknown (??) at the failure; Bx on disk stays at V1; Ti is rolled back and its log records are ignored; the log file contains <Ti start> <Ti, X, V2> followed by <Ti abort> or nothing]
17.4-2 Deferred Database Modification (cont.)
/* After system recovery, the database state from before Ti executed is kept; from the point of view of atomicity, none of the operations in Ti has been done */
When a failure occurs while Ti is in the partially committed state, both <Ti start> and <Ti commit> are in the log; to guarantee the durability of the transaction,
- the recovery subsystem performs redo(Ti) to ensure the atomicity of Ti
  - redo(Ti): sets the value of all data items X updated by the transaction Ti to the new values V2, on the basis of <Ti, X, V2>
  - /* From the user's point of view, after system recovery all the operations in Ti are guaranteed to have completed */
17.4-2 Deferred Database Modification (cont.)
- refer to Fig.17.0.8
- note: the state transition from failed to committed is permitted, as shown in Fig.17.0.8-2
Fig.17.0.8-1 Deferred Database Modification with failures in partially committed state
[figure: timeline of Ti (begin-trans., write(X), …, commit, failure occurs) through the states active, partially committed, failed and, after recovery by redo, committed; the local buffer changes Xi from V1 to V2; the disk buffer holds Bx: V2, which becomes unknown (??) at the failure; Bx on disk is V1 before recovery and V2 after redo; the log file contains <Ti start> <Ti, X, V2> <Ti commit>]
Fig.17.0.8-2 Extended transaction state diagram
[figure: transaction state diagram; recoverable transition labels: begin-transaction, commit, abort]
An Example



Transactions T0 and T1, assuming T0 executes before T1
Initially, A=1000, B=2000, C=700
T0 and T1 are executed serially as <T0; T1>
T0:
  read(A)
  A := A - 50
  write(A)
  read(B)
  B := B + 50
  write(B)
T1:
  read(C)
  C := C - 100
  write(C)
An Example (cont.)

At the time of the system crash, the log may be in one of three states, corresponding to three instants of time, refer to Fig.17.4
Fig.17.4 Logs when failure occurs at three instances of time
An Example (cont.)

In case (a), the system crashes while T0 is in the active state; therefore no redo actions need to be taken
- data items A=1000, B=2000 and C=700 in the DB do not change
- refer to Fig.17.0.7
In case (b), the system crashes when T0 has committed and T1 is still in the active state; redo(T0) must be performed, since <T0 commit> is present
- A changes to 950 and B to 2050, but C=700 does not change
- refer to Fig.17.0.8
An Example (cont.)

In case (c), the system crashes when T0 and T1 have both committed; redo(T0) and redo(T1) must be performed, since <T0 commit> and <T1 commit> appear in the log
- A changes to 950, B to 2050 and C to 600
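The three cases can be replayed with a small sketch of recovery under deferred modification. The tuple encoding of the log records and the function name `recover_deferred` are assumptions for illustration; the three log prefixes are reconstructed from the textual description of cases (a) to (c), since the content of Fig.17.4 is not reproduced here.

```python
def recover_deferred(log, db):
    """Redo every committed transaction; ignore all others (no undo needed)."""
    committed = {record[1] for record in log if record[0] == "commit"}
    for record in log:                      # forward scan of the log
        if record[0] == "update" and record[1] in committed:
            _, _txn, item, new_value = record
            db[item] = new_value            # redo: install the new value V2
    return db

# Deferred-modification update records hold only the new value: <Ti, X, V2>.
full_log = [
    ("start", "T0"), ("update", "T0", "A", 950), ("update", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "C", 600),
    ("commit", "T1"),
]
initial = {"A": 1000, "B": 2000, "C": 700}

case_a = recover_deferred(full_log[:3], dict(initial))  # T0 still active
case_b = recover_deferred(full_log[:6], dict(initial))  # T0 committed, T1 active
case_c = recover_deferred(full_log, dict(initial))      # both committed
```

Uncommitted transactions need no action because, under this scheme, none of their writes can have reached the disk.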
17.4-3 Immediate Database Modification
This scheme allows database modifications to be output/reflected to the database on disk while the transaction is still in the active state
- the output/reflection of updated blocks Bx can take place before the transaction commits, i.e. in the active state
In this scheme, update log records must hold both the old value and the new value, because undo may be needed if a failure occurs in the active state
- e.g. the update log record <Ti, X, V1, V2>
- refer to Fig.17.0.9
Fig.17.0.9 Immediate Database Modification without failure
[figure: timeline of Ti (begin-trans., write(X), …, commit) through the states active, partially committed and committed; the local buffer changes Xi from V1 to V2 and the disk buffer holds Bx: V2; the new value is reflected to the disk (Bx: V1 becomes Bx: V2) while Ti is still active; the log file contains <Ti start> <Ti, X, V1, V2> <Ti commit>]
An Example
Log                      Database on disk
<T0 start>
<T0, A, 1000, 950>
                         A = 950
<T0, B, 2000, 2050>
                         B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                         C = 600
<T1 commit>
Fig.17.6 State of system log and database for T0 and T1
17.4-3 Immediate Database Modification (cont.)

The recovery scheme uses two recovery procedures
- undo(Ti) restores the value of all data items X updated by Ti to their old values V1
- redo(Ti) sets the value of all data items updated by Ti to the new values V2
  - e.g. redo(T0) and redo(T1) in Fig.17.6
17.4-3 Immediate Database Modification (cont.)

Both operations must be idempotent, that is,
- even if the operation is executed multiple times, the effect is the same as if it were executed once
- this is needed because operations may get re-executed during recovery
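A short sketch of why undo and redo are idempotent: both assign a value recorded in the log rather than re-running the original computation. The function names and the tuple layout `(txn, item, old, new)` are assumptions for illustration.

```python
def redo(update, db):
    """Install the logged new value; does not re-run 'A := A - 50'."""
    _txn, item, _old, new = update
    db[item] = new

def undo(update, db):
    """Install the logged old value."""
    _txn, item, old, _new = update
    db[item] = old

update = ("T0", "A", 1000, 950)

db = {"A": 1000}
redo(update, db)
after_one_redo = dict(db)
redo(update, db)                 # recovery itself crashed and was restarted
after_two_redos = dict(db)

undo(update, db)
undo(update, db)
after_two_undos = dict(db)

# For contrast: re-executing the operation itself is not idempotent.
naive = {"A": 1000}
for _ in range(2):
    naive["A"] -= 50
```

Repeating redo or undo leaves the same state as doing it once, whereas repeating the subtraction withdraws the amount twice.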

When a failure occurs while Ti is still in the active state, the record <Ti start> appears in the log but the record <Ti commit> does not
- Ti is not committed and should be aborted and rolled back by undo(Ti)
- refer to Fig.17.0.10
Fig.17.0.10 Immediate database modification with failures in active state
[figure: timeline of Ti (begin-trans., write(X), …, abort or system crash) through the states active, failed and aborted; the local buffer changes Xi from V1 to V2; the disk buffer holds Bx: V2, which becomes unknown (??) at the failure; Bx on disk goes from V1 to V2 while Ti is active and is restored to V1 by undo(Ti) after the failure; the log file contains <Ti start> <Ti, X, V1, V2> followed by <Ti abort> or nothing]
17.4-3 Immediate Database Modification (cont.)


When a failure occurs while Ti is in the partially committed state or the committed state, and both the record <Ti start> and the record <Ti commit> appear in the log, then
- Ti should be recovered from the failure and completed successfully by means of redo(Ti)
  - setting the value of all data items updated by Ti to the new values
- refer to Fig.17.0.11-1
- note: the state transition from failed to committed is permitted, as shown in Fig.17.0.8-2
Undo operations are performed first, then redo operations
Fig.17.0.11-1 Immediate Database Modification with failures in partially committed state
[figure: timeline of Ti (begin-trans., write(X), …, commit, failure occurs) through the states active, partially committed, failed and, after recovery by redo, committed; the local buffer changes Xi from V1 to V2; the disk buffer holds Bx: V2, which becomes unknown (??) at the failure; Bx on disk goes from V1 to V2 and is V2 after redo; the log file contains <Ti start> <Ti, X, V1, V2> <Ti commit>]
An Example
Transactions T0 and T1, executed serially as <T0; T1>
- initially, A=1000, B=2000, C=700
- the log as it appears at three instances of time is shown in Fig.17.7
Fig.17.7 The same log, shown at three different times
An Example (cont.)

In case (a), undo(T0) is performed
- data items A and B are restored to their initial values 1000 and 2000 respectively
In case (b), undo(T1) and redo(T0) are performed
- C is restored to its initial value 700, and A and B are set to the updated values 950 and 2050 respectively
In case (c), redo(T0) and redo(T1) are performed
- A and B are set to the modified values 950 and 2050 respectively, then C is set to 600
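The same three cases can be replayed for immediate modification. In this sketch the tuple encoding, the function name `recover_immediate`, the log prefixes (taken from the log of Fig.17.6, cut at the points the cases describe) and the disk states at crash time are all assumptions chosen for illustration; under this scheme the disk may hold any mix of old and new values when the crash happens.

```python
def recover_immediate(log, db):
    """Undo uncommitted transactions first, then redo committed ones."""
    committed = {record[1] for record in log if record[0] == "commit"}
    for record in reversed(log):            # backward scan: undo
        if record[0] == "update" and record[1] not in committed:
            _, _txn, item, old, _new = record
            db[item] = old
    for record in log:                      # forward scan: redo
        if record[0] == "update" and record[1] in committed:
            _, _txn, item, _old, new = record
            db[item] = new
    return db

full_log = [
    ("start", "T0"), ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "C", 700, 600), ("commit", "T1"),
]

# Hypothetical disk contents at the moment of each crash.
case_a = recover_immediate(full_log[:3], {"A": 950, "B": 2050, "C": 700})
case_b = recover_immediate(full_log[:6], {"A": 950, "B": 2000, "C": 600})
case_c = recover_immediate(full_log, {"A": 1000, "B": 2050, "C": 700})
```

Whatever subset of the writes had reached the disk, the result after recovery depends only on which transactions have a commit record in the log.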
§17.4-4 Checkpoints

Drawbacks of the recovery procedure discussed earlier
- searching the entire log is time-consuming
- we might unnecessarily redo transactions which have already output their updates to the database
  - e.g. Fig.17.0.11-1
To reduce these types of overhead, the recovery system streamlines the recovery procedure by periodically performing checkpoints, which require the following sequence of actions to take place
- output all log records currently residing in main memory (refer to log record buffering in 17.7.1) onto the log file on stable storage
- output/reflect to the disk all buffer blocks in the disk buffer that were modified prior to the checkpoint
  /* periodically write the modified data blocks in the in-memory disk buffer back to the DB file on disk */
- write a log record <checkpoint> onto the log file on stable storage
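The three checkpoint actions can be sketched as one function. All names here (`log_buffer`, `stable_log`, `disk_buffer`, `dirty`, `disk`) and the symbolic values are hypothetical; the initial state loosely follows the situation of Ti at checkpoint(i), where X has been updated to V2 in the buffer but not yet on disk.

```python
log_buffer = [("update", "Ti", "X", "V1", "V2")]   # log records still in memory
stable_log = [("start", "Ti")]                     # log file on stable storage
disk_buffer = {"BX": "V2", "BY": "U1"}             # buffer blocks in memory
dirty = {"BX"}                                     # blocks modified in the buffer
disk = {"BX": "V1", "BY": "U1"}                    # DB file on disk

def checkpoint():
    # 1. Output all log records residing in main memory to stable storage.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # 2. Output all modified buffer blocks to the disk.
    for block in sorted(dirty):
        disk[block] = disk_buffer[block]
    dirty.clear()
    # 3. Write a <checkpoint> record to the log on stable storage.
    stable_log.append(("checkpoint",))

checkpoint()
```

The order matters: the log records describing an update reach stable storage before the updated block does, and the <checkpoint> record is written only after both.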
Refer to Fig.17.0.12 for an illustration of checkpoints
- the updated value of data item X, i.e. V2, is reflected/output to the disk by checkpoint(i)
- the updated value of data item Y, i.e. U2, is reflected/output to the disk while Ti is in the partially committed state
Fig.17.0.12 Illustration of Ti and checkpoints
[figure: timeline of Ti (begin-trans., write(X), write(Y), commit) through the states active, partially committed and committed, with checkpoints chk(i-1), chk(i) and chk(i+1); the local buffer changes Xi from V1 to V2 and Yi from U1 to U2; output(Bx) at chk(i) changes Bx on disk from V1 to V2 while By is still U1; output(By) in the partially committed state changes By on disk to U2; the log file contains <Ti start> <Ti, X, V1, V2> checkpoint(i) <Ti, Y, U1, U2> <Ti commit>]
17.4-4 Checkpoints (cont.)
Fig.17.0.13 Checkpoints
[figure: timeline of the serial transactions T1 to T5, with the checkpoints cpk(i-1), cpk(i) and cpk(i+1) and the time point when the failure occurs marked]
log:
<T1 start>; …; <T1 commit>; <T2 start>; …; <T2 commit>;
<T3 start>; … <cpk>; … <T3 commit>; <T4 start>; … <cpk>; …
<T4 commit>; <T5 start>
An example
The recovery system periodically performs checkpoints, which require all of the following actions to take place except:
- A. Output onto stable storage all log records currently residing in main memory
- B. Output to the disk all modified buffer blocks
- C. Output onto stable storage a log record <checkpoint>
- D. Redo some failed transactions
Answer: D
17.4-4 Checkpoints (cont.)

If the deferred DB modification scheme is employed, the write() operations issued by a transaction can be reflected to the DB file on disk at a checkpoint, or prior to the checkpoint while Ti is in the partially committed state, e.g. Fig.17.0.12 and Fig.17.0.13
SQL provides mechanisms to define savepoints
- e.g. a savepoint defined in T-SQL, as described in Fig.17.0.14
Checkpoint vs Savepoint

The checkpoint is similar to the savepoint in real DBSs, such as DB2 or Sybase
But they differ somewhat in operating semantics
- a savepoint need not be issued periodically
BEGIN TRANSACTION
USE [student-DB]        /* brackets are needed because the name contains a hyphen */
INSERT INTO student
VALUES ('03402', N'王菲', 'CS', '1985/05/15')
SAVE TRAN My_savepoint  /* define the savepoint */
DELETE FROM student
WHERE stname = N'王菲' OR stname = N'章立'
ROLLBACK TRAN My_savepoint
COMMIT TRAN
GO
Note:
/* ROLLBACK TRAN rolls the operations back to the savepoint My_savepoint: the delete is rolled back, while the insert is not; the DB is restored to the state before the delete executed, so the newly inserted tuple ('03402', '王菲', 'CS', '1985/05/15') is not deleted and remains in the database */
Fig.17.0.14 An example of savepoint in T-SQL
(a) DB before the transaction starts
st-id   stname   department   birthdate
03405   章立     CS           1985/01/25
…       …        …            …
03409   李龙     CS           1984/12/20
…       …        …            …
03411   赵新     CS           1985/06/18

(b) DB after insert is issued
st-id   stname   department   birthdate
03402   王菲     CS           1985/05/15
03405   章立     CS           1985/01/25
…       …        …            …
03409   李龙     CS           1984/12/20
…       …        …            …
03411   赵新     CS           1985/06/18
Fig.17.0.15-I DB instances at different timepoints

(c) DB after delete is issued
st-id   stname   department   birthdate
…       …        …            …
03409   李龙     CS           1984/12/20
…       …        …            …
03411   赵新     CS           1985/06/18

(d) DB after rollback is issued
st-id   stname   department   birthdate
03402   王菲     CS           1985/05/15
03405   章立     CS           1985/01/25
…       …        …            …
03409   李龙     CS           1984/12/20
…       …        …            …
03411   赵新     CS           1985/06/18
Fig.17.0.15-II DB instances at different timepoints
17.4-4 Checkpoints (cont.)
With the help of checkpoints, the recovery schemes of 17.4-2 and 17.4-3 can be refined as follows, assuming the transactions still run serially
After a failure occurs, the recovery scheme searches the log to determine the most recent transaction Ti that started executing before the most recent checkpoint took place, e.g. T4 and cpk(i) in Fig.17.0.13
- scan the log backward from the end of the log until the first <checkpoint> record is found (this is the final <checkpoint> record in the log and corresponds to the most recent checkpoint)
- continue scanning backward until the next <Ti start> record is found in the log
- this Ti is the most recent transaction that started before the checkpoint
17.4-4 Checkpoints (cont.)


- in Fig.17.0.13, the most recent checkpoint is cpk(i), and the most recent Ti found is T4, which started executing just before cpk(i) took place
Once the system has identified the most recent Ti, the recovery scheme applies redo and undo only to Ti and to all transactions that started executing after Ti; denoting these transactions by the set T, and assuming immediate modification is used, then
- for all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk) to roll back the uncommitted Tk
  - in Fig.17.0.13, undo is applied to T5
17.4-4 Checkpoints (cont.)
- for all transactions Tk in T such that the <Tk commit> record appears in the log, execute redo(Tk)
  - e.g. in Fig.17.0.13, redo is applied to T4
- all the transactions that are not in the set T are ignored
  - the transactions T1, T2 and T3 are ignored
When deferred modification is employed, the undo operation does not need to be applied
An Example




A set of serial transactions T1, T2, T3 and T4, as in Fig.17.0.16
The most recent transaction that started executing before the checkpoint took place is T2
The set of transactions to be considered is T = {T2, T3, T4}
- T1 can be ignored, because the updates issued by T1 have already been output to disk by the checkpoint
If immediate modification is employed, then
- redo is applied to T2 and T3
- undo is applied to T4
- T1 is ignored
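The backward scan that determines the set T can be sketched as follows. The tuple encoding and the function name `transactions_to_consider` are assumptions for illustration, and the update records elided as "…" in the log of Fig.17.0.16 are simply left out.

```python
def transactions_to_consider(log):
    """Serial case: find T, split into transactions to redo and to undo."""
    i = len(log) - 1
    while i >= 0 and log[i][0] != "checkpoint":   # back to the last checkpoint
        i -= 1
    while i >= 0 and log[i][0] != "start":        # on to the next <Ti start>
        i -= 1
    tail = log[max(i, 0):]                        # the considered part of the log
    started = [record[1] for record in tail if record[0] == "start"]
    committed = {record[1] for record in tail if record[0] == "commit"}
    to_redo = [txn for txn in started if txn in committed]
    to_undo = [txn for txn in started if txn not in committed]
    return to_redo, to_undo

log = [
    ("start", "T1"), ("commit", "T1"),
    ("start", "T2"), ("checkpoint",), ("commit", "T2"),
    ("start", "T3"), ("commit", "T3"),
    ("start", "T4"),
]
to_redo, to_undo = transactions_to_consider(log)
```

Everything before <T2 start> is never examined, which is the saving the checkpoint buys.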
Fig.17.0.16 An example of Checkpoint and undo/redo
[figure: timeline of the serial transactions T1 to T4 with the time points TP0 and TP1, the checkpoint and the system failure marked; the considered parts of the log are highlighted]
log:
<T1 start>; …; <T1 commit>; <T2 start>; … <cpk> …; <T2 commit>; <T3 start>; …; … <T3 commit>; <T4 start>
§17.6 Recovery with Concurrent Transactions
Extend the log-based recovery scheme to deal with multiple concurrent transactions
It is assumed that
- the immediate modification scheme is allowed
- the system has a single log
- the system has a single disk buffer shared by all transactions
- a block in the disk buffer is permitted to hold data items updated by one or more transactions
Pitfalls in Recovery with
Concurrent Transactions

As shown in Fig.17.0.17, the concurrent transactions T0 and T1 update data item Q one after the other (V1 → V2 → V3), and T0 must then be rolled back due to a failure
With log-based recovery, undo(T0) restores data item Q to V1, its value before T0 and T1 started; the update to Q performed by T1 (i.e. V3) is thus lost when T0 is rolled back, although T1 is a successfully committed transaction
Fig.17.0.17 Illustration of concurrent updates in case of rollback
[figure: T0 issues write(Q) (local buffer Q0: V1→V2) and later aborts due to a failure; T1 issues write(Q) (local buffer Q1: V2→V3) and commits; the disk buffer and the disk hold BQ: V2 and then BQ: V3; recovery by undo(T0) sets BQ back to V1]
log: <T0 start> <T1 start> <T0,Q,V1,V2> <T1,Q,V2,V3> <T1 commit> <T0 abort>
17.6-1 Interaction with Concurrency Control (cont.)
To avoid the pitfall described in Fig.17.0.17, it is required that
- if a transaction T has updated a data item Q, no other transaction may update the same data item Q until T has committed or rolled back
This requirement can be ensured by the strict two-phase locking protocol
17.6-2 Transaction Rollback

When a failed transaction Ti is rolled back, the recovery scheme scans the log backward, finds each log record of the form <Ti, Q, V1, V2>, and restores the data item Q to its old value V1
Scanning the log backward is important, because Ti may update the data item Q several times; the value finally restored must be the old value in the "oldest" update record <Ti, Q, Vi, Vj> written by Ti, which a backward scan processes last
An Example

The log is as follows
<Ti start> <Ti,Q,V0,V1> <Ti,Q,V1,V2> <Ti,Q,V2,V3> <Ti abort>
Scanning backward undoes the updates on Q in the order V0 ← V1 ← V2 ← V3
Q is restored to V0
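The effect of the scan direction can be checked with a short sketch on the log above. The tuple encoding and the function name `rollback` are assumptions for illustration; the forward scan is included only as a counter-example.

```python
def rollback(txn, log, db):
    """Scan the log backward, restoring old values until <txn start>."""
    for record in reversed(log):
        if record[0] == "update" and record[1] == txn:
            _, _txn, item, old, _new = record
            db[item] = old
        elif record[0] == "start" and record[1] == txn:
            break
    return db

log = [
    ("start", "Ti"),
    ("update", "Ti", "Q", "V0", "V1"),
    ("update", "Ti", "Q", "V1", "V2"),
    ("update", "Ti", "Q", "V2", "V3"),
]

backward_result = rollback("Ti", log, {"Q": "V3"})

# Counter-example: applying the same undo steps in forward order.
forward_result = {"Q": "V3"}
for record in log:
    if record[0] == "update":
        forward_result[record[2]] = record[3]
```

The backward scan ends on the oldest update record and leaves Q at V0; the forward scan ends on the newest record and would wrongly leave Q at V2.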
17.6-3 Checkpoints

As described in 17.4-4, when checkpoints are used in the recovery of a single transaction or a set of serial transactions, only the following transactions need to be considered during recovery
- the one transaction, if any, that was active at the time of the most recent checkpoint
- those transactions that started after the most recent checkpoint
17.6-3 Checkpoints (cont.)

For a set of concurrent transactions, since several transactions may have been active at the time of the most recent checkpoint, the checkpoint log record should be of the form
- <checkpoint L>, where L is a list of the transactions active at the time of the checkpoint
- it is also assumed that transactions do not perform updates either on the buffer blocks or on the log while the checkpoint is in progress
  - this constraint can be relaxed in the fuzzy checkpoint scheme
An Example
Fig.17.0.18 An example of checkpoint log
[figure: time axis t with checkpoint(k), checkpoint(k+1) and the system failure marked, and the lifetimes of the concurrent transactions T1 to T6; the checkpoint records are <checkpoint(k), {T1, T3}> and <checkpoint(k+1), {T1, T6}>]
log: …; <checkpoint(k+1), {T1, T6}>; …; <T5 start>; …; <T1 commit>; …
17.6-4 Restart Recovery !!

When a system in which a set of transactions executes concurrently recovers from a crash, it first constructs two lists, the undo-list consisting of incomplete transactions that must be undone and the redo-list consisting of finished transactions that must be redone, as follows
- initialize the undo-list and the redo-list to empty
- scan the log backward from the end, stopping when the first <checkpoint L> record is found, i.e. when the most recent checkpoint is found
  - e.g. <checkpoint(k+1), {T1, T6}> in Fig.17.0.18
17.6-4 Restart Recovery (cont.)


 for each record found during the backward scan
   if it is of the form <Ti commit>, add Ti to redo-list
    /* Ti finishes before the failure occurs */
   if it is of the form <Ti start> and Ti is not in redo-list,
    add Ti to undo-list
    /* Ti starts after the most recent checkpoint but does not
    finish before the failure occurs */
 for every Ti in the transaction list L of the most recent
checkpoint, if Ti is not in redo-list, add Ti to undo-list
17.6-4 Restart Recovery (cont.)

At this point, undo-list consists of incomplete transactions which
must be undone, and redo-list consists of finished transactions
that must be redone
Example One

For the concurrent transactions in Fig.17.0.18, when the failure
occurs, the log has the following form
…; <checkpoint(k+1), {T1, T6}>; …; <T5 start>; …; <T1 commit>; …
 During the backward scan, T1 is added to redo-list and T5 is
added to undo-list
   redo-list = {T1}
   undo-list = {T5}
 T6, in the list L of the most recent checkpoint(k+1), is not in
redo-list and so is added to undo-list
   undo-list = {T6, T5}
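The list construction in Example One can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's implementation: the tuple-based record layout and the name `build_lists` are assumptions made here, and only the three records visible in the example log are used (the elided parts are left out).

```python
# Sketch only: record layout and function name are illustrative assumptions.
def build_lists(log):
    """Return (redo_list, undo_list) from a backward scan of the log.

    log is a list of records, oldest first:
      ("start", T), ("commit", T), ("checkpoint", [active transactions])
    """
    redo, undo = [], []
    active_at_checkpoint = []
    # Scan backward from the end, stopping at the most recent checkpoint.
    for rec in reversed(log):
        kind = rec[0]
        if kind == "commit":
            redo.append(rec[1])          # finished before the failure
        elif kind == "start" and rec[1] not in redo:
            undo.append(rec[1])          # started, never committed
        elif kind == "checkpoint":
            active_at_checkpoint = rec[1]
            break
    # Transactions active at the checkpoint that never committed.
    for t in active_at_checkpoint:
        if t not in redo and t not in undo:
            undo.append(t)
    return redo, undo


# The records shown in Example One (elided parts of the log omitted).
log = [
    ("checkpoint", ["T1", "T6"]),
    ("start", "T5"),
    ("commit", "T1"),
]
redo_list, undo_list = build_lists(log)
print("redo-list =", redo_list)
print("undo-list =", undo_list)
```

The two lists are treated as sets in the slides, so the order in which T5 and T6 appear in the undo-list is not significant.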
17.6-4 Restart Recovery (cont.)

After the redo-list and undo-list have been constructed, the
recovery proceeds as follows
 step 1: scan the log backward from the most recent record, i.e. the
end of the log, and perform an undo for each log record, e.g.
<Ti, X, V1, V2>, that belongs to a transaction Ti in the
undo-list
   the scan stops when the <Ti start> records have been
  encountered for every Ti in the undo-list
  /* scanning backward, the undo operations revoke, one by one, the
  updates made before the failure by each uncommitted transaction
  Ti recorded in the undo-list */
 step 2: locate the most recent <checkpoint L> record in the log
   possibly by scanning the log forward, if the checkpoint
  record was passed in step 1
17.6-4 Restart Recovery (cont.)
 step 3: scan the log forward from the most recent <checkpoint L>
record to the end of the log, and perform a redo for each log
record, e.g. <Ti, X, V1, V2>, that belongs to a transaction Ti in
the redo-list
  /* starting from the most recent checkpoint and scanning forward,
  the redo operations reapply, one by one, the updates made after
  the most recent checkpoint by each committed transaction Ti
  recorded in the redo-list */
 Transactions that are in neither the redo-list nor the undo-list
are ignored
 Notes
   during the recovery procedure, it is important to undo before
  redoing
   undoing should proceed backward from the end of the log
   redoing should be performed forward from the most recent
  checkpoint

Example Two
For the concurrent transactions T0, T1, T2 and T3, the initial
values of data items A, B, C and D are all 0
 assume that the immediate database modification technique is
used for log-based recovery
 when a failure occurs, the log is as given in Fig.17.0.19 on the
next slide
 go over the steps of the recovery algorithm and give the
values of data items A, B, C and D after recovery completes
Initial values: A=0, B=0, C=0, D=0
<T0 start>
<T0, A, 0, 10>
<checkpoint {T0}>          /* A=10, B=0, C=0, D=0 */
<T0 commit>
<T1 start>
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>      /* A=10, B=10, C=20, D=0 */
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
failure occurs             /* A=20, B=10, C=20, D=10 */
After recovery: A=20, B=0, C=0, D=10
Step1: scanning backward to construct redo-list and undo-list
Step2: scanning backward to undo T2 and T1
Step3: scanning forward to redo T3
Fig.17.0.19 The log for concurrent transactions
Example Two (cont.)
The recovery algorithm proceeds in three steps
 Step1. Construct the redo-list and the undo-list by scanning
backward from the end of the log to the most recent checkpoint
<checkpoint {T1, T2}>
   redo-list = {T3}
   undo-list = {T1, T2}
   T0 is ignored
 Step2. Scan the log backward and perform undo for T1 and T2 in
undo-list; in this log all of their update records precede the
most recent checkpoint record <checkpoint {T1, T2}>
   after undoing T2, data item C is restored to 0, that is 20 → 10 → 0
   after undoing T1, data item B is restored to 0, that is 10 → 0

Example Two (cont.)

 Step3. Scan the log forward from the most recent checkpoint record
<checkpoint {T1, T2}>, and perform redo for T3 in redo-list
   after redoing T3, data item A is set to 20, that is 10 → 20,
  and data item D is set to 10, that is 0 → 10
 After recovery, the values of data items in DB are
   A=20, B=0, C=0, D=10
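The three steps of Example Two can be replayed with a small Python sketch. It is an illustration under stated assumptions rather than the textbook's code: the tuple record layout and the name `restart_recovery` are invented here, and the database state at the moment of failure is taken to be A=20, B=10, C=20, D=10 as annotated in Fig.17.0.19.

```python
# Sketch only: record layout and function name are illustrative assumptions.
def restart_recovery(log, db):
    """Undo, then redo, against db (a dict); return (redo_list, undo_list).

    log records, oldest first:
      ("start", T), ("commit", T), ("update", T, X, old, new),
      ("checkpoint", [active transactions])
    """
    # Index of the most recent checkpoint record (-1 if there is none).
    cp = max((i for i, r in enumerate(log) if r[0] == "checkpoint"), default=-1)
    active = log[cp][1] if cp >= 0 else []

    # Step 1: backward scan from the end of the log to the checkpoint.
    redo, undo = set(), set()
    for rec in reversed(log[cp + 1:]):
        if rec[0] == "commit":
            redo.add(rec[1])
        elif rec[0] == "start" and rec[1] not in redo:
            undo.add(rec[1])
    undo |= {t for t in active if t not in redo}

    # Step 2: undo backward from the end of the log, restoring old values,
    # until <Ti start> has been seen for every Ti in the undo-list.
    pending = set(undo)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "update" and rec[1] in pending:
            db[rec[2]] = rec[3]
        elif rec[0] == "start":
            pending.discard(rec[1])

    # Step 3: redo forward from the checkpoint, reapplying new values.
    for rec in log[cp + 1:]:
        if rec[0] == "update" and rec[1] in redo:
            db[rec[2]] = rec[4]
    return redo, undo


log = [
    ("start", "T0"), ("update", "T0", "A", 0, 10),
    ("checkpoint", ["T0"]), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "B", 0, 10),
    ("start", "T2"), ("update", "T2", "C", 0, 10), ("update", "T2", "C", 10, 20),
    ("checkpoint", ["T1", "T2"]),
    ("start", "T3"), ("update", "T3", "A", 10, 20), ("update", "T3", "D", 0, 10),
    ("commit", "T3"),
]
db = {"A": 20, "B": 10, "C": 20, "D": 10}   # assumed state at the failure
redo_list, undo_list = restart_recovery(log, db)
print(redo_list, undo_list, db)
```

Running undo before redo matters in general: if a data item was written by both an undone and a later redone transaction, the redo must be the last operation applied to it.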
Example Three
Fig.17.0.20 shows the time scale for the concurrent transactions
T1, T2, T3, T4 and T5; a checkpoint is set at Tc
 After a system failure occurs at Tf, what recovery operation (i.e.
redo, undo, or ignore) should the DBMS recovery subsystem
apply to each transaction?
   assume that the immediate database modification technique is
  used for log-based recovery
 Solution:
   undo: T1, T5
   ignored: T2
   redo: T3, T4
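The rule behind this solution can be written as a small classifier. The sketch below is hypothetical in two respects: the function name `classify` is made up, and the commit times are invented purely so that the output reproduces the stated solution, since the actual intervals in Fig.17.0.20 are not given in the text.

```python
# Sketch only: times below are hypothetical, chosen to match the solution.
def classify(commit_time, tc, tf):
    """Recovery action for one transaction.

    commit_time is the time of <Ti commit>, or None if no commit record
    was written before the failure at tf; tc is the checkpoint time.
    """
    if commit_time is None or commit_time > tf:
        return "undo"        # not committed when the failure occurs
    if commit_time < tc:
        return "ignored"     # committed before the checkpoint
    return "redo"            # committed between checkpoint and failure


TC, TF = 10, 20              # hypothetical checkpoint and failure times
commit_times = {"T1": None, "T2": 5, "T3": 12, "T4": 18, "T5": None}
actions = {t: classify(c, TC, TF) for t, c in commit_times.items()}
print(actions)
```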

Example Three (cont.)
[Figure: time scale of T1, T2, T3, T4 and T5 against the checkpoint
at Tc and the system failure at Tf. Note: one line style marks a Ti
that succeeds in being committed, another a Ti that fails and is
aborted]
Fig.17.0.20 Time scale for concurrent transactions
Example Four

The initial values of data items A and B are 1000 and 2000
respectively

If immediate modification is employed, give the log that the
recovery scheme constructs to describe the concurrent execution
of transactions T1 and T2 shown in Fig.17.0.21
T1                   T2                   Recovery log
begin-transaction                         <T1 start>
read(A)
                     begin-transaction    <T2 start>
A := A-50
write(A)                                  <T1, A, 1000, 950>
                     read(A)
                     temp := A*0.1
------------- checkpoint -------------    <checkpoint {T1, T2}>
                     A := A-temp
                     write(A)             <T2, A, 950, 855>
read(B)
B := B+50
write(B)                                  <T1, B, 2000, 2050>
commit                                    <T1 commit>
                     read(B)
------------- checkpoint -------------    <checkpoint {T2}>
                     B := B+temp
                     write(B)             <T2, B, 2050, 2145>
                     abort                <T2 abort>
Fig.17.0.21 Concurrent execution of T1 and T2
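The log of Example Four can be regenerated by replaying the schedule step by step. The following Python sketch is an illustration only: the helper names `run_schedule` and `write` are made up, `temp := A*0.1` is computed with integer division to keep the values exact, and the rollback work triggered by T2's abort is not modeled.

```python
# Sketch only: helper names are illustrative; T2's rollback is not modeled.
def run_schedule():
    """Replay the interleaved schedule of T1 and T2 and return the log."""
    db = {"A": 1000, "B": 2000}
    log = []

    def write(t, x, new):
        # Immediate modification: log <T, X, old, new> first, then update.
        log.append(f"<{t}, {x}, {db[x]}, {new}>")
        db[x] = new

    log.append("<T1 start>")
    a1 = db["A"]                      # T1: read(A)
    log.append("<T2 start>")
    write("T1", "A", a1 - 50)         # T1: A := A-50; write(A)
    a2 = db["A"]                      # T2: read(A)
    temp = a2 // 10                   # T2: temp := A*0.1
    log.append("<checkpoint {T1, T2}>")
    write("T2", "A", a2 - temp)       # T2: A := A-temp; write(A)
    b1 = db["B"]                      # T1: read(B)
    write("T1", "B", b1 + 50)         # T1: B := B+50; write(B)
    log.append("<T1 commit>")
    b2 = db["B"]                      # T2: read(B)
    log.append("<checkpoint {T2}>")
    write("T2", "B", b2 + temp)       # T2: B := B+temp; write(B)
    log.append("<T2 abort>")
    return log


log = run_schedule()
for record in log:
    print(record)
```

Each checkpoint record lists exactly the transactions that have started but not yet committed or aborted at that point, which is why the second one contains only T2.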
Exercises

Example Four + Example Two
 undo: concurrent schedule → log file
 log file → recovery actions
 recovery actions → the recovered values of data items
§17.7 Buffer Management
17.7.1 Log Record Buffering
 With log record buffering, a log record need not be output to
stable storage at the time it is created
 Log record buffering
   each log record is buffered in main memory, instead of
  being output directly to stable storage
   log records are output to stable storage when a block of log
  records in the buffer is full
   several log records can thus be output using a single output
  operation, reducing the I/O cost
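The saving in output operations can be seen in a toy model. The class below is a hypothetical sketch, not a real DBMS interface: `LogBuffer`, `block_size` and the in-memory list standing in for stable storage are all assumptions, and the log records are made-up examples.

```python
# Sketch only: a toy model of log record buffering.
class LogBuffer:
    def __init__(self, block_size):
        self.block_size = block_size  # log records per block
        self.buffer = []              # records held in main memory
        self.stable = []              # stands in for stable storage
        self.output_ops = 0           # number of output operations

    def append(self, record):
        """Buffer a record; output the block once it is full."""
        self.buffer.append(record)
        if len(self.buffer) == self.block_size:
            self.flush()

    def flush(self):
        """Output all buffered records with a single output operation."""
        if self.buffer:
            self.stable.extend(self.buffer)
            self.buffer.clear()
            self.output_ops += 1


buf = LogBuffer(block_size=4)
for i in range(10):
    buf.append(f"<T1, X{i}, 0, 1>")   # made-up update records
print(buf.output_ops, len(buf.stable), len(buf.buffer))
```

With ten records and four records per block, two output operations reach stable storage and two records remain buffered; an explicit `flush()` would be needed wherever the records must be durable before the block fills.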
17.7.2 Database Buffering

The DBMS employs a two-tier storage hierarchy and maintains an
in-memory buffer of data blocks, called the disk buffer (also block
buffer or system buffer)
 when a new block B2 is needed and the buffer is full, an
existing block B1 needs to be removed from the buffer
 if the block B1 has been updated, it must be output to disk
before B2 is input to the buffer
 this is similar to page replacement in virtual memory in operating
systems
17.7.2 Database Buffering (cont.)
If the input of block B2 causes B1 to be chosen for output, then
before B1 is output, all log records pertaining to the data in B1
must be output to stable storage
 The sequence of actions taken by the system would be
   output log records to stable storage until all log records
  pertaining to the data in B1 have been output
   output block B1 to the disk
   input block B2 from disk to the disk buffer
 No updates/writes should be in progress on a block B when B is
output to disk; this requirement can be met by using a special
means of locking
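The ordering of these actions can be traced with a toy buffer manager. Everything in this Python sketch is an assumption made for illustration: the class `BufferManager`, its method names, the trace format, and the choice of the oldest resident block as the victim; locking is not modeled.

```python
# Sketch only: a toy buffer manager that records the order of its actions.
class BufferManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}       # block id -> True if updated (dirty)
        self.log_buffer = []   # (block id, log record), oldest first
        self.trace = []        # actions taken, in order

    def update(self, block, record):
        """Record an update to a block that is already in the buffer."""
        if block not in self.blocks:
            raise KeyError(f"{block} is not in the buffer")
        self.blocks[block] = True
        self.log_buffer.append((block, record))

    def _output_log_for(self, block):
        # Output log records in order until every record pertaining to
        # this block has reached stable storage.
        last = max((i for i, (b, _) in enumerate(self.log_buffer) if b == block),
                   default=-1)
        for _, rec in self.log_buffer[:last + 1]:
            self.trace.append(("output log", rec))
        del self.log_buffer[:last + 1]

    def fetch(self, block):
        """Bring a block into the buffer, evicting another if it is full."""
        if block in self.blocks:
            return
        if len(self.blocks) >= self.capacity:
            victim = next(iter(self.blocks))   # placeholder policy: oldest
            if self.blocks[victim]:
                self._output_log_for(victim)   # log records first
                self.trace.append(("output block", victim))
            del self.blocks[victim]
        self.trace.append(("input block", block))
        self.blocks[block] = False


bm = BufferManager(capacity=1)
bm.fetch("B1")
bm.update("B1", "<T1, X, 0, 10>")   # made-up log record
bm.fetch("B2")                      # forces B1 out of the buffer
for action in bm.trace:
    print(action)
```

The trace shows the log record leaving first, then block B1, and only then block B2 coming in, which is the order listed on the slide.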

17.7.3 OS Roles in Buffer Management

The database buffer can be implemented either
 in an area of real main memory reserved for the database and
managed by the DBMS rather than the OS, or
 in virtual memory provided by the OS
 Both approaches to buffer implementation have merits and
demerits
 The approach in which the DBMS implements and manages the database
buffer is more popular in database systems such as DB2, Oracle,
and SQL Server

Appendix A Backup
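Backup
 dump the data in the database DB to stable storage
 create backup copies of the DB regularly, so that after a failure
the contents of the DB can be restored from the log and the
backup files
 regular, scheduled database backup is a common means of
supporting database recovery and assists the DBS recovery process

The DBS recovery component uses backup files, log record files, and
checkpoints, together with a recovery mechanism, to restore the DB
and thereby ensure high availability of the system
 Fig.17.0.18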
[Figure: the DBMS recovery mechanism restores the DB using backup
(backup files), log records (log file), and checkpoints]
Fig.17.0.18 DBS recovery system
Appendix A Backup (cont.)
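A database recovery system can be examined in the following respects
 Backup capability
   create a backup copy of the entire database at specified
  intervals
   e.g. an important database system may be backed up every day,
  usually after the nightly batch update jobs have completed
   backup files are kept on tape cartridges or optical discs,
  rather than on the disks where the database normally resides
 Logging capability
   create a log file that describes the data updates of all
  transactions executed since the last full database backup
   log records: <Ti, Q, X1, X2>, and the various timestamps of Ti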

Appendix A Backup (cont.)
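 Checkpoint capability
   create checkpoint records in the log file to speed up recovery;
  during recovery, the recovery manager module only needs to go
  back to the most recent checkpoint record
 Recovery manager module
   applies a recovery algorithm to restore the database, from the
  backup files and log files, to the most recent consistent
  database state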

Appendix A Backup (cont.)

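The main tasks of routine backup and recovery activities are as follows
 schedule backups regularly, e.g. once a day, or more often if
needed
 ensure that logging is done correctly and that all necessary
details are written to the log records
 monitor the frequency at which checkpoint records are written
   writing checkpoint records incurs system overhead, so the
  frequency must be controlled and tuned to balance this overhead
  against the time saved during recovery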
Appendix B DBS Maintaining

The main DBS administration and maintenance tasks are shown in Fig.17.0.19
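[Figure: the main DBS administration and maintenance tasks performed
by the DBA on the DBMS and its databases]
 Routine maintenance: backup and recovery, security maintenance,
space management, concurrency control
 Monitoring and analysis: collecting statistics, analyzing
operations, using benchmark programs
 System evolution: enhancing applications, schema modification,
DBMS version upgrades
 Performance optimization/tuning: tuning indexes, tuning queries,
tuning transactions, tuning schemas
Fig.17.0.19 Main DBS administration and maintenance tasks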