Transaction Manager
Concurrency Control
Recovery Management
Transactions
A transaction program is a unit of program execution that
accesses and possibly updates various data items.
[ A transaction program is a collection of operations that form a
single unit of work.]
Clearly, it is essential that all these operations occur, or
that, in case of failure, none occur. A database system must
ensure proper execution of transactions despite failures – either
the entire transaction executes, or none of it does. Furthermore, it
must manage concurrent execution of transactions in a way that
avoids the introduction of inconsistency.
Transaction Program
A transaction is the execution of a program, coded in a high-level data
manipulation language, that updates the contents of a database. The DBMS must
ensure that a transaction always transforms the database from one
consistent state (before the update) to another (after the update), although
we accept that consistency may be violated while the transaction is in
progress.
Database recovery is the process of restoring the database to a correct
state following a failure. The failure may be the result of system crash due to
hardware or software errors, a media failure, such as a head crash, or an
application software error, such as logical error in the program that is
accessing the database. It may also be the result of unintentional or intentional
corruption or destruction of data or facilities by operators or users. Whatever
the underlying cause of the failure, the DBMS must be able to recover from
the failure and restore the database to a consistent state.
Architecture of a TPS Application
[Figure: architecture of a TPS application. Notice of an event is keyed in as a
transaction; the TPS program processes it against TPS data and produces a
response to the user and report(s).]
The event is recorded by keying it into the computer system as a transaction,
which is a representation of the event. One or more TPS programs process
the transaction against TPS data. The TPS program generates two types of
output. It sends messages back to the user terminal, and it generates printed
documents.
Transaction State
A transaction may not always complete its execution successfully.
Such a transaction is termed aborted. If we are to ensure the
atomicity property, an aborted transaction must have no effect on
the state of the database. Thus, any changes that the aborted
transaction made to the database must be undone. Once the
changes caused by an aborted transaction have been undone, we
say that the transaction has been rolled back.
[Figure: transaction state diagram with the states active, partially committed,
committed, failed, and aborted.]
Transactions access data using two operations:
• read(X), which transfers the data item X from the database to a local
buffer belonging to the transaction that executed the read operation.
• write(X), which transfers the data item X from the local buffer of the
transaction that executed the write back to the database.
In a real database system, the write operation does not
necessarily result in the immediate update of the data on the
disk; the write operation may be temporarily stored in memory
and executed on the disk later. For now, however, it is assumed
that the write operation updates the database immediately.
Transaction Concepts
Usually, a transaction is initiated by a user program written
in high-level DML or programming language, where it is delimited
by statements (or function calls) of the form begin transaction and
end transaction. The transaction consists of all operations executed
between the begin transaction and end transaction.
To ensure integrity of the data, we require that the database
system maintain ACID properties of the transactions:
ACID properties of Transaction ensured by DBMS
Atomicity. Either all operations of the transaction are reflected
properly in the database, or none are.
Consistency. A transaction satisfies integrity constraints after
completion. It must preserve the consistency of the database.
Isolation. Even though multiple transactions may execute
concurrently, the system guarantees that, for every pair of
transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started, or Tj started execution after Ti finished.
Thus each transaction is unaware of other transactions executing
concurrently in the system.
Durability. After a transaction completes successfully, the changes
it has made to the database persist, even if there are system failures.
Atomicity: Because of a failure (power failure, hardware
failure, or software error), the state of the system may no
longer reflect a real state of the world that the database is
supposed to capture. We term such a state an inconsistent
state. We must ensure that such inconsistencies are not
visible in a database system. [The system must at some
point be in a temporarily inconsistent state; however, that state is
eventually replaced by a consistent one.]
The basic idea behind ensuring atomicity is this: The
database system keeps track (on disk) of the old values of any data
on which a transaction performs a write, and, if the transaction does
not complete its execution, the database system restores the old
values to make it appear as though the transaction never executed.
Ensuring atomicity is the responsibility of the database system
itself; specifically, it is handled by a component called the
transaction-management component.
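The idea of keeping old values can be illustrated with a minimal sketch. The class UndoLogDB and its method names are hypothetical, and the log is kept in memory here for brevity, whereas the text above has the database system keep it on disk.

```python
class UndoLogDB:
    """Toy key-value store that records old values so writes can be undone.

    Hypothetical illustration only: a real DBMS keeps this log on disk.
    """

    def __init__(self, data):
        self.data = dict(data)
        self.undo_log = []  # (item, old_value) pairs, oldest first

    def write(self, item, value):
        # Record the old value before overwriting it.
        self.undo_log.append((item, self.data[item]))
        self.data[item] = value

    def commit(self):
        # The transaction completed, so the old values are no longer needed.
        self.undo_log.clear()

    def rollback(self):
        # Restore old values in reverse order of the writes.
        while self.undo_log:
            item, old_value = self.undo_log.pop()
            self.data[item] = old_value


db = UndoLogDB({"A": 100, "B": 50})
db.write("A", 70)
db.write("B", 80)
db.rollback()   # the transaction did not complete
print(db.data)  # {'A': 100, 'B': 50}
```

After the rollback the store looks as though the transaction never executed.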
Consistency: Ensuring consistency for an individual transaction is
the responsibility of the application programmer. This task may be
facilitated by automatic testing of integrity constraints.
Isolation: Even if the consistency and atomicity properties are
ensured for each transaction, if several transactions are executed
concurrently, their operations may interleave in some undesirable
way, resulting in an inconsistent state.
A way to avoid the problem of concurrently executing
transactions is to execute transaction serially – that is, one after
the other. However, concurrent execution of transactions provides
significant performance benefits.
The isolation property of a transaction ensures that
the concurrent execution of transactions results in a system
state that is equivalent to a state that could have been obtained
had these transactions executed one at a time in some order.
Ensuring the isolation property is the responsibility of a
component of the database system called the concurrency-control component.
Durability: We assume that a failure of the computer system may
result in loss of data in the main memory, but data written to
disk are never lost. The DBMS can guarantee durability by
ensuring that either:
1. The updates carried out by the transaction have been
written to disk before the transaction completes.
2. Information about the updates carried out by the
transaction and written to disk is sufficient to enable the
database to reconstruct the updates when the database system
is restarted after the failure.
Ensuring durability is the responsibility of a component of
the database system called the recovery-management
component.
[Figure: DBMS transaction subsystem, showing the transaction manager, scheduler,
buffer manager, recovery manager, access manager, file manager, system manager,
and the database and system catalog.]
A transaction manager is software that monitors the behavior
of transactions and decides whether each action can be allowed
to execute. The transaction manager coordinates transactions
on behalf of application programs. It communicates with the
scheduler (sometimes referred to as the lock manager), the
module responsible for implementing a particular strategy
for concurrency control. If a failure occurs during the
transaction, then the database could be inconsistent. It is the
task of the recovery manager to ensure that the database is restored
to a consistent state. Finally, the buffer manager is responsible for
the transfer of data between disk storage and main memory.
Transaction Atomicity in a Single-Transaction System
In a single-transaction system, only one transaction executes at any
time. If a transaction is active, no other transaction can start. This situation is
the same as having one application connected to the database server at a time.
To support atomicity, a database server must support operations to
open a transaction, commit a transaction, and roll back a transaction, so that
one or more SQL commands can be grouped together. If any command fails, the
transaction manager can roll back all commands, returning the data source to
its original state. If all commands are successful, the transaction manager
commits the changes and makes them permanent.
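The open, commit, and rollback operations can be tried with Python's built-in sqlite3 module. This is only a sketch: the table and column names follow the Customer examples in these slides, while the transfer function and the CHECK constraint that makes the second command fail are assumptions added for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Group two UPDATE commands into one transaction (hypothetical helper)."""
    try:
        conn.execute("BEGIN")  # open the transaction
        conn.execute(
            "UPDATE customer SET balance = balance + ? WHERE accountID = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE customer SET balance = balance - ? WHERE accountID = ?",
            (amount, src),
        )
        conn.execute("COMMIT")  # both commands succeeded: make them permanent
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # a command failed: undo all of them
        return False


# isolation_level=None means sqlite3 does not open transactions implicitly,
# so the explicit BEGIN above delimits the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE customer ("
    "accountID INTEGER PRIMARY KEY, "
    "balance REAL CHECK (balance >= 0))"  # assumed rule: no negative balance
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(101, 15.0), (102, 15.0)])

first = transfer(conn, 101, 102, 10.0)   # succeeds and commits
second = transfer(conn, 101, 102, 10.0)  # debit would go negative: rolled back
rows = conn.execute(
    "SELECT accountID, balance FROM customer ORDER BY accountID"
).fetchall()
print(first, second, rows)
```

In the second call the credit to account 102 succeeds before the debit fails, so the rollback is what returns account 102 to its earlier balance.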
Concurrent Transaction Processing
Concurrency arises when many applications are executing transactions
at the same time. A single database server processes all operations, so only
one database operation can be processed at a time. However, the operations
of the transactions overlap because independent applications are requesting
service by the database server in parallel.
A schedule is a sequence of the operations by a set of concurrent
transactions that preserves the order of the operations in each of the
individual transactions.
Clearly, a schedule for a set of transactions must consist of
all instructions of those transactions, and must preserve the
chronological order in which instructions appear in each
individual transaction.
A schedule can be serial or nonserial.
Each serial schedule consists of a sequence of instructions from
various transactions, where the operations of each transaction are
executed consecutively without any interleaved operations from
other transactions. For a set of n transactions, there exist n!
different valid serial schedules.
When the database system executes several transactions
concurrently, the corresponding schedule no longer needs to be
serial. The OS performs context switches (CPU time is shared)
among all transactions that access the database concurrently.
Several execution sequences are possible, since the
various instructions from several transactions may now be
interleaved. The number of possible schedules for a set of n
transactions is much larger than n!.
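To get a feeling for the numbers, the two counts can be computed directly. The sketch below is an illustration rather than part of the original slides: it uses the standard multinomial formula for the number of interleavings that keep each transaction's own order, and the function names are hypothetical.

```python
from math import factorial


def serial_schedules(n):
    """Number of serial schedules for n transactions: n!."""
    return factorial(n)


def interleaved_schedules(lengths):
    """Number of schedules that preserve each transaction's internal order.

    lengths[i] is the number of operations in transaction i. The count is
    the multinomial coefficient (m1 + ... + mn)! / (m1! * ... * mn!).
    """
    total = factorial(sum(lengths))
    for m in lengths:
        total //= factorial(m)
    return total


print(serial_schedules(2))               # 2
print(interleaved_schedules([6, 6]))     # 924
print(interleaved_schedules([6, 6, 6]))  # 17153136
```

Two transactions of six steps each therefore have 2 serial schedules but 924 schedules in total.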
Schedule : A sequence of the operations by a set of concurrent transactions that preserves
the order of the operations in each of the individual transactions.
Serial schedule : A schedule where the operations of each transaction are executed
consecutively without any interleaved operations from other transactions. (T1 before T2)
The two transactions:
T1:                     T2:
Read(BALx);             Read(BALx);
BALx = BALx + 100;      BALx = BALx * 1.1;
Write(BALx);            Write(BALx);
Read(BALy);             Read(BALy);
BALy = BALy - 100;      BALy = BALy * 1.1;
Write(BALy);            Write(BALy);
The schedule:
T1                      T2
Read(BALx);
BALx = BALx + 100;
Write(BALx);
Read(BALy);
BALy = BALy - 100;
Write(BALy);
                        Read(BALx);
                        BALx = BALx * 1.1;
                        Write(BALx);
                        Read(BALy);
                        BALy = BALy * 1.1;
                        Write(BALy);
Serial schedule : A schedule where the operations of each transaction are executed
consecutively without any interleaved operations from other transactions. (T2 before T1)
T1 and T2 are the same transactions as in the previous example.
T1                      T2
                        Read(BALx);
                        BALx = BALx * 1.1;
                        Write(BALx);
                        Read(BALy);
                        BALy = BALy * 1.1;
                        Write(BALy);
Read(BALx);
BALx = BALx + 100;
Write(BALx);
Read(BALy);
BALy = BALy - 100;
Write(BALy);
Nonserial schedule : A schedule where the operations from a set of concurrent transactions
are interleaved.
T1 and T2 are the same transactions as in the previous examples.
T1                      T2
Read(BALx);
BALx = BALx + 100;
                        Read(BALx);
                        BALx = BALx * 1.1;
                        Write(BALx);
                        Read(BALy);
                        BALy = BALy * 1.1;
Write(BALx);
Read(BALy);
BALy = BALy - 100;
Write(BALy);
                        Write(BALy);
If several transactions run concurrently, and control of concurrent
execution is left entirely to the OS, database consistency can be
destroyed despite the correctness of each individual transaction.
We can ensure consistency of the database under concurrent
execution by making sure that any schedule that is executed has the
same effect as a schedule that could have occurred without any
concurrent execution. That is, the schedule should, in some sense,
be equivalent to a serial schedule.
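A small simulation makes the point concrete. It is a sketch under stated assumptions: the run function, the starting balances of 100, and the integer form of the 10% increase (multiply by 11, divide by 10) are choices made for illustration, and the nonserial schedule is one possible interleaving of T1 and T2.

```python
# Each transaction is a list of steps on its own local copies of the items.
T1 = [("read", "x"), ("calc", "x", lambda v: v + 100), ("write", "x"),
      ("read", "y"), ("calc", "y", lambda v: v - 100), ("write", "y")]
T2 = [("read", "x"), ("calc", "x", lambda v: v * 11 // 10), ("write", "x"),
      ("read", "y"), ("calc", "y", lambda v: v * 11 // 10), ("write", "y")]
PROGRAMS = {"T1": T1, "T2": T2}


def run(schedule, start):
    """Execute a schedule given as a sequence of transaction names.

    Each occurrence of a name runs the next step of that transaction.
    """
    db = dict(start)
    next_step = {name: 0 for name in PROGRAMS}
    local = {name: {} for name in PROGRAMS}  # private buffers
    for name in schedule:
        step = PROGRAMS[name][next_step[name]]
        next_step[name] += 1
        kind, item = step[0], step[1]
        if kind == "read":
            local[name][item] = db[item]
        elif kind == "calc":
            local[name][item] = step[2](local[name][item])
        else:  # write
            db[item] = local[name][item]
    return db


start = {"x": 100, "y": 100}
serial_12 = ["T1"] * 6 + ["T2"] * 6
serial_21 = ["T2"] * 6 + ["T1"] * 6
nonserial = ["T1", "T1", "T2", "T2", "T2", "T2", "T2",
             "T1", "T1", "T1", "T1", "T2"]

print(run(serial_12, start))  # {'x': 220, 'y': 0}
print(run(serial_21, start))  # {'x': 210, 'y': 10}
print(run(nonserial, start))  # {'x': 200, 'y': 110}
```

The interleaved run ends in a state that neither serial order can produce, so this particular schedule is not equivalent to any serial schedule.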
Potential problems caused by concurrency
1. Lost update problem : An apparently successfully completed update operation
by one user can be overridden by another user.
Time  balance  T3                                T4
1     15       balance1 = (select balance from
               Customer where accountID = 101);
               balance1 += 5.00;
2     15                                         balance2 = (select balance from
                                                 Customer where accountID = 101);
                                                 balance2 += 10.00;
3     20       update Customer set balance =
               ?balance1 where accountID = 101;
4     25                                         update Customer set balance =
                                                 ?balance2 where accountID = 101;
5     25       Commit
6     25                                         Commit
2. The uncommitted dependency problem : This problem occurs when one transaction
is allowed to see the intermediate result of another transaction before it has committed.
Time  balance  T3                                T4
1     15       balance1 = (select balance from
               Customer where accountID = 101);
               balance1 += 5.00;
2     20       update Customer set balance =
               ?balance1 where accountID = 101;
3     20                                         balance2 = (select balance from
                                                 Customer where accountID = 101);
                                                 balance2 += 10.00;
4     15       Rollback
5     30                                         update Customer set balance =
                                                 ?balance2 where accountID = 101;
6     30                                         Commit
3. Incorrect summary problem : An inconsistent retrieval problem which occurs when a
transaction reads several values, but another transaction updates some of the values while
the first transaction is still executing.
Time  bal 101  bal 102  T3                                T4
1     15       15       balance1 = (select balance from
                        Customer where accountID = 101);
                        balance1 += 10.00;
2     25       15       update Customer set balance =
                        ?balance1 where accountID = 101;
3     25       15                                         Total = select sum(balance)
                                                          from customer where accountID
                                                          = 101 or accountID = 102
4     25       15       balance1 = (select balance from
                        Customer where accountID = 102);
                        balance1 -= 10.00;
5     25       5        update Customer set balance =
                        ?balance1 where accountID = 102;
6     25       5        Commit
7     25       5                                          Commit
T4 computes a total of 40, although the two balances sum to 30 both before and after T3.
A phantom read problem : It occurs when an aggregate operation is repeated by
a transaction and yields a different result because of the insertion of a row by another
transaction.
Time  sum(balance)  T1                                T2
1     100           totalA = (select sum(balance) from
                    Customer where zipcode = 31101);
2     110                                             insert into customer (accountID,
                                                      balance, zipcode) values
                                                      (105, 10.00, 31101)
3     110           totalB = (select sum(balance) from
                    Customer where zipcode = 31101);
4     100                                             rollback
5     100           Commit
A nonrepeatable read problem : It occurs when a transaction reads the same value more
than one time. In between reading the data item, another transaction modifies the data item.
Time  balance  T1                                T2
1     15       balance1 = (select balance from
               Customer where accountID = 101);
2     0.0                                        update customer set balance = 0.0
                                                 where accountID = 101;
3     0.0      balance2 = (select balance from
               Customer where accountID = 101);
Recoverability :
If a transaction fails, the atomicity property requires that we undo the effects
of the transaction. In addition, the durability property states that once a
transaction commits, its changes cannot be undone.
Recoverable schedule :
A schedule where, for each pair of transactions
Ti and Tj, if Tj reads a data item previously written by Ti, then the commit
operation of Ti precedes the commit operation of Tj.
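The definition can be turned into a simple check. This is a hedged sketch: the function is_recoverable and the (transaction, action, item) tuple format are hypothetical, and the check ignores the case where a writer rolls back before anyone reads its value.

```python
def is_recoverable(schedule):
    """Check the recoverability condition on a list of (txn, action, item).

    action is one of "read", "write", "commit", or "rollback".
    """
    last_writer = {}    # item -> transaction that most recently wrote it
    reads_from = {}     # Tj -> set of transactions Ti whose writes Tj read
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            # Every Ti that txn read from must already have committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True


bad = [("T3", "read", "balance"), ("T3", "write", "balance"),
       ("T4", "read", "balance"), ("T4", "write", "balance"),
       ("T4", "commit", None), ("T3", "rollback", None)]
good = [("T3", "read", "balance"), ("T3", "write", "balance"),
        ("T4", "read", "balance"), ("T4", "write", "balance"),
        ("T3", "commit", None), ("T4", "commit", None)]

print(is_recoverable(bad))   # False
print(is_recoverable(good))  # True
```

In the first schedule T4 commits a value derived from T3's write before T3 has committed, which is exactly what the definition forbids.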
Non-recoverable schedule
Time  balance  T3                                T4
1     15       balance1 = (select balance from
               Customer where accountID = 101);
               balance1 += 5.00;
2     20       update Customer set balance =
               ?balance1 where accountID = 101;
3     20                                         balance2 = (select balance from
                                                 Customer where accountID = 101);
                                                 balance2 += 10.00;
4     30                                         update Customer set balance =
                                                 ?balance2 where accountID = 101;
5     30                                         Commit
6              Rollback
T4 reads the balance written by T3 and commits before T3 rolls back, so the
schedule is not recoverable.
Locking : A procedure used to control concurrent access to data. When one
transaction is accessing the database, a lock may deny access to other
transactions to prevent incorrect results.
Locking methods are the most widely used approach to ensure
serializability of concurrent transactions. There are several variations, but all
share the same fundamental characteristic, namely that a transaction must claim
a read (shared) or write (exclusive) lock on a data item before the corresponding
database read or write operation.
The lock prevents another transaction from modifying the item or
even reading it, in the case of write lock.
Data items of various sizes, ranging from the entire database down
to a field, may be locked. The size of the item determines the fineness, or
granularity, of the lock.
Read lock : If a transaction has a read lock on a data item, it can read the item
but not update it.
Write lock : If a transaction has a write lock on a data item, it can both read and
update the item.
• Any transaction that needs to access a data item must first lock the item,
requesting a read lock for read-only access or a write lock for both read and
write access.
• If the item is not already locked by another transaction, the lock will be
granted.
• If the item is currently locked, the DBMS determines whether the request is
compatible with the existing lock. If a read lock is requested on an item that
already has a read lock on it, the request will be granted; otherwise, the
transaction must wait until the existing lock is released.
• A transaction continues to hold a lock until it explicitly releases it either during
execution or when it terminates (aborts or commits). It is only when the write
lock has been released that the effects of the write operation will be made
visible to other transactions.
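The rules above can be sketched as a tiny lock table. The LockManager class and its method names are hypothetical, and the sketch leaves out waiting queues and lock upgrades: a request simply reports whether it is granted or whether the transaction must wait.

```python
class LockManager:
    """Minimal lock table implementing the read/write compatibility rules."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transactions holding it)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        # Only read + read is compatible; anything involving a write is not.
        if held_mode == "read" and mode == "read":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        """Release txn's lock on item; drop the entry when nobody holds it."""
        _mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]


lm = LockManager()
r1 = lm.request("T1", "x", "read")         # granted
r2 = lm.request("T2", "x", "read")         # granted: read locks are shared
r3 = lm.request("T3", "x", "write")        # must wait
w1 = lm.request("T3", "balance", "write")  # granted
w2 = lm.request("T4", "balance", "write")  # must wait
lm.release("T3", "balance")
w3 = lm.request("T4", "balance", "write")  # granted after the release
print(r1, r2, r3, w1, w2, w3)              # True True False True False True
```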
Locks can solve the lost update problem : (An apparently successfully completed
update operation by one user can be overridden by another user.)
Time  balance  T3                                T4
1     15       Write_lock (balance)
               balance1 = (select balance from
               Customer where accountID = 101);
               balance1 += 5.00;
2     15                                         Write_lock (balance)
3     20       update Customer set balance =     Wait
               ?balance1 where accountID = 101;
4     20       Commit / Unlock (balance)         Wait
5     20                                         balance2 = (select balance from
                                                 Customer where accountID = 101);
                                                 balance2 += 10.00;
6     30                                         update Customer set balance =
                                                 ?balance2 where accountID = 101;
7     30                                         Commit / Unlock (balance)
Locks can solve the uncommitted dependency problem : This problem occurs when one
transaction is allowed to see the intermediate result of another transaction before it has
committed.
Time  balance  T3                                T4
1     15       Write_lock (balance)
               balance1 = (select balance from
               Customer where accountID = 101);
               balance1 += 5.00;
2     20       update Customer set balance =
               ?balance1 where accountID = 101;
3     20                                         Write_lock (balance)
4     20                                         Wait
5     15       Rollback / Unlock (balance)       Wait
6     15                                         balance2 = (select balance from
                                                 Customer where accountID = 101);
                                                 balance2 += 10.00;
7     25                                         update Customer set balance =
                                                 ?balance2 where accountID = 101;
8     25                                         Commit / Unlock (balance)
Locks can solve the incorrect summary problem :
Time  bal 101  bal 102  T3                                T4
1     15       15       Write_lock (balance)
                        balance1 = (select balance from
                        Customer where accountID = 101);
                        balance1 += 10.00;
2     25       15       update Customer set balance =
                        ?balance1 where accountID = 101;
3     25       15                                         Write_Lock (balance)
4     25       15       balance1 = (select balance from   Wait
                        Customer where accountID = 102);
                        balance1 -= 10.00;
5     25       5        update Customer set balance =     Wait
                        ?balance1 where accountID = 102;
6     25       5        Commit / Unlock (balance)
7     25       5                                          Total = select sum(balance)
                                                          from customer where accountID
                                                          = 101 or accountID = 102
8     25       5                                          Commit / Unlock (balance)
If locks are released too early, the database may become inconsistent.
T1                      T2
Write_Lock (balx);
Read (balx);
balx = balx + 100;
Write (balx);
Unlock (balx);
                        Write_Lock (balx);
                        Read (balx);
                        balx = balx * 1.1;
                        Write (balx);
                        Unlock (balx);
                        Write_Lock (baly);
                        Read (baly);
                        baly = baly * 1.1;
                        Write (baly);
                        Unlock (baly);
                        Commit
Write_Lock (baly);
Read (baly);
baly = baly - 100;
Write (baly);
Unlock (baly);
Commit
Cascading rollback : A situation in which the failure of a single transaction leads to a series of rollbacks.
Cascading rollbacks are undesirable, since they potentially lead to the undoing of a significant amount of
work. Clearly, it would be useful if we could design protocols that prevent cascading rollbacks. One way
to achieve this with two-phase locking is to leave the release of all locks until the end of the transaction.
T1                      T2                      T3
Write_Lock (balx);
Read (balx);
Read_Lock (baly);
Read (baly);
balx = baly + balx;
Write (balx);
Unlock (balx);
.                       Write_Lock (balx);
.                       Read (balx);
.                       balx = balx + 100;
.                       Write (balx);
                        Unlock (balx);
                        .                       Read_Lock (balx);
Rollback                .                       .
                        Rollback                .
                                                Rollback
T2 reads the value of balx written by T1, and T3 reads the value written by T2, so when T1
rolls back, T2 and T3 must be rolled back as well.
Two-phase locking (2PL) :
A transaction follows the two-phase locking protocol if all locking operations
precede the first unlock operation in the transaction.
According to the rules of this protocol, every transaction can be divided into
two phases; first a growing phase, in which it acquires all the locks needed but cannot
release any locks, and then a shrinking phase, in which it releases its locks but cannot
acquire any new locks.
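Whether a transaction obeys the protocol can be checked mechanically from its sequence of lock operations. The function follows_2pl and the (action, item) tuple format below are hypothetical names used only for this sketch.

```python
def follows_2pl(operations):
    """Return True if no lock is acquired after the first unlock.

    operations is a sequence of (action, item) pairs; only the actions
    "read_lock", "write_lock", and "unlock" matter for the check.
    """
    shrinking = False  # becomes True at the first unlock
    for action, _item in operations:
        if action == "unlock":
            shrinking = True
        elif action in ("read_lock", "write_lock") and shrinking:
            return False  # tried to grow again after shrinking began
    return True


# Releases balx before locking baly: not two-phase.
early_release = [("write_lock", "balx"), ("write", "balx"), ("unlock", "balx"),
                 ("write_lock", "baly"), ("write", "baly"), ("unlock", "baly")]
# Acquires every lock before releasing any: two-phase.
two_phase = [("write_lock", "balx"), ("write", "balx"),
             ("write_lock", "baly"), ("write", "baly"),
             ("unlock", "balx"), ("unlock", "baly")]

print(follows_2pl(early_release))  # False
print(follows_2pl(two_phase))      # True
```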
Two-phase locking protocol may cause deadlock.
Deadlock : An impasse that may result when two or more transactions are each
waiting for locks held by the other to be released. Neither transaction can continue
because each is waiting for a lock it cannot obtain until the other completes.
Once deadlock occurs, the applications involved cannot resolve the problem.
Instead, the DBMS has to recognize that deadlock exists and break the deadlock
in some way.
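One common way for a DBMS to recognize deadlock, not described in these slides and shown here only as an assumption, is to keep a wait-for graph and look for a cycle in it. The function find_deadlock and the dictionary format are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None.

    waits_for maps each transaction to the set of transactions whose
    locks it is waiting for.
    """
    visited = set()

    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]  # found a cycle
        if txn in visited:
            return None                    # already explored, no cycle here
        visited.add(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other, path + [txn])
            if cycle:
                return cycle
        return None

    for txn in waits_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None


print(find_deadlock({"T3": {"T4"}, "T4": {"T3"}}))  # ['T3', 'T4']
print(find_deadlock({"T3": {"T4"}, "T4": set()}))   # None
```

Once a cycle is found, the DBMS can break the deadlock, for example by rolling back one of the transactions on the cycle.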
Deadlock example : Each transaction holds a write lock on the balance of one account
and then waits for the lock held by the other transaction.
Time  First transaction                           Second transaction
1     Write_lock (balance);
      balance1 = (select balance from customer
      where accountID = 101); balance1 += 10.00;
2                                                 Write_lock (balance);
                                                  balance1 = (select balance from customer
                                                  where accountID = 102); balance1 -= 10.00;
3     update Customer set balance =
      ?balance1 where accountID = 101;
4                                                 update Customer set balance =
                                                  ?balance1 where accountID = 102;
5     Write_lock (balance);
      balance2 = (select balance from customer
      where accountID = 102);
6     Wait                                        Write_lock (balance);
                                                  balance2 = (select balance from customer
                                                  where accountID = 101);
7                                                 Wait
8     Wait
In addition to these rules, some systems permit a transaction to issue
a read lock on an item and then later to upgrade the lock to a write lock.
This effectively allows a transaction to examine the data first and then decide
whether it wishes to update it. If upgrading is not supported, a transaction
must hold write locks on all data items that it may update at some time during
the execution of the transaction, thereby potentially reducing the level of
concurrency in the system.
For the same reason, some systems also permit a transaction to
issue a write lock and then later to downgrade the lock to a read lock.
Granularity of Data Items
Granularity : The size of data items chosen as the unit of protection by a
concurrency control protocol.
A data item is chosen to be one of the following, ranging from
coarse to fine, where fine granularity refers to small item sizes and coarse
granularity refers to large item sizes:
· The entire database.
· A file.
· A page (sometimes called an area or database space: a section of
physical disk in which relations are stored).
· A record.
· A field value of a record.
The granularity of the data item that can be locked in a single operation
has a significant effect on the overall performance of the concurrency control
algorithm. If the entire database is locked, for example, the granularity prevents
any other transaction from executing until the lock is released. Thus, the coarser
the data item size, the lower the degree of concurrency permitted. On the other
hand, the finer the item size, the more locking information needs to be stored.
The best item size depends upon the nature of the transactions.
The solutions to this problem will involve providing a locking mechanism in
the database server. Any restrictions on the concurrency of transactions
will have a negative effect on the number of transactions that can be
executing at any time. This balancing act is a typical trade-off. The more
restrictive the concurrency strategy is, the more reliable it is, and the slower
it is. DBMS designers, database administrators, and application developers
must all carefully consider how much concurrency can be achieved without
sacrificing either speed or reliability.
Timestamp-Based Protocol
The use of locks, combined with the two-phase locking
protocol, guarantees serializability of schedules. The order of
transactions in the equivalent serial schedule is based on the order in
which the transactions lock the items they require. If a transaction
needs an item that is already locked, it may be forced to wait until the
item is released. A different approach that also guarantees
serializability uses transaction timestamps to order transaction
execution for an equivalent serial schedule.
Timestamp methods for concurrency control are different from
locking methods. No locks are involved, and therefore there can be no
deadlock. Locking methods generally prevent conflicts by making
transactions wait. With timestamp methods, there is no waiting;
transactions involved in conflict are simply rolled back and restarted.
Timestamp
A unique identifier created by the DBMS that indicates the
relative starting time of a transaction. A timestamp can be generated by
using the value of the system clock as the timestamp; that is, a
transaction’s timestamp is equal to the value of the clock when the
transaction enters the system.
The timestamps of the transactions determine the serializability
order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the
produced schedule is equivalent to a serial schedule in which transaction
Ti appears before transaction Tj.
With each transaction Ti in the system, we associate a unique
fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the
database system before the transaction Ti starts execution. If a transaction
Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters
the system, then TS(Ti) < TS(Tj). There are two simple methods for
implementing this scheme: use the value of the system clock as the
timestamp, or use a logical counter that is incremented after each new
timestamp is assigned.
Besides timestamps for transactions, there are timestamps for
data items. Each data item contains a read-timestamp, giving the
timestamp of the last transaction to read the item, and a write-timestamp,
giving the timestamp of the last transaction to write (update) the item.
• W-timestamp(Q) denotes the largest timestamp of any transaction
that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction
that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q)
instruction is executed.
Timestamping : A concurrency control protocol in which the
fundamental goal is to order transactions in such a way that older
transactions (transactions with smaller timestamps) get priority in the
event of conflict.
For a transaction Ti with timestamp TS(Ti), the timestamp
ordering protocol works as follows:
The Timestamp-ordering Protocol
1. Suppose that transaction Ti issues read(Q).
(a) If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of
Q that was already overwritten. Hence, the read operation is rejected.
(b) If TS(Ti) ≥ W-timestamp(Q), then the read operation is
executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and
TS(Ti).
2. Suppose that transaction Ti issues write(Q).
(a) If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is
producing was needed previously, and the system assumed that that value
would never be produced. Hence, the system rejects the write operation.
(b) If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an
obsolete value of Q. Hence, the system rejects this write operation.
(c) Otherwise, the system executes the write operation and sets
W-timestamp(Q) to TS(Ti).
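The two rules translate almost line for line into code. In this minimal sketch the names Item, Rollback, read, and write are hypothetical, timestamps are assumed to be positive integers, and both item timestamps start at 0. A rejected operation raises Rollback, standing in for rolling the transaction back and restarting it.

```python
class Rollback(Exception):
    """Raised when an operation is rejected by the protocol."""


class Item:
    """A data item Q with its value and its two timestamps."""

    def __init__(self, value):
        self.value = value
        self.r_ts = 0  # R-timestamp(Q)
        self.w_ts = 0  # W-timestamp(Q)


def read(ts, q):
    """Transaction with timestamp ts issues read(Q)."""
    if ts < q.w_ts:  # rule 1(a): the value was already overwritten
        raise Rollback(f"read by TS={ts} rejected, W-timestamp={q.w_ts}")
    q.r_ts = max(q.r_ts, ts)  # rule 1(b)
    return q.value


def write(ts, q, value):
    """Transaction with timestamp ts issues write(Q)."""
    if ts < q.r_ts:  # rule 2(a): a younger transaction already read Q
        raise Rollback(f"write by TS={ts} rejected, R-timestamp={q.r_ts}")
    if ts < q.w_ts:  # rule 2(b): the value would be obsolete
        raise Rollback(f"write by TS={ts} rejected, W-timestamp={q.w_ts}")
    q.value = value  # rule 2(c)
    q.w_ts = ts


q = Item(15)
seen = read(2, q)        # allowed: R-timestamp(Q) becomes 2
try:
    write(1, q, 20)      # rejected: 1 < R-timestamp(Q)
except Rollback as err:
    print("rolled back:", err)
write(3, q, 30)          # allowed: W-timestamp(Q) becomes 3
print(seen, q.value, q.r_ts, q.w_ts)  # 15 30 2 3
```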
Transaction Failure and Recovery management
Failure Classification
• Transaction failure. There are 2 types of errors that may
cause a transaction to fail:
Logical error: The transaction can no longer continue with its normal
execution because of some internal condition, such as bad input, data not found,
overflow, or resource limit exceeded.
System error: The system has entered an undesirable state.
• System crash. There is a hardware malfunction, or a bug in the DBMS or OS,
that causes the loss of the content of volatile storage and brings transaction
processing to a halt. The content of nonvolatile storage remains intact.
• Disk failure. A disk block loses its content as a result of either a head crash or
failure during a data transfer operation.
Storage Types
Volatile storage. Information residing in volatile storage does not
usually survive system crashes. Examples of such storage are main
memory and cache memory. Access to volatile storage is extremely
fast, both because of the speed of the memory access itself, and
because it is possible to access any data item in volatile storage
directly.
Nonvolatile storage. Information residing in nonvolatile storage
survives system crashes. Examples of such storage are disks and
magnetic tapes. Both are subject to failure (for example, head crash),
which may result in loss of information.
Stable storage. Information residing in stable storage is never lost
(strictly speaking, "never" cannot be guaranteed). Although stable storage is
theoretically impossible to obtain, it can be closely approximated by
techniques that make data loss extremely unlikely.
The execution of an SQL statement begins with an implicit request to
open a transaction, followed by the processing of the statement,
followed automatically by a commit request. Rollback happens only
when the SQL statement fails.
An application must make explicit calls to the database
transaction manager to enter explicit-commit mode and allow multiple
SQL statements to execute as a single transaction.
An application executes an open transaction statement (begin
transaction) to ask the transaction manager to create a new transaction
before the next SQL statement executes.
The application executes a commit transaction statement to ask
the transaction manager to commit the transaction.
The application executes a rollback statement to ask the
transaction manager to cancel the transaction.
Storage Hierarchy
The database system resides permanently on nonvolatile storage
(usually disks), and is partitioned into fixed-length storage units called blocks.
Blocks are the units of data transfer from disk to main memory (or memory to
disk), and may contain several data items.
Transactions input information from the disk to main memory, and
then output the information back onto the disk. The input and output operations
are done in block units. The blocks residing on the disk are referred to as
physical blocks; the blocks residing temporarily in main memory are referred
to as buffer blocks. The area of memory where blocks reside temporarily is
called the disk buffer.
Block movements between disk and main memory are initiated
through the following two operations:
Input (X) : transfers the physical block which contains the data item X from
disk to main memory
Output(X) : transfers the buffer block which contains the data item X from main
memory to disk and replaces the appropriate physical block there.
[Figure: Input(A) transfers block A from disk to main memory; Output(B) transfers block B from main memory to disk.]
Each transaction Ti has a private work area in which copies of
all the data items accessed and updated by Ti are kept. The system
creates this work area when the transaction is initiated; the system
removes it when the transaction either commits or aborts. Each data
item X kept in the work area of transaction Ti is denoted by xi.
Transaction Ti interacts with the database system by transferring data to
and from its work area to the system buffer. Data is transferred by these
2 operations:
1. read(X) assigns the value of data item X to the local variable xi. It
executes this operation as follows:
a. If block Bx on which X resides is not in main memory, it issues
input(Bx).
b. It assigns the value of X from the buffer block to xi .
2. write(X) assigns the value of local variable xi to data item X in the
buffer block. It executes this operation as follows:
a. If block Bx on which X resides is not in main memory, it issues
input(Bx).
b. It assigns the value of xi to X in buffer Bx.
Both operations may require the transfer of a block from disk to
main memory. However, they do not require the transfer of a block from
main memory to disk.
The output (Bx) operation for the buffer block Bx on which X
resides does not need to take effect immediately after write (X) is
executed, since the block Bx may contain other data items that are still
being accessed.
A buffer block is eventually written out to the disk either
because the buffer manager needs the memory space for other purposes
or because the database system wishes to reflect the change to B on the
disk. (The DBMS is said to force-output buffer block B when it issues
output(B).)
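The input, output, read, and write operations described above can be modeled with a small sketch. It is a toy model under stated assumptions: blocks are Python dictionaries, and the class names Storage and Transaction and the block name "B1" are hypothetical, chosen only for illustration.

```python
class Storage:
    """Disk blocks plus the disk buffer in main memory (toy model)."""

    def __init__(self, disk, block_of):
        self.disk = disk          # block id -> {item: value}; physical blocks
        self.block_of = block_of  # item -> id of the block that holds it
        self.buffer = {}          # block id -> {item: value}; buffer blocks

    def input(self, block):
        # Copy the physical block from disk into the buffer.
        self.buffer[block] = dict(self.disk[block])

    def output(self, block):
        # Copy the buffer block to disk, replacing the physical block.
        self.disk[block] = dict(self.buffer[block])


class Transaction:
    """A transaction with a private work area of local copies xi."""

    def __init__(self, storage):
        self.storage = storage
        self.work = {}  # item -> local copy

    def _buffer_block(self, item):
        block = self.storage.block_of[item]
        if block not in self.storage.buffer:  # step a: fetch block if absent
            self.storage.input(block)
        return self.storage.buffer[block]

    def read(self, item):
        # Step b of read(X): copy X from the buffer block into the work area.
        self.work[item] = self._buffer_block(item)[item]
        return self.work[item]

    def write(self, item):
        # Step b of write(X): copy the local value into the buffer block.
        # Nothing is sent to disk here; that happens only on output().
        self._buffer_block(item)[item] = self.work[item]
```

In this sketch a write changes only the buffer block. The disk copy stays at its old value until output is called for that block, which is the behavior the text describes.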
Algorithms proposed to ensure database consistency and
transaction atomicity despite failures are known as recovery
algorithms, which have 2 parts:
1. Actions taken during normal transaction processing to ensure
that enough information exists to allow recovery from
failures.
2. Actions taken after a failure to recover the database contents
to a state that ensures database consistency, transaction
atomicity, and durability.
Log-Based Recovery
The most widely used structure for recording database
modifications is the log. The log is a sequence of log records,
recording all the update activities in the database. There are
several types of log records. An update log record describes a
single database write. It has these fields:
• Transaction identifier
• Data-item identifier
• Old value
• New value
Other special log records exist to record significant events during
transaction processing.
Whenever a transaction performs a write, it is essential that the log
record for that write be created before the database is modified. (The
transaction has its own memory that acts like a cache for the modified
data items.)
Once a log record exists, we can output the modification to the
database if that is desirable. Also, we have the ability to undo a
modification that has already been output to the database. We undo it
by using the old-value field in log records.
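The update log record and the undo step can be sketched as follows. This is a minimal illustration, assuming an in-memory list as the log and a dictionary as the database; the names UpdateRecord, logged_write, and undo are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    txn: str   # transaction identifier
    item: str  # data-item identifier
    old: int   # old value (before the write)
    new: int   # new value (after the write)


def logged_write(log, db, txn, item, new_value):
    """Create the log record first, then modify the database."""
    log.append(UpdateRecord(txn, item, db[item], new_value))
    db[item] = new_value


def undo(log, db, txn):
    """Restore the old values of txn's writes, scanning the log backward."""
    for rec in reversed(log):
        if isinstance(rec, UpdateRecord) and rec.txn == txn:
            db[rec.item] = rec.old
```

Scanning backward matters when a transaction writes the same item more than once: the last value restored is then the one the item held before the transaction's first write.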
Deferred Database Modification
This technique ensures transaction atomicity by recording all
database modifications in the log, but deferring the execution of
all write operations of a transaction until the transaction partially
commits.
When a transaction partially commits, the information on the log
associated with the transaction is used in executing the deferred
writes. If the system crashes before the transaction completes its
execution, or if the transaction aborts, then the information on the log
is simply ignored.
The execution of transaction Ti proceeds as follows.
Before Ti starts its execution, a record <Ti start> is written to the
log. A write(X) operation by Ti results in the writing of a new record
to the log. Finally, when Ti partially commits, a record <Ti commit> is
written to the log.
Example transactions:
T0: Read(A);
    A = A - 50;
    Write(A);
    Read(B);
    B = B + 50;
    Write(B);
T1: Read(C);
    C = C - 100;
    Write(C);
Assume that the current values are A = 1000, B = 2000, and C = 700.
The log then contains:
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
<T1 commit>
The log records only the new values.
When transaction Ti partially commits, the records associated with it in
the log are used in executing the deferred writes. Since a failure may
occur while this updating is taking place, we must ensure that, before
the start of these updates, all the log records are written out to stable
storage. Once they have been written, the actual updating takes place,
and the transaction enters the committed state.
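Deferred modification can be sketched as below. This is a toy model under stated assumptions: the log is a Python list of tuples, the database is a dictionary, and forcing the log to stable storage is not modeled. The class name DeferredTxn is hypothetical.

```python
class DeferredTxn:
    """Toy transaction under deferred modification (log holds new values only)."""

    def __init__(self, name, db, log):
        self.name, self.db, self.log = name, db, log
        self.local = {}                      # private copies of written items
        log.append((name, "start"))          # <Ti start>

    def read(self, item):
        # A transaction must see its own pending writes.
        return self.local.get(item, self.db[item])

    def write(self, item, value):
        self.local[item] = value
        self.log.append((self.name, item, value))  # database is not touched

    def commit(self):
        self.log.append((self.name, "commit"))     # partial commit point
        # Only now are the deferred writes executed, driven by the log.
        for rec in self.log:
            if len(rec) == 3 and rec[0] == self.name:
                self.db[rec[1]] = rec[2]


db = {"A": 1000, "B": 2000, "C": 700}
log = []
t0 = DeferredTxn("T0", db, log)
t0.write("A", t0.read("A") - 50)
t0.write("B", t0.read("B") + 50)
t0.commit()
t1 = DeferredTxn("T1", db, log)
t1.write("C", t1.read("C") - 100)
t1.commit()
```

After this run the list log holds the same seven records as the example log for T0 and T1, and db holds A = 950, B = 2050, C = 600.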
The log records and the resulting database contents for T0 and T1 are:
Records in log        Data in database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
                      A = 950
                      B = 2050
<T1 start>
<T1, C, 600>
<T1 commit>
                      C = 600
Records in log        Data in database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
(system failure)
                      A = 1000
                      B = 2000
<T1 start>
<T1, C, 600>
(system failure)
                      C = 700
The DBMS takes no action after recovering from the failure, because the
database has not been touched.
Records in log        Data in database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
(system failure)
                      A = 950
                      B = 2050
<T1 start>
<T1, C, 600>
<T1 commit>
(system failure)
                      C = 600
The DBMS must perform a redo operation after recovering from the failure.
Records in log        Data in database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
(system failure)
                      A = 1000
                      B = 2000
<T1 start>
<T1, C, 600>
<T1 commit>
                      C = 600
(system failure)
The DBMS takes no action for T0, because A and B are untouched, but it
must redo T1 after recovering from the failure.
Using the log, the system can handle any failure that results in the loss
of information on volatile storage. The recovery scheme uses the
following recovery procedure:
Redo(Ti) sets the value of all data items updated by transaction Ti to
the new values.
The redo operation must be idempotent; that is, executing it several
times must be equivalent to executing it once. This characteristic is
required if we are to guarantee correct behavior even if a failure occurs
during the recovery process.
After a failure, the recovery subsystem consults the log to determine
which transactions need to be redone. Transaction Ti needs to be
redone if and only if the log contains both the record <Ti start>
and the record <Ti commit>.
Thus, if the system crashes after the transaction completes its
execution, the recovery scheme uses the information in the log to
restore the system to the consistent state that existed after the
transaction had completed.
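The redo rule for deferred modification can be sketched as follows. This is an illustration, not a definitive implementation: it assumes log records are tuples of the form (txn, "start"), (txn, item, new_value), and (txn, "commit"), and the function name recover_deferred is hypothetical.

```python
def recover_deferred(log, db):
    """Redo every transaction that has both <start> and <commit> in the log."""
    started = {r[0] for r in log if r[1:] == ("start",)}
    committed = {r[0] for r in log if r[1:] == ("commit",)}
    redo_set = started & committed
    for rec in log:  # forward scan, in log order
        if len(rec) == 3 and rec[0] in redo_set:
            _txn, item, new_value = rec
            db[item] = new_value  # plain assignment, so repeating it is harmless
    return redo_set


# T0 committed before the crash; T1 did not.
log = [
    ("T0", "start"), ("T0", "A", 950), ("T0", "B", 2050), ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 600),
]
db = {"A": 1000, "B": 2000, "C": 700}
recover_deferred(log, db)
recover_deferred(log, db)  # a second run (crash during recovery) changes nothing
```

Because redo only assigns new values, running recovery twice gives the same database as running it once, which is the idempotence property required above. Records of uncommitted transactions are simply ignored.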
Immediate Database Modification
This technique allows database modifications to be output to the
database while the transaction is still in the active state. Data
modifications written by active transactions are called uncommitted
modifications.
In the event of a crash or a transaction failure, the system must use the
old-value field of the log records to restore the modified data items to
the value they had prior to the start of the transaction. The undo
operation accomplishes this restoration.
Before a transaction Ti starts its execution, the system writes the record
<Ti start> to the log. During its execution, any write(X) operation by
Ti is preceded by the writing of the appropriate new update record to
the log. When Ti partially commits, the system writes the record <Ti
commit> to the log.
For the same transactions T0 and T1, the log records and the database contents are:
Records in log              Data in database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                            A = 950
                            B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                            C = 600
<T1 commit>
Records in log              Data in database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                            A = 950
                            B = 2050
(system failure)
<T1 start>
<T1, C, 700, 600>
                            C = 600
(system failure)
The DBMS must undo T0 and T1, using the old values, after recovering
from the failure.
Records in log              Data in database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                            A = 950
                            B = 2050
<T0 commit>
(system failure)
<T1 start>
<T1, C, 700, 600>
                            C = 600
<T1 commit>
(system failure)
The DBMS must redo T0 and T1, using the new values, after recovering
from the failure.
Records in log              Data in database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                            A = 950
                            B = 2050
(system failure)
<T1 start>
<T1, C, 700, 600>
                            C = 600
<T1 commit>
(system failure)
The DBMS must undo T0 and redo T1 after recovering from the failure.
After a failure, the recovery subsystem consults the log to determine
which transactions need to be undone or redone. Transaction Ti
needs to be undone if the log contains the record <Ti start> but not
the record <Ti commit>, and needs to be redone if the log contains
both <Ti start> and <Ti commit>.
Thus, if the system crashes after the transaction completes its
execution, the recovery scheme uses the information in the log to
restore the system to the consistent state that existed after the
transaction had completed.
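The undo/redo rule for immediate modification can be sketched as follows. It assumes log records are tuples of the form (txn, "start"), (txn, item, old_value, new_value), and (txn, "commit"). Performing all undo work before any redo work is an assumption of this sketch, not something these notes specify. The function name recover_immediate is hypothetical.

```python
def recover_immediate(log, db):
    """Undo transactions without <commit>; redo those with <start> and <commit>."""
    started = {r[0] for r in log if r[1:] == ("start",)}
    committed = {r[0] for r in log if r[1:] == ("commit",)}
    undo_set = started - committed
    redo_set = started & committed
    # Undo: scan backward so that the earliest old value is restored last.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in undo_set:
            db[rec[1]] = rec[2]
    # Redo: scan forward so that the latest new value is applied last.
    for rec in log:
        if len(rec) == 4 and rec[0] in redo_set:
            db[rec[1]] = rec[3]
    return undo_set, redo_set


# T0 committed; T1 wrote C to the database but crashed before committing.
log = [
    ("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
    ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 700, 600),
]
db = {"A": 950, "B": 2050, "C": 600}   # state found on disk after the crash
undo_set, redo_set = recover_immediate(log, db)
```

Here T1 is undone, so C returns to 700, and T0 is redone, so A and B keep their new values.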
Since the information in the log is used in reconstructing the state of the
database, we require that, before execution of an output(B)
operation, the log records corresponding to B be written onto stable
storage.
Rollback segment (RBS)
Rollback segment (RBS): An Oraclex database has a data area that
contains a rollback segment (RBS) entry for each open transaction. An RBS
entry is a set of images of rows that have been modified by the transaction.
The images represent the values of the rows before the execution of the
transaction. Each update operation executed by a transaction is applied to
a row of a database table only after the previous value of the row is added to
the RBS entry.
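[Figure: Oraclex database server. Transaction T executes T.A write r, T.B write s, T.C read s, and T.D read u. The rollback segment holds the before images of rows r and s; the database tables hold the updated values of r and s together with rows t and u.]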
The open transaction operation creates a new RBS entry and associates it
with the transaction. The execution of a transaction commit operation
deletes the RBS entry and makes the changes permanent. The execution
of a rollback operation restores all of the modified rows from the RBS
entry.
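The rollback-segment behavior described above can be modeled with a short sketch. It is a toy model, not Oracle's implementation: tables are a dictionary keyed by row name, and the class name RollbackSegmentDB and its methods are hypothetical.

```python
class RollbackSegmentDB:
    """Toy model: rows are updated in place; before images are kept per transaction."""

    def __init__(self, tables):
        self.tables = tables  # row key -> current row value
        self.rbs = {}         # transaction id -> {row key: before image}

    def open_transaction(self, txn):
        self.rbs[txn] = {}    # new RBS entry for this transaction

    def write(self, txn, key, value):
        # Save the before image first (only once per row), then update the row.
        self.rbs[txn].setdefault(key, self.tables[key])
        self.tables[key] = value

    def read(self, key):
        return self.tables[key]

    def commit(self, txn):
        del self.rbs[txn]     # drop the RBS entry; the changes are permanent

    def rollback(self, txn):
        # Restore every modified row from its before image.
        for key, before in self.rbs.pop(txn).items():
            self.tables[key] = before
```

Keeping only the first before image per row means a rollback returns each row to the value it had before the transaction started, even if the transaction updated that row several times.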
In other DBMSs, the transaction has its own memory that acts like a cache for
the modified rows. During the execution, the database tables are not changed.
Instead, the new row images are written into the memory of the transaction. All
accesses to rows in database tables go first to the transaction cache. If a row is not
found, the full database tables are used. The commit operation flushes the cache by
writing the new row values to the database tables and deleting the cache. The
rollback operation deletes the cache, leaving the database unchanged.
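[Figure: Cached-updates database server. Transaction T executes T.A write r, T.B write s, T.C read s, and T.D read u. The update segment (the transaction's cache) holds the updated values of rows r and s; the database tables hold the before images of r and s together with rows t and u.]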
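The cached-updates approach can be sketched in the same style. Again this is a toy model under stated assumptions: tables are a dictionary keyed by row name, and the class name CachedTxn is hypothetical.

```python
class CachedTxn:
    """Toy model: new row images go to a private cache until commit."""

    def __init__(self, tables):
        self.tables = tables  # shared database tables: row key -> value
        self.cache = {}       # this transaction's updated row images

    def write(self, key, value):
        self.cache[key] = value          # the tables are not changed

    def read(self, key):
        if key in self.cache:            # look in the transaction cache first
            return self.cache[key]
        return self.tables[key]          # otherwise fall back to the tables

    def commit(self):
        self.tables.update(self.cache)   # flush new values to the tables
        self.cache.clear()

    def rollback(self):
        self.cache.clear()               # the tables were never touched


tables = {"r": 1, "s": 2, "t": 3, "u": 4}
txn = CachedTxn(tables)
txn.write("r", 10)        # T.A write r
txn.write("s", 20)        # T.B write s
s_seen = txn.read("s")    # T.C read s: served from the cache
u_seen = txn.read("u")    # T.D read u: served from the tables
```

Compared with the rollback-segment sketch, the cost moves from rollback to commit: rollback only discards the cache, while commit must copy every cached row into the tables.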
Checkpoints
When a system failure occurs, the DBMS must consult the log to determine
those transactions that need to be redone or those that need to be undone.
In principle, the entire log must be searched to determine this information.
There are two major difficulties with this approach:
1. The search process is time consuming.
2. Most of the transactions that, according to the algorithm, need to be
redone have already written their updates into the database. Although
redoing them will cause no harm, it will nevertheless cause recovery
to take longer.
To reduce the number of transactions to be redone and undone,
the system periodically performs checkpoints, which require the
following sequence of actions to take place:
1. Output onto stable storage all log records currently residing
in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.
Transactions are not allowed to perform any update actions, such as
writing to a buffer block or writing a log record, while a checkpoint is in
progress.
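The three checkpoint actions can be sketched as one function. This is a toy, single-threaded model, so the rule that no transaction updates anything during the checkpoint holds trivially. The function name checkpoint and all parameter names are hypothetical.

```python
def checkpoint(log_buffer, stable_log, dirty_blocks, buffer, disk):
    """Perform the three checkpoint actions in order (toy model)."""
    # 1. Output all log records currently in main memory to stable storage.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # 2. Output all modified buffer blocks to the disk.
    for block in sorted(dirty_blocks):
        disk[block] = dict(buffer[block])
    dirty_blocks.clear()
    # 3. Output a <checkpoint> log record to stable storage.
    stable_log.append(("checkpoint",))


log_buffer = [("T0", "start"), ("T0", "A", 1000, 950)]
stable_log = []
buffer = {"B1": {"A": 950}}
disk = {"B1": {"A": 1000}}
dirty_blocks = {"B1"}
checkpoint(log_buffer, stable_log, dirty_blocks, buffer, disk)
```

The order matters: the log records reach stable storage before the blocks they describe are written to disk, and the <checkpoint> record is written only after both steps are complete.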
The presence of a <checkpoint> record in the log allows the system
to streamline its recovery procedure. Consider a transaction Ti that committed
prior to the checkpoint. For such a transaction, the <Ti commit> record
appears in the log before the <checkpoint> record. Any database
modifications made by Ti must have been written to the database either prior
to the checkpoint or as part of the checkpoint itself. Thus, at the recovery time
there is no need to perform a redo operation on Ti.
This observation allows us to refine our previous recovery schemes.
After a failure has occurred, the recovery scheme examines the log to
determine the most recent transaction Ti that started executing before the most
recent checkpoint took place. It can find such a transaction by searching the
log backward, from the end of the log, until it finds the first <checkpoint>
record (since we are searching backward, the record found is the final
<checkpoint> record in the log); then it continues the search backward until
it finds the next <Ti start> record. This record identifies a transaction Ti .
Once the system has identified transaction Ti, the redo and undo
operations need to be applied to only transaction Ti and all transactions Tj that
started execution after transaction Ti. Let us denote these transactions by the set T.
The remainder (earlier part) of the log can be ignored, and can be erased
whenever desired. The exact recovery operations to be performed depend on the
modification technique being used. For the immediate-modification technique,
the recovery operations are:
• For all transactions Tk in T that have no <Tk commit> record in the log,
execute undo(Tk).
• For all transactions Tk in T such that the record <Tk commit> appears in
the log, execute redo(Tk).
Consider the set of transactions {T0, T1, ..., T100} executed in the order
of the subscripts. Suppose that the most recent checkpoint took place
during the execution of transaction T67. Thus, only transactions T67, T68,
..., T100 need to be considered during the recovery scheme. Each of them
needs to be redone if it has committed; otherwise, it needs to be undone.
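The backward search and the resulting undo and redo lists can be sketched as follows. The sketch assumes transactions run one after another, as in the example, and that log records are tuples of the form (txn, "start"), (txn, item, old_value, new_value), (txn, "commit"), and ("checkpoint",). The function name transactions_to_recover is hypothetical.

```python
def transactions_to_recover(log):
    """Return (undo_list, redo_list) based on the most recent <checkpoint>."""
    # Scan backward from the end of the log for the final <checkpoint> record.
    pos = len(log) - 1
    while pos >= 0 and log[pos] != ("checkpoint",):
        pos -= 1
    # Continue backward to the nearest <Ti start> record before it.
    while pos >= 0 and log[pos][1:] != ("start",):
        pos -= 1
    # Only the log from that record onward matters (whole log if none found).
    tail = log[max(pos, 0):]
    started = [r[0] for r in tail if r[1:] == ("start",)]
    committed = {r[0] for r in tail if r[1:] == ("commit",)}
    undo_list = [t for t in started if t not in committed]
    redo_list = [t for t in started if t in committed]
    return undo_list, redo_list


# T0..T100 run in order; the checkpoint falls inside T67; T100 never commits.
log = []
for i in range(101):
    name = f"T{i}"
    log.append((name, "start"))
    log.append((name, "X", i, i + 1))
    if i == 67:
        log.append(("checkpoint",))
    if i < 100:
        log.append((name, "commit"))

undo_list, redo_list = transactions_to_recover(log)
```

For this log, redo_list holds T67 through T99 and undo_list holds only T100. Transactions T0 through T66 are never examined, which is the saving the checkpoint provides.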