Chapter 15:
Transaction Management
Chapter 16: Transaction Management










Transaction Processing Overview
Transaction Concept
Transaction Definition in SQL
Transaction State
ACID Properties
Concurrent Executions
Serializability
Recoverability
Implementation of Isolation
Testing for Serializability.
Transaction Processing - Basics



A transaction is a logical unit of database processing
Transaction processing systems include large
databases and hundreds of concurrent users
Examples of these systems are:





airline reservations,
banking,
credit card processing,
supermarket checkout, and
similar systems
Multi-User Database Systems

One way to classify DBMSs is according to the number of
concurrent users:







single user
multi-user
The majority of database systems are of the multi-user type
Concurrent (or simultaneous from the user point of view)
database usage is possible thanks to computer
multiprogramming
Multiprogramming operating systems execute some
commands of one process, then suspend this process and
execute some commands of another process
After a while, the execution of the first process is resumed at
the point where it was interrupted
This type of process execution is called interleaving
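Interleaved and Parallel Processes
[Figure: timelines for process 1 and process 2. Interleaved: the two processes alternate over the time points t1 to t5, so process 1 runs for (t2 - t1) + (t4 - t3) and process 2 runs for (t3 - t2) + (t5 - t4). Parallel: process 1 and process 2 run at the same time.]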
A Question for You


Are multiprogramming and interleaving more
efficient than monoprogramming with serial
program execution?
Answers:
a) Yes, but only from a user’s point of view
b) No, because multiprogramming means interrupting and resuming programs, which introduces OS overheads
c) Yes, from the point of view of both users and computers, because an OS interrupts execution of a program when it issues an I/O operation, thus saving long idle times, and resumes its execution when the I/O has finished
Transaction Concept

A transaction is a single logical unit of database processing that includes one or more database access operations (read and write).
- It may involve one or more operations on the database
- It could be as simple as a single SQL command
- Or as complex as the set of accesses performed by an application program

A transaction is a unit of program execution that accesses and possibly updates various data items.

E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Here our transaction consists of four database accesses (reads and writes) and two non-database activities (decreasing the value of A and increasing the value of B).
A transaction should always transform the database from one consistent state to
another.
If a transaction finishes successfully, all data it has changed are visible to other
transactions
If a transaction fails for any reason, DBMS has to undo all the changes that the
transaction made against the database
Transactions (continued)





In multi-user transaction processing systems, users execute database transactions concurrently
Most often, concurrent means interleaved
The users can attempt to modify the same database items at the same time, which is a potential source of database inconsistency
Checking database integrity constraints is not enough to protect a database from threats induced by its concurrent usage
Two main issues to deal with:
- Failures of various kinds, such as hardware failures and system crashes, which leave a transaction incomplete
- Concurrent execution of multiple transactions
Transaction Outcomes

A transaction can have one of two outcomes:



If it completes successfully, the transaction is said
to have committed and the database reaches a
new consistent state.
If it does not execute successfully, the transaction is aborted. In this case, the database must be restored to the consistent state it was in before the transaction started. This is known as rollback.
Whatever the outcome, the database is in a
consistent state at the end of the transaction
Transaction Support

Once a transaction has committed, it cannot be aborted.
- Thus, if we decide that a committed transaction was a mistake, then we must perform another transaction to reverse it.
- On the other hand, an aborted transaction can be restarted later and, depending on the cause of failure, may successfully execute and commit at that time.
A DBMS has no way of knowing which updates are grouped together to form a single, logical transaction.
- Therefore, the user must be provided with a way to indicate the boundaries of each transaction.
- For example, there may be keywords such as BEGIN_TRANSACTION, COMMIT, and ROLLBACK to delimit a transaction.
- If such delimiters are not used, the whole program is usually treated as a single transaction, with the DBMS automatically performing a COMMIT upon successful termination, or a ROLLBACK if not.
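The idea of delimiting a transaction explicitly can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name `account`, the helper names `transfer` and `balances`, and the "no negative balance" rule are hypothetical and not part of the slides. SQLite spells the start keyword BEGIN rather than BEGIN_TRANSACTION.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction."""
    cur = conn.cursor()
    cur.execute("BEGIN")  # explicit start of the transaction
    try:
        cur.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = cur.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:  # hypothetical business rule used to force a failure
            raise ValueError("insufficient funds")
        cur.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        cur.execute("COMMIT")  # successful end: changes become permanent
        return True
    except Exception:
        cur.execute("ROLLBACK")  # unsuccessful end: undo every change made so far
        return False


def balances(conn):
    """Return the current balances as a dict, e.g. {"A": 100, "B": 200}."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])

ok = transfer(conn, "A", "B", 50)       # commits
failed = transfer(conn, "A", "B", 500)  # rolls back, balances stay unchanged
print(ok, failed, balances(conn))
```

The second call debits A before the check fails, so the rollback is what keeps the partial debit from becoming visible.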
Transaction Execution with SQL

Transaction support is provided by two statements:
- COMMIT
- ROLLBACK
When a COMMIT statement is reached, all changes are permanently recorded within the database; it indicates the successful end of a transaction
When a ROLLBACK statement is reached, all changes are aborted and the database is rolled back to its previous consistent state; it indicates the unsuccessful end of a transaction.
Transaction State

Active: the initial state; the transaction stays in this state while it is executing.

Partially committed: after the final statement has been executed.

Failed: after the discovery that normal execution can no longer proceed.

Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
- restart the transaction (can be done only if there was no internal logical error)
- kill the transaction

Committed: after successful completion. Committed transactions cannot be undone.
Transaction State (Cont.)
[Figure: transaction state diagram showing the effects of Begin Transaction, Commit, and Abort.]
Transaction Properties (ACID)
To preserve the integrity of data the database system must ensure:

Atomicity. Either all operations of the transaction are executed and
properly reflected in the database or none are.

Consistency.
A transaction transforms the database from one
consistent state to another.

Isolation. Although multiple transactions may execute concurrently,
each transaction must be unaware of other concurrently executing
transactions. Intermediate transaction results must be hidden from other
concurrently executed transactions.


That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
Durability. After a transaction completes successfully, the changes it
has made to the database persist, even if there are system failures.
Example of Fund Transfer

Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement
- If the transaction fails after step 3 and before step 6, money will be “lost”, leading to an inconsistent database state
- The failure could be due to software or hardware
- The system should ensure that updates of a partially executed transaction are not reflected in the database
- Either all the operations of the transaction are completed or, if not, the transaction is aborted
Example of Fund Transfer

Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Durability requirement: once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.
Ensuring durability is the responsibility of the recovery-management component
Example of Fund Transfer (Cont.)

Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement:
The sum of A and B is unchanged by the execution of the transaction
A transaction must see a consistent database. Execution of a transaction must either leave the database in a new stable state or revert it to the old stable state.
During transaction execution the database may be temporarily inconsistent.
When the transaction completes successfully the database must be consistent
- Erroneous transaction logic can lead to inconsistency
In general, consistency requirements include
- Explicitly specified integrity constraints such as primary keys and foreign keys
- Implicit integrity constraints, e.g. the sum of balances of all accounts, minus the sum of loan amounts, must equal the value of cash-in-hand
Example of Fund Transfer (Cont.)

Isolation requirement: if, between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
T1                      T2
1. read(A)
2. A := A - 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Data used during the execution of a transaction cannot be used by a second transaction until the first one has completed
Isolation can be ensured trivially by running transactions serially, that is, one after the other.
However, executing multiple transactions concurrently has significant benefits, as we will see later.
Reason for Incomplete transactions



A transaction can be aborted or terminated unsuccessfully because of some problem during execution.
The system may crash while one or more transactions are in progress.
A transaction may encounter an unexpected situation (read an unexpected data value or be unable to access some disk) and decide to abort.

The DBMS ensures transaction atomicity by undoing the actions of incomplete transactions.
It maintains a record, called the log, of all the writes to the database.
The log also ensures durability.
If the system crashes before the changes made by a completed transaction are written to disk, the log is used to remember and restore the changes when the system restarts.
Log File

Typically, a log file contains records with the
following contents:
[start_transaction, T]   (* T is the transaction id *)
[write_item, T, X, old_value, new_value]
[read_item, T, X]   (* optional *)
[commit, T]
[abort, T]
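How such records could drive recovery can be sketched as follows. This is a minimal illustration, not the recovery algorithm of any particular DBMS: the function name `recover` and the tuple encoding of log records are assumptions, and the sketch assumes no transaction overwrites an item last written by a still-uncommitted transaction.

```python
def recover(db, log):
    """Return the database state after applying the log to the on-disk state `db`.

    Log records are tuples shaped like the entries above, for example
    ("write_item", "T1", "X", old_value, new_value) or ("commit", "T1").
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(db)

    # Redo pass (durability): reapply writes of committed transactions in log order.
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _, item, _, new_value = rec
            state[item] = new_value

    # Undo pass (atomicity): restore old values of uncommitted writes, newest first.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            state[item] = old_value

    return state


# State found on disk after a crash: T1's debit of A reached the disk although
# T1 never committed, while T2's committed write to C did not reach the disk.
disk = {"A": 50, "B": 200, "C": 5}
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 100, 50),
    ("start_transaction", "T2"),
    ("write_item", "T2", "C", 5, 7),
    ("commit", "T2"),
]
recovered = recover(disk, log)
print(recovered)
```

The old_value field is what makes undo possible, and the new_value field is what makes redo possible.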
Sources of Database Inconsistency



Uncontrolled execution of database transactions in a multi-user environment can lead to database inconsistency
There are a number of possible sources of database inconsistency
The typical ones are:



lost update problem,
dirty read problem, and
unrepeatable read problem
Lost Update Problem
time  T1                 T2
1     read_item(X)
2                        read_item(X)
3     X = X - N
4     write_item(X)
5                        X = X + M
6                        write_item(X)
After termination of T2, X = X + M. T1's update to X is lost because T2 wrote over X
Generally, the lost update problem is characterized by:
•T2 reads X,
•T1 writes X, and
•T2 writes X
Lost update problem


Imagine that a customer wants to withdraw £30 from
a bank account.
At the same time, the bank is crediting this month’s
salary.
Time  T1 (withdrawal)         T2 (credit salary)         Balance
t1                            begin_transaction          100
t2    begin_transaction       read(balance)              100
t3    read(balance)           balance=balance+1000       100
t4    balance=balance-30      write(balance)             1100
t5    write(balance)          commit                     70
t6    commit                                             70
Both transactions occur at roughly the same time and read the same initial balance.
The last transaction to commit overwrites the update made by the first.
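The arithmetic of the withdrawal and salary example can be replayed with a tiny simulator. This is a sketch only: the function `run` and the (transaction, action, amount) encoding are hypothetical names introduced for illustration.

```python
def run(schedule, balance):
    """Execute (transaction, action, amount) steps against one shared balance."""
    local = {}  # each transaction's private copy of the balance
    for txn, action, amount in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "add":
            local[txn] += amount
        elif action == "write":
            balance = local[txn]
    return balance


interleaved = [
    ("T2", "read", None), ("T1", "read", None),  # both read 100
    ("T2", "add", 1000), ("T2", "write", None),  # balance becomes 1100
    ("T1", "add", -30), ("T1", "write", None),   # balance becomes 70
]
serial = [
    ("T1", "read", None), ("T1", "add", -30), ("T1", "write", None),
    ("T2", "read", None), ("T2", "add", 1000), ("T2", "write", None),
]

lost = run(interleaved, 100)  # the salary credit is overwritten
correct = run(serial, 100)    # both updates survive
print(lost, correct)
```

The interleaved run ends at 70, while the serial run ends at 1070. The difference is exactly the lost credit of 1000.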
Uncommitted Dependency : Dirty Read
Problem
time  T1                 T2
1     read_item(X)
2     X = X - N
3     write_item(X)
4                        read_item(X)
5                        X = X + M
6                        write_item(X)
7     read_item(Y)
8     T1 fails
Generally, the dirty read problem is characterized by:
•T1 writes X,
•T2 reads X, and
•T1 fails
Since T1 failed, the DBMS is going to undo the changes it made to the database
T2 has already read the value X - N of item X, and that value is going to be changed back to X by the DBMS
The Uncommitted dependency :Dirty Read
problem

The uncommitted dependency problem (dirty read) occurs when one transaction sees the intermediate results of another (aborted) transaction.
Time slots: t1 to t8
T1 (withdrawal), in order: begin_transaction; read(balance); balance = balance - 30; write(balance); rollback
T2 (credit salary), in order: begin_transaction; read(balance); balance = balance + 1000; write(balance); commit
Balance at t1 to t8: 100, 100, 100, 70, 70, 70, 1070, 1070
For some reason, the withdrawal transaction is aborted, but the salary credit transaction has already seen the update.
When T2 commits, the balance is incorrect (it should be 1100).
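The numbers in this example can be checked with a few lines of straight-line Python. This is only a worked calculation: the function name `dirty_read_demo` and its variable names are hypothetical.

```python
def dirty_read_demo(initial=100):
    """Replay the withdrawal/salary example; return (actual, expected) balances."""
    balance = initial

    # T1 (withdrawal) writes a value that is never committed.
    before_image = balance   # what a rollback of T1 should restore
    balance = balance - 30   # T1 write(balance): 70, still uncommitted

    # T2 (credit salary) reads the uncommitted value: the dirty read.
    t2_local = balance

    # T1 rolls back, so logically the balance is `before_image` again,
    # but T2 keeps computing from the value it has already read.
    t2_local += 1000
    balance = t2_local       # T2 write(balance), then commit

    expected = before_image + 1000  # result if T2 had never seen T1's write
    return balance, expected


actual, expected = dirty_read_demo()
print(actual, expected)
```

The gap of 30 between the two values is the aborted withdrawal that leaked into T2's result.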
Unrepeatable Read / Inconsistent Retrieval
problem
time  T1                 T2
1     read_item(X)
2                        read_item(X)
3                        X = X + M
4                        write_item(X)
5     read_item(X)
Generally, the unrepeatable read problem is characterized by:
•T1 reads X,
•T2 writes X, and
•T1 reads X
Transaction T1 has got two different values of X in two subsequent reads, because T2 has changed it in the meantime
Even if T1 didn't execute the second read command, it would use a stale X value, and that's another form of the unrepeatable read problem
Unrepeatable Read / Inconsistent
Retrieval problem
The previous problems involved simultaneous updates to the database. However,
problems can also result if a transaction merely reads the result of an uncommitted
transaction.

Below, one transaction (T1) is transferring £10 from account Balw to Balz, and at the
same time, another transaction (T2) is summing all the accounts (Balw, Balx, Baly
and Balz). Try to figure out what’s gone wrong….
Time  T1 (transfer funds)   T2 (sum accounts)    Balw  Balx  Baly  Balz  Sum
t1                          begin_transaction    100   50    10    25
t2    begin_transaction     sum = 0              100   50    10    25    0
t3    read(Balw)            read(Balw)           100   50    10    25    0
t4    Balw = Balw - 10      sum = sum + Balw     100   50    10    25    100
t5    write(Balw)           read(Balx)           90    50    10    25    100
t6    read(Balz)            sum = sum + Balx     90    50    10    25    150
t7    Balz = Balz + 10      read(Baly)           90    50    10    25    150
t8    write(Balz)           sum = sum + Baly     90    50    10    35    160
t9    commit                read(Balz)           90    50    10    35    160
t10                         sum = sum + Balz     90    50    10    35    195
t11                         commit               90    50    10    35    195
Here, the £10 transferred by T1 has been counted twice by T2, making its result too large by £10 (195 instead of 185).
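The same interleaving can be replayed in code to confirm the totals. This is a sketch: the function name `interleaved_sum` is hypothetical, and the comments only mark which transaction performs each step.

```python
accounts = {"Balw": 100, "Balx": 50, "Baly": 10, "Balz": 25}


def interleaved_sum(db):
    """Replay the interleaving of T1 (transfer 10 from Balw to Balz) and T2 (sum)."""
    db = dict(db)  # work on a copy so the caller's data is untouched
    total = 0

    w = db["Balw"]         # T1 reads Balw
    total += db["Balw"]    # T2 adds Balw before the debit is written
    db["Balw"] = w - 10    # T1 writes Balw
    total += db["Balx"]    # T2 adds Balx
    z = db["Balz"]         # T1 reads Balz
    total += db["Baly"]    # T2 adds Baly
    db["Balz"] = z + 10    # T1 writes Balz and commits
    total += db["Balz"]    # T2 adds Balz after the credit is written

    return total, sum(db.values())


reported, true_total = interleaved_sum(accounts)
print(reported, true_total)
```

T2 sees Balw before the debit and Balz after the credit, which is why its sum exceeds the true total by the transferred amount.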

A Question for You

What is the difference between:


Dirty read and
Unrepeatable read
Answers:
a) There is no difference.
b) Even if there is a difference, I can't recall what it is.
c) The difference is:
- The dirty read is a consequence of reading updates made by a transaction before it has successfully finished (and which may even have failed later).
- The unrepeatable read is a consequence of allowing a transaction to read data that another one is altering.
Prevention of Concurrency Anomalies




Lost update, dirty read and unrepeatable read are
called concurrency anomalies
The concurrency control part of a DBMS has the
task to prevent these problems
The DBMS is responsible for ensuring that either all operations of a transaction are successfully executed and their effect is permanently stored in the database, or the outcome is as if the transaction had never started
The effect of a partially executed transaction has to be undone
Advantages of Concurrent Execution of Transactions

Advantages of concurrent execution are:
Increased throughput and resource utilization
- E.g. one transaction can be using the CPU while another is reading from or writing to the disk (processor and disk spend less time idle).
- This works because I/O activity can be done in parallel with CPU activity.
Reduced average response time (the average time for a transaction to be completed after it has been submitted)
- Short transactions need not wait behind long ones. Concurrent execution reduces unpredictable delays in running transactions.
Implementation of Atomicity and Durability


The recovery-management component of a database system
implements the support for atomicity and durability.
E.g. A very simple but extremely inefficient scheme is the
shadow-copy scheme:

all updates are made on a shadow copy of the database

db_pointer is made to point to the updated shadow copy after


the transaction reaches partial commit and
all updated pages have been flushed to disk.
Implementation of Atomicity and Durability
(Cont.)

db_pointer always points to the current consistent copy of the database.
- If the transaction fails, the old consistent copy pointed to by db_pointer can be used, and the new copy can be deleted.
The shadow-database scheme:
- Assumes that only one transaction is active at a time.
- Assumes disks do not fail
- Useful for text editors, but extremely inefficient for large databases (why?)
- Does not handle concurrent transactions
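A file-based analogue of the shadow-copy idea is sketched below. It is not taken from the slides: here the "database" is assumed to be a single JSON file, the function name `shadow_update` is hypothetical, and the file name itself plays the role of db_pointer, switched with `os.replace` (an atomic rename on POSIX file systems).

```python
import json
import os
import tempfile


def shadow_update(db_path, update):
    """Apply `update` to a JSON "database" by writing and installing a shadow copy."""
    with open(db_path) as f:
        data = json.load(f)
    update(data)  # may raise; the current copy is untouched if it does

    directory = os.path.dirname(os.path.abspath(db_path))
    fd, shadow_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())          # shadow copy fully flushed to disk
        os.replace(shadow_path, db_path)  # the "db_pointer" switch
    except BaseException:
        os.unlink(shadow_path)            # failed: delete the new copy
        raise


def transfer(data):
    data["A"] -= 50
    data["B"] += 50


def broken(data):
    data["A"] -= 50
    raise RuntimeError("failure before partial commit")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "db.json")
    with open(path, "w") as f:
        json.dump({"A": 100, "B": 200}, f)

    shadow_update(path, transfer)    # succeeds, new copy installed
    try:
        shadow_update(path, broken)  # fails, old copy stays current
    except RuntimeError:
        pass
    with open(path) as f:
        result = json.load(f)

print(result)
```

Every update rewrites the whole file, which hints at why the scheme scales poorly to large databases.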
Database Architecture
[Figure: low-level DBMS components: Transaction Manager, Scheduler, Buffer Manager, Recovery Manager.]
Database Architecture

The components of a DBMS that manage
transactions are as follows:


The transaction manager coordinates transactions on
behalf of application programs.
It communicates with the scheduler, which implements a
particular strategy for concurrency control



The scheduler tries to maximize concurrency without allowing
transactions to interfere with one another.
If failure occurs during a transaction, the recovery
manager ensures that the database is restored to the state
it was in before the start of the transaction.
The buffer manager is responsible for the transfer of data
between disk storage and main memory
Transaction Management






Both concurrency and recovery control are required to protect the
database from data inconsistencies and data loss.
Many DBMSs allow users to undertake simultaneous operations
on the database.
If these operations are not controlled, they can interfere with each
other, and the database may become inconsistent.
To overcome this, the DBMS implements concurrency control schemes, i.e. mechanisms to achieve isolation
- That is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database
Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash, media failure, software error, or accidental or malicious destruction of data.
Whatever the cause of failure, the DBMS must be able to restore the database to a consistent state
Concurrency Control

Concurrency control is the process of managing simultaneous operations on the database without having them interfere with one another.
Concurrency control is needed because many users are able to access the database simultaneously.
Note that managing concurrent access is easy if all users are only reading data.
- There is no way that such users can interfere with one another.
However, when two or more users are accessing the database simultaneously, and at least one of them is updating data, there may be interference that can cause inconsistencies.
Concurrency Control
How to prevent harmful interference between transactions?
[Figure: transactions T1, T2, …, Tn accessing a shared DB (consistency constraints).]
To prevent such problems a DBMS must implement concurrency control techniques based on
- locks
- timestamps and validation
Schedules

Schedule: a sequence of instructions that specifies the chronological order (possibly interleaved) in which instructions of concurrent transactions are executed
- Suppose there are n transactions T1, T2, …, Tn
- A schedule S of these n transactions is an ordering of their operations such that, for each Ti participating in S, the operations of Ti in S appear in the same order as in Ti itself
A schedule for a set of transactions must consist of all instructions of those transactions
It must preserve the order in which the instructions appear in each individual transaction.
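The two requirements on a schedule can be checked mechanically by projecting the schedule onto each transaction. This is a minimal sketch: the function name `is_schedule` and the string encoding of operations are assumptions made for illustration.

```python
def is_schedule(schedule, transactions):
    """Check that `schedule` is a valid schedule of `transactions`.

    schedule:     list of (transaction_name, operation) pairs in execution order
    transactions: dict mapping each transaction name to its list of operations
    """
    projected = {name: [] for name in transactions}
    for txn, op in schedule:
        if txn not in projected:
            return False  # operation of an unknown transaction
        projected[txn].append(op)
    # Equal projections mean every instruction is present, in its original order.
    return projected == transactions


transactions = {
    "T1": ["read(A)", "write(A)"],
    "T2": ["read(A)", "read(B)"],
}
valid = [("T1", "read(A)"), ("T2", "read(A)"), ("T1", "write(A)"), ("T2", "read(B)")]
reordered = [("T1", "write(A)"), ("T1", "read(A)"), ("T2", "read(A)"), ("T2", "read(B)")]
incomplete = [("T1", "read(A)"), ("T1", "write(A)"), ("T2", "read(A)")]

print(is_schedule(valid, transactions))
print(is_schedule(reordered, transactions))
print(is_schedule(incomplete, transactions))
```

Only the first list is a schedule: the second breaks T1's internal order and the third omits one of T2's instructions.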
Complete Schedules



A transaction that successfully completes its execution will have a commit instruction as its last statement
- by default, a transaction is assumed to execute a commit instruction as its last step
A transaction that fails to complete its execution successfully will have an abort instruction as its last statement
A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule.
Serial Schedules

A schedule S is said to be a serial schedule if, for each Ti in S, all operations of Ti are executed consecutively in S

That is, the actions of different transactions are not interleaved: transactions are executed from start to end, one by one

Serial schedules are considered to be correct, i.e. they do not exhibit concurrency anomalies such as:
- lost update
- unrepeatable read
- dirty read

But serial schedules mean no interleaving and hence are considered inefficient
Serial Schedules : Schedule 1


Let T1 transfer $50 from A to B, and T2 transfer
10% of the balance from A to B.
A serial schedule in which T1 is followed by T2 :
Schedule 2
• A serial schedule where T2 is followed by T1
Schedule 3

Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule; it is a concurrent schedule. But it is equivalent to Schedule 1.
In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedule 4

The following concurrent schedule does not preserve the
value of (A + B ).
Examples of Legal Schedules
In each example the interleaving orders T before S (T < S).

1. avoid lost update problem
T: transfer $100 from A to C: R(A) W(A) R(C) W(C)
S: transfer $100 from B to C: R(B) W(B) R(C) W(C)

2. avoid inconsistent retrievals problem
T: transfer $100 from A to C: R(A) W(A) R(C) W(C)
S: compute total balance for A and C: R(A) R(C)

3. avoid non-repeatable reads
T: transfer $100 from A to C: R(A) W(A) R(C) W(C)
S: check balance and withdraw $100 from A: R(A) R(A) W(A)
Defining the Legal Schedules

1. To be serializable, the conflicting operations of T and S must
be ordered as if either T or S had executed first.



2. Suppose T and S conflict over some shared item(s) x.
3. In a serial schedule, T’s operations on x would appear before
S’s, or vice versa....for every shared item x.


We only care about the conflicting operations: everything else
will take care of itself.
As it turns out, this is true for all the operations, but again, we
only care about the conflicting ones.
4. A legal (conflict-serializable) interleaved schedule of T and S
must exhibit the same property.

Either T or S “wins” in the race to x; serializability dictates that
the “winner take all”.
Serializable Schedules





Basic Assumption – Each transaction preserves
database consistency.
Thus serial execution of a set of transactions
preserves database consistency.
Schedules that allow some interleaving and are still correct are called serializable schedules
A schedule is serializable, and therefore considered correct, if it is equivalent to a serial schedule
Different forms of schedule equivalence give rise to:
1. conflict serializability
2. view serializability
Conflict Serializable Schedules

Two operations conflict if
- they belong to different transactions
- at least one of them is a write
- they access the same item X
A serializable schedule is conflict equivalent to a serial schedule if
- the order of any two conflicting operations is the same in both schedules
Simplified view of transactions



We ignore operations other than read and write instructions
We assume that transactions may perform arbitrary computations
on data in local buffers in between reads and writes.
Our simplified schedules consist of only read and write
instructions.
Conflicting Instructions


Instructions li and lj of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed
by both li and lj, and at least one of these instructions
wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.

If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they had
been interchanged in the schedule.
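The four cases reduce to a one-line predicate. This is a sketch: the `Op` tuple and the function names `conflicts` and `conflict_pairs` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str   # transaction that issues the instruction
    kind: str  # "read" or "write"
    item: str  # data item accessed


def conflicts(a, b):
    """True if the two instructions conflict in the sense defined above."""
    return a.txn != b.txn and a.item == b.item and "write" in (a.kind, b.kind)


def conflict_pairs(schedule):
    """All ordered pairs (earlier, later) of conflicting instructions."""
    return [
        (a, b)
        for i, a in enumerate(schedule)
        for b in schedule[i + 1:]
        if conflicts(a, b)
    ]


r1 = Op("T1", "read", "Q")
w1 = Op("T1", "write", "Q")
r2 = Op("T2", "read", "Q")
w2 = Op("T2", "write", "Q")

print(conflicts(r1, r2))  # read/read: no conflict
print(conflicts(r1, w2))  # read/write: conflict
print(conflicts(w1, r2))  # write/read: conflict
print(conflicts(w1, w2))  # write/write: conflict
print(len(conflict_pairs([r1, r2, w1, w2])))
```

Two schedules over the same instructions are conflict equivalent exactly when `conflict_pairs` orders every conflicting pair the same way in both.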
Conflict Non Equivalent Schedules
Serial schedule S1
time  T1                 T2
1     read_item(X)
2     X = X - N
3     write_item(X)
4                        read_item(X)
5                        X = X + M
6                        write_item(X)

Schedule S2
time  T1                 T2
1     read_item(X)
2                        read_item(X)
3     X = X - N
4                        X = X + M
5     write_item(X)
6                        write_item(X)

Schedules S1 and S2 are NOT conflict equivalent, since:
•in S1, write1_item(X) precedes read2_item(X)
•in S2, read2_item(X) precedes write1_item(X)
S2 is not a conflict serializable schedule, because one can't find a serial schedule that is conflict equivalent to S2
Conflict Serializability


If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Conflict Serializability (Cont.)

Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.

Therefore Schedule 3 is conflict serializable.
Schedule 3
Schedule 6
Conflict Serializability (Cont.)

Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to
obtain either the serial schedule < T3, T4 >, or the serial
schedule < T4, T3 >.
View Serializability

Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met, for each data item Q:
1. If in schedule S transaction Ti reads the initial value of Q, then in schedule S´ transaction Ti must also read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S´ transaction Ti must also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads and writes.
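The three conditions can be tested by comparing, for both schedules, which write each read observes and which transaction writes each item last. This is a sketch: the function names `view_profile` and `view_equivalent` and the (transaction, kind, item) encoding are assumptions, and the sample schedule over T3, T4 and T6 is an assumed illustration, not a schedule reproduced from the slides.

```python
def view_profile(schedule):
    """Summarise a schedule of (txn, kind, item) triples for view equivalence."""
    last_writer = {}  # item -> (txn, n) for the most recent write, n-th by txn
    writes_seen = {}  # (txn, item) -> number of writes so far
    reads_seen = {}   # (txn, item) -> number of reads so far
    reads_from = {}   # (txn, item, n) -> source write, or None for the initial value
    for txn, kind, item in schedule:
        key = (txn, item)
        if kind == "read":
            n = reads_seen[key] = reads_seen.get(key, 0) + 1
            reads_from[(txn, item, n)] = last_writer.get(item)
        elif kind == "write":
            n = writes_seen[key] = writes_seen.get(key, 0) + 1
            last_writer[item] = (txn, n)
    final_writer = {item: writer[0] for item, writer in last_writer.items()}
    return reads_from, final_writer


def view_equivalent(s1, s2):
    """Conditions 1 and 2 are compared via reads_from, condition 3 via final_writer."""
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)


interleaved = [
    ("T3", "read", "Q"), ("T4", "write", "Q"),
    ("T3", "write", "Q"), ("T6", "write", "Q"),
]
serial_t3_t4_t6 = [
    ("T3", "read", "Q"), ("T3", "write", "Q"),
    ("T4", "write", "Q"), ("T6", "write", "Q"),
]
serial_t4_t3_t6 = [
    ("T4", "write", "Q"), ("T3", "read", "Q"),
    ("T3", "write", "Q"), ("T6", "write", "Q"),
]

print(view_equivalent(interleaved, serial_t3_t4_t6))
print(view_equivalent(interleaved, serial_t4_t3_t6))
```

In the second comparison T3 would read the value written by T4 instead of the initial value, so condition 1 fails.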
View Serializability (Cont.)




A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but the reverse is not true.
Below is a schedule which is view serializable but not conflict serializable. (It is view equivalent to the serial schedule < T3, T4, T6 >.)
Every view serializable schedule that is not conflict serializable has blind writes. (A blind write is a write(Q) performed by a transaction that has not previously performed read(Q).)
Other Notions of Serializability

The schedule below produces the same outcome as the serial schedule < T1, T5 >, yet is not conflict equivalent or view equivalent to it.

Determining such equivalence requires analysis of operations
other than read and write.
The Graph Test for Serializability

To determine if a schedule is serializable, make a directed
graph:


Add a node for each committed transaction.
Add an arc from T to S if any equivalent serial schedule must
order T before S.




T must commit before S iff the schedule orders some operation of T
before some operation of S.
The schedule only defines such an order for conflicting operations...
...so this means that a pair of accesses from T and S conflict over
some item x, and the schedule says T “wins” the race to x.
The schedule is conflict-serializable if the graph has no cycles.

(winner take all)
Testing for Serializability



Consider some schedule of a set of transactions T1, T2, ..., Tn
Precedence graph: a directed graph G = (V, E), where the set of vertices V consists of all the transactions in the schedule and the set of edges E consists of all edges Ti → Tj for which one of three conditions holds:
- Ti executes write(X) before Tj executes read(X).
- Ti executes read(X) before Tj executes write(X).
- Ti executes write(X) before Tj executes write(X).
If the precedence graph for schedule S has a cycle, then schedule S is not conflict serializable. If the graph contains no cycles, then schedule S is conflict serializable.
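The test can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The function names `precedence_graph` and `serial_order` and the (transaction, kind, item) encoding are assumptions made for illustration. The two sample schedules follow the read and write patterns of the lost-update interleaving and of a schedule in which T1 writes X before T2 touches it.

```python
from graphlib import CycleError, TopologicalSorter


def precedence_graph(schedule):
    """Map each transaction to the set of transactions that must precede it.

    schedule: list of (txn, kind, item) triples, kind being "read" or "write".
    """
    preds = {txn: set() for txn, _, _ in schedule}
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "write" in (kind_i, kind_j):
                preds[tj].add(ti)  # edge Ti -> Tj
    return preds


def serial_order(schedule):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None


lost_update = [
    ("T1", "read", "X"), ("T2", "read", "X"),
    ("T1", "write", "X"), ("T2", "write", "X"),
]
harmless = [
    ("T1", "read", "X"), ("T1", "write", "X"), ("T2", "read", "X"),
    ("T1", "read", "Y"), ("T1", "write", "Y"), ("T2", "write", "X"),
]

print(serial_order(lost_update))  # cycle T1 -> T2 -> T1
print(serial_order(harmless))     # only T1 -> T2 edges
```

`TopologicalSorter` expects predecessor sets, which is why the graph is stored as "who must come before me" rather than as outgoing edges.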
Example 1
[Figure: example precedence graph with edges labelled by R(X) and W(X) operations.]
Testing for Conflict Serializability
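Schedule S2
time  T1                 T2
1     read_item(X)
2                        read_item(X)
3     X = X - N
4                        X = X + M
5     write_item(X)
6                        write_item(X)
Precedence graph for S2:
T2 → T1, because read2_item(X) precedes write1_item(X)
T1 → T2, because read1_item(X) precedes write2_item(X) and write1_item(X) precedes write2_item(X)
The graph contains the cycle T1 → T2 → T1, so S2 is not conflict serializable.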
Testing for Conflict Serializability
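Schedule S3
time  T1                 T2
1     read_item(X)
2     X = X - N
3     write_item(X)
4                        read_item(X)
5                        X = X + M
6     read_item(Y)
7     Y = Y + Q
8     write_item(Y)
9                        write_item(X)
Precedence graph for S3:
T1 → T2, because read1_item(X) precedes write2_item(X), write1_item(X) precedes read2_item(X), and write1_item(X) precedes write2_item(X)
All edges point from T1 to T2, so the graph has no cycle and S3 is conflict serializable.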
Test for Conflict Serializability


A schedule is conflict serializable if and only if its precedence graph is acyclic.
Cycle-detection algorithms (such as depth-first search) require on the order of n² operations, where n is the number of vertices in the graph.
- (Better algorithms take order n + e, where e is the number of edges.)
If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph.
- This is a linear order consistent with the partial order of the graph.
- For example, a serializability order for Schedule A would be Ti → Tj → Tk → Tm or Ti → Tk → Tj → Tm
Test for View Serializability

The precedence graph test for conflict serializability cannot be used directly to test for view serializability.
- An extension of the test to view serializability has cost exponential in the size of the precedence graph.
The problem of checking whether a schedule is view serializable falls in the class of NP-complete problems.
- Thus the existence of an efficient algorithm is extremely unlikely.
However, practical algorithms that just check some sufficient conditions for view serializability can still be used.
Recoverable Schedules
We need to address the effect of transaction failures on concurrently running transactions.
Recoverable schedule: if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti appears before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after the read
If T8 aborts, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.
Cascading Rollbacks


Cascading rollback: a single transaction failure leads to a series of transaction rollbacks.
Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable)
If T10 fails, T11 and T12 must also be rolled back, because T11 depends on T10 and T12 depends on T11.
This can lead to the undoing of a significant amount of work
Cascadeless Schedules



Cascadeless schedules: schedules in which cascading rollbacks cannot occur. For each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are cascadeless
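The definitions of recoverable and cascadeless schedules can both be checked in one pass over a schedule. This is a simplified sketch: the function name `classify` and the (transaction, action, item) encoding are assumptions, and the sketch assumes that no transaction reads an item after its writer has aborted. The sample schedules are assumed illustrations in the spirit of the T8/T9 and T10/T11/T12 examples, not reproductions of the missing figures.

```python
def classify(schedule):
    """Return (recoverable, cascadeless) for a list of (txn, action, item) triples.

    action is "read", "write", "commit" or "abort"; item is None for the last two.
    """
    commit_at = {
        txn: i for i, (txn, action, _) in enumerate(schedule) if action == "commit"
    }
    last_writer = {}
    recoverable = cascadeless = True
    for i, (txn, action, item) in enumerate(schedule):
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is None or writer == txn:
                continue  # initial value or own write: no dependency
            writer_commit = commit_at.get(writer)
            # Cascadeless: the writer must have committed before this read.
            if writer_commit is None or writer_commit > i:
                cascadeless = False
            # Recoverable: if the reader commits, the writer must commit first.
            reader_commit = commit_at.get(txn)
            if reader_commit is not None and (
                writer_commit is None or writer_commit > reader_commit
            ):
                recoverable = False
    return recoverable, cascadeless


early_commit = [
    ("T8", "read", "A"), ("T8", "write", "A"),
    ("T9", "read", "A"), ("T9", "commit", None),
    ("T8", "read", "B"),
]
chain = [
    ("T10", "write", "A"), ("T11", "read", "A"),
    ("T11", "write", "A"), ("T12", "read", "A"),
]
strict = [
    ("T1", "write", "A"), ("T1", "commit", None),
    ("T2", "read", "A"), ("T2", "commit", None),
]

print(classify(early_commit))
print(classify(chain))
print(classify(strict))
```

The `chain` schedule is recoverable only because nobody has committed yet, while still exposing every reader to a cascading rollback.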
Concurrency Control

A database must provide a mechanism that will
ensure that all possible schedules are





either conflict or view serializable, and
are recoverable and preferably cascadeless
A policy in which only one transaction can execute
at a time generates serial schedules, but provides
a poor degree of concurrency
Testing a schedule for serializability after it has
executed is a little too late!
Goal – to develop concurrency control protocols
that will assure serializability.
Concurrency Control vs. Serializability Tests


Concurrency-control protocols allow concurrent schedules, but
ensure that the schedules are conflict/view serializable, and are
recoverable and cascadeless .
Concurrency control protocols generally do not examine the
precedence graph as it is being created




Instead, a protocol imposes a discipline that avoids nonserializable schedules.
We study such protocols in the next chapter.
Different concurrency control protocols provide different tradeoffs
between the amount of concurrency they allow and the amount of
overhead that they incur.
Tests for serializability help us understand why a concurrency
control protocol is correct.
Summary


Executing transactions in an interleaved way may bring a database into an inconsistent state
Transaction anomalies are:
- Lost update,
- Dirty read, and
- Unrepeatable read
A DBMS is responsible for ensuring that either all operations of a transaction are successfully executed, or the transaction is rolled back
The log file records all important events (start, read, write, commit)
When a transaction reaches its commit point, everything is safely stored in the database (or the log file)
Summary on Serializability





Concurrency anomalies are avoided if a schedule is
serializable (equivalent to a serial schedule)
Serializable schedules are desirable because serial
schedules are generally inefficient
A schedule is conflict serializable if there are no cycles in
the precedence graph
Serializability is only a tool for understanding which schedules are desirable; a DBMS does not test it before executing a series of transactions
Instead, DBMSs apply protocols that assure correctness of the schedule

Many of these protocols rely on locking
End of Chapter