Download 4. Transaction processing concepts

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Database wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Consistency model wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Global serializability wikipedia , lookup

Commitment ordering wikipedia , lookup

Concurrency control wikipedia , lookup

Serializability wikipedia , lookup

Transcript
3. Transaction processing concepts
What is a transaction?
• Logical unit of work
• Execution of a program that accesses or
changes the database
Fundamental property of a database:
• The data are shared among several users and
applications  concurrent processing,
possibly conflicting interests.
AdvDB-3
J. Teuhola 2015
64
Transaction viewpoints
• A transaction should lead the database from
one consistent state to another.
• Partial execution not allowed (principle:
all or nothing)
• Concurrent access to data by multiple
transactions should be supported.
• Transactions may end prematurely, due to
system (hardware/software) failure;
recovery mechanisms are needed.
AdvDB-3
J. Teuhola 2015
65
Categories of database processing
• OLTP = On-Line Transaction Processing
– Needed for the everyday functioning of the
organization’s main activities, accessing the
’operative’ databases
– Short transactions and response times
• OLAP = On-Line Analytic Processing
–
–
–
–
Decision support, data mining
Long read-only transactions
’Data warehouse’
The data need not be exactly up-to-date; refreshed
periodically
AdvDB-3
J. Teuhola 2015
66
Interleaved execution
• Concurrent transactions may be executed in
an interleaved fashion.
• Transactions consist of steps, between which
the data may be inconsistent.
• Arbitrary interleaving of steps will cause
problems (interference):
– Lost update problem
– Temporary update problem (‘dirty read’)
– Incorrect summary problem
AdvDB-3
J. Teuhola 2015
67
Lost update problem
Example: Bank account, two parallel withdrawals
Time
Transaction 1:
Withdraw 200 from account 123
Transaction 2:
Withdraw 100 from account 123
bal = read_balance(123) = 1000 €
bal = bal – 200 €
bal = read_balance(123) = 1000 €
bal = bal - 100 €
write_balance(123, bal) = 800 €
write_balance(123, bal) = 900 €
• The first update is lost. Note that the transactions are not
aware of each other’s internal program variables (bal).
AdvDB-3
J. Teuhola 2015
68
Temporary update (‘dirty read’) problem
Example transactions on bank accounts:
Time
Transaction 1
Transfer 100 € from account
123 to account 789
Transaction 2
Withdraw 200 € from account 123
bal = read_balance(123)=1000€
bal = bal - 100€
write_balance(123, bal)=900€
bal = read_balance(123) = 900€
bal = read_balance(789)=2000€
abort;
recover_balance(123)=1000€
bal = bal - 200 €
write_balance(123, bal)=700 €
// Should be 800 €
AdvDB-3
J. Teuhola 2015
69
Incorrect summary problem
Time
Transaction 1
Sum of balances
Transaction 2
Transfer 100 € from 789 to 123
sum = 0
bal = read_balance(123) = 1000 €
sum = sum + bal = 1000 €
bal = read_balance(456) = 2000 €
sum = sum + bal = 3000€
bal = read_balance(789) = 3000 €
bal = bal – 100 €
write_balance(789, bal) = 2900 €
bal = read_balance(123) = 1000 €
bal = bal + 100€
write_balance(123, bal) = 1100 €
bal = read_balance(789) = 2900 €
sum = sum + 2900 € = 5900 €
// Should be 6000 €
AdvDB-3
J. Teuhola 2015
70
Reasons for recovery
• DBMS must ensure that each transaction is
either
(1) executed in totality (all operations completed
successfully; updates made permanent), or
(2) has no effect on the database contents
nor on other transactions.
• In case of failure, the DBMS must roll back the
(partially executed) transaction so that (2)
holds.
AdvDB-3
J. Teuhola 2015
71
Types of failures
• Hardware/software failure: contents of main
memory may be lost.
• Transaction error: e.g. illegal operation or
user interrupt.
• Logical error: Violation of database
consistency constraints.
• Concurrency control error: Interference of
transactions, e.g. deadlock.
• Disk failure, accidents, etc.: Some data are
usually lost.
AdvDB-3
J. Teuhola 2015
72
Transaction events
•
•
•
•
•
•
•
begin_transaction
read/write
commit: successful end; fix the changes
rollback (abort): unsuccessful ending
undo effects of a failed transaction
redo effects of a successful transaction
end_transaction
AdvDB-3
J. Teuhola 2015
73
System log
• On disk (permanent storage)
• Enables recovery from failures
• Stores data about transaction states and
performed updates (before-/after-images)
• Transactions identified by a unique id.
• Log entries:
start, [read,] write, commit, abort
• Undo needs before-image (old value)
• Redo needs after-image (new value)
AdvDB-3
J. Teuhola 2015
74
Commit point
• Database operations performed successfully
• Effects recorded in the log.
• At commit the log page in buffer must be
force-written to disk.
• Writing other buffer pages may be postponed.
• At failure:
– Uncommitted transactions are undone.
– Committed transactions may have to be redone.
AdvDB-3
J. Teuhola 2015
75
Checkpoint
• Effects of write operations of committed
transactions are forced to disk.
• A transaction need not be redone at failure,
if its commit-entry in the log is before the last
checkpoint.
• Checkpoint is taken periodically.
AdvDB-3
J. Teuhola 2015
76
Checkpoint actions
• Suspend execution.
• Force-write committed updates.
• Write the checkpoint info to the log:
– active transactions
– pointers to the first and last log entries of
transactions (to speed-up recovery)
• Resume execution.
AdvDB-3
J. Teuhola 2015
77
Transaction processing scenario
Main memory
Program 1
Program 2
DBMS
manages
DB buffer
Log
buffer
‘Page frames’
Force-write at checkpoint
DB
Disk pages
AdvDB-3
Force-write at commit
Log pages
J. Teuhola 2015
Before-/
after-images
of changed
tuples
78
Desirable (ACID) properties of
transactions
• Atomicity: All or nothing.
• Consistency: From one consistent state of the
database to another.
• Isolation: Updates are invisible before commit;
no need for cascading rollbacks.
• Durability: Committed changes are not lost
(responsibility of the recovery method), except
in catastrophic failures.
AdvDB-3
J. Teuhola 2015
79
Levels of isolation
0: No overwrite of dirty reads.
1: No lost updates.
2. No dirty reads & no lost updates.
3. ‘True isolation’: No dirty reads & no lost
updates & repeatable read.
AdvDB-3
J. Teuhola 2015
80
Schedule
• Assume a set of transactions {T1, ..., Tn}
with ordered operations:
Ti = <op1>i ; <op2>i ; ...
• A schedule S is an ordered set of operations,
where each Ti-sequence is a subsequence
of S.
• Notice: Operations of different transactions
may be interleaved.
AdvDB-3
J. Teuhola 2015
81
Example schedule
• Operation sequences of three transactions:
T1:
T2:
T3:
r1(x); r1(y); w1(x); c1;
r2(y); w2(y); c2;
r3(x); r3(y); w3(x); c3;
• One possible interleaved schedule:
r1(x); r2(y); r1(y); w1(x); r3(x); r3(y); w2(y); w3(x); c2; c1; c3;
AdvDB-3
J. Teuhola 2015
82
Conflict of operations
• Notation for operations:
<op><trans-id>(<data item>)
where <op> is one of
{r, w, c, a} = {read, write, commit, abort}
• Conflict (= danger): Two operations
<op1>i(X) and <op2>j(X)
may cause trouble if
i  j and (<op1> = w or <op2> = w)
AdvDB-3
J. Teuhola 2015
83
Example of conflicts
• Schedule:
r1(x); r1(y); r2(x); w1(x); r2(y); w2(x); c1; c2;
• Conflicting pairs of operations:
– r1(x) and w2(x)
– r2(x) and w1(x)
– w1(x) and w2(x)
• Not in conflict:
– r1(y) and r2(y)
AdvDB-3
J. Teuhola 2015
84
Complete schedule
S is a complete schedule of T1, T2, ..., Tn, if
(1) S contains operations of T1, T2, ..., Tn
(but no others).
(2) Operation order of each Ti is preserved.
(3) Order of operations is fixed for each conflicting
pair; otherwise ordering may be partial.
(4) Commit/abort is the last operation of each Ti.
AdvDB-3
J. Teuhola 2015
85
Committed projection
• Committed projection of S =
All operations of S belonging to committed
transactions.
• Reason for this concept:
New transactions appear all the time;
difficult to find a moment when no active
transactions exist.
AdvDB-3
J. Teuhola 2015
86
Recoverable schedule
• Transaction T2 that reads X does not commit
until T1 that wrote X (before the read) has
committed.
• Explanation:
T2 can be rolled back as long as it has not
committed. If T1 aborts, T2 should be rolled
back to avoid consequences of dirty read.
AdvDB-3
J. Teuhola 2015
87
Cascading rollback
Cascade phenomenon:
T2 has to be aborted because T1 aborted;
T3 has to be aborted because T2 aborted; ...
Prevention: Read an item only after the writing
transaction has committed (cascadeless
schedule).
Strict schedule: Read/write X only after the last
writer transaction of X has committed.
Advantage: Simpler undo.
Disadvantage: Lower level of concurrency
AdvDB-3
J. Teuhola 2015
88
Serializability
Aim:
• Correct (but not necessarily serial) schedules
for interfering transactions
Means:
• Sufficient isolation
Serial schedule:
• All operations of each transaction Ti are
executed consecutively, without interleaving.
AdvDB-3
J. Teuhola 2015
89
Serializability (cont.)
Axiom: For independent transactions every
serial schedule is correct (n! alternatives
for n transactions).
Problem: Limited concurrency; if one transaction
waits for I/O, another cannot take over; the
processor is idle.
Serializable schedule: Equivalent to some
serial schedule (same result, though
interleaved execution); means logical
correctness.
AdvDB-3
J. Teuhola 2015
90
Forms of schedule equivalence
• Result equivalence:
May be accidental; it should hold for all possible
database states.
• Conflict equivalence:
The order of any two conflicting operations is
the same in both schedules.
• View equivalence: A read operation reads the
result of the same write operation (or none) in
both schedules.
AdvDB-3
J. Teuhola 2015
91
Forms of serializability
Corresponding to the forms of equivalence:
• Conflict serializability:
Simple testing algorithm
• View serializability:
Less restrictive than conflict serializability.
Testing is an NP-complete problem.
Note. Every conflict-serializable schedule is
also view-serializable, but not vice versa.
AdvDB-3
J. Teuhola 2015
92
Testing conflict serializability
1. For each transaction, create a node in a
precedence graph.
2. For each pair of conflicting operations
<op>Ti(X) and <op>Tj(X), occurring in this
order, create an edge from Ti to Tj.
3. The schedule is serializable if the
precedence graph has no cycles.
AdvDB-3
J. Teuhola 2015
93
Serializable schedule 
equivalent serial schedule
1. Derive the precedence graph of transactions
as above.
2. Perform a topological sort of the graph nodes.
3. The resulting sequence is the solution
(which may not be unique, in general).
AdvDB-3
J. Teuhola 2015
94
Example of a precedence graph
Schedule:
r1(x), r2(y), w1(x), r3(x), r3(y), w3(x), w3(y), w2(y),
c1, c2, c3
1
2
y
x
y
3
AdvDB-3
J. Teuhola 2015
The schedule is
not serializable,
due to the cycle.
95
Problems of serializability
• Difficult to determine the order of operations
beforehand.
• Scheduling must be done on-the-fly, based on
waiting, priorities, system load, etc.
• Testing serializability afterwards may be too
optimistic (cf. optimistic concurrency control).
• Serializability can be applied only to the
committed projection.
AdvDB-3
J. Teuhola 2015
96
Solution to maintaining serializability
• Control the interleaving of operations by
obeying rules (a protocol) that guarantee
serializability without checking it.
(E.g. two-phase locking protocol, see later)
• Conclusion: Serializability theory helps to
gain better understanding of protocols, to
obtain correct interleaving and improved
concurrency.
AdvDB-3
J. Teuhola 2015
97
Debit-credit transactions
• Less restrictive form of equivalence
• Addition to and subtraction from an
item can be done in any order
(commutativity).
• A schedule is correct if
<read, add/subtract, write> triples
are not interrupted.
AdvDB-3
J. Teuhola 2015
98
Granularity
• To be decided: Unit of data considered in
scheduling
• A smaller unit enables a higher level of
concurrency (but involves also more effort to
manage)
• A (disk) page is often the simplest unit of
operation.
AdvDB-3
J. Teuhola 2015
99
Transaction support in SQL
• Implicit BEGIN_TRANSACTION
• Explicit COMMIT or ROLLBACK
• SET TRANSACTION characteristics:
– Access mode: READ ONLY or READ WRITE
– Diagnostic area size: Number of feedback conditions
held simultaneously.
– Isolation level (from weakest to strongest):
•
•
•
•
READ UNCOMMITTED (allows dirty reads)
READ COMMITTED (allows non-repeatable reads)
REPEATABLE READ (allows ‘phantoms’)
SERIALIZABLE (actually stronger than defined earlier)
AdvDB-3
J. Teuhola 2015
100