Transaction Processing, Concurrency and Recovery
CSCI 6442
©Copyright 2015, David C. Roberts, all rights reserved
Agenda
Definitions
SQL statements
Locking
Deadlock and livelock
Two-phase locking
Two-phase commit
Recovery
Definitions
A transaction is:
a unit of work, from the viewpoint of a user of the system or from the viewpoint of an application
a logical unit of work which may involve a sequence of steps but which normally will be considered by the user as one action
Properties of a Transaction
The ACID test grew out of work by Jim Gray (IBM, Tandem, DEC and later Microsoft); the acronym itself was coined by Härder and Reuter in 1983:
Atomicity: changes made by the transaction either all happen or none of them happen
Consistency: at the start and end of the transaction the database is in a consistent state
Isolation: the results produced by the transaction are the same as would be produced if nothing else were running
Durability: once a transaction completes, the changes that it makes are permanent
SQL Statements
BEGIN TRANSACTION: marks the beginning of a transaction. Sometimes not used.
COMMIT WORK: marks the end of a transaction’s processing. Changes are made permanent.
ROLLBACK WORK: marks the failure of a transaction. Changes made by the transaction (since BEGIN TRANSACTION or since the last COMMIT WORK) are removed.
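The three statements can be tried out with a minimal sketch like the one below. It uses Python's built-in sqlite3 module; the account table, the names alice and bob, and the transfer function are invented for illustration. SQLite spells the statements BEGIN, COMMIT and ROLLBACK, so other systems may differ slightly.

```python
import sqlite3

# In-memory database; isolation_level=None means we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic unit of work."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("COMMIT")      # changes become permanent
        return True
    except ValueError:
        conn.execute("ROLLBACK")    # every change since BEGIN is removed
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After both calls, balances reflects only the committed transfer; the failed one leaves no trace.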
Transaction Processing
This approach assumes a single database server with multiple processes accessing the database
All operations are carried out synchronously
Today, large IT complexes may delegate parts of a single transaction to multiple servers
As long as there is a single lock list, this approach can be used for multiple database servers accessing a single database
If there is not a single lock list, the same techniques can be extended to a fully distributed configuration
LOCKING
Problems Avoided Through Locking
Lost update
Uncommitted dependency
Inconsistent analysis
Lost Update
A and B are independent transactions that both read and update the database.
1. A reads X
2. B reads X
3. A adds 10 to X
4. B adds 20 to X
5. A stores X
6. B stores X
Questions:
1. What is the final value of X?
2. What is the correct value of X?
3. What happened here?
The final value is the original X plus 20; the correct value is the original X plus 30.
A’s update has been lost, because it was overwritten by B’s change.
The solution is to allow only one transaction at a time to change the same data.
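The six steps can be replayed in a small sketch. The starting value of 100 and the run helper are assumptions made only for this illustration; each transaction works on its own private copy of X, as in the slide.

```python
def run(schedule, x=100):
    """Replay read/add/store steps; each transaction keeps a private copy of X."""
    db = {"X": x}
    local = {}
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "add":
            local[txn] += arg
        elif op == "store":
            db["X"] = local[txn]
    return db["X"]

# The interleaving from the slide (steps 1 to 6).
interleaved = [("A", "read", None), ("B", "read", None),
               ("A", "add", 10), ("B", "add", 20),
               ("A", "store", None), ("B", "store", None)]

# The same work run serially: all of A, then all of B.
serial = [("A", "read", None), ("A", "add", 10), ("A", "store", None),
          ("B", "read", None), ("B", "add", 20), ("B", "store", None)]

lost = run(interleaved)   # 120: A's +10 was overwritten
correct = run(serial)     # 130
```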
Uncommitted Dependency
A and B are independent transactions
A updates X
B updates X
A performs a rollback
What has happened here?
B’s change has been lost because of A’s rollback
The solution: prevent B from making changes to X until after A commits.
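A rough sketch of why the rollback wipes out B's change, assuming rollback is done by restoring a saved before-image. The values 100, 150 and 175 and the helper names are made up for illustration.

```python
db = {"X": 100}
before_image = {}   # txn -> value of X before that transaction's first change

def update(txn, value):
    """Change X, remembering what it was so the change can be undone."""
    before_image.setdefault(txn, db["X"])
    db["X"] = value

def rollback(txn):
    """Undo the transaction by restoring its before-image of X."""
    db["X"] = before_image.pop(txn)

update("A", 150)    # A updates X
update("B", 175)    # B updates X on top of A's uncommitted value
rollback("A")       # A's before-image (100) overwrites B's 175
final_x = db["X"]
```

With an exclusive lock held by A until it ends, B's update would simply wait, and the rollback would restore a value that nobody else had built on.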
Inconsistent Analysis
A and B are independent transactions. A is also the name of a relation in which all values Ai = 20.
A reads A1, total = 20
A reads A2, total = 40
B changes A3 to 30
A reads A3, total = 70
B changes A2 to 10
A reads A4, total = 90
What has happened here? The total is too high because B changed values of A while the total was being computed: A1 through A4 sum to 80 both before and after B’s changes, never to 90.
The solution is to prevent any changes in any A values while the total is being computed.
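The schedule can be replayed as a sketch, assuming the relation holds exactly the four values A1 through A4 that the slide reads. The step encoding is invented for this illustration.

```python
relation = {1: 20, 2: 20, 3: 20, 4: 20}   # A1..A4, all 20 at the start
sum_before = sum(relation.values())

steps = [("A", "read", 1),
         ("A", "read", 2),
         ("B", "write", (3, 30)),
         ("A", "read", 3),
         ("B", "write", (2, 10)),
         ("A", "read", 4)]

total = 0
for txn, op, arg in steps:
    if op == "read":
        total += relation[arg]     # transaction A accumulates its running total
    else:
        key, value = arg
        relation[key] = value      # transaction B changes a value mid-scan

sum_after = sum(relation.values())
```

Here total ends at 90 while the relation sums to 80 both before and after B's changes, so the figure A computed matches no consistent state.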
Locking
It is necessary for many transactions to run at once, each potentially changing the database
They must not interfere with each other
Locking allows a transaction to reserve data that it is reading or changing
Serializability
A transaction schedule is serializable if its results are the same as the results of running the transactions serially, that is, one after the other. It is not important which transaction executes first, only that the result does not reflect any mixing of the transactions.
Serializability is the major correctness criterion for the execution of concurrent transactions
We use locking to serialize our execution schedules for transactions.
Source: Wikipedia
Types of Locks
Read (aka shared): allows other transactions to hold shared locks on the same data, and to read the data
Write (aka exclusive): permits no other locks to be held on the same data until it is released, and permits no operations on the data except by the lockholder
Use of Locks
Before reading, acquire a read lock
If reading many values, read lock them all
Before updating, acquire a write lock
After the operation, release the lock
Usually SQL operates in autocommit mode, and the DBMS obtains and releases locks
For transactions, SELECT ... FOR UPDATE allows for acquisition of exclusive locks
COMMIT releases all locks
Lock Compatibility Matrix
If a first lock blocks the application of a second lock, the locks are said to be incompatible (marked x).
Lock Type    Read-Lock   Write-Lock
Read-Lock                x
Write-Lock   x           x
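The matrix can be turned into a tiny lock table, sketched below. The LockTable class and its method names are hypothetical, and a real lock manager would queue waiting requests rather than just return False.

```python
# True means the second lock can be granted while the first is held.
COMPATIBLE = {("read", "read"): True,
              ("read", "write"): False,
              ("write", "read"): False,
              ("write", "write"): False}

class LockTable:
    def __init__(self):
        self.held = {}   # item -> list of (txn, mode)

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if the caller must wait."""
        for holder, held_mode in self.held.get(item, []):
            if holder != txn and not COMPATIBLE[(held_mode, mode)]:
                return False
        self.held.setdefault(item, []).append((txn, mode))
        return True

    def release_all(self, txn):
        """Drop every lock held by txn, as COMMIT or ROLLBACK would."""
        for item in list(self.held):
            self.held[item] = [(t, m) for t, m in self.held[item] if t != txn]

locks = LockTable()
r1 = locks.request("T1", "X", "read")    # granted
r2 = locks.request("T2", "X", "read")    # granted: read locks are compatible
w3 = locks.request("T3", "X", "write")   # refused while readers hold X
locks.release_all("T1")
locks.release_all("T2")
w3_retry = locks.request("T3", "X", "write")   # granted once the readers are gone
```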
Now, perhaps all our concurrency problems are solved!
Sorry, not quite...
DEADLOCK
The Four Philosophers
A new philosophy department is formed
The desire is for it to be as efficient as possible
All philosophers will be either thinking or eating at all times
No time will be spent talking or doing anything else
Obviously paper writing will happen at a high level
Thanks to the great Edsger Dijkstra
The Four Philosophers (contd.)
[Figure: The Department lunch room]
Deadly Embrace
What’s the problem here?
When a philosopher can’t get enough resources to complete a transaction, then as long as she holds resources while waiting for more, deadlock can occur.
The solution is to require each philosopher to pick up two forks at a time, or no forks.
Unfortunately this won’t work for database systems, because we can’t know in advance what data a transaction will change.
Solving the Deadly Embrace Problem
Timeout was the most popular solution for a long time: set a threshold and roll back transactions taking longer than the threshold
  But when the system gets busy, transactions take much more time, so more get rolled back, making the situation worse
A method has been found to detect deadlock when it occurs, so that one transaction can be rolled back
Waits-For Graph
Use a directed graph
Node represents a resource (rows, typically)
Arc represents a transaction that holds one resource and waits on another
Every time a transaction waits for a lock, an arc is added to the graph and the graph is checked for loops
If a loop is detected in the waits-for graph, then a deadlock has occurred and the junior transaction is rolled back.
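A minimal sketch of the loop check, following the slide's convention that nodes are resources and each arc is a transaction running from the resource it holds to the resource it waits on. The function name, the arc encoding and the start_time table are assumptions for illustration; "junior" is taken here to mean the most recently started transaction.

```python
def find_loop(arcs):
    """arcs: list of (held_resource, wanted_resource, txn).
    Return the transactions on one loop, or [] if there is no loop."""
    out = {}
    for held, wanted, txn in arcs:
        out.setdefault(held, []).append((wanted, txn))

    def dfs(node, path_nodes, path_txns):
        for nxt, txn in out.get(node, []):
            if nxt in path_nodes:                       # walked back onto the path
                start = path_nodes.index(nxt)
                return path_txns[start:] + [txn]
            found = dfs(nxt, path_nodes + [nxt], path_txns + [txn])
            if found:
                return found
        return []

    for start in out:
        found = dfs(start, [start], [])
        if found:
            return found
    return []

# Transaction X holds Row A and waits for Row B; Y holds Row B and waits for Row A.
arcs = [("Row A", "Row B", "X"), ("Row B", "Row A", "Y")]
loop = find_loop(arcs)

start_time = {"X": 1, "Y": 2}                 # hypothetical start order
victim = max(loop, key=start_time.get)        # roll back the junior transaction
```

Running the check only when a transaction is about to wait, as the slide describes, keeps the cost proportional to how often waits actually occur.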
Waits-For Graph
[Figure: waits-for graph with Transaction X, Row A, Transaction Y and Row B]
Livelock, a.k.a. “starvation”
Suppose that priority is given to new requests for resources.
If the system stays busy, then long-running transactions may never be completed.
This phenomenon is called livelock.
Putting It All Together
Two-Phase Locking
We can’t request and apply all locks at once
But we can be smart about locking
Two-phase locking protocol applies and removes locks in two phases:
1. expanding phase: locks are acquired, no locks are released
2. shrinking phase: locks are released and no locks are acquired
Summary: never acquire a lock after a lock has been released
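The summary rule is easy to check mechanically. The sketch below assumes a transaction's locking activity is given as a list of ('lock' or 'unlock', item) pairs; that encoding and the function name are invented for illustration.

```python
def follows_2pl(ops):
    """ops: sequence of ('lock' | 'unlock', item) for one transaction.
    Return True if no lock is acquired after any lock has been released."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True            # the shrinking phase has begun
        elif action == "lock" and shrinking:
            return False                # acquiring again breaks the protocol
    return True

good = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]
bad = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]

good_ok = follows_2pl(good)   # True
bad_ok = follows_2pl(bad)     # False
```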
Strict Two-Phase Locking
S2PL requires a transaction to comply with 2PL and to release its exclusive locks only after it has ended (i.e., has either committed or aborted)
Shared locks can be released during phase 2
Strong Strict Two-Phase Locking
SS2PL is a subclass of S2PL
SS2PL requires that the locking protocol release both exclusive and shared locks acquired by a transaction only after the transaction has ended.
An SS2PL transaction has a phase 1 that lasts the entire transaction, and a degenerate phase 2.
Also called “Rigorous”; most commonly used in database systems
RECOVERY
Recovery
Two types of recovery:
loss of media
system failure, media not damaged
Loss of Media
Database is restored from backup
Log is used to reapply all changes that were made since the last backup
System Crash
Called a “soft” failure
System stops working but storage is not damaged
The log is used to restore the database to a consistent state
System Log
Has a record of each change made by every transaction
Each transaction may have multiple log records if it makes changes at different times; each log record contains changes for only one transaction
Every log record has a sequence number, the LSN, assigned in ascending order
LSN is, effectively, a time stamp
Writing
Changes to the database are written directly to the storage that is changed
The LSN of the latest change made to a page is written on the page
However, the WAL protocol strictly requires that the log be written before its database changes are written
Every change to the database is first written to the log
WAL—Write-Ahead Logging
WAL is followed by nearly all modern database systems
Before a change is written to the database, the log entries must be written
The log is written sequentially (on disk), so log writes are faster than writes to the database; even so, positive interlocks are used to enforce the ordering
Often two copies of the log are written in case of failure of one
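The ordering rule can be sketched as an interlock that refuses to write a page until the log record describing the change is on stable storage. The Store class, its fields and its method names are hypothetical; in-memory lists and dictionaries stand in for the disk.

```python
class Store:
    def __init__(self):
        self.log = []            # "durable" log: (lsn, txn, key, before, after)
        self.log_buffer = []     # log records not yet flushed
        self.flushed_lsn = 0     # highest LSN known to be in the durable log
        self.pages = {}          # "durable" database: key -> (value, page_lsn)
        self.next_lsn = 1

    def update(self, txn, key, value):
        """Buffer a log record for the change and return its LSN."""
        before = self.pages.get(key, (None, 0))[0]
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, txn, key, before, value))
        return lsn

    def flush_log(self):
        """Append buffered records to the durable log, in LSN order."""
        self.log.extend(self.log_buffer)
        self.log_buffer.clear()
        if self.log:
            self.flushed_lsn = self.log[-1][0]

    def write_page(self, key, value, lsn):
        """The interlock: the log record must be durable before the page is written."""
        if lsn > self.flushed_lsn:
            raise RuntimeError("WAL violation: flush the log first")
        self.pages[key] = (value, lsn)   # the page carries the LSN of its latest change

store = Store()
lsn = store.update("T1", "X", 5)
try:
    store.write_page("X", 5, lsn)        # too early: log record still in the buffer
    blocked = False
except RuntimeError:
    blocked = True
store.flush_log()
store.write_page("X", 5, lsn)            # allowed now
```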
System Log
Log contains before and after images of every change made by a transaction
Log also contains checkpoint records, each of which includes a list of all active transactions and a time stamp
Buffers are flushed at each checkpoint
Recovery Operations
Redo: repeat a transaction, ensuring that everything it was to do has been completed. Do not repeat any parts of the transaction that were already carried out.
Undo: reverse all changes made by a transaction
Recovery Processing
Recovery processing starts at the youngest checkpoint record in the log
Going forward from the checkpoint record, redo is applied to every transaction that has a commit shown in the log after the checkpoint, whether it is listed in the checkpoint record or started later
Going backward from the end of the log, undo is applied to every transaction that does not have a commit shown in the log
Log Analysis
Transaction   Recovery action
T1            No Change
T2            Redo
T3            Undo
T4            Redo
T5            Undo
[Figure: timeline of T1 through T5 relative to the Checkpoint and the Failure]
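The classification can be reproduced with a short sketch. The timeline itself did not survive in this transcript, so the start and commit times below are assumptions chosen to match the usual textbook picture: T1 commits before the checkpoint, T2 and T4 commit between checkpoint and failure, T3 and T5 never commit.

```python
def classify(transactions, checkpoint, failure):
    """transactions: name -> (start_time, commit_time or None).
    Return name -> recovery action after a crash at time `failure`."""
    actions = {}
    for name, (_start, commit) in transactions.items():
        if commit is not None and commit < checkpoint:
            actions[name] = "No Change"   # its changes were flushed at the checkpoint
        elif commit is not None and commit < failure:
            actions[name] = "Redo"        # committed, but its pages may not be on disk
        else:
            actions[name] = "Undo"        # no commit in the log
    return actions

CHECKPOINT, FAILURE = 10, 20              # hypothetical times
timeline = {
    "T1": (1, 5),       # finished before the checkpoint
    "T2": (3, 15),      # active at the checkpoint, committed before the failure
    "T3": (7, None),    # active at the checkpoint, still running at the failure
    "T4": (12, 18),     # started after the checkpoint, committed
    "T5": (14, None),   # started after the checkpoint, still running
}
actions = classify(timeline, CHECKPOINT, FAILURE)
```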
What’s Wrong with This Picture?
Checkpoints reduce throughput
  Acceptance of new transactions must be stopped
  All buffers must be flushed
  All active transactions must be saved in checkpoint record
We’d like to do recovery without checkpoints!
ARIES: Algorithms for Recovery and Isolation Exploiting Semantics
The recovery algorithm on which many major DBMSs base their recovery today
First implemented as improved recovery in IBM’s DB2
Described in landmark paper in ACM TODS in 1992 by Mohan et al.
ARIES and Logging
Every log record is assigned a log sequence number (LSN) in ascending order
LSN is essentially a (compact form of) time stamp
Uses WAL
When a database page is updated, LSN of the update is written on the page
ARIES Checkpoints
ARIES takes checkpoints
Buffers are not flushed for checkpoints
Checkpoint includes:
  Active transactions and their states
  LSNs of most recently written log record for each transaction
  A list of the dirty pages in the buffer pool
ARIES Recovery
Starts analysis pass from youngest checkpoint record
  Goes forward, to end of log, bringing information about transactions and dirty pages up to date
  Dirty pages information determines starting point for redo pass, which is next
Redo pass
  ARIES repeats history for updates not written to database
  Done for updates of all transactions
  Log record’s update is redone if database page’s LSN is less than log record’s LSN
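The redo test can be sketched in a few lines. This is a simplification and not the full ARIES redo pass: the log record layout, the pages dictionary and the redo_start_lsn parameter (standing in for the starting point found by the analysis pass) are assumptions for illustration.

```python
def redo_pass(log, pages, redo_start_lsn):
    """log: list of (lsn, page_id, new_value) in ascending LSN order.
    pages: page_id -> {'lsn': page LSN, 'value': contents} as found on disk.
    Reapply every update the page has not seen; return the LSNs redone."""
    redone = []
    for lsn, page_id, new_value in log:
        if lsn < redo_start_lsn:
            continue                                   # before the redo starting point
        page = pages.setdefault(page_id, {"lsn": 0, "value": None})
        if page["lsn"] < lsn:                          # page is older than the log record
            page["value"] = new_value
            page["lsn"] = lsn                          # stamp the page with this LSN
            redone.append(lsn)
    return redone

log = [(1, "P1", "a"), (2, "P2", "b"), (3, "P1", "c")]
# P1 reached disk after LSN 1; P2 was never written.
pages = {"P1": {"lsn": 1, "value": "a"}, "P2": {"lsn": 0, "value": None}}
redone = redo_pass(log, pages, redo_start_lsn=1)
```

Because the comparison uses the LSN stored on the page, running the pass a second time redoes nothing, which is what makes a crash during recovery tolerable.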
ARIES Recovery (Contd.)
After redo pass, database is in state just before failure
Undo pass starts at end of log
  All “loser” transactions are rolled back
  One sweep of the log for all undos
  Undos are not conditional: all transactions without a commit log entry are undone.
Simplifications
This presentation somewhat simplifies ARIES so that it can be understood
ARIES also has some recovery built into it to handle failures during recovery
This function, recovery within recovery, would make our conversation hard to understand
There is also another feature you can explore as part of the assignment for next week
Two-Phase Commit
Suppose a commit must be made across two independent databases, with two independent servers
Then two-phase commit is used
1. Send each server its part of the transaction, tell it to PREPARE TO COMMIT
2. When all servers are ready, then send COMMIT command
3. If any server can’t commit, then all get ROLLBACK command
What are tradeoffs here?
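The three steps can be sketched with a coordinator function and stand-in servers. The Server class, its can_commit flag and the two_phase_commit function are hypothetical; the sketch ignores timeouts, logging and coordinator failure, which are where the real tradeoffs lie.

```python
class Server:
    """Stand-in for one independent database server."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, work):
        """PREPARE TO COMMIT: do the work and vote on whether it can be committed."""
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(parts):
    """parts: list of (server, work). Return the global outcome."""
    # Phase 1: every server gets its part and is asked to prepare.
    votes = [server.prepare(work) for server, work in parts]
    # Phase 2: commit everywhere only if every server voted yes.
    if all(votes):
        for server, _work in parts:
            server.commit()
        return "COMMIT"
    for server, _work in parts:
        server.rollback()
    return "ROLLBACK"

a, b = Server("db1"), Server("db2")
outcome_ok = two_phase_commit([(a, "debit"), (b, "credit")])

c, d = Server("db1"), Server("db2", can_commit=False)
outcome_fail = two_phase_commit([(c, "debit"), (d, "credit")])
```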
Thank you!