Database Transactions and Processes
H18 ACID properties of transactions
ACID properties of a DB
Atomicity: Each transaction is either executed completely or not at all (and thus has no effect)
Consistency: The DB must be consistent: its values must represent the correct values. Executing a
transaction in isolation preserves DB consistency and moves the DB to a new state that reflects
reality
Isolation: The concurrent execution of a set of transactions has the same effect as a serial
execution of the same set
Durability: The results of committed transactions are permanent
The DB must satisfy all integrity constraints (not all DB states are allowed):
Internal consistency: e.g. referential integrity and replicated data
Enterprise rules: Enforcement of procedures.
When a transaction is executed, it must satisfy these constraints. If it does not, there is no
certainty about how the DB will behave, and the data is therefore not reliable.
Transaction consistency is the responsibility of the transaction designer. The designer must ensure
that, once a transaction has executed, the DB is in a correct state, i.e. it satisfies the
integrity constraints.
Levels of consistency:
Dynamic: cannot actually be tested in a single DB state (it constrains the transition between states).
A transaction is a unit of work: a transaction must do all the work needed to update the DB. This
work can never be split over 2 transactions, because that would violate consistency.
Atomic execution implies that every transaction either commits or aborts.
When a transaction aborts, its changes to the DB are undone, i.e. rolled back (in other words, the
old state is restored).
Reasons for an abort:
Violation of integrity constraints
Resources unavailable (e.g. HD crash)
Violation of the isolation requirement
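As a minimal sketch of atomicity, and of an abort caused by an integrity-constraint violation, the following uses Python's built-in `sqlite3` module. The `accounts` table, its `CHECK` constraint and the `transfer` helper are illustrative assumptions, not part of the course material.

```python
import sqlite3


def make_db():
    """In-memory DB; the CHECK constraint plays the role of an integrity constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated CHECK (balance >= 0), so the already-applied credit was undone.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = make_db()
print(transfer(conn, "a", "b", 30), balances(conn))   # True {'a': 70, 'b': 80}
print(transfer(conn, "a", "b", 500), balances(conn))  # False {'a': 70, 'b': 80}
```

The failed transfer had already credited `b` when the constraint fired; the rollback removes that partial effect, so the transaction has no effect at all.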
The system (DB) must guarantee that when a transaction commits, its effects remain in the DB even
if the computer or the storage medium fails.
An OS does not normally guarantee this, but the DB must. Possible ways to guarantee it are, for
example, mirrored disks or disks in a RAID configuration.
Durability is relative: one must decide what it is worth, and hence which degree of durability is
implemented.
Concurrent execution of transactions: the outcome must still be equivalent to a serial execution.
So, to satisfy the Isolation property, a concurrent schedule must be serializable.
ACID properties guarantee that the DB is a correct, consistent and up-to-date model of the real
world. Although strictly enforcing, for example, Isolation is important for the functioning of the
DB, it is very conservative. A DB can function well without all transactions being isolated, but
this can introduce risks. Conservative ACID places a heavy load on the system.
Interleaving: the merging of different transactions in time (see also the schedules).
H19 Models of transactions
Flat transactions
Some limitations
Single DBMS
All or nothing; no way to continue from a certain point
No human or device action possible as part of transaction (cannot rollback)
Everything at once; no possibility to defer work to a later time without losing ACID
A possible solution is to use savepoints.
Savepoints
Used for partial rollback of db.
Rollback to spi causes the DB updates made after the creation of spi to be undone
o S2 and S3 updated the database (else no point rolling back over them)
Program counter and local variables are not rolled back
Savepoint creation does not make prior database changes durable (an abort rolls back all changes)
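A small sketch of partial rollback with SQLite's `SAVEPOINT` / `ROLLBACK TO` statements via Python's `sqlite3` module. The `log` table and the step names are hypothetical; the point is that rolling back to a savepoint undoes later DB updates but not program variables, and that a savepoint does not protect earlier work from a full abort.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT; we do it ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (step TEXT)")


def steps():
    return [row[0] for row in conn.execute("SELECT step FROM log ORDER BY rowid")]


attempts = 0                            # a program (local) variable, not database state
conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('s1')")
conn.execute("SAVEPOINT sp1")
for step in ("s2", "s3"):
    attempts += 1
    conn.execute("INSERT INTO log VALUES (?)", (step,))
conn.execute("ROLLBACK TO sp1")         # partial rollback: undoes s2 and s3, keeps s1
print(steps(), attempts)                # ['s1'] 2  (the local variable is not rolled back)
conn.execute("INSERT INTO log VALUES ('s4')")
conn.execute("ROLLBACK")                # abort: the savepoint did not make s1 durable
print(steps())                          # []
```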
Implementation of savepoints:
When Ti creates a savepoint s, insert a marker for s in Ti's lock list Li that separates lock
entries acquired before the creation of s from those acquired after it
When Ti rolls back to s, release all locks following marker for s in Li (in addition to undoing
all updates made since savepoint creation)
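The lock-list marker technique can be sketched as follows. This is a hypothetical illustration: `SavepointLockList`, the shared `lock_table` dict and the entry format are assumptions, only exclusive locks are modelled, and undoing the updates themselves is left out.

```python
class SavepointLockList:
    """Hypothetical sketch of Ti's lock list Li with savepoint markers."""

    def __init__(self, tid, lock_table):
        self.tid = tid
        self.lock_table = lock_table  # shared: item -> tid that holds its lock
        self.entries = []             # Li: ("lock", item) or ("mark", savepoint name)

    def acquire(self, item):
        holder = self.lock_table.get(item)
        if holder is None:
            self.lock_table[item] = self.tid
            self.entries.append(("lock", item))
        elif holder != self.tid:
            raise RuntimeError(f"{item} is locked by {holder}; {self.tid} must wait")

    def savepoint(self, name):
        # The marker separates locks acquired before the savepoint from those after it.
        self.entries.append(("mark", name))

    def rollback_to(self, name):
        """Release all locks that follow the marker for `name`; return them."""
        if ("mark", name) not in self.entries:
            raise KeyError(name)
        released = []
        while self.entries[-1] != ("mark", name):
            kind, value = self.entries.pop()
            if kind == "lock":
                del self.lock_table[value]
                released.append(value)
        return released  # the marker stays, so rolling back to it again is allowed
```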
Distributed Transactions
Many enterprises support multiple legacy systems doing separate tasks. Increasing automation
requires that these systems be integrated.
Goal: distributed transaction should be ACID
Each subtransaction is locally ACID (e.g., local constraints maintained, locally serializable)
In addition the transaction should be globally ACID
A: Either all subtransactions commit or all abort
C: Global integrity constraints are maintained
I: Concurrently executing distributed transactions are globally serializable
D: Each subtransaction is durable
Distributed database: one design, but split up over several hosts because of its size
Multidatabase: several designs (possibly used by several companies)
Hierarchical Model:
o No concurrency among subtransactions; the root initiates commit
Peer Model:
o Concurrency among siblings and between parent and
children, any subtransaction can initiate commit
Nested Transactions
Parent can create children to perform subtasks; children may execute sequentially or
concurrently; parent waits until all children complete (no communication between parent
and children).
Each subtransaction (with its descendants) is isolated with respect to each sibling (and its
descendants). Hence, siblings are serializable, but their order is not determined, so the nested
transaction is non-deterministic.
Concurrent nested transactions are serializable.
A subtransaction is atomic. It can abort or commit independently of other subtransactions.
Commit is conditional on commit of parent (since child task is a subtask of parent task).
Abort causes abort of all of the subtransaction's children.
Nested transaction commits when root commits. At that point updates of committed
subtransactions are made durable.
Locking implementation of nested transactions
Nested transactions satisfy:
Nested transactions are isolated with respect to one another
A parent does not execute concurrently with its children
A child (and its descendants) is isolated from its siblings (and their descendants)
Acquiring a read lock
A request to read x by subtransaction T2 of nested transaction T1 is granted if:
No other nested transaction holds a write lock on x
All other subtransactions of T1 holding write locks on x are ancestors of T2 (and hence are not
executing)
Acquiring a write lock
A request to write x by subtransaction T2 of nested transaction T1 is granted if:
No other nested transaction holds a read or write lock on x
All other subtransactions of T1 holding read or write locks on x are ancestors of T2 (and hence
are not executing)
All locks obtained by T2 are held until it completes
If it aborts, all locks are released
If it commits, any locks it holds that are not held by its parent are inherited by its parent
When top-level transaction (and hence entire nested transaction) commits, all locks are released.
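The grant, inherit and release rules above can be sketched for a single item. In this hypothetical encoding a (sub)transaction is its path from the root, e.g. `("T1",)` for the top-level transaction and `("T1", "A")` for one of its children; the function names are illustrative only.

```python
def is_ancestor(a, b):
    """True if a is a proper ancestor of b (both are paths from the root)."""
    return len(a) < len(b) and b[: len(a)] == a


def can_grant(holders, requester, mode):
    """holders: {transaction path: "r" | "w"} for one item x; mode: "r" or "w"."""
    for holder, held in holders.items():
        if holder == requester:
            continue
        if mode == "r" and held == "r":
            continue  # read locks never conflict with each other
        if holder[0] != requester[0]:
            return False  # held by another nested transaction
        if not is_ancestor(holder, requester):
            return False  # held by a non-ancestor, e.g. a sibling's subtree
    return True


def commit(locks, sub):
    """sub commits: its parent inherits its locks (at top level they are released)."""
    parent = sub[:-1]
    for holders in locks.values():
        mode = holders.pop(sub, None)
        if mode is None or not parent:
            continue
        if holders.get(parent) != "w":  # keep the stronger of the two modes
            holders[parent] = mode


def abort(locks, sub):
    """sub aborts: all locks it holds are released."""
    for holders in locks.values():
        holders.pop(sub, None)
```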
Chained transactions
Chaining allows a transaction to be decomposed into sub-transactions
with intermediate commit points.
Database updates are made durable at intermediate points => less
work is lost in a crash
Chaining compared with savepoints:
Savepoint: explicit rollback to an arbitrary savepoint; all updates are lost in a crash (put on …)
Chaining: abort rolls back to the last commit; only the updates of the most recent transaction are
lost in a crash
ACID properties, what did we lose?
Atomicity: lost
Isolation: lost
Consistency: each subtransaction must preserve consistency
Durability: kept!
One approach to atomicity problem is compensation (see SAGAS)
Sagas are an extension to chained transactions that achieves partial atomicity
For each subtransaction STi in a chained transaction T, a compensating transaction CTi is defined
Thus if a transaction T consisting of 5 chained subtransactions aborts after the first 3
subtransactions have committed, then ST1 ST2 ST3 CT3 CT2 CT1 will perform the desired compensation
With this type of compensation, when a transaction aborts, the value of every item it
changed is eventually restored to the value it had before that transaction started
However, complete atomicity is not guaranteed
o Some other concurrent transaction might have read the changed value before it
was restored to its original value
Other problems?
o non-compensatable transactions (e.g. reset(X))
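A minimal saga runner reproducing the ST1 ST2 ST3 CT3 CT2 CT1 example. Subtransactions and compensations are modelled as hypothetical callables that only record their names; in a real system each would be its own committed transaction.

```python
def run_saga(steps, log):
    """steps: list of (subtransaction, compensation) callables taking `log`.
    If a subtransaction raises (aborts), run the compensations of the
    already-committed subtransactions in reverse order."""
    committed = []
    for subtransaction, compensation in steps:
        try:
            subtransaction(log)
        except Exception:
            for comp in reversed(committed):
                comp(log)
            return False
        committed.append(compensation)
    return True


def make_step(i, fails=False):
    """Hypothetical STi / CTi pair that only records its name in the log."""
    def subtransaction(log):
        if fails:
            raise RuntimeError(f"ST{i} aborted")
        log.append(f"ST{i}")

    def compensation(log):
        log.append(f"CT{i}")

    return subtransaction, compensation


trace = []
steps = [make_step(i, fails=(i == 4)) for i in range(1, 6)]
print(run_saga(steps, trace), trace)
# False ['ST1', 'ST2', 'ST3', 'CT3', 'CT2', 'CT1']
```

Nothing here stops a concurrent transaction from reading a value between STi and CTi, which is exactly why complete atomicity is not guaranteed.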
Recoverable queues
A recoverable queue is a transactional data structure in which information about transactions to be
executed later can be durably stored.
See also slides 31 and 32 for an example.
The queue could be implemented within the DB, but performance suffers: a transaction should not hold
long-duration locks.
A separate implementation takes advantage of the queue's semantics to improve performance.
Since the queue is implemented by a separate server (different from the DBMS), the locking
discipline need not be two-phase; the discipline can be designed to suit the semantics of the
abstract operations enqueue and dequeue:
Lock on the head (tail) pointer is released when the dequeue (enqueue) operation completes
(hence not strict or isolated)
Lock on the entry that is enqueued or dequeued is held until commit time
The queue and the DBMS are two separate systems. Transactions must be committed at both, but
isolation is implemented at the DBMS and applies only to the schedule of requests made to the DBMS.
As a result, any scheduling policy for accessing the queue might be enforced.
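The queue locking discipline can be sketched in memory as below. This is a hypothetical illustration: durable storage is omitted, and one short-duration lock stands in for the separate head and tail pointer locks. The pointer lock is held only while an operation runs, whereas an entry stays locked by its transaction until commit, so dequeuers are not isolated from each other.

```python
import threading
from collections import deque


class RecoverableQueue:
    """Hypothetical in-memory sketch of a recoverable queue's locking discipline."""

    def __init__(self):
        self._pointer_lock = threading.Lock()  # short duration: held only during an operation
        self._entries = deque()                # items: [value, tid locking the entry or None]
        self._dequeued = {}                    # tid -> values removed but not yet committed

    def enqueue(self, tid, value):
        with self._pointer_lock:               # released when enqueue completes
            self._entries.append([value, tid]) # the entry stays locked by tid until commit

    def dequeue(self, tid):
        with self._pointer_lock:               # released when dequeue completes
            for i, (value, owner) in enumerate(self._entries):
                if owner is None:              # skip entries of uncommitted enqueuers
                    del self._entries[i]
                    self._dequeued.setdefault(tid, []).append(value)
                    return value
            return None

    def commit(self, tid):
        with self._pointer_lock:
            for item in self._entries:
                if item[1] == tid:
                    item[1] = None             # release the entry lock: now visible
            self._dequeued.pop(tid, None)      # dequeued entries are gone for good

    def abort(self, tid):
        with self._pointer_lock:
            self._entries = deque(e for e in self._entries if e[1] != tid)
            for value in reversed(self._dequeued.pop(tid, [])):
                self._entries.appendleft([value, None])  # restore at the head
```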
Real-world actions
A real-world action performed from within a transaction T cannot be rolled back if a crash occurs
before commit. On recovery after a crash, how can we tell whether the action has occurred?
Solution: the device maintains a read-only (hardware) counter that is automatically incremented
with each action.
On recovery:
1. recover queue and database;
2. read recorded value of counter from database;
3. if (device value > recorded value)
then discard head entry;
// device performed action
else ;
// device did not perform action
4. restart server(s);
// which re-initiates transaction,
// because request was restored
// on queue
A workflow is a model of a complex, long-running enterprise process generally performed in a
highly distributed and heterogeneous environment.
Workflow task
Self-contained job performed by an agent
o Inventory transaction (agent = database server)
o Packing task (agent = human)
Has an associated role that defines type of job
o An agent can perform specified roles
Accepts input from other tasks, produces output
Has physical status: committed, aborted, ...
o Committed task has logical status: success, failure
For examples, see slides 46 and 47 of the lecture.
ACID properties
Individual tasks might be ACID, but workflow as a whole is not
o Some task might not be essential: its failure is ignored even though the workflow succeeds
o Concurrent workflows might see each other's intermediate state
o Might not choose to compensate for a task even though workflow fails
Each task is either
o Retriable: Can ultimately be made to commit if retried a sufficient number of
times (e.g., deposit)
o Compensatable: Compensating task exists (e.g., withdraw)
o Pivot: Neither retriable nor compensatable (e.g., buy a nonrefundable ticket)
Allows management of an enterprise to guarantee that certain activities are carried out in
accordance with established business rules, even though those activities involve a
collection of agents, perhaps in different locations and perhaps with minimal training
Audit trail
Statistics … management information
H20 Implementing Isolation
Database operations p1 and p2 commute if, for all initial DB states, they:
a) return the same results and
b) leave the DB in the same final state, regardless of the order of execution.
p1 commutes with p2 if
They operate on different data items
o w1(x) commutes with w2(y) and r2(y)
Both are reads
o r1(x) commutes with r2(x)
Operations that do not commute conflict
o w1(x) conflicts with w2(x)
o w1(x) conflicts with r2(x)
For examples of commuting operations, see slides 9 and 10 of the 2nd lecture.
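The commute/conflict rules for reads and writes fit in two small functions. The `Op` record is a hypothetical encoding of r1(x), w2(y), and so on; the conflict test also requires different transactions, since operations of one transaction are already ordered by that transaction.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str   # "r" or "w"
    tid: int    # transaction number, e.g. 1 for T1
    item: str   # database item, e.g. "x"


def commute(p, q):
    """Read/write operations commute if they touch different items or are both reads."""
    return p.item != q.item or (p.kind == "r" and q.kind == "r")


def conflict(p, q):
    """Operations of different transactions that do not commute conflict."""
    return p.tid != q.tid and not commute(p, q)


print(conflict(Op("w", 1, "x"), Op("r", 2, "x")))  # True
print(conflict(Op("w", 1, "x"), Op("r", 2, "y")))  # False
```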
Conventional Operations
We abstract operations to two classes:
o r(x, X) - copy the value of database variable x to local variable X
o w(x, X) - copy the value of local variable X to database variable x
We use r1(x) and w1(x) to mean a read or write of x by transaction T1
Serializable schedules:
S is serializable if it is equivalent to a serial schedule.
Transactions are totally isolated in a serializable schedule
A schedule is correct for any application if it is a serializable schedule of consistent
transactions
Serializability (the strictest isolation level; in fact it requires predicate locking) provides a
conservative definition of correctness. This causes a lot of overhead, so performance and
throughput drop. Some schedules that are not serializable are nevertheless acceptable (lower
isolation levels).
Conflict equivalence vs view equivalence
Up to now, a schedule is serializable if it is conflict equivalent to a serial schedule.
Conflict equivalence definition: schedules S1 and S2 are conflict equivalent if conflicting
operations are ordered in the same way in both
2 schedules of the same set of operations are view equivalent if:
Corresponding read operations in each return the same values (so the computations are the same)
And both schedules yield the same final database state
Conflict equivalence implies view equivalence. View equivalence does not imply conflict
equivalence.
Conflict equivalence is stronger than view equivalence.
Serialization graphs
A schedule is (conflict) serializable if and only if its serialization graph is acyclic.
Take the following steps:
1. Find the conflicts
2. Draw the graph (an edge Ti -> Tj for each operation of Ti that precedes a conflicting operation of Tj)
3. Check whether the graph contains a cycle.
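The three steps can be sketched directly. A schedule is given as a list of hypothetical `(kind, tid, item)` tuples in execution order; cycle detection is a plain depth-first search.

```python
def serialization_graph(schedule):
    """Step 1 and 2: edge (Ti, Tj) when an operation of Ti precedes a
    conflicting operation of Tj (same item, different transactions, at least one write)."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(schedule):
        for k2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Step 3: depth-first search; reaching a node that is still on the stack means a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


def conflict_serializable(schedule):
    return not has_cycle(serialization_graph(schedule))


s1 = [("r", 1, "x"), ("w", 2, "x"), ("w", 1, "x")]  # T1 -> T2 and T2 -> T1
print(conflict_serializable(s1))                     # False
```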
Dirty read: reading data that has not been committed
Dirty write: T1 writes data that was written by a still-active transaction.
Strict schedule: dirty reads and dirty writes are prohibited
Note! A strict schedule is not the same as a serializable schedule.
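A sketch of a strictness check, using a hypothetical schedule encoding where `"c"`/`"a"` mark commit/abort. The last schedule in the usage lines is strict but not serializable (r1(x) before w2(x) and w2(x) before w1(x) give a cycle T1 -> T2 -> T1), which illustrates the note above.

```python
def is_strict(schedule):
    """schedule: list of (op, tid, item), op in "r", "w", "c" (commit), "a" (abort);
    item is None for "c"/"a". Returns False on a dirty read or a dirty write."""
    last_writer = {}   # item -> tid of the most recent writer
    active = set()     # transactions that have not yet committed or aborted
    for op, tid, item in schedule:
        if op in ("c", "a"):
            active.discard(tid)
            continue
        active.add(tid)
        writer = last_writer.get(item)
        if writer is not None and writer != tid and writer in active:
            return False   # reads or overwrites data of a still-active transaction
        if op == "w":
            last_writer[item] = tid
    return True


dirty_read = [("w", 1, "x"), ("r", 2, "x"), ("c", 1, None), ("c", 2, None)]
strict_not_serializable = [("r", 1, "x"), ("r", 2, "x"), ("w", 2, "x"), ("c", 2, None),
                           ("w", 1, "x"), ("c", 1, None)]
print(is_strict(dirty_read), is_strict(strict_not_serializable))  # False True
```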
Concurrency Control
Find a schedule with a correct interleaving (concurrent schedules are equivalent to serial
schedules). The CC cannot see the whole schedule in advance, so it follows a strategy.
Applying CC can increase the response time considerably and can considerably limit throughput.
Strategy: do not serve a request if:
1. It violates strictness or serializability
2. There is a possibility that a subsequent arrival may cause a violation of strictness or
serializability
Models of CC:
Immediate update:
Write updates a db item
Read copies value from db item
Commit makes updates durable
Abort undoes updates
Deferred update:
Write stores value in the transaction's intention list.
Read copies value from db or intention list
Commit uses intention list to durably update db
Abort discards intention list.
Pessimistic control:
Transaction requests permission for each DB (read/write) operation
CC can:
o Grant operation
o Delay it until a subsequent event occurs (commit or abort of other transaction)
o Abort the transaction
Decisions are made conservatively so that every commit request can be granted.
Takes precautions even if conflicts do not occur.
Optimistic control:
Requests for DB operations are always granted (read/write)
Request to commit might be denied
o Transaction is aborted if it performed a non-serializable operation
Assumes conflicts are not likely
Immediate Update Pessimistic CC
Rule (can be used at each arriving request):
Don't grant a request that imposes an ordering among active transactions (delay the requesting
transaction).
Grant request that doesn‟t conflict with previously granted requests of active transactions.
A transaction is forced to wait if its request is delayed. Delayed requests are reconsidered when a
transaction completes (commits or aborts, and thus becomes inactive)
Result: Each schedule is equivalent to a serial schedule in which transactions are ordered in the
order in which they commit (=commit order)
A transaction can
read if it holds a read (shared) lock (granted if no transaction currently holds a write lock on
that item)
write and update if it holds a write (exclusive) lock (granted if no transaction holds any lock on
that item)
be delayed if its request cannot be granted
Lock compatibility (x = conflict, the request is delayed):
                 Granted mode
Requested mode   read    write
read                     x
write            x       x
All locks are released when transaction completes
A lock is not granted if the request conflicts with the rule (see above); the transaction then waits.
Result: schedules are serializable and strict.
Implementation of locks using Lock set L(x) and Wait set W(x).
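A sketch of a lock manager built on L(x) and W(x), assuming shared/exclusive locks that are all released when the transaction completes (so schedules are strict). `LockManager`, `request` and `complete` are hypothetical names; waiting requests are reconsidered in FIFO order, and no fairness policy is applied to new requests.

```python
from collections import defaultdict, deque


class LockManager:
    """Hypothetical sketch: L[x] = locks granted on x, W[x] = FIFO wait set of x."""

    def __init__(self):
        self.L = defaultdict(dict)   # item -> {tid: "r" | "w"}
        self.W = defaultdict(deque)  # item -> deque of (tid, mode)

    def _grantable(self, item, tid, mode):
        others = [m for t, m in self.L[item].items() if t != tid]
        return not others if mode == "w" else "w" not in others

    def _grant(self, item, tid, mode):
        if self.L[item].get(tid) != "w":   # never downgrade a held write lock
            self.L[item][tid] = mode

    def request(self, tid, item, mode):
        """Return True if granted now, False if tid is delayed (added to W[item])."""
        if self._grantable(item, tid, mode):
            self._grant(item, tid, mode)
            return True
        self.W[item].append((tid, mode))
        return False

    def complete(self, tid):
        """Commit or abort: release all of tid's locks, then reconsider delayed requests."""
        for holders in self.L.values():
            holders.pop(tid, None)
        granted = []
        for item, waiting in self.W.items():
            while waiting and self._grantable(item, *waiting[0]):
                waiter, mode = waiting.popleft()
                self._grant(item, waiter, mode)
                granted.append((waiter, item))
        return granted
```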
Deadlock: a CC that causes transactions to wait can cause deadlocks. Handling deadlocks:
Abort one transaction in the cycle
Use wait-for graph to detect cycle
Assume deadlock when transactions waits longer than time-out period
Manual locking
Manual lock release when finished accessing item. Better performance, but due to early lock
release non-serializable schedules are possible.
Two-phase locking
A transaction doesn't release any lock until it has acquired all the locks it will ever require
(first phase). Then locks that are no longer needed are released (unlocking phase).
Schedule produced by two-phased locking control:
Equivalent to a serial schedule (ordered by time of first unlock operation)
Not necessarily recoverable (dirty reads and writes possible).
A two-phase locking control that holds write locks until commit produces strict, serializable schedules
A strict two-phase locking control holds all locks until commit and produces strict and
serializable schedules
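The two rules can be checked on one transaction's sequence of lock operations. The action encoding is a hypothetical illustration; "strict" here means the text's strict two-phase locking, i.e. no lock is released before commit.

```python
def is_two_phase(actions):
    """actions: one transaction's operations in order, e.g. [("lock", "x"), ("unlock", "x")].
    Two-phase: no lock is acquired after the first unlock."""
    shrinking = False
    for op, _item in actions:
        if op == "unlock":
            shrinking = True
        elif op == "lock" and shrinking:
            return False
    return True


def is_strict_two_phase(actions):
    """Strict 2PL: all locks are held until commit, so no unlock precedes ("commit", None)."""
    for op, _item in actions:
        if op == "commit":
            return True
        if op == "unlock":
            return False
    return True
```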
Lock granularity
Table locking (coarse)
o Lock entire table when a row is accessed.
Row (tuple) locking (fine)
o Lock only the row that is accessed.
Page locking (compromise)
o When a row is accessed, lock the containing page
Deferred Update Optimistic CC
Under the optimistic assumption that conflicts don't occur, read and write requests are always
granted (no locking, no overhead!).
Transaction has three phases:
Begin transaction
o Read Phase - transaction executes: reads from database, writes to intentions list
(deferred-update, no changes to database)
Request commit
o Validation Phase - check whether conflicts occurred during read phase; if yes abort
(discard intentions list)
o Write Phase - write intentions list to database (deferred update) if validation succeeds
For simplicity, we assume here that validation and write phases form a single critical
section (only one transaction is in its validation/write phase at a time)
Guarantees an equivalent serial schedule in which the order of transactions is the order in
which they enter validation (dynamic)
A transaction is validated when it wants to commit: when T1 enters validation, check whether T1
conflicted with any other transaction. Overlapping validations are not allowed!
Advantage: No deadlock
Disadvantage: No rollback possibility
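A sketch of deferred-update optimistic CC. For the validation phase it uses backward validation, one common way to do the check (an assumption, not necessarily the lecture's exact test): T aborts if a transaction that validated after T started wrote an item that T read. Validation and write phase run as one critical section, trivially so in this single-threaded sketch; `OptimisticCC` and its methods are hypothetical names.

```python
class OptimisticCC:
    """Hypothetical sketch: deferred-update optimistic CC with backward validation."""

    def __init__(self, db):
        self.db = db          # committed database state: item -> value
        self.commit_no = 0    # number of transactions that have passed validation
        self.history = []     # (commit number, write set) of validated transactions

    def begin(self):
        return {"start": self.commit_no, "read_set": set(), "intentions": {}}

    def read(self, t, item):                        # read phase
        t["read_set"].add(item)
        return t["intentions"].get(item, self.db.get(item))

    def write(self, t, item, value):                # read phase: deferred update
        t["intentions"][item] = value

    def commit(self, t):
        # Validation: did a transaction that validated after t started write
        # something t read? Then t may have read a stale value: abort.
        for number, write_set in self.history:
            if number > t["start"] and write_set & t["read_set"]:
                return False                        # discard the intentions list
        # Write phase: install the intentions list in the database.
        self.db.update(t["intentions"])
        self.commit_no += 1
        self.history.append((self.commit_no, set(t["intentions"])))
        return True


occ = OptimisticCC({"x": 0})
t1, t2 = occ.begin(), occ.begin()
occ.write(t1, "x", occ.read(t1, "x") + 1)
occ.write(t2, "x", occ.read(t2, "x") + 1)
print(occ.commit(t1), occ.commit(t2), occ.db)  # True False {'x': 1}
```

The equivalent serial order is the order in which transactions pass validation, matching the text.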
H21 Isolation in relational databases
An example: all rows where the name is 'mary' are locked (or selected) and a sum is calculated.
But a new row with the name 'mary' can still be inserted. This last row is called a phantom.
Phantoms occur when row locking is used. Phantoms can be prevented by using table locking or
predicate locking.
Predicate locking prevents phantoms and produces serializable schedules, but is too
complex to implement
Table locking prevents phantoms and produces serializable schedules, but negatively
impacts performance
Row locking does not prevent phantoms and can produce nonserializable schedules
Predicate locking
A predicate describes a set of rows, some in a table and some not; e.g. name = 'mary'
Every SQL statement has an associated predicate
When executing a statement, acquire a (read or write) lock on the associated predicate
Two predicate locks conflict if one is a write and there exists a row (not necessarily in the
table) that is contained in both
Locking is conservative: there might be no rows in Accounts satisfying both predicates
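The conflict rule can be sketched for hypothetical range predicates of the form lo < attr < hi. The check never looks at the table's rows: two locks conflict if some possible row satisfies both, which is what makes predicate locking conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeLock:
    """Hypothetical predicate lock lo < attr < hi (assumed non-empty), mode "r" or "w"."""
    attr: str
    lo: float
    hi: float
    mode: str


def predicate_locks_conflict(p, q):
    """Conflict if one lock is a write and some row (existing or not) satisfies both."""
    if "w" not in (p.mode, q.mode):
        return False
    if p.attr != q.attr:
        return True  # a row satisfying both predicates can always be imagined
    # Two open real intervals intersect iff max(lo) < min(hi).
    return max(p.lo, q.lo) < min(p.hi, q.hi)


read_lock = RangeLock("salary", 50000, 70000, "r")
print(predicate_locks_conflict(read_lock, RangeLock("salary", 60000, 90000, "w")))  # True
print(predicate_locks_conflict(read_lock, RangeLock("salary", 70000, 90000, "w")))  # False
```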
Non-repeatable read
With a non-repeatable read, executing the same SELECT twice yields the same set of rows, but
attribute values might be different
SQL Isolation Levels
READ UNCOMMITTED – dirty reads, non-repeatable reads, and phantoms allowed
READ COMMITTED - dirty reads not allowed, but non-repeatable reads and phantoms allowed
REPEATABLE READ – dirty reads, non-repeatable reads not allowed, but phantoms allowed
SERIALIZABLE – dirty reads, non-repeatable reads, and phantoms not allowed; all
schedules must be serializable
Locking implementation of SQL Isolation Levels
Locking implementation is based on:
Entities locked: rows, predicates, …
Lock modes: read and write
Lock duration:
o Short - locks acquired in order to execute a statement are released when
statement completes
o Long - locks acquired in order to execute a statement are held until the transaction completes
o Medium – something in between
Write locks are handled identically at all isolation levels:
Long-duration predicate write locks are associated with UPDATE, DELETE, and INSERT
statements. This rules out dirty writes
In practice, predicate locks are implemented with table locks or by acquiring locks on an
index as well as the data.
Read locks are handled differently at each level:
READ UNCOMMITTED: no read locks
o Hence a transaction can read a write-locked item!
Allows dirty reads, non-repeatable reads, and phantoms
READ COMMITTED: short-duration read locks on rows returned by SELECT
o Prevents dirty reads, but non-repeatable reads and phantoms are possible
REPEATABLE READ: long-duration read locks on rows returned by SELECT
o Prevents dirty and non-repeatable reads, but phantoms are possible
SERIALIZABLE: long-duration read lock on predicate specified in WHERE clause
o Prevents dirty reads, non-repeatable reads, and phantoms, and guarantees serializable schedules
Some DBMSs allow only read-only transactions to be executed on READ UNCOMMITTED level.
Cursor stability
Cursor stability is a commonly implemented isolation level, which is an extension of READ COMMITTED:
Long-duration write locks on predicates
Short-duration read locks on rows
Additional locks for handling cursors
Access by T1 through a cursor, C, generally involves OPEN followed by a sequence of FETCHs
C is INSENSITIVE: rows FETCHed cannot be affected by concurrent updates (since OPEN is isolated)
C is not INSENSITIVE: some rows FETCHed might have been updated by a concurrent
transaction, T2, and others might not
Read lock on row accessed through cursor is medium-duration; held until cursor is moved
Allowed at READ COMMITTED, hence a lost update is possible. Not allowed at CURSOR STABILITY
(since T1 accesses t through a cursor).
There is a possibility of deadlock when both transactions access t through a cursor
Update locks
An update lock conflicts with other update and write locks, but not with read locks (see schedule …)
Some DBMS provide update locks to alleviate deadlock problem
A transaction that wants to read an item now and possibly update it later requests an
update lock on the item (manual locking)
An update lock is a read lock that can be upgraded to a write lock. Often used with
updatable cursors
Optimistic Read Committed
It's called optimistic because the transaction assumes that no other transaction will write what it
has read, hence it gives up its read lock.
T1 aborts if it tries to write a tuple that it previously read and, in the meantime, some other
transaction has written that tuple and committed.
Other types of locking
Intention locking
Performance improvement possible if lock on parent is weak
Intention shared (IS) lock: in order to get an S lock on an item, T must first get IS locks
on all containing items (to root of hierarchy)
Intention exclusive (IX) lock: in order to get an X lock on an item, T must first get IX
locks on all containing items (to root of hierarchy)
Shared Intention Exclusive (SIX): Equivalent to an S lock and an IX lock on an item
An intention lock indicates a transaction's intention to acquire a conventional lock on a contained item
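The usual compatibility matrix for these intention modes, plus the root-to-leaf order in which locks are requested, can be sketched as below. The item names (`db`, `accounts`, `row 17`) are hypothetical.

```python
# Compatibility of a requested mode (outer key) with a held mode (inner key).
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}


def locks_needed(path, mode):
    """Locks to request, root first, for an S or X lock on the last item of path."""
    intention = {"S": "IS", "X": "IX"}[mode]
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


def can_grant(requested, held_modes):
    """Grant only if the requested mode is compatible with every mode held by others."""
    return all(COMPATIBLE[requested][h] for h in held_modes)


print(locks_needed(["db", "accounts", "row 17"], "X"))
# [('db', 'IX'), ('accounts', 'IX'), ('row 17', 'X')]
```

Because the table lock is only IX, another transaction can still get IX or IS on `accounts` and work on other rows, which is the performance gain of a weak lock on the parent.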
Index locking
Locking of index pages
If a WHERE clause refers to a predicate name = mary and if there is an index on name,
then an index lock on the index entries for name = mary is like a predicate lock on that predicate
If a WHERE clause refers to a predicate such as 50000 < salary < 70000 and if there is an
index on salary, then a key-range index lock can be used to get the equivalent of a
predicate lock on the predicate 50000 < salary < 70000
Key-range locking
Index entries at leaf level are locked
See above.
Locking a B-Tree
Read Locks
o Obtain a read lock on the root, and work your way down the tree locking each
entry as it is reached
o When a new entry is locked, the lock on the previous entry (its parent) can be released
This operation will never revisit the parent
No write operation of a concurrent transaction can pass this operation as it
goes down the tree
Called lock coupling or crabbing
Write Locks
o Obtain a write lock on the root, and work your way down the tree locking each
entry as it is reached
When a new entry n is locked, if that entry is not full, the locks on all its parents
can be released
An insert operation might have to go back up the tree, revisiting and
perhaps splitting some nodes
Even if that occurs, because n is not full, it will not have to split n and
hence need not go further up the tree
Thus it can release locks further up in the tree.
To avoid acquiring many fine grain locks on a table, a DBMS can set a lock escalation
threshold. If more than the threshold number of tuple (or page) locks are acquired, the
DBMS automatically trades them in for a table lock.
Granular locking
Problem: T1 holds a (fine grained) lock on field F1 in record R1. T2 requests a conflicting
(coarse grained) lock on R1. How does the concurrency control detect the conflict since it
sees F1 and R1 as different items?
Solution: Organize locks hierarchically by containment and require that in order for a
transaction to get a fine grained lock it must first get a coarse grained lock on the
containing item
o T1 must first get a lock on R1 before getting a lock on F1. The conflict with T2 is
detected at R1
Multi-version concurrency control
A multi-version DBMS maintains all versions created in the (recent) past. Major goal of a
multiversion DBMS: avoid the need for read locks
All DBMSs guarantee that statements are isolated:
o Each statement sees state produced by the complete execution of other
statements, but state might not be committed
A multiversion control guarantees that each statement sees a committed state:
o A statement is executed in a state whose value is a version
o Referred to as statement-level read consistency
A multiversion control can also guarantee that all statements of a transaction see the same
committed state:
o All statements of a transaction access the same version
o Referred to as transaction-level read consistency
Read-only multi-version control
Distinguishes in advance read-only (R/O) transactions from read/write (R/W) transactions.
R/W transactions use a (conventional) immediate-update, pessimistic control. Hence,
transactions access the most current version of the database.
All the reads of a particular R/O transaction TRO are satisfied using the most recent version
that existed when TRO requested its first read.
Example: T1 and T2 are read/write transactions; T3 is read-only
T3 sees the version produced by T1
The equivalent serial order is T1, T3, T2
DBMS maintains a version counter (VC)
o Incremented each time a R/W transaction commits
The new version of a data item created by a R/W transaction is tagged with the value of VC
at the time the transaction commits
When a R/O transaction makes its first read request, the value of VC becomes its counter
value. Each request to read an item is satisfied by the version of the item having the
largest version number less than or equal to the transaction's counter value.
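A sketch of the version-counter scheme, reproducing the T1, T3, T2 example. It assumes VC is incremented first and the new versions are tagged with the incremented value, and it leaves out the R/W transactions' locking; the class and method names are hypothetical.

```python
class ReadOnlyMultiversion:
    """Hypothetical sketch of read-only multiversion control with a version counter VC."""

    def __init__(self):
        self.vc = 0          # version counter
        self.versions = {}   # item -> [(version number, value)], ascending

    def commit_rw(self, writes):
        """A R/W transaction commits: increment VC and tag its new versions with it."""
        self.vc += 1
        for item, value in writes.items():
            self.versions.setdefault(item, []).append((self.vc, value))

    def begin_ro(self):
        return {"counter": None}

    def read_ro(self, t, item):
        if t["counter"] is None:
            t["counter"] = self.vc   # fixed at the R/O transaction's first read request
        # Version with the largest number <= the transaction's counter value.
        candidates = [v for n, v in self.versions.get(item, []) if n <= t["counter"]]
        return candidates[-1] if candidates else None


db = ReadOnlyMultiversion()
db.commit_rw({"x": 1, "y": 1})      # T1 commits
t3 = db.begin_ro()
print(db.read_ro(t3, "x"))          # 1
db.commit_rw({"x": 2, "y": 2})      # T2 commits while T3 is still running
print(db.read_ro(t3, "y"))          # 1: T3 still sees T1's version (order T1, T3, T2)
```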
Read consistency multi-version control
R/O transactions
o Treated as before: get transaction-level read consistency
R/W transactions
o Write statements acquire long-duration write locks (delay other write statements)
o Read statements use most recent (committed) version at time of read
Not delayed by write locks (since read locks are not requested).
Snapshot Isolation
Does not distinguish between R/W and R/O transactions
A transaction reads the most recent version that existed at the time of its first read request
o Guarantees transaction-level read consistency
The write sets of any two concurrently executing transactions must be disjoint
o Two implementations of this specification
First Committer Wins
Locking implementation
First committer wins
Writes use deferred-update (intentions list)
T is allowed to commit only if no concurrent transaction
o committed before T and
o updated a data item that T also updated
First committer wins is optimistic:
It can be implemented without any locks
Deadlock not possible
Validation (write set intersection) is required for R/W transactions and abort is possible
Schedules might not be serializable
Locking implementation of snapshot isolation
Immediate update pessimistic control
Reads do not get any locks and execute as in the previous implementation
A transaction T that wants to perform a write on some item must request a write lock
If the version number of that item is greater than that of T, T is aborted (first
committer wins)
o Otherwise, if another transaction has a write lock on that item, T waits until that
transaction completes
If that transaction commits, T is aborted (first committer wins)
If that transaction aborts, T is given the write lock and allowed to write
Following anomalies are impossible: Dirty read, dirty write, non-repeatable read, lost update.
Write skew is possible.
In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set
(e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2),
and finally concurrently commit, neither having seen the update performed by the other.
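A sketch of snapshot isolation with first committer wins, followed by the write skew scenario. Simplification (an assumption): the snapshot is taken at `begin()` rather than at the first read request. The constraint x + y >= 0 is hypothetical and is never checked by the DBMS, which is what lets write skew through.

```python
class SnapshotIsolation:
    """Hypothetical sketch of snapshot isolation with first committer wins."""

    def __init__(self, initial):
        self.clock = 0
        self.versions = {k: [(0, v)] for k, v in initial.items()}  # item -> [(commit ts, value)]
        self.commits = []  # (commit ts, write set) of committed transactions

    def begin(self):
        return {"snapshot": self.clock, "writes": {}}

    def read(self, t, item):
        if item in t["writes"]:
            return t["writes"][item]           # own deferred write
        visible = [v for ts, v in self.versions.get(item, []) if ts <= t["snapshot"]]
        return visible[-1] if visible else None

    def write(self, t, item, value):
        t["writes"][item] = value              # deferred update (intentions list)

    def commit(self, t):
        """Abort if a concurrent transaction that already committed wrote an item t wrote."""
        mine = set(t["writes"])
        for ts, theirs in self.commits:
            if ts > t["snapshot"] and theirs & mine:
                return False
        self.clock += 1
        for item, value in t["writes"].items():
            self.versions.setdefault(item, []).append((self.clock, value))
        self.commits.append((self.clock, mine))
        return True


db = SnapshotIsolation({"x": 50, "y": 50})     # hypothetical constraint: x + y >= 0
t1, t2 = db.begin(), db.begin()
for t, item in ((t1, "x"), (t2, "y")):
    if db.read(t, "x") + db.read(t, "y") >= 100:   # both see x + y = 100
        db.write(t, item, db.read(t, item) - 100)
print(db.commit(t1), db.commit(t2))  # True True: disjoint write sets, so write skew
```

After both commits x + y = -100, although each transaction on its own would have kept x + y >= 0. Two concurrent writers of the same item, by contrast, are caught by the write-set check.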
It looks as if phantoms cannot occur in snapshot isolation, but non-serializable schedules due to
phantoms are possible.
Example: concurrent transactions each execute SEL(P) and then insert a row satisfying P.
Neither sees the row inserted by the other.
Schedule is not serializable
Would be considered a phantom if it occurred at REPEATABLE READ.
Can be considered as write skew (is permitted in snapshot isolation).
H22 Atomicity and Durability
Failures: processor failure, software bug
The server supports atomicity by providing recovery procedures that restore the DB, by using …
Aborts can be initiated by the user, the transaction, the system, etc.
Roll the transaction back
Durability requires commits to be permanent
Due to the possibility of media crash, the media used must be redundant.
Log contains information which can restore the DB.
Each modification of DB causes an update record to be appended to log.
Update record contains:
Identity of data item modified
Identity of transaction (tid) that did the modification
Before image (undo record) – copy of data item before update occurred.
Abort using log
(Assume immediate-update approach)
Scan log backwards using tid to identify the transaction's update records
o Reverse each update using before image
o Reversal done in last-in-first-out order
In a strict system, new values unavailable to concurrent transactions (as a result of long
term exclusive locks); hence rollback makes transaction atomic
Problem: terminating scan (log can be long)
Solution: append a begin record for each transaction, containing tid, prior to its first
update record
Savepoint record inserted in log when savepoint created
o Contains tid, savepoint identity
Rollback Procedure:
o Scan log backwards using tid to identify update records
o Undo updates using before image
o Terminate scan when appropriate savepoint record encountered
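The abort and savepoint-rollback procedures can be sketched over a log held as a Python list. The record format (dicts with `type`, `tid`, `item`, `before`, `name`) is a hypothetical assumption; updates are undone in last-in-first-out order using their before images.

```python
def rollback(log, db, tid, savepoint=None):
    """Undo tid's updates by scanning the log backwards (immediate-update approach).
    Stops at tid's savepoint record named `savepoint`, or else at tid's begin record."""
    for rec in reversed(log):
        if rec.get("tid") != tid:
            continue                              # record of another transaction
        if rec["type"] == "update":
            db[rec["item"]] = rec["before"]       # restore the before image
        elif rec["type"] == "savepoint" and rec["name"] == savepoint:
            return
        elif rec["type"] == "begin":
            return


log = [
    {"type": "begin", "tid": "T1"},
    {"type": "update", "tid": "T1", "item": "x", "before": 0},
    {"type": "savepoint", "tid": "T1", "name": "s"},
    {"type": "update", "tid": "T1", "item": "x", "before": 1},
    {"type": "update", "tid": "T2", "item": "z", "before": 7},
    {"type": "update", "tid": "T1", "item": "y", "before": 5},
]
db = {"x": 2, "y": 6, "z": 8}
rollback(log, db, "T1", savepoint="s")
print(db)  # {'x': 1, 'y': 5, 'z': 8}
```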
Recovery using a log
Abort all transactions active at time of crash
Problem: How do you identify them?
Solution: abort record or commit record appended to log when transaction terminates
o Recovery Procedure:
Scan log backwards - if the first record of T encountered is an update record, T was active at
the time of the crash. Roll it back
NB: a transaction is not committed until its commit record is in the log
Problem: Scan must retrace entire log
Solution: Periodically append checkpoint record (≠ savepoint!) to log. Contains tid‟s of all
active transactions at time of append
o Backward scan goes at least as far as last checkpoint record appended
o Transactions active at time of crash determined from log suffix that includes last
checkpoint record
o Scan continues until those transactions have been rolled back
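A minimal sketch of how the recovery procedure determines which transactions were active at the crash, assuming a hypothetical log of tuples and a checkpoint record that lists the active tids. Those transactions are then rolled back with their before images, exactly as for an abort.

```python
# Hypothetical record layouts:
#   ("begin", tid), ("update", tid, ...), ("commit", tid), ("abort", tid),
#   ("checkpoint", [tids active when the checkpoint was appended])

def active_at_crash(log):
    """Backward scan to the last checkpoint: return the tids to roll back."""
    terminated, active = set(), set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "checkpoint":
            # active at the checkpoint and not terminated afterwards
            active |= set(rec[1]) - terminated
            break
        tid = rec[1]
        if kind in ("commit", "abort"):
            terminated.add(tid)
        elif tid not in terminated:
            # last record of tid is begin/update: it was still running
            active.add(tid)
    return active


log = [
    ("begin", "T1"), ("update", "T1", "x"),
    ("checkpoint", ["T1"]),
    ("begin", "T2"), ("update", "T2", "y"), ("commit", "T2"),
    ("begin", "T3"),
]
print(sorted(active_at_crash(log)))  # ['T1', 'T3']
```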
Write-ahead log: log buffer in main memory (extension of the log on mass storage), periodically
flushed to mass storage. Important: the log buffer doesn't survive a crash.
Page buffer in main memory (cache), is volatile and has to be flushed to mass storage.
Atomicity and durability complicate algorithms. Requirements:
Write-ahead feature (move update records to log on mass store before database is
updated) necessary to preserve atomicity
New values written by a transaction must be on mass store when its commit record is
written to log (move new values to mass store before commit record) to preserve durability
Transaction not committed until commit record in log on mass store
Forced vs. Unforced writes:
On database page –
o Unforced write updates cache page, marks it dirty and returns control immediately.
o Forced write updates cache page, marks it dirty, uses it to update database page
on disk, and returns control when I/O completes.
On log –
o Unforced append adds record to log buffer and returns control immediately.
o Forced append, adds record to log buffer, writes buffer to log, and returns control
when I/O completes
After a flush of the log buffer, we start with a clean log buffer in volatile memory.
Log Sequence Number (LSN):
Log records are numbered sequentially
Each database page contains the LSN of the update record describing the most recent
update of any item in the page
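A minimal sketch of how the LSN enforces the write-ahead rule, using hypothetical `LogManager` and `Page` classes. The LSN is simply the record's position in the log, and forcing is modelled by advancing `flushed_lsn` rather than doing real I/O: a page may only be written once the log has been forced up to the page's LSN.

```python
from dataclasses import dataclass, field


@dataclass
class LogManager:
    records: list = field(default_factory=list)  # buffer + stable log, in LSN order
    flushed_lsn: int = -1                        # highest LSN already on mass store

    def append(self, record):
        """Unforced append: the record goes to the log buffer only."""
        self.records.append(record)
        return len(self.records) - 1             # LSN = sequence number of the record

    def force(self, up_to_lsn):
        """Forced append: write the buffer to mass store up to `up_to_lsn`."""
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


@dataclass
class Page:
    data: dict
    lsn: int = -1        # LSN of the update record of the most recent update
    dirty: bool = False


def update(log, page, item, new_value):
    lsn = log.append(("update", item, page.data.get(item), new_value))
    page.data[item] = new_value   # unforced write: cache page only
    page.lsn, page.dirty = lsn, True


def flush_page(log, page):
    # Write-ahead: the update record must be on mass store before the page is.
    if page.lsn > log.flushed_lsn:
        log.force(page.lsn)
    page.dirty = False            # (the actual disk write would happen here)


log, p = LogManager(), Page({"x": 1})
update(log, p, "x", 2)
print(log.flushed_lsn)            # -1: the update record is only in the buffer
flush_page(log, p)
print(log.flushed_lsn, p.dirty)   # 0 False
```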
Commit processing: Force Policy
1. Force any update records of T in log buffer
then …
2. Force any dirty pages updated by T in cache
then …
(1) and (2) ensure atomicity (write-ahead policy)
3. Append T's commit record to log buffer
then …
Force log buffer for immediate commit or …
Write log buffer when a group of transactions have committed (group commit)
(2) and (3) ensure durability
Using the force policy, a transaction's updates are in the DB (on mass storage) when it commits.
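A minimal sketch of the force-policy commit order, using plain lists and dicts as hypothetical stand-ins for the log buffer, the log on mass store, the cache and the database on disk. The `group_commit` flag leaves the commit record in the buffer, so the transaction is not yet committed when the function returns.

```python
def flush(log_buffer, stable_log):
    """Forced write of the whole log buffer to the log on mass store."""
    stable_log.extend(log_buffer)
    log_buffer.clear()


def commit_force_policy(tid, pages_of_t, log_buffer, stable_log, cache, disk,
                        group_commit=False):
    # 1. Force the log buffer, which holds T's update records (write-ahead).
    flush(log_buffer, stable_log)
    # 2. Then force every dirty cache page updated by T to the database.
    for page_id in pages_of_t:
        disk[page_id] = dict(cache[page_id])
    # 3. Then append T's commit record to the log buffer ...
    log_buffer.append(("commit", tid))
    if not group_commit:
        flush(log_buffer, stable_log)   # ... and force it (immediate commit)
    # T counts as committed only once its commit record is on mass store.
    return ("commit", tid) in stable_log


log_buffer = [("update", "T1", "p1", "x", 0, 5)]
stable_log, cache, disk = [], {"p1": {"x": 5}}, {"p1": {"x": 0}}
committed = commit_force_policy("T1", ["p1"], log_buffer, stable_log, cache, disk)
print(committed)  # True
print(disk)       # {'p1': {'x': 5}}
```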
Problem: Pages updated by T might still be in cache when T's commit record is appended to log
Solution: Update record contains after image (called a redo record) as well as before image
Write-ahead property still requires that the update record be written to mass store before the page is
But it is no longer necessary to force dirty pages when commit record is written to log on
mass store since all after images precede commit record in log
Referred to as a no-force policy
Recovery processing: No-force policy
Problem: When a crash occurs there might exist some pages in database (on mass store)
containing updates of uncommitted transactions: they must be rolled back
that do not (but should) contain the updates of committed transactions: they must be
rolled forward
Solution: Use a sharp checkpoint
Before appending checkpoint record CK to log buffer, halt processing and force all dirty
pages from cache
Recovery process can assume that all updates in records prior to CK were written to
database (only updates in records after CK might not be in db)
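Example (figure): page p1 (updated by a committed transaction) must be rolled forward using x_new;
page p2 (updated by an uncommitted transaction) must be rolled back using y_old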
1. Pass 1
Log is scanned backward to most recent checkpoint record, CK, to identify transactions
active at time of crash
2. Pass 2
Log is scanned forward from CK to most recent record. The after images in all update
records are used to roll the database forward
3. Pass 3
Log is scanned backwards to begin record of oldest transaction active at time of crash. The
before images in the update records of these transactions are used to roll these
transactions back
This is called DO-UNDO-REDO
(updates – rollback in pass 3 – rollforward in pass 2)
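A minimal sketch of the three passes, assuming sharp checkpoints (all dirty pages forced before CK is appended), physical logging, and a hypothetical log of tuples whose update records carry both the before image and the after image. Rollbacks of aborted transactions are assumed to be logged as ordinary (compensating) update records, as Issue 2 requires, so pass 2 handles them.

```python
# Hypothetical record layouts:
#   ("begin", tid), ("update", tid, item, before, after), ("commit", tid),
#   ("abort", tid), ("checkpoint", [tids active at the sharp checkpoint])

def recover(log, db):
    # Pass 1: scan backward to the most recent checkpoint CK and determine
    # the transactions active at the time of the crash.
    terminated, active, ck = set(), set(), 0   # no checkpoint: use whole log
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] == "checkpoint":
            active |= set(rec[1]) - terminated
            ck = i
            break
        if rec[0] in ("commit", "abort"):
            terminated.add(rec[1])
        elif rec[1] not in terminated:
            active.add(rec[1])

    # Pass 2: scan forward from CK and roll the database forward with the
    # after images of all update records (committed or not).
    for rec in log[ck:]:
        if rec[0] == "update":
            db[rec[2]] = rec[4]

    # Pass 3: scan backward to the begin record of the oldest active
    # transaction, rolling the active transactions back with before images.
    pending = set(active)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "update" and rec[1] in active:
            db[rec[2]] = rec[3]
        elif rec[0] == "begin":
            pending.discard(rec[1])
    return db, active


log = [
    ("begin", "T1"),
    ("update", "T1", "x", 0, 1),
    ("checkpoint", ["T1"]),            # sharp: x = 1 is on disk
    ("update", "T1", "x", 1, 2),
    ("begin", "T2"),
    ("update", "T2", "y", 0, 5),
    ("commit", "T1"),
]
# On disk at the crash: the second update of x was never flushed,
# but the page holding T2's (uncommitted) update of y was.
db, active = recover(log, {"x": 1, "y": 5})
print(db, active)  # {'x': 2, 'y': 0} {'T2'}
```

Running `recover` a second time on the recovered database yields the same state, matching the idempotence argument of Issue 3.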
Issue 1: Database pages containing items updated after CK was appended to log might have been
flushed before crash
No problem – with physical logging, roll forward using after images in pass 2 is idempotent
Rollforward in this case is unnecessary, but not harmful
Issue 2: Some update records after CK might belong to an aborted transaction T1.
These updates are restored in pass 2 but not rolled back in pass 3, since T1 was not
active at time of crash
Treat rollback operations for aborting T1 as ordinary updates and append compensating log
records to log
Issue 3: What if system crashes during recovery?
Recovery is restarted
If physical logging is used, pass 2 and pass 3 operations are idempotent and hence can be repeated
Fuzzy checkpoints
Before writing CK, record the identity of all dirty pages in volatile memory (don't flush them).
Write (flush) the dirty pages in the background
Page corresponding to U1 (x) is recorded
at CK1 and will have been flushed by CK2
Page corresponding to U2 (y) is recorded
at CK2, but might not have been flushed at
time of crash
Pass 2 must start at CK1
Deferred update system
Update: append new value to intentions-list; append update record to log buffer.
Abort: discard intentions-list
Commit: force commit record to log. Update db using intentions list.
Checkpoint record contains list of committed (not active) but incomplete transactions (with their intentions lists).
Scan back to most recent checkpoint record to determine transactions that are committed
but for which updates are incomplete at time of crash
Scan forward to install after images for incomplete transactions
No third pass required since transactions active (not committed) at time of crash have not
affected database
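A minimal sketch of recovery in a deferred-update system, assuming a hypothetical log of tuples. The notes do not describe how the system knows that an intentions list has been fully installed, so this sketch conservatively reinstalls every transaction listed in the checkpoint or committed after it; installing after images again is harmless.

```python
# Hypothetical record layouts:
#   ("update", tid, item, after_image), ("commit", tid),
#   ("checkpoint", [committed tids whose intentions lists were not fully installed])

def recover_deferred(log, db):
    # Backward scan to the most recent checkpoint: transactions that
    # committed but whose updates may be incomplete in the database.
    redo = set()
    for rec in reversed(log):
        if rec[0] == "commit":
            redo.add(rec[1])
        elif rec[0] == "checkpoint":
            redo |= set(rec[1])
            break
    # Forward scan: install their after images in log order. For simplicity
    # this sketch scans the whole log; reinstalling an after image is harmless.
    for rec in log:
        if rec[0] == "update" and rec[1] in redo:
            db[rec[2]] = rec[3]
    # No undo pass: uncommitted transactions never touched the database.
    return db


log = [
    ("update", "T1", "x", 1), ("commit", "T1"),
    ("checkpoint", ["T1"]),        # T1 committed, intentions list not yet installed
    ("update", "T2", "y", 9), ("commit", "T2"),
    ("update", "T3", "z", 4),      # T3 still active at the crash
]
db = recover_deferred(log, {"x": 0, "y": 0, "z": 0})
print(db)  # {'x': 1, 'y': 9, 'z': 0}
```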
Database dump
Simple Dump
System stops accepting new transactions
Wait until all active transactions complete
Dump: copy entire database to a file on mass storage (including mirror)
Restart log and system
Simple Dump Restore
Install most recent dump file
Scan backward through log
Determine transactions that committed since dump was taken
Ignore aborted transactions and those that were active when media failed
Scan forward through log
Install after images of committed transactions
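A minimal sketch of restoring from a simple dump, assuming hypothetical `(kind, tid, ...)` log tuples and that the log was restarted right after the dump, so it only contains records written since then.

```python
# Hypothetical record layouts: ("update", tid, item, after_image),
# ("commit", tid), ("abort", tid)

def restore_simple_dump(dump, log):
    db = dict(dump)                       # install the most recent dump
    # Backward scan: transactions that committed since the dump was taken.
    # Aborted transactions and those active at the media failure are ignored.
    committed = set()
    for rec in reversed(log):
        if rec[0] == "commit":
            committed.add(rec[1])
    # Forward scan: install the after images of the committed transactions.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[3]
    return db


log = [
    ("update", "T1", "x", 5), ("commit", "T1"),
    ("update", "T2", "y", 7), ("abort", "T2"),
    ("update", "T3", "x", 9),             # T3 active when the media failed
]
db = restore_simple_dump({"x": 1, "y": 2}, log)
print(db)  # {'x': 5, 'y': 2}
```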
Fuzzy Dump
Write begin record to log; Copy db records to dump file while system is active.
Naïve restoration
Install dump on disk
Scan log backwards to begin dump record to produce list L of all transactions that
committed since start of dump
Scan log forward and install after images in update records of all transactions in L
Naïve restoration doesn't handle 2 cases:
T commits before the dump starts, but its dirty pages might not have been flushed until the dump
completed. The dump does not read T's updates and T is not in L.
The dump reads T's updates but T later aborts: T is not in L, so its updates remain in the restored database.
H23 Architecture of Transaction processing systems
Three-tier architectures
First single user system.
Presentation services: display forms etc.
Application services: implements user req, interacts with DBMS
TPS = Transaction Processing System
Application server
Sets transaction boundaries
Acts as a workflow controller: implements user request as a sequence of tasks
e.g., registration = (check prerequisites, add student to course, bill student)
Acts as a router
Distributed transactions involve multiple servers
Server classes are used for load balancing
Since workflows might be time consuming and application server serves multiple clients,
application server is often multi-threaded
Transaction server
Stored procedures off-loaded to separate (transaction) servers to reduce load on DBMS.
Transaction server close to DBMS, Application server close to clients
Transaction server does bulk of data processing.
(Figure: interconnection of servers in the three-tier model)
Session and context
A session exists between two entities if they exchange messages while cooperating to perform
some task.
Client/server session: server context (describing client) has to be maintained by server in order to
handle a sequence of client requests.
Direct vs. Queued transaction processing
Direct: Client waits until request is serviced. Service provided as quickly as possible and result is
returned. Client and server are synchronized.
Queued: Request enqueued and client continues execution. Server dequeues request at a later
time and enqueues result. Client dequeues result later. Client and server unsynchronized.
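Queued processing uses three transactions on two recoverable queues (T1: client enqueues the request;
T2: server dequeues the request, processes it and enqueues the result; T3: client dequeues the result). Advantages: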
Client can enter requests even if server is unavailable
Server can return results even if client is unavailable
Request will ultimately be served even if T2 aborts (since queue is transactional)
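A toy model of the three transactions on two recoverable queues. `RecoverableQueue` is a hypothetical in-memory class; real recoverable queues keep their contents on mass storage and log enqueue/dequeue operations. Only T2's abort handling is shown: its dequeue is undone, so the request stays queued and will be served later.

```python
from collections import deque


class RecoverableQueue:
    """Toy stand-in for a transactional queue (contents would be on disk)."""

    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        self.items.append(item)

    def dequeue(self):
        return self.items.popleft()

    def undo_dequeue(self, item):
        self.items.appendleft(item)  # abort: the request reappears at the head


def server_transaction(requests, replies, handler):
    """T2: dequeue a request, process it and enqueue the result, atomically."""
    request = requests.dequeue()
    try:
        replies.enqueue(handler(request))
    except Exception:
        requests.undo_dequeue(request)  # T2 aborts: the request is not lost
        raise


def failing_handler(request):
    raise RuntimeError("server crashed")


requests, replies = RecoverableQueue(), RecoverableQueue()
requests.enqueue("register s1 in c1")   # T1: client enqueues and continues

try:
    server_transaction(requests, replies, failing_handler)
except RuntimeError:
    pass
print(list(requests.items))             # ['register s1 in c1']: still queued

server_transaction(requests, replies, str.upper)
result = replies.dequeue()              # T3: client picks up the result
print(result)                           # REGISTER S1 IN C1
```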
Heterogeneous vs Homogeneous TPS
Homogeneous systems are composed of HW and SW modules of a single vendor
Modules communicate through proprietary (often unpublished) interfaces
Hence, other vendor products cannot be included
Referred to as TP-Lite systems
Heterogeneous systems are composed of HW and SW modules of different vendors
Modules communicate through standard, published interfaces
Referred to as TP-Heavy systems
Middleware is the software that integrates the components of a heterogeneous system and
provides utility services. For example, it supports communication (TCP/IP), security (Kerberos),
global ACID properties, translation (JDBC)
Transaction Manager
Middleware to support global atomicity of distributed transactions
Application invokes manager when transaction is initiated
Manager is informed each time a new server joins the transaction
Application invokes manager when transaction completes
Manager coordinates atomic commit protocol among servers to ensure global atomicity
TP monitors
A TP Monitor is a collection of middleware components that is useful in building heterogeneous
transaction processing systems
Includes transaction manager
Application independent services not usually provided by an operating system
TP Monitor Services
Communication services
Built on message passing facility of OS
Capable of type checking
Peer-to-peer (sessions), RPC, and/or event communication (event broker)
Location transparent
Transactional (TRPC)
Robust against failures
Asynchronous or synchronous
If within a transaction:
Asynchronous: use a persistent (recoverable) queue
Synchronous: requester joins the transaction
ACID properties
Local isolation for a (non-db) server might be provided by a lock manager
Local atomicity for a (non-db) server might be provided by a log manager
Global isolation and atomicity are provided by transaction manager
Routing and load balancing
TP monitor can use load balancing to route a request to the least loaded member of
a server class
Threads can be thought of as low cost processes
Useful in servers (e.g., application server) that might be maintaining sessions for a
large number of clients
TP monitor provides
threads if the OS does not
Recoverable queues
Security services
authentication and authorization
Miscellaneous servers
File server
Clock server
Storage Architectures
Bottleneck in performance is disk I/O
DBMS maintains disk cache in main memory.
Possible storage systems:
RAID, NAS (network-attached storage) and SAN (storage area network)
Architecture of Web transaction processing
A Web application server is a set of tools and modules for building and executing transaction
processing systems for the Web
Including the application server tier of the system
We discuss J2EE (Java 2 Enterprise Edition) standard
J2EE: one language, many platforms
.NET: one platform, many languages; a set of Microsoft products
J2EE defines a set of services and classes particularly oriented toward transaction-oriented
Web services
o Java servlets
o Enterprise Java beans
Enterprise Java Beans
Java classes that implement the business methods of an enterprise
Execution within an infrastructure of services provided by the Web application server
o Supports transactions, persistence, concurrency, authorization, etc.
o Implements declarative transaction semantics
The bean programmer can just declare that a particular method is to be a
transaction and does not have to specify the begin and commit commands
o Bean programmer can focus on business methods of the enterprise rather than on
details of system implementation
Entity bean: represents a persistent business object whose state is stored in the database
o one bean = one table
o one bean instance = one row in table
Session bean: represents a client performing interactions within a session using the
business methods of the enterprise
o can retain state during interactions
o session beans call methods of entity beans (synchronous communication)
o can be transactional (JDBC or JTA: Java Transaction API)
Message-driven bean: is like a session bean but asynchronous (uses JMS, the Java Message Service)
An enterprise bean consists of:
The bean class
o Contains implementations of the business methods of the enterprise
A remote interface (also optionally a local interface)
o Used by clients to access the bean class remotely, using TRPC (or locally with the
local interface); acts as proxy for bean class
o Includes declarations of all the business methods
A home interface (also optionally a local home interface)
o Contains methods that control the bean's life cycle (create, remove) and finder
methods (e.g. FindByPrimaryKey)
A deployment descriptor
o Declarative metadata for the bean
o Describes persistence, transactional, and authorization properties
H24 Distributed Transactions
Atomic Commit Protocol
Global atomicity
All subtransactions of a distributed transaction must commit or all must abort
An atomic commit protocol, initiated by a coordinator (e.g., the transaction manager),
ensures this.
Coordinator polls cohorts (participating databases) to determine if they are all willing to commit
Protocol is supported in the xa-interface between a transaction manager and a resource
manager (e.g., DBMS)
ACID properties
Requirement for each local DBMS
o supports ACID properties locally for each subtransaction
o eliminates local deadlocks
The additional issues are:
o Global atomicity: all cohorts must abort or all commit
o Global deadlocks: there must be no deadlocks involving multiple sites
o Global serialization: distributed transaction must be globally serializable
Cohort Abort
Reasons for abort: Validation failure, deadlock, crash of cohort site, no communication with cohort
Atomic Commit Protocol
Most commonly used atomic commit protocol is the two-phase commit protocol
Implemented as an exchange of messages between the coordinator and the cohorts
Guarantees global atomicity of the transaction even if failures should occur while the
protocol is executing
Two-Phase Commit Protocol
Transaction record resides in volatile memory of transaction manager. Created when application
calls tx_begin.
Phase 1
Application invokes tx_commit
Coordinator sends prepare message to all cohorts
If cohort wants to commit, it moves all update records to mass store by forcing a prepare
record to its log
o Guarantees that cohort will be able to commit (despite crashes) if coordinator
decides commit (since update records are durable)
Cohort enters prepared state
Cohort sends a vote message (“ready” or “aborting”). After voting, the cohort:
cannot change its mind
retains all locks if its vote is “ready”
enters its uncertain period (it cannot foretell the final outcome)
Note that a cohort may abort at any time prior to or on receipt of
the prepare message: it aborts and releases its locks
Coordinator receives vote messages
Coordinator records vote in transaction record
Remember, vote indicates cohort is “ready” to commit or aborting
If any vote is aborting, coordinator decides abort and deletes transaction record
If all are ready, coordinator decides commit, forces commit record (containing
transaction record) to its log (end of phase 1)
Coordinator sends commit or abort message to all cohorts
Transaction is logically committed when commit record is durable
Since all cohorts are in prepared state, transaction can be committed despite any failures
Phase 2
Cohort receives commit message from coordinator
o Cohort commits locally by forcing a commit record to its log
o Cohort sends done message to coordinator
Cohort receives abort message from coordinator
o Cohort aborts
Locks are released and uncertain period ends
Coordinator receives a done message
o Coordinator records receipt of done message
o If all received, coordinator writes a complete record to its log and deletes
transaction record from volatile store
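A single-process sketch of the message flow of both phases, assuming a hypothetical `Cohort` class and a `two_phase_commit` function. Function calls stand in for messages and list appends stand in for forced log writes; failures and timeouts are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    name: str
    wants_commit: bool = True
    log: list = field(default_factory=list)  # stands in for the cohort's stable log
    state: str = "active"

    def prepare(self):
        """Handle the prepare message and return the vote."""
        if not self.wants_commit:
            self.state = "aborted"            # aborts and releases its locks
            return "aborting"
        self.log.append("prepare")            # forced: update records now durable
        self.state = "prepared"               # uncertain period, locks retained
        return "ready"

    def decide(self, outcome):
        """Handle the coordinator's commit or abort message."""
        if outcome == "commit":
            self.log.append("commit")         # forced local commit record
            self.state = "committed"
        else:
            self.state = "aborted"
        return "done"                         # locks released, uncertainty over


def two_phase_commit(cohorts, coordinator_log):
    # Phase 1: send prepare to every cohort and collect the votes.
    votes = [c.prepare() for c in cohorts]
    if all(v == "ready" for v in votes):
        coordinator_log.append("commit")      # forced: logically committed here
        outcome = "commit"
    else:
        outcome = "abort"                     # transaction record simply deleted
    # Phase 2: send the decision and collect the done messages.
    dones = [c.decide(outcome) for c in cohorts]
    if outcome == "commit" and all(d == "done" for d in dones):
        coordinator_log.append("complete")
    return outcome


coord_log = []
cohorts = [Cohort("A"), Cohort("B")]
outcome = two_phase_commit(cohorts, coord_log)
print(outcome, coord_log)  # commit ['commit', 'complete']
```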
A participant recognizes 2 failure situations:
Timeout: no response to a message; execute a timeout protocol
Crash: On recovery, execute a restart protocol
A cohort is blocked when it cannot complete protocol until some failure is repaired.
Timeout protocol
[Cohort] Time out waiting for prepare
Abort the subtransaction
o Since the (distributed) transaction cannot Commit unless cohort votes to commit,
atomicity is preserved
[Coordinator] Time out waiting for vote
Abort the transaction
o Since coordinator controls decision, it can force all cohorts to abort, preserving atomicity
[Cohort] Time out waiting for commit/abort
Cohort is in prepared state
Cohort is blocked since it does not know the coordinator's decision
o Coordinator might have decided commit or abort
o Cohort cannot unilaterally decide since its decision might be contrary to
the coordinator's decision, violating atomicity
o Locks cannot be released
Cohort requests status from coordinator and remains blocked
[Coordinator] Time out waiting for done
Requests done message from delinquent cohort
Restart protocol
On restart cohort finds in its log
begin_transaction record, but no prepare record:
o Abort (transaction cannot have committed because cohort has not voted)
prepare record, but no commit record (cohort crashed in its uncertain period)
o Does not know if transaction committed or aborted
o Locks items mentioned in update records before restarting system
o Requests status from coordinator and blocks until it receives an answer
commit record
o Recover transaction to committed state using log
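The three restart cases amount to a small decision table. A sketch, assuming the restarting cohort can list the kinds of log records (hypothetical names `begin_transaction`, `prepare`, `commit`) it holds for the transaction:

```python
def cohort_restart_action(log_kinds):
    """What a restarting cohort does with one distributed transaction."""
    kinds = set(log_kinds)
    if "commit" in kinds:
        return "recover to committed state using the log"
    if "prepare" in kinds:
        # crashed in its uncertain period: the outcome is unknown
        return "lock updated items, ask coordinator for status, block"
    if "begin_transaction" in kinds:
        # never voted, so the transaction cannot have committed
        return "abort"
    return "no action"  # no records for this transaction: nothing to recover


print(cohort_restart_action(["begin_transaction", "update", "prepare"]))
```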
Cohort in blocked state, but coordinator does not respond to a request for status; either
Wait until the coordinator is restarted
Give up, make a unilateral decision, and attach a fancy name to the situation.
Resolve the potential loss of atomicity outside the system (i.e. through human intervention)
On restart of the coordinator:
Records of transactions in phase 1 are lost (transaction records are in volatile memory)
If there are transactions in phase 2, then we have commit record in the log, but no
complete record: restore transaction record to volatile memory
On receiving a request from a cohort for transaction status
If transaction record exists in volatile memory, reply based on information in the transaction record
If no transaction record exists in volatile memory
o Cohort asks for status, but there is no transaction record at the coordinator; then either
The coordinator aborted the transaction and deleted the transaction record, or
The coordinator crashed, restarted and didn't find a commit record. It was in
Phase 1 of the protocol and had not yet made a decision, or it had
previously aborted the transaction.
o Thus coordinator can respond abort
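A minimal sketch of the coordinator side, assuming a hypothetical stable log of `(kind, tid)` pairs and a dict as the volatile table of transaction records. Answering `abort` when no record exists is what makes it safe to delete the transaction record on abort without logging anything.

```python
def restart_coordinator(stable_log):
    """Rebuild volatile transaction records after a coordinator crash.
    Phase-1 records were volatile and are lost; only transactions with a
    commit record but no complete record (i.e. in phase 2) are restored."""
    committed = {tid for kind, tid in stable_log if kind == "commit"}
    completed = {tid for kind, tid in stable_log if kind == "complete"}
    return {tid: "commit" for tid in committed - completed}


def status_reply(tid, transaction_records):
    """Answer a blocked cohort. No record means the coordinator either aborted
    the transaction or lost it in phase 1 before deciding: reply 'abort'."""
    return transaction_records.get(tid, "abort")


records = restart_coordinator([("commit", "T1"), ("complete", "T1"), ("commit", "T2")])
print(records)                                               # {'T2': 'commit'}
print(status_reply("T2", records), status_reply("T3", records))  # commit abort
```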
Linear Commit
Variation of two-phase commit that involves transfer of coordination
Cohorts are assumed to be connected in a linear chain
When the leftmost cohort A is ready to commit, it goes into the prepared state and sends a vote message
“ready” to B (requesting B to act as coordinator)
After receiving the vote message, B does the same as A and sends vote message to C
When vote message reaches rightmost cohort N, if N is ready to commit it commits the
entire transaction (acts as coordinator) and sends a commit message to its left.
Message goes down the chain until it reaches A.
When A has committed it sends a „done‟ message to B and so on until it reaches N.
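A minimal sketch that lists the messages of a successful linear commit, assuming cohorts given as hypothetical `(name, ready)` pairs from left to right. What happens when a cohort is not ready is not described in the notes; the sketch just reports abort. In this model a successful run uses 3(n-1) messages for n cohorts.

```python
def linear_commit(chain):
    """chain: list of (name, ready) pairs, leftmost cohort first."""
    messages = []
    # "ready" votes travel right; each sender is in the prepared state.
    for i, (name, ready) in enumerate(chain):
        if not ready:
            return "abort", messages     # abort handling not modelled here
        if i < len(chain) - 1:
            messages.append((name, chain[i + 1][0], "ready"))
    # The rightmost cohort decides commit; "commit" travels left.
    for i in range(len(chain) - 1, 0, -1):
        messages.append((chain[i][0], chain[i - 1][0], "commit"))
    # "done" messages travel right again, from A to N.
    for i in range(len(chain) - 1):
        messages.append((chain[i][0], chain[i + 1][0], "done"))
    return "commit", messages


outcome, msgs = linear_commit([("A", True), ("B", True), ("C", True)])
print(outcome, msgs)
```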
Two-phase commit without prepared state
Assume exactly one cohort C does not support a prepared state.
Coordinator performs Phase 1 of two-phase commit protocol with all other cohorts
If they all agree to commit, coordinator requests that C commit its subtransaction (in
effect, requesting C to decide the transaction's outcome)
C responds commit/abort, and the coordinator sends a commit/abort message to all other cohorts
Global Deadlock
Not always detectable on any one site (it may be distributed over multiple sites).
Detection by a simple extension of local deadlock detection: check for a cycle with a probe. The
probe is sent to the coordinator of the cohort it is waiting for. When the probe returns to its initiator, a deadlock is detected.
Prevention by using timestamps: an older transaction never waits for a younger one; the younger
one is aborted.
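A minimal sketch of both ideas, with the distributed waits-for graph flattened into one hypothetical dict (in a real system each edge is known only at its own site and the probe travels as messages between coordinators). `resolve_conflict` follows the rule above, where an older requester aborts a younger holder; letting a younger requester wait is the wound-wait convention, assumed here.

```python
def probe_finds_deadlock(waits_for, initiator):
    """waits_for maps a transaction to the transactions it waits for.
    The probe follows the edges; a deadlock exists if it returns to the initiator."""
    seen, to_visit = set(), list(waits_for.get(initiator, ()))
    while to_visit:
        t = to_visit.pop()
        if t == initiator:
            return True
        if t not in seen:
            seen.add(t)
            to_visit.extend(waits_for.get(t, ()))
    return False


def resolve_conflict(requester_ts, holder_ts):
    """Timestamp prevention: a smaller timestamp means an older transaction."""
    if requester_ts < holder_ts:
        return "abort holder"        # older never waits for younger
    return "requester waits"         # younger waits for older (wound-wait)


print(probe_finds_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}, "T1"))  # True
print(probe_finds_deadlock({"T1": ["T2"], "T2": ["T3"]}, "T1"))                # False
```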
If all sites use strict two-phase locking and the transaction manager uses a two-phase
commit protocol,
Then transactions are globally serializable in commit order.
Replication
Advantages:
o Improves availability: data can be accessed even though some site has failed
o Can improve performance: a transaction can access the closest (perhaps local) replica
Disadvantages:
o More storage
o Mutual consistency of replicas must be maintained
o Access by concurrent transactions to different replicas can lead to incorrect results
Replica control
Knows location of all replicas
Translates a transaction's request to access an item into a request to access particular replicas
Maintains some form of mutual consistency:
o Strong: all replicas always have the same value (in every committed version of the database)
o Weak: all replicas eventually have the same value
o Quorum: a quorum of replicas have the same value (a certain set of servers have
the same data)
Read request: use nearest replica
Write request: update all replicas
Synchronous: immediately
Asynchronous: eventually
Quorum Consensus replica control
To read using quorums, timestamps are used to select the right values. See also example sheet
(made during lecture).
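A minimal sketch, assuming the usual quorum conditions r + w > n (every read quorum overlaps every write quorum) and 2w > n (two write quorums overlap), with each replica storing a hypothetical `(timestamp, value)` pair. A read then returns the value with the newest timestamp in its quorum.

```python
def valid_quorums(n, r, w):
    """Read/write quorums must intersect, and so must two write quorums."""
    return r + w > n and 2 * w > n


def quorum_write(replicas, write_quorum, value, ts):
    # ts is assumed to exceed every timestamp held by the write quorum
    for i in write_quorum:
        replicas[i] = (ts, value)


def quorum_read(replicas, read_quorum):
    # the newest timestamp in the read quorum identifies the current value
    return max((replicas[i] for i in read_quorum), key=lambda tv: tv[0])[1]


replicas = [(0, "old")] * 5
quorum_write(replicas, {0, 1, 2}, "new", ts=1)
value = quorum_read(replicas, {2, 3, 4})  # overlaps the write quorum in replica 2
print(value)  # new
```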
Primary copy control
One copy designated primary, other copies secondary.
Reading is from the nearest copy. Writing is on the primary copy. After commit, updates are
propagated to the secondary copies; thus it's asynchronous, hence good for performance but bad for consistency.
