Database Transactions and Processes
Summary 192110982
H18 ACID properties of transactions
ACID properties of a DB
Atomicity: Every transaction is either executed completely or not at all (and then has no
effect).
Consistency: The DB must be consistent; its values must represent the correct values. Executing
a transaction in isolation preserves DB consistency and moves the DB to a new state that
reflects the real world.
Isolation: The concurrent execution of a set of transactions has the same effect as a
serial execution of the same set.
Durability: The results of committed transactions are permanent.
Consistency
The DB must satisfy all integrity constraints (not all DB states are allowed):
Internal consistency: e.g. referential integrity and replicated data
Enterprise rules: enforcement of procedures.
When a transaction is executed, it must satisfy these constraints. If it does not, there is
no certainty about how the DB will behave, so the data is not reliable.
Transaction consistency is the responsibility of the designer of the transaction, who must
ensure that once a transaction has executed the DB is in a correct state, i.e. one that
satisfies the integrity constraints.
Levels of consistency:
Static
Dynamic: this cannot really be tested in a single DB state.
A transaction is a unit of work: a transaction must do all the work needed to update the DB. This
work can never be split over 2 transactions, because that would violate consistency.
Atomicity
Atomic execution implies that every transaction either commits or aborts.
In a DB, the changes of an aborted transaction are undone or rolled back (i.e. the old situation is restored).
Reasons for an abort:
1. Violation of integrity constraints
2. Deadlock
3. Resources unavailable (e.g. HD crash), violating the Isolation requirement
Durability
The system (DB) must guarantee that once a transaction commits, its effects remain in the DB
even if the computer or the storage medium fails.
An OS normally does not provide this, but the DB must guarantee it. One way to do so is,
for example, mirrored disks or disks in a RAID configuration.
Durability is relative: one must decide what it is worth and therefore which degree is implemented.
Isolation
Concurrent execution of transactions. The net effect must nevertheless be that of a
serial execution. So, to satisfy the Isolation property, a concurrent
schedule must be serializable.
The ACID properties guarantee that the DB is a correct, consistent and up-to-date model of the
real world. Although strictly enforcing, for example, Isolation is important for the functioning
of the DB, it is very conservative. A DB can function well without all transactions being
isolated, but this carries a risk. Conservative ACID puts a heavy
load on the system.
Interleaving: the intermixing of different transactions in time (see also the schedules).
H19 Models of transactions
Flat transactions
Some limitations

Single DBMS

All or nothing; no continue from certain point

No human or device action possible as part of transaction (cannot rollback)

Everything at once; no possibility to defer to a later time without losing ACID
A possible solution is to use savepoints.
Save points
Used for partial rollback of db.

Rollback to savepoint sp_i causes db updates made after the creation of sp_i to be undone
o S2 and S3 updated the database (else no point rolling back over them)

Program counter and local variables are not rolled back

Savepoint creation does not make prior database changes durable (abort rolls all changes
back)
Implementation of save points:

When Ti creates a savepoint, s, insert a marker for s in Ti's lock list, Li, that separates lock
entries acquired before creation from those acquired after creation

When Ti rolls back to s, release all locks following the marker for s in Li (in addition to undoing
all updates made since savepoint creation)
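The marker idea can be sketched in a few lines. This is only an illustration under assumptions that are not in the notes: the class `Transaction`, its methods, the dictionary `db`, unique savepoint names, and the use of `None` as "item did not exist" are all hypothetical choices.

```python
class Transaction:
    """Toy transaction: lock list and undo log both carry savepoint markers."""

    def __init__(self):
        self.lock_list = []  # ("lock", item) or ("savepoint", name)
        self.undo_log = []   # ("update", item, before_image) or ("savepoint", name)

    def held_items(self):
        return [e[1] for e in self.lock_list if e[0] == "lock"]

    def write(self, db, item, value):
        if item not in self.held_items():
            self.lock_list.append(("lock", item))
        # Assumption: a before image of None means the item did not exist.
        self.undo_log.append(("update", item, db.get(item)))
        db[item] = value

    def savepoint(self, name):
        marker = ("savepoint", name)  # assumption: names are unique
        self.lock_list.append(marker)
        self.undo_log.append(marker)

    def rollback_to(self, db, name):
        marker = ("savepoint", name)
        if marker not in self.lock_list:
            raise ValueError(f"unknown savepoint {name!r}")
        # Undo updates made since the savepoint, newest first.
        while self.undo_log[-1] != marker:
            entry = self.undo_log.pop()
            if entry[0] == "update":
                _, item, before = entry
                if before is None:
                    db.pop(item, None)
                else:
                    db[item] = before
        # Release every lock that follows the marker in the lock list.
        cut = self.lock_list.index(marker) + 1
        released = [e[1] for e in self.lock_list[cut:] if e[0] == "lock"]
        del self.lock_list[cut:]
        return released
```

Locks acquired before the savepoint stay held, so the partial rollback does not expose earlier updates of the same transaction.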
Distributed Transactions
Many enterprises support multiple legacy systems doing separate tasks. Increasing automation
requires that these systems be integrated.
Goal: distributed transaction should be ACID

Each subtransaction is locally ACID (e.g., local constraints maintained, locally serializable)

In addition the transaction should be globally ACID

A: Either all subtransactions commit or all abort

C: Global integrity constraints are maintained

I: Concurrently executing distributed transactions are globally serializable

D: Each subtransaction is durable
Distributed Database: one design but due to size split up over several hosts
Multidatabase: Several designs (may be used by several companies)
Models:

Hierarchical Model:
o No concurrency among subtransactions, root initiates commit

Peer Model:
o Concurrency among siblings and between parent and children, any subtransaction can initiate commit
Nested Transactions

Parent can create children to perform subtasks; children may execute sequentially or
concurrently; parent waits until all children complete (no communication between parent
and children).

Each subtransaction (with its descendants) is isolated wrt each sibling (and its
descendants). Hence, siblings are serializable, but order is not determined and nested
transaction is non-deterministic.

Concurrent nested transactions are serializable.

A subtransaction is atomic. It can abort or commit independently of other subtransactions.
Commit is conditional on commit of parent (since child task is a subtask of parent task).
Abort causes abort of all of the subtransaction's children.

Nested transaction commits when root commits. At that point updates of committed
subtransactions are made durable.
Locking implementation of nested transactions
Nested transactions satisfy:

Nested transactions are isolated with respect to one another

A parent does not execute concurrently with its children

A child (and its descendants) is isolated from its siblings (and their descendants)
Acquiring a read lock
A request to read x by subtransaction T2 of nested
transaction T1 is granted if:

No other nested transaction holds a write lock
on x

All other subtransactions of T1 holding write
locks on x are ancestors of T2 (hence are not
executing)
Acquiring a write lock
A request to write x by subtransaction T2 of
nested transaction T1 is granted if:

No other nested transaction holds a read
or write lock on x

All other subtransactions of T1 holding
read or write locks on x are ancestors of
T2 (and hence are not executing)
All locks obtained by T2 are held until it completes

If it aborts, all locks are released

If it commits, any locks it holds that are not held by its parent are inherited by its parent
When top-level transaction (and hence entire nested transaction) commits, all locks are released.
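The granting rules above might be checked as in the following sketch. The class `Sub`, the functions `may_lock` and `inherit_on_commit`, and the per-item list of `(holder, mode)` pairs are hypothetical names chosen for illustration; the notes do not prescribe a data structure.

```python
class Sub:
    """A (sub)transaction in a nesting tree; the root has parent None."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def root(self):
        return self if self.parent is None else self.parent.root()

    def ancestors(self):
        node, out = self.parent, []
        while node is not None:
            out.append(node)
            node = node.parent
        return out


def may_lock(requester, mode, holders):
    """holders: (subtransaction, mode) pairs currently locking one item."""
    for holder, held_mode in holders:
        if holder is requester:
            continue
        # A read request conflicts only with write locks.
        if mode == "read" and held_mode == "read":
            continue
        if holder.root() is not requester.root():
            return False  # another nested transaction holds a conflicting lock
        if holder not in requester.ancestors():
            return False  # conflicting holder in the same tree is not an ancestor
    return True


def inherit_on_commit(child, holders):
    """Holder list for one item after `child` (which has a parent) commits."""
    parent_modes = {m for h, m in holders if h is child.parent}
    result = [(h, m) for h, m in holders if h is not child]
    for h, m in holders:
        if h is child and m not in parent_modes:
            result.append((child.parent, m))  # parent inherits the lock
            parent_modes.add(m)
    return result
```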
Chained transactions

Chaining allows a transaction to be decomposed into sub-transactions
with intermediate commit points.

Database updates are made durable at intermediate points => less
work is lost in a crash
Chaining compared with savepoints:

Savepoint: explicit rollback to arbitrary savepoint; all updates lost in a crash (put on
intention list!)

Chaining: abort rolls back to last commit; only the updates of the most recent transaction
lost in a crash
ACID properties, what did we lose?
Atomicity: lost
Isolation: lost
Consistency: needs consistency for subtransactions
Durability: kept!
One approach to atomicity problem is compensation (see SAGAS)
SAGAS

Sagas are an extension to chained transactions that achieves partial atomicity

For each subtransaction, STi in a chained transaction T, a compensating transaction, CTi is
designed

Thus if a transaction T consisting of 5 chained subtransactions aborts after the first 3
subtransactions have committed, then ST1 ST2 ST3 CT3 CT2 CT1 will perform the desired
compensation

With this type of compensation, when a transaction aborts, the value of every item it
changed is eventually restored to the value it had before that transaction started

However, complete atomicity is not guaranteed
o Some other concurrent transaction might have read the changed value before it
was restored to its original value

Other problems?
o non compensatable transactions (reset(X))
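A minimal sketch of the compensation order described above. The function `run_saga`, the helper `make_step`, and the convention that a failing subtransaction raises an exception are assumptions made for the example only.

```python
def run_saga(steps, state):
    """steps: list of (subtransaction, compensation) callables taking `state`.

    Each subtransaction is assumed to commit on its own. If one fails,
    the compensations of the committed ones run in reverse order.
    """
    compensations = []
    for subtransaction, compensation in steps:
        try:
            subtransaction(state)
        except Exception:
            for undo in reversed(compensations):
                undo(state)
            return False
        compensations.append(compensation)
    return True


def make_step(i, log, fail=False):
    """Hypothetical step: ST_i adds i to state['v'], CT_i subtracts it again."""

    def st(state):
        if fail:
            raise RuntimeError(f"ST{i} aborted")
        state["v"] += i
        log.append(f"ST{i}")

    def ct(state):
        state["v"] -= i
        log.append(f"CT{i}")

    return st, ct
```

Between a subtransaction's commit and its compensation the changed value is visible to other transactions, which is why the atomicity is only partial.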
Recoverable queues
A recoverable queue is a transactional data structure in which information about transactions to be
executed later can be durably stored.
See also slide 31 and 32 for example
Queue could be implemented within the DB but performance suffers. A transaction should not hold
long duration locks.
Separate implementation takes advantage of semantics to improve performance.
Since the queue is implemented by a separate server (different from the DBMS), the locking
discipline need not be two-phase; the discipline can be designed to suit the semantics of the
abstract operations enqueue and dequeue

Lock on head (tail) pointer released when dequeue (enqueue) operation completes (hence not
strict or isolated)

Lock on entry that is enqueued or dequeued held to commit time
Queue and DBMS are two separate systems. Transactions must be committed at both, but
isolation is implemented at the DBMS and applies to the schedule of requests made to the
DBMS only.
As a result, any scheduling policy for accessing the queue might be enforced.
Real-world actions.
A real-world action performed from within a
transaction T, cannot be rolled back if crash occurs
before commit. On recovery after a crash, how can
we tell if the action has occurred?
Solution: device maintains read-only counter
(hardware) that is automatically incremented with
each action.
On recovery:
1. recover queue and database;
2. read recorded value of counter from database;
3. if (device value > recorded value)
      then discard head entry;   // device performed action
      else ;                     // device did not perform action
4. restart server(s);            // which re-initiates transaction, because request was restored on queue
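Step 3 of the recovery procedure is a single comparison. The sketch below assumes the queue is a list with the head first and that both counter values are plain integers; the name `recover_queue` is hypothetical.

```python
def recover_queue(queue, recorded_counter, device_counter):
    """Return the request queue to restart the server with after a crash.

    recorded_counter: counter value stored in the (recovered) database.
    device_counter:   current value read from the device's hardware counter.
    """
    if device_counter > recorded_counter:
        # The device performed the action before the crash, so the head
        # request must not be executed a second time.
        return list(queue[1:])
    # The device did not act; the restored head request is simply retried.
    return list(queue)
```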
Workflows
A workflow is a model of a complex, long-running enterprise process generally performed in a
highly distributed and heterogeneous environment.
Workflow task

Self-contained job performed by an agent
o Inventory transaction (agent = database server)
o Packing task (agent = human)

Has an associated role that defines type of job
o An agent can perform specified roles

Accepts input from other tasks, produces output

Has physical status: committed, aborted, ...
o Committed task has logical status: success, failure
Examples see slides 46 and 47 of lecture.


ACID properties
Individual tasks might be ACID, but workflow as a whole is not
o Some task might not be essential: its failure is ignored even though workflow
completes
o Concurrent workflows might see each other's intermediate state
o Might not choose to compensate for a task even though workflow fails
Each task is either
o Retriable: Can ultimately be made to commit if retried a sufficient number of
times (e.g., deposit)
o Compensatable: Compensating task exists (e.g., withdraw)
o Pivot: Neither retriable nor compensatable (e.g., buy a nonrefundable ticket)
Importance



Allows management of an enterprise to guarantee that certain activities are carried out in
accordance with established business rules, even though those activities involve a
collection of agents, perhaps in different locations and perhaps with minimal training
Audit trail
Statistics … management information
H20 Implementing Isolation
Commutativity
Database operations p1 and p2 commute if, for all initial DB states, they:
a) Return the same results and
b) Leave the DB in the same final state, regardless of the order of execution.
p1 commutes with p2 if

They operate on different data items
o w1(x) commutes with w2(y) and r2(y)

Both are reads
o r1(x) commutes with r2(x)
Operations that do not commute conflict
o w1(x) conflicts with w2(x)
o w1(x) conflicts with r2(x)
For examples of commuting operations see slides 9 and 10 of the 2nd lecture
Conventional Operations
We abstract operations to two classes:

Read
o r(x, X) - copy the value of database variable x to local variable X

Write
o w(x, X) - copy the value of local variable X to database variable x

We use r1(x) and w1(x) to mean a read or write of x by transaction T1
Serializable schedules:

S is serializable if it is equivalent to a serial schedule.

Transactions are totally isolated in a serializable schedule

A schedule is correct for any application if it is a serializable schedule of consistent
transactions
Serializability (the strictest isolation level, in effect predicate locking) provides a
conservative definition of correctness. It causes a lot of overhead, so
performance and throughput decrease. Some schedules that are not serializable are nevertheless
acceptable (lower isolation levels).
Conflict equivalence vs view equivalence
Up to now, a schedule is serializable if it is conflict equivalent to a serial schedule.
Conflict equivalence, definition: schedules S1 and S2 are conflict equivalent if their conflicting
operations are ordered in the same way in both.
Two schedules of the same set of operations are view equivalent if:

Corresponding read operations in each return the same values (so the computations are the
same).

And both schedules yield the same final database state
Conflict equivalence implies view equivalence. View equivalence does not imply conflict
equivalence. Conflict equivalence is therefore stronger than view equivalence.
Serialization graphs
A schedule is (conflict) serializable if and only if its serialization graph is cycle-free.
Take the following steps:
1. Find the conflicts
2. Draw the graph
3. Check whether the graph contains a cycle.
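The three steps can be sketched as follows. The representation of a schedule as a list of `(transaction, kind, item)` tuples and all function names are assumptions made for this example.

```python
def serialization_graph(schedule):
    """schedule: list of (txn, kind, item) with kind 'r' or 'w', in execution order.

    Returns the set of edges (Ti, Tj): an operation of Ti conflicts with a
    later operation of Tj (same item, different transactions, at least one write).
    """
    edges = set()
    for i, (t1, k1, x1) in enumerate(schedule):
        for t2, k2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(serialization_graph(schedule))
```

For example, r1(x) w2(x) w1(x) gives the edges T1 -> T2 and T2 -> T1, so that schedule is not conflict serializable.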
Dirty read: reading data written by a transaction that has not yet committed
Dirty write: T1 overwrites data written by a still-active transaction.
[Figure: example schedule in which T1 and T2 each perform w(x) and then abort]
[Figure: graph with nodes T1, T2 and T3]
Strict schedule: dirty reads and dirty writes are prohibited
Note: a strict schedule is not the same as a serializable schedule.
Concurrency Control
Find a schedule with a correct interleaving (concurrent schedules that are equivalent to serial
schedules). The CC cannot see the whole schedule, so it follows a strategy.
Applying CC can increase response time considerably and can limit throughput
considerably.
Strategy: do not serve a request if:
1. It violates strictness or serializability
2. There is a possibility that a subsequent arrival may cause a violation of
serializability.
Models of CC:
Immediate update:

Write updates a db item

Read copies value from db item

Commit makes updates durable

Abort undoes updates
Deferred update:

Write stores value in transaction's intention list.

Read copies value from db or intention list

Commit uses intention list to durably update db

Abort discards intention list.
Pessimistic control:

Transaction requests permission for each DB (read/write) operation

CC can:
o Grant operation
o Delay it until a subsequent event occurs (commit or abort of other transaction)
o Abort the transaction

Decisions made conservatively so that every commit req. can be granted.

Takes precautions even if conflicts do not occur.
Optimistic control:

Requests for DB operations are always granted (read/write)

Request to commit might be denied
o Transaction is aborted if it performed a non-serializable operation

Assumes conflicts are not likely
Immediate Update Pessimistic CC
Rule (can be used at each arriving request):

Don't grant request that imposes an ordering amongst active transactions (delay the
requesting transaction).

Grant request that doesn't conflict with previously granted requests of active transactions.
A transaction is forced to wait (if request is delayed). Delayed requests are reconsidered when a
transaction completes (commit or abort, but it becomes inactive)
Result: Each schedule is equivalent to a serial schedule in which transactions are ordered in the
order in which they commit (=commit order)
Locking:
A transaction can

read if it holds a Read (shared) lock (granted if no other transaction currently holds a write
lock on that item)

write and update if it holds a Write (exclusive) lock (granted if no other transaction holds any
lock on that item)

is delayed if its request cannot be granted
Lock conflict table (X = conflict):
                    Granted mode
Requested mode      read      write
read                          X
write               X         X
All locks are released when transaction completes
Lock is not granted if the request conflicts with the rule (see above), thus the transaction waits.
Result: schedules are serializable and strict.
Implementation of locks using Lock set L(x) and Wait set W(x).
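One possible shape of such an implementation is sketched below. The class `LockManager`, its method names, and the FIFO treatment of the wait set are assumptions; lock upgrades by a transaction that already holds a read lock are left out.

```python
from collections import defaultdict, deque


class LockManager:
    def __init__(self):
        self.lock_set = defaultdict(list)   # L(x): granted (txn, mode) pairs
        self.wait_set = defaultdict(deque)  # W(x): delayed (txn, mode) requests

    def _compatible(self, item, txn, mode):
        # Conflict whenever another transaction holds a lock and either
        # the held or the requested mode is "write".
        for holder, held in self.lock_set[item]:
            if holder != txn and "write" in (mode, held):
                return False
        return True

    def request(self, txn, item, mode):
        """Grant the lock and return True, or queue the request and return False."""
        if not self.wait_set[item] and self._compatible(item, txn, mode):
            self.lock_set[item].append((txn, mode))
            return True
        self.wait_set[item].append((txn, mode))
        return False

    def release_all(self, txn):
        """Called at commit or abort; returns the requests that were granted."""
        granted = []
        for item in list(self.lock_set):
            self.lock_set[item] = [(t, m) for t, m in self.lock_set[item] if t != txn]
            waiting = self.wait_set[item]
            while waiting and self._compatible(item, *waiting[0]):
                req = waiting.popleft()
                self.lock_set[item].append(req)
                granted.append((item,) + req)
        return granted
```

Because locks are only released in `release_all`, this sketch holds every lock until the transaction completes, as the rule above requires.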
Deadlock: a CC that causes transactions to wait can cause deadlocks.
Solution:

Abort one transaction in the cycle

Use wait-for graph to detect cycle

Assume deadlock when a transaction waits longer than a time-out period
Manual locking
Manual lock release when finished accessing an item. Better performance, but due to early lock
release non-serializable schedules are possible.
Two-phase locking
A transaction doesn't release any lock until it has all the locks it will ever require (first
phase). Then locks which are no longer needed are released (unlocking phase).
Schedule produced by a two-phase locking control:

Equivalent to a serial schedule (ordered by time of first unlock operation)

Not necessarily recoverable (dirty reads and writes possible).

A two-phase locking control that holds write locks until commit produces strict serializable
schedules

A strict two-phase locking control holds all locks until commit and produces strict and
serializable schedules
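Whether one transaction's lock requests obey the two-phase rule can be checked mechanically. In this sketch the function name `is_two_phase` and the encoding of actions as `("lock" | "unlock", item)` pairs are assumptions.

```python
def is_two_phase(actions):
    """actions: sequence of ("lock" | "unlock", item) issued by one transaction.

    Two-phase means no lock is acquired after the first lock has been released.
    """
    unlocked = False
    for kind, _item in actions:
        if kind == "unlock":
            unlocked = True      # the unlocking phase has begun
        elif kind == "lock" and unlocked:
            return False         # acquiring again violates the rule
    return True
```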
Lock granularity

Table locking (coarse)
o Lock entire table when a row is accessed.

Row (tuple) locking (fine)
o Lock only the row that is accessed.

Page locking (compromise)
o When a row is accessed, lock the containing page
Deferred Update Optimistic CC
Under the optimistic assumption that conflicts don't occur, read and write requests are always granted
(no locking, no overhead!).
Transaction has three phases:

Begin transaction
o Read Phase - transaction executes: reads from database, writes to intentions list
(deferred-update, no changes to database)

Request commit
o Validation Phase - check whether conflicts occurred during read phase; if yes abort
(discard intentions list)

Commit
o Write Phase - write intentions list to database (deferred update) if validation
successful

For simplicity, we assume here that validation and write phases form a single critical
section (only one transaction is in its validation/write phase at a time)

Guarantees an equivalent serial schedule in which the order of transactions is the order in
which they enter validation (dynamic)
Validation
A transaction is validated when it wants to commit. When T1 enters validation, the control checks
whether T1 conflicted with any other transaction. Validation phases may not overlap!
Advantage: No deadlock
Disadvantage: No rollback possibility
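The three phases can be sketched as below. The notes do not say which validation test is used; this example assumes a common backward-validation rule (abort if a transaction that committed during the read phase wrote an item this transaction read). `OptimisticCC` and its methods are hypothetical names.

```python
class OptimisticCC:
    def __init__(self, db):
        self.db = db
        self.committed = []  # write sets of committed transactions, in validation order

    def begin(self):
        return {"start": len(self.committed), "reads": set(), "intentions": {}}

    def read(self, t, item):
        if item in t["intentions"]:
            return t["intentions"][item]  # the transaction sees its own writes
        t["reads"].add(item)
        return self.db.get(item)

    def write(self, t, item, value):
        t["intentions"][item] = value     # deferred update: db is not touched

    def commit(self, t):
        # Validation phase (assumed rule): nobody who committed since
        # t started may have written something t read.
        for write_set in self.committed[t["start"]:]:
            if write_set & t["reads"]:
                return False              # abort: intentions list is discarded
        # Write phase: apply the intentions list.
        self.db.update(t["intentions"])
        self.committed.append(set(t["intentions"]))
        return True
```

Validation and write happen inside one call here, which mirrors the single critical section assumed above.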
H21 Isolation in relational databases
Phantoms
Example: all rows where the name is 'mary' are locked (or selected) and a sum is calculated.
A new row with the name 'mary' can still be inserted. This new row is called a phantom.
Phantoms occur when row locking is used. Phantoms can be prevented by using table locking or
predicate locking.
Locking

Predicate locking prevents phantoms and produces serializable schedules, but is too
complex to implement

Table locking prevents phantoms and produces serializable schedules, but negatively
impacts performance

Row locking does not prevent phantoms and can produce nonserializable schedules
Predicate locking

A predicate describes a set of rows, some of which are in a table and some not; e.g. name =
'Mary'

Every SQL statement has an associated predicate

When executing a statement, acquire a (read or write) lock on the associated predicate

Two predicate locks conflict if one is a write and there exists a row (not necessarily in the
table) that is contained in both

Locking is conservative: there might be no rows in Accounts satisfying both predicates
Non-repeatable read
With a non-repeatable read, execution of same SELECT twice yields the same set of rows, but
attribute values might be different
SQL Isolation Levels

READ UNCOMMITTED – dirty reads, non-repeatable reads, and phantoms allowed

READ COMMITTED - dirty reads not allowed, but non-repeatable reads and phantoms
allowed

REPEATABLE READ – dirty reads, non-repeatable reads not allowed, but phantoms allowed

SERIALIZABLE – dirty reads, non-repeatable reads, and phantoms not allowed; all
schedules must be serializable
Locking implementation of SQL Isolation Levels
Locking implementation is based on:

Entities locked: rows, predicates, …

Lock modes: read and write

Lock duration:
o Short - locks acquired in order to execute a statement are released when
statement completes
o Long - locks acquired in order to execute a statement are held until transaction
completes
o Medium – something in between
Write locks are handled identically at all isolation levels:

Long-duration predicate write locks are associated with UPDATE, DELETE, and INSERT
statements. This rules out dirty writes

In practice, predicate locks are implemented with table locks or by acquiring locks on an
index as well as the data.
Read locks are handled differently at each level:

READ UNCOMMITTED: no read locks
o Hence a transaction can read a write-locked item!
o Allows dirty reads, non-repeatable reads, and phantoms

READ COMMITTED: short-duration read locks on rows returned by SELECT
o Prevents dirty reads, but non-repeatable reads and phantoms are possible

REPEATABLE READ: long-duration read locks on rows returned by SELECT
o Prevents dirty and non-repeatable reads, but phantoms are possible

SERIALIZABLE: long-duration read lock on predicate specified in WHERE clause
o Prevents dirty reads, non-repeatable reads, and phantoms and …
o guarantees serializable schedules
Some DBMSs allow only read-only transactions to be executed on READ UNCOMMITTED level.
Cursor stability
Cursor stability is a commonly implemented isolation level, which is an extension of READ
COMMITTED.

Long-duration write locks on predicates

Short-duration read locks on rows

Additional locks for handling cursors
Access by T1 through a cursor, C, generally involves OPEN followed by a sequence of FETCHs

C is INSENSITIVE: rows FETCHed cannot be affected by concurrent updates (since OPEN is
isolated)

C is not INSENSITIVE: some rows FETCHed might have been updated by a concurrent
transaction, T2, and others might not
Read lock on row accessed through cursor is medium-duration; held until cursor is moved
Example
Allowed at READ COMMITTED, hence lost update possible. Not allowed at CURSOR STABILITY
(since T1 accesses t through a cursor).
There is a possibility of deadlock when both transactions access t through a cursor
Update locks
An update lock conflicts with other update and write locks, but not with read locks (see schedule
below)
Some DBMS provide update locks to alleviate deadlock problem

A transaction that wants to read an item now and possibly update it later requests an
update lock on the item (manual locking)

An update lock is a read lock that can be upgraded to a write lock. Often used with
updatable cursors
Optimistic Read Committed
It's called optimistic because the transaction assumes that no other transaction will write what
it has read, hence it gives up its read lock.
T1 aborts if it tries to write a tuple which it has previously read and which, in the meantime,
some other transaction has written and committed.
Other types of locking
Intention locking
Performance improvement possible if lock on parent is weak

Intention shared (IS) lock: in order to get an S lock on an item, T must first get IS locks
on all containing items (to root of hierarchy)

Intention exclusive (IX) lock: in order to get an X lock on an item, T must first get IX
locks on all containing items (to root of hierarchy)

Shared Intention Exclusive (SIX): Equivalent to an S lock and an IX lock on an item

Intention lock indicates transaction's intention to acquire conventional lock on a contained
item
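The notes do not list which of these modes conflict. The sketch below uses the compatibility matrix commonly given in the literature for IS, IX, S, SIX and X, and shows which locks a request for S or X on a leaf item implies. `COMPATIBLE`, `compatible` and `locks_needed` are hypothetical names.

```python
# For each requested mode: the held modes it can coexist with on the same item.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


def compatible(requested, held):
    return held in COMPATIBLE[requested]


def locks_needed(path, mode):
    """path: containing items from the root down to the target item.

    mode is "S" or "X". Every containing item gets the matching intention
    lock; only the target itself gets the conventional lock.
    """
    intent = "IS" if mode == "S" else "IX"
    return [(item, intent) for item in path[:-1]] + [(path[-1], mode)]
```

Two transactions updating different rows both hold IX on the table, which is compatible, so the weak lock on the parent does not serialize them.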
Index locking
Locking of index pages

If a WHERE clause refers to a predicate name = mary and if there is an index on name,
then an index lock on the index entries for name = mary is like a predicate lock on that
predicate

If a WHERE clause refers to a predicate such as 50000< salary < 70000 and if there is an
index on salary, then a key-range index lock can be used to get the equivalent of a
predicate lock on the predicate 50000<salary<70000
Key-range locking
Index entries at leaf level are locked
See above.
Locking a B-Tree

Read Locks
o Obtain a read lock on the root, and work your way down the tree locking each
entry as it is reached
o When a new entry is locked, the lock on the previous entry (its parent) can be
released

This operation will never revisit the parent

No write operation of a concurrent transaction can pass this operation as it
goes down the tree

Called lock coupling or crabbing


Write Locks
o Obtain a write lock on the root, and work your way down the tree locking each
entry as it is reached
o When a new entry n is locked, if that entry is not full, the locks on all its parents
can be released

An insert operation might have to go back up the tree, revisiting and
perhaps splitting some nodes

Even if that occurs, because n is not full, it will not have to split n and
hence need not go further up the tree

Thus it can release locks further up in the tree.
To avoid acquiring many fine grain locks on a table, a DBMS can set a lock escalation
threshold. If more than the threshold number of tuple (or page) locks are acquired, the
DBMS automatically trades them in for a table lock.
Granular locking

Problem: T1 holds a (fine grained) lock on field F1 in record R1. T2 requests a conflicting
(coarse grained) lock on R1. How does the concurrency control detect the conflict since it
sees F1 and R1 as different items?

Solution: Organize locks hierarchically by containment and require that in order for a
transaction to get a fine grained lock it must first get a coarse grained lock on the
containing item
o T1 must first get a lock on R1 before getting a lock on F1. The conflict with T2 is
detected at R1
Multi-version concurrency control
A multi-version DBMS maintains all versions created in the (recent) past. Major goal of a multi-version DBMS: avoid the need for read locks

All DBMSs guarantee that statements are isolated:
o Each statement sees state produced by the complete execution of other
statements, but state might not be committed

A multiversion control guarantees that each statement sees a committed state:
o A statement is executed in a state whose value is a version
o Referred to as statement-level read consistency

A multiversion control can also guarantee that all statements of a transaction see the same
committed state:
o All statements of a transaction access the same version
o Referred to as transaction-level read consistency
Read-only multi-version control
Distinguishes in advance read-only (R/O) transactions from read/write (R/W) transactions.

R/W transactions use a (conventional) immediate-update, pessimistic control. Hence,
transactions access the most current version of the database.

All the reads of a particular R/O transaction TRO are satisfied using the most recent version
that existed when TRO requested its first read.
[Figure: example schedule of T1, T2 and T3]

T1 and T2 are read/write transactions; T3 is read-only

T3 sees the version produced by T1

The equivalent serial order is T1, T3, T2
Implementation

DBMS maintains a version counter (VC)
o Incremented each time a R/W transaction commits

The new version of a data item created by a R/W transaction is tagged with the value of VC
at the time the transaction commits

When a R/O transaction makes its first read request, the value of VC becomes its counter
value. Each request to read an item is satisfied by the version of the item having the
largest version number less than or equal to the transaction's counter value.
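A small sketch of the version counter scheme. It assumes VC is incremented first and the new versions are then tagged with the incremented value; the notes leave that ordering open. `MultiVersionStore` and its methods are hypothetical names.

```python
class MultiVersionStore:
    def __init__(self):
        self.vc = 0         # version counter
        self.versions = {}  # item -> list of (version, value), oldest first

    def commit_rw(self, writes):
        """Commit a R/W transaction: bump VC and tag its new versions with it."""
        self.vc += 1
        for item, value in writes.items():
            self.versions.setdefault(item, []).append((self.vc, value))

    def begin_ro(self):
        """Counter value a R/O transaction fixes at its first read request."""
        return self.vc

    def read_ro(self, counter, item):
        """Value of the version with the largest number <= counter, else None."""
        best = None
        for version, value in self.versions.get(item, []):
            if version <= counter:
                best = value
        return best
```

A read-only transaction never waits for a lock here: later commits add new versions but do not change what it reads.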
Read consistency multi-version control

R/O transactions
o Treated as before: get transaction-level read consistency

R/W transactions
o Write statements acquire long-duration write locks (delay other write statements)
o Read statements use most recent (committed) version at time of read

Not delayed by write locks (since read locks are not requested).
Snapshot Isolation

Does not distinguish between R/W and R/O transactions

A transaction reads the most recent version that existed at the time of its first read request
o Guarantees transaction-level read consistency

The write sets of any two concurrently executing transactions must be disjoint
o Two implementations of this specification

First Committer Wins

Locking implementation
First committer wins

Writes use deferred-update (intentions list)

T is allowed to commit only if no concurrent transaction
o committed before T and
o updated a data item that T also updated

Control is optimistic:
o It can be implemented without any locks
o Deadlock not possible
o Validation (write set intersection) is required for R/W transactions and abort is
possible
o Schedules might not be serializable
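First committer wins can be sketched with a write-set intersection at commit time. As a simplification, the snapshot here is a copy of the whole database taken at `begin` rather than at the first read; `SnapshotDB` and its methods are hypothetical names.

```python
class SnapshotDB:
    def __init__(self, data=None):
        self.data = dict(data or {})
        self.commit_log = []  # write sets of committed transactions, in commit order

    def begin(self):
        # Simplification: snapshot taken at begin, not at the first read.
        return {"snapshot": dict(self.data),
                "start": len(self.commit_log),
                "writes": {}}

    def read(self, t, item):
        if item in t["writes"]:
            return t["writes"][item]
        return t["snapshot"].get(item)

    def write(self, t, item, value):
        t["writes"][item] = value  # deferred update (intentions list)

    def commit(self, t):
        # Validation: a concurrent transaction that committed first must
        # not have updated an item that t also updated.
        for write_set in self.commit_log[t["start"]:]:
            if write_set.intersection(t["writes"]):
                return False
        self.data.update(t["writes"])
        self.commit_log.append(set(t["writes"]))
        return True
```

Two concurrent transactions with disjoint write sets both pass this validation even if each read what the other wrote, which is one way non-serializable schedules arise.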
Locking implementation of snapshot isolation

Immediate update pessimistic control

Reads do not get any locks and execute as in the previous implementation

A transaction T that wants to perform a write on some item must request a write lock
o If the version number of that item is greater than that of T, T is aborted (first
committer wins)
o Otherwise, if another transaction has a write lock on that item, T waits until that
transaction completes

If that transaction commits, T is aborted (first committer wins)

If that transaction aborts, T is given the write lock and allowed to write
Following anomalies are impossible: Dirty read, dirty write, non-repeatable read, lost update.
Write skew is possible.
In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set
(e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2),
and finally concurrently commit, neither having seen the update performed by the other.
Phantoms:
It may look as if phantoms cannot occur in snapshot isolation, but non-serializable schedules due
to phantoms are possible.
Example: concurrent transactions each execute SEL(P) and then insert a row satisfying P.

Neither sees the row inserted by the other.

Schedule is not serializable

Would be considered a phantom if it occurred at REPEATABLE READ.

Can be considered as write skew (is permitted in snapshot isolation).
H22 Atomicity and Durability
Failures
Crash

Processor failure, software bug

Server supports atomicity by providing a recovery procedure that restores the db using
rollback
Abort

By user, transaction, system, etc.

Roll the transaction back
Media

Durability requires commits to be permanent

Due to the possibility of media crash, the media used must be redundant.
Log
Log contains information which can restore the DB.
Each modification of DB causes an update record to be appended to log.
Update record contains:

Identity of data item modified

Identity of transaction (tid) that did the modification

Before image (undo record) – copy of data item before update occurred.
Abort using log
(Assume immediate-update approach)

Scan log backwards using tid to identify transaction's update records
o Reverse each update using before image
o Reversal done in last-in-first-out order

In a strict system, new values unavailable to concurrent transactions (as a result of long
term exclusive locks); hence rollback makes transaction atomic

Problem: terminating scan (log can be long)

Solution: append a begin record for each transaction, containing tid, prior to its first
update record
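The abort procedure can be sketched with the log as a Python list. The record layout (dictionaries with type, tid, item and before) and the helper names update and abort are assumptions made for this illustration.

```python
def update(log, db, tid, item, value):
    """Immediate update: log the before image, then change the database."""
    log.append({"type": "update", "tid": tid, "item": item,
                "before": db[item]})
    db[item] = value


def abort(log, db, tid):
    """Roll back transaction `tid` by scanning the log backwards."""
    for record in reversed(log):
        if record["tid"] != tid:
            continue
        if record["type"] == "begin":
            break                                   # scan terminates here
        if record["type"] == "update":
            db[record["item"]] = record["before"]   # last in, first out
    log.append({"type": "abort", "tid": tid})


db = {"x": 1, "y": 2}
log = [{"type": "begin", "tid": "T1"}]
update(log, db, "T1", "x", 10)
log.append({"type": "begin", "tid": "T2"})
update(log, db, "T2", "y", 20)
update(log, db, "T1", "x", 11)
abort(log, db, "T1")        # x returns to 1, T2's update of y is untouched
```

The begin record is what lets the backward scan stop early instead of reading the whole log.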


Savepoint record inserted in log when savepoint created
o Contains tid, savepoint identity
Rollback Procedure:
o Scan log backwards using tid to identify update records
o Undo updates using before image
o Terminate scan when appropriate savepoint record encountered
Crash recovery using a log
Abort all transactions active at time of crash
Problem: How do you identify them?
Solution: abort record or commit record appended to log when transaction terminates
o Recovery Procedure:

Scan log backwards - if the first record of T that is encountered is an update record, T was active at
time of crash. Roll it back
NB: a transaction is not committed until its commit record is in the log


Problem: Scan must retrace entire log
Solution: Periodically append checkpoint record (≠ savepoint!) to log. Contains tids of all
active transactions at time of append
o Backward scan goes at least as far as last checkpoint record appended
o Transactions active at time of crash determined from log suffix that includes last
checkpoint record
o Scan continues until those transactions have been rolled back
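How the checkpoint record bounds the search for active transactions can be sketched as below. The record layout and the function name active_at_crash are hypothetical; the log is assumed to contain at least one checkpoint record.

```python
def active_at_crash(log):
    """Return the tids of transactions that were active when the log ends."""
    finished, active = set(), set()
    for record in reversed(log):
        kind = record["type"]
        if kind in ("commit", "abort"):
            finished.add(record["tid"])
        elif kind in ("begin", "update"):
            if record["tid"] not in finished:
                active.add(record["tid"])
        elif kind == "checkpoint":
            # Listed transactions that did not terminate later are active.
            active |= set(record["active"]) - finished
            break
    return active


log = [
    {"type": "begin", "tid": "T1"},
    {"type": "update", "tid": "T1"},
    {"type": "begin", "tid": "T2"},
    {"type": "checkpoint", "active": ["T1", "T2"]},
    {"type": "commit", "tid": "T2"},
    {"type": "begin", "tid": "T3"},
    {"type": "update", "tid": "T3"},
]
crashed = active_at_crash(log)      # T1 and T3
```

Only the log suffix starting at the checkpoint is needed to find the active set; rolling those transactions back may still require scanning further back, to their begin records.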
Write-ahead log: the log buffer in main memory is an extension of the log on mass storage and is
periodically flushed to mass storage. Important: the log buffer doesn't survive a crash.
The page buffer in main memory (cache) is also volatile and has to be flushed to mass storage.
Atomicity and durability complicate algorithms. Requirements:

Write-ahead feature (move update records to log on mass store before database is
updated) necessary to preserve atomicity

New values written by a transaction must be on mass store when its commit record is
written to log (move new values to mass store before commit record) to preserve durability

Transaction not committed until commit record in log on mass store
Forced vs. Unforced writes:

On database page –
o Unforced write updates cache page, marks it dirty and returns control immediately.
o Forced write updates cache page, marks it dirty, uses it to update database page
on disk, and returns control when I/O completes.

On log –
o Unforced append adds record to log buffer and returns control immediately.
o Forced append, adds record to log buffer, writes buffer to log, and returns control
when I/O completes
o After a flush of the log buffer, we start with a clean log buffer in volatile memory.
Log Sequence Number (LSN):

Log records are numbered sequentially

Each database page contains the LSN of the update record describing the most recent
update of any item in the page
Commit processing: Force Policy
1. Force any update records of T in log buffer
then …
2. Force any dirty pages updated by T in cache
then …
(1) and (2) ensure atomicity (write-ahead policy)
3. Append T's commit record to log buffer
then …
Force log buffer for immediate commit or …
Write log buffer when a group of transactions have committed (group commit)
(2) and (3) ensure durability
Using the force policy: a transaction's updates are in the DB (mass storage) when it commits.
Problem: Pages updated by T might still be in cache when T's commit record is appended to log
buffer
Solution: Update record contains after image (called a redo record) as well as before image

Write-ahead property still requires that update record be written to mass store before page

But it is no longer necessary to force dirty pages when commit record is written to log on
mass store since all after images precede commit record in log

Referred to as a no-force policy
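The difference between the two commit policies can be sketched with a toy storage manager. Everything here is an assumption for illustration: the class Storage, one item per page, and a forced commit that flushes all dirty pages rather than only those of the committing transaction.

```python
class Storage:
    def __init__(self):
        self.log_buffer, self.disk_log = [], []   # volatile / mass store
        self.cache, self.disk_db = {}, {}         # volatile / mass store
        self.dirty = set()

    def update(self, tid, item, value):
        before = self.cache.get(item, self.disk_db.get(item))
        # Update record carries both before image and after image.
        self.log_buffer.append(("update", tid, item, before, value))
        self.cache[item] = value
        self.dirty.add(item)

    def force_log(self):
        self.disk_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def flush_page(self, item):
        self.force_log()              # write-ahead: update record goes first
        self.disk_db[item] = self.cache[item]
        self.dirty.discard(item)

    def commit(self, tid, force_pages):
        if force_pages:               # force policy only
            for item in list(self.dirty):
                self.flush_page(item)
        self.log_buffer.append(("commit", tid))
        self.force_log()              # commit record reaches mass store


no_force = Storage()
no_force.update("T1", "x", 5)
no_force.commit("T1", force_pages=False)

force = Storage()
force.update("T1", "x", 5)
force.commit("T1", force_pages=True)
```

Under no-force the database on disk is still empty after commit, but the after image in the log is enough to redo the update after a crash.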
Recovery processing: No-force policy
Problem: When a crash occurs there might exist some pages in database (on mass store)

containing updates of uncommitted transaction: they must be rolled back

that do not (but should) contain the updates of committed transactions: they must be
rolled forward
Solution: Use a sharp checkpoint

Before appending checkpoint record CK to log buffer, halt processing and force all dirty
pages from cache

Recovery process can assume that all updates in records prior to CK were written to
database (only updates in records after CK might not be in db)
Example: p1 must be rolled forward using x_new
p2 must be rolled back using y_old
1. Pass 1
Log is scanned backward to most recent checkpoint record, CK, to identify transactions
active at time of crash
2. Pass 2
Log is scanned forward from CK to most recent record. The after images in all update
records are used to roll the database forward
3. Pass 3
Log is scanned backwards to begin record of oldest transaction active at time of crash. The
before images in the update records of these transactions are used to roll these
transactions back

This is called DO-UNDO-REDO
(updates – rollback in pass 3 – rollforward in pass 2)
Issue 1: Database pages containing items updated after CK was appended to log might have been
flushed before crash

No problem – with physical logging, roll forward using after images in pass 2 is idempotent

Rollforward in this case is unnecessary, but not harmful
Issue 2: Some update records after CK might belong to an aborted transaction T1.

These updates are restored in pass 2 but not rolled back in pass 3 since T1 was not
active at time of crash

Treat rollback operations for aborting T1 as ordinary updates and append compensating log
records to log
Issue 3: What if system crashes during recovery?

Recovery is restarted

If physical logging is used, pass 2 and pass 3 operations are idempotent and hence can be
redone
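The three passes can be sketched over an in-memory log. The record layout, the function name recover and the example values are assumptions; the sketch uses physical logging with a sharp checkpoint and does not model compensating log records.

```python
def recover(log, db):
    """Recover `db` (the database on mass store) from `log` after a crash."""
    # Pass 1: backward to the last checkpoint CK; find active transactions.
    finished, active, ck = set(), set(), 0
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec["type"] in ("commit", "abort"):
            finished.add(rec["tid"])
        elif rec["type"] == "checkpoint":
            active |= set(rec["active"]) - finished
            ck = i
            break
        elif rec["tid"] not in finished:
            active.add(rec["tid"])
    # Pass 2: forward from CK; reinstall every after image.
    for rec in log[ck:]:
        if rec["type"] == "update":
            db[rec["item"]] = rec["after"]
    # Pass 3: backward; undo the updates of the active transactions.
    pending = set(active)
    for rec in reversed(log):
        if not pending:
            break
        if rec["type"] == "update" and rec["tid"] in pending:
            db[rec["item"]] = rec["before"]
        elif rec["type"] == "begin" and rec["tid"] in pending:
            pending.discard(rec["tid"])
    return active


log = [
    {"type": "begin", "tid": "T1"},
    {"type": "update", "tid": "T1", "item": "y", "before": 0, "after": 7},
    {"type": "checkpoint", "active": ["T1"]},
    {"type": "begin", "tid": "T2"},
    {"type": "update", "tid": "T2", "item": "x", "before": 1, "after": 2},
    {"type": "commit", "tid": "T2"},
]
db = {"x": 1, "y": 7}       # disk state at crash: x not flushed, y flushed
active = recover(log, db)   # x rolled forward to 2, y rolled back to 0
```

Because the records hold complete before and after images, running recover a second time on the same log leaves the database unchanged, which is the idempotence that makes a crash during recovery harmless.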
Fuzzy checkpoints
Before writing CK, record the identity of all dirty pages (don't flush) in volatile memory. Write
(flush) dirty pages in the background
Example:

Page corresponding to U1 (x) is recorded at CK1 and will have been flushed by CK2

Page corresponding to U2 (y) is recorded at CK2, but might not have been flushed at time of crash

Pass 2 must start at CK1
Deferred update system
Durability
Update: append new value to intentions-list; append update record to log buffer.
Abort: discard intentions-list
Commit: force commit record to log. Update db using intentions-list.
Recovery
Checkpoint record contains list of committed (not active) but incomplete transactions (intentions-list).

Scan back to most recent checkpoint record to determine transactions that are committed
but for which updates are incomplete at time of crash

Scan forward to install after images for incomplete transactions

No third pass required since transactions active (not committed) at time of crash have not
affected database
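A deferred-update system can be sketched as follows. The class DeferredUpdateDB and the tuple layout of log records are hypothetical; the log buffer is omitted, and recovery reinstalls every committed transaction instead of only the incomplete ones named in a checkpoint record.

```python
class DeferredUpdateDB:
    def __init__(self):
        self.db = {}            # database on mass store
        self.log = []           # log on mass store (no buffer modelled)
        self.intentions = {}    # tid -> {item: new value}

    def update(self, tid, item, value):
        self.intentions.setdefault(tid, {})[item] = value
        self.log.append(("update", tid, item, value))   # after image only

    def abort(self, tid):
        self.intentions.pop(tid, None)     # database was never touched

    def commit(self, tid):
        self.log.append(("commit", tid))               # commit record first
        self.db.update(self.intentions.pop(tid, {}))   # then install


def recover(log, db):
    """Two passes: find committed transactions, then install after images."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[3]


system = DeferredUpdateDB()
system.update("T1", "x", 1)
system.update("T2", "y", 2)
system.commit("T1")             # T2 is still active at the crash

restored = {}                   # worst case: nothing was installed yet
recover(system.log, restored)
```

No before images are needed, because an uncommitted transaction such as T2 has never written to the database.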
Database dump
Simple Dump

System stops accepting new transactions

Wait until all active transactions complete

Dump: copy entire database to a file on mass storage (including mirror)

Restart log and system
Simple Dump Restore

Install most recent dump file

Scan backward through log

Determine transactions that committed since dump was taken

Ignore aborted transactions and those that were active when media failed

Scan forward through log

Install after images of committed transactions
Fuzzy Dump
Write begin dump record to log; copy DB records to dump file while system is active.
Naïve restoration

Install dump on disk

Scan log backwards to begin dump record to produce list L of all transactions that
committed since start of dump

Scan log forward and install after images in update records of all transactions in L
Naïve restoration works fine in some cases, but it doesn't handle 2 cases:

T commits before dump starts but its dirty pages might not have been flushed until dump
completed. Dump does not read T's updates and T is not in L.

Dump reads T's updates but T later aborts. T is not in L, so the updates in the dump are never rolled back.
H23 Architecture of Transaction processing systems
Three-tier architectures
First single user system.
Presentation services: display forms etc.
Application services: implements user req, interacts with DBMS
TPS = Transaction Processing System
Application server

Sets transaction boundaries

Acts as a workflow controller: implements user request as a sequence of tasks
e.g., registration = (check prerequisites, add student to course, bill student)

Acts as a router
Distributed transactions involve multiple servers
Server classes are used for load balancing

Since workflows might be time consuming and application server serves multiple clients,
application server is often multi-threaded
Transaction server

Stored procedures off-loaded to separate (transaction) servers to reduce load on DBMS.

Transaction server close to DBMS, Application server close to clients

Transaction server does bulk of data processing.
Interconnection of servers in 3-tiered model:
Session and context
A session exists between two entities if they exchange messages while cooperating to perform
some task.
Client/server session: server context (describing client) has to be maintained by server in order to
handle a sequence of client requests.
Direct vs. Queued transaction processing
Direct: Client waits until request is serviced. Service provided as quickly as possible and result is
returned. Client and server are synchronized.
Queued: Request enqueued and client continues execution. Server dequeues request at a later
time and enqueues result. Client dequeues result later. Client and server unsynchronized.
Three transactions on two recoverable queues
Advantages:

Client can enter requests even if server is unavailable

Server can return results even if client is unavailable

Request will ultimately be served even if T2 aborts (since queue is transactional)
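The three transactions can be sketched with two in-memory queues. This is only an illustration: real recoverable queues are durable, whereas the deques here are not, and the function names and the failing handler are hypothetical.

```python
from collections import deque

request_q, result_q = deque(), deque()


def t1_submit(request):
    request_q.append(request)            # T1: client enqueues and moves on


def t2_serve(handler):
    request = request_q.popleft()        # T2: server dequeues the request
    try:
        result_q.append(handler(request))
    except Exception:
        request_q.appendleft(request)    # T2 aborts: the dequeue is undone
        return False
    return True


def t3_collect():
    return result_q.popleft()            # T3: client dequeues the result


attempts = []


def flaky_double(x):
    attempts.append(x)
    if len(attempts) == 1:
        raise RuntimeError("simulated server failure")
    return 2 * x


t1_submit(3)
first_try = t2_serve(flaky_double)       # aborts, request stays queued
second_try = t2_serve(flaky_double)      # succeeds
result = t3_collect()
```

Because the abort of T2 puts the request back, the request is served on a later attempt.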
Heterogeneous vs Homogeneous TPS

Homogeneous systems are composed of HW and SW modules of a single vendor

Modules communicate through proprietary (often unpublished) interfaces

Hence, other vendor products cannot be included

Referred to as TP-Lite systems

Heterogeneous systems are composed of HW and SW modules of different vendors

Modules communicate through standard, published interfaces

Referred to as TP-Heavy systems
Middleware is the software that integrates the components of a heterogeneous system and
provides utility services. For example, it supports communication (TCP/IP), security (Kerberos),
global ACID properties, translation (JDBC)
Transaction Manager

Middleware to support global atomicity of distributed transactions

Application invokes manager when transaction is initiated

Manager is informed each time a new server joins the transaction

Application invokes manager when transaction completes

Manager coordinates atomic commit protocol among servers to ensure global
atomicity
TP monitors
A TP Monitor is a collection of middleware components that is useful in building heterogeneous
transaction processing systems

Includes transaction manager

Application independent services not usually provided by an operating system
TP Monitor Services

Communication services

Built on message passing facility of OS

Capable of type checking

Peer-to-peer (sessions), RPC, and/or event communication (event broker)

Location transparent

Transactional (TRPC)

Robust against failures

Asynchronous or synchronous

If within transaction

Asynchronous: uses persistent queue

Synchronous: requester joins transaction

ACID properties

Local isolation for a (non-db) server might be provided by a lock manager

Local atomicity for a (non-db) server might be provided by a log manager

Global isolation and atomicity are provided by transaction manager

Routing and load balancing

TP monitor can use load balancing to route a request to the least loaded member of
a server class

Threading

Threads can be thought of as low cost processes

Useful in servers (e.g., application server) that might be maintaining sessions for a
large number of clients

TP monitor provides threads if OS does not

Recoverable queues

Security services

Encryption, authentication, and authorization

Miscellaneous servers

File server

Clock server
Storage Architectures
Bottleneck in performance is disk I/O
DBMS maintains disk cache in main memory.
Some possible usage of systems:
RAID, NAS (network-attached storage) and SAN (storage area network)
Architecture of Web transaction processing
A Web application server is a set of tools and modules for building and executing transaction
processing systems for the Web

Including the application server tier of the system
We discuss J2EE (Java 2 Enterprise Edition) standard

J2EE: one language, many platforms

.NET: one platform, many languages; a set of Microsoft products

J2EE defines a set of services and classes particularly oriented toward transaction-oriented
Web services
o Java servlets
o Enterprise Java beans
J2EE
Enterprise Java Beans

Java classes that implement the business methods of an enterprise

Execution within an infrastructure of services provided by the Web application server
o Supports transactions, persistence, concurrency, authorization, etc.
o Implements declarative transaction semantics

The bean programmer can just declare that a particular method is to be a
transaction and does not have to specify the begin and commit commands
o Bean programmer can focus on business methods of the enterprise rather than on
details of system implementation

Entity bean: represents a persistent business object whose state is stored in the database
o one bean = one table
o one bean instance = one row in table
Session bean: represents a client performing interactions within a session using the
business methods of the enterprise
o can retain state during interactions
o session beans call methods of entity beans (synchronous communication)
o can be transactional (JDBC or JTA: Java Transaction API)
Message-driven bean: is like a session bean but asynchronous (uses JMS message
queues)
The bean class
o Contains implementations of the business methods of the enterprise
A remote interface (also optionally a local interface)
o Used by clients to access the bean class remotely, using TRPC (or locally with the
local interface); acts as proxy for bean class
o Includes declarations of all the business methods
A home interface (also optionally a local home interface)
o Contains methods that control bean's life cycle (create, remove) and finder
methods (e.g. FindByPrimaryKey)
A deployment descriptor
o Declarative metadata for the bean
o Describes persistence, transactional, and authorization properties
H24 Distributed Transactions
Atomic Commit Protocol
Global atomicity

All subtransactions of a distributed transaction must commit or all must abort

An atomic commit protocol, initiated by a coordinator (e.g., the transaction manager),
ensures this.

Coordinator polls cohorts (participating databases) to determine if they are all willing to
commit

Protocol is supported in the XA interface between a transaction manager and a resource
manager (e.g., DBMS)
ACID properties

Requirement for each local DBMS
o supports ACID properties locally for each subtransaction
o eliminates local deadlocks

The additional issues are:
o Global atomicity: all cohorts must abort or all commit
o Global deadlocks: there must be no deadlocks involving multiple sites
o Global serialization: distributed transaction must be globally serializable
Cohort Abort
Reasons for abort: Validation failure, deadlock, crash of cohort site, no communication with cohort
site.
Atomic Commit Protocol

Most commonly used atomic commit protocol is the two-phase commit protocol

Implemented as an exchange of messages between the coordinator and the cohorts

Guarantees global atomicity of the transaction even if failures should occur while the
protocol is executing
Two-Phase Commit Protocol
Transaction record resides in volatile memory of transaction manager. Created when application
calls tx_begin.
Phase 1
Application invokes tx_commit

Coordinator sends prepare message to all cohorts

If cohort wants to commit, it moves all update records to mass store by forcing a prepare
record to its log
o Guarantees that cohort will be able to commit (despite crashes) if coordinator
decides commit (since update records are durable)

Cohort enters prepared state

Cohort sends a vote message (“ready” or “aborting”).
Cohort
o cannot change its mind
o retains all locks if vote is “ready”
o enters uncertain period (it cannot foretell final outcome)
Note that cohort may abort at any time prior to or on receipt of
the prepare message: it aborts and releases locks
Coordinator receives vote messages

Coordinator records vote in transaction record

Remember, vote indicates cohort is “ready” to commit or aborting

If any vote is aborting, coordinator decides abort and deletes transaction record

If all are ready, coordinator decides commit, forces commit record (containing
transaction record) to its log (end of phase 1)

Coordinator sends commit or abort message to all cohorts
Note:

Transaction is logically committed when commit record is durable

Since all cohorts are in prepared state, transaction can be committed despite any failures




Phase 2

Cohort receives commit message from coordinator
o Cohort commits locally by forcing a commit record to its log
o Cohort sends done message to coordinator

Cohort receives abort message from coordinator
o Cohort aborts

Locks are released and uncertain period ends

Coordinator receives a done message
o Coordinator records receipt of done message
o If all received, coordinator writes a complete record to its log and deletes
transaction record from volatile store
(Figures: message exchange in the abort case and in the commit case)
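The message exchange of two-phase commit can be sketched with method calls standing in for messages. The classes Cohort and Coordinator and the will_commit flag are hypothetical, failures and timeouts are not modelled, and a cohort answers done for an abort as well, which is a simplification.

```python
class Cohort:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit = name, will_commit
        self.log, self.state = [], "active"

    def prepare(self):
        if not self.will_commit:
            self.state = "aborted"        # aborts and releases its locks
            return "aborting"
        self.log.append("prepare")        # forced: update records durable
        self.state = "prepared"           # uncertain period begins
        return "ready"

    def decide(self, decision):
        if decision == "commit":
            self.log.append("commit")     # forced commit record
            self.state = "committed"
        else:
            self.state = "aborted"
        return "done"                     # locks released


class Coordinator:
    def __init__(self, cohorts):
        self.cohorts, self.log = cohorts, []

    def tx_commit(self):
        votes = [c.prepare() for c in self.cohorts]            # phase 1
        decision = "commit" if all(v == "ready" for v in votes) else "abort"
        if decision == "commit":
            self.log.append("commit")     # forced: logically committed
        acks = [c.decide(decision) for c in self.cohorts]      # phase 2
        if decision == "commit" and all(a == "done" for a in acks):
            self.log.append("complete")
        return decision


a, b = Cohort("A"), Cohort("B")
good = Coordinator([a, b])
outcome_commit = good.tx_commit()

c, d = Cohort("C"), Cohort("D", will_commit=False)
bad = Coordinator([c, d])
outcome_abort = bad.tx_commit()
```

In the abort case the coordinator writes nothing to its log, which is what later allows it to answer abort for any transaction it has no record of.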
Failures
A participant recognizes 2 failure situations:
Timeout: no response to a message; execute a timeout protocol
Crash: on recovery, execute a restart protocol
A cohort is blocked when it cannot complete protocol until some failure is repaired.
Timeout protocol
[Cohort] Time out waiting for prepare

Abort the subtransaction
o Since the (distributed) transaction cannot commit unless cohort votes to commit,
atomicity is preserved
[Coordinator] Time out waiting for vote

Abort the transaction
o Since coordinator controls decision, it can force all cohorts to abort, preserving
atomicity
[Cohort] Time out waiting for commit/abort

Cohort is in prepared state

Cohort is blocked since it does not know coordinator's decision
o Coordinator might have decided commit or abort
o Cohort cannot unilaterally decide since its decision might be contrary to
coordinator's decision, violating atomicity
o Locks cannot be released

Cohort requests status from coordinator and remains blocked
[Coordinator] Time out waiting for done

Requests done message from delinquent cohort
Restart protocol
Cohort
On restart cohort finds in its log

begin_transaction record, but no prepare record:
o Abort (transaction cannot have committed because cohort has not voted)

prepare record, but no commit record (cohort crashed in its uncertain period)
o Does not know if transaction committed or aborted
o Locks items mentioned in update records before restarting system
o Requests status from coordinator and blocks until it receives an answer

commit record
o Recover transaction to committed state using log
Cohort in blocked state, but coordinator does not respond to a request for status; either

Wait until the coordinator is restarted

Give up, make a unilateral decision, and attach a fancy name to the situation.

Resolve the potential loss of atomicity outside the system (thus human intervention
needed)
Coordinator
On restart

Records of transactions in phase 1 lost (transaction record in volatile memory)

If there are transactions in phase 2, then we have commit record in the log, but no
complete record: restore transaction record to volatile memory
On receiving a request from a cohort for transaction status

If transaction record exists in volatile memory, reply based on information in transaction
record

If no transaction record exists in volatile memory
o No transaction record at the coordinator means that either

The coordinator aborted the transaction and deleted transaction record

The coordinator crashed, restarted and didn't find a commit record. It was in
Phase 1 of the protocol and had not yet made a decision, or it had
previously aborted the transaction.
o Thus coordinator can respond abort
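The restart rules reduce to two small decision functions. The function names, the log represented as a list of record names and the decision field of a transaction record are assumptions for illustration.

```python
def cohort_restart_action(log_records):
    """What a restarting cohort does, based on its own log records."""
    if "commit" in log_records:
        return "recover to committed state"
    if "prepare" in log_records:
        # Crashed in the uncertain period.
        return "relock items, ask coordinator, block"
    return "abort"                        # cohort never voted


def coordinator_status_reply(transaction_records, tid):
    """Answer a cohort's request for the status of transaction `tid`."""
    record = transaction_records.get(tid)
    if record is None:
        return "abort"    # no record: aborted, or crashed before deciding
    return record["decision"]


records = {"T7": {"decision": "commit"}}   # restored from the commit record
reply_known = coordinator_status_reply(records, "T7")
reply_unknown = coordinator_status_reply(records, "T9")
```

Answering abort when no record exists is safe because a commit decision is always preceded by a forced commit record.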
Linear Commit

Variation of two-phase commit that involves transfer of coordination

Cohorts are assumed to be connected in a linear chain





When left cohort A is ready to commit, it goes in prepared state and sends a vote message
'ready' to B (request B to act as coordinator)
After receiving the vote message, B does the same as A and sends vote message to C
When vote message reaches rightmost cohort N, if N is ready to commit it commits the
entire transaction (acts as coordinator) and sends commit message to its left.
Message goes down the chain until it reaches A.
When A has committed it sends a 'done' message to B and so on until it reaches N.
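The message flow along the chain can be traced with a short function. The name linear_commit and the ready map are hypothetical, and the abort path is cut short: the sketch returns "abort" without modelling how the abort is propagated.

```python
def linear_commit(chain, ready):
    """Run linear commit over `chain` (left to right).

    Returns (outcome, messages) where each message is (sender, receiver, kind).
    """
    messages = []
    # Vote messages travel from the leftmost cohort to the right.
    for left, right in zip(chain, chain[1:]):
        if not ready[left]:
            return "abort", messages
        messages.append((left, right, "ready"))
    if not ready[chain[-1]]:
        return "abort", messages
    # The rightmost cohort decides commit; commit travels back to the left.
    for right, left in zip(reversed(chain), reversed(chain[:-1])):
        messages.append((right, left, "commit"))
    # Done messages travel to the right again.
    for left, right in zip(chain, chain[1:]):
        messages.append((left, right, "done"))
    return "commit", messages


outcome, messages = linear_commit(
    ["A", "B", "C"], {"A": True, "B": True, "C": True})
```

With n cohorts, each of the three sweeps uses n - 1 messages, all exchanged between neighbours.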
Two-phase commit without prepared state

Assume exactly one cohort C does not support a prepared state.

Coordinator performs Phase 1 of two-phase commit protocol with all other cohorts

If they all agree to commit, coordinator requests that C commit its subtransaction (in
effect, requesting C to decide the transaction's outcome)

C responds commit/abort, and the coordinator sends a commit/abort message to all other
sites
Global Deadlock
Not always detectable at any one site (the cycle may be distributed over multiple sites).
Detection by a simple extension of local deadlock detection. Check for a cycle with a probe. The
probe is sent to the coordinator of the cohort it is waiting for. When the probe returns, a deadlock
exists.
Prevention by using timestamps. An older transaction never waits for a younger one; the younger
one is aborted.
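Both techniques fit in a few lines. In this sketch the global waits-for relation is given as one dictionary, although in a distributed system it is spread over the sites and the probe travels between them; the function names are hypothetical. The timestamp rule shown is the one usually called wound-wait.

```python
def has_deadlock(waits_for, start):
    """Follow a probe from `start` along waits-for edges.

    Returns True if the probe comes back to `start`, i.e. a cycle exists.
    """
    seen, frontier = set(), [start]
    while frontier:
        txn = frontier.pop()
        for blocker in waits_for.get(txn, ()):
            if blocker == start:
                return True
            if blocker not in seen:
                seen.add(blocker)
                frontier.append(blocker)
    return False


def on_lock_conflict(requester_ts, holder_ts):
    """Timestamp rule; a smaller timestamp means an older transaction."""
    if requester_ts < holder_ts:
        return "abort holder"       # older requester never waits
    return "requester waits"        # younger may wait for older


cycle = has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}, "T1")
no_cycle = has_deadlock({"T1": ["T2"], "T2": ["T3"]}, "T1")
```

With the timestamp rule every wait goes from a younger to an older transaction, so a cycle of waits can never form.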
If all sites use strict two-phase locking and the transaction manager uses a two-phase
commit protocol, then transactions are globally serializable in commit order.
Replication

Advantages
o Improves availability: data can be accessed even though some site has failed
o Can improve performance: a transaction can access the closest (perhaps local)
replica

Disadvantages
o More storage
o Mutual consistency of replicas must be maintained
o Access by concurrent transactions to different replicas can lead to incorrect results
Replica control

Knows location of all replicas

Translates transaction's request to access an item into a request to access particular
replica(s)

Maintains some form of mutual consistency:
o Strong: all replicas always have the same value (in every committed version of the
database)
o Weak: all replicas eventually have the same value
o Quorum: a quorum of replicas have the same value (a certain set of servers have
the same data)
Read one/write all replica control
Read request: use nearest replica
Write request: update all replicas

Synchronous: immediately

Asynchronous: eventually
Quorum Consensus replica control
To read using quorums, timestamps are used to select the right values. See also example sheet
(made during lecture).
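How timestamps select the right value can be sketched as follows. The lecture example is not reproduced here; this sketch assumes the usual condition that a read quorum of size r and a write quorum of size w over n replicas satisfy r + w > n, so that every read quorum overlaps the latest write quorum. Function names and values are hypothetical.

```python
def quorum_write(replicas, write_quorum, value, timestamp):
    """Install (timestamp, value) on the replicas in `write_quorum`."""
    for site in write_quorum:
        replicas[site] = (timestamp, value)


def quorum_read(replicas, read_quorum):
    """Return the value with the largest timestamp in `read_quorum`."""
    newest = max((replicas[site] for site in read_quorum),
                 key=lambda stamped: stamped[0])
    return newest[1]


# n = 3 replicas, r = 2, w = 2, so r + w > n.
replicas = {"A": (0, "old"), "B": (0, "old"), "C": (0, "old")}
quorum_write(replicas, ["A", "B"], "new", timestamp=1)
value = quorum_read(replicas, ["B", "C"])    # B carries the newest value
```

Replica C still holds the old value, but the overlap at B guarantees that the read sees the latest write.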
Primary copy control
One copy designated primary, other copies secondary.
Reading is from the nearest copy. Writing is on the primary copy. After commit, updates are
propagated to the secondary copies, so propagation is asynchronous: good for performance, bad for
consistency.