Transaction Management
An Overview
Copyright © 2003 – 2013 by Curt Hill

Transaction
• Any one execution of a user program
  – A sequence of SQL statements
  – A program that accesses the DBMS to accomplish a similar action
• Mostly interested in transactions that change the database
  – An interaction with queries is also interesting
• A transaction is the unit of interest for concurrent execution and recovery

Examples
• Buying a product
  – Finding how many there are
  – Removing the sold ones from stock
  – The problem is two requests that arrive simultaneously and cannot both be satisfied
• Complicated queries
  – Any query where a single table appears more than once in the FROM clause
  – The multiply referenced table should look the same at each reference, even with concurrent updates

The acronym ACID
• Atomic
  – Transaction perceived to be indivisible
• Consistent
  – Transforms the database from one consistent state to another
• Isolated
  – Understandable without regard to any other agents or transactions
• Durable
  – Once committed, permanence is guaranteed even with system crashes

Atomic
• Either all the actions are applied or none of them are applied
• No incomplete actions are allowed
• Since a transaction is made up of many smaller actions:
  – Some of these may be done before a problem occurs
  – The DBMS must be able to undo any of the smaller pieces of a transaction if the entire transaction is aborted

Errors
• What causes a transaction to fail?
• Expected problem
  – The withdrawal amount exceeds what the ATM or the source account holds
  – The transaction aborts itself
• Transient problem
  – For reasons seen later, the DBMS aborts the transaction and restarts it later
• System error
  – Disk failure, power failure, among others

Durability
• A transaction may either commit or roll back
• Committed transactions must be durable
• Rolled-back transactions must be completely undone
  – As if they were never executed at all
  – Queries are no problem, updates are
• Once a transaction executes a commit or rollback, this should be accomplished even if the system crashes

Consistency
• Two domains:
  – A database is consistent if all constraints are met
  – A database is consistent if it correctly models some real-world situation
• The first is met by the normal checking of the DBMS
• The second is the responsibility of the transaction

Consistency example
• Transferring money from one account to another
  – Removing the money from one account leaves the database inconsistent in the second sense
  – This is corrected when the money is added to the second account
• Both actions must be in the same transaction

Isolation
• Interleaving of the actions within several different transactions may occur
• The DBMS must guarantee that the result will be the same as if they were completely serialized
• This interleaving comes from concurrent execution
• Without interleaved execution, performance will be poor

Transaction Schedules
• A transaction is a list of actions
• The actions include:
  – Reads and writes of tuples
  – Commit or rollback commands
  – Others: arithmetic, comparisons, etc.
• Two lists may only interact through the reads and writes
• A transaction schedule is an ordering of all of the actions from a group of transactions

Schedules
• A schedule interleaves the actions of several transactions
• A complete schedule has all the actions of all the transactions
• A serial schedule removes interleaving
• Isolation dictates that an interleaved schedule ends with the same result as a serial schedule

Concurrency
• Can we afford a serial schedule?
  – One transaction completely finished before the next one is begun
• No
  – The impact on performance is too large
• Instead
  – We have to handle the transactions concurrently
• The issues are discussed in the concurrency presentation

Serializability
• A serializable schedule has an equivalent effect to a serial schedule
• The serializable schedule allows interleaved operations, while the serial schedule does not
• The idea is that we get concurrent execution and consistent results
• Consider some examples using two transactions

Notes on Examples
• We have two transactions, T1 and T2
• Each does two reads and two writes
• These will be denoted by R(A) and W(A)
  – Read A (a page)
  – Write A (a page)

Example 1 – No Commonality
T1      T2
R(A)
W(A)
        R(X)
        W(X)
R(B)
        R(Y)
        W(Y)
W(B)

No Commonality
• Easy example
• Since there are no pages in common, any interleaving works
  – Provided the order is maintained on each side
• Painless and free concurrency
• Somewhat more difficult if the write is an insert in a B+ tree index which causes splits
  – Then the pages could be different and still interfere

Example 2 – Commonality
T1      T2
R(X)
W(X)
        R(X)
        W(X)
R(Y)
W(Y)
        R(Y)
        W(Y)

Commonality
• This serializes the same as T1 completed and then T2
• The order of this could change and give a different serial schedule
• Such as the following

Example 2 – Again
T1      T2
        R(X)
        W(X)
R(X)
W(X)
        R(Y)
        W(Y)
R(Y)
W(Y)

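One common way to test a schedule mechanically is conflict serializability: draw an edge Ti -> Tj whenever an action of Ti comes before a conflicting action of Tj (same page, different transactions, at least one write) and check the graph for cycles. The slides define serializability by equivalent effect and do not name this test, so the following is only a sketch of a stricter, swapped-in technique. The function names, the tuple encoding of actions, and the `mixed_order` schedule are hypothetical; the two Example 2 schedules assume T1 goes first in one and T2 goes first in the other.

```python
from itertools import combinations


def steps(*actions):
    """Parse strings such as 'T1 R X' into (transaction, operation, page) tuples."""
    return [tuple(action.split()) for action in actions]


def conflicts(a, b):
    """Two actions conflict if they come from different transactions,
    touch the same page, and at least one of them is a write."""
    (txn_a, op_a, page_a), (txn_b, op_b, page_b) = a, b
    return txn_a != txn_b and page_a == page_b and "W" in (op_a, op_b)


def precedence_edges(schedule):
    """Collect edges Ti -> Tj where an action of Ti precedes a conflicting action of Tj."""
    edges = set()
    for i, j in combinations(range(len(schedule)), 2):
        if conflicts(schedule[i], schedule[j]):
            edges.add((schedule[i][0], schedule[j][0]))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed precedence graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


def is_conflict_serializable(schedule):
    """A schedule passes if its precedence graph has no cycle."""
    return not has_cycle(precedence_edges(schedule))


# Assumed reading of Example 2: T1 touches each page before T2 does.
example_2 = steps("T1 R X", "T1 W X", "T2 R X", "T2 W X",
                  "T1 R Y", "T1 W Y", "T2 R Y", "T2 W Y")
# Assumed reading of Example 2 Again: T2 touches each page first.
example_2_again = steps("T2 R X", "T2 W X", "T1 R X", "T1 W X",
                        "T2 R Y", "T2 W Y", "T1 R Y", "T1 W Y")
# Hypothetical schedule: T1 is first on X but T2 is first on Y.
mixed_order = steps("T1 R X", "T1 W X", "T2 R X", "T2 W X",
                    "T2 R Y", "T2 W Y", "T1 R Y", "T1 W Y")

for name, schedule in [("example_2", example_2),
                       ("example_2_again", example_2_again),
                       ("mixed_order", mixed_order)]:
    print(name, sorted(precedence_edges(schedule)), is_conflict_serializable(schedule))
```

A single edge direction means the interleaving behaves like that serial order. Edges in both directions mean no serial order can match.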
Again
• This serializes the same as T2 completed and then T1
• Either of these is acceptable
• All we have to do is guarantee that our interleaved schedule is equivalent to some serial schedule, not a particular serial schedule
• We do not care which customer gets the product as long as the game is fair

Non-Serializable Schedule
T1      T2
R(X)
        R(X)
        W(X)
        R(Y)
W(X)
R(Y)
W(Y)
        W(Y)

No Equivalent Serial Schedule
• T2’s write of X is lost
• T1’s write of Y is also lost
• This assumes that the entire page is given to or received from the buffer pool
• There are many other interleavings that also lose something
• This lost update is one of several concurrent execution anomalies

Interleaved Execution Anomalies
• The above is similar to updating a shared variable in memory
• Databases allow both commit and rollback operations, which cause further problems
• These are unlike the shared memory problem

Four combinations
• RR
  – Two separate transactions doing a Read
  – This is the only case that is always harmless
• RW
  – T1 wants to Read and T2 wants to Write the page
• WR
  – T1 wants to Write and T2 wants to Read the page
• WW
  – Both want to write the same page
  – This was the first example seen

Reading Uncommitted Data
• Uncommitted data is any data modified by a transaction that has not yet committed
• The modification may be in the past or future
• Reading uncommitted data is called a dirty read

An Aborted Transaction Problem
T1 T2 R(X) W(X) R(X) Rollback T1 R(X) W(X) T2 R(X) W(X) Rollback

Rollback Problems
• Notice in the previous example that it was T1, a query and not an update, that had problems
• This is an example of a WR problem
• We do not need a rollback to cause the problem
• Consider the next situation, where each record has $10,000
  – T1 increases this by 10% and T2 increases it by $1000

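The situation just described can be checked with a little arithmetic. This is a minimal sketch, assuming integer dollar balances and one particular interleaving in which T1 updates X before T2 does while T2 updates Y before T1 does. The function names and the `run` helper are hypothetical.

```python
def t1(balance):
    """T1: increase the balance by 10% (integer dollars)."""
    return balance * 110 // 100


def t2(balance):
    """T2: increase the balance by $1000."""
    return balance + 1000


def run(order_on_x, order_on_y, start=10_000):
    """Apply the updates to X and Y in the given per-record orders."""
    x = y = start
    for update in order_on_x:
        x = update(x)
    for update in order_on_y:
        y = update(y)
    return x, y


# Serial schedules: both records see the two transactions in the same order.
serial_t1_then_t2 = run([t1, t2], [t1, t2])
serial_t2_then_t1 = run([t2, t1], [t2, t1])

# Assumed interleaving: T1 reaches X first, T2 reaches Y first.
interleaved = run([t1, t2], [t2, t1])

print("T1 then T2:", serial_t1_then_t2)
print("T2 then T1:", serial_t2_then_t1)
print("interleaved:", interleaved)
```

Either serial order leaves X and Y equal to each other. The interleaved run leaves them different, so it matches neither serial result.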
A WR Problem
T1      T2
R(X)
W(X)
        R(X)
        W(X)
        R(Y)
        W(Y)
        Commit
R(Y)
W(Y)
Commit

Results of the WR anomaly
• The X record gets increased by 2000
  – 2000 = 10000*10% + 1000
• The Y record gets increased by 2100
  – 2100 = 1000 + (10000+1000)*10%
• Not consistent or serializable
• T2’s read of X and T1’s read of Y are dirty reads

RW Problems
• An RW problem can result in an unrepeatable read
• Two reads in a row that yield different results

An Unrepeatable Read
T1      T2
R(X)
        R(X)
        W(X)
        Commit
R(X)
W(X)
Commit

What’s the fix?
• Lock an object in one of several ways to prevent this type of interleaving
• The most common protocol is Strict Two-Phase Locking
  – AKA Strict 2PL
• There are other protocols as well

Types of Locks
• Shared lock
  – Used for reading
• Exclusive lock
  – Used for writing but also allows reading
• The lock may be on a:
  – Tuple
  – Page
  – Relation
• This is its granularity
• The larger the granularity, the less concurrency but the easier it is to manage

Strict 2PL Rules
• A transaction requests a lock when it desires to access the object
  – Shared lock for reads and exclusive for writes
• It holds all locks until complete
  – Either a commit or rollback
• These locks are inserted by the DBMS, not necessarily observed in the transaction
• A transaction that cannot get the lock is suspended until the item is available
• If both reading and writing are desired, get the exclusive lock

Some examples revisited
• Two new commands are added
• S(A) gets a shared lock on A
• X(A) gets an exclusive lock on A
• These locks are held until a commit or rollback

WW Problem with Locks
T1      T2      Notes
X(X)
R(X)
W(X)
        X(X)    Suspended until Commit
X(Y)
R(Y)
W(Y)
Commit
        R(X)
        W(X)
        X(Y)
        R(Y)
        W(Y)
        Commit

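The Strict 2PL rules and the WW example above can be sketched as a small lock table. This is a single-threaded illustration under stated assumptions, not how any particular DBMS implements locking: the `LockTable` class and its methods are hypothetical, a refused request simply returns `False` where a real system would suspend the transaction, and locks are released only all at once, at commit or rollback.

```python
class LockTable:
    """Toy lock table: shared ('S') locks are compatible with each other,
    an exclusive ('X') lock is compatible with nothing held by others."""

    def __init__(self):
        self.holders = {}  # object name -> {transaction: mode}

    def request(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn would be suspended."""
        held = self.holders.setdefault(obj, {})
        others = {t: m for t, m in held.items() if t != txn}
        if mode == "S":
            compatible = all(m == "S" for m in others.values())
        else:
            compatible = not others
        if not compatible:
            return False
        # Keep the strongest mode the transaction already holds.
        if held.get(txn) != "X":
            held[txn] = mode
        return True

    def release_all(self, txn):
        """Strict 2PL: every lock is dropped together at commit or rollback."""
        for held in self.holders.values():
            held.pop(txn, None)


# Replay of the WW example: T2 cannot lock X until T1 commits.
locks = LockTable()
t1_gets_x = locks.request("T1", "X", "X")    # granted
t2_gets_x = locks.request("T2", "X", "X")    # refused, T2 would be suspended
t1_gets_y = locks.request("T1", "Y", "X")    # granted
locks.release_all("T1")                      # T1 commits
t2_retry_x = locks.request("T2", "X", "X")   # granted after the commit
print(t1_gets_x, t2_gets_x, t1_gets_y, t2_retry_x)
```

Holding every lock until the end is what makes the protocol strict: no other transaction can read or overwrite a value that might still be rolled back.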
WR Problem with locks
T1      T2        Notes
        X(X)
        R(X)
        W(X)
S(X)              Shared lock causes suspension until rollback
        Rollback
R(X)

Another WR
T1      T2      Notes
X(X)
R(X)
W(X)
        X(X)    Exclusive lock suspended until commit occurs
R(Y)
W(Y)
Commit
        R(X)
        W(X)
        R(Y)
        W(Y)
        Commit

Deadlock
T1      T2      Notes
X(X)
        X(Y)
R(X)
        R(Y)
W(X)
        W(Y)
X(Y)            Suspended since T2 has Y
        X(X)    Suspended since T1 has X
Both are now deadlocked

What do we do?
• The usual solution is timers
• When a transaction’s suspension for a lock exceeds a threshold value
  – Abort it
  – Roll back all actions
  – Restart the whole transaction

Performance
• What does locking do to the performance of a DBMS?
• It must slow the DBMS
• This is better than incorrect results
• It will slow serializable schedules that otherwise had no problems
  – Even without counting lock bookkeeping
• Consider a previous example

A Serializable Schedule
T1      T2
R(X)
W(X)
        R(X)
        W(X)
R(Y)
W(Y)
Commit
        R(Y)
        W(Y)
        Commit

Commentary
• This schedule was equivalent to the serial schedule of T1 followed by T2
• It also had nice concurrency
• Introducing locking preserves correctness but destroys concurrency

A Serialized Schedule
T1      T2      Notes
X(X)
R(X)
W(X)
        X(X)    Lock prevents any action until T1’s commit
X(Y)
R(Y)
W(Y)
Commit
        R(X)
        W(X)
        X(Y)
        R(Y)
        W(Y)
        Commit

SQL
• The transaction statements of SQL are:
  – Commit
  – Rollback
  – Begin
• Not needed for simple queries
• Commit is the default for changes if neither is given

Begin
• Begin is used to show boundaries
• Form is: BEGIN; or BEGIN TRANSACTION;
• The transaction is then terminated by the Commit or Rollback

Example
• BEGIN;
  INSERT INTO course VALUES ('PHYS', 141, 4, 'College Physics');
  INSERT INTO course VALUES ('CSCI', 242, 3, 'Data Structures');
  COMMIT;
• The more common usage is through a program which determines whether to commit or roll back

Save Points
• A save point is a named point within a transaction to which a rollback can return
• Example: SAVEPOINT xyz … ROLLBACK TO SAVEPOINT xyz
• This prevents rolling back all the way to BEGIN
• This was introduced in SQL:1999

Granularity
• The size of the thing to lock
  – Tuple
  – Page
  – Relation
• Tradeoff
  – Locking large things impedes concurrency
  – Locking small things requires more overhead and allows bad things to happen

Phantom Read Problem
• Two reads with different results
• T1 queries a table and finds all rows matching certain criteria
• It places an exclusive lock on all of these
• T2 inserts a new record which matches the criteria
• T1 now re-queries with the same criteria and gets a different set
• Putting a shared lock on the whole table solves the problem but limits concurrency

Recovery
• The Recovery Manager is responsible for several important issues
  – Removing the effects of rolled-back transactions from the file system
  – Guaranteeing that committed changes are preserved in the file system
  – Defending against crashes and rebuilding the database when one occurs
• Recovery management is covered in a subsequent presentation

Closing Thoughts
• ACID is the ideal of database reliability
• This is not always possible in distributed databases
  – The CAP theorem precludes it
  – There we settle for BASE
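Returning to the SQL transaction statements and save points covered earlier, here is a minimal sketch of a program deciding between commit and rollback, using Python's standard `sqlite3` module with an in-memory database. The column names of `course` and the MATH and ART rows are hypothetical, since the slides give only the inserted values. SQLite is used purely for convenience, and other systems may differ in syntax details.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE course (dept TEXT, num INTEGER, credits INTEGER, title TEXT)"
)

# The example from the slides: two inserts committed as one unit.
conn.execute("BEGIN")
conn.execute("INSERT INTO course VALUES ('PHYS', 141, 4, 'College Physics')")
conn.execute("INSERT INTO course VALUES ('CSCI', 242, 3, 'Data Structures')")
conn.execute("COMMIT")

# A save point: only the work done after SAVEPOINT xyz is undone.
conn.execute("BEGIN")
conn.execute("INSERT INTO course VALUES ('MATH', 165, 4, 'Calculus I')")
conn.execute("SAVEPOINT xyz")
conn.execute("INSERT INTO course VALUES ('ART', 101, 3, 'Drawing')")
conn.execute("ROLLBACK TO SAVEPOINT xyz")
conn.execute("COMMIT")

# A full rollback: the delete is undone as if it never executed.
conn.execute("BEGIN")
conn.execute("DELETE FROM course")
conn.execute("ROLLBACK")

titles = [row[0] for row in conn.execute("SELECT title FROM course ORDER BY dept")]
print(titles)
```

The rollback to the save point discards the ART row but keeps the MATH row, which is then made permanent by the commit.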