Chapter 15: Transaction Management
Chapter 16: Transaction Management

Transaction Processing Overview
Transaction Concept • Transaction Definition in SQL • Transaction State • ACID Properties • Concurrent Executions • Serializability • Recoverability • Implementation of Isolation • Testing for Serializability

Transaction Processing - Basics
A transaction is a logical unit of database processing. Transaction processing systems include large databases and hundreds of concurrent users. Examples of these systems are airline reservations, banking, credit card processing, supermarket checkout, and similar systems.

Multi-User Database Systems
One way to classify DBMSs is according to the number of concurrent users: single-user or multi-user. The majority of database systems are of the multi-user type. Concurrent (or simultaneous, from the user's point of view) database usage is possible thanks to computer multiprogramming. A multiprogramming operating system executes some commands of one process, then suspends this process and executes some commands of another process. After a while, the execution of the first process is resumed at the point where it was interrupted. This type of process execution is called interleaving.

Interleaved and Parallel Processes
[Figure: timeline of two processes executed interleaved on one processor — process 1 runs during t1–t2 and t3–t4, process 2 during t2–t3 and t4–t5 — compared with the same two processes executed in parallel on two processors.]

A Question for You
Are multiprogramming and interleaving more efficient than monoprogramming with serial program execution?
Answers:
a) Yes, but only from a user's point of view.
b) No, because multiprogramming means interrupting and resuming programs, which introduces OS overheads.
c) Yes, from the point of view of both users and computers, because an OS interrupts execution of a program when it issues an I/O operation, thus saving long idle times, and resumes its execution when the I/O has finished.

Transaction Concept
A transaction is a single logical unit of database processing that includes one or more database access operations (reads and writes). A transaction is a unit of program execution that accesses and possibly updates various data items. It may involve one or more operations on the database; it could be as simple as a single SQL command, or as complex as the set of accesses performed by an application program. E.g. a transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Here our transaction consists of four database accesses (reads and writes) and two non-database operations (decreasing the value of A and increasing the value of B). A transaction should always transform the database from one consistent state to another.
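The six numbered steps can be sketched as a minimal Python model of the transaction. The `accounts` dictionary and its starting balances are invented for illustration; `read` and `write` stand in for the database accesses:

```python
# Minimal model of the transfer transaction: move $50 from A to B.
# read/write correspond to the database accesses; the arithmetic in
# between is the non-database work done in local variables.
accounts = {"A": 100, "B": 200}  # hypothetical starting balances

def read(item):
    return accounts[item]        # read(X): fetch the item from the database

def write(item, value):
    accounts[item] = value       # write(X): store the item back

a = read("A")      # 1. read(A)
a = a - 50         # 2. A := A - 50   (non-database operation)
write("A", a)      # 3. write(A)
b = read("B")      # 4. read(B)
b = b + 50         # 5. B := B + 50   (non-database operation)
write("B", b)      # 6. write(B)

# Consistency: the sum A + B is unchanged by the transaction.
assert accounts["A"] + accounts["B"] == 300
```

If the transaction stopped between steps 3 and 6, the $50 would have left A without reaching B — exactly the atomicity problem discussed below.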
If a transaction finishes successfully, all data it has changed are visible to other transactions. If a transaction fails for any reason, the DBMS has to undo all the changes that the transaction made against the database.

Transactions (continued)
In multi-user transaction processing systems, users execute database transactions concurrently. Most often, concurrent means interleaved. The users can attempt to modify the same database items at the same time, and that is a potential source of database inconsistency. Checking database integrity constraints is not enough to protect a database from the threats induced by its concurrent usage. There are two main issues to deal with:
• failures of various kinds, such as hardware failures and system crashes, which leave a transaction incomplete;
• concurrent execution of multiple transactions.

Transaction Outcomes
A transaction can have one of two outcomes:
• If it completes successfully, the transaction is said to have committed and the database reaches a new consistent state.
• If it does not execute successfully, the transaction is aborted. In this case, the database must be restored to the consistent state it was in before the transaction started. This is known as rollback.
Whatever the outcome, the database is in a consistent state at the end of the transaction.

Transaction Support
Once a transaction has committed, it cannot be aborted. Thus, if we decide that a committed transaction was a mistake, we must perform another transaction to reverse it. On the other hand, an aborted transaction can be restarted later and, depending on the cause of failure, may successfully execute and commit at that time. A DBMS has no way of knowing which updates are grouped together to form a single logical transaction. Therefore, the user must be provided with a way to indicate the boundaries of each transaction. For example, there may be keywords such as BEGIN_TRANSACTION, COMMIT, and ROLLBACK to delimit a transaction.
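As a sketch of such delimiters in practice, Python's built-in sqlite3 module exposes commit() and rollback() on a connection; the `account` table and its balances here are made up for the example:

```python
import sqlite3

# Illustration of transaction boundaries: commit on success, rollback on error.
# The account table and its balances are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 200)")
conn.commit()

try:
    # Both updates form one logical transaction: transfer $50 from A to B.
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()            # successful end: changes become permanent
except sqlite3.Error:
    conn.rollback()          # unsuccessful end: restore the previous consistent state

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

Had either UPDATE raised an error, rollback() would have discarded both updates, leaving the database in the state it was in before the transaction started.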
If such delimiters are not used, the whole program is usually treated as a single transaction, with the DBMS automatically performing a COMMIT upon successful termination, or a ROLLBACK if not.

Transaction Execution with SQL
Transaction support is provided by COMMIT and ROLLBACK. When the COMMIT statement is reached, all changes are permanently recorded in the database; it indicates the successful end of a transaction. When the ROLLBACK statement is reached, all changes are discarded and the database is rolled back to its previous consistent state; it indicates an unsuccessful end of a transaction.

Transaction State
Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. There are two options after a transaction has been aborted: restart the transaction (possible only if there is no internal logical error) or kill the transaction.
Committed – after successful completion. Committed transactions cannot be undone.

Effects of Transaction State (Cont.)
[Figure: transaction state diagram — Begin Transaction, Commit, Abort.]

Transaction Properties (ACID)
To preserve the integrity of data, the database system must ensure:
Atomicity. Either all operations of the transaction are executed and properly reflected in the database, or none are.
Consistency. A transaction transforms the database from one consistent state to another.
Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
Durability.
After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement — if the transaction fails after step 3 and before step 6, money will be "lost", leading to an inconsistent database state. The failure could be due to software or hardware. The system should ensure that the updates of a partially executed transaction are not reflected in the database: either all the operations of the transaction are completed, or the transaction is aborted.
Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates made to the database by the transaction must persist even if there are software or hardware failures. Ensuring durability is the responsibility of the recovery management component.
Consistency requirement — the sum of A and B is unchanged by the execution of the transaction. A transaction must see a consistent database, and its execution must leave the database either in a new stable state or reverted to the old stable state. During transaction execution the database may be temporarily inconsistent; when the transaction completes successfully, the database must be consistent. Erroneous transaction logic can lead to inconsistency. In general, consistency requirements include explicitly specified integrity constraints, such as primary keys and foreign keys, and implicit integrity constraints, e.g.
the sum of balances of all accounts, minus the sum of loan amounts, must equal the value of cash in hand.

Example of Fund Transfer (Cont.)
Isolation requirement — if between steps 3 and 6 another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be):
T1: 1. read(A)  2. A := A – 50  3. write(A)                                  4. read(B)  5. B := B + 50  6. write(B)
T2:                                            read(A), read(B), print(A+B)
Data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. Isolation can be ensured trivially by running transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see later.

Reasons for Incomplete Transactions
A transaction can be aborted or terminated unsuccessfully because of some problem during execution. The system may crash while one or more transactions are in progress. A transaction may encounter an unexpected situation (e.g. read an unexpected data value, or be unable to access some disk) and decide to abort. The DBMS ensures transaction atomicity by undoing the actions of incomplete transactions. It maintains a record, called the log, of all writes to the database. The log ensures durability as well: if the system crashes before the changes made by a completed transaction are written to disk, the log is used to remember and restore those changes when the system restarts.
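A minimal sketch of this undo mechanism, using write records that keep the old value of each item (the items, values, and transaction id are invented; a real DBMS log is far more elaborate):

```python
# Sketch of undoing an incomplete transaction from a write log.
# Each write record stores the old value, so undo replays them in reverse.
db = {"X": 100, "Y": 40}
log = [("start_transaction", "T1")]

def write_item(tid, item, new_value):
    # Log [write_item, T, X, old_value, new_value] before updating.
    log.append(("write_item", tid, item, db[item], new_value))
    db[item] = new_value

write_item("T1", "X", 70)
write_item("T1", "Y", 90)

# T1 fails before commit: undo its writes in reverse log order,
# restoring each item's old value.
for rec in reversed(log):
    if rec[0] == "write_item" and rec[1] == "T1":
        _, _, item, old_value, _ = rec
        db[item] = old_value
log.append(("abort", "T1"))
```

After the undo, the database is back in the state it was in before T1 started, and the log records the abort.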
Log File
Typically, a log file contains records with the following contents:
[start_transaction, T] (*T is the transaction id*)
[write_item, T, X, old_value, new_value]
[read_item, T, X] (*optional*)
[commit, T]
[abort, T]

Sources of Database Inconsistency
Uncontrolled execution of database transactions in a multi-user environment can lead to database inconsistency. There is a number of possible sources of database inconsistency. The typical ones are: the lost update problem, the dirty read problem, and the unrepeatable read problem.

Lost Update Problem
(time runs left to right)
T1: read_item(X)                 X = X – N                  write_item(X)
T2:               read_item(X)              X = X + M                     write_item(X)
After the termination of T2, X = X + M. T1's update to X is lost because T2 wrote over X. Generally, the lost update problem is characterized by:
• T2 reads X,
• T1 writes X, and
• T2 writes X.

Lost Update Problem (example)
Imagine that a customer wants to withdraw £30 from a bank account. At the same time, the bank is crediting this month's salary. Both transactions occur at roughly the same time and read the same initial balance; the last transaction to commit overwrites the update made by the first.
Time  T1 (withdrawal)           T2 (credit salary)          Balance
t1                              begin_transaction           100
t2    begin_transaction         read(balance)               100
t3    read(balance)             balance = balance + 1000    100
t4    balance = balance – 30    write(balance)              1100
t5    write(balance)            commit                      70
t6    commit                                                70

Uncommitted Dependency: Dirty Read Problem
T1: read_item(X)  X = X – N  write_item(X)                                         read_item(Y)  [T1 fails]
T2:                                         read_item(X)  X = X + M  write_item(X)
Generally, the dirty read problem is characterized by:
• T1 writes X,
• T2 reads X, and
• T1 fails.
Since T1 failed, the DBMS is going to undo the changes T1 made against the database. But T2 has already read the value X = X – N, and that value is going to be altered by the DBMS back to X. The uncommitted dependency problem (dirty read) occurs when one transaction sees the intermediate results of another (later aborted) transaction.
The uncommitted dependency (dirty read) example:
Time  T1 (withdrawal)           T2 (credit salary)          Balance
t1    begin_transaction                                     100
t2    read(balance)                                         100
t3    balance = balance – 30                                100
t4    write(balance)            begin_transaction           70
t5                              read(balance)               70
t6    rollback                  balance = balance + 1000    70
t7                              write(balance)              1070
t8                              commit                      1070
For some reason the withdrawal transaction is aborted, but the salary credit transaction has already seen its update. When T2 commits, the balance is incorrect (it should be 1100).

Unrepeatable Read / Inconsistent Retrieval Problem
T1: read_item(X)                                           read_item(X)
T2:               read_item(X)  X = X + M  write_item(X)
Generally, the unrepeatable read problem is characterized by:
• T1 reads X,
• T2 writes X, and
• T1 reads X again.
Transaction T1 has got two different values of X in two subsequent reads, because T2 has changed X in the meantime. Even if T1 did not execute the second read command, it would use a stale X value, and that is another form of the unrepeatable read problem.

Unrepeatable Read / Inconsistent Retrieval Problem (cont.)
The previous problems involved simultaneous updates to the database. However, problems can also arise when a transaction merely reads data while another transaction is updating it. Below, one transaction (T1) is transferring £10 from account Balw to Balz, and at the same time another transaction (T2) is summing all the accounts (Balw, Balx, Baly and Balz). Try to figure out what has gone wrong:
Time  T1 (transfer funds)   T2 (sum accounts)          Balw  Balx  Baly  Balz  Sum
t1    begin_transaction                                100   50    10    25
t2                          begin_transaction; sum=0   100   50    10    25    0
t3    read(Balw)            read(Balw)                 100   50    10    25    0
t4    Balw = Balw – 10      sum = sum + Balw           100   50    10    25    100
t5    write(Balw)           read(Balx)                 90    50    10    25    100
t6    read(Balz)            sum = sum + Balx           90    50    10    25    150
t7    Balz = Balz + 10      read(Baly)                 90    50    10    25    150
t8    write(Balz)           sum = sum + Baly           90    50    10    35    160
t9    commit                read(Balz)                 90    50    10    35    160
t10                         sum = sum + Balz           90    50    10    35    195
t11                         commit                     90    50    10    35    195
Here, the £10 transferred by T1 has been counted twice by T2, making its result too large by £10.
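The lost update interleaving from the earlier table can be simulated with plain local variables — a toy model, no real DBMS involved, with the balances taken from the example:

```python
# Simulating the lost update: T1 withdraws 30, T2 credits 1000,
# interleaved so both transactions read the same initial balance.
balance = 100

t1_local = balance          # T1 reads balance (100)
t2_local = balance          # T2 reads balance (100) -- same stale value
t2_local = t2_local + 1000  # T2 credits the salary in its local copy
balance = t2_local          # T2 writes 1100 and commits
t1_local = t1_local - 30    # T1 withdraws from its stale copy of 100
balance = t1_local          # T1 writes 70 -- T2's update is lost

# A serial execution (either order) would have ended with 1070.
```

The final balance is 70: the £1000 credit has vanished because T1 overwrote it with a value computed from the stale read.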
A Question for You
What is the difference between a dirty read and an unrepeatable read?
Answers:
a) There is no difference.
b) Even if there is a difference, I can't recall what it is.
c) The difference is:
– The dirty read is a consequence of reading updates made by a transaction before it has successfully finished (and has even failed later).
– The unrepeatable read is a consequence of allowing a transaction to read data that another one is altering.

Prevention of Concurrency Anomalies
Lost update, dirty read and unrepeatable read are called concurrency anomalies. The concurrency control part of a DBMS has the task of preventing these problems. The DBMS is responsible for ensuring that either all operations of a transaction are successfully executed and their effect is permanently stored in the database, or it appears as if the transaction had never started. The effect of a partially executed transaction has to be undone.

Advantages of Concurrent Execution of Transactions
• Increased throughput and resource utilization: e.g. one transaction can be using the CPU while another is reading from or writing to the disk, so the processor and disk spend less time idle — I/O activity can be done in parallel with CPU activity.
• Reduced average response time (the average time for a transaction to be completed after it has been submitted): short transactions need not wait behind long ones, and concurrent execution reduces unpredictable delays in running transactions.

Implementation of Atomicity and Durability
The recovery-management component of a database system implements the support for atomicity and durability. E.g.
a very simple but extremely inefficient scheme is the shadow-copy scheme: all updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy after the transaction reaches partial commit and all updated pages have been flushed to disk.

Implementation of Atomicity and Durability (Cont.)
db_pointer always points to the current consistent copy of the database. In case a transaction fails, the old consistent copy pointed to by db_pointer can be used, and the new copy can be deleted. The shadow-database scheme:
• assumes that only one transaction is active at a time;
• assumes disks do not fail;
• is useful for text editors, but extremely inefficient for large databases (why?);
• does not handle concurrent transactions.

Database Architecture
[Figure: DBMS components — Transaction Manager, Scheduler, Recovery Manager, Buffer Manager, low-level DBMS.]
The components of a DBMS that manage transactions are as follows. The transaction manager coordinates transactions on behalf of application programs. It communicates with the scheduler, which implements a particular strategy for concurrency control; the scheduler tries to maximize concurrency without allowing transactions to interfere with one another. If a failure occurs during a transaction, the recovery manager ensures that the database is restored to the state it was in before the start of the transaction. The buffer manager is responsible for the transfer of data between disk storage and main memory.

Transaction Management
Both concurrency control and recovery control are required to protect the database from data inconsistencies and data loss. Many DBMSs allow users to undertake simultaneous operations on the database. If these operations are not controlled, they can interfere with each other, and the database may become inconsistent. To overcome this, the DBMS implements concurrency control schemes, i.e.
mechanisms to achieve isolation — that is, to control the interaction among concurrent transactions in order to prevent them from destroying the consistency of the database. Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash, media failure, software error, or accidental or malicious destruction of data. Whatever the cause of failure, the DBMS must be able to restore the database to a consistent state.

Concurrency Control
Concurrency control is the process of managing simultaneous operations on the database without having them interfere with one another. It is needed because many users are able to access the database simultaneously. Note that managing concurrent access is easy if all users are only reading data: there is no way such users can interfere with one another. However, when two or more users are accessing the database simultaneously and at least one of them is updating data, there may be interference that can cause inconsistencies. To prevent harmful interference between transactions T1, T2, …, Tn sharing a database (with its consistency constraints), a DBMS must implement concurrency control techniques based on locks, or on timestamps and validation.

Schedules
A schedule is a sequence of instructions that specifies the chronological order (possibly interleaved) in which the instructions of concurrent transactions are executed. Suppose there are n transactions T1, T2, …, Tn. A schedule S of these n transactions is an ordering of their operations such that, for each Ti participating in S, the operations of Ti in S appear in the same order as in Ti itself. A schedule for a set of transactions must consist of all instructions of those transactions, and it must preserve the order in which the instructions appear in each individual transaction.
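This ordering constraint can be checked mechanically. A small Python sketch — the transactions and the schedule are invented examples, with a schedule represented as a sequence of (transaction, operation, item) triples:

```python
# A schedule as a sequence of (transaction, operation, item) triples.
# Check: the operations of each Ti appear in the same order as in Ti
# itself, and every instruction of every transaction is present.
T1 = [("read", "A"), ("write", "A"), ("read", "B"), ("write", "B")]
T2 = [("read", "A"), ("write", "A")]

schedule = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
]

def respects_transaction_order(schedule, transactions):
    seen = {tid: 0 for tid in transactions}   # next expected op per transaction
    for tid, op, item in schedule:
        if (op, item) != transactions[tid][seen[tid]]:
            return False                      # out of order for this transaction
        seen[tid] += 1
    # every instruction of every transaction must appear in the schedule
    return all(seen[tid] == len(ops) for tid, ops in transactions.items())

ok = respects_transaction_order(schedule, {"T1": T1, "T2": T2})
```

Here `ok` is True: the interleaving above keeps each transaction's own operations in order, which is exactly what the definition requires.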
Complete Schedules
A transaction that successfully completes its execution will have a commit instruction as its last statement (by default, a transaction is assumed to execute a commit instruction as its last step). A transaction that fails to successfully complete its execution will have an abort instruction as its last statement. A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule.

Serial Schedules
A schedule S is said to be a serial schedule if, for each Ti in S, all operations of Ti are executed consecutively in S — that is, the actions of different transactions are not interleaved, and transactions are executed from start to end one by one. Serial schedules are considered correct, i.e. they do not exhibit concurrency anomalies such as lost update, unrepeatable read, and dirty read. But serial schedules mean no interleaving, and hence are considered inefficient.

Serial Schedules: Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. Schedule 1 is a serial schedule in which T1 is followed by T2.
Schedule 2
A serial schedule in which T2 is followed by T1.
Schedule 3
Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule; it is a concurrent schedule. But it is equivalent to Schedule 1. In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedule 4
The following concurrent schedule does not preserve the value of A + B.

Examples of Legal Schedules (T < S)
1. Avoid the lost update problem — T transfers $100 from A to C: R(A) W(A) R(C) W(C); S transfers $100 from B to C: R(B) W(B) R(C) W(C).
2. Avoid the inconsistent retrievals problem — T transfers $100 from A to C: R(A) W(A) R(C) W(C); S computes the total balance for A and C: R(A) R(C).
3. Avoid non-repeatable reads — T transfers $100 from A to C: R(A) W(A) R(C) W(C); S checks the balance and withdraws $100 from A: R(A) W(A).
In each case the interleaving orders the conflicting accesses to the shared items consistently, with T preceding S.

Defining the Legal Schedules
1.
To be serializable, the conflicting operations of T and S must be ordered as if either T or S had executed first.
2. Suppose T and S conflict over some shared item(s) x.
3. In a serial schedule, T's operations on x would appear before S's, or vice versa — for every shared item x. We only care about the conflicting operations: everything else will take care of itself. (As it turns out, this ordering holds for all the operations, but again, we only care about the conflicting ones.)
4. A legal (conflict-serializable) interleaved schedule of T and S must exhibit the same property: either T or S "wins" the race to x, and serializability dictates that the winner takes all.

Serializable Schedules
Basic assumption — each transaction preserves database consistency. Thus serial execution of a set of transactions preserves database consistency. Schedules that allow interleaving to some extent and are correct are called serializable schedules. A serializable schedule is considered correct if it is equivalent to a serial schedule. There are different forms of schedule equivalence:
1. conflict serializability
2. view serializability

Conflict Serializable Schedules
Two operations conflict if:
• they belong to different transactions,
• they access the same item X, and
• at least one of them is a write.
A schedule is conflict equivalent to a serial schedule if the order of any two conflicting operations is the same in both schedules.

Simplified View of Transactions
We ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers between reads and writes. Our simplified schedules consist of only read and write instructions.

Conflicting Instructions
Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q:
1. li = read(Q), lj = read(Q): li and lj don't conflict.
2.
li = read(Q), lj = write(Q): they conflict.
3. li = write(Q), lj = read(Q): they conflict.
4. li = write(Q), lj = write(Q): they conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they were interchanged in the schedule.

Conflict Non-Equivalent Schedules
Serial schedule S1 — T1: read_item(X), X = X – N, write_item(X); then T2: read_item(X), X = X + M, write_item(X).
Schedule S2:
T1: read_item(X)                 X = X – N   write_item(X)
T2:               read_item(X)                              X = X + M   write_item(X)
Schedules S1 and S2 are NOT conflict equivalent, since:
• in S1, write1_item(X) precedes read2_item(X);
• in S2, read2_item(X) precedes write1_item(X).
S2 is not a conflict serializable schedule, because one cannot find a serial schedule that is conflict equivalent to S2.

Conflict Serializability
If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions, we say that S and S′ are conflict equivalent. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Conflict Serializability (Cont.)
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.

Conflict Serializability (Cont.)
Example of a schedule that is not conflict serializable: we are unable to swap instructions in the schedule to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>.

View Serializability
Let S and S′ be two schedules with the same set of transactions. S and S′ are view equivalent if the following three conditions are met for each data item Q:
1. If in schedule S transaction Ti reads the initial value of Q, then in schedule S′ transaction Ti must also read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S′ transaction Ti must also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S′.
As can be seen, view equivalence is also based purely on reads and writes alone.

View Serializability (Cont.)
A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict serializable schedule is also view serializable, but the reverse is not true: there are schedules that are view serializable but not conflict serializable (e.g. a schedule view equivalent to the serial schedule <T3, T4, T6>). Every view serializable schedule that is not conflict serializable has blind writes — write operations performed without having first performed a read(Q) are called blind writes.

Other Notions of Serializability
Some schedules produce the same outcome as a serial schedule (e.g. <T1, T5>), yet are not conflict equivalent or view equivalent to it. Determining such equivalence requires analysis of operations other than read and write.

The Graph Test for Serializability
To determine if a schedule is serializable, make a directed graph:
• Add a node for each committed transaction.
• Add an arc from T to S if any equivalent serial schedule must order T before S. T must commit before S iff the schedule orders some operation of T before some operation of S. The schedule only defines such an order for conflicting operations, so this means that a pair of accesses from T and S conflict over some item x, and the schedule says T "wins" the race to x.
The schedule is conflict-serializable if the graph has no cycles (winner takes all).

Testing for Serializability
Consider some schedule of a set of transactions T1, T2, …, Tn.
Precedence graph — the graph consists of a pair G = (V, E).
It is a directed graph where V, the set of vertices, consists of all the transactions, and E, the set of edges, consists of all edges Ti → Tj for which one of three conditions holds:
1. Ti executes write(X) before Tj executes read(X);
2. Ti executes read(X) before Tj executes write(X);
3. Ti executes write(X) before Tj executes write(X).
If the precedence graph for schedule S has a cycle, then S is not conflict serializable. If the graph contains no cycles, then S is conflict serializable.

Example 1
[Figure: a two-node precedence graph built from the conflicting read/write operations on item X.]

Testing for Conflict Serializability: Schedule S2
T1: read_item(X)                 X = X – N   write_item(X)
T2:               read_item(X)                              X = X + M   write_item(X)
Conflicting pairs: read2_item(X) before write1_item(X); read1_item(X) before write2_item(X); write1_item(X) before write2_item(X). These yield edges T2 → T1 and T1 → T2 — a cycle, so S2 is not conflict serializable.

Testing for Conflict Serializability: Schedule S3
T1: read_item(X), X = X – N, write_item(X), read_item(Y), Y = Y + Q, write_item(Y)
T2: (after T1's write of X) read_item(X), X = X + M, write_item(X)
Conflicting pairs: read1_item(X) before write2_item(X); write1_item(X) before read2_item(X); write1_item(X) before write2_item(X). All edges go T1 → T2, so the graph is acyclic.

Test for Conflict Serializability
A schedule is conflict serializable if and only if its precedence graph is acyclic. Cycle-detection algorithms (such as depth-first search) require on the order of n² operations, where n is the number of vertices in the graph (better algorithms take order n + e, where e is the number of edges). If the precedence graph is acyclic, a serializability order can be obtained by a topological sort of the graph; this is a linear order consistent with the partial order of the graph. For example, serializability orders for Schedule A would be Ti–Tj–Tk–Tm and Ti–Tk–Tj–Tm.

Test for View Serializability
The precedence graph test for conflict serializability cannot be used directly to test for view serializability. The problem of checking whether a schedule is view serializable falls in the class of NP-complete problems, and an extension of the test for view serializability has cost exponential in the size of the precedence graph.
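The precedence-graph test for conflict serializability can be sketched in Python. Schedules are represented as (transaction, operation, item) triples; the schedule below corresponds to S2 from the slides, and the transaction names are illustrative:

```python
# Precedence-graph test: add an edge Ti -> Tj for each conflicting pair
# (same item, at least one write, different transactions, Ti's op first),
# then check the graph for a cycle.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "write" in (op1, op2):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reaches(start, target):
        stack, seen = [start], set()
        while stack:                       # iterative depth-first search
            for nxt in graph.get(stack.pop(), ()):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False
    return any(reaches(t, t) for t in graph)

# Schedule S2: T2 reads X before T1 writes it, and T1 writes X before
# T2 writes it -- edges in both directions, hence a cycle.
s2 = [("T1", "read", "X"), ("T2", "read", "X"),
      ("T1", "write", "X"), ("T2", "write", "X")]
conflict_serializable = not has_cycle(precedence_graph(s2))
```

For s2 the graph contains both T1 → T2 and T2 → T1, so `conflict_serializable` is False, matching the analysis of S2 above; running the same test on the serial schedule S1 yields an acyclic graph.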
Thus the existence of an efficient algorithm is extremely unlikely. However, practical algorithms that just check some sufficient conditions for view serializability can still be used.

Recoverable Schedules
We need to address the effect of transaction failures on concurrently running transactions. Recoverable schedule — for each pair of transactions Ti, Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj. The following schedule is not recoverable if T9 commits immediately after its read: if T8 aborts, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.

Cascading Rollbacks
Cascading rollback — a single transaction failure leads to a series of transaction rollbacks. Consider a schedule where none of the transactions has yet committed (so the schedule is recoverable): if T10 fails, T11 and T12 must also be rolled back, because T11 depends on T10 and T12 depends on T11. This can lead to the undoing of a significant amount of work.

Cascadeless Schedules
Cascadeless schedules — schedules in which cascading rollbacks cannot occur: for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj. Every cascadeless schedule is also recoverable. It is desirable to restrict schedules to those that are cascadeless.

Concurrency Control
A database must provide a mechanism that will ensure that all possible schedules are either conflict or view serializable, and are recoverable and preferably cascadeless. A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency. Testing a schedule for serializability after it has executed is a little too late! The goal is to develop concurrency control protocols that will assure serializability.

Concurrency Control vs. Serializability Tests
Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, recoverable, and cascadeless. Concurrency control protocols generally do not examine the precedence graph as it is being created; instead, a protocol imposes a discipline that avoids nonserializable schedules. We study such protocols in the next chapter. Different concurrency control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead they incur. Tests for serializability help us understand why a concurrency control protocol is correct.

Summary
Executing transactions in an interleaved way may bring a database into an inconsistent state. The transaction anomalies are: lost update, dirty read, and unrepeatable read. A DBMS is responsible for ensuring that either all operations of a transaction are successfully executed, or the transaction is rolled back. The log file records all important events (start, read, write, commit). When a transaction reaches its commit point, everything is safely stored in the database (or in the log file).

Summary on Serializability
Concurrency anomalies are avoided if a schedule is serializable (equivalent to a serial schedule). Serializable schedules are desirable because serial schedules are generally inefficient. A schedule is conflict serializable if there are no cycles in its precedence graph. Serializability is only a tool for developing an understanding of desirable schedules; it is not tested by the DBMS before executing a series of transactions. Instead, DBMSs apply protocols that assure the correctness of the schedule. Many of these protocols rely on locking.

End of Chapter
Serializability Tests Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless . Concurrency control protocols generally do not examine the precedence graph as it is being created Instead a protocol imposes a discipline that avoids nonseralizable schedules. We study such protocols in next chapter. Different concurrency control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead that they incur. Tests for serializability help us understand why a concurrency control protocol is correct. Summary Executing transaction in an interleaved way may bring a database in an inconsistent state Transaction anomalies are: Lost update, Dirty read, and Unrepeatable read A DBMS is responsible to ensure that either all operations of a transaction are successfully executed, or it is rolled back Log file records all important events (start, read, write, commit) When a transaction reaches its commit point, everything is safely stored in a database (or a log file) Summary on Serializability Concurrency anomalies are avoided if a schedule is serializable (equivalent to a serial schedule) Serializable schedules are desirable because serial schedules are generally inefficient A schedule is conflict serializable if there are no cycles in the precedence graph Serializibility is only a tool to develop understanding of desirable schedules, but it is not tested by DBMS before executing a series of transactions Instead, DBMS’s apply protocols that assure correctness of the schedule Many of these protocols rely on locking End of Chapter