* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 4. Transaction processing concepts
Clusterpoint wikipedia , lookup
Database model wikipedia , lookup
Consistency model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Object-relational impedance mismatch wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Global serializability wikipedia , lookup
Commitment ordering wikipedia , lookup
3. Transaction processing concepts What is a transaction? • Logical unit of work • Execution of a program that accesses or changes the database Fundamental property of a database: • The data are shared among several users and applications concurrent processing, possibly conflicting interests. AdvDB-3 J. Teuhola 2015 64 Transaction viewpoints • A transaction should lead the database from one consistent state to another. • Partial execution not allowed (principle: all or nothing) • Concurrent access to data by multiple transactions should be supported. • Transactions may end prematurely, due to system (hardware/software) failure; recovery mechanisms are needed. AdvDB-3 J. Teuhola 2015 65 Categories of database processing • OLTP = On-Line Transaction Processing – Needed for the everyday functioning of the organization’s main activities, accessing the ’operative’ databases – Short transactions and response times • OLAP = On-Line Analytic Processing – – – – Decision support, data mining Long read-only transactions ’Data warehouse’ The data need not be exactly up-to-date; refreshed periodically AdvDB-3 J. Teuhola 2015 66 Interleaved execution • Concurrent transactions may be executed in an interleaved fashion. • Transactions consist of steps, between which the data may be inconsistent. • Arbitrary interleaving of steps will cause problems (interference): – Lost update problem – Temporary update problem (‘dirty read’) – Incorrect summary problem AdvDB-3 J. Teuhola 2015 67 Lost update problem Example: Bank account, two parallel withdrawals Time Transaction 1: Withdraw 200 from account 123 Transaction 2: Withdraw 100 from account 123 bal = read_balance(123) = 1000 € bal = bal – 200 € bal = read_balance(123) = 1000 € bal = bal - 100 € write_balance(123, bal) = 800 € write_balance(123, bal) = 900 € • The first update is lost. Note that the transactions are not aware of each other’s internal program variables (bal). AdvDB-3 J. Teuhola 2015 68 Temporary update (‘dirty read’) problem Example transactions on bank accounts: Time Transaction 1 Transfer 100 € from account 123 to account 789 Transaction 2 Withdraw 200 € from account 123 bal = read_balance(123)=1000€ bal = bal - 100€ write_balance(123, bal)=900€ bal = read_balance(123) = 900€ bal = read_balance(789)=2000€ abort; recover_balance(123)=1000€ bal = bal - 200 € write_balance(123, bal)=700 € // Should be 800 € AdvDB-3 J. Teuhola 2015 69 Incorrect summary problem Time Transaction 1 Sum of balances Transaction 2 Transfer 100 € from 789 to 123 sum = 0 bal = read_balance(123) = 1000 € sum = sum + bal = 1000 € bal = read_balance(456) = 2000 € sum = sum + bal = 3000€ bal = read_balance(789) = 3000 € bal = bal – 100 € write_balance(789, bal) = 2900 € bal = read_balance(123) = 1000 € bal = bal + 100€ write_balance(123, bal) = 1100 € bal = read_balance(789) = 2900 € sum = sum + 2900 € = 5900 € // Should be 6000 € AdvDB-3 J. Teuhola 2015 70 Reasons for recovery • DBMS must ensure that each transaction is either (1) executed in totality (all operations completed successfully; updates made permanent), or (2) has no effect on the database contents nor on other transactions. • In case of failure, the DBMS must roll back the (partially executed) transaction so that (2) holds. AdvDB-3 J. Teuhola 2015 71 Types of failures • Hardware/software failure: contents of main memory may be lost. • Transaction error: e.g. illegal operation or user interrupt. • Logical error: Violation of database consistency constraints. • Concurrency control error: Interference of transactions, e.g. deadlock. • Disk failure, accidents, etc.: Some data are usually lost. AdvDB-3 J. Teuhola 2015 72 Transaction events • • • • • • • begin_transaction read/write commit: successful end; fix the changes rollback (abort): unsuccessful ending undo effects of a failed transaction redo effects of a successful transaction end_transaction AdvDB-3 J. Teuhola 2015 73 System log • On disk (permanent storage) • Enables recovery from failures • Stores data about transaction states and performed updates (before-/after-images) • Transactions identified by a unique id. • Log entries: start, [read,] write, commit, abort • Undo needs before-image (old value) • Redo needs after-image (new value) AdvDB-3 J. Teuhola 2015 74 Commit point • Database operations performed successfully • Effects recorded in the log. • At commit the log page in buffer must be force-written to disk. • Writing other buffer pages may be postponed. • At failure: – Uncommitted transactions are undone. – Committed transactions may have to be redone. AdvDB-3 J. Teuhola 2015 75 Checkpoint • Effects of write operations of committed transactions are forced to disk. • A transaction need not be redone at failure, if its commit-entry in the log is before the last checkpoint. • Checkpoint is taken periodically. AdvDB-3 J. Teuhola 2015 76 Checkpoint actions • Suspend execution. • Force-write committed updates. • Write the checkpoint info to the log: – active transactions – pointers to the first and last log entries of transactions (to speed-up recovery) • Resume execution. AdvDB-3 J. Teuhola 2015 77 Transaction processing scenario Main memory Program 1 Program 2 DBMS manages DB buffer Log buffer ‘Page frames’ Force-write at checkpoint DB Disk pages AdvDB-3 Force-write at commit Log pages J. Teuhola 2015 Before-/ after-images of changed tuples 78 Desirable (ACID) properties of transactions • Atomicity: All or nothing. • Consistency: From one consistent state of the database to another. • Isolation: Updates are invisible before commit; no need for cascading rollbacks. • Durability: Committed changes are not lost (responsibility of the recovery method), except in catastrophic failures. AdvDB-3 J. Teuhola 2015 79 Levels of isolation 0: No overwrite of dirty reads. 1: No lost updates. 2. No dirty reads & no lost updates. 3. ‘True isolation’: No dirty reads & no lost updates & repeatable read. AdvDB-3 J. Teuhola 2015 80 Schedule • Assume a set of transactions {T1, ..., Tn} with ordered operations: Ti = <op1>i ; <op2>i ; ... • A schedule S is an ordered set of operations, where each Ti-sequence is a subsequence of S. • Notice: Operations of different transactions may be interleaved. AdvDB-3 J. Teuhola 2015 81 Example schedule • Operation sequences of three transactions: T1: T2: T3: r1(x); r1(y); w1(x); c1; r2(y); w2(y); c2; r3(x); r3(y); w3(x); c3; • One possible interleaved schedule: r1(x); r2(y); r1(y); w1(x); r3(x); r3(y); w2(y); w3(x); c2; c1; c3; AdvDB-3 J. Teuhola 2015 82 Conflict of operations • Notation for operations: <op><trans-id>(<data item>) where <op> is one of {r, w, c, a} = {read, write, commit, abort} • Conflict (= danger): Two operations <op1>i(X) and <op2>j(X) may cause trouble if i j and (<op1> = w or <op2> = w) AdvDB-3 J. Teuhola 2015 83 Example of conflicts • Schedule: r1(x); r1(y); r2(x); w1(x); r2(y); w2(x); c1; c2; • Conflicting pairs of operations: – r1(x) and w2(x) – r2(x) and w1(x) – w1(x) and w2(x) • Not in conflict: – r1(y) and r2(y) AdvDB-3 J. Teuhola 2015 84 Complete schedule S is a complete schedule of T1, T2, ..., Tn, if (1) S contains operations of T1, T2, ..., Tn (but no others). (2) Operation order of each Ti is preserved. (3) Order of operations is fixed for each conflicting pair; otherwise ordering may be partial. (4) Commit/abort is the last operation of each Ti. AdvDB-3 J. Teuhola 2015 85 Committed projection • Committed projection of S = All operations of S belonging to committed transactions. • Reason for this concept: New transactions appear all the time; difficult to find a moment when no active transactions exist. AdvDB-3 J. Teuhola 2015 86 Recoverable schedule • Transaction T2 that reads X does not commit until T1 that wrote X (before the read) has committed. • Explanation: T2 can be rolled back as long as it has not committed. If T1 aborts, T2 should be rolled back to avoid consequences of dirty read. AdvDB-3 J. Teuhola 2015 87 Cascading rollback Cascade phenomenon: T2 has to be aborted because T1 aborted; T3 has to be aborted because T2 aborted; ... Prevention: Read an item only after the writing transaction has committed (cascadeless schedule). Strict schedule: Read/write X only after the last writer transaction of X has committed. Advantage: Simpler undo. Disadvantage: Lower level of concurrency AdvDB-3 J. Teuhola 2015 88 Serializability Aim: • Correct (but not necessarily serial) schedules for interfering transactions Means: • Sufficient isolation Serial schedule: • All operations of each transaction Ti are executed consecutively, without interleaving. AdvDB-3 J. Teuhola 2015 89 Serializability (cont.) Axiom: For independent transactions every serial schedule is correct (n! alternatives for n transactions). Problem: Limited concurrency; if one transaction waits for I/O, another cannot take over; the processor is idle. Serializable schedule: Equivalent to some serial schedule (same result, though interleaved execution); means logical correctness. AdvDB-3 J. Teuhola 2015 90 Forms of schedule equivalence • Result equivalence: May be accidental; it should hold for all possible database states. • Conflict equivalence: The order of any two conflicting operations is the same in both schedules. • View equivalence: A read operation reads the result of the same write operation (or none) in both schedules. AdvDB-3 J. Teuhola 2015 91 Forms of serializability Corresponding to the forms of equivalence: • Conflict serializability: Simple testing algorithm • View serializability: Less restrictive than conflict serializability. Testing is an NP-complete problem. Note. Every conflict-serializable schedule is also view-serializable, but not vice versa. AdvDB-3 J. Teuhola 2015 92 Testing conflict serializability 1. For each transaction, create a node in a precedence graph. 2. For each pair of conflicting operations <op>Ti(X) and <op>Tj(X), occurring in this order, create an edge from Ti to Tj. 3. The schedule is serializable if the precedence graph has no cycles. AdvDB-3 J. Teuhola 2015 93 Serializable schedule equivalent serial schedule 1. Derive the precedence graph of transactions as above. 2. Perform a topological sort of the graph nodes. 3. The resulting sequence is the solution (which may not be unique, in general). AdvDB-3 J. Teuhola 2015 94 Example of a precedence graph Schedule: r1(x), r2(y), w1(x), r3(x), r3(y), w3(x), w3(y), w2(y), c1, c2, c3 1 2 y x y 3 AdvDB-3 J. Teuhola 2015 The schedule is not serializable, due to the cycle. 95 Problems of serializability • Difficult to determine the order of operations beforehand. • Scheduling must be done on-the-fly, based on waiting, priorities, system load, etc. • Testing serializability afterwards may be too optimistic (cf. optimistic concurrency control). • Serializability can be applied only to the committed projection. AdvDB-3 J. Teuhola 2015 96 Solution to maintaining serializability • Control the interleaving of operations by obeying rules (a protocol) that guarantee serializability without checking it. (E.g. two-phase locking protocol, see later) • Conclusion: Serializability theory helps to gain better understanding of protocols, to obtain correct interleaving and improved concurrency. AdvDB-3 J. Teuhola 2015 97 Debit-credit transactions • Less restrictive form of equivalence • Addition to and subtraction from an item can be done in any order (commutativity). • A schedule is correct if <read, add/subtract, write> triples are not interrupted. AdvDB-3 J. Teuhola 2015 98 Granularity • To be decided: Unit of data considered in scheduling • A smaller unit enables a higher level of concurrency (but involves also more effort to manage) • A (disk) page is often the simplest unit of operation. AdvDB-3 J. Teuhola 2015 99 Transaction support in SQL • Implicit BEGIN_TRANSACTION • Explicit COMMIT or ROLLBACK • SET TRANSACTION characteristics: – Access mode: READ ONLY or READ WRITE – Diagnostic area size: Number of feedback conditions held simultaneously. – Isolation level (from weakest to strongest): • • • • READ UNCOMMITTED (allows dirty reads) READ COMMITTED (allows non-repeatable reads) REPEATABLE READ (allows ‘phantoms’) SERIALIZABLE (actually stronger than defined earlier) AdvDB-3 J. Teuhola 2015 100