Transaction Processing

Transactions
• Many enterprises use databases to store information about their state
  – e.g., balances of all depositors at a bank
• When an event occurs in the real world that changes the state of the enterprise, a program is executed to change the database state in a corresponding way
  – e.g., a bank balance must be updated when a deposit is made
• Such a program is called a transaction

What Does a Transaction Do?
• Update the database to reflect the occurrence of a real-world event
  – Deposit transaction: update customer’s balance in database
• Cause the occurrence of a real-world event
  – Withdraw transaction: dispense cash (and update customer’s balance in database)
• Return information from the database
  – RequestBalance transaction: outputs customer’s balance

Transactions
• The execution of each transaction must maintain the relationship between the database state and the enterprise state
• Therefore additional requirements (the ACID properties) are placed on the execution of transactions beyond those placed on ordinary programs:
  – Atomicity
  – Consistency
  – Isolation
  – Durability

ACID Properties
• Atomic - Transaction should either complete or have no effect at all
  – Responsibility of transaction processing system
• Consistent - Transaction should correctly transform the database state to reflect the effect of a real-world event
  – Responsibility of transaction designer
• Isolation - The effect of concurrently executing a set of transactions is the same as if they had executed serially (serializable)
  – Responsibility of transaction processing system
• Durable - The effect of a transaction on the database state should not be lost once the transaction has committed
  – Responsibility of transaction processing system

Database Consistency
• Enterprise (business) rules limit the occurrence of certain real-world events
  – Student cannot register for a course if the current number of registrants equals the maximum allowed
• These limitations are
  (static) integrity constraints: assertions that must be satisfied by the database state
• Database is consistent if all static integrity constraints are satisfied

Transaction Consistency
• A consistent database state does not necessarily model the actual state of the enterprise
  – A deposit transaction that increments the balance by the wrong amount maintains the integrity constraint balance ≥ 0, but does not maintain the relation between the enterprise and database states
• A consistent transaction maintains database consistency and the correspondence between the database state and the enterprise state (implements its specification)
  – Specification of deposit transaction includes balance′ = balance + amt_deposit (balance is the initial value of balance; balance′ is its value after the deposit)

Atomicity
• A real-world event either happens or does not happen
  – Student either registers or does not register
• Similarly, the system must ensure that either the corresponding transaction runs to completion or, if not, it has no effect at all
  – Not true of ordinary programs. A crash could leave files partially updated on recovery

Commit and Abort
• If the transaction successfully completes, it is said to commit
  – The system is responsible for preserving the transaction’s results in spite of subsequent failures
• If the transaction does not successfully complete, it is said to abort
  – The system is responsible for undoing, or rolling back, any changes the transaction has made

Reasons for Abort
• System crash
• Transaction aborted by the system
  – Execution cannot be made atomic (a site is down)
  – Execution did not maintain database consistency (integrity constraint is violated)
  – Execution was not isolated
  – Resources not available (deadlock)
• Transaction requests to roll back

API for Transactions
• DBMSs provide commands for setting transaction boundaries.
  For example:
  – begin transaction
  – commit
  – rollback
• The commit command is a request
  – The system might commit the transaction, or it might abort it for one of the reasons on the previous slide
• The rollback command is always satisfied

Durability
• Durability deals with failure:
  – Media failure
• The system must ensure that once a transaction commits, its effect on the database state is not lost in spite of subsequent failures
  – Not true of ordinary programs. A media failure after a program successfully terminates could cause the file system to be restored to a state that preceded the program’s execution
• Mechanism for dealing with failures is the log

Isolation
• Serial execution: the transactions execute one after the other
  – Each one starts after the previous one completes.
  – The execution of each transaction is isolated from all others.
• Serial execution is inadequate from a performance perspective
• Concurrent execution offers performance benefits:
  – A computer system has multiple resources capable of executing independently (e.g., CPUs, I/O devices),
  – only concurrently executing transactions can make effective use of the system

Concurrent Execution
[Figure not preserved]

Isolation
• Concurrent (interleaved) execution of a set of consistent transactions offers performance benefits, but might not be correct
• Example: course registration; cur_reg is the number of current registrants
    T1: r(cur_reg : 29) w(cur_reg : 30)
    T2: r(cur_reg : 29) w(cur_reg : 30)
    (time runs left to right; both reads occur before either write)
  Result: Database state no longer corresponds to real-world state, integrity constraint violated (cur_reg ≠ |list_of_registered_students|)

Transaction Schedule
    T1: begin_transaction(); …. p1,1; …. p1,2; ….
        p1,3; commit();
[Figure: T1, with its local variables, sends the transaction schedule p1,1 p1,2 p1,3 to the DB server]
• Consistent - performs correctly when executed in isolation starting in a consistent database state
  – Preserves database consistency
  – Moves database to a new state that corresponds to new real-world state

Schedule
[Figure: the transaction schedules of T1, T2, T3 are merged into the arriving schedule; inside the database server, concurrency control turns it into the schedule in which requests are serviced and passes that to the database]

Example Schedules
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule, in which T1 is followed by T2.
[Schedule 1: table not preserved]

Example Schedule (Cont.)
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to the previous schedule.
[Schedule 2: table not preserved]
• In both Schedule 1 and 2, the sum A + B is preserved.

Example Schedules (Cont.)
• The following concurrent schedule does not preserve the value of the sum A + B.
[Schedule 3: table not preserved]

Correct Schedules
• Interleaved schedules equivalent to serial schedules are the only ones guaranteed to be correct for all applications
• Equivalence based on commutativity of operations
• Definition: Database operations p1 and p2 commute if, for all initial database states, they return the same results and leave the database in the same final state when executed in either order.
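
The definition of commuting operations can be checked by brute force on a toy model. The sketch below is only an illustration under stated assumptions: operations are hypothetical tuples (kind, item, value), the database is a small dictionary, and "all initial states" is approximated by a tiny value domain. The names run_op and commute are invented for this example.

```python
from itertools import product

def run_op(op, state):
    """Run one operation on a copy of state; return (result, new_state)."""
    kind, item, value = op
    state = dict(state)
    if kind == "r":
        return state[item], state      # a read returns the current value
    state[item] = value                # kind == "w": a write returns nothing
    return None, state

def commute(p1, p2, items=("x", "y"), domain=(0, 1, 2)):
    """True if p1 and p2 return the same results and leave the same final
    state in either order, for every initial state in the toy domain."""
    for values in product(domain, repeat=len(items)):
        init = dict(zip(items, values))
        r1_a, s = run_op(p1, init)     # order p1, p2
        r2_a, final_a = run_op(p2, s)
        r2_b, s = run_op(p2, init)     # order p2, p1
        r1_b, final_b = run_op(p1, s)
        if (r1_a, r2_a, final_a) != (r1_b, r2_b, final_b):
            return False
    return True

print(commute(("r", "x", None), ("r", "x", None)))  # two reads of x
print(commute(("w", "x", 7), ("w", "y", 9)))        # writes to different items
print(commute(("w", "x", 7), ("r", "x", None)))     # write and read of x
print(commute(("w", "x", 7), ("w", "x", 9)))        # two writes of x
```

In this model the first two pairs commute and the last two do not, which is the behaviour the definition describes for reads and writes.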

Commutativity of Read and Write Operations
• p1 commutes with p2 if
  – They operate on different data items
    • w1(x) commutes with w2(y) and r2(y)
  – Both are reads
    • r1(x) commutes with r2(x)
• Operations that do not commute conflict
  • w1(x) conflicts with w2(x)
  • w1(x) conflicts with r2(x)

Equivalence of Schedules
• An interchange of adjacent operations of different transactions in a schedule creates an equivalent schedule if the operations commute
    S1 : S1,1, pi,j, pk,l, S1,2
    S2 : S1,1, pk,l, pi,j, S1,2
    where i ≠ k
• Equivalence is transitive: If S1 can be derived from S2 by a series of such interchanges, S1 is equivalent to S2

Example of Equivalence
    S1: r1(x) r2(x) w2(x) r1(y) w1(y)
    S2: r1(x) r2(x) r1(y) w2(x) w1(y)
    S3: r1(x) r1(y) r2(x) w2(x) w1(y)
    S4: r1(x) r1(y) r2(x) w1(y) w2(x)
    S5: r1(x) r1(y) w1(y) r2(x) w2(x)
• Conflicting operations are ordered in the same way in every schedule
• S1 is equivalent to S5
• S5 is the serial schedule T1, T2
• S1 is serializable
• S1 is not equivalent to the serial schedule T2, T1

Example of Equivalence
    T1: begin transaction; read (x, X); X = X+4; write (x, X); commit;
    T2: begin transaction; read (x, Y); write (y, Y); commit;
    Initial state: x=1, y=3
• Interchange commuting operations:
    r1(x) r2(x) w2(y) w1(x)             final state x=5, y=1
    r2(x) w2(y) r1(x) w1(x)   (T2 T1)   final state x=5, y=1
• Interchange conflicting operations:
    r1(x) r2(x) w2(y) w1(x)             final state x=5, y=1
    r1(x) w1(x) r2(x) w2(y)   (T1 T2)   final state x=5, y=5

Serializable Schedules
• S is serializable if it is equivalent to a serial schedule
• Transactions are totally isolated in a serializable schedule
• A schedule is correct for any application if it is a serializable schedule of consistent transactions
• The schedule r1(x) r2(y) w2(x) w1(y) is not serializable

Serializability
• Different forms of schedule equivalence give rise to the notions of:
  1. Conflict serializability
  2. View serializability

Conflict Serializability
• Two schedules are conflict equivalent if:
  – they have the same sets of actions, and
  – each pair of conflicting actions is ordered in the same way.
• A schedule is conflict serializable if it is conflict equivalent to a serial schedule.
  – Note: Some serializable schedules are not conflict serializable!

Conflict Serializability (Cont.)
• Schedule 2 can be transformed into Schedule 1, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule 2 is conflict serializable.
[Schedule 2: table not preserved]

Conflict Serializability (Cont.)
• Example of a schedule that is not conflict serializable:
    T3          T4
    read(Q)
                write(Q)
    write(Q)
  We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.

Precedence Graph
• A precedence (or serializability) graph:
  – Node for each Xact.
  – Arc from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj.
• T1 transfers $100 from A to B, T2 adds 6%
  – R1(A), W1(A), R2(A), W2(A), R2(B), W2(B), R1(B), W1(B)
[Figure: precedence graph with arcs T1 → T2 and T2 → T1]

Precedence Graph of a Schedule, S
• Theorem - A schedule is conflict serializable if and only if its precedence graph has no cycles
• If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph. This is a linear order consistent with the partial order of the graph.

Example
    S: … p1,i, …, p2,j, ...   (a conflict between the two operations gives the arc marked * from T1 to T2)
[Figure: two precedence graphs over T1 … T7, not preserved]
• First graph: S is serializable in order T1 T2 T3 T4 T5 T6 T7
• Second graph: S is not serializable due to cycle T2 T6 T7 T2

Example Schedule (Schedule A)
[Table over columns T1 T2 T3 T4 T5; the column of each operation was not preserved. Operations in row order: read(X), read(Y), read(Z), read(V), read(W), read(W), read(Y), write(Y), write(Z), read(U), read(Y), write(Y), read(Z), write(Z), read(U), write(U)]

Precedence Graph for Schedule A
[Figure: precedence graph over T1, T2, T3, T4, not preserved]
• A serializability order for Schedule A would be T5, T1, T3, T2, T4.
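
The precedence-graph test can be sketched in a few lines. This is a minimal illustration, not a production concurrency control: it assumes schedules are written in the r1(x) / W2(A) notation used above, and the helper names parse, precedence_graph and serial_order are invented for the example. The topological sort uses Python's standard graphlib module (Python 3.9 or later).

```python
import re
from graphlib import TopologicalSorter, CycleError

def parse(schedule):
    """'r1(x) w2(x)' or 'R1(A), W2(A)' -> [('r', 1, 'x'), ('w', 2, 'x')]"""
    return [(k.lower(), int(t), item)
            for k, t, item in re.findall(r"([rRwW])(\d+)\((\w+)\)", schedule)]

def precedence_graph(ops):
    """Arc (Ti, Tj) if an action of Ti precedes and conflicts with one of Tj."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges

def serial_order(schedule):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    ops = parse(schedule)
    preds = {t: set() for _, t, _ in ops}   # node -> set of predecessors
    for a, b in precedence_graph(ops):
        preds[b].add(a)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None

print(serial_order("r1(x) r2(x) w2(x) r1(y) w1(y)"))   # S1 from the equivalence example
print(serial_order("R1(A), W1(A), R2(A), W2(A), R2(B), W2(B), R1(B), W1(B)"))
print(serial_order("r3(Q) w4(Q) w3(Q)"))               # the T3/T4 example
```

For S1 the only arc is T1 → T2, so the sketch reports the order T1, T2; the other two schedules contain a cycle and yield None.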

View Equivalence
• Two schedules of the same set of operations are view equivalent if:
  – Corresponding read operations in each return the same values (hence computations are the same)
  – Both schedules yield the same final database state
• Conflict equivalence implies view equivalence.
• View equivalence does not imply conflict equivalence.

View Equivalence
    T1: w(y) w(x)
    T2: r(y) w(x)
    T3: w(x)
    (relative timing of the operations not preserved)
• Schedule is not conflict equivalent to a serial schedule
• Schedule has the same effect as the serial schedule T2 T1 T3. It is view equivalent to a serial schedule and hence serializable

View Serializability
• Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:
  1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.
  2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.
  3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.
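
The three conditions can be turned into a direct check. The sketch below is illustrative only: schedules are hypothetical lists of (kind, txn, item) tuples, the function names are invented, and view serializability is tested by trying every serial order, which is exponential in the number of transactions. The first example schedule (r1(Q) w2(Q) w1(Q) w3(Q)) is an assumed illustration of blind writes, not one of the schedules shown above.

```python
from itertools import permutations

def view(ops):
    """Summarise a schedule as (reads_from, final_writer).

    reads_from maps (txn, position of the read within txn) to the writer whose
    value was read; None means the initial value (conditions 1 and 2).
    final_writer maps each item to the transaction of its last write
    (condition 3).
    """
    last_writer, reads_from, position = {}, {}, {}
    for kind, txn, item in ops:
        position[txn] = position.get(txn, 0) + 1
        if kind == "r":
            reads_from[(txn, position[txn])] = last_writer.get(item)
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def view_equivalent(s1, s2):
    return view(s1) == view(s2)

def view_serial_order(ops):
    """Return a serial order that is view equivalent to ops, or None."""
    txns = sorted({t for _, t, _ in ops})
    for order in permutations(txns):
        serial = [op for t in order for op in ops if op[1] == t]
        if view_equivalent(ops, serial):
            return order
    return None

blind_writes = [("r", 1, "Q"), ("w", 2, "Q"), ("w", 1, "Q"), ("w", 3, "Q")]
print(view_serial_order(blind_writes))

t3_t4 = [("r", 3, "Q"), ("w", 4, "Q"), ("w", 3, "Q")]
print(view_serial_order(t3_t4))
```

The blind-writes schedule is view equivalent to the serial order T1, T2, T3 although its conflicting writes are not ordered the same way; the T3/T4 schedule has no view-equivalent serial order.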

Conflict vs View Equivalence
[Figure: the set of schedules that are conflict equivalent to serial schedules, drawn inside the set of schedules that are view equivalent to serial schedules]
• A concurrency control based on view equivalence should provide better performance than one based on conflict equivalence since less reordering is done, but …
• It is difficult to implement a view equivalence concurrency control
• A concurrency control that guarantees conflict equivalence to serial schedules ensures correctness and is easily implemented

Recoverability: Schedules with Aborted Transactions
    T1:        r(x)  w(y)  commit
    T2:  w(x)                      abort
• T2 has aborted but has had an indirect effect on the database – schedule is unrecoverable
• Problem: T1 read uncommitted data - dirty read
• Solution: A concurrency control is recoverable if it does not allow T1 to commit until all other transactions that wrote values T1 read have committed
    T1:        r(x)  w(y)  req_commit         abort
    T2:  w(x)                          abort

Cascaded Abort
• Recoverable schedules solve the abort problem but allow cascaded abort: abort of one transaction forces abort of another
    T1:                    r(y)  w(z)                abort
    T2:        r(x)  w(y)                     abort
    T3:  w(x)                          abort
• Better solution: prohibit dirty reads

Dirty Write
• Dirty write: A transaction writes a data item written by an active transaction
• Dirty write complicates rollback:
    T1: w(x) abort
    T2: w(x) abort
    (figure annotations: “no rollback necessary”; “what value of x should be restored?”)

Strict Schedules
• Strict schedule: Dirty writes and dirty reads are prohibited
• Strict and serializable are two different properties
  – Strict, non-serializable schedule: r1(x) w2(x) r2(y) w1(y) c1 c2
  – Serializable, non-strict schedule: w2(x) r1(x) w2(y) r1(y) c1 c2

Concurrency Control vs. Serializability Tests
• Testing a schedule for serializability after it has executed is a little too late!
• Goal – to develop concurrency control protocols that will assure serializability.
  They will generally not examine the precedence graph as it is being created; instead a protocol will impose a discipline that avoids nonserializable schedules.
• Tests for serializability help understand why a concurrency control protocol is correct.

Concurrency Control
[Figure: arriving schedule (from transactions) → Concurrency Control → serializable schedule (to processing engine)]
• Concurrency control cannot see the entire schedule:
  – It sees one request at a time and must decide whether to allow it to be serviced
• Strategy: Do not service a request if:
  – It violates strictness or serializability, or
  – There is a possibility that a subsequent arrival might cause a violation of serializability

Locking: A Technique for C. C.
• A transaction can read a database item if it holds a read (shared) lock on the item
• It can read or update the item if it holds a write (exclusive) lock
• If the transaction does not already hold the required lock, a lock request is automatically made as part of the access

Locking
• Concurrency control is usually done via locking.
• Lock info maintained by a “lock manager”:
  – Stores (XID, RID, Mode) triples.
    • This is a simplistic view; suffices for now.
  – Mode ∈ {S,X}
  – Lock compatibility table (rows: mode held, columns: mode requested; -- means no lock):
          --    S     X
    --    yes   yes   yes
    S     yes   yes   no
    X     yes   no    no
• If a Xact can’t get a lock, it is suspended on a wait queue.

Two-Phase Locking (2PL)
• 2PL:
  – If T wants to read an object, it first obtains an S lock.
  – If T wants to modify an object, it first obtains an X lock.
  – If T releases any lock, it can acquire no new locks!
• Locks are automatically obtained by the DBMS.
• Guarantees serializability!
  – Why?
[Figure: number of locks held vs. time; a growing phase up to the lock point, then a shrinking phase]

Strict 2PL
• Strict 2PL:
  – If T wants to read an object, it first obtains an S lock.
  – If T wants to modify an object, it first obtains an X lock.
  – Hold all locks until end of transaction.
• Guarantees serializability, and recoverable schedule, too!
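
The locking rules above can be expressed as small checks. This is a sketch under assumptions, not how a DBMS implements locking: a transaction's behaviour is modelled as a hypothetical list of (action, item) pairs with actions "lock", "unlock", "commit" and "abort", and the names COMPATIBLE, can_grant, is_two_phase and is_strict_two_phase are invented for the illustration.

```python
# Standard shared/exclusive compatibility: only S with S can be held together.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """A request is grantable if it is compatible with every lock already held
    on the item by other transactions (no locks held: always grantable)."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

def is_two_phase(actions):
    """2PL: once any lock has been released, no new lock may be acquired."""
    released = False
    for action, _item in actions:
        if action == "unlock":
            released = True
        elif action == "lock" and released:
            return False
    return True

def is_strict_two_phase(actions):
    """Strict 2PL: additionally, no lock is released before commit or abort."""
    ended = False
    for action, _item in actions:
        if action in ("commit", "abort"):
            ended = True
        elif action == "unlock" and not ended:
            return False
    return is_two_phase(actions)

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B"), ("commit", None)]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
strict = [("lock", "A"), ("lock", "B"), ("commit", None), ("unlock", "A"), ("unlock", "B")]

print(is_two_phase(two_phase), is_strict_two_phase(two_phase))
print(is_two_phase(not_two_phase))
print(is_strict_two_phase(strict))
print(can_grant("S", ["S", "S"]), can_grant("X", ["S"]))
```

The first sequence is two-phase but not strict because it releases A before committing; the second acquires B after releasing A and so violates 2PL.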
[Figure: number of locks held vs. time under Strict 2PL]

Lock Management
• Lock and unlock requests are handled by the lock manager
• Lock table entry:
  – Number of transactions currently holding a lock
  – Type of lock held (shared or exclusive)
  – Pointer to queue of lock requests
• Locking and unlocking have to be atomic operations
• Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock

Lock Manager Implementation
• Question 1: What are we locking?
  – Tuples, pages, or tables?
  – Finer granularity increases concurrency, but also increases locking overhead.
• Question 2: How do you “lock” something?
• Lock Table: A hash table of Lock Entries.
  – Lock Entry:
    • OID
    • Mode
    • List: Xacts holding lock
    • List: Wait Queue

Handling a Lock Request
[Flowchart: a lock request (XID, OID, Mode) branches on Mode==S or Mode==X; the decision nodes “Currently locked?”, “Empty wait queue?” and “Currently X-locked?” lead to either “Put on queue” or “Grant lock”]

More Lock Manager Logic
• On lock release (OID, XID):
  – Update list of Xacts holding lock.
  – Examine head of wait queue.
  – If Xact there can run, add it to list of Xacts holding lock (change mode as needed).
  – Repeat until head of wait queue cannot be run.
• Note: Lock request handled atomically!
  – via latches (i.e. semaphores/mutex; OS stuff).

Lock Upgrades
• Think about this scenario:
  – T1 locks A in S mode, T2 requests X lock on A, T3 requests S lock on A. What should we do?
• In contrast:
  – T1 locks A in S mode, T2 requests X lock on A, T1 requests X lock on A. What should we do?
• Allow such upgrades to supersede lock requests.
  – Consider this scenario:
    • S1(A), X2(A), X1(A): DEADLOCK!
• BTW: Deadlock can occur even without upgrades:
  – X1(A), X2(B), S1(B), S2(A)

Deadlocks
• Deadlock: Cycle of transactions waiting for locks to be released by each other.
• Two ways of dealing with deadlocks:
  – Deadlock prevention
  – Deadlock detection

Deadlock Prevention
    X1(A), X2(B), S1(B), S2(A)
• Assign a timestamp to each Xact as it enters the system. “Older” Xacts have priority.
• Assume Ti requests a lock, but Tj holds a conflicting lock.
  – Wait-Die: If Ti has higher priority, it waits; else Ti aborts. (non-preemptive)
  – Wound-Wait: If Ti has higher priority, abort Tj; else Ti waits. (preemptive)
  – Note: After abort, restart with original timestamp!
  – Both guarantee deadlock-free behavior! Pros and cons of each?

Deadlock Detection
• Create a waits-for graph:
  – Nodes are transactions
  – There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock
• Periodically check for cycles in the waits-for graph.
• “Shoot” some Xact to break the cycle.
• Simpler hack: time-outs.
  – T1 made no progress for a while? Shoot it.

Deadlock Detection (Continued)
• Example:
    T1: S(A), R(A), S(B)
    T2: X(B), W(B), X(C)
    T3: S(C), R(C), X(A)
    T4: X(B)
[Figure: two waits-for graphs over T1, T2, T3, T4, not preserved]

Prevention vs. Detection
• Prevention might abort too many Xacts.
• Detection might allow deadlocks to tie up resources for a while.
  – Can detect more often, but it’s time-consuming.
• The usual answer:
  – Detection is the winner.
  – Deadlocks are pretty rare.
  – If you get a lot of deadlocks, reconsider your schema/workload!

Lock Granularity
• Data item: variable, record, row, table, file
• When an item is accessed, the DBMS locks an entity that contains the item. The size of that entity determines the granularity of the lock
  – Coarse granularity (large entities locked)
    • Advantage: If transactions tend to access multiple items in the same entity, fewer lock requests need to be processed and less lock storage space is required
    • Disadvantage: Concurrency is reduced since some items are unnecessarily locked
  – Fine granularity (small entities locked)
    • Advantages and disadvantages are reversed

Lock Granularity
• Table locking (coarse)
  – Lock entire table when a row is accessed.
• Row (tuple) locking (fine)
  – Lock only the row that is accessed.
• Page locking (compromise)
  – When a row is accessed, lock the containing page

Dynamic Databases
• If we relax the assumption that the DB is a fixed collection of objects, even Strict 2PL will not assure serializability:
  – T1 locks all pages containing sailor records with rating = 1, and finds oldest sailor (say, age = 71).
  – Next, T2 inserts a new sailor; rating = 1, age = 96.
  – T2 also deletes oldest sailor with rating = 2 (and, say, age = 80), and commits.
  – T1 now locks all pages containing sailor records with rating = 2, and finds oldest (say, age = 63).
• No consistent DB state where T1 is “correct”!

The Problem
• T1 implicitly assumes that it has locked the set of all sailor records with rating = 1.
  – Assumption only holds if no sailor records are added while T1 is executing!
  – Need some mechanism to enforce this assumption. (Index locking and predicate locking.)
• Example shows that conflict serializability guarantees serializability only if the set of objects is fixed!

Index Locking
[Figure: an index on rating, with the entry r=1 pointing into the data pages]
• If there is an unclustered index on the rating field, T1 should lock the index page containing the data entries with rating = 1.
  – If there are no records with rating = 1, T1 must lock the index page where such a data entry would be, if it existed!
• If there is no suitable index, T1 must lock all pages, and lock the file/table to prevent new pages from being added, to ensure that no new records with rating = 1 are added.

Predicate Locking
• Grant lock on all records that satisfy some logical predicate, e.g. age > 2*salary.
• Index locking is a special case of predicate locking for which an index supports efficient implementation of the predicate lock.
  – What is the predicate in the sailor example?
• In general, predicate locking has a lot of locking overhead.

Transaction Support in SQL
• Each transaction has an access mode, a diagnostics size, and an isolation level.
• SET TRANSACTION ISOLATION LEVEL X, where X is:
    Isolation Level     Dirty Read   Unrepeatable Read   Phantom Problem
    READ UNCOMMITTED    Maybe        Maybe               Maybe
    READ COMMITTED      No           Maybe               Maybe
    REPEATABLE READ     No           No                  Maybe
    SERIALIZABLE        No           No                  No

Timestamp CC
• Idea: Give each object a read-timestamp (RTS) and a write-timestamp (WTS), give each Xact a timestamp (TS) when it begins:
  – If action ai of Xact Ti conflicts with action aj of Xact Tj, and TS(Ti) < TS(Tj), then ai must occur before aj. Otherwise, restart violating Xact.

Timestamp-Ordering Protocol
• Timestamp order equals serialization order
• Ensures conflict serializability
• TS(Ti) < TS(Tj) implies Ti runs before Tj in a serial schedule
• Each data item Q has two timestamps
  – WTS(Q): the largest timestamp of any transaction that successfully executed write(Q)
  – RTS(Q): the largest timestamp of any transaction that successfully executed read(Q)

When Xact T wants to Read Object O
• If TS(T) < WTS(O), this violates timestamp order of T w.r.t. writer of O.
  – So, abort T and restart it with a new, larger TS. (If restarted with same TS, T will fail again! Contrast use of timestamps in 2PL for deadlock prevention.)
• If TS(T) > WTS(O):
  – Allow T to read O.
  – Reset RTS(O) to max(RTS(O), TS(T))
• Change to RTS(O) on reads must be written to disk! This and restarts represent overheads.

When Xact T wants to Write Object O
• If TS(T) < RTS(O), this violates timestamp order of T w.r.t. reader of O; abort and restart T.
• If TS(T) < WTS(O), violates timestamp order of T w.r.t. writer of O.
  – Thomas Write Rule: We can safely ignore such outdated writes; need not restart T! (T’s write is effectively followed by another write, with no intervening reads.) Allows some serializable but non-conflict-serializable schedules:
      T1: R(A), W(A), Commit
      T2: W(A), Commit
      (T2’s W(A) falls between T1’s R(A) and W(A))
• Else, allow T to write O.

Timestamp-Ordering Protocol
• Consider the following history: r1[x], r2[x], w2[x], r1[y], r2[y]
  Let TS(T1) and TS(T2) be 1 and 2 respectively.
  Show that the above history can be accepted by the timestamp-ordering protocol
• Ans: Assume the initial timestamp of x and y = 0, and consider each of the cases:
  – r1[x]: Since TS(T1) ≥ W-timestamp(x), read is executed and R-timestamp(x)=1
  – r2[x]: Since TS(T2) ≥ W-timestamp(x), read is executed and R-timestamp(x)=2
  – w2[x]: Since TS(T2) ≥ W-timestamp(x) and TS(T2) ≥ R-timestamp(x), write is executed and W-timestamp(x)=2
  – r1[y]: Since TS(T1) ≥ W-timestamp(y), read is executed and R-timestamp(y)=1
  – r2[y]: Since TS(T2) ≥ W-timestamp(y), read is executed and R-timestamp(y)=2

Timestamp CC and Recoverability
• Unfortunately, unrecoverable schedules are allowed:
    T1: W(A)
    T2: R(A), W(B), Commit
• Timestamp CC can be modified to allow only recoverable schedules:
  – Buffer all writes until writer commits (but update WTS(O) when the write is allowed.)
  – Block readers T (where TS(T) > WTS(O)) until writer of O commits.
• Similar to writers holding X locks until commit, but still not quite 2PL.

Exercise
• Suppose that initially the read-timestamp of data item X is 25 and the write-timestamp of X is 20. If [r,t] (respectively [w,t]) denotes a read (write) operation on X by any transaction with timestamp t, describe the behavior of Basic Timestamp Ordering on the following sequence of operations:
    [r,19], [r,22], [w,21], [w,26], [r,28], [w,27], [w,32], [w,31]

Exercise
• Determine whether each of the following executions is serializable or not. For each serializable execution, give the serial schedule which is equivalent to the given schedule.
  1. R1(X), W2(X), W1(X), R2(Y)
  2. R1(X), W2(X), W3(X), R1(X)
  3. R1(X), W2(X), W3(Y), W1(Y)

Exercise
• Transactions T1, T2, T3 are to be run concurrently. The following gives details of the proposed interleaving of read/write operations and the time when each such operation is to be scheduled.
    [Table with columns Time, T1, T2, T3; the transaction column of each operation was not preserved]
    Time   Operation
    1      read(A)
    2      read(A)
    3      read(D)
    4      write(D)
    5      write(A)
    6      read(C)
    7      write(B)
    8      write(B)
• Determine whether the operations can be executed in this order if concurrency is to be controlled using
  – two-phase locking
  – timestamp ordering
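
The read and write rules of timestamp ordering can be sketched as a small class. This is a minimal illustration under assumptions: the class name BasicTO, the string results "ok", "abort" and "skip", and the thomas flag are all invented here, timestamps start at 0, and restarts, buffering of writes and commit handling are left out. Reads are accepted when TS(T) ≥ WTS(O), following the worked answer for the history r1[x], r2[x], w2[x], r1[y], r2[y].

```python
class BasicTO:
    """Toy timestamp-ordering scheduler for single read/write requests."""

    def __init__(self):
        self.rts = {}   # item -> largest timestamp of a successful read
        self.wts = {}   # item -> largest timestamp of a successful write

    def read(self, ts, item):
        if ts < self.wts.get(item, 0):
            return "abort"              # a younger transaction already wrote item
        self.rts[item] = max(self.rts.get(item, 0), ts)
        return "ok"

    def write(self, ts, item, thomas=False):
        if ts < self.rts.get(item, 0):
            return "abort"              # a younger transaction already read item
        if ts < self.wts.get(item, 0):
            # Outdated write: ignored under the Thomas Write Rule, else abort.
            return "skip" if thomas else "abort"
        self.wts[item] = ts
        return "ok"

# The history r1[x], r2[x], w2[x], r1[y], r2[y] with TS(T1)=1, TS(T2)=2.
cc = BasicTO()
history = [("r", 1, "x"), ("r", 2, "x"), ("w", 2, "x"), ("r", 1, "y"), ("r", 2, "y")]
results = [cc.read(ts, item) if kind == "r" else cc.write(ts, item)
           for kind, ts, item in history]
print(results, cc.rts, cc.wts)

# Thomas Write Rule example: T1 reads A, T2 writes A, then T1 writes A.
cc2 = BasicTO()
print(cc2.read(1, "A"), cc2.write(2, "A"), cc2.write(1, "A", thomas=True))
```

Every request in the first history is accepted, leaving RTS(x) = 2, WTS(x) = 2 and RTS(y) = 2. In the second run T1's late write is skipped rather than aborted when the thomas flag is set.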