Download Concurrency control and recovery

Concurrency Control and Recovery • In real life: • users access the database concurrently, and • systems crash. • Concurrent access to the database also improves performance, yields better utilization of resources. • BUT: if not careful, concurrent access can lead to incorrect database states. Crashes can also leave the database in incoherent states. • Basic concurrency/recovery concept: transaction • executed atomically. All or nothing. • We cover: • transactions in SQL • implementation of transactions and recovery. Flight Reservation get values for :flight, :date, :seat EXEC SQL SELECT occupied INTO :occ FROM Flight WHERE fltNum = :flight AND fltdt= :date AND fltSeat=:seat if (!occ) { EXEC SQL UPDATE Flights SET occupied = ‘true’ WHERE fltNum= :flight AND fltdt= :date AND fltSeat=:seat /* more code missing */ } else /* notify customer that seat is not available */ Problem #1 Customer 1 - finds a seat empty Customer 2 - finds the same seat empty Customer 1 - reserves the seat. Customer 2 - reserves the seat. Customer 1 will not be happy. serializability Bank Transfers Transfer :amount from :account1 to :account2 EXEC SQL SELECT balance INTO :balance1 FROM Accounts WHERE accNo = :account1 if (balance1 >= amount) EXEC SQL UPDATE Accounts SET balance = balance + :amount WHERE acctNo = :account2; EXEC SQL UPDATE Accounts SET balance = balance - :amount WHERE acctNo = :account1; Crash... Transactions The user/programmer can group a sequence of commands so that they are executed atomically and in a serializable fashion: • Transaction commit: all the operations should be done and recorded. • Transaction abort: none of the operations should be done. In SQL: • EXEC SQL COMMIT; • EXEC SQL ROLLBACK; Easier said than done... ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state. Isolation: the execution of one transaction is isolated from other transactions. Durability: if a transaction commits, its effects persist in the database. How Do We Assure ACID? Concurrency control: Guarantees consistency and isolation, given atomicity. Logging and Recovery: Guarantees atomicity and durability. If you are going to be in the logging business, one of the things that you’ll have to do is learn about heavy equipment. -- Robert VanNatta Logging History of Columbia County More on SQL and Transactions Read only transactions: • if the transaction is only reading, we can allow more operations in parallel. EXEC SQL SET TRANSACTION READ ONLY; The default is: SET TRANSACTION READ WRITE; Dirty Data Data that has been written by a transaction that has not committed yet is called dirty data. Do we allow our transaction to read dirty data? It may go away… In SQL: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED Note: default for READ UNCOMMITTED transactions is that they are READ ONLY. Problems with Dirty Data Transfer program: 1. Transfer $N from account 1 to account 2 2a. If account 1 has enough for the transfer, 2b. subtract $N from account 1, and commit. 2c. Subtract $N from account 2, and commit. Bad scenario: A1: $100, A2: $200, A3: $300 T1: transfer $150 from A1 to A2 T2: transfer $250 from A2 to A3. Events: • T2 does step 1, -> A3 has $550 • T1 does step 1, -> A2 has $350 • T2 does step 2a, all is ok. • T1 does step 2a and finds that A1 doesn’t have enough funds. • T2 does step 2b, -> A2 now has $100 • T1 does step 2c, -> A2 now has -$50. Concurrency Control Methods • Schedules • Serial schedules • Serializable schedules • Locking • Lock manager • 2 Phase Locking • Deadlocks: • Prevention • Detection Schedules •A schedule is an interleaving of a set of actions of different transactions, such that the actions of any single transaction are in order. • A schedule represents some actual sequence of database actions. T1 T2 R(A) W(A) R(B) W(B) • In a complete schedule, every transaction either commits or aborts. • Initial state + Schedule -> Final state. R(C) W(C) Acceptable Schedules Serial schedules: • The transactions run one at a time from beginning to completion. • Note: there are many possible serial schedules. Each one is OK. The DBMS does not provide any guarantee in which order concurrently submitted transactions are executed. Serializable schedules: • Final state is what some serial schedule would have produced. Aborted Transactions Slight modification to the definition: A schedule is serializable if it is equivalent to a serial schedule of committed transactions. • As if the aborted transactions never happened. Two issues to consider w.r.t. aborted transactions: • how does one undo the effect of a transaction? • What if another transaction sees the effects of an aborted one? Locks Concurrency control is usually done via locking. The lock manager maintains a list of entries: • object identifier (can be page, record, etc.) • number of objects holding lock on the object • nature of the lock (shared or exclusive) • pointer to a list of lock requests. -Lock compatibility table: --  If a transaction cannot get a lock, it is S  suspended on a wait queue. X  S X    Handling Lock Requests Lock Request (XID, OID, Mode) Mode==S Mode==X Currently Locked? Empty Wait Queue? Yes No Yes Yes No Put on Queue Grant Lock No Two-Phase Locking (2PL) • 2 phase locking: • if T wants to read an object, it first obtains an S lock. • If T wants to write an object, it first obtains an X lock. • If T releases any lock, it can acquire no new locks. • Recall: all this is done transparently to the user by the DBMS. • 2PL guarantees serializability! • Why?? lock point # of lock s growing phase shrinking phase Time Serializability Graphs Two actions conflict if they access the same data item. The precedence graph contains: • A node for every committed transaction • An arc from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj. • T1 transfers $100 from A to B, T2 adds 6% to both R1(A), W1(A), R2(A), W2(A), R2(B), W2(B), R1(B), W1(B) T1 T2 Conflict Serializability • 2 schedules are conflict equivalent if: – they have the same sets of actions, and – each pair of conflicting actions is ordered in the same way. • A schedule is conflict serializable if it is conflict equivalent to a serial schedule. – Note: Some serializable schedules are not conflict serializable! • Theorem: A schedule is conflict serializable iff its precedence graph is acyclic. • Theorem: 2PL ensures that the precedence graph will be acyclic! Deadlocks Suppose we have the following scenario: • T1 asks for an exclusive lock on A • T2 asks for an exclusive lock on B • T1 asks for a shared lock on B • T2 asks for a shared lock on A Both T1 and T2 are waiting! We have a DEADLOCK. Possible solutions: • Prevent deadlocks to start with, or • Detect when they happen and do something about it. Deadlock Prevention • Give each transaction a timestamp. “Older” transactions have higher priority. • Assume Ti requests a lock, but Tj holds a conflicting lock. We can follow two strategies: • Wait-die: if Ti has higher priority, it waits; else Ti aborts. • Wound-wait: if Ti has higher priority, abort Tj; else Ti waits. • Note: after aborting, restart with original timestamp! Both strategies guarantee deadlock-free behavior! An Alternative to Prevention • In theory, deadlock can involve many transactions: T1 waits-for T2 waits-for T3 ...waits-for T1 • In practice, most “deadlock cycles” involve only 2 transactions. • Don’t need to prevent deadlock! What’s the problem with prevention? • Allow it to happen, then notice it and fix it. Deadlock detection. Deadlock Detection • Lock Manager maintains a “Waits-for” graph: • Node for each transaction. • Arc from Ti to Tj if Tj holds a lock and Ti is waiting for it. • Periodically check graph for cycles. • “Shoot” some transaction to break the cycle. • Simpler hack: time-outs. T1 made no progress for a while? Shoot it. Detection Versus Prevention • Prevention might abort too many transactions. • Detection might allow deadlocks to tie up resources for a while. – Can detect more often, but it’s time-consuming. • The usual answer: – Detection is the winner. – Deadlocks are pretty rare. – If you get a lot of deadlocks, reconsider your schema/workload! Review: ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state. Isolation: the execution of one transaction is isolated from other transactions. Durability: if a transaction commits, its effects persist in the database. The Recovery Manager guarantees Atomicity & Durability. Motivation for Recovery • Atomicity: – Transactions may abort (“Rollback”). • Durability: – What if DBMS stops running? (Causes?)  Desired Behavior after system restarts: – T1, T2 & T3 should be durable. – T4 & T5 should be aborted (effects not seen). T1 T2 T3 T4 T5 crash! Handling the Buffer Pool • Force every write to disk? – Poor response time. – But provides durability. No Steal Force • Steal buffer-pool frames from uncommited No Force transactions? – If not, poor throughput. – If so, how can we ensure atomicity? Steal Trivial Desired Basic Idea: Logging • Record REDO and UNDO information, for every update, in a log. – Sequential writes to log (put it on a separate disk). – Minimal info (difference) written to log, so multiple updates fit in a single log page. • Log: An ordered list of REDO/UNDO actions – Log record contains: <XID, pageID, offset, length, old data, new data> – and additional control information. • The Write-Ahead Logging Protocol:  Must force the log record for an update before the corresponding data page gets to disk.  Must write all log records for a transaction before commit. WAL & the Log DB LSNs pageLSNs RAM flushedLSN • Each log record has a unique Log Sequence Number (LSN). – LSNs always increasing. Log records flushed to disk • Each data page contains a pageLSN. – The LSN of the most recent log record for an update to that page. • System keeps track of flushedLSN. pageLSN – The max LSN flushed so far. “Log tail” in RAM • WAL: Before a page is written, – pageLSN flushedLSN Recovery Three steps: (a la` ARIES) Starting from a checkpoint: • Analysis: figure out which transactions committed since the checkpoint, and which failed. • REDO all actions in the log. • UNDO effects of failed transactions. Summary • Users access the database concurrently, and sometimes there are crashes. • Transactions are sets of operations that are guaranteed to be atomic. • The DBMS guarantees: Atomicity, Consistency, Isolation, Durability. • Isolation and consistency are guaranteed via locking: 2-phase (need special care for deadlocks). • Atomicity and durability are guaranteed by: • Logging • Recovery manager (that uses the log). There are MANY MANY more missing details!

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Concurrency control and recovery