Concurrency Control and Recovery

• In real life:
  • users access the database concurrently, and
  • systems crash.
• Concurrent access to the database also improves performance and yields better utilization of resources.
• BUT: if we are not careful, concurrent access can lead to incorrect database states. Crashes can also leave the database in an incoherent state.
• Basic concurrency/recovery concept: the transaction
  • executed atomically. All or nothing.
• We cover:
  • transactions in SQL
  • implementation of transactions and recovery.

Flight Reservation

get values for :flight, :date, :seat

    EXEC SQL SELECT occupied INTO :occ
             FROM Flights
             WHERE fltNum = :flight AND fltdt = :date AND fltSeat = :seat;
    if (!occ) {
        EXEC SQL UPDATE Flights
                 SET occupied = 'true'
                 WHERE fltNum = :flight AND fltdt = :date AND fltSeat = :seat;
        /* more code missing */
    } else {
        /* notify customer that seat is not available */
    }

Problem #1

• Customer 1 finds a seat empty.
• Customer 2 finds the same seat empty.
• Customer 1 reserves the seat.
• Customer 2 reserves the seat.

Customer 1 will not be happy. The issue here: serializability.

Bank Transfers

Transfer :amount from :account1 to :account2

    EXEC SQL SELECT balance INTO :balance1
             FROM Accounts
             WHERE acctNo = :account1;
    if (balance1 >= amount) {
        EXEC SQL UPDATE Accounts
                 SET balance = balance + :amount
                 WHERE acctNo = :account2;
        EXEC SQL UPDATE Accounts
                 SET balance = balance - :amount
                 WHERE acctNo = :account1;
    }

Crash...

Transactions

The user/programmer can group a sequence of commands so that they are executed atomically and in a serializable fashion:
• Transaction commit: all the operations should be done and recorded.
• Transaction abort: none of the operations should be done.

In SQL:
• EXEC SQL COMMIT;
• EXEC SQL ROLLBACK;

Easier said than done...

ACID Properties

Atomicity: all actions of a transaction happen, or none happen.
Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state.
Isolation: the execution of one transaction is isolated from other transactions.
Durability: if a transaction commits, its effects persist in the database.

How Do We Assure ACID?

Concurrency control: guarantees consistency and isolation, given atomicity.
Logging and recovery: guarantees atomicity and durability.

"If you are going to be in the logging business, one of the things that you'll have to do is learn about heavy equipment."
-- Robert VanNatta, Logging History of Columbia County

More on SQL and Transactions

Read-only transactions:
• If the transaction is only reading, we can allow more operations in parallel.

    EXEC SQL SET TRANSACTION READ ONLY;

The default is:

    SET TRANSACTION READ WRITE;

Dirty Data

Data that has been written by a transaction that has not yet committed is called dirty data.
Do we allow our transaction to read dirty data? It may go away…

In SQL:

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

Note: the default for READ UNCOMMITTED transactions is that they are READ ONLY.

Problems with Dirty Data

Transfer program (transfer $N from account 1 to account 2):
1. Add $N to account 2.
2a. If account 1 has enough for the transfer,
2b. subtract $N from account 1, and commit.
2c. Otherwise, subtract $N from account 2, and commit.

Bad scenario:
A1: $100, A2: $200, A3: $300
T1: transfer $150 from A1 to A2
T2: transfer $250 from A2 to A3

Events:
• T2 does step 1 -> A3 has $550.
• T1 does step 1 -> A2 has $350.
• T2 does step 2a; all is OK (it reads the dirty $350 in A2).
• T1 does step 2a and finds that A1 doesn't have enough funds.
• T2 does step 2b -> A2 now has $100.
• T1 does step 2c -> A2 now has -$50.

Concurrency Control Methods

• Schedules
  • Serial schedules
  • Serializable schedules
• Locking
  • Lock manager
  • 2 Phase Locking
  • Deadlocks:
    • Prevention
    • Detection

Schedules

• A schedule is an interleaving of a set of actions of different transactions, such that the actions of any single transaction are in order.
• A schedule represents some actual sequence of database actions.
[Figure: example schedule interleaving the actions of T1 and T2: R(A), W(A), R(B), W(B), R(C), W(C).]
• In a complete schedule, every transaction either commits or aborts.
• Initial state + Schedule -> Final state.

Acceptable Schedules

Serial schedules:
• The transactions run one at a time from beginning to completion.
• Note: there are many possible serial schedules. Each one is OK. The DBMS does not guarantee the order in which concurrently submitted transactions are executed.

Serializable schedules:
• The final state is what some serial schedule would have produced.

Aborted Transactions

Slight modification to the definition: a schedule is serializable if it is equivalent to a serial schedule of committed transactions.
• As if the aborted transactions never happened.

Two issues to consider w.r.t. aborted transactions:
• How does one undo the effect of a transaction?
• What if another transaction sees the effects of an aborted one?

Locks

Concurrency control is usually done via locking. The lock manager maintains a list of entries:
• object identifier (can be page, record, etc.)
• number of transactions holding a lock on the object
• nature of the lock (shared or exclusive)
• pointer to a list of lock requests.

Lock compatibility table (held lock vs. requested lock):

         S     X
    S    yes   no
    X    no    no

If a transaction cannot get a lock, it is suspended on a wait queue.

Handling Lock Requests

[Flowchart: a lock request (XID, OID, Mode) is routed by Mode==S or Mode==X through the checks "Currently locked?" and "Empty wait queue?" to either "Grant lock" or "Put on queue".]

Two-Phase Locking (2PL)

• 2 phase locking:
  • If T wants to read an object, it first obtains an S lock.
  • If T wants to write an object, it first obtains an X lock.
  • If T releases any lock, it can acquire no new locks.
• Recall: all this is done transparently to the user by the DBMS.
• 2PL guarantees serializability!
  • Why??

[Figure: number of locks held over time, rising through the growing phase to the lock point, then falling through the shrinking phase.]

Serializability Graphs

Two actions conflict if they belong to different transactions, access the same data item, and at least one of them is a write.
The precedence graph contains:
• A node for every committed transaction.
• An arc from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj.

Example: T1 transfers $100 from A to B, T2 adds 6% to both A and B.

    R1(A), W1(A), R2(A), W2(A), R2(B), W2(B), R1(B), W1(B)

[Figure: precedence graph over T1 and T2. The conflicts on A give an arc T1 -> T2, and the conflicts on B give an arc T2 -> T1, so the graph has a cycle.]

Conflict Serializability

• 2 schedules are conflict equivalent if:
  – they have the same sets of actions, and
  – each pair of conflicting actions is ordered in the same way.
• A schedule is conflict serializable if it is conflict equivalent to a serial schedule.
  – Note: some serializable schedules are not conflict serializable!
• Theorem: a schedule is conflict serializable iff its precedence graph is acyclic.
• Theorem: 2PL ensures that the precedence graph will be acyclic!

Deadlocks

Suppose we have the following scenario:
• T1 asks for an exclusive lock on A.
• T2 asks for an exclusive lock on B.
• T1 asks for a shared lock on B.
• T2 asks for a shared lock on A.

Both T1 and T2 are waiting! We have a DEADLOCK.

Possible solutions:
• Prevent deadlocks to start with, or
• Detect when they happen and do something about it.

Deadlock Prevention

• Give each transaction a timestamp. "Older" transactions have higher priority.
• Assume Ti requests a lock, but Tj holds a conflicting lock. We can follow two strategies:
  • Wait-die: if Ti has higher priority, it waits; else Ti aborts.
  • Wound-wait: if Ti has higher priority, abort Tj; else Ti waits.
• Note: after aborting, restart with the original timestamp!

Both strategies guarantee deadlock-free behavior!

An Alternative to Prevention

• In theory, deadlock can involve many transactions: T1 waits-for T2 waits-for T3 ... waits-for T1.
• In practice, most "deadlock cycles" involve only 2 transactions.
• Don't need to prevent deadlock! What's the problem with prevention?
• Allow it to happen, then notice it and fix it: deadlock detection.

Deadlock Detection

• Lock Manager maintains a "Waits-for" graph:
  • Node for each transaction.
  • Arc from Ti to Tj if Tj holds a lock and Ti is waiting for it.
• Periodically check graph for cycles.
• "Shoot" some transaction to break the cycle.
• Simpler hack: time-outs. T1 made no progress for a while? Shoot it.

Detection Versus Prevention

• Prevention might abort too many transactions.
• Detection might allow deadlocks to tie up resources for a while.
  – Can detect more often, but it's time-consuming.
• The usual answer:
  – Detection is the winner.
  – Deadlocks are pretty rare.
  – If you get a lot of deadlocks, reconsider your schema/workload!

Review: ACID Properties

Atomicity: all actions of a transaction happen, or none happen.
Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state.
Isolation: the execution of one transaction is isolated from other transactions.
Durability: if a transaction commits, its effects persist in the database.

The Recovery Manager guarantees Atomicity & Durability.

Motivation for Recovery

• Atomicity:
  – Transactions may abort ("Rollback").
• Durability:
  – What if the DBMS stops running? (Causes?)

Desired behavior after the system restarts:
  – T1, T2 & T3 should be durable.
  – T4 & T5 should be aborted (effects not seen).

[Figure: timelines of T1 through T5 against the crash point; T1, T2 and T3 commit before the crash, T4 and T5 are still running when it happens.]

Handling the Buffer Pool

• Force every write to disk?
  – Poor response time.
  – But provides durability.
• Steal buffer-pool frames from uncommitted transactions?
  – If not, poor throughput.
  – If so, how can we ensure atomicity?

                No Steal    Steal
    Force       Trivial
    No Force                Desired

Basic Idea: Logging

• Record REDO and UNDO information, for every update, in a log.
  – Sequential writes to log (put it on a separate disk).
  – Minimal info (difference) written to log, so multiple updates fit in a single log page.
• Log: an ordered list of REDO/UNDO actions.
  – Log record contains: <XID, pageID, offset, length, old data, new data>
  – and additional control information.
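
The log record layout above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class name `LogRecord` and the helpers `redo` and `undo` are hypothetical, a page is modeled as an in-memory `bytearray`, and the "additional control information" is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical record mirroring <XID, pageID, offset, length, old data, new data>."""
    xid: int
    page_id: int
    offset: int
    length: int
    old_data: bytes   # before-image, used for UNDO
    new_data: bytes   # after-image, used for REDO

    def __post_init__(self):
        # Both images must cover exactly the logged byte range.
        if len(self.old_data) != self.length or len(self.new_data) != self.length:
            raise ValueError("old_data and new_data must both be `length` bytes")


def redo(page: bytearray, rec: LogRecord) -> None:
    """Reapply the update: write the after-image into the page."""
    page[rec.offset:rec.offset + rec.length] = rec.new_data


def undo(page: bytearray, rec: LogRecord) -> None:
    """Roll the update back: write the before-image into the page."""
    page[rec.offset:rec.offset + rec.length] = rec.old_data


if __name__ == "__main__":
    page = bytearray(b"hello world")
    rec = LogRecord(xid=1, page_id=7, offset=6, length=5,
                    old_data=b"world", new_data=b"there")
    redo(page, rec)
    print(page.decode())
    undo(page, rec)
    print(page.decode())
```

Because only the changed byte range is stored, a record stays small, which is what lets several updates share one log page.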
• The Write-Ahead Logging Protocol:
  – Must force the log record for an update before the corresponding data page gets to disk.
  – Must write all log records for a transaction before commit.

WAL & the Log

[Figure: the log records already flushed to disk, the "log tail" still in RAM, the data pages in the DB each carrying a pageLSN, and the flushedLSN kept in RAM.]

• Each log record has a unique Log Sequence Number (LSN).
  – LSNs are always increasing.
• Each data page contains a pageLSN.
  – The LSN of the most recent log record for an update to that page.
• System keeps track of flushedLSN.
  – The max LSN flushed so far.
• WAL: before a page is written,
  – pageLSN ≤ flushedLSN

Recovery

Three steps (à la ARIES), starting from a checkpoint:
• Analysis: figure out which transactions committed since the checkpoint, and which failed.
• REDO all actions in the log.
• UNDO effects of failed transactions.

Summary

• Users access the database concurrently, and sometimes there are crashes.
• Transactions are sets of operations that are guaranteed to be atomic.
• The DBMS guarantees: Atomicity, Consistency, Isolation, Durability.
• Isolation and consistency are guaranteed via locking: 2-phase (need special care for deadlocks).
• Atomicity and durability are guaranteed by:
  • Logging
  • Recovery manager (that uses the log).

There are MANY MANY more missing details!
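
As one of those missing details, the three recovery steps listed under Recovery (Analysis, REDO, UNDO) can be sketched in a few lines. This is a deliberately simplified toy, not ARIES itself: the function name `recover`, the tuple-shaped log records, and the dictionary standing in for the database are all assumptions made for illustration, and the log is scanned from its beginning rather than from a checkpoint.

```python
def recover(db, log):
    """Toy restart recovery over an in-memory log.

    Assumed record shapes (hypothetical, for illustration only):
      ("update", xid, key, old_value, new_value)
      ("commit", xid)
    """
    # Analysis: transactions with a commit record are winners,
    # every other transaction that appears in the log is a loser.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    losers = {rec[1] for rec in log} - committed

    # REDO: repeat every logged update, in log order.
    for rec in log:
        if rec[0] == "update":
            _, _xid, key, _old, new = rec
            db[key] = new

    # UNDO: roll back the losers' updates, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in losers:
            _, _xid, key, old, _new = rec
            db[key] = old

    return losers


if __name__ == "__main__":
    db = {"A": 100, "B": 200}          # state found on disk after the crash
    log = [
        ("update", 1, "A", 100, 50),   # T1 writes A
        ("update", 2, "B", 200, 300),  # T2 writes B
        ("commit", 1),                 # only T1 committed before the crash
    ]
    print(recover(db, log), db)
```

Redoing everything first and undoing afterwards means the outcome does not depend on which dirty pages happened to reach disk before the crash, which is what makes the steal/no-force buffer policy workable.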