Database Systems
Haichang Gao, Software School, Xidian University

Major Content & Grade
- Introduction *
- The Relational Model ***
- SQL ****
- Transaction Management ***
- Database Design (E-R) ***
- Database Design (Normalization) ***

Table of contents
- Motivation
- Basic concept of transactions
- Issues of concurrency control

Motivation
Operations on databases (SQL commands):
- Queries: select … from … where …
- Insertions: insert … values …
- Deletions: delete … where …
- Updates: update … where …
- Create tables, change attributes, etc.
These are basic operations on tables. But are they "too basic" in real life?

Consider a database for bank accounts.
Basic operations (in the eyes of the customers):
- Withdraw
- Deposit
- Transfer
- Dividend (interest)
Each basic operation consists of multiple database operations.

Example: transfer $k from x to y (Method 1)
    1. Find the tuple for x's account (database query)
    2. Read x's account info into main memory
    3. Check whether x has at least $k
    4. Subtract $k from x's account
    5. Write x's new balance back to the database (database update)
    6. Find the tuple for y's account (database query)
    7. Read y's account info into main memory
    8. Add $k to y's account
    9. Write y's new balance to the database (database update)

One needs to maintain
- Consistency/Correctness
- Efficiency
Consistency/Correctness: the right amount of money is transferred.
- Easy to check for normal operations
- But what if
    - the system crashes?
    - multiple users want to update the same data?

System crashes, case 1
    1. Find the tuple for x's account (database query)
    2. Read x's account info into main memory
    3. Check whether x has at least $k
    4. Subtract $k from x's account
    5. Write x's new balance back to the database (database update)
    ------------------- System crashes! -------------------
    6. Find the tuple for y's account (database query)
    7. Read y's account info into main memory
    8. Add $k to y's account
    9. Write y's new balance to the database (database update)
What is the database like now? What happens if we don't do anything about it?

System crashes, case 2
All nine steps execute, and then the system crashes.
OK? But what about output that is still buffered?

Two potential problems
- The system crashes in the middle
    - Need to make sure the system is consistent after restarting
    - Some tuples may have been updated while others have not
- The system crashes at the "end"
    - It is unclear whether all changes have been saved to disk
    - When the system crashes, all unsaved changes are lost
    - Need to ensure that all changes are reflected

Another problem: multiple users
Consider another operation, dividend:
    1. Find the tuple for x's account (database query)
    2. Find the tuple for y's account (database query)
    3. Read x's account info into main memory
    4. Read y's account info into main memory
    5. Add 1% to x's account
    6. Write x's new balance back to the database (database update)
    7. Add 1% to y's account
    8. Write y's new balance back to the database (database update)
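Crash case 1 above can be reproduced in a toy model. The sketch below is only an illustration under simplifying assumptions: the "disk" is a Python dict, every write reaches it immediately, and the `Crash` exception and the `crash_after_step_5` flag are hypothetical names standing in for a real system failure.

```python
class Crash(Exception):
    """Stands in for a system crash in this toy model."""


def transfer(disk, k, crash_after_step_5=False):
    """Move k from x to y, writing each new balance straight to `disk`."""
    x = disk["x"]                      # steps 1-2: find and read x's account
    if x < k:                          # step 3: check the balance
        raise ValueError("insufficient funds")
    disk["x"] = x - k                  # steps 4-5: subtract and write back
    if crash_after_step_5:
        raise Crash("crashed between steps 5 and 6")
    y = disk["y"]                      # steps 6-7: find and read y's account
    disk["y"] = y + k                  # steps 8-9: add and write back


disk = {"x": 100, "y": 200}
try:
    transfer(disk, 50, crash_after_step_5=True)
except Crash:
    pass                               # the system "restarts" here

print(disk)                 # {'x': 50, 'y': 200}
print(sum(disk.values()))   # 250: the $50 has vanished
```

Nothing in this model repairs the half-finished transfer after the crash, which is exactly the gap the rest of the lecture addresses.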
Motivation
Suppose x has $100 and y has $200, and consider two operations:
- x transfers $50 to y
- Dividend
If the transfer comes before the dividend:
    X: 100 -> 50 -> 50.5
    Y: 200 -> 250 -> 252.5
If the dividend comes before the transfer:
    X: 100 -> 101 -> 51
    Y: 200 -> 202 -> 252

What if we want concurrent execution?
- What does concurrent mean?
- Can we run commands concurrently without any limitations?
- What is an acceptable schedule?

Consider the following interleaving of the two operations:
    Transfer                                  Dividend
    1. Find tuple for x's account
    2. Read x's account info into memory
    3. Check whether x has at least $k
    4. Subtract $k from x's account
    5. Write x's new balance back
                                              1. Find tuple for x's account
                                              2. Find tuple for y's account
                                              3. Read x's account info into memory
                                              4. Read y's account info into memory
                                              5. Add 1% to x's account
                                              6. Write x's new balance back
                                              7. Add 1% to y's account
                                              8. Write y's new balance back
    6. Find tuple for y's account
    7. Read y's account info into memory
    8. Add $k to y's account
    9. Write y's new balance back
X: 100 -> 50 -> 50.5; Y: 200 -> 202 -> 252
Acceptable to the bank, but not to the customer…
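The serial and interleaved executions above can be replayed with a small simulation. This is a sketch under stated assumptions, not part of the original slides: balances are held in cents so the arithmetic stays exact, the "find tuple" and "check balance" steps are left out, and the names `TRANSFER`, `DIVIDEND` and `run` are made up for the illustration. A schedule is a string in which each `T` or `D` runs the next step of the transfer or the dividend.

```python
K = 5000  # transfer amount in cents ($50)

# Each step is (operation, first operand, second operand).
TRANSFER = [
    ("read", "X", "a1"),
    ("calc", "a1", lambda v: v - K),
    ("write", "X", "a1"),
    ("read", "Y", "a2"),
    ("calc", "a2", lambda v: v + K),
    ("write", "Y", "a2"),
]
DIVIDEND = [
    ("read", "X", "a1"),
    ("calc", "a1", lambda v: v * 101 // 100),  # add 1%
    ("write", "X", "a1"),
    ("read", "Y", "a2"),
    ("calc", "a2", lambda v: v * 101 // 100),  # add 1%
    ("write", "Y", "a2"),
]


def run(schedule):
    """Run a schedule such as 'TTTDDDDDDTTT' and return the final database."""
    db = {"X": 10000, "Y": 20000}            # $100 and $200
    programs = {"T": TRANSFER, "D": DIVIDEND}
    next_step = {"T": 0, "D": 0}
    local = {"T": {}, "D": {}}               # private variables per transaction
    for t in schedule:
        op, a, b = programs[t][next_step[t]]
        next_step[t] += 1
        if op == "read":                     # b <- Read(a)
            local[t][b] = db[a]
        elif op == "calc":                   # update local variable a only
            local[t][a] = b(local[t][a])
        else:                                # Write(a, b)
            db[a] = local[t][b]
    return db


print(run("TTTTTT" + "DDDDDD"))        # {'X': 5050, 'Y': 25250}
print(run("DDDDDD" + "TTTTTT"))        # {'X': 5100, 'Y': 25200}
print(run("TTT" + "DDDDDD" + "TTT"))   # {'X': 5050, 'Y': 25200}
```

Both serial orders leave a total of 30300 cents ($303), while the interleaved run leaves 30250 cents ($302.50), matching the figures on the slide.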
Motivation
Thus we need to define an acceptable standard of consistency in the face of concurrent execution with other commands.
A plausible definition: "If multiple commands execute concurrently, the results must look as if the commands were executed one by one (sequentially)."

Many of the problems above can be eliminated if we
- disable concurrency
- force writes to disk immediately
- do not write anything until the end of the command
However, this leads to inefficiency.
Thus: how do we get the best of both worlds?

Transaction basics -- definition
A transaction is a unit of program execution that accesses and possibly updates various data items.
It can be defined as
- a set of SQL statements
- stored procedures
- operations initiated by high-level programming languages (Java, C++, etc.)
It is delimited by begin transaction and end transaction.
Example:
    Begin transaction
    x = select salary from person where name = "Chu"
    update person set salary = x * 10 where name = ""
    update person set salary = x / 10 where name = ""
    End transaction

Transaction basics -- states
A transaction can be in any one of five states:
- Active: the initial state; the transaction stays in this state while it is executing.
- Partially committed: after the final statement has been executed.
- Failed: after the discovery that normal execution can no longer proceed.
- Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
    - restart the transaction (only if there was no internal logical error)
    - kill the transaction
- Committed: after successful completion.
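The five states can be sketched as a small state machine. The slide text lists the states but not the arrows between them, so the `ALLOWED` transition table below is an assumption that follows the usual textbook diagram; `State`, `ALLOWED` and `Transaction` are hypothetical names used only for this illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Assumed transitions (usual textbook diagram, not taken from the slide text).
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),      # restarting creates a new transaction
    State.COMMITTED: set(),
}


class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE   # every transaction starts out active

    def move_to(self, new_state):
        """Change state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state


t1 = Transaction("T1")
t1.move_to(State.PARTIALLY_COMMITTED)   # last statement executed
t1.move_to(State.COMMITTED)             # DBMS decides it may commit
print(t1.state.value)                   # committed
```

Modelling "restart" as a fresh `Transaction` object rather than an arrow back to `ACTIVE` is a design choice of this sketch.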
Transaction basics -- states
A transaction need not commit immediately after its last statement. Why?
- It is the DBMS's responsibility to determine which transactions can commit and which must abort.
- It is also the DBMS's responsibility to clean up (roll back) after a transaction aborts.
- Cascading aborts are possible.

Transaction basics -- consistency
During transaction execution the database may be inconsistent.
When the transaction is committed, the database must be consistent.
Two main issues to deal with:
- Failures of various kinds, such as hardware failures and system crashes
- Concurrent execution of multiple transactions

Transaction basics -- ACID
Four basic properties that transactions must maintain:
- Atomicity: all or nothing
- Consistency: each transaction must ensure data consistency
- Isolation: transactions are "unaware" of other concurrent transactions
- Durability: once committed, changes to the database must be persistent

Atomicity: all or nothing
i.e., either all operations of the transaction are properly reflected in the database or none are.
Implications: if the system crashes in the middle of a transaction T, then when the system restarts, before any user can use the database again, the DBMS must ensure that either
- T is finished, or
- T never started.
Which do you think is easier? Which makes more sense?

Consistency: each transaction must ensure data consistency
i.e., execution of a transaction in isolation preserves the consistency of the database.
Thus all integrity and other constraints must be satisfied.

Isolation: transactions are "unaware" of other concurrent transactions
i.e., although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions.
Implications:
- For every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
- Some kinds of interleaving are not allowed.

Durability: once committed, changes to the database must be persistent
i.e., after a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Implications: suppose a transaction commits, and then the system crashes. When the system restarts, before any user can use the database again, the DBMS must ensure that the changes made by this transaction are stored on disk.
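Atomicity and consistency can be observed with Python's standard `sqlite3` module. This is a minimal sketch, not the mechanism any particular DBMS in the course uses: the `account` table, the `transfer` and `balances` helpers, and the choice to credit y before debiting x are all assumptions made for the illustration. The `CHECK` constraint plays the role of an integrity constraint, and `with conn:` commits on success and rolls back when an exception escapes. An in-memory database cannot show durability, so that property is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("x", 100), ("y", 200)])
conn.commit()


def transfer(conn, src, dst, k):
    """Move k from src to dst as one transaction: all or nothing."""
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (k, dst)
        )
        # If src cannot cover k, the CHECK constraint fails on this statement
        # and the credit to dst above is rolled back with it.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (k, src)
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


transfer(conn, "x", "y", 50)
print(balances(conn))            # {'x': 50, 'y': 250}

try:
    transfer(conn, "x", "y", 500)    # x has only 50
except sqlite3.IntegrityError:
    pass
print(balances(conn))            # {'x': 50, 'y': 250}
```

After the failed transfer neither balance has changed: the database shows either the whole transaction or none of it.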
Transaction basics -- DBMS support
Two major tasks in a DBMS to handle transactions:
- Concurrency control: handles how concurrent transactions are executed. Goal: isolation.
- Recovery: handles how to recover a database after a failure. Goal: atomicity and durability.
Consistency is maintained throughout various parts of the DBMS (not the focus of this course).
Many systems roll the two parts together as a "transaction manager".

What makes transaction processing tricky:
- Scheduling is hidden from the DBMS
    - The DBMS cannot enforce which transaction executes next
- Buffer management is hidden from the DBMS
    - Although a transaction writes something to the database, it is only written to the buffers, to be transferred to disk at an unspecified time
    - One can force the transfer immediately, but that is very inefficient

Table of contents
- Concurrency control
- Serializability
    - Motivating examples
    - Conflict serializability
    - Test of serializability

Concurrency control
Why concurrency?
- Increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk.
- Reduced average response time for transactions: short transactions need not wait behind long ones.
- Shared resources. Many transactions may want to access the same object/tuple.
- Isolation. One of the key requirements of transactions.

Concurrency control -- schedule
Schedules: sequences that indicate the chronological order in which instructions of concurrent transactions are executed.
- A schedule for a set of transactions must consist of all instructions of those transactions.
- It must preserve the order in which the instructions appear in each individual transaction.
Assumption: at any time, only one operation from one transaction can be executed.
However, the DBMS may interleave operations from multiple transactions.

Serial schedule: a schedule that does not allow interleaving between transactions (i.e., one transaction finishes before the other begins).
Equivalent schedules: two schedules are equivalent if they "produce the same results".

Notation used
- The database consists of objects (X, Y, Z), each of which is an integer.
- Transactions are labeled T1, T2, etc.
- Each transaction has a set of local variables in main memory (not accessible by other transactions), labeled a1, a2, b1, b2, etc.
- Each transaction accesses the database through the read() and write() commands.
- A read command reads a database object into a local variable (a1 <- read(X)).
- A write command writes a local variable into a database object (write(X, a1)).
- Local variables for read() and write() are not shown if the context is clear or if they are unimportant.
- Manipulation and calculation on objects can only be done on local variables (e.g.
  X <- X + 1 is not allowed, but a1 <- a1 + 1 is OK).
- In some cases, the local manipulation is not shown (to highlight the effects of read() and write()).

Notation used
Example (transfer, assuming overdraft is allowed):
    1. a1 <- Read(X)
    2. a1 <- a1 - k
    3. Write(X, a1)
    4. a2 <- Read(Y)
    5. a2 <- a2 + k
    6. Write(Y, a2)

Serializable schedule -- example
Consider the fund transfer operation:
    1. Find the tuple for x's account (database query)
    2. Read x's account info into main memory
    3. Check whether x has at least $k
    4. Subtract $k from x's account
    5. Write x's new balance back to the database (database update)
    6. Find the tuple for y's account (database query)
    7. Read y's account info into main memory
    8. Add $k to y's account
    9. Write y's new balance to the database (database update)
Rewritten using class notation:
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)

Consider the dividend operation:
    1. Find the tuple for x's account (database query)
    2. Read x's account info into main memory
    3. Add 1% to x's account
    4. Write x's new balance back to the database (database update)
    5. Find the tuple for y's account (database query)
    6. Read y's account info into main memory
    7. Add 1% to y's account
    8. Write y's new balance back to the database (database update)
Rewritten using class notation:
    1. A1 <- Read(X)
    2. A1 <- A1 * 1.01
    3. Write(X, A1)
    4. A2 <- Read(Y)
    5. A2 <- A2 * 1.01
    6. Write(Y, A2)

Suppose x has $100 and y has $200, and consider two operations:
- x transfers $50 to y
- Dividend
For serial schedules:
- If the transfer comes before the dividend: X: 100 -> 50 -> 50.5; Y: 200 -> 250 -> 252.5
- If the dividend comes before the transfer: X: 100 -> 101 -> 51; Y: 200 -> 202 -> 252
In both cases, X + Y = 303.

Serializable schedule -- example (2)
But with the following schedule:
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                1. A1 <- Read(X)
                                2. A1 <- A1 * 1.01
                                3. Write(X, A1)
                                4. A2 <- Read(Y)
                                5. A2 <- A2 * 1.01
                                6. Write(Y, A2)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 202 -> 252; X+Y = 302.5

What causes the problem?
- Contention for resources?
- Interleaving?
Is interleaving always bad?

But with the following schedule:
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                1. A1 <- Read(X)
                                2. A1 <- A1 * 1.01
                                3. Write(X, A1)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
                                4. A2 <- Read(Y)
                                5. A2 <- A2 * 1.01
                                6. Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 250 -> 252.5; X+Y = 303
In this case, interleaving is OK!

Let's change it slightly:
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                1. A1 <- Read(X)
                                2. A1 <- A1 * 1.01
                                3. Write(X, A1)
                                4. A2 <- Read(Y)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
                                5. A2 <- A2 * 1.01
                                6. Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 250 -> 202; X+Y = 252.5
In this case, interleaving is very bad!

Let's change it slightly (again):
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                1. A1 <- Read(X)
                                2. A1 <- A1 * 1.01
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
                                3. Write(X, A1)
                                4. A2 <- Read(Y)
                                5. A2 <- A2 * 1.01
                                6. Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 250 -> 252.5; X+Y = 303
In this case, interleaving is good again!

What's going on here?
- Interleaving can be very bad.
- However, some interleaving does not cause problems.
- How can we determine what kind of interleaving is "nice"?

Serializability
How to formalize the notion?
One can look at the final results: if the schedule produces the same result as a serial schedule, then it is fine.
However, this may be due to luck and/or "commutative" operators:
    T1:                         T2:
    1. A1 <- Read(X)            1. A1 <- Read(X)
    2. A1 <- A1 - k             2. A1 <- A1 + m
    3. Write(X, A1)             3. Write(X, A1)
    4. A2 <- Read(Y)            4. A2 <- Read(Y)
    5. A2 <- A2 + k             5. A2 <- A2 - m
    6. Write(Y, A2)             6. Write(Y, A2)
A better notion is needed.

Conflict serializability
Suppose two transactions (T1, T2) want to operate on the same data object (X).
Four possible scenarios:
- T1 Read(X), T2 Read(X)
- T1 Read(X), T2 Write(X)
- T1 Write(X), T2 Read(X)
- T1 Write(X), T2 Write(X)
How does the order of these operations affect the results of the transactions?

Conflict serializability -- example
Remember this schedule? (T1's steps are numbered 1-6, T2's steps i-vi.)
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                i.   A1 <- Read(X)
                                ii.  A1 <- A1 * 1.01
                                iii. Write(X, A1)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
                                iv.  A2 <- Read(Y)
                                v.   A2 <- A2 * 1.01
                                vi.  Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 250 -> 252.5; X+Y = 303
In this case, interleaving is OK! Step by step:
- (4) and (iii) are not in conflict, so they can be swapped.
- (4) and (ii) are not in conflict, so they can be swapped.
- (4) and (i) are not in conflict, so they can be swapped.
- Similarly, (5) and (i)-(iii) are not in conflict, so they can be swapped.
- Similarly, (6) and (i)-(iii) are not in conflict, so they can be swapped.
We obtain a serial schedule by this transformation (swapping process): 1, 2, 3, 4, 5, 6, i, ii, iii, iv, v, vi.

Now, remember this schedule?
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                i.   A1 <- Read(X)
                                ii.  A1 <- A1 * 1.01
                                iii. Write(X, A1)
                                iv.  A2 <- Read(Y)
                                v.   A2 <- A2 * 1.01
                                vi.  Write(Y, A2)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 202 -> 252; X+Y = 302.5
- (3) and (i) conflict, so they cannot be swapped.
- (4) and (vi) conflict, so they cannot be swapped.

Can you work through this case?
    T1 (transfer)               T2 (dividend)
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                1. A1 <- Read(X)
                                2. A1 <- A1 * 1.01
                                3. Write(X, A1)
                                4. A2 <- Read(Y)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
                                5. A2 <- A2 * 1.01
                                6. Write(Y, A2)
X: 100 -> 50 -> 50.5; Y: 200 -> 250 -> 202; X+Y = 252.5
In this case, interleaving is very bad!

Conflict serializability
From the previous slides, it seems that:
- A schedule that can be transformed to a serial schedule is good! (It achieves isolation.)
- A schedule that cannot be transformed to a serial schedule is bad! (It does not achieve isolation.)
Can we generalize? Yes.

Conflict serializability -- definitions
- Schedule: a sequence that indicates the chronological order in which instructions of concurrent transactions are executed.
- Complete schedule: a schedule that contains a commit/abort decision for each transaction in the schedule.
- Serial schedule: a schedule in which operations of different transactions are not interleaved.
  Denoted by <T1, T2, …, Tn>.

Conflict serializability -- definitions
- Conflict equivalent transformation: swapping two adjacent operations in a schedule that do not conflict.
- Conflict equivalent: two schedules S and S' are conflict equivalent if S can be transformed to S' by successive conflict equivalent transformations.
- Conflict serializability: a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Example of a schedule that is not conflict serializable:
    T3              T4
    read(Q)
                    write(Q)
    write(Q)
We are unable to swap instructions in the above schedule to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>.

Test for serializability
Consider the following schedules (T1's steps are numbered 1-6, T2's steps i-vi).
Not serializable:
    T1                          T2
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                i.   A1 <- Read(X)
                                ii.  A1 <- A1 * 1.01
                                iii. Write(X, A1)
                                iv.  A2 <- Read(Y)
                                v.   A2 <- A2 * 1.01
                                vi.  Write(Y, A2)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
Serializable:
    T1                          T2
    1. A1 <- Read(X)
    2. A1 <- A1 - k
    3. Write(X, A1)
                                i.   A1 <- Read(X)
                                ii.  A1 <- A1 * 1.01
                                iii. Write(X, A1)
    4. A2 <- Read(Y)
    5. A2 <- A2 + k
    6. Write(Y, A2)
                                iv.  A2 <- Read(Y)
                                v.   A2 <- A2 * 1.01
                                vi.  Write(Y, A2)

How to guarantee serializability
We need a protocol (rules on how data items are accessed) to ensure conflict serializability.
Goals of the protocol:
- Allow free access to data items that are not required by multiple transactions.
- For data items required by multiple transactions, restrict access in some way, or limit it to exclusive access.
Balance between safety and efficiency:
- Too restrictive: little or no concurrency, inefficient
- Too lenient: leads to inconsistency

Lock-based protocols
"Exclusive" access: locks
- Each database item is associated with locks.
- A transaction must obtain a lock before accessing the object.
- A transaction must release the lock when it finishes.

Example: Read(X) becomes
    Lock(X)
    Read(X)
    Unlock(X)
- Lock(X): check whether object X is already locked.
    - If not, obtain the lock.
    - If so, wait or "do something" to handle the potential deadlock (like aborting).
- Unlock(X): release the lock on object X.
- The Lock(X) and Unlock(X) commands are added by the DBMS.

Lock-based protocols: S and X locks
Two kinds of locks on each object:
- Shared locks (S-locks)
    - Requested before reading
    - Multiple transactions can hold an S-lock on an object simultaneously
- Exclusive locks (X-locks)
    - Requested before writing
    - Only one transaction can hold an X-lock on an object at any given time
    - No other transaction can hold any lock (not even an S-lock) if some transaction holds an X-lock

More on S and X locks:
- A transaction that holds an S-lock on an object can read the object.
- A transaction that holds an X-lock on an object can read and write the object.
Lock-compatibility table (can T1's request be granted while T2 holds a lock?):
                          T2 holds S-lock    T2 holds X-lock
    T1 requests S-lock    True               False
    T1 requests X-lock    False              False

Lock-based protocol
Consider the following examples (again):
    T1 (Transfer)               T2 (Dividend)
    1. A1 <- Read(X)            1. A1 <- Read(X)
    2. A1 <- A1 - k             2. A1 <- A1 * 1.01
    3. Write(X, A1)             3. Write(X, A1)
    4. A2 <- Read(Y)            4. A2 <- Read(Y)
    5. A2 <- A2 + k             5. A2 <- A2 * 1.01
    6. Write(Y, A2)             6. Write(Y, A2)

Lock-based protocol -- example
With a lock-based protocol, one possible way T1 (Transfer) is transformed:
    1.  S-lock(X)
    2.  A1 <- Read(X)
    3.  A1 <- A1 - k
    4.  X-lock(X)
    5.  Write(X, A1)
    6.  X-lock(Y)
    7.  A2 <- Read(Y)
    8.  A2 <- A2 + k
    9.  Write(Y, A2)
    10. Unlock(X)
    11. Unlock(Y)

Notes from the previous example:
- Locks must be obtained before a read/write can begin.
- If a transaction wants to read and write the same object, it can either
    - obtain an X-lock before reading, or
    - obtain an S-lock before reading, then obtain an X-lock before writing (it is not granted automatically).
- A transaction does not have to release locks immediately after use. Good or bad?

Two phase locking -- definition
The basic two-phase locking (2PL) protocol:
- A transaction T must hold a lock on an item x in the appropriate mode before T accesses x.
- If a conflicting lock on x is being held by another transaction, T waits.
- Once T releases a lock, it cannot obtain any other lock subsequently.
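The lock-compatibility table and the third 2PL rule can both be written down in a few lines. The sketch below is illustrative only: `COMPATIBLE`, `can_grant` and `is_two_phase` are hypothetical names, operations are represented as plain strings in the notation of the slides, and the check covers only the "no lock after the first unlock" rule, not waiting or lock modes.

```python
# (requested mode, mode held by another transaction) -> may the request be granted?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock others hold on the item."""
    return all(COMPATIBLE[(requested, held)] for held in held_by_others)


def is_two_phase(ops):
    """Check that no lock is requested after the first Unlock (the 2PL rule)."""
    released = False
    for op in ops:
        if op.startswith("Unlock"):
            released = True                      # shrinking phase has begun
        elif op.startswith(("S-lock", "X-lock")) and released:
            return False                         # lock acquired after a release
    return True


# T1 (Transfer) as transformed in the lock-based protocol example.
t1 = [
    "S-lock(X)", "A1 <- Read(X)", "A1 <- A1 - k", "X-lock(X)", "Write(X, A1)",
    "X-lock(Y)", "A2 <- Read(Y)", "A2 <- A2 + k", "Write(Y, A2)",
    "Unlock(X)", "Unlock(Y)",
]
# A variant that releases X early and then locks Y: not two-phase.
early_release = [
    "X-lock(X)", "A1 <- Read(X)", "Write(X, A1)", "Unlock(X)",
    "X-lock(Y)", "A2 <- Read(Y)", "Write(Y, A2)", "Unlock(Y)",
]

print(can_grant("S", ["S", "S"]))   # True
print(can_grant("X", ["S"]))        # False
print(is_two_phase(t1))             # True
print(is_two_phase(early_release))  # False
```

An empty `held_by_others` list means the item is unlocked, in which case `can_grant` returns True for either mode.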
Note: a transaction is divided into two phases:
- a growing phase (obtaining locks)
- a shrinking phase (releasing locks)
Claim: 2PL ensures conflict serializability.

Two phase locking -- Serializability
Lock-point: the point at which the transaction has obtained all of its locks.
With 2PL, a schedule is conflict equivalent to a serial schedule ordered by the lock-points of the transactions.

2-phase locking -- example
    T1                                  T2
    1.  S-lock(X)
    2.  A1 <- Read(X)
    3.  A1 <- A1 - k
    4.  X-lock(X)
                                        1.  S-lock(X)        (T2 waits)
    5.  Write(X, A1)
    6.  S-lock(Y)
    7.  A2 <- Read(Y)
    8.  A2 <- A2 + k
    9.  X-lock(Y)    (lock point for T1)
    10. Write(Y, A2)
    11. Unlock(X)
                                        1.  S-lock(X)
                                        2.  A1 <- Read(X)
                                        3.  A1 <- A1 * 1.01
                                        4.  X-lock(X)
                                        5.  Write(X, A1)
                                        6.  S-lock(Y)        (T2 waits)
    12. Unlock(Y)
                                        6.  S-lock(Y)
                                        7.  A2 <- Read(Y)
                                        8.  A2 <- A2 * 1.01
                                        9.  X-lock(Y)        (lock point for T2)
                                        10. Write(Y, A2)
                                        11. Unlock(Y)
                                        12. Unlock(X)

Table of contents
- Motivation
- Basic concept of transactions
- Issues of concurrency control

Deadlocks
Two-phase locking is a blocking protocol (a transaction has to wait if it cannot obtain a lock), so there is a possibility of deadlock.
Example:
    T1                                  T2
    1. S-lock(X)
    2. A1 <- Read(X)
    3. A1 <- A1 - k
                                        1. S-lock(X)
                                        2. A1 <- Read(X)
                                        3. A1 <- A1 * 1.01
    4. X-lock(X)
       (can't proceed: T2 has an S-lock on X)
                                        4. X-lock(X)
                                           (can't proceed: T1 has an S-lock on X)
Deadlock!

Deadlocks
How to deal with deadlocks?
Ostrich (驼鸟策略) Pretend nothing happens and wait for user to hit the <ctrl-alt-del> key Timeout(超时法) Assume deadlock after there is no progress for some time, and do something about it Detection and recovery(检测与恢复) Wait until a deadlock occurs and do something about it Avoidance(避免死锁) Wait until a deadlock can occur if certain operations is executed and do something about it Prevention(预防法) Set up the system such that there is never a chance of deadlocks Haichang Gao , Software School , Xidian University DataBase System Deadlocks -- timeout When transaction waits too long for a lock, it is aborted and restarted again from the beginning The timeout period should be: long enough so that most transactions that are aborted are actually deadlocked; short enough so that deadlocked transactions don’t wait too long for their deadlocks to be broken. Haichang Gao , Software School , Xidian University DataBase System Deadlocks -- Detection Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E), V is a set of vertices (all the transactions in the system) E is a set of edges; each element is an ordered pair Ti Tj. Haichang Gao , Software School , Xidian University DataBase System Deadlocks -- Detection When deadlock is detected : Some transaction will have to rolled back (made a victim) to break deadlock. Select that transaction as victim that will incur minimum cost. Rollback -- determine how far to roll back transaction Total rollback: Abort the transaction and then restart it. More effective to roll back transaction only as far as necessary to break deadlock. (but tricky to implement) Drawback: maintaining wait-for-graph is not cheap Detecting cycles is an overhead Haichang Gao , Software School , Xidian University DataBase System The phantom(幻影) menace Consider the following 2 transactions 1. Select sum(salary) From faculty Where dept = “CS” 2. Select sum(salary) From faculty Where dept = “Math” T1 a. 
Insert into faculty values ('Lin', 'CS', 1000)
b. Insert into faculty values ('Lam', 'Math', 5000)
Statements 1 and 2 form T1; statements a and b form T2.
There does not seem to be a conflict (in terms of tuples).
Assume that initially the CS faculty have a total salary of 1,000,000 and the Math faculty have a total salary of 2,000,000. Then
- T1 -> T2 implies that the select statements return 1,000,000 and 2,000,000;
- T2 -> T1 implies that the select statements return 1,001,000 and 2,005,000.

The phantom menace
But consider the following schedule:
a. (T2) Insert into faculty values ('Lin', 'CS', 1000)
1. (T1) Select sum(salary) From faculty Where dept = 'CS'
2. (T1) Select sum(salary) From faculty Where dept = 'Math'
b. (T2) Insert into faculty values ('Lam', 'Math', 5000)
The output will be 1,001,000 and 2,000,000.
This matches neither serial order, so the schedule is not serializable, even though no individual tuples conflict!

The phantom problem
This is known as the phantom problem.
Why does it occur?
- No tuples are in conflict.
- However, the conflict concerns the set of tuples that satisfy a certain condition (dept = 'CS', dept = 'Math').
- T1 requires access to ALL tuples satisfying the condition.
- However, T2 changes the number of tuples satisfying the condition.
There is no quick solution; index locking is one possibility.

Isolation levels
The goal of locking is to ensure serializability.
To ensure serializability, we require 2PL:
- No new locks may be acquired once a transaction has started releasing locks.
- Locks that have been acquired need to be held for a long time (long locks).
- One cannot acquire a lock, do some work with it, and then release it immediately.
Advantage: ensures serializability.
Disadvantage: less concurrency.
Is serializability (long locks / two-phase locking) necessary in all cases?

Serializability -- In practice
A data manipulation language must include a construct for specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends with one of:
- Commit work: commits the current transaction and begins a new one.
- Rollback work: causes the current transaction to abort.
Levels of consistency specified by SQL-92:
- Serializable (the default)
- Repeatable read
- Read committed
- Read uncommitted

Serializability -- In practice
- Serializable: the default.
- Repeatable read: only committed records may be read, and repeated reads of the same record must return the same value. However, a transaction may not be serializable: it may find some records inserted by another transaction but not find others.
- Read committed: only committed records can be read, but successive reads of a record may return different (but committed) values.
- Read uncommitted: even uncommitted records may be read.

Levels of Isolation
SQL:
SET TRANSACTION (READ ONLY | READ WRITE) ISOLATION LEVEL (READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE);
The statement defines two properties of the transaction:
1) Type of the transaction (read only / read write)
2)
Level of isolation

Levels of Isolation
Read and write locks at the four isolation levels
Short-term lock: held only for as long as is needed to carry out the access operation.
Long-term lock: held until the transaction commits.

Read Uncommitted (dirty reads)
- Write locks on rows of a table are long-term: no (but the transaction is read-only)
- Read locks on rows of a table are long-term: no read locks at all
- Read and write locks on predicates are long-term: no predicate locks at all
Read Committed
- Write locks on rows of a table are long-term: yes
- Read locks on rows of a table are long-term: no
- Predicate locks: short-term read predicate locks, long-term write predicate locks
Repeatable Read
- Write locks on rows of a table are long-term: yes
- Read locks on rows of a table are long-term: yes
- Predicate locks: short-term read predicate locks, long-term write predicate locks
Serializable
- Write locks on rows of a table are long-term: yes
- Read locks on rows of a table are long-term: yes
- Predicate locks: long-term read and write predicate locks

Levels of Isolation
1) Read Uncommitted
Characteristics:
- A transaction running at this isolation level may only be a read transaction; update transactions are not allowed.
- A transaction running at this isolation level takes no locks, whatever data it accesses.
A transaction at this level may therefore read an uncommitted result.
Lost updates cannot occur, but dirty reads can.

Levels of Isolation
2) Read Committed
Characteristics:
- Write locks are long-term and read locks are short-term; read predicate locks are short-term and write predicate locks are long-term.
- Two of the three kinds of conflict cannot occur at this level:
  a. Wi(A) -> Rj(A) (2), so a read always returns a committed result (no dirty reads).
  b. Wi(A) -> Wj(A) (3) (the ordinary lost update cannot occur).
- One of the three kinds of conflict can occur at this level:
  c. Ri(A) -> Wj(A) (1)
  This leads to two problems:
  (1) non-repeatable reads
  (2) the Scholar's Lost Update Anomaly

Levels of Isolation
3) Repeatable Read
Characteristics:
- Both read locks and write locks are long-term; only read predicate locks are short-term, while write predicate locks are long-term.
This level eliminates lost updates, dirty reads and non-repeatable reads. However, because its read predicate locks are short-term, it cannot eliminate the phantom update anomaly.
4) Serializability (and Phantom Updates)
Characteristics:
- Both read locks and write locks are long-term, and both read and write predicate locks are long-term.
This level guarantees that concurrent transactions are serializable and eliminates all of the anomalies.

Table of contents
- Recovery

Recovery -- why?
ACID properties:
- Atomicity: all-or-nothing
- Durability: once committed, changes must be permanent
If the system is always functional, these are easy to guarantee.
However, the system may crash (failure).
Recovery -- why?
Types of failures:
- Transaction failure:
  - Logical errors: the transaction cannot complete due to some internal error condition (e.g. your program has a division-by-zero error).
  - System errors: the database system must terminate an active transaction due to an error condition (e.g. deadlock).

Recovery -- why?
Types of failures (ctd):
- System crash: a power failure or other hardware or software failure causes the system to crash.
  - Fail-stop assumption: the contents of non-volatile storage are assumed not to be corrupted by a system crash.
  - Database systems have numerous integrity checks to prevent corruption of disk data.

Recovery -- why?
Types of failures (ctd):
- Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
  - Destruction is assumed to be detectable: disk drives use checksums to detect failures.
  - With some disk systems, some failures may be correctable.
We focus on transaction and system failures, as well as other failures that are correctable.

Recovery -- why?
A failure can occur at ANY time, including the times when you least want one.
We do assume in this class that writing a page is atomic (i.e. a failure does not occur in the middle of writing a page).
Thus we need to maintain atomicity and durability at all times.

Recovery -- why?
After a failure, the system needs to recover:
- It needs to recover to a consistent state.
- It needs to ensure that all transactions are atomic and durable.
Thus, when the system comes back up after a failure:
- A recovery module runs first, before any other transaction is allowed to run.
- It restores the database to a consistent state and ensures the ACID properties.
- After that, transactions may continue.
Notice that information needs to be stored during the normal running of transactions for the recovery module to work properly.

Recovery -- atomicity requirement
Suppose the system crashes while a transaction T is being executed (but has not committed).
During recovery, one needs to:
- Find those transactions.
- Ensure that atomicity holds.
  - Abort or complete? Abort! (In many cases we don't know how to complete the transaction anyway.)
  - Abort means restoring the database to the point where those transactions had not started (as some of their changes may have propagated to the disk).
Thus, after the recovery procedure, those transactions should seem NEVER to have been executed.

Recovery -- atomicity requirement
Thus, during recovery, one needs to:
- Find those transactions that have started but not yet committed.
- Ensure that atomicity holds:
  - Find all the changes to the database that those transactions have made.
  - Undo all the changes.

Recovery -- durability requirement
Suppose the crash occurs right after a transaction T has committed.
This seems to be ok, but ...
- T may have issued writes to the disk.
- However, the writes may not yet have propagated to the disk.
Reasons:
- The writes may have been scheduled/buffered, but the system crashed before they could execute.
- The buffer manager / virtual memory manager may decide to keep a disk page in main memory for a long time (to save disk access time).
- This is especially true on a network/shared file system.

Recovery -- durability requirement
Remember, when you issue a write() command, the system does not necessarily write your data onto the disk immediately.
In some systems, you may issue a flush() command to force the writes onto the disk.

Recovery -- durability requirement
Thus, during recovery, one needs to:
- Find those transactions that have committed.
- Ensure that durability holds:
  - Check whether all the changes made by those transactions have been written onto the disk.
  - If not, redo all the changes.

Recovery -- overview
Recovery algorithms have two parts:
1. Actions taken during normal transaction processing to ensure that enough information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
In order to achieve (1), we need to store the information in stable storage.

Log files
Logs are needed to record the operations on the database during normal operation.
The log is a sequence of log records, and maintains a record of update activities on the database.
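
The earlier remark about write() and flush() applies to the log itself: a log record only helps recovery once it has actually reached stable storage. The following is a minimal Python sketch, assuming an ordinary file stands in for stable storage. The function names append_log_record and read_log are hypothetical and are not part of any database system; whether fsync truly reaches the physical medium also depends on the operating system and the device.

```python
import os
import tempfile


def append_log_record(path, record):
    """Append one log record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")  # may only reach a user-space buffer
        f.flush()               # hand the buffered data to the operating system
        os.fsync(f.fileno())    # ask the OS to write its cache to the device


def read_log(path):
    """Return the log records in the order they were appended."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    # Illustrative use with a throwaway file (an assumption for the demo).
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "db.log")
        append_log_record(log_path, "first update")
        append_log_record(log_path, "second update")
        print(read_log(log_path))
```

Forcing every record individually is the simplest choice for a sketch; it trades throughput for the guarantee that a record is on disk by the time the call returns.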
Log files
Different types of log records during normal operation:
- Begin record: <Ti, start>, registered when transaction Ti begins
- Write record: <Ti, X, Vold, Vnew>, registered when a database item X is updated by Ti, where Vold and Vnew store the old and new values respectively
  - Vold and Vnew are also known as the before-image and after-image respectively
- Commit record: <Ti, commit>, registered when Ti commits
  - Formally, a transaction commits when the commit log record is written
- Abort record: <Ti, abort>, registered when Ti aborts

Log files
With such information, redo and undo operations can be carried out:
- Undo: copy Vold back to the object.
- Redo: copy Vnew back to the object.

Log files
Logs are written onto stable storage.
Question: given an operation to be logged, should we log first or execute first?
Consider the case where the system fails between the two operations.

Log files
Thus logs must be write-ahead logs (WAL), i.e. all operations must be logged first.
Log records must be forced to stable storage before the actual operations can be executed.

Log-based recovery: basic approach
Given a write-ahead log, how should recovery proceed after a system crash?
Two major steps:
- Locating the transactions that need work to be done
- Applying compensatory actions to these transactions

Log-based recovery: basic approach
Step 1: locating the transactions that need to be dealt with.
- Uncommitted but active transactions
  - Need undo
  - Transactions that have <Ti, start> in the log, but not <Ti, commit>
- Committed transactions
  - Need redo
  - Transactions that have <Ti, commit> in the log

Log-based recovery: basic approach
Example: recovery actions for three crash scenarios (the corresponding logs are shown on the slide)
a) Undo T0
b) Undo T1, Redo T0
c) Redo T0, T1

Log-based recovery: basic approach
Step 2: apply compensatory actions (redo and undo).
Requirement: the actions have to be idempotent.
That is, even if an operation is executed multiple times, the effect is the same as if it were executed once.

Log-based recovery: basic approach
For undo: copy the old value of the item from the log to the database.
For redo: copy the new value of the item from the log to the database.
Both operations are idempotent.
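
The two recovery steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log is an in-memory list of tuples mirroring the record formats above (<Ti, start>, <Ti, X, Vold, Vnew>, <Ti, commit>), the database is a plain dictionary, and the function name recover as well as the items and values in the demo are made up. The demo mirrors case b) above (undo T1, redo T0).

```python
def recover(log, db):
    """Undo uncommitted transactions, then redo committed ones.

    log: list of tuples (Ti, "start"), (Ti, "commit") or (Ti, X, Vold, Vnew)
    db:  dict mapping item name to its current value on disk
    """
    # Step 1: locate the transactions that need work.
    started = {rec[0] for rec in log if rec[1:] == ("start",)}
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    to_undo = started - committed

    # Step 2a: undo by scanning backwards and restoring the old values.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in to_undo:
            _, item, v_old, _ = rec
            db[item] = v_old

    # Step 2b: redo by scanning forwards and reinstalling the new values.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, item, _, v_new = rec
            db[item] = v_new

    return to_undo, committed


if __name__ == "__main__":
    # Hypothetical log at crash time: T0 committed, T1 did not.
    log = [
        ("T0", "start"),
        ("T0", "A", 1000, 950),
        ("T0", "B", 2000, 2050),
        ("T0", "commit"),
        ("T1", "start"),
        ("T1", "C", 700, 600),
    ]
    # Hypothetical disk state: only some of the writes had propagated.
    db = {"A": 1000, "B": 2050, "C": 600}
    recover(log, db)
    print(db)  # {'A': 950, 'B': 2050, 'C': 700}
```

Because undo and redo only copy values out of the log, running recover a second time (for example after a crash during recovery) leaves the database unchanged, which is the idempotence requirement stated above.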