CS 292 Special Topics on Big Data
Yuan Xue ([email protected])

Part I: Relational Database (Transactions)

Review and Look Forward
What we know so far:
- Design: the database schema. Optimization objective: minimum redundancy with information preservation.
- Operation: database access and manipulation via SQL; how to configure indexes via SQL.
- Physical level: how data is stored in the database and how it is accessed and manipulated.
- Optimization: how data storage and access methods (index creation) can be designed so that application (SQL query) execution time is minimized.
Next step: more on DB operation, namely how constraints and integrity are ensured.

Design pipeline:
- Conceptual design: Entity/Relationship model
- Logical design: data model mapping -> logical schema; normalization -> normalized schema
- Physical design: physical (internal) schema

Motivating Example for Transactions
Scenario: efficient support for counting followers, followees, and tweets, using a new table Number:

User ID    NumFollower   NumFollowee   NumTweet
Alice00    6             7             8
Bob2013    1             2             3
Cathy123   6             9             12

Consider two concurrent operations: Bob2013 unfollows Alice00, and Dave11 follows Alice00. Each runs the same pattern against Number (with -1 for the unfollow and +1 for the follow):

X := SELECT NumFollower FROM Number WHERE UserID = 'Alice00'
X := X - 1   (Bob2013)   or   X := X + 1   (Dave11)
UPDATE Number SET NumFollower = X WHERE UserID = 'Alice00'

In terms of database operations, each is Read(X); X := X - 1 or X + 1; Write(X).
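If the two operations interleave badly, one update can be lost. This is easy to reproduce with a small deterministic simulation (a sketch in Python; the dict stands in for the Number table, and the variable names are illustrative):

```python
# Deterministic simulation of interleaving: both transactions read X = 6
# before either writes its result back.
db = {"Alice00": 6}  # NumFollower for Alice00

# Step 1: both transactions read the current value.
x_bob = db["Alice00"]    # Bob2013: Read(X) -> 6
x_dave = db["Alice00"]   # Dave11:  Read(X) -> 6

# Step 2: each computes its new value from its stale snapshot.
x_bob -= 1               # Bob2013: X := X - 1 -> 5
x_dave += 1              # Dave11:  X := X + 1 -> 7

# Step 3: both write back; Dave11's write silently overwrites Bob2013's.
db["Alice00"] = x_bob    # Bob2013: Write(X) -> 5
db["Alice00"] = x_dave   # Dave11:  Write(X) -> 7 (Bob2013's update is lost)

print(db["Alice00"])     # 7, although one follow plus one unfollow should give 6
```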
Possible Outcome (I): Serial Execution
Bob2013: Read(X); X := X - 1; Write(X), then Dave11: Read(X); X := X + 1; Write(X) (or the reverse order).
No interleaving: the operations run serially, and the order does not matter.

Possible Outcome (II): Lost Update
Bob2013: Read(X)
Dave11:  Read(X)
Bob2013: X := X - 1
Dave11:  X := X + 1
Bob2013: Write(X)
Dave11:  Write(X)
Dave11's write overwrites Bob2013's update, which is lost. Starting from X = 6, the final result is 7 rather than the correct 6.

Getting More Complicated: Two Relations
Now each operation must touch both Number and Follow simultaneously. Follow contains (with Y and Z marking the affected rows):

Followee   Follower   Timestamp
Alice00    Bob2013    2011.1.1.3.6.6      (Y)
Bob2013    Cathy123   2012.10.2.6.7.7
Alice00    Cathy123   2012.11.1.2.3.3
Cathy123   Alice00    2012.11.1.2.6.6
Bob2013    Alice00    2012.11.1.2.6.7
Alice00    Dave11     2012.11.1.2.6.7     (Z)

Bob2013: Read(X); X := X - 1; Write(X); Delete(Y)
Dave11:  Read(X); X := X + 1; Write(X); Insert(Z)

Possible Outcome (III): Dirty Read
Bob2013: Read(X); X := X - 1; Write(X)
Dave11:  Read(X)    <- dirty read of Bob2013's uncommitted value
Dave11:  X := X + 1; Write(X); Write(Z)
Bob2013: Failed! [Roll back the value of X]; Delete(Y)
Dave11 read a value that was later rolled back, so it gets the wrong number, and the database is left in an inconsistent state.

Transactions
Solution: group the operations into one unit, a transaction: an executing program that includes some database operations (e.g., read, write), which together form an atomic unit of work against the database. At the end of a transaction the database must be in a valid (consistent) state that satisfies all the constraints specified on the database schema.
Transactions serve two main purposes:
- Concurrent database access: isolation between programs accessing a database concurrently.
- Resilience to system failures: reliable units of work that allow correct recovery from failures, keeping the database consistent even when execution stops (completely or partially) and many operations remain uncompleted, with unclear status.

Transaction Properties: ACID
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee database transactions are processed reliably.
- Atomicity: an "all-or-nothing" proposition. Each unit of work performed on the database must either complete in its entirety or have no effect whatsoever.
- Consistency: conformance to database constraints, i.e., the whole database is in a state that obeys all the constraints. Each client and each transaction can assume all constraints hold when the transaction begins, and must guarantee all constraints hold when the transaction ends.
- Isolation: serial equivalency. Operations may be interleaved, but the execution must be equivalent to some sequential (serial) order of all transactions.
- Durability: durable storage.
If the system crashes after a transaction commits, all effects of the transaction remain in the database.

Constraints
Database constraints fall into three categories:
- Inherent model-based constraints: inherent to the relational model itself (e.g., no duplicate tuples).
- Explicit schema-based constraints: expressed in the schema of the relational model via DDL, and enforced within the data model. Read FDS Section 3.2 for details.
- Application-based constraints (semantic constraints / business rules): expressed and enforced by the application programs.

Schema-based constraints are specified on a database schema and hold on every valid database state of that schema (they are time invariant):
- Domain constraint: within each tuple, the value of each attribute must be an atomic value from the attribute's domain.
- Key constraint: uniqueness of key values.
- Constraints on NULL values.
- Entity integrity constraint: no primary key value can be NULL.
- Referential integrity constraint: specified between two relations, where one relation has a foreign key that refers to the other.

Transaction Support
- Concurrency control: deals with interleaving transactions; guarantees isolation and database consistency.
- Recovery mechanisms: deal with system failures; guarantee all-or-nothing execution, database consistency, and durability.

Concurrency Control
A simple way to achieve isolation: if transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. Then why is concurrency control needed? Because performance suffers: most high-performance transactional systems need to run transactions concurrently to meet their performance requirements.
Concurrency Control Techniques
- Locking: locking basics, two-phase locking, dealing with deadlocks
- Timestamp ordering
- Multiversion concurrency control (MVCC)

Locking for Concurrency Control
A lock is a variable Lock(X) associated with a data element X (each element has a unique lock) that describes the status of X.
Types of locks:
- Binary lock (two states): locked, unlocked.
- Read-write lock (three states): read_locked, write_locked, unlocked.

Binary lock operation:
- Each transaction T must first acquire Lock(X) before reading or writing X.
- If Lock(X) is held by another transaction T', then T waits.
- T must release Lock(X) after all of its read/write operations on X are completed.
- T does not need to acquire Lock(X) again if it already holds it (reentrant lock).

Read-write lock operation:
- Each transaction T must issue read_lock(X) or write_lock(X) before reading X.
- Each transaction T must issue write_lock(X) before writing X.
- Lock acquisition follows the read-write lock rule: readers share the lock, a writer holds it exclusively.
- T must issue unlock(X) after all of its read/write operations on X are completed.
- A lock can be converted or upgraded (e.g., from read to write).

Implementation of Locks
The lock manager subsystem of the scheduler maintains a lock table, implemented as a hash table: a hash function maps element X to the lock info for X (lock state, number of readers, and the locking transaction ID). If an element is not found in the hash table, it is unlocked.

Two-Phase Locking
Would locking alone solve the problem? A lock is only a basic mechanism; it needs to work with a protocol for how locks are applied to support correct concurrency control. For example, the following transactions use locks but can still produce a non-serializable schedule:

T1: read_lock(Y); Read(Y); unlock(Y); write_lock(X); Read(X); X := X + Y; Write(X); unlock(X)
T2: read_lock(X); Read(X); unlock(X); write_lock(Y); Read(Y); Y := X + Y; Write(Y); unlock(Y)

Two-phase locking (2PL): locks are applied and removed in two phases:
- Expanding phase: locks are acquired and no locks are released.
- Shrinking phase: locks are released and no locks are acquired.
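The lock-table structure described above can be sketched in a few lines (the `LockTable` class and its method names are assumptions for illustration; a real lock manager blocks and queues waiting transactions instead of returning False):

```python
# Minimal lock table keyed by data element, as in the hash-table description:
# absent key means unlocked; otherwise store the mode and current holders.
class LockTable:
    def __init__(self):
        self.table = {}  # element -> {"mode": "read" or "write", "holders": set}

    def read_lock(self, element, tid):
        entry = self.table.get(element)
        if entry is None:                       # not in table: unlocked
            self.table[element] = {"mode": "read", "holders": {tid}}
            return True
        if entry["mode"] == "read":             # shared: more readers allowed
            entry["holders"].add(tid)
            return True
        return entry["holders"] == {tid}        # reentrant for the lone holder

    def write_lock(self, element, tid):
        entry = self.table.get(element)
        if entry is None:
            self.table[element] = {"mode": "write", "holders": {tid}}
            return True
        if entry["holders"] == {tid}:           # lone holder may upgrade
            entry["mode"] = "write"
            return True
        return False                            # conflict: caller must wait

    def unlock(self, element, tid):
        entry = self.table.get(element)
        if entry and tid in entry["holders"]:
            entry["holders"].discard(tid)
            if not entry["holders"]:            # last holder: back to unlocked
                del self.table[element]

lt = LockTable()
print(lt.read_lock("X", "T1"))   # True: X was unlocked
print(lt.read_lock("X", "T2"))   # True: read locks are shared
print(lt.write_lock("X", "T1"))  # False: T2 also holds a read lock
lt.unlock("X", "T2")
print(lt.write_lock("X", "T1"))  # True: T1 is now the lone holder and upgrades
```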
Under 2PL, the same transactions become:

T1: read_lock(Y); Read(Y); write_lock(X); unlock(Y); Read(X); X := X + Y; Write(X); unlock(X)
T2: read_lock(X); Read(X); write_lock(Y); unlock(X); Read(Y); Y := X + Y; Write(Y); unlock(Y)

It can be proved that if every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be serializable (coming up in slide 23).

Problem: Deadlock
Two age-based schemes deal with deadlock:
- Wait-die: an older transaction may wait for a younger one; a younger transaction does not wait for an older one and dies (aborts) instead.
- Wound-wait: a younger transaction may wait for an older one; an older transaction preempts (wounds) a younger one.

Review – Transactions
- Problem: an application must perform multiple operations (read, write) on the database as a unit to keep the database in consistent states, and concurrent execution of operations is needed for performance.
- Solution: a transaction bundles multiple operations into one unit with the ACID properties. The application defines transactions; the DBMS supports them. Each transaction takes the database from one consistent state to another.
- Concurrency control techniques are pessimistic (locking) or optimistic (timestamp ordering, MVCC); recovery handles failures.

Writing Transactions in SQL
BEGIN TRANSACTION
  [SQL statements]
COMMIT or ROLLBACK
A single SQL statement is also treated as a transaction.

Schedules
A transaction Ti is a sequence of Read_i(X) and Write_i(X) operations, for example:

T1: Read_1(X) Write_1(X) Write_1(Y) Read_1(Y) Read_1(Z)
T2: Read_2(X) Write_2(X) Write_2(Z) Read_2(Y) Read_2(Z)

A schedule (history) is a chronological order in which operations from various transactions are executed:
- Operations from the same transaction must follow the order in which they occur in that transaction.
- Operations from different transactions can be interleaved in the schedule.
- A schedule is totally ordered: for any two operations, one must occur before the other (vs. a
partial order defined only on conflicting operations).

A schedule has properties from two perspectives: concurrency control and recovery (coming up later).

Serializable Schedules
- A serial schedule has no concurrent execution: transactions run one after another. It is correct by definition and is the basis for judging correct concurrent execution.
- A serializable schedule allows concurrent execution and is considered correct because it is equivalent to some serial schedule.
- There are two notions of equivalence between schedules: conflict equivalence and view equivalence.

Conflict Equivalence
Two operations conflict if:
- they belong to different transactions,
- they access the same element X, and
- at least one of them is a write(X).
Intuition: two operations conflict if changing their order can result in a different outcome. There are two types:
- Read-write conflict, e.g., Read_1(X), Write_2(X)
- Write-write conflict, e.g., Write_1(X), Write_2(X)
Two schedules are conflict equivalent if the order of every pair of conflicting operations is the same in both schedules. A schedule is conflict serializable if it is conflict equivalent to some serial schedule.

Conflict Serializability: Testing and Enforcement
Conflict serializability is good, but how can we tell whether a schedule is conflict serializable? Test with a precedence graph:
- Each transaction is a node.
- A directed edge represents the order of a conflict (read-write or write-write): Ti -> Tj if an operation of Ti precedes a conflicting operation of Tj.
- The schedule is serializable if and only if the precedence graph has no cycle.
- An equivalent serial schedule can be created by a topological sort of the graph.
Most concurrency control techniques do not actually test for serializability.
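The precedence-graph test is straightforward to code; below is a sketch (the encoding of a schedule as time-ordered (transaction, operation, item) triples is an assumption for illustration):

```python
from collections import deque

# Build the precedence graph of a schedule and check it for cycles.
def precedence_graph(schedule):
    """schedule: time-ordered list of (tid, op, item) with op in {'R', 'W'}."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))  # an op of ti precedes a conflicting op of tj
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:                 # Kahn's topological sort: the graph is
        u = queue.popleft()      # acyclic iff every node can be sorted
        visited += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return visited == len(nodes)

# Lost-update interleaving: T1 and T2 both read X before either writes it.
bad = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X")]
good = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
print(is_conflict_serializable(bad))   # False: T1 -> T2 and T2 -> T1 form a cycle
print(is_conflict_serializable(good))  # True: equivalent to serial T1, T2
```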
Instead, they develop rules (protocols) that guarantee any schedule following the rules is serializable.

Two-phase locking is used in the majority of commercial DBMSs, in several variants:
- Basic 2PL (covered in slide 19): may cause deadlock.
- Conservative 2PL: prevents deadlock by locking all desired data items before the transaction begins execution.
- Strict 2PL: unlocking is performed only after the transaction terminates (commits, or aborts and is rolled back). This is the most commonly used two-phase locking algorithm.

Timestamp ordering has no locking overhead and suits distributed implementation better (it is used in H-Store and Spanner).

Timestamp Ordering
A timestamp is a monotonically increasing variable (integer) indicating the age of an operation or a transaction; TS(T) is the ID assigned by the DBMS to transaction T. A larger timestamp value indicates a more recent event or operation. Implementation: a counter or the system clock.

The timestamp ordering (TO) algorithm uses timestamps to serialize the execution of concurrent transactions: the only equivalent serial schedule has the transactions in the order of their timestamps.

Basic Timestamp Ordering
Each item X carries two timestamps:
- read_TS(X): the largest TS among all transactions that have successfully read X, i.e., read_TS(X) = TS(T) where T is the youngest transaction that has read X successfully.
- write_TS(X): defined similarly for writes.

1. Transaction T issues a write_item(X) operation:
   a. If read_TS(X) > TS(T) or write_TS(X) > TS(T), then a younger transaction has already read or written the data item, so abort and roll back T and reject the operation.
   b. Otherwise, execute write_item(X) of T and set write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
   a. If write_TS(X) > TS(T), then a younger transaction has already written the data item, so abort and roll back T and reject the operation.
   b. If write_TS(X) <= TS(T), execute read_item(X) of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).
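The two rules of basic timestamp ordering can be sketched directly (the `TOScheduler` class is an illustrative name; timestamps are small integers, with 0 as the initial value of read_TS and write_TS):

```python
# Basic timestamp ordering: per-item read_TS/write_TS and the two rules above.
class TOScheduler:
    def __init__(self):
        self.read_ts = {}   # item -> largest TS of a successful reader
        self.write_ts = {}  # item -> largest TS of a successful writer

    def read_item(self, ts, item):
        if self.write_ts.get(item, 0) > ts:
            return "abort"  # a younger transaction already wrote the item
        self.read_ts[item] = max(ts, self.read_ts.get(item, 0))
        return "ok"

    def write_item(self, ts, item):
        if self.read_ts.get(item, 0) > ts or self.write_ts.get(item, 0) > ts:
            return "abort"  # a younger transaction already read or wrote it
        self.write_ts[item] = ts
        return "ok"

sched = TOScheduler()
print(sched.read_item(2, "X"))   # ok: read_TS(X) becomes 2
print(sched.write_item(1, "X"))  # abort: the writer is older than reader TS 2
print(sched.write_item(3, "X"))  # ok: write_TS(X) becomes 3
print(sched.read_item(2, "X"))   # abort: a younger transaction (TS 3) wrote X
```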
Good news: basic TO is guaranteed to be conflict serializable. Bad news: it rejects too many transactions; resubmitted transactions get a new timestamp, and cascading rollback is possible.

Strict Timestamp Ordering
A solution to the problems of basic TO; it is easily recoverable and conflict serializable:
1. Transaction T issues a write_item(X) operation: if TS(T) > read_TS(X), delay T until the transaction T' that read or wrote X has terminated (committed or aborted).
2. Transaction T issues a read_item(X) operation: if TS(T) > write_TS(X), delay T until the transaction T' that wrote X has terminated (committed or aborted).

Thomas's Write Rule
Another solution to the basic TO problem; it does not enforce conflict serializability. Transaction T issues a write_item(X) operation:
1. If read_TS(X) > TS(T), abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), just ignore the write operation and continue execution: of two consecutive writes, only the most recent one counts.
3. If neither condition above holds, execute write_item(X) of T and set write_TS(X) to TS(T).

What Is a Data Element/Object?
A lockable data element can be of any granularity: a whole relation, an individual tuple, or a disk block.
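Thomas's write rule changes only the write path of basic TO: a stale write is skipped rather than aborted. A sketch (the function and variable names are illustrative):

```python
# Thomas's write rule: an outdated write is silently ignored instead of
# aborting the writer; only rule 1 (a younger read exists) forces an abort.
def thomas_write(ts, item, read_ts, write_ts, db, value):
    if read_ts.get(item, 0) > ts:
        return "abort"   # rule 1: a younger transaction already read the item
    if write_ts.get(item, 0) > ts:
        return "skip"    # rule 2: obsolete write; a newer value is already there
    db[item] = value     # rule 3: perform the write
    write_ts[item] = ts
    return "ok"

read_ts, write_ts, db = {}, {}, {}
print(thomas_write(3, "X", read_ts, write_ts, db, "new"))  # ok
print(thomas_write(1, "X", read_ts, write_ts, db, "old"))  # skip: stale write ignored
print(db["X"])                                             # new
```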
Multiple Granularity Locking
Locking works in any case, but should we choose small or large objects?
- Locking large objects (e.g., relations): few locks needed, but low concurrency.
- Locking small objects (e.g., tuples, fields): more locks needed, but more concurrency.
- Solution: multiple granularity level locking.

Why Recovery Is Needed
Atomicity: whenever a transaction is submitted to a DBMS for execution, either
- all operations in the transaction complete successfully and their effect is recorded permanently in the database, or
- if the transaction fails after executing some of its operations, those operations are undone and have no lasting effect (on the database or on any other transaction).
Recovery restores the database to the most recent consistent state (all or nothing).
Types of failures: hardware failures, software errors, and concurrency control enforcement (a transaction aborted by the scheduler).

Schedules Classified by Recoverability
- Recoverable schedule: one where no committed transaction ever needs to be rolled back. A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed.
- Cascadeless schedule: one where every transaction reads only items that were written by committed transactions.

Recovery Techniques
Causes of failures: system crashes (leaving partial transactions) and transaction failures (e.g., a transaction preempted by concurrency control).
Two main mechanisms: the system log and checkpoints.
Two main techniques recover from non-catastrophic transaction failures:
- Deferred update (NO-UNDO/REDO): see slide 38.
- Immediate update (UNDO/REDO): not covered here; see [FDS] Section 23.3.
UNDO and REDO operations are required to be idempotent.

Data Update
- Immediate update: as soon as a data item is modified in the cache, the disk copy is updated.
- Deferred update: all modified data items in the cache are written out either after the transaction ends its execution or after a fixed number of transactions have completed their execution.
- Shadow update: the modified version of a data item does not overwrite its disk copy but is written at a separate disk location.
- In-place update: the disk version of the data item is overwritten by the cache version; this requires a transaction log.

Transaction Log
For recovery from any type of failure, the data value prior to modification (BFIM, BeFore IMage) and the new value after modification (AFIM, AFter IMage) are required. These values and other information are stored in a sequential file called the transaction log. A sample log is given below; Back P and Next P point to the previous and next log records of the same transaction.

T ID   Back P   Next P   Operation   Data item   BFIM      AFIM
T1     0        1        Begin
T1     1        4        Write       X           X = 100   X = 200
T2     0        8        Begin
T1     2        5        Write       Y           Y = 50    Y = 100
T1     4        7        Read        M           M = 200   M = 200
T3     0        9        Read        N           N = 400   N = 400
T1     5        nil      End

Write-Ahead Logging
When in-place updating (immediate or deferred) is used, the log is necessary for recovery and must be available to the recovery manager. The write-ahead logging (WAL) protocol:
- For Undo: before a data item's AFIM is flushed to the database disk (overwriting the BFIM), its BFIM must be written to the log, and the log must be saved on stable storage (the log disk).
- For Redo: before a transaction executes its commit operation, all its AFIMs must be written to the log, and the log must be saved on stable storage.

Checkpoints
From time to time (randomly or under some criterion) the database flushes its buffers to the database disk to minimize the task of recovery. A checkpoint operation takes the following steps:
1. Suspend execution of transactions temporarily.
2. Force-write modified buffer data to disk.
3. Write a [checkpoint] record to the log and save the log to disk.
4. Resume normal transaction execution.
During recovery, redo or undo is required only for transactions appearing after the [checkpoint] record.
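The WAL ordering can be sketched with a tiny in-memory model (all names are illustrative; the two variables stand in for the log disk and the database disk):

```python
# Write-ahead logging sketch: the BFIM reaches the stable log before the
# new value overwrites the database copy, so undo is always possible.
log_disk = []          # stable log storage
db_disk = {"X": 100}   # database disk holding the current value of X

def wal_write(tid, item, new_value):
    bfim = db_disk[item]
    # Log record first (BFIM and AFIM), as the Undo rule requires...
    log_disk.append({"tid": tid, "item": item, "bfim": bfim, "afim": new_value})
    # ...and only then may the AFIM overwrite the database copy.
    db_disk[item] = new_value

def undo(tid):
    # Roll back a transaction by restoring BFIMs in reverse logging order.
    for rec in reversed([r for r in log_disk if r["tid"] == tid]):
        db_disk[rec["item"]] = rec["bfim"]

wal_write("T1", "X", 200)
print(db_disk["X"])  # 200
undo("T1")
print(db_disk["X"])  # 100: the BFIM from the log restored the old value
```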
Deferred Update (NO-UNDO/REDO)
Deferred update does not physically update the database on disk until after a transaction reaches its commit point:
- Before commit, all of the transaction's updates are recorded in its local transaction workspace or in the main-memory buffers that the DBMS maintains.
- Before commit, the updates are recorded persistently in the log; after commit, the updates are written to the database on disk.
If a transaction fails before reaching its commit point, it has not changed the database in any way, so no UNDO is needed. It may, however, be necessary to REDO the operations of committed transactions from the log, because their effect may not yet have been recorded in the database.

Summary – Transactions
Mapping ACID to mechanisms: Atomicity and Durability are guaranteed by recovery (together with persistent disk storage); Isolation is guaranteed by concurrency control; Consistency follows from application correctness combined with the other three.

Putting Things Together
Users issue application programs and queries; the DBMS's query processing layer plans them, and the database engine performs data access against the data and its meta-data.
http://www.dbinfoblog.com/post/24/the-query-processor/
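NO-UNDO/REDO recovery, as described above, reduces to a single pass over the log that replays only the AFIMs of committed transactions; uncommitted transactions never touched the disk, so nothing needs undoing. A sketch (the log-record encoding is an assumption for illustration):

```python
# NO-UNDO/REDO recovery: redo committed writes from the log; the redo is
# idempotent, so replaying the log twice yields the same database state.
def redo_recover(log):
    """log: list of ('write', tid, item, afim) or ('commit', tid) records."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, afim = rec
            db[item] = afim   # install the after-image of a committed write
    return db

log = [
    ("write", "T1", "X", 200),   # T1's AFIM for X
    ("write", "T2", "Y", 100),   # T2 never commits: its write is ignored
    ("commit", "T1"),
]
print(redo_recover(log))  # {'X': 200}
```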