Database Transactions and Processes
Summary 192110982

H18 ACID properties of transactions

ACID properties of a database:
- Atomicity: every transaction is either executed completely or not at all (and then has no effect).
- Consistency: the database must be consistent; its values must represent correct states. Executing a transaction in isolation preserves database consistency and moves the database to a new state that reflects reality.
- Isolation: the concurrent execution of a set of transactions has the same effect as some serial execution of that same set.
- Durability: the results of committed transactions are permanent.

Consistency
The database must satisfy all integrity constraints (not every database state is allowed):
- Internal consistency: e.g. referential integrity and replicated data.
- Enterprise rules: enforcement of business procedures.
When a transaction executes, it must respect these constraints. If it does not, there is no guarantee about how the database will behave, and the data cannot be trusted. Transaction consistency is the responsibility of the transaction's designer: he must ensure that once the transaction has run, the database is in a correct state, satisfying the integrity constraints.
Levels of consistency:
- Static
- Dynamic: cannot really be tested in a single database state.
A transaction is a unit of work: a transaction must do all the work needed to update the database. This can never be done by two transactions, because that would violate consistency.

Atomicity
Atomic execution implies that every transaction either commits or aborts. In a database, changes are undone or rolled back (in other words, the old situation is restored).
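A minimal sketch of atomicity using Python's built-in sqlite3 module (the `accounts` table and `transfer` helper are made up for illustration): either both updates of a transfer are applied, or the failed transfer is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction: commit on success, rollback on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint violated: the whole transfer was undone

ok = transfer(conn, "A", "B", 60)    # succeeds: A = 40, B = 60
bad = transfer(conn, "A", "B", 100)  # would make A negative: aborted, no effect
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer the balances are unchanged; the partial debit of A was rolled back, which is exactly the "all or nothing" property described above.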
Reasons for an abort:
1. Violation of integrity constraints
2. Deadlock
3. Resources unavailable (e.g. a disk crash), violating the isolation requirement

Durability
The system (DBMS) must guarantee that once a transaction commits, its effects remain in the database even if the computer or the storage medium fails. An ordinary OS does not normally guarantee this, but a DBMS must. One way to provide the guarantee is, for example, mirrored disks or disks in a RAID configuration. Durability is relative: one must decide what it is worth and hence what degree of durability to implement.

Isolation
Transactions execute concurrently, but the net effect must still be serial. So to satisfy the isolation property, a concurrent schedule must be serializable.

The ACID properties guarantee that the database is a correct, consistent and up-to-date model of the real world. Although strictly enforcing, say, isolation matters for correct database behavior, it is very conservative. A database can function well without all transactions being isolated, but this carries some risk. Conservative ACID enforcement puts a heavy load on the system.
Interleaving: the overlapping in time of operations from different transactions (see also the schedules).

H19 Models of transactions

Flat transactions
Some limitations:
- Single DBMS
- All or nothing; no way to continue from a certain point
- No human or device action possible as part of a transaction (these cannot be rolled back)
- Everything at once; no possibility to defer work to a later time without losing ACID
A possible solution is to use savepoints.

Savepoints
Used for partial rollback of the database.
- Rollback to savepoint si causes database updates made after the creation of si to be undone
o S2 and S3 updated the database (otherwise there would be no point rolling back over them)
- The program counter and local variables are not rolled back
- Savepoint creation does not make prior database changes durable (an abort still rolls all changes back)
Implementation of savepoints:
- When Ti creates a savepoint s, insert a marker for s in Ti's lock list Li, separating lock entries acquired before the creation from those acquired after it
- When Ti rolls back to s, release all locks following the marker for s in Li (in addition to undoing all updates made since the savepoint was created)

Distributed Transactions
Many enterprises support multiple legacy systems doing separate tasks. Increasing automation requires that these systems be integrated.
Goal: a distributed transaction should be ACID. Each subtransaction is locally ACID (e.g., local constraints maintained, locally serializable). In addition the transaction should be globally ACID:
- A: either all subtransactions commit or all abort
- C: global integrity constraints are maintained
- I: concurrently executing distributed transactions are globally serializable
- D: each subtransaction is durable
Distributed database: one design, but split over several hosts because of its size.
Multidatabase: several designs (may be used by several companies).
Models:
- Hierarchical model: no concurrency among subtransactions; the root initiates commit
- Peer model: concurrency among siblings and between parent and children; any subtransaction can initiate commit

Nested Transactions
A parent can create children to perform subtasks; children may execute sequentially or concurrently; the parent waits until all children complete (there is no communication between parent and children). Each subtransaction (with its descendants) is isolated with respect to each sibling (and its descendants). Hence siblings are serializable, but their order is not determined, so a nested transaction is non-deterministic. Concurrent nested transactions are serializable.
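The savepoint lock-list scheme described earlier (a marker in the lock list; rollback releases every lock acquired after the marker) can be sketched as follows; the class and method names are hypothetical.

```python
class LockList:
    """Per-transaction lock list with savepoint markers (illustrative sketch)."""

    def __init__(self):
        self.entries = []  # lock names and savepoint markers, in acquisition order

    def acquire(self, lock):
        self.entries.append(("lock", lock))

    def create_savepoint(self, name):
        self.entries.append(("marker", name))

    def rollback_to(self, name):
        """Release (and return) all locks that follow the marker for `name`."""
        idx = self.entries.index(("marker", name))
        released = [e[1] for e in self.entries[idx + 1:] if e[0] == "lock"]
        self.entries = self.entries[:idx + 1]  # marker kept: savepoint reusable
        return released

li = LockList()
li.acquire("x")
li.create_savepoint("s1")
li.acquire("y")
li.acquire("z")
released = li.rollback_to("s1")  # releases y and z, keeps x
```

Locks acquired before the marker (here, on x) survive the partial rollback, matching the description above; undoing the updates themselves would happen alongside the lock release.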
A subtransaction is atomic: it can abort or commit independently of other subtransactions. Its commit is conditional on the commit of its parent (since the child's task is a subtask of the parent's task). An abort causes the abort of all the subtransaction's children. A nested transaction commits when its root commits; at that point the updates of committed subtransactions are made durable.

Locking implementation of nested transactions
Nested transactions satisfy:
- Nested transactions are isolated with respect to one another
- A parent does not execute concurrently with its children
- A child (and its descendants) is isolated from its siblings (and their descendants)
Acquiring a read lock: a request to read x by subtransaction T2 of nested transaction T1 is granted if:
- No other nested transaction holds a write lock on x
- All other subtransactions of T1 holding write locks on x are ancestors of T2 (and hence are not executing)
Acquiring a write lock: a request to write x by subtransaction T2 of nested transaction T1 is granted if:
- No other nested transaction holds a read or write lock on x
- All other subtransactions of T1 holding read or write locks on x are ancestors of T2 (and hence are not executing)
All locks obtained by T2 are held until it completes:
- If it aborts, all its locks are released
- If it commits, any locks it holds that are not held by its parent are inherited by its parent
When the top-level transaction (and hence the entire nested transaction) commits, all locks are released.

Chained transactions
Chaining allows a transaction to be decomposed into subtransactions with intermediate commit points. Database updates are made durable at those intermediate points, so less work is lost in a crash.
Chaining compared with savepoints:
- Savepoint: explicit rollback to an arbitrary savepoint; all updates are lost in a crash (they sit on the intentions list!)
- Chaining: an abort rolls back only to the last commit; only the updates of the most recent subtransaction are lost in a crash
ACID properties, what did we lose?
- Atomicity: lost
- Isolation: lost
- Consistency: needs consistency of the subtransactions
- Durability: kept!
One approach to the atomicity problem is compensation (see sagas).

Sagas
Sagas are an extension of chained transactions that achieves partial atomicity. For each subtransaction STi in a chained transaction T, a compensating transaction CTi is designed. Thus if a transaction T consisting of 5 chained subtransactions aborts after the first 3 subtransactions have committed, then ST1 ST2 ST3 CT3 CT2 CT1 performs the desired compensation. With this type of compensation, when a transaction aborts, the value of every item it changed is eventually restored to the value it had before the transaction started. However, complete atomicity is not guaranteed:
o Some other concurrent transaction might have read the changed value before it was restored to its original value
Other problems?
o Non-compensatable transactions (e.g. reset(X))

Recoverable queues
A recoverable queue is a transactional data structure in which information about transactions to be executed later can be durably stored. See also slides 31 and 32 for an example. The queue could be implemented within the DBMS, but performance would suffer: a transaction should not hold long-duration locks. A separate implementation takes advantage of the queue's semantics to improve performance. Since the queue is implemented by a separate server (different from the DBMS), the locking discipline need not be two-phase; it can be designed to suit the semantics of the abstract operations enqueue and dequeue:
- The lock on the head (tail) pointer is released when a dequeue (enqueue) operation completes (hence not strict or isolated)
- The lock on the entry that is enqueued or dequeued is held until commit time
The queue and the DBMS are two separate systems.
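The saga compensation scheme described above (committed steps are compensated in reverse order when a later step fails) can be sketched as follows; `run_saga` and the step names are hypothetical.

```python
def run_saga(steps):
    """steps: list of (do, compensate) pairs; each `do` returns True on commit.
    On a failure, compensate the already-committed steps, newest first."""
    done = []
    for do, compensate in steps:
        if do():
            done.append(compensate)
        else:
            for comp in reversed(done):  # e.g. failure after ST2: CST2, CST1
                comp()
            return False
    return True

trace = []
def step(name, succeed=True):
    def do():
        trace.append(name)
        return succeed
    def compensate():
        trace.append("C" + name)
    return (do, compensate)

ok = run_saga([step("ST1"), step("ST2"), step("ST3", succeed=False)])
# trace: ST1, ST2, ST3 (failed), then CST2, CST1
```

Note that compensation restores values only eventually, as the text says: between ST2's commit and CST2's run, a concurrent transaction may observe the intermediate state, which is why only partial atomicity is achieved.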
Transactions must be committed at both, but isolation is implemented at the DBMS and applies only to the schedule of requests made to the DBMS. As a result, any scheduling policy for accessing the queue might be enforced.

Real-world actions
A real-world action performed from within a transaction T cannot be rolled back if a crash occurs before commit. On recovery after a crash, how can we tell whether the action has occurred? Solution: the device maintains a read-only hardware counter that is automatically incremented with each action.
On recovery:
1. Recover the queue and the database
2. Read the recorded value of the counter from the database
3. If (device value > recorded value) then discard the head entry (the device performed the action); else do nothing (the device did not perform the action)
4. Restart the server(s), which re-initiates the transaction, because the request was restored on the queue

Workflows
A workflow is a model of a complex, long-running enterprise process, generally performed in a highly distributed and heterogeneous environment.
A workflow task:
- Is a self-contained job performed by an agent
o Inventory transaction (agent = database server)
o Packing task (agent = human)
- Has an associated role that defines the type of job
o An agent can perform specified roles
- Accepts input from other tasks and produces output
- Has a physical status: committed, aborted, ...
o A committed task additionally has a logical status: success, failure
For examples see slides 46 and 47 of the lecture.
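The recovery check for real-world actions (steps 1–4 above) hinges on comparing the device's hardware counter with the value recorded in the database. A minimal sketch, with made-up queue entries:

```python
def recover(queue, recorded_counter, device_counter):
    """Return the queue after recovery: drop the head entry iff the device
    already performed it (its counter advanced past the recorded value)."""
    if device_counter > recorded_counter:
        return queue[1:]  # action happened before the crash: discard request
    return queue          # action not performed: request will be re-issued

q1 = recover(["print ticket", "next request"], recorded_counter=7, device_counter=8)
q2 = recover(["print ticket", "next request"], recorded_counter=7, device_counter=7)
```

In `q1` the head entry is discarded because the device acted before the crash; in `q2` it is kept, so the restarted server re-initiates the action exactly once either way.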
ACID properties: individual tasks might be ACID, but the workflow as a whole is not:
o Some task might not be essential: its failure is ignored even though the workflow completes
o Concurrent workflows might see each other's intermediate state
o One might choose not to compensate for a task even though the workflow fails
Each task is either:
o Retriable: can ultimately be made to commit if retried a sufficient number of times (e.g., deposit)
o Compensatable: a compensating task exists (e.g., withdraw)
o Pivot: neither retriable nor compensatable (e.g., buying a non-refundable ticket)
Importance: workflows allow the management of an enterprise to guarantee that certain activities are carried out in accordance with established business rules, even though those activities involve a collection of agents, perhaps in different locations and perhaps with minimal training. They also provide an audit trail, statistics and other management information.

H20 Implementing Isolation

Commutativity
Database operations p1 and p2 commute when, for all initial database states, they
a) return the same results, and
b) leave the database in the same final state,
regardless of the order in which they are executed.
p1 commutes with p2 if:
- They operate on different data items
o w1(x) commutes with w2(y) and r2(y)
- Both are reads
o r1(x) commutes with r2(x)
Operations that do not commute conflict:
o w1(x) conflicts with w2(x)
o w1(x) conflicts with r2(x)
For examples of commuting operations, see slides 9 and 10 of the second lecture.

Conventional operations
We abstract operations into two classes:
- Read: r(x, X) copies the value of database variable x to local variable X
- Write: w(x, X) copies the value of local variable X to database variable x
We write r1(x) and w1(x) for a read or write of x by transaction T1.
Serializable schedules: S is serializable if it is equivalent to a serial schedule.
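The conflict rule from the commutativity section (two operations conflict when they are from different transactions, touch the same item, and at least one is a write) can be captured in a small helper; the tuple encoding of operations is an assumption for illustration.

```python
def conflicts(op1, op2):
    """op = (kind, transaction_id, item), with kind in {'r', 'w'}.
    True iff the two operations conflict (do not commute)."""
    kind1, t1, x1 = op1
    kind2, t2, x2 = op2
    if t1 == t2 or x1 != x2:
        return False              # same transaction, or different items
    return kind1 == "w" or kind2 == "w"

c1 = conflicts(("w", 1, "x"), ("r", 2, "x"))  # w1(x) conflicts with r2(x)
c2 = conflicts(("r", 1, "x"), ("r", 2, "x"))  # two reads commute
c3 = conflicts(("w", 1, "x"), ("w", 2, "y"))  # different items commute
```

This predicate is the building block both for conflict-equivalence arguments and for constructing serialization graphs later in the chapter.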
Transactions are totally isolated in a serializable schedule. A schedule is correct for any application if it is a serializable schedule of consistent transactions. Serializability (the strongest isolation level, essentially predicate locking) provides a conservative definition of correctness. It therefore causes a lot of overhead, and performance and throughput suffer. Some schedules that are not serializable are nevertheless acceptable (lower isolation levels).

Conflict equivalence vs view equivalence
Up to now, a schedule is serializable if it is conflict equivalent to a serial schedule. Conflict equivalence, definition: schedules S1 and S2 are conflict equivalent if the conflicting operations are ordered in the same way in both. Two schedules over the same set of operations are view equivalent if:
- Corresponding read operations in each return the same values (so the computations are the same), and
- Both schedules yield the same final database state
Conflict equivalence implies view equivalence; view equivalence does not imply conflict equivalence. Conflict equivalence is therefore the stronger notion.

Serialization graphs
A schedule is serializable if and only if its serialization graph is acyclic. The procedure:
1. Find the conflicts
2. Draw the graph
3. Check whether the graph contains a cycle

Dirty read: reading data that has not been committed.
Dirty write: T1 writes data written by a still-active transaction. Schedule sketch: T1: w(x) ... abort; T2: w(x) ... abort.
Strict schedule: dirty reads and dirty writes are prohibited. Note: a strict schedule is not the same thing as a serializable schedule.

Concurrency Control
The concurrency control (CC) must find a correctly interleaved schedule (a concurrent schedule equivalent to a serial one). The CC cannot see the whole schedule in advance, so it needs a strategy. Applying CC can considerably increase response time and considerably limit throughput.
Strategy: do not serve a request if:
1. It violates strictness or serializability, or
2. A subsequent arrival might cause a violation of serializability

Models of concurrency control:
- Immediate update:
o A write updates a database item
o A read copies the value from a database item
o Commit makes the updates durable
o Abort undoes the updates
- Deferred update:
o A write stores the value in the transaction's intentions list
o A read copies the value from the database or from the intentions list
o Commit uses the intentions list to durably update the database
o Abort discards the intentions list
- Pessimistic control: the transaction requests permission for each database (read/write) operation. The CC can:
o Grant the operation
o Delay it until a subsequent event occurs (commit or abort of another transaction)
o Abort the transaction
Decisions are made conservatively, so that every commit request can be granted; precautions are taken even if conflicts do not occur.
- Optimistic control: requests for database read/write operations are always granted, but the request to commit might be denied:
o The transaction is aborted if it performed a non-serializable operation
This assumes conflicts are unlikely.

Immediate Update Pessimistic CC
Rule (applied to each arriving request): do not grant a request that imposes an ordering among active transactions (delay the requesting transaction); grant a request that does not conflict with previously granted requests of active transactions. A transaction whose request is delayed is forced to wait.
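The pessimistic rule just stated can be sketched as a tiny lock-free bookkeeping class (class and method names are made up): a request is granted only if it does not conflict with a previously granted request of another active transaction, otherwise the requester waits.

```python
class PessimisticCC:
    """Sketch of an immediate-update pessimistic control (illustrative only)."""

    def __init__(self):
        self.granted = []  # (kind, tid, item) of still-active transactions

    def request(self, kind, tid, item):
        for k, t, x in self.granted:
            # conflict: different transaction, same item, at least one write
            if t != tid and x == item and "w" in (k, kind):
                return "wait"  # delay the requesting transaction
        self.granted.append((kind, tid, item))
        return "granted"

    def complete(self, tid):
        """Commit or abort: the transaction becomes inactive, freeing its entries."""
        self.granted = [g for g in self.granted if g[1] != tid]

cc = PessimisticCC()
r1 = cc.request("r", 1, "x")  # granted
r2 = cc.request("w", 2, "x")  # conflicts with T1's read: wait
cc.complete(1)                # T1 completes; delayed requests are reconsidered
r3 = cc.request("w", 2, "x")  # granted now
```

A real control would queue the waiter and re-check it on completion events; here the caller simply retries, which is enough to show why the resulting schedules are ordered by commit order.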
Delayed requests are reconsidered when a transaction completes (commits or aborts, i.e. becomes inactive). Result: each schedule is equivalent to a serial schedule in which the transactions are ordered by the order in which they commit (the commit order).

Locking
A transaction can:
- read an item if it holds a Read (shared) lock on it, granted if no transaction currently holds a write lock on that item;
- write or update an item if it holds a Write (exclusive) lock on it, granted if no transaction holds any lock on that item;
- it is delayed if the request cannot be granted.

Granted mode | Requested: read | Requested: write
Read         |                 | X (conflict)
Write        | X (conflict)    | X (conflict)

All locks are released when the transaction completes. A lock is not granted if the request conflicts with the rule above, so the transaction waits. Result: schedules are serializable and strict. Locks are implemented using a lock set L(x) and a wait set W(x).

Deadlock
CCs that make transactions wait can cause deadlocks. Solutions:
- Abort one transaction in the cycle
- Use a wait-for graph to detect cycles
- Assume deadlock when a transaction waits longer than a time-out period

Manual locking
Locks are released manually when the transaction is finished accessing an item. Better performance, but because of the early lock release, non-serializable schedules are possible.

Two-phase locking
A transaction does not release any lock until it holds all the locks it will ever require (first, locking phase). Then the locks that are no longer needed are released (second, unlocking phase). A schedule produced by a two-phase locking control is:
- Equivalent to a serial schedule (ordered by the time of the first unlock operation)
- Not necessarily recoverable (dirty reads and writes are possible)
A two-phase locking control that holds write locks until commit produces strict serializable schedules. A strict two-phase locking control holds all locks until commit and produces strict, serializable schedules.

Lock granularity
- Table locking (coarse): lock the entire table when a row is accessed
- Row (tuple) locking (fine): lock only the row that is accessed
- Page locking (compromise): when a row is accessed, lock the containing page

Deferred Update Optimistic CC
Under the optimistic assumption that conflicts do not occur, read and write requests are always granted (no locking, no overhead!). A transaction has three phases:
- Begin transaction
o Read phase: the transaction executes, reading from the database and writing to its intentions list (deferred update, no changes to the database)
- Request commit
o Validation phase: check whether conflicts occurred during the read phase; if so, abort (discard the intentions list)
- Commit
o Write phase: if validation was successful, write the intentions list to the database (deferred update)
For simplicity, we assume here that the validation and write phases form a single critical section (only one transaction is in its validation/write phase at a time). This guarantees an equivalent serial schedule in which the order of transactions is the order in which they enter validation (dynamic).
Validation: a transaction is validated when it wants to commit. When T1 enters validation, check whether T1 conflicted with any other transaction. Overlapping validations are not allowed.
Advantage: no deadlock. Disadvantage: no rollback possibility.

H21 Isolation in relational databases

Phantoms
An example: all rows where the name is 'Mary' are locked (or selected) and a sum is calculated. But a row with the name 'Mary' can still be inserted: this new row is called a phantom. Phantoms occur when row locking is used. They can be prevented by using table locking or predicate locking.

Locking
- Predicate locking prevents phantoms and produces serializable schedules, but is too complex to implement
- Table locking prevents phantoms and produces serializable schedules, but negatively impacts performance
- Row locking does not prevent phantoms and can produce non-serializable schedules

Predicate locking
A predicate describes a set of rows, some of which are in a table and some of which are not, e.g. name = 'Mary'. Every SQL statement has an associated predicate. When executing a statement, acquire a (read or write) lock on the associated predicate. Two predicate locks conflict if one is a write and there exists a row (not necessarily in the table) that is contained in both. This locking is conservative: there might be no rows in Accounts satisfying both predicates.

Non-repeatable read
With a non-repeatable read, executing the same SELECT twice yields the same set of rows, but the attribute values might differ.

SQL Isolation Levels
- READ UNCOMMITTED: dirty reads, non-repeatable reads and phantoms allowed
- READ COMMITTED: dirty reads not allowed, but non-repeatable reads and phantoms allowed
- REPEATABLE READ: dirty reads and non-repeatable reads not allowed, but phantoms allowed
- SERIALIZABLE: dirty reads, non-repeatable reads and phantoms not allowed; all schedules must be serializable

Locking implementation of SQL isolation levels
The locking implementation is based on:
- Entities locked: rows, predicates, ...
- Lock modes: read and write
- Lock duration:
o Short: locks acquired to execute a statement are released when the statement completes
o Long: locks acquired to execute a statement are held until the transaction completes
o Medium: something in between
Write locks are handled identically at all isolation levels: long-duration predicate write locks are associated with UPDATE, DELETE and INSERT statements. This rules out dirty writes. In practice, predicate locks are implemented with table locks or by acquiring locks on an index as well as on the data.
Read locks are handled differently at each level:
- READ UNCOMMITTED: no read locks
o Hence a transaction can read a write-locked item!
o This allows dirty reads, non-repeatable reads and phantoms
- READ COMMITTED: short-duration read locks on the rows returned by SELECT
o Prevents dirty reads, but non-repeatable reads and phantoms are possible
- REPEATABLE READ: long-duration read locks on the rows returned by SELECT
o Prevents dirty and non-repeatable reads, but phantoms are possible
- SERIALIZABLE: long-duration read lock on the predicate specified in the WHERE clause
o Prevents dirty reads, non-repeatable reads and phantoms, and guarantees serializable schedules
Some DBMSs allow only read-only transactions to be executed at the READ UNCOMMITTED level.

Cursor stability
Cursor stability is a commonly implemented isolation level that extends READ COMMITTED:
- Long-duration write locks on predicates
- Short-duration read locks on rows
- Additional locks for handling cursors
Access by T1 through a cursor C generally involves an OPEN followed by a sequence of FETCHes.
- If C is INSENSITIVE: the rows FETCHed cannot be affected by concurrent updates (since OPEN is isolated)
- If C is not INSENSITIVE: some rows FETCHed might have been updated by a concurrent transaction T2 and others might not
The read lock on a row accessed through a cursor is of medium duration: it is held until the cursor is moved.
Example: a lost update is allowed at READ COMMITTED, but not at CURSOR STABILITY (since T1 accesses the tuple t through a cursor). There is a possibility of deadlock when both transactions access t through a cursor.

Update locks
An update lock conflicts with other update locks and with write locks, but not with read locks (see the schedule below). Some DBMSs provide update locks to alleviate the deadlock problem: a transaction that wants to read an item now and possibly update it later requests an update lock on the item (manual locking). An update lock is a read lock that can be upgraded to a write lock.
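The conflict rule for update locks can be written out as a compatibility matrix (a hypothetical encoding: R = read, U = update, W = write): U is compatible with R in both directions, but two U locks conflict, which is exactly what breaks the read-then-upgrade deadlock pattern.

```python
# COMPATIBLE[(held, requested)] is True iff the requested lock can be granted
# while a lock of the held mode is held by another transaction.
COMPATIBLE = {
    ("R", "R"): True,  ("R", "U"): True,  ("R", "W"): False,
    ("U", "R"): True,  ("U", "U"): False, ("U", "W"): False,
    ("W", "R"): False, ("W", "U"): False, ("W", "W"): False,
}

def can_grant(held, requested):
    """Check the matrix for one held lock against one requested lock."""
    return COMPATIBLE[(held, requested)]
```

With plain read locks, two transactions that both read x and then try to upgrade to write locks deadlock; with update locks, the second U request simply waits, so only one of them ever attempts the upgrade.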
Update locks are often used with updatable cursors.

Optimistic Read Committed
This level is called optimistic because the transaction assumes that no other transaction will write what it has read, and hence it gives up its read lock. T1 aborts if it tries to write a tuple that it previously read while, in the meantime, some other transaction has written that tuple and committed.

Other types of locking

Intention locking
A performance improvement is possible if the lock on the parent is weak:
- Intention shared (IS) lock: to get an S lock on an item, T must first get IS locks on all containing items (up to the root of the hierarchy)
- Intention exclusive (IX) lock: to get an X lock on an item, T must first get IX locks on all containing items (up to the root of the hierarchy)
- Shared Intention Exclusive (SIX): equivalent to an S lock plus an IX lock on an item
An intention lock indicates a transaction's intention to acquire a conventional lock on a contained item.

Index locking
Locking of index pages:
- If a WHERE clause refers to a predicate name = 'Mary' and there is an index on name, then an index lock on the index entries for name = 'Mary' acts like a predicate lock on that predicate
- If a WHERE clause refers to a predicate such as 50000 < salary < 70000 and there is an index on salary, then a key-range index lock can be used to get the equivalent of a predicate lock on the predicate 50000 < salary < 70000

Key-range locking
Index entries at the leaf level are locked; see above.
Locking a B-tree
- Read locks
o Obtain a read lock on the root and work your way down the tree, locking each entry as it is reached
o When a new entry is locked, the lock on the previous entry (its parent) can be released: the operation will never revisit the parent, and no write operation of a concurrent transaction can pass this operation as it goes down the tree
o This is called lock coupling or crabbing
- Write locks
o Obtain a write lock on the root and work your way down the tree, locking each entry as it is reached
o When a new entry n is locked and n is not full, the locks on all its parents can be released: an insert might have to go back up the tree, revisiting and perhaps splitting some nodes, but even then, because n is not full, it will not have to split n and hence need not go further up the tree; thus the locks higher up can be released

Lock escalation
To avoid acquiring many fine-grained locks on a table, a DBMS can set a lock escalation threshold: if more than the threshold number of tuple (or page) locks are acquired, the DBMS automatically trades them in for a table lock.

Granular locking
Problem: T1 holds a (fine-grained) lock on field F1 in record R1; T2 requests a conflicting (coarse-grained) lock on R1. How does the concurrency control detect the conflict, since it sees F1 and R1 as different items?
Solution: organize locks hierarchically by containment and require that, to get a fine-grained lock, a transaction must first get a coarse-grained lock on the containing item:
o T1 must first get a lock on R1 before getting a lock on F1; the conflict with T2 is then detected at R1

Multi-version concurrency control
A multi-version DBMS maintains all versions created in the (recent) past.
The major goal of a multiversion DBMS is to avoid the need for read locks. All DBMSs guarantee that statements are isolated:
o Each statement sees the state produced by the complete execution of other statements, but that state might not be committed
A multiversion control guarantees that each statement sees a committed state:
o A statement is executed in a state whose value is a version
o This is referred to as statement-level read consistency
A multiversion control can also guarantee that all statements of a transaction see the same committed state:
o All statements of a transaction access the same version
o This is referred to as transaction-level read consistency

Read-only multi-version control
Distinguishes in advance between read-only (R/O) transactions and read/write (R/W) transactions. R/W transactions use a conventional immediate-update, pessimistic control; hence they access the most current version of the database. All the reads of a particular R/O transaction TRO are satisfied using the most recent version that existed when TRO requested its first read.
Example: T1 and T2 are read/write transactions and T3 is read-only. T3 sees the version produced by T1; the equivalent serial order is T1, T3, T2.
Implementation:
- The DBMS maintains a version counter (VC), incremented each time a R/W transaction commits
- The new version of a data item created by a R/W transaction is tagged with the value of VC at the time the transaction commits
- When a R/O transaction makes its first read request, the value of VC becomes its counter value
- Each request to read an item is satisfied by the version of the item having the largest version number less than or equal to the transaction's counter value
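The version-selection rule just stated (largest version number less than or equal to the reader's counter value) is a one-liner; the dictionary-of-versions representation is an assumption for illustration.

```python
def read_version(versions, counter):
    """versions: {version_number: value} for one item, numbers assigned at
    commit time from VC; counter: the R/O transaction's VC value captured
    at its first read. Returns the visible value."""
    visible = [v for v in versions if v <= counter]
    return versions[max(visible)]

x_versions = {3: "a", 5: "b", 9: "c"}  # versions of item x, tagged at commit
value = read_version(x_versions, 7)    # counter 7 sees version 5, i.e. "b"
```

A reader with counter 7 never sees version 9 even though it exists, which is how the R/O transaction keeps seeing the single committed snapshot that existed at its first read, without taking any read locks.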
Read consistency multi-version control
- R/O transactions are treated as before: they get transaction-level read consistency
- R/W transactions:
o Write statements acquire long-duration write locks (delaying other write statements)
o Read statements use the most recent committed version at the time of the read; they are not delayed by write locks (since read locks are not requested)

Snapshot Isolation
Does not distinguish between R/W and R/O transactions. A transaction reads the most recent version that existed at the time of its first read request:
o This guarantees transaction-level read consistency
The write sets of any two concurrently executing transactions must be disjoint. There are two implementations of this specification:
- First committer wins
- A locking implementation

First committer wins
Writes use deferred update (an intentions list). T is allowed to commit only if no concurrent transaction
o committed before T, and
o updated a data item that T also updated
The control is optimistic:
o It can be implemented without any locks
o Deadlock is not possible
o Validation (write-set intersection) is required for R/W transactions, and abort is possible
o Schedules might not be serializable

Locking implementation of snapshot isolation
An immediate-update pessimistic control. Reads do not acquire any locks and execute as in the previous implementation. A transaction T that wants to write some item must request a write lock:
o If the version number of that item is greater than that of T, T is aborted (first committer wins)
o Otherwise, if another transaction holds a write lock on that item, T waits until that transaction completes: if it commits, T is aborted (first committer wins); if it aborts, T is given the write lock and allowed to write
The following anomalies are impossible under snapshot isolation: dirty read, dirty write, non-repeatable read, lost update. Write skew is possible: in a write skew anomaly, two transactions T1 and T2 concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1 and T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other.
Phantoms: it looks as though phantoms cannot occur in snapshot isolation, but non-serializable schedules due to phantoms are possible. Example: two concurrent transactions each execute a SELECT with predicate P and then insert a row satisfying P; neither sees the row inserted by the other. The schedule is not serializable. This would be considered a phantom if it occurred at REPEATABLE READ; here it can be considered write skew (which is permitted under snapshot isolation).

H22 Atomicity and Durability

Failures
- Crash: processor failure, software bug. The server supports atomicity by providing recovery procedures to restore the database, using rollback.
- Abort: by the user, the transaction, the system, etc. The transaction is rolled back.
- Media failure: durability requires commits to be permanent. Because of the possibility of a media crash, the media used must be redundant.

Log
The log contains the information needed to restore the database. Each modification of the database causes an update record to be appended to the log. An update record contains:
- The identity of the data item modified
- The identity of the transaction (tid) that made the modification
- The before image (undo record): a copy of the data item before the update occurred
Abort using the log (assume the immediate-update approach)
- Scan the log backwards, using the tid to identify the transaction's update records
  - Reverse each update using the before image
  - Reversal is done in last-in-first-out order
- In a strict system, new values are unavailable to concurrent transactions (as a result of long-duration exclusive locks); hence rollback makes the transaction atomic
- Problem: terminating the scan (the log can be long)
- Solution: append a begin record for each transaction, containing its tid, prior to its first update record

Savepoints
- A savepoint record is inserted in the log when a savepoint is created
  - Contains the tid and the savepoint identity
- Rollback procedure:
  - Scan the log backwards, using the tid to identify update records
  - Undo the updates using the before images
  - Terminate the scan when the appropriate savepoint record is encountered

Crash recovery using a log
Abort all transactions that were active at the time of the crash.
- Problem: how do you identify them?
- Solution: an abort record or commit record is appended to the log when a transaction terminates
- Recovery procedure: scan the log backwards; if T's first record encountered is an update record, T was active at the time of the crash, so roll it back
- NB: a transaction is not committed until its commit record is in the log
- Problem: the scan must retrace the entire log
- Solution: periodically append a checkpoint record (not the same as a savepoint!) to the log, containing the tids of all transactions active at the time of the append
  - The backward scan goes at least as far as the last checkpoint record appended
  - The transactions active at the time of the crash are determined from the log suffix that includes the last checkpoint record
  - The scan continues until those transactions have been rolled back

Write-ahead log: the log buffer in main memory is an extension of the log on mass storage and is periodically flushed to it. Important: the log buffer does not survive a crash. The page buffer (cache) in main memory is also volatile and has to be flushed to mass storage. Atomicity and durability complicate the algorithms.
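The backward-scan abort procedure described above can be sketched as follows. The log record format is invented for illustration; the point is that before images are reinstalled in last-in-first-out order and the scan terminates at the transaction's begin record.

```python
# Minimal sketch (invented record format) of aborting a transaction by
# scanning the log backwards and reinstalling before images in LIFO order.

def abort_transaction(log, db, tid):
    """Undo all updates of `tid` using before images; stop at its begin record."""
    for record in reversed(log):
        if record.get("tid") != tid:
            continue
        if record["type"] == "begin":
            break                                   # begin record terminates the scan
        if record["type"] == "update":
            db[record["item"]] = record["before"]   # reinstall before image
    log.append({"type": "abort", "tid": tid})       # record the termination

db = {"x": 3, "y": 9}
log = [
    {"type": "begin",  "tid": "T1"},
    {"type": "update", "tid": "T1", "item": "x", "before": 1},
    {"type": "update", "tid": "T1", "item": "y", "before": 2},
]
abort_transaction(log, db, "T1")
print(db)    # {'x': 1, 'y': 2}: both updates reversed
```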
Requirements
- Write-ahead feature: update records must be moved to the log on mass storage before the database is updated; necessary to preserve atomicity
- New values written by a transaction must be on mass storage when its commit record is written to the log (move new values to mass storage before the commit record) to preserve durability
- A transaction is not committed until its commit record is in the log on mass storage

Forced vs. unforced writes
On a database page:
- An unforced write updates the cache page, marks it dirty, and returns control immediately
- A forced write updates the cache page, marks it dirty, uses it to update the database page on disk, and returns control when the I/O completes
On the log:
- An unforced append adds the record to the log buffer and returns control immediately
- A forced append adds the record to the log buffer, writes the buffer to the log, and returns control when the I/O completes
- After a flush of the log buffer, we start with a clean log buffer in volatile memory

Log Sequence Number (LSN)
- Log records are numbered sequentially
- Each database page contains the LSN of the update record describing the most recent update of any item in the page

Commit processing: force policy
1. Force any update records of T in the log buffer, then ...
2. Force any dirty pages updated by T in the cache, then ...
   (1) and (2) ensure atomicity (write-ahead policy)
3. Append T's commit record to the log buffer, then
   - force the log buffer for an immediate commit, or
   - write the log buffer when a group of transactions have committed (group commit)
   (2) and (3) ensure durability
Using the force policy, a transaction's updates are in the database (on mass storage) when it commits.
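The three-step force-policy ordering above can be sketched with a toy commit routine. All structures are invented for illustration; the events list lets us check that update records reach the stable log before pages are forced, and pages before the commit record.

```python
# Hedged sketch (invented structures) of the force-policy commit ordering:
# force update records, then force dirty pages, then force the commit record.

def commit_force_policy(tid, log_buffer, dirty_pages, stable_log, disk, events):
    # Step 1: force T's update records from the log buffer to the stable log
    for rec in [r for r in log_buffer if r["tid"] == tid]:
        stable_log.append(rec)
        events.append(("log", rec["item"]))
    # Step 2: force T's dirty pages from the cache to the database on disk
    for page, value in dirty_pages.items():
        disk[page] = value
        events.append(("page", page))
    # Step 3: append and force T's commit record
    stable_log.append({"tid": tid, "type": "commit"})
    events.append(("commit", tid))

events = []
commit_force_policy(
    "T1",
    log_buffer=[{"tid": "T1", "type": "update", "item": "x", "after": 7}],
    dirty_pages={"x": 7},
    stable_log=[], disk={}, events=events,
)
print(events)   # [('log', 'x'), ('page', 'x'), ('commit', 'T1')]
```

If the order were violated (commit record before the pages, or pages before the update records), a crash in between could break durability or atomicity respectively.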
No-force policy
- Problem: pages updated by T might still be in the cache when T's commit record is appended to the log buffer
- Solution: the update record contains an after image (called a redo record) as well as a before image
  - The write-ahead property still requires that the update record be written to mass storage before the page
  - But it is no longer necessary to force dirty pages when the commit record is written to the log on mass storage, since all after images precede the commit record in the log
- This is referred to as a no-force policy

Recovery processing with a no-force policy
- Problem: when a crash occurs there might exist pages in the database (on mass storage)
  - containing updates of uncommitted transactions: they must be rolled back
  - that do not (but should) contain the updates of committed transactions: they must be rolled forward
- Solution: use a sharp checkpoint. Before appending the checkpoint record CK to the log buffer, halt processing and force all dirty pages from the cache. The recovery process can then assume that all updates in records prior to CK were written to the database (only updates in records after CK might not be in the database). For example, a page p1 must be rolled forward using an after image (xnew), while a page p2 must be rolled back using a before image (yold).

Recovery passes
1. Pass 1: the log is scanned backward to the most recent checkpoint record, CK, to identify the transactions active at the time of the crash.
2. Pass 2: the log is scanned forward from CK to the most recent record. The after images in all update records are used to roll the database forward.
3. Pass 3: the log is scanned backward to the begin record of the oldest transaction active at the time of the crash.
The before images in the update records of those transactions are used to roll them back. This is called DO-UNDO-REDO (the original updates, the roll forward in pass 2, the rollback in pass 3).
- Issue 1: database pages containing items updated after CK was appended to the log might have been flushed before the crash.
  - No problem: with physical logging, rolling forward using after images in pass 2 is idempotent. The roll forward is unnecessary in this case, but not harmful.
- Issue 2: some update records after CK might belong to an aborted transaction T1. Those updates are restored in pass 2 but not rolled back in pass 3, since T1 was not active at the time of the crash.
  - Solution: treat the rollback operations for aborting T1 as ordinary updates and append compensating log records to the log.
- Issue 3: what if the system crashes during recovery?
  - Recovery is restarted. If physical logging is used, the pass 2 and pass 3 operations are idempotent and hence can be redone.

Fuzzy checkpoints
- Before writing CK, record the identity of all dirty pages (do not flush) in volatile memory; write (flush) the dirty pages in the background.
- Example: the page corresponding to U1(x) is recorded at CK1 and will have been flushed by CK2; the page corresponding to U2(y) is recorded at CK2, but might not have been flushed at the time of the crash. Pass 2 must then start at CK1.

Deferred-update system
Durability:
- Update: append the new value to the intentions list; append an update record to the log buffer
- Abort: discard the intentions list
- Commit: force the commit record to the log; update the database using the intentions list
Recovery:
- The checkpoint record contains the list of committed (not active) but incomplete transactions (their intentions lists)
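Before continuing with deferred-update recovery: the three-pass DO-UNDO-REDO recovery for the immediate-update, no-force policy described above can be sketched as follows. The log format is invented; pass 1 determines the transactions active at the crash (the notes describe a backward scan; a forward pass over the log suffix after CK computes the same set), pass 2 redoes everything after CK, and pass 3 undoes the losers.

```python
# Minimal sketch (invented log format) of the three recovery passes with a
# sharp checkpoint: identify losers, redo after images, undo losers.

def recover(log, db):
    ck = max(i for i, r in enumerate(log) if r["type"] == "checkpoint")
    # Pass 1: find transactions active at the time of the crash
    active = set(log[ck]["active"])
    for r in log[ck + 1:]:
        if r["type"] == "begin":
            active.add(r["tid"])
        elif r["type"] in ("commit", "abort"):
            active.discard(r["tid"])
    # Pass 2: roll the database forward using after images (idempotent)
    for r in log[ck + 1:]:
        if r["type"] == "update":
            db[r["item"]] = r["after"]
    # Pass 3: roll back the transactions active at the crash using before images
    for r in reversed(log):
        if r["type"] == "update" and r["tid"] in active:
            db[r["item"]] = r["before"]
    return db

log = [
    {"type": "begin", "tid": "T1"},
    {"type": "checkpoint", "active": ["T1"]},
    {"type": "update", "tid": "T1", "item": "x", "before": 0, "after": 1},
    {"type": "begin", "tid": "T2"},
    {"type": "update", "tid": "T2", "item": "y", "before": 0, "after": 2},
    {"type": "commit", "tid": "T2"},
]
db = recover(log, {"x": 0, "y": 0})
print(db)   # {'x': 0, 'y': 2}: T2 rolled forward, T1 (active at crash) rolled back
```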
- Scan back to the most recent checkpoint record to determine the transactions that are committed but whose updates are incomplete at the time of the crash
- Scan forward to install the after images for those incomplete transactions
- No third pass is required, since transactions active (not committed) at the time of the crash have not affected the database

Database dump
Simple dump:
- The system stops accepting new transactions
- Wait until all active transactions complete
- Dump: copy the entire database to a file on mass storage (including the mirror)
- Restart the log and the system
Simple dump restore:
- Install the most recent dump file
- Scan backward through the log to determine the transactions that committed since the dump was taken; ignore aborted transactions and those that were active when the media failed
- Scan forward through the log and install the after images of the committed transactions
Fuzzy dump:
- Write a begin-dump record to the log; copy the database records to the dump file while the system is active
Naive restoration:
- Install the dump on disk
- Scan the log backwards to the begin-dump record to produce a list L of all transactions that committed since the start of the dump
- Scan the log forward and install the after images in the update records of all transactions in L
In many schedules this works fine, but naive restoration does not handle two cases:
- T commits before the dump starts, but its dirty pages might not have been flushed until the dump completed: the dump does not read T's updates and T is not in L
- The dump reads T's updates, but T later aborts

H23 Architecture of Transaction Processing Systems

Three-tier architectures
(Historically, single-user systems came first.)
- Presentation services: display forms, etc.
- Application services: implement the user request, interact with the DBMS
TPS = Transaction Processing System

Application server
- Sets transaction boundaries
- Acts as a workflow controller: implements a user request as a sequence of tasks, e.g. registration = (check prerequisites, add student to course, bill student)
- Acts as a router
- Distributed transactions involve multiple servers
- Server classes are used for load balancing
- Since workflows might be time-consuming and the application server serves multiple clients, the application server is often multi-threaded

Transaction server
- Stored procedures are off-loaded to separate (transaction) servers to reduce the load on the DBMS
- The transaction server sits close to the DBMS; the application server sits close to the clients
- The transaction server does the bulk of the data processing
- The servers are interconnected in the three-tier model

Session and context
- A session exists between two entities if they exchange messages while cooperating to perform some task
- Client/server session: a server context (describing the client) has to be maintained by the server in order to handle a sequence of client requests

Direct vs. queued transaction processing
- Direct: the client waits until the request is serviced; the service is provided as quickly as possible and the result is returned; client and server are synchronized
- Queued: the request is enqueued and the client continues execution; the server dequeues the request at a later time and enqueues the result; the client dequeues the result later; client and server are unsynchronized
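The queued model above can be sketched with two queues. This is a deliberately simplified illustration: in a real TPS these would be recoverable (transactional) queues on stable storage, whereas `queue.Queue` here is only an in-memory stand-in, and the message format is invented.

```python
# Sketch of queued transaction processing: client and server never interact
# directly; requests and results pass through two queues, unsynchronized.

from queue import Queue

request_q, result_q = Queue(), Queue()

# Client: enqueue the request and continue immediately (no waiting)
request_q.put({"req_id": 1, "op": "bill_student", "amount": 100})

# ... the client is free to do other work here ...

# Server, at a later time: dequeue the request, process it, enqueue the result
req = request_q.get()
result_q.put({"req_id": req["req_id"], "status": "ok"})

# Client, later still: dequeue the result at its convenience
res = result_q.get()
print(res)   # {'req_id': 1, 'status': 'ok'}
```

Because the queues decouple the two sides, the client could have enqueued the request while the server was down, and vice versa for the result: exactly the availability advantage the notes list for recoverable queues.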
Three transactions on two recoverable queues; advantages:
- The client can enter requests even if the server is unavailable
- The server can return results even if the client is unavailable
- A request will ultimately be served even if T2 aborts (since the queue is transactional)

Heterogeneous vs. homogeneous TPS
- Homogeneous systems are composed of hardware and software modules of a single vendor
  - Modules communicate through proprietary (often unpublished) interfaces; hence other vendors' products cannot be included
  - Referred to as TP-Lite systems
- Heterogeneous systems are composed of hardware and software modules of different vendors
  - Modules communicate through standard, published interfaces
  - Referred to as TP-Heavy systems
- Middleware is the software that integrates the components of a heterogeneous system and provides utility services; for example, it supports communication (TCP/IP), security (Kerberos), global ACID properties, and translation (JDBC)

Transaction manager
Middleware to support the global atomicity of distributed transactions:
- The application invokes the manager when a transaction is initiated
- The manager is informed each time a new server joins the transaction
- The application invokes the manager when the transaction completes
- The manager coordinates an atomic commit protocol among the servers to ensure global atomicity

TP monitors
A TP monitor is a collection of middleware components that is useful in building heterogeneous transaction processing systems. It includes a transaction manager and application-independent services not usually provided by an operating system.

TP monitor services
- Communication services
  - Built on the message-passing facility of the OS; capable of type checking
  - Peer-to-peer (sessions), RPC, and/or event communication (event broker)
  - Location transparent; transactional (TRPC); robust against failures
  - Asymmetric (asynchronous) or symmetric (synchronous); if within a transaction, asymmetric communication uses a persistent queue, while with symmetric communication the requester joins the transaction
- ACID properties
  - Local isolation for a (non-database) server might be provided by a lock manager
  - Local atomicity for a (non-database) server might be provided by a log manager
  - Global isolation and atomicity are provided by the transaction manager
- Routing and load balancing: the TP monitor can use load balancing to route a request to the least-loaded member of a server class
- Threading: threads can be thought of as low-cost processes, useful in servers (e.g. the application server) that might be maintaining sessions for a large number of clients; the TP monitor provides threads if the OS does not
- Recoverable queues
- Security services: encryption, authentication, and authorization
- Miscellaneous servers: file server, clock server

Storage architectures
The performance bottleneck is disk I/O; the DBMS therefore maintains a disk cache in main memory. Possible storage systems include RAID, NAS, and SAN (storage area network).

Architecture of Web transaction processing
A Web application server is a set of tools and modules for building and executing transaction processing systems for the Web, including the application server tier of the system. We discuss the J2EE (Java 2 Enterprise Edition) standard.
- J2EE: one language, many platforms
- .NET: one platform, many languages; a set of Microsoft products
J2EE defines a set of services and classes particularly oriented toward transaction-oriented Web services:
- Java servlets
- Enterprise JavaBeans

J2EE Enterprise JavaBeans
Java classes that implement the business methods of an enterprise. Execution takes place within an infrastructure of services provided by the Web application server:
- Supports transactions, persistence, concurrency, authorization, etc.
- Implements declarative transaction semantics: the bean programmer can simply declare that a particular method is to be a transaction, without specifying begin and commit commands
- The bean programmer can focus on the business methods of the enterprise rather than on details of the system implementation

Bean types
- Entity bean: represents a persistent business object whose state is stored in the database
  - one bean = one table; one bean instance = one row in the table
- Session bean: represents a client performing interactions within a session using the business methods of the enterprise
  - can retain state during interactions
  - session beans call methods of entity beans (synchronous communication)
  - can be transactional (JDBC or JTA: Java Transaction API)
- Message-driven bean: like a session bean but asynchronous (uses JMS message queues)

Bean structure
- The bean class: contains the implementations of the business methods of the enterprise
- A remote interface (and optionally a local interface): used by clients to access the bean class remotely using TRPC (or locally via the local interface); acts as a proxy for the bean class; includes declarations of all the business methods
- A home interface (and optionally a local home interface): contains methods that control the bean's life cycle (create, remove) and finder methods (e.g. findByPrimaryKey)
- A deployment descriptor: declarative metadata for the bean; describes its persistence, transactional, and authorization properties

H24 Distributed Transactions

Atomic commit protocol
Global atomicity: all subtransactions of a distributed transaction must commit, or all must abort. An atomic commit protocol, initiated by a coordinator (e.g. the transaction manager), ensures this.
The coordinator polls the cohorts (participating databases) to determine whether they are all willing to commit. The protocol is supported in the XA interface between a transaction manager and a resource manager (e.g. a DBMS).

ACID properties
Requirements for each local DBMS:
- supports the ACID properties locally for each subtransaction
- eliminates local deadlocks
The additional issues are:
- Global atomicity: all cohorts must abort or all must commit
- Global deadlocks: there must be no deadlocks involving multiple sites
- Global serializability: a distributed transaction must be globally serializable

Cohort abort
Reasons for abort: validation failure, deadlock, crash of the cohort site, no communication with the cohort site.

Atomic commit protocol
The most commonly used atomic commit protocol is the two-phase commit protocol. It is implemented as an exchange of messages between the coordinator and the cohorts, and it guarantees global atomicity of the transaction even if failures occur while the protocol is executing.

Two-phase commit protocol
The transaction record resides in the volatile memory of the transaction manager and is created when the application calls tx_begin.

Phase 1
- The application invokes tx_commit
- The coordinator sends a prepare message to all cohorts
- If a cohort wants to commit, it moves all update records to mass storage by forcing a prepare record to its log
  - This guarantees that the cohort will be able to commit (despite crashes) if the coordinator decides commit, since the update records are durable
- The cohort enters the prepared state
- The cohort sends a vote message ("ready" or "aborting")
- Once a cohort has voted "ready" it cannot change its mind: it retains all its locks and enters its uncertain period (it cannot foretell the final outcome)
- Note that a cohort may abort at any time prior to or on receipt of the prepare message: it then aborts and releases its locks
- The coordinator receives the vote messages and records each vote in the transaction record (a vote indicates that the cohort is "ready" to commit or is aborting)
- If any vote is "aborting", the coordinator decides abort and deletes the transaction record
- If all votes are "ready", the coordinator decides commit and forces a commit record (containing the transaction record) to its log (end of phase 1)
- The coordinator sends a commit or abort message to all cohorts
- Note: the transaction is logically committed once the commit record is durable; since all cohorts are in the prepared state, the transaction can be committed despite any failures

Phase 2
- A cohort receives a commit message from the coordinator:
  - the cohort commits locally by forcing a commit record to its log
  - the cohort sends a done message to the coordinator
- A cohort receives an abort message from the coordinator:
  - the cohort aborts
- In either case, locks are released and the uncertain period ends
- The coordinator receives a done message:
  - the coordinator records the receipt of the done message
  - once all done messages have been received, the coordinator writes a complete record to its log and deletes the transaction record from volatile store

Failures
A participant recognizes two failure situations:
- Timeout: no response to a message; execute the timeout protocol
- Crash: on recovery, execute a restart protocol
A cohort is blocked when it cannot complete the protocol until some failure is repaired.
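The failure-free message exchange of the two-phase commit protocol above can be sketched in a few lines. The classes and the in-memory "logs" are invented for illustration; forced log writes are modeled as plain list appends, and the failure/timeout handling the notes go on to describe is deliberately omitted.

```python
# Hedged sketch of the 2PC message exchange: prepare/vote in phase 1,
# commit-or-abort decision plus done handling in phase 2.

class Cohort:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit, self.log = name, will_commit, []

    def on_prepare(self):
        if self.will_commit:
            self.log.append("prepare")      # forced prepare record: vote is durable
            return "ready"
        return "aborting"                   # cohort may abort up to this point

    def on_decision(self, decision):
        self.log.append(decision)           # local commit or abort record

def two_phase_commit(coordinator_log, cohorts):
    # Phase 1: poll the cohorts and collect their votes
    votes = [c.on_prepare() for c in cohorts]
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    if decision == "commit":
        coordinator_log.append("commit")    # forced: transaction logically committed
    # Phase 2: broadcast the decision to all cohorts
    for c in cohorts:
        c.on_decision(decision)
    coordinator_log.append("complete")      # after all "done" messages, simplified
    return decision

cohorts = [Cohort("db1"), Cohort("db2"), Cohort("db3", will_commit=False)]
decision = two_phase_commit([], cohorts)
print(decision)    # abort: one cohort voted "aborting", so all must abort
```

A single "aborting" vote forces a global abort, which is exactly the global-atomicity guarantee: no subset of cohorts can commit while another aborts.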
Timeout protocol
- [Cohort] Times out waiting for prepare: abort the subtransaction
  - Since the (distributed) transaction cannot commit unless the cohort votes to commit, atomicity is preserved
- [Coordinator] Times out waiting for a vote: abort the transaction
  - Since the coordinator controls the decision, it can force all cohorts to abort, preserving atomicity
- [Cohort] Times out waiting for commit/abort: the cohort is in the prepared state and is blocked, since it does not know the coordinator's decision
  - The coordinator might have decided commit or abort
  - The cohort cannot decide unilaterally, since its decision might be contrary to the coordinator's, violating atomicity
  - Locks cannot be released
  - The cohort requests its status from the coordinator and remains blocked
- [Coordinator] Times out waiting for done: requests the done message from the delinquent cohort

Restart protocol (cohort)
On restart, the cohort examines its log:
- It finds a begin_transaction record, but no prepare record: abort (the transaction cannot have committed, because the cohort has not voted)
- It finds a prepare record, but no commit record (the cohort crashed in its uncertain period):
  - it does not know whether the transaction committed or aborted
  - it locks the items mentioned in its update records before restarting the system
  - it requests its status from the coordinator and blocks until it receives an answer
- It finds a commit record: recover the transaction to the committed state using the log
If a cohort is in the blocked state but the coordinator does not respond to a request for status, it can either:
- wait until the coordinator is restarted, or
- give up, make a unilateral decision, and attach a fancy name to the situation
In the latter case, the potential loss of atomicity must be resolved outside the system (human intervention is needed).

Restart protocol (coordinator)
On restart:
- The records of transactions in phase 1 are lost (the transaction record was in volatile memory)
- If there are transactions in phase 2, there is a commit record in the log but no complete record: restore the transaction record to volatile memory
On receiving a request from a cohort for a transaction's status:
- If the transaction record exists in volatile memory, reply based on the information in the transaction record
- If no transaction record exists in volatile memory, then either
  - the coordinator aborted the transaction and deleted the transaction record, or
  - the coordinator crashed, restarted, and did not find a commit record: it was in phase 1 of the protocol and had not yet made a decision, or it had previously aborted the transaction
- In either case the coordinator can respond abort

Linear commit
A variation of two-phase commit that involves a transfer of coordination. The cohorts are assumed to be connected in a linear chain.
- When the leftmost cohort A is ready to commit, it enters the prepared state and sends a "ready" vote message to B (requesting B to act as coordinator)
- After receiving the vote message, B does the same as A and sends a vote message to C
- When the vote message reaches the rightmost cohort N, if N is ready to commit, it commits the entire transaction (acting as coordinator) and sends a commit message to its left; the message travels down the chain until it reaches A
- When A has committed, it sends a "done" message to B, and so on until it reaches N

Two-phase commit without a prepared state
Assume exactly one cohort, C, does not support a prepared state.
- The coordinator performs phase 1 of the two-phase commit protocol with all other cohorts
- If they all agree to commit, the coordinator requests that C commit its subtransaction (in effect, requesting C to decide the transaction's outcome)
- C responds commit/abort, and the coordinator sends a commit/abort message to all other sites

Global deadlock
A global deadlock is not always detectable at any one site (it may be distributed over multiple sites).
- Detection: a simple extension of local deadlock detection. Check for a cycle with a probe: the probe is sent to the coordinator of the cohort being waited for. If the probe returns, a deadlock exists.
- Prevention: use timestamps. An older transaction never waits for a younger one; the younger one is aborted.
If all sites use strict two-phase locking and the transaction manager uses a two-phase commit protocol, then transactions are globally serializable in commit order.

Replication
Advantages:
- Improves availability: data can be accessed even though some site has failed
- Can improve performance: a transaction can access the closest (perhaps local) replica
Disadvantages:
- More storage is needed
- Mutual consistency of the replicas must be maintained
- Access by concurrent transactions to different replicas can lead to incorrect results

Replica control
- Knows the location of all replicas
- Translates a transaction's request to access an item into a request to access particular replica(s)
- Maintains some form of mutual consistency:
  - Strong: all replicas always have the same value (in every committed version of the database)
  - Weak: all replicas eventually have the same value
  - Quorum: a quorum of replicas has the same value (a certain set of servers has the same data)

Read-one/write-all replica control
- Read request: use the nearest replica
- Write request: update all replicas
  - Synchronous: immediately
  - Asynchronous: eventually

Quorum consensus replica control
To read using quorums, timestamps are used to select the right values.
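The quorum-consensus idea above can be sketched as follows. The replica structure and quorum sizes are invented for illustration: with a read quorum R and write quorum W chosen so that R + W > N, every read quorum overlaps every write quorum in at least one replica, and the timestamp selects the most recent value among the replicas read.

```python
# Hedged sketch of quorum reads and writes: the highest timestamp in any
# read quorum is guaranteed to be the latest committed write, since the
# quorums overlap (R + W > N).

N, R, W = 5, 3, 3          # n replicas, read quorum, write quorum (R + W > N)
replicas = [{"value": "v0", "ts": 0} for _ in range(N)]

def quorum_write(replicas, value, ts):
    # Install the new value, with its timestamp, on any W replicas
    for rep in replicas[:W]:
        rep["value"], rep["ts"] = value, ts

def quorum_read(replicas):
    # Read any R replicas and return the value with the highest timestamp
    sample = replicas[-R:]                       # an arbitrary read quorum
    return max(sample, key=lambda rep: rep["ts"])["value"]

quorum_write(replicas, "v1", ts=1)
result = quorum_read(replicas)
print(result)   # v1: the read quorum overlaps the write quorum in one replica
```

Here the write touched replicas 0-2 and the read sampled replicas 2-4; replica 2 is in both quorums, and its higher timestamp makes the new value win.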
(See also the example sheet made during the lecture.)

Primary copy replica control
- One copy is designated the primary; the other copies are secondary
- Reading is from the nearest copy; writing is on the primary copy
- After commit, the updates are propagated to the secondary copies; this is asynchronous, hence good for performance but bad for consistency