Download B - Homework Market

Document related concepts

Global serializability wikipedia , lookup

Relational model wikipedia , lookup

Database wikipedia , lookup

Database model wikipedia , lookup

Clusterpoint wikipedia , lookup

Consistency model wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Versant Object Database wikipedia , lookup

Commitment ordering wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Serializability wikipedia , lookup

Concurrency control wikipedia , lookup

Transcript
Transactions
• Transaction:
– A unit of work
– A sequence of operations which are bundled
together
• Either all of them should be executed or none
i.e. not partial execution
• Important in terms of concurrency and crash
recovery
1
FIGURE 21.2
Two sample transactions. (a) Transaction T1.
(b) Transaction T2.
2
Bad Failure Eg: Fund Transfer
• Transfer $50 from checking to saving account
–
1.
2.
3.
4.
5.
6.
Initially, A and B each have 200$.
read(checking)
checking := checking – 50
write(checking)
read(saving)
saving := saving + 50
write(saving)
• If transaction fails after step 3, before step 6 ?
• Money will be “lost” : inconsistent state
3
Bad Failure Eg: Fund Transfer
• So if system crashes after step 3, what should
happen?
• DBMS must ensure that either the transaction
is re-started and steps 4 through 6 are
completed or …
• The effects of steps 1 through 3 are undone
– And user informed that the transaction failed
• This is an example of crash recovery
4
Concurrency
• Parallel processing: processes are concurrently
executed in multiple CPUs. Advantage?
• Increased throughput: can do different things
simultaneously, so can do more things.
• Interleaved processing: concurrent execution of
processes interleaved in single CPU. Why?
• When one process is doing I/O, the other can be
accessing the database: better throughput.
• Prevents one process from holding up other
processes for a long period of time
5
Concurrency
• Concurrency: We will refer to both parallel
processing on multiple CPUS and interleaved
processing on a single CPU as concurrency.
– Most issues apply in both cases
• When would we want to allow two transactions
to execute concurrently ?
• If two transactions are not “interfering” with
each other then we want to allow concurrent
execution.
– When do we think of interfering with each other ?
• Dealing with the same data
6
Concurrency
• Eg:
–
–
–
–
–
Jack has a bank account
Jill has a separate bank account
Jack depositing 50$ into his account
Jill depositing 50$ into her account
Should we allow concurrent execution here ?
• Yes, can do concurrently
– Not interfering with each other
• Suppose a joint bank account ?
7
Bad Concurrency Example
•
Eg: Jack and Jill each depositing 50$ into a
joint bank account, initial balance $200
Jack: T1
Jill: T2
read(balance B):200
B := B + 50: 250
read(B): 200
B := B + 50: 250
write(B):250
write(B):250
•
Final balance ends up with wrong value
8
Concurrency and Recovery
• Two main issues to deal with:
– Concurrent execution
– Failures of various kinds, such as hardware
failures and system crashes
• They do effect each other, but we will initially
look at them mostly separately.
• Will look initially at the basics
– What we want, what we try to avoid, what kind of
information we need to store etc.
• Later we will look at algorithms
9
FIGURE 21.2
Two sample transactions. (a) Transaction T1.
(b) Transaction T2.
10
Transactions
• Transaction: includes one or more database
access operations
• Read : no change made
• Write :insert, update, delete: a change is made
• Internal actions: don’t have an effect on other
transactions
• Commit, Abort
– Needed for recovery, not for concurrency
– More on next slide
11
COMMIT/ABORT
• commit_transaction: This signals a successful
end of the transaction
– Any changes made by the transaction can be safely
committed to the database and will not be undone.
– These changes are final
• rollback (or abort): This signals that the
transaction has ended unsuccessfully
– Any changes to the database made by the
transaction must be undone.
12
Transactions
• Transaction could be
– stand-alone : SQL submitted interactively,
– embedded within a program.
• application program may contain several
transactions. Bank Eg:
– at end of month
• Get a list of checking accounts have < 1000$
• For these, transfer money from savings
• SQL support – will study later.
13
Some Op. Systems Concepts
• Where is the database going to be stored ?
• On disk – much more storage than RAM.
• Where does data have to be in order to
manipulate it ?
• RAM: can’t access anything on disk
• Have to move data between RAM and disk
• How much data moved at a time ?
• Block : Basic unit of data transfer from the
disk to the computer main memory.
14
Read item
• read_item(X): Reads a database item named X
into a program variable
– We assume that program variable is also named X.
• read_item(X) includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory
 if that block is not already in some main memory buffer
3. Copy item X from the buffer to the program variable
named X.
15
Write item
• write_item(X): Writes the value of program
variable X into the database item named X.
• write_item(X) includes the following steps:
1. Find address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory
 if that disk block is not already in some RAM buffer
3. Copy item X from the program variable named X
into its correct location in the buffer.
4. Store the updated block from the buffer back to disk
 either immediately or at some later point in time
 Why not write directly into disk?
 Disk I/O is expensive.
16
COMMIT
•
As its final action, a transaction T has to
either COMMIT or ABORT
• COMMIT :
1. T tells DBMS that it is committing i.e. it has
completed successfully.
2. DBMS comes back and issues a commit :
•
•
It is telling T that all the changes are going to be
made permanent, even if the system crashes
This is now the DBMS’s responsibility
17
ABORT
• T tells DBMS that it is terminating itself due to
some error (eg: no data found).
• Alternatively, DBMS might decide to abort T
– Could be due to some internal reason
– Could be due to a system crash
• DBMS has to ensure that database remains in a
consistent state
– Either aborted transaction started again
• If aborted by DBMS
– Or it is killed and its effects are undone. Aka
rollback
18
Schedule Example
• T1: r(x), w(x), commit.
• T2 : r(y), r(x) commit.
T1
T2
r1(x)
r2(y)
w1(x)
c1
r2(y)
c2
• S = r1(x), r2(y), w1(x), r2(x), c1, c2.
19
Schedules
• Schedule : a schedule S is sequences of
instructions that specify the chronological
order in which instructions of concurrent
transactions are executed
• S must preserve the order in which the
instructions appear in each individual
transaction.
• Operations from other transactions Tj can be
interleaved with the operations of Ti in S.
– Potential schedule (i.e. what may happen)
• May want to limit potential schedules
– Actual schedule
20
Serial Schedules
• Serial schedule: A schedule S is serial if, for
every transaction T participating in the schedule,
all the operations of T are executed
consecutively in the schedule.
– Eg: All ops of T1 followed by all ops of T2
• Otherwise, the schedule is called an interleaved
schedule
– Eg: Some T1 ops, followed by some T2 ops, followed
by some T1 ops
21
[SKS] Eg of serial, interleaved
schedules
22
Schedule Example S1
• T1: r(x), w(x), commit.
• T2 : r(y), commit.
T1
T2
r1(x)
w1(x)
c1
r2(y)
c2
• S1 = r1(x), w1(x), c1, r2(y), c2. Is S1 serial?
• S1 is serial : T1 followed by T2
23
Schedule Example S2
• T1: r(x), w(x), commit.
• T2 : r(y), commit.
T1
T2
r1(x)
r2(y)
w1(x)
c1
c2
• S2 = r1(x), r2(y), w1(x), c1, c2. Is S2 serial?
• S2 is not serial – is interleaved.
24
ACID Properties: Atomicity
• ACID Properties: These are desirable properties
of transactions:
• Jim Gray of Microsoft won the 1998 Turing
award for his work on transactions in databases
– ACID properties, locking, fault tolerance
• Atomicity: Either all operations of the
transaction should occur or none should occur.
– The system has to enforce this by making sure that
only a part of the transaction is not done.
25
Atomicity Example of Fund Transfer
The following is one transaction
1.
2.
3.
4.
5.
6.
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
• Atomicity requirement
– if the transaction fails after step 3 and before step 6,
money will be “lost” leading to an inconsistent database
state
– the system should ensure that updates of a partially
executed transaction are not reflected in the database
26
ACID Properties: Consistency
• What do we mean by consistency ?
• Consistency requirements could include
• Explicitly specified constraints such as primary keys and
foreign keys
• Implicit integrity constraints
– e.g. sum of A and B should remain same
• When do we worry about consistency?
• Programmer has to ensure that a transaction which
completes by itself (no interleaving, no crashes) leaves
the database in a consistent state
– Eg: if T transferring money from checking to saving adds one
more dollar to saving
– DBMS can’t catch this kind of inconsistency
– Erroneous transaction logic can lead to inconsistency
27
ACID Properties: Consistency
• Here we are only interested in consistency when
there could be a problem
– via interleaving of different transactions or
– T not completing due to system crash
• If things were consistent without concurrency or
recovery, we want to ensure
– same consistent state with concurrency + recovery.
• Before T starts, assume database is consistent
• After T completes, database must be consistent
• During execution of T, OK if the database
becomes temporarily inconsistent.
28
Consistency Example
1.
2.
3.
4.
5.
6.
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
• Consistency requirement in above example?
• The sum of A and B is unchanged by the
execution of the transaction.
– We are adding 50 to A and subtracting 50 from B
– Should be true even if system crashes in the middle
and some part of the transaction has to be redone.
What could be the problem ?
29
ACID Properties: Isolation
• Isolation: Want to allow multiple transactions to
execute concurrently for better performance
– Don’t want transactions to interfere with each other
if they are working on the same data
• User should be able to understand the effect of
each transaction by itself
– Even if the DBMS does interleaving, the user should
be sheltered from any negative effects of the
interleaving.
– To the user it should look like serial execution
30
Isolation Example
•
T2 is printing A+B, T1 is subtracting 50
from A and adding 50 to B.
T1
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
T2
read(A)
read(B)
print(A+B)
31
Isolation Example
• Will T1 T2 be same as T2 T1 ?
– Eg: A = 80, B = 60
• Will all interleaved schedules give the same
result as the serial schedules i.e. same result
as T1 T2 or T2 T1 ?
• No
– what would be an interleaved schedule which
would give a different (undesirable) result ?
32
Isolation Example
•
Consider interleaved schedule:
T1
T2
1.
2.
3.
read(A)
A := A – 50
write(A)
read(A), read(B), print(A+B)
4.
5.
6.
read(B)
B := B + 50
write(B)
–
What will be the result with A = 80, B = 60
33
ACID Properties: Isolation
• Isolation can be ensured trivially by running
transactions serially
– that is, one after the other. Problem ?
• Inefficiency. Should allow concurrent execution
but it should be serializable:
– To T1 it should like it finished completely before T2 or
it didn’t start till all of T2 had finished
– Even if execution is interleaved
– Because T1 and T2 don’t interfere with each other
– will study later.
• Isolation (non-interference) can be achieved by
locks, will study later in detail.
34
ACID Properties: Durability, Eg
• Durability: When T finishes, system issues a
commit (tells user that T completed)
– then DBMS has to guarantee that changes made by T
will take place i.e. be recorded in the database
– even if system crashes.
• Eg: customer comes to withdraw from ATM
1.Customer at ATM, requests 300$ withdrawal
2.ATM sends request to central computer
3.Central computer issues a commit to ATM
4.ATM gives money to customer
35
ACID Properties: Durability, Eg
• If system (central computer) crashes before the
debit is made, what will happen ?
• Why shouldn’t the debit be made before the
300$ is given?
• Buffering : for efficiency reason, local copies
of file blocks are kept in volatile RAM, and
disk I/Os are done periodically
– Disk I/Os are expensive !
36
Transaction Failure
• We distinguish between catasrophic
failure and non-catastrophic failure.
What kind of failure is catastrophic ?
• When the disk gets wiped out ! Why is
this important in a database context ?
• The database is stored on disk
– For non-catastrophic failure can recover by
using the disk
– Can’t do for catastrophic failure
37
Causes of Non-Catastrophic Failure
• A computer failure (system crash): A hardware
or software error occurs during transaction
execution
– If the hardware crashes, the contents of the
computer’s internal memory may be lost.
• A transaction or system error : Some operation
in the transaction may cause it to fail. Eg ?
• Eg: integer overflow or division by zero.
• User may interrupt the transaction during its
execution.
38
Causes of Non-Catastrophic Failure
• Local errors or exception conditions detected
by the transaction:
– Eg: transaction involves fund withdrawal :
• account has insufficient account balance
• Concurrency control enforcement: The
concurrency control method may decide to
abort the transaction, to be restarted later
– Eg: because it violates serializability
• Will study later: basically means transactions
interfering with each other
– Eg: several transactions are in a state of deadlock
39
Causes of Catastrophic Failure
• Disk failure: Some disk blocks may lose their
data because of a read or write malfunction or
because of a disk read/write head crash.
– This may happen during a read or a write operation
of the transaction.
• Physical problems and catastrophes: Eg?
• Power failure, fire, overwriting disks or tapes
by mistake.
40
UNDO/REDO
Recovery techniques use the following operators:
• undo: Similar to rollback except that it applies
to a single operation rather than to a whole
transaction.
• redo: This specifies that certain transaction
operations must be redone to ensure that all the
operations of a committed transaction have
been applied successfully to the database
– If a system crash takes place, we may need to redo
some of the operations of a committed transaction.
41
Transactions and Recovery
• Transactions that have written their commit
entry in the log must also have recorded all their
write operations in the log. Why?
• Suppose a transaction has been issued a
commit, but its operations have not been carried
out, and the system crashes. What to do?
• We need to redo the ops. What information do
we need to redo the ops?
• We need to know what all the write operations
are; this information has to be in the system log.
42
Transactions and Recovery
• Can this system log be in RAM or does it have
to be on disk?
• The log file must be kept on disk because the
contents of RAM may be lost in a system crash
– Master copy on disk, working copy in RAM
• Force writing a log: before a transaction
reaches its commit point, any portion of the log
that has not been written to the disk yet must
now be written to the disk.
– This process is called force-writing the log file and
has to be done before a commit is issued to a
transaction.
– If force writing log, why not force write database?
43
System Log
• The System Log : keeps track of all
transaction operations that affect the values of
database items.
• This information needed to permit recovery from
transaction failures.
• The log is kept on disk so it is not affected by
any type of failure except for disk or catastrophic
failure.
– In addition, the log is periodically backed up to
archival storage (tape) to guard against such
catastrophic failures.
44
FIGURE 20.4
State transition diagram illustrating the states
for transaction execution.
45
Transactions: System Concepts
• Recovery manager keeps track of following
operations:
• begin_transaction: marks beginning of transaction
execution.
• read or write: These specify read or write operations
on the database items that are executed as part of a
transaction.
• end_transaction: marks the end limit of transaction
execution. Check
– whether changes introduced by the transaction can be
permanently applied to the database or
– whether the transaction has to be aborted because it violates
concurrency control or for some other reason.
46
System Log Records
Types of system log record:
1. [start_transaction,T]: Records that transaction T has
started execution.
2. [write_item,T,X,old_value,new_value]: Records that
T has changed the value of database item X from
old_value to new_value. Why keep old_value ? Why
keep new_value ?
3. [read_item,T,X]: Records that T has read the value
of database item X. Why keep this ?
4. [commit,T]: Records that T has completed
successfully : affirms that its effect should be
committed (recorded permanently) to the database.
5. [abort,T]: Records that T has been aborted.
47
Recovery using log records:
• If the system crashes, we can recover to a consistent
database state by examining the log
– by using one of the techniques we will study later
• Log contains a record of every write operation that
changes the value of some database item
– it is possible to undo the effect of these write operations of a
transaction T . How ?
• Trace backward through the log and reset all items
changed by a write operation of T to their old_values.
• Can also redo the effect of the write operations of a
transaction T. How ?
• Trace forward through the log and set all items changed
by a write operation of T (that did not get done
permanently) to their new_values.
48
Commit Point
• Commit Point of a Transaction: Transaction T
reaches its commit point when all its operations
that access the database have been executed
successfully and the effect of all the transaction
operations on the database has been recorded in
the log.
– Beyond the commit point, the transaction is said to
be committed, and its effect is assumed to be
permanently recorded in the database.
– The transaction then writes entry [commit,T] into
the log.
49
Recoverability Example
• T1
T2
r1(x): x = 6
xx+3
w1(x): x=9
r2(x)
xx+2
w2(x): x= 11
c2
a1
• Problem ?
• T2 committed, can’t undo, but T2 read a value
written by T1 and T2 has to be rolled back
50
Recoverability: Recoverable Schedules
• Recoverable schedule: One where no
committed transaction needs to be rolled back.
– Intuition: once a transaction is committed, it is
final, and cannot be rolled back
• How can previous Eg be avoided ?
– How can we ensure that a schedule is recoverable?
• A schedule S is recoverable if the following
holds: if T2 reads from T1, then T2 cannot
commit till T1 has committed.
– Let’s look at why this is important
51
Recoverability: Recoverable Schedules
–
–
–
–
Suppose T2 reads from T1
Then T2 commits
Then T1 aborts
What would we want to see happen to T2 ?
• T2 should be aborted. Why ?
• Because T2 read a value from T1 and T1 aborted
– So T2 read a “non-existent” value
• Is it possible to abort T2 now ?
• No, T2 has already been issued a commit
– DBMS has guaranteed that changes made by T2 are going
to be made permanent
– Can’t abort T2 now
52
Cascading Rollbacks [SKS Eg]
• If T10 fails ?
• Then T11 and then T12 must also be rolled back
• Cascading rollback : a single transaction
failure leads to a series of transaction rollbacks
• In above eg the schedule is recoverable. Why ?
• None of the transactions have committed
53
Recoverability: Cascadeless Schedules
• Cascadeless schedules : no cascading rollbacks.
Advantage ?
• Cascading rollbacks are inefficient. Why ?
• Work already done is lost.
• What do we need to guarantee cascadeless ?
• No reading uncommitted writes
– If T1 writes x, T2 can’t read this x till T1 has committed
– Intuition ?
• If T2 allowed to read x before T1 committed, what
happens if T1 aborts after T2 has read x?
• T2 has to be aborted – cascading rollback
54
Recoverability: Cascadeless Schedules
• If no cascading rollbacks then another advantage
is that don’t need to keep read operations in
system log. Why?
– Why do we keep read operations ?
• We only keep T1 read operations in case T1 has
read from T2 and T2 aborts
– we will need to rollback T1 also.
• Can’t happen if no cascading rollbacks.
• Every cascadeless schedule is also recoverable
– Not other way around. Eg ?
• Eg: w2(x), r1(x), a2
55
Recoverability: Strict Schedules
• Strict Schedules: if T1 writes x, then T2 can’t
read x or write x till T1 committs.
• Strict is stronger property than cascadeless
– Difference between cascadeless and strict ?
• In a strict schedule: if T1 has written x, then T2
can’t read or write over x till T1 has committed
• In cascadeless: T2 can’t read, but can write
• Eg of a schedule : cascadeless but not strict ?
• Eg: w2(x), w1(x)
• Accounts for blind writes also.
56
Recoverability: Strict Schedules
• With strict schedules, easier to recover. Why?
• Just restore before image : what the value was
before it was written over
• Eg: x = 3, w1(x) : 4 (remembers old value = 3),
a1 ; restore x = 3.
• Won’t work if not strict:
• Eg: x = 3, w1(x) : 4 (remembers old value = 3),
w2(x) : 5 (remembers old value = 4), a1 ;
restore x = 3: WRONG ! How to do ?
• Need to remember T2 wrote 5 in the log.
57
Recoverability: Strict Schedules
• In strict schedule, to roll back, just have to
restore old value.
• For non strict schedules, have to include
new_value since could have blind writes
– Do undo’s followed by redo’s
• Strict protocols require simpler write entries
– Only need to keep old_value
58
Recoverability
• Four levels of classification vis-à-vis recovery
–
–
–
–
Not-recoverable
Recoverable but not cascadeless
Cascadeless but not strict
Strict
• Recoverable/Non recoverable : w1(x),r2(x) :
depends on which commits first T1or T2
• Cascadeless/strict depends on read/writes of
uncommitted writes
59
Recoverability
• Recoverable: T1 does not commit before T2 if
T1 reads from a write done by T2 and T2 not yet
committed.
• Cascadeless: T1 does not read from write done
by T2 till T2 committed.
• Strict: T1 does not read from or write over write
done by T2 till T2 committed.
60
Recoverability
•
•
•
•
•
•
•
•
Eg: w2(x), r1(x), c1, a2 ?
Not recoverable
Eg: w2(x), r1(x), a2 ?
Recoverable but not cascadeless
Eg: w2(x), w1(x), a2 ?
Cascadeless but not strict
Eg: w2(x), c2, w1(x), ?
Strict
61