DB2 – Triggers
This chapter describes triggers, their types, and how triggers are created and dropped.
Introduction
A trigger is a set of actions that are performed in response to an INSERT, UPDATE, or DELETE operation on
a specified table in the database. Triggers are stored in the database once, where they help govern the data, and they
can be accessed and shared by multiple applications. The advantage of using triggers is that when a change needs to
be made, it is made once in the trigger instead of in every application that accesses the table.
Triggers are easy to maintain and they enable faster application development. Triggers are defined using the SQL
statement "CREATE TRIGGER".
Types of triggers
There are two types of triggers:
1. BEFORE triggers
They are executed before the triggering SQL operation.
2. AFTER triggers
They are executed after the triggering SQL operation.
Creating a BEFORE trigger
First, let us create a sequence that the trigger will use to generate values:
Syntax:
db2 create sequence <seq_name>
Example: Creating a sequence for the table shopper.sales1
db2 create sequence sales1_seq as int start with 1 increment by 1
Syntax:
db2 create trigger <trigger_name> no cascade before insert on
<table_name> referencing new as <table_object> for each row set
<table_object>.<col_name>=nextval for <sequence_name>
Example: Creating a trigger on the shopper.sales1 table to insert primary key numbers automatically
db2 create trigger sales1_trigger no cascade before insert on
shopper.sales1 referencing new as obj for each row set
obj.id=nextval for sales1_seq
Now try inserting values:
db2 insert into shopper.sales1(itemname, qty, price)
values('bicks', 100, 24.00)
Retrieving values from table
Let us see how to retrieve values from a table:
Syntax:
db2 select * from <tablename>
Example:
db2 select * from shopper.sales1
Output:

ID          ITEMNAME       QTY
----------- -------------- -----------
3           bicks          100
2           bread          100

2 record(s) selected.
Creating an AFTER trigger
Let us see how to create an AFTER trigger:
Syntax:
db2 create trigger <trigger_name> after <operation> on
<table_name> for each row mode db2sql begin atomic
<SQL statements>; end
Example: [Create an AFTER trigger, then insert and retrieve values]
db2 create trigger sales1_tri_after after insert on shopper.sales1
for each row mode db2sql begin atomic update shopper.sales1
set price=qty*price; end
Output:
//inserting values in shopper.sales1
db2 insert into shopper.sales1(itemname,qty,price)
values('chiken',100,124.00)
//output

ID          ITEMNAME       QTY         PRICE
----------- -------------- ----------- -----------
3           bicks          100         2400.00
4           chiken         100         12400.00
2           bread          100         2400.00

3 record(s) selected.
Dropping a trigger
Here is how a database trigger is dropped:
Syntax:
db2 drop trigger <trigger_name>
Example:
db2 drop trigger sales1_trigger
DBMS - Joins
We understand the benefits of taking a Cartesian product of two relations, which gives us all the possible tuples
paired together. However, taking a Cartesian product may not be feasible when we encounter huge relations with
thousands of tuples and a considerably large number of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join operation pairs two tuples from
different relations, if and only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
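To see "a Cartesian product followed by a selection" in running code, here is a minimal Python sketch, added here as an illustration and not part of the original tutorial. Relations are modeled as lists of dictionaries, and the sample data anticipates the Student/Subjects tables used below.

from itertools import product

# A join forms the Cartesian product of two relations and then keeps
# only the pairs that satisfy the join condition theta.
def join(r1, r2, theta):
    return [{**t1, **t2} for t1, t2 in product(r1, r2) if theta(t1, t2)]

student = [{"SID": 101, "Name": "Alex", "Std": 10},
           {"SID": 102, "Name": "Maria", "Std": 11}]
subjects = [{"Class": 10, "Subject": "Math"},
            {"Class": 11, "Subject": "Music"}]

# Join condition: Student.Std = Subjects.Class
rows = join(student, subjects, lambda s, c: s["Std"] == c["Class"])
# -> Alex paired with Math, Maria paired with Music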
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The join condition is
denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations with attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that the attributes have
nothing in common, that is, R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID    Name    Std
101    Alex    10
102    Maria   11

Subjects
Class  Subject
10     Math
10     English
11     Music
11     Sports
Student_Detail −
STUDENT ⋈(Student.Std = Subject.Class) SUBJECT
Student_detail
SID    Name    Std   Class   Subject
101    Alex    10    10      Math
101    Alex    10    10      English
102    Maria   11    11      Music
102    Maria   11    11      Sports
Equijoin
When a theta join uses only the equality comparison operator, it is called an equijoin. The above example
corresponds to an equijoin.
Natural Join (⋈)
Natural join does not take an explicit comparison operator. It does not concatenate the way a Cartesian product
does. We can perform a natural join only if there is at least one common attribute between the two relations. In
addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of the attributes in both relations are the same.
Courses
CID    Course        Dept
CS01   Database      CS
ME01   Mechanics     ME
EE01   Electronics   EE

HoD
Dept   Head
CS     Alex
ME     Maya
EE     Mira
Courses ⋈ HoD
Dept   CID    Course        Head
CS     CS01   Database      Alex
ME     ME01   Mechanics     Maya
EE     EE01   Electronics   Mira
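The matching-on-common-attribute-names rule can be sketched the same way. The following Python fragment is an illustration assumed for this rewrite, not the tutorial's own code; it reproduces the Courses ⋈ HoD result above.

# Natural join: match on every attribute name the two relations share
# (here only "Dept") and emit each shared attribute once.
def natural_join(r1, r2):
    common = set(r1[0]) & set(r2[0])   # shared attribute names (assumes non-empty relations)
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

courses = [{"CID": "CS01", "Course": "Database", "Dept": "CS"},
           {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
           {"CID": "EE01", "Course": "Electronics", "Dept": "EE"}]
hod = [{"Dept": "CS", "Head": "Alex"}, {"Dept": "ME", "Head": "Maya"},
       {"Dept": "EE", "Head": "Mira"}]

print(natural_join(courses, hod))   # one row per matching Dept value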
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those tuples with matching
attributes and the rest are discarded in the resulting relation. Therefore, we need to use outer joins to include all the
tuples from the participating relations in the resulting relation. There are three kinds of outer joins − left outer join,
right outer join, and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are tuples in R without any
matching tuple in the Right relation S, then the S-attributes of the resulting relation are made NULL.
Left
A      B
100    Database
101    Mechanics
102    Electronics

Right
A      B
100    Alex
102    Maya
104    Mira
Left ⟕ Right
A      B             C      D
100    Database      100    Alex
101    Mechanics     ---    ---
102    Electronics   102    Maya
Right Outer Join (R ⟖ S)
All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples in S without any
matching tuple in R, then the R-attributes of resulting relation are made NULL.
Left ⟖ Right
A      B             C      D
100    Database      100    Alex
102    Electronics   102    Maya
---    ---           104    Mira
Full Outer Join (R ⟗ S)
All the tuples from both participating relations are included in the resulting relation. If there are no matching tuples
for both relations, their respective unmatched attributes are made NULL.
Left ⟗ Right
A      B             C      D
100    Database      100    Alex
101    Mechanics     ---    ---
102    Electronics   102    Maya
---    ---           104    Mira
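As a rough sketch of the padding behavior just described, the following illustrative Python (an assumption of this rewrite, with None standing in for NULL) reproduces the outer-join tables above.

# Left outer join keeps every tuple of r1, padding unmatched ones with
# NULLs; a full outer join adds the r2 tuples that matched nothing.
def left_outer(r1, r2, cond, pad):
    out = []
    for t1 in r1:
        matches = [{**t1, **t2} for t2 in r2 if cond(t1, t2)]
        out += matches if matches else [{**t1, **pad}]
    return out

def full_outer(r1, r2, cond, pad_left, pad_right):
    out = left_outer(r1, r2, cond, pad_right)
    out += [{**pad_left, **t2} for t2 in r2
            if not any(cond(t1, t2) for t1 in r1)]
    return out

left  = [{"A": 100, "B": "Database"}, {"A": 101, "B": "Mechanics"},
         {"A": 102, "B": "Electronics"}]
right = [{"C": 100, "D": "Alex"}, {"C": 102, "D": "Maya"}, {"C": 104, "D": "Mira"}]
cond  = lambda t1, t2: t1["A"] == t2["C"]

rows = full_outer(left, right, cond,
                  {"A": None, "B": None}, {"C": None, "D": None})
# Yields the four A/B/C/D rows of the full outer join table above.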
DBMS - Indexing
We know that data is stored in the form of records. Every record has a key field, which enables it to be identified
uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes
on which the indexing has been done. Indexing in database systems is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
• Primary Index − A primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
• Secondary Index − A secondary index may be generated from a field which is a candidate key and has a unique value in every record, or from a non-key field with duplicate values.
• Clustering Index − A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Ordered indexing is of two types −
• Dense Index
• Sparse Index
Dense Index
In a dense index, there is an index record for every search-key value in the database. This makes searching faster but
requires more space to store the index records themselves. An index record contains the search-key value and a
pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and
a pointer to the actual data on the disk. To search a record, we first follow the index to reach an actual location of
the data. If the data we are looking for is not where the index leads us directly, the system starts a sequential
search from there until the desired data is found.
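The difference can be sketched as follows. This is an illustrative Python fragment (the sorted data file and the choice of one sparse entry per four records are assumptions made for the example, not part of the text).

import bisect

data = [(k, f"record-{k}") for k in range(0, 100, 5)]   # sorted data file

# Dense index: one entry per search-key value, pointing at its position.
dense_index = {k: pos for pos, (k, _) in enumerate(data)}

# Sparse index: one entry per "block" (here, every 4th record).
sparse_keys = [data[pos][0] for pos in range(0, len(data), 4)]
sparse_pos  = {k: pos for pos, (k, _) in enumerate(data) if k in sparse_keys}

def sparse_lookup(key):
    """Find the nearest index entry <= key, then scan sequentially."""
    i = bisect.bisect_right(sparse_keys, key) - 1
    pos = sparse_pos[sparse_keys[i]]
    while pos < len(data) and data[pos][0] < key:
        pos += 1                      # sequential search from the anchor
    if pos < len(data) and data[pos][0] == key:
        return data[pos][1]
    return None

print(data[dense_index[35]][1], sparse_lookup(35))   # both find 'record-35'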
Multilevel Index
Index records comprise search-key values and data pointers. A multilevel index is stored on the disk along with the
actual database files. As the size of the database grows, so does the size of the indices. There is an immense need to
keep the index records in the main memory so as to speed up search operations. If a single-level index is used,
then a large index cannot be kept in memory, which leads to multiple disk accesses.
A multi-level index helps in breaking down the index into several smaller indices in order to make the outermost level
so small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main
memory.
B+ Tree
A B+ tree is a balanced search tree that follows a multi-level index format. The leaf nodes of a B+ tree hold the
actual data pointers. A B+ tree ensures that all leaf nodes remain at the same depth, thus staying balanced. Additionally,
the leaf nodes are linked in a linked list; therefore, a B+ tree can support random access as well as sequential access.
Structure of B+ Tree
Every leaf node is at an equal distance from the root node. A B+ tree is said to be of order n, where n is fixed for
the tree.
Internal nodes −
• Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
• At most, an internal node can contain n pointers.
Leaf nodes −
• Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
• At most, a leaf node can contain n record pointers and n key values.
• Every leaf node contains one block pointer P to point to the next leaf node, forming a linked list.
B+ Tree Insertion
• B+ trees are filled from the bottom, and each entry is made at a leaf node.
• If a leaf node overflows (see the sketch after this list) −
  o Split the node into two parts.
  o Partition at i = ⌊(m+1)/2⌋.
  o The first i entries are stored in one node.
  o The rest of the entries (i+1 onwards) are moved to a new node.
  o The i-th key is duplicated at the parent of the leaf.
• If a non-leaf node overflows −
  o Split the node into two parts.
  o Partition the node at i = ⌈(m+1)/2⌉.
  o Entries up to i are kept in one node.
  o The rest of the entries are moved to a new node.
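A minimal sketch of the leaf-split rule above, assuming a plain list of keys and the ⌊(m+1)/2⌋ partition convention just listed (illustrative Python, not a complete B+ tree):

def split_leaf(keys, m):
    """keys has overflowed (m+1 entries); return (left, right, up_key)."""
    i = (m + 1) // 2                   # partition point i = floor((m+1)/2)
    left, right = keys[:i], keys[i:]   # first i entries stay; rest move
    return left, right, left[-1]       # the i-th key is duplicated upward

left, right, up = split_leaf([5, 10, 15, 20, 25], m=4)
# left = [5, 10], right = [15, 20, 25]; key 10 is copied into the parent.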
B+ Tree Deletion
• B+ tree entries are deleted at the leaf nodes.
• The target entry is searched and deleted.
  o If it is in an internal node, delete it and replace it with the entry from the left position.
• After deletion, underflow is tested.
  o If underflow occurs, distribute the entries from the node to its left.
• If distribution is not possible from the left −
  o Distribute from the node to its right.
• If distribution is not possible from the left or from the right −
  o Merge the node with the nodes to its left and right.
DBMS - Hashing
For a huge database structure, it can be nearly impossible to search all the index values through all its levels
and then reach the destination data block to retrieve the desired data. Hashing is an effective technique to calculate
the direct location of a data record on the disk without using an index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data record.
Hash Organization
• Bucket − A hash file stores data in bucket format. A bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.
• Hash Function − A hash function, h, is a mapping function that maps the set of all search-keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For
example, if a mod-4 hash function is used, it generates only 4 values. The output address is always the
same for a given key. The number of buckets provided remains unchanged at all times.
Operation
• Insertion − When a record is required to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.
Bucket address = h(K)
• Search − When a record needs to be retrieved, the same hash function is used to compute the address of the bucket where the data is stored.
• Delete − This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash function. In this case,
overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called Closed Hashing (a sketch follows this list).
• Linear Probing − When a hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
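A minimal sketch of static hashing with overflow chaining follows. This is illustrative Python; the bucket count N, the capacity of two entries per block, and the record values are assumptions made for the example.

N, CAPACITY = 4, 2
buckets = [[] for _ in range(N)]       # each bucket: a list of chained blocks

def h(key):
    return key % N                     # static hash function, mod-N

def insert(key, record):
    chain = buckets[h(key)]
    for block in chain:                # find a chained block with free space
        if len(block) < CAPACITY:
            block.append((key, record))
            return
    chain.append([(key, record)])      # overflow: chain a new block

def search(key):
    for block in buckets[h(key)]:      # same hash, walk the chain
        for k, record in block:
            if k == key:
                return record
    return None

for k in (3, 7, 11, 15):               # all hash to bucket 3, forcing chaining
    insert(k, f"rec{k}")
print(search(11))                      # 'rec11'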
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or
shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and
on-demand. Dynamic hashing is also known as extended hashing.
In dynamic hashing, the hash function is made to produce a large number of values, of which only a few are used
initially.
Organization
A prefix of the entire hash value is taken as the hash index. Only a portion of the hash value is used for computing
bucket addresses. Every hash index has a depth value that signifies how many bits are used for computing the hash
function. These bits can address 2^n buckets. When all these bits are consumed − that is, when all the buckets are full
− the depth value is increased linearly and twice as many buckets are allocated.
Operation
• Querying − Look at the depth value of the hash index and use those bits to compute the bucket address.
• Update − Perform a query as above and update the data.
• Deletion − Perform a query to locate the desired data and delete the same.
• Insertion − Compute the address of the bucket (see the sketch after this list).
  o If the bucket is already full −
    ▪ Add more buckets.
    ▪ Add additional bits to the hash value.
    ▪ Re-compute the hash function.
  o Else, add the data to the bucket.
  o If all the buckets are full, perform the remedies of static hashing.
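A simplified sketch of this scheme in Python follows. It is an illustration only: real extendible hashing also tracks a per-bucket local depth and splits a single bucket at a time, details omitted here for brevity (the global rebuild below is a simplification).

BUCKET_SIZE = 2

def prefix(key, depth):
    """Use `depth` bits of the hash value as the bucket index (low-order
    bits here, for simplicity)."""
    return hash(key) & ((1 << depth) - 1)

depth = 1
directory = [[] for _ in range(1 << depth)]   # 2**depth buckets

def insert(key):
    global depth, directory
    bucket = directory[prefix(key, depth)]
    if len(bucket) < BUCKET_SIZE:
        bucket.append(key)
        return
    # Bucket full: use one more bit, double the directory, re-hash all keys.
    depth += 1
    old = [k for b in directory for k in b] + [key]
    directory = [[] for _ in range(1 << depth)]
    for k in old:
        directory[prefix(k, depth)].append(k)

for k in range(10):
    insert(k)
print(depth, directory)   # depth grew as buckets filled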
Hashing is not favorable when the data is organized in some order and the queries require a range of data; hashing
performs best when the data is discrete and random. Hashing algorithms are more complex to implement than plain
index lookups, but all hash operations are done in constant time.
DBMS - Transaction
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database
system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as the ACID properties −
in order to ensure accuracy, completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none (a sketch follows this list). There must be no state in the database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after its execution/abortion/failure.
• Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in the database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data can be written to disk, the data will be updated once the system springs back into action.
• Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each is the only transaction in the system. No transaction will affect the existence of any other transaction.
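As a rough illustration of atomicity (a sketch assumed for this chapter, not how a real database engine works), a transaction can be applied to a shadow copy of the state and installed only if every operation succeeds:

def run_transaction(db, operations):
    working = dict(db)                 # shadow copy; db itself is untouched
    try:
        for op in operations:
            op(working)                # any failure aborts the whole batch
    except Exception:
        return db                      # abort: original state preserved
    return working                     # commit: all-or-nothing install

accounts = {"A": 100, "B": 50}
transfer = [lambda s: s.__setitem__("A", s["A"] - 30),
            lambda s: s.__setitem__("B", s["B"] + 30)]
accounts = run_transaction(accounts, transfer)   # {'A': 70, 'B': 80}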
Serializability
When multiple transactions are being executed by the operating system in a multiprogramming environment, there
are possibilities that instructions of one transaction are interleaved with some other transaction.
• Schedule − A chronological execution sequence of transactions is called a schedule. A schedule can have many transactions in it, each comprising a number of instructions/tasks.
• Serial Schedule − A schedule in which transactions are aligned in such a way that one transaction is executed first. When the first transaction completes its cycle, the next transaction is executed. Transactions are ordered one after the other. This type of schedule is called a serial schedule, as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered a benchmark. The execution sequence of the
instructions within a transaction cannot be changed, but two transactions can have their instructions interleaved in
any fashion. This interleaving does no harm if the two transactions are mutually independent and working on different
segments of data; but if the two transactions are working on the same data, the results may vary. This
ever-varying result may bring the database to an inconsistent state.
To resolve this problem, we allow parallel execution of a transaction schedule only if its transactions are either
serializable or have some equivalence relation among them.
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they are said to be result equivalent. They may yield the
same result for some values and different results for another set of values. That is why this equivalence is not
generally considered significant.
View Equivalence
Two schedules are view equivalent if the transactions in both schedules perform similar actions in a similar manner.
For example −
• If T reads the initial data in S1, then it also reads the initial data in S2.
• If T reads the value written by J in S1, then it also reads the value written by J in S2.
• If T performs the final write on a data value in S1, then it also performs the final write on that data value in S2.
Conflict Equivalence
Two operations are said to be conflicting if they have the following properties −
• Both belong to separate transactions.
• Both access the same data item.
• At least one of them is a "write" operation.
Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and only
if (see the sketch after this list) −
• Both schedules contain the same set of transactions.
• The order of conflicting pairs of operations is maintained in both schedules.
Note − View equivalent schedules are view serializable and conflict equivalent schedules are conflict serializable.
All conflict serializable schedules are view serializable too.
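The two conditions can be checked mechanically. The following illustrative Python sketch (with schedules encoded as (transaction, operation, item) triples, an assumed representation) tests conflict equivalence:

def conflicting_pairs(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}."""
    pairs = []
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                pairs.append(((t1, op1, x1), (t2, op2, x2)))
    return pairs

def conflict_equivalent(s1, s2):
    same_txns = {t for t, _, _ in s1} == {t for t, _, _ in s2}
    return same_txns and set(conflicting_pairs(s1)) == set(conflicting_pairs(s2))

s1 = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
s2 = [("T2", "R", "x"), ("T1", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
print(conflict_equivalent(s1, s2))   # True: only the non-conflicting reads swapped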
States of Transactions
A transaction in a database can be in one of the following states −
• Active − In this state, the transaction is being executed. This is the initial state of every transaction.
• Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, the recovery manager rolls back all its write operations on the database to bring the database back to its original state from before the execution of the transaction. Transactions in this state are called aborted. The database recovery module can select one of two operations after a transaction aborts −
  o Re-start the transaction
  o Kill the transaction
• Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.
DBMS - Storage System
Databases are stored in file formats, which contain records. At the physical level, the actual data is stored in
electromagnetic format on some device. These storage devices can be broadly categorized into three types −
• Primary Storage − The memory storage that is directly accessible to the CPU comes under this category. The CPU's internal memory (registers), fast memory (cache), and main memory (RAM) are directly accessible to the CPU, as they are all placed on the motherboard or CPU chipset. This storage is typically very small, ultra-fast, and volatile. Primary storage requires a continuous power supply in order to maintain its state. In case of a power failure, all its data is lost.
• Secondary Storage − Secondary storage devices are used to store data for future use or as backup. Secondary storage includes memory devices that are not a part of the CPU chipset or motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash drives, and magnetic tapes.
• Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage devices are external to the computer system, they are the slowest in speed. These storage devices are mostly used to back up an entire system. Optical disks and magnetic tapes are widely used as tertiary storage.
Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main memory as well as
its inbuilt registers. Main memory access is obviously slower than the CPU's processing speed. To minimize this
speed mismatch, cache memory is introduced. Cache memory provides the fastest access time and it contains the data
most frequently accessed by the CPU.
The memory with the fastest access is the costliest. Larger storage devices offer slower speeds and are less
expensive; however, they can store huge volumes of data as compared to CPU registers or cache memory.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present-day computer systems. These are called
magnetic disks because they use the concept of magnetization to store information. Hard disks consist of metal disks
coated with magnetizable material. These disks are placed vertically on a spindle. A read/write head moves in
between the disks and is used to magnetize or de-magnetize the spot under it. A magnetized spot can be recognized
as 0 (zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has many concentric
circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512
bytes of data.
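As a small worked example with assumed figures (the surface, track, and sector counts below are illustrative, not from the text), total capacity follows directly from this geometry:

# capacity = surfaces x tracks per surface x sectors per track x 512 bytes
surfaces, tracks_per_surface, sectors_per_track = 16, 1024, 64
sector_bytes = 512
capacity = surfaces * tracks_per_surface * sectors_per_track * sector_bytes
print(capacity // 2**20, "MiB")   # 512 MiB for these assumed figures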
Redundant Array of Independent Disks
RAID, or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage devices and
use them as a single storage medium.
RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID
levels define the use of disk arrays.