Department of Computer Science and Engineering
Course Name : Database Management Systems
Course Number : A55025
Course Designation : Core
Prerequisites : Data Structures
III B Tech – I Semester
(2015-2016)
A.Malli Karjuna Reddy
Assistant Professor
SYLLABUS
Unit – I
Introduction to Database System Concepts: Database-System Applications, Purpose of
Database Systems, View of Data, Database Languages, Relational Databases, Database
Design, Data Storage and Querying, Transaction Management, Database Architecture,
Data Mining and Information Retrieval, Spatial Databases, Database Users and
Administrators, History of Database Systems.
RELATIONAL DATABASES
Introduction to the Relational Model: Structure of Relational Databases, Database
Schema, Keys, Schema Diagrams, Relational Query Languages, Relational Operations.
Introduction to SQL: Overview of the SQL Query Language, SQL Data Definition,
Basic Structure of SQL Queries, Additional Basic Operations, Set Operations, Null
Values, Aggregate Functions, Nested Subqueries, Modification of the Database.
Unit – II
Intermediate SQL: Join Expressions, Views, Transactions, Integrity Constraints, SQL
Data Types and Schemas, Authorization.
Advanced SQL: Accessing SQL From a Programming Language, Functions and
Procedures, Triggers, Recursive Queries, Advanced Aggregation Features, OLAP.
Formal Relational Query Languages: The Relational Algebra, The Tuple Relational
Calculus, The Domain Relational Calculus.
DATABASE DESIGN
Unit – III
Database Design and the E-R Model: Overview of the Design Process, The
Entity-Relationship Model, Constraints, Removing Redundant Attributes in Entity Sets,
Entity-Relationship Diagrams, Reduction to Relational Schemas, Entity-Relationship Design
Issues, Extended E-R Features, Alternative Notations for Modeling Data, Other Aspects of
Database Design.
Relational Database Design: Features of Good Relational Designs, Atomic Domains and
First Normal Form, Decomposition Using Functional Dependencies,
Functional-Dependency Theory, Algorithms for Decomposition, Decomposition Using Multivalued
Dependencies, More Normal Forms, Database-Design Process, Modeling Temporal Data.
DATA STORAGE AND QUERYING
Unit – IV
Storage and File Structure: Overview of Physical Storage Media, Magnetic Disk and
Flash Storage, RAID, Tertiary Storage, File Organization, Organization of Records in
Files, Data-Dictionary Storage, Database Buffer.
Indexing and Hashing: Basic Concepts, Ordered Indices, B+-Tree Index Files, B+-Tree
Extensions, Multiple-Key Access, Static Hashing, Dynamic Hashing, Comparison of
Ordered Indexing and Hashing, Bitmap Indices, Index Definition in SQL.
Query Processing: Overview, Measures of Query Cost, Selection Operation, Sorting,
Join Operation, Other Operations, Evaluation of Expressions.
TRANSACTION MANAGEMENT
Unit – V
Transactions: Transaction Concept, A Simple Transaction Model, Storage Structure,
Transaction Atomicity and Durability, Transaction Isolation, Serializability, Transaction
Isolation and Atomicity, Transaction Isolation Levels, Implementation of Isolation Levels,
Transactions as SQL Statements.
Concurrency Control : Lock-Based Protocols, Deadlock Handling, Multiple Granularity,
Timestamp-Based Protocols, Validation-Based Protocols, Multiversion Schemes,
Snapshot Isolation, Insert Operations, Delete Operations, and Predicate Reads, Weak
Levels of Consistency in Practice, Concurrency in Index Structures.
Recovery System: Failure Classification, Storage, Recovery and Atomicity, Recovery
Algorithm, Buffer Management, Failure with Loss of Nonvolatile Storage, Early Lock
Release and Logical Undo Operations, ARIES, Remote Backup Systems.
TEXT BOOKS & OTHER REFERENCES
Text Books
1. Database System Concepts, Abraham Silberschatz, Henry F. Korth, S. Sudarshan, 6th Edition, Tata McGraw-Hill.
2. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, 3rd Edition, Tata McGraw-Hill.
Suggested / Reference Books
1. Database Systems: Design, Implementation and Management, Peter Rob & Carlos Coronel, 7th Edition.
2. Fundamentals of Database Systems, Elmasri, Navathe, Pearson Education.
Websites References
1. http://en.wikipedia.org/wiki/Database_normalization
2. http://www.w3schools.com/sql/default.asp
3. http://www.sql-tutorial.net/
4. www.cs.iit.edu/~cs561/cs425/algebra/home.html
5. www.tutorialspoint.com/dbms/
6. www.studytonight.com/dbms/
7. www.dbms2.com
8. https://www.wiziq.com/tutorials/dbms
9. www.cramerz.com/database_concepts/dbms_and_rdbms
10. www.plsqlchallenge.com
Time Table
Room No:
W.E.F: 23/06/2015
Class Hour | Time
1 | 9:00 – 9:50
2 | 9:50 – 10:40
3 | 10:40 – 11:30
4 | 11:30 – 12:20
LUNCH BREAK | 12:20 – 1:00
5 | 1:00 – 1:50
6 | 1:50 – 2:40
7 | 2:40 – 3:30
8 | 3:30 – 4:20
Days: MON, TUE, WED, THU, FRI, SAT
PROGRAM EDUCATIONAL OBJECTIVES (PEO’s)
PEO1
The Graduates are employable as software professionals in reputed industries.
PEO2
The Graduates analyze problems by applying the principles of computer science,
mathematics and scientific investigation to design and implement industry
accepted solutions using latest technologies.
PEO3
The Graduates work productively in supportive and leadership roles on
multidisciplinary teams with effective communication and team work skills with
high regard to legal and ethical responsibilities.
PEO4
The Graduates embrace lifelong learning to meet ever changing developments in
computer science and Engineering.
PROGRAM OUTCOMES (PO’s)
PO1
An ability to communicate effectively and work on multidisciplinary teams
PO2
An ability to identify, formulate and solve computer system problems with
professional and ethical responsibility.
PO3
A recognition of the need for, and an ability to engage in life-long learning to use
the latest techniques, skills and modern engineering tools
PO4
The broad education necessary to understand the impact of engineering solutions
in a global, economic, environmental and social context
PO5
An ability to apply knowledge of mathematics, science, and computing to analyze,
design and implement solutions to the realistic problems.
PO6
An ability to apply suitable process with the understanding of software
development practice.
Course Outcomes:
CO 1
Design Entity-Relationship Model for enterprise level databases.
CO 2
Develop the database and provide restricted access to different users of
database and formulate the Complex SQL queries.
CO 3
Ability to analyze working principles of various protocols implemented by
specific DBMS components.
CO 4
Analyze various Formal Query Languages for Query Optimization and
various Normal forms to carry out Schema refinement.
CO 5
Justify the use of suitable Indices and Hashing mechanisms for real time
implementation.
MAPPING OF COURSE OUTCOMES WITH PO’s & PEO’s
Course Outcomes (CO’s) | Programme Outcomes (PO’s) | Programme Educational Objectives (PEO’s)
CO 1 | PO-5 | PEO-2
CO 2 | PO-5 | PEO-2
CO 3 | PO-5 | PEO-2
CO 4 | PO-6 | PEO-1
CO 5 | PO-6 | PEO-1
COURSE SCHEDULE
Distribution of Hours Unit – Wise
Unit | Topic | Chapters (Book1) | Chapters (Book2) | Total No. of Hours
I | i. Introduction to Database System Concepts; ii. Introduction to the Relation Models; iii. Introduction to SQL | Ch 1,2,3,4,5 | Ch 1,3 | 15
II | i. Intermediate SQL; ii. Advanced SQL; iii. Formal Relational Query Languages | Ch 6,7,8 | Ch 4,5 | 12
III | i. Database Design and the E-R Model; ii. Relational Database Design | Ch 9,10,11,12 | Ch 2,15 | 11
IV | i. Storage and File Structure; ii. Indexing and Hashing; iii. Query Processing | Ch 13,14,15 | Ch 8,9,10 | 12
V | i. Transactions; ii. Concurrency Control; iii. Recovery System | Ch 16,17,18,19 | Ch 18,19 | 15
Contact classes for Syllabus coverage : 65
Tutorial Classes : 05 ; Online Quiz : 1 per unit
Descriptive Tests : 02 (Before Mid Examination)
Revision classes : 1 per unit
Number of Hours / lectures available in this Semester / Year : 65
Lecture Plan
S. No.
Topic
Date of Completion
Unit-1
1
Database-System Applications, Purpose of Database Systems, View of Data, Database Languages
2
Relational Databases, Database Design, Data Storage and Querying, Transaction Management, Database Architecture
3
Data Mining and Information Retrieval, Spatial Databases
4
Database Users and Administrators, History of Database Systems
5
Structure of Relational Databases, Database Schema, Keys
6
Tutorial class
7
Schema Diagrams, Relational Query Languages
8
Relational Operations, Overview of the SQL Query
Language
9
SQL Data Definition, Basic Structure of SQL Queries
10
Tutorial class
11
Additional Basic Operations, Null Values
12
Aggregate Functions, Set Operations
13
Nested Subqueries
14
Modification of the Database
15
Unit I Review
Unit-2
16
Join Expressions
17
Views, Transactions, Integrity Constraints,
18
SQL Data Types and Schemas, Authorization.
19
Accessing SQL From a Programming Language
20
Tutorial class
21
Functions and Procedures ,Triggers
22
Recursive Queries
23
Advanced Aggregation Features, OLAP
24
The Relational Algebra
25
The Tuple Relational Calculus,
26
The Domain Relational Calculus
27
Unit II Review
Unit-3
28
Overview of the Design Process
29
The Entity-Relationship Model, Constraints
30
Removing Redundant Attributes in Entity Sets
31
Entity-Relationship Diagrams, Reduction to Relational Schemas
32
Entity-Relationship Design Issues, Extended E-R Features
33
Tutorial class
34
Features of Good Relational Designs, Atomic Domains and First Normal Form
35
Decomposition Using Functional Dependencies, Functional-Dependency Theory, Algorithms for Decomposition
36
Decomposition Using Multivalued Dependencies
37
More Normal Forms, Database-Design Process, Modeling
Temporal Data.
38
Unit III Review
Unit-4
39
Overview of Physical Storage Media
40
Magnetic Disk and Flash Storage
41
RAID, Tertiary Storage
42
File Organization, Organization of Records in Files
43
Data-Dictionary Storage, Database Buffer.
44
Tutorial class
45
Basic Concepts, Ordered Indices ,B+-Tree Index Files,
B+-Tree Extensions
46
Multiple-Key Access, Static Hashing, Dynamic Hashing
47
Comparison of Ordered Indexing and Hashing, Bitmap
Indices, Index Definition in SQL.
48
Overview, Measures of Query Cost, Selection Operation,
49
Sorting, Join Operation, Other Operations, Evaluation of
Expressions
50
Unit IV Review
Unit-5
51
Transaction Concept, A Simple Transaction Model
52
Storage Structure, Transaction Atomicity and Durability
53
Transaction Isolation, Serializability, Transaction Isolation and Atomicity
54
Transaction Isolation Levels, Implementation of Isolation Levels
55
Tutorial class
56
Lock-Based Protocols, Deadlock Handling, Multiple Granularity
57
Timestamp-Based Protocols, Validation-Based Protocols, Multiversion Schemes, Snapshot Isolation
58
Insert Operations, Delete Operations, and Predicate Reads
59
Weak Levels of Consistency in Practice, Concurrency in
Index Structures.
60
Failure Classification, Storage, Recovery and Atomicity
61
Recovery Algorithm, Buffer Management
62
Tutorial class
63
Failure with Loss of Nonvolatile Storage, Early Lock Release and Logical Undo Operations
64
ARIES, Remote Backup Systems.
65
Unit V Review
Date of Unit Completion & Remarks
Unit – 1
Date : __ / __ / __
Remarks:
________________________________________________________________________
________________________________________________________________________
Unit – 2
Date : __ / __ / __
Remarks:
________________________________________________________________________
________________________________________________________________________
Unit – 3
Date : __ / __ / __
Remarks:
________________________________________________________________________
________________________________________________________________________
Unit – 4
Date : __ / __ / __
Remarks:
________________________________________________________________________
________________________________________________________________________
Unit – 5
Date : __ / __ / __
Remarks:
________________________________________________________________________
________________________________________________________________________
Unit Wise Assignments (With different Levels of thinking (Bloom's Taxonomy))
Note: For every question please mention the level of Bloom's taxonomy
Unit – 1
1. Why would you choose a database system instead of simply storing data in
operating system files? When would it make sense not to use a database system? (L5)
2. What is logical data independence and why is it important? (L1)
3. Explain the difference between external, internal, and conceptual schemas.
How are these different schema layers related to the concepts of logical and physical
data independence? (L5)
4. Consider the following relational schema and briefly answer the questions that
follow: (L1 & L4)
   Emp(eid: integer, ename: string, age: integer, salary: real)
   Works(eid: integer, did: integer, pct_time: integer)
   Dept(did: integer, budget: real, managerid: integer)
   1. Define a table constraint on Emp that will ensure that every employee makes at
   least $10,000.
   2. Define a table constraint on Dept that will ensure that all managers have age > 30.
   3. Define an assertion on Dept that will ensure that all managers have age > 30.
   Compare this assertion with the equivalent table constraint. Explain which is better.
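As an illustration of what a table constraint does in practice, the following is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite as the DBMS (SQLite supports CHECK constraints but not SQL assertions), and the employee names and salaries are made-up sample data, not part of the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Emp with a table-level CHECK constraint on salary (sample schema from the question).
conn.execute("""
    CREATE TABLE Emp (
        eid    INTEGER PRIMARY KEY,
        ename  TEXT,
        age    INTEGER,
        salary REAL,
        CHECK (salary >= 10000)
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO Emp VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((1, "Asha", 35, 12000))   # hypothetical employee, salary allowed
rejected = try_insert((2, "Kiran", 41, 9000))   # hypothetical employee, salary too low
row_count = conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]
print(accepted, rejected, row_count)
```

The constraint is checked by the database on every insert and update, so the rule cannot be bypassed by an application that forgets to validate the salary.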
Unit – 2
1. Consider the following schema: (L4)
   Suppliers(sid: integer, sname: string, address: string)
   Parts(pid: integer, pname: string, color: string)
   Catalog(sid: integer, pid: integer, cost: real)
   The key fields are underlined, and the domain of each field is listed after the field
   name. Thus sid is the key for Suppliers, pid is the key for Parts, and sid and pid
   together form the key for Catalog. The Catalog relation lists the prices charged for
   parts by Suppliers. Write the following queries in relational algebra, tuple relational
   calculus, and domain relational calculus:
   1. Find the names of suppliers who supply some red part.
   2. Find the sids of suppliers who supply some red or green part.
   3. Find the sids of suppliers who supply some red part or are at 221 Packer Ave.
   4. Find the sids of suppliers who supply some red part and some green part.
   5. Find the sids of suppliers who supply every part.
   6. Find the sids of suppliers who supply every red part.
   7. Find the sids of suppliers who supply every red or green part.
   8. Find the sids of suppliers who supply every red part or supply every green part.
   9. Find pairs of sids such that the supplier with the first sid charges more for some part
   than the supplier with the second sid.
   10. Find the pids of parts that are supplied by at least two different suppliers.
   11. Find the pids of the most expensive parts supplied by suppliers named Yosemite
   Sham.
   12. Find the pids of parts supplied by every supplier at less than $200. (If any supplier
   either does not supply the part or charges more than $200 for it, the part is not
   selected.)
2. Write triggers to enforce the referential integrity constraint from section to time_slot,
on updates to section and time_slot. Note that the triggers shown in the textbook do not
cover the update operation. (L4)
3. What is relational completeness? If a query language is relationally complete,
can you write any desired query in that language? (L1)
4. What is an unsafe query? Give an example and explain why it is important
to disallow such queries. (L1)
Unit – 3
1. A university database contains information about professors (identified by social
security number, or SSN) and courses (identified by course id). Professors teach
courses; each of the following situations concerns the Teaches relationship set. For
each situation, draw an ER diagram that describes it (assuming that no further
constraints hold). (L5)
   1. Professors can teach the same course in several semesters, and each offering must
   be recorded.
   2. Professors can teach the same course in several semesters, and only the most recent
   such offering needs to be recorded. (Assume this condition applies in all subsequent
   questions.)
   3. Every professor must teach some course.
   4. Every professor teaches exactly one course (no more, no less).
   5. Every professor teaches exactly one course (no more, no less), and every course
   must be taught by some professor.
   6. Now suppose that certain courses can be taught by a team of professors jointly, but
   it is possible that no one professor in a team can teach the course. Model this situation,
   introducing additional entity sets and relationship sets if necessary.
2. Construct an E-R diagram for a hospital with a set of patients and a set of medical
doctors. Associate with each patient a log of the various tests and examinations
conducted. (L6)
3. What is meant by repetition of information and inability to represent information?
Explain why each of these properties may indicate a bad relational database design. (L1 & L5)
4. Consider the following set F of functional dependencies on the relation schema r(A, B,
C, D, E, F): (L5)
   A → BCD
   BC → DE
   B → D
   D → A
   a. Compute B+.
   b. Prove (using Armstrong’s axioms) that AF is a superkey.
   c. Compute a canonical cover for the above set of functional dependencies F; give
   each step of your derivation with an explanation.
   d. Give a 3NF decomposition of r based on the canonical cover.
   e. Give a BCNF decomposition of r using the original set of functional dependencies.
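The attribute-closure computation that parts (a) and (b) rely on can be sketched in a few lines of Python. This is only an illustrative fixed-point loop, not a substitute for the hand derivation with Armstrong's axioms; the function name closure and the string encoding of attribute sets are assumptions made for the sketch.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("BC", "DE") for BC -> DE.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The set F from the question above.
F = [("A", "BCD"), ("BC", "DE"), ("B", "D"), ("D", "A")]
R = set("ABCDEF")

b_plus = closure("B", F)
af_is_superkey = closure("AF", F) == R
print(sorted(b_plus), af_is_superkey)
```

A set of attributes is a superkey exactly when its closure contains every attribute of the schema, which is why the same routine serves both parts.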
Unit – 4
1. Which of the three basic file organizations would you choose for a file where the most
frequent operations are as follows? (L1)
   1. Search for records based on a range of field values.
   2. Perform inserts and scans where the order of records does not matter.
   3. Search for a record based on a particular field value.
2. What is scrubbing, in the context of RAID systems, and why is scrubbing important? (L1)
3. Suppose that a page can contain at most four data values and that all data
values are integers. Using only B+ trees of order 2, give examples of each of the
following: (L5)
   1. A B+ tree whose height changes from 2 to 3 when the value 25 is inserted. Show
   your structure before and after the insertion.
   2. A B+ tree in which the deletion of the value 25 leads to redistribution. Show your
   structure before and after the deletion.
4. Why is a hash structure not the best choice for a search key on which range queries are
likely? (L1)
Unit – 5
1. Consider the following classes of schedules: serializable, conflict-serializable,
view-serializable, recoverable, avoids-cascading-aborts, and strict. For each of the
following schedules, state which of the above classes it belongs to. If you cannot
decide whether a schedule belongs in a certain class based on the listed actions,
explain briefly. (L5)
The actions are listed in the order they are scheduled, and prefixed with the transaction
name. If a commit or abort is not shown, the schedule is incomplete; assume that
abort/commit must follow all the listed actions.
   1. T1:R(X), T2:R(X), T1:W(X), T2:W(X)
   2. T1:W(X), T2:R(Y), T1:R(Y), T2:R(X)
   3. T1:R(X), T2:R(Y), T3:W(X), T2:R(X), T1:R(Y)
   4. T1:R(X), T1:R(Y), T1:W(X), T2:R(Y), T3:W(Y), T1:W(X), T2:R(Y)
   5. T1:R(X), T2:W(X), T1:W(X), T2:Abort, T1:Commit
   6. T1:R(X), T2:W(X), T1:W(X), T2:Commit, T1:Commit
   7. T1:W(X), T2:R(X), T1:W(X), T2:Abort, T1:Commit
   8. T1:W(X), T2:R(X), T1:W(X), T2:Commit, T1:Commit
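One of the classes above, conflict-serializability, can be checked mechanically with a precedence graph: add an edge Ti → Tj whenever an action of Ti conflicts with a later action of Tj, then test the graph for cycles. The sketch below is a minimal illustration in Python; the function names and the string parser are assumptions, it ignores Commit and Abort actions, and it says nothing about the other classes (view-serializable, recoverable, strict, and so on).

```python
from itertools import combinations

def parse(schedule):
    """Turn 'T1:R(X), T2:W(X)' into [('T1', 'R', 'X'), ('T2', 'W', 'X')]."""
    actions = []
    for step in schedule.split(","):
        txn, action = step.strip().split(":")
        action = action.strip()
        # Keep only reads and writes; Commit/Abort are skipped in this sketch.
        if action[0] in "RW" and action[1] == "(":
            actions.append((txn, action[0], action[2:-1]))
    return actions

def precedence_edges(actions):
    """Edge (Ti, Tj) for each conflicting pair where Ti's action comes first."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(actions, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph of the schedule has no cycle."""
    edges = precedence_edges(parse(schedule))
    nodes = {t for edge in edges for t in edge}
    # Repeatedly remove nodes with no incoming edge (topological sort).
    while nodes:
        sources = {n for n in nodes if not any(v == n for _, v in edges)}
        if not sources:
            return False  # no removable node left, so a cycle exists
        nodes -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True

s1 = "T1:R(X), T2:R(X), T1:W(X), T2:W(X)"
s2 = "T1:W(X), T2:R(Y), T1:R(Y), T2:R(X)"
print(is_conflict_serializable(s1), is_conflict_serializable(s2))
```

For the first schedule the graph contains both T1 → T2 and T2 → T1, so it is not conflict-serializable; for the second the only edge is T1 → T2.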
2. For each of the following locking protocols, assuming that every transaction
follows that locking protocol, state which of these desirable properties are ensured:
serializability, conflict-serializability, recoverability, avoid cascading aborts. (L5)
   1. Always obtain an exclusive lock before writing; hold exclusive locks until
   end-of-transaction. No shared locks are ever obtained.
   2. In addition to (1), obtain a shared lock before reading; shared locks can be released
   at any time.
   3. As in (2), and in addition, locking is two-phase.
3. Under what conditions is it less expensive to avoid deadlock than to allow deadlocks
to occur and then to detect them?
If deadlock is avoided by deadlock-avoidance schemes, is starvation still possible?
Explain your answer.
In multiple-granularity locking, what is the difference between implicit and explicit
locking? (L5)
4. Explain the reasons why recovery of interactive transactions is more difficult to deal
with than is recovery of batch transactions. Is there a simple way to deal with this
difficulty? (Hint: Consider an automatic teller machine transaction in which cash is
withdrawn.) (L5)
Unit Wise Case Studies (With different Levels of thinking (Bloom's Taxonomy))
Note: For every Case Study please mention the level of Bloom's taxonomy
Unit – 1
1. Develop a Database Design for Hospital Management System (L3)
System Description
Aditya hospital is a multi specialty hospital that includes a number of departments, rooms,
doctors, nurses, compounders, and other staff working in the hospital. Patients having different
kinds of ailments come to the hospital and get checkup done from the concerned doctors. If
required they are admitted in the hospital and discharged after treatment.
The aim of this case study is to design and develop a database for the hospital to maintain the
records of various departments, rooms, and doctors in the hospital. It also maintains records of
the regular patients, patients admitted in the hospital, the check up of patients done by the
doctors, the patients that have been operated, and patients discharged from the hospital.
2. Consider the B+ tree index of order d = 2 shown in the figure above.
1. Show the tree that would result from inserting a data entry with key 9 into this tree.
2. Show the B+ tree that would result from inserting a data entry with key 3 into the
original tree. How many page reads and page writes will the insertion require?
3. Show the B+ tree that would result from deleting the data entry with key 8 from the
original tree, assuming that the left sibling is checked for possible redistribution.
4. Show the B+ tree that would result from deleting the data entry with key 8 from the
original tree, assuming that the right sibling is checked for possible redistribution.
5. Show the B+ tree that would result from starting with the original tree, inserting a data
entry with key 46 and then deleting the data entry with key 52.
L1
Unit Wise Multiple Choice Questions for CRT & Competitive Examinations
Unit – I
1. A relational database consists of a collection of
a) Tables
b) Fields
c) Records
d) Keys
Answer:a
2. The term _______ is used to refer to a row.
a) Attribute
b) Tuple
c) Field
d) Instance
Answer:b.
3. For each attribute of a relation, there is a set of permitted values, called the ________ of that attribute.
a) Domain
b) Relation
c) Set
d) Schema
Answer:a.
4. The database __________ is the logical design of the database, and the database _______ is a
snapshot of the data in the database at a given instant in time.
a) Instance, Schema
b) Relation, Schema
c) Relation, Domain
d) Schema, Instance
Answer:d
5. The tuples of the relations can be of ________ order.
a) Any
b) Same
c) Sorted
d) Constant
Answer:a
6. Which one of the following is a set of one or more attributes taken collectively to uniquely identify a record?
a) Candidate key
b) Sub key
c) Super key
d) Foreign key
Answer:c
7. The subset of super key is a candidate key under what condition ?
a) No proper subset is a super key
b) All subsets are super keys
c) Subset is a super key
d) Each subset is a super key
Answer:a
8. An attribute in a relation is a foreign key if the _______ key from another relation is used as an attribute in that
relation.
a) Candidate
b) Primary
c) Super
d) Sub
Answer:b
9. A _________ integrity constraint requires that the values appearing in specified attributes of any tuple in the
referencing relation also appear in specified attributes of at least one tuple in the referenced relation.
a) Referential
b) Referencing
c) Specific
d) Primary
Answer:a
10. Using which language can a user request information from a database ?
a) Query
b) Relational
c) Structural
d) Compiler
Answer:a.
11. Which one of the following is a procedural language ?
a) Domain relational calculus
b) Tuple relational calculus
c) Relational algebra
d) Query language
Answer:c
12. The_____ operation allows the combining of two relations by merging pairs of tuples, one from each relation,
into a single tuple.
a) Select
b) Join
c) Union
d) Intersection
Answer:b
13. The result of which operation contains all pairs of tuples from the two relations, regardless of whether their
attribute values match?
a) Join
b) Cartesian product
c) Intersection
d) Set difference
Answer:b
14. Which one of the following is used to define the structure of relations, delete relations, and relate
schemas?
a) DML (Data Manipulation Language)
b) DDL (Data Definition Language)
c) Query
d) Relational Schema
Answer:b
15. Which one of the following provides the ability to query information from the database and to insert tuples
into, delete tuples from, and modify tuples in the database ?
a) DML (Data Manipulation Language)
b) DDL (Data Definition Language)
c) Query
d) Relational Schema
Answer:a
16. Create table employee (name varchar ,id integer)
What type of statement is this ?
a) DML
b) DDL
c) View
d) Integrity constraint
Answer:b
17. Select * from employee
What type of statement is this?
a) DML
b) DDL
c) View
d) Integrity constraint
Answer:a
18. To remove a relation from an SQL database, we use the ______ command.
a) Delete
b) Purge
c) Remove
d) Drop table
Answer:d
19. Select * from employee where salary>10000 and dept_id=101;
Which of the following fields are displayed as output?
a) Salary, dept_id
b) Employee
c) Salary
d) All the field of employee relation
Answer:d
20. Select emp_name
from department
where dept_name like '_____Computer Science';
Which one of the following has to be added into the blank to select the dept_name which has Computer Science as
its ending string ?
a) %
b) _
c) ||
d) $
Answer:a
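The effect of the % wildcard in question 20 can be seen in a small sketch using Python's built-in sqlite3 module. SQLite is assumed as the DBMS, and the table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT, building TEXT)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("Computer Science", "Taylor"),                 # hypothetical rows
     ("Applied Computer Science", "Watson"),
     ("Computer Science and Engineering", "Main"),
     ("Physics", "Watson")],
)

# '%' matches any substring (including the empty one), so this pattern
# selects every name that ends with 'Computer Science'.
rows = conn.execute(
    "SELECT dept_name FROM department "
    "WHERE dept_name LIKE '%Computer Science' ORDER BY dept_name"
).fetchall()
ends_with = [name for (name,) in rows]
print(ends_with)
```

By contrast, the underscore wildcard matches exactly one character, so it cannot express "any prefix".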
21. In SQL the spaces at the end of the string are removed by _______ function .
a) Upper
b) String
c) Trim
d) Lower
Answer:c
22. If we want to retain all duplicates, we must write ________ in place of union.
a) Union all
b) Union some
c) Intersect all
d) Intersect some
Answer:a
23. _________ joins are SQL server default
a) Outer
b) Inner
c) Equi
d) None of the mentioned
Answer:b
24. In an employee table, which of the following constraints must be used for attributes that must always
have a value?
a) Null
b) Not null
c) Unique
d) Distinct
Answer:b
25. The primary key must be
a) Unique
b) Not null
c) Both a and b
d) Either a or b
Answer:c
26. Select __________
from instructor
where dept_name = 'Comp. Sci.';
Which of the following should be used to find the mean of the salary ?
a) Mean(salary)
b) Avg(salary)
c) Sum(salary)
d) Count(salary)
Answer:b
27. All aggregate functions except _____ ignore null values in their input collection.
a) Count(attribute)
b) Count(*)
c) Avg
d) Sum
Answer:b
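The null-handling rule in question 27 can be observed directly. The following is a minimal sketch with Python's sqlite3 module, assuming SQLite and made-up instructor rows, one of which has a null salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?)",
    [("A", 60000), ("B", 80000), ("C", None)],  # hypothetical rows; C has a NULL salary
)

# COUNT(*) counts rows; the other aggregates skip the NULL salary.
count_all, count_salary, avg_salary, sum_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary), AVG(salary), SUM(salary) FROM instructor"
).fetchone()
print(count_all, count_salary, avg_salary, sum_salary)
```

Note that the average is taken over the two non-null salaries, not over all three rows.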
Unit – II
1. SQL applies predicates in the _______ clause after groups have been formed, so aggregate functions may be
used.
a) Group by
b) With
c) Where
d) Having
Answer:d
2. Aggregate functions can be used in the select list or the_______clause of a select statement or subquery. They
cannot be used in a ______ clause.
a) Where, having
b) Having, where
c) Group by, having
d) Group by, where
Answer:b
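Questions 1 and 2 turn on the difference between the where and having clauses. The sketch below, using Python's sqlite3 module, is one way to see it: where filters individual rows before grouping, while having filters whole groups after aggregation. SQLite is assumed, and the instructor rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("P", "Comp. Sci.", 75000), ("Q", "Comp. Sci.", 92000),   # hypothetical rows
     ("R", "Physics", 87000), ("S", "Physics", 95000),
     ("T", "History", 62000), ("U", "History", 60000)],
)

# WHERE removes rows first; HAVING then removes groups by their aggregate.
rows = conn.execute(
    "SELECT dept_name, AVG(salary) FROM instructor "
    "WHERE salary > 61000 "
    "GROUP BY dept_name "
    "HAVING AVG(salary) > 70000 "
    "ORDER BY dept_name"
).fetchall()
names = [dept for dept, _ in rows]
print(rows)

# An aggregate function inside WHERE is rejected by SQLite.
try:
    conn.execute("SELECT dept_name FROM instructor WHERE AVG(salary) > 70000")
    aggregate_in_where_allowed = True
except sqlite3.Error:
    aggregate_in_where_allowed = False
print(aggregate_in_where_allowed)
```

History survives the where filter with one row (62000) but its group average is below 70000, so having drops it.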
3. The ________ keyword is used to access attributes of preceding tables or subqueries in the from clause.
a) In
b) Lateral
c) Having
d) With
Answer:b
4. Which of the following creates temporary relation for the query on which it is defined ?
a) With
b) From
c) Where
d) Select
Answer:a
5. With max_budget (value) as
(select max(budget)
from department)
select budget
from department, max_budget
where department.budget = max_budget.value;
In the query given above which one of the following is a temporary relation ?
a) Budget
b) Department
c) Value
d) Max_budget
Answer:d
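The query in question 5 can be run as written on a DBMS that supports the with clause. The sketch below uses Python's sqlite3 module (SQLite is an assumption, and the department budgets are made-up) to show that max_budget exists only for the duration of the query that defines it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT, budget REAL)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("Biology", 90000), ("Comp. Sci.", 100000),   # hypothetical rows
     ("Finance", 120000), ("Physics", 70000)],
)

query = """
    WITH max_budget (value) AS (SELECT MAX(budget) FROM department)
    SELECT dept_name, budget
    FROM department, max_budget
    WHERE department.budget = max_budget.value
"""
top = conn.execute(query).fetchall()
print(top)

# The temporary relation is not visible outside the query that defines it.
try:
    conn.execute("SELECT * FROM max_budget")
    visible_afterwards = True
except sqlite3.Error:
    visible_afterwards = False
print(visible_afterwards)
```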
6. Which of the following is not an aggregate function ?
a) Avg
b) Sum
c) With
d) Min
Answer:c
7. The____condition allows a general predicate over the relations being joined.
a) On
b) Using
c) Set
d) Where
Answer:a
8. Which of the following join operations does not preserve unmatched tuples?
a) Left outer join
b) Right outer join
c) Inner join
d) Natural join
Answer:c
9. What type of join is needed when you wish to include rows that do not have matching values?
a) Equi-join
b) Natural join
c) Outer join
d) All of the mentioned
Answer:c
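The difference between inner and outer joins in questions 8 and 9 is easiest to see on a row that has no match. This is a minimal sketch with Python's sqlite3 module; SQLite is assumed, and the student and takes rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE takes (id INTEGER, course_id TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Zhang"), (2, "Shankar"), (3, "Snow")])  # Snow takes no course
conn.executemany("INSERT INTO takes VALUES (?, ?)",
                 [(1, "CS-101"), (2, "CS-190")])

inner = conn.execute(
    "SELECT name, course_id FROM student "
    "JOIN takes ON student.id = takes.id ORDER BY name"
).fetchall()

outer = conn.execute(
    "SELECT name, course_id FROM student "
    "LEFT OUTER JOIN takes ON student.id = takes.id ORDER BY name"
).fetchall()

print(inner)  # the unmatched student is dropped
print(outer)  # the unmatched student is kept, padded with a null
```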
10. Which of the following creates a virtual relation for storing the query ?
a) Function
b) View
c) Procedure
d) None of the mentioned
Answer:b
11. In order to undo the work of transaction after last commit which one should be used ?
a) View
b) Commit
c) Rollback
d) Flashback
Answer:c
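The effect of rollback in question 11 can be sketched with Python's sqlite3 module, which opens a transaction implicitly before a data-modifying statement in its default mode. SQLite and the account table are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500)")
conn.commit()  # the balance of 500 is now the last committed state

def balance():
    return conn.execute(
        "SELECT balance FROM account WHERE name = 'A'"
    ).fetchone()[0]

# This update belongs to a new, still uncommitted transaction.
conn.execute("UPDATE account SET balance = balance - 200 WHERE name = 'A'")
before_rollback = balance()

# Rollback undoes all work done since the last commit.
conn.rollback()
after_rollback = balance()
print(before_rollback, after_rollback)
```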
12. A transaction that completes its execution is said to be
a) Committed
b) Aborted
c) Rolled back
d) Failed
Answer:a
13. Foreign key is the one in which the ________ of one relation is referenced in another relation.
a) Foreign key
b) Primary key
c) References
d) Check constraint
Answer:b
14. Domain constraints, functional dependency and referential integrity are special forms of _________.
a) Foreign key
b) Primary key
c) Assertion
d) Referential constraint
Answer:c
15. An ________ on an attribute of a relation is a data structure that allows the database system to find those
tuples in the relation that have a specified value for that attribute efficiently, without scanning through all the
tuples of the relation.
a) Index
b) Reference
c) Assertion
d) Timestamp
Answer:a
16. Which of the following is used to input the entry and give the result in a variable in a procedure ?
a) Put and get
b) Get and put
c) Out and In
d) In and out
Answer:d
17. A __________ is a special kind of stored procedure that executes in response to certain actions on a table, such as
insertion, deletion or update of data.
a) Procedures
b) Triggers
c) Functions
d) None of the mentioned
Answer:b
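A trigger of the kind described in question 17 can be sketched as follows with Python's sqlite3 module. The trigger syntax shown is SQLite's, which is an assumption here (other systems differ in detail), and the table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, salary REAL)")
conn.execute(
    "CREATE TABLE salary_log (emp_id INTEGER, old_salary REAL, new_salary REAL)"
)

# The trigger fires automatically after each salary update on employee.
conn.execute("""
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_log VALUES (OLD.id, OLD.salary, NEW.salary);
    END
""")

conn.execute("INSERT INTO employee VALUES (1, 30000)")
conn.execute("UPDATE employee SET salary = 35000 WHERE id = 1")

log = conn.execute("SELECT * FROM salary_log").fetchall()
print(log)
```

No application code writes to salary_log; the row appears because the update on employee activated the trigger.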
18. Triggers are supported in
a) Delete
b) Update
c) Views
d) All of the mentioned
Answer:c
19. Ranking of queries is done by which of the following ?
a) Group by
b) Order by
c) Having
d) Both a and b
Answer:b
20. Which of the following is a fundamental operation in relational algebra ?
a) Set intersection
b) Natural join
c) Assignment
d) None of the mentioned
Answer:d
Unit – III
1. An ________ is a set of entities of the same type that share the same properties, or attributes .
a) Entity set
b) Attribute set
c) Relation set
d) Entity model
Answer:a
2. The function that an entity plays in a relationship is called that entity’s _____________.
a) Participation
b) Position
c) Role
d) Instance
Answer:c
3. _____________, express the number of entities to which another entity can be associated via a relationship set.
a) Mapping Cardinality
b) Relational Cardinality
c) Participation Constraints
d) None of the mentioned
Answer:a
4. Data integrity constraints are used to:
a) Control who is allowed access to the data
b) Ensure that duplicate records are not entered into the table
c) Improve the quality of data entered for a specific property
d) Prevent users from changing the values stored in the table
Answer:c
5. Which one of the following uniquely identifies the elements in the relation?
a) Secondary Key
b) Primary key
c) Foreign key
d) Composite key
Answer:b
6. ____________ is preferred method for enforcing data integrity
a) Constraints
b) Stored Procedure
c) Triggers
d) Cursors
Answer:a
7. The entity relationship set is represented in E-R diagram as
a) Double diamonds
b) Undivided rectangles
c) Dashed lines
d) Diamond
Answer:d
8. The Rectangles divided into two parts represents
a) Entity set
b) Relationship set
c) Attributes of a relationship set
d) Primary key
Answer:a
9. An entity set that does not have sufficient attributes to form a primary key is termed a __________ .
a) Strong entity set
b) Variant set
c) Weak entity set
d) Variable set
Answer:c
10. Given the basic ER and relational models, which of the following is INCORRECT?
a) An attribute of an entity can have more than one value
b) An attribute of an entity can be composite
c) In a row of a relational table, an attribute can have more than one value
d) In a row of a relational table, an attribute can have exactly one value or a NULL value
Answer:c
11. Which of the following indicates the maximum number of entities that can be involved in a relationship?
a) Minimum cardinality
b) Maximum cardinality
c) ERD
d) Greater Entity Count
Answer:b
12. The entity set person is classified as student and employee. This process is called
a) Generalization
b) Specialization
c) Inheritance
d) Constraint generalization
Answer:b
13. Functional dependencies are a generalization of
a) Key dependencies
b) Relation dependencies
c) Database dependencies
d) None of the mentioned
Answer:a
14. In the __________ normal form, a composite attribute is converted to individual attributes.
a) First
b) Second
c) Third
d) Fourth
Answer:a
15. A table on the many side of a one to many or many to many relationship must:
a) Be in Second Normal Form (2NF)
b) Be in Third Normal Form (3NF)
c) Have a single attribute key
d) Have a composite key
Answer:d
16. Tables in second normal form (2NF):
a) Eliminate all hidden dependencies
b) Eliminate the possibility of insertion anomalies
c) Have a composite key
d) Have all non key fields depend on the whole primary key
Answer:d
17. A relation is in ____________ if an attribute of a composite key is dependent on an attribute of other
composite key.
a) 2NF
b) 3NF
c) BCNF
d) 1NF
Answer:b
18. If a multivalued dependency holds and is not implied by the corresponding functional dependency, it usually
arises from one of the following sources.
a) A many-to-many relationship set
b) A multivalued attribute of an entity set
c) A one-to-many relationship set
d) Both a and b
Answer:d
UNIT----IV
1. Which of the following is a physical storage media ?
a) Tape Storage
b) Optical Storage
c) Flash memory
d) All of the mentioned
Answer:d
2. The _________ is the fastest and most costly form of storage, which is relatively small; its use is managed by
the computer system hardware.
a) Cache
b) Disk
c) Main memory
d) Flash memory
Answer:a
3. There are “record-once” versions of compact disk and digital video disk, which can be written only once; such
disks are also called __________ disks.
a) Write-once, read-many (WORM)
b) CD-R
c) DVD-W
d) CD-ROM
Answer:a
4. Tape storage is referred to as __________ storage.
a) Direct-access
b) Random-access
c) Sequential-access
d) All of the mentioned
Answer:c
5. A __________ is the smallest unit of information that can be read from or written to the disk.
a) Track
b) Spindle
c) Sector
d) Platter
Answer:c
6. The disk platters mounted on a spindle and the heads mounted on a disk arm are together known as
___________.
a) Read-disk assemblies
b) Head–disk assemblies
c) Head-write assemblies
d) Read-read assemblies
Answer:b
7. _________ is the time from when a read or write request is issued to when data transfer begins.
a) Access time
b) Average seek time
c) Seek time
d) Rotational latency time
Answer:a
8. Which level of RAID refers to disk mirroring with block striping?
a) RAID level 1
b) RAID level 2
c) RAID level 0
d) RAID level 3
Answer:a
9. ___________ is popular for applications such as storage of log files in a database system, since it offers the best
write performance.
a) RAID level 1
b) RAID level 2
c) RAID level 0
d) RAID level 3
Answer:a
10. ______________ which increases the number of I/O operations needed to write a single logical block, pays a
significant time penalty in terms of write performance.
a) RAID level 1
b) RAID level 2
c) RAID level 5
d) RAID level 3
Answer:c
11. Tertiary storage is built with :
a) a lot of money
b) unremovable media
c) removable media
d) secondary storage
Answer:c
12. The file organization which allows us to read records that would satisfy the join condition by using one block
read is
a) Heap file organization
b) Sequential file organization
c) Clustering file organization
d) Hash file organization
Answer: c
13. A relational database system needs to maintain data about the relations, such as the schema of the relations.
This is called
a) Metadata
b) Catalog
c) Log
d) Dictionary
Answer:a
14. The _______ is that part of main memory available for storage of copies of disk blocks.
a) Buffer
b) Catalog
c) Storage
d) Secondary storage
Answer:a
15. The purpose of an N-Ary association is:
a) To capture a parent-child relationship
b) To deal with one to many relationships
c) To deal with relationships that involve more than two tables
d) To represent an inheritance relationship
Answer: c
16. Bitmap indices are a specialized type of index designed for easy querying on ___________.
a) Bit values
b) Binary digits
c) Multiple keys
d) Single keys
Answer: c
17. A _______ on the attribute A of relation r consists of one bitmap for each value that A can take.
a) Bitmap index
b) Bitmap
c) Index
d) Array
Answer: a
18. _______ can be combined with regular _______ indices for relations where a few attribute values are extremely
common, and other values also occur, but much less frequently.
a) Bitmap, B-tree
b) Bitmap, B+tree
c) B-tree, Bitmap
d) B+tree, Bitmap
Answer: b
19. In a B+-tree index ______, for each value we would normally maintain a list of all records with that value for
the indexed attribute.
a) Leaf
b) Node
c) Root
d) Link
Answer: a
20. How many types of indexes are there in SQL Server?
a) 1
b) 2
c) 3
d) 4
Answer: b
UNIT----V
1. A transaction is delimited by statements (or function calls) of the form __________.
a) Begin transaction and end transaction
b) Start transaction and stop transaction
c) Get transaction and post transaction
d) Read transaction and write transaction
Answer:a
2. Identify the characteristics of transactions
a) Atomicity
b) Durability
c) Isolation
d) All of the mentioned
Answer:d
3. In SQL, which command is used to issue multiple CREATE TABLE, CREATE VIEW and GRANT statements
in a single transaction?
a) CREATE PACKAGE
b) CREATE SCHEMA
c) CREATE CLUSTER
d) All of the mentioned
Answer: b
4. The unit of storage that can store one or more records in a hash file organization is
a) Buckets
b) Disk pages
c) Blocks
d) Nodes
Answer:a
5. A ______ file system is software that enables multiple computers to share file storage while maintaining
consistent space allocation and file content.
a) Storage
b) Tertiary
c) Secondary
d) Cluster
Answer:d
6. If a transaction is performed in a database and committed, the changes are taken back to the previous state of
the transaction by
a) Flashback
b) Rollback
c) Both a and b
d) Cannot be done
Answer:d
7. _______ means that data used during the execution of a transaction cannot be used by a second transaction until
the first one is completed.
a) Serializability
b) Atomicity
c) Isolation
d) Time stamping
Answer:c
8. Some of the utilities of DBMS are _____________
i) Loading ii) Backup iii) File organization iv) Process Organization
a) i, ii, and iv only
b) i, ii and iii only
c) ii, iii and iv only
d) All i, ii, iii, and iv
Answer: b
9. In order to maintain transactional integrity and database consistency, what technology does a DBMS deploy?
a) Triggers
b) Pointers
c) Locks
d) Cursors
Answer:c
10. A lock that allows concurrent transactions to access different rows of the same table is known as a
a) Database-level lock
b) Table-level lock
c) Page-level lock
d) Row-level lock
Answer:d
11. Which of the following are introduced to reduce the overheads caused by the log-based recovery?
a) Checkpoints
b) Indices
c) Deadlocks
d) Locks
Answer:a
12. Which of the following protocols ensures conflict serializability and safety from deadlocks?
a) Two-phase locking protocol
b) Time-stamp ordering protocol
c) Graph based protocol
d) Both (a) and (b) above
Answer:b
13. In a granularity hierarchy the highest level represents the
a) Entire database
b) Area
c) File
d) Record
Answer: a
14. If a node is locked in ___________ , explicit locking is being done at a lower level of the tree, but with only
shared-mode locks.
a) Intention lock modes
b) Intention-shared-exclusive mode
c) Intention-exclusive (IX) mode
d) Intention-shared (IS) mode
Answer: d
15. ____________ denotes the largest timestamp of any transaction that executed write(Q) successfully.
a) W-timestamp(Q)
b) R-timestamp(Q)
c) RW-timestamp(Q)
d) WR-timestamp(Q)
Answer: a
16. ARIES uses a ___________ to identify log records, and stores it in database pages.
a) Log sequence number
b) Log number
c) Lock number
d) Sequence
Answer: a
17. ______________ is used to minimize unnecessary redos during recovery.
a) Dirty page table
b) Page table
c) Dirty redo
d) All of the mentioned
Answer: a
18. The remote backup site is sometimes also called the
a) Primary Site
b) Secondary Site
c) Tertiary Site
d) None of the mentioned
Answer: b
19. Remote backup system must be _________ with the primary site.
a) Synchronised
b) Separated
c) Connected
d) Detached but related
Answer: a
20. The backup is taken by
a) Erasing all previous records
b) Entering the new records
c) Sending all log records from primary site to the remote backup site
d) Sending selected records from primary site to the remote backup site
Answer: c
University Question Papers
R09
Code No: R09220502
Set No. 2
II B.Tech II Semester Examinations, APRIL 2011
DATA BASE MANAGEMENT SYSTEMS
Common to ME, IT, MECT, AME, CSE, ECE
Time: 3 hours
Max Marks: 75
Answer any FIVE Questions
All Questions carry equal marks
1. (a) Briefly explain the Database Design process.
(b) Define these terms: Entity, Entity set, Attribute, Key.
[7+8]
2. Explain schema refinement in Database Design.
[15]
3. (a) Write a note on DBMS? Explain Database System Applications.
(b) What is a Data Model? Explain ER data model.
[7+8]
4. During its execution, a transaction passes through several states, until it finally
commits or aborts. List all possible sequences of states through which a transaction
may pass. Explain why each state transition may occur.
[15]
5. (a) How is Data organized in Tree based index?
(b) When would users use a Tree based index?
[7+8]
6. Explain ARIES.
[15]
7. (a) Consider the following Relations
Student (snum: integer, sname: string, major: string, level: string, age: integer)
Class (name: string, meets_at: time, room: string, fid: integer)
Enrolled (snum: integer, cname: string)
Faculty (fid: integer, fname: string, deptid: integer)
Write the following queries in SQL.
i. Find the names of students not enrolled in any class.
ii. Find the names of students enrolled in the maximum number of classes.
iii. Print the level and the average age of students for that level, for each level.
iv. Print the level and the average age of the students for that level, for all
levels except JR.
(b) Explain following in brief
i. Triggers
ii. Assertions
[11+4]
8. Consider the following Schema:
Suppliers (sid: integer, sname: string, address: string)
Parts (pid: integer, pname: string, color: string)
www.jntuworld.com
Catalog (sid: integer, pid: integer, cost: real)
The key fields are underlined. The Catalog relation lists the prices charged for parts by
suppliers. Write the following queries in Relational Algebra and tuple relational
calculus.
(a) Find the sids of suppliers who supply some red part and some green part.
(b) Find the sids of suppliers who supply every part.
(c) Find the sids of suppliers who supply every red or green part.
(d) Find the pids of parts supplied by at least two different suppliers.
[15]
Code No: R09220502
R09
Set No. 4
II B.Tech II Semester Examinations, APRIL 2011
DATA BASE MANAGEMENT SYSTEMS
Common to ME, IT, MECT, AME, CSE, ECE
Time: 3 hours
Max Marks: 75
Answer any FIVE Questions
All Questions carry equal marks
1. (a) Explain how to differentiate attributes in Entity set?
(b) Explain all the functional dependencies in Entity sets?
[7+8]
2. (a) Discuss about DDL and DML.
(b) What are five main functions of Database Administrator? Explain.
[8+7]
3. (a) Construct an ER diagram for a bank Database. Bank maintains data about
customers, their loans, their deposits, lockers. Determine the entities and
relationships.
(b) Define the terms: Entity Set, Role, Relationship set, Aggregation.
[7+8]
4. (a) Explain dynamic Data structure?
(b) Explain overflow of pages?
[7+8]
5. (a) Explain the basic form of a SQL query.
(b) Write the following queries in SQL for the following schema.
Sailors (sid: integer, sname: string, rating: integer, age: real)
Boats (bid: integer, bname: string, color: string)
Reserves (sid: integer, bid: integer, day: date)
i. Compute increments for the ratings of persons who have sailed two different boats on the same day.
ii. Find the ages of sailors whose names begin and end with B and have at
least three characters.
iii. Find the colors of boats reserved by Raghu.
iv. Find the sids of all sailors who have reserved red boats but not green
boats.
[7+8]
6. (a) Define the division operation in terms of the basic relational algebra operations.
Describe a typical query that calls for division. Unlike join, the division operation is
not given special treatment in database systems. Explain why.
(b) Database Systems use some variant of Relational Algebra to represent query
evaluation plans. Explain why Algebra is suitable for this purpose. [7+8]
7. (a) What is Thomas write rule?
(b) Explain the time-stamp ordering protocol?
[7+8]
8. Explain advanced recovery Techniques?
[15]
Code No: R09220502
R09
Set No. 1
II B.Tech II Semester Examinations, APRIL 2011
DATA BASE MANAGEMENT SYSTEMS
Common to ME, IT, MECT, AME, CSE, ECE
Time: 3 hours
Max Marks: 75
Answer any FIVE Questions
All Questions carry equal marks
1. (a) Give a note on storage manager component of database system structure.
(b) Make a comparison between Database system and File system.
[7+8]
2. (a) How is Data stored in External Storage?
(b) Explain file organization & indexing?
[7+8]
3. (a) Construct an ER diagram for university registrar's office. The office maintains data
about each class, including the instructor, the enrollment and the time and place
of the class meetings. For each student class pair a grade is recorded.
Determine the entities and relationships.
(b) What is the composite attribute? How to model it in the ER diagram? Explain
with an example.
[7+8]
4. Explain the Difference between three storage types, Volatile, Non-Volatile and Stable in terms of I/O cost.
[15]
5. (a) Discuss about Tuple Relational Calculus in detail.
(b) Write the following queries in Tuple Relational Calculus for following Schema.
Sailors (sid: integer, sname: string, rating: integer, age: real)
Boats (bid: integer, bname: string, color: string)
Reserves (sid: integer, bid: integer, day: date)
i. Find the names of sailors who have reserved a red boat
ii. Find the names of sailors who have reserved at least one boat
iii. Find the names of sailors who have reserved at least two boats
iv. Find the names of sailors who have reserved all boats.
[7+8]
6. (a) Explain constraints on an Entity set.
(b) Explain constraints on a Relationship set.
[7+8]
7. (a) Write the following queries in SQL using Nested queries concept for following
Schema.
Sailors (sid: integer, sname: string, rating: integer, age: real)
Boats (bid: integer, bname: string, color: string)
Reserves (sid: integer, bid: integer, day: date)
i. Find the names of sailors who have reserved both red and green boat
ii. Find the names of sailors who have reserved all boats
iii. Find the names of sailors who have not reserved red boat
iv. Find sailors whose rating is better than some sailor called raghu.
(b) What is a correlated nested query? Explain with an example.
[11+4]
8. Consider the following two transactions:
T1: read(A);
read(B);
if A=0 then B:= B + 1;
write(B).
T2: read(B);
read(A);
if B=0 then A:= A + 1;
write(A).
Let the consistency requirement be A=0 V B=0, with A=B=0 the initial values.
(a) Show that every serial execution involving these two transactions preserves the
consistency of the Database?
(b) Show a concurrent execution of T1 and T2 that produces a non-serializable
Schedule?
(c) Is there a concurrent execution of T1 and T2 that produces a serializable
Schedule?
[5+5+5]
Code No: R09220502
R09
Set No. 3
II B.Tech II Semester Examinations, APRIL 2011
DATA BASE MANAGEMENT SYSTEMS
Common to ME, IT, MECT, AME, CSE, ECE
Time: 3 hours
Max Marks: 75
Answer any FIVE Questions
All Questions carry equal marks
1. (a) Explain covering constraints & overlap constraints.
(b) Give a detail note on weak entity set.
[7+8]
2. (a) Explain functional dependency with an example?
(b) Compare Third NF and BCNF, explain with examples?
[11+4]
3. (a) What is the relationship between files & indexes?
(b) What is the search key for an index?
(c) What is Data entry in an index?
[7+4+4]
4. (a) Explain the Database users and user interfaces.
(b) Discuss the function of Database Administrator.
[9+6]
5. (a) Discuss about joins in Relational Algebra with examples.
(b) Explain about set operations in Relational Algebra with examples.
[7+8]
6. Explain shadow-copy technique for Atomicity and Durability?
[15]
7. (a) Consider the following Relations
Student (snum: integer, sname: string, major: string, level: string, age: integer)
Class (name: string, meets_at: time, room: string, fid: integer)
Enrolled (snum: integer, cname: string)
Faculty (fid: integer, fname: string, deptid: integer)
Write the following queries in SQL.
i. Find the names of all juniors (level = JR) who are enrolled in a class
taught by I. teach.
ii. Find the age of the oldest student who is either a History major or
enrolled in a course taught by I. teach.
iii. Find the names of all classes that either meet in room R128 or have
five or more students enrolled.
iv. Find the number of all students who are enrolled in two classes that
meet at the same time.
(b) What is a trigger and what are its 3 parts. Explain in detail.
[11+4]
8. Stable storage cannot be implemented. Explain why.
[15]
Tutorial Sheet
Unit-I
Topics Revised
Date:
Quick Test Topics
Date:
Case Study Discussed
Date:
Unit-II
Topics Revised
Date:
Quick Test Topics
Date:
Case Study Discussed
Date:
Unit-III
Topics Revised
Date:
Quick Test Topics
Date:
Case Study Discussed
Date:
Unit-IV
Topics Revised
Date:
Quick Test Topics
Date:
Case Study Discussed
Date:
Unit-V
Topics Revised
Date:
Quick Test Topics
Date:
Case Study Discussed
Date:
TOPICS BEYOND SYLLABUS
1) Introduction to Oracle Reports 9i and Oracle forms 9i
2) Query Optimization.
Add-on Programmes:
1. MySQL Certification through FOSS Programme
2
3
4
Guest Lectures:
1. Workshop on IBM DB2
2.
3.
4.
Unit Wise PPT’s:
Unit Wise lecture Notes:
What is a Database?
To find out what a database is, we have to start from data, which is the basic building block of any DBMS.
Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc).
Record: Collection of related data items. In the above example the three data items have no meaning on their
own, but if we organize them in the following way, they collectively represent meaningful information.
Roll   Name   Age
1      ABC    19
Table or Relation: Collection of related records.
Roll   Name   Age
1      ABC    19
2      DEF    22
3      XYZ    28
The columns of this relation are called Fields, Attributes or Domains. The rows are called Tuples or Records.
Database: Collection of related relations. Consider the following collection of tables:
T1
Roll   Name   Age
1      ABC    19
2      DEF    22
3      XYZ    28
T2
Roll   Address
1      KOL
2      DEL
3      MUM
T3
Roll   Year
1      I
2      II
3      I
T4
Year   Hostel
I      H1
II     H2
We now have a collection of 4 tables. They can be called a “related collection” because certain pairs of
tables share common attributes: Roll appears in T1, T2 and T3, and Year appears in T3 and T4. Because of these
common attributes we may combine the data of two or more tables to find out the complete details of a student.
Questions like “Which hostel does the youngest student live in?” can be answered now,
although the Age and Hostel attributes are in different tables.
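The hostel question above can be answered by following the common attributes from one table to the next. The following is only a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the table and column names are taken from the tables above, while the helper names build_sample_db and hostel_of_youngest are illustrative assumptions.

```python
import sqlite3


def build_sample_db():
    """Create the sample tables T1..T4 in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
        CREATE TABLE T2 (Roll INTEGER PRIMARY KEY, Address TEXT);
        CREATE TABLE T3 (Roll INTEGER PRIMARY KEY, Year TEXT);
        CREATE TABLE T4 (Year TEXT PRIMARY KEY, Hostel TEXT);
        INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
        INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');
        INSERT INTO T3 VALUES (1, 'I'), (2, 'II'), (3, 'I');
        INSERT INTO T4 VALUES ('I', 'H1'), ('II', 'H2');
    """)
    return conn


def hostel_of_youngest(conn):
    """Follow Roll (T1 to T3) and Year (T3 to T4) to reach the hostel."""
    return conn.execute("""
        SELECT T1.Name, T4.Hostel
        FROM T1
        JOIN T3 ON T3.Roll = T1.Roll
        JOIN T4 ON T4.Year = T3.Year
        ORDER BY T1.Age ASC
        LIMIT 1
    """).fetchone()


conn = build_sample_db()
print(hostel_of_youngest(conn))  # prints ('ABC', 'H1')
```

Age lives in T1 and Hostel in T4, so the query needs T3 as a bridge: Roll links T1 to T3, and Year links T3 to T4.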
In a database, data is organized strictly in row and column format. The rows are called Tuples or Records. The
data items within one row may belong to different data types. On the other hand, the columns are often
called Domains or Attributes. All the data items within a single attribute are of the same data type.
What is Management System?
A management system is a set of rules and procedures which help us to create, organize and manipulate the
database. It also helps us to add, modify and delete data items in the database. The management system can be
either manual or computerized.
The management system is important because without the existence of some kind of rules and regulations it is
not possible to maintain the database. We have to select the particular attributes which should be included in a
particular table; the common attributes to create relationship between two tables; if a new record has to be
inserted or deleted then which tables should have to be handled etc. These issues must be resolved by having
some kind of rules to follow in order to maintain the integrity of the database.
Views of Data
We know that the same thing, viewed from different angles, produces different sights. Likewise, the
database that we have created can reveal different aspects when seen from different levels of
abstraction. The term Abstraction is very important here. Generally it means the amount of detail you want to
hide. Any entity can be seen from different perspectives and at different levels of complexity, each of which
hides a different amount of detail. Let us illustrate with a simple example.
A computer reveals the minimum of its internal details when seen from outside. We do not know what parts it
is built with. This is the highest level of abstraction, meaning very few details are visible. If we open the
computer case and look inside at the hard disc, motherboard, CD drive, CPU and RAM, we are at the middle level
of abstraction. If we move on to open the hard disc and examine its tracks, sectors and read-write heads, we
are at the lowest level of abstraction, where no details are hidden.
In the same manner, the database can also be viewed from different levels of abstraction to reveal different
levels of detail. Working bottom-up, we find that there are three levels of abstraction or views in
the database. We discuss them here.
The word schema means arrangement – how we want to arrange things that we have to store. The diagram
above shows the three different schemas used in DBMS, seen from different levels of abstraction.
The lowest level, called the Internal or Physical schema, deals with the description of how raw data items
(like 1, ABC, KOL, H2 etc.) are stored in the physical storage (Hard Disc, CD, Tape Drive etc.). It also
describes the data type of these data items, the size of the items in the storage media, the location (physical
address) of the items in the storage device and so on. This schema is useful for database application
developers and database administrators.
The middle level is known as the Conceptual or Logical Schema, and deals with the structure of the entire
database. Please note that at this level we are no longer interested in the raw data items; we are interested
in the structure of the database. This means we want to know the attributes of each
table, the common attributes in different tables that allow them to be combined, what kind of data can be
entered into these attributes, and so on. The Conceptual or Logical schema is very useful for database
administrators, whose responsibility is to maintain the entire database.
The highest level of abstraction is the External or View Schema. This is targeted at the end users. An
end user does not need to know everything about the structure of the entire database, only the amount of
detail he/she needs to work with. We may not want to confuse the end user with an overwhelming amount
of detail by letting him/her look at the entire database, or we may disallow this for
security reasons, where sensitive information must remain hidden from unauthorized persons. The database
administrator may want to create custom-made tables, keeping in mind the specific needs of each user.
These tables are also known as virtual tables, because they have no separate physical existence. They are
created dynamically for the users at runtime. For example, suppose that in the sample database created earlier
we have a special officer whose responsibility is to keep in touch with the parents of any underage student
living in the hostels. That officer does not need to know every detail, only the Roll, Name,
Address and Age. The database administrator may create a virtual table with only these four attributes, only
for the use of this officer.
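In SQL, such a virtual table is usually defined as a view. The sketch below is one possible illustration, assuming Python's built-in sqlite3 module and the sample tables T1 and T2 from the notes; the view name officer_view and the helper officer_rows are hypothetical, and the replacement address 'CHE' is made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE T2 (Roll INTEGER PRIMARY KEY, Address TEXT);
    INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
    INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');

    -- The view stores only its defining query, not a copy of the rows.
    CREATE VIEW officer_view AS
        SELECT T1.Roll, T1.Name, T2.Address, T1.Age
        FROM T1 JOIN T2 ON T2.Roll = T1.Roll;
""")


def officer_rows(conn):
    """Read the four attributes the officer is allowed to see."""
    return conn.execute(
        "SELECT Roll, Name, Address, Age FROM officer_view ORDER BY Roll"
    ).fetchall()


print(officer_rows(conn))

# A later change to a base table is visible through the view,
# because the view is evaluated when it is queried.
conn.execute("UPDATE T2 SET Address = 'CHE' WHERE Roll = 3")
print(officer_rows(conn)[-1])  # prints (3, 'XYZ', 'CHE', 28)
```

The officer queries officer_view as if it were a table, while attributes that are not part of the view (such as Year or Hostel in the other tables) stay out of sight.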
Data Independence
This brings us to our next topic: data independence. It is the property of the database which tries to ensure that
if we make a change at any level of the schema, the schema immediately above it requires
minimal or no change.
What does this mean? We know that in a building, each floor stands on the floor below it. If we change the
design of any one floor, e.g. extending the width of a room by demolishing the western wall of that room, it is
likely that the design of the floors above will have to be changed also. As a result, one change needed in one
particular floor would mean continuing to change the design of each floor until we reach the top floor, with an
increase in time, cost and labour. Would not life be easier if the change could be contained in one floor
only? Data independence is the answer to this. It removes the additional work needed to
propagate a single change into all the levels above.
Data independence can be classified into the following two types:
Physical Data Independence: This means that for any change made in the physical schema, the need to
change the logical schema is minimal. This is practically easier to achieve. Let us explain with an example.
Say, you have bought an Audio CD of a recently released film and one of your friends has bought an Audio
Cassette of the same film. If we consider the physical schema, they are entirely different. The first is a digital
recording on an optical medium, where random access is possible. The second is a magnetic recording on a
magnetic medium, with strictly sequential access. However, how this difference is reflected in the logical schema
is very interesting. For music tracks, the logical schema for both the CD and the Cassette is the title card
printed on their back. We have information like Track no, Name of the Song, Name of the Artist and Duration of
the Track, things which are identical for both the CD and the Cassette. We can clearly say that we have achieved
physical data independence here.
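A database analogue of the CD and cassette example is adding an index: the way the data is reached on disk changes, but the table definition and the query text do not. The sketch below illustrates this under stated assumptions: it uses Python's built-in sqlite3 module and the sample table T1, and the index name idx_t1_age is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
""")

# The logical-level query; it never mentions how the rows are stored.
QUERY = "SELECT Roll, Name FROM T1 WHERE Age > 20 ORDER BY Roll"

before = conn.execute(QUERY).fetchall()

# Physical-level change: add an index on Age. The table definition and
# the query text stay exactly as they were.
conn.execute("CREATE INDEX idx_t1_age ON T1 (Age)")

after = conn.execute(QUERY).fetchall()

print(before == after)  # prints True
```

Whether the engine actually uses the new index for this query is its own decision; the point of the sketch is only that the answer is the same either way.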
Logical Data Independence: This means that for any change made in the logical schema, the need to change
the external schema is minimal. As we shall see, this is a little difficult to achieve. Let us explain with an
example.
Suppose the CD you have bought contains 6 songs, and some of your friends are interested in copying some of
those songs (which they like in the film) into their favorite collection. One friend wants the songs 1, 2, 4, 5, 6,
another wants 1, 3, 4, 5 and another wants 1, 2, 3, 6. Each of these collections can be compared to a view
schema for that friend. Now by some mistake, a scratch has appeared in the CD and you cannot extract the
song 3. Obviously, you will have to ask the friends who have song 3 in their proposed collection to alter their
view by deleting song 3 from their proposed collection as well.
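The scratched CD shows a case where a view has to change. The favourable case, where a logical-schema change leaves a view untouched, can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module and the sample table T1; the view name name_list, the helper read_view and the new Phone attribute are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
    CREATE VIEW name_list AS SELECT Roll, Name FROM T1;
""")


def read_view(conn):
    """Read the external view exactly as an end user would."""
    return conn.execute(
        "SELECT Roll, Name FROM name_list ORDER BY Roll"
    ).fetchall()


before = read_view(conn)

# Logical-level change: the table gains a new attribute.
conn.execute("ALTER TABLE T1 ADD COLUMN Phone TEXT")

after = read_view(conn)

print(before == after)  # prints True
```

Adding an attribute that the view does not mention is harmless. Removing or renaming an attribute that the view does use is the database counterpart of the lost song 3: the view definition would then have to be altered.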
Database Administrator
The Database Administrator, better known as DBA, is the person (or a group of persons) responsible for the
well-being of the database management system. S/he has the following functions and responsibilities regarding
database management:
Definition of the schema, the architecture of the three levels of data abstraction, and data independence.
Modification of the defined schema as and when required.
Definition of the storage structure and the access method of the stored data, i.e. sequential, indexed or direct.
Creating new user-ids, passwords etc., and also defining the access permissions that each user does or does not
have. The DBA is responsible for creating user roles, which are collections of permissions (like read, write etc.)
granted to or withheld from a class of users. S/he can also grant additional permissions to and/or revoke existing
permissions from a user if need be.
Defining the integrity constraints for the database to ensure that the data entered conforms to some rules,
thereby increasing the reliability of the data.
Creating a security mechanism to prevent unauthorized access and accidental or intentional handling of data that
can cause a security threat.
Creating a backup and recovery policy. This is essential because in case of a failure the database must be able to
recover its complete functionality with no loss of data, as if the failure had never occurred. It is essential
to keep regular backups of the data so that if the system fails, all data up to the point of failure is
available from stable storage. Only the data gathered during the failure would then have to be fed to
the database again to restore it to a healthy state.
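The integrity constraints mentioned in the list above can be declared as part of the table definition, so that the DBMS itself rejects rows that break the rules. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the table name Student, the allowed age range of 15 to 100 and the helper try_insert are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        Roll INTEGER PRIMARY KEY,                     -- no duplicate roll numbers
        Name TEXT NOT NULL,                           -- a name must be supplied
        Age  INTEGER CHECK (Age BETWEEN 15 AND 100)   -- assumed plausible range
    )
""")


def try_insert(conn, roll, name, age):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Student VALUES (?, ?, ?)", (roll, name, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(conn, 1, "ABC", 19))  # valid row
print(try_insert(conn, 1, "DEF", 22))  # duplicate Roll
print(try_insert(conn, 2, None, 22))   # missing Name
print(try_insert(conn, 3, "XYZ", -5))  # Age outside the allowed range
```

Only the first row is stored; the other three are refused by the PRIMARY KEY, NOT NULL and CHECK constraints respectively.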
Advantages and Disadvantages of Database Management System
We must evaluate whether there is any gain in using a DBMS over a situation where we do not use it. Let us
summarize the advantages.
Reduction of Redundancy: This is perhaps the most significant advantage of using a DBMS. Redundancy is
the problem of storing the same data item in more than one place. Redundancy creates several problems, like
requiring extra storage space, entering the same data more than once during insertion, and deleting data from
more than one place during deletion. Anomalies may occur in the database if insertion, deletion etc. are not
done properly.
Sharing of Data: In paper-based record keeping, data cannot easily be shared among many users. But in a
computerized DBMS, many users can share the same database if they are connected via a network.
Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules and
restrictions about what kind of data may be entered or manipulated within the database. This increases the
reliability of the database, because data that violates the constraints is rejected and cannot exist within the
database at any point of time.
Data security: We can restrict certain people from accessing the database, or allow them to see a certain portion
of the database while blocking sensitive information. This is not easily possible in paper-based record
keeping.
However, there are a few disadvantages of using a DBMS. They are as follows:
As a DBMS needs computers, we have to invest a good amount in acquiring the hardware, software, installation
facilities and training of users.
We have to keep regular backups because a failure can occur at any time. Taking a backup is a lengthy process and
may limit what other work the computer system can perform while it runs.
While the data security system is a boon of using a DBMS, it must be very robust. If someone can bypass the
security system then the database becomes open to any kind of mishandling.
Database Design
When a company asks you to make them a working, functional DBMS which they can work with, there are
certain steps to follow. Let us summarize them here:
Gathering information: This could be a written document that describes the system in question in a
reasonable amount of detail.
Producing ERD: ERD or Entity Relationship Diagram is a diagrammatic representation of the description we
have gathered about the system.
Designing the database: Out of the ERD we have created, it is very easy to determine the tables, the
attributes which the tables must contain and the relationship among these tables.
Normalization: This is a process of removing different kinds of impurities from the tables we have just
created in the above step.
How to Prepare an ERD
Step 1
Let us take a very simple example and try to reach a fully organized database from it. Let us look at the
following simple statement:
A boy eats an ice cream.
This is a description of a real world activity, and we may consider the above statement as a written document
(very short, of course).
Step 2
Now we have to prepare the ERD. Before doing that we have to process the statement a little. We can see that
the sentence contains a subject (boy), an object (ice cream) and a verb (eats) that defines the relationship
between the subject and the object. Consider the nouns as entities (boy and ice cream) and the verb (eats) as a
relationship. To plot them in the diagram, put the nouns within rectangles and the relationship within a
diamond. Also, show the relationship with a directed arrow, starting from the subject entity (boy) towards the
object entity (ice cream).
Well, fine. Up to this point the ERD shows how boy and ice cream are related. Now, every boy must have a
name, address, phone number etc. and every ice cream has a manufacturer, flavor, price etc. Without these the
diagram is not complete. These items which we mentioned here are known as attributes, and they must be
incorporated in the ERD as connected ovals.
But can only entities have attributes? Certainly not. If we want, relationships can have attributes of their own
too. These attributes do not tell us anything more about either the boy or the ice cream, but they provide
additional information about the relationship between the boy and the ice cream.
Step 3
We are almost complete now. If you look carefully, we now have defined structures for at least three tables
like the following:
Boy
Name Address Phone
Ice Cream
Manufacturer Flavor Price
Eats
Date Time
However, this is still not a working database, because by definition a database should be a “collection of related
tables.” To make them connected, the tables must have some common attributes. If we choose the attribute
Name of the Boy table to play the role of the common attribute, then the revised structure of the above tables
becomes something like the following.
Boy
Name Address Phone
Ice Cream
Manufacturer Flavor Price Name
Eats
Date Time Name
This is as complete as it can be. We now have information about the boy, about the ice cream he has eaten and
about the date and time when the eating was done.
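As a rough sketch of how these three tables could be realized, the following Python snippet builds them in an in-memory SQLite database and joins them on the common attribute Name. Only the table and column names come from the text; the column types, the sample row (a boy called Ravi, a vanilla ice cream, and so on) and the function names are assumptions made for illustration.

```python
import sqlite3

def build_ice_cream_db():
    """Create the Boy, IceCream and Eats tables, linked by the common attribute Name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Boy      (Name TEXT PRIMARY KEY, Address TEXT, Phone TEXT);
        CREATE TABLE IceCream (Manufacturer TEXT, Flavor TEXT, Price REAL, Name TEXT);
        CREATE TABLE Eats     (Date TEXT, Time TEXT, Name TEXT);
    """)
    # Sample rows are invented for illustration only.
    conn.execute("INSERT INTO Boy VALUES ('Ravi', '12 Park Street', '555-0101')")
    conn.execute("INSERT INTO IceCream VALUES ('Amul', 'Vanilla', 20.0, 'Ravi')")
    conn.execute("INSERT INTO Eats VALUES ('2015-08-01', '17:30', 'Ravi')")
    conn.commit()
    return conn

def who_ate_what(conn):
    """Join the three tables on Name: who ate which flavor, and when?"""
    query = """
        SELECT Boy.Name, IceCream.Flavor, Eats.Date, Eats.Time
        FROM Boy
        JOIN IceCream ON IceCream.Name = Boy.Name
        JOIN Eats     ON Eats.Name = Boy.Name
    """
    return conn.execute(query).fetchall()
```

Calling who_ate_what(build_ice_cream_db()) returns one row that combines information from all three tables, which is only possible because they share the Name attribute.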
Cardinality of Relationship
While creating a relationship between two entities, we often have to decide its cardinality. This
simply means how many entities of the first set are related to how many entities of the second set.
Although only three kinds are often listed, the four types below are distinguished here.
One-to-One
Only one entity of the first set is related to only one entity of the second set. E.g. A teacher teaches a
student. Only one teacher is teaching only one student. This can be expressed in the following diagram as:
One-to-Many
Only one entity of the first set is related to multiple entities of the second set. E.g. A teacher teaches
students. Only one teacher is teaching many students. This can be expressed in the following diagram as:
Many-to-One
Multiple entities of the first set are related to only one entity of the second set. E.g. Teachers teach a
student. Many teachers are teaching only one student. This can be expressed in the following diagram as:
Many-to-Many
Multiple entities of the first set are related to multiple entities of the second set. E.g. Teachers teach students. In
any school or college many teachers are teaching many students. This can be considered as a two-way
one-to-many relationship. This can be expressed in the following diagram as:
In this discussion we have not included the attributes, but you can understand that they can be used without
any problem if we want to.
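The four cases above can be told apart mechanically. The sketch below is only an illustration: it inspects a sample list of (first entity, second entity) pairs and reports which case the sample exhibits. The function name and the identifiers T1, S1, etc. are assumptions. In a real design the cardinality is a rule stated by the designer, not something inferred from sample data.

```python
def cardinality(pairs):
    """Classify a relationship from sample (first_entity, second_entity) pairs."""
    seconds_per_first = {}
    firsts_per_second = {}
    for first, second in pairs:
        seconds_per_first.setdefault(first, set()).add(second)
        firsts_per_second.setdefault(second, set()).add(first)
    # Does some first-set entity relate to several second-set entities?
    first_has_many = any(len(s) > 1 for s in seconds_per_first.values())
    # Does some second-set entity relate to several first-set entities?
    second_has_many = any(len(s) > 1 for s in firsts_per_second.values())
    if first_has_many and second_has_many:
        return "many-to-many"
    if first_has_many:
        return "one-to-many"
    if second_has_many:
        return "many-to-one"
    return "one-to-one"
```

For example, the pairs (T1, S1) and (T1, S2) describe one teacher teaching two students, so the function reports "one-to-many".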
The Concept of Keys
A key is an attribute (or a set of attributes) of a table which helps to identify a row. There can be many
different types of keys, which are explained here.
Super Key or Candidate Key: A super key is an attribute, or set of attributes, that can uniquely identify a row
in a table. A candidate key is a super key from which no attribute can be removed without losing that
uniqueness. Candidate keys contain unique values and should not contain NULL values. There can be more than
one candidate key in a table, e.g. within a STUDENT table Roll and Mobile No. can both serve to uniquely
identify a student.
Primary Key: It is the one candidate key that is chosen to be the identifying key for the entire table. E.g.
although there are two candidate keys in the STUDENT table, the college would obviously use Roll as the
primary key of the table.
Alternate Key: This is any candidate key which is not chosen as the primary key of the table. They are named
so because, although not the primary key, they can still identify a row.
Composite Key: Sometimes one attribute is not enough to uniquely identify a row. E.g. in a single class Roll is
enough to find a student, but in the entire school, merely searching by Roll is not enough, because there
could be 10 classes in the school and each one of them may contain a roll no 5. To uniquely identify
the student we have to say something like “class VII, roll no 5”. So, two or more attributes are
combined to create a unique combination of values, such as Class + Roll.
Foreign Key: Sometimes we may have to work with a table that does not have a primary key of its own.
To identify its rows, we have to use the primary key of a related table. Such a copy of another related
table’s primary key is called a foreign key.
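To make the composite key idea concrete, here is a small sketch that searches a sample of rows for the smallest attribute combinations that identify every row uniquely. The sample students, their names and the function names are assumptions for illustration. Note that a check on sample data can only suggest a key: a real key is a rule that must hold for all data the table will ever contain.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Return the minimal attribute combinations that are unique in this sample."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A combination containing a smaller key is not minimal; skip it.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_super_key(rows, combo):
                found.append(combo)
    return found

# Hypothetical school data: roll numbers restart in every class.
STUDENTS = [
    {"Class": "VII", "Roll": 5, "Name": "Asha"},
    {"Class": "VII", "Roll": 6, "Name": "Bina"},
    {"Class": "VII", "Roll": 7, "Name": "Asha"},
    {"Class": "VIII", "Roll": 5, "Name": "Asha"},
]
```

With this sample, neither Class, Roll nor Name is unique alone, and the only minimal unique combination is Class + Roll, matching the “class VII, roll no 5” example.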
Strong and Weak Entity
Based on the concept of foreign key, there may arise a situation when we have to relate an entity having a
primary key of its own and an entity not having a primary key of its own. In such a case, the entity having its
own primary key is called a strong entity and the entity not having its own primary key is called a weak entity.
Whenever we need to relate a strong and a weak entity together, the ERD would change just a little.
Say, for example, we have a statement “A Student lives in a Home.” STUDENT is obviously a strong entity
having a primary key Roll. But HOME may not have a unique primary key, as its only attribute Address may
be shared by many homes (what if it is a housing estate?). HOME is a weak entity in this case.
The ERD of this statement would be like the following
As you can see, the weak entity itself and the relationship linking a strong and a weak entity are drawn with
a double border.
Different Types of Database
There are three different types of database. The difference lies in the organization of the database and the
storage structure of the data. We shall briefly mention them here.
Relational DBMS
This is our subject of study. A DBMS is relational if the data is organized into relations, that is, tables. In
RDBMS, all data are stored in the well-known row-column format.
Hierarchical DBMS
In HDBMS, data is organized in a tree like manner. There is a parent-child relationship among data items and
the data model is very suitable for representing one-to-many relationship. To access the data items, some kind
of tree-traversal techniques are used, such as preorder traversal.
Because HDBMS is built on the one-to-many model, we have to face a little bit of difficulty to organize a
hierarchical database into row column format. For example, consider the following hierarchical database that
shows four employees (E01, E02, E03, and E04) belonging to the same department D1.
There are two ways to represent the above one-to-many information into a relation that is built in one-to-one
relationship. The first is called Replication, where the department id is replicated a number of times in the
table like the following.
Dept-Id  Employee Code
D1       E01
D1       E02
D1       E03
D1       E04
Replication makes the same data item redundant and is an inefficient way to store data. A better way is to use
a technique called the Virtual Record. While using this, the repeating data item is not used in the table. It is
kept at a separate place. The table, instead of containing the repeating information, contains a pointer to that
place where the data item is stored.
This organization saves a lot of space as data is not made redundant.
Network DBMS
The NDBMS is built primarily on a one-to-many relationship, but where a parent-child representation among
the data items cannot be ensured. This may happen in any real world situation where any entity can be linked
to any entity. The NDBMS was proposed by a group of theorists known as the Database Task Group
(DBTG). What they said looks like this:
In NDBMS, all entities are called Records and all relationships are called Sets. The record from where the
relationship starts is called the Owner Record and where it ends is called Member Record. The relationship
or set is strictly one-to-many.
In case we need to represent a many-to-many relationship, an interesting thing happens. In NDBMS, Owner
and Member can only have one-to-many relationship. We have to introduce a third common record with
which both the Owner and Member can have one-to-many relationship. Using this common record, the Owner
and Member can be linked by a many-to-many relationship.
Suppose we have to represent the statement Teachers teach students. We have to introduce a third record,
say CLASS, with which both the teacher and the student can have a one-to-many relationship. Using the class
in the middle, teacher and student can be linked in a virtual many-to-many relationship.
Normalization
While designing a database out of an entity–relationship model, the main problem existing in that “raw”
database is redundancy. Redundancy is storing the same data item in more than one place. Redundancy creates
several problems like the following:
Extra storage space: storing the same data in many places takes a large amount of disk space.
Entering the same data more than once during data insertion.
Deleting data from more than one place during deletion.
Modifying data in more than one place.
Anomalies may occur in the database if insertion, deletion, modification, etc. are not done properly. This creates
inconsistency and unreliability in the database.
To solve this problem, the “raw” database needs to be normalized. This is a step by step process of removing
different kinds of redundancy and anomaly at each step. At each step a specific rule is followed to remove
specific kind of impurity in order to give the database a slim and clean look.
Un-Normalized Form (UNF)
If a table contains non-atomic values at each row, it is said to be in UNF. An atomic value is something that
can not be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and
simplified. Consider the following table:
Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
                  Feb    1200
                  Mar    850
E02     BB        Jan    2200   B02      UTI
                  Feb    2500
E03     CC        Jan    1700   B01      SBI
                  Feb    1800
                  Mar    1850
                  Apr    1725
In the sample table above, there are multiple occurrences of rows under each key Emp-Id. Although
considered to be the primary key, Emp-Id cannot give us the unique identification facility for any single row.
Further, each primary key points to a variable length record (3 for E01, 2 for E02 and 4 for E03).
First Normal Form (1NF)
A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a unique
combination of values. The above table in UNF can be processed to create the following table in 1NF.
Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
E01     AA        Feb    1200   B01      SBI
E01     AA        Mar    850    B01      SBI
E02     BB        Jan    2200   B02      UTI
E02     BB        Feb    2500   B02      UTI
E03     CC        Jan    1700   B01      SBI
E03     CC        Feb    1800   B01      SBI
E03     CC        Mar    1850   B01      SBI
E03     CC        Apr    1725   B01      SBI
As you can see now, each row contains a unique combination of values. Unlike in UNF, this relation contains
only atomic values, i.e. the rows cannot be further decomposed, so the relation is now in 1NF.
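The step from UNF to 1NF can be sketched in a few lines of Python. In this illustration the un-normalized records hold a nested list of (Month, Sales) pairs, and the function repeats the employee and bank values for every month. The dictionary layout and the names UNF and to_1nf are assumptions; the data values are those of the tables above.

```python
# Un-normalized records: one entry per employee, with a repeating group of sales.
UNF = [
    {"Emp-Id": "E01", "Emp-Name": "AA", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"Emp-Id": "E02", "Emp-Name": "BB", "Bank-Id": "B02", "Bank-Name": "UTI",
     "Sales": [("Jan", 2200), ("Feb", 2500)]},
    {"Emp-Id": "E03", "Emp-Name": "CC", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1700), ("Feb", 1800), ("Mar", 1850), ("Apr", 1725)]},
]

def to_1nf(records):
    """Flatten the repeating group so that every row holds only atomic values."""
    rows = []
    for rec in records:
        for month, sales in rec["Sales"]:
            rows.append((rec["Emp-Id"], rec["Emp-Name"], month, sales,
                         rec["Bank-Id"], rec["Bank-Name"]))
    return rows
```

The result has nine rows of fixed length, one per employee and month, instead of three variable-length records.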
Second Normal Form (2NF)
A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends on the
primary key of the relation. Speaking inversely, if a table has some attributes which are not dependent on the
primary key of that table, then it is not in 2NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and Bank-Name all
depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the
table. So the table is in 1NF, but not in 2NF. If this portion is moved into another related relation, the
table comes to 2NF.
Emp-Id  Emp-Name  Month  Sales  Bank-Id
E01     AA        Jan    1000   B01
E01     AA        Feb    1200   B01
E01     AA        Mar    850    B01
E02     BB        Jan    2200   B02
E02     BB        Feb    2500   B02
E03     CC        Jan    1700   B01
E03     CC        Feb    1800   B01
E03     CC        Mar    1850   B01
E03     CC        Apr    1725   B01

Bank-Id  Bank-Name
B01      SBI
B02      UTI
After moving this portion into another relation, we store a smaller amount of data in two relations without any
loss of information. There is also a significant reduction in redundancy.
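The claim that nothing is lost can be checked directly. The following sketch splits the 1NF rows into the two relations described above and then joins them back on Bank-Id. The tuple layout and the function names decompose and rejoin are illustrative assumptions.

```python
# 1NF rows: (Emp-Id, Emp-Name, Month, Sales, Bank-Id, Bank-Name)
ROWS_1NF = [
    ("E01", "AA", "Jan", 1000, "B01", "SBI"),
    ("E01", "AA", "Feb", 1200, "B01", "SBI"),
    ("E01", "AA", "Mar", 850, "B01", "SBI"),
    ("E02", "BB", "Jan", 2200, "B02", "UTI"),
    ("E02", "BB", "Feb", 2500, "B02", "UTI"),
    ("E03", "CC", "Jan", 1700, "B01", "SBI"),
    ("E03", "CC", "Feb", 1800, "B01", "SBI"),
    ("E03", "CC", "Mar", 1850, "B01", "SBI"),
    ("E03", "CC", "Apr", 1725, "B01", "SBI"),
]

def decompose(rows):
    """Move Bank-Name into its own relation, keyed by Bank-Id."""
    employee_sales = [(e, n, m, s, b) for (e, n, m, s, b, _name) in rows]
    banks = sorted({(b, name) for (_e, _n, _m, _s, b, name) in rows})
    return employee_sales, banks

def rejoin(employee_sales, banks):
    """Join the two relations back together on Bank-Id."""
    bank_name_of = dict(banks)
    return [(e, n, m, s, b, bank_name_of[b]) for (e, n, m, s, b) in employee_sales]
```

Rejoining the two relations reproduces the original nine rows exactly, while the number of stored values drops from 9 x 6 = 54 to 9 x 5 + 2 x 2 = 49. The saving grows with the number of rows, because each bank name is stored only once.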
Third Normal Form (3NF)
A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency in that
relation. Speaking inversely, if a table contains transitive dependency, then it is not in 3NF, and the table must
be split to bring it into 3NF.
What is a transitive dependency? Within a relation, if we see
A → B [B depends on A]
and
B → C [C depends on B]
then we may derive
A → C [C depends on A]
For example, if we have
Roll → Marks
and
Marks → Grade
then we may safely derive
Roll → Grade.
This third dependency was not originally specified, but we have derived it.
Such a derived dependency is called a transitive dependency: the last attribute depends on the first one only
through the intermediate attribute.
For example, suppose we have been given
Roll → City
and
City → STDCode
Here Roll → STDCode is a transitive dependency. The STDCode is a fact about the city, not about the roll number
issued by a school or college, so keeping both dependencies in one relation repeats the STDCode for every
student of the same city. In such a case the relation should be
broken into two, each containing one of these two dependencies:
Roll → City
and
City → STDCode
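The Roll, City and STDCode example can be tried out on a few sample rows. The sketch below checks whether a dependency holds in the sample and then splits the relation in two. The three students and the function names holds and split_3nf are assumptions for illustration; a dependency that holds in a sample is only evidence, not proof, that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        # The first rhs value seen for an lhs value must match all later ones.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical sample rows.
STUDENTS = [
    {"Roll": 1, "City": "Kolkata", "STDCode": "033"},
    {"Roll": 2, "City": "Delhi", "STDCode": "011"},
    {"Roll": 3, "City": "Kolkata", "STDCode": "033"},
]

def split_3nf(rows):
    """Break the relation into Roll -> City and City -> STDCode."""
    roll_city = [(r["Roll"], r["City"]) for r in rows]
    city_std = sorted({(r["City"], r["STDCode"]) for r in rows})
    return roll_city, city_std
```

In the combined relation the code 033 is stored twice, once per student from Kolkata. After the split it is stored once, in the City relation.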
Boyce-Codd Normal Form (BCNF)
A relation is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a
candidate key. A relation which is in 3NF is almost always in BCNF. There can, however, be situations in which a
3NF relation is not in BCNF, when the following conditions are found true:
The candidate keys are composite.
There is more than one candidate key in the relation.
The candidate keys overlap, i.e. they have some attributes in common.
Professor Code  Department   Head of Dept.  Percent Time
P1              Physics      Ghosh          50
P1              Mathematics  Krishnan       50
P2              Chemistry    Rao            25
P2              Physics      Ghosh          75
P3              Mathematics  Krishnan       100
Consider, as an example, the above relation. It is assumed that:
A professor can work in more than one department
The percentage of the time he spends in each department is given.
Each department has only one Head of Department.
The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated.
Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the Head of
Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting
Head of Dept. from the given relation. The normalized relations are shown in the following.
Professor Code  Department   Percent Time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100

Department   Head of Dept.
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao
See the dependency diagrams for these new relations.
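The deletion problem described above, and the way the split avoids it, can be reproduced with a short sketch. The tuple layout and the function names are assumptions; the rows are those of the professor table.

```python
# (Professor Code, Department, Head of Dept., Percent Time)
PROFESSORS = [
    ("P1", "Physics", "Ghosh", 50),
    ("P1", "Mathematics", "Krishnan", 50),
    ("P2", "Chemistry", "Rao", 25),
    ("P2", "Physics", "Ghosh", 75),
    ("P3", "Mathematics", "Krishnan", 100),
]

def heads_known(rows):
    """Department heads that can still be read off the single combined relation."""
    return {dept: head for (_prof, dept, head, _pct) in rows}

def decompose(rows):
    """Split into (professor, department, percent) rows and a department -> head map."""
    assignment = [(prof, dept, pct) for (prof, dept, _head, pct) in rows]
    dept_head = {dept: head for (_prof, dept, head, _pct) in rows}
    return assignment, dept_head

def resign(rows, professor):
    """Delete every row that belongs to the given professor."""
    return [row for row in rows if row[0] != professor]
```

If P2 resigns from the single relation, heads_known no longer contains Chemistry. After decomposition, removing P2 from the assignment rows leaves the department relation untouched, so Rao is still recorded as Head of Chemistry.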
Fourth Normal Form (4NF)
When attributes in a relation have a multi-valued dependency, further normalization to 4NF and 5NF is
required. Let us first find out what a multi-valued dependency is.
A multi-valued dependency is a typical kind of dependency in which each and every attribute within a
relation depends upon the other, yet none of them is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many projects in an
organization. The following are the assumptions:
A vendor is capable of supplying many items.
A project uses many items.
A vendor supplies to many projects.
An item may be supplied by many vendors.
A multi-valued dependency exists here because all the attributes depend upon one another and yet none of them
is a primary key having a unique value.
Vendor Code  Item Code  Project No.
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3
The given relation has a number of problems. For example:
If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for item code
has to be introduced.
The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned above. The
problem is reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A relation is
in 4NF if it has no more than one independent multi valued dependency or one independent multi valued
dependency with a functional dependency.
The table can be expressed as the two 4NF relations given in the following. The fact that vendors are capable of
supplying certain items and the fact that they are assigned to supply for some projects are independently
specified in the 4NF relations.
Vendor-Supply
Vendor Code  Item Code
V1           I1
V1           I2
V2           I2
V2           I3
V3           I1

Vendor-Project
Vendor Code  Project No.
V1           P1
V1           P3
V2           P1
V3           P2
V3           P3
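The two 4NF relations are projections of the original three-column relation, and they can be computed rather than written out by hand. In the sketch below, the names SUPPLY, project and join_on_vendor are assumptions; the eight rows are those of the original vendor table.

```python
# (Vendor Code, Item Code, Project No.)
SUPPLY = [
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"), ("V3", "I1", "P2"), ("V3", "I1", "P3"),
]

def project(rows, *positions):
    """Keep only the given column positions and drop duplicate rows."""
    return sorted({tuple(row[i] for i in positions) for row in rows})

def join_on_vendor(vendor_item, vendor_project):
    """Combine every item of a vendor with every project of the same vendor."""
    return sorted((v, item, proj)
                  for (v, item) in vendor_item
                  for (v2, proj) in vendor_project
                  if v == v2)
```

Here project(SUPPLY, 0, 1) gives the Vendor-Supply relation and project(SUPPLY, 0, 2) gives Vendor-Project. For this particular data, joining the two on Vendor Code gives back exactly the original eight rows, and the fact that V3 supplies item I1 is now stored only once.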
Fifth Normal Form (5NF)
These relations still have a problem. While defining the 4NF we mentioned that all the attributes depend upon
each other. While creating the two tables in the 4NF, although we have preserved the dependencies between
Vendor Code and Item Code in the first table and Vendor Code and Project No. in the second table, we have lost
the relationship between Item Code and Project No. If there were a primary key then this loss of dependency
would not have occurred. In order to revive this relationship we must add a new table like the following.
Please note that during the entire process of normalization, this is the only step where a new table is created by
joining two attributes, rather than splitting them into separate tables.
Project No.  Item Code
P1           I1
P1           I2
P2           I1
P3           I1
P3           I3
Let us finally summarize the normalization steps we have discussed so far.
Input Relation  Transformation                                                              Output Relation
All relations   Eliminate variable length record. Remove multi-attribute lines in table.    1NF
1NF             Remove dependency of non-key attributes on part of a multi-attribute key.   2NF
2NF             Remove dependency of non-key attributes on other non-key attributes.        3NF
3NF             Remove dependency of an attribute of a multi-attribute key on an attribute  BCNF
                of another (overlapping) multi-attribute key.
BCNF            Remove more than one independent multi-valued dependency from relation by   4NF
                splitting relation.
4NF             Add one relation relating attributes with multi-valued dependency.          5NF
Indexing
What is an Index?
An index is a small table having only two columns. The first column contains a copy of the primary or
candidate key of a table and the second column contains a set of pointers holding the address of the disk block
where that particular key value can be found.
The advantage of using an index is that it makes search operations very fast. Suppose a
table has several rows of data, each row 20 bytes wide. If you want to search for record number 100,
the management system must read each and every row, and after reading 99x20 = 1980 bytes it will
find record number 100. If we have an index, the management system starts to search for record number 100
not from the table, but from the index. The index, containing only two columns, may be just 4 bytes wide in
each of its rows. After reading only 99x4 = 396 bytes of data from the index the management system finds an
entry for record number 100, reads the address of the disk block where record number 100 is stored and
directly points at the record in the physical storage device. The result is a much quicker access to the record (a
speed advantage of 1980:396, that is, 5:1).
The only minor disadvantage of using an index is that it takes up some extra space in addition to the main table.
Additionally, the index needs to be updated whenever records are inserted into or deleted from the main table.
However, the advantages are so large that these disadvantages are usually considered negligible.
Types of Index
Primary Index
In primary index, there is a one-to-one relationship between the entries in the index table and the records in the
main table. Primary index can be of two types:
Dense primary index: the number of entries in the index table is the same as the number of entries in the
main table. In other words, each and every record in the main table has an entry in the index.
Sparse or Non-Dense Primary Index:
For large tables the Dense Primary Index itself begins to grow in size. To keep the size of the index smaller,
instead of pointing to each and every record in the main table, the index points to records in the main table
at regular intervals. See the following example.
As you can see, the data has been divided into several blocks, each containing a fixed number of
records (in our case 10). The pointer in the index table points to the first record of each data block, which is
known as the Anchor Record for its important function. If you are searching for roll 14, the index is first
searched to find out the highest entry which is smaller than or equal to 14. We have 11. The pointer leads us to
roll 11 where a short sequential search is made to find out roll 14.
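The lookup just described can be sketched as follows. This is a simplified model that keeps the blocks as Python lists in memory. The names BLOCKS, SPARSE_INDEX and find, and the choice of three blocks holding rolls 1 to 30, are assumptions for illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 10
# Three data blocks holding rolls 1..10, 11..20 and 21..30.
BLOCKS = [list(range(start, start + BLOCK_SIZE)) for start in (1, 11, 21)]
# Sparse index: one (anchor roll, block number) entry per block.
SPARSE_INDEX = [(block[0], number) for number, block in enumerate(BLOCKS)]

def find(roll):
    """Return (anchor roll, block number, offset in block), or None if absent."""
    anchors = [anchor for anchor, _ in SPARSE_INDEX]
    # Highest index entry that is smaller than or equal to the roll.
    pos = bisect_right(anchors, roll) - 1
    if pos < 0:
        return None
    anchor, block_no = SPARSE_INDEX[pos]
    # Short sequential search inside the selected block.
    for offset, record in enumerate(BLOCKS[block_no]):
        if record == roll:
            return anchor, block_no, offset
    return None
```

For roll 14 the index search stops at anchor 11, and the sequential search then finds the record three positions after the anchor.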
Clustering Index
It may happen sometimes that we are asked to create an index on a non-unique key, such as Dept-id. There
could be several employees in each department. Here we use a clustering index, where all employees
belonging to the same Dept-id are considered to be within a single cluster, and the index pointers point to the
cluster as a whole.
Let us explain this diagram. The disk blocks contain a fixed number of records (in this case 4 each). The index
contains entries for 5 separate departments. The pointers of these entries point to the anchor record of the
block where the first of the Dept-id in the cluster can be found. The blocks themselves may point to the anchor
record of the next block in case a cluster overflows a block size. This can be done using a special pointer at the
end of each block (comparable to the next pointer of the linked list organization).
The previous scheme might become a little confusing because one disk block might be shared by records
belonging to different clusters. A better scheme could be to use separate disk blocks for separate clusters. This
is explained next.
In this scheme, as you can see, we have used separate disk blocks for the clusters. The pointers, like before,
point to the anchor record of the block where the first of the cluster entries would be found. The block
pointers only come into action when a cluster overflows the block size, as for Dept-id 2. This scheme takes
more space in the memory and on the disk, but the organization is much better and cleaner looking.
Secondary Index
While creating the index, generally the index table is kept in the primary memory (RAM) and the main table,
because of its size, is kept in the secondary memory (hard disk). Theoretically, a table may contain millions of
records (like the telephone directory of a large city), for which even a sparse index becomes so large in size
that we cannot keep it in the primary memory. And if we cannot keep the index in the primary memory, then
we lose the advantage of the speed of access. For a very large table, it is better to organize the index in multiple
levels. See the following example.
In this scheme, the primary level index, (created with a gap of 100 records, and thereby smaller in size), is
kept in the RAM for quick reference. If you need to find out the record of roll 14 now, the index is first
searched to find out the highest entry which is smaller than or equal to 14. We have 1. The adjoining pointer
leads us to the anchor record of the corresponding secondary level index, where another similar search is
conducted. This finally leads us to the actual data block whose anchor record is roll 11. We now come to roll
11 where a short sequential search is made to find out roll 14.
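A two-level lookup of this kind can be sketched in Python. The model below is an assumption-laden illustration: blocks of 10 records, a secondary index entry per block, and a primary index entry per 100 records, all held in ordinary lists. The names build, floor_entry and find are hypothetical.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 10
BLOCKS_PER_GROUP = 10  # each primary index entry therefore covers 100 records

def build(rolls):
    """Build the data blocks, the secondary-level indexes and the primary index."""
    blocks = [rolls[i:i + RECORDS_PER_BLOCK]
              for i in range(0, len(rolls), RECORDS_PER_BLOCK)]
    anchors = [(block[0], number) for number, block in enumerate(blocks)]
    secondary = [anchors[i:i + BLOCKS_PER_GROUP]
                 for i in range(0, len(anchors), BLOCKS_PER_GROUP)]
    primary = [(group[0][0], number) for number, group in enumerate(secondary)]
    return blocks, secondary, primary

def floor_entry(entries, key):
    """Highest (key, pointer) entry whose key is smaller than or equal to key."""
    pos = bisect_right([k for k, _ in entries], key) - 1
    return entries[pos] if pos >= 0 else None

def find(roll, blocks, secondary, primary):
    """Return (primary entry, block anchor) used to reach the roll, or None."""
    top = floor_entry(primary, roll)
    if top is None:
        return None
    mid = floor_entry(secondary[top[1]], roll)
    block = blocks[mid[1]]
    # Short sequential search in the data block.
    return (top[0], mid[0]) if roll in block else None
```

With rolls 1 to 300, searching for roll 14 goes through primary entry 1 and then block anchor 11, as in the description above. Only the small primary index needs to stay in RAM.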
Multilevel Index
The Multilevel Index is a modification of the secondary level index system. In this system we may use even
more levels in case the table is even larger.
Index in a Tree like Structure
We can use tree-like structures as an index as well. For example, a binary search tree can also be used as an
index. If we want to find a particular record in a binary search tree, we have the added advantage of the
binary search procedure, which makes searching even faster. A binary tree can be considered as
a 2-way Search Tree, because it has two pointers in each of its nodes, and can thereby guide you in two distinct
directions. Remember that the number of values stored in each node is one
less than the number of pointers, i.e. a node with 2 pointers contains 1 value.
M-Way Search Tree
The abovementioned concept can be further expanded with the notion of the m-Way Search Tree, where m
represents the number of pointers in a particular node. If m = 3, then each node of the search tree contains 3
pointers, and each node would then contain 2 values.
A sample m-Way Search Tree with m = 3 is given in the following.
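Searching such a tree can be sketched in Python as follows. The class Node, the function search and the key values in TREE are assumptions made up for illustration; they do not reproduce any particular figure.

```python
class Node:
    """A node of an m-way search tree: k sorted values and k + 1 child pointers."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or [None] * (len(keys) + 1)

def search(node, key):
    """Walk down from the root, choosing one of the child pointers at each node."""
    while node is not None:
        i = 0
        # Skip every value in the node that is smaller than the key.
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return True
        node = node.children[i]  # follow the pointer between the neighbouring values
    return False

# A hypothetical 3-way tree: every node has 2 values and 3 pointers.
TREE = Node([20, 40], [Node([5, 10]), Node([25, 30]), Node([45, 50])])
```

Searching for 30 in TREE compares against 20 and 40 in the root, follows the middle pointer, and finds the value in the second leaf.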
Transaction
What is a Transaction?
A transaction is an event which occurs on the database. Generally a transaction reads a value from the
database or writes a value to the database. If you are familiar with Operating Systems, then we can say that
a transaction is analogous to a process.
Although a transaction can both read and write on the database, there are some fundamental differences
between these two classes of operations. A read operation does not change the image of the database in any
way. But a write operation, whether performed with the intention of inserting, updating or deleting data from
the database, changes the image of the database. That is, we may say that these transactions bring the database
from an image which existed before the transaction occurred (called the Before Image or BFIM) to an image
which exists after the transaction occurred (called the After Image or AFIM).
The Four Properties of Transactions
Every transaction, for whatever purpose it is being used, has the following four properties. Taking the initial
letters of these four properties we collectively call them the ACID Properties. Here we try to describe them
and explain them.
Atomicity: This means that either all of the instructions within the transaction will be reflected in the
database, or none of them will be reflected.
Say, for example, we have two accounts A and B, each containing Rs 1000/-. We now start a transaction to
transfer Rs 100/- from account A to account B.
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
Fine, isn't it? The transaction has 6 instructions to extract the amount from A and submit it to B. The AFIM
will show Rs 900/- in A and Rs 1100/- in B.
Now, suppose there is a power failure just after instruction 3 (Write A) has been completed. What happens
now? After the system recovers the AFIM will show Rs 900/- in A, but the same Rs 1000/- in B. It would be
said that Rs 100/- evaporated into thin air because of the power failure. Clearly such a situation is not acceptable.
The solution is to keep every value calculated by the instructions of the transaction not in any stable storage
(hard disk) but in a volatile storage (RAM), until the transaction completes its last instruction. When we see
that there has not been any error we do something known as a COMMIT operation. Its job is to write every
temporarily calculated value from the volatile storage on to the stable storage. In this way, even if power fails
at instruction 3, the post-recovery image of the database will show accounts A and B both containing Rs
1000/-, as if the failed transaction had never occurred.
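The idea can be imitated in a few lines of Python. In this simplified sketch a dictionary plays the role of stable storage, a second dictionary plays the role of volatile storage, and an exception stands in for the power failure. The names PowerFailure and transfer are assumptions. A real DBMS must also make the commit step itself safe against failures, which is not modelled here.

```python
class PowerFailure(Exception):
    """Stands in for a crash in the middle of a transaction."""

def transfer(stable, amount, fail_after_write_a=False):
    """Run the six instructions; touch stable storage only at COMMIT."""
    volatile = {}
    a = stable["A"]              # Read A
    a = a - amount               # A = A - 100
    volatile["A"] = a            # Write A (to volatile storage only)
    if fail_after_write_a:
        raise PowerFailure("power lost after instruction 3")
    b = stable["B"]              # Read B
    b = b + amount               # B = B + 100
    volatile["B"] = b            # Write B (to volatile storage only)
    stable.update(volatile)      # COMMIT: copy the new values to stable storage
```

If the failure is raised after Write A, the dictionary holding the accounts still shows 1000 in both A and B. If the function runs to the end, it shows 900 in A and 1100 in B.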
Consistency: If we execute a particular transaction in isolation or together with other transactions (i.e.
presumably in a multi-programming environment), the transaction will yield the same expected result.
To give better performance, every database management system supports the execution of multiple
transactions at the same time, using CPU Time Sharing. Concurrently executing transactions may have to deal
with the problem of sharable resources, i.e. resources that multiple transactions are trying to read/write at the
same time. For example, we may have a table or a record on which two transactions are trying to read or write
at the same time. Careful mechanisms are created in order to prevent mismanagement of these sharable
resources, so that there should not be any change in the way a transaction performs. A transaction which
deposits Rs 100/- to account A must deposit the same amount whether it is acting alone or in conjunction with
another transaction that may be trying to deposit or withdraw some amount at the same time.
Isolation: In case multiple transactions are executing concurrently and trying to access a sharable resource at
the same time, the system should create an ordering in their execution so that they should not create any
anomaly in the value stored at the sharable resource.
There are several ways to achieve this and the most popular one is using some kind of locking mechanism.
Again, if you are familiar with Operating Systems, then you should remember semaphores: how a semaphore is
used by a process to make a resource busy before starting to use it, and how it is used to release the resource
after the usage is over. Other processes intending to access that same resource must wait during this time.
Locking is very similar. It states that a transaction must first lock the data item that it wishes to access, and
release the lock when the access is no longer required. Once a transaction locks the data item, other
transactions wishing to access the same data item must wait until the lock is released.
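The locking rule can be illustrated with threads standing in for concurrent transactions. This is only an analogy built on Python's threading.Lock, not how a DBMS lock manager is implemented. The function name and the parameter values are assumptions.

```python
import threading

def run_concurrent_deposits(start=1000, workers=4, amount=1, times=1000):
    """Let several 'transactions' deposit into account A, one lock holder at a time."""
    account = {"A": start}
    lock_a = threading.Lock()

    def deposit():
        for _ in range(times):
            with lock_a:                          # lock the data item before access
                current = account["A"]            # read
                account["A"] = current + amount   # write
            # the lock is released here, so a waiting transaction may proceed

    threads = [threading.Thread(target=deposit) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account["A"]
```

Because each read and write pair happens while the lock is held, no deposit is lost: with the default values the final balance is 1000 + 4 x 1000 = 5000.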
Durability: It states that once a transaction has been completed, the changes it has made should be permanent.
As we have seen in the explanation of the Atomicity property, the transaction, if it completes successfully, is
committed. Once the COMMIT is done, the changes which the transaction has made to the database are
immediately written into permanent storage. So, after the transaction has been committed successfully, there is
no question of any loss of information even if the power fails. Committing a transaction guarantees that the
AFIM has been reached.
There are several ways Atomicity and Durability can be implemented. One of them is called Shadow Copy.
In this scheme a database pointer is used to point to the BFIM of the database. During the transaction, all the
temporary changes are recorded into a Shadow Copy, which is an exact copy of the original database plus the
changes made by the transaction, which is the AFIM. Now, if the transaction is required to COMMIT, then the
database pointer is updated to point to the AFIM copy, and the BFIM copy is discarded. On the other hand, if
the transaction is not committed, then the database pointer is not updated. It keeps pointing to the BFIM, and
the AFIM is discarded. This is a simple scheme, but takes a lot of memory space and time to implement.
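The Shadow Copy scheme can be sketched as follows. This is a minimal in-memory illustration, assuming the whole database is a Python dictionary. The class `ShadowCopyDB` and the two sample transactions are hypothetical names introduced only for this example.

```python
import copy

class ShadowCopyDB:
    """Toy database; `current` plays the role of the database pointer."""

    def __init__(self, data):
        self.current = data                     # points to the BFIM

    def run(self, transaction):
        shadow = copy.deepcopy(self.current)    # exact copy of the database
        try:
            transaction(shadow)                 # changes go to the shadow copy only
        except Exception:
            return False                        # abort: pointer untouched, shadow discarded
        self.current = shadow                   # commit: pointer now refers to the AFIM
        return True

def transfer(db):
    db["A"] -= 100
    db["B"] += 100

def failing_transfer(db):
    db["A"] -= 100
    raise RuntimeError("simulated failure before B is credited")

db = ShadowCopyDB({"A": 1000, "B": 1000})
first_ok = db.run(transfer)
second_ok = db.run(failing_transfer)
print(first_ok, second_ok, db.current)  # True False {'A': 900, 'B': 1100}
```

The failed transaction leaves no trace because the pointer is switched only at COMMIT. The full copy made for every transaction is the memory and time cost mentioned above.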
If you study these properties carefully, you can see that Atomicity and Durability are closely related, just as
Consistency and Isolation are closely related.
Transaction States
There are the following six states in which a transaction may exist:
Active: The initial state when the transaction has just started execution.
Partially Committed: At any given point of time, if the transaction is executing properly, then it is going
towards its COMMIT POINT. The values generated during the execution are all stored in volatile storage.
Failed: The transaction fails for some reason. The temporary values are no longer required, and the
transaction is set to ROLLBACK. It means that any change made to the database by this transaction up to the
point of the failure must be undone. If the failed transaction has withdrawn Rs. 100/- from account A, then the
ROLLBACK operation should add Rs 100/- to account A.
Aborted: When the ROLLBACK operation is over, the database reaches the BFIM. The transaction is now
said to have been aborted.
Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All the temporary values
are written to the stable storage and the transaction is said to have been committed.
Terminated: Either committed or aborted, the transaction finally reaches this state.
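The six states and the moves between them can be sketched as a small state machine. The transition table below is an assumption read off the descriptions above (for example, that a transaction can fail from either Active or Partially Committed). The names `State`, `ALLOWED` and `is_valid_path` are hypothetical.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "Active"
    PARTIALLY_COMMITTED = "Partially Committed"
    FAILED = "Failed"
    ABORTED = "Aborted"
    COMMITTED = "Committed"
    TERMINATED = "Terminated"

# Assumed transitions, inferred from the state descriptions
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},          # ROLLBACK restores the BFIM
    State.ABORTED: {State.TERMINATED},
    State.COMMITTED: {State.TERMINATED},
    State.TERMINATED: set(),
}

def is_valid_path(path):
    """True if the path starts at ACTIVE and every step is an allowed transition."""
    return path[0] is State.ACTIVE and all(
        nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:])
    )

success = [State.ACTIVE, State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED]
failure = [State.ACTIVE, State.FAILED, State.ABORTED, State.TERMINATED]
print(is_valid_path(success), is_valid_path(failure))  # True True
```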
The whole process can be described using the following diagram:
Concurrent Execution
A schedule is a collection of many transactions which is implemented as a unit. Depending upon how these
transactions are arranged within a schedule, a schedule can be of two types:
Serial: The transactions are executed one after another, in a non-preemptive manner.
Concurrent: The transactions are executed in a preemptive, time-shared manner.
In a serial schedule, there is no question of sharing a single data item among many transactions, because not
more than a single transaction is executing at any point of time. However, a serial schedule is inefficient in the
sense that the transactions suffer longer waiting and response times, and resource utilization is low.
In a concurrent schedule, CPU time is shared among two or more transactions in order to run them concurrently.
However, this creates the possibility that more than one transaction may need to access a single data item for
read/write purposes, and the database could contain inconsistent values if such accesses are not handled properly.
Let us explain with the help of an example.
Let us consider two transactions T1 and T2, whose instruction sets are given below. T1 is the
same as we have seen earlier, while T2 is a new transaction.
T1
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;
T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
T2 is a new transaction which deposits to account C 10% of the amount in account A.
If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or T2 will
completely finish before T1 can begin. However, if we want to create a concurrent schedule, then some
context switches need to be made, so that some portion of T1 will be executed, then some portion of T2 will
be executed and so on. For example, say we have prepared the following concurrent schedule.
T1                  T2
Read A;
A = A - 100;
Write A;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Read B;
B = B + 100;
Write B;
No problem here. We have made two context switches in this schedule, the first one after executing the
third instruction of T1, and the second after executing the last statement of T2. Assuming A initially holds
Rs 1000/-, T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads the value of A,
calculates the value of Temp to be Rs 90/- and adds the value to C. The remaining part of T1 is executed and
Rs 100/- is added to B.
It is clear that proper context switching is very important in order to maintain the Consistency and Isolation
properties of the transactions. But let us take another example where a wrong context switch can bring
about disaster. Consider the following example involving the same T1 and T2:
T1                  T2
Read A;
A = A - 100;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Write A;
Read B;
B = B + 100;
Write B;
This schedule is wrong with respect to the intended order (T1, then T2), because we have made the switch after
the second instruction of T1, before T1 has written its new value of A. The result is very confusing. If we
consider accounts A and B both containing Rs 1000/- each, then the schedule should have left Rs 900/- in A and
Rs 1100/- in B, and added Rs 90/- to C (as C should be increased by 10% of the amount in A). But in this wrong
schedule, the context switch is performed before the new value of Rs 900/- has been written to A. T2 reads the
old value of A, which is still Rs 1000/-, and deposits Rs 100/- in C. C makes an unjust gain of Rs 10/- out of
nowhere.
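The two interleavings can be replayed with a short simulation. This is a sketch under stated assumptions: A and B start at Rs 1000/- as in the text, C is assumed to start at 0, and 10% is computed with integer division to keep whole rupees. All function names are hypothetical.

```python
def read(item):
    def step(db, local):
        local[item] = db[item]        # copy the stored value into the workspace
    return step

def write(item):
    def step(db, local):
        db[item] = local[item]        # copy the workspace value back to the database
    return step

def debit_a(db, local):
    local["A"] -= 100                 # A = A - 100

def credit_b(db, local):
    local["B"] += 100                 # B = B + 100

def ten_percent(db, local):
    local["Temp"] = local["A"] // 10  # Temp = A * 0.1, in whole rupees

def add_temp(db, local):
    local["C"] += local["Temp"]       # C = C + Temp

t2 = [("T2", s) for s in (read("A"), ten_percent, read("C"), add_temp, write("C"))]

# First schedule: switch after T1 has written A
good = ([("T1", s) for s in (read("A"), debit_a, write("A"))] + t2 +
        [("T1", s) for s in (read("B"), credit_b, write("B"))])
# Second schedule: switch before T1 has written A
bad = ([("T1", s) for s in (read("A"), debit_a)] + t2 +
       [("T1", s) for s in (write("A"), read("B"), credit_b, write("B"))])

def run(schedule, initial):
    db = dict(initial)
    workspace = {"T1": {}, "T2": {}}  # private variables of each transaction
    for txn, step in schedule:
        step(db, workspace[txn])
    return db

start = {"A": 1000, "B": 1000, "C": 0}
print(run(good, start))  # {'A': 900, 'B': 1100, 'C': 90}
print(run(bad, start))   # {'A': 900, 'B': 1100, 'C': 100}
```

The only difference between the two runs is the position of `write("A")` relative to T2, which is exactly where the extra Rs 10/- in C comes from.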
In the above example, we detected the error simply by examining the schedule and applying common sense.
But there must be some well-formed rules regarding how to arrange the instructions of the transactions to create
error-free concurrent schedules. This brings us to our next topic, the concept of Serializability.
Serializability
When several concurrent transactions are trying to access the same data item, the instructions within these
concurrent transactions must be ordered in some way so that no problem arises in accessing and releasing the
shared data item. There are two aspects of serializability which are described here:
Conflict Serializability
Two instructions of two different transactions may want to access the same data item in order to perform a
read/write operation. Conflict Serializability deals with detecting whether the instructions are conflicting in
any way, and specifying the order in which these two instructions will be executed in case there is any
conflict. A conflict arises if at least one (or both) of the instructions is a write operation. The following rules
are important in Conflict Serializability:
1. If two instructions of the two concurrent transactions are both read operations, then they are not in
conflict, and can be allowed to take place in any order.
2. If one of the instructions wants to perform a read operation and the other instruction wants to perform a
write operation, then they are in conflict, hence their ordering is important. If the read instruction is performed
first, then it reads the old value of the data item, and after the reading is over the new value of the data item is
written. If the write instruction is performed first, then it updates the data item with the new value and the read
instruction reads the newly updated value.
3. If both the instructions are write operations, then they are in conflict, and their ordering is important. The
transactions do not read the value updated by each other, but the value that persists in the data item after the
schedule is over is the one written by the instruction that performed the last write.
It may happen that we want to execute the same set of transactions in a different schedule on another day.
Keeping these rules in mind, we may sometimes alter parts of one schedule (S1) to create another schedule
(S2) by swapping only the non-conflicting parts of the first schedule. The conflicting parts cannot be swapped
in this way because the ordering of the conflicting instructions is important and cannot be changed in any
other schedule that is derived from the first. If these two schedules are made of the same set of transactions,
then both S1 and S2 would yield the same result if the conflict resolution rules are maintained while creating
the new schedule. In that case the schedules S1 and S2 would be called Conflict Equivalent. A schedule that is
Conflict Equivalent to some serial schedule is called Conflict Serializable.
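The conflict rules and the idea of Conflict Equivalence can be sketched in code. Two assumptions are made here: an operation is written as a tuple (transaction, "R" or "W", data item), and each transaction performs a given operation on a given item at most once. The function names are hypothetical.

```python
from itertools import combinations

def conflicts(op1, op2):
    """Different transactions, same data item, at least one write."""
    (t1, a1, x1), (t2, a2, x2) = op1, op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

def conflict_pairs(schedule):
    """All (earlier, later) pairs of conflicting operations in the schedule."""
    return {(p, q) for p, q in combinations(schedule, 2) if conflicts(p, q)}

def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair appears in the same order."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

# Serial schedule: T1 completely, then T2
serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
          ("T2", "R", "A"), ("T2", "R", "C"), ("T2", "W", "C")]
# Concurrent schedule where T2 runs after T1 has written A
s_ok = [("T1", "R", "A"), ("T1", "W", "A"),
        ("T2", "R", "A"), ("T2", "R", "C"), ("T2", "W", "C"),
        ("T1", "R", "B"), ("T1", "W", "B")]
# Concurrent schedule where T2 reads A before T1 writes it
s_bad = [("T1", "R", "A"),
         ("T2", "R", "A"), ("T2", "R", "C"), ("T2", "W", "C"),
         ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B")]

print(conflict_equivalent(s_ok, serial))   # True
print(conflict_equivalent(s_bad, serial))  # False
```

The only conflicting pair in these schedules is T1's write of A and T2's read of A. The second concurrent schedule reverses their order, so it is not Conflict Equivalent to running T1 and then T2.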
View Serializability:
This is another type of serializability, defined by comparing a schedule with another schedule created out of
it that involves the same set of transactions. These two schedules would be called View Equivalent if the
following rules are followed while creating the second schedule out of the first, and a schedule is View
Serializable if it is View Equivalent to some serial schedule. Let us consider that the
transactions T1 and T2 are being scheduled to create two different schedules S1 and S2 which we want to
be View Equivalent, and that both T1 and T2 want to access the same data item.
1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the initial value of
that same data item.
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1 should write the
value in the data item before T2 reads it.
3. If in S1, T1 performs the final write operation on that data item, then in S2 also, T1 should perform the
final write operation on that data item.
Except in these three cases, any alteration can be possible while creating S2 by modifying S1.
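The three rules can be checked mechanically. This is a sketch under two assumptions: operations are written as (transaction, "R" or "W", data item), and each transaction reads a given item at most once. The function names are hypothetical.

```python
def view_profile(schedule):
    """Collect what the three rules compare: initial reads, reads-from, final writers."""
    last_writer = {}                  # item -> transaction that wrote it most recently
    initial_reads, reads_from = set(), set()
    for txn, action, item in schedule:
        if action == "R":
            if item in last_writer:
                reads_from.add((txn, item, last_writer[item]))  # rule 2
            else:
                initial_reads.add((txn, item))                  # rule 1
        else:
            last_writer[item] = txn
    return initial_reads, reads_from, last_writer               # last_writer: rule 3

def view_equivalent(s1, s2):
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)

serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
          ("T2", "R", "A"), ("T2", "R", "C"), ("T2", "W", "C")]
s_ok = [("T1", "R", "A"), ("T1", "W", "A"),
        ("T2", "R", "A"), ("T2", "R", "C"), ("T2", "W", "C"),
        ("T1", "R", "B"), ("T1", "W", "B")]
s_bad = [("T1", "R", "A"),
         ("T2", "R", "A"), ("T2", "R", "C"), ("T2", "W", "C"),
         ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B")]

print(view_equivalent(s_ok, serial))   # True
print(view_equivalent(s_bad, serial))  # False
```

In the last schedule T2 reads the initial value of A instead of the value written by T1, so rules 1 and 2 are both violated with respect to the serial order T1, T2.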
Concurrency Control
When multiple transactions are trying to access the same sharable resource, many problems could arise if
the access control is not done properly. There are some important mechanisms by which access control can be
maintained. Earlier we talked about theoretical concepts like serializability; in practice these can be
implemented by using Locks and Timestamps. Here we shall discuss some protocols where Locks and
Timestamps can be used to provide an environment in which concurrent transactions can preserve their
Consistency and Isolation properties.
Lock Based Protocol
A lock is nothing but a mechanism that tells the DBMS whether a particular data item is being used by any
transaction for read/write purposes. Since there are two types of operations, i.e. read and write, whose basic
natures are different, the locks for read and write operations may behave differently.
Read operations performed by different transactions on the same data item pose less of a challenge. The value
of the data item, if constant, can be read by any number of transactions at any given time.
A write operation is something different. When a transaction writes some value into a data item, the content of
that data item remains in an inconsistent state, starting from the moment when the writing operation begins up
to the moment the writing operation is over. If we allow any other transaction to read/write the value of the
data item during the write operation, that transaction will read an inconsistent value or overwrite the value
being written by the first transaction. In both cases anomalies will creep into the database.
The simple rule for locking can be derived from here. If a transaction is reading the content of a sharable data
item, then any number of other transactions can be allowed to read the content of the same data item. But if any
transaction is writing into a sharable data item, then no other transaction will be allowed to read or write that
same data item.
Based on the rules we have found, we can classify the locks into two types.
Shared Lock: A transaction may acquire a shared lock on a data item in order to read its content. The lock is
shared in the sense that any other transaction can acquire a shared lock on that same data item for reading
purposes.
Exclusive Lock: A transaction may acquire an exclusive lock on a data item in order to both read and write it.
The lock is exclusive in the sense that no other transaction can acquire any kind of lock (either shared or
exclusive) on that same data item.
The relationship between Shared and Exclusive Lock can be represented by the following table which is
known as Lock Matrix.
                    Lock already existing
Lock requested      Shared      Exclusive
Shared              TRUE        FALSE
Exclusive           FALSE       FALSE
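The Lock Matrix translates directly into a lookup table. In this sketch "S" and "X" stand for Shared and Exclusive, and the names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# (lock requested, lock already existing) -> may the request be granted?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """True if the requested lock is compatible with every lock held by other transactions."""
    return all(COMPATIBLE[(requested, held)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: any number of readers may share the item
print(can_grant("X", ["S"]))       # False: a writer must wait for the readers
print(can_grant("X", []))          # True: nobody holds a lock on the item
```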
How Should Lock be Used?
In a transaction, a data item which we want to read/write should first be locked before the read/write is done.
After the operation is over, the transaction should then unlock the data item so that other transactions can lock
that same data item for their respective usage. In the earlier chapter we had seen a transaction to transfer Rs
100/- from account A to account B. The transaction should now be written as the following:
Lock-X (A); (Exclusive Lock, we want to both read A’s value and modify it)
Read A;
A = A - 100;
Write A;
Unlock (A); (Unlocking A after the modification is done)
Lock-X (B); (Exclusive Lock, we want to both read B’s value and modify it)
Read B;
B = B + 100;
Write B;
Unlock (B); (Unlocking B after the modification is done)
And the transaction that deposits 10% amount of account A to account C should now be written as:
Lock-S (A); (Shared Lock, we only want to read A’s value)
Read A;
Temp = A * 0.1;
Unlock (A); (Unlocking A)
Lock-X (C); (Exclusive Lock, we want to both read C’s value and modify it)
Read C;
C = C + Temp;
Write C;
Unlock (C); (Unlocking C after the modification is done)
Let us see how these locking mechanisms help us to create error free schedules. You should remember that in
the previous chapter we discussed an example of an erroneous schedule:
T1                  T2
Read A;
A = A - 100;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Write A;
Read B;
B = B + 100;
Write B;
We detected the error based on common sense only, that the Context Switching is being performed before the
new value has been updated in A. T2 reads the old value of A, and thus deposits a wrong amount in C. Had we
used the locking mechanism, this error could never have occurred. Let us rewrite the schedule using the locks.
T1                  T2
Lock-X (A)
Read A;
A = A - 100;
                    Lock-S (A)
                    Read A;
                    Temp = A * 0.1;
                    Unlock (A)
                    Lock-X (C)
                    Read C;
                    C = C + Temp;
                    Write C;
                    Unlock (C)
Write A;
Unlock (A)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
A schedule like the above cannot actually be produced, provided that we use the locks in the
transactions. See the first statement in T2, which attempts to acquire a lock on A. This would be impossible
because T1 has not released the exclusive lock on A, and T2 cannot get the shared lock it wants on A. It
must wait until the exclusive lock on A is released by T1, and can begin its execution only after that. So the
proper schedule would look like the following:
T1                  T2
Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                    Lock-S (A)
                    Read A;
                    Temp = A * 0.1;
                    Unlock (A)
                    Lock-X (C)
                    Read C;
                    C = C + Temp;
                    Write C;
                    Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
And this automatically becomes a correct schedule. We need not apply any manual effort to detect or
correct the errors that may creep into a schedule when locks are not used.
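How the lock on A forces T2 to wait can be sketched with a tiny lock table. This is an illustration only, assuming a request that cannot be granted simply returns False (a real DBMS would queue the transaction). The class `LockTable` and its methods are hypothetical.

```python
class LockTable:
    def __init__(self):
        self.held = {}   # data item -> {transaction: "S" or "X"}

    def lock(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        others = {t: m for t, m in self.held.get(item, {}).items() if t != txn}
        if mode == "X" and others:
            return False                  # exclusive needs the item to itself
        if mode == "S" and "X" in others.values():
            return False                  # cannot share with a writer
        self.held.setdefault(item, {})[txn] = mode
        return True

    def unlock(self, txn, item):
        self.held.get(item, {}).pop(txn, None)

table = LockTable()
first = table.lock("T1", "A", "X")      # T1: Lock-X (A)
blocked = table.lock("T2", "A", "S")    # T2: Lock-S (A) while T1 still holds A
table.unlock("T1", "A")                 # T1: Unlock (A), after Write A
retry = table.lock("T2", "A", "S")      # T2 asks again
print(first, blocked, retry)            # True False True
```

T2 gets its shared lock only after T1 has written and unlocked A, which is why T2 can no longer read the old value of A.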
Two Phase Locking Protocol
The use of locks has helped us to create neat and clean concurrent schedules. The Two Phase Locking Protocol
defines the rules of how to acquire the locks on a data item and how to release the locks.
The Two Phase Locking Protocol assumes that a transaction can only be in one of two phases.
Growing Phase: In this phase the transaction can only acquire locks, but cannot release any lock. The
transaction enters the growing phase as soon as it acquires the first lock it wants. From now on it has no option
but to keep acquiring all the locks it would need. It cannot release any lock at this phase even if it has finished
working with a locked data item. Ultimately the transaction reaches a point where all the locks it may need have
been acquired. This point is called the Lock Point.
Shrinking Phase: After the Lock Point has been reached, the transaction enters the shrinking phase. In this phase
the transaction can only release locks, but cannot acquire any new lock. The transaction enters the shrinking
phase as soon as it releases the first lock after crossing the Lock Point. From now on it has no option but to
keep releasing all the acquired locks.
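The two-phase rule is easy to check for a single transaction: once any lock has been released, no further lock may be acquired. A minimal sketch, assuming a transaction is given as a list of ("lock" or "unlock", data item) steps, with hypothetical names:

```python
def is_two_phase(ops):
    """True if no lock is acquired after the first unlock."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif action == "lock" and shrinking:
            return False              # acquiring in the shrinking phase is not allowed
    return True

# The transfer transaction as written earlier: A is unlocked before B is locked
t1_as_written = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
# The same transaction rearranged so that both locks are taken before any release
t1_two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]

print(is_two_phase(t1_as_written))  # False
print(is_two_phase(t1_two_phase))   # True
```

Note that the locked version of the transfer transaction shown earlier unlocks A before it locks B, so it uses locks but does not itself follow the Two Phase Locking Protocol.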
There are two different versions of the Two Phase Locking Protocol. One is called the Strict Two Phase
Locking Protocol and the other one is called the Rigorous Two Phase Locking Protocol.
Strict Two Phase Locking Protocol
In this protocol, a transaction may release all the shared locks after the Lock Point has been reached, but it
cannot release any of the exclusive locks until the transaction commits. This protocol helps in creating
cascadeless schedules.
Cascading is a typical problem faced while creating concurrent schedules. Consider the following
schedule once again.
T1                  T2
Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                    Lock-S (A)
                    Read A;
                    Temp = A * 0.1;
                    Unlock (A)
                    Lock-X (C)
                    Read C;
                    C = C + Temp;
                    Write C;
                    Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
The schedule is theoretically correct, but a very strange kind of problem may arise here. T1 releases the
exclusive lock on A, and immediately after that the context switch is made. T2 acquires a shared lock on A to
read its value, performs a calculation, updates the content of account C and then issues COMMIT. However, T1
is not finished yet. What if the remaining portion of T1 encounters a problem (power failure, disk failure, etc.)
and cannot be committed? In that case T1 should be rolled back and the old BFIM value of A should be
restored. In such a case T2, which has read the updated (but not committed) value of A and calculated the
value of C based on this value, must also be rolled back. We have to roll back T2 for no fault of T2
itself, but because we proceeded with T2 depending on a value which had not yet been committed. This
phenomenon of rolling back a dependent (child) transaction when the transaction it read from (the parent) is
rolled back is called Cascading Rollback, which causes a tremendous loss of processing power and execution time.
Using Strict Two Phase Locking Protocol, Cascading Rollback can be prevented. In Strict Two Phase Locking
Protocol a transaction cannot release any of its acquired exclusive locks until the transaction commits. In such
a case, T1 would not release the exclusive lock on A until it finally commits, which makes it impossible for T2
to acquire the shared lock on A at a time when A’s value has not been committed. This makes it impossible for
a schedule to be cascading.
Rigorous Two Phase Locking Protocol
In Rigorous Two Phase Locking Protocol, a transaction is not allowed to release any lock (either shared or
exclusive) until it commits. This means that until the transaction commits, other transactions may acquire a
shared lock on a data item on which the uncommitted transaction has a shared lock, but cannot acquire any
lock on a data item on which the uncommitted transaction has an exclusive lock.
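The difference between the basic, Strict and Rigorous versions lies only in which locks may be released before COMMIT. The sketch below classifies one transaction's steps. The step format ("lock", item, mode), ("unlock", item), ("commit",) and the function name `classify` are assumptions made for this illustration.

```python
def classify(ops):
    """Return the most restrictive locking protocol the step list obeys."""
    mode_of = {}                  # item -> "S" or "X" currently held
    released_any = committed = False
    early_shared = early_exclusive = False
    for op in ops:
        if op[0] == "lock":
            if released_any:
                return "not two-phase"        # lock acquired in the shrinking phase
            mode_of[op[1]] = op[2]
        elif op[0] == "unlock":
            released_any = True
            mode = mode_of.pop(op[1])
            if not committed:                 # released before COMMIT
                if mode == "X":
                    early_exclusive = True
                else:
                    early_shared = True
        elif op[0] == "commit":
            committed = True
    if early_exclusive:
        return "basic two-phase"
    if early_shared:
        return "strict"
    return "rigorous"

basic = [("lock", "A", "X"), ("lock", "B", "X"),
         ("unlock", "A"), ("unlock", "B"), ("commit",)]
strict = [("lock", "A", "S"), ("lock", "C", "X"),
          ("unlock", "A"), ("commit",), ("unlock", "C")]
rigorous = [("lock", "A", "S"), ("lock", "C", "X"),
            ("commit",), ("unlock", "A"), ("unlock", "C")]

print(classify(basic), classify(strict), classify(rigorous))
```

Every rigorous transaction is also strict, and every strict one is also two-phase; the function simply reports the strongest label that applies.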
Timestamp Ordering Protocol
A timestamp is a tag that can be attached to any transaction or any data item, which denotes a specific time on
which the transaction or data item had been activated in any way. We, who use computers, must all be familiar
with the concepts of “Date Created” or “Last Modified” properties of files and folders. Well, timestamps are
things like that.
A timestamp can be implemented in two ways. The simplest one is to directly assign the current value of the
clock to the transaction or the data item. The other policy is to attach the value of a logical counter that keeps
incrementing as new timestamps are required.
The timestamp of a transaction denotes the time when it was first activated. The timestamp of a data item can
be of the following two types:
W-timestamp (Q): This means the latest time when the data item Q has been written into.
R-timestamp (Q): This means the latest time when the data item Q has been read from.
These two timestamps are updated each time a successful read/write operation is performed on the data item
Q.
How should timestamps be used?
The timestamp ordering protocol ensures that any pair of conflicting read/write operations will be executed in
their respective timestamp order. This is an alternative solution to using locks.
For Read operations:
If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data item Q which has already
been overwritten by some other transaction. Hence the value which T wanted to read from Q does not exist
there anymore, and T would be rolled back.
If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item Q which has been
written by some other transaction earlier. Hence T will be allowed to read the value of Q, and the R-timestamp
of Q should be updated to the larger of its current value and TS (T), so that it always records the latest reader.
For Write operations:
If TS (T) < R-timestamp (Q), then it means that the system has waited too long for transaction T to write its
value, and the delay has become so great that it has allowed another transaction to read the old value of data
item Q. In such a case T has lost its relevance and will be rolled back.
Else if TS (T) < W-timestamp (Q), then transaction T has delayed so much that the system has allowed
another transaction to write into the data item Q. In such a case too, T has lost its relevance and will be rolled
back.
Otherwise the system executes transaction T and updates the W-timestamp of Q to TS (T).
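The read and write rules can be put together in a short sketch. Several assumptions are made: timestamps are plain integers from a logical counter, both timestamps of a data item start at 0, a rollback is signalled by raising an exception, and the R-timestamp is kept as the largest reader timestamp seen so far. The names `Item`, `RollBack`, `ts_read` and `ts_write` are hypothetical.

```python
class RollBack(Exception):
    """Raised when the protocol decides that transaction T must be rolled back."""

class Item:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0     # R-timestamp (Q)
        self.w_ts = 0     # W-timestamp (Q)

def ts_read(ts, q):
    if ts < q.w_ts:
        raise RollBack("the value T wanted to read has already been overwritten")
    q.r_ts = max(q.r_ts, ts)          # remember the latest reader
    return q.value

def ts_write(ts, q, value):
    if ts < q.r_ts:
        raise RollBack("a later transaction has already read the old value")
    if ts < q.w_ts:
        raise RollBack("a later transaction has already written the item")
    q.value = value
    q.w_ts = ts

def attempt(operation, *args):
    """Run one operation and report 'ok' or 'rolled back'."""
    try:
        operation(*args)
        return "ok"
    except RollBack:
        return "rolled back"

q = Item(1000)
results = [
    attempt(ts_read, 5, q),           # TS 5 reads Q: allowed, R-timestamp becomes 5
    attempt(ts_write, 3, q, 0),       # TS 3 writes after TS 5 has read: rolled back
    attempt(ts_write, 7, q, 900),     # TS 7 writes: allowed, W-timestamp becomes 7
    attempt(ts_read, 6, q),           # TS 6 reads a value overwritten by TS 7: rolled back
]
print(results)  # ['ok', 'rolled back', 'ok', 'rolled back']
```

No transaction ever waits in this scheme; a transaction that arrives too late for its timestamp is rolled back instead, which is what makes it an alternative to locking.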