Database Management System
SYNERGY INSTITUTE OF ENGG. & TECHNOLOGY
DHENKANAL
DATABASE MANAGEMENT SYSTEM
Lecture notes
Prepared by :
Nirjharinee Parida
PCCS4204 Database Engineering
Module 1: (12 Hrs)
Introduction to Database Systems, Basic Concepts & Definitions, Data Dictionary, DBA, File-oriented System vs. Database System, Database Language.
Database System Architecture: Schemas, Sub-schemas & Instances, 3-level Database Architecture, Data Abstraction, Data Independence, Mappings, Structure, Components & Functions of DBMS, Data Models, Mapping E-R Model to Relational, Network and Object-Oriented Data Models, Types of Database Systems.
Storage Strategies: Detailed Storage Architecture, Storing Data, Magnetic Disk, RAID, Other Disks, Magnetic Tape, Storage Access, File & Record Organization, File Organizations & Indexes, Ordered Indices, B+ Tree Index Files, Hashing.
Module 2: (16 Hrs)
Relational Algebra, Tuple & Domain Relational Calculus, Relational Query Languages: SQL and QBE.
Database Design: Database Development Life Cycle (DDLC), Automated Design Tools, Functional Dependency and Decomposition, Dependency Preservation & Lossless Design, Normalization, Normal Forms: 1NF, 2NF, 3NF and BCNF, Multi-valued Dependencies, 4NF & 5NF.
Query Processing and Optimization: Evaluation of Relational Algebra Expressions, Query Optimization.
Module 3: (12 Hrs)
Transaction Processing and Concurrency Control: Transaction Concepts, Concurrency Control, Locking and Timestamp Methods for Concurrency Control.
Database Recovery System: Types of Database Failure & Types of Database Recovery, Recovery Techniques.
Advanced Topics: Object-Oriented & Object-Relational Databases, Parallel & Distributed Databases, Introduction to Data Warehousing & Data Mining.
Text Books:
1. Database System Concepts by Sudarshan, Korth (McGraw-Hill Education)
2. Fundamentals of Database System by Elmasri & Navathe (Pearson Education)
Reference Books:
(1) An Introduction to Database System by Bipin Desai (Galgotia Publications)
(2) Database System: Concept, Design & Application by S.K. Singh (Pearson Education)
(3) Database Management System by Leon & Leon (Vikas Publishing House)
(4) Database Modeling and Design: Logical Design by Toby J. Teorey, Sam S. Lightstone and Tom Nadeau, 4th Edition, 2005 (Elsevier India Publications, New Delhi)
(5) Fundamentals of Database Management System by Gillenson (Wiley India)
Module-1
What do you mean by Data and Database?
Data can be divided into three categories.
Raw data: a value such as “85” has no meaning when it stands alone. It acquires meaning once you know, for example, that it is the weight of a man in kilograms.
Related raw data: a group (data set or data file) of organized raw data items that are tied together. For example, it could be a group of names, weights, blood groups and identification numbers, all tied to the identity cards issued to patients at a hospital.
Cleaned raw data: any of the above after it has been validated or processed. Such a process might ensure, for example, that a blood group never has a value such as “red” or “black” and takes only allowed values such as A+, A-, B+ or B-.
Data can be acquired from many different sources. Each item must be evaluated to determine which category it belongs to and whether it needs additional validation before it is analysed to produce information.
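The validation step that turns related raw data into cleaned raw data can be sketched in Python. This is only an illustration: the record fields, the sample patients and the exact set of allowed blood groups are assumptions, not part of the notes.

```python
# Hypothetical set of allowed values; the notes only require that values
# such as "red" or "black" be rejected.
ALLOWED_BLOOD_GROUPS = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}


def clean_records(records):
    """Split related raw data into validated and rejected records."""
    valid, rejected = [], []
    for record in records:
        group_ok = record["blood_group"] in ALLOWED_BLOOD_GROUPS
        weight_ok = record["weight_kg"] > 0
        if group_ok and weight_ok:
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected


# Related raw data: each record ties a name, weight and blood group to an ID.
raw = [
    {"id": 1, "name": "Asha", "weight_kg": 85, "blood_group": "A+"},
    {"id": 2, "name": "Ravi", "weight_kg": 70, "blood_group": "red"},
]
valid, rejected = clean_records(raw)
print([r["id"] for r in valid], [r["id"] for r in rejected])
```

Only the records in `valid` would be passed on to analysis; the rejected ones need correction at the source.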
Database:
A database consists of an organized collection of interrelated data for one or more
uses, typically in digital form.
Examples of databases could be: Database for Educational Institute or a Bank, Library,
Railway Reservation system etc.
What Is a DBMS?
• A DBMS consists of two things: a database and a set of programs.
• The database is a very large, integrated collection of data.
• The set of programs is used to access and process the database.
• A DBMS can therefore be defined as a software package designed to store, manage and process the database.
FILE SYSTEM
[Figure: a set of programs operating directly on the file system]
Data is stored in different files in the form of records.
Programs are written from time to time, as requirements arise, to manipulate the data within the files. For example:
• A program to debit and credit an account
• A program to find the balance of an account
• A program to generate monthly statements
Disadvantages of a File System Compared with a DBMS
The most significant disadvantages of a file system when compared with a database management system are as follows:
Data Redundancy: Files are created in the file system as and when an enterprise needs them during its growth. Repetition of information about an entity therefore cannot be avoided.
For example: The addresses of customers will be present in the file that maintains information about customers holding savings accounts, and also in the file that maintains the current accounts. When the same customer has both a savings account and a current account, his address will be stored in two places.
Data Inconsistency: Data redundancy leads to a greater problem than wasted storage: it may lead to inconsistent data. The same data repeated at several places may no longer match after it has been updated at only some of those places.
For example: Suppose a customer requests a change of address for his account in the bank, and the program is executed to update the savings account file only, while his current account file is not updated. Afterwards the addresses of the same customer in the savings account file and the current account file will not match.
Moreover, there will be no way to find out which of the two addresses is the latest.
Difficulty in Accessing Data: For ad hoc reports, no program will already exist, and the only options are to write a new program to generate the requested report or to work manually. Either option takes an impractical amount of time and is expensive.
For example: Suppose the administrator suddenly gets a request to generate a list of all customers holding a savings account who live in a particular locality of the city. The administrator will not have a program already written to generate that list, but suppose he has a program that can generate a list of all customers holding a savings account. He can then either go through that list manually to select the customers living in the particular locality, or write a new program to generate the new list. Both approaches take a long time, which is generally impractical.
Data Isolation: Since the data files are created at different times, and probably by different people, the structures of different files generally will not match. The data for a particular entity will be scattered across different files, so it will be difficult to obtain the appropriate data.
For example: Suppose the address in the savings account file has the fields Add line1, Add line2, City, State, Pin, while the address in the current account file has the fields House No., Street No., Locality, City, State, Pin. The administrator is asked to provide the list of customers living in a particular locality. Providing a consolidated list of all such customers requires looking in both files, but the two files store the address in different ways. Writing a program to generate such a list will be difficult.
Integrity Problems: All consistency constraints have to be applied to the database through appropriate checks in the coded programs. This is very difficult when the number of such constraints is very large.
For example: An account should not have a balance of less than Rs. 500. To enforce this constraint, an appropriate check must be added to the program that adds a record and to the program that withdraws from an account. If this limit is later increased, all those checks must be updated to avoid inconsistency. Such recurring changes to the programs are a great burden for the administrator.
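By contrast, a DBMS lets the minimum-balance rule be declared once in the schema, so that every program is subject to it. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name, column names and the `withdraw` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The minimum-balance rule is declared once, in the schema, not in each program.
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"
)
conn.execute("INSERT INTO account VALUES (1, 1000)")


def withdraw(account_no, amount):
    """Return True if the database accepted the withdrawal, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, account_no),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update.
        return False


ok_small = withdraw(1, 400)  # leaves 600, allowed
ok_large = withdraw(1, 200)  # would leave 400, rejected by the constraint
balance = conn.execute(
    "SELECT balance FROM account WHERE account_no = 1"
).fetchone()[0]
print(ok_small, ok_large, balance)
```

If the limit changes later, only the constraint in the schema needs to change, not every program that touches the balance.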
Security and Access Control: The database should be protected from unauthorized users, and not every user should be allowed to access all the data. Since application programs are added to the system in an ad hoc manner, such security constraints are difficult to enforce.
For example: The payroll personnel in a bank should not be allowed to access the account information of the customers.
Concurrency Problems: When more than one user is allowed to process the database, two or more users may try to update a shared data element at about the same time, which can result in inconsistent data.
For example: Suppose the balance of an account is Rs. 500, and users A and B try to withdraw Rs. 100 and Rs. 50 respectively at almost the same time using the Update process.
Update:
1. Read the balance amount.
2. Subtract the withdrawn amount from the balance.
3. Write the updated balance value.
Suppose A performs steps 1 and 2, i.e. reads 500 and subtracts 100 from it. Before A writes the result, B starts the Update process, also reads the balance as 500, subtracts 50 and writes back 450. User A then writes his updated balance of 400. The two writes may occur in either order, depending on the systems being used by the two users, so the final balance will be either 400 or 450. Both values are wrong: the correct balance after both withdrawals is Rs. 350. The balance now holds an inconsistent value permanently.
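The lost update described above can be reproduced with a small deterministic simulation in Python. No real concurrency is used; the interleaving of the two users' steps is written out by hand, and the helper names are illustrative.

```python
balance = {"value": 500}


def read_balance():
    return balance["value"]


def write_balance(new_value):
    balance["value"] = new_value


# Interleaved schedule: both users read before either of them writes.
a_read = read_balance()      # A reads 500
b_read = read_balance()      # B reads 500
write_balance(b_read - 50)   # B writes 450
write_balance(a_read - 100)  # A writes 400, overwriting B's update
interleaved_result = read_balance()

# Serial schedule: B completes the whole Update process, then A does.
balance["value"] = 500
write_balance(read_balance() - 50)
write_balance(read_balance() - 100)
serial_result = read_balance()

print(interleaved_result, serial_result)  # 400 350
```

Only the serial schedule yields the correct balance of 350. Concurrency control in a DBMS exists to guarantee that interleaved executions behave like some serial one.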
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
Role of DBMS:
The earlier information system worked as follows:
[Figure: user, set of programs, file system, disk, stacked in that order]
The DBMS is another layer of software placed between the file system and the set of application programs. The role of the DBMS can be described at a very high level by the following diagram.
[Figure: the DBMS as a layer between the application programs and the file system]
Instances and Schema:
Database schema: The overall design of the database. An analogy in a programming language is the declaration of variables with their data types. In a relational database management system, the definitions of the table names and of their fields with data types form the database schema.
Database instance: The collection of information stored in the database at a particular moment is called a database instance. An analogy in a programming language is the set of values stored in the variables during the execution of a program. In a relational database management system, the data stored in the various tables at a particular time is the instance of the database.
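The distinction can be seen in a small sketch using Python's built-in `sqlite3` module. The `student` table and its rows are borrowed from the mapping example later in these notes; the helper function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")


def schema():
    # Column names and declared types: the design of the table.
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance():
    # The rows stored in the table at this moment.
    return conn.execute("SELECT sid, name, gpa FROM student ORDER BY sid").fetchall()


schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO student VALUES (1234, 'John', 2.8)")
conn.execute("INSERT INTO student VALUES (5678, 'Mary', 3.6)")
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)    # the schema is unchanged
print(instance_before, instance_after)  # the instance has changed
```

Inserting rows changes the instance from empty to two rows, while the schema stays the same.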
Data Abstraction: Three-Level Architecture of DBMS:
Since many database system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify the users’ interaction with the system.
Physical Level:
• Lowest level of abstraction.
• Describes how the data are actually stored.
• Complex low-level data structures are defined by system programs, and these are generally hidden even from high-level computer programs.
• In a relational database management system, the files and indexes used are described at the physical level of abstraction.
• This is similar to the way a programming language hides the exact way in which the values of variables, records or arrays are stored. The exact storage layout of a record or an array defined in, say, the C language belongs to the physical level of abstraction.
• The physical schema is used at the physical level of abstraction.
Logical Level:
1. Describes what data are stored in the database and what relationships exist among those data.
2. The entire database is represented by simple structures, which may be implemented by very complex structures at the physical level.
3. In a relational database management system, the definitions of tables and their fields are given at the logical level of abstraction.
4. An analogy in a programming language is the definition of record structures or arrays (say, in C).
5. The logical schema is used at the logical level of abstraction.
View Level:
1. Describes only part of the entire database.
2. Many users will not be concerned with all the information stored in a database.
3. The system may provide several views for the several types of users of the database, each showing only the part of the database that concerns them.
4. The view schema is used at the view level of abstraction.
[Figure: three-level architecture of a DBMS]
EXTERNAL VIEW: External View, External View
(mappings)
CONCEPTUAL LEVEL: Conceptual Schema
(mappings)
INTERNAL LEVEL: Internal Schema
STORED DATABASE
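A view at the external level can be sketched with Python's built-in `sqlite3` module. The `employee` table, its rows and the view name are hypothetical; the point is only that users of the view see a part of the database and never the salary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full table definition.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Loans", 52000), (2, "Ravi", "Payroll", 48000)],
)
# View level: an external view that exposes names and departments only.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

How the rows are laid out in files and indexes (the physical level) is not visible in any of these statements.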
Data Independence:
Physical Data Independence:
• Physical data independence is the ability to modify the physical schema without requiring changes in the logical schema or application programs to be rewritten.
• Changes in the physical schema can include: using new storage devices, using different data structures, using different file organizations or storage structures, or changing a file index. All these changes should be possible without changes in the logical schema and without application programs having to be rewritten.
Logical Data Independence:
• Logical data independence is the ability to modify the logical schema without requiring application programs to be rewritten.
• Changes such as the addition or deletion of entities, attributes or relationships are logical schema changes, and they should be possible without rewriting the already written application programs.
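Both kinds of independence can be illustrated with a short sketch using Python's built-in `sqlite3` module. The `customer` table, the index name and the `customers_in` function (standing in for an application program) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Dhenkanal"), (2, "Ravi", "Cuttack")],
)


def customers_in(city):
    # The "application program": it names only the columns it needs.
    rows = conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name", (city,)
    )
    return [row[0] for row in rows]


before = customers_in("Dhenkanal")

# Physical schema change: add an index, i.e. a new access path.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after_physical = customers_in("Dhenkanal")

# Logical schema change: add a new attribute to the table.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
after_logical = customers_in("Dhenkanal")

print(before, after_physical, after_logical)
```

The function is never rewritten, yet it returns the same result after each change. Note that a program written with `SELECT *` and positional access would be more fragile under the logical change.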
Data Definition Language (DDL):
This language provides a set of commands that can be used to define:
• what data is in the database,
• what relationships exist between the various data elements,
• what integrity constraints the various data items must satisfy, and so on.
• The DDL statements are compiled to form the Data Dictionary or Data Directory, which contains the metadata, i.e. data about data.
The data dictionary is consulted by the DBMS before any operation on data.
Data Manipulation Language (DML):
• This consists of very high-level statements that are used to specify the operations to be performed on the database.
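In SQL, `CREATE TABLE` is a typical DDL statement, and `INSERT`, `UPDATE`, `SELECT` and `DELETE` are typical DML statements. The sketch below uses Python's built-in `sqlite3` module; the `account` table is hypothetical, and SQLite's `sqlite_master` catalog table stands in for the data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the data, including integrity constraints.
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance INTEGER NOT NULL)"
)

# Data dictionary: SQLite records each definition in its catalog table.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# DML: operate on the stored data.
conn.execute("INSERT INTO account VALUES (1, 'Asha', 1000)")
conn.execute("UPDATE account SET balance = balance + 250 WHERE account_no = 1")
rows = conn.execute("SELECT holder, balance FROM account").fetchall()
conn.execute("DELETE FROM account WHERE account_no = 1")
remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

print(catalog, rows, remaining)
```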
Data Models, Schemas and Instances
A characteristic of the database approach is that it provides a level of data abstraction,
by hiding details of data storage that are not needed by most users.
A data model is a collection of concepts that can be used to describe the structure of a
database. The model provides the necessary means to achieve the abstraction.
The structure of a database is characterized by data types, relationships, and
constraints that hold for the data. Models also include a set of operations for specifying
retrievals and updates.
Data models are changing to include concepts to specify the behaviour of the database
application. This allows designers to specify a set of user defined operations that are
allowed.
Categories of Data Models
Data models can be categorized in multiple ways.
• High-level/conceptual data models: provide concepts close to the way users perceive the data.
• Physical data models: provide concepts that describe the details of how data is stored in the computer. These concepts are generally meant for the specialist, and not the end user.
• Representational data models: provide concepts that may be understood by the end user but are not far removed from the way data is organized.
Conceptual data models use concepts such as entities, attributes and relationships.
• Entity: represents a real-world object or concept.
• Attribute: represents a property of interest that describes an entity, such as name or salary.
• Relationship: represents an association among two or more entities.
Representational data models are used most frequently in commercial DBMSs. They
include relational data models, and legacy models such as network and hierarchical
models.
Physical data models describe how data is stored in files by representing record
formats, record orderings and access paths.
Object data models – a group of higher level implementation data models closer to
conceptual data models.
Schemas, Instances and Database State
The description of a database is called the database schema. The schema is specified
during database design, and is not expected to change frequently.
Data models have conventions for displaying schemas as diagrams. A displayed
schema is called a schema diagram.
Entity Relationship (ER) Model.
The most popular high-level conceptual data model is the ER model. It is frequently
used for the conceptual design of database applications.
The diagrammatic notation associated with the ER model is referred to as the ER diagram. ER diagrams show the basic data structures and constraints.
Entity Types, Entity Sets, Attributes and Keys
• The basic object of an ER diagram is the entity. An entity represents a ‘thing’ in the real world.
• Examples of entities might be physical entities, such as a student, a house or a product, or conceptual entities, such as a company, a job position or a course.
• Entities have attributes, which are the properties or characteristics of a particular entity.
[Figure: examples of entities and attributes]
There are several types of attributes, including:
• Simple vs. Composite
• Single-valued vs. Multi-valued
• Stored vs. Derived
Simple vs. Composite Attributes
• Composite attributes can be divided into smaller subparts, which represent more basic attributes that have their own meanings.
• A common example of a composite attribute is Address. Address can be broken down into a number of subparts, such as Street Address, City and Postal Code. Street Address may be further broken down into Number, Street Name and Apartment/Unit Number.
• Attributes that are not divisible into subparts are called simple or atomic attributes.
• If a composite attribute is always referred to as a whole, and its component attributes are never referred to individually, there is no need to subdivide it. For example, if you wish to store the Company Location and will never use atomic information such as Postal Code or City separately from the rest of the Location information (Street Address etc.), then the whole Location can be designated as a simple attribute.
• What are examples of other composite attributes?
Single-Valued vs. Multi-valued Attributes
• Most attributes have a single value for each entity: a car has only one model, a student has only one ID number, an employee has only one date of birth. These attributes are called single-valued attributes.
• Sometimes an attribute can have multiple values for a single entity. For example, a doctor may have more than one specialty (or only one), and a customer may have more than one mobile phone number (or none at all). These attributes are called multi-valued attributes.
• Multi-valued attributes may have lower and upper bounds that constrain the number of values allowed. For example, a doctor must have at least one specialty, but no more than 3 specialties.
Stored vs. Derived Attributes
• If the value of an attribute can be calculated from the value of another attribute, it is called a derived attribute.
• The attribute from which it is derived is called a stored attribute.
• Derived attributes are not stored in the file, but can be derived when needed from the stored attributes.
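A common illustration, chosen here as an assumption since the notes do not name one, is Age derived from the stored attribute Date of Birth. A minimal Python sketch:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    name: str
    date_of_birth: date  # stored attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth when needed."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if the birthday has not yet occurred this year.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


person = Person("John", date(1981, 1, 12))
age_day_before = person.age_on(date(2024, 1, 11))
age_on_birthday = person.age_on(date(2024, 1, 12))
print(age_day_before, age_on_birthday)  # 42 43
```

Because the age is never stored, it cannot become stale or disagree with the date of birth.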
Null Valued Attributes
• There are cases where an entity does not have an applicable value for an attribute. For these situations, the special value null is used.
• A person who does not have a mobile phone would have null stored as the value of the Mobile Phone Number attribute.
• Null can also be used in situations where the attribute value is unknown. There are two such cases. In the first, it is known that the attribute has a value, but the value is missing: every person has a hair color, but the information may not have been recorded. In the second, it is not known whether a value exists at all: if Mobile Phone Number is null, the person may have no mobile phone, or the information may simply be missing.
Complex Attributes
• Complex attributes are attributes in which composite and multi-valued attributes are nested in an arbitrary way.
• For example, a person can have more than one residence, and each residence can have more than one phone. The resulting complex attribute can be written using the following notation:
• {Multi-valued attributes are displayed between braces}
• (Components of a composite attribute are grouped between parentheses)
E.g.
{AddressPhone({Phone(AreaCode, PhoneNumber)}, Address(StreetAddress(Number,
Street, ApartmentNumber), City, State, Zip))}
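The nesting in this notation maps naturally onto nested data structures. In the Python sketch below, each composite attribute becomes a dataclass and each multi-valued attribute becomes a list. The class names mirror the notation, while the `Person` wrapper and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:  # composite: Phone(AreaCode, PhoneNumber)
    area_code: str
    phone_number: str


@dataclass
class StreetAddress:  # composite: StreetAddress(Number, Street, ApartmentNumber)
    number: str
    street: str
    apartment_number: str


@dataclass
class Address:  # composite: Address(StreetAddress(...), City, State, Zip)
    street_address: StreetAddress
    city: str
    state: str
    zip_code: str


@dataclass
class AddressPhone:
    phones: List[Phone]  # multi-valued: {Phone(...)}
    address: Address


@dataclass
class Person:
    name: str
    address_phones: List[AddressPhone]  # multi-valued: {AddressPhone(...)}


home = AddressPhone(
    phones=[Phone("0674", "555010"), Phone("0674", "555011")],
    address=Address(StreetAddress("12", "Main Road", "3B"), "Dhenkanal", "Odisha", "759001"),
)
person = Person("Asha", [home])
print(len(person.address_phones), len(person.address_phones[0].phones))
```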
Entity Types, Entity Sets, Keys and Value Sets
Entity Types and Entity Sets
• An entity type defines a collection of entities that have the same attributes. Each entity type in the database is described by its name and attributes. The entities share the same attributes, but each entity has its own value for each attribute.
Entity Type Example:
• Entity Type: Student
• Entity Attributes: StudentID, Name, Surname, Date of Birth, Department
• The collection of all entities of a particular entity type in the database at any point in time is called an entity set. The entity type (Student) and the entity set (Student) can be referred to using the same name.
Entity Set Example:
• Entity Type: Student
• Entity Set:
[123, John, Smith, 12/01/1981, Computer Technology]
[456, Jane, Doe, 05/02/1979, Mathematics]
[789, Semra, Aykan, 02/08/1980, Linguistics]
The entity type describes the intension, or schema, for a set of entities that share the same structure. The collection of entities of a particular entity type is grouped into the entity set, called the extension.
Key Attributes of an Entity Type
• An important constraint on the entities of an entity type is the uniqueness constraint.
• A key attribute is an attribute whose values are distinct for each individual entity in the entity set.
• The values of the key attribute can be used to identify each entity uniquely.
• Sometimes a key consists of several attributes together, where the combination of attribute values is unique for a given entity. This is called a composite key.
• Composite keys should be minimal, meaning that all the component attributes must be included to have the uniqueness property.
• Specifying that an attribute is a key of an entity type means that the uniqueness property must hold for every entity set of the entity type.
• An entity type can have more than one key attribute, and some entity types may have no key attribute. Entity types with no key attribute are called weak entity types.
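The uniqueness property can be checked mechanically for a given entity set. The Python sketch below uses the Student entities from the example above plus one hypothetical fourth student (a second John) added to make the contrast visible. Passing the check on one entity set does not prove that an attribute is a key, since the property must hold for every entity set of the type, but failing it does rule the attribute out.

```python
def is_unique(entity_set, attributes):
    """Return True if the combined values of `attributes` are distinct
    for every entity in this particular entity set."""
    seen = set()
    for entity in entity_set:
        value = tuple(entity[a] for a in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


students = [
    {"StudentID": 123, "Name": "John", "Surname": "Smith"},
    {"StudentID": 456, "Name": "Jane", "Surname": "Doe"},
    {"StudentID": 789, "Name": "Semra", "Surname": "Aykan"},
    {"StudentID": 790, "Name": "John", "Surname": "Doe"},  # hypothetical extra entity
]

print(is_unique(students, ["StudentID"]))        # candidate key
print(is_unique(students, ["Name"]))             # two Johns: not a key
print(is_unique(students, ["Name", "Surname"]))  # unique here as a combination
```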
Value Sets (Domains) of Attributes
• Each simple attribute of an entity type is associated with a domain of values, or value set, which specifies the set of values that may be assigned to that attribute for each entity. For example, Date of Birth must be before today’s date and after 01/01/1900, and the Student Name attribute must be a string of alphabetic characters.
• Value sets are not specified in ER diagrams.
ER DIAGRAM NOTATION
[Figure: summary of ER diagram notation]
Relationship Review
• Each time an attribute of one entity type refers to another entity type, some relationship exists.
• In ER diagrams, these references should be represented as relationships rather than attributes.
• For example, in the Company database schema, an attribute of Employee is the department they work for. Rather than representing this information as an attribute of the Employee entity type, it should be represented on a diagram as a relationship between the two entities.
• Relationships between entities are represented using a diamond shape.
• Relationships are usually given a verb name, which specifies the relationship between the two entities.
• In the relationship between Employee and Department, an employee works for a department, so the relationship is named Works for.
[Figure: Employee, Works for (diamond), Department]
Degree of Relationship Type
• The degree of a relationship type is the number of participating entity types. If the relationship is between two entity types (Employee and Department), then the relationship is binary, or has a degree of two.
• If the relationship is between three participating entity types, it has a degree of three and is therefore a ternary relationship.
• For example, suppose we have three entity types: Supplier, Project and Part. Each part is supplied by a unique supplier and is used for a given project within a company. The relationship “Supplies” is a ternary (degree three) relationship between Supplier, Project and Part, meaning all three participate in the Supplies relationship.
[Figure: Supplier, Project and Part connected to the relationship Supplies (diamond)]
Constraints on Relationship Types
• Relationship types have certain constraints that limit the possible combinations of entities that may participate in a relationship.
• For example, if we have the entities Doctor and Patient, the organization may have a rule that a patient cannot be seen by more than one doctor. This constraint needs to be described in the schema.
• There are two main types of relationship constraints: cardinality ratio and participation.
Cardinality for Binary Relationship
• Binary relationships are relationships between exactly two entity types.
• The cardinality ratio specifies the maximum number of relationship instances that an entity can participate in.
• The possible cardinality ratios for binary relationship types are: 1:1, 1:N, N:1, M:N.
• Cardinality ratios are shown on ER diagrams by displaying 1, M and N on the diamonds.
• The number shown closest to an entity type represents the ratio the other entity type has to that entity type.
Participation Constraints and Existence Dependencies
• The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type.
• The constraint specifies the minimum number of relationship instances that each entity can participate in.
• There are two types of participation constraints:
o Total:
– If an entity can exist only if it participates in at least one relationship instance, this is called total participation, meaning that every entity in one set must be related to at least one entity in a designated entity set.
– An example is the Employee and Department relationship. If company policy states that every employee must work for a department, then an employee can exist only if it participates in at least one relationship instance (i.e. an employee cannot exist without a department).
– It is also sometimes called an existence dependency.
– Total participation is represented by a double line going from the relationship to the dependent entity.
o Partial:
– If only a part of the set of entities participates in a relationship, it is called partial participation.
– Using the Company example, not every employee will be a manager of a department, so the participation of an employee in the “Manages” relationship is partial.
– Partial participation is represented by a single line.
Attributes of Relationship Types
• Relationships can have attributes, similar to entity types.
• For example, in the relationship Works_On between the Employee entity and the Project entity, we would like to keep track of the number of hours an employee works on a project. Therefore we can include Number of Hours as an attribute of the relationship.
• Another example is the “Manages” relationship between Employee and Department: we can add Start Date as an attribute of the Manages relationship.
• For some relationships (1:1 or 1:N), the attribute can be placed on one of the participating entity types. For example, since the “Manages” relationship is 1:1, StartDate can be migrated to either Employee or Department.
Weak Entity Types
• Entity types that do not have key attributes are called weak entity types.
• Entities that belong to a weak entity type are identified by being related to specific entities from another entity type, in combination with one of their own attribute values.
• This other entity type is called the identifying or owner entity type.
• The relationship that relates the identifying entity type to the weak entity type is called an identifying relationship.
• A weak entity type always has a total participation constraint with respect to the identifying relationship, because a weak entity cannot exist without its owner.
• Not all existence dependencies result in a weak entity type; if an entity type has a key attribute, then it is not a weak entity type.
• A weak entity type usually has a partial key, which is the set of attributes that can uniquely identify weak entities that are related to the same owner entity.
Weak Entity Example
• For example, let us assume that in a library database we have an entity type Book. For each book, we keep track of the author, ISBN and title. The library may own several copies of the same book, and for each copy it keeps track of the copy number (a different copy number for each copy of a given book) and the price of each copy.
[Figure: Book (owner entity type), the identifying relationship Has, and Copy (weak entity type)]
• The copy number is unique only within each book (Book 123 may have copy 1, copy 2 and copy 3, and Book 456 may also have copy 1, copy 2 and copy 3) and not across all copies of all books, so it cannot be considered unique for each copy.
• Because the Copy entity type does not have a key attribute, it is considered a weak entity type and is identified by being related to the Book entity type. The Book entity type is the identifying entity type, and the relationship is the identifying relationship.
• Because a copy cannot exist without its owner (Book), the Copy entity type has a total participation constraint with respect to the identifying relationship.
• The partial key of the Copy entity type is Copy Number: for each owner entity Book, the Copy Number uniquely identifies the copy.
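One common way to carry this weak entity type into a relational schema is to give the Copy table a primary key made of the owner's key plus the partial key. The sketch below uses Python's built-in `sqlite3` module; the column names, titles and prices are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, author TEXT)")
conn.execute(
    "CREATE TABLE copy ("
    " isbn TEXT NOT NULL REFERENCES book(isbn),"  # owner key: total participation
    " copy_number INTEGER NOT NULL,"              # partial key
    " price REAL,"
    " PRIMARY KEY (isbn, copy_number))"
)
conn.execute("INSERT INTO book VALUES ('123', 'Book A', 'Author A')")
conn.execute("INSERT INTO book VALUES ('456', 'Book B', 'Author B')")
conn.executemany(
    "INSERT INTO copy VALUES (?, ?, ?)",
    [("123", 1, 10.0), ("123", 2, 10.0), ("456", 1, 12.5)],
)


def try_insert(isbn, copy_number):
    """Return True if the database accepts the new copy, False otherwise."""
    try:
        conn.execute("INSERT INTO copy VALUES (?, ?, ?)", (isbn, copy_number, 9.0))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("123", 1)  # same owner and same copy number
orphan_ok = try_insert("999", 1)     # copy of a book that does not exist
copy_count = conn.execute("SELECT COUNT(*) FROM copy").fetchone()[0]
print(duplicate_ok, orphan_ok, copy_count)
```

Copy number 1 appears under both books without conflict, while a duplicate within one book and a copy without an owner are both rejected.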
Mapping ER diagram to relational model
SID    Name   Major   GPA
1234   John   CS      2.8
5678   Mary   EE      3.6
Basic Ideas:
• Build a table for each entity set
• Build a table for each relationship set if necessary (more on this later)
• Make a column in the table for each attribute in the entity set
• Indivisibility Rule and Ordering Rule
• Primary Key
[Figure: ER diagram with the entity sets student and professor, the relationship ADVISER, and the attributes ssn, NAME, SID, name, GPA, MAJOR and dept]
Representation of Weak Entity Set
Age   Name   Parent_SID
10    Bart   1234
8     Lisa   5678
Representation of Relationship Set
• Unary/Binary Relationship set
• Depends on the cardinality and participation of the relationship
• Two possible approaches
• N-ary (multiple) Relationship set
  – Primary Key Issue
• Identifying Relationship
  – No relational model representation necessary
REPRESENTING RELATIONSHIP SET
• For one-to-one relationship without total participation
  – Build a table with two columns, one column for each participating entity set’s primary key. Add successive columns, one for each descriptive attribute of the relationship set (if any).
• For one-to-one relationship with one entity set having total participation
  – Add one extra column on the right side of the table of the entity set with total participation, and put in it the primary key of the entity set without total participation, as given by the relationship.
Example – One-to-One Relationship Set
SID     Maj_ID    Co S_Degree
9999    07        1234
8888    05        5678
• For one-to-many relationship without total participation
– Same thing as one-to-one
• For one-to-many/many-to-one relationship with one entity set having
total participation on “many” side
– Add one extra column on the right side of the table of the
entity set on the “many” side, and put in it the primary key of the
entity set on the “one” side, as given by the relationship.
SID     Name    Major      GPA     Pro_SSN    Ad_Sem
9999    Bart    Economy    -4.0    123-456    Fall 2006
8888    Lisa    Physics    4.0     567-890    Fall 2005
• For many-to-many relationship
– Same thing as one-to-one relationship without total participation.
– Primary key of this new schema is the union of the foreign keys
of both entity sets.
– No augmentation approach possible.
N-ary Relationship
Build a new table with as many columns as there are attributes for the
union of the primary keys of all participating entity sets.
Augment additional columns for descriptive attributes of the relationship set
(if necessary)
The primary key of this table is the union of all primary keys of entity sets
that are on “many” side
That is it, we are done.
P-Key1    P-Key2    P-Key3    A-Key    D-Attribute
9999      8888      7777      6666     Yes
1234      5678      9012      3456     No
Module-I
Short Questions
2008
1. What is the difference between a primary key and a candidate key? [2][Introduction to database
system]
2. Mention the various categories of Data Model. [2][ Data Model]
3. Define: Entity Type, Entity Set and Value Set. [2][ER model]
2009
1. What is the difference between multi valued and derived attributes? [2][ ER model]
2. Differentiate between procedural and non procedural language with example. [2][Query language]
3. What is generalization? How it differs from specialization? [2][ER model]
4. Draw the ER diagram for the following entity sets.
Movies (Title, year, length, film type)
And Stars(name, address) [2][ER model]
2010
1. Differentiate between foreign Key and references. [2][ER model]
2. Find the tuple calculus representation for the following SQL query :
Select R1.A, R2.B from R1, R2 where R1.b=R2.a; [2][Query languages]
2011
1. What is the job of DBA? [2][Database languages]
2. What are the different levels of data abstraction? How those are linked with data dependence?
[2][Introduction to database system]
3. Define integrity rules and constraints.[2][ ER model]
4. Give an example of a weak entity set and explain why it is weak. [2] [ER model]
5. What are the differences between Relational algebra and relational calculus? [2][ Database languages]
6. Write down two DML statements for database recovery and explain it. [2][Database languages]
2012
1. What is the need of DBMS? [2][Introduction to database system]
2. What do you mean by data integrity? [2][Introduction to database system]
3. What is ER model? [2][ER model]
4. What do you mean by weak entity set? [2][ ER model]
5. What are the different aggregate functions in SQL? [2][Query languages]
6. Define query language. [2][Query Languages]
Long Question
2008
1. Define entity, attribute and relationships as used in relational databases. Describe purpose of E-R
Model. Illustrate your answer with an example. [5] [ER model]
2. What are the major components of the relational model? What is a simple relational database? What are
two models in which you can use SQL? [5] [Data model]
3. What is an object-oriented database? What are its advantages compared to a relational database?
Explain some applications where an object-oriented database may be useful.
[5][Introduction to database system]
4. Consider the following tables :
S:
A    B    C
3    7    9
8    6    5
R:
A    F    G
5    8    1
8    2    6
Show the semantics and the output of the following query :
SELECT * FROM S, R WHERE S.A = R.A AND S.B = R.G ; [5][Database languages]
5. List out the six fundamental operators and 4 additional operators in relational algebra.
[2.5] [Relational algebra]
6. Explain the two conditions needed for the set difference operation (union operation) to be valid.
[2.5] [Relational algebra]
2009
1. List at least four advantages of using a database management system over a traditional file
management system. Are there any disadvantages of DBMS? [5][Introduction to database system]
2. What is abstraction? Is it necessary for database system? Explain how database architecture satisfies
abstraction at various levels . [5][Introduction to database system]
3. Draw the ER diagram for the following relations.
Courses (Number, Room). Is it a weak entity set?
Depts (Name, BOD)
Lab-courses (computer allocation)
Theory-courses (name, faculty_ name) [5] [ER model]
2010
1. Consider the set of relations :
Student(name, roll,mark)
Score(roll,grade)
Details(name,address)
For the following query:
"Find name & address of students scoring grade 'A'," Represent it in relational algebra, tuple calculus,
domain calculus, QBE and SQL.[10] [Query languages]
2. Describe the steps to reduce an E-R schema to tables. [10][ER model]
3. Differentiate between Object-Oriented Database and Object-Relational Database.
[5][Introduction to Database system]
4. What is a constraint? Describe its types with examples. [10][ER model]
2011
1. A person can have many cars. A particular brand of car can have many owners. Draw the ER diagram
and convert it to Relational model. [5][ER model]
2. Draw a network model for the above problem in question 2(a) and explain. [5][Datamodel]
3. How object oriented data models helps in current database management systems? Give an example
of a customer and bank relation to explain an object oriented model. [5][Datamodel]
4. Describe the database architecture and explain the role of different users with respect to that
architecture. [5][Introduction to database system]
5. Consider the following relations:
EMPLOYEE(empno, name, office, age)
BOOKS(isbn, title, author, publisher)
LOAN(empno, isbn, date)
Write the following queries in relational algebra:
i. Find the name of the employees who have borrowed more than 5
books published by pearson.
ii. Find the name of employees who have borrowed all books published
by pearson
[5][Database languages]
2012
1. What is database? Explain DBMS system architecture. [5][Introduction to database system]
2. Draw an ER diagram for banking system. [5][ER model]
3. What are the advantages and disadvantages of DBMS? [5][Introduction to database system]
4. Let the following relation schemas be given:
R=(A, B, C)
S=(D, E, F)
Let relations r(R) and s(S) be given. Give an expression in SQL that is equivalent to each of the
following queries:
i. πA(r)
ii. r*s
[5][Query Languages]
5. Differentiate between the following terms:
i. Relational and object oriented data models
ii. Data definition and data manipulation languages
[5][ Introduction to database system]
6. Write short notes on the following:
A. Network Model [5][ Introduction to database system]
B. ER Model
Module-2
Relational Algebra:
• Basic operations:
o Selection (σ) Selects a subset of rows from relation.
o Projection (π) Selects a subset of columns from relation.
o Cross-product (×) Allows us to combine two relations.
o Set-difference (−) Tuples in reln. 1, but not in reln. 2.
o Union (∪) Tuples in reln. 1 or in reln. 2 (or both).
o Rename (ρ) Use new name for the tables or fields.
• Additional operations:
o Intersection (∩), join (⋈), division (÷): Not essential, but (very!) useful.
• Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)
Projection
 Deletes attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list, with the same
names that they had in the (only) input relation. ( Unary Operation)
 Projection operator has to eliminate duplicates! (as it returns a relation which is a
set)
o Note: real systems typically don’t do duplicate elimination unless the user
explicitly asks for it. (Duplicate values may be representing different real
world entity or relationship).
Consider the BOOK table:
Acc-No    Title         Author
100       “DBMS”        “Silbershatz”
200       “DBMS”        “Ramanuj”
300       “COMPILER”    “Silbershatz”
400       “COMPILER”    “Ullman”
500       “OS”          “Sudarshan”
600       “DBMS”        “Silbershatz”
πTitle(BOOK) =
Title
“DBMS”
“COMPILER”
“OS”
Selection
 Selects rows that satisfy selection condition.
 No duplicates in result! (Why?)
 Schema of result identical to schema of (only) input relation.
 Result relation can be the input for another relational algebra operation!
(Operator composition.)
σAcc-no>300(BOOK) =
Acc-No    Title         Author
400       “COMPILER”    “Ullman”
500       “OS”          “Sudarshan”
600       “DBMS”        “Silbershatz”
σTitle=“DBMS”(BOOK) =
Acc-No    Title     Author
100       “DBMS”    “Silbershatz”
200       “DBMS”    “Ramanuj”
600       “DBMS”    “Silbershatz”
πAcc-no(σTitle=“DBMS”(BOOK)) =
Acc-No
100
200
600
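Projection and selection on the BOOK table can be sketched in Python by modelling a relation as a set of tuples. The helper names project and select are illustrative assumptions; because the result is a Python set, duplicate elimination in projection happens automatically, as the set-based definition requires.

```python
from collections import namedtuple

Book = namedtuple("Book", ["acc_no", "title", "author"])

# The BOOK relation from the notes, as a set of tuples.
BOOK = {
    Book(100, "DBMS", "Silbershatz"),
    Book(200, "DBMS", "Ramanuj"),
    Book(300, "COMPILER", "Silbershatz"),
    Book(400, "COMPILER", "Ullman"),
    Book(500, "OS", "Sudarshan"),
    Book(600, "DBMS", "Silbershatz"),
}


def project(relation, *attrs):
    """Projection: keep only the named attributes; the set removes duplicates."""
    return {tuple(getattr(row, a) for a in attrs) for row in relation}


def select(relation, predicate):
    """Selection: keep only the rows for which the predicate holds."""
    return {row for row in relation if predicate(row)}


titles = project(BOOK, "title")
# Composition: the output of a selection is the input of a projection.
dbms_acc_nos = project(select(BOOK, lambda b: b.title == "DBMS"), "acc_no")
```

Six rows project to three titles, and the composed expression returns the accession numbers 100, 200 and 600.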
Union, Intersection, Set-Difference
 All of these operations take two input relations, which must be union-compatible:
o Same number of fields.
o `Corresponding’ fields have the same type.
 What is the schema of result?
Consider:
Borrower
Cust-name    Loan-no
Ram          L-13
Shyam        L-30
Suleman      L-42
Depositor
Cust-name     Acc-no
Suleman       A-100
Radheshyam    A-300
Ram           A-401
List of customers who are either borrower or depositor at bank = πCust-name(Borrower) ∪ πCust-name(Depositor) =
Cust-name
Ram
Shyam
Suleman
Radheshyam
Customers who are both borrowers and depositors = πCust-name(Borrower) ∩ πCust-name(Depositor) =
Cust-name
Ram
Suleman
Customers who are borrowers but not depositors = πCust-name(Borrower) − πCust-name(Depositor) =
Cust-name
Shyam
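The three results above can be reproduced with Python's built-in set operators, as a small sketch. The variable names are illustrative; the data is the Borrower and Depositor snapshot from the notes. Both projections yield a single name column, so they are union-compatible.

```python
borrower = {("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")}
depositor = {("Suleman", "A-100"), ("Radheshyam", "A-300"), ("Ram", "A-401")}

# Project both relations onto the customer name.
borrower_names = {name for name, _ in borrower}
depositor_names = {name for name, _ in depositor}

either = borrower_names | depositor_names          # union
both = borrower_names & depositor_names            # intersection
only_borrowers = borrower_names - depositor_names  # set difference
```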
Cartesian-Product or Cross-Product (S1 × R1)
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1, with field names ‘inherited’ if possible.
• Consider the Borrower and Loan tables as follows:
Borrower:
Cust-name    Loan-no
Ram          L-13
Shyam        L-30
Suleman      L-42
Loan:
Loan-no    Amount
L-13       1000
L-30       20000
L-42       40000
Cross product of Borrower and Loan, Borrower × Loan =
Borrower.Cust-name    Borrower.Loan-no    Loan.Loan-no    Loan.Amount
Ram                   L-13                L-13            1000
Ram                   L-13                L-30            20000
Ram                   L-13                L-42            40000
Shyam                 L-30                L-13            1000
Shyam                 L-30                L-30            20000
Shyam                 L-30                L-42            40000
Suleman               L-42                L-13            1000
Suleman               L-42                L-30            20000
Suleman               L-42                L-42            40000
The rename operation can be used to rename the fields to avoid confusion when two
field names are the same in the two participating tables.
For example, the statement ρLoan-borrower(Cust-name, Loan-No-1, Loan-No-2, Amount)(Borrower × Loan)
creates a new table named Loan-borrower with four fields, renamed Cust-name, Loan-No-1,
Loan-No-2 and Amount, whose rows contain the same data as the cross product of Borrower and Loan.
Loan-borrower:
Cust-name    Loan-No-1    Loan-No-2    Amount
Ram          L-13         L-13         1000
Ram          L-13         L-30         20000
Ram          L-13         L-42         40000
Shyam        L-30         L-13         1000
Shyam        L-30         L-30         20000
Shyam        L-30         L-42         40000
Suleman      L-42         L-13         1000
Suleman      L-42         L-30         20000
Suleman      L-42         L-42         40000
Rename Operation:
It can be used in two ways:
• ρx(E): return the result of expression E in the table named x.
• ρx(A1, A2, …, An)(E): return the result of expression E in the table named x with the attributes renamed to A1, A2, …, An.
• Its benefit can be understood from the solution of the query “Find the largest account balance in the bank”.
It can be solved by the following steps:
• Find the relation of those balances which are not the largest.
• Consider the Cartesian product of Account with itself, i.e. Account × Account.
• Compare the balances of the first Account table with the balances of the second Account table in the product.
• For that, we should rename one of the Account tables to some other name to avoid confusion.
It can be done by the following operation:
ΠAccount.balance(σAccount.balance < d.balance(Account × ρd(Account)))
• So the above relation contains the balances which are not the largest.
• Subtract this relation from the relation containing all the balances, i.e. Πbalance(Account).
So the final statement for solving the above query is
Πbalance(Account) − ΠAccount.balance(σAccount.balance < d.balance(Account × ρd(Account)))
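The steps for the largest-balance query can be traced in a few lines of Python. The account rows below are hypothetical sample data, since the notes give no snapshot of Account; the second loop variable plays the role of the renamed copy d.

```python
from itertools import product

# Hypothetical Account(account-number, balance) snapshot.
account = {("A-101", 500), ("A-215", 700), ("A-102", 400), ("A-305", 700)}

# Projection of Account onto balance: all balances.
balances = {bal for _, bal in account}

# Account x rho_d(Account): every tuple paired with every tuple of the copy d.
# Keep Account.balance wherever it is smaller than some d.balance.
not_largest = {
    a_bal
    for (_, a_bal), (_, d_bal) in product(account, account)
    if a_bal < d_bal
}

# Subtracting the non-largest balances leaves only the largest one.
largest = balances - not_largest
```

Two accounts share the top balance here, and the result is still a single value because a relation is a set.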
Additional Operations
Natural Join (⋈)
• Forms the Cartesian product of its two arguments and performs a selection forcing equality on those attributes that appear in both relations.
• For example, consider the Borrower and Loan relations. The natural join between them, Borrower ⋈ Loan, will automatically perform the selection on the table returned by Borrower × Loan which forces equality on the attribute that appears in both Borrower and Loan, i.e. Loan-no, and will also keep only one of the columns named Loan-no.
• That means Borrower ⋈ Loan = σBorrower.Loan-no = Loan.Loan-no(Borrower × Loan), with one of the two Loan-no columns removed.
• The table returned from this will be as follows:
Eliminate the rows that do not satisfy the selection criterion “σBorrower.Loan-no = Loan.Loan-no” from
Borrower × Loan =
Borrower.Cust-name    Borrower.Loan-no    Loan.Loan-no    Loan.Amount
Ram                   L-13                L-13            1000
Ram                   L-13                L-30            20000
Ram                   L-13                L-42            40000
Shyam                 L-30                L-13            1000
Shyam                 L-30                L-30            20000
Shyam                 L-30                L-42            40000
Suleman               L-42                L-13            1000
Suleman               L-42                L-30            20000
Suleman               L-42                L-42            40000
and remove one of the columns named Loan-no, i.e.
Borrower ⋈ Loan =
Cust-name    Loan-no    Amount
Ram          L-13       1000
Shyam        L-30       20000
Suleman      L-42       40000
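A natural join can be sketched in Python by representing each tuple as a dictionary keyed by attribute name. The function name natural_join is an illustrative assumption. It pairs every row of one relation with every row of the other (the cross product), keeps the pairs that agree on all shared attributes (the selection), and merges each pair so the shared column appears once.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    result = []
    for row_r in r:
        for row_s in s:                      # cross product of r and s
            common = row_r.keys() & row_s.keys()
            if all(row_r[a] == row_s[a] for a in common):  # equality on shared attributes
                result.append({**row_r, **row_s})          # shared column kept once
    return result


borrower = [
    {"cust_name": "Ram", "loan_no": "L-13"},
    {"cust_name": "Shyam", "loan_no": "L-30"},
    {"cust_name": "Suleman", "loan_no": "L-42"},
]
loan = [
    {"loan_no": "L-13", "amount": 1000},
    {"loan_no": "L-30", "amount": 20000},
    {"loan_no": "L-42", "amount": 40000},
]

joined = natural_join(borrower, loan)
```

Of the nine pairs in the cross product, only the three with matching loan numbers survive.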
Division Operation:
• Denoted by ÷, it is used for queries that include the phrase “for all”.
• For example, “Find customers who have an account in all branches in branch city Agra”. This query can be solved by the following statement:
ΠCustomer-name, branch-name(Depositor ⋈ Account) ÷ Πbranch-name(σBranch-city=“Agra”(Branch))
• The division operation can be specified using only basic operations as follows. Let r(R) and s(S) be given relations for schemas R and S with S ⊆ R:
r ÷ s = ΠR-S(r) − ΠR-S((ΠR-S(r) × s) − ΠR-S,S(r))
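The expansion of division into basic operations can be followed step by step in Python. This is a sketch for the simple case where r holds (customer, branch) pairs and s holds branch names; the function name divide and the sample data are assumptions made for illustration.

```python
def divide(r, s):
    """r: set of (x, y) pairs; s: set of y values.
    Returns every x that is paired in r with all y in s."""
    candidates = {x for x, _ in r}                        # projection of r onto R-S
    all_pairs = {(x, y) for x in candidates for y in s}   # candidates x s
    missing = {x for x, _ in all_pairs - r}               # candidates lacking some y
    return candidates - missing


# Hypothetical data: which customer has an account at which branch.
has_account = {
    ("Ram", "Sadar"),
    ("Ram", "Sanjay Place"),
    ("Shyam", "Sadar"),
    ("Suleman", "Sanjay Place"),
    ("Suleman", "Dayalbagh"),
}
agra_branches = {"Sadar", "Sanjay Place"}

customers_at_all_agra_branches = divide(has_account, agra_branches)
```

Only Ram is paired with every branch in the divisor; Shyam and Suleman each miss one, so the subtraction removes them.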
Tuple Relational Calculus
Relational algebra is an example of a procedural language, while tuple relational calculus
is a nonprocedural query language.
A query is specified as:
{t | P(t)}, i.e. it is the set of all tuples t such that predicate P is true for t.
The formula P(t) is formed using atoms, which refer to relations, tuples of relations and
fields of tuples, together with comparison operators.
These atoms can then be combined into formulas using logical connectives and quantifiers.
For example, here are some queries that can be expressed using tuple calculus:
o Find the branch-name, loan-number and amount for loans over Rs 1200.
o Find the loan number for each loan of an amount greater than Rs 1200.
o Find the names of all the customers who have a loan from the Sadar branch.
o Find all customers who have a loan, an account, or both at the bank.
o Find only those customers who have both an account and a loan.
o Find all customers who have an account but do not have a loan.
o Find all customers who have an account at all branches located in Agra.
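As a hedged illustration, the first two queries in the list might be written as follows. The schema loan(branch-name, loan-number, amount) is an assumption here, since this part of the notes does not list it.

```latex
% Loans over Rs 1200, with all attributes of loan
\{\, t \mid t \in \mathit{loan} \,\wedge\, t[\mathit{amount}] > 1200 \,\}

% Loan numbers only: t ranges over the single attribute loan-number
\{\, t \mid \exists\, s \in \mathit{loan}\;
    ( t[\textit{loan-number}] = s[\textit{loan-number}]
      \,\wedge\, s[\mathit{amount}] > 1200 ) \,\}
```

The second form needs the existential quantifier because the result tuple t has fewer attributes than the tuples of loan.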
Domain Relational Calculus
• Domain relational calculus is another nonprocedural language for expressing database queries.
• A query is specified as:
{<x1, x2, …, xn> | P(x1, x2, …, xn)} where x1, x2, …, xn represent domain variables and P
represents a predicate formula, as in tuple calculus.
• Since domain variables are used in place of tuples, the formula does not refer to the
fields of tuples; rather, it refers to the domain variables.
• For example, queries that can be expressed in domain calculus are as follows:
o Find the branch-name, loan-number and amount for loans over Rs 1200.
o Find the loan number for each loan of an amount greater than Rs 1200.
o Find the names of all the customers who have a loan from the Sadar branch and find the loan amount.
o Find names of all customers who have a loan, an account, or both at the Sadar branch.
o Find only those customers who have both an account and a loan.
o Find all customers who have an account but do not have a loan.
o Find all customers who have an account at all branches located in Agra.
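As a hedged illustration, the first two queries in the list might look like this in domain calculus. The attribute order of loan (branch-name, loan-number, amount) is an assumption; b, l and a are domain variables for those three attributes.

```latex
% Branch name, loan number and amount for loans over Rs 1200
\{\, \langle b, l, a \rangle \mid \langle b, l, a \rangle \in \mathit{loan}
    \,\wedge\, a > 1200 \,\}

% Loan numbers only: b and a are existentially quantified
\{\, \langle l \rangle \mid \exists\, b, a\;
    ( \langle b, l, a \rangle \in \mathit{loan} \,\wedge\, a > 1200 ) \,\}
```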
Outer Join
• The outer join operation is an extension of the join operation to deal with missing information.
• Suppose that we have the following relational schemas:
Employee(employee-name, street, city)
Fulltime-works(employee-name, branch-name, salary)
A snapshot of these relations is as follows:
Employee:
employee-name    street               city
Ram              M G Road             Agra
Shyam            New Mandi Road       Mathura
Suleman          Bhagat Singh Road    Aligarh
Fulltime-works:
employee-name    branch-name     salary
Ram              Sadar           30000
Shyam            Sanjay Place    20000
Rehman           Dayalbagh       40000
Suppose we want complete information about the full-time employees.
• The natural join (Employee ⋈ Fulltime-works) will result in the loss of information for Suleman and Rehman because they do not have a record in both tables (the left and right relations). The outer join solves this problem.
• Three forms of outer join:
o Left outer join (⟕): the tuples from the left relation that do not match during the natural join are also added to the result, with null values in the missing fields of the right relation.
o Right outer join (⟖): the tuples from the right relation that do not match during the natural join are also added to the result, with null values in the missing fields of the left relation.
o Full outer join (⟗): includes both the left and right outer joins, i.e. it adds the tuples that did not match from either the left relation or the right relation and puts null in place of the missing values.
• The results for the three forms of outer join are as follows:
Left join: Employee ⟕ Fulltime-works =
employee-name    street               city       branch-name     salary
Ram              M G Road             Agra       Sadar           30000
Shyam            New Mandi Road       Mathura    Sanjay Place    20000
Suleman          Bhagat Singh Road    Aligarh    null            null
Right join: Employee ⟖ Fulltime-works =
employee-name    street               city       branch-name     salary
Ram              M G Road             Agra       Sadar           30000
Shyam            New Mandi Road       Mathura    Sanjay Place    20000
Rehman           null                 null       Dayalbagh       40000
Full join: Employee ⟗ Fulltime-works =
employee-name    street               city       branch-name     salary
Ram              M G Road             Agra       Sadar           30000
Shyam            New Mandi Road       Mathura    Sanjay Place    20000
Suleman          Bhagat Singh Road    Aligarh    null            null
Rehman           null                 null       Dayalbagh       40000
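The left and right outer join results can be checked with Python's built-in sqlite3 module. This is a sketch under stated assumptions: identifiers use underscores because unquoted SQL names cannot contain hyphens, and the right outer join is obtained by swapping the operands of a left outer join, since older SQLite versions lack RIGHT and FULL OUTER JOIN. SQL null values come back as Python None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_name TEXT, street TEXT, city TEXT);
CREATE TABLE fulltime_works (employee_name TEXT, branch_name TEXT, salary INTEGER);
INSERT INTO employee VALUES
    ('Ram', 'M G Road', 'Agra'),
    ('Shyam', 'New Mandi Road', 'Mathura'),
    ('Suleman', 'Bhagat Singh Road', 'Aligarh');
INSERT INTO fulltime_works VALUES
    ('Ram', 'Sadar', 30000),
    ('Shyam', 'Sanjay Place', 20000),
    ('Rehman', 'Dayalbagh', 40000);
""")

# Left outer join: every employee, with nulls where no work record exists.
left_rows = conn.execute("""
    SELECT e.employee_name, e.street, e.city, f.branch_name, f.salary
    FROM employee AS e
    LEFT OUTER JOIN fulltime_works AS f ON e.employee_name = f.employee_name
    ORDER BY e.employee_name
""").fetchall()

# Right outer join, written as a left outer join with the operands swapped.
right_rows = conn.execute("""
    SELECT f.employee_name, e.street, e.city, f.branch_name, f.salary
    FROM fulltime_works AS f
    LEFT OUTER JOIN employee AS e ON e.employee_name = f.employee_name
    ORDER BY f.employee_name
""").fetchall()
```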
Structured Query Language (SQL)
Introduction
• Commercial database systems use more user-friendly languages to specify queries.
• SQL is the most influential commercially marketed query language.
• Other commercially used languages are QBE, Quel, and Datalog.
Basic Structure
• The basic structure of an SQL query consists of three clauses: select, from and where.
• select: corresponds to the projection operation of relational algebra. Used to list the attributes desired in the result.
• from: corresponds to the Cartesian product operation of relational algebra. Used to list the relations to be scanned in the evaluation of the expression.
• where: corresponds to the selection predicate of relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.
• A typical SQL query has the form:
select A1, A2, …, An
from r1, r2, …, rm
where P
o Ai represents an attribute
o rj represents a relation
o P is a predicate
o It is equivalent to the following relational algebra expression:
ΠA1, A2, …, An(σP(r1 × r2 × … × rm))
[Note: The words marked in dark in this text work as keywords in SQL language. For
example “select”, “from” and “where” in the above paragraph are shown in bold font to
indicate that they are keywords]
Select Clause
Let us see some simple queries and the use of the select clause to express them in SQL.
• Find the names of all branches in the Loan relation
select branch-name
from Loan
• By default the select clause includes duplicate values. If we want to force the elimination of duplicates, the distinct keyword is used as follows:
select distinct branch-name
from Loan
• The all keyword can be used to specify explicitly that duplicates are not removed. Omitting all means the same thing, so all is not required in the select clause.
select all branch-name
from Loan
• The asterisk “*” can be used to denote “all attributes”. The following SQL statement will select all the attributes of Loan.
select *
from Loan
• Arithmetic expressions involving the operators +, -, *, and / are also allowed in the select clause. The following statement will return the amount multiplied by 100 for the rows in the Loan table.
select branch-name, loan-number, amount * 100
from Loan
Where Clause
• Find all loan numbers for loans made at the “Sadar” branch with loan amounts greater than Rs 1200.
select loan-number
from Loan
where branch-name = “Sadar” and amount > 1200
• The where clause uses the logical connectives and, or, and not.
• Operands of the logical connectives can be expressions involving the comparison operators <, <=, >, >=, =, and <>.
• between can be used to simplify comparisons
select loan-number
from Loan
where amount between 90000 and 100000
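The select and where examples can be tried out with Python's built-in sqlite3 module. This is a sketch with hypothetical Loan rows, since the notes give no snapshot with branch names; identifiers use underscores because unquoted SQL names cannot contain hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER);
INSERT INTO loan VALUES
    ('L-13', 'Sadar', 1000),
    ('L-30', 'Sadar', 20000),
    ('L-42', 'Dayalbagh', 95000),
    ('L-50', 'Sadar', 1500);
""")

# Plain select keeps duplicates; select distinct removes them.
branches_all = [r[0] for r in conn.execute("SELECT branch_name FROM loan")]
branches_distinct = [
    r[0]
    for r in conn.execute("SELECT DISTINCT branch_name FROM loan ORDER BY branch_name")
]

# where with the logical connective and.
sadar_over_1200 = [
    r[0]
    for r in conn.execute(
        "SELECT loan_number FROM loan "
        "WHERE branch_name = ? AND amount > ? ORDER BY loan_number",
        ("Sadar", 1200),
    )
]

# between is inclusive at both ends.
in_range = [
    r[0]
    for r in conn.execute(
        "SELECT loan_number FROM loan WHERE amount BETWEEN 90000 AND 100000"
    )
]
```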
From Clause
• The from clause by itself defines a Cartesian product of the relations in the clause.
• When an attribute is present in more than one relation, it can be referred to as relation-name.attribute-name to avoid ambiguity.
• For all customers who have a loan from the bank, find their names and loan numbers
select distinct customer-name, Borrower.loan-number
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number
The Rename Operation
• Used for renaming both relations and attributes in SQL
• Use the as clause: old-name as new-name
• Find the names and loan numbers of the customers who have a loan at the “Sadar” branch.
select distinct customer-name, Borrower.loan-number as loan-id
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number and
branch-name = “Sadar”
We can now refer to loan-number by the name loan-id instead.
• For all customers who have a loan from the bank, find their names and loan numbers.
select distinct customer-name, T.loan-number
from Borrower as T, Loan as S
where T.loan-number = S.loan-number
• Find the names of all branches that have assets greater than at least one branch located in “Mathura”.
select distinct T.branch-name
from Branch as T, Branch as S
where T.assets > S.assets and S.branch-city = “Mathura”
String Operation
• Two special characters are used for pattern matching in strings:
o Percent (%): The % character matches any substring
o Underscore (_): The _ character matches any single character
• “%Mandi” will match strings ending with “Mandi”, viz. “Raja Ki Mandi”, “Peepal Mandi”
• “_ _ _” matches any string of three characters.
• Find the names of all customers whose street address includes the substring “Main”
select customer-name
from Customer
where customer-street like “%Main%”
Set Operations

union, intersect and except operations are set operations available in SQL.

Relations participating in any of the set operation must be compatible; i.e. they
must have the same set of attributes.

Union Operation:
o Find all customers having a loan, an account, or both at the bank
(select customer-name
from Depositor )
union
(select customer-name
from Borrower )
It will automatically eliminate duplicates.
o If we want to retain duplicates union all can be used
(select customer-name
from Depositor )
union all
(select customer-name
from Borrower )

Intersect Operation
o Find all customers who have both an account and a loan at the bank
(select customer-name
from Depositor )
intersect
(select customer-name
from Borrower )
o If we want to retain all the duplicates
(select customer-name
from Depositor )
intersect all
(select customer-name
from Borrower )

Except Operation
o Find all customers who have an account but no loan at the bank
(select customer-name
from Depositor )
except
(select customer-name
from Borrower )
o If we want to retain the duplicates:
(select customer-name
from Depositor )
except all
(select customer-name
from Borrower )
Aggregate Functions

Aggregate functions are those functions which take a collection of values as input
and return a single value.

SQL offers 5 built-in aggregate functions:
o Average: avg
o Minimum: min
o Maximum: max
o Total: sum
o Count: count
• The input to sum and avg must be a collection of numbers, but the others may have collections of non-numeric data types as input as well.

Find the average account balance at the Sadar branch
select avg(balance)
from Account
where branch-name= “Sadar”
The result will be a table containing a single cell (one row and one column)
holding the numerical value of the average balance of all accounts at the Sadar
branch.

group by clause is used to form groups, tuples with the same value on all
attributes in the group by clause are placed in one group.

Find the average account balance at each branch
select branch-name, avg(balance)
from Account
group by branch-name

By default the aggregate functions include the duplicates.

distinct keyword is used to eliminate duplicates in an aggregate functions:

Find the number of depositors for each branch
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name

having clause is used to state condition that applies to groups rather than tuples.

Find the average account balance at each branch where average account
balance is more than Rs. 1200
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200

Count the number of tuples in Customer table
select count(*)
from Customer

SQL doesn’t allow distinct with count(*)

When where and having are both present in a statement where is applied
before having.
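The interplay of where, group by and having can be observed with Python's built-in sqlite3 module. The Account rows below are hypothetical, and identifiers use underscores because unquoted SQL names cannot contain hyphens. The second query shows that where filters individual tuples before the groups are formed, so it changes the averages that having later tests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
INSERT INTO account VALUES
    ('A-101', 'Sadar', 500),
    ('A-102', 'Sadar', 2500),
    ('A-201', 'Dayalbagh', 900),
    ('A-202', 'Dayalbagh', 1100),
    ('A-301', 'Sanjay Place', 3000);
""")

# having applies to whole groups: keep branches whose average exceeds 1200.
having_only = conn.execute("""
    SELECT branch_name, AVG(balance) FROM account
    GROUP BY branch_name
    HAVING AVG(balance) > 1200
    ORDER BY branch_name
""").fetchall()

# where removes tuples first, then the groups and their averages are computed.
where_then_having = conn.execute("""
    SELECT branch_name, AVG(balance) FROM account
    WHERE balance >= 1000
    GROUP BY branch_name
    HAVING AVG(balance) > 1200
    ORDER BY branch_name
""").fetchall()
```

The Sadar average differs between the two queries because the where clause drops the Rs 500 account before averaging.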
Nested Sub queries

A subquery is a select-from-where expression that is nested within another
query.

Set Membership

The in and not in connectives are used for this type of subquery.

“Find all customers who have both a loan and an account at the bank”, this query
can be written using nested subquery form as follows
select distinct customer-name
from Borrower
where customer-name in(select customer-name
from Depositor )

Select the names of customers who have a loan at the bank, and whose names
are neither “Smith” nor “Jones”
select distinct customer-name
from Borrower
where customer-name not in(“Smith”, “Jones”)

Set Comparison

Find the names of all branches that have assets greater than those of at least
one branch located in Mathura
select branch-name
from Branch
where assets > some (select assets
from Branch
where branch-city = “Mathura”)
• Apart from > some, other comparisons could be < some, <= some, >= some, = some, <> some.
• Find the names of all branches that have assets greater than that of each branch located in Mathura
select branch-name
from Branch
where assets > all (select assets
from Branch
where branch-city = “Mathura”)
• Apart from > all, other comparisons could be < all, <= all, >= all, = all, <> all.
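The difference between > some and > all can be seen by mirroring the two queries with Python's any and all. This is a plain-Python sketch rather than SQL, because not every database engine supports the some and all comparisons (SQLite, for example, does not). The Branch rows are hypothetical.

```python
# Hypothetical Branch(branch-name, branch-city, assets) snapshot.
branch = [
    ("Sadar", "Agra", 5000),
    ("Dayalbagh", "Agra", 9000),
    ("New Mandi", "Mathura", 4000),
    ("Krishna Nagar", "Mathura", 7000),
    ("Civil Lines", "Aligarh", 3000),
]

# The subquery: assets of all branches located in Mathura.
mathura_assets = [assets for _, city, assets in branch if city == "Mathura"]

# assets > some (...): greater than at least one Mathura branch.
greater_than_some = sorted(
    name for name, _, assets in branch if any(assets > m for m in mathura_assets)
)

# assets > all (...): greater than every Mathura branch.
greater_than_all = sorted(
    name for name, _, assets in branch if all(assets > m for m in mathura_assets)
)
```

With this data, > some amounts to exceeding the smallest Mathura assets (4000), while > all amounts to exceeding the largest (7000).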
Views
• In SQL, the create view command is used to define a view as follows:
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name.
• Consider a view consisting of branch names and the names of customers who have either an account or a loan at the branch. This can be defined as follows:
create view All-customer as
(select branch-name, customer-name
from Depositor, Account
where Depositor.account-number = Account.account-number)
union
(select branch-name, customer-name
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number)
• The attribute names may be specified explicitly within a set of round brackets after the name of the view.
• View names may be used as relations in subsequent queries. Using the view All-customer, find all customers of the Sadar branch:
select customer-name
from All-customer
where branch-name = “Sadar”
• The create view clause creates a view definition in the database, which stays until a command drop view view-name is executed.
Modification of Database
Deletion
 In SQL we can delete only whole tuple and not the values on any particular
attributes. The command is as follows:
delete from r
where P
where P is a predicate and r is a relation.
• The delete command operates on only one relation at a time. Examples are as follows:
• Delete all tuples from the Loan relation
delete from Loan
• Delete all of Smith’s account records
delete from Depositor
where customer-name = “Smith”
• Delete all loans with loan amounts between Rs 1300 and Rs 1500.
delete from Loan
where amount between 1300 and 1500
• Delete the records of all accounts with balances below the average at the bank
delete from Account
where balance < (select avg(balance)
from Account)
Insertion
In SQL we either specify a tuple to be inserted or write a query whose result is a
set of tuples to be inserted. Examples are as follows:
Insert an account of account number A-9732 at the Sadar branch having balance
of Rs 1200
insert into Account
values(“Sadar”, “A-9732”, 1200)
the values are specified in the order in which the corresponding attributes are
listed in the relation schema.
SQL allows the attributes to be specified as part of the insert statement
insert into Account(account-number, branch-name, balance)
values(“A-9732”, “Sadar”, 1200)
insert into Account(branch-name, account-number, balance)
values(“Sadar”, “A-9732”, 1200)
Provide all loan customers of the Sadar branch with a new Rs 200 savings account
for each loan account they have, where the loan-number serves as the account number
for these accounts.
insert into Account
select branch-name, loan-number, 200
from Loan
where branch-name = “Sadar”
Updates
Used to change a value in a tuple without changing all values in the tuple.
Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent.
update Account
set balance = balance * 1.05
Suppose that accounts with balances over Rs 10000 receive 6 percent interest,
whereas all others receive 5 percent.
update Account
set balance = balance * 1.06
where balance > 10000
update Account
set balance = balance * 1.05
where balance <= 10000
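One detail worth checking in the two-statement update is the order of execution. The sketch below uses Python's built-in sqlite3 module with two hypothetical accounts (the function name apply_interest and the data are assumptions) and runs the statements in both orders. If the 5 percent update ran first, an account just under Rs 10000 would cross the threshold and then receive the 6 percent increase as well.

```python
import sqlite3

HIGH = "UPDATE account SET balance = balance * 1.06 WHERE balance > 10000"
LOW = "UPDATE account SET balance = balance * 1.05 WHERE balance <= 10000"


def apply_interest(high_first):
    """Run both updates on a fresh table and return {account_number: balance}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("A-1", 9900.0), ("A-2", 20000.0)],
    )
    for statement in ((HIGH, LOW) if high_first else (LOW, HIGH)):
        conn.execute(statement)
    return dict(conn.execute("SELECT account_number, balance FROM account"))


as_in_notes = apply_interest(high_first=True)   # order used in the notes
reversed_order = apply_interest(high_first=False)
```

With the order shown in the notes, account A-1 ends at 10395 (5 percent only); with the order reversed it ends at about 11018.70, having been increased twice.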
Data Definition Language
Data Types in SQL
char(n): fixed-length character string of length n.
varchar(n): variable-length character string of maximum length n.
int: an integer.
smallint: a small integer.
numeric(p,d): fixed-point number with p digits (plus a sign), d of which are
to the right of the decimal point.
real, double precision: floating-point and double-precision floating-point numbers.
float(n): a floating-point number with a precision of at least n digits.
date: calendar date; four digits for the year, two for the month and two for the day of the month.
time: time of day, in hours, minutes and seconds.
Domains can be defined as
create domain person-name char(20)
The domain name person-name can then be used to define the type of an attribute, just like
a built-in domain.
Schema Definition in SQL
The create table command is used to define relations.
create table r (A1 D1, A2 D2, …, An Dn,
<integrity constraint 1>,
…,
<integrity constraint k>)
where r is the relation name, each Ai is the name of an attribute, and Di is the domain type of
the values of Ai. The integrity constraints that can be specified in create table include
primary key (Aj1, Aj2, …, Ajm)
and
check (P), where P is a predicate.
The drop table command is used to remove a relation from the database.
The alter table command is used to add attributes to, or remove attributes from, an existing relation:
alter table r add A D
adds attribute A of domain type D to relation r.
alter table r drop A
removes attribute A from relation r.
Integrity Constraints

Integrity Constraints guard against accidental damage to the database.

Integrity constraints are predicates pertaining to the database.

Domain Constraints:

Predicates defined on the domains are Domain constraints.

Simplest Domain constraints are defined by defining standard data types of the
attributes like Integer, Double, Float, etc.

We can also define domains with the create domain clause and attach constraints
to them, as follows:
create domain hourly-wage numeric(5,2)
constraint wage-value-test check(value >= 4.00)

We can then use hourly-wage as the data type of any attribute, and the DBMS will
automatically allow only values greater than or equal to 4.00.

Other examples of domain constraints are as follows:
create domain account-number char(10)
constraint account-number-null-test check(value is not null)
create domain account-type char(10)
constraint account-type-test
check (value in ('Checking', 'Saving'))
With the latter of the two domains above, the DBMS allows only the values Checking and
Saving for any attribute of type account-type.
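The behaviour of such constraints can be tried out in Python with the built-in sqlite3 module. SQLite does not implement create domain, so this sketch attaches equivalent check constraints directly to the columns. The Employee table, its column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column-level checks standing in for the hourly-wage and account-type domains.
conn.execute("""
    create table Employee (
        name         text not null,
        hourly_wage  numeric(5,2) check (hourly_wage >= 4.00),
        account_type char(10) check (account_type in ('Checking', 'Saving'))
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("insert into Employee values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(("Ram", 5.50, "Saving"))
rejected_wage = try_insert(("Shyam", 3.25, "Saving"))   # wage below 4.00
rejected_type = try_insert(("Mohan", 6.00, "Current"))  # type not in the allowed list
print(accepted, rejected_wage, rejected_type)
```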

Referential Integrity:

Foreign Key: Suppose two tables R and S are related to each other, K1 and K2 are
the primary keys of R and S respectively, and K1 is also one of the attributes of S. If we
want every row in S to have a corresponding row in R, we define
K1 in S as a foreign key. For example, in our original library database we had a
table for the relation BORROWEDBY, containing two fields, Card No. and Acc. No.
Every row of the BORROWEDBY relation must have a corresponding row in the USER
table with the same Card No. and a row in the BOOK table with the same Acc. No.
We therefore define Card No. and Acc. No. in the BORROWEDBY relation as
foreign keys.

In other words, every row of the BORROWEDBY relation must refer to
some row in the BOOK table and to some row in the USER table.

Such a requirement that rows of one table refer to rows of another is called referential
integrity.

Referential integrity constraints are defined by declaring as foreign keys those attributes of a
table that form the primary key of some other table.
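The library example can be sketched in Python with the built-in sqlite3 module. The column names, the sample user and book, and the helper borrow are assumptions for illustration. Note that SQLite enforces foreign keys only after the pragma shown in the first statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    create table User (card_no integer primary key, name text);
    create table Book (acc_no integer primary key, title text);
    create table BorrowedBy (
        card_no integer references User(card_no),
        acc_no  integer references Book(acc_no),
        primary key (card_no, acc_no)
    );
    insert into User values (1, 'Asha');
    insert into Book values (100, 'Database Concepts');
""")

def borrow(card_no, acc_no):
    """Try to record a loan; return False if referential integrity is violated."""
    try:
        conn.execute("insert into BorrowedBy values (?, ?)", (card_no, acc_no))
        return True
    except sqlite3.IntegrityError:
        return False

valid = borrow(1, 100)         # both referenced rows exist
no_such_user = borrow(2, 100)  # no User row with card number 2
no_such_book = borrow(1, 999)  # no Book row with accession number 999
print(valid, no_such_user, no_such_book)
```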
Functional Dependencies
Suppose R is a relation schema with α ⊆ R and β ⊆ R. A functional dependency α → β
holds on R if, in any table having schema R, whenever two rows r1 and r2 have the same
values for the attributes α, they also have the same values for the attributes β.
Consider for example the table as follows:
Seq   A    B    C    D
1     a1   b1   c1   d1
2     a1   b2   c1   d2
3     a2   b2   c2   d2
4     a2   b3   c2   d3
5     a3   b3   c2   d4
Check if A → C holds: find pairs of rows where the value of A is the same.
• Rows 1 and 2: the value of A is the same and C is also the same.
• Rows 3 and 4: the value of A is the same and C is also the same.
• No other two rows have the same value of A, so A → C holds.
Check if C → A holds: find pairs of rows where the value of C is the same.
• Rows 1 and 2: the value of C is the same and A is also the same.
• Rows 3 and 4: the value of C is the same and A is also the same.
• Rows 4 and 5: the value of C is the same but A is not the same, so C → A doesn't hold.
We can show that a dependency with AB on the left, for example AB → D, also holds: find pairs of
rows where the values of A and B are both the same.
• No two rows agree on both A and B, so AB → D holds.
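The row-by-row check used above is mechanical, so it can be written as a short function. The following is a minimal Python sketch. The name fd_holds and the representation of rows as dictionaries are choices made here, not part of the notes. It tests whether a dependency is satisfied by one particular table, which is weaker than saying that it holds on the schema.

```python
from itertools import combinations

# The example table, one dictionary per row.
ROWS = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for r1, r2 in combinations(rows, 2):
        same_lhs = all(r1[a] == r2[a] for a in lhs)
        same_rhs = all(r1[a] == r2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return False
    return True

print(fd_holds(ROWS, ["A"], ["C"]))       # A -> C
print(fd_holds(ROWS, ["C"], ["A"]))       # C -> A
print(fd_holds(ROWS, ["A", "B"], ["D"]))  # AB -> D
```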
If K is a superkey of a relation schema R, then the functional dependency K → R holds, and
vice versa.
Armstrong's Rules: Suppose there is a given relation R and a set of functional
dependencies F that holds on R. Then these rules can be used to derive all of the other
functional dependencies that are logically implied by F. In the rules below, α, β and γ
are sets of attributes of R.
• Reflexivity rule: if α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule: if α → β holds and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule: if α → β holds and β → γ holds, then α → γ holds.
Applying Armstrong's rules directly is lengthy and tiresome, so additional rules are used to
simplify the derivation of new functional dependencies. They add no power: every functional
dependency can still be generated using Armstrong's rules alone.
• Union rule: if α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule: if α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule: if α → β holds and γβ → δ holds, then αγ → δ holds.
Closure of Functional Dependencies: Suppose F is the given set of functional
dependencies for a relation schema R. Applying the rules stated above repeatedly
generates every functional dependency that is logically implied by F. The set
containing all of these derived functional dependencies together with the given set
F is called the closure of F and is denoted F+.
Consider the schema R = (A, B, C, G, H, I) and the set of functional dependencies F
containing the following functional dependencies:
1. A → B
2. A → C
3. CG → H
4. CG → I
5. B → H
Other functional dependencies can be derived from these using the rules given above.
Examples are as follows:
• A → H can be derived from functional dependencies 1 and 5 using the transitivity rule.
• CG → HI can be derived from functional dependencies 3 and 4 using the union rule.
• AG → I can be derived from functional dependencies 2 and 4 using the pseudotransitivity rule.
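Membership in F+ is usually tested through the closure of an attribute set rather than by listing F+ itself: a dependency α → β is in F+ exactly when β is contained in the closure of α under F. The Python sketch below illustrates this. The set F used here, {A → B, A → C, CG → H, CG → I, B → H}, is an assumption: it is the usual textbook set for the schema R = (A, B, C, G, H, I) and is consistent with the rule applications cited above.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Assumed dependency set; each attribute is a single letter.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

def implied(lhs, rhs, fds):
    """True if lhs -> rhs belongs to the closure of fds."""
    return set(rhs) <= closure(lhs, fds)

print(sorted(closure("AG", F)))
print(implied("A", "H", F), implied("CG", "HI", F), implied("AG", "I", F))
```

Since the closure of AG contains every attribute of R, AG is a superkey of R under this F.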
Normal Forms
Some of the undesirable properties that a bad database design may have are:
o Repetition of information
o Inability to represent certain information
o Inability to maintain the integrity of data
The normal forms of relational database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies.

The higher the normal form applicable to a table, the less vulnerable it is to
inconsistencies and anomalies.

Each table has a "highest normal form" (HNF): by definition, a table always
meets the requirements of its HNF and of all normal forms lower than its HNF;
also by definition, a table fails to meet the requirements of any normal form
higher than its HNF.

The generally known hierarchy of normal forms is as follows: First Normal Form (1NF),
Second Normal Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF) and
Fifth Normal Form (5NF).

In this course we discuss the hierarchy only up to 3NF, together with one further normal form,
Boyce-Codd Normal Form (BCNF).
First Normal Form

According to Date's definition of 1NF, a table is in 1NF if and only if it is
"isomorphic to some relation", which means, specifically, that it satisfies the
following five conditions:
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the
applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as
row IDs, object IDs, or hidden timestamps].

Examples of tables (or views) that would not meet this definition of 1NF are:
• A table that lacks a unique key. Such a table would be able to accommodate duplicate
rows, in violation of condition 3.
• A view whose definition mandates that results be returned in a particular order, so that
the row-ordering is an intrinsic and meaningful aspect of the view. This violates
condition 1. The tuples in true relations are not ordered with respect to each other.
• A table with at least one nullable attribute. A nullable attribute would be in
violation of condition 4, which requires every field to contain exactly one value from its
column's domain. It should be noted, however, that this aspect of condition 4 is
controversial. It marks an important departure from Codd's later vision of the relational
model, which made explicit provision for nulls.
Codd states that the "values in the domains on which each relation is defined are
required to be atomic with respect to the DBMS." Codd defines an atomic value as one
that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special
functions)." This means that a field should not be divided into parts holding more than one kind of
data, such that what one part means to the DBMS depends on another part of the
same field.
Suppose a novice designer wishes to record the names and telephone numbers of
customers. He defines a customer table which looks like this:
Customer
Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659
789           Maria        Fernandez   555-808-9633
The designer then becomes aware of a requirement to record multiple telephone
numbers for some customers. He reasons that the simplest way of doing this is to allow
the "Telephone Number" field in any given record to contain more than one value:
Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659
                                       555-776-4100
789           Maria        Fernandez   555-808-9633
Assuming, however, that the Telephone Number column is defined on some
Telephone Number-like domain (e.g. the domain of strings 12 characters in
length), the representation above is not in 1NF. 1NF (and, for that matter, the
RDBMS) prevents a single field from containing more than one value from its
column's domain.

Repeating groups across columns: The designer might attempt to get
around this restriction by defining multiple Telephone Number columns:
Customer ID   First Name   Surname     Tel. No. 1     Tel. No. 2     Tel. No. 3
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659   555-776-4100   555-403-1659
789           Maria        Fernandez   555-808-9633
This representation, however, makes use of nullable columns, and therefore does not
conform to Date's definition of 1NF. Even if the view is taken that nullable columns are
allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2 and
Tel. No. 3 share exactly the same domain and exactly the same meaning; the splitting
of Telephone Number into three headings is artificial and causes logical problems.
These problems include:
• Difficulty in querying the table. Answering such questions as "Which customers have
telephone number X?" and "Which pairs of customers share a telephone number?" is
awkward.
• Inability to enforce uniqueness of Customer-to-Telephone Number links through the
RDBMS. Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the
same as her Tel. No. 1 value.
• Restriction of the number of telephone numbers per customer to three. If a customer
with four telephone numbers comes along, we are constrained to record only three and
leave the fourth unrecorded. This means that the database design is imposing
constraints on the business process, rather than (as should ideally be the case) vice versa.
Repeating groups within columns: The designer might, alternatively, retain the single
Telephone Number column but alter its domain, making it a string of sufficient length to
accommodate multiple telephone numbers:
Customer ID   First Name   Surname     Telephone Numbers
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659, 555-776-4100
789           Maria        Fernandez   555-808-9633
This design is consistent with 1NF according to Date’s definition but not according to
Codd’s definition. It presents several design issues. The Telephone Number heading
becomes semantically woolly, as it can now represent either a telephone number, a list
of telephone numbers, or indeed anything at all. A query such as "Which pairs of
customers share a telephone number?" is more difficult to formulate, given the
necessity to cater for lists of telephone numbers as well as individual telephone
numbers. Meaningful constraints on telephone numbers are also very difficult to define
in the RDBMS with this design.
A design that complies with 1NF: A design that is unambiguously in 1NF makes use
of two tables: a Customer Name table and a Customer Telephone Number table.
Customer Name
Customer ID   First Name   Surname
123           Robert       Ingram
456           Jane         Wright
789           Maria        Fernandez
Customer Telephone Number
Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each
Customer-to-Telephone Number link appears on its own record.
It is worth noting that this design meets the additional requirements for second and third
normal form (3NF).
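The questions that were awkward in the earlier designs become simple queries against the two-table design. The sketch below uses Python's built-in sqlite3 module. The table name CustomerTelephone, the underscore column names and the two helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table CustomerTelephone (
        customer_id      integer,
        telephone_number text,
        primary key (customer_id, telephone_number)  -- each link is recorded once
    );
    insert into CustomerTelephone values
        (123, '555-861-2025'), (456, '555-403-1659'),
        (456, '555-776-4100'), (789, '555-808-9633');
""")

def customers_with(number):
    """Which customers have telephone number X?"""
    rows = conn.execute(
        "select customer_id from CustomerTelephone "
        "where telephone_number = ? order by customer_id", (number,))
    return [row[0] for row in rows]

def sharing_pairs():
    """Which pairs of customers share a telephone number?"""
    return conn.execute("""
        select a.customer_id, b.customer_id
        from CustomerTelephone a join CustomerTelephone b
          on a.telephone_number = b.telephone_number
         and a.customer_id < b.customer_id
    """).fetchall()

print(customers_with('555-776-4100'))
print(sharing_pairs())
```

The composite primary key also lets the RDBMS reject a duplicate Customer-to-Telephone Number link, which the three-column design could not do.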
Second Normal Form
2NF was originally defined by E.F. Codd in 1971.
A 1NF table is in 2NF if and only if, given any candidate key K and any attribute A that
is not a constituent of a candidate key, A depends upon the whole of K rather than just
a part of it.
Put another way, a 1NF table is in 2NF if and only if all its non-prime attributes are functionally dependent
on the whole of every candidate key. (A non-prime attribute is one that does not belong
to any candidate key.)
Note that when a 1NF table has no composite candidate keys (candidate keys
consisting of more than one attribute), the table is automatically in 2NF.
Consider a table describing employees' skills:
Employees' Skills
Employee   Skill            Current Work Location
Jones      Typing           114 Main Street
Jones      Shorthand        114 Main Street
Jones      Whittling        114 Main Street
Bravo      Light Cleaning   73 Industrial Way
Ellis      Alchemy          73 Industrial Way
Ellis      Flying           73 Industrial Way
Harrison   Light Cleaning   73 Industrial Way
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given
Employee might need to appear more than once (he might have multiple Skills), and a
given Skill might need to appear more than once (it might be possessed by multiple
Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for
the table.
The remaining attribute, Current Work Location, is dependent on only part of the
candidate key, namely Employee. Therefore the table is not in 2NF. Note the
redundancy in the way Current Work Locations are represented: we are told three times
that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way.
This redundancy makes the table vulnerable to update anomalies: it is, for example,
possible to update Jones' work location on his "Typing" and "Shorthand" records and
not update his "Whittling" record. The resulting data would imply contradictory answers
to the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two tables: an
"Employees" table with candidate key {Employee}, and an "Employees' Skills" table with
candidate key {Employee, Skill}:
Employees
Employee   Current Work Location
Jones      114 Main Street
Bravo      73 Industrial Way
Ellis      73 Industrial Way
Harrison   73 Industrial Way
Employees' Skills
Employee   Skill
Jones      Typing
Jones      Shorthand
Jones      Whittling
Bravo      Light Cleaning
Ellis      Alchemy
Ellis      Flying
Harrison   Light Cleaning
Neither of these tables can suffer from update anomalies.
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table
which suffers from update anomalies is:
Tournament Winners
Tournament             Year   Winner           Winner Date of Birth
Des Moines Masters     1998   Chip Masterson   14 March 1977
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key
{Tournament / Year} and not part of it, particular Winner / Winner Date of Birth
combinations are shown redundantly on multiple records. This leads to an update
anomaly: if updates are not carried out consistently, a particular winner could be shown
as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth
attribute is subject. Winner Date of Birth actually depends on Winner, which in turn
depends on the key Tournament / Year.
This problem is addressed by third normal form (3NF).
Note: In addition to the primary key, the table may contain other candidate keys; it is
necessary to establish that no non-prime attributes have part-key dependencies on any
of these candidate keys.
Third Normal Form:

As defined by E.F. Codd in 1971, a table is in 3NF if and only if both of
the following conditions hold:
• The relation R (table) is in second normal form (2NF).
• Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on
every candidate key of R.
Note:
A non-prime attribute of R is an attribute that does not belong to any candidate key of
R. A transitive dependency is a functional dependency in which X → Z (X determines Z)
indirectly, because X → Y and Y → Z (where it is not the case that Y → X).
A definition of 3NF equivalent to Codd's, given by Carlo Zaniolo in 1982, states that a table
is in 3NF if and only if, for each of its functional dependencies X → A, at least one of
the following conditions holds:
• X contains A (that is, X → A is a trivial functional dependency), or
• X is a superkey, or
• each attribute in A − X is a prime attribute (i.e., it is contained within
a candidate key).
Zaniolo's definition gives a clear sense of the difference between 3NF and the more
stringent Boyce-Codd normal form (BCNF). BCNF simply eliminates the third alternative
("each attribute in A − X is a prime attribute").
The difference between 2NF and 3NF can be stated as follows: requiring non-key attributes to be dependent on
"the whole key" ensures that a table is in 2NF, while further requiring them to be
dependent on "nothing but the key" ensures that the table is in 3NF.
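Zaniolo's three alternatives translate almost directly into a test. The Python sketch below finds candidate keys by brute force and then examines each given dependency X → A. The function names are choices made here, and the two inputs are assumptions: the Tournament Winners dependencies as described in the text ({Tournament, Year} → Winner and Winner → Winner Date of Birth), and a small abstract schema R(A, B, C) with AB → C and C → B.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure covers the schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if closure(attrs, fds) >= set(schema) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def check_normal_forms(schema, fds):
    """Apply Zaniolo's alternatives to each given dependency X -> A."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        x, a = set(lhs), set(rhs)
        if a <= x or closure(x, fds) >= set(schema):
            continue             # trivial dependency, or X is a superkey
        in_bcnf = False          # BCNF allows no further alternative
        if not (a - x) <= prime:
            in_3nf = False       # some attribute of A - X is non-prime
    return {"keys": keys, "3NF": in_3nf, "BCNF": in_bcnf}

tournament = check_normal_forms(
    ["Tournament", "Year", "Winner", "Winner Date of Birth"],
    [(["Tournament", "Year"], ["Winner"]), (["Winner"], ["Winner Date of Birth"])],
)
abstract = check_normal_forms("ABC", [("AB", "C"), ("C", "B")])
print(tournament)
print(abstract)
```

The Tournament Winners schema fails both tests because of Winner → Winner Date of Birth. The abstract schema passes the 3NF test through the third alternative (B is prime) but fails BCNF, which is exactly the gap between the two normal forms.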
Consider again the example table given above:
Tournament Winners
Tournament             Year   Winner           Winner Date of Birth
Des Moines Masters     1998   Chip Masterson   14 March 1977
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977
This table is in 2NF but not in 3NF. The breach of 3NF occurs because the non-prime
attribute Winner Date of Birth is transitively dependent on the candidate key
{Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of
Birth is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being shown with
different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table
into two:
Tournament Winners
Tournament             Year   Winner
Des Moines Masters     1998   Chip Masterson
Indiana Invitational   1998   Al Fredrickson
Cleveland Open         1999   Bob Albertson
Des Moines Masters     1999   Al Fredrickson
Indiana Invitational   1999   Chip Masterson
Player Dates of Birth
Player           Date of Birth
Chip Masterson   14 March 1977
Al Fredrickson   21 July 1975
Bob Albertson    28 September 1968
Boyce-Codd Normal Form:

It is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd
normal form if and only if, for every one of its non-trivial functional dependencies X
→ Y, X is a superkey, that is, X is either a candidate key or a superset thereof.

Note that the tables "Tournament Winners" and "Player Dates of Birth" shown above
as being in 3NF are also in BCNF.

Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF
table which does not have multiple overlapping candidate keys is guaranteed to
be in BCNF.

An example of a 3NF table that does not meet BCNF is:
Today's Court Bookings
Court   Start Time   End Time   Rate Type
1       09:30        10:30      SAVER
1       11:00        12:00      SAVER
1       14:00        15:30      STANDARD
2       10:00        11:30      PREMIUM-B
2       11:30        13:30      PREMIUM-B
2       15:00        16:30      PREMIUM-A
There are two courts available and there are four distinct rate types:
• SAVER, for Court 1 bookings made by members
• STANDARD, for Court 1 bookings made by non-members
• PREMIUM-A, for Court 2 bookings made by members
• PREMIUM-B, for Court 2 bookings made by non-members
So, apart from the dependencies implied by the candidate keys, Rate Type → Court is the
only non-trivial functional dependency that holds.
o We can observe that the table's candidate keys are:
• {Court, Start Time}
• {Court, End Time}
• {Rate Type, Start Time}
• {Rate Type, End Time}
o In the Today's Court Bookings table, there are no non-prime attributes:
that is, all attributes belong to candidate keys. Therefore the table adheres
to both 2NF and 3NF.
o The table does not adhere to BCNF because in the dependency Rate
Type → Court, the determining attribute (Rate Type) is not a superkey.
The design can be amended so that it meets BCNF as follows:
Rate Types
Rate Type   Court   Member Flag
SAVER       1       Yes
STANDARD    1       No
PREMIUM-A   2       Yes
PREMIUM-B   2       No
Today's Bookings
Rate Type   Start Time   End Time
SAVER       09:30        10:30
SAVER       11:00        12:00
STANDARD    14:00        15:30
PREMIUM-B   10:00        11:30
The candidate keys for the Today's Bookings table are {Rate Type, Start Time}
and {Rate Type, End Time}. Both tables are in BCNF.
Consider the following table:
Lending
branch-name    branch-city   assets   customer-name   loan-number   amount
Sadar          Agra          200000   Ram             L-12          12000
Sanjay-place   Agra          100000   Ram             L-13          13000
This table stores information about loans. It has the following problems:
• Since every branch is going to have several loans, the table will have one row for
each loan taken from a branch, and all of these rows will repeat the same values in the columns
branch-name, branch-city and assets.
• Updating the branch-city or assets of a particular branch requires updating
every row for that branch, so the operation is costly.
• If any such row is missed during an update, the table will hold more than one value for the
branch-city or assets of a branch, which breaches data integrity.
• If a branch has no loans, it has no entry in this table,
so we cannot represent the complete information.
Decomposition
• The above problems can be solved by decomposing the table. The set of
relations R1, R2, …, Rn is a decomposition of relation R if R = R1 ∪ R2 ∪ … ∪ Rn. It
should be noted that every pair Ri and Ri+1 of this set should have at least one
common attribute so that they can be combined back again using the join operation.
• Not every decomposition of this table is free from problems, however.
• Consider, for example, forming two new tables out of our Lending table as
follows:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Then the resulting tables with data will be as follows:
Branch-customer
branch-name    branch-city   assets   customer-name
Sadar          Agra          200000   Ram
Sanjay-place   Agra          100000   Ram
Customer-loan
customer-name   loan-number   amount
Ram             L-12          12000
Ram             L-13          13000
Now suppose that, to find the branch for loan L-12, we form the join of these two tables. We get a
table as follows:
Branch-customer ⋈ Customer-loan =
branch-name    branch-city   assets   customer-name   loan-number   amount
Sadar          Agra          200000   Ram             L-12          12000
Sadar          Agra          200000   Ram             L-13          13000
Sanjay-place   Agra          100000   Ram             L-12          12000
Sanjay-place   Agra          100000   Ram             L-13          13000
According to this join, both of the loans were taken from both of the branches. This is an
example of information loss. It occurred because the column chosen to be kept
common to the two tables after decomposition, customer-name, is the wrong one.
Lossless-Join Decomposition: A decomposition {R1, R2, …, Rn} of relation schema R
is a lossless-join decomposition if, for all legal relations r on schema R,
r = ΠR1(r) ⋈ ΠR2(r) ⋈ … ⋈ ΠRn(r)
In other words, when we join all of the decomposed tables with their data after decomposition,
the result should be the original table with the data it held before decomposition.
Otherwise it is called a lossy-join decomposition.
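The information loss can be reproduced by projecting the Lending table onto each schema and joining the projections back together. The Python sketch below does this with plain lists of dictionaries. The helper names project, natural_join and rejoin are choices made here. It checks a single instance only, whereas the definition requires the equality for all legal relations r.

```python
def project(rows, attrs):
    """Project rows (dictionaries) onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        piece = {a: row[a] for a in attrs}
        if piece not in out:
            out.append(piece)
    return out

def natural_join(left, right):
    """Join two lists of rows on all the attribute names they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out

def rejoin(rows, *schemas):
    """Project rows onto each schema and join the projections back together."""
    result = project(rows, schemas[0])
    for schema in schemas[1:]:
        result = natural_join(result, project(rows, schema))
    return result

LENDING = [
    {"branch-name": "Sadar", "branch-city": "Agra", "assets": 200000,
     "customer-name": "Ram", "loan-number": "L-12", "amount": 12000},
    {"branch-name": "Sanjay-place", "branch-city": "Agra", "assets": 100000,
     "customer-name": "Ram", "loan-number": "L-13", "amount": 13000},
]

lossy = rejoin(LENDING,
               ["branch-name", "branch-city", "assets", "customer-name"],
               ["customer-name", "loan-number", "amount"])
lossless = rejoin(LENDING,
                  ["branch-name", "branch-city", "assets"],
                  ["branch-name", "customer-name", "loan-number", "amount"])
print(len(LENDING), len(lossy), len(lossless))
```

Joining on customer-name yields four rows, two of them spurious, while joining on branch-name returns exactly the two original rows.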
Dependency preservation: This is another desirable property of a decomposition.
Suppose a set F of functional dependencies holds on any relation based
on schema R. The set of functional dependencies that holds on a
subschema R1 is then F1, which contains all the functional dependencies of F that involve
attributes of R1 only. If the decomposition of R is {R1, R2, …, Rn} and the corresponding
sets of functional dependencies that hold on them are {F1, F2, …, Fn}, then the following should
be true:
F+ = (F1 ∪ F2 ∪ … ∪ Fn)+
Such a decomposition is called a dependency-preserving decomposition.
For example:
Consider the schema R = {A, B, C, D} such that the following functional dependencies hold
on it: F = { }.
Now suppose the decomposition of this R is R1 = {A, B} and R2 = {B, C, D}, so the
functional dependencies which hold on R1 are F1 = { } (Note: F1 should contain all
the functional dependencies in F which have only attributes of R1) and those on R2 are
F2 = { }. The union F1 ∪ F2 is { }, which doesn't contain the dependency , so it
is not a dependency-preserving decomposition.
If we decompose R into the relation schemas R1 = {A, B, C} and R2 = {C, D}, then
F1 = { } and F2 = { }, so F1 ∪ F2 is { }.
Normalization Using Functional Dependency
Lossless-Join Decomposition using FD:
Let R be a relation schema and F a set of functional dependencies on R. Let R1 and R2
form a decomposition of R. This decomposition is a lossless-join decomposition if at least
one of the following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
Example: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number,
amount). The FDs that hold on this schema are
branch-name → assets branch-city
loan-number → amount branch-name
The decomposition of this schema into the two schemas
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
is a lossless-join decomposition because Branch-schema ∩ Loan-info-schema = branch-name
and we have the FD branch-name → assets branch-city. Applying the augmentation rule to
it gives branch-name → branch-name assets branch-city, i.e. branch-name → Branch-schema.
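The condition can be tested without enumerating F+: compute the attribute closure of R1 ∩ R2 and see whether it covers R1 or R2. A minimal Python sketch follows, using the two FDs given for Lending-schema. The function names are choices made here.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def meets_lossless_condition(r1, r2, fds):
    """True if R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined

FDS = [
    (["branch-name"], ["assets", "branch-city"]),
    (["loan-number"], ["amount", "branch-name"]),
]

branch = ["branch-name", "branch-city", "assets"]
loan_info = ["branch-name", "customer-name", "loan-number", "amount"]
branch_customer = ["branch-name", "branch-city", "assets", "customer-name"]
customer_loan = ["customer-name", "loan-number", "amount"]

print(meets_lossless_condition(branch, loan_info, FDS))
print(meets_lossless_condition(branch_customer, customer_loan, FDS))
```

The first pair shares branch-name, which determines all of Branch-schema, so the condition is met. The second pair shares only customer-name, which determines nothing else, so the condition fails, in line with the spurious rows seen earlier.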
Third Normal Form Using FD:
Let R be a relation with F as the minimal set of functional dependencies that hold on
R.
Then do the following:
• Initially have an empty set of relations, and set i = 1.
• For each FD α → β in F, add a relation Ri = (α, β) if no other relation contains
α, β, and increase i by one.
• After adding all such relations, add another relation Ri = (any candidate key of R) if no
other relation contains a candidate key.
Boyce-Codd Normal Form using FD:
1. Let Ri be a relation that is not in BCNF.
2. Let α → β be a non-trivial FD that holds on Ri such that α → Ri
doesn't hold on Ri (i.e. α is not
a superkey of Ri).
3. Replace relation Ri by the two relations (α ∪ β) and (Ri − β).
4. Check again all the relations now present against all the FDs that hold on
them, and go back to step 1 if any relation is still not in BCNF.
Example:
Consider Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number,
amount). The FDs that hold on this schema are
1. branch-name → assets branch-city
2. loan-number → amount branch-name
We can see that Lending-schema is not in BCNF: in the FD branch-name
→ assets branch-city, branch-name is not a superkey of Lending-schema. Decomposing on
this FD gives the new set of relations:
Branch-schema = (branch-name, branch-city, assets)
branch-name → assets branch-city
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
loan-number → amount branch-name
In the new set of relations, Loan-info-schema is not in BCNF, as loan-number
is not a superkey of Loan-info-schema. We decompose it again, and the set of
relations becomes
Branch-schema = (branch-name, branch-city, assets)
branch-name → assets branch-city
Loan-schema = (branch-name, loan-number, amount)
loan-number → amount branch-name
Borrower-schema = (customer-name, loan-number)
Now all three relations are in BCNF, so we do not have to decompose any further.
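The decomposition steps traced above can be sketched in Python. This is a simplified version under a stated assumption: it looks for violations only among the FDs given in F, restricted to the attributes of each Ri, and does not project all of F+ onto each Ri, so in general it may miss violations that a full implementation would find. The function names are choices made here.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_decompose(schema, fds):
    """Repeatedly split a schema on a given FD whose left side is not a superkey."""
    result = [frozenset(schema)]
    changed = True
    while changed:
        changed = False
        for ri in result:
            for lhs, rhs in fds:
                alpha = frozenset(lhs)
                beta = (frozenset(rhs) & ri) - alpha   # part of the right side inside Ri
                if alpha <= ri and beta and not ri <= closure(alpha, fds):
                    # Step 3: replace Ri by (alpha ∪ beta) and (Ri − beta).
                    result.remove(ri)
                    result += [alpha | beta, ri - beta]
                    changed = True
                    break
            if changed:
                break   # restart the scan over the updated set of relations
    return result

LENDING = ["branch-name", "branch-city", "assets",
           "customer-name", "loan-number", "amount"]
FDS = [
    (["branch-name"], ["assets", "branch-city"]),
    (["loan-number"], ["amount", "branch-name"]),
]

schemas = bcnf_decompose(LENDING, FDS)
for s in schemas:
    print(sorted(s))
```

For the Lending example this produces the same three schemas as the trace: Branch-schema, Loan-schema and Borrower-schema.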
3. BCNF may not satisfy the dependency preservation criterion.
a. In some cases, a non-BCNF table cannot be decomposed into tables that satisfy
BCNF and preserve the dependencies that held in the original table.
b. For example, a set of functional dependencies {AB → C, C → B} cannot be
represented by a BCNF schema that preserves both dependencies.
c. Unlike the first three normal forms, BCNF is therefore not always achievable
without losing dependencies.
d. Consider the following non-BCNF table whose functional dependencies follow the
{AB → C, C → B} pattern:
Nearest Shop
Person     Shop Type     Nearest Shop
Davidson   Optician      Eagle Eye
Davidson   Hairdresser   Snippets
Wright     Bookshop      Merlin Books
Fuller     Bakery        Doughy's
Fuller     Hairdresser   Sweeney Todd's
Fuller     Optician      Eagle Eye
For each Person / Shop Type combination, the table tells us which shop of this type is
geographically nearest to the person's home. We assume for simplicity that a single
shop cannot be of more than one type.
The candidate keys of the table are:
• {Person, Shop Type}
• {Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the
table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is
functionally dependent on a non-superkey: Nearest Shop.
The table can be decomposed into the following two BCNF tables:
Shop Near Person
Person     Shop
Davidson   Eagle Eye
Davidson   Snippets
Wright     Merlin Books
Fuller     Doughy's
Fuller     Sweeney Todd's
Fuller     Eagle Eye
Shop
Shop             Shop Type
Eagle Eye        Optician
Snippets         Hairdresser
Merlin Books     Bookshop
Doughy's         Bakery
Sweeney Todd's   Hairdresser
The "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop"
table has a candidate key of {Shop}. Unfortunately, although this design adheres to
BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of
the same type against the same person. In other words, its candidate keys do not
guarantee that the functional dependency {Person, Shop Type} → {Shop} will be
respected.
Module-II
Short Questions
2008
1. Let R=(A,B,C,D) and functional dependencies
(1) A->C, (2) AB->D. What is the closure of {A,B} ? [2][ Database design]
2. What do you mean by semi less join? [2][ Database design]
3. What do you mean by multi-valued dependency? [2][ Database design]
4. Define and differentiate between Natural Join and Inner Join. [2][ Database design]
2009
No questions
2010
1. Consider a multilevel index with fan-out=4 used to index 25 records, draw the structure. (1) A->C, (2)
AB->D. What is the closure of {A, B}? [2][ Database design]
2. For a relation R(A,B,C,D) with the dependency among numeric field values: 6= 2A + B and 9 = 2E.
Draw the E-R diagram. [2][ Database design]
3. What is a query tree? Draw the query tree for the following SQL query: Select R1.A, R2.B from R1,
R2 where R1.b=R2.a; [2][ Query processing]
4. For the following set of dependencies:
{A -> BC, B -> D, C -> DE, BC -> F} Find primary key of the relation. [2][ Database design]
5. What do you mean by RAID? [2][ Database design]
2011
1. What is fully functional dependency? Give an example. [2][ Database design]
2. What is multivalued dependency? [2][ Database design]
2012
No questions
Long Questions
2008
1. What is normalization of relation? What is a key attribute in a relation? What is the difference
between 1st Normal Form, 2nd normal form and 3rd normal form? [5][Database design]
2. Define the structure and properties of B Tree. Explain how the B tree is used as an index structure.
Construct a B tree of order 3 with following key value : 10, 2, 30, 20, 86, 4, 6, 3, 60, 84, 88, 33, 52,
91, 69. [5][Query processing]
State Armstrong’s axioms. Show that Armstrong’s axioms are complete. [5][Query processing]
3. Explain the difference between inner join and outer join. What are the restrictions on using outer join?
Give examples to support your answer. [5][Query processing]
4. What does the term redundancy mean? Discuss the implications of redundancy in a relational
database. [5][Database design]
5. Define (i) Primary key, and (ii) Foreign key, Suppose relation R(A,B,C,D,E) has functional
dependencies:
AB ->C
D ->A
AE ->B
CD -> E
BE ->D Find all the candidate keys of R. [5][Database design]
6. Construct a B+ tree of order 1 with following keys, 1, 9, 5, 3, 7, 11, 17, 13, 15? [5][Query processing]
What is the use of outer join and list out the three types of outer join with the notations used in
relational algebra? [5][Database design]
2009
1. Suppose we have the following relation with given data as:
Roll_no   Student_name   Subject_code   Subject_name   Marks
101       Aasish         S01            English        80
101       Aasish         S02            Physics        56
101       Aasish         S03            Chemistry      63
102       Sushmita       S01            English        90
102       Sushmita       S02            Physics        85
102       Sushmita       S03            Chemistry      91
To avoid redundancy, in the first proposal we have decomposed the relation into the following relations:
Subject(Roll_no, Student_name) and Subject(Subject_code, Subject_name, Marks)
In the second proposal it is decomposed into the following relations:
Student(Roll_no, Student_name)
Result(Roll_no, Subject_code, Marks)
Subject(Subject_code, Subject_name)
State whether the decompositions are lossless or lossy. [5][Database design]
2. What is normalization? Is it necessary for RDBMS? Explain 4NF and 5NF with example.
[5][Database design]
3. Consider the three relations given below.
Order(order_no, order_date, customer_no )
Order_item(Order_no, item_no, quantity, bill_amount)
Item (item_no, item_name, unit_price) State whether the following relations are in 3NF or not if not
then what steps you should follow to bring those in to 3NF? [5][Database design]
4. What is query optimization? Write the Heuristic optimization algorithm and solve the query given
below:
Find the names of employees work on project name “NASA” and born after 1956.
Project (project_name, Project_number, Dept)
Employee (Emp_id, P_number, DOB, name)
Work_on (P_no, Emp_id) [10][Database design]
Let us consider a relation with attributes A, B, C, D, E and F. Suppose that this relation has the FD
AB->, BC->AD, D->E and CF->B . What Is the closure of {A,B} that is {AB}+ [5][Database design]
2010
1. Consider the following set of data items :
A    B   C    D
A1   1   X1   D1
A2   2   X2   D2
A3   3   X3   D3
A3   3   X3   D3
Represent it in 3NF [5][Database design]
2. What is the difference between 4NF and BCNF? Describe with examples. [5][Database design]
3. Write short notes on Multivalued dependencies . [5][Database design]
2011
1. Explain Armstrong’s axioms with examples. [5][Database design]
2012
1. What is normalization? Why it is required? Explain the Boyce-Codd normal form with an example.
[5][Database design]
2. Write short notes on the following
A. Hashing
B. Dependency preservation [5][Query processing, Database design]
MODULE -3
TRANSACTION MANAGEMENT
THE CONCEPT OF A TRANSACTION
A user writes data access/update programs in terms of the high-level query and update
language supported by the DBMS. To understand how the DBMS handles such
requests, with respect to concurrency control and recovery, it is convenient to regard an
execution of a user program, or transaction, as a series of reads and writes of
database objects:
To read a database object, it must first be brought into main memory from disk, and then its
value is copied into a program variable.
To write a database object, an in-memory copy of the object must first be modified and
then written to disk.
Database `objects' are the units in which programs read or write information. The
units could be pages, records, and so on, but this is dependent on the DBMS and
is not central to the principles underlying concurrency control or recovery. In this
chapter, we will consider a database to be a collection of independent objects.
When objects are added to or deleted from a database, or there are relationships
between database objects that we want to exploit for performance, some additional
issues arise.
There are four important properties of transactions that a DBMS must ensure to
maintain data in the face of concurrent access and system failures:
1. Users should be able to regard the execution of each transaction as atomic: either
all actions are carried out or none are. Users should not have to worry about the
effect of incomplete transactions (say, when a system crash occurs).
2. Each transaction, run by itself with no concurrent execution of other transactions,
must preserve the consistency of the database. This property is called consistency,
and the DBMS assumes that it holds for each transaction. Ensuring this
property of a transaction is the responsibility of the user.
3. Users should be able to understand a transaction without considering the effect of
other concurrently executing transactions, even if the DBMS interleaves the actions
of several transactions for performance reasons. This property is sometimes
referred to as isolation: Transactions are isolated, or protected, from the effects
of concurrently scheduling other transactions.
4. Once the DBMS informs the user that a transaction has been successfully completed,
its effects should persist even if the system crashes before all its changes
are reflected on disk. This property is called durability.
The acronym ACID is sometimes used to refer to the four properties of transactions
that we have presented here: atomicity, consistency, isolation and durability. We now
consider how each of these properties is ensured in a DBMS.
TRANSACTIONS AND SCHEDULES
A transaction is seen by the DBMS as a series, or list, of actions. The actions that can
be executed by a transaction include reads and writes of database objects. A
transaction can also be defined as a set of actions that are partially ordered. That is, the
relative order of some of the actions may not be important. In order to concentrate on
the main issues, we will treat transactions (and later, schedules) as a list of actions.
A schedule is a list of actions (reading, writing, aborting, or committing) from a set of
transactions, and the order in which two actions of a transaction T appear in a schedule
must be the same as the order in which they appear in T.
For example, the schedule shows an execution order for actions of two transactions T1
and T2. We move forward in time as we go down from one row to the next. We
emphasize that a schedule describes the actions of transactions as seen by the DBMS.
In addition to these actions, a transaction may carry out other actions, such as reading
or writing from operating system files, evaluating arithmetic expressions, and so on.
T1        T2
R(A)
W(A)
          R(B)
          W(B)
R(C)
W(C)
(A Schedule Involving Two Transactions)
Notice that the schedule in figure does not contain an abort or commit action for either
transaction. A schedule that contains either an abort or a commit for each transaction
whose actions are listed in it is called a complete schedule. A complete schedule must
contain all the actions of every transaction that appears in it. If the actions of different
transactions are not interleaved- that is, transactions are executed from start to finish,
one by one- we call the schedule a serial schedule.
The basic database access operations that a transaction can include are as follows:
• read_item(X): Reads a database item named X into a program variable.
• write_item(X): Writes the value of program variable X into the database item named X.
Executing a read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in
some main memory buffer).
3. Copy item X from the buffer to the program variable named X.
Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in
some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some
later point in time).
The DBMS will generally maintain a number of buffers in main memory that hold
database disk blocks containing the database items being processed. When these
buffers are all occupied, and additional database blocks must be copied into memory,
some buffer replacement policy is used to choose which of the current buffers is to be
replaced. If the chosen buffer has been modified, it must be written back to disk before it
is reused.
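The read_item and write_item steps, together with buffer replacement, can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the disk is modelled as a Python dictionary of blocks, the replacement policy is least-recently-used (the notes only say "some buffer replacement policy"), and the class and parameter names are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: `disk` maps block ids to dicts of item values."""

    def __init__(self, disk, capacity=2):
        self.disk = disk
        self.capacity = capacity
        # block_id -> [copy of the block, dirty flag]; order tracks recency (LRU assumed)
        self.frames = OrderedDict()

    def _fetch(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)          # block already buffered
        else:
            if len(self.frames) >= self.capacity:
                victim, (data, dirty) = self.frames.popitem(last=False)
                if dirty:                              # modified buffer goes back to disk first
                    self.disk[victim] = data
            self.frames[block_id] = [dict(self.disk[block_id]), False]
        return self.frames[block_id]

    def read_item(self, block_id, name):
        """Locate the block, buffer it if needed, return the item value."""
        return self._fetch(block_id)[0][name]

    def write_item(self, block_id, name, value):
        """Locate the block, buffer it if needed, update the buffered copy."""
        frame = self._fetch(block_id)
        frame[0][name] = value
        frame[1] = True                                # written to disk at some later point


disk = {"b1": {"X": 80}, "b2": {"Y": 50}, "b3": {"Z": 1}}
pool = BufferPool(disk, capacity=2)
pool.write_item("b1", "X", 75)
print(disk["b1"]["X"])      # the disk copy is still the old value
pool.read_item("b2", "Y")
pool.read_item("b3", "Z")   # forces replacement of block b1, which is written back
print(disk["b1"]["X"])
```

In this sketch the updated block reaches the disk only when its buffer is chosen for replacement, which corresponds to the "at some later point in time" case of step 4.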
A transaction includes read_item and write_item operations to access and update the
database. Figure 2 shows examples of two very simple transactions. The read-set of a
transaction is the set of all items that the transaction reads, and the write-set is the set
of all items that the transaction writes. For example, the read-set of T1 in Figure 2 is
{X, Y} and its write-set is also {X, Y}.
(Figure 2: Two sample transactions. (a) Transaction T1. (b) Transaction T2.)
Why Concurrency Control Is Needed
Several problems can occur when concurrent transactions execute in an uncontrolled
manner. Figure 2(a) shows a transaction T1 that transfers N reservations from one
flight whose number of reserved seats is stored in the database item named X to
another flight whose number of reserved seats is stored in the database item named Y.
Figure 2(b) shows a simpler transaction T2 that just reserves M seats on the first flight
(X) referenced in transaction T1.
The Lost Update Problem: This problem occurs when two transactions that access the
same database items have their operations interleaved in a way that makes the value of
some database items incorrect.
Suppose that transactions T1 and T2 are submitted at approximately the same time,
and suppose that their operations are interleaved as shown in Figure 3a; then the final
value of item X is incorrect, because T2 reads the value of X before T1 changes it in the
database, and hence the updated value resulting from T1 is lost.
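The lost update can be reproduced with a few lines that replay such an interleaving by hand. The sketch below assumes T1 moves N seats from X to Y and T2 adds M seats to X, as described above; the starting values (X = 80, Y = 50, N = 5, M = 4) and the function name are illustrative choices, not taken from the notes.

```python
def lost_update_demo(x=80, y=50, n=5, m=4):
    """Replay an interleaving of T1 and T2 in which T1's update of X is lost."""
    db = {"X": x, "Y": y}
    t1, t2 = {}, {}            # private program variables of each transaction

    t1["X"] = db["X"]          # T1: read_item(X)
    t1["X"] -= n               # T1: X := X - N
    t2["X"] = db["X"]          # T2: read_item(X), still sees the old value
    t2["X"] += m               # T2: X := X + M
    db["X"] = t1["X"]          # T1: write_item(X)
    t1["Y"] = db["Y"]          # T1: read_item(Y)
    db["X"] = t2["X"]          # T2: write_item(X), overwrites T1's update
    t1["Y"] += n               # T1: Y := Y + N
    db["Y"] = t1["Y"]          # T1: write_item(Y)
    return db


def serial_demo(x=80, y=50, n=5, m=4):
    """Run T1 to completion, then T2, for comparison."""
    db = {"X": x - n, "Y": y + n}
    db["X"] += m
    return db


print(lost_update_demo())
print(serial_demo())
```

With these values the interleaved run leaves X at 84, whereas a serial execution leaves it at 79; the difference is exactly the N seats that T1 removed.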
Figure 3: Some problems that occur when concurrent execution is uncontrolled.
(a) The lost update problem. (b) The temporary update problem. (c) The incorrect
summary problem.
The Temporary Update (or Dirty Read) Problem. This problem occurs when one
transaction updates a database item and then the transaction fails for some reason. The
updated item is accessed by another transaction before it is changed back to its original
value. Figure 3b shows an example where T1 updates item X and then fails before
completion, so the system must change X back to its original value. Before it can do so,
however, transaction T2 reads the "temporary" value of X, which will not be recorded
permanently in the database because of the failure of T1. The value of item X that is
read by T2 is called dirty data, because it has been created by a transaction that has
not completed and committed yet; hence, this problem is also known as the dirty read
problem.
The Incorrect Summary Problem: If one transaction is calculating an aggregate
summary function on a number of records while other transactions are updating some of
these records, the aggregate function may calculate some values before they are
updated and others after they are updated. For example, suppose that a transaction T3
is calculating the total number of reservations on all the flights; meanwhile, transaction
T1 is executing. If the interleaving of operations shown in Figure 3c occurs, the
result of T3 will be off by an amount N because T3 reads the value of X after N seats
have been subtracted from it but reads the value of Y before those N seats have been
added to it. Another problem that may occur is called unrepeatable read, where a
transaction T reads an item twice and the item is changed by another transaction T'
between the two reads. Hence, T receives different values for its two reads of the same
item.
Why Recovery Is Needed
Whenever a transaction is submitted to a DBMS for execution, the system is
responsible for making sure
that either (1) all the operations in the transaction are completed successfully and their
effect is recorded permanently in the database, or (2) the transaction has no effect
whatsoever on the database or on any other transactions. The DBMS must not permit
some operations of a transaction T to be applied to the database while other operations
of T are not. This may happen if a transaction fails after executing some of its
operations but before executing all of them.
Types of Failures: Failures are generally classified as transaction, system, and media
failures. There are several possible reasons for a transaction to fail in the middle of
execution:
• A computer failure (system crash): A hardware, software, or network error occurs in
the computer system during transaction execution. Hardware crashes are usually media
failures, for example, main memory failure.
• A transaction or system error: Some operation in the transaction may cause it to fail,
such as integer overflow or division by zero. Transaction failure may also occur because
of erroneous parameter values or because of a logical programming error. In addition,
the user may interrupt the transaction during its execution.
• Local errors or exception conditions detected by the transaction: During
transaction execution, certain conditions may occur that necessitate cancellation of the
transaction. For example, data for the transaction may not be found. Notice that an
exception condition, such as insufficient account balance in a banking database, may
cause a transaction, such as a fund withdrawal, to be canceled. This exception should
be programmed in the transaction itself, and hence would not be considered a failure.
• Concurrency control enforcement: The concurrency control method may decide to
abort the transaction, to be restarted later, because it violates serializability or because
several transactions are in a state of deadlock.
• Disk failure: Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash. This may happen during a read
or a write operation of the transaction.
• Physical problems and catastrophes: This refers to an endless list of problems that
includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes
by mistake, and mounting of a wrong tape by the operator.
Figure 4: State transition diagram illustrating the states for transaction execution
The System Log
To be able to recover from failures that affect transactions, the system maintains a log
to keep track of all transaction operations that affect the values of database items. This
information may be needed to permit recovery from failures. We now list the types of
entries, called log records, that are written to the log and the action each performs. In
these entries, T refers to a unique transaction-id that is generated automatically by the
system and is used to identify each transaction:
• [start_transaction,T]: Indicates that transaction T has started execution.
• [write_item, T, X, old_value, new_value]: Indicates that transaction T has changed the
value of database item X from old_value to new_value.
• [read_item,T,X]: Indicates that transaction T has read the value of database item X.
• [commit,T]: Indicates that transaction T has completed successfully, and affirms that
its effect can be committed (recorded permanently) to the database.
• [abort,T]: Indicates that transaction T has been aborted.
Commit Point of a Transaction
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully and the effect of all the transaction
operations on the database have been recorded in the log. Beyond the commit point,
the transaction is said to be committed, and its effect is assumed to be permanently
recorded in the database. The transaction then writes a commit record [commit,T] into
the log. If a system failure occurs, we search back in the log for all transactions T that
have written a [start_transaction,T] record into the log but have not written their
[commit,T] record yet; these transactions may have to be rolled back to undo their effect
on the database during the recovery process.
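The backward search of the log can be sketched as follows. This is a minimal illustration, assuming log records are stored as Python tuples that mirror the entry types listed above; the function names and the sample log are hypothetical, and a real recovery manager does considerably more.

```python
def transactions_to_roll_back(log):
    """Transactions with a start_transaction record but no commit record."""
    started, committed = [], set()
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "start_transaction" and txn not in started:
            started.append(txn)
        elif kind == "commit":
            committed.add(txn)
    return [t for t in started if t not in committed]


def undo(log, db, txn):
    """Restore old values of items written by `txn`, scanning the log backwards."""
    for record in reversed(log):
        if record[0] == "write_item" and record[1] == txn:
            _, _, item, old_value, _new_value = record
            db[item] = old_value


# Hypothetical log contents at the moment of a system failure.
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 50, 54),
    ("commit", "T2"),
]
db = {"X": 75, "Y": 54}
for t in transactions_to_roll_back(log):
    undo(log, db, t)
print(db)
```

Here only T1 lacks a commit record, so its write to X is undone while the committed change made by T2 is kept.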
CHARACTERIZING SCHEDULES BASED ON RECOVERABILITY
When transactions are executing concurrently in an interleaved fashion, then the order
of execution of operations from the various transactions is known as a schedule (or
history).
Schedules (Histories) of Transactions
A schedule (or history) S of n transactions T1, T2, ... , Tn is an ordering of the
operations of the transactions subject to the constraint that, for each transaction Ti that
participates in S, the operations of Ti in S must appear in the same order in which they
occur in Ti. For the purpose of recovery and concurrency control, we are mainly
interested in the read_item and write_item operations of the transactions, as well as the
commit and abort operations. A shorthand notation for describing a schedule uses the
symbols r, w, c, and a for the operations read_item, write_item, commit, and abort,
respectively, and appends as subscript the transaction id (transaction number) to each
operation in the schedule. In this notation, the database item X that is read or written
follows the r and w operations in parentheses. For example, the schedule of Figure 3(a),
which we shall call Sa, can be written as follows in this notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
Two operations in a schedule are said to conflict if they satisfy all three of the following
conditions: (1) they belong to different transactions; (2) they access the same item X;
and (3) at least one of the operations is a write_item(X).
A schedule S of n transactions T1, T2, ..., Tn is said to be a complete schedule if the
following conditions hold:
1. The operations in S are exactly those operations in T1, T2, ..., Tn, including a
commit or abort operation as the last operation for each transaction in the schedule.
2. For any pair of operations from the same transaction Ti, their order of appearance in
S is the same as their order of appearance in Ti.
3. For any two conflicting operations, one of the two must occur before the other in the
schedule.
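The three conflict conditions translate directly into a small check over a schedule written in the shorthand notation. The sketch below is illustrative: the parser accepts strings such as "r1(X); w2(X); c1" and the function names are assumptions, not part of the notes.

```python
import re


def parse(schedule):
    """Turn 'r1(X); w2(X); c1' into a list of (action, transaction, item) tuples."""
    ops = []
    for token in schedule.replace(" ", "").split(";"):
        if not token:
            continue
        match = re.fullmatch(r"([rwca])(\d+)(?:\((\w+)\))?", token)
        if match is None:
            raise ValueError(f"cannot parse operation {token!r}")
        ops.append((match.group(1), int(match.group(2)), match.group(3)))
    return ops


def conflicts(ops):
    """Return all ordered pairs of operations that conflict."""
    pairs = []
    for i, (a1, t1, x1) in enumerate(ops):
        for a2, t2, x2 in ops[i + 1:]:
            different_txn = t1 != t2                   # condition (1)
            same_item = x1 is not None and x1 == x2    # condition (2)
            one_write = "w" in (a1, a2)                # condition (3)
            if different_txn and same_item and one_write:
                pairs.append(((a1, t1, x1), (a2, t2, x2)))
    return pairs


sa = parse("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);")
for first, second in conflicts(sa):
    print(first, "conflicts with", second)
```

For the schedule r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y) this reports three conflicting pairs, all on item X: r1(X) with w2(X), r2(X) with w1(X), and w1(X) with w2(X).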
For some schedules it is easy to recover from transaction failures, whereas for other
schedules the recovery process can be quite involved. Hence, it is important to
characterize the types of schedules for which recovery is possible, as well as those for
which recovery is relatively simple. First, we would like to ensure that, once a
transaction T is committed, it should never be necessary to roll back T. The
schedules that theoretically meet this criterion are called recoverable schedules; those
that do not are called nonrecoverable and hence should not be permitted. A
schedule S is recoverable if no transaction T in S commits until all transactions T' that
have written an item that T reads have committed.
Recoverable schedules may require a complex recovery process, but if sufficient information
is kept (in the log), a recovery algorithm can be devised. In a recoverable schedule, no
committed transaction ever needs to be rolled back. However, it is possible for a
phenomenon known as cascading rollback (or cascading abort) to occur, where an
uncommitted transaction has to be rolled back because it read an item from a
transaction that failed. Because cascading rollback can be quite time-consuming, since
numerous transactions can be rolled back, it is important to characterize the schedules
where this phenomenon is guaranteed not to occur. A schedule is said to be
cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads
only items that were written by committed transactions.
Finally, there is a third, more restrictive type of schedule, called a strict schedule, in
which transactions can neither read nor write an item X until the last transaction that
wrote X has committed (or aborted). Strict schedules simplify the recovery process. In a
strict schedule, the process of undoing a write_item(X) operation of an aborted
transaction is simply to restore the before image (old_value or BFIM) of data item X.
This simple procedure always works correctly for strict schedules, but it may not
work for recoverable or cascadeless schedules.
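The three definitions can be turned into a rough classifier for schedules given as lists of (action, transaction, item) tuples, with actions r, w, c and a. This is a simplified sketch under stated assumptions: the function name is hypothetical, and when a transaction aborts, its writes are simply forgotten (as if the before image had been restored).

```python
def classify(ops):
    """Return the strongest class the schedule satisfies:
    'strict', 'cascadeless', 'recoverable' or 'nonrecoverable'."""
    committed = set()
    last_writer = {}     # item -> transaction that most recently wrote it
    reads_from = {}      # transaction -> set of transactions it read from
    recoverable = cascadeless = strict = True

    for action, txn, item in ops:
        if action in ("r", "w"):
            writer = last_writer.get(item)
            other = writer is not None and writer != txn
            if other and writer not in committed:
                strict = False                 # touched an item with an uncommitted writer
                if action == "r":
                    cascadeless = False        # read uncommitted (dirty) data
            if action == "r" and other:
                reads_from.setdefault(txn, set()).add(writer)
            if action == "w":
                last_writer[item] = txn
        elif action == "c":
            # Recoverability: everything txn read from must already be committed.
            if any(w not in committed for w in reads_from.get(txn, ())):
                recoverable = False
            committed.add(txn)
        elif action == "a":
            # Simplification: forget the aborted transaction's writes.
            for it in [i for i, w in last_writer.items() if w == txn]:
                del last_writer[it]

    if strict:
        return "strict"
    if cascadeless:
        return "cascadeless"
    return "recoverable" if recoverable else "nonrecoverable"


# T2 reads X from uncommitted T1 and commits before T1 aborts.
print(classify([("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
                ("w", 2, "X"), ("c", 2, None), ("a", 1, None)]))
# Same reads, but T1 commits before T2.
print(classify([("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
                ("w", 2, "X"), ("c", 1, None), ("c", 2, None)]))
```

The first schedule is nonrecoverable because T2 commits after reading from T1, which has not committed; the second is recoverable but not cascadeless, since T2 still reads uncommitted data.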
LOCKING TECHNIQUES FOR CONCURRENCY CONTROL
A lock is a variable associated with a data item that describes the status of the item
with respect to possible operations that can be applied to it. Generally, there is one lock
for each data item in the database. Locks are used as a means of synchronizing the
access by concurrent transactions to the database items.
Types of Locks and System Lock Tables
Several types of locks are used in concurrency control.
Binary Locks : A binary lock can have two states or values: locked and unlocked (or 1
and 0, for simplicity). A distinct lock is associated with each database item X. If the
value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when
requested. We refer to the current value (or state) of the lock associated with item X as
LOCK(X).
Two operations, lock_item and unlock_item, are used with binary locking. A transaction
requests access to an item X by first issuing a lock_item(X) operation. If LOCK(X) = 1,
the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks the
item) and the transaction is allowed to access item X. When the transaction is through
using the item, it issues an unlock_item(X) operation, which sets LOCK(X) to
0 (unlocks the item) so that X may be accessed by other transactions. Hence, a binary
lock enforces mutual exclusion on the data item.
If the simple binary locking scheme described here is used, every transaction must obey
the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or
write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and
write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on
item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the
lock on item X.
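The lock_item and unlock_item operations, together with rules 3 and 4, can be sketched as a single-threaded lock table. This is an illustration only: the class name, the waiting queue, and the convention of returning False when the caller must wait are assumptions, and a real DBMS would suspend the waiting transaction instead.

```python
from collections import deque


class BinaryLockTable:
    """Toy lock manager keeping LOCK(X), the holder, and a waiting queue per item."""

    def __init__(self):
        self.lock = {}       # item -> 0 (unlocked) or 1 (locked)
        self.holder = {}     # item -> transaction holding the lock
        self.waiting = {}    # item -> queue of waiting transactions

    def lock_item(self, txn, item):
        """Return True if the lock was granted, False if txn has to wait."""
        if self.holder.get(item) == txn:
            raise RuntimeError("rule 3: transaction already holds this lock")
        if self.lock.get(item, 0) == 1:
            self.waiting.setdefault(item, deque()).append(txn)
            return False
        self.lock[item] = 1
        self.holder[item] = txn
        return True

    def unlock_item(self, txn, item):
        """Release the lock; return the waiting transaction that receives it, if any."""
        if self.holder.get(item) != txn:
            raise RuntimeError("rule 4: transaction does not hold this lock")
        self.lock[item] = 0
        del self.holder[item]
        queue = self.waiting.get(item)
        if queue:
            nxt = queue.popleft()
            self.lock[item] = 1
            self.holder[item] = nxt
            return nxt
        return None


table = BinaryLockTable()
print(table.lock_item("T1", "X"))    # granted
print(table.lock_item("T2", "X"))    # T2 must wait
print(table.unlock_item("T1", "X"))  # lock passes to T2
```

Because at most one transaction is recorded as holder at any time, the sketch enforces mutual exclusion on each item.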
Shared/Exclusive (or Read/Write) Locks: The preceding binary locking scheme is too
restrictive for database items, because at most one transaction can hold a lock on a
given item. We should allow several transactions to access the same item X if they all
access X for reading purposes only. However, if a transaction is to write an item X, it
must have exclusive access to X. For this purpose, a different type of lock called a
multiple-mode lock is used. In this scheme, called shared/exclusive or read/write
locks, there are three locking operations: read_lock(X), write_lock(X), and unlock(X). A
lock associated with an item X, LOCK(X), now has three possible states: "read-locked,"
"write-locked," or "unlocked." A read-locked item is also called share-locked, because
other transactions are allowed to read the item, whereas a write-locked item is called
exclusive-locked, because a single transaction exclusively holds the lock on the item.
When we use the shared/exclusive locking scheme, the system must enforce the
following rules:
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any
read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_item(X)
operation is performed in T.
3. A transaction T must issue the operation unlock(X) after all read_item(X) and
write_item(X) operations are completed in T.
4. A transaction T will not issue a read_lock(X) operation if it already holds a read
(shared) lock or a write (exclusive) lock on item X.
5. A transaction T will not issue a write_lock(X) operation if it already holds a read
(shared) lock or write (exclusive) lock on item X. This rule may be relaxed.
6. A transaction T will not issue an unlock(X) operation unless it already holds a read
(shared) lock or a write (exclusive) lock on item X.
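The three lock states and the read_lock, write_lock and unlock operations can be sketched for a single item as follows. The class name and the convention of returning False when the request cannot be granted are assumptions made for illustration; lock upgrading (the relaxation mentioned in rule 5) is not modelled.

```python
class SharedExclusiveLock:
    """Lock for one item with states 'unlocked', 'read-locked' and 'write-locked'."""

    def __init__(self):
        self.state = "unlocked"
        self.readers = set()     # transactions holding a shared lock
        self.writer = None       # transaction holding the exclusive lock

    def read_lock(self, txn):
        """Grant a shared lock unless another transaction holds a write lock."""
        if self.state == "write-locked":
            return False
        self.state = "read-locked"
        self.readers.add(txn)
        return True

    def write_lock(self, txn):
        """Grant an exclusive lock only if the item is currently unlocked."""
        if self.state != "unlocked":
            return False
        self.state = "write-locked"
        self.writer = txn
        return True

    def unlock(self, txn):
        """Release txn's lock; the item becomes unlocked when no holder remains."""
        if self.state == "write-locked" and self.writer == txn:
            self.writer = None
            self.state = "unlocked"
        elif self.state == "read-locked" and txn in self.readers:
            self.readers.discard(txn)
            if not self.readers:
                self.state = "unlocked"
        else:
            raise RuntimeError("rule 6: transaction holds no lock on this item")


lock_x = SharedExclusiveLock()
print(lock_x.read_lock("T1"), lock_x.read_lock("T2"))   # both readers admitted
print(lock_x.write_lock("T3"))                          # writer has to wait
lock_x.unlock("T1")
lock_x.unlock("T2")
print(lock_x.write_lock("T3"))                          # now granted
```

Several readers can share the item, while a writer is admitted only when no other transaction holds any lock on it.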
Guaranteeing Serializability by Two-Phase locking
A transaction is said to follow the two-phase locking protocol if all locking operations
(read_lock, write_lock) precede the first unlock operation in the transaction. Such a
transaction can be divided into two phases: an expanding or growing (first) phase,
during which new locks on items can be acquired but none can be released; and a
shrinking (second) phase, during which existing locks can be released but no new
locks can be acquired.
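Whether a single transaction obeys the protocol can be checked by scanning its operations once: after the first unlock, no further lock request may appear. The sketch below assumes operations are written as strings such as "read_lock(Y)"; the function name and the two sample transactions are illustrative.

```python
def follows_two_phase_locking(operations):
    """True if every read_lock/write_lock precedes the first unlock."""
    shrinking = False
    for op in operations:
        name = op.split("(")[0]
        if name == "unlock":
            shrinking = True          # the growing phase is over
        elif name in ("read_lock", "write_lock") and shrinking:
            return False              # a lock requested in the shrinking phase
    return True


not_2pl = ["read_lock(Y)", "read_item(Y)", "unlock(Y)",
           "write_lock(X)", "read_item(X)", "write_item(X)", "unlock(X)"]
is_2pl = ["read_lock(Y)", "read_item(Y)", "write_lock(X)", "unlock(Y)",
          "read_item(X)", "write_item(X)", "unlock(X)"]

print(follows_two_phase_locking(not_2pl))
print(follows_two_phase_locking(is_2pl))
```

The first transaction releases Y before locking X and so violates the protocol; moving write_lock(X) ahead of unlock(Y) gives the second, two-phase version.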
Basic, Conservative, Strict, and Rigorous Two-Phase Locking: There are a number
of variations of two-phase locking (2PL). The technique just described is known as basic
2PL. A variation known as conservative 2PL (or static 2PL) requires a transaction to
lock all the items it accesses before the transaction begins execution, by predeclaring
its read-set and write-set.
The read-set of a transaction is the set of all items that the transaction reads, and the
write-set is the set of all items that it writes. If any of the predeclared items needed
cannot be locked, the transaction does not lock any item; instead, it waits until all the
items are available for locking.
In strict 2PL, a transaction T does not release any of its exclusive (write) locks until
after it commits or aborts, which guarantees strict schedules.
A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict
schedules. In this variation, a transaction T does not release any of its locks (exclusive
or shared) until after it commits or aborts, and so it is easier to implement than strict
2PL. Notice the difference between conservative and rigorous 2PL: the former must lock
all its items before it starts, so once the transaction starts it is in its shrinking phase,
whereas the latter does not unlock any of its items until after it terminates (by
committing or aborting), so the transaction is in its expanding phase until it ends.
Dealing with Deadlock and Starvation
Deadlock occurs when each transaction T in a set of two or more transactions is waiting
for some item that is locked by some other transaction T' in the set. Hence, each
transaction in the set is on a waiting queue, waiting for one of the other transactions in
the set to release the lock on an item.
Deadlock Prevention Protocols: One way to prevent deadlock is to use a deadlock
prevention protocol. One deadlock prevention protocol, which is used in conservative
two-phase locking, requires that every transaction lock all the items it needs in advance
(which is generally not a practical assumption); if any of the items cannot be obtained,
none of the items are locked. Rather, the transaction waits and then tries again to lock
all the items it needs. This solution obviously further limits concurrency.
A number of other deadlock prevention schemes have been proposed that make a
decision about what to do with a transaction involved in a possible deadlock situation:
Should it be blocked and made to wait or should it be aborted, or should the transaction
preempt and abort another transaction?
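The rules followed by these schemes are as follows, where transaction Ti tries to lock an
item X that is already locked by another transaction Tj with a conflicting lock, and
TS(T) denotes the timestamp of transaction T (older transactions have smaller timestamps):
• Wait-die: If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait; otherwise (Ti
younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp.
• Wound-wait: If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds Tj) and
restart it later with the same timestamp; otherwise (Ti younger than Tj) Ti is allowed to wait.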
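The two rules reduce to a single timestamp comparison between the transaction requesting the lock and the transaction holding it. In the sketch below the requester plays the role of Ti and the holder the role of Tj; the function names and the returned strings are illustrative.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits, a younger requester is aborted."""
    if ts_requester < ts_holder:        # requester is older
        return "requester waits"
    return "abort requester"            # requester dies, restarts with the same timestamp


def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester aborts the holder, a younger requester waits."""
    if ts_requester < ts_holder:        # requester is older
        return "abort holder"           # holder is wounded, restarts with the same timestamp
    return "requester waits"


# Requester with timestamp 5 is older than the holder with timestamp 9.
print(wait_die(5, 9), "|", wound_wait(5, 9))
# Requester with timestamp 9 is younger than the holder with timestamp 5.
print(wait_die(9, 5), "|", wound_wait(9, 5))
```

In both schemes it is always the younger of the two transactions that gets aborted, so no cycle of waiting transactions can form.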
A second, more practical approach to dealing with deadlock is deadlock detection,
where the system checks if a state of deadlock actually exists. This solution is attractive
if we know there will be little interference among the transactions, that is, if different
transactions will rarely access the same items at the same time. A simple scheme to
deal with deadlock is the use of timeouts. This method is practical because of its low
overhead and simplicity. In this method, if a transaction waits for a period longer than a
system-defined timeout period, the system assumes that the transaction may be
deadlocked and aborts it, regardless of whether a deadlock actually exists or not.
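One common way to check whether a deadlock actually exists is to maintain a wait-for graph, with an edge from T to T' whenever T is waiting for an item locked by T', and to look for a cycle in it. The notes do not describe this structure, so the sketch below should be read as an assumed illustration of deadlock detection; the function name and the dictionary representation are hypothetical.

```python
def has_deadlock(waits_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns it waits for}."""
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited, on the current path, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in waits_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:           # reached a transaction on the current path
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in list(waits_for))


print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))    # T1 and T2 wait for each other
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}}))    # a chain of waits, no cycle
```

When a cycle is found, one of the transactions on it is chosen as a victim and aborted; the timeout scheme described above avoids building the graph at the price of occasionally aborting transactions that were not deadlocked.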
Starvation: Another problem that may occur when we use locking is starvation, which
occurs when a transaction cannot proceed for an indefinite period of time while other
transactions in the system continue normally. This may occur if the waiting scheme for
locked items is unfair, giving priority to some transactions over others. One solution for
starvation is to have a fair waiting scheme, such as using a first-come-first-served
queue; transactions are enabled to lock an item in the order in which they originally
requested the lock.
CONCURRENCY CONTROL BASED ON TIMESTAMP ORDERING
A timestamp is a unique identifier created by the DBMS to identify a transaction.
Typically, timestamp values are assigned in the order in which the transactions are
submitted to the system, so a timestamp can be thought of as the transaction start time.
We will refer to the timestamp of transaction T as TS(T).
Concurrency control techniques based on timestamp ordering do not use locks; hence,
deadlocks cannot occur. The idea for this scheme is to order the transactions based on
their timestamps. A schedule in which the transactions participate is then serializable,
and the equivalent serial schedule has the transactions in order of their timestamp
values. This is called timestamp ordering (TO). The timestamp algorithm must ensure
that, for each item accessed by conflicting operations in the schedule, the order in which
the item is accessed does not violate the serializability order.
To do this, the algorithm associates with each database item X two timestamp (TS)
values:
1. Read_TS(X): The read timestamp of item X; this is the largest timestamp among all
the timestamps of transactions that have successfully read item X, that is, read_TS(X) =
TS(T), where T is the youngest transaction that has read X successfully.
2. Write_TS(X): The write timestamp of item X; this is the largest of all the timestamps of
transactions that have successfully written item X, that is, write_TS(X) = TS(T), where T
is the youngest transaction that has written X successfully.
Whenever some transaction T tries to issue a read_item(X) or a write_item(X)
operation, the basic TO algorithm compares the timestamp of T with read_TS(X) and
write_TS(X) to ensure that the timestamp order of transaction execution is not violated.
If this order is violated, then transaction T is aborted and resubmitted to the system as a
new transaction with a new timestamp. If T is aborted and rolled back, any transaction
T1 that may have used a value written by T must also be rolled back. Similarly, any
transaction T2 that may have used a value written by T1 must also be rolled back, and
so on. This effect is known as cascading rollback and is one of the problems associated
with basic TO, since the schedules produced are not guaranteed to be recoverable.
We first describe the basic TO algorithm here. The concurrency control algorithm must
check whether conflicting operations violate the timestamp ordering in the following two
cases:
1. Transaction T issues a write_item(X) operation:
a) If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T and reject
the operation. This should be done because some younger transaction with a
timestamp greater than TS(T), and hence after T in the timestamp ordering, has already
read or written the value of item X before T had a chance to write X, thus violating the
timestamp ordering.
b) If the condition in part (a) does not occur, then execute the write_item(X) operation
of T and set write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
a) If write_TS(X) > TS(T), then abort and roll back T and reject the operation. This
should be done because some younger transaction with timestamp greater than TS(T),
and hence after T in the timestamp ordering, has already written the value of item X
before T had a chance to read X.
b) If write_TS(X) ≤ TS(T), then execute the read_item(X) operation of T and set
read_TS(X) to the larger of TS(T) and the current read_TS(X).
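The two cases above can be sketched in Python as follows. This is a minimal sketch, not a full scheduler: it assumes transaction timestamps are positive integers and that an item never accessed before has read_TS and write_TS equal to 0. The class name and its methods are hypothetical, and a return value of False stands for "abort and roll back T".

```python
class TimestampOrdering:
    """Bookkeeping for the basic TO checks (illustrative only)."""

    def __init__(self):
        self.read_ts = {}     # item -> largest timestamp that has read it
        self.write_ts = {}    # item -> largest timestamp that has written it

    def write_item(self, ts, item):
        """Return True if the write is executed, False if T must abort."""
        if self.read_ts.get(item, 0) > ts or self.write_ts.get(item, 0) > ts:
            return False                      # case 1(a): a younger txn got there first
        self.write_ts[item] = ts              # case 1(b)
        return True

    def read_item(self, ts, item):
        """Return True if the read is executed, False if T must abort."""
        if self.write_ts.get(item, 0) > ts:
            return False                      # case 2(a): a younger txn already wrote X
        # case 2(b): keep the larger of TS(T) and the current read_TS(X)
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True


to = TimestampOrdering()
first_read = to.read_item(5, "X")     # allowed, read_TS(X) becomes 5
late_write = to.write_item(3, "X")    # rejected, since read_TS(X) = 5 > 3
```

Here the transaction with timestamp 3 would be aborted and resubmitted with a new timestamp, as described above.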
DATABASE RECOVERY
Database recovery is the process of restoring the database to a correct state following
a failure. The failure may be the result of a system crash due to hardware or software
errors, a media failure such as a head crash, or a software error in the application, such
as a logical error in the program that is accessing the database. It may also be the
result of unintentional or intentional corruption or destruction of data. Whatever the
underlying cause of the failure, the DBMS must be able to recover from the failure and
restore the database to a consistent state. It is the responsibility of the DBMS to ensure that
the database is reliable and remains in a consistent state in the presence of failures. In
general, backup and recovery refers to the various strategies and procedures involved in
protecting your database against data loss and reconstructing the data such that no
data is lost after a failure.
Thus, the recovery scheme is an integral part of the database system that is responsible
for the restoration of the database to a consistent state that existed prior to the
occurrence of the failure.
Data Storage
The storage of data generally includes four different types of media with an increasing
degree of reliability. These are:
• Main memory
• Magnetic disk
• Magnetic tape
• Optical disk
Stable Storage
Stable storage is storage in which information is never lost. True stable storage is
theoretically impossible to obtain, but we can use techniques to design a
storage system in which the chances of data loss are extremely low. The most
important information needed for the whole recovery process must be stored in stable
storage.
Implementation of Stable Storage
• RAID (Redundant Array of Independent Disks) is commonly used for the implementation
of stable storage. In RAID, information is replicated on several disks in the form of
an array, and each disk has independent failure modes. So the failure of a single
disk does not result in loss of data. A RAID system, however, cannot prevent data loss in
case of disasters like fires or flooding.
• To protect against disasters like fire or flooding, the system can store archival
backups on tapes that are sent off site. However, since tapes cannot be sent off site
continuously, some data may still be lost in such disasters.
• To remove the above problem, more secure systems keep a copy of each block of
stable storage at a remote site, writing it out over a computer network in addition to
storing the block on a local disk.
Causes of failures
• System Crashes
• User Error
• Carelessness
• Sabotage (intentional corruption of data)
• Statement Failure
• Application software errors
• Network Failure
• Media Failure
• Natural Physical Disasters
Recovery from failures
• Actions taken during normal transaction processing to ensure enough
information exists to allow recovery from failure.
• Actions taken following a failure to recover the database to a state that is known
to be correct.
Recovery and Atomicity of a Transaction
In order to understand the concept of atomicity of a transaction, consider again a
simplified banking transaction that transfers Rs 50 from account A to account B, with
initial values of A and B being Rs 1000 and Rs 2000 respectively.
T1
Read(A,a)
a=a-50
Write(A,a)
Output(AX)
Read(B,b)
b=b+50
Write(B,b)
Output(BX)
In this example we suppose that an output operation is performed after each write
operation; in other words, this is an example of a force-output operation.
Suppose that a system crash occurs during the execution of transaction T1 after
Output(AX) has taken place but before Output(BX) was executed. Here AX and BX are
the buffer blocks that contain the values of data items A and B. Since the Output(AX)
operation was performed successfully, data item A has the value 950 on disk;
Output(BX) was not performed, so the value of B remains 2000. The
system is now in an inconsistent state because Rs 50 has been lost by the
execution of transaction T1.
To recover from the above problem there are two simple options:
(i) Re-execute T1
If transaction T1 is re-executed successfully, then the value of A is Rs 900 and B is Rs 2050.
This is again incorrect, because the value of A is Rs 900 rather than Rs 950.
(ii) Do not re-execute T1
The current state of the system has A = 950 and B = 2000, which is again an inconsistent state.
Thus, in either case the database is left in an inconsistent state, so this simple recovery
scheme does not work. The reason for this difficulty is that we have modified the
database without having assurance that the transaction will indeed commit or
complete.
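The arithmetic of this scenario can be replayed with a small Python sketch. It is an illustration only: the dictionary `disk` stands in for the values stored on disk, and the function name and the `crash_after_output_a` flag are hypothetical names introduced for this example.

```python
def transfer(disk, amount, crash_after_output_a=False):
    """Run T1 with a forced output after each write; optionally crash midway."""
    a = disk["A"]                 # Read(A, a)
    a = a - amount
    disk["A"] = a                 # Write(A, a) followed by Output(AX)
    if crash_after_output_a:
        return                    # crash: Read(B, b) ... Output(BX) never happen
    b = disk["B"]                 # Read(B, b)
    b = b + amount
    disk["B"] = b                 # Write(B, b) followed by Output(BX)


disk = {"A": 1000, "B": 2000}
transfer(disk, 50, crash_after_output_a=True)
after_crash = dict(disk)          # option (ii): A = 950, B = 2000

transfer(disk, 50)                # option (i): re-execute T1 from the start
after_reexecution = dict(disk)    # A = 900, B = 2050
```

In both outcomes the total A + B is 2950 instead of 3000, which is the missing Rs 50 described above.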
We have to perform either all or none of the database modifications made by Ti. This idea gives
the concept of atomicity of a transaction: either 100% or 0% of the database
modifications should be performed by the transaction. If the transaction does not complete
successfully, so that some of its modifications are stored in the database and
others are not, the database becomes inconsistent. Thus, in order to preserve the
consistency of the database, a transaction should have the property of atomicity. To achieve
our goal of atomicity, we must first output information describing the modifications to
stable storage without modifying the database itself. There are two ways to perform such
outputs. These are:
• Log Based Recovery
• Shadow Paging
Log Based Recovery
It is the most widely used structure for recording database modifications. In log based
recovery a log file is maintained for recovery purposes. The log file is a sequence of log
records. Log records maintain a record of all the update operations on the database.
There are several types of log record.
<Start> Log Record:
Contains information about the start of each transaction. It holds the transaction identifier,
which is the unique identification of the transaction that starts.
Representation: <Ti, start>
<Update> Log Record:
It describes a single database write and has the following fields:
<Ti, Xj, V1, V2>
Here, Ti is the transaction identifier, Xj is the data item, V1 is the old value of the data item and
V2 is the modified or new value of the data item Xj.
For example: <T0, A, 1000, 1100>
Here, transaction T0 performs a successful write operation on data item
A, whose old value is 1000; after the write operation A has the value 1100.
<Commit> Log Record
When a transaction Ti is successfully committed or completed, a <Ti, commit> log record
is stored in the log file.
<Abort> Log Record
When a transaction Ti is aborted for any reason, a <Ti, abort> log record is stored in
the log file. Whenever a transaction performs a write on A, it is essential that the log
record for that write be created before the database is modified. Once a log record
exists, we have the ability to undo a transaction.
Example: Suppose that a system crash occurs during the execution of transaction Ti
after Output(AX) has taken place but before Output(BX) was executed. Here AX
and BX are the buffer blocks that contain the values of data items A and B. Since the
Output(AX) operation was performed successfully, data item A has the value 950 on disk;
Output(BX) was not performed, so the value of B remains 2000. The
system is now in an inconsistent state because Rs 50 has been lost by the
execution of transaction Ti. The log records then play a very important role in
recovering the data.
We simply copy the old value of A, i.e. 1000, from the log record to the database stored on
disk. The operation of restoring the old value of a data item from the log record is called the
undo operation. Since the log plays a vital role in the recovery of the database,
the log must reside in stable storage. Here, we assume that every log record
is written to stable storage as soon as it is created.
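A hedged Python sketch of this undo step is shown below. It assumes the log is a list of tuples, where an update record has the four fields (Ti, Xj, old value, new value) described earlier and a start record has two fields. The function name `undo` and the dictionary `disk` are hypothetical stand-ins for the recovery routine and the database on disk.

```python
def undo(log, disk, txn):
    """Restore the old value of every item updated by txn, newest record first."""
    for record in reversed(log):
        if record[0] == txn and len(record) == 4:    # update record of this txn
            _, item, old_value, _new_value = record
            disk[item] = old_value


# State after the crash in the example: A was output as 950, B is still 2000.
log = [("T1", "start"), ("T1", "A", 1000, 950)]
disk = {"A": 950, "B": 2000}
undo(log, disk, "T1")
```

Scanning the log backwards matters when a transaction has written the same item more than once: the last value restored is then the oldest one.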
There are two techniques for log-based recovery:
• Deferred Database Modification
• Immediate Database Modification
Deferred Database Modification
It ensures transaction atomicity by recording all database modifications in the log, but
deferring the execution of all write operations of a transaction until the transaction
partially commits. A transaction is said to be partially committed once the final action of
the transaction has been executed. When a transaction has performed all its actions,
the information in the log associated with the transaction is used in executing the
deferred writes. In other words, at partial commit time the logged updates are “replayed”
into the database items. This means that during a write operation the modified values
of local variables are not copied into the database items; instead, the corresponding new
values are stored in the log records. When the transaction has successfully performed all
its operations, the information stored in the log records is used to set the values of the data
items. If the transaction fails to complete, the data items retain their old
values, and in that case the information in the log is simply ignored.
The execution of transaction Ti proceeds as follows:
<Ti, Start>: Before Ti starts its execution, this record is written to the log file.
<Ti, A, V2>: A write operation by Ti results in the writing of a new record to the log.
This record indicates the new value of A, i.e. V2, after the write operation performed by
Ti.
<Ti, Commit>: When Ti partially commits, this record is written to the log.
It is important to note that only the new value of the data item is required
by the deferred modification technique. If the transaction fails
before it commits, its log records are simply ignored, because the data items
still contain their old values due to the deferred writes. If the transaction commits,
the new values present in the log records are copied to the database items. Thus the log
records contain only the new values of data items, not the old values. This
approach saves stable storage space and reduces the complexity of the log records.
Example
We reconsider our simplified banking system. Let T1 be a transaction that transfers Rs
100 from account A to account B. This transaction is defined as follows:
T1
Read(A,a)
a=a-100
Write(A,a)
Read(B,b)
b=b+100
Write(B,b)
Recovery procedure
The recovery procedure of deferred database modification is based on the redo operation,
as explained below:
Redo(Ti)
It sets the value of all data items updated by transaction Ti to the new values from the
log records. After a failure has occurred, the recovery subsystem consults the log to
determine which transactions need to be redone. Transaction Ti needs to be redone if and
only if the log contains both the record <Ti, start> and the record <Ti, commit>. Thus, if
the system crashes after the transaction completes its execution, the information in
the log is used in restoring the system to a previous consistent state.
Conclusion
Redo: If the log contains both records <Ti, start> and
<Ti, commit>.
The updates of such a transaction may or may not have been physically stored on disk, so
use the redo operation to obtain the modified values from the log records.
Re-execute: In all other cases ignore the log records and re-execute
the transaction.
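The redo rule for deferred modification can be sketched in Python as below. This is a minimal illustration under stated assumptions: the log is a list of tuples in which start and commit records have two fields and update records have three fields (Ti, item, new value), matching the <Ti, A, V2> format above. The function name `recover_deferred` and the sample data are hypothetical.

```python
def recover_deferred(log, disk):
    """Redo every transaction that has both a start and a commit record."""
    started = {r[0] for r in log if len(r) == 2 and r[1] == "start"}
    committed = {r[0] for r in log if len(r) == 2 and r[1] == "commit"}
    to_redo = started & committed
    for record in log:                        # forward scan keeps the write order
        if len(record) == 3 and record[0] in to_redo:
            _, item, new_value = record
            disk[item] = new_value            # Redo(Ti): install the new value
    return to_redo


log = [
    ("T1", "start"), ("T1", "A", 900), ("T1", "B", 2100), ("T1", "commit"),
    ("T2", "start"), ("T2", "C", 600),        # T2 never committed
]
disk = {"A": 1000, "B": 2000, "C": 700}
redone = recover_deferred(log, disk)
```

T2 has no commit record, so its log records are ignored and C keeps its old value; T2 would simply be re-executed later.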
Immediate Database Modification
The immediate database modification technique allows database modifications to be
output to the database while the transaction is still in the active state. Data
modifications written by active transactions are called “uncommitted modifications”.
If the system crashes or the transaction aborts, the old value field of the log records is
used to restore the modified data items to the values they had prior to the start of the
transaction. This restoration is accomplished through the undo operation. In order to
understand undo operations, let us consider the format of the log record:
<Ti, Xj, Vold, Vnew>
Here, Ti is the transaction identifier, Xj is the data item, Vold is the old value of the data item
and Vnew is the modified or new value of the data item Xj.
Checkpoints
To recover the database after a failure we must consult the log to determine which
transactions need to be undone and which redone. In principle we need to search the
entire log to determine this information. There are two major problems with this approach.
These are:
• The search process is time-consuming.
• Most of the transactions that need to be redone have already written their
updates into the database. Although redoing them will cause no harm, it will make
the recovery process more time-consuming.
To reduce these types of overhead, we introduce checkpoints, which the system performs
periodically.
Actions Performed During Checkpoints
• Output onto stable storage all log records currently residing in main memory.
• Output to the disk all modified buffer blocks.
• Output onto stable storage a log record <checkpoint>.
The presence of a <checkpoint> record makes the recovery process more
streamlined. Consider a transaction Ti that committed prior to the checkpoint, meaning that
<Ti, Commit> appears in the log before the <checkpoint> record. Any database
modifications made by Ti must have been written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to
perform a redo operation on Ti.
The checkpoint record gives a list of all transactions that were in progress at the time
the checkpoint was taken. Thus, the checkpoints help the system determine,
at restart time, which transactions to undo and which to redo.
• A system failure has occurred at time tf.
• The most recent checkpoint prior to time tf was taken at time tc.
• Transactions of type T1 completed (successfully) prior to time tc.
• Transactions of type T2 started prior to time tc and completed (successfully) after time
tc and before time tf.
• Transactions of type T3 also started prior to time tc but did not complete by time tf.
• Transactions of type T4 started after time tc and completed (successfully) before time
tf.
• Finally, transactions of type T5 also started after time tc, but did not complete by time
tf.
It should be clear that, with the immediate modification technique, those transactions
that have both <Ti, start> and <Ti, commit> records must be redone, and those transactions
that have only <Ti, start> and no <Ti, commit> must be undone.
Thus, when the system is restarted in the case of immediate database modification,
transactions of types T3 and T5 must be undone, and transactions of types T2 and T4
must be redone. Note, however, that transactions of type T1 do not enter the restart
process at all, because their updates were forced to the database at time tc as part of
the checkpoint process.
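The classification of the five transaction types can be sketched in Python as follows. This is an illustrative sketch only: it assumes each transaction is described by a (start time, commit time) pair, with a commit time of None meaning it had not completed at the failure time tf. The function name `classify` and the sample times are hypothetical.

```python
def classify(transactions, tc):
    """Split transactions into redo and undo lists relative to checkpoint time tc."""
    redo, undo = [], []
    for name, (_start, commit) in transactions.items():
        if commit is None:
            undo.append(name)     # not complete at failure: types T3 and T5
        elif commit > tc:
            redo.append(name)     # committed after the checkpoint: types T2 and T4
        # committed before tc (type T1): already on disk, nothing to do
    return redo, undo


# Checkpoint at tc = 10; the failure happens later, at tf = 20.
transactions = {
    "T1": (1, 5),
    "T2": (6, 15),
    "T3": (8, None),
    "T4": (12, 18),
    "T5": (14, None),
}
redo_list, undo_list = classify(transactions, tc=10)
```

Note that the start time plays no role in the decision; only the commit time relative to tc and tf matters.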
We assumed earlier that every log record is output to stable storage at the time it is
created. This assumption imposes a high overhead on system execution for the
following reasons:
• Output to stable storage is performed in units of blocks. In most cases, a log record is
much smaller than a block. Thus, the output of each log record translates to a much
larger output at the physical level.
• The cost of performing the output of a block to storage is sufficiently high that it is
desirable to output multiple log records at once.
To do so, we write log records to a log buffer in main memory, where they stay
temporarily until they are output to stable storage. Multiple log records can be gathered
in the log buffer and output to stable storage in a single output operation. The order of
log records in the stable storage must be exactly the same as the order in which they
were written to the log buffer.
Due to the use of log buffering, a log record may reside only in main memory (volatile
storage) for a considerable time before it is output to stable storage. Since such log
records are lost if the system crashes, we must impose additional requirements on the
recovery techniques to ensure transaction atomicity.
• Transaction Ti enters the commit state after the <Ti commit> log record has been
output to stable storage.
• Before the <Ti commit> log record can be output to stable storage, all log records
pertaining to transaction Ti must have been output to stable storage.
• Before a block of data in main memory can be output to the database (in nonvolatile
storage), all log records pertaining to data in that block must have been output to stable
storage.
The latter rule is called the write-ahead logging (WAL) rule.
The Write-Ahead Log Protocol
Before writing a page to disk, every update log record that describes a change to this
page must be forced to stable storage. This is accomplished by forcing all log records to
stable storage before writing the page to disk.
WAL is the fundamental rule that ensures that a record of every change to the database
is available while attempting to recover from a crash. If a transaction has made a change
and committed, the no-force approach means that some of these changes may not
have been written to disk at the time of a subsequent crash. Without a record of these
changes, there would be no way to ensure that the changes of a committed transaction
survive crashes. Note that the definition of a committed transaction is effectively “a
transaction whose log records, including a commit record, have all been written to
stable storage”. When a transaction is committed, the log detail is forced to stable
storage, even if a no-force approach is being used.
Example
The database is stored in nonvolatile storage (disk) and blocks of data are brought into
main memory as needed. Since main memory is typically much smaller than the entire
database, it may be necessary to overwrite a block B1 in main memory when another
block B2 needs to be brought into memory. If B1 has been modified, B1 must be output
prior to the input of B2. The rules for the output of log records limit the freedom of the
system to output blocks of data. If the input of block B2 causes block B1 to be chosen
for output, all log records pertaining to data in B1 must be output to stable storage
before B1 is output.
Thus, the sequence of actions by the system would be as follows:
• Output log records to stable storage until all log records pertaining to block B1 have
been output.
• Output block B1 to disk.
• Input block B2 from disk to main memory.
It is important that no writes to the block B1 be in progress while the
preceding sequence of actions is carried out.
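The first two steps of this sequence, which enforce the WAL rule, can be sketched in Python as follows. This is a simplified sketch under stated assumptions: the log buffer, the stable log and the disk are plain in-memory structures, each buffered log record is tagged with the block it describes, and the class and method names are hypothetical. The `events` list exists only so that the order of outputs can be inspected.

```python
class WalBuffer:
    """Toy buffer manager that outputs log records before the block they describe."""

    def __init__(self):
        self.log_buffer = []     # (block_id, record) pairs still in main memory
        self.stable_log = []     # log records already on stable storage
        self.disk = {}           # block_id -> contents on disk
        self.events = []         # order of physical outputs

    def log_update(self, block_id, record):
        self.log_buffer.append((block_id, record))

    def flush_log_for(self, block_id):
        """Output buffered records in order until none remain for block_id."""
        while any(b == block_id for b, _ in self.log_buffer):
            entry = self.log_buffer.pop(0)         # oldest record first
            self.stable_log.append(entry)
            self.events.append(("log", entry[1]))

    def output_block(self, block_id, contents):
        self.flush_log_for(block_id)               # WAL rule: log goes out first
        self.disk[block_id] = contents
        self.events.append(("block", block_id))


wal = WalBuffer()
wal.log_update("B1", "r1")
wal.log_update("B2", "r2")
wal.log_update("B1", "r3")
wal.output_block("B1", "new contents of B1")
```

Because log records must reach stable storage in the order they were written to the buffer, the record r2 for block B2 is output as well, even though only B1 is being replaced.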
Failure with Loss of Nonvolatile Storage
Until now, we have considered only the case where a failure results in the loss of
information residing in volatile storage while the content of the nonvolatile storage
remains intact. Although failures in which the content of nonvolatile storage is lost are
rare, we nevertheless need to be prepared to deal with this type of failure. In this section,
we discuss only disk storage.
The basic scheme is to dump the entire content of the database to stable storage
periodically, say once per day. For example, we may dump the database to one or more
magnetic tapes. If a failure occurs that results in the loss of physical database blocks,
the most recent dump is used in restoring the database to a previous consistent state.
Once this restoration has been accomplished, the system uses the log to bring the
database system to the most recent consistent state.
Module-III
Short Questions
2008
1. What are the two techniques to prevent deadlock? [2][Concurrency control]
2. What is meant by Concurrency? [2][Concurrency control]
2009
1. What do you mean by atomicity? Explain with example. [2][Transaction processing]
2. What is meant by two phase commit protocol? [2][Concurrency control]
3. Explain one of the pessimistic concurrency control schemes with example. [2][Concurrency control]
4. What is the difference between REDO and UNDO operation? [2][Database recovery]
5. What is meant by concurrency? [2][Concurrency control]
6. What do you mean by serializability in transaction processing? [2][Concurrency control]
2010
1. What do you mean by ACID properties of a transaction? [2][Transaction processing]
2. For the following operations:
T1: read(x);
x=x+10;
write(x);
T2: read(x);
read(x);
For simultaneous execution, which problem can happen? [2][Transaction processing]
3. What is a timestamp? If TS(Ti) > TS(Tj), then which transaction is younger? Justify. Consider TS(Ti)
is the timestamp of transaction Ti. [2][Transaction processing]
2011
1. What is serial schedule? [2][Transaction processing]
2. What is the difference between concurrency control and crash control? [2][Concurrency control]
2012
1. What is transaction atomicity? [2][Transaction processing]
2. What is two phase locking? [2][Concurrency control]
Long Questions
2008
1. Explain the difference between implicit and explicit locks. Give examples to support your answer.
[5][Concurrency control]
2. Define ‘‘Data mining’’. What support must be available in a DBMS to facilitate data mining?
[5][Data mining]
2009
1. Define transaction. Explain different states of a transaction. Differentiate between chained transaction
and nested transaction. Discuss their advantages and disadvantages. [5][Data mining]
2. What do you mean by concurrent operations? List two disadvantages of it. Discuss the solutions for
the problems that occur due to concurrency. [5][Concurrency control]
3. Explain the ACID properties associated with database transactions. What is the lost update problem?
[5][Transaction processing]
4. Write short notes on any two:
a. OLAP and OLTP
b. Exclusive Lock vs. Shared Lock
c. Dead Lock [5][Transaction processing]
2010
1. What is collision? Discuss the various collision resolution techniques. [5][Database recovery]
2. What do you mean by Locking? Explain the Two phase locking with an example.
[5][Concurrency control]
3. Write short notes on : a. Double Buffering b. Parallel Databases [5][Database recovery]
2011
1. Give an account of various types of database failure and methods to recover data. Explain with
examples.
[5][Database recovery]
2. Give an account of various locks under concurrency control. [5][Concurrency control]
3. Define transaction. Explain ACID properties with example. [5][Transaction processing]
4. Define ‘‘Data mining’’. What support must be available in a DBMS to facilitate data mining?
[5][Transaction processing]
2012
1. Differentiate between the following terms:
i. Locking and Timestamp based Schedulers. [5][Concurrency control]
2. What do you mean by a transaction? How can you implement atomicity in transactions? Explain.
[5][Transaction processing]
3. Describe the concept of serializability with a suitable example. [5][Transaction processing]
4. Explain Armstrong’s axioms with ACID properties. State their usefulness. [5][Transaction processing]
5. State the types of database failure and explain the corresponding database recovery
technique. [5][Database recovery]
Fourth semester-2009
Relational Database system
1.a. What do you mean by atomicity? Explain with example.
Ans:
Atomicity: Either all operations of the transaction are reflected properly in the database, or none are.
Example:
Ti:
Read (A)
A=A-50
Write (A)
Read (B)
B=B+50
Write (B)
Whenever Ti is executed, it should be executed either fully or not at all. Suppose the system fails at
a point when write(A) has been performed but write(B) is yet to be performed. The value of A+B, as
reflected in the database, would then be short by Rs. 50, thus violating the consistency requirement. The
solution is to roll back the transaction; the old consistent state is restored by
reverting A to its old value, which is retrieved from the log file.
1.b. What is the difference between a multivalued attribute and a derived attribute?
Multivalued attribute:
An attribute that can have more than one value is a multivalued attribute.
e.g.: phone-no, dependent name, vehicle
Derived attribute:
The values for this type of attribute can be derived from the values of existing attributes.
e.g.:
age, which can be derived from (currentdate-birthdate)
experience_in_year, which can be calculated as (currentdate-joindate)
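As a small illustration of a derived attribute, the Python sketch below computes age from a stored birthdate instead of storing the age itself. The function name `age_in_years` is hypothetical, and the choice to count completed years is an assumption made for this example.

```python
from datetime import date


def age_in_years(birthdate, today):
    """Derive age in completed years from the stored birthdate."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


age = age_in_years(date(2000, 6, 15), date(2024, 6, 14))
```

The same pattern applies to experience_in_year, with joindate in place of birthdate.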
1.c. What is the difference between a procedural and a non-procedural language?
Ans:
In a DBMS, a procedural language specifies how the output of a query must be obtained.
Example:
Relational algebra
In a DBMS, a non-procedural language specifies what output is to be obtained.
Example:
SQL, QBE
1.d. What is generalization? How does it differ from specialization?
Ans:
Generalization is the abstracting process of viewing sets of objects as a single general class by
concentrating on the general characteristics of the constituent sets while suppressing or ignoring their
differences.
Specialization is the abstracting process of introducing new characteristics to an existing class of objects
to create one or more new classes of objects. This involves taking a higher-level entity and, using additional
characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics
of the higher-level entity.
1.e. What is meant by two phase commit?
Ans:
Consider a transaction T initiated at site Si, where the transaction coordinator is Ci.
When T completes its execution, that is, when all the sites at which T has executed inform Ci that T has
completed, Ci starts the 2PC protocol.
• Phase 1. Ci adds the record <prepare T> to the log, and forces the log onto stable storage. It then sends a
prepare T message to all sites at which T executed. On receiving such a message, the transaction manager at that
site determines whether it is willing to commit its portion of T. If the answer is no, it adds a record <no T> to the
log, and then responds by sending an abort T message to Ci. If the answer is yes, it adds a record <ready T> to the
log, and forces the log (with all the log records corresponding to T) onto stable storage. The transaction manager
then replies with a ready T message to Ci.
• Phase 2. When Ci receives responses to the prepare T message from all the sites, or when a prespecified interval
of time has elapsed since the prepare T message was sent out, Ci can determine whether the transaction T can be
committed or aborted. Transaction T can be committed if Ci received a ready T message from all the participating
sites. Otherwise, transaction T must be aborted. Depending on the verdict, either a record <commit T> or a record
<abort T> is added to the log and the log is forced onto stable storage. At this point, the fate of the transaction has
been sealed.
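The coordinator's verdict in Phase 2 can be sketched in Python as follows. This is a minimal sketch, not a full protocol implementation: it assumes the replies are collected in a dictionary that maps a site name to "ready" or "abort", and that a site missing from the dictionary did not reply within the prespecified interval. The function name and site names are hypothetical.

```python
def coordinator_decision(votes, participants):
    """Return the log record Ci writes: commit only if every site voted ready."""
    if all(votes.get(site) == "ready" for site in participants):
        return "<commit T>"
    return "<abort T>"        # some site voted abort or did not reply in time


sites = ["S1", "S2", "S3"]
all_ready = coordinator_decision({"S1": "ready", "S2": "ready", "S3": "ready"}, sites)
one_abort = coordinator_decision({"S1": "ready", "S2": "abort", "S3": "ready"}, sites)
timed_out = coordinator_decision({"S1": "ready", "S2": "ready"}, sites)
```

Only the first call yields a commit record; a single abort vote or a missing reply is enough to abort T.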
1.f. Explain one of the pessimistic concurrency control schemes with example.
Ans:
Locking and timestamp ordering are pessimistic in that they force a wait or a rollback whenever a
conflict is detected, even though there is a chance that the schedule may be conflict serializable.
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing
locks. Locking protocols restrict the set of possible schedules.
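The problem with releasing each lock immediately can be seen in a small Python sketch. It is illustrative only: the dictionary `db` stands in for the database, the initial values and the transfer of 50 from B to A by another transaction are assumptions made for this example, and the lock calls appear only as comments.

```python
db = {"A": 100, "B": 200}

# T2 reads A under a shared lock and releases the lock immediately.
a_seen = db["A"]             # lock-S(A); read(A); unlock(A)

# Another transaction transfers 50 from B to A in between the two reads.
db["B"] -= 50
db["A"] += 50

# T2 now reads B and displays the sum.
b_seen = db["B"]             # lock-S(B); read(B); unlock(B)
displayed = a_seen + b_seen  # what T2 shows
actual = db["A"] + db["B"]   # the true total, unchanged by the transfer
```

T2 displays 250 although the total was 300 both before and after the transfer, a result that no serial execution of the two transactions could produce.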
1.g. What is the difference between REDO and UNDO operation?
Ans:
Transactions which began either before or after the last checkpoint but were COMMITTED after the
checkpoint, prior to the failure: these transactions need a REDO operation during recovery.
Transactions which began before or after the last checkpoint but were still not committed at the time of
failure: these need an UNDO operation at the time of recovery.
1.h:
What is meant by concurrency?
Ans:
Multiple transactions are allowed to run concurrently in the system.
1.i. What do you mean by serializability in transaction processing?
Ans:
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of
schedule equivalence give rise to the notions of:
o conflict serializability
o view serializability
1.j. Draw the E-R diagram for the following entity sets: Movies(title,year,length,filmtype) and
stars(name,address)
Ans:
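[E-R diagram: entity set movies (attributes title, year, length, filmtype) and entity set stars (attributes name, address), joined by the many-to-many (M:N) relationship acts, labelled with Role name.]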
2. a. List at least four advantages of using a database management system over a traditional file
management system. Are there any disadvantages of DBMS?
Ans:
Advantages of DBMS:
Reduction of redundancies:
Centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces
the total amount of data storage required. Avoiding duplication also eliminates the inconsistencies
that tend to be present in redundant data files.
Sharing of data:
A database allows the sharing of data under its control by any number of application programs or users.
Data Integrity:
Data integrity means that the data contained in the database is both accurate and consistent. Therefore
data values being entered for storage can be checked to ensure that they fall within a specified range
and are of the correct format.
Data Security:
The DBA, who has the ultimate responsibility for the data in the DBMS, can ensure that proper access
procedures are followed, including proper authentication schemes for access to the DBMS and additional
checks before permitting access to sensitive data.
Conflict resolution:
The DBA resolves the conflicting requirements of various users and applications. The DBA chooses the best file
structure and access method to get optimal performance for the application.
Data Independence:
Data independence is usually considered from two points of view: physical data independence and
logical data independence.
Physical data independence allows changes in the physical storage devices or organization of the files
to be made without requiring changes in the conceptual view or any of the external views, and hence in
the application programs using the database.
Logical data independence indicates that the conceptual schema can be changed without affecting the
existing external schemas or any application program.
Disadvantages of DBMS:
1. DBMS software and hardware (networking installation) cost is high.
2. The DBMS adds processing overhead for the implementation of security, integrity and sharing of
the data.
3. Centralized database control.
4. Setup of the database system requires more knowledge, money, skills, and time.
5. The complexity of the database may result in poor performance.
2.b. What is abstraction ? Is it necessary for database system ? Explain how database architecture
satisfies abstraction at various levels
Ans:
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use
complex data structures to represent data in the database. Since many database-systems users are not computer
trained, developers hide the complexity from users through several levels of abstraction, to simplify users’
interactions with the system:
• Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level
describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the database, and what
relationships exist among those data. The logical level thus describes the entire database in terms of a small number
of relatively simple structures. Although implementation of the simple structures at the logical level may involve
complex physical-level structures, the user of the logical level does not need to be aware of this complexity.
Database administrators, who must decide what information to keep in the database, use the logical level of
abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Even though the logical
level uses simpler structures, complexity remains because of the variety of information stored in a large database.
Many users of the database system do not need all this information; instead, they need to access only a part of the
database. The view level of abstraction exists to simplify their interaction with the system. The system may provide
many views for the same database.
3.a Draw the E_R diagram for the following relations:
Course(number,room) – it is a weak entity set.
Depts(name,HOD)
Labcourse(computer allocation)
Theory courses(name,faculty_name)
Ans:
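[E-R diagram: entity sets DEPTS (attributes name, hod), THEORY COURSES (attributes name, faculty name), LAB COURSES (attribute computer allocation) and the weak entity set COURSES (attributes number, room); the relationships are labelled has, handles and has.]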
3.b. Suppose we have the following relation with given data as :
To avoid redundancy in the first proposal we have decomposed the relation into the following
relations:
Student(rollno,student_name) and subject(subject-code,subject-name,marks)
In the second proposal it is decomposed into the following relations:
Student(rollno,student_name) result(rollno,subject_code,marks) and
subject(subject_code,subject_name)
State whether the decomposition are lossless or lossy.
Ans:
Examination (R) (rollno (A), student_name (B), subject_code (C), subject_name (D), marks (E))
R is decomposed into
Student (R1) (rollno (A), student_name (B))
Result (R2) (rollno (A), subject_code (C), marks (E))
Subject (R3) (subject_code (C), subject_name (D))
      A      B      C      D      E
R1    αA     αB     β1C    β1D    β1E
R2    αA     β2B    αC     β2D    αE
R3    β3A    β3B    αC     αD     β3E
      A      B      C      D      E
R1    αA     αB     β1C    β1D    β1E
R2    αA     αB     αC     αD     αE
R3    β3A    β3B    αC     αD     β3E
In the above table we change the entry in the 2nd row, 2nd column to αB because A→B holds (as in R1): when A is αA,
B is αB.
We also change the entry in the 2nd row, 4th column to αD because C→D holds (as in R3): when C is αC,
D is αD.
After updating the table, the row for R2 contains only α symbols.
So the decomposition of R into R1, R2, R3 is lossless.
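The table method used above can be sketched in code. This is a minimal illustration under stated assumptions: attributes are single letters, "a" stands for an α symbol and ("b", i) for the β symbol of row i, and the dependency AC→E (marks depend on roll number and subject code) is assumed in addition to A→B and C→D.

```python
def is_lossless(attrs, decomposition, fds):
    """Table (chase) test: True if some row ends up with only alpha symbols."""
    # One row per decomposed relation.
    rows = [{A: "a" if A in Ri else ("b", i) for A in attrs}
            for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    # Only rows that agree on the whole left-hand side matter.
                    if r1 is r2 or any(r1[A] != r2[A] for A in lhs):
                        continue
                    for B in rhs:
                        if r1[B] != r2[B]:
                            # Make the two symbols equal, preferring alpha.
                            keep, drop = ((r1[B], r2[B]) if r1[B] == "a"
                                          else (r2[B], r1[B]))
                            for r in rows:
                                if r[B] == drop:
                                    r[B] = keep
                            changed = True
    return any(all(v == "a" for v in r.values()) for r in rows)


fds = [("A", "B"), ("C", "D"), ("AC", "E")]
print(is_lossless("ABCDE", ["AB", "ACE", "CD"], fds))   # second proposal
print(is_lossless("ABCDE", ["AB", "CDE"], fds))         # first proposal
```

Under these assumptions the second proposal passes the test, while the first proposal does not, because Student and subject share no attribute and no row can be completed.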
4.a. What is normalization? Is it necessary for RDBMS. Explain 4NF and 5NF with examples
Ans:
Normalisation:
The basic objective of normalization is to reduce redundancy which means that information is to
be stored only once. Storing information several times leads to wastage of storage space and
increase in the total size of the data stored. Relations are normalized so that when relations in a
database are to be altered during the life time of the database, we do not lose information or
introduce inconsistencies. The type of alterations normally needed for relations are:
o Insertion of new data values to a relation. This should be possible without being forced to
leave blank fields for some attributes.
o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital
information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without
exhaustively searching all the tuples in the relation.
4NF:
A relation schema R is said to be in 4NF if and only if every MVD A→→B holding on R satisfies at least one of the
following two conditions:
a. It is a trivial MVD, or
b. A is a superkey of R.
5NF:
A schema R will be in 5NF, if each JD *(R1,R2,…,Rn) holding on R satisfies one of the following two
conditions:
a. It is a trivial JD, i.e., at least one of the projections Ri (1<=i<=n) is equal to R itself.
b. Each of the projections Ri (1<=i<=n) is a superkey of R.
When attributes in a relation have multivalued dependency, further Normalisation to 4NF
and 5NF are required. We will illustrate this with an example. Consider a vendor supplying
many items to many projects in an organisation. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.
Table 8 gives a relation for this problem and figure 10 the dependency diagram(s).
3.b.
Consider the three relations given below:
Order(order_no,order_date,customer_no)
Order_item(order_no,item_no,quantity,bill_amount)
Item(item_no,item_name,unit_price)
State whether the following relations are in 3NF or not. If not, then what steps should you follow to
bring them into 3NF?
Ans:
THIRD NORMAL FORM:
Defn: A relational scheme R<S,F> is in third normal form (3NF) if for all nontrivial functional dependencies
in F+ of the form X→A, either X contains a key (i.e., X is a superkey) or A is a prime attribute (part of some candidate key).
Third normal form normalization is needed when some attributes in a relation are not
functionally dependent only on the key attribute. If two non-key attributes are functionally dependent,
then there will be unnecessary duplication of data. So remove such attributes from the table and form a
new table containing the two non-key attributes which are functionally dependent.
In the order table, {order_no} is the key attribute and the non-key attributes are customer_no and order_date. Here
order_date does not depend on customer_no, so there is no transitive dependency. This table is in 3NF.
In the order_item table the key attributes are {order_no, item_no} and the non-key attributes are quantity and bill_amount. Here
bill_amount does not depend on quantity, so there is no transitive dependency and therefore this table is in
3NF.
In the item table, item_no is the key attribute and the non-key attributes are item_name and unit_price.
unit_price depends on item_name, so there is a transitive dependency and the table is not in 3NF.
Decompose the table as:
R1(item_no, item_name)
R2(item_name, unit_price)
Now these tables are in 3NF because each table has one non-key attribute and no transitive
dependency.
5.a. Define transaction. Explain different states of a transaction. Differentiate between chained
transaction and nested transaction. Discuss their advantages and disadvantages.
Ans:
A transaction is a unit of program execution that accesses and possibly updates various data items.
TRANSACTION STATE:
A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
• Committed, after successful completion
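The states listed above can be sketched as a small state machine. The list itself does not spell out which moves between states are allowed; the transition table below follows the usual textbook state diagram and should be read as an assumption, as should the function name.

```python
# Allowed moves between transaction states (assumed from the standard diagram).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state
}


def advance(state, new_state):
    """Return new_state if the move is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state


state = "active"
for nxt in ("partially committed", "committed"):
    state = advance(state, nxt)
print(state)
```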
Nested transaction:
A transaction that starts inside another transaction is called a nested transaction.
Advantage:
When the parent transaction starts, it also starts its sub-transactions, so it ensures security for both the parent transaction and
the child transactions.
Disadvantage:
It takes more time to execute the parent transaction and its corresponding child transactions.
Chained transaction:
When one transaction starts inside another transaction and the parent transaction waits for the child
transaction to complete, it is called a chained transaction.
Advantage:
When the parent transaction starts, it also starts its sub-transactions, so it ensures security for both the parent transaction and
the child transactions.
Disadvantage:
It takes more time to execute the parent transaction and its corresponding child transactions.
THIRD SEMESTER EXAMINATION-2010
DATABASE SYSTEM
1. a. Explain the concept of data abstraction.
Ans:
Data Abstraction:
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to
use complex data structures to represent data in the database. Since many database-systems users are not
computer trained, developers hide the complexity from users through several levels of abstraction, to
simplify users’ interactions with the system:
1. physical level
2. Logical level
3. View level
b. Define a schema with example.
Ans:
A schema is a logical database description and is drawn as a chart of the types of data that are used.
It gives the names of the entities and attributes and specifies the relationships between them.
A database schema includes such information as:
• Characteristics of data items such as entities and attributes.
• Logical structures and relationships among these data items.
• Format for storage representation.
• Integrity parameters such as physical authorization and backup policies.
Example:
STUDENT(rollno (primary key),name, address)
c. What is meant by OLTP application.
Ans:
An OLTP (online transaction processing) application is one in which the database is updated by inserting,
deleting or modifying records.
Examples:
Railway ticket booking,Online banking,Online shopping
d. What do you mean by unauthorized access of database system ? explain
Ans:
If a user accesses a database without getting permission from the server or the owner of the database,
that is unauthorized access to the database. The server or owner can give a user permission to access the
database by using the grant command with different privileges.
e.
Strong entity:
An entity which has a key attribute is called a strong entity.
Example:
Loan(loanno,amount,loan type)
Weak Entity:
Entity types that do not contain any key attribute, and hence cannot be identified independently, are
called weak entity types. A weak entity can be identified uniquely only by considering some of its
attributes in conjunction with the primary key attribute of another entity, which is called the identifying
owner entity.
Example:
Payment(pslno,pmtamt,pdate)
1.f. Distinguish between physical data independence and logical data independence.
Ans:
• Physical data independence is the ability to modify the physical scheme without making it
necessary to rewrite application programs. Such modifications include changing from unblocked
to blocked record storage, or from sequential to random access files.
• Logical data independence is the ability to modify the conceptual scheme without making it
necessary to rewrite application programs. Such a modification might be adding a field to a
record; an application program’s view hides this change from the program.
g. Define second normal form of relational database system.
Ans:
Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and if all non-prime
attributes are fully functionally dependent on the relation keys.
A relation is said to be in 2NF if it is in 1NF and non-key attributes are functionally dependent on the
key attribute(s). Further, if the key has more than one attribute then no non-key attribute should be
functionally dependent upon a part of the key attributes.
h. Define timestamp ordering in concurrency control.
Ans:
Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction
Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj ).
There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal
to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a
transaction’s timestamp is equal to the value of the counter when the transaction enters the
system.
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj ), then the
system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti
appears before transaction Tj .
To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q)
successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
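How the two timestamp values are used can be sketched as follows. The answer above defines W-timestamp(Q) and R-timestamp(Q) but does not list the read and write tests; the checks below follow the basic timestamp-ordering protocol as usually stated in textbooks, and the class and function names are assumptions made for the example. A return value of False means the requesting transaction would be rolled back.

```python
class Item:
    """A data item Q with its two timestamps, both starting at 0."""

    def __init__(self):
        self.r_ts = 0   # R-timestamp(Q)
        self.w_ts = 0   # W-timestamp(Q)


def read(item, ts):
    """Transaction with timestamp ts issues read(Q)."""
    if ts < item.w_ts:
        return False    # Q was already overwritten by a younger transaction
    item.r_ts = max(item.r_ts, ts)
    return True


def write(item, ts):
    """Transaction with timestamp ts issues write(Q)."""
    if ts < item.r_ts or ts < item.w_ts:
        return False    # a younger transaction already read or wrote Q
    item.w_ts = ts
    return True


q = Item()
print(read(q, 5), write(q, 3), write(q, 7), read(q, 6))
```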
i. Explain the concept of hashing
Ans:
Hashing is simply a mathematical formula that manipulates the key in some form to compute the index
for this key in the hash table.
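A minimal sketch of such a formula is given below. The particular function (a polynomial hash over the characters of the key, reduced modulo the table size) and the multiplier 31 are illustrative choices, not something prescribed by these notes.

```python
def bucket_index(key, table_size):
    """Map a key to an index in the range 0 .. table_size - 1."""
    h = 0
    for ch in str(key):
        # Mix in each character, keeping the value inside the table range.
        h = (h * 31 + ord(ch)) % table_size
    return h


table = [[] for _ in range(10)]          # hash table with 10 buckets
for roll_no in ("S101", "S102", "S203"):
    table[bucket_index(roll_no, 10)].append(roll_no)

# A lookup recomputes the index and searches only that bucket.
print("S102" in table[bucket_index("S102", 10)])
```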
j. Distinguish between XML database and Web database.
Ans:
Web databases are databases which are accessed through the internet via a standard
interface such as CGI (Common Gateway Interface), which acts as middleware. The database stored on the server is a
relational database. Data can be retrieved through the JDBC-ODBC bridge.
XML data can be stored in any of several different ways. For example, XML data can be stored as
strings in a relational database. Alternatively, relations can represent XML data as trees. As
another alternative, XML data can be mapped to relations in the same way that E-R schemas are
mapped to relational schemas. XML data may also be stored in file systems, or in XML-databases,
which use XML as their internal representation .
2.a. What is meant by data models? Describe the categories of data models.
Data model:
The data model describes the structure of a database. It is a collection of conceptual tools for
describing data, data relationships and consistency constraints. The various types of data model are:
1. Object based logical model
2. Record based logical model
3. Physical model
Types of data model:
1. Object based logical model
a. ER-model
b. Functional model
c. Object oriented model
d. Semantic model
2. Record based logical model
a. Hierarchical database model
b. Network model
c. Relational model
3. Physical model
2.b. What is meant by ER models? Draw an ER diagram for a bank database schema where a bank
can have multiple branches, each branch can have multiple accounts and loans.
Ans.
Entity Relationship Model
The entity-relationship data model perceives the real world as consisting of basic objects, called entities
and relationships among these objects. It was developed to facilitate data base design by allowing
specification of an enterprise schema which represents the overall logical structure of a data base.
Main features of ER-MODEL:
• Entity relationship model is a high level conceptual model.
• It allows us to describe the data involved in a real world enterprise in terms of objects and their
relationships.
• It is widely used to develop an initial design of a database.
• It provides a set of useful concepts that make it convenient for a developer to move from a
basic set of information to a detailed description of information that can be easily
implemented in a database system.
• It describes data as a collection of entities, relationships and attributes.
3.a. Define functional dependency. Describe Armstrong’s axioms of functional dependencies.
Ans:
The functional dependency x→y
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[x]=t2[x], it is also the case that t1[y]=t2[y].
CLOSURE OF A SET OF FUNCTIONAL DEPENDENCIES
Given a relational schema R, a functional dependency f on R is logically implied by a set of functional
dependencies F on R if every relation instance r(R) that satisfies F also satisfies f.
The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F.
The closure of F can be found by using a collection of rules called Armstrong's axioms (the first three rules below); the last three rules can be derived from them and are listed for convenience.
Reflexivity rule: If A is a set of attributes and B is a subset of A, then A→B holds.
Augmentation rule: If A→B holds and C is a set of attributes, then CA→CB holds
Transitivity rule: If A→B holds and B→C holds, then A→C holds.
Union rule: If A→B holds and A→C then A→BC holds
Decomposition rule: If A→BC holds, then A→B holds and A→C holds.
Pseudo transitivity rule: If A→B holds and BC→D holds, then AC→D holds.
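In practice, a common way to test whether a dependency belongs to F+ is to compute the closure of an attribute set instead of listing all of F+. The sketch below uses that related technique; single-letter attribute names and the function names are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already covered, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds, i.e. it is in F+."""
    return set(rhs) <= closure(lhs, fds)


print(implies([("A", "B"), ("B", "C")], "A", "C"))      # transitivity
print(implies([("A", "B"), ("BC", "D")], "AC", "D"))    # pseudo transitivity
```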
3.b. Distinguish between equivalence sets of functional dependencies and minimal sets of
functional dependencies.
Ans:
Equivalence sets:
Given two sets F and G of FDs defined over the same relational schema, we say that F and G are
equivalent if and only if F+ = G+. We indicate that F and G are equivalent sets by writing F ≡ G.
Minimal cover:
For a given set F of FDs, a canonical cover (minimal cover), denoted by Fc, is a set of FDs equivalent to F where the
following conditions are simultaneously satisfied.
1. Every FD of Fc is simple. That is, the right hand side of every FD of Fc has only one attribute.
2. Fc is left reduced
3. Fc is non redundant
4. What is meant by normalization of a relational database? Define BCNF. How does it differ from
3NF? Why is it considered a stronger form of 3NF? Explain with suitable examples.
NORMALIZATION
The basic objective of normalization is to reduce redundancy which means that information is to
be stored only once. Storing information several times leads to wastage of storage space and
increase in the total size of the data stored. Relations are normalized so that when relations in a
database are to be altered during the life time of the database, we do not lose information or
introduce inconsistencies. The type of alterations normally needed for relations are:
o Insertion of new data values to a relation. This should be possible without being forced to
leave blank fields for some attributes.
o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital
information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without
exhaustively searching all the tuples in the relation.
BOYCE CODD NORMAL FORM:
Defn: A normalized relation scheme R<S,F> is in Boyce Codd normal form if for every nontrivial FD in F+
of the form X→A, where X is a subset of S and A ∈ S, X is a superkey of R.
How it differs from 3NF:
A relational scheme R<S,F> is in third normal form (3NF) if for all nontrivial functional dependencies in F+
of the form X→A, either X contains a key (i.e., X is a superkey) or A is a prime attribute.
But in BCNF, for every nontrivial FD in F+ of the form X→A where X is a subset of S and A ∈ S, X must be a
superkey of R; the exception for prime attributes is not allowed.
BCNF is stronger than 3NF:
A relation schema in 3NF may still have some anomalies in situations where the schema has
multiple candidate keys, which may be composite and overlapping.
Assume that a relation has more than one possible key. Assume further that the composite keys have a
common attribute. If an attribute of a composite key is dependent on an attribute of the other
composite key, a normalization called BCNF is needed. Consider, as an
example, the relation Professor:
It is assumed that
1. A professor can work in more than one department
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
The relationship diagram for the above relation is given in figure 8. Table 6 gives the relation attributes.
The two possible composite keys are Professor code and Dept., or Professor code and Head of Dept.
Observe that Department as well as Head of Dept. are not non-key attributes. They are a part of a
composite key.
5.a. How does multilevel indexing improve the efficiency of searching an index file?
Multilevel Indices
Even if we use a sparse index, the index itself may become too large for efficient processing. It is not
unreasonable, in practice, to have a file with 100,000 records, with 10 records stored in each block. If we
have one index record per block, the index has 10,000 records. Index records are smaller than data
records, so let us assume that 100
index records fit on a block. Thus, our index occupies 100 blocks. Such large indices are stored as
sequential files on disk.
If an index is sufficiently small to be kept in main memory, the search time to find an entry is low.
However, if the index is so large that it must be kept on disk, a search for an entry requires several disk
block reads. Binary search can be used on the index file to locate an entry, but the search still has a large
cost. If the index occupies b
blocks, binary search requires as many as ⌈log2(b)⌉ blocks to be read. (⌈x⌉ denotes the least integer that
is greater than or equal to x; that is, we round upward.) For our 100-block index, binary search requires
seven block reads. On a disk system where a block read takes 30 milliseconds, the search will take 210
milliseconds, which is long.
Note that, if overflow blocks have been used, binary search will not be possible. In that case, a
sequential search is typically used, and that requires b block reads, which will take even longer. Thus, the
process of searching a large index may be costly.
To deal with this problem, we treat the index just as we would treat any other sequential file, and
construct a sparse index on the primary index. To locate a record, we first use binary search on the outer
index to find the record for the largest search-key value less than or equal to the one that we desire. The
pointer points to a block of the inner index. We scan this block until we find the record that has the largest
search-key value less than or equal to the one that we desire. The pointer in this record points to the block
of the file that contains the record for which we are looking. Using the two levels of indexing, we have
read only one index block, rather than the seven we read with binary search, if we assume that the outer
index is already in main memory. If our file is extremely large, even the outer index may grow too large
to fit in main memory. In such a case, we can create yet another level of index. Indeed, we can repeat this
process as many times as necessary. Indices with two or more levels are called multilevel indices.
Searching for records with a multilevel index requires significantly fewer I/O operations than does
searching for records by binary search. Each level of index could correspond to a unit of physical storage.
Thus, we may have indices at the track, cylinder, and disk levels. A typical dictionary is an example of a
multilevel index in the non database world.
The header of each page lists the first word alphabetically on that page. Such a book index is a
multilevel index: The words at the top of each page of the book index form a sparse index on the contents
of the dictionary pages.
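The arithmetic in this answer can be checked with a short calculation. The figures (100,000 records, 10 records per block, 100 index records per block, 30 milliseconds per block read) come from the text above; the function and variable names are assumptions made for the example.

```python
import math


def index_costs(records, records_per_block, index_entries_per_block, read_ms):
    """Return (index blocks, binary-search reads, binary-search time in ms)."""
    data_blocks = records // records_per_block
    # Sparse index: one index record per data block.
    index_blocks = math.ceil(data_blocks / index_entries_per_block)
    binary_reads = math.ceil(math.log2(index_blocks))
    return index_blocks, binary_reads, binary_reads * read_ms


blocks, reads, ms = index_costs(100_000, 10, 100, 30)
print(blocks, reads, ms)

# The outer index needs one entry per inner index block. Here that is 100
# entries, which fit in a single block, so with the outer index held in main
# memory only one inner index block has to be read from disk.
outer_index_blocks = math.ceil(blocks / 100)
print(outer_index_blocks)
```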
5.b. How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access structure
to a data file?
B+-Tree Index Files:
B+-tree indices are an alternative to indexed-sequential files.
1. Disadvantage of indexed-sequential files: performance degrades as file grows, since many
overflow blocks get created. Periodic reorganization of entire file is required.
2. Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in
the face of insertions and deletions. Reorganization of entire file is not required to maintain
performance.
3. Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
4. Advantages of B+-trees outweigh disadvantages, and they are used extensively.
A B+-tree is a rooted tree satisfying the following properties:
1. All paths from root to leaf are of the same length.
2. Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
3. A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
4. Special cases:
a. If the root is not a leaf, it has at least 2 children.
b. If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0
and (n–1) values.
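The occupancy limits in these properties can be worked out for a concrete n. The sketch below only evaluates the bounds, reading the square brackets as rounding upward; the function name and the dictionary keys are assumptions made for the example.

```python
import math


def bplus_bounds(n):
    """Return the (min, max) occupancy limits of a B+-tree with fan-out n."""
    return {
        # Nodes that are neither root nor leaf.
        "internal_children": (math.ceil(n / 2), n),
        # Leaf nodes.
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # Root when it is not a leaf: at least 2 children.
        "root_children": (2, n),
        # Root when it is the only node in the tree.
        "root_leaf_values": (0, n - 1),
    }


print(bplus_bounds(5))
```

For n = 5, for example, an internal node holds between 3 and 5 children and a leaf holds between 2 and 4 values.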
Example of B+ tree
B+-Tree File Organization
1. Index file degradation problem is solved by using B+-Tree indices. Data file degradation problem
is solved by using B+-Tree File Organization.
2. The leaf nodes in a B+-tree file organization store records, instead of pointers.
3. Since records are larger than pointers, the maximum number of records that can be stored in a
leaf node is less than the number of pointers in a non leaf node.
4. Leaf nodes are still required to be half full.
5. Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.
B-Tree Index Files
1. Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates
redundant storage of search keys.
2. Search keys in non leaf nodes appear nowhere else in the B-tree; an additional pointer field for
each search key in a non leaf node must be included.
3. Generalized B-tree leaf node
Non leaf node – pointers Bi are the bucket or file record pointers
Examples of B-tree:
6.a. What is meant by query execution plan? Discuss the main heuristics that are applied
during the query optimization process.
Ans:
A sequence of primitive operations that can be used to evaluate a query is a query execution plan or query-evaluation plan.
Heuristic optimization:
A drawback of cost-based optimization is the cost of optimization itself. Although the cost of query
processing can be reduced by clever optimizations, cost-based optimization is still expensive. Hence,
many systems use heuristics to reduce the number of choices that must be made in a cost-based fashion.
Some systems even choose to use only heuristics, and do not use cost-based optimization at all.
o Cost-based optimization is expensive, even with dynamic programming.
o Systems may use heuristics to reduce the number of choices that must be made in a cost-based
fashion.
o Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in
all cases) improve execution performance:
  o Perform selection early (reduces the number of tuples)
  o Perform projection early (reduces the number of attributes)
  o Perform most restrictive selection and join operations before other similar operations.
o Some systems use only heuristics, others combine heuristics with partial cost-based
optimization.
Steps in Typical Heuristic Optimization:
1. Deconstruct conjunctive selections into a sequence of single selection operations.
2. Move selection operations down the query tree for the earliest possible execution.
3. Execute first those selection and join operations that will produce the smallest relations.
4. Replace Cartesian product operations that are followed by a selection condition by join
operations.
5. Deconstruct and move as far down the tree as possible lists of projection attributes,
creating new projections where needed.
6. Identify those sub trees whose operations can be pipelined, and execute them using
pipelining.
6.b. Describe the ACID properties of a database transaction.
Ans:
ACID properties of transaction:
• Atomicity. Either all operations of the transaction are reflected properly in the database, or
none are.
• Consistency. Execution of a transaction in isolation (that is, with no other transaction executing
concurrently) preserves the consistency of the database.
• Isolation. Even though multiple transactions may execute concurrently, the system guarantees
that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution
before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of
other transactions executing concurrently in the system.
• Durability. After a transaction completes successfully, the changes it has made
to the database persist, even if there are system failures.
These properties are often called the ACID properties; the acronym is derived from the first letter of
each of the four properties.
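Atomicity can be observed with Python's built-in sqlite3 module. The account table, the names A and B and the transfer function are assumptions made for this sketch; it shows only that a failed transfer leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)        # both updates are committed
failed = transfer(conn, "A", "B", 500)   # the debit is rolled back
print(ok, failed, balances(conn))
```

After the second call the debit of 500 has been undone, so the balances are the ones left by the first transfer.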
7.a. Discuss how serializability is used to enforce concurrency control in a database system.
Ans:
Serializability:
• Basic assumption: each transaction preserves database consistency.
• Thus serial execution of a set of transactions preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
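Conflict serializability is commonly tested with a precedence graph: a schedule is conflict serializable when the graph has no cycle. The sketch below assumes a schedule is given as a list of (transaction, operation, item) triples; this representation and the two sample schedules are illustrative and do not come from the notes.

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, op, item) with op in {'R', 'W'}."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Two operations conflict if they belong to different transactions,
            # access the same item, and at least one of them is a write.
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: 0 for node in graph}  # 0 = unvisited, 1 = on stack, 2 = done

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(is_conflict_serializable(s1), is_conflict_serializable(s2))
```

In s1 every conflict orders T1 before T2, so the schedule is equivalent to the serial schedule T1, T2. In s2 the conflicts point in both directions, so no equivalent serial schedule exists.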
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that
each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the
transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests.
For example, transactions T3 and T4 are two phase. On the other hand, transactions T1 and T2 are not
two phase. Note that the unlock instructions do not need to appear at the end of the transaction. For
example, in the case of transaction T3, we could move the unlock(B) instruction to just after the lock-X(A) instruction, and still retain the two-phase locking property.
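The two-phase rule can be checked mechanically. In the sketch below a transaction is assumed to be a list of instruction strings such as lock-X(B) and unlock(B); the two sample transactions are hypothetical, since the figures for T1 to T4 are not reproduced in these notes.

```python
def is_two_phase(operations):
    """Return True if no lock is requested after the first unlock."""
    shrinking = False
    for op in operations:
        if op.startswith("unlock"):
            shrinking = True      # the first unlock ends the growing phase
        elif op.startswith("lock") and shrinking:
            return False          # a lock request in the shrinking phase
    return True

# Hypothetical transactions, for illustration only.
t_ok = ["lock-X(B)", "read(B)", "write(B)", "lock-X(A)", "unlock(B)",
        "read(A)", "write(A)", "unlock(A)"]
t_bad = ["lock-X(B)", "read(B)", "write(B)", "unlock(B)",
         "lock-X(A)", "read(A)", "write(A)", "unlock(A)"]
print(is_two_phase(t_ok), is_two_phase(t_bad))
```

The first transaction releases B right after acquiring the lock on A and is still two phase; the second one releases B before it requests the lock on A and therefore is not.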
7.b. Discuss the problems of deadlock and starvation and the different approaches to deal with these
problems.
DEADLOCK:
A system is deadlocked if there is a set of transactions such that every transaction in the set
is waiting for another transaction in the set.
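One standard way to detect this situation, not spelled out in these notes, is to keep a wait-for graph and look for a cycle in it. The sketch below assumes the graph is a dictionary mapping each transaction to the set of transactions it is waiting for; the transaction names are illustrative.

```python
def find_deadlock(waits_for):
    """Return one cycle of the wait-for graph as a list, or None if there is none."""
    visited = set()

    def dfs(node, path):
        if node in path:                  # node is already on the current path: cycle
            return path[path.index(node):]
        if node in visited:               # fully explored earlier, no cycle through it
            return None
        visited.add(node)
        for nxt in waits_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
waiting_only = {"T1": {"T2"}, "T2": {"T3"}, "T3": set()}
print(find_deadlock(deadlocked), find_deadlock(waiting_only))
```

Every transaction on the returned cycle is a candidate victim for the recovery step described next.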
Recovery from Deadlock:
When a detection algorithm determines that a deadlock exists, the system must recover from the
deadlock. The most common solution is to roll back one or more transactions to break the deadlock.
Three actions need to be taken:
1. Selection of a victim. Given a set of deadlocked transactions, we must determine which transaction
(or transactions) to roll back to break the deadlock. We should roll back those transactions that will
incur the minimum cost. Unfortunately, the term minimum cost is not a precise one. Many factors may
determine the cost of a rollback, including
a. How long the transaction has computed, and how much longer the transaction will compute
before it completes its designated task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs for it to complete.
d. How many transactions will be involved in the rollback.
2. Rollback. Once we have decided that a particular transaction must be rolled back, we must determine
how far this transaction should be rolled back. The simplest solution is a total rollback: Abort the
transaction and then restart it. However, it is more effective to roll back the transaction only as far as
necessary to break the deadlock. Such partial rollback requires the system to maintain additional
information about the state of all the running transactions. Specifically, the sequence of lock
requests/grants and updates performed by the transaction needs to be recorded. The deadlock
detection mechanism should decide which locks the selected transaction needs to release in order to
break the deadlock. The selected transaction must be rolled back to the point where it obtained the first
of these locks, undoing all actions it took after that point. The recovery mechanism must be capable of
performing such partial rollbacks. Furthermore, the transactions must be capable of resuming execution
after a partial rollback. See the bibliographical notes for relevant references.
3. Starvation. In a system where the selection of victims is based primarily on cost factors, it may
happen that the same transaction is always picked as a victim. As a result, this transaction never
completes its designated task, and thus there is starvation. We must ensure that a transaction can be picked
as a victim only a (small) finite number of times. The most common solution is to include the number of
rollbacks in the cost factor.
8. Consider the following relations:
Customer(cust_name,cust_street,cust_city)
Loan(branch_name,loan_number,amount)
Borrower(cust_name,loan_number,amount)
Write the following queries in SQL for the above relations:
b. Find the names of all customers who have a loan at the "Redwood" branch.
c. Find the branch name, loan number and amount for loans over Rs. 50000/-.
d. Find the names of all customers who have a loan, an account, or both at the Perryridge branch.
e. Find all customers who have a loan from the bank. Find their names and loan numbers.
f. Find all loan numbers for loans made at the Redwood branch with loan amounts greater than Rs. 60000.
Ans:
b.
select borrower.cust_name from loan, borrower where loan.loan_number=borrower.loan_number
and loan.branch_name='Redwood'
c.
select branch_name, loan_number, amount from loan where amount>50000
d. (The depositor and account relations used here are assumed; they are not among the relations listed in the question.)
(select a.cust_name from depositor a, account b where a.account_no=b.account_no and
b.branch_name='Perryridge') union (select a.cust_name from borrower a, loan b where
a.loan_number=b.loan_number and b.branch_name='Perryridge')
e.
select borrower.cust_name, borrower.loan_number from borrower, loan where
borrower.loan_number=loan.loan_number
f.
select a.loan_number from borrower a, loan b where
a.loan_number=b.loan_number and
b.branch_name='Redwood' and a.amount>60000
Fourth Semester Examination-2010
DATABASE ENGINEERING
1. a. What do you mean by ACID properties of a transaction?
Ans:
• Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
• Consistency. Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.
• Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
b. Consider a multilevel index with fan-out=4 used to index 25 records, draw the structure.
Ans:
[Figure: multilevel index structure. Node values recoverable from the original figure: 7, 2, 14, 4, 5, 21, 7, 6, 25]
c.
For the following operations:
T1: read(X)
    X=X+10
    write(X)
T2: read(X)
    read(X)
For simultaneous execution of T1 & T2, which problem can happen?
Ans:
A dirty read can occur in T2: when a READ(X) statement of T2 executes while T1 is still updating X, T2 may read a value that T1 has written but not yet committed.
d. For a relation R(A,B,C,D) with the dependency among numeric field values:
C=2A+B and D=2B, draw the ER diagram.
Ans:
[Figure: ER diagram of entity R with attributes A and B and derived attributes C=2A+B and D=2B]
e. Differentiate between foreign key and references.
Ans:
A foreign key constraint states that a tuple in one relation that refers to another relation must refer to an existing tuple in
that relation. This constraint is specified on two relations.
If a column is declared as a foreign key, it must be the primary key of another table.
Department(deptcode,dname)
Here the deptcode is the primary key.
Emp(empcode,name,city,deptcode).
Here the deptcode is a foreign key.
The REFERENCES keyword is used to refer to the parent table where the primary key is created.
f. What is a timestamp? If TS(Ti)>TS(Tj) then which transaction is younger? Justify. Consider
TS(Ti) is the timestamp of transaction Ti.
Ans:
Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction
Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj ).
There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal
to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a
transaction's timestamp is equal to the value of the counter when the transaction enters the
system.
Ti is younger than Tj because TS(Ti)>TS(Tj): a larger timestamp means that Ti entered the system later.
g. What is query tree? Draw the query tree for the following SQL query.
Select R1.A,R2.B from R1,R2 where R1.b=R2.a
Ans:
A query tree is a tree that represents a query evaluation plan: the leaf nodes are relations and the internal nodes are relational algebra operations.
П R1.A,R2.B
     |
σ R1.b=R2.a
     |
     X
    / \
  R1   R2
h. Find the tuple calculus representation for the following SQL query:
select R1.a,R2.b from R1, R2 where R1.b=R2.a;
Ans:
R={t | ∃s Є R1 (t[a]=s[a] ^ ∃u Є R2 (t[b]=u[b] ^ s[b]=u[a]))}
i. For the following set of dependencies:
{A→BC,B→D,C→DE,BC→F}
Find the primary key of the relation.
Ans:
A+={ABCDEF}
B+={BD}
BC+={BCDEF}
Here A is a super key because A+ contains all the attributes of R.
Since A is a single attribute it is also minimal, so it is a candidate key and can be chosen as the primary key of R.
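The closures above can be recomputed with a short fixed-point loop. The sketch assumes that R consists of exactly the attributes A to F that appear in the dependencies, and it encodes each dependency as a pair of strings.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already inside the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [("A", "BC"), ("B", "D"), ("C", "DE"), ("BC", "F")]
R = set("ABCDEF")  # assumed attribute set of the relation

single_attribute_keys = [a for a in sorted(R) if closure(a, FDS) == R]
print(sorted(closure("A", FDS)), sorted(closure("B", FDS)), sorted(closure("BC", FDS)))
print(single_attribute_keys)
```

Only A determines every attribute on its own, which agrees with the answer above.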
j. What do you mean by RAID.
Ans:
A variety of disk-organization techniques, collectively called redundant arrays of independent disks
(RAID), have been used to achieve improved performance and reliability.
In the past, system designers viewed storage systems composed of several small cheap disks as a cost-effective alternative to using large, expensive disks; the cost per megabyte of the smaller disks was less
than that of larger disks. In fact, the I in RAID, which now stands for independent, originally stood for
inexpensive. Today, however, all disks are physically small, and larger-capacity disks actually have a
lower cost per megabyte. RAID systems are used for their higher reliability and higher performance rate,
rather than for economic reasons.
2. Consider the set of relations:
Student(name,roll,mark)
Score(roll,grade)
Details(name,address)
For the following query:
"Find name and address of students scoring grade A"
Represent it in relational algebra, tuple relational calculus, domain relational calculus, QBE & SQL
Ans:
Relational algebra:
Пdetails.name,details.address(σstudent.roll=score.roll and student.name=details.name and
score.grade='A'(student X score X details))
Tuple relational calculus:
R={t | ∃s Є details (t[name]=s[name] ^ t[address]=s[address] ^ ∃u Є student (u[name]=s[name] ^ ∃w Є score (w[roll]=u[roll] ^ w[grade]='A')))}
Domain relational calculus:
R={<nm,ad> | <nm,ad> Є details ^ ∃r,m (<nm,r,m> Є student ^ ∃g (<r,g> Є score ^ g='A'))}
SQL:
Select details.name,details.address from student,score,details where student.roll=score.roll and
student.name=details.name and score.grade='A'
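The SQL form can be tried out with Python's built-in sqlite3 module. The column types and the sample rows below are invented for the sketch; only the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, roll INTEGER, mark INTEGER);
CREATE TABLE score   (roll INTEGER, grade TEXT);
CREATE TABLE details (name TEXT, address TEXT);
-- Made-up sample data
INSERT INTO student VALUES ('Asha', 1, 92), ('Bikash', 2, 71);
INSERT INTO score   VALUES (1, 'A'), (2, 'B');
INSERT INTO details VALUES ('Asha', 'Dhenkanal'), ('Bikash', 'Cuttack');
""")

rows = conn.execute("""
    SELECT details.name, details.address
    FROM student, score, details
    WHERE student.roll = score.roll
      AND student.name = details.name
      AND score.grade = 'A'
""").fetchall()
print(rows)
```

With this data only the student holding grade A is returned, together with the address stored in details.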
3. a. What is collision? Discuss the various collision resolution techniques.
Ans:
A collision occurs when two different keys hash to the same index of the hash table.
Collision Resolution Techniques:
There are two broad ways of collision resolution:
1. Separate Chaining: An array of linked lists implementation.
2. Open Addressing: Array-based implementation.
(i) Linear probing (linear search)
(ii) Quadratic probing (nonlinear search)
(iii) Double hashing (uses two hash functions)
Separate Chaining:
• The hash table is implemented as an array of linked lists.
• Inserting an item, r, that hashes at index i is simply insertion into the linked list at position i.
• Synonyms are chained in the same linked list.
• Retrieval of an item, r, with hash address, i, is simply retrieval from the linked list at position i.
• Deletion of an item, r, with hash address, i, is simply deleting r from the linked list at position i.
Example: Load the keys 23, 13, 21, 14, 7, 8, and 15, in this order, in a hash table of size 7 using
separate chaining with the hash function: h(key) = key % 7
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 (collision)
h(7) = 7 % 7 = 0 (collision)
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 (collision)
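The example can be reproduced with a few lines of Python. As a simplification, this sketch uses a Python list for each chain in place of a linked list; the function name is illustrative.

```python
def build_chained_table(keys, size=7):
    """Separate chaining with h(key) = key % size; each slot holds a chain."""
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)   # synonyms end up in the same chain
    return table

table = build_chained_table([23, 13, 21, 14, 7, 8, 15])
for index, chain in enumerate(table):
    print(index, chain)
```

Slots 0 and 1 hold the chains created by the three collisions, and slots 3, 4 and 5 stay empty.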
Introduction to Open Addressing:
• All items are stored in the hash table itself.
• In addition to the cell data (if any), each cell keeps one of the three states: EMPTY, OCCUPIED, DELETED.
• While inserting, if a collision occurs, alternative cells are tried until an empty cell is found.
• Deletion (lazy deletion): When a key is deleted the slot is marked as DELETED rather than EMPTY; otherwise subsequent searches that hash at the deleted cell will fail.
• Probe sequence: A probe sequence is the sequence of array indexes that is followed in searching for an empty cell during an insertion, or in searching for a key during find or delete operations.
• The most common probe sequences are of the form:
h_i(key) = [h(key) + c(i)] % n, for i = 0, 1, ..., n-1,
where h is a hash function and n is the size of the hash table.
• The function c(i) is required to have the following two properties:
Property 1: c(0) = 0
Property 2: The set of values {c(0) % n, c(1) % n, c(2) % n, ..., c(n-1) % n} must be a permutation of {0, 1, 2, ..., n-1}, that is, it must contain every integer between 0 and n-1 inclusive.
• The function c(i) is used to resolve collisions.
• To insert item r, we examine array location h_0(r) = h(r). If there is a collision, array locations h_1(r), h_2(r), ..., h_{n-1}(r) are examined until an empty slot is found.
• Similarly, to find item r, we examine the same sequence of locations in the same order.
• Note: For a given hash function h(key), the only difference in the open addressing collision resolution techniques (linear probing, quadratic probing and double hashing) is in the definition of the function c(i).
• Common definitions of c(i) are:
Collision resolution technique    c(i)
Linear probing                    i
Quadratic probing                 ±i^2
Double hashing                    i*hp(key)
where hp(key) is another hash function.
Open Addressing: Linear Probing:
• c(i) is a linear function in i of the form c(i) = a*i.
• Usually c(i) is chosen as:
c(i) = i, for i = 0, 1, ..., tableSize-1
• The probe sequences are then given by:
h_i(key) = [h(key) + i] % tableSize, for i = 0, 1, ..., tableSize-1
• For c(i) = a*i to satisfy Property 2, a and n must be relatively prime.
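A minimal sketch of insertion with linear probing follows. It assumes h(key) = key % n and c(i) = i, and it reuses the keys of the separate chaining example purely for illustration; lazy deletion is left out.

```python
EMPTY = None

def insert_linear(table, key):
    """Insert key using the probe sequence [h(key) + i] % n; return the slot used."""
    n = len(table)
    for i in range(n):
        slot = (key % n + i) % n
        if table[slot] is EMPTY:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

table = [EMPTY] * 7
slots = [insert_linear(table, k) for k in [23, 13, 21, 14, 7, 8, 15]]
print(slots)
print(table)
```

Keys whose home slot is taken move to the next free cell, which is why 7, 8 and 15 end up several positions away from their hash addresses.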
3.b. Consider the following set of data items:
A     B     C     D
A1    1     X1    D1
A2    2     X2    D2
A3    3     X3    D3
A3    3     X3    D3
Represent it in 3NF.
Ans:
In this table the 4th row is a duplicate of the 3rd one. Delete that record before checking for normalization.
Now the table is in 1NF.
In this relation all rows are unique.
Judging from the given instance, all of the following functional dependencies hold:
A→B, B→C, C→D, A→C, A→D, B→D, AB→C, AB→D, …
In this instance every attribute determines all the others, so each of A, B, C and D is a candidate key.
Hence every attribute is a prime attribute and every determinant is a super key.
No non-prime attribute depends transitively on a key, so the table is in 3NF.
4. Describe the steps to reduce an ER schema to tables
Conversion of ER-diagram to relational database
Conversion of entity sets:
1. For each strong entity type E in the ER diagram, we create a relation R containing all the simple
attributes of E. The primary key of the relation R will be one of the key attributes of E.
STUDENT(rollno (primary key), name, address)
FACULTY(id (primary key), name, address, salary)
COURSE(course-id (primary key), course_name, duration)
DEPARTMENT(dno (primary key), dname)
2. For each weak entity type W in the ER diagram, we create another relation R that contains all
simple attributes of W. If E is an owner entity of W, then the key attribute of E is also included in R.
This key attribute is set as a foreign key attribute of R. The combination of the primary key
attribute of the owner entity type and the partial key of the weak entity type forms the key of the
weak entity type.
GUARDIAN((rollno,name) (primary key), address, relationship)
Conversion of relationship sets:
Binary Relationships:
• One-to-one relationship:
For each 1:1 relationship type R in the ER-diagram involving two entities E1 and E2, we choose
one of the entities (say E1), preferably with total participation, and add the primary key attribute of
the other entity (E2) as a foreign key attribute in the table of E1. We also include all the simple
attributes of relationship type R in E1, if any. For example, the DEPARTMENT table has been
extended to include head-id and the date-from attribute of the relationship.
DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM)
• One-to-many relationship:
For each 1:n relationship type R involving two entities E1 and E2, we identify the entity type (say
E1) at the n-side of the relationship type R and include the primary key of the entity on the other
side of the relationship (say E2) as a foreign key attribute in the table of E1. We include all simple
attributes (or simple components of composite attributes) of R, if any, in the table of E1.
For example:
Consider the works-in relationship between DEPARTMENT and FACULTY. For this relationship choose
the entity at the N side, i.e., FACULTY, and add the primary key attribute of the other entity DEPARTMENT,
i.e., DNO, as a foreign key attribute in FACULTY.
FACULTY (contains the WORKS_IN relationship)
(ID,NAME,ADDRESS,BASIC_SAL,DNO)
• Many-to-many relationship:
For each m:n relationship type R, we create a new table (say S) to represent R. We also include
the primary key attributes of both the participating entity types as foreign key attributes in S.
Any simple attributes of the m:n relationship type (or simple components of composite
attributes) are also included as attributes of S.
For example:
The m:n relationship taught-by between entities COURSE and FACULTY should be represented as
a new table. The structure of the table will include the primary key of COURSE and the primary key of
FACULTY.
TAUGHT-BY(ID (primary key of FACULTY table), course-id (primary key of COURSE table))
• N-ary relationship:
For each n-ary relationship type R where n>2, we create a new table S to represent R. We
include as foreign key attributes in S the primary keys of the relations that represent the
participating entity types. We also include any simple attributes of the n-ary relationship type (or
simple components of composite attributes) as attributes of S. The primary key of S is usually a
combination of all the foreign keys that reference the relations representing the participating
entity types.
[Figure: ternary relationship "Loan sanction" among the entities Loan, Customer and Employee]
LOAN-SANCTION(customer-id,loanno,empno,sancdate,loan_amount)
• Multi-valued attributes:
For each multivalued attribute A, we create a new relation R that includes an attribute
corresponding to A plus the primary key attributes k of the relation that represents the entity type
or relationship type that has A as an attribute. The primary key of R is then the combination of A and k.
For example, if a STUDENT entity has rollno, name and phone number, where phone number is a
multivalued attribute, then we will create a table PHONE(rollno,phoneno) whose primary key is the
combination of both attributes. The STUDENT table need not contain the phone number; it can simply be
(rollno,name).
PHONE(rollno,phoneno)
[Figure: generalisation/specialisation hierarchy. Entity Account (account_no, name, branch) is connected through an Is-a relationship to Saving (interest) and Current (charges)]
• Converting generalisation/specialisation hierarchy to tables:
A simple rule for conversion may be to decompose all the specialized entities into tables in case
they are disjoint. For example, for the figure we can create the tables:
Account(account_no,name,branch,balance)
Saving_account(account-no,interest)
Current_account(account-no,charges)
5.a. Differentiate between object-oriented database and object-relational database
CRITERIA | ORDBMS | ODBMS
Defined standard | SQL3/4 | ODMG_V2.0
Support for object-oriented programming | Limited mostly to new data types | Direct and extensive
Simplicity of use | Same as RDBMS, with some confusing extensions | OK for programmers; some SQL access for end users
Complex data relationships | Difficult to model | Can handle arbitrary complexity; users can write methods on any structure
Simplicity of development | Provides independence of data from application, good for simple relationships | Objects are a natural way to model; can accommodate a wide variety of types and relationships
5.b What do you mean by locking? Explain the two-phase locking with an example
• A lock is a mechanism to control concurrent access to a data item.
• Data items can be locked in two modes:
o exclusive (X) mode. The data item can be both read and written. An X-lock is requested using the lock-X instruction.
o shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S instruction.
• Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted.
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol
requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any
new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed. Once the transaction releases a lock, it enters the shrinking phase, and it
can issue no more lock requests.
For example, transactions T3 and T4 are two phase. On the other hand, transactions
T1 and T2 are not two phase. Note that the unlock instructions do not need to appear
at the end of the transaction. For example, in the case of transaction T3, we could
move the unlock(B) instruction to just after the lock-X(A) instruction, and still retain
the two-phase locking property.
6. What is the difference between 4NF and BCNF. Describe with examples.
Ans:
BCNF:
The criteria for Boyce-Codd normal form (BCNF) are:
* The table must be in 3NF.
* Every non-trivial functional dependency must be a dependency on a superkey.
4NF:
The criteria for fourth normal form (4NF) are:
* The table must be in BCNF.
* There must be no non-trivial multivalued dependencies on something other than a superkey. A BCNF
table is said to be in 4NF if and only if all of its multivalued dependencies are functional dependencies.
COURSE    TEACHER    TEXT
OS        Ravi       Galvin
OS        Vivek      Dietel
OS        Ravi       Dietel
CO        Ram        Hamcher
CO        Shyam      M.Mano
CO        Ram        M.Mano
CO        Shyam      Hamcher
RTS       Ravi       Dietel
OS        Vivek      Galvin
CTX:
As indicated in CTX, the schema CTX does not have any non-trivial FDs. Thus, it is an "ALL KEY" schema and all legal relations under this schema
will be in BCNF. But this relation still has the following data anomalies:
a. The information that a particular teacher is teaching a particular course is represented as many
times as the number of text books followed for that particular course.
b. The information that a particular text book is followed for a particular course is represented as
many times as the number of teachers teaching the particular course.
These anomalies are due to the non-trivial multivalued dependencies holding on the schema
CTX.
Trivial MVD:
An MVD α→→β holding on a relation schema R is said to be trivial if:
a. β is a subset of or equal to α, or
b. αUβ=R
7. What is a constraint? Describe its types with examples
Ans:
Rules which are enforced on data being entered and which prevent the user from entering invalid data into
a table are called constraints. Thus constraints control the data being entered in tables for permanent
storage.
RELATIONAL CONSTRAINTS:
There are three types of constraints on a relational database:
o DOMAIN CONSTRAINTS
o KEY CONSTRAINTS
o INTEGRITY CONSTRAINTS
DOMAIN CONSTRAINTS:
A domain constraint specifies that each attribute in a relation must have an atomic value from the corresponding domain. The data
types associated with commercial RDBMS domains include:
o Standard numeric data types for integers
o Real numbers
o Characters
o Fixed length strings and variable length strings
Thus, a domain constraint specifies the condition that we want to put on each instance of the relation: the
values that appear in each column must be drawn from the domain associated with that
column.
Rollno    Name     City    Age
101       Sujit    Bam     23
102       kunal    bbsr    22
Key constraints:
This constraint states that the key attribute value in each tuple must be unique, i.e., no two tuples
contain the same value for the key attribute (null values can be allowed).
Emp(empcode,name,address). Here empcode is the key and must be unique.
Integrity constraints:
There are two types of integrity constraints:
o Entity integrity constraints
o Referential integrity constraints
Entity integrity constraints:
It states that a primary key value must be unique and cannot be null. This is because the primary key is used to
identify individual tuples in the relation, and we would not be able to uniquely identify records containing
null values for the primary key attributes. This constraint is specified on one individual relation.
Referential integrity constraints:
It states that a tuple in one relation that refers to another relation must refer to an existing tuple in
that relation. This constraint is specified on two relations.
If a column is declared as a foreign key, it must be the primary key of another table.
Department(deptcode,dname)
Here the deptcode is the primary key.
Emp(empcode,name,city,deptcode).
Here the deptcode is a foreign key.
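Referential integrity on these two tables can be demonstrated with Python's built-in sqlite3 module. The column types and sample rows are assumptions of this sketch; note that SQLite checks foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (deptcode INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE emp (
        empcode  INTEGER PRIMARY KEY,
        name     TEXT,
        city     TEXT,
        deptcode INTEGER REFERENCES department(deptcode)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'CSE')")
conn.execute("INSERT INTO emp VALUES (1, 'Sujit', 'Bam', 10)")       # department 10 exists

try:
    conn.execute("INSERT INTO emp VALUES (2, 'Kunal', 'bbsr', 99)")  # department 99 does not
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

emp_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(rejected, emp_count)
```

The second employee is refused because its deptcode does not refer to an existing department tuple.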
8.
a) Double Buffering:
When several blocks need to be transferred from disk to main memory and all the block addresses are
known, several buffers can be reserved in main memory to speed up the transfer. While one buffer is
being read or written, the CPU can process data in the other buffer. This is possible because an
independent disk I/O processor exists that, once started, can proceed to transfer a data block between
memory and disk independent of and in parallel to CPU processing.
Reading and processing can proceed in parallel when the time required to process a disk block in
memory is less than the time required to read the next block and fill a buffer. The CPU can start
processing a block once its transfer to main memory is completed; at the same time the disk I/O
processor can be reading and transferring the next block into a different buffer. This technique is called
double buffering.
b) Parallel Databases:
A parallel database system is a DBMS running across multiple processors and disks that is designed to
perform various tasks concurrently, like loading data, building indexes and evaluating queries.
Parallelism in Databases
• Data can be partitioned across multiple disks for parallel I/O.
• Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel
  o data can be partitioned and each processor can work independently on its own partition.
• Queries are expressed in high level language (SQL, translated to relational algebra)
  o makes parallelization easier.
• Different queries can be run in parallel with each other. Concurrency control takes care of conflicts.
• Thus, databases naturally lend themselves to parallelism.
Parallel Database Architectures
• Shared memory -- processors share a common memory
• Shared disk -- processors share a common disk
• Shared nothing -- processors share neither a common memory nor common disk
• Hierarchical -- hybrid of the above architectures
c) Multi Valued Dependencies:
A relation scheme R(A,B,C) is said to have the multivalued dependencies A->>B and
A->>C if and only if, for every legal relation r(R) and for every pair of tuples (t1,t2) in r that satisfies
t1[A]=t2[A], t1[B]!=t2[B], and t1[C]!=t2[C],
there exist tuples {t3,t4} є r
that satisfy t3[A]=t4[A]=t1[A]=t2[A]
and t3[B]=t1[B] and t4[B]=t2[B]
and t3[C]=t2[C] and t4[C]=t1[C].
MVDs always occur in pairs A->>B and A->>C. Both can be jointly denoted as
A->>B|C
Course    Sname    Book
Btech     Rajiv    Os
Mtech     Rajiv    Aca
Mca       Ajay     Se
Bca       Abhay    Spm
Bis       Ajay     Testing
Here sname and book are multivalued facts about the attribute course. Here the student has no control
over the books to be used for their course. These are independent of each other
MVDs are
Course->> sname
Course->>book
6. What is the difference between 4NF and BCNF? Describe with examples.
4NF:
Given a relation scheme R such that the set D of FDs and MVDs is satisfied, consider sets of attributes X
and Y where X is a subset of or equal to R and Y is a subset of or equal to R. The relation scheme R is in 4NF if, for all
multivalued dependencies of the form X →→Y Є D+,
either X →→Y is a trivial MVD or X is a super key of R.
BOYCE CODD NORMAL FORM:
Defn: A normalized relation scheme R<S,F> is in Boyce-Codd normal form if, for every nontrivial FD in F+
of the form X→A where X is a subset of S and A Є S, X is a super key of R.
Course    Teacher    Text
Os        Ravi       Galvin
Os        Vivek      Dietel
Os        Ravi       Dietel
Os        Vivek      Galvin
Co        Ram        Hamcher
Co        Shyam      M.Mano
Co        Ram        M.Mano
Co        Shyam      Hamcher
The schema CTX does not have any non-trivial FD; thus it is an "all-key" schema and it is in BCNF.
The relation CTX has the MVDs course->>teacher and course->>text. These are non-trivial
MVDs, and course is not a super key of CTX since CTX is an all-key schema. Thus CTX is in BCNF but not in
4NF.
4NF is more desirable than BCNF because it reduces the repetition of information. If we consider a BCNF
schema that is not in 4NF, we observe that decomposition into 4NF does not lose information provided that a
lossless join decomposition is used. Yet redundancy is reduced.
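The lossless decomposition of CTX can be checked with Python sets. The tuples below follow the table above, with the spellings of teacher and text names made uniform; the names CT and CX for the two projections are chosen for this sketch.

```python
CTX = {
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"),
    ("OS", "Ravi", "Dietel"), ("OS", "Vivek", "Galvin"),
    ("CO", "Ram", "Hamcher"), ("CO", "Shyam", "M.Mano"),
    ("CO", "Ram", "M.Mano"), ("CO", "Shyam", "Hamcher"),
}

# Project CTX onto (course, teacher) and onto (course, text).
CT = {(c, t) for c, t, x in CTX}
CX = {(c, x) for c, t, x in CTX}

# Natural join of the two projections on course.
joined = {(c, t, x) for c, t in CT for c2, x in CX if c == c2}

print(len(CTX), len(CT), len(CX), joined == CTX)
```

The join gives back exactly the original eight tuples, so nothing is lost, while each teacher and each text is now recorded once per course instead of once per combination.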
RELATIONAL DATABASE MANAGEMENT SYSTEMS
4TH-SEMESTER EXAMINATION ,2006
1.
a. Differentiate between schema and instances.
Ans:
Relation Schema:
A relational schema specifies the relation's name, its attributes and the domain of each attribute. If R
is the name of a relation and A1,A2,…,An is a list of attributes representing R, then R(A1,A2,…,An) is
called a relational schema. Each attribute Ai in this relational schema takes a value from some specific
domain called domain(Ai).
Relation Instance:
A relational instance, denoted as r, is a collection of tuples for a given relational schema at a specific
point of time.
A relation state r of the relation schema R(A1,A2,…,An), also denoted by r(R), is a set of n-tuples
r={t1,t2,…,tm}, where each n-tuple is an ordered list of n values
t=<v1,v2,…,vn>, where each vi belongs to domain(Ai) or is a null value.
1.b. Define weak entity and cardinality ratio
Ans:
Weak entity:
Entity types that do not contain any key attribute, and hence cannot be identified independently, are
called weak entity types. A weak entity can be identified uniquely only by considering some of its
attributes in conjunction with the primary key attribute of another entity, which is called the identifying
owner entity.
Cardinality Ratio (Mapping Cardinalities):
Mapping cardinalities or cardinality ratios, express the number of entities to which another entity can be
associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets, although they can
contribute to the description of relationship sets that involve more than two entity sets.
For a binary relationship set R between entity sets A and B, the mapping cardinalities must be one of the
following:
One-to-one, one-to-many, many-to-one, many-to-many
1.c. Why are null values considered to be bad in a relation?
Ans:
An attribute value that is unknown to the user is called a NULL-valued attribute.
NULL values are not allowed for key attributes, because the key attributes are used to uniquely
identify the tuples in a relation. If a key value is null, we cannot uniquely identify the tuple.
1.d. Define database authorization.
Ans:
Database authorization is the process by which one user gives other users permission to use that user's
database objects. In Oracle, the grant command gives different database privileges to other users, and the
revoke command cancels those privileges.
1.e. Explain functional dependence.
Ans:
The functional dependency x→y holds on schema R if, in any legal relation r(R), for all
pairs of tuples t1 and t2 in r such that t1[x]=t2[x], it is also the case that t1[y]=t2[y].
In order(orderno,orderdate,itemno,price,qtysold)
the functional dependencies are
orderno→orderdate
itemno→price
orderno,itemno→qtysold
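The definition can be checked mechanically against a relation instance. The sketch below is only an illustration: the helper name fd_holds and the sample rows are made up for this example, and a check on one instance can only refute a dependency, never prove that it holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by every pair of rows."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time and returns the stored value after that
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True

# Hypothetical instance of order(orderno, orderdate, itemno, price, qtysold)
orders = [
    {"orderno": 1, "orderdate": "2006-01-05", "itemno": "I1", "price": 10, "qtysold": 3},
    {"orderno": 1, "orderdate": "2006-01-05", "itemno": "I2", "price": 25, "qtysold": 1},
    {"orderno": 2, "orderdate": "2006-01-09", "itemno": "I1", "price": 10, "qtysold": 7},
]

print(fd_holds(orders, ["orderno"], ["orderdate"]))
print(fd_holds(orders, ["itemno"], ["price"]))
print(fd_holds(orders, ["orderno", "itemno"], ["qtysold"]))
print(fd_holds(orders, ["itemno"], ["qtysold"]))
```

The last call fails on this sample because item I1 appears with two different quantities, which is why qtysold needs the whole pair (orderno, itemno) on the left-hand side.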
1.f. What is data encryption?
Ans:
Data encryption is a method in which data is converted to a specific format using certain techniques so
that the data cannot be used by unauthorized users.
1.g. Define 4th normal form
Ans:
FOURTH NORMAL FORM:
Defn:
Given a relation scheme R such that the set D of FDs and MVDs is satisfied, consider sets of attributes X
and Y where X ⊆ R and Y ⊆ R. The relation scheme R is in 4NF if, for all
multivalued dependencies of the form X →→ Y Є D+,
either X →→ Y is a trivial MVD or X is a super key of R.
1.h. What are update anomalies?
Update Anomalies:
Multiple copies of the same fact may lead to update anomalies or inconsistencies when an update is
made and only some of the multiple copies are updated. Thus, a change in the phone_no of Jones must
be made, for consistency, in all tuples pertaining to the student Jones. If one of the tuples is not changed to
reflect the new phone_no of Jones, the database becomes inconsistent.
1.i.
What are the various types of database users?
Ans:
Database users :
Naive users :
Users who need not be aware of the presence of the database system or any other system
supporting their usage are considered naïve users. A user of an automatic teller machine falls into this
category.
Online users :
These are users who may communicate with the database directly via an online terminal or
indirectly via a user interface and application program. These users are aware of the database system
and also know the data manipulation language of the system.
Application programmers :
Professional programmers who are responsible for developing application programs or user
interfaces utilized by the naïve and online users fall into this category.
Database administrator :
A person who has central control over the system is called the database administrator.
1.j.
What do you mean by database audit?
Ans:
Database audit is the process of recording all details of the DML operations on a
database in a table in secondary memory. This table can be used for auditing and for checking back
transactions very easily.
2. Consider the following three tables, sailors, reserves and boats, having the following attributes.
Sailors (Salid,Salname,Rating,Age)
Reserves(Salid,Boatid,Day)
Boats(Boatid,boat_name,color)
Use the above schema and solve the queries in relational algebra.
i)
Find the names of the sailors who have reserved boat 103.
Ans:
∏salname(σ Sailors.salid=Reserves.salid and Reserves.boatid=Boats.boatid and Reserves.boatid=103 (Sailors X Reserves X Boats))
ii)
Find the names of the sailors who have reserved a red boat.
Ans:
∏salname(σ Sailors.salid=Reserves.salid and Reserves.boatid=Boats.boatid and Boats.color="RED" (Sailors X Reserves X Boats))
iii)
Find the colors of the boats reserved by Lubber.
Ans:
∏color(σ Sailors.salid=Reserves.salid and Reserves.boatid=Boats.boatid and Sailors.salname="Lubber" (Sailors X Reserves X Boats))
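The three queries can be traced on a small instance. The following is a sketch only: the sample tuples are invented for illustration, the third query is read as asking for the colors of the boats reserved by Lubber, and each set comprehension plays the role of a selection over the Cartesian product followed by a projection (a set removes duplicates, as projection does).

```python
# Hypothetical instances; tuple layouts follow the schemas given in the question.
sailors = [(22, "Dustin", 7, 45), (31, "Lubber", 8, 55), (58, "Rusty", 10, 35)]   # (salid, salname, rating, age)
reserves = [(22, 101, "10/10"), (22, 103, "10/8"), (31, 102, "11/10"),
            (31, 103, "11/6"), (58, 104, "9/8")]                                   # (salid, boatid, day)
boats = [(101, "Interlake", "blue"), (102, "Interlake", "red"),
         (103, "Clipper", "green"), (104, "Marine", "red")]                        # (boatid, boat_name, color)

def names_reserving_boat(boat_id):
    # select on Sailors.salid = Reserves.salid and Reserves.boatid = boat_id, project salname
    return {s[1] for s in sailors for r in reserves
            if s[0] == r[0] and r[1] == boat_id}

def names_reserving_color(color):
    # the join condition on boatid is needed, otherwise every sailor pairs with every red boat
    return {s[1] for s in sailors for r in reserves for b in boats
            if s[0] == r[0] and r[1] == b[0] and b[2] == color}

def colors_reserved_by(name):
    # same join, but the projection is on Boats.color
    return {b[2] for s in sailors for r in reserves for b in boats
            if s[0] == r[0] and r[1] == b[0] and s[1] == name}

print(names_reserving_boat(103))
print(names_reserving_color("red"))
print(colors_reserved_by("Lubber"))
```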
3. Explain with examples the following SQL commands
i)
CREATE –both view and table
ans:
create table tablename (columnname1 datatype1, columnname2 datatype2, …)
This statement creates a new table for a user.
eg: create table student(rollno varchar2(10),name varchar2(20),city varchar2(20))
create view viewname as select columnlist from tablename [where condition]
eg: create view v1 as select rollno,name,city from student where city='bbsr'
This statement creates a new view from a given table according to the user's criteria.
ii)
ALTER(add):
Adding new column to a table:
ALTER table tablename add (new_column datatype(size))
Eg:
ALTER table student add (phone_no varchar2(10))
ALTER(modify):
Modifying the size of a given column of a table:
ALTER table tablename modify (column datatype(new_size))
Eg:
ALTER table student modify (name varchar2(50))
iii)
SELECT –group by, having apart from FROM and WHERE
Ans:
The select statement with a group by clause retrieves records from the table group-wise. The where clause
filters rows before grouping, and the having clause filters the resulting groups.
Eg:
Display deptno and total salary of the employees of deptno 10 whose city='bbsr'
Ans:
Select deptno,sum(sal) from emp where city='bbsr' group by deptno having deptno=10
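The commands above can be tried without an Oracle installation. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it is written in the SQLite dialect rather than Oracle's: varchar replaces varchar2, ALTER uses the "add column" form, and ALTER ... MODIFY is left out because SQLite does not support it. The sample rows and the emp column list are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE and CREATE VIEW
cur.execute("create table student (rollno varchar(10), name varchar(20), city varchar(20))")
cur.executemany("insert into student values (?, ?, ?)",
                [("s1", "sujit", "bbsr"), ("s2", "minati", "ctc"), ("s3", "kunal", "bbsr")])
cur.execute("create view v1 as select rollno, name, city from student where city = 'bbsr'")
view_rows = cur.execute("select rollno from v1 order by rollno").fetchall()

# ALTER (add): SQLite spells this 'add column' and takes no parentheses
cur.execute("alter table student add column phone_no varchar(10)")
columns = [row[1] for row in cur.execute("pragma table_info(student)")]

# SELECT with WHERE, GROUP BY and HAVING
cur.execute("create table emp (empno integer, deptno integer, sal integer, city varchar(20))")
cur.executemany("insert into emp values (?, ?, ?, ?)",
                [(1, 10, 1000, "bbsr"), (2, 10, 1500, "bbsr"),
                 (3, 10, 900, "ctc"), (4, 20, 2000, "bbsr")])
totals = cur.execute(
    "select deptno, sum(sal) from emp where city = 'bbsr' "
    "group by deptno having deptno = 10").fetchall()

print(view_rows, columns, totals)
conn.close()
```

The employee from 'ctc' is dropped by the where clause before grouping, and department 20 is dropped by the having clause after grouping.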
4.
Consider the universal relation R={A,B,C,D,E,F,G,H,I,J} and set of functional dependencies.
F={ AB→C,A→DE,B→F,F→GH,D→IJ}
What is the key for R ? Decompose R into 2NF relations.
ans:
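In relation R
(AB)+ = {A,B,C,D,E,F,G,H,I,J}
A+= {A,D,E,I,J}
B+= {B,F,G,H}
Since (AB)+ contains every attribute of R while neither A+ nor B+ does, AB is the key of R.
The non-prime attributes D,E,I,J depend on A alone and F,G,H depend on B alone, so R is not in 2NF.
Removing these partial dependencies gives the 2NF relations
R1(A,D,E,I,J), R2(B,F,G,H), R3(A,B,C)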
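The closures used in this answer can be recomputed with the standard attribute-closure procedure. The sketch below is illustrative; the function name closure and the encoding of each FD as a pair of strings are choices made for this example. Note that the closure of B also picks up G and H through F→GH.

```python
def closure(attrs, fds):
    """Attribute closure: keep adding right-hand sides whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = {AB->C, A->DE, B->F, F->GH, D->IJ}, one character per attribute
F = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

for x in ("AB", "A", "B"):
    print(x, sorted(closure(x, F)))
```

Only AB reaches all ten attributes, so AB is the key; A and B on their own each determine some non-prime attributes, which is exactly the partial dependency that 2NF forbids.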
5. Draw a state transition diagram and discuss the typical states that a transaction goes through
during execution
Ans:
TRANSACTION STATE:
A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
• Committed, after successful completion
The state diagram corresponding to a transaction appears in Figure 15.1. We say
that a transaction has committed only if it has entered the committed state. Similarly, we say that a
transaction has aborted only if it has entered the aborted state. A transaction is said to have terminated
if it has either committed or aborted.
A transaction starts in the active state. When it finishes its final statement, it enters the partially
committed state. At this point, the transaction has completed its execution, but it is still possible that it
may have to be aborted, since the actual output may still be temporarily residing in main memory, and
thus a hardware failure may preclude its successful completion.
The database system then writes out enough information to disk that, even in the event of a failure, the
updates performed by the transaction can be re-created when the system restarts after the failure.
When the last of this information is written out, the transaction enters the committed state.
Once a transaction has entered the aborted state, the system has two options:
• It can restart the transaction, but only if the transaction was aborted as a result of some
hardware or software error that was not created through the internal logic of the transaction. A
restarted transaction is considered to be a new transaction.
• It can kill the transaction. It usually does so because of some internal logical error that can be
corrected only by rewriting the application program, or because the input was bad, or because
the desired data were not found in the database.
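The states listed above can be written down as a small transition table. This is a sketch under assumptions: the table TRANSITIONS and the helpers run and terminated are names invented here, and the edges from active and partially committed to failed follow the usual textbook diagram, which the text describes but does not reproduce.

```python
# Allowed moves between transaction states (assumed from the standard diagram).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state
}

def run(path):
    """Follow a sequence of states starting from 'active'; reject illegal moves."""
    state = "active"
    for nxt in path:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {nxt}")
        state = nxt
    return state

def terminated(state):
    # a transaction has terminated if it has either committed or aborted
    return state in {"committed", "aborted"}

print(run(["partially committed", "committed"]))
print(run(["partially committed", "failed", "aborted"]))
```

The second path models a failure after the final statement but before the commit information reached disk, which is the case the text warns about.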
6.
Explain the informal design guidelines for relation schema
Ans:
Data base design process:
We can identify six main phases of the database design process:
1. Requirement collection and analysis
2. Conceptual data base design
3. Choice of a DBMS
4. Data model mapping(logical database design)
5. physical data base design
6. database system implementation and tuning
1. Requirement collection and analysis
Before we can effectively design a database we must know and analyze the expectations of the
users and the intended uses of the database in as much detail as possible.
2. Conceptual data base design
The goal of this phase is to produce a conceptual schema for the database that is independent
of a specific DBMS.
• We often use a high-level data model such as the ER model during this phase.
• We specify as many of the known database applications or transactions as possible,
using a notation that is independent of any specific DBMS.
• Often the DBMS choice is already made for the organization; the intent of
conceptual design is still to keep it as free as possible from implementation
considerations.
3. Choice of a DBMS
The choice of DBMS is governed by a number of factors, some technical, others economic, and still others
concerned with the politics of the organization.
The economic and organizational factors that affect the choice of the DBMS are:
software cost, maintenance cost, hardware cost, database creation and conversion cost,
personnel cost, training cost, operating cost.
4. Data model mapping (logical database design)
During this phase, we map the conceptual schema from the high-level data model used in phase
2 into the data model of the chosen DBMS.
5. Physical database design
During this phase we design the specifications for the database in terms of physical storage
structures, record placement and indexes.
6. Database system implementation and tuning
During this phase, the database and application programs are implemented, tested and
eventually deployed for service.
7.
What is the need for normalization? Explain the first, second and third normal forms with an example.
Ans:
NORMALIZATION
The basic objective of normalization is to reduce redundancy which means that information is to
be stored only once. Storing information several times leads to wastage of storage space and
increase in the total size of the data stored. Relations are normalized so that when relations in a
database are to be altered during the life time of the database, we do not lose information or
introduce inconsistencies. The type of alterations normally needed for relations are:
o Insertion of new data values to a relation. This should be possible without being forced to
leave blank fields for some attributes.
o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital
information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without
exhaustively searching all the tuples in the relation.
FIRST NORMAL FORM:
Defn: A relation scheme is said to be in first normal form(1NF) if the values in the domain of each
attribute of the relation are atomic. In other words, only one value is associated with each attribute and
the value is not a set of values or a list of values.
Functional dependencies are:
orderno → orderdate
SECOND NORMAL FORM:
Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and if all non-prime
attributes are fully functionally dependent on the relation keys.
A relation is said to be in 2NF if it is in 1NF and non-key attributes are functionally dependent on the
key attribute(s). Further, if the key has more than one attribute then no non-key attribute should be
functionally dependent upon a part of the key attributes. Consider, for example, the relation given in
table 1. This relation is in 1NF. The key is (Order no., Item code). The dependency diagram for
attributes of this relation is shown in figure 5. The non-key attribute Price_Unit is functionally
dependent on Item code, which is part of the relation key. Also, the non-key attribute Order date is
functionally dependent on Order no., which is a part of the relation key.
Thus the relation is not in 2NF. It can be transformed to 2NF by splitting it into three
relations as shown in table 3.
In table 3 the relation Orders has Order no. as the key. The relation Order details has the
composite key Order no. and Item code. In both relations the non-key attributes are
functionally dependent on the whole key. Observe that by transforming to 2NF relations the
THIRD NORMAL FORM:
Defn: A relational scheme R<S,F> is in third normal form (3NF) if for all non-trivial functional dependencies
in F+ of the form X→A, either X contains a key (i.e., X is a super key) or A is a prime attribute.
A third normal form normalization will be needed where all attributes in a relation tuple are not
functionally dependent only on the key attribute. If two non-key attributes are functionally dependent,
then there will be unnecessary duplication of data. Consider the relation given in table 4. Here, Roll no.
is the key and all other attributes are functionally dependent on it. Thus it is in 2NF. If it is known
that in the college all first year students are accommodated in Ganga hostel, all second year students
in Kaveri, all third year students in Krishna, and all fourth year students in Godavari, then the
non-key attribute Hostel name is dependent on the non-key attribute Year. This dependency is shown in figure 6.
Observe that given the year of a student, his hostel is known and vice versa. The dependency of hostel on
year leads to duplication of data, as is evident from table 4. If it is decided to ask all first year students to
move to Kaveri hostel, and all second year students to Ganga hostel, this change should be made in
many places in table 4. Also, when a student's year of study changes, his hostel change should also be
noted in table 4. This is undesirable. A relation is said to be in 3NF if it is in 2NF and no non-key attribute
is functionally dependent on any other non-key attribute. Table 4 is thus not in 3NF. To transform it to
3NF, we should introduce another relation which includes the functionally related non-key attributes.
This is shown in table 5.
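The effect of this decomposition can be sketched in a few lines. The tables themselves are not reproduced in these notes, so the rows below are invented; the attribute names rollno, name, year and hostel are assumed from the discussion, and to_3nf and hostel_of are hypothetical helper names.

```python
# Hypothetical rows in the shape described for table 4: (Roll no., Name, Year, Hostel name)
table4 = [
    {"rollno": 1, "name": "Asha", "year": 1, "hostel": "Ganga"},
    {"rollno": 2, "name": "Bimal", "year": 1, "hostel": "Ganga"},
    {"rollno": 3, "name": "Chitra", "year": 2, "hostel": "Kaveri"},
]

def to_3nf(rows):
    """Split off the dependency year -> hostel into its own relation."""
    student = [{k: r[k] for k in ("rollno", "name", "year")} for r in rows]
    year_hostel = {r["year"]: r["hostel"] for r in rows}   # one entry per year
    return student, year_hostel

def hostel_of(rollno, student, year_hostel):
    # the hostel is recovered by joining the two relations on year
    year = next(s["year"] for s in student if s["rollno"] == rollno)
    return year_hostel[year]

student, year_hostel = to_3nf(table4)

# Swapping the hostels of first and second year students is now one change per year,
# no matter how many students there are.
year_hostel[1], year_hostel[2] = "Kaveri", "Ganga"
print([hostel_of(n, student, year_hostel) for n in (1, 2, 3)])
```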
8. Explain the database recovery technique based on deferred update.
Ans:
To ensure atomicity despite failures, we first output information describing the modifications to
stable storage without modifying the database itself.
We study two approaches:
• log-based recovery, and
• shadow-paging
We assume (initially) that transactions run serially, that is, one after the other.
Two approaches using logs
• Deferred database modification
• Immediate database modification
Deferred Database Modification
• The deferred database modification scheme records all modifications to the log, but defers all
the writes to after partial commit.
• Assume that transactions execute serially.
• A transaction starts by writing a <Ti start> record to the log.
• A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value
for X.
o Note: the old value is not needed for this scheme.
• The write is not performed on X at this time, but is deferred.
• When Ti partially commits, <Ti commit> is written to the log.
• Finally, the log records are read and used to actually execute the previously deferred writes.
• During recovery after a crash, a transaction needs to be redone if and only if both <Ti start>
and <Ti commit> are there in the log.
• Redoing a transaction Ti (redo Ti) sets the value of all data items updated by the transaction to
the new values.
• Crashes can occur while
o the transaction is executing the original updates, or
o while recovery action is being taken.
Example transactions T0 and T1 (T0 executes before T1):
T0: read (A)          T1: read (C)
    A := A - 50           C := C - 100
    write (A)             write (C)
    read (B)
    B := B + 50
    write (B)
Below we show the log as it appears at three instances of time.
If the log on stable storage at the time of the crash is as in case:
(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since
<T0 commit> and <T1 commit> are present
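The three cases can be replayed with a small redo routine. This is a sketch under assumptions: the log contents for the three instants are not reproduced in these notes, so the initial values A=1000, B=2000, C=700 and the exact cut points of the log are chosen here for illustration, and redo_recovery is a hypothetical name.

```python
def redo_recovery(log, db):
    """Deferred-modification recovery: redo only the writes of committed transactions."""
    state = dict(db)                                   # leave the caller's copy untouched
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                                    # replay in log order
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            state[item] = new_value                    # redo sets the new value; old value not needed
    return state

initial = {"A": 1000, "B": 2000, "C": 700}             # assumed starting values
full_log = [
    ("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050), ("commit", "T0"),
    ("start", "T1"), ("write", "T1", "C", 600), ("commit", "T1"),
]
case_a = full_log[:3]    # crash before <T0 commit>
case_b = full_log[:6]    # <T0 commit> present, T1 not yet committed
case_c = full_log        # both commits present

for name, log in (("a", case_a), ("b", case_b), ("c", case_c)):
    print(name, redo_recovery(log, initial))
```

Because redo only assigns new values, running it twice gives the same result, which is what makes a crash during recovery harmless.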
RELATIONAL DATABASE MANAGEMENT SYSTEMS
Btech-4TH-SEMESTER EXAMINATION ,2008
1.
a.
What is the difference between a primary key and a candidate key.
Ans:
Candidate key:
In a relation R, a candidate key for R is a subset of the set of attributes of R which has the
following properties:
• Uniqueness: no two distinct tuples in R have the same values for
the candidate key.
• Irreducibility: no proper subset of the candidate key has the
uniqueness property.
Eg: (cname,telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys.
b. Let R=(A,B,C,D) and functional dependencies
A→C, AB→D.
What is the closure of {AB}.
Ans:
Result={AB}
In A→C, A is a subset of result {AB},
So result={AB}U {C}={ABC}
In AB→D, AB is subset of result {ABC},
So result={ABC}U {D}={ABCD}
{AB}+={ABCD}
c. What do you mean by semi less join ?
Ans
Semijoin Strategy
• Let r1 be a relation with schema R1 stored at site S1.
• Let r2 be a relation with schema R2 stored at site S2.
• Evaluate the expression r1 ⋈ r2 and obtain the result at S1.
1. Compute temp1 ← ∏ R1 ∩ R2 (r1) at S1.
2. Ship temp1 from S1 to S2.
3. Compute temp2 ← r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1.
5. Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.
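The five steps can be simulated in a single process to see why the result equals the full natural join. In this sketch the two sites are just comments, relations are lists of dictionaries, and the names project, natural_join and semijoin_strategy as well as the sample data are invented for the illustration.

```python
def project(rel, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rel}
    return [dict(zip(attrs, values)) for values in seen]

def natural_join(r, s):
    """Pair up tuples that agree on every attribute name they share."""
    out = []
    for a in r:
        for b in s:
            common = set(a) & set(b)
            if all(a[c] == b[c] for c in common):
                out.append({**a, **b})
    return out

def semijoin_strategy(r1, r2):
    common = sorted(set(r1[0]) & set(r2[0]))   # R1 ∩ R2 (assumes both relations are non-empty)
    temp1 = project(r1, common)                # step 1, computed at S1
    # step 2: ship temp1 from S1 to S2 (only the join columns travel)
    temp2 = natural_join(r2, temp1)            # step 3, computed at S2
    # step 4: ship temp2 from S2 to S1 (only the r2 tuples that will actually join)
    return natural_join(r1, temp2)             # step 5, computed at S1

r1 = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 3, "B": "z"}]
r2 = [{"A": 1, "C": 10}, {"A": 3, "C": 30}, {"A": 4, "C": 40}]
print(semijoin_strategy(r1, r2))
```

The saving comes from step 4: the tuple of r2 with A = 4 never leaves S2 because it has no partner in r1.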
d. Define super key and give example to illustrate the super key,
Ans:
A super key is a set of one or more attributes that, taken collectively, allow us to identify uniquely an
entity in the entity set.
For example: customer-id, (cname,customer-id), (cname,telno)
e. What are two techniques to prevent deadlock.
Ans:
Deadlock prevention techniques:
wait-die scheme (non-preemptive)
• An older transaction may wait for a younger one to release a data item.
Younger transactions never wait for older ones; they are rolled back
instead.
• A transaction may die several times before acquiring the needed data item.
wound-wait scheme (preemptive)
• An older transaction wounds (forces rollback of) a younger transaction
instead of waiting for it. Younger transactions may wait for older ones.
• There may be fewer rollbacks than in the wait-die scheme.
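Both schemes reduce to a comparison of transaction timestamps. The sketch below assumes that a smaller timestamp means an older transaction; the function names and the returned strings are chosen here for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "rollback requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger requester waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"

# T5 (older, timestamp 5) and T9 (younger, timestamp 9) compete for the same data item
print(wait_die(5, 9), "|", wait_die(9, 5))
print(wound_wait(5, 9), "|", wound_wait(9, 5))
```

In both schemes it is always the younger transaction that is rolled back, so a waits-for cycle can never form.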
f. What do you mean by multi valued dependency.
Ans:
MULTIVALUED DEPENDENCY:
Defn: Given a relation scheme R, let X, Y and Z be subsets of attributes of R, with Z = R − (X ∪ Y). Then the
multivalued dependency X →→ Y holds in a relation r defined on R if, given two tuples t1 and t2 in r with
t1(X)=t2(X),
r contains two tuples t3 and t4 with the following characteristics: t1, t2, t3, t4 have the same X value, i.e.,
t1(X)=t2(X)=t3(X)=t4(X)
The Y values of t1 and t3 are the same and the Y values of t2 and t4 are the same, i.e.,
t1(Y)=t3(Y), t2(Y)=t4(Y)
and t2(Z)=t3(Z), t1(Z)=t4(Z)
Eg:
course →→ teacher
course →→ book
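The definition can be tested on a relation instance by building the required tuple t3 for every pair (t1, t2) and checking that it is present; visiting the pairs in both orders also covers t4. The sketch below is illustrative only: mvd_holds is a hypothetical helper and the course, teacher and book values are made up.

```python
def mvd_holds(rows, x, y, attrs):
    """Check X ->> Y on an instance: for each t1, t2 agreeing on X, t3 must exist."""
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and the remaining attributes (Z) from t2
                t3 = tuple(t1[a] if (a in x or a in y) else t2[a] for a in attrs)
                if t3 not in present:
                    return False
    return True

attrs = ["course", "teacher", "book"]
ctx = [
    {"course": "physics", "teacher": "Green", "book": "Mechanics"},
    {"course": "physics", "teacher": "Green", "book": "Optics"},
    {"course": "physics", "teacher": "Brown", "book": "Mechanics"},
    {"course": "physics", "teacher": "Brown", "book": "Optics"},
]

print(mvd_holds(ctx, ["course"], ["teacher"], attrs))
print(mvd_holds(ctx[:3], ["course"], ["teacher"], attrs))
```

Dropping a single row breaks the dependency, which shows the redundancy the MVD forces: every teacher of a course has to be repeated with every book of that course.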
g. Define and differentiate between natural join and inner join
ans:
The natural join is a binary operation that allows us to combine certain selections and a Cartesian
product into one operation. It is denoted by the “join” symbol ⋈. The natural-join operation forms a
Cartesian product of its two arguments, performs a selection forcing equality on those attributes that
appear in both relation schemas, and finally removes duplicate attributes.
An inner join, in contrast, combines the tuples of the two relations that satisfy an explicitly stated join
condition and does not remove the duplicate attributes.
h. What is meant by concurrency?
Concurrent Executions:
Multiple transactions are allowed to run concurrently in the system.
Advantages are:
• increased processor and disk utilization, leading to better transaction throughput: one
transaction can be using the CPU while another is reading from or writing to the disk
• reduced average response time for transactions: short transactions need not wait behind
long ones.
i. Mention the various categories of data model.
Ans:
The data model describes the structure of a database. It is a collection of conceptual tools for describing
data, data relationships and consistency constraints. The various types of data model are:
1. Object based logical model
2. Record based logical model
3. Physical model
Types of data model:
1. Object based logical model
a. ER-model
b. Functional model
c. Object oriented model
d. Semantic model
2. Record based logical model
a. Hierarchical database model
b. Network model
c. Relational model
3. Physical model
j.
Define : Entity type, Entity set and value set.
Ans:
An entity is a “thing” or “object” in the real world that is distinguishable from all other
objects. An entity type describes the characteristics of an entity.
Customer(cno,name,city)
An entity set is a set of entities of the same type that share the same properties, or attributes.
The set of all persons who are customers at a given bank, for example, can be defined as the entity set
customer.
A value set specifies the related attribute values for an entity set.
Value set of customer:
(c001,sujit,bbsr),(c002,minati,ctc),(c003,kunal,bbsr)
2.a
What is normalization? What is a key attribute in a relation .Explain the difference between first,
second and third normal forms.
Ans:
NORMALIZATION
The basic objective of normalization is to reduce redundancy which means that information is to
be stored only once. Storing information several times leads to wastage of storage space and
increase in the total size of the data stored. Relations are normalized so that when relations in a
database are to be altered during the life time of the database, we do not lose information or
introduce inconsistencies. The type of alterations normally needed for relations are:
o Insertion of new data values to a relation. This should be possible without being forced to
leave blank fields for some attributes.
o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital
information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without
exhaustively searching all the tuples in the relation.
Key attribute:
The key attribute of a relation uniquely identifies a row in a relation
Types of key attributes are :
Super key ,candidate key, primary key
FIRST NORMAL FORM:
Defn: A relation scheme is said to be in first normal form (1NF) if the values in the domain of each
attribute of the relation are atomic. In other words, only one value is associated with each attribute and
the value is not a set of values or a list of values.
The first normal form separates repeating and non-repeating attributes.
SECOND NORMAL FORM:
Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and if all non-prime
attributes are fully functionally dependent on the relation keys.
The second normal form removes partial dependencies.
THIRD NORMAL FORM:
Defn: A relational scheme R<S,F> is in third normal form (3NF) if for all non-trivial functional dependencies
in F+ of the form X→A, either X contains a key (i.e., X is a super key) or A is a prime attribute.
The third normal form removes transitive dependencies from a relation.
2.b
Define entity, attribute and relationships as used in relational databases. Describe the purpose of the ER model. Illustrate your answer.
Ans:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For
example, each person in an enterprise is an entity. An entity has a set of properties, and the values for
some set of properties may uniquely identify an entity.
BOOK is an entity and its properties (called attributes) are bookcode, booktitle, price, etc.
Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set.
Customer is an entity and its attributes are customerid, customername, custaddress, etc.
An attribute, as used in the E-R model, can be characterized by the following attribute types:
• Simple and composite attributes
• Single-valued and multi-valued attributes
• Derived attributes
• NULL-valued attributes
Relationship sets:
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on
n>=2 entity sets. If E1, E2, …, En are entity sets, then a relationship set R is a subset of
{(e1,e2,…,en) | e1 Є E1, e2 Є E2, …, en Є En}
where (e1,e2,…,en) is a relationship.
[Figure: E-R diagram in which the relationship set borrow connects the entity sets customer and loan]
Consider the two entity sets customer and loan. We define the relationship set borrow to denote the
association between customers and the bank loans that the customers have.
Entity Relationship Model
The entity-relationship data model perceives the real world as consisting of basic objects, called entities
and relationships among these objects. It was developed to facilitate data base design by allowing
specification of an enterprise schema which represents the overall logical structure of a data base.
Main features of ER-MODEL:
• Entity relationship model is a high-level conceptual model.
• It allows us to describe the data involved in a real-world enterprise in terms of objects and their
relationships.
• It is widely used to develop an initial design of a database.
• It provides a set of useful concepts that make it convenient for a developer to move from a
basic set of information to a detailed description of information that can be easily
implemented in a database system.
• It describes data as a collection of entities, relationships and attributes.
Ans:
B-Tree Index Files
• Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates
redundant storage of search keys.
• Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer
field for each search key in a nonleaf node must be included.
• Generalized B-tree leaf node
• Nonleaf node: pointers Bi are the bucket or file record pointers
B-Tree Index File Example
[Figure: B-tree index file example, shown as a sequence of insertion diagrams]
Ans:
RELATIONAL MODEL
Relational model is a simple model in which the database is represented as a collection of “relations”, where
each relation is represented by a two-dimensional table.
The relational model consists of three major components:
1. The set of relations and set of domains that define the way data can be represented (data
structures)
The relational model was introduced by E. F. Codd of IBM in 1970. The basic concept in the relational
model is that of a relation.
Properties:
o It is column homogeneous. In other words, in any given column of a table, all items are of the
same kind.
o Each item is a simple number or a character string. That is, a table must be in first normal form.
o All rows of a table are distinct.
o The ordering of rows within a table is immaterial.
o The columns of a table are assigned distinct names and the ordering of these columns is
immaterial.
2. Integrity rules that define the procedure to protect data (data integrity)
Integrity constraints:
There are two types of integrity constraints:
o Entity integrity constraints
o Referential integrity constraints
Entity integrity constraints:
It states that primary key values must be unique and cannot be null. This is because the primary key is used to
identify individual tuples in the relation, so we would not be able to uniquely identify records containing
null values for the primary key attributes. This constraint is specified on one individual relation.
Referential integrity constraints:
It states that a tuple in one relation that refers to another relation must refer to an existing tuple in
that relation. This constraint is specified on two relations.
A column declared as a foreign key must refer to the primary key of another table.
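Both constraints can be observed with Python's built-in sqlite3 module. This is a sketch in the SQLite dialect, not Oracle: the dept and emp tables are hypothetical, foreign key checking has to be switched on with a pragma, and the primary keys are declared as text with an explicit not null, because SQLite would otherwise accept a null in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite checks foreign keys only when asked to
conn.execute("create table dept (deptno text primary key not null, dname text)")
conn.execute("create table emp (empno text primary key not null, "
             "deptno text references dept(deptno))")
conn.execute("insert into dept values ('10', 'sales')")

def try_insert(sql):
    """Run one insert and report whether the integrity rules accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert("insert into dept values (null, 'hr')"),      # entity integrity: null key
    try_insert("insert into dept values ('10', 'audit')"),   # entity integrity: duplicate key
    try_insert("insert into emp values ('e1', '10')"),       # refers to an existing dept tuple
    try_insert("insert into emp values ('e2', '99')"),       # referential integrity: no dept 99
]
print(results)
```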
3. The operations that can be performed on data (data manipulation)
They are insertion, deletion and updating of records in a relation.
The two models that can use SQL are:
relational model, object oriented model
Locking of tables
Both implicit and explicit read locks are still read locks; a read lock that has been obtained will
prevent any writes to the table.
Difference between implicit and explicit locks.
A select statement causes an implicit read lock for the duration of the select, which is released after the
select completes. If there are multiple selects, the read lock is obtained for each select separately, so
writes are possible in between the multiple selects.
An explicit read lock is kept until it is released. You can do multiple selects between LOCK
TABLES and UNLOCK TABLES and be sure that no data in the tables you have read-locked is
changed in between the multiple selects.
Take an example:
LOCK TABLES t1 READ, t2 READ;
SELECT * FROM t1;
SELECT * FROM t2;
UNLOCK TABLES;
With the explicit LOCK TABLES, you can be sure that there are no changes to the tables in between your
selects. So, the data you read is consistent.
Without LOCK TABLES, another connection is able to update
the two tables in between your selects. For example, after you complete the first select, another connection is
able to update the tables t1 and t2 before you execute the second select. In this case the data you
obtain from the two selects may no longer be consistent, since the data was changed between your
first and second select.
Therefore, you need the explicit read lock to get a consistent view of the data for the two selects.
4.
Object oriented database
Definition: An object-oriented database management system (OODBMS, sometimes shortened to ODBMS
for object database management system) is a database management system (DBMS) that
supports the modelling and creation of data as objects. This includes some kind of support for
classes of objects and the inheritance of class properties and methods by subclasses and their
objects. There is currently no widely agreed-upon standard for what constitutes an OODBMS,
and OODBMS products are considered to be still in their infancy. In the meantime, the
object-relational database management system (ORDBMS), the idea that object-oriented database
concepts can be superimposed on relational databases, is more commonly encountered in
available products. An object-oriented database interface standard is being developed by an
industry group, the Object Data Management Group (ODMG). The Object Management Group
(OMG) has already standardized an object-oriented data brokering interface between systems in
a network.
Object-Oriented Database Advantages
Why Versant's object-oriented database solutions instead of a traditional RDBMS?
Where data handling requirements are simple and suitable to rigid row and column structures, an RDBMS
might be an appropriate solution. However, for many applications, today's most challenging aspect is
controlling the inherent complexity of the subject matter itself - the complexity must be tamed. And
tamed in a way that enables continual evolution of the application as the environment and needs change.
For these applications, an object-oriented database is the best answer:
COMPLEX (INTER-) RELATIONSHIPS
If there are a lot of many-to-many relationships, tree structures or network (graph) structures then
Versant's object-oriented database solutions will handle those relationships much faster than a relational
database.
COMPLEX DATA
For many applications, the most challenging aspect is controlling the inherent complexity of the subject
matter itself - the complexity must be tamed. For these applications, a Versant object-oriented database is
the best answer. Architectures that mix technical needs such as persistence (and SQL) with the domain
model are an invitation to disaster. Versant's object-oriented database solutions let you develop using
objects that need only contain the domain behaviour, freeing you from persistence concerns.
NO MAPPING LAYER
It is difficult, time consuming, expensive in development, and expensive at run time, to map the objects
into a relational database and performance can suffer. Versant's object-oriented database solutions store
objects as objects - yes, it's as easy as 1, 2, 3. Versant's object database solutions are designed to store
many-to-many, tree and network relationships as named bi-directional associations without having the
need for JOIN tables. Hence, Versant's object database solutions save programming time, and objects can
be stored and retrieved faster. Modern O/R mapping tools may simplify many mapping problems,
however they don’t provide seamless data distribution or the performance of Versant's object-oriented
database solutions.
FAST AND EASY DEVELOPMENT, ABILITY TO COPE WITH CONTINUOUS EVOLUTION
The complexity of telecommunications infrastructure, transportation networks, simulations, financial
instruments and other domains must be tamed. And tamed in a way that enables continual evolution of the
application as the environment and needs change. Architectures that mix technical needs such as
persistence (and SQL) with the domain model are an invitation to disaster. Versant's object-oriented
database solutions let you develop using objects that need only contain the domain behaviour, freeing you
from persistence concerns.
5.a State Armstrong’s axioms. Show that Armstrong’s axioms are complete.
Ans:
The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F.
The closure of F can be found by using a collection of rules called Armstrong's axioms.
Reflexivity rule: If A is a set of attributes and B is a subset of A (B ⊆ A), then A→B holds.
Augmentation rule: If A→B holds and C is a set of attributes, then CA→CB holds.
Transitivity rule: If A→B holds and B→C holds, then A→C holds.
These three rules are Armstrong's axioms. The following additional rules can be derived from them:
Union rule: If A→B holds and A→C holds, then A→BC holds.
Decomposition rule: If A→BC holds, then A→B holds and A→C holds.
Pseudo transitivity rule: If A→B holds and BC→D holds, then AC→D holds.
Suppose we are given a relation schema R=(A,B,C,G,H,I) and the set of functional dependencies
A→B, A→C, CG→H, CG→I, B→H
We list several members of F+ here:
• A→H. Since A→B and B→H hold, the transitivity rule implies that A→H holds.
• CG→HI. Since CG→H and CG→I hold, the union rule implies that CG→HI holds.
• AG→I. Since A→C and CG→I hold, the pseudo transitivity rule implies that AG→I holds.
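The members of F+ listed above can be checked mechanically by computing attribute closures. The following is a minimal sketch, assuming each attribute is a single letter and each dependency is written as a pair of strings; the function names closure and implies are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, that is, rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# F for R = (A, B, C, G, H, I) as given in the text.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # transitivity
print(implies(F, "CG", "HI"))  # union
print(implies(F, "AG", "I"))   # pseudo transitivity
print(sorted(closure("AG", F)))
```

A dependency X→Y belongs to F+ exactly when Y is contained in the closure of X, so the three checks correspond to the three bullet points.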
5.b. Explain the difference between inner join and outer join. What are the restrictions on using outer join?
Give examples to support your answer.
Ans:
Inner join: An inner join (sometimes called a simple join) is a join of two or more tables that
returns only those rows that satisfy the join condition.
Outer join: An outer join extends the result of a simple join. An
outer join returns all rows that satisfy the join condition and also returns some or all of those rows from
one table for which no rows from the other satisfy the join condition.
What restrictions are imposed on outer join queries that use the operator (+)?
Outer join queries that use the Oracle join operator (+) are subject to the following rules and
restrictions, which do not apply to the FROM clause OUTER JOIN syntax:
• The (+) operator cannot be specified in a query block that also contains the FROM clause OUTER
JOIN syntax.
• The (+) operator can appear only in the WHERE clause, or in the context of left-correlation (that
is, when specifying the TABLE clause) in the FROM clause, and can be applied only to a column
of a table or view.
• If A and B are joined by multiple join conditions, then the (+) operator must be used in all of
these conditions. If not done, then Oracle Database will return only the rows resulting from a
simple join, but without a warning or error showing that the results are not of an outer join.
• The (+) operator does not produce an outer join if one table is specified in the outer query and
the other table in an inner query.
• The (+) operator cannot be used to outer-join a table to itself, although self joins are valid.
Example
The following statement is not valid:
SELECT EMPLOYEE_ID, MANAGER_ID
FROM EMPLOYEES
WHERE EMPLOYEES.MANAGER_ID(+) = EMPLOYEES.EMPLOYEE_ID;
However, the following self join is valid:
SELECT E1.EMPLOYEE_ID, E1.MANAGER_ID, E2.EMPLOYEE_ID
FROM EMPLOYEES E1, EMPLOYEES E2
WHERE E1.MANAGER_ID(+) = E2.EMPLOYEE_ID;
• The (+) operator can be applied only to a column, not to an arbitrary expression. However, an
arbitrary expression can contain one or more columns marked with the (+) operator.
• A WHERE condition containing the (+) operator cannot be combined with another condition
using the OR logical operator.
• A WHERE condition cannot use the IN comparison condition to compare a column marked with
the (+) operator with an expression.
• A WHERE condition cannot compare any column marked with the (+) operator with a sub-query.
• If the WHERE clause contains a condition that compares a column from table B with a
constant, then the (+) operator must be applied to the column so that Oracle returns the
rows from table A for which it has generated nulls for this column. Otherwise, Oracle
returns only the results of a simple join.
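The difference between the two kinds of join can be tried out with a small script. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of Oracle, the FROM clause OUTER JOIN syntax instead of the (+) operator, and hypothetical dept and emp tables with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptcode INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empcode INTEGER PRIMARY KEY, name TEXT, deptcode INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research'), (30, 'Audit');
    INSERT INTO emp  VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Mina', 10);
""")

# Inner join: only departments that have at least one matching employee.
inner = conn.execute("""
    SELECT d.dname, e.name FROM dept d
    JOIN emp e ON e.deptcode = d.deptcode
    ORDER BY d.deptcode, e.empcode
""").fetchall()

# Left outer join: every department, with NULL (None) where no employee matches.
outer = conn.execute("""
    SELECT d.dname, e.name FROM dept d
    LEFT OUTER JOIN emp e ON e.deptcode = d.deptcode
    ORDER BY d.deptcode, e.empcode
""").fetchall()

# A constant comparison in WHERE discards the NULL-extended rows again ...
where_filter = conn.execute("""
    SELECT d.dname, e.name FROM dept d
    LEFT OUTER JOIN emp e ON e.deptcode = d.deptcode
    WHERE e.name = 'Ravi'
    ORDER BY d.deptcode
""").fetchall()

# ... while the same comparison inside the join condition keeps them.
on_filter = conn.execute("""
    SELECT d.dname, e.name FROM dept d
    LEFT OUTER JOIN emp e ON e.deptcode = d.deptcode AND e.name = 'Ravi'
    ORDER BY d.deptcode
""").fetchall()

for label, rows in [("inner", inner), ("outer", outer),
                    ("filter in WHERE", where_filter), ("filter in ON", on_filter)]:
    print(label, rows)
```

The last two queries mirror the final rule in the list: a comparison with a constant that is not made part of the outer join condition turns the result back into that of a simple join.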
Ans:
Data redundancy and inconsistency:
The same information may be written in several files. This redundancy leads to higher storage
and access cost. It may also lead to data inconsistency; that is, the various copies of the same data may
no longer agree. For example, a changed customer address may be reflected in a single file but not
elsewhere in the system.
Reduction of redundancies in DBMS:
Centralized control of data by the DBA avoids unnecessary duplication of data and effectively
reduces the total amount of data storage required. Avoiding duplication also eliminates the
inconsistencies that tend to be present in redundant data files.
Eg:
Here order no 1456 with order date 260289, and order no 1886 with order date 1886, are replicated in the
relation.
Ans:
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys.
Foreign key:
A foreign key is an attribute or group of attributes derived from another table, in which the same
attribute or attributes must be the primary key.
Department(deptcode,dname)
Here the deptcode is the primary key.
Emp(empcode,name,city,deptcode).
Here the deptcode is foreign key.
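The two constraints can be observed on the Department and Emp relations above. The script below is a sketch that assumes Python's built-in sqlite3 module; the column types and the sample rows are illustrative and not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (deptcode INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE Emp (
        empcode  INTEGER PRIMARY KEY,
        name     TEXT,
        city     TEXT,
        deptcode INTEGER REFERENCES Department(deptcode)
    )
""")
conn.execute("INSERT INTO Department VALUES (10, 'Sales')")
conn.execute("INSERT INTO Emp VALUES (1, 'Asha', 'Dhenkanal', 10)")


def try_insert(sql):
    """Run one INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Primary key: a second department with deptcode 10 is not allowed.
dup_pk = try_insert("INSERT INTO Department VALUES (10, 'Audit')")
# Foreign key: an employee cannot refer to a department that does not exist.
bad_fk = try_insert("INSERT INTO Emp VALUES (2, 'Ravi', 'Cuttack', 99)")
# A reference to an existing department is accepted.
good_fk = try_insert("INSERT INTO Emp VALUES (3, 'Mina', 'Cuttack', 10)")

print(dup_pk, bad_fk, good_fk)
```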
Candidate keys of R:
AB+=ABC
AE+=ABCDE
CD+=ABCDE
BE+=ABCDE
A+=A
B+=B
C+=C
D+=AD
E+=E
Here AE, CD and BE are super keys,
because AE+, CD+ and BE+ each give all the attributes of relation R(A,B,C,D,E).
None of AE, CD and BE contains an extraneous attribute, so they are also candidate keys.
Distributed Database System
Distributing data across sites or departments in an organization allows those data to reside where they are
generated or most needed, but still to be accessible from other sites and from other departments. Keeping
multiple copies of the database across different sites also allows large organizations to continue their
database operations even when one site is affected by a natural disaster, such as flood, fire, or earthquake.
Distributed database systems handle geographically or administratively distributed data spread across
multiple database systems.
In a homogeneous distributed database
o All sites have identical software
o Sites are aware of each other and agree to cooperate in processing user requests
o Each site surrenders part of its autonomy in terms of right to change schemas or
software
o Appears to user as a single system
In a heterogeneous distributed database
o Different sites may use different schemas and software
• Difference in schema is a major problem for query processing
• Difference in software is a major problem for transaction processing
o Sites may not be aware of each other and may provide only
limited facilities for cooperation in transaction processing
(Distributed database system)
• A distributed database allows a user convenient and transparent access to data which is not stored at
the site, while allowing each site control over its own local data. A distributed database can be made
more reliable than a centralized system because if one site fails, the database can continue functioning,
but if the centralized system fails, the database can no longer continue with its normal operation. Also,
a distributed database allows parallel execution of queries and possibly splitting one query into many
parts to increase throughput.
• A centralized system is easier to design and implement. A centralized system is cheaper to operate
because messages do not have to be sent.
Ans:
SELECT * FROM S,R WHERE S.A=R.A AND S.B=R.G
RELATIONAL ALGEBRA:
∏ S.A, S.B, S.C, R.F, R.G (σ S.A=R.A AND S.B=R.G (S × R))
OUTPUT:
A  B  C  F  G
8  6  5  2  6
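The relational algebra expression can be evaluated step by step: Cartesian product first, then selection, then projection. The sketch below assumes schemas S(A, B, C) and R(A, F, G) and made-up tuples chosen so that the single output row (8, 6, 5, 2, 6) appears; the question's actual relations are not reproduced in these notes, and the helper names are illustrative.

```python
from itertools import product

# Assumed sample relations; attribute names are qualified to keep S.A and R.A apart.
S = [{"S.A": 8, "S.B": 6, "S.C": 5}, {"S.A": 3, "S.B": 4, "S.C": 1}]
R = [{"R.A": 8, "R.F": 2, "R.G": 6}, {"R.A": 3, "R.F": 7, "R.G": 9}]


def cartesian(r, s):
    """S x R: every tuple of r combined with every tuple of s."""
    return [{**x, **y} for x, y in product(r, s)]


def select(rel, pred):
    """Sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """Pi: keep the listed attributes and drop duplicate rows."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


result = project(
    select(cartesian(S, R),
           lambda t: t["S.A"] == t["R.A"] and t["S.B"] == t["R.G"]),
    ["S.A", "S.B", "S.C", "R.F", "R.G"],
)
print(result)
```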
(c) Define “Data Mining”. What support must be available in a DBMS to facilitate data mining?
Ans:
The term data mining refers loosely to the process of semi automatically analyzing
large databases to find useful patterns. Like knowledge discovery in artificial intelligence
(also called machine learning), or statistical analysis, data mining attempts to discover
rules and patterns from data. However, data mining differs from machine learning and
statistics in that it deals with large volumes of data, stored primarily on disk. That is, data
mining deals with “knowledge discovery in databases”.
What technological infrastructure is required?
Today, data mining applications are available on all size systems for mainframe, client/server,
and PC platforms. System prices range from several thousand dollars for the smallest
applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally
range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver
applications exceeding 100 terabytes. There are two critical technological drivers:
• Size of the database: the more data being processed and maintained, the more powerful the
system required.
• Query complexity: the more complex the queries and the greater the number of queries being
processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining
applications less than 50 gigabytes. However, this infrastructure needs to be significantly
enhanced to support larger applications. Some vendors have added extensive indexing
capabilities to improve query performance. Others use new hardware architectures such as
Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query
time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to
achieve performance levels exceeding those of the largest supercomputers.
RELATIONAL ALGEBRA:
Relational algebra is a set of basic operations used to manipulate the data in the relational model. These
operations enable the user to specify basic retrieval requests. The result of a retrieval is a new relation,
formed from one or more relations. These operations can be classified into two categories.
• Basic Set Operation
o Union
o Intersection
o Set difference
o Cartesian product
• Relational operations
o Select
o Project
o Join
o Division
Other operators are Rename and Assignment.
Ans:
Set Difference Operation
• Notation: r – s
• Defined as: r – s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible relations.
o r and s must have the same arity
o attribute domains of r and s must be compatible
Set Difference Operation – Example
Relations r, s:
r:
A  B
α  1
α  2
β  1
s:
A  B
α  2
β  3
r – s:
A  B
α  1
β  1
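Set difference maps directly onto Python sets. The sketch below represents a relation as a pair (schema, set of tuples) and checks only the arity; checking that the attribute domains are compatible is left out. The tuple values are illustrative assumptions.

```python
def set_difference(r, s):
    """Return r - s for relations given as (schema, set_of_tuples)."""
    r_schema, r_rows = r
    s_schema, s_rows = s
    # Compatibility check: both relations must have the same arity.
    if len(r_schema) != len(s_schema):
        raise ValueError("relations must have the same arity")
    # {t | t in r and t not in s}
    return (r_schema, {t for t in r_rows if t not in s_rows})


r = (("A", "B"), {("α", 1), ("α", 2), ("β", 1)})
s = (("A", "B"), {("α", 2), ("β", 3)})

schema, rows = set_difference(r, s)
print(schema, sorted(rows))
```

The tuple (α, 2) occurs in both relations and is removed; (β, 3) occurs only in s and therefore has no effect on the result.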
8.c. Construct a B+ tree of order 1 with the following keys: 1, 9, 5, 3, 7, 11, 17, 13, 15
Ans:
A B+-tree is a rooted tree satisfying the following properties:
• All paths from root to leaf are of the same length.
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
• A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
• Special cases:
o If the root is not a leaf, it has at least 2 children.
o If the root is a leaf (that is, there are no other nodes in the tree), it can have
between 0 and (n–1) values.
Here the order of the B+ tree is 1, which means a node can store at most n–1 = 0 elements. Such a tree
cannot be constructed, so the order must be greater than 1.
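The node limits stated above follow directly from the order n and can be computed with a few lines. This is a small sketch; the function name bplus_bounds and the dictionary keys are illustrative.

```python
from math import ceil


def bplus_bounds(n):
    """Return (minimum, maximum) occupancy limits for a B+ tree of order n."""
    if n < 2:
        # With n = 1 a node could hold at most n - 1 = 0 values.
        raise ValueError("order must be greater than 1")
    return {
        "internal_children": (ceil(n / 2), n),
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "root_leaf_values": (0, n - 1),
    }


print(bplus_bounds(4))
```

For order 4, an internal node has 2 to 4 children and a leaf holds 2 to 3 values, while an order of 1 is rejected.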
B+ tree for order 4
The elements to insert are 1,9,5,3,7,11,17,13,15.
[Figures: B+ tree of order 4 after each step: Insert 1, 9, 5; Insert 3; Insert 7; Insert 11; Insert 17; Insert 13]
Outer Joins: An outer join extends the result of a simple join. An
outer join returns all rows that satisfy the join condition and also returns some or all of those rows from
one table for which no rows from the other satisfy the join condition.
The outer join operation is an extension of the join operation to deal with missing information.
There are three forms of outer join
• left outer join
• right outer join
• full outer join
Notation of outer joins:
Left Outer Join: loan ⟕ borrower
Right Outer Join: loan ⟖ borrower
Full Outer Join: loan ⟗ borrower
RELATIONAL DATABASE MANAGEMENT SYSTEMS
MCA 3rd SEMESTER EXAMINATION, 2008
1.
a. What are the different operators in relational algebra ?
Relational algebra is a set of basic operations used to manipulate the data in the relational model. These
operations enable the user to specify basic retrieval requests. The result of a retrieval is a new relation,
formed from one or more relations. These operations can be classified into two categories.
• Basic Set Operation
o Union
o Intersection
o Set difference
o Cartesian product
• Relational operations
o Select
o Project
o Join
o Division
b. Define and differentiate between external, internal and conceptual levels.
Ans:
External level :
The external level is at the highest level of database abstraction. At this level, there will be many views
defined for different users' requirements. A view describes only a subset of the database. Any number
of user views may exist for a given global or subschema.
Conceptual level:
At this level of database abstraction all the database entities and the relationships among them
are included. One conceptual view represents the entire database. This conceptual view is defined by
the conceptual schema.
Internal level :
It is the lowest level of abstraction, closest to the physical storage method used. It indicates how
the data will be stored and describes the data structures and access methods to be used by the
database. The internal view is expressed by the internal schema.
b. What is an outer join and use a diagram to explain
Ans:
OUTER JOIN:
The outer join operation is an extension of the join operation to deal with missing information.
There are three forms of outer join
• left outer join
• right outer join
• full outer join
[Diagram: Outer join = results of natural join + results of missing information from left relation
+ results of missing information from right relation]
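The three parts named in the diagram can be built separately and then combined to give each form of outer join. The following is a sketch in plain Python; the loan and borrower rows are illustrative assumptions, and the helper names natural_join and unmatched are hypothetical.

```python
# Assumed sample data: loan L-260 has no borrower, borrower Hayes has no loan row.
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]


def natural_join(left, right):
    """Combine tuples that agree on all attributes the two relations share."""
    out = []
    for a in left:
        for b in right:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


def unmatched(rows, other, pad_attrs):
    """Tuples of `rows` with no partner in `other`, padded with None (null)."""
    return [{**row, **{attr: None for attr in pad_attrs}}
            for row in rows if not natural_join([row], other)]


inner = natural_join(loan, borrower)                                 # natural join
left_missing = unmatched(loan, borrower, ["customer_name"])          # from left relation
right_missing = unmatched(borrower, loan, ["branch_name", "amount"])  # from right relation

left_outer = inner + left_missing
right_outer = inner + right_missing
full_outer = inner + left_missing + right_missing

for row in full_outer:
    print(row)
```

The left outer join keeps loan L-260 with a null customer name, the right outer join keeps Hayes with null branch name and amount, and the full outer join keeps both.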
d.
Ans:
Lost update problem:
Consider the situation illustrated in the figure. The figure is meant to be read as follows: Transaction A
retrieves some tuple t at time t1, transaction B retrieves that same tuple t at time t2, transaction A
updates the tuple (on the basis of the values seen at time t1) at time t3, and transaction B updates
the same tuple (on the basis of the values seen at time t2, which are the same as those seen at time
t1) at time t4. Transaction A’s update is lost at time t4, because transaction B overwrites it without
even looking at it.
Transaction A    Time    Transaction B
Retrieve t       t1
                 t2      Retrieve t
Update t         t3
                 t4      Update t
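The schedule in the figure can be replayed with ordinary variables. This is a minimal sketch, assuming that the tuple t holds a single number and that each transaction adds an amount to it; the function names and the figures 100, 10 and 20 are illustrative.

```python
def interleaved(initial, a_delta, b_delta):
    """Replay the schedule t1..t4 from the figure on a single value t."""
    t = initial
    a_seen = t              # t1: transaction A retrieves t
    b_seen = t              # t2: transaction B retrieves t
    t = a_seen + a_delta    # t3: A updates t from the value seen at t1
    t = b_seen + b_delta    # t4: B updates t from the value seen at t2
    return t                # A's update has been overwritten


def serial(initial, a_delta, b_delta):
    """Run A completely, then B; both updates survive."""
    t = initial
    t = t + a_delta
    t = t + b_delta
    return t


print(interleaved(100, 10, 20))
print(serial(100, 10, 20))
```

In the interleaved schedule the result reflects only B's change, because B computed its new value from a copy read before A's update.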
Ans:
A database management system that provides three levels of data is said to follow a three-level
architecture.
• External level
• Conceptual level
• Internal level
There are two mappings in the three-level architecture:
1. External/conceptual mapping
2. Conceptual/internal mapping
A mapping between the external and conceptual view gives the correspondence among the records
and the relationships of the external and conceptual views.
The conceptual schema is related to the internal schema by the conceptual/internal mapping. This
enables the DBMS to find the actual record or combination of records in physical storage that
constitute a logical record in the conceptual schema.
A relationship represents an association between two or more entities. Relationships are classified in terms
of degree, connectivity, cardinality and existence. Examples are the IS-A relationship and the HAS-A
relationship.
A particular occurrence of a relationship is called a relationship instance or relationship occurrence. In
an ER model, similar relationships are grouped into relationship sets, also called relationship types. A
relationship type is a set of meaningful associations between one or more participating entity types.
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys or
secondary keys.
A foreign key is an attribute derived from another relation, in which it must be the primary key or a
unique key.
Department(deptcode,dname)
Here the deptcode is the primary key.
Emp(empcode,name,city,deptcode).
Here the deptcode is foreign key.
Yes.
Physical data independence is the ability to modify the physical scheme without making it
necessary to rewrite application programs. Such modifications include changing from unblocked
to blocked record storage, or from sequential to random access files.
The tuple relational calculus is a non-procedural query language. It describes the desired
information without giving a specific procedure for obtaining that information.
A query in the tuple relational calculus is expressed as:
{t|P(t)}
where P is a formula. Several tuple variables may appear in a formula. A tuple variable is said to
be a free variable unless it is quantified by a ∃ or ∀.
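A query of the form {t|P(t)} reads much like a set comprehension, which states what is wanted and not how to find it. The sketch below assumes a hypothetical loan relation with made-up rows; the quantifiers ∃ and ∀ are written with any() and all().

```python
from collections import namedtuple

Loan = namedtuple("Loan", ["loan_number", "branch_name", "amount"])

# Assumed sample relation.
loan = {
    Loan("L-170", "Downtown", 3000),
    Loan("L-230", "Redwood", 4000),
    Loan("L-260", "Perryridge", 1700),
    Loan("L-11", "Round Hill", 900),
}

# { t | t ∈ loan ∧ t[amount] > 1200 }   (t is the free variable)
big_loans = {t for t in loan if t.amount > 1200}

# ∃ s ∈ loan ( s[amount] > 3500 )       (s is bound by the quantifier)
some_above_3500 = any(s.amount > 3500 for s in loan)

# ∀ s ∈ loan ( s[amount] > 1200 )
all_above_1200 = all(s.amount > 1200 for s in loan)

print(sorted(t.loan_number for t in big_loans))
print(some_above_3500, all_above_1200)
```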
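A functional dependency set is said to be minimal when it satisfies three conditions:
a. It is non-redundant: no functional dependency can be removed without changing the closure.
b. It is left-reduced: no attribute can be removed from the left-hand side of any functional dependency.
c. It is simple: the right-hand side of every functional dependency has only one attribute.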
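The three conditions suggest a procedure for reducing a given set to a minimal one: make every dependency simple, left-reduce it, then drop redundant dependencies. The following is a sketch of that procedure, assuming single-letter attributes and dependencies written as pairs of strings; the sample set F is an illustrative assumption, and a different processing order can give a different, equally valid minimal set.

```python
def closure(attrs, fds):
    """Closure of an attribute set under dependencies given as (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # (c) Simple: split every right-hand side into single attributes.
    cover = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    # (b) Left-reduced: drop attributes that the remaining left side makes unnecessary.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)
    # (a) Non-redundant: remove duplicates, then dependencies implied by the others.
    result = list(dict.fromkeys(cover))
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


# Assumed example set: A→BC, B→C, A→B, AB→C.
F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "→", "".join(sorted(rhs)))
```

For this example the result is A→B and B→C: AB→C is left-reduced to B→C, and A→C is redundant because it follows from the other two by transitivity.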