DATABASE MANAGEMENT SYSTEMS
Two Mark Questions
Unit I- Introduction
1. Define database
A database is a collection of information that is organized so that it can easily be accessed, managed, and
updated.
2. Define DBMS
A database-management system (DBMS) is a collection of interrelated data and a set of programs to
access those data. The collection of data, usually referred to as the database, contains information relevant
to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
3. What are the applications of database systems?
• Banking:
• Airlines:
• Universities:
• Telecommunication:
• Finance:
• Sales:
• Manufacturing:
• Human resources:
4. What is the difference between file systems and database systems? (or)
Compare file systems and database systems. (or) What are the advantages of database systems?
File systems suffer from the following drawbacks, which database systems address:
 Data redundancy and inconsistency
 Difficulty in accessing data
 Data isolation
 Integrity problems
 Atomicity problems
 Concurrent-access anomalies
 Security problems
5. Define atomicity
Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said
to be “atomic.” If one part of the transaction fails, the entire transaction fails.
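The all-or-nothing rule can be demonstrated with SQLite's transaction support. This is a minimal sketch: the account table, names, and amounts are hypothetical, and the with-block stands in for an explicit commit/rollback pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # commits on success, rolls back the whole transaction on error
        conn.execute("UPDATE account SET balance = balance - 70 WHERE name = 'alice'")
        raise RuntimeError("crash before crediting bob")  # simulated failure
except RuntimeError:
    pass

# The debit was undone along with everything else: the transfer is atomic.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100, 'bob': 50}
```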
6. What are the three levels of abstraction?
 Physical level
 Logical level
 View level
7. Define instance
The collection of information stored in the database at a particular moment is called an instance of the
Database.
8. Define schema
The overall design of the database is called the database schema. Schemas are changed infrequently, if at
all.
9. Define data model
Data model is a collection of conceptual tools for describing data, data relationships, data semantics, and
consistency constraints.
10. Name the types of data model
 The Entity-Relationship Model
 Relational model
 Object oriented data model
 Semi structured model
 Hierarchical data model
 Network Model
11. What are the two types of database languages?
 Data definition language
 Data manipulation language
12. Define data dictionary
A data dictionary contains metadata—that is, data about data. The schema of a table is an example of
metadata. A database system consults the data dictionary before reading or modifying actual data.
13. Define DDL
A database schema is specified by a set of definitions expressed in a special language called a
data-definition language (DDL).
14. Define DML
A data-manipulation language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model. There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.(Eg:
Relational Algebra)
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data.(Eg: Tuple Relational Calculus & Domain Relational
Calculus)
15. Define query language
A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language.
16. What are the different types of users work with the database system?
 Naive users
 Application programmers
 Sophisticated users
 Specialized users
17. Who are naive users?
Naive users are unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously. Naive users may also simply read reports generated from the
database.
18. Define Sophisticated users
Sophisticated users interact with the system without writing programs. Instead, they form their requests in
a database query language. They submit each such query to a query processor, whose function is to break
down DML statements into instructions that the storage manager understands.
19. Compare 2-tier and 3-tier architecture
2-tier architecture: The application is partitioned into a component that resides at the client
machine, which invokes database system functionality at the server machine through query language
statements. Example: programs using ODBC and JDBC.
3-tier architecture: The client machine acts merely as a front end and does not contain any direct
database calls. Instead, the client end communicates with an application server, usually through a
forms interface, and the application server in turn communicates with a database system to access
data. The business logic of the application runs at the application server. Example: WWW applications.
20. Define Specialized users
Specialized users are sophisticated users who write specialized database applications that do not fit into
the traditional data-processing framework.
21. What is the role of DBA(Database Administrator)?
 Schema definition
 Storage structure and access-method definition
 Schema and physical-organization modification
 Granting of authorization for data access
 Routine maintenance
22. Define E-R Model
Entity-Relationship (ER) model, a high-level data model that is useful in developing a conceptual design
for a database. Creation of an ER diagram, which is one of the first steps in designing a database, helps
the designer(s) to understand and to specify the desired components of the database and the relationships
among those components. An ER model is a diagram containing entities or "items", relationships among
them, and attributes of the entities and the relationships.
23. Define entity
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For
example, each person in an enterprise is an entity.
24. Define entity set
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all
persons who are customers at a given bank, for example, can be defined as the entity set customer.
25. What are the attributes characterized in E-R model?
 Simple and composite attributes.
 Single-valued and multivalued attributes
 Derived attribute.
26.Define relationship
A relationship is an association among several entities.
27. Define relationship set
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n ≥
2 (possibly nondistinct) entity sets. If E1, E2, . . . , En are entity sets, then a relationship set R is a subset of
{(e1, e2, . . . , en) | e1 ∈ E1, e2 ∈ E2, . . . , en ∈ En}
where (e1, e2, . . . , en) is a relationship.
28. Define Mapping cardinalities. What are its types?
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be
associated via a relationship set. For a binary relationship set R between entity sets A and B, the mapping
cardinality must be one of the following:
• One to one.
• One to many.
• Many to one.
• Many to many.
29. Define Keys
A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also
help uniquely identify relationships, and thus distinguish relationships from each other.
30. Define super key
A
super
key
is
defined
in
the
relational
model
attributes
of
a
relation
variable
for
which
it
all
relations
assigned
to
that
variable
there
are
tuples (rows) that have the same values for the attributes in this set.
as
a
set
of
holds
that
in
no
two
distinct
31. Define candidate key
A candidate key of a relation is a set of attributes of that relation such that there are no two
distinct tuples with the same values for these attributes. In other words, a candidate key is a minimal
super key, i.e. a super key of which no proper subset is also a super key.
32. Define primary key
The primary key of a relational table uniquely identifies each record in the table. It can either be a normal
attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one
record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in
Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in
combination.
33. What is the use of E-R diagram?
An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between
entities in a database. ER diagrams often use symbols to represent three different types of information.
Boxes are commonly used to represent entities. Diamonds are normally used to represent relationships
and ovals are used to represent attributes.
34. Define weak entity sets
An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak
entity set.
35. Define strong entity sets
An entity set that has a primary key is termed a strong entity set.
36. Define Specialization
The process of designating sub groupings within an entity set is called specialization.
37. Define Generalization
Generalization, which is a containment relationship that exists between a higher-level entity set and one or
more lower-level entity sets.
38. Define Attribute Inheritance
A crucial property of the higher- and lower-level entities created by specialization and generalization is
attribute inheritance. The attributes of the higher-level entity sets are said to be inherited by the
lower-level entity sets.
39. Define aggregation
Aggregation is an abstraction through which relationships are treated as higher level entities.
40. What are the disadvantages of E-R model?
 Limited constraint representation
 Limited relationship representation
 No representation of data manipulation
 Loss of information
41. What are the advantages of E-R model?
 It is easy and simple to understand with minimal training. Therefore the model can be used by the
database designer to communicate design to the end user.
 It has explicit linkages between entities
 It is possible to find a connection from one node to all the other nodes.
42. What are the features of E-R model?
 It has a high degree of data independence and seeks to remove redundancy in data representation
based on mathematical theorem.
 The ER model is a top-down approach in system design
 It can be used as a basis for the unification of different views of data, such as the network model,
relational model, or entity modeling
 It was developed after the relational database when the industry shifted its attention to transaction
processing
43. Define cardinality.
Cardinality: the cardinality of a relationship is the actual number of related occurrences for each of the
related entities. The connectivity of a relationship describes the mapping of associated entity instances in
the relationship. The values of connectivity are "one" or "many". The basic types of connectivity for
relations are one-to-one, one-to-many, and many-to-many. A many-to-many relationship cannot be
implemented directly as a single relational table; it must be decomposed into two one-to-many
relationships through an intermediate (junction) table.
44. Why is it better to use the n-ary relationship over the binary relationship?
The n-ary relation can accept more entity relations than binary which only accepts two. An n-ary
relationship set shows more clearly that several entities participate in a single relationship.
45. What is the difference between a single inheritance and multiple inheritances?
Single inheritance is when a given entity set is involved in a lower entity set in only one ISA relationship.
A multiple inheritance is when it is involved in more than one ISA relationship.
46. Define UML.
Unified Modeling Language (UML) is a proposed standard for creating specifications of various
components of a software system. Some of the parts of UML are:
• Class diagram.
• Use case diagram.
• Activity diagram.
• Implementation diagram.
Unit II- Relational Model
1. Define relational model
The relational model is the most widely used data model for commercial data processing, because it is
simple and easy to maintain. The model is based on a collection of tables. Users of the database can
create tables, insert new tuples, or modify existing tuples.
2. Define relational databases
A relational database consists of a collection of tables, each of which is assigned a unique name. A row in
a table represents a relationship among a set of values. Since a table is a collection of such relationships,
there is a close correspondence between the concept of table and the mathematical concept of relation,
from which the relational data model takes its name.
3. Define foreign key
A foreign key is a field in a relational table that matches the primary key column of another table. The
foreign key can be used to cross-reference tables.
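A minimal sketch in SQLite, with hypothetical dept/emp tables; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id))""")
conn.execute("INSERT INTO dept VALUES (10, 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 10)")       # OK: dept 10 exists

try:
    conn.execute("INSERT INTO emp VALUES (2, 99)")   # no dept 99: rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan row rejected:", rejected)
```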
4. What is the schema diagram?
A database schema, along with primary key and foreign key dependencies, can be depicted pictorially by
schema diagrams.
5. Define relational algebra
The relational algebra is a procedural query language. It consists of a set of operations that take one or
two relations as input and produce a new relation as their result.
6. What are the basic operations of relational algebra?
 Select
 Project
 Union
 set difference
 Cartesian product
 Rename
Note: Select, project, and rename are unary operations; union, set difference, and Cartesian product
are binary operations.
7. Define schema diagram
Schema diagram is defined as the pictorial representation of a database schema, along with primary key
and foreign key dependencies.
8. Define relational algebra operations (selection, projection, union, set difference, Cartesian
product) with an example.
9. Define Views.
Any relation that is not part of the logical model, but is made visible to a user as a virtual relation, is
called a view. The syntax for creating a view is,
CREATE OR REPLACE VIEW <view_name> AS
SELECT <column_name>
FROM <table_name>;
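The syntax above can be exercised from Python through sqlite3; the loan table and its rows are made up for illustration, and SQLite uses plain CREATE VIEW rather than CREATE OR REPLACE VIEW:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_no TEXT, branch TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [("L-1", "Downtown", 900), ("L-2", "Uptown", 1500)])

# The view is a virtual relation: it stores no data of its own and
# always reflects whatever is currently in loan.
conn.execute("""CREATE VIEW big_loans AS
                SELECT loan_no, amount FROM loan WHERE amount > 1000""")
rows = conn.execute("SELECT * FROM big_loans").fetchall()
print(rows)  # [('L-2', 1500)]
```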
10. Define Materialized Views
Database systems allow view relations to be stored, but they make sure that, if the actual relations used in
the view definition change, the view is kept up to date. Such views are called materialized views.
11. What are the conditions for updating a view?
 In the view definition, the from clause has only one table
 The select clause contains only attribute names, without expressions or aggregate functions
 Any attribute not listed in the select clause can be set to NULL
 The query does not have a group by or having clause
12. Define Tuple Relational Calculus (TRC).
The tuple relational calculus is a nonprocedural query language. It describes the desired information
without giving a specific procedure for obtaining that information. A query in the tuple relational calculus
is expressed as
{t | P(t)}
that is, it is the set of all tuples t such that predicate P is true for t.
13. Define Domain Relational calculus(DRC)
Domain relational calculus uses domain variables that take on values from an attribute's domain, rather
than values for an entire tuple. An expression in the domain relational calculus is of the form
{< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)}
where x1, x2, . . . , xn represent domain variables. P represents a formula composed of atoms, as was the
case in the tuple relational calculus.
14. Define referential integrity.
Referential integrity is a database concept that ensures that relationships between tables remain consistent.
When one table has a foreign key to another table, the concept of referential integrity states that you may
not add a record to the table that contains the foreign key unless there is a corresponding record in the
linked table.
15. Define Integrity constraints
Integrity constraints provide a mechanism for ensuring that data conforms to guidelines specified by the
database administrator. The constraints available in SQL are Foreign Key, Not Null, Unique, Check.
Constraints can be defined in two ways
1) The constraints can be specified immediately after the column definition. This is called column-level
definition.
2) The constraints can be specified after all the columns are defined. This is called table-level definition.
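Both definition styles can be shown in one CREATE TABLE statement; the student table below is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# roll_no and name carry column-level constraints (right after the column);
# the CHECK on marks is a table-level constraint (after all columns).
conn.execute("""CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    marks   INTEGER,
    CHECK (marks BETWEEN 0 AND 100)
)""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 95)")
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 150)")  # violates CHECK
    violated = False
except sqlite3.IntegrityError:
    violated = True
print("out-of-range marks rejected:", violated)
```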
16. Define Domain Integrity.
The domain integrity states that every element of a relation should respect the type and restrictions of
its corresponding attribute. A type can have a variable length which needs to be respected. Restrictions
could be the range of values an element can take, the default value if none is provided, and whether the
element can be NULL.
17. What do you mean by triggers?
It defines the actions to be executed automatically when certain events occur and corresponding
conditions are satisfied.
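A small sketch using SQLite trigger syntax; the account/audit tables are invented for the example, and the trigger fires automatically on every update of the balance column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE audit (account_id INTEGER, old_bal INTEGER, new_bal INTEGER)")

# Event: UPDATE of account.balance; action: record old and new values.
conn.execute("""CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit VALUES (OLD.id, OLD.balance, NEW.balance);
END""")

conn.execute("INSERT INTO account VALUES (1, 500)")
conn.execute("UPDATE account SET balance = 300 WHERE id = 1")
audit_rows = conn.execute("SELECT * FROM audit").fetchall()
print(audit_rows)  # [(1, 500, 300)]
```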
18. Define authentication.
Authentication is any process by which you verify that someone is who they claim they are.
19. Define authorization.
The process of giving individuals access to system objects based on their identity.
20. Define embedded SQL
Embedded SQL is defined as the process of embedding the SQL within procedural programming
languages. These language (sometimes referred to as 3GLs) include C/C++, Cobol, Fortran, and Ada.
Thus the embedded SQL provides the 3GL with a way to manipulate a database, supporting:
 highly customized applications
 background applications running without user intervention
 database manipulation which exceeds the abilities of simple SQL
 applications linking to Oracle packages, e.g. forms and reports
 applications which need customized window interfaces
21. Define dynamic SQL.
It allows programs to construct and submit SQL queries at run time. Dynamic SQL statements are stored
as strings of characters that are entered when the program runs. They can be entered by the programmer
or generated by the program itself, but unlike static SQL statements, they are not embedded in the source
program. Also in contrast to static SQL statements, dynamic SQL statements can change from one
execution to the next.
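The idea can be sketched with Python's sqlite3: the statement text is assembled as a string at run time, while values are still passed as bound parameters. The emp table and the user-chosen column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Asha", "Sales", 900), ("Ravi", "IT", 1200)])

# The query string does not exist in the source program; it is built
# at run time from a column name chosen while the program runs.
column = "dept"                      # imagine this comes from user input
query = f"SELECT name FROM emp WHERE {column} = ?"
result = conn.execute(query, ("IT",)).fetchall()
print(result)  # [('Ravi',)]
```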
22. Define cursors
A cursor is a mechanism for retrieving rows from the database one at a time.
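The one-row-at-a-time behavior is visible in sqlite3's cursor API; the city table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT)")
conn.executemany("INSERT INTO city VALUES (?)",
                 [("Chennai",), ("Delhi",), ("Pune",)])

cur = conn.cursor()
cur.execute("SELECT name FROM city ORDER BY name")
names = []
row = cur.fetchone()          # the cursor hands back one row at a time
while row is not None:
    names.append(row[0])
    row = cur.fetchone()
print(names)  # ['Chennai', 'Delhi', 'Pune']
```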
23. Define distributed databases.
It consists of a collection of sites, connected together via some kind of communication network in which
 Each site is a full database system site on its own right,
 The sites have agreed to work together so that a user at any site can access data anywhere in the
network exactly as if the data were all stored at the user’s own site
24. What are the advantages of distributed databases?
1. Local autonomy
2. No reliance on a central site (bottleneck, vulnerability)
3. Continuous operation (reliability, availability)
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed Query Processing (optimization)
8. Distributed Transaction Management (concurrency, recovery)
9. Hardware independence
10. OS independence
11. Network independence
12. DBMS independence
25. Define client server databases.
A client server databases consists of three primary software components (aside from the network software
and operating systems of the computers in question): the client application (also called the front end), the
data access layer (also called middleware), and the database server (also called a database engine,
DBMS, data source, or back end).
26. What are the advantages of client server databases?
 Data sharing
 Integrity services
 Data interchangeability
 Location Independence of Data and processing
27. What are the disadvantages of client server databases?
 Traffic Congestion
 Robustness
28. Define Encryption.
Encryption is the process of transforming information (referred to as plaintext) using an algorithm (called
cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as
a key.
UNIT –III
1. Define normalization
Database normalization is the process of removing redundant data from the database to improve storage
efficiency, data integrity, and scalability. Normalization generally involves splitting existing tables into
multiple ones, which must be re-joined or linked each time a query is issued.
2. List out the drawbacks of Redundant Information
 Wastage of Storage
 Causes problems with update anomalies
 Insertion anomalies
 Deletion anomalies
 Modification anomalies
3. List out the advantages of normalization.
Less storage space
Quicker updates
Less data inconsistency
Clearer data relationships
Easier to add data
Flexible Structure
4. Define functional dependency.
An attribute Y is said to have a functional dependency on a set of attributes X (written X →Y) if and only
if each X value is associated with precisely one Y value.
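The definition can be checked mechanically over a set of tuples; a minimal sketch, with hypothetical employee rows:

```python
def holds_fd(rows, X, Y):
    """Return True if the functional dependency X -> Y holds in `rows`
    (each row a dict): equal X-values must imply equal Y-values."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False        # same X value, different Y value: FD violated
        seen[x_val] = y_val
    return True

emp = [
    {"ssn": 1, "name": "Asha", "dept": "Sales"},
    {"ssn": 2, "name": "Ravi", "dept": "Sales"},
]
print(holds_fd(emp, ["ssn"], ["name"]))   # True: ssn determines name
print(holds_fd(emp, ["dept"], ["name"]))  # False: same dept, different names
```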
5. Define Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute set on a superset of itself.
Example: {Ssn, Pnumber} -> {Ssn} is trivial, whereas {Ssn} -> {Ename} is non-trivial.
6. Define Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is
 functionally dependent on X, and
 not functionally dependent on any proper subset of X.
{Ssn,Pnumber} -> {Hours}
7. Define Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y
and Y→Z.
8. Define Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table
implies the presence of certain other rows.
9. Define Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each
having a subset of the attributes of T.
10. What are the Inference Rules for FDs?
(Reflexivity) If Y ⊆ X, then X -> Y
(Augmentation) If X -> Y, then XZ -> YZ
(Transitivity) If X -> Y and Y -> Z, then X -> Z
(Decomposition) If X -> YZ, then X -> Y and X -> Z
(Union) If X -> Y and X -> Z, then X -> YZ
(Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
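These inference rules underlie the standard attribute-closure algorithm: repeatedly apply any FD whose left side is already contained in the closure. A sketch, with FDs invented for illustration:

```python
def closure(attrs, fds):
    """Compute the closure X+ of an attribute set `attrs` under `fds`,
    a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already in the closure, transitivity adds rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(sorted(closure({"A"}, fds)))       # ['A', 'B', 'C']
print(sorted(closure({"A", "D"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
```

A set X is a super key exactly when its closure contains every attribute of the relation, which makes this routine the workhorse of normal-form tests.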
11. Define First Normal Form.
A relation is said to be in First Normal Form (1NF) if and only if each attribute of the relation is atomic.
It does not allow composite or multivalued attributes.
12. Define second normal form.
A relation schema R is in second normal form (2NF) if it is in 1NF and every non-key attribute A in
R is fully functionally dependent on the primary key.
13. Define Third Normal Form.
A relation schema R is in third normal form (3NF) if it is in second normal form (2NF) and there are
no transitive dependencies.
14. Define BCNF.
A relation is in BCNF, if and only if, every determinant is a candidate key.
15. Compare 3NF & BCNF.
 A relation schema R is in Boyce-Codd Normal Form (BCNF) if for every nontrivial FD X -> Y in R,
X is a superkey. R is in 3NF if for every nontrivial FD X -> Y in R, either X is a superkey or every
attribute of Y is contained in some candidate key.
 3NF may retain some redundancy, whereas BCNF removes all redundancy caused by FDs.
 Every relation in BCNF is also in 3NF; a 3NF decomposition can always preserve functional
dependencies, which a BCNF decomposition sometimes cannot.
16. Define multivalued dependency.
A multivalued dependency on R, X ->>Y, says that if two tuples of R agree on all the attributes of X, then
their components in Y may be swapped, and the result will be two tuples that are also in the relation.
17. Define fourth normal form.
A relation R is in 4NF if and only if, for every one of its non-trivial multivalued dependencies
X ->> Y, X is a superkey, that is, X is either a candidate key or a superset thereof.
18. Define fifth normal form.
An entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for the
entity is a consequence of its candidate keys.
19. Define domain key normal form (DKNF).
A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys
and domains.
20. Define join dependency.
A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject
to a join dependency if T can always be recreated by joining multiple tables each having a subset of the
attributes of T. If one of the tables in the join has all the attributes of the table T, the join dependency is
called trivial. The join dependency plays an important role in the Fifth normal form, also known as
project-join normal form
UNIT IV
1. Define transaction.
A transaction is a unit of program execution that accesses and possibly updates various data items. A
transaction must see a consistent database.
2. What are the transaction states?
 Active, the initial state; the transaction stays in this state while it is executing.
 Partially committed, after the final statement has been executed
 Failed, after the discovery that normal execution can no longer proceed.
 Aborted, after the transaction has been rolled back and the database has been restored to its state
prior to the start of the transaction.
 Committed, after successful completion
3. Define ACID properties.
 Atomicity. Either all operations of the transaction are properly reflected in the database or none
are.
 Consistency. Execution of a transaction in isolation preserves the consistency of the database.
 Isolation. Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions.
 That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj, finished
execution before Ti started, or Tj started execution after Ti finished.
 Durability. After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
4. List the advantages of concurrency.
 increased processor and disk utilization
 reduced waiting time
5. Define schedule.
Schedules – sequences that indicate the chronological order in which instructions of concurrent
transactions are executed
 a schedule for a set of transactions must consist of all instructions of those transactions
 must preserve the order in which the instructions appear in each individual transaction.
6. Define serializability.
Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent
transactions is equivalent to one that executes the transactions serially in some order. It assumes that all
accesses to the database are done using read and write operations.
7. Define serial schedule.
A schedule that consists of all the actions of a transaction T1, followed by all the actions of another
transaction T2, and so on, is called a serial schedule.
8. What are the types of serializability?
 conflict serializability
 view serializability
9. Define conflict serializability.
A schedule S is conflict serializable if it is conflict equivalent to a serial schedule. If a schedule S can be
transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´
are conflict equivalent.
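Conflict serializability is usually tested by building a precedence graph (an edge Ti -> Tj for each pair of conflicting operations where Ti's comes first) and checking it for cycles. A sketch, with a made-up schedule encoding of (transaction, operation, item) triples:

```python
def conflict_serializable(schedule, n_txn):
    """Schedule = list of (txn, 'r'/'w', item) in execution order.
    Two ops conflict if they touch the same item from different
    transactions and at least one is a write."""
    edges = {t: set() for t in range(n_txn)}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges[ti].add(tj)           # Ti must precede Tj

    def cyclic(node, stack, done):          # DFS cycle detection
        if node in stack:
            return True
        if node in done:
            return False
        stack.add(node)
        found = any(cyclic(m, stack, done) for m in edges[node])
        stack.discard(node)
        done.add(node)
        return found

    return not any(cyclic(t, set(), set()) for t in range(n_txn))

# r0(A) r1(A) w0(A) w1(A): edges 0->1 and 1->0, a cycle
bad = [(0, "r", "A"), (1, "r", "A"), (0, "w", "A"), (1, "w", "A")]
good = [(0, "r", "A"), (0, "w", "A"), (1, "r", "A"), (1, "w", "A")]
print(conflict_serializable(bad, 2))   # False
print(conflict_serializable(good, 2))  # True
```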
10. Define view serializability
A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict
serializable schedule is also view serializable.
11. Define recoverable schedule.
Recoverable schedule is the one where for each pair of transactions Ti and Tj such that Tj reads a data
item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
12. Define cascading rollback.
An uncommitted transaction may have to be rolled back because of the failure of another transaction
from which it read a data item. This phenomenon, in which a single failure leads to a series of
transaction rollbacks and a considerable amount of work is undone, is called cascading rollback.
13. What is blind write?
If a transaction writes a data item without reading the data is called blind write.
This sometimes causes inconsistency.
14. Define lock. What is the use of locking?
A lock is a mechanism to control concurrent access to a data item. It is used to prevent concurrent
transactions from interfering with one another and enforcing an additional condition that guarantees
serializability.
15. What is shared lock and Exclusive lock?
A shared lock on a data item allows other transactions to read, but not write, that item; several
transactions may hold shared locks on it concurrently. An exclusive lock allows both reading and
writing of the data item, and only a single transaction can hold it at a time.
16. What is called as a time stamp?
A time stamp is a unique identifier for each transaction generated by the system. Concurrency control
protocols use this time stamp to ensure serializability.
17. When does a deadlock occur?
Deadlock occurs when one transaction T in a set of two or more transactions is waiting for some item that
is locked by some other transaction in the set.
18. What is meant by transaction rollback?
If a transaction fails for reasons like power failure, hardware failure or logical error in the transaction after
updating the database, it is rolled back to restore the previous value.
19. What are the objectives of concurrency control?
 To be resistant to site and communication failure.
 To permit parallelism to satisfy performance requirements.
 To place few constraints on the structure of atomic actions.
20. What is replication?
The process of generating and reproducing multiple copies of data at one or more
sites is called replication.
21. What are the two phases available in two phase locking protocol?
Phase 1: Growing Phase
 transaction may obtain locks
 transaction may not release locks
Phase 2: Shrinking Phase
 transaction may release locks
 transaction may not obtain locks
22. What is strict & rigorous two phase locking protocol?
 Strict two-phase locking. A transaction must hold all its exclusive locks till it commits/aborts.
 Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this
protocol transactions can be serialized in the order in which they commit.
23. Define upgrading & downgrading.
Upgrading: converting a shared lock to an exclusive lock; allowed only in the growing phase.
Downgrading: converting an exclusive lock to a shared lock; allowed only in the shrinking phase.
24. What is the role of lock manager?
A Lock manager can be implemented as a separate process to which transactions send lock and unlock
requests. The lock manager replies to a lock request by sending a lock grant message. The requesting
transaction waits until its request is answered. The lock manager maintains a data structure called a lock
table to record granted locks and pending requests.
25. Define graph based protocol.
 Graph-based protocols are an alternative to two-phase locking
 Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items.
o If di → dj, then any transaction accessing both di and dj must access di before accessing dj.
o Implies that the set D may now be viewed as a directed acyclic graph, called a database
graph.
 The tree protocol is a simple kind of graph protocol.
27. Define Thomas’s write rule.
A transaction Ti issues write(Q).
 If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and
the system assumed that that value would never be produced. Hence, the write operation is
rejected, and Ti is rolled back.
 If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this
write operation can be ignored
 Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
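The three cases above can be sketched as a small dispatch function; the dict holding R- and W-timestamps of Q is a made-up stand-in for the scheduler's bookkeeping:

```python
def thomas_write(ts_ti, item):
    """Apply Thomas's write rule for write(Q) by transaction Ti with
    timestamp ts_ti; `item` holds Q's R- and W-timestamps."""
    if ts_ti < item["R"]:
        return "rollback"   # a later reader already needed the old value
    if ts_ti < item["W"]:
        return "ignore"     # obsolete write: a newer value is already there
    item["W"] = ts_ti       # perform the write and advance W-timestamp(Q)
    return "write"

q = {"R": 5, "W": 7}
print(thomas_write(3, q))   # rollback (3 < R-timestamp 5)
print(thomas_write(6, q))   # ignore   (6 >= 5 but < W-timestamp 7)
print(thomas_write(9, q))   # write    (W-timestamp advanced to 9)
```

Silently ignoring the obsolete write, rather than rolling Ti back, is what distinguishes Thomas's rule from plain timestamp ordering.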
28. What are the phases in validation based protocol?
1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the
various data items and stores them in variables local to Ti. It performs all write operations on temporary
local variables, without updates of the actual database.
2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy to
the database the temporary local variables that hold the results of write operations without causing a
violation of serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the actual
updates to the database. Otherwise, the system rolls back Ti.
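The test performed in the validation phase can be sketched as follows, assuming each transaction is represented by a dict holding its timestamp, start/validation event counters, and read/write sets (all names and the dict layout are illustrative, not from the text):

```python
def validate(ti, committed):
    """Return True if Ti may proceed to its write phase."""
    for tj in committed:
        if tj['ts'] >= ti['ts']:
            continue                     # only earlier transactions matter
        if tj['finish'] < ti['start']:
            continue                     # Tj finished before Ti even began
        # Tj overlapped Ti: safe only if Tj wrote nothing Ti read,
        # and Tj finished its writes before Ti entered validation
        overlap = tj['write_set'] & ti['read_set']
        if overlap or tj['finish'] >= ti['validation']:
            return False                 # possible conflict: abort Ti
    return True
```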
29. Define multiple granularities.
The multiple-granularity locking protocol, which ensures serializability, is this: Each transaction Ti can
lock a node Q by following these rules:
 It must observe the lock-compatibility function of Figure 16.17.
 It must lock the root of the tree first, and can lock it in any mode.
 It can lock a node Q in S or IS mode only if it currently has the parent of Q locked in either IX or
IS mode.
 It can lock a node Q in X, SIX, or IX mode only if it currently has the parent of Q locked in either
IX or SIX mode.
 It can lock a node only if it has not previously unlocked any node (that is, Ti is two phase).
 It can unlock a node Q only if it currently has none of the children of Q locked.
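The lock-compatibility function the first rule refers to is the standard IS/IX/S/SIX/X matrix; encoding it as a table lookup makes the rules above mechanical to check (the Python encoding is a sketch):

```python
# Multiple-granularity lock-compatibility matrix:
# rows = mode already held by another transaction, cols = mode requested.
COMPAT = {
    'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

def compatible(held, requested):
    """Can `requested` be granted on a node while another txn holds `held`?"""
    return COMPAT[held][requested]
```

Note the pattern: intention modes (IS, IX) are compatible with each other because the real conflict, if any, will be detected further down the tree.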
30. What are the various deadlock prevention techniques?
 wait-die scheme — non-preemptive
 wound-wait scheme — preemptive
 Timeout-Based Schemes
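The two timestamp-based schemes above can be sketched as simple decision functions (a lower timestamp means an older transaction; function names are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger one dies
    (is rolled back) rather than waiting on an older holder."""
    return 'wait' if requester_ts < holder_ts else 'die'

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds (forces rollback of) the
    younger holder; a younger requester is allowed to wait."""
    return 'wound' if requester_ts < holder_ts else 'wait'
```

In both schemes, a rolled-back transaction restarts with its original timestamp, so it eventually becomes the oldest and cannot starve.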
31. Define log.
A log is kept on stable storage. The log is a sequence of log records, and maintains a record of update
activities on the database.
32. What is a checkpoint?
A checkpoint is a point at which the system writes all modified log records and buffer blocks to stable
storage and adds a <checkpoint> record to the log, so that recovery need not scan the entire log. For
example, for transactions T1–T4 active around a checkpoint, with a crash occurring later:
 T1 can be ignored (its updates were already output to disk at the checkpoint)
 T2 and T3 are redone (they committed after the checkpoint)
 T4 is undone (it had not committed at the time of the crash)
UNIT V
1. What are the various physical storage media?
 Cache
 Main memory
 Flash memory
 Magnetic disk
 Optical storage
 Tape storage
2. Define access time, seek time, rotational latency, data transfer rate & mean time to failure
Access time – the time it takes from when a read or write request is issued to when data transfer begins.
Consists of:
Seek time – time it takes to reposition the arm over the correct track.
Rotational latency – time it takes for the sector to be accessed to appear under the head.
Data-transfer rate – the rate at which data can be retrieved from or stored to the disk.
Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any
failure.
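A back-of-the-envelope computation ties these terms together. The disk figures below are made-up but typical values, not taken from the text:

```python
# Access time for one random 4 KB read = seek + rotational latency + transfer.
seek_ms = 4.0                                 # assumed average seek time
rpm = 7200                                    # assumed spindle speed
rotational_latency_ms = 0.5 * 60_000 / rpm    # half a rotation on average
transfer_rate_mb_s = 100.0                    # assumed data-transfer rate
transfer_ms = 4 / 1024 / transfer_rate_mb_s * 1000   # 4 KB block

access_ms = seek_ms + rotational_latency_ms + transfer_ms
```

With these numbers, access time is dominated by seek and rotational latency (about 8 ms total), while the actual data transfer takes a small fraction of a millisecond.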
3. Define mirroring or shadowing.
Duplicate every disk: a logical disk then consists of two physical disks. Every write is carried out on
both disks; reads can take place from either disk. If one disk in a pair fails, the data is still available on
the other. Data loss occurs only if a disk fails and its mirror disk also fails before the system is repaired.
4. How to measure the performance of RAID levels?
 Monetary cost
 Performance: Number of I/O operations per second, and bandwidth during normal operation
 Performance during failure
 Performance during rebuild
5. What is RAID?
RAID stands for Redundant Array of Inexpensive Disks. RAID is the organization of multiple disks into a
large, high performance logical disk. Disk arrays stripe data across multiple disks and access them in
parallel to achieve:
 Higher data transfer rates on large data accesses and
 Higher I/O rates on small data accesses.
Data striping also results in uniform load balancing across all of the disks, eliminating hot spots that
otherwise saturate a small number of disks, while the majority of disks sit idle.
6. What are the various file organizations?
 Heap – a record can be placed anywhere in the file where there is space
 Sequential – store records in sequential order, based on the value of the search key of each record
 Hashing – a hash function computed on some attribute of each record; the result specifies in
which block of the file the record should be placed
7. Define data dictionary.
Data dictionary (also called system catalog) stores metadata: that is, data about data, such as
 Information about relations
 User and accounting information, including passwords
 Statistical and descriptive data
 Physical file organization information
 Information about indices
8. Define index.
An index that provides an alternate method of accessing records or portions of records in a data base or
file.
9. What are the types of indexing?
• Ordered indices – based on a sorted ordering of the search-key values; used to access data in
sorted order.
• Hash indices – based on distributing the search-key values uniformly across a range of buckets,
using a hash function.
10. What are the factors that should be considered while choosing indexing methods?
• access types
• access time
• insertion time
• deletion time
• space overhead
11. What are the types of ordered indices?
• Dense index - an index record appears for every search-key value in the file.
• Sparse index - an index record that appears for only some of the values in the file.
12. Compare dense index and sparse index.
Dense index
 An index record appears for every search
key value in file.
 This record contains search key value and
a pointer to the actual record.
Faster
Sparse index
some of the
records.
the largest search key value less than or equal to the
search key value we are looking for.
Slower
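The sparse-index lookup procedure (largest entry less than or equal to the search key, then a sequential scan of that block) can be sketched as follows; all keys and block contents are invented for illustration:

```python
import bisect

# One index entry per block, holding the smallest search key in that block.
index_keys = [10, 40, 70]
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

def lookup(key):
    """Find the largest index entry <= key, then scan that block."""
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None              # key is smaller than every indexed value
    return key if key in blocks[i] else None
```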
13. Define primary and secondary indices.
The primary index defines the physical organization of the records in the database. Each database has
one and only one primary index; in addition, it can have any number of secondary indices.
The secondary indices provide alternate access paths to the data by allowing different fields in the record
to be used as index keys. Each secondary index is stored in a separate area with its own storage allocation,
and any number of secondary indices can be dynamically created and deleted. Index names for a file must
be unique.
14. Define multi level indexing.
If the primary index does not fit in memory, access becomes expensive. Multilevel indexing reduces the
number of disk accesses to index records by treating the primary index kept on disk as a sequential file
and constructing a sparse index on it.
 outer index – a sparse index of primary index
 inner index – the primary index file
15. Define hashing.
Hashing is a method to store data in an array so that storing, searching, inserting and deleting data is fast.
For this, every record needs a unique key. The basic idea is not to search for the correct position of a
record with comparisons but to compute the position within the array. The function that returns the
position is called the 'hash function' and the array is called a 'hash table'.
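A minimal chained hash table illustrates the idea of computing the position rather than searching by comparison (a sketch, not production code):

```python
class HashTable:
    def __init__(self, nbuckets=8):
        # fixed array of buckets; each bucket is a chain of (key, value)
        self.buckets = [[] for _ in range(nbuckets)]

    def _pos(self, key):
        # the 'hash function': computes the position within the array
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._pos(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._pos(key)]:
            if k == key:
                return v
        return None
```

Because the bucket is computed directly from the key, storing and searching cost O(1) on average, independent of how many records the table holds.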
16. What are the types of hashing?
 Static hashing
 Dynamic hashing (Extendable hashing)
17. Compare static hashing & dynamic hashing.
Static Hashing has the number of primary pages in the directory fixed. Thus, when a bucket is full, we
need an overflow bucket to store any additional records that hash to the full bucket. This can be done with
a link to an overflow page, or a linked list of overflow pages. The linked list can be separate for each
bucket, or the same for all buckets that overflow.
In dynamic hashing, the size of the directory grows with the number of collisions to accommodate new
records and avoid long overflow page chains.
18. What are the steps to be performed in query processing?
1) The scanning, parsing, and validating module produces an internal representation of the query.
2) The query optimizer module devises an execution plan which is the execution strategy to retrieve the
result of the query from the database files. A query typically has many possible execution strategies
differing in performance, and the process of choosing a reasonably efficient one is known as query
optimization. (A full treatment of query optimization is beyond the scope of this course.)
3) The code generator generates the code to execute the plan.
4) The runtime database processor runs the generated code to produce the query result.
19. Define database tuning.
Database Tuning is the process of continuing to revise/adjust the physical database design by monitoring
resource utilization as well as internal DBMS processing to reveal bottlenecks such as contention for the
same data or devices.