Database Management System
SYNERGY INSTITUTE OF ENGG. & TECHNOLOGY, DHENKANAL
Lecture notes prepared by: Nirjharinee Parida

PCCS4204 Database Engineering

Module 1: (12 Hrs)
Introduction to database systems: basic concepts and definitions, data dictionary, DBA, file-oriented system vs. database system, database languages. Database system architecture: schemas, subschemas and instances, 3-level database architecture, data abstraction, data independence, mappings, structure, components and functions of a DBMS, data models, mapping the E-R model to relational, network and object-oriented data models, types of database systems. Storage strategies: detailed storage architecture, storing data, magnetic disk, RAID, other disks, magnetic tape, storage access, file and record organization, file organizations and indexes, ordered indices, B+ tree index files, hashing.

Module 2: (16 Hrs)
Relational algebra, tuple and domain relational calculus, relational query languages: SQL and QBE. Database design: database development life cycle (DDLC), automated design tools, functional dependency and decomposition, dependency preservation and lossless design, normalization, normal forms: 1NF, 2NF, 3NF and BCNF, multi-valued dependencies, 4NF and 5NF. Query processing and optimization: evaluation of relational algebra expressions, query optimization.

Module 3: (12 Hrs)
Transaction processing and concurrency control: transaction concepts, concurrency control, locking and timestamp methods for concurrency control. Database recovery system: types of database failure and types of database recovery, recovery techniques. Advanced topics: object-oriented and object-relational databases, parallel and distributed databases, introduction to data warehousing and data mining.

Text Books:
1. Database System Concepts by Silberschatz, Korth and Sudarshan (McGraw-Hill Education)
2. Fundamentals of Database Systems by Elmasri and Navathe (Pearson Education)

Reference Books:
(1) An Introduction to Database Systems by Bipin Desai (Galgotia Publications)
(2) Database Systems: Concepts, Design and Applications by S. K. Singh (Pearson Education)
(3) Database Management Systems by Leon and Leon (Vikas Publishing House)
(4) Database Modeling and Design: Logical Design by Toby J. Teorey, Sam S. Lightstone and Tom Nadeau, 4th Edition, 2005 (Elsevier India Publications, New Delhi)
(5) Fundamentals of Database Management Systems by Gillenson (Wiley India)

Module 1

What do you mean by Data and Database?
Data can be divided into three categories:
Raw data: this could be "85"; it has no meaning when it stands alone. It might mean something if you knew it was the weight of a man in kilograms.
Related raw data: a group (data set or data file) of organized raw data that can be tied together. For example, it could be a group of names, weights, blood groups and identification numbers, all tied to the identity cards issued to patients at hospitals.
Cleaned raw data: all of the above after being validated or processed. Such a process might ensure that a blood group does not contain a value such as "red" or "black"; the only allowed values might be of the kind A, A+, B, B+, etc.
Data can be acquired from many different sources. It must always be evaluated as to which category it belongs to, and whether it needs additional validation before analysis that produces information.
Database: a database is an organized collection of interrelated data for one or more uses, typically in digital form. Examples: a database for an educational institute, a bank, a library, a railway reservation system, etc.

What Is a DBMS?
A DBMS consists of two things: a database and a set of programs.
The database is a large, integrated collection of data, and the set of programs is used to access and process that database. So a DBMS can be defined as the software package designed to store and manage (process) the database.

FILE SYSTEM
In a file system, data is stored in different files in the form of records. Programs are written from time to time, as requirements arise, to manipulate the data within the files. For example:
A program to debit and credit an account
A program to find the balance of an account
A program to generate monthly statements

Disadvantages of the File System over a DBMS
The major disadvantages of the file system compared to a database management system are as follows:

Data Redundancy: Files are created in a file system as and when required by an enterprise over its growth path, so repetition of information about an entity cannot be avoided. For example, the addresses of customers will be present in the file maintaining information about customers holding savings accounts, and also in the file maintaining current accounts. Even when the same customer has both a savings account and a current account, his address will be present in two places.

Data Inconsistency: Data redundancy leads to a greater problem than just wasted storage: it may lead to inconsistent data. The same data repeated in several places may not match after it has been updated in some of them. For example, suppose a customer requests a change of address, and the program is executed to update the savings account file only, while his current account file is not updated. Afterwards, the addresses of the same customer in the savings account file and the current account file will not match. Moreover, there will be no way to find out which of the two addresses is the latest.
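The redundancy and inconsistency just described can be replayed in a toy sketch. This is not a real file system: two Python dicts (hypothetical names) stand in for the savings-account and current-account files, each keeping its own copy of the customer's address.

```python
# Toy model of a file-based system: each "file" keeps its own copy
# of the customer's address (data redundancy).
savings_file = {"cust_001": {"name": "Ravi", "address": "12 MG Road"}}
current_file = {"cust_001": {"name": "Ravi", "address": "12 MG Road"}}

def update_savings_address(cust_id, new_address):
    """Update the address in the savings file only -- the current
    account file is forgotten, exactly the bug described above."""
    savings_file[cust_id]["address"] = new_address

update_savings_address("cust_001", "45 Park Street")

# The two copies now disagree, and nothing records which is newer.
print(savings_file["cust_001"]["address"])  # 45 Park Street
print(current_file["cust_001"]["address"])  # 12 MG Road
```

In a DBMS the address would be stored once, so a single update cannot leave a stale copy behind.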
Difficulty in Accessing Data: For ad hoc reports, no program will already exist; the only options are to write a new program to generate the requested report or to do the work manually. Both take impractical amounts of time and are expensive. For example, suppose the administrator suddenly gets a request for a list of all customers holding savings accounts who live in a particular locality of the city. The administrator will not have a program already written to generate that list, but say he has a program which can generate a list of all customers holding savings accounts. He can then either go through that list manually to select the customers living in the particular locality, or write a new program to generate the new list. Both approaches take a large amount of time, which is generally impractical.

Data Isolation: Since the data files are created at different times, and often by different people, the structures of different files generally will not match; the data for a particular entity is scattered across files, so it is difficult to obtain the appropriate data. For example, suppose the address in the savings account file has the fields Add line 1, Add line 2, City, State, Pin, while the address fields in the current account file are House No., Street No., Locality, City, State, Pin. The administrator is asked to provide a list of customers living in a particular locality. Producing a consolidated list requires looking in both files, but they store the address in different ways, so writing a program to generate such a list is difficult.

Integrity Problems: All consistency constraints have to be applied to the database through appropriate checks in the coded programs. This is very difficult when the number of such constraints is large.
For example: an account should not have a balance of less than Rs. 500. To enforce this constraint, appropriate checks must be added to the program which adds a record and the program which withdraws from an account. Suppose later on this limit is increased; then all those checks must be updated to avoid inconsistency. These repeated changes to the programs are a great headache for the administrator.

Security and Access Control: The database should be protected from unauthorized users, and not every user should be allowed to access every piece of data. This is hard to enforce as application programs keep being added to the system. For example, payroll personnel in a bank should not be allowed to access the account information of customers.

Concurrency Problems: When more than one user is allowed to process the database, and two or more users try to update a shared data element at about the same time, the result may be inconsistent data. For example, suppose the balance of an account is Rs. 500, and users A and B try to withdraw Rs. 100 and Rs. 50 respectively at almost the same time, using the update process:
Update: 1. Read the balance amount. 2. Subtract the withdrawn amount from the balance. 3. Write the updated balance value.
Suppose A performs steps 1 and 2, i.e., reads 500 and subtracts 100. At about the same time B also performs the update: he too reads the balance as 500, subtracts 50, and writes back 450. A then writes his computed balance of 400. They may write in either order, depending on the systems used, so the final balance will be either 400 or 450. Both values are wrong (the correct balance after both withdrawals is 350), so the balance now holds an inconsistent value.

Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
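The lost-update interleaving in the concurrency example above can be replayed deterministically. A minimal sketch: no real threads are used; the two users' read/compute/write steps are interleaved by hand in exactly the order the notes describe.

```python
balance = 500  # shared balance, Rs.

# User A: read the balance and subtract 100 (steps 1 and 2),
# but do not write back yet.
a_read = balance
a_new = a_read - 100          # 400, computed from a soon-to-be-stale read

# User B: reads the *same* stale balance, subtracts 50, writes 450.
b_read = balance
b_new = b_read - 50           # 450
balance = b_new

# User A now writes the value computed from the stale read (step 3).
balance = a_new

# Both withdrawals happened, yet only one is reflected: B's is lost.
print(balance)  # 400, not the correct 350
```

A DBMS prevents this with concurrency control (locking or timestamps, covered in Module 3), which forbids such interleavings.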
Data integrity and security.
Uniform data administration.
Concurrent access, and recovery from crashes.

Role of a DBMS
An earlier information system works as follows: user -> set of programs -> file system -> disk. The DBMS is an additional layer of software placed between the file system and the set of application programs; at a very high level, its role can be described by a diagram of this layering.

Instances and Schemas
Database schema: the overall design of the database. A programming-language analogy is the definition of variables with their data types. In a relational database management system, the definitions of table names and their fields with data types form the database schema.
Database instance: the collection of information stored in the database at a particular moment. The programming-language analogy is the values stored in the variables during the execution of a program. In a relational database management system, the data stored in the various tables at a particular time is the instance of the database.

Data Abstraction: Three-Level Architecture of a DBMS
Since many database users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interaction with the system:

Physical Level: The lowest level of abstraction. It describes how the data are actually stored. Complex low-level data structures are defined by system programs and are generally hidden even from high-level programs. In relational database management systems, the files and indexes used are described at the physical level of abstraction. This is similar to the way a programming language hides the exact way of storing the values of variables, records or arrays.
Thus, the exact way in which, say, a C program's records or arrays are laid out in storage corresponds to the physical level of abstraction. The physical schema is used at the physical level of abstraction.

Logical Level:
1. Describes what data are stored in the database and what relationships exist among those data.
2. The entire database is represented by simple structures, which may be implemented by very complex structures at the physical level.
3. In relational database management systems, the definitions of tables and their fields are given at the logical level of abstraction.
4. The programming-language analogy for the logical level is the definition of record structures or arrays in a language such as C.
5. The logical schema is used at the logical level of abstraction.

View Level:
1. Describes only part of the entire database.
2. Many users are not concerned with all the information stored in a database.
3. The system may provide several views for the several types of database users, each showing only the relevant part of the database.
4. The view schema is used at the view level of abstraction.

[Diagram: external views at the external level, mapped to the conceptual schema at the conceptual level, mapped in turn to the internal schema at the internal level, which describes the stored database.]

Data Independence
Physical Data Independence: the ability to change the physical schema without requiring the logical schema to change or application programs to be rewritten. Changes to the physical schema include using new storage devices, different data structures, different file organizations or storage structures, or changing file indexes. All these changes should be possible without changes to the logical schema or rewriting of application programs.
Logical Data Independence: the ability to change the logical schema without requiring application programs to be rewritten. Changes such as the addition or deletion of entities, attributes or relationships are logical schema changes, and they should be possible without rewriting the already written application programs.
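The view level above can be made concrete with SQL views. A minimal sqlite3 sketch (the table and column names are illustrative, not from the notes): an external view exposes only the non-sensitive columns of an employee table, so different classes of users see only their part of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Logical level: the full Employee table, including sensitive fields.
cur.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    salary  REAL,
    address TEXT)""")
cur.execute("INSERT INTO employee VALUES (1, 'Asha', 50000, '7 Lake View')")

# View level: an external view that hides salary and address.
cur.execute("CREATE VIEW employee_public AS SELECT emp_id, name FROM employee")

rows = cur.execute("SELECT * FROM employee_public").fetchall()
print(rows)  # [(1, 'Asha')]
```

Users of `employee_public` are insulated from some logical-schema changes too: columns can be added to `employee` without the view's output changing, a small instance of data independence.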
Data Definition Language (DDL): This language provides a set of commands used to define the data in the database: what the data is, what the relationships between various data elements are, and what integrity constraints the data items must satisfy. DDL statements are compiled to form the data dictionary (or data directory), which contains the metadata, i.e., data about data. The data dictionary is consulted by the DBMS before any operation on the data.

Data Manipulation Language (DML): This consists of high-level statements used to specify the operations to be performed on the database.

Data Models, Schemas and Instances
A characteristic of the database approach is that it provides a level of data abstraction, hiding details of data storage that are not needed by most users. A data model is a collection of concepts that can be used to describe the structure of a database; the model provides the necessary means to achieve the abstraction. The structure of a database is characterized by data types, relationships, and constraints that hold for the data. Models also include a set of operations for specifying retrievals and updates. Data models are evolving to include concepts that specify the behaviour of the database application; this allows designers to specify a set of user-defined operations that are allowed.

Categories of Data Models
Data models can be categorized in multiple ways:
High-level/conceptual data models provide concepts close to the way users perceive the data.
Physical data models provide concepts that describe the details of how data is stored in the computer; these concepts are generally meant for the specialist, not the end user.
Representational data models provide concepts that may be understood by the end user but are not far removed from the way data is organized.
Conceptual data models use concepts such as entities, attributes and relationships.
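The DDL/DML split described earlier can be seen directly in SQL. A minimal sqlite3 sketch (the account schema is illustrative): CREATE TABLE is DDL, defining the data and an integrity constraint; INSERT and SELECT are DML, operating on the data. The constraint declared once in the DDL is then enforced on every DML operation, avoiding the scattered hand-written checks of the file-system approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define what the data is and a constraint it must satisfy
# (the minimum-balance rule from the notes' example).
cur.execute("""CREATE TABLE account (
    acc_no  INTEGER PRIMARY KEY,
    balance REAL CHECK (balance >= 500))""")

# DML: operate on the data.
cur.execute("INSERT INTO account VALUES (101, 750)")
balance = cur.execute(
    "SELECT balance FROM account WHERE acc_no = 101").fetchone()[0]
print(balance)  # 750.0

# The DDL-declared constraint is enforced automatically on DML.
rejected = False
try:
    cur.execute("INSERT INTO account VALUES (102, 100)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the under-minimum balance is refused
```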
Entity: represents a real-world object or concept.
Attribute: represents a property of interest that describes an entity, such as name or salary.
Relationship: an association among two or more entities.
Representational data models are used most frequently in commercial DBMSs. They include the relational data model, and legacy models such as the network and hierarchical models. Physical data models describe how data is stored in files by representing record formats, record orderings and access paths. Object data models are a group of higher-level implementation data models closer to conceptual data models.

Schemas, Instances and Database State
The description of a database is called the database schema. The schema is specified during database design and is not expected to change frequently. Data models have conventions for displaying schemas as diagrams; a displayed schema is called a schema diagram.

Entity Relationship (ER) Model
The most popular high-level conceptual data model is the ER model. It is frequently used for the conceptual design of database applications. The diagrammatic notation associated with the ER model is referred to as the ER diagram. ER diagrams show the basic data structures and constraints.

Entity Types, Entity Sets, Attributes and Keys
The basic object of an ER diagram is the entity. An entity represents a 'thing' in the real world. Examples of entities might be physical entities, such as a student, a house or a product, or conceptual entities such as a company, a job position or a course. Entities have attributes, which are the properties or characteristics of a particular entity.

Types of attributes include:
Simple vs. composite
Single-valued vs. multi-valued
Stored vs. derived

Simple vs. Composite Attributes
Composite attributes can be divided into smaller subparts which represent more basic attributes with their own meanings. A common example of a composite attribute is Address: it can be broken down into subparts such as Street Address, City and Postal Code, and Street Address may be further broken down into Number, Street Name and Apartment/Unit Number. Attributes that are not divisible into subparts are called simple or atomic attributes. A composite attribute is worth modelling only if its parts are sometimes referenced separately from the whole. For example, if you wish to store the company location and will never use atomic information such as Postal Code or City separately from the rest of the location information (Street Address, etc.), there is no need to subdivide it, and the whole Location can be designated as a simple attribute. What are examples of other composite attributes?

Single-Valued vs. Multi-Valued Attributes
Most attributes have a single value for each entity: a car has only one model, a student has only one ID number, an employee has only one date of birth. These are called single-valued attributes. Sometimes an attribute can have multiple values for a single entity: a doctor may have more than one specialty (or only one), and a customer may have more than one mobile phone number, or none at all. These are called multi-valued attributes. Multi-valued attributes may have lower and upper bounds to constrain the number of values allowed; for example, a doctor must have at least one specialty, but no more than three.

Stored vs. Derived Attributes
If an attribute can be calculated from the value of another attribute, it is called a derived attribute.
The attribute used to derive it is called a stored attribute. Derived attributes are not stored in the database, but are computed when needed from the stored attributes.

Null-Valued Attributes
There are cases where an entity has no applicable value for an attribute. For these situations, the value null is used. A person who does not have a mobile phone would have null stored as the value of the Mobile Phone Number attribute. Null can also be used when an attribute's value is unknown. There are two such cases. In one, it is known that the attribute has a value, but the value is missing: every person has a hair colour, for example, but the information may be absent. In the other, it is genuinely ambiguous: if Mobile Phone Number is null, it is not known whether the person has no mobile phone or the information is simply missing.

Complex Attributes
Complex attributes are composite and multi-valued attributes nested in an arbitrary way. For example, a person can have more than one residence, and each residence can have more than one phone; this is a complex attribute. With multi-valued attributes displayed between braces and composite attributes represented using parentheses, it can be written as:
{AddressPhone({Phone(AreaCode, PhoneNumber)}, Address(StreetAddress(Number, Street, ApartmentNumber), City, State, Zip))}

Entity Types and Entity Sets
An entity type defines a collection of entities that have the same attributes. Each entity type in the database is described by its name and attributes. The entities share the same attributes, but each entity has its own value for each attribute.
Entity Type Example:
Entity type: Student
Attributes: StudentID, Name, Surname, Date of Birth, Department

The collection of all entities of a particular entity type in the database at any point in time is called an entity set. The entity type (Student) and the entity set (Student) can be referred to by the same name.

Entity Set Example:
Entity type: Student
Entity set:
[123, John, Smith, 12/01/1981, Computer Technology]
[456, Jane, Doe, 05/02/1979, Mathematics]
[789, Semra, Aykan, 02/08/1980, Linguistics]

The entity type describes the intension, or schema, for a set of entities that share the same structure. The collection of entities of a particular entity type is grouped into the entity set, called the extension.

Key Attributes of an Entity Type
An important constraint on the entities of an entity type is the uniqueness constraint. A key attribute is an attribute whose values are distinct for each individual entity in the entity set; the values of the key attribute can be used to identify each entity uniquely. Sometimes a key consists of several attributes together, where the combination of attributes is unique for a given entity; this is called a composite key. Composite keys should be minimal, meaning that all of the included attributes are needed for the uniqueness property to hold. Specifying that an attribute is a key of an entity type means the uniqueness property must hold for every entity set of that entity type. An entity type can have more than one key attribute, and some have none; entity types with no key attribute are called weak entity types.

Value Sets (Domains) of Attributes
Each simple attribute of an entity type is associated with a domain of values, or value set, which specifies the set of values that may be assigned to that attribute for each entity.
For example, a date of birth must be before today's date and after 01/01/1900, or the Student Name attribute must be a string of alphabetic characters. Value sets are not specified in ER diagrams.

ER DIAGRAM NOTATION
[Diagram: summary of the standard ER diagram symbols.]

Relationship Review
Each time an attribute of one entity type refers to another entity type, some relationship exists. In ER diagrams, these references should be represented as relationships, rather than attributes. For example, in the Company database schema, an attribute of Employee is the department they work for; rather than representing this information as an attribute of the Employee entity type, it should be represented on the diagram as a relationship between the two entities. Relationships between entities are represented using a diamond shape, and are usually given a verb name which specifies the relationship between the two entities. Looking at the relationship between Employee and Department, an employee works for a department, so the relationship would be represented as:
Employee -- Works For -- Department

Degree of a Relationship Type
The degree of a relationship type is the number of participating entity types. If the relationship is between two entity types (Employee and Department), the relationship is binary, or of degree two. If the relationship is between three participating entity types, it has degree three and is a ternary relationship. For example, suppose we have three entities: Supplier, Project and Part. Each part is supplied by a unique supplier and is used for a given project within a company; the relationship "Supplies" is then ternary (degree three) between Supplier, Project and Part, meaning all three participate in the Supplies relationship.
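When mapped to tables, a ternary relationship such as "Supplies" becomes a table whose key combines the keys of all three participants. A minimal sqlite3 sketch (the id column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per (supplier, project, part) combination: the ternary
# relationship is keyed by all three participating entity sets.
cur.execute("""CREATE TABLE supplies (
    supplier_id INTEGER,
    project_id  INTEGER,
    part_id     INTEGER,
    PRIMARY KEY (supplier_id, project_id, part_id))""")

cur.execute("INSERT INTO supplies VALUES (1, 10, 100)")
cur.execute("INSERT INTO supplies VALUES (1, 10, 101)")  # new part: OK

dup = False
try:
    cur.execute("INSERT INTO supplies VALUES (1, 10, 100)")  # exact repeat
except sqlite3.IntegrityError:
    dup = True
print(dup)  # True: the composite key rejects a duplicate triple
```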
[Diagram: Supplier, Project and Part connected to the ternary relationship Supplies.]

Constraints on Relationship Types
Relationship types have constraints that limit the possible combinations of entities that may participate in a relationship. An example of a constraint: with entities Doctor and Patient, the organization may have a rule that a patient cannot be seen by more than one doctor. This constraint needs to be described in the schema. There are two main types of relationship constraints: cardinality ratio and participation.

Cardinality for Binary Relationships
Binary relationships are relationships between exactly two entities. The cardinality ratio specifies the maximum number of relationship instances that an entity can participate in. The possible cardinality ratios for binary relationship types are 1:1, 1:N, N:1 and M:N. Cardinality ratios are shown on ER diagrams by displaying 1, M and N on the diamonds; the ratio shown closest to an entity represents the ratio the other entity has to that entity.

Participation Constraints and Existence Dependencies
The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type. It specifies the minimum number of relationship instances that each entity can participate in. There are two types of participation constraints:
Total: If an entity can exist only if it participates in at least one relationship instance, this is called total participation: every entity in one set must be related to at least one entity in the designated entity set. An example is the Employee and Department relationship: if company policy states that every employee must work for a department, then an employee can exist only if it participates in at least one relationship instance (i.e., an employee cannot exist without a department). Total participation is also sometimes called an existence dependency.
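Total participation of Employee in "Works For" can be approximated at the table level. A sketch (table and column names are illustrative): the employee's department column is declared NOT NULL with a foreign key, so an employee row cannot exist without referencing a department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this enabled
cur = conn.cursor()

cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
# NOT NULL + FOREIGN KEY: every employee must work for some department.
cur.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id))""")

cur.execute("INSERT INTO department VALUES (1, 'Research')")
cur.execute("INSERT INTO employee VALUES (1, 'Asha', 1)")   # fine

orphan = False
try:
    cur.execute("INSERT INTO employee VALUES (2, 'Ravi', NULL)")
except sqlite3.IntegrityError:
    orphan = True
print(orphan)  # True: an employee with no department is rejected
```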
Total participation is represented by a double line from the relationship to the dependent entity.
Partial: If only a part of the set of entities participates in a relationship, this is called partial participation. Using the Company example, not every employee will manage a department, so the participation of Employee in the "Manages" relationship is partial. Partial participation is represented by a single line.

Attributes of Relationship Types
Relationships can have attributes, similar to entity types. For example, in the Works_On relationship between the Employee and Department entities, we may want to keep track of the number of hours an employee works on a project, so we can include Number of Hours as an attribute of the relationship. Similarly, for the "Manages" relationship between Employee and Department, we can add Start Date as an attribute. For some relationships (1:1 or 1:N), the attribute can instead be placed on one of the participating entity types; for example, since "Manages" is 1:1, Start Date can be migrated to either Employee or Department.

Weak Entity Types
Entity types that do not have key attributes are called weak entity types. Entities belonging to a weak entity type are identified by being related to specific entities of another entity type, in combination with some of their own attribute values. This other entity type is called the identifying or owner entity type, and the relationship that relates the identifying entity type with the weak entity type is called the identifying relationship. A weak entity type always has a total participation constraint with respect to its identifying relationship, because a weak entity cannot exist without its owner. Not all existence dependencies result in a weak entity type: if an entity type has a key attribute, it is not weak.
A weak entity type usually has a partial key: the set of attributes that uniquely identifies the weak entities related to the same owner entity.

Weak Entity Example
Assume that in a library database we have an entity type Book. For each book, we keep track of the author, ISBN and title. The library may own several copies of the same book, and for each copy it keeps track of the copy number (a different copy number for each copy of a given book) and the price of the copy.
Book -- Has -- Copy
Because the copy number is unique only within a given book (Book 123 may have copy 1, copy 2 and copy 3, and Book 456 may also have copy 1, copy 2 and copy 3), it is not unique across all copies of all books. Therefore, because the Copy entity type has no key attribute, it is a weak entity type and is identified by being related to the Book entity. Book is the identifying entity type and "Has" is the identifying relationship. Because a copy cannot exist without its owner (Book), the Copy entity type has a total participation constraint with respect to the identifying relationship. The partial key of Copy is Copy Number: for each owner Book entity, the copy number uniquely identifies the copy.
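The Book/Copy example above can be sketched in SQL: the Copy table's primary key combines the owner's key with the partial key Copy Number, and a foreign key ties each copy to its owning book. A minimal sqlite3 sketch (column names and sample data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cur = conn.cursor()

cur.execute("""CREATE TABLE book (
    isbn   TEXT PRIMARY KEY,
    title  TEXT,
    author TEXT)""")

# Weak entity set: identified by (owner key, partial key).
cur.execute("""CREATE TABLE copy (
    isbn        TEXT REFERENCES book(isbn),
    copy_number INTEGER,
    price       REAL,
    PRIMARY KEY (isbn, copy_number))""")

cur.execute("INSERT INTO book VALUES ('123', 'Intro to DBMS', 'A. One')")
cur.execute("INSERT INTO book VALUES ('456', 'SQL Primer', 'B. Two')")
# Copy 1 exists for both books: copy_number alone is not unique...
cur.execute("INSERT INTO copy VALUES ('123', 1, 250.0)")
cur.execute("INSERT INTO copy VALUES ('456', 1, 300.0)")

n = cur.execute("SELECT COUNT(*) FROM copy WHERE copy_number = 1").fetchone()[0]
print(n)  # 2 -- ...but the pair (isbn, copy_number) is unique
```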
Mapping an ER Diagram to the Relational Model

Example student table:
SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6

Basic ideas:
Build a table for each entity set.
Build a table for each relationship set, if necessary (more on this later).
Make a column in the table for each attribute in the entity set (indivisibility rule and ordering rule), and choose a primary key.
[Diagram: a student entity set (SID, name, GPA, major) related through an ADVISER relationship to a professor entity set (ssn, name, dept).]

Representation of a Weak Entity Set
Age  Name  Parent_SID
10   Bart  1234
8    Lisa  5678

Representation of Relationship Sets
For unary/binary relationship sets, the representation depends on the cardinality and participation of the relationship; there are two possible approaches. For n-ary (multiple) relationship sets, the main issue is the primary key. For an identifying relationship, no separate relational representation is necessary.

REPRESENTING RELATIONSHIP SETS
For a one-to-one relationship without total participation: build a table with two columns, one for each participating entity set's primary key. Add further columns, one for each descriptive attribute of the relationship set (if any).
For a one-to-one relationship with one entity set having total participation: augment the table of the entity set with total participation with one extra column, and put there the primary key of the entity set without total participation, as per the relationship.
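The augmentation rule above can be sketched in SQL: the table of the totally participating entity set gains a column holding the other entity set's primary key. A minimal sqlite3 sketch using the student/professor adviser example from the notes (exact column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE professor (ssn TEXT PRIMARY KEY, name TEXT)")
# Augmentation: the student table (total participation in the adviser
# relationship) carries the professor's primary key as an extra column.
cur.execute("""CREATE TABLE student (
    sid     INTEGER PRIMARY KEY,
    name    TEXT,
    adv_ssn TEXT REFERENCES professor(ssn))""")

cur.execute("INSERT INTO professor VALUES ('123-456', 'Dr. Rao')")
cur.execute("INSERT INTO student VALUES (9999, 'Bart', '123-456')")

# The relationship is recovered by joining on the augmented column.
row = cur.execute("""SELECT s.name, p.name
                     FROM student s
                     JOIN professor p ON s.adv_ssn = p.ssn""").fetchone()
print(row)  # ('Bart', 'Dr. Rao')
```

No separate relationship table is needed in this case, which is exactly what the augmentation approach buys.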
Example – One-to-One Relationship Set

SID    Maj_ID   Co_S_Degree
9999   07       1234
8888   05       5678

• For a one-to-many relationship without total participation – same as one-to-one.
• For a one-to-many/many-to-one relationship with the entity set on the "many" side having total participation – augment one extra column on the table of the entity set on the "many" side, and put in it the primary key of the entity set on the "one" side, according to the relationship.

SID    Name   Major     GPA    Pro_SSN   Ad_Sem
9999   Bart   Economy   -4.0   123-456   Fall 2006
8888   Lisa   Physics   4.0    567-890   Fall 2005

• For a many-to-many relationship:
– Same as a one-to-one relationship without total participation.
– The primary key of this new schema is the union of the foreign keys of both entity sets.
– No augmentation approach is possible.

N-ary Relationship
Build a new table with as many columns as there are attributes in the union of the primary keys of all participating entity sets. Augment additional columns for descriptive attributes of the relationship set (if necessary). The primary key of this table is the union of the primary keys of the entity sets that are on the "many" side. That is it, we are done.

P-Key1   P-Key2   P-Key3   A-Key   D-Attribute
9999     8888     7777     6666    Yes
1234     5678     9012     3456    No

Module-I Short Questions

2008
1. What is the difference between a primary key and a candidate key? [2][Introduction to database system]
2. Mention the various categories of data model. [2][Data model]
3. Define: entity type, entity set and value set. [2][ER model]

2009
1. What is the difference between multi-valued and derived attributes? [2][ER model]
2. Differentiate between procedural and non-procedural languages with examples. [2][Query language]
3. What is generalization?
How does it differ from specialization? [2][ER model]
4. Draw the ER diagram for the following entity sets: Movies(title, year, length, film type) and Stars(name, address). [2][ER model]

2010
1. Differentiate between foreign key and references. [2][ER model]
2. Find the tuple calculus representation for the following SQL query: select R1.A, R2.B from R1, R2 where R1.b = R2.a; [2][Query languages]

2011
1. What is the job of a DBA? [2][Database languages]
2. What are the different levels of data abstraction? How are those linked with data independence? [2][Introduction to database system]
3. Define integrity rules and constraints. [2][ER model]
4. Give an example of a weak entity set and explain why it is weak. [2][ER model]
5. What are the differences between relational algebra and relational calculus? [2][Database languages]
6. Write down two DML statements for database recovery and explain them. [2][Database languages]

2012
1. What is the need of a DBMS? [2][Introduction to database system]
2. What do you mean by data integrity? [2][Introduction to database system]
3. What is the ER model? [2][ER model]
4. What do you mean by a weak entity set? [2][ER model]
5. What are the different aggregate functions in SQL? [2][Query languages]
6. Define query language. [2][Query languages]

Long Questions

2008
1. Define entity, attribute and relationship as used in relational databases. Describe the purpose of the E-R model. Illustrate your answer with an example. [5][ER model]
2. What are the major components of the relational model? What is a simple relational database? What are two modes in which you can use SQL? [5][Data model]
3. What is an object-oriented database? What are its advantages compared to a relational database? Explain some applications where an object-oriented database may be useful. [5][Introduction to database system]
4.
Consider the following tables:

S:           R:
A  B  C      A  F  G
3  7  9      5  8  1
8  6  5      8  2  6

Show the semantics and the output of the following query:
SELECT * FROM S, R WHERE S.A = R.A AND S.B = R.G; [5][Database languages]
5. List out the six fundamental operators and the four additional operators in relational algebra. [2.5][Relational algebra]
6. Explain the two conditions needed for the set difference operation (union operation) to be valid. [2.5][Relational algebra]

2009
1. List at least four advantages of using a database management system over a traditional file management system. Are there any disadvantages of a DBMS? [5][Introduction to database system]
2. What is abstraction? Is it necessary for a database system? Explain how the database architecture satisfies abstraction at various levels. [5][Introduction to database system]
3. Draw the ER diagram for the following relations. Is Courses a weak entity set?
Courses(number, room)
Depts(name, BOD)
Lab-courses(computer, allocation)
Theory-courses(name, faculty_name) [5][ER model]

2010
1. Consider the set of relations:
Student(name, roll, mark)
Score(roll, grade)
Details(name, address)
For the following query: "Find the name and address of students scoring grade 'A'," represent it in relational algebra, tuple calculus, domain calculus, QBE and SQL. [10][Query languages]
2. Describe the steps to reduce an E-R schema to tables. [10][ER model]
3. Differentiate between object-oriented database and object-relational database. [5][Introduction to database system]
4. What is a constraint? Describe its types with examples. [10][ER model]

2011
1. A person can have many cars. A particular brand of car can have many owners. Draw the ER diagram and convert it to the relational model. [5][ER model]
2. Draw a network model for the above problem in question 2(a) and explain. [5][Data model]
3. How do object-oriented data models help in current database management systems?
Give an example of a customer and bank relation to explain an object-oriented model. [5][Data model]
4. Describe the database architecture and explain the role of different users with respect to that architecture. [5][Introduction to database system]
5. Consider the following relations:
EMPLOYEE(empno, name, office, age)
BOOKS(isbn, title, author, publisher)
LOAN(empno, isbn, date)
Write the following queries in relational algebra:
i. Find the names of the employees who have borrowed more than 5 books published by Pearson.
ii. Find the names of employees who have borrowed all books published by Pearson. [5][Database languages]

2012
1. What is a database? Explain the DBMS system architecture. [5][Introduction to database system]
2. Draw an ER diagram for a banking system. [5][ER model]
3. What are the advantages and disadvantages of a DBMS? [5][Introduction to database system]
4. Let the following relation schemas be given: R = (A, B, C) and S = (D, E, F), and let relations r(R) and s(S) be given. Give an expression in SQL that is equivalent to each of the following queries:
i. πA(r)
ii. r * s [5][Query languages]
5. Differentiate between the following terms:
i. Relational and object-oriented data models
ii. Data definition and data manipulation languages [5][Introduction to database system]
6. Write short notes on the following:
A. Network model [5][Introduction to database system]
B. ER model

Module-2

Relational Algebra

Basic operations:
o Selection (σ): selects a subset of rows from a relation.
o Projection (π): selects a subset of columns from a relation.
o Cross-product (×): allows us to combine two relations.
o Set-difference (−): tuples in relation 1, but not in relation 2.
o Union (∪): tuples in relation 1 and in relation 2.
o Rename (ρ): use a new name for a table or its fields.

Additional operations:
o Intersection (∩), join (⋈), division (÷): not essential, but (very!) useful.
Since each operation returns a relation, operations can be composed! (The algebra is "closed".)

Projection
Deletes attributes that are not in the projection list. The schema of the result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation. (Unary operation.)
The projection operator has to eliminate duplicates, as it returns a relation, which is a set.
o Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it (duplicate values may represent different real-world entities or relationships).

Consider the BOOK table:

Acc-No   Title        Author
100      "DBMS"       "Silbershatz"
200      "DBMS"       "Ramanuj"
300      "COMPILER"   "Silbershatz"
400      "COMPILER"   "Ullman"
500      "OS"         "Sudarshan"
600      "DBMS"       "Silbershatz"

πTitle(BOOK) =

Title
"DBMS"
"COMPILER"
"OS"

Selection
Selects rows that satisfy the selection condition. No duplicates in the result! (Why?) The schema of the result is identical to the schema of the (only) input relation. The result relation can be the input for another relational algebra operation (operator composition).

σAcc-No>300(BOOK) =

Acc-No   Title        Author
400      "COMPILER"   "Ullman"
500      "OS"         "Sudarshan"
600      "DBMS"       "Silbershatz"

σTitle="DBMS"(BOOK) =

Acc-No   Title    Author
100      "DBMS"   "Silbershatz"
200      "DBMS"   "Ramanuj"
600      "DBMS"   "Silbershatz"

πAcc-No(σTitle="DBMS"(BOOK)) =

Acc-No
100
200
600

Union, Intersection, Set-Difference
All of these operations take two input relations, which must be union-compatible:
o Same number of fields.
o 'Corresponding' fields have the same type.
What is the schema of the result?
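The projection and selection examples above can be sketched directly in Python, modelling a relation as a set of tuples so that duplicate elimination falls out of set semantics (the helper names `project` and `select` are our own, not standard API):

```python
# A relation as a set of tuples: duplicates vanish automatically,
# matching the set semantics the text describes for projection.
BOOK = {
    (100, "DBMS", "Silbershatz"),
    (200, "DBMS", "Ramanuj"),
    (300, "COMPILER", "Silbershatz"),
    (400, "COMPILER", "Ullman"),
    (500, "OS", "Sudarshan"),
    (600, "DBMS", "Silbershatz"),
}

def project(rel, *cols):
    """pi: keep only the listed column positions (duplicates removed)."""
    return {tuple(row[c] for c in cols) for row in rel}

def select(rel, pred):
    """sigma: keep the rows that satisfy the predicate."""
    return {row for row in rel if pred(row)}

titles = project(BOOK, 1)                                      # pi_Title(BOOK)
dbms_accnos = project(select(BOOK, lambda r: r[1] == "DBMS"), 0)  # composition
```

Because each helper returns a set of tuples (a relation), the two operations compose freely, mirroring the closure property noted above.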
Consider:

Borrower:              Depositor:
Cust-name  Loan-no     Cust-name   Acc-no
Ram        L-13        Suleman     A-100
Shyam      L-30        Radheshyam  A-300
Suleman    L-42        Ram         A-401

List of customers who are either borrowers or depositors at the bank = πCust-name(Borrower) ∪ πCust-name(Depositor) =

Cust-name
Ram
Shyam
Suleman
Radheshyam

Customers who are both borrowers and depositors = πCust-name(Borrower) ∩ πCust-name(Depositor) =

Cust-name
Ram
Suleman

Customers who are borrowers but not depositors = πCust-name(Borrower) − πCust-name(Depositor) =

Cust-name
Shyam

Cartesian Product or Cross-Product (S1 × R1)
Each row of S1 is paired with each row of R1. The result schema has one field per field of S1 and R1, with field names 'inherited' if possible. Consider the Borrower and Loan tables as follows:

Borrower:              Loan:
Cust-name  Loan-no     Loan-no  Amount
Ram        L-13        L-13     1000
Shyam      L-30        L-30     20000
Suleman    L-42        L-42     40000

Cross product of Borrower and Loan, Borrower × Loan =

Borrower.Cust-name  Borrower.Loan-no  Loan.Loan-no  Loan.Amount
Ram                 L-13              L-13          1000
Ram                 L-13              L-30          20000
Ram                 L-13              L-42          40000
Shyam               L-30              L-13          1000
Shyam               L-30              L-30          20000
Shyam               L-30              L-42          40000
Suleman             L-42              L-13          1000
Suleman             L-42              L-30          20000
Suleman             L-42              L-42          40000

The rename operation can be used to rename the fields to avoid confusion when two field names are the same in the two participating tables. For example, the statement ρLoan-borrower(Cust-name, Loan-No-1, Loan-No-2, Amount)(Borrower × Loan) results in a new table named Loan-borrower, which has four fields renamed as Cust-name, Loan-No-1, Loan-No-2 and Amount, and whose rows contain the same data as the cross product of Borrower and Loan.
Loan-borrower:

Cust-name  Loan-No-1  Loan-No-2  Amount
Ram        L-13       L-13       1000
Ram        L-13       L-30       20000
Ram        L-13       L-42       40000
Shyam      L-30       L-13       1000
Shyam      L-30       L-30       20000
Shyam      L-30       L-42       40000
Suleman    L-42       L-13       1000
Suleman    L-42       L-30       20000
Suleman    L-42       L-42       40000

Rename Operation
It can be used in two ways:
o ρx(E) returns the result of expression E in a table named x.
o ρx(A1, A2, …, An)(E) returns the result of expression E in a table named x with the attributes renamed to A1, A2, …, An.

Its benefit can be understood from the solution of the query "Find the largest account balance in the bank", which can be solved by the following steps:
o Find the relation of those balances which are not the largest. Consider the Cartesian product of Account with itself, i.e. Account × Account, and compare the balances of the first Account table with the balances of the second Account table in the product. For that we rename one of the Account tables by some other name to avoid confusion:
ΠAccount.balance(σAccount.balance < d.balance(Account × ρd(Account)))
The above relation contains the balances which are not the largest.
o Subtract this relation from the relation containing all the balances, i.e. Πbalance(Account). So the final expression for solving the above query is:
Πbalance(Account) − ΠAccount.balance(σAccount.balance < d.balance(Account × ρd(Account)))

Additional Operations

Natural Join (⋈)
Forms the Cartesian product of its two arguments, then performs a selection forcing equality on those attributes that appear in both relations. For example, consider the Borrower and Loan relations: the natural join between them automatically performs the selection on the table returned by Borrower × Loan that forces equality on the attribute appearing in both Borrower and Loan (i.e. Loan-no), and it keeps only one of the columns named Loan-no. That is, Borrower ⋈ Loan = σBorrower.Loan-no = Loan.Loan-no(Borrower × Loan), with the duplicate Loan-no column removed.
The table returned from this is obtained by eliminating the rows of Borrower × Loan that do not satisfy the selection criterion "σBorrower.Loan-no = Loan.Loan-no", and removing one of the columns named Loan-no. So Borrower ⋈ Loan =

Cust-name  Loan-no  Amount
Ram        L-13     1000
Shyam      L-30     20000
Suleman    L-42     40000

Division Operation
Denoted by ÷, it is used for queries that include the phrase "for all". For example, "Find the customers who have an account in all branches in branch city Agra" can be solved by the following expression:
ΠCustomer-name, branch-name(Depositor ⋈ Account) ÷ Πbranch-name(σbranch-city="Agra"(Branch))
The division operation can be specified using only basic operations as follows. Let r(R) and s(S) be given relations for schemas R and S with S ⊆ R:
r ÷ s = ΠR−S(r) − ΠR−S((ΠR−S(r) × s) − ΠR−S,S(r))

Tuple Relational Calculus
Relational algebra is an example of a procedural language, while tuple relational calculus is a non-procedural query language. A query is specified as {t | P(t)}, i.e. the set of all tuples t such that predicate P is true for t. The formula P(t) is built from atoms, which use relations, tuples of relations and fields of tuples together with the membership symbol ∈ and the comparison operators (<, ≤, =, ≠, >, ≥). These atoms can then be combined into formulas with the connectives ∧, ∨, ¬, ⇒ and the quantifiers ∃ and ∀.

For example, here are some queries and a way to express them using tuple calculus:
o Find the branch-name, loan-number and amount for loans over Rs 1200:
{t | t ∈ Loan ∧ t[amount] > 1200}
o Find the loan number for each loan of an amount greater than Rs 1200:
{t | ∃s ∈ Loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
o Find the names of all the customers who have a loan from the Sadar branch:
{t | ∃b ∈ Borrower (t[customer-name] = b[customer-name] ∧ ∃l ∈ Loan (l[loan-number] = b[loan-number] ∧ l[branch-name] = "Sadar"))}
o Find all customers who have a loan, an account, or both at the bank:
{t | ∃b ∈ Borrower (t[customer-name] = b[customer-name]) ∨ ∃d ∈ Depositor (t[customer-name] = d[customer-name])}
o Find only those customers who have both an account and a loan:
{t | ∃b ∈ Borrower (t[customer-name] = b[customer-name]) ∧ ∃d ∈ Depositor (t[customer-name] = d[customer-name])}
o Find all customers who have an account but do not have a loan:
{t | ∃d ∈ Depositor (t[customer-name] = d[customer-name]) ∧ ¬∃b ∈ Borrower (t[customer-name] = b[customer-name])}
o Find all customers who have an account at all branches located in Agra:
{t | ∀b ∈ Branch (b[branch-city] = "Agra" ⇒ ∃d ∈ Depositor ∃a ∈ Account (t[customer-name] = d[customer-name] ∧ d[account-number] = a[account-number] ∧ a[branch-name] = b[branch-name]))}

Domain Relational Calculus
Domain relational calculus is another non-procedural language for expressing database queries. A query is specified as {<x1, x2, …, xn> | P(x1, x2, …, xn)}, where x1, x2, …, xn are domain variables and P is a predicate formula as in tuple calculus. Since domain variables are used in place of tuples, the formulas do not refer to the fields of tuples; rather they refer directly to the domain variables. For example, the queries in domain calculus are expressed as follows:
o Find the branch-name, loan-number and amount for loans over Rs 1200:
{<b, l, a> | <b, l, a> ∈ Loan ∧ a > 1200}
o Find the loan number for each loan of an amount greater than Rs 1200:
{<l> | ∃b, a (<b, l, a> ∈ Loan ∧ a > 1200)}
o Find the names of all the customers who have a loan from the Sadar branch, and find the loan amount:
{<c, a> | ∃l (<c, l> ∈ Borrower ∧ <"Sadar", l, a> ∈ Loan)}
o Find the names of all customers who have a loan, an account, or both at the Sadar branch.
o Find only those customers who have both an account and a loan.
o Find all customers who have an account but do not have a loan.
o Find all customers who have an account at all branches located in Agra.

Outer Join
The outer join operation is an extension of the join operation to deal with missing information. Suppose that we have the following relation schemas:
Employee(employee-name, street, city)
Fulltime-works(employee-name, branch-name, salary)
A snapshot of these relations is as follows:

Employee:
employee-name  street             city
Ram            M G Road           Agra
Shyam          New Mandi Road     Mathura
Suleman        Bhagat Singh Road  Aligarh

Fulltime-works:
employee-name  branch-name   salary
Ram            Sadar         30000
Shyam          Sanjay Place  20000
Rehman         Dayalbagh     40000

Suppose we want complete information about the full-time employees. The natural join (⋈) will result in the loss of information for Suleman and Rehman, because they do not have records in both tables (the left and the right relation). The outer join solves this problem. There are three forms of outer join:
o Left outer join (⟕): the tuples of the left relation that do not match in the natural join are also added to the result, with null values in the missing fields of the right relation.
o Right outer join (⟖): the tuples of the right relation that do not match in the natural join are also added to the result, with null values in the missing fields of the left relation.
o Full outer join (⟗): includes both the left and right outer joins, i.e. adds the tuples which did not match in either the left or the right relation, putting null in place of missing values.
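The left outer join just described can be tried directly. The following is a minimal sketch using Python's sqlite3, with the snapshot data above and identifiers adapted to valid SQL names:

```python
import sqlite3

# Left outer join on the Employee / Fulltime-works snapshot above.
# Unmatched employees (Suleman) get NULLs on the right-hand side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (employee_name TEXT, street TEXT, city TEXT);
CREATE TABLE Fulltime_works (employee_name TEXT, branch_name TEXT, salary INTEGER);
INSERT INTO Employee VALUES
  ('Ram','M G Road','Agra'), ('Shyam','New Mandi Road','Mathura'),
  ('Suleman','Bhagat Singh Road','Aligarh');
INSERT INTO Fulltime_works VALUES
  ('Ram','Sadar',30000), ('Shyam','Sanjay Place',20000), ('Rehman','Dayalbagh',40000);
""")
left = conn.execute("""
    SELECT e.employee_name, f.branch_name, f.salary
    FROM Employee e LEFT OUTER JOIN Fulltime_works f
      ON e.employee_name = f.employee_name
    ORDER BY e.employee_name
""").fetchall()
```

Swapping the table order (or, in recent SQLite versions, using RIGHT/FULL OUTER JOIN) produces the other two forms analogously.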
The results of the three forms of outer join are as follows:

Employee ⟕ Fulltime-works =
employee-name  street             city     branch-name   salary
Ram            M G Road           Agra     Sadar         30000
Shyam          New Mandi Road     Mathura  Sanjay Place  20000
Suleman        Bhagat Singh Road  Aligarh  null          null

Employee ⟖ Fulltime-works =
employee-name  street          city     branch-name   salary
Ram            M G Road        Agra     Sadar         30000
Shyam          New Mandi Road  Mathura  Sanjay Place  20000
Rehman         null            null     Dayalbagh     40000

Employee ⟗ Fulltime-works =
employee-name  street             city     branch-name   salary
Ram            M G Road           Agra     Sadar         30000
Shyam          New Mandi Road     Mathura  Sanjay Place  20000
Suleman        Bhagat Singh Road  Aligarh  null          null
Rehman         null               null     Dayalbagh     40000

Structured Query Language (SQL)

Introduction
Commercial database systems use a more user-friendly language to specify queries. SQL is the most influential commercially marketed language. Other commercially used languages are QBE, Quel, and Datalog.

Basic Structure
The basic structure of an SQL query consists of three clauses: select, from and where.
o select: corresponds to the projection operation of relational algebra. Used to list the attributes desired in the result.
o from: corresponds to the Cartesian product operation of relational algebra. Used to list the relations to be scanned in the evaluation of the expression.
o where: corresponds to the selection predicate of relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.

A typical SQL query has the form
select A1, A2, …, An
from r1, r2, …, rm
where P
o Ai represents an attribute
o rj represents a relation
o P is a predicate
o It is equivalent to the following relational algebra expression:
ΠA1, A2, …, An(σP(r1 × r2 × … × rm))
o [Note: The words marked in bold in this text work as keywords in SQL.
For example, "select", "from" and "where" in the above paragraph are shown in bold font to indicate that they are keywords.]

Select Clause
Let us see some simple queries and the use of the select clause to express them in SQL.

Find the names of all branches in the Loan relation:
select branch-name
from Loan

By default the select clause includes duplicate values. If we want to force the elimination of duplicates, the distinct keyword is used as follows:
select distinct branch-name
from Loan

The all keyword can be used to specify explicitly that duplicates are not removed. Since omitting all means the same thing, we are not required to use it in the select clause:
select all branch-name
from Loan

The asterisk "*" can be used to denote "all attributes". The following SQL statement will select all the attributes of Loan:
select *
from Loan

Arithmetic expressions involving the operators +, -, *, and / are also allowed in the select clause. The following statement will return the amount multiplied by 100 for the rows in the Loan table:
select branch-name, loan-number, amount * 100
from Loan

Where Clause
Find all loan numbers for loans made at the "Sadar" branch with loan amounts greater than Rs 1200:
select loan-number
from Loan
where branch-name = "Sadar" and amount > 1200

The where clause uses the logical connectives and, or, and not. Operands of the logical connectives can be expressions involving the comparison operators <, <=, >, >=, =, and <>. between can be used to simplify comparisons:
select loan-number
from Loan
where amount between 90000 and 100000

From Clause
The from clause by itself defines a Cartesian product of the relations in the clause. When an attribute is present in more than one relation, it can be referred to as relation-name.attribute-name to avoid ambiguity.

For all customers who have a loan from the bank, find their names and loan numbers:
select distinct customer-name, Borrower.loan-number
For all customers who have loan from the bank, find their names and loan numbers select distinct customer-name, Borrower.loan-number Prepared By Nirjharinee Parida Database Management System from Borrower, Loan where Borrower.loan-number = Loan.loan-number The Rename Operation Used for renaming both relations both relations and attributes in SQL Use as clause: old-name as new-name Find the names and loan numbers of the customers who have a loan at the “Sadar” branch. select distinct customer-name, borrower.loan-number as loan-id from Borrower, Loan where Borrower.loan-number = Loan.loan-number and branch-name = “Sadar” we can now refer the loan-number instead by the name loan-id. For all customers who have a loan from the bank, find their names and loannumbers. select distinct customer-name, T.loan-number from Borrower as T, Loan as S where T.loan-number = S.loan-number Find the names of all branches that have assets greater than at least one branch located in “Mathura”. select distinct T.branch-name from branch as T, branch as S where T.assets > S.assets and S.branch-city = “Mathura” String Operation Two special characters are used for pattern matching in strings: o Percent ( % ) : The % character matches any substring o Underscore( _ ): The _ character matches any character “%Mandi”: will match with the strings ending with “Mandi” viz. “Raja Ki mandi”, “Peepal Mandi” “_ _ _” matches any string of three characters. Prepared By Nirjharinee Parida Database Management System Find the names of all customers whose street address includes the substring “Main” select customer-name from Customer where customer-street like “%Main%” Set Operations union, intersect and except operations are set operations available in SQL. Relations participating in any of the set operation must be compatible; i.e. they must have the same set of attributes. 
Union Operation
o Find all customers having a loan, an account, or both at the bank:
(select customer-name from Depositor)
union
(select customer-name from Borrower)
It automatically eliminates duplicates.
o If we want to retain duplicates, union all can be used:
(select customer-name from Depositor)
union all
(select customer-name from Borrower)

Intersect Operation
o Find all customers who have both an account and a loan at the bank:
(select customer-name from Depositor)
intersect
(select customer-name from Borrower)
o If we want to retain all the duplicates:
(select customer-name from Depositor)
intersect all
(select customer-name from Borrower)

Except Operation
o Find all customers who have an account but no loan at the bank:
(select customer-name from Depositor)
except
(select customer-name from Borrower)
o If we want to retain the duplicates:
(select customer-name from Depositor)
except all
(select customer-name from Borrower)

Aggregate Functions
Aggregate functions take a collection of values as input and return a single value. SQL offers five built-in aggregate functions:
o Average: avg
o Minimum: min
o Maximum: max
o Total: sum
o Count: count
The input to sum and avg must be a collection of numbers, but the others may take collections of non-numeric data types as input as well.
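The union / intersect / except queries above can be run as written. The following is a minimal sqlite3 sketch, using the Borrower/Depositor customer names from the earlier relational algebra examples (note that SQLite itself does not implement the `all` variants of intersect and except):

```python
import sqlite3

# union / intersect / except over toy Depositor and Borrower tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depositor (customer_name TEXT);
CREATE TABLE Borrower  (customer_name TEXT);
INSERT INTO Depositor VALUES ('Suleman'), ('Radheshyam'), ('Ram');
INSERT INTO Borrower  VALUES ('Ram'), ('Shyam'), ('Suleman');
""")

def setop(op):
    """Run 'Depositor <op> Borrower' and return a sorted list of names."""
    return [r[0] for r in conn.execute(
        f"SELECT customer_name FROM Depositor {op} "
        f"SELECT customer_name FROM Borrower ORDER BY 1").fetchall()]

either = setop("UNION")      # loan, account, or both
both   = setop("INTERSECT")  # both an account and a loan
only_d = setop("EXCEPT")     # account but no loan
```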
Find the average account balance at the Sadar branch:
select avg(balance)
from Account
where branch-name = "Sadar"
The result will be a table containing a single cell (one row and one column) holding the numerical value corresponding to the average balance of all accounts at the Sadar branch.

The group by clause is used to form groups: tuples with the same value on all attributes in the group by clause are placed in one group.

Find the average account balance at each branch:
select branch-name, avg(balance)
from Account
group by branch-name

By default the aggregate functions include duplicates. The distinct keyword is used to eliminate duplicates in an aggregate function.

Find the number of depositors for each branch:
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name

The having clause is used to state a condition that applies to groups rather than to tuples.

Find the average account balance at each branch where the average account balance is more than Rs 1200:
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200

Count the number of tuples in the Customer table:
select count(*)
from Customer
SQL doesn't allow distinct with count(*).
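The group by / having example above can be checked directly. This is a minimal sqlite3 sketch with assumed toy data; only the Sadar group survives the having condition:

```python
import sqlite3

# avg + group by + having against a toy Account table (data assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (account_number TEXT, branch_name TEXT, balance INTEGER);
INSERT INTO Account VALUES
  ('A-1','Sadar',1000), ('A-2','Sadar',2000),
  ('A-3','Sanjay',800), ('A-4','Sanjay',1000);
""")
per_branch = conn.execute("""
    SELECT branch_name, AVG(balance)
    FROM Account
    GROUP BY branch_name
    HAVING AVG(balance) > 1200      -- condition on groups, not on rows
    ORDER BY branch_name
""").fetchall()
```

Here Sadar's average is 1500 and Sanjay's is 900, so only Sadar passes the having filter.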
When where and having are both present in a statement, where is applied before having.

Nested Subqueries
A subquery is a select-from-where expression that is nested within another query.

Set Membership
The in and not in connectives are used for this type of subquery.
"Find all customers who have both a loan and an account at the bank" can be written using the nested subquery form as follows:
select distinct customer-name
from Borrower
where customer-name in (select customer-name from Depositor)

Select the names of customers who have a loan at the bank and whose names are neither "Smith" nor "Jones":
select distinct customer-name
from Borrower
where customer-name not in ("Smith", "Jones")

Set Comparison
Find the names of all branches that have assets greater than those of at least one branch located in Mathura:
select branch-name
from Branch
where assets > some (select assets from Branch where branch-city = "Mathura")
Apart from > some, the comparisons < some, <= some, >= some, = some and <> some are also available.

Find the names of all branches that have assets greater than that of each branch located in Mathura:
select branch-name
from Branch
where assets > all (select assets from Branch where branch-city = "Mathura")
Apart from > all, the comparisons < all, <= all, >= all, = all and <> all are also available.

Views in SQL
The create view command is used to define a view as follows:
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name. Consider, for example, a view consisting of branch names and the names of customers who have either an account or a loan at the branch. It can be defined as follows:
create view All-customer as
(select branch-name, customer-name
from Depositor, Account
where Depositor.account-number = Account.account-number)
union
(select branch-name, customer-name
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number)
The attribute names may be specified explicitly within round brackets after the name of the view. View names may be used as relations in subsequent queries. Using the view All-customer:

Find all customers of the Sadar branch:
select customer-name
from All-customer
where branch-name = "Sadar"
The create view clause creates a view definition in the database, which stays until the command drop view view-name is executed.

Modification of the Database

Deletion
In SQL we can delete only whole tuples, not the values of particular attributes. The command is
delete from r where P
where P is a predicate and r is a relation. The delete command operates on only one relation at a time. Examples are as follows:

Delete all tuples from the Loan relation:
delete from Loan

Delete all of Smith's account records:
delete from Depositor
where customer-name = "Smith"

Delete all loans with loan amounts between Rs 1300 and Rs 1500:
delete from Loan
where amount between 1300 and 1500

Delete the records of all accounts with balances below the average at the bank:
delete from Account
where balance < (select avg(balance) from Account)

Insertion
In SQL we either specify a tuple to be inserted or write a query whose result is a set of tuples to be inserted. Examples are as follows:

Insert an account with account number A-9732 at the Sadar branch having a balance of Rs 1200:
insert into Account
values ("Sadar", "A-9732", 1200)
The values are specified in the order in which the corresponding attributes are listed in the relation schema. SQL also allows the attributes to be specified as part of the insert statement:
insert into Account (account-number, branch-name, balance)
values ("A-9732", "Sadar", 1200)
insert into Account (branch-name, account-number, balance)
values ("Sadar", "A-9732", 1200)

Provide, for all loan customers of the Sadar branch, a new Rs 200 saving account for each loan account they have, where the loan-number serves as the account number for these new accounts:
insert into Account
select branch-name, loan-number, 200
from Loan
where branch-name = "Sadar"

Updates
Used to change a value in a tuple without changing all values in the tuple. Suppose that annual interest payments are being made, and all balances are to be increased by 5 percent:
update Account
set balance = balance * 1.05

Suppose that accounts with balances over Rs 10000 receive 6 percent interest,
numeric(p,d): fixed point number, p digits( plus a sign), and d of the p digits are to right of the decimal point. real, double precision: floating point and double precision numbers. float(n): a floating point number, precision at least n digits. date: calendar date; four digits for year, two for month and two for day of month. time: time of day n hours minutes and seconds. Domains can be defined as create domain person-name char(20). the domain name person-name can be used to define the type of an attribute just like built-in domain. Schema Definition in SQL create table command is used to define relations. create table r (A1D1, A2D2,… , AnDn, <integrity constraint1>, …, <integrity constraintk>) Prepared By Nirjharinee Parida Database Management System where r is relation name, each Ai is the name of attribute, Di is the domain type of values of Ai. Several types of integrity constraints are available to define in SQL. Integrity Constraints which are allowed in SQL are primary key(Aj1, Aj2,… , Ajm) and check(P) where P is the predicate. drop table command is used to remove relations from database. alter table command is used to add attributes to an existing relation alter table r add A D it will add attribute A of domain type D in relation r. alter table r drop A it will remove the attribute A of relation r. Integrity Constraints Integrity Constraints guard against accidental damage to the database. Integrity constraints are predicates pertaining to the database. Domain Constraints: Predicates defined on the domains are Domain constraints. Simplest Domain constraints are defined by defining standard data types of the attributes like Integer, Double, Float, etc. 
We can define domains with the create domain clause, and we can also attach constraints to such domains, as follows:

create domain hourly-wage numeric(5,2) constraint wage-value-test check(value >= 4.00)

We can then use hourly-wage as the data type of any attribute, and the DBMS will automatically allow only values greater than or equal to 4.00. Other examples of domain constraints are as follows:

create domain account-number char(10) constraint account-number-null-test check(value not null)

create domain account-type char(10) constraint account-type-test check(value in ("Checking", "Saving"))

With the latter domain, the DBMS will allow only the values Checking and Saving for any attribute declared of type account-type.

Referential Integrity

Foreign Key: Suppose two tables R and S are related to each other, K1 and K2 are the primary keys of the two relations, and K1 is also one of the attributes of S. If we require that every row in S must have a corresponding row in R, we declare K1 in S as a foreign key. For example, in our original library database we had a table for the relation BORROWEDBY, containing two fields Card No. and Acc. No. Every row of the BORROWEDBY relation must have a corresponding row in the USER table having the same Card No., and a row in the BOOK table having the same Acc. No. We therefore define Card No. and Acc. No. in the BORROWEDBY relation as foreign keys. Put differently, every row of BORROWEDBY must refer to some row of USER and some row of BOOK. Such a requirement of one table referring to another is called referential integrity. Referential integrity constraints are defined by declaring as foreign keys those attributes of a table that form the primary key of some other table.

Functional Dependencies

Suppose α and β are sets of attributes of a relation schema R.
A functional dependency α → β holds on R if, in every legal table having schema R, for every pair of rows r1 and r2, whenever r1 and r2 have the same values for the attributes in α they also have the same values for the attributes in β.

Consider for example the following table:

Seq  A   B   C   D
1    a1  b1  c1  d1
2    a1  b2  c1  d2
3    a2  b2  c2  d2
4    a2  b3  c2  d3
5    a3  b3  c2  d4

Check if A → C holds: find every pair of rows where the value of A is the same. In rows 1 and 2 the value of A is the same and C is also the same; in rows 3 and 4 the value of A is the same and C is also the same; no other two rows have the same value of A. So A → C holds.

Check if C → A holds: find every pair of rows where the value of C is the same. In rows 1 and 2 the value of C is the same and A is also the same; in rows 3 and 4 the value of C is the same and A is also the same; but in rows 4 and 5 the value of C is the same while A is not. So C → A does not hold.

AB → D also holds: find every pair of rows where the values of A and B are both the same. There is no such pair of rows, so AB → D holds vacuously.

If K is a super key of a relation R then the functional dependency K → R holds, and vice versa.

Armstrong's Rules: Suppose R is a given relation and F is a set of functional dependencies that holds on R. These rules can be used to derive all of the other functional dependencies that are logically implied by the given relation R and the functional dependencies F. In the following, α, β, γ and δ are sets of attributes.

Reflexivity rule: if β ⊆ α, then α → β holds.
Augmentation rule: if α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule: if α → β holds and β → γ holds, then α → γ holds.

Additional rules have been formed to simplify the derivation of new functional dependencies, since applying only Armstrong's rules is a lengthy and tiresome task, although Armstrong's rules alone are sufficient to generate all implied functional dependencies.

Union rule: if α → β holds and α → γ holds, then α → βγ holds.
Decomposition rule: if α → βγ holds, then α → β holds and α → γ holds.
Pseudotransitivity rule: if α → β holds and γβ → δ holds, then αγ → δ holds.
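The row-pair check illustrated in the table example can be mechanized. A small sketch (the function name fd_holds is our own) that tests whether an FD X → Y holds in a concrete table:

```python
# Checks whether X -> Y holds in a table of rows: whenever two rows agree
# on all attributes in X, they must also agree on all attributes in Y.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

# The five-row example table from the notes.
table = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

print(fd_holds(table, ["A"], ["C"]))       # True:  A -> C holds
print(fd_holds(table, ["C"], ["A"]))       # False: rows 4 and 5 share c2 but differ on A
print(fd_holds(table, ["A", "B"], ["D"]))  # True:  no two rows agree on both A and B
```

Keep in mind that such a check can only refute an FD: a table instance where the FD happens to hold does not prove the dependency holds on the schema.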
Closure of Functional Dependencies: Suppose F is the given set of functional dependencies for a relation schema R. By applying the various rules stated above we can generate all of the further functional dependencies implied by F. The set containing all these derived functional dependencies together with the given set F is called the closure of F, and is denoted F+.

Consider the schema R = (A, B, C, G, H, I) and the set F containing the following functional dependencies:

1. A → B
2. A → C
3. CG → H
4. CG → I
5. B → H

Some of the functional dependencies that can be derived using the rules given above are as follows:

A → H, derived from functional dependencies 1 and 5 by the transitivity rule.
CG → HI, derived from functional dependencies 3 and 4 by the union rule.
AG → I, derived from functional dependencies 2 and 4 by the pseudotransitivity rule.

Normal Forms

Some of the undesirable properties that a bad database design may have:

o Repetition of information
o Inability to represent certain information
o Incapability to maintain integrity of data

The normal forms of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

The generally known hierarchy of normal forms is: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF) and Fifth Normal Form (5NF). We will discuss only up to 3NF of this hierarchy, together with another normal form, Boyce-Codd Normal Form (BCNF), in this course.
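In practice the closure F+ is rarely materialized: an FD α → β is in F+ exactly when β is contained in the attribute closure α+ of α under F. A sketch follows, assuming the dependency set F = {A → B, A → C, CG → H, CG → I, B → H} from the example; the function name attribute_closure is our own.

```python
# Attribute closure X+ under a set of FDs: repeatedly absorb the right-hand
# side of any FD whose left-hand side is already contained in the closure.
def attribute_closure(attrs, fds):
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# F for R = (A, B, C, G, H, I), written as (lhs, rhs) attribute strings.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C', 'H']
print(sorted(attribute_closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']

# Membership test: X -> Y is in F+ exactly when Y is a subset of X+.
print(set("H") <= attribute_closure("A", F))   # True:  A -> H
print(set("I") <= attribute_closure("AG", F))  # True:  AG -> I
```

The same routine is what decides superkeys: X is a superkey of R exactly when X+ contains every attribute of R, as the AG example shows.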
First Normal Form

According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:

1. There is no top-to-bottom ordering to the rows.
2. There is no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

Examples of tables (or views) that would not meet this definition of 1NF are:

A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in violation of condition 3.

A view whose definition mandates that results be returned in a particular order, so that the row ordering is an intrinsic and meaningful aspect of the view. This violates condition 1; the tuples in true relations are not ordered with respect to each other.

A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every field to contain exactly one value from its column's domain. It should be noted, however, that this aspect of condition 4 is controversial: it marks an important departure from Codd's later vision of the relational model, which made explicit provision for nulls.

Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)", meaning a field should not be divided into parts with more than one kind of data in it, such that what one part means to the DBMS depends on another part of the same field.
Suppose a novice designer wishes to record the names and telephone numbers of customers. He defines a customer table which looks like this:

Customer
Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659
789           Maria        Fernandez   555-808-9633

The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value:

Customer
Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659 555-776-4100
789           Maria        Fernandez   555-808-9633

Assuming, however, that the Telephone Number column is defined on some telephone-number-like domain (e.g. the domain of strings 12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prevents a single field from containing more than one value from its column's domain.

Repeating groups across columns: The designer might attempt to get around this restriction by defining multiple Telephone Number columns:

Customer
Customer ID   First Name   Surname     Tel. No. 1     Tel. No. 2     Tel. No. 3
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659   555-776-4100
789           Maria        Fernandez   555-808-9633

This representation, however, makes use of nullable columns, and therefore does not conform to Date's definition of 1NF. Even if the view is taken that nullable columns are allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2 and Tel. No. 3 share exactly the same domain and exactly the same meaning; the splitting of Telephone Number into three headings is artificial and causes logical problems. These problems include:

Difficulty in querying the table. Answering such questions as "Which customers have telephone number X?"
and "Which pairs of customers share a telephone number?" is awkward.

Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1 value.

Restriction of the number of telephone numbers per customer to three. If a customer with four telephone numbers comes along, we are constrained to record only three and leave the fourth unrecorded. This means that the database design is imposing constraints on the business process, rather than (as should ideally be the case) vice versa.

Repeating groups within columns: The designer might, alternatively, retain the single Telephone Number column but alter its domain, making it a string of sufficient length to accommodate multiple telephone numbers:

Customer
Customer ID   First Name   Surname     Telephone Numbers
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659, 555-776-4100
789           Maria        Fernandez   555-808-9633

This design is consistent with 1NF according to Date's definition but not according to Codd's definition, and it presents several design issues. The Telephone Numbers heading becomes semantically woolly, as it can now represent either a telephone number, a list of telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a telephone number?" is more difficult to formulate, given the necessity to cater for lists of telephone numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also very difficult to define in the RDBMS with this design.

A design that complies with 1NF: A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a Customer Telephone Number table.
Customer Name
Customer ID   First Name   Surname
123           Robert       Ingram
456           Jane         Wright
789           Maria        Fernandez

Customer Telephone
Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record. It is worth noting that this design also meets the additional requirements for second normal form (2NF) and third normal form (3NF).

Second Normal Form

2NF was originally defined by E.F. Codd in 1971. A 1NF table is in 2NF if and only if, given any candidate key K and any attribute A that is not a constituent of a candidate key, A depends upon the whole of K rather than just a part of it. Equivalently, a 1NF table is in 2NF if and only if all its non-prime attributes are functionally dependent on the whole of every candidate key. (A non-prime attribute is one that does not belong to any candidate key.) Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.

Consider a table describing employees' skills:

Employees' Skills
Employee   Skill            Current Work Location
Jones      Typing           114 Main Street
Jones      Shorthand        114 Main Street
Jones      Whittling        114 Main Street
Bravo      Light Cleaning   73 Industrial Way
Ellis      Alchemy          73 Industrial Way
Ellis      Flying           73 Industrial Way
Harrison   Light Cleaning   73 Industrial Way

Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table. The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee.
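This partial-key dependency can be confirmed mechanically with the same row-pair check used for functional dependencies; a sketch (the function and variable names here are our own, and Location abbreviates Current Work Location):

```python
# Does X -> Y hold in this table? (No two rows agree on X yet differ on Y.)
def holds(rows, X, Y):
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

header = ("Employee", "Skill", "Location")
data = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]
rows = [dict(zip(header, t)) for t in data]

# The candidate key is {Employee, Skill}, yet the non-prime attribute is
# already determined by part of that key - the hallmark of a 2NF violation.
partial = holds(rows, ["Employee"], ["Location"])
print(partial)  # True: Current Work Location depends on Employee alone
```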
Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"

A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:

Employees
Employee   Current Work Location
Jones      114 Main Street
Bravo      73 Industrial Way
Ellis      73 Industrial Way
Harrison   73 Industrial Way

Employees' Skills
Employee   Skill
Jones      Typing
Jones      Shorthand
Jones      Whittling
Bravo      Light Cleaning
Ellis      Alchemy
Ellis      Flying
Harrison   Light Cleaning

Neither of these tables can suffer from update anomalies. Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:

Tournament Winners
Tournament             Year   Winner           Winner Date of Birth
Des Moines Masters     1998   Chip Masterson   14 March 1977
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not by part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key {Tournament, Year}. This problem is addressed by third normal form (3NF).

Note: In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.

Third Normal Form

3NF, as defined by E.F. Codd in 1971, requires that a table is in 3NF if and only if both of the following conditions hold:

The relation R (table) is in second normal form (2NF).
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate key of R.

Note: A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency is a functional dependency in which X → Z holds indirectly, because X → Y and Y → Z (where it is not the case that Y → X).

A 3NF definition equivalent to Codd's, given by Carlo Zaniolo in 1982, states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

X contains A (that is, X → A is a trivial functional dependency), or
X is a superkey, or
Each attribute in X − A is a prime attribute (i.e. it is contained within a candidate key).

Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce-Codd normal form (BCNF): BCNF simply eliminates the third alternative ("each attribute in X − A is a prime attribute"). The difference between 2NF and 3NF can be stated as follows: requiring that non-key attributes be dependent on "the whole key" ensures that a table is in 2NF, while requiring that non-key attributes be dependent on "nothing but the key" ensures that the table is in 3NF.
Consider again the table given above:

Tournament Winners
Tournament             Year   Winner           Winner Date of Birth
Des Moines Masters     1998   Chip Masterson   14 March 1977
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977

This table is in 2NF but not in 3NF. The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records. In order to express the same facts without violating 3NF, it is necessary to split the table into two:

Tournament Winners
Tournament             Year   Winner
Des Moines Masters     1998   Chip Masterson
Indiana Invitational   1998   Al Fredrickson
Cleveland Open         1999   Bob Albertson
Des Moines Masters     1999   Al Fredrickson
Indiana Invitational   1999   Chip Masterson

Player Dates of Birth
Player           Date of Birth
Chip Masterson   14 March 1977
Al Fredrickson   21 July 1975
Bob Albertson    28 September 1968

Boyce-Codd Normal Form

BCNF is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd normal form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey, that is, X is either a candidate key or a superset thereof. Note that the tables "Tournament Winners" and "Player Dates of Birth" shown above as being in 3NF are also in BCNF. Only in rare cases does a 3NF table not meet the requirements of BCNF.
A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF. An example of a 3NF table that does not meet BCNF is:

Today's Court Bookings
Court   Start Time   End Time   Rate Type
1       09:30        10:30      SAVER
1       11:00        12:00      SAVER
1       14:00        15:30      STANDARD
2       10:00        11:30      PREMIUM-B
2       11:30        13:30      PREMIUM-B
2       15:00        16:30      PREMIUM-A

There are two courts available and there are four distinct rate types:

SAVER, for Court 1 bookings made by members
STANDARD, for Court 1 bookings made by non-members
PREMIUM-A, for Court 2 bookings made by members
PREMIUM-B, for Court 2 bookings made by non-members

So Rate Type → Court is the only non-trivial functional dependency that holds besides those from the candidate keys.

o We can observe that the table's candidate keys are {Court, Start Time}, {Court, End Time}, {Rate Type, Start Time} and {Rate Type, End Time}.
o In the Today's Court Bookings table there are no non-prime attributes: that is, all attributes belong to candidate keys. Therefore the table adheres to both 2NF and 3NF.
o The table does not adhere to BCNF, because in the dependency Rate Type → Court the determining attribute (Rate Type) is not a superkey.

The design can be amended so that it meets BCNF as follows:

Rate Types
Rate Type   Court   Member Flag
SAVER       1       Yes
STANDARD    1       No
PREMIUM-A   2       Yes
PREMIUM-B   2       No

Today's Bookings
Rate Type   Start Time   End Time
SAVER       09:30        10:30
SAVER       11:00        12:00
STANDARD    14:00        15:30
PREMIUM-B   10:00        11:30

The candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF.

Consider the following table:

Lending
branch-name   branch-city   assets   customer-name   loan-number   amount
Sadar         Agra          200000   Ram             L-12          12000
Sanjayplace   Agra          100000   Ram             L-13          13000

This table stores the information regarding loans.
This table has the following problems:

Since every branch is going to have several loans, the table will have one row for each loan taken from a branch, all of which will have the same values for the columns branch-name, branch-city and assets: repetition of data.

Updating the branch-city or assets of a particular branch will require updating each such row of this table, and hence the operation will be costly. If we miss any row while updating, there will be more than one value for the branch-city or the assets of a branch, which means a breach of data integrity.

If there is a branch having no loans, there will be no entry for it in this table, and we will not be able to represent the complete information.

Decomposition

The above problems can be solved by decomposing the table. The set of relation schemas R1, R2, ..., Rn is a decomposition of relation schema R if R = R1 ∪ R2 ∪ ... ∪ Rn. It should be noted that every pair Ri and Ri+1 of this set should have at least one common attribute, so that the tables can be combined back again using the join operation. But not all decompositions of this table will be free from problems.
Consider, for example, forming two new tables out of our Lending table as follows:

Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)

Then the resulting tables with data will be as follows:

Branch-customer
branch-name   branch-city   assets   customer-name
Sadar         Agra          200000   Ram
Sanjayplace   Agra          100000   Ram

Customer-loan
customer-name   loan-number   amount
Ram             L-12          12000
Ram             L-13          13000

Now suppose that, to find the branch for loan L-12, we form the join of these two tables. We get:

Branch-customer ⋈ Customer-loan
branch-name   branch-city   assets   customer-name   loan-number   amount
Sadar         Agra          200000   Ram             L-12          12000
Sadar         Agra          200000   Ram             L-13          13000
Sanjayplace   Agra          100000   Ram             L-12          12000
Sanjayplace   Agra          100000   Ram             L-13          13000

According to this join, both of the loans were taken from both of the branches. This is an example of information loss, and it occurred because the choice of the column kept common to the two tables after decomposition was wrong.

Lossless-Join Decomposition: A decomposition {R1, R2, ..., Rn} of relation schema R is a lossless-join decomposition if for all legal relations r on schema R,

r = ΠR1(r) ⋈ ΠR2(r) ⋈ ... ⋈ ΠRn(r)

in other words, when after decomposition we join all of the decomposed tables with their data, the result should be exactly the original table with its data as it was before decomposition. Otherwise it is called a lossy-join decomposition.

Dependency preservation: This is another desirable property of a decomposition. Suppose it is given that a set F of functional dependencies holds on any relation based on schema R. Then the set of functional dependencies that holds on any relation
So if decomposition of R is { R1, R2,…Rn } such that corresponding functional dependencies which holds on them are { F1, F2,…Fn } then following should be true. F+ = {F1 … F2 Fn}+. Such a decomposition is called dependency preserving decomposition. For example: Consider the schema R = {A, B, C, D} such that following functional dependency holds on it F = { }. Now suppose the decomposition of this R is R1= {A,B} and R2 = {B,C,D}, so the functional dependencies which holds on R1 are F1= { } (Note: F1 should contain all the functional dependencies in F which have only attributes of R 1) and those on R2 are F2 ={ }. If we union F1 F2 is { } which doesn’t contain the , so it is not a dependency preserving decomposition. If we decompose R into these relation schemas R1 ={A,B,C} and R2={C,D} then F1={ } and F2 ={ } so F1 F2 is { }. Normalization Using Functional Dependency Lossless-Join Decomposition using FD: Let R is relation schema and F is a set of functional dependency on R. Let R 1 and R2 form a decomposition of R. This decomposition is lossless join decomposition if at least one of the following functional dependency is in F+: R1 ∩ R2 → R1 R1 ∩ R2 → R2 Example: Lending-schema=(branch-name, branch-city, assets, customer-name, loannumber, amount) the FD that holds on this schema are given as branch-name → assets branch-city loan-number → amount branch-name so the decomposition of it into two schema as follows: Branch-schema = (branch-name, branch-city, assets) Prepared By Nirjharinee Parida Database Management System Loan-info-schema = (branch-name, customer-name, loan-number, amount) is a lossless join decomposition becauseBranch-schema ∩ Loan-info-schema = branch-name and we have an FD branch-name → assets branch-city, applying augmentation rule to it, this FD is equivalent to branch-name → branch-name assets branch-city i.e. branchname →Branch-schema. 
Third Normal Form Using FDs: Let R be a relation schema and F a minimal (canonical) set of functional dependencies that holds on R. Then do the following:

Initially have an empty set of relations, with i = 1.
For each FD α → β in F, add a relation Ri = (α, β) if no relation already added contains all the attributes of α ∪ β, and increase i by one.
After adding all such relations, add one more relation Ri = (any candidate key of R) if no relation already added contains a candidate key of R.

Boyce-Codd Normal Form using FDs:

1. Let Ri be a relation that is not in BCNF.
2. Let α → β be an FD that holds on Ri such that α → Ri does not hold (i.e. α is not a super key of Ri).
3. Replace the relation Ri by the two relations (α ∪ β) and (Ri − (β − α)).
4. Now check all the relations present again with all the FDs that hold on them, and go back to step 1.

Example: Consider Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount), on which the following FDs hold:

1. branch-name → assets branch-city
2. loan-number → amount branch-name

We can see that Lending-schema is not in BCNF: in the FD branch-name → assets branch-city, branch-name is not a superkey of Lending-schema. So the new set of relations is:

Branch-schema = (branch-name, branch-city, assets), with branch-name → assets branch-city
Loan-info-schema = (branch-name, customer-name, loan-number, amount), with loan-number → amount branch-name

In the new set of relations we see that Loan-info-schema is still not in BCNF, as loan-number is not a super key of Loan-info-schema. We decompose again, and the set of relations becomes:

Branch-schema = (branch-name, branch-city, assets), with branch-name → assets branch-city
Loan-schema = (branch-name, loan-number, amount), with loan-number → amount branch-name
Borrower-schema = (customer-name, loan-number)

Now all three relations are in BCNF, so we do not have to decompose any more.

BCNF may not satisfy the dependency preservation criteria:
In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table. For example, a set of functional dependencies {AB → C, C → B} cannot be represented by a BCNF schema; unlike the first three normal forms, BCNF is not always achievable. Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C → B} pattern:

Nearest Shop
Person     Shop Type     Nearest Shop
Davidson   Optician      Eagle Eye
Davidson   Hairdresser   Snippets
Wright     Bookshop      Merlin Books
Fuller     Bakery        Doughy's
Fuller     Hairdresser   Sweeney Todd's
Fuller     Optician      Eagle Eye

For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the person's home. We assume for simplicity that a single shop cannot be of more than one type.

The candidate keys of the table are:

{Person, Shop Type}
{Person, Nearest Shop}

Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop. A decomposition into BCNF looks like this:

Shop Near Person
Person     Shop
Davidson   Eagle Eye
Davidson   Snippets
Wright     Merlin Books
Fuller     Doughy's
Fuller     Sweeney Todd's
Fuller     Eagle Eye

Shop
Shop             Shop Type
Eagle Eye        Optician
Snippets         Hairdresser
Merlin Books     Bookshop
Doughy's         Bakery
Sweeney Todd's   Hairdresser

The "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate keys do not guarantee that the functional dependency {Person, Shop Type} → {Shop} will be respected.
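The BCNF decomposition procedure described earlier (split a relation on any FD whose left side is not a superkey of it, and repeat) can be sketched in code. Run on the Lending-schema example, it reproduces the Branch / Loan / Borrower schemas; the function names are our own.

```python
# X+ under a set of FDs.
def closure(attrs, fds):
    c = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= c and not rhs <= c:
                c |= rhs
                changed = True
    return c

# Repeatedly split any schema on a violating FD until every schema is in BCNF.
def bcnf_decompose(R, fds):
    schemas = [frozenset(R)]
    done = False
    while not done:
        done = True
        for Ri in schemas:
            for lhs, _ in fds:
                if lhs <= Ri:
                    determined = closure(lhs, fds) & Ri
                    if determined != Ri and determined != lhs:
                        # lhs -> determined is a BCNF violation on Ri: split Ri.
                        schemas.remove(Ri)
                        schemas.append(frozenset(determined))
                        schemas.append(frozenset(Ri - (determined - lhs)))
                        done = False
                        break
            if not done:
                break
    return schemas

R = {"branch-name", "branch-city", "assets", "customer-name", "loan-number", "amount"}
fds = [
    (frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
    (frozenset({"loan-number"}), frozenset({"amount", "branch-name"})),
]

decomposition = bcnf_decompose(R, fds)
for schema in sorted(decomposition, key=sorted):
    print(sorted(schema))
```

The result is Branch-schema, Loan-schema and Borrower-schema, exactly as derived by hand above. As the Nearest Shop example shows, such a decomposition is always lossless but need not preserve all dependencies.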
Module-II

Short Questions

2008
1. Let R = (A, B, C, D) with functional dependencies (1) A -> C, (2) AB -> D. What is the closure of {A, B}? [2][Database design]
2. What do you mean by semi less join? [2][Database design]
3. What do you mean by multi-valued dependency? [2][Database design]
4. Define and differentiate between Natural Join and Inner Join. [2][Database design]

2009
No questions

2010
1. Consider a multilevel index with fan-out = 4 used to index 25 records; draw the structure. Given (1) A -> C, (2) AB -> D, what is the closure of {A, B}? [2][Database design]
2. For a relation R(A, B, C, D) with the dependency among numeric field values: 6 = 2A + B and 9 = 2E. Draw the E-R diagram. [2][Database design]
3. What is a query tree? Draw the query tree for the following SQL query: select R1.A, R2.B from R1, R2 where R1.b = R2.a; [2][Query processing]
4. For the following set of dependencies: {A -> BC, B -> D, C -> DE, BC -> F}, find the primary key of the relation. [2][Database design]
5. What do you mean by RAID? [2][Database design]

2011
1. What is fully functional dependency? Give an example. [2][Database design]
2. What is multivalued dependency? [2][Database design]

2012
No questions

Long Questions

2008
[5][Query processing] 4. What does the term redundancy mean? Discuss the implications of redundancy in a relational database. [5][Database design] 5. Define (i) Primary key, and (ii) Foreign key, Suppose relation R(A,B,C,D,E) has functional dependencies: AB ->C D ->A AE ->B CD -> E BE ->D Find all the candidate keys of R. [5][Database design] 6. Construct a B+ tree of order 1 with following keys, 1, 9, 5, 3, 7, 11, 17, 13, 15? [5][Query processing] What is the use of outer join and list out the three types of outer join with the notations used in relational algebra? [5][Database design] 2009 1. Suppose we have the following relation with given data as: Roll_no Student_name Subject_code Subject_name Marks 101 Aasish S01 English 80 101 Aasish S02 Physics 56 101 Aasish S03 Chemistry 63 102 Sushmita S01 English 90 102 Sushmita S02 Physics 85 102 Sushmita S03 Chemistry 91 To avoid redundancy in the first proposal we have decomposed the relation in to the following relations. Subject (Roll_no, student_name) And subject(subject_code, subject_name, marks)) In the second proposal it is decomposed in to the following relations. Student(Roll_no, Student_name) Result(Roll_no,subject_code, marks) Subject(Subject_code, Subject_name) State whether the decompsition are lossless or lossy. [5][Database design] Prepared By Nirjharinee Parida Database Management System What is normalization? Is it necessary for RDBMS? Expalin 4NF and 5NF with example. [5][Database design] 3. Consider the three relations given below. 2. Order(order_no, order_date, customer_no ) Order_item(Order_no, item_no, quantity, bill_amount) Item (item_no, item_name, unit_price) State whether the following relations are in 3NF or not if not then what steps you should follow to bring those in to 3NF? [5][Database design] What is query optimization? Write the Heuristic optimization algorithm and solve the query given below: Find the names of employees work on project name “NASA” and born after 1956. 
Project(project_name, Project_number, Dept)
Employee(Emp_id, P_number, DOB, name)
Work_on(P_no, Emp_id) [10][Database design]
5. Let us consider a relation with attributes A, B, C, D, E and F. Suppose that this relation has the FDs AB -> C, BC -> AD, D -> E and CF -> B. What is the closure of {A, B}, that is, {AB}+? [5][Database design]

2010
1. Consider the following set of data items:

A   B  C   D
A1  1  X1  D1
A2  2  X2  D2
A3  3  X3  D3
A3  3  X3  D3

Represent it in 3NF. [5][Database design]
2. What is the difference between 4NF and BCNF? Describe with examples. [5][Database design]
3. Write short notes on multivalued dependencies. [5][Database design]

2011
1. Explain Armstrong's axioms with examples. [5][Database design]

2012
1. What is normalization? Why is it required? Explain the Boyce-Codd normal form with an example. [5][Database design]
2. Write short notes on the following:
A. Hashing
B. Dependency preservation [5][Query processing, Database design]

MODULE-3 TRANSACTION MANAGEMENT

THE CONCEPT OF A TRANSACTION

A user writes data access/update programs in terms of the high-level query and update language supported by the DBMS. To understand how the DBMS handles such requests, with respect to concurrency control and recovery, it is convenient to regard an execution of a user program, or transaction, as a series of reads and writes of database objects:

To read a database object, it must first be brought into main memory from disk, and then its value is copied into a program variable. To write a database object, an in-memory copy of the object must first be modified and then written to disk.

Database 'objects' are the units in which programs read or write information. The units could be pages, records, and so on, but this is dependent on the DBMS and is not central to the principles underlying concurrency control or recovery.
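The read and write semantics described above (bring an object into main memory, copy it into a program variable; modify the in-memory copy, then write it back) can be sketched in a few lines. This is only an illustration: the Database class, its method names, and the dictionary model of disk and buffer are assumptions, not any real DBMS API.

```python
# A minimal sketch of reading and writing database objects through a
# main-memory buffer, as described above. Purely illustrative.

class Database:
    def __init__(self, disk):
        self.disk = disk      # persistent values: {object_name: value}
        self.buffer = {}      # in-memory copies, brought in on demand

    def read_object(self, name):
        # To read an object, it must first be brought into main memory
        # from disk; its value is then copied into a program variable.
        if name not in self.buffer:
            self.buffer[name] = self.disk[name]
        return self.buffer[name]

    def write_object(self, name, value):
        # To write an object, the in-memory copy is modified first ...
        self.buffer[name] = value
        # ... and then written back to disk.
        self.disk[name] = self.buffer[name]

db = Database({"A": 1000, "B": 2000})
a = db.read_object("A")       # copy A into a program variable
db.write_object("A", a - 50)  # modify the in-memory copy, then flush
```

In a real DBMS the write-back to disk may be deferred by the buffer manager rather than performed immediately, a point the force-output discussion later in these notes returns to.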
In this chapter, we will consider a database to be a collection of independent objects. When objects are added to or deleted from a database, or when there are relationships between database objects that we want to exploit for performance, some additional issues arise.

There are four important properties of transactions that a DBMS must ensure to maintain data in the face of concurrent access and system failures:

1. Users should be able to regard the execution of each transaction as atomic: either all actions are carried out or none are. Users should not have to worry about the effect of incomplete transactions (say, when a system crash occurs).
2. Each transaction, run by itself with no concurrent execution of other transactions, must preserve the consistency of the database. This property is called consistency, and the DBMS assumes that it holds for each transaction. Ensuring this property of a transaction is the responsibility of the user.
3. Users should be able to understand a transaction without considering the effect of other concurrently executing transactions, even if the DBMS interleaves the actions of several transactions for performance reasons. This property is sometimes referred to as isolation: transactions are isolated, or protected, from the effects of concurrently scheduling other transactions.
4. Once the DBMS informs the user that a transaction has been successfully completed, its effects should persist even if the system crashes before all its changes are reflected on disk. This property is called durability.

The acronym ACID is sometimes used to refer to the four properties of transactions that we have presented here: atomicity, consistency, isolation and durability. We now consider how each of these properties is ensured in a DBMS.

TRANSACTIONS AND SCHEDULES

A transaction is seen by the DBMS as a series, or list, of actions. The actions that can be executed by a transaction include reads and writes of database objects.
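The view of a transaction as a list of read/write actions can be made concrete with a small sketch. The tuple encoding of actions and the helper function are my own assumptions; the property checked (each transaction's actions appear in the schedule in their original order) is the defining constraint on schedules.

```python
# A sketch of transactions as lists of actions and a schedule as an
# interleaving of those lists. Each action is (txn_id, op, item).

T1 = [("T1", "R", "A"), ("T1", "W", "A")]
T2 = [("T2", "R", "B"), ("T2", "W", "B")]

def is_valid_schedule(schedule, transactions):
    # Each transaction's actions must occur in the schedule in the same
    # relative order as in the transaction itself.
    for txn in transactions:
        tid = txn[0][0]
        filtered = [a for a in schedule if a[0] == tid]
        if filtered != txn:
            return False
    return True

S = [T1[0], T2[0], T1[1], T2[1]]   # an interleaved schedule
serial = T1 + T2                   # a serial schedule: T1 then T2

print(is_valid_schedule(S, [T1, T2]))       # the interleaving is valid
print(is_valid_schedule(serial, [T1, T2]))  # so is the serial order
```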
A transaction can also be defined as a set of actions that are partially ordered. That is, the relative order of some of the actions may not be important. In order to concentrate on the main issues, we will treat transactions (and later, schedules) as a list of actions.

A schedule is a list of actions (reading, writing, aborting, or committing) from a set of transactions, and the order in which two actions of a transaction T appear in a schedule must be the same as the order in which they appear in T. For example, the schedule below shows an execution order for actions of two transactions T1 and T2. We move forward in time as we go down from one row to the next. We emphasize that a schedule describes the actions of transactions as seen by the DBMS. In addition to these actions, a transaction may carry out other actions, such as reading or writing from operating system files, evaluating arithmetic expressions, and so on.

T1        T2
R(A)
W(A)
          R(B)
          W(B)
R(C)
W(C)

(A Schedule Involving Two Transactions)

Notice that the schedule in the figure does not contain an abort or commit action for either transaction. A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule. A complete schedule must contain all the actions of every transaction that appears in it. If the actions of different transactions are not interleaved, that is, transactions are executed from start to finish, one by one, we call the schedule a serial schedule.

The basic database access operations that a transaction can include are as follows:
• read_item(X): Reads a database item named X into a program variable.
• write_item(X): Writes the value of program variable X into the database item named X.

Executing a read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.

Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).

The DBMS will generally maintain a number of buffers in main memory that hold database disk blocks containing the database items being processed. When these buffers are all occupied and additional database blocks must be copied into memory, some buffer replacement policy is used to choose which of the current buffers is to be replaced. If the chosen buffer has been modified, it must be written back to disk before it is reused.

A transaction includes read_item and write_item operations to access and update the database. Figure 2 shows examples of two very simple transactions. The read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set of all items that the transaction writes. For example, the read-set of T1 in Figure 2 is {X, Y} and its write-set is also {X, Y}.

(Figure 2: Two sample transactions. (a) Transaction T1. (b) Transaction T2.)

Why Concurrency Control Is Needed

Several problems can occur when concurrent transactions execute in an uncontrolled manner. Figure 2(a) shows a transaction T1 that transfers N reservations from one flight whose number of reserved seats is stored in the database item named X to another flight whose number of reserved seats is stored in the database item named Y.
Figure 2(b) shows a simpler transaction T2 that just reserves M seats on the first flight (X) referenced in transaction T1.

The Lost Update Problem: This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect. Suppose that transactions T1 and T2 are submitted at approximately the same time, and suppose that their operations are interleaved as shown in Figure 3a; then the final value of item X is incorrect, because T2 reads the value of X before T1 changes it in the database, and hence the updated value resulting from T1 is lost.

Figure 3: Some problems that occur when concurrent execution is uncontrolled. (a) The lost update problem. (b) The temporary update problem.

The Temporary Update (or Dirty Read) Problem: This problem occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value. Figure 3b shows an example where T1 updates item X and then fails before completion, so the system must change X back to its original value. Before it can do so, however, transaction T2 reads the "temporary" value of X, which will not be recorded permanently in the database because of the failure of T1. The value of item X that is read by T2 is called dirty data, because it has been created by a transaction that has not yet completed and committed; hence, this problem is also known as the dirty read problem.

The Incorrect Summary Problem: If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated. For example, suppose that a transaction T3 is calculating the total number of reservations on all the flights; meanwhile, transaction T1 is executing. If the interleaving of operations shown in Figure 3c occurs, the result of T3 will be off by an amount N because T3 reads the value of X after N seats have been subtracted from it but reads the value of Y before those N seats have been added to it.

Another problem that may occur is called unrepeatable read, where a transaction T reads an item twice and the item is changed by another transaction T' between the two reads. Hence, T receives different values for its two reads of the same item.

Why Recovery Is Needed

Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) the transaction has no effect whatsoever on the database or on any other transactions. The DBMS must not permit some operations of a transaction T to be applied to the database while other operations of T are not. This may happen if a transaction fails after executing some of its operations but before executing all of them.

Types of Failures: Failures are generally classified as transaction, system, and media failures. There are several possible reasons for a transaction to fail in the middle of execution:

• A computer failure (system crash): A hardware, software, or network error occurs in the computer system during transaction execution. Hardware crashes are usually media failures, for example, main memory failure.
• A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.
• Local errors or exception conditions detected by the transaction: During transaction execution, certain conditions may occur that necessitate cancellation of the transaction. For example, data for the transaction may not be found. Notice that an exception condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal, to be canceled. This exception should be programmed in the transaction itself, and hence would not be considered a failure.
• Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.
• Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction.
• Physical problems and catastrophes: This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.

Figure 4: State transition diagram illustrating the states for transaction execution

The System Log

To be able to recover from failures that affect transactions, the system maintains a log to keep track of all transaction operations that affect the values of database items. This information may be needed to permit recovery from failures. We now list the types of entries, called log records, that are written to the log and the action each performs.
In these entries, T refers to a unique transaction-id that is generated automatically by the system and is used to identify each transaction:

• [start_transaction, T]: Indicates that transaction T has started execution.
• [write_item, T, X, old_value, new_value]: Indicates that transaction T has changed the value of database item X from old_value to new_value.
• [read_item, T, X]: Indicates that transaction T has read the value of database item X.
• [commit, T]: Indicates that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the database.
• [abort, T]: Indicates that transaction T has been aborted.

Commit Point of a Transaction

A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database has been recorded in the log. Beyond the commit point, the transaction is said to be committed, and its effect is assumed to be permanently recorded in the database. The transaction then writes a commit record [commit, T] into the log. If a system failure occurs, we search back in the log for all transactions T that have written a [start_transaction, T] record into the log but have not written their [commit, T] record yet; these transactions may have to be rolled back to undo their effect on the database during the recovery process.

CHARACTERIZING SCHEDULES BASED ON RECOVERABILITY

When transactions are executing concurrently in an interleaved fashion, the order of execution of operations from the various transactions is known as a schedule (or history).

Schedules (Histories) of Transactions

A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti. For the purpose of recovery and concurrency control, we are mainly interested in the read_item and write_item operations of the transactions, as well as the commit and abort operations.

A shorthand notation for describing a schedule uses the symbols r, w, c, and a for the operations read_item, write_item, commit, and abort, respectively, and appends as a subscript the transaction id (transaction number) to each operation in the schedule. In this notation, the database item X that is read or written follows the r and w operations in parentheses. For example, the schedule of Figure 3(a), which we shall call Sa, can be written as follows in this notation:

Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);

Two operations in a schedule are said to conflict if they satisfy all three of the following conditions: (1) they belong to different transactions; (2) they access the same item X; and (3) at least one of the operations is a write_item(X).

A schedule S of n transactions T1, T2, ..., Tn is said to be a complete schedule if the following conditions hold:
1. The operations in S are exactly those operations in T1, T2, ..., Tn, including a commit or abort operation as the last operation for each transaction in the schedule.
2. For any pair of operations from the same transaction Ti, their order of appearance in S is the same as their order of appearance in Ti.
3. For any two conflicting operations, one of the two must occur before the other in the schedule.

For some schedules it is easy to recover from transaction failures, whereas for other schedules the recovery process can be quite involved.
Hence, it is important to characterize the types of schedules for which recovery is possible, as well as those for which recovery is relatively simple. First, we would like to ensure that, once a transaction T is committed, it should never be necessary to roll back T. The schedules that theoretically meet this criterion are called recoverable schedules and those that do not are called nonrecoverable, and hence should not be permitted. A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed. Recoverable schedules require a complex recovery process, but if sufficient information is kept (in the log), a recovery algorithm can be devised.

In a recoverable schedule, no committed transaction ever needs to be rolled back. However, it is possible for a phenomenon known as cascading rollback (or cascading abort) to occur, where an uncommitted transaction has to be rolled back because it read an item from a transaction that failed. Because cascading rollback can be quite time-consuming, since numerous transactions can be rolled back, it is important to characterize the schedules where this phenomenon is guaranteed not to occur. A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions.

Finally, there is a third, more restrictive type of schedule, called a strict schedule, in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). Strict schedules simplify the recovery process. In a strict schedule, the process of undoing a write_item(X) operation of an aborted transaction is simply to restore the before image (old_value or BFIM) of data item X.
This simple procedure always works correctly for strict schedules, but it may not work for recoverable or cascadeless schedules.

LOCKING TECHNIQUES FOR CONCURRENCY CONTROL

A lock is a variable associated with a data item that describes the status of the item with respect to possible operations that can be applied to it. Generally, there is one lock for each data item in the database. Locks are used as a means of synchronizing the access by concurrent transactions to the database items.

Types of Locks and System Lock Tables

Several types of locks are used in concurrency control.

Binary Locks: A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity). A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the item. If the value of the lock on X is 0, the item can be accessed when requested. We refer to the current value (or state) of the lock associated with item X as LOCK(X).

Two operations, lock_item and unlock_item, are used with binary locking. A transaction requests access to an item X by first issuing a lock_item(X) operation. If LOCK(X) = 1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is allowed to access item X. When the transaction is through using the item, it issues an unlock_item(X) operation, which sets LOCK(X) to 0 (unlocks the item) so that X may be accessed by other transactions. Hence, a binary lock enforces mutual exclusion on the data item. If the simple binary locking scheme described here is used, every transaction must obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on item X.

Shared/Exclusive (or Read/Write) Locks: The preceding binary locking scheme is too restrictive for database items, because at most one transaction can hold a lock on a given item. We should allow several transactions to access the same item X if they all access X for reading purposes only. However, if a transaction is to write an item X, it must have exclusive access to X. For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme, called shared/exclusive or read/write locks, there are three locking operations: read_lock(X), write_lock(X), and unlock(X). A lock associated with an item X, LOCK(X), now has three possible states: "read-locked," "write-locked," or "unlocked." A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.

When we use the shared/exclusive locking scheme, the system must enforce the following rules:
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed in T.
3. A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations are completed in T.
4. A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X.
5. A transaction T will not issue a write_lock(X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X. This rule may be relaxed.
6. A transaction T will not issue an unlock(X) operation unless it already holds a read (shared) lock or a write (exclusive) lock on item X.

Guaranteeing Serializability by Two-Phase Locking

A transaction is said to follow the two-phase locking protocol if all locking operations (read_lock, write_lock) precede the first unlock operation in the transaction. Such a transaction can be divided into two phases: an expanding or growing (first) phase, during which new locks on items can be acquired but none can be released; and a shrinking (second) phase, during which existing locks can be released but no new locks can be acquired.

Basic, Conservative, Strict, and Rigorous Two-Phase Locking: There are a number of variations of two-phase locking (2PL). The technique just described is known as basic 2PL. A variation known as conservative 2PL (or static 2PL) requires a transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set. The read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set of all items that it writes. If any of the predeclared items needed cannot be locked, the transaction does not lock any item; instead, it waits until all the items are available for locking. In strict 2PL, a transaction does not release any of its exclusive (write) locks until after it commits or aborts; this variation guarantees strict schedules. A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this variation, a transaction T does not release any of its locks (exclusive or shared) until after it commits or aborts, and so it is easier to implement than strict 2PL.
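The defining property of two-phase locking, that every locking operation precedes the first unlock, can be checked mechanically over a transaction's action list. The sketch below and its action-tuple encoding are my own illustration, not part of the notes.

```python
# A sketch of checking whether a transaction's action list obeys the
# two-phase locking protocol: all lock operations must precede the
# first unlock operation.

def is_two_phase(actions):
    # actions: list of ("lock", item), ("unlock", item), or other ops.
    seen_unlock = False
    for op, _item in actions:
        if op == "unlock":
            seen_unlock = True        # the shrinking phase has begun
        elif op == "lock" and seen_unlock:
            return False              # a lock in the shrinking phase: not 2PL
    return True

ok = [("lock", "X"), ("lock", "Y"), ("read", "X"),
      ("unlock", "X"), ("unlock", "Y")]
bad = [("lock", "X"), ("unlock", "X"), ("lock", "Y")]

print(is_two_phase(ok))    # all locks precede the first unlock
print(is_two_phase(bad))   # Y is locked after X was unlocked
```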
Notice the difference between conservative and rigorous 2PL: the former must lock all its items before it starts, so once the transaction starts it is in its shrinking phase, whereas the latter does not unlock any of its items until after it terminates (by committing or aborting), so the transaction is in its expanding phase until it ends.

Dealing with Deadlock and Starvation

Deadlock occurs when each transaction T in a set of two or more transactions is waiting for some item that is locked by some other transaction T' in the set. Hence, each transaction in the set is on a waiting queue, waiting for one of the other transactions in the set to release the lock on an item.

Deadlock Prevention Protocols: One way to prevent deadlock is to use a deadlock prevention protocol. One deadlock prevention protocol, which is used in conservative two-phase locking, requires that every transaction lock all the items it needs in advance (which is generally not a practical assumption); if any of the items cannot be obtained, none of the items are locked. Rather, the transaction waits and then tries again to lock all the items it needs. This solution obviously further limits concurrency.

A number of other deadlock prevention schemes have been proposed that make a decision about what to do with a transaction involved in a possible deadlock situation: Should it be blocked and made to wait, should it be aborted, or should the transaction preempt and abort another transaction? Suppose that transaction Ti tries to lock an item X but is not able to, because X is locked by some other transaction Tj with a conflicting lock. The rules followed by these schemes are as follows:

• Wait-die: If TS(Ti) < TS(Tj), that is, Ti is older than Tj, then Ti is allowed to wait; otherwise (Ti is younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp.
• Wound-wait: If TS(Ti) < TS(Tj), that is, Ti is older than Tj, then abort Tj (Ti wounds Tj) and restart it later with the same timestamp; otherwise (Ti is younger than Tj) Ti is allowed to wait.
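The two rules above reduce to a single timestamp comparison per lock conflict. A minimal sketch, where a lower timestamp means an older transaction (the function names and return strings are illustrative assumptions):

```python
# A sketch of the wait-die and wound-wait deadlock prevention decisions.
# ts_requester is TS(Ti), the transaction requesting the lock;
# ts_holder is TS(Tj), the transaction currently holding it.

def wait_die(ts_requester, ts_holder):
    # An older requester waits; a younger requester dies (is aborted and
    # later restarted with the same timestamp).
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    # An older requester wounds (aborts) the holder; a younger requester waits.
    return "abort holder" if ts_requester < ts_holder else "wait"

# Ti (timestamp 5) requests an item locked by Tj (timestamp 9):
print(wait_die(5, 9))     # Ti is older, so it waits
print(wound_wait(5, 9))   # Ti is older, so it wounds Tj
print(wait_die(9, 5))     # Ti is younger, so it dies
```

Both schemes avoid deadlock because transactions only ever wait on transactions in one timestamp direction, so no cycle of waits can form.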
A second, more practical, approach to dealing with deadlock is deadlock detection, where the system checks if a state of deadlock actually exists. This solution is attractive if we know there will be little interference among the transactions, that is, if different transactions will rarely access the same items at the same time.

A simple scheme to deal with deadlock is the use of timeouts. This method is practical because of its low overhead and simplicity. In this method, if a transaction waits for a period longer than a system-defined timeout period, the system assumes that the transaction may be deadlocked and aborts it, regardless of whether a deadlock actually exists or not.

Starvation: Another problem that may occur when we use locking is starvation, which occurs when a transaction cannot proceed for an indefinite period of time while other transactions in the system continue normally. This may occur if the waiting scheme for locked items is unfair, giving priority to some transactions over others. One solution for starvation is to have a fair waiting scheme, such as using a first-come-first-served queue; transactions are enabled to lock an item in the order in which they originally requested the lock.

CONCURRENCY CONTROL BASED ON TIMESTAMP ORDERING

A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically, timestamp values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time. We will refer to the timestamp of transaction T as TS(T).

Concurrency control techniques based on timestamp ordering do not use locks; hence, deadlocks cannot occur. The idea for this scheme is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the equivalent serial schedule has the transactions in order of their timestamp values.
This is called timestamp ordering (TO). The timestamp algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the serializability order. To do this, the algorithm associates with each database item X two timestamp (TS) values:

1. Read_TS(X): The read timestamp of item X; this is the largest timestamp among all the timestamps of transactions that have successfully read item X, that is, read_TS(X) = TS(T), where T is the youngest transaction that has read X successfully.
2. Write_TS(X): The write timestamp of item X; this is the largest of all the timestamps of transactions that have successfully written item X, that is, write_TS(X) = TS(T), where T is the youngest transaction that has written X successfully.

Whenever some transaction T tries to issue a read_item(X) or a write_item(X) operation, the basic TO algorithm compares the timestamp of T with read_TS(X) and write_TS(X) to ensure that the timestamp order of transaction execution is not violated. If this order is violated, then transaction T is aborted and resubmitted to the system as a new transaction with a new timestamp. If T is aborted and rolled back, any transaction T1 that may have used a value written by T must also be rolled back. Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back, and so on. This effect is known as cascading rollback and is one of the problems associated with basic TO, since the schedules produced are not guaranteed to be recoverable.
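Maintaining read_TS(X) and write_TS(X) and applying the basic TO checks can be sketched as follows. The class and method names are my own assumptions; returning False stands for "abort and roll back T, reject the operation."

```python
# A sketch of the basic timestamp-ordering checks, using read_TS(X) and
# write_TS(X) as defined above. Timestamps start at 0 for untouched items.

class TimestampOrdering:
    def __init__(self):
        self.read_ts = {}    # read_TS(X) per item
        self.write_ts = {}   # write_TS(X) per item

    def read_item(self, ts, x):
        # Reject if a younger transaction has already written X.
        if self.write_ts.get(x, 0) > ts:
            return False
        self.read_ts[x] = max(self.read_ts.get(x, 0), ts)
        return True

    def write_item(self, ts, x):
        # Reject if a younger transaction has already read or written X.
        if self.read_ts.get(x, 0) > ts or self.write_ts.get(x, 0) > ts:
            return False
        self.write_ts[x] = ts
        return True

to = TimestampOrdering()
print(to.read_item(10, "X"))   # first access to X succeeds
print(to.write_item(5, "X"))   # rejected: a younger txn (ts 10) read X
```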
This should be done because some younger transaction with a timestamp greater than TS(T)-and hence after T in the timestamp ordering-has already read or written the value of item X before T had a chance to write X, thus violating the timestamp ordering. b) If the condition in part (a) does not occur, then execute the wri te_i tem(X) operation ofT and set write_TS(X) to TS(T). 2. Transaction T issues a read_item(X) operation: a) If write_TS(X) > TS(T), then abort and roll back T and reject the operation. This should be done because some younger transaction with timestamp greater than TS(T)and hence after T in the timestamp ordering-has already written the value of item X before T had a chance to read X. b) If write_TS(X) ≤TS(T), then execute the read_item(X) operation of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X). DATABASE RECOVERY Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash due to hardware or software errors, a media failure, such as a head crash, or a software error in the application, such as a logical error in the program that is accessing the database. It may also be the result of unintentional or intentional corruption or destruction of data. Whatever the underlying cause of the failure, the DBMS must be able to recover from the failure and restore the database to a consistent state. It is the responsibility of DBMS to ensure that the database is reliable and remains in a consistent state in the presence of failures. In general,backup and recovery refers to the various strategies and procedures involved in Prepared By Nirjharinee Parida Database Management System protecting your database against data loss and reconstructing the data such that no data is lost after failure. 
Thus, the recovery scheme is an integral part of the database system that is responsible for restoring the database to the consistent state that existed prior to the failure.

Data Storage

The storage of data generally involves four different types of media with an increasing degree of reliability:

· Main memory
· Magnetic disk
· Magnetic tape
· Optical disk

Stable Storage

Stable storage is storage in which information is never lost. Truly stable storage devices are theoretically impossible to obtain, but we can use techniques to design a storage system in which the chances of data loss are extremely low. The most important information needed for the recovery process must be kept in stable storage.

Implementation of Stable Storage

· RAID (Redundant Array of Independent Disks) is commonly used to implement stable storage. In RAID, information is replicated on several disks in an array, and each disk has independent failure modes, so the failure of a single disk does not result in loss of data. A RAID system, however, cannot prevent data loss in disasters such as fires or flooding.

· To guard against such disasters, the system can store archival backups on tapes sent off site. However, since tapes cannot be sent off site continuously, some data may still be lost in such a disaster.

· To remove this problem, a more secure system keeps a copy of each block of stable storage at a remote site, writing it out over a computer network in addition to storing the block on a local disk.
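The remote-copy idea above can be illustrated with a small sketch. This is a hypothetical, simplified model (the class and method names are mine, not from any real system): a block is written to a first copy, and the second copy is updated only after the first write completes, so a crash between the two writes still leaves one good copy that recovery can propagate.

```python
# Sketch of mirrored stable-storage writes (hypothetical, simplified).
# The two "disks" are plain dicts; a real system uses devices with
# independent failure modes.

class StableStorage:
    def __init__(self):
        self.disk_a = {}   # first copy of every block
        self.disk_b = {}   # second copy, updated only after disk_a succeeds

    def write_block(self, block_id, data):
        # Write the first copy; only when it has completed do we touch
        # the second copy, so a crash between the two writes still
        # leaves at least one consistent copy of the block.
        self.disk_a[block_id] = data
        self.disk_b[block_id] = data

    def read_block(self, block_id):
        # On recovery, if the copies disagree, the completed write on
        # disk_a is taken as authoritative and propagated to disk_b.
        a, b = self.disk_a.get(block_id), self.disk_b.get(block_id)
        if a != b and a is not None:
            self.disk_b[block_id] = a
        return self.disk_a.get(block_id)

ss = StableStorage()
ss.write_block(1, b"balance=1000")
ss.disk_b[1] = b"stale"        # simulate a crash after the first copy only
print(ss.read_block(1))        # recovery repairs the second copy
```

The point of the ordering is simply that the two copies are never both mid-write at the same time.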
Causes of failures

· System crashes
· User error
· Carelessness
· Sabotage (intentional corruption of data)
· Statement failure
· Application software errors
· Network failure
· Media failure
· Natural physical disasters

Recovery from failures

· Actions taken during normal transaction processing to ensure that enough information exists to allow recovery from failure.
· Actions taken following a failure to recover the database to a state that is known to be correct.

Recovery and Atomicity of a Transaction

To understand the concept of atomicity of a transaction, consider again a simplified banking transaction T1 that transfers Rs 50 from account A to account B, with initial values of A and B being Rs 1000 and Rs 2000 respectively.

T1
Read(A,a)
a=a-50
Write(A,a)
Output(AX)
Read(B,b)
b=b+50
Write(B,b)
Output(BX)

In this example we suppose that after each write operation an output operation is performed; in other words, this is an example of force-output. Suppose that a system crash occurs during the execution of transaction T1 after Output(AX) has taken place but before Output(BX) was executed; here AX and BX are the buffer blocks that contain the values of data items A and B. Since the Output(AX) operation completed successfully, data item A has the value 950 on disk, while Output(BX) did not complete, so B's value remains 2000. The system has now entered an inconsistent state, because Rs 50 has been lost during the execution of transaction T1.

To recover from this problem there are two simple options:

(i) Re-execute T1. If transaction T1 is re-executed, the value of A becomes Rs 900 and B becomes Rs 2050. This again yields incorrect information, because the value of A should be Rs 950, not Rs 900.

(ii) Do not re-execute T1. The current state of the system has A = 950 and B = 2000, which is again an inconsistent state.
Thus, in either case the database is left in an inconsistent state, so this simple recovery scheme does not work. The reason for this difficulty is that we have modified the database without any assurance that the transaction will indeed commit. We must perform either all or none of the database modifications made by T1. This idea gives rise to the concept of atomicity of a transaction: either 100% or 0% of the transaction's database modifications should be performed. If the transaction does not complete successfully and some of its modifications are stored in the database while others are not, the database becomes inconsistent. Thus, to preserve the consistency of the database, transactions must have the property of atomicity. To achieve this goal, we must first output information regarding the modifications to stable storage without modifying the database itself. There are two ways to perform such outputs:

· Log-based recovery
· Shadow paging

Log-Based Recovery

The log is the most widely used structure for recording database modifications. In log-based recovery a log file is maintained for recovery purposes. The log file is a sequence of log records; the log records keep a trace of all the update operations on the database. There are several types of log record.

<Start> log record: contains information about the start of each transaction, including the transaction identifier, which uniquely identifies the transaction that starts. Representation: <Ti, start>

<Update> log record: describes a single database write and has the following fields:

<Ti, Xj, V1, V2>

Here, Ti is the transaction identifier, Xj is the data item, V1 is the old value of the data item, and V2 is the modified (new) value of the data item Xj.
For example, <T0, A, 1000, 1100> records that transaction T0 performed a successful write operation on data item A, whose old value was 1000 and whose value after the write is 1100.

<Commit> log record: when a transaction Ti successfully commits, a <Ti, commit> log record is stored in the log file.

<Abort> log record: when a transaction Ti is aborted for any reason, a <Ti, abort> log record is stored in the log file.

Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log record exists, we have the ability to undo a transaction.

Example

Suppose that a system crash occurs during the execution of transaction T1 after Output(AX) has taken place but before Output(BX) was executed; here AX and BX are the buffer blocks that contain the values of data items A and B. Since Output(AX) completed successfully, data item A has the value 950 on disk, while Output(BX) did not complete, so B's value remains 2000. The system is now in an inconsistent state, because Rs 50 has been lost during the execution of T1. To recover, the log record plays a very important role: we simply copy the old value of A, i.e. 1000, from the log record back to the database stored on disk. The operation of restoring the old value of a data item from the log record is called the undo operation.

Since the log plays a vital role in the recovery of the database, the log must reside in stable storage. Here, we assume that every log record is written to stable storage as soon as it is created.
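As an illustrative sketch (the record layout and function names are assumptions for this example, not any real DBMS's log format), the <Ti, Xj, V1, V2> update record and the undo operation for the crashed transfer above might look like this:

```python
# Sketch of <Ti, Xj, V1, V2> update log records and the undo operation
# described above. Names (LogRecord, undo) are illustrative only.

from dataclasses import dataclass

@dataclass
class LogRecord:
    txn: str        # transaction identifier Ti
    item: str       # data item Xj
    old: int        # V1, the value before the write
    new: int        # V2, the value after the write

database = {"A": 950, "B": 2000}          # state on disk after the crash
log = [LogRecord("T1", "A", 1000, 950)]   # only write(A) reached the disk

def undo(txn, log, database):
    # Scan the log backwards and restore the old value of every item
    # that the failed transaction wrote.
    for rec in reversed(log):
        if rec.txn == txn:
            database[rec.item] = rec.old

undo("T1", log, database)
print(database)   # {'A': 1000, 'B': 2000}, the consistent pre-T1 state
```

Scanning backwards matters when a transaction has written the same item more than once: the earliest old value is the one that ends up in the database.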
There are two techniques for log-based recovery:

· Deferred database modification
· Immediate database modification

Deferred Database Modification

This technique ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits. A transaction is said to be partially committed once its final action has been executed. When a transaction has performed all its actions, the information in the log associated with the transaction is used to execute the deferred writes; in other words, at partial-commit time the logged updates are "replayed" into the database items. During a write operation the modified values of the local variables are not copied into the database items; instead, the corresponding new value is stored in the log record. If the transaction successfully performs all its operations, the information stored in the log is used to set the values of the data items; if the transaction fails to complete, the data items retain their old values and the information in the log records is simply ignored.

The execution of transaction Ti proceeds as follows:

<Ti, start>: before Ti starts its execution, this log record is written to the log file.

<Ti, A, V2>: each write operation by Ti results in the writing of a new record to the log. This record indicates the new value of A, i.e. V2, after the write operation performed by Ti.

<Ti, commit>: when Ti partially commits, this record is written to the log.

It is important to note that only the new value of the data item is required by the deferred modification technique, because if the transaction fails before the commit, the log records are simply ignored; the data items already contain their old values owing to the deferred writes.
If the transaction commits, the new values present in the log records are copied to the database items. Thus the log records contain only the new values of the data items, not the old values. This approach saves stable-storage space and reduces the complexity of the log records.

Example

We reconsider our simplified banking system. Let T1 be a transaction that transfers Rs 100 from account A to account B. This transaction is defined as follows:

T1
Read(A,a)
a=a-100
Write(A,a)
Read(B,b)
b=b+100
Write(B,b)

Recovery procedure

The recovery procedure of deferred database modification is based on the redo operation, explained below:

Redo(Ti): sets the values of all data items updated by transaction Ti to the new values recorded in the log.

After a failure has occurred, the recovery subsystem consults the log to determine which transactions need to be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti, start> and the record <Ti, commit>. Thus, if the system crashes after a transaction completes its execution, the information in the log is used to restore the system to a previous consistent state.

Conclusion

Redo: if the log contains both records <Ti, start> and <Ti, commit>, the transaction's updates may or may not have been physically written to disk, so the redo operation is used to apply the new values from the log records.

Ignore and re-execute: in all other cases the log records for Ti are ignored; since the deferred writes were never applied, the database is unchanged, and the transaction can simply be re-executed.
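The deferred-modification recovery rule, redo committed transactions and ignore the rest, can be sketched as follows. The tuple-based log format here is illustrative only; a real log is binary and block-oriented.

```python
# Sketch of recovery under deferred database modification. Only
# transactions with both <Ti, start> and <Ti, commit> are redone;
# all other log records are ignored, since their deferred writes
# never reached the database.

database = {"A": 1000, "B": 2000}

log = [
    ("start", "T1"),
    ("write", "T1", "A", 900),   # deferred: only the NEW value is logged
    ("write", "T1", "B", 2100),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "A", 800),   # T2 never committed before the crash
]

# Committed transactions are exactly those with a commit record.
committed = {rec[1] for rec in log if rec[0] == "commit"}

def redo(log, database):
    # Replay the logged new values of committed transactions, in
    # log order; uncommitted records are skipped.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            database[item] = new_value

redo(log, database)
print(database)   # {'A': 900, 'B': 2100}; T2's write is ignored
```

Note that redo is idempotent: running it again over the same log leaves the database unchanged, which is exactly what is needed if the system crashes again during recovery.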
Immediate Database Modification

The immediate database modification technique allows database modifications to be output to the database while the transaction is still in the active state. The data modifications written by active transactions are called "uncommitted modifications". If the system crashes or the transaction aborts, the old-value field of the log records is used to restore the modified data items to the values they had prior to the start of the transaction. This restoration is accomplished through the undo operation. To understand the undo operation, consider the format of the log record:

<Ti, Xj, Vold, Vnew>

Here, Ti is the transaction identifier, Xj is the data item, Vold is the old value of the data item, and Vnew is the modified (new) value of the data item Xj.

Checkpoints

To recover the database after a failure, we must consult the log to determine which transactions need to be undone and which need to be redone. In principle, we need to search the entire log to determine this information. There are two major problems with this approach:

· The search process is time-consuming.
· Most of the transactions that need to be redone have already written their updates into the database. Redoing them causes no harm, but it makes the recovery process more time-consuming.

To reduce this overhead, the system periodically performs checkpoints.

Actions Performed During a Checkpoint

· Output to stable storage all log records currently residing in main memory.
· Output to disk all modified buffer blocks.
· Output to stable storage a log record <checkpoint>.

The presence of a <checkpoint> record makes the recovery process more streamlined. Consider a transaction Ti that committed prior to the checkpoint; this means that <Ti, commit> appears in the log before the <checkpoint> record.
Any database modifications made by Ti must have been written to the database either prior to the checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform a redo operation on Ti. The checkpoint record gives a list of all transactions that were in progress at the time the checkpoint was taken. Thus, checkpoints help the system determine at restart time which transactions to undo and which to redo. Consider the following situation:

· A system failure occurred at time tf.
· The most recent checkpoint prior to time tf was taken at time tc.
· Transactions of type T1 completed (successfully) prior to time tc.
· Transactions of type T2 started prior to time tc and completed (successfully) after time tc and before time tf.
· Transactions of type T3 also started prior to time tc but did not complete by time tf.
· Transactions of type T4 started after time tc and completed (successfully) before time tf.
· Finally, transactions of type T5 also started after time tc but did not complete by time tf.

It should be clear that, under the immediate modification technique, transactions that have both <Ti, start> and <Ti, commit> in the log must be redone, and transactions that have only <Ti, start> and no <Ti, commit> must be undone. Thus, when the system is restarted, transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone. Note, however, that transactions of type T1 do not enter the restart process at all, because their updates were forced to the database at time tc as part of the checkpoint process.

We assumed earlier that every log record is output to stable storage at the time it is created. This assumption imposes a high overhead on system execution for the following reasons:

· Output to stable storage is performed in units of blocks.
In most cases, a log record is much smaller than a block, so the output of each log record translates to a much larger output at the physical level.

· The cost of performing the output of a block to storage is sufficiently high that it is desirable to output multiple log records at once. To do so, we write log records to a log buffer in main memory, where they stay temporarily until they are output to stable storage. Multiple log records can be gathered in the log buffer and output to stable storage in a single output operation. The order of log records in stable storage must be exactly the same as the order in which they were written to the log buffer.

Because of log buffering, a log record may reside only in main memory (volatile storage) for a considerable time before it is output to stable storage. Since such log records are lost if the system crashes, we must impose additional requirements on the recovery techniques to ensure transaction atomicity:

· Transaction Ti enters the commit state only after the <Ti, commit> log record has been output to stable storage.
· Before the <Ti, commit> log record can be output to stable storage, all log records pertaining to transaction Ti must have been output to stable storage.
· Before a block of data in main memory can be output to the database (in nonvolatile storage), all log records pertaining to data in that block must have been output to stable storage.

The last rule is called the write-ahead logging (WAL) rule.

The Write-Ahead Log Protocol

Before writing a page to disk, every update log record that describes a change to this page must be forced to stable storage. This is accomplished by forcing all log records to stable storage before writing the page to disk. WAL is the fundamental rule that ensures that a record of every change to the database is available while attempting to recover from a crash.
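A minimal sketch of how a buffer manager can enforce the WAL rule, assuming a simple log sequence number (LSN) scheme; all names here are illustrative, not taken from any particular system:

```python
# Sketch of the write-ahead logging rule. Every log record gets a
# sequence number (LSN); each dirty page remembers the LSN of the last
# update applied to it, and the page may be written to disk only after
# the log has been flushed at least that far.

log_buffer = []        # (lsn, record) pairs not yet on stable storage
stable_log = []        # records already forced to stable storage
disk_pages = {}

class Page:
    def __init__(self, pid):
        self.pid = pid
        self.data = None
        self.page_lsn = -1   # LSN of the latest update to this page

def append_log(record):
    # Assign the next LSN and buffer the record in main memory.
    lsn = len(stable_log) + len(log_buffer)
    log_buffer.append((lsn, record))
    return lsn

def flush_log(up_to_lsn):
    # Force buffered log records up to and including up_to_lsn.
    while log_buffer and log_buffer[0][0] <= up_to_lsn:
        stable_log.append(log_buffer.pop(0))

def write_page(page):
    # WAL rule: the stable log must cover every change on the page
    # before the page itself reaches disk.
    flush_log(page.page_lsn)
    disk_pages[page.pid] = page.data

p = Page("B1")
p.data = {"A": 950}
p.page_lsn = append_log(("T1", "A", 1000, 950))
write_page(p)
print(len(stable_log), disk_pages["B1"])   # 1 {'A': 950}
```

Grouping several buffered records into one flush is what makes the log-buffer optimisation described above compatible with the WAL rule.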
If a transaction makes a change and commits, then under the no-force approach some of these changes may not have been written to disk at the time of a subsequent crash. Without a record of these changes, there would be no way to ensure that the changes of a committed transaction survive crashes. Note that the definition of a committed transaction is effectively "a transaction all of whose log records, including a commit record, have been written to stable storage". When a transaction commits, the log tail is forced to stable storage, even if a no-force approach is being used for the data pages.

Example

The database is stored in nonvolatile storage (disk), and blocks of data are brought into main memory as needed. Since main memory is typically much smaller than the entire database, it may be necessary to overwrite a block B1 in main memory when another block B2 needs to be brought into memory. If B1 has been modified, B1 must be output prior to the input of B2. The rules for the output of log records limit the freedom of the system to output blocks of data: if the input of block B2 causes block B1 to be chosen for output, all log records pertaining to data in B1 must be output to stable storage before B1 is output. Thus, the sequence of actions by the system would be as follows:

· Output log records to stable storage until all log records pertaining to block B1 have been output.
· Output block B1 to disk.
· Input block B2 from disk into main memory.

It is important that no writes to block B1 be in progress while this sequence of actions is carried out.

Failure with Loss of Nonvolatile Storage

Until now, we have considered only the case where a failure results in the loss of information residing in volatile storage while the content of nonvolatile storage remains intact.
Although failures in which the content of nonvolatile storage is lost are rare, we nevertheless need to be prepared to deal with this type of failure. In this section, we discuss only disk storage. The basic scheme is to dump the entire content of the database to stable storage periodically, say, once per day. For example, we may dump the database to one or more magnetic tapes. If a failure occurs that results in the loss of physical database blocks, the most recent dump is used to restore the database to a previous consistent state. Once this restoration has been accomplished, the system uses the log to bring the database to the most recent consistent state.

Module-III Short Questions

2008
1. What are the two techniques to prevent deadlock? [2][Concurrency control]
2. What is meant by concurrency? [2][Concurrency control]

2009
1. What do you mean by atomicity? Explain with example. [2][Transaction processing]
2. What is meant by the two-phase commit protocol? [2][Concurrency control]
3. Explain one of the pessimistic concurrency control schemes with example. [2][Concurrency control]
4. What is the difference between the REDO and UNDO operations? [2][Database recovery]
5. What is meant by concurrency? [2][Concurrency control]
6. What do you mean by serializability in transaction processing? [2][Concurrency control]

2010
1. What do you mean by the ACID properties of a transaction? [2][Transaction processing]
2. For the following operations:
T1: read(x); x=x+10; write(x);
T2: read(x); read(x);
For simultaneous execution, which problem can happen? [2][Transaction processing]
3. What is a timestamp? If TS(Ti) > TS(Tj), then which transaction is younger? Justify. Consider TS(Ti) as the timestamp of transaction Ti. [2][Transaction processing]

2011
1. What is a serial schedule? [2][Transaction processing]
2. What is the difference between concurrency control and crash control? [2][Concurrency control]

2012
1.
What is transaction atomicity? [2][Transaction processing]
2. What is two-phase locking? [2][Concurrency control]

Long Questions

2008
1. Explain the difference between implicit and explicit locks. Give examples to support your answer. [5][Concurrency control]
2. Define "data mining". What supports must be available with a DBMS to facilitate data mining? [5][Data mining]

2009
1. Define transaction. Explain the different states of a transaction. Differentiate between chained transactions and nested transactions. Discuss their advantages and disadvantages. [5][Transaction processing]
2. What do you mean by concurrent operations? List two disadvantages. Discuss the solutions for the problems that occur due to concurrency. [5][Concurrency control]
3. Explain the ACID properties associated with database transactions. What is the lost update problem? [5][Transaction processing]
4. Write short notes on any two:
a. OLAP and OLTP
b. Exclusive lock vs. shared lock
c. Deadlock
[5][Transaction processing]

2010
1. What is collision? Discuss the various collision resolution techniques. [5][Database recovery]
2. What do you mean by locking? Explain two-phase locking with an example. [5][Concurrency control]
3. Write short notes on:
a. Double buffering
b. Parallel databases
[5][Database recovery]

2011
1. Give an account of the various types of database failure and methods to recover data. Explain with examples. [5][Database recovery]
2. Give an account of various locks under concurrency control. [5][Concurrency control]
3. Define transaction. Explain the ACID properties with example. [5][Transaction processing]
4. Define "data mining". What supports must be available with a DBMS to facilitate data mining? [5][Transaction processing]

2012
1. Differentiate between the following terms:
i. Locking and timestamp-based schedulers. [5][Concurrency control]
2. What do you mean by a transaction?
How can you implement atomicity in transactions? Explain. [5][Transaction processing]
3. Describe the concept of serializability with a suitable example. [5][Transaction processing]
4. Explain Armstrong's axioms with ACID properties. State their usefulness. [5][Transaction processing]
5. State the types of database failure and explain the corresponding database recovery techniques. [5][Database recovery]

Fourth Semester 2009: Relational Database Systems

1.a. What do you mean by atomicity? Explain with example.

Ans: Atomicity: either all operations of the transaction are reflected properly in the database, or none are.

Example:
Ti: Read(A)
A=A-50
Write(A)
Read(B)
B=B+50
Write(B)

Whenever Ti is executed, either it should be executed fully or not at all. Suppose the system fails at a point when Write(A) has been performed but Write(B) is yet to be performed; the value of A+B, as reflected in the database, would then be Rs 50 less than it should be, thus violating the consistency requirement. The solution is to roll back the transaction, reverting to the old consistent state by restoring A to its old value. The old value of A is retrieved from the log file.

1.b. What is the difference between multi-valued attributes and derived attributes?

Multi-valued attribute: an attribute having more than one value, e.g. phone number, dependent name, vehicle.

Derived attribute: an attribute whose values can be derived from the values of existing attributes, e.g. age, which can be derived as (current date - birth date), or experience in years, which can be calculated as (current date - join date).

1.c. What is the difference between procedural and non-procedural languages?

Ans: In a DBMS, a procedural language specifies how the output of a query statement must be obtained. Example: relational algebra. A non-procedural language specifies only what output is to be obtained.
Example: SQL, QBE.

1.d. What is generalization? How does it differ from specialization?

Ans: Generalization is the abstracting process of viewing a set of objects as a single general class by concentrating on the general characteristics of the constituent sets while suppressing or ignoring their differences. Specialization is the abstracting process of introducing new characteristics into an existing class of objects to create one or more new classes of objects. This involves taking a higher-level entity and, using additional characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics of the higher-level entity.

1.e. What is meant by two-phase commit?

Ans: When T completes its execution, that is, when all the sites at which T has executed inform Ci that T has completed, Ci starts the 2PC protocol.

• Phase 1. Ci adds the record <prepare T> to the log, and forces the log onto stable storage. It then sends a prepare T message to all sites at which T executed. On receiving such a message, the transaction manager at that site determines whether it is willing to commit its portion of T. If the answer is no, it adds a record <no T> to the log, and then responds by sending an abort T message to Ci. If the answer is yes, it adds a record <ready T> to the log, and forces the log (with all the log records corresponding to T) onto stable storage. The transaction manager then replies with a ready T message to Ci.

• Phase 2. When Ci receives responses to the prepare T message from all the sites, or when a prespecified interval of time has elapsed since the prepare T message was sent out, Ci can determine whether the transaction T can be committed or aborted. Transaction T can be committed if Ci received a ready T message from all the participating sites. Otherwise, transaction T must be aborted.
Depending on the verdict, either a record <commit T> or a record <abort T> is added to the log, and the log is forced onto stable storage. At this point, the fate of the transaction has been sealed.

1.f. Explain one of the pessimistic concurrency control schemes with example.

Ans: Locking and timestamp ordering are pessimistic in that they force a wait or a rollback whenever a conflict is detected, even though there is a chance that the schedule may still be conflict-serializable.

Example of a locking protocol. A transaction performing locking:

T2: lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A+B)

Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum will be wrong. A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.

1.g. What is the difference between the REDO and UNDO operations?

Ans: Transactions that began either before or after the last checkpoint, but committed after the checkpoint and prior to the failure, need a REDO operation during recovery. Transactions that began before or after the last checkpoint, but had still not committed at the time of failure, need an UNDO operation at the time of recovery.

1.h. What is meant by concurrency?

Ans: Multiple transactions are allowed to run concurrently in the system.

1.i. What do you mean by serializability in transaction processing?

Ans: A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
o conflict serializability
o view serializability

1.j.
Draw the E-R diagram for the following entity sets: movies(title, year, length, filmtype) and stars(name, address).

Ans: [E-R diagram: entity set MOVIES, with attributes title, year, length and filmtype, is connected by an M:N relationship ACTS, with attribute role-name, to entity set STARS, with attributes name and address.]

2.a. List at least four advantages of using a database management system over a traditional file management system. Are there any disadvantages of a DBMS?

Ans: Advantages of a DBMS:

Reduction of redundancies: centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage required; avoiding duplication also eliminates the inconsistencies that tend to be present in redundant data files.
Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schema or any application program. Disadvantage of DBMS: 1. 2. 3. 4. 5. DBMS software and hardware (networking installation) cost is high The processing overhead by the dbms for implementation of security, integrity and sharing of the data. centralized database control Setup of the database system requires more knowledge, money, skills, and time. The complexity of the database may result in poor performance. Prepared By Nirjharinee Parida Database Management System 2.b. What is abstraction ? Is it necessary for database system ? Explain how database architecture satisfies abstraction at various levels Ans: Data Abstraction For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system: • Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail. • Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction. • View level. The highest level of abstraction describes only part of the entire database. 
Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

3.a. Draw the E-R diagram for the following relations:
Course(number, room) – it is a weak entity set.
Depts(name, HOD)
Labcourse(computer_allocation)
Theorycourses(name, faculty_name)

Ans: [E-R diagram: entity set DEPTS (attributes name, HOD) handles THEORY COURSES (attributes name, faculty_name) and has the weak entity set COURSES (attributes number, room), which in turn has LAB COURSES (attribute computer_allocation).]

3.b. Suppose we have the relation Examination(rollno, student_name, subject_code, subject_name, marks) with given data. To avoid redundancy, in the first proposal we have decomposed the relation into the following relations:
Student(rollno, student_name) and Subject(subject_code, subject_name, marks)

In the second proposal it is decomposed into the following relations:
Student(rollno, student_name), Result(rollno, subject_code, marks) and Subject(subject_code, subject_name)

State whether the decompositions are lossless or lossy.

Ans: Examination(R)(rollno(A), student_name(B), subject_code(C), subject_name(D), marks(E))

R is decomposed into:
Student(R1)(rollno(A), student_name(B))
Result(R2)(rollno(A), subject_code(C), marks(E))
Subject(R3)(subject_code(C), subject_name(D))

Initial tableau:

        A     B     C     D     E
R1      αA    αB    β1C   β1D   β1E
R2      αA    β2B   αC    β2D   αE
R3      β3A   β3B   αC    αD    β3E

In the above table we change row 2, column B to αB: since A→B holds (as in R1), when A is αA then B must be αB. Likewise we change row 2, column D to αD: since C→D holds (as in R3), when C is αC then D must be αD.

        A     B     C     D     E
R1      αA    αB    β1C   β1D   β1E
R2      αA    αB    αC    αD    αE
R3      β3A   β3B   αC    αD    β3E
After updating the tableau, row R2 contains all α symbols, so the decomposition of R into R1, R2, R3 is lossless.

4.a. What is normalization? Is it necessary for an RDBMS? Explain 4NF and 5NF with examples.

Ans: Normalization: The basic objective of normalization is to reduce redundancy, which means that information is to be stored only once. Storing information several times leads to wastage of storage space and an increase in the total size of the data stored. Relations are normalized so that when relations in a database are altered during the lifetime of the database, we do not lose information or introduce inconsistencies. The types of alteration normally needed for relations are:
o Insertion of new data values into a relation. This should be possible without being forced to leave blank fields for some attributes.
o Deletion of a tuple, namely a row of a relation. This should be possible without losing vital information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without exhaustively searching all the tuples in the relation.

4NF: A relation schema R is said to be in 4NF if and only if every MVD A→→B holding on R satisfies either of the following two conditions:
a. It is a trivial MVD, or
b. A is a super key of R.

5NF: A schema R will be in 5NF if each JD *(R1, R2, …, Rn) holding on R satisfies one of the following two conditions:
a. It is a trivial JD, i.e. at least one of the projections Ri (1<=i<=n) is equal to R itself.
b. Each of the projections Ri (1<=i<=n) is a super key.

When attributes in a relation have multivalued dependencies, further normalization to 4NF and 5NF is required. We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organisation. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4.
An item may be supplied by many vendors.

Table 8 gives a relation for this problem and figure 10 the dependency diagram(s).

4.b. Consider the three relations given below:
Order(order_no, order_date, customer_no)
Order_item(order_no, item_no, quantity, bill_amount)
Item(item_no, item_name, unit_price)
State whether the above relations are in 3NF or not. If not, what steps should you follow to bring them into 3NF?

Ans: THIRD NORMAL FORM:
Defn: A relational scheme R<S,F> is in third normal form (3NF) if, for all non-trivial functional dependencies in F+ of the form X→A, either X contains a key (i.e. X is a super key) or A is a prime attribute.

A third normal form normalization is needed where all attributes in a relation tuple are not functionally dependent only on the key attribute. If two non-key attributes are functionally dependent, there will be unnecessary duplication of data. So remove such attributes from the table and form a new table containing the two non-key attributes that are functionally dependent.

In the Order table, {order_no} is the key attribute and the non-key attributes are customer_no and order_date. Here order_date does not depend on customer_no, so there is no transitive dependency. This table is in 3NF.

In the Order_item table, the key attributes are {order_no, item_no} and the non-key attributes are quantity and bill_amount. Here bill_amount does not depend on quantity, so there is no transitive dependency and therefore this table is in 3NF.

In the Item table, item_no is the key attribute and the non-key attributes are item_name and unit_price, and unit_price depends on item_name. So there is a transitive dependency, and the table is not in 3NF.

Decompose the table as:
R1(item_no, item_name)
R2(item_name, unit_price)

Now these tables are in 3NF, because each table has one non-key attribute and no transitive dependency.
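The lossless-join test used in question 3.b can be mechanised as the chase (tableau) algorithm. The sketch below is illustrative, not taken from the notes; it assumes functional dependencies are supplied as (left, right) attribute-set pairs, and the helper name `is_lossless` is hypothetical:

```python
# Minimal sketch of the chase test for lossless-join decomposition.
# 'a' marks a distinguished symbol; (i, attr) is a subscripted symbol.

def is_lossless(attrs, decomposition, fds):
    """Return True if the decomposition has a lossless join under fds."""
    rows = [{A: ('a' if A in Ri else (i, A)) for A in attrs}
            for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every lhs attribute...
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[A] for A in lhs), []).append(row)
            # ...and equate their rhs symbols, preferring the distinguished 'a'.
            for group in groups.values():
                for B in rhs:
                    vals = {row[B] for row in group}
                    if len(vals) > 1:
                        new = 'a' if 'a' in vals else min(vals, key=str)
                        for row in group:
                            if row[B] != new:
                                row[B] = new
                                changed = True
    return any(all(row[A] == 'a' for A in attrs) for row in rows)

# The Examination example: R(A,B,C,D,E) with A->B and C->D,
# decomposed into R1(A,B), R2(A,C,E), R3(C,D).
attrs = {'A', 'B', 'C', 'D', 'E'}
second_proposal = [{'A', 'B'}, {'A', 'C', 'E'}, {'C', 'D'}]
fds = [({'A'}, {'B'}), ({'C'}, {'D'})]
print(is_lossless(attrs, second_proposal, fds))  # True: row R2 becomes all 'a'
```

Running it on the second proposal returns True (row R2 becomes all distinguished symbols, as in the tableau above), while the first proposal, Student(A, B) with Subject(C, D, E), returns False, confirming that it is lossy.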
5.a. Define transaction. Explain the different states of a transaction. Differentiate between a chained transaction and a nested transaction. Discuss their advantages and disadvantages.

Ans: A transaction is a unit of program execution that accesses and possibly updates various data items.

TRANSACTION STATES: A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing.
• Partially committed, after the final statement has been executed.
• Failed, after the discovery that normal execution can no longer proceed.
• Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
• Committed, after successful completion.

Nested transaction: When one transaction starts inside another transaction, it is called a nested transaction.
Advantage: When a transaction starts, it also starts sub-transactions, which provides protection for both the parent transaction and the child transactions.
Disadvantage: It takes more time to execute the parent transaction and the corresponding child transactions.

Chained transaction: When one transaction starts inside another transaction and the parent transaction waits for the child transaction to complete, it is called a chained transaction.
Advantage: When a transaction starts, it also starts sub-transactions, which provides protection for both the parent transaction and the child transactions.
Disadvantage: It takes more time to execute the parent transaction and the corresponding child transactions.

THIRD SEMESTER EXAMINATION-2010
DATABASE SYSTEM

1.a. Explain the concept of data abstraction.

Ans: Data Abstraction: For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database.
Since many database-system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system:
1. Physical level
2. Logical level
3. View level

b. Define a schema with an example.

Ans: A schema is a logical database description and is drawn as a chart of the types of data that are used. It gives the names of the entities and attributes and specifies the relationships between them. A database schema includes such information as:
o Characteristics of data items such as entities and attributes.
o Logical structures and relationships among these data items.
o Format for storage representation.
o Integrity parameters such as physical authorization and backup policies.
Example: STUDENT(rollno (primary key), name, address)

c. What is meant by an OLTP application?

Ans: OLTP means online transaction processing, in which the database is updated by inserting records, deleting records, or modifying records.
Examples: railway ticket booking, online banking, online shopping.

d. What do you mean by unauthorized access to a database system? Explain.

Ans: If any user accesses a database without getting permission from the server or the owner of the database, that is unauthorized access to the database. The server or owner can give a user permission to access the database with different privileges using the GRANT command.

e. Strong entity: An entity type that has a key attribute is called a strong entity.
Example: Loan(loanno, amount, loan_type)

Weak entity: Entity types that do not contain any key attribute, and hence cannot be identified independently, are called weak entity types. A weak entity can be identified uniquely only by considering some of its attributes in conjunction with the primary key attribute of another entity, which is called the identifying owner entity.
Example: Payment(pslno, pmtamt, pdate)

1.f.
Distinguish between physical data independence and logical data independence.

Ans:
• Physical data independence is the ability to modify the physical scheme without making it necessary to rewrite application programs. Such modifications include changing from unblocked to blocked record storage, or from sequential to random access files.
• Logical data independence is the ability to modify the conceptual scheme without making it necessary to rewrite application programs. Such a modification might be adding a field to a record; an application program's view hides this change from the program.

g. Define the second normal form of a relational database system.

Ans: Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and all non-prime attributes are fully functionally dependent on the relation keys.

A relation is said to be in 2NF if it is in 1NF and non-key attributes are functionally dependent on the key attribute(s). Further, if the key has more than one attribute, then no non-key attribute should be functionally dependent upon a part of the key attributes.

h. Define timestamp ordering in concurrency control.

Ans: Timestamps: With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.
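The two assignment methods above can be sketched in a few lines. This is an illustrative toy, not part of any real DBMS; the class and method names are assumptions:

```python
# Toy sketch of the two timestamp-assignment methods:
# (1) system clock, (2) incrementing logical counter.
import itertools
import time

class ClockTimestamper:
    """Method 1: use the system clock value when the transaction enters."""
    def new_timestamp(self):
        return time.time()

class CounterTimestamper:
    """Method 2: use a logical counter incremented per transaction."""
    def __init__(self):
        self._counter = itertools.count(1)
    def new_timestamp(self):
        return next(self._counter)

ts = CounterTimestamper()
t_i = ts.new_timestamp()   # Ti enters the system first
t_j = ts.new_timestamp()   # Tj enters later
assert t_i < t_j           # so TS(Ti) < TS(Tj): Ti is the older transaction
```

Either method guarantees the property the text requires: a transaction that enters later always receives a strictly larger timestamp.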
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj. To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.

i. Explain the concept of hashing.

Ans: Hashing is simply a mathematical formula that manipulates the key in some form to compute the index for this key in the hash table.

j. Distinguish between an XML database and a web database.

Ans: A web database is a database accessed through the internet through a standard interface, CGI (Common Gateway Interface), acting as middleware. The database stored on the server is a relational database; data can be retrieved through a JDBC-ODBC bridge.

XML data can be stored in any of several different ways. For example, XML data can be stored as strings in a relational database. Alternatively, relations can represent XML data as trees. As another alternative, XML data can be mapped to relations in the same way that E-R schemas are mapped to relational schemas. XML data may also be stored in file systems, or in XML databases, which use XML as their internal representation.

2.a. What is meant by data models? Describe the categories of data models.

Ans: A data model describes the structure of a database. It is a collection of conceptual tools for describing data, data relationships and consistency constraints. There are various types of data model, such as object-based logical models, record-based logical models and physical models.

Types of data model:
1. Object-based logical models
   a. E-R model
   b. Functional model
   c. Object-oriented model
   d. Semantic model
2. Record-based logical models
   a. Hierarchical database model
   b. Network model
   c. Relational model
3. Physical models

2.b. What is meant by E-R models? Draw an ER diagram for a bank database schema where a bank can have multiple branches, and each branch can have multiple accounts and loans.

Ans: Entity Relationship Model: The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and relationships among these objects. It was developed to facilitate database design by allowing specification of an enterprise schema which represents the overall logical structure of a database.

Main features of the E-R model:
o It is a high-level conceptual model.
o It allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships.
o It is widely used to develop an initial design of a database.
o It provides a set of useful concepts that make it convenient for a developer to move from a basic set of information to a detailed description of information that can be easily implemented in a database system.
o It describes data as a collection of entities, relationships and attributes.

3.a. Define functional dependency. Describe Armstrong's axioms of functional dependencies.

Ans:
The closure of F can be found by using a collection of rules called Armstrong axioms. Reflexivity rule: If A is a set of attributes and B is subset or equal to A, then A→B holds. Augmentation rule: If A→B holds and C is a set of attributes, then CA→CB holds Transitivity rule: If A→B holds and B→C holds, then A→C holds. Union rule: If A→B holds and A→C then A→BC holds Decomposition rule: If A→BC holds, then A→B holds and A→C holds. Pseudo transitivity rule: If A→B holds and BC→D holds, then AC→D holds. 3.b. Distinguish between equivalence sets of functional dependencies and minimal sets of functional dependencies. Ans: Equivalence sets: Given two sets of F and G of FDs defined over the same relational schema, we will say that F and G are equivalents if and only if F+ =G+. We will indicate that F and G are equivalent sets by writing F=G. Minimal cover: For a given set F of FDs, a canonical cover(minimal cover) denoted by Fc, is a set of FDs where the following condition are simultaneously satisfied. 1. Every FD of Fc is simple. That is the right hand side of every fd of Fc has only on attribute. 2. Fc is left reduced 3. Fc is non redundant 4. What is meant by normalization of Relational database? Define BCNF. How does it differ from 3NF? Why is it considered as a stronger from 3NF? Explain with suitable examples. NORMALIZATION Prepared By Nirjharinee Parida Database Management System The basic objective of normalization is to reduce redundancy which means that information is to be stored only once. Storing information several times leads to wastage of storage space and increase in the total size of the data stored. Relations are normalized so that when relations in a database are to be altered during the life time of the database, we do not lose information or introduce inconsistencies. The type of alterations normally needed for relations are: o Insertion of new data values to a relation. 
This should be possible without being forced to leave blank fields for some attributes. o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital information unknowingly. o Updating or changing a value of an attribute in a tuple. This should be possible without exhaustively searching all the tuples in the relation. BOYCE CODD NORMAL FORM: Defn: a normalized relation scheme R<S,F> is in Boyce Codd normal form if for every nontrivial FD in F+ of the form X→A where X is subset of S and AЄS, X is a super key of R. How it differs from 3nf: A relational scheme R<S,F> is in third normal form(3NF) if for all non trivial function dependencies in F+ of the form X→A, either X contains a key(i.e, X is super key) or A is a prime key attribute. But in BCNF for every nontrivial FD in F+ of the form X→A where X is subset of S and AЄS, X must be a candidate key of R. BCNF is stronger than 3NF: A relation schema in 3NF may still have some anomalies under the situations when a schema has multiple candidate keys, which my be composite and overlapping. Assume that a relation has more than one possible key. Assume further that the composite keys have a common attribute. If an attribute of a composite key is dependent on an attribute of the other composite key, a normalization called BCNF is needed. Consider. as an example, the relation Professor: It is assumed that 1. A professor can work in more than one department 2. The percentage of the time he spends in each department is given. Prepared By Nirjharinee Parida Database Management System 3. Each department has only one Head of Department. The relationship diagram for the above relation is given in figure 8. Table 6 gives the relation attributes. The two possible composite keys are professor code and Dept. or Professor code and Hcad of Dept. Observe that department as well as Head of Dept. are not non-key attributes. 
They are a part of a composite key.

5.a. How does multilevel indexing improve the efficiency of searching an index file?

Ans: Multilevel Indices

Even if we use a sparse index, the index itself may become too large for efficient processing. It is not unreasonable, in practice, to have a file with 100,000 records, with 10 records stored in each block. If we have one index record per block, the index has 10,000 records. Index records are smaller than data records, so let us assume that 100 index records fit on a block. Thus, our index occupies 100 blocks. Such large indices are stored as sequential files on disk.

If an index is sufficiently small to be kept in main memory, the search time to find an entry is low. However, if the index is so large that it must be kept on disk, a search for an entry requires several disk block reads. Binary search can be used on the index file to locate an entry, but the search still has a large cost. If the index occupies b blocks, binary search requires as many as ⌈log2(b)⌉ blocks to be read (⌈x⌉ denotes the least integer that is greater than or equal to x; that is, we round upward). For our 100-block index, binary search requires
The pointer points to a block of the inner index. We scan this block until we find the record that has the largest search-key value less than or equal to the one that we desire. The pointer in this record points to the block of the file that contains the record for which we are looking. Using the two levels of indexing, we have read only one index block, rather than the seven we read with binary search, if we assume that the outer index is already in main memory. If our file is extremely large, even the outer index may grow too large to fit in main memory. In such a case,we can create yet another level of index. Indeed, we can repeat this process as many times as necessary. Indices with two or more levels are called multilevel indices. Searching for records with a multilevel index requires significantly fewer I/O operations than does searching for records by binary search. Each level of index could correspond to a unit of physical storage. Thus, we may have indices at the track, cylinder, and disk levels. A typical dictionary is an example of a multilevel index in the non database world. The header of each page lists the first word alphabetically on that page. Such a book index is a multilevel index: The words at the top of each page of the book index form a sparse index on the contents of the dictionary pages Prepared By Nirjharinee Parida Database Management System 5.b. How does a B-tree differ from b+-tree ? Why a B+ tree is usually preferred as an access structure to a data file. B+-Tree Index Files: B+-tree indices are an alternative to indexed-sequential files. 1. Disadvantage of indexed-sequential files: performance degrades as file grows, since many overflow blocks get created. Periodic reorganization of entire file is required. 2. Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the face of insertions and deletions. Reorganization of entire file is not required to maintain performance. 3. 
Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
4. The advantages of B+-trees outweigh the disadvantages, and they are used extensively.

A B+-tree is a rooted tree satisfying the following properties:
1. All paths from root to leaf are of the same length.
2. Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
3. A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
4. Special cases:
   a. If the root is not a leaf, it has at least 2 children.
   b. If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and n–1 values.

[Example of a B+-tree: figure omitted]

B+-Tree File Organization:
1. The index-file degradation problem is solved by using B+-tree indices. The data-file degradation problem is solved by using the B+-tree file organization.
2. The leaf nodes in a B+-tree file organization store records, instead of pointers.
3. Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node.
4. Leaf nodes are still required to be half full.
5. Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.

B-Tree Index Files:
1. Similar to a B+-tree, but a B-tree allows search-key values to appear only once; this eliminates redundant storage of search keys.
2. Search keys in non-leaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a non-leaf node must be included.
3.
Heuristic optimization: A drawback of cost-based optimization is the cost of optimization itself. Although the cost of query processing can be reduced by clever optimizations, cost-based optimization is still expensive. Hence, many systems use heuristics to reduce the number of choices that must be made in a cost-based fashion. Some systems even choose to use only heuristics, and do not use cost-based optimization at all. o o o Cost-based optimization is expensive, even with dynamic programming. Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion. Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance: o Perform selection early (reduces the number of tuples) o Perform projection early (reduces the number of attributes) o Perform most restrictive selection and join operations before other similar operations. Prepared By Nirjharinee Parida Database Management System Some systems use only heuristics, others combine heuristics with partial cost-based optimization. Steps in Typical Heuristic Optimization: 1. Deconstruct conjunctive selections into a sequence of single selection operations 2. Move selection operations down the query tree for the earliest possible execution 3. Execute first those selection and join operations that will produce the smallest relations 4. Replace Cartesian product operations that are followed by a selection condition by join operations . 5. Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed. 6. Identify those sub trees whose operations can be pipelined, and execute them using pipelining. 6.b. Describe the ACID properties of a database transaction. Ans: ACID properties of transaction: Atomicity. Either all operations of the transaction are reflected properly in the database, or none are. Consistency. 
Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.

Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.

Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

These properties are often called the ACID properties; the acronym is derived from the first letter of each of the four properties.

7.a. Discuss how serializability is used to enforce concurrency control in a database system.

Ans: Serializability: The basic assumption is that each transaction preserves database consistency, so serial execution of a set of transactions preserves database consistency. A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability

The Two-Phase Locking Protocol: One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.

Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests. For example, transactions T3 and T4 are two-phase. On the other hand, transactions T1 and T2 are not two-phase.
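Whether a lock/unlock sequence obeys the two-phase rule can be checked mechanically. The sketch below is illustrative: the operation strings and the example schedules are assumptions made in the spirit of the T3 and T1 examples mentioned above, not taken from the notes:

```python
# Minimal sketch: a schedule is two-phase if no lock request
# (shared lock-S or exclusive lock-X) appears after the first unlock.

def is_two_phase(operations):
    """True if every lock is acquired before any lock is released."""
    shrinking = False
    for op in operations:
        if op.startswith(('lock-S', 'lock-X')):
            if shrinking:
                return False        # acquired a lock after releasing one
        elif op.startswith('unlock'):
            shrinking = True        # shrinking phase has begun
    return True

# T3-style schedule: all locks acquired before any unlock -> two-phase.
print(is_two_phase(['lock-X(B)', 'lock-X(A)', 'unlock(B)', 'unlock(A)']))  # True
# T1-style schedule: unlocks early, then locks again -> not two-phase.
print(is_two_phase(['lock-X(B)', 'unlock(B)', 'lock-X(A)', 'unlock(A)']))  # False
```

The check captures exactly the growing/shrinking split described above: the first `unlock` marks the transition, and any later lock request violates the protocol.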
Note that the unlock instructions do not need to appear at the end of the transaction. For example, in the case of transaction T3, we could move the unlock(B) instruction to just after the lock-X(A) instruction and still retain the two-phase locking property.

7.b. Discuss the problems of deadlock and starvation and the different approaches to deal with these problems.

DEADLOCK: A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.

Recovery from deadlock: When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The most common solution is to roll back one or more transactions to break the deadlock. Three actions need to be taken:

1. Selection of a victim. Given a set of deadlocked transactions, we must determine which transaction (or transactions) to roll back to break the deadlock. We should roll back those transactions that will incur the minimum cost. Unfortunately, the term minimum cost is not a precise one. Many factors may determine the cost of a rollback, including:
a. How long the transaction has computed, and how much longer the transaction will compute before it completes its designated task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs for it to complete.
d. How many transactions will be involved in the rollback.

2. Rollback. Once we have decided that a particular transaction must be rolled back, we must determine how far this transaction should be rolled back. The simplest solution is a total rollback: abort the transaction and then restart it. However, it is more effective to roll back the transaction only as far as necessary to break the deadlock.
Such partial rollback requires the system to maintain additional information about the state of all the running transactions. Specifically, the sequence of lock requests/grants and updates performed by the transaction needs to be recorded. The deadlock detection mechanism should decide which locks the selected transaction needs to release in order to break the deadlock. The selected transaction must be rolled back to the point where it obtained the first of these locks, undoing all actions it took after that point. The recovery mechanism must be capable of performing such partial rollbacks. Furthermore, the transactions must be capable of resuming execution after a partial rollback. See the bibliographical notes for relevant references.

3. Starvation. In a system where the selection of victims is based primarily on cost factors, it may happen that the same transaction is always picked as a victim. As a result, this transaction never completes its designated task, and thus there is starvation. We must ensure that a transaction can be picked as a victim only a (small) finite number of times. The most common solution is to include the number of rollbacks in the cost factor.

8. Consider the following relations:
Customer(cust_name, cust_street, cust_city)
Loan(branch_name, loan_number, amount)
Borrower(cust_name, loan_number, amount)

Write the following queries in SQL for the above relations:
a. Find the names of all customers who have a loan at the "Redwood" branch.
b. Find the branch name, loan number and amount for loans over Rs. 50000/-.
c. Find the names of all customers who have a loan, an account, or both at the Perryridge branch.
d. Find all customers who have a loan from the bank. Find their names and loan numbers.
e. Find all loan numbers for loans made at the Redwood branch with loan amounts greater than Rs. 60000.

Ans:
a. select borrower.cust_name
   from loan, borrower
   where loan.loan_number = borrower.loan_number
   and loan.branch_name = 'Redwood'
b.
select branch_name, loan_number, amount from loan where amount > 50000
c. (select customer_name from depositor a, account b where a.account_no = b.account_no and b.branch_name = 'Perryridge')
union
(select customer_name from borrower a, loan b where a.loan_number = b.loan_number and b.branch_name = 'Perryridge')
d. select borrower.cust_name, borrower.loan_number from borrower, loan where borrower.loan_number = loan.loan_number
e. select a.loan_number from borrower a, loan b where a.loan_number = b.loan_number and b.branch_name = 'Redwood' and b.amount > 60000
Fourth Semester Examination-2010
DATABASE ENGINEERING
1. a. What do you mean by the ACID properties of a transaction?
Ans:
Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
Consistency. Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.
Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
b. Consider a multilevel index with fan-out = 4 used to index 25 records; draw the structure.
(Figure: a two-level index with fan-out 4; the sample key values 2, 4, 5, 6, 7, 14, 21 and 25 appear in the original drawing.)
c. For the following operations:
T1: read(X); X = X + 10; write(X)
T2: read(X); read(X)
For simultaneous execution of T1 and T2, which problem can happen?
Ans: A dirty read can occur in T2: if one of T2's read(X) operations executes after T1 has written X but before T1 commits, T2 sees an uncommitted value of X.
d.
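The dirty-read anomaly in question c can be reproduced deterministically by interleaving the two transactions by hand. A small Python sketch (the variable names and values are invented for illustration):

```python
# Simulate the schedule: T1 reads X, updates it, writes X (uncommitted);
# T2 then reads X and sees the dirty (uncommitted) value.
committed_x = 100          # committed value of X in the "database"
buffer_x = committed_x     # T1's working copy

# T1: read(X); X = X + 10; write(X) -- the write is visible but not committed
buffer_x = buffer_x + 10
uncommitted_x = buffer_x   # value now visible to other transactions

# T2: read(X) -- dirty read: sees 110 although 100 is the committed value
t2_sees = uncommitted_x
print(t2_sees, committed_x)   # 110 100

# If T1 now aborts, T2 has acted on a value that never officially existed.
```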
For a relation R(A,B,C,D) with the dependencies among numeric field values C = 2A + B and D = 2B, draw the ER diagram.
Ans: (Figure: an entity R with stored attributes A and B, and derived attributes C = 2A + B and D = 2B.)
e. Differentiate between foreign key and references.
Ans: A foreign key states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. This constraint is specified on two relations. A column declared as a foreign key must be the primary key of another table. For example, in Department(deptcode, dname) the deptcode is the primary key, and in Emp(empcode, name, city, deptcode) the deptcode is a foreign key. The REFERENCES keyword is used to name the parent table in which that primary key is created.
f. What is a timestamp? If TS(Ti) > TS(Tj), then which transaction is younger? Justify. Consider TS(Ti) to be the timestamp of transaction Ti.
Ans: Timestamps: With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.
Ti is younger than Tj because TS(Ti) > TS(Tj): the larger timestamp belongs to the transaction that entered the system later.
g. What is a query tree? Draw the query tree for the following SQL query:
select R1.A, R2.B from R1, R2 where R1.b = R2.a
Ans: A query tree is a tree that represents a relational-algebra expression and hence a query-evaluation plan: the leaf nodes are the input relations and the internal nodes are the operations.
П R1.A, R2.B
|
σ R1.b = R2.a
|
×
/   \
R1    R2
h.
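The logical-counter scheme from question f can be sketched in a few lines of Python (class and transaction names are illustrative):

```python
# Timestamp assignment with a logical counter: each new transaction gets
# the next counter value, so a larger TS always means a younger transaction.
class TimestampManager:
    def __init__(self):
        self.counter = 0
        self.ts = {}               # transaction name -> timestamp

    def begin(self, txn):
        self.counter += 1          # incremented for every new transaction
        self.ts[txn] = self.counter
        return self.ts[txn]

    def younger(self, ti, tj):
        """Return whichever of ti, tj entered the system later."""
        return ti if self.ts[ti] > self.ts[tj] else tj

tm = TimestampManager()
tm.begin("Tj")                     # enters first  -> TS = 1
tm.begin("Ti")                     # enters second -> TS = 2
print(tm.younger("Ti", "Tj"))      # Ti, since TS(Ti) > TS(Tj)
```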
Find the tuple calculus representation for the following SQL query:
select R1.a, R2.b from R1, R2 where R1.b = R2.a;
Ans: R = {t | ∃s ∈ R1 ∃u ∈ R2 (t[a] = s[a] ∧ t[b] = u[b] ∧ s[b] = u[a])}
i. For the following set of dependencies: {A→BC, B→D, C→DE, BC→F}, find the primary key of the relation.
Ans:
A+ = {A, B, C, D, E, F}
B+ = {B, D}
(BC)+ = {B, C, D, E, F}
Here A is a superkey because A+ gives all the attributes of R. Since no proper subset of {A} does so, it is also a candidate key, and it can be chosen as the primary key of R.
j. What do you mean by RAID?
Ans: A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), have been used to achieve improved performance and reliability. In the past, system designers viewed storage systems composed of several small cheap disks as a cost-effective alternative to using large, expensive disks; the cost per megabyte of the smaller disks was less than that of larger disks. In fact, the I in RAID, which now stands for independent, originally stood for inexpensive. Today, however, all disks are physically small, and larger-capacity disks actually have a lower cost per megabyte. RAID systems are now used for their higher reliability and higher performance, rather than for economic reasons.
2.
Consider the set of relations:
Student(name, roll, mark)
Score(roll, grade)
Details(name, address)
For the following query: "Find the name and address of students scoring grade A", represent it in relational algebra, tuple relational calculus, domain relational calculus, QBE and SQL.
Ans:
Relational algebra:
Пdetails.name, details.address (σ student.roll = score.roll ∧ student.name = details.name ∧ score.grade = "A" (Student × Score × Details))
Tuple relational calculus:
R = {t | ∃s ∈ Details ∃u ∈ Student ∃w ∈ Score (t[name] = s[name] ∧ t[address] = s[address] ∧ u[name] = s[name] ∧ w[roll] = u[roll] ∧ w[grade] = "A")}
Domain relational calculus:
R = {<nm, ad> | <nm, ad> ∈ Details ∧ ∃r ∃m ∃g (<nm, r, m> ∈ Student ∧ <r, g> ∈ Score ∧ g = "A")}
SQL:
select details.name, details.address from student, score, details where student.roll = score.roll and student.name = details.name and score.grade = 'A'
3. a. What is collision? Discuss the various collision resolution techniques.
A collision occurs when two different keys hash to the same address. There are two broad ways of collision resolution:
1. Separate chaining: an array-of-linked-lists implementation.
2. Open addressing: an array-based implementation, with (i) linear probing (linear search), (ii) quadratic probing (nonlinear search), and (iii) double hashing (uses two hash functions).
Separate chaining:
• The hash table is implemented as an array of linked lists.
• Inserting an item, r, that hashes at index i is simply insertion into the linked list at position i. Synonyms are chained in the same linked list.
• Retrieval of an item, r, with hash address, i, is simply retrieval from the linked list at position i.
Example: Load the keys 23, 13, 21, 14, 7, 8, and 15 , in this order, in a hash table of size 7 using separate chaining with the hash function: h(key) = key % 7 h(23) = 23 % 7 = 2 h(13) = 13 % 7 = 6 h(21) = 21 % 7 = 0 h(14) = 14 % 7 = 0 collision h(7) = 7 % 7 = 0 collision h(8) = 8 % 7 = 1 h(15) = 15 % 7 = 1 collision Introduction to Open Addressing: • • • All items are stored in the hash table itself. In addition to the cell data (if any), each cell keeps one of the three states: EMPTY, OCCUPIED, DELETED. While inserting, if a collision occurs, alternative cells are tried until an empty cell is found. • Deletion: (lazy deletion): When a key is deleted the slot is marked as DELETED rather than EMPTY otherwise subsequent searches that hash at the deleted cell will fail. • Probe sequence: A probe sequence is the sequence of array indexes that is followed in searching for an empty cell during an insertion, or in searching for a key during find or delete operations. • The most common probe sequences are of the form: hi(key) = [h(key) + c(i)] % n, for i = 0, 1, …, n-1. Prepared By Nirjharinee Parida Database Management System where h is a hash function and n is the size of the hash table • The function c(i) is required to have the following two properties: Property 1: c(0) = 0 Property 2: The set of values {c(0) % n, c(1) % n, c(2) % n, . . . , c(n-1) % n} must be a permutation of {0, 1, 2,. . ., n – 1}, that is, it must contain every integer between 0 and n - 1 inclusive. • The function c(i) is used to resolve collisions. • To insert item r, we examine array location h0(r) = h(r). If there is a collision, array locations h1(r), h2(r), ..., hn-1(r) are examined until an empty slot is found. • Similarly, to find item r, we examine the same sequence of locations in the same order. 
• Note: For a given hash function h(key), the only difference in the open addressing collision resolution techniques (linear probing, quadratic probing and double hashing) is in the definition of the function c(i). Common definitions of c(i) are: • Collision resolution technique c(i) Linear probing i Quadratic probing ±i2 Double hashing i*hp(key) where hp(key) is another hash function. Open Addressing: Linear Probing: • • c(i) is a linear function in i of the form c(i) = a*i. Usually c(i) is chosen as: c(i) = i for i = 0, 1, . . . , tableSize – 1 • The probe sequences are then given by: Prepared By Nirjharinee Parida Database Management System hi(key) = [h(key) + i] % tableSize • for i = 0, 1, . . . , tableSize – 1 For c(i) = a*i to satisfy Property 2, a and n must be relatively prime. 3.b. Consider the following set of data items: A B C D A1 1 X1 D1 A2 2 X2 D2 A3 3 X3 D3 A3 3 X3 D3 Represent it in 3NF. Ans: In this table 4th row is duplicate record of 3rd one. Delete that record before checking for normalization. Now the table is in 1NF format. In this relation all row are uniue. In this relation all functional dependencies are there like A→B,B→C,C→D,A→C.A→D,B→D,AB→C,AB→D,… Here the candidate key is (ABCD) So it a all key schema. There is no transitivity dependency so it is 3NF table. Prepared By Nirjharinee Parida Database Management System 4. Describe the steps to reduce an ER-scema to tables Conversion of ER-diagram to relational database Conversion of entity sets: 1. For each strong entity type E in the ER diagram, we create a relation R containing all the single attributes of E. The primary key of the relation R will be one of the key attribute of R. STUDENT(rollno (primary key),name, address) FACULTY(id(primary key),name ,address, salary) COURSE(course-id,(primary key),course_name,duration) DEPARTMENT(dno(primary key),dname) 2. for each weak entity type W in the ER diagram, we create another relation R that contains all simple attributes of W. 
If E is an owner entity of W then key attribute of E is also include In R. This key attribute of R is set as a foreign key attribute of R. Now the combination of primary key attribute of owner entity type and partial key of the weak entity type will form the key of the weak entity type GUARDIAN((rollno,name) (primary key),address,relationship) Conversion of relationship sets: Binary Relationships: One-to-one relationship: For each 1:1 relationship type R in the ER-diagram involving two entities E1 and E2 we choose one of entities(say E1) preferably with total participation and add primary key attribute of another E as a foreign key attribute in the table of entity(E1). We will also include all the simple attributes of relationship type R in E1 if any, For example, the department relationship has been extended tp include head-id and attribute of the relationship. DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM) Prepared By Nirjharinee Parida Database Management System One-to-many relationship: For each 1:n relationship type R involving two entities E1 and E2, we identify the entity type (say E1) at the n-side of the relationship type R and include primary key of the entity on the other side of the relation (say E2) as a foreign key attribute in the table of E1. We include all simple attribute(or simple components of a composite attribute of R(if any) in he table E1) For example: The works in relationship between the DEPARTMENT and FACULTY. For this relationship choose the entity at N side, i.e, FACULTY and add primary key attribute of another entity DEPARTMENT, ie, DNO as a foreign key attribute in FACULTY. FACULTY(CONSTAINS WORKS_IN RELATIOSHIP) (ID,NAME,ADDRESS,BASIC_SAL,DNO) Many-to-many relationship: For each m:n relationship type R, we create a new table (say S) to represent R, We also include the primary key attributes of both the participating entity types as a foreign key attribute in s. 
Any simple attributes of the m:n relationship type(or simple components as a composite attribute) is also included as attributes of S. For example: The M:n relationship taught-by between entities COURSE; and FACULTY shod be represented as a new table. The structure of the table will include primary key of COURSE and primary key of FACULTY entities. TAUGHT-BY(ID (primary key of FACULTY table),course-id (primary key of COURSE table) N-ary relationship: For each n-anry relationship type R where n>2, we create a new table S to represent R, We include as foreign key attributes in s the primary keys of the relations that represent the participating entity types. We also include any simple attributes of the n-ary relationship type(or Prepared By Nirjharinee Parida Database Management System simple components of complete attribute) as attributes of S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types. Loan Customer Loan sanctio n Employee LOAN-SANCTION(cusomet-id,loanno,empno,sancdate,loan_amount) Multi-valued attributes: For each multivalued attribute ‘A’, we create a new relation R that includes an attribute corresponding to plus the primary key attributes k of the relation that represents the entity type or relationship that has as an attribute. The primary key of R is then combination of A and k. For example, if a STUDENT entity has rollno,name and phone number where phone numer is a multivalued attribute the we will create table PHONE(rollno,phoneno) where primary key is the combination,In the STUDENT table we need not have phone number, instead if can be simply (rollno,name) only. 
PHONE(rollno,phoneno) Prepared By Nirjharinee Parida Database Management System Account_n o name Account branch generalisation specialisation Is-a intrest charges Saving Current Converting Generalisation /specification hierarchy to tables: A simple rule for conversion may be to decompose all the specialized entities into table in case they are disjoint, for example, for the figure we can create the two table as: Account(account_no,name,branch,balance) Saving account(account-no,intrest) Current_account(account-no,charges) Prepared By Nirjharinee Parida Database Management System 5.a. Differentiate between object-oriented database and object relational database CRITERIA ORDBMS ODBMS Define standard SQL3/4 ODMG_V2.0 Support for object oriented programming Limited mostly to new data types Direct and extensive Simplicity of use Same RDBMS, with some confusing extensions OK for programmer; some SQL access foe end users Complex data relationships Difficult to model Can handle arbitrary complexity; users can write methods on any structure Simplicity of development Provides independence of data from application, good for simple relationship Objects are a natural way to model, can accommodate a wide verity of types and relationships. 5.b What do you mean by locking ? Explain the two phase locking with an example A lock is a mechanism to control concurrent access to a data item Data items can be locked in two modes : o exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction. Prepared By Nirjharinee Parida Database Management System o shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction. Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted. The Two-Phase Locking Protocol One protocol that ensures serializability is the two-phase locking protocol. 
This protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests. For example, transactions T3 and T4 are two phase. On the other hand, transactions T1 and T2 are not two phase. Note that the unlock instructions do not need to appear at the end of the transaction. For example, in the case of transaction T3, we could move the unlock(B) instruction to just after the lock-X(A) instruction, and still retain the two-phase locking property.
6. What is the difference between 4NF and BCNF? Describe with examples.
Ans:
BCNF: The criteria for Boyce-Codd normal form (BCNF) are:
* The table must be in 3NF.
* Every non-trivial functional dependency must be a dependency on a superkey.
4NF: The criteria for fourth normal form (4NF) are:
* The table must be in BCNF.
* There must be no non-trivial multivalued dependencies on something other than a superkey.
A BCNF table is said to be in 4NF if and only if all of its multivalued dependencies are functional dependencies.
CTX:
COURSE  TEACHER  TEXT
OS      Ravi     Galvin
OS      Vivek    Dietel
OS      Ravi     Dietel
OS      Vivek    Galvin
CO      Ram      Hamcher
CO      Shyam    M.Mano
CO      Ram      M.Mano
CO      Shyam    Hamcher
As indicated in CTX, the schema CTX does not have any non-trivial FDs. Thus, it is an "all key" schema and all legal relations under this schema will be in BCNF. But this relation still has the following data anomalies:
a.
The information that a particular teacher is teaching a particular course is represented as many times as the number of text books followed for that particular course. b. The information that a particular text book is followed for a particular course is represented as many times as the number of teachers teaching the particular course. These anomalies are due to the non-trivial multi valued dependencies holding on the schema CTX. Trivial MVD: An MVD α→→β holding on a relation schema R is said to be trivial,if: a. α is subset or equal to β B. αUβ=R 7. What is a constraint ? describe its types with examples Ans: Prepared By Nirjharinee Parida Database Management System Rules which are enforced on data being entered and prevents the user from entering invalid data into table are called constraints. Thus constraints super control data being entered in tables fro permanent storage. RELATIONAL CONSTRAINTS: There are three types of constraints on relational database that include o o o DOMAIN CONSTRAINTS KEY CONSTRAINTS INTEGRITY CONSTRAINTS DOMAIN CONSTRAINTS: It specifies that each attribute in a relation an atomic value from the corresponding domains. The data types associated with commercial RDBMS domains include: o Standard numeric data types for integer o Real numbers o Characters o Fixed length strings and variable length strings Thus, domain constraints specifies the condition that we to put on each instance of the relation. So the values that appear in each column must be drawn from the domain associated with that column. Rollno Name City Age 101 Sujit Bam 23 102 kunal bbsr 22 Key constraints: This constraints states that the key attribute value in each tuple msut be unique .i.e, no two tuples contain the same value for the key attribute.(null values can allowed) Emp(empcode,name,address) . 
here empcode can be unique Integrity constraints: Prepared By Nirjharinee Parida Database Management System There are two types of integrity constraints: o o Entity integrity constraints Referential integrity constraints Entity integrity constraints: It states that no primary key value can be null and unique. This is because the primary key is used to identify individual tuple in the relation. So we will not be able to identify the records uniquely containing null values for the primary key attributes. This constraint is specified on one individual relation. Referential integrity constraints: It states that the tuple in one relation that refers to another relation must refer to an existing tuple in that relation. This constraints is specified on two relations . If a column is declared as foreign key that must be primary key of another table. Department(deptcode,dname) Here the deptcode is the primary key. Emp(empcode,name,city,deptcode). Here the deptcode is foreign key. 8. a). Double Buffering: When several blocks need to be transferred from disk to main memory and all the block address are known. Several buffers can be reserved in main memory to speed up the transfer. While one buffer is being read or written, the cpu can process data in the other buffer. This is possible because an independent disk I/O processor exists that once started can proceed to transfer a data block between memory and disk independent of and in called to cpu processing Prepared By Nirjharinee Parida Database Management System Reading and processing can proceed in parallel when the time required to process a disk block in memory is less then the time required to read the next block and fill a buffer. The cpu can street processing a block once its transfer to main memory is completed at the same time the disk I/O processor can be reading and transferring the next block into different blocks. This technique is called double buffering. 
b) Parallel Databases: A parallel database system is a DBMS running across multiple processors and disks that is designed to perform various tasks concurrently, such as loading data, building indexes and evaluating queries.
Parallelism in Databases:
Data can be partitioned across multiple disks for parallel I/O. Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel: data can be partitioned and each processor can work independently on its own partition. Queries are expressed in a high-level language (SQL, translated to relational algebra), which makes parallelization easier. Different queries can be run in parallel with each other, and concurrency control takes care of conflicts. Thus, databases naturally lend themselves to parallelism.
Parallel Database Architectures:
Shared memory -- processors share a common memory
Shared disk -- processors share a common disk
Shared nothing -- processors share neither a common memory nor a common disk
Hierarchical -- a hybrid of the above architectures
c) Multi-Valued Dependencies: A relation scheme R(A,B,C) is said to have the multivalued dependencies A→→B and A→→C if and only if, for every legal relation r(R) and for every pair of tuples (t1, t2) in r that satisfies t1[A] = t2[A], t1[B] ≠ t2[B] and t1[C] ≠ t2[C], there exists a pair {t3, t4} ∈ r that satisfies:
t3[A] = t4[A] = t1[A] = t2[A]
t3[B] = t1[B] and t4[B] = t2[B]
t3[C] = t2[C] and t4[C] = t1[C]
The MVDs always occur in pairs A→→B and A→→C; both can be jointly denoted as A→→B|C.
Course  Sname  Book
Btech   Rajiv  OS
Mtech   Rajiv  ACA
Mca     Ajay   SE
Bca     Abhay  SPM
Bis     Ajay   Testing
Here sname and book are multivalued facts about the attribute course. The student has no control over the books to be used for their course; these are independent of each other. The MVDs are Course→→Sname and Course→→Book.
6. What is the difference between 4NF and BCNF? Describe with examples.
4NF: Given a relation scheme R such that the set D of FDs and MVDs are satisfied, consider a set attributes X and Y where X is subset or equal to R,Y is subset or equal to Y. The reltion scheme R is in 4NF if for all mutivalued dependencies of the form X →→Y Є D+ Either X →→Y is a trivial MVD or X is super key of R. BOYCE CODD NORMAL FORM: Defn: a normalized relation scheme R<S,F> is in Boyce Codd normal form if for every nontrivial FD in F+ of the form X→A where X is subset of S and AЄS, X is a super key of R. Course Teacher Text Os Ravi Galvin Os Vivek Dietel Prepared By Nirjharinee Parida Database Management System Os Ravi Dietel Os Vivek Galvin Co Ram Hamcher Co Shyam M.mano Co Ram M.Mano Co Shayam Hamcher As in CTX, The schema CTX does not have any non –trivial FD thus it is an “all-key” schema. It is in BCNF. The relation CTX has not trivial MVDs course->>teacher and course->>text. Thsese are nor non trivial MVD. And course is not a super key of CTX since CTX is an all key schema. Thus CTX in BCNF but not in 4NF. 4NF is more desirable than BCNF because it reduces the repletion of information. If we consider a BCNF schema no in 4NF. We observe that decomposition into 4NF does not lose information provided that a loss less join decomposition is used. Yet redundancy is reduced. RELATIONAL DATABASE MANAGEMENT SYSTEMS 4TH-SEMESTER EXAMINATION ,2006 1. a. Differentiate between schema and instances. Ans: Relation Schema: A relational schema specifies the relation’ name, its attributes and the domain of each attribute. If R is the name of a relation and A1,A2,… and is a list of attributes representing R then R(A1,A2,…,an) is called a relational schema. Each attribute in this relational schema takes a value from some specific domain called domain(Ai). Relation Instance: Prepared By Nirjharinee Parida Database Management System A relational instance denoted as r is a collection of tuples for a given relational schema at a specific point of time. 
A relation state r of the relation schema R(A1, A2, …, An), also denoted by r(R), is a set of n-tuples r = {t1, t2, …, tm}, where each n-tuple is an ordered list of n values t = <v1, v2, …, vn> in which each vi belongs to domain(Ai) or is a null value.
1.b. Define weak entity and cardinality ratio.
Ans:
Weak entity: Entity types that do not contain any key attribute, and hence cannot be identified independently, are called weak entity types. A weak entity can be identified uniquely only by considering some of its attributes in conjunction with the primary key attribute of another entity, which is called the identifying owner entity.
Cardinality Ratio (Mapping Cardinalities): Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship sets, although they can contribute to the description of relationship sets that involve more than two entity sets. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following: one-to-one, one-to-many, many-to-one, many-to-many.
1.c. Why are null values considered to be bad in a relation?
Ans: An attribute value which is unknown to the user is called a NULL value. NULL values are not allowed for key attributes, because it is through the key attribute that we uniquely identify the tuples in a relation; if it is null, we cannot uniquely identify the tuples.
1.d. Define database authorization.
Ans: Database authorization is the process of granting another user the right to use a user's database objects. In Oracle, the GRANT command gives different privileges on the database to other users, and the REVOKE command cancels those privileges.
Prepared By Nirjharinee Parida Database Management System 1.e Explain functional dependence ans: The functional dependency x→y Holds on schema R if, in any legal relation r(R ), for all pairs of tuples t1 and t2 in r such that t1[x]=t2[x]. it is also the case that t1[y]=t2[y] In order(orderno,orderdate,itemno,price,qtysold) The functional dependencies are orderno→orderdate itemno→price orderno,itemno→qtysold 1.f What is data encryption. Ans: Data encryption is method in which data is converted to the specific format use certain techniques so that data can’t be used by unauthorized users. 1.g. Define 4th normal form Ans: FOURTH NORMAL FORM: Defn: Given a relation scheme R such that the set D of FDs and MVDs are satisfied, consider a set attributes X and Y where X is subset or equal to R,Y is subset or equal to Y. The reltion scheme R is in 4NF if for all mutivalued dependencies of the form X →→Y Є D+ Either X →→Y is a trivial MVD or X is super key of R. 1.h. What is update anomalies. Update Anomalies: Prepared By Nirjharinee Parida Database Management System Multiple copies of the same fact may lead to update anomalies or inconsistencies when an update is made and only some of the multiple copies are updated. Thus ,a change in the phone_no of jones must be made, for consistency in all tuple pertained to the student jones.If one of the tuples is not changed to reflect the new phone_no of jones. 1.i. What are the various types of database users? Ans: Database users : Naive users : Users who need not be aware of the presence of the database system or any other system supporting their usage are considered naïve users . A user of an automatic teller machine falls on this category. Online users : These are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program. These users are aware of the database system and also know the data manipulation language system. 
Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category.
Database Administrator: A person who has central control over the system is called the database administrator.
1.j. What do you mean by database audit?
Ans: A database audit is the process of recording the details of all DML operations on a database in a table in secondary memory. This table can be used for auditing, making it easy to trace transactions back.
2. Consider the following three tables, Sailors, Reserves and Boats, having the following attributes:
Sailors(Salid, Salname, Rating, Age)
Reserves(Salid, Boatid, Day)
Boats(Boatid, Boat_name, Color)
Use the above schema and solve the queries in relational algebra.
i) Find the names of the sailors who have reserved boat 103.
Ans: ∏Salname(σ Sailors.Salid = Reserves.Salid ∧ Reserves.Boatid = 103 (Sailors × Reserves))
ii) Find the names of the sailors who have reserved a red boat.
Ans: ∏Salname(σ Sailors.Salid = Reserves.Salid ∧ Reserves.Boatid = Boats.Boatid ∧ Boats.Color = 'RED' (Sailors × Reserves × Boats))
iii) Find the colours of the boats reserved by Lubber.
Ans: ∏Boats.Color(σ Sailors.Salid = Reserves.Salid ∧ Reserves.Boatid = Boats.Boatid ∧ Sailors.Salname = 'Lubber' (Sailors × Reserves × Boats))
3. Explain with examples the following SQL commands:
i) CREATE, both view and table
Ans: create table tablename (columnname1 datatype1, columnname2 datatype2, ...)
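The CREATE TABLE and CREATE VIEW statements can be tried directly against an in-memory SQLite database from Python; a short sketch (the student rows are invented sample data, and SQLite treats varchar2 simply as text):

```python
import sqlite3

# Demonstrate CREATE TABLE and CREATE VIEW on an in-memory database,
# mirroring the student table and view v1 used in the notes' examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student(rollno varchar2(10), name varchar2(20), city varchar2(20))")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [("r1", "Asha", "bbsr"), ("r2", "Ravi", "ctc")])

# A view restricted to one city, as in the notes' example
cur.execute("CREATE VIEW v1 AS SELECT rollno, name, city "
            "FROM student WHERE city = 'bbsr'")

rows = cur.execute("SELECT name FROM v1").fetchall()
print(rows)    # [('Asha',)] -- only the bbsr row is visible through v1
```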
This statement creates a new table for a user.
e.g.: create table student(rollno varchar2(10), name varchar2(20), city varchar2(20))
create view viewname as select column-list from tablename [where condition]
e.g.: create view v1 as select rollno, name, city from student where city = 'bbsr'
This statement creates a new view from a given table as per the user's criteria.
ii) ALTER (add): Adding a new column to a table:
ALTER TABLE tablename ADD (new_column datatype(size))
e.g.: ALTER TABLE student ADD (phone_no varchar2(10))
ALTER (modify): Modifying the size of a given column of a table:
ALTER TABLE tablename MODIFY (column datatype(newsize))
e.g.: ALTER TABLE student MODIFY (name varchar2(50))
iii) SELECT with GROUP BY and HAVING, apart from FROM and WHERE
Ans: SELECT with a GROUP BY clause is used to retrieve records from the table group-wise using certain criteria; the HAVING clause filters the resulting groups.
e.g.: Display deptno and the total salary of employee records for deptno 10 whose city = 'bbsr':
Select deptno, sum(sal) from emp where city = 'bbsr' group by deptno having deptno = 10
4. Consider the universal relation R = {A,B,C,D,E,F,G,H,I,J} and the set of functional dependencies F = {AB→C, A→DE, B→F, F→GH, D→IJ}. What is the key for R? Decompose R into 2NF relations.
Ans: In relation R:
(AB)+ = {A,B,C,D,E,F,G,H,I,J}
A+ = {A,D,E,I,J}
B+ = {B,F,G,H}
Since (AB)+ contains all the attributes of R and neither A+ nor B+ does, AB is the key of R. The dependencies A→DE and B→F are partial dependencies on this key, so for 2NF we decompose R into:
R1 = {A,B,C}, R2 = {A,D,E,I,J}, R3 = {B,F,G,H}
5.
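The attribute closures in question 4 can be computed mechanically with the standard closure algorithm; a short Python sketch using the relation and FD set from that question:

```python
# Attribute closure: repeatedly apply FDs X -> Y whenever X is already
# contained in the closure, until nothing more can be added.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = {AB -> C, A -> DE, B -> F, F -> GH, D -> IJ} from question 4
F = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
R = set("ABCDEFGHIJ")

print(sorted(closure("AB", F)))   # all ten attributes: AB is a key of R
print(sorted(closure("A", F)))    # ['A', 'D', 'E', 'I', 'J']
```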
Draw a state transition diagram and discuss the typical states that a transaction goes through during execution.
Ans: TRANSACTION STATES: A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing.
• Partially committed, after the final statement has been executed.
• Failed, after the discovery that normal execution can no longer proceed.
• Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
• Committed, after successful completion.
The state diagram corresponding to a transaction appears in Figure 15.1. We say that a transaction has committed only if it has entered the committed state. Similarly, we say that a transaction has aborted only if it has entered the aborted state. A transaction is said to have terminated if it has either committed or aborted.
A transaction starts in the active state. When it finishes its final statement, it enters the partially committed state. At this point, the transaction has completed its execution, but it is still possible that it may have to be aborted, since the actual output may still be residing temporarily in main memory, and a hardware failure may thus preclude its successful completion. The database system then writes out enough information to disk that, even in the event of a failure, the updates performed by the transaction can be re-created when the system restarts after the failure. When the last of this information is written out, the transaction enters the committed state.
A transaction enters the failed state when the system determines that it can no longer proceed with normal execution; it is then rolled back and enters the aborted state. At this point, the system has two options. It can restart the transaction, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction; a restarted transaction is considered to be a new transaction. Or it can kill the transaction.
It usually does so because of some internal logical error that can be corrected only by rewriting the application program, or because the input was bad, or because the desired data were not found in the database.

6. Explain the informal design guidelines for a relation schema.
Ans: Database design process: We can identify six main phases of the database design process:
1. Requirement collection and analysis
2. Conceptual database design
3. Choice of a DBMS
4. Data model mapping (logical database design)
5. Physical database design
6. Database system implementation and tuning

1. Requirement collection and analysis: Before we can effectively design a database, we must know and analyse the expectations of the users and the intended uses of the database in as much detail as possible.
2. Conceptual database design: The goal of this phase is to produce a conceptual schema for the database that is independent of a specific DBMS. We often use a high-level data model, such as the ER model, during this phase. We specify as many of the known database applications and transactions as possible, using a notation that is independent of any specific DBMS. Even though the DBMS choice is often already made for the organization, the intent of conceptual design is still to keep the schema as free as possible from implementation considerations.
3. Choice of a DBMS: The choice of DBMS is governed by a number of factors: some technical, others economic, and still others concerned with the politics of the organization. The economic and organizational factors that affect the choice of the DBMS are: software cost, maintenance cost, hardware cost, database creation and conversion cost, personnel cost, training cost, and operating cost.
4. Data model mapping (logical database design): During this phase, we map the conceptual schema from the high-level data model used in phase 2 into the data model of the chosen DBMS.
5.
Physical database design: During this phase, we design the specification for the database in terms of physical storage structures, record placement and indexes.
6. Database system implementation and tuning: During this phase, the database and application programs are implemented, tested and eventually deployed for service.

6. What is the need for normalization? Explain the first, second and third normal forms with an example.
Ans: NORMALIZATION
The basic objective of normalization is to reduce redundancy, which means that each piece of information is stored only once. Storing information several times leads to wastage of storage space and an increase in the total size of the data stored. Relations are normalized so that when relations in a database are altered during the lifetime of the database, we do not lose information or introduce inconsistencies. The types of alteration normally needed for relations are:
o Insertion of new data values into a relation. This should be possible without being forced to leave blank fields for some attributes.
o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without exhaustively searching all the tuples in the relation.

FIRST NORMAL FORM:
Defn: A relation scheme is said to be in first normal form (1NF) if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and the value is not a set or a list of values. For example, in an Orders relation the functional dependency orderno → orderdate holds.

SECOND NORMAL FORM:
Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and all non-prime attributes are fully functionally dependent on the relation keys.
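As a sketch of what the 2NF definition above buys, the following Python fragment builds a small hypothetical order relation whose key is (orderno, itemcode) but whose orderdate depends only on orderno (a partial dependency), decomposes it, and checks that the decomposition is lossless. The table and its values are invented for illustration.

```python
# A 1NF relation with key (orderno, itemcode); orderdate depends only
# on orderno, a partial dependency, so the relation is not in 2NF.
orders_1nf = [
    # (orderno, itemcode, orderdate, qty)
    (1456, "A01", "26-02-89", 10),
    (1456, "B02", "26-02-89", 5),   # orderdate repeated: redundancy
    (1886, "A01", "01-03-89", 7),
]

# 2NF decomposition: split out the partially dependent attribute.
orders = {(o, d) for (o, i, d, q) in orders_1nf}            # orderno -> orderdate
order_details = {(o, i, q) for (o, i, d, q) in orders_1nf}  # facts on the full key

# The decomposition is lossless: joining the two relations on orderno
# reconstructs exactly the original tuples.
rejoined = {(o, i, d, q)
            for (o, d) in orders
            for (o2, i, q) in order_details if o == o2}
assert rejoined == set(orders_1nf)
```

Note that the orderdate now appears once per order in `orders`, instead of once per order line.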
A relation is said to be in 2NF if it is in 1NF and the non-key attributes are functionally dependent on the key attribute(s). Further, if the key has more than one attribute, then no non-key attribute should be functionally dependent on only a part of the key attributes. Consider, for example, the relation given in table 1. This relation is in 1NF. The key is (Order no., Item code). The dependency diagram for the attributes of this relation is shown in figure 5. The non-key attribute Price_Unit is functionally dependent on Item code, which is part of the relation key. Also, the non-key attribute Order date is functionally dependent on Order no., which is part of the relation key. Thus the relation is not in 2NF. It can be transformed to 2NF by splitting it into three relations, as shown in table 3. In table 3 the relation Orders has Order no. as the key. The relation Order details has the composite key (Order no., Item code). In both relations the non-key attributes are functionally dependent on the whole key. Observe that by transforming to 2NF relations, the redundancy due to partial dependencies is removed.

THIRD NORMAL FORM:
Defn: A relational scheme R<S,F> is in third normal form (3NF) if for every non-trivial functional dependency in F+ of the form X→A, either X contains a key (i.e., X is a superkey) or A is a prime attribute.
Third normal form is needed when all attributes in a relation tuple are not functionally dependent only on the key attribute. If two non-key attributes are functionally dependent, then there will be unnecessary duplication of data. Consider the relation given in table 4. Here, Roll no. is the key and all other attributes are functionally dependent on it. Thus it is in 2NF.
If it is known that in the college all first-year students are accommodated in Ganga hostel, all second-year students in Kaveri, all third-year students in Krishna, and all fourth-year students in Godavari, then the non-key attribute Hostel name is dependent on the non-key attribute Year. This dependency is shown in figure 6. Observe that given the year of a student, his hostel is known, and vice versa. The dependency of hostel on year leads to duplication of data, as is evident from table 4. If it were decided to move all first-year students to Kaveri hostel and all second-year students to Ganga hostel, this change would have to be made in many places in table 4. Also, when a student's year of study changes, his hostel change must also be noted in table 4. This is undesirable. A relation is in 3NF if it is in 2NF and no non-key attribute is functionally dependent on any other non-key attribute. Table 4 is thus not in 3NF. To transform it to 3NF, we introduce another relation which holds the functionally related non-key attributes. This is shown in table 5.

8. Explain the database recovery technique based on deferred update.
Ans: To ensure atomicity despite failures, we first output information describing the modifications to stable storage, without modifying the database itself. We study two approaches: log-based recovery and shadow paging. We assume (initially) that transactions run serially, that is, one after the other. There are two approaches using logs:
• Deferred database modification
• Immediate database modification

Deferred database modification: The deferred database modification scheme records all modifications in the log, but defers all the writes until after partial commit. Assume that transactions execute serially. A transaction starts by writing a <Ti start> record to the log.
A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X. (Note: the old value is not needed for this scheme.) The write is not performed on X at this time, but is deferred. When Ti partially commits, <Ti commit> is written to the log. Finally, the log records are read and used to actually execute the previously deferred writes. During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are in the log. Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the transaction to the new values. Crashes can occur while the transaction is executing the original updates, or while recovery action is being taken.

Example transactions T0 and T1 (T0 executes before T1):
T0: read(A); A := A − 50; write(A); read(B); B := B + 50; write(B)
T1: read(C); C := C − 100; write(C)
Below we show the log as it appears at three instants of time. If the log on stable storage at the time of the crash is as in case:
(a) no redo actions need to be taken;
(b) redo(T0) must be performed, since <T0 commit> is present;
(c) redo(T0) must be performed followed by redo(T1), since <T0 commit> and <T1 commit> are present.

RELATIONAL DATABASE MANAGEMENT SYSTEMS
B.Tech 4th Semester Examination, 2008

1.a. What is the difference between a primary key and a candidate key?
Ans: Candidate key: In a relation R, a candidate key for R is a subset of the set of attributes of R which has the following properties:
Uniqueness: no two distinct tuples in R have the same values for the candidate key.
Irreducibility: no proper subset of the candidate key has the uniqueness property.
Eg: (cname, telno)
Primary key: The primary key is the candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set.
The remaining candidate keys, if any, are called alternate keys.

b. Let R = (A,B,C,D) with functional dependencies A→C and AB→D. What is the closure of {AB}?
Ans: result = {AB}
In A→C, A is a subset of result {AB}, so result = {AB} ∪ {C} = {ABC}.
In AB→D, AB is a subset of result {ABC}, so result = {ABC} ∪ {D} = {ABCD}.
Therefore {AB}+ = {ABCD}.

c. What do you mean by a semijoin?
Ans: Semijoin strategy: Let r1 be a relation with schema R1 stored at site S1, and let r2 be a relation with schema R2 stored at site S2. To evaluate the expression r1 ⋈ r2 and obtain the result at S1:
1. Compute temp1 ← Π(R1 ∩ R2)(r1) at S1.
2. Ship temp1 from S1 to S2.
3. Compute temp2 ← r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1.
5. Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.

d. Define super key and give an example to illustrate the super key.
Ans: A super key is a set of one or more attributes that, taken collectively, allow us to identify uniquely an entity in the entity set. For example: customer-id, (cname, customer-id), (cname, telno).

e. What are two techniques to prevent deadlock?
Ans: Deadlock prevention techniques:
wait-die scheme (non-preemptive): an older transaction may wait for a younger one to release a data item; younger transactions never wait for older ones and are rolled back instead. A transaction may die several times before acquiring a needed data item.
wound-wait scheme (preemptive): an older transaction wounds (forces rollback of) a younger transaction instead of waiting for it; younger transactions may wait for older ones. There may be fewer rollbacks than in the wait-die scheme.

f. What do you mean by multivalued dependency?
Ans: MULTIVALUED DEPENDENCY:
Defn: Given a relation scheme R, let X, Y and Z be subsets of the attributes of R.
Then the multivalued dependency X ↠ Y holds in a relation r defined on R if, given two tuples t1 and t2 in r with t1(X) = t2(X), r also contains two tuples t3 and t4 with the following characteristics:
t1, t2, t3 and t4 have the same X value, i.e., t1(X) = t2(X) = t3(X) = t4(X);
the Y values of t1 and t3 are the same, and the Y values of t2 and t4 are the same, i.e., t1(Y) = t3(Y) and t2(Y) = t4(Y);
and t2(Z) = t3(Z) and t1(Z) = t4(Z).
Eg: course ↠ teacher, course ↠ book.

g. Define and differentiate between natural join and inner join.
Ans: The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the join symbol ⋈. The natural-join operation forms the Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes. A general inner join, by contrast, returns the rows satisfying an explicit join condition and does not remove the duplicated columns; a natural join is thus an inner join taken on all the common attributes.

h. What is meant by concurrency?
Ans: Concurrent execution: multiple transactions are allowed to run concurrently in the system. The advantages are:
• Increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk.
• Reduced average response time for transactions: short transactions need not wait behind long ones.

i. Mention the various categories of data model.
Ans: A data model describes the structure of a database. It is a collection of conceptual tools for describing data, data relationships and consistency constraints. The types of data model are:
1. Object-based logical models
a. ER model
b. Functional model
c. Object-oriented model
d. Semantic model
2. Record-based logical models
a. Hierarchical model
b. Network model
c. Relational model
3. Physical models

j. Define: entity type, entity set and value set.
Ans: An entity is a "thing" or "object" in the real world that is distinguishable from all other objects. An entity type describes the characteristics of an entity, e.g., Customer(cno, name, city). An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer. A value set specifies the related attribute values for an entity set. Value set of customer: (c001, sujit, bbsr), (c002, minati, ctc), (c003, kunal, bbsr).

2.a. What is normalization? What is a key attribute in a relation? Explain the difference between the first, second and third normal forms.
Ans: NORMALIZATION
The basic objective of normalization is to reduce redundancy, which means that each piece of information is stored only once. Storing information several times leads to wastage of storage space and an increase in the total size of the data stored. Relations are normalized so that when relations in a database are altered during the lifetime of the database, we do not lose information or introduce inconsistencies. The types of alteration normally needed for relations are:
o Insertion of new data values into a relation. This should be possible without being forced to leave blank fields for some attributes.
o Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital information unknowingly.
o Updating or changing a value of an attribute in a tuple. This should be possible without exhaustively searching all the tuples in the relation.
Key attribute: The key attribute of a relation uniquely identifies a row in the relation. The types of key are: super key, candidate key and primary key.
FIRST NORMAL FORM:
Defn: A relation scheme is said to be in first normal form (1NF) if the values in the domain of each attribute of the relation are atomic.
In other words, only one value is associated with each attribute, and the value is not a set or a list of values. The first normal form separates repeating and non-repeating attributes.
SECOND NORMAL FORM:
Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and all non-prime attributes are fully functionally dependent on the relation keys. The second normal form removes partial dependencies.
THIRD NORMAL FORM:
Defn: A relational scheme R<S,F> is in third normal form (3NF) if for every non-trivial functional dependency in F+ of the form X→A, either X contains a key (i.e., X is a superkey) or A is a prime attribute. The third normal form removes transitive dependencies from a relation.

2.b. Define entity, attribute and relationship as used in relational databases. Describe the purpose of the ER model. Illustrate your answer.
Ans: An entity is a "thing" or "object" in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity. An entity has a set of properties, and the values of some set of properties may uniquely identify an entity. BOOK is an entity and its properties (called attributes) are bookcode, booktitle, price, etc.
Attributes: An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. Customer is an entity and its attributes are customerid, customername, custaddress, etc. An attribute, as used in the ER model, can be characterized by the following attribute types:
• simple and composite attributes
• single-valued and multi-valued attributes
• derived attributes
• null-valued attributes
Relationship sets: A relationship is an association among several entities. A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n >= 2 entity sets.
If E1, E2, …, En are entity sets, then a relationship set R is a subset of {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}, where (e1, e2, …, en) is a relationship.
(ER diagram: entities customer and loan connected by the relationship borrow)
Consider the two entity sets customer and loan. We define the relationship set borrow to denote the association between customers and the bank loans that the customers have.

Entity-Relationship Model
The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and relationships among these objects. It was developed to facilitate database design by allowing the specification of an enterprise schema, which represents the overall logical structure of a database.
Main features of the ER model:
• It is a high-level conceptual model.
• It allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships.
• It is widely used to develop an initial design of a database.
• It provides a set of useful concepts that make it convenient for a developer to move from a basic set of information to a detailed and precise description of the information that can be easily implemented in a database system.
• It describes data as a collection of entities, relationships and attributes.

Ans: B-Tree Index Files
A B-tree is similar to a B+-tree, but a B-tree allows search-key values to appear only once, eliminating the redundant storage of search keys. Search keys in non-leaf nodes appear nowhere else in the B-tree, so an additional pointer field for each search key must be included in every non-leaf node.
Generalized B-tree leaf node; in a non-leaf node, the pointers Bi are the bucket or file-record pointers.
(Figure: B-tree index file example, showing a sequence of insertions such as 10, 30, 20, 2, 4, 86, 6, 3, 60, 88, 84, 33, 52, 91, 69 causing successive node splits; the diagrams cannot be reproduced in text.)

Ans: RELATIONAL MODEL
The relational model is a simple model in which a database is represented as a collection of "relations", where each relation is represented by a two-dimensional table. The relational model was proposed by E. F. Codd of IBM in 1970. The basic concept in the relational model is that of a relation. The relational model consists of three major components:
1. The set of relations and set of domains that define the way data can be represented (data structures).
Properties of a relation:
o It is column-homogeneous. In other words, in any given column of a table, all items are of the same kind.
o Each item is a simple number or a character string; that is, the table must be in first normal form.
o All rows of a table are distinct.
o The ordering of rows within a table is immaterial.
o The columns of a table are assigned distinct names, and the ordering of these columns is immaterial.
2. Integrity rules that define the procedures that protect the data (data integrity).
Integrity constraints: There are two types of integrity constraints:
o Entity integrity constraints
o Referential integrity constraints
Entity integrity constraint: It states that no primary key value can be null, and primary key values must be unique.
This is because the primary key is used to identify the individual tuples in a relation; we would not be able to identify records uniquely if they contained null values for the primary key attributes. This constraint is specified on one individual relation.
Referential integrity constraint: It states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. This constraint is specified on two relations. If a column is declared as a foreign key, it must be the primary key of another table.
3. The operations that can be performed on the data (data manipulation): insertion, deletion and updating of records in a relation.
The two models that can use SQL are the relational model and the object-oriented model.

Locking of tables
Both implicit and explicit read locks are still read locks; a read lock, once obtained, prevents any writes to the table. The difference between implicit and explicit locks: a SELECT statement causes an implicit read lock for the duration of the SELECT, released after the SELECT completes. If there are multiple SELECTs, the read lock is obtained for each SELECT separately, so writes are possible in between the SELECTs. An explicit read lock keeps the lock until it is released. You can do multiple SELECTs between LOCK TABLES and UNLOCK TABLES and be sure that no data in the tables you have read-locked changes in between. Take an example:
LOCK TABLE t1 READ, t2 READ;
SELECT * FROM t1;
SELECT * FROM t2;
UNLOCK TABLES;
With the explicit LOCK TABLES, you can be sure that there are no changes to the tables in between your SELECTs, so the data you read is consistent.
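The effect of holding one explicit lock across both reads can be sketched in Python, with an ordinary mutex standing in for the table lock. This is only an analogy to illustrate the consistency argument, not MySQL's locking implementation; the tables and the transfer are invented.

```python
import threading

# Two toy "tables" and one lock guarding both, mimicking
# LOCK TABLES t1 READ, t2 READ.
table_lock = threading.Lock()
t1 = {"balance": 100}
t2 = {"balance": 100}

def transfer():
    # Writer: moves 50 from t1 to t2 under the same lock, so both
    # updates are applied atomically with respect to the reader.
    with table_lock:
        t1["balance"] -= 50
        t2["balance"] += 50

def consistent_read():
    # Reader: both reads happen inside one lock acquisition, like two
    # SELECTs between LOCK TABLES and UNLOCK TABLES.
    with table_lock:
        a = t1["balance"]
        b = t2["balance"]
    return a + b  # the total is invariant at any consistent point

w = threading.Thread(target=transfer)
w.start()
total = consistent_read()
w.join()
assert total == 200  # the reads never observe a half-done transfer
```

Without the lock around the two reads, the reader could observe t1 already debited but t2 not yet credited, giving an inconsistent total of 150.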
Without the explicit lock, another connection would be able to update the two tables in between your SELECTs: for example, after you complete the first SELECT, another connection could update tables t1 and t2 before you execute the second SELECT. In that case the data obtained from the two SELECTs may no longer be consistent, since the data changed between the first and second SELECT. Therefore, you need the explicit read lock to get a consistent view of the data across the two SELECTs.

4. Object-oriented databases
Definition: An object-oriented database management system (OODBMS), sometimes shortened to ODBMS (object database management system), is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects. There is currently no widely agreed-upon standard for what constitutes an OODBMS, and OODBMS products are considered to be still in their infancy. In the meantime, the object-relational database management system (ORDBMS), the idea that object-oriented database concepts can be superimposed on relational databases, is more commonly encountered in available products. An object-oriented database interface standard is being developed by an industry group, the Object Data Management Group (ODMG). The Object Management Group (OMG) has already standardized an object-oriented data-brokering interface between systems in a network.

Object-oriented database advantages: why Versant's object-oriented database solutions instead of a traditional RDBMS? Where data handling requirements are simple and suited to rigid row-and-column structures, an RDBMS may be an appropriate solution.
However, for many applications today, the most challenging aspect is controlling the inherent complexity of the subject matter itself: the complexity must be tamed, and tamed in a way that enables continual evolution of the application as the environment and needs change. For these applications, an object-oriented database is the best answer:
COMPLEX (INTER-)RELATIONSHIPS: If there are many many-to-many relationships, tree structures or network (graph) structures, then Versant's object-oriented database solutions will handle those relationships much faster than a relational database.
COMPLEX DATA: For many applications, the most challenging aspect is controlling the inherent complexity of the subject matter itself. For these applications, a Versant object-oriented database is the best answer. Architectures that mix technical needs such as persistence (and SQL) with the domain model are an invitation to disaster. Versant's object-oriented database solutions let you develop using objects that need only contain the domain behaviour, freeing you from persistence concerns.
NO MAPPING LAYER: It is difficult, time-consuming, expensive in development and expensive at run time to map objects into a relational database, and performance can suffer. Versant's object-oriented database solutions store objects as objects. They are designed to store many-to-many, tree and network relationships as named bi-directional associations without the need for JOIN tables. Hence they save programming time, and objects can be stored and retrieved faster. Modern O/R mapping tools may simplify many mapping problems, but they do not provide seamless data distribution or the performance of Versant's object-oriented database solutions.
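The "no mapping layer" idea can be sketched with Python's standard-library shelve module, which stores an object graph directly with no translation into rows and JOIN tables. This is purely an illustration of the concept, not Versant's API, and the Sailor class is invented.

```python
import os
import shelve
import tempfile

# A plain domain object; no table schema or O/R mapping code exists for it.
class Sailor:
    def __init__(self, salid, name, boats):
        self.salid, self.name, self.boats = salid, name, boats

path = os.path.join(tempfile.mkdtemp(), "db")

# Store the object as an object: the list of reserved boats is kept
# inside the Sailor, not normalized into a separate JOIN table.
with shelve.open(path) as db:
    db["s22"] = Sailor(22, "Lubber", ["103", "104"])

# Retrieve it later: the object comes back whole, with no
# row-to-object reconstruction code.
with shelve.open(path) as db:
    s = db["s22"]

assert s.name == "Lubber" and s.boats == ["103", "104"]
```

In an RDBMS the same data would typically need a sailors table, a boats table and a reserves JOIN table, plus code to map rows back into objects.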
FAST AND EASY DEVELOPMENT, ABILITY TO COPE WITH CONTINUOUS EVOLUTION: The complexity of telecommunications infrastructure, transportation networks, simulations, financial instruments and other domains must be tamed, and tamed in a way that enables continual evolution of the application as the environment and needs change. Architectures that mix technical needs such as persistence (and SQL) with the domain model are an invitation to disaster. Versant's object-oriented database solutions let you develop using objects that need only contain the domain behaviour, freeing you from persistence concerns.

5.a. State Armstrong's axioms. Show that Armstrong's axioms are complete.
Ans: The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F. The closure of F can be found by using a collection of rules called Armstrong's axioms:
Reflexivity rule: if A is a set of attributes and B is a subset of A, then A→B holds.
Augmentation rule: if A→B holds and C is a set of attributes, then CA→CB holds.
Transitivity rule: if A→B holds and B→C holds, then A→C holds.
Further rules can be derived from these axioms:
Union rule: if A→B holds and A→C holds, then A→BC holds.
Decomposition rule: if A→BC holds, then A→B holds and A→C holds.
Pseudotransitivity rule: if A→B holds and BC→D holds, then AC→D holds.
Suppose we are given a relation schema R = (A,B,C,G,H,I) and the set of functional dependencies A→B, A→C, CG→H, CG→I, B→H. We list several members of F+ here:
A→H: since A→B and B→H hold, we apply the transitivity rule.
CG→HI: since CG→H and CG→I hold, the union rule implies that CG→HI holds.
AG→I: since A→C and CG→I hold, the pseudotransitivity rule implies that AG→I holds.

5.b. Explain the difference between inner join and outer join. What are the restrictions on using outer join? Give examples to support your answer.
Ans: Inner join: an inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition.
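A minimal sketch of the inner-join behaviour (with a left outer join shown for contrast) using SQLite; the sailor data is made up for illustration. The inner join drops the sailor with no reservation, while the outer join keeps him with a NULL boatid.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sailors  (salid INTEGER, salname TEXT);
    CREATE TABLE reserves (salid INTEGER, boatid INTEGER);
    INSERT INTO sailors  VALUES (22, 'Dustin'), (95, 'Bob');
    INSERT INTO reserves VALUES (22, 103);   -- Bob has no reservation
""")

# Inner join: only rows satisfying the join condition survive.
inner = con.execute("""
    SELECT s.salname, r.boatid
    FROM sailors s JOIN reserves r ON s.salid = r.salid
    ORDER BY s.salid
""").fetchall()

# Left outer join: unmatched left-hand rows are kept, padded with NULL.
outer = con.execute("""
    SELECT s.salname, r.boatid
    FROM sailors s LEFT OUTER JOIN reserves r ON s.salid = r.salid
    ORDER BY s.salid
""").fetchall()

assert inner == [('Dustin', 103)]                 # Bob excluded
assert outer == [('Dustin', 103), ('Bob', None)]  # Bob kept with NULL
```

The outer join's extra row is exactly the "rows from one table for which no rows from the other satisfy the join condition" described above.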
Outer join: an outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition, and also returns some or all of the rows from one table for which no rows from the other table satisfy the join condition.
What restrictions are imposed on outer join queries that use the operator (+)? Outer join queries that use the Oracle join operator (+) are subject to the following rules and restrictions, which do not apply to the FROM clause OUTER JOIN syntax:
• The (+) operator cannot be specified in a query block that also contains the FROM clause OUTER JOIN syntax.
• The (+) operator can appear only in the WHERE clause, or in the context of left-correlation (that is, when specifying the TABLE clause) in the FROM clause, and can be applied only to a column of a table or view.
• If A and B are joined by multiple join conditions, then the (+) operator must be used in all of these conditions. If it is not, Oracle Database will return only the rows resulting from a simple join, without a warning or error to indicate that the results are not those of an outer join.
• The (+) operator does not produce an outer join if one table is specified in the outer query and the other table in an inner query.
• The (+) operator cannot be used to outer-join a table to itself, although self joins are valid.
Example: the following statement is not valid:
SELECT EMPLOYEE_ID, MANAGER_ID
FROM EMPLOYEES
WHERE EMPLOYEES.MANAGER_ID(+) = EMPLOYEES.EMPLOYEE_ID;
A WHERE condition cannot use the IN comparison condition to compare a column marked with the (+) operator with an expression.
A WHERE condition cannot compare any column marked with the (+) operator with a subquery.
If the WHERE clause contains a condition that compares a column from table B with a constant, then the (+) operator must be applied to the column so that Oracle returns the rows from table A for which it has generated nulls for this column. Otherwise, Oracle returns only the results of a simple join.
Ans: Data redundancy and inconsistency: The same information may be written in several files. This redundancy leads to higher storage and access cost. It may also lead to data inconsistency, that is, the various copies of the same data may no longer agree; for example, a changed customer address may be reflected in a single file but not elsewhere in the system.
Reduction of redundancies in a DBMS: Centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage required, and avoiding duplication eliminates the inconsistencies that tend to be present in redundant data files.
Eg: in the illustrating relation, order no 1456 (order date 260289) and order no 1886 are replicated.
Ans: Primary key: The primary key is the candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys.
Foreign key: A foreign key is an attribute or group of attributes derived from another table, in which this attribute (or group of attributes) must be the primary key.
Department(deptcode, dname). Here deptcode is the primary key.
Emp(empcode, name, city, deptcode). Here deptcode is a foreign key.
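Since the (+) operator is Oracle-specific, the same Department/Emp primary-key/foreign-key pair can be demonstrated with the portable ANSI LEFT OUTER JOIN syntax. The sketch below uses SQLite; the sample rows ('Sales', 'Research', 'Asha') are invented for illustration and are not part of the notes.

```python
# Demonstrate inner join vs. left outer join on the Department/Emp pair above,
# using ANSI join syntax (the portable equivalent of Oracle's (+) operator).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (deptcode INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Emp (empcode INTEGER PRIMARY KEY, name TEXT, city TEXT,
                      deptcode INTEGER REFERENCES Department(deptcode));
    INSERT INTO Department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO Emp VALUES (1, 'Asha', 'Dhenkanal', 10);
""")

# Inner join: only departments that actually have a matching employee.
inner = con.execute("""SELECT d.dname, e.name
                       FROM Department d JOIN Emp e
                         ON d.deptcode = e.deptcode""").fetchall()

# Left outer join: every department, with NULL where no employee matches
# (roughly what "d.deptcode = e.deptcode(+)" would express in Oracle).
outer = con.execute("""SELECT d.dname, e.name
                       FROM Department d LEFT JOIN Emp e
                         ON d.deptcode = e.deptcode
                       ORDER BY d.deptcode""").fetchall()

print(inner)   # [('Sales', 'Asha')]
print(outer)   # [('Sales', 'Asha'), ('Research', None)]
```

The department with no employees ('Research') appears only in the outer-join result, padded with NULL, which is exactly the "missing information" behaviour described above.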
Candidate keys of R(A,B,C,D,E):
AB+ = ABC
AE+ = ABCDE
CD+ = ABCDE
BE+ = ABCDE
A+ = A
B+ = B
C+ = C
D+ = AD
E+ = E
Here the superkeys are {AE}, {CD} and {BE}, because AE+, CD+ and BE+ each give all the attributes of the relation R(A,B,C,D,E). In AE, CD and BE there is no extraneous attribute, so they are also candidate keys.
Distributed Database System
Distributing data across sites or departments in an organization allows those data to reside where they are generated or most needed, while still being accessible from other sites and from other departments. Keeping multiple copies of the database across different sites also allows large organizations to continue their database operations even when one site is affected by a natural disaster, such as flood, fire, or earthquake.
Distributed database systems handle geographically or administratively distributed data spread across multiple database systems.
In a homogeneous distributed database:
All sites have identical software.
Sites are aware of each other and agree to cooperate in processing user requests.
Each site surrenders part of its autonomy in terms of the right to change schemas or software.
The system appears to the user as a single system.
In a heterogeneous distributed database:
Different sites may use different schemas and software.
Difference in schema is a major problem for query processing; difference in software is a major problem for transaction processing.
Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
(Distributed database system)
A distributed database allows a user convenient and transparent access to data which is not stored at the user's own site, while allowing each site control over its own local data.
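Returning to the candidate-key reasoning above: a superkey is any attribute set whose closure is all of R, and a candidate key is a minimal superkey. Since the functional dependency set for R(A,B,C,D,E) is not reproduced in these notes, the brute-force sketch below instead reuses the schema and FDs from question 5.a, R(A,B,C,G,H,I) with F = {A→B, A→C, CG→H, CG→I, B→H}; the function names are illustrative.

```python
# Enumerate candidate keys by brute force: test the closure of every attribute
# combination, keeping only minimal superkeys.
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, fds):
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            if closure(set(combo), fds) == R:
                # Keep only minimal superkeys: skip any superset of a found key.
                if not any(set(k) <= set(combo) for k in keys):
                    keys.append(combo)
    return keys

# Schema and FDs from question 5.a.
R = {'A', 'B', 'C', 'G', 'H', 'I'}
F = [({'A'}, {'B'}), ({'A'}, {'C'}), ({'C', 'G'}, {'H'}),
     ({'C', 'G'}, {'I'}), ({'B'}, {'H'})]

print(candidate_keys(R, F))   # [('A', 'G')]
```

Here AG is the only candidate key: no FD produces A or G on its right-hand side, so both must appear in every key, and AG+ already covers all of R.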
A distributed database can be made more reliable than a centralized system because if one site fails, the database can continue functioning, whereas if a centralized system fails, the database can no longer continue its normal operation. Also, a distributed database allows parallel execution of queries, possibly splitting one query into many parts to increase throughput. A centralized system, on the other hand, is easier to design and implement, and is cheaper to operate because messages do not have to be sent between sites.
Ans:
SELECT * FROM S, R WHERE S.A = R.A AND S.B = R.G
RELATIONAL ALGEBRA: ∏S.A, S.B, S.C, R.F, R.G(σS.A = R.A ∧ S.B = R.G(S × R))
OUTPUT:
A B C F G
8 6 5 2 6
(c) Define "data mining". What support must be available in a DBMS to facilitate data mining?
Ans: The term data mining refers loosely to the process of semi-automatically analyzing large databases to find useful patterns. Like knowledge discovery in artificial intelligence (also called machine learning) or statistical analysis, data mining attempts to discover rules and patterns from data. However, data mining differs from machine learning and statistics in that it deals with large volumes of data, stored primarily on disk. That is, data mining deals with "knowledge discovery in databases".
What technological infrastructure is required? Today, data mining applications are available on systems of all sizes, for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:
Size of the database: the more data being processed and maintained, the more powerful the system required.
Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining applications of less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.
RELATIONAL ALGEBRA: Relational algebra is a set of basic operations used to manipulate the data in the relational model. These operations enable the user to specify basic retrieval requests. The result of a retrieval is a new relation, formed from one or more relations. These operations can be classified into two categories.
Basic set operations: Union, Intersection, Set difference, Cartesian product.
Relational operations: Select, Project, Join, Division.
Other operators are Rename and Assignment.
Ans: Set Difference Operation
Notation: r – s
Defined as: r – s = {t | t ∈ r and t ∉ s}
Set differences must be taken between compatible relations:
r and s must have the same arity;
the attribute domains of r and s must be compatible.
Set Difference Operation – Example
Relations r, s:
r: A B        s: A B
   α 1           α 2
   α 2           β 3
   β 1
r – s: A B
       α 1
       β 1
8.C. Construct a B+-tree of order 1 with the following keys: 1, 9, 5, 3, 7, 11, 17, 13, 15.
Ans: A B+-tree is a rooted tree satisfying the following properties:
All paths from the root to a leaf are of the same length.
Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
Special cases: If the root is not a leaf, it has at least 2 children. If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and n–1 values.
Here the order of the B+-tree is given as 1, which would mean each node can store at most n–1 = 0 elements. This is not possible; the order must be greater than 1. We therefore construct the tree with order 4 (at most 3 keys per node).
The elements to insert are 1, 9, 5, 3, 7, 11, 17, 13, 15.
Insert 1, 9, 5: the single leaf becomes [1, 5, 9].
Insert 3: the leaf overflows and splits into [1, 3] and [5, 9]; 5 is copied up into a new root [5].
Insert 7: the right leaf becomes [5, 7, 9].
Insert 11: that leaf overflows and splits into [5, 7] and [9, 11]; 9 is copied up, giving root [5, 9].
Insert 17: the rightmost leaf becomes [9, 11, 17].
Insert 13: that leaf overflows and splits into [9, 11] and [13, 17]; 13 is copied up, giving root [5, 9, 13].
Insert 15: the rightmost leaf becomes [13, 15, 17].
Final tree: root [5 | 9 | 13] with leaves [1, 3], [5, 7], [9, 11] and [13, 15, 17].
Outer Joins: An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition and also returns some or all of the rows from one table for which no rows from the other table satisfy the join condition. The outer join operation is an extension of the join operation to deal with missing information. There are three forms of outer join:
left outer join
right outer join
full outer join
Notation of outer joins:
Left Outer Join: loan ⟕ borrower
Right Outer Join: loan ⟖ borrower
Full Outer Join: loan ⟗ borrower
RELATIONAL DATABASE MANAGEMENT SYSTEMS
MCA 3rd-Semester Examination, 2008
1. a. What are the different operators in relational algebra?
Relational algebra is a set of basic operations used to manipulate the data in the relational model. These operations enable the user to specify basic retrieval requests. The result of a retrieval is a new relation, formed from one or more relations. These operations can be classified into two categories.
Basic set operations: Union, Intersection, Set difference, Cartesian product.
Relational operations: Select, Project, Join, Division.
b. Define and differentiate between the external, internal and conceptual levels.
Ans:
External level: The external level is the highest level of database abstraction. At this level, there can be many views defined for different users' requirements. A view describes only a subset of the database. Any number of user views may exist for a given global schema or subschema.
Conceptual level: At this level of database abstraction, all the database entities and the relationships among them are included. One conceptual view represents the entire database. This conceptual view is defined by the conceptual schema.
Internal level: It is the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is expressed by the internal schema.
c. What is an outer join? Use a diagram to explain.
Ans: OUTER JOIN: The outer join operation is an extension of the join operation to deal with missing information. There are three forms of outer join:
left outer join
right outer join
full outer join
(Diagram: the full outer join combines the results of the natural join with the rows carrying missing information from the left relation and from the right relation.)
d. Ans: Lost update problem: Consider the situation illustrated in the figure, which is read as follows: transaction A retrieves some tuple t at time t1; transaction B retrieves that same tuple t at time t2; transaction A updates the tuple (on the basis of the values seen at time t1) at time t3; and transaction B updates the same tuple (on the basis of the values seen at time t2, which are the same as those seen at time t1) at time t4. Transaction A's update is lost at time t4, because transaction B overwrites it without even looking at it.
Time  Transaction A   Transaction B
t1    retrieve t      —
t2    —               retrieve t
t3    update t        —
t4    —               update t
Ans: A database management system that provides three levels of data is said to follow a three-level architecture:
External level
Conceptual level
Internal level
There are two mappings in the three-level architecture:
1. External/conceptual mapping
2. Conceptual/internal mapping
A mapping between the external and conceptual views gives the correspondence among the records and the relationships of the external and conceptual views.
The conceptual schema is related to the internal schema by the conceptual/internal mapping. This enables the DBMS to find the actual record, or combination of records, in physical storage that constitutes a logical record in the conceptual schema.
A relationship represents an association between two or more entities. Relationships are classified in terms of degree, connectivity, cardinality and existence; for example, the IS-A relationship, the HAS-A relationship, etc. A particular occurrence of a relationship is called a relationship instance or relationship occurrence. In an E-R model, similar relationships are grouped into relationship sets called relationship types: a relationship type is a set of meaningful associations between one or more participating entity types.
Primary key: The primary key is the candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys or secondary keys.
A foreign key is a key attribute derived from another relation, in which it must be a primary key or unique key.
Department(deptcode, dname). Here deptcode is the primary key.
Emp(empcode, name, city, deptcode). Here deptcode is a foreign key.
Yes. Physical data independence is the ability to modify the physical scheme without making it necessary to rewrite application programs.
Such modifications include changing from unblocked to blocked record storage, or from sequential to random-access files.
The tuple relational calculus is a non-procedural query language. It describes the desired information without giving a specific procedure for obtaining that information. A query in the tuple relational calculus is expressed as {t | P(t)}, where P is a formula. Several tuple variables may appear in a formula. A tuple variable is said to be a free variable unless it is quantified by a ∃ or a ∀.
A functional dependency set is said to be minimal when it satisfies three conditions:
a. It is a non-redundant functional dependency set.
b. It is a left-reduced functional dependency set.
c. Every FD in the set is simple, meaning the right-hand side of each FD has only one attribute.
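The three minimality conditions above suggest a direct algorithm for computing a minimal cover: make every right-hand side simple, left-reduce each left-hand side, then drop redundant dependencies. The sketch below is an illustration of that procedure (the function names and the example FD set F = {A→BC, B→C, A→B, AB→C} are chosen for this sketch, not taken from the notes).

```python
# Sketch of a minimal-cover computation satisfying the three conditions above.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Condition c: split right-hand sides so every FD is simple (X -> single A).
    cover = [(set(lhs), {a}) for lhs, rhs in fds for a in sorted(rhs)]
    # Condition b: left-reduce, dropping extraneous attributes from each LHS.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
        cover[i] = (lhs, rhs)
    # Condition a: drop each FD that is implied by the remaining ones.
    result = []
    for i, (lhs, rhs) in enumerate(cover):
        rest = result + cover[i + 1:]
        if not rhs <= closure(lhs, rest):
            result.append((lhs, rhs))
    return result

# Example: F = {A->BC, B->C, A->B, AB->C}; its minimal cover is {A->B, B->C}.
F = [({'A'}, {'B', 'C'}), ({'B'}, {'C'}), ({'A'}, {'B'}), ({'A', 'B'}, {'C'})]
for lhs, rhs in minimal_cover(F):
    print(''.join(sorted(lhs)), '->', ''.join(sorted(rhs)))
```

In the example, A→C is redundant (it follows from A→B and B→C by transitivity), and A is extraneous in AB→C (since B→C already holds), which is why only A→B and B→C survive.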