DATABASE MANAGEMENT SYSTEMS
Two Mark Questions

Unit I - Introduction

1. Define database
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.

2. Define DBMS
A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the database, contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.

3. What are the applications of database systems?
• Banking • Airlines • Universities • Telecommunication • Finance • Sales • Manufacturing • Human resources

4. What is the difference between file systems and database systems? (or) Compare file systems and database systems. (or) What are the advantages of database systems?
A file system suffers from the following problems, which a database system avoids:
Data redundancy and inconsistency
Difficulty in accessing data
Data isolation
Integrity problems
Atomicity problems
Concurrent-access anomalies
Security problems

5. Define atomicity
Atomicity states that database modifications must follow an "all or nothing" rule. Each transaction is said to be "atomic". If one part of the transaction fails, the entire transaction fails.

6. What are the three levels of abstraction?
Physical level
Logical level
View level

7. Define instance
The collection of information stored in the database at a particular moment is called an instance of the database.

8. Define schema
The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.

9. Define data model
A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

10. Name the types of data model
Entity-Relationship model
Relational model
Object-oriented data model
Semi-structured model
Hierarchical data model
Network model

11. What are the two types of database languages?
Data definition language
Data manipulation language

12. Define data dictionary
A data dictionary contains metadata: that is, data about data. The schema of a table is an example of metadata. A database system consults the data dictionary before reading or modifying actual data.

13. Define DDL
A database schema is specified by a set of definitions expressed in a special language called a data-definition language (DDL).

14. Define DML
A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data. (Eg: Relational Algebra)
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data. (Eg: Tuple Relational Calculus & Domain Relational Calculus)

15. Define query language
A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language.

16. What are the different types of users who work with the database system?
Naive users
Application programmers
Sophisticated users
Specialized users

17. Who are naive users?
Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. Naive users may also simply read reports generated from the database.

18. Define sophisticated users
Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor, whose function is to break down DML statements into instructions that the storage manager understands.

19.
Compare 2-tier and 3-tier architecture

2-tier architecture: The application is partitioned into a component that resides at the client machine, which invokes database system functionality at the server machine through query language statements. Examples: ODBC and JDBC.

3-tier architecture: The client machine acts as merely a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application resides in the application server. Example: the World Wide Web.

20. Define specialized users
Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework.

21. What is the role of the DBA (Database Administrator)?
Schema definition
Storage structure and access-method definition
Schema and physical-organization modification
Granting of authorization for data access
Routine maintenance

22. Define E-R model
The Entity-Relationship (ER) model is a high-level data model that is useful in developing a conceptual design for a database. Creation of an ER diagram, which is one of the first steps in designing a database, helps the designer(s) to understand and to specify the desired components of the database and the relationships among those components. An ER model is a diagram containing entities or "items", relationships among them, and attributes of the entities and the relationships.

23. Define entity
An entity is a "thing" or "object" in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity.

24. Define entity set
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer.

25. What are the attributes characterized in the E-R model?
Simple and composite attributes
Single-valued and multivalued attributes
Derived attributes

26. Define relationship
A relationship is an association among several entities.

27. Define relationship set
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2, . . ., En are entity sets, then a relationship set R is a subset of {(e1, e2, . . . , en) | e1 ∈ E1, e2 ∈ E2, . . . , en ∈ En}, where (e1, e2, . . . , en) is a relationship.

28. Define mapping cardinalities. What are its types?
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following:
• One to one
• One to many
• Many to one
• Many to many

29. Define keys
A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also help uniquely identify relationships, and thus distinguish relationships from each other.

30. Define super key
A super key is defined in the relational model as a set of attributes of a relation variable for which it holds that, in all relations assigned to that variable, no two distinct tuples (rows) have the same values for the attributes in this set.

31. Define candidate key
A candidate key of a relation is a set of attributes of that relation such that there are no two distinct tuples with the same values for these attributes. In simple terms, a candidate key is a minimal super key, i.e. a super key of which no proper subset is also a super key.

32. Define primary key
The primary key of a relational table uniquely identifies each record in the table.
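To make the super key definition above concrete, here is a minimal Python sketch; the `customers` sample rows and the `is_superkey` helper are invented for illustration. A set of attributes is a super key exactly when no two distinct rows agree on all of those attributes:

```python
def is_superkey(rows, attrs):
    """True if no two distinct rows share the same values for `attrs`."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:      # two rows agree on all attrs -> not a super key
            return False
        seen.add(key)
    return True

customers = [
    {"id": 1, "name": "Alice", "city": "Chennai"},
    {"id": 2, "name": "Bob",   "city": "Chennai"},
    {"id": 3, "name": "Alice", "city": "Madurai"},
]
print(is_superkey(customers, ["id"]))            # True
print(is_superkey(customers, ["city"]))          # False: two rows share 'Chennai'
print(is_superkey(customers, ["name", "city"]))  # True
```

A candidate key would additionally require that no proper subset of the attributes also passes this test.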
It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in combination.

33. What is the use of an E-R diagram?
An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between entities in a database. ER diagrams often use symbols to represent three different types of information. Boxes are commonly used to represent entities. Diamonds are normally used to represent relationships, and ovals are used to represent attributes.

34. Define weak entity sets
An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set.

35. Define strong entity sets
An entity set that has a primary key is termed a strong entity set.

36. Define specialization
The process of designating subgroupings within an entity set is called specialization.

37. Define generalization
Generalization is a containment relationship that exists between a higher-level entity set and one or more lower-level entity sets.

38. Define attribute inheritance
A crucial property of the higher- and lower-level entities created by specialization and generalization is attribute inheritance. The attributes of the higher-level entity sets are said to be inherited by the lower-level entity sets.

39. Define aggregation
Aggregation is an abstraction through which relationships are treated as higher-level entities.

40. What are the disadvantages of the E-R model?
Limited constraint representation
Limited relationship representation
No representation of data manipulation
Loss of information

41. What are the advantages of the E-R model?
It is easy and simple to understand with minimal training.
Therefore the model can be used by the database designer to communicate the design to the end user.
It has explicit linkages between entities.
It is possible to find a connection from one node to all the other nodes.

42. What are the features of the E-R model?
It has a high degree of data independence and seeks to remove redundancy in data representation on a mathematically grounded basis.
The ER model is a top-down approach in system design.
It can be used as a basis for the unification of different views of data, such as the network model, relational model, or entity modeling.
It was developed after the relational database, when the industry shifted its attention to transaction processing.

43. Define cardinality.
The cardinality of a relationship is the actual number of related occurrences for each of the related entities. The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are "one" or "many". The basic types of connectivity for relations are: one-to-one, one-to-many, and many-to-many. Many-to-many relationships cannot be translated directly to relational tables; these relationships must be decomposed into two or more one-to-many relationships.

44. Why is it better to use the n-ary relationship over the binary relationship?
An n-ary relationship can relate more entity sets than a binary relationship, which only accepts two. An n-ary relationship set shows more clearly that several entities participate in a single relationship.

45. What is the difference between single inheritance and multiple inheritance?
Single inheritance is when a given entity set is involved as a lower-level entity set in only one ISA relationship. Multiple inheritance is when it is involved in more than one ISA relationship.

46. Define UML.
The Unified Modeling Language (UML) is a proposed standard for creating specifications of various components of a software system. Some of the parts of UML are:
• Class diagram
• Use case diagram
• Activity diagram
• Implementation diagram

Unit II - Relational Model

1. Define relational model
The relational model is the most widely used data model for commercial data processing. The reason it is used so much is that it is simple and easy to maintain. The model is based on a collection of tables. Users of the database can create tables, insert tuples into tables, or modify existing tables.

2. Define relational databases
A relational database consists of a collection of tables, each of which is assigned a unique name. A row in a table represents a relationship among a set of values. Since a table is a collection of such relationships, there is a close correspondence between the concept of table and the mathematical concept of relation, from which the relational data model takes its name.

3. Define foreign key
A foreign key is a field in a relational table that matches the primary key column of another table. The foreign key can be used to cross-reference tables.

4. What is a schema diagram?
A database schema, along with primary key and foreign key dependencies, can be depicted pictorially by schema diagrams.

5. Define relational algebra
The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result.

6. What are the basic operations of relational algebra?
Select
Project
Union
Set difference
Cartesian product
Rename
Note: select, project, and rename are unary operations; union, set difference, and Cartesian product are binary operations.

7. Define schema diagram
A schema diagram is the pictorial representation of a database schema, along with primary key and foreign key dependencies.

8. Define the relational algebra operations (selection, projection, union, set difference, Cartesian product) with an example.

9. Define views.
Any relation that is not part of the logical model, but is made visible to a user as a virtual relation, is called a view.
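As a quick sketch of the view concept above, Python's built-in sqlite3 module can define a view over a base table; the `emp` table and its rows are invented for illustration. Note that the view is a stored query, so it automatically reflects later changes to the base table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("Alice", "CS", 900.0), ("Bob", "EE", 700.0)])

# A view is a stored query exposed as a virtual relation
cur.execute("CREATE VIEW cs_emp AS SELECT name, salary FROM emp WHERE dept = 'CS'")
print(cur.execute("SELECT * FROM cs_emp").fetchall())  # [('Alice', 900.0)]

# The view reflects later changes to the base table automatically
cur.execute("INSERT INTO emp VALUES ('Carol', 'CS', 800.0)")
print(cur.execute("SELECT * FROM cs_emp").fetchall())  # [('Alice', 900.0), ('Carol', 800.0)]
```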
The syntax for a view is:
CREATE OR REPLACE VIEW <view_name> AS SELECT <column_name> FROM <table_name>;

10. Define materialized views
Database systems allow view relations to be stored, but they make sure that, if the actual relations used in the view definition change, the view is kept up to date. Such views are called materialized views.

11. What are the conditions for updating a view?
The from clause of the view definition has only one table.
The select clause contains only attribute names, and does not have any expressions or aggregate functions.
Any attribute not listed in the select clause can be set to NULL.
The query does not have a group by or having clause.

12. Define tuple relational calculus (TRC).
The tuple relational calculus is a nonprocedural query language. It describes the desired information without giving a specific procedure for obtaining that information. A query in the tuple relational calculus is expressed as {t | P(t)}, that is, it is the set of all tuples t such that predicate P is true for t.

13. Define domain relational calculus (DRC)
Domain relational calculus uses domain variables that take on values from an attribute's domain, rather than values for an entire tuple. An expression in the domain relational calculus is of the form {< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)} where x1, x2, . . . , xn represent domain variables. P represents a formula composed of atoms, as was the case in the tuple relational calculus.

14. Define referential integrity.
Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table.

15. Define integrity constraints
Integrity constraints provide a mechanism for ensuring that data conforms to guidelines specified by the database administrator.
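A hedged sketch of integrity constraints in action, using Python's sqlite3: the table names and values are invented for illustration, and note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The three bad rows each violate a different constraint and are rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,                      -- Not Null constraint
    salary  REAL CHECK (salary > 0),            -- Check constraint
    dept_id INTEGER REFERENCES dept(dept_id))   -- Foreign Key constraint""")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 'Alice', 500.0, 10)")   # satisfies every constraint

rejected = 0
for bad in [(2, None,  500.0, 10),    # NULL name
            (3, 'Bob', -1.0,  10),    # negative salary
            (4, 'Eve', 500.0, 99)]:   # no such department
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?, ?)", bad)
    except sqlite3.IntegrityError:
        rejected += 1
print(rejected)   # 3
```

All three constraints here are written at column level; the same rules could equally be written as table-level constraints after the column list.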
The constraints available in SQL are Foreign Key, Not Null, Unique, and Check. Constraints can be defined in two ways:
1) The constraints can be specified immediately after the column definition. This is called column-level definition.
2) The constraints can be specified after all the columns are defined. This is called table-level definition.

16. Define domain integrity.
Domain integrity states that every element of a relation should respect the type and restrictions of its corresponding attribute. A type can have a variable length which needs to be respected. Restrictions could be the range of values that the element can have, the default value if none is provided, and whether the element can be NULL.

17. What do you mean by triggers?
A trigger defines the actions to be executed automatically when certain events occur and corresponding conditions are satisfied.

18. Define authentication.
Authentication is any process by which you verify that someone is who they claim they are.

19. Define authorization.
Authorization is the process of giving individuals access to system objects based on their identity.

20. Define embedded SQL
Embedded SQL is the process of embedding SQL within procedural programming languages. These languages (sometimes referred to as 3GLs) include C/C++, Cobol, Fortran, and Ada. Thus embedded SQL provides the 3GL with a way to manipulate a database, supporting:
• highly customized applications
• background applications running without user intervention
• database manipulation which exceeds the abilities of simple SQL
• applications linking to Oracle packages, e.g. forms and reports
• applications which need customized window interfaces

21. Define dynamic SQL.
Dynamic SQL allows programs to construct and submit SQL queries at run time. Dynamic SQL statements are stored as strings of characters that are entered when the program runs.
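A minimal illustration of the dynamic SQL idea with Python's sqlite3 (the `emp` table and the chosen column are invented for illustration): the query text is assembled as an ordinary string at run time rather than being fixed in the source program, while the data value is still passed through parameter binding:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Alice", "CS"), ("Bob", "EE")])

# The query text is built as a string at run time (dynamic SQL);
# the searched value is supplied via a bound parameter.
column = "name"
table = "emp"
query = f"SELECT {column} FROM {table} WHERE dept = ?"
print(conn.execute(query, ("CS",)).fetchall())   # [('Alice',)]
```

Binding the value with `?` instead of pasting it into the string is the standard way to keep run-time query construction safe from SQL injection.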
They can be entered by the programmer or generated by the program itself, but unlike static SQL statements, they are not embedded in the source program. Also in contrast to static SQL statements, dynamic SQL statements can change from one execution to the next.

22. Define cursors
A cursor is a mechanism for retrieving rows from the database one at a time.

23. Define distributed databases.
A distributed database consists of a collection of sites, connected together via some kind of communication network, in which each site is a full database system in its own right, and the sites have agreed to work together so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user's own site.

24. What are the advantages of distributed databases?
1. Local autonomy
2. No reliance on a central site (bottleneck, vulnerability)
3. Continuous operation (reliability, availability)
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing (optimization)
8. Distributed transaction management (concurrency, recovery)
9. Hardware independence
10. OS independence
11. Network independence
12. DBMS independence

25. Define client server databases.
A client server database consists of three primary software components (aside from the network software and operating systems of the computers in question): the client application (also called the front end), the data access layer (also called middleware), and the database server (also called a database engine, DBMS, data source, or back end).

26. What are the advantages of client server databases?
Data sharing
Integrity services
Data interchangeability
Location independence of data and processing

27. What are the disadvantages of client server databases?
Traffic congestion
Robustness

28. Define encryption.
Encryption is the process of transforming information (referred to as plaintext) using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key.

UNIT III

1. Define normalization
Database normalization is the process of removing redundant data from the database to improve storage efficiency, data integrity, and scalability. Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.

2. List out the drawbacks of redundant information
Wastage of storage
Causes problems with update anomalies:
Insertion anomalies
Deletion anomalies
Modification anomalies

3. List out the advantages of normalization.
Less storage space
Quicker updates
Less data inconsistency
Clearer data relationships
Easier to add data
Flexible structure

4. Define functional dependency.
An attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value.

5. Define trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute set on a superset of itself. Example: {Ssn, Pnumber} -> {Ssn} is trivial; {Ssn} -> {Ename} is non-trivial.

6. Define full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. Example: {Ssn, Pnumber} -> {Hours}

7. Define transitive dependency
A transitive dependency is an indirect functional dependency, one in which X → Z holds only by virtue of X → Y and Y → Z.

8. Define multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

9.
Define join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

10. What are the inference rules for FDs?
Reflexive: If Y is a subset of X, then X -> Y
Augmentation: If X -> Y, then XZ -> YZ
Transitive: If X -> Y and Y -> Z, then X -> Z
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union: If X -> Y and X -> Z, then X -> YZ
Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z

11. Define First Normal Form.
A relation is said to be in First Normal Form (1NF) if and only if each attribute of the relation is atomic. 1NF does not allow composite and multivalued attributes.

12. Define Second Normal Form.
A relation schema R is in Second Normal Form (2NF) if the relation is in 1NF and every non-key attribute A in R is fully functionally dependent on the primary key.

13. Define Third Normal Form.
A relation schema R is in Third Normal Form (3NF) if it is in Second Normal Form (2NF) and there are no transitive dependencies.

14. Define BCNF.
A relation is in BCNF if and only if every determinant is a candidate key.

15. Compare 3NF and BCNF.
3NF: A relation schema R is in 3NF if, for every nontrivial FD X -> Y in R, either X is a superkey or every attribute of Y is part of some candidate key. 3NF retains some redundancy caused by FDs. Performance is lesser than BCNF.
BCNF: A relation schema R is in Boyce-Codd Normal Form (BCNF) if, for every nontrivial FD X -> Y in R, X is a superkey. BCNF removes all redundancies caused by FDs. Better performance than 3NF.

16. Define multivalued dependency.
A multivalued dependency on R, X ->> Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.

17. Define Fourth Normal Form.
A relation R is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ->> Y, X is a superkey; that is, X is either a candidate key or a superset thereof.

18. Define Fifth Normal Form.
An entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for the entity is a consequence of its candidate keys.

19. Define Domain-Key Normal Form (DKNF).
A relation is in DKNF if every constraint on the relation is a logical consequence of the definition of keys and domains.

20. Define join dependency.
A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T. If one of the tables in the join has all the attributes of the table T, the join dependency is called trivial. The join dependency plays an important role in Fifth Normal Form, also known as project-join normal form.

UNIT IV

1. Define transaction.
A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction must see a consistent database.

2. What are the transaction states?
Active, the initial state; the transaction stays in this state while it is executing.
Partially committed, after the final statement has been executed.
Failed, after the discovery that normal execution can no longer proceed.
Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
Committed, after successful completion.

3. Define the ACID properties.
Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Consistency. Execution of a transaction in isolation preserves the consistency of the database.
Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
Durability.
After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

4. List the advantages of concurrency.
Increased processor and disk utilization
Reduced waiting time

5. Define schedule.
Schedules are sequences that indicate the chronological order in which instructions of concurrent transactions are executed. A schedule for a set of transactions must consist of all instructions of those transactions, and must preserve the order in which the instructions appear in each individual transaction.

6. Define serializability.
Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent transactions is equivalent to one that executes the transactions serially in some order. It assumes that all accesses to the database are done using read and write operations.

7. Define serial schedule.
A schedule that consists of all the actions of a transaction T1, followed by all the actions of another transaction T2, and so on, is called a serial schedule.

8. What are the types of serializability?
Conflict serializability
View serializability

9. Define conflict serializability.
A schedule S is conflict serializable if it is conflict equivalent to a serial schedule. If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

10. Define view serializability.
A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict serializable schedule is also view serializable.

11. Define recoverable schedule.
A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.

12. Define cascading rollback.
An uncommitted transaction may have to be rolled back because of the failure of another transaction from which it has read a data item. This phenomenon, in which a single failure leads to a series of transaction rollbacks and wastes a significant amount of work, is called cascading rollback.

13. What is a blind write?
If a transaction writes a data item without reading the data, it is called a blind write. This sometimes causes inconsistency.

14. Define lock. What is the use of locking?
A lock is a mechanism to control concurrent access to a data item. It is used to prevent concurrent transactions from interfering with one another and to enforce an additional condition that guarantees serializability.

15. What are shared locks and exclusive locks?
A shared lock allows other transactions to read the data item; write is not allowed. An exclusive lock allows both read and write of the data item, and a single transaction exclusively holds the lock.

16. What is called a timestamp?
A timestamp is a unique identifier for each transaction, generated by the system. Concurrency control protocols use this timestamp to ensure serializability.

17. When does a deadlock occur?
A deadlock occurs when one transaction T in a set of two or more transactions is waiting for some item that is locked by some other transaction in the set.

18. What is meant by transaction rollback?
If a transaction fails for reasons like power failure, hardware failure, or a logical error in the transaction after updating the database, it is rolled back to restore the previous values.

19. What are the objectives of concurrency control?
To be resistant to site and communication failure.
To permit parallelism to satisfy performance requirements.
To place few constraints on the structure of atomic actions.

20. What is replication?
The process of generating and reproducing multiple copies of data at one or more sites is called replication.

21. What are the two phases available in the two-phase locking protocol?
Phase 1: Growing phase
The transaction may obtain locks; it may not release locks.
Phase 2: Shrinking phase
The transaction may release locks; it may not obtain locks.

22. What are strict and rigorous two-phase locking protocols?
Strict two-phase locking: a transaction must hold all its exclusive locks till it commits/aborts. Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.

23. Define upgrading and downgrading.
Upgrading -> converting a shared lock to an exclusive lock, allowed only in the growing phase.
Downgrading -> converting an exclusive lock to a shared lock, allowed only in the shrinking phase.

24. What is the role of the lock manager?
A lock manager can be implemented as a separate process to which transactions send lock and unlock requests. The lock manager replies to a lock request by sending a lock grant message. The requesting transaction waits until its request is answered. The lock manager maintains a data structure called a lock table to record granted locks and pending requests.

25. Define graph-based protocol.
Graph-based protocols are an alternative to two-phase locking. They impose a partial ordering on the set D = {d1, d2, ..., dh} of all data items.
If di → dj, then any transaction accessing both di and dj must access di before accessing dj.
This implies that the set D may now be viewed as a directed acyclic graph, called a database graph.
The tree protocol is a simple kind of graph protocol.

26. What is called a timestamp?
A timestamp is a unique identifier for each transaction, generated by the system. Concurrency control protocols use this timestamp to ensure serializability.

27. Define Thomas' write rule.
Suppose a transaction Ti issues write(Q).
If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
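The complete rule (this rejection case together with the two cases that follow) can be sketched in Python; the `DataItem` class and its field names are invented for illustration, with `r_ts` and `w_ts` standing for R-timestamp(Q) and W-timestamp(Q):

```python
class DataItem:
    """Timestamp-ordering write check, including Thomas' write rule."""
    def __init__(self):
        self.value = None
        self.r_ts = 0   # largest timestamp of any transaction that read the item
        self.w_ts = 0   # largest timestamp of any transaction that wrote it

    def write(self, ts, value):
        if ts < self.r_ts:
            return "rollback"   # a younger transaction already read the old value
        if ts < self.w_ts:
            return "ignored"    # obsolete write: skipped (Thomas' write rule)
        self.value, self.w_ts = value, ts
        return "written"

q = DataItem()
print(q.write(5, "a"))   # written   (W-timestamp becomes 5)
print(q.write(3, "b"))   # ignored   (older than W-timestamp 5, no rollback needed)
q.r_ts = 7               # simulate a read by a transaction with timestamp 7
print(q.write(6, "c"))   # rollback  (timestamp 7 already read Q)
```

The "ignored" branch is what distinguishes Thomas' write rule from plain timestamp ordering, which would roll the transaction back in that case too.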
If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation can be ignored.
Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).

28. What are the phases in the validation-based protocol?
1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the various data items and stores them in variables local to Ti. It performs all write operations on temporary local variables, without updates of the actual database.
2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy to the database the temporary local variables that hold the results of write operations without causing a violation of serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the actual updates to the database. Otherwise, the system rolls back Ti.

29. Define multiple granularity.
The multiple-granularity locking protocol, which ensures serializability, is this: each transaction Ti can lock a node Q by following these rules:
It must observe the lock-compatibility function of Figure 16.17.
It must lock the root of the tree first, and can lock it in any mode.
It can lock a node Q in S or IS mode only if it currently has the parent of Q locked in either IX or IS mode.
It can lock a node Q in X, SIX, or IX mode only if it currently has the parent of Q locked in either IX or SIX mode.
It can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
It can unlock a node Q only if it currently has none of the children of Q locked.

30. What are the various deadlock prevention techniques?
Wait-die scheme (non-preemptive)
Wound-wait scheme (preemptive)
Timeout-based schemes

31. Define log.
A log is kept on stable storage. The log is a sequence of log records, and maintains a record of update activities on the database.

32. What is a checkpoint?
Checkpointing is saving enough state of a process so that the process can be restarted at the point in the computation where the checkpoint was taken. For example, on recovery: a transaction T1 that committed before the checkpoint can be ignored (its updates were already output to disk due to the checkpoint), transactions T2 and T3 that committed after it are redone, and an uncommitted transaction T4 is undone.

UNIT V

1. What are the various physical storage media?
Cache, main memory, flash memory, magnetic disk, optical storage, and magnetic tape.

2. Define access time, seek time, rotational latency, data transfer rate and mean time to failure.
Access time – the time it takes from when a read or write request is issued to when the data transfer begins. It consists of:
Seek time – the time it takes to reposition the arm over the correct track.
Rotational latency – the time it takes for the sector to be accessed to appear under the head.
Data-transfer rate – the rate at which data can be retrieved from or stored to the disk.
Check-pointing is saving enough state of a process so that the process can be restarted at the point in the computation where the checkpoint was taken. For example, for transactions T1–T4 around a checkpoint: T1 can be ignored (its updates were already output to disk at the checkpoint), T2 and T3 are redone, and T4 is undone.
UNIT V
1. What are the various physical storage media?
Cache, main memory, flash memory, magnetic disk, optical storage, and tape storage.
2. Define access time, seek time, rotational latency, data-transfer rate & mean time to failure.
• Access time: the time from when a read or write request is issued until data transfer begins. It consists of:
  - Seek time: the time it takes to reposition the arm over the correct track.
  - Rotational latency: the time it takes for the sector to be accessed to appear under the head.
• Data-transfer rate: the rate at which data can be retrieved from or stored to the disk.
• Mean time to failure (MTTF): the average time the disk is expected to run continuously without any failure.
3. Define mirroring or shadowing.
Duplicate every disk: a logical disk consists of two physical disks. Every write is carried out on both disks; reads can take place from either disk. If one disk in a pair fails, the data is still available on the other. Data loss occurs only if a disk fails and its mirror disk also fails before the system is repaired.
4. How to measure the performance of RAID levels?
• Monetary cost
• Performance: number of I/O operations per second, and bandwidth during normal operation
• Performance during failure
• Performance during rebuild
5. What is RAID?
RAID stands for Redundant Array of Inexpensive Disks. RAID is the organization of multiple disks into a large, high-performance logical disk. Disk arrays stripe data across multiple disks and access them in parallel to achieve higher data-transfer rates on large data accesses and higher I/O rates on small data accesses. Data striping also results in uniform load balancing across all of the disks, eliminating hot spots that would otherwise saturate a small number of disks while the majority of disks sit idle.
6.
What are the various file organizations?
• Heap: a record can be placed anywhere in the file where there is space.
• Sequential: records are stored in sequential order, based on the value of the search key of each record.
• Hashing: a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed.
7. Define data dictionary.
The data dictionary (also called the system catalog) stores metadata, that is, data about data, such as:
• Information about relations
• User and accounting information, including passwords
• Statistical and descriptive data
• Physical file-organization information
• Information about indices
8. Define index.
An index is a data structure that provides an alternate method of accessing records, or portions of records, in a database or file.
9. What are the types of indexing?
• Ordered indices: based on a sorted ordering of the search-key values.
• Hash indices: based on a hash function that distributes search-key values uniformly across a range of buckets.
10. What are the factors that should be considered while choosing indexing methods?
• Access type
• Access time
• Insertion time
• Deletion time
• Space overhead
11. What are the types of ordered indices?
• Dense index: an index record appears for every search-key value in the file.
• Sparse index: an index record appears for only some of the search-key values in the file.
12. Compare dense index and sparse index.
Dense index: an index record appears for every search-key value in the file; each index record contains the search-key value and a pointer to the actual record. Lookups are faster, but the index takes more space.
Sparse index: index records appear for only some of the search-key values. To locate a record, we find the index entry with the largest search-key value less than or equal to the search-key value we are looking for, and then scan sequentially from there. Lookups are slower, but the index takes less space.
13. Define primary and secondary indices.
The primary index determines the physical organization of the records in the file. Each file has at most one primary index; in addition, it may have any number of secondary indices.
The secondary indices provide alternate access paths to the data by allowing different fields in the record to be used as index keys. Each secondary index is stored in a separate area with its own storage allocation, and any number of secondary indices can be dynamically created and deleted. Index names for a file must be unique.
14. Define multilevel indexing.
Multilevel indexing is used when the primary index does not fit in memory and access becomes expensive. To reduce the number of disk accesses to index records, the primary index kept on disk is treated as a sequential file and a sparse index is constructed on it:
• Outer index: a sparse index on the primary index
• Inner index: the primary index file
15. Define hashing.
Hashing is a method of storing data in an array so that storing, searching, inserting, and deleting data is fast. For this, every record needs a unique key. The basic idea is not to search for the correct position of a record with comparisons but to compute the position within the array. The function that returns the position is called the hash function, and the array is called a hash table.
16. What are the types of hashing?
• Static hashing
• Dynamic hashing (extendable hashing)
17. Compare static hashing & dynamic hashing.
In static hashing, the number of primary pages in the directory is fixed. Thus, when a bucket is full, we need an overflow bucket to store any additional records that hash to the full bucket. This can be done with a link to an overflow page, or a linked list of overflow pages; the linked list can be separate for each bucket, or shared by all buckets that overflow.
In dynamic hashing, the size of the directory grows with the number of collisions, to accommodate new records and avoid long overflow-page chains.
18. What are the steps to be performed in query processing?
1) The scanning, parsing, and validating module produces an internal representation of the query.
2) The query optimizer module devises an execution plan, which is the strategy used to retrieve the result of the query from the database files. A query typically has many possible execution strategies, differing in performance, and the process of choosing a reasonably efficient one is known as query optimization.
3) The code generator generates the code to execute the plan.
4) The runtime database processor runs the generated code to produce the query result.
19. Define database tuning.
Database tuning is the process of continually revising and adjusting the physical database design by monitoring resource utilization, as well as internal DBMS processing, to reveal bottlenecks such as contention for the same data or devices.
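To make the hashing answers above (questions 15–17) concrete, here is a minimal Python sketch of static hashing with overflow chaining. It is an illustration only, not a real DBMS implementation; the names and sizes (NUM_BUCKETS, BUCKET_CAPACITY, integer search keys) are arbitrary assumptions.

```python
# Minimal sketch of static hashing with overflow chaining (questions 15-17).
# The directory size is fixed; records that hash to a full primary bucket
# are placed on that bucket's overflow chain. Assumes integer search keys.

NUM_BUCKETS = 4       # fixed number of primary pages (static hashing)
BUCKET_CAPACITY = 2   # records that fit on one primary page

class Bucket:
    def __init__(self):
        self.records = []   # primary page: list of (key, record) pairs
        self.overflow = []  # overflow chain for records that did not fit

buckets = [Bucket() for _ in range(NUM_BUCKETS)]

def hash_fn(key):
    # The hash function computes the bucket number from the search key.
    return key % NUM_BUCKETS

def insert(key, record):
    b = buckets[hash_fn(key)]
    if len(b.records) < BUCKET_CAPACITY:
        b.records.append((key, record))
    else:
        b.overflow.append((key, record))  # bucket full: use the overflow chain

def lookup(key):
    # Compute the bucket directly, then scan its page and its overflow chain.
    b = buckets[hash_fn(key)]
    for k, r in b.records + b.overflow:
        if k == key:
            return r
    return None

for i in range(10):
    insert(i, "rec%d" % i)
```

With ten records and only four buckets of capacity two, keys 8 and 9 end up on overflow chains; under dynamic (extendable) hashing, the directory would instead grow to avoid such chains.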