Download File Systems and Databases Assessment

Database Management - Assignment 5 Good work. See comment below… Grade: 10 out of 10 points Questions 1. Explain the following statement: a transaction is a logical unit of work. A transaction is generated by events and it is a logical unit of work. That means, to record a transaction, all related events should be completed and no partial states are acceptable to avoid data inconsistency. For example, when a student checks in a book (book_id = “12345”), that transaction consists of two events: updating the BOOK table by changing the book status (check_out) “Y” to “N” and deleting a row that corresponds to the same book in the CHECKOUTBOOK table. Changing the status of book to “Checked in” does not complete the check_in transaction and an incomplete transaction violates data integrity and creates an inconsistent database. That is why a transaction is a logical unit of work or events. Using SQL, to check in a book can be completed by the following two statements. UPDATE book SET check_out = “N” WHERE book_id = “12345”; DELETE FROM checkoutbook WHERE book_id = “12345”; You would need to use a COMMIT statement to save both these changes to the database. 2. List and discuss the four transaction properties. The four transaction properties are atomicity, durability, serializability and isolation. - Atomicity means that a transaction is indivisible and it cannot be partially completed to avoid inconsistent data. - Durability means the permanent state of a transaction. For example, when an update occurs the value is changed from an old value to a new value. Durability means the state of a database reaches the state with a new value after completion of a transaction. - Serializability means the occurrence of concurrent execution of multiple transactions, one after another. This is important in multi-user distributed database systems. - Isolation means that the same data cannot be accessed and updated by several transactions. In other words, the second transaction can access the same data after the first transaction completed. In a single-user DBMS, all transactions done by a single user are serial and isolated because one transaction is executed at a time. It ensures the serializability and isolation and it is necessary to use controls for the atomicity and durability of a transaction only. However, in a multi-user DBMS environment, serializability and isolation are important in addition to atomicity and durability in order to meet data integrity and consistency since several concurrent transactions are executed the same data. For instance, in a Athena stand-alone library system, although circulation transactions are done one by one, in the Accent centralized library system which is being used in all LAUSD district schools, these four transaction properties are important because multiple users access the same database at the same time. 3. What is concurrency control, and what is its objective? Concurrency control is the management of simultaneous executions of transactions. The objective of concurrency control is to make sure that the simultaneous transactions are done one after another in the multi-user database environment. Without having this, because multiple users access the same data at the same time over a distributed database system, there may be problems of losing data updates, loss of data integrity and data inconsistency. 4. What three levels of backup may be used in database recovery management? Briefly, describe what each of those three backup levels does. When any failure happens because of the software, hardware or external factors, the data from the backup of the database is what you can use to recover the previous data. Therefore, to backup the database or to schedule automatic database backup is important for data recovery. In database recovery management, there are three levels of backups: 1. Full backup level Full backup gives an exact copy of the entire database. 2. Differential backup level Differential backup copies a part of the database that contains the updates completed after the latest backup copy. 3. Transaction log backup level This backup copies the transactions that are recorded in the transaction log between after the previous backup and just before the failure. 5. List three components of a DDBMS, and list three advantages and three disadvantages of a DDBMS? The three components of a DDBMS are: 1. Computer workstations – Computers are needed to form the network system to distribute the database. 2. The network components – Networking software and hardware components are needed in each workstation to interact with each other. 3. The data processors – The data processors on each computer send and retrieve the data locally. The three advantages of a DDBMS are: faster data access, faster data processing and a User-friendly interface. In DDBMS, several workstations are added to the network system and data is locally stored and accessed at different sites. Therefore, the database system delivers faster data access and faster data processing. In addition, end users are more familiar with the regular pcs and work stations rather than the main frame computers. This is why DDBMS gives the user-friendly and easy interface to users. The three disadvantages of a DDBMS are:1. Complexity of management and control – In DDBMS, transaction management, concurrency control, data security, data recovery play important roles to make sure the database is consistent. That is why data management and control is more complex in DDBMS than that in traditional systems. 2. DDBMS increases the storage needs—Because the data is stored in different workstations, additional disk storage is needed. 3. DDBMS increases the training costs—Users require more training for the complex data management in a distributed system and it increases the cost. 6. Explain the need for the two-phase commit protocol. Then describe the two phases. Although the centralized database system requires only one data processor (DP), in the distributed database multiple-site data environment one DP is needed for each site because multiple processes are done by multiple sites. In this case, it is important that each transaction operation is committed by each local DP. Each DP maintains its own transaction log. When one of the DPs can not commit the transactions while each transaction is committed, that results in a inconsistent database. Because the two-phase commit ensures that all nodes commit their part of the transaction, it is required to solve the data inconsistency in the distributed multi-site data environment. Two-phase commit protocol is implemented in two phases: the Preparation Phase and the Final Commit Phase. - In the Preparation Phase, first, the coordinator ensures all subordinates are prepared to commit. The subordinate replies to the “YES/NO” message to the coordinator by writing to the transaction log using the write-ahead protocol. If all subordinates are ready to commit, the transaction continues to phase 2, otherwise, the coordinator aborts the transaction. - In the Final Commit Phase, the coordinator sends a “COMMIT” message and waits for a reply from the subordinators. The subordinates use the “DO” protocol to update the database. The coordinator cancels all changes by using the “UNDO” protocol if at least one of the subordinates replies “NOT COMMITTED”. 7. Describe the three data fragmentation strategies. Give some examples. 1. Horizontal fragmentation – In the horizontal fragmentation strategy, a table is divided into groups of rows logically and each fragment (group/sunset) contains unique rows and it is stored at a different node. All rows have the same attributes and the SELECT statement produces the contents of the fragments. 2. Vertical fragmentation – In the vertical fragmentation strategy, the table is divided into logical groups of attributes (columns). Each fragment contains unique columns and is stored at a different location. The content in the fragment is obtained by using the “PROJECT” statement. 3. Mixed fragmentation – This strategy is the combination of horizontal and vertical strategies. In other words, each row fragment may be a combination of groups of attributes. For example, suppose that the following is the student table that is centralized in the school district. Student Table S_NO 101 102 103 104 105 106 S_FNAME Clay Melissa Mimi David Aung Diane S_LNAME Gediman Rentchler Rangtha Gillham Min Anderson S_DOB 100978 020477 090580 100978 020477 090580 LOC_CODE 8571 8556 8571 8522 8556 8522 LOCATION Canoga Park El Camino Canoga Park Taft El Camino Taft STATUS G D G G G D OD_AMT $0.00 $0.50 $0.00 $0.00 $0.00 $4.50 Suppose that the district is interested to organize the student by location code, in the horizontal fragmentation strategy, the student table is divided into 3 fragments as shown below. FRAG_NAME STU_H1 STU_H2 STU_H3 LOCATION Canoga Park El Camino Taft CONDITION LOC_CODE="8571" LOC_CODE="8556" LOC_CODE="8522" NODE_NAME CP EC TF S_NO 101,103 102,105 104,106 NO. OF ROWS 2 2 2 Suppose that the district has two departments: library services and student information. In the vertical fragmentation of the student table, the table is divided into two fragments according to the attributes as below. FRAG_NAME LOCATION NODE_NAME STU_V1 Library Service Student Information LIBSER ATTRIBUTE NAMES S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE SINFO S_NO, STATUS, OD_AMT STU_V2 below. In the mixed fragmentation, the fragmentations of the student table will be as HORIZONTAL CRITERIA NODE NAME ROWS Library Service Student Information LOC_CODE="8571" CP_S 101,103 LOC_CODE="8571" CP_L 101,103 Library Service Student Information LOC_CODE="8556" EC_S 102,105 LOC_CODE="8556" EC_L 102,105 Library Service Student Information LOC_CODE="8522" TF_S 104,106 S_NO, STATUS, OD_AMT S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE LOC_CODE="8522" TF_L 104,106 S_NO, STATUS, OD_AMT FRAG_NAME LOCATION STU_M1 STU_M2 STU_M3 STU_M4 STU_M5 STU_M6 ATTRIBUTE NAMES S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE S_NO, STATUS, OD_AMT S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE 8. What is data replication, and what are the three replication strategies? Data replication means that copies of data fragments are stored at multiple sites to enhance data availability and response time. If the database is replicated, it is required to update all copies of the database at all sites for a ‘WRITE’ operation to maintain data consistency. There are three replication strategies: fully replicated, partially replicated and unreplicated. The key factors to decide to use data replication are size of the database, usage frequency and costs. (1). Fully replicated strategy In this strategy, each database fragment is copied and stored at multiple sites. But fully replication increases the overhead cost and it is not practical. (2). Partially replicated strategy In this strategy, some database fragments are copied and stored at multiple sites. Partially replication is used by most DDBMSs. (3). Unreplicated strategy This strategy does not copy any fragment of the database and just stores the fragment at a single site. 9. You've been hired to integrate all of the libraries in the California State University system. Describe a high-level design you might use to set this up as a distributed database. I might use heterogeneous distributed database design to integrate all of the libraries in the California State University system because first of all, I believe all libraries are using different platforms, different DBMSs and different operating systems. Only Heterogeneous distributed database design can handle different DBMS that are running under different computer systems. Using heterogeneous distributed database design is more cost effective than using a homogeneous system since all libraries have started different systems under different platforms. In addition, using distributed database systems enhance data availability and response time as opposed to using the centralized database system. 10. C.J Date's Twelve Commandments for Distributed Databases are listed on page 514. Choose what you think are the three most important commandments and the three least likely to implemented commandments as they relate to library science. Explain your rationale. Deciding on the most important commandments and the ones least likely to be implemented depends on the distributed database system used. Suppose that all of the libraries in the California State University system are integrated under the distributed database system to minimize the costs in the long run. For this system, I think the three most important commandments are Hardware independence, Operating system independence and Database independence; this is because libraries are using different hardware, operating systems and databases according to each library’s needs. To be able to work together in libraries, the distributed databases with independent hardware, operating system and database are highly required. The three least likely to be implemented commandments are Failure independence, Location transparency and Distributed query processing because every distributed system already satisfies these three capabilities. The database system is able to continue when a node failure in the network and the users do not need to know where the data came from. Finally, the transaction processor is one of the components of the distributed database system and it does the distributed query processing. This is why these three commandments are basic features of a distributed database and they are least likely to be implemented for the desired distributed database library system.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download File Systems and Databases Assessment