Database Management - Assignment 5
Good work. See comment below…
Grade: 10 out of 10 points
Questions
1. Explain the following statement: a transaction is a logical unit of work.
A transaction is generated by an event and is a logical unit of work. That means that, for a
transaction to be recorded, all of its related operations must be completed; no partial state
is acceptable, because a partial state would leave the data inconsistent.
For example, when a student checks in a book (book_id = '12345'), the transaction
consists of two operations: updating the BOOK table by changing the book status
(check_out) from 'Y' to 'N', and deleting the row for the same book from the
CHECKOUTBOOK table. Changing the book status alone does not complete the check-in
transaction; an incomplete transaction violates data integrity and leaves the database
inconsistent. That is why a transaction is treated as a single logical unit of work.
Using SQL, a book can be checked in with the following two statements.
UPDATE book
SET check_out = 'N'
WHERE book_id = '12345';
DELETE FROM checkoutbook
WHERE book_id = '12345';
You would need to use a COMMIT statement to save both these changes to the
database.
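As a minimal sketch of the point above, the two statements can be wrapped in one transaction using Python's standard sqlite3 module. The table layouts, the student_id column and the check_in helper are assumptions made for illustration; they are not taken from the assignment.

```python
import sqlite3

def check_in(conn, book_id):
    """Check a book in: both statements take effect together or not at all."""
    try:
        # sqlite3 opens a transaction implicitly before the first UPDATE/DELETE.
        conn.execute("UPDATE book SET check_out = 'N' WHERE book_id = ?", (book_id,))
        conn.execute("DELETE FROM checkoutbook WHERE book_id = ?", (book_id,))
        conn.commit()      # save both changes permanently
    except sqlite3.Error:
        conn.rollback()    # undo the partial change so no inconsistent state remains
        raise

# Hypothetical schema and sample data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (book_id TEXT PRIMARY KEY, check_out TEXT)")
conn.execute("CREATE TABLE checkoutbook (book_id TEXT, student_id TEXT)")
conn.execute("INSERT INTO book VALUES ('12345', 'Y')")
conn.execute("INSERT INTO checkoutbook VALUES ('12345', '101')")
conn.commit()

check_in(conn, "12345")
status = conn.execute(
    "SELECT check_out FROM book WHERE book_id = '12345'").fetchone()[0]
remaining = conn.execute(
    "SELECT COUNT(*) FROM checkoutbook WHERE book_id = '12345'").fetchone()[0]
print(status, remaining)  # N 0
```

If the DELETE fails, the rollback also undoes the UPDATE, so the book never ends up marked as checked in while its checkout row still exists.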
2. List and discuss the four transaction properties.
The four transaction properties are atomicity, durability, serializability and isolation.
- Atomicity means that a transaction is indivisible: either all of its operations are
completed or none are, so it can never leave the data partially updated.
- Durability means that the result of a completed transaction is permanent. For example, when
an update changes a value from an old value to a new value, the database keeps the new value
once the transaction has committed, and the change cannot be lost afterwards.
- Serializability means that the concurrent execution of multiple transactions produces the
same result as if the transactions had been executed one after another. This is important in
multi-user and distributed database systems.
- Isolation means that data being used by one transaction cannot be accessed by another
transaction until the first one has completed. In other words, the second transaction can
access the same data only after the first transaction ends.
In a single-user DBMS, all transactions are serial and isolated because only one transaction
is executed at a time. Serializability and isolation are therefore guaranteed automatically,
and controls are needed only for atomicity and durability. In a multi-user DBMS environment,
however, serializability and isolation must be enforced in addition to atomicity and
durability to preserve data integrity and consistency, since several concurrent transactions
are executed against the same data.
For instance, in a stand-alone Athena library system, circulation transactions are processed
one at a time. In the Accent centralized library system, which is used in all LAUSD schools,
all four transaction properties matter because multiple users access the same database at the
same time.
3. What is concurrency control, and what is its objective?
Concurrency control is the management of the simultaneous execution of
transactions in a multi-user database.
The objective of concurrency control is to ensure the serializability of
transactions in the multi-user database environment, so that transactions running at the same
time produce the same result as if they had been executed one after another. Without it,
because multiple users access the same data at the same time, problems such as lost updates,
loss of data integrity and data inconsistency may occur.
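The lost-update problem mentioned above can be illustrated with a small, deterministic simulation. This is only a sketch: the transactions T1 and T2, their amounts and the run helper are hypothetical, and a real DBMS would prevent the bad interleaving with locks or a similar mechanism.

```python
def run(schedule, balance=100):
    """Apply a schedule of (transaction, action) steps to one shared value."""
    db = {"balance": balance}
    local = {}                         # each transaction's private working copy
    deltas = {"T1": +50, "T2": -30}    # hypothetical updates made by T1 and T2
    for txn, action in schedule:
        if action == "read":
            local[txn] = db["balance"]
        elif action == "write":
            db["balance"] = local[txn] + deltas[txn]
    return db["balance"]

# Serial schedule: T1 runs completely before T2 starts.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]

# Uncontrolled interleaving: both read the old value before either writes.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial))       # 120
print(run(interleaved))  # 70, because T1's update is overwritten (lost)
```

Concurrency control aims to allow only schedules whose outcome matches some serial schedule, which rules out the second case.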
4. What three levels of backup may be used in database recovery management? Briefly,
describe what each of those three backup levels does.
When a failure occurs because of software, hardware or external factors, the backup of the
database is what you use to recover the data. Therefore, backing up the database regularly,
or scheduling automatic backups, is important for data recovery. In database recovery
management, there are three levels of backup:
1. Full backup level
A full backup produces an exact copy of the entire database.
2. Differential backup level
A differential backup copies only the parts of the database that were modified after the most
recent full backup copy.
3. Transaction log backup level
This backup copies only the transactions recorded in the transaction log after the previous
backup and up to just before the failure.
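The way the three backup levels fit together during recovery can be sketched as follows. This is a deliberately simplified model: the database is a plain dictionary, the log holds only (key, new value) pairs, and the recover function and sample data are assumptions for illustration. Real transaction logs also carry before-images and transaction boundaries.

```python
def recover(full_backup, differential, log_backup):
    """Rebuild the database state from the three backup levels."""
    db = dict(full_backup)             # 1. restore the exact copy of the whole database
    db.update(differential)            # 2. apply the parts changed since the full backup
    for key, value in log_backup:      # 3. replay logged writes in their original order
        db[key] = value
    return db

# Hypothetical data: student number -> status.
full = {"101": "G", "102": "D"}
diff = {"102": "G"}                    # row 102 changed after the full backup
log = [("103", "G"), ("101", "D")]     # writes logged after the differential backup

print(recover(full, diff, log))  # {'101': 'D', '102': 'G', '103': 'G'}
```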
5. List three components of a DDBMS, and list three advantages and three disadvantages of
a DDBMS?
The three components of a DDBMS are:
1. Computer workstations – Computers are needed to form the network system to distribute
the database.
2. The network components – Networking software and hardware components are needed in
each workstation to interact with each other.
3. The data processors – The data processors on each computer send and retrieve the data
locally.
The three advantages of a DDBMS are faster data access, faster data processing
and a user-friendly interface. In a DDBMS, several workstations are added to the network
and data is stored and accessed locally at different sites. The database system therefore
delivers faster data access and faster data processing. In addition, end users are more
familiar with regular PCs and workstations than with mainframe computers, which is why a
DDBMS offers users a friendlier and easier interface.
The three disadvantages of a DDBMS are:
1. Complexity of management and control – In a DDBMS, transaction management,
concurrency control, data security and data recovery must all work across sites to keep the
database consistent. That is why data management and control are more complex in a
DDBMS than in a traditional centralized system.
2. Increased storage needs – Because the data is stored at different workstations,
additional disk storage is needed.
3. Increased training costs – Users require more training to manage data in a complex
distributed system, which increases costs.
6. Explain the need for the two-phase commit protocol. Then describe the two phases.
Although a centralized database system requires only one data processor (DP),
a distributed, multiple-site database environment needs one DP at each site because a
transaction may be processed at several sites. In this case, each part of the transaction
must be committed by its local DP, and each DP maintains its own transaction log. If one DP
cannot commit its part of the transaction while the other DPs commit theirs, the result is an
inconsistent database. The two-phase commit protocol is needed to prevent this inconsistency
in the distributed multi-site data environment: it ensures that either all nodes commit their
part of the transaction or none do.
Two-phase commit protocol is implemented in two phases: the Preparation Phase
and the Final Commit Phase.
- In the Preparation Phase, the coordinator sends a “PREPARE TO COMMIT” message to all
subordinates. Each subordinate writes to its transaction log using the write-ahead protocol
and then replies “YES” or “NO” to the coordinator. If all subordinates are ready to commit,
the transaction continues to phase 2; otherwise, the coordinator aborts the transaction.
- In the Final Commit Phase, the coordinator sends a “COMMIT” message to all subordinates and
waits for their replies. Each subordinate uses the “DO” protocol to update the database and
replies “COMMITTED” or “NOT COMMITTED”. If at least one subordinate replies “NOT COMMITTED”,
the coordinator sends an “ABORT” message and all changes are cancelled using the “UNDO”
protocol.
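The two phases described above can be sketched in a few lines of Python. This is an in-memory illustration only: the Subordinate class, its can_prepare flag and the two_phase_commit function are hypothetical names, and timeouts, message loss and coordinator failure are left out.

```python
class Subordinate:
    """One site (data processor) taking part in a distributed transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare   # assumption: whether this site is able to commit
        self.log = []                    # stands in for the local transaction log
        self.state = "INIT"

    def prepare(self):
        # Write-ahead protocol: the log entry is written before the reply is sent.
        if self.can_prepare:
            self.log.append("PREPARED")
            self.state = "PREPARED"
            return "YES"
        self.log.append("NOT PREPARED")
        return "NO"

    def commit(self):
        self.log.append("COMMIT")
        self.state = "COMMITTED"         # DO: apply the changes to the local database
        return "COMMITTED"

    def abort(self):
        self.log.append("ABORT")
        self.state = "ABORTED"           # UNDO: cancel the local changes


def two_phase_commit(subordinates):
    """Coordinator logic: returns the final outcome of the transaction."""
    # Phase 1 (preparation): ask every subordinate whether it can commit.
    votes = [s.prepare() for s in subordinates]
    if any(vote != "YES" for vote in votes):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    # Phase 2 (final commit): tell every subordinate to commit and collect replies.
    replies = [s.commit() for s in subordinates]
    if any(reply != "COMMITTED" for reply in replies):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    return "COMMITTED"


ok_sites = [Subordinate("CP"), Subordinate("EC"), Subordinate("TF")]
bad_sites = [Subordinate("CP"), Subordinate("EC", can_prepare=False), Subordinate("TF")]
print(two_phase_commit(ok_sites))   # COMMITTED
print(two_phase_commit(bad_sites))  # ABORTED
```

In the second run a single “NO” vote is enough to abort the transaction at every site, which is what keeps the sites consistent with each other.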
7. Describe the three data fragmentation strategies. Give some examples.
1. Horizontal fragmentation –
In the horizontal fragmentation strategy, a table is divided into logical groups
(subsets) of rows. Each fragment contains unique rows and is stored at a different node. All
fragments have the same attributes, and the contents of each fragment can be produced by a
SELECT statement with a WHERE condition.
2. Vertical fragmentation –
In the vertical fragmentation strategy, the table is divided into logical groups of
attributes (columns). Each fragment contains unique columns, except for the key column,
which is repeated in every fragment, and is stored at a different location. The contents of
each fragment can be obtained with the PROJECT operation.
3. Mixed fragmentation –
This strategy combines the horizontal and vertical strategies. In other words, the
table is divided into groups of rows, and each group of rows is further divided into groups
of attributes.
For example, suppose that the following is the student table that is centralized in the school
district.
Student Table
S_NO  S_FNAME  S_LNAME    S_DOB   LOC_CODE  LOCATION     STATUS  OD_AMT
101   Clay     Gediman    100978  8571      Canoga Park  G       $0.00
102   Melissa  Rentchler  020477  8556      El Camino    D       $0.50
103   Mimi     Rangtha    090580  8571      Canoga Park  G       $0.00
104   David    Gillham    100978  8522      Taft         G       $0.00
105   Aung     Min        020477  8556      El Camino    G       $0.00
106   Diane    Anderson   090580  8522      Taft         D       $4.50
Suppose that the district wants to organize the students by location code. Under
the horizontal fragmentation strategy, the student table is divided into 3 fragments as shown
below.
FRAG_NAME  LOCATION     CONDITION        NODE_NAME  S_NO     NO. OF ROWS
STU_H1     Canoga Park  LOC_CODE="8571"  CP         101,103  2
STU_H2     El Camino    LOC_CODE="8556"  EC         102,105  2
STU_H3     Taft         LOC_CODE="8522"  TF         104,106  2
Suppose that the district has two departments: library services and student
information. In the vertical fragmentation of the student table, the table is divided into two
fragments according to the attributes as below.
FRAG_NAME  LOCATION             NODE_NAME  ATTRIBUTE NAMES
STU_V1     Library Service      LIBSER     S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE
STU_V2     Student Information  SINFO      S_NO, STATUS, OD_AMT
In the mixed fragmentation, the fragmentations of the student table will be as
below.
FRAG_NAME  LOCATION             HORIZONTAL CRITERIA  NODE NAME  ROWS     ATTRIBUTE NAMES
STU_M1     Library Service      LOC_CODE="8571"      CP_S       101,103  S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE
STU_M2     Student Information  LOC_CODE="8571"      CP_L       101,103  S_NO, STATUS, OD_AMT
STU_M3     Library Service      LOC_CODE="8556"      EC_S       102,105  S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE
STU_M4     Student Information  LOC_CODE="8556"      EC_L       102,105  S_NO, STATUS, OD_AMT
STU_M5     Library Service      LOC_CODE="8522"      TF_S       104,106  S_NO, S_FNAME, S_LNAME, S_DOB, LOC_CODE
STU_M6     Student Information  LOC_CODE="8522"      TF_L       104,106  S_NO, STATUS, OD_AMT
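The three strategies can also be expressed as simple operations on the student table from this answer. The sketch below is an illustration in plain Python, not a DDBMS feature: the horizontal and vertical helper functions are hypothetical names, and overdue amounts are stored as numbers rather than currency strings.

```python
COLUMNS = ["S_NO", "S_FNAME", "S_LNAME", "S_DOB",
           "LOC_CODE", "LOCATION", "STATUS", "OD_AMT"]
ROWS = [
    (101, "Clay",    "Gediman",   "100978", "8571", "Canoga Park", "G", 0.00),
    (102, "Melissa", "Rentchler", "020477", "8556", "El Camino",   "D", 0.50),
    (103, "Mimi",    "Rangtha",   "090580", "8571", "Canoga Park", "G", 0.00),
    (104, "David",   "Gillham",   "100978", "8522", "Taft",        "G", 0.00),
    (105, "Aung",    "Min",       "020477", "8556", "El Camino",   "G", 0.00),
    (106, "Diane",   "Anderson",  "090580", "8522", "Taft",        "D", 4.50),
]
STUDENT = [dict(zip(COLUMNS, row)) for row in ROWS]

def horizontal(table, loc_code):
    """Horizontal fragment: the rows that satisfy a condition (like SELECT ... WHERE)."""
    return [row for row in table if row["LOC_CODE"] == loc_code]

def vertical(table, attributes):
    """Vertical fragment: only the chosen columns of every row (like PROJECT)."""
    return [{name: row[name] for name in attributes} for row in table]

# Horizontal fragments by location code.
stu_h1 = horizontal(STUDENT, "8571")
stu_h2 = horizontal(STUDENT, "8556")
stu_h3 = horizontal(STUDENT, "8522")

# Vertical fragment with the status attributes; S_NO is kept as the key.
stu_v2 = vertical(STUDENT, ["S_NO", "STATUS", "OD_AMT"])

# Mixed fragment: first select the rows for one location, then project columns.
mixed = vertical(stu_h1, ["S_NO", "STATUS", "OD_AMT"])

print([row["S_NO"] for row in stu_h1])  # [101, 103]
print(mixed)
```

Because every vertical fragment keeps S_NO, the original rows can be rebuilt by joining fragments on that key, and the union of the three horizontal fragments gives back all six rows.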
8. What is data replication, and what are the three replication strategies?
Data replication means that copies of data fragments are stored at multiple sites to
improve data availability and response time. If the database is replicated, every ‘WRITE’
operation must update all copies of the data at all sites to maintain data consistency.
There are three replication strategies: fully replicated, partially replicated and
unreplicated. The key factors in deciding whether to use data replication are the size of the
database, usage frequency and costs.
(1). Fully replicated strategy
In this strategy, multiple copies of each database fragment are stored at multiple sites.
However, full replication increases overhead costs and can be impractical.
(2). Partially replicated strategy
In this strategy, multiple copies of some database fragments are stored at multiple sites.
Most DDBMSs can handle partial replication well.
(3). Unreplicated strategy
This strategy does not copy any fragment of the database; each fragment is stored at a
single site.
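The rule that a write must reach every copy can be sketched as follows. The ReplicatedFragment class, the site codes and the fragment names are used here only for illustration, and the sketch ignores site failures and network delays.

```python
class ReplicatedFragment:
    """A data fragment stored as one copy per site."""

    def __init__(self, name, sites):
        self.name = name
        self.copies = {site: {} for site in sites}   # site -> local copy of the fragment

    def write(self, key, value):
        # A WRITE must update every copy so that all sites stay consistent.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key, site):
        # A READ can be served by any single site that holds a copy.
        return self.copies[site][key]


ALL_SITES = ["CP", "EC", "TF"]
fully = ReplicatedFragment("STU_H1", ALL_SITES)         # copies at every site
partially = ReplicatedFragment("STU_H2", ["EC", "TF"])  # copies at some sites
unreplicated = ReplicatedFragment("STU_H3", ["TF"])     # stored at a single site

fully.write(101, "G")
print([fully.read(101, site) for site in ALL_SITES])  # ['G', 'G', 'G']
print(len(unreplicated.copies))                       # 1
```

The more copies a fragment has, the more work each write takes, which is the overhead cost that makes full replication expensive.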
9. You've been hired to integrate all of the libraries in the California State University system.
Describe a high-level design you might use to set this up as a distributed database.
I would use a heterogeneous distributed database design to integrate all of the
libraries in the California State University system. First of all, I assume that the
libraries use different platforms, DBMSs and operating systems, and only a heterogeneous
distributed database design can handle different DBMSs running on different computer
systems. A heterogeneous design is also more cost effective than a homogeneous one, since the
libraries have already invested in different systems on different platforms. In addition, a
distributed database system improves data availability and response time compared with a
centralized database system.
10. C. J. Date's Twelve Commandments for Distributed Databases are listed on page 514.
Choose what you think are the three most important commandments and the three
commandments least likely to be implemented as they relate to library science. Explain your
rationale.
Deciding on the most important commandments and the ones least likely to be
implemented depends on the distributed database system used.
Suppose that all of the libraries in the California State University system are
integrated under one distributed database system to minimize costs in the long run. For
this system, I think the three most important commandments are Hardware independence,
Operating system independence and Database independence, because the libraries use
different hardware, operating systems and databases according to each library’s needs. For
the libraries to work together, the distributed database must be independent of any
particular hardware, operating system and DBMS.
The three commandments least likely to be implemented are Failure independence,
Location transparency and Distributed query processing, because a distributed system is
already expected to provide these three capabilities. The database system is able to continue
operating when a node in the network fails, and users do not need to know where the data
comes from. Finally, the transaction processor is one of the components of the distributed
database system, and it carries out the distributed query processing. Because these three
commandments are basic features of a distributed database, they are the least likely to need
separate implementation for the desired distributed library database system.