Solutions 2003 - Department of Computing
SOLUTIONS - CIS209 - INTERNAL - 2003
PROBLEM 1
[25]
Question 1
[6]
A database is defined as “a shared collection of logically related persistent data as part of the
information system of an organisation”. Explain in brief the meaning of “shared”, “logically related”
and “persistent” in this definition.
Answer
“shared” means that different parts of an organisation — represented either via direct users or
applications — put their data together in one repository — the database — which is then shared
by all of them;
[2]
“logically related” means that the different items of data stored in a database are not independent
from each other; different links/relationships exist between them;
[2]
“persistent” means that, once stored, data does not disappear unless users instruct the DBMS to
remove it; data persists whether or not the application programs that access it are running and it
even persists in situations when the DBMS goes down;
[2]
Award 2 marks per explanation if the main idea was captured by the answer (i.e., even if the
answer does not provide the above level of detail).
TOTAL [6]
Question 2
[6]
Define the notion of “candidate key”. Can a relation have more than one candidate key? Give an
example. Define the notion of “primary key” in terms of the “candidate key”.
Answer
A candidate key of a relation R is a (sub)set of attributes CK of R with the following properties:
a) no distinct tuples in R can have the same value for CK (uniqueness); and
[1]
b) no proper subset of CK has the uniqueness property (irreducibility).
[1]
The alternative definition, “A candidate key of R is a subset of attributes that can be used to
uniquely identify each tuple in R”, should be awarded 1 mark only.
A relation can have more than one candidate key.
[1]
Example: R(StudentNo, FirstName, LastName, DOB, Address, Programme)
CK: StudentNo and CK: (FirstName, LastName, DOB)
[2]
(award 1 mark for each key correctly identified, but only if the relation has indeed two CKs)
A primary key is a candidate key arbitrarily chosen by the developer of the database system. [1]
(Note: in fact, the PK is used by many DBMSs in joins, and thus the choice is not completely
arbitrary — the matter of efficiency arises; however, within the context of the relational model, no
such arguments as efficiency arise, thus the choice is arbitrary)
TOTAL [6]
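The two properties can be checked mechanically against a sample extension of R. The Python sketch below is only illustrative: the rows are hypothetical, Address and Programme are left out, and an extension can refute a candidate key but never prove one, since a key must hold for every legal extension of R.

```python
from itertools import combinations

# Hypothetical extension of R (Address and Programme omitted for brevity).
STUDENTS = [
    {"StudentNo": "S1", "FirstName": "Ann", "LastName": "Lee", "DOB": "1984-01-02"},
    {"StudentNo": "S2", "FirstName": "Ann", "LastName": "Lee", "DOB": "1985-03-04"},
    {"StudentNo": "S3", "FirstName": "Ann", "LastName": "Kaye", "DOB": "1984-01-02"},
    {"StudentNo": "S4", "FirstName": "Bob", "LastName": "Lee", "DOB": "1984-01-02"},
]


def is_unique(rows, attrs):
    """Uniqueness: no two distinct tuples agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Uniqueness plus irreducibility: no proper subset of attrs is unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_candidate_key(STUDENTS, ("StudentNo",)))                    # True
print(is_candidate_key(STUDENTS, ("FirstName", "LastName", "DOB")))  # True
print(is_candidate_key(STUDENTS, ("StudentNo", "FirstName")))        # False: reducible
```

The third call fails irreducibility rather than uniqueness: StudentNo alone already identifies each tuple.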
CIS209 – IS52003A
2003 Internal Solutions
Question 3
[3]
Explain in brief what is meant by physical program-data independence (or simply by
program-data independence) in the context of database systems.
Answer
The internal/physical and the logical levels of a database system are separated from each other
[and are linked through a schema mapping].
[1]
Application programs access the data of a database at the logical [or external] level.
[1]
Physical program-data independence is the immunity of application programs to changes at the
internal/physical level (assuming that the conceptual level does not change).
[1]
Award full marks if the general idea is conveyed, but less rigorously.
TOTAL [3]
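As a rough illustration (assumptions: SQLite, through Python's standard sqlite3 module, stands in for a relational DBMS; the table and index names are hypothetical), adding an index is a change at the internal level. The access strategy may change, but the application's query text and its result stay the same.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account (accNo TEXT, owner TEXT, balance REAL)")
con.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("C-110-221", "Joe Bloggs", -245), ("S-009-677", "Mary Bear", 5500)],
)

# The application's query refers only to logical names (table and columns).
APP_QUERY = "SELECT accNo, balance FROM Account WHERE owner = ? ORDER BY accNo"
before = con.execute(APP_QUERY, ("Joe Bloggs",)).fetchall()

# Internal/physical-level change: add an access path (an index) on owner.
con.execute("CREATE INDEX idx_account_owner ON Account(owner)")
after = con.execute(APP_QUERY, ("Joe Bloggs",)).fetchall()

# The DBMS may now choose a different plan; the application code is untouched.
for step in con.execute("EXPLAIN QUERY PLAN " + APP_QUERY, ("Joe Bloggs",)):
    print(step)
print(before == after)  # True
```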
Question 4
[3]
Enumerate three benefits of the database approach to data management (as opposed to a plain
file-based approach).
Answer
- reduced redundancy;
- fewer possibilities for storing inconsistent data;
- easier integrity maintenance;
- shared data;
- program-data independence;
- better maintenance of the data/information of the organisation (and of the overall information
system);
- data security can be enforced more easily;
- recovery mechanisms exist;
- standards can be enforced;
- conflicting requirements (for the overall information system) can be balanced;
Award 1 mark per advantage/benefit correctly stated, but not more than 3 marks.
TOTAL [3]
Question 5
[4]
Data can be stored in files and application programs can share this data by having a direct
access to the respective files (refer to Diagram 1, below). However, data-centred applications
normally employ a database management system (refer to Diagram 2, below).
Diagram 1: Program/Application 1 uses data files 1; ...; Program/Application n uses data files n.
Diagram 2: Program/Application 1, ..., Program/Application n all access the shared data through the DBMS.
What is the level of data redundancy in each of the two approaches?
Answer
In the first (file-based) approach, data is not integrated, and thus the probability of ending up with
a lot of redundancy in the overall information system is quite high (different departments/parts of
the system sometimes share big chunks of data).
[2]
In the database approach, data is integrated and shared. [Given a good design,] the
redundancy of data can be greatly reduced. [In fact, redundancy can be almost completely
eliminated, provided there are no other reasons to maintain it (e.g. the efficiency of queries that
would otherwise employ joins).]
[2]
The text in square brackets is not a necessary part of the answer.
TOTAL [4]
Question 6
[3]
A union, intersection or difference can only be performed between two relations if they are type
compatible. What is meant by type compatibility? Give an example of two type-compatible
relations and of two relations that are not type compatible (only the headings are required).
Answer
Two relations are type compatible if [and only if] they have the same headings.
[1]
Alternatively, two relations are type compatible if [and only if] they have the same set of attributes.
The following two relations (having the same headings) are type compatible
Men {<name : varchar>, <dob : date>, <address : varchar>}
Women {<name : varchar>, <dob : date>, <address : varchar>}
[1]
whereas the following two relations are not type compatible
Students {<name : varchar>, <address : varchar>, <programme : varchar>}
Tutors {<name : varchar>, <address : varchar>, <main topic : char(6)>}
[1]
Award 1 mark for any correct respective example.
If the examples are correct but consist only of attribute names (and not of <attribute-name :
attribute-type> pairs), then award only 1 mark.
TOTAL [3]
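Type compatibility amounts to equality of headings, each heading being a set of <attribute-name : attribute-type> pairs. A small Python sketch using the headings from the examples above; the `union` helper and the (heading, body) representation are hypothetical, chosen only to show where the check applies.

```python
def type_compatible(h1, h2):
    """Type compatible iff both headings are the same set of (name, type) pairs."""
    return h1 == h2


def union(r1, r2):
    """Union of two relations given as (heading, body) pairs; refuses mismatched headings."""
    (h1, body1), (h2, body2) = r1, r2
    if not type_compatible(h1, h2):
        raise TypeError("UNION requires type-compatible relations")
    return h1, body1 | body2


MEN = frozenset({("name", "varchar"), ("dob", "date"), ("address", "varchar")})
WOMEN = frozenset({("name", "varchar"), ("dob", "date"), ("address", "varchar")})
STUDENTS = frozenset({("name", "varchar"), ("address", "varchar"), ("programme", "varchar")})
TUTORS = frozenset({("name", "varchar"), ("address", "varchar"), ("main topic", "char(6)")})

print(type_compatible(MEN, WOMEN))        # True
print(type_compatible(STUDENTS, TUTORS))  # False
```

Because headings are sets, attribute order is irrelevant, which matches the relational model's view of a heading.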
PROBLEM 2
[25]
Question 1
[19]
Draw an ER diagram for the following description. The diagram should illustrate the entity types,
including their attributes, the relationships between them and the multiplicity of each relationship
(note that the textual description does not specify the multiplicity of all the relationships; you will
have to state it yourself, according to your understanding of the problem). Work according to the
following conventions: many-to-many relationships do not have to be transformed into one-to-many relationships; attributes could be composite and/or multi-valued; relationships may have
attributes.
An airline company intends to develop a database system for storing information about flights and their
passengers. The information about flights is of two types: general flight information (such as the BA123
flight from London to Paris, departing at 12:00, available daily) and specific flight information (such as the
BA123 flight on 02/04/2003 whose captain is John Smith). Note that a passenger may book a place on a
specific flight but cannot book a general flight. The general flight information should consist of flight
number, destination airport, starting-point airport, intermediate stops (a list of airports), departure time and
arrival time. The specific flight information should consist of date of flight, captain, delay at departure and
delay at arrival.
The airline company has a fleet of aircraft. The required information on aircraft consists of model/type
(e.g., Airbus 540), seat capacity, normal flying altitude, flight autonomy (how long it can fly without
refuelling), and an internal aircraft identifier used in case the company has more than one aircraft of the
same model/type. A specific flight is always assigned one aircraft.
Passengers book (specific) flights. The information regarding a booking, that is to be stored in the database,
consists of ticket number and payment details. It would be useful to also have recorded in the database the
time when the booking was made and the name of the staff member who performed the operation. The
information required for each passenger consists of name, contact details (house number, street,
city, postcode, country and telephone) and whether the person is a smoker or a non-smoker.
Answer
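Entity types and attributes:
Passenger: name, contactDetails (houseNo, street, city, postcode, country, telephone [1..2]), smokingNonSmoking
GenericFlight: flightNo, destination, startingPoint, intermediateStops, departureTime, arrivalTime
SpecificFlight: date, captain, delayAtDeparture, delayAtArrival
Aircraft: id, type, seatCapacity, altitude, autonomy
Relationships (multiplicity at each end):
Books: Passenger 1..* ---- 1..* SpecificFlight; attributes: ticketNo, paymentDetails, date, staff
Has: GenericFlight 1 ---- 0..* SpecificFlight
IsAssignedTo: SpecificFlight 0..* ---- 1 Aircraft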
Award
4 marks for the correct identification of entities (1 per entity)
(the names of the entities may be different, provided they “preserve” the same meaning as
above)
4 marks for the correct identification of attributes (1 per entity)
(the names of the attributes may be different, provided they “preserve” the same meaning as
above; the marking should also accommodate “small” variations, e.g. if one attribute is
missing but the others are correctly identified, full marks should be awarded)
3 marks for the correct identification of relationships (1 per relationship)
(the names and direction of the relationships may be different, provided they “preserve” the
same meaning as above)
6 marks for correct identification of multiplicity of relationships (2 per relationship, one at each
end)
2 marks for the correct identification of the attributes for the Books relationship; still award 2
marks even if ‘date’ and ‘staff’ are not included.
Possible correct variations:
- contact details considered as a separate entity;
- telephone may be considered as single-valued attribute;
- relationship Books replaced with an entity Booking with the same attributes as the relationship;
there will be two one-to-many relationships between Booking, on one side, and Passenger and
SpecificFlight, on the other;
- the multiplicity of IsAssignedTo on the SpecificFlight side may be [1..*];
- the multiplicity of Has on the SpecificFlight side may be [1..*];
- Aircraft may be divided into two entities: a generic aircraft entity (detailing the type/model) and a
specific aircraft entity (containing the id of each particular aircraft and other specific details);
alternatively, the ‘id’ attribute as it is now in Aircraft could be regarded as multivalued.
TOTAL [19]
Question 2
[6]
Consider the ER structure depicted in Figure 1 below. Find an application (e.g. library, hospital,
software development company, university, etc.) for which this structure could be used to model a
part of its information system and illustrate this model — i.e., find meaningful names for the entity
types (E1 and E2), for the attributes of each entity (a1, ..., a5, b1, ..., b4), for the relationship R and
for its attribute c.
Figure 1: E1 (a1, a2, a3, a4, a5) [0..*] ---- R (attribute c) ---- [1..*] E2 (b1, b2, b3, b4)
Answer
Sample solution:
E1 : Student(studNo, fName, lName, address, dOB)
E2 : Course(code, title, level, shortSyllabus)
R : Takes (each student must take at least one course, but there may be (new, optional) courses
that are not taken by anyone yet)
c : result
Award:
1 mark for a correct illustration of E1, a1 ... a5;
1 mark for a correct illustration of E2, b1 ... b4;
2 marks for a correct illustration of R;
2 marks for a correct illustration of c.
TOTAL [6]
PROBLEM 3
[25]
Question 1
[4]
Consider the following relation:
(patient_id, patient_name, p_DOB, disease, doctor, speciality, diagnosis, treatment)
Consider also the following assumptions: a doctor gives a unique diagnosis for the disease of one
patient; however, a doctor may give different diagnoses for the same disease (for different
patients); each diagnosis has a unique associated treatment; a doctor has a unique speciality; a
patient has a unique patient_id.
(a) In each of the expressions below, replace the question marks with sets of attributes from
the above relation to obtain expressions representing functional dependencies.
[3]
patient_id → ?
? → diagnosis
? → treatment
(b) Choose a primary key for this relation.
[1]
Answer
a) patient_id → patient_name (or p_DOB)
[1]
patient_id, disease, doctor → diagnosis
[1]
diagnosis → treatment
[1]
b) PK : (patient_id, disease, doctor)
[1]
TOTAL [4]
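The answer can be checked with the standard attribute-closure algorithm: a set of attributes is a superkey when its closure contains every attribute of the relation. A Python sketch, assuming the FDs read off the stated assumptions (including doctor → speciality, from “a doctor has a unique speciality”).

```python
from itertools import combinations

ATTRS = {"patient_id", "patient_name", "p_DOB", "disease", "doctor",
         "speciality", "diagnosis", "treatment"}

# FDs from the stated assumptions, as (determinant, dependants) pairs.
FDS = [
    ({"patient_id"}, {"patient_name", "p_DOB"}),
    ({"patient_id", "disease", "doctor"}, {"diagnosis"}),
    ({"diagnosis"}, {"treatment"}),
    ({"doctor"}, {"speciality"}),
]


def closure(attrs, fds):
    """Attribute closure X+: keep applying every FD whose determinant is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


pk = {"patient_id", "disease", "doctor"}
print(closure(pk, FDS) == ATTRS)  # True: the key determines every attribute
# Irreducible: no two-attribute subset of the key determines everything
# (closure is monotone, so smaller subsets cannot succeed either).
print(any(closure(set(s), FDS) == ATTRS for s in combinations(pk, 2)))  # False
```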
Question 2
[21]
Consider the following relation:
(project, task, max_budget, duration, payment_rate, contractor, contracted_time)
and the following functional dependencies:
(project, task) → max_budget, duration
//there is a unique max_budget and period of work (duration) per project task
(task, max_budget, duration) → payment_rate
//the contracting payment rate is unique given a certain task, max_budget and duration
(project, task, contractor) → contracted_time
//contractors are employed on project tasks
Assume they completely express all the functional dependencies existing in the given relation
(i.e., the others are either trivial or can be deduced from the given ones).
a) State the primary key for this relation (there is a unique candidate key).
[2]
b) State the definition for Boyce-Codd Normal Form (BCNF)
[2]
c) State a reason why this relation is not in BCNF.
[2]
d) State Heath’s theorem (for non-loss decomposition).
[3]
e) Decompose/transform (non-loss) the given relation into a set of relations in BCNF. Explain how
you apply Heath’s theorem for each decomposition you make. State the end result clearly. Also,
state the candidate keys for each resulting BCNF relation.
[12]
Note that the order in which you employ the above functional dependencies in normalisation is
important — some orders may lead to the loss of certain dependencies. You are advised to start
with the second functional dependency.
Answer
a) primary key (PK) : (project, task, contractor)
[2]
b) A relation is in BCNF if and only if each of its non-trivial (left-)irreducible functional
dependencies has a candidate key as its determinant.
The “softer” (incorrect) version, whereby “non-trivial” and “irreducible” are not mentioned, should
also be accepted (awarded full marks), provided the following question is also correctly
answered. Otherwise award only 1 mark.
[2]
c) The first two functional dependencies (FDs) do not have a candidate key as their determinant,
so the relation is not in BCNF. If either is mentioned as a cause, award full
marks.
[2]
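Parts a) and c) can be checked with the attribute-closure idea: a determinant is a superkey if its closure covers the whole heading. The sketch below uses hypothetical helper names and a brute-force key search that is only practical for small headings; it finds the unique candidate key and lists the FDs whose determinant is not a superkey.

```python
from itertools import combinations

R = {"project", "task", "max_budget", "duration", "payment_rate",
     "contractor", "contracted_time"}

FDS = [
    (frozenset({"project", "task"}), frozenset({"max_budget", "duration"})),
    (frozenset({"task", "max_budget", "duration"}), frozenset({"payment_rate"})),
    (frozenset({"project", "task", "contractor"}), frozenset({"contracted_time"})),
]


def closure(attrs, fds):
    """Attribute closure X+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the schema, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue  # a superset of a known key is not irreducible
            if schema <= closure(x, fds):
                keys.append(x)
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


print([sorted(k) for k in candidate_keys(R, FDS)])  # [['contractor', 'project', 'task']]
for lhs, rhs in bcnf_violations(R, FDS):
    print(sorted(lhs), "->", sorted(rhs))
```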
d) Let R be a relation and A, B and C subsets of the attributes of R such that
heading(R) = A ∪ B ∪ C. If R satisfies the functional dependency A → B, then R is equal to the join
of its projections on (A, B) and (A, C). (Alternatively: if R satisfies the functional dependency
A → B, then R can be non-loss decomposed into (A, B) and (A, C).)
[3]
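Heath's theorem can be observed on a small extension. In this sketch the tuples are hypothetical, over (task, max_budget, duration, payment_rate, project), with A = (task, max_budget, duration), B = payment_rate and C = project. While A → B holds, the projections rejoin to exactly the original tuples; when A → B is violated, the join produces spurious tuples.

```python
def project(rows, cols):
    """Projection onto the given column positions (duplicates vanish in a set)."""
    return {tuple(row[i] for i in cols) for row in rows}


def join_on_a(ab, ac, n):
    """Natural join of (A, B) and (A, C) on their first n columns (the A part)."""
    return {x + y[n:] for x in ab for y in ac if x[:n] == y[:n]}


def decompose_and_rejoin(rows, a, b, c):
    """Project R onto (A, B) and (A, C), then join the two projections back."""
    return join_on_a(project(rows, a + b), project(rows, a + c), len(a))


A, B, C = (0, 1, 2), (3,), (4,)
good = {  # satisfies A -> B
    ("design", 1000, 10, 20, "P1"),
    ("design", 1000, 10, 20, "P2"),
    ("build", 5000, 30, 25, "P1"),
}
bad = {  # violates A -> B: same A, different payment_rate
    ("design", 1000, 10, 20, "P1"),
    ("design", 1000, 10, 22, "P3"),
}

print(decompose_and_rejoin(good, A, B, C) == good)  # True: non-loss
print(len(decompose_and_rejoin(bad, A, B, C)))      # 4: two spurious tuples
```

Since A, B and C here are the columns in their original order, the rejoined tuples can be compared directly with the originals.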
e)
(1) Heath’s theorem for R (the initial relation) based on
‘task, max_budget, duration → payment_rate’ leads to:
R1 (task, max_budget, duration, payment_rate)
CK/PK : (task, max_budget, duration)
R2 (project, task, max_budget, duration, contractor, contracted_time)
CK/PK : (project, task, contractor)
R1 is in BCNF
R2 is not in BCNF, due to ‘project, task → max_budget, duration’
(2) Heath’s theorem for R2, based on ‘project, task → max_budget, duration’ leads to
R21 (project, task, max_budget, duration)
CK/PK : (project, task)
R22 (project, task, contractor, contracted_time)
CK/PK : (project, task, contractor)
R21 is in BCNF
R22 is in BCNF
All of the initial FDs have been preserved.
Result:
(task, max_budget, duration, payment_rate)
(project, task, max_budget, duration)
(project, task, contractor, contracted_time)
Award 6 marks for step (1) and 6 marks for step (2).
Alternatively, award 4 marks for correct set of normalised relations (refer to “Result”, above; this
should include the specification of CKs) — this means that the student had an intuition of the
correct answer. The rest of 8 marks should be awarded for a correct normalisation process
(application of Heath’s theorem (2 marks) + identification of relations in or not in BCNF (2
marks)).
[12]
NOTE: A correct answer to question e) accompanied by incorrect answers to questions b) and/or
d) should look suspicious. Although it is possible that the student knows how to apply the
definitions without being able to state them, this is improbable and the marker should consider
such cases with care.
TOTAL [21]
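The end result can also be checked mechanically. The sketch below projects the given FDs onto each resulting relation by computing the closures of all its subsets (exponential, so only reasonable for headings this small), then tests BCNF, lists the candidate keys and checks that every given FD fits inside one of the resulting relations, which is a sufficient condition for dependency preservation.

```python
from itertools import combinations

FDS = [
    (frozenset({"project", "task"}), frozenset({"max_budget", "duration"})),
    (frozenset({"task", "max_budget", "duration"}), frozenset({"payment_rate"})),
    (frozenset({"project", "task", "contractor"}), frozenset({"contracted_time"})),
]
RESULT = {
    "R1": frozenset({"task", "max_budget", "duration", "payment_rate"}),
    "R21": frozenset({"project", "task", "max_budget", "duration"}),
    "R22": frozenset({"project", "task", "contractor", "contracted_time"}),
}


def closure(attrs, fds):
    """Attribute closure X+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(schema, fds):
    """Each subset X either determines nothing new inside the schema (only
    trivial FDs) or determines the whole schema (X is a superkey)."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            implied = closure(x, fds) & schema
            if implied != x and not schema <= implied:
                return False
    return True


def candidate_keys(schema, fds):
    """Minimal subsets of the schema whose closure covers it."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if not any(k <= x for k in keys) and schema <= closure(x, fds):
                keys.append(x)
    return keys


original = frozenset().union(*RESULT.values())
print("R in BCNF:", is_bcnf(original, FDS))  # False
for label, schema in RESULT.items():
    keys = [sorted(k) for k in candidate_keys(schema, FDS)]
    print(label, "BCNF:", is_bcnf(schema, FDS), "keys:", keys)
# Each given FD lies entirely within one result relation, so none is lost.
print(all(any((lhs | rhs) <= s for s in RESULT.values()) for lhs, rhs in FDS))  # True
```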
PROBLEM 4
[25]
Consider the following database schema (some tuples are provided for explanatory purposes;
arrows denote foreign keys —foreign keys are in italics; primary keys are in bold and underlined):
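Customer (name, dOB, address, occupation)

AccountType (name, category, minBalance, interest)
name              category   minBalance   interest
Student Current   current    -500         0.5%
Golden Current    current    -2000        1%
Golden Savings    savings    5000         3.5%

Account (accNo, type, balance, owner)
accNo       type              balance   owner
C-110-221   Student Current   -245      Joe Bloggs
S-009-677   Golden Savings    5500      Mary Bear

Transaction (accNo, date, time, transType, valIn, valOut)
accNo       date         time    transType        valIn   valOut
C-110-221   12/04/2003   12:30   cash withdraw    0       30
C-110-221   21/04/2003   9:15    cheque payment   200     0

Foreign keys: Account.type → AccountType.name; Account.owner → Customer.name; Transaction.accNo → Account.accNo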
Question 1
Referring to the above schema, express the following natural language queries in SQL:
[19]
(a) Find the minimum balance and interest for the ‘Golden Savings’ account type.
[1]
SELECT minBalance, interest
FROM AccountType
WHERE name = 'Golden Savings';
(b) List the account names and their corresponding interest rates ordered according to the
interest rates for all the ‘savings’ account types.
[2]
SELECT name, interest
FROM AccountType
WHERE category = 'savings'
ORDER BY interest;
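Queries (a) and (b) can be tried out on the sample AccountType tuples. A minimal sketch, assuming SQLite through Python's sqlite3 module and interest stored as a plain percentage number (0.5 for 0.5%). Note that AccountType identifies an account type by its name column, so the filter in (a) is on name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumption: interest is stored as a percentage number (0.5 means 0.5%).
con.execute(
    "CREATE TABLE AccountType (name TEXT PRIMARY KEY, category TEXT, "
    "minBalance REAL, interest REAL)"
)
con.executemany(
    "INSERT INTO AccountType VALUES (?, ?, ?, ?)",
    [
        ("Student Current", "current", -500, 0.5),
        ("Golden Current", "current", -2000, 1.0),
        ("Golden Savings", "savings", 5000, 3.5),
    ],
)

# (a) The account type is identified by AccountType.name.
result_a = con.execute(
    "SELECT minBalance, interest FROM AccountType WHERE name = 'Golden Savings'"
).fetchall()

# (b) Savings account types, ordered by interest rate.
result_b = con.execute(
    "SELECT name, interest FROM AccountType "
    "WHERE category = 'savings' ORDER BY interest"
).fetchall()

print(result_a)  # [(5000.0, 3.5)]
print(result_b)  # [('Golden Savings', 3.5)]

# AccountType has no 'type' column, so filtering on it is rejected.
try:
    con.execute("SELECT minBalance FROM AccountType WHERE type = 'Golden Savings'")
except sqlite3.OperationalError as exc:
    print("rejected:", exc)
```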
(c) List the date, time, transaction type, value in and value out for all the transactions incurred
between ‘1/01/2003’ and ‘1/04/2003’ on ‘Joe Bloggs’’s ‘Student Current’ account.
[3]
SELECT date, time, transType, valIn, valOut
FROM Transaction T, Account A
WHERE T.accNo = A.accNo AND
      T.date BETWEEN DATE '2003-01-01' AND DATE '2003-04-01' AND
      owner = 'Joe Bloggs' AND type = 'Student Current';
/* note that it makes no difference whether Joe Bloggs has one or more
   "Student Current" accounts */
(d) List how much money ‘Mary Bear’ has in all her ‘savings’ accounts.
[3]
SELECT SUM(balance) AS totalSavings
FROM Account A, AccountType AT
WHERE type = name AND
      owner = 'Mary Bear' AND category = 'savings';
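A sketch of (c) and (d) on the sample tuples, again assuming SQLite via Python's sqlite3. SQLite has no DATE literal, so dates are stored here as ISO 'YYYY-MM-DD' text, whose string order matches date order; the last query shows why 'dd/mm/yyyy' strings are unsafe with BETWEEN. Transaction is quoted because TRANSACTION is an SQLite keyword. Both sample transactions fall in April, after the range, so (c) returns no rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE AccountType (name TEXT PRIMARY KEY, category TEXT,
                              minBalance REAL, interest REAL);
    CREATE TABLE Account (accNo TEXT PRIMARY KEY, type TEXT, balance REAL, owner TEXT);
    CREATE TABLE "Transaction" (accNo TEXT, date TEXT, time TEXT,
                                transType TEXT, valIn REAL, valOut REAL);
    INSERT INTO AccountType VALUES ('Student Current', 'current', -500, 0.5),
                                   ('Golden Current', 'current', -2000, 1.0),
                                   ('Golden Savings', 'savings', 5000, 3.5);
    INSERT INTO Account VALUES ('C-110-221', 'Student Current', -245, 'Joe Bloggs'),
                               ('S-009-677', 'Golden Savings', 5500, 'Mary Bear');
    INSERT INTO "Transaction" VALUES
        ('C-110-221', '2003-04-12', '12:30', 'cash withdraw', 0, 30),
        ('C-110-221', '2003-04-21', '9:15', 'cheque payment', 200, 0);
""")

result_c = con.execute("""
    SELECT date, time, transType, valIn, valOut
    FROM "Transaction" T, Account A
    WHERE T.accNo = A.accNo
      AND T.date BETWEEN '2003-01-01' AND '2003-04-01'
      AND owner = 'Joe Bloggs' AND type = 'Student Current'
""").fetchall()

result_d = con.execute("""
    SELECT SUM(balance) AS totalSavings
    FROM Account A, AccountType AT
    WHERE type = name AND owner = 'Mary Bear' AND category = 'savings'
""").fetchone()[0]

# With 'dd/mm/yyyy' text, BETWEEN compares characters: a 2002 date slips through.
pitfall = con.execute(
    "SELECT '01/03/2002' BETWEEN '01/01/2003' AND '01/04/2003'"
).fetchone()[0]

print(result_c)  # []
print(result_d)  # 5500.0
print(pitfall)   # 1
```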
(e) List the name, address, occupation and total balance for all the customers whose total
balance is negative — for each customer, their “total balance” means the sum of the balances of
all their accounts.
[3]
SELECT name, address, occupation, SUM(balance) AS totalBalance
FROM Account A, Customer C
WHERE owner = name
GROUP BY name, address, occupation
HAVING SUM(balance) < 0;
‘address’ and ‘occupation’ are semantically redundant in the GROUP BY clause, but they are
syntactically necessary in SQL; however, still award full marks even if the student uses only
‘name’ in the GROUP BY clause (i.e. assume that s/he would be able to “fix” such an omission if
s/he were executing the command).
Also award full marks if the student refers to the alias in the HAVING clause
(HAVING totalBalance < 0); standard SQL does not allow this, but several DBMSs (e.g. MySQL) accept it.
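A sketch of (e) in SQLite via Python's sqlite3. The schema shows no Customer tuples, so the Customer rows below are hypothetical, as is a second account for Joe Bloggs (C-000-001), added so that the total is a genuine sum over several accounts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customer (name TEXT PRIMARY KEY, dOB TEXT, address TEXT, occupation TEXT);
    CREATE TABLE Account (accNo TEXT PRIMARY KEY, type TEXT, balance REAL, owner TEXT);
    -- Hypothetical customers: the schema in the question lists no Customer tuples.
    INSERT INTO Customer VALUES ('Joe Bloggs', '1983-05-01', '1 High Street', 'student'),
                                ('Mary Bear', '1961-07-12', '2 Low Road', 'teacher');
    INSERT INTO Account VALUES ('C-110-221', 'Student Current', -245, 'Joe Bloggs'),
                               ('S-009-677', 'Golden Savings', 5500, 'Mary Bear'),
                               ('C-000-001', 'Golden Current', 100, 'Joe Bloggs');
""")

# Portable form: the aggregate is repeated in HAVING instead of using the alias.
rows = con.execute("""
    SELECT name, address, occupation, SUM(balance) AS totalBalance
    FROM Account A, Customer C
    WHERE owner = name
    GROUP BY name, address, occupation
    HAVING SUM(balance) < 0
""").fetchall()

print(rows)  # [('Joe Bloggs', '1 High Street', 'student', -145.0)]
```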
(f) List the account number, balance, account name/type and the interest on the respective
account, for the account on which ‘Joe Bloggs’ has the highest balance.
[3]
SELECT accNo, balance, type, interest -- 'name' may be used instead of 'type'
FROM Account, AccountType
WHERE type = name AND
      owner = 'Joe Bloggs' AND
      balance IN (SELECT MAX(balance)
                  FROM Account
                  WHERE owner = 'Joe Bloggs');
‘=’ could have been used instead of ‘IN’ to introduce the subquery — i.e., award full marks if that
solution is proposed. The reason for the solution proposed above is its compliance with the
relational model (the result of any select statement should be a relation/set).
(g) List the account category, number of accounts and average balance per category of account
for all the customers whose occupation is ‘student’.
[4]
SELECT category, COUNT(accNo) AS noAccounts, AVG(balance) AS avgBalance
FROM Account, AccountType AT, Customer C
WHERE type = AT.name AND owner = C.name AND
      occupation = 'student'
GROUP BY category;
Note that ‘COUNT(*)’ could have been used instead of ‘COUNT(accNo)’.
TOTAL [19]
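A sketch of (f) and (g) in SQLite via Python's sqlite3, assuming the same hypothetical Customer rows and hypothetical extra account C-000-001 for Joe Bloggs as in the previous sketch. It also confirms that, with a single-valued subquery, IN and = return the same rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customer (name TEXT PRIMARY KEY, dOB TEXT, address TEXT, occupation TEXT);
    CREATE TABLE AccountType (name TEXT PRIMARY KEY, category TEXT,
                              minBalance REAL, interest REAL);
    CREATE TABLE Account (accNo TEXT PRIMARY KEY, type TEXT, balance REAL, owner TEXT);
    INSERT INTO Customer VALUES ('Joe Bloggs', '1983-05-01', '1 High Street', 'student'),
                                ('Mary Bear', '1961-07-12', '2 Low Road', 'teacher');
    INSERT INTO AccountType VALUES ('Student Current', 'current', -500, 0.5),
                                   ('Golden Current', 'current', -2000, 1.0),
                                   ('Golden Savings', 'savings', 5000, 3.5);
    INSERT INTO Account VALUES ('C-110-221', 'Student Current', -245, 'Joe Bloggs'),
                               ('S-009-677', 'Golden Savings', 5500, 'Mary Bear'),
                               ('C-000-001', 'Golden Current', 100, 'Joe Bloggs');
""")

# (f) with both ways of introducing the subquery.
HIGHEST = """
    SELECT accNo, balance, type, interest
    FROM Account, AccountType
    WHERE type = name AND owner = 'Joe Bloggs'
      AND balance {op} (SELECT MAX(balance) FROM Account WHERE owner = 'Joe Bloggs')
"""
result_f_in = con.execute(HIGHEST.format(op="IN")).fetchall()
result_f_eq = con.execute(HIGHEST.format(op="=")).fetchall()

# (g)
result_g = con.execute("""
    SELECT category, COUNT(accNo) AS noAccounts, AVG(balance) AS avgBalance
    FROM Account, AccountType AT, Customer C
    WHERE type = AT.name AND owner = C.name AND occupation = 'student'
    GROUP BY category
""").fetchall()

print(result_f_in)  # [('C-000-001', 100.0, 'Golden Current', 1.0)]
print(result_g)     # [('current', 2, -72.5)]
```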
Question 2
Referring to the above schema, express the following integrity constraints in SQL:
[6]
a) The balance on each individual account (stored in ‘Account’) should not go below the minimum
balance for its type (as stated in ‘AccountType’).
[3]
CREATE ASSERTION BalanceLimit CHECK (
  NOT EXISTS (SELECT * FROM Account, AccountType
              WHERE type = name AND balance < minBalance));
b) The value-out for cash withdrawals (see Transaction) from any “Student Current” account cannot
be greater than 100 per individual transaction (note that this does not prevent the owner from
withdrawing more than 100 in consecutive transactions).
[3]
CREATE ASSERTION WithdrawLimit CHECK (
  NOT EXISTS (SELECT * FROM Transaction T, Account A
              WHERE T.accNo = A.accNo AND transType = 'cash withdraw' AND
                    type = 'Student Current' AND valOut > 100));
TOTAL [6]
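Few DBMSs actually implement CREATE ASSERTION. As a hedged substitute (a different technique from the assertion itself), the sketch below emulates BalanceLimit with SQLite triggers, via Python's sqlite3. It only guards changes to Account: lowering minBalance in AccountType is not checked. WithdrawLimit could be handled similarly with a trigger on Transaction. The trigger names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE AccountType (name TEXT PRIMARY KEY, category TEXT,
                              minBalance REAL, interest REAL);
    CREATE TABLE Account (accNo TEXT PRIMARY KEY, type TEXT, balance REAL, owner TEXT);
    INSERT INTO AccountType VALUES ('Student Current', 'current', -500, 0.5),
                                   ('Golden Savings', 'savings', 5000, 3.5);
    INSERT INTO Account VALUES ('C-110-221', 'Student Current', -245, 'Joe Bloggs'),
                               ('S-009-677', 'Golden Savings', 5500, 'Mary Bear');

    -- Reject any new or changed Account row whose balance is below its type's minimum.
    CREATE TRIGGER balance_limit_insert BEFORE INSERT ON Account
    WHEN NEW.balance < (SELECT minBalance FROM AccountType WHERE name = NEW.type)
    BEGIN
        SELECT RAISE(ABORT, 'BalanceLimit violated');
    END;
    CREATE TRIGGER balance_limit_update BEFORE UPDATE OF balance, type ON Account
    WHEN NEW.balance < (SELECT minBalance FROM AccountType WHERE name = NEW.type)
    BEGIN
        SELECT RAISE(ABORT, 'BalanceLimit violated');
    END;
""")


def set_balance(acc_no, new_balance):
    """Return True if the update was accepted, False if a trigger rejected it."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute("UPDATE Account SET balance = ? WHERE accNo = ?",
                        (new_balance, acc_no))
        return True
    except sqlite3.DatabaseError:
        return False


print(set_balance("C-110-221", -600))  # False: below -500 for Student Current
print(set_balance("C-110-221", -400))  # True
print(con.execute("SELECT balance FROM Account WHERE accNo = 'C-110-221'").fetchone())  # (-400.0,)
```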
PROBLEM 5
[25]
Question 1
[10]
a) Consider the relation “Absences (student, adate)”, with the primary key “(student, adate)”, which
records the dates when students are absent from university. For illustration, a small extension is
given in Figure 1 below:
Figure 1: Absences
student    adate
S. Allen   12/01/2003
P. Clark   03/02/2003
S. Allen   05/03/2003
M. Lewis   05/03/2003

Figure 2:
CREATE VIEW AbsCount AS
SELECT student, COUNT(adate) AS noAbsences
FROM Absences
GROUP BY student;
Consider the view “AbsCount”, as defined in Figure 2. This represents the number of absences
per student. Lastly, consider the following update operation attempted on AbsCount:
UPDATE AbsCount SET noAbsences = 3 WHERE student = 'S. Allen';
State whether this update operation could be performed by a relational DBMS and explain your
answer. Draw a general rule regarding views from the above example. Although the syntax is that
of SQL, you should consider the problem independently of any specific database language
and/or DBMS.
[6]
b) State two restrictions imposed by SQL2 on update operations to views.
[4]
Answer
a)
This operation cannot be performed by any relational DBMS.
[1]
Explanation
Any update on a view has to be propagated to the base relations on which the view is defined.
The view ‘AbsCount’ aggregates tuples from ‘Absences’. The proposed update would require the
insertion or deletion of tuples in ‘Absences’ (only, of course, when the new value in the update
statement differs from the old value). To insert a tuple into ‘Absences’, both ‘student’ and
‘adate’ must be provided; ‘adate’ is neither given in the statement, nor can it be generated
automatically for the tuples to be inserted/deleted.
[4]
If the student’s explanation is coherent and illustrates an understanding of the problem but is
not as comprehensive as the one above, still award full marks.
Rule
Any coherent rule that illustrates an understanding of the problem — even if it is incorrect —
should be awarded full marks (i.e. 1). For example:
“Updates are not possible through views that are defined via some aggregate functions” is
incorrect but should be given 1 mark.
[1]
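The example can be reproduced in SQLite via Python's sqlite3 (dates converted to ISO text). SQLite is stricter than the general rule discussed above: without an INSTEAD OF trigger it rejects every update on a view, aggregated or not, so the sketch shows the rejection but not the reasoning behind it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Absences (student TEXT, adate TEXT, PRIMARY KEY (student, adate));
    INSERT INTO Absences VALUES ('S. Allen', '2003-01-12'), ('P. Clark', '2003-02-03'),
                                ('S. Allen', '2003-03-05'), ('M. Lewis', '2003-03-05');
    CREATE VIEW AbsCount AS
        SELECT student, COUNT(adate) AS noAbsences FROM Absences GROUP BY student;
""")

allen = con.execute("SELECT noAbsences FROM AbsCount WHERE student = 'S. Allen'").fetchone()
print(allen)  # (2,)

try:
    con.execute("UPDATE AbsCount SET noAbsences = 3 WHERE student = 'S. Allen'")
    rejected = False
except sqlite3.DatabaseError as exc:  # SQLite reports that a view cannot be modified
    rejected = True
    print(exc)
print(rejected)  # True
```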
b) In SQL2 updates on the following views are not possible:
- views defined on two or more tables;
- views defined via UNION, INTERSECT, EXCEPT;
- the SELECT statement contains the word DISTINCT;
- column specifications that contain elements different from a simple reference to columns of
underlying base tables
- ... etc. (see p. 83 of Study Guide, Vol. 1)
Award 2 marks per correct statement.
[4]
TOTAL [10]
Question 2
[8]
a) What is a transaction? Give a simple example.
[4]
b) State and succinctly explain the two mechanisms customarily provided by the transaction
manager (of a DBMS) for the implementation of transactions
[4]
Answer
a)
A transaction is a sequence of database operations that represents a logical unit of work.
[2]
An example could be given in the context of a database that stores some redundant data (e.g.,
each loan in a library database is stored explicitly, but the total number of loans is also stored
explicitly for each borrower); a transaction is required when such data is updated, so that the loan and the borrower’s total change together or not at all.
[2]
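The library example can be sketched with SQLite via Python's sqlite3. The Borrower and Loan tables, the noLoans counter and its CHECK limit of 3 loans are all hypothetical. Opening the connection with isolation_level=None stops the module from starting transactions implicitly, so BEGIN, COMMIT and ROLLBACK below are exactly the statements the DBMS receives.

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.executescript("""
    CREATE TABLE Borrower (name TEXT PRIMARY KEY,
                           noLoans INTEGER NOT NULL CHECK (noLoans <= 3));
    CREATE TABLE Loan (borrower TEXT, book TEXT);
    INSERT INTO Borrower VALUES ('A. Reader', 3), ('B. Reader', 0);
""")


def lend(borrower, book):
    """Record a loan and update the redundant loan total as one unit of work."""
    con.execute("BEGIN")
    try:
        con.execute("INSERT INTO Loan VALUES (?, ?)", (borrower, book))
        con.execute("UPDATE Borrower SET noLoans = noLoans + 1 WHERE name = ?",
                    (borrower,))
    except sqlite3.DatabaseError:
        con.execute("ROLLBACK")  # undo the INSERT too: all or nothing
        return False
    con.execute("COMMIT")
    return True


first = lend("A. Reader", "Database Systems")   # CHECK fails on the UPDATE
second = lend("B. Reader", "Database Systems")
print(first, second)                                       # False True
print(con.execute("SELECT borrower FROM Loan").fetchall())  # [('B. Reader',)]
```

Without the transaction, the failed call would have left a Loan row whose borrower total was never incremented.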
b)
COMMIT - is issued at the end of a transaction (i.e. after all the operations of the transaction have
been successfully executed); once the DBMS has received a COMMIT, all the updates of the
respective transaction are guaranteed to be made permanent.
[2]
ROLLBACK - is issued during a transaction if an error occurred in the execution of one of its
operations; once the DBMS has received a ROLLBACK, all the operations already performed by the
respective transaction are guaranteed to be undone.
[2]
Also accept ‘the locking mechanism’ and the system’s ‘log’. Award 2 marks for a correct description of
each, but in this case the total should not exceed 3 marks.
TOTAL [8]
Question 3
[7]
a) Explain what is meant by impedance mismatch in the context of relational database systems.
[5]
b) Consider two real-life systems, A and B. Each requires the support of a database system.
System A consists of very many types of data objects (or entities), but each type (or entity) has
only a few instances. System B consists of a moderate number of types of data objects (or
entities), but each type (or entity) has very many instances. Disregarding any other constraints,
for which system would you propose the use of a relational DBMS?
[2]
Answer
a)
In applications based on relational databases, data has to be translated between the way it is
stored/represented in the database (the database’s data types) and the way it is represented
in the application programs (the data types of the programming language). Usually, the data
types used by a relational database do not coincide with those used by a programming
language. This is called impedance mismatch, and careless translation between the two may corrupt data.
[5]
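A small illustration of the translation step, assuming SQLite via Python's sqlite3 (the Absences table is borrowed from Problem 5 for convenience). SQLite has no native date type, so a program-side `datetime.date` must be converted to text on the way in and parsed back on the way out.

```python
import sqlite3
from datetime import date

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Absences (student TEXT, adate TEXT)")

# Program side: a datetime.date object. Database side: SQL text.
absent_on = date(2003, 1, 12)
con.execute("INSERT INTO Absences VALUES (?, ?)", ("S. Allen", absent_on.isoformat()))

raw = con.execute("SELECT adate FROM Absences WHERE student = 'S. Allen'").fetchone()[0]
back = date.fromisoformat(raw)  # translate database text back into a program date

print(type(raw).__name__, raw)  # str 2003-01-12
print(back == absent_on)        # True
```

If one program wrote such dates as 'dd/mm/yyyy' and another read them as 'mm/dd/yyyy', 12/01/2003 would silently become 1 December: the kind of data corruption the answer above refers to.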
b)
System B.
[2]
TOTAL [7]