MIS 3500 Database Management Systems * Asper School of Business
Instructor: Bob Travica
Lab on Multiplicity and Normalization
(Updated January 2014)
This lab demonstrates the concepts of multiplicity, normalization, referential
integrity, and data anomalies associated with the theory of relational databases. You
will use MS Access to explore these concepts.
You will create some tables, alter them, and test them. The goal is to study the
principles of relational database systems rather than to focus on the procedures for
developing tables. A basic knowledge of MS Access is assumed.
1. Multiplicity
To get started, find the MS Access icon on the desktop and click it. Then, create a
new database.
Understanding normalization starts with understanding multiplicity. Discussed below
are the three types of multiplicity we studied in class, which determine three types of
relationships.
1.1 One-to-Many Multiplicity (1:M)
This is the most frequent relationship in database systems. To establish a 1:M
relationship, you need to import the key from the table on the “one side” into the
table on the “many side”. These are sometimes called the “left table” and the “right table”,
although this labeling reflects the direction in which the key is exported rather than
the physical position of the tables. (See Figure 1.)
When tables have single-attribute keys, the direction in which you draw the relationship
determines which table is the “one side” or “left table”. So a table can actually sit on the
right side of the screen, but if you start drawing the relationship from its key to an attribute
in the other table, it will be considered the “left table” or “one side table”, while the
other table will be the “right table” or “many side table”.
Figure 1. Creating Relationship between Tables
[Diagram: Table Y (YID, Attribute), labeled Source Table / One Side Table / “Left Table”, multiplicity 1, is linked to Table X (XID, Attribute, YID), labeled Destination Table / Many Side Table / “Right Table”, multiplicity *.]
The key in the One Side table (YID) is the foreign key in the Many Side table.
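In SQL terms, the “left”/“right” labeling comes down to which table declares the foreign key. The sketch below uses Python's built-in sqlite3 module as a stand-in for MS Access (an assumption: the lab itself works only in Access). It declares Table Y and Table X from Figure 1 and asks SQLite which table X's foreign key points to.

```python
import sqlite3

# Sketch: SQLite (Python stdlib) standing in for MS Access; tables mirror Figure 1.
con = sqlite3.connect(":memory:")
# Table Y: the "one side" (source, "left table") whose key YID is exported.
con.execute("CREATE TABLE Y (YID INTEGER PRIMARY KEY, Attribute TEXT)")
# Table X: the "many side" (destination, "right table") that imports YID as a foreign key.
con.execute(
    "CREATE TABLE X (XID INTEGER PRIMARY KEY, Attribute TEXT, "
    "YID INTEGER REFERENCES Y(YID))"
)

# PRAGMA foreign_key_list returns one row per foreign-key column:
# (id, seq, parent_table, child_column, parent_column, on_update, on_delete, match)
fk = con.execute("PRAGMA foreign_key_list(X)").fetchone()
parent_table, child_col, parent_col = fk[2], fk[3], fk[4]
print(f"X.{child_col} -> {parent_table}.{parent_col}")  # X.YID -> Y.YID

# Table Y declares no foreign key, so it is not on the "many" side of any link.
print(con.execute("PRAGMA foreign_key_list(Y)").fetchall())  # []
```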
To study the one-to-many relationship, run the set of procedures below for creating
the tables Customer and Order. Let us assume that the relationship between Customer
and Order is based on the business rule that each customer can place many orders,
while each order must be associated with exactly one customer. Therefore, this is a
one-to-many relationship.
Be sure to choose Number as the data type for the keys. If you use the default data
type, AutoNumber, you may run into trouble when you want to modify tables
and enter values for a foreign key.
Run this set of procedures:
1. To use Design View, click Create and then Table Design. Then, create a table
that you will name Customer.
Specifically, create the CustomerID field, choose Number for the data type, and
make this attribute the Primary Key (right-click the row and choose Primary Key).
Then, create a column called Name, and set Text as the data type. See
Figure 2.
2. Create a Table you will name Order. Create a field OrderID, set the data type
to Number, and make this column the Primary Key. Also, create columns
CustomerID (Number) and OrderDate (Date/Time).
3. Establish a relationship between CustomerID in Customer and CustomerID in
Order. To do this, first you have to close the tables you have just created.
Then, use Database Tools/Relationships, and bring the tables on the screen, if
they are not there yet (e.g., right-clicking in the relationship window will pop
up a menu with the function Show Table).
Draw the relationship by pressing the left mouse button and dragging the
cursor from CustomerID in Customer to CustomerID in Order.
4. In the Edit Relationships form that pops up, check the Enforce Referential
Integrity checkbox (you can leave the options for cascading updates and deletions
unchecked, or you can check either of them). Look at the Relationship Type text box
in this form, and notice that Access automatically names this relationship One-To-Many.
Click the Create button. If all went well, you should see in the Relationships
window a link between your tables, with multiplicity 1 and many (the infinity
symbol) displayed.
5. Once both tables are finalized, enter some data in each (see Figure 2). For
dates, use the Calendar help; for the year, use last year rather than the year
shown in the example.
Figure 2. One-To-Many Relationship

Customer
CustomerID  Name
100         Piere
200         John
300         Micaela
400         Rose

Order
OrderID  CustomerID  OrderDate
1        100         Jan. 3, 2005
2        100         Jan. 13, 2005
3        200         Feb. 5, 2005
4        200         May 15, 2005

Note: The key columns are CustomerID in Customer and OrderID in Order.
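For comparison with the Access steps above, here is a minimal sketch of the same two tables in SQLite through Python's sqlite3 module (an assumption: SQLite is a stand-in, not the lab's tool). Two SQLite specifics are worth noting: `PRAGMA foreign_keys = ON` plays the role of the Enforce Referential Integrity checkbox, and `Order` must be quoted because it is a reserved SQL word. The `NOT NULL` on `CustomerID` is an added assumption reflecting the rule that each order must have a customer.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; data are those of Figure 2.
con = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this per-connection pragma is on.
con.execute("PRAGMA foreign_keys = ON")

con.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
# "Order" is a reserved word in SQL, so the table name is quoted.
con.execute(
    'CREATE TABLE "Order" ('
    " OrderID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),"  # the imported key
    " OrderDate TEXT)"  # SQLite has no DATE type; ISO-8601 text is used instead
)

con.executemany("INSERT INTO Customer VALUES (?, ?)",
                [(100, "Piere"), (200, "John"), (300, "Micaela"), (400, "Rose")])
con.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                [(1, 100, "2005-01-03"), (2, 100, "2005-01-13"),
                 (3, 200, "2005-02-05"), (4, 200, "2005-05-15")])
con.commit()
```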
Analysis:
1) Notice how the same values of CustomerID repeat in Order, while OrderID takes
on different values in each row. This is because each (one) instance of Customer can
be associated with many instances of Order (e.g., CustomerID 100 is associated with
OrderIDs 1 and 2). In contrast, each value of OrderID corresponds to only one value
of CustomerID (OrderID 1 is associated with CustomerID 100 only, OrderID 2 with
CustomerID 100 only, OrderID 3 with CustomerID 200 only, etc.).
2) As depicted in Figure 2, the Order table contains only those values of CustomerID that
exist in the table Customer. This means that referential integrity is supported. In
other words, only those values of CustomerID that already exist in the Customer
table can be used in the Order table. Without referential integrity, the Order table
could contain customer identifiers that do not exist in any customer record.
Try to see what happens when you violate referential integrity. For example, try to
enter the number 500 in the CustomerID column of the Order table. What happens?
Lastly, notice that the customers with CustomerIDs 300 and 400 are not associated
with any order yet.
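The same experiment can be sketched in SQLite via Python's sqlite3 (an assumed stand-in for Access). The rejected row's OrderID 5 and its date are hypothetical; the CustomerID 500 comes from the exercise. A LEFT JOIN then lists the customers who have no orders yet.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; OrderID 5 and its date are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # counterpart of "Enforce Referential Integrity"
con.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
con.execute('CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY, '
            'CustomerID INTEGER REFERENCES Customer(CustomerID), OrderDate TEXT)')
con.executemany("INSERT INTO Customer VALUES (?, ?)",
                [(100, "Piere"), (200, "John"), (300, "Micaela"), (400, "Rose")])
con.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                [(1, 100, "2005-01-03"), (2, 100, "2005-01-13"),
                 (3, 200, "2005-02-05"), (4, 200, "2005-05-15")])

rejected = False
try:
    # Customer 500 does not exist, so the foreign key check fails.
    con.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (5, 500, "2005-06-01"))
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)  # Rejected: FOREIGN KEY constraint failed

# Customers that are not associated with any order yet.
no_orders = [row[0] for row in con.execute(
    'SELECT c.CustomerID FROM Customer c LEFT JOIN "Order" o '
    "ON o.CustomerID = c.CustomerID WHERE o.OrderID IS NULL ORDER BY c.CustomerID")]
print(no_orders)  # [300, 400]
```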
1.2 One-to-One Multiplicity (1:1)
In contrast to the frequently used one-to-many multiplicity, one-to-one multiplicity is
used much less. The one-to-one multiplicity is created by sharing the same key
between two tables.
An example of 1:1 multiplicity is the relationship between the Customer table and the
BillingAddress table shown in Figure 3. The logic behind separating the billing (second)
address from the customer record is that not all customers will have a second address,
so adding its column to Customer would waste storage.
Concentrate on Figure 3. You only have to create a new table called BillingAddress
with the columns CustomerID (Number, key) and BillingAddress (Text).
Modify the Customer table by adding the column Address (Text). To use the design
modification function, open a table, and then click View/Design View. Note: You can
simplify the address values; those shown are just for fun. ;-)
Establish a relationship between the Customer and BillingAddress tables, and enforce
referential integrity. In drawing the relationship, note again that it does matter which
is the “left” and which the “right” table, even though both are on the “one” side in
terms of multiplicity.
Figure 3. One-to-One Relationship

Customer
CustomerID  Name            Address
100         Mick Jagger     57 Rolling Stone Sq., London, England
200         Keith Richards  33 RR Pkwy, London, England
300         Charlie Watts   75 Roll’n’Rock Dr., England

BillingAddress
CustomerID  BillingAddress
100         Mike’s Yacht Club, Cote d’Azur
300         Castillo Hermoso, Costa Brava, Spain
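Why does sharing the key produce 1:1? Because the shared column is the primary key of BillingAddress as well as its foreign key, a second billing address for the same customer is impossible. A minimal sketch in SQLite via Python's sqlite3 (an assumed stand-in for Access) with the Figure 3 data; the “Somewhere else” address is hypothetical.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; data are those of Figure 3.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
# The same column is both primary key and foreign key: at most one
# BillingAddress row can exist per customer, which makes the link 1:1.
con.execute("CREATE TABLE BillingAddress ("
            " CustomerID INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),"
            " BillingAddress TEXT)")
con.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    (100, "Mick Jagger", "57 Rolling Stone Sq., London, England"),
    (200, "Keith Richards", "33 RR Pkwy, London, England"),
    (300, "Charlie Watts", "75 Roll'n'Rock Dr., England")])
con.executemany("INSERT INTO BillingAddress VALUES (?, ?)", [
    (100, "Mike's Yacht Club, Cote d'Azur"),
    (300, "Castillo Hermoso, Costa Brava, Spain")])

try:
    # A second billing address for customer 100 (hypothetical) would make the link 1:M.
    con.execute("INSERT INTO BillingAddress VALUES (?, ?)", (100, "Somewhere else"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```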
1.3 Many-to-Many Multiplicity (M:N)
Many-to-many relationships occur in class diagrams to reflect frequent business
situations (e.g., an item can appear on many orders, and an order can contain many
items). To be implemented in a relational database system, M:N relationships must
be transformed into 1:M relationships through the technique of data normalization.
(More on this in the section on normalization below.)
Let us try to implement a design that is not normalized first. Note that this is just for
the purpose of study—not the design you should ever implement in reality.
To create an M:N relationship using the technique described in Section 1.1, you need
to make each table’s key a foreign key in the other table. Therefore, there will be two
foreign keys linking the two tables in an M:N relationship.
Run the following set of procedures in order to study the M:N relationship. At the
end, you should have new tables Order1 and Item created and connected. (See
Figure 4)
Figure 4. Many-to-Many Relationship
(Note: The tables are not normalized; we use them just for study purposes, not in a
properly designed database system.)

Order1
Order1ID  ItemID  CustomerID  Date
1         10      100         1/3/2005
2         10      200         1/3/2005
Note: Business rule (first half): Each item appears on many orders…

Item
ItemID  Order1ID  ItemName
10      1         Nut
20      1         Bolt
Note: Business rule (second half): … and each order can contain many items.
Notice above that item 10 appears on orders 1 and 2 (table Order1), while order 1
contains items 10 and 20 (table Item).
To create these tables:
1. Make a copy of the Order table, and name it Order1. A quick method is to
open the Order table, click Save As/Save Object As, and type Order1.
Open Order1, and delete all the data from it. Rename OrderID to Order1ID.
This procedure speeds up your work while making sure that you can
manipulate data as you wish. For example, if you want to add another field to
the key of Order1 and so make a combined (concatenated) key, but you did not
delete the old data from Order1, Access will report the error that the key field
cannot be null (see steps 2 and 3).
2. In the table Order1, add a new column ItemID (Number). Hint: in Design
View, right-click the column CustomerID; in the popup menu, click Insert
Column. You want to name the newly created column ItemID.
3. Set Order1ID and ItemID to be a concatenated key. Hint: Hold down the Control key,
and click the left-most box next to each field’s name; in the popup menu,
select Primary Key (or click the Primary Key button in the upper Table Design
menu; look for the key icon among the buttons displayed).
4. Create the table Item with columns ItemID (Number), Order1ID (Number),
and ItemName (Text). Make ItemID and Order1ID a combined key.
5. Set an M:N relationship between the tables Order1 and Item by drawing two
relationships. The first relationship is between the column Order1ID in the
table Order1 and the column Order1ID in the table Item. What happens with
referential integrity? You can see that it is not possible to enforce it, since
the DBMS reports an error. In addition, the relationship cannot be designed as
1:M but only as “indeterminate”. This is what happens when you try to force an
M:N relationship on the system: it will not be accepted. You can just “fudge
it” (simulate it), and the system will give you no guarantee of data integrity.
6. Set the other relationship between the column ItemID in the table Item and
the column ItemID in the table Order1. You will experience the same problems
as in step 5. The database engine will ask you if you want to edit the existing
relationship, and you should answer ‘No’. This will create a new relationship
between Order1 and a new table that the system creates and names Item_1.
This is how the database engine responds to your essentially illegal request.
7. Enter some data in the tables (perhaps it is best for now to use the example
in Figure 4). To keep it simple, let us assume that ItemID takes values of
two-digit numbers 10, 20, etc.
Analysis: The design you created represents the business rule that each item can be
associated with many orders, while each order can contain many items. Thus, the
many-to-many relationship between Order1 and Item appears to be implemented.
However, Access does not actually support this relationship, which is why you
get the strange “indeterminate” relationships and a rejection of your request to
enforce referential integrity. Indeed, this design is not normalized, and therefore it is
inappropriate for a relational database system. For the things that can go wrong with
this design, see the next section.
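SQLite refuses the same design in its own way, which may help explain Access's behavior. A foreign key must point at a column that is a key (or declared unique) in the parent table, and in Figure 4 Order1ID is only half of Order1's concatenated key. The sketch below, using Python's sqlite3 as an assumed stand-in for Access, declares one of the two links. SQLite only reports the problem, as a “foreign key mismatch” error, once you try to modify the child table.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; tables and data follow Figure 4.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce referential integrity

# Order1: concatenated key (Order1ID, ItemID), so Order1ID alone is not a key.
con.execute("CREATE TABLE Order1 (Order1ID INTEGER NOT NULL, ItemID INTEGER NOT NULL, "
            "CustomerID INTEGER, Date TEXT, PRIMARY KEY (Order1ID, ItemID))")
con.executemany("INSERT INTO Order1 VALUES (?, ?, ?, ?)",
                [(1, 10, 100, "2005-01-03"), (2, 10, 200, "2005-01-03")])

# Item: concatenated key (ItemID, Order1ID), with Order1ID declared as a
# foreign key to Order1, i.e. one of the two links of the M:N attempt.
con.execute("CREATE TABLE Item (ItemID INTEGER NOT NULL, "
            "Order1ID INTEGER NOT NULL REFERENCES Order1(Order1ID), "
            "ItemName TEXT, PRIMARY KEY (ItemID, Order1ID))")

try:
    # Order 1 exists, but Order1ID may repeat in Order1 (once per item),
    # so SQLite cannot use it as a parent key and rejects the statement.
    con.execute("INSERT INTO Item VALUES (?, ?, ?)", (10, 1, "Nut"))
    outcome = "accepted"
except sqlite3.Error as err:
    outcome = f"refused: {err}"
print(outcome)
```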
2. Normalization
Data in a relational database system must be normalized. The purpose of
normalizing data is to preserve data quality (accuracy, integrity) and to avoid
problems (“anomalies”) with data insertion, modification and deletion.
2.1 Problems with Non-Normalized Data
Loss of Referential Integrity. You have already encountered the issue of normalization
when exploring referential integrity in this lab. Normalized data support referential
integrity, while non-normalized data do not.
Consider again the one-to-many relationship between the tables Customer and Order
in Figure 2. When you tried to enter a value of CustomerID in the table Order
that did not already exist in the table Customer, the database engine stopped you.
Otherwise, a user could enter orders for non-existent customers. In contrast, the
non-normalized design in Figure 4 allows you to enter foreign keys (FKs) that do not
match primary keys (PKs). As an exercise, try to enter Order1ID 2000 in the table Item.
What happens? Indeed, you can enter any data in the columns for primary and foreign
keys, and the system will not detect errors.
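The exercise can be reproduced in a sketch with Python's sqlite3 (an assumed stand-in for Access). With no enforceable relationship between Order1 and Item, nothing stops an Item row from pointing at an order that does not exist. The orphan row's ItemID 10 and name “Nut” are hypothetical choices; Order1ID 2000 comes from the exercise. A subquery then finds the orphan.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; the (10, 2000, "Nut") row is hypothetical.
con = sqlite3.connect(":memory:")
# No foreign keys declared: nothing ties Item.Order1ID to Order1, much like the
# "indeterminate" relationships that could not be enforced in Access.
con.execute("CREATE TABLE Order1 (Order1ID INTEGER NOT NULL, ItemID INTEGER NOT NULL, "
            "CustomerID INTEGER, Date TEXT, PRIMARY KEY (Order1ID, ItemID))")
con.execute("CREATE TABLE Item (ItemID INTEGER NOT NULL, Order1ID INTEGER NOT NULL, "
            "ItemName TEXT, PRIMARY KEY (ItemID, Order1ID))")
con.executemany("INSERT INTO Order1 VALUES (?, ?, ?, ?)",
                [(1, 10, 100, "2005-01-03"), (2, 10, 200, "2005-01-03")])
con.executemany("INSERT INTO Item VALUES (?, ?, ?)", [(10, 1, "Nut"), (20, 1, "Bolt")])

# Order 2000 does not exist, yet the row is stored without complaint.
con.execute("INSERT INTO Item VALUES (?, ?, ?)", (10, 2000, "Nut"))

orphans = con.execute("SELECT ItemID, Order1ID FROM Item "
                      "WHERE Order1ID NOT IN (SELECT Order1ID FROM Order1)").fetchall()
print(orphans)  # [(10, 2000)]
```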
To explore other disadvantages of non-normalized data (and advantages of
normalized data), let us create a table CustomerLong that mixes master data with
transactional data. This table will therefore contain a repeated group (customer data
repeat for each order; note that a real customer record would be much longer). As
shown in Figure 5, CustomerLong merges the tables Customer and Order from Figure 2.
To quickly create the table CustomerLong do the following.
1. Make a copy of the table Customer and name it CustomerLong.
2. Open CustomerLong in Design View and remove the key property from the column
CustomerID. This has to be done because some rows will have the same value in
CustomerID, which the system does not tolerate in a key column. So, you have to
“fool the system” for the sake of this exercise by having a table with no key column
for a while.
3. Create columns OrderID (Number) and OrderDate (Date/Time).
4. Enter additional data for Piere and John as shown in Figure 5.
5. Make columns CustomerID and OrderID the primary key.
Figure 5. Non-normalized Table with Repeated Group Customer

CustomerLong
CustomerID  Name   OrderID  OrderDate
100         Piere  1        Jan. 3, 2005
100         Piere  2        Jan. 3, 2005
200         John   3        Feb. 5, 2005
200         John   4        May 15, 2005
Now, let us see what happens if we try to perform standard database operations of
deleting and inserting data.
Deletion Anomaly: If you delete the records of orders #1 and #2 in CustomerLong (say,
customer #100 cancels the orders), you will lose the data on this customer as
well. Therefore, there is a deletion anomaly: an undesired loss of data when other
data are deleted.
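A short sketch of the deletion anomaly in SQLite via Python's sqlite3 (an assumed stand-in for Access), using the Figure 5 rows. Removing customer 100's two orders also removes the only place where the customer's name was stored.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; data are those of Figure 5.
con = sqlite3.connect(":memory:")
# CustomerLong: customer data repeat for every order (a repeated group).
con.execute("CREATE TABLE CustomerLong (CustomerID INTEGER NOT NULL, Name TEXT, "
            "OrderID INTEGER NOT NULL, OrderDate TEXT, PRIMARY KEY (CustomerID, OrderID))")
con.executemany("INSERT INTO CustomerLong VALUES (?, ?, ?, ?)", [
    (100, "Piere", 1, "2005-01-03"), (100, "Piere", 2, "2005-01-03"),
    (200, "John", 3, "2005-02-05"), (200, "John", 4, "2005-05-15")])

# Customer 100 cancels orders 1 and 2 ...
con.execute("DELETE FROM CustomerLong WHERE OrderID IN (1, 2)")

# ... and every trace of customer Piere disappears with them.
remaining = [row[0] for row in con.execute(
    "SELECT DISTINCT Name FROM CustomerLong ORDER BY Name")]
print(remaining)  # ['John']
```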
Insertion Anomaly: To enter a customer record for a new customer such as Micaela, you
would also need to enter an OrderID. (It is assumed that this situation occurs when the
minimum multiplicity for Order is zero, i.e., a customer may exist without any orders.)
This is so because the table uses a combined key (CustomerID+OrderID), and the DBMS
does not allow key columns to be blank (to have the null value). Your choices are (a) to
wait until Micaela and Rose place orders and then enter their customer data or (b) to
make up order data (see the made-up number 9999999 in Figure 6). Therefore, there is
an insertion anomaly: the desired data cannot be entered when needed, or the accuracy
of the data must be violated. The business consequences of this design, if
implemented, are extremely serious (falsification of facts).
Figure 6. Insertion Anomaly

CustomerLong
CustomerID  Name     OrderID  Date
100         Piere    1        Jan. 3, 2005
100         Piere    2        Jan. 3, 2005
200         John     3        Feb. 5, 2005
200         John     4        May 15, 2005
300         Micaela  9999999
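The insertion anomaly can be sketched in SQLite via Python's sqlite3 (an assumed stand-in for Access). One SQLite-specific caveat: SQLite does not automatically forbid NULL in a composite PRIMARY KEY column, so `NOT NULL` is declared explicitly here to match the behavior the lab describes for Access.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; data are those of Figure 6.
con = sqlite3.connect(":memory:")
# NOT NULL is spelled out on the key columns: SQLite would otherwise accept NULL
# in a composite PRIMARY KEY column (a documented SQLite quirk).
con.execute("CREATE TABLE CustomerLong (CustomerID INTEGER NOT NULL, Name TEXT, "
            "OrderID INTEGER NOT NULL, Date TEXT, PRIMARY KEY (CustomerID, OrderID))")
con.executemany("INSERT INTO CustomerLong VALUES (?, ?, ?, ?)", [
    (100, "Piere", 1, "2005-01-03"), (100, "Piere", 2, "2005-01-03"),
    (200, "John", 3, "2005-02-05"), (200, "John", 4, "2005-05-15")])

try:
    # Micaela has not placed an order yet, so there is no OrderID to supply.
    con.execute("INSERT INTO CustomerLong VALUES (?, ?, ?, ?)", (300, "Micaela", None, None))
    honest_insert = "accepted"
except sqlite3.IntegrityError as err:
    honest_insert = f"refused: {err}"
print(honest_insert)  # refused: NOT NULL constraint failed: CustomerLong.OrderID

# The only way in is a made-up key value, as in Figure 6.
con.execute("INSERT INTO CustomerLong VALUES (?, ?, ?, ?)", (300, "Micaela", 9999999, None))
print(con.execute("SELECT COUNT(*) FROM CustomerLong").fetchone()[0])  # 5
```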
Modification Anomaly: Normalized tables support the principle of referential integrity,
which ensures that values of the FK are kept dependent on the values of the PK.
Consider the tables Customer, Order, and BillingAddress. Open the table Customer,
and change the CustomerID from 100 to 110. Try to save the table.
Two outcomes may result from your attempt to modify data. If you
established the relationships between Customer and the related tables without
enabling cascading update (modification) and deletion, the database engine will
inform you that the change cannot be saved because CustomerID exists in those
other tables. If cascading update and deletion are enabled, your new value of
CustomerID will appear in related rows of tables Order and BillingAddress.
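Both outcomes can be sketched in SQLite via Python's sqlite3 (an assumed stand-in for Access). The hypothetical helper `build` creates Customer and Order with or without `ON UPDATE CASCADE`, which is SQLite's counterpart of the cascade-update option. BillingAddress is left out to keep the sketch short.

```python
import sqlite3


def build(cascade: bool) -> sqlite3.Connection:
    """Hypothetical helper: Customer/Order (Figure 2 subset), optionally cascading."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    on_update = "ON UPDATE CASCADE" if cascade else ""
    con.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
    con.execute('CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY, '
                f"CustomerID INTEGER REFERENCES Customer(CustomerID) {on_update}, "
                "OrderDate TEXT)")
    con.executemany("INSERT INTO Customer VALUES (?, ?)", [(100, "Piere"), (200, "John")])
    con.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                    [(1, 100, "2005-01-03"), (2, 100, "2005-01-13"), (3, 200, "2005-02-05")])
    return con


# Without cascading update, the change is refused: orders still refer to 100.
strict = build(cascade=False)
try:
    strict.execute("UPDATE Customer SET CustomerID = 110 WHERE CustomerID = 100")
    strict_result = "saved"
except sqlite3.IntegrityError:
    strict_result = "refused"

# With cascading update, the new value flows into the related Order rows.
cascading = build(cascade=True)
cascading.execute("UPDATE Customer SET CustomerID = 110 WHERE CustomerID = 100")
moved = [r[0] for r in cascading.execute(
    'SELECT OrderID FROM "Order" WHERE CustomerID = 110 ORDER BY OrderID')]
print(strict_result, moved)  # refused [1, 2]
```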
Now, look back at your table CustomerLong. Since the table is not normalized
(broken down into two inter-linked tables), none of the dependencies and controls
described above have been established. Therefore, the user can modify key values
arbitrarily, and the system will tolerate this. To prevent this from happening, the table
CustomerLong must be normalized, that is, brought into third normal form (3NF), as
explained below.
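By contrast, a sketch of the same change applied to CustomerLong, in SQLite via Python's sqlite3 (an assumed stand-in for Access). Changing CustomerID 100 to 110 in only one of Piere's rows is a hypothetical illustration. Nothing links the two rows, so the update is accepted and Piere ends up with two different identifiers.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; data are those of Figure 5.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CustomerLong (CustomerID INTEGER NOT NULL, Name TEXT, "
            "OrderID INTEGER NOT NULL, OrderDate TEXT, PRIMARY KEY (CustomerID, OrderID))")
con.executemany("INSERT INTO CustomerLong VALUES (?, ?, ?, ?)", [
    (100, "Piere", 1, "2005-01-03"), (100, "Piere", 2, "2005-01-03"),
    (200, "John", 3, "2005-02-05"), (200, "John", 4, "2005-05-15")])

# Nothing ties the two "Piere" rows together, so changing the key in one is accepted.
con.execute("UPDATE CustomerLong SET CustomerID = 110 WHERE OrderID = 1")

ids_for_piere = [r[0] for r in con.execute(
    "SELECT DISTINCT CustomerID FROM CustomerLong WHERE Name = 'Piere' ORDER BY CustomerID")]
print(ids_for_piere)  # [100, 110]
```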
2.2 Normalizing Many-to-Many Relationships
A relational database system is most comfortable with the 1:M relationship. In this
form, data integrity can be preserved and querying properly performed. This means
that a M:N relationship must be transformed into two 1:M relationships.
The method of normalizing this M:N relationship is to insert a new table that will
“bridge” or “link” the tables Order and Item. You can think of this new table as the one
that “absorbs” or “reduces” the multiplicity between the tables Order and Item.
Both Order and Item will have a separate 1:M relationship with this bridge table
(OrderItem1 in Figure 7).
To see how this works, run the following set of procedures. Your result should
resemble what is depicted in Figure 7.
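Figure 7. Normalized M:N Relationship
[Diagram: Order (OrderID, CustomerID, Date), multiplicity 1, is linked to OrderItem1 (OrderID, Item1ID, Quantity), multiplicity *; Item1 (Item1ID, Item1Name), multiplicity 1, is linked to OrderItem1, multiplicity *.]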
To complete this exercise, do the following.
1. Use the old Order table.
2. Create a new table Item1 with fields Item1ID (Number, PK) and Item1Name
(Text).
3. Create a new table OrderItem1 with fields Item1ID (Number), OrderID
(Number), and Quantity (Number); make Item1ID and OrderID a concatenated
primary key.
4. Close the tables, and establish a 1:M relationship between Order and OrderItem1,
while enforcing referential integrity and cascading update and deletion.
5. Establish a 1:M relationship between Item1 and OrderItem1, while enforcing
referential integrity and cascading update and deletion.
6. Enter some data into Item1.
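The steps above can be sketched in SQLite via Python's sqlite3 (an assumed stand-in for Access). OrderItem1's concatenated key pairs one order with one item, and each half is a foreign key back to a “one side” table. The quantities and the rejected OrderID 2000 row are hypothetical. The M:N business rule from Figure 4 now becomes plain rows, while a line item for a non-existent order is refused.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; quantities are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute('CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)')
con.execute("CREATE TABLE Item1 (Item1ID INTEGER PRIMARY KEY, Item1Name TEXT)")
# Bridge table: two 1:M relationships replace the M:N one.
con.execute("""
    CREATE TABLE OrderItem1 (
        OrderID  INTEGER NOT NULL REFERENCES "Order"(OrderID)
                 ON UPDATE CASCADE ON DELETE CASCADE,
        Item1ID  INTEGER NOT NULL REFERENCES Item1(Item1ID)
                 ON UPDATE CASCADE ON DELETE CASCADE,
        Quantity INTEGER,
        PRIMARY KEY (OrderID, Item1ID))""")
con.executemany('INSERT INTO "Order" VALUES (?, ?, ?)', [(1, 100, "2005-01-03"), (2, 200, "2005-01-03")])
con.executemany("INSERT INTO Item1 VALUES (?, ?)", [(10, "Nut"), (20, "Bolt")])

# Order 1 contains items 10 and 20; item 10 appears on orders 1 and 2.
con.executemany("INSERT INTO OrderItem1 VALUES (?, ?, ?)", [(1, 10, 5), (1, 20, 3), (2, 10, 1)])

try:
    # Key values must already exist in the "one side" tables.
    con.execute("INSERT INTO OrderItem1 VALUES (?, ?, ?)", (2000, 10, 1))
    accepted_orphan = True
except sqlite3.IntegrityError:
    accepted_orphan = False
print(accepted_orphan)  # False
```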
Test your design for normalization. Open all three tables (Order, Item1 and
OrderItem1).
Enter data in OrderItem1. What determines the range of acceptable values for the
key columns? Why?
Once you have some records in OrderItem1, try to change values of the keys in
Order and in Item1. What happens with foreign keys in OrderItem1?
Try to delete some records in Order and Item1. What happens in OrderItem1? Why?
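A sketch of the last two tests in SQLite via Python's sqlite3 (an assumed stand-in for Access), using the same schema and hypothetical quantities. Changing an Item1 key propagates into OrderItem1, and deleting an order removes its line items instead of leaving orphans behind.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; quantities are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute('CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)')
con.execute("CREATE TABLE Item1 (Item1ID INTEGER PRIMARY KEY, Item1Name TEXT)")
con.execute("""
    CREATE TABLE OrderItem1 (
        OrderID  INTEGER NOT NULL REFERENCES "Order"(OrderID)
                 ON UPDATE CASCADE ON DELETE CASCADE,
        Item1ID  INTEGER NOT NULL REFERENCES Item1(Item1ID)
                 ON UPDATE CASCADE ON DELETE CASCADE,
        Quantity INTEGER,
        PRIMARY KEY (OrderID, Item1ID))""")
con.executemany('INSERT INTO "Order" VALUES (?, ?, ?)', [(1, 100, "2005-01-03"), (2, 200, "2005-01-03")])
con.executemany("INSERT INTO Item1 VALUES (?, ?)", [(10, "Nut"), (20, "Bolt")])
con.executemany("INSERT INTO OrderItem1 VALUES (?, ?, ?)", [(1, 10, 5), (1, 20, 3), (2, 10, 1)])

# Changing a key on the "one" side is propagated to the bridge table ...
con.execute("UPDATE Item1 SET Item1ID = 11 WHERE Item1ID = 10")
# ... and deleting an order removes its line items along with it.
con.execute('DELETE FROM "Order" WHERE OrderID = 1')

rows = con.execute("SELECT OrderID, Item1ID, Quantity FROM OrderItem1 "
                   "ORDER BY OrderID, Item1ID").fetchall()
print(rows)  # [(2, 11, 1)]
```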
2.3 Normalizing One-to-One Relationships
The tables Customer and BillingAddress in Section 1.2, Figure 3, are already
normalized since all non-key attributes depend on the key only (3NF).
Run some normalization tests. For example, try to insert in BillingAddress a value of
CustomerID that does not exist in Customer. What happens? You can still insert a
new Customer, since this is considered the “left” table (the one from which the key is
exported, provided that you drew the relationship from Customer to BillingAddress).
Therefore, there is no insertion anomaly.
Try to delete a record from Customer. What happens to the related record in
BillingAddress? (If cascading deletion is enabled, the related record should be deleted as well.)
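The three tests can be sketched in SQLite via Python's sqlite3 (an assumed stand-in for Access), using part of the Figure 3 data. `ON DELETE CASCADE` stands in for Access's cascade-delete option. Customer 999, its address, and the new customer 400 are hypothetical values.

```python
import sqlite3

# Sketch: SQLite standing in for MS Access; customers 999 and 400 are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
con.execute("CREATE TABLE BillingAddress ("
            " CustomerID INTEGER PRIMARY KEY"
            " REFERENCES Customer(CustomerID) ON DELETE CASCADE,"
            " BillingAddress TEXT)")
con.executemany("INSERT INTO Customer (CustomerID, Name) VALUES (?, ?)",
                [(100, "Mick Jagger"), (300, "Charlie Watts")])
con.executemany("INSERT INTO BillingAddress VALUES (?, ?)",
                [(100, "Mike's Yacht Club, Cote d'Azur"),
                 (300, "Castillo Hermoso, Costa Brava, Spain")])

# Test 1: a billing address for a customer who does not exist is refused.
try:
    con.execute("INSERT INTO BillingAddress VALUES (?, ?)", (999, "Nowhere"))
    orphan_ok = True
except sqlite3.IntegrityError:
    orphan_ok = False

# Test 2: a new customer needs no billing address (no insertion anomaly).
con.execute("INSERT INTO Customer (CustomerID, Name) VALUES (?, ?)", (400, "New Customer"))

# Test 3: deleting a customer also deletes the related billing address.
con.execute("DELETE FROM Customer WHERE CustomerID = 300")
left = [r[0] for r in con.execute("SELECT CustomerID FROM BillingAddress ORDER BY CustomerID")]
print(orphan_ok, left)  # False [100]
```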
That’s all, folks!
(For now )