MIS 3500 Database Management Systems * Asper School of Business
Instructor: Bob Travica

Lab on Multiplicity and Normalization
(Updated January 2014)

This lab demonstrates the concepts of multiplicity, normalization, referential integrity, and data anomalies associated with the theory of relational databases. You will use MS Access to explore the concepts: you will create some tables, alter them, and test them. The goal is to study the principles of relational database systems rather than to focus on the procedures of developing tables. A basic level of knowledge of MS Access is assumed.

1. Multiplicity

To get started, find the MS Access icon on the desktop and click it. Then create a new database.

Understanding normalization starts with understanding multiplicity. Discussed below are the three types of multiplicity we studied in class, which determine three types of relationships.

1.1 One-to-Many Multiplicity (1:M)

This is the most frequent relationship in database systems. To establish the 1:M relationship, you need to import the key from the table on the “one side” into the table on the “many side”. Sometimes these are called the “left table” and the “right table”, although this labeling corresponds to the direction of exporting the key rather than to the physical locations of the tables (see Figure 1).

When tables have single-attribute keys, the direction of drawing the relationship determines which is the “one side” or “left table”. So a table can physically be on the right side, but if you start drawing the relationship from its key to an attribute in the other table, it will be considered the “left table” or “one side table”, while the other table will be the “right table” or “many side table”.

Figure 1. Creating Relationship between Tables

Table X                                Table Y
Destination Table, Many Side Table,    Source Table, One Side Table,
“Right Table”                          “Left Table”

XID                                    YID (1)
Attribute                              Attribute
YID (*)

Relationship: YID in Table Y (1) to YID in Table X (*)

The key in the One Side table (YID) is the foreign key in the Many Side table.
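
For readers who want to see the idea of Figure 1 outside Access, here is a minimal sketch in Python. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MS Access; the row values are made up for illustration. It shows YID being imported into Table X as a foreign key, many X rows sharing one Y row, and an unmatched YID being refused.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MS Access in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

# Table Y: source, one side, "left" table.
conn.execute("CREATE TABLE Y (YID INTEGER PRIMARY KEY, Attribute TEXT)")

# Table X: destination, many side, "right" table. It imports YID as a foreign key.
conn.execute("""
    CREATE TABLE X (
        XID INTEGER PRIMARY KEY,
        Attribute TEXT,
        YID INTEGER REFERENCES Y(YID)
    )
""")

conn.execute("INSERT INTO Y VALUES (1, 'y-row')")
# Several rows of X may carry the same YID: one Y row, many X rows.
conn.executemany("INSERT INTO X VALUES (?, ?, ?)",
                 [(1, "x-row-a", 1), (2, "x-row-b", 1)])

x_rows_per_y = conn.execute(
    "SELECT YID, COUNT(*) FROM X GROUP BY YID").fetchall()

# A YID that does not exist in Y cannot be used in X.
try:
    conn.execute("INSERT INTO X VALUES (3, 'x-row-c', 2)")
    unmatched_key_rejected = False
except sqlite3.IntegrityError:
    unmatched_key_rejected = True

print(x_rows_per_y, unmatched_key_rejected)
```

The foreign key is declared on the many side only, which mirrors drawing the relationship from the key of Table Y to the YID attribute of Table X.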

To study the one-to-many relationship, run the set of procedures below for creating the tables Customer and Order. Let us assume that the relationship between Customer and Order is based on the business rule that each customer can place many orders, while each order must be associated with only one customer. Therefore, this is a one-to-many relationship.

Be sure to choose Number as the data type for the keys. If you use the default data type called AutoNumber, you may run into trouble when you want to modify tables and enter values for a foreign key.

Run this set of procedures:

1. To use Design View, click Create and then Table Design. Then create a table that you will name Customer. Specifically, create the CustomerID field, choose Number for the data type, and make this attribute the Primary Key (right-click the row and choose Primary Key). Then create a column called Name, and set Text for the data type. See Figure 2.
2. Create a table that you will name Order. Create a field OrderID, set the data type to Number, and make this column the Primary Key. Also create the columns CustomerID (Number) and OrderDate (Date/Time).
3. Establish a relationship between CustomerID in Customer and CustomerID in Order. To do this, first close the tables you have just created. Then use Database Tools/Relationships and bring the tables onto the screen if they are not there yet (e.g., right-clicking in the Relationships window pops up a menu with the function Show Table). Draw the relationship by pressing the left mouse button and dragging the cursor from CustomerID in Customer to CustomerID in Order.
4. In the Edit Relationships form that pops up, click the Enforce Referential Integrity checkbox (you can leave the options on cascading update and deletion blank, or you can check either of them). Look at the Relationship Type text box in this form, and notice that Access automatically names this relationship One-To-Many. Click the Create button.
If all went OK, you should see in the Relationships window a link between your tables, with the multiplicities 1 and many (the infinity symbol) displayed.
5. Once both tables are finalized, enter some data in each (see Figure 2). For dates, use the Calendar help; for the year, use last year rather than the one in the example shown.

Figure 2. One-To-Many Relationship

Customer
CustomerID (key)   Name
100                Piere
200                John
300                Micaela
400                Rose

Order
OrderID (key)   CustomerID   OrderDate
1               100          Jan. 3, 2005
2               100          Jan. 13, 2005
3               200          Feb. 5, 2005
4               200          May 15, 2005

Note: The key column of each table is marked “(key)”.

Analysis:

1) Notice how the same values of CustomerID repeat in Order, while OrderID takes a different value in each row. This is because each (one) instance of Customer can be associated with many instances of Order (e.g., CustomerID 100 is associated with OrderIDs 1 and 2). In contrast, each value of OrderID corresponds to only one value of CustomerID (OrderID 1 is associated with CustomerID 100 only, OrderID 2 with CustomerID 100 only, OrderID 3 with CustomerID 200 only, etc.).

2) As depicted in Figure 2, the Order table contains only those values of CustomerID that exist in the Customer table. This means that referential integrity is supported: only values of CustomerID that already exist in the Customer table can be used in the Order table. Without referential integrity, the Order table could contain customer identifiers that do not exist in the customer records. Try to see what happens when you violate referential integrity. For example, try to enter the number 500 in the CustomerID column of the Order table. What happens?

Lastly, notice that the customers with CustomerIDs 300 and 400 are not associated with any order yet.

1.2 One-to-One Multiplicity (1:1)

In contrast to the frequently used one-to-many multiplicity, one-to-one multiplicity is used much less. The one-to-one multiplicity is created by sharing the same key between two tables.
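
Sharing the same key can be sketched outside Access as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) replaces MS Access, the table names Customer and BillingAddress are borrowed from this lab, and the address strings are made up. Because CustomerID is the whole primary key of both tables, a customer can have at most one row in the second table.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MS Access in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")

# The shared key: CustomerID is the primary key of BillingAddress
# and, at the same time, a foreign key pointing at Customer.
conn.execute("""
    CREATE TABLE BillingAddress (
        CustomerID INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        BillingAddress TEXT
    )
""")

conn.execute("INSERT INTO Customer VALUES (?, ?)", (100, "Piere"))
conn.execute("INSERT INTO BillingAddress VALUES (?, ?)",
             (100, "first billing address"))

# A second row for the same customer would repeat the key value, so it is refused.
try:
    conn.execute("INSERT INTO BillingAddress VALUES (?, ?)",
                 (100, "second billing address"))
    second_row_accepted = True
except sqlite3.IntegrityError:
    second_row_accepted = False

print(second_row_accepted)
```

The primary key on BillingAddress caps the "many" end at one row, which is what turns an ordinary foreign key link into a 1:1 relationship.
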
An example of 1:1 multiplicity is the relationship between the Customer table and the BillingAddress table shown in Figure 3. The logic behind separating the second address from the customer record is that not all customers have a second address, so adding a column for it to Customer would waste storage.

Concentrate on Figure 3. You only have to create a new table called BillingAddress with the columns CustomerID (Number, key) and BillingAddress (Text). Modify the Customer table by adding the column Address (Text). To use the design modification function, open a table and then click View/Design View.

Note: You can simplify the address values; those shown are just for fun. ;-)

Establish a relationship between the Customer and BillingAddress tables, and enforce referential integrity. In drawing the relationship, note again that it does matter which is the “left” and which is the “right” table, although both are “one side” tables in terms of multiplicity.

Figure 3. One-to-One Relationship

Customer
CustomerID   Name             Address
100          Mick Jagger      57 Rolling Stone Sq., London, England
200          Keith Richards   33 RR Pkwy, London, England
300          Charlie Watts    75 Roll’n’Rock Dr., England

BillingAddress
CustomerID   BillingAddress
100          Mike’s Yacht Club, Cote d’Azur
300          Castillo Hermoso, Costa Brava, Spain

1.3 Many-to-Many Multiplicity (M:N)

Many-to-many relationships occur in class diagrams to reflect frequent business situations (e.g., an item can appear on many orders, and an order can contain many items). To be implemented in a relational database system, M:N relationships must be transformed into 1:M relationships through the technique of data normalization. (More on this in the section on normalization below.)

Let us first try to implement a design that is not normalized. Note that this is just for the purpose of study; it is not a design you should ever implement in reality.

To create a M:N relationship using the technique described in Section 1.1, you need to make each table’s key a foreign key in the other table. Therefore, there will be two foreign keys linking the two tables in a M:N relationship.

Run the following set of procedures in order to study the M:N relationship. At the end, you should have the new tables Order1 and Item created and connected (see Figure 4).

Figure 4. Many-to-Many Relationship
(Note: The tables are not normalized, and we use them just for study purposes, not in a properly designed database system.)

Order1
Order1ID   ItemID   CustomerID   Date
1          10       100          1/3/2005
2          10       200          1/3/2005

Note: Business rule (the first half): Each item appears on many orders…

Item
ItemID   Order1ID   ItemName
10       1          Nut
20       1          Bolt

Note: Business rule (the second half): … and each order can contain many items.

Notice above that item 10 appears on orders 1 and 2 (table Order1), while order 1 contains items 10 and 20 (table Item).

To create these tables:

1. Make a copy of the Order table, and name it Order1. A quick method is to open the Order table, click Save As/Save Object As, and type Order1. Open Order1, and delete all the data from it. Rename OrderID to Order1ID. This procedure speeds up your work, while making sure that you can manipulate data as you wish. For example, if you add another field to the key of Order1 to make a combined (concatenated) key without first deleting the old data, Access will report the error that a key field cannot be null (see steps 2 and 3).
2. In the table Order1, add a new column ItemID (Number). Hint: In Design View, right-click the column CustomerID; in the popup menu, click Insert Column. Name the newly created column ItemID.
3. Set Order1ID and ItemID to be a concatenated key.
Hint: Hold down the Control key and click the left-most cell of each column’s row; then right-click and, in the popup menu, select Primary Key (or click the Primary Key button in the upper Table Design menu; look for the image of a key on the buttons displayed).
4. Create the table Item with the columns ItemID (Number), Order1ID (Number), and ItemName (Text). Make ItemID and Order1ID a combined key.
5. Set a M:N relationship between the tables Order1 and Item by drawing two relationships. The first relationship is between the column Order1ID in the table Order1 and the column Order1ID in the table Item. What happens with referential integrity? You can see that it is not possible to enforce it, since the DBMS reports an error. In addition, the relationship cannot be typed as 1:M but only as “indeterminate”. This is what happens when you try to force a M:N relationship on the system: it will not be accepted. You can just “fudge it” (simulate it), and the system will give you no guarantee of data integrity.
6. Set the other relationship between the column ItemID in the table Item and the column ItemID in the table Order1. You will experience the same problems as in step 5. The database engine will ask you if you want to edit the existing relationship, and you should answer ‘No’. This will result in a new relationship between Order1 and a new table that the system will create and name Item_1. This is how the database engine responds to your essentially illegal request.
7. Enter some data in the tables (perhaps it is best for now to use the example in Figure 4). To keep it simple, let us assume that ItemID takes two-digit values: 10, 20, etc.

Analysis: The design you created represents the business rule that each item can be associated with many orders, while each order can contain many items. Thus, the many-to-many relationship between Order1 and Item appears to be implemented.
However, Access does not actually support this relationship, and that is why you get the strange “indeterminate” relationships and a rejection of your request to enforce referential integrity. Indeed, this design is not normalized, and therefore it is inappropriate for a relational database system. For the things that can go wrong with this design, please see the next section.

2. Normalization

Data in a relational database system must be normalized. The purpose of normalizing data is to preserve data quality (accuracy, integrity) and to avoid problems (“anomalies”) with data insertion, modification, and deletion.

2.1 Problems with Non-Normalized Data

Referential Integrity Loss: You have already encountered the issue of normalization when exploring referential integrity in this lab. Normalized data support referential integrity, while non-normalized data do not. Consider again the one-to-many relationship between the tables Customer and Order in Figure 2. When you tried to enter in the table Order a value of CustomerID that did not already exist in the table Customer, the database engine stopped you. Otherwise, a user could enter orders for non-existing customers. In contrast, the non-normalized design in Figure 4 allows you to enter foreign keys (FKs) that do not match primary keys (PKs). As an exercise, try to enter Order1ID 2000 in the table Item. What happens? Indeed, you can enter any data in the columns for primary and foreign keys, and the system will not detect errors.

To explore other disadvantages of non-normalized data (and advantages of normalized data), let us create a table CustomerLong that mixes master data with transactional data. Therefore, this table will contain a repeated group (customer data repeat for each order; note that a real customer record would be much longer). As shown in Figure 5, CustomerLong merges the tables Customer and Order from Figure 2.

To quickly create the table CustomerLong, do the following:

1. Make a copy of the table Customer and name it CustomerLong.
2. Open CustomerLong in Design View and remove the key property from the column CustomerID. This has to be done because some rows will have the same value of CustomerID, and the system does not tolerate duplicate values in a key column. So you have to “fool the system” for the sake of this exercise by having a table with no key column for a while.
3. Create the columns OrderID (Number) and OrderDate (Date/Time).
4. Enter additional data for Piere and John as shown in Figure 5.
5. Make the columns CustomerID and OrderID the primary key.

Figure 5. Non-normalized Table with Repeated Group

CustomerLong
CustomerID   Name    OrderID   OrderDate
100          Piere   1         Jan. 3, 2005
100          Piere   2         Jan. 3, 2005
200          John    3         Feb. 5, 2005
200          John    4         May 15, 2005

Now, let us see what happens if we try to perform the standard database operations of deleting and inserting data.

Deletion Anomaly: If you delete the records on orders #1 and #2 in CustomerLong (say, customer #100 cancels the orders), you will lose the data on this customer as well. Therefore, there is a deletion anomaly: an undesired loss of data when some other data are deleted.

Insertion Anomaly: To enter a customer record for a new customer, Micaela, you would also need to enter an OrderID for her record. (It is assumed that this situation occurs when the minimum multiplicity for Order is zero.) This is so because the table uses a combined key (CustomerID+OrderID), and the DBMS does not allow key columns to be blank (to have the null value). Your choices are (a) to wait until Micaela and Rose place orders and then enter the customer data, or (b) to make up the order data (see the made-up number 9999999 in Figure 6). Therefore, there is an insertion anomaly: the desired data cannot be entered when needed, or data accuracy must be violated. The business consequences of this design, if implemented, are extremely serious (falsification of facts).

Figure 6. Insertion Anomaly

CustomerLong
CustomerID   Name      OrderID   Date
100          Piere     1         Jan. 3, 2005
100          Piere     2         Jan. 3, 2005
200          John      3         Feb. 5, 2005
200          John      4         May 15, 2005
300          Micaela   9999999

Modification Anomaly: Normalized tables support the principle of referential integrity, which ensures that values of the FK are kept dependent on the values of the PK. Consider the tables Customer, Order, and BillingAddress. Open the table Customer, and change the CustomerID from 100 to 110. Try to save the table. Two outcomes may result from your attempt to modify the data. If you established the relationships between Customer and the related tables without enabling cascading update (modification) and deletion, the database engine will inform you that the change cannot be saved because the CustomerID exists in those other tables. If cascading update and deletion are enabled, your new value of CustomerID will appear in the related rows of the tables Order and BillingAddress.

Now, look back at your table CustomerLong. Since the table is not normalized (broken down into two interlinked tables), none of the dependencies and controls described above have been established. Therefore, the user can modify key values arbitrarily, and the system will tolerate this. To prevent this from happening, the table CustomerLong must be normalized, that is, brought into 3NF, as explained below.

2.2 Normalizing Many-to-Many Relationships

A relational database system is most comfortable with the 1:M relationship. In this form, data integrity can be preserved and querying can be properly performed. This means that a M:N relationship must be transformed into two 1:M relationships. The method of normalizing this M:N relationship is to insert a new table that will “bridge” or “link” the tables Order and Item. You can think of this new table as the one that is “absorbing” or “reducing” the multiplicity between the tables Order and Item. Both Order and Item will have a separate 1:M relationship with this bridge table (OrderItem1 in Figure 7).
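
The bridge-table idea can also be sketched in a few lines of Python. The sketch makes several assumptions: SQLite (via sqlite3) stands in for MS Access, the table and column names follow Figure 7 (with OrderDate for the order date), the Customer table is left out for brevity, and the quantities are made-up values. It is an illustration, not the procedure you are asked to run in Access.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MS Access in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# ORDER is a reserved word in SQL, so the table name is quoted.
conn.execute('CREATE TABLE "Order" ('
             'OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)')
conn.execute("CREATE TABLE Item1 (Item1ID INTEGER PRIMARY KEY, Item1Name TEXT)")

# Bridge table: its combined key consists of two foreign keys,
# one to each of the tables in the original M:N relationship.
conn.execute("""
    CREATE TABLE OrderItem1 (
        OrderID  INTEGER REFERENCES "Order"(OrderID),
        Item1ID  INTEGER REFERENCES Item1(Item1ID),
        Quantity INTEGER,
        PRIMARY KEY (OrderID, Item1ID)
    )
""")

conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                 [(1, 100, "2005-01-03"), (2, 200, "2005-01-03")])
conn.executemany("INSERT INTO Item1 VALUES (?, ?)", [(10, "Nut"), (20, "Bolt")])
# Quantities are illustrative only.
conn.executemany("INSERT INTO OrderItem1 VALUES (?, ?, ?)",
                 [(1, 10, 5), (1, 20, 2), (2, 10, 1)])

# One order, many items; one item, many orders.
items_on_order_1 = [row[0] for row in conn.execute(
    "SELECT Item1ID FROM OrderItem1 WHERE OrderID = 1 ORDER BY Item1ID")]
orders_with_item_10 = [row[0] for row in conn.execute(
    "SELECT OrderID FROM OrderItem1 WHERE Item1ID = 10 ORDER BY OrderID")]

# Unlike the design in Figure 4, an order number with no parent row is refused.
try:
    conn.execute("INSERT INTO OrderItem1 VALUES (3, 10, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(items_on_order_1, orders_with_item_10, orphan_rejected)
```

Each of the two foreign keys is an ordinary 1:M link, so the engine can enforce referential integrity on both sides.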
To see how this works, run the following set of procedures. Your result should resemble what is depicted in Figure 7.

Figure 7. Normalized M:N Relationship

Order                  OrderItem1             Item1
OrderID     1 ---- *   OrderID
CustomerID             Item1ID    * ---- 1    Item1ID
Date                   Quantity               Item1Name

To complete this exercise, do the following:

1. Use the old Order table.
2. Create a new table Item1 with the fields Item1ID (Number, PK) and Item1Name (Text).
3. Create a new table OrderItem1 with the fields Item1ID (Number, PK), OrderID (Number, PK), and Quantity (Number).
4. Close the tables, and establish a 1:M relationship between Order and OrderItem1, enforcing referential integrity and the cascading update and deletion.
5. Establish a relationship between Item1 and OrderItem1, enforcing referential integrity and the cascading update and deletion.
6. Enter some data into Item1.

Test your design for normalization. Open all three tables (Order, Item1, and OrderItem1). Enter data in OrderItem1. What determines the range of acceptable values for the key columns? Why? Once you have some records in OrderItem1, try to change the values of the keys in Order and in Item1. What happens with the foreign keys in OrderItem1? Try to delete some records in Order and Item1. What happens in OrderItem1? Why?

2.3 Normalizing One-to-One Relationships

The tables Customer and BillingAddress in Section 1.2, Figure 3, are already normalized, since all non-key attributes depend on the key only (3NF).

Run some normalization tests. For example, try to insert in BillingAddress a value of CustomerID that does not exist in Customer. What happens? You can still insert a new Customer, since this is considered the “left” table (the one from which the key is exported, provided that you drew the relationship from Customer to BillingAddress). Therefore, there is no insertion anomaly. Try to delete a record from Customer. What happens with the related record in BillingAddress? (The referenced record should be deleted as well.)
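
The two tests above can be mimicked outside Access with the following sketch. It assumes SQLite (via Python's sqlite3 module) in place of MS Access and assumes that cascading deletion is enabled on the relationship, which is what the expected outcome in parentheses presupposes. The address values are shortened placeholders.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MS Access in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Customer ("
             "CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
# Assumption: cascading deletion is switched on for this relationship.
conn.execute("""
    CREATE TABLE BillingAddress (
        CustomerID INTEGER PRIMARY KEY
            REFERENCES Customer(CustomerID) ON DELETE CASCADE,
        BillingAddress TEXT
    )
""")

conn.execute("INSERT INTO Customer VALUES (?, ?, ?)",
             (100, "Mick Jagger", "London"))
conn.execute("INSERT INTO BillingAddress VALUES (?, ?)", (100, "Yacht Club"))

# Test 1: a billing address for a CustomerID that does not exist in Customer.
try:
    conn.execute("INSERT INTO BillingAddress VALUES (?, ?)", (999, "Nowhere"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Test 2: a new customer without a billing address is accepted.
conn.execute("INSERT INTO Customer VALUES (?, ?, ?)",
             (200, "Keith Richards", "London"))

# Test 3: deleting a customer removes the dependent billing address.
conn.execute("DELETE FROM Customer WHERE CustomerID = 100")
billing_rows_left = conn.execute(
    "SELECT COUNT(*) FROM BillingAddress").fetchone()[0]

print(orphan_rejected, billing_rows_left)
```

Without the cascade option, the deletion in the last test would be refused instead, because a billing address would be left pointing at a customer that no longer exists.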

That’s all, folks! (For now)