BTM 382 Database Management
Chapter 6: Normalization of Database Tables
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal

Structure of BTM 382 Database Management
Week 1: Introduction and overview
  ch1: Introduction
Weeks 2-6: Database design
  ch3: Relational model
  ch4: ER modeling
  ch6: Normalization
  ERD modeling exercise
  ch5: Advanced data modeling
Week 7: Midterm exam
Weeks 8-10: Database programming
  ch7: Intro to SQL
  ch8: Advanced SQL
  SQL exercises
Weeks 11-13: Database management
  ch2,12,14: Data models
  ch13: Business intelligence and data warehousing
  ch9,15,16: Selected managerial topics

Review of Chapter 6: Normalization of Database Tables
What are dependencies between attributes in a table, and how is tracing of dependencies used to normalize tables?
What are the normal forms in a relational database?
Why and when would you consider denormalizing tables in a relational database?

Problems with unnormalized tables
Needless redundancy, hence insert, update and delete anomalies (inconsistencies)
Data updates are less efficient because tables are larger
Indexing is less effective
Views (virtual tables) are more cumbersome

Understanding dependencies to be able to properly normalize tables

Functional dependency
Functional dependency: A→B or (A,B)→(C,D)
  “B is functionally dependent on A” means that if you know A, then you definitely know the correct value for B
  E.g. Project.ID → Project.Name
  Also called determination: “A determines B”
Full functional dependency: (A,B)→C where A↛C and B↛C
  All the attributes in the key are required for the determination (none is optional)
  E.g. (Project.ID, Project.Manager) → Project.Name: Project.Manager is optional, so this is not a full functional dependency
  E.g. (Project.Manager, Project.StartDate) → Project.Name: this is a full functional dependency, assuming a manager can launch no more than one project on a given date

Repeating group = multivalued attribute
An attribute whose values contain multiple values (a list or array of values) instead of a single value
Illegal in the relational model; troublesome for normalization if you don’t catch it
Two possible solutions (e.g. Project.ID → Project.Location):
  1. Create a separate attribute for each possible value (e.g. Project.Location1, Project.Location2, Project.Location3)
  2. Create a new entity to store the multiple possible values (e.g. Location)

Multivalued dependency
Functional dependency: A→B
Multivalued dependency: A↠B,C
  A determines a set of B values and a set of C values, but B and C have nothing to do with each other
  E.g. Project.ID ↠ Project.EmployeeID, Project.Location
  Since a project might have multiple locations and multiple employees might work on a project, the EmployeeID and Location in the same row might have nothing to do with each other
  Usually indicates that one or more multivalued attributes were not handled properly

Partial and transitive dependencies
Partial dependency: (A,B)→(C,D) and B→C
  (A,B) is a candidate key (e.g. primary key)
  C doesn’t need both A and B to determine it; it only needs B
  E.g. (Project.ID, Project.ManagerID) → Project.Name and Project.ID → Project.Name
Transitive dependency: A→(B,C) and B→C
  A is a candidate key
  Technically speaking, a transitive dependency requires that B and C not be part of any candidate key. However, if you expand the meaning to include cases where they are part of a key, then you will automatically avoid problems with BCNF
  A determines C, but so does B, even though B is not a candidate key
  E.g. Project.ID → (Project.Client, Project.Location) and Project.Client → Project.Location

The normal forms
Normalization of relations: https://youtu.be/NwcVv1cxflk (note 0:34 and 1:50)

Summary of attaining normal forms
1NF: Primary key identified and no multivalued attributes
  Legitimate primary key selected (unique identifying key)
  Only one value per table cell; no lists/arrays (multivalued attributes) in any table cell
  If you split multivalued attributes off to separate tables, then you avoid 4NF violations
2NF: 1NF minus partial dependencies
  All candidate key dependencies are fully functional
  (A,B)→C where A↛C and B↛C
3NF/BCNF: 2NF minus transitive dependencies
  Only a candidate key determines any attribute
  If A→(B,C), then B↛C
  There is a technical distinction between 3NF and BCNF, but if you keep this rule, then you take care of both 3NF and BCNF
4NF: BCNF minus multivalued dependencies
  Each row strictly describes just one entity
  If you split multivalued attributes into separate tables to attain 1NF, then you also avoid 4NF violations
DKNF, 5NF and 6NF are relatively rare and often not worth the trouble of normalizing, even if applicable

Dependency diagram: Basic tool for normalization
Depicts all dependencies found in a given table structure
Gives a bird’s-eye view of all relationships among a table’s attributes
Makes it less likely that you will overlook an important dependency

3NF vs BCNF
BCNF is only an issue because of poor selection of the primary key in the 1NF step
Regardless, dealing with all dependencies resolves the table into BCNF

Fixing 4NF problem
The only reason a table might be in 3NF/BCNF but not in 4NF is that two originally multivalued attributes existed at the 1NF stage
Multivalued attributes should always be placed in separate tables, or be split into multiple attributes
If you do this in the first step to resolve 1NF, you will never have problems with 4NF

Denormalization
Although normalization is important, processing speed and efficiency are also important in database design

Summary of Chapter 6: Normalization of Database Tables
Correctly identifying dependencies from the very beginning is critical to properly normalize tables.
The most important normal forms are 1NF, 2NF, 3NF, BCNF and 4NF.
Although normalization to 4NF is usually important, a designer might sometimes want to denormalize a table to a lower normal form.

Sources
Most of the slides are adapted from Database Systems: Design, Implementation and Management by Carlos Coronel and Steven Morris, 11th edition (2015), published by Cengage Learning. ISBN 13: 978-1-285-19614-5
Other sources are noted on the slides themselves
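
Illustration: checking dependencies against sample rows
The functional, full, partial and transitive dependencies reviewed in this chapter can be tested against sample data. The following is a minimal Python sketch, not part of the original slides. The `projects` rows and the helper names `holds`, `partial_determinants` and `is_full` are hypothetical and chosen only to mirror the Project examples. Sample rows can refute a dependency but never prove one; real dependencies come from the business rules.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (equal lhs values imply equal rhs values)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this key; any later mismatch breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_determinants(rows, lhs, rhs):
    """Return the proper, non-empty subsets of lhs that already determine rhs."""
    return [
        subset
        for size in range(1, len(lhs))
        for subset in combinations(lhs, size)
        if holds(rows, subset, rhs)
    ]


def is_full(rows, lhs, rhs):
    """A dependency is full when it holds and no proper subset of lhs determines rhs."""
    return holds(rows, lhs, rhs) and not partial_determinants(rows, lhs, rhs)


# Hypothetical sample rows (assumption: invented for this illustration).
projects = [
    {"ID": 1, "Manager": "Ana", "StartDate": "2024-01-08", "Name": "Payroll",
     "Client": "Acme", "Location": "Montréal"},
    {"ID": 2, "Manager": "Ana", "StartDate": "2024-03-04", "Name": "Intranet",
     "Client": "Acme", "Location": "Montréal"},
    {"ID": 3, "Manager": "Ben", "StartDate": "2024-01-08", "Name": "Website",
     "Client": "Globex", "Location": "Toronto"},
]

# Functional dependency: Project.ID -> Project.Name
print(holds(projects, ["ID"], ["Name"]))

# Partial dependency: ID alone already determines Name, so Manager is optional.
print(partial_determinants(projects, ["ID", "Manager"], ["Name"]))

# Full functional dependency: neither Manager nor StartDate alone determines Name.
print(is_full(projects, ["Manager", "StartDate"], ["Name"]))

# Transitive dependency: Client -> Location holds although Client is not a candidate key.
print(holds(projects, ["Client"], ["Location"]), holds(projects, ["Client"], ["ID"]))
```

In this sample, the partial dependency points to a 2NF problem with (ID, Manager) as a key, and Client → Location points to a 3NF/BCNF problem that would be resolved by moving Client and Location into their own table.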
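
Illustration: splitting multivalued attributes
The advice to place multivalued attributes in separate tables at the 1NF step can be sketched in the same way. The following Python example is an assumption-laden illustration, not taken from the slides: the `project` record and the function names are hypothetical. It contrasts keeping two independent lists in one table (one value per cell, but every location paired with every employee) with splitting each list into its own table.

```python
from itertools import product

# Hypothetical unnormalized record: two multivalued attributes in one row.
project = {"ID": 1, "Locations": ["Montréal", "Toronto"], "EmployeeIDs": [101, 102, 103]}


def flatten_into_one_table(p):
    """One value per cell, but both lists stay in one table.

    Because Location and EmployeeID are unrelated, every location must be
    paired with every employee: the multivalued dependency behind a 4NF violation.
    """
    return [
        {"ID": p["ID"], "Location": loc, "EmployeeID": emp}
        for loc, emp in product(p["Locations"], p["EmployeeIDs"])
    ]


def split_into_separate_tables(p):
    """Give each multivalued attribute its own table, keyed by the project ID."""
    locations = [{"ID": p["ID"], "Location": loc} for loc in p["Locations"]]
    employees = [{"ID": p["ID"], "EmployeeID": emp} for emp in p["EmployeeIDs"]]
    return locations, employees


def join_on_id(locations, employees):
    """Rebuild the combined rows by joining the two tables on ID."""
    return [
        {"ID": l["ID"], "Location": l["Location"], "EmployeeID": e["EmployeeID"]}
        for l in locations
        for e in employees
        if l["ID"] == e["ID"]
    ]


one_table = flatten_into_one_table(project)
project_location, project_employee = split_into_separate_tables(project)

print(len(one_table))                                  # 6 rows: 2 locations x 3 employees
print(len(project_location) + len(project_employee))  # 5 rows: 2 + 3
print(join_on_id(project_location, project_employee) == one_table)
```

The split tables hold the same information in fewer rows, and joining them on ID reproduces the combined table, so nothing is lost by the decomposition.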