BTM 382 Database Management
Chapter 6:
Normalization of Database Tables
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal
Structure of BTM 382 Database Management





Week 1: Introduction and overview
 ch1: Introduction
Weeks 2-6: Database design
 ch3: Relational model
 ch4: ER modeling
 ch6: Normalization
 ERD modeling exercise
 ch5: Advanced data modeling
Week 7: Midterm exam
Weeks 8-10: Database programming
 ch7: Intro to SQL
 ch8: Advanced SQL
 SQL exercises
Weeks 11-13: Database management
 ch2,12,14: Data models
 ch13: Business intelligence and data warehousing
 ch9,15,16: Selected managerial topics
Review of Chapter 6:
Normalization of Database Tables
 What are dependencies between attributes in a table,
and how is tracing of dependencies used to
normalize tables?
 What are the normal forms in a relational database?
 Why and when would you consider denormalizing
tables in a relational database?
Problems with unnormalized tables
 Needless redundancy, hence insert, update and
delete anomalies (inconsistencies)
 Data updates are less efficient because tables are
larger
 Indexing is less effective
 Views (virtual tables) are more cumbersome
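A minimal sketch of an update anomaly, assuming a hypothetical unnormalized table in which a client's location is repeated on every project row (the column names and data are illustrative only):

```python
# Hypothetical unnormalized rows: the client's location is stored once per project
rows = [
    {"ProjectID": 1, "Client": "Acme", "ClientLocation": "Montréal"},
    {"ProjectID": 2, "Client": "Acme", "ClientLocation": "Montréal"},
]

# The client moves, but only one of the redundant copies gets updated
rows[0]["ClientLocation"] = "Toronto"

# The table now holds two conflicting locations for the same client
locations_for_acme = {r["ClientLocation"] for r in rows if r["Client"] == "Acme"}
```

Because the same fact is stored in two places, a partial update leaves the data inconsistent.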
Understanding dependencies to be
able to properly normalize tables
Functional dependency
 Functional dependency: A→B or (A,B)→(C,D)
 “B is functionally dependent on A” means that if you know A, then
you definitely know the correct value for B
 E.g. Project.ID → Project.Name
 Also called determination: “A determines B”
 Full functional dependency: (A,B)→C where A↛C and B↛C
 When all the attributes in a key are required for the determination (none is
optional)
 E.g. (Project.ID, Project.Manager) → Project.Name
Project.ID alone already determines Project.Name, so Project.Manager is not
needed; this is not a full functional dependency
 E.g. (Project.Manager, Project.StartDate) → Project.Name
This is a full functional dependency, assuming a manager can launch no
more than one project on a given date
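As a rough sketch, a functional dependency can be tested against sample rows: A→B fails as soon as two rows agree on A but differ on B. The function name `holds`, the column names, and the sample data below are hypothetical illustrations, not part of the course material.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in every row of `rows`.

    rows: list of dicts (one dict per table row)
    determinant, dependent: tuples of column names
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data
projects = [
    {"ID": 1, "Manager": "Ann", "StartDate": "2015-01-05", "Name": "Alpha"},
    {"ID": 2, "Manager": "Ann", "StartDate": "2015-03-01", "Name": "Beta"},
    {"ID": 3, "Manager": "Bob", "StartDate": "2015-01-05", "Name": "Gamma"},
]

id_determines_name = holds(projects, ("ID",), ("Name",))
manager_determines_name = holds(projects, ("Manager",), ("Name",))
start_determines_name = holds(projects, ("StartDate",), ("Name",))
pair_determines_name = holds(projects, ("Manager", "StartDate"), ("Name",))
```

In this sample, neither Manager nor StartDate determines Name on its own, but the pair does, which is what a full functional dependency requires. Sample data can only disprove a dependency; whether it truly holds is a business rule.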
Repeating group = multivalued attribute
 Attribute that holds multiple values in one table cell (a list
or array of values), instead of a single value
 Illegal in the relational model; troublesome for
normalization if you don’t catch it
 Two possible solutions
(e.g. Project.ID → Project.Location):
1. Create multiple attributes for each possible value (e.g.
Project.Location1, Project.Location2, Project.Location3)
2. Create a new entity to store multiple possible values (e.g.
Location)
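A minimal sketch of solution 2, assuming hypothetical project rows whose Location cell holds a list. The table and column names are illustrative only.

```python
# Hypothetical unnormalized rows: Location holds a list (a repeating group)
projects = [
    {"ID": 1, "Name": "Alpha", "Location": ["Montréal", "Toronto"]},
    {"ID": 2, "Name": "Beta", "Location": ["Montréal"]},
]

# Keep the single-valued attributes in the project table
project_table = [{"ID": p["ID"], "Name": p["Name"]} for p in projects]

# Move the multivalued attribute to its own table: one row per (project, location)
project_location_table = [
    {"ProjectID": p["ID"], "Location": loc}
    for p in projects
    for loc in p["Location"]
]
```

Every cell in both resulting tables now holds exactly one value, and a project can have any number of locations without changing the table structure.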
Multivalued dependency
 Functional dependency: A→B
 Multivalued dependency: A↠B,C
 Each value of A is associated with a set of B values and a set of C
values, but B and C have nothing to do with each other
 E.g. Project.ID ↠ Project.EmployeeID, Project.Location
 Since a project might have multiple locations and multiple
employees work on a project, the EmployeeID and Location in
the same row might have nothing to do with each other
 Usually indicates that one or more multivalued attributes
were not handled properly
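The redundancy behind a multivalued dependency can be sketched as follows, assuming one hypothetical project with two employees and three locations (all values are illustrative):

```python
from itertools import product

# Hypothetical facts about project 1
employees = {"E1", "E2"}
locations = {"Montréal", "Toronto", "Ottawa"}

# One table holding both independent attributes stores every combination,
# so that no row suggests a special link between an employee and a location
combined = sorted((1, e, l) for e, l in product(employees, locations))

# Split into two tables: one row per fact
project_employee = sorted((1, e) for e in employees)
project_location = sorted((1, l) for l in locations)

# Joining the two smaller tables on the project ID recovers the combined table
rejoined = sorted(
    (p, e, l)
    for (p, e) in project_employee
    for (q, l) in project_location
    if p == q
)
```

The combined table needs 6 rows to record 5 facts, and the gap grows with every added employee or location; the two split tables hold the same information without the repetition.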
Partial and transitive dependencies
 Partial dependency: (A,B)→(C,D) and A→C
 (A,B) is a candidate key (e.g. primary key)
 C doesn’t need both A and B to determine it; it only needs A
 E.g. (Project.ID,Project.ManagerID) → Project.Name
and Project.ID → Project.Name
 Transitive dependency: A→(B,C) and B→C
 A is a candidate key
 Technically speaking, a transitive dependency requires that B and C not be part
of any candidate key. However, if you extend the definition to cover cases where they
are part of a key, then you will automatically avoid problems with BCNF
 A determines C, but so does B, even though B is not a candidate key
 E.g. Project.ID → (Project.Client,Project.Location)
and Project.Client → Project.Location
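A minimal sketch of removing the transitive dependency in the last example, assuming hypothetical rows in which each client has a single location:

```python
# Hypothetical rows where ID -> (Client, Location) and Client -> Location
projects = [
    {"ID": 1, "Client": "Acme", "Location": "Montréal"},
    {"ID": 2, "Client": "Acme", "Location": "Montréal"},
    {"ID": 3, "Client": "Globex", "Location": "Toronto"},
]

# Client -> Location moves to its own table, keyed by Client
client_table = {p["Client"]: p["Location"] for p in projects}

# The project table keeps only what depends directly on the key
project_table = [{"ID": p["ID"], "Client": p["Client"]} for p in projects]

# Joining the two tables reproduces the original rows
rejoined = [
    {"ID": p["ID"], "Client": p["Client"], "Location": client_table[p["Client"]]}
    for p in project_table
]
```

Each client's location is now stored once, and no information is lost by the split.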
The normal forms
Normalization of relations: https://youtu.be/NwcVv1cxflk
(note 0:34 and 1:50)
Summary of attaining normal forms

1NF: Primary key identified and no multivalued attributes
 Legitimate primary key selected (unique identifying key)
 Only one value per table cell; no lists/arrays (multivalued attributes) in any table cell
 If you split multivalued attributes off to separate tables, then you avoid 4NF violations

2NF: 1NF minus partial dependencies
 All candidate key dependencies are fully functional
 (A,B)→C where A↛C and B↛C

3NF/BCNF: 2NF minus transitive dependencies
 Only a candidate key determines any attribute
 If A→(B,C), then B ↛ C
 There is a technical distinction between 3NF and BCNF, but if you keep this rule, then you
take care of both 3NF and BCNF

4NF: BCNF minus multivalued dependencies
 Each row strictly describes just one entity
 If you split multivalued attributes into separate tables to attain 1NF, then you also avoid 4NF
violations

DKNF, 5NF, 6NF
 Relatively rare; often not worth the trouble of normalizing, even if applicable
Dependency diagram:
Basic tool for normalization
 Depicts all dependencies found in a given table structure
 Gives bird’s-eye view of all relationships among table’s
attributes
 Makes it less likely that you will overlook an important
dependency
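The information in a dependency diagram can also be written as a plain list of dependencies and labeled mechanically. The sketch below assumes the table has a single candidate key and that every dependency is non-trivial; the function `classify` and the attribute names are hypothetical.

```python
def classify(key, dependencies):
    """Label each dependency relative to a single candidate key.

    Assumes `key` is the table's only candidate key and that every
    dependency is non-trivial (the right side is not part of the left side).
    """
    key = frozenset(key)
    labels = {}
    for determinant, dependent in dependencies:
        lhs = frozenset(determinant)
        if lhs >= key:
            label = "key"          # determined by the whole key
        elif lhs < key:
            label = "partial"      # determined by only part of the key
        else:
            label = "transitive"   # determinant includes a non-key attribute
        labels[dependent] = label
    return labels


# Hypothetical table with composite key (ProjectID, EmployeeID)
labels = classify(
    ("ProjectID", "EmployeeID"),
    [
        (("ProjectID", "EmployeeID"), "Hours"),
        (("ProjectID",), "ProjectName"),
        (("EmployeeID",), "JobClass"),
        (("JobClass",), "ChargePerHour"),
    ],
)
```

Dependencies labeled "partial" point to 2NF violations and those labeled "transitive" point to 3NF/BCNF violations, following the simplified rule that only a candidate key should determine any attribute.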
3NF vs BCNF
 BCNF is only an issue because of a poor selection of primary key
at the 1NF step
 Regardless, dealing with all dependencies resolves the table into BCNF
Fixing 4NF problem
 The only reason a table might be in 3NF/BCNF but not in 4NF is that
two originally multivalued attributes existed at the 1NF stage
 Multivalued attributes should always be placed in separate tables, or
be split into multiple attributes
 If you do this in the first step to resolve 1NF, you will never have
problems with 4NF
Denormalization
 Although normalization is important, processing speed
and efficiency are also important in database design
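The trade-off can be sketched with Python's built-in sqlite3 module, assuming hypothetical `client` and `project` tables: the normalized design needs a join to list each project's location, while a denormalized copy answers the same question from one table at the cost of repeating data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (name TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        client TEXT REFERENCES client(name)
    );
    INSERT INTO client VALUES ('Acme', 'Montréal'), ('Globex', 'Toronto');
    INSERT INTO project VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Globex');

    -- Denormalized copy: the location is repeated on every project row
    CREATE TABLE project_flat AS
        SELECT p.id, p.client, c.location
        FROM project p JOIN client c ON c.name = p.client;
""")

# Normalized design: a join is needed to find each project's location
joined = conn.execute(
    "SELECT p.id, c.location FROM project p "
    "JOIN client c ON c.name = p.client ORDER BY p.id"
).fetchall()

# Denormalized design: a single-table read gives the same answer
flat = conn.execute(
    "SELECT id, location FROM project_flat ORDER BY id"
).fetchall()
```

Both queries return the same rows. The denormalized table avoids the join but reintroduces the redundancy, and with it the risk of update anomalies.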
Summary of Chapter 6:
Normalization of Database Tables
 Correctly identifying dependencies from the very
beginning is critical to properly normalize tables.
 The most important normal forms are 1NF, 2NF,
3NF, BCNF and 4NF.
 Although normalization to 4NF is usually important, a
designer might sometimes want to denormalize a
table to a lower normal form.
Sources
 Most of the slides are adapted from Database
Systems: Design, Implementation and
Management by Carlos Coronel and Steven Morris.
11th edition (2015) published by Cengage Learning.
ISBN 13: 978-1-285-19614-5
 Other sources are noted on the slides themselves