Download Normalization of Database Tables

Document related concepts

Oracle Database wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Relational algebra wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Ingres (database) wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Chapter 5
Normalization of Database Tables
Database Systems: Design, Implementation, and Management
Peter Rob & Carlos Coronel
In this chapter, you will learn:
 What normalization is and what role it plays in
database design
 About the normal forms 1NF, 2NF, 3NF, BCNF, and
4NF
 How normal forms can be transformed from lower
normal forms to higher normal forms
 That normalization and E-R modeling are used
concurrently to produce a good database design
 That some situations require denormalization to
generate information efficiently
Database Tables and Normalization
Normalization
 Process
for evaluating and correcting table structures to
minimize data redundancies
 process for assigning attributes to tables. It



reduces data redundancies
helps eliminate data anomalies.
produces controlled redundancies to link tables
Normalization works through a series of
stages called normal forms:
 First
normal form (1NF)
 Second normal form (2NF)
 Third normal form (3NF)
 Fourth normal form (4NF)
Database Tables and Normalization
Normalization
 2NF
is better than 1NF;
 3NF is better than 2NF
 For most business database design purposes,
3NF is as high as we need to go in normalization process
The highest level of normalization is not always
most desirable.
Database Tables and Normalization
The Need for Normalization
 Case
of a Construction Company

Building project -- Project number, Name, Employees assigned
to the project.

Employee -- Employee number, Name, Job classification

The company charges its clients by billing the hours spent on
each project. The hourly billing rate is dependent on the
employee’s position.

Periodically, a report is generated. Table 5.1

The easiest way to generate the required report might seem to
be a table whose contents correspond to the reporting
requirements. Figure 5.1
Database Tables and Normalization
Need for Normalization:
Problems with the Figure 5.1
 The
project number is intended to be a primary key,
but it contains nulls.
 The
table displays data redundancies.
 The
table entries invite data inconsistencies.
 The
data redundancies yield the following anomalies:

Update anomalies. (modify JOB_CLASS for Employee 105)

Insertion anomalies. (a new Employee not yet assigned)

Deletion anomalies. ( Employee 103 quits)
The Normalization Process
Database Tables and Normalization
Conversion to 1NF

Repeating groups – a group of multiple entries can exist for any
single key attribute occurrence.

Repeating groups must be eliminated
Any project number (PROJ_NUM) can have a group of several
data entries.

A relational table must not contain repeating groups.
Step 1: Eliminate the Repeating Groups –
Repeating groups can be eliminated by adding the
appropriate entry in at least the primary key column(s).
Step 2: Identify the Primary Key
•
Uniquely identifies attribute values (rows)
•
Combination of PROJ_NUM and EMP_NUM
Database Tables and Normalization
Step 3: Identify all Dependencies
Dependency Diagram





The primary key components are bold, underlined, and
shaded in a different color.
The arrows above entities indicate
all desirable dependencies ( dependencies based on PK )
The arrows below the dependency diagram indicate less
desirable dependencies –
– partial dependencies
( dependencies based on only a part of PK )
– transitive dependencies
( nonprime attribute → nonprime attribute )
Prime attribute = Key attribute
Nonprime attribute = Nonkey attribute
Database Tables and Normalization
 EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR
 PROJ_NUM → PROJ_NAME
 JOB_CLASS → CHG_HOUR
Database Tables and Normalization
1NF Definition
 The
term first normal form (1NF) describes the
tabular format in which:



All the key attributes are defined.
There are no repeating groups in the table.
Each row/col intersection can contain one and only one
value, not set of values.
All attributes are dependent on the primary key.
 All relational tables satisfy the 1NF requirements.
1NF Drawback
 Partial
dependencies (EMP_NUM → EMP_NAME, JOB_CLASS_,
CHG_HOUR)
→→ data redundancies
→→ data anomalies
Database Tables and Normalization
Conversion to 2NF
 Step


1: Identify All Key Components
Writing each key component on a separate line, and then
writing the original key on the last line and
PROJ_NUM
EMP_NUM
PROJ_NUM, EMP_NUM
 Step

2: Identify the Dependent Attributes
Writing the dependent attributes after each new key.
PROJECT ( PROJ_NUM, PROJ_NAME)
EMPLOYEE ( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN ( PROJ_NUM, EMP_NUM, HOURS)
Database Tables and Normalization
 2NF Definition

A table is in 2NF if:

It is in 1NF and

It includes no partial dependencies;
that is, no attribute is dependent on only portion of primary key.

A table whose primary key is not composite
must automatically be in 2NF.
BECAUSE a partial dependency can exist only if
a table has a composite primary key

It is still possible for a table in 2NF to exhibit transitive dependency;
that is, one or more attributes may be functionally dependent on nonkey
attributes.
 2NF Drawback

Transitive dependencies (JOB_CLASS →
→→ data redundancies
→→ data anomalies
CHG_HOUR)
Database Tables and Normalization
Conversion to 3NF
 Create
a separate table with attributes in a transitive
functional dependence relationship.
Step 1: Identify Each New Determinant
JOB_CLASS
 Step 2: Identify the Dependent Attributes
JOB_CLASS → CHG_HOUR
 Step 3: Remove the Dependent Attributes from Transitive
Dependencies
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)

PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
Database Tables and Normalization
3NF Definition
A
table is in 3NF if:


It is in 2NF and
It contains no transitive dependencies.
Improving the Design
 Table structures are cleaned up to eliminate the
troublesome initial partial and transitive
dependencies
 Normalization
cannot, by itself, be relied on to make good designs
 It is valuable
because its use helps eliminate data redundancies
Improving the Design
Issues to address in order to produce a good
normalized set of tables:
 Evaluate
PK Assignments
 Evaluate Naming Conventions
 Refine Attribute Atomicity
 Identify New Attributes
 Identify New Relationships
 Refine Primary Keys as Required for Data Granularity
 Maintain Historical Accuracy
 Evaluate Using Derived Attributes
Improving the Design
Adding
relationships
 Project’s
manager
 EMP_NUM as a FK in PROJECT.
3NF
Project
manager
Improving the Design
PK
assignment
 JOB_CODE
Naming
conventions
 JOB_CHG_HOUR
 JOB_CLASS
>> JOB_DESCRIPTION
2NF
Improving the Design
3NF
atomicity
 EMP_NAME >> EMP_LNAME,EMP_FNAME,EMP_INITIAL
 Adding attributes
JOB_CLASS
 EMP_HIREDATE
 Attribute
Improving the Design



Refining PKs
 (EMP_NUM+PROJ_NUM) >> ASSIGN_NUM
Maintaining historical accuracy
3NF
 ASSIGN_CHG_HOUR <<>> JOB_CHG_HOUR
Using derived attributes
 ASSIGN_CHARGE = ASSIGN_HOURS × ASSIGN_CHG_HOUR
Surrogate Key Considerations
 When primary key is considered to be unsuitable,
designers use system-defined surrogate keys
 The DBMS can be used to have the system assign
the PK values (JOB_CODE)
>> to ensure entity integrity
Limitations on system-defined surrogate keys
 Data
entries in Table 5.3 are inappropriate
because they duplicate existing records
 However, it does not prevent us from making the entries
shown in Table 5.3.
>> Multiple duplicate records problem
 We still must ensure the uniqueness in
JOB_DESCRIPTION through the use of a unique index.
Database Tables and Normalization
Boyce-Codd Normal Form (BCNF)

A table is in Boyce-Codd normal form (BCNF)
if every determinant in the table is a candidate key.
(A determinant is any attribute whose value determines other
values with a row.)

If a table contains only one candidate key,
the 3NF and the BCNF are equivalent.

BCNF can be violated
only if the table contains more than one candidate key

BCNF is a special case of 3NF.

Figure 5.7 illustrates a table that is in 3NF but not in BCNF.

Figure 5.8 shows how the table can be decomposed to conform
to the BCNF form.
A Table That Is In 3NF But Not In BCNF

A + B → C, D

C → B : Not transitive dependencies
(A nonkey attribute is the determinant of a key attribute)
→ → 3NF
C
: Not candidate key → → Not In BCNF
The Decomposition of a Table Structure to meet BCNF Requirements

A + B → C, D

C→B
Change the PK
to A+C

A + C → B, D

C→B

A+C→D

C→B
The Boyce-Codd Normal Form (BCNF)

STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE

CLASS_CODE → STAFF_ID
Decomposition into BCNF
Figure 5.9
Normalization and Database Design
 Normalization should be part of the design process
 E-R Diagram provides macro view
 Normalization provides micro view of entities


Focuses on characteristics of specific entities
A micro view of the entities within the ER diagram
 Difficult to separate normalization from E-R diagramming
 Two techniques should be used concurrently
Normalization and Database Design
 Database Design and Normalization Example:
(Construction Company)

Summary of Operations:

The company manages many projects.

Each project requires the services of many employees.

An employee may be assigned to several different projects.

Some employees are not assigned to a project and perform duties
not specifically related to a project.

Some employees are part of a labor pool, to be shared by all project
teams.

Each employee has a (single) primary job classification. This job
classification determines the hourly billing rate.

Many employees can have the same job classification.
Normalization and Database Design
Two Initial Entities:
[ 3NF ]

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
JOB_DESCRIPTION, JOB_CHG_HOUR)
No partial dep.
Transitive dep. JOB_DESCRIPTION → JOB_CHG_HOUR [ 2NF ]
Normalization and Database Design
Normalization and Database Design
 Three Entities After Transitive Dependency Removed
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE
( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)
EMPLOYEE
JOB
( JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
The Modified ERD For A Contracting Company
Because the normalization process yields an additional
entity (JOB),
we modify the initial ERD.
M
Is held by
Normalization and Database Design
Normalization and Database Design
Normalization and Database Design
 Creation of the Composite Entity ASSIGNMENT
ASSIGNMENT
Is held by
The Final ( Implementable) ERD for the Contracting Company
Normalization and Database Design
Normalization and Database Design
 Attribute ASSIGN_HOUR is assigned to the composite
entity ASSIGN.
 “Manages” relationship is created between EMPLOYEE
and PROJECT.
Manages
PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)
ASSIGNMENT (ASSIGN_NUM,
ASSIGN_DATE , ASSIGN_HOURS, ASSIGN_CHG_HOUR , ASSIGN_CHARGE,
EMP_NUM,
PROJ_NUM)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
EMP_HIREDATE, JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
Normalization and Database Design
Higher-Level Normal Forms
In some databases,
multiple multivalued attributes exist
4NF Definition
A
table is in 4NF
if it is in 3NF and
has no multiple sets of multivalued dependencies.
Higher-Level Normal Forms



An employee can have multiple assignments and can also be
involved in multiple service organization.
EMP_SERVICE (volunteer work) and EMP_ASSIGN (assigned
project)
Independent
each may have many different values.
The table contain two sets of multivalued dependencies.
1
A Set of Tables in 4NF
 The solution is to eliminate the problems caused by
independent multivalued dependencies.
ORGANIZATION
M
Service
N
EMPLOYEE
M
Assign
N
PROJECT
Denormalization
 Creation of normalized relations is important database
design goal
 Processing requirements should also be a goal
 If tables decomposed to conform to normalization
requirements

Number of database tables expands
 Joining larger number of tables
takes additional disk input/output (I/O) operations and
processing logic

Reduces system speed
Denormalization
 Normalization is only one of many database design
goals.
 Normalized (decomposed) tables require additional
processing, ( Join : additional I/O operations )
reduce system speed.
 CUSTOMER( CUS_NUM, CUS_NAME, … , ZIP_CODE, CITY)
transitive dep.
Is it really practical?
ZIP(ZIP_CODE, CITY)
 Some degree of denormalization
→→ increase processing speed
Denormalization
 Normalization purity is often difficult to sustain in the modern
database environment.
 The conflict between design efficiency, information requirements,
and processing speed are often resolved through compromises that
include denormalization.
 In fact, we used a 2NF structure in the JOB table (Fig.5.6) to
decrease the likelihood of referential integrity violations.


JOB table

Original PK: JOB_CLASS

New PK: JOB_CODE
EMPLOYEE table

Original FK: JOB_CLASS

New FK: JOB_CODE
 Use denormalization cautiously.
The Initial 1NF Structure
Identifying the Possible PK Attributes
Table Structures Based On The Selected PKs
Foreign key
Summary
Summary