Database Design
Normal forms
& Normalization
Compiled by S. Z.
Learning Objectives
• Data integrity
• Data loss vs. lossless data
• Problems with bad database design:
– Incompleteness (easy to fix: collect all necessary attributes)
– Redundancy
• Attribute decomposition
• Identifying key attributes and functional dependencies among attributes
• Normal forms:
– 1NF
– 2NF
– 3NF
– BCNF
– 4NF
• What normalization is
• How tables can be transformed from lower normal forms to higher normal forms
• What role normalization plays in the database design process
• How normalization and ER modeling are used concurrently to produce a good database design
• Why some situations require denormalization to generate information efficiently
• Database (RM schema) design is more challenging
than manipulating tables.
• A correct (good) design is essential to writing
efficient queries against the data stored in the
database. It is therefore important to learn the
principles of good design.
Database design basics
• Certain principles guide the database design process.
– The first principle is that duplicate information (also called
redundant data) is bad, because it wastes space and increases the
likelihood of errors and inconsistencies.
– The second principle is that the correctness and completeness of
information is important. If your database contains incorrect
information, any reports that pull information from the database
will also contain incorrect information. As a result, any decisions
you make that are based on those reports will then be misinformed.
Guide to Oracle 10g
What is a good database schema
design?
• A good database design is, therefore, one that:
– Divides your information into subject-based tables to reduce redundant data: minimize unnecessary redundancies, yet keep necessary ones
– Keeps data atomic
– Helps support and ensure the accuracy, completeness, consistency, and integrity of your information
– Provides Access with the information it requires to join the information in the tables together as needed
– Accommodates your data processing and reporting needs
– Provides you with access to up-to-date, accurate information
– Easily accommodates change
The design process
• The design process consists of the following steps:
– Determine the purpose of your database. This helps prepare you for the remaining steps.
– Find and organize the information required. Gather all of the types of information you might want to record in the database, such as product name and order number.
– Divide the information into tables. Divide your information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
– Turn information items into columns. Decide what information you want to store in each table. Each item becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields such as Last Name and Hire Date.
– Specify primary keys. Choose each table’s primary key. The primary key is a column that is used to uniquely identify each row. An example might be Product ID or Order ID.
– Set up the table relationships. Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
– Refine your design. Analyze your design for errors. Create the tables and add a few records of sample data. See if you can get the results you want from your tables. Make adjustments to the design, as needed.
– Apply the normalization rules. Apply the data normalization rules to see if your tables are structured correctly. Make adjustments to the tables, as needed.
Data Integrity
• Data integrity refers to maintaining and assuring the accuracy and
consistency of data over its entire life cycle. It is a critical aspect of
the design, implementation, and usage of any database system that stores,
processes, or retrieves data.
• Data integrity is the opposite of
– Incomplete data and data corruption, which are forms of data loss
– Data redundancy
Problems with unnormalized data
• Contains redundancy
• Contains multi-valued, non-atomic fields
• Does not have a primary key identified
• Causes data anomalies
• Etc.
Example of Unnormalized Data
The idea is simple!
• This is a table design problem.
Issues of bad database design
• Another Example: Company that manages
building projects
– Charges its clients by billing hours spent on each
contract
– Hourly billing rate is dependent on employee’s
position
– Periodically, report is generated that contains
information displayed in Table 5.1
• Structure of data set in Figure 5.1 does not handle
data very well
• The table structure appears to work; report
generated with ease
• Unfortunately, report may yield different results
depending on what data anomaly has occurred
How to measure goodness of database
design?
• In the relational model, methods exist for
quantifying how efficient a database is.
• These classifications are called normal forms (or
NF).
What is relational model schema design about?
• Decide what information (in the mini world of
your application) you need, how to divide that
information into the appropriate tables and
columns, and how those tables relate to each other.
Functional dependency (FD)
• https://en.wikipedia.org/wiki/Functional_dependency
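A functional dependency X → Y says that rows agreeing on X must also agree on Y. The following is a minimal Python sketch of that test over sample rows. The function name `fd_holds` and the sample `students` data are hypothetical, introduced only for illustration. Note that sample data can only refute a dependency; whether it truly holds is a statement about the application's rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for a student table
students = [
    {"ssn": "111", "age": 20, "state": "NY"},
    {"ssn": "222", "age": 20, "state": "CA"},
    {"ssn": "333", "age": 31, "state": "NY"},
]

print(fd_holds(students, ["ssn"], ["age"]))    # True: ssn determines age here
print(fd_holds(students, ["age"], ["state"]))  # False: age 20 maps to NY and CA
```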
Normal Form
• Edgar F. Codd originally established three normal
forms: 1NF, 2NF and 3NF.
• The normal forms are progressive, so to achieve
Second Normal Form, the tables must already be
in First Normal Form.
• 2NF is better than 1NF; 3NF is better than 2NF
More advanced normal forms
• There are now others (BCNF, 4NF, 5NF) that are generally accepted, but 3NF is widely considered to be sufficient for most applications.
– These are outside the scope of the 242 course and will be studied in the 342 course.
• Most tables, on reaching 3NF, are also in BCNF (Boyce-Codd Normal Form).
• The highest level of normalization is not always the most desirable; tradeoffs among various factors need to be considered. For example, tables are sometimes denormalized to reduce I/O, which increases processing speed.
When do you need to check normal forms
• Tables mapped from an ER model usually meet
3NF, but there is no guarantee.
• Therefore, once you have a database design with
tables mapped from the constructs of an ER diagram,
you need to check each table for up to 3NF.
• If a table meets 3NF, the table is usually in
good shape. Otherwise, the table usually needs to
be further decomposed.
What keys are important?
• Candidate keys vs primary keys
Dependency and partial dependency
• What is dependency?
– If you look at two attributes (in a table), there are two kinds
of relationship.
• Independent from each other, for example age and state in a
student table.
• One depends on the other, or in other words, one determines the
other. For example, ssn and age: age depends on ssn. If you
know a person’s ssn, you know his/her age. This shows the
dependency of age on ssn, which is usually the key.
• Partial dependency (when the primary key consists
of multiple fields)
– Fields within the table are dependent on only part of the
primary key
Dependency vs. determinant
• A determinant is the reverse side of a
dependency: if B depends on A, then A is the determinant of B.
• A determinant is any attribute (simple or
composite) on which some other attribute is fully
functionally dependent.
First normal form (1NF)
– Primary key field identified
– No multi-valued attributes, no composite attributes,
i.e. each attribute is atomic, one value for each
attribute.
• Applies to every relation
Another Example Table 2
| Title | Author1 | Author2 | ISBN | Subject | Pages | Publisher |
|-------+---------+---------+------+---------+-------+-----------|
| Database System Concepts | Abraham Silberschatz | Henry F. Korth | 0072958863 | MySQL, Computers | 1168 | McGraw-Hill |
| Operating System Concepts | Abraham Silberschatz | Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill |
Similar problems with Table 2
• This table is not very efficient with storage.
• This design does not protect data integrity.
• This table does not scale well.
First Normal Form
• In our Table 2, we have two violations of First
Normal Form:
– First, we have more than one author field,
– Second, our subject field contains more than one
piece of information. With more than one value in
a single field, it would be very difficult to search
for all books on a given subject.
First Normal Table
• Table 3
| Title | Author | ISBN | Subject | Pages | Publisher |
|-------+--------+------+---------+-------+-----------|
| Database System Concepts | Abraham Silberschatz | 0072958863 | MySQL | 1168 | McGraw-Hill |
| Database System Concepts | Henry F. Korth | 0072958863 | Computers | 1168 | McGraw-Hill |
| Operating System Concepts | Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill |
| Operating System Concepts | Abraham Silberschatz | 0471694665 | Computers | 944 | McGraw-Hill |
First Normal Form (continued)
• We now have two rows for a single book.
Additionally, we would be violating the Second
Normal Form…
• A better solution to our problem would be to separate
the data into separate tables, an Author table and a
Subject table, to store our information, removing that
information from the Book table:
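Subject Table
| Subject_ID | Subject |
|------------+---------|
| 1 | MySQL |
| 2 | Computers |
Author Table
| Author_ID | Last Name | First Name |
|-----------+-----------+------------|
| 1 | Silberschatz | Abraham |
| 2 | Korth | Henry |
Book Table
| ISBN | Title | Pages | Publisher |
|------+-------+-------+-----------|
| 0072958863 | Database System Concepts | 1168 | McGraw-Hill |
| 0471694665 | Operating System Concepts | 944 | McGraw-Hill |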
• Each table has a primary key, used for joining tables
together when querying the data. A primary key value
must be unique within the table (no two books can
have the same ISBN), and a primary key is
also indexed, which speeds up data retrieval based on
the primary key.
• Now to define relationships between the tables
Relationships
Book_Author Table
| ISBN | Author_ID |
|------+-----------|
| 0072958863 | 1 |
| 0072958863 | 2 |
| 0471694665 | 1 |
| 0471694665 | 2 |
Book_Subject Table
| ISBN | Subject_ID |
|------+------------|
| 0072958863 | 1 |
| 0072958863 | 2 |
| 0471694665 | 2 |
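As a rough illustration of how the Book, Author, and Book_Author tables fit together, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The lowercase table and column names and the choice of SQLite are assumptions made for this example; the slides do not prescribe a particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (
    isbn TEXT PRIMARY KEY, title TEXT, pages INTEGER, publisher TEXT
);
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT
);
-- Junction table: one row per (book, author) pair
CREATE TABLE book_author (
    isbn TEXT REFERENCES book(isbn),
    author_id INTEGER REFERENCES author(author_id),
    PRIMARY KEY (isbn, author_id)
);
INSERT INTO book VALUES ('0072958863', 'Database System Concepts', 1168, 'McGraw-Hill');
INSERT INTO book VALUES ('0471694665', 'Operating System Concepts', 944, 'McGraw-Hill');
INSERT INTO author VALUES (1, 'Silberschatz', 'Abraham');
INSERT INTO author VALUES (2, 'Korth', 'Henry');
INSERT INTO book_author VALUES ('0072958863', 1);
INSERT INTO book_author VALUES ('0072958863', 2);
INSERT INTO book_author VALUES ('0471694665', 1);
INSERT INTO book_author VALUES ('0471694665', 2);
""")

# Join through the junction table to list the authors of one book
authors_of_book = conn.execute(
    """
    SELECT a.last_name
    FROM book_author AS ba
    JOIN author AS a ON a.author_id = ba.author_id
    WHERE ba.isbn = ?
    ORDER BY a.last_name
    """,
    ("0072958863",),
).fetchall()
print(authors_of_book)  # [('Korth',), ('Silberschatz',)]
```

Each author name is stored once in `author`; the many-to-many relationship lives entirely in `book_author`.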
Second normal form (2NF)
Normalization (continued)
• Second normal form (2NF)
– In 1NF
– No partial dependencies
Normalization (continued)
• Basic procedure for identifying partial
dependency:
– Look at each field that is not part of the composite
primary key
– Make certain that both parts of the composite key
are required to determine the value of the
data element
Second Normal Form (2NF)
• As the First Normal Form deals with redundancy
of data across a horizontal row, Second Normal
Form (or 2NF) deals with redundancy of data in
vertical columns.
• The Book Table will be used for the 2NF example
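2NF Table
Publisher Table
| Publisher_ID | Publisher Name |
|--------------+----------------|
| 1 | McGraw-Hill |
Book Table
| ISBN | Title | Pages | Publisher_ID |
|------+-------+-------+--------------|
| 0072958863 | Database System Concepts | 1168 | 1 |
| 0471694665 | Operating System Concepts | 944 | 1 |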
2NF
• Here we have a one-to-many relationship between the
book table and the publisher. A book has only one
publisher, and a publisher will publish many books.
When we have a one-to-many relationship, we place a
foreign key in the Book Table, pointing to the
primary key of the Publisher Table.
• The other requirement for Second Normal Form is
that you cannot have any data in a table with a
composite key that does not relate to all portions of
the composite key.
Bad Example
• studenttable
– (student ID, course ID, course Name, grade)
• In this case, course name depends on course ID
only, which is a partial dependency. The table thus violates
2NF.
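The check for partial dependencies can be sketched in Python as follows, assuming the functional dependencies of studenttable have already been written down by the designer. The function name `partial_dependencies` and the snake_case attribute names are hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is only part of the composite key.

    fds is a list of (determinant, dependents) pairs of attribute names.
    """
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        nonkey_dependents = [a for a in rhs if a not in key]
        # A proper subset of the key that determines a nonkey attribute
        if set(lhs) < key and nonkey_dependents:
            found.append((list(lhs), nonkey_dependents))
    return found


key = ["student_id", "course_id"]
fds = [
    (["student_id", "course_id"], ["grade"]),
    (["course_id"], ["course_name"]),
]

violations = partial_dependencies(key, fds)
print(violations)  # [(['course_id'], ['course_name'])]
```

An empty result would mean that, with respect to the listed dependencies, the table has no partial dependencies.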
Third normal form (3NF)
Normalization (continued)
• Third normal form (3NF)
– In 2NF
– No transitive dependencies, i.e. the non-primary-key
attributes should be mutually independent
• Table is in 3NF when it is in 2NF and there are no
transitive dependencies
• Transitive dependency
– Field is dependent on another field within the table
that is not the primary key field
Third Normal Form
• Third normal form (3NF) requires that there are
no functional dependencies of non-key attributes
on something other than a candidate key.
• A table is in 3NF if all of the non-primary key
attributes are mutually independent
• There should not be transitive dependencies
Bad Example
• Studenttable2
– (sid, student, state, state governor)
– In this case, state depends on sid, while state
governor depends on state and therefore, through state,
on sid. So there is a transitive dependency of state
governor on sid via state
– This violates 3NF
• For most business database design purposes, 3NF
is as high as we need to go in the normalization
process
• 3NF does not deal satisfactorily with the case of a
relation with overlapping candidate keys, i.e.
multiple composite candidate keys with at least
one attribute in common.
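To make the Studenttable2 example concrete, the sketch below removes the transitive dependency by projecting the rows onto two smaller tables, one keyed by sid and one keyed by state. The sample rows, the placeholder governor names, and the helper `project` are hypothetical and serve only to illustrate the idea.

```python
def project(rows, attributes):
    """Project rows onto the given attributes and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attributes) for row in rows})


# Hypothetical rows; the governor names are placeholders
studenttable2 = [
    {"sid": 1, "student": "Ann", "state": "NY", "state_governor": "Governor A"},
    {"sid": 2, "student": "Bob", "state": "NY", "state_governor": "Governor A"},
    {"sid": 3, "student": "Cy", "state": "CA", "state_governor": "Governor B"},
]

# sid -> student, state stays in the student table
student_table = project(studenttable2, ["sid", "student", "state"])
# state -> state_governor moves to its own table, keyed by state
state_table = project(studenttable2, ["state", "state_governor"])

print(student_table)
print(state_table)  # each governor is now stored once per state
```

After the split, a change of governor touches a single row of `state_table` instead of every student row for that state.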
The Boyce-Codd Normal Form
(BCNF)
• BCNF requires that the table is
– in 3NF
– and that its only determinants are candidate keys
• Every determinant in the table is a candidate key
– A candidate key has the same characteristics as a primary key but, for
some reason, was not chosen to be the primary key
• When a table contains only one candidate key,
3NF and BCNF are equivalent
• BCNF can be violated only when a table contains
more than one candidate key
The Boyce-Codd Normal Form
(BCNF) (continued)
• Most designers consider BCNF a special case
of 3NF; therefore, BCNF is sometimes called
3.5NF.
• A table can be in 3NF and still fail to meet BCNF
– It contains no partial dependencies and no
transitive dependencies
– Yet a nonkey attribute is the determinant of a key
attribute
• If a relational schema is in BCNF then all redundancy
based on functional dependency has been removed,
although other types of redundancy may still exist.
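One common way to test the BCNF condition is to compute attribute closures and flag every nontrivial dependency whose determinant is not a superkey. The sketch below does this in Python. The (student, course, instructor) schema and its dependencies are a hypothetical textbook-style example, not taken from the slides; in it, instructor is outside the chosen primary key {student, course} yet determines course, which is part of that key.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return nontrivial FDs whose determinant is not a superkey."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not schema <= closure(lhs, fds)
    ]


# Hypothetical schema with overlapping candidate keys:
# {student, course} and {student, instructor}
schema = ["student", "course", "instructor"]
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

violations = bcnf_violations(schema, fds)
print(violations)  # [(('instructor',), ('course',))]
```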
Fourth normal form (4NF)
• A table in 3NF may contain multivalued
dependencies that produce either numerous null
values or redundant data
• It may be necessary to convert a 3NF table to
fourth normal form (4NF) by splitting the table to
remove multivalued dependencies
Fourth Normal Form (4NF)
• A relation is in 4NF if it is already in 3NF and has no multi-valued
dependencies.
• Table is in fourth normal form (4NF) when both of the following are
true:
– It is in 3NF
– It has no multiple sets of independent multivalued dependencies
(independence is discussed in later slides); i.e., a record type can
contain at most one multi-valued fact about an entity.
• 4NF is largely academic if tables conform to the following two rules:
– All attributes must be dependent on primary key, but independent
of each other
– No row contains two or more multivalued facts about an entity
Multiple Multi-valued dependency
• Multiple multi-valued dependencies exist when
– There are at least three attributes A, B, and C in a
relation and
– For each value of A there is a well-defined set of
values for B, and a well-defined set of values for C,
but the set of values of B is independent of set C.
– Every possible combination of the two multi-valued
attributes has to be stored in the database, thus leading
to redundancy and consequent anomalies.
| Course ID | Instructor | Textbook |
|-----------+------------+----------|
| CSCI242 | Zhang | Intro to MYSQL |
| CSCI242 | Allison | MYSQL |
| CSCI242 | Zhang | Oracle |
| CSCI242 | Allison | Intro to MYSQL |
• This is OK if all three fields combined are designated
as the primary key of the table.
• However, the table contains multiple (two in this case)
multi-valued sets: instructors and textbooks, each with
respect to course ID.
• By splitting the above relation into two relations and
placing the multi-valued attributes in each table by
themselves, we can convert the above to 4NF
• Course-INST(course-ID, Instructor)
• Course-TEXT(course-ID, Textbook)
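The split into Course-INST and Course-TEXT amounts to two projections of the original rows. A minimal Python sketch follows, using the course rows from the table above; the variable names are hypothetical.

```python
# Rows of (course ID, instructor, textbook) from the example table
course_rows = [
    ("CSCI242", "Zhang", "Intro to MYSQL"),
    ("CSCI242", "Allison", "MYSQL"),
    ("CSCI242", "Zhang", "Oracle"),
    ("CSCI242", "Allison", "Intro to MYSQL"),
]

# Course-INST(course-ID, Instructor): one row per instructor of a course
course_inst = sorted({(course, inst) for course, inst, _ in course_rows})

# Course-TEXT(course-ID, Textbook): one row per textbook of a course
course_text = sorted({(course, text) for course, _, text in course_rows})

print(course_inst)
print(course_text)
```

Each instructor and each textbook now appears once per course, so adding a textbook no longer requires one new row per instructor.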
• Other problems caused by violating fourth normal form are similar in spirit
to those mentioned earlier for violations of second or third normal form.
They take different variations depending on the chosen maintenance policy:
• If there are repetitions, then updates have to be done in multiple records, and
they could become inconsistent.
• Insertion of a new skill may involve looking for a record with a blank skill,
or inserting a new record with a possibly blank language, or inserting
multiple records pairing the new skill with some or all of the languages.
• Deletion of a skill may involve blanking out the skill field in one or more
records (perhaps with a check that this doesn't leave two records with the
same language and a blank skill), or deleting one or more records, coupled
with a check that the last mention of some language hasn't also been deleted.
• Fourth normal form minimizes such update problems.
Independence
• We mentioned independent multi-valued facts earlier, and we now illustrate what we mean
in terms of the example. The two many-to-many relationships, employee:skill and
employee:language, are "independent" in that there is no direct connection between skills and
languages. There is only an indirect connection because they belong to some common
employee. That is, it does not matter which skill is paired with which language in a record; the
pairing does not convey any information. That's precisely why all the maintenance policies
mentioned earlier can be allowed.
• In contrast, suppose that an employee could only exercise certain skills in certain languages.
Perhaps Smith can cook French cuisine only, but can type in French, German, and Greek.
Then the pairings of skills and languages become meaningful, and there is no longer an
ambiguity of maintenance policies. In the present case, only the following form is correct:
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
• Thus the employee:skill and employee:language relationships are
no longer independent. These records do not violate fourth normal form. When there is an
interdependence among the relationships, then it is acceptable to represent them in a single
record.
4.1.2 Multivalued Dependencies
• For readers interested in pursuing the technical background of fourth normal form a bit further, we mention
that fourth normal form is defined in terms of multivalued dependencies, which correspond to our
independent multi-valued facts. Multivalued dependencies, in turn, are defined essentially as relationships
which accept the "cross-product" maintenance policy mentioned above. That is, for our example, every one
of an employee's skills must appear paired with every one of his languages. It may or may not be obvious to
the reader that this is equivalent to our notion of independence: since every possible pairing must be
present, there is no "information" in the pairings. Such pairings convey information only if some of them
can be absent, that is, only if it is possible that some employee cannot perform some skill in some language.
If all pairings are always present, then the relationships are really independent.
• We should also point out that multivalued dependencies and fourth normal form apply as well to
relationships involving more than two fields. For example, suppose we extend the earlier example to
include projects, in the following sense:
• An employee uses certain skills on certain projects.
• An employee uses certain languages on certain projects.
• If there is no direct connection between the skills and languages that an employee uses on a project, then we
could treat this as two independent many-to-many relationships of the form EP:S and EP:L, where "EP"
represents a combination of an employee with a project. A record including employee, project, skill, and
language would violate fourth normal form. Two records, containing fields E,P,S and E,P,L, respectively,
would satisfy fourth normal form.
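The "cross-product" reading of independence can be checked mechanically: for each employee, the stored rows must equal every pairing of that employee's skills with that employee's languages. Here is a minimal Python sketch; the function name `is_cross_product` and the second sample data set are hypothetical, while the first data set follows the Smith example.

```python
from itertools import product


def is_cross_product(rows):
    """True if, per employee, every skill is paired with every language."""
    rows = set(rows)
    for employee in {e for e, _, _ in rows}:
        skills = {s for e, s, _ in rows if e == employee}
        languages = {lang for e, _, lang in rows if e == employee}
        expected = {(employee, s, lang) for s, lang in product(skills, languages)}
        actual = {r for r in rows if r[0] == employee}
        if actual != expected:
            return False
    return True


# Smith cooks French cuisine only, but types in three languages:
# the pairings carry information, so the facts are dependent.
dependent_rows = [
    ("Smith", "cook", "French"),
    ("Smith", "type", "French"),
    ("Smith", "type", "German"),
    ("Smith", "type", "Greek"),
]

# Hypothetical data where every skill is paired with every language
independent_rows = [
    ("Smith", "cook", "French"),
    ("Smith", "cook", "German"),
    ("Smith", "type", "French"),
    ("Smith", "type", "German"),
]

print(is_cross_product(dependent_rows))    # False
print(is_cross_product(independent_rows))  # True
```

When the check succeeds, the pairings carry no information and the record can be split into employee:skill and employee:language records.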
Fifth normal form (5NF)
• Fifth normal form (5NF), also known as project-join
normal form (PJ/NF), is a level of database
normalization designed to reduce redundancy in
relational databases recording multi-valued facts by
isolating semantically related multiple relationships.
• A relation is said to be in the 5NF if and only if every
non-trivial join dependency in it is implied by
the candidate keys.
• A join dependency *{A, B, … Z} on R is implied by
the candidate key(s) of R if and only if each of A, B,
…, Z is a superkey for R.
Normal Forms
1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF ⊃ 4NF ⊃ 5NF (each higher normal form is stricter than the one before it)
UNAVOIDABLE REDUNDANCIES
Normalization certainly doesn't remove all
redundancies.
Certain redundancies seem to be unavoidable,
particularly when several multivalued facts are
dependent rather than independent.
INTER-RECORD REDUNDANCY
The normal forms discussed here deal only with redundancies occurring
within a single record type. Fifth normal form is considered to be the
"ultimate" normal form with respect to such redundancies.
Other redundancies can occur across multiple record types. For the
example concerning employees, departments, and locations, the
following records are in third normal form in spite of the obvious
redundancy:
| EMPLOYEE | DEPARTMENT |
| DEPARTMENT | LOCATION |
| EMPLOYEE | LOCATION |
In fact, two copies of the same record type would constitute the ultimate
in this kind of undetected redundancy. This topic is beyond the scope of this course.
7 CONCLUSION
While we have tried to present the normal forms in a simple and
understandable way, we are by no means suggesting that the data
design process is correspondingly simple. The design process involves
many complexities which are quite beyond the scope of this paper. In
the first place, an initial set of data elements and records has to be
developed, as candidates for normalization. Then the factors affecting
normalization have to be assessed:
•Single-valued vs. multi-valued facts.
•Dependency on the entire key.
•Independent vs. dependent facts.
•The presence of mutual constraints.
•The presence of non-unique or non-singular representations.
And, finally, the desirability of normalization has to be assessed, in
terms of its performance impact on retrieval applications.
Judging whether a person is healthy is probably easier than
making a person healthy, to say the least in some cases!
Now we know how to verify whether or not a database
design conforms to the normal forms.
Still, we want to know how to design databases that meet
the expectations of the normal forms.
There are algorithms for converting a given “bad”
database design into increasingly better designs.
Database Normalization
• Database normalization is the process of removing
redundant data from your tables in order to improve storage
efficiency, data integrity, and scalability, by removing
un-normalized relationships between the attributes in the
same table.
• Normalization generally involves splitting existing tables
into multiple ones, which may need to be re-joined or
linked each time when any query involving the multiple
tables is issued.
• If you want to normalize data, normalize at the higher level
first, i.e., normalize the table, the meta data of data.
History
• Edgar F. Codd first proposed the process of
normalization and what came to be known as the 1st
normal form in his paper A Relational Model of
Data for Large Shared Data Banks.
• Codd stated:
“There is, in fact, a very simple elimination
procedure which we shall call normalization.
Through decomposition nonsimple domains are
replaced by ‘domains whose elements are atomic
(nondecomposable) values.’”
Database Tables and Normalization
• Normalization
– Step-by-step process used to determine which data elements should be stored in which tables
– Purpose
• Evaluate and correct table structures
• Minimize data redundancy without losing information
• Reduce data anomalies
• Multiple levels of normalization
– Works through a series of stages called normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
The Normalization Process
• Each table represents a single subject
• No data item will be unnecessarily stored in more
than one table
• All attributes in a table are dependent on the
primary key
Conversion to First Normal Form
• Repeating group
– Derives its name from the fact that a group of
multiple entries of same type can exist for any
single key attribute occurrence
• Relational table must not contain repeating groups
• Normalizing table structure will reduce data
redundancies
• Normalization is a three-step procedure
Conversion to First Normal Form
(continued)
• Step 1: Eliminate the Repeating Groups
– Present data in tabular format, where each cell has
single value and there are no repeating groups
– Eliminate repeating groups, eliminate nulls by
making sure that each repeating group attribute
contains an appropriate data value
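Step 1 can be sketched in Python as flattening a nested record into one row per group entry, so that every cell holds a single value. The project and employee data below, including every attribute name, are hypothetical and are not taken from the tables or figures the slides reference.

```python
# Hypothetical unnormalized data: each project carries a repeating
# group of (employee number, employee name) entries
unnormalized = [
    {"proj_num": 15, "proj_name": "Evergreen",
     "employees": [(103, "June"), (101, "John")]},
    {"proj_num": 18, "proj_name": "Amber Wave",
     "employees": [(114, "Annelise")]},
]

# One flat row per entry; the project attributes are repeated in every
# row so that no cell is left empty (no nulls)
flat_rows = [
    {
        "proj_num": project["proj_num"],
        "proj_name": project["proj_name"],
        "emp_num": emp_num,
        "emp_name": emp_name,
    }
    for project in unnormalized
    for emp_num, emp_name in project["employees"]
]

for row in flat_rows:
    print(row)
```

The flat rows still repeat the project name for every employee; that redundancy is what the later conversion steps address.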
Conversion to First Normal Form
(continued)
• Step 2: Identify the Primary Key
– The primary key must uniquely identify every attribute value in a row
– If necessary, a new (composite) key must be composed
Conversion to First Normal Form
(continued)
• Step 3: Identify All Dependencies
– Dependencies can be depicted with help of a
diagram
– Dependency diagram:
• Depicts all dependencies found within given table
structure
• Helpful in getting bird’s-eye view of all
relationships among table’s attributes
• Makes it less likely that you will overlook an important
dependency
Conversion to First Normal Form
(continued)
• First normal form describes tabular format in
which:
– All key attributes are defined
– There are no repeating groups in the table
– All attributes are dependent on primary key
• All relational tables satisfy 1NF requirements
• Some tables contain partial dependencies
– Dependencies based on only part of the primary key
– Sometimes used for performance reasons, but should be
used with caution
– Still subject to data redundancies
Conversion to Second Normal Form
• Relational database design can be improved by
converting the database into second normal form
(2NF)
• Two steps
Conversion to Second Normal Form
(continued)
• Step 1: Write Each Key Component
on a Separate Line
– Write each key component on separate line, then
write original (composite) key on last line
– Each component will become key in new table
Conversion to Second Normal Form
(continued)
• Step 2: Assign Corresponding Dependent
Attributes
– Determine those attributes that are dependent on
other attributes
– At this point, most anomalies have been eliminated
Conversion to Second Normal Form
(continued)
• Table is in second normal form (2NF) when:
– It is in 1NF and
– It includes no partial dependencies:
• No attribute is dependent on only portion of primary
key
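The two conversion steps can be sketched in Python as follows: start one table per key component plus one for the full composite key, then assign each dependent attribute to the table whose key determines it. The function name `convert_to_2nf` and the student/course attributes are hypothetical. The sketch assumes every determinant is either a single key component or the whole key, and it drops component tables that end up with no dependent attributes.

```python
def convert_to_2nf(primary_key, dependencies):
    """Group nonkey attributes into tables keyed by their determinant.

    dependencies maps each nonkey attribute to the list of key
    components it depends on.
    """
    full_key = tuple(primary_key)
    # Step 1: each key component on its own line, then the composite key
    tables = {(component,): [] for component in primary_key}
    tables[full_key] = []
    # Step 2: assign each dependent attribute to its determinant
    for attribute, determinant in dependencies.items():
        tables[tuple(determinant)].append(attribute)
    # Keep the composite-key table and any component table that gained attributes
    return {key: attrs for key, attrs in tables.items() if attrs or key == full_key}


tables = convert_to_2nf(
    ["student_id", "course_id"],
    {"course_name": ["course_id"], "grade": ["student_id", "course_id"]},
)
print(tables)
```

Here `course_name` moves to a table keyed by `course_id` alone, while `grade` stays with the composite key.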
Conversion to Third Normal Form
• Data anomalies created are easily eliminated by
completing three steps
• Step 1: Identify Each New Determinant
– For every transitive dependency, write its
determinant as PK for new table
• Determinant
– Any attribute whose value determines other values within a row
Conversion to Third Normal Form
(continued)
• Step 2: Identify the Dependent Attributes
– Identify attributes dependent on each determinant
identified in Step 1 and identify dependency
– Name table to reflect its contents and function
Conversion to Third Normal Form
(continued)
• Step 3: Remove the Dependent Attributes from
Transitive Dependencies
– Eliminate all dependent attributes in transitive
relationship(s) from each of the tables that have
such a transitive relationship
– Draw new dependency diagram to show all tables
defined in Steps 1–3
– Check new tables as well as tables modified in Step
3 to make sure that each table has determinant and
that no table contains inappropriate dependencies
Conversion to Third Normal Form
(continued)
• A table is in third normal form (3NF) when both
of the following are true:
– It is in 2NF
– It contains no transitive dependencies
Improving the Design
• Table structures are cleaned up to eliminate
troublesome initial partial and transitive
dependencies
• Normalization cannot, by itself, be relied on to
make good designs
• It is valuable because its use helps eliminate data
redundancies
Improving the Design (continued)
• Issues to address in order to produce a good
normalized set of tables:
– Evaluate PK Assignments
– Evaluate Naming Conventions
– Refine Attribute Atomicity
– Identify New Attributes
– Identify New Relationships
– Refine Primary Keys as Required for Data Granularity
– Maintain Historical Accuracy
– Evaluate Using Derived Attributes
Surrogate Key Considerations
• When primary key is considered to be unsuitable,
designers use surrogate keys
• Data entries in Table 5.3 are inappropriate because
they duplicate existing records
– Yet there has been no violation of either entity
integrity or referential integrity
Normalization and Database Design
• Normalization should be part of design process
• Make sure that proposed entities meet required normal
form before table structures are created
• Many real-world databases have been improperly designed,
or have become burdened with anomalies through improper
modification over the course of time
• You may be asked to redesign and modify existing
databases
Normalization and Database Design
(continued)
• ER diagram
– Provides big picture, or macro view, of an
organization’s data requirements and operations
– Created through an iterative process
• Identifying relevant entities, their attributes and their
relationship
• Use results to identify additional entities and
attributes
Normalization and Database Design
(continued)
• Normalization procedures
– Focus on characteristics of specific entities
– Represents micro view of entities within ER
diagram
• Difficult to separate normalization process from
ER modeling process
• Two techniques should be used concurrently
Denormalization
• Creation of normalized relations is important
database design goal
• Processing requirements should also be a goal
• If tables decomposed to conform to normalization
requirements:
– Number of database tables expands
Denormalization (continued)
• Joining the larger number of tables takes
additional input/output (I/O) operations and
processing logic, thereby reducing system speed
• Conflicts between design efficiency, information
requirements, and processing speed are often
resolved through compromises that may include
denormalization
Denormalization (continued)
• Unnormalized tables in production database tend
to suffer from these defects:
– Data updates are less efficient because programs
that read and update tables must deal with larger
tables
– Indexing is more cumbersome
– Unnormalized tables yield no simple strategies for
creating virtual tables known as views
Denormalization (continued)
• Use denormalization cautiously
• Understand why—under some circumstances—
unnormalized tables are better choice
Summary
• Normalization is technique used to design tables
in which data redundancies are minimized
• First three normal forms (1NF, 2NF, and 3NF) are
most commonly encountered
• Table is in 1NF when all key attributes are defined
and when all remaining attributes are dependent
on primary key
Summary (continued)
• Table is in 2NF when it is in 1NF and contains
no partial dependencies
• Table is in 3NF when it is in 2NF and contains
no transitive dependencies
• Table that is not in 3NF may be split into new
tables until all of the tables meet 3NF
requirements
• Normalization is important part—but only
part—of design process