Lecture 3: Functional Dependency and Normal Forms
Prof. Sin-Min Lee
Department of Computer Science
Database Design Process
[Figure: Applications 1-4, each with its own conceptual requirements and its own External Model; the diagram also shows a Conceptual Model, a Logical Model, and an Internal Model.]
Database System Concepts
3.2
©Silberschatz, Korth and Sudarshan
Relational Database Model
Relations
Source: ESRI Advanced ArcInfo
Georelational Database Model
Attribute Relationships
Functional Dependency: refers to the relationships between attributes within a relation.
If the value of attribute A determines the value of attribute B, then attribute B is functionally dependent upon attribute A.
Functional Dependencies
X -> Y means:
• X functionally determines Y
• Y depends on X
• The values of the Y component depend on, and are determined by, the values of the X component
Functional Dependencies
Given any two tuples t1 and t2 of the relation:
• if t1[X] = t2[X] then t1[Y] = t2[Y]   (1)
• In other words, if the values of X are equal, then the values of Y are equal
• The values of the X component uniquely (functionally) determine the values of the Y component iff (1) holds
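Condition (1) can be checked mechanically on a given set of tuples. The following is a minimal sketch in Python, assuming a relation instance is held as a list of dictionaries; the function name fd_holds and the attribute names are illustrative, and the sample rows are a small extract of the surgeon numbers and names used in this lecture's surgery example.

```python
def fd_holds(rows, X, Y):
    """Check condition (1): tuples that agree on X must also agree on Y."""
    seen = {}  # maps each X value to the first Y value observed with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True


rows = [
    {"surgeon_no": 145, "surgeon_name": "Beth Little", "patient_no": 1111},
    {"surgeon_no": 145, "surgeon_name": "Beth Little", "patient_no": 4876},
    {"surgeon_no": 243, "surgeon_name": "Charles Field", "patient_no": 1234},
    {"surgeon_no": 243, "surgeon_name": "Charles Field", "patient_no": 6845},
]

print(fd_holds(rows, ["surgeon_no"], ["surgeon_name"]))  # True
print(fd_holds(rows, ["surgeon_no"], ["patient_no"]))    # False
```

Note that a sample of tuples can only refute a functional dependency. Whether X -> Y really holds is a statement about every legal state of the relation, so it comes from the meaning of the data, not from one instance.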
Data Normalization
• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
• The process of decomposing relations with anomalies to produce smaller, well-structured relations.
• Primary objective: reduce redundancy, reduce nulls
• Improve “modify” activities:
  • insert
  • update
  • delete
  • but not read
• Price: degraded query, display, and reporting
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
Normalization
[Figure: stages of normalization]
• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey attributes on the primary key
• Third Normal Form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Unnormalized Relations
• First step in normalization is to convert the data into a two-dimensional table
• In unnormalized relations data can repeat within a column
Unnormalized Relation
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline; none | Fever; none
First Normal Form
• To move to First Normal Form a relation must contain only atomic values at each row and column.
• No repeating groups
• A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation and no proper subset of it can.
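Removing a repeating group amounts to emitting one row per entry of the group and copying the single-valued columns into each row. A minimal sketch in Python, assuming the repeating columns are stored as parallel lists of equal length; the function name to_1nf and the column names are illustrative, and the data is a small extract from the patient example in this lecture.

```python
def to_1nf(rows, repeating):
    """Emit one flat row per entry of the repeating group."""
    flat = []
    for row in rows:
        # Columns outside the repeating group are copied into every output row.
        fixed = {k: v for k, v in row.items() if k not in repeating}
        groups = [row[k] for k in repeating]
        for values in zip(*groups):  # walk the parallel lists together
            flat.append({**fixed, **dict(zip(repeating, values))})
    return flat


unnormalized = [
    {"patient_no": 1111, "patient_name": "John White",
     "surgeon_no": [145, 311],
     "surgery_date": ["01-Jan-95", "12-Jun-95"]},
    {"patient_no": 2345, "patient_name": "Charles Brown",
     "surgeon_no": [189],
     "surgery_date": ["08-Jan-96"]},
]

flat = to_1nf(unnormalized, ["surgeon_no", "surgery_date"])
for r in flat:
    print(r)
```

Every value in the output is atomic, at the price of repeating the patient name in each row. That redundancy is what the later normal forms remove.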
First Normal Form
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none
Second Normal Form
• A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key.
• That is, every nonkey attribute needs the full primary key for unique identification
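One way to reach Second Normal Form is to project the 1NF relation onto each part of the key together with the attributes that part determines. The sketch below is in Python and rests on an assumption the slides do not state explicitly: that the primary key of the 1NF relation is (patient_no, surgeon_no, date). The helper name project and the column names are illustrative; the rows are an extract of the lecture's surgery data.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


first_nf = [
    {"patient_no": 1111, "surgeon_no": 145, "date": "01-Jan-95",
     "patient_name": "John White", "surgeon_name": "Beth Little",
     "surgery": "Gallstones removal"},
    {"patient_no": 1111, "surgeon_no": 311, "date": "12-Jun-95",
     "patient_name": "John White", "surgeon_name": "Michael Diamond",
     "surgery": "Kidney stones removal"},
    {"patient_no": 4876, "surgeon_no": 145, "date": "05-Nov-95",
     "patient_name": "Hal Kane", "surgeon_name": "Beth Little",
     "surgery": "Cholecystectomy"},
]

# patient_name depends only on patient_no, surgeon_name only on surgeon_no.
patient = project(first_nf, ["patient_no", "patient_name"])
surgeon = project(first_nf, ["surgeon_no", "surgeon_name"])
# What remains depends on the whole (assumed) key.
surgery = project(first_nf, ["patient_no", "surgeon_no", "date", "surgery"])

print(len(patient), len(surgeon), len(surgery))  # 2 2 3
```

Each patient name and surgeon name is now stored once, and the remaining relation keeps only attributes that need the full key.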
Second Normal Form
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY
Second Normal Form
Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold
Second Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever
Third Normal Form
• A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes
• When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency.
• The side effect column in the Surgery table is determined by the drug administered
• Side effect therefore depends on the primary key only transitively, through drug, so Surgery is not in 3NF
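The 3NF test can be run on the schema level once the functional dependencies are written down. The sketch below is in Python and makes two assumptions that are not stated on the slides: the Surgery relation has the single key (patient_no, surgeon_no, date), and the two listed FDs are the only ones. The names closure and transitive_violations are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_violations(schema, key, fds):
    """List FDs X -> A where X is not a superkey and A is a nonkey attribute."""
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # determinant is a superkey, so this FD is harmless
        for a in sorted(set(rhs) - set(lhs) - key):
            bad.append((sorted(lhs), a))
    return bad


key = {"patient_no", "surgeon_no", "date"}          # assumed primary key
schema = key | {"surgery", "drug", "side_effect"}
fds = [
    (("patient_no", "surgeon_no", "date"), ("surgery", "drug")),
    (("drug",), ("side_effect",)),                  # drug determines side effect
]

violations = transitive_violations(schema, key, fds)
print(violations)  # [(['drug'], 'side_effect')]
```

The reported dependency is the one to split off: a separate relation holding drug and side effect removes the violation.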
Third Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline
Third Normal Form
Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever
Functional Dependency and Keys
• Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
• Candidate Key: Each non-key field is functionally dependent on every candidate key.
Steps in Normalization
Normalization – most used
• The four most commonly used normal forms are the first (1NF), second (2NF), and third (3NF) normal forms, and Boyce–Codd normal form (BCNF).
• They are based on functional dependencies among the attributes of a relation.
• A relation can be normalized to a specific form to prevent the possible occurrence of update anomalies.
First Normal Form
• No multi-valued attributes.
• Every attribute value is atomic.
• Why are the following tables not in 1NF?
Employee (ssn, Name, Salary, Address, ListOfSkills)
Department (Did, Dname, ssn)
Second Normal Form
• 1NF and every non-key attribute is fully functionally dependent on the primary key.
• Every non-key attribute must be defined by the entire key, not by only part of the key.
• No partial functional dependencies.
Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vehicle table below, why is the table not in 2NF?
Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)
Third Normal Form & BCNF
• 3NF: 2NF and no transitive dependencies (no functional dependency between non-key attributes)
• BCNF: in addition, every determinant is a candidate key
Why are the following tables not in 3NF or BCNF?
• Employee [ssn, name, salary, did, dname]
• Customer
3NF & BCNF
• It is very rare for a table to be in 3NF and not be in BCNF (a violation of BCNF).
• Given a relation R with attributes A, B and C where A and B are together the composite PK,
IF A, B -> C and C -> B
THEN R is in 3NF and is not in BCNF
Example: Student, Course -> Instructor
Instructor -> Course
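The difference between the two tests can be seen by checking each FD of the example against both conditions. The sketch below is in Python and assumes the two FDs listed are the only ones and that the prime attributes are those of the composite PK (Student, Course); the names closure and classify are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def classify(schema, prime, fds):
    """For each nontrivial FD, return (satisfies 3NF, satisfies BCNF)."""
    report = {}
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        bcnf_ok = is_superkey                        # determinant must be a superkey
        nf3_ok = is_superkey or set(rhs) <= prime    # or the right side is prime
        report[(lhs, rhs)] = (nf3_ok, bcnf_ok)
    return report


schema = {"Student", "Course", "Instructor"}
prime = {"Student", "Course"}                        # attributes of the composite PK
fds = [
    (("Student", "Course"), ("Instructor",)),
    (("Instructor",), ("Course",)),
]

report = classify(schema, prime, fds)
for (lhs, rhs), (nf3_ok, bcnf_ok) in report.items():
    print(lhs, "->", rhs, "3NF ok:", nf3_ok, "BCNF ok:", bcnf_ok)
```

Instructor -> Course passes the 3NF condition only because Course is part of the key. Its determinant, Instructor, is not a superkey, which is exactly the BCNF violation.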
Steps in Normalization
• 1NF: a table without multivalued attributes
  • if not, then decompose
• 2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key
  • if not, then decompose
• 3NF: 2NF and no transitive dependencies
  • if not, then decompose
• GENERAL:
  • Each table should describe a single theme
  • Modification anomalies are minimized
Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY
EXAMPLE - OBTAIN CANDIDATE KEYS
Consider the following scheme from an airline database system:
(P (pilot), F (flight #), D (date), T (scheduled time to depart))
We have the following FD's:
• F ----> T
• PDT ----> F
• FD ----> P
Provide some superkeys:
• PDT is a superkey, and FD is a superkey.
• Is PDT a candidate key?
• PD is not a superkey, nor is DT, nor is PT.
• So, PDT is a candidate key.
• FD is also a candidate key, since neither F nor D is a superkey.
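The same answer can be obtained by brute force: try attribute sets from smallest to largest and keep those whose closure covers the whole scheme and that contain no smaller key. A minimal sketch in Python, using single-letter attribute names as on the slide; the function names closure and candidate_keys are illustrative, and the exhaustive search is only practical for small schemes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):        # smallest sets first
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue                          # contains a key: not minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


fds = [("F", "T"), ("PDT", "F"), ("FD", "P")]
keys = candidate_keys("PFDT", fds)
print(["".join(sorted(k)) for k in keys])  # ['DF', 'DPT']
```

The two keys found are FD and PDT, written here in alphabetical order.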
CLOSURE OF A SET OF FD'S
If F is a set of functional dependencies for a relation R, the set of all functional dependencies that can be derived from F, denoted by F+, is called the CLOSURE of F.
We can use Armstrong's axioms, and the 3 derived rules, to compute the closure of F, F+.
WORKING TO GET THE CLOSURE F+
• GIVEN: scheme (A, B, C, G, H, I)
• GIVEN: FD set (A--->B, A--->C, CG--->H, CG--->I, B--->H)
• Some members of F+ are
  • A--->H {Transitivity Rule applied to A--->B and B--->H}
  • CG--->HI {Union Rule applied to CG--->H and CG--->I}
  • AG--->I {Augmentation Rule applied to A--->C gives AG--->CG; then Transitivity with CG--->I}
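Membership in F+ does not require enumerating F+. A dependency X--->Y is in F+ exactly when Y is contained in the closure of the attribute set X under F, which gives a quick check of the three members above. A minimal sketch in Python with single-letter attributes; the function names closure and implies are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs ---> rhs is a member of F+."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # True
print(implies(F, "CG", "HI"))  # True
print(implies(F, "AG", "I"))   # True
print(implies(F, "B", "A"))    # False: B alone determines only B and H
```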
THE CLOSURE OF A SET OF ATTRIBUTES
GIVEN: FD set F and a given attribute A (or set of attributes A)
FIND: The set of attributes functionally dependent on A, called the closure of A, and denoted by A+
IMPORTANT USE FOR THIS: To determine if A is a superkey, we compute A+, the set of attributes functionally dependent on A. If A+ consists of ALL the attributes in the relation, then A is a superkey
HOW DO WE FIND A+? The following algorithm does the trick!
ALGORITHM TO FIND THE CLOSURE OF ATTRIBUTE A, DENOTED BY A+
result := A;
while { result changes }
    for each functional dependency B--->C
    begin
        if B is contained in result, then result := result U C
    end
endwhile
A+ := result
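The pseudocode above translates almost line for line into Python. This is a minimal sketch, assuming single-letter attribute names and FDs given as (left side, right side) pairs of strings; the function name attribute_closure is illustrative. The FD set in the usage lines is the one from the scheme (A, B, C, G, H, I) used earlier in this lecture.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs."""
    result = set(attrs)                  # result := A
    changed = True
    while changed:                       # while { result changes }
        changed = False
        for lhs, rhs in fds:             # for each functional dependency B--->C
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # result := result U C
                changed = True
    return result                        # A+ := result


fds = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
schema = set("ABCGHI")

print(sorted(attribute_closure("A", fds)))       # ['A', 'B', 'C', 'H']
print(attribute_closure("AG", fds) == schema)    # True, so AG is a superkey
```

The loop stops as soon as a full pass over the FDs adds nothing, so it terminates after at most as many passes as there are attributes.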
EXAMPLE TO FIND THE CLOSURE A+ OF AN ATTRIBUTE A
GIVEN: Relation R with attributes W, X, Y, Z and FD's
W ---> Z
YZ ---> X
WZ ---> Y
FIND: WZ+
PSEUDO TRACE OF THE ALGORITHM:
• result := WZ
• from first 2 FD's, no change to "result"
• from WZ ---> Y, since WZ is contained in result, we get result := WZY
• on the next pass, from YZ ---> X, since YZ is contained in result, we get result := WZYX
• Thus, every attribute in R is in WZ+, so WZ is a superkey!
Normalization
• Normalization of data - method for analyzing schemas
• Unsatisfactory schemas decomposed into smaller ones with desirable properties
• Objectives of normalization
  • good relation schemas disallowing update anomalies
Formal framework
• database normalized to any degree (1, 2, 3, 4, 5, etc.)
• normalization is not done in isolation
• need:
  • lossless join
  • dependency preservation
• additional normal forms meet other desirable criteria
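For a decomposition into exactly two relations, the lossless-join property has a simple test: the attributes the two relations share must functionally determine all of one of them. The sketch below is in Python and applies that test to the split of the Surgery relation into a surgery part and a drug part. The FD set and the key (patient_no, surgeon_no, date) are assumptions made for illustration, and the function names are not from the lecture.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Two-relation test: the shared attributes must determine r1 or r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


fds = [
    (("patient_no", "surgeon_no", "date"), ("surgery", "drug")),
    (("drug",), ("side_effect",)),
]

surgery = {"patient_no", "surgeon_no", "date", "surgery", "drug"}
drug = {"drug", "side_effect"}
print(is_lossless_binary(surgery, drug, fds))        # True: drug determines side_effect

no_link = {"patient_no", "surgeon_no", "date", "surgery"}
print(is_lossless_binary(no_link, drug, fds))        # False: no shared attribute
```

In the second call the two relations share nothing, so joining them would pair every surgery with every drug and the original rows could not be recovered.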
Normal Forms
• 1st, 2nd, 3rd, BCNF consider only FD and key constraints
• constraints must not be hard to understand or detect
• need not normalize to highest form (e.g. for performance reasons)
1NF - 1st normal form
• part of the formal definition of a relation
• disallow multivalued attributes, composite attributes and their combination
• In 1NF single (atomic, indivisible) values
Normalize into 1NF?
• How to normalize nested relations into 1NF?
  • Remove nested relation attributes into new relation
  • propagate PK
  • combine PK and partial PK
  • recursively unnest - multilevel nesting
  • useful in converting hierarchical schemes into 1NF
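The first three steps can be sketched for a single level of nesting. The Python below moves a nested attribute into a new relation, copies the parent's PK into each new row, and leaves the combination of PK and nested value as the new relation's key. The function name unnest, the column names, and the employee rows are all hypothetical; only the idea of an Employee with a list of skills comes from this lecture. Multilevel nesting would need the same step applied recursively, which this sketch does not do.

```python
def unnest(rows, pk, nested, item_name):
    """Split one nested attribute out into its own relation."""
    parent, child = [], []
    for row in rows:
        # Parent keeps every attribute except the nested one.
        parent.append({k: v for k, v in row.items() if k != nested})
        for item in row[nested]:
            # Propagate the PK; (pk, item_name) together identify a child row.
            child.append({pk: row[pk], item_name: item})
    return parent, child


# Hypothetical employee data with a nested list of skills.
employees = [
    {"ssn": "111", "name": "Ann", "skills": ["SQL", "Java"]},
    {"ssn": "222", "name": "Bob", "skills": ["C"]},
]

employee, employee_skill = unnest(employees, "ssn", "skills", "skill")
print(employee)
print(employee_skill)
```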
Difficulties with 1NF
• insert, delete, update
• Determine whether the attributes describe the entity identified by the PK
• If not, these are called non-full FDs
• we need full FDs for good inserts, deletes, updates
Second Normal Form - 2NF
• Uses the concepts of FDs, PKs and this definition:
• An FD is a full functional dependency if:
given Y -> Z, removal of any attribute from Y means the FD does not hold any more
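The definition can be tested directly: drop each attribute of Y in turn and see whether what is left still determines Z. The sketch below is in Python and uses the Vehicle table from earlier in this lecture with its composite PK (LicensePlate, OwnerSSN). The three FDs are assumptions made for illustration, since the slides do not list them, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and breaks when any one attribute leaves lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False                       # the FD does not hold at all
    for a in lhs:
        if rhs <= closure(lhs - {a}, fds):
            return False                   # a smaller left side is enough: partial
    return True


# Assumed FDs for the Vehicle table (not given on the slides).
fds = [
    (("LicensePlate",), ("Brand", "Model", "Year")),
    (("OwnerSSN",), ("OwnerName",)),
    (("LicensePlate", "OwnerSSN"), ("PurchasePrice",)),
]
pk = ("LicensePlate", "OwnerSSN")

print(is_full_fd(pk, ("PurchasePrice",), fds))  # True
print(is_full_fd(pk, ("Brand",), fds))          # False: LicensePlate alone suffices
print(is_full_fd(pk, ("OwnerName",), fds))      # False: OwnerSSN alone suffices
```

Under these assumed FDs, Brand and OwnerName depend on only part of the key, which is the kind of partial dependency 2NF rules out.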
2NF
• A relation schema R is in 2NF if:
  • Relation is in 1NF
  • Every non-prime attribute A in R is fully functionally dependent on the primary key
  Prime attribute - attribute that is a member of the primary key K
• R can be decomposed into 2NF relations via the process of 2NF normalization
  • Remove partial dependencies
  • create new relations where partials are full
Simplifying Functional Dependencies through Normalization
Normalization: the identification of functional dependencies and the modifications required to structurally change the database to remove undesirable dependencies
September 2, 2004
Read the following article: IBM's early relational database scientists:
http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95.html
Chapter 3, 3.1, and Chapter 7, 7.1-7.3.2
Work on problems: 7.12, 7.13, 7.14, 7.15