Databases 6:
Normalization
Hossein Rahmani
Design of a Relational DB-Schema
» What is a good relational database schema?
» Conceptual level (ER-schema, Views)
» Implementation level (consistency, efficiency of base
relations = stored tables)
» How do you achieve a good schema?
» Bottom-up (start with relations between attributes)
»design by synthesis
» Top-down (start with ER-schema and then further
decomposition):
»design by analysis
Databases lecture 6
Obtaining a good design
» We first give (informally) 4 guidelines
» Then we will give formal requirements
incorporating the guidelines (based on
functional dependencies)
Guideline 1: Semantics
» Make sure that the semantics (meaning) of
all base relations and attributes is clear
»Tuples must be easily interpreted as ‘facts’
»Do not mix, if possible, attributes of more than
one entity or relation type in one base relation
Examples of poor design
Guideline 2: Redundancy and anomalies
» Avoid redundancy: reduce the space that is
needed to store the database as much as
possible
» Prevent anomalies when changing data in
the database
»update (insertion / deletion / modification)
anomalies
Redundancy - example
Update Anomalies
» Cause: doubly stored data, wrong design
(example: see next slide)
» insertion anomalies:
» new tuple contains incorrect attribute value for an already
stored entity
» new entity has a null key
» deletion anomalies
» Incomplete deletion of an entity
» unwanted deletion of an entity
» modification anomalies
» incomplete modification of an entity
Guideline 3: NULL-values
» Some base relations contain many attributes
that often are ‘NULL’
»Unnecessary use of space
»Multiple meanings of ‘NULL’
»JOIN operations can have undesired effects
»COUNT and SUM can go wrong
» So: place each attribute in a base relation
where it is ‘NULL’ as rarely as possible
Guideline 4: False (Spurious) Tuples
» If we choose the base relations badly, a
(NATURAL-)JOIN can create tuples that do
not correspond to anything in the mini world
(see next slides)
» So: choose base relations such that a JOIN
on primary or foreign keys cannot produce
spurious tuples. Don’t JOIN on other
attributes
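A minimal sketch of how spurious tuples arise, assuming a made-up relation with attributes `emp`, `project` and `location` (none of these names or values come from the slides). The relation is split into two projections that share only the non-key attribute `location`, and the natural join of the parts is compared with the original.

```python
def project(rows, attrs):
    """Project a list of dict-tuples onto attrs, removing duplicates."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Natural join of two lists of dict-tuples on their shared attribute names."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


# Hypothetical data: which employee works on which project, and where.
works_on = [
    {"emp": "Smith", "project": "ProductX", "location": "Bellaire"},
    {"emp": "Wong", "project": "ProductY", "location": "Bellaire"},
]

# Poor decomposition: the parts share only the non-key attribute "location".
emp_locs = project(works_on, ["emp", "location"])
proj_locs = project(works_on, ["project", "location"])

rejoined = natural_join(emp_locs, proj_locs)
original = {(t["emp"], t["project"], t["location"]) for t in works_on}
spurious = [
    t for t in rejoined
    if (t["emp"], t["project"], t["location"]) not in original
]

print(len(rejoined), "tuples after the join,", len(spurious), "of them spurious")
```

The join returns four tuples although the original relation had two: Smith is paired with ProductY and Wong with ProductX, facts that were never stored.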
Wrong choice - relations
Wrong choice: states
Natural join -> spurious tuples (marked *)
Normalization
» Normalization lets you satisfy these
guidelines to a large extent
» In a number of steps (algorithms) you
transform a given relational database schema
into an ever higher normal form
» Base concept: functional dependency
Functional dependency
» Start with one universal relation schema R
containing all attributes A1,..,An
» Given two attribute sets X and Y in R
» Functional dependency X → Y exists (X
functionally determines Y; Y is functionally
dependent on X) if:
»∀ r(R): ∀ t1,t2 ∈ r: t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
» i.e.: component X determines component Y
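The definition can be tested on a single relation state. The sketch below assumes tuples are stored as Python dicts; the function name `fd_holds` and the sample values are illustrative only. Note that one state can refute an FD but never prove it, because the definition quantifies over all legal states r(R).

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of tuples in rows."""
    for t1, t2 in combinations(rows, 2):
        same_lhs = all(t1[a] == t2[a] for a in lhs)
        same_rhs = all(t1[a] == t2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical relation state over attributes A, B, C.
r = [
    {"A": 1, "B": 1, "C": "x"},
    {"A": 1, "B": 2, "C": "y"},
    {"A": 2, "B": 1, "C": "z"},
]

print(fd_holds(r, ["A", "B"], ["C"]))  # True: no two tuples agree on both A and B
print(fd_holds(r, ["A"], ["C"]))       # False: A = 1 occurs with C = x and C = y
```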
Functional dependency
[Figure: example relation with attributes A, B, C illustrating AB → C]
Functional dependency
» If X is a superkey of R then X → Y holds for
each set Y of attributes in R
» If X → Y, then nothing can be concluded on
the existence of Y → X
» X → Y follows from the semantics of the
attributes in X and Y (which means that the
designer should note and declare it)
» r(R) is legal if it agrees with all functional
dependencies (FDs) declared on R
Inference rules for FDs
» Six rules for deriving FDs:
»IR1 (reflexive): if X ⊇ Y then X → Y (trivial)
As a special case: X → X
»IR2 (extension): {X → Y} |= XZ → YZ
»IR3 (transitive): {X → Y, Y → Z} |= X → Z
»IR4 (project): {X → YZ} |= X → Y
»IR5 (combine): {X → Y, X → Z} |= X → YZ
»IR6 (pseudotransitive):
{X → Y, WY → Z} |= WX → Z
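IR4-6 add convenience, not power: each can be derived from IR1-3. A short derivation, written as a sketch in LaTeX notation, with the rule applied at each step given in parentheses:

```latex
\begin{align*}
\text{IR4:}\quad & X \to YZ \ (\text{given}),\quad YZ \to Y \ (\text{IR1, as } YZ \supseteq Y)
  &&\Rightarrow\ X \to Y \ (\text{IR3})\\
\text{IR5:}\quad & X \to XY \ (\text{IR2 on } X \to Y \text{ with } X),\quad
  XY \to YZ \ (\text{IR2 on } X \to Z \text{ with } Y)
  &&\Rightarrow\ X \to YZ \ (\text{IR3})\\
\text{IR6:}\quad & WX \to WY \ (\text{IR2 on } X \to Y \text{ with } W),\quad
  WY \to Z \ (\text{given})
  &&\Rightarrow\ WX \to Z \ (\text{IR3})
\end{align*}
```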
Closure F+ of F
» Given a set of FDs for R: F(R)
» IR1-3 are sound & complete (Armstrong)
»Sound: If a new FD f can be derived from F(R)
using IR1-3, and r(R) is legal for F, then r(R) is
also legal for F ∪ {f}
»Complete: If FD f holds in every r(R) that is
legal for F, then f can be derived from F using IR1-3
» The set F+(R) of all FDs that can be derived
from F, is called the closure F+ of F(R)
Closure X+ under F
» Given X → Y ∈ F. The closure X+ of X
under F is the set of all attributes that
are also functionally dependent on X
Algorithm to determine X+:
X+ := X ;
do { oldX+ := X+ ;
for all Y → Z ∈ F :
if X+ ⊇ Y then X+ := X+ ∪ Z;
} while (oldX+ ≠ X+)
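The closure algorithm translates almost line by line into Python. In this sketch an FD is assumed to be a pair `(lhs, rhs)` of strings whose characters are single-letter attribute names; that representation and the name `closure` are choices made for the example, not part of the slides.

```python
def closure(attrs, fds):
    """Closure X+ of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing (oldX+ = X+)
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs AB -> CD, C -> D, A -> CB
F = [("AB", "CD"), ("C", "D"), ("A", "CB")]

print(sorted(closure("A", F)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("C", F)))  # ['C', 'D']
```

Since A+ contains every attribute, A is a superkey of a relation with attributes A, B, C, D under these FDs.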
Equivalence
» Two sets of FDs, F and E are equivalent
(F ≡ E) iff F+ = E+
» Semantically: if F ≡ E, then r(R) is legal for
F iff r(R) is legal for E
» By definition: F |= f iff F ≡ F ∪ {f}
» For each set F there exist many
equivalent sets of FDs. We prefer
simplicity: minimal cover
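Computing F+ and E+ in full is expensive, so equivalence is usually tested the other way round: F ≡ E exactly when every FD of E follows from F and every FD of F follows from E, and F |= X → Y can be decided by checking Y ⊆ X+ under F. A sketch under the assumption that FDs are `(lhs, rhs)` pairs of single-letter attribute strings; the helper names are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if fds |= lhs -> rhs, decided via the attribute closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


def equivalent(f, e):
    """True if F and E imply each other's FDs, i.e. F+ = E+."""
    return (all(implies(f, lhs, rhs) for lhs, rhs in e)
            and all(implies(e, lhs, rhs) for lhs, rhs in f))


F = [("AB", "CD"), ("C", "D"), ("A", "CB")]
E = [("A", "C"), ("C", "D"), ("A", "B")]
E_without_cd = [("A", "C"), ("A", "B")]

print(equivalent(F, E))             # True
print(equivalent(F, E_without_cd))  # False: C -> D no longer follows
```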
Minimal Cover
» We can translate any F into an equivalent
minimal cover G
» A set of FDs G is a minimal cover of F if G ≡ F and
» for all X → Y ∈ G,
Y has exactly one attribute (so, if
X → YZ, then split into X → Y and X → Z)
» We cannot remove any X → Y from G without
losing equivalence with F
» We cannot replace any X → Y in G by W → Y with
W ⊂ X, without losing equivalence with F
Algorithm for Minimal Cover
1) Start with G := F ;
2) Replace all X → Y with Y = {A1,..,An} by X → Ai ; (IR4)
3) For all XY → A: if (G - {XY → A}) ∪ {X → A} ≡ G
then replace XY → A with X → A;
4) If G - {X → A} ≡ G then remove X → A;
Example:
1) AB → CD; C → D; A → CB;
2) AB → C; AB → D; C → D; A → C; A → B;
3) A → C; A → D; C → D; A → B;
4) A → C; C → D; A → B;
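The four steps can be sketched in Python and run on the example above. The equivalence tests in steps 3 and 4 are done with attribute closures: a left-hand attribute is dropped if the shortened left-hand side still determines A, and an FD is removed if it follows from the remaining ones. The representation (pairs of single-letter attribute strings) and the names `minimal_cover` and `show` are assumptions of this sketch. A minimal cover is not unique in general, so the result can depend on the order in which FDs are processed.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Steps 1 and 2: copy F and split every right-hand side into single attributes.
    g = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    g = list(dict.fromkeys(g))  # drop duplicates, keep order

    # Step 3: drop left-hand attributes that are not needed to determine rhs.
    reduced = []
    for lhs, rhs in g:
        for b in sorted(lhs):
            smaller = lhs - {b}
            # An empty left-hand side is skipped to keep the sketch simple.
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
        reduced.append((lhs, rhs))
    reduced = list(dict.fromkeys(reduced))

    # Step 4: remove every FD that already follows from the others.
    result = list(reduced)
    for fd in reduced:
        rest = [other for other in result if other != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


def show(fds):
    """Format FDs as 'A -> C; C -> D'."""
    return "; ".join(
        "".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in fds
    )


F = [("AB", "CD"), ("C", "D"), ("A", "CB")]
print(show(minimal_cover(F)))  # A -> C; C -> D; A -> B
```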
Normal forms
» Invented by Codd as a test on relational
database schemas
»The tests (‘normal forms’) become progressively
stricter. The stricter the test, the higher the normal
form, the more robust the database
»If a schema does not pass the test, it is
decomposed into smaller schemas that do pass the
test
»It is not always necessary to reach the highest
possible normal form
1NF - First Normal Form
» Attributes can only be single-valued
»Is a basic demand of most relational
databases
» Example of a non-1NF relation (see next
slide). This normally is already ruled out
by the definition of a relation (so using
the relational database model
automatically ensures 1NF)
Non-1NF - example
1NF - First Normal Form
» Solutions for a multi-valued attribute A in R:
»Preferred: create new relation S with A and a
foreign key to R
»Extend the key of R with an index number for
the values of A (redundancy!); e.g. department
has no. 5A, or 5B, or 5C
»Determine the maximum number of values per
tuple for A (say k) and replace attribute by k
attributes (say, loc1, loc2, and loc3). This
introduces null-values!
2NF - Second Normal form
» Definition: X → Y is a partial functional
dependency if there is an attribute A in X s.t.
X-{A} → Y
» X → Y is total if it is not partial
» 2NF:
each non-primary attribute is totally
dependent on primary key (and not on parts
of the primary key)
2NF - Normalizing
» Break up the relation such that each partial
key, together with its dependent attributes, is in a
separate relation. Only keep those attributes
that depend totally on the primary key
» Example (see next slide)
2NF - example
3NF - Third Normal Form
» Definition: X → Y is a transitive dependency
if there is a Z that is not (part of) a candidate
key s.t. X → Z and Z → Y
» 3NF: no non-primary attribute is transitively
dependent on the primary key
3NF - Normalizing
» Break up the relation such that the
attributes that depend on non-key
attributes appear in a separate table
(together with the attributes on which they
depend)
» Example (see next slide)
3NF - example
General form 2NF and 3NF
» Put the same demands on all candidate keys
(super keys) – which is more severe
»2NF: every non-key attribute is totally
dependent on all keys
»3NF: no non-key attribute is transitively
dependent on any key
Other formulation of 3NF:
if X → A is non-trivial, then A is prime or X is a super key
» Example (see next slide)
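The general 3NF condition can be checked mechanically: find all candidate keys, collect the prime attributes, and test each declared FD. The sketch below assumes single-letter attribute names and finds candidate keys by brute force over attribute subsets, which is only practical for small schemas. The function names and both example schemas are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(set(attributes))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(attrs):
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """Pairs (X, A) with X -> A non-trivial, X not a superkey and A not prime."""
    all_attrs = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= all_attrs:
            continue  # X is a superkey
        for a in sorted(set(rhs) - set(lhs)):
            if a not in prime:
                violations.append((lhs, a))
    return violations


# Hypothetical schema R(E, N, D, M): employee, name, department, manager.
print(third_nf_violations("ENDM", [("E", "ND"), ("D", "M")]))   # [('D', 'M')]

# Hypothetical schema R(S, C, T): student, course, teacher.
print(third_nf_violations("SCT", [("SC", "T"), ("T", "C")]))    # []
```

In the first schema D → M is a transitive dependency through the non-key attribute D. The second schema passes because C is prime (the candidate keys are SC and ST), even though T is not a super key.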
General form 2NF and 3NF - example (figure labels: 1NF, 2NF)
General form 2NF and 3NF - example (figure labels: 2NF, 3NF)
Boyce-Codd Normal Form
» Simpler, but stronger than 3NF
» BCNF: for each non-trivial dependency
X → A, X is a super key
» Difference: in 3NF if A is a prime attribute, X
does not have to be super key
» In many cases a 3NF schema is also BCNF
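A BCNF test needs only attribute closures: X is a super key exactly when X+ contains all attributes. The sketch assumes single-letter attribute names and checks only the declared FDs, which is sufficient for testing the schema R itself, though not for schemas obtained by decomposing R. The function name and the example schema R(S, C, T) with SC → T and T → C are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Declared FDs X -> Y that are non-trivial and whose X is not a super key."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        non_trivial = not set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= all_attrs
        if non_trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Hypothetical schema R(S, C, T): student, course, teacher.
F = [("SC", "T"), ("T", "C")]
print(bcnf_violations("SCT", F))  # [('T', 'C')]
```

T → C violates BCNF because T+ = {T, C} is not the whole schema. The same schema is still in 3NF, since C belongs to the candidate key SC; this is exactly the difference named on the slide.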
BCNF example
Decompositions
» Adhering to a normal form alone is not
enough
» We must not lose attributes in the process!
» Non-additive Join-property:
»a natural join of the result of a decomposition
should result in the original table, without
spurious tuples
» There exist algorithms to automatically find
good decompositions
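For a decomposition into exactly two schemas R1 and R2 there is a standard test, not stated on the slide, for the non-additive join property: the common attributes must functionally determine all of R1 or all of R2. A sketch assuming single-letter attribute names; `is_lossless` and the example schema R(S, C, T) are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: (R1 ∩ R2)+ must contain all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Hypothetical schema R(S, C, T) with SC -> T and T -> C.
F = [("SC", "T"), ("T", "C")]

print(is_lossless("TC", "ST", F))  # True: common attribute T determines TC
print(is_lossless("SC", "CT", F))  # False: common attribute C determines nothing else
```

The second decomposition would produce spurious tuples when its parts are joined on C.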
ER-schema to relational schemas
» A relational database schema that is mapped
from an ER-schema is often in BCNF, but
always in 3NF (so, check if BCNF is applicable
and useful)
» Many CASE-tools can map an ER-schema
automatically into a good relational schema
(e.g., SQL create-table commands)