* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download functional dependencies
Extensible Storage Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Global serializability wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Encyclopedia of World Problems and Human Potential wikipedia , lookup
Ingres (database) wikipedia , lookup
Navitaire Inc v Easyjet Airline Co. and BulletProof Technologies, Inc. wikipedia , lookup
Functional Database Model wikipedia , lookup
Clusterpoint wikipedia , lookup
Versant Object Database wikipedia , lookup
Healthcare Cost and Utilization Project wikipedia , lookup
Relational algebra wikipedia , lookup
Databases 6: Normalization Hossein Rahmani Design of a Relational DB-Schema » What is a good relational database schema? » Conceptual level (ER-schema, Views) » Implementation level (consistency, efficiency of base relations = stored tables) » How do you achieve a good schema? » Bottom-up (start with relations between attributes) »design by synthesis » Top-down (start with ER-schema and then further decomposition): »design by analysis 2 Databases lecture 6 Obtaining a good design » We first give (informally) 4 guidelines » Then we will give formal requirements incorporating the guidelines (based on functional dependencies) 3 Databases lecture 6 Guideline 1: Semantics » Make sure that the semantics (meaning) of all base relations and attributes is clear »Tuples must be easily interpreted as ‘facts’ »Do not mix, if possible, attributes of more than one entity or relation type in one base relation 4 Databases lecture 6 Examples of poor design 5 Databases lecture 6 Guideline 2: Redundancy and anomalies » Avoid redundancy: reduce the space that is needed to store the database as much as possible » Prevent anomalies when changing data in the database »update (insertion / deletion / modification) anomalies 6 Databases lecture 6 Redundancy - example 7 Databases lecture 6 Update Anomalies » Cause: doubly stored data, wrong design (example: see next slide) » insertion anomalies: » new tuple contains incorrect attribute value for an already stored entity » new entity has a null key » deletion anomalies » Incomplete deletion of an entity » unwanted deletion of an entity » modification anomalies » incomplete modification of an entity 8 Databases lecture 6 Guideline 3: NULL-values » Some base relations contain many attributes that often are ‘NULL’ »Unnecessary use of space »Multiple meanings of ‘NULL’ »JOIN operations can have undesired effects »COUNT and SUM can go wrong » SO: place an attribute in a base relation in which it is as least as possible ‘NULL’ 9 Databases lecture 6 Guideline 4: False (Spurious) Tuples » If we select base relations wrong, a (NATURAL-)JOIN can create tuples that do not have any connection with the mini world (see next slides) » So: select base relations such that at a JOIN on primary or foreign keys, no spurious tuples can occur. Don’t JOIN on other attributes 10 Databases lecture 6 Wrong choice - relations 11 Databases lecture 6 Wrong choice: states 12 Databases lecture 6 Natural join -> spurious tuples (marked *) 13 Databases lecture 6 Normalization » Using ‘normalization’, you can adhere to these guidelines for a large part » In a number of steps (algorithms) you transfer a given relational database schema into an ever higher normal form » Base concept: functional dependency 14 Databases lecture 6 Functional dependency » Start with one universal relation schema R containing all attributes A1,..,An » Given two attribute sets X and Y in R » Functional dependency X Y exists (X functionally determines Y; Y is functionally dependent on X) if: »r(R): t1,t2 r: t1[X] = t2[X] t1[Y] = t2[Y] » i.e.: component X determines component Y 15 Databases lecture 6 Functional dependency AB C A 16 Databases lecture 6 B C Functional dependency » If X is a superkey of R then X Y holds for each set Y of attributes in R » If X Y, then nothing can be concluded on the existence of Y X » X Y follows from the semantics of the attributes in X and Y (which means that the designer should note and declare it) » r(R) is legal if it agrees with all functional dependencies (FDs) declared on R 17 Databases lecture 6 Inference rules for FDs » Six rules for deriving FDs: »IR1 (reflexive): if X Y then X Y (trivial) As a special case: X X »IR2 (extension): {X Y} |= XZ YZ »IR3 (transitive): {X Y, Y Z} |= X Z »IR4 (project): {X YZ} |= X Y »IR5 (combine): {X Y, X Z} |= X YZ »IR6 (pseudotransitive): {X Y, WY Z} |= WX Z 18 Databases lecture 6 Closure F+ of F » Given a set of FDs for R: F(R) » IR1-3 is sound & complete (Armstrong) »Sound: If a new FD f can be derived from F(R) using IR1-3, and r(R) is legal for F, then r(R) is also legal for F {f} »Complete: If FD f holds on R then f can be derived from F using IR1-3 » The set F+(R) of all FDs that can be derived from F, is called the closure F+ of F(R) 19 Databases lecture 6 Closure X+ under F » Given X Y F. The closure X+ of X under F is the set of all attributes that are also functionally dependent on X Algorithm to determine X+: X+ := X ; do { oldX+ = X+ ; for all Y Z F : if X+ Y then X+ = X+ Z; } while (oldX+ X+) 20 Databases lecture 6 Equivalence » Two sets of FDs, F and E are equivalent (F E) iff F+ = E+ » Semantically: if F E, then r(R) is legal for F iff r(R) is legal for E » By definition: F |= f iff F F {f} » For each set F there exist many equivalent sets of FDs. We prefer simplicity: minimal cover 21 Databases lecture 6 Minimal Cover » We can translate any F into an equivalent minimal cover G » A set FDs G is a minimal cover of F if G F and » for all X Y G, Y has exactly one attribute (so, if X YZ, then split into X Y and X Z) » We cannot remove any X Y from G without loosing equivalence with F » We cannot replace any X Y in G by W Y with W X, without loosing equivalence with F 22 Databases lecture 6 Algorithm for Minimal Cover 1) Start with G := F ; 2) Replace all X Y with Y = {A1,..,An} by X Ai ; (IR4) 3) For all XY A: if G - {XY A} G {X A} then replace XY A with X A; 4) If G - {X A} G then remove X A; Example: 1) AB CD; C D; A CB; 2) AB C; AB D; C D; A C; A B; 3) A C; A D; C D ; A B; 4) A C; C D ; A B; 23 Databases lecture 6 Normal forms » Invented by Codd as a test on relational database schemas »The tests (‘normal forms’) grow more severe. The more severe the test, the higher the normal form, the more robust the database »If a schema does not pass the test, it is decomposed in partial schemas that do pass the test »It is not always necessary to reach the highest possible normal form 24 Databases lecture 6 1NF - First Normal Form » Attributes can only be single-valued »Is a basic demand of most relational databases » Example of a non-1NF relation (see next slide). This normally is already ruled out by the definition of a relation (so using the relational database model automatically ensures 1NF) 25 Databases lecture 6 Non-1NF - example 26 Databases lecture 6 1NF - First Normal Form » Solutions for a multi-valued attribute A in R: »Preferred: create new relation S with A and a foreign key to R »Extend the key of R with an index number for the values of A (redundancy!); e.g. department has no. 5A, or 5B, or 5C »Determine the maximum number of values per tuple for A (say k) and replace attribute by k attributes (say, loc1, loc2, and loc3). This introduces null-values! 27 Databases lecture 6 2NF - Second Normal form » Definition: X Y is a partial functional dependency if there is an attribute A in X s.t. X-{A} Y » X Y is total if it is not partial » 2NF: each non-primary attribute is totally dependent on primary key (and not on parts of the primary key) 28 Databases lecture 6 2NF - Normalizing » Break up the relation such that every partial key with their dependent attributes is in a separate relation. Only keep those attributes that depend totally on the primary key » Example (see next slide) 29 Databases lecture 6 2NF - example 30 Databases lecture 6 3NF - Third Normal Form » Definition: X Y is a transitive dependency if there is a Z that is not (part of) a candidate key s.t. X Z and Z Y » 3NF: no non-primary attribute is transitively depending on the primary key 31 Databases lecture 6 3NF - Normalizing » Break up the relation such that the attributes that are depending on not-key attributes appear in a separate table (together with the attributes on which they depend) » Example (see next slide) 32 Databases lecture 6 3NF - example 33 Databases lecture 6 General form 2NF and 3NF » Put the same demands on all candidate keys (super keys) – which is more severe »2NF: every non-key attribute is totally dependent on all keys »3NF: no non-key attribute is transitively dependent on any key Other formulation: if X A then A is prime or X is a super key » Example (see next slide) 34 Databases lecture 6 General form 2NF and 3NF - example 1NF 2NF 35 Databases lecture 6 General form 2NF and 3NF - example 2NF 3NF 36 Databases lecture 6 Boyce-Codd Normal Form » Simpler, but stronger than 3NF » BCNF: for each non-trivial dependency X A holds that X is a super key » Difference: in 3NF if A is a prime attribute, X does not have to be super key » In many cases a 3NF schema is also BCNF 37 Databases lecture 6 BCNF example 38 Databases lecture 6 Decompositions » Only adhering to a normal form is not enough » We must not lose attributes in the process! » Non-additive Join-property: »a natural join of the result of a decomposition should result in the original table, without spurious tuples » There exist algorithms to automatically find good decompositions 39 Databases lecture 6 ER-schema to relational schemas » A relational database schema that is mapped from an ER-schema is often in BCNF, but always in 3NF (so, check if BCNF is applicable and useful) » Many CASE-tools can map an ER-schema automatically into a good relational schema (e.g., SQL create-table commands) 40 Databases lecture 6