Second Normal Form - Department of Computer Science
Transcript
Lecture 3
Functional Dependency and Normal Forms
Prof. Sin-Min Lee
Department of Computer Science

Database Design Process
[Diagram: Applications 1 to 4, each with its conceptual requirements and its own External Model; Conceptual Model; Logical Model; Internal Model]

Database System Concepts 3.2 ©Silberschatz, Korth and Sudarshan

Relational Database Model: Relations
[Figures not reproduced. Source: ESRI Advanced ArcInfo]

Georelational Database Model

Attribute Relationships
Functional Dependency: refers to the relationships between attributes within a relation. If the value of attribute A determines the value of attribute B, then attribute B is functionally dependent upon attribute A.
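The definition above can be tested against a concrete set of rows. The following is a minimal sketch, not part of the lecture: the function name `fd_holds` and the sample rows are hypothetical, and a check like this can only refute a dependency on the data at hand; it cannot prove that the dependency holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by rows.

    rows: iterable of dicts mapping attribute name to value
    lhs, rhs: lists of attribute names (the determinant and the dependent)
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False  # same lhs values, different rhs values
    return True


# Hypothetical sample relation with attributes A, B, C
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

print(fd_holds(rows, ["A"], ["B"]))  # True: equal A values always carry equal B values
print(fd_holds(rows, ["A"], ["C"]))  # False: A = 1 appears with C = 10 and C = 20
```

Here B is functionally dependent on A in the sample rows, while C is not.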
Functional Dependencies
X -> Y means:
X functionally determines Y
Y depends on X
Values of the Y component depend on, and are determined by, values of the X component

Functional Dependencies
Given tuples t1 and t2:
if t1[X] = t2[X] then t1[Y] = t2[Y]    (1)
In other words, if the values of X are equal, then the values of Y are equal.
Values of the X component uniquely (functionally) determine values of the Y component iff (1) holds.

Data Normalization
Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
The process of decomposing relations with anomalies to produce smaller, well-structured relations.
Primary objective: reduce redundancy, reduce nulls, and improve "modify" activities (insert, update, delete), but not read.
Price: degraded query, display, and reporting.

Normal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

Normalization
First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
Second Normal Form: full functional dependency of nonkey attributes on the primary key
Third Normal Form: no transitive dependency between nonkey attributes
Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency

Unnormalized Relations
The first step in normalization is to convert the data into a two-dimensional table.
In unnormalized relations, data can repeat within a column.

Unnormalized Relation
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline; none | Fever; none

First Normal Form
To move to First Normal Form, a relation must contain only atomic values at each row and column.
No repeating groups.
A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation.

First Normal Form
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none

Second Normal Form
A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key. That is, every nonkey attribute needs the full primary key for unique identification.

Second Normal Form
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

Second Normal Form
Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold

Second Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever

Third Normal Form
A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes.
When one nonkey attribute can be determined with one or more nonkey attributes, there is said to be a transitive functional dependency.
The side effect column in the Surgery table is determined by the drug administered.
Side effect is transitively functionally dependent on drug, so Surgery is not in 3NF.

Third Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline

Third Normal Form
Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever

Functional Dependency and Keys
Functional Dependency: the value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: each non-key field is functionally dependent on every candidate key.

Steps in Normalization
[Figure not reproduced]

Normalization: most used
The four most commonly used normal forms are the first (1NF), second (2NF), and third (3NF) normal forms, and Boyce-Codd normal form (BCNF).
They are based on functional dependencies among the attributes of a relation.
A relation can be normalized to a specific form to prevent the possible occurrence of update anomalies.

First Normal Form
No multi-valued attributes.
Every attribute value is atomic.
Why are the following tables not in 1NF?
Employee (ssn, Name, Salary, Address, ListOfSkills)
Department (Did, Dname, ssn)

Second Normal Form
1NF and every non-key attribute is fully functionally dependent on the primary key.
Every non-key attribute must be defined by the entire key, not by only part of the key.
No partial functional dependencies.
Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vehicle table below, why is the table not in 2NF?
Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)

Third Normal Form & BCNF
2NF and no transitive dependencies (no functional dependency between non-key attributes); in most cases this is also BCNF.
Why are the following tables not in 3NF or BCNF?
Employee [ssn, name, salary, did, dname]
Customer

3NF & BCNF
It is very rare for a table to be in 3NF and not be in BCNF (violation of BCNF).
Given a relation R with attributes A, B and C, where A and B together are the composite PK:
IF A, B -> C and C -> B
THEN R is in 3NF and is not in BCNF
Example:
Student, Course -> Instructor
Instructor -> Course

Steps in Normalization
1NF: a table without multivalued attributes; if not, then decompose
2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key; if not, then decompose
3NF: 2NF and no transitive dependencies; if not, then decompose
GENERAL: Each table should describe a single theme. Modification anomalies are minimized.
Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY

EXAMPLE - OBTAIN CANDIDATE KEYS
Consider the following scheme from an airline database system:
( P (pilot), F (flight #), D (date), T (scheduled time to depart) )
We have the following functional dependencies:
F -> T
PDT -> F
FD -> P
Provide some superkeys: PDT is a superkey, and FD is a superkey.
Is PDT a candidate key? PD is not a superkey, nor is DT, nor is PT. So PDT is a candidate key.
FD is also a candidate key, since neither F nor D is a superkey.

CLOSURE OF A SET OF FD'S
If F is a set of functional dependencies for a relation R, the set of all functional dependencies that can be derived from F, denoted by F+, is called the CLOSURE of F.
We can use Armstrong's axioms, and the 3 derived rules, to compute the closure of F, F+.
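The superkey and candidate-key claims in the airline example above can be checked mechanically. The sketch below is an illustration rather than the lecture's own method: it assumes that a set of attributes is a superkey exactly when the attributes it determines cover the whole scheme, and that a candidate key is a superkey with no smaller superkey inside it. The helper names `closure` and `candidate_keys` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is already covered and rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate minimal superkeys by trying attribute sets from small to large."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


# Airline scheme from the example: P (pilot), F (flight #), D (date), T (time)
SCHEMA = "PFDT"
FDS = [("F", "T"), ("PDT", "F"), ("FD", "P")]

keys = candidate_keys(SCHEMA, FDS)
print(["".join(sorted(k)) for k in keys])  # ['DF', 'DPT']
```

The two sets printed are FD and PDT, written with their attributes in alphabetical order, which matches the candidate keys identified in the example. The exhaustive enumeration is exponential in the number of attributes, so it only suits small schemes like this one.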
WORKING TO GET THE CLOSURE F+
GIVEN: scheme (A, B, C, G, H, I)
GIVEN: FD set (A -> B, A -> C, CG -> H, CG -> I, B -> H)
Some members of F+ are:
A -> H {Transitivity Rule applied to A -> B and B -> H}
CG -> HI {Union Rule applied to CG -> H and CG -> I}
AG -> I {By Augmentation Rule, AG -> CG; then Transitivity}

THE CLOSURE OF A SET OF ATTRIBUTES
GIVEN: FD set F and a given attribute A (or set of attributes A)
FIND: The set of attributes functionally dependent on A, called the closure of A, and denoted by A+
IMPORTANT USE FOR THIS: To determine if A is a superkey, we compute A+, the set of attributes functionally dependent on A. If A+ consists of ALL the attributes in the relation, then A is a superkey.
HOW DO WE FIND A+? The following algorithm does the trick!

ALGORITHM TO FIND THE CLOSURE OF ATTRIBUTE A, DENOTED BY A+
result := A;
while { result changes }
    for each functional dependency B -> C
    begin
        if B is contained in result, then result := result U C
    end
endwhile
A+ := result

EXAMPLE TO FIND THE CLOSURE A+ OF AN ATTRIBUTE A
GIVEN: Relation R with attributes W, X, Y, Z and FDs
W -> Z
YZ -> X
WZ -> Y
FIND: WZ+
PSEUDO TRACE OF THE ALGORITHM:
result := WZ
from the first 2 FDs, no change to "result"
from WZ -> Y, since WZ is contained in result, we get result := WZY
from YZ -> X, since YZ is contained in result, we get result := WZYX
Thus, every attribute in R is in WZ+, so WZ is a superkey!
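The closure algorithm and the WZ+ trace above translate almost line for line into code. This is a minimal sketch under the assumption that each functional dependency is stored as a pair of attribute sets; the name `attribute_closure` is hypothetical.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)                 # result := A
    changed = True
    while changed:                      # while result changes
        changed = False
        for lhs, rhs in fds:            # for each functional dependency B -> C
            if lhs <= result and not rhs <= result:
                result |= rhs           # result := result U C
                changed = True
    return result                       # A+ := result


# Relation R(W, X, Y, Z) with W -> Z, YZ -> X, WZ -> Y, as in the example
R = set("WXYZ")
F = [(set("W"), set("Z")), (set("YZ"), set("X")), (set("WZ"), set("Y"))]

wz_plus = attribute_closure("WZ", F)
print(sorted(wz_plus))   # ['W', 'X', 'Y', 'Z']
print(wz_plus == R)      # True, so WZ is a superkey
```

Calling `attribute_closure("W", F)` returns the same four attributes, because W -> Z brings in Z and the trace then proceeds as before. WZ is therefore a superkey but not a minimal one.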
Normalization
Normalization of data: a method for analyzing schemas
Unsatisfactory schemas are decomposed into smaller ones with desirable properties
Objectives of normalization:
good relation schemas
disallowing update anomalies

Formal framework
A database can be normalized to any degree (1, 2, 3, 4, 5, etc.)
Normalization is not done in isolation; we also need:
lossless join
dependency preservation
Additional normal forms meet other desirable criteria

Normal Forms
1st, 2nd, 3rd, BCNF
Consider only FD and key constraints
Constraints must not be hard to understand or detect
Need not normalize to the highest form (e.g. for performance reasons)

1NF - 1st normal form
Part of the formal definition of a relation
Disallows multivalued attributes, composite attributes and their combination
In 1NF, values are single (atomic, indivisible)

Normalize into 1NF?
How to normalize nested relations into 1NF?
Remove nested relation attributes into a new relation
Propagate the PK
Combine the PK and the partial PK
Recursively unnest for multilevel nesting
Useful in converting hierarchical schemes into 1NF

Difficulties with 1NF
Insert, delete, update
Determine whether the attributes describe the entity identified by the PK.
If not, these are called non-full FDs.
We need full FDs for good inserts, deletes and updates.

Second Normal Form - 2NF
Uses the concepts of FDs and PKs, and this definition:
An FD Y -> Z is a full functional dependency if the removal of any attribute from Y means the FD no longer holds.

2NF
A relation schema R is in 2NF if:
The relation is in 1NF
Every non-prime attribute A in R is fully functionally dependent on the primary key
Prime attribute: an attribute that is a member of the primary key K
R can be decomposed into 2NF relations via the process of 2NF normalization:
Remove partial dependencies
Create new relations where the partial dependencies are full

Simplifying Functional Dependencies through Normalization
Normalization: the identification of functional dependencies and the modifications required to structurally change the database to remove undesirable dependencies
[Figures not reproduced. Source: ESRI Advanced ArcInfo]

September 2, 2004
Read the following article: IBM's early relational database scientists: http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95.html
Chapter 3, section 3.1, and Chapter 7, sections 7.1-7.3.2
Work on problems: 7.12, 7.13, 7.14, 7.15