* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download relation - Csmaster
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Functional Database Model wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
David M. Kroenke’s Database Processing: Fundamentals, Design, and Implementation Chapter Three: The Relational Model and Normalization DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-1 Chapter Premise • We have received one or more tables of existing data • The data is to be stored in a NEW database • Should the table of data – be stored as is, or – restructured? DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-2 Example 1 DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-3 Example 2 DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-4 Data Redundancy • Data redundancy results in data inconsistency – Different and conflicting versions of the same data appear in different places – Errors more likely to occur when the same data must be entered in several different places • Data anomalies develop when required changes in redundant data are not made successfully DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-5 Modification Anomalies • Update anomalies – Occur when changes must be made to existing records • Insertion anomalies – Occur when entering new records • Deletion anomalies – Occur when deleting records DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-6 Modification Anomalies • The EQUIPMENT_REPAIR table before and after an incorrect update operation on AcquisitionCost for Type = Drill Press: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-7 The Relational Model • Introduced in 1970 • Created by E.F. Codd – IBM engineer – The model used the mathematical system known as “relational algebra” • Today it is the standard for commercial DBMS products DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-8 Important Relational Model Terms • • • • • • • • • • • • Entity Relation (table) Functional Dependency Determinant Candidate Key Composite Key Primary Key Surrogate Key Foreign Key Referential integrity constraint Normal Form Multivalued Dependency DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-9 Entity • An entity is some identifiable thing that users want to track: – – – – – – – – Customer Computer Sale Student Invoice Department Course Policy DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-10 Relation • Data about entities is stored in relations • A relation is a two-dimensional table with these characteristics: – Rows contain data about an entity – No two rows may contain identical data – Columns contain data about attributes of the entity – All entries in a column are the same data type – Each column has a unique name – Cells of the table hold a single value DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-11 Employee Relation DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-12 Normalization • This is the process of organizing data into relations (tables) that are structurally sound from the perspective of the relational model – SQL works well on data organized this way – Data quality is higher; i.e., the data is more reliable DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-13 Objectives of Normalization • Develop a good description of the data, its relationships and constraints • Produce a stable set of relations that – Is a faithful model of the enterprise – Is highly flexible – Reduces redundancy • saves space • reduces data inconsistency – Is free of all anomalies DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-14 Anomalies are very bad! DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-15 Anomalies • An anomaly is an inconsistent, incomplete, or contradictory state of the database – Insertion anomaly – user cannot insert a new record when it should be possible to do so – Deletion anomaly – when a record is deleted, other information that is tied to it is also deleted (not by design) – Update anomaly – a record is updated, but other appearances of the same data are not DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-16 Data redundancy leads to anomalies Find examples of insertion, deletion & update anomalies DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-17 Normalization • Normalization plays an important role in database design. • Through it, we decompose relations (tables) in stages from lower to higher normal forms – 1NF, 2NF, 3NF, BCNF – Other normal forms are 4NF, 5NF, DKNF • We use normalization and E-R modeling together for good database design • It all starts with identifying functional dependencies (FD’s) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-18 Functional Dependency • If A and B are (sets of) attributes of relation R, B is functionally dependent on A if… – a particular value of A determines a unique value of B. • Emp_Name is functionally dependent on Emp_Num • with a particular value for Emp_Num, I can find the name of the employee (Emp_Name) with that Emp_Num • A→B says A determines B – or B is dependent on A DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-19 Example of FDs • R = NewStudent(stuId, lastName, major, credits, status, ssn) • Some FDs in R: stuId → lastName stuId → (lastName, major, credits, status, ssn, stuId) ssn → (stuId, lastName, major, credits, status, ssn) credits → status DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-20 Functional Dependency • A functional dependency occurs when the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s): StudentID StudentName StudentID (DormName, DormRoom, Fee) • The attribute on the left side of the functional dependency is called the determinant • Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice) ExtendedPrice • But, functional dependencies are not equations! DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-21 Composite Determinants • A determinant of a functional dependency may itself consist of more than one attribute: (StudentName, ClassName) (Grade) Note that StudentName Grade ClassName Grade DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-22 Functional Dependencies in the SKU_DATA Table Can you find three FDs? DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-23 Functional Dependencies in the ORDER_ITEM Table Can you find two? DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-24 Keys are determinants • A key is a combination of one or more attributes that is used to identify rows in a relation • A composite key is a key that consists of two or more attributes DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-25 Keys & FD’s • Superkey – functionally determines all attributes in a relation • Candidate key – a superkey that is a minimal identifier • Primary key - chosen candidate key – Must always be filled (non-null) Entity integrity – Must be unique – May be composite – Ideally, it is short, numeric and never changes DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-26 Surrogate Keys • A surrogate key is an artificial column added to a relation to serve as a primary key: – DBMS supplied – Short, numeric, never changes, never reused – Has artificial values that may be meaningless to users – See next slide for example DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-27 Utility of Surrogate Keys • RENTAL_PROPERTY with no surrogate key: RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate) • RENTAL_PROPERTY with a surrogate key: RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-28 Foreign Keys • A foreign key is the primary key of one relation that is placed in another relation – It forms a link between the two relations – A foreign key can be atomic (single column) or composite DEPARTMENT (DepartmentName, BudgetCode, ManagerName) EMPLOYEE (EmployeeNumber, EmployeeName,DepartmentName) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-29 Referential Integrity • Relations must exhibit integrity in their links to other relations – A foreign key field must contain a value that • equals a primary key value in the corresponding relation, or • is NULL SKU_DATA (SKU, SKU_Description, Department, Buyer) ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice) Where ORDER_ITEM.SKU must exist in SKU_DATA.SKU DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-30 When might a foreign key value be NULL? • A CUSTOMER may have no AGENT • An EMPLOYEE may be assigned to no DEPARTMENT (yet) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-31 Normal Forms • 1NF – A table that qualifies as a relation is in 1NF • 2NF – A relation is in 2NF if all of its nonkey attributes are dependent on every attribute in the primary key • 3NF – A relation is in 3NF if it is in 2NF and has no determinants except the primary key • Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-32 First Normal Form (1NF) • A relation is in 1NF if every attribute is single-valued for each tuple – each cell of the table contains only one value • Domains of attributes are atomic – No sets – No lists – No repeating fields or groups allowed DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-33 This relation is not in 1NF stuid lastName S1001 Smith S1003 Jones S1006 Lee S1010 Burns S1060 Jones major History Math CSC Math Art English CSC credits status ssn 90 95 15 Sr Sr Fr 100429500 010124567 088520876 63 Jr 099320985 25 Fr 064624738 (Assume students can have double majors) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-34 Decomposing into 1NF • Create a new table for each multi-valued attribute – Put the PK of the original table and the multi-valued attribute in this table – The PK of this new table is composite – Will have additional rows for each value of the attribute • Remove the multi-valued attribute from the original table NewStu2(stuId, lastName, credits,status, ssn) Majors(stuId, major) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-35 Two new tables stuId lastName credits status S1001 S1003 S1006 S1010 S1060 Smith Jones Lee Burns Jones Sr Sr Fr Jr Fr 90 95 15 63 25 ssn 100429500 010124567 088520876 099320985 064624738 NewStu2 DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall stuId S1001 major History S1003 Math S1006 CSC S1006 Math S1010 Art S1010 English S1060 CSC Majors 3-36 Another method for 1NF • “Flatten” the original table by making the multi-valued attribute part of a new composite key • Student(stuId, lastName, major, credits, status, ssn) – See next slide… DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-37 Flattened and in 1NF stuid lastName S1001 S1003 S1006 S1006 S1010 S1010 S1060 Smith Jones Lee Lee Burns Burns Jones major History Math CSC Math Art English CSC credits status 90 95 15 15 63 63 25 Sr Sr Fr Fr Jr Jr Fr ssn 100429500 010124567 088520876 088520876 099320985 099320985 064624738 NewStu Table with PK (stuID, major) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-38 Another (generally not-so-good) method for 1NF • If the number of repeats is specified or limited, can make additional columns for multiple values • Student(stuId, lastName, major1, major2, credits, status, ssn) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-39 This relation is also not in 1NF DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-40 The “flattening” approach works best here But we are still prone to all forms of anomalies, and so we must go on to transform this into 2NF and above. DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-41 Second Normal Form (2NF) • A relation is in second normal form (2NF) if it is in first normal form and all the nonkey attributes are fully functionally dependent on the key. – No non-key attribute is FD on just part of the key – If R’s key has only one attribute (ie, is not composite), and R is 1NF, R is automatically in 2NF DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-42 Full Functional Dependency • Transforming to 2NF requires that all dependencies within a relation are full functional dependencies – ie., no partial dependencies on the key • In relation R, a set of attributes B is fully functionally dependent on set of attributes A if B is functionally dependent on A… – but not functionally dependent on any proper subset of A • This means every attribute in A is needed to functionally determine B DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-43 NewClass is not in 2NF NewClass(courseNo, stuId, lastName, facId, schedule, room, grade) FDs: (courseNo,stuId) → (lastName) (courseNo,stuId) →(facId) (courseNo,stuId) →(schedule) (courseNo,stuId) →(room) (courseNo,stuId) →(grade) courseNo → facId courseNo → schedule courseNo → room stuId → lastName Looks like we’ve found a primary key But…these are all partially dependent on the key …plus trivial FDs that are partial… DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-44 Decomposing into 2NF • Identify each partial FD. • Create a new relation for each part of the PK that determines other attributes. – Remove attributes that depend on each of these determinants from the original relation & put them in the new relation – I.e., place all determinants in separate relations along with their dependent attributes • In the original relation keep the composite key and any attributes that are fully functionally dependent on all of it. • Even if the composite key has no dependent attributes, keep that relation to connect logically to the other relations. DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-45 Putting NewClass into 2NF • NewClass(courseNo, stuId, lastName, facId, schedule, room, grade ) • FDs grouped by determinant: courseNo → (courseNo, facId, schedule, room) stuId → (stuId, lastName) (courseNo,stuId) → (courseNo, stuId, facId, schedule, room, lastName, grade) • Create tables grouped by determinants: Course(courseNo, facId, schedule, room) Stu(stuId, lastName) • Keep relation with original composite key, with attributes FD on it: NewStu2( courseNo, stuId, grade) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-46 2NF - Putting it all together • We started with: – NewClass(courseNo, stuId, lastName, facId, schedule, room, grade) • It was already in 1NF – We decomposed it into 2NF: • Course(courseNo, facId, schedule, room) • Stu(stuId, lastName) • NewStu2( courseNo, stuId, grade) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-47 Third Normal Form • A relation is in 3NF if whenever a nontrivial functional dependency X→A exists, – either X is a superkey or – A is a member of some candidate key • To be in 3NF, a relation must be in 2NF and have no transitive dependencies – I.e., no non-key attribute may determine another non-key attribute. – Here key includes “candidate key” DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-48 Transitive Dependency • If A, B, and C are attributes of relation R, such that A → B, and B → C, then C is transitively dependent on A. • NewStudent (stuId, lastName, major, credits, status) FD: credits→status …but credits is not a key By transitivity: stuId→credits AND credits→status implies stuId→status DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-49 Decomposing into 3NF • NewStudent (stuId, lastName, major, credits, status) • FD credits→status • Remove the dependent attribute, status, from the relation • Create a new table with the dependent attribute and its determinant, credits • Keep the determinant in the original table NewStu2 (stuId, lastName, major, credits) Status (credits, status) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall In 3NF 3-50 1NF/2NF/3NF Process • Before moving onto BCNF… – Put relation in 1NF • Remove all multi-valued attributes/repeating groups/etc. – List all FD’s / find a key – Put relation in 2NF • Remove all partial dependencies on key – Put relation in 3NF • Remove all transitive dependencies DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-51 Comprehensive example • Work(projName, projMgr, empID, hours, empName, budget, startDate, salary, empMgr, empDept, rating) – If not in 1NF, fix it – List all FD’s – Find a key – Remove all partial dependencies (2NF) – Remove all transitive dependencies (3NF) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-52