Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Open Database Connectivity wikipedia , lookup
Oracle Database wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Ingres (database) wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Functional Database Model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
Database Normalization What is Normalization Normalization allows us to organize data so that it: • Allows faster access (dependencies make sense) • Reduced space (less redundancy) Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller, wellstructured relations Results of Normalization Removes the following modification anomalies (integrity errors) with the database • Insertion • Deletion • Update ANOMALIES Insertion • inserting one fact in the database requires knowledge of other facts unrelated to the fact being inserted Deletion • Deleting one fact from the database causes loss of other unrelated data from the database Update • Updating the values of one fact requires multiple changes to the database Insertion Anomaly Suppose we want to enter the new Job 'Cooker' having $800 as a salary. We cannot enter the data into the EMPLOYEE relation until an employee enrolls in this Job. This restriction seems silly. Why should we have to wait until someone takes the Job before we can record its salary? This situation is called insertion anomaly. It is due to the relational model constraint that disables 'Null values' for the primary key (EMPID). EMPLOYEE EMPID NAME JOB 200 300 600 700 450 Smith John George Jean Mike Driver Guardian Guardian Gardener Gardener SALAR Y 1500 2000 2000 2500 2500 Insertion Anomaly These anomalies can be eliminated by decomposing the employee relation into the two ones shown in the following figure. EMP-JOB EmpId 200 300 600 700 450 Name Smith John George Jean Mike JOB-SAL Job driving Guardian Guardian Gardener Gardener Job Salary Driving Guardian 1500 2000 Gardener 2500 ANOMALIES EXAMPLES TABLE: COURSE COURSE# SECTION# C_NAME CIS564 072 Database Design CIS564 073 Database Design CIS570 072 Oracle Forms CIS564 074 Database Design ANOMALIES EXAMPLES Insertion: Suppose our College has approved a new course called CIS 563: SQL & PL/SQL. Can this information about the new course be entered (inserted) into the table COURSE in its present form? COURSE# SECTION# C_NAME CIS564 072 Database Design CIS564 073 Database Design CIS570 072 Oracle Forms CIS564 074 Database Design Deletion anomaly For the data in following figure (where we have only one driver), if we delete the employee number 200, we will lose not only that this employee is a driver, but also that driving costs' $1500. This is called a deletion anomaly; we are losing more information than we want to. We lose facts about two entities with one deletion. EMPLOYEE EMPID 200 300 600 700 450 NAME Smith John George Jean Mike JOB Driver Guardian Guardian Gardener Gardener SALARY 1500 2000 2000 2500 2500 Update anomaly Example1: If the salary of the job "Guardian" changes from $2000 to $2100, the updating operation must be repeated with each Guardian employee. EMPLOYEE EMPID 200 300 600 700 450 NAME Smith John George Jean Mike JOB Driver Guardian Guardian Gardener Gardener SALARY 1500 2000 2000 2500 2500 ANOMALIES EXAMPLES Update: Suppose the course name (C_Name) for CIS 564 got changed to Database Management. How many times do you have to make this change in the COURSE table in its current form? COURSE# SECTION# C_NAME CIS564 072 Database Design CIS564 073 Database Design CIS570 072 Oracle Forms CIS564 074 Database Design ANOMALIES Following Figure shows the relations issued from the decomposition phase. Now the previous anomalies are avoided so then we can : Delete the employee Smith from EMP-JOB without losing the fact that driving has a salary $1500, Insert a new job in the JOB-SAL table without enrolled employees, Modify only one tuple in the JOB-SAL table instead of many tuples in the initial relation. EMP-JOB EmpId 200 300 600 700 450 Name Smith John George Jean Mike JOB-SAL Job driving Guardian Guardian Gardener Gardener Job Salary Driving Guardian 1500 2000 Gardener 2500 ANOMALIES So, a table (relation) is a stable (‘good’) table only if it is free from any of these anomalies at any point in time. You have to ensure that each and every table in a database is always free from these modification anomalies. And, how do you ensure that? ‘Normalization’ theory helps. NORMAL FORMS 1 NF 2NF 3NF BCNF (Boyce-Codd Normal Form) 4NF 5NF Normal Forms Normalization is done through changing or transforming data into various Normal Forms. There are 5 Normal Forms but we almost never use 4NF or 5NF. We will only be concerned with 1NF, 2NF, and 3NF. Normal Forms For a database to be in a normal form, it must meet all requirements of the previous forms: • Eg. For a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF. Manager Fatma Abdulaziz Ali Employees Sayed, Tariq Tafla, Mohammed Sarai, Miriam Sample Data Manager Fatma Abdulkaziz Ali Employees Sayed, Tariq Tafla, Mohammed Sarai, Miriam Data that is not atomic means: • • • • We can’t easily sort the data We can’t easily search or index the data We can’t easily change the data We can’t easily reference the data in other tables Manager Fatma Abdulaziz Ali Employee1 Sayed Tafla Sarai Employee2 Tariq Mohammed Miriam Sample Data • We still can’t easily sort, search, or index our employees. • What if a manager has more than 2 employees, 10 employees, 100 employees? We’d need to add columns to our database just for these cases. • It is still hard to reference our employees in other tables. Manager Fatma Abdulaziz Ali Employee1 Sayed Tafla Sarai Employee2 Tariq Mohammed Miriam First Normal Form 1NF means that we must: • Eliminate duplicate columns from the same table, and • Create separate tables for each group of related data into separate tables, each with a unique row identifier (primary key) Let’s get started by making our columns atomic… Manager Fatma Fatma Abdulaziz Abdulaziz Ali Ali Employee Sayed Tariq Tafla Mohammed Sarai Miriam Primary Key The best primary key would be the Employee column. Every employee only has one manager, therefore an employee is unique. Employee Sayed Tariq Tafla Mohammed Sarai Miriam Manager Fatma Fatma Abdulaziz Abdulaziz Ali Ali Students should now say that the Employee is the Primary Key since there are now multiple manager values in the table. Only Employee is unique. First Normal Form Congratulations! The fact that all our data and columns is atomic and we have a primary key means that we are in 1NF! Employee Sayed Tariq Tafla Mohammed Sarai Miriam Manager Fatma Fatma Abdulaziz Abdulaziz Ali Ali Students should now say that the Employee is the Primary Key since there are now multiple manager values in the table. Only Employee is unique. ID 1 2 3 4 5 6 7 8 9 Employee ManagerID Sayed 7 Tariq 7 Tafla 8 Mohammed 8 Sarai 9 Miriam 9 Fatma Abdulaziz Ali Manager Fatma Abdulaziz Ali Employees Sayed, Tariq Tafla, Mohammed Sarai, Miriam ID 1 2 3 4 5 6 7 8 9 Employee ManagerID Sayed 7 Tariq 7 Tafla 8 Mohammed 8 Sarai 9 Miriam 9 Fatma Abdulaziz Ali Moving to Second Normal Form A database in 2NF must also be in 1NF: • Data must be atomic • Every row (or tuple) must have a unique primary key Plus: • Subsets of data that apply to multiple rows (repeating data) are moved to separate tables CustID 1 2 3 4 5 6 7 8 9 FirstName Bob John Sandy Maria Gameil James Shiela Ian Ed LastName Smith Brown Jessop Hernandez Hintz Richardson Green Sampson Rodgers Address 123 Main St. 555 2nd Ave. 4256 James St. 4599 Columbia 569 Summit St. 12 Cameron Bay 12 Michigan Ave. 56 Manitoba St. 15 Athol St. City Tucson St. Paul Chicago Vancouver St. Paul Regina Chicago Winnipeg Regina State AZ MN IL BC MN SK IL MB SK Zip 12345 54355 43555 V5N 1M0 54355 S4T 2V8 43555 M5W 9N7 S4T 2V9 City Tucson St. Paul Chicago Vancouver St. Paul Regina Chicago Winnipeg Regina State AZ MN IL BC MN SK IL MB SK Zip 12345 54355 43555 V5N 1M0 54355 S4T 2V8 43555 M5W 9N7 S4T 2V9 Moving to Second Normal Form The CustID determines all the data in the row, but U.S. Zip codes determines the City and State. (eg. A given Zip code can only belong to one city and state so storing Zip codes with a City and State is redundant) This means that City and State are Functionally Dependent on the value in Zip code and not only the primary key. Moving to Second Normal Form To be in 2NF, this repeating data must be in its own table. So: • Let’s create a Zip code table that maps Zip codes to their City and State. • Note that Canadian Postal Codes are different: the same city and state can have many different postal codes. Customer Table CustID 1 2 3 4 5 6 7 8 9 FirstName Bob John Sandy Maria Gameil James Shiela Ian Ed Zip Code Table Zip 12345 54355 43555 V5N 1M0 S4T 2V8 M5W 9N7 S4T 2V9 LastName Smith Brown Jessop Hernandez Hintz Richardson Green Sampson Rodgers City Tucson St. Paul Chicago Vancouver Regina Winnipeg Regina State AZ MN IL BC SK MB SK Address 123 Main St. 555 2nd Ave. 4256 James St. 4599 Columbia 569 Summit St. 12 Cameron Bay 12 Michigan Ave. 56 Manitoba St. 15 Athol St. Zip 12345 54355 43555 V5N 1M0 54355 S4T 2V8 43555 M5W 9N7 S4T 2V9 Advantages of 2NF Saves space in the database by reducing redundancies If a customer calls, you can just ask them for their Zip code and you’ll know their city and state! (No more spelling mistakes) If a City name changes, we only need to make one change to the database. Summary So Far… 1NF: • All data is atomic • All rows have a unique primary key 2NF: • Data is in 1NF • Subsets of data in multiple columns are moved to a new table • These new tables are related using foreign keys Moving to 3NF To be in 3NF, a database must be: • In 2NF • All columns must be fully functionally dependent on the primary key (There are no transitive dependencies) OrderID CustID ProdID Price Quantity Total 1 1001 AB-111 50 1,000 50,000 2 1002 AB-111 60 500 30,000 3 1001 ZA-245 35 100 3,500 4 1003 MB-153 82 25 2,050 5 1004 ZA-245 42 10 420 6 1002 ZA-245 40 50 2,000 7 1001 AB-111 75 100 7,500 OrderID CustID ProdID Price Quantity Total 1 1001 AB-111 50 1,000 50,000 2 1002 AB-111 60 500 30,000 3 1001 ZA-245 35 100 3,500 4 1003 MB-153 82 25 2,050 5 1004 ZA-245 42 10 420 6 1002 ZA-245 40 50 2,000 7 1001 AB-111 75 100 7,500 OrderID CustID ProdID Price Quantity Total 1 1001 AB-111 50 1,000 50,000 2 1002 AB-111 60 500 30,000 3 1001 ZA-245 35 100 3,500 4 1003 MB-153 82 25 2,050 5 1004 ZA-245 42 10 420 6 1002 ZA-245 40 50 2,000 7 1001 AB-111 75 100 7,500 OrderID CustID ProdID Price Quantity Total 1 1001 AB-111 50 1,000 50,000 2 1002 AB-111 60 500 30,000 3 1001 ZA-245 35 100 3,500 4 1003 MB-153 82 25 2,050 5 1004 ZA-245 42 10 420 6 1002 ZA-245 40 50 2,000 7 1001 AB-111 75 100 7,500 Maybe price is dependent on the ProdID and Quantity: The more you buy of a given product the cheaper that product becomes! So we ask the business manager and she tells us that this is the case. OrderID CustID ProdID Price Quantity Total 1 1001 AB-111 50 1,000 50,000 2 1002 AB-111 60 500 30,000 3 1001 ZA-245 35 100 3,500 4 1003 MB-153 82 25 2,050 5 1004 ZA-245 42 10 420 6 1002 ZA-245 40 50 2,000 7 1001 AB-111 75 100 7,500 We say that Price has a transitive dependency on ProdID and Quantity. • This means that Price isn’t just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course what is ordered). OrderID CustID ProdID Price Quantity Total OrderID CustID ProdID Price Quantity Total OrderID CustID ProdID Price Quantity OrderID CustID ProdID Price Quantity OrderID CustID ProdID ProdID Quantity Price Quantity OrderID CustID ProdID Quantity ProdID Quantity Price 1 1001 AB-111 1,000 AB-111 1 75 2 1002 AB-111 500 AB-111 101 60 3 1001 ZA-245 100 AB-111 501 50 4 1003 MB-153 25 ZA-245 1 42 5 1004 ZA-245 10 ZA-245 11 40 6 1002 ZA-245 50 ZA-245 51 35 7 1001 AB-111 100 MB-153 1 82 OrderID CustID ProdID Quantity ProdID Quantity Price 1 1001 AB-111 1,000 AB-111 1 75 2 1002 AB-111 500 AB-111 101 60 3 1001 ZA-245 100 AB-111 501 50 4 1003 MB-153 25 ZA-245 1 42 5 1004 ZA-245 10 ZA-245 11 40 6 1002 ZA-245 50 ZA-245 51 35 7 1001 AB-111 100 MB-153 1 82 Congratulations! We’re now in 3NF! We can also quickly figure out what price to offer our customers for any quantity they want. To Summarize (again) A database is in 3NF if: • It is in 2NF • It has no transitive dependencies A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field) We remove fields with a transitive dependency to a new table and link them by a foreign key. Summarizing A database is in 2NF if: • It is in 1NF • There is no repeating data in its tables. Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key. And Finally… A database is in 1NF if: • All its attributes are atomic (meaning they contain only a single unit or type of data), and • All rows have a unique primary key.