* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Relational Database Design - State University of New York
Open Database Connectivity wikipedia , lookup
Serializability wikipedia , lookup
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Relational algebra wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Ingres (database) wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational model wikipedia , lookup
Relational Database Design III Database Principles Normalization: • A table is good if it is about one thing. • A table is good if the only FDs are of the form key attribute • Consider the table below with the following FDs. We know that its CKs are {A,F}, {B,F} and {C,F}. R A B C D B, E -> C A, F -> B C -> A, D B -> E E F • So R is a bad table with several bad FDs. Database Principles bad good bad bad Exercise: • Label the FDs of the following table as good or bad. The only CK is {B}. R A B C D E F D -> C, A B -> F F -> C, E B -> D Database Principles bad good bad good Exercise: • Label the FDs of the following table as good or bad. The CKs are {B,E}, {D,E} and {F,E}. R A B C D E F A, B -> C D, E -> B F -> D B, E -> F D -> A Database Principles bad good bad good bad Prime and Non-prime Attributes • Attributes that belong to some CK are called prime. • Other attributes are called non-prime. • The CKs of R are {A,F}, {B,F} and {C,F}. R A B C D B, E -> C A, F -> B C -> A, D B -> E E F • Prime Attributes: A, B, C, F • Non-prime Attributes: D, E Database Principles Prime and Non-prime Attributes • The CK of R is {B}. R A B C D D -> C, A B -> F F -> C, E B -> D E F • Prime Attributes: B • Non-prime Attributes: A, C, D, E, F Database Principles Prime and Non-prime Attributes • The CKs are {B,E}, {D,E} and {F,E}. R A B C D A, B -> C D, E -> B F -> D B, E -> F D -> A E F • Prime Attributes: B, D, F, E • Non-prime Attributes: A, C Database Principles The Normal Forms: • The process of turning bad tables into good ones is called normalization. • The process of taking good tables and combining them in such a way as to create a bad table (typically done for the sake of efficiency) is called denormalization. unnormalized tables BCNF 4NF ... 3NF Database Principles 2NF 1NF Basic Idea: • Start with a table in one of the outer rings of the diagram and decompose it into two tables that belong to an inner ring. BCNF 4NF ... 3NF 2NF 1NF • NOTE: The closer to the center the “better” the tables. Database Principles Decomposing Tables: • Decomposing a table means breaking it up into two or more tables using the relational algebra project operator. R A B C D E S A T B C S = π A,B,C ( R ) C D E T = π C,D,E ( R ) Exercise: Write these relational algebra operations in SQL. Database Principles Beware: Not all Table Decompositions are Safe: • Consider the table Student SID SName DOB Major s1 s2 s3 s4 John Mary Alfredo Tracey 1982 1983 1975 1982 Hist Eng Math CS • Suppose we decompose the table into two tables S = π sid,sname,dob ( Student ) SID s1 s2 s3 s4 T = π dob,major ( Student ) SName DOB DOB John Mary Alfredo Tracey 1982 1983 1975 1982 1982 1983 1975 1982 Database Principles Major Hist Eng Math CS Question: • Do S and T contain the same information as Student? • Examples: What is Alfredo’s major? Math What year was Mary born in? 1983 What is John’s major? History or Computer Science So you can see that S and T do NOT contain the same info as Student. • The easy test is to try to recreate Student from S and T using JOIN. Database Principles Answer: • Join(S,T) SID SName DOB Major s1 s1 s2 s3 s4 s4 John John Mary Alfredo Tracey Tracey 1982 1982 1983 1975 1982 1982 Hist CS Eng Math Hist CS • We notice that the join table contains noise rows. • This means we can not reconstruct Student exactly so we have lost info using this decomposition. • It is called a lossy-join decomposition but we might call it a noisy-join decomposition. Database Principles Lossless-Join Decomposition • A decomposition of a table R into two tables, S and T, is called a lossless-join decomposition if R = join(S,T), with no new rows and no rows lost. • We might call this a noiseless-join decomposition. • If a decomposition of R into two tables, S and T, is a lossless-join decomposition then no information found in the original table R is lost. Database Principles Main Theorem: • If R is a table and we decompose R into two tables, S and T, such that – union(S ,T)= R and intersect( S , T ) is a key of either S or T then this is a lossless-join decomposition of R and R = join(S,T). NOTE: What makes our original decomposition a lossy-join ∩ S , T ) is decomposition is that {DOB} = intersect( neither a key to S nor to T. Database Principles Theorem Proof: • If R is a table and we decompose R into two tables, S and T, such that • union(S ,T) = R and • intersect( S , T ) is a key of T. Let |R| denote the number of rows in R. |S| <= |R| and |T| <= |R| since both S and T are produced by projecting over R. R subset_of join(S,T) so |R| <= |join(S,T)| Since intersect( S , T ) is a key to T, each value of intersect( S , T ) appears once and only once in T. Thus each row in S will join to only one row of T and so |join(S,T)| <= |S|. Since |S| <= |R| we conclude |join(S,T)| == |R| and so the two tables Database are equal. Principles Lossless-join Example: • Consider the table Student SID SName DOB Major s1 s2 s3 s4 John Mary Alfredo Tracey 1982 1983 1975 1982 Hist Eng Math CS • Suppose we decompose the table into two tables S = π sid,sname,dob ( Student ) SID s1 s2 s3 s4 T = π sid,dob,major ( Student ) SName DOB SID DOB Major John Mary Alfredo Tracey 1982 1983 1975 1982 s1 s2 s3 s4 1982 1983 1975 1982 Hist Eng Math CS Database Principles Question: • Do S and T contain the same information as Student? • Examples: What is Alfredo’s major? Math What year was Mary born in? 1983 What is John’s major? History So you can see that S and T do contain the same info as Student. • Try to recreate Student from S and T using JOIN. Database Principles Answer: • Join(S,T) SID SName DOB Major s1 s2 s3 s4 John Mary Alfredo Tracey 1982 1983 1975 1982 Hist Eng Math CS • We notice that the join table contain no noise rows. • This means we can reconstruct Student exactly so we have not lost info using this decomposition. • For that reason it is called a lossless-join decomposition but we might call it a noiseless-join decomposition. Database Principles We Use Lossless-join Decompositions. • In all of the work that follows, the only decompositions we will be applying are lossless-join decompositions. Database Principles Normalization (1NF): • First Normal Form: A table is in First Normal Form (1NF) if all of its values are atomic. Enrollment SID CrsID s1 c1,c2,c3 NOT 1NF Enrollment SID s1 s1 s1 CrsID c1 c2 c3 NOTE: All our tables will be assumed 1NF Database Principles 1NF Normalization (2NF): • Second Normal Form: A table is in Second Normal Form (2NF) if it is in 1NF and there are no bad FDs of the form: prime (proper subset of candidate key) -> non-prime • Note 1: This situation can not occur if all CKs are singletons. In that case all prime attributes are also keys. • Note 2: Suppose the CKs for a table are {A,E} and {B, E} and suppose A, B --> C. This is not a 2NF violation since {A,B} is not a subset of a CK. We will deal with this situation alter. Database Principles 2NF Violation Example: • Suppose we have an alternative BOOK table whose CKs are {ISBN} and {author,title}. Book1 ISBN author title pub_name pub_date c_price a_addr CKs: {ISBN}, {author,title} FD: author --> a_addr -- 2NF violation • Explanation: The prime attributes are {ISBN, author, title} and a_addr is non-prime so the FD violates the 2NF rule. Database Principles 2NF Violation Resolution • To remove an FD that is a 2NF violation from a table perform the following decomposition: – create a new table containing all the attributes of the bad FD Author author a_addr CK: {author} FD: author --> a_addr -- good – remove the attributes from the right-hand-side of the bad FD from the original table Book ISBN author title pub_name pub_date c_price Database Principles CK: {ISBN} FDs: all good Did We Lose Information? intersect(Book , Author ) = {author} • {author} is the intersection of the two table schemas and is the key to one of the tables (Author) so this is a losslessjoin decomposition. • In other words: join(Book, Author) = Book1 Database Principles 2NF Exercise: • Consider the table below with the following FDs and whose CKs are {A,F}, {B,F} and {C,F}. R A B C D B, E -> C A, F -> B C -> A, D B -> E E F • Step 1: Identify the prime and non-prime attributes prime: A, B, C, F non-prime: D, E • Step 2: Identify the 2NF violating FDs. C --> D and B --> E Database Principles 2NF Exercise: • Step 3: Decompose the original table R, removing the bad FDs R A B C D E F decomposes to R’ A B C F S C D CKs: {A, F}, {B, F}, PK: C {C,F} FD: C -> D -- good FDs: B -> C -- bad A, F -> B -- good C -> A -- bad Database Principles T B E PK: B FD: B -> E -- good Exercise: • In the previous example we combined the two FDs as B, E -> C B -> E -- FD 1 -- FD 2 B -> C • Prove this is so. (i) B -> B, E (ii) B -> C -- FD1, ident(B), aug+ -- (i), FD1, trans Database Principles Normalization (3NF): • Third Normal Form: A table is in Third Normal Form (3NF) if it is in 2NF and there are no bad FDs of the form: non-prime -> non-prime Database Principles 3NF Violation Example: • Suppose we have the original Cardholder table created from the ER Diagram. Cardholder borrowerid b_name b_addr b_status loan_limit pk prime: borrowerid non-prime: b_name, b_addr, b_status, loan_limit borrowerid -> b_name borrowerid -> b_addr borrowerid -> b_status borrowerid -> loan_limit b_status -> loan_limit -- good -- good -- good -- good -- 3NF violation Database Principles 3NF Violation Resolution • To remove an FD that is a 3NF violation from a table perform the following decomposition: – create a new table containing all the attributes of the bad FD Status b_status loan_limit CK: {b_status} FD: b_status --> loan_limit -- good – remove the attributes from the right-hand-side of the bad FD from the original table Cardholder borrowerid b_name b_addr b_status Database Principles CK: {borrowerid} FDs: all good 3NF Exercise: • The following table has only one CK and that is {B}. Decompose the table. R A B C D D -> C, A B -> F F -> C, E B -> D E F bad good bad good • Step 1: List the prime and non-prime attributes prime: B non-prime: A, C, D, E, F Step 2: Identify the 3NF violating FDs. D -> C, A and F -> C, E Exercise: Prove that a table with only singleton CKs has no 2NF violations. Database Principles 3NF Exercise: • Step 3: Decompose the original table R, removing the bad FDs R A B C D D -> C, A B -> F F -> C, E B -> D E F decomposes to S D C A R’ B D F PK: B FDs: B -> D,F -- good PK: D FD: D -> C,A -- good T F E PK: F FD: F -> E -- good NOTE: The column C is missing from the table T because we removed it from R when we created S so it was not available when we created T. Database Principles Boyce-Codd Normal Form (BCNF): • Boyce-Codd Normal Form: A table is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and there are no remaining bad FDs of any kind. • Any remaining bad FDs must be of one of the following forms: prime(not a key) -> prime prime(not a subset of a candidate key) -> non- prime prime(not subset of a candidate key), non-prime -> prime prime(not subset of a candidate key), non-prime -> non-prime Exercise: Prove non-prime prime is impossible. Database Principles BCNF Violation Example: • The table below describes the assignment of a police officer (o) to drive a certain vehicle (v) during a certain shift (s) on a certain day (d). The assignment is given a certain readiness level (r ) that depends on the skills of the officer and the equipment in the vehicle. The weather (w) is a function of the date and the shift. • The only CK is {VIN,Shift,Date}. ShiftAssignment Officer VIN Shift Date R_Level Weather o v s d r a Database Principles w BCNF Violation Example: ShiftAssignment Officer VIN Shift Date R_Level Weather o v s d r w a Prime: VIN, Shift, Date Non-prime: Officer, R_Level, Weather FDs: VIN Shift, Date -> Officer VIN, Officer -> R_Level Shift, Date -> Weather -- good -- BCNF violation -- BCNF violation Database Principles BCNF Violation Resolution • To remove an FD that is a BCNF violation from a table perform the following decomposition: – create a new table containing all the attributes of the bad FD Readiness VIN Officer R_Level Weather Shift Date Weather CK: {VIN, Officer} FD: VIN, Officer --> R_Level CK: {Shift, Date} FD: Shift, Date --> Weather Database Principles -- good -- good BCNF Violation Resolution • To remove an FD that is a BCNF violation from a table perform the following decomposition: – remove the attributes from the right-hand-side of the bad FDs from the original table ShiftAssignment Officer VIN Shift Date PK: {VIN, Shift, Date} FD: VIN, Shift, Date --> Officer pk Database Principles -- good Exercise: R A B C D FDs: A -> D D -> B A, C -> E E -> A E CKs: (A, C} {E, C} Prime: A, E, C Non-prime: B, D Violations: A -> D D -> B A, C -> E E -> A -- 2NF viol -- 3NF viol -- good -- BCNF viol Database Principles Exercise: • Remove 2NF violations R’ A B C S A E D FD: A -> D -- good pk A, C -> E E -> A -- good -- BCNF viol • Remove 3NF violations: R’ A B C E D -> B seems to have disappeared?? But did it? A -> D together with D -> B imply A -> B This new FD is another 2NF violation Database Principles Exercise: • Remove 2NF violations (second try) R’ A B C S A E D B pk FD: A -> D,B D -> B -- good -- 3NF viol • Remove 3NF violations: R” A C S A E D pk A, C -> E -- good E -> A -- BCNF viol A -> D -- good Database Principles S D B pk D -> B -- good Exercise: • Remove BCNF violation R’” C E S A pk pk D A -> D -- good S D B pk D -> B -- good U A E pk E -> A -- good NOTE: The FD A, C -> E seems to have disappeared but performing a join of R’” and U will reproduce R” R” = join(R’”,U) and so recover the original FD (and its associated Enterprise Rule). Database Principles Fourth Normal Form (4NF): • By this time there are no more bad FDs. • This makes the tables pretty good but sometimes it is still possible to improve them. • 4NF violations can come about from Enterprise Rules that can not be expressed as an FD. • 4NF violations typically come about when the two relationships, r1 and r2, described below are converted into a single table. E1 e1 r1 (1,n) E2 (1,n) e2 r2 (1,n) Database Principles E3 (1,n) e3 4NF Violation Example: • From the ERD CAR type MAKER name sells (1,n) (1,n) COUNTRY name sold_in (1,n) (1,n) • We produce the table sells_sold_in car_type maker_name country_name • Instead of sells car_type maker_name sells_in maker_name Database Principles country_name 4NF violation example What rows would the table sells_sold_in contain? One possibility is sells_sold_in = join(sells,sold_in) This will mean that in sells_sold_in every car_type that a particular car maker produces will be matched with every country that the same car maker sells cars in. If this is the case, then the sells_sold_in table contains a 4NF violation. Database Principles How to Recognize a 4NF Violation: • Suppose we have a car maker that sells two models in two countries. sells car_type maker_name c1 c2 m1 m1 sells_in maker_name m1 m1 country_name crty1 crty2 • In the single sells_sold_in table this would appear as sells_sold_in car_type maker_name c1 c1 c2 c2 country_name m1 ctry1 m1 crty2 m1 crty1 m2 crty2 Database Principles How to Recognize a 4NF Violation: • Within this table there may be a hidden Enterprise Rule make_sells_in car_type maker_name c1 c1 c2 c2 m1 m1 m1 m2 country_name ctry1 crty2 crty1 crty2 Enterprise Rule: If a car maker sells one car type in a country then it sells all its car types in the same country. Exercise: Verify that the insert, update and delete anomalies exist in the table above and they do not exist in the tables below sells car_type maker_name c1 c2 sells_in maker_name m1 m1 m1 m1 Database Principles country_name crty1 crty2 How to Recognize a 4NF Violation: A table is a 4NF violation only if the permanent Enterprise Rule exists and not whether or not, on a particular day, it happens to contain all the rows of a join of two other tables. The sells_sold_in table has a 4NF violation only if sells_sold_in = join(sells, sold_in) every time you might ever construct this join. NOTE: intersect(sells, sold_in ) is not a key in either sells or sold_in so as a decomposition, this is not guaranteed to be a lossless-join decomposition. It is lossless-join if sells_sold_in in a 4NF violation however. prove this. Database Principles How to Recognize a 4NF Violation looking at Rows: • sells_sold_in is a 4NF violation if join(sells, sells_in) is always equal to sells_sold_in join(sells, sells_in) = maker_sells_in sells car_type maker_name c1 c2 car_type maker_name m1 m1 = join sells_in maker_name m1 m1 maker_sells_in c1 c1 c2 c2 country_name crty1 crty2 Database Principles m1 m1 m1 m2 country_name ctry1 crty2 crty1 crty2 4NF Violation Resolution Decompose the table with a 4NF violation into two tables. Since the 4NF-violation table is always equal to the join the two decomposition tables this decomposition will be a lossless-join decomposition. Database Principles Will We Ever Encounter a 4NF violation? • Anyone who has taken this course should know not to create a single table from the two relationships in the original ER diagram so should not create a table containing a 4NF violation. • It is the presence of the Enterprise Rule mentioned earlier in a single table to produces a 4NF violation. • So if you do create the single table but the Enterprise Rule is not true then the single table is not a violation of the 4NF Rule. Database Principles Abstractly: Suppose R is a table with two attribute sets X and Y. Further suppose union(X, Y) = R intersect(X, Y) != intersect(X, Y) is not a key to either S = X(R) or T = Y(R) Then R contains a 4NF violation if and only if this decomposition is always lossless-join. This is a good thing. it tells us that if there is a problem there is always a solution. Database Principles