* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Zvi`s changes, if any, are marked in green, they are not copyrighted
Microsoft Access wikipedia , lookup
Global serializability wikipedia , lookup
Encyclopedia of World Problems and Human Potential wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Commitment ordering wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Serializability wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Oracle Database wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Ingres (database) wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Relational algebra wikipedia , lookup
Clusterpoint wikipedia , lookup
Database model wikipedia , lookup
Zvi’s changes, if any, are marked in green, they are not copyrighted by the authors, and the authors are not responsible for them !Chapter 7: Relational Database Design Our goal in this chapter is to formalize your intuition from the entity-relationship design approach. It’s actually kind of fun. If you are into the intellectual aesthetic of what we are doing, notice that with ER, we captured notions that are common to all database applications: 1-1, 1-many, many-to-many relationships, ISA relationships, etc. When I design a database application I draw those entities and relationships, but when I get down to the details, I use “normalization theory” (aka database design theory) to be sure I get this right. Database System Concepts 7.2 ©Silberschatz, Korth and Sudarshan First Normal Form Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts A relational schema R is in first normal form if the domains of all attributes of R are atomic Non-atomic values complicate storage and encourage redundant (repeated) storage of data • E.g. Set of accounts stored with each customer, and set of owners stored with each account • We assume all relations are in first normal form Database System Concepts 7.3 ©Silberschatz, Korth and Sudarshan First Normal Form (Contd.) Atomicity is actually a property of how the elements of the domain are used. • E.g. Strings would normally be considered indivisible • Suppose that students are given course numbers which are strings of the form CS0012 or EE1127 • If the first two characters are extracted to find the department, the domain of course numbers is not atomic. • Doing so is a bad idea: leads to encoding of information in application program rather than in the database. In practice people do it to answer many queries and that is why SQL DML allows it, but we are in relational algebra… Story of a certain French bank…. Database System Concepts 7.4 ©Silberschatz, Korth and Sudarshan Pitfalls in Relational Database Design Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to • Repetition of Information. • Inability to represent certain information. Design Goals: • Avoid redundant data • Ensure that relationships among attributes are represented • Facilitate the checking of updates for violation of database integrity constraints. Database System Concepts 7.5 ©Silberschatz, Korth and Sudarshan A canonical example We first consider the relation schema R=R(E,G,S), where • E: Employee number • G: Grade (as in rank within a company) • S: Salary The users specified for us a set of semantic constraints: • Each value of E has a single value of G associated with it. • Each value of G has a single value of S associated with it. For simplicity, we will sometimes refer to relations schemes by listing their attributes; thus we may write EGS instead of R above. We will spend a lot of time discussing this example, if we understand it, we understand more than half of normalization theory needed in practice! Database System Concepts 7.6 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, A sample instance Consider a sample instance of EGS: • EGS: E a b g d G A B A C S 1 2 1 1 By examining the above sample instance, we will note that the design suffers from various deficiencies. Database System Concepts 7.7 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Deficiencies of the design Redundancy: The fact (or more technically, the constraint) that G = A implies S = 1 is stored twice in the relation. General problems: • Inability to store partial information: If the salary schedule is augmented by a new grade, G = D, with corresponding salary of S = 3, this fact cannot be stored in the relation, unless E has the value of null. (But it is quite clear that in this relation E is the only possibly primary key,so we cannot have “several” tuples potentially with the value of null in this column.) • More difficult to enforce consistent updating. • Information may disappear: If employee b leaves, we forget what salary is given to grade B. Database System Concepts 7.8 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Decomposition The above problems can be solved by decomposing the relation EGS into two relations: • R1(EG)=pEG (R): • R2(GS)=pGS(R): SELECT E, G SELECT G, S FROM R; FROM R; Obtaining, • R1: E a b g d • R2: G A B C Database System Concepts G A B A C S 1 2 1 7.9 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Reconstructing the original relation How to reconstruct R from R1, R2? It is easy to see that it is enough to compute R = R1 R2 (the natural join of R1 and R2, written using easily displayed fonts), that is: SELECT E, G, S FROM R1, R2 WHERE R1.G = R2.G; and indeed we obtain. • E a b g d G A B A C S 1 2 1 1 Will this be true of other decompositions? Database System Concepts 7.10 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, A second decomposition Let us try decomposing EGS into EG and ES. We obtain: • ES: E a b g d S 1 2 1 1 • EG: E a b g d G A B A C The natural join gives the original relation back: EG ES = EGS. Database System Concepts 7.11 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, A third decomposition Let us attempt another “reasonable'' decomposition: EGS into ES and GS. We obtain: • ES: E a b g d S 1 2 1 1 • GS: G A B C S 1 2 1 Database System Concepts 7.12 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, The natural join of the “small” relations Let us compute the natural join of ES and GS. We obtain: • E a a b g g d d G A C B A C A C S 1 1 2 1 1 1 1 This relation is not equal to the original EGS relation Database System Concepts 7.13 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Decompositions vs. reconstructions By examining the 3 decompositions we observe that: • Some decompositions allow us to reconstruct the original relation. • Some decompositions do not allow us to reconstruct the original relation • We want a decomposition that allows us to reconstruct the original relation (otherwise we get false facts). Database System Concepts 7.14 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Lossless join decompositions We say that some decomposition of a relation R into relations R1, ..., Rm, where each Ri is the projection of R on some attributes is a lossless join decomposition, if and only if: • Each attribute of R appears in some Ri • R is the natural join of R1, ..., Rm We will also use the term “valid'' for lossless join Database System Concepts 7.15 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Example Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Redundancy: • Data for branch-name, branch-city, assets are repeated for each loan that a branch makes • Wastes space • Complicates updating, introducing possibility of inconsistency of assets value Null values • Cannot store information about a branch if no loans exist • Can use null values, but they are difficult to handle. Database System Concepts 7.16 ©Silberschatz, Korth and Sudarshan Decomposition Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 R2 Lossless-join decomposition. For all possible relations r on schema R r = R1 (r) R2 (r) A very important property: If R = R1 R2, always: r R1 (r) R2 (r) Therefore lossless means: you do not gain spurious tuples by joining, which seems the only reasonable way to reconstruct the original relation. Database System Concepts 7.17 ©Silberschatz, Korth and Sudarshan Example of Non Lossless-Join Decomposition Decomposition of R = (A, B) R2 = (A) A B A B a a b a b 1 2 A(r) B(r) 1 2 1 r A (r) Database System Concepts R2 = (B) B (r) A B a a b b 1 2 1 2 7.18 ©Silberschatz, Korth and Sudarshan Goal — Devise a Theory for the Following Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • And dependencies are preserved (more about this later) Our theory is based on: • functional dependencies • multivalued dependencies (optional) Dennis doesn’t like these because he has never seen them hold in practice. But look up in the book if you like. Database System Concepts 7.19 ©Silberschatz, Korth and Sudarshan Functional Dependencies Constraints on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. A functional dependency is a generalization of the notion of a key. Database System Concepts 7.20 ©Silberschatz, Korth and Sudarshan Functional Dependencies (Cont.) Let R be a relation schema a R and b R The functional dependency ab holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes a, they also agree on the attributes b. That is, t1[a] = t2 [a] t1[b ] = t2 [b ] Example: Consider r(A,B) with the following instance of r. 1 1 3 4 5 7 On this instance, A B does NOT hold, but B A does hold. Database System Concepts 7.21 ©Silberschatz, Korth and Sudarshan An example Although functional dependencies are properties of a schema, and not only of the individual instances, we will practice, and check if certain FDs are satisfied by some sample instance: • A a1 a2 a2 a1 a1 a2 • • • • • AC B b1 b1 b2 b1 b2 b3 D d1 d2 d3 d1 d2 d2 E e1 e2 e3 e1 e4 e5 F f1 f2 f3 f4 f5 f6 No AB C Yes E CD Yes DB No F ABC Yes Database System Concepts C c1 c1 c3 c1 c2 c3 7.22 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Relative power of some FDs An important note: A C is always at least as powerful as AB C that is If a relation satisfies A C it must satisfy AB C What we are really saying is that if C = f(A), then of course C = f(A,B) If not obvious to you, then look at it this way. A C says that any two rows that agree on their A values also agree on their C values. AB C says that any two rows that agree on both their A and B values also agree on their C values. Well, if they agree on both their A and B values, then surely they agree on their A values, so if AC held, we would be done. The reverse does not hold. Database System Concepts 7.23 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Functional Dependencies (Cont.) K is a superkey for relation schema R if and only if K R K is a candidate key for R if and only if • K R, and • for no proper subset a K, a R Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). We expect this set of functional dependencies to hold: loan-number amount loan-number branch-name but would not expect the following to hold: loan-number customer-name Database System Concepts 7.24 ©Silberschatz, Korth and Sudarshan Use of Functional Dependencies We use functional dependencies to: • test relations to see if they are legal under a given set of functional dependencies. • If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. • specify constraints on the set of legal relations • We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan-schema may, by chance, satisfy loan-number customer-name. Database System Concepts 7.25 ©Silberschatz, Korth and Sudarshan Trivial FDs An FD X Y, where X and Y are sets of attributes is trivial if and only if Y X. (Such FD gives no constraints, as it is always satisfied.) Example: AB A is trivial. Database System Concepts 7.26 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Decomposition and union of some FDs An FD X A1 A2 ... Am, where Ai's are individual attributes is equivalent to the set of FDs: X A1, X A2, ..., X Am. Example: ABC DE is equivalent to ABC D and ABC E. Database System Concepts 7.27 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Logical implications of FDs It will be important to us to determine if a given set of FDs forces other FDs to be true (necessarily implies that additional FDs must hold). Consider again the EGS relation. Which FDs are satisfied? E G, G S, E S are all true in the real world. If the real world tells you only: E G and G S Can you deduce on your own, without understanding the application that E S? Database System Concepts 7.28 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Logical implications of FDs (cont.) Yes, by simple logical argument: transitivity. Take any (set of) tuples that are equal on E. Then given E G we know that they are equal on G. Then given G S we know that they are equal on S. So we have shown that E S must hold We say that E G, G S logically imply E S and we write E G, G S |= E S. Database System Concepts 7.29 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Logical implications of FDs (cont.) If the real world tells you only: • E G and E S, Can you deduce on your own, without understanding the application that • G S No, because of a counterexample: • E a b G A A S 1 2 This relation satisfies E G and E S, but violates G S. Database System Concepts 7.30 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Closure of a Set of Functional Dependencies Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. • E.g. If A B and B C, then we can infer that A C The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F+. Database System Concepts 7.31 ©Silberschatz, Korth and Sudarshan Closure of Attribute Sets Given a set of attributes a, define the closure of a under F (denoted by a+) as the set of attributes that are functionally determined by a under F: a b is in F+ b a+ Algorithm to compute a+, the closure of a under F (guilt by association approach) result := a; while (changes to result) do for each b g in F do begin if b result then result := result g end Database System Concepts 7.32 ©Silberschatz, Korth and Sudarshan Example of Attribute Set Closure R = (A, B, C, G, H, I) F = {A B AC CG H CG I B H} (AG)+ 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGH (CG H and CG AGBC) 4. result = ABCGHI (CG I and CG AGBCH) Is AG a candidate key? 1. Is AG a super key? 1. Does AG R? Or equivalently: (AG)+ = R? 2. Is any subset of AG a superkey? 1. Does A R? or equivalently: A+ = R? 2. Does G R? or equivalently: G+ = R? Database System Concepts 7.33 ©Silberschatz, Korth and Sudarshan !Example (order of FDs slightly changed) Let R = ABCDEFGHIJ Given FDs: • F BG • A DE • BD • CF • IJ Then ABC+ = ABCDEFG and ABC Z (for a set of attributes Z), if and only if Z ABCDEFG Therefore: • ABC FC is true • ABC IC is false Database System Concepts 7.34 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, EGS again We can easily show that • {E G, G S}+ contains E S • {E G, E S}+ does not contain G S So what we did in an ad-hoc manner previously can be converted to an algorithmic procedure. Database System Concepts 7.35 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Uses of Attribute Closure There are several uses of the attribute closure algorithm: Testing for superkey: • To test if a is a superkey, we compute a+, and check if a+ contains all attributes of R. Testing functional dependencies • To check if a functional dependency a b holds (or, in other words, is in F+), just check if b a+. • That is, we compute a+ by using attribute closure, and then check if it contains b. • Is a simple and cheap test, and very useful Computing closure of F • For each g R, we find the closure g+, and for each S g+, we output a functional dependency g S. Database System Concepts 7.36 ©Silberschatz, Korth and Sudarshan Keys of relations The notion of an FD allows us to formally define keys. Given R, satisfying a set of FDs, a set of attributes X of R is a key, if and only if: • X+ = R. • For every Y X such that Y X, we have Y+ R. Note that if R does not satisfy any (nontrivial) FDs, then R is the only key of R. In our example E G, G S, E S E was the only key of EGS. Database System Concepts 7.37 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Review of EGS Let us review the relation EGS. The “new relations” were: • EG, with the key E and a non-trivial FD E G. • GS, with the key G and a non-trivial FD G S. • ES, with the key E and a non-trivial FD E S. Each of those relations was “good,” • I.e., there were no redundancies in any of them, each nontrivial FD contained a key on the left side It turns out tha EG, GS is better than EG, ES as a decomposition. Can you see why? Database System Concepts 7.41 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, !Third Normal Form A relation schema R is in third normal form (3NF) if for all: a b in F+ at least one of the following holds: • a b is trivial (i.e., b a) • a is a superkey for R, that is check whether a+ = R • Each attribute A in b – a is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) We now proceed to show how to obtain third normal form with a lossless join decomposition. Database System Concepts 7.47 ©Silberschatz, Korth and Sudarshan Canonical Cover Sets of functional dependencies may have redundant dependencies that can be inferred from the others • Eg: A C is redundant in: {A B, B C, A C} Parts of a functional dependency may be redundant {A B, B C, A CD} can be simplified to {A B, B C, A D} Note, this is really a redundant dependency issue, as we can say that in {A B, B C, A C, A D} , A C is redundant • E.g. on RHS: • E.g. on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D} This is NOT redundant dependency issue, but unnecessarily weak description of dependencies Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies or having redundant parts of dependencies Database System Concepts 7.48 ©Silberschatz, Korth and Sudarshan Extraneous Attributes Consider a set F of functional dependencies and the functional dependency a b in F. • Attribute A is extraneous in a if A a and F logically implies (F – {a b}) {(a – A) b}. • Attribute A is extraneous in b if A b and the set of functional dependencies (F – {a b}) {a (b – A)} logically implies F. Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one Example: Given F = {A B, AB C } • B is extraneous in AB C because A C is implied by F (equivalently C is in A+) Example: Given F = {A C, AB CD} • C is extraneous in AB CD since AB C can be inferred even after deleting C Database System Concepts 7.49 ©Silberschatz, Korth and Sudarshan Testing if an Attribute is Extraneous Consider a set F of functional dependencies and the functional dependency a b in F. • To test if attribute A a (left side) is extraneous in a 1. compute the attribute closure ( {a} – A)+ using the dependencies in F including a b, 2. check that ( {a} – A)+ contains b; if it does, A is extraneous • To test if attribute A b (right side) is extraneous in b 1. compute a+ using only the dependencies in F’ = (F – {a b}) {a (b – A)}, 2. check that a+ contains A; if it does, A is extraneous Database System Concepts 7.50 ©Silberschatz, Korth and Sudarshan Canonical Cover Algorithm (IMPORTANT) A canonical cover for F is a set of dependencies Fc such that • F logically implies all dependencies in Fc, and • Fc logically implies all dependencies in F, and • No functional dependency in Fc contains an extraneous attribute, and • Each left side of functional dependency in Fc is unique. To compute a canonical cover for F: repeat Use the union rule to replace any dependencies in F a1 b1 and a1 b2 with a1 b1 b2 Find a functional dependency a b with an extraneous attribute either in a or in b If an extraneous attribute is found, delete it from a b until F does not change Note: Union rule need not be applied until the end. Dennis prefers delaying the union because it makes it easier to spot redundant relations. But the book uses this approach. Database System Concepts 7.51 ©Silberschatz, Korth and Sudarshan Canonical Cover Algorithm (Dennis Preference) To compute a canonical cover for F: Convert functional dependencies so right hand side of every functional dependency has just one attribute repeat Remove redundant functional dependencies Find extraneous attributes from the left side of functional dependencies and remove them until F does not change Apply the union rule. You may use either algorithm. They are close but not the same. Database System Concepts 7.52 ©Silberschatz, Korth and Sudarshan The EmToPrHoSkLoRo relation The relation deals with employees who use tools on projects and work a certain number of hours per week. An employee may work in various locations and has a variety of skills. All employees having a certain skill and working in a certain location meet in a specified room once a week. The attributes of the relation are: • Em: Employee • To: Tool • Pr: Project • Ho: Hours per week • Sk: Skill • Lo: Location • Ro: Room for meeting Database System Concepts 7.53 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, The FDs of the relation The relation satisfies the following FDs: • Each employee uses a single tool: Em To • Each employee works on a single project: Em Pr. • Each tool can be used on a single project only: To Pr. • An employee uses each tool for the same number of hours each week: EmTo Ho. • All the employees working in a location having a certain skill always meet in the same room: SkLo Ro. • Each room is in one location only: Ro Lo. Database System Concepts 7.54 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Our FDs 1. Em To 2. Em Pr 3. To Pr 4. EmTo Ho 5. SkLo Ro 6. Ro Lo Database System Concepts 7.55 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Run the algorithm No RHS can be combined, so we check whether there are any redundant attributes. We start with FD 1. • We check whether we can remove To from Em To. This is possible if we can derive Em To using Em Pr To Pr EmTo Ho SkLo Ro Ro Lo Computing the closure of Em using the above FDs gives us only EmPr, so the attribute To must be kept. Database System Concepts 7.56 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Run the algorithm • We check whether we can remove Pr (from FD2). This is possible if we can derive Em Pr using Em To To Pr EmTo Ho SkLo Ro Ro Lo Computing the closure of Em using the above FDs gives us EmToPrHo, so FD 2 can be removed.. Database System Concepts 7.57 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Run the algorithm We now have 1. Em To 2. To Pr 3. EmTo Ho 4. SkLo Ro 5. Ro Lo The next one is 3. • We check if Em can be removed from the left. This is possible if we can derive To Ho using all the FDs. Computing the closure of To using the FDs gives ToPr, and therefore Em cannot be removed • We check if To can be removed. This is possible if we can derive Em Ho using all the FDs. Computing the closure of Em using the FDs gives EmToPrHo, and therefore To can be removed Database System Concepts 7.58 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Run the algorithm We now have 1. Em To 2. To Pr 3. Em Ho 4. SkLo Ro 5. Ro Lo Database System Concepts 7.59 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Run the algorithm We now attempt to remove an attribute from the LHS of 4, and an entire FD, but neither is possible. We try again, but we cannot change anything. Therefore we are done To form the tables, simply combine the left hand and right hand sides (after combining tables FDs having a common left hand side via the union rule). Then remove any relation whose attributes are subsets of some other relation. In our example: EmToHo ToPr SkLoRo If any of these tables contains a superkey of all attributes, we have a lossless join decomposition. That wasn’t the case so we add EmSkLo. (Check that attribute cover gives all attributes and this is minimal.) Database System Concepts 7.60 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Why is it necessary to store a global key Why is it necessary to store the global key? Consider the relation: LnFnVaSa: • Ln: Last Name • Fn First Name • Va Vacation days per year • Sa Salary The functional dependencies are: • Ln Va • Fn Sa The key is LnFn The relation is not in 3NF Database System Concepts 7.61 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Why is it necessary to store . . . (cont.) Our algorithm (without the key being included) will produce the decomposition • LnVa • FnSa This is not a lossless-join decomposition • In fact we do not know who the employees are (what are the valid pairs of LnFn) So we decompose • LnVa • FnSa • LnFn Database System Concepts 7.62 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, How about EGS Applying to algorithm to 1. E G 2. G S 3. E S The only thing we can do is to find that in FD 3, attribute S is redundant, and therefore we get as canonical cover: 1. E G 2. G S Database System Concepts 7.63 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Decomposition of EGS Applying the algorithm to EGS, we get our desired decomposition: • EG • GS Database System Concepts 7.64 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Back to SQL DDL How are we going to express in SQL what we have learned? We need to express: • keys • functional dependencies Expressing keys is very easy, we use the PRIMARY KEY and UNIQUE keywords Expressing functional dependencies is possible also by means of a CHECK condition • What we need to say for the relation SkLoRo is that each tuple satisfies the following condition There are no tuples in the relation with the same value of Ro and different values of Lo Database System Concepts 7.65 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, Back to SQL DDL (cont.) CREATE TABLE SkLoRo (Sk …, Lo …, Ro…, PRIMARY KEY (Sk,Lo), CHECK (NOT EXISTS SELECT * FROM SkLoRo AS copy WHERE (SkLoRo.Ro = copy.Ro AND NOT SkLoRo.Lo = copy.Lo)); Clever but I’ve never seen this in practice. Database System Concepts 7.66 ©Zvi M. Kedem Korth and Sudarshan ©Silberschatz, !Design Goals Goal for a relational database design is: • Third Normal Form. • Lossless join. • Dependency preservation. Database System Concepts 7.67 ©Silberschatz, Korth and Sudarshan Denormalization for Performance May want to use non-normalized schema for performance E.g. displaying customer-name along with account-number and balance requires join of account with depositor Alternative 1: Use denormalized relation containing attributes of account as well as depositor with all above attributes • faster lookup • Extra space and extra execution time for updates • extra coding work for programmer and possibility of error in extra code Alternative 2: use a materialized view defined as account depositor • Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors Database System Concepts 7.68 ©Silberschatz, Korth and Sudarshan Other Design Issues Some aspects of database design are not caught by normalization Sometimes performance dictates combining several rows into one: Instead of earnings(company-id, year, amount), use • earnings-2000, earnings-2001, earnings-2002, etc., all on the schema (company-id, earnings). • Above are in third normal form (in fact a slightly stronger optional form called BCNF), but make querying across years difficult and needs new table each year • company-year(company-id, earnings-2000, earnings-2001, earnings-2002) • Also in BCNF, but also makes querying across years difficult and requires new attribute each year. • Is an example of a crosstab, where values for one attribute become column names • Used in spreadsheets, and in data analysis tools Database System Concepts 7.69 ©Silberschatz, Korth and Sudarshan Vertical Partitioning Can you see any reason why Acctbal(acctid, bal), acctaddress(acctid, street, city, zip) might be better than acct(acctid, bal, street, city, zip)? Both are normalized Database System Concepts 7.70 ©Silberschatz, Korth and Sudarshan