Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Normalization of Database Relations : Manual, Automatic or None Ali Yazici Software Engineering Department Atılım University, Ankara, Turkey Agenda 1. 2. 3. 4. 5. 6. 7. 8. 9. Introduction Normalization (briefly) Normalizing XML Auto normalization (earlier work) Normalization algorithms JMathNorm Demo Conclusions References 22.5.2017 ICADIWT 2009 – London Metropolitan University 2 1. Introduction Stages of RDB design: • • • • Determination of user requirements Conceptual design Logical design Physical design Conceptual design • Data Modelling with • • UML ER or EER Logical design • • Mapping of UML/EER to Relational Schema Normalization 22.5.2017 ICADIWT 2009 – London Metropolitan University 3 Normality: For and Against? 22.5.2017 ICADIWT 2009 – London Metropolitan University 4 2. Normalization (briefly) R={R1,R2,R3,…,Rn} For each Ri designate a primary key (PK) Define FD’s among Ri’s based on the PK Normalize 1NF → 8NF ! • • • 1NF: A relational database table is in1NF if it is free of repeating groups. 2NF: A 1NF table is in 2NF if and only if , given any candidate key and any attribute that is not a constituent of a candidate key, the non-key attribute depends upon the whole of the candidate key rather than just a part of it. 3NF: A table is in 3NF if and only if both of the following conditions hold: • • The relation R (table) is in second normal form (2NF) Every non-prime attribute of R is non-transitively dependent on every key of R. 22.5.2017 ICADIWT 2009 – London Metropolitan University 5 ...Normalization (briefly) • • • • BCNF: A table is in Boyce-Codd normal form if and only if , for every one of its non-trivial functional dependencies X → Y, X is a superkey—that is, X is either a candidate key or a superset thereof. 4NF: A table is in 4NF if and only if , for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey—that is, X is either a candidate key or a superset thereof. 5NF: A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys. • A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R 6NF: A table is in 6NF if and only if it satisfies no nontrivial join dependencies at all — where, as before, a join dependency is trivial if and only if at least one of the projections (possibly U_projections) involved is taken over the set of all attributes of R concerned. 22.5.2017 ICADIWT 2009 – London Metropolitan University 6 ...Cause of violations Multi-valued attributes cause violation of 1NF Partial dependency violates 2NF Transitive dependency violates 3NF Dependence of a prime attribute on a non-prime attribute violates BCNF 22.5.2017 ICADIWT 2009 – London Metropolitan University 7 …Automation Normalization is a process of decomposing each Ri into well-formed! subrelations in a systematic way Can we automate this process? Commercial design tools (CASE tools) do not have a complete solution • • • • Oracle Designer Oracle SQL Data Modeller ER-Win ... 22.5.2017 ICADIWT 2009 – London Metropolitan University 8 Why normalize? If db schema is not normalized • As a result of certain transactions, data is lost • Database integrity is lost! • Example: • COMPANY(InvoNo, Item, CustId, CustName, CustPhoto, CustNextKin, (CustProperty) ) • COMPANY is not normalized! • Consequences! 22.5.2017 ICADIWT 2009 – London Metropolitan University 9 ..if not normalized! InNo Item CustId Cust Cust Cust Name Photo NextKin 0345 PC 108 Jane 0390 Laptop 109 John 0455 Printer 108 Jane 22.5.2017 ICADIWT 2009 – London Metropolitan University Property Property 1 2 10 Example Sample DB Schema • PURCHASE-ITEM = {orderNo, • partNo,partDescription, custNo, custName, quantity, price} FD1 {orderNo, partNo}→{partDescription, quantity, price} • FD2 partNo → partDescription • FD3 (Partial dependency) custNo → custName (Transitive dependency) 22.5.2017 ICADIWT 2009 – London Metropolitan University 11 ...Example Normalization into 2NF by decomposing PURCHASE_ITEM into P1 and P2 as follows: • P1={orderNo, partNo,quantity, custNo, custName, price} • P2={partNo,partDescription} Normalization of P1 into 3NF • P11={orderNo, partNo,quantity, custNo, price} • P12={custNo, custName} 22.5.2017 ICADIWT 2009 – London Metropolitan University 12 3. Normalizing XML Relational normal forms will partly apply to XML codes to eliminate by using keyref and key tags on XML schemas • • • • • Ambiguity Minimize redundancy Preserve data consistency Allow for rational data maintenance Some references: • • • W. Provost, http://www.xml.com/pub/a/2002/11/13/normalizing.html M. Arenas & L. Libkin, “A normal form for XML Documents”, Proceedings of the 21st. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, M. (2002) J.J.Lu & S. Renjen, “Normalizing XML schemas through relational databases”, Proceedings of the 43rd ACM Southeast Regional Conference, Vol.1 (2005) 220 - 221 22.5.2017 ICADIWT 2009 – London Metropolitan University 13 4. Auto Normalization (Earlier work) Prolog based work (complex programming and data structures) • • Ceri, S. and Gottlob, G.: Normalization of Relations & Prolog, Communications of the ACM, Vol.29, No.6 (1986) Welzer, W., Rozman, I. and Gyrks, J.G.: Automated Normalization Tool, Microprocessing & Microprogramming, Vol.25 (1989) 375-380. A prototype tool • Du, H., and Werry, L.: Micro: A Normalization Tool for Relational Database Designers, J. Of Network and Computer Applications, Vo.22 (1999) 215-232 Object Constraint Object • Akehurst, D.H., Bordbar, B., Rodgers, P.J., and Dalgliesh, N.T.G.: Automatic Normalization via Metamodelling, Proc. of the ASE 2002 Workshop on Declarative Meta Programming to Support Software Development (2002) 22.5.2017 ICADIWT 2009 – London Metropolitan University 14 ...Earlier work A web-based tool • Kung, H-J. and Tung, H-L.: A Web-based Tool to Enhance Teaching/Learning Database Normalization, Proc. of the 2006 Southern Association for Information Systems Conference (2006) 251-258. Symbolic Approach • Yazici, A. and Karakaya, Z.: Normalizing Relational Database Schemas Using Mathematica, LNCS, Springer-Verlag, Vol.3992 (2006) 375-382 Dependency graph approach • Bahmani, A.H., Naghibzadeh and Bahmani, B, CCECE 2008, Canada 22.5.2017 ICADIWT 2009 – London Metropolitan University 15 5. Normalization Algorithms 22.5.2017 ICADIWT 2009 – London Metropolitan University 16 ClosureX Algorithm ClosureX: Determining X+ under FD Given a DB schema R, a set of attributes X and FD set F calculate X+ 1. X+ := X; 2. repeat tempX+ := X+ for each FD Y → Z in F do if X+ covers Y then X+ := X+ U Z until ( X+ = tempX+ ) 22.5.2017 ICADIWT 2009 – London Metropolitan University 17 FullClosureX FullClosureX, X++, is yet another function similar to ClosureX which returns all attributes that are fully dependent on X with respect to FD set. This function is used to remove partial dependencies for transforming a relation into 2NF. Algorithm FullClosureX(X: attribute set; F: FD set): return closure in tempX ; 1. tempX := X; 2. repeat oldX := tempX; for each FD Y → Z in F do if Y subset tempX then if not(Y subset X) then tempX := Z U tempX else if Y = X then tempX := Z U tempX; until (length(oldX) = length(tempX)); 3. return tempX; 22.5.2017 ICADIWT 2009 – London Metropolitan University 18 Some Definitions Given a set of FD’s F, an attribute B is said to be extraneous in X→A w.r.t. F if X = ZB, X ≠ Z, and A ε Z+. A set of FD’s H is called a minimal cover for a set F if each dependency in H as exactly one attribute on the right-hand side, if no attribute on the left-hand side is extraneous, and if no dependency in H can be derived from the other dependencies in H. • Uniqueness problem! Actually, the calculation of a minimal cover consists of ”Elimination of ExtraneousAttributes” followed by the ”Elimination of Redundant Dependencies”. Normalization algorithms considered in this study makes use of the minimal cover of a given set of FD’s. Moreover, they are computationally efficient with at most O(d2) operations where d is the number of FD’s in the schema. 22.5.2017 ICADIWT 2009 – London Metropolitan University 19 2NF Algorithm Algorithm 2NF(X: attribute set; F: FD set): return decomposed relation set ; (insert, first, next, mark, and end are list functions) 1. P := ø; 2. G := MinimalCover(F); 3. F := First(G); 4. While f ≠ end(G) do begin XP := fullClosureX (f.determinant, G) mark(XP,R); insert(XP,P); repeat ff := f; f := next(f); until ff.determinant ≠ f.determinant 5. for all attributes in X do if attribute is not marked then insert (attribute,XP); if XP ≠ ø then insert(XP,P); 6. Return P; 22.5.2017 ICADIWT 2009 – London Metropolitan University 20 3NF Algorithm Algorithm 3NF: Bernstein’s Synthesis Algorithm:Given a DB schema R and FD set F, find 3NF decompositions D 1. Find a Minimal Cover G for F 2. For each a left-hand side X of a FD that appears in G, create a relation schema in D with attributes {XU{A1}U{A2}U. . .U{Ak}}, where X→A1, X→A2,...,X→Ak are the only dependencies in G with X as the left-hand side (X is the key) 3. Place any remaining attributes (that have not been placed in any relation) in a single relation schema to ensure the attribute preservation 22.5.2017 ICADIWT 2009 – London Metropolitan University 21 BCNF Algorithm Algorithm BCNF(R: attribute set in 3NF; F: FD set): return Q in BCNF; 1. D := R; 2. while there is a left-hand side X of a FD X → Y in F do if X → Y violatesBCNF then decomp R into Rm and Rn Rm := D - Y ; and Rn := X U Y ; 3. return Q := Rm U Rn; 22.5.2017 ICADIWT 2009 – London Metropolitan University 22 Properties of 3NF and BCNF algorithms Bernstein’s Synthesis Algorithm • It guarantees dependency preserving property. • However, it does not guarantee nonadditive join property. • An algorithm exists which satisfy both (Elmasri and Navathe, Fundamentals of Database Systems, 5th Ed., Addison-Wesley, (2007) 379-380). • Algorithm requires the relation to be in 1NF (not 2NF) only! BCNF Algorithm • It guarantees nonadditive join property. Both algorithms are quadratic (O(d2)) in the number of attributes=degree of the relation. 22.5.2017 ICADIWT 2009 – London Metropolitan University 23 6. JMathNorm An interactive GUI tool • IntelliJ Java for GUI and JLink library • Mathematica JLink facility to derive the kernel • Mathematica implementation of the abstract normalization algorithms 22.5.2017 ICADIWT 2009 – London Metropolitan University 24 Data Structures List structure and basic built-in list functions such as • Union[], Complement[], Intersection[], MemberQ[], Extract[], Append[], Length[], and Sort[]. For a db schema R left/right side of the FD’s are represented as lists. For example: • R = {a, b, c, d, e, f}, (trivial) functional dependencies for example are shown as two lists: • FL = {{a,b}, {a,b},{a,b},{a,b}} • FR = {c, d, e, f} 22.5.2017 ICADIWT 2009 – London Metropolitan University 25 BCNF in Mathematica BCNF[GL_,GR_,U_] := Module[{i,X,D,Q,DIF,REL}, D = U; Q = {}; For[i = 1,i <= Length[GL], i++, If[Length[GL[[i]]]>1,X = Sort[GL[[i]]],X={GL[[i]]}]; flag = violatesBCNF[GL,GR,X,GR[[i]],U]; If[flag == 1,REL = Union[X,{GR[[i]]}]; Q = Union[Q,{REL}]; RC = Complement[U,{GR[[i]]}]; DIF = Intersection[U,RC]; Print["Q=",Q," RC=",RC," DIF=",DIF]; Q = Union[Q,{DIF}];];]; Return[Q];]; violatesBCNF[GL_,GR_,X_,Y_,U_] := Module[{XP}, XP = Sort[ClosureX[GL,GR,X]]; If[XP == Sort[U],flag = 0,flag = 1]; Return[flag];]; 22.5.2017 ICADIWT 2009 – London Metropolitan University 26 IntelliJ Main Menu 22.5.2017 ICADIWT 2009 – London Metropolitan University 27 Launch Mathematica 22.5.2017 ICADIWT 2009 – London Metropolitan University 28 Functional Dependency submenu 22.5.2017 ICADIWT 2009 – London Metropolitan University 29 Operations submenu 22.5.2017 ICADIWT 2009 – London Metropolitan University 30 Normalization submenu 22.5.2017 ICADIWT 2009 – London Metropolitan University 31 Results submenu 22.5.2017 ICADIWT 2009 – London Metropolitan University 32 Demo 22.5.2017 ICADIWT 2009 – London Metropolitan University 33 Defining FD’s 22.5.2017 ICADIWT 2009 – London Metropolitan University 34 BCNF Decomposition 22.5.2017 ICADIWT 2009 – London Metropolitan University 35 Output in table form 22.5.2017 ICADIWT 2009 – London Metropolitan University 36 8. Conclusions Symbolic power of Mathematica JLink facility of Mathematica OOP with Java to produce a GUI Things to be done • • • • • • Physical Schema Generation geared towards a specific DBMS Progressive approach between normal forms Tree representation of the decomposition process Input FD’s from a text file and parse Make it interactive and intelligent Web application! 22.5.2017 ICADIWT 2009 – London Metropolitan University 37 9. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Yazici, A. and Karakaya, Z.: Normalizing Relational Database Schemas Using Mathematica, LNCS, Springer-Verlag, Vol.3992 (2006) 375-382. Elmasri, R. and Navathe, S.B.: Fundamentals of Database Systems, 5th Ed., Addison Wesley (2007). Kung, H. and Case, T.: Traditional and Alternative Database Normalization Techniques:Their Impacts on IS/IT Students’ Perceptions and Performance, International Journal of Information Technology Education, Vol.1, No.1 (2004) 53-76. Ceri, S. and Gottlob, G.: Normalization of Relations and Prolog, Communications of the ACM, Vol.29, No.6 (1986) Welzer, W., Rozman, I. and Gyrks, J.G.: Automated Normalization Tool, Microprocessing and Microprogramming, Vol.25 (1989) 375-380. Akehurst, D.H., Bordbar, B., Rodgers, P.J., and Dalgliesh, N.T.G.: Automatic Normalization via Metamodelling, Proc. of the ASE 2002 Workshop on Declarative Meta Programming to Support Software Development (2002) Kung, H-J. and Tung, H-L.: A Web-based Tool to Enhance Teaching/Learning Database Normalization, Proc. of the 2006 Southern Association for Information Systems Conference (2006) 251-258. Wolfram, S.: The Mathematica Book, 4th Ed., Cambridge University Press (1999). Du, H. and Wery, L.: Micro: A Normalization Tool for Relational Database Designers, Journal of Network and Computer Applications, Vol.22 (1999) 215-232. Manning, M.V.: Database Design, Application Development and Administration, 2nd. Ed., McGraw-Hill (2004). Diederich, J. and Milton, J.: New Methods and Fast Algorithms for Database Normalization, ACM Trans. on Database Systems, Vol.13, No.3(1988) 339-365. Ozharahan, E.: Database Management: Concepts, Design and Practice, Prentice Hall (1990). Bernstein, P.A.: Synthesizing Third Norm Relations from Functional Dependencies, ACM Trans. on Database Systems, Vol.1, No.4 (1976) 277-298. 22.5.2017 ICADIWT 2009 – London Metropolitan University 38 22.5.2017 ICADIWT 2009 – London Metropolitan University Thanks! 39