Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Extending and Inferring Functional Dependencies in Schema Transformation Qi He Tok Wang Ling Dept. of Computer Science School of Computing National Univ. of Singapore 1 Agenda Background Schematic discrepancy and schema transformation Qualified FD Inference rules of qualified FD Propagation rules of qualified FD during schema transformation Related work Conclusion 2 Motivation Functional dependencies (FDs) are useful Ensure the integrity of data in insertion, deletion and update Check lossless join decomposition Normalize relations Optimize queries … 3 Issues on FDs in Individual DB Inference rules of FDs in fixed relations Derivation of dependencies on views from those on original relations How about in multidatabase systems? 4 FDs in Multidatabase Multidatabase is distributed (i.e., data may be divided and stored in several databases) and heterogeneous (i.e., similar data may be represented in quite different forms in component databases) The distribution and heterogeneity of multidatabase require: Complex schema transformations rather than the relational algebra to integrate database schemas A flexible expression of FDs to represent dependencies in component databases Consequently, the expression and derivation of FDs in multidatabase are more difficult than in individual databases 5 Agenda Background Schematic discrepancy and schema transformation Qualified FD Inference rules of qualified FD Propagation rules of qualified FD during schema transformation Related work Conclusion 6 Schematic discrepancy A semantic heterogeneity among component schemas Schematic discrepancy occurs when one DB’s attribute values correspond to schema labels (attribute names or relation names) in others See next page for an example 7 DB1: Supply product supplier month price p1 s1 Jan 100 p1 s1 Feb 105 … … … … unite({s1,…,sn},supplier) fold(Supply, month,price) DB2: Supply product supplier Jan Feb … Dec p1 s1 100 105 … 97 p1 s2 99 107 … 110 … … … … … … unfold(Supply, month,price) split(Supply,supplier) split(Supply,supplier) unite({s1,…,sn},supplier) DB3: s1 DB4: s1 product month price p1 Jan 100 product Jan Feb … Dec p1 Feb 105 p1 100 105 … 97 … … … … … … … … product Jan Feb … Dec p1 99 107 … 110 … … … … unfold(si,month,price) s2 s2 fold(si,month,price) (i =1,2,… ) product month price p1 Jan 99 p1 Feb 107 … … … … … … Supplier names modeled as attribute values in DB1 and DB2, but relation names in DB3 and DB4; Months modeled as attribute values in DB1 and DB3, but attribute names in DB2 and DB4 8 Schematic Discrepant Transformation We call a transformation between schematic discrepant schemas schematic discrepant transformation A schematic discrepant transformation can be implemented by a sequence of restructuring operators (unfold, fold, split and unite [5]). Unfold transforms attribute values to attribute names, and fold does the converse. Split transforms attribute values to relation names, and unite does the converse. [5] L. V. S. Lakshmanan, F. Sadri, and S. N. Subramanian. On efficiently implementing schemaSQL on SQL database system. VLDB, 1999, 471-482. 9 Agenda Background Schematic discrepancy and schema transformation Qualified FD Inference rules of qualified FD Propagation rules of qualified FD during schema transformation Related work Conclusion 10 Jan product Qualified FD (E.g.) supplier price supplier price Feb product … Constraint: in the 1st quarter, the products’ prices supplied by supplier s1 are uniquely determined by the products. That is, in the union R = Mi{Jan, Feb, Mar}(supplier = s1Mi), the FD productprice holds. We represent this constraint as a qualified FD as follows: the dependency holds in the union of the relations Jan, Feb and Mar the dependency holds when supplier is s1 {Jan, Feb, Mar}(product, supplierσ={s1} price) called qualification attribute 11 Qualified FD FDs holding over a set of relations or a set of horizontal partitions of relations If a qualified FD only contains regular attributes and holds in one relation, then it is just a conventional FD. Flexible to represent dependencies in distributed and heterogeneous schemas 12 Inference rules of qualified FDs Let F be a set of qualified FDs and f be a qualified FD for the schemas of a set of relations R . We say F implies f, if every instance of R that satisfies F also satisfies f. We define F+ to be set of qualified FDs implied by F. Inference rules will allow us to deduce all the true qualified FDs for R , i.e., those in F+ 13 Inference Rules Given X a mixed set of regular and qualification attributes, Y and Z sets of regular attributes, A an attribute, R a set of relations, R1 R, D1 D dom(A), we have: (A1) Partition on relation set. If R (XY) holds, then R1 (XY) holds. (A2) Composition on relation set. If {Ri, Rj}(XY) holds for any Ri, Rj R, then R (XY) holds. (A3) Partition on qualification. If R (Aσ=D, XY) holds, then R (Aσ=D1, XY) holds. (A4) Composition on qualification. If R (A={ai,aj}, XY) holds for any ai, ajD, then R (A=D, XY) holds. (A5) Single-valued qualification. If a dom(A), then R (Aσ={a}A) holds. (A6) Assembly. If R (Aσ={a}, XY) holds for each a D, then R (Aσ=D, A, XY) holds. 14 Inference Rules (cont.) (A7) Reflexivity. If Y X, then R (XY) holds. (A8) Augmentation. If R (XY) holds, then R (X, ZY, Z). (A9) Transitivity. If R (XY) and R (X1, YZ) hold, where X1 is a set (possibly an empty set) of some qualification attributes in X, then R (XZ) holds. (A10) Dummy qualification. R (XY) iff R (Aσ=dom(A), XY) holds. A useful rule derived from the above inference rules: (A11) Disassembly. R (A, XY) holds iff R (Aσ={a}, XY) holds for each a dom(A) 15 Inference Rules Rule A5 and A7 give trivial dependencies Rule A7, A8 and A9 extend Armstrong's Axiom, the inference rules for FDs The rules are sound and complete to infer qualified FDs in fixed relations 16 Inference Rule of Disassembly (E.g.) Supply product supplier month price Original FD: {product, supplier, month}price equivalent to: {product, supplierσ={si}, month}price for each sidom(supplier) equivalent to: {product, supplierσ={si}, monthσ={mj}}price for each sidom(supplier) and mj dom(month) 17 Agenda Background Schematic discrepancy and schema transformation Qualified FD Inference rules of qualified FD Propagation rules of qualified FD during schema transformation Related work Conclusion 18 Propagation Rules Let R be a set of original relations, and S be a set of relations transformed from R by applying some restructuring operator (i.e., unfold, fold, split or unite). From a given set of qualified FDs F for R , propagation rules allow us to deduce the qualified FDs for S. We give the rules for split/unite and unfold/fold in a pairwise way. 19 Propagation Rules for split/unite split(R, B) … R(A1, …, An, B) b1(A1, …, An) unite({b1,…, bm}, B). where dom(B) = {b1, …, bm} bm(A1, …, An) Let X be a mixed set of regular and qualification attributes from {A1, …, An}, Y {A1, …, An} be a set of regular attributes, and R {b1, …, bm} be a set of relation names. We have the following rule: (P1) R(Bσ=R , XY) holds iff R (XY) holds. 20 Propagation Rules for split/unite The qualification on the values of attribute B in a qualified FD becomes the qualification on the relation set over which the inferred qualified FD holds, as B values become relation names in the transformed schemas. unite is a qualified FD preserving transformation, but split is not The FD R(X B) will not be transformed to any dependency in application of split 21 Propagation Rules for unfold/fold R1(A1, …, An, B, C) Rk(A1, …, An, B, C) fold(Si, B, C) for each i = 1, …, k Sk(A1, …, An, b1, …, bm) … S1(A1, …, An, b1, …, bm) … unfold(Ri, B, C) for each i = 1, …, k where dom(B) = {b1, …, bm}, and the values of attributes bi are from dom(C) Let X be a mixed set of regular and qualification attributes from {A1, …, An}, Y {A1, …, An} be a set of regular attributes. Let R {R1, …, Rk} and S {S1, …, Sk}, such that the relations of S are transformed from R. We have the following rules: (P2) R (Bσ={bi}, XC) holds iff S(Xbi) holds. (P3) R (Bσ={bi}, X, CY) holds iff S(X, biY) holds. (P4) R (XY) holds iff S(XY) holds. 22 Propagation Rules for unfold/fold The rules are based on a set of unfold/fold operators because some qualified FDs may hold over a set of relations which are transformed together by unfold/fold operations. Rule P2 and P3 indicate that the qualification on the value of attribute B in a qualified FD becomes the qualification on the attribute name in the inferred qualified FD. Both fold and unfold are not qualified FD preserving transformations. 23 Derive Qualified FDs in Schema Transformation Let R be a set of original relations, and S be a set of relations transformed from R by applying a sequence of restructuring operators From a given set of qualified FDs F for R , we deduce the qualified FDs for S, using both inference rules and propagation rules 24 Derive Qualified FDs in Schema Transformation (E.g.) fold(si,month,price) DB3: (i=1,2,…) s1 DB4: s1 unite({s1,…,sn},supplier) product Jan Feb … Dec product month price s2 s2 product Jan Feb … Dec product month price … DB1: Supply product supplier month price … Given qualified FDs in DB4: si(product Jan,…,Dec) for each si = s1, s2,…,sn After fold, in DB3, we get (by propagation rule P2) si(product, monthσ={mj} price) for each si = s1, s2,…,sn and mj = Jan, …, Dec then (by disassembly rule A11): si(product, month price) After unite, in DB1, we get (by propagation rule P1) Supply(product, month, supplierσ={si} price) for each si = s1, s2,…,sn then (by disassembly rule A11): Supply(product, month, supplier price) 25 Derive Qualified FDs in Schema Transformation (E.g.) In the above example, the qualified FD on the original relations are transformed to an equivalent qualified FD on the transformed relation. Directly applying the inference rules and propagation rules need the computation of qualified FD closure, which take much time. Instead, in the paper, we gave some derived rules to propagate qualified FDs in schema transformation. 26 Related work (1) An extension to FDs in the database design world are FDs partially holding in a relation, in the sense that only some tuples, called exceptions, break the dependencies. These dependencies include weak FDs [7], afunctional dependencies [3] and partial FDs [2]. [2] F. Berzal, J. C. Cubero, F. Cuenca, J. M. Medina. Relational decomposition through partial functional dependencies. Data & Knowledge Engineering 43(2), 2002, 207-234. [3] P. De Bra and J. Paredaens. Conditional dependencies for horizontal decompositions. ICALP, 1983. [7] Tok Wang Ling. Extending classical functional dependencies for physical database design (lecture notes), 2001. http://www.comp.nus.edu.sg/~lingtw/cs4221/extended.fds.pdf 27 Related work (1) Differences between weak FD (or some other similar dependencies) and qualified FD: A weak FD predicates that some tuples (but don’t know which tuples) in a relation would violate the dependency, while a qualified FD indicates exactly what kind of tuples satisfy the dependency. Those dependencies are specified on individual relations, while qualified FDs may hold over a set of relations. We are not aware of any axiomatization for weak FD, afunctional dependency or partial FD. 28 Related work (2) In individual databases, the issue of inferring FDs on views from FDs on original relations has been introduced in [1, 4]. They did not consider schematic discrepancy in schema transformation Our work complements existing work based on the relational algebra [1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995, 173-187, 216-235. [4] G. Gottlob. Computing covers for embedded functional dependencies. SIGMOD, 1987. 29 Related work (3) Some work [6, 8, 9] has been done on the derivation of constraints in schema integration based on ER or OO model. They did not consider schematic discrepancy in schema integration They did not prove the completeness of their methods, while we did. [6] Mong Li Lee and Tok Wang Ling. Resolving constraint conflicts in the integration of ER schemas. ER, 1997, 394-407. [8] M. P. Reddy, B. E. Prasad, and A. Gupta. Formulating global integrity constraints during derivation of global schema. Data & Knowledge Engineering, 1995, 241-268 [9] M. W. W. Vermeer and P. M. G. Apers. The role of integrity constraints in database interoperation. VLDB, 1996, 425-435 30 Conclusion Proposed qualified FD to represent dependencies for distributed and heterogeneous schemas in a multidatabase system Found a set of inference rules and propagation rules to derive qualified FDs in schematic discrepant transformation The rules are sound and complete to derive some classes (not all) of qualified FDs in schema transformation 31 Reference [1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995, 173-187, 216-235. [2] F. Berzal, J. C. Cubero, F. Cuenca, J. M. Medina. Relational decomposition through partial functional dependencies. Data & Knowledge Engineering 43(2), 2002, 207-234. [3] P. De Bra and J. Paredaens. Conditional dependencies for horizontal decompositions. ICALP, 1983. [4] G. Gottlob. Computing covers for embedded functional dependencies. SIGMOD, 1987. [5] L. V. S. Lakshmanan, F. Sadri, and S. N. Subramanian. On efficiently implementing schemaSQL on SQL database system. VLDB, 1999, 471-482. [6] Mong Li Lee and Tok Wang Ling. Resolving constraint conflicts in the integration of ER schemas. ER, 1997, 394-407. [7] Tok Wang Ling. Extending classical functional dependencies for physical database design (lecture notes), 2001. http://www.comp.nus.edu.sg/~lingtw/cs4221/extended.fds.pdf [8] M. P. Reddy, B. E. Prasad, and A. Gupta. Formulating global integrity constraints during derivation of global schema. Data & Knowledge Engineering, 1995, 241-268 [9] M. W. W. Vermeer and P. M. G. Apers. The role of integrity constraints in database interoperation. VLDB, 1996, 425-435 32