Dagstuhl Seminar, January 17, 2011: Bidirectional Transformations
PReCISE, LIBD

Transformations in Database Engineering
Jean-Luc Hainaut, Anthony Cleve

Objectives of the tutorial
To present some important aspects of transformations in databases (DB):
• in an informal and intuitive way,
• with practical applications in mind,
• mainly (but not exclusively) for non-DB communities.

Contents
1. The context of databases
2. The concept of transformation in the DB context
3. Elementary and complex transformations
4. Representation of schemas in DB Engineering
5. About property preservation
6. Applications of transformations
7. Challenges

1. The context of databases

Concepts, terms, issues

What is a database?
The structured collection of the data necessary
• to keep the memory of an organization (structures, rules and facts),
• to act as a reliable and efficient data server for an application system.
Client programs know the information/data structures (among others, for processing) through the database schema(s).

The schemas and models of a database
Schemas in DBMSs (1973-2011):
[Figure: views 1 to n, conceptual schema, logical schema and physical schema, together with client programs; the links are labelled "uses", "interface", "mapping" and "instance of".]

Standard DB design methodologies (1974-2011):
User requirements → Conceptual design → conceptual schema → Logical design → logical schema → Physical design → physical schema → Coding → DDL code. View design produces views 1 to n.

Example: the design steps as transformations
User requirements (translated from French):
A work (OUVRAGE) is a published literary work. It is characterized by its identifying number, its title, its publisher, its date of first publication, its keywords (at most 10), a short presentation note (these notes are still being compiled), and the last and first names of its authors.
A work has a number of copies (EXEMPLAIRE), which are its physical materialization. ...

Database design produces the following artefacts.

Conceptual schema (ER):
OUVRAGE (Numéro, Titre, Editeur, Date 1re parution, Mot clé[0-10], Présentation[0-1]; id: Numéro)
AUTEUR (Nom, Prénom)
EXEMPLAIRE (Num série, Date acquisition, Localisation (Etage, Rayon, Travée), Etat[0-1]; id: de.OUVRAGE, Num série)
rel-type écrit: OUVRAGE 0-N, AUTEUR 1-N
rel-type de: OUVRAGE 0-N, EXEMPLAIRE 1-1

Logical schema (relational):
OUVRAGE (NUMERO, TITRE, EDITEUR, DATE_1RE_PARUTION, PRESENTATION[0-1]; id: NUMERO)
MOT_CLE (NUMERO, MOT_CLE; id: NUMERO, MOT_CLE; ref: NUMERO)
AUTEUR (ID_AUTEUR, NOM, PRENOM; id: ID_AUTEUR)
ECRIT (ID_AUTEUR, NUMERO; id: ID_AUTEUR, NUMERO; ref: NUMERO; equ: ID_AUTEUR)
EXEMPLAIRE (NUMERO, NUM_SERIE, DATE_ACQUISITION, LOC_ETAGE, LOC_RAYON, LOC_TRAVEE, ETAT[0-1]; id: NUMERO, NUM_SERIE; ref: NUMERO)

Physical schema (Oracle 11): the same tables, stored in space BIB_DATA (OUVRAGE, MOT_CLE, EXEMPLAIRE, ECRIT, AUTEUR), with access keys (acc) on the identifiers and on the reference group NUMERO of ECRIT.

DDL code (excerpt):
    create database BIB
    create dbspace BIB_DATA;
    create table OUVRAGE (
        NUMERO            char(18)     not null,
        TITRE             varchar(60)  not null,
        EDITEUR           char(32)     not null,
        DATE_1RE_PARUTION date         not null,
        PRESENTATION      varchar(255),
        primary key (NUMERO)) in BIB_DATA;
    . . .
    alter table EXEMPLAIRE add constraint FKDE foreign key (NUMERO) references OUVRAGE;
    . . .
    create unique index IDOUVRAGE on OUVRAGE (NUMERO);
    . . .
The schemas and models of a database: terminology
In UML:
• a meta-model is a formal system of abstract constructs that can be used to describe any situation pertaining to a modeling domain; the notation is an integral part of a meta-model;
• a model is an artefact made up of instances of constructs of a meta-model, which specifies the structures of a definite situation of an application domain.
In the database realm:
• a model is a formal system of abstract constructs that can be used to describe any situation pertaining to a modeling domain; it can be given several notations;
• a schema is an artefact using the constructs of a model, which specifies the structures of one definite situation of an application domain.
Examples: the relational model and the Entity-relationship model are models; the relational schema of the DAGSTUHL-ORG database is a schema.

The schemas and models of a database: modelling levels
[Figure: the real-world system, seen as an evolving bag of facts; the schema (fact classes); the model (a way to see the world, a "philosophy"); the meta-schema and the meta-model; the meta-meta-schema and the meta-meta-model. The links are labelled "modelled by", "describes", "instance of", "complies with", "expressed into" and "is a domain of".]

Abstraction levels and paradigms (aka "data models"):
• conceptual: ER; EER; OO (UML, etc.); ORM; Bachman; RDF; ...
• logical/view: relational; OO (UML, etc.); object-relational; XML DTD; XML Schema; standard file; network; hierarchical; ...
• physical: Oracle 11g; Oracle 8; DB2 9.7; MySQL 5.5; IDS2; IMS; ...
• code: Oracle 11g SQL-DDL; IDS2 DDL; ...
Engineering processes
Database engineering processes:
• DB analysis and design: across abstraction levels (from abstract to concrete) and modelling paradigms;
• DB reverse engineering: across abstraction levels (from concrete to abstract) and modelling paradigms;
• DB evolution: same abstraction level, same modelling paradigm;
• DB migration: same abstraction level, change of modelling paradigm;
• others (refactoring, integration, view derivation, ETL, ...): several abstraction levels, several modelling paradigms.
[Figures: each process drawn as paths on a grid of abstraction levels and paradigms: conceptual (ER, EER, OO, ORM), logical/view (relational, OO, object-relational, network), physical (Oracle 8, Oracle 11g, DB2 9.7, IDS2) and code (Oracle 8 DDL, Oracle 11g DDL, DB2 9.7 DDL, IDS2 DDL). In the DB evolution and DB migration figures, some paths are marked "Not recommended".]
2. The concept of transformation in the DB context

DB engineering process modelling
Most engineering processes are artefact transformations. DB design, for instance, chains four transformations:
$\text{Conceptual schema} = \text{ConcD}(\text{User requirements})$
$\text{Logical schema} = \text{LogD}(\text{Conceptual schema})$
$\text{Physical schema} = \text{PhysD}(\text{Logical schema})$
$\text{DDL code} = \text{Coding}(\text{Physical schema})$
Hence $\text{DDL code} = \text{DB-design}(\text{User requirements})$, with $\text{DB-design} = \text{Coding} \circ \text{PhysD} \circ \text{LogD} \circ \text{ConcD}$.
Each process can itself be decomposed, e.g., $\text{ConcD} = \text{Analysis} \circ \text{Normalisation} \circ \text{Integration} \circ \cdots$

An example (relational logical design)
Source (ER): BOOK (ISBN, Title, Author[0-5], DatePublished; id: ISBN); COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr); rel-type of: BOOK 0-N, COPY 1-1.
Target (relational): BOOK (ISBN, Title, DatePublished; id: ISBN); AUTHOR (AuthorName; id: AuthorName); COPY (ISBN, CopyNbr, DatePurchased; id: ISBN, CopyNbr; ref: ISBN); WRITE (AuthorName, ISBN; id: ISBN, AuthorName; ref: ISBN; ref: AuthorName). Constraint: no more than 5 WRITE rows per BOOK row.
The following examples show the successive transformations of this design.

An example (transforming multivalued attributes)
Attribute Author[0-5] of BOOK is replaced by entity type AUTHOR (AuthorName; id: AuthorName), linked to BOOK through rel-type write (BOOK 0-5, AUTHOR 1-N).

An example (transforming many-to-many relationship types)
Rel-type write is replaced by entity type WRITE (id: bw.BOOK, aw.AUTHOR) and two rel-types: bw (BOOK 0-5, WRITE 1-1) and aw (AUTHOR 1-N, WRITE 1-1).
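The relational logical design example above also has an instance side. The following Python sketch is a hypothetical illustration, not the authors' tooling: it assumes books and copies are held as plain dictionaries and converts them into BOOK, AUTHOR, WRITE and COPY rows, enforcing the "no more than 5 WRITE rows per BOOK row" constraint that replaces the Author[0-5] cardinality.

```python
from collections import Counter

# Hypothetical instances of the source ER schema:
# BOOK(ISBN, Title, Author[0-5], DatePublished) and COPY(CopyNbr, DatePurchased),
# each COPY holding the ISBN of its BOOK through rel-type "of".
books = [
    {"ISBN": "111", "Title": "DB Design", "Author": ["Smith", "Jones"], "DatePublished": "2001"},
    {"ISBN": "222", "Title": "Transformations", "Author": ["Smith"], "DatePublished": "2005"},
]
copies = [
    {"of": "111", "CopyNbr": 1, "DatePurchased": "2002-01-10"},
    {"of": "111", "CopyNbr": 2, "DatePurchased": "2003-05-12"},
]


def to_relational(books, copies):
    """Instance mapping of the BOOK/COPY logical design (sketch)."""
    book_rows = [{k: b[k] for k in ("ISBN", "Title", "DatePublished")} for b in books]
    # Multivalued attribute Author[0-5] -> one AUTHOR row per distinct name ...
    author_rows = [{"AuthorName": a} for a in sorted({a for b in books for a in b["Author"]})]
    # ... and one WRITE row per (author, book) pair.
    write_rows = [{"AuthorName": a, "ISBN": b["ISBN"]} for b in books for a in b["Author"]]
    # One-to-many rel-type "of" -> foreign key COPY.ISBN.
    copy_rows = [{"ISBN": c["of"], "CopyNbr": c["CopyNbr"], "DatePurchased": c["DatePurchased"]}
                 for c in copies]
    # Constraint inherited from Author[0-5]: no more than 5 WRITE rows per BOOK row.
    per_book = Counter(w["ISBN"] for w in write_rows)
    if any(n > 5 for n in per_book.values()):
        raise ValueError("more than 5 WRITE rows for some BOOK row")
    return book_rows, author_rows, write_rows, copy_rows


BOOK, AUTHOR, WRITE, COPY = to_relational(books, copies)
print(len(AUTHOR), len(WRITE), COPY[0]["ISBN"])  # 2 3 111
```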
An example (transforming one-to-many relationship types)
Rel-types of, bw and aw are replaced by foreign keys, which yields the relational schema: BOOK (ISBN, Title, DatePublished; id: ISBN); AUTHOR (AuthorName; id: AuthorName); COPY (ISBN, CopyNbr, DatePurchased; id: ISBN, CopyNbr; ref: ISBN); WRITE (AuthorName, ISBN; id: ISBN, AuthorName; ref: ISBN; ref: AuthorName). Constraint: no more than 5 WRITE rows per BOOK row.

The concept of transformation
A transformation T replaces a construct C in a schema S1 with another construct C', leading to schema S2.
[Figure: T maps construct C of schema S1 to construct C' of schema S2.]
If the schema describes actual data, the transformation should also tell how to convert the data (t) ...
[Figure: at the schema level, T maps C (in S1) to C' (in S2); at the data level, t maps instance c of C to instance c' of C'.]

The concept of transformation: definition
A transformation $\Sigma$ is defined by two mappings T and t: $\Sigma = \langle T, t\rangle$, where $C' = T(C)$ and $c' = t(c)$, c being an instance of C and c' an instance of C'.
• T: structural mapping = syntax of $\Sigma$
• t: instance mapping = semantics of $\Sigma$
Mapping T can be specified with two predicates:
• P: minimal pre-condition
• Q: maximal post-condition
so that $\Sigma = \langle T, t\rangle = \langle P, Q, t\rangle$.

Specifying a transformation
Expressing structural predicates P and Q:
• value-based (more concise; a name denotes an object): entity-type(E): there exists an entity type with name E;
• object-based (more general; a name is a property of an object): entity-type(e): there exists an entity type denoted by e; name(e,E): the name of e is E.
The predicate language must allow specification and reasoning (e.g., FOL, DL).
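The two styles of structural predicates can be contrasted in a few lines of Python. This is a minimal sketch that assumes facts are encoded as tuples (a hypothetical encoding): the value-based facts use the name as the object, while the object-based facts give each entity type an opaque identity and state its name separately, so that a renaming touches a single name(e, E) fact.

```python
import itertools

# Value-based style: the name *is* the object.
value_based = {("entity-type", "CUSTOMER"), ("entity-type", "ORDER")}


def to_object_based(facts):
    """Rewrite value-based entity-type facts as object-based ones."""
    ids = itertools.count(1)
    result = set()
    for kind, name in sorted(facts):
        oid = f"e{next(ids)}"            # opaque object identity
        result.add((kind, oid))          # entity-type(e)
        result.add(("name", oid, name))  # name(e, E)
    return result


def rename(facts, old, new):
    """In the object-based style, renaming changes one name(e, E) fact only."""
    return {("name", f[1], new) if f[0] == "name" and f[2] == old else f for f in facts}


obj = to_object_based(value_based)
renamed = rename(obj, "CUSTOMER", "CLIENT")
print(sorted(f[2] for f in renamed if f[0] == "name"))  # ['CLIENT', 'ORDER']
```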
The value-based predicates used in the following are:
• entity-type(E): there exists an entity type with name E;
• attribute(O,A,m,M,T): object (with name) O has an attribute with name A, cardinality m-M and type T;
• id(O,Cp): object (with name) O has an identifier comprising components Cp;
• rel-type(R): there exists a rel-type with name R;
• role(R,r,E,m,M): rel-type R has a role with name r, played by E, with cardinality m-M.

For example, the conjunction
entity-type(CUSTOMER) ∧ attribute(CUSTOMER,Cust#,1,1,integer) ∧ attribute(CUSTOMER,Name,1,1,string) ∧ attribute(CUSTOMER,Phone,0,5,string) ∧ id(CUSTOMER,{Cust#})
describes entity type CUSTOMER (Cust#, Name, Phone[0-5]; id: Cust#).

The transformation of attribute Phone into an entity type is then specified by P (the conjunction above) and
Q = entity-type(CUSTOMER) ∧ attribute(CUSTOMER,Cust#,1,1,integer) ∧ attribute(CUSTOMER,Name,1,1,string) ∧ id(CUSTOMER,{Cust#}) ∧ entity-type(PHONE) ∧ attribute(PHONE,Phone,1,1,string) ∧ id(PHONE,{Phone}) ∧ rel-type(has) ∧ role(has,,CUSTOMER,0,5) ∧ role(has,,PHONE,1,N)
that is, CUSTOMER (Cust#, Name; id: Cust#) and PHONE (Phone; id: Phone), linked through rel-type has (CUSTOMER 0-5, PHONE 1-N).
From now on, P and Q are drawn as the source and target schemas of the transformation.

Inverse transformations
$\Sigma_2 = \Sigma_1^{-1}$ iff $\forall C: P_1(C) \Rightarrow C = T_2(T_1(C))$
Example: T1 transforms CUSTOMER (Cust#, Name, Phone[0-5]; id: Cust#) into CUSTOMER (Cust#, Name; id: Cust#) linked through rel-type has (CUSTOMER 0-5, PHONE 1-N) to PHONE (Phone; id: Phone); T2 transforms the latter back into the former.
Intuitively, $\Sigma_2$ undoes the effect of $\Sigma_1$ at the structural level (mapping t is ignored).
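The CUSTOMER/PHONE transformation can be sketched directly on sets of value-based facts. In this Python sketch, the tuple encoding, the A.upper() naming rule for the new entity type and the empty role names are assumptions: P checks the pre-condition, T1 produces exactly the facts listed in Q, and T2 is a structural inverse, so that T2(T1(C)) = C for this schema.

```python
# Value-based facts: ("entity-type", E), ("attribute", O, A, m, M, T),
# ("id", O, frozenset(Cp)), ("rel-type", R), ("role", R, r, E, m, M).
# "N" stands for an unbounded maximum cardinality.
S1 = {
    ("entity-type", "CUSTOMER"),
    ("attribute", "CUSTOMER", "Cust#", 1, 1, "integer"),
    ("attribute", "CUSTOMER", "Name", 1, 1, "string"),
    ("attribute", "CUSTOMER", "Phone", 0, 5, "string"),
    ("id", "CUSTOMER", frozenset({"Cust#"})),
}


def P(schema, E, A):
    """Pre-condition: entity type E exists and has a multivalued attribute A."""
    return ("entity-type", E) in schema and any(
        f[:3] == ("attribute", E, A) and f[4] != 1 for f in schema)


def T1(schema, E, A, R="has"):
    """Multivalued attribute E.A -> entity type A.upper() linked to E by rel-type R."""
    if not P(schema, E, A):
        raise ValueError("pre-condition P does not hold")
    (att,) = [f for f in schema if f[:3] == ("attribute", E, A)]
    _, _, _, m, M, typ = att
    et = A.upper()  # hypothetical naming convention for the new entity type
    return (schema - {att}) | {
        ("entity-type", et),
        ("attribute", et, A, 1, 1, typ),
        ("id", et, frozenset({A})),
        ("rel-type", R),
        ("role", R, "", E, m, M),
        ("role", R, "", et, 1, "N"),
    }


def T2(schema, E, A, R="has"):
    """Inverse structural mapping: the pattern produced by T1 -> attribute E.A."""
    et = A.upper()
    (role_e,) = [f for f in schema if f[:4] == ("role", R, "", E)]
    (att,) = [f for f in schema if f[:3] == ("attribute", et, A)]
    pattern = {("entity-type", et), att, ("id", et, frozenset({A})),
               ("rel-type", R), role_e, ("role", R, "", et, 1, "N")}
    if not pattern <= schema:
        raise ValueError("pre-condition of T2 (= Q of T1) does not hold")
    return (schema - pattern) | {("attribute", E, A, role_e[4], role_e[5], att[5])}


S2 = T1(S1, "CUSTOMER", "Phone")
print(("role", "has", "", "PHONE", 1, "N") in S2)  # True
print(T2(S2, "CUSTOMER", "Phone") == S1)           # True: T2(T1(C)) = C
```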
Reversible transformations
A transformation can augment, preserve or decrease the information contents of the schema; more complex patterns exist.
[Figure: before/after examples on entity type CUSTOMER (Cust#, Name, Address, Phone) for each case; one example represents Phone as entity type PHONE (Phone; id: Phone), linked to CUSTOMER through rel-type has (CUSTOMER 1-1, PHONE 1-N).]
Transformation $\Sigma_1$ is reversible if it preserves the information contents of the source schema: reversible = semantics-preserving (mapping t is involved).
A transformation can be:
• not reversible: not semantics-preserving;
• reversible: "one-way" semantics-preserving;
• symmetrically reversible: fully semantics-preserving.
Examples:
• P: R(A,B,C); Q: R1(A,B); R2(A,C): not reversible (without a dependency, joining R1 and R2 may produce spurious tuples);
• P: R(A,B,C); $A \twoheadrightarrow B \mid C$; Q: R1(A,B); R2(A,C): reversible (Fagin's theorem);
• P: R(A,B,C); $A \twoheadrightarrow B \mid C$; Q: R1(A,B); R2(A,C); R1[A] = R2[A]: symmetrically reversible.
A transformation is reversible if there is an inverse mapping for instances as well:
$\Sigma_1$ is reversible iff there exists $\Sigma_2 = \Sigma_1^{-1}$ such that $\forall C: P_1(C) \Rightarrow C = T_2(T_1(C))$ and $\forall c \in \mathrm{inst}(C): c = t_2(t_1(c))$.

Symmetrically reversible transformations
$\Sigma = \langle P, Q, t\rangle$ is symmetrically reversible iff both $\Sigma$ and $\Sigma^{-1} = \langle Q, P, t^{-1}\rangle$ are reversible.
SR-transformations are the most desirable operators in analysis, design, reverse engineering, migration, refactoring, and (partially) evolution processes.
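The three decomposition examples can be checked on small instances. This Python sketch uses hypothetical relations stored as sets of tuples; it tests the multivalued dependency $A \twoheadrightarrow B \mid C$ directly, shows that joining the projections restores R exactly when the dependency holds (Fagin's theorem), and shows why the constraint R1[A] = R2[A] also makes the reverse direction (join, then project) lossless.

```python
from itertools import product


def project(rel, positions):
    """Projection of a relation (set of tuples) on the given positions."""
    return {tuple(t[i] for i in positions) for t in rel}


def decompose(r):
    """Instance mapping t: R(A,B,C) -> (R1(A,B), R2(A,C))."""
    return project(r, (0, 1)), project(r, (0, 2))


def join(r1, r2):
    """Candidate inverse: natural join of R1(A,B) and R2(A,C) on A."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}


def satisfies_mvd(r):
    """A ->-> B|C: for each A value, its B values and C values combine freely."""
    for a in {t[0] for t in r}:
        bc = {(t[1], t[2]) for t in r if t[0] == a}
        bs = {b for b, _ in bc}
        cs = {c for _, c in bc}
        if bc != set(product(bs, cs)):
            return False
    return True


R_ok = {("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a1", "b2", "c2")}
R_bad = {("a1", "b1", "c1"), ("a1", "b2", "c2")}
for r in (R_ok, R_bad):
    # Fagin: the join restores R exactly when the MVD holds.
    print(satisfies_mvd(r), join(*decompose(r)) == r)
# True True
# False False

# Symmetric case: arbitrary R1, R2 survive join-then-project iff R1[A] = R2[A].
r1 = {("a1", "b1"), ("a2", "b2")}
r2_full = {("a1", "c1"), ("a2", "c2")}  # R1[A] = R2[A] = {a1, a2}
r2_partial = {("a1", "c1")}             # R2[A] misses a2
print(decompose(join(r1, r2_full)) == (r1, r2_full))        # True
print(decompose(join(r1, r2_partial)) == (r1, r2_partial))  # False
```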
3. Elementary and complex transformations

• Elementary: cannot be decomposed into smaller SR-transformations.
• Complex: can be decomposed into (more) elementary SR-transformations.

Elementary transformations
[Figure: examples of elementary transformations on a schema with DOCUMENT (DocID, Title, Date-Published, Keyword[0-10]; id: DocID), BOOK (ISBN, Publisher; id: ISBN), AUTHOR (Name, First-Name, Origin) and COPY (Serial-No, Date-Acquired; id: of.BOOK, Serial-No): rel-type written, between DOCUMENT and AUTHOR, into entity type WRITTEN (id: doc.DOCUMENT, by.AUTHOR); multivalued attribute Keyword[0-10] into entity type KEYWORD (Keyword; id: Keyword), linked through rel-type describe (DOCUMENT 0-10, KEYWORD 1-N); rel-type of (BOOK 0-N, COPY 1-1) into foreign key ISBN of COPY (ISBN, Serial-No, Date-Acquired; id: ISBN, Serial-No; ref: ISBN).]

Elementary and complex SR-transformations
Elementary transformations are building blocks for more complex operators.
Challenge: developing higher-level SR-transformations from elementary SR-transformations.

Three classes of complex SR-transformations:
• compound transformations;
• predicate-driven transformations;
• model-driven transformations.

Compound transformations
The composition of two transformations is a transformation; the composition of two SR-transformations is an SR-transformation:
$\Sigma_1 = \langle T_1, t_1\rangle, \quad \Sigma_2 = \langle T_2, t_2\rangle, \quad \Sigma_{12} = \Sigma_2 \circ \Sigma_1 = \langle T_2 \circ T_1,\ t_2 \circ t_1\rangle$
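Composition can be sketched by representing a transformation as a pair of Python functions. In this minimal sketch, the Transformation class, the disaggregate and rename helpers and the dictionary-based schemas are all hypothetical; Σ1.then(Σ2) builds Σ2 ∘ Σ1 = ⟨T2 ∘ T1, t2 ∘ t1⟩, composing the structural and the instance mappings in the same order.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Transformation:
    """Sigma = <T, t>: structural mapping T and instance mapping t."""
    T: Callable[[Any], Any]
    t: Callable[[Any], Any]

    def then(self, nxt: "Transformation") -> "Transformation":
        """self.then(nxt) = nxt o self = <nxt.T o self.T, nxt.t o self.t>."""
        return Transformation(T=lambda s: nxt.T(self.T(s)), t=lambda d: nxt.t(self.t(d)))


def disaggregate(att):
    """Compound attribute att -> one attribute per component (prefixed names)."""
    def T(schema):
        flat = {k: v for k, v in schema.items() if k != att}
        flat.update({f"{att}_{c}": None for c in schema[att]})
        return flat

    def t(inst):
        flat = {k: v for k, v in inst.items() if k != att}
        flat.update({f"{att}_{c}": v for c, v in inst[att].items()})
        return flat

    return Transformation(T, t)


def rename(old, new):
    """Renaming, applied identically to schema keys and instance keys."""
    def ren(d):
        return {(new if k == old else k): v for k, v in d.items()}
    return Transformation(ren, ren)


# Schema: attribute name -> None (atomic) or dict of components (compound).
schema = {"Cust#": None, "Address": {"Street": None, "City": None}}
inst = {"Cust#": 17, "Address": {"Street": "Main Street", "City": "Namur"}}

sigma12 = disaggregate("Address").then(rename("Address_City", "City"))  # Sigma2 o Sigma1
print(sigma12.T(schema))  # {'Cust#': None, 'Address_Street': None, 'City': None}
print(sigma12.t(inst))    # {'Cust#': 17, 'Address_Street': 'Main Street', 'City': 'Namur'}
```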
Compound transformations: building a new operator
[Figure: a new compound transformation obtained by chaining known transformations. The chained schemas are: ACCOUNT (AccID, Available; id: AccID) linked to DAY-of-WEEK (Day-of-Week; id: Day-of-Week) through rel-type expenses, with attribute Amount (ACCOUNT 0-5); entity type EXPENSES (Amount; id: of.ACCOUNT, on.DAY-of-WEEK); entity type EXPENSES (Day-of-Week, Amount; id: of.ACCOUNT, Day-of-Week); ACCOUNT (AccID, Available, Expenses[0-5] (Day-of-Week, Amount); id: AccID; id(Expenses): Day-of-Week); and, with dom(Day-of-Week) = {'Monday', 'Tuesday', .., 'Friday'}, ACCOUNT (AccID, Available, Exp-Monday[0-1], Exp-Tuesday_1[0-1], Exp-Wednesday_2[0-1], Exp-Thursday_3[0-1], Exp-Friday_4[0-1]; id: AccID).]

Predicate-driven (conditional) transformations
Transformations that apply to a set of qualified objects in the current schema:
$\Sigma(p)$, where $\Sigma$ is a transformation and p is a structural predicate.
Interpretation: apply $\Sigma$ to all the objects that satisfy p.
We need a language for p:
• structural (e.g., DL): complex, and leading to huge expressions;
• ad hoc: expressive, concise, parametric, but not generic and not closed.
Examples of ad hoc predicates:
• ROLE_per_RT(I J): the number of roles of the current rel-type is between I and J;
• ONE_ROLE_per_RT(I J): the number of "one" roles (with cardinality ?-1) is between I and J;
• MAX_CARD_of_ATT(I J): the maximum cardinality of the current attribute is between I and J;
• DEPTH_of_ATT(I J): the level of the current attribute is between I and J.
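A predicate-driven transformation Σ(p) amounts to selecting the qualifying objects of the current schema and applying Σ to each of them. The Python sketch below assumes a hypothetical schema encoding in which each rel-type is a list of (entity type, min, max) roles; ROLE_per_RT and ONE_ROLE_per_RT are predicate factories mirroring the ad hoc predicates above, and rt_into_et is a simplified version of RT_into_ET with invented names for the new rel-types.

```python
import copy
import math

N = math.inf  # unbounded maximum, as in ROLE_per_RT(3 N)

# Rel-types are lists of roles (entity type, min, max).
schema = {
    "entities": {"PROJECT", "BORROWER", "COPY", "BOOK", "AUTHOR"},
    "rel_types": {
        "borrowing": [("PROJECT", 0, N), ("BORROWER", 0, N), ("COPY", 0, N)],
        "of": [("BOOK", 0, N), ("COPY", 1, 1)],
        "writes": [("AUTHOR", 1, N), ("BOOK", 0, 5)],
    },
}


def ROLE_per_RT(i, j):
    return lambda roles: i <= len(roles) <= j


def ONE_ROLE_per_RT(i, j):
    return lambda roles: i <= sum(1 for _, _, mx in roles if mx == 1) <= j


def both(p, q):
    return lambda roles: p(roles) and q(roles)


def rt_into_et(schema, name):
    """Rel-type -> entity type plus one binary rel-type per former role."""
    s = copy.deepcopy(schema)
    roles = s["rel_types"].pop(name)
    et = name.upper()
    s["entities"].add(et)
    for entity, mn, mx in roles:
        # Hypothetical naming of the new binary rel-types.
        s["rel_types"][f"{name}_{entity.lower()}"] = [(entity, mn, mx), (et, 1, 1)]
    return s


def apply_where(sigma, p, schema):
    """Sigma(p): apply sigma to every rel-type of the current schema that satisfies p."""
    selected = [r for r, roles in schema["rel_types"].items() if p(roles)]
    for r in selected:
        schema = sigma(schema, r)
    return schema


result = apply_where(rt_into_et, ROLE_per_RT(3, N), schema)  # RT_into_ET(ROLE_per_RT(3 N))
print(sorted(result["rel_types"]))
# ['borrowing_borrower', 'borrowing_copy', 'borrowing_project', 'of', 'writes']
one_to_many = both(ROLE_per_RT(2, 2), ONE_ROLE_per_RT(1, 2))
print(sorted(r for r, roles in result["rel_types"].items() if one_to_many(roles)))
# ['borrowing_borrower', 'borrowing_copy', 'borrowing_project', 'of']
```

Selecting the qualifying rel-types before transforming them means that the binary rel-types created by RT_into_ET are not re-examined in the same pass, which matches the reading "apply Σ to all the objects of the current schema that satisfy p".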
Examples of predicate-driven transformations $\Sigma(p)$:
• RT_into_ET(ROLE_per_RT(3 N)): transform all rel-types into entity types (if they have at least 3 roles);
• RT_into_REF(ROLE_per_RT(2 2) and ONE_ROLE_per_RT(1 2)): transform all rel-types into referential attributes (if they are binary and one-to-many or one-to-one);
• INSTANTIATE(MAX_CARD_of_ATT(2 4)): instantiate all attributes (if they are "slightly" multivalued: from 2 to 4 values);
• ATT_into_ET_VAL(DEPTH_of_ATT(1 1) and MAX_CARD_of_ATT(5 N)): transform all attributes into entity types (if they are at the top level and "strongly" multivalued: at least 5 values).

Model-driven transformations
Goal: given schema S1 in model M1, transform S1 into a schema S2 that complies with model M2; of course, as far as possible through SR-transformations!
Example: given the Entity-relationship schema S1, transform S1 into a schema S2 that complies with the relational model; of course, as far as possible without information loss!
Structure: a compound transformation comprising predicate-driven transformations.
Practical form: a transformation plan.
Building principles:
1. Identify the constructs of M1 that violate M2 (called invalid).
2. For each invalid construct C, apply a transformation $\langle T, t\rangle = \langle P, Q, t\rangle$ such that P(C) holds and T(C) satisfies M2.
Things may be a bit more complex, requiring a compound transformation. Example: processing N-ary rel-types for relational compliance requires two successive transformations.
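The two building principles suggest a simple driver: repeatedly look for an invalid construct and apply the transformation that repairs it, until none is left. In this Python sketch the target is a hypothetical, rel-type-only view of the relational model (no rel-types allowed); the rules, the helpers and the fk_ column naming are assumptions. It shows how an N-ary rel-type needs two successive transformations: RT_into_ET, then RT_into_REF on each resulting binary rel-type.

```python
import copy
import math

N = math.inf


def complex_rts(s):
    """Invalid for the target: N-ary rel-types, or rel-types without a 'one' role."""
    return [r for r, roles in s["rel_types"].items()
            if len(roles) >= 3 or not any(mx == 1 for _, _, mx in roles)]


def functional_rts(s):
    """Still invalid (the target has no rel-types) but directly repairable."""
    return [r for r, roles in s["rel_types"].items()
            if len(roles) == 2 and any(mx == 1 for _, _, mx in roles)]


def rt_into_et(s, name):
    s = copy.deepcopy(s)
    et = name.upper()
    s["entities"][et] = []
    for entity, mn, mx in s["rel_types"].pop(name):
        s["rel_types"][f"{name}_{entity.lower()}"] = [(entity, mn, mx), (et, 1, 1)]
    return s


def rt_into_ref(s, name):
    """Binary functional rel-type -> foreign key in the entity type playing a 'one' role."""
    s = copy.deepcopy(s)
    (e1, _, mx1), (e2, _, _) = s["rel_types"].pop(name)
    child, parent = (e1, e2) if mx1 == 1 else (e2, e1)
    s["entities"][child].append(f"fk_{parent}")  # hypothetical column naming
    return s


def make_compliant(schema, rules, max_steps=100):
    """Repeat: find an invalid construct, apply the transformation that repairs it."""
    for _ in range(max_steps):
        for find_invalid, repair in rules:
            invalid = find_invalid(schema)
            if invalid:
                schema = repair(schema, invalid[0])
                break
        else:
            return schema  # no rule found an invalid construct: compliant
    raise RuntimeError("no compliant schema reached within max_steps")


schema = {
    "entities": {"PROJECT": ["ProjCode"], "BORROWER": ["PID"], "COPY": ["Serial-No"], "BOOK": ["ISBN"]},
    "rel_types": {
        "borrowing": [("PROJECT", 0, N), ("BORROWER", 0, N), ("COPY", 0, N)],
        "of": [("BOOK", 0, N), ("COPY", 1, 1)],
    },
}
rel = make_compliant(schema, [(complex_rts, rt_into_et), (functional_rts, rt_into_ref)])
print(rel["rel_types"])                      # {}
print(sorted(rel["entities"]["BORROWING"]))  # ['fk_BORROWER', 'fk_COPY', 'fk_PROJECT']
print(rel["entities"]["COPY"])               # ['Serial-No', 'fk_BOOK']
```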
Example: ER to binary (flat Bachman) conversion
The binary model is a variant of the ER model in which:
• there are no ISA relations;
• rel-types are functional (binary, and one-to-many or one-to-one);
• rel-types have no attributes;
• each rel-type is defined on two distinct entity types (no cyclic rel-types);
• attributes are single-valued and atomic.

Flat Bachman schemas: invalid constructs and their processing:
• ISA relations: materialization;
• cyclic rel-types: transform into entity types;
• complex rel-types (with attributes, N-ary): transform into entity types;
• many-to-many binary rel-types: transform into entity types;
• multivalued attributes: transform into entity types;
• compound attributes: disaggregate.

Transformation plan for ER to flat Bachman conversion:
    ISA_into_RT;                                      transform ISA relations by materialization
    RT_into_ET(RECURSIVITY_in_RT(2 N));               transform rel-types in which the same entity type appears more than once
    RT_into_ET(ATT_per_RT(1 N) or ROLE_per_RT(3 N));  transform complex rel-types
    RT_into_ET(ONE_ROLE_per_RT(0 0));                 transform rel-types in which there is no "one" role
    LOOP;                                             iteratively flatten the attribute structure
        ATT_into_ET_INST(MAX_CARD_of_ATT(2 N))
        DISAGGREGATE
    ENDLOOP;
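The LOOP ... ENDLOOP step of the plan is needed because flattening one level can expose another, for instance a compound attribute nested in a multivalued one. The Python sketch below iterates ATT_into_ET_INST on multivalued attributes and DISAGGREGATE on compound ones until every attribute is single-valued and atomic; the attribute encoding, the naming of new constructs and the minimum cardinality 0 on the new rel-types are assumptions.

```python
import copy
import math

N = math.inf

# Attribute spec: (max cardinality, components), components None for atomic attributes.
schema = {
    "entities": {
        "ACCOUNT": {"AccNbr": (1, None),
                    "Deposit": (N, {"Amount": (1, None),
                                    "Date": (1, {"Day": (1, None), "Month": (1, None)})})},
        "COPY": {"Serial-No": (1, None),
                 "Location": (1, {"Store": (1, None), "Shelf": (1, None), "Row": (1, None)})},
    },
    "rel_types": {},
}


def att_into_et_inst(s, et, att):
    """Multivalued attribute -> entity type with one entity per value instance."""
    s = copy.deepcopy(s)
    mx, comps = s["entities"][et].pop(att)
    new_et = att.upper()
    s["entities"][new_et] = comps if comps is not None else {att: (1, None)}
    # Hypothetical rel-type name; minimum cardinality assumed to be 0.
    s["rel_types"][f"{et.lower()}_{att.lower()}"] = [(et, 0, mx), (new_et, 1, 1)]
    return s


def disaggregate(s, et, att):
    """Single-valued compound attribute -> its components, with prefixed names."""
    s = copy.deepcopy(s)
    _, comps = s["entities"][et].pop(att)
    for name, spec in comps.items():
        s["entities"][et][f"{att}_{name}"] = spec
    return s


def flatten(s):
    """LOOP ATT_into_ET_INST(MAX_CARD_of_ATT(2 N)); DISAGGREGATE ENDLOOP"""
    while True:
        changed = False
        for et, atts in list(s["entities"].items()):
            for att, (mx, comps) in list(atts.items()):
                if mx >= 2:
                    s = att_into_et_inst(s, et, att)
                    changed = True
                elif comps is not None:
                    s = disaggregate(s, et, att)
                    changed = True
        if not changed:
            return s


flat = flatten(schema)
print(sorted(flat["entities"]["DEPOSIT"]))  # ['Amount', 'Date_Day', 'Date_Month']
print(sorted(flat["entities"]["COPY"]))
# ['Location_Row', 'Location_Shelf', 'Location_Store', 'Serial-No']
```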
Example of ER to flat Bachman conversion
[Figure: an ER schema and its flat Bachman equivalent. Source: DOCUMENT (DocID, Title, Date-Published, Keyword[0-10]; id: DocID), its subtype BOOK (ISBN, Publisher; id': ISBN), COPY (Serial-No, Date-Acquired, Location (Store, Shelf, Row); id: of.BOOK, Serial-No), BORROWER (PID, Name; id: PID) and PROJECT (ProjCode, Title; id: ProjCode), with rel-types responsible-for, reserved, of and the ternary borrowing. Target: the ISA relation is materialized as rel-type is; entity types KEYWORD (Keyword; id: d.DOCUMENT, Keyword), RESPONSIBLE, RESERVED (id: by.BORROWER, what.DOCUMENT) and BORROWING (id: for.PROJECT, by.BORROWER, what.COPY) are introduced; compound attribute Location is disaggregated into Loc_Store, Loc_Shelf and Loc_Row.]

Other popular examples of model-driven transformations:
• ER to UML, UML to ER;
• ER to relational, relational to ER;
• COBOL files to ER;
• ER to XML, relational to XML.

4. Representation of schemas in DB Engineering

Dealing with multiple models
A typical organization uses several different data models. For example, it commonly uses DB2 databases, also uses a legacy IDMS database, writes its conceptual schemas in the ER model, quite often transfers data between databases, exchanges data with its environment, standardizes on the XML format, plans to migrate some databases to other platforms, prepares the development of a data warehouse, studies the feasibility of merging several departments (and their information systems), etc.
[Figure: the organization's schemas and data flows: conceptual schema, application program design, operational data, migration, ETL to the data warehouse, and XML extract & export / import exchanges with the environment.]
Considering all the inter-model and intra-model conversions among N models, the organization requires N × N different mappings (16 for N = 4).
[Figure: four models (Relational, ER, CODASYL, XML), with one mapping per ordered pair, e.g., $\Sigma_{rel>er}$, $\Sigma_{rel>rel}$, $\Sigma_{er>rel}$, $\Sigma_{xml>cod}$, $\Sigma_{cod>cod}$.]
The usual answer: introducing a pivot model. Each model then needs one mapping to the pivot and one from it, plus one intra-pivot mapping: 2 × N + 1 different mappings (9 for N = 4).
[Figure: Relational, ER, CODASYL and XML models connected to the pivot model through $\Sigma_{rel>p}$, $\Sigma_{p>rel}$, $\Sigma_{er>p}$, $\Sigma_{p>er}$, $\Sigma_{cod>p}$, $\Sigma_{p>cod}$, $\Sigma_{xml>p}$ and $\Sigma_{p>xml}$, plus $\Sigma_{p>p}$.]

The Generic Entity-relationship (GER) model as the pivot model
[Figure: the abstraction levels (conceptual, logical/view, physical, code) and their data models, as listed in section 1, all covered by the GER.]

Specifying operational model M in the GER
Procedure:
• identify the concepts of the GER that are pertinent in M;
• specify the structural constraints that hold in valid M schemas;
• rename the selected constructs according to the taxonomy of M.
[Figure: ER, UML class, XML and relational models mapped into the GER model through $\Sigma_{er>ger}$, $\Sigma_{uml>ger}$, $\Sigma_{xml>ger}$ and $\Sigma_{rel>ger}$.]
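The pivot architecture described above can be sketched in Python: each model only needs a mapping to the pivot and one from it, and any conversion is composed as Σ_p>dst ∘ Σ_p>p ∘ Σ_src>p. This toy sketch handles single-entity schemas in three hypothetical formats (ER, relational, XML) and is not a GER implementation; it also prints the number of mappings needed without and with a pivot for N = 3.

```python
import xml.etree.ElementTree as ElementTree


# Toy one-entity schemas; the pivot form is ("ET", name, attribute names).
def _xml_to_pivot(text):
    root = ElementTree.fromstring(text)
    return ("ET", root.tag, tuple(child.tag for child in root))


to_pivot = {
    "er": lambda s: ("ET", s["entity"], tuple(s["attributes"])),
    "rel": lambda s: ("ET", s["table"], tuple(s["columns"])),
    "xml": _xml_to_pivot,
}
from_pivot = {
    "er": lambda p: {"entity": p[1], "attributes": list(p[2])},
    "rel": lambda p: {"table": p[1], "columns": list(p[2])},
    "xml": lambda p: f"<{p[1]}>" + "".join(f"<{a}/>" for a in p[2]) + f"</{p[1]}>",
}


def identity(p):
    return p


def convert(schema, src, dst, pivot_step=identity):
    """Sigma_src>dst = Sigma_p>dst o Sigma_p>p o Sigma_src>p."""
    return from_pivot[dst](pivot_step(to_pivot[src](schema)))


n = len(to_pivot)
print(n * n, 2 * n + 1)  # 9 7
print(convert("<CUSTOMER><CustID/><Name/></CUSTOMER>", "xml", "rel"))
# {'table': 'CUSTOMER', 'columns': ['CustID', 'Name']}
```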
Example: SQL2 is a specialization of the GER.

SQL2 construct     | GER construct                                              | Assembly rule
-------------------|------------------------------------------------------------|-------------------------------------------------------------------------------
database           | schema                                                     |
table              | entity type                                                | an entity type includes at least one attribute
domain             | simple domain                                              |
nullable column    | single-valued and atomic attribute with cardinality [0-1]  |
not null column    | single-valued and atomic attribute with cardinality [1-1]  |
primary key        | primary identifier                                         | a primary identifier comprises attributes with cardinality [1-1]
unique constraint  | secondary identifier                                       |
foreign key        | reference group                                            | the composition of the reference group must be the same as that of the target identifier
SQL names          | GER names                                                  | the GER names must follow the SQL syntax

Notion of M-compliant schema
This schema is SQL2-compliant: DETAIL (ORD-ID, SEQ_NBR, REFERENCE, QTY-ORD; id: ORD-ID, SEQ_NBR; ref: ORD-ID), ORDER (ORD-ID, DATE_RECEIVED, ORIGIN; id: ORD-ID; ref: ORIGIN) and CUSTOMER (CUSTOMER ID; id: CUSTOMER ID). Its entity types are tables, its attributes are columns, its primary identifiers are primary keys and its reference groups are foreign keys.
This schema is not SQL2-compliant: it includes an is-a hierarchy (PERSON (PID, Name; id: PID) with subtypes EMPLOYEE (RegNbr, Service; id: RegNbr) and CUSTOMER), an entity type with no attributes (CUSTOMER), a rel-type (has: CUSTOMER 0-N, ACCOUNT 1-1) and a non-elementary attribute (Deposit[0-N] (Amount, Date) of ACCOUNT (AccNbr; id: AccNbr)).

Important consequence
Inter-model engineering transformations (ER to SQL2) are expressed as intra-model transformations (ER to GER to GER to SQL2):
ER schema → $\Sigma_{er>ger}$ → GER schema → logical design ($\Sigma_{ger>ger}$) → GER schema → $\Sigma_{ger>rel}$ → SQL2 schema
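Checking M-compliance amounts to testing the assembly rules on a GER schema. This partial Python sketch assumes a hypothetical dictionary encoding and does not check the rule on reference groups; it reports the constructs of the non-compliant example above that violate SQL2 compliance, and none for a fragment of the compliant one.

```python
def sql2_violations(schema):
    """List violations of some SQL2 assembly rules in a GER-like schema (partial)."""
    problems = []
    for name, et in schema["entity_types"].items():
        atts = et["attributes"]  # attribute -> (min, max, components or None)
        if et.get("is_a"):
            problems.append(f"{name}: is-a relation (subtype of {et['is_a']})")
        if not atts:
            problems.append(f"{name}: no attributes")
        for a, (_, mx, comps) in atts.items():
            if mx != 1 or comps is not None:
                problems.append(f"{name}.{a}: not single-valued and atomic")
        for a in et.get("primary_id", []):
            if atts.get(a, (None, None, None))[:2] != (1, 1):
                problems.append(f"{name}: primary id component {a} is not [1-1]")
    for r in schema.get("rel_types", []):
        problems.append(f"{r}: rel-types are not SQL2 constructs")
    return problems


compliant = {"entity_types": {
    "ORDER": {"attributes": {"ORD-ID": (1, 1, None), "DATE_RECEIVED": (1, 1, None),
                             "ORIGIN": (1, 1, None)}, "primary_id": ["ORD-ID"]},
    "DETAIL": {"attributes": {"ORD-ID": (1, 1, None), "SEQ_NBR": (1, 1, None),
                              "REFERENCE": (1, 1, None), "QTY-ORD": (1, 1, None)},
               "primary_id": ["ORD-ID", "SEQ_NBR"]},
}}
not_compliant = {"entity_types": {
    "PERSON": {"attributes": {"PID": (1, 1, None), "Name": (1, 1, None)}, "primary_id": ["PID"]},
    "EMPLOYEE": {"is_a": "PERSON", "attributes": {"RegNbr": (1, 1, None), "Service": (1, 1, None)},
                 "primary_id": ["RegNbr"]},
    "CUSTOMER": {"is_a": "PERSON", "attributes": {}},
    "ACCOUNT": {"attributes": {"AccNbr": (1, 1, None),
                               "Deposit": (0, "N", {"Amount": (1, 1, None), "Date": (1, 1, None)})},
                "primary_id": ["AccNbr"]},
}, "rel_types": ["has"]}

print(sql2_violations(compliant))  # []
for p in sql2_violations(not_compliant):
    print(p)
```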
5. About property preservation

A schema has some important properties or facets:
• the semantics of its components;
• components may be assigned statistics (e.g., there are 15,000 CUSTOMER entities);
• constraints: identifiers, functional dependencies, existence constraints, cardinality constraints, etc.;
• generic operations can be applied to their instances (insert, update, delete, etc.);
• some components have annotations (free text);
• components have 2D coordinates in the schema space;
• others ...
Are these properties preserved in SR-transformations? How can they be propagated to the target schema? How can we prove they are preserved?

Semantics preservation
By definition, R- and SR-transformations preserve the semantics of the schema. How can this be proved? By mapping the GER model onto a simpler model that already includes the concept of R- and SR-transformation, for example the N1NF relational model.
[Figure: the GER model and the N1NF relational model, linked by $\Sigma_{ger>NF2}$ and $\Sigma_{NF2>ger}$; transformations within the N1NF model are denoted $\Sigma_{NF2>NF2}$.]
A GER transformation $\Sigma_g$ is SR if there exists a (possibly complex) N1NF SR-transformation $\Sigma_r$ such that
$\Sigma_g = \Sigma_{NF2>ger} \circ \Sigma_r \circ \Sigma_{ger>NF2}$
Example: transformation att-into-ET/v replaces multivalued attribute A2 of A (A1, A2[0-N], A3) with entity type EA2 (A2; id: A2), linked to A (A1, A3) through rel-type rA (A 0-N, EA2 1-N). It decomposes as
$\text{att-into-ET/v} = \Sigma_{NF2>ger} \circ \text{extension} \circ \text{unnest} \circ \text{project-join} \circ \Sigma_{ger>NF2}$
through the following N1NF schemas:
1. after $\Sigma_{ger>NF2}$: A: entities; desc-A(A, A1, A2[0-N], A3); desc-A[A] = A
2. after project-join: A: entities; desc-A'(A, A1, A3); R(A, A2[1-N]); desc-A'[A] = A
3. after unnest: A: entities; desc-A'(A, A1, A3); R'(A, A2); desc-A'[A] = A
4. after extension: A, EA2: entities; desc-A'(A, A1, A3); desc-EA2(EA2, A2); rA(A, EA2); desc-A'[A] = A; desc-EA2[EA2] = rA[EA2] = EA2
$\Sigma_{NF2>ger}$ then maps the last schema back to the GER schema A (A1, A3) and EA2 (A2; id: A2), linked through rA.

Statistics preservation
Static and dynamic data metrics are important, especially for physical design:
• How many CUSTOMER entities?
• How many distinct values of the CITY attribute?
• How many ORDER entities per CUSTOMER entity?
• How many CUSTOMER entities with no ORDER entities?
• How many new ORDER entities per day?
• How many updates of the ADDRESS attribute per day?
Statistics are hard to collect, and easier to get at the conceptual level: how can they be propagated to the other levels (logical and physical)?

The main static statistics
[Figure: statistics attached to each kind of construct: entity types (e.g., number of entities $N_E$), attributes (e.g., number of distinct values $N_{A1}$, average number of values per entity, average number of entities per value), groups ($N_G$), domains ($N_D$), roles, rel-types ($N_R$) and collections.]

Example: propagating statistics through att-into-ET/i
Notation: $N_X$ is the number of X entities (or of distinct values of attribute X); $\mu_{X/Y}$ is the average number of X per Y; $z_{X/Y}$ is the proportion of Y with no X; $\mu_{R.A}$ is the average number of R instances per A entity and $z_{R.A}$ the proportion of A entities with no R instance.
Source: A (A1, A2[0-N], A3), with statistics $N_A$, $N_{A2}$, $\mu_{A2/A}$, $\mu_{A/A2}$ and $z_{A2/A}$.
Target: A (A1, A3), linked through rel-type R (A 0-N, EA2 1-1) to entity type EA2 (A2'; id: R.A, A2').
Source to target:
$N_{EA2} = N_R = N_A \cdot \mu_{A2/A}$, $\mu_{R.A} = \mu_{A2/A}$, $z_{R.A} = z_{A2/A}$, $N_{A2'} = N_{A2}$, $\mu_{EA2/A2'} = \mu_{A/A2}$; the value statistics of A2 carry over to A2'.
Target to source:
$N_{A2} = N_{A2'}$, $\mu_{A2/A} = N_{EA2}/N_A$, $\mu_{A/A2} = \mu_{EA2/A2'}$, $z_{A2/A} = z_{R.A}$; the value statistics of A2' carry over to A2. Within the target, $N_R = N_{EA2}$ and $\mu_{R.A} = N_{EA2}/N_A$.

6. Applications of transformations
• Improving engineering processes
• Automating engineering processes
• Traceability
• Developing new engineering processes
• Education
• Co-transformations
+ a lot of other applications (see the BX-Grace report).

Improving engineering processes
• fosters systematic and reproducible engineering techniques;
• better control and auditing of the design products;
• minimizes (or at least identifies) semantic losses.

Automating engineering processes
Transformations are fairly easy to automate, but this requires a very careful analysis of predicates P and Q.
The DB-MAIN CASE environment, elementary transformations:
1. select an object;
2. select a transformation;
3. if needed, select the variant;
4. if needed, give target names.
The DB-MAIN CASE environment, model-driven transformations.
[Figure: model-driven transformations in the DB-MAIN CASE environment.]
Traceability
The history of the transformations applied to produce schema S2 from schema S1 is the trace of the "S1 → S2" engineering process. It can be used to derive direct and reverse mappings. Such mappings are used to identify:
• for each construct in source schema S1, the constructs in target schema S2 that derive from it;
• for each construct in target schema S2, the constructs in source schema S1 that it derives from.
Examples:
• which conceptual object does DB2 column ORDER.CUST implement?
• how has relationship type "writes" been implemented in the DB2 schema?

Developing new engineering processes
Three examples:
• Database reverse engineering: modelled as the inverse of forward engineering. Challenge: finding the bidirectional transformation between the physical and conceptual schemas.
• Design recovery: reconstruction of the process that could have been executed when a legacy database was designed. It can be recovered by induction on the history of the reverse engineering process (history of a process = trace of the transformations that have been carried out during the process).
• Schema quality evaluation and improvement: identifying bad patterns in a schema and replacing them with better but equivalent data structures. Equivalent: that can be derived from each other through reversible transformations. Bad, better: according to quality criteria such as simplicity, expressivity, no redundancy, normalization, etc.

Education
Obvious, provided transformation techniques are presented in an intuitive and natural way!
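The traceability mechanism described in this section can be sketched as follows: replaying the history of transformations yields the direct mapping, and inverting it yields the reverse mapping. The Python sketch assumes a hypothetical trace format (transformation name, source constructs, target constructs) recorded for the BOOK/AUTHOR/COPY logical design of section 2.

```python
# Hypothetical trace: (transformation, source constructs, target constructs).
history = [
    ("ATT_into_ET_VAL", ["BOOK.Author"], ["AUTHOR", "write"]),
    ("RT_into_ET", ["write"], ["WRITE", "bw", "aw"]),
    ("RT_into_REF", ["bw"], ["WRITE.ISBN"]),
    ("RT_into_REF", ["aw"], ["WRITE.AuthorName"]),
    ("RT_into_REF", ["of"], ["COPY.ISBN"]),
]


def direct_mapping(initial, history):
    """For each source construct, the target constructs that derive from it."""
    current = {c: {c} for c in initial}  # untouched constructs map to themselves
    for _name, sources, targets in history:
        src = set(sources)
        for origin, now in list(current.items()):
            if now & src:
                current[origin] = (now - src) | set(targets)
    return current


def reverse_mapping(direct):
    """For each target construct, the source constructs it derives from."""
    rev = {}
    for origin, derived in direct.items():
        for d in derived:
            rev.setdefault(d, set()).add(origin)
    return rev


direct = direct_mapping(["BOOK", "BOOK.Author", "COPY", "of"], history)
reverse = reverse_mapping(direct)
print(sorted(direct["of"]))           # ['COPY.ISBN']
print(sorted(reverse["WRITE.ISBN"]))  # ['BOOK.Author']
```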
6. Applications of transformations - Co-transformations
A complex software system includes artefacts pertaining to several paradigms:
• database: static structures, (re)active components, data
• programs
• GUI
• forms and reports
• various secondary components (e.g., ETL, validation, loading and security management scripts)
When the database schema is modified, some of the other components must be updated accordingly. Can this update be automated? Yes, provided the schema modification has been carried out through formal transformations.
Application: evolution of large, data-centered systems (see Anthony Cleve's position statement).

7. Conclusions and challenges
• Intuitively, most database engineering processes are transformational by nature.
• By combining elementary transformations, we can give these processes a precise transformational definition.
• A transformation can be formalized so that its preservation properties can be proved.
• Only a small set of elementary transformations (20 to 40) is needed.
• Once correctly defined, a transformation is quite reliable: it is guaranteed to preserve information whatever the context in which it is applied.
• Transformations are (sort of …) easy to implement in CASE tools. Several general-purpose languages and engines are available: QVT, ATL, Kermeta, GReAT, VIATRA, Tefkat, TXL and ... XSLT!

7. Challenges
However, some problems are not (completely) solved:
• A transformation must address all the aspects of the data structures: documentation, annotations, statistics, operations (methods).
• Propagating the constraints is a complex problem: it is OK for uniqueness constraints, but less obvious for the others.
• How can the data be transformed efficiently following a schema transformation? See J.-M. Hick's thesis (2003).
• Modifying a high-level abstract schema is easy, but how do we propagate the modifications to the lower-level schemas and code (traceability)?
• Transforming the data structures is nice, but what about the programs? This is the notion of co-transformation; see A. Cleve's thesis (2009).
• How can a procedural transformation be derived from its <P,Q> specification?
• How can a transformation plan be derived from a pair of models (M1, M2)?

Selected references (from our contribution)
Anthony Cleve, Tom Mens, Jean-Luc Hainaut. Data-Intensive System Evolution, IEEE Computer, 43(8), pp. 110-112, IEEE CS, August 2010.
Anthony Cleve. Program Analysis and Transformation for Data-Intensive System Evolution, PhD Thesis, University of Namur, 2009.
Jean-Luc Hainaut, Anthony Cleve, Jean Henrard, Jean-Marc Hick. Migration of Legacy Information Systems, in Software Evolution, T. Mens and S. Demeyer (Eds), pp. 107-138, Springer, 2008.
Jean-Luc Hainaut. The Transformational Approach to Database Engineering, in Generative and Transformational Techniques in Software Engineering, R. Lämmel, J. Saraiva, J. Visser (Eds), LNCS 4143, pp. 95-143, Springer, 2006.
Jean-Marc Hick, Jean-Luc Hainaut. Database application evolution: A transformational approach, Data and Knowledge Engineering, 59(3), pp. 534-558, 2006.
Anthony Cleve, Jean-Luc Hainaut. Co-transformations in Database Applications Evolution, in Generative and Transformational Techniques in Software Engineering, LNCS 4143, pp. 409-421, Springer-Verlag, 2006.
Jean-Luc Hainaut. Transformation-based Database Engineering, in Transformation of Knowledge, Information and Data: Theory and Applications, pp. 1-26, IDEA Group, 2005.
Jean Henrard, Anthony Cleve, Jean-Luc Hainaut. Inverse Wrappers for Legacy Information Systems Migration, in Proceedings of the 1st International Workshop on Wrapper Techniques for Legacy Systems (WCRE'04/WRAP'04), Computer Science Report 04-34, pp. 30-43, Technische Universiteit Eindhoven, 2004.
Anthony Cleve, Jean Henrard, Jean-Luc Hainaut. Co-transformations in Information System Reengineering, in Proceedings of the 2nd International Workshop on Metamodels, Schemas, and Grammars for Reverse Engineering (WCRE'04/ATEM-04), Electronic Notes in Theoretical Computer Science, Vol. 137, pp. 5-15, Elsevier, 2005.
Jean-Luc Hainaut. Specification preservation in schema transformations - Application to semantics and statistics, Data and Knowledge Engineering, 16(1), Elsevier Science, 1996.
Jean-Luc Hainaut, Jean Henrard, Jean-Marc Hick, Didier Roland, Vincent Englebert. Database Design Recovery, in Proceedings of the 8th Conference on Advanced Information Systems Engineering (CAiSE'96), LNCS 1080, pp. 272-300, Springer-Verlag, 1996.
Jean-Luc Hainaut. Transformation-Based Database Engineering, in Tutorials of the 21st International Conference on Very Large Data Bases (VLDB'95), 1995.
Jean-Luc Hainaut, Catherine Tonneau, Michel Joris, Muriel Chandelon. Transformation-based Database Reverse Engineering, in Proceedings of the 12th International Conference on Entity-Relationship Approach (ER'93), LNCS 823, pp. 364-375, Springer-Verlag, 1994.
Jean-Luc Hainaut, Mario Cadelli, Bernard Decuyper, Olivier Marchand. TRAMIS: a transformation-based database CASE tool, in Proceedings of the 5th International Conference on Software Engineering and Applications, EC2, 1992.
Jean-Luc Hainaut. Entity-generating Schema Transformations for Entity-Relationship Models, in Proceedings of the 10th International Conference on the Entity-Relationship Approach (ER'91), pp. 643-670, ER Institute, 1991.
Jean-Luc Hainaut. Theoretical and Practical Tools for Data Base Design, in Proceedings of the 7th International Conference on Very Large Data Bases (VLDB'81), pp. 216-224, IEEE Computer Society, 1981.
Most of these references are available on the LIBD website: http://info.fundp.ac.be/libd. Otherwise, just ask.

Thanks