Dagstuhl Seminar January 17, 2011 - Bidirectional Transformations
PReCISE
LIBD
Transformations in Database Engineering
Jean-Luc Hainaut, Anthony Cleve
Objectives of the tutorial
• to present some important aspects of transformations in databases
• in an informal and intuitive way
• with practical applications in mind
• mainly (but not exclusively) for non-DB communities
Contents
1. The context of databases
2. The concept of transformation in the DB context
3. Elementary and complex transformations
4. Representation of schemas in DB Engineering
5. About property preservation
6. Applications of transformations
7. Challenges
1. The context of databases
1. The context of databases - Concepts, terms, issues
What is a Database?
The structured collection of the data necessary to
• keep the memory of an organization (structures, rules and facts)
• act as a reliable and efficient data server for an application system
[Figure: a client program accessing the database through its schema.]
Information and data structures are known (among others, for processing) through schema(s).
1. The context of databases - Concepts, terms, issues
The schemas and models of a database
Schemas in DBMS's (1973-2011)
[Figure: views 1 to n, the conceptual schema, the logical schema and the physical schema, connected by mappings; a client program uses the schema interfaces. Legend: uses interface, mapping, instance of.]
1. The context of databases - Concepts, terms, issues
The schemas and models of a database
Standard DB design methodologies (1974-2011)
[Figure: Users requirements → Conceptual design → conceptual schema → Logical design → logical schema → Physical design → physical schema → Coding → DDL code; View design produces views 1 to n.]
1. The context of databases - Concepts, terms, issues
transformations
A work (ouvrage) is a published literary work. It is characterized by its identifying number, its title, its publisher, its date of first publication, its keywords (at most 10), a short presentation note (these notes are still being compiled), and the last and first names of its authors. A work corresponds to a number of copies (exemplaires), which are its physical materialization. ...
Database design
create database BIB;
-- storage space for the tables
create dbspace BIB_DATA;
-- one table per work (OUVRAGE), identified by NUMERO
create table OUVRAGE (
  NUMERO char(18) not null,
  TITRE varchar(60) not null,
  EDITEUR char(32) not null,
  DATE_1RE_PARUTION date not null,
  PRESENTATION varchar(255),
  primary key (NUMERO)) in BIB_DATA;
. . .
-- each copy (EXEMPLAIRE) refers to its work
alter table EXEMPLAIRE add constraint FKDE
  foreign key (NUMERO) references OUVRAGE;
. . .
-- access key on the identifier of OUVRAGE
create unique index IDOUVRAGE
  on OUVRAGE (NUMERO);
. . .
[Figure: three schemas of the library database, linked by transformations.
Conceptual schema: OUVRAGE (Numéro, Titre, Editeur, Date 1re parution, Mot clé[0-10], Présentation[0-1]; id: Numéro); AUTEUR (Nom, Prénom); EXEMPLAIRE (Num série, Date acquisition, Localisation (Etage, Rayon, Travée), Etat[0-1]; id: de.OUVRAGE, Num série); rel-type écrit (OUVRAGE 0-N, AUTEUR 1-N); rel-type de (OUVRAGE 0-N, EXEMPLAIRE 1-1).
Logical schema (relational): OUVRAGE (NUMERO, TITRE, EDITEUR, DATE_1RE_PARUTION, PRESENTATION[0-1]; id: NUMERO); AUTEUR (ID_AUTEUR, NOM, PRENOM; id: ID_AUTEUR); ECRIT (ID_AUTEUR, NUMERO; id: ID_AUTEUR, NUMERO; ref: NUMERO; equ: ID_AUTEUR); EXEMPLAIRE (NUMERO, NUM_SERIE, DATE_ACQUISITION, LOC_ETAGE, LOC_RAYON, LOC_TRAVEE, ETAT[0-1]; id: NUMERO, NUM_SERIE; ref: NUMERO); MOT_CLE (NUMERO, MOT_CLE; id: NUMERO, MOT_CLE; ref: NUMERO).
Physical schema (Oracle 11): the same tables, with access keys (acc), all stored in the space BIB_DATA.]
1. The context of databases - Concepts, terms, issues
The schemas and models of a database

In UML:
• a meta-model is a formal system of abstract constructs that can be used to describe any situation pertaining to a modeling domain; the notation is an integral part of a meta-model;
• a model is an artefact made up of instances of constructs of a meta-model, and that specifies the structures of a definite situation of an application domain.
In the database realm:
• a model is a formal system of abstract constructs that can be used to describe any situation pertaining to a modeling domain; it can be given several notations;
• a schema is an artefact using the constructs of a model, and that specifies the structures of one definite situation of an application domain.
Examples: the relational model and the Entity-relationship model are models; the relational schema of the DAGSTUHL-ORG database is a schema.
1. The context of databases - Concepts, terms, issues
The schemas and models of a database
Seen as an evolving bag of facts
[Figure: the database seen as an evolving bag of facts: real world, system, facts, fact classes, schema, model (a way to see the world: a philosophy), meta-schema, meta-model, meta-meta-schema and metameta-model, linked by "modelled by", "describes", "instance of", "complies with", "expressed into" and "is a domain of" relations.]
1. The context of databases - Concepts, terms, issues
The schemas and models of a database
Abstraction levels and paradigms
Paradigms (aka "data models") at each abstraction level:
conceptual: ER; EER; OO (UML, etc.); ORM; Bachman; RDF; . . .
logical/view: relational; OO (UML, etc.); object-relational; XML DTD; XML Schema; standard file; network; hierarchical; . . .
physical: Oracle 11g; Oracle 8; DB2 9.7; MySQL 5.5; IDS2; IMS; . . .
code: Oracle 11g SQL-DDL; IDS2 DDL; . . .
1. The context of databases - Engineering processes
Database engineering processes

• DB Analysis and Design: across abstraction levels (from abstract to concrete) and modelling paradigms
• DB Reverse Engineering: across abstraction levels (from concrete to abstract) and modelling paradigms
• DB Evolution: same abstraction level, same modelling paradigm
• DB Migration: same abstraction level, change of modelling paradigm
• Others (refactoring, integration, view derivation, ETL, . . .): several abstraction levels, several modelling paradigms
1. The context of databases - Engineering processes
Database engineering processes
DB Analysis and Design
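[Figure: the grid of abstraction levels and paradigms (conceptual: ER, EER, OO, ORM; logical/view: relational, OO, object-relational, network; physical: Oracle 8, Oracle 11g, DB2 9.7, IDS2; code: Oracle 8, Oracle 11g, DB2 9.7 and IDS2 DDL), traversed from the conceptual level down to the DDL code.]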
1. The context of databases - Engineering processes
Database engineering processes
DB Reverse Engineering
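[Figure: the grid of abstraction levels and paradigms (conceptual: ER, EER, OO, ORM; logical/view: relational, OO, object-relational, network; physical: Oracle 8, Oracle 11g, DB2 9.7, IDS2; code: Oracle 8, Oracle 11g, DB2 9.7 and IDS2 DDL), traversed upwards from the DDL code to the conceptual level.]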
1. The context of databases - Engineering processes
Database engineering processes
DB Evolution
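[Figure: the grid of abstraction levels and paradigms (conceptual: ER, EER, OO, ORM; logical/view: relational, OO, object-relational, network; physical: Oracle 8, Oracle 11g, DB2 9.7, IDS2; code: Oracle 8, Oracle 11g, DB2 9.7 and IDS2 DDL); evolution takes place within one abstraction level and one paradigm, and one of the paths is marked "Not recommended".]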
1. The context of databases - Engineering processes
Database engineering processes
DB Migration
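[Figure: the grid of abstraction levels and paradigms (conceptual: ER, EER, OO, ORM; logical/view: relational, OO, object-relational, network; physical: Oracle 8, Oracle 11g, DB2 9.7, IDS2; code: Oracle 8, Oracle 11g, DB2 9.7 and IDS2 DDL); migration stays within one abstraction level and moves from one paradigm to another, and one of the paths is marked "Not recommended".]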
2. The concept of transformation in the DB context
DB engineering process modelling
Most engineering processes are artefact transformations.
[Figure: Users requirements → Conceptual design → Conceptual schema → Logical design → Logical schema → Physical design → Physical schema → Coding → DDL code.]
Conceptual schema = ConcD(Users requirements)
Logical schema = LogD(Conceptual schema)
Physical schema = PhysD(Logical schema)
DDL code = Coding(Physical schema)
DDL code = DB-design(Users requirements), with DB-design = Coding o PhysD o LogD o ConcD
ConcD = Analysis o Normalisation o Integration
etc.
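The equations above treat each design step as a function from one artefact to the next. The minimal Python sketch below follows that reading. `conc_d`, `log_d`, `phys_d` and `coding` are hypothetical stand-ins that only tag their input; real steps would produce schemas and DDL. The sketch shows DB-design as the right-to-left composition of the four steps.

```python
from functools import reduce
from typing import Callable

Artefact = str
Step = Callable[[Artefact], Artefact]


def compose(*steps: Step) -> Step:
    """Right-to-left composition: compose(f, g)(x) == f(g(x)), i.e. 'f o g'."""
    return reduce(lambda f, g: lambda x: f(g(x)), steps, lambda x: x)


# Hypothetical stand-ins for the real design steps: each one only records
# which step produced the artefact, so the composition order is visible.
def conc_d(requirements: Artefact) -> Artefact:
    return f"ConcD({requirements})"


def log_d(conceptual: Artefact) -> Artefact:
    return f"LogD({conceptual})"


def phys_d(logical: Artefact) -> Artefact:
    return f"PhysD({logical})"


def coding(physical: Artefact) -> Artefact:
    return f"Coding({physical})"


# DB-design = Coding o PhysD o LogD o ConcD
db_design = compose(coding, phys_d, log_d, conc_d)

print(db_design("Users requirements"))
# prints: Coding(PhysD(LogD(ConcD(Users requirements))))
```

Splitting a step further (e.g. ConcD = Analysis o Normalisation o Integration) is just another call to `compose`.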
2. The concept of transformation in the DB context
DB engineering process modelling
An example (relational logical design)
ER schema:
BOOK (ISBN, Title, Author[0-5], DatePublished; id: ISBN)
COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr)
rel-type of (BOOK 0-N, COPY 1-1)
is transformed into the relational schema:
BOOK (ISBN, Title, DatePublished; id: ISBN)
AUTHOR (AuthorName; id: AuthorName)
COPY (ISBN, CopyNbr, DatePurchased; id: ISBN, CopyNbr; ref: ISBN)
WRITE (AuthorName, ISBN; id: ISBN, AuthorName; ref: ISBN; ref: AuthorName)
Constraint: no more than 5 WRITE rows per BOOK row.
2. The concept of transformation in the DB context
DB engineering process modelling
An example (transforming multivalued attributes)
BOOK (ISBN, Title, Author[0-5], DatePublished; id: ISBN)
COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr)
rel-type of (BOOK 0-N, COPY 1-1)
is transformed into
BOOK (ISBN, Title, DatePublished; id: ISBN)
AUTHOR (AuthorName; id: AuthorName)
COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr)
rel-type write (BOOK 0-5, AUTHOR 1-N)
rel-type of (BOOK 0-N, COPY 1-1)
2. The concept of transformation in the DB context
DB engineering process modelling
An example (transforming many-to-many relationship types)
BOOK (ISBN, Title, DatePublished; id: ISBN)
AUTHOR (AuthorName; id: AuthorName)
COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr)
rel-type write (BOOK 0-5, AUTHOR 1-N)
rel-type of (BOOK 0-N, COPY 1-1)
is transformed into
BOOK (ISBN, Title, DatePublished; id: ISBN)
AUTHOR (AuthorName; id: AuthorName)
WRITE (id: bw.BOOK, aw.AUTHOR)
rel-type bw (BOOK 0-5, WRITE 1-1)
rel-type aw (AUTHOR 1-N, WRITE 1-1)
COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr)
rel-type of (BOOK 0-N, COPY 1-1)
2. The concept of transformation in the DB context
DB engineering process modelling
An example (transforming one-to-many relationship types)
BOOK (ISBN, Title, DatePublished; id: ISBN)
AUTHOR (AuthorName; id: AuthorName)
WRITE (id: bw.BOOK, aw.AUTHOR)
rel-type bw (BOOK 0-5, WRITE 1-1)
rel-type aw (AUTHOR 1-N, WRITE 1-1)
COPY (CopyNbr, DatePurchased; id: of.BOOK, CopyNbr)
rel-type of (BOOK 0-N, COPY 1-1)
is transformed into
BOOK (ISBN, Title, DatePublished; id: ISBN)
AUTHOR (AuthorName; id: AuthorName)
COPY (ISBN, CopyNbr, DatePurchased; id: ISBN, CopyNbr; ref: ISBN)
WRITE (AuthorName, ISBN; id: ISBN, AuthorName; ref: ISBN; ref: AuthorName)
Constraint: no more than 5 WRITE rows per BOOK row.
2. The concept of transformation in the DB context
The concept of transformation
A transformation T replaces a construct C in a schema S1 with another construct C', leading to schema S2.
[Figure: at the schema level, T maps construct C of S1 onto construct C' of S2.]
2. The concept of transformation in the DB context
The concept of transformation
If the schema describes actual data, the transformation should also tell how to convert the data (t) ...
[Figure: at the schema level, T maps construct C of S1 onto C' of S2; at the data level, t maps instance c onto instance c'.]
2. The concept of transformation in the DB context
The concept of transformation - Definition
A transformation S is defined by two mappings T and t:
S = <T,t>
[Figure: T maps construct C onto C' = T(C); t maps c, an instance of C (inst_of), onto c' = t(c), an instance of C'.]
T: structural mapping = syntax of S
t: instance mapping = semantics of S
2. The concept of transformation in the DB context
The concept of transformation - Definition
Mapping T can be specified with two predicates:
P: minimal pre-condition
Q: maximal post-condition
S = <T,t> = <P,Q,t>
2. The concept of transformation in the DB context
Specifying a transformation
Expressing structural predicates P and Q
• Value-based (more concise: a name denotes an object)
  entity-type(E): there exists an entity type with name E
• Object-based (more general: a name is a property of an object)
  entity-type(e): there exists an entity type denoted by e
  name(e,E): the name of e is E
The predicate language must allow specification and reasoning (e.g., first-order logic, description logics).
2. The concept of transformation in the DB context
Specifying a transformation
Expressing structural predicates P and Q
entity-type(E): there exists an entity type with name E
attribute(O,A,m,M,T): object (with name) O has an attribute with name A, cardinality m-M and type T
id(O,Cp): object (with name) O has an identifier comprising components Cp
rel-type(R): there exists a rel-type with name R
role(R,r,E,m,M): rel-type R has a role with name r, played by E, with cardinality m-M
2. The concept of transformation in the DB context
Specifying a transformation
Expressing structural predicates P and Q
entity-type(CUSTOMER)
∧ attribute(CUSTOMER,Cust#,1,1,integer)
∧ attribute(CUSTOMER,Name,1,1,string)
∧ attribute(CUSTOMER,Phone,0,5,string)
∧ id(CUSTOMER,{Cust#})
is equivalent to the schema fragment
CUSTOMER (Cust#, Name, Phone[0-5]; id: Cust#)
2. The concept of transformation in the DB context
Specifying a transformation
P = entity-type(CUSTOMER)
  ∧ attribute(CUSTOMER,Cust#,1,1,integer)
  ∧ attribute(CUSTOMER,Name,1,1,string)
  ∧ attribute(CUSTOMER,Phone,0,5,string)
  ∧ id(CUSTOMER,{Cust#})
  = CUSTOMER (Cust#, Name, Phone[0-5]; id: Cust#)
Q = entity-type(CUSTOMER)
  ∧ attribute(CUSTOMER,Cust#,1,1,integer)
  ∧ attribute(CUSTOMER,Name,1,1,string)
  ∧ id(CUSTOMER,{Cust#})
  ∧ entity-type(PHONE)
  ∧ attribute(PHONE,Phone,1,1,string)
  ∧ id(PHONE,{Phone})
  ∧ rel-type(has)
  ∧ role(has,,CUSTOMER,0,5)
  ∧ role(has,,PHONE,1,N)
  = CUSTOMER (Cust#, Name; id: Cust#) and PHONE (Phone; id: Phone), linked by rel-type has (CUSTOMER 0-5, PHONE 1-N)
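P and Q can be handled as plain sets of ground facts. The minimal sketch below makes three assumptions: the value-based predicates are encoded as Python tuples (the slides fix no concrete syntax), "N" stands for an unbounded maximum, and the customer data are made up. `T` checks the pre-condition and rewrites P into Q, and `t` converts CUSTOMER rows accordingly. The names `T` and `t` mirror the slide notation, but their parameters are illustrative only.

```python
from typing import FrozenSet, List, Tuple

Fact = Tuple  # a ground predicate, e.g. ("entity-type", "CUSTOMER")
Schema = FrozenSet[Fact]

# P written with the value-based predicates of the slides ("N" = unbounded).
P: Schema = frozenset({
    ("entity-type", "CUSTOMER"),
    ("attribute", "CUSTOMER", "Cust#", 1, 1, "integer"),
    ("attribute", "CUSTOMER", "Name", 1, 1, "string"),
    ("attribute", "CUSTOMER", "Phone", 0, 5, "string"),
    ("id", "CUSTOMER", frozenset({"Cust#"})),
})


def T(schema: Schema, et: str, att: str, new_et: str, rt: str) -> Schema:
    """Structural mapping: replace the multivalued attribute et.att by an entity
    type new_et (identified by att) linked to et through the rel-type rt."""
    found = [f for f in schema if f[:3] == ("attribute", et, att)]
    # Pre-condition: exactly one such attribute, and it is multivalued.
    if len(found) != 1 or not (found[0][4] == "N" or found[0][4] > 1):
        raise ValueError("pre-condition P does not hold")
    old = found[0]
    _, _, _, m, M, typ = old
    return (schema - {old}) | {
        ("entity-type", new_et),
        ("attribute", new_et, att, 1, 1, typ),
        ("id", new_et, frozenset({att})),
        ("rel-type", rt),
        ("role", rt, "", et, m, M),        # role(has,,CUSTOMER,0,5)
        ("role", rt, "", new_et, 1, "N"),  # role(has,,PHONE,1,N)
    }


def t(customers: List[dict]) -> Tuple[List[dict], List[str], List[Tuple[int, str]]]:
    """Instance mapping: split CUSTOMER rows into CUSTOMER, PHONE and has populations."""
    cust = [{k: v for k, v in c.items() if k != "Phone"} for c in customers]
    phones = sorted({p for c in customers for p in c["Phone"]})
    has = sorted((c["Cust#"], p) for c in customers for p in c["Phone"])
    return cust, phones, has


Q = T(P, "CUSTOMER", "Phone", "PHONE", "has")
data = [{"Cust#": 1, "Name": "Ann", "Phone": ["111", "222"]},
        {"Cust#": 2, "Name": "Bob", "Phone": []}]
print(len(Q))
print(t(data))
```

A phone number shared by several customers yields one PHONE value and several `has` pairs, as the PHONE 1-N role requires.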
2. The concept of transformation in the DB context
Specifying a transformation
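From now on, a transformation is shown through its P and Q schema fragments:
[Figure: P: CUSTOMER (Cust#, Name, Phone[0-5]; id: Cust#); Q: CUSTOMER (Cust#, Name; id: Cust#) and PHONE (Phone; id: Phone), linked by has (CUSTOMER 0-5, PHONE 1-N).]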
2. The concept of transformation in the DB context
Inverse transformations
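$S_2 = S_1^{-1}$ iff $\forall C: P_1(C) \Rightarrow C = T_2(T_1(C))$
[Figure: T1 transforms CUSTOMER (Cust#, Name, Phone[0-5]; id: Cust#) into CUSTOMER (Cust#, Name; id: Cust#) and PHONE (Phone; id: Phone), linked by has (CUSTOMER 0-5, PHONE 1-N); T2 transforms the result back.]
Intuitively, S2 undoes the effect of S1 at the structural level (mapping t is ignored).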
2. The concept of transformation in the DB context
Reversible transformations
A transformation can ...
• augment the information contents of the schema, e.g., CUSTOMER (Cust#, Name, Address) into CUSTOMER (Cust#, Name, Address, Phone);
• preserve the information contents of the schema, e.g., CUSTOMER (Cust#, Name, Phone) into CUSTOMER (Cust#, Name) and PHONE (Phone; id: Phone), linked by has (CUSTOMER 1-1, PHONE 1-N);
• decrease the information contents of the schema, e.g., CUSTOMER (Cust#, Name, Address, Phone) into CUSTOMER (Cust#, Name, Phone);
• more complex patterns exist.
2. The concept of transformation in the DB context
Reversible transformations
Transformation S1 is reversible if it preserves
the information contents of the source schema
reversible= semantics preserving
mapping t involved
2. The concept of transformation in the DB context
Reversible transformations
A transformation can be ...
• not reversible: not semantics-preserving
• reversible: "one-way" semantics-preserving
• symmetrically reversible: fully semantics-preserving
2. The concept of transformation in the DB context
Reversible transformations
Examples
• P: R(A,B,C); Q: R1(A,B); R2(A,C): not reversible
• P: R(A,B,C); $A \twoheadrightarrow B \mid C$; Q: R1(A,B); R2(A,C): reversible (Fagin's theorem)
• P: R(A,B,C); $A \twoheadrightarrow B \mid C$; Q: R1(A,B); R2(A,C); R1[A] = R2[A]: symmetrically reversible
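The three examples can be checked on sample data. The Python sketch below assumes that relations are sets of tuples with columns (A, B, C), and all values are made up. It tests the multivalued dependency A ↠ B|C directly. It then shows that the decomposition into R1(A,B) and R2(A,C) round-trips exactly when the dependency holds, as Fagin's theorem states. Finally, it shows that without an equality constraint on the A values of R1 and R2, a pair (R1, R2) cannot always be rebuilt from its join. Testing on samples illustrates these properties; it does not prove them.

```python
from itertools import product


def project(r, cols):
    """Projection R[cols] of a set of tuples."""
    return {tuple(row[i] for i in cols) for row in r}


def decompose(r):
    """Q side of the transformation: R1 = R[A,B], R2 = R[A,C]."""
    return project(r, (0, 1)), project(r, (0, 2))


def join(r1, r2):
    """Natural join of R1(A,B) and R2(A,C) on A, rebuilding R(A,B,C)."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}


def satisfies_mvd(r):
    """A ->> B|C: for every A value, its (B, C) pairs form a full cross product."""
    for a in {row[0] for row in r}:
        pairs = {(b, c) for (x, b, c) in r if x == a}
        bs = {b for b, _ in pairs}
        cs = {c for _, c in pairs}
        if pairs != set(product(bs, cs)):
            return False
    return True


r_ok = {("a1", "b1", "c1"), ("a1", "b1", "c2"),
        ("a1", "b2", "c1"), ("a1", "b2", "c2"), ("a2", "b3", "c3")}
r_bad = {("a1", "b1", "c1"), ("a1", "b2", "c2")}

for r in (r_ok, r_bad):
    # With the MVD, join(decompose(r)) == r: the decomposition is reversible.
    print(satisfies_mvd(r), join(*decompose(r)) == r)

# Other direction: (R1, R2) is recovered from its join only if both have the same A values.
r1, r2 = {("a1", "b1")}, {("a1", "c1"), ("a2", "c2")}
print(decompose(join(r1, r2)) == (r1, r2))
```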
2. The concept of transformation in the DB context
Reversible transformations
A transformation is reversible if there is an inverse mapping for instances as well:
$S_1$ is reversible iff $\exists\, S_2 = S_1^{-1}$ such that
$\forall C: P(C) \Rightarrow C = T_2(T_1(C))$, and
$\forall c \in inst(C): c = t_2(t_1(c))$
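Both conditions of the definition can be tested on samples. The minimal sketch below uses the CUSTOMER.Phone example and two hypothetical encodings: schemas are dictionaries, and instances are mappings `{cust#: (name, phones)}`. `T1`/`t1` go from the attribute to the PHONE entity type, and `T2`/`t2` come back. A passing check on one sample illustrates reversibility; it does not prove it.

```python
import copy


def T1(s):
    """Structural mapping: CUSTOMER.Phone[m-M] -> entity type PHONE + rel-type has."""
    s = copy.deepcopy(s)
    card = s["entities"]["CUSTOMER"].pop("Phone")
    s["entities"]["PHONE"] = {"Phone": (1, 1)}
    s["rels"]["has"] = {"CUSTOMER": card, "PHONE": (1, "N")}
    return s


def T2(s):
    """Inverse structural mapping: PHONE + has -> attribute CUSTOMER.Phone."""
    s = copy.deepcopy(s)
    card = s["rels"].pop("has")["CUSTOMER"]
    del s["entities"]["PHONE"]
    s["entities"]["CUSTOMER"]["Phone"] = card
    return s


def t1(c):
    """Instance mapping: {cust#: (name, phones)} -> (customers, phones, has)."""
    customers = {k: name for k, (name, _) in c.items()}
    phones = {p for _, ps in c.values() for p in ps}
    has = {(k, p) for k, (_, ps) in c.items() for p in ps}
    return customers, phones, has


def t2(d):
    """Inverse instance mapping: regroup the has pairs per customer."""
    customers, _, has = d
    return {k: (name, frozenset(p for (k2, p) in has if k2 == k))
            for k, name in customers.items()}


def round_trips(schema, instance):
    """Check both conditions of the definition on one sample (a test, not a proof)."""
    return T2(T1(schema)) == schema and t2(t1(instance)) == instance


schema = {"entities": {"CUSTOMER": {"Cust#": (1, 1), "Name": (1, 1), "Phone": (0, 5)}},
          "rels": {}}
instance = {1: ("Ann", frozenset({"111", "222"})), 2: ("Bob", frozenset())}
print(round_trips(schema, instance))  # True
```

Phones are kept as `frozenset`s because attribute values form a set; with lists, order or duplicates would make `t2(t1(c)) == c` fail. The opposite round trip `t1(t2(d)) == d` holds only for target data in which every phone is linked to a customer, which is what the PHONE 1-N role requires.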
2. The concept of transformation in the DB context
Symmetrically reversible transformations
$S = \langle P,Q,t\rangle$ is symmetrically reversible iff both $S$ and $S^{-1} = \langle Q,P,t^{-1}\rangle$ are reversible.
SR-transformations are the most desirable operators in analysis, design, reverse engineering, migration, refactoring, and (partially) evolution processes.
3. Elementary and complex transformations
Elementary: cannot be decomposed into smaller SR-transformations
Complex: can be decomposed into (more) elementary SR-transformations
3. Elementary and complex transformations
Elementary transformations
[Figure: three elementary SR-transformations.
(1) Rel-type into entity type: DOCUMENT (DocID, Title, Date-Published, Keyword[0-10]; id: DocID) and AUTHOR (Name, First-Name, Origin), linked by the many-to-many rel-type written, become linked through a new entity type WRITTEN (id: doc.DOCUMENT, by.AUTHOR) with rel-types doc and by.
(2) Rel-type into foreign key: BOOK (ISBN, Publisher; id: ISBN) and COPY (Serial-No, Date-Acquired; id: of.BOOK, Serial-No), linked by rel-type of, become COPY (ISBN, Serial-No, Date-Acquired; id: ISBN, Serial-No; ref: ISBN).
(3) Multivalued attribute into entity type: DOCUMENT.Keyword[0-10] becomes entity type KEYWORD (Keyword; id: Keyword), linked to DOCUMENT by rel-type describe (DOCUMENT 0-10, KEYWORD 1-N).]
3. Elementary and complex transformations
Elementary and complex SR-transformations
Elementary transformations are building blocks for more complex operators.
Challenge: developing higher-level SR-transformations from elementary SR-transformations.
3. Elementary and complex transformations
Three classes of complex SR-transformations
• compound transformations
• predicate-driven transformations
• model-driven transformations
3. Elementary and complex transformations
Compound transformations
The composition of two transformations is a transformation
The composition of two SR-transformations is an SR-transformation
S1 = <T1, t1>
S2 = <T2, t2>
S12 = S2 o S1 = <T2 o T1, t2 o t1>
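The composition rule can be written directly on `<T,t>` pairs. In the minimal sketch below, the `Transformation` class and the `rename` transformation are hypothetical illustrations. Composing two renamings yields a transformation whose structural mapping is the composition of the two structural mappings, and whose instance mapping is the composition of the two instance mappings.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Transformation:
    T: Callable[[Any], Any]  # structural mapping, applied to schemas
    t: Callable[[Any], Any]  # instance mapping, applied to data

    def then(self, other: "Transformation") -> "Transformation":
        """self.then(other) is other o self = <other.T o self.T, other.t o self.t>."""
        return Transformation(T=lambda s: other.T(self.T(s)),
                              t=lambda d: other.t(self.t(d)))


def rename(old: str, new: str) -> Transformation:
    """Hypothetical elementary transformation: rename attribute old into new."""
    return Transformation(
        T=lambda attrs: tuple(new if a == old else a for a in attrs),
        t=lambda rows: [{(new if k == old else k): v for k, v in r.items()} for r in rows],
    )


S1 = rename("Phone", "Tel")
S2 = rename("Tel", "PhoneNumber")
S12 = S1.then(S2)  # S12 = S2 o S1

print(S12.T(("Cust#", "Name", "Phone")))
print(S12.t([{"Cust#": 1, "Name": "Ann", "Phone": "111"}]))
```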
3. Elementary and complex transformations
Compound transformations
[Figure: a new compound transformation, obtained by chaining known elementary transformations. The chain links five schemas:
ACCOUNT (AccID, Available, Exp-Monday[0-1], Exp-Tuesday_1[0-1], Exp-Wednesday_2[0-1], Exp-Thursday_3[0-1], Exp-Friday_4[0-1]; id: AccID);
ACCOUNT (AccID, Available, Expenses[0-5] (Day-of-Week, Amount); id: AccID; id(Expenses): Day-of-Week), with dom(Day-of-Week) = {'Monday','Tuesday', .. ,'Friday'};
ACCOUNT (AccID, Available; id: AccID), linked by of (ACCOUNT 0-5, EXPENSES 1-1) to EXPENSES (Day-of-Week, Amount; id: of.ACCOUNT, Day-of-Week);
ACCOUNT, EXPENSES (Amount; id: of.ACCOUNT, on.DAY-of-WEEK) and DAY-of-WEEK (Day-of-Week; id: Day-of-Week), linked by of and on;
ACCOUNT and DAY-of-WEEK, linked by the rel-type expenses (Amount), with ACCOUNT 0-5.
Each step is a known transformation; their composition is new.]
3. Elementary and complex transformations
Predicate-driven (conditional) transformations
Transformations that apply to a set of qualified objects in the current schema
S(p)
where S is a transformation and p is a structural predicate
interpretation: apply S to all the objects that satisfy p
3. Elementary and complex transformations
Predicate-driven (conditional) transformations
We need a language for p:
• structural (e.g., DL): complex, and leading to huge expressions;
• ad hoc: expressive, concise, parametric, but not generic and not closed. For instance:
ROLE_per_RT(I J): the number of roles of the current rel-type is between I and J
ONE_ROLE_per_RT(I J): the number of "one" roles (with cardinality ?-1) is between I and J
MAX_CARD_of_ATT(I J): the maximum cardinality of the current attribute is between I and J
DEPTH_of_ATT(I J): the level of the current attribute is between I and J
3. Elementary and complex transformations
Predicate-driven (conditional) transformations
S(p)
RT_into_ET(ROLE_per_RT(3 N)):
  transform into entity types all the rel-types that have at least 3 roles
RT_into_REF(ROLE_per_RT(2 2) and ONE_ROLE_per_RT(1 2)):
  transform into referential attributes all the rel-types that are binary and one-to-many or one-to-one
INSTANTIATE(MAX_CARD_of_ATT(2 4)):
  instantiate all the attributes that are "slightly" multivalued (from 2 to 4 values)
ATT_into_ET_VAL(DEPTH_of_ATT(1 1) and MAX_CARD_of_ATT(5 N)):
  transform into entity types all the attributes that are at the top level and "strongly" multivalued (at least 5 values)
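A predicate-driven transformation first selects the qualified objects, then applies S to each of them. The minimal sketch below implements `RT_into_ET(ROLE_per_RT(3 N))` under the assumption that rel-types are encoded as lists of `(entity type, min, max)` roles. The names of the generated rel-types (`borrowing_1`, ...) and the sample cardinalities are hypothetical.

```python
import copy
import math

N = math.inf  # the "N" upper bound of the predicate language


def role_per_rt(i, j):
    """ROLE_per_RT(I J): the current rel-type has between I and J roles."""
    return lambda roles: i <= len(roles) <= j


def rt_into_et(schema, rt):
    """Replace rel-type rt by an entity type linked to each former role player.
    Each new binary rel-type keeps the player's cardinality; the new entity type plays 1-1."""
    roles = schema["rels"].pop(rt)
    et = rt.upper()
    schema["entities"].add(et)
    for k, (player, lo, hi) in enumerate(roles, start=1):
        schema["rels"][f"{rt}_{k}"] = [(player, lo, hi), (et, 1, 1)]
    return schema


def apply_where(transfo, pred, schema):
    """S(p): apply transfo to every rel-type of the current schema that satisfies pred."""
    schema = copy.deepcopy(schema)
    targets = [name for name, roles in schema["rels"].items() if pred(roles)]
    for name in targets:  # qualified objects are selected before any change
        schema = transfo(schema, name)
    return schema


library = {
    "entities": {"PROJECT", "BORROWER", "COPY", "BOOK"},
    "rels": {
        "borrowing": [("PROJECT", 0, N), ("BORROWER", 0, N), ("COPY", 0, 1)],
        "of": [("BOOK", 0, N), ("COPY", 1, 1)],
    },
}

# RT_into_ET(ROLE_per_RT(3 N))
result = apply_where(rt_into_et, role_per_rt(3, N), library)
print(sorted(result["entities"]), sorted(result["rels"]))
```

Selecting the targets before transforming keeps the binary rel-types created by `rt_into_et` from being re-examined in the same pass.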
3. Elementary and complex transformations
Model-driven transformation
Goal: considering schema S1 in model M1, transform S1 into S2 that complies
with model M2. Of course, as far as possible through SR-transformations!
Example: considering the Entity-relationship schema S1, transform S1 into S2 that
complies with the relational model. Of course, as far as possible without
information loss!
Structure: a compound transformation comprising predicate-driven transformations.
Practical form: a transformation plan.
3. Elementary and complex transformations
Model-driven transformation
Building principles:
1. Identify the constructs of M1 that violate M2 (called invalid)
2. For each invalid construct C, apply a transformation <T,t> = <P,Q,t> such that P(C) holds and T(C) satisfies M2.
Things may be a bit more complex, requiring a compound transformation.
Example: processing N-ary rel-types for relational compliance requires two successive transformations.
3. Elementary and complex transformations
Model-driven transformation
Example: ER to Binary (flat Bachman) conversion
The binary model is a variant of the ER model in which:
• there are no ISA relations;
• rel-types are functional (binary, and one-to-many or one-to-one);
• rel-types have no attributes;
• each rel-type is defined on two distinct entity types (no cyclic rel-types);
• attributes are single-valued and atomic.
3. Elementary and complex transformations
Model-driven transformation
Flat Bachman schemas - invalid constructs:
• ISA relations
• cyclic rel-types
• complex rel-types (with attributes, N-ary)
• many-to-many binary rel-types
• multivalued attributes
• compound attributes.
3. Elementary and complex transformations
Model-driven transformation
Flat Bachman schemas - processing invalid constructs:
• ISA relations: materialization
• cyclic rel-types: transform into entity types
• complex rel-types (with attributes, N-ary): transform into entity types
• many-to-many binary rel-types: transform into entity types
• multivalued attributes: transform into entity types
• compound attributes: disaggregate.
3. Elementary and complex transformations
Model-driven transformation
Transformation plan for ER to Flat Bachman conversion
ISA_into_RT;                                       transform ISA relations by materialization
RT_into_ET(RECURSIVITY_in_RT(2 N));                transform rel-types in which the same entity type appears more than once
RT_into_ET(ATT_per_RT(1 N) or ROLE_per_RT(3 N));   transform complex rel-types
RT_into_ET(ONE_ROLE_per_RT(0 0));                  transform rel-types in which there is no "one" role
LOOP;                                              iteratively flatten the attribute structure
   ATT_into_ET_INST(MAX_CARD_of_ATT(2 N));
   DISAGGREGATE;
ENDLOOP;
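The LOOP ... ENDLOOP part of the plan repeats its body until the schema no longer changes, because flattening one level can expose another. The minimal sketch below assumes attributes are encoded as `(name, max_card, sub_attributes)` tuples. `att_into_et_inst` and `disaggregate` are simplified stand-ins for ATT_into_ET_INST(MAX_CARD_of_ATT(2 N)) and DISAGGREGATE. The generated names (`Location_Store`, `KEYWORD`, ...) are hypothetical, and clashes between new and existing entity-type names are not handled.

```python
import copy

# An attribute is (name, max_card, sub_attributes); max_card > 1 means multivalued.


def att_into_et_inst(s):
    """One pass: turn every multivalued attribute into an entity type."""
    s = copy.deepcopy(s)
    for et in list(s["entities"]):
        for att in [a for a in s["entities"][et] if a[1] > 1]:
            name, _, subs = att
            s["entities"][et].remove(att)
            # a compound attribute keeps its components, an atomic one becomes a single attribute
            s["entities"][name.upper()] = list(subs) if subs else [(name, 1, ())]
            s["rels"].append((et, name.upper()))
    return s


def disaggregate(s):
    """One pass: replace every single-valued compound attribute by its components."""
    s = copy.deepcopy(s)
    for et, attrs in s["entities"].items():
        flat = []
        for name, card, subs in attrs:
            if subs and card == 1:
                flat.extend((f"{name}_{n}", c, ss) for n, c, ss in subs)
            else:
                flat.append((name, card, subs))
        s["entities"][et] = flat
    return s


def run_plan(s, steps, loop_body):
    """Run the steps once, then repeat loop_body until the schema stops changing."""
    for step in steps:
        s = step(s)
    while True:  # LOOP ... ENDLOOP
        new = s
        for step in loop_body:
            new = step(new)
        if new == s:
            return s
        s = new


schema = {
    "entities": {
        "DOCUMENT": [("DocID", 1, ()), ("Keyword", 10, ())],
        "COPY": [("Serial-No", 1, ()),
                 ("Location", 1, (("Store", 1, ()), ("Shelf", 1, ()), ("Row", 1, ())))],
        "ACCOUNT": [("AccID", 1, ()),
                    ("Expenses", 5, (("Day-of-Week", 1, ()), ("Amount", 1, ())))],
    },
    "rels": [],
}
flat = run_plan(schema, [], [att_into_et_inst, disaggregate])
print(flat["entities"]["COPY"])
```

The rel-type steps of the plan would be further functions passed in `steps`; they are left out here.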
3. Elementary and complex transformations
Model-driven transformation
Example of ER to Flat Bachman conversion
[Figure: an ER schema and the flat Bachman schema produced by the plan.
Source: DOCUMENT (DocID, Title, Date-Published, Keyword[0-10]; id: DocID) with subtype BOOK (ISBN, Publisher; id': ISBN) through an isa relation; BORROWER (PID, Name; id: PID); PROJECT (ProjCode, Title; id: ProjCode); COPY (Serial-No, Date-Acquired, Location (Store, Shelf, Row); id: of.BOOK, Serial-No); rel-types responsible (roles responsible and responsible-for), reserved, borrowing and of.
Target: the isa relation is replaced by a rel-type is between DOCUMENT and BOOK; Keyword becomes entity type KEYWORD (Keyword; id: d.DOCUMENT, Keyword); responsible, reserved and borrowing become entity types RESPONSIBLE, RESERVED (id: by.BORROWER, what.DOCUMENT) and BORROWING (id: for.PROJECT, by.BORROWER, what.COPY); Location is disaggregated into Loc_Store, Loc_Shelf and Loc_Row.]
3. Elementary and complex transformations
Model-driven transformation
Other popular examples
• ER to UML
• UML to ER
• ER to relational
• relational to ER
• COBOL files to ER
• ER to XML
• relational to XML
4. Representation of schemas in DB Engineering
Dealing with multiple models
A typical organization uses several different data models. E.g., it
• commonly uses DB2 databases,
• also uses a legacy IDMS database,
• writes its conceptual schemas in the ER model,
• quite often transfers data between databases,
• exchanges data with its environment,
• standardizes on XML format,
• plans to migrate some databases to other platforms,
• prepares the development of a data warehouse,
• studies the feasibility of merging several departments (and their information systems),
• etc.
4. Representation of schemas in DB Engineering
Dealing with multiple models
[Figure: an organization's schemas and data flows: conceptual schema, design, application program, operational data, migration, ETL to a data warehouse, and XML extract & export / import exchanges with the environment.]
4. Representation of schemas in DB Engineering
Dealing with multiple models
Considering all the inter-model and intra-model conversions, an organization using N models requires N x N different mappings (= 16 for N = 4).
[Figure: the Relational, ER, CODASYL and XML models, connected pairwise by direct mappings such as Srel>er, Ser>xml, Scod>rel and Sxml>cod, plus intra-model mappings such as Srel>rel, Ser>er, Scod>cod and Sxml>xml.]
4. Representation of schemas in DB Engineering
Dealing with multiple models
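The usual answer: introducing a pivot model.
Considering all the inter-model and intra-model conversions, the organization then requires 2 x N + 1 different mappings (= 9 for N = 4): one mapping from each model to the pivot model, one from the pivot model to each model, and the intra-pivot mapping Sp>p.
[Figure: the Relational, ER, CODASYL and XML models, each connected to and from a central Pivot Model (Srel>p, Sp>rel, Ser>p, Sp>er, Scod>p, Sp>cod, Sxml>p, Sp>xml), plus Sp>p.]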
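The two counts can be checked with a few lines of arithmetic. In the minimal sketch below, the model abbreviations follow the mapping names used in the figures.

```python
def direct_mappings(n: int) -> int:
    """One mapping per ordered pair of models, intra-model mappings included."""
    return n * n


def pivot_mappings(n: int) -> int:
    """Each model to the pivot and back, plus the pivot-to-pivot mapping Sp>p."""
    return 2 * n + 1


models = ["rel", "er", "cod", "xml"]
pivot_names = [f"S{m}>p" for m in models] + [f"Sp>{m}" for m in models] + ["Sp>p"]
print(direct_mappings(len(models)), pivot_mappings(len(models)), len(pivot_names))
# prints: 16 9 9
```

For N = 2 the pivot costs one mapping more (5 against 4). From N = 3 on it saves mappings (7 against 9), and the gap grows quadratically.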
4. Representation of schemas in DB Engineering
The Generic Entity-relationship (GER) model as the pivot model
[Figure: the GER model covers all the abstraction levels and data models:
conceptual: ER; EER; OO (UML, etc.); ORM; Bachman; RDF; . . .
logical/view: relational; OO (UML, etc.); object-relational; XML DTD; XML Schema; standard file; network; hierarchical; . . .
physical: Oracle 11g; Oracle 8; DB2 9.7; MySQL 5.5; IDS2; IMS; . . .
code: Oracle 11g SQL-DDL; IDS2 DDL; . . .]
4. Representation of schemas in DB Engineering
Specifying operational model M in the GER
Procedure:
• identify the concepts of the GER that are pertinent in M;
• specify the structural constraints that hold in valid M schemas;
• rename the selected constructs according to the taxonomy of M.
[Figure: the ER, UML class, XML and relational models, each mapped into the GER model (Ser>ger, Suml>ger, Sxml>ger, Srel>ger).]
4. Representation of schemas in DB Engineering
Specifying operational model M in the GER
Example: SQL2 is a specialization of the GER
relational constructs | GER constructs | assembly rules
database schema | schema |
table | entity type | an entity type includes at least one attribute
domain | simple domain |
nullable column | single-valued and atomic attribute with cardinality [0-1] |
not null column | single-valued and atomic attribute with cardinality [1-1] |
primary key | primary identifier | a primary identifier comprises attributes with cardinality [1-1]
unique constraint | secondary identifier |
foreign key | reference group | the composition of the reference group must be the same as that of the target identifier
SQL names | GER names | the GER names must follow the SQL syntax
4. Representation of schemas in DB Engineering
Specifying operational model M in the GER
Notion of M-compliant schema
[Figure: an SQL2-compliant schema, annotated with its table, column, primary key and foreign key constructs: DETAIL (ORD-ID, SEQ_NBR, REFERENCE, QTY-ORD; id: ORD-ID, SEQ_NBR; ref: ORD-ID), ORDER (ORD-ID, DATE_RECEIVED, ORIGIN; id: ORD-ID; ref: ORIGIN) and CUSTOMER (CUSTOMER ID; id: CUSTOMER ID). A schema that is not SQL2-compliant, annotated with its offending constructs: an is-a hierarchy in which PERSON (PID, Name; id: PID) has the subtypes EMPLOYEE (RegNbr, Service; id: RegNbr) and CUSTOMER; an entity type with no attributes (CUSTOMER); a rel-type (has, between CUSTOMER and ACCOUNT); and a non-elementary attribute (Deposit[0-N] (Amount, Date) of ACCOUNT (AccNbr; id: AccNbr)).]
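The SQL2 rules of the specialization table can be checked mechanically on a GER schema. The minimal sketch below uses a hypothetical dictionary encoding: `attrs` holds `(name, min, max, components)` tuples, alongside optional `isa` and `pid` entries and a list of rel-types. The cardinalities not visible in the example schemas are assumed to be [1-1], and foreign keys are not checked.

```python
import math


def sql2_violations(schema):
    """List the constructs of a GER schema that are invalid in SQL2."""
    problems = [f"rel-type {rt}" for rt in schema.get("rels", [])]
    for name, et in schema["entities"].items():
        if et.get("isa"):
            problems.append(f"is-a: {name} is a subtype of {et['isa']}")
        if not et["attrs"]:
            problems.append(f"entity type {name} has no attributes")
        card = {}
        for att, lo, hi, subs in et["attrs"]:
            card[att] = (lo, hi)
            if hi != 1 or subs:  # SQL2 columns are single-valued and atomic
                problems.append(f"non-elementary attribute {name}.{att}")
        for comp in et.get("pid", []):  # primary key columns must be [1-1]
            if card.get(comp) != (1, 1):
                problems.append(f"primary identifier of {name} uses {comp}")
    return problems


compliant = {"entities": {
    "DETAIL": {"attrs": [("ORD-ID", 1, 1, ()), ("SEQ_NBR", 1, 1, ()),
                         ("REFERENCE", 1, 1, ()), ("QTY-ORD", 1, 1, ())],
               "pid": ["ORD-ID", "SEQ_NBR"]},
    "ORDER": {"attrs": [("ORD-ID", 1, 1, ()), ("DATE_RECEIVED", 1, 1, ()),
                        ("ORIGIN", 1, 1, ())], "pid": ["ORD-ID"]},
    "CUSTOMER": {"attrs": [("CUSTOMER ID", 1, 1, ())], "pid": ["CUSTOMER ID"]},
}}

non_compliant = {"entities": {
    "PERSON": {"attrs": [("PID", 1, 1, ()), ("Name", 1, 1, ())], "pid": ["PID"]},
    "EMPLOYEE": {"attrs": [("RegNbr", 1, 1, ()), ("Service", 1, 1, ())],
                 "pid": ["RegNbr"], "isa": "PERSON"},
    "CUSTOMER": {"attrs": [], "isa": "PERSON"},
    "ACCOUNT": {"attrs": [("AccNbr", 1, 1, ()), ("Deposit", 0, math.inf, ("Amount", "Date"))],
                "pid": ["AccNbr"]},
}, "rels": ["has"]}

print(sql2_violations(compliant))
for problem in sql2_violations(non_compliant):
    print(problem)
```

The same list of violations is the input of step 1 of a model-driven transformation: each reported construct is a candidate for an SR-transformation.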
4. Representation of schemas in DB Engineering
Specifying operational model M in the GER
Important consequence
Inter-model engineering transformations (e.g., ER to SQL2) are expressed as intra-model transformations (ER to GER, GER to GER, GER to SQL2).
[Figure: logical design seen as a direct mapping from an ER schema to an SQL2 schema, and as the chain ER schema → (Ser>ger) GER schema → (Sger>ger) GER schema → (Sger>rel) SQL2 schema.]
5. About property preservation
A schema has some important properties or facets:
• the semantics of its components;
• statistics may be assigned to components (e.g., there are 15,000 CUSTOMER entities);
• constraints: identifiers, functional dependencies, existence constraints, cardinality constraints, etc.;
• generic operations can be applied to their instances (insert, update, delete, etc.);
• some components have annotations (free text);
• components have 2D coordinates in the schema space;
• others ...
→ Are these properties preserved in SR-transformations?
→ How can they be propagated to the target schema?
→ How can we prove they are preserved?
5. About property preservation
Semantics preservation
By definition, R- and SR-transformations preserve the semantics of the schema.
How can this be proved?
By mapping the GER model onto a simpler model that already includes the concept of R- and SR-transformation.
Example: the N1NF relational model.
[Figure: the GER model is mapped onto the N1NF relational model (Sger>NF2) and back (SNF2>ger); SNF2>NF2 transformations operate within the N1NF model.]
5. About property preservation
Semantics preservation
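A GER transformation Sg is SR if there exists a (possibly complex) SR-transformation Sr of the N1NF model such that
Sg = SNF2>ger o Sr o Sger>NF2
[Figure: Sg transforms a GER schema directly; equivalently, Sger>NF2 maps the GER schema onto an N1NF schema, Sr transforms it, and SNF2>ger maps the result back into the GER.]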
5. About property preservation
Semantics preservation
Example: att-into-ET/v transforms A (A1, A2[0-N], A3) into A (A1, A3), linked by rel-type rA (A 0-N, EA2 1-N) to a new entity type EA2 (A2; id: A2). It can be decomposed as
att-into-ET/v = SNF2>ger o extension o unnest o project-join o Sger>NF2
where the successive N1NF schemas are:
• after Sger>NF2: A: entities; desc-A(A,A1,A2[0-N],A3); desc-A[A]=A;
• after project-join: A: entities; desc-A'(A,A1,A3); R(A,A2[1-N]); desc-A'[A]=A;
• after unnest: A: entities; desc-A'(A,A1,A3); R'(A,A2); desc-A'[A]=A;
• after extension: A,EA2: entities; desc-A'(A,A1,A3); desc-EA2(EA2,A2); rA(A,EA2); desc-A'[A]=A; desc-EA2[EA2]=rA[EA2]=EA2;
and SNF2>ger maps the last N1NF schema back into the GER.
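The decomposition of att-into-ET/v can be exercised on data. The minimal sketch below assumes desc-A is encoded as a dictionary `{A entity: (A1, set of A2 values, A3)}`, and EA2 entities receive hypothetical surrogates `EA2-0`, `EA2-1`, .... Each function implements one N1NF step, and `inverse` rebuilds desc-A to show that nothing was lost.

```python
def project_join(desc_a):
    """desc-A(A,A1,A2[0-N],A3) -> desc-A'(A,A1,A3) and R(A,A2[1-N])."""
    desc_a1 = {a: (a1, a3) for a, (a1, _, a3) in desc_a.items()}
    r = {a: a2s for a, (_, a2s, _) in desc_a.items() if a2s}  # empty sets dropped
    return desc_a1, r


def unnest(r):
    """R(A,A2[1-N]) -> R'(A,A2): one flat tuple per (A, A2 value)."""
    return {(a, v) for a, a2s in r.items() for v in a2s}


def extension(r_flat):
    """R'(A,A2) -> desc-EA2(EA2,A2) and rA(A,EA2): one EA2 entity per distinct A2 value."""
    ids = {v: f"EA2-{i}" for i, v in enumerate(sorted({v for _, v in r_flat}))}
    desc_ea2 = {e: v for v, e in ids.items()}
    r_a = {(a, ids[v]) for a, v in r_flat}
    return desc_ea2, r_a


def att_into_et_v(desc_a):
    """extension o unnest o project-join, on the N1NF side."""
    desc_a1, r = project_join(desc_a)
    desc_ea2, r_a = extension(unnest(r))
    return desc_a1, desc_ea2, r_a


def inverse(desc_a1, desc_ea2, r_a):
    """Rebuild desc-A: replace EA2 entities by their values, nest, and outer-join."""
    r_flat = {(a, desc_ea2[e]) for a, e in r_a}
    return {a: (a1, frozenset(v for x, v in r_flat if x == a), a3)
            for a, (a1, a3) in desc_a1.items()}


desc_a = {"a1": ("x", frozenset({"k1", "k2"}), "y"),
          "a2": ("u", frozenset(), "w"),
          "a3": ("p", frozenset({"k2"}), "q")}
target = att_into_et_v(desc_a)
print(target[1])
print(inverse(*target) == desc_a)
```

Entity `a2`, which has no A2 value, disappears from R but survives in desc-A', which is why `inverse` rebuilds from desc-A' (an outer join) rather than from R.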
5. About property preservation
Statistics preservation
Static and dynamic data metrics are important, especially for physical design:
• How many CUSTOMER entities?
• How many distinct values of the CITY attribute?
• How many ORDER entities per CUSTOMER entity?
• How many CUSTOMER entities with no ORDER entities?
• How many new ORDER entities per day?
• How many updates of the ADDRESS attribute per day?
They are hard to collect, and easier to get at the conceptual level.
→ How can they be propagated to the other levels (logical and physical)?
5. About property preservation
Statistics preservation
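The main static statistics:
[Figure: the static statistics attached to each kind of construct (entity type, attribute, group, domain, role, relationship type, collection), illustrated on an entity type E with attributes A1[m-M]: D and A2, a group gr: A2, and a role rE of rel-type R: counts such as NE, NA1, NG, ND and NR, plus distribution parameters such as the average number of A1 values per E entity.]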
5. About property preservation
Statistics preservation
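att-into-ET/i transforms A (A1, A2[0-N], A3) into A (A1, A3), linked by rel-type R (A 0-N, EA2 1-1) to a new entity type EA2 (A2'; id: R.A, A2'). Write $N_X$ for the number of X, $\mu_{X/Y}$ for the average number of X per Y, and $\pi^0_{X/Y}$ for the proportion of Y with no X; $\mu_{R.A}$ and $\pi^0_{R.A}$ denote the same measures for the R relationships of an A entity. The statistics then propagate as follows.
Source to target:
$N_{EA2} = N_R = N_A \cdot \mu_{A2/A}$; $\mu_{R.A} = \mu_{A2/A}$; $\pi^0_{R.A} = \pi^0_{A2/A}$; $N_{A2'} = N_{A2}$; $\mu_{EA2/A2'} = \mu_{A/A2}$
Target to source:
$N_{A2} = N_{A2'}$; $\mu_{A2/A} = N_{EA2}/N_A$; $\mu_{A/A2} = \mu_{EA2/A2'}$; $\pi^0_{A2/A} = \pi^0_{R.A}$
The remaining value statistics of A2 carry over unchanged to A2'.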
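The propagation rules can be sanity-checked on a toy population with made-up data. In the minimal sketch below, the names `avg_X_per_Y` (average number of X per Y) and `p0_X_per_Y` (proportion of Y with no X) are hypothetical. `propagate` predicts the target statistics from the source ones. For instance, the number of EA2 entities and of R relationships both equal NA times the average number of A2 values per A entity. `measure` computes the same statistics directly on the transformed data for comparison.

```python
import math
from statistics import mean


def source_stats(data):
    """data maps each A entity to the set of its A2 values (A1, A3 omitted)."""
    values = {v for vs in data.values() for v in vs}
    return {
        "N_A": len(data),
        "N_A2": len(values),
        "avg_A2_per_A": mean(len(vs) for vs in data.values()),
        "p0_A2_per_A": sum(1 for vs in data.values() if not vs) / len(data),
        "avg_A_per_A2": mean(sum(1 for vs in data.values() if v in vs) for v in values),
    }


def propagate(st):
    """Predict the target statistics of att-into-ET/i from the source statistics."""
    n_ea2 = st["N_A"] * st["avg_A2_per_A"]
    return {"N_EA2": n_ea2, "N_R": n_ea2, "avg_R_per_A": st["avg_A2_per_A"],
            "p0_R_per_A": st["p0_A2_per_A"], "N_A2'": st["N_A2"],
            "avg_EA2_per_A2'": st["avg_A_per_A2"]}


def att_into_et_inst(data):
    """One EA2 entity per (A entity, A2 value) pair; R links it to that A entity."""
    return [(a, v) for a, vs in data.items() for v in sorted(vs)]


def measure(data, ea2):
    """Compute the same statistics directly on the transformed data."""
    per_a = [sum(1 for owner, _ in ea2 if owner == a) for a in data]
    values = {v for _, v in ea2}
    return {"N_EA2": len(ea2), "N_R": len(ea2), "avg_R_per_A": mean(per_a),
            "p0_R_per_A": per_a.count(0) / len(data), "N_A2'": len(values),
            "avg_EA2_per_A2'": mean(sum(1 for _, v2 in ea2 if v2 == v) for v in values)}


data = {"a1": {"x", "y"}, "a2": set(), "a3": {"y"}}
predicted = propagate(source_stats(data))
measured = measure(data, att_into_et_inst(data))
print(all(math.isclose(predicted[k], measured[k]) for k in measured))
```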
6. Applications of transformations
• Improving engineering processes
• Automating engineering processes
• Traceability
• Developing new engineering processes
• Education
• Co-transformations
+ a lot of other applications (see BX-Grace report)
6. Applications of transformations
Improving engineering processes
• Fosters systematic and reproducible engineering techniques
• Better control and auditing of the design products
• Minimize (or at least identify) semantic losses
6. Applications of transformations
Automating engineering processes
Fairly easy to automate, but requires a very careful analysis of predicates P and Q
6. Applications of transformations
Automating engineering processes
The DB-MAIN CASE environment - Elementary transformations
1. select an object
2. select a transformation
3. if needed, select the variant
4. if needed, give target names
6. Applications of transformations
Automating engineering processes
The DB-MAIN CASE environment - Model-driven transformations
6. Applications of transformations
Traceability
The history of the transformations applied to produce schema S2 from schema S1 is the trace of the "S1 → S2" engineering process.
It can be used to derive direct and reverse mappings.
Such mappings are used to identify:
• for each construct in source schema S1, the constructs in target schema S2 that derive from it,
• for each construct in target schema S2, the constructs in source schema S1 that it derives from.
Examples:
• which conceptual object does DB2 column ORDER.CUST implement?
• how has relationship type "writes" been implemented in the DB2 schema?
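The minimal sketch below shows how direct and reverse mappings can be derived from such a history. It assumes that the trace records, for each applied transformation, the constructs it consumed and those it produced; the construct names are hypothetical. The forward mapping follows each construct to the constructs of the final schema, and the reverse mapping inverts it for the constructs of the source schema.

```python
from collections import defaultdict

# Hypothetical trace: (transformation, constructs consumed, constructs produced).
history = [
    ("RT_into_ET", ["writes"], ["WRITES", "writes.author", "writes.book"]),
    ("RT_into_REF", ["writes.book"], ["WRITES.ISBN"]),
    ("RT_into_REF", ["writes.author"], ["WRITES.AUTHOR_ID"]),
]


def forward_mapping(history):
    """For each transformed construct, the constructs of the final schema derived from it."""
    produced = defaultdict(set)
    for _, sources, targets in history:
        for s in sources:
            produced[s].update(targets)

    def final(c):
        if c not in produced:  # never transformed again: it belongs to the target schema
            return {c}
        return set().union(*(final(t) for t in produced[c]))

    return {c: final(c) for c in produced}


def reverse_mapping(history):
    """For each target construct, the source constructs it derives from."""
    fwd = forward_mapping(history)
    intermediate = {t for _, _, targets in history for t in targets}
    rev = defaultdict(set)
    for c, finals in fwd.items():
        if c not in intermediate:  # only constructs of the source schema
            for f in finals:
                rev[f].add(c)
    return dict(rev)


print(sorted(forward_mapping(history)["writes"]))
print(reverse_mapping(history)["WRITES.ISBN"])
```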
6. Applications of transformations
Developing new engineering processes
Three examples:
• Database reverse engineering: modelled as the inverse of forward
engineering. Challenge: finding the bi-directional transformation
between the physical and conceptual schemas.
• Design recovery: reconstruction of the process that could have been
executed when a legacy database was designed. It can be recovered by
induction on the history of the reverse engineering process. History of a
process = trace of the transformations that have been carried out
during the process.
• Schema quality evaluation and improvement: identifying bad
patterns in a schema and replacing them by better but equivalent data
structures. Equivalent = that can be derived from each other through
reversible transformations. Bad, better: according to quality criteria,
such as simplicity, expressivity, no redundancy, normalization, etc.
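The equivalence criterion above (two structures that can be derived from each other through reversible transformations) can be sketched with a pair of mutually inverse transformations: replacing a multivalued attribute by an entity type, and folding that entity type back into the attribute. The schema model and function names are hypothetical, and a real transformation would also have to handle cardinalities, identifiers and the data.

```python
import copy

# Hypothetical schema model: {"entities": {name: {attribute: "single" | "multi"}},
#                             "rels": [(owner entity type, dependent entity type), ...]}


def mv_attribute_to_entity(schema, entity, attr):
    """Forward: replace multivalued attribute entity.attr by a new entity type."""
    s = copy.deepcopy(schema)
    new = f"{entity}_{attr}"  # hypothetical naming convention
    if s["entities"][entity].get(attr) != "multi" or new in s["entities"]:
        raise ValueError("precondition fails")
    del s["entities"][entity][attr]
    s["entities"][new] = {attr: "single"}
    s["rels"].append((entity, new))
    return s


def entity_to_mv_attribute(schema, entity, new):
    """Backward: fold a single-attribute entity type back into a multivalued attribute."""
    s = copy.deepcopy(schema)
    attrs = s["entities"].get(new, {})
    other_rels = [r for r in s["rels"] if new in r and r != (entity, new)]
    if len(attrs) != 1 or (entity, new) not in s["rels"] or other_rels:
        raise ValueError("precondition fails")
    (attr,) = attrs  # the single attribute name
    if attr in s["entities"][entity]:
        raise ValueError("precondition fails")
    del s["entities"][new]
    s["rels"].remove((entity, new))
    s["entities"][entity][attr] = "multi"
    return s


s = {"entities": {"EMPLOYEE": {"ENUM": "single", "PHONE": "multi"}}, "rels": []}
t = mv_attribute_to_entity(s, "EMPLOYEE", "PHONE")
print(t["rels"])  # [('EMPLOYEE', 'EMPLOYEE_PHONE')]
print(entity_to_mv_attribute(t, "EMPLOYEE", "EMPLOYEE_PHONE") == s)  # True
```

The round trip returning the original schema is what makes the two structures interchangeable, so a quality-improvement step can pick whichever one better meets the chosen criteria.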
Education
Obvious, provided that transformation techniques are presented in an intuitive
and natural way!
Co-transformations
A complex software system includes artefacts pertaining to
several paradigms:
• database: static structures, (re)active components, data
• programs
• GUI
• forms and reports
• various secondary components (e.g., ETL, validation, loading,
security management scripts)
When the database schema is modified, some of the other
components must be updated accordingly.
Can this update be automated? Yes, provided that the schema modification
has been carried out through formal transformations.
Application: evolution of large, data-centered systems (see Anthony's
position statement).
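A minimal Python sketch of a co-transformation. It assumes, hypothetically, that the programs' data accesses have already been extracted into simple statement records; renaming a column is then applied to the schema and, in the same step, propagated to every statement that references that column. Rewriting actual program source code is much harder; `Access`, `rename_column` and `co_rename_column` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical, simplified data-access statement found in a client program."""
    op: str        # e.g. "SELECT", "UPDATE"
    table: str
    columns: tuple


def rename_column(schema, table, old, new):
    """Schema transformation: rename column table.old to table.new."""
    if old not in schema[table] or new in schema[table]:
        raise ValueError("precondition fails")
    s = {t: list(cols) for t, cols in schema.items()}
    s[table][s[table].index(old)] = new
    return s


def co_rename_column(schema, programs, table, old, new):
    """Co-transformation: apply the schema change and propagate it to the programs."""
    new_schema = rename_column(schema, table, old, new)
    new_programs = [
        Access(a.op, a.table,
               tuple(new if a.table == table and c == old else c for c in a.columns))
        for a in programs
    ]
    return new_schema, new_programs


schema = {"ORDER": ["ONUM", "CUST"], "INVOICE": ["INUM", "CUST"]}
programs = [Access("SELECT", "ORDER", ("ONUM", "CUST")),
            Access("SELECT", "INVOICE", ("CUST",))]
s2, p2 = co_rename_column(schema, programs, "ORDER", "CUST", "CUST_ID")
print(s2["ORDER"])     # ['ONUM', 'CUST_ID']
print(p2[0].columns)   # ('ONUM', 'CUST_ID')
print(p2[1].columns)   # ('CUST',)
```

The second statement is left unchanged because INVOICE.CUST is a different column; knowing exactly which construct a transformation touched is what makes the propagation safe.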
7. Conclusions and challenges
• Intuitively, most database engineering processes are transformational by
nature.
• By combining elementary transformations, we can give these processes a
precise transformational definition.
• A transformation can be formalized so that its preservation properties can
be proved.
• Only a small set of elementary transformations (20 to 40) is needed.
• Once correctly defined, a transformation is quite reliable, and is guaranteed
to preserve information whatever the context in which it is applied.
• Transformations are (sort of …) easy to implement in CASE tools.
• Several general-purpose transformation languages and engines are available:
QVT, ATL, Kermeta, GReAT, VIATRA, Tefkat, TXL and ... XSLT!
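The idea of defining a process by combining elementary transformations can be sketched as function composition. In this hypothetical Python sketch, each elementary transformation maps a schema to a new schema, `compose` chains them, and a toy `relational_design` plan removes every relationship type by applying the same elementary transformation to each one. The schema model and the `<entity>_ID` naming convention are assumptions.

```python
import copy
from functools import reduce

# Hypothetical schema model: {"entities": {name: [attributes]},
#                             "rels": {rel-type name: (one-side entity, many-side entity)}}


def rel_to_fk(rel):
    """Elementary transformation (simplified): one-to-many rel-type -> foreign key."""
    def apply(schema):
        s = copy.deepcopy(schema)
        one, many = s["rels"].pop(rel)
        s["entities"][many].append(f"{one}_ID")  # hypothetical naming convention
        return s
    return apply


def compose(*steps):
    """A complex transformation: elementary steps applied left to right."""
    return lambda schema: reduce(lambda s, step: step(s), steps, schema)


def relational_design(schema):
    """Hypothetical model-driven plan: eliminate every rel-type with rel_to_fk."""
    return compose(*(rel_to_fk(r) for r in schema["rels"]))(schema)


s = {"entities": {"CUSTOMER": ["CNUM"], "PRODUCT": ["PNUM"], "ORDER": ["ONUM"]},
     "rels": {"places": ("CUSTOMER", "ORDER"), "concerns": ("PRODUCT", "ORDER")}}
r = relational_design(s)
print(r["entities"]["ORDER"])  # ['ONUM', 'CUSTOMER_ID', 'PRODUCT_ID']
print(r["rels"])               # {}
```

Since each step copies its input, the composed plan leaves the source schema intact, and the list of steps it applied is exactly the kind of history that traceability relies on.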
7. Challenges
However, some problems are not (completely) solved:
• A transformation must address all the aspects of the data structures:
documentation, annotations, statistics and operations (methods).
• Propagating constraints is a complex problem: it is well understood for
uniqueness constraints, but less obvious for the others.
• How can the data be transformed efficiently after a schema transformation?
See J.-M. Hick's thesis (2003).
• Modifying a high-level abstract schema is easy, but how do we propagate
the modifications to the lower-level schemas and code (traceability)?
• Transforming the data structures is nice, but what about the programs?
This leads to the notion of co-transformation. See A. Cleve's thesis (2009).
• How can a procedural transformation be derived from its <P,Q> specification?
• How can a transformation plan be derived from a pair of models (M1, M2)?
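The data-transformation question concerns the instance mapping that must accompany a schema transformation. The naive, in-memory Python sketch below (row format and function names are hypothetical) shows that mapping for the "multivalued attribute to entity type" transformation, together with its inverse; it deliberately ignores efficiency, which is precisely the open problem.

```python
from collections import defaultdict


def split_mv_attribute(rows, key, attr):
    """Instance mapping: move the values of multivalued attr into rows of a new table."""
    parents, children = [], []
    for row in rows:
        parents.append({k: v for k, v in row.items() if k != attr})
        for value in row.get(attr, []):
            children.append({key: row[key], attr: value})
    return parents, children


def merge_mv_attribute(parents, children, key, attr):
    """Inverse instance mapping: rebuild the multivalued attribute from the new table."""
    values = defaultdict(list)
    for child in children:
        values[child[key]].append(child[attr])
    return [{**p, attr: values.get(p[key], [])} for p in parents]


rows = [{"ENUM": 1, "PHONE": ["111", "222"]}, {"ENUM": 2, "PHONE": []}]
parents, children = split_mv_attribute(rows, "ENUM", "PHONE")
print(parents)   # [{'ENUM': 1}, {'ENUM': 2}]
print(children)  # [{'ENUM': 1, 'PHONE': '111'}, {'ENUM': 1, 'PHONE': '222'}]
print(merge_mv_attribute(parents, children, "ENUM", "PHONE") == rows)  # True
```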
Selected references
(from our contribution)
References
Anthony Cleve, Tom Mens, Jean-Luc Hainaut. Data-Intensive System Evolution, IEEE Computer, pp.
110-112, IEEE CS, 43(8), August 2010.
Anthony Cleve, Program Analysis and Transformation for Data-Intensive System Evolution, PhD Thesis,
University of Namur, 2009.
Jean-Luc Hainaut, Anthony Cleve, Jean Henrard and Jean-Marc Hick. Migration of Legacy Information
Systems, in Software Evolution. Mens, T. and Demeyer, S. (Eds), Springer, pp. 107-138, 2008.
Jean-Luc Hainaut. The Transformational Approach to Database Engineering, in Lämmel, R., Saraiva, J.,
Visser, J. (Eds), Generative and Transformational Techniques in Software Engineering, pp. 95-143,
LNCS 4143, Springer, 2006.
Jean-Marc Hick and Jean-Luc Hainaut. Database application evolution: A transformational approach,
Data and Knowledge Engineering, 59(3): pp. 534-558, 2006.
Anthony Cleve and Jean-Luc Hainaut. Co-transformations in Database Applications Evolution, in
Generative and Transformational Techniques in Software Engineering, LNCS, Vol. 4143, pp. 409-421,
Springer-Verlag, 2006.
Jean-Luc Hainaut. Transformation-based Database Engineering, in Transformation of Knowledge,
Information and Data: Theory and Applications, pages 1-26, IDEA Group, 2005.
Jean Henrard, Anthony Cleve and Jean-Luc Hainaut. Inverse Wrappers for Legacy Information Systems
Migration, in Proceedings of 1st International Workshop on Wrapper Techniques for Legacy Systems,
(WCRE’04/WRAP’04), Computer Science Report, Volume 04-34, pages 30-43, Technische Universiteit
Eindhoven, 2004.
Anthony Cleve, Jean Henrard and Jean-Luc Hainaut. Co-transformations in Information System
Reengineering, in Proceedings of the 2nd International Workshop on Metamodels, Schemas, and
Grammars for Reverse Engineering, (WCRE’04/ATEM-04), Electronic Notes in Theoretical Computer
Science, Volume 137, pages 5-15, Elsevier, 2005.
Jean-Luc Hainaut. Specification preservation in schema transformations - Application to semantics and
statistics, Data and Knowledge Engineering, 16(1), Elsevier Science Publishers, 1996.
Jean-Luc Hainaut, Jean Henrard, Jean-Marc Hick, Didier Roland and Vincent Englebert. Database
Design Recovery, in Proceedings of the 8th Conference on Advanced Information Systems Engineering,
(CAiSE’96), Lecture Notes in Computer Science, Volume 1080, pages 272-300, Springer-Verlag, 1996.
Jean-Luc Hainaut. Transformation-Based Database Engineering, in Tutorials of the 21st International
Conference on Very Large Data Bases, (VLDB’95), 1995.
Jean-Luc Hainaut, Catherine Tonneau, Michel Joris and Muriel Chandelon. Transformation-based
Database Reverse Engineering, in Proceedings of 12th International Conference on Entity-Relationship
Approach (ER’93), Lecture Notes in Computer Science, Volume 823, pages 364-375, Springer-Verlag,
1994.
Jean-Luc Hainaut, Mario Cadelli, Bernard Decuyper and Olivier Marchand. TRAMIS: a transformation-based database CASE tool, in Proceedings of 5th International Conference on Software Engineering
and Applications, EC2 Publish., 1992.
Jean-Luc Hainaut. Entity-generating Schema Transformations for Entity-Relationship Models, in
Proceedings of the 10th International Conference on the Entity-Relationship Approach (ER’91), pages
643-670, ER Institute, 1991.
Jean-Luc Hainaut. Theoretical and Practical Tools for Data Base Design, in Proceedings of the 7th
International Conference on Very Large Data Bases, (VLDB’81), pages 216-224, IEEE Computer
Society, 1981.
Most of these references are available on the site of the LIBD:
http://info.fundp.ac.be/libd
Otherwise, ask
Thanks