Download Schema and data translation

Document related concepts

Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Schema and Data Translation
Paolo Atzeni
Università Roma Tre
Based on
Tutorial at ICDE 2006
Paper in EDBT 2006 (with P. Cappellari and P. Bernstein)
Demo in Sigmod 2007 (with P. Cappellari and G. Gianforme)
ADBIS --- Varna, October 2, 2007
Outline
•
•
•
•
Introduction
Model management
Schema and data translation: the problem
A metamodel based approach
P. Atzeni
Varna - October 2, 2007
2
A ten-year goal for database research
• The “Asilomar report”
(Bernstein et al. Sigmod Record 1999 www.acm.org/sigmod):
– The information utility:
make it easy for everyone to store, organize, access,
and analyze the majority of human information online
P. Atzeni
Varna - October 2, 2007
3
A general framework: cooperation
• "The capacity of a system to interact (effectively) with other
systems, possibly managed by different organizations"
• Forms of cooperation
– Process-centered cooperation:
• the systems offer services one another, by exchanging
messages, or by triggering activities, without making
remote data explicitly visible
– Data-centered cooperation:
• the systems offer data one another; data is distributed,
heterogeneous and autonomous
P. Atzeni
Varna - October 2, 2007
4
Databases in the Internet era
• Databases before the Internet
– An internal infrastructure, a precious resource, but usually
hidden, with some controlled cooperation
• Internet changes the requirements
– More users (not only humans), more diverse cooperating
systems (distributed, heterogeneous, autonomous), more
types of data
• "Future" Internet changes more
– New devices (embedded everywhere), even more users
(many “per person”), real mobility, need for personalization
and adaptation
P. Atzeni
Varna - October 2, 2007
5
The most studied form of data-centered
cooperation: integration
• We are interested in data-centered cooperation, often referred to
as integration
“The unification of related, heterogeneous data from disparate
sources, for example, to enable collaboration” (Hammer &
Stonebraker 2005)
• Some "paradigms" …
P. Atzeni
Varna - October 2, 2007
6
Multidatabase
Global Manager
P. Atzeni
Mediator
Mediator
Mediator
Local mgr
Local mgr
Local mgr
DB
DB
DB
Varna - October 2, 2007
7
Data Warehousing System
DW manager
Data Warehouse
“Integrator”
Mediator
Mediator
Mediator
Local manager
Local manager
Local manager
DB
DB
DB
P. Atzeni
Varna - October 2, 2007
8
Intermediate solutions in practice
Application
Local manager
Mediator
Local Manager
DB
DB
Integrator
Local manager
DB
P. Atzeni
Mediator
Mediator
Local manager
Local manager
DB
DB
Varna - October 2, 2007
9
Peer-based architecture
Peer mgr
Local mgr
Mediator
DB
Peer mgr
Local mgr
Local mgr
DB
Mediator
DB
Mediator
Local mgr
DB
Mediator
Peer mgr
Local mgr
Mediator
Local mgr
DB
DB
Mediator
P. Atzeni
Varna - October 2, 2007
10
Data is not just in databases
•
•
•
•
•
•
•
Mail messages
Web pages
Spreadsheets
Textual documents
Palmtop devices, mobile phones
Multimedia annotations (e.g., in digital photos)
XML documents
P. Atzeni
Varna - October 2, 2007
11
The same data in the same form?
• Adaptivity:
– Personalization: content adapted to the user
• upon system's decision
• upon user's request
– Customization: structure adapted to the user
• according to the user's role
• upon user's request
– Context dependence
• User, Device, Network, Place, Time, Rate
P. Atzeni
Varna - October 2, 2007
12
Summarizing: a general need
• We have data at various places, and data has to be
– exchanged
– replicated
– migrated
– integrated
– adapted
P. Atzeni
Varna - October 2, 2007
13
A major difficulty
• Heterogeneity
– System level
– Model level
– Structural (different structure for similar data)
– Semantic (different meaning for the same structure)
• Many efforts, but current techniques are mostly manual and ad
hoc
P. Atzeni
Varna - October 2, 2007
14
A direction for the solutions
• Be general!
– ad hoc solution could work in-the-small, but they
• are repetitive and time consuming
• do not scale
• are not maintainable
• Historical notes:
– W. C. McGee: Generalization: Key to Successful Electronic
Data Processing. J. ACM 1959
• Indeed, databases are the result of generalization applied to
secondary storage management!
P. Atzeni
Varna - October 2, 2007
15
Generality requires …
• … high-level descriptions of problems within the family of
interest:
– Metadata:
• “data about data”
• (formal or informal) description of structures and meaning
• General solutions leverage on metadata (and then operate on
data as a consequence)
P. Atzeni
Varna - October 2, 2007
16
Outline
•
•
•
•
Introduction and motivation
Model management
Schema and data translation: the problem
A metamodel based approach
P. Atzeni
Varna - October 2, 2007
17
A wider perspective
• (Generic) Model Management:
– A proposal by Bernstein et al (2000 +)
– Includes a set of operators on
• schemas and
• mappings between schemas
P. Atzeni
Varna - October 2, 2007
18
Terminology: a warning
P. Atzeni
Model Mgmt people
Traditional DB people
Meta-metamodel
Metamodel
Metamodel
Model
Model
Schema
Varna - October 2, 2007
19
Schemas and mappings
• A simplified approach:
– Schema:
• a set of elements, related in some way to one another
– Mapping:
• a set of correspondences (pair of elements) or
• its reification, a third schema related to the other two via
two sets of correspondences
P. Atzeni
Varna - October 2, 2007
20
Model mgmt operators, a first set
•
•
•
•
map = Match (S1, S2)
S3 = Merge (S1, S2, map)
S2 = Diff (S1, map)
and more
–
–
–
–
–
–
P. Atzeni
map3 = Compose (map1, map2)
S2 = Select (S1, pred)
Apply (S, f)
list = Enumerate (S)
S2 = Copy (S1)
…
Varna - October 2, 2007
21
Match
• map = Match (S1, S2)
– given
• two schemas S1, S2
– returns
• a mapping between them
• the “classical” initial step in data integration:
– find the common elements of two schemas and the
correspondences between them
P. Atzeni
Varna - October 2, 2007
22
Merge
• S3 = Merge (S1, S2, map)
– given
• two schemas and a mapping between them
– returns
• a third schema (and two mappings)
• the “classical” second step in data integration:
– given the correspondences, find a way to obtain one schema
out of two
P. Atzeni
Varna - October 2, 2007
23
Diff
• S2 = Diff (S1, map)
– given
• a schema and a mapping from it (to some other schema,
not relevant)
– returns
• a (sub-)schema, with the elements that do not participate
in the mapping
P. Atzeni
Varna - October 2, 2007
24
Example
(Bernstein and Rahm, ER 2000)
• A database (a “source”), a data warehouse and a mapping
between the two
• we get a second source, with some similarity to the first one
• and we want to update the DW
DB1
DW1
DW2
DB2
P. Atzeni
Varna - October 2, 2007
25
Example, the "solution"
m2 = Match(DB1,DB2)
DW2
DB1
m1
DW1
m3
m2
DB2
DB2’
P. Atzeni
m5
m4
m3= Compose(m2,m1)
DB2’=Diff(DB2,m3)
DW2’, m4 user defined
m5 = Match(DW1,DW2’)
DW2 = Merge(DW1,DW2’,m5)
DW2’
Varna - October 2, 2007
26
Magic does not exist
• Operators might require human intervention:
– Match is the main case
• Scripts involving operators might require human intervention as
well (or at least benefit from it):
– a full implementation of each operator might not always
available
– a mapping might require manual specification
– incomparable alternatives might exist
P. Atzeni
Varna - October 2, 2007
27
The “data level”
• The major operators have also an extended version that
operates on data, and not only on schemas
• Especially apparent for
– Merge
P. Atzeni
Varna - October 2, 2007
28
We also have heterogeneity
• Round trip engineering (Bernstein, CIDR 2003)
– A specification (for example ER or UML) and an
implementation (for example, relational)
– then a change to the implementation: want to revise the
specification
• We need a translation from the implementation model to the
specification one
P. Atzeni
S1
S2
I1
I2
Varna - October 2, 2007
29
Model management with heterogeneity
• The previous operators have to be “model generic” (capable of
working on different models)
• We need a “translation” operator
– <S2, map12> = ModelGen (S1)
P. Atzeni
Varna - October 2, 2007
30
ModelGen, an additional operator
• <S2, map12> = ModelGen (S1)
– given
• a schema (in a model)
– returns
• a schema (in a different data model) and a mapping
between the two
• A “translation” from a model to another
• I should call it “SchemaGen” …
• We should better write
– <S2, map12> = ModelGen (S1,mod2)
P. Atzeni
Varna - October 2, 2007
31
Round trip engineering
S1
m1
S2
m3
S2’
m4
m2
I1
I2
I2’
m2 = Match (I1,I2)
m3 = Compose (m1,m2)
I2’= Diff(I2,m3)
<S2’,m4 > = Modelgen(I2’)
… Match, Merge
P. Atzeni
Varna - October 2, 2007
32
Outline
•
•
•
•
Introduction
Model management
Schema and data translation: the problem
A metamodel based approach
P. Atzeni
Varna - October 2, 2007
33
Schema and data translation,
a long standing issue
• Schema translation:
– given schema S1 in model M1 and model M2
– find a schema S2 in M2 that “corresponds” to S1
• Schema and data translation:
– given also a database D1 for S1
– find also a database D2 for S2 that “contains the same data”
to D1
P. Atzeni
Varna - October 2, 2007
34
Schema and data translation,
a long standing issue
• Translations from a model to another have been studied since
the 1970’s
• Whenever a new model is defined, techniques and tools to
generate translations are studied
• However, proposals and solutions are usually model specific
P. Atzeni
Varna - October 2, 2007
35
Model specific solutions
• Given an ER schema, find the suitable relational schema that
“implements” it
– the original paper (Chen 1976) contains the basics
– further discussions by many (e.g. Markowitz and Shoshani
1989)
– illustrated in every textbook
• Similarly with
– any other conceptual model and any other logical one
– XML and relational (or object)
P. Atzeni
Varna - October 2, 2007
36
Another problem in the picture: data exchange
• Given a source S1 and a target schema S2 (in different models
or even in the same one), find a translation, that is, a function
that given a database D1 for S1 produces a database D2 for S2
that “correspond” to D1
• Often emphasized with reference to materialized solutions
P. Atzeni
Varna - October 2, 2007
37
Integration
• Given two or more sources, build an integrated schema or
database
P. Atzeni
Varna - October 2, 2007
38
Schema translation
• Given a schema find another one with respect to some specific
goal (another model, better quality, …)
P. Atzeni
Varna - October 2, 2007
39
Data exchange
• Given a source and a target schema, find a transformation from
the former to the latter
P. Atzeni
Varna - October 2, 2007
40
Schema translation and data exchange
• Can be seen a complementary:
– Data translation = schema translation + data exchange
• Given a source schema and database
• Schema translation produces the target schema
• Data exchange generates the target database
• In model management terms we could write
– Schema translation:
• <S2, map12> = ModelGen (S1,mod2)
– Data exchange:
• i2 = DataGen (S1,i1,S2,map12)
P. Atzeni
Varna - October 2, 2007
41
Outline
•
•
•
•
Introduction
Model management
Schema and data translation: the problem
A metamodel based approach
P. Atzeni
Varna - October 2, 2007
42
The problem
• ModelGen
– given two data models M1 and M2, and a schema S1 of M1
(the source schema and model),
• generate a schema S2 of M2 (the target schema and
model), corresponding to S1
• and, for each database D1 over S1, generate an
equivalent database D2 over S2
P. Atzeni
Varna - October 2, 2007
43
We have been doing this for a while
• Initial work more than ten years ago (Atzeni & Torlone, 1996)
• Major novelty recently (Atzeni, Cappellari & Bernstein, 2006)
– translation of both schemas and data
– data-level translations generated automatically, from
schema-level ones
P. Atzeni
Varna - October 2, 2007
44
A simple example
•
•
•
An object relational database, to be translated in a relational one
Source: the OR-model
Target: the relational model
System managed ids
used as references
P. Atzeni
Varna - October 2, 2007
45
Example, 2
•
•
Does the OR model allow for keys?
Assume EmpNo and Name are keys
P. Atzeni
Varna - October 2, 2007
46
Example, 3
•
•
Does the OR model allow for keys?
Assume no keys are specified
P. Atzeni
Varna - October 2, 2007
47
Many different models (and variations …)
N-ary ER
w/ gen
OR
Binary ER
w/ gen
N-ary ER
w/o gen
Bin ER w/o gen
w/o attr on rel
P. Atzeni
…
…
…
Bin ER w/ gen
w/o attr on rel
Binary ER
w/o gen
Relational
XSD
Bin ER w/ gen
w/o M:N rel
OO w/ gen
Bin ER w/o gen
w/o M:N rel
OO w/o gen
Varna - October 2, 2007
48
Heterogeneity
• We need to handle artifacts and data in various models
– Data are defined wrt to schemas
– Schemas are defined wrt to models
– How models can be defined?
Models
Schemas
Data
P. Atzeni
Varna - October 2, 2007
49
A metamodel approach
•
•
•
The constructs in the various models are rather similar:
– can be classified into a few categories (Hull & King 1986):
• Lexical: set of printable values (domain)
• Abstract (entity, class, …)
• Aggregation: a construction based on (subsets of)
cartesian products (relationship, table)
• Function (attribute, property)
• Hierarchies
• …
We can fix a set of metaconstructs (each with variants):
– lexical, abstract, aggregation, function, ...
– the set can be extended if needed, but this will not be frequent
A model is defined in terms of the metaconstructs it uses
P. Atzeni
Varna - October 2, 2007
50
The metamodel approach, example
• The ER model:
– Abstract (called Entity)
– Function from Abstract to Lexical (Attribute)
– Aggregation of abstracts (Relationship)
– …
• The OR model:
– Abstract (Table with ID)
– Function from Abstract to Lexical (value-based Attribute)
– Function from Abstract to Abstract (reference Attribute)
– Aggregation of lexicals (value-based Table)
– Component of Aggregation of Lexicals (Column)
– …
P. Atzeni
Varna - October 2, 2007
51
The supermodel
• A model that includes all the meta-constructs (in their most
general forms)
– Each model is subsumed by the supermodel (modulo
construct renaming)
– Each schema for any model is also a schema for the
supermodel (modulo construct renaming)
P. Atzeni
Varna - October 2, 2007
52
The metamodel approach, translations
• The constructs in the various models are rather similar:
– can be classified into a few categories (“metaconstructs'')
– translations can be defined on metaconstructs,
• and there are “standard”, accepted ways to deal with
translations of metaconstructs
• they can be performed within the supermodel
– each translation from the supermodel SM to a target model
M is also a translation from any other model to M:
• given n models, we need n translations, not n2
P. Atzeni
Varna - October 2, 2007
53
Generic translation environment
Supermodel
2. Translation
1. Copy
3. Copy
Source
model
Target
model
Translation:
composition 1,2 & 3
P. Atzeni
Varna - October 2, 2007
54
Translations within the supermodel
• We still have too many models:
– Just within simple ER model versions, we have 4 or 5
constructs, and each has several independent features
which give rise to variants
• for example, relationships can be
– binary or N-ary
– with all possible cardinalities or without many-tomany
– with or without the possibility of specifying optionality
– with or without attributes
–…
– Combining all these, we get hundreds of models!
– The management of a specific translation for each model
would be hopeless
P. Atzeni
Varna - October 2, 2007
55
Translations, the approach
• Elementary translation steps to be combined
• Each translation step handles a supermodel construct (or a
feature thereof) "to be eliminated" or "transformed"
• A translation is the concatenation of elementary translation
steps
P. Atzeni
Varna - October 2, 2007
56
A complex translation, example
(0,N)
(0,N)
•
•
•
•
•
P. Atzeni
Eliminate N-ary relationships
Eliminate attributes from relationships
Eliminate many-to-many relationships
Replace relationships with references
Eliminate generalizations
Varna - October 2, 2007
57
Complex translations
N-ary ER
w/ gen
Binary ER
w/ gen
N-ary ER
w/o gen
Bin ER w/ gen
w/o attr on rel
Binary ER
w/o gen
Bin ER w/o gen
w/o attr on rel
Relational
P. Atzeni
…
Elim. N-ary relationships
Elim. Relationship attr.s
Elim. M:N relationships
Replace relationships with
references
Elim OO generalizations
Elim ER generalizations
Bin ER w/ gen
w/o M:N rel
OO w/ gen
Bin ER w/o gen
w/o M:N rel
OO w/o gen
Varna - October 2, 2007
58
Translations
• Basic translations are written in a variant of Datalog, with OID
invention
– We specify them at the schema level
– The tool "translates them down" to the data level
– Some completion or tuning may be needed
P. Atzeni
Varna - October 2, 2007
59
A Multi-Level Dictionary
• Handles models, schemas and data
• Has both a model specific and a model independent component
P. Atzeni
Varna - October 2, 2007
60
description
Multi-Level Dictionary
model
Model descriptions
(mM)
Supermodel description
(mSM)
schema
Model specific schemas
(M)
Supermodel schemas
(SM)
data
Model specific instances
(i-M)
Supermodel instances
(i-SM)
model
specific
model
generic
P. Atzeni
Varna - October 2, 2007
model
independence
61
mM
M
i-M
mSM
SM
i-SM
The supermodel description
MSM-Property
MSM-Construct
OID
Name
Constr.
Type
11
Name
1
String
12
Name
2
String
OID
Name
IsLex
13
IsKey
2
Boolean
1
AggregationOfLexicals
F
14
IsNullable
2
Boolean
2
ComponentOfAggrOfLex
T
15
Type
2
String
3
Abstract
F
16
Name
3
String
4
AttributeOfAbstract
T
17
Name
4
String
5
BinaryAggregationOfAbstracts
F
18
IsIdentifier
4
Boolean
19
IsNullable
4
Boolean
20
Type
4
String
21
IsFunct1
5
Boolean
22
IsOptional1
5
Boolean
23
Role1
5
String
24
IsFunct2
5
Boolean
25
IsOptional2
5
Boolean
26
Role2
5
String
MSM-Reference
OID
Name
Construct
Target
30
Aggregation
2
1
31
Abstract
4
3
32
Abstract1
5
3
33
Abstract2
5
3
P. Atzeni
Varna - October 2, 2007
62
mM
M
i-M
mSM
SM
i-SM
Model descriptions
MM-Model
MSM-Construct
OID
Name
1
Relational
OID
Name
IsLex
2
Entity-Relationship
1
AggregationOfLexicals
F
3
Object
2
ComponentOfAggrOfLex
T
3
Abstract
F
MM-Construct
OID
Model
MSM-Constr
Name
4
AttributeOfAbstract
T
1
2
3
ER_Entity
5
BinaryAggregationOfAbstracts
F
2
2
4
ER_Attribute
3
2
5
ER_Relationship
4
1
1
Rel_Table
OID
Name
Construct
Type
5
1
2
Rel_Column
…
…
…
…
6
3
3
OO_Class
MSM-Property
MM-Property
…
…
…
MSM-Reference
…
MM-Reference
…
P. Atzeni
…
…
OID
Name
Construct
…
…
…
…
…
…
Varna - October 2, 2007
63
mM
M
i-M
mSM
SM
i-SM
Schemas in a model
EmpNo
Employees
SM-Construct
OID
Name
IsLex
…
…
…
3
ER-Entity
F
4
ER-AttributeOfEntity
T
…
…
…
Name
Departments
Address
ER-AttributeOfEntity
ER-Entity
OID
Schema
Name
301
1
Employees
302
1
Departments
201
3
Clerks
202
3
Offices
ER schemas
P. Atzeni
Name
OID
Schema
Name
isIdent
isNullable
Type
AbstrOID
401
1
EmpNo
T
F
Int
301
402
1
Name
F
F
Text
301
404
1
Name
T
F
Char
302
405
1
Address
F
F
Text
302
501
3
Code
T
F
Int
201
…
…
…
…
…
…
…
Varna - October 2, 2007
64
mM
M
i-M
mSM
SM
i-SM
Schemas in the supermodel
EmpNo
Employees
MSM-Construct
OID
Name
IsLex
…
…
…
3
Abstract
F
4
AttributeOfAbstract
T
…
…
…
Schema
Name
301
1
Employees
302
1
Departments
201
3
Clerks
202
3
Offices
Supermodel schemas
P. Atzeni
Name
Departments
Address
SM-AttributeOfAbstract
SM-Abstract
OID
Name
OID
Schema
Name
isIdent
isNullable
Type
AbstrOID
401
1
EmpNo
T
F
Int
301
402
1
Name
F
F
Text
301
404
1
Name
T
F
Char
302
405
1
Address
F
F
Text
302
501
3
Code
T
F
Int
201
…
…
…
…
…
…
…
Varna - October 2, 2007
65
mM
M
i-M
mSM
SM
i-SM
Instances in the supermodel
i-SM-Abstract
SM-Abstract
OID
OID
AbsOID
Schema
InstOID
Name
3010
301
301
1
1
Employees
3011
302
301
1
1
Departments
…
201
3…
…
Clerks
3010
202
301
3
2
Offices
Supermodel
Supermodelinstances
schemas
P. Atzeni
i-SM-AttributeOfAbstract
SM-AttributeOfAbstract
OID
OID
AttrOfAbsOID
Schema
Name InstOID
isIdent
Value
isNullable
i- AbsOID
Type
AbstrOID
4010
401
1 401 EmpNo
1 T
75432
F
Int 3010 301
4020
402
1 402
Name
1 F
John Doe
F
Text 3010 301
1
Name
T
…
404
4021
405
…
501
…
F
1 402 Address
1 F
3 …
… T
…F
Int
…
…
…
…
Code
…
Varna - October 2, 2007
Bob White
F
Char
302
Text 3011 302
… 201
…
66
description
Multi-Level Repository, generation and use
model
Model descriptions
(mM)
Model specific schemas
schema
(M)
Structure fixed, content
provided by model
designers Model
out ofspecific
mSM instances
Supermodel description
(mSM)
Supermodel schemas
Structure fixed, content
(SM)
provided by tool
designers
Supermodel instances
data
(i-M)
Structure generated
by
(i-SM)
the tool from the content
of mM
model
specific
P. Atzeni
Varna - October 2, 2007
Structure generated by
the tool from the content
model of mSMmodel
independence
generic
Structure generated by
the tool from the content
of mSM
67
Translations
• Basic translations are written in a variant of Datalog, with OID
invention
– We specify them at the schema level
– The tool "translates them down" to the data level
– Some completion or tuning may be needed
P. Atzeni
Varna - October 2, 2007
68
A basic translation
• From (a simple) binary ER model to the relational model
– a table for each entity
– a column (in the table for E) for each attribute of an entity E
– for each M:N relationship
• a table for the relationship
• columns …
– for each 1:N and 1:1 relationship:
• a column for each attribute of the identifier …
P. Atzeni
Varna - October 2, 2007
69
A basic translation application
EmpNo
Employees
Employees
Name
EmpNo
Name
Affiliation
1,1
Affiliation
0,N
Departments
Name
Name
Departments
P. Atzeni
Address
Address
Varna - October 2, 2007
70
A basic translation (in supermodel terms)
• From (a simple) binary ER model to the relational model
antable
aggregation
lexicals for each abstract
– a
for each of
entity
component
of the
aggregation
for each
attribute
abstract
– a column
(in the
table
for E) for each
attribute
of anofentity
E
for each
each M:N
M:N relationship
aggregation of abstracts …
– for
…table for the relationship
• a
• columns …
– for each 1:N and 1:1 relationship:
• a column for each attribute of the identifier …
P. Atzeni
Varna - October 2, 2007
71
"An aggregation of lexicals for each abstract"
SM_AggregationOfLexicals(
OID: #aggregationOID_1(OID),
Name: name)

SM_Abstract (
OID: OID,
Name: name ) ;
P. Atzeni
Varna - October 2, 2007
72
Datalog with OID invention
• Datalog (informally):
– a logic programming language with no function symbols and
predicates that correspond to relations in a database
– we use a non-positional notation
• Datalog with OID invention:
– an extension of Datalog that uses Skolem functions to
generate new identifiers when needed
• Skolem functions:
– injective functions that generate "new" values (value that do
not appear anywhere else); so different Skolem functions
have disjoint ranges
P. Atzeni
Varna - October 2, 2007
73
"An aggregation of lexicals for each abstract"
SM_AggregationOfLexicals(
OID: #aggregationOID_1(OID),
Name: n)

SM_Abstract (
OID: OID,
Name: n ) ;
P. Atzeni
•
•
the value for the attribute Name
is copied (by using variable n)
the value for OID is "invented":
a new value for the function
#aggregationOID_1(OID) for
each different value of OID, so a
different value for each value of
SM_Abstract.OID
Varna - October 2, 2007
74
"An aggregation of lexicals for each abstract"
Employees
EmpNo
Name
Employees
SM-Abstract
OID
Schema
Name
301
1
Employees
302
1
Departments
…
…
…
OID
Schema
401
1
402
1
…
…
SM_AggregationOfLexicals(
OID: #aggregationOID_1(OID),
SM-AttributeOfAbstract
Name: n)
Name
isIdent
isNullable
Type AbstrOID

EmpNo
T
F
Int
301
SM_Abstract
(
Name
F
F
Text
301
OID: OID,
…
…
…
…
…
Name: n ) ;
SM-AggregationOfLexicals
OID
Schema
Name
1001
11
Employees
1002
11
Departments
SM-aggregationOID_1_SK
OID
absOID
1001
301
1002
302
P. Atzeni
Varna - October 2, 2007
75
"A component of the aggregation for each
attribute of abstract"
SM_ComponentOfAggregation… (
OID: #componentOID_1(attOID),
Name: name,
AggrOID: #aggregationOID_1(absOID),
IsNullable: isNullable,
IsKey: isIdent,
Type : type )
←
SM_AttributeOfAbstract(
OID: attOID,
Name: name,
AbstractOID: absOID,
IsIdent: isIdent,
IsNullable: isNullable ,
Type : type ) ;
P. Atzeni
•
•
•
Skolem functions
– are functions
– are injective
– have disjoint ranges
the first function "generates"
a new value
the second "reuses" the value
generated by the first rule
Varna - October 2, 2007
76
SM_ComponentOfAggregation…
A component (of the aggregation for each
OID: #componentOID_1(attOID),
attribute of abstract"
Name: name, EmpNo
Employees
Employees
AggrOID: #aggregationOID_1(absOID),
Name
EmpNo
Name
IsNullable: isNullable,
IsKey: isIdent,
SM-AttributeOfAbstract
SM-Abstract
Type : type )
OID Schema
Name
isIdent
isNullable
Type AbstrOID
OID Schema
Name
←
401
1
EmpNo
T
F
Int
301
301
1
Employees
SM_AttributeOfAbstract(
402
1
Name
F
F
Text
301
302
1
Departments
OID: attOID,
…
…
…
…
…
…
…
…
…
…
Name: name,
SM-ComponentOfAggregationOfLexicals
SM-AggregationOfLexicals
AbstractOID: absOID,
OID Schema
Name
isIdent
isNullable
Type
AggrOID
OIDIsIdent:
Schema
isIdent, Name
1003
11
EmpNo
T
F
Int
1001
1001
11
Employees
IsNullable:
isNullable
,
11
Name
F
F
Text
1001
1002
Type : 11
type ) ; Departments 1004
SM-aggregationOID_1_SK
SM-componentOID_1_SK
OID
absOID
OID
absOID
1001
301
1003
401
1002
302
1004
402
P. Atzeni
Varna - October 2, 2007
77
Generating data-level translations
•
•
•
Same environment
Same language
High level translation specification
Supermodel description
(mSM)
Supermodel schemas
(SM)
Schema
translation
Supermodel instances
(i-SM)
Data
translation
P. Atzeni
Varna - October 2, 2007
78
Translation rules, data level
i-SM_ ComponentOfAggregation… (
OID: #i-componentOID_1 (i-attOID),
SM_ComponentOfAggregation… (
i-AggrOID: #i-aggregationOID_1(i-absOID),
OID: #componentOID_1(attOID),
ComponentOfAggregationOfLexicalsOID:
Name: name,
#componentOID_1(attOID),
AggrOID:
Value: Value )
#aggregationOID_1(absOID),
←
IsNullable: isNullable,
i-SM_AttributeOfAbstract(
IsKey: isIdent,
OID: i-attOID,
Type : type )
i-AbstractOID: i-absOID,
←
AttributeOfAbstractOID: attOID,
SM_AttributeOfAbstract(
Value: Value ),
OID: attOID,
SM_AttributeOfAbstract(
Name: name,
OID: attOID,
AbstractOID: absOID,
AbstractOID: absOID,
IsIdent: isIdent,
Name: attName,
IsNullable: isNullable ,
IsNullable: isNull,
Type: type ) ;
IsID: isIdent,
Type: type )
P. Atzeni
Varna - October 2, 2007
79
Correctness
• Usually modelled in terms of information capacity
equivalence/dominance (Hull 1986, Miller 1993, 1994)
• Mainly negative results in practical settings that are non-trivial
• Probably hopeless to have correctness in general
• We follow an "axiomatic" approach:
– We have to verify the correctness of the basic translations,
and then infer that of complex ones
P. Atzeni
Varna - October 2, 2007
80
Experiments
• A significant set of models
– ER (in many variants and extensions)
– Relational
– OR
– XSD
– UML
• Demo (good luck!)
P. Atzeni
Varna - October 2, 2007
81
OR Tab, gen
ref, FK, nested
XSD
OR noTab,
gen
ref, FK, nested
OR noTab, gen,
FK, nested
OR noTab
ref, FK,
nested
OO ref,
nested
OR Tab,
ref, FK, nested
OR noTab,
FK, nested
OR noTab,
FK, flat
OO ref,
flat
Relational
P. Atzeni
37+ Remove generalizations
36 Unnest sets (with ref)
03 Unnest sets (with FKs)
24 Unnest structures (flatten)
06 Unnest structures (TT & FKs)
01 Unnest structures (TT & ref)
43 FKs for references
29 Tables for typed tables
30 Typed tables for tables
13 References for FK
31 Nest referenced classes
Varna - October 2, 2007
82
OR Tab, gen
ref, FK, nested
XSD
OR noTab,
gen
ref, FK, nested
OR noTab, gen,
FK, nested
OR noTab
ref, FK,
nested
OO ref,
nested
OR Tab,
ref, FK, nested
OR noTab,
FK, nested
36 Unnest sets (with ref)
24 Unnest structures (flatten)
OR noTab,
FK, flat
OO ref,
flat
01 Unnest structures (TT & ref)
43 FKs for references
29 Tables for typed tables
Relational
P. Atzeni
Varna - October 2, 2007
83
Summary
• ModelGen was studied a few years ago
• New interest on it within the "Model management" framework
• New approach
– Translation of schema and data
– Visible (and in part self generated) dictionary
– Visible and modifiable rules
– Skolem functions describe mappings
P. Atzeni
Varna - October 2, 2007
84
Conclusion
• The ten-year goal of the Asilomar report:
– The information utility:
make it easy for everyone to store, organize, access,
and analyze the majority of human information online
• A lot of interesting work has been done but …
• …integration, translation, exchange are still difficult…
• … 2009 is approaching … we are late!
P. Atzeni
Varna - October 2, 2007
85