Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Schema and Data Translation Paolo Atzeni Università Roma Tre Based on Tutorial at ICDE 2006 Paper in EDBT 2006 (with P. Cappellari and P. Bernstein) Demo in Sigmod 2007 (with P. Cappellari and G. Gianforme) ADBIS --- Varna, October 2, 2007 Outline • • • • Introduction Model management Schema and data translation: the problem A metamodel based approach P. Atzeni Varna - October 2, 2007 2 A ten-year goal for database research • The “Asilomar report” (Bernstein et al. Sigmod Record 1999 www.acm.org/sigmod): – The information utility: make it easy for everyone to store, organize, access, and analyze the majority of human information online P. Atzeni Varna - October 2, 2007 3 A general framework: cooperation • "The capacity of a system to interact (effectively) with other systems, possibly managed by different organizations" • Forms of cooperation – Process-centered cooperation: • the systems offer services one another, by exchanging messages, or by triggering activities, without making remote data explicitly visible – Data-centered cooperation: • the systems offer data one another; data is distributed, heterogeneous and autonomous P. Atzeni Varna - October 2, 2007 4 Databases in the Internet era • Databases before the Internet – An internal infrastructure, a precious resource, but usually hidden, with some controlled cooperation • Internet changes the requirements – More users (not only humans), more diverse cooperating systems (distributed, heterogeneous, autonomous), more types of data • "Future" Internet changes more – New devices (embedded everywhere), even more users (many “per person”), real mobility, need for personalization and adaptation P. Atzeni Varna - October 2, 2007 5 The most studied form of data-centered cooperation: integration • We are interested in data-centered cooperation, often referred to as integration “The unification of related, heterogeneous data from disparate sources, for example, to enable collaboration” (Hammer & Stonebraker 2005) • Some "paradigms" … P. Atzeni Varna - October 2, 2007 6 Multidatabase Global Manager P. Atzeni Mediator Mediator Mediator Local mgr Local mgr Local mgr DB DB DB Varna - October 2, 2007 7 Data Warehousing System DW manager Data Warehouse “Integrator” Mediator Mediator Mediator Local manager Local manager Local manager DB DB DB P. Atzeni Varna - October 2, 2007 8 Intermediate solutions in practice Application Local manager Mediator Local Manager DB DB Integrator Local manager DB P. Atzeni Mediator Mediator Local manager Local manager DB DB Varna - October 2, 2007 9 Peer-based architecture Peer mgr Local mgr Mediator DB Peer mgr Local mgr Local mgr DB Mediator DB Mediator Local mgr DB Mediator Peer mgr Local mgr Mediator Local mgr DB DB Mediator P. Atzeni Varna - October 2, 2007 10 Data is not just in databases • • • • • • • Mail messages Web pages Spreadsheets Textual documents Palmtop devices, mobile phones Multimedia annotations (e.g., in digital photos) XML documents P. Atzeni Varna - October 2, 2007 11 The same data in the same form? • Adaptivity: – Personalization: content adapted to the user • upon system's decision • upon user's request – Customization: structure adapted to the user • according to the user's role • upon user's request – Context dependence • User, Device, Network, Place, Time, Rate P. Atzeni Varna - October 2, 2007 12 Summarizing: a general need • We have data at various places, and data has to be – exchanged – replicated – migrated – integrated – adapted P. Atzeni Varna - October 2, 2007 13 A major difficulty • Heterogeneity – System level – Model level – Structural (different structure for similar data) – Semantic (different meaning for the same structure) • Many efforts, but current techniques are mostly manual and ad hoc P. Atzeni Varna - October 2, 2007 14 A direction for the solutions • Be general! – ad hoc solution could work in-the-small, but they • are repetitive and time consuming • do not scale • are not maintainable • Historical notes: – W. C. McGee: Generalization: Key to Successful Electronic Data Processing. J. ACM 1959 • Indeed, databases are the result of generalization applied to secondary storage management! P. Atzeni Varna - October 2, 2007 15 Generality requires … • … high-level descriptions of problems within the family of interest: – Metadata: • “data about data” • (formal or informal) description of structures and meaning • General solutions leverage on metadata (and then operate on data as a consequence) P. Atzeni Varna - October 2, 2007 16 Outline • • • • Introduction and motivation Model management Schema and data translation: the problem A metamodel based approach P. Atzeni Varna - October 2, 2007 17 A wider perspective • (Generic) Model Management: – A proposal by Bernstein et al (2000 +) – Includes a set of operators on • schemas and • mappings between schemas P. Atzeni Varna - October 2, 2007 18 Terminology: a warning P. Atzeni Model Mgmt people Traditional DB people Meta-metamodel Metamodel Metamodel Model Model Schema Varna - October 2, 2007 19 Schemas and mappings • A simplified approach: – Schema: • a set of elements, related in some way to one another – Mapping: • a set of correspondences (pair of elements) or • its reification, a third schema related to the other two via two sets of correspondences P. Atzeni Varna - October 2, 2007 20 Model mgmt operators, a first set • • • • map = Match (S1, S2) S3 = Merge (S1, S2, map) S2 = Diff (S1, map) and more – – – – – – P. Atzeni map3 = Compose (map1, map2) S2 = Select (S1, pred) Apply (S, f) list = Enumerate (S) S2 = Copy (S1) … Varna - October 2, 2007 21 Match • map = Match (S1, S2) – given • two schemas S1, S2 – returns • a mapping between them • the “classical” initial step in data integration: – find the common elements of two schemas and the correspondences between them P. Atzeni Varna - October 2, 2007 22 Merge • S3 = Merge (S1, S2, map) – given • two schemas and a mapping between them – returns • a third schema (and two mappings) • the “classical” second step in data integration: – given the correspondences, find a way to obtain one schema out of two P. Atzeni Varna - October 2, 2007 23 Diff • S2 = Diff (S1, map) – given • a schema and a mapping from it (to some other schema, not relevant) – returns • a (sub-)schema, with the elements that do not participate in the mapping P. Atzeni Varna - October 2, 2007 24 Example (Bernstein and Rahm, ER 2000) • A database (a “source”), a data warehouse and a mapping between the two • we get a second source, with some similarity to the first one • and we want to update the DW DB1 DW1 DW2 DB2 P. Atzeni Varna - October 2, 2007 25 Example, the "solution" m2 = Match(DB1,DB2) DW2 DB1 m1 DW1 m3 m2 DB2 DB2’ P. Atzeni m5 m4 m3= Compose(m2,m1) DB2’=Diff(DB2,m3) DW2’, m4 user defined m5 = Match(DW1,DW2’) DW2 = Merge(DW1,DW2’,m5) DW2’ Varna - October 2, 2007 26 Magic does not exist • Operators might require human intervention: – Match is the main case • Scripts involving operators might require human intervention as well (or at least benefit from it): – a full implementation of each operator might not always available – a mapping might require manual specification – incomparable alternatives might exist P. Atzeni Varna - October 2, 2007 27 The “data level” • The major operators have also an extended version that operates on data, and not only on schemas • Especially apparent for – Merge P. Atzeni Varna - October 2, 2007 28 We also have heterogeneity • Round trip engineering (Bernstein, CIDR 2003) – A specification (for example ER or UML) and an implementation (for example, relational) – then a change to the implementation: want to revise the specification • We need a translation from the implementation model to the specification one P. Atzeni S1 S2 I1 I2 Varna - October 2, 2007 29 Model management with heterogeneity • The previous operators have to be “model generic” (capable of working on different models) • We need a “translation” operator – <S2, map12> = ModelGen (S1) P. Atzeni Varna - October 2, 2007 30 ModelGen, an additional operator • <S2, map12> = ModelGen (S1) – given • a schema (in a model) – returns • a schema (in a different data model) and a mapping between the two • A “translation” from a model to another • I should call it “SchemaGen” … • We should better write – <S2, map12> = ModelGen (S1,mod2) P. Atzeni Varna - October 2, 2007 31 Round trip engineering S1 m1 S2 m3 S2’ m4 m2 I1 I2 I2’ m2 = Match (I1,I2) m3 = Compose (m1,m2) I2’= Diff(I2,m3) <S2’,m4 > = Modelgen(I2’) … Match, Merge P. Atzeni Varna - October 2, 2007 32 Outline • • • • Introduction Model management Schema and data translation: the problem A metamodel based approach P. Atzeni Varna - October 2, 2007 33 Schema and data translation, a long standing issue • Schema translation: – given schema S1 in model M1 and model M2 – find a schema S2 in M2 that “corresponds” to S1 • Schema and data translation: – given also a database D1 for S1 – find also a database D2 for S2 that “contains the same data” to D1 P. Atzeni Varna - October 2, 2007 34 Schema and data translation, a long standing issue • Translations from a model to another have been studied since the 1970’s • Whenever a new model is defined, techniques and tools to generate translations are studied • However, proposals and solutions are usually model specific P. Atzeni Varna - October 2, 2007 35 Model specific solutions • Given an ER schema, find the suitable relational schema that “implements” it – the original paper (Chen 1976) contains the basics – further discussions by many (e.g. Markowitz and Shoshani 1989) – illustrated in every textbook • Similarly with – any other conceptual model and any other logical one – XML and relational (or object) P. Atzeni Varna - October 2, 2007 36 Another problem in the picture: data exchange • Given a source S1 and a target schema S2 (in different models or even in the same one), find a translation, that is, a function that given a database D1 for S1 produces a database D2 for S2 that “correspond” to D1 • Often emphasized with reference to materialized solutions P. Atzeni Varna - October 2, 2007 37 Integration • Given two or more sources, build an integrated schema or database P. Atzeni Varna - October 2, 2007 38 Schema translation • Given a schema find another one with respect to some specific goal (another model, better quality, …) P. Atzeni Varna - October 2, 2007 39 Data exchange • Given a source and a target schema, find a transformation from the former to the latter P. Atzeni Varna - October 2, 2007 40 Schema translation and data exchange • Can be seen a complementary: – Data translation = schema translation + data exchange • Given a source schema and database • Schema translation produces the target schema • Data exchange generates the target database • In model management terms we could write – Schema translation: • <S2, map12> = ModelGen (S1,mod2) – Data exchange: • i2 = DataGen (S1,i1,S2,map12) P. Atzeni Varna - October 2, 2007 41 Outline • • • • Introduction Model management Schema and data translation: the problem A metamodel based approach P. Atzeni Varna - October 2, 2007 42 The problem • ModelGen – given two data models M1 and M2, and a schema S1 of M1 (the source schema and model), • generate a schema S2 of M2 (the target schema and model), corresponding to S1 • and, for each database D1 over S1, generate an equivalent database D2 over S2 P. Atzeni Varna - October 2, 2007 43 We have been doing this for a while • Initial work more than ten years ago (Atzeni & Torlone, 1996) • Major novelty recently (Atzeni, Cappellari & Bernstein, 2006) – translation of both schemas and data – data-level translations generated automatically, from schema-level ones P. Atzeni Varna - October 2, 2007 44 A simple example • • • An object relational database, to be translated in a relational one Source: the OR-model Target: the relational model System managed ids used as references P. Atzeni Varna - October 2, 2007 45 Example, 2 • • Does the OR model allow for keys? Assume EmpNo and Name are keys P. Atzeni Varna - October 2, 2007 46 Example, 3 • • Does the OR model allow for keys? Assume no keys are specified P. Atzeni Varna - October 2, 2007 47 Many different models (and variations …) N-ary ER w/ gen OR Binary ER w/ gen N-ary ER w/o gen Bin ER w/o gen w/o attr on rel P. Atzeni … … … Bin ER w/ gen w/o attr on rel Binary ER w/o gen Relational XSD Bin ER w/ gen w/o M:N rel OO w/ gen Bin ER w/o gen w/o M:N rel OO w/o gen Varna - October 2, 2007 48 Heterogeneity • We need to handle artifacts and data in various models – Data are defined wrt to schemas – Schemas are defined wrt to models – How models can be defined? Models Schemas Data P. Atzeni Varna - October 2, 2007 49 A metamodel approach • • • The constructs in the various models are rather similar: – can be classified into a few categories (Hull & King 1986): • Lexical: set of printable values (domain) • Abstract (entity, class, …) • Aggregation: a construction based on (subsets of) cartesian products (relationship, table) • Function (attribute, property) • Hierarchies • … We can fix a set of metaconstructs (each with variants): – lexical, abstract, aggregation, function, ... – the set can be extended if needed, but this will not be frequent A model is defined in terms of the metaconstructs it uses P. Atzeni Varna - October 2, 2007 50 The metamodel approach, example • The ER model: – Abstract (called Entity) – Function from Abstract to Lexical (Attribute) – Aggregation of abstracts (Relationship) – … • The OR model: – Abstract (Table with ID) – Function from Abstract to Lexical (value-based Attribute) – Function from Abstract to Abstract (reference Attribute) – Aggregation of lexicals (value-based Table) – Component of Aggregation of Lexicals (Column) – … P. Atzeni Varna - October 2, 2007 51 The supermodel • A model that includes all the meta-constructs (in their most general forms) – Each model is subsumed by the supermodel (modulo construct renaming) – Each schema for any model is also a schema for the supermodel (modulo construct renaming) P. Atzeni Varna - October 2, 2007 52 The metamodel approach, translations • The constructs in the various models are rather similar: – can be classified into a few categories (“metaconstructs'') – translations can be defined on metaconstructs, • and there are “standard”, accepted ways to deal with translations of metaconstructs • they can be performed within the supermodel – each translation from the supermodel SM to a target model M is also a translation from any other model to M: • given n models, we need n translations, not n2 P. Atzeni Varna - October 2, 2007 53 Generic translation environment Supermodel 2. Translation 1. Copy 3. Copy Source model Target model Translation: composition 1,2 & 3 P. Atzeni Varna - October 2, 2007 54 Translations within the supermodel • We still have too many models: – Just within simple ER model versions, we have 4 or 5 constructs, and each has several independent features which give rise to variants • for example, relationships can be – binary or N-ary – with all possible cardinalities or without many-tomany – with or without the possibility of specifying optionality – with or without attributes –… – Combining all these, we get hundreds of models! – The management of a specific translation for each model would be hopeless P. Atzeni Varna - October 2, 2007 55 Translations, the approach • Elementary translation steps to be combined • Each translation step handles a supermodel construct (or a feature thereof) "to be eliminated" or "transformed" • A translation is the concatenation of elementary translation steps P. Atzeni Varna - October 2, 2007 56 A complex translation, example (0,N) (0,N) • • • • • P. Atzeni Eliminate N-ary relationships Eliminate attributes from relationships Eliminate many-to-many relationships Replace relationships with references Eliminate generalizations Varna - October 2, 2007 57 Complex translations N-ary ER w/ gen Binary ER w/ gen N-ary ER w/o gen Bin ER w/ gen w/o attr on rel Binary ER w/o gen Bin ER w/o gen w/o attr on rel Relational P. Atzeni … Elim. N-ary relationships Elim. Relationship attr.s Elim. M:N relationships Replace relationships with references Elim OO generalizations Elim ER generalizations Bin ER w/ gen w/o M:N rel OO w/ gen Bin ER w/o gen w/o M:N rel OO w/o gen Varna - October 2, 2007 58 Translations • Basic translations are written in a variant of Datalog, with OID invention – We specify them at the schema level – The tool "translates them down" to the data level – Some completion or tuning may be needed P. Atzeni Varna - October 2, 2007 59 A Multi-Level Dictionary • Handles models, schemas and data • Has both a model specific and a model independent component P. Atzeni Varna - October 2, 2007 60 description Multi-Level Dictionary model Model descriptions (mM) Supermodel description (mSM) schema Model specific schemas (M) Supermodel schemas (SM) data Model specific instances (i-M) Supermodel instances (i-SM) model specific model generic P. Atzeni Varna - October 2, 2007 model independence 61 mM M i-M mSM SM i-SM The supermodel description MSM-Property MSM-Construct OID Name Constr. Type 11 Name 1 String 12 Name 2 String OID Name IsLex 13 IsKey 2 Boolean 1 AggregationOfLexicals F 14 IsNullable 2 Boolean 2 ComponentOfAggrOfLex T 15 Type 2 String 3 Abstract F 16 Name 3 String 4 AttributeOfAbstract T 17 Name 4 String 5 BinaryAggregationOfAbstracts F 18 IsIdentifier 4 Boolean 19 IsNullable 4 Boolean 20 Type 4 String 21 IsFunct1 5 Boolean 22 IsOptional1 5 Boolean 23 Role1 5 String 24 IsFunct2 5 Boolean 25 IsOptional2 5 Boolean 26 Role2 5 String MSM-Reference OID Name Construct Target 30 Aggregation 2 1 31 Abstract 4 3 32 Abstract1 5 3 33 Abstract2 5 3 P. Atzeni Varna - October 2, 2007 62 mM M i-M mSM SM i-SM Model descriptions MM-Model MSM-Construct OID Name 1 Relational OID Name IsLex 2 Entity-Relationship 1 AggregationOfLexicals F 3 Object 2 ComponentOfAggrOfLex T 3 Abstract F MM-Construct OID Model MSM-Constr Name 4 AttributeOfAbstract T 1 2 3 ER_Entity 5 BinaryAggregationOfAbstracts F 2 2 4 ER_Attribute 3 2 5 ER_Relationship 4 1 1 Rel_Table OID Name Construct Type 5 1 2 Rel_Column … … … … 6 3 3 OO_Class MSM-Property MM-Property … … … MSM-Reference … MM-Reference … P. Atzeni … … OID Name Construct … … … … … … Varna - October 2, 2007 63 mM M i-M mSM SM i-SM Schemas in a model EmpNo Employees SM-Construct OID Name IsLex … … … 3 ER-Entity F 4 ER-AttributeOfEntity T … … … Name Departments Address ER-AttributeOfEntity ER-Entity OID Schema Name 301 1 Employees 302 1 Departments 201 3 Clerks 202 3 Offices ER schemas P. Atzeni Name OID Schema Name isIdent isNullable Type AbstrOID 401 1 EmpNo T F Int 301 402 1 Name F F Text 301 404 1 Name T F Char 302 405 1 Address F F Text 302 501 3 Code T F Int 201 … … … … … … … Varna - October 2, 2007 64 mM M i-M mSM SM i-SM Schemas in the supermodel EmpNo Employees MSM-Construct OID Name IsLex … … … 3 Abstract F 4 AttributeOfAbstract T … … … Schema Name 301 1 Employees 302 1 Departments 201 3 Clerks 202 3 Offices Supermodel schemas P. Atzeni Name Departments Address SM-AttributeOfAbstract SM-Abstract OID Name OID Schema Name isIdent isNullable Type AbstrOID 401 1 EmpNo T F Int 301 402 1 Name F F Text 301 404 1 Name T F Char 302 405 1 Address F F Text 302 501 3 Code T F Int 201 … … … … … … … Varna - October 2, 2007 65 mM M i-M mSM SM i-SM Instances in the supermodel i-SM-Abstract SM-Abstract OID OID AbsOID Schema InstOID Name 3010 301 301 1 1 Employees 3011 302 301 1 1 Departments … 201 3… … Clerks 3010 202 301 3 2 Offices Supermodel Supermodelinstances schemas P. Atzeni i-SM-AttributeOfAbstract SM-AttributeOfAbstract OID OID AttrOfAbsOID Schema Name InstOID isIdent Value isNullable i- AbsOID Type AbstrOID 4010 401 1 401 EmpNo 1 T 75432 F Int 3010 301 4020 402 1 402 Name 1 F John Doe F Text 3010 301 1 Name T … 404 4021 405 … 501 … F 1 402 Address 1 F 3 … … T …F Int … … … … Code … Varna - October 2, 2007 Bob White F Char 302 Text 3011 302 … 201 … 66 description Multi-Level Repository, generation and use model Model descriptions (mM) Model specific schemas schema (M) Structure fixed, content provided by model designers Model out ofspecific mSM instances Supermodel description (mSM) Supermodel schemas Structure fixed, content (SM) provided by tool designers Supermodel instances data (i-M) Structure generated by (i-SM) the tool from the content of mM model specific P. Atzeni Varna - October 2, 2007 Structure generated by the tool from the content model of mSMmodel independence generic Structure generated by the tool from the content of mSM 67 Translations • Basic translations are written in a variant of Datalog, with OID invention – We specify them at the schema level – The tool "translates them down" to the data level – Some completion or tuning may be needed P. Atzeni Varna - October 2, 2007 68 A basic translation • From (a simple) binary ER model to the relational model – a table for each entity – a column (in the table for E) for each attribute of an entity E – for each M:N relationship • a table for the relationship • columns … – for each 1:N and 1:1 relationship: • a column for each attribute of the identifier … P. Atzeni Varna - October 2, 2007 69 A basic translation application EmpNo Employees Employees Name EmpNo Name Affiliation 1,1 Affiliation 0,N Departments Name Name Departments P. Atzeni Address Address Varna - October 2, 2007 70 A basic translation (in supermodel terms) • From (a simple) binary ER model to the relational model antable aggregation lexicals for each abstract – a for each of entity component of the aggregation for each attribute abstract – a column (in the table for E) for each attribute of anofentity E for each each M:N M:N relationship aggregation of abstracts … – for …table for the relationship • a • columns … – for each 1:N and 1:1 relationship: • a column for each attribute of the identifier … P. Atzeni Varna - October 2, 2007 71 "An aggregation of lexicals for each abstract" SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: name) SM_Abstract ( OID: OID, Name: name ) ; P. Atzeni Varna - October 2, 2007 72 Datalog with OID invention • Datalog (informally): – a logic programming language with no function symbols and predicates that correspond to relations in a database – we use a non-positional notation • Datalog with OID invention: – an extension of Datalog that uses Skolem functions to generate new identifiers when needed • Skolem functions: – injective functions that generate "new" values (value that do not appear anywhere else); so different Skolem functions have disjoint ranges P. Atzeni Varna - October 2, 2007 73 "An aggregation of lexicals for each abstract" SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: n) SM_Abstract ( OID: OID, Name: n ) ; P. Atzeni • • the value for the attribute Name is copied (by using variable n) the value for OID is "invented": a new value for the function #aggregationOID_1(OID) for each different value of OID, so a different value for each value of SM_Abstract.OID Varna - October 2, 2007 74 "An aggregation of lexicals for each abstract" Employees EmpNo Name Employees SM-Abstract OID Schema Name 301 1 Employees 302 1 Departments … … … OID Schema 401 1 402 1 … … SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), SM-AttributeOfAbstract Name: n) Name isIdent isNullable Type AbstrOID EmpNo T F Int 301 SM_Abstract ( Name F F Text 301 OID: OID, … … … … … Name: n ) ; SM-AggregationOfLexicals OID Schema Name 1001 11 Employees 1002 11 Departments SM-aggregationOID_1_SK OID absOID 1001 301 1002 302 P. Atzeni Varna - October 2, 2007 75 "A component of the aggregation for each attribute of abstract" SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type : type ) ; P. Atzeni • • • Skolem functions – are functions – are injective – have disjoint ranges the first function "generates" a new value the second "reuses" the value generated by the first rule Varna - October 2, 2007 76 SM_ComponentOfAggregation… A component (of the aggregation for each OID: #componentOID_1(attOID), attribute of abstract" Name: name, EmpNo Employees Employees AggrOID: #aggregationOID_1(absOID), Name EmpNo Name IsNullable: isNullable, IsKey: isIdent, SM-AttributeOfAbstract SM-Abstract Type : type ) OID Schema Name isIdent isNullable Type AbstrOID OID Schema Name ← 401 1 EmpNo T F Int 301 301 1 Employees SM_AttributeOfAbstract( 402 1 Name F F Text 301 302 1 Departments OID: attOID, … … … … … … … … … … Name: name, SM-ComponentOfAggregationOfLexicals SM-AggregationOfLexicals AbstractOID: absOID, OID Schema Name isIdent isNullable Type AggrOID OIDIsIdent: Schema isIdent, Name 1003 11 EmpNo T F Int 1001 1001 11 Employees IsNullable: isNullable , 11 Name F F Text 1001 1002 Type : 11 type ) ; Departments 1004 SM-aggregationOID_1_SK SM-componentOID_1_SK OID absOID OID absOID 1001 301 1003 401 1002 302 1004 402 P. Atzeni Varna - October 2, 2007 77 Generating data-level translations • • • Same environment Same language High level translation specification Supermodel description (mSM) Supermodel schemas (SM) Schema translation Supermodel instances (i-SM) Data translation P. Atzeni Varna - October 2, 2007 78 Translation rules, data level i-SM_ ComponentOfAggregation… ( OID: #i-componentOID_1 (i-attOID), SM_ComponentOfAggregation… ( i-AggrOID: #i-aggregationOID_1(i-absOID), OID: #componentOID_1(attOID), ComponentOfAggregationOfLexicalsOID: Name: name, #componentOID_1(attOID), AggrOID: Value: Value ) #aggregationOID_1(absOID), ← IsNullable: isNullable, i-SM_AttributeOfAbstract( IsKey: isIdent, OID: i-attOID, Type : type ) i-AbstractOID: i-absOID, ← AttributeOfAbstractOID: attOID, SM_AttributeOfAbstract( Value: Value ), OID: attOID, SM_AttributeOfAbstract( Name: name, OID: attOID, AbstractOID: absOID, AbstractOID: absOID, IsIdent: isIdent, Name: attName, IsNullable: isNullable , IsNullable: isNull, Type: type ) ; IsID: isIdent, Type: type ) P. Atzeni Varna - October 2, 2007 79 Correctness • Usually modelled in terms of information capacity equivalence/dominance (Hull 1986, Miller 1993, 1994) • Mainly negative results in practical settings that are non-trivial • Probably hopeless to have correctness in general • We follow an "axiomatic" approach: – We have to verify the correctness of the basic translations, and then infer that of complex ones P. Atzeni Varna - October 2, 2007 80 Experiments • A significant set of models – ER (in many variants and extensions) – Relational – OR – XSD – UML • Demo (good luck!) P. Atzeni Varna - October 2, 2007 81 OR Tab, gen ref, FK, nested XSD OR noTab, gen ref, FK, nested OR noTab, gen, FK, nested OR noTab ref, FK, nested OO ref, nested OR Tab, ref, FK, nested OR noTab, FK, nested OR noTab, FK, flat OO ref, flat Relational P. Atzeni 37+ Remove generalizations 36 Unnest sets (with ref) 03 Unnest sets (with FKs) 24 Unnest structures (flatten) 06 Unnest structures (TT & FKs) 01 Unnest structures (TT & ref) 43 FKs for references 29 Tables for typed tables 30 Typed tables for tables 13 References for FK 31 Nest referenced classes Varna - October 2, 2007 82 OR Tab, gen ref, FK, nested XSD OR noTab, gen ref, FK, nested OR noTab, gen, FK, nested OR noTab ref, FK, nested OO ref, nested OR Tab, ref, FK, nested OR noTab, FK, nested 36 Unnest sets (with ref) 24 Unnest structures (flatten) OR noTab, FK, flat OO ref, flat 01 Unnest structures (TT & ref) 43 FKs for references 29 Tables for typed tables Relational P. Atzeni Varna - October 2, 2007 83 Summary • ModelGen was studied a few years ago • New interest on it within the "Model management" framework • New approach – Translation of schema and data – Visible (and in part self generated) dictionary – Visible and modifiable rules – Skolem functions describe mappings P. Atzeni Varna - October 2, 2007 84 Conclusion • The ten-year goal of the Asilomar report: – The information utility: make it easy for everyone to store, organize, access, and analyze the majority of human information online • A lot of interesting work has been done but … • …integration, translation, exchange are still difficult… • … 2009 is approaching … we are late! P. Atzeni Varna - October 2, 2007 85