Download sangam-dasfaa03

Sangam: A Transformation Modeling Framework
Kajal T. Claypool (U Mass Lowell) and Elke A. Rundensteiner (WPI)
The Era of Electronic Information
Age of electronic information
Data exists in many different formats
 • Different data models
 • Different schemas
Users need to
 • Publish data in many formats
 • Integrate and transform data
 • Query and expect results in a common format
Underlying problem
 • Need to express the mapping of data from one format to another
 • Need to transform data based on the expressed mappings
Schema Translation: State of the Art
Naïve approach [Zhang01, Shanmugasundram99]
 • Write specific programs to translate data from one format to another
 • Examples:
   • Algorithms that translate XML documents into relational data [Zhang01, Shanmugasundram99]
   • LaTeX2HTML: converts LaTeX into HTML documents
Schema Translation: State of the Art
Matching approach [Milo98]
 • Automatically discover the semantic correspondences between two schemas
 • Generate translations based on the discovered matches
Modeling approach [Bernstein00, Atzeni96]
 • Transform the local schema into a common data model
 • Use a translation language to express mappings between schemas in the middle layer
The Sangam Framework
Goals: flexible, extensible, and re-usable
Allow users to:
 • Explicitly model translations between schemas
 • Compose translations from an existing library of modeled translation patterns
 • Choose from a library of translation operators
 • Generate a translation model based on the schema match process
 • For all modeled translations: transform the data based on the translation
Overview of Sangam Framework
[Figure: overview of the Sangam framework. Labeled components: Schema S1 and Data D1, Schema S2 and Data D2, Matcher, Matches, Transformation Framework ToolSet, Transformation Patterns, Pattern Interface, Transformation Model, Evaluator, Transformed Schema, Transformed Data, a "Displayed to User" label, and User feedback. The legend distinguishes system input, user input, and system-generated output.]
Outline
 • Sangam graphs
 • Cross algebra operators
 • Composition techniques
 • Cross algebra graphs
 • Execution strategies
 • Architecture
 • Conclusions
Sangam Graphs
Sangam
1. Common data model:
 • Sangam graph model
2. Translation language
 • Algebra-based
[Figure labels: Import (RDB), Sangam graph, Cross Algebra for translation, Export (XML).]
Requirements for a Common Data Model
Graph-based
 • Common denominator for most data models
Expressiveness
 • Represent schemas from different data models
Fundamental constraints
 • Represent constraints such as quantifier, order, and key constraints
Existing data models are not completely suitable
 • Relational and OO models cannot represent order in a clean manner
 • XML (older spec) cannot represent key constraints
Sangam Graph Model
Satisfies requirements
 • Graph-based
 • Based on SIGs [Miller93]
 • Can model schemas from different data models
 • Can represent quantifier, order, key, and foreign key constraints
Graph
 • Nodes represent entities
   • E.g., relation, attribute, element
 • Edges represent relationships between them
   • E.g., containment relationship between a relation and an attribute
Example: Sangam Graph
<!ELEMENT item (location, mailbox, name)>
<!ATTLIST item id ID #REQUIRED
featured CDATA #IMPLIED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT mailbox (mail*)>
<!ATTLIST mailbox id CDATA>
<!ELEMENT mail (from, to, date)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT name (firstName, lastName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
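As a rough illustration of how part of the DTD above might be held as a graph with order and quantifier annotations, here is a minimal sketch. The class names (`Quantifier`, `Edge`, `SangamGraph`) and the `add_edge` and `children` methods are hypothetical and are not taken from the slides. Only the element containment of `item`, `mailbox`, and `mail` is encoded; attributes and keys are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Quantifier:
    """Occurrence bounds on an edge; max_occurs=None means unbounded (n)."""
    min_occurs: int = 1
    max_occurs: Optional[int] = 1


@dataclass(frozen=True)
class Edge:
    parent: str
    child: str
    order: int              # position among the parent's children (order constraint)
    quantifier: Quantifier  # quantifier constraint


@dataclass
class SangamGraph:
    """Hypothetical container: nodes are names, edges carry the annotations."""
    nodes: Set[str] = field(default_factory=set)
    edges: List[Edge] = field(default_factory=list)

    def add_edge(self, parent, child, quantifier=Quantifier()):
        self.nodes.update((parent, child))
        # The next free position under this parent records document order.
        order = 1 + sum(1 for e in self.edges if e.parent == parent)
        self.edges.append(Edge(parent, child, order, quantifier))

    def children(self, parent):
        owned = [e for e in self.edges if e.parent == parent]
        return [e.child for e in sorted(owned, key=lambda e: e.order)]


# <!ELEMENT item (location, mailbox, name)>
g = SangamGraph()
for child in ("location", "mailbox", "name"):
    g.add_edge("item", child)
# <!ELEMENT mailbox (mail*)>  ->  quantifier [0-n]
g.add_edge("mailbox", "mail", Quantifier(0, None))
# <!ELEMENT mail (from, to, date)>
for child in ("from", "to", "date"):
    g.add_edge("mail", child)
```

Storing the position on the edge rather than on the node is one possible choice; it keeps the same element reusable under different parents with different orderings.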
Cross Algebra
Requirements for a transformation language
 • Node and edge manipulations
 • Minimal granularity
   • E.g., a relation has a name and attributes
 • Allow composition
Unique contribution: algebra-based translation language
 • Translates from one Sangam graph to another
Four operators:
 • Represent a core set of linear graph transformations [GBook00]
 • Can be composed to formulate more complex operations such as a join (not our focus)
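Cross Algebra Operators
cross, connect, smooth and subdivide
[Figure: four panels labeled Cross, Connect, Smooth, and Subdivide, each mapping an input graph G to an output graph G’. Node labels include Item, mailbox, mail, featured, and Person, with primed counterparts (Item’, mailbox’, mail’, featured’, Person’, Mailbox’, Mail’) in G’. Edges carry order positions (1, 2) and quantifier annotations such as [1-2], [0-n], [1-1], [2-n], and [0-2n].]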
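The four operators can be sketched in a few lines. This is an illustrative reading, not the authors' definitions: the function signatures, the `Edge` and `Graph` classes, and the `primed` helper are all hypothetical. The smooth and subdivide behavior follows the usual graph-theoretic meaning of those terms. Multiplying the quantifier bounds in `smooth`, and giving the new first edge a [1-1] quantifier in `subdivide`, are assumptions suggested by the [0-2n], [1-1], and [2-n] labels in the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for the unbounded "n"


class Graph:
    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add(self, edge):
        self.nodes.update((edge.src, edge.dst))
        self.edges.add(edge)
        return edge


def primed(name):
    """Name of the output-graph counterpart of an input node."""
    return name + "'"


def cross(node, out):
    """Map input node n to a node n' in the output graph."""
    out.nodes.add(primed(node))
    return primed(node)


def connect(edge, out):
    """Map input edge e to an edge e' in the output graph, keeping its quantifier."""
    return out.add(Edge(primed(edge.src), primed(edge.dst),
                        edge.min_occurs, edge.max_occurs))


def smooth(e1, e2, out):
    """Replace the path e1, e2 by one edge (assumed rule: bounds multiply)."""
    if e1.dst != e2.src:
        raise ValueError("smooth needs two edges that share a middle node")
    unbounded = e1.max_occurs is None or e2.max_occurs is None
    max_occurs = None if unbounded else e1.max_occurs * e2.max_occurs
    return out.add(Edge(primed(e1.src), primed(e2.dst),
                        e1.min_occurs * e2.min_occurs, max_occurs))


def subdivide(edge, middle, out):
    """Split one edge in two by introducing a new middle node."""
    first = out.add(Edge(primed(edge.src), primed(middle), 1, 1))
    second = out.add(Edge(primed(middle), primed(edge.dst),
                          edge.min_occurs, edge.max_occurs))
    return first, second


out = Graph()
item_mailbox = Edge("Item", "mailbox", 1, 2)   # [1-2]
mailbox_mail = Edge("mailbox", "mail", 0, 5)   # [0-n] with n = 5 for illustration
smoothed = smooth(item_mailbox, mailbox_mail, out)
```

With the concrete bound n = 5, `smoothed` is the edge from `Item'` to `mail'` with quantifier [0-10], which mirrors the [0-2n] label.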
Composition of Operators
Context dependency
 • Output = union of the outputs of all operators
Derivation
 • Output = output of the root operator
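Cross Algebra Graphs
[Figure: a cross algebra graph that maps an input graph G (nodes A, B, C, D, E; edges e1 to e4) to an output graph G’ (nodes A’ and E’; edge e1’). Operators op1 to op6 are linked by context dependency edges and derivation edges; subgraphs are labeled CT-1, CT-2, CT-3, and DT-1, with intermediate labels AC, CE, and AE.]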
Evaluating a Cross Algebra Graph
Function EvaluateCAT (input: Operator op, Sangam Graph G, output: Sangam Graph G’)
{
  if (!op.hasChildren ()) {
    G’ ← op.evaluate (G, G’)
    op.markDone ()
    out ← G’ // cached local output
    return G’
  }
  while (op.hasChildren ()) {
    operator opC ← op.getNextChild ()
    if (e:<op, opC> == derivation) {
      G_local ← EvaluateCAT (opC, G, G’)
      G’ ← op.evaluate (G_local, G’)
      op.markDone ()
      out ← G’ // cached local output
      return G’
    }
    elseif (e:<op, opC> == context dependency) {
      G_local ← EvaluateCAT (opC, G, G’)
      G’_local ← op.evaluate (G, G’)
      G’ ← G_local ∪ G’_local
      op.markDone ()
      out ← G’_local // local cached output
      return G’
    }
  }
}
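A small runnable sketch can make the two composition modes concrete. It is a simplification under stated assumptions, not the authors' evaluator: graphs are modeled as frozensets of node names and (source, target) pairs, the `Operator` class and the `cross`, `connect`, and `smooth` factory functions are hypothetical, all children are visited before an operator is marked done, and the value cached on the operator is the value it returns.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

CONTEXT = "context dependency"
DERIVATION = "derivation"


@dataclass
class Operator:
    apply: Callable[[FrozenSet], FrozenSet]  # the operator's own transformation
    children: List[Tuple[str, "Operator"]] = field(default_factory=list)
    cached: Optional[FrozenSet] = None       # set once the operator is done


def evaluate(op: Operator, graph: FrozenSet) -> FrozenSet:
    """Recursively evaluate op and its children against the input graph."""
    if op.cached is not None:
        return op.cached
    context_out = frozenset()
    derived_in = frozenset()
    for kind, child in op.children:
        child_out = evaluate(child, graph)
        if kind == DERIVATION:
            derived_in |= child_out    # child output becomes this operator's input
        else:
            context_out |= child_out   # child output is unioned into the result
    has_derivation = any(kind == DERIVATION for kind, _ in op.children)
    own_out = op.apply(derived_in if has_derivation else graph)
    op.cached = own_out | context_out  # mark done and cache
    return op.cached


def cross(node):
    return Operator(lambda g: frozenset({node + "'"}) if node in g else frozenset())


def connect(src, dst, children=()):
    def apply(g):
        return frozenset({(src + "'", dst + "'")}) if (src, dst) in g else frozenset()
    return Operator(apply, list(children))


def smooth(a, b, c, children=()):
    def apply(g):
        return frozenset({(a, c)}) if {(a, b), (b, c)} <= g else frozenset()
    return Operator(apply, list(children))


G = frozenset({"Item", "mailbox", "mail", ("Item", "mailbox"), ("mailbox", "mail")})

# Context dependency: the result is the union of all operator outputs.
ctx_root = connect("Item", "mailbox",
                   [(CONTEXT, cross("Item")), (CONTEXT, cross("mailbox"))])
ctx_result = evaluate(ctx_root, G)

# Derivation: the result is the output of the root operator only.
der_root = smooth("Item'", "mailbox'", "mail'",
                  [(DERIVATION, connect("Item", "mailbox")),
                   (DERIVATION, connect("mailbox", "mail"))])
der_result = evaluate(der_root, G)
```

Here `ctx_result` holds both crossed nodes plus the connected edge, while `der_result` holds only the smoothed edge from `Item'` to `mail'`; the intermediate edges produced by the two `connect` children are consumed rather than emitted.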
Architecture
[Figure: Sangam architecture. Labeled components: SAG-Loader (with SAG-Schema Builder and SAG-Data Translator), CAG-Builder, CAG-Evaluator, and SAG-Generator (with SAG-Schema Generator and SAG-Data Exporter). Inputs and outputs are XML DTDs and documents, and relational schemas and data. Intermediate artifacts shown: SAG G and extent, CAG, SAG G’ and extent.]
Conclusions
Sangam: a flexible, extensible, and re-usable transformation modeling framework
Key contributions:
 • Concept of Sangam
 • Cross Algebra
 • An algebra for modeling linear transformations
 • Composition techniques to deal with finer granularity
 • Evaluation techniques
Future work
 • Modeling the data model layer
 • Optimization of evaluation strategies
 • Non-linear transformations