Sangam: A Transformation Modeling Framework
Kajal T. Claypool (U Mass Lowell) and Elke A. Rundensteiner (WPI)
The Era of Electronic Information
Age of electronic information
Data exists in many different formats
Different data models
Different schemas
Users need to:
Publish data in many formats
Integrate and transform data
Query data and expect results in a common format
Underlying problem
Need to express mappings of data from one format to another
Need to transform data based on the expressed mappings
Schema Translation: State of the Art
Naïve approach [Zhang01, Shanmugasundram99]
Write specific programs to translate data from one format to another
Examples:
Algorithms that translate XML documents into relational data [Zhang01, Shanmugasundram99]
LaTeX2HTML: converts LaTeX into HTML documents
Schema Translation: State of the Art (cont.)
Matching approach [Milo98]
Automatically discover the semantic correspondences between two schemas
Generate translations based on the discovered matches
Modeling approach [Bernstein00, Atzeni96]
Transform local schemas into a common data model
Use a translation language to express mappings between schemas in the middle layer
The Sangam Framework
Goals: flexible, extensible, and reusable
Allow users to:
Explicitly model translations between schemas
Compose translations from an existing library of modeled translation patterns
Choose from a library of translation operators
Generate a translation model based on the schema matching process
For all modeled translations, transform the data based on the translation
Overview of Sangam Framework
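[Figure: Sangam framework overview. Inputs are schema S1 with data D1 and schema S2 with data D2. A Matcher produces matches; the Transformation Framework ToolSet offers transformation patterns that are displayed to the user; the resulting transformation model is run by an Evaluator to produce the transformed schema and transformed data, with user feedback collected through an interface. Legend: system input, pattern interface, user input, system-generated output.]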
Outline
Sangam graphs
Cross algebra operators
Composition techniques
Cross algebra graphs
Execution strategies
Architecture
Conclusions
Sangam Graphs
Sangam provides:
1. Common data model: the Sangam graph model
2. Translation language: algebra-based
[Figure: a relational database (RDB) is imported into a Sangam graph, translated with the Cross Algebra, and exported as XML.]
Requirements for a Common Data Model
Graph-based
Common denominator for most data models
Expressiveness
Represent schemas from different data models
Fundamental constraints
Represent constraints such as quantifier, order, and key constraints
Existing data models are not completely suitable
Relational and OO models cannot represent order cleanly
XML (older specification) cannot represent key constraints
Sangam Graph Model
Satisfies the requirements
Graph-based
Based on SIGs [Miller93]
Can model schemas from different data models
Can represent quantifier, order, key, and foreign key constraints
Graph
Nodes represent entities
E.g., relation, attribute, element
Edges represent relationships between them
E.g., the containment relationship between a relation and its attributes
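To make the graph model concrete, here is a minimal Python sketch of a Sangam-style graph, assuming nodes carry a kind (relation, attribute, element) and containment edges carry a [min-max] quantifier and an order position. The class names, fields, and the Person(id, name) example are hypothetical; foreign keys and the rest of the model's constraints are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Edge:
    """Containment edge with a quantifier [min_occurs-max_occurs] and an order position."""
    parent: str
    child: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "n" (unbounded)
    order: int = 0                 # position among the parent's children; 0 = unordered


@dataclass
class SangamGraph:
    nodes: Dict[str, str] = field(default_factory=dict)       # node name -> kind
    edges: List[Edge] = field(default_factory=list)
    keys: Dict[str, List[str]] = field(default_factory=dict)  # node -> key children

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = kind

    def add_edge(self, edge: Edge) -> None:
        # Edges may only connect nodes that already exist in the graph.
        if edge.parent not in self.nodes or edge.child not in self.nodes:
            raise KeyError(f"unknown endpoint in {edge}")
        self.edges.append(edge)

    def children(self, name: str) -> List[str]:
        """Children of `name`, sorted by their order annotation."""
        own = [e for e in self.edges if e.parent == name]
        return [e.child for e in sorted(own, key=lambda e: e.order)]


# Hypothetical relation Person(id, name) with key id.
g = SangamGraph()
g.add_node("Person", "relation")
g.add_node("id", "attribute")
g.add_node("name", "attribute")
g.add_edge(Edge("Person", "name", order=2))
g.add_edge(Edge("Person", "id", order=1))
g.keys["Person"] = ["id"]
print(g.children("Person"))  # ['id', 'name']
```

Keeping order as an explicit edge annotation is what lets the same structure hold both ordered XML content and unordered relational attributes.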
Example: Sangam Graph
<!ELEMENT item (location, mailbox, name)>
<!ATTLIST item id ID #REQUIRED
featured CDATA #IMPLIED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT mailbox (mail*)>
<!ATTLIST mailbox id CDATA #IMPLIED>
<!ELEMENT mail (from, to, date)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT name (firstName, lastName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
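The element declarations above map fairly directly onto ordered, quantified containment edges. The following Python sketch assumes each content model is a simple comma-separated sequence, maps the occurrence indicators ?, *, and + to the quantifiers [0-1], [0-n], and [1-n] (no indicator gives [1-1]), and treats #PCDATA elements as leaves. The helper dtd_to_edges is hypothetical, not part of Sangam; it ignores attribute lists, choices (|), and nested groups.

```python
import re
from typing import List, Tuple

# Assumed mapping from DTD occurrence indicators to [min-max] quantifiers.
QUANTIFIERS = {"": "[1-1]", "?": "[0-1]", "*": "[0-n]", "+": "[1-n]"}

# Matches declarations such as <!ELEMENT mail (from, to, date)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>")


def dtd_to_edges(dtd: str) -> List[Tuple[str, str, int, str]]:
    """Return (parent, child, order, quantifier) for simple sequence content models."""
    edges = []
    for parent, model in ELEMENT_DECL.findall(dtd):
        if model.strip() == "#PCDATA":
            continue  # text-only element: a leaf node
        for order, particle in enumerate(model.split(","), start=1):
            particle = particle.strip()
            name = particle.rstrip("?*+")
            indicator = particle[len(name):]
            edges.append((parent, name, order, QUANTIFIERS[indicator]))
    return edges


DTD = """
<!ELEMENT item (location, mailbox, name)>
<!ATTLIST item id ID #REQUIRED
               featured CDATA #IMPLIED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT mailbox (mail*)>
<!ATTLIST mailbox id CDATA #IMPLIED>
<!ELEMENT mail (from, to, date)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT name (firstName, lastName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
"""

for edge in dtd_to_edges(DTD):
    print(edge)
```

For this DTD the sketch yields, for example, the edge ('mailbox', 'mail', 1, '[0-n]') from mail*, which is the kind of quantified edge the cross algebra operators manipulate.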
Cross Algebra
Requirements for a transformation language
Node and edge manipulations
Minimal granularity
E.g., a relation has a name and attributes
Allow composition
Unique contribution: an algebra-based translation language
Translates from one Sangam graph to another
Four operators:
Represent a core set of linear graph transformations [GBook00]
Can be composed to formulate more complex operations, such as a join (not our focus)
Cross Algebra Operators
cross, connect, smooth and subdivide
[Figure: examples of the cross, connect, smooth, and subdivide operators, each mapping an input Sangam graph G to an output graph G'. The examples use nodes such as Item, mailbox, mail, featured, Person, and Mail, and edge quantifiers such as [1-2], [0-n], [1-1], [2-n], and [0-2n].]
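The following Python sketch gives one reading of the four operators under stated assumptions: cross copies a node of G into G' as node'; connect copies an edge whose endpoints have already been crossed; smooth collapses a two-edge path into one edge whose quantifier is the product of the two (an assumption consistent with the figure showing [1-2], [0-n], and [0-2n] together); and subdivide splits an edge through a new node, giving the upper edge [1-1] and keeping the original quantifier on the lower edge. All names (Quant, SGraph, prime, and the operator functions) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Quant:
    """Edge quantifier [lo-hi], where hi = hi_coef * n**hi_exp ("n" = unbounded)."""
    lo: int
    hi_coef: int
    hi_exp: int = 0

    def __mul__(self, other: "Quant") -> "Quant":
        # Assumed rule for smooth: the bounds of a two-edge path multiply.
        return Quant(self.lo * other.lo, self.hi_coef * other.hi_coef,
                     self.hi_exp + other.hi_exp)

    def __str__(self) -> str:
        if self.hi_exp == 0:
            hi = str(self.hi_coef)
        else:
            coef = "" if self.hi_coef == 1 else str(self.hi_coef)
            hi = coef + "n" + ("" if self.hi_exp == 1 else f"^{self.hi_exp}")
        return f"[{self.lo}-{hi}]"


ONE = Quant(1, 1)


@dataclass
class SGraph:
    nodes: Set[str] = field(default_factory=set)
    edges: Dict[Tuple[str, str], Quant] = field(default_factory=dict)


def prime(name: str) -> str:
    return name + "'"


def cross(g: SGraph, out: SGraph, node: str) -> None:
    """Copy a node of G into G' as node'."""
    if node not in g.nodes:
        raise KeyError(node)
    out.nodes.add(prime(node))


def connect(g: SGraph, out: SGraph, parent: str, child: str) -> None:
    """Copy an edge of G into G'; both endpoints must already be crossed."""
    if prime(parent) not in out.nodes or prime(child) not in out.nodes:
        raise ValueError("cross both endpoints before connecting them")
    out.edges[(prime(parent), prime(child))] = g.edges[(parent, child)]


def smooth(g: SGraph, out: SGraph, a: str, b: str, c: str) -> None:
    """Replace the path a -> b -> c of G by a single edge a' -> c' in G'."""
    out.nodes.update({prime(a), prime(c)})
    out.edges[(prime(a), prime(c))] = g.edges[(a, b)] * g.edges[(b, c)]


def subdivide(g: SGraph, out: SGraph, parent: str, child: str, new: str) -> None:
    """Split the edge parent -> child of G into parent' -> new' -> child' in G'."""
    q = g.edges[(parent, child)]
    out.nodes.update({prime(parent), prime(new), prime(child)})
    out.edges[(prime(parent), prime(new))] = ONE  # assumed: one new node per parent
    out.edges[(prime(new), prime(child))] = q     # original quantifier moves down


# Hypothetical input graph reusing node names from the figure.
g = SGraph({"Item", "mailbox", "mail", "Person", "Mail"},
           {("Item", "mailbox"): Quant(1, 2),
            ("mailbox", "mail"): Quant(0, 1, 1),
            ("Person", "Mail"): Quant(2, 1, 1)})
out = SGraph()
smooth(g, out, "Item", "mailbox", "mail")
subdivide(g, out, "Person", "Mail", "Mailbox")
print(out.edges[("Item'", "mail'")])  # [0-2n]
```

Multiplying quantifiers in smooth and using [1-1] on the new upper edge in subdivide both preserve the number of child instances reachable from each parent, which is why this sketch treats them as linear transformations.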
Composition of Operators
Context dependency
Output = union of the outputs of all operators
Derivation
Output = output of the root operator
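A minimal Python sketch of the two composition modes, assuming an operator is simply a function from an input graph to an output graph and a graph is reduced to a set of labels. context_dependency applies every operator to the same input and unions the outputs; derivation feeds each operator's output into the next and returns what the root (last) operator produces. The helpers copy_node and mark are hypothetical stand-ins for real cross algebra operators.

```python
from typing import Callable, FrozenSet, List

Graph = FrozenSet[str]               # hypothetical: a graph reduced to a set of labels
Operator = Callable[[Graph], Graph]  # hypothetical: an operator maps G to its output


def context_dependency(ops: List[Operator], g: Graph) -> Graph:
    """All operators read the same input G; the result is the union of their outputs."""
    result: Graph = frozenset()
    for op in ops:
        result = result | op(g)
    return result


def derivation(chain: List[Operator], g: Graph) -> Graph:
    """Each operator consumes the previous one's output; the last one is the root."""
    for op in chain:
        g = op(g)
    return g


def copy_node(name: str) -> Operator:
    """Hypothetical cross-like operator: emits name' if name is in the input."""
    return lambda g: frozenset({name + "'"}) if name in g else frozenset()


def mark(g: Graph) -> Graph:
    """Hypothetical second operator: tags every label it receives."""
    return frozenset(label + "*" for label in g)


g = frozenset({"A", "C", "E"})
print(sorted(context_dependency([copy_node("A"), copy_node("E")], g)))  # ["A'", "E'"]
print(sorted(derivation([copy_node("A"), mark], g)))                    # ["A'*"]
```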
Cross Algebra Graphs
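[Figure: an example cross algebra graph that maps Sangam graph G (nodes A, B, C, D, E; edges e1 to e4) to G' (nodes A', E'; edge e1'). Operators op1 to op6 (labels include CT-1, CT-2, CT-3, DT-1, AC, AE, and CE) are linked by context dependency edges and derivation edges.]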
Evaluating a Cross Algebra Graph
Function EvaluateCAT (input: Operator op, Sangam Graph G; output: Sangam Graph G')
{
  if (!op.hasChildren())
    G' ← op.evaluate(G, G')
    op.markDone()
    out G'  // cached local output
    return G'
  while (op.hasChildren()) {
    operator opC ← op.getNextChild()
    if (e:<op, opC> == derivation)
      G_local ← EvaluateCAT(opC, G, G')
      G' ← op.evaluate(G_local, G')
      op.markDone()
      out G'  // cached local output
      return G'
    elseif (e:<op, opC> == context dependency)
      G'_local ← op.evaluate(G, G')
      G_local ← EvaluateCAT(opC, G, G')
      G' ← G_local ∪ G'_local
      op.markDone()
      out G'_local  // local cached output
      return G'
  }
}
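A runnable Python rendering of EvaluateCAT, as a sketch under assumptions: a graph is a frozen set of labels, each operator's evaluate(G, G') returns an updated G', and child edges are tagged "derivation" or "context". Unlike the pseudocode, which returns from inside the while loop after the first child, this version visits every child before marking the operator done; that is a design choice for the sketch, not necessarily the authors'. CatOp, cross, and tag are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

Graph = FrozenSet[str]                    # hypothetical: a graph as a set of labels
EvalFn = Callable[[Graph, Graph], Graph]  # evaluate(G, G') -> updated G'


@dataclass
class CatOp:
    name: str
    evaluate: EvalFn
    # (edge kind, child operator); kind is "derivation" or "context"
    children: List[Tuple[str, "CatOp"]] = field(default_factory=list)
    done: bool = False
    cached: Optional[Graph] = None  # local output kept after evaluation


def evaluate_cat(op: CatOp, g: Graph, g_out: Graph) -> Graph:
    """Evaluate the cross algebra graph rooted at `op` against input graph `g`."""
    if not op.children:
        g_out = op.evaluate(g, g_out)
        op.cached = g_out
    for kind, child in op.children:
        if kind == "derivation":
            # The child runs first; its output is the root operator's input.
            g_local = evaluate_cat(child, g, g_out)
            g_out = op.evaluate(g_local, g_out)
            op.cached = g_out
        elif kind == "context":
            # Root and child both read G; their outputs are unioned.
            g_out_local = op.evaluate(g, g_out)
            g_local = evaluate_cat(child, g, g_out)
            g_out = g_local | g_out_local
            op.cached = g_out_local
        else:
            raise ValueError(f"unknown edge kind: {kind}")
    op.done = True
    return g_out


def cross(node: str) -> EvalFn:
    """Hypothetical cross operator: add node' to G' if node is in the input."""
    return lambda g, g_out: g_out | {node + "'"} if node in g else g_out


def tag(g: Graph, g_out: Graph) -> Graph:
    """Hypothetical operator that marks every label of its input."""
    return g_out | {label + "*" for label in g}


g = frozenset({"A", "C", "E"})
ctx = CatOp("cross A", cross("A"), [("context", CatOp("cross E", cross("E")))])
der = CatOp("tag", tag, [("derivation", CatOp("cross A", cross("A")))])
print(sorted(evaluate_cat(ctx, g, frozenset())))  # ["A'", "E'"]
print(sorted(evaluate_cat(der, g, frozenset())))  # ["A'*"]
```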
Architecture
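[Figure: system architecture. The SAG-Loader (SAG-Schema Builder and SAG-Data Translator) imports XML DTDs and documents, or a relational schema and data, into SAG G and its extent. The CAG-Builder produces a CAG, which the CAG-Evaluator applies to obtain SAG G' and its extent. The SAG-Generator (SAG-Schema Generator and SAG-Data Exporter) exports the result as XML DTDs and documents, or as a relational schema and data.]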
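The architecture can be read as a four-stage pipeline. The Python sketch below assumes SAG denotes a Sangam graph and CAG a cross algebra graph, and wires the stages as plain functions. SangamPipeline and the toy stage implementations are hypothetical, and the real components (including any user input to the CAG-Builder) are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SangamPipeline:
    """Hypothetical wiring of the four architecture stages as plain functions."""
    load: Callable[[Any], Any]           # SAG-Loader: source schema/data -> SAG G + extent
    build_cag: Callable[[Any], Any]      # CAG-Builder: SAG G -> cross algebra graph
    evaluate: Callable[[Any, Any], Any]  # CAG-Evaluator: (CAG, SAG G) -> SAG G' + extent
    generate: Callable[[Any], Any]       # SAG-Generator: SAG G' -> target schema/data

    def run(self, source: Any) -> Any:
        sag = self.load(source)
        cag = self.build_cag(sag)
        sag_out = self.evaluate(cag, sag)
        return self.generate(sag_out)


# Toy stages: a single relation whose attributes are the input column names.
pipeline = SangamPipeline(
    load=lambda columns: {"Person": list(columns)},
    build_cag=lambda sag: [("cross", name) for name in sag],
    evaluate=lambda cag, sag: {name + "'": sag[name] for _, name in cag},
    generate=lambda sag: {name: tuple(attrs) for name, attrs in sag.items()},
)
print(pipeline.run(["id", "name"]))  # {"Person'": ('id', 'name')}
```

Keeping the loader and generator at the edges means only they depend on a concrete data model (XML or relational); the CAG stages see only Sangam graphs.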
Conclusions
Sangam: a flexible, extensible, and reusable transformation modeling framework
Key contributions:
Concept of Sangam
Cross Algebra
An algebra for modeling linear transformations
Composition techniques to deal with finer granularity
Evaluation techniques
Future work
Modeling the data model layer
Optimization of evaluation strategies
Non-linear transformations