Sangam: A Transformation Modeling Framework
Kajal T. Claypool (U Mass Lowell) and Elke A. Rundensteiner (WPI)

The Era of Electronic Information
- Age of electronic information
  - Data exists in many different formats
  - Different data models
  - Different schemas
- Users need to
  - Publish data in many formats
  - Integrate and transform data
  - Query and expect results in a common format
- Underlying problem
  - Need to express the mapping of data from one format to another
  - Need to transform data based on the expressed mappings

Schema Translation: State of the Art
- Naïve approach [Zhang01, Shanmugasundram99]
  - Write specific programs to translate data from one format to another
  - Examples:
    - Algorithms that translate XML documents into relational data [Zhang01, Shanmugasundram99]
    - latex2html: converts LaTeX into HTML documents

Schema Translation: State of the Art (continued)
- Matching approach [Milo98]
  - Automatically discover the semantic correspondences between two schemas
  - Generate translations based on the discovered matches
- Modeling approach [Bernstein00, Atzeni96]
  - Transform the local schema into a common data model
  - Translation language to express mappings between schemas in the middle layer

The Sangam Framework
- Goals: flexible, extensible, and re-usable
- Allow users to:
  - Explicitly model translations between schemas
  - Compose translations from an existing library of modeled translation patterns
  - Choose from a library of translation operators
  - Generate a translation model based on a schema match process
- For all modeled translations: transform the data based on the translation

Overview of Sangam Framework
[Figure: overview of the Sangam framework. Inputs: schema S1 with data D1 and schema S2 with data D2. Components: Matcher (produces matches), Transformation Framework ToolSet, Transformation Patterns (displayed to the user), Transformation Model, Evaluator. Outputs: transformed schema and transformed data. User feedback flows back through the pattern interface. Legend: system input, user input, system-generated output.]

Outline
- Sangam graphs
- Cross algebra operators
- Composition techniques
- Cross algebra graphs
- Execution strategies
- Architecture
- Conclusions

Sangam Graphs

Sangam
1. Common data model: Sangam graph model
2. Translation language: algebra-based (cross algebra)
[Figure: an RDB schema is imported as a Sangam graph, translated with the cross algebra, and exported as XML.]

Requirements for a Common Data Model
- Graph-based: common denominator for most data models
- Expressiveness: represent schemas from different data models
- Fundamental constraints: represent constraints such as quantifier, order and key constraints
- Existing data models are not completely suitable
  - Relational and OO models cannot represent order in a clean manner
  - XML (older spec) cannot represent key constraints

Sangam Graph Model
- Satisfies the requirements
  - Graph-based, built on SIGs [Miller93]
  - Can model schemas from different data models
  - Can represent quantifier, order, key and foreign key constraints
- Graph
  - Nodes represent entities, e.g. relation, attribute, element
  - Edges represent relationships between them, e.g. the containment relationship between a relation and an attribute

Example: Sangam Graph
  <!ELEMENT item (location, mailbox, name)>
  <!ATTLIST item id ID #REQUIRED
                 featured CDATA #IMPLIED>
  <!ELEMENT location (#PCDATA)>
  <!ELEMENT mailbox (mail*)>
  <!ATTLIST mailbox id CDATA>
  <!ELEMENT mail (from, to, date)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT name (firstName, lastName)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>

Cross Algebra
- Requirements for a transformation language
  - Node and edge manipulations
  - Minimal granularity, e.g. a relation has a name and attributes
  - Allow composition
- Unique contribution: algebra-based translation language
  - Translates from one Sangam graph to another
  - Four operators
    - Represent a core set of linear graph transformations [GBook00]
    - Can be composed to formulate more complex operations such as a join (not our focus)

Cross Algebra Operators
- cross, connect, smooth and subdivide
[Figure: one panel per operator (cross, connect, smooth, subdivide), each mapping a fragment of the input graph G to the output graph G'. The panels use Item, mailbox, mail, featured and Person nodes, with edge quantifiers such as [1-2], [0-n], [1-1], [2-n] and [0-2n].]
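
To make the graph model concrete, the element structure of the item DTD from "Example: Sangam Graph" could be encoded roughly as follows. This is a minimal sketch, not the authors' implementation: the names `Edge`, `SangamGraph`, `add_edge`, `children` and `build_item_graph` are hypothetical, attributes such as `id` and `featured` are left out for brevity, and storing the order and quantifier constraints as fields on each containment edge is an assumption about one reasonable representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Edge:
    """Containment edge annotated with order and quantifier constraints."""
    parent: str
    child: str
    order: int                 # position of the child among its siblings
    min_occurs: int            # lower bound of the quantifier
    max_occurs: Optional[int]  # upper bound; None stands for unbounded ("n")


@dataclass
class SangamGraph:
    """Toy graph: nodes are entity names, edges are containment relationships."""
    nodes: Set[str] = field(default_factory=set)
    edges: List[Edge] = field(default_factory=list)

    def add_edge(self, parent, child, min_occurs=1, max_occurs=1):
        self.nodes.update((parent, child))
        # Order is assigned by insertion position under the same parent.
        order = 1 + sum(1 for e in self.edges if e.parent == parent)
        edge = Edge(parent, child, order, min_occurs, max_occurs)
        self.edges.append(edge)
        return edge

    def children(self, parent):
        """Children of a node, in the order given by the order constraint."""
        out = [e for e in self.edges if e.parent == parent]
        return [e.child for e in sorted(out, key=lambda e: e.order)]


def build_item_graph():
    """Encode the element declarations of the item DTD (attributes omitted)."""
    g = SangamGraph()
    for child in ("location", "mailbox", "name"):
        g.add_edge("item", child)                         # quantifier [1-1]
    g.add_edge("mailbox", "mail", min_occurs=0, max_occurs=None)  # mail* = [0-n]
    for child in ("from", "to", "date"):
        g.add_edge("mail", child)
    for child in ("firstName", "lastName"):
        g.add_edge("name", child)
    return g


item_graph = build_item_graph()
```

Calling `item_graph.children("item")` returns the children in declaration order, which is the kind of order information the slides note that relational and OO models cannot express cleanly.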

Composition of Operators
- Context dependency: output = union of the outputs of all operators
- Derivation: output = output of the root operator

Cross Algebra Graphs
[Figure: example cross algebra graph. Operators op1 to op6 map nodes A to E and edges e1 to e4 of the input graph G to nodes and edges of the output graph G' (for example A', E', e1'). Operators are linked by context dependency edges and derivation edges.]

Evaluating a Cross Algebra Graph
  Function EvaluateCAT (input: Operator op, Sangam Graph G,
                        output: Sangam Graph G') {
    if (!op.hasChildren()) {
      G' <- op.evaluate (G, G')
      op.markDone ()
      out G'                      // cached local output
      return G'
    }
    while (op.hasChildren()) {
      operator opC <- op.getNextChild ()
      if (e:<op, opC> == derivation) {
        G_local <- EvaluateCAT (opC, G, G')
        G' <- op.evaluate (G_local, G')
        op.markDone ()
        out G'                    // cached local output
        return G'
      }
      elseif (e:<op, opC> == context dependency) {
        G_local <- EvaluateCAT (opC, G, G')
        G'_local <- op.evaluate (G, G')
        G' <- G_local U G'_local
        op.markDone ()
        out G'_local              // local cached output
        return G'
      }
    }
  }

Architecture
[Figure: system architecture. SAG-Loader (SAG-Schema Builder and SAG-Data Translator) reads XML DTDs and documents or relational schemas and data, and produces a SAG G with its extent. CAG-Builder builds the CAG, and CAG-Evaluator applies it to produce a SAG G' with its extent. SAG-Generator (SAG-Schema Generator and SAG-Data Exporter) writes XML DTDs and documents or relational schemas and data.]

Conclusions
- Sangam: flexible, extensible, and re-usable transformation modeling framework
- Key contributions:
  - Concept of Sangam
  - Cross algebra: an algebra for modeling linear transformations
  - Composition techniques to deal with finer granularity
  - Evaluation techniques
- Future work
  - Modeling the data model layer
  - Optimization of evaluation strategies
  - Non-linear transformations
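
Supplementary sketch: the EvaluateCAT procedure from "Evaluating a Cross Algebra Graph" can be approximated in Python to show how the two edge kinds behave. Everything here is a hedged illustration under stated assumptions. A graph is reduced to a frozen set of labels, the `Operator` class and the `rename` helper are hypothetical, each operator caches its returned graph, and all children are visited (the outputs of derivation children are unioned to form the operator's input), whereas the slide pseudocode returns after the first child.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

# Toy stand-in for a Sangam graph: a set of node and edge labels (assumption).
Graph = FrozenSet[str]

DERIVATION = "derivation"
CONTEXT = "context dependency"


@dataclass
class Operator:
    """Hypothetical operator node of a cross algebra graph."""
    name: str
    apply: Callable[[Graph], Graph]  # the operator's local transformation
    children: List[Tuple[str, "Operator"]] = field(default_factory=list)
    cached: Optional[Graph] = None   # set once the operator is done


def evaluate_cat(op: Operator, g: Graph) -> Graph:
    """Evaluate op on input graph g, following the two composition rules."""
    if op.cached is not None:        # plays the role of markDone()
        return op.cached
    derived: Graph = frozenset()
    context: Graph = frozenset()
    has_derivation = False
    for kind, child in op.children:
        child_out = evaluate_cat(child, g)
        if kind == DERIVATION:
            # Derivation: the child's output becomes this operator's input.
            derived |= child_out
            has_derivation = True
        else:
            # Context dependency: the child's output is kept in the result.
            context |= child_out
    local = op.apply(derived if has_derivation else g)
    op.cached = local | context
    return op.cached


def rename(src: str, dst: str) -> Callable[[Graph], Graph]:
    """Hypothetical operator body: emit dst if src occurs in the input."""
    return lambda g: frozenset({dst}) if src in g else frozenset()


# Context dependency: connect needs both endpoints to be crossed first.
g = frozenset({"Item", "mailbox", "Item->mailbox"})
cross_item = Operator("cross Item", rename("Item", "Item'"))
cross_mailbox = Operator("cross mailbox", rename("mailbox", "mailbox'"))
connect = Operator(
    "connect Item->mailbox",
    rename("Item->mailbox", "Item'->mailbox'"),
    children=[(CONTEXT, cross_item), (CONTEXT, cross_mailbox)],
)
context_result = evaluate_cat(connect, g)

# Derivation: the root operator consumes the output of its child.
first = Operator("first", rename("A", "B"))
second = Operator("second", rename("B", "C"), children=[(DERIVATION, first)])
derivation_result = evaluate_cat(second, frozenset({"A"}))
```

In the context dependency case the result contains the outputs of all three operators, while in the derivation case only the root operator's output survives, matching the two rules stated under "Composition of Operators".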