* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Document
Survey
Document related concepts
Data analysis wikipedia , lookup
Information privacy law wikipedia , lookup
Business intelligence wikipedia , lookup
Relational algebra wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open data in the United Kingdom wikipedia , lookup
Data vault modeling wikipedia , lookup
Operational transformation wikipedia , lookup
Clusterpoint wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
Transcript
Bx Tutorial, Database Flavor: Updatable or Invertible Mappings James F. Terwilliger Microsoft Corporation How Does the DB Field Use Mappings? DB DB DB DB DB DB DB DB DB DB DB DB’ DB DB App Model Over Store Data Warehouse, Schema Versioning Federated System DB Exchanged Data Between Applications Model (Metadata) Management Capabilities? S Model? Virtual? M Language? T Model? Virtual? In General, Two Approaches S S M M T T The View Update Problem S Concrete Database M T Application Model, External Schema • Early work abstracted away the exact language of M, focusing on what it means to be an updatable view • As work progressed, focus shifted somewhat to a choice of M – SQL – and deciding when an update policy can be computed The View Update Problem Query Query Update S Relational (Concrete) M T SQL Let’s use the declarative query tool we know and love – SQL – as a way to express views! (What could possibly go wrong!) Relational (Tables only) (Virtual) View Update References • Francois Bançilhon and Nicolas Spyratos. Update semantics of relational views. TODS, 6(4):557–575, December 1981. • Umeshwar Dayal and Philip A. Bernstein. On the correct translation of update operations on relational views. TODS, 7(3):381–416, September 1982. • G. Gottlob, P. Paolini, and R. Zicari. Properties and update semantics of consistent views. TODS, 13(4):486–524, 1988. • Jens Lechtenbörger. The impact of the constant complement approach towards view updating. PODS, 49–55, 2003. View Updates: The Basics View definition V Update statement f u u u(V) D f u(D) (Unique) Transformed update against the physical database Update translations available for some syntactic restrictions on f Constant Complement (Semantics of View Updates) D V V’ D • Updates leave the view complement unchanged • Complement may not be unique (must be chosen to determine update semantics) Great! Where Can I Get It? • Most database vendors do not implement past the SQL92 standard – View must have: • • • • No set operators No distinct, no grouping No joins or multiple FROM items No smoking, talking, or chewing gum – Basically, only simple select/project queries View Update Limitations (Among Many) • Large queries are hard to debug (and read!) • Given a large query, how to report to the user why a query is not updatable? • DB Tables, not DB DB • Syntactic restrictions are very strict • It is assumed that a query language can make a good view expression language Outline • A brief look at updatable views • Transformation-based approaches – PRISM/PRIMA – Guava/Channels • Holistic approaches – Object-relational mappings – Data exchange and model management • Whole schemas on both ends • Interesting and notable mappings • Support for interesting update/bidirectional scenarios Slides adapted with permission from Carlo Curino PRISM/PRIMA (PANTA RHEI) PRISM S Application Model Version N M T Application Model Version N+1 PRISM Query Update S Relational (Virtual) M SMOs (Schema Modification Operators) T Relational (Concrete) Schema Evolution: common practice • Starting point: • a Schema S1, • a database DB1 under S1, and • a set of queries Q1 formulated over S1 Schema Evolution: common practice • Evolution in the real world: • The DBA defines an SQL DDL script modifying S1 into S2 • The DBA defines an SQL DML script migrating data from DB1 to DB2 • Queries in Q1 might fail, the DBA adapts them manually: Q2 = Q1’ + Q2_new (new queries added on S2) Schema Evolution: common practice • Evolution in the real world: • The DBA defines an SQL DDL script modifying S2 into S3 • The DBA defines an SQL DML script migrating data from DB2 to DB3 • Queries in Q2 might fail, the DBA adapts them manually as in Q3 = Q2’ + Q3_new (new queries added on S3) Schema Evolution: common practice • DB Administrator (DBA) nightmares: • Data Migration: Data loss, redundancy, efficiency of the migration, efficiency of the new design • Impact on Queries and applications • What is the real impact of schema evolution? [iceis2008] Schema Evolution: Ideal World • Evolution in an ideal world: • Evolution design is assisted and predictable • Data migration scripts are generated automatically • Legacy Queries (and updates, views, integrity constraints,…) are automatically adapted to fit the new schema Prism: from 30,000 feet above Desiderata PRISM proposed solutions Support Evolution Design Schema Modification Operators (SMO) (a language to express changes to the schema) Increase Predictability of evolution SMO static analysis: foresee impact on schema, data and queries Automate data migration Generate SQL scripts from SMO sequences Automate query support Derive from SMOs logical mapping between schema versions Automatic query rewriting (more recently work integrity constraints propagation and updates rewriting) Schema Modification Operators • SMOs: • Atomic changes • SQL-inspired syntax • Operates on both schema and data • Practical completeness (coverage of available evolution scenarios) Data Migration • SMOs to SQL translation: • Each SMO has a clear semantics that can be represented by a set of SQL statements (DML + DDL) • Optimization issues (single/multi SMO) DECOMPOSE TABLE R INTO S(x,y), T(x,z) SMO to SQL translation RENAME TABLE R TO S; CREATE TABLE T(x varchar,z varchar); INSERT INTO T(SELECT x,z FROM S); ALTER TABLE S DROP COLUMN z; Logical Mapping • SMOs to logical mapping translation: • Language: Disjunctive Embedded Dependencies (DED) • DED is a subset of First Order Logic with expressions in the following form: • Used to capture the relationships between (instances of) consecutive schema versions JOIN TABLE R, S INTO T WHERE cond SMO to DED translation Query Rewriting Semantics • Data Migration: • d2 is equivalent to d1 migrated by M and modified by U2 • d2 = M(d1) ± U2 • Query Answering: • Goal: answer q1 on d2 • q1(M-1(d2)) • Query Rewriting Semantics: • Executing the rewritten query q1’ is equivalent to migrating d2 back to schema S1, by applying the M-1, and executing q1 • q1’(d2) = q1(M-1(d2)) Inverting SMOs • The Invertibility problem (How do we get M-1?): • DEDs are not invertible! • We deal with the inversion at the SMO-level (easier but not free) • Inverting SMOs: • Not every SMO is perfectly invertible • in this case we use SMO quasi-inverse1 (intuitively “the best you can do”-inverse) • Not every SMO has a unique inverse • in this case the DBA must disambiguate these (rare) cases 1 we apply the notion of quasi-inverse by Fagin, Kolaitis, Popa, Tan, in [fkpt2006, fkpt2007b] SMO Invertibility: tricky cases • Quasi-Inverse example: JOIN TABLE R, S INTO T WHERE cond; a valid (and reasonable) quasi-inverse is: DECOMPOSE TABLE T INTO R(x,y), S(x,z); • Multiple Inverses example: COPY TABLE R INTO S; the system provides the following candidates: 1. DROP TABLE S; 2. DROP TABLE R; RENAME TABLE S INTO R; 3. MERGE R,S INTO R; Query Rewriting • The Rewriting Process: • From the Inverse SMOs we derive the inverse DEDs mapping M-1 • Rewrite queries by chase&backchase (using M-1) [popa2000, deutsch2006] • MARS [mars2003] optimized chase&backchase implementation Q1: SELECT * FROM R MARS Query Rewriting Q1’: SELECT * FROM S,T WHERE S.x = T.x References Panta Rhei Website • schemaevolution.org PRISM • Carlo Curino, Hyun Jin Moon, Carlo Zaniolo: Graceful database schema evolution: the PRISM workbench. PVLDB 1(1): 761-772 (2008) • Carlo Curino, Hyun Jin Moon, Alin Deutsch, Carlo Zaniolo: Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2): 117-128 (2010) Slides used with permission by James Terwilliger… hey wait, that’s me! GUAVA, CHANNELS Channels S Concrete Database M T Application Model Channels S Relational (Concrete) M “Channel” Query Update Schema Δ T Relational (Virtual) Example: Unpivot Employee Employee’ ID Name Age Gender 1 Joe male 2 Sue … … 31 … … Exp ID Attribute Value 1 1 Name Joe 4 1 Gender male … 1 Exp 1 2 Name Sue 2 Age 31 2 Exp 4 Schema: eliminate Name, Age, … columns Schema: add Attribute, Value Columns Instance: for each row, produce one row for each non-null attribute value, with the attribute name placed in the Attribute column and the value in the Value column. … … … 33 The unpivot CT – operationally input schema current input instance output schema with output instance input DB input DB insert into Employee delete from Employee update Employee add column to Employee add domain element to Exp domain query that uses Employee insert into Employee’ delete from Employee’ update Employee’ add column to Employee’ add domain element to Exp domain query that uses Employee’ input DML/DDL/queries CT output DML/DDL/queries unpivot the Employee table where ID is the key for the table (unpivot all remaining attributes) DDL/DML operations accepted by a channel VLDB 2010 35 CTs are composable into a channel (CTs simply pass statements on untouched that they don’t need to change) Horiz. Merge Natural Schema Query, Insert, Update, Delete, Create, Alter, Drop statements Vert. Part. Apply Horiz. Merge Unpivot Unpivot Native Schema Query, Insert, Update, Delete, Create, Alter, Drop Error, Loop36 statements List of CTs that we have defined/implemented • VPartition – vertically partition into two tables with a 1:1 relationship • VMerge – vertically merge two tables with a 1:1 relationship • HPartition – horizontally partition based on the value in a column • HMerge – horizontally merge, and introduce a provenance column • Pivot/Unpivot – move data to schema (pivot) or schema to data (unpivot) (Move from/to generic/triple form to regular form.) • Apply – apply an invertible function to data in each tuple • Adorn – add columns, specify data to be held in the new columns, such as environment data • Audit – add start/end timestamps, and ensure that data is deprecated rather than deleted HMerge (operationally) working on a query: Replace P_Client to Person appropriately P_Client: FName LName Age Bob Smith 19 Person: … … … FName LName P_Staff: HMerge FName LName Cert Ted Jones X … … … T Age 19 Bob Smith P_Client Ted Jones P_Staff … … … Cert X … … πFName, Lname, Age σT=P_Client Person Query Q = P_Client (SELECT * FROM P_Client) HMerge (operationally) working on DML Insert statement P_Client: FName LName Age Bob Smith 19 Person: … … … FName LName P_Staff: HMerge FName LName Cert Ted Jones X Gail Brown X … … … I (P_Staff, {FName, LName, Cert}, {(Gail, Brown, X)}) T Age 19 Cert Bob Smith P_Client Ted Jones P_Staff … Gail … Brown … P_Staff … … X … … … … … X I (Person, {FName, LName, T, Age, Cert}, {Gail, Brown, P_Staff, null, X}) HMerge (operationally) working on DDL: Create table (P_Admin) add column and add domain element statements P_Client: FName LName Age Bob Smith 19 Person: … … … FName LName LName FName P_Staff: HMerge FName LName Cert Ted Jones X … … … P_Admin: FName LName Pay … … … TT Bob Bob Smith Smith P_Client P_Client Ted Ted Jones Jones P_Staff P_Staff …… … … … … Add “P_Admin” to the domain of the column Age Cert Cert Age 19 19 XX … … … … Add “Pay” column Pay Formal definition of a CT • A channel transformation is a 4-tuple (S, I, Q, U) – – – – A schema transformation S An instance transformation I A query transformation Q An update transformation U • S and I define semantics – Never implemented, but define what the CT means • Q and U define operation – Defines the translation of statements against the natural schema to statements against the native schema Query Correctness Instance transformation for the channel transform Query against the virtual database Query transformation for the channel transform d I d q q q = Q(s, q) Q Transformed query against physical database The query transformation is correct with respect to the instance transformation Update Correctness for Schema Schema transformation for the channel transform s Update statement (DML or DDL) S u u u(s) s S u(s) Update transformation for the channel transform u = U(s, u) Transformed update against the physical database The update transformation is correct with respect to the schema transformation Update Correctness for Instance S s Update statement against table “t” u u u(s) qt Query: “SELECT * FROM t” s PutGet u(s) Q qt u = U(s, u) qt = Q(u(s), qt) The update transformation is correct with respect to the instance transformation References Overall Architecture (Guava): • James F. Terwilliger, Lois M. L. Delcambre, Judith Logan: Querying through a user interface. Data Knowl. Eng. 63(3): 774-794 (2007) Channels: • James F. Terwilliger, Lois M. L. Delcambre, David Maier, Jeremy Steinhauer, Scott Britell: Updatable and Evolvable Transforms for Virtual Databases. PVLDB 3(1): 309-319 (2010) OBJECT-RELATIONAL MAPPINGS Object-Relational Mappings: Overview • Applications written in an object-oriented language have object-oriented data tiers • Persistence is a relational database • “Impedance mismatch” – Map object constructs to relational constructs Object-Relational Mappings M S Concrete Database T Application Model (Arrow Direction Could Go Either Way) Object-Relational Mappings S Relational (Concrete) M • Specification • Relational equivalences • Mapping strategies Query Update (Schema Δ) T ObjectOriented (Virtual) An O-R Mapping Is… • … generally an operational specification rather than a declarative query or set of queries • … tailored more to the purpose of mapping inheritance and relationships to relations rather than a general-purpose mapping Mapping Patterns (TPT) Mapped to (TPC) (TPH) Mapping Patterns: TPH Sub-Categories Name (string) Salary (integer) Name (string) Office (integer) Fully disjoint Reuse by column Reuse by domain Name1 (string) Name2 (string) Salary (integer) Office (integer) Name (string) Salary (integer) Office (integer) String1 (string) Integer1 (integer) Clear column provenance Clear name reuse Maximum data density Mapping Patterns: Etc. Horizontal Partitioning Vertical Partitioning Association Join Tables 0..1 Origin = ‘A’ Origin = ‘B’ OR ? * ORM Product Space • • • • • Ruby on Rails Hibernate/NHibernate SQLAlchemy Microsoft Entity Framework Oracle TopLink • Some major tradeoffs: – Expressiveness – Specification style Hibernate Example <hibernate-mapping> <class name="eg.hibernate.mapping.dataobject.Person" table="TB_PERSON" polymorphism="implicit"> <id name="id" column="ID"> Client Class Store Table <generator class="assigned"/> </id> <set name="rights" lazy="false"> <key column="REF_PERSON_ID"/> <one-to-many class="eg.hibernate.mapping.dataobject.Right" /> </set> <joined-subclass name="eg.hibernate.mapping.dataobject.Individual" TPT-Style Mapping table="TB_INDIVIDUAL"> <key column="id"/> <property name="firstName" column="FIRST_NAME" type="java.lang.String" /> <property name="lastName" column="LAST_NAME" type="java.lang.String" /> </joined-subclass> <joined-subclass name="eg.hibernate.mapping.dataobject.Corporation" TPT-Style Mapping table="TB_CORPORATION"> <key column="id"/> <property name="name" column="NAME" type="string" /> <property name="registrationNumber" column="REGISTRATION_NUMBER" type="string" /> </joined-subclass> </class> XML fragments almost correspond to </hibernate-mapping> individual O-to-R transformations Entity Framework (EF): A Brief Overview Client-side (Objects): Mapping specified at schema level Classes Mapping compiled to views Object Queries (LINQ) Object Updates Store side (Relations): Q1 = Q1’ Q2 = Q2’ Q3 = Q3’ … Tables (selectproject only) Query view VQ Update view VU Preserve fidelity of the source data Merge view VM 56 EF Simple Example Client-side (Classes): Person: id name title πid, name Person = πid, name Person1 πid, title Person = πid, title Person2 Store side (Relations): Person1( id integer PRIMARY KEY, name varchar(50), ) Person2( id integer PRIMARY KEY, title varchar(50), details varchar(2000) ) Person = πid, name, title Person1 ⋈ Person2 57 Entity Framework: Major Results • Validation procedure ensures that a collection of mapping fragment roundtrips PutGet – Each client state maps to a valid state – Client state travel to store and back is invariant – Guarantees query and update safety • Mapping compilation procedure expressive enough for common mapping scenarios, and many uncommon ones – All of the mapping schemes previously noted ORM Challenges • Generally not extensible – Limited mapping expressiveness – Cannot add new mapping capabilities • EF mappings must be validated – Not always easy to write correct mappings – Error detection and correction is hard References ORM overview: • Scott Ambler: http://www.agiledata.org/essays/mappingObjects.html Entity Framework formalism: • Sergey Melnik, Atul Adya, Philip A. Bernstein: Compiling mappings to bridge applications and databases. ACM Trans. Database Syst. 33(4): (2008) Entity Framework overview: • Atul Adya, José A. Blakeley, Sergey Melnik, S. Muralidhar: Anatomy of the ADO.NET entity framework. SIGMOD Conference 2007: 877-888 References Hibernate: hibernate.org/ Oracle TopLink: oracle.com/technetwork/middleware/toplink/overview/index.html SQLAlchemy: sqlalchemy.org/ Entity Framework: msdn.microsoft.com/en-us/library/bb399572.aspx Ruby on Rails: http://rubyonrails.org/ Slides adapted with permission from Jorge Perez DATA EXCHANGE Data Exchange S Concrete Instance M T Concrete Instance Data Exchange -1 M S Relational (Concrete) M Query T FO Predicate Calculus (source-to-target tuple generating dependencies) Relational (Concrete) References Another Dagstuhl Session! All About Data Exchange! • http://www.tks.informatik.uni-frankfurt.de/events/deis10/topics Data Exchange Tutorial • Marcelo Arenas, Pablo Barceló, Leonid Libkin, Filip Murlak: Relational and XML Data Exchange Morgan & Claypool Publishers 2010 • Includes a discussion of composition and inversion of mappings Foundations of Schema Mapping Management • Marcelo Arenas, Jorge Pérez, Juan L. Reutter, Cristian Riveros: Foundations of schema mapping management. PODS 2010: 227-238 Details of Maximal Recovery • Marcelo Arenas, Jorge Pérez, Cristian Riveros: The recovery of a schema mapping: Bringing exchanged data back. ACM Trans. Database Syst. 34(4): (2009)