Interoperability for Provenance-aware Databases using PROV and JSON

Xing Niu (Illinois Institute of Technology)
Raghav Kapoor, Boris Glavic (Illinois Institute of Technology)
Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy (Oracle Corporation)
Venkatesh Radhakrishnan (Facebook)

Outline
① Introduction
② Related Work
③ Overview
④ Export and Import
⑤ Experimental Results
⑥ Conclusions and Future Work

Introduction
• The PROV standards
  – A standardized, extensible representation of provenance graphs
  – Exchange of provenance information between systems
• Provenance-aware DBMS
  – Computing the provenance of database operations
  – E.g., Perm [1], GProM [2], DBNotes [3], Orchestra [4], LogicBlox [5]

[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer, 2013.
[2] B. Arab, D. Gawlick, V. Radhakrishnan, H. Guo, and B. Glavic. A generic provenance middleware for database queries, updates, and transactions. In TaPP, 2014.
[3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005.
[4] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3):19, 2013.
[5] S. Huang, T. Green, and B. Loo. Datalog and emerging applications: an interactive tutorial. In SIGMOD, pages 1213–1216, 2011.

Introduction
• Example: extracting demographic information from tweets

Introduction
• Problem: No relational database system supports tracking of database provenance as well as import and export of provenance in PROV
  – Not capable of exporting provenance into standardized formats
    • E.g., GProM: essentially produces wasDerivedFrom edges
      • Between the output tuples of a query Q and its inputs.
      However, not available as PROV graphs
  – No way to track the derivation back to non-database entities

Introduction
• GProM System
  – Computes provenance for database operations
    • Queries, updates, transactions
  – Using SQL language extensions
    • e.g., PROVENANCE OF (SELECT ...)

Introduction
• Example of GProM in action
  – The result of PROVENANCE OF for query Q
  – Each tuple in this result represents one wasDerivedFrom assertion
    • E.g., tuple to1 was derived from tuple t1

Introduction
• Goal: make databases interoperable with other provenance systems
• Approach:
  – Export and import of provenance
    • PROV-JSON
  – Propagation of imported provenance
  – Implemented in GProM using SQL

Related Work
• How to integrate provenance graphs by identifying common elements? [6]
• Address the interoperability problem between databases and other provenance-aware systems through
  – A common model for both types of provenance [7][8][9]
  – Monitoring database access to link database provenance with other provenance systems [10][11]

[6] A. Gehani and D. Tariq. Provenance integration. In TaPP, 2014.
[7] U. Acar, P. Buneman, J. Cheney, J. van den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow provenance. In TaPP, 2010.
[8] Y. Amsterdamer, S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. Putting Lipstick on Pig: Enabling Database-style Workflow Provenance. PVLDB, 5(4):346–357, 2011.
[9] D. Deutch, Y. Moskovitch, and V. Tannen. A provenance framework for data-dependent process analysis. PVLDB, 7(6), 2014.
[10] F. Chirigati and J. Freire. Towards integrating workflow and database provenance. In IPAW, pages 11–23, 2012.
[11] Q. Pham, T. Malik, B. Glavic, and I. Foster. LDV: Light-weight Database Virtualization. In ICDE, pages 1179–1190, 2015.

Overview
• We introduce techniques for
  – Exporting database provenance as PROV documents
  – Importing PROV graphs alongside data
  – Linking outputs of SQL operations to imported provenance for their inputs
• Implementation in GProM offloads generation of PROV documents to the backend database
  – SQL and string concatenation

Export and Import
• Export
  – Added TRANSLATE AS clause
    • e.g., PROVENANCE OF (SELECT ...) TRANSLATE AS …
  – Construct a PROV-JSON document from database provenance
    ① Run several projections over the provenance computation
       – E.g., '"_:wgb\(' || F0.STATE || '|' || F0."AVG(AGE)" || '\)'…
    ② Use aggregation to concatenate all snippets of a certain type
       – E.g., entity nodes, wasGeneratedBy edges, allUsed edges
    ③ Use string concatenation to create the final document

Export and Import
• Example: part of the final PROV document
  – Red dotted lines in DB

Export and Import
• Import
  – Import PROV for an existing relation
  – Provide a language construct IMPORT PROV FOR ...
  – Import available PROV graphs for imported tuples and store them alongside the data
  – Add three columns to each table to store imported provenance
    • Prov_doc: stores a PROV-JSON snippet representing the tuple's provenance
    • Prov_eid: indicates which of the entities in this snippet represents the imported tuple
    • Prov_time: stores a timestamp of when the tuple was imported

Export and Import
• Import: example
  – Relation user with imported provenance
  – Attribute value d is the previous PROV graph without database activities and entities

Export and Import
• Using Imported Provenance During Export
  – Include the imported provenance as bundles in the generated PROV graph
    • Bundles [13] enable nesting of PROV graphs within PROV graphs, treating a nested graph as a new entity.
  – Connect the entities representing input tuples in the imported provenance to the query activity and output tuple entities

[13] P. Missier, K. Belhajjame, and J. Cheney. The W3C PROV family of specifications for modelling provenance metadata. In EDBT, pages 773–776, 2013.
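
A minimal Python sketch of the bundle idea above, assuming the usual PROV-JSON layout of top-level "entity", "activity", "used", "wasGeneratedBy", "wasDerivedFrom", and "bundle" maps. It only illustrates how the three import columns could feed an exported document; it is not GProM's implementation, which builds the document in SQL. The function name, the identifiers (db:imported_1, db:query_Q, tw:t1), the prefixes, the db:importTime attribute, and the sample data are all hypothetical, and the typed-literal form used for prov:type should be checked against the PROV-JSON specification.

```python
import json

def export_with_bundle(out_id, prov_doc, prov_eid, prov_time):
    """Nest one input tuple's imported provenance as a bundle and link it to a query."""
    bundle_id = "db:imported_1"   # hypothetical bundle identifier
    query_id = "db:query_Q"       # hypothetical activity standing for query Q
    return {
        "prefix": {"db": "http://example.org/db/", "tw": "http://example.org/tw/"},
        "entity": {
            out_id: {},
            # The bundle is itself an entity; the import timestamp is attached
            # under a made-up attribute name (assumption, not from the slides).
            bundle_id: {
                "prov:type": {"$": "prov:Bundle", "type": "xsd:QName"},
                "db:importTime": prov_time,
            },
        },
        "activity": {query_id: {}},
        # Connect the imported input-tuple entity to the query activity ...
        "used": {"_:u1": {"prov:activity": query_id, "prov:entity": prov_eid}},
        # ... and to the output tuple entity.
        "wasGeneratedBy": {"_:g1": {"prov:entity": out_id, "prov:activity": query_id}},
        "wasDerivedFrom": {"_:d1": {"prov:generatedEntity": out_id,
                                    "prov:usedEntity": prov_eid}},
        # The imported PROV graph is nested unchanged.
        "bundle": {bundle_id: prov_doc},
    }

# Imported provenance for tuple t1 as it might sit in Prov_doc (made-up content).
imported = {
    "entity": {"tw:tweet1": {}, "tw:t1": {}},
    "wasDerivedFrom": {"_:i1": {"prov:generatedEntity": "tw:t1",
                                "prov:usedEntity": "tw:tweet1"}},
}
doc = export_with_bundle("db:to1", imported, "tw:t1", "2015-07-01T12:00:00")
print(json.dumps(doc, indent=2))
```

Here Prov_eid is what lets the exported edges point at the right entity inside the nested graph, so the derivation of to1 can be followed back to the non-database entity tw:tweet1.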

Export and Import
• Example of Bundles:

Export and Import
• Handling Updates
  – If a tuple is modified, that should be reflected when provenance is exported
    • E.g., by running an SQL UPDATE statement
• Example
  – Assume the user has run an update to correct tuple t1's age value (setting age to 70) before running the query

Export and Import
• Challenge
  – How to track the provenance of updates under transactional semantics
• Solution
  – GProM uses the novel concept of reenactment queries
    • The user can request the provenance of a past update, transaction, or set of updates executed within a given time interval
  – Construct the PROV document using provenance for updates computed on-the-fly

Experimental Results
• TPC-H [14] benchmark datasets
  – Scale factor from 0.01 to 10 (10 MB up to 10 GB in size)
• Run on a machine with
  – 2 x AMD Opteron 3.3 GHz processors
  – 128 GB RAM
  – 4 x 1 TB 7.2K RPM disks configured in RAID 5
• Queries
  – Provenance of a three-way join between relations customer, order, and nation
  – With additional selection conditions to control selectivity (and, thus, the size of the exported PROV-JSON document).

[14] TPC. TPC-H Benchmark Specification, 2009.
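
To make the workload and the three export steps concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in backend. It is an illustration under stated assumptions, not the queries GProM generates: the table and column names are a cut-down TPC-H-like schema (the table is called orders here because ORDER is a reserved word), the identifier scheme (db:out…, _:wdf|…) is made up, and the plain join stands in for the real provenance computation. The selection on o_totalprice plays the role of the condition that controls selectivity.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nation(n_nationkey INTEGER, n_name TEXT);
CREATE TABLE customer(c_custkey INTEGER, c_nationkey INTEGER);
CREATE TABLE orders(o_orderkey INTEGER, o_custkey INTEGER, o_totalprice REAL);
INSERT INTO nation VALUES (1, 'FRANCE'), (2, 'PERU');
INSERT INTO customer VALUES (10, 1), (11, 2);
INSERT INTO orders VALUES (100, 10, 50.0), (101, 11, 900.0), (102, 10, 700.0);
""")

EXPORT_SQL = """
WITH prov AS (            -- stand-in for the provenance of the three-way join
  SELECT o.o_orderkey AS ok, c.c_custkey AS ck, n.n_nationkey AS nk
  FROM customer c
  JOIN orders o ON o.o_custkey = c.c_custkey
  JOIN nation n ON n.n_nationkey = c.c_nationkey
  WHERE o.o_totalprice > :min_price
),
edges AS (                -- one row per (output tuple, input tuple) derivation
  SELECT 'db:out' || ok AS gen, 'db:orders' || ok AS used FROM prov
  UNION SELECT 'db:out' || ok, 'db:customer' || ck FROM prov
  UNION SELECT 'db:out' || ok, 'db:nation' || nk FROM prov
),
entity_snips AS (         -- step 1: projections that emit JSON snippets
  SELECT '"' || gen || '":{}' AS s FROM edges
  UNION SELECT '"' || used || '":{}' FROM edges
),
wdf_snips AS (
  SELECT '"_:wdf|' || gen || '|' || used || '":{"prov:generatedEntity":"' || gen
         || '","prov:usedEntity":"' || used || '"}' AS s
  FROM edges
)
-- step 2: aggregate the snippets of each type; step 3: concatenate the document
SELECT '{"prefix":{"db":"http://example.org/db/"},"entity":{'
       || coalesce((SELECT group_concat(s, ',') FROM entity_snips), '')
       || '},"wasDerivedFrom":{'
       || coalesce((SELECT group_concat(s, ',') FROM wdf_snips), '')
       || '}}'
"""

def export_prov_json(connection, min_price):
    """Run the export query and parse the resulting document to check it is valid JSON."""
    (text,) = connection.execute(EXPORT_SQL, {"min_price": min_price}).fetchone()
    return json.loads(text)

doc = export_prov_json(conn, 100)
print(len(doc["entity"]), len(doc["wasDerivedFrom"]))
```

Raising min_price shrinks the join result and with it the exported document, which is the knob the experiments use to vary output size.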

Experimental Results
[Figure: results for the 1 GB and 10 GB datasets]

Conclusions and Future Work
Conclusions
• Integrated import and export of provenance represented as PROV-JSON into/from provenance-aware databases
• Construct PROV graphs on-the-fly using SQL
• Connect database provenance to imported PROV data
Future Work
• Full implementation for updates
• Automatic storage management (e.g., deduplication) for imported provenance
• Automatic cross-referencing

Questions
• My Webpage
  – http://www.cs.iit.edu/~dbgroup/people/xniu.php
• Our Group's Webpage
  – http://cs.iit.edu/~dbgroup/research/index.html
• GProM
  – http://www.cs.iit.edu/~dbgroup/research/gprom.php

Others
• Provenance querying
• Provenance for JSON