Storage Engine for Semantic Web

Assertion: a storage engine for the Semantic Web has requirements similar to those of e-commerce applications.
Draw upon results and lessons from
– R. Agrawal, A. Somani, Y. Xu: Storage and Retrieval of E-Commerce Data. VLDB-2001.

Typical E-Commerce Data Characteristics

An experimental e-marketplace for computer components
Nearly 2 million components
More than 2000 leaf-level categories
Large number of attributes (5000)
Constantly evolving schema
Sparsely populated data (about 50-100 attributes/component)

Alternative Physical Representations

Horizontal – One N-ary relation
Binary – N 2-ary relations
Vertical – One 3-ary relation

Conventional horizontal representation (n-ary relation)

Name         Monitor  Height  Recharge  Output   playback      Smooth scan  Progressive Scan
PAN DVD-L75  7 inch   -       Built-in  Digital  -             -            -
KLH DVD221   -        3.75    -         S-Video  -             -            No
SONY S-7000  -        -       -         -        -             -            -
SONY S-560D  -        -       -         -        Cinema Sound  Yes          -
…            …        …       …         …        …             …            …

DB catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns)
Storage overhead of NULL values
NULLs increase the index size, and they sort high in a DB2 B+ tree index
Hard to load/update
Schema evolution is expensive
Querying is straightforward

Binary Representation (N 2-ary relations)

Monitor
Name         Val
PAN DVD-L75  7 inch

Height
Name         Val
KLH DVD221   3.75

Output
Name         Val
PAN DVD-L75  Digital
KLH DVD221   S-Video

Dense representation
Manageability is hard because of the large number of tables
Schema evolution is expensive

Related work:
Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87]
Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99]
Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]

Vertical representation (One 3-ary relation)

Oid (object identifier)
Key (attribute name)
Val (attribute value)

Oid  Key                 Val
0    'Name'              'PAN DVD-L75'
0    'Monitor'           '7 inch'
0    'Recharge'          'Built-in'
0    'Output'            'Digital'
1    'Name'              'KLH DVD221'
1    'Height'            '3.75'
1    'Output'            'S-Video'
1    'Progressive Scan'  'No'
2    'Name'              'SONY S-7000'
…    …                   …

Objects can have a large number of attributes
Handles sparseness well
Schema evolution is easy

Related work:
Implementation of SchemaSQL [LSS 99]
Edge Approach for storing XML Data [FK 99]

Querying over Vertical Representation is Complex

A simple query on a horizontal scheme

    SELECT MONITOR FROM H WHERE OUTPUT = 'Digital'

becomes quite complex:

    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid

Writing applications becomes much harder. What can we do?

Solution

Provide a horizontal view of the vertical table
A translation layer automatically maps operations on H to operations on V

[Diagram: Horizontal view (H) with columns Attr1, Attr2, …, Attrk, … sits on top of a Query Mapping Layer, which sits on top of the Vertical table (V) with columns Oid, Key, Val]

Transformation Algebra

Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation.
Two key operators:
– v2h()
– h2v()

Sample Algebraic Transforms

v2h() operation: convert from vertical to horizontal

$$\mathrm{v2h}_k(V) = \left[\pi_{Oid}(V)\right] ⟕ \left[\,⟕_{i=1}^{k}\ \pi_{Oid,\,Val}\left(\sigma_{Key=\text{'}A_i\text{'}}(V)\right)\right]$$

h2v() operation: convert from horizontal to vertical

$$\mathrm{h2v}_k(H) = \left[\bigcup_{i=1}^{k} \pi_{Oid,\,\text{'}A_i\text{'},\,A_i}\left(\sigma_{A_i \neq \bot}(H)\right)\right] \cup \left[\bigcup_{i=1}^{k} \pi_{Oid,\,\text{'}A_i\text{'},\,A_i}\left(\sigma_{\bigwedge_{i=1}^{k} A_i = \bot}(H)\right)\right]$$

Here ⟕ denotes a left outer join and ⊥ a NULL value, so objects that lack an attribute keep a NULL in the corresponding column of the horizontal view.

Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98], respectively
Complete transforms are in the VLDB-2001 paper

From the Algebra to SQL

Equivalent SQL transforms for algebraic transforms
– Select, Project
– Joins (self, two verticals, a horizontal and a vertical)
– Cartesian Product
– Union, Intersection, Set difference
– Aggregation

Extend DDL to provide the horizontal view

    CREATE HORIZONTAL VIEW hview
    ON VERTICAL TABLE vtable
    USING COLUMNS (Attr1, Attr2, … Attrk, …)

Alternative Implementation Strategies

VerticalSQL – Uses only SQL-92 level capabilities
VerticalUDF – Exploits User Defined Functions and Table Functions to provide a direct implementation
Binary (hand-coded queries) – 2-ary representation with one relation per attribute (using only SQL-92 transforms)

Data Organization Matters: Clustering by Key significantly outperforms clustering by Oid

[Chart: Join. Execution time (seconds) vs. join selectivity (0.1%, 1%, 5%); series: VerticalSQL_oid, VerticalSQL_key; density = 10%, 1000 cols x 20K rows]

VerticalSQL comparable to Binary and outperforms Horizontal

[Chart: Projection of 10 columns. Execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K); series: HorizontalSQL, VerticalSQL, Binary; density = 10%]

VerticalUDF is the best approach

[Chart: Projection of 10 columns. Execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K); series: VerticalSQL, Binary, VerticalUDF; density = 10%]

Summary

                Horizontal   Vertical (w/ Mapping)   Binary (w/ Mapping)
Manageability   +            +                       -
Flexibility     -            +                       -
Querying        +            +                       +
Performance     -            +                       +

Remarks

Lessons of this study apply directly to building a storage engine for the Semantic Web
Performance of the vertical representation can be further improved by:
– Enhanced table functions
– First-class treatment of table functions
– Native support for v2h and h2v operations
– Partial indices
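
Worked example (illustrative only)

To make the vertical-table query and the v2h idea from the slides concrete, the following is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs them from Python. It is not the VerticalSQL or VerticalUDF implementation from the VLDB-2001 paper. The choice of SQLite and the helper names build_db, monitors_with_digital_output and horizontal_view are assumptions made for this illustration; only the table name vtable, its columns Oid, Key, Val, and the self-join query come from the slides. The horizontal view is emulated with one left outer join per requested attribute, which is one plausible SQL-92 style rendering of v2h.

```python
import sqlite3

# Sample rows from the vertical-representation slide: (Oid, Key, Val).
ROWS = [
    (0, "Name", "PAN DVD-L75"),
    (0, "Monitor", "7 inch"),
    (0, "Recharge", "Built-in"),
    (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"),
    (1, "Height", "3.75"),
    (1, "Output", "S-Video"),
    (1, "Progressive Scan", "No"),
    (2, "Name", "SONY S-7000"),
]


def build_db():
    """Create an in-memory database holding the vertical table (assumed setup)."""
    conn = sqlite3.connect(":memory:")
    # "Key" is quoted because KEY is an SQL keyword.
    conn.execute('CREATE TABLE vtable (Oid INTEGER, "Key" TEXT, Val TEXT)')
    conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", ROWS)
    return conn


def monitors_with_digital_output(conn):
    """The self-join from the slides: Monitor values of objects whose Output is Digital."""
    sql = """
        SELECT v1.Val
        FROM vtable v1, vtable v2
        WHERE v1."Key" = 'Monitor'
          AND v2."Key" = 'Output'
          AND v2.Val = 'Digital'
          AND v1.Oid = v2.Oid
    """
    return [row[0] for row in conn.execute(sql)]


def horizontal_view(conn, attrs):
    """Hypothetical helper: pivot the requested attributes into columns.

    Starts from the set of all Oids and adds one LEFT OUTER JOIN per
    attribute, so a missing attribute shows up as NULL (None in Python).
    """
    select_cols = ["o.Oid"]
    joins = []
    for i, _ in enumerate(attrs):
        alias = f"a{i}"
        select_cols.append(f"{alias}.Val")
        joins.append(
            f"LEFT OUTER JOIN vtable {alias} "
            f'ON {alias}.Oid = o.Oid AND {alias}."Key" = ?'
        )
    sql = (
        f"SELECT {', '.join(select_cols)} "
        "FROM (SELECT DISTINCT Oid FROM vtable) o "
        f"{' '.join(joins)} "
        "ORDER BY o.Oid"
    )
    # Attribute names are bound as parameters, in the same order as the joins.
    return conn.execute(sql, list(attrs)).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(monitors_with_digital_output(db))
    for row in horizontal_view(db, ["Name", "Monitor", "Output"]):
        print(row)
```

Binding attribute names as parameters rather than formatting them into the SQL string keeps the sketch safe for attribute names that contain quotes. A real mapping layer would also have to translate predicates, joins and updates, which this example does not attempt.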