Storage Engine for Semantic Web
Assertion

A storage engine for the Semantic Web has requirements similar to those of e-commerce applications.

Draw upon results and lessons from
– R. Agrawal, A. Somani, Y. Xu: Storage and Querying of E-Commerce Data. VLDB-2001.
Typical E-Commerce Data Characteristics

An experimental e-marketplace for computer components
Nearly 2 million components
More than 2000 leaf-level categories
Large number of attributes (5000)
Constantly evolving schema
Sparsely populated data (about 50-100 attributes/component)
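
A quick back-of-the-envelope calculation makes the sparseness concrete. The sketch below only uses the figures quoted above (2 million components, 5000 attributes, about 50-100 populated per component); the variable names are illustrative, and the counts are rough estimates rather than measurements.

```python
# Figures taken from the slide; the 50-100 range is approximate.
components = 2_000_000
total_attributes = 5000
populated_range = (50, 100)

# Cells a single wide (horizontal) table would need, populated or not.
horizontal_cells = components * total_attributes

# For each end of the range: (populated attrs, density, number of populated values).
summary = [
    (populated, populated / total_attributes, components * populated)
    for populated in populated_range
]

for populated, density, populated_values in summary:
    print(f"{populated} attrs/component: density {density:.0%}, "
          f"{populated_values:,} populated values vs {horizontal_cells:,} cells")
```

Under these numbers only about 1 to 2 percent of the cells in a single wide table would hold a value.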

Alternative Physical Representations

Horizontal
– One N-ary relation

Binary
– N 2-ary relations

Vertical
– One 3-ary relation
Conventional horizontal representation (n-ary relation)

Name         Monitor  Height  Recharge  Output   Playback      Smooth scan  Progressive Scan
PAN DVD-L75  7 inch   -       Built-in  Digital  -             -            -
KLH DVD221   -        3.75    -         S-Video  -             -            No
SONY S-7000  -        -       -         -        -             -            -
SONY S-560D  -        -       -         -        Cinema Sound  Yes          -
…            …        …       …         …        …             …            …

DB catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns)
Storage overhead of NULL values: NULLs increase the index size, and they sort high in a DB2 B+ tree index
Hard to load/update
Schema evolution is expensive
Querying is straightforward
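
As a minimal sketch of the horizontal layout, the snippet below loads the four sample rows into an in-memory SQLite database (an assumption made for illustration; the slides refer to DB2 and Oracle). The table name H matches the later query slide, while the column spellings SmoothScan and ProgressiveScan are hypothetical. The "-" cells become NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE H (
        Name TEXT PRIMARY KEY,
        Monitor TEXT, Height TEXT, Recharge TEXT, Output TEXT,
        Playback TEXT, SmoothScan TEXT, ProgressiveScan TEXT
    )
""")
rows = [
    ("PAN DVD-L75", "7 inch", None, "Built-in", "Digital", None, None, None),
    ("KLH DVD221", None, "3.75", None, "S-Video", None, None, "No"),
    ("SONY S-7000", None, None, None, None, None, None, None),
    ("SONY S-560D", None, None, None, None, "Cinema Sound", "Yes", None),
]
conn.executemany("INSERT INTO H VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Querying is straightforward: one table, one predicate.
monitors = conn.execute("SELECT Monitor FROM H WHERE Output = 'Digital'").fetchall()

# Count how many attribute cells (everything except Name) are NULL.
attribute_cells = sum(len(r) - 1 for r in rows)
null_cells = sum(v is None for r in rows for v in r[1:])
print(monitors, f"{null_cells} of {attribute_cells} attribute cells are NULL")
```

Even in this tiny sample most attribute cells are empty, which is the storage overhead the list above refers to.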
Binary Representation (N 2-ary relations)

Monitor
Name         Val
PAN DVD-L75  7 inch

Height
Name        Val
KLH DVD221  3.75

Output
Name         Val
PAN DVD-L75  Digital
KLH DVD221   S-Video

Dense representation
Manageability is hard because of large number of tables
Schema evolution expensive

Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87]
Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99]
Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]
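
The binary layout can be sketched as one two-column table per attribute. The snippet below is an illustration in SQLite (an assumption, not the system used in the study), using the three sample tables shown for this representation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
binary = {
    "Monitor": [("PAN DVD-L75", "7 inch")],
    "Height": [("KLH DVD221", "3.75")],
    "Output": [("PAN DVD-L75", "Digital"), ("KLH DVD221", "S-Video")],
}
for attr, pairs in binary.items():
    # Table names come from the fixed dict above, so interpolating them is safe here.
    conn.execute(f'CREATE TABLE "{attr}" (Name TEXT PRIMARY KEY, Val TEXT)')
    conn.executemany(f'INSERT INTO "{attr}" VALUES (?, ?)', pairs)

# "Monitor of every component whose Output is Digital" becomes a join
# between two attribute tables.
result = conn.execute("""
    SELECT m.Val
    FROM "Monitor" AS m JOIN "Output" AS o ON m.Name = o.Name
    WHERE o.Val = 'Digital'
""").fetchall()

# One table per attribute: the catalog grows with the number of attributes.
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
print(result, table_count)
```

No NULLs are stored, but every new attribute means a new table, which is where the manageability and schema-evolution costs come from.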
Vertical representation (One 3-ary relation)
Oid (object identifier), Key (attribute name), Val (attribute value)

Oid  Key                 Val
0    'Name'              'PAN DVD-L75'
0    'Monitor'           '7 inch'
0    'Recharge'          'Built-in'
0    'Output'            'Digital'
1    'Name'              'KLH DVD221'
1    'Height'            '3.75'
1    'Output'            'S-Video'
1    'Progressive Scan'  'No'
2    'Name'              'SONY S-7000'
…    …                   …

Objects can have large number of attributes
Handles sparseness well
Schema evolution is easy
Implementation of SchemaSQL [LSS 99]
Edge Approach for storing XML Data [FK 99]
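
A minimal sketch of the vertical layout, again assuming SQLite purely for illustration. The table name vtable follows the query slide; the attribute 'Region' inserted at the end is made up to show that a new attribute needs no schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT, PRIMARY KEY (Oid, Key))"
)
rows = [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
    (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
    (1, "Output", "S-Video"), (1, "Progressive Scan", "No"),
    (2, "Name", "SONY S-7000"),
]
conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", rows)

# Schema evolution: a new attribute is just a new Key value, no ALTER TABLE.
# 'Region' is a hypothetical attribute, not one from the slides.
conn.execute("INSERT INTO vtable VALUES (2, 'Region', '1')")

# Only populated attributes are stored, so no NULLs are needed.
attrs_per_object = dict(
    conn.execute("SELECT Oid, COUNT(*) FROM vtable GROUP BY Oid").fetchall()
)
null_count = conn.execute(
    "SELECT COUNT(*) FROM vtable WHERE Val IS NULL"
).fetchone()[0]
print(attrs_per_object, null_count)
```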
Querying over Vertical Representation is Complex

Simple query on a Horizontal scheme:
SELECT MONITOR FROM H WHERE OUTPUT = 'Digital'
Becomes quite complex:
SELECT v1.Val
FROM vtable v1, vtable v2
WHERE v1.Key = 'Monitor'
AND v2.Key = 'Output'
AND v2.Val = 'Digital'
AND v1.Oid = v2.Oid
Writing applications becomes much harder. What can we do?
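
The two queries can be run side by side to check that they agree. This sketch assumes SQLite and a cut-down H with only three attribute columns plus an Oid column (added here so both layouts describe the same objects).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE H (Oid INTEGER PRIMARY KEY, Name TEXT, Monitor TEXT, Output TEXT);
    INSERT INTO H VALUES (0, 'PAN DVD-L75', '7 inch', 'Digital');
    INSERT INTO H VALUES (1, 'KLH DVD221', NULL, 'S-Video');

    CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT);
    INSERT INTO vtable VALUES (0, 'Name', 'PAN DVD-L75');
    INSERT INTO vtable VALUES (0, 'Monitor', '7 inch');
    INSERT INTO vtable VALUES (0, 'Output', 'Digital');
    INSERT INTO vtable VALUES (1, 'Name', 'KLH DVD221');
    INSERT INTO vtable VALUES (1, 'Output', 'S-Video');
""")

# The simple query over the horizontal table.
horizontal = conn.execute(
    "SELECT Monitor FROM H WHERE Output = 'Digital'"
).fetchall()

# The equivalent self-join over the vertical table.
vertical = conn.execute("""
    SELECT v1.Val
    FROM vtable v1, vtable v2
    WHERE v1.Key = 'Monitor'
      AND v2.Key = 'Output'
      AND v2.Val = 'Digital'
      AND v1.Oid = v2.Oid
""").fetchall()
print(horizontal, vertical)
```

Note that the two forms agree here because the matching object has a Monitor value. For an object with Output = 'Digital' but no Monitor entry, the horizontal query returns a NULL row while this inner self-join returns nothing.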
Solution

Provide horizontal view of the vertical table
Translation layer automatically maps operations on H to operations on V
[Diagram: Horizontal view (H) with columns Attr1, Attr2, …, Attrk, … -> Query Mapping Layer -> Vertical table (V) with columns Oid, Key, Val]
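
To illustrate what such a mapping layer does, the sketch below hides the vertical self-join behind a small helper. The function select_from_h is hypothetical and handles only one query shape (project one attribute, filter on equality of another); it is not the translation layer from the study.

```python
import sqlite3

def select_from_h(conn, project_attr, filter_attr, filter_val):
    """Answer SELECT <project_attr> FROM H WHERE <filter_attr> = <filter_val>
    using only the vertical table."""
    sql = """
        SELECT v1.Val
        FROM vtable v1, vtable v2
        WHERE v1.Key = ? AND v2.Key = ? AND v2.Val = ? AND v1.Oid = v2.Oid
        ORDER BY v1.Oid
    """
    return [val for (val,) in conn.execute(sql, (project_attr, filter_attr, filter_val))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"), (1, "Output", "S-Video"),
])

# The caller thinks in terms of H; the helper talks to V.
digital_monitors = select_from_h(conn, "Monitor", "Output", "Digital")
svideo_names = select_from_h(conn, "Name", "Output", "S-Video")
print(digital_monitors, svideo_names)
```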
Transformation Algebra

Defined an algebra for transforming
expressions over horizontal views into
expressions over the vertical representation.
Two key operators:
– v2h (Ω)
– h2v (℧)
Sample Algebraic Transforms

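v2h (Ω_k) Operation – Convert from vertical to horizontal
$$\Omega_k(V) = \left[\pi_{Oid}(V)\right] \leftouterjoin \left[\mathop{\leftouterjoin}_{i=1}^{k} \pi_{Oid,Val}\left(\sigma_{Key='A_i'}(V)\right)\right]$$

h2v (℧_k) Operation – Convert from horizontal to vertical
$$\mho_k(H) = \left[\bigcup_{i=1}^{k} \pi_{Oid,'A_i',A_i}\left(\sigma_{A_i \neq '\perp'}(H)\right)\right] \cup \left[\bigcup_{i=1}^{k} \pi_{Oid,'A_i',A_i}\left(\sigma_{\bigwedge_{i=1}^{k} A_i = '\perp'}(H)\right)\right]$$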

Similar operations such as Unfold/Fold and Gather/Scatter
exist in SchemaSQL [LSS 99] and [STA 98] respectively

Complete transforms in VLDB-2001 Paper
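
The two operators can be mimicked on plain Python tuples to see what they compute. This is a sketch under stated assumptions: None plays the role of the null value, each (Oid, Key) pair occurs at most once, and the function names v2h and h2v simply reuse the operator names; the complete definitions are in the paper.

```python
def v2h(V, attrs):
    """Vertical to horizontal: one row per Oid, one column per attribute,
    None where the object has no value (the effect of the outer joins)."""
    oids = sorted({oid for oid, _, _ in V})
    lookup = {(oid, key): val for oid, key, val in V}
    return [(oid,) + tuple(lookup.get((oid, a)) for a in attrs) for oid in oids]

def h2v(H, attrs):
    """Horizontal to vertical: one triple per non-null cell; a row whose
    attributes are all null keeps one null triple per attribute."""
    triples = set()
    for oid, *vals in H:
        if all(v is None for v in vals):
            triples.update((oid, a, None) for a in attrs)
        else:
            triples.update((oid, a, v) for a, v in zip(attrs, vals) if v is not None)
    return triples

attrs = ["Name", "Monitor", "Height", "Recharge", "Output", "Progressive Scan"]
V = [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"),
    (0, "Recharge", "Built-in"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"),
    (1, "Output", "S-Video"), (1, "Progressive Scan", "No"),
    (2, "Name", "SONY S-7000"),
]

H = v2h(V, attrs)
round_trip = h2v(H, attrs)
print(H[2], round_trip == set(V))
```

On this sample, converting to horizontal and back returns the original set of triples.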
From the Algebra to SQL

Equivalent SQL transforms for algebraic transforms
– Select, Project
– Joins (self, two verticals, a horizontal and a vertical)
– Cartesian Product
– Union, Intersection, Set difference
– Aggregation

Extend DDL to provide the Horizontal View
CREATE HORIZONTAL VIEW hview ON VERTICAL TABLE vtable
USING COLUMNS (Attr1, Attr2, … Attrk, …)
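
CREATE HORIZONTAL VIEW is the proposed DDL extension, not standard SQL. As a rough approximation, a standard view with one left outer join per attribute gives a similar result; the sketch below generates such a view and runs it in SQLite. The helper horizontal_view_sql is hypothetical, and a join-based view like this is only practical for a modest number of columns.

```python
import sqlite3

def horizontal_view_sql(view, vtable, columns):
    """Build a plain CREATE VIEW that pivots vtable into one column per attribute."""
    for name in (view, vtable, *columns):
        # Names are interpolated into SQL, so accept simple identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unsupported identifier: {name!r}")
    select = ["o.Oid AS Oid"]
    joins = []
    for i, col in enumerate(columns, start=1):
        select.append(f"a{i}.Val AS {col}")
        joins.append(
            f"LEFT OUTER JOIN {vtable} AS a{i} "
            f"ON a{i}.Oid = o.Oid AND a{i}.Key = '{col}'"
        )
    return (
        f"CREATE VIEW {view} AS SELECT {', '.join(select)} "
        f"FROM (SELECT DISTINCT Oid FROM {vtable}) AS o " + " ".join(joins)
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Output", "S-Video"),
])
conn.execute(horizontal_view_sql("hview", "vtable", ["Name", "Monitor", "Output"]))

# The simple horizontal query now runs unchanged against the view.
digital = conn.execute("SELECT Monitor FROM hview WHERE Output = 'Digital'").fetchall()
all_rows = conn.execute("SELECT Name, Monitor FROM hview ORDER BY Oid").fetchall()
print(digital, all_rows)
```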
Alternative Implementation Strategies

VerticalSQL
– Uses only SQL-92 level capabilities
VerticalUDF
– Exploits User Defined Functions and Table Functions to provide a direct implementation
Binary (hand-coded queries)
– 2-ary representation with one relation per attribute (using only SQL-92 transforms)
Data Organization Matters: Clustering by Key significantly outperforms by Oid
[Chart: execution time (seconds) vs. join selectivity (0.1%, 1%, 5%) for a join; series: VerticalSQL_oid, VerticalSQL_key; density = 10%, 1000 cols x 20K rows]
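
The two data organizations can be sketched in SQLite, where a WITHOUT ROWID table is stored in primary-key order. This is an illustration of the layouts only: the table names v_oid and v_key and the synthetic data are assumptions, SQLite is not the system measured in the chart, and the snippet makes no timing claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite keep rows ordered by the primary key.
conn.execute("CREATE TABLE v_oid (Oid INTEGER, Key TEXT, Val TEXT, "
             "PRIMARY KEY (Oid, Key)) WITHOUT ROWID")   # clustered by Oid
conn.execute("CREATE TABLE v_key (Oid INTEGER, Key TEXT, Val TEXT, "
             "PRIMARY KEY (Key, Oid)) WITHOUT ROWID")   # clustered by Key

# Synthetic data: 200 objects, 100 possible attributes, 10 populated each (10% density).
rows = [(oid, f"A{j}", f"v{oid}_{j}")
        for oid in range(200) for j in range(100) if j % 10 == oid % 10]
for table in ("v_oid", "v_key"):
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# The same self-join over either organization.
query = """
    SELECT v1.Oid, v1.Val
    FROM {t} v1, {t} v2
    WHERE v1.Key = 'A0' AND v2.Key = 'A10' AND v1.Oid = v2.Oid
    ORDER BY v1.Oid
"""
by_oid = conn.execute(query.format(t="v_oid")).fetchall()
by_key = conn.execute(query.format(t="v_key")).fetchall()
print(len(rows), len(by_oid), by_oid == by_key)
```

With key clustering, all rows for one attribute sit next to each other, so a predicate such as Key = 'A0' touches one contiguous range instead of one row per object.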
VerticalSQL comparable to Binary and outperforms Horizontal
[Chart: execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for a projection of 10 columns; series: HorizontalSQL, VerticalSQL, Binary; density = 10%]
VerticalUDF is the best approach
[Chart: execution time (seconds) vs. table size (#cols x #rows: 200x100K, 400x50K, 800x25K, 1000x20K) for a projection of 10 columns; series: VerticalSQL, Binary, VerticalUDF; density = 10%]
Summary

               Horizontal  Vertical (w/ Mapping)  Binary (w/ Mapping)
Manageability  +           +                      -
Flexibility    -           +                      -
Querying       +           +                      +
Performance    -           +                      +
Remarks

The lessons of this study apply directly to building a storage engine for the Semantic Web
Performance of the vertical representation can be further improved by:
– Enhanced table functions
– First class treatment of table functions
– Native support for v2h and h2v operations
– Partial indices
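
As one illustration of the last item, some engines already allow an index over only the rows of a chosen attribute. The sketch below uses SQLite's partial indexes as a stand-in (an assumption; the slides do not name a system for this). The index name idx_output_val is made up, and whether the optimizer actually uses the index for a given query is up to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vtable (Oid INTEGER, Key TEXT, Val TEXT)")
conn.executemany("INSERT INTO vtable VALUES (?, ?, ?)", [
    (0, "Name", "PAN DVD-L75"), (0, "Monitor", "7 inch"), (0, "Output", "Digital"),
    (1, "Name", "KLH DVD221"), (1, "Height", "3.75"), (1, "Output", "S-Video"),
])

# Index only the rows of one frequently filtered attribute.
conn.execute(
    "CREATE INDEX idx_output_val ON vtable (Val, Oid) WHERE Key = 'Output'"
)

# A query whose predicate matches the index condition.
digital = conn.execute(
    "SELECT Oid FROM vtable WHERE Key = 'Output' AND Val = 'Digital'"
).fetchall()

# Confirm the index was created with its WHERE clause.
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'idx_output_val'"
).fetchone()[0]
print(digital, index_sql)
```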