Achieving Scalability in OLAP
Materialized View Selection
Thomas P. Nadeau
Toby J. Teorey
University of Michigan
DOLAP 2002
Topics
• Overview of OLAP
• Exponentiality in View Selection
• Our Polynomial Greedy Algorithm (PGA)
• Test Results
• Conclusions
• Current Work
Example Star Schema
• Fact Table: CustID, DateID, BindID, Cost, Sell
• Customer: CustID, Name, City, State/Prov
• Calendar: DateID, Month, Quarter, Year
• Bind Style: BindID, Desc
Star Schema Viewed with Data

Customer
CustID  Name         City       State/Prov
00001   U of M       Ann Arbor  MI
00002   Smith & Co.  Toronto    Ont

Calendar
DateID    Month  Quarter  Year
1/1/98    Jan    1        1998
1/2/98    Jan    1        1998
…
12/31/00  Dec    4        2000

Fact Table
CustID  DateID    BindID  Cost   Sell
00002   12/31/00  PB      $500   $600
00222   1/1/99    HC      $1100  $1300
… (many rows)

Bind Style
BindID  Desc
PB      Paper Back
HC      Hard Cover
Eight Dimensions of Book Database
Attribute      Hierarchy Levels
Trim Width     4
Trim Length    4
Pages          4
Quantity       4
Stock Width    4
Stock Length   4
Bind Style     4
Press          4
Combinatorial Explosion
• Possible views = $\prod_{i=1}^{d} \ell_i$,
  where $d$ = |dimensions| and $\ell_i$ = |levels| in dimension $i$
• Book database example
– 2 dimensions, $4^2$ = 16 views
– 4 dimensions, $4^4$ = 256 views
– 6 dimensions, $4^6$ = 4,096 views
– 8 dimensions, $4^8$ = 65,536 views
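The view counts for the book database can be checked with a few lines of Python. This is a minimal sketch; the function name `possible_views` is chosen here for illustration and does not come from the slides.

```python
from math import prod

def possible_views(levels_per_dimension):
    """Number of views in the lattice: the product of each dimension's level count."""
    return prod(levels_per_dimension)

# Book database: every dimension has 4 hierarchy levels.
for d in (2, 4, 6, 8):
    print(f"{d} dimensions: {possible_views([4] * d):,} views")
```

Because the count is a product over dimensions, dimensions with different numbers of levels can be passed as a mixed list, for example `possible_views([4, 3, 2])`.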
Recap
• Materialized views quicken query responses
• Disk space limits view materialization
• Update window is a constraint
• Solution: Select strategic views
Our OLAP Optimization Approach
[Diagram: four components (View Size Estimation, View Selection, View Maintenance, Query Optimization). Labeled inputs and flows: Fact Table, Sample Data, Estimate Request, Estimated View Size, Initial Data, Strategic Views, Update, Incremental Data, Current Views, Users, Queries, Quick Responses. Part of the diagram is marked Completed Work and part is marked Current Work.]
View Selection: Example of Hypercube Lattice [HRU96]
{c, p, s} 6M
{p, s} 0.8M    {c, s} 6M    {c, p} 6M
{s} 0.01M    {p} 0.2M    {c} 0.1M
{} 1
c = Customer, p = Part, s = Supplier
Example of HRU Algorithm [HRU96]
Benefits of Possible Materialization Choices
View     Iteration 1           Iteration 2
{p, s}   5.2M x 4 = 20.8M      (selected in iteration 1)
{c, s}   0 x 4 = 0             0 x 4 = 0
{c, p}   0 x 4 = 0             0 x 4 = 0
{s}      5.99M x 2 = 11.98M    0.79M x 2 = 1.58M
{p}      5.8M x 2 = 11.6M      0.6M x 2 = 1.2M
{c}      5.9M x 2 = 11.8M      5.9M x 2 = 11.8M
{}       6M - 1                0.8M - 1
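The benefit figures for the first iteration can be reproduced with a short Python sketch of a greedy benefit computation in the style of [HRU96]. The view sizes come from the example lattice. The names `SIZES`, `query_cost`, `benefit`, and `greedy_select` are hypothetical and chosen for illustration, and the code is a simplified reading of the algorithm, not the authors' implementation. Figures for later iterations depend on how each descendant's current cost is tracked, so they may not match the slide entry for entry.

```python
# View sizes in rows, taken from the example lattice (c = Customer, p = Part, s = Supplier).
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the top view is assumed to be always materialized

def query_cost(view, materialized):
    """Rows scanned to answer `view`: size of its smallest materialized ancestor (or itself)."""
    return min(SIZES[m] for m in materialized if view <= m)

def benefit(candidate, materialized):
    """Rows saved across the candidate and all of its descendants if it is materialized."""
    return sum(
        max(query_cost(w, materialized) - SIZES[candidate], 0)
        for w in SIZES
        if w <= candidate  # w is the candidate itself or one of its descendants
    )

def greedy_select(k):
    """Pick k views, each time taking the view with the largest current benefit."""
    materialized = [TOP]
    for _ in range(k):
        candidates = [v for v in SIZES if v not in materialized]
        best = max(candidates, key=lambda v: benefit(v, materialized))
        materialized.append(best)
    return materialized[1:]

# Iteration 1: only the top view is materialized.
for view in SIZES:
    if view != TOP:
        print(sorted(view), benefit(view, [TOP]))
```

Every candidate is scored against every one of its descendants on each iteration, which is the double loop behind the exponential cost of this approach as the lattice grows.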
Exponentiality in HRU
• $O(kn^2)$ time, where $k$ = |views to select| and $n$ = |possible views|
• $n = 2^d$ in a non-hierarchical database, where $d$ = |dimensions|
• HRU algorithm is $O(k2^{2d})$ time
• Two sources of exponentiality
– Each possible view is evaluated
– Each view evaluation considers the effect of materialization on every descendant
Polynomial Greedy Algorithm (PGA)
[Flowchart with two phases]
• Nomination
– Select fact table
– Start new path
– Nominate smallest child view; repeat while [continuing path], stop when [path ended]
• Selection
– For each candidate: evaluate benefit; repeat while [more candidates]
– [else] Select view greedily
– Stop when [termination condition met]; [else] continue
Example of PGA [NT02]
{c, p, s} 6M
{p, s} 0.8M    {c, s} 6M    {c, p} 6M
{s} 0.01M    {p} 0.2M    {c} 0.1M
{} 1
c = Customer, p = Part, s = Supplier

Nomination (candidates)    Selection, Iteration 1
{p, s}                     5.2M x 4 = 20.8M
{s}                        5.99M x 2 = 11.98M
{}                         6M - 1

Nomination (candidates)    Selection, Iteration 2
{c, s}                     0 x 2 = 0
{s}                        0.79M x 2 = 1.58M
{c}                        5.9M x 2 = 11.8M
{}                         6M - 1
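The nomination step in this example can be sketched in Python. The slides state only that a path is formed by nominating the smallest child view. The rest of the sketch is assumption: views already nominated are skipped, a path ends when no un-nominated child remains, and ties in size are broken by enumeration order. All function names are hypothetical. Under these assumptions the sketch yields candidate sets consistent with the example.

```python
# View sizes in rows, from the example lattice (c = Customer, p = Part, s = Supplier).
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")

def children(view):
    """Views one level down the lattice: drop one dimension at a time."""
    return [view - {dim} for dim in sorted(view)]

def nominate_path(already_nominated):
    """Walk down from the top view, nominating the smallest fresh child at each step.

    Assumptions (not stated on the slides): previously nominated views are skipped,
    and the path ends once the current view has no un-nominated child.
    """
    path = []
    current = TOP
    while True:
        fresh = [c for c in children(current)
                 if c not in already_nominated and c not in path]
        if not fresh:
            return path
        current = min(fresh, key=lambda v: SIZES[v])  # ties: first child enumerated
        path.append(current)

first_path = nominate_path(set())
second_path = nominate_path(set(first_path))
print([sorted(v) for v in first_path])
print([sorted(v) for v in second_path])
```

Each step inspects at most d children and a path has at most d steps, which matches the quadratic cost per path that the slides report for nomination.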
Nomination Complexity
• Maximum swatch width is $d$
• Maximum path length is $d$
• Finding one path is $O(d^2)$ time
• Our strategy nominates a path each time a view is selected, so nomination complexity is $O(d^2k)$ time
Evaluating Views in PGA
• Polynomial time evaluation requires approximating materialization benefits
• Account for smallest ancestor
• Account for materialized view with largest overlap in descendants
• Complexity of our algorithm is $O(d^2k^2)$
Complexities
Database Type      HRU                 PGA
Non-Hierarchical   $O(k2^{2d})$ time   $O(d^2k^2)$ time, $O(d^2k)$ space
Hierarchical       $O(kg^{2d})$ time   $O(dk^2\ell)$ time, $O(dk\ell)$ space
$d$ = |dimensions|
$g$ = geometric mean of the number of hierarchical levels per dimension
$k$ = |views selected for materialization|
$\ell$ = |layers in lattice|
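To get a feel for how far apart the two non-hierarchical time bounds grow, the dominant terms can be evaluated directly. This is an illustrative sketch only: it ignores constant factors, the value of `k` is a hypothetical choice, and the function names are not from the slides.

```python
def hru_growth(d, k):
    """Dominant term of the HRU time bound on a non-hierarchical lattice: k * 2^(2d)."""
    return k * 2 ** (2 * d)

def pga_growth(d, k):
    """Dominant term of the PGA time bound on a non-hierarchical lattice: d^2 * k^2."""
    return d ** 2 * k ** 2

k = 10  # hypothetical number of views to select
for d in (2, 4, 6, 8):
    print(f"d={d}: HRU term {hru_growth(d, k):,}  PGA term {pga_growth(d, k):,}")
```

The terms are not running times. They only show that the HRU term is multiplied by 16 each time two dimensions are added, while the PGA term grows with the square of d.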
Near Optimal Selection
[Chart: Query Costs (rows) versus Materialization Costs (rows) for Optimal, HRU, and Polynomial Greedy selection; d = 2, ℓ = 4]
Query Costs at Four Dimensions
[Chart: Query Costs (thousands of rows) versus Materialization Costs (thousands of rows) for HRU and PGA]
Query Costs at Six Dimensions
[Chart: Query Costs (millions of rows) versus Materialization Costs (thousands of rows) for HRU and PGA]
Query Costs at Eight Dimensions
[Chart: Query Costs (millions of rows) versus Materialization Costs (thousands of rows) for HRU and PGA]
Performance at Four Dimensions
[Chart: Processing Time (seconds) versus Materialization Costs (thousands of rows) for HRU and PGA]
Performance at Six Dimensions
[Chart: Processing Time (minutes) versus Materialization Costs (thousands of rows) for HRU and PGA]
Performance at Eight Dimensions
[Chart: Processing Time (minutes) versus Materialization Costs (thousands of rows) for HRU and PGA]
Conclusions
• PGA finds a good set of views for materialization, even when HRU fails because of its algorithmic complexity
• PGA extends the usefulness of OLAP systems into higher dimensionality
Current Work
• Design alternative data structures for materialized views in OLAP
• Test impact of new data structures on update and query costs
• Integrate our work into an OLAP system
References
• [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205-216, Montreal, Canada.
• [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp. 1-13, Toronto, Canada.
• [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html