Achieving Scalability in OLAP Materialized View Selection
Thomas P. Nadeau, Toby J. Teorey
University of Michigan
DOLAP 2002

Topics
• Overview of OLAP
• Exponentiality in View Selection
• Our Polynomial Greedy Algorithm (PGA)
• Test Results
• Conclusions
• Current Work

Example Star Schema
Customer:   CustID, Name, City, State/Prov
Calendar:   DateID, Month, Quarter, Year
Bind Style: BindID, Desc
Fact Table: CustID, DateID, BindID, Cost, Sell

Star Schema Viewed with Data

Customer
CustID  Name         City       State/Prov
00001   U of M       Ann Arbor  MI
00002   Smith & Co.  Toronto    Ont

Calendar
DateID    Month  Quarter  Year
1/1/98    Jan    1        1998
1/2/98    Jan    1        1998
12/31/00  Dec    4        2000

Bind Style
BindID  Desc
PB      Paper Back
HC      Hard Cover

Fact Table (many rows)
CustID  DateID    BindID  Cost   Sell
00002   12/31/00  PB      $500   $600
00222   1/1/99    HC      $1100  $1300

Eight Dimensions of Book Database
Attribute     Hierarchy Levels
Trim Width    4
Trim Length   4
Pages         4
Quantity      4
Stock Width   4
Stock Length  4
Bind Style    4
Press         4

Combinatorial Explosion
• Possible views = $\prod_{i=1}^{d} \ell_i$, where d = |dimensions| and $\ell_i$ = |levels| in dimension i
• Book database example
  – 2 dimensions, $4^2$ = 16 views
  – 4 dimensions, $4^4$ = 256 views
  – 6 dimensions, $4^6$ = 4,096 views
  – 8 dimensions, $4^8$ = 65,536 views

Recap
• Materialized views quicken query responses
• Disk space limits view materialization
• Update window is a constraint
• Solution: Select strategic views

Our OLAP Optimization Approach
[Diagram. Labels: Fact Table, Sample Data, View Size Estimation, Estimate Request, Estimated View Size, Initial Data, View Selection, Strategic Views, Update, Incremental Data, View Maintenance, Current Views, Users, Queries, Query Optimization, Quick Responses. Two regions are marked Completed Work and Current Work.]

View Selection: Example of Hypercube Lattice [HRU96]
{c, p, s} 6M
{p, s} 0.8M     {c, s} 6M     {c, p} 6M
{s} 0.01M       {p} 0.2M      {c} 0.1M
{} 1
p = Part, s = Supplier, c = Customer

Example of HRU Algorithm [HRU96]
Benefits of possible materialization choices, using the lattice above:

View     Iteration 1            Iteration 2
{p, s}   5.2M x 4 = 20.8M       (selected in iteration 1)
{c, s}   0 x 4 = 0              0 x 4 = 0
{c, p}   0 x 4 = 0              0 x 4 = 0
{s}      5.99M x 2 = 11.98M     0.79M x 2 = 1.58M
{p}      5.8M x 2 = 11.6M       0.6M x 2 = 1.2M
{c}      5.9M x 2 = 11.8M       5.9M x 2 = 11.8M
{}       6M - 1                 0.8M - 1

Exponentiality in HRU
• $O(kn^2)$ time, where k = |views to select| and n = |possible views|
• $n = 2^d$ in a non-hierarchical database, where d = |dimensions|
• The HRU algorithm is therefore $O(k 2^{2d})$ time
• Two sources of exponentiality
  – Each possible view is evaluated
  – Each view evaluation considers the effect of materialization on every descendant

Polynomial Greedy Algorithm (PGA)
[Flowchart with two phases.
Nomination: Select fact table; Start new path; Nominate smallest child view; repeat while [continuing path], leave the phase on [path ended].
Selection: For each candidate, Evaluate benefit while [more candidates]; [else] Select view greedily; stop on [termination condition met], [else] start a new path.]

Example of PGA [NT02]
The lattice is the same as in the HRU example (p = Part, s = Supplier, c = Customer).

Iteration 1
Nomination candidates   Selection (benefit)
{p, s}                  5.2M x 4 = 20.8M
{s}                     5.99M x 2 = 11.98M
{}                      6M - 1

Iteration 2
Nomination candidates   Selection (benefit)
{c, s}                  0 x 2 = 0
{s}                     0.79M x 2 = 1.58M
{c}                     5.9M x 2 = 11.8M
{}                      6M - 1

Nomination Complexity
• Maximum swatch width is d
• Maximum path length is d
• Finding one path is $O(d^2)$ time
• Our strategy nominates a path each time a view is selected, so nomination complexity is $O(d^2 k)$ time

Evaluating Views in PGA
• Polynomial time evaluation requires approximating materialization benefits
• Account for smallest ancestor
• Account for materialized view with largest overlap in descendants
• Complexity of our algorithm is $O(d^2 k^2)$

Complexities
Database Type      HRU                    PGA
Non-Hierarchical   $O(k 2^{2d})$ time     $O(d^2 k^2)$ time, $O(d^2 k)$ space
Hierarchical       $O(k g^{2d})$ time     $O(d k^2 \ell)$ time, $O(d k \ell)$ space

d = |dimensions|
g = geometric mean of the number of hierarchical levels per dimension
k = |views selected for materialization|
$\ell$ = |layers in lattice|

Near Optimal Selection
[Chart: Query Costs (rows) versus Materialization Costs (rows) for Optimal, HRU, and Polynomial Greedy; d = 2, $\ell$ = 4.]

Query Costs at Four Dimensions
[Chart: Query Costs (thousands of rows) versus Materialization Costs (thousands of rows) for HRU and PGA.]

Query Costs at Six Dimensions
[Chart: Query Costs (millions of rows) versus Materialization Costs (thousands of rows) for HRU and PGA.]

Query Costs at Eight Dimensions
[Chart: Query Costs (millions of rows) versus Materialization Costs (thousands of rows) for HRU and PGA.]

Performance at Four Dimensions
[Chart: Processing Time (seconds) versus Materialization Costs (thousands of rows) for HRU and PGA.]

Performance at Six Dimensions
[Chart: Processing Time (minutes) versus Materialization Costs (thousands of rows) for HRU and PGA.]

Performance at Eight Dimensions
[Chart: Processing Time (minutes) versus Materialization Costs (thousands of rows) for HRU and PGA.]

Conclusions
• PGA finds a good set of views for materialization where HRU fails due to algorithmic complexity
• PGA extends the usefulness of OLAP systems into higher dimensionality

Current Work
[Diagram repeated from Our OLAP Optimization Approach.]
• Design alternative data structures for materialized views in OLAP
• Test the impact of new data structures on update and query costs
• Integrate our work into an OLAP system

References
• [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205-216, Montreal, Canada.
• [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp. 1-13, Toronto, Canada.
• [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html
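
Worked check: view counts. The counts on the Combinatorial Explosion slide follow directly from the product of the hierarchy levels per dimension. The short Python sketch below recomputes them; the function name `possible_views` is chosen here for illustration and does not come from the talk.

```python
from math import prod


def possible_views(levels_per_dimension):
    """Number of possible views: the product of |levels| over all dimensions."""
    return prod(levels_per_dimension)


if __name__ == "__main__":
    # Book database: every dimension has 4 hierarchy levels.
    for d in (2, 4, 6, 8):
        print(d, "dimensions:", possible_views([4] * d), "views")
    # Non-hierarchical case: each dimension is either kept or aggregated away,
    # i.e. 2 levels per dimension, which gives n = 2^d.
    print("3 non-hierarchical dimensions:", possible_views([2] * 3), "views")
```

With two levels per dimension the same product reduces to the $n = 2^d$ used on the Exponentiality in HRU slide, which is why the three-dimension lattice in the examples has eight views.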
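
Worked check: HRU benefits. The benefit figures in the HRU example can be recomputed with a small sketch of the greedy selection, assuming the benefit definition from [HRU96]: for every descendant of a candidate (including the candidate itself), add the saving relative to the smallest already-materialized ancestor, counting negative savings as zero. The names `SIZES`, `current_cost`, `benefit`, and `hru_greedy` are illustrative, and view sizes are the slide's figures written out as row counts. Under this definition the sketch reproduces every iteration 1 figure and the iteration 2 figures for {s}, {p}, and {}; for {c} in iteration 2 it yields 6.6M rather than the slide's 5.9M x 2 = 11.8M, because {} can already be answered from {p, s} at 0.8M. The view chosen in iteration 2 is {c} either way.

```python
# View sizes in rows, taken from the lattice slide (6M = 6,000,000 etc.).
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the fact table, always materialized


def current_cost(view, materialized):
    """Rows scanned to answer `view` from its smallest materialized ancestor."""
    return min(SIZES[m] for m in materialized if view <= m)


def benefit(candidate, materialized):
    """Total saving over all descendants of `candidate` (itself included)."""
    total = 0
    for view in SIZES:
        if view <= candidate:  # `view` can be answered from `candidate`
            saving = current_cost(view, materialized) - SIZES[candidate]
            total += max(0, saving)
    return total


def hru_greedy(k):
    """Pick k views greedily; returns a list of (view, benefit) pairs."""
    materialized = {TOP}
    picks = []
    for _ in range(k):
        candidates = [v for v in SIZES if v not in materialized]
        best = max(candidates, key=lambda v: benefit(v, materialized))
        picks.append((best, benefit(best, materialized)))
        materialized.add(best)
    return picks


if __name__ == "__main__":
    for view, gain in hru_greedy(2):
        print(sorted(view), gain)
```

Each call to `benefit` walks every view in the lattice, and each greedy round calls it for every candidate, which is the $O(kn^2)$ behavior described on the Exponentiality in HRU slide.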
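
Worked check: PGA nomination paths. The slides give only a flowchart for nomination, so the following is one possible reading rather than the authors' implementation: starting from the fact table, repeatedly nominate the smallest child that has not been nominated before, and end the path when no such child exists. The helper names (`children`, `nominate_path`) and the tie-breaking order are assumptions made here. On the example lattice this reading yields the path {p, s}, {s}, {} first and {c, s}, {c} second; together with the unselected leftovers {s} and {} from the first path, that matches the iteration 2 candidate list on the slide.

```python
# View sizes in rows, taken from the lattice slide.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the fact table


def children(view):
    """Views obtained by aggregating away exactly one dimension."""
    return [view - {dim} for dim in sorted(view)]


def nominate_path(nominated):
    """Walk down from the fact table, nominating the smallest fresh child.

    `nominated` is updated in place so later paths skip earlier nominations.
    Assumption: ties are broken by the order in which `children` lists views.
    """
    path = []
    view = TOP
    while True:
        fresh = [c for c in children(view) if c not in nominated]
        if not fresh:  # path ended
            return path
        view = min(fresh, key=SIZES.__getitem__)
        nominated.add(view)
        path.append(view)


if __name__ == "__main__":
    seen = set()
    print([sorted(v) for v in nominate_path(seen)])  # first path
    print([sorted(v) for v in nominate_path(seen)])  # second path
```

A path has at most d steps and each step inspects at most d children, which is consistent with the $O(d^2)$ bound stated on the Nomination Complexity slide.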