Self-Managing Cost Models
Shivnath Babu
Stanford University

Cost Models in Database Systems
• Conventional query optimization:
  – Enumerate query plans
  – Estimate the physical cost of each plan (e.g., execution time, or total resources (CPU and I/O) required)
  – Choose the plan with minimum cost
• Estimation of physical cost is based on (operator) cost models
• It is very important to have fairly accurate cost models

Current Approach to Deriving Cost Models
• Trial and error
• Classic: linear combination of CPU cost and the number of disk blocks accessed
• Sequential vs. random accesses
  – Depend on data layout and data access pattern
• Buffer pool hit ratio
  – Depends on buffer pool size, data access pattern, and number of concurrent queries
• L1, L2, L3 cache hit ratio

Problems with the Current Approaches
• Growing importance of:
  – Autonomic computing
  – Diverse data management needs in many new apps
  – Non-monolithic uses of database software
  – Better user experience (Ex: SLAs, progress bars)
• The current manual approach to cost model management is a hindrance in this new world:
  – Hard to port across system configurations (Ex: local disk vs. RAID vs. NAS vs. remote database) or workloads
  – Complex, many lines of code, hard to maintain
  – Built on simplifying assumptions (Ex: ignores interference across queries)
  – Severely restricts auto-configuration and plug & play

Solution #1: Get Rid of Cost Models
• Use Eddies: no plan, no optimizer, no cost models
• The jury is still out

Solution #2: Automated Cost-Model Management
1. Bootstrapping: start with
   • An overall objective (Ex: minimize execution time)
   • A common-case model (Ex: CPU + sequential I/O + random I/O)
   • A list of other factors that could affect cost (Ex: cache misses, number of concurrent processes)
2. Detect deviations from the model during execution
   • Ignore deviations resulting from statistics estimation errors
3. Troubleshoot online (challenging)
   • Does the deviation matter? What is the cause?
   • Use extra "probe queries"
4. Update the model and test it: online what-if analysis

Epilogue
• Related work:
  – In data integration (e.g., CORDS-MDBS, Garlic)
  – In main-memory databases (e.g., Monet)
  – Not comprehensive or fully automated
• Self-managing cost models:
  – A big step toward autonomic database systems
  – Will improve reusability of DB software
  – Should improve overall performance and user experience
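
The conventional optimization loop from "Cost Models in Database Systems", combined with the classic linear cost model (split into the common-case CPU + sequential I/O + random I/O terms named in the bootstrapping step), can be sketched as below. This is a minimal sketch only: `PlanStats`, `LinearCostModel`, the per-plan counts, and the weights are hypothetical; the slides give no coefficients or plan statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStats:
    """Hypothetical resource counts an optimizer might estimate for one plan."""
    name: str
    cpu_ops: float     # estimated CPU work (e.g., tuples processed)
    seq_pages: float   # disk blocks read sequentially
    rand_pages: float  # disk blocks read randomly


@dataclass(frozen=True)
class LinearCostModel:
    """Common-case model: cost = w_cpu*CPU + w_seq*SeqIO + w_rand*RandIO.

    The default weights are illustrative placeholders, not calibrated values.
    """
    w_cpu: float = 0.01
    w_seq: float = 1.0
    w_rand: float = 4.0

    def cost(self, p: PlanStats) -> float:
        return (self.w_cpu * p.cpu_ops
                + self.w_seq * p.seq_pages
                + self.w_rand * p.rand_pages)


def choose_plan(model: LinearCostModel, plans: list[PlanStats]) -> PlanStats:
    """Conventional optimization: estimate each plan's cost, keep the cheapest."""
    return min(plans, key=model.cost)


if __name__ == "__main__":
    plans = [
        PlanStats("full_scan", cpu_ops=100_000, seq_pages=1_000, rand_pages=0),
        PlanStats("index_lookup", cpu_ops=5_000, seq_pages=0, rand_pages=300),
    ]
    model = LinearCostModel()
    for p in plans:
        print(p.name, model.cost(p))
    print("chosen:", choose_plan(model, plans).name)
```

With these made-up weights the index plan is chosen; with a larger `w_rand` (say, hardware where random I/O is relatively more expensive) the choice can flip to the scan. Retuning such constants by hand for every configuration is the porting burden the slides describe.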
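
Steps 2 and 4 of Solution #2 (detect deviations from the model during execution, then update the model and test it) might look roughly like the sketch below. It rests on assumptions the slides do not state. The model is the linear CPU + sequential I/O + random I/O form from the bootstrapping step. Each executed query yields a hypothetical `Observation` of measured resource counts and measured cost. Deviations caused by statistics estimation errors are set aside by evaluating the model on the *measured* counts rather than the optimizer's estimates, so any remaining gap is attributed to the model itself. The 25% tolerance and the least-squares refit are illustrative choices, and the troubleshooting step (probe queries) is not modeled.

```python
from dataclasses import dataclass

Features = tuple[float, float, float]  # hypothetical: (cpu_ops, seq_pages, rand_pages)


@dataclass
class Observation:
    features: Features    # resource counts actually measured during execution
    observed_cost: float  # e.g., measured execution time


def predict(weights: list[float], x: Features) -> float:
    return sum(w * xi for w, xi in zip(weights, x))


def rel_error(weights: list[float], obs: Observation) -> float:
    return abs(predict(weights, obs.features) - obs.observed_cost) / max(obs.observed_cost, 1e-12)


def is_deviation(weights: list[float], obs: Observation, tol: float = 0.25) -> bool:
    """Flag executions whose cost the model mispredicts by more than tol."""
    return rel_error(weights, obs) > tol


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: need more varied observations")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def refit(history: list[Observation]) -> list[float]:
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    k = len(history[0].features)
    xtx = [[sum(o.features[i] * o.features[j] for o in history) for j in range(k)]
           for i in range(k)]
    xty = [sum(o.features[i] * o.observed_cost for o in history) for i in range(k)]
    return solve(xtx, xty)


def mean_rel_error(weights: list[float], history: list[Observation]) -> float:
    return sum(rel_error(weights, o) for o in history) / len(history)


def maybe_update(weights: list[float], history: list[Observation],
                 tol: float = 0.25) -> list[float]:
    """If any execution deviates, refit; keep the candidate only if it explains
    the recorded history better (a crude stand-in for what-if testing)."""
    if not any(is_deviation(weights, o, tol) for o in history):
        return weights
    candidate = refit(history)
    if mean_rel_error(candidate, history) < mean_rel_error(weights, history):
        return candidate
    return weights
```

The acceptance check here reuses the same history the model was fitted on. A system doing real online what-if analysis would presumably validate the candidate on fresh executions or probe queries before switching.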