Automated Physical Design in Database Caches
T. Malik, X. Wang, R. Burns (Johns Hopkins University)
D. Dash, A. Ailamaki (Carnegie Mellon University)
Hopkins Storage Systems Lab, Department of Computer Science

Outline
Target Application: Proxy caches for SkyQuery
Physical Design in Proxy Caches
  – Need for vertical partitioning
  – Workload evolution
Online Vertical Partitioning
  – Simple scenario: Two configurations
  – General scenario: N configurations
Experiments

SkyQuery
Publicly accessible federation of sky surveys (a virtual telescope with terabyte data sets)
Autonomous, heterogeneous, and geographically distributed sites
Data-intensive, read-only workload
Scaling through proxy caching
  – Minimize network traffic
  – Offload query processing

Bypass Caching (Malik et al., ICDE'05)
Proxy database cache for SkyQuery
  – Brings columns closer to users
  – Economic model for bypassing queries

The Need for Vertical Partitioning
Poor I/O performance in the cache
  – Mirrors the backend DB design
  – Index-free environment
Largest relation groups 446 columns
Auxiliary data structures (indices/views) pollute the cache
Poor I/O offsets the response-time benefits from network savings
  – 6x benefit with partitioning alone
Performance without redundant data

Is Partitioning Feasible?

Why Not Existing Solutions? (e.g., DB tuning advisor, AutoPart)
Require representative workloads
  – Not readily available
  – Astronomy workloads exhibit evolution
Offline in nature
  – Invoked periodically
  – Costly to run
  – Ignore the cost of partitioning
Static design
  – Output a single configuration for the input workload
  – Ignores incremental changes within the workload

Workload Evolution

Workload Evolution (continued)

Online Vertical Partitioning Problem

Two Configuration Scenario
Algorithm: Given current config C and an alternative C', transition if remaining in C incurs substantial overhead
Capturing overhead
  – Penalty: [formula missing]
  – Cumulative penalty: [formula missing]
  – Max cumulative penalty: [formula missing]
Transition if: [condition missing]
3-competitive
  – After k transitions, 2Conf incurs (3k/2)(d(C,C') + d(C',C))
  – OPT incurs at least (k/2)(d(C,C') + d(C',C))

NConf: Extending to N Configurations
Let Cy be the current config. and Cx the previous config.; transition to the first Cz (Cz ≠ Cy) satisfying: [condition missing]
Number of configurations is exponential (51 trillion ways to partition 20 attributes)
Pruning heuristics
  – Neighboring configurations
  – Attribute groups

Neighboring Configurations
[Diagram: current configuration Cy over attributes A1 to A4, and the neighbors of Cy]
Small, incremental partitions
Lower threshold to overcome

Attribute Groups
qk: {A1,A3,A4}
Curr. Config: Cy
Attr. Groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}
[Diagram: four candidate configurations over A1 to A4, each annotated with weight += qk(Cy)]
Candidate config if n.weight > d(Cx,Cy) + d(Cy,n)
Candidates with high weight benefit from repartitioning

Experiments
TPC benchmark in SQL Server 2000
Partition the orders relation using select queries
Two 10k workloads
  – WkldSky: Evolving access pattern that approximates SkyQuery
  – WkldConst: Access pattern remains unchanged
AutoPart: an offline partitioning tool (Papadomanolakis et al.)

Query Performance

Estimated I/O and Transitions

Future Work
Impact of cache replacement policy
  – Database state changes periodically
  – Reuse work to find new partitions
Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches

Conclusion
Proxy caches present a dynamic environment
Vertical partitioning improves performance without adding redundant data
Online vertical partitioning
  – Balances query execution performance with cost of transitioning
Experiments show 17% improvement by partitioning a single table

Questions?
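
Backup: Where the Factor of 3 Comes From
A short worked step for the 3-competitive claim on the two-configuration slide. It only divides the two cost figures stated there, reading the 2Conf figure as an upper bound (an assumption; the slide says "incurs"); it does not reconstruct the penalty formulas, which are not recoverable here.

```latex
\[
\frac{\mathrm{cost}(\mathrm{2Conf})}{\mathrm{cost}(\mathrm{OPT})}
\;\le\;
\frac{\tfrac{3k}{2}\,\bigl(d(C,C') + d(C',C)\bigr)}
     {\tfrac{k}{2}\,\bigl(d(C,C') + d(C',C)\bigr)}
\;=\; 3
\]
```

The transition-cost term d(C,C') + d(C',C) and the transition count k cancel, so the bound does not depend on how expensive repartitioning is.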
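
Backup: Counting Configurations and Attribute Groups
A minimal sketch of two counts quoted in the NConf and attribute-group slides. The "51 trillion" figure matches the Bell number B(20), the number of ways to split 20 attributes into disjoint groups, and the attribute groups of a query are the non-empty subsets of the attributes it reads. The function names `bell_number` and `attribute_groups` are illustrative and not taken from the talk.

```python
from itertools import combinations


def bell_number(n: int) -> int:
    """Return B(n), the number of ways to partition a set of n items."""
    row = [1]  # row 0 of the Bell triangle
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbor.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def attribute_groups(query_attrs):
    """Return every non-empty subset of a query's attributes, smallest first."""
    attrs = sorted(query_attrs)
    return [
        group
        for size in range(1, len(attrs) + 1)
        for group in combinations(attrs, size)
    ]


if __name__ == "__main__":
    print(bell_number(20))  # 51724158235372, i.e. about 51 trillion
    for group in attribute_groups({"A1", "A3", "A4"}):
        print(group)
```

A query that reads m attributes yields 2^m - 1 groups (7 for qk above), far fewer than the full space of partitions, which is presumably why the groups are attractive as a pruning heuristic.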