Optimization refresher
Recall the main steps of optimization:
Logical Expression (algebra tree)
  -> Optimization: Equivalent Expr. Enumeration, Physical Plan Generation, Plan Selection
  -> Physical Operator Tree (query plan)
Inputs shown alongside the optimization steps: catalog, statistics, cost model

Purpose of optimization
Why optimize queries?
Optimization can make query processing:
- faster
- cheaper
- better? (Not in the sense of correctness: results are the same, regardless of what the optimal plan looks like)

Optimization costs
What are the downsides to optimization?
- Optimization uses limited resources:
  - time
  - space
- Certain approaches can eat up a lot of these resources
  - e.g. exhaustively considering join ordering: alternatives are exponential

Fundamental tradeoff
Two points to consider:
- an optimal plan takes less time to execute than alternative plans
- it takes time to find the optimal plan
This frames the fundamental engineering tradeoff in query optimization:
Optimization can reduce resources consumed during query processing, but the optimization process consumes resources.

Distributed optimization
Queries on distributed systems can be optimized too.
What’s a “distributed system”?

Agrios
Hybrid Analytic Systems are one kind of distributed system.
They integrate a data management tool with an analysis tool.
My system integrates R and SciDB. It is named Agrios.
[Figure: Agrios sits between R and SciDB. R side: Data A, C; Operations such as +, %*%, *, /, ^, &&, [ ], max(), cbind(), aov(), factor(), t.test(), anova(), glm(), plot(). SciDB side: Data B, D, E; Operations apply(), mult(), join().]

Agrios - R
res <- A %*% B
- Powerful analysis system and programming language
- High-level operations on data
- Open-source
- Vectors and arrays are fundamental data objects

Agrios - SciDB
store(multiply(A,B), res);
- Array database management system
- Arrays are the fundamental data objects
- Low-level operations on arrays
- Scales well
- Open-source

Key properties of hybrid systems
Key property of hybrid systems: at both hybrid components,
- Data is stored
- Analytic work is performed

Data movement is required
Key property of hybrid systems: at both analytic components,
- Data is stored
- Analytic work is performed
This means that data must be moved between hybrid components.
We need to decide what data is moved where.

Optimizers for reducing data movement
Data movement should be minimized:
- data movement takes time
- data movement takes money
Repurposing an optimizer:
- Optimizers help us minimize or reduce resource consumption
- Moving data consumes resources
- So we can use an optimizer to minimize data movement

Example – two ways to move data
Some choices are better (move less data) than others:
[Figure: panels labelled "Initial state", "Computation performed at A", and "Computation performed at B", showing the data movement between nodes A and B in each case; array dimension labels R, C, and CxR.]

Choices in data movement
There are choices when it comes to moving data!
Two sides of the same coin:
• Selecting execution locations for a query’s operations determines data movement
-- OR --
• Selecting data movements determines query operation execution locations

Stagings - Example
This decision amounts to filling in the question marks with operation execution locations:
[Figure: expression tree built from %*%, [], +, and sum() operators, each marked with a "?".]
operation_1: R
operation_2: SciDB
operation_3: SciDB
...
operation_n: R
This yields a staging.

Staging – many options with complex expressions
Some possible stagings (there are many more …)

Selecting the best staging
Select the best staging:
- What is the best staging?
- How do we find the best staging?

Finding the movement-minimizing plan
How we pick the optimal plan / select the best staging:
1. User writes a query, with no mention of location
   - Saves users the trouble of manually managing data movement
   - Lets them use the same script wherever data is
2. Things are where they are (the query’s placement)
3. Agrios considers all possible execution location combinations:
   - Determines data movement requirements
   - Assigns a cost to the plan
4. Lowest-cost plan is selected and executed

Staging example - 1 to 5
[Figure sequence over five slides: an original query (R script) is expanded into logically equivalent queries and then into logically equivalent plans. In steps 4 and 5 the plans carry the numeric labels 1000, 10, 30,000, and 750; in step 5 one plan is marked "For execution".]

Additional technique: transforming queries
We are using an optimizer to minimize data movement.
What other techniques can we use from relational database query optimization?
Query transformations.
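The four steps under "Finding the movement-minimizing plan" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Agrios is implemented: the `Leaf` and `Op` classes, the cell-count cost measure, the rule that the result must end up in R, and all sizes are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from itertools import product

SITES = ("R", "SciDB")  # the two hybrid components


@dataclass(frozen=True)
class Leaf:
    name: str
    site: str   # where the array is stored (the query's placement)
    size: int   # number of cells; hypothetical unit of movement cost


@dataclass(frozen=True)
class Op:
    name: str      # must be unique within a query
    inputs: tuple  # child Leaf/Op nodes
    size: int      # number of cells in the operator's output


def operators(node):
    """Return the operator nodes of the tree in post-order."""
    if isinstance(node, Leaf):
        return []
    ops = []
    for child in node.inputs:
        ops.extend(operators(child))
    ops.append(node)
    return ops


def movement_cost(root, staging, result_site="R"):
    """Cells moved under a staging (dict: operator name -> site).

    Assumption: an input is shipped whole whenever it lives at a
    different site than the operator consuming it, and the final
    result must be delivered to result_site.
    """
    def site_of(node):
        return node.site if isinstance(node, Leaf) else staging[node.name]

    cost = 0
    for op in operators(root):
        for child in op.inputs:
            if site_of(child) != staging[op.name]:
                cost += child.size
    if staging[root.name] != result_site:
        cost += root.size
    return cost


def best_staging(root):
    """Exhaustively try every site combination; keep the cheapest."""
    ops = operators(root)
    best = None
    for sites in product(SITES, repeat=len(ops)):
        staging = {op.name: site for op, site in zip(ops, sites)}
        cost = movement_cost(root, staging)
        if best is None or cost < best[0]:
            best = (cost, staging)
    return best


# Hypothetical query: sum(A %*% B), with A (5 x 200) stored in R
# and B (200 x 150) stored in SciDB.
A = Leaf("A", "R", 1000)
B = Leaf("B", "SciDB", 30000)
mult = Op("mult", (A, B), 750)
query = Op("sum", (mult,), 1)

cost, staging = best_staging(query)
print(cost, staging)
```

With n operators and two sites the loop visits 2^n stagings, which is the same exponential growth noted earlier for exhaustive join ordering, so the sketch is only practical for small expressions.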
Consolidating transformation
A consolidating transformation reduces the number of data transfers.
[Figure: two plan trees, (i) and (ii), each producing a result at R from inputs A, B, C, and D; nodes are annotated (R) or (SciDB) and edges are labelled 10.]

Reductive transformation
A reductive transformation reduces the amount of data moved in a transfer.
[Figure: two expression trees, one with [] applied above +, the other with + applied above two [] operations.]

Key data structure
MEMO data structure used during optimization

Summary
• Tools and techniques from relational database query optimization can be used to minimize data movement
• Data movement is expensive, so should be minimized
• Stagings vary in cost, and there is a movement-minimizing plan associated with the staging
• Transformations can further reduce data movement
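The reductive transformation can be illustrated with a small Python sketch. It assumes one reading of the slide's figure, namely that a subscript applied to a sum, (A + B)[idx], is rewritten as A[idx] + B[idx]. The placement (A in R, B in SciDB, addition executed in R), the helper names, and the data are hypothetical.

```python
def subset(vec, idx):
    """Subscript operation: keep only the requested positions."""
    return [vec[i] for i in idx]


def add(x, y):
    """Element-wise addition of two equal-length vectors."""
    return [a + b for a, b in zip(x, y)]


A = list(range(1000))    # assumed to be stored in R
B = [2 * v for v in A]   # assumed to be stored in SciDB
idx = [0, 10, 20]

# Original plan: (A + B)[idx] evaluated in R, so all of B is transferred.
moved_original = len(B)
original = subset(add(A, B), idx)

# Transformed plan: A[idx] + B[idx]. B[idx] is evaluated where B lives,
# so only the selected cells are transferred to R.
b_part = subset(B, idx)
moved_transformed = len(b_part)
transformed = add(subset(A, idx), b_part)

print(original == transformed, moved_original, moved_transformed)
```

Both plans produce the same values, but the transformed plan ships 3 cells instead of 1000. The rewrite is valid here because subscripting distributes over element-wise addition; it would not carry over unchanged to operators such as %*%.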