Optimization refresher
Recall the main steps of optimization:
Input: Logical Expression (algebra tree)
Optimization:
1. Equivalent Expression Enumeration
2. Physical Plan Generation
3. Plan Selection
Optimizer inputs: catalog, statistics, cost model
Output: Physical Operator Tree (query plan)
1
Purpose of optimization
Why optimize queries?
Optimization can make query processing:
- faster
- cheaper
- better? (Not in the sense of correctness: results are the same, regardless of what the optimal plan looks like)
2
Optimization costs
What are the downsides to optimization?
- Optimization uses limited resources:
  - time
  - space
- Certain approaches can eat up a lot of these resources
  - e.g. exhaustively considering join orderings: the number of alternatives grows at least exponentially with the number of relations joined
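To give a feel for how quickly the join-ordering search space grows, here is a small sketch that counts alternatives. The two counting formulas are standard combinatorics rather than something stated on the slides: a left-deep plan corresponds to a permutation of the relations, and an ordered binary (bushy) join tree over n relations can be built in (2(n-1))! / (n-1)! ways. The function names are made up for this illustration.

```python
from math import factorial


def left_deep_orders(n_relations: int) -> int:
    """Left-deep join orders: one per permutation of the relations."""
    return factorial(n_relations)


def bushy_trees(n_relations: int) -> int:
    """Ordered binary join trees over n labeled relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n_relations - 1)) // factorial(n_relations - 1)


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        print(n, left_deep_orders(n), bushy_trees(n))
```

Both counts outgrow any fixed exponential, which is why exhaustive enumeration stops being practical after a modest number of joins.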
3
Fundamental tradeoff
Two points to consider:
- an optimal plan takes less time to execute than alternative plans
- it takes time to find the optimal plan
This frames the fundamental engineering tradeoff in query optimization:
Optimization can reduce the resources consumed during query processing, but the optimization process itself consumes resources.
4
Distributed optimization
Queries on distributed systems can be optimized too.
What’s a “distributed system”?
5
Agrios
Hybrid Analytic Systems are one kind of distributed system.
They integrate a data management tool with an analysis tool.
My system integrates R and SciDB. It is named Agrios.
[Diagram: Agrios spans two components.
R: operations + \ %*% * / ^ . && [ ] max() cbind() aov() factor() t.test() anova() glm() plot(); data A, C.
SciDB: operations apply(), mult(), join(); data B, D, E.]
6
Agrios - R
res <- A %*% B
- Powerful analysis system and programming language
- High-level operations on data
- Open-source
- Vectors and arrays are fundamental data objects
7
Agrios - SciDB
store(multiply(A,B), res);
- Array database management system
- Arrays are the fundamental data objects
- Low-level operations on arrays
- Scales well
- Open-source
8
Key properties of hybrid systems
At both components of a hybrid system:
- Data is stored
- Analytic work is performed
9
Data movement is required
Key properties of hybrid systems. At both components:
- Data is stored
- Analytic work is performed
This means that data must be moved between hybrid components.
We need to decide what data is moved where.
10
Optimizers for reducing data movement
Data movement should be minimized:
- data movement takes time
- data movement costs money
Repurposing an optimizer:
- Optimizers help us minimize or reduce resource consumption
- Moving data consumes resources
- So we can use an optimizer to minimize data movement
11
Example – two ways to move data
Some choices are better (move less data) than others.
[Figure: starting from the same initial state, the computation is performed either at A or at B, and each choice requires a different data movement. Surviving labels: R, C, CxR, "data at B".]
12
Choices in data movement
There are choices when it comes to moving data!
Two sides of the same coin:
• Selecting execution locations for a query’s operations determines data movement
--OR--
• Selecting data movements determines query operation execution locations
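The first direction (execution locations determine data movement) can be sketched in a few lines. This is an illustration only, not Agrios code: the tuple-based tree representation, the operation ids "mul" and "add", and the function names are all assumptions made for the example.

```python
# Hypothetical representation (not from the slides): a leaf is an array name
# (str); an operation is a tuple (op_id, left_child, right_child).

def node_id(node):
    """Name of a node: the array name for a leaf, the op id for an operation."""
    return node if isinstance(node, str) else node[0]


def required_movements(node, placement, staging):
    """Return (location of node's result, list of (what, source, destination))."""
    if isinstance(node, str):
        return placement[node], []
    op_id, *children = node
    here = staging[op_id]
    moves = []
    for child in children:
        child_loc, child_moves = required_movements(child, placement, staging)
        moves.extend(child_moves)
        if child_loc != here:
            # The child's value lives elsewhere, so it must be shipped here.
            moves.append((node_id(child), child_loc, here))
    return here, moves


query = ("add", ("mul", "A", "B"), "C")            # (A %*% B) + C
placement = {"A": "R", "B": "SciDB", "C": "R"}     # where the data starts out
staging_1 = {"mul": "R", "add": "R"}
staging_2 = {"mul": "SciDB", "add": "R"}

if __name__ == "__main__":
    print(required_movements(query, placement, staging_1))
    print(required_movements(query, placement, staging_2))
```

Once every operation has a location, the transfers follow mechanically: any edge of the tree whose two ends sit at different components is a data movement.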
13
Stagings - Example
This decision amounts to filling in the question marks with operation execution locations.
[Figure: expression tree built from the operators %*%, [], + and sum(); each operator carries a "?" for its undecided execution location.]
operation_1: R
operation_2: SciDB
operation_3: SciDB
...
operation_n: R
This yields a staging.
14
Staging – many options with complex expressions
Some possible stagings:
there are many more …
15
Selecting the best staging
Select the best staging
What is the best staging?
How do we find the best staging?
16
Finding the movement-minimizing plan
How we pick the optimal plan / select the best staging:
1. The user writes a query, with no mention of location
   - Saves users the trouble of manually managing data movement
   - Lets them use the same script wherever the data is located
2. Things are where they are (the query’s placement)
3. Agrios considers all possible combinations of execution locations:
   - Determines the data movement each combination requires
   - Assigns a cost to the resulting plan
4. The lowest-cost plan is selected and executed
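Steps 2 to 4 can be sketched as an exhaustive search. This is a minimal illustration under stated assumptions, not the Agrios implementation: the cost of a plan is taken to be the number of array elements moved, the `sizes` dictionary stands in for optimizer statistics, the final result is assumed to be wanted at R, and all names and numbers are hypothetical.

```python
from itertools import product

LOCATIONS = ("R", "SciDB")


def name_of(node):
    return node if isinstance(node, str) else node[0]


def ops_of(node):
    """Operation ids of an expression tree, children before parents."""
    if isinstance(node, str):
        return []
    op_id, *children = node
    return [op for child in children for op in ops_of(child)] + [op_id]


def movement_cost(node, placement, staging, sizes):
    """Return (result location, elements moved) for one staging."""
    if isinstance(node, str):
        return placement[node], 0
    op_id, *children = node
    here, cost = staging[op_id], 0
    for child in children:
        child_loc, child_cost = movement_cost(child, placement, staging, sizes)
        cost += child_cost
        if child_loc != here:
            cost += sizes[name_of(child)]  # ship the child's value to `here`
    return here, cost


def best_staging(query, placement, sizes, result_at="R"):
    """Try every combination of execution locations; keep the cheapest."""
    ops = ops_of(query)
    best = None
    for locations in product(LOCATIONS, repeat=len(ops)):
        staging = dict(zip(ops, locations))
        loc, cost = movement_cost(query, placement, staging, sizes)
        if loc != result_at:
            cost += sizes[name_of(query)]  # return the final result to the user
        if best is None or cost < best[0]:
            best = (cost, staging)
    return best


# Hypothetical query (A %*% B) + C with made-up sizes in elements.
query = ("add", ("mul", "A", "B"), "C")
placement = {"A": "SciDB", "B": "SciDB", "C": "R"}
sizes = {"A": 30000, "B": 30000, "C": 100, "mul": 100, "add": 100}

if __name__ == "__main__":
    print(best_staging(query, placement, sizes))
```

With n operations and two components this loop visits 2^n stagings, so it is only meant to make the selection rule concrete; it restates the optimization-cost tradeoff from earlier in the deck.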
17
Staging example - 1
[Figure panels: R script; Original query; Logically equivalent queries; Logically equivalent plans.]
18
Staging example - 2
[Figure panels: R script; Original query; Logically equivalent queries; Logically equivalent plans.]
19
Staging example - 3
[Figure panels: R script; Original query; Logically equivalent queries; Logically equivalent plans.]
20
Staging example - 4
[Figure panels: R script; Original query; Logically equivalent queries; Logically equivalent plans. Numeric labels in the figure: 1000, 10, 30,000, 750.]
21
Staging example - 5
[Figure panels: R script; Original query; Logically equivalent queries; Logically equivalent plans. Numeric labels in the figure: 1000, 10, 30,000, 750. One item is marked "For execution".]
22
Additional technique: transforming queries
We are using an optimizer to minimize data movement.
What other techniques can we use from relational database query optimization?
Query transformations.
23
Consolidating transformation
A consolidating transformation reduces the number of data transfers.
[Figure: two equivalent expression trees, (i) and (ii), each producing a result at R from the inputs A, B, C and D. Every node is annotated with an execution location, (R) or (SciDB), and every edge with a size of 10.]
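The figure's exact trees are not recoverable from this transcript, so the following sketch uses a hypothetical instance of the same idea. It assumes A and C live in R, B and D live in SciDB, the operator is addition (so regrouping is valid), and the result is wanted at R. It then counts the fewest transfers any staging needs for two groupings of the same sum. All names are invented for the illustration.

```python
from itertools import product


def ops_of(node):
    """Operation ids of a binary expression tree, children before parents."""
    if isinstance(node, str):
        return []
    op_id, left, right = node
    return ops_of(left) + ops_of(right) + [op_id]


def transfers(node, placement, staging):
    """Return (result location, number of transfers) for one staging."""
    if isinstance(node, str):
        return placement[node], 0
    op_id, left, right = node
    here, count = staging[op_id], 0
    for child in (left, right):
        loc, n = transfers(child, placement, staging)
        count += n + (1 if loc != here else 0)
    return here, count


def min_transfers(tree, placement, result_at="R"):
    """Fewest transfers over all assignments of operations to R or SciDB."""
    ops = ops_of(tree)
    best = None
    for locations in product(("R", "SciDB"), repeat=len(ops)):
        loc, n = transfers(tree, placement, dict(zip(ops, locations)))
        if loc != result_at:
            n += 1  # the final result still has to reach the user
        best = n if best is None else min(best, n)
    return best


placement = {"A": "R", "C": "R", "B": "SciDB", "D": "SciDB"}
tree_i = ("top", ("left", "A", "B"), ("right", "C", "D"))    # (A + B) + (C + D)
tree_ii = ("top", ("left", "A", "C"), ("right", "B", "D"))   # (A + C) + (B + D)

if __name__ == "__main__":
    print(min_transfers(tree_i, placement), min_transfers(tree_ii, placement))
```

Grouping co-located inputs first lets each component combine its own data before anything is shipped, so the regrouped tree needs fewer transfers than the original.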
24
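Reductive transformation
A reductive transformation reduces the amount of data moved in a transfer.
[Figure: an expression tree in which [] is applied to the result of + is rewritten so that + is applied to the results of two [] operations.]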
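A minimal sketch of this rewrite follows. It assumes A lives in R, B lives in SciDB, the result is wanted at R, and the operator is an elementwise sum, for which subsetting before or after the addition gives the same answer. Plain Python lists stand in for arrays, and the helper names are hypothetical.

```python
def subset(values, idx):
    """Analogue of R's x[idx] for a Python list (0-based indices)."""
    return [values[i] for i in idx]


def add(x, y):
    """Elementwise sum of two equal-length lists."""
    return [a + b for a, b in zip(x, y)]


A = list(range(1000))           # assumed to live in R
B = list(range(1000, 2000))     # assumed to live in SciDB
idx = [0, 10, 20]

# Original form, (A + B)[idx], evaluated in R: all of B must be transferred.
moved_before = len(B)
result_before = subset(add(A, B), idx)

# Rewritten form, A[idx] + B[idx]: B is subset in SciDB before the transfer.
b_small = subset(B, idx)
moved_after = len(b_small)
result_after = add(subset(A, idx), b_small)

if __name__ == "__main__":
    print(moved_before, moved_after, result_before == result_after)
```

The rewrite is only safe for operators that commute with subsetting in this way; elementwise addition does, but something like a matrix product would need a different rule.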
25
Key data structure
MEMO data structure used during optimization
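The slide names the MEMO structure without describing it, so the following is only a simplified stand-in, not the structure Agrios uses. A plain dictionary records, for each (subexpression, location) pair, the cheapest known way to make that subexpression's value available at that location, so no pair is costed twice. The tree representation, sizes and names are assumptions carried over for illustration.

```python
LOCATIONS = ("R", "SciDB")


def best_cost(node, at, placement, sizes, memo):
    """Minimum number of elements moved so that node's value is available at `at`."""
    key = (node, at)
    if key in memo:
        return memo[key]          # already solved: reuse the stored answer
    name = node if isinstance(node, str) else node[0]
    if isinstance(node, str):
        cost = 0 if placement[node] == at else sizes[name]
    else:
        _, *children = node
        options = []
        for run_at in LOCATIONS:
            c = sum(best_cost(ch, run_at, placement, sizes, memo) for ch in children)
            if run_at != at:
                c += sizes[name]  # ship this operation's result to `at`
            options.append(c)
        cost = min(options)
    memo[key] = cost
    return cost


# Hypothetical query (A %*% B) + C with made-up sizes in elements.
query = ("add", ("mul", "A", "B"), "C")
placement = {"A": "SciDB", "B": "SciDB", "C": "R"}
sizes = {"A": 30000, "B": 30000, "C": 100, "mul": 100, "add": 100}

memo = {}
cheapest = best_cost(query, "R", placement, sizes, memo)

if __name__ == "__main__":
    print(cheapest, len(memo))
```

Because each subtree's best cost depends only on where its result is needed, the table holds at most one entry per node and location, instead of one per full combination of execution locations.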
26
Summary
• Tools and techniques from relational database query optimization can be used to minimize data movement
• Data movement is expensive, so it should be minimized
• Stagings vary in cost, and there is a movement-minimizing plan associated with the staging
• Transformations can further reduce data movement
27