Database Techniek
Query Optimization
(chapter 14)
Database Techniek – Query Optimization
Lecture 3
• Query Rewriting
– Equivalence Rules
• Query Optimization
– Dynamic Programming (System R/DB2)
– Heuristics (Ingres/Postgres)
• De-correlation of nested queries
• Result Size Estimation
– Practicum Assignment 2
Transformation of Relational Expressions
• Two relational algebra expressions are said to be equivalent if
on every legal database instance the two expressions generate
the same set of tuples
– Note: order of tuples is irrelevant
• In SQL, inputs and outputs are bags (multi-sets) of tuples
– Two expressions in the bag version of the relational algebra are
said to be equivalent if on every legal database instance the two
expressions generate the same bag of tuples
• An equivalence rule says that expressions of two forms are
equivalent
– Can replace expression of first form by second, or vice versa
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
$\Pi_{t_1}(\Pi_{t_2}(\ldots(\Pi_{t_n}(E))\ldots)) = \Pi_{t_1}(E)$
4. Selections can be combined with Cartesian products and theta joins.
a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
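Rules 1 and 2 can be checked on a concrete relation. A minimal Python sketch, assuming a relation is a list of dict rows; the relation E, its attributes, and the predicates theta1/theta2 are hypothetical:

```python
# Hypothetical relation E: rows as dicts; the attribute names and values are made up.
E = [
    {"name": "a", "city": "Brooklyn", "balance": 500},
    {"name": "b", "city": "Brooklyn", "balance": 50},
    {"name": "c", "city": "Queens", "balance": 900},
]

def select(pred, rel):
    """sigma_pred(rel): keep the rows that satisfy pred, preserving order and duplicates."""
    return [row for row in rel if pred(row)]

def theta1(row):
    return row["city"] == "Brooklyn"

def theta2(row):
    return row["balance"] > 100

conjunctive = select(lambda r: theta1(r) and theta2(r), E)  # sigma_{theta1 AND theta2}(E)
cascade = select(theta1, select(theta2, E))                 # rule 1: sigma_theta1(sigma_theta2(E))
commuted = select(theta2, select(theta1, E))                # rule 2: selections swapped

print(conjunctive == cascade == commuted)  # True
```

All three forms keep only the row for "a"; the cascaded forms simply evaluate the two predicates in separate passes.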
Algebraic Rewritings for Selection:
• $\sigma_{cond1}(\sigma_{cond2}(R)) = \sigma_{cond1 \text{ AND } cond2}(R)$
• $\sigma_{cond1 \text{ OR } cond2}(R) = \sigma_{cond1}(R) \cup \sigma_{cond2}(R)$ (with set union)
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. Natural join operations are associative:
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
Equivalence Rules for Joins
(Figure: join trees illustrating the commutative and associative rules.)
Equivalence Rules (Cont.)
7. For pushing down selections into a (theta) join we have the following cases:
– a. (push) When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined:
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
– b. (split) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$:
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
– (impossible) When the selection condition involves attributes of both $E_1$ and $E_2$ (it is a join condition).
Pushing Selection through Cartesian Product and Join
• $\sigma_{cond}(R \bowtie S) = \sigma_{cond}(R) \bowtie S$, if cond refers to R attributes only
• $\sigma_{cond}(R \bowtie S) = R \bowtie \sigma_{cond}(S)$: pushing into the right input requires that cond refers to S attributes only
• The same rewritings apply with $\times$ in place of $\bowtie$.
Projection Decomposition
A projection can be decomposed so that each input is first projected on the attributes needed above it, e.g.
$\pi_{total = X.tax * Y.price}(X \times Y) = \pi_{total = tax * price}(\pi_{tax}(X) \times \pi_{price}(Y))$
More Equivalence Rules
8. The projection operation distributes over the theta join operation as follows:
(a) if θ involves only attributes from $L_1 \cup L_2$:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
(b) Consider a join $E_1 \bowtie_{\theta} E_2$.
– Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively.
– Let $L_3$ be attributes of $E_1$ that are involved in join condition θ, but are not in $L_1 \cup L_2$, and
– Let $L_4$ be attributes of $E_2$ that are involved in join condition θ, but are not in $L_1 \cup L_2$.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$
Join Ordering Example
• For all relations $r_1$, $r_2$, and $r_3$,
$(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
• If $r_2 \bowtie r_3$ is quite large and $r_1 \bowtie r_2$ is small, we choose
$(r_1 \bowtie r_2) \bowtie r_3$
so that we compute and store a smaller temporary relation.
Join Ordering Example (Cont.)
• Consider the expression
$\Pi_{customer\text{-}name}((\sigma_{branch\text{-}city = \text{"Brooklyn"}}(branch)) \bowtie account \bowtie depositor)$
• Could compute $account \bowtie depositor$ first, and join the result with $\sigma_{branch\text{-}city = \text{"Brooklyn"}}(branch)$, but $account \bowtie depositor$ is likely to be a large relation.
• Since it is more likely that only a small fraction of the bank's customers have accounts in branches located in Brooklyn, it is better to compute $\sigma_{branch\text{-}city = \text{"Brooklyn"}}(branch) \bowtie account$ first.
The role of Query Optimization
SQL
→ parsing, normalization
→ logical algebra
→ logical query optimization: compare different relational algebra plans on result size (Practicum 2A)
→ physical query optimization: compare different execution algorithms on true cost (IO, CPU, cache)
→ physical algebra
→ query execution
Enumeration of Equivalent Expressions
• Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression
• Conceptually, the following step is repeated until no more expressions can be found:
– for each expression found so far, use all applicable equivalence rules, and add newly generated expressions to the set of expressions found so far
• The above approach is very expensive in space and time
• Time and space requirements are reduced by not generating all expressions
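The exhaustive procedure above can be sketched as a fixpoint computation. This is a minimal Python sketch that only knows join commutativity and associativity (rules 5 and 6); the tuple encoding ("join", left, right) and the helper names are illustrative assumptions, not an optimizer's API:

```python
def rewrites(e):
    """Yield every expression obtained from e by one rule application at one position.

    An expression is a base relation name (str) or a tuple ("join", left, right).
    """
    if isinstance(e, str):                     # base relation: no join rule applies
        return
    _, left, right = e
    yield ("join", right, left)                # commutativity: E1 ⋈ E2 = E2 ⋈ E1
    if not isinstance(left, str):              # associativity: (E1 ⋈ E2) ⋈ E3 -> E1 ⋈ (E2 ⋈ E3)
        _, a, b = left
        yield ("join", a, ("join", b, right))
    if not isinstance(right, str):             # associativity: E1 ⋈ (E2 ⋈ E3) -> (E1 ⋈ E2) ⋈ E3
        _, b, c = right
        yield ("join", ("join", left, b), c)
    for new_left in rewrites(left):            # apply the rules inside subexpressions too
        yield ("join", new_left, right)
    for new_right in rewrites(right):
        yield ("join", left, new_right)

def enumerate_equivalent(expr):
    """Apply all applicable rules to every expression found so far until nothing new appears."""
    found = {expr}
    todo = [expr]                              # expressions whose rewrites are not yet explored
    while todo:
        for new in rewrites(todo.pop()):
            if new not in found:
                found.add(new)
                todo.append(new)
    return found

start = ("join", ("join", "r1", "r2"), "r3")
print(len(enumerate_equivalent(start)))        # 12
```

For three relations the sketch finds all 12 join trees; for four it already finds 120, which illustrates why generating every expression quickly becomes too expensive.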
Finding A Good Join Order
• Consider finding the best join order for $r_1 \bowtie r_2 \bowtie \ldots \bowtie r_n$.
• There are $(2(n-1))!/(n-1)!$ different join orders for the above expression. With n = 7, the number is 665280; with n = 10, the number is greater than 17.6 billion!
• No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \ldots, r_n\}$ is computed only once and stored for future use.
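The formula equals the number of binary tree shapes with n leaves (the Catalan number $C_{n-1}$) times the n! ways to assign the relations to the leaves. A quick check with Python's standard library:

```python
from math import factorial

def num_join_orders(n: int) -> int:
    """(2(n-1))! / (n-1)!: number of join trees (any shape, any leaf order) for n relations."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 7, 10):
    print(n, num_join_orders(n))
# 3 12
# 7 665280
# 10 17643225600
```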
Dynamic Programming in Optimization
• To find the best join tree for a set of n relations:
– To find the best plan for a set S of n relations, consider all possible plans of the form $S_1 \bowtie (S - S_1)$, where $S_1$ is any non-empty proper subset of S.
– Recursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the $2^n - 2$ alternatives.
– When the plan for any subset is computed, store it and reuse it when it is required again, instead of recomputing it.
• This is dynamic programming.
Join Order Optimization Algorithm
procedure findbestplan(S)
  if (bestplan[S].cost ≠ ∞)
    return bestplan[S]
  // else bestplan[S] has not been computed earlier, compute it now
  if (S contains only 1 relation)
    set bestplan[S].plan and bestplan[S].cost based on the best way of accessing S
  else
    for each non-empty subset S1 of S such that S1 ≠ S
      P1 = findbestplan(S1)
      P2 = findbestplan(S - S1)
      A = best algorithm for joining results of P1 and P2
      cost = P1.cost + P2.cost + cost of A
      if cost < bestplan[S].cost
        bestplan[S].cost = cost
        bestplan[S].plan = "execute P1.plan; execute P2.plan;
                            join results of P1 and P2 using A"
  return bestplan[S]
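A runnable Python sketch of the same procedure, assuming a toy logical cost model in the spirit of this lecture: the cost of a join is its estimated output size, and a plan's cost is the sum of all its join outputs. The cardinalities, the selectivities, and the zero cost of accessing a base relation are hypothetical; `functools.lru_cache` plays the role of the bestplan array:

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics for a toy example (not taken from the lecture).
CARD = {"r1": 1000, "r2": 100, "r3": 10}                             # |r_i|
SEL = {frozenset({"r1", "r2"}): 0.01, frozenset({"r2", "r3"}): 0.1}  # join selectivities

def size(rels):
    """Estimated size of joining all relations in rels; pairs without a predicate act as a cross product."""
    n = 1.0
    for r in rels:
        n *= CARD[r]
    for pair, s in SEL.items():
        if pair <= rels:
            n *= s
    return n

@lru_cache(maxsize=None)          # like bestplan[S]: each subset is solved only once
def find_best_plan(S):
    """Return (cost, plan) for frozenset S; plan is a nested tuple (left, right)."""
    if len(S) == 1:
        (r,) = S
        return 0.0, r             # toy assumption: accessing a base relation costs 0
    best_cost, best_plan = float("inf"), None
    members = sorted(S)
    for k in range(1, len(members)):               # all non-empty proper subsets S1
        for chosen in combinations(members, k):
            S1 = frozenset(chosen)
            c1, p1 = find_best_plan(S1)
            c2, p2 = find_best_plan(S - S1)
            cost = c1 + c2 + size(S)               # toy cost of the join = its output size
            if cost < best_cost:
                best_cost, best_plan = cost, (p1, p2)
    return best_cost, best_plan

print(find_best_plan(frozenset(CARD)))  # (1100.0, ('r1', ('r2', 'r3')))
```

Under these made-up numbers the cheapest plan first joins r2 and r3 (100 tuples) and only then joins the result with r1.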
Left Deep Join Trees
• In left-deep join trees, the right-hand-side input for each join
is a relation, not the result of an intermediate join.
Cost of Optimization
• With dynamic programming, the time complexity of optimization with bushy trees is $O(3^n)$.
– With n = 10, this number is 59000 instead of 17.6 billion!
• Space complexity is $O(2^n)$.
• To find the best left-deep join tree for a set of n relations:
– Consider n alternatives with one relation as right-hand side input and the other relations as left-hand side input.
– Using the (recursively computed and stored) least-cost join order for each alternative on the left-hand side, choose the cheapest of the n alternatives.
• If only left-deep trees are considered, the time complexity of finding the best join order is $O(n 2^n)$.
– Space complexity remains at $O(2^n)$.
• Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n, generally < 10).
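The left-deep variant only lets a single base relation be the right-hand input, so each set S has n alternatives instead of $2^n - 2$. A minimal Python sketch under hypothetical toy statistics and a toy cost model (cost of a join = its estimated output size; accessing a base relation costs 0):

```python
from functools import lru_cache

# Hypothetical statistics for a toy example (not taken from the lecture).
CARD = {"r1": 1000, "r2": 100, "r3": 10}
SEL = {frozenset({"r1", "r2"}): 0.01, frozenset({"r2", "r3"}): 0.1}

def size(rels):
    """Estimated size of joining all relations in rels (independent join predicates)."""
    n = 1.0
    for r in rels:
        n *= CARD[r]
    for pair, s in SEL.items():
        if pair <= rels:
            n *= s
    return n

@lru_cache(maxsize=None)          # stores the least-cost left-deep order per subset
def best_left_deep(S):
    """Return (cost, plan) of the cheapest left-deep tree over frozenset S."""
    if len(S) == 1:
        (r,) = S
        return 0.0, r
    best = (float("inf"), None)
    for right in sorted(S):                                 # n alternatives for the right input
        left_cost, left_plan = best_left_deep(S - {right})  # reused, not recomputed
        cost = left_cost + size(S)
        if cost < best[0]:
            best = (cost, (left_plan, right))
    return best

print(best_left_deep(frozenset(CARD)))  # (1100.0, (('r3', 'r2'), 'r1'))
```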
Physical Query Optimization
• Minimizes absolute cost
– Minimize I/Os
– Minimize CPU, cache miss cost (main memory DBMS)
• Must consider the interaction of evaluation techniques when
choosing evaluation plans: choosing the cheapest algorithm for
each operation independently may not yield best overall
algorithm. E.g.
– merge-join may be costlier than hash-join, but may provide a sorted
output which reduces the cost for an outer level aggregation.
– nested-loop join may provide opportunity for pipelining
Physical Optimization: Interesting Orders
• Consider the expression $(r_1 \bowtie r_2 \bowtie r_3) \bowtie r_4 \bowtie r_5$
• An interesting sort order is a particular sort order of tuples that could be useful for a later operation.
– Generating the result of $r_1 \bowtie r_2 \bowtie r_3$ sorted on the attributes common with $r_4$ or $r_5$ may be useful, but generating it sorted on the attributes common only to $r_1$ and $r_2$ is not useful.
– Using merge-join to compute $r_1 \bowtie r_2 \bowtie r_3$ may be costlier, but may provide an output sorted in an interesting order.
• Not sufficient to find the best join order for each subset of
the set of n given relations; must find the best join order for
each subset, for each interesting sort order
– Simple extension of earlier dynamic programming algorithms
– Usually, number of interesting orders is quite small and doesn’t
affect time/space complexity significantly
Heuristic Optimization
• Cost-based optimization is expensive, even with
dynamic programming.
• Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query-tree by
using a set of rules that typically (but not in all cases)
improve execution performance:
– Perform selection early (reduces the number of tuples)
– Perform projection early (reduces the number of
attributes)
– Perform most restrictive selection and join operations
before other similar operations.
– Some systems use only heuristics, others combine
heuristics with partial cost-based optimization.
Steps in Typical Heuristic Optimization
1. Deconstruct conjunctive selections into a sequence of single
selection operations (Equiv. rule 1.).
2. Move selection operations down the query tree for the
earliest possible execution (Equiv. rules 2, 7a, 7b, 11).
3. Execute first those selection and join operations that will
produce the smallest relations (Equiv. rule 6).
4. Replace Cartesian product operations that are followed by a
selection condition by join operations (Equiv. rule 4a).
5. Deconstruct and move as far down the tree as possible lists
of projection attributes, creating new projections where
needed (Equiv. rules 3, 8a, 8b, 12).
6. Identify those subtrees whose operations can be pipelined, and execute them using pipelining.
Heuristic Join Order: the Wong-Youssefi algorithm (INGRES)
Sample TPC-H Schema
Nation(NationKey, NName)
Customer(CustKey, CName, NationKey)
Order(OrderKey, CustKey, Status)
LineItem(OrderKey, PartKey, Quantity)
Product(SuppKey, PartKey, PName)
Supplier(SuppKey, SName)
Find the names of suppliers that sell a product that appears in a line item of an order made by a customer who is in Canada:
SELECT SName
FROM Nation, Customer, Order, LineItem, Product, Supplier
WHERE Nation.NationKey = Customer.NationKey
AND Customer.CustKey = Order.CustKey
AND Order.OrderKey = LineItem.OrderKey
AND LineItem.PartKey = Product.PartKey
AND Product.SuppKey = Supplier.SuppKey
AND NName = 'Canada'
Challenges with Large Natural Join Expressions
For simplicity, assume that in the query
1. All joins are natural
2. Whenever two tables of the FROM clause have common attributes, we join on them
Consider right-index joins only ($\bowtie_{RI}$: a join that uses an index on its right input). One possible order:
$\pi_{SName}((((\sigma^{Index}_{NName=\text{"Canada"}}(Nation) \bowtie_{RI} Customer) \bowtie_{RI} Order) \bowtie_{RI} LineItem) \bowtie_{RI} Product) \bowtie_{RI} Supplier)$
Wong-Youssefi algorithm: assumptions and objectives
• Assumption 1 (weak): Indexes on all join attributes
(keys and foreign keys)
• Assumption 2 (strong): At least one selection creates
a small relation
– A join with a small relation results in a small relation
• Objective: Create sequence of index-based joins such
that all intermediate results are small
Hypergraphs
(Figure: hypergraph of the sample schema. Each node is an attribute (NationKey, NName, CustKey, CName, OrderKey, Status, PartKey, Quantity, SuppKey, SName, PName); each relation (Nation, Customer, Order, LineItem, Product, Supplier) is a hyperedge over its attributes.)
• relations are hyperedges
• two hyperedges for the same relation are possible
• each node is an attribute
• can be extended to non-natural equality joins by merging nodes
Small Relations/Hypergraph Reduction
"Nation" is small because it has the equality selection NName = "Canada".
Pick a small relation (and its conditions) to start the plan: $\sigma^{Index}_{NName=\text{"Canada"}}(Nation)$
(Figure: the hypergraph with Nation marked as small.)
(1) Remove the small relation (hypergraph reduction) and color as "small" any relation that joins with the removed "small" relation.
(2) Pick a small relation (and its conditions, if any) and join it with the small relation that has been reduced:
$\sigma^{Index}_{NName=\text{"Canada"}}(Nation) \bowtie_{RI} Customer$
(Figure: the reduced hypergraph, with Nation removed and Customer colored small.)
After a bunch of steps…
$\pi_{SName}((((\sigma^{Index}_{NName=\text{"Canada"}}(Nation) \bowtie_{RI} Customer) \bowtie_{RI} Order) \bowtie_{RI} LineItem) \bowtie_{RI} Product) \bowtie_{RI} Supplier)$
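The hypergraph reduction can be sketched as a greedy loop that starts from a small relation and keeps joining a relation that shares an attribute with the part already joined. This is a simplified Python sketch: it ignores selections other than the starting one, and the tie-break (first candidate in schema order) is an assumption of this sketch:

```python
# Sample schema from the slides: relation -> set of attributes.
SCHEMA = {
    "Nation": {"NationKey", "NName"},
    "Customer": {"CustKey", "CName", "NationKey"},
    "Order": {"OrderKey", "CustKey", "Status"},
    "LineItem": {"OrderKey", "PartKey", "Quantity"},
    "Product": {"SuppKey", "PartKey", "PName"},
    "Supplier": {"SuppKey", "SName"},
}

def greedy_small_join_order(schema, start):
    """Join order produced by repeated hypergraph reduction, starting at a small relation."""
    order = [start]
    reduced = set(schema[start])               # attributes of the part already joined
    remaining = [r for r in schema if r != start]
    while remaining:
        # A relation sharing an attribute with the reduced part joins with a small
        # result, so it is colored "small" (Assumption 2 of the algorithm).
        small = [r for r in remaining if schema[r] & reduced]
        if not small:
            raise ValueError("no small relation left; a Cartesian product would be needed")
        nxt = small[0]                         # hypothetical tie-break: schema order
        order.append(nxt)
        reduced |= schema[nxt]
        remaining.remove(nxt)
    return order

print(greedy_small_join_order(SCHEMA, "Nation"))
# ['Nation', 'Customer', 'Order', 'LineItem', 'Product', 'Supplier']
```

The resulting order matches the left-deep chain of right-index joins in the final plan.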
Some Query Optimizers
• The System R/Starburst: dynamic programming on left-deep
join orders. Also uses heuristics to push selections and
projections down the query tree.
• DB2 and SQL Server have cost-based optimizers
– SQL Server is transformation-based and also uses dynamic programming.
• MySQL optimizer is heuristics-based (rather weak)
• Heuristic optimization used in some versions of Oracle:
– Repeatedly pick “best” relation to join next
• Starting from each of n starting points. Pick best among these.
Optimizing Nested Subqueries
• SQL conceptually treats nested subqueries in the where clause as
functions that take parameters and return a single value or set of
values
– Parameters are variables from outer level query that are used in the
nested subquery; such variables are called correlation variables
• E.g.
select customer-name
from borrower
where exists (select *
from depositor
where depositor.customer-name =
borrower.customer-name)
• Conceptually, nested subquery is executed once for each tuple in
the cross-product generated by the outer level from clause
– Such evaluation is called correlated evaluation
– Note: other conditions in where clause may be used to compute a join
(instead of a cross-product) before executing the nested subquery
Optimizing Nested Subqueries (Cont.)
• Correlated evaluation may be quite inefficient since
– a large number of calls may be made to the nested query
– there may be unnecessary random I/O as a result
• SQL optimizers attempt to transform nested subqueries to joins
where possible, enabling use of efficient join techniques
• E.g.: earlier nested query can be rewritten as
select customer-name
from borrower, depositor
where depositor.customer-name = borrower.customer-name
– Note: above query doesn’t correctly deal with duplicates, can be
modified to do so as we will see
• In general, it is not possible/straightforward to move the entire
nested subquery from clause into the outer level query from
clause
– A temporary relation is created instead, and used in body of outer
level query
Optimizing Nested Subqueries (Cont.)
In general, SQL queries of the form below can be rewritten as shown
• Rewrite:
  select …
  from L1
  where P1 and exists (select *
                       from L2
                       where P2)
• To:
  create table t1 as
    select distinct V
    from L2
    where $P_2^1$;
  select …
  from L1, t1
  where P1 and $P_2^2$
– $P_2^1$ contains predicates in P2 that do not involve any correlation variables
– $P_2^2$ reintroduces predicates involving correlation variables, with relations renamed appropriately
– V contains all attributes used in predicates with correlation variables
Optimizing Nested Subqueries (Cont.)
• In our example, the original nested query would be transformed to
create table t1 as
select distinct customer-name
from depositor
select customer-name
from borrower, t1
where t1.customer-name = borrower.customer-name
• The process of replacing a nested query by a query with a join
(possibly with a temporary relation) is called decorrelation.
• Decorrelation is more complicated when
– the nested subquery uses aggregation, or
– when the result of the nested subquery is used to test for equality, or
– when the condition linking the nested subquery to the other
query is not exists,
– and so on.
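The effect of decorrelation, including the duplicate problem of the plain join, can be reproduced with Python's built-in sqlite3 module. The sample rows and the loan_number/account_number columns are hypothetical, and customer-name is written customer_name so that it is a valid SQL identifier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    INSERT INTO borrower  VALUES ('Hayes', 'L-15'), ('Jones', 'L-17'), ('Smith', 'L-23');
    INSERT INTO depositor VALUES ('Hayes', 'A-101'), ('Hayes', 'A-102'), ('Jones', 'A-217');
""")

# Correlated form: the EXISTS subquery references borrower.customer_name.
correlated = conn.execute("""
    SELECT customer_name FROM borrower
    WHERE EXISTS (SELECT * FROM depositor
                  WHERE depositor.customer_name = borrower.customer_name)
""").fetchall()

# Plain join: a borrower with several accounts is returned several times.
naive = conn.execute("""
    SELECT borrower.customer_name FROM borrower, depositor
    WHERE depositor.customer_name = borrower.customer_name
""").fetchall()

# Decorrelated form: materialize the distinct correlation values first, then join.
conn.execute("CREATE TEMP TABLE t1 AS SELECT DISTINCT customer_name FROM depositor")
decorrelated = conn.execute("""
    SELECT borrower.customer_name FROM borrower, t1
    WHERE t1.customer_name = borrower.customer_name
""").fetchall()

print(sorted(correlated), sorted(naive), sorted(decorrelated))
```

The correlated and decorrelated queries return Hayes and Jones once each, while the plain join returns Hayes twice because Hayes has two accounts.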
Practicum Assignment 2A
• Get the XML metadata description for TPC-H
– XSLT script for plotting histograms
• Take our solution for your second query (assignment 1)
• For each operator in the tree give:
– Selectivity
– Intermediate result size
– Short description of how you computed this
– Explanation of how to compute histograms on all result columns
• Sum all intermediate result sizes into total query cost
• DEADLINE: March 31
The Big Picture
1. Parsing and translation
2. Optimization
3. Evaluation
Optimization
• Query Optimization: Amongst all equivalent
evaluation plans choose the one with lowest cost.
– Cost is estimated using statistical information from the
database catalog
• e.g. number of tuples in each relation, size of tuples, etc.
• In this lecture we study logical cost estimation
– introduction to histograms
– estimating the number of tuples in the result with perfect and equi-height histograms
– propagation of histograms into result columns
– How to compute result size from width and #tuples
Cost Estimation
• Physical cost estimation
– predict I/O blocks, seeks, cache misses, RAM consumption, …
– Depends on the execution algorithm
• In this lecture we study logical cost estimation
– "the plan with smallest intermediate result tends to be best"
– need estimations for intermediate result sizes
• Histogram-based estimation (practicum, assignment 2)
– estimating the number of tuples in the result with perfect and equi-height histograms
– propagation of histograms into result columns
– compute result size as tuple-width * #tuples
Result Size
Result size = #tuples_max * selectivity * #columns
We disregard differences in column width.
• project:
– #columns = |projectlist|
– #tuples_max = |R|
• aggr:
– #columns = |groupbys| + |aggrs|
– #tuples_max = min(|R|, |g1| * .. * |gn|)
• join:
– #columns = |child1| + |child2|
– #tuples_max = |R1| * |R2|
• other:
– #columns stays equal with respect to the child
– #tuples_max = |R|
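These rules translate directly into a small estimator. A minimal Python sketch, assuming each operator is described by a name plus the tuple counts and column counts of its children; the function and parameter names are hypothetical:

```python
from math import prod

def result_size(op, child_tuples, child_columns, selectivity=1.0,
                projectlist=(), group_distincts=(), n_aggrs=0):
    """Estimate #tuples_max * selectivity * #columns for one operator.

    child_tuples / child_columns hold one entry per child (|R|, or |R1| and |R2|).
    """
    if op == "project":
        tuples_max, columns = child_tuples[0], len(projectlist)
    elif op == "aggr":
        # distinct(g1) * ... * distinct(gn); prod(()) == 1 covers a global aggregate
        tuples_max = min(child_tuples[0], prod(group_distincts))
        columns = len(group_distincts) + n_aggrs
    elif op == "join":
        tuples_max = child_tuples[0] * child_tuples[1]
        columns = child_columns[0] + child_columns[1]
    else:  # select, order, topn, ...: same columns and #tuples_max as the child
        tuples_max, columns = child_tuples[0], child_columns[0]
    return tuples_max * selectivity * columns

print(result_size("join", [100, 50], [3, 2], selectivity=0.01))               # about 250
print(result_size("aggr", [1000], [6], group_distincts=[10, 5], n_aggrs=1))   # 150.0
print(result_size("project", [1000], [6], projectlist=["a", "b", "c", "d"]))  # 4000.0
```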
Selectivity estimation
We can estimate the selectivities using:
• domain constraints
• min/max statistics
• histograms
Histograms
Buckets:
B = <min, max, total, distinct>
Leave min out (Bi.min = B(i-1).max)
Different Kinds of Histograms
• Perfect
• Equi-width
• Equi-height
In the practicum we use
• Perfect histograms, when distinct(R.a) < 25
• Equi-height histograms of 10 buckets, otherwise
– Not perfectly even-height: the value ranges of the buckets are disjoint
– (i.e. a frequent value is never split over several buckets, so it may create a bucket larger than the target height)
Perfect Histograms:
Equi-Selection
• s(R.a=C) = Bk.total * (1/|R|)
– in case there is a k with Bk.max = C
• s(R.a=C) = 0
– otherwise
(Figure: perfect histogram of R.a, one bar per value with height = total; the bar for value d gives s(R.a=d).)
Perfect Histograms:
Range-Selection
• s(R.a<C) = sum(Bi.total) * (1/|R|),
– for all 1 <= i < k with B(k-1).max < C <= Bk.max
(Figure: perfect histogram of R.a; the bars for all values below d together give s(R.a<d).)
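Both perfect-histogram formulas can be sketched in Python, assuming buckets are stored as (max, total) pairs sorted by max, with one bucket per distinct value; the histogram contents are hypothetical:

```python
from bisect import bisect_left

# Hypothetical perfect histogram of R.a: one bucket (max, total) per distinct value,
# sorted by max, so Bk.max is the value itself and Bk.total its frequency.
HIST = [(1, 30), (3, 50), (4, 15), (6, 5)]
R_SIZE = sum(total for _, total in HIST)   # |R| = 100

def perfect_s_eq(hist, r_size, c):
    """s(R.a = C) = Bk.total / |R| if some bucket has Bk.max == C, else 0."""
    maxes = [m for m, _ in hist]
    k = bisect_left(maxes, c)              # first bucket with max >= C
    if k < len(hist) and maxes[k] == c:
        return hist[k][1] / r_size
    return 0.0

def perfect_s_lt(hist, r_size, c):
    """s(R.a < C) = sum of Bi.total over the buckets before Bk, divided by |R|."""
    k = bisect_left([m for m, _ in hist], c)   # buckets 0..k-1 have max < C
    return sum(total for _, total in hist[:k]) / r_size

print(perfect_s_eq(HIST, R_SIZE, 3), perfect_s_eq(HIST, R_SIZE, 2), perfect_s_lt(HIST, R_SIZE, 4))
# 0.5 0.0 0.8
```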
Equi-Height Histograms:
Equi-Selection
• s(R.a=C) = avg_freq(Bk) * (1/|R|)
– in case there is a k with B(k-1).max < C <= Bk.max
– avg_freq(Bk) = Bk.total / Bk.distinct
• s(R.a=C) = 0
– otherwise
(Figure: equi-height histogram of R.a; the bucket containing value c gives s(R.a=c).)
Equi-Height Histograms:
Range-Selection
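• s(R.a<C) = ( sum(Bi.total) + freq_lt(Bk,C) ) * (1/|R|),
– for all 1 <= i < k with B(k-1).max < C <= Bk.max,
– where freq_lt(Bk,C) is the estimated number of tuples in Bk with a value below C.
(Figure: equi-height histogram of R.a; the buckets entirely below c plus the part of c's bucket below c give s(R.a<c).)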
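A Python sketch of both equi-height formulas. The slides do not say how freq_lt is computed; this sketch assumes values are spread uniformly over the bucket's range (B(k-1).max, Bk.max], and it assumes a known exclusive lower bound LOWER for the first bucket. The bucket contents are hypothetical:

```python
from bisect import bisect_left

# Hypothetical equi-height histogram of R.a: buckets (max, total, distinct) sorted by max.
# Bucket k covers the value range (B(k-1).max, Bk.max]; LOWER bounds the first bucket.
LOWER = 0
HIST = [(10, 100, 10), (20, 100, 5), (40, 100, 20)]
R_SIZE = sum(total for _, total, _ in HIST)     # |R| = 300

def find_bucket(hist, c):
    """Index k with B(k-1).max < C <= Bk.max, or None if C lies above the last bucket."""
    k = bisect_left([m for m, _, _ in hist], c)
    return k if k < len(hist) else None

def equi_height_s_eq(hist, lower, r_size, c):
    """s(R.a = C) = avg_freq(Bk) / |R| with avg_freq(Bk) = Bk.total / Bk.distinct."""
    k = find_bucket(hist, c)
    if c <= lower or k is None:
        return 0.0
    _, total, distinct = hist[k]
    return (total / distinct) / r_size

def freq_lt(hist, lower, k, c):
    """Assumed estimate of the tuples in Bk below C: uniform spread over the bucket range."""
    lo = hist[k - 1][0] if k > 0 else lower
    hi, total, _ = hist[k]
    return total * (c - lo) / (hi - lo)

def equi_height_s_lt(hist, lower, r_size, c):
    """s(R.a < C) = (sum of Bi.total for i < k + freq_lt(Bk, C)) / |R|."""
    if c <= lower:
        return 0.0
    k = find_bucket(hist, c)
    if k is None:
        return 1.0                              # every value lies below C
    below = sum(total for _, total, _ in hist[:k])
    return (below + freq_lt(hist, lower, k, c)) / r_size

print(equi_height_s_eq(HIST, LOWER, R_SIZE, 15), equi_height_s_lt(HIST, LOWER, R_SIZE, 15))
```

For C = 15 the equality estimate uses the average frequency 100/5 = 20 of the second bucket, and the range estimate adds the whole first bucket (100) to half of the second (50), giving 150/300 = 0.5.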
Select with And and Or
Assume no correlation between attributes:
• s(θa and θc) = s(θa) * s(θc)
• s(θa or θc) = s(θa) + (1-s(θa)) * s(θc)
Note: must normalize θa , θc into nonoverlapping conditions
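Under the independence assumption the two combination rules are one-liners. A minimal Python sketch (the function names are illustrative):

```python
def s_and(s_a: float, s_c: float) -> float:
    """s(θa and θc) = s(θa) * s(θc), assuming independent attributes."""
    return s_a * s_c

def s_or(s_a: float, s_c: float) -> float:
    """s(θa or θc) = s(θa) + (1 - s(θa)) * s(θc), assuming independent attributes."""
    return s_a + (1 - s_a) * s_c

print(s_and(0.5, 0.2), s_or(0.5, 0.2))
```

The or-rule equals 1 - (1 - s(θa)) * (1 - s(θc)): one minus the probability that neither condition holds.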
Foreign-key Join
Selectivity/Hitrate Estimation
Foreign-key constraint:
R1 matches at most once with R2
“each order matches on average with 7 lineitems” hitrate = 7
But what if R’2 (e.g. order) is an intermediate result?
• R’2 may have multiple key occurrences due to a previous join
• R’2 may have fewer key occurrences (missing keys) due to a select (or join).
Simple Approach (practicum):
Hitrate *= |R’2|/|R2|
Aggr(R,[g1..gn],[..])
Can only predict groupby columns and size:
• Expected result size =
min(|R|,distinct(g1) * …. * distinct(gn))
Histogram Propagation
• order: histogram stays identical
• project: histogram stays identical
» Expression (e.g. l_tax*l_price): not required for the practicum
» possible to use a Cartesian product on histograms, followed by expression evaluation and re-bucketizing.
• topn: not required for the practicum
» Use the last bucket (and work backwards) to take the highest N distinct values and their frequencies
• aggr: not required for the practicum
» Groupbys: distinct is the product of the distincts, freq = 1
» Aggregates: only possible for global aggregates (no groupbys)
• fk-join: multiply totals by join hitrate
» distinct = min(distinct, total) (this is a simplification!)
• select: multiply totals by selectivity
» distinct = min(distinct, total)
• select (selection attribute):
» Get totals/distincts from the subset of buckets
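The select and fk-join propagation rules differ only in the factor applied to the totals. A minimal Python sketch, assuming buckets are (max, total, distinct) tuples; scale_histogram and the sample buckets are hypothetical:

```python
def scale_histogram(buckets, factor):
    """Propagate a histogram through select (factor = selectivity) or fk-join (factor = hitrate).

    Totals are multiplied by factor; distinct is capped at the new total
    (distinct = min(distinct, total)), which is a simplification.
    """
    return [(mx, total * factor, min(distinct, total * factor))
            for mx, total, distinct in buckets]

after_select = scale_histogram([(10, 100, 10), (20, 50, 50)], 0.1)
print(after_select)
```

In the second bucket the distinct count drops from 50 to 5, because at most 5 tuples (and hence at most 5 distinct values) are expected to remain.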
Practicum Assignment 2
• Get the XML metadata description for TPC-H
– PS/PDF histograms also available
• Take our solution for your second query (assignment 1)
• For each operator in the tree give:
– Selectivity
– Intermediate result size
– Short description of how you computed this
– Explanation of how to compute histograms on all result columns
• Sum all intermediate result sizes into total query cost
• DEADLINE: March 31