Download E - Read

Document related concepts

IMDb wikipedia , lookup

Serializability wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Oracle Database wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

SQL wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Concurrency control wikipedia , lookup

Database wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Relational algebra wikipedia , lookup

Versant Object Database wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
PART 4
DATA STORAGE AND QUERY
Chapter 14
Query Optimization
Main parts in Chapter 14



§14.1 Overview
 why optimization needed
§14.2 Transformation of relational expressions
 equivalence rules
§14.4 Choice of evaluation plans — Query optimization
 cost-based optimization,§14.2
 heuristic optimization, §14.3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
3
§14.1 Overview
A SQL query statement may corresponds to several equivalent
expressions ( or query evaluation plans ), each of which may
have different costs
 Query optimization is needed to choice an efficient queryevaluation plan with lower costs


E.g.1.
select *
from r, s
where r.A = s.A
σr.A=s.A (r ╳ s) is much slower than r
May 2008
s
Database System Concepts - Chapter 14 Query Optimization -
4
§14.1 Overview

E.g.2 Find the names of all customers who have an account at
any branch located in Brooklyn

select customer-name
from branch, account, depositor
where branch.name=account.name
AND account.number =depositor.number
AND branch-city = Brooklyn

May 2008
Fig.14.1
Database System Concepts - Chapter 14 Query Optimization -
5
Fig. 14.1 Equivalent Expression
14.1 Overview (cont.)

The procedures of optimization

generating the equivalent expressions/query trees, by
transforming of relational algebra expressions according to
equivalence rules in§14.2

generating alternative evaluation plans, by annotating the
resultant equivalent expressions with implementation
algorithms for each operations in the expressions

choosing the optimal (that is, cheapest) or near-optimal
plan based on the estimated cost, by

cost-based optimization

heuristic optimization
May 2008
Database System Concepts - Chapter 14 Query Optimization -
7
14.1 Overview (cont.)

The cost of operations is needed to estimate based on statistical
information about relations
 e.g. number of tuples, number of distinct values for join
attributes, etc.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
8
§14.2 Transformation of Relational
Expressions

Definition. Two relational algebra expressions are equivalent, if
on every legal database instances, the two expression generate
the same set of tuples
May 2008
Database System Concepts - Chapter 14 Query Optimization -
9
14.2.1 Equivalence Rules
Fig.14.0.1 Schema diagram for the banking enterprise
May 2008
Database System Concepts - Chapter 14 Query Optimization -
10
14.2.1 Equivalence Rules (cont.)

Rule1.(选择串接律, 将1个选择操作分解为2个选择操作)
Conjunctive selection operations can be deconstructed into a
sequence of individual selections
   ( E )    (  ( E ))
1


2
1
2
e.g. refer to
Rule2. (选择交换律) Selection operations are commutative :
  (  ( E ))    (  ( E ))
1
May 2008
2
2
1
Database System Concepts - Chapter 14 Query Optimization -
11
14.2.1 Equivalence Rules (cont.)

Rule3. (投影串接律) Only the final operation in a sequence of
project operations is needed
 L1 ( L2 ( ( Ln ( E )) ))   L1 ( E )


May 2008
note: it is required that L1⊆ L2⊆ ….⊆ Ln
e.g.  {loan-number} {  {loan-number, amount} (loan) }
=  {loan-number} (loan)
Database System Concepts - Chapter 14 Query Optimization -
12
14.2.1 Equivalence Rules (cont.)
Rule4. (以连接操作代替选择和笛卡尔乘积) Selections can be
combined with Cartesian products and theta joins
a. (E1 ╳ E2) = E1  E2
note: definition of  join
b. 1(E1 2 E2) = E1 1 2 E2
 e.g.  branch-name=“Perryridge”( borrower
loan) =
borrower (branch-name=“Perryridge)  (borrower.loan-numbr=loan.loan-number)
loan
 By Rule4, the number of operations can be reduced, and the costs
of the left-hand expressions are less than that of the right-hand
expressions
 often used in heuristic query optimizing

May 2008
Database System Concepts - Chapter 14 Query Optimization -
13
14.2.1 Equivalence Rules (cont.)

Rule5. (连接操作可交换) Theta join operations are
commutative
E1  E2 = E2  E1
 Fig. 14.2
May 2008
Database System Concepts - Chapter 14 Query Optimization -
14
14.2.1 Equivalence Rules (cont.)

Rule6. (连接操作的结合率, associative)
a. natural join operations are associative
(E1
E2) E3 = E1
(E2 E3)
b. Theta joins are associative in the following manner:
(E1 1 E2) 2  3 E3 = E1 1 3 (E2 2 E3)
, where 2 involves attributes from only E2 and E3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
15
Rule 5
Rule 6a
Rule 7a
Fig.14.2 Pictorial Depiction of Equivalence Rules
14.2.1 Equivalence Rules (cont.)

Rule7. Selection operation distributes over theta-join(选择
操作对于连接操作的分配率,选择条件下移)
 a. when all the attributes in 0 involve only the attributes
of one of the expressions , e.g. E1, being joined
0E1


May 2008

E2) = (0(E1))

E2
b. when  1 involves only the attributes of E1, and 2
involves only the attributes of E2
1 E1
e.g. in Fig.14.0.2

E2) = (1(E1))

( (E2))
Database System Concepts - Chapter 14 Query Optimization -
17
When Rule7 is applied, the size of relations participating in
theta-join operation is reduced, thus evaluation costs are reduced

branch
account
Fig.14.0.2 An example of Rule7
14.2.1 Equivalence Rules (cont.)

Rule8. Projection operation distributes over theta-join(投影操作
对于连接操作的分配率,投影下移)
 a. L1 and L2 are the attributes of E1 and E2 respectively. if
join condition  involves only attributes in L1  L2, then
 L1  L2 ( E1.......  E2 )  ( L1 ( E1 ))......  ( L2 ( E2 ))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
19
L1
L2
 {branch-name, branch-city}  {branch-name, amount}
branch
loan
{branch-name, branch-city}  {branch-name, amount}
branch
E1:
branch (branch-name, branch-city, assets)
E2:
loan(loan-number, branch-name, amount)
Fig.14.0.3 An example of Rule 8a
loan
14.2.1 Equivalence Rules (cont.)
b. consider a join E1  E2
 let L1 and L2 be sets of attributes from E1 and E2,
respectively
 let L3 be attributes of E1 that are involved in join condition
, but are not in L1  L2, and
 let L4 be attributes of E2 that are involved in join condition
, but are not in L1  L2.
 L1  L2 ( E1..... E2 )   L1  L2 (( L1  L3 ( E1 ))......  ( L2  L4 ( E2 )))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
21
L1
L2
 {branch-city}  { loan-number}
E1:
branch (branch-name, branch-city, assets)
E2:
loan(loan-number, branch-name, amount)

 {branch-city}  { loan-number}
B. assets> L. amount
L3
branch
L4
loan
B. assets> L. amount
 {branch-city, assets}
branch
Fig.14.0.4 An example of Rule 8b
 {loan-number, amount}
loan
14.2.1 Equivalence Rules (cont.)

Rule9. The set operations union and intersection are
commutative
E1  E2 = E2  E1
E1  E2 = E2  E1
set difference is not commutative

Rule10. Set union and intersection are associative
(E1  E2)  E3 = E1  (E2  E3)
(E1  E2)  E3 = E1  (E2  E3)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
23
14.2.1 Equivalence Rules (cont.)

Rule11. (选择操作对于集合并、交、差的分配率) The
selection operation distributes over ,  and –.
  (E1 – E2) =  (E1) – (E2)
 similarly for  and  in place of –
  (E1 – E2) = (E1) – E2
 similarly for  in place of –, but not for 

Rule12. (投影操作对于集合并的分配率) The projection
operation distributes over union
L(E1  E2) = (L(E1))  (L(E2))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
24
14.2.2 Example One



Find the names of all customers who have an account at some
branch located in Brooklyn
customer-name(branch-city = “Brooklyn”
(branch (account depositor)))
Transformation using rule 7a.
customer-name
((branch-city =“Brooklyn” (branch))
(account depositor))
Performing the selection as early as possible reduces the size
of the relation to be joined.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
25
14.2.2 Example Two

For
customer-name((branch-city = “Brooklyn” (branch)


account)
depositor)
When we compute
t ←(branch-city = “Brooklyn” (branch) account )
we obtain a temporal relation t whose schema is:
T(branch-name, branch-city, assets, account-number, balance)
Push projections using equivalence rules 8a and 8b; eliminate
unneeded attributes from intermediate results to get:
 customer-name (
 account-number t(branch-name, branch-city, assets, accountnumber, balance)
depositor(customer-name, account-number))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
26
14.2.2 Example Three

Find the names of all customers with an account at a Brooklyn
branch whose account balance is over $1000.
customer-name((branch-city = “Brooklyn”  balance > 1000
(branch (account depositor)))

Transformation using join associatively (Rule 6a), refer to
Fig.14.3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
27
14.2.2 Example Three (cont.)
Fig. 14.3 Multiple transformations
May 2008
Database System Concepts - Chapter 14 Query Optimization -
28
14.2.3 Join Ordering


For all relations r1, r2, and r3,
(r1 r2) r3 = r1 (r2 r3 )
If r2 r3 is quite large and r1 r2 is small, we choose
(r1 r2) r3
so that we compute and store a smaller temporary relation.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
29
14.2.3 An Example



Consider the expression
customer-name ((branch-city = “Brooklyn” (branch))
account depositor)
Could compute account depositor first, and join result with
branch-city = “Brooklyn” (branch)
, but account depositor is likely to be a large relation.
Since it is more likely that only a small fraction of the bank’s
customers have accounts in branches located in Brooklyn, it is
better to compute
branch-city = “Brooklyn” (branch) account
first.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
30
§14.3 Estimating Statistics of
Expression Results

How to estimate the cost of an operation
 estimating the size of the resultant relations produced by the
operation p, according to statistics in the catalog about the
relations on which the operation is conducted


May 2008
e.g. customer-name(balance2500(account)
customer)
computing the cost of the operation in terms of disk access,
by means of the cost formulae shown in chapter13
 disk access is the predominant cost, and the number of
block transfers from disk is used as a measure of the
actual cost of evaluation.
Database System Concepts - Chapter 14 Query Optimization -
31
14.3.1 Statistics in Catalog for
Optimization






nr: number of tuples in a relation r
br: number of blocks containing tuples of r
lr: size of a tuple of r in bytes
fr: blocking factor of r
 the number of tuples of r that fit into one block
V(A, r): number of distinct values that appear in r for attribute
A
 same as the size of A(r)
Statistics about indices, such as heights of B+-trees, that is HTi,
and the number of leaf pages in the indices, are also in catalog
May 2008
Database System Concepts - Chapter 14 Query Optimization -
32
Statistics in Catalog for Optimization
(cont.)

If tuples of r are stored together physically in a file, then:




nr 
br 
f r 
May 2008
Database System Concepts - Chapter 14 Query Optimization -
33
Statistics in Catalog for Optimization
(cont.)

An example for statistical information
 faccount= 20 (20 tuples of account fit in one block)
 V(branch-name, account) = 50 (50 branches)
 V(balance, account) = 500 (500 different balance values)
 naccount = 10000 (account has 10,000 tuples)
 assume the following indices exist on account
+
 a primary, B -tree index for attribute branch-name
+
 a secondary, B -tree index for attribute balance
May 2008
Database System Concepts - Chapter 14 Query Optimization -
34
14.3.2 Selection Size Estimation


The size estimation of the result of a selection operation depends
on the selection predicates, that is single equality predicate,
single comparison predicate, and combination of predicates
Equality selection A=a (r)
 assuming uniform distribution of values in relations, and the
value a appears in attribute A of some record in r
 the selection is estimated to have
 nr / V(A, r) tuples
 nr /V(A, r) /fr blocks that these tuples occupy
May 2008
Database System Concepts - Chapter 14 Query Optimization -
35
14.3.2 Selection Size Estimation (cont.)

Comparison selections AV(r)
 the lowest and highest values of attributes, that is min(A, r)
and max(A, r), are available in catalog
 the estimated number of records/tuples satisfying comparison
condition A V
 0,
if v < min(A, r)
 nr,
if v  max(A, r)

, otherwise
v  min( A, r )
nr .
max( A, r )  min( A, r )

May 2008
in absence of statistical information cost is assumed to be
nr / 2
Database System Concepts - Chapter 14 Query Optimization -
36
14.3.2 Selection Size Estimation (cont.)

The selectivity of a condition i is the probability that a tuple in
the relation r satisfies i . If si is the number of satisfying
tuples in r, the selectivity of i is given by si /nr.
Conjunction: 1 2. . .  n (r). The estimate for number of

tuples in the result is: n  s1  s2  . . .  sn
r
nrn
Disjunction:1 2 . . .  n (r). Estimated number of tuples:


sn 
s1
s2

nr  1  (1  )  (1  )  ...  (1  ) 
nr
nr
nr 


Negation: (r). Estimated number of tuples:
nr – size((r))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
37
14.4 Choice of Evaluation Plans
— Query optimization

How to generate the evaluation plan for an relational algebra
expression with lower or the lowest cost
 generating of the optimized query trees
 selecting
 implementation algorithms for each operations in the tree
 pipeline or materialization strategy for the expression

E.g. Fig.14.5
May 2008
Database System Concepts - Chapter 14 Query Optimization -
38
Fig.14.5 An evaluation plan
14.4 Choice of Evaluation Plans
(cont.)



Query optimization
 choosing the evaluation plan with lowest or the lower cost
Query optimization is conducted by two strategies/approaches
 cost-based optimization,§14.4.2
 heuristic optimization, §14.4.3!
Practical query optimizers incorporate elements of these two
approaches
May 2008
Database System Concepts - Chapter 14 Query Optimization -
40
14.4.2 Cost-based Optimization

Cost-based optimization
 search all the possible plans and choose the best plan
 a optimal plan with the minimum costs can be obtained

Cost-based optimization is expensive, though results optimal plan
May 2008
Database System Concepts - Chapter 14 Query Optimization -
41
14.4.3 Heuristic Optimization

Heuristic optimization
 uses heuristics (or heuristic rules) to choose a better plan
 a suboptimal plan with lower costs can be obtained

Principles: transforms the query-tree by heuristics to reduce cost
 perform selection early to reduce the number of tuples
 perform projection early to reduces the number of attributes
 substitute Cartesian product and selection with join
May 2008
Database System Concepts - Chapter 14 Query Optimization -
42
14.4.3 Heuristic Optimization (cont.)

May 2008
perform most restrictive selection and join operations before
other similar operations
 the most restrictive operations generate resulting relations
with smallest size
Database System Concepts - Chapter 14 Query Optimization -
43
Steps in Heuristic Optimization



Step1 使用rule1,将conjunctive selection 分解为多个单独
的选择操作,以使单个选择操作尽可能沿查询树下移(尽
早执行选择操作,以减少中间计算结果)
Step2 根据选择操作的交换率和分配率,利用rule2, rule7.a,
rule7.b, rule11, 将查询树上的每个选择操作尽可能移向叶
节点,以便尽早执行选择操作
Step3 根据连接操作的结合律和交换率,使用rule6,重新
安排查询树中的叶结点,使得具有restrictive selection特征
的叶结点先执行

restrictive selection:执行此操作后,产生的结果关系
最小(所含元组最少)

refer to Fig. 14.3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
44
Steps in Heuristic Optimization
(cont.)



Step4 利用rule4.a, 以连接操作代替相邻的选择和笛卡尔乘
积操作
Step5 利用rule3, 8.a, 8.b,12, 将查询树上的投影操作尽可能
下移,以便尽早执行投影操作,减少中间计算结果
Step6 将最后的查询树分解为多个子树,使子树中的各操
作可以采用流水线方式执行,以减少对外设的访问次数
e.g. Fig.14.5
重点:前5步 !!!
May 2008
Database System Concepts - Chapter 14 Query Optimization -
45
Example One



Given
 S(S#, sname, age, sex)
 C(C#, cname, teacher)
 SC(S#, C#, grade)
Find all the students, who are male, and get grades more than
90 when they learn some one course
Step1. SQL statement
selection conditions
 SELECT DISTINCT sname
FROM
S, SC
WHERE S.s# = SC.s# AND sex=M AND grade > 90
join condition
May 2008
Database System Concepts - Chapter 14 Query Optimization -
46
Example One (cont.)

Principles
select A1, A2, …, An
from r1, r2, …, rn
where P
corresponds to
∏A1, A2, …, An (σP ( r1 ╳ r2 ╳ …╳ rn ) )
May 2008
Database System Concepts - Chapter 14 Query Optimization -
47
Example One (cont.)

Step2. Initial Relational algebra expression
 E1 =
sname( s.s#=sc.s# ∧ sex=M ∧grade>90 (S ╳ SC ) )

Step3. Initial query tree
 Fig. 14.0.5 (a), E1

Step4. By means of Rule1, Rule7b, distribute  operations over
relation S and SC
 Fig. 14.0.5 (b), E2
May 2008
Database System Concepts - Chapter 14 Query Optimization -
48
sname
sname
s.s#=sc.s#
s.s#=sc.s# ∧ sex=M ∧grade>90
╳
╳
sex=M
grade>90
SC (S#, C#, grade)
S
(S#, sname, age, sex)
(a) Initial query tree
E1
S
SC
(b)E2
Example One (cont.)



Step5. By means of Rule4.a, replace selection and Cartesian
product with natural join
 Fig. 14.0.5 (c), E3
Step6. !! By means of Rule8b, distribute  operations over
relation S and SC
 Fig. 14.0.5 (d),
The final optimized expression E4 that is equivalent to initial
expression E1 is
 sname {sname, S# (sex=M (S))
grade>90 (grade, S# (SC)) }
May 2008
Database System Concepts - Chapter 14 Query Optimization -
50
sname
sname
s.s#=sc.s#
sex=M
grade>90
SC (S#, C#, grade)
S
s.s#=sc.s#
Πsname, S#
ΠS#
sex=M
grade>90
S
SC(S#, C#, grade)
(S#, sname, age, sex)
(S#, sname, age, sex)
(c)E3
(d)E4
Example Two

Consider the following insurance database, where the primary
keys are underlined,
 Person(driver-id, name, address)
 Car(license, model, year)
 Accident(report-number, date, location)
 Participated(driver-id , license, report-number, damageamount)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
52
Example Two (cont.)

Problems
 Give a SQL statement to find out the driver name, license,
report-number of the accidents which happened in Beijing
and before May 3, 2008
 Give the initial query tree for this query, and construct an
optimized and equivalent relational algebra expression for it
by means of heuristic optimization
May 2008
Database System Concepts - Chapter 14 Query Optimization -
53
Example Two (cont.)


Step1. SQL statement
 Select name, license, Accident.report-number
From Person, Participate, Accident
Where Person.driver_id = Participate.driver_id AND
Accident.report_number = Participate.report_number
AND Location=Beijing AND date < 5/3/2008
Step2. Initial expression
 name, license, Accident.report-number
( Person.driver_id = Participate.driver_id
AND Accident.report_number =
Participate.report_number AND Location=Beijing AND date < 5/3/2003
(Person ╳ Participate ╳ Accident )
May 2008
Database System Concepts - Chapter 14 Query Optimization -
54
 name, license, Accident.report-number
 Person.driver_id = Participate.driver_id
AND
Accident.report_number = Participate.report_number AND
Location=Beijing AND date < 5/3/2003
╳
╳
Person
Participate
Accident
(a) initial query tree
 name, license, Accident.report-number
Accident.report.number = Participate.report_number
 name, license,
Accident.report_number
report-number
Person.driver_id = Participate.driver_id
 driver_id,
name
Location=Beijing
 driver_id,
AND date < 5/3/2003
license,
report-number
Participate
Accident
(report-number, date ,location)
Person
(driver-id, license, report-number, damage-amount)
(driver-id, name, address)
(b) optimal query tree
Example Three

Consider the following relations in banking enterprise database,
where the primary keys are underlined
 branch (branch-name, branch-city, assets),
 loan ( loan-number, branch-name , amount)
 borrower( customer-name, loan-number, borrow-date)
 customer (customer-name, customer-street, customer-city)
 account (account-number, branch-name, balance)
 depositor (customer-name, account-number , deposit-date)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
57
Example Three (cont.)

For the query “ Find the names of all customers who have an
loan at any branch that is located in Brooklyn and have assets
more than $100,000, requiring that loan-amount is less than
$1000”
 give an SQL statement for this query
 given a initial query tree for the query, and convert it into an
optimized query tree by means of heuristic optimization
May 2008
Database System Concepts - Chapter 14 Query Optimization -
58
Example Three (cont.)

May 2008
SQL
select customer-name
from borrower, loan, branch
where loan.loan-number=borrower.loan-number
and branch.branch-name=loan.branch-name
and branch-city=”Brooklyn”
and assets>100000 and amount<1000
Database System Concepts - Chapter 14 Query Optimization -
59
 customer-name
 customer- name, loan-number
loan.loan-number
loan.loan-number
branch-name
,branch-name
borrower ( customer-name,
loan-number, borrow-date)
 amount< 100
loan( loan-number, branchname , amount)
 branch-city=”Brooklyn”
AND assets>100000
Branch (branch-name,
branch-city, assets)
Example Four (cont.)




给定如下关系数据库
Employee(Employee#, Name, Address, Super-E#, E-D#)
Department(D#, Dname, manager#, depart-location)
Project(P#, Pname, Plocation, P-D#)
查询:对于每个在“哈尔滨”进行的工程项目,列出其工程
项目号P# ,工程所属部门号P-D#, 该部门领导的名字Name
和地址Address
要求:写出该查询的SQL语句;转换为关系代数表达式并利
用启发式方法进行查询优化;给出优化后的关系代数表达式。
Select P#, P-D#, Name, Address)
From Project, Department, Employee
Where Plocation =哈尔滨 AND P-D#= D#
AND manager# = Employee#
 P#, P-D#, Name, Address
 employee#, name, address
Employee
 P#, P-D#, manager#
 P#, P-D#
 P-location=“哈尔滨”
Project
D#, manager#
Department
作业







Consider the following schema, where the primary keys are
underlined ,
Suppliers (supplier-id, supplier-name, city, telephone, address)
Parts ( parts-id, parts-name, parts-color)
Catalog (supplier-id, parts-id, price )
.
(1) Give an SQL statement to find out the name and telephone of
the suppliers who supply a red part whose price is below $2000.
(2) Translate this SQL statement into an initial query tree, and
give an optimized query tree for it, by means of heuristic query
optimization.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
63
Appendix A Sybase 平台下
的查询执行计划优化

Select max(BscpID)
From BSC
May 2008
Database System Concepts - Chapter 14 Query Optimization -
64
Appendix A Sybase 平台下
的查询执行计划优化(续)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
65
Appendix A Sybase 平台下
的查询执行计划优化(续)

查询实验内容
 单表、多表(如3张表)的查询、插入、删除、修改所
对应的查询执行计划比较
 有无索引、索引类型对查询、插入、删除、修改等操作
的代价的影响
 查询结果相同、但表达形式不一样的多个等价的查询、
删除、插入、修改语句的查询执行计划的比较
例如。
delete from loan
where amount between 0 and 50
May 2008
Database System Concepts - Chapter 14 Query Optimization -
67
Appendix A Sybase 平台下
的查询执行计划优化(续)
delete from loan
where amount in (
select amount
from loan
where amount >= 0 and amount <= 50)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
68
Appendix B
DDBS查询代价与数据分布的关系
分布式数据库系统DDBS中, 数据在各站点的分布和站点间
通讯代价/传输时间将影响到整个查询代价
 例.
4
 S(S#,SNAME,AGE,SEX) , 10 个元组,站点A
5
 C(C#,CNAME,TEACHER), 10 个元组,站点B
 SC(S#,C#,GRADE),
106个元组,站点A
4
 元组平均长度100bit,站点间传输速率v = 10 bit/s ,通
信延迟d=1s。假设无副本
 查询要求
 在支持分片透明性的DDBS(分布在A和B 2个站点)中,
查询“所有选修“MATH“的男生的学号和姓名”

May 2008
Database System Concepts - Chapter 14 Query Optimization -
69
Appendix B
DDBS查询代价与数据分布的关系 (续)


SQL(全局)查询语句
 SELECT
S#,SNAME
FROM S,SC, C
WHERE S.S# = SC.S# AND SC.C# = C.C# (连接条件)
AND SEX=M AND CNAME= MATH (选择条件)
查询代价(query cost)
 只考虑通信代价
TC (x) =1s ╳ 传输次数 + 传输的元组bit数 ÷ 104 bit/s
May 2008
Database System Concepts - Chapter 14 Query Optimization -
70
Appendix B
DDBS查询代价与数据分布的关系 (续)


策略1
 将关系C传到A地,在A地进行查询处理
5
4
3
 TC1 = 1s + 10 ╳100 bit ÷ 10 bit/s ≈ 10 s ≈ 16.7 min
策略2
 将关系S和SC传到B地,在B地进行查询处理
4
6
4
 TC2 = 2s + (10 +10 ) ╳ 100 bit ÷ 10 bit/s ≈10100
s≈28h
May 2008
Database System Concepts - Chapter 14 Query Optimization -
71
Appendix B
DDBS查询代价与数据分布的关系 (续)
策略3
5
 A地从关系SC、S中求出所有男学生的成绩元组,共10
个
 根据元组中的C#, 向B地的C关系询问(应答过程)
CNAME=MATH?共询问105次
5
 TC3 ≈ 2 ╳ 10 ╳ 1 ≈ 2.3天
 策略4
 在B地从关系C中求出CNAME=MATH的课程元组,假设
有10个
 根据C#询问A地的S、SC的连接,核实是否为选修MATH
(SC.C#= C#?)的男生(S.SEX=M?)

May 2008
Database System Concepts - Chapter 14 Query Optimization -
72
Appendix B
DDBS查询代价与数据分布的关系 (续)



TC4 ≈ 2 ╳ 10 ╳ 1 ≈ 20s
策略5
5
 在A地从关系S、SC中求出男生的的课程元组,有10 个
 将结果传到B地,在B地执行查询
5
4
3
 TC5= 1s + 10 ╳ 100 bit ÷ 10 bit/s ≈ 10 s ≈ 16.7 m
策略6
 在B地从关系C中求出CNAME=MATH的课程元组,有
10个
 将结果传到A地,在A地执行查询
4
 TC6 = 1s + 10 ╳100 bit ÷ 10 bit/s ≈ 1s
May 2008
Database System Concepts - Chapter 14 Query Optimization -
73
Appendix C DB2查询优化
对于IBM DB2来说,它有非常好的优化算法,大致是基于代
价估算后选择代价最小的策略。
 代价估算的方法
 搜集系统目录相关数据和要被执行的SQL语句,然后选
择若干方案进行代价计算。
 DB2应用程序开发过程中部署测试环境时,可以专门设置虚
拟的但是和将来实际运行环境比较一致的系统目录数据来模
拟真实环境下的DB2的SQL执行策略。
 对于DB2的优化器也可以设置不同的优化级别,可以使用
SET CURRENT QUERY OPTION命令或者某些子句进行相应
设置,其中
 0:最小的优化
 1:相当于DB2/6000 Version 1中的优化加一些低成本的
优化

May 2008
Database System Concepts - Chapter 14 Query Optimization -
74
Appendix C DB2查询优化(续)
2:使用opt level 5的功能,但连接优化上进行了一定简
化
 3:中等优化,大约相当于DB2 for Z/OS
 5:使用一定数量的优化,包括Heuristic Rules(这是默
认的值)
 7:使用一定数量的优化,不包括Heuristic Rules
 9:使用所有可用的优化技术

May 2008
Database System Concepts - Chapter 14 Query Optimization -
75
Appendix C DB2查询优化(续)
有4种查询优化策略访问工具,分别能够针对静态SQL、动
态SQL、CLI应用程序、DRDA应用程序请求者(DRDA AR)
提供图形界面或者文本界面查询访问或者提供基本信息。
 第一种。基本的是直接访问Explain Table(可以直接使用
SELECT语句来访问它,所以这是一个各种OS平台上都可以
采取的措施),它的缺点是没有提供任何界面,必须由人去
理解表中的数据。另外只能对执行的SQL来处理,如果是未
执行的静态包则无法产生信息。
 第二种。基于图形界面的Visual Explain(主要用于windows
平台,甚至可以在windows平台上对诸如UNIX上的DB2系统
上的SQL语句进行这种图形化分析),它不支持DADA AR,
也不支持分析未执行的静态包,不能一次分析多条SQL语句,
但最直观。 Database System Concepts - Chapter 14 Query Optimization - 76
May 2008

Appendix C DB2查询优化(续)


第三种。db2exfmt以预先定义好的报告格式获得explain
table的信息,它不支持DADA AR,也不支持分析未执行的
静态包,但支持一次分析多条语句。
第四种。db2expln允许我们查看作为静态绑定包(static
package)的一部分访问策略,它提供一种文本格式描述的
策略。不支持CLI或者DRDA AR,可以一次分析多条语句。
May 2008
Database System Concepts - Chapter 14 Query Optimization -
77
Have
a
break