* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download E - Read
Serializability wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
Relational algebra wikipedia , lookup
Versant Object Database wikipedia , lookup
PART 4
DATA STORAGE AND QUERY
Chapter 14
Query Optimization
Main parts in Chapter 14
§14.1 Overview
why optimization needed
§14.2 Transformation of relational expressions
equivalence rules
§14.4 Choice of evaluation plans — Query optimization
cost-based optimization,§14.2
heuristic optimization, §14.3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
3
§14.1 Overview
A SQL query statement may corresponds to several equivalent
expressions ( or query evaluation plans ), each of which may
have different costs
Query optimization is needed to choice an efficient queryevaluation plan with lower costs
E.g.1.
select *
from r, s
where r.A = s.A
σr.A=s.A (r ╳ s) is much slower than r
May 2008
s
Database System Concepts - Chapter 14 Query Optimization -
4
§14.1 Overview
E.g.2 Find the names of all customers who have an account at
any branch located in Brooklyn
select customer-name
from branch, account, depositor
where branch.name=account.name
AND account.number =depositor.number
AND branch-city = Brooklyn
May 2008
Fig.14.1
Database System Concepts - Chapter 14 Query Optimization -
5
Fig. 14.1 Equivalent Expression
14.1 Overview (cont.)
The procedures of optimization
generating the equivalent expressions/query trees, by
transforming of relational algebra expressions according to
equivalence rules in§14.2
generating alternative evaluation plans, by annotating the
resultant equivalent expressions with implementation
algorithms for each operations in the expressions
choosing the optimal (that is, cheapest) or near-optimal
plan based on the estimated cost, by
cost-based optimization
heuristic optimization
May 2008
Database System Concepts - Chapter 14 Query Optimization -
7
14.1 Overview (cont.)
The cost of operations is needed to estimate based on statistical
information about relations
e.g. number of tuples, number of distinct values for join
attributes, etc.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
8
§14.2 Transformation of Relational
Expressions
Definition. Two relational algebra expressions are equivalent, if
on every legal database instances, the two expression generate
the same set of tuples
May 2008
Database System Concepts - Chapter 14 Query Optimization -
9
14.2.1 Equivalence Rules
Fig.14.0.1 Schema diagram for the banking enterprise
May 2008
Database System Concepts - Chapter 14 Query Optimization -
10
14.2.1 Equivalence Rules (cont.)
Rule1.(选择串接律, 将1个选择操作分解为2个选择操作)
Conjunctive selection operations can be deconstructed into a
sequence of individual selections
( E ) ( ( E ))
1
2
1
2
e.g. refer to
Rule2. (选择交换律) Selection operations are commutative :
( ( E )) ( ( E ))
1
May 2008
2
2
1
Database System Concepts - Chapter 14 Query Optimization -
11
14.2.1 Equivalence Rules (cont.)
Rule3. (投影串接律) Only the final operation in a sequence of
project operations is needed
L1 ( L2 ( ( Ln ( E )) )) L1 ( E )
May 2008
note: it is required that L1⊆ L2⊆ ….⊆ Ln
e.g. {loan-number} { {loan-number, amount} (loan) }
= {loan-number} (loan)
Database System Concepts - Chapter 14 Query Optimization -
12
14.2.1 Equivalence Rules (cont.)
Rule4. (以连接操作代替选择和笛卡尔乘积) Selections can be
combined with Cartesian products and theta joins
a. (E1 ╳ E2) = E1 E2
note: definition of join
b. 1(E1 2 E2) = E1 1 2 E2
e.g. branch-name=“Perryridge”( borrower
loan) =
borrower (branch-name=“Perryridge) (borrower.loan-numbr=loan.loan-number)
loan
By Rule4, the number of operations can be reduced, and the costs
of the left-hand expressions are less than that of the right-hand
expressions
often used in heuristic query optimizing
May 2008
Database System Concepts - Chapter 14 Query Optimization -
13
14.2.1 Equivalence Rules (cont.)
Rule5. (连接操作可交换) Theta join operations are
commutative
E1 E2 = E2 E1
Fig. 14.2
May 2008
Database System Concepts - Chapter 14 Query Optimization -
14
14.2.1 Equivalence Rules (cont.)
Rule6. (连接操作的结合率, associative)
a. natural join operations are associative
(E1
E2) E3 = E1
(E2 E3)
b. Theta joins are associative in the following manner:
(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3)
, where 2 involves attributes from only E2 and E3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
15
Rule 5
Rule 6a
Rule 7a
Fig.14.2 Pictorial Depiction of Equivalence Rules
14.2.1 Equivalence Rules (cont.)
Rule7. Selection operation distributes over theta-join(选择
操作对于连接操作的分配率,选择条件下移)
a. when all the attributes in 0 involve only the attributes
of one of the expressions , e.g. E1, being joined
0E1
May 2008
E2) = (0(E1))
E2
b. when 1 involves only the attributes of E1, and 2
involves only the attributes of E2
1 E1
e.g. in Fig.14.0.2
E2) = (1(E1))
( (E2))
Database System Concepts - Chapter 14 Query Optimization -
17
When Rule7 is applied, the size of relations participating in
theta-join operation is reduced, thus evaluation costs are reduced
branch
account
Fig.14.0.2 An example of Rule7
14.2.1 Equivalence Rules (cont.)
Rule8. Projection operation distributes over theta-join(投影操作
对于连接操作的分配率,投影下移)
a. L1 and L2 are the attributes of E1 and E2 respectively. if
join condition involves only attributes in L1 L2, then
L1 L2 ( E1....... E2 ) ( L1 ( E1 ))...... ( L2 ( E2 ))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
19
L1
L2
{branch-name, branch-city} {branch-name, amount}
branch
loan
{branch-name, branch-city} {branch-name, amount}
branch
E1:
branch (branch-name, branch-city, assets)
E2:
loan(loan-number, branch-name, amount)
Fig.14.0.3 An example of Rule 8a
loan
14.2.1 Equivalence Rules (cont.)
b. consider a join E1 E2
let L1 and L2 be sets of attributes from E1 and E2,
respectively
let L3 be attributes of E1 that are involved in join condition
, but are not in L1 L2, and
let L4 be attributes of E2 that are involved in join condition
, but are not in L1 L2.
L1 L2 ( E1..... E2 ) L1 L2 (( L1 L3 ( E1 ))...... ( L2 L4 ( E2 )))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
21
L1
L2
{branch-city} { loan-number}
E1:
branch (branch-name, branch-city, assets)
E2:
loan(loan-number, branch-name, amount)
{branch-city} { loan-number}
B. assets> L. amount
L3
branch
L4
loan
B. assets> L. amount
{branch-city, assets}
branch
Fig.14.0.4 An example of Rule 8b
{loan-number, amount}
loan
14.2.1 Equivalence Rules (cont.)
Rule9. The set operations union and intersection are
commutative
E1 E2 = E2 E1
E1 E2 = E2 E1
set difference is not commutative
Rule10. Set union and intersection are associative
(E1 E2) E3 = E1 (E2 E3)
(E1 E2) E3 = E1 (E2 E3)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
23
14.2.1 Equivalence Rules (cont.)
Rule11. (选择操作对于集合并、交、差的分配率) The
selection operation distributes over , and –.
(E1 – E2) = (E1) – (E2)
similarly for and in place of –
(E1 – E2) = (E1) – E2
similarly for in place of –, but not for
Rule12. (投影操作对于集合并的分配率) The projection
operation distributes over union
L(E1 E2) = (L(E1)) (L(E2))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
24
14.2.2 Example One
Find the names of all customers who have an account at some
branch located in Brooklyn
customer-name(branch-city = “Brooklyn”
(branch (account depositor)))
Transformation using rule 7a.
customer-name
((branch-city =“Brooklyn” (branch))
(account depositor))
Performing the selection as early as possible reduces the size
of the relation to be joined.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
25
14.2.2 Example Two
For
customer-name((branch-city = “Brooklyn” (branch)
account)
depositor)
When we compute
t ←(branch-city = “Brooklyn” (branch) account )
we obtain a temporal relation t whose schema is:
T(branch-name, branch-city, assets, account-number, balance)
Push projections using equivalence rules 8a and 8b; eliminate
unneeded attributes from intermediate results to get:
customer-name (
account-number t(branch-name, branch-city, assets, accountnumber, balance)
depositor(customer-name, account-number))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
26
14.2.2 Example Three
Find the names of all customers with an account at a Brooklyn
branch whose account balance is over $1000.
customer-name((branch-city = “Brooklyn” balance > 1000
(branch (account depositor)))
Transformation using join associatively (Rule 6a), refer to
Fig.14.3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
27
14.2.2 Example Three (cont.)
Fig. 14.3 Multiple transformations
May 2008
Database System Concepts - Chapter 14 Query Optimization -
28
14.2.3 Join Ordering
For all relations r1, r2, and r3,
(r1 r2) r3 = r1 (r2 r3 )
If r2 r3 is quite large and r1 r2 is small, we choose
(r1 r2) r3
so that we compute and store a smaller temporary relation.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
29
14.2.3 An Example
Consider the expression
customer-name ((branch-city = “Brooklyn” (branch))
account depositor)
Could compute account depositor first, and join result with
branch-city = “Brooklyn” (branch)
, but account depositor is likely to be a large relation.
Since it is more likely that only a small fraction of the bank’s
customers have accounts in branches located in Brooklyn, it is
better to compute
branch-city = “Brooklyn” (branch) account
first.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
30
§14.3 Estimating Statistics of
Expression Results
How to estimate the cost of an operation
estimating the size of the resultant relations produced by the
operation p, according to statistics in the catalog about the
relations on which the operation is conducted
May 2008
e.g. customer-name(balance2500(account)
customer)
computing the cost of the operation in terms of disk access,
by means of the cost formulae shown in chapter13
disk access is the predominant cost, and the number of
block transfers from disk is used as a measure of the
actual cost of evaluation.
Database System Concepts - Chapter 14 Query Optimization -
31
14.3.1 Statistics in Catalog for
Optimization
nr: number of tuples in a relation r
br: number of blocks containing tuples of r
lr: size of a tuple of r in bytes
fr: blocking factor of r
the number of tuples of r that fit into one block
V(A, r): number of distinct values that appear in r for attribute
A
same as the size of A(r)
Statistics about indices, such as heights of B+-trees, that is HTi,
and the number of leaf pages in the indices, are also in catalog
May 2008
Database System Concepts - Chapter 14 Query Optimization -
32
Statistics in Catalog for Optimization
(cont.)
If tuples of r are stored together physically in a file, then:
nr
br
f r
May 2008
Database System Concepts - Chapter 14 Query Optimization -
33
Statistics in Catalog for Optimization
(cont.)
An example for statistical information
faccount= 20 (20 tuples of account fit in one block)
V(branch-name, account) = 50 (50 branches)
V(balance, account) = 500 (500 different balance values)
naccount = 10000 (account has 10,000 tuples)
assume the following indices exist on account
+
a primary, B -tree index for attribute branch-name
+
a secondary, B -tree index for attribute balance
May 2008
Database System Concepts - Chapter 14 Query Optimization -
34
14.3.2 Selection Size Estimation
The size estimation of the result of a selection operation depends
on the selection predicates, that is single equality predicate,
single comparison predicate, and combination of predicates
Equality selection A=a (r)
assuming uniform distribution of values in relations, and the
value a appears in attribute A of some record in r
the selection is estimated to have
nr / V(A, r) tuples
nr /V(A, r) /fr blocks that these tuples occupy
May 2008
Database System Concepts - Chapter 14 Query Optimization -
35
14.3.2 Selection Size Estimation (cont.)
Comparison selections AV(r)
the lowest and highest values of attributes, that is min(A, r)
and max(A, r), are available in catalog
the estimated number of records/tuples satisfying comparison
condition A V
0,
if v < min(A, r)
nr,
if v max(A, r)
, otherwise
v min( A, r )
nr .
max( A, r ) min( A, r )
May 2008
in absence of statistical information cost is assumed to be
nr / 2
Database System Concepts - Chapter 14 Query Optimization -
36
14.3.2 Selection Size Estimation (cont.)
The selectivity of a condition i is the probability that a tuple in
the relation r satisfies i . If si is the number of satisfying
tuples in r, the selectivity of i is given by si /nr.
Conjunction: 1 2. . . n (r). The estimate for number of
tuples in the result is: n s1 s2 . . . sn
r
nrn
Disjunction:1 2 . . . n (r). Estimated number of tuples:
sn
s1
s2
nr 1 (1 ) (1 ) ... (1 )
nr
nr
nr
Negation: (r). Estimated number of tuples:
nr – size((r))
May 2008
Database System Concepts - Chapter 14 Query Optimization -
37
14.4 Choice of Evaluation Plans
— Query optimization
How to generate the evaluation plan for an relational algebra
expression with lower or the lowest cost
generating of the optimized query trees
selecting
implementation algorithms for each operations in the tree
pipeline or materialization strategy for the expression
E.g. Fig.14.5
May 2008
Database System Concepts - Chapter 14 Query Optimization -
38
Fig.14.5 An evaluation plan
14.4 Choice of Evaluation Plans
(cont.)
Query optimization
choosing the evaluation plan with lowest or the lower cost
Query optimization is conducted by two strategies/approaches
cost-based optimization,§14.4.2
heuristic optimization, §14.4.3!
Practical query optimizers incorporate elements of these two
approaches
May 2008
Database System Concepts - Chapter 14 Query Optimization -
40
14.4.2 Cost-based Optimization
Cost-based optimization
search all the possible plans and choose the best plan
a optimal plan with the minimum costs can be obtained
Cost-based optimization is expensive, though results optimal plan
May 2008
Database System Concepts - Chapter 14 Query Optimization -
41
14.4.3 Heuristic Optimization
Heuristic optimization
uses heuristics (or heuristic rules) to choose a better plan
a suboptimal plan with lower costs can be obtained
Principles: transforms the query-tree by heuristics to reduce cost
perform selection early to reduce the number of tuples
perform projection early to reduces the number of attributes
substitute Cartesian product and selection with join
May 2008
Database System Concepts - Chapter 14 Query Optimization -
42
14.4.3 Heuristic Optimization (cont.)
May 2008
perform most restrictive selection and join operations before
other similar operations
the most restrictive operations generate resulting relations
with smallest size
Database System Concepts - Chapter 14 Query Optimization -
43
Steps in Heuristic Optimization
Step1 使用rule1,将conjunctive selection 分解为多个单独
的选择操作,以使单个选择操作尽可能沿查询树下移(尽
早执行选择操作,以减少中间计算结果)
Step2 根据选择操作的交换率和分配率,利用rule2, rule7.a,
rule7.b, rule11, 将查询树上的每个选择操作尽可能移向叶
节点,以便尽早执行选择操作
Step3 根据连接操作的结合律和交换率,使用rule6,重新
安排查询树中的叶结点,使得具有restrictive selection特征
的叶结点先执行
restrictive selection:执行此操作后,产生的结果关系
最小(所含元组最少)
refer to Fig. 14.3
May 2008
Database System Concepts - Chapter 14 Query Optimization -
44
Steps in Heuristic Optimization
(cont.)
Step4 利用rule4.a, 以连接操作代替相邻的选择和笛卡尔乘
积操作
Step5 利用rule3, 8.a, 8.b,12, 将查询树上的投影操作尽可能
下移,以便尽早执行投影操作,减少中间计算结果
Step6 将最后的查询树分解为多个子树,使子树中的各操
作可以采用流水线方式执行,以减少对外设的访问次数
e.g. Fig.14.5
重点:前5步 !!!
May 2008
Database System Concepts - Chapter 14 Query Optimization -
45
Example One
Given
S(S#, sname, age, sex)
C(C#, cname, teacher)
SC(S#, C#, grade)
Find all the students, who are male, and get grades more than
90 when they learn some one course
Step1. SQL statement
selection conditions
SELECT DISTINCT sname
FROM
S, SC
WHERE S.s# = SC.s# AND sex=M AND grade > 90
join condition
May 2008
Database System Concepts - Chapter 14 Query Optimization -
46
Example One (cont.)
Principles
select A1, A2, …, An
from r1, r2, …, rn
where P
corresponds to
∏A1, A2, …, An (σP ( r1 ╳ r2 ╳ …╳ rn ) )
May 2008
Database System Concepts - Chapter 14 Query Optimization -
47
Example One (cont.)
Step2. Initial Relational algebra expression
E1 =
sname( s.s#=sc.s# ∧ sex=M ∧grade>90 (S ╳ SC ) )
Step3. Initial query tree
Fig. 14.0.5 (a), E1
Step4. By means of Rule1, Rule7b, distribute operations over
relation S and SC
Fig. 14.0.5 (b), E2
May 2008
Database System Concepts - Chapter 14 Query Optimization -
48
sname
sname
s.s#=sc.s#
s.s#=sc.s# ∧ sex=M ∧grade>90
╳
╳
sex=M
grade>90
SC (S#, C#, grade)
S
(S#, sname, age, sex)
(a) Initial query tree
E1
S
SC
(b)E2
Example One (cont.)
Step5. By means of Rule4.a, replace selection and Cartesian
product with natural join
Fig. 14.0.5 (c), E3
Step6. !! By means of Rule8b, distribute operations over
relation S and SC
Fig. 14.0.5 (d),
The final optimized expression E4 that is equivalent to initial
expression E1 is
sname {sname, S# (sex=M (S))
grade>90 (grade, S# (SC)) }
May 2008
Database System Concepts - Chapter 14 Query Optimization -
50
sname
sname
s.s#=sc.s#
sex=M
grade>90
SC (S#, C#, grade)
S
s.s#=sc.s#
Πsname, S#
ΠS#
sex=M
grade>90
S
SC(S#, C#, grade)
(S#, sname, age, sex)
(S#, sname, age, sex)
(c)E3
(d)E4
Example Two
Consider the following insurance database, where the primary
keys are underlined,
Person(driver-id, name, address)
Car(license, model, year)
Accident(report-number, date, location)
Participated(driver-id , license, report-number, damageamount)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
52
Example Two (cont.)
Problems
Give a SQL statement to find out the driver name, license,
report-number of the accidents which happened in Beijing
and before May 3, 2008
Give the initial query tree for this query, and construct an
optimized and equivalent relational algebra expression for it
by means of heuristic optimization
May 2008
Database System Concepts - Chapter 14 Query Optimization -
53
Example Two (cont.)
Step1. SQL statement
Select name, license, Accident.report-number
From Person, Participate, Accident
Where Person.driver_id = Participate.driver_id AND
Accident.report_number = Participate.report_number
AND Location=Beijing AND date < 5/3/2008
Step2. Initial expression
name, license, Accident.report-number
( Person.driver_id = Participate.driver_id
AND Accident.report_number =
Participate.report_number AND Location=Beijing AND date < 5/3/2003
(Person ╳ Participate ╳ Accident )
May 2008
Database System Concepts - Chapter 14 Query Optimization -
54
name, license, Accident.report-number
Person.driver_id = Participate.driver_id
AND
Accident.report_number = Participate.report_number AND
Location=Beijing AND date < 5/3/2003
╳
╳
Person
Participate
Accident
(a) initial query tree
name, license, Accident.report-number
Accident.report.number = Participate.report_number
name, license,
Accident.report_number
report-number
Person.driver_id = Participate.driver_id
driver_id,
name
Location=Beijing
driver_id,
AND date < 5/3/2003
license,
report-number
Participate
Accident
(report-number, date ,location)
Person
(driver-id, license, report-number, damage-amount)
(driver-id, name, address)
(b) optimal query tree
Example Three
Consider the following relations in banking enterprise database,
where the primary keys are underlined
branch (branch-name, branch-city, assets),
loan ( loan-number, branch-name , amount)
borrower( customer-name, loan-number, borrow-date)
customer (customer-name, customer-street, customer-city)
account (account-number, branch-name, balance)
depositor (customer-name, account-number , deposit-date)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
57
Example Three (cont.)
For the query “ Find the names of all customers who have an
loan at any branch that is located in Brooklyn and have assets
more than $100,000, requiring that loan-amount is less than
$1000”
give an SQL statement for this query
given a initial query tree for the query, and convert it into an
optimized query tree by means of heuristic optimization
May 2008
Database System Concepts - Chapter 14 Query Optimization -
58
Example Three (cont.)
May 2008
SQL
select customer-name
from borrower, loan, branch
where loan.loan-number=borrower.loan-number
and branch.branch-name=loan.branch-name
and branch-city=”Brooklyn”
and assets>100000 and amount<1000
Database System Concepts - Chapter 14 Query Optimization -
59
customer-name
customer- name, loan-number
loan.loan-number
loan.loan-number
branch-name
,branch-name
borrower ( customer-name,
loan-number, borrow-date)
amount< 100
loan( loan-number, branchname , amount)
branch-city=”Brooklyn”
AND assets>100000
Branch (branch-name,
branch-city, assets)
Example Four (cont.)
给定如下关系数据库
Employee(Employee#, Name, Address, Super-E#, E-D#)
Department(D#, Dname, manager#, depart-location)
Project(P#, Pname, Plocation, P-D#)
查询:对于每个在“哈尔滨”进行的工程项目,列出其工程
项目号P# ,工程所属部门号P-D#, 该部门领导的名字Name
和地址Address
要求:写出该查询的SQL语句;转换为关系代数表达式并利
用启发式方法进行查询优化;给出优化后的关系代数表达式。
Select P#, P-D#, Name, Address)
From Project, Department, Employee
Where Plocation =哈尔滨 AND P-D#= D#
AND manager# = Employee#
P#, P-D#, Name, Address
employee#, name, address
Employee
P#, P-D#, manager#
P#, P-D#
P-location=“哈尔滨”
Project
D#, manager#
Department
作业
Consider the following schema, where the primary keys are
underlined ,
Suppliers (supplier-id, supplier-name, city, telephone, address)
Parts ( parts-id, parts-name, parts-color)
Catalog (supplier-id, parts-id, price )
.
(1) Give an SQL statement to find out the name and telephone of
the suppliers who supply a red part whose price is below $2000.
(2) Translate this SQL statement into an initial query tree, and
give an optimized query tree for it, by means of heuristic query
optimization.
May 2008
Database System Concepts - Chapter 14 Query Optimization -
63
Appendix A Sybase 平台下
的查询执行计划优化
Select max(BscpID)
From BSC
May 2008
Database System Concepts - Chapter 14 Query Optimization -
64
Appendix A Sybase 平台下
的查询执行计划优化(续)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
65
Appendix A Sybase 平台下
的查询执行计划优化(续)
查询实验内容
单表、多表(如3张表)的查询、插入、删除、修改所
对应的查询执行计划比较
有无索引、索引类型对查询、插入、删除、修改等操作
的代价的影响
查询结果相同、但表达形式不一样的多个等价的查询、
删除、插入、修改语句的查询执行计划的比较
例如。
delete from loan
where amount between 0 and 50
May 2008
Database System Concepts - Chapter 14 Query Optimization -
67
Appendix A Sybase 平台下
的查询执行计划优化(续)
delete from loan
where amount in (
select amount
from loan
where amount >= 0 and amount <= 50)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
68
Appendix B
DDBS查询代价与数据分布的关系
分布式数据库系统DDBS中, 数据在各站点的分布和站点间
通讯代价/传输时间将影响到整个查询代价
例.
4
S(S#,SNAME,AGE,SEX) , 10 个元组,站点A
5
C(C#,CNAME,TEACHER), 10 个元组,站点B
SC(S#,C#,GRADE),
106个元组,站点A
4
元组平均长度100bit,站点间传输速率v = 10 bit/s ,通
信延迟d=1s。假设无副本
查询要求
在支持分片透明性的DDBS(分布在A和B 2个站点)中,
查询“所有选修“MATH“的男生的学号和姓名”
May 2008
Database System Concepts - Chapter 14 Query Optimization -
69
Appendix B
DDBS查询代价与数据分布的关系 (续)
SQL(全局)查询语句
SELECT
S#,SNAME
FROM S,SC, C
WHERE S.S# = SC.S# AND SC.C# = C.C# (连接条件)
AND SEX=M AND CNAME= MATH (选择条件)
查询代价(query cost)
只考虑通信代价
TC (x) =1s ╳ 传输次数 + 传输的元组bit数 ÷ 104 bit/s
May 2008
Database System Concepts - Chapter 14 Query Optimization -
70
Appendix B
DDBS查询代价与数据分布的关系 (续)
策略1
将关系C传到A地,在A地进行查询处理
5
4
3
TC1 = 1s + 10 ╳100 bit ÷ 10 bit/s ≈ 10 s ≈ 16.7 min
策略2
将关系S和SC传到B地,在B地进行查询处理
4
6
4
TC2 = 2s + (10 +10 ) ╳ 100 bit ÷ 10 bit/s ≈10100
s≈28h
May 2008
Database System Concepts - Chapter 14 Query Optimization -
71
Appendix B
DDBS查询代价与数据分布的关系 (续)
策略3
5
A地从关系SC、S中求出所有男学生的成绩元组,共10
个
根据元组中的C#, 向B地的C关系询问(应答过程)
CNAME=MATH?共询问105次
5
TC3 ≈ 2 ╳ 10 ╳ 1 ≈ 2.3天
策略4
在B地从关系C中求出CNAME=MATH的课程元组,假设
有10个
根据C#询问A地的S、SC的连接,核实是否为选修MATH
(SC.C#= C#?)的男生(S.SEX=M?)
May 2008
Database System Concepts - Chapter 14 Query Optimization -
72
Appendix B
DDBS查询代价与数据分布的关系 (续)
TC4 ≈ 2 ╳ 10 ╳ 1 ≈ 20s
策略5
5
在A地从关系S、SC中求出男生的的课程元组,有10 个
将结果传到B地,在B地执行查询
5
4
3
TC5= 1s + 10 ╳ 100 bit ÷ 10 bit/s ≈ 10 s ≈ 16.7 m
策略6
在B地从关系C中求出CNAME=MATH的课程元组,有
10个
将结果传到A地,在A地执行查询
4
TC6 = 1s + 10 ╳100 bit ÷ 10 bit/s ≈ 1s
May 2008
Database System Concepts - Chapter 14 Query Optimization -
73
Appendix C DB2查询优化
对于IBM DB2来说,它有非常好的优化算法,大致是基于代
价估算后选择代价最小的策略。
代价估算的方法
搜集系统目录相关数据和要被执行的SQL语句,然后选
择若干方案进行代价计算。
DB2应用程序开发过程中部署测试环境时,可以专门设置虚
拟的但是和将来实际运行环境比较一致的系统目录数据来模
拟真实环境下的DB2的SQL执行策略。
对于DB2的优化器也可以设置不同的优化级别,可以使用
SET CURRENT QUERY OPTION命令或者某些子句进行相应
设置,其中
0:最小的优化
1:相当于DB2/6000 Version 1中的优化加一些低成本的
优化
May 2008
Database System Concepts - Chapter 14 Query Optimization -
74
Appendix C DB2查询优化(续)
2:使用opt level 5的功能,但连接优化上进行了一定简
化
3:中等优化,大约相当于DB2 for Z/OS
5:使用一定数量的优化,包括Heuristic Rules(这是默
认的值)
7:使用一定数量的优化,不包括Heuristic Rules
9:使用所有可用的优化技术
May 2008
Database System Concepts - Chapter 14 Query Optimization -
75
Appendix C DB2查询优化(续)
有4种查询优化策略访问工具,分别能够针对静态SQL、动
态SQL、CLI应用程序、DRDA应用程序请求者(DRDA AR)
提供图形界面或者文本界面查询访问或者提供基本信息。
第一种。基本的是直接访问Explain Table(可以直接使用
SELECT语句来访问它,所以这是一个各种OS平台上都可以
采取的措施),它的缺点是没有提供任何界面,必须由人去
理解表中的数据。另外只能对执行的SQL来处理,如果是未
执行的静态包则无法产生信息。
第二种。基于图形界面的Visual Explain(主要用于windows
平台,甚至可以在windows平台上对诸如UNIX上的DB2系统
上的SQL语句进行这种图形化分析),它不支持DADA AR,
也不支持分析未执行的静态包,不能一次分析多条SQL语句,
但最直观。 Database System Concepts - Chapter 14 Query Optimization - 76
May 2008
Appendix C DB2查询优化(续)
第三种。db2exfmt以预先定义好的报告格式获得explain
table的信息,它不支持DADA AR,也不支持分析未执行的
静态包,但支持一次分析多条语句。
第四种。db2expln允许我们查看作为静态绑定包(static
package)的一部分访问策略,它提供一种文本格式描述的
策略。不支持CLI或者DRDA AR,可以一次分析多条语句。
May 2008
Database System Concepts - Chapter 14 Query Optimization -
77
Have
a
break