* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download E - Read
Survey
Document related concepts
Serializability wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
Relational algebra wikipedia , lookup
Versant Object Database wikipedia , lookup
Transcript
PART 4 DATA STORAGE AND QUERY Chapter 14 Query Optimization Main parts in Chapter 14 §14.1 Overview why optimization needed §14.2 Transformation of relational expressions equivalence rules §14.4 Choice of evaluation plans — Query optimization cost-based optimization,§14.2 heuristic optimization, §14.3 May 2008 Database System Concepts - Chapter 14 Query Optimization - 3 §14.1 Overview A SQL query statement may corresponds to several equivalent expressions ( or query evaluation plans ), each of which may have different costs Query optimization is needed to choice an efficient queryevaluation plan with lower costs E.g.1. select * from r, s where r.A = s.A σr.A=s.A (r ╳ s) is much slower than r May 2008 s Database System Concepts - Chapter 14 Query Optimization - 4 §14.1 Overview E.g.2 Find the names of all customers who have an account at any branch located in Brooklyn select customer-name from branch, account, depositor where branch.name=account.name AND account.number =depositor.number AND branch-city = Brooklyn May 2008 Fig.14.1 Database System Concepts - Chapter 14 Query Optimization - 5 Fig. 14.1 Equivalent Expression 14.1 Overview (cont.) The procedures of optimization generating the equivalent expressions/query trees, by transforming of relational algebra expressions according to equivalence rules in§14.2 generating alternative evaluation plans, by annotating the resultant equivalent expressions with implementation algorithms for each operations in the expressions choosing the optimal (that is, cheapest) or near-optimal plan based on the estimated cost, by cost-based optimization heuristic optimization May 2008 Database System Concepts - Chapter 14 Query Optimization - 7 14.1 Overview (cont.) The cost of operations is needed to estimate based on statistical information about relations e.g. number of tuples, number of distinct values for join attributes, etc. May 2008 Database System Concepts - Chapter 14 Query Optimization - 8 §14.2 Transformation of Relational Expressions Definition. Two relational algebra expressions are equivalent, if on every legal database instances, the two expression generate the same set of tuples May 2008 Database System Concepts - Chapter 14 Query Optimization - 9 14.2.1 Equivalence Rules Fig.14.0.1 Schema diagram for the banking enterprise May 2008 Database System Concepts - Chapter 14 Query Optimization - 10 14.2.1 Equivalence Rules (cont.) Rule1.(选择串接律, 将1个选择操作分解为2个选择操作) Conjunctive selection operations can be deconstructed into a sequence of individual selections ( E ) ( ( E )) 1 2 1 2 e.g. refer to Rule2. (选择交换律) Selection operations are commutative : ( ( E )) ( ( E )) 1 May 2008 2 2 1 Database System Concepts - Chapter 14 Query Optimization - 11 14.2.1 Equivalence Rules (cont.) Rule3. (投影串接律) Only the final operation in a sequence of project operations is needed L1 ( L2 ( ( Ln ( E )) )) L1 ( E ) May 2008 note: it is required that L1⊆ L2⊆ ….⊆ Ln e.g. {loan-number} { {loan-number, amount} (loan) } = {loan-number} (loan) Database System Concepts - Chapter 14 Query Optimization - 12 14.2.1 Equivalence Rules (cont.) Rule4. (以连接操作代替选择和笛卡尔乘积) Selections can be combined with Cartesian products and theta joins a. (E1 ╳ E2) = E1 E2 note: definition of join b. 1(E1 2 E2) = E1 1 2 E2 e.g. branch-name=“Perryridge”( borrower loan) = borrower (branch-name=“Perryridge) (borrower.loan-numbr=loan.loan-number) loan By Rule4, the number of operations can be reduced, and the costs of the left-hand expressions are less than that of the right-hand expressions often used in heuristic query optimizing May 2008 Database System Concepts - Chapter 14 Query Optimization - 13 14.2.1 Equivalence Rules (cont.) Rule5. (连接操作可交换) Theta join operations are commutative E1 E2 = E2 E1 Fig. 14.2 May 2008 Database System Concepts - Chapter 14 Query Optimization - 14 14.2.1 Equivalence Rules (cont.) Rule6. (连接操作的结合率, associative) a. natural join operations are associative (E1 E2) E3 = E1 (E2 E3) b. Theta joins are associative in the following manner: (E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3) , where 2 involves attributes from only E2 and E3 May 2008 Database System Concepts - Chapter 14 Query Optimization - 15 Rule 5 Rule 6a Rule 7a Fig.14.2 Pictorial Depiction of Equivalence Rules 14.2.1 Equivalence Rules (cont.) Rule7. Selection operation distributes over theta-join(选择 操作对于连接操作的分配率,选择条件下移) a. when all the attributes in 0 involve only the attributes of one of the expressions , e.g. E1, being joined 0E1 May 2008 E2) = (0(E1)) E2 b. when 1 involves only the attributes of E1, and 2 involves only the attributes of E2 1 E1 e.g. in Fig.14.0.2 E2) = (1(E1)) ( (E2)) Database System Concepts - Chapter 14 Query Optimization - 17 When Rule7 is applied, the size of relations participating in theta-join operation is reduced, thus evaluation costs are reduced branch account Fig.14.0.2 An example of Rule7 14.2.1 Equivalence Rules (cont.) Rule8. Projection operation distributes over theta-join(投影操作 对于连接操作的分配率,投影下移) a. L1 and L2 are the attributes of E1 and E2 respectively. if join condition involves only attributes in L1 L2, then L1 L2 ( E1....... E2 ) ( L1 ( E1 ))...... ( L2 ( E2 )) May 2008 Database System Concepts - Chapter 14 Query Optimization - 19 L1 L2 {branch-name, branch-city} {branch-name, amount} branch loan {branch-name, branch-city} {branch-name, amount} branch E1: branch (branch-name, branch-city, assets) E2: loan(loan-number, branch-name, amount) Fig.14.0.3 An example of Rule 8a loan 14.2.1 Equivalence Rules (cont.) b. consider a join E1 E2 let L1 and L2 be sets of attributes from E1 and E2, respectively let L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, and let L4 be attributes of E2 that are involved in join condition , but are not in L1 L2. L1 L2 ( E1..... E2 ) L1 L2 (( L1 L3 ( E1 ))...... ( L2 L4 ( E2 ))) May 2008 Database System Concepts - Chapter 14 Query Optimization - 21 L1 L2 {branch-city} { loan-number} E1: branch (branch-name, branch-city, assets) E2: loan(loan-number, branch-name, amount) {branch-city} { loan-number} B. assets> L. amount L3 branch L4 loan B. assets> L. amount {branch-city, assets} branch Fig.14.0.4 An example of Rule 8b {loan-number, amount} loan 14.2.1 Equivalence Rules (cont.) Rule9. The set operations union and intersection are commutative E1 E2 = E2 E1 E1 E2 = E2 E1 set difference is not commutative Rule10. Set union and intersection are associative (E1 E2) E3 = E1 (E2 E3) (E1 E2) E3 = E1 (E2 E3) May 2008 Database System Concepts - Chapter 14 Query Optimization - 23 14.2.1 Equivalence Rules (cont.) Rule11. (选择操作对于集合并、交、差的分配率) The selection operation distributes over , and –. (E1 – E2) = (E1) – (E2) similarly for and in place of – (E1 – E2) = (E1) – E2 similarly for in place of –, but not for Rule12. (投影操作对于集合并的分配率) The projection operation distributes over union L(E1 E2) = (L(E1)) (L(E2)) May 2008 Database System Concepts - Chapter 14 Query Optimization - 24 14.2.2 Example One Find the names of all customers who have an account at some branch located in Brooklyn customer-name(branch-city = “Brooklyn” (branch (account depositor))) Transformation using rule 7a. customer-name ((branch-city =“Brooklyn” (branch)) (account depositor)) Performing the selection as early as possible reduces the size of the relation to be joined. May 2008 Database System Concepts - Chapter 14 Query Optimization - 25 14.2.2 Example Two For customer-name((branch-city = “Brooklyn” (branch) account) depositor) When we compute t ←(branch-city = “Brooklyn” (branch) account ) we obtain a temporal relation t whose schema is: T(branch-name, branch-city, assets, account-number, balance) Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get: customer-name ( account-number t(branch-name, branch-city, assets, accountnumber, balance) depositor(customer-name, account-number)) May 2008 Database System Concepts - Chapter 14 Query Optimization - 26 14.2.2 Example Three Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000. customer-name((branch-city = “Brooklyn” balance > 1000 (branch (account depositor))) Transformation using join associatively (Rule 6a), refer to Fig.14.3 May 2008 Database System Concepts - Chapter 14 Query Optimization - 27 14.2.2 Example Three (cont.) Fig. 14.3 Multiple transformations May 2008 Database System Concepts - Chapter 14 Query Optimization - 28 14.2.3 Join Ordering For all relations r1, r2, and r3, (r1 r2) r3 = r1 (r2 r3 ) If r2 r3 is quite large and r1 r2 is small, we choose (r1 r2) r3 so that we compute and store a smaller temporary relation. May 2008 Database System Concepts - Chapter 14 Query Optimization - 29 14.2.3 An Example Consider the expression customer-name ((branch-city = “Brooklyn” (branch)) account depositor) Could compute account depositor first, and join result with branch-city = “Brooklyn” (branch) , but account depositor is likely to be a large relation. Since it is more likely that only a small fraction of the bank’s customers have accounts in branches located in Brooklyn, it is better to compute branch-city = “Brooklyn” (branch) account first. May 2008 Database System Concepts - Chapter 14 Query Optimization - 30 §14.3 Estimating Statistics of Expression Results How to estimate the cost of an operation estimating the size of the resultant relations produced by the operation p, according to statistics in the catalog about the relations on which the operation is conducted May 2008 e.g. customer-name(balance2500(account) customer) computing the cost of the operation in terms of disk access, by means of the cost formulae shown in chapter13 disk access is the predominant cost, and the number of block transfers from disk is used as a measure of the actual cost of evaluation. Database System Concepts - Chapter 14 Query Optimization - 31 14.3.1 Statistics in Catalog for Optimization nr: number of tuples in a relation r br: number of blocks containing tuples of r lr: size of a tuple of r in bytes fr: blocking factor of r the number of tuples of r that fit into one block V(A, r): number of distinct values that appear in r for attribute A same as the size of A(r) Statistics about indices, such as heights of B+-trees, that is HTi, and the number of leaf pages in the indices, are also in catalog May 2008 Database System Concepts - Chapter 14 Query Optimization - 32 Statistics in Catalog for Optimization (cont.) If tuples of r are stored together physically in a file, then: nr br f r May 2008 Database System Concepts - Chapter 14 Query Optimization - 33 Statistics in Catalog for Optimization (cont.) An example for statistical information faccount= 20 (20 tuples of account fit in one block) V(branch-name, account) = 50 (50 branches) V(balance, account) = 500 (500 different balance values) naccount = 10000 (account has 10,000 tuples) assume the following indices exist on account + a primary, B -tree index for attribute branch-name + a secondary, B -tree index for attribute balance May 2008 Database System Concepts - Chapter 14 Query Optimization - 34 14.3.2 Selection Size Estimation The size estimation of the result of a selection operation depends on the selection predicates, that is single equality predicate, single comparison predicate, and combination of predicates Equality selection A=a (r) assuming uniform distribution of values in relations, and the value a appears in attribute A of some record in r the selection is estimated to have nr / V(A, r) tuples nr /V(A, r) /fr blocks that these tuples occupy May 2008 Database System Concepts - Chapter 14 Query Optimization - 35 14.3.2 Selection Size Estimation (cont.) Comparison selections AV(r) the lowest and highest values of attributes, that is min(A, r) and max(A, r), are available in catalog the estimated number of records/tuples satisfying comparison condition A V 0, if v < min(A, r) nr, if v max(A, r) , otherwise v min( A, r ) nr . max( A, r ) min( A, r ) May 2008 in absence of statistical information cost is assumed to be nr / 2 Database System Concepts - Chapter 14 Query Optimization - 36 14.3.2 Selection Size Estimation (cont.) The selectivity of a condition i is the probability that a tuple in the relation r satisfies i . If si is the number of satisfying tuples in r, the selectivity of i is given by si /nr. Conjunction: 1 2. . . n (r). The estimate for number of tuples in the result is: n s1 s2 . . . sn r nrn Disjunction:1 2 . . . n (r). Estimated number of tuples: sn s1 s2 nr 1 (1 ) (1 ) ... (1 ) nr nr nr Negation: (r). Estimated number of tuples: nr – size((r)) May 2008 Database System Concepts - Chapter 14 Query Optimization - 37 14.4 Choice of Evaluation Plans — Query optimization How to generate the evaluation plan for an relational algebra expression with lower or the lowest cost generating of the optimized query trees selecting implementation algorithms for each operations in the tree pipeline or materialization strategy for the expression E.g. Fig.14.5 May 2008 Database System Concepts - Chapter 14 Query Optimization - 38 Fig.14.5 An evaluation plan 14.4 Choice of Evaluation Plans (cont.) Query optimization choosing the evaluation plan with lowest or the lower cost Query optimization is conducted by two strategies/approaches cost-based optimization,§14.4.2 heuristic optimization, §14.4.3! Practical query optimizers incorporate elements of these two approaches May 2008 Database System Concepts - Chapter 14 Query Optimization - 40 14.4.2 Cost-based Optimization Cost-based optimization search all the possible plans and choose the best plan a optimal plan with the minimum costs can be obtained Cost-based optimization is expensive, though results optimal plan May 2008 Database System Concepts - Chapter 14 Query Optimization - 41 14.4.3 Heuristic Optimization Heuristic optimization uses heuristics (or heuristic rules) to choose a better plan a suboptimal plan with lower costs can be obtained Principles: transforms the query-tree by heuristics to reduce cost perform selection early to reduce the number of tuples perform projection early to reduces the number of attributes substitute Cartesian product and selection with join May 2008 Database System Concepts - Chapter 14 Query Optimization - 42 14.4.3 Heuristic Optimization (cont.) May 2008 perform most restrictive selection and join operations before other similar operations the most restrictive operations generate resulting relations with smallest size Database System Concepts - Chapter 14 Query Optimization - 43 Steps in Heuristic Optimization Step1 使用rule1,将conjunctive selection 分解为多个单独 的选择操作,以使单个选择操作尽可能沿查询树下移(尽 早执行选择操作,以减少中间计算结果) Step2 根据选择操作的交换率和分配率,利用rule2, rule7.a, rule7.b, rule11, 将查询树上的每个选择操作尽可能移向叶 节点,以便尽早执行选择操作 Step3 根据连接操作的结合律和交换率,使用rule6,重新 安排查询树中的叶结点,使得具有restrictive selection特征 的叶结点先执行 restrictive selection:执行此操作后,产生的结果关系 最小(所含元组最少) refer to Fig. 14.3 May 2008 Database System Concepts - Chapter 14 Query Optimization - 44 Steps in Heuristic Optimization (cont.) Step4 利用rule4.a, 以连接操作代替相邻的选择和笛卡尔乘 积操作 Step5 利用rule3, 8.a, 8.b,12, 将查询树上的投影操作尽可能 下移,以便尽早执行投影操作,减少中间计算结果 Step6 将最后的查询树分解为多个子树,使子树中的各操 作可以采用流水线方式执行,以减少对外设的访问次数 e.g. Fig.14.5 重点:前5步 !!! May 2008 Database System Concepts - Chapter 14 Query Optimization - 45 Example One Given S(S#, sname, age, sex) C(C#, cname, teacher) SC(S#, C#, grade) Find all the students, who are male, and get grades more than 90 when they learn some one course Step1. SQL statement selection conditions SELECT DISTINCT sname FROM S, SC WHERE S.s# = SC.s# AND sex=M AND grade > 90 join condition May 2008 Database System Concepts - Chapter 14 Query Optimization - 46 Example One (cont.) Principles select A1, A2, …, An from r1, r2, …, rn where P corresponds to ∏A1, A2, …, An (σP ( r1 ╳ r2 ╳ …╳ rn ) ) May 2008 Database System Concepts - Chapter 14 Query Optimization - 47 Example One (cont.) Step2. Initial Relational algebra expression E1 = sname( s.s#=sc.s# ∧ sex=M ∧grade>90 (S ╳ SC ) ) Step3. Initial query tree Fig. 14.0.5 (a), E1 Step4. By means of Rule1, Rule7b, distribute operations over relation S and SC Fig. 14.0.5 (b), E2 May 2008 Database System Concepts - Chapter 14 Query Optimization - 48 sname sname s.s#=sc.s# s.s#=sc.s# ∧ sex=M ∧grade>90 ╳ ╳ sex=M grade>90 SC (S#, C#, grade) S (S#, sname, age, sex) (a) Initial query tree E1 S SC (b)E2 Example One (cont.) Step5. By means of Rule4.a, replace selection and Cartesian product with natural join Fig. 14.0.5 (c), E3 Step6. !! By means of Rule8b, distribute operations over relation S and SC Fig. 14.0.5 (d), The final optimized expression E4 that is equivalent to initial expression E1 is sname {sname, S# (sex=M (S)) grade>90 (grade, S# (SC)) } May 2008 Database System Concepts - Chapter 14 Query Optimization - 50 sname sname s.s#=sc.s# sex=M grade>90 SC (S#, C#, grade) S s.s#=sc.s# Πsname, S# ΠS# sex=M grade>90 S SC(S#, C#, grade) (S#, sname, age, sex) (S#, sname, age, sex) (c)E3 (d)E4 Example Two Consider the following insurance database, where the primary keys are underlined, Person(driver-id, name, address) Car(license, model, year) Accident(report-number, date, location) Participated(driver-id , license, report-number, damageamount) May 2008 Database System Concepts - Chapter 14 Query Optimization - 52 Example Two (cont.) Problems Give a SQL statement to find out the driver name, license, report-number of the accidents which happened in Beijing and before May 3, 2008 Give the initial query tree for this query, and construct an optimized and equivalent relational algebra expression for it by means of heuristic optimization May 2008 Database System Concepts - Chapter 14 Query Optimization - 53 Example Two (cont.) Step1. SQL statement Select name, license, Accident.report-number From Person, Participate, Accident Where Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND Location=Beijing AND date < 5/3/2008 Step2. Initial expression name, license, Accident.report-number ( Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND Location=Beijing AND date < 5/3/2003 (Person ╳ Participate ╳ Accident ) May 2008 Database System Concepts - Chapter 14 Query Optimization - 54 name, license, Accident.report-number Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND Location=Beijing AND date < 5/3/2003 ╳ ╳ Person Participate Accident (a) initial query tree name, license, Accident.report-number Accident.report.number = Participate.report_number name, license, Accident.report_number report-number Person.driver_id = Participate.driver_id driver_id, name Location=Beijing driver_id, AND date < 5/3/2003 license, report-number Participate Accident (report-number, date ,location) Person (driver-id, license, report-number, damage-amount) (driver-id, name, address) (b) optimal query tree Example Three Consider the following relations in banking enterprise database, where the primary keys are underlined branch (branch-name, branch-city, assets), loan ( loan-number, branch-name , amount) borrower( customer-name, loan-number, borrow-date) customer (customer-name, customer-street, customer-city) account (account-number, branch-name, balance) depositor (customer-name, account-number , deposit-date) May 2008 Database System Concepts - Chapter 14 Query Optimization - 57 Example Three (cont.) For the query “ Find the names of all customers who have an loan at any branch that is located in Brooklyn and have assets more than $100,000, requiring that loan-amount is less than $1000” give an SQL statement for this query given a initial query tree for the query, and convert it into an optimized query tree by means of heuristic optimization May 2008 Database System Concepts - Chapter 14 Query Optimization - 58 Example Three (cont.) May 2008 SQL select customer-name from borrower, loan, branch where loan.loan-number=borrower.loan-number and branch.branch-name=loan.branch-name and branch-city=”Brooklyn” and assets>100000 and amount<1000 Database System Concepts - Chapter 14 Query Optimization - 59 customer-name customer- name, loan-number loan.loan-number loan.loan-number branch-name ,branch-name borrower ( customer-name, loan-number, borrow-date) amount< 100 loan( loan-number, branchname , amount) branch-city=”Brooklyn” AND assets>100000 Branch (branch-name, branch-city, assets) Example Four (cont.) 给定如下关系数据库 Employee(Employee#, Name, Address, Super-E#, E-D#) Department(D#, Dname, manager#, depart-location) Project(P#, Pname, Plocation, P-D#) 查询:对于每个在“哈尔滨”进行的工程项目,列出其工程 项目号P# ,工程所属部门号P-D#, 该部门领导的名字Name 和地址Address 要求:写出该查询的SQL语句;转换为关系代数表达式并利 用启发式方法进行查询优化;给出优化后的关系代数表达式。 Select P#, P-D#, Name, Address) From Project, Department, Employee Where Plocation =哈尔滨 AND P-D#= D# AND manager# = Employee# P#, P-D#, Name, Address employee#, name, address Employee P#, P-D#, manager# P#, P-D# P-location=“哈尔滨” Project D#, manager# Department 作业 Consider the following schema, where the primary keys are underlined , Suppliers (supplier-id, supplier-name, city, telephone, address) Parts ( parts-id, parts-name, parts-color) Catalog (supplier-id, parts-id, price ) . (1) Give an SQL statement to find out the name and telephone of the suppliers who supply a red part whose price is below $2000. (2) Translate this SQL statement into an initial query tree, and give an optimized query tree for it, by means of heuristic query optimization. May 2008 Database System Concepts - Chapter 14 Query Optimization - 63 Appendix A Sybase 平台下 的查询执行计划优化 Select max(BscpID) From BSC May 2008 Database System Concepts - Chapter 14 Query Optimization - 64 Appendix A Sybase 平台下 的查询执行计划优化(续) May 2008 Database System Concepts - Chapter 14 Query Optimization - 65 Appendix A Sybase 平台下 的查询执行计划优化(续) 查询实验内容 单表、多表(如3张表)的查询、插入、删除、修改所 对应的查询执行计划比较 有无索引、索引类型对查询、插入、删除、修改等操作 的代价的影响 查询结果相同、但表达形式不一样的多个等价的查询、 删除、插入、修改语句的查询执行计划的比较 例如。 delete from loan where amount between 0 and 50 May 2008 Database System Concepts - Chapter 14 Query Optimization - 67 Appendix A Sybase 平台下 的查询执行计划优化(续) delete from loan where amount in ( select amount from loan where amount >= 0 and amount <= 50) May 2008 Database System Concepts - Chapter 14 Query Optimization - 68 Appendix B DDBS查询代价与数据分布的关系 分布式数据库系统DDBS中, 数据在各站点的分布和站点间 通讯代价/传输时间将影响到整个查询代价 例. 4 S(S#,SNAME,AGE,SEX) , 10 个元组,站点A 5 C(C#,CNAME,TEACHER), 10 个元组,站点B SC(S#,C#,GRADE), 106个元组,站点A 4 元组平均长度100bit,站点间传输速率v = 10 bit/s ,通 信延迟d=1s。假设无副本 查询要求 在支持分片透明性的DDBS(分布在A和B 2个站点)中, 查询“所有选修“MATH“的男生的学号和姓名” May 2008 Database System Concepts - Chapter 14 Query Optimization - 69 Appendix B DDBS查询代价与数据分布的关系 (续) SQL(全局)查询语句 SELECT S#,SNAME FROM S,SC, C WHERE S.S# = SC.S# AND SC.C# = C.C# (连接条件) AND SEX=M AND CNAME= MATH (选择条件) 查询代价(query cost) 只考虑通信代价 TC (x) =1s ╳ 传输次数 + 传输的元组bit数 ÷ 104 bit/s May 2008 Database System Concepts - Chapter 14 Query Optimization - 70 Appendix B DDBS查询代价与数据分布的关系 (续) 策略1 将关系C传到A地,在A地进行查询处理 5 4 3 TC1 = 1s + 10 ╳100 bit ÷ 10 bit/s ≈ 10 s ≈ 16.7 min 策略2 将关系S和SC传到B地,在B地进行查询处理 4 6 4 TC2 = 2s + (10 +10 ) ╳ 100 bit ÷ 10 bit/s ≈10100 s≈28h May 2008 Database System Concepts - Chapter 14 Query Optimization - 71 Appendix B DDBS查询代价与数据分布的关系 (续) 策略3 5 A地从关系SC、S中求出所有男学生的成绩元组,共10 个 根据元组中的C#, 向B地的C关系询问(应答过程) CNAME=MATH?共询问105次 5 TC3 ≈ 2 ╳ 10 ╳ 1 ≈ 2.3天 策略4 在B地从关系C中求出CNAME=MATH的课程元组,假设 有10个 根据C#询问A地的S、SC的连接,核实是否为选修MATH (SC.C#= C#?)的男生(S.SEX=M?) May 2008 Database System Concepts - Chapter 14 Query Optimization - 72 Appendix B DDBS查询代价与数据分布的关系 (续) TC4 ≈ 2 ╳ 10 ╳ 1 ≈ 20s 策略5 5 在A地从关系S、SC中求出男生的的课程元组,有10 个 将结果传到B地,在B地执行查询 5 4 3 TC5= 1s + 10 ╳ 100 bit ÷ 10 bit/s ≈ 10 s ≈ 16.7 m 策略6 在B地从关系C中求出CNAME=MATH的课程元组,有 10个 将结果传到A地,在A地执行查询 4 TC6 = 1s + 10 ╳100 bit ÷ 10 bit/s ≈ 1s May 2008 Database System Concepts - Chapter 14 Query Optimization - 73 Appendix C DB2查询优化 对于IBM DB2来说,它有非常好的优化算法,大致是基于代 价估算后选择代价最小的策略。 代价估算的方法 搜集系统目录相关数据和要被执行的SQL语句,然后选 择若干方案进行代价计算。 DB2应用程序开发过程中部署测试环境时,可以专门设置虚 拟的但是和将来实际运行环境比较一致的系统目录数据来模 拟真实环境下的DB2的SQL执行策略。 对于DB2的优化器也可以设置不同的优化级别,可以使用 SET CURRENT QUERY OPTION命令或者某些子句进行相应 设置,其中 0:最小的优化 1:相当于DB2/6000 Version 1中的优化加一些低成本的 优化 May 2008 Database System Concepts - Chapter 14 Query Optimization - 74 Appendix C DB2查询优化(续) 2:使用opt level 5的功能,但连接优化上进行了一定简 化 3:中等优化,大约相当于DB2 for Z/OS 5:使用一定数量的优化,包括Heuristic Rules(这是默 认的值) 7:使用一定数量的优化,不包括Heuristic Rules 9:使用所有可用的优化技术 May 2008 Database System Concepts - Chapter 14 Query Optimization - 75 Appendix C DB2查询优化(续) 有4种查询优化策略访问工具,分别能够针对静态SQL、动 态SQL、CLI应用程序、DRDA应用程序请求者(DRDA AR) 提供图形界面或者文本界面查询访问或者提供基本信息。 第一种。基本的是直接访问Explain Table(可以直接使用 SELECT语句来访问它,所以这是一个各种OS平台上都可以 采取的措施),它的缺点是没有提供任何界面,必须由人去 理解表中的数据。另外只能对执行的SQL来处理,如果是未 执行的静态包则无法产生信息。 第二种。基于图形界面的Visual Explain(主要用于windows 平台,甚至可以在windows平台上对诸如UNIX上的DB2系统 上的SQL语句进行这种图形化分析),它不支持DADA AR, 也不支持分析未执行的静态包,不能一次分析多条SQL语句, 但最直观。 Database System Concepts - Chapter 14 Query Optimization - 76 May 2008 Appendix C DB2查询优化(续) 第三种。db2exfmt以预先定义好的报告格式获得explain table的信息,它不支持DADA AR,也不支持分析未执行的 静态包,但支持一次分析多条语句。 第四种。db2expln允许我们查看作为静态绑定包(static package)的一部分访问策略,它提供一种文本格式描述的 策略。不支持CLI或者DRDA AR,可以一次分析多条语句。 May 2008 Database System Concepts - Chapter 14 Query Optimization - 77 Have a break