Query Processing and Advanced Queries
Query Processing (2)
CMPT 454: Database Systems II – Query Processing (2)

Review: Query Processing

Each stage of query processing produces the input for the next:

  SQL query
    -> parse                    -> parse tree
    -> convert                  -> logical query plan (l.q.p.)
    -> query rewrite            -> "improved" l.q.p.
    -> estimate result sizes    -> l.q.p. + sizes
    -> consider physical plans  -> {P1, P2, ...}
    -> estimate costs           -> {(P1, C1), (P2, C2), ...}
    -> pick best                -> Pi
    -> execute                  -> answer

Parsing: Grammar for SQL

The following grammar describes a simple subset of SQL.

Queries
  <Query> ::= SELECT <SelList> FROM <FromList> WHERE <Condition> ;

Selection lists
  <SelList> ::= <Attribute>, <SelList>
  <SelList> ::= <Attribute>

From lists
  <FromList> ::= <Relation>, <FromList>
  <FromList> ::= <Relation>

Conditions
  <Condition> ::= <Condition> AND <Condition>
  <Condition> ::= <Attribute> IN (<Query>)
  <Condition> ::= <Attribute> = <Attribute>
  <Condition> ::= <Attribute> LIKE <Pattern>

The syntactic categories Relation and Attribute are not defined by grammar rules, but by the database schema.
The syntactic category Pattern is defined as some regular expression.
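To make the grammar concrete, here is a minimal recursive-descent parser sketch for this SQL subset. It is an illustration under stated assumptions, not how any particular DBMS parses: the names `tokenize`, `Parser`, and `parse` are hypothetical, the parse tree is simplified to nested tuples, the left-recursive AND rule is handled with a loop, the trailing semicolon is treated as an optional terminator of the top-level query only, and Relation and Attribute names are accepted without checking them against a schema.

```python
import re

# Tokens: quoted patterns, identifiers/keywords, and punctuation.
TOKEN_RE = re.compile(r"\s*('[^']*'|[A-Za-z_][A-Za-z0-9_]*|[(),=;])")
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "IN", "LIKE"}


def tokenize(sql):
    sql, tokens, pos = sql.strip(), [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def at(self, literal):
        tok = self.peek()
        return tok is not None and tok.upper() == literal

    def expect(self, literal):
        if not self.at(literal):
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        self.i += 1

    def name(self):
        # <Attribute> or <Relation>: any identifier that is not a keyword.
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_") or tok.upper() in KEYWORDS:
            raise SyntaxError(f"expected a name, got {tok!r}")
        self.i += 1
        return tok

    def name_list(self):
        # <SelList> and <FromList>: one or more comma-separated names.
        names = [self.name()]
        while self.at(","):
            self.i += 1
            names.append(self.name())
        return names

    def query(self):
        self.expect("SELECT")
        sel = self.name_list()
        self.expect("FROM")
        frm = self.name_list()
        self.expect("WHERE")
        return ("Query", sel, frm, self.condition())

    def condition(self):
        # <Condition> AND <Condition>, parsed iteratively (left-associative).
        left = self.simple_condition()
        while self.at("AND"):
            self.i += 1
            left = ("AND", left, self.simple_condition())
        return left

    def simple_condition(self):
        attr = self.name()
        if self.at("="):
            self.i += 1
            return ("=", attr, self.name())
        if self.at("IN"):
            self.i += 1
            self.expect("(")
            sub = self.query()
            self.expect(")")
            return ("IN", attr, sub)
        if self.at("LIKE"):
            self.i += 1
            pat = self.peek()
            if pat is None or not pat.startswith("'"):
                raise SyntaxError(f"expected a quoted pattern, got {pat!r}")
            self.i += 1
            return ("LIKE", attr, pat)
        raise SyntaxError(f"expected =, IN or LIKE, got {self.peek()!r}")


def parse(sql):
    p = Parser(tokenize(sql))
    tree = p.query()
    if p.at(";"):
        p.i += 1
    if p.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {p.peek()!r}")
    return tree


tree = parse("SELECT movieTitle FROM StarsIn, MovieStar "
             "WHERE starName = name AND birthdate LIKE '%1960';")
print(tree)
```

Each method corresponds to one syntactic category, so the call structure mirrors the shape of the parse tree; a real parser would also keep keyword and punctuation leaves and validate names against the schema.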
Example: A SQL Query

  StarsIn (movieTitle, movieYear, starName)
  MovieStar (name, address, gender, birthdate)

Goal: find the movies with stars born in 1960.

  SELECT movieTitle
  FROM StarsIn, MovieStar
  WHERE starName = name AND birthdate LIKE '%1960';

A Parse Tree

  <Query>
    SELECT
    <SelList>
      <Attribute>: movieTitle
    FROM
    <FromList>
      <Relation>: StarsIn
      ,
      <FromList>
        <Relation>: MovieStar
    WHERE
    <Condition>
      <Condition>
        <Attribute>: starName
        =
        <Attribute>: name
      AND
      <Condition>
        <Attribute>: birthdate
        LIKE
        <Pattern>: '%1960'

Conversion to Logical Query Plan

How do we convert a parse tree into a logical query plan, i.e. a relational algebra expression?

Queries whose conditions contain no subqueries are easy:
  1. Form the Cartesian product of all relations in <FromList>.
  2. Apply a selection $\sigma_C$, where C is given by <Condition>.
  3. Finally, apply a projection $\pi_L$, where L is the list of attributes in <SelList>.

Queries involving subqueries are more difficult:
  Remove subqueries from conditions and represent them by a two-argument selection in the logical query plan.
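The three conversion steps for subquery-free queries can be sketched in Python over small in-memory relations. This is only an illustration: the rows are invented, the helper names `cartesian`, `select`, and `project` are hypothetical, relations are modelled as lists of dicts, and the pattern '%1960' is approximated with `str.endswith`. Merging rows as dicts works here only because the two schemas share no attribute names.

```python
from itertools import product

# Hypothetical sample data for the two relations.
stars_in = [
    {"movieTitle": "Film A", "movieYear": 1990, "starName": "Star X"},
    {"movieTitle": "Film B", "movieYear": 1995, "starName": "Star Y"},
]
movie_star = [
    {"name": "Star X", "address": "1 Main St", "gender": "F", "birthdate": "3/5/1960"},
    {"name": "Star Y", "address": "2 Elm St", "gender": "M", "birthdate": "7/9/1972"},
]


def cartesian(r, s):
    # Step 1: Cartesian product of the relations in the FROM list.
    return [{**t, **u} for t, u in product(r, s)]


def select(cond, r):
    # Step 2: selection, keeping the tuples that satisfy the condition.
    return [t for t in r if cond(t)]


def project(attrs, r):
    # Step 3: projection onto the attributes of the SELECT list.
    return [{a: t[a] for a in attrs} for t in r]


def condition(t):
    # starName = name AND birthdate LIKE '%1960'
    return t["starName"] == t["name"] and t["birthdate"].endswith("1960")


result = project(["movieTitle"], select(condition, cartesian(stars_in, movie_star)))
print(result)
```

The nesting of the three calls is the logical query plan: the projection at the root, the selection below it, and the product of the two relations at the bottom.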
An Algebraic Expression Tree

  $\pi_{\text{movieTitle}}$
    $\sigma_{\text{starName = name AND birthdate LIKE '\%1960'}}$
      $\times$
        StarsIn
        MovieStar

Another SQL Query

  StarsIn (movieTitle, movieYear, starName)
  MovieStar (name, address, gender, birthdate)

Goal: find the movies with stars born in 1960.

  SELECT movieTitle
  FROM StarsIn
  WHERE starName IN (
      SELECT name
      FROM MovieStar
      WHERE birthdate LIKE '%1960');

Another Parse Tree

  <Query>
    SELECT
    <SelList>
      <Attribute>: movieTitle
    FROM
    <FromList>
      <Relation>: StarsIn
    WHERE
    <Condition>
      <Attribute>: starName
      IN
      (
      <Query>
        SELECT
        <SelList>
          <Attribute>: name
        FROM
        <FromList>
          <Relation>: MovieStar
        WHERE
        <Condition>
          <Attribute>: birthdate
          LIKE
          <Pattern>: '%1960'
      )

Another Algebraic Expression Tree

The subquery is replaced by its own algebraic expression, and the remaining condition becomes the second argument of a two-argument selection:

  $\pi_{\text{movieTitle}}$
    $\sigma$ (two-argument selection)
      StarsIn
      <Condition>
        <Attribute>: starName
        IN
        $\pi_{\text{name}}$
          $\sigma_{\text{birthdate LIKE '\%1960'}}$
            MovieStar

Algebraic Laws for Query Plans: Introduction

Algebraic laws allow us to transform a Relational Algebra (RA) expression into an equivalent one.
Two RA expressions are equivalent if, for all database instances, they produce the same answer.
The resulting expression may have a more efficient physical query plan.
Algebraic laws are used in the query rewrite phase.

Commutative law: the order of the arguments does not matter.
  $x + y = y + x$

Associative law: two uses of the operator may be grouped either from the left or from the right.
  $(x + y) + z = x + (y + z)$

Operators that are commutative and associative can be grouped and ordered arbitrarily.
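The two laws can be checked mechanically on small instances. The sketch below is an illustration, not part of the original material: the helper names are hypothetical, and relations are modelled as Python frozensets over a three-element domain. Passing the check on a finite domain is only evidence that a law holds, whereas a single failing instance is a genuine counterexample. Set union passes both checks; set difference fails both.

```python
from itertools import combinations, product


def all_subsets(domain):
    # Every subset of the domain, used as the space of "database instances".
    items = sorted(domain)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def is_commutative(op, instances):
    return all(op(x, y) == op(y, x) for x, y in product(instances, repeat=2))


def is_associative(op, instances):
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(instances, repeat=3))


def union(r, s):
    return r | s


def difference(r, s):
    return r - s


instances = all_subsets({"a", "b", "c"})
print("union:     ", is_commutative(union, instances), is_associative(union, instances))
print("difference:", is_commutative(difference, instances), is_associative(difference, instances))
```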
Algebraic Laws for Query Plans: Natural Join, Cartesian Product and Union

  $R \bowtie S = S \bowtie R$
  $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$

  $R \times S = S \times R$
  $(R \times S) \times T = R \times (S \times T)$

  $R \cup S = S \cup R$
  $R \cup (S \cup T) = (R \cup S) \cup T$

Algebraic Laws for Query Plans: Selection

  $\sigma_{p_1 \wedge p_2}(R) = \sigma_{p_1}[\sigma_{p_2}(R)]$
  $\sigma_{p_1 \vee p_2}(R) = [\sigma_{p_1}(R)] \cup [\sigma_{p_2}(R)]$
  $\sigma_{p_1}[\sigma_{p_2}(R)] = \sigma_{p_2}[\sigma_{p_1}(R)]$

The simple conditions $p_1$ or $p_2$ may be pushed down further than the complex condition.

Algebraic Laws for Query Plans: Bag Union

What about the union of relations with duplicates (bags)?

  R = {a,a,b,b,b,c}
  S = {b,b,c,c,d}
  $R \cup S$ = ?

The number of occurrences of a tuple in the result is either the SUM or the MAX of its occurrences in the input relations.

  SUM: $R \cup S$ = {a,a,b,b,b,b,b,c,c,c,d}
  MAX: $R \cup S$ = {a,a,b,b,b,c,c,d}

Algebraic Laws for Query Plans: Selection (MAX union)

  $\sigma_{p_1 \vee p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$

The MAX implementation of union makes the rule work.

  R = {a,a,b,b,b,c}
  $p_1$ is satisfied by a, b; $p_2$ is satisfied by b, c.

  $\sigma_{p_1 \vee p_2}(R)$ = {a,a,b,b,b,c}
  $\sigma_{p_1}(R)$ = {a,a,b,b,b}
  $\sigma_{p_2}(R)$ = {b,b,b,c}
  $\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,c}

Algebraic Laws for Query Plans: Selection (SUM union)

  $\sigma_{p_1 \vee p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$

The SUM implementation of union makes more sense.

  Senators (...)
  Reps (...)
  T1 = $\pi_{\text{yr,state}}$ Senators
  T2 = $\pi_{\text{yr,state}}$ Reps

  T1            T2
  Yr  State     Yr  State
  97  CA        99  CA
  99  CA        99  CA
  98  AZ        98  CA

What should the union be? Use the SUM implementation, but then some laws do not hold.
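The two bag-union semantics map directly onto Python's `collections.Counter`, where `+` adds multiplicities (SUM) and `|` takes their maximum (MAX). The sketch below reuses the bags R and S and the predicates p1 and p2 from the slides; the helper `select` is a hypothetical name, and predicates are modelled as plain Python functions on single-letter tuples.

```python
from collections import Counter

R = Counter("aabbbc")   # {a,a,b,b,b,c}
S = Counter("bbccd")    # {b,b,c,c,d}

union_sum = R + S       # multiplicities are added
union_max = R | S       # multiplicities are maximised


def select(pred, bag):
    # Bag selection: keep every copy of each tuple that satisfies pred.
    return Counter({t: n for t, n in bag.items() if pred(t)})


def p1(t):
    return t in {"a", "b"}


def p2(t):
    return t in {"b", "c"}


lhs = select(lambda t: p1(t) or p2(t), R)
rhs_max = select(p1, R) | select(p2, R)
rhs_sum = select(p1, R) + select(p2, R)

print("SUM union:", "".join(sorted(union_sum.elements())))
print("MAX union:", "".join(sorted(union_max.elements())))
print("law holds with MAX:", lhs == rhs_max)
print("law holds with SUM:", lhs == rhs_sum)
```

Under SUM the right-hand side counts each b twice, once per selection, so b appears six times instead of three; this is the sense in which some laws stop holding.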
Algebraic Laws for Query Plans: Selection and Set Operations

  $\sigma_p(R \cup S) = \sigma_p(R) \cup \sigma_p(S)$
  $\sigma_p(R - S) = \sigma_p(R) - S = \sigma_p(R) - \sigma_p(S)$

Algebraic Laws for Query Plans: Selection and Join

  p: predicate with only R attributes
  q: predicate with only S attributes
  m: predicate with attributes from R and S

  $\sigma_p(R \bowtie S) = [\sigma_p(R)] \bowtie S$
  $\sigma_q(R \bowtie S) = R \bowtie [\sigma_q(S)]$

  $\sigma_{p \wedge q}(R \bowtie S) = [\sigma_p(R)] \bowtie [\sigma_q(S)]$
  $\sigma_{p \wedge q \wedge m}(R \bowtie S) = \sigma_m[(\sigma_p R) \bowtie (\sigma_q S)]$
  $\sigma_{p \vee q}(R \bowtie S) = [(\sigma_p R) \bowtie S] \cup [R \bowtie (\sigma_q S)]$

Algebraic Laws for Query Plans: Projection

  X: set of attributes
  Y: set of attributes
  XY: $X \cup Y$

  $\pi_X(R) = \pi_X[\pi_{XY}(R)]$

A projection may be introduced anywhere in an expression tree, as long as it eliminates no attributes needed by an operator above it and no attributes that are in the result.

Algebraic Laws for Query Plans: Projection and Selection

  X: subset of R attributes
  Z: attributes in predicate P (subset of R attributes)

  $\pi_X[\sigma_P(R)] = \pi_X\{\sigma_P[\pi_{XZ}(R)]\}$

We need to keep the attributes for the selection and for the result.

Algebraic Laws for Query Plans: Projection and Join

  X: subset of R attributes
  Y: subset of S attributes
  Z: intersection of R, S attributes

  $\pi_{XY}(R \bowtie S) = \pi_{XY}\{[\pi_{XZ}(R)] \bowtie [\pi_{YZ}(S)]\}$

Algebraic Laws for Query Plans: Projection, Selection and Join

  $\pi_{XY}\{\sigma_P(R \bowtie S)\} = \pi_{XY}\{\sigma_P[\pi_{XZ'}(R) \bowtie \pi_{YZ'}(S)]\}$
  $Z' = Z \cup \{\text{attributes used in } P\}$

What Are Good Transformations?

No transformation is always good.
Usually good: early selections and projections.
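Why early selection is usually good can be seen on a toy instance of the law that pushes a selection on R attributes below a natural join. The sketch below is illustrative only: the relations are invented, `natural_join` and `select` are hypothetical helper names, relations are lists of dicts, and the join is a naive nested loop. Both plans return the same tuples, but the early-selection plan feeds fewer tuples into the join.

```python
def natural_join(r, s):
    # Naive nested-loop natural join: tuples match when they agree on
    # every attribute name the two relations share.
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out


def select(pred, r):
    return [t for t in r if pred(t)]


# Hypothetical relations R(a, b) and S(b, c).
R = [{"a": i, "b": i % 3} for i in range(6)]
S = [{"b": j, "c": 10 * j} for j in range(3)]


def p(t):
    # A predicate that uses only R attributes.
    return t["a"] >= 4


joined = natural_join(R, S)          # intermediate result of the late plan
late = select(p, joined)             # select after joining
filtered = select(p, R)              # intermediate result of the early plan
early = natural_join(filtered, S)    # join after selecting

print("same answer:", late == early)
print("tuples entering the final step:", len(joined), "vs", len(filtered))
```

The saving depends on how selective the predicate is, which is one reason no transformation is always good and the optimizer still has to estimate sizes and costs.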