Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Optimization of Relational Algebra Expressions Database I. Moore's law • Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. • The capabilities of many digital electronic devices are strongly linked to Moore's law: processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras. Our objective --1 • On the other hand, the development of other factors is linear. • An example of such a factor is the speed with which the disk moves in hard disk drives. • For large data the amount of data moved between the primary and the secondary storage devices should be minimized. Our objective --2 • Data is moved in blocks between the main memory and the hard disk. • Thus, in other words, in RDBMS computations the number of blocks involved in I/O operations should be kept as low as possible. • This can be achieved by working with as small transitory relations as possible. Equivalence of relational algebra expressions • In order to reduce the size of the transitory relations the relational algebra expressions are rewritten. • Two relational algebra queries q, q’ are equivalent if for all database instances I q(I) is the same as q’(I). • Here: – a database instance consists of relations – q(I) denotes the result of applying q on I. Example • Consider relations – Likes(drinker, beer) – Frequents(drinker, bar) • The following two expressions are equivalent to each other: – bar(σF.drinker=L.drinkerbeer= Bud (F L)) – bar(σF.drinker=L.drinker(F (σbeer=Bud (L)))). • In most cases the second query can be evaluated faster. Optimization algorithm -- sketch • The original relational algebra expression is rewritten into another one in which – the selection operations are accomplished as soon as possible – the unnecessary columns are removed afterwards. • Next, the selection and the subsequent cross product operators are substituted with the appropriate join operators. Our running example Likes(drinker, beer) Bar(name, city) Frequents(drinker, bar) ΠL. drinker(σL.drinker=F.drinkername=barbeer=Budcity=N.Y. (L B F)) Splitting the conditions • σC1C2 (E) is equivalent with σC1(σC2 (E)). ΠL. drinker (σ L.drinker=F.drinkername=barbeer=Budcity=N.Y.(L B F)) is equivalent with ΠL. drinker(σL.drinker=F.drinker(σname=bar(σbeer=Bud(σcity=N.Y. (LBF))))) Expression trees ΠL.drinker ΠL.drinker σL.dinker=F.drinker σ L.drinker=F.drinkername=bar beer=Budcity=N.Y. σbeer=Bud σcity=N.Y. L B σname=bar F L B F Pulling down the conditions • σC(E1ΘE2) ≡ (σC(E1))ΘE2, where attr(C) attr(E1) and Θ Є {, ⋈}. • Here – attr(C), attr(E1) respectively denote the attributes appearing in condition C and relational algebra expression E1 – while ≡ denotes the equivalence relation. ΠL. drinker(σL.drinker=F.drinker(σname=bar(σbeer=Bud(σcity=N.Y. (LBF))))) is equivalent with ΠL. drinker(σL.drinker=F.drinker(σname=bar ((σbeer=Bud (L)) (σcity=N.Y. (B))F))) ΠL.drinker ΠL.drinker σL.dinker=F.drinker σL.dinker=F.drinker σname=bar σname=bar σbeer=Bud L L B σbeer=Bud σcity=N.Y. F σcity=N.Y. B F Removal of the unnecessary columns • ΠX(E1 Θ E2) ≡ ΠY(E1) Θ ΠZ(E2), where X = Y Z, Y attr(E1), Z attr(E2) and Θ Є {, ⋈}. • ΠX(σC (E)) ≡ ΠX(σF (ΠY(E))), where Y = attr(C) X. ΠL. drinker(σL.drinker=F.drinker(σname=bar ((σbeer=Bud (L)) (σcity=N.Y. (B))F))) is equivalent with ΠL. drinker(σL.drinker=F.drinker(σname=bar( (Πdrinker(σbeer=Bud (L))) (Πbar(σcity=N.Y. (B)))F))) ΠL.drinker ΠL.drinker σL.dinker=F.drinker σL.dinker=F.drinker σname=bar σname=bar σbeer=Bud L σcity=N.Y. B Πdrinker F σbeer=Bud L Πbar F σcity=N.Y. B Note: the application of extra projections increases the time of the evaluation of the query, hence this rewriting step can be omitted. Substitution with joins • By definition, – E1 ⋈C E2 ≡ σC(E1 E2) – E1 ⋈ E2 ≡ ΠL(σC(E1 E2)), where in condition C the common attributes of E1 and E2 are made equal and these common attributes occur only once in L. ΠL. drinker(σL.drinker=F.drinker(σname=bar( (Πdrinker(σbeer=Bud (L))) (Πbar(σcity=N.Y. (B)))F))) is equivalent with ΠL. drinker((Πdrinker(σbeer=Bud (L))) ⋈ (Πbar(σcity=N.Y. (B))⋈ name=bar F))) ΠL.drinker σL.dinker=F.drinker ΠL.drinker σname=bar ⋈ Πdrinker σbeer=Bud L ⋈ name=bar Πdrinker Πbar σcity=N.Y. B σbeer=Bud F L Πbar σcity=N.Y. B F Commutativity and associativity • Commutativity: E1 Θ E2 ≡ E2 Θ E1, where Θ Є {, ⋈, ⋈C }. • Associativity: (E1 Θ E2) Θ E3 ≡ E1 Θ (E2 Θ E3), where Θ Є {, ⋈} • Note: in general (E1 ⋈C1 E2) ⋈C2 E3 is not equivalent with E1 ⋈C1 (E2 ⋈C2 E3). Why?? Disjunctions in the conditions • Disjunctions in the conditions of selection operators may complicate the situation. • As a first attempt one may use equivalence rule σC1C2 (E) ≡ σC1(E) σC2(E) and then apply the previous algorithm on σC1(E) and σC2(E). • However, in this case the relations appearing in E may be scanned twice, which is costly.