Download Query Optimization – Seminar 1

Query Optimization – Seminar 6

1. Introduction

Query optimization plays a vital role in query processing. Query processing consists of the following stages:

   1. Parsing a user query (e.g. in SQL).
   2. Translating the parse tree (representing the query) into a relational algebra expression.
   3. Optimizing the initial algebraic expression.
   4. Choosing, for each relational algebra operator, the evaluation algorithm that incurs the least cost for answering the query.

Stages 3 and 4 are the two parts of query optimization.

Query optimization is an important and classical component of a database system. A query written in a high-level, declarative language such as SQL that requires several algebraic operations can usually be composed and ordered in several alternative ways. Finding a “good” composition is the job of the optimizer. The optimizer generates alternative evaluation plans for answering a query and chooses the plan with the least estimated cost. To estimate the cost of a plan (in terms of I/O, CPU time, memory usage, etc., not in pounds or dollars), the optimizer uses statistical information available in the database system catalogue.

2. Objectives

Generally speaking, the purpose of this seminar is to present and discuss how relational algebra (RA) operators are evaluated, in other words, which algorithms are used to implement them. In particular, this seminar compares different evaluation algorithms for the selection (also known as Restrict) operation. In addition, the exercises explore how the physical access methods available to an RDBMS influence the choice of how to evaluate a query.

3. Exercise

Consider a relation R(a, b, c, d, e) containing 5,000,000 records (tuples), where each data page of the relation holds 10 records. R is organized as a sorted file with dense secondary indexes. Assume that R.a is a candidate key for R, with values lying in the range 0 to 4,999,999, and that R is sorted in R.a order.
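The size of R in pages follows directly from the two figures stated above. The symbols N_R (number of records), f_R (records per page) and P_R (number of pages) are names introduced here for convenience; they are not taken from the reading material.

```latex
\[
P_R \;=\; \left\lceil \frac{N_R}{f_R} \right\rceil
    \;=\; \left\lceil \frac{5{,}000{,}000}{10} \right\rceil
    \;=\; 500{,}000 \text{ pages}
\]
```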
For each of the following relational algebra expressions (i.e. queries), state which of the following three approaches (evaluation strategies) is most likely to be the cheapest.

   a) Access the sorted file for R directly or using binary search.
   b) Use a B+ tree index (clustered) on attribute R.a.
   c) Use a hashed index (clustered) on attribute R.a.

   1. σ a<50,000 (R)
   2. σ a=50,000 (R)
   3. σ a ? 50,000 (R)   (the comparison operator is missing in the source)

where σ denotes the selection operation of relational algebra (e.g. σ a<50,000 (R)).

Hints: Calculate how many pages there are in R. Calculate the cost of each approach using the formulas given in the reading material, and select the one with the least cost.
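The comparison asked for in the hints can be sketched as a small calculation. This is a rough illustration only: the cost formulas are common textbook approximations counted in page I/Os, not necessarily those of the reading material, and the B+ tree height (3) and the hash lookup cost (2 I/Os) are assumed values that the exercise does not give. All function and constant names are hypothetical. Only the first two queries are covered.

```python
import math

# Figures stated in the exercise.
RECORDS = 5_000_000
RECORDS_PER_PAGE = 10
PAGES = math.ceil(RECORDS / RECORDS_PER_PAGE)

# ASSUMPTIONS (not given in the exercise): index depth and hash lookup cost.
BTREE_HEIGHT = 3       # index pages read from root to leaf
HASH_LOOKUP_IOS = 2    # one bucket page plus one data page


def data_pages(matching_records):
    """Pages holding the matching records when they are stored contiguously."""
    return math.ceil(matching_records / RECORDS_PER_PAGE)


def sorted_file_cost(matching_records, from_file_start=False):
    """Scan from the head of the file, or binary-search to the first match."""
    if from_file_start:
        return data_pages(matching_records)
    locate = math.ceil(math.log2(PAGES))  # includes reading the first match page
    return locate + max(data_pages(matching_records) - 1, 0)


def btree_cost(matching_records):
    """Clustered B+ tree: descend the tree, then read the matching data pages."""
    return BTREE_HEIGHT + data_pages(matching_records)


def hash_cost(equality):
    """A hash index helps only equality; otherwise fall back to a full scan."""
    return HASH_LOOKUP_IOS if equality else PAGES


costs = {
    # a < 50,000 matches the keys 0..49,999, which sit at the start of the file.
    "a < 50,000": {
        "sorted file": sorted_file_cost(50_000, from_file_start=True),
        "B+ tree": btree_cost(50_000),
        "hash index": hash_cost(equality=False),
    },
    # a = 50,000 matches exactly one record, since R.a is a candidate key.
    "a = 50,000": {
        "sorted file": sorted_file_cost(1),
        "B+ tree": btree_cost(1),
        "hash index": hash_cost(equality=True),
    },
}

for query, by_strategy in costs.items():
    cheapest = min(by_strategy, key=by_strategy.get)
    print(query, by_strategy, "cheapest:", cheapest)
```

Under these assumptions the range query favours reading the sorted file from its start, while the equality query favours the hash index, with the B+ tree close behind. Different assumed values for the tree height or hash lookup cost shift the numbers but are unlikely to change the ranking for the range query, where the hash index cannot avoid a full scan.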
