Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Query Optimization (CB Chapter 23.1-23.3) CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems: An Application Oriented Approach 2ed by Kifer, Bernstein & Lewis, © Addison Wesley 2005) SQL • Widely used (only?) standard query language for relational databases • Once SEQUEL (Structured English QUEry Language), now Structured Query Language • Objectives – Easy to learn, easy to use – Create and modify the database and query from it • DDL defines • DML manipulates SQL is Declarative, RA is Procedural • SQL Statements describe the desired results, but do not specify a sequence of operations to get those results • Relational Algebra expressions describe a specific sequence of operations to perform • To evaluate a SQL statement, it needs to be translated into (a computer implementation of) RA first! Query Processing • Query Processing is the translation of SQL into RAlike nested function calls • One query can have multiple translations SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo; roomNo,HotelName ( hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x Hotel) ) roomNo,HotelName ( Room.hotelNo=Hotel.hotelNo ( (hotelName=Savoy (Hotel)) x Room) ) Query Optimization • Choose the translation that minimizes resource use (time, space) • The second translation below is better. (Why?) SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo; roomNo,HotelName ( hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x Hotel) ) roomNo,HotelName ( (hotelName=Savoy (Hotel)) Room.hotelNo=Hotel.hotelNo Room) Query Processing SQL Decomposition Relational Algebra Optimization User Result (table) Processing Engine Efficient Rel. Algebra More Detail of Query Processing Steps in Query Processing • Query Decomposition (create relational algebra expression) • Query Optimization (create execution plan) • Code Generation • Query Execution Parts of Optimization • Query Plan Generator – Comes up with viable relational algebra expressions to improve the initial naïve one • Cost Estimator – Estimates the cost (time / space) of each plan • Optimization – Choosing the plan with the lowest cost, or at least “reasonably cheap” Query Decomposition • Check Syntax • Build a relational algebra tree Room.hotelNo=Hotel.hotelNo X hotelName=Savoy Hotel Room Query Transformation (Selection) • Select with multiple AND conditions can be sequence of selects hotelName=Savoy and Room.hotelNo=Hotel.hotelNo( …) = hotelName=Savoy (Room.hotelNo=Hotel.hotelNo ( …)) • Order of Select operations doesn’t matter = Room.hotelNo=Hotel.hotelNo ( hotelName=Savoy ( …)) Query Transformation (Projection) • Extra intermediate projections don’t matter Name(Name, Status(Student))= Name (Student) • Order of select and project doesn’t matter Status=‘SR’(Status(Student)) = Status(Status=‘SR’(Student)) Query Transformation (Join) • Push Select through Join – Replace a select on a cross-product with a join – Joins can be implemented at the lowest level more efficiently than “materialized cross-product” • Push Select through Product – If the attributes of the condition all belong to one table of the join, put the select on only the one table, so a smaller table is joined – Joins on smaller tables are faster than on larger ones • More rules pp. 640-642 Pushing Select Example • Find all seniors that take CPSC356 stu_id=id & crs=‘CPSC356’(Student x Transcript) • Separate the selects stu_id=id ( crs=‘CPSC356’(Student x Transcript)) • Push the inner select stu_id=id (Student x (crs=‘CPSC356’(Transcript))) • Replace Select/Product by Join Student |x| stu_id=id (crs=‘CPSC356’(Transcript))) Query Processing Example Query Processing Example Execution Plans • Add specific algorithms to each relational algebra step • Determine whether/how indices will be used • Add pipelining (not storing intermediate data) where possible Choosing Transformations • Estimate cost of each tree based on – – – – Table sizes Numbers of distinct attribute values Average number of tuples for selection condition Methods used for join (e.g. indexed, hashed) • Choose lowest cost tree • Because estimates aren’t perfect, the absolute best tree might not be chosen! Heuristics (Rules of Thumb) • Perform Selection as early as possible – Unless doing it later lets you use an index • • • • Combine X and Selection into join operation Execute most restrictive Selections first Perform Projection as early as possible Compute common expressions once – Creating a view is a way to do this! Consequences for SQL Programmer • Using RA operations in SQL (e.g. explicit Join) constrains optimization – Good when “programmer knows best” – Bad when programmer prevents a better optimization • Intermediate tables (i.e. views) can constrain optimization – Use views to compute common subexpressions • When performance is substandard, tweaking the SQL can help! (Remember the heuristics).