Query Optimization
(CB Chapter 23.1-23.3)
CPSC 356 Database
Ellen Walker
Hiram College
(Includes figures from Database Systems: An Application Oriented Approach 2ed by
Kifer, Bernstein & Lewis, © Addison Wesley 2005)
SQL
• Widely used (only?) standard query language
for relational databases
• Once SEQUEL (Structured English QUEry
Language), now Structured Query Language
• Objectives
– Easy to learn, easy to use
– Create and modify the database and query from it
• DDL defines
• DML manipulates
SQL is Declarative, RA is Procedural
• SQL Statements describe the desired results,
but do not specify a sequence of operations
to get those results
• Relational Algebra expressions describe a
specific sequence of operations to perform
• To evaluate a SQL statement, it needs to be
translated into (a computer implementation
of) RA first!
Query Processing
• Query Processing is the translation of SQL into RA-like nested function calls
• One query can have multiple translations
SELECT roomNo, hotelName FROM Room, Hotel
WHERE hotelName = 'Savoy' AND Room.hotelNo = Hotel.hotelNo;
π_{roomNo, hotelName}(σ_{hotelName='Savoy' ∧ Room.hotelNo=Hotel.hotelNo}(Room × Hotel))
π_{roomNo, hotelName}(σ_{Room.hotelNo=Hotel.hotelNo}(σ_{hotelName='Savoy'}(Hotel) × Room))
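The "nested function calls" idea can be sketched in Python. This is only an illustration: the sample rows, the qualified attribute names such as "Room.hotelNo", and the helper functions select, project and product are assumptions made for the sketch, not part of the slides or of any real DBMS.

```python
# Relations modeled as lists of dicts. Sample data is made up for illustration.
room = [
    {"roomNo": 101, "Room.hotelNo": 1},
    {"roomNo": 102, "Room.hotelNo": 1},
    {"roomNo": 201, "Room.hotelNo": 2},
]
hotel = [
    {"Hotel.hotelNo": 1, "hotelName": "Savoy"},
    {"Hotel.hotelNo": 2, "hotelName": "Ritz"},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """pi: keep only the named attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def product(r, s):
    """Cartesian product: every tuple of r paired with every tuple of s."""
    return [{**a, **b} for a in r for b in s]

is_savoy = lambda t: t["hotelName"] == "Savoy"
same_hotel = lambda t: t["Room.hotelNo"] == t["Hotel.hotelNo"]

# First translation: one big select over the full product.
plan1 = project(["roomNo", "hotelName"],
                select(lambda t: is_savoy(t) and same_hotel(t),
                       product(room, hotel)))

# Second translation: select on Hotel first, then product, then join condition.
plan2 = project(["roomNo", "hotelName"],
                select(same_hotel,
                       product(select(is_savoy, hotel), room)))

print(plan1 == plan2)
```

Both nestings describe the same result; they differ only in the order in which the operations run.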
Query Optimization
• Choose the translation that minimizes resource use
(time, space)
• The second translation below is better. (Why?)
SELECT roomNo, hotelName FROM Room, Hotel
WHERE hotelName = 'Savoy' AND Room.hotelNo = Hotel.hotelNo;
π_{roomNo, hotelName}(σ_{hotelName='Savoy' ∧ Room.hotelNo=Hotel.hotelNo}(Room × Hotel))
π_{roomNo, hotelName}(σ_{hotelName='Savoy'}(Hotel) ⋈_{Room.hotelNo=Hotel.hotelNo} Room)
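One way to approach the "Why?" is to count how many tuple pairs each translation has to form. The sketch below assumes made-up table sizes (50 hotels, 1000 rooms) and uses a plain nested loop for both plans; a real engine would likely use a better join method.

```python
# Hypothetical data: 50 hotels, 1000 rooms spread evenly across them.
hotels = [{"hotelNo": h, "hotelName": "Savoy" if h == 0 else f"Hotel{h}"}
          for h in range(50)]
rooms = [{"roomNo": r, "hotelNo": r % 50} for r in range(1000)]

# Plan 1: materialize Room x Hotel, then filter.
pairs_plan1 = [(r, h) for r in rooms for h in hotels]
result1 = [(r["roomNo"], h["hotelName"]) for r, h in pairs_plan1
           if h["hotelName"] == "Savoy" and r["hotelNo"] == h["hotelNo"]]

# Plan 2: filter Hotel first, then pair the much smaller result with Room.
savoy = [h for h in hotels if h["hotelName"] == "Savoy"]
pairs_plan2 = [(r, h) for h in savoy for r in rooms]
result2 = [(r["roomNo"], h["hotelName"]) for r, h in pairs_plan2
           if r["hotelNo"] == h["hotelNo"]]

print(len(pairs_plan1), len(pairs_plan2), len(result1))
```

With these assumed sizes the first plan forms 50 times as many pairs as the second, yet both return the same rooms.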
Query Processing
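[Figure: the user's SQL passes through Decomposition to give Relational Algebra, through Optimization to give Efficient Rel. Algebra, and through the Processing Engine to give the Result (table) returned to the user.]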
More Detail of Query Processing
Steps in Query Processing
• Query Decomposition (create relational
algebra expression)
• Query Optimization (create execution plan)
• Code Generation
• Query Execution
Parts of Optimization
• Query Plan Generator
– Comes up with viable relational algebra
expressions to improve the initial naïve one
• Cost Estimator
– Estimates the cost (time / space) of each plan
• Optimization
– Choosing the plan with the lowest cost, or at least
“reasonably cheap”
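The three parts can be sketched as a toy optimizer. Everything here is an assumption for illustration: plans are nested tuples, the table sizes are invented, a select on an attribute with n distinct values is assumed to keep 1/n of its input, and cost is simply the number of rows produced by all steps.

```python
TABLE_SIZES = {"Room": 1000, "Hotel": 50}  # assumed statistics

def estimate(plan):
    """Cost estimator: return (estimated output rows, total rows produced)."""
    op = plan[0]
    if op == "table":
        return TABLE_SIZES[plan[1]], 0
    if op == "select":
        _, n_distinct, child = plan
        rows, cost = estimate(child)
        out = rows / n_distinct          # uniform-distribution assumption
        return out, cost + out
    if op == "product":
        _, left, right = plan
        lrows, lcost = estimate(left)
        rrows, rcost = estimate(right)
        out = lrows * rrows
        return out, lcost + rcost + out
    raise ValueError(f"unknown operator: {op}")

# Plan generator (hand-written here): the naive plan and one rewritten plan.
# 2500 = 50 hotel names x 50 hotel numbers, treating both conditions as independent.
naive = ("select", 2500, ("product", ("table", "Room"), ("table", "Hotel")))
pushed = ("select", 50,
          ("product", ("select", 50, ("table", "Hotel")), ("table", "Room")))
candidates = [naive, pushed]

# Optimization: pick the candidate with the lowest estimated cost.
best = min(candidates, key=lambda p: estimate(p)[1])
print(estimate(naive), estimate(pushed))
```

A real plan generator would enumerate many more candidates, which is why "reasonably cheap" is often the practical goal.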
Query Decomposition
• Check Syntax
• Build a relational algebra tree
σ_{Room.hotelNo=Hotel.hotelNo}
  ×
    σ_{hotelName='Savoy'}
      Hotel
    Room
Query Transformation (Selection)
• A Select with multiple AND conditions can be rewritten as a sequence of Selects
σ_{hotelName='Savoy' ∧ Room.hotelNo=Hotel.hotelNo}(…)
= σ_{hotelName='Savoy'}(σ_{Room.hotelNo=Hotel.hotelNo}(…))
• Order of Select operations doesn't matter
= σ_{Room.hotelNo=Hotel.hotelNo}(σ_{hotelName='Savoy'}(…))
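A quick check of both selection rules on a small made-up relation; the rows and the select helper are hypothetical, chosen only to exercise the equivalences.

```python
# A stand-in for the inner expression (...): a few joined Room/Hotel tuples.
rows = [
    {"hotelName": "Savoy", "Room.hotelNo": 1, "Hotel.hotelNo": 1},
    {"hotelName": "Savoy", "Room.hotelNo": 2, "Hotel.hotelNo": 1},
    {"hotelName": "Ritz",  "Room.hotelNo": 2, "Hotel.hotelNo": 2},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

is_savoy = lambda t: t["hotelName"] == "Savoy"
same_hotel = lambda t: t["Room.hotelNo"] == t["Hotel.hotelNo"]

combined = select(lambda t: is_savoy(t) and same_hotel(t), rows)
savoy_outer = select(is_savoy, select(same_hotel, rows))   # cascade
savoy_inner = select(same_hotel, select(is_savoy, rows))   # swapped order

print(combined == savoy_outer == savoy_inner)
```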
Query Transformation (Projection)
• Extra intermediate projections don't matter
π_{Name}(π_{Name, Status}(Student)) = π_{Name}(Student)
• Order of select and project doesn't matter (provided the projection keeps every attribute the selection tests)
σ_{Status='SR'}(π_{Status}(Student)) =
π_{Status}(σ_{Status='SR'}(Student))
Query Transformation (Join)
• Push Select through Join
– Replace a select on a cross-product with a join
– Joins can be implemented at the lowest level more efficiently
than “materialized cross-product”
• Push Select through Product
– If the attributes of the condition all belong to one table of the
join, put the select on only the one table, so a smaller table
is joined
– Joins on smaller tables are faster than on larger ones
• More rules pp. 640-642
Pushing Select Example
• Find all seniors that take CPSC356
σ_{stu_id=id ∧ crs='CPSC356'}(Student × Transcript)
• Separate the selects
σ_{stu_id=id}(σ_{crs='CPSC356'}(Student × Transcript))
• Push the inner select
σ_{stu_id=id}(Student × σ_{crs='CPSC356'}(Transcript))
• Replace Select/Product by Join
Student ⋈_{stu_id=id} σ_{crs='CPSC356'}(Transcript)
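The four steps can be replayed on a tiny made-up database to confirm that each rewrite returns the same tuples. The sample rows and the helpers select, product and join are assumptions for this sketch; join here is a simple nested loop that tests each pair as it is formed.

```python
students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
transcript = [
    {"stu_id": 1, "crs": "CPSC356"},
    {"stu_id": 1, "crs": "MATH201"},
    {"stu_id": 2, "crs": "MATH201"},
    {"stu_id": 3, "crs": "CPSC356"},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def product(r, s):
    """Cartesian product, fully materialized."""
    return [{**a, **b} for a in r for b in s]

def join(r, s, pred):
    """Theta-join: pairs are tested as they are formed, so the product is never stored."""
    return [{**a, **b} for a in r for b in s if pred({**a, **b})]

same_student = lambda t: t["stu_id"] == t["id"]
takes_356 = lambda t: t["crs"] == "CPSC356"

step1 = select(lambda t: same_student(t) and takes_356(t), product(students, transcript))
step2 = select(same_student, select(takes_356, product(students, transcript)))
step3 = select(same_student, product(students, select(takes_356, transcript)))
step4 = join(students, select(takes_356, transcript), same_student)

print([t["name"] for t in step4])
```

In steps 1 and 2 the product holds 12 tuples; after the push in step 3 it holds 6, and in step 4 it is never built at all.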
Query Processing Example
Execution Plans
• Add specific algorithms to each relational
algebra step
• Determine whether/how indices will be used
• Add pipelining (not storing intermediate data)
where possible
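Pipelining can be sketched with Python generators: each operator hands tuples to the next one as they are produced, so no intermediate table is stored. The operator names and the scanned log are assumptions for this illustration; project here keeps duplicates, as SQL does without DISTINCT.

```python
transcript = [
    {"stu_id": 1, "crs": "CPSC356"},
    {"stu_id": 1, "crs": "MATH201"},
    {"stu_id": 3, "crs": "CPSC356"},
]

scanned = []  # records which base tuples have been read so far

def scan(rel):
    """Read the base table one tuple at a time."""
    for t in rel:
        scanned.append(t)
        yield t

def select(pred, rows):
    """sigma as a generator: pass matching tuples straight through."""
    for t in rows:
        if pred(t):
            yield t

def project(attrs, rows):
    """pi as a generator (no duplicate removal)."""
    for t in rows:
        yield {a: t[a] for a in attrs}

pipeline = project(["stu_id"],
                   select(lambda t: t["crs"] == "CPSC356", scan(transcript)))

first = next(pipeline)              # pulls just enough input for one result
scanned_after_first = len(scanned)
rest = list(pipeline)               # drains the remaining tuples

print(first, scanned_after_first, rest)
```

Only one base tuple has been read when the first result appears, which is the point of pipelining.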
Choosing Transformations
• Estimate cost of each tree based on
– Table sizes
– Numbers of distinct attribute values
– Average number of tuples for selection condition
– Methods used for join (e.g. indexed, hashed)
• Choose lowest cost tree
• Because estimates aren’t perfect, the
absolute best tree might not be chosen!
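A small sketch of why estimates can miss. It assumes the common simplification that a selection on one attribute returns about (table size) / (number of distinct values) tuples, and applies it to a made-up, skewed Status column.

```python
from collections import Counter

# Hypothetical Student.Status values with a skewed distribution.
statuses = ["FR"] * 700 + ["SO"] * 200 + ["JR"] * 80 + ["SR"] * 20

table_size = len(statuses)               # 1000 tuples
distinct_values = len(set(statuses))     # 4 distinct statuses

# Estimate for Status = 'SR' under a uniform-distribution assumption.
estimated_rows = table_size / distinct_values

# What the selection would really return.
actual_rows = Counter(statuses)["SR"]

print(estimated_rows, actual_rows)
```

An optimizer working from the estimate would plan for 250 tuples when only 20 exist, so it could prefer a tree that is not actually the cheapest.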
Heuristics (Rules of Thumb)
• Perform Selection as early as possible
– Unless doing it later lets you use an index
• Combine × (Cartesian product) and Selection into a join operation
• Execute most restrictive Selections first
• Perform Projection as early as possible
• Compute common expressions once
– Creating a view is a way to do this!
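The "most restrictive Selections first" heuristic can be illustrated by counting how many tuples each ordering has to test. The data, the two predicates, and the work-counting select helper are all invented for this sketch.

```python
rows = [{"n": i} for i in range(1000)]

def select(pred, rel, work):
    """sigma that also records how many tuples it had to test."""
    work.append(len(rel))
    return [t for t in rel if pred(t)]

loose = lambda t: t["n"] % 2 == 0     # keeps 500 of 1000 tuples
tight = lambda t: t["n"] % 100 == 0   # keeps 10 of 1000 tuples

# Less restrictive selection first.
work_loose_first = []
result_loose_first = select(tight, select(loose, rows, work_loose_first),
                            work_loose_first)

# Most restrictive selection first.
work_tight_first = []
result_tight_first = select(loose, select(tight, rows, work_tight_first),
                            work_tight_first)

print(sum(work_loose_first), sum(work_tight_first))
```

Both orderings return the same 10 tuples, but the second selection sees 500 tuples in one ordering and only 10 in the other.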
Consequences for SQL Programmer
• Using RA operations in SQL (e.g. explicit Join)
constrains optimization
– Good when “programmer knows best”
– Bad when programmer prevents a better optimization
• Intermediate tables (i.e. views) can constrain
optimization
– Use views to compute common subexpressions
• When performance is substandard, tweaking the SQL
can help! (Remember the heuristics).
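When tweaking SQL, it helps to confirm that two formulations return the same rows and then to look at the plan the engine chose. The sketch below uses Python's built-in sqlite3 module with made-up sample data. The text printed by EXPLAIN QUERY PLAN is specific to SQLite and its version, so nothing is assumed about it here, and other systems have their own plan-inspection commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hotel (hotelNo INTEGER PRIMARY KEY, hotelName TEXT);
CREATE TABLE Room (roomNo INTEGER, hotelNo INTEGER REFERENCES Hotel(hotelNo));
INSERT INTO Hotel VALUES (1, 'Savoy'), (2, 'Ritz');
INSERT INTO Room VALUES (101, 1), (102, 1), (201, 2);
""")

# The declarative form from the slides (join condition in WHERE).
implicit = ("SELECT roomNo, hotelName FROM Room, Hotel "
            "WHERE hotelName = 'Savoy' AND Room.hotelNo = Hotel.hotelNo "
            "ORDER BY roomNo")

# The same query written with an explicit JOIN.
explicit = ("SELECT roomNo, hotelName FROM Room "
            "JOIN Hotel ON Room.hotelNo = Hotel.hotelNo "
            "WHERE hotelName = 'Savoy' ORDER BY roomNo")

rows_implicit = conn.execute(implicit).fetchall()
rows_explicit = conn.execute(explicit).fetchall()

# Print whatever plan SQLite picked for each formulation.
for sql in (implicit, explicit):
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(step)

print(rows_implicit == rows_explicit)
```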