Physical Database Design & Performance
Optimizing for Query Performance


For DBs with high retrieval traffic compared to maintenance traffic, optimizing the DB for query performance is the primary goal.
The amount of work required to do so depends greatly on the type of DBMS.
Some DBMSs give the DB designer or query writer little control over how a query is processed.
Others give significant control to tune the DB design and the structure of queries.
Parallel Query Processing




With the advent of multiprocessor hardware, many DB servers now use symmetric multiprocessing (SMP) technology.
To exploit this parallel processing capability, some DBMSs include strategies for breaking a query apart into modules that can be processed in parallel by the processors.
A common approach is to replicate the query so that each copy works against a portion of the database (usually horizontally partitioned row sets).
The same query runs in parallel on separate processors, and the intermediate results from each processor are combined to create the final query result.
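
The replicate-and-combine approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory ORDERS list, the helper names and the choice of three workers are hypothetical, and worker threads merely stand in for the processors of an SMP server (a real DBMS does this inside its own engine).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "orders" table: (order_id, salesperson) rows.
ORDERS = [(i, "Smith" if i % 4 == 0 else "Jones") for i in range(1, 101)]


def partition_rows(rows, n_parts):
    """Split the rows horizontally into n_parts row sets of similar size."""
    return [rows[i::n_parts] for i in range(n_parts)]


def count_orders(rows, salesperson):
    """The 'query' that every worker runs against its own partition."""
    return sum(1 for _, name in rows if name == salesperson)


def parallel_count(rows, salesperson, n_workers=3):
    """Run the same query on each partition, then combine the partial counts."""
    partitions = partition_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(
            pool.map(lambda part: count_orders(part, salesperson), partitions)
        )
    # Combining step: for COUNT(*), the final result is the sum of the partials.
    return sum(partial_counts)


if __name__ == "__main__":
    print(parallel_count(ORDERS, "Smith"))
```

The combining step depends on the operation: partial counts and sums are added, while partial sorted runs would have to be merged.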
Parallel Query Processing (Contd.)

Parallel query processing speeds can be impressive (e.g., a table scan can take half the processing time of normal, non-parallel processing).
Other table operations that can make use of parallel processing:
  Table joins
  Grouping table results into categories
  Union operations
  Sorting rows
  Computing aggregate values
  Row updates, deletes and inserts
  Creating and rebuilding indexes
Overriding Automatic Query Optimization




With most relational DBMSs, you can learn the optimizer’s plan for processing a query before it runs (EXPLAIN command).
The query optimizer chooses the best plan based on statistics about each table (no. of rows, avg. row length, etc.).
Users should analyze query costs before running the queries.
In Oracle, we may force a full scan as well as a parallel scan in a query that counts the total no. of orders (see example on the next slide).
Overriding Automatic Query Optimization (Contd.)





SELECT /*+ FULL(orders) PARALLEL(orders, 3) */ COUNT(*)
FROM orders
WHERE salesperson = 'Smith';
The clause inside /*+ */ is a hint asking Oracle to override the query plan the optimizer would otherwise choose.
A hint applies only to the query in which it appears.
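
Oracle hints cannot be run outside Oracle, so the following is a rough analogue rather than the same mechanism. It is a minimal sketch using Python's built-in sqlite3 module: EXPLAIN QUERY PLAN shows the plan before the query runs, and SQLite's NOT INDEXED clause asks the optimizer to skip indexes, which plays a role similar to a full-scan hint. The table contents and the index name idx_orders_salesperson are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, salesperson TEXT)")
conn.executemany(
    "INSERT INTO orders (salesperson) VALUES (?)",
    [("Smith",), ("Jones",), ("Smith",), ("Lee",)],
)
# Hypothetical index name, introduced only for this example.
conn.execute("CREATE INDEX idx_orders_salesperson ON orders (salesperson)")


def plan(sql):
    """Return the optimizer's plan steps for a statement without running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the readable detail


default_sql = "SELECT COUNT(*) FROM orders WHERE salesperson = 'Smith'"
# NOT INDEXED tells SQLite not to use any index on this table for the query.
forced_sql = "SELECT COUNT(*) FROM orders NOT INDEXED WHERE salesperson = 'Smith'"

default_plan = plan(default_sql)  # plan chosen by the optimizer
forced_plan = plan(forced_sql)    # plan after overriding the optimizer
print(default_plan)
print(forced_plan)

# The override changes how the rows are found, not which rows are counted.
smith_count = conn.execute(forced_sql).fetchone()[0]
print(smith_count)
```

As with an Oracle hint, the override is written into one statement and affects only that statement.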
Picking Block Size




Data is transferred between RAM and disk in blocks or pages.
Too small a block size increases the number of I/O operations, while too large a block size may transfer extra data that is not needed.
The minimum block size is usually 2K bytes; the maximum is typically 32K bytes or more (depending on the OS).
In general, small block sizes are used for OLTP applications and larger sizes for DSS and data warehousing solutions.
Picking Block Size (Contd.)

Trade-offs among five performance factors when switching from small to large block sizes:

Block contention
  When several I/O commands access the same block concurrently, smaller blocks create less contention.
Random row access
  When a single row from a table has to be accessed, smaller blocks are best, e.g., in OLTP applications.
Sequential row access
  When many rows have to be accessed sequentially, a larger block size is better.
Picking Block Size (Contd.)



Sequential row access (continued)
  Large blocks allow many rows to be cached in RAM in one I/O operation.
  Sequential scans occur in DSS and data warehousing applications.
Row size (length of all fields in a table row)
  It is usually best to match the block size with the physical table row size or a multiple of the row size.
Overhead
  This is the cost, in terms of time, of managing I/O operations for a database operation.
  Smaller block sizes have more overhead than larger ones.
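
The trade-offs above can be made concrete with a little arithmetic. The sketch below is a simplified model, not a description of any particular DBMS: it assumes fixed-length rows that never span blocks and ignores block headers, and the row size (200 bytes) and table size (100,000 rows) are made-up figures.

```python
import math


def rows_per_block(block_size, row_size):
    """Whole rows that fit in one block (rows are assumed not to span blocks)."""
    return block_size // row_size


def blocks_for_scan(n_rows, block_size, row_size):
    """Block reads needed to scan n_rows sequentially."""
    return math.ceil(n_rows / rows_per_block(block_size, row_size))


def extra_bytes_for_one_row(block_size, row_size):
    """Bytes transferred but not needed when a single row is fetched at random."""
    return block_size - row_size


def unused_bytes_per_block(block_size, row_size):
    """Space left over when the block size is not a multiple of the row size."""
    return block_size % row_size


ROW_SIZE = 200      # assumed bytes per row
N_ROWS = 100_000    # assumed number of rows in the table

for block_size in (2 * 1024, 32 * 1024):
    print(
        block_size,
        blocks_for_scan(N_ROWS, block_size, ROW_SIZE),
        extra_bytes_for_one_row(block_size, ROW_SIZE),
        unused_bytes_per_block(block_size, ROW_SIZE),
    )
```

Under these assumptions the 32K block needs far fewer reads for a full scan, while the 2K block drags along far less unneeded data when only one row is wanted.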
How to Design Better Queries?

DB experts have suggested various guidelines for improving query processing.

Understand how indexes are used in query processing
  Many DBMSs use only one index per table in a query.
  Learn how the DBMS selects which index to use.
  Drop infrequently used indexes.
  Queries using equality criteria process faster because they can be evaluated via indexes.
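
One way to see how a DBMS uses an index is to compare the plans for an equality criterion and for a criterion the index cannot help with. The sketch below uses SQLite through Python's sqlite3 module as a stand-in; other DBMSs report plans differently, and the table and the index name idx_orders_salesperson are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, salesperson TEXT, amount REAL)"
)
# Hypothetical index on the column used in the search criteria.
conn.execute("CREATE INDEX idx_orders_salesperson ON orders (salesperson)")


def plan_detail(sql):
    """Return the readable plan steps for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Equality criterion: the optimizer can look the value up in the index.
equality_plan = plan_detail("SELECT * FROM orders WHERE salesperson = 'Smith'")

# Leading wildcard: the index order does not help, so every row is examined.
wildcard_plan = plan_detail("SELECT * FROM orders WHERE salesperson LIKE '%mith'")

print(equality_plan)
print(wildcard_plan)
```

In SQLite's plan output, a step beginning with SEARCH means an index or key lookup, while SCAN means the whole table is read.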
How to Design Better Queries? (Contd.)

Use compatible data types for fields & literals in queries
  Compatible means the DBMS does not have to convert data during query processing.
Write simple queries
  Simple queries are easy to process.
Break complex queries into multiple, simple parts
  Results of the smaller queries can be combined (using UNION).
Don’t nest one query inside another
  Such queries are less efficient.
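
Breaking a query into simple parts and combining the results with UNION might look like the sketch below. The tiny orders table is made up for the illustration, and the example only demonstrates that the two forms return the same rows; whether the UNION form is actually faster depends on the DBMS and should be checked against its query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, salesperson TEXT, region TEXT)"
)
conn.executemany(
    "INSERT INTO orders (salesperson, region) VALUES (?, ?)",
    [("Smith", "East"), ("Jones", "West"), ("Lee", "East"), ("Smith", "West")],
)

# One query with a compound OR condition.
complex_rows = conn.execute(
    "SELECT order_id FROM orders "
    "WHERE salesperson = 'Smith' OR region = 'East' "
    "ORDER BY order_id"
).fetchall()

# Two simple queries combined with UNION, which also removes duplicate rows.
union_rows = conn.execute(
    "SELECT order_id FROM orders WHERE salesperson = 'Smith' "
    "UNION "
    "SELECT order_id FROM orders WHERE region = 'East' "
    "ORDER BY order_id"
).fetchall()

print(complex_rows)
print(union_rows)
```

Note that the literals are written as text ('Smith', 'East') to match the TEXT columns, in line with the advice on compatible data types.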
How to Design Better Queries? (Contd.)

Don’t combine a table with itself
  Instead of a self-join, make a temporary copy of the table and then relate the original to the temporary copy.
Create temporary tables for groups of queries
  Sometimes a series of queries all refer to the same subset of data from the database.
  It is more efficient to store this subset in one or more temporary tables to avoid scanning the DB again and again.
Combine update operations
  When possible, combine multiple update commands into one.
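
The last two guidelines can be sketched with SQLite through Python's sqlite3 module. The schema, the temporary table name smith_orders and the column values are hypothetical, chosen only to show the pattern: store a shared subset once, query it several times, and fold two updates into one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, salesperson TEXT, amount REAL, "
    "status TEXT, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (salesperson, amount, status, priority) VALUES (?, ?, ?, ?)",
    [
        ("Smith", 100.0, "open", 0),
        ("Jones", 250.0, "open", 0),
        ("Smith", 50.0, "open", 0),
    ],
)

# Store the subset that a series of queries shares, scanning orders only once.
conn.execute(
    "CREATE TEMP TABLE smith_orders AS "
    "SELECT order_id, amount FROM orders WHERE salesperson = 'Smith'"
)

# The follow-up queries read the small temporary table, not the full table.
smith_count = conn.execute("SELECT COUNT(*) FROM smith_orders").fetchone()[0]
smith_total = conn.execute("SELECT SUM(amount) FROM smith_orders").fetchone()[0]
smith_max = conn.execute("SELECT MAX(amount) FROM smith_orders").fetchone()[0]

# Combined update: one statement sets both columns in a single pass,
# instead of one UPDATE for status and another for priority.
cursor = conn.execute(
    "UPDATE orders SET status = 'closed', priority = 1 WHERE salesperson = 'Smith'"
)
updated_rows = cursor.rowcount

print(smith_count, smith_total, smith_max, updated_rows)
```

A temporary table is a snapshot: if the base table changes while the series of queries is running, the stored subset has to be rebuilt.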
How to Design Better Queries? (Contd.)

Receive only the data you need
  Reduce processing time by leaving out data columns that are not required (e.g., avoid SELECT *).
Don’t have the DBMS sort without an index
  Suppose data has to be displayed in sorted order and no index exists on the sort key field.
  Sort the data after it has been retrieved from the DB.
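
Both guidelines above can be illustrated together. In this minimal sketch the query names only the two columns it needs and sends no ORDER BY, and the rows are then sorted in the application. The table, its columns and the sample rows are assumptions for the example, and the amount column deliberately has no index.

```python
import sqlite3
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, salesperson TEXT, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (salesperson, amount, notes) VALUES (?, ?, ?)",
    [("Smith", 100.0, "rush"), ("Jones", 250.0, ""), ("Lee", 50.0, "gift")],
)

# Name only the columns that are needed instead of using SELECT *.
rows = conn.execute("SELECT salesperson, amount FROM orders").fetchall()

# amount has no index, so the sort is done here after retrieval.
rows_by_amount = sorted(rows, key=itemgetter(1))

print(rows_by_amount)
```

Sorting on the client only pays off when the result set is small enough to hold in the application's memory.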
Learn
  Review query plans using the EXPLAIN command.
  Understand the ways in which the DBMS determines how to process a query.