* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Parallel Query Processing
Survey
Document related concepts
Microsoft Access wikipedia , lookup
Operational transformation wikipedia , lookup
Information privacy law wikipedia , lookup
Asynchronous I/O wikipedia , lookup
Business intelligence wikipedia , lookup
Data vault modeling wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Clusterpoint wikipedia , lookup
Versant Object Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Transcript
Physical Database Design & Performance Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing the DB for query performance is the primary goal. The amount of work required to do so greatly depends on the type of DBMS. Some DBMS give little control to DB designer / query writer on how a query is processed. Other give significant control to tune DB design and structure of queries. Parallel Query Processing With the advent of multiple processors, many DB servers are now frequently using SMP technology. To exploit this parallel processing capability, some DBMS include strategies for breaking apart a query into modules that can be processed in parallel by each of the processors. A common approach is to replicate the query so that each copy works against a portion of the database (usually horizonally partitioned row sets). The same query runs in parallel on separate processors, intermediate results from each processor are combined to create final query result Parallel Query Processing Parallel query processing speeds can be impressive (e.g., for a table scan, it can take half processing time as compared to normal processing) Other table operations that can make use of parallel processing: Table joins Grouping table results into categories Union operation Sorting rows Computing aggregate values Row update, delete and inserts Creating and rebuilding index Overriding Automatic Query Optimization With most relational DBMSs, you can learn the optimizer’s plan for processing the query before it runs (EXPLAIN command). The query optimizer uses he best plan based on statistics about each table (no. of rows, avg row length, etc.) User must analyze the query costs before processing them. In Oracle, we may force a full scan as well as parallel scan in a query that counts total no. of orders (see example next slide). Overriding Automatic Query Optimization Select /* + full(order) parallel(order,3)*/ count(*) from orders where salesperson=“Smith” The clause inside /* */ is a hint to Oracle to override the actual query plan. A hint is specific to each query. Picking Block Size Data is transferred between RAM and disk memory in blocks or pages. Too small block size will increase I/O while too large one may result in extra data transfer. Usually min. block size is 2K bytes, typically max . size is 32 bytes or more (depends on OS) In general small block sizes are used for OLTP applications and larger sizes for DSS and data warehousing solutions Picking Block Size (Contd.) Trade-offs among 5 performance factors while switching from small to large block sizes: Block contention Random row access In concurrent access of same block by several I/O commands, smaller blocks create less contention When one row from a table has to be accessed, smaller blocks are best, e.g., in case of OLTP applications Sequential row access When many rows have to be accessed sequentially, larger block size is better. Picking Block Size (Contd.) Row size (length of all fields in a table row) Large blocks allow many rows to be cached in RAM in one I/O operation. Sequential scans occur in DSS and Datawarehousing application. It is usually best to match block size with physical table row size or a multiple of row size. Overhead This is the cost in terms of ‘time’ to manage I/O operations for a database operation. Smalle block sizes have more overhead than larger ones. How to Design Better Queries? Various guidelines have been suggested by various DB experts for improving query processing. Understand how indexes are used in query processing Many DBMS use only 1 index per table in a query. Learn how DBMS selects which index. Drop infrequently used indexes. Queries using equality criteria process faster as they can be evaluated via indexes. How to Design Better Queries? Use compatible data types for fields & literals in queries Write simple queries Simple queries are easy to process. Break complex queries into multiple, simple parts Compatible means DBMS will avoid to convert data during query processing. Result of smaller queries can be combined (using UNION) Don’t nest one query inside another Such queries are less efficient. How to Design Better Queries? Don’t combine a table with itself Create temporary tables for groups of queries Instead of self-join, make temp. copy of the table and then relate original with temp. table Sometimes a series of queries all refer to same subset of data from database. it will be more efficient to store this subset in one or more temp. tables to avoid scanning DB again and again Combine update operations When possible, combine multiple update commands into one. How to Design Better Queries? Receive only the data you need Don’t have the DBMS sort without an index Reduce processing time by avoiding extra data columns which are not required (e.g. avoid use of select *) Suppose data has to be displayed in sorted order and index does not exist on sort key field. Sort the data after it has been retrieved from DB Learn Review query plans using EXPLAIN command Understand ways in which DBMS determines query processing