Relational Database Design and Performance Tuning for DB2 Database Servers

Document Version: 1.0
Document Status: Draft
Author: Edward Bernal
IBM SWG - Tivoli Software
3901 S. Miami Blvd., Durham, NC 27703
+1 (919) 224-1598 / +1 (919) 224-2560
IBM Confidential | 27 June 2017 | CMDB RR API SDD1, Ling Tai

NOTE: The hard copy version of this document is FOR REFERENCE ONLY. The online version is the master, and it is the responsibility of the user to ensure that they have the current version and that the document is complete prior to use. Any outdated hard copy is invalid and must be removed from possible use.

Contents

1. Document Control
   1.1 Summary of Changes
2. Introduction
   2.1 Definitions
   2.2 Performance Factors
       2.2.1 Database Design
       2.2.2 Application Design
       2.2.3 Hardware Design and Operating System Usage
3. Database Design Details
   3.1 Key Factors
       3.1.1 Tablespaces
       3.1.2 Tables
       3.1.3 Bufferpools
       3.1.4 Logging
       3.1.5 Indexing
   3.2 Database Maintenance
       3.2.1 REORG
       3.2.2 RUNSTATS
       3.2.3 REBIND
4. Application Design Details
   4.1 Key Factors
       4.1.1 Review application SQL for efficiencies *** IMPORTANT ***
       4.1.2 Concurrency Control and Isolation Level
       4.1.3 Locking
       4.1.4 Fetch needed columns only
       4.1.5 Reuse resources
       4.1.6 SQL Statement Preparation
       4.1.7 Query Tuning
5. Hardware Design and Operating System Usage
   5.1 Key Factors
       5.1.1 Memory
       5.1.2 CPU
       5.1.3 I/O
       5.1.4 Network
6. Tuning Opportunities
   6.1 Insert/Delete Performance
   6.2 Database Manager Configuration Tuning
   6.3 Database Configuration Tuning
   6.4 Bufferpools
   6.5 Registry Variables
7. Monitoring Tools
8. Tuning Methodology
   8.1 Tuning Approach
   8.2 Skills Needed
   8.3 General Process
   8.4 DB2 Specific Tuning
       8.4.1 SQL Reviews
       8.4.2 Database Statistics
       8.4.3 SNAPSHOT and EVENT Monitors
       8.4.4 DB2BATCH
9. Publications & Online Help
10. Sizing
11. Reference
12. Addendum

1. Document Control

1.1 Summary of Changes

The table below contains the summary of changes:

Version   Date   Description of changes
1.0              Distribution of 1st draft

2. Introduction

Any performance tuning requires some knowledge of the application involved, and relational database systems are no different. In fact, the more knowledge available about an application, the better one can make decisions that will positively affect the performance of that application. While it is possible to perform some amount of tuning of a relational database system after the fact, the more attention you pay to the overall design of your entire system up front, the better the results will generally be.
While much of this material is applicable to DB2 on all available platforms, this paper deals specifically with DB2 on distributed platforms, i.e. Windows, UNIX, and Linux. The material was obtained from readily available publications on DB2 and from over 18 years of experience designing, developing, and tuning DB2 systems. While DB2 specific, many of the concepts are applicable to relational databases in general, such as Oracle and Microsoft SQL Server.

This document is not intended to replace the detailed information that is available in various manuals, Redbooks, etc., that deal specifically with DB2 performance. The intent is to point out some of the major factors that affect DB2 performance and, hopefully, provide an easy-to-understand reference so that the user will not have to read through and understand in detail all of the previously mentioned reference material. The detailed material can, and should, be referenced when addressing a specific area of concern. Refer to Section 9 for a listing of some of these publications. It is important to note that all of these factors should be addressed for each application developed.

2.1 Definitions

Throughput - The amount of data transferred from one place to another, or processed, in a specified amount of time. Data transfer rates for disk drives and networks are measured in terms of throughput. Typically, throughputs are measured in kbps, Mbps, and Gbps.

Optimizer - When an SQL statement needs to be executed, the SQL compiler must determine the access plan to the database tables. The optimizer creates this access plan, using information about the distribution of data in specific columns of tables and indexes if these columns are used to select rows or join tables. The optimizer uses this information to estimate the costs of alternative access plans for each query. Its decisions are heavily influenced by statistical information about the size of the database tables and available indexes.
Clustered Index - An index whose sequence of key values closely corresponds to the sequence of rows stored in a table. The degree to which this correspondence exists is measured by statistics that are used by the optimizer.

Cardinality - With respect to tables, the number of rows in the table; with respect to indexed columns, the number of distinct values of that column in a table.

Prefetch - An operation in which data is read before, and in anticipation of, its use. DB2 supports the following mechanisms:

Sequential prefetch - A mechanism that reads consecutive pages into the buffer pool before the pages are required by the application.

List prefetch - Sometimes called list sequential prefetch. Prefetches a set of nonconsecutive data pages efficiently.

2.2 Performance Factors

There are a number of areas that factor into the overall performance of any application. Below are the general areas of concern, along with an explanation of each.

2.2.1 Database Design

The term "Database Design" can mean many things to many people. There are two main types of data models: logical and physical. A logical model is a representation, often graphical in nature, of the information requirements of a business area; it is not a database. Its main purpose is to ensure that its structure and content can represent and support the business requirements of an area of interest. It is independent of any database technology. After completing your logical database design, there are a number of issues you should consider about the physical environment in which your database and tables will reside. These include understanding the files that will be created to support and manage your database, understanding how much space will be required to store your data, and determining how you should use the tablespaces that are required to store your data.
This document deals only with the aspects of physical database design.

2.2.2 Application Design

For the purposes of this document, application design deals with aspects of how you access your database system. A number of techniques will be discussed that, if used, can positively influence the performance of your application.

2.2.3 Hardware Design and Operating System Usage

For any database system, there are a number of common areas that need to be addressed and sized appropriately in order to support your application workload. This section will discuss common, and platform-specific, hardware and operating system components.

3. Database Design Details

3.1 Key Factors

3.1.1 Tablespaces

A tablespace is a physical storage object that provides a level of indirection between a database and the tables stored within it. It is made up of a collection of containers into which database objects are stored. A container is an allocation of space to a tablespace. Depending on the tablespace type, the container can be a directory, device, or file. The data, index, long field, and LOB portions of a table can be stored in the same tablespace, or can be individually broken out into separate tablespaces.

When working with database systems, the main objective is to be able to store and retrieve data as quickly and efficiently as possible. One important consideration when designing your database, or analyzing a performance problem on an existing database, is the physical layout of the database itself. DB2 provides support for two types of tablespaces:

System Managed Space (SMS) - Stores data in operating system files. SMS tablespaces are an excellent choice for general-purpose use, providing good performance with little administration cost.
Database Managed Space (DMS) - With DMS tablespaces, the database manager controls the storage space. A list of devices or files is selected to belong to a tablespace when it is defined, and the space on those devices or files is managed by the DB2 database manager. There is some additional administration cost with DMS tablespaces, primarily due to monitoring and adjusting the size of the pre-allocated files. A DMS tablespace can easily be increased in size, either by ALTERing an existing container or by adding a new container to it.

3.1.1.1 Recommendations

DMS tablespaces usually perform better than SMS tablespaces because they are pre-allocated and do not have to spend time extending files when new rows are added. DMS tablespaces can be either raw devices or file system files. DMS tablespaces in raw device containers provide the best performance because double buffering does not occur. Double buffering, which occurs when data is buffered first at the database manager level and then at the file system level, might be an additional cost for file containers or SMS tablespaces.

If you use SMS tablespaces, consider using the db2empfa command on your database. The db2empfa (Enable Multipage File Allocation) tool enables the use of multipage file allocation for a database. With multipage file allocation enabled for SMS tablespaces, disk space is allocated one extent, rather than one page, at a time, improving INSERT throughput.

Using DMS tablespaces also allows a single table to store its data, index, and large objects on up to three different DMS tablespaces, thus improving performance through parallel disk I/O operations. Parallel I/O is the process of reading from or writing to two or more I/O devices at the same time to reduce response time.

For example, look at the following statements (assume there are no large object data types):

CREATE TABLE T1 (COLA ...) IN TS1 INDEX IN TS2

CREATE INDEX IX1 ON T1 (COLA ASC)

The table data will be placed in tablespace TS1 and the index data will be placed in tablespace TS2. It would be important to put the different tablespaces on different disk drives in order to enable the possibility of parallel I/O operations. In fact, each tablespace can have multiple containers, and each of those containers could be on a different disk drive. The ideal configuration has to consider a number of factors, such as the number of disks available, RAID level, etc.

3.1.1.2 Platform Specific Recommendations

Windows

File system caching is performed as follows:

- For DMS file containers (and all SMS containers), the operating system might cache pages in the file system cache.
- For DMS device container tablespaces, the operating system does not cache pages in the file system cache.

On Windows, the DB2 registry variable DB2NTNOCACHE specifies whether or not DB2 will open database files with the NOCACHE option. If DB2NTNOCACHE=ON, file system caching is eliminated. If DB2NTNOCACHE=OFF, the operating system caches DB2 files. This applies to all data except for files that contain LONG FIELDS or LOBS. Eliminating system caching allows more memory to be available to the database so that the buffer pool or sortheap can be increased.

3.1.2 Tables

As previously discussed, before creating your physical database tables, you should draw a logical design of your data model. We will briefly discuss aspects of logical database design that affect performance.

3.1.2.1 Normalization

Normalization is the process of restructuring a data model by reducing its relations to their simplest forms. It is a key step in the task of building a logical relational database design. Normalization reduces redundancy in your data and can improve the performance of update and delete statements, since each change only has to be made in one place.
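As a minimal sketch of what a normalized design looks like in practice (the table and column names here are hypothetical, invented purely for illustration):

```sql
-- In an unnormalized design, the customer's address would be repeated on
-- every order row, so an address change would mean updating many rows.
-- Normalized, the address is stored once and orders reference the customer:
CREATE TABLE CUSTOMER
    (CUST_ID   INTEGER      NOT NULL PRIMARY KEY,
     CUST_NAME VARCHAR(40)  NOT NULL,
     ADDRESS   VARCHAR(80)  NOT NULL)

CREATE TABLE ORDERS
    (ORDER_ID  INTEGER       NOT NULL PRIMARY KEY,
     CUST_ID   INTEGER       NOT NULL REFERENCES CUSTOMER (CUST_ID),
     ORDER_AMT DECIMAL(10,2) NOT NULL)

-- An address change is now a single-row update:
UPDATE CUSTOMER SET ADDRESS = '100 New Street' WHERE CUST_ID = 1
```

The retrieval-side cost of this design is that a query needing both order and customer data must join the two tables.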
By normalizing your data, you try to ensure that all columns in a table depend on its primary key. The disadvantage of a fully normalized data structure shows up in data retrieval operations, specifically when a query accesses a large number of related pieces of data from different tables via join operations. For more information about normalization, the works of author C. J. Date are among the better resources; they can be located by searching the internet.

3.1.2.2 Denormalization

Denormalization is the intentional duplication of columns in multiple tables, the consequence of which is increased data redundancy. Denormalization is sometimes necessary to minimize performance problems and is a key step in designing a physical relational database design. The decision about whether or not to denormalize should be based on the following:

- Can you utilize and implement some of the other methods described here to tune your database and improve performance to an acceptable level without denormalizing?
- Can you quantify the likely performance gains from denormalizing, and are they a reasonable tradeoff against the added update overhead?

3.1.2.3 Other considerations

NULLs - In general, columns defined as NOT NULL perform better than nullable columns due to the path length reduction: the database manager does not have to check for null values in a NOT NULL column. Also, every nullable column requires one extra byte per column value. Use NULLs where appropriate, not as a default.

Column lengths - You should define your column lengths, particularly VARCHAR columns, as small as possible for your application. This yields space savings, which may lead to a reduced number of used table and index pages, and fewer index levels, which can improve query performance.
  - If you create an index on a column defined as VARCHAR, all index entries for that column take the maximum length of the VARCHAR definition, even if the actual data length in the column is smaller than the maximum.

Identity columns - Significant performance enhancement can be realized by using DB2-generated identity values compared to those implemented by an application. They are typically used for generating unique primary key values.

Put frequently updated columns together, and at the end of the row. This affects update performance due to the following logging considerations:

- For fixed-length row updates, DB2 logs from the first changed column to the last changed column.
- For variable-length row updates, DB2 logs from the first changed byte to the end of the row. If the length of a variable-length column changes, this results in a change to the row header (which includes the row length), and thus the entire row is logged.

3.1.3 Bufferpools

A bufferpool is an area of memory into which database pages are read, modified, and held during processing. On any system, accessing memory is faster than disk I/O, and DB2 uses database buffer pools to attempt to minimize disk I/O. There is no definitive answer to the question of how much memory you should dedicate to the buffer pool; generally, more is better. A good rule of thumb is to start with about 75% of your system's main memory devoted to buffer pool(s), but this rule is applicable only if the machine is a dedicated database server. Since the buffer pool is a memory resource, its use has to be considered along with all other applications and processes running on the server.

A spreadsheet will be provided later in this document that can be used to estimate DB2 memory usage.

If your tablespaces have multiple page sizes, then you should create only one buffer pool for each page size. There are some cases where defining multiple buffer pools of the same size can improve performance but, if badly configured, multiple buffer pools can have a huge negative impact on performance. Consider the following when deciding whether to create multiple buffer pools:

- You create tables which reside in tablespaces using a page size other than the 4 KB default. This is required (as mentioned above).
- You have tables which are accessed frequently and quickly by many short update transaction applications. Dedicated buffer pool(s) for these tables may improve response times.
- You have tables larger than main memory which are always fully scanned. These could have their own dedicated buffer pool.

3.1.4 Logging

One of the main purposes of any database system is to maintain the integrity of your data. All databases maintain log files that keep records of database changes. DB2 logging consists of a set of primary and secondary log files containing log records that record all changes to a database. The database log is used to roll back changes for units of work that are not committed and to recover a database to a consistent state. DB2 provides two logging strategy choices.

3.1.4.1 Circular logging

This is the default log mode. With circular logging, log records fill the log files and then overwrite the initial log records in the initial log file. The overwritten log records are not recoverable. This type of logging is typically not suited for a production application.

3.1.4.2 Log Retain logging

Each log file is archived when it fills with log records, and new log files are made available for log records. Retaining log files enables roll-forward recovery.
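Switching a database from circular logging to log retain logging is a configuration change. A minimal sketch (the database name MYDB and the parameter values are examples only; newer DB2 releases express log archiving through the LOGARCHMETH1 parameter instead):

```sql
-- Enable log retain logging; the database is then placed in backup pending
-- state, so a full backup must be taken before it can be used again.
UPDATE DATABASE CONFIGURATION FOR MYDB USING LOGRETAIN RECOVERY

-- While tuning logging, also consider the log buffer and log file size
-- parameters; the values below are in 4 KB pages and are examples only.
UPDATE DATABASE CONFIGURATION FOR MYDB USING LOGBUFSZ 256
UPDATE DATABASE CONFIGURATION FOR MYDB USING LOGFILSIZ 10000
```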
Roll-forward recovery reapplies changes to the database based on completed units of work (transactions) that are recorded in the log. You can specify that roll-forward recovery is to the end of the logs, or to a particular point in time before the end of the logs. Archived log files are never directly deleted by DB2; it is therefore the application's responsibility to maintain them, i.e. archive, purge, etc.

3.1.4.3 Log Performance

Ignoring the performance of your database in relation to its logging can be a costly mistake, the main cost being time. Placement of the log files needs to be optimized, not only for write performance, but also for read performance, because the database manager will need to read the log files during database recovery.

3.1.4.4 Recommendations

- Use the fastest disks available for your log files.
  - Use a separate array and/or channel if possible.
- Use Log Retain logging.
- Mirror your log files.
- Increase the size of the database configuration Log Buffer parameter (logbufsz).
  - This parameter specifies the amount of the database heap to use as a buffer for log records before writing these records to disk. The log records are written to disk when one of the following occurs:
    - A transaction commits, or a group of transactions commit, as defined by the mincommit configuration parameter.
    - The log buffer is full.
    - Some other internal database manager event occurs.
  - Buffering the log records results in more efficient log file I/O, because the log records are written to disk less frequently and more log records are written each time.
- Tune the Log File Size (logfilsiz) database configuration parameter so that you are not creating excessive log files.

3.1.5 Indexing

An index is a set of keys, each pointing to a row, or rows, in a table. An index serves two primary purposes:

1. To ensure uniqueness, as in the case of a primary key. Unique indexes can be created to ensure uniqueness of the index key. An index key is a column, or an ordered collection of columns, on which an index is defined. Using a unique index will ensure that the value of each index key in the indexed column or columns is unique.
2. To allow more efficient access to rows in a table by creating a direct path to the data through pointers.

The SQL optimizer automatically chooses the most efficient way to access data in tables. The optimizer takes indexes into consideration when determining the fastest access path to data.

The main advantages of indexes were pointed out above. Creating an index to ensure uniqueness is typically driven by a business requirement of the application, and such indexes are thus absolutely necessary to create. Beyond that, you should be very careful about the number and size of the indexes that you create on your database tables. Each additional index has an impact on the following:

- Disk storage
- INSERT and DELETE processing
  - CPU
  - I/O
  - Logging
- Database maintenance
  - REORG
  - RUNSTATS

3.1.5.1 Index Considerations

The creation of any index for the purpose of efficient access should be based on a review of the actual SQL that has been, or will be, written against the tables. In addition, the following items need to be considered when determining which indexes to create.

The size of the index, determined by:

- The number and size of the columns in the index
- The projected, or actual, volume of data in your tables

Indexes are implemented as a B-tree structure, with a root level, intermediate level(s), and leaf pages (the lowest level), which contain the actual pointers to the data. A typical index on a moderately sized table would be 3 levels deep.
That means that if the optimizer decided to use this index to read the data, it would have to do a minimum of 4 I/Os to satisfy the query:

1. the index root page
2. one intermediate level page
3. one leaf page
4. one data page

As the size of the index increases, the number of pages needed to store the index entries and/or the number of intermediate levels increases as well.

The cardinality of the indexed columns is one of the most often overlooked issues when creating an index. Consider the following example to illustrate the point:

CREATE TABLE T1 (COLA INTEGER  NOT NULL,
                 COLB SMALLINT NOT NULL,
                 COLC CHAR(10) NOT NULL)

CREATE UNIQUE INDEX IX1 ON T1 (COLA ASC) CLUSTER

CREATE INDEX IX2 ON T1 (COLB ASC)

For this example, assume:

1. T1 has 100,000,000 rows
2. the cardinality of COLB is 4, with an even distribution among the rows
3. we are not considering Multi-Dimensional Clustering (MDC)
4. the following query is typically run against this table:

SELECT COLA, COLC FROM T1 WHERE COLB = 3

On the surface, this looks like it all makes sense. We have a SELECT statement with an "=" predicate on COLB, and we have an index defined on that column. What could be better?

This table has a clustering index defined on COLA. The intent of a clustering index is that the sequence of key values closely corresponds to the sequence of rows stored in the table. Each table can have only one clustering index, so our index on COLB is not clustered. Assuming an equal distribution of rows across the COLB values, our WHERE clause, "COLB = 3", will return 25,000,000 rows to our query: 100,000,000/4. It is highly unlikely that the DB2 optimizer would select the COLB index (IX2) to satisfy this query, due to the high I/O cost. It would more likely decide to scan the entire table, taking advantage of sequential prefetch.
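Before creating an index like IX2, it is worth checking the candidate column's cardinality directly. A quick sanity check (once RUNSTATS has been run, the optimizer's copy of the same statistic is also visible in the catalog):

```sql
-- How many distinct values does COLB contain, and how many rows in total?
-- A low distinct count relative to total rows argues against the index.
SELECT COUNT(DISTINCT COLB) AS COLB_CARDINALITY,
       COUNT(*)             AS TOTAL_ROWS
FROM T1

-- The optimizer's view of the same number, after RUNSTATS:
SELECT COLNAME, COLCARD
FROM SYSCAT.COLUMNS
WHERE TABNAME = 'T1'
```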
Since this index would not be used to satisfy the query, creating it would bring only the negative impacts discussed above: disk, CPU, INSERT and DELETE processing, etc.

The order of the columns in the index – another of the most often overlooked issues when creating an index. Consider the following example to illustrate the point:

CREATE TABLE T1
  (COLA INTEGER   NOT NULL,
   COLB SMALLINT  NOT NULL,
   COLC CHAR(10)  NOT NULL,
   COLD INTEGER)

CREATE UNIQUE INDEX IX1 ON T1 (COLA ASC)

CREATE INDEX IX2 ON T1 (COLB ASC, COLC ASC, COLD ASC) CLUSTER

For this example, assume:

1. T1 has 100,000,000 rows
2. the following query is typically run against this table:

SELECT COLB, COLD
FROM T1
WHERE COLC = 'DATA'
AND COLD > 42

We have a SELECT statement with an "=" predicate on COLC, a second predicate with ">" on COLD, and we have an index (IX2) defined on those columns, plus COLB, which is also in our SELECT list. Will the DB2 optimizer select index IX2 to satisfy this query? The answer is not so clear.

Remember that DB2 indexes are created as a B-tree structure. The first column of the IX2 index is COLB, but we have not provided a predicate for that column in our WHERE clause, so DB2 cannot effectively navigate the B-tree, since the high-order part of the key was not provided. This does not, however, eliminate the possibility of using this index. Since all of the data referenced (COLB, COLC, and COLD) is in the index, DB2 has two possibilities: scan the entire table, or scan the entire index. Since there would be fewer pages in the index, and since there is no need to go to the data pages, an index scan would probably be selected. Index scans are also eligible for prefetch operations.
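The leading-column effect is easy to observe with any optimizer's explain facility. A sketch using Python's sqlite3 module as a stand-in for DB2 (the plan text format differs, but the B-tree principle carries over; table and column names mirror the example above):

```python
import sqlite3

QUERY = "SELECT colb, cold FROM t1 WHERE colc = 'DATA' AND cold > 42"

def plan_for(index_columns):
    # Build a fresh in-memory table with one composite index and return
    # the optimizer's plan for QUERY as a single string.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (cola INTEGER, colb INTEGER, colc TEXT, cold INTEGER)"
    )
    conn.execute(f"CREATE INDEX ix2 ON t1 ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " ".join(row[3] for row in rows)   # row[3] is the plan detail text

bad = plan_for("colb, colc, cold")    # leading column has no predicate
good = plan_for("colc, cold, colb")   # leading columns match the WHERE clause

# With the matching column order the optimizer can SEARCH the B-tree;
# with the mismatched order it falls back to a scan.
print("SEARCH" in good, "SEARCH" in bad)
```

Here sqlite3 is only a convenient illustration; with DB2 you would use the Explain facilities discussed in the recommendations below.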
If this was a typical query used against this table, a better definition of the IX2 index would order the columns as follows:

CREATE INDEX IX2 ON T1 (COLC ASC, COLD ASC, COLB ASC) CLUSTER

This would allow effective use of the B-tree structure by providing matching values on the high-order part of the key.

These examples were provided for illustration only. You need to consider all of the database access queries against your tables, not just an individual SELECT statement, in order to determine the best index strategy.

3.1.5.2 Index Recommendations

o Create as few indexes as possible
o Consider creating the indexes with the "ALLOW REVERSE SCANS" option
o Pay close attention to the order of the columns in the index
o Don't create redundant indexes
o Use the DB2 "Explain" facilities to determine the actual usage of the indexes

3.2 Database Maintenance

Regular maintenance is a critical factor in the performance of a database environment. This involves running the REORG, RUNSTATS, and REBIND facilities, in that order, on the database tables. A regularly scheduled maintenance plan is essential in order to maintain peak performance of your system.

3.2.1 REORG

After many changes to table data (caused by INSERT and DELETE activity, and UPDATEs of variable-length columns), logically sequential data may reside on non-sequential physical data pages, so that the database manager must perform additional read operations to access the data. You can reorganize DB2 tables to eliminate fragmentation and reclaim space using the REORG command. Significant reductions in elapsed times, due to improved I/O, can result from regularly scheduled REORGs. DB2 provides two types of REORG operation:
o Classic REORG
  - Provides the fastest method of REORG
  - Indexes are rebuilt during the reorganization
  - Ensures perfectly ordered data
  - Access is limited to read-only during the UNLOAD phase; no access during other phases
  - Is not re-startable
o In-Place REORG
  - Slower than the Classic REORG; takes longer to complete
  - Does not ensure perfectly ordered data or indexes
  - Requires more log space
  - Can be paused and re-started
  - Can allow applications to access the database during reorganization

3.2.1.1 Recommendations

Implement a regularly scheduled maintenance plan:

o If you have an established database maintenance window, use the Classic REORG
o If you operate a 24 by 7 operation, use the In-Place REORG

3.2.2 RUNSTATS

It was mentioned earlier that the DB2 optimizer uses information and statistics in the DB2 catalog in order to determine the best access to the database based on the query provided. Statistical information is collected for specific tables and indexes in the local database when you execute the RUNSTATS utility. When significant numbers of table rows are added or removed, or if data in columns for which you collect statistics is updated, execute RUNSTATS again to update the statistics.

Use the RUNSTATS utility to collect statistics in the following situations:

o When data has been loaded into a table and the appropriate indexes have been created
o When you create a new index on a table. You need to execute RUNSTATS for only the new index if the table has not been modified since you last ran RUNSTATS on it
o When a table has been reorganized with the REORG utility
o When the table and its indexes have been extensively updated by data modifications, deletions, and insertions. ("Extensive" in this case may mean that 10 to 20 percent of the table and index data has been affected.)
o Before binding, or rebinding, application programs whose performance is critical
o When you want to compare current and previous statistics. If you update statistics at regular intervals, you can discover performance problems early
o When the prefetch quantity is changed
o When you have used the REDISTRIBUTE DATABASE PARTITION GROUP utility

There are various formats of the RUNSTATS command, mainly determining the depth and breadth of statistics collected. The more you collect, the longer the command takes to run. Some of the options are as follows:

o Collect either SAMPLED or DETAILED index statistics
o Collect statistics on all columns, or only on columns used in JOIN operations
o Collect distribution statistics on all, key, or no columns. Distribution statistics are very useful when you have an uneven distribution of data on key columns

3.2.2.1 Recommendations

Care must be taken when running RUNSTATS, since the information collected will impact the optimizer's selection of access paths.

o Implement RUNSTATS as part of a regularly scheduled maintenance plan, or run it if some of the above-mentioned conditions occur
o To ensure that the index statistics are synchronized with the table, execute RUNSTATS to collect both table and index statistics at the same time

Consider some of the following factors when deciding what type of statistics to collect:

o Collect statistics only for the columns used to join tables or in the WHERE, GROUP BY, and similar clauses of queries. If these columns are indexed, you can specify the columns with the ONLY ON KEY COLUMNS clause of the RUNSTATS command.
o Customize the values for num_freqvalues and num_quantiles for specific tables and specific columns in tables.
o Collect DETAILED index statistics with the SAMPLE DETAILED clause to reduce the amount of background calculation performed for detailed index statistics. The SAMPLE DETAILED clause reduces the time required to collect statistics, and produces adequate precision in most cases.
o When you create an index for a populated table, add the COLLECT STATISTICS clause to create statistics as the index is created.

3.2.3 REBIND

After running RUNSTATS on your database tables, you need to rebind your applications to take advantage of those new statistics. This is done to ensure that the best access plan is being used for your SQL statements. How that rebind takes place depends on the type of SQL you are running. DB2 provides support for the following:

o Dynamic SQL – SQL statements that are prepared and executed at run time. In dynamic SQL, the SQL statement is contained as a character string in a host variable and is not precompiled.
o Static SQL – SQL statements that are embedded within a program, and are prepared during the program preparation process before the program is executed. After being prepared, a static SQL statement does not change, although the values of host variables specified by the statement can change. These static statements are stored in a DB2 object called a package.

Both dynamic SQL statements and packages can be stored in one of DB2's caches. Based on the above types of SQL, a rebind will take place under these conditions:

o Dynamic SQL
  - If the statement is not in the cache, the SQL optimizer will "bind" the statement and generate a new access plan
  - If the statement is in the cache, no "rebind" will take place. To clear the contents of the SQL cache, use the FLUSH PACKAGE CACHE SQL statement
o Static SQL
  - An explicit REBIND <package> is executed
  - Implicitly, if the package is marked "invalid". This can occur if, for example, an index that the package was using has been dropped.

3.2.3.1 Recommendations

o Perform a REBIND after running RUNSTATS as part of your normal database maintenance procedures.

4. Application Design Details

4.1 Key Factors

4.1.1 Review application SQL for efficiencies

*** IMPORTANT *** If there is any one thing that you should focus on from this entire paper, it is this topic. In a significant majority of cases, probably the single most important factor when it comes to performance with DB2 is how efficiently your SQL statements are written. This topic mainly deals with SQL search criteria, which can be present in SELECT, UPDATE, DELETE, or INSERT (through a subselect) statements.

Reviewing SQL serves the following purposes:

o Provides the database designer with the information they need in order to determine the proper indexes to create on your database tables. These statements are essential for the designer to be able to create the optimal indexes to support your database access. All of the considerations mentioned above regarding indexes apply here.
o Allows an independent review of the SQL for the purpose of utilizing efficient SQL coding techniques
o Determines whether locking strategies are appropriate
o Assesses the impact of changes in your data model or data content
o Assesses the impact of the application of service to the database manager

How to review the SQL statements will be discussed in Section 8: Tuning Approach.

4.1.1.1 Recommendations

o Implement a formal SQL review process for your application(s)

4.1.2 Concurrency Control and Isolation Level

An isolation level determines how data is locked or isolated from other processes while the data is being accessed. The isolation level will be in effect for the duration of the unit of work. DB2 supports the following isolation levels, listed in order of most restrictive to least restrictive:
o Repeatable Read – An isolation level that locks all the rows in an application that are referenced within a transaction. When a program uses repeatable read protection, rows referenced by the program cannot be changed by other programs until the program ends the current transaction.
o Read Stability – An isolation level that locks only the rows that an application retrieves within a transaction. Read stability ensures that any qualifying row read during a transaction is not changed by other application processes until the transaction is completed, and that any row changed by another application process is not read until the change is committed by that process.
o Cursor Stability – An isolation level that locks any row accessed by a transaction of an application while the cursor is positioned on the row. The lock remains in effect until the next row is fetched or the transaction is terminated. If any data is changed in a row, the lock is held until the change is committed to the database.
o Uncommitted Read – An isolation level that allows an application to access the uncommitted changes of other transactions. The application does not lock other applications out of the row that it is reading, unless the other application attempts to drop or alter the table. Sometimes referred to as "dirty reads".

4.1.2.1 Recommendations

o Make sure you know the isolation level under which you are running. Do not count on default values, which can change based on how you are accessing the database.
o Because the isolation level determines how data is locked and isolated from other processes while the data is being accessed, you should select an isolation level that balances the requirements of concurrency and data integrity for your particular application. The isolation level that you specify is in effect for the duration of the unit of work.
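One way to be explicit rather than relying on defaults is DB2's statement-level isolation clause, which overrides the plan or package isolation for a single statement (e.g. "WITH UR" for Uncommitted Read). A minimal sketch in Python; the helper function is illustrative, not part of any DB2 API:

```python
# Illustrative helper (not a DB2 API): append DB2's statement-level
# isolation clause (WITH RR / RS / CS / UR) to a query string.
ISOLATION_LEVELS = {"RR", "RS", "CS", "UR"}

def with_isolation(sql: str, level: str) -> str:
    if level not in ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {level}")
    return f"{sql} WITH {level}"

# A report query that can tolerate dirty reads may run as Uncommitted
# Read without changing the isolation of the rest of the unit of work.
print(with_isolation("SELECT COLA, COLC FROM T1 WHERE COLB = ?", "UR"))
```

This keeps the weaker isolation confined to the one statement that can safely use it, while the rest of the application keeps its stricter default.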
4.1.3 Locking

To provide concurrency control and prevent uncontrolled data access, the database manager places locks on tables, table blocks, or table rows. A lock associates a database manager resource with an application, called the lock owner, to control how other applications can access the same resource. Locking is a fundamental process of any database manager and is used to ensure the integrity of the data. But while those locks are held, there is a potential impact on the concurrency and throughput of your application.

There are a number of factors that the database manager uses to determine whether to use row-level or table-level locking:

o The different isolation levels described above are used to control access to uncommitted data, prevent lost updates, allow non-repeatable reads of data, and prevent phantom reads. Use the minimum isolation level that satisfies your application needs.
o The access plan selected by the optimizer. Table scans, index scans, and other methods of data access each require different types of access to the data.
o The LOCKSIZE attribute for the table. This parameter indicates the granularity of the locks used when the table is accessed. The choices are either ROW for row locks, or TABLE for table locks.
o The amount of memory devoted to locking, which is controlled by the locklist database configuration parameter.

4.1.3.1 Recommendations

o *** IMPORTANT *** COMMIT as frequently as possible and/or practical in order to release any locks your application holds. If possible, design your application so that you can easily vary the commit frequency for large batch operations. This will allow you to optimally balance the throughput and concurrency of your system.
o Use ALTER TABLE ... LOCKSIZE TABLE for read-only tables.
This reduces the number of locks required by database activity.

o If the lock list fills, performance can degrade due to lock escalations and reduced concurrency on shared objects in the database. If lock escalations occur frequently, increase the value of locklist or maxlocks, or both.

4.1.4 Fetch needed columns only

There is an additional CPU cost associated with each column selected or fetched from the database. Higher I/O cost may also be experienced if sorting is required.

4.1.4.1 Recommendations

o Select or fetch only the columns that you need
o Never code "SELECT *" to retrieve all columns in a table

4.1.5 Reuse resources

Consider reuse of the following components:

4.1.5.1 Recommendations

o Database connections – this can be accomplished using the connection pooling features of DB2. Connection pooling is a process in which DB2 drops the inbound connection with an application that requests disconnection, but keeps the outbound connection to the host in a pool. When a new application requests a connection, DB2 uses one from the existing pool. Using the already-present connection reduces the overall connection time, as well as the high processor connect cost on the host. Connection pooling is implemented using:
  - DB2 Connect
  - JDBC, using the WebSphere connection pooling feature

4.1.6 SQL Statement Preparation

Before an SQL statement can be executed, it must be converted from text form to an executable form by submitting it to the SQL compiler. This is referred to as the SQL statement prepare process. After the statement is prepared, the bind process occurs. This process converts the output from the SQL compiler to a usable control structure, such as an access plan, application plan, or package. During the bind process, access paths to the data are selected and some authorization checking is performed.
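Because prepare and bind are not free, repeated statements should be compiled once and executed many times. A sketch of that pattern using Python's sqlite3 module as a stand-in for a DB2 client driver (the "?" parameter markers are the same idea discussed below):

```python
import sqlite3

# Prepare-once, execute-many: the statement text is compiled a single
# time and reused with different values via '?' parameter markers.
# sqlite3 stands in here for a DB2 client driver; the pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (cola INTEGER NOT NULL, colc TEXT NOT NULL)")

insert = "INSERT INTO t1 (cola, colc) VALUES (?, ?)"   # parameter markers
rows = [(i, f"row-{i}") for i in range(1000)]
conn.executemany(insert, rows)    # one compile, many executions
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
print(count)   # 1000
```

Each row insert reuses the compiled statement, which is exactly the saving the parameter-marker discussion below describes for DB2's PREPARE/EXECUTE.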
DB2 supports two types of SQL processing:

o Static – SQL statements that are embedded within a program, and are prepared during the program preparation process before the program is executed. After being prepared, a static SQL statement does not change, although the values of host variables specified by the statement can change.
o Dynamic – SQL statements that are prepared and executed at run time. In dynamic SQL, the SQL statement is contained as a character string in a host variable and is not precompiled.

Static SQL offers the advantage of executing the statement preparation process only once, thus eliminating that processing step each time the statement is executed. Dynamic SQL statements, by their definition, are prepared and executed at run time. You can, however, minimize the effect of statement preparation by writing your dynamic SQL statements using parameter markers. Parameter markers act in a similar fashion to host variables in static SQL statements. If you use them in a dynamic SQL statement, you would first issue a single PREPARE statement, followed by multiple EXECUTE statements, which allow you to substitute values for the parameter markers. This savings can be significant for simple SQL statements, like an INSERT, that are executed many times with different values.

4.1.6.1 Recommendations

o Use static SQL whenever possible
  - If using Java, SQLJ supports static SQL
o If you use dynamic SQL:
  - Code your statements using parameter markers
  - Increase the size of the database package cache. This cache stores dynamic SQL statements and allows for their reuse

4.1.7 Query Tuning

The following SQL statement clauses may improve the performance of your application:

o Use the FOR UPDATE clause to specify the columns that could be updated by a subsequent positioned UPDATE statement.
o Use the FOR READ/FETCH ONLY clause to make the returned columns read-only.
o Use the OPTIMIZE FOR n ROWS clause to give priority to retrieving the first n rows in the full result set.
o Use the FETCH FIRST n ROWS ONLY clause to retrieve only a specified number of rows.
o Use the DECLARE CURSOR WITH HOLD statement to retrieve rows one at a time and maintain cursor position after a commit.
o Take advantage of row blocking by specifying the FOR READ ONLY, FOR FETCH ONLY, or OPTIMIZE FOR n ROWS clause, or by declaring your cursor as SCROLL. This will improve performance and, in addition, improve concurrency, because exclusive locks are never held on the rows retrieved.
o Avoid DISTINCT or ORDER BY if not required. This will help to eliminate any potential sorting that may otherwise have to occur.
  - Proper indexing may also be used to eliminate sorting.

5. Hardware Design and Operating System Usage

5.1 Key Factors

This section discusses overall considerations for these factors; it does not cover detailed calculations for capacity planning purposes.

5.1.1 Memory

Understanding how DB2 organizes memory helps you tune memory use for good performance. Many configuration parameters affect memory usage. Some affect memory on the server, some on the client, and some on both. Furthermore, memory is allocated and de-allocated at different times and from different areas of the system. While the database server is running, you can increase or decrease the size of memory areas inside the database shared memory. You should understand how memory is divided among the different heaps before tuning to balance overall memory usage on the entire system. Refer to the "DB2 Administration Guide: Performance" for a detailed explanation of DB2's memory model and all of the parameters that affect memory usage.

5.1.2 CPU

The CPU utilization goal should be about 70 to 80% of the total CPU time. Lower utilization means that the CPU can cope better with peak workloads.
Workloads between 85% and 90% result in queuing delays for CPU resources, which affect response times. CPU utilization above 90% usually results in unacceptable response times. While running batch jobs, backups, or loads of large amounts of data, the CPU may be driven to high percentages, such as 80 to 100%, to maximize throughput.

DB2 supports the following processor configurations:

o Uni-Processor – a single system that contains only one CPU
o SMP (Symmetric Multiprocessor) – a single system that can contain multiple CPUs. Scalability is limited to the CPU sockets provided on the motherboard.
o MPP (Massively Parallel Processors) – a system with multiple nodes connected over a high-speed link. Each node has its own CPU(s). Scalability is achieved by adding new nodes.

Things to consider regarding CPU:

o Inefficient data access methods cause high CPU utilization and are a major problem for a database system. Refer back to section 4.1.1.
o Paging and swapping require CPU time. Consider this factor while planning your memory requirements.

5.1.3 I/O

The following are rules of thumb (ROTs) that can be used to calculate the total disk space required by an application. If you have more detailed information, use that instead of the ROTs.

o Calculate the raw data size:
  - Add up the column lengths of your database tables
  - Multiply by the number of rows expected
o Once you have the raw data size, use the following scaling ratios to factor in space for indexing, working space, etc.:
  - OLTP ratio: 1:3
  - DSS ratio: 1:4
  - Data warehouse ratio: 1:5

Consider the following to improve disk efficiency:

o Minimize I/O – access to main memory is orders of magnitude faster than access to disk.
Provide as much memory as possible to the database bufferpools and the various memory heaps to avoid I/O.

o When I/O is needed, reading simultaneously from several disks is the fastest way. Provide for parallel I/O operations by:
  - Using several smaller disks rather than one big disk
  - Placing the disk drives on separate controllers

5.1.3.1 Choosing Disk Drives

There are several trends in current disk technology:

o Disks get bigger every year, roughly doubling in capacity every 18 months.
o The cost per GB is lower each year.
o The cost difference of the two smallest drives diminishes until there is little point in continuing with the smaller drive.
o Disk drives improve a little each year in seek time.
o Disk drives get smaller in physical size.

While disk drives continue to increase capacity in a smaller physical size, the speed improvements (seek time, etc.) are small in comparison. A database that would have taken 36 * 1 GB drives a number of years ago can now be placed on one disk. This highlights the database I/O problem. For example, if each 1 GB disk drive can do 80 I/O operations a second, the 36-drive system can do a combined 36 * 80 = 2880 I/O operations per second. But a single 36 GB drive with a seek time of 7 ms can do only about 140 I/O operations per second. While increased disk drive capacity is good news, the lower number of disks cannot deliver the same I/O throughput.

5.1.3.2 Recommendations

When determining your I/O requirements, consider:

o OLTP systems:
  - Reading data will involve reading indexes
  - Inserts and updates require data, indexes, and logs to be written
o Provide for parallel I/O operations:
  - Use DMS tablespaces, and separate data and indexes into separate tablespaces
  - Use the smallest disk drives possible, purely on the basis of increasing the number of disks for I/O throughput
o If buying larger drives, use only half the space (the middle area – it's the fastest) for the database, and the other half for:
  - Backups
  - Archiving data
  - Off-hour test databases
  - Extra space used for upgrades

5.1.4 Network

The network can influence the overall performance of your application, but this usually manifests itself when there is a delay in the following situations:

o The time between when a client machine sends a request to the server and the server receives the request
o The time between when the server machine sends data back to the client machine and the client machine receives the data

Once a system is implemented, the network should be monitored in order to ensure that no more than 50% of its bandwidth is being consumed.

5.1.4.1 Recommendations

The following techniques can be used to improve overall performance and avoid high network consumption:

o Transmit a block of rows to the client machine in a single operation. This is accomplished by using the BLOCKING option in the pre-compile or bind procedures. Refer to section 4.1.7, Query Tuning, for other factors that influence row blocking.
o Use stored procedures to minimize the number of accesses to the database. Stored procedures are programs that reside on the RDBMS server and can be executed as part of a transaction by the client applications. This way, several pre-programmed SQL statements can be executed using only one CALL command from the client machine.
o Note that using stored procedures will typically make it more difficult to run your application on different database platforms, like Oracle or SQL Server, because of the syntactical differences in their stored procedure implementations. So if you need to run your application on multiple database platforms, be aware of this consideration.

6. Tuning Opportunities

This section contains some additional tuning considerations not already discussed.

6.1 Insert/Delete Performance

Here are some things to consider about insert and delete performance. The biggest bottlenecks are typically:

o Read and write I/O for index and data
o Active log writes
o CPU time
o Locking

We've previously discussed ways to address many of these issues, such as:

o Using DMS tablespaces and placing the table and index data into separate tablespaces to enable parallel I/O
o Providing for efficient logging
o The use of parameter markers to prepare an INSERT statement once and execute it many times
o Batching of SQL statements, e.g. via the JDBC batch facility
o Minimizing indexing

Here are a few other suggestions to improve insert performance:

o Consider the use of APPEND MODE
o Insert multiple rows with one INSERT statement
o Tune the PCTFREE values for the data, clustering index, and non-clustering index components

The following database parameter is important:

o Number of asynchronous page cleaners (NUM_IOCLEANERS) – This parameter controls the number of page cleaners that write changed pages from the buffer pool to disk. You may want to increase this to the number of physical disk drive devices you have. The default is 1.

The following database manager parameter may be important:

o Enable intra-partition parallelism (INTRA_PARALLEL) – If you have a multi-processor SMP system, setting this parameter to YES may improve performance. The default is NO.

The following registry variable can be used:

o DB2MAXFSCRSEARCH – The setting of this registry variable determines the number of Free Space Control Records (FSCRs) in a table that are searched for an INSERT. The default value for this registry variable is five.
If no space is found within the specified number of FSCRs, the inserted record is appended at the end of the table. To optimize INSERT speed, subsequent records are also appended to the end of the table until two extents are filled. After the two extents are filled, the next INSERT resumes searching at the FSCR where the last search ended.

o To optimize for INSERT speed at the possible expense of faster table growth, set the DB2MAXFSCRSEARCH registry variable to a small number. To optimize for space reuse at the possible expense of INSERT speed, set DB2MAXFSCRSEARCH to a larger number.

6.2 Database Manager Configuration Tuning

Each instance of the database manager has a set of database manager configuration parameters (also called database manager parameters). These affect the amount of system resources that will be allocated to a single instance of the database manager. Some of these parameters are used for configuring the setup of the database manager and other, non-performance-related information. There are numerous database manager configuration parameters; I will only list some of the ones that have a high impact on performance. Refer to the "DB2 Administration Guide: Performance" for a detailed explanation of all the database manager configuration parameters.

o Agentpri – This parameter controls the priority given both to all agents, and to other database manager instance processes and threads, by the operating system scheduler. Use the default unless you run a benchmark to determine the optimal value.
o Aslheapsz – The application support layer heap represents a communication buffer between the local application and its associated agent. This buffer is allocated as shared memory by each database manager agent that is started. If the request to the database manager, or its associated reply, does not fit into the buffer, it will be split into two or more send-and-receive pairs.
The size of this buffer should be set to handle the majority of requests using a single send-and-receive pair.

o Intra_parallel – This parameter specifies whether the database manager can use intra-partition parallelism on an SMP machine. Multiple processors can be used to scan and sort data for index creation.
o Java_heap_sz – This parameter determines the maximum size of the heap that is used by the Java interpreter started to service Java DB2 stored procedures and UDFs (User-Defined Functions).
o Max_querydegree – This parameter specifies the maximum degree of intra-partition parallelism that is used for any SQL statement executing on this instance of the database manager. An SQL statement will not use more than this number of parallel operations within a partition when the statement is executed. The intra_parallel configuration parameter must be set to YES to enable the database partition to use intra-partition parallelism.
o Sheapthres – The sort heap threshold determines the maximum amount of memory available for all the operations that use the sort heap, including sorts, hash joins, dynamic bitmaps (used for index ANDing and star joins), and operations where the table is in memory. Ideally, you should set this parameter to a reasonable multiple of the largest sortheap parameter you have in your database manager instance; it should be at least two times the largest sortheap defined for any database within the instance.

6.3 Database Configuration Tuning

Each database has a set of database configuration parameters (also called database parameters). These affect the amount of system resources that will be allocated to that database.
In addition, some database configuration parameters provide descriptive information only and cannot be changed; others are flags that indicate the status of the database.

There are numerous database configuration parameters. I will only list some of the ones that have a high impact on performance. Refer to the "DB2 Administration Guide: Performance" for a detailed explanation of all the database configuration parameters.

o Chngpgs_thresh - Asynchronous page cleaners write changed pages from the buffer pool (or buffer pools) to disk before the space in the buffer pool is required by a database agent. As a result, database agents should not have to wait for changed pages to be written out before they can use the space in the buffer pool. This improves the overall performance of database applications.

o Locklist - This parameter indicates the amount of storage allocated to the lock list. It has a high impact on performance if there are frequent lock escalations.

o Maxlocks - The maximum percentage of the lock list that can be held before escalation. Used in conjunction with the locklist parameter to control lock escalations.

o Logbufsz - This parameter specifies the amount of the database heap (defined by the dbheap parameter) to use as a buffer for log records before writing them to disk. Buffering the log records results in more efficient log file I/O, because the log records are written to disk less frequently and more log records are written at each time.

o Num_iocleaners - This parameter specifies the number of asynchronous page cleaners for a database. These page cleaners write changed pages from the buffer pool to disk before the space in the buffer pool is required by a database agent.
As a result, database agents should not have to wait for changed pages to be written out before they can use the space in the buffer pool, which improves the overall performance of database applications. Set this parameter to a value between one and the number of physical storage devices used for the database.

o Num_ioservers - I/O servers are used on behalf of the database agents to perform prefetch I/O, and asynchronous I/O by utilities such as backup and restore. This parameter specifies the number of I/O servers for a database. A good value is generally one or two more than the number of physical devices on which the database resides.

o Pckcachesz - The package cache is used for caching sections for static and dynamic SQL statements on a database. Caching packages and statements allows the database manager to reduce its internal overhead by eliminating the need to access the system catalogs when reloading a package, or, in the case of dynamic SQL, eliminating the need for compilation.

o Sortheap - This parameter defines the maximum number of private memory pages to be used for private sorts, or the maximum number of shared memory pages to be used for shared sorts. Each sort has a separate sort heap, allocated as needed by the database manager; this is the area where data is sorted. Increase the size of this parameter when frequent large sorts are required.

6.4 Bufferpools

The buffer pool is the area of memory where database pages (table rows or indexes) are temporarily read and manipulated. All buffer pools reside in global memory, which is available to all applications using the database. The purpose of the buffer pool is to improve database performance: data can be accessed much faster from memory than from disk. Therefore, the more data (rows and indexes) the database manager is able to read from or write to memory, the better the database performance.
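A common way to judge whether a buffer pool is large enough is its hit ratio, derived from the logical and physical read counters reported by the snapshot monitor. The following sketch shows the standard calculation in plain Python; the counter values are made up for illustration, and the parameter names only mirror (they are not read from) DB2 snapshot output.

```python
def buffer_pool_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Fraction of page requests satisfied from the buffer pool.

    logical_reads  - total page requests (data + index)
    physical_reads - requests that had to go to disk
    """
    if logical_reads == 0:
        return 0.0
    return 1.0 - (physical_reads / logical_reads)

# Hypothetical counters, as might be taken from GET SNAPSHOT FOR BUFFERPOOLS:
ratio = buffer_pool_hit_ratio(logical_reads=200_000, physical_reads=15_000)
print(f"hit ratio: {ratio:.1%}")  # prints: hit ratio: 92.5%
```

A ratio that stays well below roughly 90 percent under a steady workload is a common signal that the buffer pool deserves more memory, though the right target depends on the workload.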
The default buffer pool allocation is usually not sufficient for production applications; monitor and tune it before placing your application in production.

6.5 Registry Variables

Each instance of the database manager has a set of registry and environment variables. These affect various aspects of DB2 processing. There are numerous registry and environment variables. I will only list some of the ones that have a high impact on performance. Refer to the "DB2 Administration Guide: Performance" for a detailed explanation of all the registry and environment variables.

o DB2_Parallel_IO - While reading or writing data from and to table space containers, DB2 may use parallel I/O for each table space value that you specify. The degree of parallelism is determined by the prefetch size and extent size for the containers in the table space. For example, if the prefetch size is four times the extent size, then there are four extent-sized prefetch requests. The number of containers in the table space does not affect the number of prefetchers. To enable parallel I/O for all table spaces, use the wildcard character, "*". To enable parallel I/O for a subset of all table spaces, enter the list of table spaces. If there is more than one container, extent-size pieces of any full prefetch request are broken down into smaller requests executed in parallel, based on the number of prefetchers. When this variable is not enabled, the number of prefetch requests created is based on the number of containers in the table space.

7. Monitoring Tools

DB2 provides several tools that can be used for monitoring or analyzing your database. These monitoring and analyzing tools, along with their purposes, are:

o Snapshot Monitor - captures performance information at periodic points in time.
Used to determine the current state of the database.

o Event Monitor - provides a summary of activity at the completion of events such as statement execution, transaction completion, or an application disconnecting.

o Explain Facility - provides information about how DB2 will access the data in order to resolve the SQL statements.

o db2batch tool - provides performance information (benchmarking tool).

8. Tuning Methodology

8.1 Tuning Approach

The objective of tuning an RDBMS is to make sure that the system is delivering good performance. As with most things, the ability to meet that objective depends on the effort and resources you apply to the tuning effort. The following are areas that would cause a tuning effort to be started:

Regular task - Regular periodic monitoring and tuning is standard practice. Many sites review performance at quarterly, half-yearly, or yearly intervals.

Generated warning - Automated monitoring of the system warns that performance is degrading and has hit some threshold.

Emergency - There is an emergency in performance or response time, which has been highlighted by user feedback. The tuning must identify the problem, recommend a solution, and then work out how to avoid the problem happening again.

New system - A newly built system requires initial tuning for maximum performance before going into production. In many cases, a new system might be put into production before being optimally tuned, because of the difficulty of generating user workloads artificially and of predicting real user workloads, work patterns, and data volume and distribution. For this reason, it is critical that the design principles outlined in this document be followed, so as to minimize the effects of the unknown as much as possible.
System change - If the change is significant, this is similar to a new system, in that testing cannot effectively simulate production.

Regardless of the reason, the tuning approach will largely be the same.

8.2 Skills Needed

While this document focuses on database design and tuning, there are many other factors that influence the overall performance of any application. Other skills that may be required to tune the overall system include:

Hardware experts - for the various hardware platforms you plan to run on

Operating system experts - for the various operating system platforms you plan to run on. This could include:
o System administration experts
o Operating system performance and tuning experts

Relational database skills, including:
o DBA skills
o SQL tuning

Middleware experts - if using middleware products such as WebSphere

Application experts:
o Functional knowledge of the application
o Technical experts on the products used to build the application

8.3 General Process

The following process is recommended to improve the performance of any system:

1. Establish performance indicators.
2. Define performance objectives.
3. Develop a performance monitoring plan.
4. Carry out the plan.
5. Analyze your measurements to determine whether you have met your objectives. If you have, consider reducing the number of measurements you make, because performance monitoring itself uses system resources. Otherwise, continue with the next step.
6. Determine the major constraints in the system.
7. Decide where you can afford to make trade-offs and which resources can bear additional load. (Nearly all tuning involves trade-offs among system resources and the various elements of performance.)
8. Adjust the configuration of your system. If you think that it is feasible to change more than one tuning option, implement one at a time.
If there are no options left at any level, you have reached the limits of your resources and need to upgrade your hardware.

9. Return to step 4 and continue to monitor your system.

On a periodic basis, or after significant changes to your system:
Perform the above procedure again from step 1.
Re-examine your objectives and indicators.
Refine your monitoring and tuning strategy.

8.4 DB2 Specific Tuning

There are numerous documents and books describing detailed formal tuning methodologies. This document is not intended to repeat, or rewrite, any of that. I will discuss some specific DB2 topics, tools, and techniques that can be used to tune your DB2 database.

8.4.1 SQL Reviews

As mentioned earlier in this document, SQL reviews are essential for a well-performing RDBMS system. SQL reviews in DB2 are generally done using the SQL Explain facility. Explain allows you to capture information about the access plan chosen by the optimizer, as well as performance information that helps you tune queries. Before you can capture explain information, you need to create the relational tables in which the optimizer stores the explain information, and set the special registers that determine what kind of explain information is captured. These tables can be created:

Automatically, by the DB2 Control Center
By running the following command from a DB2 command window:
o db2 -tf EXPLAIN.DDL (located in the sqllib/misc directory)

DB2 provides a number of facilities to view the information generated by the Explain facility. These include:

Visual Explain - you invoke Visual Explain from the Control Center to see a graphical display of a query access plan. You can analyze both static and dynamic SQL statements.

db2exfmt - this command line tool is used to display explain information in preformatted output.
db2expln and dynexpln - these command line tools are used to see the access plan information available for one or more packages of static SQL statements. Db2expln shows the actual implementation of the chosen access plan; it does not show optimizer information. The dynexpln tool, which uses db2expln within it, provides a quick way to explain dynamic SQL statements that contain no parameter markers. It does this by transforming the input SQL statement into a static statement within a pseudo-package; because of this, the information may not always be completely accurate. If complete accuracy is desired, use the Explain facility. The db2expln tool does provide a relatively compact and English-like overview of what operations will occur at run time, by examining the actual access plan generated.

Which tool you use depends on a number of factors, including:
How complicated the SQL is
Whether you have access to the actual program/code that will be running
Your familiarity with the tools

Previously, I discussed the functions of the DB2 optimizer. The DB2 optimizer uses information and statistics in the DB2 catalog to determine the best access to the database for the query provided. This information is generally gathered using the RUNSTATS utility. The catalog information includes the following types of information:

Number of rows in a table
Indexes defined on the table
Whether the table needs to be REORGanized

8.4.2 Database Statistics

Ideally, you would use the Explain facility mentioned above against a database that has production-level volumes loaded into it. The access path chosen may be drastically different if a table has 100 rows in a test database and 10,000,000 rows in a production database. This difference is exacerbated if the SQL query involves a join of multiple tables.
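To see why cardinality can flip the optimizer's choice, consider a deliberately simplified cost model. This is an illustrative toy, not DB2's actual costing: the page counts, the selectivity, and both cost formulas are invented for the example. The idea is only that a full scan costs roughly the number of data pages, while an index probe costs the index traversal plus a page fetch per qualifying row.

```python
def scan_cost(rows: int, rows_per_page: int = 100) -> float:
    """Toy cost of a full table scan: read every data page."""
    return max(rows / rows_per_page, 1.0)

def index_cost(rows: int, selectivity: float, index_depth: int = 3) -> float:
    """Toy cost of an index probe: traverse the index, then fetch
    (worst case) one page per qualifying row."""
    return index_depth + rows * selectivity

def cheaper_plan(rows: int, selectivity: float) -> str:
    """Pick whichever toy access path costs less."""
    return "index" if index_cost(rows, selectivity) < scan_cost(rows) else "scan"

# Same predicate selectivity (0.1%), very different cardinalities:
print(cheaper_plan(100, 0.001))         # tiny test table  -> scan
print(cheaper_plan(10_000_000, 0.001))  # production table -> index
```

The point is not the exact numbers but that the crossover depends on table cardinality, which is exactly why current statistics gathered by RUNSTATS matter so much to plan analysis.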
Often, it is not easy or practical, particularly early on, to have a database loaded with production-level volumes. DB2 provides a facility for simulating a production database by allowing you to code SQL UPDATE statements that operate against a set of predefined catalog views in the SYSSTAT schema. With SYSSTAT, the database administrator is able to simulate production volumes in a test database. Updating these views allows the optimizer to create different access plans under different conditions. The following is a sample of one of these UPDATE statements:

UPDATE SYSSTAT.TABLES SET CARD = 850000 WHERE TABNAME = 'CUSTOMER'

Although this facility is provided, it should be used only as an initial facility for updating statistics. There are many inter-relationships among the database tables in the catalog, and understanding how to update all of them correctly is extremely difficult.

8.4.2.1 Using Database Statistics for SQL Access Plan Analysis

The following sources should be used, in the order listed, when using the Explain facility to analyze your SQL statements. Make sure any statistics you use are current by running the RUNSTATS utility.

1. A production database (or an image of one)
2. A test database loaded with a significant amount of data
3. A test database updated with production statistics
o The DB2 tool db2look is designed to capture all table DDL and statistics of a production database, to replicate it to the test system.
4. Updated SYSSTAT views
o The UPDATE statements for the SYSSTAT views are generated by the db2look facility.

8.4.3 SNAPSHOT and EVENT Monitors

DB2 maintains data about its operation, its performance, and the applications using it. This data is maintained as the database manager runs, and can provide important performance and troubleshooting information.
For example, you can find out:

The number of applications connected to a database, their status, and which SQL statements each application is executing, if any.
Information that shows how well the database manager and database are configured, and that helps you tune them.
When deadlocks occurred for a specified database, which applications were involved, and which locks were in contention.
The list of locks held by an application or a database. If the application cannot proceed because it is waiting for a lock, there is additional information on the lock, including which application is holding it.

Collecting performance data introduces overhead on the operation of the database. DB2 provides monitor switches to control which information is collected. You can turn these switches on by using the following DB2 commands:

UPDATE MONITOR SWITCHES USING BUFFERPOOL ON;
UPDATE MONITOR SWITCHES USING LOCK ON;
UPDATE MONITOR SWITCHES USING SORT ON;
UPDATE MONITOR SWITCHES USING STATEMENT ON;
UPDATE MONITOR SWITCHES USING TABLE ON;
UPDATE MONITOR SWITCHES USING UOW ON;

You can access the data that the database manager maintains either by taking a snapshot or by using an event monitor.

8.4.3.1 SNAPSHOTs

Use the GET SNAPSHOT command to collect status information and format the output for your use. The information returned represents a snapshot of the database manager's operational status at the time the command was issued. There are various formats of this command that are used to obtain different kinds of information; the specific syntax can be found in the DB2 Command Reference. Some of the more useful ones are:

GET SNAPSHOT FOR DATABASE - Provides general statistics for one or more active databases on the current database partition.
GET SNAPSHOT FOR APPLICATIONS - Provides information about one or more active applications that are connected to a database on the current database partition.
GET SNAPSHOT FOR DATABASE MANAGER - Provides statistics for the active database manager instance.
GET SNAPSHOT FOR LOCKS - Provides information about every lock held by one or more applications connected to a specified database.
GET SNAPSHOT FOR BUFFERPOOLS - Provides information about buffer pool activity for the specified database.
GET SNAPSHOT FOR DYNAMIC SQL - Returns a point-in-time picture of the contents of the SQL statement cache for the database.

You can create some simple scripts and schedule them to take periodic snapshots during your test cycles.

8.4.4 DB2BATCH

A benchmark tool called db2batch is provided in the sqllib/bin subdirectory of your DB2 installation. This tool can read SQL statements from either a flat file or standard input, dynamically describe and prepare the statements, and return an answer set. You can specify the level of performance-related information supplied, including the elapsed time, CPU and buffer pool usage, locking, and other statistics collected from the database monitor. If you are timing a set of SQL statements, db2batch also summarizes the performance results and provides both arithmetic and geometric means. For syntax and options, type db2batch.

9. Publications & Online Help

The following books serve as reference material.

Administration Guide: Performance (SC09-4821) - This book contains information about how to configure and tune your database environment to improve performance.

Database Performance Tuning on AIX (SG24-5511-01) - This Redbook contains hints and tips from experts who work on RDBMS performance every day.
It also provides introductions to general database layout concepts from a performance point of view, design and sizing guidelines, tuning recommendations, and performance and tuning information for DB2 UDB, Oracle, and IBM Informix databases.

DB2 UDB V7.1 Performance Tuning Guide (SG24-6012) - This IBM Redbook provides guidelines for system design, database design, and application design with DB2 UDB for AIX Version 7.1. It also discusses the methods that are available for performance analysis and tuning.

10. Sizing

The following spreadsheets are provided in the Project Database on an "as-is" basis as samples to assist with tablespace and index sizing, and with estimating DB2 memory utilization.

1. TBLDATA-zOS.XLS - Spreadsheet to estimate space requirements for a table on the mainframe
2. INDXDATA-zOS.XLS - Spreadsheet to estimate space requirements for an index on the mainframe
3. TBLDATA-dist.XLS - Spreadsheet to estimate space requirements for a table in the distributed environment
4. INDXDATA-dist.XLS - Spreadsheet to estimate space requirements for an index in the distributed environment
5. DB2UDBMEMORY.XLS - Spreadsheet containing the estimated memory usage of the system

11. Reference

12. Addendum