Transcript
Session: D01
Bottlenecks Elimination in
Real World DB2 Applications
Sigen Chen
Lockheed Martin
13 October 2008 • 11:15 – 12:15
Platform: DB2 UDB for Linux, UNIX, Windows
ABSTRACT
Database application performance for a given system (hardware and software)
may be determined by application behavior, APIs, database design and layout,
data size, and system configurations.
based on the performance improving practice from real world database
applications. The focus will be on understanding the application behavior;
creating the right indexes; writing optimal queries, exploring the query features
wisely; using appropriate APIs for a given requirement, not only on the
programming language level, but also on the statement attributes such as
cursor type, data type for binding, fetch orientation, array options; practicing
proactive maintenance to ensure optimal data layout and statistics; tuning the
key configuration parameters based on application behavior and system
monitoring data. The troubleshooting examples and sample code segments are
used to exemplify the practice. Performance issue debugging and analysis is
also included.
In short:
• Presenting some experience from managing real-world DB2 databases
• Sharing some performance data from database application benchmarking
• Exercising some DB2 coding (API) options, out of curiosity, from a database application performance point of view
1
Summary
• Diagnosing the real database applications
• Using DB2 native tools and system tools.
• Creating the correct index
• Adding the right indexes
• Removing the unnecessary indexes.
• Choosing the right API for a given job, i.e.,
• Embedded, CLI, ADO/IBM provider, ADO/MS Bridge,
JDBC T2, Perl and Shell Script.
• Using proper data type in SQLBindCol(), using array
fetch/insert, right cursor types, and proper fetching/inserting
APIs.
• Tuning several key cfg parameters such as parallelism,
avg_appls etc., refining the options of maintenance tools.
2
1. Discussing how to identify the bottlenecks by analyzing the debugging data
using system tools (vmstat, top, prstat, sar, pmap, etc.), DB2 native tools
(snapshot, event monitor, access plan, db2pd, etc.), and profiling tools
(example commands below).
2. Showing how to collect and analyze the query access plan, and
using the right indexes to reduce the cost of bottleneck queries.
3. Analyzing several commonly used DB2-supported APIs
(Embedded SQL, CLI, JDBC, ADO, Perl, CLP) and their
performance differences through our test data; comparing
several fetch/insert orientations of CLI and statement attributes, and
testing the performance.
4. Writing the most efficient queries and using the query options
wisely, such as blocking features. After all, a DBMS is supposed
to do exactly what the application (queries) requests.
5. Understanding the application nature (OLTP or DSS or mixed),
and tuning the DBM and DB configuration parameters
accordingly; maintaining the database proactively to ensure the
optimal database performance.
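As a hedged illustration (the database name SAMPLE and the output file names are placeholders; the commands themselves are standard OS and DB2 tools), the data collection referred to in point 1 might look like this:
vmstat 5 12 > vmstat.out                                    -- OS-level CPU/memory/paging sample
db2 update monitor switches using statement on
db2 get snapshot for dynamic sql on SAMPLE > dynsql.snap    -- statement execution times, rows read
db2 get snapshot for bufferpools on SAMPLE > bp.snap
db2pd -db SAMPLE -applications > db2pd_apps.out
db2exfmt -d SAMPLE -1 -o plan.out                           -- format the access plan of a previously explained statement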
2
Performance Factors
• $, Hardware Infrastructure (cpu/mem/io, network),
BP, Reasonable Data Layout
• Application behavior
• APIs (Language, Interface)
• Database application design and data layout
• Data size (response time vs size)
• System configurations
• System Maintenance (proactive vs responsive)
3
What Could Affect A Given Database Application System Performance?
- $ and HW infrastructure (cpu/mem/disk, network) is out of the scope of this presentation;
- It’s also assumed that you would have reasonable BP hit ratio and data layout (TS, LOGs,
striping).
For a given system (platform - HW, SW), there are things a DBA can do to improve performance.
- Understand the business objectives and the application behavior - OLTP, DSS (DW), or mixed? Tune the system accordingly.
- Number of active applications. Is parallelism necessary?
- How are applications implemented? C, Java, etc.
- What APIs are employed? One may not have control over all languages and APIs used by
applications, but a DBA does have control over maintenance programs and batch jobs.
- Disk layout and data distribution? Is HA involved? Is DPF involved?
- As data size grows, performance can be affected significantly (even exponentially); keep
scalability in mind. Performance improvement is an ongoing DBA task.
- Proactive Maintenance – reorg, statistics, binding etc..
- Troubleshooting examples and some sample code segments are used to exemplify the
proactive practice. Performance issue debugging and analysis is also included.
3
Performance Improvement Approaches
• Understanding the application behavior
• Writing optimal queries, exploring the query features wisely
• Creating the necessary indexes
• Using appropriate APIs for a given requirement
• Programming language level
• Statement attributes such as cursor type, data type for
binding, fetch orientation, array options;
• Proactive maintenance to ensure optimal data layout and
updated statistics
• Tuning the key configuration parameters based on application
behavior and system monitoring data.
4
“A DBMS is supposed to do just what the applications request it to do…….”
Therefore, understanding the application behavior is most important in order to
maximize the performance of a given system. (Occasionally a DBMS does not
do what is expected; then it becomes a PMR issue.)
- Indexes can help most queries, but not always.
- Developers ought to optimize the queries, not just barely make them work.
- API
  - Program level: choose the right language for your job - C, Java, Perl, or shell scripts
  - Coding level: data type, cursor type, fetch orientation, array options, blocking, etc.
- Maintenance as most DBAs do (backup, necessary reorg, update statistics,
rebind, data integrity check).
-Does the database need reorg? Data growth, insertion mode, Online or offline
- Do I have enough security on the LOGs (Primary, Mirror, Archive)? How should the logs
be distributed?
- What RUNSTATS option is the best suited to my system?
- Configuration Parameter setting (DBM CFG, DB CFG, and registry) – based
on benchmarking or stress test
4
Examples Summary
• Approach - DB2 native tools + OS fundamental tools
• Creating the correct indexes is key (2~43x on multiple applications)
• Choosing the right API for a given job is essential
Embedded(1.00)
CLI (1.03)
ADO/IBM provider (1.31)
ADO/MS Bridge (1.47)
JDBC T2 (1.56)
Shell Script (4.80)
• Using proper data type (i.e., in SQLBindCol); right cursor types; and
proper fetching/inserting APIs
• Tuning based on application behavior (e.g., parallelism, avg_appls,
etc.) to resolve memory shortage, locking, and response time issues
• runstats options (e.g., had 37x performance impact )
5
Brief summary of the data/example showing the impact.
When troubleshooting an issue, where to start?
- Approach: basic native tools are always a good place to start (CPU,
memory, I/O), then examine the snapshot data, event monitor data, and
queries.
Some prefer to buy monitoring tools; make sure you understand
how the data is collected and interpreted.
- If you find long-running queries (bottleneck queries), analyze the access plan and
focus on the most costly plan steps.
- Coding APIs - a business decision and the developers' skill set. The numbers in
parentheses are relative response times; the smaller the better.
- Use the proper data type, appropriate cursor type, and fetch orientation.
Numbers in parentheses are relative execution times.
- Tuning is based on application behavior. Configuration parameter changes should
be based on benchmarking tests.
- Ensure the DB has up-to-date statistics and optimized access plans.
5
Understand the Nature of Applications
• OLTP or DSS or Mixed
• Possible limitations vs tolerance
• Example - parallelism (DFT_DEGREE,
INTRA_PARALLEL, DFT_QUERYOPT,
AVG_APPLS)
6
OLTP applications expect faster instant response;
DSS applications may have complex queries or larger result set. The
expectation and tolerance may be different.
Configuration may need to take the application expectation into account.
              OLTP    DSS
Opt level     low     high
AVG_APPLS     1       varies: depends on the number of complex-query applications and the bufferpool size
Parallelism   no      yes

-----
DFT_DEGREE 1 [ANY, -1, 1 - 32 767] (CURRENT DEGREE)
MAX_QUERYDEGREE -1 (ANY) [ANY, 1 - 32 767] - number of parallel operations within a database partition when the statement is executed
INTRA_PARALLEL NO (0) [SYSTEM (-1), NO (0), YES (1)] - may require more FCM buffers
DFT_QUERYOPT 5 [0 - 9]
AVG_APPLS 1 or N - efficient use of the bufferpool
6
Example 1. AVG_APPLS
• SQL10013N, could not load the library
• Overall application performance improved by 3~54%
• The bottleneck query execution time (seconds) and CPU
usage (%)
              Time (Sec.)   CPU usage (%, 4-way Sun)
avg_appls=5   105           50
avg_appls=1   0.006         16
7
SQL10013N The specified library "<name>" could not be loaded
In an OLTP application system, response time is essential. What would be
your tolerable response time when you hit a button (or link)? Sub-second?
One would want to tune the system to run as quickly as possible, which means
allowing an application to use all the available resources (the bufferpool in this
case) and be done with it.
When an OLTP query takes several seconds or more, the user might just
navigate away from the site. In some cases, that means potentially losing
business.
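A hedged sketch of how such a change might be applied (SAMPLE is a placeholder database name; the value itself should come from your own benchmarking):
db2 update db cfg for SAMPLE using AVG_APPLS 1
db2 get db cfg for SAMPLE | grep -i avg_appls     -- verify the new value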
7
Example 2. Intra_parallel
• Turning Intra_Parallel OFF freed up about 1.5 GB of real
memory and 2 GB of swap memory in a 32-bit
Sun/Solaris system – saved the system from crashing
• Disabling intra-parallelism improved some
applications' performance by 2~5%
• Conclusion: choose the features wisely
8
Problem: the system crashed because swap memory was exhausted.
Parallelism is a great feature. However, would it help you?
How did I know it was intra_parallel=YES that caused the crash?
The error message suggested “No FCM request blocks are available
(SQL6043C)”, and the number of FCM request blocks (FCM_NUM_RQB) could not
be increased any further.
A 2 GB memory saving means a great deal on a 4-way (Sun V880) box.
An analogy would be a simple job that requires climbing a ladder:
one person can do the job just fine; two people would be crowded,
and might cause a crash!
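A hedged sketch of the change (INTRA_PARALLEL is a database manager parameter; an instance restart is assumed to be required for it to take effect):
db2 update dbm cfg using INTRA_PARALLEL NO
db2stop
db2start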
8
Writing optimal queries/program,
exploring the query features wisely
• Too many to mention
• A Simple Query Example
• Select C1,Cx from T1 where C1 in (x,y) optimize
for 1000 rows
• What is the expected resultSet?
• Is the blocking necessary?
• Local or n-tier system
9
Select C1,Cx from T1 where C1 in (x,y) optimize for 1,000 rows
Even a simple query like the above requires careful coding - is the
blocking really needed? What is the expected result set? Is the database local or
remote? Too often we have seen such a clause show up in OLTP application
queries, causing performance problems for users.
9
Example 3. Using result set block vs non-block under
various APIs (Win2k-390 system, 100,000 rows)
API        NON-BLOCKING           BLOCKING (optimize for N rows)
           R.T.*   Stdev/ave      R.T.*   Stdev/ave
Embedded   5.59    0.64           1       0.03
CLI        6.49    0.74           1       0.02
JDBC T2    1.93    0.46           1       0.00
ADO        4.94    0.36           1       0.04
10
R.T. = relative time against the same API used
Row blocking is a technique that reduces database manager overhead by retrieving a block of
rows in a single operation. These rows are stored in a cache, and each FETCH request in the
application gets the next row from the cache. When all the rows in a block have been
processed, another block of rows is retrieved by the database manager.
Our test data from fetching 100,000 rows from a 10-column table (rs=239 bytes, number of rows
per block is 84) in a win2k-zOS system indicated that without blocking, results
fluctuate more (stdev vs. average is higher) and are about 2-6 times slower than with
blocking.
The cache is allocated when an application issues an OPEN CURSOR request and is
deallocated when the cursor is closed. The size of the cache is determined by a configuration
parameter which is used to allocate memory for the I/O block. The database manager
parameter used depends on whether the client is local or remote:
• For local applications, aslheapsz (default 15 x 4K) is used to allocate the cache for row
blocking.
• For remote applications, rqrioblk (default 32K) on the client workstation is used to allocate
the cache for row blocking. The cache is allocated on the database client.
-- just in case someone wants to know how to determine the size
Rowsperblock=aslheapsz*4096/rs
Rowsperblock=rqrioblk/rs
BLOCKING bind option values: UNAMBIG, ALL, NO
-- what if the query only returns a handful of records?
Blocking could make the query response time longer, because it would try to fill
the first block of N rows until it could not get as many rows as specified.
10
Example 4.1. Reuse the Statement via Parameter Markers
int main () {
SQLHANDLE henv, hdbc, hstmt;
SQLCHAR *sqlstmt = (SQLCHAR *) "INSERT INTO T1 (C2, C5) VALUES (?, ?)";
SQLINTEGER *col2, lvalue;
SQLCHAR *col5;
int rc = 0, pass = 0;
/* allocate henv, hdbc, connect to database */
/* allocate statement handle */
rc = SQLAllocHandle (SQL_HANDLE_STMT, hdbc, &hstmt);
/* prepare the statement */
rc = SQLPrepare (hstmt, sqlstmt, SQL_NTS);
/* assign values to the input variables */
col2 = (SQLINTEGER *)malloc(sizeof(int)); *col2=1;
col5 = (SQLCHAR *) malloc((sizeof(char))*100);
strcpy ((char *)col5, "my 100 characters string, but could be shorter……");
/* bind the values to the parameter markers */
rc = SQLBindParameter(hstmt, 1,
SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0,
(SQLINTEGER *)col2, sizeof((SQLINTEGER *)col2 ),
&lvalue );
rc = SQLBindParameter(hstmt, 2,
SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
100, 0,
(SQLCHAR *)col5, 100, NULL );
/* execute the statement, assume that 100,000 rows to be inserted into the table */
while(pass++<100000){
rc=SQLExecute( hstmt );
*col2 = *col2+1;
/* suppose that we’d like to assign different value for each C2, we may do the same for C5, source input can be from various source, such as file */
}
/* commit or rollback, free handles, disconnect */
return (SQL_SUCCESS);
}
11
/* in this example we will insert values into two columns, C2 and C5 */
For an insert query, parameter markers in an SQL statement may be bound to
either application variables or arrays of application variables for all C data
types. Data is then transferred from application to the DBMS when
SQLExecute() or SQLExecuteDirect() is called. The code segments in this
slide demonstrate a typical insert using CLI SQLBindParameter() to assign the
values to the parameter markers.
One may use literal values in the SQL statement instead of parameter
marker binding. But in doing so, performance will be seriously impacted,
because DB2 has to look for a match in the dynamic SQL cache.
None will be found when each SQLExecute() call inserts new values, so
DB2 will invoke the optimizer to generate a plan to execute the statement,
then discard the oldest statement in the dynamic SQL cache and insert the
new one.
Performance degradation depends on the number of rows inserted. I
have seen a difference of several minutes vs. sub-seconds between using
literals and parameter markers. Experience tells me to use parameter
marker binding whenever possible.
One can do the same on array insert.
11
Example 4.2. Reuse the Statement via Parameter Markers
int main () {
/* allocate henv, hdbc, connect to database allocate statement handle */
/* prepare the statement */
rc = SQLPrepare (hstmt, sqlstmt, SQL_NTS);
/* assign values to the input variables, bind the values to the parameter markers,
execute the statement, assume that 100,000 rows to be inserted into the
table */
while(pass++<100000){
rc=SQLExecute( hstmt );
*col2 = *col2+1;
/* suppose that we’d like to assign different value for each C2, we may do the
same for C5, source input can be from various source, such as file */
}
/* commit or rollback, free handles, disconnect */
}
12
Using Appropriate APIs for a
Given Requirement
• Scenario: an ongoing batch job to set document
status for a list of docIDs passed in. Time is essential
• A shell script is meant to be interactive (input/invoke
CLP/SQL/commit)
• A programming language such as C allows
streamlining the logic, reusing the statements, more cursor
manipulation options, etc.
• C:Perl:ksh(opt):ksh(prim) = 1:3.76:302:1066
13
What is presented here is a simple update statement that needs to be executed
frequently with a list of record IDs as input.
“ Update table1 set c1=‘U’ where c2 in (?)”
What was needed was a streamlined program to process the documents quickly
and efficiently.
Efficiency is the key. The numbers were collected against a local database. No
network traffic was involved; the difference is caused purely by the APIs.
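A hedged, minimal C/CLI sketch of the streamlined approach described above (the table/column names follow the statement shown; the input reader next_docid() and the commit interval are hypothetical, and connection setup and error handling are omitted):
/* prepare the UPDATE once, bind the parameter marker once, re-execute per docID */
SQLCHAR *upd = (SQLCHAR *) "UPDATE table1 SET c1 = 'U' WHERE c2 = ?";
SQLCHAR docid[27];
SQLINTEGER ind = SQL_NTS;
long n = 0;
rc = SQLSetConnectAttr(hdbc, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER) SQL_AUTOCOMMIT_OFF, 0);
rc = SQLPrepare(hstmt, upd, SQL_NTS);
rc = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
                      26, 0, docid, sizeof(docid), &ind);
while (next_docid(docid)) {              /* next_docid() is a hypothetical input reader */
    rc = SQLExecute(hstmt);
    if (++n % 1000 == 0)                 /* commit in batches instead of per row */
        rc = SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);
}
rc = SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);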
13
Example 5. Several APIs Performance
Comparison in a Local Solaris System
Relative Time (APIs): C = 1, Perl = 3.76, Ksh (opt) = 302, Ksh (prim) = 1066
14
C:Perl:ksh(opt):ksh(prim) =1:3.76:302:1066
(50,000 records for testing, updating)
C – CLI well written, prepare stmt once, reuse it.
Perl – prepare stmt once, reuse it, one more layer of the Interface
Ksh (opt) – auto commit off, quiet, remove the unnecessary print steps etc..
Ksh (prim) – interactive, stdout I/O, redundant steps, auto commit on … this is
more likely how some people would program it: quick and dirty code that barely
works.
14
Example 6. APIs Performance in a Two-Tier DB2 Connect System
Relative Time (APIs): Embedded = 1, CLI = 1.03, ADO/IBM Provider = 1.31, ADO/MS Bridge = 1.47, JDBC T2 Driver = 1.56
15
Notice that the numbers in this slide were collected in a 2-tier system using a
composite workload (all kinds of SQL).
The comparison data of using CLI, JDBC (Driver Type 2), ADO (use both
IBM OLE DB Provider for DB2 Server and Microsoft OLE DB Bridge for
ODBC Drivers), and Static Embedded SQL in a Windows2000 - zOS two-tier
system. DB2 Connect Server was on the Windows 2000 application client.
If the time for using Embedded-SQL is normalized to 1.00, the performance
sequence for fetching data using various APIs (fastest to slowest) is Embedded
SQL (1.00), CLI (1.03), ADO/IBM provider (1.31), ADO/Microsoft Bridge
(1.47), and JDBC (1.56). DB2 CLI is comparable to the Embedded SQL! IBM
Provider outperformed Microsoft Bridge. JDBC is just as expected.
The magnitude of the differences among the APIs in the 2-tier system is smaller than
in the local system. That could be because in a multi-tier system more
factors come into play, such as the mainframe server generally being slower, and the
data transfer between server and client.
15
Example 7. Performance of three fetch APIs
with different data type in binding
Relative Time (PDT = proper data type in binding):
SQLGetData SQL_C_CHAR = 3.55, SQLGetData PDT = 3.32
SQLBindCol SQL_C_CHAR = 1.38, SQLBindCol PDT = 1
SQLFetchScroll SQL_C_CHAR = 1, SQLFetchScroll PDT = 0.89
16
For fetching data, 10 cols x 200,000 rows in our test case, if the time for using
typical SQLBindCol() is normalized to 1.00, the performance sequence from
the fastest to the slowest is:
                      Proper data type in binding   SQL_C_CHAR in binding
SQLFetchScroll        0.89                          1
SQLFetch/SQLBindCol   1                             1.38
SQLGetData            3.32                          3.55
Using the proper data type in binding is always better than using SQL_C_CHAR.
Therefore, use the proper data type in binding, and use array fetch whenever
possible.
Typically, an application may choose to allocate the maximum memory the
column value could occupy and bind it via SQLBindCol(), based on
information about a column in the result set (obtained via a call to
SQLDescribeCol(), for example, or prior knowledge). However, in the case of
character and binary data, the column can be arbitrarily long. If the length of
the column value exceeds the length of the buffer the application can allocate
or afford to allocate, a feature of SQLGetData() lets the application use
repeated calls to obtain in sequence the value of a single column in more
manageable pieces. This API may suit Java or GUI types of applications.
The tradeoff is slower performance.
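A hedged sketch of "proper data type in binding" (the two-column result set and its types are an assumption for illustration, not the actual test table):
/* an INTEGER column bound as SQL_C_LONG avoids a per-row conversion to character */
SQLINTEGER c1;
SQLCHAR    c2[101];
SQLINTEGER ind1, ind2;
rc = SQLBindCol(hstmt, 1, SQL_C_LONG, &c1, 0, &ind1);           /* proper data type */
rc = SQLBindCol(hstmt, 2, SQL_C_CHAR, c2, sizeof(c2), &ind2);   /* CHAR column: SQL_C_CHAR is appropriate here */
/* binding column 1 as SQL_C_CHAR instead would force a conversion on every row,
   which is what the 1.38 vs 1.00 numbers above reflect */
while ((rc = SQLFetch(hstmt)) != SQL_NO_DATA) {
    /* use c1 and c2 */
}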
16
Example 8. SQLFetch Orientation
Relative Time (SQL_CURSOR type): FORWARD_ONLY = 1, STATIC = 1.2, KEYSET-DRIVEN = 2.9
17
Cursor Type and SQLFetchScroll()
In the above examples, the fetch was sequential, i.e., retrieving rows starting with the first row and ending with the last
row. In that case, we know SQLFetchScroll() gives the best performance. What if an application needs to allow the user
to scroll through a set of data both forwards and backwards? DB2 CLI has three cursor types:
(1) forward-only (default) cursor - can only scroll forward.
(2) static read-only cursor - is static, once it is created no rows will be added or removed, and no value in any rows
will change
(3) keyset-driven cursor - has ability to detect changes to the underlying data, and the ability to use the cursor to
make changes to the underlying data. Keyset-driven cursor will reflect the changed values in existing rows, and
deleted rows; but it will not reflect added rows. Because the set of rows is determined once, when the cursor is
opened. It does not re-issue the select statement to see if new rows have been added that should be included.
To be able to scroll through the cursor back and forth, cursor has to be defined as SQL_CURSOR_STATIC or
SQL_CURSOR_KEYSET_DRIVEN. The position of the rowset within the result set can be specified as
SQL_FETCH_NEXT, SQL_FETCH_FIRST, SQL_FETCH_LAST, SQL_FETCH_RELATIVE,
SQL_FETCH_ABSOLUTE, SQL_FETCH_PRIOR, and SQL_FETCH_BOOKMARK in the SQLFetchScroll()
call.
Performance impact
From the performance point of the view, a static cursor involves the least overhead, if the application does not need the
additional feature of a keyset-driven cursor then a static cursor should be used. If the application needs to detect
changes to the underlying data, or needs to add, update, or delete data from the result set, then the keyset-driven
cursor may be used. Also, if one needs to scroll the cursor back and forth, the cursor type needs to be set to
SQL_CURSOR_STATIC (or SQL_CURSOR_KEYSET_DRIVEN); the default type is
SQL_CURSOR_FORWARD_ONLY. Comparing the performance of fetching data using STATIC and
KEYSET-DRIVEN cursors with FORWARD_ONLY, we see the STATIC and KEYSET-DRIVEN cursors at 1.2 and
2.9 times the forward-only cursor's time, respectively. I.e., the features come with a cost.
An example of using these cursor types in an array fetch with a specified fetch orientation is on the next slide.
17
Sample Code of Using Static Cursor
/* cursor type has to be specified via SQLSetStmtAttr() before the
SQLPrepare() */
rc = SQLSetStmtAttr ( hstmt,
SQL_ATTR_CURSOR_TYPE,
(SQLPOINTER) SQL_CURSOR_STATIC,
0);
rc = SQLPrepare(hstmt, sqlstmt, SQL_NTS);
/* …… */
/* fetch orientation may be specified in SQLFetchScroll() */
rc = SQLFetchScroll(hstmt, SQL_FETCH_FIRST, 0);
/* …… */
18
To be able to scroll through the cursor back and forth, cursor has to be defined
as
SQL_CURSOR_STATIC or
SQL_CURSOR_KEYSET_DRIVEN.
The position of the rowset within the result set can be specified as
SQL_FETCH_NEXT
SQL_FETCH_FIRST
SQL_FETCH_LAST
SQL_FETCH_RELATIVE
SQL_FETCH_ABSOLUTE
SQL_FETCH_PRIOR and
SQL_FETCH_BOOKMARK
in the SQLFetchScroll() call.
An example of using STATIC or KEYSET_DRIVEN cursor would be similar
to that illustrated in the Sample code, except defining the cursor type and
specifying the fetch orientation
18
Example 9. Insert APIs Performance
Relative Time (SQL insert APIs):
CLI USE_LOAD = 0.36, Chaining = 0.42, Array Insert (size 100) = 0.42, Not Logged Initially = 0.81, SQLExtendedBind = 0.85, SQLBindParameter = 1
19
For inserting data, if the time for inserting 100,000 rows, one at a time using
SQLBindParameter() is normalized to 1.00,
the performance sequence from fastest to the slowest is
CLI USE_LOAD (0.36) - CLI API invokes LOAD; large data
CHAINING (0.42)
- referred to as “CLI array input chaining”. All
SQLExecute() requests associated with a prepared statement will not be sent
to the server until either the SQL_ATTR_CHAINING_END statement attribute
is set, or the available buffer space is consumed by rows that have been
chained.
Array Insert (0.42, Size 100) – Inserting multiple rows
Row Insert with Not Logged Initially Activated (0.81) - reducing the logging
SQLExtendedBind (0.85) – bind array of the columns, some restrictions apply
SQLBindParameter(1.00) - typical
Had one only used single-row insert via SQLBindParameter(), one would have
missed a lot of the great options that CLI has to offer.
When the array size is > 10, changing the size does not have a significant impact.
Reducing logging with the NOT LOGGED INITIALLY parameter (see the sketch after these notes)
SQLExtendedBind()
This function can be used to replace multiple calls to SQLBindCol() or
SQLBindParameter(), however, important differences should be noted.
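A hedged sketch of the "Not Logged Initially" variant (T1 follows the earlier examples; note that if the unit of work fails while the attribute is active, the table can be left unusable, so this suits re-loadable data only):
-- run with autocommit off, in the same unit of work as the inserts
ALTER TABLE T1 ACTIVATE NOT LOGGED INITIALLY;
-- ... execute the SQLExecute() insert loop shown earlier ...
COMMIT;  -- logging resumes once the unit of work ends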
19
Typical Row Insert
……
rc = SQLBindParameter(hstmt, 1,
SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0,
(SQLINTEGER *)col1, sizeof((SQLINTEGER *)col1 ),
&lvalue );
rc = SQLBindParameter(hstmt, 2,
SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
100, 0,
(SQLCHAR *)col2, 100, NULL );
/* execute the statement, assume that n (100,000) rows to be inserted */
while(pass++<n){
rc=SQLExecute( hstmt );
*col1 = *col1+1;
}
……
20
Suppose that we’d like to assign different value for each Col1, we may do the
same for Col2
20
Array Insert
/* just make up some values for column Col1 and Col2 */
SQLINTEGER col1[]= {1,2,3,4,5,6,7,8,9,10, ……100};
SQLCHAR col2[100][100]= {"A1","B2","C3","D4","E5","F6","G7","H8","I9","J10",……"z100"};
/* set array size, 100 for our sample code */
rc=SQLSetStmtAttr(hstmt,
SQL_ATTR_PARAMSET_SIZE,
(SQLPOINTER)100, 0);
/* bind the values to the parameter markers, which is the same as before except this time col1
and col2 are arrays */
rc = SQLBindParameter(hstmt, 1,
SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0, col1, 0, NULL);
rc = SQLBindParameter(hstmt, 2,
SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
100, 0, col2, 100, NULL );
while(pass++<n)
rc=SQLExecute( hstmt );
/* ...... */
21
Execute the statement, assume that we’d like to insert 100,000 rows into the
table, but this time only execute 1000 times, because array size is set to 100
Bind the values to the parameter markers, which is the same as before except
this time col1 and col2 are arrays
21
Chaining
/* …… */
rc = SQLSetStmtAttr(hstmt,
SQL_ATTR_CHAINING_BEGIN,
(SQLPOINTER) TRUE,
0);
while ( pass++ <n ) {
rc = SQLExecute(hstmt);
}
rc=SQLSetStmtAttr(hstmt,
SQL_ATTR_CHAINING_END,
(SQLPOINTER) TRUE,
0);
/* …… */
22
An example of using Chaining would be similar to Sample above, except setting
CHAINING_BEGIN and END around the SQLExecute() via SQLSetStmtAttr()
SQL_ATTR_CHAINING_BEGIN
A 32-bit integer which specifies that DB2 will chain together SQLExecute() requests
for a single prepared statement before sending the requests to the server; this feature
is referred to as CLI array input chaining. All SQLExecute() requests associated with
a prepared statement will not be sent to the server until either the
SQL_ATTR_CHAINING_END statement attribute is set, or the available buffer
space is consumed by rows that have been chained. The size of this buffer is defined
by the ASLHEAPSZ dbm cfg for local client applications, or the RQRIOBLK dbm
cfg parameter for client/server configurations. This attribute can be used with the
CLI/ODBC configuration keyword ArrayInputChain to effect array input without
needing to specify the array size. Refer to the documentation for ArrayInputChain for
more information.
SQL_ATTR_CHAINING_END
Causes all chained SQLExecute() requests to be sent to the server. After this attribute
is set, SQLRowCount() can be called to determine the total row count for all
SQLExecute() statements that were chained between the
SQL_ATTR_CHAINING_BEGIN and SQL_ATTR_CHAINING_END pair. Error
diagnostic information for the chained statements becomes available after the
SQL_ATTR_CHAINING_END attribute is set. This attribute can be used with the
DB2 CLI configuration keyword ArrayInputChain to effect array input without
needing to specify the array size. Refer to the documentation for ArrayInputChain for
more information.
22
Use Load API
/* allocate henv, hdbc, connect to database, allocate statement handle,
prepare the statement, assign values to the input variables, bind the
values to the parameter markers */
/* begin to use load */
rc = SQLSetStmtAttr(hstmt, SQL_ATTR_USE_LOAD_API,
(SQLPOINTER) SQL_USE_LOAD_INSERT, 0 );
/* execute the statement, assume that we’d like to insert 100000 rows
into the table */
while(pass++<n){
rc=SQLExecute( hstmt );
*col1 = *col1+1;
}
/* end use load */
rc=SQLSetStmtAttr(hstmt, SQL_ATTR_USE_LOAD_API,
(SQLPOINTER) SQL_USE_LOAD_OFF, 0);
23
CLI calling LOAD to insert the data. Anything related to LOAD operation
would apply to CLI USE_LOAD_API
23
Create Necessary Indexes
• Bottleneck Queries First
• Including Stored Procedures, Triggers
• Only those needed – Indexes can help, could also
hurt
24
How do we know indexes are needed?
0. Identify the bottleneck queries - snapshot and event monitor data.
1. db2advis is a good tool to start with (example below).
2. Analyze the access plan, find the bottlenecks, and try to come up with an index to
reduce the cost.
3. Test the index(es) created; ensure they improve the bottleneck queries without
hurting other queries too much.
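As a hedged example (the database name SAMPLE and the workload file are placeholders; the statement mirrors the count(*) in Example 10), the Design Advisor can be run against bottleneck statements captured from snapshot data:
db2advis -d SAMPLE -s "SELECT COUNT(*) FROM CML.ICHG_QUE WHERE ATTR0000001021 = 'some-doc-id'"
db2advis -d SAMPLE -i bottleneck_queries.sql -t 5    -- workload file, 5-minute advise limit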
24
Example 10. SQLs In The Procedures
• Trigger on icmut01005001
CREATE TRIGGER CML.TG03_ICMUT01005001 AFTER UPDATE OF
ATTR0000001024 ON CML.ICMUT01005001 REFERENCING NEW AS NEW
FOR EACH ROW MODE DB2SQL WHEN (UPPER(NEW.attr0000001024) not
in ('IC','CN') OR NEW.attr0000001024 is null) BEGIN ATOMIC CALL
CML.ICHG_QUE_PROC (NEW.ATTR0000001021, NEW.ATTR0000001024,
NEW.ATTR0000001025); END
• SP on ICHG_QUE table
CREATE PROCEDURE CML.ICHG_QUE_PROC (IN ATTR1021
CHARACTER(26), IN ATTR1024 CHARACTER(2), IN ATTR1025 TIMESTAMP)
SPECIFIC CML.ICHG_QUE_PROC LANGUAGE SQL MODIFIES SQL DATA
BEGIN DECLARE V_CNT INTEGER DEFAULT 0; SELECT count(*) INTO
V_CNT FROM CML.ICHG_QUE WHERE CML.ICHG_QUE.ATTR0000001021 =
ATTR1021 WITH UR; IF V_CNT < 1 THEN INSERT INTO CML.ICHG_QUE
(ATTR0000001021, ATTR0000001024, ATTR0000001025) VALUES
(ATTR1021, ATTR1024, ATTR1025); END IF; END
• No index on ATTR0000001021, which is docID
25
In some cases a bottleneck SQL statement may not be that obvious. For example, when you
have triggers or stored procedure calls, you may need to examine the SQL
inside them.
In the example above, a trigger is defined to call a procedure when a
certain condition is met. The procedure contains a SQL statement counting
something. Unfortunately, the count(*) statement's WHERE clause column has no
index defined on it, so a table scan was inevitable whenever there was a
modification to the table attribute.
How many systems could afford a table scan?
25
An Index That Reduced The Cost
Before the index addition (table scan):
  TBSCAN
  ( 3)
  9539.24 (cost)
  2318 (IO)
  |
  474808
  TABLE: CML.QUE

After the index QUE1021 is added on QUE (attr1021) (index scan):
  IXSCAN
  ( 3)
  50.04 (cost)
  2 (IO)
  |
  477516
  INDEX: CML.QUE1021
26
Tests on the laboratory server and the production system indicated that this index
addition increased performance by 230% using C/CLI, with a few
thousand records in the table.
What if there are more than a few thousand records in the table?
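The index behind the improved plan above would have been created with something like the following (the index name QUE1021 is taken from the plan output; the RUNSTATS is there so the optimizer sees the new index):
CREATE INDEX CML.QUE1021 ON CML.ICHG_QUE (ATTR0000001021);
RUNSTATS ON TABLE CML.ICHG_QUE AND INDEXES ALL;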
26
Example 11. Where Should The
Indexes Be?
Stmt:
update CML.DocTab
set docType = 'X'
where docID = ? and docType in ('Y', 'Z')
docID is unique and docType is not; where should the index
be?
27
Whichever column has the higher cardinality (i.e., docID).
27
An index that may hurt the performance
What if an index is defined on docType?
Before adding the index (access plan excerpt):
  IXSCAN ( 5), cost 75.0417, 3 I/Os, on INDEX: CML.Index2, with TABLE: CML.DocTab (1.28141e+06 rows)

After adding an index on docType (access plan excerpt):
  FETCH ( 5), cost 100.048, 4 I/Os, over IXSCAN ( 6) on INDEX: CML.Index2 and TABLE: CML.DocTab (1.28141e+06 rows),
  plus TBSCAN ( 7) over TEMP ( 8) over TBSCAN ( 9) on TABFNC: SYSIBM.GENROW - three extra operations on temp tables
28
During examination of the query access plan, it was noticed that dropping an
unnecessary index eliminated three extra operations on temp tables for the
update SQL statement, and further improved the performance by nearly 40 times
(60 minutes of work updating 50k rows is completed in 1.5 minutes).
Why? docType has low cardinality.
Stmt: update CML.DocTab set docType ='DR' where docID=? and
docType in ('CN','IC')
Choose an index on the column(s) that have higher cardinality (i.e., docID)
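A hedged sketch of the fix described above (the docType index name IDX_DOCTYPE is hypothetical; the plan output does not name it):
DROP INDEX CML.IDX_DOCTYPE;                      -- remove the low-cardinality docType index that introduced the temp-table steps
RUNSTATS ON TABLE CML.DocTab AND INDEXES ALL;    -- refresh statistics afterwards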
28
Example 12. APIs +/- Index Effect
29
Right indexing (adding what is needed, removing what is unnecessary), plus proper APIs,
made a 466x performance gain.
In the figures above, the index effect makes the API effects appear small;
however, you are still looking at double/triple/quadruple differences among the
APIs.
29
Time Saved (Indexes + APIs)
Hours Needed Per Year (Proactive Optimization Process):
Existing Code = 584
Optimized (1st Year Including 40 hrs Coding Effort) = 1 + 40
Optimized (Subsequent Years) = 1
30
Considering the ongoing maintenance, each site may process as many as 2~3
million records per year. It would take the original ksh script 584 hours, or the
third party’s Legacy program 1368 hours, to complete the job. The optimized
approach can complete the job in 1.3 hours.
Taking the first year's 40 hours of effort optimizing the methods into account, the
first year's hours for marking documents were reduced from 584 hours (ksh script)
to 41 hours; this represents a net first-year savings of 543 hours at each
site. The net saving in subsequent years would be 583 hours at each site. There are 7
(N) such sites on our program.
Points are
•Using the appropriate API for the right job. For example, C/CLI is much
faster than ksh script for batch job processing of many records.
•Creating indexes wisely. i.e., adding a necessary index or dropping an
unnecessary index.
•Some legacy code has had patches+patches+patches…… would it be worth rewriting the core pieces of the code?
30
Proactive Maintenance
• Reorg (online vs offline)
• Append_mode (online insertion)
• Runstats (various options)
• Monitor switches - do they need to be on?
31
When you have taken care of the Indexes, bufferpools, cfg parameters, logs,
sort, APIs etc.. What else would you do?
How about a stress test to push the system to a level where potential
bottlenecks may become apparent?
How about proactive maintenance?
Does your database need a reorg (reorgchk)? Do I have the time and resources to reorg?
How often do I need to update statistics?
Is there a need to leave the monitor switches on?
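A hedged example of the proactive checks mentioned above (the database name is a placeholder; the table name comes from earlier slides):
db2 connect to SAMPLE
db2 reorgchk update statistics on table all > reorgchk.out     -- refreshes statistics and flags tables/indexes needing a reorg
db2 "reorg table CML.ICHG_QUE inplace allow write access"      -- example of an online reorg for one flagged table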
31
Example 13. APPEND_MODE
• Online pages reorganization could have its pros and cons.
ON vs OFF (diff %):
DELETE = 0.04, INSERT/select = -75.47, SELECT = 0.06, UPDATE = 0.75, import = -29.87
32
Turning append mode ON helps insert performance; however, a nightly or
weekly reorg is needed.
When APPEND_MODE is set to ON, new rows are always appended to the
end of the table. No searching or maintenance of FSCRs (Free Space Control
Records) takes place. This option is enabled using the ALTER TABLE
APPEND ON statement, and can improve performance for tables that only
grow, like journals.
A performance test is needed to verify, because it does cause slight performance
degradation on select statements.
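The attribute is set per table; a hedged example (the journal-style table name CML.JRNL is a placeholder):
ALTER TABLE CML.JRNL APPEND ON;     -- new rows always go to the end of the table, no FSCR search
-- ...
ALTER TABLE CML.JRNL APPEND OFF;    -- turn it off again; an offline reorg can then reclaim space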
32
Example 14.1 Runstats Options Effect
Relative Time: DEFAULT = 38, Detailed = 1
(Detailed runstats option: NUM_FREQVALUES from 10 to 100, NUM_QUANTILES from 20 to 200)
33
Warning: performance tests are needed to validate whether the option change helps your applications.
This is a case of improving a data validation utility (mostly select queries).
RUNSTATS ON TABLE schema.OBJECTS ON ALL COLUMNS WITH DISTRIBUTION ON KEY COLUMNS
DEFAULT NUM_FREQVALUES 100 NUM_QUANTILES 200 AND DETAILED INDEXES ALL ALLOW WRITE
ACCESS;
NUM_FREQVALUES
Defines the maximum number of frequency values to collect. It can be specified for an individual column in the ON
COLUMNS clause. If the value is not specified for an individual column, the frequency limit value will be picked up
from that specified in the DEFAULT clause. If it is not specified there either, the maximum number of frequency
values to be collected will be what is set in the NUM_FREQVALUES database configuration parameter.
Current value Number of frequent values retained (NUM_FREQVALUES) = 10
The "most frequent value" statistics help the optimizer understand the distribution of data values within a column. A
higher value results in more information being available to the SQL optimizer but requires additional catalog space.
When 0 is specified, no frequent-value statistics are retained, even if you request that distribution statistics be collected.
NUM_QUANTILES
Defines the maximum number of distribution quantile values to collect. It can be specified for an individual column in
the ON COLUMNS clause. If the value is not specified for an individual column, the quantile limit value will be picked
up from that specified in the DEFAULT clause. If it is not specified there either, the maximum number of quantile
values to be collected will be what is set in the NUM_QUANTILES database configuration parameter.
Current number of quantiles retained
(NUM_QUANTILES) = 20
The "quantile" statistics help the optimizer understand the distribution of data values within a column. A higher value
results in more information being available to the SQL optimizer but requires additional catalog space. When 0 or 1 is
specified, no quantile statistics are retained, even if you request that distribution statistics be collected.
Increasing the value of these two parameters increases the amount of statistics heap (stat_heap_sz) used when
collecting statistics. The default value of the statistics heap size (in 4KB pages) (STAT_HEAP_SZ) is 4384. You may have to
increase this configuration parameter.
33
Example 14.2 RUNSTATS CMD
RUNSTATS ON TABLE RMADMIN.RMOBJECTS ON
ALL COLUMNS WITH DISTRIBUTION ON KEY
COLUMNS DEFAULT
NUM_FREQVALUES 100
NUM_QUANTILES 200
AND DETAILED INDEXES ALL ALLOW WRITE
ACCESS ;
34
Default value for num_freqvalues = 10,
num_quantiles = 20
34
How To Identify A Bottleneck?
• Collecting and analyzing the debug data using basic system tools
(vmstat, top, prstat, sar, pmap, iostat, etc.); DB2 native tools
(snapshot, event monitor, access plan, db2pd, db2advis, etc.); and
profiling tools if needed.
• Query access plan - using the right indexes to reduce the cost of the
bottleneck queries
• Exploring the APIs features based on your need. DB2 supported
APIs (Embedded-SQL, CLI, JDBC, ADO, Perl, CLP……), and their
performance difference; fetch/insert orientations, statement attributes
• Using the query options wisely, such as blocking features, and parameter
markers to reuse the statement when calling the same one repeatedly. “A DBMS
is supposed to do exactly what the application (queries) requests.”
• Understanding the application nature (OLTP or DSS or mixed), and
tuning the DBM and DB configuration parameters accordingly;
• Maintaining the database proactively to ensure the optimal database
performance
35
Could bottleneck identification and elimination be automated?
Is anyone interested in writing a program that can automatically
identify performance bottlenecks and eliminate them? Stay tuned.
35
Session D01
Bottlenecks Elimination in Real World DB2 Applications
Sigen Chen
Lockheed Martin
Baltimore, Maryland USA
[email protected]
36
36