Transcript
Session: D01
Bottlenecks Elimination in
Real World DB2 Applications
Sigen Chen
Lockheed Martin
13 October 2008 • 11:15 – 12:15
Platform: DB2 UDB for Linux, UNIX, Windows
ABSTRACT
Database application performance for a given system (hardware and software)
may be determined by application behavior, APIs, database design and layout,
data size, and system configurations.
based on the performance improving practice from real world database
applications. The focus will be on understanding the application behavior;
creating the right indexes; writing optimal queries, exploring the query features
wisely; using appropriate APIs for a given requirement, not only on the
programming language level, but also on the statement attributes such as
cursor type, data type for binding, fetch orientation, array options; practicing
proactive maintenance to ensure optimal data layout and statistics; tuning the
key configuration parameters based on application behavior and system
monitoring data. The troubleshooting examples and sample code segments are
used to exemplify the practice. Performance issue debugging and analysis is
also included.
In short:
• Presenting some experience from managing real-world DB2 databases
• Sharing some performance data from database application benchmarking
• Exercising some DB2 coding (API) options, out of curiosity, from a database application performance point of view
1
Summary
• Diagnosing the real database applications
• Using DB2 native tools and system tools.
• Creating the correct index
• Adding the right indexes
• Removing the unnecessary indexes.
• Choosing the right API for a given job, i.e.,
• Embedded, CLI, ADO/IBM provider, ADO/MS Bridge,
JDBC T2, Perl and Shell Script.
• Using proper data type in SQLBindCol(), using array
fetch/insert, right cursor types, and proper fetching/inserting
APIs.
• Tuning several key cfg parameters such as parallelism,
avg_appls etc., refining the options of maintenance tools.
2
1. Discussing how to identify the bottlenecks by analyzing the debugging data
using system tools (vmstat, top, prstat, sar, pmap, etc.), DB2 native tools
(snapshot, event monitor, access plan, db2pd, etc.), and profiling tools
(example commands below).
2. Showing how to collect and analyze the query access plan, and
using the right indexes to reduce the cost of bottleneck queries.
3. Analyzing several commonly used DB2-supported APIs
(Embedded SQL, CLI, JDBC, ADO, Perl, CLP) and their
performance differences through our test data; comparing
several fetch/insert orientations of CLI and statement attributes, and
testing the performance.
4. Writing the most efficient queries and using the query options
wisely, such as blocking features. After all, a DBMS is supposed
to do exactly what the application (queries) requests.
5. Understanding the application nature (OLTP or DSS or mixed),
and tuning the DBM and DB configuration parameters
accordingly; maintaining the database proactively to ensure the
optimal database performance.
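As a hedged illustration (the database name SAMPLE and the output file names are placeholders; the commands themselves are standard OS and DB2 tools), the data collection referred to in point 1 might look like this:
vmstat 5 12 > vmstat.out                                    -- OS-level CPU/memory/paging sample
db2 update monitor switches using statement on
db2 get snapshot for dynamic sql on SAMPLE > dynsql.snap    -- statement execution times, rows read
db2 get snapshot for bufferpools on SAMPLE > bp.snap
db2pd -db SAMPLE -applications > db2pd_apps.out
db2exfmt -d SAMPLE -1 -o plan.out                           -- format the access plan of a previously explained statement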
2
Performance Factors
• $, Hardware Infrastructure (cpu/mem/io, network),
BP, Reasonable Data Layout
• Application behavior
• APIs (Language, Interface)
• Database application design and data layout
• Data size (response time vs size)
• System configurations
• System Maintenance (proactive vs responsive)
3
What Could Affect A Given Database Application System Performance?
- $ and HW infrastructure (cpu/mem/disk, network) is out of the scope of this presentation;
- It’s also assumed that you would have reasonable BP hit ratio and data layout (TS, LOGs,
striping).
For a given system (platform - HW, SW), there are things a DBA can do to improve performance.
- Understand the business objectives and the application behavior - OLTP, DSS (DW), or mixed? Tune the system accordingly.
- Number of active applications. Is parallelism necessary?
- How are applications implemented? C, Java, etc.
- What APIs are employed? One may not have control over all languages and APIs used by
applications, but a DBA does have control over maintenance programs and batch jobs.
- Disk layout and data distribution? Is HA involved? Is DPF involved?
- As data size grows, performance can be affected significantly (even exponentially); keep
scalability in mind. Performance improvement is an ongoing DBA task.
- Proactive Maintenance – reorg, statistics, binding etc..
- Troubleshooting examples and some sample code segments are used to exemplify the
proactive practice. Performance issue debugging and analysis is also included.
3
Performance Improvement Approaches
• Understanding the application behavior
• Writing optimal queries, exploring the query features wisely
• Creating the necessary indexes
• Using appropriate APIs for a given requirement
• Programming language level
• Statement attributes such as cursor type, data type for
binding, fetch orientation, array options;
• Proactive maintenance to ensure optimal data layout and
updated statistics
• Tuning the key configuration parameters based on application
behavior and system monitoring data.
4
“A DBMS is supposed to do just what the applications request it to do…….”
Therefore, understanding the application behavior is most important in order to
maximize the performance of a given system. (Occasionally a DBMS does not
do what is expected; then it becomes a PMR issue.)
- Indexes can help most queries, but not always.
- Developers ought to optimize the queries, not just barely make them work.
- API
  - Program level: choose the right language for your job - C, Java, Perl, or shell scripts
  - Coding level: data type, cursor type, fetch orientation, array options, blocking, etc.
- Maintenance as most DBAs do (backup, necessary reorg, update statistics,
rebind, data integrity check).
-Does the database need reorg? Data growth, insertion mode, Online or offline
- Do I have enough security on the LOGs (Primary, Mirror, Archive)? How should the logs
be distributed?
- What RUNSTATS option is the best suited to my system?
- Configuration Parameter setting (DBM CFG, DB CFG, and registry) – based
on benchmarking or stress test
4
Examples Summary
• Approach - DB2 native tools + OS fundamental tools
• Creating the correct indexes is key (2~43x on multiple applications)
• Choosing the right API for a given job is essential
Embedded(1.00)
CLI (1.03)
ADO/IBM provider (1.31)
ADO/MS Bridge (1.47)
JDBC T2 (1.56)
Shell Script (4.80)
• Using proper data type (i.e., in SQLBindCol); right cursor types; and
proper fetching/inserting APIs
• Tuning based on application behavior (e.g., parallelism, avg_appls,
etc.) to resolve memory shortage, locking, and response time issues
• runstats options (e.g., had 37x performance impact )
5
Brief summary of the data/example showing the impact.
When troubleshooting an issue, where to start?
- Approach: basic native tools are always a good place to start (CPU,
memory, I/O), then examine the snapshot data, event monitor data, and
queries.
Some prefer to buy monitoring tools; make sure you understand
how the data is collected and interpreted.
- If you find long-running queries (bottleneck queries), analyze the access plan and
focus on the most costly plan steps.
- Coding APIs - a business decision and the developers' skill set. The numbers in
parentheses are relative response times; the smaller the better.
- Use the proper data type, appropriate cursor type, and fetch orientation.
Numbers in parentheses are relative execution times.
- Tuning is based on application behavior. Configuration parameter changes should
be based on benchmarking tests.
- Ensure the DB has up-to-date statistics and optimized access plans.
5
Understand the Nature of Applications
• OLTP or DSS or Mixed
• Possible limitations vs tolerance
• Example - parallelism (DFT_DEGREE,
INTRA_PARALLEL, DFT_QUERYOPT,
AVG_APPLS)
6
OLTP applications expect faster instant response;
DSS applications may have complex queries or larger result set. The
expectation and tolerance may be different.
Configuration may need to take the application expectation into account.
              OLTP    DSS
Opt level     low     high
AVG_APPLS     1       varies: depends on the number of complex-query applications and the bufferpool size
Parallelism   no      yes

-----
DFT_DEGREE 1 [ANY, -1, 1 - 32 767] (CURRENT DEGREE)
MAX_QUERYDEGREE -1 (ANY) [ANY, 1 - 32 767] - number of parallel operations within a database partition when the statement is executed
INTRA_PARALLEL NO (0) [SYSTEM (-1), NO (0), YES (1)] - may require more FCM buffers
DFT_QUERYOPT 5 [0 - 9]
AVG_APPLS 1 or N - efficient use of the bufferpool
6
Example 1. AVG_APPLS
• SQL10013N, could not load the library
• Overall application performance improved by 3~54%
• The bottleneck query execution time (seconds) and CPU
usage (%)
              Time (Sec.)   CPU usage (%, 4-way Sun)
avg_appls=5   105           50
avg_appls=1   0.006         16
7
SQL10013N The specified library "<name>" could not be loaded
In an OLTP application system, response time is essential. What would be
your tolerable response time when you hit a button (or link)? Sub-second?
One would want to tune the system to run as quickly as possible, which means
allowing an application to use all the available resources (the bufferpool in this
case) and be done with it.
When an OLTP query takes several seconds or more, the user might just
navigate away from the site. In some cases, that means potentially losing
business.
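A hedged sketch of how such a change might be applied (SAMPLE is a placeholder database name; the value itself should come from your own benchmarking):
db2 update db cfg for SAMPLE using AVG_APPLS 1
db2 get db cfg for SAMPLE | grep -i avg_appls     -- verify the new value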
7
Example 2. Intra_parallel
• Turning Intra_Parallel OFF freed up about 1.5 GB of real
memory and 2 GB of swap memory in a 32-bit
Sun/Solaris system – saved the system from crashing
• Disabling intra-parallelism improved some
applications' performance by 2~5%
• Conclusion: choose the features wisely
8
Problem: the system crashed because swap memory was exhausted.
Parallelism is a great feature. However, would it help you?
How did I know it was intra_parallel=YES that caused the crash?
The error message suggested “No FCM request blocks are available
(SQL6043C)”, and the number of FCM request blocks (FCM_NUM_RQB) could not
be increased any further.
A 2 GB memory saving means a great deal on a 4-way (Sun V880) box.
An analogy would be a simple job that requires climbing a ladder:
one person can do the job just fine; two people would be crowded,
and might cause a crash!
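A hedged sketch of the change (INTRA_PARALLEL is a database manager parameter; an instance restart is assumed to be required for it to take effect):
db2 update dbm cfg using INTRA_PARALLEL NO
db2stop
db2start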
8
Writing optimal queries/program,
exploring the query features wisely
• Too many to mention
• A Simple Query Example
• Select C1,Cx from T1 where C1 in (x,y) optimize
for 1000 rows
• What is the expected resultSet?
• Is the blocking necessary?
• Local or n-tier system
9
Select C1,Cx from T1 where C1 in (x,y) optimize for 1,000 rows
Even a simple query like the above requires careful coding - is the
blocking really needed? What is the expected result set? Is the database local or
remote? Too often we have seen such a clause show up in OLTP application
queries, causing performance problems for users.
9
Example 3. Using result set block vs non-block under
various APIs (Win2k-390 system, 100,000 rows)
API        NON-BLOCKING           BLOCKING (optimize for N rows)
           R.T.*   Stdev/ave      R.T.*   Stdev/ave
Embedded   5.59    0.64           1       0.03
CLI        6.49    0.74           1       0.02
JDBC T2    1.93    0.46           1       0.00
ADO        4.94    0.36           1       0.04
10
R.T. = relative time against the same API used
Row blocking is a technique that reduces database manager overhead by retrieving a block of
rows in a single operation. These rows are stored in a cache, and each FETCH request in the
application gets the next row from the cache. When all the rows in a block have been
processed, another block of rows is retrieved by the database manager.
Our test data from fetching 100,000 rows from a 10-column table (rs=239 bytes, number of rows
per block is 84) in a win2k-zOS system indicated that without blocking, results
fluctuate more (stdev vs. average is higher) and are about 2-6 times slower than with
blocking.
The cache is allocated when an application issues an OPEN CURSOR request and is
deallocated when the cursor is closed. The size of the cache is determined by a configuration
parameter which is used to allocate memory for the I/O block. The database manager
parameter used depends on whether the client is local or remote:
• For local applications, aslheapsz (default 15 x 4K) is used to allocate the cache for row
blocking.
• For remote applications, rqrioblk (default 32K) on the client workstation is used to allocate
the cache for row blocking. The cache is allocated on the database client.
-- just in case someone wants to know how to determine the size
Rowsperblock=aslheapsz*4096/rs
Rowsperblock=rqrioblk/rs
BLOCKING bind option values: UNAMBIG, ALL, NO
-- what if the query only returns a handful of records?
Blocking could make the query response time longer, because it would try to fill
the first block of N rows until it could not get as many rows as specified.
10
Example 4.1. Reuse the Statement via Parameter Markers
int main () {
SQLHANDLE henv, hdbc, hstmt;
SQLCHAR *sqlstmt = (SQLCHAR *) "INSERT INTO T1 (C2, C5) VALUES (?, ?)";
SQLINTEGER *col2, lvalue;
SQLCHAR *col5;
int rc = 0, pass = 0;
/* allocate henv, hdbc, connect to database */
/* allocate statement handle */
rc = SQLAllocHandle (SQL_HANDLE_STMT, hdbc, &hstmt);
/* prepare the statement */
rc = SQLPrepare (hstmt, sqlstmt, SQL_NTS);
/* assign values to the input variables */
col2 = (SQLINTEGER *)malloc(sizeof(int)); *col2=1;
col5 = (SQLCHAR *) malloc((sizeof(char))*100);
strcpy ((char *)col5, "my 100 characters string, but could be shorter……");
/* bind the values to the parameter markers */
rc = SQLBindParameter(hstmt, 1,
SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0,
(SQLINTEGER *)col2, sizeof((SQLINTEGER *)col2 ),
&lvalue );
rc = SQLBindParameter(hstmt, 2,
SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
100, 0,
(SQLCHAR *)col5, 100, NULL );
/* execute the statement, assume that 100,000 rows to be inserted into the table */
while(pass++<100000){
rc=SQLExecute( hstmt );
*col2 = *col2+1;
/* suppose that we’d like to assign different value for each C2, we may do the same for C5, source input can be from various source, such as file */
}
/* commit or rollback, free handles, disconnect */
return (SQL_SUCCESS);
}
11
/* in this example we will insert values into two columns, C2 and C5 */
For an insert query, parameter markers in an SQL statement may be bound to
either application variables or arrays of application variables for all C data
types. Data is then transferred from application to the DBMS when
SQLExecute() or SQLExecuteDirect() is called. The code segments in this
slide demonstrate a typical insert using CLI SQLBindParameter() to assign the
values to the parameter markers.
One may use literal values in the SQL statement instead of parameter
marker binding. But in doing so, performance will be seriously impacted,
because DB2 has to look for a match in the dynamic SQL cache.
None will be found when each SQLExecute() call inserts new values, so
DB2 will invoke the optimizer to generate a plan to execute the statement,
then discard the oldest statement in the dynamic SQL cache and insert the
new one.
Performance degradation depends on the number of rows inserted. I
have seen a difference of several minutes vs. sub-seconds between using
literals and parameter markers. Experience tells me to use parameter
marker binding whenever possible.
One can do the same on array insert.
11
Example 4.2. Reuse the Statement via Parameter Markers
int main () {
/* allocate henv, hdbc, connect to database allocate statement handle */
/* prepare the statement */
rc = SQLPrepare (hstmt, sqlstmt, SQL_NTS);
/* assign values to the input variables, bind the values to the parameter markers,
execute the statement, assume that 100,000 rows to be inserted into the
table */
while(pass++<100000){
rc=SQLExecute( hstmt );
*col2 = *col2+1;
/* suppose that we’d like to assign different value for each C2, we may do the
same for C5, source input can be from various source, such as file */
}
/* commit or rollback, free handles, disconnect */
}
12
Using Appropriate APIs for a
Given Requirement
• Scenario: an ongoing batch job to set document
status for a list of docIDs passed in. Time is essential
• A shell script is meant to be interactive (input/invoke
CLP/SQL/commit)
• A programming language such as C allows
streamlining the logic, reusing the statements, more cursor
manipulation options, etc.
• C:Perl:ksh(opt):ksh(prim) = 1:3.76:302:1066
13
What is presented here is a simple update statement that needs to be executed
frequently with a list of record IDs as input.
“ Update table1 set c1=‘U’ where c2 in (?)”
What was needed was a streamlined program to process the documents quickly
and efficiently.
Efficiency is the key. The numbers were collected against a local database. No
network traffic was involved; the difference is caused purely by the APIs.
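A hedged, minimal C/CLI sketch of the streamlined approach described above (the table/column names follow the statement shown; the input reader next_docid() and the commit interval are hypothetical, and connection setup and error handling are omitted):
/* prepare the UPDATE once, bind the parameter marker once, re-execute per docID */
SQLCHAR *upd = (SQLCHAR *) "UPDATE table1 SET c1 = 'U' WHERE c2 = ?";
SQLCHAR docid[27];
SQLINTEGER ind = SQL_NTS;
long n = 0;
rc = SQLSetConnectAttr(hdbc, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER) SQL_AUTOCOMMIT_OFF, 0);
rc = SQLPrepare(hstmt, upd, SQL_NTS);
rc = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
                      26, 0, docid, sizeof(docid), &ind);
while (next_docid(docid)) {              /* next_docid() is a hypothetical input reader */
    rc = SQLExecute(hstmt);
    if (++n % 1000 == 0)                 /* commit in batches instead of per row */
        rc = SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);
}
rc = SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);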
13
Example 5. Several APIs Performance
Comparison in a Local Solaris System
Relative Time (APIs): C = 1, Perl = 3.76, Ksh (opt) = 302, Ksh (prim) = 1066
14
C:Perl:ksh(opt):ksh(prim) =1:3.76:302:1066
(50,000 records for testing, updating)
C – CLI well written, prepare stmt once, reuse it.
Perl – prepare stmt once, reuse it, one more layer of the Interface
Ksh (opt) – auto commit off, quiet, remove the unnecessary print steps etc..
Ksh (prim) – interactive, stdout I/O, redundant steps, auto commit on … this is
more likely how some people would program it: quick and dirty code that barely
works.
14
Example 6. APIs Performance in a Two-Tier DB2 Connect System
Relative Time (APIs): Embedded = 1, CLI = 1.03, ADO/IBM Provider = 1.31, ADO/MS Bridge = 1.47, JDBC T2 Driver = 1.56
15
Notice that the numbers in this slide were collected in a 2-tier system using a
composite workload (all kinds of SQL).
The comparison data of using CLI, JDBC (Driver Type 2), ADO (use both
IBM OLE DB Provider for DB2 Server and Microsoft OLE DB Bridge for
ODBC Drivers), and Static Embedded SQL in a Windows2000 - zOS two-tier
system. DB2 Connect Server was on the Windows 2000 application client.
If the time for using Embedded-SQL is normalized to 1.00, the performance
sequence for fetching data using various APIs (fastest to slowest) is Embedded
SQL (1.00), CLI (1.03), ADO/IBM provider (1.31), ADO/Microsoft Bridge
(1.47), and JDBC (1.56). DB2 CLI is comparable to the Embedded SQL! IBM
Provider outperformed Microsoft Bridge. JDBC is just as expected.
The magnitude of the differences among the APIs in the 2-tier system is smaller than
in the local system. That could be because in a multi-tier system more
factors come into play, such as the mainframe server generally being slower, and the
data transfer between server and client.
15
Example 7. Performance of three fetch APIs
with different data type in binding
Relative Time (PDT = proper data type in binding):
SQLGetData SQL_C_CHAR = 3.55, SQLGetData PDT = 3.32
SQLBindCol SQL_C_CHAR = 1.38, SQLBindCol PDT = 1
SQLFetchScroll SQL_C_CHAR = 1, SQLFetchScroll PDT = 0.89
16
For fetching data, 10 cols x 200,000 rows in our test case, if the time for using
typical SQLBindCol() is normalized to 1.00, the performance sequence from
the fastest to the slowest is:
                      Proper data type in binding   SQL_C_CHAR in binding
SQLFetchScroll        0.89                          1
SQLFetch/SQLBindCol   1                             1.38
SQLGetData            3.32                          3.55
Using the proper data type in binding is always better than using SQL_C_CHAR.
Therefore, use the proper data type in binding, and use array fetch whenever
possible.
Typically, an application may choose to allocate the maximum memory the
column value could occupy and bind it via SQLBindCol(), based on
information about a column in the result set (obtained via a call to
SQLDescribeCol(), for example, or prior knowledge). However, in the case of
character and binary data, the column can be arbitrarily long. If the length of
the column value exceeds the length of the buffer the application can allocate
or afford to allocate, a feature of SQLGetData() lets the application use
repeated calls to obtain in sequence the value of a single column in more
manageable pieces. This API may suit Java or GUI types of applications.
The tradeoff is slower performance.
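A hedged sketch of "proper data type in binding" (the two-column result set and its types are an assumption for illustration, not the actual test table):
/* an INTEGER column bound as SQL_C_LONG avoids a per-row conversion to character */
SQLINTEGER c1;
SQLCHAR    c2[101];
SQLINTEGER ind1, ind2;
rc = SQLBindCol(hstmt, 1, SQL_C_LONG, &c1, 0, &ind1);           /* proper data type */
rc = SQLBindCol(hstmt, 2, SQL_C_CHAR, c2, sizeof(c2), &ind2);   /* CHAR column: SQL_C_CHAR is appropriate here */
/* binding column 1 as SQL_C_CHAR instead would force a conversion on every row,
   which is what the 1.38 vs 1.00 numbers above reflect */
while ((rc = SQLFetch(hstmt)) != SQL_NO_DATA) {
    /* use c1 and c2 */
}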
16
Example 8. SQLFetch Orientation
Relative Time (SQL_CURSOR type): FORWARD_ONLY = 1, STATIC = 1.2, KEYSET-DRIVEN = 2.9
17
Cursor Type and SQLFetchScroll()
In the above examples, the fetch was sequential, i.e., retrieving rows starting with the first row and ending with the last
row. In that case, we know SQLFetchScroll() gives the best performance. What if an application needs to allow the user
to scroll through a set of data both forwards and backwards? DB2 CLI has three cursor types:
(1) forward-only (default) cursor - can only scroll forward.
(2) static read-only cursor - is static, once it is created no rows will be added or removed, and no value in any rows
will change
(3) keyset-driven cursor - has ability to detect changes to the underlying data, and the ability to use the cursor to
make changes to the underlying data. Keyset-driven cursor will reflect the changed values in existing rows, and
deleted rows; but it will not reflect added rows. Because the set of rows is determined once, when the cursor is
opened. It does not re-issue the select statement to see if new rows have been added that should be included.
To be able to scroll through the cursor back and forth, cursor has to be defined as SQL_CURSOR_STATIC or
SQL_CURSOR_KEYSET_DRIVEN. The position of the rowset within the result set can be specified as
SQL_FETCH_NEXT, SQL_FETCH_FIRST, SQL_FETCH_LAST, SQL_FETCH_RELATIVE,
SQL_FETCH_ABSOLUTE, SQL_FETCH_PRIOR, and SQL_FETCH_BOOKMARK in the SQLFetchScroll()
call.
Performance impact
From the performance point of the view, a static cursor involves the least overhead, if the application does not need the
additional feature of a keyset-driven cursor then a static cursor should be used. If the application needs to detect
changes to the underlying data, or needs to add, update, or delete data from the result set, then the keyset-driven
cursor may be used. Also, if one needs to scroll the cursor back and forth, the cursor type needs to be set to
SQL_CURSOR_STATIC (or SQL_CURSOR_KEYSET_DRIVEN); the default type is
SQL_CURSOR_FORWARD_ONLY. Comparing the performance of fetching data using STATIC and
KEYSET-DRIVEN cursors with FORWARD_ONLY, we see the STATIC and KEYSET-DRIVEN cursors at 1.2 and
2.9 times the forward-only cursor's time, respectively. I.e., the features come with a cost.
An example of using these cursor types in an array fetch with a specified fetch orientation is on the next slide.
17
Sample Code of Using Static Cursor
/* cursor type has to be specified via SQLSetStmtAttr() before the
SQLPrepare() */
rc = SQLSetStmtAttr ( hstmt,
SQL_ATTR_CURSOR_TYPE,
(SQLPOINTER) SQL_CURSOR_STATIC,
0);
rc = SQLPrepare(hstmt, sqlstmt, SQL_NTS);
/* …… */
/* fetch orientation may be specified in SQLFetchScroll() */
rc = SQLFetchScroll(hstmt, SQL_FETCH_FIRST, 0);
/* …… */
18
To be able to scroll through the cursor back and forth, cursor has to be defined
as
SQL_CURSOR_STATIC or
SQL_CURSOR_KEYSET_DRIVEN.
The position of the rowset within the result set can be specified as
SQL_FETCH_NEXT
SQL_FETCH_FIRST
SQL_FETCH_LAST
SQL_FETCH_RELATIVE
SQL_FETCH_ABSOLUTE
SQL_FETCH_PRIOR and
SQL_FETCH_BOOKMARK
in the SQLFetchScroll() call.
An example of using STATIC or KEYSET_DRIVEN cursor would be similar
to that illustrated in the Sample code, except defining the cursor type and
specifying the fetch orientation
18
Example 9. Insert APIs Performance
Relative Time (SQL insert APIs):
CLI USE_LOAD = 0.36, Chaining = 0.42, Array Insert (size 100) = 0.42, Not Logged Initially = 0.81, SQLExtendedBind = 0.85, SQLBindParameter = 1
19
For inserting data, if the time for inserting 100,000 rows, one at a time using
SQLBindParameter() is normalized to 1.00,
the performance sequence from fastest to the slowest is
CLI USE_LOAD (0.36) - CLI API invokes LOAD; large data
CHAINING (0.42)
- referred to as “CLI array input chaining”. All
SQLExecute() requests associated with a prepared statement will not be sent
to the server until either the SQL_ATTR_CHAINING_END statement attribute
is set, or the available buffer space is consumed by rows that have been
chained.
Array Insert (0.42, Size 100) – Inserting multiple rows
Row Insert with Not Logged Initially Activated (0.81) - reducing the logging
SQLExtendedBind (0.85) – bind array of the columns, some restrictions apply
SQLBindParameter(1.00) - typical
Had one only used single-row insert via SQLBindParameter(), one would have
missed a lot of the great options that CLI has to offer.
When the array size is > 10, changing the size does not have a significant impact.
Reducing logging with the NOT LOGGED INITIALLY parameter (see the sketch after these notes)
SQLExtendedBind()
This function can be used to replace multiple calls to SQLBindCol() or
SQLBindParameter(), however, important differences should be noted.
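A hedged sketch of the "Not Logged Initially" variant (T1 follows the earlier examples; note that if the unit of work fails while the attribute is active, the table can be left unusable, so this suits re-loadable data only):
-- run with autocommit off, in the same unit of work as the inserts
ALTER TABLE T1 ACTIVATE NOT LOGGED INITIALLY;
-- ... execute the SQLExecute() insert loop shown earlier ...
COMMIT;  -- logging resumes once the unit of work ends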
19
Typical Row Insert
……
rc = SQLBindParameter(hstmt, 1,
SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0,
(SQLINTEGER *)col1, sizeof((SQLINTEGER *)col1 ),
&lvalue );
rc = SQLBindParameter(hstmt, 2,
SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
100, 0,
(SQLCHAR *)col2, 100, NULL );
/* execute the statement, assume that n (100,000) rows to be inserted */
while(pass++<n){
rc=SQLExecute( hstmt );
*col1 = *col1+1;
}
……
20
Suppose that we’d like to assign different value for each Col1, we may do the
same for Col2
20
Array Insert
/* just make up some values for column Col1 and Col2 */
SQLINTEGER col1[]= {1,2,3,4,5,6,7,8,9,10, ……100};
SQLCHAR col2[100][100]= {"A1","B2","C3","D4","E5","F6","G7","H8","I9","J10",……"z100"};
/* set array size, 100 for our sample code */
rc=SQLSetStmtAttr(hstmt,
SQL_ATTR_PARAMSET_SIZE,
(SQLPOINTER)100, 0);
/* bind the values to the parameter markers, which is the same as before except this time col1
and col2 are arrays */
rc = SQLBindParameter(hstmt, 1,
SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0, col1, 0, NULL);
rc = SQLBindParameter(hstmt, 2,
SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
100, 0, col2, 100, NULL );
while(pass++<n)
rc=SQLExecute( hstmt );
/* ...... */
21
Execute the statement, assume that we’d like to insert 100,000 rows into the
table, but this time only execute 1000 times, because array size is set to 100
Bind the values to the parameter markers, which is the same as before except
this time col1 and col2 are arrays
21
Chaining
/* …… */
rc = SQLSetStmtAttr(hstmt,
SQL_ATTR_CHAINING_BEGIN,
(SQLPOINTER) TRUE,
0);
while ( pass++ <n ) {
rc = SQLExecute(hstmt);
}
rc=SQLSetStmtAttr(hstmt,
SQL_ATTR_CHAINING_END,
(SQLPOINTER) TRUE,
0);
/* …… */
22
An example of using Chaining would be similar to Sample above, except setting
CHAINING_BEGIN and END around the SQLExecute() via SQLSetStmtAttr()
SQL_ATTR_CHAINING_BEGIN
A 32-bit integer which specifies that DB2 will chain together SQLExecute() requests
for a single prepared statement before sending the requests to the server; this feature
is referred to as CLI array input chaining. All SQLExecute() requests associated with
a prepared statement will not be sent to the server until either the
SQL_ATTR_CHAINING_END statement attribute is set, or the available buffer
space is consumed by rows that have been chained. The size of this buffer is defined
by the ASLHEAPSZ dbm cfg for local client applications, or the RQRIOBLK dbm
cfg parameter for client/server configurations. This attribute can be used with the
CLI/ODBC configuration keyword ArrayInputChain to effect array input without
needing to specify the array size. Refer to the documentation for ArrayInputChain for
more information.
SQL_ATTR_CHAINING_END
Causes all chained SQLExecute() requests to be sent to the server. After this attribute
is set, SQLRowCount() can be called to determine the total row count for all
SQLExecute() statements that were chained between the
SQL_ATTR_CHAINING_BEGIN and SQL_ATTR_CHAINING_END pair. Error
diagnostic information for the chained statements becomes available after the
SQL_ATTR_CHAINING_END attribute is set. This attribute can be used with the
DB2 CLI configuration keyword ArrayInputChain to effect array input without
needing to specify the array size. Refer to the documentation for ArrayInputChain for
more information.
22
Use Load API
/* allocate henv, hdbc, connect to database, allocate statement handle,
prepare the statement, assign values to the input variables, bind the
values to the parameter markers */
/* begin to use load */
rc = SQLSetStmtAttr(hstmt, SQL_ATTR_USE_LOAD_API,
(SQLPOINTER) SQL_USE_LOAD_INSERT, 0 );
/* execute the statement, assume that we’d like to insert 100000 rows
into the table */
while(pass++<n){
rc=SQLExecute( hstmt );
*col1 = *col1+1;
}
/* end use load */
rc=SQLSetStmtAttr(hstmt, SQL_ATTR_USE_LOAD_API,
(SQLPOINTER) SQL_USE_LOAD_OFF, 0);
23
CLI calling LOAD to insert the data. Anything related to LOAD operation
would apply to CLI USE_LOAD_API
23
Create Necessary Indexes
• Bottleneck Queries First
• Including Stored Procedures, Triggers
• Only those needed – Indexes can help, could also
hurt
24
How do we know indexes are needed?
0. Identify the bottleneck queries - snapshot and event monitor data.
1. db2advis is a good tool to start with (example below).
2. Analyze the access plan, find the bottlenecks, and try to come up with an index to
reduce the cost.
3. Test the index(es) created; ensure they improve the bottleneck queries without
hurting other queries too much.
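As a hedged example (the database name SAMPLE and the workload file are placeholders; the statement mirrors the count(*) in Example 10), the Design Advisor can be run against bottleneck statements captured from snapshot data:
db2advis -d SAMPLE -s "SELECT COUNT(*) FROM CML.ICHG_QUE WHERE ATTR0000001021 = 'some-doc-id'"
db2advis -d SAMPLE -i bottleneck_queries.sql -t 5    -- workload file, 5-minute advise limit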
24
Example 10. SQLs In The Procedures
• Trigger on icmut01005001
CREATE TRIGGER CML.TG03_ICMUT01005001 AFTER UPDATE OF
ATTR0000001024 ON CML.ICMUT01005001 REFERENCING NEW AS NEW
FOR EACH ROW MODE DB2SQL WHEN (UPPER(NEW.attr0000001024) not
in ('IC','CN') OR NEW.attr0000001024 is null) BEGIN ATOMIC CALL
CML.ICHG_QUE_PROC (NEW.ATTR0000001021, NEW.ATTR0000001024,
NEW.ATTR0000001025); END
• SP on ICHG_QUE table
CREATE PROCEDURE CML.ICHG_QUE_PROC (IN ATTR1021
CHARACTER(26), IN ATTR1024 CHARACTER(2), IN ATTR1025 TIMESTAMP)
SPECIFIC CML.ICHG_QUE_PROC LANGUAGE SQL MODIFIES SQL DATA
BEGIN DECLARE V_CNT INTEGER DEFAULT 0; SELECT count(*) INTO
V_CNT FROM CML.ICHG_QUE WHERE CML.ICHG_QUE.ATTR0000001021 =
ATTR1021 WITH UR; IF V_CNT < 1 THEN INSERT INTO CML.ICHG_QUE
(ATTR0000001021, ATTR0000001024, ATTR0000001025) VALUES
(ATTR1021, ATTR1024, ATTR1025); END IF; END
• No index on ATTR0000001021, which is docID
25
In some cases a bottleneck SQL statement may not be that obvious. For example, when you
have triggers or stored procedure calls, you may need to examine the SQL
inside them.
In the example above, a trigger is defined to call a procedure when a
certain condition is met. The procedure contains a SQL statement counting
something. Unfortunately, the count(*) statement's WHERE clause column has no
index defined on it, so a table scan was inevitable whenever there was a
modification to the table attribute.
How many systems could afford a table scan?
25
An Index That Reduced The Cost
Before the index addition (table scan):
  TBSCAN
  ( 3)
  9539.24 (cost)
  2318 (IO)
  |
  474808
  TABLE: CML.QUE

After the index QUE1021 is added on QUE (attr1021) (index scan):
  IXSCAN
  ( 3)
  50.04 (cost)
  2 (IO)
  |
  477516
  INDEX: CML.QUE1021
26
Tests on the laboratory server and the production system indicated that this index
addition increased performance by 230% using C/CLI, with a few
thousand records in the table.
What if there are more than a few thousand records in the table?
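The index behind the improved plan above would have been created with something like the following (the index name QUE1021 is taken from the plan output; the RUNSTATS is there so the optimizer sees the new index):
CREATE INDEX CML.QUE1021 ON CML.ICHG_QUE (ATTR0000001021);
RUNSTATS ON TABLE CML.ICHG_QUE AND INDEXES ALL;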
26
Example 11. Where Should The
Indexes Be?
Stmt:
update CML.DocTab
set docType = 'X'
where docID = ? and docType in ('Y', 'Z')
docID is unique and docType is not; where should the index
be?
27
Whichever column has the higher cardinality (i.e., docID).
27
An index that may hurt the performance
What if an index is defined on docType?
Before adding the index (access plan excerpt):
  IXSCAN ( 5), cost 75.0417, 3 I/Os, on INDEX: CML.Index2, with TABLE: CML.DocTab (1.28141e+06 rows)

After adding an index on docType (access plan excerpt):
  FETCH ( 5), cost 100.048, 4 I/Os, over IXSCAN ( 6) on INDEX: CML.Index2 and TABLE: CML.DocTab (1.28141e+06 rows),
  plus TBSCAN ( 7) over TEMP ( 8) over TBSCAN ( 9) on TABFNC: SYSIBM.GENROW - three extra operations on temp tables
28
During examination of the query access plan, it was noticed that dropping an
unnecessary index eliminated three extra operations on temp tables for the
update SQL statement, and further improved the performance by nearly 40 times
(60 minutes of work updating 50k rows is completed in 1.5 minutes).
Why? docType has low cardinality.
Stmt: update CML.DocTab set docType ='DR' where docID=? and
docType in ('CN','IC')
Choose an index on the column(s) that have higher cardinality (i.e., docID)
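A hedged sketch of the fix described above (the docType index name IDX_DOCTYPE is hypothetical; the plan output does not name it):
DROP INDEX CML.IDX_DOCTYPE;                      -- remove the low-cardinality docType index that introduced the temp-table steps
RUNSTATS ON TABLE CML.DocTab AND INDEXES ALL;    -- refresh statistics afterwards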
28
Example 12. APIs +/- Index Effect
29
Right indexing (adding what is needed, removing what is unnecessary), plus proper APIs,
made a 466x performance gain.
In the figures above, the index effect makes the API effects appear small;
however, you are still looking at double/triple/quadruple differences among the
APIs.
29
Time Saved (Indexes + APIs)
Hours Needed Per Year (Proactive Optimization Process):
Existing Code = 584
Optimized (1st Year Including 40 hrs Coding Effort) = 1 + 40
Optimized (Subsequent Years) = 1
30
Considering the ongoing maintenance, each site may process as many as 2~3
million records per year. It would take the original ksh script 584 hours, or the
third party’s Legacy program 1368 hours, to complete the job. The optimized
approach can complete the job in 1.3 hours.
Taking the first year's 40 hours of effort optimizing the methods into account, the
first year's hours for marking documents were reduced from 584 hours (ksh script)
to 41 hours; this represents a net first-year savings of 543 hours at each
site. The net saving in subsequent years would be 583 hours at each site. There are 7
(N) such sites on our program.
Points are
•Using the appropriate API for the right job. For example, C/CLI is much
faster than ksh script for batch job processing of many records.
•Creating indexes wisely. i.e., adding a necessary index or dropping an
unnecessary index.
•Some legacy code has had patches+patches+patches…… would it be worth rewriting the core pieces of the code?
30
Proactive Maintenance
• Reorg (online vs offline)
• Append_mode (online insertion)
• Runstats (various options)
• Monitor switches - do they need to be on?
31
When you have taken care of the Indexes, bufferpools, cfg parameters, logs,
sort, APIs etc.. What else would you do?
How about a stress test to push the system to a level where potential
bottlenecks may become apparent?
How about proactive maintenance?
Does your database need a reorg (reorgchk)? Do I have the time and resources to reorg?
How often do I need to update statistics?
Is there a need to leave the monitor switches on?
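A hedged example of the proactive checks mentioned above (the database name is a placeholder; the table name comes from earlier slides):
db2 connect to SAMPLE
db2 reorgchk update statistics on table all > reorgchk.out     -- refreshes statistics and flags tables/indexes needing a reorg
db2 "reorg table CML.ICHG_QUE inplace allow write access"      -- example of an online reorg for one flagged table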
31
Example 13. APPEND_MODE
• Online pages reorganization could have its pros and cons.
ON vs OFF (diff %):
DELETE = 0.04, INSERT/select = -75.47, SELECT = 0.06, UPDATE = 0.75, import = -29.87
32
Turning append mode ON helps insert performance; however, a nightly or
weekly reorg is needed.
When APPEND_MODE is set to ON, new rows are always appended to the
end of the table. No searching or maintenance of FSCRs (Free Space Control
Records) takes place. This option is enabled using the ALTER TABLE
APPEND ON statement, and can improve performance for tables that only
grow, like journals.
A performance test is needed to verify, because it does cause slight performance
degradation on select statements.
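The attribute is set per table; a hedged example (the journal-style table name CML.JRNL is a placeholder):
ALTER TABLE CML.JRNL APPEND ON;     -- new rows always go to the end of the table, no FSCR search
-- ...
ALTER TABLE CML.JRNL APPEND OFF;    -- turn it off again; an offline reorg can then reclaim space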
32
Example 14.1 Runstats Options Effect
Relative Time: DEFAULT = 38, Detailed = 1
(Detailed runstats option: NUM_FREQVALUES from 10 to 100, NUM_QUANTILES from 20 to 200)
33
Warning: performance tests are needed to validate whether the option change helps your applications.
This is a case of improving a data validation utility (mostly select queries).
RUNSTATS ON TABLE schema.OBJECTS ON ALL COLUMNS WITH DISTRIBUTION ON KEY COLUMNS
DEFAULT NUM_FREQVALUES 100 NUM_QUANTILES 200 AND DETAILED INDEXES ALL ALLOW WRITE
ACCESS;
NUM_FREQVALUES
Defines the maximum number of frequency values to collect. It can be specified for an individual column in the ON
COLUMNS clause. If the value is not specified for an individual column, the frequency limit value will be picked up
from that specified in the DEFAULT clause. If it is not specified there either, the maximum number of frequency
values to be collected will be what is set in the NUM_FREQVALUES database configuration parameter.
Current value Number of frequent values retained (NUM_FREQVALUES) = 10
The "most frequent value" statistics help the optimizer understand the distribution of data values within a column. A
higher value results in more information being available to the SQL optimizer but requires additional catalog space.
When 0 is specified, no frequent-value statistics are retained, even if you request that distribution statistics be collected.
NUM_QUANTILES
Defines the maximum number of distribution quantile values to collect. It can be specified for an individual column in
the ON COLUMNS clause. If the value is not specified for an individual column, the quantile limit value will be picked
up from that specified in the DEFAULT clause. If it is not specified there either, the maximum number of quantile
values to be collected will be what is set in the NUM_QUANTILES database configuration parameter.
Current number of quantiles retained
(NUM_QUANTILES) = 20
The "quantile" statistics help the optimizer understand the distribution of data values within a column. A higher value
results in more information being available to the SQL optimizer but requires additional catalog space. When 0 or 1 is
specified, no quantile statistics are retained, even if you request that distribution statistics be collected.
Increasing the value of these two parameters increases the amount of statistics heap (stat_heap_sz) used when
collecting statistics. The default value of the statistics heap size (in 4KB pages) (STAT_HEAP_SZ) is 4384. You may have to
increase this configuration parameter.
33
Example 14.2 RUNSTATS CMD
RUNSTATS ON TABLE RMADMIN.RMOBJECTS ON
ALL COLUMNS WITH DISTRIBUTION ON KEY
COLUMNS DEFAULT
NUM_FREQVALUES 100
NUM_QUANTILES 200
AND DETAILED INDEXES ALL ALLOW WRITE
ACCESS ;
34
Default value for num_freqvalues = 10,
num_quantiles = 20
34
How To Identify A Bottleneck?
• Collecting and analyzing the debug data using basic system tools
(vmstat, top, prstat, sar, pmap, iostat, etc.); DB2 native tools
(snapshot, event monitor, access plan, db2pd, db2advis, etc.); and
profiling tools if needed.
• Query access plan - using the right indexes to reduce the cost of the
bottleneck queries
• Exploring the APIs features based on your need. DB2 supported
APIs (Embedded-SQL, CLI, JDBC, ADO, Perl, CLP……), and their
performance difference; fetch/insert orientations, statement attributes
• Using the query options wisely, such as blocking features, and parameter
markers to reuse the statement when calling the same one repeatedly. “A DBMS
is supposed to do exactly what the application (queries) requests.”
• Understanding the application nature (OLTP or DSS or mixed), and
tuning the DBM and DB configuration parameters accordingly;
• Maintaining the database proactively to ensure the optimal database
performance
35
Could bottleneck identification and elimination be automated?
Is anyone interested in writing a program that can automatically
identify performance bottlenecks and eliminate them? Stay tuned.
35
Session D01
Bottlenecks Elimination in Real World DB2 Applications
Sigen Chen
Lockheed Martin
Baltimore, Maryland USA
[email protected]
36
36