Finding Java memory leaks in WebSphere Application Server, Part 2: Using FindRoots and z/OS svcdumps
Summary
This paper explains how to diagnose and find the source of Java™ memory leaks on
IBM® WebSphere® Application Server versions 3.5, 4.01 and 5.0 on the z/OS platform, using
FindRoots and svcdumps. It presents a general methodology, explains how to gather
diagnostic data, describes tools for analyzing the data, and gives examples of use.
Table of contents
• Summary
• Introduction
• Reading dumps using FindRoots
    • Requirements
    • Setting up the environment
    • Working with the output
    • FindRoots commands and options
        • The Convert command
        • The Print command
        • The PrintRoots command
        • The PrintComponents command
        • The PrintTree command
        • The PrintClassReach command
    • Properties for the FindRoots commands
• The svcdump analyzer tool (z/OS only)
    • svcdump options
    • Description of output
• Examples
    • Memory leak in a J2EE application
        • Symptoms
        • Clues
        • Resolution
    • WebSphere Application Server storage footprint growing
        • Symptoms
        • Clues
        • Resolution
    • Sporadic application server crash
        • Symptoms
        • Clues
        • Resolution
    • False leak
• About the authors
Introduction
The first paper in this series focused on using Sun’s Hprof and IBM heapdumps to find
Java memory leaks, primarily on distributed platforms. This paper discusses how to find
leaks on z/OS by analyzing svcdumps with FindRoots and the svcdump analyzer tool. You
might find it useful to read the general “Introduction” and “Java Memory Leaks” sections
of the first paper before reading this one.
Reading dumps using FindRoots
FindRoots is a z/OS tool for analyzing memory use and finding memory leaks in Java
programs. Further information on how to run and interpret the output can be found in the
information note referenced in APAR II13295, which we used to write this section. You
can find the APAR at http://www-1.ibm.com/support/docview.wss?uid=isg1II13295H.
The FindRoots package includes a number of separate tools that extract and analyze the
heap information. A typical procedure is as follows:
1. Generate an svcdump.
2. Run Convert to extract the heapdump from the svcdump and create a Portable
   heapdump (.phd) file. (Convert can also create .phd files from other heapdump
   formats, such as the heapdump.txt format used on AIX.)
3. Analyze the .phd file with a visualization tool such as PrintRoots.
Requirements
In order to run FindRoots, you need the following:
• An svcdump
• A JRE
• The svcdump.jar file. Currently this file is available only internally, but it can be
  obtained from Java Level 2 support.
Setting up the environment
The first step is to take an svcdump of the address space that is failing with the
OutOfMemory condition. You can either set up the environment to take this dump
automatically when the OutOfMemory condition occurs, or dump the address space at the
point of failure. An automatic dump is best, since manual handling can be error-prone.
To take this dump automatically, set a SYSMDUMP DD card in the start-up procedure of
the web server, and set the appropriate LE runopts. For example:
//BBODGW   PROC
// LEPARM='ENVAR("_CEE_RUNOPTS=TER(UADUMP),TRAP(ON,SPIE),ABT(ABEND)")',
// ICSPARM='-B -r /u/phil/httpd.conf -nosec',
//WEBSRV   EXEC PGM=IMWHTTPD,REGION=0K,TIME=NOLIMIT,
//         PARM=('&LEPARM/&ICSPARM')
//STEPLIB  DD DSN=SYS1.LINKLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CBC.SCLBDLL,DISP=SHR
//SYSIN    DD DUMMY
//OUTDSC   OUTPUT DEST=HOLD
//SYSPRINT DD SYSOUT=*,OUTPUT=(*.OUTDSC)
//SYSERR   DD SYSOUT=*,OUTPUT=(*.OUTDSC)
//STDOUT   DD SYSOUT=*,OUTPUT=(*.OUTDSC)
//STDERR   DD SYSOUT=*,OUTPUT=(*.OUTDSC)
//SYSOUT   DD SYSOUT=*,OUTPUT=(*.OUTDSC)
//CEEDUMP  DD SYSOUT=*,OUTPUT=(*.OUTDSC)
//SYSMDUMP DD SYSOUT=*
Now set the following environment variables in your httpd.envvars file:
• IBM_XE_COE_NAME=java/lang/OutOfMemoryError (requires the SDK to be at the SR14
  level or later)
• IBM_JAVA_ZOS_TDUMP=NO (this tells the web server to take an svcdump, not a
  transaction dump, as the result of the OutOfMemory condition)
To manually take a dump of the web server address space, run the following console
commands while the web server is running:
DUMP COMM=(Descriptive name for this Webserver dump)
R rn,SDATA=(CSA,SQA,RGN,TRT,GRSQ,LPA,LSQA,SUM,NUC,PSA),CONT
R rn,JOBNAME=(server_name),CONT
R rn,DSPNAME=('OMVS'.*),END
where rn is the reply number shown on the console for the outstanding DUMP prompt, and
server_name is the name of your web server. This writes the dump to the dump data set
you have predefined as part of your OS/390 setup.
Working with the output
After the svcdump is taken, copy it to a location where you can execute classes from
svcdump.jar. You must have a JRE installed, either on a workstation or on z/OS UNIX
System Services.
The extraction of heap data from the svcdump comprises two steps:
1. Converting the data, which is done with the Convert program and produces a
   .phd file.
2. Displaying the data, which is done with the PrintRoots or PrintComponents
   programs.
The command line syntax for running these two steps is the following:
java -classpath full_path_to_svcdump.jar com.ibm.jvm.svcdump.Convert full_path_to_svcdump
java -classpath full_path_to_svcdump.jar com.ibm.jvm.svcdump.PrintRoots full_path_to_.phd_file
Example:
java -classpath c:\svcdump.jar com.ibm.jvm.svcdump.Convert c:\outofmemory.svc.dump
java -classpath c:\svcdump.jar com.ibm.jvm.svcdump.PrintRoots c:\dump.phd
Here is some sample output from the PrintComponents module of FindRoots:
1877294 0x2b9c0e80 com/ibm/servlet/classloader/DynamicClassLoader
 1255299 0x2b9c1558 com/ibm/servlet/classloader/SystemClassLoader
  1255287 0x2b9c1638 sun/misc/Launcher$AppClassLoader
   5683 0x2b9c16a0 sun/misc/Launcher$ExtClassLoader
    5631 0x2b9f5b68 sun/misc/URLClassPath
     5587 0x2b9f5ac0 java/util/ArrayList
      5586 0x2ba4e0f8 array of java/lang/Object
       5585 0x2b9f82b8 sun/misc/URLClassPath$JarLoader
   468 0x2b9ead08 java/util/Hashtable
    467 0x33b89fc8 array of java/util/Hashtable$Entry
   1219496 0x2b9eac78 java/util/Vector
    1219495 0x33d41910 array of java/lang/Object
     1151895 0x2b9709d0 class com/wily/introscope/agent/blame/ComponentTracer
      1151892 0x2d25b6d8 com/wily/introscope/agent/stat/DataAccumulatorFactory
       1150580 0x2d13f4b0 com/wily/introscope/agent/enterprise/EnterpriseAgent
The number on the left of each line is the reachability for that object. It will be either the
number of objects or the size of memory occupied, depending on whether the
findroots.sizematters property is set. Reachability is defined as the number of
objects (or size occupied by them) that are reachable from a given object by following
reference links. The second field is the address of the object in the heap, and the third
field is the class name.
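To make the definition concrete, the following minimal sketch computes reachability over a toy object graph. It is not FindRoots code: the class name ReachabilitySketch and the map-based heap model (object address to referenced addresses, and object address to size in bytes) are assumptions made for illustration. In this sketch the starting object itself is counted, an object reachable along several paths is counted once, and the sizeMatters flag mirrors the choice made by the findroots.sizematters property.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical heap model: refs maps an object address to the addresses it
// references; sizes maps an object address to its size in bytes.
class ReachabilitySketch {

    // Returns how many distinct objects are reachable from root (root included),
    // or their total size in bytes when sizeMatters is true.
    static long reachability(long root,
                             Map<Long, List<Long>> refs,
                             Map<Long, Long> sizes,
                             boolean sizeMatters) {
        Set<Long> visited = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        work.push(root);
        long total = 0;
        while (!work.isEmpty()) {
            long obj = work.pop();
            if (!visited.add(obj)) {
                continue; // already counted through another reference path
            }
            total += sizeMatters ? sizes.getOrDefault(obj, 0L) : 1L;
            for (long child : refs.getOrDefault(obj, Collections.emptyList())) {
                work.push(child);
            }
        }
        return total;
    }
}
```

The visited set is what keeps reference cycles, which are common in real heaps, from being counted more than once.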
Child objects are indented. In the above output, DynamicClassLoader has a reference to
SystemClassLoader, so SystemClassLoader is its child and is indented beneath it. Similarly,
SystemClassLoader points to Launcher$AppClassLoader, which in turn has three
children: Launcher$ExtClassLoader, Hashtable and Vector.
In order to avoid clutter in the output, objects with a small reachability (less than 1000)
are pruned and do not display. In addition, objects are only printed once, so if the same
object is pointed to by more than one parent, it will appear only the first time.
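The pruning and print-once rules can be sketched as a depth-first walk that skips low-reachability objects and remembers which objects it has already printed. This is an illustration under assumptions, not the FindRoots implementation: the class PrunedTreePrinter is hypothetical, reachability values are assumed to be precomputed, and the prune and maxDepth parameters merely play the roles of the findroots.prune and findroots.depth properties described later.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the pruning and print-once rules; not FindRoots code.
class PrunedTreePrinter {
    private final Map<Long, List<Long>> refs;  // object -> objects it references
    private final Map<Long, Long> reach;       // object -> precomputed reachability
    private final long prune;                  // role of findroots.prune (e.g. 1000)
    private final int maxDepth;                // role of findroots.depth (e.g. 1000)
    private final Set<Long> printed = new HashSet<>();
    private final StringBuilder out = new StringBuilder();

    PrunedTreePrinter(Map<Long, List<Long>> refs, Map<Long, Long> reach,
                      long prune, int maxDepth) {
        this.refs = refs;
        this.reach = reach;
        this.prune = prune;
        this.maxDepth = maxDepth;
    }

    String print(long root) {
        visit(root, 0);
        return out.toString();
    }

    private void visit(long obj, int depth) {
        long r = reach.getOrDefault(obj, 0L);
        // Skip small subtrees, overly deep levels, and objects already printed
        // under an earlier parent.
        if (r < prune || depth > maxDepth || !printed.add(obj)) {
            return;
        }
        for (int i = 0; i < depth; i++) {
            out.append(' ');  // one space of indentation per level
        }
        out.append(r).append(" 0x").append(Long.toHexString(obj)).append('\n');
        for (long child : refs.getOrDefault(obj, Collections.emptyList())) {
            visit(child, depth + 1);
        }
    }
}
```

Because print-once is decided in visiting order, an object shared by two parents appears under whichever parent is reached first.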
FindRoots commands and options
The following sections describe FindRoots commands.
The Convert command
The first stage of analysis is to create a .phd file by entering the following command:
java -classpath svcdump.jar com.ibm.jvm.findroots.Convert filename
In some situations, there can be multiple copies of the JVM in the dump. If this happens,
you will get a message asking you to specify which JVM to use. Specify the index of the
JVM as follows:
java -classpath svcdump.jar com.ibm.jvm.findroots.Convert -jvm index filename
The Print command
The Print command prints the contents of a .phd file. Use as follows:
java -classpath svcdump.jar com.ibm.jvm.findroots.Print filename
By combining this command with traditional Unix filters such as sort and uniq, you can
do things like calculate which class has the most objects. For example, if you redirect the
Print output to a file named output, the following command:
cut -d ' ' -f 3 output | sort | uniq -c | sort -n -r
cuts the third field out of each line of output (using a space as the field separator), pipes
the result into sort, and pipes that into uniq -c, which counts how many times each
repeated line occurs. Finally, it pipes that output into sort again to sort numerically in
reverse order of the count, so the class with the most objects comes first.
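If Unix filters are not at hand (for example, on a Windows workstation), the same count can be sketched in Java. The class ClassHistogram is hypothetical, and it assumes, as the cut command does, that the class name is the third space-separated field of each line of the Print output. As with cut, a class name that itself contains spaces (such as array of java/lang/Object) is reduced to its first word; lines with fewer than three fields are skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough Java equivalent of: cut -d ' ' -f 3 output | sort | uniq -c | sort -n -r
class ClassHistogram {

    static List<Map.Entry<String, Integer>> histogram(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(" ", -1); // every space is a separator, as with cut
            if (fields.length >= 3) {
                counts.merge(fields[2], 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first, like sort -n -r.
        sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return sorted;
    }

    static List<Map.Entry<String, Integer>> histogram(Path printOutput) throws IOException {
        return histogram(Files.readAllLines(printOutput));
    }
}
```

For example, ClassHistogram.histogram(java.nio.file.Paths.get("output")) would return the per-class counts for the redirected Print output, most frequent first.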
The PrintRoots command
PrintRoots performs an analysis of a .phd file. Use as follows:
java -classpath svcdump.jar com.ibm.jvm.findroots.PrintRoots filename
You can also specify the maximum depth with the findroots.depth property. This
property specifies the maximum number of levels of indentation. The default is 1000. If
you make it too big, you can get very large output.
The PrintComponents command
David Griffiths, the author of these tools, recommends using this command rather than
PrintRoots. It does essentially the same thing as PrintRoots, except that it collapses each
strong component (a strongly connected group of objects, in which every object can reach
every other object by following references) into a single entry, which produces much less
cluttered output. The contents of the strong components are enumerated at the end.
Here is some sample output:
30035 0x76ab 0x144a7318 class java/lang/ref/Finalizer
29983 0x76a9 #30377 51
29409 0x73df #29663 85
20164 0x73dd 0x14bbf430 sun/misc/URLClassPath
20150 0x73dc 0x14bbe160 java/util/ArrayList
20149 0x73db 0x14bbe288 array of java/lang/Object
20135 0x73da #29658 14
12238 0x6336 #25398 2
12233 0x6335 0x14d33708 java/util/jar/JarVerifier
12221 0x6329 0x14ca1b18 java/util/jar/Manifest
12211 0x6328 0x14ca1a38 java/util/HashMap
12210 0x6327 0x14d46280 array of java/util/HashMap$Entry
12221 0x6329 0x14ca1b18 java/util/jar/Manifest (already visited)
Where a strong component has been collapsed, #nnnn is printed (where nnnn is the
component number) followed by the number of objects in that component. The actual
contents of the components are printed at the end. For example:
*** #25398 ***
0x14c9d980 java/util/jar/JarFile$JarFileEntry
0x14ca1d10 java/util/jar/JarFile
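Strong components in this sense are the strongly connected components of the reference graph: groups of objects in which every object can reach every other. The sketch below labels each object with a component number using Tarjan's algorithm, one standard way to find them; we do not know which algorithm FindRoots actually uses, and the class StrongComponents and its map-based heap model are hypothetical. The recursion keeps the sketch short, but a real heap with millions of objects would need an iterative version to avoid a StackOverflowError.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Tarjan's algorithm over a hypothetical map-based heap model.
class StrongComponents {
    private final Map<Long, List<Long>> refs;          // object -> referenced objects
    private final Map<Long, Integer> index = new HashMap<>();
    private final Map<Long, Integer> low = new HashMap<>();
    private final Deque<Long> stack = new ArrayDeque<>();
    private final Set<Long> onStack = new HashSet<>();
    private final Map<Long, Integer> component = new HashMap<>();
    private int nextIndex = 0;
    private int nextComponent = 0;

    StrongComponents(Map<Long, List<Long>> refs) {
        this.refs = refs;
    }

    // Returns object -> component number for every object reachable from nodes.
    Map<Long, Integer> compute(Iterable<Long> nodes) {
        for (long n : nodes) {
            if (!index.containsKey(n)) {
                strongConnect(n);
            }
        }
        return component;
    }

    private void strongConnect(long v) {
        index.put(v, nextIndex);
        low.put(v, nextIndex);
        nextIndex++;
        stack.push(v);
        onStack.add(v);
        for (long w : refs.getOrDefault(v, Collections.emptyList())) {
            if (!index.containsKey(w)) {
                strongConnect(w);
                low.put(v, Math.min(low.get(v), low.get(w)));
            } else if (onStack.contains(w)) {
                low.put(v, Math.min(low.get(v), index.get(w)));
            }
        }
        if (low.get(v).equals(index.get(v))) {
            // v is the root of a component: pop all of its members.
            long w;
            do {
                w = stack.pop();
                onStack.remove(w);
                component.put(w, nextComponent);
            } while (w != v);
            nextComponent++;
        }
    }

    // Objects per component; components with more than one object are the
    // candidates for collapsing into a single entry.
    static Map<Integer, Integer> sizes(Map<Long, Integer> component) {
        Map<Integer, Integer> sizes = new HashMap<>();
        for (int c : component.values()) {
            sizes.merge(c, 1, Integer::sum);
        }
        return sizes;
    }
}
```

A component with more than one object is the kind of group that PrintComponents would presumably show as #nnnn followed by its object count.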
The PrintTree command
PrintTree prints the tree below a given object. This approach short-circuits the usual
FindRoots analysis, so you need to already know the ID of the object you're interested in.
It's useful for iterative approaches; for example, if you discover that the depth wasn't big
enough and you don't want to start all over again.
If you specify -inverse, the inverse of the tree is printed. That is, it prints the tree above
the given object, starting with its parents, then their parents, and so on.
java -classpath svcdump.jar com.ibm.jvm.findroots.PrintTree -root id filename
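The -inverse option walks the reference graph backwards. A minimal sketch of that idea, assuming a hypothetical map-based heap model (this is not the PrintTree implementation), first builds a child-to-parents map and then lists the starting object, its parents, their parents, and so on, indenting one space per level and skipping objects it has already listed. The class InverseTree is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an inverse tree walk; not the PrintTree implementation.
class InverseTree {

    // Turns parent -> children references into child -> parents.
    static Map<Long, List<Long>> invert(Map<Long, List<Long>> refs) {
        Map<Long, List<Long>> parents = new HashMap<>();
        for (Map.Entry<Long, List<Long>> e : refs.entrySet()) {
            for (Long child : e.getValue()) {
                parents.computeIfAbsent(child, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return parents;
    }

    // Lists start, then its parents, their parents, and so on, one space per level.
    static List<String> ancestors(long start, Map<Long, List<Long>> refs) {
        List<String> lines = new ArrayList<>();
        walk(start, 0, invert(refs), new HashSet<>(), lines);
        return lines;
    }

    private static void walk(long obj, int depth, Map<Long, List<Long>> parents,
                             Set<Long> seen, List<String> lines) {
        if (!seen.add(obj)) {
            return; // already listed; this also stops reference cycles
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append(' ');
        }
        lines.add(line.append("0x").append(Long.toHexString(obj)).toString());
        for (long parent : parents.getOrDefault(obj, Collections.emptyList())) {
            walk(parent, depth + 1, parents, seen, lines);
        }
    }
}
```

Building the inverse map costs one pass over all references, which is why it pays to do it once rather than searching for parents object by object.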
The PrintClassReach command
The PrintClassReach command prints the space reachable by all objects of the given
class. Use as follows:
java -classpath svcdump.jar com.ibm.jvm.findroots.PrintClassReach class classname filename
Use this tool when you suspect that many objects of a given class are causing the problem
rather than one particular object tree. Note that classname should be in directory style; for
example: java/lang/Object.
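The text does not say exactly how the per-class figure is computed. The following sketch assumes it is the number of distinct objects reachable from any instance of the class, with an object shared by several instances counted once. The class ClassReach and its inputs (a map from object address to its directory-style class name, and a map from object to referenced objects) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: how many distinct objects are reachable from all
// instances of one class, each shared object counted once.
class ClassReach {

    static int classReach(String className,
                          Map<Long, String> classOf,        // object -> class name
                          Map<Long, List<Long>> refs) {     // object -> referenced objects
        Deque<Long> work = new ArrayDeque<>();
        for (Map.Entry<Long, String> e : classOf.entrySet()) {
            if (e.getValue().equals(className)) {
                work.push(e.getKey());  // start from every instance of the class
            }
        }
        Set<Long> visited = new HashSet<>();
        while (!work.isEmpty()) {
            long obj = work.pop();
            if (visited.add(obj)) {
                for (long child : refs.getOrDefault(obj, Collections.emptyList())) {
                    work.push(child);
                }
            }
        }
        return visited.size();
    }
}
```

Starting the walk from all instances at once is what distinguishes this from summing the reachability of each instance separately, which would double-count shared objects.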
Properties for the FindRoots commands
All of the Print commands share a set of standard properties. Default values for these
properties can be specified in a properties file called .svcdumprc in the user's home
directory, or they can be specified on the command line as system properties with -D.
Note that -D is an option of the java command, not of the Print command, so it must
appear before the Print class name. For example:
java -Dfindroots.depth=2000 -classpath svcdump.jar com.ibm.jvm.findroots.PrintRoots filename
The properties, their default values, and their meanings are as follows:
findroots.depth (default: 1000)
    Controls how deep to print the output tree, in terms of the number of levels of
    indentation.
findroots.prune (default: 1000 or 10000)
    Objects with reachability smaller than this amount are not printed. This avoids
    cluttering up the output with millions of lines for objects that don't contribute much
    to the picture. The default depends on whether sizematters is used; the larger value
    applies when sizematters=true.
findroots.sizematters (default: false)
    Controls whether reachability is based on the cumulative size of each object or on
    the total number of individual objects. Although it sounds like a useful thing to have
    switched on, we strongly recommend not setting it. In the vast majority of cases, size
    doesn't really matter: the heart of the problem is almost always the sheer number of
    objects being kept alive. Switching on sizematters will tell you how much actual space
    is involved, but it won't change the basic structure of the tree. The main reason for
    leaving it off is that it takes much longer to calculate. If you notice that the tree takes
    too long to be printed, rerun the command without sizematters.
findroots.maxroots (default: 20)
    Controls how many roots are output, where appropriate.
findroots.exact (default: false)
    Controls whether reachability is based on the cumulative size of each strong
    component rather than on the much quicker approach of treating each component as
    one element. As with sizematters, turning this on produces much longer running
    times, and in most cases it shouldn't be necessary. With it switched off, each strong
    component is effectively treated as one node.
findroots.uselessmemory (default: false)
    FindRoots commands can sometimes require a great deal of memory to run. Setting
    this property forces the use of a different algorithm to calculate reachability, which is
    slower but uses less memory. One drawback of this algorithm is that the counts for
    everything except the root itself are based on a single depth-first search of the tree,
    and so are less accurate.
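The text does not spell out how the .svcdumprc file and -D options interact. The following sketch assumes that .svcdumprc uses the standard java.util.Properties key=value format and that -D system properties override values from the file, which in turn override the built-in defaults listed above. The class FindRootsSettings is hypothetical and is not part of svcdump.jar; it also derives the findroots.prune default from the resolved findroots.sizematters value.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch of property resolution; not part of svcdump.jar.
class FindRootsSettings {

    // Assumed precedence: built-in defaults < .svcdumprc < -D system properties.
    static Properties resolve(Properties rcFile, Properties system) {
        Properties result = new Properties();
        result.setProperty("findroots.depth", "1000");
        result.setProperty("findroots.sizematters", "false");
        result.setProperty("findroots.maxroots", "20");
        result.setProperty("findroots.exact", "false");
        result.setProperty("findroots.uselessmemory", "false");
        for (String key : rcFile.stringPropertyNames()) {
            result.setProperty(key, rcFile.getProperty(key));
        }
        for (String key : system.stringPropertyNames()) {
            if (key.startsWith("findroots.")) {
                result.setProperty(key, system.getProperty(key));
            }
        }
        // The prune default depends on sizematters: 10000 when true, 1000 otherwise.
        if (result.getProperty("findroots.prune") == null) {
            boolean sizeMatters = Boolean.parseBoolean(result.getProperty("findroots.sizematters"));
            result.setProperty("findroots.prune", sizeMatters ? "10000" : "1000");
        }
        return result;
    }

    // Reads ~/.svcdumprc if it exists, then applies -D overrides from this JVM.
    static Properties load() throws IOException {
        Properties rc = new Properties();
        Path file = Paths.get(System.getProperty("user.home"), ".svcdumprc");
        if (Files.isReadable(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                rc.load(in);
            }
        }
        return resolve(rc, System.getProperties());
    }
}
```

Keeping resolve separate from load makes the precedence rules easy to check without touching the real home directory.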
The svcdump analyzer tool for z/OS
The svcdump analyzer is a tool for analyzing z/OS svcdumps with particular emphasis on
the Java component. You can use it instead of IPCS for obtaining tracebacks from a
dump. It has the advantage of printing Java method names, among other things.
svcdump options
Start svcdump as follows:
java -classpath svcdump.jar com.ibm.jvm.svcdump.Dump [options] filename
where options are:
-cache            print the alloc cache
-dis addr n       disassemble n instructions starting at addr (hex)
-dump addr n      dump n words of storage starting at addr (hex)
-dumpclasses      dump all the classes and their methods
-dumpclass addr   dump the class at addr
-dumpprops        dump all the system properties
-exception        print old exceptions
-heap             print a summary of objects in the heap
-systrace         print the system trace table
-rn               include saved register n in the stack trace
-verbose          print extra info
-debug            print debug info
These options are described in more detail below.
-cache
Prints the Java alloc cache. This can give an idea of the most recently allocated
objects, which will still be in the cache.
Sample output:
alloc cache info:
cache_busy = 0x0
cache_size = 0x7bbc
cache_block = 0x14208ef0
cache_orig_size = 0x10004
14210aac: len = 20 methods = efd7ee0 flags = 0 class = java/lang/String
14210acc: len = 70 methods = 32 flags = 2a
14210b3c: len = 68 methods = 2c flags = 2a
14210ba4: len = 20 methods = efd7ee0 flags = 0 class = java/lang/String
14210bc4: len = 20 methods = 110c4d00 flags = 0 class = java/lang/ref/Finalizer
14210be4: len = 20 methods = 110c7560 flags = 0 class = java/lang/ClassLoader$NativeLibrary
14210c04: len = 20 methods = efd7ee0 flags = 0 class = java/lang/String
14210c24: len = 50 methods = 1f flags = 2a
14210c74: len = 20 methods = efd7ee0 flags = 0 class = java/lang/String
-exception
Forces the printing of any leftover exception objects. (Pending exceptions are
printed by default.) For each thread, this gives an indication of the last
exception that was thrown, if any. The exception class and the detail message are printed.
Sample output:
found old exception: java/lang/NoSuchMethodError: setInternalError
-dis addr n
Disassembles n instructions starting at the hex address addr. The disassembler is
not complete, but it knows about the most common instructions. When it comes
across an instruction it doesn't understand, it will throw an exception and exit. This
is usually because it has gone past the end of the function.
Sample output:
Disassembly starting at 0x11a90c48
0x11a90c48: (0x00000000): B     x'22'($r15)
0x11a90c6a: (0x00000022): STM   $r14,$r11,x'c'($r13)
0x11a90c6e: (0x00000026): L     $r14, x'4c'($r13)
0x11a90c72: (0x0000002a): LA    $r0, x'450'($r14)
0x11a90c76: (0x0000002e): CL    $r0, x'314'($r12)
0x11a90c7a: (0x00000032): LA    $r3, x'3a'($r15)
0x11a90c7e: (0x00000036): BGT   x'14'($r15)
0x11a90c82: (0x0000003a): L     $r15, x'280'($r12)
0x11a90c86: (0x0000003e): STM   $r15,$r0,x'48'($r14)
0x11a90c8a: (0x00000042): MVI   x'0'($r14), x'10'
0x11a90c8e: (0x00000046): ST    $r13, x'4'($r14)
0x11a90c92: (0x0000004a): LR    $r13, $r14
0x11a90c94: (0x0000004c): L     $r4, x'1f4'($r12)
0x11a90c98: (0x00000050): L     $r5, x'7ae'($r3)
0x11a90c9c: (0x00000054): LA    $r2, x'c4'($r13)
0x11a90ca0: (0x00000058): LA    $r1, x'98'($r13)
0x11a90ca4: (0x0000005c): ST    $r2, x'98'($r13)
0x11a90ca8: (0x00000060): L     $r14, x'170'($r5,$r4)
0x11a90cac: (0x00000064): LM    $r15,$r0,x'8'($r14)
0x11a90cb0: (0x00000068): ST    $r0, x'1f4'($r12)
0x11a90cb4: (0x0000006c): BALR  $r14, $r15
0x11a90cb6: (0x0000006e): L     $r7, x'7b2'($r3)
0x11a90cba: (0x00000072): L     $r6, x'174'($r5,$r4)
0x11a90cbe: (0x00000076): LA    $r0, x'f4'($r13)
0x11a90cc2: (0x0000007a): ST    $r0, x'ac'($r13)
0x11a90cc6: (0x0000007e): LA    $r1, x'407'($r7)
0x11a90cca: (0x00000082): LA    $r14, x'fc'($r13)
0x11a90cce: (0x00000086): LA    $r10, x'234'($r13)
0x11a90cd2: (0x0000008a): ST    $r1, x'9c'($r13)
0x11a90cd6: (0x0000008e): LA    $r11, x'41c'($r7)
0x11a90cda: (0x00000092): LM    $r15,$r0,x'8'($r6)
0x11a90cde: (0x00000096): ST    $r10, x'98'($r13)
0x11a90ce2: (0x0000009a): ST    $r11, x'438'($r13)
0x11a90ce6: (0x0000009e): ST    $r11, x'a0'($r13)
0x11a90cea: (0x000000a2): ST    $r2, x'a4'($r13)
0x11a90cee: (0x000000a6): LA    $r1, x'98'($r13)
0x11a90cf2: (0x000000aa): ST    $r14, x'a8'($r13)
0x11a90cf6: (0x000000ae): ST    $r0, x'1f4'($r12)
0x11a90cfa: (0x000000b2): BALR  $r14, $r15
-dump addr n
Dumps n words of storage starting at the hex address addr. Note that there is
currently no way to specify the address space to use; it defaults to the first Java
address space.
If n is zero, the storage at addr is treated as an ASCII string instead.
-rn
Includes saved register n in the stack trace. Note that this assumes that the register has
actually been saved in the DSA.
Sample output:
found Usta TCB a7e288 tid 1ac30e20 caa 109f5120
Dsa       Entry     Offset    r12       Function
--------  --------  --------  --------  --------------------
10a06d90  11aa58d8  fff8909a  109f5120  SYSTDUMP
10a06940  11a90c48  00000414  109f5120  ThreadUtils_CoreDump
10a06830  11a734b0  000004b2  0ef473a0  userSignalHandler
10a06780  11a73a18  000000b8  40404040  intrDispatch
10a066c8  061596b8  000000c4  40404040  @@GETFN
10a06068  0628af48  0000075e  109f5120  __zerros
10a034d0  00000008  0638260e  109f5120  null
10a02a70  11a566d0  fefac3f8  00000000  CompareAndSwap_Impl
10a029c0  119223f0  000000a2  109f5120  pin_object
10a02918  11740c40  00000114  109f5120  jni_GetPrimitiveArrayElements
10a02710  1af1a388  000000b2  109f5120  MVS_CcicsInit
10a02660  1af143b8  000000b0  109f5120  Java_com_ibm_ctg_server_ServerECIRequest_CcicsInit
10a025a8  06419078  0000005c  109f5120  CEEPGTFN
10a02118  119a20d0  00000138  109f5120  MMIPSJNI
10a02038  1199fd48  000003ce  109f5120  mmisInvokeJniMethodHelper (com/ibm/ctg/server/ServerECIRequest.CcicsInit)
10a01f68  1198e3f8  00000100  109f5120  mmipInvokeJniMethod (com/ibm/ctg/server/ServerECIRequest.CcicsInit)
Description of output
If no options are specified, the default output consists of a listing of all the TCBs in every
valid address space.
For each TCB, the traceback is listed, as well as the trace table, if any. The trace table is
obtained from the System Trace entries for that TCB. Each address found is converted
to the closest function entry point, and the trace lists the function names found, ordered
by how often each function appears, most frequent first. Often you will see "unknown
function (address)", which means that the function name for that address could not be
determined, possibly because it is in the kernel. Still, it gives some indication of what the
TCB has been doing recently.
The environment variables and the DLL table for this address space are also printed out.
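The description leaves the address-mapping step abstract. One way to picture it, assuming a symbol table of function entry points sorted by address, is to take for each traced address the nearest entry point at or below it (an address inside a function lies after that function's entry point) and then rank the names by frequency. The class TraceSummary and its inputs are hypothetical; the real tool reports "unknown function (address)" when it cannot resolve an address, which this sketch does only when no entry point lies below the address.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of summarizing trace-table addresses by function.
class TraceSummary {

    // entryPoints: function entry address -> function name (an assumed symbol table).
    // Returns function names ordered by how many trace addresses fall in them.
    static List<String> summarize(TreeMap<Long, String> entryPoints, long[] traceAddresses) {
        Map<String, Integer> hits = new HashMap<>();
        for (long addr : traceAddresses) {
            // Nearest entry point at or below the address.
            Map.Entry<Long, String> entry = entryPoints.floorEntry(addr);
            String name = (entry != null)
                    ? entry.getValue()
                    : "unknown function (" + Long.toHexString(addr) + ")";
            hits.merge(name, 1, Integer::sum);
        }
        List<String> names = new ArrayList<>(hits.keySet());
        names.sort((a, b) -> Integer.compare(hits.get(b), hits.get(a)));  // most frequent first
        return names;
    }
}
```

A TreeMap keeps the entry points sorted, so each lookup is logarithmic in the number of functions rather than a linear scan.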
Examples
Following are some examples of memory leaks and how they were diagnosed and
resolved.
Memory leak in a J2EE application
Symptoms
A J2EE application was consuming more and more storage, apparently the result of a
memory leak.
Clues
First, we let the application run for a while, driven by a stress test tool, and watched the
heap grow. Then we took a console svcdump of the server region (see "Setting up the
environment") and extracted the heap information with the following command:
java -classpath svcdump.jar com.ibm.jvm.svcdump.Dump -heap /mydir/SV00204.dump > /mydir/SV00204.summ
The .summ file is a text file. We looked at the last section of this file, titled "dump of live
objects in the heap." Here is the beginning of this section:
*** dump of live objects in the heap ***
count     class
-------   -----
1095288   array of references
558977    array of boolean
555852    array of byte
279457    com/ibm/ejs/container/DispatchContextImpl
279457    com/ibm/ejs/container/EJSDeployedSupport
279457    com/ibm/ejs/csi/TxCookieImpl
279457    com/ibm/ejs/container/RemoteExceptionMappingStrategy
257722    java/util/HashMap
Resolution
Here, we saw that the com/ibm/ejs/ objects had an abnormally high count, so we
examined the code to find why they weren't reclaimed. It turned out that the code kept
references to these objects in several different places, which prevented them from being
garbage-collected. Problem solved.
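The underlying rule is that an object stays alive as long as any single reference to it remains. The hypothetical sketch below (RequestRegistry and its two collections are invented for illustration, not taken from the application in this example) registers each per-request object in two places. Releasing it from only one of them leaves it reachable; only the version that removes both references lets the garbage collector reclaim it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: each per-request object is referenced from two places.
class RequestRegistry {
    static final List<Object> history = new ArrayList<>();   // reference #1
    static final Map<Long, Object> active = new HashMap<>(); // reference #2

    static Object begin(long requestId) {
        Object context = new Object();  // stands in for per-request container objects
        history.add(context);
        active.put(requestId, context);
        return context;
    }

    // Leaky version: drops one reference, but history still holds the object.
    static void endLeaky(long requestId) {
        active.remove(requestId);
    }

    // Fixed version: drops every reference so the object can be garbage collected.
    static void end(long requestId) {
        Object context = active.remove(requestId);
        if (context != null) {
            history.remove(context);
        }
    }
}
```

In a heap summary, a leak of this shape shows up as several classes whose instance counts grow in step, since every request leaves behind the same set of objects.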
WebSphere Application Server storage footprint growing
Symptoms
A WebSphere customer complained of an Application Server footprint that got larger and
larger with time. Upon investigating the problem, we discovered that this customer used
an application in which client web requests caused XML documents to be submitted to
the application server. The application server parsed and processed the documents.
Clues
A Java heapdump of the Application Server JVM showed the dominant object in memory
to be an org.w3c.dom.Document object.
Resolution
We discussed this observation with the customer's application architect and found out that
each request was handled on the Application Server by creating a new thread, an instance
of a class that subclassed java.lang.Thread. This class also contained instance data of type
org.w3c.dom.Document.
When we navigated the heapdump to find what was holding these objects, we found
that the WebSphere security component had a hash table that associated threads with their
corresponding credentials. The Thread object itself was used as the key in the table.
Because WebSphere uses a thread pool model with a constant number of threads, the same
Thread objects were reused as keys, so this had never appeared as a leak, even though the
entries in this table never got deleted. Since the customer application started a new Thread
for every request, it amplified the problem. In this case, the table held the customer's
subclass of the Thread class, which in turn held the org.w3c.dom.Document objects that
were submitted with each request.
In summary, a hash table kept accumulating references to thread objects, which in turn
held references to large objects.
The resolution was a modification to the WebSphere base code to remove entries from the
hash table when the thread was no longer needed.
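The following hypothetical sketch reproduces the shape of this problem; it is not the WebSphere security code. A static Hashtable keyed by Thread gains one entry per thread that registers a credential. With a fixed thread pool the same Thread objects are reused as keys and the table stays bounded, but when every request runs on a brand-new Thread, each request adds an entry that is never removed unless the thread's entry is explicitly cleaned up. ThreadCredentials, runRequests and the cleanUp flag are invented names.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical stand-in for a table that associates threads with credentials.
class ThreadCredentials {
    private static final Map<Thread, String> table = new Hashtable<>();

    static void associate(String credential) {
        table.put(Thread.currentThread(), credential);  // the Thread object is the key
    }

    // The fix: drop the entry once the thread no longer needs it.
    static void release() {
        table.remove(Thread.currentThread());
    }

    static int size() {
        return table.size();
    }

    static void clear() {
        table.clear();
    }

    // Simulates the customer's pattern: a brand-new Thread for every request.
    static void runRequests(int requests, boolean cleanUp) {
        for (int i = 0; i < requests; i++) {
            Thread t = new Thread(() -> {
                associate("credential");
                if (cleanUp) {
                    release();
                }
            });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Without cleanup, each finished Thread stays reachable as a table key, and with it any fields of a Thread subclass, such as the Document objects in this example.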
Sporadic application server crash
Symptoms
A customer complained that their application server would crash every so often. The
customer suspected a memory leak in their environment. Further analysis of the problem
revealed that all the server crashes had been occurring during a two-hour peak period. We
asked the customer to provide their JVM settings and verbosegc output for initial analysis.
We found that the customer had set the JVM -Xms (initial heap size) to 128M and -Xmx
(maximum heap size) to 256M. Based on the verbosegc output, it seemed that memory
was being allocated and recovered regularly in the Java heap, except during peak hours
when the load was at its maximum. During peak hours, the heap would continue to grow
until the JVM threw java.lang.OutOfMemoryError errors and became unresponsive.
Because the problem was occurring in production, the customer could not attach a
profiler to debug the problem, due to the performance overhead.
Clues
The problem occurred during high load times. The same transactions at off-peak hours
executed normally. Problems did not occur in the test system, but only showed up in
production.
Resolution
The clues pointed to a resource problem. We changed the -Xmx value to 512M to see the
net effect. The next day during peak hours, the heap grew as large as 420M but was
recovered after some time. This seemed to fix the OutOfMemory errors and server
crashes.
The question still remained: why didn’t the customer experience this problem in their
stress test environment? After tuning the system, we generated several heapdumps during
peak hours. We discovered that during peak hours, as all the users logged in, lots of
session objects were created in a very short time. Since these users did not typically log
out or close the browsers, the sessions would not get invalidated until after they expired.
The problem did not occur in the test system because the simulated users of the
load-generating tool always logged out of their sessions. To fix this problem, the session
expiration time was shortened from 30 minutes to 18 minutes.
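A back-of-the-envelope model shows why shortening the timeout helps. Assuming users log in at a roughly steady rate during the peak and never log out, each session stays in memory for about the timeout period, so the number of live sessions is roughly the login rate multiplied by the timeout (Little's law). Under that assumption, going from 30 to 18 minutes cuts the live session count, and the heap the sessions occupy, by about 40%. The class SessionEstimate and the rate used below are hypothetical, not measurements from this customer.

```java
// Hypothetical back-of-the-envelope model (Little's law); not measured data.
class SessionEstimate {

    // Live sessions ~= login rate * how long each session stays alive.
    static double liveSessions(double loginsPerMinute, double timeoutMinutes) {
        return loginsPerMinute * timeoutMinutes;
    }

    // Fraction of live sessions that remain after changing the timeout.
    static double remainingFraction(double oldTimeoutMinutes, double newTimeoutMinutes) {
        return newTimeoutMinutes / oldTimeoutMinutes;
    }
}
```

For example, at a hypothetical 100 logins per minute, the model gives about 3,000 live sessions with a 30-minute timeout and about 1,800 with an 18-minute timeout.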
False leak
A common false memory leak is caused by turning on WebSphere Ring Buffer Trace, in which
trace output goes to memory and is dumped to a file manually. This is a false memory leak because,
while it uses a large amount of memory, the amount is predictable and controllable. If it is not
accounted for in the Java heap size, it can cause an OutOfMemoryError to occur, hence
the false impression of a leak. A heapdump analysis will show an excessive amount of
memory associated with com.ibm.ejs.ras.TraceEvent objects.
To estimate the amount of memory needed by Ring Buffer Trace, multiply the total
number of trace lines configured by 200 bytes. For example, for 200,000 trace lines, the
estimated amount would be 200,000 x 200 = 40,000,000 bytes (40 MB). This means one million
lines would require 200 MB of heap memory! Clearly, a user who sets a detailed trace or a
trace with a long elapsed time can find this facility using much more memory than anticipated.
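The estimate above is simple multiplication, sketched here so it can be checked against a planned -Xmx setting. RingBufferEstimate and headroomBytes are hypothetical helpers; the 200-bytes-per-line figure is the rule of thumb from the text, and the headroom calculation simply subtracts the trace estimate from the maximum heap size, leaving the application's own objects to fit in what remains.

```java
// Rule of thumb from the text: about 200 bytes of heap per configured trace line.
class RingBufferEstimate {
    static final long BYTES_PER_TRACE_LINE = 200;

    static long estimatedBytes(long traceLines) {
        return traceLines * BYTES_PER_TRACE_LINE;
    }

    // Heap left for everything else once the trace buffer is full (may be negative).
    static long headroomBytes(long maxHeapBytes, long traceLines) {
        return maxHeapBytes - estimatedBytes(traceLines);
    }
}
```

For example, with a 256 MB maximum heap (-Xmx256m, or 268,435,456 bytes), one million trace lines would leave only 68,435,456 bytes for the application itself.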
About the authors
The authors of this paper are:
Steve Eaton (IBM Austin, TX)
Steve Eaton has been part of the WebSphere Application Server technical support
team for three years.
Frederic Mora (IBM Poughkeepsie, NY)
Frederic has been involved in development for ten years and in WebSphere testing
for three years. He is now providing support to WebSphere on zSeries customers.
Hany Salem (IBM Austin, TX)
Hany Salem is the lead serviceability architect for WebSphere Application Server.
Acknowledgments
This paper benefited greatly from the help of several people. The authors would like to
extend their thanks to:
Michel Betancourt (IBM Raleigh, NC)
Michel graduated from Florida International University two years ago and has been
supporting WebSphere Application Server ever since.
Jim Cunningham (IBM Poughkeepsie, NY)
Jim is a performance analyst working on WebSphere for z/OS. He has worked on
WebSphere performance for the past five years.
Phillip Helm (IBM Raleigh, NC)
Phil is the team lead for WebSphere Application Server for z/OS Level 2. He has
been in support and service for IBM HTTP Server and WebSphere Application
Server for z/OS for 4 years.
Keith Kopycinski (IBM Poughkeepsie, NY)
Keith comes from WebSphere Development. He is a technical lead on the
WebSphere Level 2 support team and is responsible for identifying and resolving
serviceability issues for WebSphere on z/OS.
Arun Kumar (IBM Austin, TX)
Arun has been with WebSphere Development and Service organization for a
number of years. He works on enhancing Problem Determination and Serviceability
characteristics of WebSphere.
David Screen (IBM Hursley, UK)
Dave joined IBM in the Java Technology Centre Service team about 18 months
ago. He currently works in the Process Automation (Build) team. He develops
HeapRoots in his free time because getting to program some Java is fun.
Ron Verbruggen (IBM Raleigh, NC)
Ron is a WebSphere Senior Software Engineer. He specializes in WebSphere
Serviceability and Support.