New Courier Enhancement
This new courier enhancement provides a timeout feature that enables the user to gracefully
tear down any courier connection (for any given stub type, such as CSMs, DOCs, INXs, etc.)
based on a user-configured value. The enhancement makes the Osar server more resilient to
connectivity and application issues by not allowing CSMs server stub connections to remain in a
RECEIVE state, waiting for data over the network, for an extended period of time.
The timeout period can be set as a static number in the /fnsw/etc/serverConfig file or
dynamically through the new utility named "COR_set_recv_timeout". An entry is added to the
IS elog when the timeout value is set and when the timeout is triggered. The "IS Elog
Examples of Timeouts" section below contains elog samples.
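As a generic illustration of what a receive timeout does (this is not the courier implementation, which this note does not describe), the sketch below uses Python's standard socket module. The helper name recv_with_timeout is hypothetical. A value of 0 is treated as "no timeout" to mirror the convention used in this document.

```python
import socket


def recv_with_timeout(conn, timeout_secs, bufsize=4096):
    """Wait for data on conn, giving up after timeout_secs seconds.

    Returns the received bytes, or None if the wait timed out.
    A timeout of 0 means "no timeout" (block indefinitely), mirroring
    the convention this document uses for the courier setting.
    """
    conn.settimeout(timeout_secs if timeout_secs > 0 else None)
    try:
        return conn.recv(bufsize)
    except socket.timeout:
        # Nothing arrived in time; the caller can now close the connection
        # instead of waiting in a receive state forever.
        return None


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    # The "client" never sends anything, so the wait is abandoned.
    if recv_with_timeout(server_side, 0.5) is None:
        print("recv() timed out; closing connection")
    server_side.close()
    client_side.close()
```

Without the timeout, the recv() call in this sketch would block until the peer sent data or closed the connection, which is the situation the enhancement is meant to avoid.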
Set Timeout as a static number that remains persistent between IS recycles
Server stubs can be configured in the /fnsw/etc/serverConfig file with a static number that
will persist between recycles of IS software. The enhancement will look for a number in the
seventh column and use it as the timeout value if it exists.
If no number exists in the seventh column, the timeout feature is off and the original
(pre-enhancement) code path is used.
Here is an example of setting the timeout value to 90 seconds for the CSMs and BESs stubs in
the serverConfig file:
NCHs   02           2   24   1   0
CSMs   0134231040   1   24   1   0 90
DOCs   0134231041   1   24   1   0
INXs   0134231042   1   24   1   0
PRIs   0134231043   1   24   1   0
BESs   0134231044   1   16   1   0 90
Adding a timeout value to the serverConfig file is optional. The values set in this file can be
overridden by the COR_set_recv_timeout utility described below.
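The seventh-column rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the code IS uses. The function name parse_server_config is hypothetical, and the sketch assumes whitespace-separated columns with the stub name in column 1. Any other content a real serverConfig file may contain is not handled.

```python
def parse_server_config(text):
    """Map each stub name to its timeout in seconds (0 = feature off).

    Assumes whitespace-separated columns with the stub name in column 1
    and the optional timeout in column 7, as in the example above.
    """
    timeouts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # No numeric seventh column means the timeout feature is off.
        has_timeout = len(fields) >= 7 and fields[6].isdigit()
        timeouts[fields[0]] = int(fields[6]) if has_timeout else 0
    return timeouts


# Sample rows taken from the serverConfig example in this document.
SAMPLE_CONFIG = """\
NCHs 02 2 24 1 0
CSMs 0134231040 1 24 1 0 90
DOCs 0134231041 1 24 1 0
INXs 0134231042 1 24 1 0
PRIs 0134231043 1 24 1 0
BESs 0134231044 1 16 1 0 90
"""

if __name__ == "__main__":
    for stub, secs in parse_server_config(SAMPLE_CONFIG).items():
        print(stub, "off" if secs == 0 else f"{secs} secs")
```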
New Tool to Dynamically Set the Timeout
A new utility named COR_set_recv_timeout has been released with this enhancement. This
tool overrides the timeout value set in the serverConfig file. The value set by this tool takes
effect for new connections. Existing connections continue to use the old value.
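The precedence described here (a dynamically set value overrides the static one, and only new connections pick it up) can be modeled with a small sketch. The class and method names below are hypothetical and exist only to illustrate the behavior. They say nothing about how courier stores these values internally.

```python
class TimeoutSettings:
    """Hypothetical model of how the two timeout sources combine."""

    def __init__(self, static_timeouts):
        self._static = dict(static_timeouts)  # values from serverConfig
        self._dynamic = {}                    # values set at run time

    def set_dynamic(self, stub, seconds):
        """Model a run-time override for one stub type."""
        self._dynamic[stub] = seconds

    def effective(self, stub):
        """The dynamic value wins; otherwise fall back to the static one."""
        if stub in self._dynamic:
            return self._dynamic[stub]
        return self._static.get(stub, 0)  # 0 = timeout feature off


class Connection:
    """Hypothetical connection that captures its timeout when created."""

    def __init__(self, stub, settings):
        self.stub = stub
        # Read once at creation, so later changes do not affect this object.
        self.timeout = settings.effective(stub)


if __name__ == "__main__":
    settings = TimeoutSettings({"CSMs": 90})
    existing = Connection("CSMs", settings)
    settings.set_dynamic("CSMs", 120)
    new = Connection("CSMs", settings)
    print(existing.timeout, new.timeout)
```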
Usage:
COR_set_recv_timeout [Server Stub][timeout in seconds]
Usage Options with Examples:
0
    Turns off the timeout feature.
    Example:
    COR_set_recv_timeout CSMs 0
    Set courier recv timeout value for CSMs to 0 seconds.
timeout in seconds
    Sets the threshold timeout value.
    Example of setting the timeout to 120 seconds:
    COR_set_recv_timeout CSMs 120
    Set courier recv timeout value for CSMs to 120 seconds.
no parameters
    Displays the current threshold timeout.
    If run with no parameters, it displays the current value of the timeouts for
    each server stub.
    A value of zero means there is no timeout.
    Example that shows the timeout is set to 120 seconds for CSMs:
    COR_set_recv_timeout
    Current Courier recv timeout values:
    Prog_name    Prog_number    timeout (secs)
    ===============================================
    NCHs         2              0
    CSMs         134231040      120
    DOCs         134231041      0
    INXs         134231042      0
    PRIs         134231043      0
    BESs         134231044      0
-h
    Displays tool usage.
    Example of -h:
    COR_set_recv_timeout -h
    Usage: COR_set_recv_timeout [Stub type][timeout in seconds]
Non-Integer
    If run with any parameter other than an integer, it displays a usage message:
    COR_set_recv_timeout CSMs xxx
    Usage: COR_set_recv_timeout [Stub type][timeout in seconds]
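If the no-parameter report needs to be checked from a script, its text can be turned into a dictionary. The sketch below is a hedged example that parses already captured output. The function name parse_timeout_report is hypothetical, and it assumes the layout shown in the example above: a separator line of "=" characters followed by name, program number, and timeout values. Running the utility itself is deliberately left out.

```python
def parse_timeout_report(output):
    """Return {stub name: timeout in seconds} from the report text."""
    lines = output.splitlines()
    # Data starts after the "=====" separator line.
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith("=")) + 1
    tokens = " ".join(lines[start:]).split()
    # Group tokens in threes: name, program number, timeout.
    report = {}
    for i in range(0, len(tokens) - 2, 3):
        name, _prog_number, timeout = tokens[i:i + 3]
        report[name] = int(timeout)
    return report


# Sample text copied from the example output in this document.
SAMPLE_REPORT = """\
Current Courier recv timeout values:
Prog_name Prog_number timeout (secs)
===============================================
NCHs 2 0
CSMs 134231040 120
DOCs 134231041 0
INXs 134231042 0
PRIs 134231043 0
BESs 134231044 0
"""

if __name__ == "__main__":
    active = {k: v for k, v in parse_timeout_report(SAMPLE_REPORT).items() if v}
    print(active)
```

Grouping tokens in threes rather than splitting row by row keeps the sketch tolerant of output that has been re-wrapped, at the cost of assuming every record has exactly three values.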
IS Elog Examples of Timeouts
Example of setting the timeout to 29 seconds using the COR_set_recv_timeout utility
2013/07/08 15:02:23.906 0,0,1 <fnsw> COR_set_recv_timeout BESs 29 (11044.1.182 0x2b24.1) ...
Set courier recv timeout value of BESs to 29 seconds.
Example of a Timeout due to the Client Application not closing the connection
2013/07/08 15:03:12.860 155,20,4 <fnsw> COR_Listen -pt -s32769 t3600 -d100 (10262.7.36 0x2816.7) ...
Courier error: recv() timed out (29 secs) from ip address 9.39.2.127 [51509]. Request handler type = 134231044
Example of timeout due to communication error
2010/09/29 15:43:55.313 155,20,4 <fnsw> COR_Listen -pt -s32769 t3600 -d100 (409814.16706.146 0x640d6.4142) ...
Courier error: recv() timed out (29 secs) from ip address 9.39.2.145 [1821]
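To find these timeout entries in a large elog, a regular expression built from the message text in the samples above can be used. The sketch below is an assumption-laden example: the pattern is derived only from the two samples in this document, the function name find_recv_timeouts is hypothetical, and the bracketed number is returned under the neutral name bracket_value because this note does not state what it represents.

```python
import re

# Pattern derived from the two sample messages in this document.
# Whitespace between "ip address" and the address may be a line break.
TIMEOUT_RE = re.compile(
    r"recv\(\) timed out \((\d+) secs\) from ip address\s+"
    r"(\d+\.\d+\.\d+\.\d+)\s+\[(\d+)\]"
    r"(?:\.\s+Request handler type = (\d+))?"
)


def find_recv_timeouts(elog_text):
    """Return one dict per recv() timeout message found in elog_text."""
    events = []
    for match in TIMEOUT_RE.finditer(elog_text):
        events.append({
            "timeout_secs": int(match.group(1)),
            "ip": match.group(2),
            "bracket_value": int(match.group(3)),
            # The handler type is absent in one of the two samples.
            "handler_type": int(match.group(4)) if match.group(4) else None,
        })
    return events


# Message text copied from the elog samples in this document.
SAMPLE_ELOG = """\
Courier error: recv() timed out (29 secs) from ip address
9.39.2.127 [51509]. Request handler type = 134231044
Courier error: recv() timed out (29 secs) from ip address
9.39.2.145 [1821]
"""

if __name__ == "__main__":
    for event in find_recv_timeouts(SAMPLE_ELOG):
        print(event)
```

Comparing handler_type with the program numbers shown by COR_set_recv_timeout (134231044 is listed for BESs in the example report) indicates which stub type timed out.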
Files Included
/fnsw/bin/COR_set_recv_timeout
/fnsw/bin/COR_Listen
/fnsw/lib/shobj/COR
How to Determine if a Connection is Waiting on a Client
The existing cormon tool monitors the current state of the client-server connection on the
FileNet system.
Here are 2 common courier states that show that the server is waiting for data over the
network.
RCV       Blocked, waiting for network data
RCVMSG    Blocked, waiting for network data
For details on this tool, refer to page 164 of the Image Services System Tools Reference
Manual, Release 4.1.2.
ftp://public.dhe.ibm.com/software/data/cm/filenet/docs/isdoc/412x/Systools.pdf
Example:
Here is a clip from cormon output captured on Sep 16 that shows 3 CSMs stubs that have
been waiting for over 10 minutes (more than 600 seconds).
COMMAND: cormon -p / DATE: Thu Sep 16 12:57:30 EDT 2010
CORH_state  Srvr  PID      ChldPID  Prog  Time  LatestUser@Address
RCVMSG      X     450776   0        CSMs  615   [email protected] [1946]
RCVMSG      X     1237192  0        CSMs  623   [email protected] [2177]
RCVMSG      X     1585330  0        CSMs  628   [email protected] [4124]
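A check like the one above (which stubs have been waiting longer than some threshold) can be scripted against captured cormon rows. The sketch below is illustrative only. The function name find_long_waits is hypothetical, and it assumes the column order shown in the header (CORH_state, Srvr, PID, ChldPID, Prog, Time) with every column populated. If the Srvr column can be blank in real output, the fixed column positions used here would need adjusting.

```python
WAITING_STATES = {"RCV", "RCVMSG"}


def find_long_waits(cormon_rows, threshold_secs):
    """Return (pid, prog, seconds) for stubs waiting longer than threshold."""
    waits = []
    for row in cormon_rows.splitlines():
        fields = row.split()
        # Expected columns: state, Srvr flag, PID, ChldPID, Prog, Time, ...
        if len(fields) < 6 or fields[0] not in WAITING_STATES:
            continue
        if not (fields[2].isdigit() and fields[5].isdigit()):
            continue
        seconds = int(fields[5])
        if seconds > threshold_secs:
            waits.append((int(fields[2]), fields[4], seconds))
    return waits


# Rows copied from the cormon clip in this document (address text as shown).
SAMPLE_ROWS = """\
CORH_state Srvr PID ChldPID Prog Time LatestUser@Address
RCVMSG X 450776 0 CSMs 615 [email protected] [1946]
RCVMSG X 1237192 0 CSMs 623 [email protected] [2177]
RCVMSG X 1585330 0 CSMs 628 [email protected] [4124]
"""

if __name__ == "__main__":
    for pid, prog, seconds in find_long_waits(SAMPLE_ROWS, 600):
        print(f"PID {pid} ({prog}) has waited {seconds} seconds")
```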
What causes a server stub to remain in a RCVMSG state for an extended period?
A server stub is placed in a RCVMSG state when the IS server sends data back to the client
over the network. It remains in this state until the client closes the connection or the OS
tells courier that the network connection no longer exists. Any reason why the close is not
received can therefore cause this condition. Here are some application and network
examples of why a connection may not be closed.
Application Examples
• The client application is waiting for a user response before sending a close.
• The client application has a bug that causes it to not always send a close.
• The client application received an error prior to closing the connection, and the
connection was not closed during its error handling.
• The client application aborted before closing the connection.
Network Examples
• The client application did not receive the data from the server for any reason.
• The client application sent a close, but it was not received by the server for any reason,
such as a dropped packet.
Common CSMs RPC Example
The eladisp sample below shows a typical series of RPCs that open a CSMs connection,
place it into a RCVMSG state, and then close it.
• A CSMs stub is opened on the CSM:OpenCsumObject call.
• The CSMs stub is put into a RCVMSG state when the IS server sends data back to
the client over the network as a reply to the RPC request.
• The CSMs connection is closed and removed from the cormon output when
the CSM:CloseObject is received from the client.
eladisp sample of a typical CSMs call
$514
CSM:OpenCsumObject
12:54:31.150 0.00 Secs 0.00 KB
NETID = [1add9e55,000000000000,0000]
------------------------------------------------------
$842
CSM:ReadObject
12:54:33.850 0.00 Secs 65.54 KB
NETID = [1add9e55,000000000000,0000]
------------------------------------------------------
$843
CSM:ReadObject
12:54:33.850 0.00 Secs 22.77 KB
NETID = [1add9e55,000000000000,0000]
------------------------------------------------------
$844
CSM:CloseObject
12:54:33.850 0.00 Secs 0.00 KB
NETID = [1add9e55,000000000000,0000]
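The time a connection stayed open can be estimated from a trace like this by subtracting the open timestamp from the close timestamp. The sketch below is a rough example that assumes the time printed with each RPC marks when that call was handled, which this note does not confirm. The function name connection_lifetime is hypothetical, and a trace that crosses midnight is not handled.

```python
import re
from datetime import datetime

# An RPC name followed by its HH:MM:SS.mmm timestamp, as in the sample above.
RPC_RE = re.compile(r"(CSM:\w+)\s+(\d{2}:\d{2}:\d{2}\.\d+)")


def connection_lifetime(eladisp_text):
    """Seconds between the first OpenCsumObject and the last CloseObject."""
    calls = [(name, datetime.strptime(stamp, "%H:%M:%S.%f"))
             for name, stamp in RPC_RE.findall(eladisp_text)]
    opened = next(t for name, t in calls if name == "CSM:OpenCsumObject")
    closed = [t for name, t in calls if name == "CSM:CloseObject"][-1]
    return (closed - opened).total_seconds()


# RPC names and timestamps copied from the eladisp sample in this document.
SAMPLE_TRACE = """\
$514
CSM:OpenCsumObject
12:54:31.150 0.00 Secs 0.00 KB
$842
CSM:ReadObject
12:54:33.850 0.00 Secs 65.54 KB
$843
CSM:ReadObject
12:54:33.850 0.00 Secs 22.77 KB
$844
CSM:CloseObject
12:54:33.850 0.00 Secs 0.00 KB
"""

if __name__ == "__main__":
    print(f"open for {connection_lifetime(SAMPLE_TRACE):.3f} seconds")
```

For the sample, the open call is stamped 12:54:31.150 and the close call 12:54:33.850, a difference of 2.7 seconds, far below the waits of over 600 seconds shown in the cormon example.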