New Courier Enhancement

This courier enhancement adds a timeout feature that lets the user gracefully tear down any courier connection (for any server stub type, such as CSMs, DOCs, INXs, etc.) based on a user-configured value. The enhancement makes the Osar server more resilient to connectivity and application issues by not allowing CSMs server stub connections to remain in a RECEIVE state, waiting for data over the network, for an extended period of time.

The timeout period can be set as a static number in the /fnsw/etc/serverConfig file, or dynamically through the new utility named "COR_set_recv_timeout". An entry is added to the IS elog when the timeout value is set and when the timeout is triggered. The "IS Elog Examples of Timeouts" section below contains elog samples.

Set the Timeout as a Static Number That Persists Between IS Recycles

Server stubs can be configured in the /fnsw/etc/serverConfig file with a static number that persists between recycles of the IS software. The enhancement looks for a number in the seventh column and, if one exists, uses it as the timeout value. If the seventh column is empty, the timeout feature is off and the original code base is used.

Here is an example that sets the timeout value to 90 seconds for the CSMs and BESs stubs in the serverConfig file:

    NCHs 02         2 24 1 0
    CSMs 0134231040 1 24 1 0 90
    DOCs 0134231041 1 24 1 0
    INXs 0134231042 1 24 1 0
    PRIs 0134231043 1 24 1 0
    BESs 0134231044 1 16 1 0 90

Only the CSMs and BESs lines have a seventh column (90), so the other stubs have no timeout.

Adding a timeout value to the serverConfig file is optional. The values set in this file can be overridden by the COR_set_recv_timeout utility described below.

New Tool to Dynamically Set the Timeout

A new utility named COR_set_recv_timeout has been released with this enhancement. This tool overrides the timeout value set in the serverConfig file. The value set by this tool takes effect for new connections; existing connections continue to use the old value.
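The seventh-column lookup and the override rules above can be modeled in a few lines. The following Python sketch is a hypothetical illustration, not the courier implementation. It assumes that serverConfig columns are whitespace-separated with the stub name first, treats a missing or non-numeric seventh column as "timeout off", uses a plain dictionary (`overrides`) to stand in for values set with COR_set_recv_timeout, and fixes the timeout when a connection is opened so that existing connections keep their old value. The names `parse_server_config`, `effective_timeout`, `open_connection`, and `Connection` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TIMEOUT_COLUMN = 7  # 1-based serverConfig column that may hold the recv timeout


def parse_server_config(text: str) -> Dict[str, Optional[int]]:
    """Map each stub name to its seventh-column timeout (None = feature off).

    Hypothetical helper; assumes whitespace-separated columns, stub name first.
    """
    timeouts: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        value: Optional[int] = None
        if len(fields) >= TIMEOUT_COLUMN and fields[TIMEOUT_COLUMN - 1].isdigit():
            value = int(fields[TIMEOUT_COLUMN - 1])
        timeouts[fields[0]] = value
    return timeouts


def effective_timeout(stub: str, static: Dict[str, Optional[int]],
                      overrides: Dict[str, int]) -> int:
    """A dynamic override wins over serverConfig; 0 means no timeout."""
    if stub in overrides:
        return overrides[stub]
    return static.get(stub) or 0


@dataclass(frozen=True)
class Connection:
    stub: str
    recv_timeout: int  # fixed when the connection is opened


def open_connection(stub: str, static: Dict[str, Optional[int]],
                    overrides: Dict[str, int]) -> Connection:
    # Only connections opened after an override change see the new value.
    return Connection(stub, effective_timeout(stub, static, overrides))


CONFIG = """\
NCHs 02         2 24 1 0
CSMs 0134231040 1 24 1 0 90
BESs 0134231044 1 16 1 0 90
"""

static = parse_server_config(CONFIG)
overrides: Dict[str, int] = {}
old_conn = open_connection("CSMs", static, overrides)  # 90 s from serverConfig
overrides["CSMs"] = 120  # stands in for: COR_set_recv_timeout CSMs 120
new_conn = open_connection("CSMs", static, overrides)
print(old_conn.recv_timeout, new_conn.recv_timeout)  # prints: 90 120
```

In this model an override of 0 is honored as "off", matching the documented meaning of `COR_set_recv_timeout CSMs 0`, and a stub with no seventh column (NCHs above) falls back to 0.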
Usage: COR_set_recv_timeout [Server Stub][timeout in seconds]

Usage Options with Examples:

0
    Turns off the timeout feature.
    Example:
        COR_set_recv_timeout CSMs 0
        Set courier recv timeout value for CSMs to 0 seconds.

timeout in seconds
    Sets the threshold timeout value.
    Example of setting the timeout to 120 seconds:
        COR_set_recv_timeout CSMs 120
        Set courier recv timeout value for CSMs to 120 seconds.

no parameters
    Displays the current threshold timeouts. If run with no parameters, the tool displays the current timeout value for each server stub. A value of zero means there is no timeout.
    Example showing that the timeout is set to 120 seconds for CSMs:
        COR_set_recv_timeout
        Current Courier recv timeout values:
        Prog_name  Prog_number  timeout (secs)
        ===============================================
        NCHs       2            0
        CSMs       134231040    120
        DOCs       134231041    0
        INXs       134231042    0
        PRIs       134231043    0
        BESs       134231044    0

-h
    Displays the tool usage.
    Example of -h:
        COR_set_recv_timeout -h
        Usage: COR_set_recv_timeout [Stub type][timeout in seconds]

Non-integer message
    If run with any parameter other than an integer, the tool displays the usage message:
        COR_set_recv_timeout CSMs xxx
        Usage: COR_set_recv_timeout [Stub type][timeout in seconds]

IS Elog Examples of Timeouts

Example of setting the BESs timeout to 29 seconds using the COR_set_recv_timeout utility:
    2013/07/08 15:02:23.906 0,0,1 <fnsw> COR_set_recv_timeout BESs 29 (11044.1.182 0x2b24.1) ...
    Set courier recv timeout value of BESs to 29 seconds.

Example of a timeout caused by the client application not closing the connection:
    2013/07/08 15:03:12.860 155,20,4 <fnsw> COR_Listen -pt -s32769 t3600 -d100 (10262.7.36 0x2816.7) ...
    Courier error: recv() timed out (29 secs) from ip address 9.39.2.127 [51509]. Request handler type = 134231044

Example of a timeout caused by a communication error:
    2010/09/29 15:43:55.313 155,20,4 <fnsw> COR_Listen -pt -s32769 t3600 -d100 (409814.16706.146 0x640d6.4142) ...
    Courier error: recv() timed out (29 secs) from ip address 9.39.2.145 [1821]

Files Included

    /fnsw/bin/COR_set_recv_timeout
    /fnsw/bin/COR_Listen
    /fnsw/lib/shobj/COR

How to Determine if a Connection is Waiting on a Client

The existing cormon tool monitors the current state of client-server connections on the FileNet system. These two common courier states show that the server is waiting for data over the network:

    RCV       Blocked, waiting for network data
    RCVMSG    Blocked, waiting for network data

For details on this tool, refer to page 164 of the Image Services System Tools Reference Manual, Release 4.1.2:
ftp://public.dhe.ibm.com/software/data/cm/filenet/docs/isdoc/412x/Systools.pdf

Example: Here is a clip from your cormon output on Sep 16 that shows 3 CSMs stubs that have been waiting for over 10 minutes (more than 600 seconds).

    COMMAND: cormon -p / DATE: Thu Sep 16 12:57:30 EDT 2010
    CORH_state  Srvr  PID      ChldPID  Prog  Time  LatestUser@Address
    RCVMSG      X     450776   0        CSMs  615   [email protected] [1946]
    RCVMSG      X     1237192  0        CSMs  623   [email protected] [2177]
    RCVMSG      X     1585330  0        CSMs  628   [email protected] [4124]

What causes a server stub to remain in a RCVMSG state for an extended period?

A server stub is placed in the RCVMSG state when the IS server sends data back to the client over the network. It remains in this state, blocked and waiting for network data, until the client closes the connection or the OS tells courier that the network connection no longer exists. This means that anything that prevents the close from being received can cause this condition. Here are some application and network examples of why a connection may not be closed.

Application Examples

    * The client application is waiting for a user response before sending a close.
    * The client application has a bug that does not always send a close.
    * The client application received an error before closing the connection, and its error handling did not close the connection.
    * The client application aborted before closing the connection.
Network Examples

    * The client application did not receive the data from the server, for any reason.
    * The client application sent a close, but the server did not receive it for some reason, such as a dropped packet.

Common CSMs RPC Example

The eladisp sample below shows a typical series of RPCs that opens a CSMs connection, places it into the RCVMSG state, and then closes it. A CSMs stub is opened on the CSM:OpenCsumObject call. The CSMs stub is put into the RCVMSG state when the IS server sends data back to the client over the network as a reply to the RPC request. The CSMs connection is closed and removed from the cormon output when the CSM:CloseObject call is received from the client.

eladisp sample of a typical CSMs call:

    $514 CSM:OpenCsumObject  12:54:31.150  0.00 Secs   0.00 KB
         NETID = [1add9e55,000000000000,0000]
    ------------------------------------------------------
    $842 CSM:ReadObject      12:54:33.850  0.00 Secs  65.54 KB
         NETID = [1add9e55,000000000000,0000]
    ------------------------------------------------------
    $843 CSM:ReadObject      12:54:33.850  0.00 Secs  22.77 KB
         NETID = [1add9e55,000000000000,0000]
    ------------------------------------------------------
    $844 CSM:CloseObject     12:54:33.850  0.00 Secs   0.00 KB
         NETID = [1add9e55,000000000000,0000]
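A stub that never receives its CSM:CloseObject stays in cormon in the RCV or RCVMSG state, so one way to apply the cormon check described in "How to Determine if a Connection is Waiting on a Client" is to scan saved cormon output for stubs in those states whose Time exceeds a threshold, such as the 600 seconds used in the Sep 16 example. The Python sketch below is a hypothetical helper, not part of the IS tools. It assumes each data row follows the column order shown in the sample (CORH_state, Srvr, PID, ChldPID, Prog, Time, LatestUser@Address) with Time in seconds; the function name `find_long_waits` and the placeholder address `user@client-host` are invented for illustration.

```python
from typing import Iterable, List, NamedTuple

# States that cormon reports as "Blocked, waiting for network data".
WAITING_STATES = {"RCV", "RCVMSG"}


class WaitingStub(NamedTuple):
    state: str
    pid: int
    prog: str
    seconds: int
    address: str


def find_long_waits(cormon_lines: Iterable[str],
                    threshold_secs: int = 600) -> List[WaitingStub]:
    """Return stubs blocked on network data for longer than threshold_secs.

    Assumes rows use the column order of the sample cormon output above.
    """
    hits: List[WaitingStub] = []
    for line in cormon_lines:
        # maxsplit=6 keeps "LatestUser@Address [port]" together as one field.
        fields = line.split(maxsplit=6)
        if len(fields) < 6 or fields[0] not in WAITING_STATES:
            continue  # banner, header, or a state not waiting on the network
        state, _srvr, pid, _child, prog, secs = fields[:6]
        if not (pid.isdigit() and secs.isdigit()):
            continue  # row does not match the assumed layout
        if int(secs) > threshold_secs:
            address = fields[6] if len(fields) > 6 else ""
            hits.append(WaitingStub(state, int(pid), prog, int(secs), address))
    return hits


# PIDs and times from the Sep 16 clip; "user@client-host" is a placeholder.
SAMPLE = """\
COMMAND: cormon -p / DATE: Thu Sep 16 12:57:30 EDT 2010
CORH_state Srvr PID ChldPID Prog Time LatestUser@Address
RCVMSG X 450776 0 CSMs 615 user@client-host [1946]
RCVMSG X 1237192 0 CSMs 623 user@client-host [2177]
RCVMSG X 1585330 0 CSMs 628 user@client-host [4124]
"""

for stub in find_long_waits(SAMPLE.splitlines()):
    print(stub.state, stub.prog, stub.pid, stub.seconds)
```

A threshold slightly above the configured recv timeout for a stub is a natural choice once the enhancement is in place, since stubs waiting longer than that would be expected to have been timed out.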