Do's and Don'ts - Production Monitoring
1) Start checking for failures of any jobs, servers, or services.
2) Log in to the respective server.
3) Check the following:
   • CPU utilization (thresholds: Info 50%, Critical 70%, Fatal 90%).
   • Thread pool utilization (thresholds: Info 50%, Critical 70%, Fatal 90%).
   • JVM/memory usage (thresholds: Info 50%, Critical 70%, Fatal 90%).
   • JDBC pool usage (thresholds: Info 50, Critical 35, Fatal 25).
   If usage reaches or approaches the Critical or Fatal threshold:
   • On-call support opens the triage line.
   • Add the issue to the read-out notes.
   • Mail the Windows/Linux team.
   • Mail the DBA team.
   • Send an outage notification to users and [email protected].
4) The ProdSupport team reads out the issue.
5) Triage steps:
   • Share findings and read-out notes with the triage team.
   • Ask the Windows/Linux team to look for errors in the HTTP server logs.
     o If errors are found, triage those errors with the Windows/Linux team.
   • Ask the DBA team to look for errors and exceptions in data connections.
     o If errors are found, triage those errors with the DBA team.
   • Ask the Unix team for the server utilization information and the recent set of system events.
   • If no errors are found:
     o Notify users of the server restart.
     o Prepare for the server restart.
     o Ask the Windows/Linux team to back up log files.
     o Ask the Windows/Linux team to restart the server in 5 minutes.
6) Check whether the errors are resolved and the application is available again.
7) Send a notification to users and [email protected] that the application has been restored.
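
The threshold checks in step 3 can be sketched in a few lines of Python. This is only an illustration, not a tool the procedure prescribes. The metric names and the functions `severity` and `needs_triage` are hypothetical. One assumption matters in particular: the JDBC thresholds descend (50, 35, 25), so the sketch reads them as remaining pool capacity, where a lower reading is worse. If the JDBC figure is measured differently in your environment, adjust that table.

```python
from typing import Dict, Optional

# Thresholds copied from the runbook, as (Info, Critical, Fatal).
# For these metrics a higher reading is worse.
RISING = {
    "cpu": (50, 70, 90),
    "thread_pool": (50, 70, 90),
    "jvm_memory": (50, 70, 90),
}

# ASSUMPTION: the JDBC thresholds descend, so they are treated here as
# remaining pool capacity, where a lower reading is worse.
FALLING = {
    "jdbc_pool_available": (50, 35, 25),
}


def severity(metric: str, value: float) -> Optional[str]:
    """Return "Info", "Critical", "Fatal", or None if no threshold is hit."""
    if metric in RISING:
        info, critical, fatal = RISING[metric]
        if value >= fatal:
            return "Fatal"
        if value >= critical:
            return "Critical"
        if value >= info:
            return "Info"
        return None
    if metric in FALLING:
        info, critical, fatal = FALLING[metric]
        if value <= fatal:
            return "Fatal"
        if value <= critical:
            return "Critical"
        if value <= info:
            return "Info"
        return None
    raise KeyError(f"unknown metric: {metric}")


def needs_triage(readings: Dict[str, float]) -> bool:
    """True when any reading is Critical or Fatal, i.e. escalation applies."""
    return any(severity(m, v) in ("Critical", "Fatal") for m, v in readings.items())
```

For example, `needs_triage({"cpu": 72, "jvm_memory": 40})` is true because the CPU reading is at the Critical level, which is the point where the on-call support would open the triage line.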
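
The log review and the branch in step 5 (triage when errors are found, otherwise prepare a restart) could look roughly like the sketch below. The keyword pattern is an assumption, since the procedure does not say which log entries count as errors. The function names `find_errors` and `next_action` are hypothetical as well.

```python
import re
from typing import Iterable, List

# ASSUMPTION: illustrative keywords only; the runbook does not define
# which log patterns count as an error or exception.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception")


def find_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain one of the error keywords."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


def next_action(http_errors: List[str], db_errors: List[str]) -> str:
    """Pick the step 5 branch based on what each team found."""
    teams = []
    if http_errors:
        teams.append("Windows/Linux team")
    if db_errors:
        teams.append("DBA team")
    if teams:
        return "triage with " + " and ".join(teams)
    # No errors in either place: follow the restart branch.
    return "prepare server restart"
```

A quick usage example: passing the lines of an HTTP server log to `find_errors` and an empty list for the database side gives `next_action` enough to say whether the Windows/Linux team should triage or whether the restart preparation applies.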