Do's and Don'ts - Production Monitoring

1) Start checking for failures of any jobs, servers, or services.
2) Log in to the respective server.
3) Check the following:
   • CPU utilization (threshold – Info-50%, Critical-70%, Fatal-90%)
   • Thread pool utilization (threshold – Info-50%, Critical-70%, Fatal-90%)
   • JVM/memory usage (threshold – Info-50%, Critical-70%, Fatal-90%)
   • JDBC pool usage (threshold – Info-50, Critical-35, Fatal-25)
   If usage is at or near the Critical or Fatal threshold:
   • On-call support to open the triage line
   • Add to read-out notes
   • Mail Windows/Linux team
   • Mail DBA team
   • Send outage notification to users and [email protected]
4) ProdSupport team to read out the issue.
5) Triage steps
   • Share findings and read-out notes with the triage team
   • Ask Windows/Linux team to look for errors in HTTP server logs
   • Ask DBA team to look for errors and exceptions in data connections
6) • If errors are found, triage those errors with Windows/Linux team
   • If errors are found, triage those errors with DBA team
   • Ask Unix team for the server utilization information and the recent set of system events
   • If no errors are found:
     o Notify users of the server restart
     o Prepare for server restart
     o Ask Windows/Linux team to back up log files
     o Ask Windows/Linux team to restart in 5 minutes from now
   • Check whether the errors are resolved and the application is available again
7) Send notification to users and [email protected] that the application has been restored.
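
The threshold check in step 3 could be sketched in code roughly as follows. This is only an illustration: the names `classify`, `needs_triage`, `UTILIZATION_THRESHOLDS`, and `JDBC_THRESHOLDS` are hypothetical and not part of any monitoring tool. Because the JDBC pool figures descend (50, 35, 25), the sketch assumes they describe a quantity where lower is worse, such as free connections; the checklist does not state this.

```python
from typing import Dict, Optional

# Thresholds copied from step 3 of the checklist (percent).
UTILIZATION_THRESHOLDS: Dict[str, float] = {"Info": 50, "Critical": 70, "Fatal": 90}

# Assumption: the JDBC figures descend, so they are treated here as
# "lower is worse" (for example, percent of free connections).
JDBC_THRESHOLDS: Dict[str, float] = {"Info": 50, "Critical": 35, "Fatal": 25}

LEVELS = ("Info", "Critical", "Fatal")  # ordered from least to most severe


def classify(value: float,
             thresholds: Dict[str, float],
             lower_is_worse: bool = False) -> Optional[str]:
    """Return the most severe level whose threshold is crossed, or None."""
    severity = None
    for level in LEVELS:
        limit = thresholds[level]
        crossed = value <= limit if lower_is_worse else value >= limit
        if crossed:
            severity = level  # later levels are more severe, so keep the last hit
    return severity


def needs_triage(severity: Optional[str]) -> bool:
    """Step 3 escalates to the triage line only at Critical or Fatal."""
    return severity in ("Critical", "Fatal")


if __name__ == "__main__":
    cpu = classify(72, UTILIZATION_THRESHOLDS)
    jdbc = classify(30, JDBC_THRESHOLDS, lower_is_worse=True)
    print("CPU:", cpu, "triage:", needs_triage(cpu))
    print("JDBC:", jdbc, "triage:", needs_triage(jdbc))
```

Keeping the thresholds in plain dictionaries means the Info/Critical/Fatal values can be adjusted per metric without touching the comparison logic.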