Troubleshooting overloaded databases
When a database is overloaded, database server and CPU utilization statistics are consistently high and database servers may begin to use swap space. Application server logs typically contain a high number of long running queues.
Spikes in the Processing time (ms): mean statistic may occur and the Connection availability (%) mean statistic may display a value of less than 100%. You can view statistics for the data source in the Statistics page in NexJ System Admin Console.
Typical causes
When using an Oracle database, overloaded databases are typically caused by:
- An active Oracle database maintenance job that is running
- An Oracle database backup that is in progress
- A data import or export batch processes that is running, potentially in combination with the above conditions
- A large number of CPU-intensive long running queues
- An Oracle bug that has occurred
- Oracle statistics that are out of date
- Indexes that are heavily fragmented
Information to collect
- Database server CPU, memory, and IO utilization graphs
- Application server logs, including SystemOut.log
- Long-living or resource-intensive SQL queries
- Java thread dumps if issues with NexJ CRM are suspected
- If your environment uses an Oracle database:
- Oracle Automatic Workload Repository (AWR) reports
- Oracle alert.log files
Troubleshooting
When troubleshooting the issue, engage your database administrator as soon as possible. Identify the source of the high database resource utilization, such as high CPU and I/O use. Consider whether it is due to:
- A single long-running job
- An influx of long running queues
- A maintenance job running
- A combination of the above issues
If your environment uses an Oracle database, collect and review hourly snapshots of Oracle Automatic Workload Repository (AWR) reports and Active Session History (ASH) reports from the database. Look for long-running or CPU-intensive SQL queries that are significant in numbers. Identify any long-living sessions that produce high CPU or I/O utilization. Also, review the alert.log file for errors and warnings
After collecting application logs from all nodes, review the logs for any database-related connectivity errors, connection pool warnings, or long running queues during the same time frame.