Troubleshooting overloaded databases

When a database is overloaded, users may report performance degradation or be unable to access NexJ CRM.

When a database is overloaded, database server and CPU utilization statistics are consistently high and database servers may begin to use swap space. Application server logs typically contain a high number of long running queues.

Spikes in the Processing time (ms): mean statistic may occur and the Connection availability (%) mean statistic may display a value of less than 100%. You can view statistics for the data source in the Statistics page in NexJ System Admin Console.

Note: If you are using an Oracle database, errors display in the Oracle alert.log file.

Typical causes

When using an Oracle database, overloaded databases are typically caused by:

An active Oracle database maintenance job that is running
An Oracle database backup that is in progress
A data import or export batch processes that is running, potentially in combination with the above conditions
A large number of CPU-intensive long running queues
An Oracle bug that has occurred
Oracle statistics that are out of date
Indexes that are heavily fragmented

Information to collect

Collect the following information to assist with troubleshooting:

Database server CPU, memory, and IO utilization graphs
Application server logs, including SystemOut.log
Long-living or resource-intensive SQL queries
Java thread dumps if issues with NexJ CRM are suspected
If your environment uses an Oracle database:
- Oracle Automatic Workload Repository (AWR) reports
- Oracle alert.log files

Troubleshooting

When troubleshooting the issue, engage your database administrator as soon as possible. Identify the source of the high database resource utilization, such as high CPU and I/O use. Consider whether it is due to:

A single long-running job
An influx of long running queues
A maintenance job running
A combination of the above issues

If your environment uses an Oracle database, collect and review hourly snapshots of Oracle Automatic Workload Repository (AWR) reports and Active Session History (ASH) reports from the database. Look for long-running or CPU-intensive SQL queries that are significant in numbers. Identify any long-living sessions that produce high CPU or I/O utilization. Also, review the alert.log file for errors and warnings

After collecting application logs from all nodes, review the logs for any database-related connectivity errors, connection pool warnings, or long running queues during the same time frame.