Troubleshooting overloaded databases

When a database is overloaded, users may report performance degradation or be unable to access NexJ CRM.

When a database is overloaded, database server and CPU utilization statistics are consistently high and database servers may begin to use swap space. Application server logs typically contain a high number of long running queues.

Spikes in the Processing time (ms): mean statistic may occur and the Connection availability (%) mean statistic may display a value of less than 100%. You can view statistics for the data source in the Statistics page in NexJ System Admin Console.

Note: If you are using an Oracle database, errors display in the Oracle alert.log file.

Typical causes

When using an Oracle database, overloaded databases are typically caused by:

  • An active Oracle database maintenance job that is running
  • An Oracle database backup that is in progress
  • A data import or export batch processes that is running, potentially in combination with the above conditions
  • A large number of CPU-intensive long running queues
  • An Oracle bug that has occurred
  • Oracle statistics that are out of date
  • Indexes that are heavily fragmented

Information to collect

Collect the following information to assist with troubleshooting:
  • Database server CPU, memory, and IO utilization graphs
  • Application server logs, including SystemOut.log
  • Long-living or resource-intensive SQL queries
  • Java thread dumps if issues with NexJ CRM are suspected
  • If your environment uses an Oracle database:
    • Oracle Automatic Workload Repository (AWR) reports
    • Oracle alert.log files

Troubleshooting

When troubleshooting the issue, engage your database administrator as soon as possible. Identify the source of the high database resource utilization, such as high CPU and I/O use. Consider whether it is due to:

  • A single long-running job
  • An influx of long running queues
  • A maintenance job running
  • A combination of the above issues

If your environment uses an Oracle database, collect and review hourly snapshots of Oracle Automatic Workload Repository (AWR) reports and Active Session History (ASH) reports from the database. Look for long-running or CPU-intensive SQL queries that are significant in numbers. Identify any long-living sessions that produce high CPU or I/O utilization. Also, review the alert.log file for errors and warnings

After collecting application logs from all nodes, review the logs for any database-related connectivity errors, connection pool warnings, or long running queues during the same time frame.