Quick investigation during Application performance issues

Consider a critical Java-based application in production that is in the middle of an outage, with noise all around. A bridge is opened, and many people jump on the call asking, “What is the issue?” After a short while the question changes to “What is the RCA of the issue?”

Among all these people, one person is actually responsible for finding the root cause of the issue and providing a resolution. Which areas can he or she look at quickly? In such a situation, the following statistics-gathering exercise could help.

Where could things go wrong?

Though there are many other things to look at, the problems of a typical Java-based application will fall into one of the above categories.

Investigating Host

Let’s look at the host (Linux), assuming you are logged in as a user who has permission to run the commands.

CPU Utilization

Run the ‘top’ command.

top_img

The data points to look at are:

  • The first line has two data points:
    • Uptime – How long the server has been up since the last reboot. Above, it is 78 days. As a rule of thumb, if the server has been up for more than 180 days, consider rebooting it.
    • Load average – The three values are the averages over the last 1, 5 and 15 minutes. A higher number indicates that the system is stressed. In most cases, if the value approaches twice the number of cores on the host, you will notice queue build-up and degraded performance.
  • The third line has information on CPU utilization, showing the distribution across user, system and idle time.
  • The fourth line has information on memory utilization. On Linux, the used figure includes buffers and cached memory, so it is not the actual RAM utilization by applications; if you see it at 90% or 100%, don’t be alarmed.
  • The fifth line has the swap memory usage.
  • The detail table has information on Virtual Memory, Resident Memory, Shared Memory, %CPU utilization, %Memory Utilization and command
    • VIRT (Virtual Memory) – The total virtual memory used by the process, including memory that has been swapped out. If there is a native memory leak, you could see the virtual memory growing.
    • RES (Resident Memory) – The physical memory used by a process. It is the size of the pages actually present in RAM.
      • In the above example the heap size has been defined as 8 GB, but the resident memory used is 9.9 GB. A Java process uses native memory in addition to the heap memory.
    • SHR (Shared Memory) – It is the memory that could be potentially shared with other processes.
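
The load average rule of thumb above can be turned into a quick check. The following is a minimal sketch, not a definitive health check: the function name `load_per_core` is hypothetical, and the "twice the cores" threshold is simply the rule of thumb from the text.

```shell
# Hypothetical helper: compare the 1-minute load average with the core count.
# Usage: load_per_core <load_1min> <cores>
# Prints the load per core and "stressed" when it reaches twice the cores.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        ratio = load / cores
        verdict = (ratio >= 2) ? "stressed" : "ok"
        printf "%.2f %s\n", ratio, verdict
    }'
}

# On a live Linux host (assumes /proc/loadavg and nproc are available):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

Passing the two values as arguments keeps the arithmetic separate from how the numbers are collected, so the same check works with values copied from a ‘top’ screenshot.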

Customize Screen Display

  • Refresh Interval – To change the screen refresh interval, press ‘d’ and enter a numeric value (unit: seconds)
  • Column Order – To change the order of the columns displayed, press ‘o’ and follow the instructions on the screen to move a column left or right
  • Column Sort – By default the data displayed is sorted by %CPU. To change this, press ‘SHIFT o’ and follow the instructions on the screen. In newer procps-ng versions of top, both the column order and the sort field are managed from the ‘f’ screen instead
  • Individual CPU – To see the utilization of each individual CPU, press ‘1’

Memory Utilization

Run the command ‘free -g’; the displayed data is in GB.

linux_free_img

The total memory is 62 GB, of which 48 GB is cached. The actual utilization is 13 GB, even though the used column shows 62 GB. Refer to the “-/+ buffers/cache” row for the actual usage and free memory; the free memory there is 48 GB.

Another way is to run the ‘vmstat’ command.
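
The same "used minus buffers and cache" figure can be derived from /proc/meminfo. This is a rough sketch under the assumption that the classic calculation (MemTotal - MemFree - Buffers - Cached) is what you want; the function name `real_used_kb` is made up for illustration, and newer kernels also expose a MemAvailable field that may be a better estimate.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the memory used by applications in kB, excluding buffers and page cache.
real_used_kb() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END { print total - free - buffers - cached }
    '
}

# On a live Linux host:
#   real_used_kb < /proc/meminfo
```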

linux_vmstat_img

The memory shown is in KB.

The ‘vmstat’ command also gives information on the run queue, shown above in the first column. The ‘r’ column shows the number of processes waiting for run time. A high value indicates that many processes are waiting for CPU time to execute.
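
To watch the run queue over a short period rather than in a single sample, the ‘r’ column can be extracted from repeated vmstat samples. A small sketch, assuming the usual vmstat layout with ‘r’ as the first column; `max_run_queue` is a hypothetical name.

```shell
# Hypothetical helper: read vmstat output on stdin and print the highest
# value seen in the first column (r, the run queue). Header lines are
# skipped because their first field is not a number.
max_run_queue() {
    awk '$1 ~ /^[0-9]+$/ { if ($1 > max) max = $1 } END { print max + 0 }'
}

# On a live host, sample once per second for five seconds:
#   vmstat 1 5 | max_run_queue
```

Comparing the result with the number of cores gives a quick sense of whether processes are queuing for CPU.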

I/O Wait time

Run the command ‘iostat’ to get the I/O statistics.

linux_iostat_img

The “%iowait” value shows the percentage of time that the CPUs were idle while waiting for an outstanding disk I/O request to complete. When reads from or writes to disk respond slowly, the %iowait time will increase.
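
When iostat is run with an interval, it helps to pull out just the %iowait figure from each report. The sketch below assumes the sysstat layout, where the values appear on the line after the "avg-cpu:" header and %iowait is the fourth value; `iowait_values` is a hypothetical name.

```shell
# Hypothetical helper: read iostat output on stdin and print the %iowait
# value of every report (fourth number on the line after "avg-cpu:").
iowait_values() {
    awk '/^avg-cpu:/ { getline; print $4 }'
}

# On a live host, three reports one second apart:
#   iostat 1 3 | iowait_values
```

Note that the first report from iostat covers the time since boot, so the later values are usually the ones to watch during an incident.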

Network Statistics

Run the ‘netstat -ant’ command to capture the TCP connection information. A connection can be in different states, such as ESTABLISHED, TIME_WAIT, CLOSE_WAIT, FIN_WAIT1 and FIN_WAIT2.
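
A per-state count is often more useful than the raw list, for example to spot a pile-up of CLOSE_WAIT connections. A minimal sketch, assuming the standard netstat column layout where the state is the sixth field; `count_tcp_states` is a hypothetical name.

```shell
# Hypothetical helper: read "netstat -ant" output on stdin and print the
# number of TCP connections per state, highest count first.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print count[s], s }' |
        sort -rn
}

# On a live host:
#   netstat -ant | count_tcp_states
```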

‘netstat -ant | grep LISTEN’ will list all the listening sockets. The ‘netstat -l’ command can also be used.

To find which program is associated with a connection, use the -p flag

netstat -anp

To get the packet-level summary for the different network protocols, run the command

netstat -s

linux_netstat_s_img

Disk Information

To check the disk space utilization, run the ‘df -h’ command.

linux_disk_img
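
To flag only the filesystems that are close to full, the df output can be filtered. A rough sketch, assuming the POSIX output format of ‘df -P’ (capacity in the fifth column, mount point in the sixth) and mount points without spaces; `full_filesystems` and the default threshold of 90 are illustrative choices.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the mount
# point and usage of every filesystem at or above the given percentage.
# Usage: full_filesystems [limit_percent]   (default: 90)
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

# On a live host:
#   df -P | full_filesystems 90
```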

To check the size of the directories inside a directory, navigate to that directory and run the command

du -sh *

Ulimit Values

Run the command ‘ulimit -a’ to check the user-level settings

linux_ulimit_img

The key attributes to look at are:

  • Core file size – In the event of process crash, if you want the core file to be generated, set this value to ‘unlimited’
  • Open files – The maximum number of files that can be opened concurrently
  • Max user processes – The maximum number of processes that can exist concurrently
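
To judge how close a process is to the open files limit, compare its current descriptor count with the limit. The sketch below only does the arithmetic; `fd_usage` is a hypothetical name, and it assumes you collect the two numbers yourself, for example from /proc/<pid>/fd and ‘ulimit -n’. The limit of an already running process can differ from that of your current shell, so /proc/<pid>/limits is the more reliable source.

```shell
# Hypothetical helper: print open-file usage as a percentage of the limit.
# Usage: fd_usage <open_fd_count> <limit>
fd_usage() {
    if [ "$2" = "unlimited" ]; then
        echo "unlimited"
    else
        echo "$(( $1 * 100 / $2 ))%"
    fi
}

# On a live Linux host, for a process with id <pid>:
#   fd_usage "$(ls /proc/<pid>/fd | wc -l)" "$(ulimit -n)"
```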

System Log

The system messages are written to the /var/log/messages file (on Debian-based distributions the equivalent file is /var/log/syslog). Check the file for error messages generated by the system.
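
A quick first pass over the log is to count the lines that look like trouble. This is only a sketch: the keyword list (error, segfault, out of memory) is an assumption chosen for illustration and should be adapted to your environment, and `count_log_errors` is a hypothetical name.

```shell
# Hypothetical helper: read log text on stdin and print the number of
# lines matching a few common trouble keywords (case-insensitive).
count_log_errors() {
    grep -ciE 'error|segfault|out of memory'
}

# On a live host (reading the file usually requires root):
#   count_log_errors < /var/log/messages
```

Replacing `-c` with no option prints the matching lines themselves, which is the natural next step once the count is non-zero.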

Users Information

Run the ‘last’ command to get information about when users logged in to and out of the system. This is useful in cases where an external connection to the host occurs frequently and the user making the connection never logs out; the output would show many logged-in user sessions.

Run the ‘users’ command to show the currently logged-in users. Use the ‘finger’ command to retrieve more information about a user, if available.
