Web Server Investigation

The next layer to investigate is web server performance. I am using IBM HTTP Server (IHS) for this investigation.

Enable Web Server Statistics Page

Edit the httpd.conf file to enable the server status module. Uncomment the LoadModule line below if it is commented out.

LoadModule status_module modules/mod_status.so
<IfModule mod_status.c>
ExtendedStatus On
</IfModule>

Configure who is allowed to view the server status page.

<IfModule mod_status.c>
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from all
</Location>
</IfModule>

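“Allow from all” lets anyone view the page, which is acceptable only temporarily. Afterwards, replace it with the specific hosts or networks that should have access.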

Access the page using URL http://servername/server-status

The page shows summary information similar to the following, which can be useful.

Server uptime: 3 minutes 5 seconds
Total accesses: 351 – Total Traffic: 2.7 MB
CPU Usage: u.07 s.02 cu0 cs0 – .0486% CPU load
1.9 requests/sec – 14.9 kB/second – 7.8 kB/request
17 requests currently being processed, 33 idle workers

A table with the following information will also be available on the server status page

Srv     Child Server number – generation
PID     OS process ID
Acc     Number of accesses this connection / this child / this slot
M       Mode of operation
Module  Module active
CPU     CPU usage, number of seconds
SS      Seconds since beginning of most recent request
Req     Milliseconds required to process most recent request
Conn    Kilobytes transferred this connection
Child   Megabytes transferred this child
Slot    Total megabytes transferred this slot
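
For repeated checks during an incident, the same figures can be read by a script. mod_status also serves a machine-readable variant of the page when "?auto" is appended to the URL, as "Key: value" lines. The sketch below is a minimal illustration, not part of the original setup: the function names are hypothetical, and the sample text is made up to mirror the numbers quoted above rather than captured from a real server.

```python
def parse_status(text):
    """Parse 'Key: value' lines from the machine-readable status output."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def worker_utilization(stats):
    """Return the fraction of workers that are busy (0.0 to 1.0)."""
    busy = int(stats["BusyWorkers"])
    idle = int(stats["IdleWorkers"])
    total = busy + idle
    return busy / total if total else 0.0


# Illustrative sample only; it mirrors the figures quoted above.
SAMPLE = """Total Accesses: 351
Uptime: 185
BusyWorkers: 17
IdleWorkers: 33
"""

if __name__ == "__main__":
    stats = parse_status(SAMPLE)
    print(f"{worker_utilization(stats):.0%} of workers busy")
```

In a live check, the text would come from fetching http://servername/server-status?auto instead of the sample string. A busy fraction that stays close to 100% suggests the server is running out of workers.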

 

Compress the page data

In the httpd.conf file, uncomment the LoadModule line for mod_deflate

LoadModule deflate_module modules/mod_deflate.so

Add the following lines inside <IfModule mod_deflate.c></IfModule>

<IfModule mod_deflate.c>

SetOutputFilter DEFLATE
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png|pdf)$ no-gzip dont-vary

</IfModule>

 

Enable Debug log

In httpd.conf file, change the log level to the desired level. The default level is warn.

LogLevel debug

The possible levels are – debug, info, notice, warn, error, crit, alert, emerg

 

Quick investigation during Application performance issues

Consider a critical Java-based application in production, with noise all around during an outage. A bridge call is opened, and many people jump on the call asking, “What is the issue?” After a short while, the question changes to “What is the RCA of the issue?”

Among all of them, one person is actually responsible for finding the root cause of the issue and providing a resolution. Which areas can he or she look at quickly? In such a situation, the following statistics-gathering exercise can help.

Where could things go wrong?

Although there are many other things to look at, the problems of a typical Java-based application fall into the above categories.

Investigating Host

Let’s look at the host (Linux), considering you are logged in as a user who has permission to run the commands.

CPU Utilization

Run the ‘top’ command

[Screenshot: output of the top command]

The data points to look at are:

  • The first line has two data points
    • Uptime – How long the server has been up since the last reboot; above, it is 78 days. Typically, if the server has been up for more than 180 days, consider rebooting it.
    • Load average – A higher number indicates that the system is under more stress. In most cases, when the value approaches twice the number of cores on the host, you will notice queues building up and degraded performance.
  • The third line has information on CPU Utilization, showing distribution across user, system and idle.
  • The fourth line shows memory utilization, which on Linux includes buffers and cached memory. It is therefore not the actual RAM utilization, so don’t be alarmed if it shows 90% or 100%.
  • The fifth line has Swap memory usage
  • The detail table has information on Virtual Memory, Resident Memory, Shared Memory, %CPU utilization, %Memory Utilization and command
    • VIRT (Virtual Memory) – The total virtual memory used by the process, including memory that has been swapped out. If there is a native memory leak, you may see the virtual memory keep growing.
    • RES (Resident Memory) – The physical memory used by a process. It is the size of the pages actually present in RAM.
      • In the above example the heap size is set to 8 GB, but the resident memory is 9.9 GB, because a Java process also uses native memory in addition to the heap.
    • SHR (Shared Memory) – It is the memory that could be potentially shared with other processes.
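
The load average rule of thumb above can be sketched as a small check. This is only an illustration: the function name is hypothetical, the "twice the cores" threshold comes from the text, and the intermediate "busy" level at one times the core count is an added assumption.

```python
import os


def load_status(load_1min, cores):
    """Classify the 1-minute load average relative to the core count."""
    ratio = load_1min / cores
    if ratio >= 2.0:   # threshold from the rule of thumb above
        return "overloaded"
    if ratio >= 1.0:   # assumed intermediate level, not from the text
        return "busy"
    return "ok"


if __name__ == "__main__":
    load_1min, _, _ = os.getloadavg()  # available on Unix-like systems
    cores = os.cpu_count() or 1
    print(load_1min, cores, load_status(load_1min, cores))
```

On an 8-core host, for example, a load average of 16 or more would be reported as "overloaded".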

Customize Screen Display

  • Refresh Interval – To change the screen refresh interval, press ‘d’ and enter a numeric value (in seconds)
  • Column Order – To change the order of the columns displayed, press ‘o’ and follow the instructions on the screen to move a column left or right (newer procps-ng versions of top may manage fields through the ‘f’ screen instead)
  • Column Sort – By default the data displayed is sorted by %CPU. To change this, press ‘SHIFT o’ and follow the instructions on the screen
  • Individual CPU – To see the utilization of each individual CPU, press ‘1’

Memory Utilization

Run the command ‘free -g’; the displayed data is in GB

[Screenshot: output of free -g]

The total memory is 62 GB, of which 48 GB is cache. Although the “used” column shows 62 GB, the actual utilization is 13 GB, because Linux counts buffers and cache as used. Refer to the “buffers/cache” row for the actual usage and free memory; here the free memory is 48 GB.
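
The same arithmetic can be reproduced from /proc/meminfo, which reports these values in kB. The sketch below assumes the classic calculation (used = total - free - buffers - cached). The function names are hypothetical and the sample values are illustrative, chosen to be roughly in line with the example above.

```python
def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def actual_used_kb(fields):
    """Memory in use once buffers and page cache are excluded."""
    return (fields["MemTotal"] - fields["MemFree"]
            - fields["Buffers"] - fields["Cached"])


# Illustrative values only (62 GB total, 48 GB cached).
SAMPLE = """MemTotal:       65011712 kB
MemFree:          262144 kB
Buffers:          262144 kB
Cached:         50331648 kB
"""

if __name__ == "__main__":
    used = actual_used_kb(parse_meminfo(SAMPLE))
    print(f"actual used: {used / (1024 * 1024):.1f} GB")
```

On a real host, replace SAMPLE with the contents of /proc/meminfo.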

Another way is to run the ‘vmstat’ command

[Screenshot: output of vmstat]

The memory values shown are in KB.

The ‘vmstat’ command also gives information on the run queue, shown above in the first column. The column ‘r’ shows the number of processes waiting for run time. A high value would indicate that many processes are waiting for the resources to execute.
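
To watch the run queue over time, the vmstat output can be turned into named columns. This is a minimal sketch: the function name is hypothetical, and the sample is an illustrative approximation of the usual vmstat layout, not output from the host discussed here.

```python
def parse_vmstat(output):
    """Return one dict per data row, keyed by the vmstat column names."""
    lines = [ln.split() for ln in output.splitlines() if ln.strip()]
    # The column-name line starts with 'r'; data rows start with a number.
    header = next(cols for cols in lines if cols[0] == "r")
    rows = [cols for cols in lines if cols[0].isdigit()]
    return [dict(zip(header, map(int, cols))) for cols in rows]


# Illustrative sample only.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 262144 262144 50331648    0    0     1     5   10   20  3  1 95  1  0
"""

if __name__ == "__main__":
    latest = parse_vmstat(SAMPLE)[-1]
    print("run queue:", latest["r"], "blocked:", latest["b"])
```

Comparing the ‘r’ value with the number of cores gives a quick sense of whether processes are queuing for CPU.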

I/O Wait time

Run the command ‘iostat’ to get the I/O statistics

[Screenshot: output of iostat]

The “%iowait” shows the percentage of time that the CPUs were waiting for an outstanding disk I/O request to complete. In the event of slow response reading or writing data to disk, the %iowait time will increase.
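
The percentage can also be derived by hand from two readings of the aggregate "cpu" line in /proc/stat, whose fifth counter is the time spent waiting for I/O. The sketch below assumes that field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names and the sample counters are illustrative.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    parts = stat_line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(v) for v in parts[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


# Illustrative counters only, taken a few seconds apart.
BEFORE = "cpu  1000 0 500 8000 200 0 0 0"
AFTER = "cpu  1100 0 550 8700 350 0 0 0"

if __name__ == "__main__":
    print(f"iowait: {iowait_percent(BEFORE, AFTER):.1f}%")
```

On a real host, read the first line of /proc/stat twice with a short sleep in between and pass both lines to iowait_percent.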

Network Statistics

Run the ‘netstat -ant’ command to capture the TCP connection information. A connection can be in different states, such as ESTABLISHED, TIME_WAIT, CLOSE_WAIT, FIN_WAIT1 and FIN_WAIT2.

‘netstat -ant | grep LISTEN’ lists all the listening sockets. The ‘netstat -l’ command can also be used.

To find which program is associated with a connection, use the -p flag (run it as root to see processes owned by other users)

netstat -anp

To get a packet-level summary for the different network protocols, run the command

netstat -s

[Screenshot: output of netstat -s]
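
During an outage it is often quicker to look at a count of connections per state than at the raw list. The sketch below tallies the State column of netstat output for TCP rows. The function name is hypothetical, and the sample addresses and rows are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from 'netstat -ant' style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        cols = line.split()
        # Data rows start with the protocol (tcp/tcp6); the state is column 6.
        if len(cols) >= 6 and cols[0].startswith("tcp"):
            states[cols[5]] += 1
    return states


# Illustrative sample only.
SAMPLE = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:80             10.0.0.9:51240          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.7:40022          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

A large and growing number of CLOSE_WAIT or TIME_WAIT entries is usually the first thing to look for in the result.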

Disk Information

To check the disk space utilization, run the ‘df -h’ command

[Screenshot: output of df -h]
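
A scripted check of a single mount point can use the Python standard library instead of parsing df output. This is a minimal sketch with hypothetical function names; the percentage may differ slightly from the Use% column of df, which is calculated against the space available to unprivileged users.

```python
import shutil


def percent_used(total, used):
    """Return used space as a percentage of total capacity."""
    return 100.0 * used / total if total else 0.0


def report(path="/"):
    """Summarize disk usage for the filesystem that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024 ** 3
    return (f"{path}: {usage.used / gib:.1f} GiB used of "
            f"{usage.total / gib:.1f} GiB "
            f"({percent_used(usage.total, usage.used):.0f}%)")


if __name__ == "__main__":
    print(report("/"))
```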

To check the size of each file and subdirectory in a directory, navigate to the directory and run the command

du -sh *

Ulimit Values

Run the command ‘ulimit -a’ to check the user level settings

[Screenshot: output of ulimit -a]

The key attributes to look at are:

  • Core file size – In the event of process crash, if you want the core file to be generated, set this value to ‘unlimited’
  • Open files – The maximum number of files that can be opened concurrently
  • Max user processes – The maximum number of processes that can exist concurrently
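
The three limits above can also be read from inside a process, which helps to confirm the values a running service actually received. The sketch uses the standard resource module (Unix only). The function names and labels are illustrative, and the core file size is reported here in bytes, whereas ‘ulimit -a’ shows it in blocks.

```python
import resource


def limit_to_text(value):
    """Render a limit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def key_limits():
    """Return (soft, hard) limits for the three attributes discussed above."""
    names = {
        "core file size": resource.RLIMIT_CORE,
        "open files": resource.RLIMIT_NOFILE,
        "max user processes": resource.RLIMIT_NPROC,
    }
    return {
        label: tuple(limit_to_text(v) for v in resource.getrlimit(res))
        for label, res in names.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in key_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```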

System Log

The system messages are written to the /var/log/messages file (Debian-based distributions use /var/log/syslog instead). Check the file for error messages generated by the system.

Users Information

Run the ‘last’ command to get information about when each user logged in to and out of the system. This is useful in cases where an external connection to the host is made frequently and the user making the connection never logs out; the output would then show many logged-in user sessions.

Run the ‘ users ’ command which shows the currently logged-in users. Use the ‘ finger ’ command to retrieve more information about the user if available.