This month, I’ll present a few system tools that can be helpful when trying to diagnose your Linux system’s health, improve performance, and so on. This installment is intended for users who are newer to Linux and who might not be familiar with, or even aware of, all the utilities that are already available at their fingertips.
I often feature tools that are not included by default on a Linux system in the Tool of the Month column, but this installment of Open Road will present some utilities that are part of a “standard” Linux install, or at least packaged and available for most Linux distros — whether they’re actually installed by default, or not. I’m specifically thinking of Debian here, since a minimal Debian install won’t include several of the utilities covered this month. Not to fear, though — they’re just an apt-get away!
Most Linux users are already comfortable using
free to check out what’s running on their systems, so I won’t spend any time on those utilities. Also, this isn’t a fully comprehensive guide to all useful Linux utilities, but I hope it will serve as a jumping-off point for Linux users who are still familiarizing themselves with advanced utilities.
Processor statistics with mpstat
Let’s start off with
mpstat. This utility will provide information about your system’s processor or processors, including CPU utilization by user-level applications, system-level applications, the number of interrupts received by CPUs, and the idle time for your CPU(s) (including idle time spent waiting for disk I/O).
The syntax is pretty simple. Running
mpstat by itself (or
mpstat 0) will display the averages for your processor since system startup. The display will look something like this:
Linux 2.4.26-1-686 (serenity.zonker.net)    05/18/04

08:11:41   CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
08:11:41   all    0.12    0.00    0.03    0.00    0.00    0.00   99.85    101.43
It probably goes without saying that the first field is the time that
mpstat ran. Because I ran
mpstat without specifying the CPU on which I wanted statistics, it simply shows global statistics. In this case, the system in question has only one CPU anyway, so there’s no point in specifying the CPU. You can specify a CPU using the
mpstat -P n option, where
n is the CPU number. Note that
mpstat starts counting at 0 instead of 1.
%idle values should already be familiar from
top. The value that is particularly interesting to most admins is
%iowait, which shows how much time the CPU(s) spend idle waiting for disk I/O. Obviously, this system isn’t terribly busy, so the disk I/O isn’t a big bottleneck here. However,
mpstat can be useful in finding out whether your CPUs are waiting on reads from disk.
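If you want to check these fields from a script rather than by eye, the following Python sketch parses the two-line sample report shown above and pulls out %iowait. The function name parse_mpstat and the 5 percent threshold are my own assumptions for illustration, not part of mpstat, and the column layout differs between sysstat versions.

```python
# Sketch: parse a two-line mpstat report (header row + one data row).
# Assumes the sysstat 5.0.3 column layout shown in this article and a
# 24-hour timestamp (an "AM"/"PM" suffix would add an extra token).
SAMPLE = """\
08:11:41 CPU %user %nice %system %iowait %irq %soft %idle intr/s
08:11:41 all 0.12 0.00 0.03 0.00 0.00 0.00 99.85 101.43
"""

def parse_mpstat(text):
    """Return a dict mapping column name to value for the first data row."""
    header, row = [line.split() for line in text.strip().splitlines()[:2]]
    # Skip the leading timestamp on both lines, then pair names with values.
    stats = dict(zip(header[1:], row[1:]))
    # Every column except CPU ("all" or a CPU number) is numeric.
    return {k: (v if k == "CPU" else float(v)) for k, v in stats.items()}

stats = parse_mpstat(SAMPLE)
IOWAIT_THRESHOLD = 5.0  # hypothetical cutoff; tune it for your own system
busy = stats["%iowait"] > IOWAIT_THRESHOLD
print(f"CPU {stats['CPU']}: iowait {stats['%iowait']}%, waiting on disk: {busy}")
```

On a live system you would feed the function real mpstat output instead of the embedded sample.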
If you want to see real-time statistics, this is also possible. Let’s say you want to see CPU statistics at 1 second intervals. You can run
mpstat 1 and then you’ll get the same readouts, except that they will reflect the current state of the system rather than the aggregated statistics since system startup. If you only want to see readings for a limited time, say one minute, you can run
mpstat 1 60. This will update the statistics 60 times at one-second intervals.
Earlier versions of
mpstat may show different information. The version used here is 5.0.3, taken from the Debian testing repository.
Virtual memory statistics with vmstat
Next on the list is
vmstat. As the name implies, this utility reports virtual memory statistics. Running
vmstat with no options produces the following output:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      8   8040  26728 300256    0    0    51    37    9    39  1  1 97  1
There’s quite a bit of info crammed into
vmstat’s output. The first two fields describe processes waiting for run time and processes in uninterruptible sleep, respectively. The next four fields cover the amount of virtual memory in use (
swpd), free memory, memory used as buffers, and memory being used as cache. The next two fields show how much memory is being swapped in and out of disk per second.
The next two fields (under
io) show blocks received from block devices and blocks sent to block devices. The fields under
system display interrupts per second, and context switches per second. Finally, the fields under
cpu display time running user-level code (
us), system code, time spent idle, and time spent waiting for I/O. (Yes, there is some overlap between utilities.)
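To make the field-by-field description concrete, here is a small Python sketch that pairs each column name with its value from the sample output above. The function name parse_vmstat is hypothetical, and the sketch assumes the 16-column layout shown in this article.

```python
# Sketch: map the second header line of vmstat's default output to its values.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      8   8040  26728 300256    0    0    51    37    9    39  1  1 97  1
"""

def parse_vmstat(text):
    """Return {column name: integer value} for the first data row."""
    lines = text.strip().splitlines()
    names = lines[1].split()                    # r, b, swpd, free, ...
    values = [int(v) for v in lines[2].split()]
    return dict(zip(names, values))

fields = parse_vmstat(SAMPLE)
# Non-zero si/so means memory is being swapped in from or out to disk.
swapping = fields["si"] > 0 or fields["so"] > 0
print(f"run queue: {fields['r']}, blocked: {fields['b']}, swapping: {swapping}")
```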
That’s just the default output, however. Using
vmstat, it’s possible to drill down a bit deeper and look at other information. For example, the
-p option displays detailed statistics about a given partition.
vmstat -d displays disk statistics. If you’d like a one-time display of event counters and memory stats, use
vmstat -s -S M, which produces output like this:
          377 M total memory
          373 M used memory
          239 M active memory
          105 M inactive memory
            3 M free memory
           20 M buffer memory
          303 M swap cache
          760 M total swap
            0 M used swap
          760 M free swap
       499224 non-nice user cpu ticks
         6027 nice user cpu ticks
       134899 system cpu ticks
     31925112 idle cpu ticks
       172288 IO-wait cpu ticks
        32025 IRQ cpu ticks
        74892 softirq cpu ticks
     16745809 pages paged in
     12265466 pages paged out
            0 pages swapped in
            2 pages swapped out
    350426106 interrupts
     13250775 CPU context switches
   1084569411 boot time
        16563 forks
The -s option tells
vmstat to produce the table output; the
-S option tells
vmstat to put memory statistics in megabytes.
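Because each line of this report is simply a number followed by a label, it is easy to load into a script. The Python sketch below does that for an abridged copy of the output above and works out the share of memory in use. The function name parse_vmstat_s is my own, and the handling of the "M" unit assumes the -S M form used here.

```python
# Sketch: turn the "value label" lines of vmstat -s -S M into a dict.
SAMPLE = """\
377 M total memory
373 M used memory
3 M free memory
760 M total swap
0 M used swap
31925112 idle cpu ticks
172288 IO-wait cpu ticks
16563 forks
"""

def parse_vmstat_s(text):
    """Return {label: integer value}, dropping the unit from memory lines."""
    counters = {}
    for line in text.strip().splitlines():
        value, label = line.strip().split(None, 1)
        # With -S M, memory lines carry a unit ("M") before the label.
        if label.startswith("M "):
            label = label[2:]
        counters[label] = int(value)
    return counters

counters = parse_vmstat_s(SAMPLE)
used_pct = 100.0 * counters["used memory"] / counters["total memory"]
print(f"memory in use: {used_pct:.1f}%")
```

Keep in mind that "used memory" on Linux includes buffers and cache, so a high percentage here is not by itself a sign of trouble.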
vmstat -m will give you the slabinfo output (taken from
/proc/slabinfo), which is more information about cached objects in the Linux kernel than most folks need (or want…). But it’s there if you need it. Speaking of slabinfo, the
slabtop utility can be used to produce a
top-like display of kernel slab information.
Note that some of
vmstat’s options only work with kernels newer than 2.5.70.
CPU and I/O statistics with iostat
Next up is
iostat. Like vmstat, this utility displays information about your system’s CPU and I/O, though in a different format and with some different information.
The default output of
iostat is the average CPU and device utilization since system startup. Running
iostat n (where n is an interval in seconds) will produce a display of device and CPU utilization since the last report. For example,
iostat 5 produces the following:
Linux 2.6.4-52-default (yggdrasil)    05/18/2004

avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.54    0.02    0.74    0.53   97.18

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               2.15       101.72        75.06   33511064   24729224
hdc               0.00         0.02         0.00       7656          0

avg-cpu:  %user   %nice    %sys %iowait   %idle
           5.20    0.00    2.40    0.20   92.20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               6.20        30.40        84.80        152        424
hdc               0.00         0.00         0.00          0          0
The first display is the average since system boot; the second is current. If you’d like to display only CPU usage, the
-c option can be used to display only the CPU report. The
-d option is used to display only disk utilization.
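Since the first report is always the since-boot average, a script usually wants to skip it and look at the later ones. The Python sketch below splits the sample output above into one device table per report. The function name device_reports is hypothetical, and the parsing assumes the block-based column layout shown in this article.

```python
# Sketch: split iostat output into one device table per report.
SAMPLE = """\
Linux 2.6.4-52-default (yggdrasil)  05/18/2004

avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.54    0.02    0.74    0.53   97.18

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               2.15       101.72        75.06   33511064   24729224
hdc               0.00         0.02         0.00       7656          0

avg-cpu:  %user   %nice    %sys %iowait   %idle
           5.20    0.00    2.40    0.20   92.20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               6.20        30.40        84.80        152        424
hdc               0.00         0.00         0.00          0          0
"""

def device_reports(text):
    """Return one {device: {column: value}} dict per report in the output."""
    reports = []
    columns = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            columns = None              # a blank line ends the device table
        elif parts[0] == "Device:":
            columns = parts[1:]         # start of a new device table
            reports.append({})
        elif columns is not None:
            reports[-1][parts[0]] = dict(zip(columns, map(float, parts[1:])))
    return reports

since_boot, current = device_reports(SAMPLE)
print("hda tps since boot:", since_boot["hda"]["tps"])
print("hda tps in the last interval:", current["hda"]["tps"])
```

Note that in the interval report, Blk_read is just Blk_read/s multiplied by the 5-second interval (30.40 × 5 = 152).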
If you’d like slightly more readable output, you can use the
-k option to display reads in kilobytes rather than blocks. There are several other options worth looking into; be sure to check the
iostat manpage for all the options. As with the other utilities, some
iostat functionality is dependent on having a 2.5 or newer kernel.
Getting system activity information with sar
The next utility is
sar. This utility can be used to display system activity information, or to collect information for further study. The information collated/displayed by
sar is largely available using the other utilities that I’ve already discussed.
sar also displays a wealth of information not available through the other utilities. Running
sar -A, for example, will display just about every relevant piece of information you’d want about your system since midnight the current day. I won’t paste the output here, as it’s quite lengthy. Give it a try on your own system, though.
If your system isn’t collecting the data, the command will fail with something like “Cannot open /var/log/sa/sa05: No such file or directory.” In this case, you’ll need to enable logging using
sadc (the System activity data collector). I believe that most Linux distros include an init script for this, though it might not be enabled by default.
Having consistent system troubles and not quite sure what’s causing them? This is where
sar shines, because it produces output that identifies system load by time. Let’s say you have a system that bogs down or even fails every night, but you’re not quite sure why. Maybe you suspect that disk I/O is killing your system, so you run
sar -b, which displays the I/O and transfer rate statistics in ten-minute intervals, like this:
11:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
11:10:00 AM      0.08      0.00      0.08      0.01      1.09
11:20:00 AM      0.09      0.00      0.09      0.00      1.31
11:30:00 AM      0.13      0.00      0.13      0.00      2.06
11:40:00 AM      0.17      0.01      0.16      0.34      2.85
That’s just a small sample of the output, of course. It shows total transfers per second (tps), read requests per second (rtps), write requests per second (look to see whether
wtps is outstripping
rtps), and blocks read and written per second to your drives.
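When you have a whole day of these ten-minute samples, picking out the worst interval by eye gets tedious. The Python sketch below parses the sample rows above and reports the interval with the highest tps. The function name parse_sar_b is my own, and the sketch assumes 12-hour timestamps as shown here; real sar output also includes a banner line and an "Average:" row that you would need to skip.

```python
# Sketch: find the busiest interval in sar -b style output.
SAMPLE = """\
11:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
11:10:00 AM      0.08      0.00      0.08      0.01      1.09
11:20:00 AM      0.09      0.00      0.09      0.00      1.31
11:30:00 AM      0.13      0.00      0.13      0.00      2.06
11:40:00 AM      0.17      0.01      0.16      0.34      2.85
"""

def parse_sar_b(text):
    """Return a list of dicts, one per sample row, keyed by column name."""
    lines = text.strip().splitlines()
    columns = lines[0].split()[2:]          # drop the "11:00:00 AM" timestamp
    rows = []
    for line in lines[1:]:
        parts = line.split()
        row = dict(zip(columns, map(float, parts[2:])))
        row["time"] = " ".join(parts[:2])   # e.g. "11:10:00 AM"
        rows.append(row)
    return rows

rows = parse_sar_b(SAMPLE)
busiest = max(rows, key=lambda r: r["tps"])
print(f"busiest interval ended at {busiest['time']} with {busiest['tps']} tps")
```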
Another handy use for
sar is to display network statistics. Running
sar -n FULL will produce a full report of network statistics, including errors and sockets in use.
To start at a specific time, you can use the
-s option. For example, if you only want statistics since 6 AM, run sar -s 06:00:00.
You can also use
sar to display system load queues (
-q), memory and swap space utilization (
-r), and a number of other statistics about your system. It definitely pays to spend a little time reading the sar manpage.
Visualizing system activity with isag
If all of the textual output doesn’t do much for you, there’s
isag. This little utility produces visual graphs of system activity that make it easy to see what’s going on with your system. It also utilizes data produced by
sadc. It expects to find files under
/var/log/sa in the format sann, where
nn is the day of the month. On my system, the files are in the format
sa.YYYY_MM_DD instead, so I have to use
isag -m sa.2004_[0-9][0-9]_[0-9][0-9] to let
isag know what “mask” to expect for the datafiles.
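The mask is a shell-style glob pattern. If you want to check which datafiles a given mask would pick up before handing it to isag, a quick Python sketch like the one below can help. It assumes isag’s matching behaves like ordinary glob matching, and the filenames in the list are made up for illustration.

```python
# Sketch: test which datafile names a glob-style mask would match.
from fnmatch import fnmatchcase

MASK = "sa.2004_[0-9][0-9]_[0-9][0-9]"

# Hypothetical filenames, not taken from a real /var/log/sa directory.
names = ["sa.2004_05_18", "sa.2004_05_19", "sa05", "sa.2003_12_31"]

matches = [name for name in names if fnmatchcase(name, MASK)]
print(matches)
```

Only the two names in the sa.2004_MM_DD form match; the default-style sa05 name and the 2003 file do not.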
Once you start
isag, you can choose the datafile you want to display and then the type of chart you want to see. There are 10 different charts displayed by
isag, including I/O transfer rate, CPU utilization, inode status, and paging statistics. Need to produce visual proof that a system is too heavily loaded to get budget approval to buy a new one? Show your boss and the beancounters a nice colorful graph produced by
isag that shows the system is spending too much time swapping data to disk.
Figure 1: isag’s output
Figure 1 is an example of
isag’s output, displaying memory and swap used.
As you can see, it’s much easier to grok what’s going on when you produce an easy-to-read graph of the data. Using
isag, it’s also easy for me to switch between daily graphs to compare daily averages as well as hourly ones. It’s a much faster way to pick out trends in system usage from the data collected by sadc.
There’s plenty of information available for system troubleshooting if you know where to find it. With these tools at your disposal, you’ll be able to learn a lot more about your system and its performance.
(Originally written for and published on UnixReview.com, which is now defunct. Revived from Archive.org because it still seems useful.)