Checking Block IO in Linux

For many workloads, performance on current systems is most often limited by IO, be it memory IO (as looked at in “Linux and Memory”) or file system IO. File system performance in turn depends strongly on the underlying block device(s).

Commonly you can choose two of the following three attributes when installing a new storage device: reliability <-> speed <-> price

  • A reliable and fast device will have a high price tag.
  • A fast and cheap device will probably not be as reliable as you would like.
  • A cheap and reliable device will most likely not be very fast.

Whatever solution you choose, there may still be room to tune the available bandwidth to the storage. The hardware you buy sets the limits, of course, but it does not guarantee speed.

The best benchmark you can use for your installation is the workload that the system is planned for. However, a complete system often has many tunables that affect each other in one way or another, and finding the optimal combination can be quite a task. A number of tunables are described in the documentation for the Real Time Extension for SLES. I normally prefer to look at smaller building blocks such as memory, block IO or CPU configuration. On Linux, several tools have served me well when doing IO tests. The following tools are not meant so much for benchmarking as for measuring how the system is performing.

iostat

One of the tools I like to use when looking at block IO performance is “iostat”. For example, on my desktop machine, which is set up with RAID1 for all of the different partitions, it looks like this:

# iostat -x 2 2
Linux 2.6.37.1-1.2-desktop (example) 	03/04/2011 	_x86_64_	(4 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.55     0.40    0.10    1.06     9.74   125.15   115.93     0.04   35.56   8.47   0.99
sdb               0.77     0.42    0.31    1.05    65.57   125.15   140.29     0.15  108.85   7.29   0.99
md1               0.00     0.00    0.01    0.03     0.06     0.22     8.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.16    0.17     9.91     2.86    38.85     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     6.70     0.00   38.38  38.36   0.00
md0               0.00     0.00    0.02    0.00     0.18     0.00     7.49     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.45    0.54    56.39   120.95   180.39     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

The first report shows the statistics accumulated since boot. When the command is run with an interval, each following report shows the values for that interval. “-x” selects extended statistics, which include the utilization of the block device. With bonnie running in the background, the output looks a bit different. Depending on the phase bonnie is currently in, there are more write or more read operations. Here are two examples of how this looks:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   526.50    3.00  180.50    56.00 170103.00   927.30    34.97  196.19   5.42  99.40
sdb               0.00   531.00    1.00  167.50    20.00 161779.00   960.23   139.02  805.91   5.93 100.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    4.00    0.50    76.00     1.00    17.11     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00  680.00     0.00 164000.00   241.18     0.00    0.00   0.00   0.00
....................
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   243.00    0.00  101.00     0.00 82494.50   816.78     6.06   60.06   3.58  36.20
sdb             380.50   259.50  375.50   80.00 96768.00 77898.50   383.46   143.99  276.78   2.20 100.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00  756.00    0.00 96768.00     0.00   128.00     0.00    0.00   0.00   0.00

There are quite a few options to that command, and there is a nicely written manual page available. If you want to run the command continuously, just give it an interval as argument but no count. “iostat” is available in the sysstat package.
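
The columns of the extended output are related to each other, which makes it possible to cross-check a line by hand. The following is a minimal sketch in Python, not part of iostat itself: the function names are made up for illustration, it assumes that iostat counts sectors in units of 512 bytes, and it treats %util as roughly the number of requests per second multiplied by svctm. The numbers are taken from the sda line of the first bonnie sample above.

```python
# Assumption: iostat reports rsec/s and wsec/s in 512-byte sectors.
SECTOR_BYTES = 512


def throughput_mb_s(sectors_per_s):
    """Convert a sectors-per-second rate into MB/s (1 MB = 10**6 bytes)."""
    return sectors_per_s * SECTOR_BYTES / 1e6


def avg_request_sectors(r_s, w_s, rsec_s, wsec_s):
    """Average request size in sectors, comparable to the avgrq-sz column."""
    ops = r_s + w_s
    return (rsec_s + wsec_s) / ops if ops else 0.0


def est_util_percent(r_s, w_s, svctm_ms):
    """Rough utilization estimate: requests per second times service time.

    svctm is given in milliseconds, so dividing by 10 turns
    "busy milliseconds per second" into a percentage.
    """
    return (r_s + w_s) * svctm_ms / 10.0


# sda during the bonnie write phase (values copied from the sample above)
r_s, w_s, rsec_s, wsec_s, svctm = 3.00, 180.50, 56.00, 170103.00, 5.42

print("write throughput: %.2f MB/s" % throughput_mb_s(wsec_s))
print("avg request size: %.2f sectors" % avg_request_sectors(r_s, w_s, rsec_s, wsec_s))
print("estimated util:   %.2f %%" % est_util_percent(r_s, w_s, svctm))
```

For this line the sketch gives a write throughput of about 87 MB/s, and the computed request size and utilization come out close to the avgrq-sz and %util values that iostat printed.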

vmstat

A second tool that I find quite interesting is “vmstat”. It shows more of the internals of the kernel. To look at disk IO, use the command “vmstat -d”:

# vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
sda   141952 178179 29577018  477693 237884 300621 90710884 44753954      0   2022
sda1    1838  49969  413212    4300    243      0     405    8421      0     10
sda2    1177  30730  255088    3374    904  21316  175002  128632      0     93
sda3   13692   1967  899900  145632  98957   1421  529386 1283316      0   1033
sda4  124979  95338 28005296  324132 137715 277884 90006091 43333084      0   1101
sdb   172348 201881 36728314  678824 235909 302600 90710884 73151033      0   2040
sdb1    1845  50399  417206    4300    243      0     405    9682      0     12
sdb2    1151  29726  246895    3768    890  21330  175002  136778      0    105
sdb3   12190   1387  838756  245109  98955   1423  529386 1388551      0   1092
sdb4  156880 119698 35217839  425346 135752 279847 90006091 71615593      0   1168
md1     1220      0    9754       0  21807      0  174440       0      0      0
md2    29177      0 1738453       0  26124      0  431345       0      0      0
sr0       77     52     516    2955      0      0       0       0      0      2
md0     3508      0   26650       0    101      0     204       0      0      0
md3   496707      0 63222677       0 378971      0 89932831       0      0      0

This command can also be run in continuous mode by putting a timing interval after the command. However, for me this becomes confusing quite quickly because of the amount of data generated. To see what happens, I normally grep for the devices that I am interested in. The following output was taken while running bonnie on the same system:

# vmstat -d 2 | (head -n2; grep -e md3 -e sda4 -e sdb4)
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
sda4  124979  95338 28005296  324132 137943 278400 90118575 43341665      0   1103
sdb4  156880 119698 35217839  425346 135982 280362 90119567 71622604      0   1170
md3   496707      0 63222677       0 379699      0 90105057       0      0      0
sda4  124979  95338 28005296  324132 138341 279733 90502993 43483155      0   1105
sdb4  156880 119698 35217839  425346 136349 281695 90473169 71762657      0   1172
md3   496707      0 63222677       0 382386      0 90752849       0      0      0
sda4  124979  95338 28005296  324132 138745 280943 90901549 43699088      0   1107
sdb4  156880 119698 35217839  425346 136763 282905 90882421 72024008      0   1174
md3   496707      0 63222677       0 383780      0 91089049       0      0      0
sda4  124979  95338 28005296  324132 139132 282096 91282987 43927977      0   1109
sdb4  156880 119698 35217839  425346 137115 284058 91228331 72272559      0   1176
md3   496707      0 63222677       0 385346      0 91466262       0      0      0

In the above example, you can see that the write counters (total, merged and sectors) are increasing rapidly, while nothing happens on the reading side. This is because bonnie was just doing the char write benchmark. The “vmstat” command is available from the “procps” package.
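
Because “vmstat -d” prints cumulative counters, the actual rate has to be derived from the difference between two samples. The following Python sketch shows one way to do that for the sda4 lines above. The function names and the field names are made up for illustration, and the sketch assumes 512-byte sectors and the 2 second interval used in the command above.

```python
# Assumption: the sector counters are in units of 512 bytes.
SECTOR_BYTES = 512

# Hypothetical names for the ten numeric columns of a "vmstat -d" data line.
FIELDS = ["r_total", "r_merged", "r_sectors", "r_ms",
          "w_total", "w_merged", "w_sectors", "w_ms",
          "cur", "sec"]


def parse_vmstat_d(line):
    """Split one data line into the device name and a dict of counters."""
    parts = line.split()
    return parts[0], dict(zip(FIELDS, (int(p) for p in parts[1:])))


def write_rate_mb_s(prev, curr, interval_s):
    """Write throughput in MB/s between two cumulative samples."""
    delta_sectors = curr["w_sectors"] - prev["w_sectors"]
    return delta_sectors * SECTOR_BYTES / interval_s / 1e6


# Two consecutive sda4 samples, copied from the output above.
first = "sda4  124979  95338 28005296  324132 137943 278400 90118575 43341665      0   1103"
second = "sda4  124979  95338 28005296  324132 138341 279733 90502993 43483155      0   1105"

dev, prev = parse_vmstat_d(first)
_, curr = parse_vmstat_d(second)

print("%s wrote %.2f MB/s" % (dev, write_rate_mb_s(prev, curr, 2)))
print("%s read sectors delta: %d" % (dev, curr["r_sectors"] - prev["r_sectors"]))
```

Between these two samples sda4 wrote roughly 98 MB/s, while the read counters did not change at all.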
