Migrating a Xen VM to KVM on openSUSE

Xen and KVM are the two major virtualization technologies that are freely available on Linux. Although their performance is quite comparable, there can still be good reasons to convert a Xen virtual machine to a KVM virtual machine.

Xen and KVM both use very similar images. However, there are some subtle differences in the setup:

  1. Xen block devices use the names “xvd?” where KVM uses “vd?”.
  2. The serial device in Xen is “xvc0” while on KVM it is “ttyS0”.
  3. Xen does not use the bootloader from the image but directly accesses the boot directory, while KVM actually runs the bootloader installed in the image.
  4. The modules that are needed for block devices are different.
  5. Although virsh supports both Xen and KVM, the XML configuration is still somewhat different.

The easiest way would be to install the necessary packages and make the needed modifications on a running Xen guest. However, if you no longer have your Xen host, you are out of luck. Therefore, let's do the migration of an image entirely on the KVM host.

First, make the image accessible with “kpartx”. To do this, run the command:

> kpartx -a disk0.raw -v
add map loop0p1 (253:1): 0 319488 linear /dev/loop0 2048
add map loop0p2 (253:2): 0 16435200 linear /dev/loop0 321536

Now, determine which one is a real file system:

> lsblk -f /dev/mapper/loop0p?
NAME           FSTYPE LABEL MOUNTPOINT
loop0p1 (dm-1) swap
loop0p2 (dm-2) ext3

Obviously, the device “/dev/mapper/loop0p2” holds the root file system that we need to access. Let's mount it and add the needed device nodes:

mount /dev/mapper/loop0p2 /mnt
mount -o bind /dev /mnt/dev
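
Picking the partition can also be scripted. The following is only a sketch: the helper name pick_fs_partitions is made up for this example, and it assumes headerless input with just the name and file system type columns, as printed by “lsblk -ln -o NAME,FSTYPE”, rather than the wider “lsblk -f” listing shown above.

```shell
# Hypothetical helper: read "NAME [(dm-N)] FSTYPE" lines on stdin and print
# the names of all partitions that carry a file system other than swap.
# Lines without a file system type are skipped.
pick_fs_partitions() {
    awk 'NF >= 2 && $NF != "swap" && $NF !~ /^\(/ { print $1 }'
}
```

It could be used as “lsblk -ln -o NAME,FSTYPE /dev/mapper/loop0p? | pick_fs_partitions”, which for the image above should leave only loop0p2.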

Now, copy the needed kernel to the file system and do a “chroot” there:

cp kernel-default.rpm kernel-default-base.rpm /mnt/tmp
chroot /mnt
mount /sys
mount /proc

Next, update several configuration files:

  1. /etc/inittab : comment out the line starting with S0 and containing xvc0
  2. /etc/inittab : uncomment the line starting with S0 and containing ttyS0. Change the speed to 115200 if needed.
  3. /etc/securetty : remove xvc0 and add ttyS0
  4. /etc/sysconfig/kernel : remove modules starting with xen from “INITRD_MODULES” and add “virtio_blk virtio” instead.
  5. /etc/fstab : remove the “x” from “/dev/xvda” (and from any other Xen block devices listed there)
  6. /boot/grub/device.map : change from “/dev/xvda” to “/dev/vda”
  7. /boot/grub/menu.lst : comment out the line starting with gfxmenu
  8. /boot/grub/menu.lst : change the kernel and initrd lines to contain the kernel starting with “vmlinuz” and the default initrd as available in “/boot”.
  9. /boot/grub/menu.lst : fix the kernel parameters to contain the right root and console device, similar to: “root=/dev/vda2 console=ttyS0”.
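
Two of these edits are mechanical enough to sketch as filters. This is an illustration only: the function names are made up, both filters read stdin and write stdout so that nothing is modified in place, and the second one assumes that INITRD_MODULES is a single line with a double-quoted value.

```shell
# Rewrite Xen block device names (/dev/xvdX) to virtio names (/dev/vdX).
# Suitable for fstab-style text; the input is not changed in place.
xen_to_kvm_devnames() {
    sed 's#/dev/xvd\([a-z]\)#/dev/vd\1#g'
}

# Drop modules starting with "xen" from the INITRD_MODULES line and
# append the virtio modules; all other lines pass through unchanged.
fix_initrd_modules() {
    awk -F'"' '/^INITRD_MODULES=/ {
        n = split($2, mods, " "); out = ""
        for (i = 1; i <= n; i++)
            if (mods[i] !~ /^xen/) out = out mods[i] " "
        print "INITRD_MODULES=\"" out "virtio_blk virtio\""
        next
    }
    { print }'
}
```

Inside the chroot, something like “xen_to_kvm_devnames < /etc/fstab > /etc/fstab.new” lets you review the result before replacing the original file.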

Now, it is time to install the kernel:

rpm -Uhv /tmp/kernel-default.rpm /tmp/kernel-default-base.rpm

The only remaining task now is running “mkinitrd”. Some error messages will show up about the right root device not being available, which is expected inside the chroot. The command will usually work anyway.

To finish the work on the image, only some cleanup is needed:

  1. umount /sys
  2. umount /proc
  3. exit
  4. umount /mnt/dev
  5. umount /mnt
  6. kpartx -d disk0.raw

To start the image, the easiest way is to use “vm-install” and select the option for activating an existing image, “I have a disk or disk image …”. If it is just for testing, you can also use a command like this:

qemu-kvm \
-drive file=/kvm/images/disk0.raw,id=root,if=virtio \
-m 1024M -nographic

This should bring up your previous Xen image on a KVM machine.

Posted in KVM, openSUSE, Xen

Xen and serial console over IPMI (SOL)

Recently I had to configure a serial over LAN (SOL) console on a larger Supermicro server (a 2042G-TRF, for those who are interested) that should run Xen. This turned out to be not too easy, and several issues had to be resolved.

The first thing I had to do was upgrade the BIOS. The original BIOS shared the IPMI IRQ with serial console0. This resulted in scrambled console output, regardless of what I tried. Current BIOS versions put IPMI on IRQ 5. Before you try anything, make sure that IPMI and a real serial console do not share an IRQ.

Then I tried to add COM3 to the console list of Xen. Unfortunately, there is no warning that Xen supports only two serial consoles, which are COM1 and COM2. The console just won’t work. Luckily, there is this nice manual to be found at:

/usr/share/doc/packages/xen/pdf/user.pdf

In there, you will find lots of parameters and a somewhat sparse description of how to set up the serial console. It turned out that I had to reconfigure COM2 (COM1 would likely have worked as well) to a different I/O port and IRQ. To get the current values of the I/O port and IRQ, either look into the BIOS and write down the values there, or boot a default kernel without Xen and run the following command:

dmesg | grep ttyS2

The result should look similar to this:

<6>[   10.232117] 00:09: ttyS2 at I/O 0x3e8 (irq = 5) is a 16550A

This means we need to set COM2 to I/O port 0x3e8 and IRQ 5. The only things now missing are the speed and mode of the serial connection. In my case, I chose 115200 baud with 8n1 (8 data bits, no parity, 1 stop bit).
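
Extracting the two values from the kernel log and turning them into the Xen parameter can be sketched as a small filter. The function name xen_com_param is made up for this example, and it assumes the dmesg line format shown above and the 8n1 mode.

```shell
# Hypothetical helper: read dmesg lines on stdin and print a Xen
# "com2=<speed>,8n1,<io-port>,<irq>" parameter for every ttyS line found.
# The speed defaults to 115200 and can be given as first argument.
xen_com_param() {
    speed="${1:-115200}"
    sed -n "s#.*ttyS[0-9]* at I/O \(0x[0-9a-fA-F]*\) (irq = \([0-9]*\)).*#com2=${speed},8n1,\1,\2#p"
}
```

Running “dmesg | grep ttyS2 | xen_com_param” on the line above should give com2=115200,8n1,0x3e8,5.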

Now let's put all of this together. First comes the grub configuration, which has two parts. Part one is configuring grub itself so that its menu is also displayed on the serial console. Commonly, this is done in the global section at the beginning of /boot/grub/menu.lst :

serial --unit=2 --speed=115200
terminal --timeout=8 serial console

Unit 2 specifies the third serial port (counting starts at 0).

Part two of the grub configuration affects the Xen section. There, add the console parameters like this:

kernel (hd0,0)/xen.gz console=vga,com2 com2=115200,8n1,0x3e8,5
module (hd0,0)/vmlinuz-2.6.32.43-0.13-xen root=/dev/md1 console=tty0 console=xvc0,115200

In my case, root is on a mirrored RAID; you just have to add the console parameters and the com2 parameter to your own configuration. Note that the last console given in the module line is considered the system console by the kernel.

To make all of this permanent and survive the next kernel update, also add the parameters to /etc/sysconfig/bootloader:

XEN_KERNEL_APPEND="console=tty0 console=xvc0,115200"
XEN_APPEND="console=vga,com2 com2=115200,8n1,0x3e8,5"

Again, your options may vary.

The last configuration step is activating a getty for xvc0. This is done in /etc/inittab. Search for a line starting with ‘#S0’ and add a line like the following there:

S0:12345:respawn:/sbin/agetty -L 115200 xvc0 vt102

After doing all of this, you are ready to test your SOL console. For me, the following command works nicely:

ipmitool -I lanplus -H <IPMI-IP-Address> -U <IPMI-User> sol activate
Posted in IPMI, openSUSE, SOL, Xen

iTunes on openSUSE 11.4

In order to use iTunes University, I had to install iTunes on my notebook, which is running openSUSE 11.4. It is quite easy to get an old version of iTunes from http://www.oldapps.com/itunes.php, but trying to install it with wine on a 64-bit machine always failed.

In the end, I could install a 32-bit version by doing the following:

rm -rf ~/.wine # don't do this if you have other wine applications installed; it wipes any previous wine installation
export WINEARCH=win32
wine ~/Downloads/iTunesSetup1021.exe

To start iTunes later on, it is sufficient to run the following command:

wine c:/Program\ Files/iTunes/iTunes.exe

Unfortunately, with most of the available courses on iTunes University, I get conversion errors. To circumvent this, I just download the respective courses and view them with mplayer:

cd ~/Music/iTunes/iTunes\ Media/iTunes\ U/
mplayer -vo xv <course file>

This is not too nice, but at least it allows me to view iTunes U courses without the need for an operating system I do not own.

Posted in iTunes, openSUSE, wine

Simple udev Manipulations

When exploring the /sys file system, one can find quite a number of tunables that look interesting and have a real effect on system behavior. In particular, experimenting with different read ahead values for block devices, or setting the stripe_cache_size for raid5, improved the performance of my software raid a lot.

To make these settings permanent, one can use /etc/init.d/boot.local. More elegant, however, is to add a udev rule that makes the necessary changes for you. It turns out that this is quite easy. For example, the read ahead for block devices is found in /sys/block/*/bdi/read_ahead_kb. To check your current read ahead of /dev/sda, you may use:

cat /sys/block/sda/bdi/read_ahead_kb

However, you may also want to change that setting. To accomplish that, just echo the desired number into the read ahead file:

echo 4100 > /sys/block/sda/bdi/read_ahead_kb
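
To apply the same value to several disks at once, a loop over the sysfs entries is enough. This is a minimal sketch: the function name set_read_ahead is made up, and the optional second argument (an alternative sysfs root) exists only so the function can be tried on a scratch directory first.

```shell
# Hypothetical helper: write the given read ahead value (in KB) to all
# sd* block devices below the given sysfs root (default: /sys).
# Entries that do not exist or are not writable are skipped silently.
set_read_ahead() {
    kb="$1"
    sysroot="${2:-/sys}"
    for f in "$sysroot"/block/sd*/bdi/read_ahead_kb; do
        [ -w "$f" ] || continue
        echo "$kb" > "$f"
    done
}
```

Called as root with “set_read_ahead 4100”, it does for every sd* disk what the echo above does for sda.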

To apply the setting automatically when the device appears, just add the file /etc/udev/rules.d/83-ra.rules with the following rule:

cat /etc/udev/rules.d/83-ra.rules
# increase readahead for sd* devices
ACTION=="add", KERNEL=="sd*", ATTR{bdi/read_ahead_kb}="4100"

When adding a udev rule for this, one just needs to know that “==” (two equal signs) is a check, and “=” (one equal sign) is a setting.
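
To make the difference visible, the rule can be generated from a template in which the match keys use “==” and the attribute assignment uses “=”. The function name make_ra_rule is invented for this sketch; it only prints the rule text and does not install anything.

```shell
# Hypothetical helper: print a udev rule that sets the read ahead.
# $1 is the kernel device name pattern (checked with ==),
# $2 is the read ahead value in KB (assigned with =).
make_ra_rule() {
    printf 'ACTION=="add", KERNEL=="%s", ATTR{bdi/read_ahead_kb}="%s"\n' "$1" "$2"
}
```

For instance, “make_ra_rule 'sd*' 4100” reproduces the rule line shown above.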

Note that some devices may take the read ahead from the underlying device; notably, when using drbd, setting the read ahead on the underlying device will have the desired effect.

Posted in block devices, Hardware, openSUSE

Xen – Removing Virtual Frame Buffer

I recently had to remove the frame buffer device from a Xen guest for debugging purposes. The frame buffer configuration in the sxp file looks like this:

(device
    (vfb
        (vncunused 1)
        (uuid ????????-????-????-????-??????????)
        (keymap en)
        (location 127.0.0.1:5909)
        (type vnc)
        (xauthority /root/.Xauthority)
    )
)

However, after removing this section from the configuration, the guest would not boot anymore. It just hung using 100% CPU and did not come up. Google was not a big help, and I had to experiment a little myself. In the end, it turned out that one also has to remove the virtual keyboard configuration, which looks like this:

(device (vkbd (uuid ????????-????-????-????-??????????) (backend 0)))

After removing this section as well, the guest came up again, and I could even still use the serial console with xm console.

Posted in openSUSE, Xen

openSUSE on HPPA

This might be surprising to some, but I do have a big part of openSUSE:Factory built for hppa. What I have is a dedicated build service machine that does nothing but run an hppa build service, plus 5 workers that do the build jobs.

The system is set up from the build service appliance, which runs the machine from a USB stick. In addition, some local storage is attached for all the sources and built packages. For more about this appliance, see openSUSE:Build Service Appliance.

To set up the system, a source project is needed as well as a base system that enables the workers to do their work. The project configuration looks like the following:

<project name="HP-Factory">
  <description>
  </description>
  <link project="openSUSE.org:openSUSE:Factory"/>
  <person role="maintainer" userid="Admin"/>
  <person role="bugowner" userid="Admin"/>
  <debuginfo>
    <disable/>
  </debuginfo>
  <build>
    <enable/>
  </build>
  <publish>
    <disable/>
    <disable repository="standard" arch="hppa"/>
  </publish>
  <repository rebuild="direct" linkedbuild="all" name="standard">
    <path repository="standard" project="HP-Base"/>
    <arch>hppa</arch>
  </repository>
</project>

This means:

  • All the sources are taken from openSUSE.org from the project openSUSE:Factory. In other words, there is no need for me to care about updating packages other than those that I want to be different in my distribution.
  • Publishing is not needed to run the build service. All the binaries can still be retrieved with osc getbinaries, and they are also found in
    /obs/build/HP-Factory/standard/hppa
  • There is a base project named HP-Base that is needed to set up the chroot systems on my workers. It is no longer needed once the packages are available in HP-Factory, but whenever one has to inject a package, this is the place to do it.

Now I had the problem that, for some reason, the package bash did not build correctly and could not be used by the info package anymore. As you might imagine, this blocks any progress in building this distribution. The solution to this is:

  • Build bash manually on some worker. I like to keep some chroot system for manual builds with current packages. There one can use rpmbuild -ba to build the package manually.
  • Copy the resulting package to the buildservice appliance below
    /obs/build/HP-Base/standard/hppa/:full
  • On the build service appliance, run
    obs_admin --rescan-repository HP-Base standard hppa

After this, the build service starts its work again. One final note: I do this distribution only as a hobby, and it is quite likely that it will never be published. If you are interested in this distribution, I will be happy to give you any packages that I have so far. Please do not expect too much; you might be able to install everything if you can do all of it manually. There is no installation or configuration support by tools in place.

Of course, it is also difficult for my few workers to build openSUSE:Factory. Commonly, I would need something like a week without checkins to be able to fully build those packages that can build. However, this does not happen too often.

Posted in Build Service, HPPA, openSUSE

Checking Block IO in Linux

For many workloads, performance on current systems is most often limited by IO, be it memory IO as looked at in Linux and Memory, or file system IO. File system performance strongly depends on the underlying block device(s).

Commonly, you can choose two of the following three attributes when installing a new storage device: reliability <-> speed <-> price

  • A reliable and fast device will have a high price tag
  • A fast and cheap storage will probably not be as reliable as you would like
  • And a cheap and reliable device will most likely not be very fast.

Whatever solution you choose, there might still be some space left to tune the available bandwidth to the storage. The hardware you buy is of course setting limits, but it does not guarantee speed.

The best benchmark you can use for your installation is running the workload that the system is planned for. However, there are often many different tunables in the complete system that affect each other in one way or another, and trying to find the optimal solution can be quite a task. A number of tunables are found in the documentation for the Real Time Extension for SLES. I normally prefer to go for smaller building blocks like memory, block IO, or maybe CPU configurations. When running Linux, there are several tools that have served me well when doing IO tests. The following tools are not so much meant for benchmarking but more as a measure of how the system is performing.

iostat

One of the tools I like to use when looking at block IO performance is “iostat”. For example, on my desktop machine, which is set up as raid1 for all of the different partitions, it looks like this:

# iostat -x 2 2
Linux 2.6.37.1-1.2-desktop (example) 	03/04/2011 	_x86_64_	(4 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.55     0.40    0.10    1.06     9.74   125.15   115.93     0.04   35.56   8.47   0.99
sdb               0.77     0.42    0.31    1.05    65.57   125.15   140.29     0.15  108.85   7.29   0.99
md1               0.00     0.00    0.01    0.03     0.06     0.22     8.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.16    0.17     9.91     2.86    38.85     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     6.70     0.00   38.38  38.36   0.00
md0               0.00     0.00    0.02    0.00     0.18     0.00     7.49     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.45    0.54    56.39   120.95   180.39     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

The command first shows the accumulated stats and, when run with an interval, then shows the current values. “-x” is for extended stats, which include the utilization of the block device. When running bonnie in the background, it looks a bit different. Depending on the phase bonnie is currently in, there are more write or more read operations. Here are two examples of how this looks:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   526.50    3.00  180.50    56.00 170103.00   927.30    34.97  196.19   5.42  99.40
sdb               0.00   531.00    1.00  167.50    20.00 161779.00   960.23   139.02  805.91   5.93 100.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    4.00    0.50    76.00     1.00    17.11     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00  680.00     0.00 164000.00   241.18     0.00    0.00   0.00   0.00
....................
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   243.00    0.00  101.00     0.00 82494.50   816.78     6.06   60.06   3.58  36.20
sdb             380.50   259.50  375.50   80.00 96768.00 77898.50   383.46   143.99  276.78   2.20 100.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00  756.00    0.00 96768.00     0.00   128.00     0.00    0.00   0.00   0.00

There is quite a number of options to that command, and there is a nicely written manual page available. If you want to run that command continuously, just give it an interval as argument but not the count. “iostat” is available in the sysstat package.
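
When watching many devices, it can help to reduce the output to the busy ones. The following filter is only a sketch: the name busy_devices is made up, and it assumes the column layout shown above, with the device name first and %util as the last of twelve or more columns.

```shell
# Hypothetical helper: read "iostat -x" output on stdin and print the name
# and %util of every device whose utilization is at least the given
# threshold (first argument, in percent). Header lines are skipped.
busy_devices() {
    awk -v t="$1" '$1 !~ /^Device/ && NF >= 12 && $NF + 0 >= t + 0 { print $1, $NF }'
}
```

With “iostat -x 2 | busy_devices 90”, only the disks that are close to saturation are listed for each interval; depending on pipe buffering, the lines may show up with some delay.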

vmstat

There is a second tool that I found quite interesting, called “vmstat”. There you can see more of the internals of the kernel. To look at disk IO, use the command “vmstat -d”:

# vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
sda   141952 178179 29577018  477693 237884 300621 90710884 44753954      0   2022
sda1    1838  49969  413212    4300    243      0     405    8421      0     10
sda2    1177  30730  255088    3374    904  21316  175002  128632      0     93
sda3   13692   1967  899900  145632  98957   1421  529386 1283316      0   1033
sda4  124979  95338 28005296  324132 137715 277884 90006091 43333084      0   1101
sdb   172348 201881 36728314  678824 235909 302600 90710884 73151033      0   2040
sdb1    1845  50399  417206    4300    243      0     405    9682      0     12
sdb2    1151  29726  246895    3768    890  21330  175002  136778      0    105
sdb3   12190   1387  838756  245109  98955   1423  529386 1388551      0   1092
sdb4  156880 119698 35217839  425346 135752 279847 90006091 71615593      0   1168
md1     1220      0    9754       0  21807      0  174440       0      0      0
md2    29177      0 1738453       0  26124      0  431345       0      0      0
sr0       77     52     516    2955      0      0       0       0      0      2
md0     3508      0   26650       0    101      0     204       0      0      0
md3   496707      0 63222677       0 378971      0 89932831       0      0      0

This command may also be run in continuous mode by putting a timing interval after the command. However, for me this becomes confusing quite quickly because of the amount of data generated. To see what happens, I normally grep for the devices that I am interested in. The following output was produced while running bonnie on the same system:

# vmstat -d 2 | (head -n2; grep -e md3 -e sda4 -e sdb4)
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
sda4  124979  95338 28005296  324132 137943 278400 90118575 43341665      0   1103
sdb4  156880 119698 35217839  425346 135982 280362 90119567 71622604      0   1170
md3   496707      0 63222677       0 379699      0 90105057       0      0      0
sda4  124979  95338 28005296  324132 138341 279733 90502993 43483155      0   1105
sdb4  156880 119698 35217839  425346 136349 281695 90473169 71762657      0   1172
md3   496707      0 63222677       0 382386      0 90752849       0      0      0
sda4  124979  95338 28005296  324132 138745 280943 90901549 43699088      0   1107
sdb4  156880 119698 35217839  425346 136763 282905 90882421 72024008      0   1174
md3   496707      0 63222677       0 383780      0 91089049       0      0      0
sda4  124979  95338 28005296  324132 139132 282096 91282987 43927977      0   1109
sdb4  156880 119698 35217839  425346 137115 284058 91228331 72272559      0   1176
md3   496707      0 63222677       0 385346      0 91466262       0      0      0

In the above example, you can see that the number of written sectors and merged writes is increasing rapidly, while nothing happens on the reading side. This is because bonnie was just doing the char write benchmark. The “vmstat” command is available from the “procps” package.
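
Since “vmstat -d” prints running totals, the change between two samples is often easier to read than the raw numbers. As a sketch (the name written_sector_deltas is invented, and the written sectors are assumed to be in the eighth column as in the output above), the differences for one device can be computed like this:

```shell
# Hypothetical helper: read "vmstat -d" samples on stdin and print, for the
# device given as first argument, how many sectors were written between
# each pair of consecutive samples.
written_sector_deltas() {
    awk -v dev="$1" '$1 == dev {
        if (seen) print $8 - prev   # difference to the previous sample
        prev = $8
        seen = 1
    }'
}
```

For example, “vmstat -d 2 | written_sector_deltas md3” prints one number per interval.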

Posted in block devices, Hardware, openSUSE