I'm experiencing some really strange behavior with an OpenSuse 10.3 guest
running in Xen. Every 48-72 hours, the machine starts running at a very
high load average, dumping tons of messages in the message log, finally
becoming completely inaccessible. When the guest finally becomes unusable,
the host "xm top" display shows 399% CPU utilization, and constant NET
and VBD activity, but the host cannot even "shutdown" the guest - I have
to destroy it to make it stop.
The host machine is a Dell Poweredge 2950 III server, running OpenSuse 11.1,
64 bit, kernel 22.214.171.124-0.1-xen, and Xen package xen-3.3.1_18546_24-0.4.13 .
It has 20GB of RAM, a quad-core 2GHz Intel CPU, and a Dell Perc5 RAID. It
runs other guest machines with no problem.
The guest machine is running OpenSuse 10.3, kernel 126.96.36.199-0.4-xenpae, in
32 bit mode, with Xen package xen-3.1.0_15042-51.3.
The guest machine is a clone of a running physical machine that I'm trying to
virtualize. I did the creation of the drive, the attach, and so forth, on
the Xen host, then I did an rsync of the 10.3 physical machine's filesystems
onto the 11.1 host. I removed and reinstalled the Xen kernel package as
suggested on the net, and, against even my predictions, got the guest to
boot. And it works great... for a few days or so.
But, then, what happens is that the guest starts to go crazy. I see rapidly
repeating messages like this start to appear in the syslog /var/log/messages:
Nov 20 15:35:55 guestc kernel: b_state=0x00000029, b_size=4096
Nov 20 15:35:55 guestc kernel: device blocksize: 4096
Nov 20 15:35:55 guestc kernel: __find_get_block_slow() failed. block=210137505,
Occasionally these messages show up garbled, like this:
Nov 20 15:35:55 guestc kernel: __find_get_block_slow() failed.
And then, of course, I can't even get into the guest at all, via network
or "xm console". "xm shutdown" does nothing, and I must "xm destroy" the guest.
After re-creating the guest, everything runs fine again, until another few
days have passed.
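To pin down exactly when the flood begins, the failure messages can be counted per minute. What follows is only a minimal sketch, assuming the syslog line format shown above (month, day, time, hostname, then "kernel:"); the function names count_failures_per_minute and first_failure_minute are hypothetical helpers made up for illustration, not part of any Xen or syslog tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above, e.g.
# "Nov 20 15:35:55 guestc kernel: __find_get_block_slow() failed. block=..."
# The "stamp" group keeps month, day, hour and minute (seconds are dropped).
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"__find_get_block_slow\(\) failed"
)


def count_failures_per_minute(lines):
    """Return a Counter mapping 'Mon DD HH:MM' to the number of failures."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts


def first_failure_minute(lines):
    """Return the minute of the first failure line, or None if there is none."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            return match.group("stamp")
    return None
```

Run inside the guest, passing an open handle on /var/log/messages to first_failure_minute() gives the minute the trouble started, which can then be compared against cron jobs, rsync runs, or host-side activity at that time.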
Today I was actually in the guest when this happened. An rsync was running,
and that process was pegged; the guest itself reported a load average of 5.0,
while "xm top" on the host showed a usage of 199% (2 of the 4 CPUs?).
I couldn't kill the rsync process, and the messages above were flooding into
the syslog. The guest could not shut all the way down even with "init 0",
and, eventually, I had to destroy it again.
Here is the machine config:
disk=[ 'file:/a/disks/guestc/disk0,xvda,w', 'phy:sdc1,sdc1,w', ]
vif=[ 'mac=00:16:3e:52:f9:96,bridge=br0', ]
Now, I get that I'm doing some unorthodox things here. Cloning a physical
machine into a virtual machine. Running 10.3 as a guest under an 11.1 host.
A 32-bit guest on a 64-bit host. But the thing DOES run, and I feel like
I'm SO CLOSE to making this work, so I'm really hopeful that someone can
recognize these symptoms and help me find a solution, rather than just
pointing out the obvious edge-case aspects of this setup.
Any ideas or guidance would be greatly appreciated!
Xen-users mailing list