[Xen-users] HVM Live Migrations Failing 90% Of The Time

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] HVM Live Migrations Failing 90% Of The Time
From: Tim O'Donovan <tim@xxxxxxxxxxxxxxxxx>
Date: Wed, 07 Apr 2010 18:04:14 +0100
Delivery-date: Wed, 07 Apr 2010 10:06:22 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Organization: ICUK Computing Services Ltd
Reply-to: tim@xxxxxxxxxxxxxxxxx
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.21) Gecko/20090318 Lightning/0.9 Thunderbird/2.0.0.21 Mnenhy/0.7.5.0

I'm deploying a 2-node Pacemaker/DRBD-backed Xen cluster to run a
mixture of Linux PV and Windows HVM VMs. I have this up and running on
a pair of development machines, with both automatic and manual failover
working perfectly. Live migrations succeed every time for both the PV
and HVM VMs.
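
For context, each VM is managed as a Pacemaker resource with live
migration enabled, along these lines (a minimal sketch in crm shell
syntax; the resource name, config path and timeouts here are
illustrative rather than my exact configuration):

primitive vm_web ocf:heartbeat:Xen \
    params xmfile="/etc/xen/web.cfg" \
    meta allow-migrate="true" \
    op monitor interval="30s" timeout="30s" \
    op migrate_to interval="0" timeout="300s" \
    op migrate_from interval="0" timeout="240s"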

I've replicated the setup onto a pair of high-end production machines,
but live migrations of the HVM VMs only succeed around 10% of the time.
PV live migrations complete every time. The configurations on the
development and production machines are identical in every way except
for the physical hardware.

When the migration fails, the host being migrated from logs the following errors:

[2010-04-07 14:42:45 6211] DEBUG (XendCheckpoint:103) [xc_save]:
/usr/lib64/xen/bin/xc_save 30 18 0 0 5
[2010-04-07 14:42:45 6211] INFO (XendCheckpoint:403) xc_save: could not
read suspend event channel
[2010-04-07 14:42:45 6211] WARNING (XendDomainInfo:1617) Domain has
crashed: name=migrating-web id=18.
[2010-04-07 14:42:45 6211] DEBUG (XendDomainInfo:2389)
XendDomainInfo.destroy: domid=18
[2010-04-07 14:42:45 6211] DEBUG (XendDomainInfo:2406)
XendDomainInfo.destroyDomain(18)
[2010-04-07 14:42:48 6211] DEBUG (XendDomainInfo:1939) Destroying device
model
[2010-04-07 14:42:48 6211] INFO (XendCheckpoint:403) Saving memory
pages: iter 1  10%ERROR Internal error: Error peeking shadow bitmap
[2010-04-07 14:42:48 6211] INFO (XendCheckpoint:403) Warning - couldn't
disable shadow modeSave exit rc=1
[2010-04-07 14:42:48 6211] ERROR (XendCheckpoint:157) Save failed on
domain web (18) - resuming.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/xen/xend/XendCheckpoint.py",
line 125, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib/python2.5/site-packages/xen/xend/XendCheckpoint.py",
line 391, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 30 18 0 0 5 failed


The following is also logged in /var/log/xen/qemu-dm-web.log:

xenstore_process_logdirty_event: key=000000006b8b4567 size=335816
Log-dirty: mapped segment at 0x7fb56c136000
Triggered log-dirty buffer switch


The host being migrated to logs the following errors:

[2010-04-07 14:42:45 6227] INFO (XendCheckpoint:403) Reloading memory
pages:   0%
[2010-04-07 14:42:48 6227] INFO (XendCheckpoint:403) ERROR Internal
error: Error when reading batch size
[2010-04-07 14:42:48 6227] INFO (XendCheckpoint:403) Restore exit with rc=1
[2010-04-07 14:42:48 6227] DEBUG (XendDomainInfo:2389)
XendDomainInfo.destroy: domid=26
[2010-04-07 14:42:48 6227] DEBUG (XendDomainInfo:2406)
XendDomainInfo.destroyDomain(26)
[2010-04-07 14:42:48 6227] ERROR (XendDomainInfo:2418)
XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/xen/xend/XendDomainInfo.py",
line 2413, in destroyDomain
    xc.domain_destroy(self.domid)
Error: (3, 'No such process')


Some basic config details:

Xen version:    3.3.0
Kernel:         2.6.24-27-xen
dom0 OS:        Ubuntu 8.04 64-bit
domU OS:        Windows 2008 64-bit


VM config for the above example:

name = "web"
kernel = "/usr/lib/xen/boot/hvmloader"
builder='hvm'
memory = 10240
shadow_memory = 8
vif = [ 'bridge=eth1' ]
acpi = 1
apic = 1
disk = [ 'phy:/dev/drbd0,hda,w', 'phy:/dev/drbd1,hdb,w' ]
device_model = '/usr/lib64/xen/bin/qemu-dm'
boot="dc"
sdl=0
vnc=1
vncconsole=1
vncpasswd='XXXXXXXXXXXX'
serial='pty'
usbdevice='tablet'
vcpus=8
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'destroy'
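
Started by hand rather than by Pacemaker, that is the equivalent of
something like the following (illustrative only; the config path and
destination hostname are placeholders):

xm create /etc/xen/web.cfg
xm migrate --live web node2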


The DRBD resources are handled by Jefferson Ogata's qemu-dm.drbd wrapper
(http://www.antibozo.net/xen/qemu-dm.drbd) and a slightly modified
version of DRBD's block-drbd script.
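
The net effect of those scripts is just to make sure the DRBD resource
is Primary wherever the VM (or its qemu-dm) is running, roughly as
follows (a simplified illustration with a made-up resource name, not
the actual scripts; dual-primary requires allow-two-primaries to be
enabled for the migration window):

drbdadm primary r0      # on the destination node, before the VM starts there
drbdadm secondary r0    # on the source node, once the migration has completed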

The dom0 machines are allocated 1GB of memory each and are identical
in both software and hardware configuration. Each machine has a total
of 24GB of memory.
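
The 1GB allocation would typically be pinned via dom0_mem on the
hypervisor command line, i.e. something like this in menu.lst (paths
and versions here are illustrative):

title Xen 3.3.0 / Ubuntu 8.04
    kernel /boot/xen-3.3.0.gz dom0_mem=1024M
    module /boot/vmlinuz-2.6.24-27-xen root=/dev/sda1 ro console=tty0
    module /boot/initrd.img-2.6.24-27-xen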



Thanks


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
