WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-users

[Xen-users] Live migration fails when source machine has multiple domUs

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Live migration fails when source machine has multiple domUs
From: Tom Lanyon <tom@xxxxxxxxxxxxxx>
Date: Mon, 25 Aug 2008 16:55:26 +0930
Delivery-date: Mon, 25 Aug 2008 00:26:05 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
Hi list,

I seem to have encountered a bug that's been reported a few times on this list, but there's no corresponding entry in Bugzilla and no one seems to have reported a resolution.

I have a three-node RHEL cluster running some paravirtualised virtual machines, each using a CLVM logical volume block device as its storage. There are no cluster file systems involved, and the block device for each virtual machine is accessible on all three dom0 servers.
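For reference, the disk line for such a guest would look something like the fragment below. The LV path, device name and mode are taken from the xend.log output further down; the rest is a hypothetical but typical PV guest config:

```
# /etc/xen/nodea (fragment, hypothetical): phy: backend on a shared CLVM LV
disk = [ 'phy:/dev/int_vg/os_nodea,xvda,w' ]
```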

All dom0s and domUs are x86_64 RHEL 5.2 (I also tried CentOS 5.2).

Live migration works perfectly when there's only one virtual machine involved. However, if two virtual machines are running on one server and I try to migrate one away to another server, xend starts to migrate the state (copies all the memory, etc.) and then I get this error on the domU console:
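To be clear about the reproduction, the failing case is presumably triggered roughly like this (host names here are placeholders, not from the log):

```
# on the source dom0, with two PV guests running (e.g. nodea and nodeb):
xm migrate --live nodea dom0-b
```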

WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
netif_release_rx_bufs: 0 xfer, 62 noxfer, 194 unused
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!


Apologies for the long email, but I'll also include below the xend.log output from the source dom0 server. I've seen this reported on the list before, and it always involves network-based shared storage, whether iSCSI, DRBD or GNBD (my case). As far as I can tell, the migration itself works and the VM's state transfers completely, but xend then has a problem trying to relinquish device 51712 (which is the xvda disk). The 'exception looking up device number for xvda' message also has me suspicious.
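For anyone puzzling over the "51712" in the log: Xen encodes virtual block devices as Linux device numbers, and xvd disks use major 202, so xvda (minor 0) comes out as (202 << 8) | 0 = 51712. A quick sketch (the helper name is mine, not anything from xend):

```python
def xvd_devnum(disk_index, partition=0):
    """Encode an xvd disk as a Linux device number.

    xvd devices use major 202; each disk reserves 16 minors, so
    disk_index 0 is xvda, 1 is xvdb, and so on.
    """
    XVD_MAJOR = 202
    return (XVD_MAJOR << 8) | (disk_index * 16 + partition)

print(xvd_devnum(0))  # xvda -> 51712, matching "Dev 51712" in the log
```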

Any help is much appreciated!

Regards,
Tom

xend.log output follows:

[2008-08-24 00:27:43 xend 5252] DEBUG (balloon:127) Balloon: 26652 KiB free; need 25600; done.
[2008-08-24 00:27:43 xend 5252] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib64/xen/bin/xc_save 22 9 0 0 1
[2008-08-24 00:27:43 xend 5252] INFO (XendCheckpoint:351) ERROR Internal error: Couldn't enable shadow mode
[2008-08-24 00:27:43 xend 5252] INFO (XendCheckpoint:351) Save exit rc=1
[2008-08-24 00:27:43 xend 5252] ERROR (XendCheckpoint:133) Save failed on domain nodea (9).
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 110, in save
   forkHelper(cmd, fd, saveInputHandler, False)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 339, in forkHelper
   raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 22 9 0 0 1 failed
[2008-08-24 00:27:43 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1601) XendDomainInfo.resumeDomain(9)
[2008-08-24 00:27:43 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1614) XendDomainInfo.resumeDomain: devices released
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:791) Storing domain details: {'console/ring-ref': '2057005', 'console/port': '2', 'name': 'migrating-nodea', 'console/limit': '1048576', 'vm': '/vm/b845f914-33a3-e1cf-551e-01b6d346b92b', 'domid': '9', 'cpu/0/availability': 'online', 'memory/target': '6144000', 'store/ring-ref': '2049294', 'store/port': '1'}
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'mac': '00:16:3e:6c:ae:9f', 'handle': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/9/0'} to /local/domain/9/device/vif/0.
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:112) DevController: writing {'bridge': 'br102', 'domain': 'migrating-nodea', 'handle': '0', 'script': '/etc/xen/scripts/vif-bridge', 'state': '1', 'frontend': '/local/domain/9/device/vif/0', 'mac': '00:16:3e:6c:ae:9f', 'online': '1', 'frontend-id': '9'} to /local/domain/0/backend/vif/9/0.
[2008-08-24 00:27:44 xend 5252] DEBUG (blkif:24) exception looking up device number for xvda: [Errno 2] No such file or directory: '/dev/xvda'
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'virtual-device': '51712', 'device-type': 'disk', 'state': '1', 'backend': '/local/domain/0/backend/vbd/9/51712'} to /local/domain/9/device/vbd/51712.
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:112) DevController: writing {'domain': 'migrating-nodea', 'frontend': '/local/domain/9/device/vbd/51712', 'format': 'raw', 'dev': 'xvda', 'state': '1', 'params': '/dev/int_vg/os_nodea', 'mode': 'w', 'online': '1', 'frontend-id': '9', 'type': 'phy'} to /local/domain/0/backend/vbd/9/51712.
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1626) XendDomainInfo.resumeDomain: devices created
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] ERROR (XendDomainInfo:1631) XendDomainInfo.resume: xc.domain_resume failed on domain 9.
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1628, in resumeDomain
   xc.domain_resume(self.domid, fast)
Error: (1, 'Internal error', "Couldn't map start_info")
[2008-08-24 00:27:44 xend 5252] DEBUG (XendCheckpoint:136) XendCheckpoint.save: resumeDomain
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:45 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
-------many repeats-------
[2008-08-24 00:28:14 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1728) Dev still active but hit max loop timeout

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
