

[Xen-devel] remus failure -xen 4.0.1: xc_restore failed only at some hea

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] remus failure -xen 4.0.1: xc_restore failed only at some heavy workload
From: Kyungjin Yoo <athleta@xxxxxxx>
Date: Tue, 14 Sep 2010 12:05:13 -0400
Delivery-date: Thu, 23 Sep 2010 05:26:13 -0700
I have been running experiments with Remus and have run into a problem with its failover.

I set up dom0 and domU as below; the backup server is set up the same way as the primary.

Ubuntu 9.10
Xen 4.0.1-rc2
kernel for dom0 :
kernel for domU :

With an idle guest running, I start Remus on the primary server and then destroy the guest or kill Remus. In that case failover works: the guest moves from the primary server to the backup server.

For the workload experiments, however, I run SPECweb or a kernel compile in the guest while Remus is running on the primary server. When the guest is destroyed or Remus is killed, the guest does not survive on the backup server, even though checkpointing was working up to that point. During checkpointing the guest was listed on the backup server in the 'p' (paused) state, but after the failure it disappears.

xend.log on the backup server shows this error:


[XXXX-XX-XX 13:56:50 6038] ERROR (XendCheckpoint:357) /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 309, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 411, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
[XXXX-XX-XX 13:56:50 6038] ERROR (XendDomain:1175) Restore failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/xen/xend/XendDomain.py", line 1159, in domain_restore_fd
    dominfo = XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib/python2.6/site-packages/xen/xend/XendCheckpoint.py", line 358, in restore
    raise exn
XendError: /usr/lib/xen/bin/xc_restore 36 92 1 2 0 0 0 0 failed
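
For anyone trying to reproduce this, a small script can pull the restore failures out of a long xend.log. This is only a sketch: the regular expression is inferred from the log lines above rather than from any documented xend log format, and the function name `find_restore_errors` is made up for illustration.

```python
import re

# Line layout inferred from the excerpt above (an assumption, not a documented format):
#   [<date> <time> <pid>] ERROR (<module>:<line>) <message>
ERROR_LINE = re.compile(
    r"^\[(?P<stamp>\S+ \S+) (?P<pid>\d+)\] ERROR "
    r"\((?P<module>\w+):(?P<line>\d+)\) (?P<msg>.*)$"
)


def find_restore_errors(lines):
    """Return (timestamp, module, message) for each ERROR line reporting a failed restore."""
    hits = []
    for raw in lines:
        match = ERROR_LINE.match(raw.rstrip("\n"))
        if match is None:
            continue  # traceback lines and non-ERROR entries are skipped
        msg = match.group("msg")
        lowered = msg.lower()
        if "restore" in lowered and "failed" in lowered:
            hits.append((match.group("stamp"), match.group("module"), msg))
    return hits
```

It takes any iterable of lines, so an open file object for the backup server's xend.log can be passed in directly. Comparing the timestamps of the hits with the time Remus was killed should show whether the restore failed at failover or earlier during checkpointing.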

This looks very similar to the earlier question from Shriram Rajagopalan.

The same error also seems to have shown up with Xen live migration in the past. Remus shares code with live migration, and the error is raised in a function that live migration also uses.

Has anyone seen something similar, either with Remus or with Xen live migration?
Has anyone found a cause or a solution?

I would appreciate any help with this.

Thank you.
