WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Re: Xen-unstable save error

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Re: Xen-unstable save error
From: Michal Novotny <minovotn@xxxxxxxxxx>
Date: Mon, 21 Jun 2010 15:37:01 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 21 Jun 2010 06:38:12 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C8452690.180FC%keir.fraser@xxxxxxxxxxxxx>
References: <C8452690.180FC%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Thunderbird/3.0.4
On 06/21/2010 03:24 PM, Keir Fraser wrote:
> On 21/06/2010 14:18, "Michal Novotny" <minovotn@xxxxxxxxxx> wrote:
>
>> So there's no xs_suspend_evtchn_port (or anything xs_*) function being
>> exported by /usr/lib64/libxenctrl.so (which is the symlink to
>> /usr/lib64/libxenctrl.so.4.0.0), therefore:
>>
>> #objdump -x /usr/lib64/libxenctrl.so.4.0.0 | grep xs_
>> #objdump -x /xen-unstable.hg/tools/libxc/libxenctrl.so.4.0.0  | grep xs_
>
> Libxenstore, libxenstore, libxen**STORE**.
>
> #nm /my/path/to/libxenstore.so.3.0.0 | grep xs_sus
> 000000000000346a T xs_suspend_evtchn_port
>
>  -- Keir
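
For reference, the same "does this library export that symbol?" check can be done without objdump or nm. This is only a minimal sketch: the helper name exports_symbol is made up for illustration, and it relies on ctypes resolving symbols lazily when an attribute is looked up on a loaded library.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path exports `symbol`.

    Hypothetical helper, not part of any Xen tool.
    """
    # CDLL raises OSError if the library itself cannot be loaded.
    lib = ctypes.CDLL(lib_path)
    # Looking up a missing symbol raises AttributeError inside ctypes,
    # which hasattr() reports as False.
    return hasattr(lib, symbol)
```

Under these assumptions, exports_symbol("/my/path/to/libxenstore.so.3.0.0", "xs_suspend_evtchn_port") would be expected to return True on a build like the one shown above, and the same call against libxenctrl would return False.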


Oh, sorry about that, and thanks for noticing my mistake. I overlooked this one. Nevertheless, it's still not working:

[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:126) [xc_save]: /usr/lib64/xen/bin/xc_save 56 1 0 0 4
[2010-06-21 17:27:39 4305] INFO (XendCheckpoint:410) xc_save: failed to get the suspend evtchn port
[2010-06-21 17:27:39 4305] INFO (XendCheckpoint:410)
[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:381) suspend
[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:129) In saveInputHandler suspend
[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:131) Suspending 1 ...
[2010-06-21 17:27:39 4305] DEBUG (XendDomainInfo:521) XendDomainInfo.shutdown(suspend)
[2010-06-21 17:27:39 4305] DEBUG (XendDomainInfo:1877) XendDomainInfo.handleShutdownWatch
[2010-06-21 17:27:39 4305] INFO (XendDomainInfo:538) HVM save:remote shutdown dom 1!
[2010-06-21 17:27:39 4305] INFO (XendCheckpoint:137) Domain 1 suspended.
[2010-06-21 17:27:39 4305] INFO (XendDomainInfo:2074) Domain has shutdown: name=migrating-rhel5-32fv-stubdom id=1 reason=suspend.
[2010-06-21 17:27:40 4305] INFO (image:538) signalDeviceModel:restore dm state to running
[2010-06-21 17:27:40 4305] DEBUG (XendCheckpoint:146) Written done
[2010-06-21 17:27:46 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=1
[2010-06-21 17:27:46 4305] DEBUG (XendDomainInfo:2397) Destroying device model
[2010-06-21 17:27:47 4305] INFO (image:615) migrating-rhel5-32fv-stubdom device model terminated


# ls -al rhel5-32fv.sav
-rwxr-xr-x 1 root root 54657427 Jun 21 17:27 rhel5-32fv.sav

# xm restore rhel5-32fv.sav
Error: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
Usage: xm restore <CheckpointFile> [-p]

Restore a domain from a saved state.
  -p, --paused                   Do not unpause domain after restoring it

# tail /var/log/xen/xend.log
[2010-06-21 17:29:21 4305] INFO (image:822) Need to create platform device.[domid:2]
[2010-06-21 17:29:21 4305] DEBUG (XendCheckpoint:273) restore:shadow=0x9, _static_max=0x40000000, _static_min=0x0,
[2010-06-21 17:29:21 4305] DEBUG (XendCheckpoint:292) [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0
[2010-06-21 17:29:22 4305] INFO (XendCheckpoint:410) xc: error: Error when reading batch size (0 = Success): Internal error
[2010-06-21 17:29:22 4305] INFO (XendCheckpoint:410) xc: error: error when buffering batch, finishing (0 = Success): Internal error
[2010-06-21 17:29:22 4305] INFO (XendCheckpoint:410) xc: error: error zeroing magic pages (22 = Invalid argument): Internal error
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=2
[2010-06-21 17:29:22 4305] ERROR (XendDomainInfo:3081) XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
File "usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 3074, in destroy
    xc.domain_pause(self.domid)
Error: (3, 'No such process')
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2402) No device model
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2404) Releasing devices
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vif/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vbd/768
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vbd/2048
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vfb/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing console/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2010-06-21 17:29:22 4305] ERROR (XendCheckpoint:344) /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 296, in restore
    forkHelper(cmd, fd, handler.handler, True)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 398, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
[2010-06-21 17:29:22 4305] ERROR (XendDomain:1182) Restore failed
Traceback (most recent call last):
File "usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 1166, in domain_restore_fd
    dominfo = XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 345, in restore
    raise exn
XendError: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
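
When xend.log excerpts like these are pasted into mail, entries can end up wrapped or run together on one line. The following is a rough sketch for pulling the individual entries back out and filtering them. The header pattern is inferred only from the lines shown in this message, and the names ENTRY_RE and split_entries are made up for illustration.

```python
import re

# Header shape inferred from the excerpts above:
#   [YYYY-MM-DD HH:MM:SS pid] LEVEL (Module:line) message
ENTRY_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\d+)\] "
    r"([A-Z]+) \(([^():]+):(\d+)\)"
)


def split_entries(text):
    """Yield (timestamp, level, source, message) for each log entry in text.

    Anything between two headers (including traceback lines) is treated
    as the message of the preceding entry.
    """
    matches = list(ENTRY_RE.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        source = "%s:%s" % (m.group(4), m.group(5))
        yield m.group(1), m.group(3), source, message
```

Filtering the result for entries whose message starts with "xc_save:" or "xc: error:" gives just the helper output that xend relays from xc_save and xc_restore.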

So now it seems to be linked against the correct library, but it still can't get the suspend port:
>> xc_save: failed to get the suspend evtchn port

This is being called from xc_save.c (snippet from line 192):
...
        port = xs_suspend_evtchn_port(si.domid);

        if (port < 0)
            warnx("failed to get the suspend evtchn port\n");
        else {
            ... suspend...
        }

I had a look at the code in xenstore/xs.c and saw that it reads the value at:

/local/domain/%d/device/suspend/event-channel
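
To make the failure mode concrete, here is a minimal sketch of that lookup. It is not the code from xs.c: a plain dict stands in for xenstore, the function name is made up, and the only things taken from the source are the path template above and the fact that xc_save treats a negative port as a failure.

```python
# Path template as quoted above from xenstore/xs.c.
SUSPEND_PATH = "/local/domain/%d/device/suspend/event-channel"


def suspend_evtchn_port(store, domid):
    """Return the suspend event-channel port for domid, or -1 if unavailable.

    `store` is a plain dict (path -> string value) standing in for
    xenstore; this is an illustration, not the libxenstore API.
    """
    value = store.get(SUSPEND_PATH % domid)
    if value is None:
        return -1  # node missing, e.g. the guest never created it
    try:
        return int(value)
    except ValueError:
        return -1  # node present but not a number
```

With an empty store, suspend_evtchn_port({}, 3) returns -1, which is the case that makes xc_save print "failed to get the suspend evtchn port".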

but when I try to read it using:

xenstore-ls /local/domain/3/device/suspendupstream

(where 3 is my domid), I get nothing. All I see under device is:

# xenstore-ls /local/domain/3/device
vfb = ""
 0 = ""
  state = "1"
  backend-id = "0"
  backend = "/local/domain/0/backend/vfb/3/0"
vbd = ""
 768 = ""
  backend-id = "0"
  virtual-device = "768"
  device-type = "disk"
  state = "1"
  backend = "/local/domain/0/backend/vbd/3/768"
 2048 = ""
  backend-id = "0"
  virtual-device = "2048"
  device-type = "disk"
  state = "1"
  backend = "/local/domain/0/backend/vbd/3/2048"
vif = ""
 0 = ""
  state = "1"
  backend-id = "0"
  backend = "/local/domain/0/backend/vif/3/0"
console = ""
 0 = ""
  state = "1"
  backend-id = "0"
  backend = "/local/domain/0/backend/console/3/0"
#
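
The same check can be scripted against the xenstore-ls output. This is only a sketch: it assumes one space of indentation per nesting level, as in the listing above, and the function names are made up for illustration.

```python
import re

# One node per line:  <indent><key> = "<value>"
LINE_RE = re.compile(r'^( *)(\S+) = "(.*)"$')


def parse_xenstore_ls(text):
    """Parse xenstore-ls output into a {relative/path: value} dict."""
    nodes = {}
    stack = []  # keys of the ancestors of the current line
    for line in text.splitlines():
        m = LINE_RE.match(line.rstrip())
        if not m:
            continue  # skip prompts and blank lines
        depth = len(m.group(1))
        del stack[depth:]  # drop siblings and deeper levels
        stack.append(m.group(2))
        nodes["/".join(stack)] = m.group(3)
    return nodes


def has_suspend_node(text):
    """True if the device listing contains suspend/event-channel."""
    return "suspend/event-channel" in parse_xenstore_ls(text)
```

Fed the listing above, has_suspend_node returns False, since only vfb, vbd, vif and console appear under device.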

My guest is a RHEL-5 i386 guest, and it seems that the suspend port node is missing for it. AFAIK, you started using SUSPEND_CANCEL some time ago, which requires a modified kernel.

Could that be the issue here? If not, how is the SUSPEND_CANCEL functionality supposed to work?

Thanks,
Michal

--
Michal Novotny<minovotn@xxxxxxxxxx>, RHCE
Virtualization Team (xen userspace), Red Hat

