Re: [Xen-devel] Error in XendCheckpoint: failed to flush file

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] Error in XendCheckpoint: failed to flush file
From: Stefan Berger <stefanb@xxxxxxxxxx>
Date: Wed, 28 Feb 2007 10:48:33 -0500
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <C20AD9F6.2F8A%Keir.Fraser@xxxxxxxxxxxx>

Hi Keir,

Here are some of the symptoms I get.

----------------

On x86-32 with changeset 14142 (this is on a blade), after a fresh 'hg clone' and build:

In the xm-test suite, for example, the 'restore' test cases fail:

make -C tests/restore check-TESTS

REASON: Domain still running after save!
FAIL: 01_restore_basic_pos.test
PASS: 02_restore_badparm_neg.test
PASS: 03_restore_badfilename_neg.test

REASON: Failed to create domain
FAIL: 04_restore_withdevices_pos.test


Similar errors appear in the 'save' test cases:

REASON: Domain still running after save!
FAIL: 01_save_basic_pos.test
PASS: 02_save_badparm_neg.test
PASS: 03_save_bogusfile_neg.test


I also see this in 'xm dmesg':

(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) platform_hypercall.c:142: Domain 0 says that IO-APIC REGSEL is good
(XEN) grant_table.c:286:d0 Bad flags (0) or dom (0). (expected dom 0)
(XEN) grant_table.c:251:d0 Bad ref (2097664).
(XEN) grant_table.c:286:d0 Bad flags (0) or dom (0). (expected dom 0)

When rebooting with the 'reboot' command, that blade does not actually reboot but hangs after completely shutting down domain-0. I do not see this problem on other machines, though.

------------

On x86-64 (this is also a blade), after a fresh 'hg clone' and build:
Intel Xeon, 3.2GHz
2 physical processors with hyperthreading -> 4 logical processors
domain-0 has dom0_mem=10240000


The 'save' tests just crashed that machine (twice). :-/

I'll post a migration test, 02_migrate_localhost_loop, that exposes the following error inside the guest on x86-64 (only!). To see these messages I set the 'debugMe' variable in xm-test/lib/XmTestLib/Console.py (line 68) to 'True'.
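
(Roughly, that change is just flipping a flag; I'm paraphrasing the surrounding context from memory, so treat this as approximate:)

# xm-test/lib/XmTestLib/Console.py, around line 68
debugMe = True  # was False; when True, the console helper echoes guest output to the test log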

@%@%> XENBUS error -12 while reading message
XENBUS error -12 while reading message
XENBUS unexpected type [1325400064], expected [4]
XENBUS error -12 while reading message
XENBUS error -12 while reading message
[...]

XENBUS error -12 while reading message
XENBUS: Unable to read cpu state
XENBUS: Unable to read cpu state
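
(Error -12 in these messages is -ENOMEM; XENBUS logs failures as negative errno values. Assuming standard Linux errno numbering, they decode like so:)

import errno, os

# XENBUS reports errors as negative errno values; -12 -> ENOMEM
print(errno.errorcode[12])  # 'ENOMEM'
print(os.strerror(12))      # 'Cannot allocate memory'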

When building the sources with 'make -j 16', that blade's VNC output freezes at some point. Pinging it still works, but ssh'ing into it does not respond within a reasonable time. Building with a non-parallel 'make' works fine.

  Stefan

Keir Fraser <Keir.Fraser@xxxxxxxxxxxx> wrote on 02/28/2007 02:04:22 AM:

> I'm not sure the two are related. fsync(), lseek(), and fadvise() will all fail if
> the fd maps to a socket. The failure is harmless and the error return code
> is ignored. The error to xend.log is overly noisy and needs cleaning up but
> unfortunately the suspend/resume problems probably lie elsewhere. What
> failure symptoms do you see?
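
A quick way to confirm that behaviour from Python, assuming Linux semantics (fsync(2) sets EINVAL for fds that do not support synchronization, such as sockets):

import errno, os, socket

# fsync() on an fd that refers to a socket fails with EINVAL (errno 22),
# i.e. "Invalid argument" -- the same error the log further down reports
# when flushing the checkpoint output fd.
a, b = socket.socketpair()
try:
    os.fsync(a.fileno())
except OSError as e:
    assert e.errno == errno.EINVAL
    print("fsync on a socket:", os.strerror(e.errno))  # Invalid argument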
>
>  -- Keir
>
> On 28/2/07 04:46, "Stefan Berger" <stefanb@xxxxxxxxxx> wrote:
>
> > I get these errors pretty often lately. This is on an x86-32 machine with
> > changeset 14142. Does anyone else see this? Local migration and
> > suspend/resume fail quite frequently.
> >
> > [2007-02-27 23:39:56 20114] DEBUG (XendCheckpoint:236)
> > [xc_restore]: /usr/lib/xen/bin/xc_restore 23 262 18432 1 2 0 0 0
> > [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) xc_linux_restore
> > start: max_pfn = 4800
> > [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Reloading memory
> > pages: 0%
> > [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Saving memory
> > pages: iter 1  37%ERROR Internal error: Failed to flush file: Invalid
> > argument (22 = Invalid argument)
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel