WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
From: Andreas Olsowski <andreas.olsowski@xxxxxxxxxxxxxxx>
Date: Wed, 2 Jun 2010 17:46:45 +0200
Delivery-date: Wed, 02 Jun 2010 08:48:47 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C82BC2B3.166A7%keir.fraser@xxxxxxxxxxxxx>
Organization: Leuphana Universität Lüneburg
References: <4C0578EB.2040800@xxxxxxxxxxxxxxx> <C82BC2B3.166A7%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Keir,

I changed all DPRINTF calls to ERROR, and the commented-out // DPRINTF calls to ERROR as well.
There are no DBGPRINTF calls in my xc_domain_restore.c, though.

This is the new xend.log output. Of course, in this case the "ERROR Internal
error:" prefix actually marks debug output.

xenturio1:~# tail -f /var/log/xen/xend.log
[2010-06-02 15:44:19 5468] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-06-02 15:44:19 5468] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib/xen/bin/xc_restore 50 51 1 2 0 0 0 0
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: xc_domain_restore start: p2m_size = 20000
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423)
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: Reloading memory pages:   0%
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423)
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of -7 pages
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423)
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423)
[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423)
[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423)
[2010-06-02 15:49:03 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
...
[2010-06-02 15:49:09 5468] INFO (XendCheckpoint:423) ERROR Internal err100%
...

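One can see the time gap between the first and the following memory batch
reads: the first batch of 1024 pages is logged at 15:44:19, the next one only
at 15:49:02, i.e. 4 minutes 43 seconds later.
After that, restoration works as expected.
You might notice that the progress output shows "0%" and then "100%" with no
steps in between, whereas xc_save does show intermediate steps. Is that
intentional, or maybe another symptom of the same problem?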
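
For anyone wanting to spot such pauses in a longer xend.log without reading it
by eye, here is a minimal Python sketch. It assumes every entry starts with a
"[YYYY-MM-DD HH:MM:SS pid]" stamp as in the excerpt above; the name find_gaps
and the 60-second threshold are illustrative choices of mine, not anything
provided by xend.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS pid]" stamp of an xend.log entry.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\]")


def find_gaps(lines, threshold_seconds=60):
    """Return (seconds, previous_line, next_line) for each pause >= threshold."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # lines without a stamp (e.g. wrapped continuations)
        now = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if prev_time is not None:
            delta = (now - prev_time).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, prev_line.rstrip(), line.rstrip()))
        prev_time, prev_line = now, line
    return gaps


# Three entries taken from the log excerpt in this mail.
sample = [
    "[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages",
    "[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423)",
    "[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages",
]

for seconds, before, after in find_gaps(sample):
    print(f"{seconds:.0f}s pause before: {after}")
```

To run it against the real file, pass an open file object instead of the
sample list, e.g. find_gaps(open("/var/log/xen/xend.log")).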

As for the read_exact stuff:
tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H RDEXACT {} \;
tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H rdexact {} \;

There are no RDEXACT/rdexact matches in my xen source code.
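
The two find commands above only look at *.c files and match case-sensitively.
As a cross-check, a small Python sketch like the following searches headers as
well and ignores case. Whether the macro lives in a header in this tree is an
assumption I have not verified; grep_tree is just a helper name made up for
this mail.

```python
import os
import re


def grep_tree(root, pattern, suffixes=(".c", ".h")):
    """Case-insensitively search files under root; return (path, lineno, text)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(suffixes):
                continue  # skip anything that is not a C source or header
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings from aborting the search
            with open(path, errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

Usage would be something like grep_tree("/usr/src/xen-4.0.0", r"rdexact"),
printing each returned tuple.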

In a few hours I will shut down all virtual machines on one of the hosts
experiencing slow xc_restores, maybe reboot it, and check whether xc_restore is
any faster without load or utilization on the machine.

I'll check in with results later.


On Wed, 2 Jun 2010 08:11:31 +0100
Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:

> Hi Andreas,
> 
> This is an interesting bug, to be sure. I think you need to modify the
> restore code to get a better idea of what's going on. The file in the Xen
> tree is tools/libxc/xc_domain_restore.c. You will see it contains many
> DBGPRINTF and DPRINTF calls, some of which are commented out, and some of
> which may 'log' at too low a priority level to make it to the log file. For
> your purposes you might change them to ERROR calls as they will definitely
> get properly logged. One area of possible concern is that our read function
> (RDEXACT, which is a macro mapping to rdexact) was modified for Remus to
> have a select() call with a timeout of 1000ms. Do I entirely trust it? Not
> when we have the inexplicable behaviour that you're seeing. So you might try
> mapping RDEXACT() to read_exact() instead (which is what we already do when
> building for __MINIOS__).
> 
> This all assumes you know your way around C code at least a little bit.
> 
>  -- Keir
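
To make the difference Keir describes concrete, here is a rough Python sketch
of the two read strategies: a plain blocking "read exactly N bytes" loop, and a
variant that waits in select() with a timeout before each read. This is only an
illustration of the idea, not the libxc code (which is C, in
tools/libxc/xc_domain_restore.c); the function names and the error handling are
my own assumptions, and only the 1000ms timeout comes from Keir's description.

```python
import os
import select


def read_exact(fd, size):
    """Blocking read of exactly `size` bytes; raise EOFError on early EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)  # may return fewer bytes than asked
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_exact_with_timeout(fd, size, timeout=1.0):
    """Like read_exact, but wait at most `timeout` seconds for each chunk."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Hypothetical policy: give up if nothing arrives within the timeout.
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            raise TimeoutError(f"no data within {timeout}s")
        chunk = os.read(fd, remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Small demonstration over a pipe (Unix only, since select() is used on a pipe).
reader, writer = os.pipe()
os.write(writer, b"page")
os.write(writer, b"data")
print(read_exact(reader, 8))
os.close(reader)
os.close(writer)
```

The point of comparing the two is that the plain loop cannot misbehave because
of timeout handling, so swapping it in is a cheap way to rule the select() path
in or out as the cause of the stall.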


-- 
Andreas Olsowski <andreas.olsowski@xxxxxxxxxxxxxxx>
Leuphana Universität Lüneburg
System- und Netzwerktechnik
Rechenzentrum, Geb 7, Raum 15
Scharnhorststr. 1
21335 Lüneburg

Tel: ++49 4131 / 6771309

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel