
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops



On Thursday, 03 June 2010 at 11:01, Ian Jackson wrote:
> Brendan Cully writes ("Re: [Xen-devel] slow live migration / xc_restore on 
> xen4 pvops"):
> > 2. in normal migration, the sender should close the fd after sending
> > all data, immediately triggering an IO error on the receiver and
> > completing the restore.
> 
> This is not true.  In normal migration, the fd is used by the
> machinery which surrounds xc_domain_restore (in xc_save and also in xl
> or xend).  In any case it would be quite wrong for a library function
> like xc_domain_restore to eat the fd.

The sender closes the fd, as it always has. xc_domain_restore has
always consumed the entire contents of the fd, because the qemu tail
has no length header under normal migration. There's no behavioral
difference here that I can see.
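
To make that concrete, here is a minimal sketch of the sort of loop that
drains the qemu tail until the sender closes its end of the fd
(read_qemu_tail and its use of stdio are my own illustration, not the
actual libxc code):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative only: copy the length-less qemu tail from the migration
 * fd until EOF.  Because plain migration sends no length header for this
 * section, the only way to know it has ended is the sender closing the fd. */
static int read_qemu_tail(int fd, FILE *out)
{
    char buf[4096];
    ssize_t n;

    for ( ; ; )
    {
        n = read(fd, buf, sizeof(buf));
        if ( n == 0 )
            return 0;               /* EOF: sender closed, tail is complete */
        if ( n < 0 )
        {
            if ( errno == EINTR )
                continue;           /* interrupted by a signal, retry */
            return -1;              /* genuine read error */
        }
        if ( fwrite(buf, 1, (size_t)n, out) != (size_t)n )
            return -1;              /* local write error */
    }
}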

> It's not necessary for xc_domain_restore to behave this way in all
> cases; all that's needed is parameters to tell it how to behave.

I have no objection to a more explicit interface. The current form is
simply Remus trying to be as invisible as possible to the rest of the
tool stack.
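
For what it's worth, a more explicit interface could be as simple as one
extra argument. A sketch only; xc_domain_restore_ex, xc_stream_type_t and
the abbreviated parameter list are hypothetical, not a real libxc
prototype:

#include <stdint.h>

/* Hypothetical sketch: let the caller say what kind of stream this is,
 * instead of the restore code inferring it. */
typedef enum {
    XC_STREAM_PLAIN,        /* one-shot save/restore or live migration     */
    XC_STREAM_CHECKPOINTED  /* Remus: keep consuming checkpoints until the
                             * stream errors out or is closed              */
} xc_stream_type_t;

int xc_domain_restore_ex(int xc_handle, int io_fd, uint32_t dom,
                         /* ...the existing restore parameters... */
                         xc_stream_type_t stream_type);

The heartbeat/select path would then only be taken for
XC_STREAM_CHECKPOINTED streams, and plain migration would never see it.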

> > I did try to avoid disturbing regular live migration as much as
> > possible when I wrote the code. I suspect some other regression has
> > crept in, and I'll investigate.
> 
> The short timeout is another regression.  A normal live migration or
> restore should not fall over just because no data is available for
> 100ms.

(The timeout is 1s, by the way, not 100ms.)

For some reason you clipped the bit of my previous message where I say
this doesn't happen:

1. reads are only supposed to be able to time out after the entire
first checkpoint has been received (IOW this wouldn't kick in until
normal migration had already completed)

Let's take a look at read_exact_timed in xc_domain_restore:

if ( completed ) {
    /* expect a heartbeat every HEARTBEAT_MS ms maximum */
    tv.tv_sec = HEARTBEAT_MS / 1000;
    tv.tv_usec = (HEARTBEAT_MS % 1000) * 1000;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    len = select(fd + 1, &rfds, NULL, NULL, &tv);
    /* fd not readable within the timeout (or select() failed): give up */
    if ( !FD_ISSET(fd, &rfds) ) {
        fprintf(stderr, "read_exact_timed failed (select returned %zd)\n", len);
        return -1;
    }
}

'completed' is not set until the first entire checkpoint (i.e., the
entirety of non-Remus migration) has completed. So, no issue.
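
Put differently, the shape of the logic is roughly this (a simplified
sketch with made-up helper names, not the actual restore loop):

/* Simplified sketch: 'completed' only becomes nonzero once a full
 * checkpoint -- which for plain migration is the whole image -- has been
 * received, so the select()-based heartbeat can never fire during a
 * normal restore. */
int completed = 0;

for ( ; ; )
{
    int rc;

    if ( !completed )
        rc = read_checkpoint_data(fd);        /* plain blocking reads      */
    else
        rc = read_checkpoint_data_timed(fd);  /* select() + HEARTBEAT_MS   */

    if ( rc < 0 )
        break;          /* EOF or error: finish up with the last good state */

    completed = 1;      /* the first full checkpoint (== an entire normal
                         * migration) is now in hand; only later Remus
                         * checkpoints are subject to the heartbeat timeout */
}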

I see no evidence that Remus has anything to do with the live
migration performance regression discussed in this thread, and I
haven't seen any other reported issues either. I think the mlock issue
is a much more likely candidate.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

