Re: [Xen-devel] vbd flushing during migration?

To: "John Byrne" <john.l.byrne@xxxxxx>
Subject: Re: [Xen-devel] vbd flushing during migration?
From: "Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>
Date: Mon, 31 Jul 2006 16:03:18 -0700
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
> So, are you just ignoring the notion of minimizing downtime for the
> moment or is there something I'm missing?

That's exactly what I'm suggesting.  The current risk is a (very slim)
write-after-write (WAW) error case.  Basically, you have a number of
in-flight write requests on the original machine that are somewhere in
between the backend and the physical disk at the time of migration.
Currently, you migrate and the shadow request ring reissues these on
the new host -- which is the right thing to do given that requests are
idempotent.  The problem is that the original in-flight requests can
still hit the disk some time later and cause problems.  The WAW case
arises if, immediately on arriving at the new host, you write an update
to a block that had an in-flight request, and that update then gets
overwritten by the original request.
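
To make the ordering concrete, here is a toy Python model of the race.
It is only a sketch: the Disk class, block number 7 and the "old"/"new"
payloads are made up for illustration and are not taken from the Xen
tree.

```python
# Toy model of the write-after-write hazard; illustrative only, not Xen code.

class Disk:
    """Shared storage, modelled as a map from block number to contents."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


def simulate(stale_lands_after_resume):
    """Return the final contents of block 7 for one ordering of events."""
    disk = Disk()
    stale = (7, "old")  # write issued on the source host, still queued there

    if not stale_lands_after_resume:
        # The queued write reaches the disk before the guest resumes.
        disk.write(*stale)

    # Guest resumes on the new host: the shadow ring reissues the request.
    disk.write(7, "old")
    # The guest then immediately updates the same block.
    disk.write(7, "new")

    if stale_lands_after_resume:
        # The original request finally hits the disk and clobbers the update.
        disk.write(*stale)

    return disk.blocks[7]


print(simulate(stale_lands_after_resume=True))   # old  (update lost)
print(simulate(stale_lands_after_resume=False))  # new
```

Reissuing the request is harmless in both runs because it is
idempotent; only the late arrival of the original write loses data.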

Note that for sane block devices this is extremely unlikely as the
aperture that we are talking about is basically whatever is in the
disk's request queue -- it's only really a problem for things like
NFS+loopback and other instances of buffered I/O behind blkback
(which is generally a really bad idea!) where you could see a large
window of outstanding requests that haven't actually hit the disk.
These situations probably need more than just waiting for blkback to
clear pending reqs, as loopback will acknowledge requests before they
hit the disk in some cases.

So, I think the short-term correctness-preserving approach is to (a)
modify the migration process to add an interlock on block backends on
the source physical machine to go to a closed state -- indicating that
all the outstanding requests have cleared, and (b) not to use
loopback, or buffered IO generally, behind blkback when you intend to
do migration.  The blktap code in the tree is much safer for this sort
of thing and we're happy to sort out migration problems if/when they
come up.
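
As a rough illustration of (a), the interlock could look something like
the Python sketch below.  Everything in it is hypothetical: the Backend
class, the state names "connected"/"closing"/"closed" and the helper
can_resume_on_destination() are invented for the example and do not
correspond to the real blkback or xend code.

```python
# Sketch of the proposed interlock; all names here are hypothetical.

class Backend:
    """One block backend on the source host."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = pending   # requests submitted but not yet completed
        self.state = "connected"

    def request_close(self):
        """Stop taking new requests; close as soon as the queue is empty."""
        if self.state == "connected":
            self.state = "closing"
        self._maybe_close()

    def complete_one(self):
        """Called when the physical disk acknowledges one request."""
        if self.pending > 0:
            self.pending -= 1
        self._maybe_close()

    def _maybe_close(self):
        if self.state == "closing" and self.pending == 0:
            self.state = "closed"


def can_resume_on_destination(backends):
    """The interlock: every source backend must have reached 'closed'."""
    return all(b.state == "closed" for b in backends)


backends = [Backend("vbd0", pending=2), Backend("vbd1", pending=0)]
for b in backends:
    b.request_close()
print(can_resume_on_destination(backends))  # False: vbd0 still has writes queued

for _ in range(2):
    backends[0].complete_one()
print(can_resume_on_destination(backends))  # True
```

Note that this only helps if "completed" really means the data has hit
the disk, which is why (b) still matters for loopback and other
buffered setups.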

If this winds up adding a big overhead to migration switching time (I
don't think it should, block shutdown can be parallelized with the
stop-and-copy round of migration -- you'll be busy transferring all
the dirty pages that you've queued for DMA anyway) we can probably
speed it up.  One option would be to look into whether the Linux block
layer will let you abort submitted requests.  Another would be to
modify the block frontend driver to realize that it's just been
migrated and queue all requests to blocks that were in its shadow
ring until it receives notification that those writes have cleared
from the original host.  As you point out -- these are probably best
left as a second step. ;)
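
The frontend-side option might be sketched roughly as follows.  Again
this is Python pseudo-driver code under stated assumptions: the
MigratedFrontend class, its method names and the source_cleared()
notification are hypothetical, and the sketch does not model replaying
the shadow ring itself.

```python
# Sketch of the frontend-side alternative; all names here are hypothetical.

class MigratedFrontend:
    """Block frontend that has just resumed on the destination host."""

    def __init__(self, shadow_ring_blocks):
        # Blocks that had requests in flight at migration time.
        self.unsafe_blocks = set(shadow_ring_blocks)
        self.deferred = []  # (block, data) held back for now
        self.issued = []    # (block, data) passed on to the new backend

    def submit(self, block, data):
        """Hold back requests that touch a block with a possibly stale write."""
        if block in self.unsafe_blocks:
            self.deferred.append((block, data))
        else:
            self.issued.append((block, data))

    def source_cleared(self):
        """The source host reports that its outstanding writes have landed."""
        self.unsafe_blocks.clear()
        self.issued.extend(self.deferred)  # release in submission order
        self.deferred.clear()


fe = MigratedFrontend(shadow_ring_blocks=[7])
fe.submit(7, "new")   # touches an in-flight block: deferred
fe.submit(3, "x")     # unrelated block: goes straight through
print(fe.issued)      # [(3, 'x')]
fe.source_cleared()
print(fe.issued)      # [(3, 'x'), (7, 'new')]
```

The appeal of this variant is that only requests to the affected
blocks stall; everything else proceeds while the source drains.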

I'd be interested to know if anyone on the list is solving this sort
of thing already using some sort of storage fencing fanciness to just
sink any pending requests on the original host after migration has
