
To:	Anthony Liguori <aliguori@xxxxxxxxxx>
Subject:	Re: [Xen-devel] Shouldn't backend devices for VMX domain disks be opened with O_DIRECT?
From:	Stephen Tweedie <sct@xxxxxxxxxx>
Date:	Thu, 02 Feb 2006 21:42:08 -0500
Cc:	Steve Dobbelstein <steved@xxxxxxxxxx>, "Philip R. Auld" <pauld@xxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Hi,

On Thu, 2006-02-02 at 18:09 -0600, Anthony Liguori wrote:

> Referring to the original question, which has been quoted away, 
> journaling doesn't require that data be written to disk per se but that 
> writes occur in a particular order.  A journal is always recoverable 
> given that writes occur in the expected order.

Sure... it's *internally* consistent, maybe.  But you need more than
that.  You need guarantees that things are on disk, else external
consistency guarantees will be broken.
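A toy model makes the distinction concrete. It assumes a hypothetical disk that keeps writes strictly in order but makes them durable only on an explicit flush. The names toy_disk, toy_write, toy_flush and toy_crash are invented for illustration and are not Xen or filesystem code. A crash always leaves an ordered prefix, so a journal replay would be consistent, yet records already acknowledged to the application can vanish:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy disk: writes are kept in submission order, but only
 * a flush makes them durable. Not modelled on any real driver. */
#define MAX_RECORDS 16

struct toy_disk {
    int records[MAX_RECORDS]; /* records in submission order */
    size_t submitted;         /* how many records were submitted (and acked) */
    size_t durable;           /* length of the prefix on stable storage */
};

/* Submit a record; the caller treats it as "completed" immediately. */
static bool toy_write(struct toy_disk *d, int rec)
{
    if (d->submitted == MAX_RECORDS)
        return false;
    d->records[d->submitted++] = rec;
    return true;
}

/* Flush: everything submitted so far becomes durable. */
static void toy_flush(struct toy_disk *d)
{
    d->durable = d->submitted;
}

/* Crash: whatever was not yet durable is lost. Because ordering was
 * preserved, the survivors are always a prefix of what was submitted. */
static void toy_crash(struct toy_disk *d)
{
    d->submitted = d->durable;
}
```

Writing records 1 to 3, flushing, then writing 4 and 5 and crashing leaves exactly records 1 to 3: the journal is intact, but records 4 and 5 were reported as written and are gone.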

Consider things like sendmail fsync()ing a spool file before telling the
sender that the email has been accepted.  After that acknowledgement,
the sender can delete the mail from its queues knowing that the
recipient MTA definitely has the data, and even if it crashes, the mail
won't be lost.  Databases frequently have similar consistency
requirements.  If a power failure loses writes that you have told the
domU have completed --- even if you maintain write ordering --- then you
*are* putting application correctness at risk, there's no doubt about
it.
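The sendmail pattern boils down to "write, fsync(), then acknowledge". Below is a minimal POSIX sketch of it; spool_accept is a hypothetical helper, not sendmail's actual code. The fsync() is only as trustworthy as the layers beneath it: inside a domU it returns once the virtual disk reports the write complete, which is exactly the guarantee under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 only once the data has been reported
 * durable, so the caller may then tell the sender "accepted". */
static int spool_accept(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg), off = 0;
    while (off < len) {
        ssize_t n = write(fd, msg + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the write */
            close(fd);
            return -1;
        }
        off += (size_t)n;          /* handle short writes */
    }

    /* Do not acknowledge anything until fsync() succeeds. A real MTA
     * creating a new file would also fsync() the containing directory
     * so that the directory entry itself is durable. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```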

> A buffer cache will have 
> no effect on that order so you're no more likely to have corruption than 
> if you disabled the buffer cache.

Only if it's being used as a write-through cache.  If it's write-back, it
will have a major impact on ordering.
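To see why write-back caching matters for ordering, here is a toy comparison. It assumes, for illustration only, that the write-back flush writes each dirty block once, sorted by block number. This is a stand-in policy, not the Linux page cache's actual writeback algorithm. The block numbers are made up: journal blocks 900 and 901, a commit block 902, then the in-place metadata block 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS 64

/* Records the order in which blocks reach the (toy) disk. */
struct toy_log {
    int order[MAX_BLOCKS];
    size_t n;
};

static void disk_write(struct toy_log *log, int block)
{
    if (log->n < MAX_BLOCKS)
        log->order[log->n++] = block;
}

/* Write-through: each write reaches the disk before the call returns,
 * so the disk sees exactly the submission order. */
static void write_through(const int *blocks, size_t count, struct toy_log *log)
{
    for (size_t i = 0; i < count; i++)
        disk_write(log, blocks[i]);
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Write-back (assumed policy): writes only dirty the cache; a later
 * flush writes each dirty block once, in ascending block order. */
static void write_back(const int *blocks, size_t count, struct toy_log *log)
{
    int dirty[MAX_BLOCKS];
    size_t ndirty = 0;
    for (size_t i = 0; i < count; i++) {
        size_t j;
        for (j = 0; j < ndirty; j++)
            if (dirty[j] == blocks[i])
                break;             /* already dirty: writes coalesce */
        if (j == ndirty && ndirty < MAX_BLOCKS)
            dirty[ndirty++] = blocks[i];
    }
    qsort(dirty, ndirty, sizeof dirty[0], cmp_int);
    for (size_t i = 0; i < ndirty; i++)
        disk_write(log, dirty[i]);
}
```

With the sequence 900, 901, 902, 10, write-through delivers the blocks in that order, while the toy write-back cache delivers block 10 first. A crash part-way through that flush can leave in-place metadata on disk without its committed journal entry.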

> You especially want the buffer cache if you have LVM partitions.  
> Sectors on an LVM disk are not necessarily contiguous and can even span 
> multiple disks.  You definitely want the IO scheduler involved there.

That does not at all imply the use of the buffer cache.  All that you
need to satisfy this is AIO (asynchronous *submission* of the IO)
combined with O_DIRECT IO (synchronous *completion*) --- i.e., you can
submit multiple IOs concurrently, but you know for sure when each one
completes.  That still lets the elevator get strongly involved in the
scheduling and reordering of the IOs, but lets you know reliably when
things hit disk.
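The "asynchronous submission, synchronous completion" combination can be sketched from user space with POSIX AIO on a file opened with O_DIRECT. This is an illustration under stated assumptions, not blkback's code. The 4096-byte alignment, the request count and the fallback for filesystems that reject O_DIRECT were chosen only for this example. glibc implements POSIX AIO with helper threads, and glibc versions before 2.34 need -lrt. Linux-native io_submit() is the closer equivalent to what a kernel backend does.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define NREQ 4
#define BLK 4096 /* assumed alignment; real code should query the device */

/* Returns 0 only if every write completed in full and was flushed. */
static int write_batch(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL) /* filesystem does not support O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    void *buf[NREQ] = {0};
    struct aiocb cb[NREQ];
    int submitted = 0, rc = 0;
    memset(cb, 0, sizeof cb);

    /* Submission: queue every write without waiting for any of them. */
    for (; submitted < NREQ; submitted++) {
        int i = submitted;
        /* O_DIRECT needs aligned buffers, offsets and lengths. */
        if (posix_memalign(&buf[i], BLK, BLK) != 0) { rc = -1; break; }
        memset(buf[i], 'a' + i, BLK);
        cb[i].aio_fildes = fd;
        cb[i].aio_buf = buf[i];
        cb[i].aio_nbytes = BLK;
        cb[i].aio_offset = (off_t)i * BLK;
        cb[i].aio_sigevent.sigev_notify = SIGEV_NONE;
        if (aio_write(&cb[i]) != 0) { rc = -1; break; }
    }

    /* Completion: the I/O scheduler may reorder the requests, but we
     * still learn exactly when each finished and whether it succeeded. */
    for (int i = 0; i < submitted; i++) {
        const struct aiocb *const list[1] = { &cb[i] };
        while (aio_error(&cb[i]) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        if (aio_return(&cb[i]) != BLK)
            rc = -1;
    }

    /* Direct I/O completion can still leave data in a volatile drive
     * write cache; fdatasync() asks for that cache to be flushed too. */
    if (rc == 0 && fdatasync(fd) != 0)
        rc = -1;
    for (int i = 0; i < NREQ; i++)
        free(buf[i]);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```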

Fortunately, that's just what blkback is doing --- it's using submit_bio
to submit the write IOs without waiting for completion, and is using the
bio's bi_end_io callback to process the IO completion once it is hard on
disk.  
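The blkback pattern can be modelled in user space: submit without waiting, and respond to the guest only from the completion callback. Everything below is hypothetical (toy_bio, toy_submit, toy_device_complete_all, backend_end_io). It mirrors the shape of submit_bio() and the bio's bi_end_io/bi_private fields, not their real kernel signatures.

```c
#include <assert.h>
#include <stddef.h>

/* User-space toy, NOT the kernel bio API. */
struct toy_bio;
typedef void (*toy_end_io_t)(struct toy_bio *bio, int error);

struct toy_bio {
    unsigned long sector;
    toy_end_io_t end_io;  /* plays the role of bi_end_io */
    void *private_data;   /* plays the role of bi_private */
};

#define QUEUE_MAX 8
static struct toy_bio *inflight[QUEUE_MAX];
static size_t ninflight;

/* Like submit_bio(): queue the request and return immediately. */
static int toy_submit(struct toy_bio *bio)
{
    if (ninflight == QUEUE_MAX)
        return -1;
    inflight[ninflight++] = bio;
    return 0;
}

/* The "device" finishes requests in its own order (here, reverse),
 * calling each completion callback only once that request is done. */
static void toy_device_complete_all(void)
{
    while (ninflight > 0) {
        struct toy_bio *bio = inflight[--ninflight];
        bio->end_io(bio, 0);
    }
}

/* Backend completion: only at this point is the guest told "done". */
static int responses_sent;
static void backend_end_io(struct toy_bio *bio, int error)
{
    int *done = bio->private_data;
    if (error == 0)
        *done = 1;
    responses_sent++;
}
```

After three submissions no response has been sent. The responses appear only when the "device" completes each request, in whatever order it chooses.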

--Stephen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
