On Thu, Nov 25, 2010 at 11:46:40AM -0800, Jeremy Fitzhardinge wrote:
> The latter. There's a question over whether WRITE_BARRIER really
> supports empty barriers, since it appears that none of the existing
> backends implement it correctly - but on the other hand, the kernel
> blkback code does *try* to implement it, even though it fails. This
> change makes empty WRITE_BARRIERS work in qemu, which is helpful because
> the upstream blkfront tries to use them.
So far so good.
> But WRITE_BARRIER is fundamentally suboptimal for Linux's needs because
> it is a fully ordered barrier operation. What Linux needs is a simple
> FLUSH operation which just makes sure that previously completed writes
> are fully flushed out of any caches and buffers and are really on
> durable storage. It has no ordering requirements, so it doesn't prevent
> subsequent writes from being handled while the flush is going on.
> Christoph is therefore recommending that we add a specific FLUSH
> operation to the protocol with these properties so that we can achieve
> the best performance. But if the backend lacks FLUSH, we still need a
> reliable WRITE_BARRIER.
The problem is that qemu currently implements WRITE_BARRIER incorrectly,
empty or not. The Linux barrier primitive, which appears to extend 1:1
to Xen, implies ordering semantics, which the blkback implementation
implements by translating the write barrier back to a bio with the
barrier bit set. But the qemu backend does not impose any ordering,
so it gives you different behaviour from blkback. Is there any formal
specification of the Xen block protocol?
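
To make the difference concrete, here is a rough sketch of the two
semantics as a toy model. The names (req_kind, struct req, may_dispatch)
are made up for illustration and are not taken from blkback, qemu, or the
blkif headers; it only assumes requests sit in an array in submission order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds for a toy model; not the real blkif opcodes. */
enum req_kind { REQ_WRITE, REQ_BARRIER, REQ_FLUSH };

struct req {
    enum req_kind kind;
    bool done;              /* request has completed */
};

/*
 * May request q[i] be dispatched now, given q[] in submission order?
 *
 * Ordered barrier: it waits for every request submitted before it, and
 * every request submitted after it waits for the barrier to complete.
 *
 * Flush: in this model it imposes no ordering at all. It only covers
 * writes that had already completed when it was issued, so it never
 * blocks later writes and never waits for earlier ones.
 */
bool may_dispatch(const struct req *q, size_t i)
{
    for (size_t j = 0; j < i; j++) {
        if (q[j].done)
            continue;
        /* An incomplete earlier barrier blocks every later request. */
        if (q[j].kind == REQ_BARRIER)
            return false;
        /* A barrier must wait for every earlier request. */
        if (q[i].kind == REQ_BARRIER)
            return false;
    }
    return true;
}
```

With a queue of write, barrier, write, the second write cannot be
dispatched until the barrier has completed. With write, flush, write, all
three can be in flight at once. A backend that treats a non-empty barrier
like the second case gives the unordered behaviour described above.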
In the end the empty WRITE_BARRIER after this patch is equivalent to a
flush, which is fine for that particular command. The problem is that
a small minority of Linux filesystems actually relied on the ordering
semantics of a non-empty WRITE_BARRIER command, which qemu doesn't
provide.
But the patch doesn't make anything worse by also accepting empty
barrier writes, so I guess it's fine. My initial reply was just
supposed to be a reminder about the big dragon lurking here.
Xen-devel mailing list