This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains

To: Steven Smith <sos22-xen@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
From: Steve Ofsthun <sofsthun@xxxxxxxxxxxxxxx>
Date: Tue, 18 Jul 2006 19:24:52 -0400
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Tue, 18 Jul 2006 16:25:36 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <20060718203422.GA7497@xxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20060718125106.GA4727@xxxxxxxxx> <44BD05B5.6050108@xxxxxxxxxxxxxxx> <20060718203422.GA7497@xxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla Thunderbird 1.0 (X11/20041206)
Steven Smith wrote:

>> Have you built the guest environment on anything other than a 2.6.16
>> version of Linux?  We ran into extra work supporting older Linux versions.

> #ifdef soup will get you back to about 2.6.12-ish without too many
> problems.  These patches don't include that, since it would complicate

I was thinking about SLES9 (2.6.5), RHEL4 (2.6.9), RHEL3 (2.4.21).

>> You did some work to make xenbus a loadable module in the guest domains.
>> Can this be used to make xenbus loadable in Domain 0?

> I can't see any immediate reason why not, but it's not clear to me why
> that would be useful.

It just makes it easier to insert alternate bus implementations.

>> Domain 0 buffer cache coherency issues can cause catastrophic file
>> system corruption.  This is due to the backend accessing the backing
>> device directly, and QEMU accessing the device through buffered
>> reads and writes.  We are working on a patch to convert QEMU to use
>> O_DIRECT whenever possible.  This solves the cache coherency issue.

> I wasn't aware of these issues.  I was much more worried about domU
> trying to cache the devices twice, and those caches getting out of
> sync.  It's pretty much the usual problem of configuring a device into
> two domains and then having them trip over each other.  Do you have a
> plan for dealing with this?

We eliminate any buffer cache use in domain 0 for backing store objects.
This prevents double caching and reduces domain 0's memory footprint.
We don't restrict multiple domain access to the same "raw" backing
object.  Real hardware allows this (at least for SCSI/FC).  This may be
necessary for shared storage clustering.

>> Actually presenting two copies of the same device to Linux can cause
>> its own problems.  Mounting using LABEL= will complain about duplicate
>> labels.  However, using the device names directly seems to work.  With
>> this approach it is possible to decide in the guest whether to mount
>> a device as an emulated disk or a PV disk.

> My plan here was to just not support VMs which mix paravirtualised and
> ioemulated devices, requiring the user to load the PV drivers from an
> initrd.  Of course, you have to load the initrd somehow, but the
> bootloader should only be reading the disk, which makes the coherency
> issues much easier.  As a last resort, rombios could learn about the
> PV devices, but I'd rather avoid that if possible.
>
> Your way would be preferable, though, if it works.

We currently only allow this for the boot device (mainly to avoid the
rombios work you mention).  In addition, we make the QEMU device
visible only to the rombios (and not the guest OS) by controlling the
IDE probe logic in QEMU.

>>> -- Adding a new device to the qemu PCI bus which is used for
>>>  bootstrapping the devices and getting an IRQ.

>> Have you thought about supporting more than one IRQ?  We are experimenting
>> with an IRQ per device class (BUS, NIC, VBD).

> I considered it, but it wasn't obvious that there would be much
> benefit.  You can potentially scan a smaller part of the pending event
> channel mask, but that's fairly quick already.

The main benefit we see is for legacy Linux variants that limit
interrupt handling to one CPU per IRQ.  Allowing additional IRQs
increases the possible interrupt
processing concurrency.  In addition, one interrupt class can't starve
another (on SMP guests).

Steve Ofsthun - Virtual Iron Software, Inc.
