

Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains

To: Steve Ofsthun <sofsthun@xxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
From: Steven Smith <sos22-xen@xxxxxxxxxxxxx>
Date: Tue, 18 Jul 2006 21:34:23 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <44BD05B5.6050108@xxxxxxxxxxxxxxx>
References: <20060718125106.GA4727@xxxxxxxxx> <44BD05B5.6050108@xxxxxxxxxxxxxxx>
> >The attached patches allow you to use paravirtualised network and
> >block interfaces from fully virtualised domains, based on Intel's
> >patches from a few months ago.  These are significantly faster than
> >the equivalent ioemu devices, sometimes by more than an order of
> >magnitude.
> I've been working on a similar set of patches and your effort seems
> quite comprehensive.
Yeah, we (XenSource and Virtual Iron) really need to do a better job
of coordinating who's working on what. :)

> I do have a few questions:
> Can you comment on the testing matrix you used? In particular, does
> this patch address both 32-bit and 64-bit hypervisors?  Can 32-bit
> guests make 64-bit hypercalls?
This set of patches only deals with the 32-bit case.  Further, the PAE
case depends on Tim Deegan's new shadow mode posted last week.

Sorry, I should have said that in the initial post.

> Have you built the guest environment on anything other than a 2.6.16
> version of Linux?  We ran into extra work supporting older Linux versions.
#ifdef soup will get you back to about 2.6.12-ish without too many
problems.  These patches don't include that, since it would complicate
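
For anyone unfamiliar with the term, "#ifdef soup" usually means
compile-time checks on the kernel version, roughly as sketched here.
This is only an illustration, not code from the patches:
pv_compat_path() is a hypothetical name, and the two version macros are
defined locally so the sketch builds in user space (in a kernel build
they come from <linux/version.h>).  The 2.6.16 cut-off is simply the
version mentioned above.

```c
#include <string.h>

/* In a kernel build these come from <linux/version.h>; they are
 * defined here only so the sketch compiles in user space. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
/* Assumption for the sketch: pretend we are building against 2.6.12. */
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 12)
#endif

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 16)
/* Older kernel: a driver would route through a compatibility shim. */
static const char *pv_compat_path(void) { return "compat shim"; }
#else
/* 2.6.16 or later: use the interface the driver was written against. */
static const char *pv_compat_path(void) { return "native API"; }
#endif
```

Each interface that changed between the two kernel versions needs its
own block like this, which is where the "soup" comes from.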

> You did some work to make xenbus a loadable module in the guest domains.
> Can this be used to make xenbus loadable in Domain 0?
I can't see any immediate reason why not, but it's not clear to me why
that would be useful.

> >There is a slight complication in that the paravirtualised block
> >device can't share an IDE controller with an ioemu device, so if you
> >have an ioemu hda, the paravirtualised device must be hde or later.
> >This is to avoid confusing the Linux IDE driver.
> >
> >Note that having a PV device doesn't imply having a corresponding
> >ioemu device, and vice versa.  Configuring a single backing store to
> >appear as both an IDE device and a paravirtualised block device is
> >likely to cause problems; don't do it.
> Domain 0 buffer cache coherency issues can cause catastrophic file
> system corruption.  This is due to the backend accessing the backing
> device directly, and QEMU accessing the device through buffered
> reads and writes. We are working on a patch to convert QEMU to use
> O_DIRECT whenever possible.  This solves the cache coherency issue.
I wasn't aware of these issues.  I was much more worried about domU
trying to cache the devices twice, and those caches getting out of
sync.  It's pretty much the usual problem of configuring a device into
two domains and then having them trip over each other.  Do you have a
plan for dealing with this?
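
As a rough illustration of the "O_DIRECT whenever possible" idea
mentioned above, a user-space process can ask Linux to bypass the
buffer cache when it opens the backing store, and fall back to buffered
I/O where the file system refuses.  This is a minimal sketch, not the
QEMU patch: open_backing_store(), alloc_io_buffer() and the 4096-byte
IO_ALIGN value are assumptions made for the example.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* no O_DIRECT here: sketch stays buffered */
#endif

#define IO_ALIGN 4096          /* assumed alignment; devices may differ */

/* Open path with O_DIRECT if the file system accepts it, otherwise
 * retry with ordinary buffered I/O.  *is_direct reports which one
 * was obtained.  Returns a file descriptor, or -1 on error. */
int open_backing_store(const char *path, int flags, int *is_direct)
{
    int fd = open(path, flags | O_DIRECT);
    if (fd >= 0) {
        *is_direct = (O_DIRECT != 0);
        return fd;
    }
    *is_direct = 0;
    if (errno != EINVAL)       /* EINVAL: O_DIRECT unsupported here */
        return -1;
    return open(path, flags);
}

/* O_DIRECT transfers generally need aligned buffers, offsets and
 * lengths, so allocate the I/O buffer with posix_memalign(). */
void *alloc_io_buffer(size_t len)
{
    void *buf = NULL;
    if (posix_memalign(&buf, IO_ALIGN, len) != 0)
        return NULL;
    return buf;
}
```

Note that this only addresses the dom0 buffer cache; it does nothing
about a guest caching the same device twice.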

> Actually presenting two copies of the same device to Linux can cause
> its own problems.  Mounting using LABEL= will complain about duplicate
> labels.  However, using the device names directly seems to work.  With
> this approach it is possible to decide in the guest whether to mount
> a device as an emulated disk or a PV disk.
My plan here was to just not support VMs which mix paravirtualised and
ioemulated devices, requiring the user to load the PV drivers from an
initrd.  Of course, you have to load the initrd somehow, but the
bootloader should only be reading the disk, which makes the coherency
issues much easier.  As a last resort, rombios could learn about the
PV devices, but I'd rather avoid that if possible.

Your way would be preferable, though, if it works.

> >The patches consist of a number of big parts:
> >
> >-- A version of netback and netfront which can copy packets into
> >   domains rather than doing page flipping.  It's much easier to make
> >   this work well with qemu, since the P2M table doesn't need to
> >   change, and it can be faster for some workloads.
> Recent patches to change QEMU to dynamically map memory may make this
> easier.
Yes, agreed.  It should be possible to add this in later in a
backwards-compatible fashion.

> >-- Reworking the device model and hypervisor support so that iorequest
> >   completion notifications no longer go to the HVM guest's event
> >   channel mask.  This avoids a whole slew of really quite nasty race
> >   conditions.
> This is great news.  We were filtering iorequest bits out during guest
> event notification delivery.  Your method is much cleaner.
Thank you.

> >-- Adding a new device to the qemu PCI bus which is used for
> >   bootstrapping the devices and getting an IRQ.
> Have you thought about supporting more than one IRQ?  We are experimenting
> with an IRQ per device class (BUS, NIC, VBD).
I considered it, but it wasn't obvious that there would be much
benefit.  You can potentially scan a smaller part of the pending event
channel mask, but that's fairly quick already.
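
To make the "scan a smaller part" point concrete, here is a simplified
sketch of walking a pending bitmap over a range of words.  It is not
the code from the patches: scan_pending(), the handler type and the
32-word bitmap size are hypothetical, and the real thing would need
atomic bit operations, which are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define EVTCHN_WORDS 32   /* assumption: 32 words of 32 bits each */

typedef void (*evtchn_handler_t)(unsigned int port, void *ctx);

/* Scan words [first_word, last_word) of the pending bitmap, skipping
 * masked ports.  Each pending, unmasked port is cleared and passed to
 * handler (if non-NULL).  Returns the number of ports handled.
 * Non-atomic: a sketch of the scan order only. */
unsigned int scan_pending(uint32_t *pending, const uint32_t *mask,
                          size_t first_word, size_t last_word,
                          evtchn_handler_t handler, void *ctx)
{
    unsigned int handled = 0;

    for (size_t w = first_word; w < last_word; w++) {
        uint32_t ready = pending[w] & ~mask[w];
        if (ready == 0)
            continue;                     /* cheap skip of idle words */

        for (unsigned int bit = 0; bit < 32; bit++) {
            uint32_t b = UINT32_C(1) << bit;
            if (!(ready & b))
                continue;
            pending[w] &= ~b;             /* acknowledge the event */
            if (handler)
                handler((unsigned int)(w * 32 + bit), ctx);
            handled++;
        }
    }
    return handled;
}
```

With one IRQ, the interrupt handler scans the whole range
(0 to EVTCHN_WORDS).  With an IRQ per device class, each handler could
pass a narrower range, but since idle words are skipped with a single
comparison the saving is small.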


> >-- Support for hypercalls from HVM domains
> >
> >-- Various shims and fixes to the frontends so that they work without
> >   the rest of the xenolinux infrastructure.
> >
> >The patches still have a few rough edges, and they're not as easy to
> >understand as I'd like, but I think they should be mostly
> >comprehensible and reasonably stable.  The plan is to add them to
> >xen-unstable over the next few weeks, probably before 3.0.3, so any
> >testing which anyone can do would be helpful.
> This is a very good start!
> Steve
> -- 
> Steve Ofsthun - Virtual Iron Software, Inc.
