Steven Smith wrote:
The attached patches allow you to use paravirtualised network and
block interfaces from fully virtualised domains, based on Intel's
patches from a few months ago. These are significantly faster than
the equivalent ioemu devices, sometimes by more than an order of magnitude.
Excellent work, Steven!
I've been working on a similar set of patches and your effort seems
quite comprehensive. I do have a few questions:
Can you comment on the testing matrix you used? In particular, does
this patch address both 32-bit and 64-bit hypervisors? Can 32-bit
guests make 64-bit hypercalls?
Have you built the guest environment on anything other than a 2.6.16
version of Linux? We ran into extra work supporting older Linux versions.
You did some work to make xenbus a loadable module in the guest domains.
Can this be used to make xenbus loadable in Domain 0?
These drivers are explicitly not considered by XenSource to be an
alternative to improving the performance of the ioemu devices.
Rather, work on both will continue in parallel.
I agree. Both activities are worth developing.
There is a slight complication in that the paravirtualised block
device can't share an IDE controller with an ioemu device, so if you
have an ioemu hda, the paravirtualised device must be hde or later.
This is to avoid confusing the Linux IDE driver.
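As an illustrative config fragment (paths and exact syntax are hypothetical and may differ between Xen versions), the constraint above would look something like this in a guest config, with the ioemu disk on hda and the paravirtualised disk on hde so they sit on separate emulated IDE controllers:

```
# Illustrative only -- device paths and image names are made up.
disk = [ 'file:/images/hvm.img,ioemu:hda,w',
         'phy:/dev/vg/guestdata,hde,w' ]
```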
Note that having a PV device doesn't imply having a corresponding
ioemu device, and vice versa. Configuring a single backing store to
appear as both an IDE device and a paravirtualised block device is
likely to cause problems; don't do it.
Several problems exist here:
Domain 0 buffer cache coherency issues can cause catastrophic file
system corruption. This is due to the backend accessing the backing
device directly, and QEMU accessing the device through buffered reads
and writes. We are working on a patch to convert QEMU to use O_DIRECT
whenever possible. This solves the cache coherency issue.
Actually presenting two copies of the same device to Linux can cause
its own problems. Mounting using LABEL= will complain about duplicate
labels. However, using the device names directly seems to work. With
this approach it is possible to decide in the guest whether to mount
a device as an emulated disk or a PV disk.
The patches consist of a number of big parts:
-- A version of netback and netfront which can copy packets into
domains rather than doing page flipping. It's much easier to make
this work well with qemu, since the P2M table doesn't need to
change, and it can be faster for some workloads.
Recent patches to change QEMU to dynamically map memory may make this
easier. We still avoid it to prevent large guest pages from being
broken up (under the XI shadow code).
The copying interface has been confirmed to work in paravirtualised
domains, but is currently disabled there.
-- Reworking the device model and hypervisor support so that iorequest
completion notifications no longer go to the HVM guest's event
channel mask. This avoids a whole slew of really quite nasty race conditions.
This is great news. We were filtering iorequest bits out during guest
event notification delivery. Your method is much cleaner.
-- Adding a new device to the qemu PCI bus which is used for
bootstrapping the devices and getting an IRQ.
Have you thought about supporting more than one IRQ? We are experimenting
with an IRQ per device class (BUS, NIC, VBD).
-- Support for hypercalls from HVM domains.
-- Various shims and fixes to the frontends so that they work without
the rest of the xenolinux infrastructure.
The patches still have a few rough edges, and they're not as easy to
understand as I'd like, but I think they should be mostly
comprehensible and reasonably stable. The plan is to add them to
xen-unstable over the next few weeks, probably before 3.0.3, so any
testing which anyone can do would be helpful.
This is a very good start!
Steve Ofsthun - Virtual Iron Software, Inc.
Xen-devel mailing list