
[Xen-devel] RE: Pre-virtualization, was Re: linux/arch/xen/i386 or linux/arch/i386/xen

  • To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Volkmar Uhlig" <volkmar@xxxxxxxxxx>
  • Date: Sat, 21 May 2005 00:32:58 +0200
  • Cc: Joshua LeVasseur <jtl@xxxxxxxxxx>
  • Delivery-date: Fri, 20 May 2005 22:32:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcVdWlYxv+URE4XuRVKqMSdvDTmI+wAL9kUA
  • Thread-topic: Pre-virtualization, was Re: linux/arch/xen/i386 or linux/arch/i386/xen

In the following, I address several of the emails from the discussion.

> [aq]
> Joshua, as I understand, this project would be a competitor of Xen?

This project is not a competitor but one piece in the big picture.  The
Xen system consists of three components: (1) the Xen kernel/hypervisor,
(2) the guest operating systems, and (3) support infrastructure such as
management tools.  A significant effort currently goes into the second
part--the adaptation of a large variety of guest operating systems via
manual paravirtualization.

Since 1997 we've been maintaining a Linux port to the L4 kernel
(starting with Linux 2.0 up to 2.6), and it was and still is a
significant maintenance effort.  A similar effort has now been started
by the Xen developers, and at least two more projects come to mind:
User-mode Linux and MkLinux.  All of these projects develop, maintain,
and test a specialized port of Linux to their specific underlying
hypervisor.  Pre-virtualization tries to bridge the gap between the
different hypervisors as well as increase the confidence in the general
code base.

> [Ronald G. Minnich]
> You still have to modify the kernel, it seems.

One can differentiate between three types of sensitive operations.
The first class consists of sensitive instructions, which perform a
privileged operation that must be virtualized.  The second class
consists of sensitive memory operations, such as accesses to the APIC
and to memory-mapped device registers.  The third class consists of
operations that the hypervisor or the hardware prevents from being
virtualized.

The first class of operations is taken care of at the compile stage.
For the second class we have an experimental profiling feedback loop.
There, the wedge traps sensitive memory operations and records the
instruction pointer and the intended operation (e.g., read device
register, write to page table).  In a second compile run we feed the
traces back into the compiler and annotate the recorded memory access
instructions--similar to the normal sensitive instructions.  The
feedback loop and profiling run can of course be avoided by manual
annotation.  In the current release we performed that annotation by
hand, which gives 100% coverage (and avoids extensive instruction
emulation and analysis).
The third class of sensitive operations is where the virtualization
layer diverges from the actual hardware.  For Xen that is the page
tables as well as DMA address calculations (the latter also for L4).
Since this is actually a modification of the underlying architecture, it
requires manual modification of the kernel.
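
As a rough illustration of the profiling feedback loop for the second
class, here is a minimal Python sketch.  It is not the project's actual
tooling: the Instruction type, the profile_run and annotate functions,
the region names, and all addresses are hypothetical and chosen only to
show the two passes (record the trapping instruction pointers, then
annotate those instructions in a second run).

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instruction:
    ip: int                            # instruction pointer of the instruction
    text: str                          # assembly text, for illustration only
    annotation: Optional[str] = None   # set during the second pass


def profile_run(accesses: List[Tuple[int, int, bool]],
                regions: Dict[str, Tuple[int, int]]) -> Dict[int, str]:
    """First pass: stand-in for the wedge trapping sensitive memory accesses.

    Each access is (ip, address, is_write).  Any access that falls into a
    sensitive region is recorded as ip -> intended operation.
    """
    trace: Dict[int, str] = {}
    for ip, addr, is_write in accesses:
        for name, (start, end) in regions.items():
            if start <= addr < end:
                kind = "write" if is_write else "read"
                trace[ip] = f"{kind} {name}"
    return trace


def annotate(program: List[Instruction],
             trace: Dict[int, str]) -> List[Instruction]:
    """Second pass: mark every instruction whose ip appears in the trace."""
    return [replace(ins, annotation=trace[ins.ip]) if ins.ip in trace else ins
            for ins in program]


# Hypothetical program and address ranges (assumptions, not real layouts).
program = [
    Instruction(0x100, "mov (%eax), %ebx"),
    Instruction(0x104, "add $1, %ebx"),
    Instruction(0x108, "mov %ebx, (%ecx)"),
]
regions = {
    "device_register": (0xFEE00000, 0xFEE01000),
    "page_table": (0x00200000, 0x00201000),
}
accesses = [
    (0x100, 0xFEE00020, False),   # read of a device register
    (0x108, 0x00200010, True),    # write to a page table
    (0x108, 0x00500000, True),    # ordinary memory, not recorded
]

trace = profile_run(accesses, regions)
annotated = annotate(program, trace)
for ins in annotated:
    print(hex(ins.ip), ins.text, "->", ins.annotation)
```

Only instructions that actually trapped during the profiling run end up
annotated; manual annotation sidesteps that dependence on what the
profiling run happened to execute.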

> [Ronald G. Minnich]
> Are there fewer mods? What is the advantage of this over Xen?

Yes, there are significantly fewer modifications.  Excluding the DMA
modifications required for device driver reuse and other experimental
stuff, we need about 80 lines of code for Xen.  Using the feedback loop
eliminates almost all of those modifications.  L4 requires slightly more
modifications due to the required relinking of Linux and high-level
transparent virtualization (e.g., mapping Linux tasks to L4 threads).
That means a Linux kernel of pretty much any version can be adapted to
run on a hypervisor (Xen, L4, UML, rHype) with about 100 lines of code.
AFAIK, the current modifications and code duplications are at least one
(if not two) orders of magnitude more--for each individual hypervisor.

> [Anthony Liguori]
> I'd really like to see a "pure" form of pre-virtualization that 
> required no modifications at all to the underlying source tree.  
> Besides being interesting from an academic standpoint I think it 
> would be highly useful for supporting legacy Open Source operating 
> systems.

The feedback loop provides that but may have a performance penalty.  In
order to achieve 100% coverage, the wedge still has to track and protect
sensitive resources.  Manual annotation "uses" the engineer's brain to
achieve the safe 100% coverage.
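
To make the coverage point concrete, here is a small Python sketch under
stated assumptions: feedback_coverage is a hypothetical helper, and the
site addresses are invented.  It compares the access sites seen during a
profiling run against the full set of sensitive sites; whatever was
missed is what the wedge would still have to catch by protecting the
resource at run time.

```python
from typing import Set, Tuple


def feedback_coverage(profiled: Set[int],
                      sensitive: Set[int]) -> Tuple[float, Set[int]]:
    """Return (fraction of sensitive sites that were profiled, missed sites)."""
    if not sensitive:
        return 1.0, set()
    missed = sensitive - profiled
    return len(sensitive & profiled) / len(sensitive), missed


# Hypothetical instruction pointers of sensitive memory accesses.
all_sensitive_sites = {0x100, 0x108, 0x200, 0x2F0}
profiled_sites = {0x100, 0x108, 0x200}   # what the profiling run exercised

ratio, missed = feedback_coverage(profiled_sites, all_sensitive_sites)
print(f"coverage: {ratio:.0%}, still needs trapping: {sorted(hex(s) for s in missed)}")
```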

> I'm very excited about this technology.


> I imagine that you can get all the benefits of binary-rewriting with 
> less complexity and better performance (with the only limitation 
> being that you have the source code for the OS which is fine by me).

Actually, systems that rewrite binaries, and even VT, can benefit from
pre-virtualization, since they can make use of the additional
information provided in the binary.  Another important side effect is
that pre-virtualization enables hardware from vendors that provide
binary-only drivers to be used in a virtualized environment.

> I noticed that you have a patch to xen-2.0.2.  One of the changes was 
> a known gcc bug that's been worked around in newer versions of 2.0.x.

The reason was for all environments to use the same version of Linux for
performance comparisons.  We haven't gotten around to upgrading to the
latest version.

> The other changes explicitly disabled SMP and ACPI.  Is there a 
> compelling reason to patch xen for this versus just specifying the 
> appropriate command line options to disable these things at run time?

We run Xen in PIC mode because we haven't completed the APIC model yet.
We patched it because we weren't aware of the command line parameters
(RTFM ;)))))  Consider that we are mostly L4 hackers and Xen is
"somewhat foreign" to us...

> The Linux guest changes seem a bit more substantial than I expected.  
> Is there a place where you break down what these changes were and why 
> they were necessary?  Do you have any idea of the additional amount of 
> work necessary to reduce these changes to as small as humanly possible
> (perhaps even no changes at all)?

Other than the reasons I've listed above, the project still contains some
"project boot-strapping burden"; we've not cleaned up some superfluous
modifications yet and left others in place for ongoing experimental
evaluations (such as feedback-loop coverage evaluation and page table

> [aq]
> yes, but that is not a (scientific) paper, which would let us
> understand more about this technology.

Sorry for the limited details on the technology at this point.  A
scientific paper is still in the review process and thus not available
yet.  However, we have assembled a short whitepaper that covers the
basic aspects, available at:

- Volkmar
