This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Questioning the Xen Design of the VMM

To: "Al Boldi" <a1426z@xxxxxxxxx>, "Daniel Stodden" <stodden@xxxxxxxxxx>
Subject: RE: [Xen-devel] Questioning the Xen Design of the VMM
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Thu, 10 Aug 2006 17:42:35 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 10 Aug 2006 08:44:49 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <200608101755.11128.a1426z@xxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Aca8jOr9GKlGkbPoRRGOsKRh6/J/RQAAGbuA
Thread-topic: [Xen-devel] Questioning the Xen Design of the VMM

> -----Original Message-----
> From: Al Boldi [mailto:a1426z@xxxxxxxxx] 
> Sent: 10 August 2006 15:55
> To: Petersson, Mats; Daniel Stodden
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Questioning the Xen Design of the VMM
> Petersson, Mats wrote:
> > > Al Boldi wrote:
> > > You mean AMDV/IntelVT extensions?
> >
> > Yes.
> >
> > > If so, then these extensions don't actively participate 
> in the act of
> > > virtualization, but rather fix some x86-arch 
> shortcomings, that make it
> > > easier for software (i.e. Xen) to virtualize, thus 
> circumventing the need
> > > to do binary translation.  Is this a correct reading?
> >
> > Not sure what your exact meaning is here.
> >
> > What do you mean by "actively participate in the act of 
> virtualization".
> Is there any logic involved, that does some kind of a 
> translation/control?
> It seems not.

AMD has announced a feature called "Nested Page Tables", which adds
another layer of address translation to page-table lookups, so we can
give the guest a "physical" map of [0..256MB] whilst actually backing it
with some (completely) random set of physical pages that it actually
gets to use. This is not available in the current generation of chips,
but it will be in the next... 
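As a rough illustration (a toy model, not AMD's actual table format -- the page numbers and the flat dictionaries standing in for multi-level tables are invented), nested paging composes two lookups: the guest's own page table maps guest-virtual to guest-physical, and a second, hypervisor-owned table maps guest-physical to host-physical:

```python
# Toy model of nested paging: two dictionary "page tables" stand in for
# the real multi-level structures walked by the MMU.
PAGE = 4096

# Guest-owned table: guest-virtual page number -> guest-physical page number.
guest_pt = {0x00000: 0x00010, 0x00001: 0x00011}

# Hypervisor-owned nested table: guest-physical -> host-physical page number.
# The guest believes it owns contiguous low memory; the host scatters it.
nested_pt = {0x00010: 0x7abcd, 0x00011: 0x12345}

def translate(gva):
    """Translate a guest-virtual address to a host-physical address."""
    vpn, offset = divmod(gva, PAGE)
    gpn = guest_pt[vpn]        # first stage: guest's own page table
    hpn = nested_pt[gpn]       # second stage: nested page table
    return hpn * PAGE + offset

print(hex(translate(0x123)))   # -> 0x7abcd123
```

The guest walks only `guest_pt` and never sees `nested_pt`; in real hardware both walks are done by the MMU, which is what removes the need for the software shadow tables described below.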

I believe that Intel has at least publicly stated that they have a
similar solution in the pipeline. 

We (AMD) have also publicly talked about IOMMU, which will help hardware
virtualization. I'll make more comments on that in reply to your other
mail.

So, in the current generation, shadow page tables are used: the actual
page table used by the guest is write-protected, and when write faults
occur, we replace the value written by the guest with a translated value
in a second page table, which the guest never sees but which the
processor uses to translate memory accesses. It's a fair bit more work,
but the guest is entirely unaware of the REAL PHYSICAL address it lives
at. 
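A minimal sketch of that write-fault path (invented names and frame numbers; real shadow-paging code in Xen is far more involved):

```python
# Toy shadow paging: the guest edits its own page table, which the VMM
# keeps write-protected.  Each trapped write is replayed into a shadow
# table after translating the guest-"physical" frame to a real one.

# VMM-owned map: guest-physical frame -> host-physical frame (made-up numbers).
p2m = {0x10: 0x7abc, 0x11: 0x1234}

guest_pt = {}    # what the guest *thinks* its page table contains
shadow_pt = {}   # what the MMU actually walks

def on_write_fault(vpn, gpfn):
    """Emulate the guest's faulting write to its write-protected page table."""
    guest_pt[vpn] = gpfn        # preserve the guest's illusion intact
    shadow_pt[vpn] = p2m[gpfn]  # install the translated, real mapping

on_write_fault(0, 0x10)
assert guest_pt[0] == 0x10      # the guest reads back exactly what it wrote
assert shadow_pt[0] == 0x7abc   # the hardware uses the frame the guest never sees
```

The extra work is the trap on every page-table write; nested page tables (above) eliminate it by doing the second translation in hardware.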

> Daniel Stodden wrote:
> >
> > they fix the issues, removing the general need for binary 
> translation,
> > but go well beyond that as well.
> >
> > a comparatively simple example of where it goes beyond are privilege
> > levels. basic system virtualization would just move the 
> guest kernel to
> > a nonprivileged level to maintain control in the vmm. so 
> you'd have the
> > hypervisor in supervisor mode (that's why it's called a 
> hypervisor), and
> > both guest kernel and applications in user mode [1]. 
> [should note that
> > xen makes a difference here, using x86 privilege levels 
> which are more
> > complex].
> >
> > what vtx does is keeping the privilege rings in protected 
> mode untouched
> > by the virtualization features. instead, two whole new 
> modes are added:
> > 'vmx root' and 'vmx non-root'. the former applies to the 
> vmm, the latter
> > to the guests. _both_ of these basically implement the 
> protected mode as
> > it used to be. so hardware virtualization won't have to 
> muck around with
> > the regular privilege system.
> >
> > one example where this is particularly useful are hosted vmms, e.g.
> > vmware workstation. imagine a natively-running operating 
> system and a
> > machine monitor running on top of (or integrated with) 
> that. the system
> > would run in vmx-root mode. regular application processes 
> there in ring3
> > as they used to. additionally, one may start guest systems 
> on top of the
> > vmm, which again are implemented on top of a regular x86 
> protected mode,
> > but in non-root mode.
> >
> > all of the above
> >  - can be functionally achieved _efficiently_ without hardware
> >    extensions like vmx
> >  - but ONLY as long as the privilege architecture supports
> >    virtualization
> >  - x86 does NOT [2]
> >    the pushf/popf outlined is an example of where the problems are
> >    - binary translation is a way to do it anyway, but does not count
> >      as 'efficient'.
> >
> > with vmx
> >   - efficient virtualization is achieved.
> >   - some things just get additional flexibility.
> So VMX doesn't really virtualize anything, but rather enables 
> software to 
> perform virtualization more efficiently.
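
The pushf/popf problem Daniel mentions can be sketched in a few lines of Python -- a toy model, not real VMM code; the class and method names are invented for illustration:

```python
# Why classic x86 resisted trap-and-emulate: a "virtualizable" privileged
# instruction faults in user mode, letting the VMM intercept and emulate
# it.  x86's popf instead silently drops the IF (interrupt-enable) change.

class VMM:
    def __init__(self):
        self.virtual_if = True      # the guest's *virtual* interrupt flag

    def emulate_popf(self, want_if):
        self.virtual_if = want_if   # emulate; the real IF stays VMM-controlled

class CPU:
    def __init__(self):
        self.ring = 3               # deprivileged guest kernel runs in ring 3
        self.interrupts = True      # the real IF flag

    def popf_x86(self, want_if):
        # Pre-VT x86 behaviour: in ring 3, popf does NOT trap -- the IF
        # change is silently ignored, so the VMM never even sees it.
        if self.ring == 0:
            self.interrupts = want_if

    def popf_trapping(self, vmm, want_if):
        # What a virtualizable ISA (or binary translation) provides:
        # the attempt reaches the VMM, which emulates it.
        if self.ring == 0:
            self.interrupts = want_if
        else:
            vmm.emulate_popf(want_if)

cpu, vmm = CPU(), VMM()
cpu.popf_x86(False)           # guest kernel "disables" interrupts...
print(cpu.interrupts)         # -> True: silently ignored, guest misled
cpu.popf_trapping(vmm, False)
print(vmm.virtual_if)         # -> False: the VMM saw it and emulated it
```

Binary translation works around the silent failure by rewriting popf before it runs; VMX sidesteps it by giving the guest a whole non-root copy of the privilege rings.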


> Petersson, Mats wrote:
> > There is no doubt that para-virtualization is one viable 
> solution to the
> > virtualization problem, but it's not the ONLY solution. 
> Each user has a
> > choice: Recompile and get performance, or run unmodified 
> code at lower
> > performance.
> Agreed, but how much lower performance are we talking about 
> in an HVM vs 
> para-virtualized scenario?

Unfortunately, this is not a trivial question to answer, since it
depends very much on what amount of hardware access is involved in the
workload. I'm sure one can conceive of cases that are 10x slower and
other cases where you get 98-99.9% of the original performance in the
virtual machine. 

Para-virtual is supposed to be around 95-98% of a native solution - but
again, the exact figures depend on the workload - pathological cases can
probably be found. 

A large percentage of any slowdown from HVM is caused by the way
hardware is emulated - using qemu-dm to model the virtual hardware. If
you have a disk benchmark, it's quite feasible that the native machine
has 10x or so the throughput of the HVM system. 

> Thanks!
> --
> Al
