

RE: [Xen-devel] Questioning the Xen Design of the VMM

To: "Daniel Stodden" <stodden@xxxxxxxxxx>, "Al Boldi" <a1426z@xxxxxxxxx>
Subject: RE: [Xen-devel] Questioning the Xen Design of the VMM
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Thu, 10 Aug 2006 18:34:02 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 10 Aug 2006 09:34:49 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <1155225232.12509.43.camel@xxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Aca8lY6qljvbr4ehRwmffANrdlJJNAAAeC+Q
Thread-topic: [Xen-devel] Questioning the Xen Design of the VMM

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Daniel Stodden
> Sent: 10 August 2006 16:54
> To: Al Boldi
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Questioning the Xen Design of the VMM
> On Thu, 2006-08-10 at 17:57 +0300, Al Boldi wrote:
> > > > So HVM solves the problem, but why can't this layer be implemented in
> > > > software?
> > >
> > > the short answer at the cpu level is "because of the arcane nature of
> > > the x86 architecture" :/
> > 
> > Which AMDV/IntelVT supposedly solves?
> regarding the virtualization issue, yes.
> > > once the cpu problem has been solved, you'd need to emulate hardware
> > > resources an unmodified guest system attempts to drive. that again takes
> > > additional cycles. elimination of the peripheral hardware interfaces by
> > > putting the I/O layers on top of an abstract low-level path into the VMM
> > > is one of the reasons why xen is faster than others. many systems do
> > > this quite successfully, even for 'non-modified' guests like e.g.
> > > windows, by installing dedicated, virtualization aware drivers once the
> > > base installation went ok.
> > 
> > You mean "virtualization aware" drivers in the guest-OS?  Wouldn't this
> > amount to a form of patching?
> yes, strictly speaking it is a modification. but one based upon usually
> well-defined interfaces, and it does not require parsing opcodes and
> patching code segments.

Exactly. There's a big difference between applying patches to the existing 
binary, and adding new code by installing a driver once the system is running. 

Compare this with installing Windows: it may not know how to drive nVidia's 
or ATI's latest graphics card, but you can install a new driver for it. You 
could, alternatively, perhaps patch the existing nVidia driver to make it work 
for the latest card, but most people prefer to grab the latest driver from 
www.nvidia.com or so...  

In this case, we're installing a driver that can talk via a defined interface 
to the hypervisor, and by doing so, allow us to get "fast" disk access, network 
access or even graphics.
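
To make the "driver that talks via a defined interface" idea a bit more 
concrete, here is a minimal sketch in Python. It is not the actual Xen 
interface: the names SharedRing, BlockRequest and Backend are made up for 
illustration, and a pair of in-process queues stands in for a page of memory 
shared between the guest driver and the side that owns the real disk.

```python
from collections import deque
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size for this sketch


@dataclass
class BlockRequest:
    req_id: int
    op: str            # "read" or "write"
    sector: int
    data: bytes = b""


class SharedRing:
    """Hypothetical stand-in for a shared-memory request/response ring."""

    def __init__(self, slots=8):
        self.slots = slots
        self.requests = deque()
        self.responses = deque()

    def push_request(self, req):
        # The guest-side driver may only queue while a slot is free.
        if len(self.requests) >= self.slots:
            return False
        self.requests.append(req)
        return True


class Backend:
    """Hypothetical backend that owns the (here: in-memory) disk."""

    def __init__(self, ring, sectors=16):
        self.ring = ring
        self.disk = [bytes(SECTOR_SIZE) for _ in range(sectors)]

    def service(self):
        # Drain queued requests, check them, and post one reply for each.
        while self.ring.requests:
            req = self.ring.requests.popleft()
            if not 0 <= req.sector < len(self.disk):
                reply = (req.req_id, "error", b"")
            elif req.op == "write":
                padded = req.data[:SECTOR_SIZE].ljust(SECTOR_SIZE, b"\0")
                self.disk[req.sector] = padded
                reply = (req.req_id, "ok", b"")
            elif req.op == "read":
                reply = (req.req_id, "ok", self.disk[req.sector])
            else:
                reply = (req.req_id, "error", b"")
            self.ring.responses.append(reply)
```

The point of the sketch is only that the guest never touches device 
registers: it queues whole requests, and the other side checks and carries 
them out. That is why no opcode parsing or binary patching is involved.
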
> otoh, one which obviously needs to be reiterated for any additional
> guest os family.
> > > > I'm sure there can't be a performance issue, as this virtualization
> > > > doesn't occur on the physical resource level, but is (should be) rather
> > > > implemented as some sort of a multiplexed routing algorithm, I think :)
> > >
> > > few device classes support resource sharing in that manner efficiently.
> > > peripheral devices in commodity platforms are inherently single-hosted
> > > and won't support unfiltered access by multiple driver instances in
> > > several guests.
> > 
> > Would this be due to the inability of the peripheral to switch contexts fast
> > enough?
> maybe. more important: commodity peripherals typically wouldn't
> sufficiently implement security and isolation. you certainly won't
> 'route' arbitrary block I/O from a guest system to your disk controller
> without further investigation and translation. it may gladly overwrite
> your host partition or whatever resource you granted elsewhere.

Context-switching is only part of the problem, as Daniel says. IOMMU is a 
technology that is coming in future products from AMD (and I'm sure Intel are 
working on such products as well; IBM already have a chipset in production for 
some of the PowerPC and x86-based servers). This will solve address 
translation, but it won't solve the problem of sharing devices. That will 
require either some form of context-switching of the device (which may be 
acceptable for some devices) or hardware changes that multi-port the device, 
so that each guest gets its own separate interface. 
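
As an illustration of the "investigation and translation" Daniel mentions 
for block I/O, here is a small Python sketch. It assumes each guest has been 
granted one contiguous range of host sectors; Extent and translate are names 
invented for this example, not anything from Xen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical grant: the host sectors a guest's virtual disk maps to."""
    host_start: int   # first host sector of the grant
    length: int       # number of sectors granted


def translate(extent, guest_sector, count):
    """Map a guest request onto host sectors, refusing anything outside the grant."""
    if guest_sector < 0 or count <= 0:
        raise ValueError("malformed request")
    if guest_sector + count > extent.length:
        # Without this check the guest could reach the host partition
        # or another guest's data.
        raise PermissionError("request leaves the guest's extent")
    return extent.host_start + guest_sector
```

For example, with a grant of 100 sectors starting at host sector 1000, guest 
sector 0 maps to host sector 1000, and a two-sector request starting at guest 
sector 99 is rejected because it would run past the end of the grant.
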

However, context switching of external devices is DIFFICULT for several 
reasons, one being: it's not always possible to read the "context" of a 
device... Many devices have write-only fields, and other types of "can't read 
it back" type of behaviour. 

For example, in an IDE controller, if the system has just issued a non-DMA 
write of a sector, waited for the READY to come back from the IDE controller, 
and started writing bytes to the IDE interface, it can't stop writing bytes 
until it has reached the number the interface expects (usually 512 bytes). 
There is also, AFAIK, no way to tell how many bytes are left to write (or to 
read, for transfers in the opposite direction). This is obviously "braindead" 
hardware, but it just so happens that much of the PC hardware, even in modern 
varieties, is pretty much "braindead" - i.e. it has no more intelligence in 
the device than absolutely necessary. I'm not sure how easy it is to 
interrogate the status of a DMA transfer, as I've never really dealt much 
with those. 
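
A toy model of that behaviour, in Python, may help. It is deliberately 
simplified and does not follow the real IDE register layout: ToyPioDisk and 
its method names are assumptions for illustration. The remaining-byte counter 
is private on purpose, mirroring the fact that the driver has no register to 
read it back from.

```python
class ToyPioDisk:
    """Simplified model of a non-DMA sector write; not real IDE registers."""

    SECTOR_BYTES = 512

    def __init__(self):
        self._remaining = 0          # internal only: nothing exposes this count
        self._buffer = bytearray()
        self._lba = None
        self.sectors = {}

    @property
    def data_requested(self):
        # The only visible status: "I still want data", not "how much".
        return self._remaining > 0

    def command_write_sector(self, lba):
        if self._remaining:
            raise RuntimeError("previous transfer not finished")
        self._lba = lba
        self._buffer = bytearray()
        self._remaining = self.SECTOR_BYTES

    def write_word(self, word):
        # Data goes in 16 bits at a time, as with port I/O.
        if self._remaining == 0:
            raise RuntimeError("no transfer in progress")
        if len(word) != 2:
            raise ValueError("data port takes 16-bit words")
        self._buffer += word
        self._remaining -= 2
        if self._remaining == 0:
            self.sectors[self._lba] = bytes(self._buffer)
```

If one guest is preempted after 100 of the 256 words, the model refuses a new 
command until the remaining 156 words have been written, and whoever takes 
over the device cannot find out from the device itself that 156 is the number.
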

Another complexity in context-switching devices is that it really can't be done 
on-the-fly, but must be implemented on an "on-demand" basis [anything else would 
be FAR too slow - we don't want to do that many operations over the PCI bus that 

> > If so, how about a "AMDV/IntelVT" for peripherals?
> good idea, and actually practical. unfortunately, this is where it's
> getting expensive.

IOMMU isn't particularly expensive, but multiple ports within a device can get 
pretty complicated - and only really suitable for higher end devices in the 
first place.

> > > from the vmm perspective, it always boils down to emulating the device.
> > > however, with varying degrees of complexity regarding the translation
> > > of guest requests to physical access. it depends. ide, afaik is known to
> > > work comparatively well.
> > 
> > Probably because IDE follows a well defined API?
> yes. however, i'm not an ide guy. 

I'm strictly not an IDE guy either, but I do know a fair bit about it, as I've 
written a bunch of test-code that uses the IDE interface to exercise the 
Xen-HVM/SVM code-paths involved with IO operations.

The IDE interface is pretty straightforward and simple, so it makes it easy to 
emulate for that reason. 

Other devices may have more complex interfaces, that are harder to write 
emulation code for. 
> > > an example of an area where it's getting more
> > > sportive would be network adapters.
> > >
> > > this is basically the whole problem when building virtualization layers
> > > for cots platforms: the device/driver landscape spreads to infinity :)
> > > since you'll have a hard time driving any possible combination by
> > > yourself, you need something else to do it. one solution is hosted
> > > vmms, running on top of an existing operating system. a second solution
> > > is what xen does: offload drivers to a modified guest system which can
> > > then carry the I/O load from the additional, nonprivileged guests as
> > > well.
> > 
> > Agreed; so let me rephrase the dilemma like this:
> > The PC platform was never intended to be used in a virtualizing scenario, and
> > therefore does not contain the infrastructure to support this kind of a
> > scenario efficiently, but this could easily be rectified by introducing
> > simple extensions, akin to AMDV/IntelVT, on all levels of the PC hardware.
> > 
> > Is this a correct reading?
> yes, with restrictions. at this point in time, not correct from an
> economic standpoint. the whole "virtualization renaissance" we've
> been experiencing for the last 3 years or so builds upon the fact that
> PC hardware has become
>       1. terribly powerful, compared to what the workloads most software
>          systems run actually require.
>       2. remained comparatively cheap, as it always used to.
> if you start to redesign the I/O system, you're likely to raise the cost
> for the overall system.
> I/O virtualization down to the device level may come, but like with
> processor prices, it's all an "economy of scale".
> hardware-assisted virtualization at various places in the architecture,
> however, including I/O, is a well understood topic as well.
> may i again point you to some reading matter in that area:
> nair/smith: virtual machines.
> http://www.amazon.de/gp/product/1558609105/028-2651277-1478934?v=glance&n=52044011
> excellent textbook on many aspects of system virtualization, including
> those covered by this conversation so far.
> > If so, has this been considered in the Xen design, so as to accommodate any
> > future hwV/VT/VMX extensions easily and quickly?
> vmx is all about processor virtualization. additional topics would
> include memory virtualization (required, and available in the form of
> regular virtual memory, but might see additional improvements) and I/O
> virtualization. i see no reason why those could not be supported by
> xen, as they are subsystems which have been backed in a portable and
> scalable fashion in the operating system landscape for many years now. so
> the topic of how to accommodate changes in that area is not particularly
> new.
> regards,
> daniel
> -- 
> Daniel Stodden
> LRR     -      Lehrstuhl für Rechnertechnik und Rechnerorganisation
> Institut für Informatik der TU München             D-85748 Garching
> http://www.lrr.in.tum.de/~stodden         mailto:stodden@xxxxxxxxxx
> PGP Fingerprint: F5A4 1575 4C56 E26A 0B33  3D80 457E 82AE B0D8 735B
