This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Questioning the Xen Design of the VMM

To: "Daniel Stodden" <stodden@xxxxxxxxxx>, "Al Boldi" <a1426z@xxxxxxxxx>, Petersson@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] Questioning the Xen Design of the VMM
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Fri, 11 Aug 2006 10:41:39 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 11 Aug 2006 01:43:02 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <1155233268.14663.23.camel@xxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Aca8qbF8qksqEaHCRPu5RBnKzG0mhQAdgOIA
Thread-topic: [Xen-devel] Questioning the Xen Design of the VMM

> -----Original Message-----
> From: Daniel Stodden [mailto:stodden@xxxxxxxxxx] 
> Sent: 10 August 2006 19:08
> To: Al Boldi; Petersson@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; 
> Petersson, Mats
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] Questioning the Xen Design of the VMM
> On Thu, 2006-08-10 at 18:34 +0200, Petersson, Mats wrote:
> > Context-switching is only part of the problem, as Daniel says. 
> > IOMMU is a technology that is coming in future products from AMD
> >  (and I'm sure Intel are working on such products as well. 
> > IBM already have a chipset in production for some of the PowerPC 
> > and x86-based servers).
> i didn't have a look yet at the papers from amd, but it may be of
> interest that the PCI interfaces (1998, maybe even earlier) built by
> sun for their ultrasparc processors already implemented such a beast.
> al, docs on the bridge should be available from sun online, if you're
> interested in such things.
> the basic idea being virtualization of the I/O address space, this
> feature is quite cool even if you don't give a single thought about
> system virtualization (sun probably didn't at that point). getting
> your hands on contiguous, dma-able memory areas can be a permanent
> headache in os and device driver design if your peripheral bus sees
> physical memory untranslated. put a translation table in between and
> upstream transactions become a non-issue, without offloading any
> additional logic into the peripheral bus interface.
> mats, i suppose amd's iommu solves this as well?

Yes, of course [see note below]. The only thing it doesn't solve is if the OS 
decides to swap the pages out - so there still needs to be a call to say "lock 
this area into memory, don't allow it to move or be swapped out" - but that's 
trivial compared to "make sure this [large] block of memory is contiguous so 
that it can be transferred to the hard-disk as one transfer". 

Of course, modern devices cope with this by using scatter/gather technology... 
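To make the scatter/gather point concrete, here's a minimal sketch in C of what a driver-side scatter/gather list amounts to: the buffer is described as a list of (address, length) segments, so the device can DMA page-by-page without needing the memory to be physically contiguous. The struct and function names here are hypothetical, chosen for illustration, not any real driver API:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical scatter/gather element, as a DMA-capable device would
 * consume it: one physically contiguous segment of the buffer. */
struct sg_entry {
    uint64_t phys;   /* bus/physical address of segment start */
    size_t   len;    /* segment length in bytes */
};

/* Build a scatter/gather list from per-page physical addresses,
 * coalescing pages that happen to be physically adjacent.
 * Returns the number of sg entries written. */
size_t build_sg_list(const uint64_t *page_phys, size_t npages,
                     struct sg_entry *sg)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++) {
        if (n > 0 && sg[n - 1].phys + sg[n - 1].len == page_phys[i]) {
            sg[n - 1].len += PAGE_SIZE;   /* extend previous segment */
        } else {
            sg[n].phys = page_phys[i];    /* start a new segment */
            sg[n].len  = PAGE_SIZE;
            n++;
        }
    }
    return n;
}
```

With hardware like this, "make this block contiguous" stops being a requirement at all; only the per-page pinning still matters.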

Note: It does somewhat depend on how you implement the software that controls 
the IOMMU, and how you deal with memory allocation above and below this layer. 
The idea of the IOMMU, when used in conjunction with a VMM, is to translate 
guest physical addresses to machine physical addresses. So it doesn't 
necessarily help driver-writers as such: all it does is present the guest OS 
and the physical device with "the same view" of physical memory. Say we give a 
guest OS a mapping of 0..256MB that isn't contiguous at the machine physical 
level; the guest's physical view would still be contiguous [aside from the 
regular PC hardware holes, of course]. But the OS would still have to hand 
contiguous regions to the hardware [assuming the HW hasn't got 
scatter/gather], since the guest doesn't have control over the IOMMU itself. 
Just like nested paging gives the guest its own level of paging on top of an 
already virtual address, the IOMMU gives the guest a "virtual" PCI-space that 
matches its guest-physical view. 

So, let's make a trivial example [using a contiguous machine physical range - 
which may not be the case in real life]: 

Guest      Machine
0..256MB   256..512MB

The IOMMU would then map the 256MB of guest space to the relevant machine 
physical addresses. 
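For this trivial linear mapping, the translation the IOMMU performs is just an offset into the machine window. A sketch in C, assuming the 256MB window above (real IOMMUs use page tables, not a single linear window):

```c
#include <stdint.h>

#define MACHINE_BASE 0x10000000ull   /* machine-physical 256MB */
#define REGION_SIZE  0x10000000ull   /* 256MB guest window, based at 0 */

/* Sketch of the upstream (device-to-memory) translation for the
 * trivial linear example: guest 0..256MB -> machine 256..512MB.
 * Returns (uint64_t)-1 to stand in for an IOMMU fault. */
uint64_t iommu_translate(uint64_t guest_phys)
{
    if (guest_phys >= REGION_SIZE)
        return (uint64_t)-1;         /* outside the window: fault */
    return MACHINE_BASE + guest_phys;
}
```

A device given guest-physical 0x1005000 would actually hit machine-physical 0x11005000, without the guest ever knowing.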

In a driver, we are given the address 0x12345000 and a length of 12K (three 
pages) as a buffer for a PCI device. The driver does a virt_to_phys() call to 
the OS, which gives it an address in the 0..256MB range, say 0x1005000 - this 
address can then be given to the PCI device, and the IOMMU translates it. But 
if the page at 0x12346000 isn't mapped to the next guest-physical address 
(0x1006000), you'd still have to deal with that in some way [presumably by 
allocating a new buffer with a "please make this contiguous" flag and copying 
the data, or by sending the data in 4KB chunks]. 
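The check the driver has to make can be sketched in C: walk the per-page guest-physical addresses behind the buffer and see whether each page follows the previous one. The function name is made up for illustration; a real driver would get the addresses by calling virt_to_phys() per page:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Returns 1 if all pages behind the buffer are guest-physically
 * contiguous, so the whole buffer can go to the device as one DMA
 * transfer; 0 if the driver must fall back to a contiguous bounce
 * buffer (copying the data) or to sending 4KB chunks. */
int buffer_is_dma_contiguous(const uint64_t *page_phys, size_t npages)
{
    for (size_t i = 1; i < npages; i++)
        if (page_phys[i] != page_phys[i - 1] + PAGE_SIZE)
            return 0;
    return 1;
}
```

For the example above: if virt_to_phys() yields 0x1005000, 0x1006000, 0x1007000 the single-transfer path works; if the second page lands somewhere else, it doesn't.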

I hope that's clear - it's rather confusing to think about all these things, 
because there are several levels of translation, which makes life pretty 
complicated. At least the IOMMU mapping should be pretty static. 

> regards,
> daniel
> -- 
> Daniel Stodden
> LRR     -      Lehrstuhl für Rechnertechnik und Rechnerorganisation
> Institut für Informatik der TU München             D-85748 Garching
> http://www.lrr.in.tum.de/~stodden         mailto:stodden@xxxxxxxxxx
> PGP Fingerprint: F5A4 1575 4C56 E26A 0B33  3D80 457E 82AE B0D8 735B

Xen-devel mailing list