> -----Original Message-----
> From: Daniel Stodden [mailto:stodden@xxxxxxxxxx]
> Sent: 10 August 2006 19:08
> To: Al Boldi; Petersson@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx;
> Petersson, Mats
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] Questioning the Xen Design of the VMM
> On Thu, 2006-08-10 at 18:34 +0200, Petersson, Mats wrote:
> > Context-switching is only part of the problem, as Daniel says.
> > IOMMU is a technology that is coming in future products from AMD
> > (and I'm sure Intel are working on such products as well.
> > IBM already have a chipset in production for some of the PowerPC
> > and x86-based servers).
> i didn't have a look yet at the papers from amd, but it may be of
> interest that the PCI interfaces (1998, maybe even earlier) built by
> sun for their ultrasparc processors already implemented such a beast.
> al, docs on the bridge should be available from sun online, if you're
> interested in such things.
> the basic idea being virtualization of the I/O address space, this
> feature is quite cool even if you don't give a single thought about
> system virtualization (sun probably didn't at that point). getting your
> hands on contiguous, dma-able memory areas can be a permanent headache
> in os and device driver design if your peripheral bus seeks physical
> memory untranslated. put a translation table in between and upstream
> transactions become a non-issue, without offloading any additional logic
> into the peripheral bus interface.
> mats, i suppose amd's iommu solves this as well?
Yes, of course [see note below]. The only thing it doesn't solve is if the OS
decides to swap the pages out - so there still needs to be a call to say "lock
this area into memory, don't allow it to move or be swapped out" - but that's
trivial compared to "make sure this [large] block of memory is contiguous so
that it can be transferred to the hard-disk as one transfer".
Of course, modern devices cope with this by using scatter/gather technology...
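To make the translation-table idea a bit more concrete, here is a minimal toy
model in Python. It is only a sketch, not how any real IOMMU or bridge is
programmed: the class IoTranslationTable, its methods map_buffer and translate,
the 4KB page size and all the addresses are made up for illustration. It maps
scattered (and, by assumption, already locked) physical pages onto one
contiguous window of bus addresses, so the device sees a single block.

```python
PAGE_SIZE = 4096  # assumed page size


class IoTranslationTable:
    """Toy model of an I/O translation table (hypothetical, for illustration)."""

    def __init__(self, io_base):
        self.next_io_page = io_base // PAGE_SIZE
        self.entries = {}  # I/O page number -> physical page number

    def map_buffer(self, phys_pages):
        """Map scattered physical pages to consecutive I/O pages.

        Returns the bus address where the contiguous window starts.
        The caller is assumed to have locked the pages in memory already.
        """
        start = self.next_io_page
        for offset, phys_page in enumerate(phys_pages):
            self.entries[start + offset] = phys_page
        self.next_io_page = start + len(phys_pages)
        return start * PAGE_SIZE

    def translate(self, io_addr):
        """Translate a bus address to a physical address, page by page."""
        phys_page = self.entries[io_addr // PAGE_SIZE]
        return phys_page * PAGE_SIZE + io_addr % PAGE_SIZE


table = IoTranslationTable(io_base=0x80000000)
scattered = [0x1005, 0x2A30, 0x0077]  # physical page numbers, not adjacent
io_start = table.map_buffer(scattered)
```

The device would be handed io_start and a length of three pages, and each
access goes through translate(). The locking step still has to happen before
map_buffer() is called, since the table entries are only valid while the
pages stay where they are.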
Note: It does somewhat depend on how you implement the software to control the
IOMMU and how you deal with memory allocation above and below this layer. When
used in conjunction with a VMM, the idea of the IOMMU is to translate guest
physical addresses to machine physical addresses, so it doesn't necessarily
help driver-writers as such: all it does is present the guest OS and the
physical device with "the same view" of physical memory. Say we give a guest OS
a mapping of 0..256MB that isn't contiguous at the machine physical level. The
guest's physical view would still be contiguous [aside from the regular PC
hardware holes, of course], but the OS would still have to use contiguous
guest-physical regions to give to the hardware [assuming HW hasn't got
scatter/gather], since the guest doesn't have control over the IOMMU itself.
Just like nested paging gives the guest its own level of paging on top of an
already virtual address, the IOMMU gives the guest a "virtual" PCI-space that
matches its guest-physical view.
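As a rough illustration of that "same view" point, here is a small Python
sketch. It is a toy under stated assumptions: the function names
build_guest_map and iommu_translate, the 4KB page size and the two machine
extents are all invented, and a real IOMMU uses hardware page tables rather
than a dictionary. The guest sees pages 0..256MB in order, while the machine
memory behind them comes from two extents that are not adjacent.

```python
PAGE_SIZE = 4096                 # assumed page size
GUEST_SIZE = 256 * 1024 * 1024   # the 0..256MB guest from the text


def build_guest_map(machine_extents):
    """Build a guest page -> machine page map from (machine_start, length) extents.

    Guest-physical pages are numbered consecutively from 0; the machine
    extents backing them need not be adjacent or in ascending order.
    """
    guest_to_machine = {}
    guest_page = 0
    for machine_start, length in machine_extents:
        for i in range(length // PAGE_SIZE):
            guest_to_machine[guest_page] = machine_start // PAGE_SIZE + i
            guest_page += 1
    return guest_to_machine


def iommu_translate(guest_to_machine, guest_phys):
    """Translate a guest-physical address as seen by the device to a machine address."""
    machine_page = guest_to_machine[guest_phys // PAGE_SIZE]
    return machine_page * PAGE_SIZE + guest_phys % PAGE_SIZE


# Two made-up 128MB machine extents backing the guest, deliberately out of order.
extents = [(0x40000000, GUEST_SIZE // 2), (0x10000000, GUEST_SIZE // 2)]
gmap = build_guest_map(extents)
```

The guest never sees the jump between the extents: guest-physical 0x07FFF000
and 0x08000000 are neighbours from its point of view, even though the machine
addresses behind them are far apart.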
So, let's make a trivial example [using contiguous machine physical range -
which may not be the case in real life]:
The IOMMU would then map the 256MB of guest memory to the relevant machine
physical range.
In a driver, we are given the virtual address 0x12345000 and 12K (three pages
long) as a buffer for a PCI device. The driver will do a virt_to_phys() call to
the OS, which gives it an address in the 0..256MB range, say 0x1005000. This
address can then be given to the PCI device, and the IOMMU translates it to the
machine physical address. But if the next virtual page, 0x12346000, isn't
mapped to the next guest-physical address (0x1006000), then you'd still have to
deal with that in some way [presumably by allocating a new buffer with a
"please make this contiguous" flag and copying the data, or by sending the data
in 4KB chunks].
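The check described in this example can be sketched in Python along these
lines. It is a simplified model, not driver code: dma_segments is a made-up
helper, the dictionaries stand in for what virt_to_phys() would report page by
page, and the "fragmented" guest-physical page numbers are invented. It walks
the buffer one page at a time and merges pages that happen to be adjacent in
guest-physical space.

```python
PAGE_SIZE = 4096  # assumed page size


def dma_segments(page_table, virt_addr, length):
    """Split a virtual buffer into guest-physically contiguous (addr, len) segments.

    page_table maps virtual page number -> guest-physical page number and
    stands in for a per-page virt_to_phys() lookup.
    """
    segments = []
    offset = 0
    while offset < length:
        va = virt_addr + offset
        phys = page_table[va // PAGE_SIZE] * PAGE_SIZE + va % PAGE_SIZE
        # Never cross a page boundary in one step.
        chunk = min(PAGE_SIZE - va % PAGE_SIZE, length - offset)
        if segments and segments[-1][0] + segments[-1][1] == phys:
            # This page directly follows the previous segment: merge them.
            segments[-1] = (segments[-1][0], segments[-1][1] + chunk)
        else:
            segments.append((phys, chunk))
        offset += chunk
    return segments


# Virtual pages 0x12345..0x12347, with two possible guest-physical layouts.
contiguous = {0x12345: 0x1005, 0x12346: 0x1006, 0x12347: 0x1007}
fragmented = {0x12345: 0x1005, 0x12346: 0x3000, 0x12347: 0x3001}

one_transfer = dma_segments(contiguous, 0x12345000, 12 * 1024)
split_transfer = dma_segments(fragmented, 0x12345000, 12 * 1024)
```

With the contiguous layout the whole 12K comes back as one segment starting at
0x1005000, which can go to the device as is. With the fragmented layout there
are two segments, so the driver needs scatter/gather, a copy into a contiguous
buffer, or one transfer per segment.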
I hope that's clear - it's rather confusing to think about all these things,
because there are several levels of translation, which makes life pretty
complicated. At least the IOMMU mapping should be pretty static.
> Daniel Stodden
> LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
> Institut für Informatik der TU München D-85748 Garching
> http://www.lrr.in.tum.de/~stodden mailto:stodden@xxxxxxxxxx
> PGP Fingerprint: F5A4 1575 4C56 E26A 0B33 3D80 457E 82AE B0D8 735B
Xen-devel mailing list