This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] Full virtualization and I/O

To: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Subject: Re: [Xen-devel] Full virtualization and I/O
From: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>
Date: Wed, 22 Nov 2006 09:51:19 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Thomas Heinz <thomasheinz@xxxxxxx>
Delivery-date: Wed, 22 Nov 2006 08:51:30 -0800
References: <907625E08839C4409CE5768403633E0B018E1752@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Mats,

Thanks for such a detailed explanation.

As you mentioned in your post, could you elaborate on using the unmodified driver in an HVM domain (i.e. using the front-end driver in a fully virtualized domain)? Do you think a para-virtualized domain will behave exactly the same as a fully virtualized domain when both of them use this unmodified driver to access virtual block devices?
Best regards,


----- Original Message ----- From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Thomas Heinz" <thomasheinz@xxxxxxx>; <xen-devel@xxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, November 22, 2006 9:24 AM
Subject: RE: [Xen-devel] Full virtualization and I/O

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Thomas Heinz
Sent: 20 November 2006 23:39
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Full virtualization and I/O


Full virtualization is about providing multiple virtual ISA level
environments and mapping them to a single physical one. One aspect of
this mapping is I/O instructions (explicit or mmapped I/O). In general,
there are two strategies to partition the devices, either in time or in
space. Partitioning a device in space means that the device (or a part
of it) is exclusively available to a single VM. Partitioning a device in
time (or time multiplexing) means that it can be used by multiple VMs
but only one VM may use it at any point in time.

The Xen approach is not to allow any sharing of devices: a device is
owned by one domain, and no other domain can directly access the device.
There is a protocol of so-called frontend/backend drivers: the frontend
is basically a dummy device that forwards a request to another domain
(normally domain 0), and the other half of the driver pair picks up
this data and forwards it to some processing task, which then sends the
packet on to the real hardware.

For fully virtualized mode (hardware supported virtual machine, such as
AMD-V or Intel VT, aka HVM), there is a different model, where a "device
model" is involved to perform the hardware modelling. In Xen, this is
using a modified version of qemu (called qemu-dm), which has a fairly
complete set of "hardware" in its model. It's got for example IDE
controller, several types of network devices, graphics and
mouse/keyboard models. The things you'd usually find in a PC, that is.
The way it works is that the hypervisor intercepts IOIO and
memory-mapped IO accesses that match the devices involved (such as the
A0000-BFFFF region for VGA frame buffer memory or the 0x1F0-0x1F7 IO
ports for the IDE controller), and forwards a request to qemu-dm, where
the operation changes the current state of the model. When necessary,
the state change will result in, for example, a read request to the
"hard disk" (which may be a real disk, a file on a local disk, or a
file on a network storage device, to give some examples).
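
As a rough illustration of the intercept-and-forward path described above, here is a minimal Python sketch. It is not how Xen or qemu-dm is written: the class names, the rule that a write to port 0x1F7 triggers a disk request, and the counter are all assumptions made up for the example. Only the 0x1F0-0x1F7 port range is taken from the text.

```python
# Hypothetical sketch: routing intercepted port I/O to an emulated device.
# None of these names come from Xen or qemu-dm.

class ToyIdeModel:
    """Holds the register state of a made-up IDE-like controller."""

    def __init__(self):
        self.regs = {port: 0 for port in range(0x1F0, 0x1F8)}
        self.disk_requests = 0  # requests passed on to the backing store

    def write(self, port, value):
        self.regs[port] = value & 0xFF
        # Assumption for the sketch: writing the last port starts a command,
        # which is the point where the model would touch the "hard disk".
        if port == 0x1F7:
            self.disk_requests += 1

    def read(self, port):
        return self.regs[port]


class Dispatcher:
    """Stands in for the lookup of which device model owns a port."""

    def __init__(self):
        self.ranges = []  # list of (first_port, last_port, model)

    def register(self, first, last, model):
        self.ranges.append((first, last, model))

    def handle(self, port, is_write, value=0):
        for first, last, model in self.ranges:
            if first <= port <= last:
                if is_write:
                    model.write(port, value)
                    return None
                return model.read(port)
        raise LookupError(f"no device model claims port {port:#x}")


ide = ToyIdeModel()
dispatcher = Dispatcher()
dispatcher.register(0x1F0, 0x1F7, ide)
dispatcher.handle(0x1F2, is_write=True, value=1)     # plain state change
dispatcher.handle(0x1F7, is_write=True, value=0x20)  # state change + request
```

The guest only ever sees port reads and writes; whether a given write merely updates model state or also causes a request to the backing store is decided inside the model.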

There is also the option of using the frontend drivers as described
above in the fully virtualized model.

Finally, while I'm on the subject of fully virtualized mode: It is
currently not possible to give a DMA-based device to a fully-virtualized
domain. The reason for this is that the guest OS will have been told
that memory is from 0..256MB (say), and its actual machine physical
address is at 256MB..512MB. The OS is completely unaware of this
"mismatch". So the OS will perform some operation to take a virtual
address of some buffer (say a network packet) and make it into a
"physical address", which will be an address in the range of 0..256MB.
This will of course (at least) lead to the wrong data being transmitted,
as the address of the actual data is somewhere in the range
256MB..512MB. The only solution to this is to have an IOMMU, which can
translate the guest's understanding of a physical address (0..256MB) to
a machine physical address (256MB..512MB).
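
The arithmetic behind this example can be written out in a few lines of Python. This is only a sketch of the numbers used above: it assumes one contiguous 256MB block at a fixed offset, whereas a real mapping would be page by page, and the function and variable names are invented for the illustration.

```python
MB = 1024 * 1024

GUEST_SIZE = 256 * MB    # the guest believes its RAM is 0..256MB
MACHINE_BASE = 256 * MB  # the RAM actually sits at 256MB..512MB


def guest_to_machine(guest_physical):
    """Translate as an IOMMU would, for this one contiguous mapping."""
    if not 0 <= guest_physical < GUEST_SIZE:
        raise ValueError("address outside the guest's memory")
    return MACHINE_BASE + guest_physical


# The guest driver hands this "physical" address to the device for DMA.
buffer_guest_physical = 16 * MB + 0x123

# Without translation the device uses the guest's number as-is and reads
# machine memory that belongs to someone else.
address_without_iommu = buffer_guest_physical

# With translation the device reaches the guest's real buffer.
address_with_iommu = guest_to_machine(buffer_guest_physical)
```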

I am trying to understand how I/O virtualization on the ISA level works
if a device is shared between multiple VM instances. On a very high
level, it should be as follows. First of all, the VMM has to intercept
the VM's I/O commands (I/O instructions or load/store to dedicated
memory addresses - let's ignore interrupts for the moment). This could
be done by traps or by replacing the respective instructions by VMM
calls to I/O primitives. The VMM keeps multiple device model instances
(one for each VM using the device) in memory. The models somehow reflect
the low level I/O API of the device. Depending on which I/O command is
issued by the VM, either the memory model is changed or a number of I/O
instructions are executed to make the physical device state reflect the
one represented in the memory model.

Do you by ISA mean "Instruction Set Architecture" or something else (I
presume it's NOT meaning ISA-bus...)?

Intercepting IOIO instructions or MMIO instructions is not that hard -
in HVM the two processor architectures have specific intercepts and
bitmaps to indicate which IO instructions should be intercepted. MMIO
will require the page-tables to be set up such that the memory mapped
region is mapped "not present" so that any operation to this region
gives a page-fault, and then the page-fault is analyzed to see if it's
for an MMIO address or for a "real page fault".
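
To make the two checks concrete, here is a small Python sketch of a per-port intercept bitmap and of classifying a fault address as MMIO or not. The layout (one bit per port) and all names are assumptions for illustration, not the actual AMD-V or Intel VT data structures; the port and address ranges are the ones mentioned earlier in this mail.

```python
class IoBitmap:
    """One bit per I/O port; a set bit means 'intercept this port'."""

    def __init__(self, ports=65536):
        self.bits = bytearray(ports // 8)

    def intercept(self, first, last):
        for port in range(first, last + 1):
            self.bits[port // 8] |= 1 << (port % 8)

    def is_intercepted(self, port):
        return bool(self.bits[port // 8] & (1 << (port % 8)))


# Regions deliberately mapped "not present" so that accesses fault.
MMIO_REGIONS = [(0xA0000, 0xBFFFF)]  # VGA frame buffer


def classify_fault(address):
    """Decide whether a page fault hit an emulated MMIO region."""
    for first, last in MMIO_REGIONS:
        if first <= address <= last:
            return "mmio"
    return "real page fault"


bitmap = IoBitmap()
bitmap.intercept(0x1F0, 0x1F7)  # IDE controller ports
```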

For para-virtualization, the model is similar, but the exact model of
how to intercept the IOIO or MMIO instruction is slightly different -
but in essence it's the same principle. Let me know if you really need
to know how Xen goes about doing this, as it's quite complicated (more
so than the HVM version, for sure).

This approach brings up a number of questions. It would be great if
some of the virtualization experts here could shed some light on them
(even though they are not immediately related to Xen, I know):

- What do these device memory models look like? Is there a common
  (automata) theory behind them or are they done ad hoc?

Not sure what you're asking for here. The devices are either modeled
after a REAL device (qemu-dm), and as such resemble as closely as
possible the REAL hardware device being emulated, or, in the
frontend/backend driver, follow an "idealized model": the request
contains just the basic data that the OS normally provides to the
driver, and it's placed in a queue with a message-signaling system to
tell the other side that there is something in the queue.
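
The "idealized model" can be sketched in Python as a queue plus a notification callback. This is a toy, single-process stand-in: the real frontend/backend drivers live in different domains and use shared memory, and every name and the request format here are hypothetical.

```python
from collections import deque


class ToyChannel:
    """A request queue plus a 'something is waiting' signal."""

    def __init__(self):
        self.requests = deque()
        self.on_notify = None  # callback standing in for the signal

    def send(self, request):
        self.requests.append(request)
        if self.on_notify is not None:
            self.on_notify()


class ToyBackend:
    """The half of the driver pair that owns the (pretend) device."""

    def __init__(self, channel):
        self.channel = channel
        self.storage = {}    # sector number -> data, in place of a disk
        self.responses = []
        channel.on_notify = self.drain

    def drain(self):
        while self.channel.requests:
            req = self.channel.requests.popleft()
            if req["op"] == "write":
                self.storage[req["sector"]] = req["data"]
                self.responses.append(("ok", None))
            elif req["op"] == "read":
                data = self.storage.get(req["sector"])
                self.responses.append(("ok", data))
            else:
                self.responses.append(("error", None))


# The frontend side: only the basic data an OS would hand to a driver.
channel = ToyChannel()
backend = ToyBackend(channel)
channel.send({"op": "write", "sector": 3, "data": b"hello"})
channel.send({"op": "read", "sector": 3})
```

Note that nothing in the request resembles a hardware register; that is the difference from the qemu-dm style of model.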

- What kind of strategies/algorithms are used in the merge phase, i.e.
  the phase where the virtual memory model and the physical one are
  synchronized? What kind of problems can occur in this phase?

The Xen approach is to avoid this by only giving one device to each
domain.
- Are specific usage patterns used in real world implementations (e.g.
  VMWare) to simplify the virtualization (model or merge phase)?
This is probably the wrong list to ask detailed questions about how
VMWare works... ;-)

- Do you have any interesting pointers to literature dealing with full
  I/O virtualization? In particular, how does VMWare's full
  virtualization work with respect to I/O?

Again, wrong list for VMWare questions.

- Is every device time partitionable? If not, which requirements does it
  have to meet to be time partitionable?

Certainly not - I would say that almost all devices are NOT time
partitionable, as the state in the device is dependent on the current
usage. The more complex the device is, the more likely it is to have
difficulties, but even such a simple device as a serial port would
struggle to work in a time-shared fashion (not to mention that serial
ports generally are used for multiple transactions to make a whole
"bigger picture transaction", so for example a web-server connected via
a serial modem would send a packet of several hundred bytes to the
serial port driver, which is then portioned out as and when the serial
port is ready to send another few bytes. If you switch from one guest to
another during this process, and the second guest also has something to
send on the serial port, you'd end up with a very scrambled message from
the first guest and quite likely the second guest's message completely
There are some devices that are specifically built to manage multiple
hosts, but other than that, any sharing of a device requires some
software to gather up "a full transaction" and then send that to the
actual hardware, often also waiting for the transaction to complete (for
example the interrupt signal to say that the hard disk write is
complete).
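
The difference between naive time sharing and gathering up full transactions can be shown with a short Python sketch. The two messages and the chunk size of four bytes are arbitrary choices for the illustration, and the function names are made up.

```python
def time_sliced(messages, chunk):
    """Switch guests every `chunk` bytes, as naive time sharing would."""
    out = bytearray()
    offsets = [0] * len(messages)
    while any(off < len(msg) for off, msg in zip(offsets, messages)):
        for i, msg in enumerate(messages):
            out += msg[offsets[i]:offsets[i] + chunk]
            offsets[i] += chunk
    return bytes(out)


def whole_transactions(messages):
    """Send each guest's message in full before starting the next one."""
    return b"".join(messages)


guest_a = b"GET /index"
guest_b = b"HELLO WORLD"

scrambled = time_sliced([guest_a, guest_b], chunk=4)
intact = whole_transactions([guest_a, guest_b])
```

Both outputs contain exactly the same bytes; only in the second one can the receiver still find either guest's message.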

  -> I don't think every device is. What about a device which supports
     different modes of operation? If two VMs drive the virtual device
     in different modes, it may not be possible to constantly switch
     between them. Ok, this is pretty artificial.

A particular problem is devices where you can't necessarily read back
the last mode-setting, which may well be the case in many different
devices. You can't, for example, read back all the registers on an IDE
device, because the read of a particular address may give the status
rather than the current command sent, or some such.
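
A small Python sketch of why this matters for saving and restoring device state: one address behaves as a command register on write and a status register on read, so the last command can only be recovered if software keeps its own copy. The classes, the idle status value 0x50 and the command byte are assumptions chosen for the example, not a description of any particular controller.

```python
class ToyCommandStatusPort:
    """One address: writes go to 'command', reads come from 'status'."""

    def __init__(self):
        self._last_command = None  # internal to the device, not readable
        self._status = 0x50        # assumed idle status value

    def write(self, value):
        self._last_command = value

    def read(self):
        return self._status


class ShadowedPort:
    """Wrapper that remembers what was written, since the device won't say."""

    def __init__(self, device):
        self.device = device
        self.shadow = None

    def write(self, value):
        self.shadow = value
        self.device.write(value)

    def read(self):
        return self.device.read()


port = ShadowedPort(ToyCommandStatusPort())
port.write(0xEC)            # example command byte
read_back = port.read()     # yields the status, not 0xEC
```

A device model that emulates the hardware can always keep such a shadow copy; software trying to switch a real device between guests cannot reconstruct it after the fact.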

Thanks a lot for your help!

Best wishes


Xen-devel mailing list
