Xen project Mailing List

RE: [Xen-devel] Full virtualization and I/O

To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>

From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>

Date: Wed, 22 Nov 2006 17:57:57 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Thomas Heinz <thomasheinz@xxxxxxx>

Delivery-date: Wed, 22 Nov 2006 08:58:17 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AccOVqJMaVAYkN/sQZigy+XX5XKTawAABdDw

Thread-topic: [Xen-devel] Full virtualization and I/O

> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
> Sent: 22 November 2006 16:51
> To: Petersson, Mats
> Cc: Thomas Heinz; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Full virtualization and I/O
>
> Hi Mats,
>
> Thanks for your explanation in such details.
>
> As you mentioned in your post, could you elaborate using unmodified
> driver in HVM domain (i.e. using front-end driver in full-virtualized
> domain)? Do you think para-virtualized domain will have exactly the
> same behavior as full-virtualized domain when both of them are using
> this unmodified driver to access virtual block devices?

Not sure exactly what you're asking, but if you're asking if the
performance of driver-related work will be approximately the same, yes.

By the way, I wouldn't call that an "unmodified" driver - it is
definitely a MODIFIED driver (a para-virtual driver).

--
Mats

>
> Best regards,
>
> Liang
>
> ----- Original Message -----
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Thomas Heinz" <thomasheinz@xxxxxxx>;
> <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 22, 2006 9:24 AM
> Subject: RE: [Xen-devel] Full virtualization and I/O
>
>
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > Thomas Heinz
> > Sent: 20 November 2006 23:39
> > To: xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: [Xen-devel] Full virtualization and I/O
> >
> > Hi
> >
> > Full virtualization is about providing multiple virtual ISA level
> > environments and mapping them to a single physical one. One
> > particular aspect of this mapping are I/O instructions (explicit or
> > mmapped I/O). In general, there are two strategies to partition the
> > devices, either in time or in space. Partitioning a device in space
> > means that the device (or a part of it) is exclusively available to
> > a single VM.
> > Partitioning a device in time (or time multiplexing) means that it
> > can be used by multiple VMs but only one VM may use it at any point
> > in time.
>
> The Xen approach is to not allow any sharing of devices: a device is
> owned by one domain, and no other domain can directly access it.
> There is a protocol of so-called frontend/backend drivers. The
> frontend is basically a dummy device that forwards a request to
> another domain (normally domain 0), where the other half of the
> driver pair picks up the data and forwards it to some processing
> task, which then sends the packet on to the real hardware.
>
> For fully virtualized mode (hardware supported virtual machine, such
> as AMD-V or Intel VT, aka HVM), there is a different model, where a
> "device model" is involved to perform the hardware modelling. In Xen,
> this uses a modified version of qemu (called qemu-dm), which has a
> fairly complete set of "hardware" in its model. It has, for example,
> an IDE controller, several types of network devices, graphics and
> mouse/keyboard models. The things you'd usually find in a PC, that is.
> The way it works is that the hypervisor intercepts IOIO and memory
> mapped IO regions that match the devices involved (such as the
> A0000-BFFFF region for VGA frame buffer memory or the 0x1F0-0x1F7 IO
> ports for the IDE controller), and forwards a request to qemu-dm,
> where the operation changes the current state. When necessary, the
> state change will result in, for example, a read request to the
> "hard disk" (which may be a real disk, a file on a local disk, or a
> file on a network storage device, to give some examples).
>
> There is also the option of using the frontend drivers as described
> above in the fully virtualized model.
>
> Finally, while I'm on the subject of fully virtualized mode: It is
> currently not possible to give a DMA-based device to a
> fully-virtualized domain.
> The reason for this is that the guest OS will have been told that
> memory is from 0..256MB (say), while its actual machine physical
> address is at 256MB..512MB. The OS is completely unaware of this
> "mismatch". So the OS will perform some operation to take a virtual
> address of some buffer (say a network packet) and make it into a
> "physical address", which will be an address in the range of
> 0..256MB. This will of course (at least) lead to the wrong data being
> transmitted, as the address of the actual data is somewhere in the
> range 256MB..512MB. The only solution to this is to have an IOMMU,
> which can translate the guest's understanding of a physical address
> (0..256MB) to a machine physical address (256MB..512MB).
>
> > I am trying to understand how I/O virtualization on the ISA level
> > works if a device is shared between multiple VM instances. On a
> > very high level, it should be as follows. First of all, the VMM has
> > to intercept the VM's I/O commands (I/O instructions or load/store
> > to dedicated memory addresses - let's ignore interrupts for the
> > moment). This could be done by traps or by replacing the resp.
> > instructions by VMM calls to I/O primitives. The VMM keeps multiple
> > device model instances (one for each VM using the device) in
> > memory. The models somehow reflect the low level I/O API of the
> > device. Depending on which I/O command is issued by the VM, either
> > the memory model is changed or a number of I/O instructions are
> > executed to make the physical device state reflect the one
> > represented in the memory model.
>
> Do you by ISA mean "Instruction Set Architecture" or something else
> (I presume it's NOT meaning ISA-bus...)?
>
> Intercepting IOIO instructions or MMIO instructions is not that hard -
> in HVM the two processor architectures have specific intercepts and
> bitmaps to indicate which IO instructions should be intercepted.
> MMIO will require the page-tables to be set up such that the memory
> mapped region is mapped "not present", so that any operation to this
> region gives a page-fault, and then the page-fault is analyzed to see
> if it's for a MMIO address or for a "real page fault".
>
> For para-virtualization, the model is similar, but the exact model of
> how to intercept the IOIO or MMIO instruction is slightly different -
> but in essence it's the same principle. Let me know if you really
> need to know how Xen goes about doing this, as it's quite complicated
> (more so than the HVM version, for sure).
>
> > This approach brings up a number of questions. It would be great if
> > some of the virtualization experts here could shed some light on
> > them (even though they are not immediately related to Xen, I know):
> >
> > - How do these device memory models look like? Is there a common
> >   (automata) theory behind or are they done ad hoc?
>
> Not sure what you're asking for here. The devices are either modeled
> after a REAL device (qemu-dm), and as such resemble as closely as
> possible the REAL hardware device being emulated, or, in the
> frontend/backend driver, there is an "idealized model": the request
> contains just the basic data that the OS normally provides to the
> driver, and it's placed in a queue with a message-signaling system to
> tell the other side that there's something in the queue.
>
> > - What kind of strategies/algorithms are used in the merge phase,
> >   i.e. the phase where the virtual memory model and the physical
> >   one are synchronized? What kind of problems can occur in this
> >   phase?
>
> The Xen approach is to avoid this by only giving one device to each
> machine.
>
> > - Are specific usage patterns used in real world implementations
> >   (e.g. VMWare) to simplify the virtualization (model or merge
> >   phase)?
>
> This is probably the wrong list to ask detailed questions about how
> VMWare works... ;-)
>
> > - Do you have any interesting pointers to literature dealing with
> >   full I/O virtualization? In particular, how does VMWare's full
> >   virtualization works with respect to I/O?
>
> Again, wrong list for VMWare questions.
>
> > - Is every device time partitionable? If not, which requirements
> >   does it have to meet to be time partitionable?
>
> Certainly not - I would say that almost all devices are NOT time
> partitionable, as the state in the device is dependent on the current
> usage. The more complex the device is, the more likely it is to have
> difficulties, but even such a simple device as a serial port would
> struggle to work in a time-shared fashion. Serial ports are generally
> used for multiple transactions that make up a whole "bigger picture
> transaction": for example, a web-server connected via a serial modem
> would send a packet of several hundred bytes to the serial port
> driver, which is then portioned out as and when the serial port is
> ready to send another few bytes. If you switch from one guest to
> another during this process, and the second guest also has something
> to send on the serial port, you'd end up with a very scrambled
> message from the first guest and quite likely the second guest's
> message completely lost!
>
> There are some devices that are specifically built to manage multiple
> hosts, but other than that, any sharing of a device requires some
> software to gather up "a full transaction" and then send that to the
> actual hardware, often also waiting for the transaction to complete
> (for example the interrupt signal to say that the hard disk write is
> complete).
>
> >   -> I don't think every device is. What about a device which
> >      supports different modes of operation. If two VMs drive the
> >      virtual device in different modes, it may not be possible to
> >      constantly switch between them. Ok, this is pretty artificial.
>
> A particular problem is devices where you can't necessarily read back
> the last mode-setting, which may well be the case in many different
> devices. You can't, for example, read back all the registers on an
> IDE device, because the read of a particular address may give the
> status rather than the current command sent, or some such.
>
> --
> Mats
>
> > Thanks a lot for your help!
> >
> >
> > Best wishes
> >
> > Thomas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
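
The DMA example in the message above (a guest that believes its memory
is at 0..256MB while the machine memory behind it is at 256MB..512MB)
can be illustrated with a minimal sketch. Everything in it is an
assumption made for illustration: the fixed-offset guest_to_machine
function, the dma_read helper and the dictionary standing in for
machine memory are hypothetical, and a real IOMMU translates through
page tables rather than a single offset.

```python
MB = 1024 * 1024

# Assumed layout, taken from the example in the message: the guest believes
# its memory spans 0..256MB, while the machine memory backing it is at
# 256MB..512MB.
GUEST_SIZE = 256 * MB
MACHINE_BASE = 256 * MB


def guest_to_machine(guest_phys: int) -> int:
    """Translate a guest-physical address to a machine-physical address.

    Stands in for the lookup an IOMMU would perform (hypothetical helper).
    """
    if not 0 <= guest_phys < GUEST_SIZE:
        raise ValueError(f"address {guest_phys:#x} is outside the guest's memory")
    return MACHINE_BASE + guest_phys


def dma_read(machine_memory: dict, address: int) -> str:
    """Hypothetical device DMA: return whatever sits at a machine address."""
    return machine_memory.get(address, "<memory not owned by this guest>")


# The guest OS places a packet buffer at guest-physical 16MB and hands that
# address to the device.
packet_guest_addr = 16 * MB
machine_memory = {guest_to_machine(packet_guest_addr): "guest packet"}

# Without an IOMMU the device treats the guest's address as a machine address
# and reads the wrong memory.
without_iommu = dma_read(machine_memory, packet_guest_addr)

# With an IOMMU the address is translated before the device touches memory.
with_iommu = dma_read(machine_memory, guest_to_machine(packet_guest_addr))
```

In this sketch the untranslated read lands in memory that does not
belong to the guest, which corresponds to the "wrong data being
transmitted" case described in the message; only the translated read
finds the packet.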

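The message also mentions that in HVM the processors provide "bitmaps
to indicate which IO instructions should be intercepted". A rough
sketch of that idea follows, assuming one bit per I/O port across the
64K x86 port space. The class and function names are hypothetical, and
this is neither Xen's code nor the exact hardware layout; the
0x1F0-0x1F7 range is the IDE controller example from the message.

```python
IO_PORT_COUNT = 0x10000  # x86 I/O port space: ports 0x0000..0xFFFF


class IoInterceptBitmap:
    """One bit per I/O port; a set bit means accesses to that port trap."""

    def __init__(self) -> None:
        self.bits = bytearray(IO_PORT_COUNT // 8)

    def intercept(self, first_port: int, last_port: int) -> None:
        """Mark every port in the inclusive range as intercepted."""
        for port in range(first_port, last_port + 1):
            self.bits[port // 8] |= 1 << (port % 8)

    def is_intercepted(self, port: int) -> bool:
        return bool(self.bits[port // 8] & (1 << (port % 8)))


def handle_port_access(bitmap: IoInterceptBitmap, port: int) -> str:
    """Decide what happens to a guest IN/OUT on the given port."""
    if bitmap.is_intercepted(port):
        # In the model described in the message, the hypervisor would
        # forward the request to the device model (qemu-dm).
        return "forward to device model"
    return "no intercept"


bitmap = IoInterceptBitmap()
bitmap.intercept(0x1F0, 0x1F7)  # IDE controller ports from the message
```

With this setup, handle_port_access(bitmap, 0x1F0) reports that the
access is forwarded to the device model, while a port just outside the
range, such as 0x1F8, is not intercepted.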