
RE: [Xen-devel] Full virtualization and I/O


> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx] 
> Sent: 22 November 2006 16:51
> To: Petersson, Mats
> Cc: Thomas Heinz; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Full virtualization and I/O
> Hi Mats,
> Thanks for such a detailed explanation.
> As you mentioned in your post, could you elaborate on using an unmodified
> driver in an HVM domain (i.e. using the frontend driver in a
> fully-virtualized domain)? Do you think a para-virtualized domain will
> behave exactly the same as a fully-virtualized domain when both are using
> this unmodified driver to access virtual block devices?

I'm not sure exactly what you're asking, but if you're asking whether the
performance of driver-related work will be approximately the same, then yes.

By the way, I wouldn't call that an "unmodified" driver - it is
definitely a MODIFIED driver (a para-virtual driver). 

> Best regards,
> Liang
> ----- Original Message ----- 
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Thomas Heinz" <thomasheinz@xxxxxxx>; 
> <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 22, 2006 9:24 AM
> Subject: RE: [Xen-devel] Full virtualization and I/O
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > Thomas Heinz
> > Sent: 20 November 2006 23:39
> > To: xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: [Xen-devel] Full virtualization and I/O
> >
> > Hi
> >
> > Full virtualization is about providing multiple virtual ISA-level
> > environments and mapping them to a single physical one. One particular
> > aspect of this mapping is I/O instructions (explicit or memory-mapped
> > I/O). In general, there are two strategies for partitioning devices:
> > in time or in space. Partitioning a device in space means that the
> > device (or a part of it) is exclusively available to a single VM.
> > Partitioning a device in time (or time multiplexing) means that it can
> > be used by multiple VMs, but only one VM may use it at any point in time.
> The Xen approach is not to allow any sharing of devices: a device is
> owned by one domain, and no other domain can directly access it.
> Instead there is a so-called frontend/backend driver protocol. The
> frontend is basically a dummy device that forwards each request to
> another domain (normally domain 0); the backend, the other half of the
> driver pair, picks up this data and hands it to a processing task, which
> then sends the packet on to the real hardware.
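A minimal Python sketch of the frontend/backend split described above, under stated assumptions: the shared ring is modelled as a bounded in-process queue, the inter-domain notification is a plain callback, and the names `SharedRing`, `BlockRequest` and `Backend` are hypothetical. It is not Xen's actual ring or grant-table interface; it only shows the shape of the protocol (the frontend queues a request and signals, the backend drains the queue and talks to the device).

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional, Tuple

@dataclass
class BlockRequest:
    # Hypothetical request format: just what the guest OS hands its disk driver.
    op: str            # "read" or "write"
    sector: int
    data: bytes = b""

class SharedRing:
    """Bounded request queue standing in for the memory shared by both halves."""
    def __init__(self, size: int, notify: Callable[[], None]) -> None:
        self.size = size
        self.slots: Deque[BlockRequest] = deque()
        self.notify = notify   # stands in for the inter-domain event notification

    def push(self, req: BlockRequest) -> bool:
        if len(self.slots) >= self.size:
            return False       # ring full: the frontend has to retry later
        self.slots.append(req)
        self.notify()          # tell the other side there is work in the queue
        return True

    def pop(self) -> Optional[BlockRequest]:
        return self.slots.popleft() if self.slots else None

class Backend:
    """Runs in the driver domain (normally domain 0) and owns the real device."""
    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}   # stands in for the physical disk
        self.responses: List[Tuple[str, int, bytes]] = []

    def service(self, ring: SharedRing) -> None:
        while (req := ring.pop()) is not None:
            if req.op == "write":
                self.disk[req.sector] = req.data
                self.responses.append(("write", req.sector, b""))
            else:
                self.responses.append(("read", req.sector, self.disk.get(req.sector, b"")))

# Usage: the frontend queues two requests; the backend services them later.
kicks: List[str] = []
ring = SharedRing(size=2, notify=lambda: kicks.append("event"))
backend = Backend()
ring.push(BlockRequest("write", 7, b"hello"))
ring.push(BlockRequest("read", 7))
accepted = ring.push(BlockRequest("read", 8))   # False: only two slots
backend.service(ring)
print(backend.responses)  # [('write', 7, b''), ('read', 7, b'hello')]
```

A bounded queue plus a separate "kick" keeps the two domains decoupled: the frontend never touches the hardware, and the backend can batch several queued requests per notification.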
> For fully virtualized mode (hardware-supported virtual machines using
> AMD-V or Intel VT, aka HVM), there is a different model, where a "device
> model" is involved to perform the hardware modelling. In Xen, this is a
> modified version of qemu (called qemu-dm), which has a fairly complete
> set of "hardware" in its model: for example an IDE controller, several
> types of network devices, and graphics and mouse/keyboard models. That
> is, the things you'd usually find in a PC.
> The way it works is that the hypervisor intercepts IOIO (I/O port
> instructions) and memory-mapped IO regions that match the emulated
> devices (such as the A0000-BFFFF region for VGA frame buffer memory or
> the 0x1F0-0x1F7 IO ports for the IDE controller), and forwards a request
> from the hypervisor to qemu-dm. There the operation changes the current
> device state and, when necessary, the state change results in, for
> example, a read request to the "hard disk" (which may be a real disk, a
> file on a local disk, or a file on a network storage device, to give
> some examples).
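The intercept-and-forward path can be sketched in Python as below, assuming a simple table of intercepted ranges and a toy IDE model; `IoRequest`, `route` and `IdeModel` are hypothetical names, and the real qemu-dm emulates far more of the ATA register set (the sketch only tracks the sector-number register at 0x1F3 and the READ SECTORS command, 0x20). The point is that most guest accesses merely update model state, and only some state changes turn into I/O on the backing store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class IoRequest:
    # Hypothetical request passed from the hypervisor to the device model.
    kind: str        # "pio" for IN/OUT port accesses, "mmio" for memory-mapped ones
    addr: int
    is_write: bool
    value: int = 0

@dataclass
class Region:
    kind: str
    start: int
    end: int         # inclusive
    device: str

# Ranges taken from the text: primary IDE ports and the VGA frame buffer.
INTERCEPTED: List[Region] = [
    Region("pio", 0x1F0, 0x1F7, "ide"),
    Region("mmio", 0xA0000, 0xBFFFF, "vga"),
]

def route(req: IoRequest) -> Optional[str]:
    """Pick the emulated device for an intercepted access (None = not emulated)."""
    for r in INTERCEPTED:
        if r.kind == req.kind and r.start <= req.addr <= r.end:
            return r.device
    return None

class IdeModel:
    """Toy IDE model: most writes only change state; a command write does real I/O."""
    SECTOR_NUMBER = 0x1F3   # LBA bits 0-7 (the only address register used here)
    COMMAND = 0x1F7
    READ_SECTORS = 0x20

    def __init__(self, backing: Callable[[int], bytes]) -> None:
        self.regs: Dict[int, int] = {}
        self.backing = backing          # real disk, local file or network file
        self.sector_buffer: Optional[bytes] = None

    def write(self, port: int, value: int) -> None:
        self.regs[port] = value
        if port == self.COMMAND and value == self.READ_SECTORS:
            lba = self.regs.get(self.SECTOR_NUMBER, 0)
            self.sector_buffer = self.backing(lba)

# Usage: the guest programs the sector number, then issues READ SECTORS.
image = {5: b"sector five"}
ide = IdeModel(backing=lambda lba: image.get(lba, bytes(512)))
for req in [IoRequest("pio", 0x1F3, True, 5), IoRequest("pio", 0x1F7, True, 0x20)]:
    if route(req) == "ide" and req.is_write:
        ide.write(req.addr, req.value)
print(ide.sector_buffer)  # b'sector five'
```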
> There is also the option of using the frontend drivers as described
> above in the fully virtualized model.
> Finally, while I'm on the subject of fully virtualized mode: it is
> currently not possible to give a DMA-based device to a fully-virtualized
> domain. The reason is that the guest OS will have been told that its
> memory is at 0..256MB (say), while its actual machine physical address
> range is 256MB..512MB. The OS is completely unaware of this "mismatch".
> So when the OS takes the virtual address of some buffer (say a network
> packet) and converts it into a "physical address" for the device, the
> result will be an address in the range 0..256MB. This will (at the very
> least) lead to the wrong data being transmitted, as the actual data is
> somewhere in the range 256MB..512MB. The only solution to this is an
> IOMMU, which can translate the guest's understanding of a physical
> address (0..256MB) to a machine physical address (256MB..512MB).
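The address mismatch, and what an IOMMU adds, can be shown with a small Python model. It assumes the contiguous 256MB offset from the example above and a single-level, per-page remapping table; `guest_to_machine` and `ToyIommu` are hypothetical names, not a real IOMMU programming interface (real IOMMUs use multi-level page tables).

```python
MB = 1 << 20
PAGE = 4096
GUEST_RAM = 256 * MB      # what the guest OS believes it has: 0..256MB
MACHINE_BASE = 256 * MB   # where the guest actually lives: 256MB..512MB (example from text)

def guest_to_machine(gpa: int) -> int:
    """Hypervisor-side translation of a guest-physical address."""
    if not 0 <= gpa < GUEST_RAM:
        raise ValueError(f"{gpa:#x} is outside guest RAM")
    return MACHINE_BASE + gpa

class ToyIommu:
    """Per-page remapping table that the hypervisor programs for the device."""
    def __init__(self) -> None:
        self.table: dict = {}

    def map_guest_page(self, gfn: int) -> None:
        self.table[gfn] = guest_to_machine(gfn * PAGE) // PAGE

    def translate(self, bus_addr: int) -> int:
        # A missing entry would be a DMA fault on real hardware; here a KeyError.
        mfn = self.table[bus_addr // PAGE]
        return mfn * PAGE + bus_addr % PAGE

# The guest driver programs its device with the only address it knows: the gpa.
packet_gpa = 0x100000 + 0x42     # a buffer at 1MB + 0x42 in guest terms
bus_addr = packet_gpa             # what the device is told to DMA from
without_iommu = bus_addr          # device would read machine address 0x100042: wrong memory
iommu = ToyIommu()
iommu.map_guest_page(packet_gpa // PAGE)
with_iommu = iommu.translate(bus_addr)
print(hex(without_iommu), hex(with_iommu))  # 0x100042 0x10100042
```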
> >
> > I am trying to understand how I/O virtualization on the ISA level works
> > if a device is shared between multiple VM instances. At a very high
> > level, it should be as follows. First of all, the VMM has to intercept
> > the VM's I/O commands (I/O instructions or loads/stores to dedicated
> > memory addresses; let's ignore interrupts for the moment). This could
> > be done by traps or by replacing the respective instructions with VMM
> > calls to I/O primitives. The VMM keeps multiple device model instances
> > (one for each VM using the device) in memory. The models somehow
> > reflect the low-level I/O API of the device. Depending on which I/O
> > command is issued by the VM, either the memory model is changed or a
> > number of I/O instructions are executed to make the physical device
> > state reflect the one represented in the memory model.
> By ISA, do you mean "Instruction Set Architecture" or something else (I
> presume it does NOT mean ISA bus...)?
> Intercepting IOIO or MMIO instructions is not that hard: in HVM, both
> processor architectures have specific intercepts and bitmaps to indicate
> which IO instructions should be intercepted. MMIO requires the page
> tables to be set up such that the memory-mapped region is mapped "not
> present", so that any access to this region gives a page fault; the
> page fault is then analyzed to see whether it's for an MMIO address or
> a "real" page fault.
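Both intercept mechanisms can be modelled in a few lines of Python. This is an illustrative sketch, not the AMD-V or Intel VT data layout: it assumes a flat one-bit-per-port bitmap (the real I/O bitmaps follow the same one-bit-per-port idea, with vendor-specific details) and a hypothetical `classify_page_fault` helper that receives the faulting guest-physical address.

```python
IO_PORTS = 65536

class IoInterceptBitmap:
    """One bit per I/O port; a set bit means the access causes an exit to the hypervisor."""
    def __init__(self) -> None:
        self.bits = bytearray(IO_PORTS // 8)

    def intercept(self, first: int, last: int) -> None:
        for port in range(first, last + 1):
            self.bits[port // 8] |= 1 << (port % 8)

    def exits(self, port: int, size: int = 1) -> bool:
        # A multi-byte access exits if any port it touches is intercepted.
        touched = [(port + i) % IO_PORTS for i in range(size)]
        return any((self.bits[p // 8] >> (p % 8)) & 1 for p in touched)

MMIO_REGIONS = [(0xA0000, 0xBFFFF)]   # e.g. the VGA frame buffer, mapped "not present"

def classify_page_fault(gpa: int, present: bool) -> str:
    """Decide whether a page fault is an MMIO access to emulate or a real fault."""
    if not present and any(lo <= gpa <= hi for lo, hi in MMIO_REGIONS):
        return "mmio"   # decode the instruction and hand the access to the device model
    return "real"       # an ordinary page fault, handled the usual way

bitmap = IoInterceptBitmap()
bitmap.intercept(0x1F0, 0x1F7)   # primary IDE ports
print(bitmap.exits(0x1F0), bitmap.exits(0x1EF), bitmap.exits(0x1EF, size=2))
# True False True
```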
> For para-virtualization, the model is similar, but the exact model of
> how to intercept the IOIO or MMIO instruction is slightly different -
> but in essence it's the same principle. Let me know if you really need
> to know how Xen goes about doing this, as it's quite complicated (more
> so than the HVM version, for sure).
> >
> > This approach brings up a number of questions. It would be great if
> > some of the virtualization experts here could shed some light on them
> > (even though they are not immediately related to Xen, I know):
> >
> > - What do these device memory models look like? Is there a common
> >   (automata) theory behind them, or are they done ad hoc?
> Not sure what you're asking for here. The devices are either modeled
> after a REAL device (qemu-dm), in which case they resemble as closely as
> possible the real hardware device being emulated, or, in the
> frontend/backend driver, they follow an "idealized model": the request
> contains just the basic data that the OS normally provides to the
> driver, and it's placed in a queue with a message-signaling system to
> tell the other side that there is something in the queue.
> > - What kind of strategies/algorithms are used in the merge phase, i.e.
> >   the phase where the virtual memory model and the physical one are
> >   synchronized? What kind of problems can occur in this phase?
> The Xen approach is to avoid this by giving each device to only one
> machine.
> > - Are specific usage patterns used in real-world implementations (e.g.
> >   VMWare) to simplify the virtualization (model or merge phase)?
> This is probably the wrong list to ask detailed questions about how
> VMWare works... ;-)
> > - Do you have any interesting pointers to literature dealing with full
> >   I/O virtualization? In particular, how does VMWare's full
> >   virtualization work with respect to I/O?
> Again, wrong list for VMWare questions.
> > - Is every device time-partitionable? If not, which requirements does
> >   it have to meet to be time-partitionable?
> Certainly not. I would say that almost all devices are NOT time
> partitionable, as the state in the device depends on its current usage.
> The more complex the device, the more likely it is to have difficulties,
> but even a device as simple as a serial port would struggle to work in a
> time-shared fashion. Serial ports are generally used for multiple
> transactions that together make up a "bigger picture" transaction: for
> example, a web server connected via a serial modem would hand a packet
> of several hundred bytes to the serial port driver, which then portions
> it out as and when the serial port is ready to send another few bytes.
> If you switch from one guest to another during this process, and the
> second guest also has something to send on the serial port, you'd end
> up with a very scrambled message from the first guest, and quite likely
> the second guest's message would be lost completely!
> There are some devices that are specifically built to manage multiple
> hosts, but other than those, any sharing of a device requires some
> software to gather up "a full transaction" and then send that to the
> actual hardware, often also waiting for the transaction to complete (for
> example, for the interrupt signalling that the hard disk write is
> complete).
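The difference between switching owners mid-transaction and gathering whole transactions can be shown with a toy Python model of a shared serial port. The function names are hypothetical, and the model ignores timing, FIFOs and line settings; it only shows that byte-level time slicing interleaves the two guests' output, while transaction-level queueing keeps each message intact.

```python
from typing import List

def time_sliced(messages: List[bytes], slice_bytes: int) -> bytes:
    """Naive time multiplexing: each guest owns the port for slice_bytes, then we switch."""
    out = bytearray()
    pos = [0] * len(messages)
    while any(p < len(m) for p, m in zip(pos, messages)):
        for i, msg in enumerate(messages):
            chunk = msg[pos[i]:pos[i] + slice_bytes]
            out += chunk
            pos[i] += len(chunk)
    return bytes(out)

def whole_transactions(messages: List[bytes]) -> bytes:
    """Software gathers each guest's complete transaction and sends it in one go."""
    out = bytearray()
    for msg in messages:
        # On real hardware the sharing layer would wait here for a
        # "transmit complete" interrupt before starting the next guest's data.
        out += msg
    return bytes(out)

guests = [b"HELLO", b"world"]
print(time_sliced(guests, slice_bytes=2))   # b'HEwoLLrlOd'
print(whole_transactions(guests))           # b'HELLOworld'
```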
> >   -> I don't think every device is. What about a device which supports
> >      different modes of operation? If two VMs drive the virtual device
> >      in different modes, it may not be possible to constantly switch
> >      between them. OK, this is pretty artificial.
> A particular problem is devices where you can't necessarily read back
> the last mode setting, which may well be the case for many devices. You
> can't, for example, read back all the registers on an IDE device,
> because a read of a particular address may give the status rather than
> the last command sent, or some such.
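One way to cope with registers that cannot be read back is for the layer that traps guest I/O to keep its own shadow copy of every value written. The Python sketch below assumes this approach for illustration (it is not a description of what Xen or qemu-dm does), with a toy stand-in for the IDE command/status port at 0x1F7; the class names are hypothetical.

```python
from typing import Dict, Optional

class ToyIdeCommandPort:
    """Stands in for real hardware: port 0x1F7 is the command register on write
    and the status register on read, so the last command cannot be read back."""
    PORT = 0x1F7
    STATUS_READY = 0x50          # DRDY | DSC, a typical idle status value

    def __init__(self) -> None:
        self._command: Optional[int] = None   # internal, not visible through any read

    def outb(self, port: int, value: int) -> None:
        if port == self.PORT:
            self._command = value

    def inb(self, port: int) -> int:
        # Other ports are not modelled; 0xFF mimics a read from nothing.
        return self.STATUS_READY if port == self.PORT else 0xFF

class ShadowingTrap:
    """Intercepts guest writes, remembers them, then passes them to the device."""
    def __init__(self, hw: ToyIdeCommandPort) -> None:
        self.hw = hw
        self.shadow: Dict[int, int] = {}

    def outb(self, port: int, value: int) -> None:
        self.shadow[port] = value     # keep our own copy; the device won't give it back
        self.hw.outb(port, value)

    def inb(self, port: int) -> int:
        return self.hw.inb(port)

    def last_written(self, port: int) -> Optional[int]:
        return self.shadow.get(port)

trap = ShadowingTrap(ToyIdeCommandPort())
trap.outb(0x1F7, 0x20)                   # guest issues READ SECTORS
print(hex(trap.inb(0x1F7)))             # 0x50: status, not the command
print(hex(trap.last_written(0x1F7)))    # 0x20: only the shadow knows
```

Even with a shadow, restoring state is not free: replaying a command write to the real device would re-execute the command, which is part of why time-sharing such a device is hard.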
> --
> Mats
> >
> > Thanks a lot for your help!
> >
> >
> > Best wishes
> >
> > Thomas
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
> >
> >
> >