This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

On Tue, 16 Nov 2010, Dave Scott wrote:
> Hi,
> Re: XCP's use of blktap2:
> > On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
> > > On 11/12/2010 07:55 PM, Daniel Stodden wrote:
> > > > The second issue I see is the XCP side of things. XenServer got a
> > > > lot of benefit out of blktap2, particularly because of the tapdevs.
> > > > It promotes a fairly rigorous split between a blkback VBD,
> > > > controlled by the agent, and tapdevs, controlled by XS's storage
> > > > manager.
> > > >
> > > > That doesn't prevent blkback from going into userspace, but it had
> > > > better not share a process with some libblktap, which in turn had
> > > > better not be controlled under the same xenstore path.
> > >
> > >
> > > Could you elaborate on this?  What was the benefit?
> > 
> > It's been mainly a matter of who controls what. Blktap1 was basically a
> > VBD, controlled by the agent. Blktap2 is a VDI represented as a block
> > device. Leaving management of that to XCP's storage manager, which just
> > hands that device node over to Xapi simplified many things. Before, the
> > agent had to understand a lot about the type of storage, then talk to
> > the right backend accordingly. Worse, in order to have storage
> > management control a couple of datapath features, you'd basically have
> > to talk to Xapi, which would talk through xenstore to blktap, which was
> > a bit tedious. :)
> As Daniel says, XCP currently separates domain management (setting up, 
> rebooting VMs) from storage management (attaching disks, snapshot, coalesce). 
> In the current design the storage layer handles the storage control-path 
> (instigating snapshots, clones, coalesce, dedup in future) through a storage 
> API ("SMAPI") and provides a uniform interface to qemu, blkback for the 
> data-path (currently in the form of a dom0 block device). In a VM start, xapi 
> will first ask the storage control-path to make a disk available, and then 
> pass this information to blkback/qemu.
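The two-phase VM start described above can be sketched as follows. This is a toy model only: the names (Smapi, attach_vdi, vm_start) and the device-node path are illustrative assumptions, not the real SMAPI or xapi interfaces.

```python
# Toy sketch of XCP's two-phase attach: the storage control-path makes a
# disk available as a dom0 block device, then the domain manager hands only
# that device node to the data-path (blkback/qemu). All names hypothetical.

class Smapi:
    """Storage control-path: owns the mapping from VDI to dom0 block device."""
    def __init__(self):
        self.active = {}  # vdi -> device node

    def attach_vdi(self, vdi):
        # In XCP this step would end up running tap-ctl to create a tapdev;
        # here we simply invent a device node.
        dev = "/dev/xen/blktap-2/tapdev%d" % len(self.active)
        self.active[vdi] = dev
        return dev

def vm_start(smapi, vdi):
    # xapi first asks the storage control-path to make the disk available...
    dev = smapi.attach_vdi(vdi)
    # ...then passes only the resulting device node to blkback/qemu.
    return {"backend": "blkback", "params": dev}

print(vm_start(Smapi(), "vdi-1234")["params"])
```

The point of the split is visible in `vm_start`: the domain manager never needs to know what kind of storage backs the VDI, only the device node it gets back.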
> One of the trickiest things XCP handles is vhd "coalesce": merging a vhd file 
> into its "parent". This comes up because vhds are arranged in a tree 
> structure where the leaves are separate independent VM disks and the nodes 
> represent shared common blocks, the result of (eg) cloning a single VM lots 
> of times. When guest disks are deleted and the vhd leaves are removed, it 
> sometimes becomes possible to save space by merging nodes together. The 
> tricky bit is doing this while I/O is still being performed in parallel 
> against logically separate (but related by parentage/history) disks on 
> different hosts. The thing doing the coalescing needs to know where all the 
> I/O is happening (eg to be able to find the host and pid where the related 
> tapdisks (or qemus) live) and needs to be able to signal these processes 
> when they must re-read the vhd tree metadata.
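The tree structure and the coalesce operation can be illustrated with a toy model. This is not the vhd on-disk format: each node here just stores the blocks written at its level, reads fall through to the parent, and coalescing pushes a node's blocks into its parent.

```python
# Toy model of a vhd tree: leaves are independent VM disks, interior nodes
# hold shared blocks. Coalescing merges a node into its parent to reclaim
# the intermediate file. Purely illustrative, not the real vhd layout.

class Vhd:
    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}  # block number -> data written at this level

    def read(self, n):
        node = self
        while node is not None:          # fall through the parent chain
            if n in node.blocks:
                return node.blocks[n]
            node = node.parent
        return None                      # unallocated: reads as empty

def coalesce(node):
    """Merge `node` into its parent; users of the tree must then re-read
    the metadata and re-parent onto the returned node."""
    parent = node.parent
    parent.blocks.update(node.blocks)    # child's blocks win over parent's
    return parent

base = Vhd();     base.blocks[0] = "gold"
snap = Vhd(base); snap.blocks[1] = "delta"
leaf = Vhd(snap); leaf.blocks[2] = "vm-data"

leaf.parent = coalesce(snap)             # snap merged into base
print(leaf.read(0), leaf.read(1), leaf.read(2))
```

The last line is the hard part in real life: every live tapdisk or qemu holding the old tree open has to be found and told to re-parent, which is exactly the signalling problem described above.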
> In the bad old blktap1 days, the storage control-path didn't know enough 
> about the data-path to reliably signal the active tapdisks: IIRC the tapdisks 
> were spawned by blktapctrl as a side-effect of the domain manager writing to 
> xenstore. In the much better blktap2 days :) the storage control-path sets up 
> (registers?) the data-path (currently via tap-ctl and a dom0 block device) 
> and so it knows who to talk to in order to co-ordinate a coalesce.
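What "knowing who to talk to" could look like is a registry kept by the storage control-path, mapping each active data-path to the host and pid serving it. The class and method names below are hypothetical; in blktap2 the real discovery goes through tap-ctl and the dom0 block device.

```python
# Hypothetical data-path registry: the storage control-path records who
# serves each VDI at registration time, so the coalesce engine can later
# find and signal every process touching a given vhd tree.

class DatapathRegistry:
    def __init__(self):
        self.active = {}  # vdi -> (host, pid)

    def register(self, vdi, host, pid):
        self.active[vdi] = (host, pid)

    def unregister(self, vdi):
        self.active.pop(vdi, None)

    def who_serves(self, vhd_tree_vdis):
        # Before coalescing, find every tapdisk/qemu touching the tree so
        # each one can be told to re-read the vhd metadata afterwards.
        return [self.active[v] for v in vhd_tree_vdis if v in self.active]

reg = DatapathRegistry()
reg.register("vdi-leaf-1", "host-a", 4242)
reg.register("vdi-leaf-2", "host-b", 5151)
print(reg.who_serves(["vdi-leaf-1", "vdi-leaf-2", "vdi-gone"]))
```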
> So I think the critical thing is for the storage control-path to be able to 
> "register" a data-path, enabling it later to find and signal any processes 
> using that data-path. There are a bunch of different possibilities the 
> storage control-path could use instead of using tap-ctl to create a block 
> device, including:

Qemu could be spawned directly (even before the VM) and QMP could be
used to communicate with it.
The qemu pid and/or the socket used to issue QMP commands could serve as
the data-path registration.
> I'm sure there are lots of possibilities :-)

Xen-devel mailing list
