xen-devel
RE: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.
Hi,
Re: XCP's use of blktap2:
> On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
> > On 11/12/2010 07:55 PM, Daniel Stodden wrote:
> > > The second issue I see is the XCP side of things. XenServer got a lot of
> > > benefit out of blktap2, and particularly because of the tapdevs. It
> > > promotes a fairly rigorous split between a blkback VBD, controlled by
> > > the agent, and tapdevs, controlled by XS's storage manager.
> > >
> > > That doesn't prevent blkback to go into userspace, but it better won't
> > > share a process with some libblktap, which in turn would better not be
> > > controlled under the same xenstore path.
> >
> >
> > Could you elaborate on this? What was the benefit?
>
> It's been mainly a matter of who controls what. Blktap1 was basically a
> VBD, controlled by the agent. Blktap2 is a VDI represented as a block
> device. Leaving management of that to XCP's storage manager, which just
> hands that device node over to Xapi simplified many things. Before, the
> agent had to understand a lot about the type of storage, then talk to
> the right backend accordingly. Worse, in order to have storage
> management control a couple datapath features, you'd basically have to
> talk to Xapi, which would talk through xenstore to blktap, which was a
> bit tedious. :)
As Daniel says, XCP currently separates domain management (setting up and
rebooting VMs) from storage management (attaching disks, snapshot, coalesce).
In the current design the storage layer handles the storage control-path
(instigating snapshots, clones, coalesce and, in future, dedup) through a
storage API ("SMAPI") and provides a uniform data-path interface to qemu and
blkback (currently in the form of a dom0 block device). On VM start, xapi
first asks the storage control-path to make a disk available, and then passes
the resulting information to blkback/qemu.
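To make the ordering concrete, here is a rough sketch of that two-step VM start. Everything in it is made up for illustration: the class, the function names and the device path are assumptions, not the real SMAPI or xapi interfaces.

```python
class StorageControlPath:
    """Hypothetical stand-in for the storage layer behind SMAPI."""

    def __init__(self):
        self._next_minor = 0
        self.attached = {}  # vdi name -> dom0 block device path

    def attach(self, vdi):
        # Make the disk available. Today that means handing back a dom0
        # block device; the path below is an invented placeholder.
        if vdi not in self.attached:
            self.attached[vdi] = "/dev/example/tapdev%d" % self._next_minor
            self._next_minor += 1
        return self.attached[vdi]


def start_vm(storage, vm_name, vdis, plug_backend):
    """Step 1: ask storage for each disk. Step 2: hand the result to the
    data-path consumer (blkback or qemu), here the plug_backend callable."""
    devices = [storage.attach(vdi) for vdi in vdis]
    for device in devices:
        plug_backend(vm_name, device)
    return devices
```

The point of the split is that start_vm never needs to know what kind of storage sits behind a VDI; it only sees whatever attach() returns.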
One of the trickiest things XCP handles is vhd "coalesce": merging a vhd file
into its "parent". This comes up because vhds are arranged in a tree structure
where the leaves are separate, independent VM disks and the internal nodes hold
shared common blocks, the result of (e.g.) cloning a single VM lots of times.
When guest disks are deleted and the vhd leaves are removed, it sometimes
becomes possible to save space by merging nodes together. The tricky bit is
doing this while I/O is still being performed in parallel against logically
separate (but related by parentage/history) disks on different hosts. The thing
doing the coalescing needs to know where all the I/O is going on (e.g. to find
the host and pid where the related tapdisks or qemus live), and it needs to be
able to signal these processes when they must re-read the vhd tree metadata.
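A toy model of the tree may help here. The sketch below assumes one simple rule, namely that a vhd which has become its parent's only child is a candidate for merging into that parent; the real criteria may well be more involved, and all names are invented.

```python
def children_of(parents):
    """Invert a child -> parent map into parent -> [children]."""
    kids = {}
    for child, parent in parents.items():
        if parent is not None:
            kids.setdefault(parent, []).append(child)
    return kids


def coalesce_candidates(parents):
    """Return (child, parent) pairs where the child is an only child.

    Assumption for this sketch: only such pairs can be merged."""
    kids = children_of(parents)
    return sorted(
        (only[0], parent) for parent, only in kids.items() if len(only) == 1
    )


def delete_leaf(parents, leaf):
    """Remove a guest disk (a leaf vhd) from the tree."""
    if any(parent == leaf for parent in parents.values()):
        raise ValueError("%s is not a leaf" % leaf)
    del parents[leaf]
```

For example, with "snap" shared by two clones "vm1" and "vm2", nothing can be merged; once "vm2" is deleted, ("vm1", "snap") shows up as a candidate, while "vm1" may still be doing I/O on some other host.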
In the bad old blktap1 days, the storage control-path didn't know enough about
the data-path to reliably signal the active tapdisks: IIRC the tapdisks were
spawned by blktapctrl as a side-effect of the domain manager writing to
xenstore. In the much better blktap2 days :) the storage control-path sets up
(registers?) the data-path (currently via tap-ctl and a dom0 block device) and
so it knows who to talk to in order to co-ordinate a coalesce.
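As a sketch of what "knowing who to talk to" could look like, the storage control-path might keep a small registry of which (host, pid) pairs are using which disk. The names below are hypothetical and the notify callable stands in for whatever signalling mechanism is actually used.

```python
from collections import defaultdict, namedtuple

# Where a data-path process (tapdisk or qemu) lives.
Endpoint = namedtuple("Endpoint", ["host", "pid"])


class DataPathRegistry:
    """Hypothetical record of which processes use which disk."""

    def __init__(self):
        self._users = defaultdict(set)  # vdi name -> set of Endpoint

    def register(self, vdi, host, pid):
        self._users[vdi].add(Endpoint(host, pid))

    def unregister(self, vdi, host, pid):
        self._users[vdi].discard(Endpoint(host, pid))

    def users_of(self, vdis):
        """All endpoints doing I/O against any of the given disks."""
        found = set()
        for vdi in vdis:
            found |= self._users.get(vdi, set())
        return sorted(found)


def signal_for_coalesce(registry, affected_vdis, notify):
    """Tell every process touching the affected disks to re-read the
    vhd tree metadata. notify is an assumed callback taking an Endpoint."""
    for endpoint in registry.users_of(affected_vdis):
        notify(endpoint)
```

With blktap1 there was nothing playing the role of register(), which is why the coalescer could not reliably find the tapdisks.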
So I think the critical thing is for the storage control-path to be able to
"register" a data-path, so that it can later find and signal any processes
using that data-path. There are a bunch of different things the storage
control-path could do instead of using tap-ctl to create a block device,
including:
1. directly spawn a tapdisk2 userspace process. Some identifier (pid, unix
domain socket) could be passed to qemu, allowing it to perform I/O. The block
backend could be either in tapdisk2 directly or in qemu?
2. return a (path to vhd file, callback unix domain socket) pair. This could
be passed to qemu (or something else) and qemu could use the callback socket
to register its intention to use the data-path (and hence that it needs to be
signaled if something changes).
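Option 2 could look roughly like the sketch below. The wire format (a name followed by a newline, and a "reload" message back) and every function name are assumptions made up for this example; no such protocol exists in tapdisk2 or qemu today.

```python
import os
import socket


class CallbackListener:
    """Storage control-path side of a hypothetical callback socket."""

    def __init__(self, socket_path):
        self.socket_path = socket_path
        self._server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self._server.bind(socket_path)
        self._server.listen(8)
        self._clients = {}  # registered name -> connected socket

    def accept_one(self):
        # Accept one data-path user and read its short registration line.
        # A single recv is enough for this toy message size.
        conn, _ = self._server.accept()
        name = conn.recv(256).decode().strip()
        self._clients[name] = conn
        return name

    def signal_all(self, message=b"reload\n"):
        # E.g. after a coalesce: ask users to re-read vhd tree metadata.
        for conn in self._clients.values():
            conn.sendall(message)

    def close(self):
        for conn in self._clients.values():
            conn.close()
        self._server.close()
        os.unlink(self.socket_path)


def attach_disk(vhd_path, workdir):
    """Return the (path to vhd file, callback listener) pair."""
    return vhd_path, CallbackListener(os.path.join(workdir, "callback.sock"))


def register_user(socket_path, name):
    """What qemu (or something else) would do before using the vhd."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(socket_path)
    client.sendall(name.encode() + b"\n")
    return client
```

One nice property of this shape is that a user which exits simply drops its connection, so the control-path does not need to track pids separately.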
I'm sure there are lots of possibilities :-)
Cheers,
Dave