

[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

On Wed, 2010-11-17 at 11:36 -0500, Andres Lagar-Cavilla wrote:
> I'll throw an idea there and you educate me why it's lame.
> Going back to the primary issue of dropping zero-copy, you want the block 
> backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, because 
> you run into all sorts of quirkiness otherwise: magical VM_FOREIGN 
> incantations to back granted mfn's with fake page structs that make 
> get_user_pages happy, quirky grant PTEs, etc.
> Ok, so how about something along the lines of GNTTABOP_swap? Eerily 
> reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
> The observation is that for a blkfront read, you could do the read all along 
> on a regular dom0 frame, and when stuffing the response into the ring, swap 
> the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then 
> the algorithm falls out as follows:
> 1. Block backend, instead of get_empty_pages_and_pagevec at init time, 
> creates a pool of reserved regular pages via get_free_page(s). These pages 
> have their refcount pumped; no one in dom0 will ever touch them.
> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap 
> immediately. One of the backend-reserved mfn's is swapped with the domU mfn. 
> Pfn's and page struct's on both ends remain untouched.
> 3. For blkfront reads, call swap when stuffing the response back into the ring
> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after swap, much 
> like balloon and others do, without fear of races. More importantly, b) you 
> don't have a weirdo granted PTE, or work with a frame from another domain. It's 
> your page all along, dom0
> 5. One assumption for domU is that pages allocated as blkfront buffers won't 
> be touched by anybody, so a) it's safe for them to swap async with another 
> frame with undef contents and b) domU can fix its p2m (and kvaddr) when 
> pulling responses from the ring (the new mfn should be put on the response by 
> dom0 directly or through an opaque handle)
> 6. Scatter-gather vectors in ring requests give you a natural multicall 
> batching for these GNTTABOP_swap's. I.e. all these hypercalls won't happen as 
> often, or at as fine a granularity, as skbuff's demanded for GNTTABOP_transfer
> 7. Potentially domU may want to use the contents in a blkfront write buffer 
> later for something else. So it's not really zero-copy. But the approach 
> opens a window to async memcpy. From the point of swap when pulling the req 
> to the point of pushing the response, you can do memcpy at any time. Don't 
> know about how practical that is though.
> Problems at first glance:
> 1. To support GNTTABOP_swap you need to add more if(version) to blkfront and 
> blkback.
> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like 
> balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken 
> care of. domU will probably need to neuter its kvaddr before granting, and 
> then re-establish it when the response arrives. Weren't all these hypercalls 
> ultimately more expensive than memcpy for GNTTABOP_transfer in netback?
> 3. Managing the pool of backend reserved pages may be a problem?

I guess GNTTABOP_transfer for network I/O died because of the double-ended
TLB-flush fallout?

Still, I liked the general direction; nice shot.


Xen-devel mailing list
