This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.
From: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
Date: Wed, 17 Nov 2010 14:47:47 -0500
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Delivery-date: Thu, 18 Nov 2010 02:02:35 -0800
In-reply-to: <4CE41676.3070007@xxxxxxxx>
References: <20101116215621.59FC2CF782@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <E4E889D2-D5B2-435B-8833-BC01C86506B2@xxxxxxxxxxxxxxxx> <4CE41676.3070007@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
So, swapping mfns for write requests is a definite no-no. One could still live 
with copying write buffers and swapping read buffers by the end of the request. 
That still yields some benefit. 
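A minimal sketch of that hybrid policy, as a toy model rather than real blktap code (the function names and buffer representation are illustrative assumptions): write payloads get copied into dom0-owned buffers up front, while read buffers are swapped wholesale once the data is in place.

```python
# Toy model of the hybrid scheme above: copy for writes, swap for
# reads. Python buffers stand in for frames; nothing here is real
# blktap or Xen API.

def handle_write(domu_buf, dom0_buf):
    # Copy: domU may keep using its buffer (e.g. dirty pagecache)
    # immediately after the request completes.
    dom0_buf[:] = domu_buf
    return dom0_buf

def handle_read_completion(domu_frame, dom0_frame):
    # Swap: the dom0 frame that received the data becomes domU's
    # frame, and dom0 keeps the old domU frame.
    return dom0_frame, domu_frame  # (new domU frame, new dom0 frame)

data = bytearray(b"payload")
assert handle_write(data, bytearray(7)) == b"payload"
```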

As for kernel mappings, I thought a solution would be to provide the hypervisor 
with both pte pointers. After all, pte pointers are already provided for mapping 
grants in user-space. But that's a little too much to handle for the current 

Thanks for the feedback
On Nov 17, 2010, at 12:52 PM, Jeremy Fitzhardinge wrote:

> On 11/17/2010 08:36 AM, Andres Lagar-Cavilla wrote:
>> I'll throw an idea there and you educate me why it's lame.
>> Going back to the primary issue of dropping zero-copy, you want the block 
>> backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, 
>> because you run into all sorts of quirkiness otherwise: magical VM_FOREIGN 
>> incantations to back granted mfn's with fake page structs that make 
>> get_user_pages happy, quirky grant PTEs, etc.
>> Ok, so how about something along the lines of GNTTABOP_swap? Eerily 
>> reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
>> The observation is that for a blkfront read, you could do the read all along 
>> on a regular dom0 frame, and when stuffing the response into the ring, swap 
>> the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then 
>> the algorithm folds out:
>> 1. Block backend, instead of get_empty_pages_and_pagevec at init time, 
>> creates a pool of reserved regular pages via get_free_page(s). These pages 
>> have their refcounts pumped; no one in dom0 will ever touch them.
>> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap 
>> immediately. One of the backend-reserved mfn's is swapped with the domU mfn. 
>> Pfn's and page struct's on both ends remain untouched.
> Would GNTTABOP_swap also require the domU to have already unmapped the
> page from its own pagetables?  Presumably it would fail if it didn't,
> otherwise you'd end up with a domU mapping the same mfn as a
> dom0-private page.
>> 3. For blkfront reads, call swap when stuffing the response back into the 
>> ring
>> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after swap, 
>> much like balloon and others do, without fear of races. More importantly, b) 
>> you don't have a weirdo granted PTE, or work with a frame from other domain. 
>> It's your page all along, dom0
>> 5. One assumption for domU is that pages allocated as blkfront buffers won't 
>> be touched by anybody, so a) it's safe for them to swap async with another 
>> frame with undef contents and b) domU can fix its p2m (and kvaddr) when 
>> pulling responses from the ring (the new mfn should be put on the response 
>> by dom0 directly or through an opaque handle)
>> 6. Scatter-gather vectors in ring requests give you a natural multicall 
>> batching for these GNTTABOP_swap's. I.e. all these hypercalls won't happen 
>> as often and at the granularity as skbuff's demanded for GNTTABOP_transfer
>> 7. Potentially domU may want to use the contents in a blkfront write buffer 
>> later for something else. So it's not really zero-copy. But the approach 
>> opens a window to async memcpy. From the point of swap when pulling the req 
>> to the point of pushing the response, you can do memcpy at any time. Don't 
>> know about how practical that is though.
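The proposed flow in steps 1-7 can be condensed into a runnable toy model. Note that GNTTABOP_swap does not exist; the function below, and the dict-based p2m and pool, are purely illustrative of the proposed semantics, including the failure case raised earlier in the thread (the swap must fail while domU still maps the frame):

```python
# Toy model of the hypothetical GNTTABOP_swap flow described above.
# P2m tables and the backend pool are plain Python containers;
# nothing here is real Xen API.

class SwapError(Exception):
    pass

def gnttabop_swap(p2m_domu, pfn, pool, domu_mapped_mfns):
    """Swap the mfn backing domU's pfn with a dom0-reserved frame.

    Fails if domU still maps the mfn in its pagetables, since the
    swap would otherwise leave domU mapping a dom0-private page.
    """
    old_mfn = p2m_domu[pfn]
    if old_mfn in domu_mapped_mfns:
        raise SwapError("domU must unmap the buffer before swapping")
    new_mfn = pool.pop()      # step 1: backend-reserved frame
    p2m_domu[pfn] = new_mfn   # step 5b: domU fixes its p2m on response
    pool.append(old_mfn)      # step 4: dom0 keeps the old frame for itself
    return old_mfn

# blkfront read (step 3): dom0 reads into its reserved frame, then
# swaps it into domU's p2m when pushing the response.
p2m = {0x10: 0xAAA}   # domU pfn -> mfn
pool = [0xBBB]        # dom0 pool, refcounts pumped (step 1)
reclaimed = gnttabop_swap(p2m, 0x10, pool, domu_mapped_mfns=set())
assert p2m[0x10] == 0xBBB and reclaimed == 0xAAA
```

Batching (step 6) would amount to issuing one such operation per segment of a request's scatter-gather vector in a single multicall, rather than one hypercall per page.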
> I think that will be the common case - the kernel will always attempt to
> write dirty pagecache pages to make clean ones, and it will still want
> them around to access.  So it can't really give up the page altogether;
> if it hands it over to dom0, it needs to make a local copy first.
>> Problems at first glance:
>> 1. To support GNTTABOP_swap you need to add more if(version) to blkfront and 
>> blkback.
>> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like 
>> balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken 
>> care of. domU will probably need to neuter its kvaddr before granting, and 
>> then re-establish it when the response arrives. Weren't all these hypercalls 
>> ultimately more expensive than memcpy for GNTTABOP_transfer for netback?
>> 3. Managing the pool of backend reserved pages may be a problem?
>> So in the end, perhaps more of an academic exercise than a palatable answer, 
>> but nonetheless I'd like to hear other problems people may find with this 
>> approach
> It's not clear to me that its any improvement over just directly copying
> the data up front.
>    J
