This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.
From: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
Date: Wed, 17 Nov 2010 14:47:47 -0500
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Delivery-date: Thu, 18 Nov 2010 02:02:35 -0800
In-reply-to: <4CE41676.3070007@xxxxxxxx>
References: <20101116215621.59FC2CF782@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <E4E889D2-D5B2-435B-8833-BC01C86506B2@xxxxxxxxxxxxxxxx> <4CE41676.3070007@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
So, swapping mfns for write requests is a definite no-no. One could still live 
with copying write buffers and swapping read buffers by the end of the request. 
That still yields some benefit. 
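A minimal sketch of that hybrid policy, as a toy model rather than real blktap code (the function names and buffer representation are illustrative assumptions): write payloads get copied into dom0-owned buffers up front, while read buffers are swapped wholesale once the data is in place.

```python
# Toy model of the hybrid scheme above: copy for writes, swap for
# reads. Python buffers stand in for frames; nothing here is real
# blktap or Xen API.

def handle_write(domu_buf, dom0_buf):
    # Copy: domU may keep using its buffer (e.g. dirty pagecache)
    # immediately after the request completes.
    dom0_buf[:] = domu_buf
    return dom0_buf

def handle_read_completion(domu_frame, dom0_frame):
    # Swap: the dom0 frame that received the data becomes domU's
    # frame, and dom0 keeps the old domU frame.
    return dom0_frame, domu_frame  # (new domU frame, new dom0 frame)

data = bytearray(b"payload")
assert handle_write(data, bytearray(7)) == b"payload"
```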

As for kernel mappings, I thought a solution would be to provide the hypervisor 
with both pte pointers. After all, pte pointers are already provided for mapping 
grants in user-space. But that's a little too much to handle for the current 

Thanks for the feedback
On Nov 17, 2010, at 12:52 PM, Jeremy Fitzhardinge wrote:

> On 11/17/2010 08:36 AM, Andres Lagar-Cavilla wrote:
>> I'll throw an idea there and you educate me why it's lame.
>> Going back to the primary issue of dropping zero-copy, you want the block 
>> backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, 
>> because you run into all sorts of quirkiness otherwise: magical VM_FOREIGN 
>> incantations to back granted mfn's with fake page structs that make 
>> get_user_pages happy, quirky grant PTEs, etc.
>> Ok, so how about something along the lines of GNTTABOP_swap? Eerily 
>> reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
>> The observation is that for a blkfront read, you could do the read all along 
>> on a regular dom0 frame, and when stuffing the response into the ring, swap 
>> the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then 
>> the algorithm folds out:
>> 1. Block backend, instead of get_empty_pages_and_pagevec at init time, 
>> creates a pool of reserved regular pages via get_free_page(s). These pages 
>> have their refcounts pumped; no one in dom0 will ever touch them.
>> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap 
>> immediately. One of the backend-reserved mfn's is swapped with the domU mfn. 
>> Pfn's and page struct's on both ends remain untouched.
> Would GNTTABOP_swap also require the domU to have already unmapped the
> page from its own pagetables?  Presumably it would fail if it didn't,
> otherwise you'd end up with a domU mapping the same mfn as a
> dom0-private page.
>> 3. For blkfront reads, call swap when stuffing the response back into the 
>> ring
>> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after swap, 
>> much like balloon and others do, without fear of races. More importantly, b) 
>> you don't have a weirdo granted PTE, or work with a frame from other domain. 
>> It's your page all along, dom0
>> 5. One assumption for domU is that pages allocated as blkfront buffers won't 
>> be touched by anybody, so a) it's safe for them to swap async with another 
>> frame with undef contents and b) domU can fix its p2m (and kvaddr) when 
>> pulling responses from the ring (the new mfn should be put on the response 
>> by dom0 directly or through an opaque handle)
>> 6. Scatter-gather vectors in ring requests give you a natural multicall 
>> batching for these GNTTABOP_swap's. I.e. all these hypercalls won't happen 
>> as often and at the granularity as skbuff's demanded for GNTTABOP_transfer
>> 7. Potentially domU may want to use the contents in a blkfront write buffer 
>> later for something else. So it's not really zero-copy. But the approach 
>> opens a window to async memcpy. From the point of swap when pulling the req 
>> to the point of pushing the response, you can do memcpy at any time. Don't 
>> know about how practical that is though.
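The proposed flow in steps 1-7 can be condensed into a runnable toy model. Note that GNTTABOP_swap does not exist; the function below, and the dict-based p2m and pool, are purely illustrative of the proposed semantics, including the failure case raised earlier in the thread (the swap must fail while domU still maps the frame):

```python
# Toy model of the hypothetical GNTTABOP_swap flow described above.
# P2m tables and the backend pool are plain Python containers;
# nothing here is real Xen API.

class SwapError(Exception):
    pass

def gnttabop_swap(p2m_domu, pfn, pool, domu_mapped_mfns):
    """Swap the mfn backing domU's pfn with a dom0-reserved frame.

    Fails if domU still maps the mfn in its pagetables, since the
    swap would otherwise leave domU mapping a dom0-private page.
    """
    old_mfn = p2m_domu[pfn]
    if old_mfn in domu_mapped_mfns:
        raise SwapError("domU must unmap the buffer before swapping")
    new_mfn = pool.pop()      # step 1: backend-reserved frame
    p2m_domu[pfn] = new_mfn   # step 5b: domU fixes its p2m on response
    pool.append(old_mfn)      # step 4: dom0 keeps the old frame for itself
    return old_mfn

# blkfront read (step 3): dom0 reads into its reserved frame, then
# swaps it into domU's p2m when pushing the response.
p2m = {0x10: 0xAAA}   # domU pfn -> mfn
pool = [0xBBB]        # dom0 pool, refcounts pumped (step 1)
reclaimed = gnttabop_swap(p2m, 0x10, pool, domu_mapped_mfns=set())
assert p2m[0x10] == 0xBBB and reclaimed == 0xAAA
```

Batching (step 6) would amount to issuing one such operation per segment of a request's scatter-gather vector in a single multicall, rather than one hypercall per page.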
> I think that will be the common case - the kernel will always attempt to
> write dirty pagecache pages to make clean ones, and it will still want
> them around to access.  So it can't really give up the page altogether;
> if it hands it over to dom0, it needs to make a local copy first.
>> Problems at first glance:
>> 1. To support GNTTABOP_swap you need to add more if(version) to blkfront and 
>> blkback.
>> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like 
>> balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken 
>> care of. domU will probably need to neuter its kvaddr before granting, and 
>> then re-establish it when the response arrives. Weren't all these hypercalls 
>> ultimately more expensive than memcpy for GNTTABOP_transfer for netback?
>> 3. Managing the pool of backend reserved pages may be a problem?
>> So in the end, perhaps more of an academic exercise than a palatable answer, 
>> but nonetheless I'd like to hear other problems people may find with this 
>> approach
> It's not clear to me that its any improvement over just directly copying
> the data up front.
>    J
