This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.

To: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Subject: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Tue, 16 Nov 2010 09:56:01 -0800
Cc: "Xen-devel@xxxxxxxxxxxxxxxxxxx" <Xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 16 Nov 2010 09:56:51 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1289898792.23890.214.camel@ramone>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1289604707-13378-1-git-send-email-daniel.stodden@xxxxxxxxxx> <4CDDE0DA.2070303@xxxxxxxx> <1289620544.11102.373.camel@xxxxxxxxxxxxxxxxxxxxxxx> <4CE17B80.7080606@xxxxxxxx> <1289898792.23890.214.camel@ramone>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20101027 Fedora/3.1.6-1.fc13 Lightning/1.0b3pre Thunderbird/3.1.6
On 11/16/2010 01:13 AM, Daniel Stodden wrote:
> On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
>> On 11/12/2010 07:55 PM, Daniel Stodden wrote:
>>>> Surely this can be dealt with by replacing the mapped granted page with
>>>> a local copy if the refcount is elevated?
>>> Yeah. We briefly discussed this when the problem started to pop up
>>> (again).
>>> I had a patch, for blktap1 in XS 5.5 iirc, which would fill the mapping
>>> with a dummy page. You wouldn't need a copy; a R/O zero map easily
>>> does the job.
>> Hm, I'd be a bit concerned that that might cause problems if used
>> generically. 
> Yeah. It wasn't a problem because all the network backends are on TCP,
> where one can be rather sure that the dups are going to be properly
> dropped.
> Does this hold everywhere ..? -- As mentioned below, the problem is
> rather in AIO/DIO than being Xen-specific, so you can see the same
> behavior on bare metal kernels too. A userspace app seeing an AIO
> complete and then reusing that buffer elsewhere will occasionally
> resend garbage over the network.

Yeah, that sounds like a generic security problem.  I presume the
protocol will just discard the excess retransmit data, but it might mean
a usermode program ends up transmitting secrets it never intended to...
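That hazard can be modeled in a few lines of userspace C. This is a hedged sketch, not a real network API: the names (`stack_send`, `stack_retransmit`, `tx_ref`) are invented for illustration, and a real zero-copy stack holds page references rather than raw pointers. The point is that the "stack" keeps a pointer into the caller's buffer instead of a copy, so a retransmit after the application reuses the buffer sends whatever bytes are there now:

```c
#include <assert.h>
#include <string.h>

struct tx_ref {
    const char *buf;   /* zero-copy: points at the caller's memory */
    size_t      len;
};

/* "Send" without copying: the stack just remembers where the data is. */
static struct tx_ref stack_send(const char *buf, size_t len)
{
    struct tx_ref ref = { buf, len };
    return ref;
}

/* A later retransmit rereads the original buffer. */
static void stack_retransmit(struct tx_ref ref, char *wire)
{
    memcpy(wire, ref.buf, ref.len);
}
```

Reusing the buffer after "completion" and then forcing a retransmit shows the new contents, not the original payload, going out on the wire.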

> There are some important parts which would go missing. Such as
> ratelimiting gntdev accesses -- 200 thundering tapdisks each trying to
> gntmap 352 pages simultaneously isn't so good, so there still needs to
> be some bridge arbitrating them. I'd rather keep that in kernel space,
> okay to cram stuff like that into gntdev? It'd be much more
> straightforward than IPC.

What's the problem?  If you do nothing then it will appear to the kernel
as a bunch of processes doing memory allocations, and they'll get
blocked/rate-limited accordingly if memory is getting short.  There's
plenty of existing mechanisms to control that sort of thing (cgroups,
etc) without adding anything new to the kernel.  Or are you talking
about something other than simple memory pressure?
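For the "existing mechanisms" route, the v1 memory cgroup controller would look roughly like this; a hedged sketch only, with illustrative paths, PIDs, and numbers (the mount point and limit arithmetic are assumptions, not anything from the thread):

```shell
# Cap a group of tapdisk processes with the memory cgroup controller
# instead of adding rate limiting to gntdev. Numbers are illustrative:
# roughly 200 tapdisks * 352 pages * 4 KiB, with 2x slack.
mkdir /sys/fs/cgroup/memory/tapdisk
echo $((200 * 352 * 4096 * 2)) \
    > /sys/fs/cgroup/memory/tapdisk/memory.limit_in_bytes

# Move a running tapdisk into the group (repeat per PID).
echo "$TAPDISK_PID" > /sys/fs/cgroup/memory/tapdisk/tasks
```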

And there's plenty of existing IPC mechanisms if you want them to
explicitly coordinate with each other, but I'd tend to think that's
premature unless you have something specific in mind.

> Also, I was absolutely certain I once saw VM_FOREIGN support in gntdev..
> Can't find it now, what happened? Without, there's presently still no
> zero-copy.

gntdev doesn't need VM_FOREIGN any more - it uses the (relatively
new-ish) mmu notifier infrastructure which is intended to allow a device
to sync an external MMU with usermode mappings.  We're not using it in
precisely that way, but it allows us to wrangle grant mappings before
the generic code tries to do normal pte ops on them.
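The shape of that pattern can be modeled in userspace; this is a hedged stub of the kernel interface (the real one is `mmu_notifier_register()` with an ops table in `include/linux/mmu_notifier.h`), not gntdev code. The key property is that the core MM calls the driver back *before* it performs normal pte operations on the range, so the driver can tear down its grant mappings first:

```c
#include <assert.h>
#include <stddef.h>

/* Stub of the notifier ops table the MM core consults. */
struct mmu_notifier_ops {
    void (*invalidate_range_start)(void *priv,
                                   unsigned long start, unsigned long end);
};

struct mmu_notifier {
    const struct mmu_notifier_ops *ops;
    void *priv;
};

/* Model of the generic MM invalidating a range of user mappings:
 * the registered callback runs before the pte teardown. */
static void mm_invalidate_range(struct mmu_notifier *mn,
                                unsigned long start, unsigned long end)
{
    if (mn && mn->ops->invalidate_range_start)
        mn->ops->invalidate_range_start(mn->priv, start, end);
    /* ...generic pte teardown would run here, after the callback... */
}

/* gntdev-style driver state and callback. */
struct gntdev_priv { int live_grants; };

static void gntdev_invalidate(void *priv,
                              unsigned long start, unsigned long end)
{
    struct gntdev_priv *p = priv;
    (void)start; (void)end;
    p->live_grants = 0;   /* unmap grants before generic code sees them */
}
```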

> Once the issues were solved, it'd be kinda nice. Simplifies stuff like
> memshr for blktap, which depends on getting hold of original grefs.
> We'd presumably still need the tapdev nodes, for qemu, etc. But those
> can stay non-xen aware then.
>>>> The only caveat is the stray unmapping problem, but I think gntdev can
>>>> be modified to deal with that pretty easily.
>>> Not easier than anything else in kernel space, but when dealing only
>>> with the refcounts, that's as good a place as anywhere else, yes.
>> I think the refcount test is pretty straightforward - if the refcount is
>> 1, then we're the sole owner of the page and we don't need to worry
>> about any other users.  If its > 1, then somebody else has it, and we
>> need to make sure it no longer refers to a granted page (which is just a
>> matter of doing a set_pte_atomic() to remap from present to present).
> [set_pte_atomic over grant ptes doesn't work, or does it?]

No, I forgot about grant ptes magic properties.  But there is the hypercall.
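The refcount test, with the hypercall substituted for set_pte_atomic(), can be sketched as below. This is a hedged userspace model: `struct page` and the hypercall are stubs, and the real replacement would go through the grant-table unmap-and-replace operation (GNTTABOP_unmap_and_replace), since grant ptes can't be rewritten directly:

```c
#include <assert.h>
#include <stdbool.h>

struct page {
    int  count;        /* reference count */
    bool granted;      /* still mapped to a foreign (granted) frame */
    bool replaced;     /* swapped for a local page before release */
};

/* Stub for the grant-table unmap-and-replace hypercall. */
static void hypercall_unmap_and_replace(struct page *pg)
{
    pg->granted  = false;
    pg->replaced = true;
}

/* Tear down a grant mapping safely, based on the refcount. */
static void release_granted_page(struct page *pg)
{
    if (pg->count == 1) {
        /* Sole owner: nobody else can see the frame, unmap directly. */
        pg->granted = false;
    } else {
        /* Elevated refcount: some other subsystem still holds the page,
         * so replace the foreign frame before letting go of it. */
        hypercall_unmap_and_replace(pg);
    }
}
```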

>> Then we'd have a set of frames whose lifetimes are being determined by
>> some other subsystem.  We can either maintain a list of them and poll
>> waiting for them to become free, or just release them and let them be
>> managed by the normal kernel lifetime rules (which requires that the
>> memory attached to them be completely normal, of course).
> The latter sounds like a good alternative to polling. So an
> unmap_and_replace, and giving up ownership thereafter. Next run of the
> dispatcher thread can just refill the foreign pfn range via
> alloc_empty_pages(), to rebalance.

Do we actually need a "foreign page range"?  Won't any pfn do?  If we
start with a specific range of foreign pfns and then start freeing those
pfns back to the kernel, we won't have one for long...

