
[Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.



On Wed, 2010-11-17 at 16:02 -0500, Jeremy Fitzhardinge wrote:
> On 11/17/2010 12:21 PM, Daniel Stodden wrote:
> > And, like all granted frames, not owning them implies they are not
> > resolvable via mfn_to_pfn, thereby failing in follow_page, thereby gup()
> > without the VM_FOREIGN hack.
> 
> Hm, I see.  Well, I wonder if using _PAGE_SPECIAL would help (it is put
> on usermode ptes which don't have a backing struct page).  After all,
> there's no fundamental reason why it would need a pfn; the mfn in the
> pte is what's actually needed to ultimately generate a DMA descriptor.

The kernel needs the page structs at least for locking and refcounting.
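
For illustration, roughly the shape of that path, sketched here with
2.6.3x-era calls; pin_user_buf_into_bio() is a made-up helper, not
blktap code:

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/pagemap.h>
#include <linux/bio.h>
#include <linux/errno.h>

/* Pin a small user buffer and attach it to a bio. Everything here
 * revolves around struct page: gup() pins via get_page(), the block
 * layer addresses the data by page struct, and the references are
 * only dropped again on I/O completion (not shown). */
static int pin_user_buf_into_bio(struct bio *bio, unsigned long uaddr,
                                 int nr)
{
        struct page *pages[16];
        int i, got;

        if (nr > 16)
                return -EINVAL;

        down_read(&current->mm->mmap_sem);
        /* follow_page() has to resolve a struct page for every pte --
         * exactly the step that fails for granted frames without the
         * VM_FOREIGN hack. */
        got = get_user_pages(current, current->mm, uaddr & PAGE_MASK, nr,
                             1 /* write */, 0 /* force */, pages, NULL);
        up_read(&current->mm->mmap_sem);
        if (got < 0)
                return got;

        for (i = 0; i < got; i++) {
                if (!bio_add_page(bio, pages[i], PAGE_SIZE, 0)) {
                        /* bio full: drop the remaining gup() references */
                        while (i < got)
                                page_cache_release(pages[i++]);
                        break;
                }
        }
        return 0;
}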

There's also some trickier stuff in there. Like redirtying disk-backed
user memory after read completion, in case it's been laundered. (So that
data an AIO read just placed in unpinned user memory doesn't get thrown
away again when the page later cycles through swap, if I understood that
thing correctly.)
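
(If it helps: the redirty step boils down to something like the sketch
below, loosely modeled on what fs/direct-io.c does on read completion.
Simplified, not the actual dio code.)

#include <linux/mm.h>
#include <linux/pagemap.h>

/* Called once the device has finished DMA-ing fresh data into the
 * user pages of an AIO/dio read. */
static void redirty_read_pages(struct page **pages, int nr)
{
        int i;

        for (i = 0; i < nr; i++) {
                /* The page may be swap-backed and already laundered;
                 * mark it dirty again so the new contents won't be
                 * discarded on the next writeback/swap cycle. */
                set_page_dirty_lock(pages[i]);
                page_cache_release(pages[i]);   /* drop the gup() pin */
        }
}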

That doesn't apply to blktap (it's all reserved pages). All I mean is: I
wouldn't expect some innocent little dio hack or the like to take shape
in there.

The kernel allowing DMA into a bare pfnmap -- from the platform POV, I'd
agree. E.g. there's the concept of devices DMA-ing into arbitrary I/O
memory space, not host memory, on some bus architectures. PCI comes to
mind (the old shared-medium stuff; unsure about those newfangled
point-to-point topologies). But not in Linux, so I presently don't see
anybody upstream bothering to make block-I/O request addressing more
forgiving than it is.

_PAGE_SPECIAL -- to the kernel, that means the opposite: ptes which
aren't backed by page structs at all, so gup(), for example, is told to
fail (how nasty). In contrast, VM_FOREIGN is foreign (non-local) memory
which *is* backed by page structs.
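
For reference, the VM_FOREIGN shortcut is roughly the following --
reconstructed from memory of the arch-xen trees, so names and the flag
value here are assumptions, not a verbatim quote:

#include <linux/mm.h>

#ifndef VM_FOREIGN
#define VM_FOREIGN 0x04000000   /* assumed value */
#endif

/* gup()/follow_page() bypass: the driver which set up the mapping
 * stashes an array of struct page pointers in vm_private_data, one
 * entry per page, so no pte walk (and no m2p lookup) is needed. */
static struct page *foreign_page_lookup(struct vm_area_struct *vma,
                                        unsigned long addr)
{
        struct page **map;
        unsigned long idx;

        if (!(vma->vm_flags & VM_FOREIGN))
                return NULL;

        map = vma->vm_private_data;
        idx = (addr - vma->vm_start) >> PAGE_SHIFT;
        get_page(map[idx]);
        return map[idx];
}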

> > Correct me if I'm mistaken. I used to be quicker looking up stuff on
> > arch-xen kernels, but I think fundamental constants of the Xen universe
> > didn't change since last time.
> 
> No, but Linux has.

Not in that respect.

There's certainly a way to get VM_FOREIGN out of the mainline code. It
would involve an unlikely() branch in .pte_val (= xen_pte_val) to fall
back into a private local m2p hash lookup, assuming that kind of thing
doesn't end up inlined all over the place. Not nice, but still more
upstreamable than VM_FOREIGN.
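
Very roughly, and only as a sketch of the idea -- foreign_m2p_lookup()
is hypothetical, and the real xen_pte_val() goes through
pte_mfn_to_pfn():

#include <linux/kernel.h>
#include <asm/pgtable.h>
#include <asm/xen/page.h>

/* Hypothetical: a small, driver-maintained mfn -> pfn hash covering
 * the granted foreign frames currently mapped. */
extern unsigned long foreign_m2p_lookup(unsigned long mfn);

static pteval_t xen_pte_val_sketch(pte_t pte)
{
        pteval_t val = pte.pte;

        if (val & _PAGE_PRESENT) {
                unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
                unsigned long pfn = mfn_to_pfn(mfn);

                /* If the global m2p doesn't round-trip, the frame is
                 * not ours -- fall back to the private hash. */
                if (unlikely(pfn_to_mfn(pfn) != mfn))
                        pfn = foreign_m2p_lookup(mfn);

                val = (val & ~PTE_PFN_MASK) |
                      ((pteval_t)pfn << PAGE_SHIFT);
        }
        return val;
}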

> > [
> > Part of the reason why blktap *never* frees those pages, apart from
> > being slightly greedy, are deadlock hazards when writing those nodes in
> > dom0 through the pagecache, as dom0 might. You need memory pools on the
> > datapath to guarantee progress under pressure. That got pretty ugly
> > after 2.6.27, btw.
> > ]
> 
> That's what mempools are intended to solve.

That's why the blktap frame pool is now a mempool, indeed.
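
(For the archives, the reserve boils down to something like this;
FRAME_POOL_MIN is a made-up number:)

#include <linux/mempool.h>
#include <linux/gfp.h>

#define FRAME_POOL_MIN 64   /* made-up reserve size */

/* Order-0 page pool: the mempool guarantees that FRAME_POOL_MIN pages
 * can always be handed out, so the datapath keeps making progress
 * under memory pressure. Allocate with mempool_alloc(pool, GFP_NOIO),
 * return with mempool_free(). */
static mempool_t *create_frame_pool(void)
{
        return mempool_create_page_pool(FRAME_POOL_MIN, 0);
}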

> > In any case, let's skip trying what happens if a thundering herd of
> > several hundred userspace disks tries gfp()ing their grant slots out of
> > dom0 without arbitration.
> 
> I'm not against arbitration, but I don't think that's something that
> should be implemented as part of a Xen driver.

Uhm, maybe I'm misunderstanding you, but isn't the whole thing a Xen
driver? What do you suggest?

> >>> I guess we've been meaning the same thing here, unless I'm
> >>> misunderstanding you. Any pfn does, and the balloon pagevec allocations
> >>> default to order 0 entries indeed. Sorry, you're right, that's not a
> >>> 'range'. With a pending re-xmit, the backend can find a couple (or all)
> >>> of the request frames have count>1. It can flip and abandon those as
> >>> normal memory. But it will need those lost memory slots back, straight
> >>> away or next time it's running out of frames. As order-0 allocations.
> >> Right.  GFP_KERNEL order 0 allocations are pretty reliable; they only
> >> fail if the system is under extreme memory pressure.  And it has the
> >> nice property that if those allocations block or fail it rate limits IO
> >> ingress from domains rather than being crushed by memory pressure at the
> >> backend (ie, the problem with trying to allocate memory in the writeout
> >> path).
> >>
> >> Also the cgroup mechanism looks like an extremely powerful way to
> >> control the allocations for a process or group of processes to stop them
> >> from dominating the whole machine.
> > Ah. In case it can be put to work to bind processes allocating pagecache
> > entries for dirtying to some boundary, I'd be really interested. I think
> > I came across it once but didn't take the time to read the docs
> > thoroughly. Can it?
> 
> I'm not sure about dirtyness - it seems like something that should be
> within its remit, even if it doesn't currently have it.
> 
> The cgroup mechanism is extremely powerful, now that I look at it.  You
> can do everything from setting block IO priorities and QoS parameters to
> CPU limits.

Thanks. I'll keep it under my pillow then.
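
(For future reference, the basic handling appears to be plain writes
into the cgroup filesystem; the mount point and group name below are
assumptions on my side:)

#include <stdio.h>
#include <sys/types.h>

/* Sketch: confine one tapdisk process to a blkio cgroup and lower its
 * weight (valid range 100..1000, default 500). */
static int confine_tapdisk(pid_t pid)
{
        FILE *f;

        /* echo $pid > /cgroup/blkio/tapdisks/tasks */
        f = fopen("/cgroup/blkio/tapdisks/tasks", "w");
        if (!f)
                return -1;
        fprintf(f, "%d\n", (int)pid);
        fclose(f);

        /* echo 250 > /cgroup/blkio/tapdisks/blkio.weight */
        f = fopen("/cgroup/blkio/tapdisks/blkio.weight", "w");
        if (!f)
                return -1;
        fprintf(f, "250\n");
        fclose(f);

        return 0;
}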

Daniel





 

