Xen project Mailing List

Re: [Xen-devel] [PATCH 5 of 9] Fine-grained concurrency control structure for the p2m

To: "Tim Deegan" <tim@xxxxxxx>

Date: Wed, 2 Nov 2011 07:20:09 -0700

Cc: olaf@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, andres@xxxxxxxxxxxxxx, keir.xen@xxxxxxxxx, Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>, adin@xxxxxxxxxxxxxx

Delivery-date: Wed, 02 Nov 2011 07:21:07 -0700

Domainkey-signature: a=rsa-sha1; c=nofws; d=lagarcavilla.com; h=message-id :in-reply-to:references:date:subject:from:to:cc:mime-version :content-type:content-transfer-encoding; q=dns; s= lagarcavilla.com; b=mLgX8Nt2PiFGWGrps9eqWDfZrrE3gr4rxfzSdiSIy2Dv BgQRzGIEWf5JMMJc8MseJk8/cAso0ov5olrxP4HMd/3kWx176v9YURs6pmkcszIb l+0CsA37TUqMYuQdEVZewFZ2YKKp/K3P6zj7bB9ekSNQZAU0tp0vUHDWYRORycs=

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hey there, (many inlines on this one) > At 00:33 -0400 on 27 Oct (1319675630), Andres Lagar-Cavilla wrote: >> Introduce a fine-grained concurrency control structure for the p2m. This >> allows for locking 2M-aligned chunks of the p2m at a time, exclusively. >> Recursive locking is allowed. Global locking of the whole p2m is also >> allowed for certain operations. Simple deadlock detection heuristics are >> put in place. >> >> Note the patch creates backwards-compatible shortcuts that will lock the >> p2m globally. So it should remain functionally identical to what is >> currently >> in place. >> >> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx> > > Wow. What a lot of code. :) I took a look through, but I can't > guarantee to have got all the details. Things I saw: > > - You use atomic_t for the count but only ever update it under a > lock. :) If you just need to be sure of atomic writes, then > atomic_set will do that without using a locked increment/decrement. I'm a bit flaky on my atomics. And paranoid. I'll be less lenient next time. > > - You allocate the bitmaps from xenheap - they should really be using > p2m memory, so as to avoid changing the memory overhead of the domain > as it runs. That will involve map_domain_page()ing the bitmaps as > you go, but at least on x86_64 that's very cheap. p2m_alloc_ptp? Sure. > > - panic() on out-of-memory is pretty rude. > Yeah, but unwinding all possible lock callers to handle ENOMEM was over my threshold. Reality is that on your run-of-the-mill 4GB domain you have 4 or 5 single page allocations. You have bigger problems if that fails. > But stepping back, I'm not sure that we need all this just yet. I think > it would be worth doing the interface changes with a single p2m lock and > measuring how bad it is before getting stuck in to fine-grained locking > (fun though it might be). Completely agree. I think this will also ease adoption and bug isolation. It'll allow me to be more gradual. I'll rework the order. Thanks, very good. > > I suspect that if this is a contention point, allowing multiple readers > will become important, especially if there are particular pages that > often get emulated access. > > And also, I'd like to get some sort of plan for handling long-lived > foreign mappings, if only to make sure that this phase-1 fix doesn't > conflict wih it. > If foreign mappings will hold a lock/ref on a p2m subrange, then they'll disallow global operations, and you'll get a clash between log-dirty and, say, qemu. Ka-blam live migration. Read-only foreign mappings are only problematic insofar paging happens. With proper p2m update/lookups serialization (global or fine-grained) that problem is gone. Write-able foreign mappings are trickier because of sharing and w^x. Is there a reason left, today, to not type PGT_writable an hvm-domain's page when a foreign mapping happens? That would solve sharing problems. w^x really can't be solved short of putting the vcpu on a waitqueue (preferable to me), or destroying the mapping and forcing the foreign OS to remap later. All a few steps ahead, I hope. Who/what's using w^x by the way? If the refcount is zero, I think I know what I'll do ;) That is my current "long term plan". > Oh, one more thing: > >> +/* Some deadlock book-keeping. Say CPU A holds a lock on range A, CPU B >> holds a >> + * lock on range B. Now, CPU A wants to lock range B and vice-versa. >> Deadlock. >> + * We detect this by remembering the start of the current locked range. >> + * We keep a fairly small stack of guards (8), because we don't >> anticipate >> + * a great deal of recursive locking because (a) recursive locking is >> rare >> + * (b) it is evil (c) only PoD seems to do it (is PoD therefore evil?) >> */ > > If PoD could ba adjusted not to do it, could we get rid of all the > recursive locking entirely? That would simplify things a lot. > My comment is an exaggeration. In a fine-grained scenario, recursive locking happens massively throughout the code. We just need to live with it. I was ranting for free on the "evil" adjective. What is a real problem is that pod sweeps can cause deadlocks. There is a simple step to mitigate this: start the sweep from the current gfn and never wrap around -- too bad if the gfn is too high. But this alters the sweeping algorithm. I'll deal with it when its it's turn. Andres > Tim. > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.