[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 5 of 9] Fine-grained concurrency control structure for the p2m



Hey there,
(many inlines on this one)

> At 00:33 -0400 on 27 Oct (1319675630), Andres Lagar-Cavilla wrote:
>> Introduce a fine-grained concurrency control structure for the p2m. This
>> allows for locking 2M-aligned chunks of the p2m at a time, exclusively.
>> Recursive locking is allowed. Global locking of the whole p2m is also
>> allowed for certain operations. Simple deadlock detection heuristics are
>> put in place.
>>
>> Note the patch creates backwards-compatible shortcuts that will lock the
>> p2m globally. So it should remain functionally identical to what is
>> currently
>> in place.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>
> Wow.  What a lot of code. :)  I took a look through, but I can't
> guarantee to have got all the details.  Things I saw:
>
> - You use atomic_t for the count but only ever update it under a
>   lock. :)  If you just need to be sure of atomic writes, then
>   atomic_set will do that without using a locked increment/decrement.

I'm a bit flaky on my atomics. And paranoid. I'll be less lenient next time.

>
> - You allocate the bitmaps from xenheap - they should really be using
>   p2m memory, so as to avoid changing the memory overhead of the domain
>   as it runs.   That will involve map_domain_page()ing the bitmaps as
>   you go, but at least on x86_64 that's very cheap.

p2m_alloc_ptp? Sure.

>
> - panic() on out-of-memory is pretty rude.
>
Yeah, but unwinding all possible lock callers to handle ENOMEM was over my
threshold. Reality is that on your run-of-the-mill 4GB domain you have 4
or 5 single page allocations. You have bigger problems if that fails.

> But stepping back, I'm not sure that we need all this just yet.  I think
> it would be worth doing the interface changes with a single p2m lock and
> measuring how bad it is before getting stuck in to fine-grained locking
> (fun though it might be).

Completely agree. I think this will also ease adoption and bug isolation.
It'll allow me to be more gradual. I'll rework the order. Thanks, very
good.

>
> I suspect that if this is a contention point, allowing multiple readers
> will become important, especially if there are particular pages that
> often get emulated access.
>
> And also, I'd  like to get some sort of plan for handling long-lived
> foreign mappings, if only to make sure that this phase-1 fix doesn't
> conflict wih it.
>

If foreign mappings will hold a lock/ref on a p2m subrange, then they'll
disallow global operations, and you'll get a clash between log-dirty and,
say, qemu. Ka-blam live migration.

Read-only foreign mappings are only problematic insofar paging happens.
With proper p2m update/lookups serialization (global or fine-grained) that
problem is gone.

Write-able foreign mappings are trickier because of sharing and w^x. Is
there a reason left, today, to not type PGT_writable an hvm-domain's page
when a foreign mapping happens? That would solve sharing problems. w^x
really can't be solved short of putting the vcpu on a waitqueue
(preferable to me), or destroying the mapping and forcing the foreign OS
to remap later. All a few steps ahead, I hope.

Who/what's using w^x by the way? If the refcount is zero, I think I know
what I'll do ;)

That is my current "long term plan".

> Oh, one more thing:
>
>> +/* Some deadlock book-keeping. Say CPU A holds a lock on range A, CPU B
>> holds a
>> + * lock on range B. Now, CPU A wants to lock range B and vice-versa.
>> Deadlock.
>> + * We detect this by remembering the start of the current locked range.
>> + * We keep a fairly small stack of guards (8), because we don't
>> anticipate
>> + * a great deal of recursive locking because (a) recursive locking is
>> rare
>> + * (b) it is evil (c) only PoD seems to do it (is PoD therefore evil?)
>> */
>
> If PoD could ba adjusted not to do it, could we get rid of all the
> recursive locking entirely?  That would simplify things a lot.
>

My comment is an exaggeration. In a fine-grained scenario, recursive
locking happens massively throughout the code. We just need to live with
it. I was ranting for free on the "evil" adjective.

What is a real problem is that pod sweeps can cause deadlocks. There is a
simple step to mitigate this: start the sweep from the current gfn and
never wrap around -- too bad if the gfn is too high. But this alters the
sweeping algorithm. I'll deal with it when its it's turn.

Andres
> Tim.
>
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.