Xen project Mailing List

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

From: Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx>

Date: Mon, 10 Oct 2011 15:31:06 -0400

Cc: Olaf Hering <olaf@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Tim Deegan <tim@xxxxxxx>, zhen shi <bickys1986@xxxxxxxxx>, Adin Scannell <adin@xxxxxxxxxxxxxx>

Delivery-date: Tue, 11 Oct 2011 02:38:24 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Keir, On Mon, Oct 10, 2011 at 6:06 AM, Keir Fraser <keir.xen@xxxxxxxxx> wrote: > On 10/10/2011 10:21, "Tim Deegan" <tim@xxxxxxx> wrote: > >> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote: >>> I have a proposal. I'd like to hear from the list what they think. >>> >>> - 1. change p2m lock to a read/write lock >>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current >>> callers of p2m_lock will become write lockers. >>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained, >>> while holding the read lock. >>> - 4. Have all lookup callers put_page on the obtained mfn, once done. >> >> This seems like a step in the right direction, but if we're going to >> make this big an interface change there might be better interfaces to >> end up with. >> >> A few issues I can see with it: >> - p2m lookups are on some very performance-sensitive paths >> (e.g. multiple times in any pagetable walk or instruction emulation >> in a HVM guest) so adding the rwlock might have a noticeable impact. > > If the read sections are short, may as well use a plain spinlock. > > The best (but hard) way to make the locking cheaper is to work out a way to > use finer-grained locks (e.g., per-page / per-mapping) or avoid locks > altogether (e.g., RCU). No clue about RCU. But the p2m tree structure lends itself naturally to fine-grained locking. In fact, hierarchical locking given 2M and 1G superpages. Now, this moves all the locking into the specific p2m implementations, ept and traditional pt. Do you think a test_and_set-style spinlock could fit in the unused bits of a p2m entry. It would have scarce debug information :) I don't know if ept would freak out with someone spinning on an entry it has loaded in the translation hardware. Probably. So, perhaps the most decent idea is to have a tree/array of locks on the side. This would not have to live inside the ept/pt implementation-specific layer. Although locking unaligned, arbitrarily-sized ranges of pages (Does anyone do that? PoD?) would become a big headache. > > Multi-reader locks are rarely going to be a good choice in the hypervisor. > > A good first step anyhow would be to make the p2m_ synchronisation correct, > and then optimise it. Sounds like that is hard enough. :-) Pigybacking another question: ultimately, if we get p2m sync correct, paging can introduce arbitrary waits. Currently the code bails out, rather ungracefully (e.g. hvm_copy). Is this what wait queues were introduced for? Hasn't that been implemented purely out of lack of cycles, or something more fundamental awaits? Thanks! Andres > > -- Keir > >> - This fixes one class of races (page gets freed-to-xen underfoot) but >> leaves another one (gfn -> mfn map changes underfoot) untouched. In >> particular it doesn't solve the race where a foreign mapper >> gets a r/w map of what's meant to be a read-only frame. >> >> I think that to fix things properly we need to have some refcount >> associated with the p2m mapping itself. That would be taken by all >> lookups (or at least some - we could have a flag to the p2m lookup) and >> released as you suggest, but more importantly it would block all p2m changes >> while the count was raised. (I think that a least in the common case we >> could encode such a refcount using the existing typecount). >> >> One problem then is how to make all the callers of the p2m update >> functions handle failure, either by waiting (new deadlock risk?) or >> returning EAGAIN at the hypercall interface. Paths where the update >> isn't caused by an explicit request (like log-dirty and the mem-event >> rx-to-rw conversion) would be particularly tricky. >> >> More seriously, it introduces another round of the sort of priority >> inversion we already get with the existing refcounts - a foreign >> mapping, caused by a user-space program in another VM, could arbitrarily >> delay a p2m update (and so prevent a VCPU from making progress), without >> any mechanism to even request that the mapping be removed. >> >> Any ideas how to avoid that? Potentially with some extra bookkeeping on >> foreign mappings we could revoke or redirect them when the p2m changes. >> That would fit nicely with the abstraction in the interfaces where HVM >> domains' memory is always indexed by pfn. I can imagine it being quite >> tricky though. >> >>> I'm more wary that turning p2m locking into read/write will result in >>> code deadlocking itself: taking a read lock first and a write lock >>> later. Possibly the current rwlock implementation could be improved to >>> keep a cpumask of read-lockers, and provide an atomic "promote from >>> read to write" atomic operation (something along the lines of wait >>> until you're the only reader in the cpumask, and then cmpxchg(lock, >>> -1, WRITE_BIAS)) >> >> I think that would deadlock if two cpus tried it at once. >> >> Tim. >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@xxxxxxxxxxxxxxxxxxx >> http://lists.xensource.com/xen-devel > > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.