WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

To: Keir Fraser <keir.xen@xxxxxxxxx>
Subject: Re: [Xen-devel] Re: Re: mapping problems in xenpaging
From: Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx>
Date: Mon, 10 Oct 2011 15:31:06 -0400
Cc: Olaf Hering <olaf@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Tim Deegan <tim@xxxxxxx>, zhen shi <bickys1986@xxxxxxxxx>, Adin Scannell <adin@xxxxxxxxxxxxxx>
In-reply-to: <CAB8821A.2291B%keir.xen@xxxxxxxxx>
References: <20111010092111.GB31800@xxxxxxxxxxxxxxxxxxxxx> <CAB8821A.2291B%keir.xen@xxxxxxxxx>
Keir,

On Mon, Oct 10, 2011 at 6:06 AM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> On 10/10/2011 10:21, "Tim Deegan" <tim@xxxxxxx> wrote:
>
>> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>>> I have a proposal. I'd like to hear from the list what they think.
>>>
>>> - 1. change p2m lock to a read/write lock
>>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
>>> callers of p2m_lock will become write lockers.
>>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
>>> while holding the read lock.
>>> - 4. Have all lookup callers put_page on the obtained mfn, once done.
>>
>> This seems like a step in the right direction, but if we're going to
>> make this big an interface change there might be better interfaces to
>> end up with.
>>
>> A few issues I can see with it:
>>  - p2m lookups are on some very performance-sensitive paths
>>    (e.g. multiple times in any pagetable walk or instruction emulation
>>    in a HVM guest) so adding the rwlock might have a noticeable impact.
>
> If the read sections are short, may as well use a plain spinlock.
>
> The best (but hard) way to make the locking cheaper is to work out a way to
> use finer-grained locks (e.g., per-page / per-mapping) or avoid locks
> altogether (e.g., RCU).
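To make the four steps above concrete, every lookup/use pair would end
up looking roughly like the sketch below. The read-lock wrappers are
hypothetical, and the lookup/refcount calls only approximate the
existing gfn_to_mfn_* and get_page interfaces, so take the names as
placeholders rather than actual tree contents.

#include <xen/mm.h>
#include <asm/p2m.h>

/* Sketch only: p2m_read_lock()/p2m_read_unlock() do not exist yet, and
 * the other helpers approximate the current interfaces. */
static mfn_t get_gfn_and_ref(struct domain *d, unsigned long gfn,
                             p2m_type_t *t)
{
    struct p2m_domain *p2m = p2m_get_hostp2m(d);
    mfn_t mfn;

    p2m_read_lock(p2m);                  /* step 2: lookups become readers */
    mfn = gfn_to_mfn(d, gfn, t);         /* the usual lookup, now locked   */
    if ( mfn_valid(mfn) &&
         !get_page(mfn_to_page(mfn), d) )
        mfn = _mfn(INVALID_MFN);         /* step 3: ref taken under lock   */
    p2m_read_unlock(p2m);

    return mfn;                          /* step 4: caller put_page()s     */
}

Every existing caller then has to grow the matching put_page(), which
is what makes this such a big interface change.
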
No clue about RCU. But the p2m tree structure lends itself naturally
to fine-grained locking; in fact, to hierarchical locking, given 2M
and 1G superpages.

Now, this moves all the locking into the specific p2m implementations,
ept and traditional pt. Do you think a test_and_set-style spinlock
could fit in the unused bits of a p2m entry? It would have scarce
debug information :) I don't know whether ept would freak out if
someone spins on an entry it has loaded into the translation hardware.
Probably.
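Roughly what I have in mind, as a sketch only; bit 62 is just a
stand-in for whichever software-available bit both entry formats
actually leave free, and there is deliberately no owner/debug state:

#include <asm/bitops.h>
#include <asm/processor.h>

/* A test_and_set lock living in a software-available bit of the p2m
 * entry itself.  The bit flips are visible to the hardware walker,
 * which is exactly the worry above. */
#define P2M_ENTRY_LOCK_BIT 62   /* placeholder bit number */

static inline void p2m_entry_lock(volatile unsigned long *entry)
{
    while ( test_and_set_bit(P2M_ENTRY_LOCK_BIT, entry) )
        cpu_relax();
}

static inline void p2m_entry_unlock(volatile unsigned long *entry)
{
    clear_bit(P2M_ENTRY_LOCK_BIT, entry);
}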

So, perhaps the most decent idea is to have a tree/array of locks on
the side. This would not have to live inside the ept/pt
implementation-specific layer. Locking unaligned, arbitrarily-sized
ranges of pages (does anyone do that? PoD?) would become a big
headache, though.
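The simplest on-the-side variant is probably a fixed table of
spinlocks hashed by gfn, say one lock per 2M-aligned region. The table
size and granularity below are arbitrary, purely to illustrate:

#include <xen/spinlock.h>

#define NR_P2M_RANGE_LOCKS 256   /* illustrative size */

static spinlock_t p2m_range_locks[NR_P2M_RANGE_LOCKS];

static void p2m_range_locks_init(void)
{
    unsigned int i;

    for ( i = 0; i < NR_P2M_RANGE_LOCKS; i++ )
        spin_lock_init(&p2m_range_locks[i]);
}

/* One lock per 2M-aligned gfn range, hashed into the fixed table. */
static spinlock_t *p2m_range_lock(unsigned long gfn)
{
    return &p2m_range_locks[(gfn >> 9) % NR_P2M_RANGE_LOCKS];
}

A range operation would then have to take every covering lock in
ascending order to stay deadlock-free, which is where the
unaligned-range headache comes in.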

>
> Multi-reader locks are rarely going to be a good choice in the hypervisor.
>
> A good first step anyhow would be to make the p2m_ synchronisation correct,
> and then optimise it. Sounds like that is hard enough. :-)

Piggybacking another question: ultimately, if we get p2m sync correct,
paging can introduce arbitrary waits. Currently the code bails out,
rather ungracefully (e.g. hvm_copy). Is this what wait queues were
introduced for? Has that not been implemented purely for lack of
cycles, or does something more fundamental stand in the way?

Thanks!
Andres
>
>  -- Keir
>
>>  - This fixes one class of races (page gets freed-to-xen underfoot) but
>>    leaves another one (gfn -> mfn map changes underfoot) untouched.  In
>>    particular it doesn't solve the race where a foreign mapper
>>    gets a r/w map of what's meant to be a read-only frame.
>>
>> I think that to fix things properly we need to have some refcount
>> associated with the p2m mapping itself.  That would be taken by all
>> lookups (or at least some - we could have a flag to the p2m lookup) and
>> released as you suggest, but more importantly it would block all p2m changes
>> while the count was raised.  (I think that at least in the common case we
>> could encode such a refcount using the existing typecount).
>>
>> One problem then is how to make all the callers of the p2m update
>> functions handle failure, either by waiting (new deadlock risk?) or
>> returning EAGAIN at the hypercall interface.  Paths where the update
>> isn't caused by an explicit request (like log-dirty and the mem-event
>> rx-to-rw conversion) would be particularly tricky.
>>
>> More seriously, it introduces another round of the sort of priority
>> inversion we already get with the existing refcounts - a foreign
>> mapping, caused by a user-space program in another VM, could arbitrarily
>> delay a p2m update (and so prevent a VCPU from making progress), without
>> any mechanism to even request that the mapping be removed.
>>
>> Any ideas how to avoid that?  Potentially with some extra bookkeeping on
>> foreign mappings we could revoke or redirect them when the p2m changes.
>> That would fit nicely with the abstraction in the interfaces where HVM
>> domains' memory is always indexed by pfn.  I can imagine it being quite
>> tricky though.
>>
>>> I'm more wary that turning p2m locking into read/write will result in
>>> code deadlocking itself: taking a read lock first and a write lock
>>> later. Possibly the current rwlock implementation could be improved to
>>> keep a cpumask of read-lockers, and provide an atomic "promote from
>> read to write" operation (something along the lines of wait
>>> until you're the only reader in the cpumask, and then cmpxchg(lock,
>>> -1, WRITE_BIAS))
>>
>> I think that would deadlock if two cpus tried it at once.
>>
>> Tim.
>>
>
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel