[Xen-devel] Re: Re: mapping problems in xenpaging

To:	Tim Deegan <tim@xxxxxxx>
Subject:	[Xen-devel] Re: Re: mapping problems in xenpaging
From:	Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx>
Date:	Mon, 10 Oct 2011 15:17:47 -0400
Cc:	Olaf Hering <olaf@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, zhen shi <bickys1986@xxxxxxxxx>, Adin Scannell <adin@xxxxxxxxxxxxxx>
Delivery-date:	Tue, 11 Oct 2011 02:37:34 -0700
Dkim-signature:	v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=8nDcfM3JS2kFH+P1puoFgP+Z10rTTL4LKR3Rs36CVFI=; b=I3UU3x2cNV9PYoDWm43WuDIpXr8F2yyg3cwtalXrP7GHR9tTSIRRnYaPcjJ3XWYFxv OV6wUVLf5Oldusg1/dVe/QoZ9Qt4sT3A+h/Og78QTehXPx4qkf/AZHwF2NcC1RXQ50/c XYZ+tOswuYFsy4tNGM399zN3x+krWOWB07YL0=
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20111010092111.GB31800@xxxxxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<CADzFZPsJ2--PQCVH1QGJ-L1q5k4bokBrZeaaAt=BSD7QU9cbOA@xxxxxxxxxxxxxx> <20111010092111.GB31800@xxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

Hi Tim,

On Mon, Oct 10, 2011 at 5:21 AM, Tim Deegan <tim@xxxxxxx> wrote:
> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>> I have a proposal. I'd like to hear from the list what they think.
>>
>> - 1. change p2m lock to a read/write lock
>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
>> callers of p2m_lock will become write lockers.
>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
>> while holding the read lock.
>> - 4. Have all lookup callers put_page on the obtained mfn, once done.
>
> This seems like a step in the right direction, but if we're going to
> make this big an interface change there might be better interfaces to
> end up with.
Agree, and hopefully we can converge towards something awesome.

> A few issues I can see with it:
>  - p2m lookups are on some very performance-sensitive paths
>   (e.g. multiple times in any pagetable walk or instruction emulation
>   in a HVM guest) so adding the rwlock might have a noticeable impact.
>
>  - This fixes one class of races (page gets freed-to-xen underfoot) but
>   leaves another one (gfn -> mfn map changes underfoot) untouched.  In
>   particular it doesn't solve the race where a foreign mapper
>   gets a r/w map of what's meant to be a read-only frame.

Can you elaborate a bit? Under what situations does the gfn->mfn map
change underfoot (other than sharing, paging in or superpage
sharding)? Wouldn't those two be taking a writer lock, and thus be
mutually ecxluded from lookups?

Also, the second problem you mention (foregin rw map of ro page) seems
to be a tad different. That fix should go into get_page_from_l1e,
right? Isn't qemu allowed to do this all the time?

> I think that to fix things properly we need to have some refcount
> associated with the p2m mapping itself.  That would be taken by all
> lookups (or at least some - we could have a flag to the p2m lookup) and
> released as you suggest, but more importantly it would block all p2m changes
> while the count was raised.  (I think that a least in the common case we
> could encode such a refcount using the existing typecount).

Assuming mapping means "entry in p2m", multiple mappings would have
their ref count collapse in the page typecount. Isn't that a problem?
Do we need per-mapping refcounts, or rather, per mapping mutual
exclusion? My feel is that page refcounts are necessary to prevent the
page from disappearing, and mappings need to have their lookups and
modifications synchronized.

> One problem then is how to make all the callers of the p2m update
> functions handle failure, either by waiting (new deadlock risk?) or
> returning EAGAIN at the hypercall interface.  Paths where the update
> isn't caused by an explicit request (like log-dirty and the mem-event
> rx-to-rw conversion) would be particularly tricky.

Callers already wait on lock_p2m. They'll wait longer :)

On failure, to cite a specific example, if paging was trying to swap
something out that got foreign-mapped by somebody else, then yeah,
there's no other option than failing that call.

How would log-dirty and x^w fail (if the refcount increases before
they get exclusive access to the mapping)? They're not trying to
change the mapping and/or make a page go away, rather they're changing
the p2m permission.

> More seriously, it introduces another round of the sort of priority
> inversion we already get with the existing refcounts - a foreign
> mapping, caused by a user-space program in another VM, could arbitrarily
> delay a p2m update (and so prevent a VCPU from making progress), without
> any mechanism to even request that the mapping be removed.

Yeah, that's tricky. I do not know if there is a fix at all.
Fundamentally, the foreign mapper (some dom0 sharing/paging/foo
utility) is completely async to the domain. Even if we were to revoke
the foreign mapping as you suggest below, that would involve an upcall
into the foreign-mapper-guest-OS to have it cleanly neuter the mapping
and drop the refcount. That's not at all trivial! Perhaps foreign
mapping vma's should be taught to patiently re-establish mappings if
they disappear under their feet? Event then, you need to keep track of
those foreign l1e's, and nothing short of a list will do.

Because this is performance rather than correctness I'm inclined to
not poke the beast.

> Any ideas how to avoid that?  Potentially with some extra bookkeeping on
> foreign mappings we could revoke or redirect them when the p2m changes.
> That would fit nicely with the abstraction in the interfaces where HVM
> domains' memory is always indexed by pfn.  I can imagine it being quite
> tricky though.
>
>> I'm more wary that turning p2m locking into read/write will result in
>> code deadlocking itself: taking a read lock first and a write lock
>> later. Possibly the current rwlock implementation could be improved to
>> keep a cpumask of read-lockers, and provide an atomic "promote from
>> read to write" atomic operation (something along the lines of wait
>> until you're the only reader in the cpumask, and then cmpxchg(lock,
>> -1, WRITE_BIAS))
>
> I think that would deadlock if two cpus tried it at once.

If you keep a cpumask of all read lockers, and only try to promote if
you're the only read locker, it wouldn't. But then you'd have to
protect the cpumask from races with another spinlock. Yuk.

Which brings me to another question: p2m_locked_by_me (and others)
check on the physical cpu (lock->holder == curent->processor,
roughly). Well, what if you lock and then the vcpu migrates? Are vcpu
migrations prevented if you hold any locks? Or is there some other
magic going on?

Thanks!
Andres
>
> Tim.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

WARNING - OLD ARCHIVES

xen-devel

[Xen-devel] Re: Re: mapping problems in xenpaging