xen-devel

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

To: Tim Deegan <tim@xxxxxxx>
Subject: Re: [Xen-devel] Re: Re: mapping problems in xenpaging
From: Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx>
Date: Thu, 20 Oct 2011 10:54:05 -0400
Cc: zhen shi <bickys1986@xxxxxxxxx>, Olaf Hering <olaf@xxxxxxxxx>, Keir Fraser <keir.xen@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Adin Scannell <adin@xxxxxxxxxxxxxx>
Delivery-date: Tue, 25 Oct 2011 09:36:46 -0700
In-reply-to: <20111020101324.GD49983@xxxxxxxxxxxxxxxxxxxxx>
References: <CADzFZPsJ2--PQCVH1QGJ-L1q5k4bokBrZeaaAt=BSD7QU9cbOA@xxxxxxxxxxxxxx> <20111010092111.GB31800@xxxxxxxxxxxxxxxxxxxxx> <CADzFZPscgJ6TWXNjyeEjbOnMrEpCE7zS6BTGxfJ+yibpz-Ds_w@xxxxxxxxxxxxxx> <20111013150229.GJ44937@xxxxxxxxxxxxxxxxxxxxx> <CADzFZPusJN7e2jG147q68RA3ZQ1PGs-xFNk+jjLRDAR7xboVjQ@xxxxxxxxxxxxxx> <20111020101324.GD49983@xxxxxxxxxxxxxxxxxxxxx>
On Thu, Oct 20, 2011 at 6:13 AM, Tim Deegan <tim@xxxxxxx> wrote:
> At 13:59 -0400 on 13 Oct (1318514390), Andres Lagar Cavilla wrote:
>> Good stuff Tim, let me summarize:
>>
>>
>> - The key is to obtain exclusive access to a p2m entry, or range [gfn,
>> gfn + (1<<order)). This exclusive access lasts beyond the actual lookup,
>> until the caller is finished with modifications, to prevent the p2m
>> mapping from changing underfoot.
>
> Yes.  It only excludes concurrent updates, not concurrent lookups, so in
> that way it's effectively a per-range MRSW lock, implemented with
> refcounts.  (I feel like I'm coming around in a circle to your first
> suggestion!)

That's great, because ... I'm coming around the circle 180 degrees opposite :)

I think it's important to untangle page liveness from mutual exclusion
on the mapping. That's why I favor Keir's approach of just locking the
thing, no MRSW. Reasons:
1. A very common idiom throughout the code is get_entry -> set_entry.
How do we do that atomically under an MRSW scheme? (With a plain lock
it's trivial; see the toy sketch after this list.)
2. You're concerned about foreign mappings, rightly so. With mutex
access to the p2m mapping, we can ensure the page refcount is raised
atomically in the context of building the foreign mapping. This will
keep the page alive and unable to be swapped/shared/whatever. We only
lock the p2m entry while building the foreign mapping.
3. Recursive locking for different purposes is just easier without
refcounts (a generalization of reason 1).
4. Note that in your qemu/x86_emulate example, qemu's mapping does not
prevent x86_emulate from making progress, as qemu will have
relinquished the locks once it has finished building the foreign mapping.
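
To make reason 1 concrete, here is a toy sketch (plain C with
pthreads, illustrative only; nothing below is the real p2m
interface):

/* Toy model of the get_entry -> set_entry idiom (reason 1): under a
 * plain exclusive lock the whole read-modify-write is trivially
 * atomic, where an MRSW scheme would need a read->write lock
 * upgrade. */
#include <pthread.h>

struct toy_entry { unsigned long mfn; int type; };

struct toy_p2m {
    pthread_mutex_t lock;        /* stand-in for a per-range lock */
    struct toy_entry e[512];     /* toy flat table */
};

static void change_entry_type(struct toy_p2m *p2m, unsigned long gfn,
                              int new_type)
{
    pthread_mutex_lock(&p2m->lock);
    struct toy_entry ent = p2m->e[gfn];  /* get_entry */
    ent.type = new_type;
    p2m->e[gfn] = ent;                   /* set_entry: same mfn, new type */
    pthread_mutex_unlock(&p2m->lock);
}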

I have a draft implementation of a "tree" of exclusive locks. It's
still a bit embarrassing to share :)
The API is more or less
get_p2m(p2m, gfn, order)
<do work>
put_p2m(p2m, gfn, order)
with recursive get allowed, (unsophisticated) deadlock detection, and
shortcuts for an individual gfn and for global locking (e.g. for
log_dirty). Give me a couple of days for an RFC post.
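
To sketch the intended usage (get_p2m/put_p2m as above; the _gfn and
_global variants stand for the shortcuts I mentioned, names not
final):

/* Exclusive access to [gfn, gfn + (1 << order)) around a batch of
 * updates; recursive get on a sub-range is allowed. */
get_p2m(p2m, gfn, order);
/* ... look up and modify entries in the range ... */
put_p2m(p2m, gfn, order);

/* Shortcuts: a single gfn, and the whole p2m (e.g. for log_dirty) */
get_p2m_gfn(p2m, gfn);
put_p2m_gfn(p2m, gfn);
get_p2m_global(p2m);
put_p2m_global(p2m);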

Thanks
Andres

>
>> - bits for either fine-grained locks or refcounts need to be set aside.
>> Stuffing those bits into the actual p2m entries will be very error-prone
>> or impossible, given all the existing implementations (NPT+IOMMU, 32bit,
>> etc). So, we're stuck with extra space overhead for a fine-grained p2m
>> concurrency control structure.
>
> Yes.
>
>> - Unless the refcount collapses into the page_info struct. Even then
>> there is a critical section "get p2m_entry then get_page" that needs
>> to execute atomically.
>
> True, and since you only get the page struct after the p2m lookup that's
> tricky.
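
For concreteness, that critical section has roughly this shape (a toy
fragment; lock_entry, lookup, mfn_to_page and get_page stand in for
the real primitives):

/* "get p2m_entry then get_page" under the entry lock: without the
 * lock, the entry could be paged out, shared or remapped between the
 * lookup and taking the page reference. */
lock_entry(p2m, gfn);
mfn  = lookup(p2m, gfn);     /* read the p2m entry */
page = mfn_to_page(mfn);
get_page(page, d);           /* refcount keeps the page alive */
unlock_entry(p2m, gfn);      /* the mapping may change again, but the
                              * page itself can no longer disappear */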
>
>> - foreign mappings can block p2m actions for arbitrarily long. This
>> doesn't usually happen, but the risk is latent. This is "hard to
>> solve", for now.
>
> Yes.
>
>> question 1: I still don't see the need for refcounts. If you want to
>> prevent changes underfoot, you need to lock the entry, and that's it.
>> In all the cases you explained, somebody would have to wait until the
>> refcount on the entry drops to reflect that they are the only holder.
>> This is akin to being locked out.
>
> It should be possible for multiple clients to look up and use the same
> p2m entry (e.g. Qemu having a mapping of a guest frame shouldn't stop
> x86_emulate from reading or writing that memory, though both of those
> should stop any concurrent p2m update to the gfn).
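
A toy model of that MRSW behaviour, for concreteness (plain C with
pthreads; lookups share the range, updates exclude everyone):

#include <pthread.h>

struct range_lock {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    unsigned int    readers;   /* outstanding lookups/mappings */
};

static void lookup_get(struct range_lock *r)
{
    pthread_mutex_lock(&r->mu);
    r->readers++;              /* lookups never exclude each other */
    pthread_mutex_unlock(&r->mu);
}

static void lookup_put(struct range_lock *r)
{
    pthread_mutex_lock(&r->mu);
    if (--r->readers == 0)
        pthread_cond_signal(&r->cv);
    pthread_mutex_unlock(&r->mu);
}

static void update_begin(struct range_lock *r)
{
    pthread_mutex_lock(&r->mu);
    while (r->readers > 0)     /* an update waits out all lookups... */
        pthread_cond_wait(&r->cv, &r->mu);
    /* ...then keeps the mutex, excluding lookups and other updates */
}

static void update_end(struct range_lock *r)
{
    pthread_mutex_unlock(&r->mu);
}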
>
>> question 2: although internal hypervisor code paths do not seem to act
>> on unaligned p2m ranges, external calls (e.g. MEMF_populate_on_demand)
>> could possibly pass unaligned ranges. These complicate fine-grained
>> concurrency. Should we fail those? With so many toolstacks out there,
>> I feel very hesitant.
>
> Hmm.  Most operations that touch large numbers of frames already have a
> partial-success return path (or at least stop-and-retry) to avoid
> long-running operations starving timers, softirqs etc.  If there are
> paths that don't do this, maybe they should. :)
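
The usual shape of that partial-success path, as a toy (Xen's actual
preemption/continuation machinery is more involved):

#include <errno.h>

/* Process a bounded batch, then hand back a resume point instead of
 * starving timers and softirqs with one long-running operation. */
static int op_on_range(void *p2m, unsigned long start, unsigned long end,
                       unsigned long *resume)
{
    unsigned long gfn;

    (void)p2m;                   /* toy: unused */
    for (gfn = start; gfn < end; gfn++) {
        /* ... per-gfn work goes here ... */
        if ((gfn - start) >= 256 && gfn + 1 < end) {
            *resume = gfn + 1;   /* partial success: caller retries here */
            return -EAGAIN;
        }
    }
    *resume = end;
    return 0;
}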
>
>> question 3: is there any way to know a priori the max gfn a domain
>> will have? Can we pre-allocate the concurrency control structure as
>> opposed to demand allocating it?
>
> Not any sensible maximum, no, and gfn space can be sparse so it might
> not make sense to allocate it all up front anyway.  But the p2m
> structures themselves are allocated on demand so the extra bookkeeping
> space can run alongside them.
>
>> suggestion 1: bake exclusive access into the current calls. A p2m
>> lookup, followed by a p2m set_entry, delimits a critical section for
>> that range of p2m mappings. p2m lookups without a closing set_entry will
>> have to issue a call to drop exclusive access on the range of
>> mappings.
>
> As I said above, it shouldn't be exclusive with other _lookups_, only
> with updates.  But I have no objection to adding a flag to the lookup
> function that lets the caller choose "lock for update" vs "lock for
> lookup".
>
>> suggestion 2: limit the fine granularity (if locking, not refcounting)
>> to 2MB superpages. Saves space. 512 neighbours can surely coexist
>> without locking each other out :)
>
> Sure; if that turns out to cause a lot of contention it can be changed
> later.
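
For reference, the space saving falls straight out of the indexing
(illustrative only):

/* One lock per 2MB superpage: 512 contiguous 4kB gfns share a lock,
 * shrinking the bookkeeping array by 512x. */
#define SUPERPAGE_ORDER 9   /* 2MB / 4kB = 512 = 1 << 9 */

static inline unsigned long gfn_lock_index(unsigned long gfn)
{
    return gfn >> SUPERPAGE_ORDER;
}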
>
> Cheers,
>
> Tim.
>
