Re: [PATCH v3] xen/mm: move adjustment of claimed pages counters on allocation
On Thu, Jan 08, 2026 at 09:24:51AM +0100, Jan Beulich wrote:
> On 07.01.2026 18:56, Roger Pau Monne wrote:
> > The current logic splits the update of the amount of available memory in
> > the system (total_avail_pages) and of the pending claims into two
> > separately locked regions. This leads to a window between the counter
> > adjustments where the result of total_avail_pages - outstanding_claims
> > doesn't reflect the real amount of free memory available, and can be
> > negative due to total_avail_pages having been updated ahead of
> > outstanding_claims.
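For illustration, the window being described has this shape (a simplified
sketch with stand-in declarations, not the actual allocator code):

    /* Simplified stand-ins for the real counters and lock. */
    static long total_avail_pages;
    static long outstanding_claims;
    static spinlock_t heap_lock;

    static void alloc_path(unsigned long request)
    {
        spin_lock(&heap_lock);
        total_avail_pages -= request;    /* updated first... */
        spin_unlock(&heap_lock);

        /*
         * Window: another CPU computing
         * total_avail_pages - outstanding_claims here under-reports the
         * free memory, and can even observe a negative result.
         */

        spin_lock(&heap_lock);
        outstanding_claims -= request;   /* ...claim only caught up here. */
        spin_unlock(&heap_lock);
    }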
> >
> > Fix by adjusting outstanding_claims and d->outstanding_pages in the same
> > place where total_avail_pages is updated. Note that accesses to
> > d->outstanding_pages are protected by the global heap_lock, just like
> > total_avail_pages or outstanding_claims. Add a comment to the field
> > declaration, and also adjust the comment at the top of
> > domain_set_outstanding_pages() to be clearer in that regard.
> >
> > Note that failures in assign_pages() cause the claimed amount that has
> > been allocated to be lost, as that amount is not added back to the domain
> > quota once the pages are freed. Given that the intended usage of claims is
> > limited to the initial physmap populate, and that the current failure
> > paths in assign_pages() would lead to the domain being destroyed anyway,
> > don't add extra logic to recover the claimed amount on failure - it would
> > just add complexity for no real benefit.
> >
> > Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory ops)")
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v2:
> > - Revert back to the approach in v1.
>
> You didn't fully go back to v1. While ...
>
> > @@ -548,9 +524,10 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> >      unsigned long claim, avail_pages;
> >  
> >      /*
> > -     * take the domain's page_alloc_lock, else all d->tot_page adjustments
> > -     * must always take the global heap_lock rather than only in the much
> > -     * rarer case that d->outstanding_pages is non-zero
> > +     * Two locks are needed here:
> > +     *  - d->page_alloc_lock: protects accesses to d->{tot,max,extra}_pages.
> > +     *  - heap_lock: protects accesses to d->outstanding_pages,
> > +     *    total_avail_pages and outstanding_claims.
> >      */
> >      nrspin_lock(&d->page_alloc_lock);
> >      spin_lock(&heap_lock);
>
> ... the comment improvement is of course okay to keep, ...
>
> > @@ -999,7 +976,7 @@ static struct page_info *alloc_heap_pages(
> >  {
> >      nodeid_t node;
> >      unsigned int i, buddy_order, zone, first_dirty;
> > -    unsigned long request = 1UL << order;
> > +    unsigned int request = 1UL << order;
>
> ... this I'm less certain about (and if it was to be kept, it should also
> become 1U). For one, this bounds check:
>
>     if ( (outstanding_claims + request > total_avail_pages) &&
>
> ends up still being okay (perhaps except on Arm32, but the wrapping issue
> there is pre-existing, albeit possibly benign due to other constraints),
> but just because outstanding_claims is "long" (and it's unclear to me why
> it isn't "unsigned long").
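As a side note on why the check still works out: with "request" narrowed to
unsigned int, the usual arithmetic conversions keep the sum signed on LP64
but turn it unsigned on ILP32. A standalone sketch (plain C, stand-in
values, not Xen code):

    #include <stdio.h>

    int main(void)
    {
        long total_avail_pages = 16;
        long outstanding_claims = 8;
        unsigned int request = 1U << 4;

        /*
         * LP64: "request" converts to (signed) long, so the addition and
         * comparison are carried out in long and behave as expected.
         * ILP32 (e.g. Arm32): long and unsigned int have equal rank, so
         * "outstanding_claims" converts to unsigned long instead and the
         * whole expression is evaluated unsigned -- a large enough sum
         * can wrap, which is the pre-existing issue mentioned above.
         */
        if ( outstanding_claims + request > total_avail_pages )
            printf("claim would not fit\n");

        return 0;
    }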
>
> And then, what exactly is it that you want the more narrow type for (the
> description says nothing in that regard)? The other relevant uses of the
> variable are
>
>     avail[node][zone] -= request;
>     total_avail_pages -= request;
>
> where both avail[][] and total_avail_pages are (unsigned) long (again
> unclear to me why for total_avail_pages it's plain long).
>
> Oh, wait, it is ...
>
> > @@ -1071,6 +1050,30 @@ static struct page_info *alloc_heap_pages(
> >      total_avail_pages -= request;
> >      ASSERT(total_avail_pages >= 0);
> >  
> > +    if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> > +    {
> > +        /*
> > +         * Adjust claims in the same locked region where total_avail_pages is
> > +         * adjusted, not doing so would lead to a window where the amount of
> > +         * free memory (avail - claimed) would be incorrect.
> > +         *
> > +         * Note that by adjusting the claimed amount here it's possible for
> > +         * pages to fail to be assigned to the claiming domain while already
> > +         * having been subtracted from d->outstanding_pages. Such claimed
> > +         * amount is then lost, as the pages that fail to be assigned to the
> > +         * domain are freed without replenishing the claim. This is fine given
> > +         * claims are only to be used during physmap population as part of
> > +         * domain build, and any failure in assign_pages() there will result in
> > +         * the domain being destroyed before creation is finished. Losing part
> > +         * of the claim makes no difference.
> > +         */
> > +        unsigned int outstanding = min(d->outstanding_pages, request);
>
> ... the desire to avoid use of min_t() here which wants "request" to be
> "unsigned int". At some point we'll want to change the struct domain fields
> to unsigned long anyway, at which point the above would need adjustment. It's
> well possible that such an adjustment would end up being to then use min_t().
> Imo we'd be better off using e.g.
>
>     unsigned int outstanding = min(d->outstanding_pages + 0UL, request);
>
> or even
>
>     typeof(d->outstanding_pages) outstanding =
>         min(d->outstanding_pages + 0UL, request);
>
> right away. In the latter case the decl wouldn't even need touching when the
> struct domain fields are promoted.
My preference would be:

    unsigned long outstanding = min(d->outstanding_pages + 0UL, request);

If that's fine with you.
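For reference, the reason the "+ 0UL" is wanted at all: assuming Xen's min()
is the usual Linux-style type-checked macro, both operands must have the
same type, and the promotion makes the narrow field match an unsigned long
"request". A self-contained sketch:

    /*
     * Linux-style type-checked min(): the &x_ == &y_ comparison makes the
     * compiler complain when x and y have different types.
     */
    #define min(x, y) ({             \
        typeof(x) x_ = (x);          \
        typeof(y) y_ = (y);          \
        (void)(&x_ == &y_);          \
        x_ < y_ ? x_ : y_;           \
    })

    int main(void)
    {
        unsigned int outstanding_pages = 4;  /* d->outstanding_pages stand-in */
        unsigned long request = 1UL << 2;

        /*
         * min(outstanding_pages, request) would trip the type check;
         * "+ 0UL" promotes the left operand to unsigned long so the
         * operand types agree:
         */
        unsigned long outstanding = min(outstanding_pages + 0UL, request);

        return (int)outstanding;
    }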
> > + BUG_ON(outstanding > d->outstanding_pages);
>
> Unlike in v1, where the min() was different, this is now dead code.
Oh, I need to adjust this so it's outstanding > outstanding_claims
instead.
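That is, presumably the check becomes:

    BUG_ON(outstanding > outstanding_claims);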
Thanks, Roger.