Re: [PATCH v3] xen/mm: move adjustment of claimed pages counters on allocation
On 07.01.2026 18:56, Roger Pau Monne wrote:
> The current logic splits the update of the amount of available memory in
> the system (total_avail_pages) and pending claims into two separately
> locked regions. This leads to a window between counter adjustments where
> the result of total_avail_pages - outstanding_claims doesn't reflect the
> real amount of free memory available, and can return a negative value due
> to total_avail_pages having been updated ahead of outstanding_claims.
>
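[For readers following from the archive, the problematic pre-patch shape is
roughly this (a simplified sketch, not the literal Xen code; the identifiers
are the ones named in the quoted description):

    spin_lock(&heap_lock);
    total_avail_pages -= request;
    spin_unlock(&heap_lock);

    /* Window: anything computing total_avail_pages - outstanding_claims
     * here sees a stale, possibly negative, amount of free memory. */

    spin_lock(&heap_lock);
    outstanding_claims -= request;
    spin_unlock(&heap_lock);

The fix folds the claim adjustment into the first locked region.]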
> Fix by adjusting outstanding_claims and d->outstanding_pages in the same
> place where total_avail_pages is updated. Note that accesses to
> d->outstanding_pages are protected by the global heap_lock, just like
> total_avail_pages or outstanding_claims. Add a comment to the field
> declaration, and also adjust the comment at the top of
> domain_set_outstanding_pages() to be clearer in that regard.
>
> Note that failures in assign_pages() cause the claimed amount that has
> been allocated to be lost, as the amount is not added back to the domain
> quota once pages are freed. Given the intended usage of claims is limited
> to initial physmap population, and the current failure paths in
> assign_pages() would lead to the domain being destroyed anyway, don't
> add extra logic to recover the claimed amount on failure - it would just
> add complexity for no real benefit.
>
> Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory ops)")
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Changes since v2:
> - Revert back to the approach in v1.
You didn't fully go back to v1. While ...
> @@ -548,9 +524,10 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
>      unsigned long claim, avail_pages;
>  
>      /*
> -     * take the domain's page_alloc_lock, else all d->tot_page adjustments
> -     * must always take the global heap_lock rather than only in the much
> -     * rarer case that d->outstanding_pages is non-zero
> +     * Two locks are needed here:
> +     * - d->page_alloc_lock: protects accesses to d->{tot,max,extra}_pages.
> +     * - heap_lock: protects accesses to d->outstanding_pages, total_avail_pages
> +     *   and outstanding_claims.
>      */
>      nrspin_lock(&d->page_alloc_lock);
>      spin_lock(&heap_lock);
... the comment improvement is of course okay to keep, ...
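[For context, the complete pairing the hunk's tail belongs to looks roughly
like this (a sketch reconstructed from the hunk above; the claim computation
in the middle is elided):

    nrspin_lock(&d->page_alloc_lock);  /* d->{tot,max,extra}_pages */
    spin_lock(&heap_lock);             /* d->outstanding_pages,
                                        * total_avail_pages,
                                        * outstanding_claims */
    /* ... validate the request and record the claim ... */
    spin_unlock(&heap_lock);
    nrspin_unlock(&d->page_alloc_lock);
]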
> @@ -999,7 +976,7 @@ static struct page_info *alloc_heap_pages(
>  {
>      nodeid_t node;
>      unsigned int i, buddy_order, zone, first_dirty;
> -    unsigned long request = 1UL << order;
> +    unsigned int request = 1UL << order;
... this I'm less certain about (and if it were to be kept, the 1UL should
also become 1U). For one, this bounds check:

    if ( (outstanding_claims + request > total_avail_pages) &&

ends up still being okay (perhaps except on Arm32, but the wrapping issue
there is pre-existing, albeit possibly benign due to other constraints),
but only because outstanding_claims is "long" (and it's unclear to me why
it isn't "unsigned long").
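[To spell out the conversions at work here, a standalone demo - not Xen
code, with contrived values, and assuming LP64 for x86_64 vs ILP32 for
Arm32:

    #include <stdio.h>

    int main(void)
    {
        long claims = 1;          /* stands in for outstanding_claims */
        unsigned int request = 2; /* the narrowed "request" */
        long total = -1;          /* stands in for total_avail_pages */

        /*
         * LP64: long can represent every unsigned int, so the sum has
         * type long and the comparison stays signed - prints 1.
         * ILP32: the sum becomes unsigned long, "total" converts to a
         * huge unsigned value, and the comparison flips - prints 0.
         */
        printf("%d\n", claims + request > total);
        return 0;
    }
]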
And then, what exactly is it that you want the narrower type for (the
description says nothing in that regard)? The other relevant uses of the
variable are

    avail[node][zone] -= request;
    total_avail_pages -= request;

where both avail[][] and total_avail_pages are (unsigned) long (again
unclear to me why for total_avail_pages it's plain long).
Oh, wait, it is ...
> @@ -1071,6 +1050,30 @@ static struct page_info *alloc_heap_pages(
>      total_avail_pages -= request;
>      ASSERT(total_avail_pages >= 0);
>  
> +    if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> +    {
> +        /*
> +         * Adjust claims in the same locked region where total_avail_pages
> +         * is adjusted, not doing so would lead to a window where the amount
> +         * of free memory (avail - claimed) would be incorrect.
> +         *
> +         * Note that by adjusting the claimed amount here it's possible for
> +         * pages to fail to be assigned to the claiming domain while already
> +         * having been subtracted from d->outstanding_pages. Such claimed
> +         * amount is then lost, as the pages that fail to be assigned to the
> +         * domain are freed without replenishing the claim. This is fine
> +         * given claims are only to be used during physmap population as
> +         * part of domain build, and any failure in assign_pages() there
> +         * will result in the domain being destroyed before creation is
> +         * finished. Losing part of the claim makes no difference.
> +         */
> +        unsigned int outstanding = min(d->outstanding_pages, request);
... the desire to avoid use of min_t() here that wants "request" to be
"unsigned int". At some point we'll want to change the struct domain fields
to unsigned long anyway, at which point the above would need adjustment.
It's quite possible that such an adjustment would then end up using min_t().
Imo we'd be better off using e.g.

    unsigned int outstanding = min(d->outstanding_pages + 0UL, request);

or even

    typeof(d->outstanding_pages) outstanding =
        min(d->outstanding_pages + 0UL, request);

right away. In the latter case the declaration wouldn't even need touching
when the struct domain fields are promoted.
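[For anyone wondering why the "+ 0UL" helps: Xen's min(), like Linux's,
rejects mismatched argument types at compile time, roughly along these
lines (a sketch, not the exact Xen definition):

    #define min(x, y) ({                                    \
        typeof(x) x_ = (x);                                 \
        typeof(y) y_ = (y);                                 \
        (void)(&x_ == &y_); /* warns if the types differ */ \
        x_ < y_ ? x_ : y_;                                  \
    })

Adding 0UL promotes the unsigned int field to unsigned long before the
macro sees it, so the two arguments agree without resorting to min_t()'s
unchecked casts.]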
> +        BUG_ON(outstanding > d->outstanding_pages);
Unlike in v1, where the min() was different, this is now dead code: min()
can never yield a value larger than either of its arguments, so the
condition can't possibly be true.
Jan