Re: [PATCH v3] xen/mm: move adjustment of claimed pages counters on allocation


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 8 Jan 2026 09:44:20 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 08 Jan 2026 08:44:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jan 08, 2026 at 09:24:51AM +0100, Jan Beulich wrote:
> On 07.01.2026 18:56, Roger Pau Monne wrote:
> > The current logic splits the update of the amount of available memory in
> > the system (total_avail_pages) and pending claims into two separately
> > locked regions.  This leads to a window between counter adjustments where
> > the result of total_avail_pages - outstanding_claims doesn't reflect the
> > real amount of free memory available, and can be negative due to
> > total_avail_pages having been updated ahead of outstanding_claims.
> > 
> > Fix by adjusting outstanding_claims and d->outstanding_pages in the same
> > place where total_avail_pages is updated.  Note that accesses to
> > d->outstanding_pages are protected by the global heap_lock, just like
> > total_avail_pages or outstanding_claims.  Add a comment to the field
> > declaration, and also adjust the comment at the top of
> > domain_set_outstanding_pages() to be clearer in that regard.
> > 
> > Note that failures in assign_pages() cause the claimed amount that has
> > been allocated to be lost, as the amount is not added back to the domain
> > quota once pages are freed.  Given the intended usage of claims is limited
> > to initial physmap populate, and the current failure paths in
> > assign_pages() would lead to the domain being destroyed anyway, don't
> > add extra logic to recover the claimed amount on failure - it's just adding
> > complexity for no real benefit.
> > 
> > Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory ops)")
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v2:
> >  - Revert back to the approach in v1.
> 
> You didn't fully go back to v1. While ...
> 
> > @@ -548,9 +524,10 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> >      unsigned long claim, avail_pages;
> >  
> >      /*
> > -     * take the domain's page_alloc_lock, else all d->tot_page adjustments
> > -     * must always take the global heap_lock rather than only in the much
> > -     * rarer case that d->outstanding_pages is non-zero
> > +     * Two locks are needed here:
> > +     *  - d->page_alloc_lock: protects accesses to d->{tot,max,extra}_pages.
> > +     *  - heap_lock: protects accesses to d->outstanding_pages, total_avail_pages
> > +     *    and outstanding_claims.
> >       */
> >      nrspin_lock(&d->page_alloc_lock);
> >      spin_lock(&heap_lock);
> 
> ... the comment improvement is of course okay to keep, ...
> 
> > @@ -999,7 +976,7 @@ static struct page_info *alloc_heap_pages(
> >  {
> >      nodeid_t node;
> >      unsigned int i, buddy_order, zone, first_dirty;
> > -    unsigned long request = 1UL << order;
> > +    unsigned int request = 1UL << order;
> 
> ... this I'm less certain about (and if it was to be kept, it should also
> become 1U). For one, this bounds check:
> 
>     if ( (outstanding_claims + request > total_avail_pages) &&
> 
> ends up still being okay (perhaps except on Arm32, but the wrapping issue
> there is pre-existing, albeit possibly benign due to other constraints),
> but just because outstanding_claims is "long" (and it's unclear to me why
> it isn't "unsigned long").
> 
> And then, what exactly is it that you want the more narrow type for (the
> description says nothing in that regard)? The other relevant uses of the
> variable are
> 
>     avail[node][zone] -= request;
>     total_avail_pages -= request;
> 
> where both avail[][] and total_avail_pages are (unsigned) long (again
> unclear to me why for total_avail_pages it's plain long).
> 
> Oh, wait, it is ...
> 
> > @@ -1071,6 +1050,30 @@ static struct page_info *alloc_heap_pages(
> >      total_avail_pages -= request;
> >      ASSERT(total_avail_pages >= 0);
> >  
> > +    if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> > +    {
> > +        /*
> > +         * Adjust claims in the same locked region where total_avail_pages is
> > +         * adjusted, not doing so would lead to a window where the amount of
> > +         * free memory (avail - claimed) would be incorrect.
> > +         *
> > +         * Note that by adjusting the claimed amount here it's possible for
> > +         * pages to fail to be assigned to the claiming domain while already
> > +         * having been subtracted from d->outstanding_pages.  Such claimed
> > +         * amount is then lost, as the pages that fail to be assigned to the
> > +         * domain are freed without replenishing the claim.  This is fine given
> > +         * claims are only to be used during physmap population as part of
> > +         * domain build, and any failure in assign_pages() there will result in
> > +         * the domain being destroyed before creation is finished.  Losing part
> > +         * of the claim makes no difference.
> > +         */
> > +        unsigned int outstanding = min(d->outstanding_pages, request);
> 
> ... the desire to avoid use of min_t() here which wants "request" to be
> "unsigned int". At some point we'll want to change the struct domain fields
> to unsigned long anyway, at which point the above would need adjustment. It's
> well possible that such an adjustment would end up being to then use min_t().
> Imo we'd be better off using e.g.
> 
>         unsigned int outstanding = min(d->outstanding_pages + 0UL, request);
> 
> or even
> 
>         typeof(d->outstanding_pages) outstanding =
>             min(d->outstanding_pages + 0UL, request);
> 
> right away. In the latter case the decl wouldn't even need touching when the
> struct domain fields are promoted.

My preference would be:

unsigned long outstanding = min(d->outstanding_pages + 0UL, request);

If that's fine with you.
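
As an aside, a minimal standalone sketch (not Xen code; MIN_CHECKED is only a
stand-in for a type-checking min() macro in the spirit of the hypervisor's) of
why the + 0UL promotion sidesteps the mixed-type problem without needing
min_t():

    /* Toy model: comparing pointers to the two temporaries makes the
     * compiler warn when the operand types differ. */
    #include <stdio.h>

    #define MIN_CHECKED(x, y) ({                        \
        typeof(x) x_ = (x);                             \
        typeof(y) y_ = (y);                             \
        (void)(&x_ == &y_); /* warns on mixed types */  \
        x_ < y_ ? x_ : y_;                              \
    })

    int main(void)
    {
        unsigned int outstanding_pages = 5; /* stand-in for d->outstanding_pages */
        unsigned long request = 8;          /* stand-in for the allocation size  */

        /* MIN_CHECKED(outstanding_pages, request) would warn about mixed
         * types.  Adding + 0UL promotes the first operand to unsigned long,
         * so both operands match and no cast via min_t() is needed. */
        unsigned long outstanding = MIN_CHECKED(outstanding_pages + 0UL, request);

        printf("%lu\n", outstanding); /* prints 5 */
        return 0;
    }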

> > +        BUG_ON(outstanding > d->outstanding_pages);
> 
> Unlike in v1, where the min() was different, this is now dead code.

Oh, I need to adjust this so it's outstanding > outstanding_claims
instead.
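
For illustration only, a self-contained toy of that shape (not the actual v4
hunk; the globals, the struct and the subtraction of both claim counters are
assumptions based on the patch description), with the sanity check moved to
the global counter being decremented:

    #include <assert.h>
    #include <stdio.h>

    /* Stand-ins for the allocator's globals and the domain's claim field. */
    static long total_avail_pages = 1000;
    static unsigned long outstanding_claims = 64;

    struct domain { unsigned int outstanding_pages; };

    static void account_alloc(struct domain *d, unsigned long request)
    {
        total_avail_pages -= request;

        if ( d && d->outstanding_pages )
        {
            unsigned long outstanding = d->outstanding_pages + 0UL < request
                                        ? d->outstanding_pages + 0UL : request;

            /* The suggested check: never underflow the global counter. */
            assert(outstanding <= outstanding_claims);

            outstanding_claims -= outstanding;
            d->outstanding_pages -= outstanding;
        }
    }

    int main(void)
    {
        struct domain d = { .outstanding_pages = 16 };

        account_alloc(&d, 8);
        printf("claims=%lu dom=%u avail=%ld\n",
               outstanding_claims, d.outstanding_pages, total_avail_pages);
        return 0;
    }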

Thanks, Roger.
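
For reference, a minimal sketch (toy counters only, not Xen code) of the
window the commit message describes: when the available-pages decrement and
the claim decrement happen under two separate lock acquisitions, anything
computing total_avail_pages - outstanding_claims in between can observe a
negative value.

    #include <stdio.h>

    /* Toy globals mirroring the counters named in the commit message. */
    static long total_avail_pages = 8;
    static long outstanding_claims = 8;

    /* What other CPUs compute to decide whether an unclaimed allocation
     * may proceed. */
    static long free_unclaimed(void)
    {
        return total_avail_pages - outstanding_claims;
    }

    int main(void)
    {
        long request = 8;

        /* Old scheme: the two counters are updated in two separately locked
         * regions, so another CPU can run free_unclaimed() in between. */
        total_avail_pages -= request;               /* first locked region  */
        printf("window:  %ld\n", free_unclaimed()); /* prints -8 */
        outstanding_claims -= request;              /* second locked region */

        printf("settled: %ld\n", free_unclaimed()); /* prints 0 */
        return 0;
    }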
