Re: [PATCH v3] xen/mm: move adjustment of claimed pages counters on allocation


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 8 Jan 2026 09:44:20 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 08 Jan 2026 08:44:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jan 08, 2026 at 09:24:51AM +0100, Jan Beulich wrote:
> On 07.01.2026 18:56, Roger Pau Monne wrote:
> > The current logic splits the update of the amount of available memory in
> > the system (total_avail_pages) and pending claims into two separately
> > locked regions.  This leads to a window between counter adjustments where
> > the result of total_avail_pages - outstanding_claims doesn't reflect the
> > real amount of free memory available, and can be negative due to
> > total_avail_pages having been updated ahead of outstanding_claims.
> > 
> > Fix by adjusting outstanding_claims and d->outstanding_pages in the same
> > place where total_avail_pages is updated.  Note that accesses to
> > d->outstanding_pages are protected by the global heap_lock, just like
> > total_avail_pages or outstanding_claims.  Add a comment to the field
> > declaration, and also adjust the comment at the top of
> > domain_set_outstanding_pages() to be clearer in that regard.
> > 
> > Note that failures in assign_pages() cause the claimed amount that has
> > been allocated to be lost, as the amount is not added back to the domain
> > quota once pages are freed.  Given the intended usage of claims is limited
> > to initial physmap populate, and the current failure paths in
> > assign_pages() would lead to the domain being destroyed anyway, don't
> > add extra logic to recover the claimed amount on failure - it's just adding
> > complexity for no real benefit.
> > 
> > Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory ops)")
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v2:
> >  - Revert back to the approach in v1.
> 
> You didn't fully go back to v1. While ...
> 
> > @@ -548,9 +524,10 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> >      unsigned long claim, avail_pages;
> >  
> >      /*
> > -     * take the domain's page_alloc_lock, else all d->tot_page adjustments
> > -     * must always take the global heap_lock rather than only in the much
> > -     * rarer case that d->outstanding_pages is non-zero
> > +     * Two locks are needed here:
> > +     *  - d->page_alloc_lock: protects accesses to d->{tot,max,extra}_pages.
> > +     *  - heap_lock: protects accesses to d->outstanding_pages, total_avail_pages
> > +     *    and outstanding_claims.
> >       */
> >      nrspin_lock(&d->page_alloc_lock);
> >      spin_lock(&heap_lock);
> 
> ... the comment improvement is of course okay to keep, ...
> 
> > @@ -999,7 +976,7 @@ static struct page_info *alloc_heap_pages(
> >  {
> >      nodeid_t node;
> >      unsigned int i, buddy_order, zone, first_dirty;
> > -    unsigned long request = 1UL << order;
> > +    unsigned int request = 1UL << order;
> 
> ... this I'm less certain about (and if it was to be kept, it should also
> become 1U). For one, this bounds check:
> 
>     if ( (outstanding_claims + request > total_avail_pages) &&
> 
> ends up still being okay (perhaps except on Arm32, but the wrapping issue
> there is pre-existing, albeit possibly benign due to other constraints),
> but just because outstanding_claims is "long" (and it's unclear to me why
> it isn't "unsigned long").
> 
> And then, what exactly is it that you want the more narrow type for (the
> description says nothing in that regard)? The other relevant uses of the
> variable are
> 
>     avail[node][zone] -= request;
>     total_avail_pages -= request;
> 
> where both avail[][] and total_avail_pages are (unsigned) long (again
> unclear to me why for total_avail_pages it's plain long).
> 
> Oh, wait, it is ...
> 
> > @@ -1071,6 +1050,30 @@ static struct page_info *alloc_heap_pages(
> >      total_avail_pages -= request;
> >      ASSERT(total_avail_pages >= 0);
> >  
> > +    if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> > +    {
> > +        /*
> > +         * Adjust claims in the same locked region where total_avail_pages is
> > +         * adjusted, not doing so would lead to a window where the amount of
> > +         * free memory (avail - claimed) would be incorrect.
> > +         *
> > +         * Note that by adjusting the claimed amount here it's possible for
> > +         * pages to fail to be assigned to the claiming domain while already
> > +         * having been subtracted from d->outstanding_pages.  Such claimed
> > +         * amount is then lost, as the pages that fail to be assigned to the
> > +         * domain are freed without replenishing the claim.  This is fine given
> > +         * claims are only to be used during physmap population as part of
> > +         * domain build, and any failure in assign_pages() there will result in
> > +         * the domain being destroyed before creation is finished.  Losing part
> > +         * of the claim makes no difference.
> > +         */
> > +        unsigned int outstanding = min(d->outstanding_pages, request);
> 
> ... the desire to avoid use of min_t() here which wants "request" to be
> "unsigned int". At some point we'll want to change the struct domain fields
> to unsigned long anyway, at which point the above would need adjustment. It's
> well possible that such an adjustment would end up being to then use min_t().
> Imo we'd be better off using e.g.
> 
>         unsigned int outstanding = min(d->outstanding_pages + 0UL, request);
> 
> or even
> 
>         typeof(d->outstanding_pages) outstanding =
>             min(d->outstanding_pages + 0UL, request);
> 
> right away. In the latter case the decl wouldn't even need touching when the
> struct domain fields are promoted.

My preference would be:

unsigned long outstanding = min(d->outstanding_pages + 0UL, request);

If that's fine with you.
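
As an aside, a minimal standalone sketch (not Xen code; MIN_CHECKED is only a
stand-in for a type-checking min() macro in the spirit of the hypervisor's) of
why the + 0UL promotion sidesteps the mixed-type problem without needing
min_t():

    /* Toy model: comparing pointers to the two temporaries makes the
     * compiler warn when the operand types differ. */
    #include <stdio.h>

    #define MIN_CHECKED(x, y) ({                        \
        typeof(x) x_ = (x);                             \
        typeof(y) y_ = (y);                             \
        (void)(&x_ == &y_); /* warns on mixed types */  \
        x_ < y_ ? x_ : y_;                              \
    })

    int main(void)
    {
        unsigned int outstanding_pages = 5; /* stand-in for d->outstanding_pages */
        unsigned long request = 8;          /* stand-in for the allocation size  */

        /* MIN_CHECKED(outstanding_pages, request) would warn about mixed
         * types.  Adding + 0UL promotes the first operand to unsigned long,
         * so both operands match and no cast via min_t() is needed. */
        unsigned long outstanding = MIN_CHECKED(outstanding_pages + 0UL, request);

        printf("%lu\n", outstanding); /* prints 5 */
        return 0;
    }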

> > +        BUG_ON(outstanding > d->outstanding_pages);
> 
> Unlike in v1, where the min() was different, this is now dead code.

Oh, I need to adjust this so it's outstanding > outstanding_claims
instead.
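
For illustration only, a self-contained toy of that shape (not the actual v4
hunk; the globals, the struct and the subtraction of both claim counters are
assumptions based on the patch description), with the sanity check moved to
the global counter being decremented:

    #include <assert.h>
    #include <stdio.h>

    /* Stand-ins for the allocator's globals and the domain's claim field. */
    static long total_avail_pages = 1000;
    static unsigned long outstanding_claims = 64;

    struct domain { unsigned int outstanding_pages; };

    static void account_alloc(struct domain *d, unsigned long request)
    {
        total_avail_pages -= request;

        if ( d && d->outstanding_pages )
        {
            unsigned long outstanding = d->outstanding_pages + 0UL < request
                                        ? d->outstanding_pages + 0UL : request;

            /* The suggested check: never underflow the global counter. */
            assert(outstanding <= outstanding_claims);

            outstanding_claims -= outstanding;
            d->outstanding_pages -= outstanding;
        }
    }

    int main(void)
    {
        struct domain d = { .outstanding_pages = 16 };

        account_alloc(&d, 8);
        printf("claims=%lu dom=%u avail=%ld\n",
               outstanding_claims, d.outstanding_pages, total_avail_pages);
        return 0;
    }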

Thanks, Roger.
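
For reference, a minimal sketch (toy counters only, not Xen code) of the
window the commit message describes: when the available-pages decrement and
the claim decrement happen under two separate lock acquisitions, anything
computing total_avail_pages - outstanding_claims in between can observe a
negative value.

    #include <stdio.h>

    /* Toy globals mirroring the counters named in the commit message. */
    static long total_avail_pages = 8;
    static long outstanding_claims = 8;

    /* What other CPUs compute to decide whether an unclaimed allocation
     * may proceed. */
    static long free_unclaimed(void)
    {
        return total_avail_pages - outstanding_claims;
    }

    int main(void)
    {
        long request = 8;

        /* Old scheme: the two counters are updated in two separately locked
         * regions, so another CPU can run free_unclaimed() in between. */
        total_avail_pages -= request;               /* first locked region  */
        printf("window:  %ld\n", free_unclaimed()); /* prints -8 */
        outstanding_claims -= request;              /* second locked region */

        printf("settled: %ld\n", free_unclaimed()); /* prints 0 */
        return 0;
    }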
