
Re: [PATCH v3] xen/mm: move adjustment of claimed pages counters on allocation


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 8 Jan 2026 10:20:40 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 08 Jan 2026 09:20:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.01.2026 09:44, Roger Pau Monné wrote:
> On Thu, Jan 08, 2026 at 09:24:51AM +0100, Jan Beulich wrote:
>> On 07.01.2026 18:56, Roger Pau Monne wrote:
>>> The current logic splits the update of the amount of available memory in
>>> the system (total_avail_pages) and pending claims into two separately
>>> locked regions.  This leads to a window between counter adjustments where
>>> the result of total_avail_pages - outstanding_claims doesn't reflect the
>>> real amount of free memory available, and can return a negative value due
>>> to total_avail_pages having been updated ahead of outstanding_claims.
>>>
>>> Fix by adjusting outstanding_claims and d->outstanding_pages in the same
>>> place where total_avail_pages is updated.  Note that accesses to
>>> d->outstanding_pages are protected by the global heap_lock, just like
>>> total_avail_pages or outstanding_claims.  Add a comment to the field
>>> declaration, and also adjust the comment at the top of
>>> domain_set_outstanding_pages() to be clearer in that regard.
>>>
>>> Note that failures in assign_pages() cause the claimed amount that has
>>> been allocated to be lost, as the amount is not added back to the domain
>>> quota once pages are freed.  Given the intended usage of claims is limited
>>> to initial physmap populate, and the current failure paths in
>>> assign_pages() would lead to the domain being destroyed anyway, don't
>>> add extra logic to recover the claimed amount on failure - it's just adding
>>> complexity for no real benefit.
>>>
>>> Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory ops)")
>>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>> ---
>>> Changes since v2:
>>>  - Revert back to the approach in v1.
>>
>> You didn't fully go back to v1. While ...
>>
>>> @@ -548,9 +524,10 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
>>>      unsigned long claim, avail_pages;
>>>  
>>>      /*
>>> -     * take the domain's page_alloc_lock, else all d->tot_page adjustments
>>> -     * must always take the global heap_lock rather than only in the much
>>> -     * rarer case that d->outstanding_pages is non-zero
>>> +     * Two locks are needed here:
>>> +     *  - d->page_alloc_lock: protects accesses to d->{tot,max,extra}_pages.
>>> +     *  - heap_lock: protects accesses to d->outstanding_pages, total_avail_pages
>>> +     *    and outstanding_claims.
>>>       */
>>>      nrspin_lock(&d->page_alloc_lock);
>>>      spin_lock(&heap_lock);
>>
>> ... the comment improvement is of course okay to keep, ...
>>
>>> @@ -999,7 +976,7 @@ static struct page_info *alloc_heap_pages(
>>>  {
>>>      nodeid_t node;
>>>      unsigned int i, buddy_order, zone, first_dirty;
>>> -    unsigned long request = 1UL << order;
>>> +    unsigned int request = 1UL << order;
>>
>> ... this I'm less certain about (and if it was to be kept, it should also
>> become 1U). For one, this bounds check:
>>
>>     if ( (outstanding_claims + request > total_avail_pages) &&
>>
>> ends up still being okay (perhaps except on Arm32, but the wrapping issue
>> there is pre-existing, albeit possibly benign due to other constraints),
>> but just because outstanding_claims is "long" (and it's unclear to me why
>> it isn't "unsigned long").
>>
>> And then, what exactly is it that you want the narrower type for (the
>> description says nothing in that regard)? The other relevant uses of the
>> variable are
>>
>>     avail[node][zone] -= request;
>>     total_avail_pages -= request;
>>
>> where both avail[][] and total_avail_pages are (unsigned) long (again
>> unclear to me why for total_avail_pages it's plain long).
>>
>> Oh, wait, it is ...
>>
>>> @@ -1071,6 +1050,30 @@ static struct page_info *alloc_heap_pages(
>>>      total_avail_pages -= request;
>>>      ASSERT(total_avail_pages >= 0);
>>>  
>>> +    if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
>>> +    {
>>> +        /*
>>> +         * Adjust claims in the same locked region where total_avail_pages is
>>> +         * adjusted, not doing so would lead to a window where the amount of
>>> +         * free memory (avail - claimed) would be incorrect.
>>> +         *
>>> +         * Note that by adjusting the claimed amount here it's possible for
>>> +         * pages to fail to be assigned to the claiming domain while already
>>> +         * having been subtracted from d->outstanding_pages.  Such claimed
>>> +         * amount is then lost, as the pages that fail to be assigned to the
>>> +         * domain are freed without replenishing the claim.  This is fine given
>>> +         * claims are only to be used during physmap population as part of
>>> +         * domain build, and any failure in assign_pages() there will result in
>>> +         * the domain being destroyed before creation is finished.  Losing part
>>> +         * of the claim makes no difference.
>>> +         */
>>> +        unsigned int outstanding = min(d->outstanding_pages, request);
>>
>> ... the desire to avoid use of min_t() here which wants "request" to be
>> "unsigned int". At some point we'll want to change the struct domain fields
>> to unsigned long anyway, at which point the above would need adjustment. It's
>> well possible that such an adjustment would end up being to then use min_t().
>> Imo we'd be better off using e.g.
>>
>>         unsigned int outstanding = min(d->outstanding_pages + 0UL, request);
>>
>> or even
>>
>>         typeof(d->outstanding_pages) outstanding =
>>             min(d->outstanding_pages + 0UL, request);
>>
>> right away. In the latter case the decl wouldn't even need touching when the
>> struct domain fields are promoted.
> 
> My preference would be:
> 
> unsigned long outstanding = min(d->outstanding_pages + 0UL, request);
> 
> If that's fine with you.

It is.
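
For illustration, the agreed form can be tried out in isolation. What follows
is only a minimal sketch, assuming a simplified type-checking min() written
for this example (an approximation, not a copy of Xen's macro) and a
stripped-down stand-in for struct domain; struct fake_domain and
claim_to_consume() are hypothetical names that do not appear in the patch:

```c
#include <assert.h>

/*
 * Simplified type-checking min() (GNU C).  The pointer comparison makes the
 * compiler complain when the two operands have different types.
 */
#define min(x, y) ({                    \
    __typeof__(x) x_ = (x);             \
    __typeof__(y) y_ = (y);             \
    (void)(&x_ == &y_);                 \
    x_ < y_ ? x_ : y_;                  \
})

/* Hypothetical stand-in for the relevant struct domain field. */
struct fake_domain {
    unsigned int outstanding_pages;
};

/* Hypothetical helper: how much of the claim one allocation consumes. */
static unsigned long claim_to_consume(const struct fake_domain *d,
                                      unsigned int order)
{
    unsigned long request = 1UL << order;
    /* "+ 0UL" promotes the unsigned int field, so both operands match. */
    unsigned long outstanding = min(d->outstanding_pages + 0UL, request);

    /* Always true by construction of min(); such a check is dead code. */
    assert(outstanding <= d->outstanding_pages);

    return outstanding;
}
```

With "request" kept as unsigned long, dropping the "+ 0UL" would make the two
min() operands differ in type, which is what the addition avoids without
resorting to min_t().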

>>> +        BUG_ON(outstanding > d->outstanding_pages);
>>
>> Unlike in v1, where the min() was different, this is now dead code.
> 
> Oh, I need to adjust this so it's outstanding > outstanding_claims
> instead.
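
To spell out why only the comparison against the global counter is
meaningful, here is a rough single-threaded model of the accounting. It is a
sketch under stated assumptions: the counter names follow the patch, while
struct fake_domain, account_alloc() and unclaimed_pages() are hypothetical,
the initial values are made up, and the heap_lock the real code holds is
left out:

```c
#include <assert.h>

/* Hypothetical stand-ins for the heap_lock protected globals. */
static long total_avail_pages = 100;
static long outstanding_claims = 8;

struct fake_domain {
    unsigned int outstanding_pages;
};

/*
 * Model of the accounting done on allocation, with both counters adjusted
 * in one step.  In the real code the caller would hold heap_lock.
 */
static void account_alloc(struct fake_domain *d, unsigned long request)
{
    /* The result can never exceed d->outstanding_pages. */
    unsigned long outstanding = d->outstanding_pages < request
                                ? d->outstanding_pages : request;

    total_avail_pages -= request;

    /* Meaningful check: the per-domain claim must be part of the total. */
    assert(outstanding <= (unsigned long)outstanding_claims);

    d->outstanding_pages -= outstanding;
    outstanding_claims -= outstanding;
}

/* Free memory not reserved by any claim. */
static long unclaimed_pages(void)
{
    return total_avail_pages - outstanding_claims;
}
```

In this model, allocating 4 pages for a domain holding a claim of 8 lowers
both counters by 4, so the unclaimed amount is the same before and after.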

And then:
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Jan



 

