
Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two


  • To: Julien Grall <julien@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 9 Jun 2022 15:12:43 +0200
  • Cc: bertrand.marquis@xxxxxxx, Julien Grall <jgrall@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 09 Jun 2022 13:12:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.06.2022 14:33, Julien Grall wrote:
> On 09/06/2022 13:09, Jan Beulich wrote:
>> On 09.06.2022 10:30, Julien Grall wrote:
>>> From: Julien Grall <jgrall@xxxxxxxxxx>
>>>
>>> At the moment, init_heap_pages() will call free_heap_pages() page
>>> by page. To reduce the time to initialize the heap, we will want
>>> to provide multiple pages at the same time.
>>>
>>> init_heap_pages() is now split into two parts:
>>>      - init_heap_pages(): will break down the range into multiple sets
>>>        of contiguous pages. For now, the criterion is that the pages
>>>        should belong to the same NUMA node.
>>>      - init_contig_pages(): will initialize a set of contiguous pages.
>>>        For now, the pages are still passed one by one to free_heap_pages().
>>
>> Hmm, the common use of "contiguous" is to describe address correlation.
>> Therefore I'm afraid I'd like to see "contig" avoided when you mean
>> "same node". Perhaps init_node_pages()?
> 
> After the next patch, it will not only be the same node, it will also be 
> the same zone at least. Also, in the future, I would like to re-submit 
> David Woodhouse's patch to exclude broken pages (see [1]).
> 
> Therefore, I think the name init_node_pages() would not be suitable. 
> Please suggest a different name.

_init_heap_pages() then, as a helper of init_heap_pages()?
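
To make that concrete, a rough (and untested) sketch of the resulting
shape, with the avail[] / init_node_heap() handling and the scrub logic
elided, might be:

    static void _init_heap_pages(struct page_info *pg, unsigned long nr_pages,
                                 bool need_scrub)
    {
        unsigned long i;

        /* Still handing pages to free_heap_pages() one at a time for now. */
        for ( i = 0; i < nr_pages; i++ )
            free_heap_pages(pg + i, 0, need_scrub);
    }

    static void init_heap_pages(struct page_info *pg, unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages; )
        {
            unsigned int nid = phys_to_nid(page_to_maddr(pg + i));
            unsigned long contig = 1;

            /* Grow the run while pages stay on the same NUMA node. */
            while ( i + contig < nr_pages &&
                    phys_to_nid(page_to_maddr(pg + i + contig)) == nid )
                contig++;

            _init_heap_pages(pg + i, contig, false /* scrubbing elided */);
            i += contig;
        }
    }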

>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>>   }
>>>   
>>>   /*
>>> - * Hand the specified arbitrary page range to the specified heap zone
>>> - * checking the node_id of the previous page.  If they differ and the
>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>>> - * not freeing it to the buddy allocator.
>>> + * init_contig_heap_pages() is intended to only take pages from the same
>>> + * NUMA node.
>>>    */
>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>>> +{
>>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>>> +}
>>
>> If such a helper is indeed needed, then I think it absolutely wants
>> pg to be pointer-to-const. And imo it would also help readability if
>> the extra pair of parentheses around the nested function calls was
>> omitted. Given the naming concern, though, I wonder whether this
>> wouldn't better be open-coded in the one place it is used. Actually
>> naming is quite odd here beyond what I'd said further up: "Is this
>> page contiguous?" Such a question requires two pages, i.e. "Are these
>> two pages contiguous?" What you want to know is "Is this page on the
>> given node?"
> 
> There will be more checks in the future (see the next patch). I created a 
> helper because it reduces the size of the loop in init_heap_pages(). I 
> would be OK with folding it in if you strongly prefer that.

I don't "strongly" prefer that; I'd also be okay with a suitably named
helper. Just that I can't seem to come up with any good name.
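
Purely as illustration of the const / parentheses points (the name below
is only a placeholder, not a proposal):

    static bool page_is_on_node(const struct page_info *pg, unsigned int nid)
    {
        return phys_to_nid(page_to_maddr(pg)) == nid;
    }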

>>> +/*
>>> + * This function should only be called with valid pages from the same NUMA
>>> + * node.
>>> + *
>>> + * Callers should use is_contig_page() first to check if all the pages
>>> + * in a range are contiguous.
>>> + */
>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>> +                                   bool need_scrub)
>>> +{
>>> +    unsigned long s, e;
>>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +
>>> +    s = mfn_x(page_to_mfn(pg));
>>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> +    if ( unlikely(!avail[nid]) )
>>> +    {
>>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
>>
>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?
> 
> This is existing code and it is quite complex. So I would prefer that we 
> avoid simplifying and moving the code in the same patch. I would be happy 
> to write a follow-up patch to switch to IS_ALIGNED().

I do realize it's code you move, but I can accept your desire to merely
move the code without any cleanup. Personally I think that rather than a
follow-up patch (which doesn't help the reviewing of this one) such an
adjustment would better be a prereq one.
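
For reference, the two spellings test exactly the same thing; the macro
merely states the intent:

    /* Open-coded mask test, as in the moved code ... */
    !(s & ((1UL << MAX_ORDER) - 1))
    /* ... is the same predicate as ... */
    IS_ALIGNED(s, 1UL << MAX_ORDER)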

>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>>       spin_unlock(&heap_lock);
>>>   
>>>       if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>>> -        idle_scrub = true;
>>> +        need_scrub = true;
>>>   
>>> -    for ( i = 0; i < nr_pages; i++ )
>>> +    for ( i = 0; i < nr_pages; )
>>>       {
>>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +        unsigned long left = nr_pages - i;
>>> +        unsigned long contig_pages;
>>>   
>>> -        if ( unlikely(!avail[nid]) )
>>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>>           {
>>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>>> -            unsigned long n;
>>> -
>>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>>> -                               &use_tail);
>>> -            BUG_ON(i + n > nr_pages);
>>> -            if ( n && !use_tail )
>>> -            {
>>> -                i += n - 1;
>>> -                continue;
>>> -            }
>>> -            if ( i + n == nr_pages )
>>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>>                   break;
>>> -            nr_pages -= n;
>>>           }
>>
>> Isn't doing this page by page in a loop quite inefficient? Can't you
>> simply obtain the end of the node's range covering the first page, and
>> then adjust "left" accordingly?
> 
> The page-by-page check is necessary because we may need to exclude some 
> pages (see [1]) or the range may cross multiple zones (see [2]).

If you want/need to do this for "future" reasons (aiui [1] was never
committed, and you forgot to supply [2], but yes, splitting at zone
boundaries is of course necessary), then I think this wants making quite
clear in the description.
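
A sketch of that alternative (untested; node_start_pfn() /
node_spanned_pages() are used here merely as stand-ins for whatever
lookup ends up being appropriate, and the zone / broken-page splitting
you mention would still need handling on top):

        unsigned long node_end = node_start_pfn(nid) +
                                 node_spanned_pages(nid);
        /* Bound the run by the node boundary instead of probing
         * each page individually. */
        unsigned long contig_pages = min(left,
                                         node_end - mfn_x(page_to_mfn(pg)));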

Jan

>> I even wonder whether the admittedly
>> lax original check's assumption couldn't be leveraged alternatively,
>> by effectively bisecting to the end address on the node of interest
>> (where the assumption is that nodes aren't interleaved - see Wei's
>> NUMA series dealing with precisely that situation).
> See above. We went this way because there are some pages to be excluded. 
> The immediate use case is broken pages, but in the future we may need to 
> also exclude pages that contain guest content after Live-Update.
> 
> I also plan to get rid of the loop in free_heap_pages() to mark each 
> page free. This would mean that pages would only be accessed once in 
> init_heap_pages() (I am still cleaning up that patch and it is a much 
> more controversial change).
> 
> Cheers,
> 
> [1] 
> https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@xxxxxxxxxxxxx/
> 
>>
>> Jan
> 




 

