Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two
On 09.06.2022 14:33, Julien Grall wrote:
> On 09/06/2022 13:09, Jan Beulich wrote:
>> On 09.06.2022 10:30, Julien Grall wrote:
>>> From: Julien Grall <jgrall@xxxxxxxxxx>
>>>
>>> At the moment, init_heap_pages() will call free_heap_pages() page
>>> by page. To reduce the time to initialize the heap, we will want
>>> to provide multiple pages at the same time.
>>>
>>> init_heap_pages() is now split into two parts:
>>> - init_heap_pages(): will break down the range into multiple sets
>>> of contiguous pages. For now, the criterion is that the pages should
>>> belong to the same NUMA node.
>>> - init_contig_heap_pages(): will initialize a set of contiguous pages.
>>> For now the pages are still passed one by one to free_heap_pages().
>>
>> Hmm, the common use of "contiguous" is to describe address correlation.
>> Therefore I'm afraid I'd like to see "contig" avoided when you mean
>> "same node". Perhaps init_node_pages()?
>
> After the next patch, it will not only be the same node: it will also be
> the same zone, at least. Also, in the future, I would like to re-submit
> David Woodhouse's patch to exclude broken pages (see [1]).
>
> Therefore, I think the name init_node_pages() would not be suitable.
> Please suggest a different name.
_init_heap_pages() then, as a helper of init_heap_pages()?
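I.e., sketching just the shape (the exact prototype is of course up to you),
something like:

    static void _init_heap_pages(struct page_info *pg, unsigned long nr_pages,
                                 bool need_scrub);

with init_heap_pages() keeping its name and doing only the splitting.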
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>> }
>>>
>>> /*
>>> - * Hand the specified arbitrary page range to the specified heap zone
>>> - * checking the node_id of the previous page. If they differ and the
>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>>> - * not freeing it to the buddy allocator.
>>> + * init_contig_heap_pages() is intended to only take pages from the same
>>> + * NUMA node.
>>> */
>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>>> +{
>>> + return (nid == (phys_to_nid(page_to_maddr(pg))));
>>> +}
>>
>> If such a helper is indeed needed, then I think it absolutely wants
>> pg to be pointer-to-const. And imo it would also help readability if
>> the extra pair of parentheses around the nested function calls was
>> omitted. Given the naming concern, though, I wonder whether this
>> wouldn't better be open-coded in the one place it is used. Actually
>> naming is quite odd here beyond what I'd said further up: "Is this
>> page contiguous?" Such a question requires two pages, i.e. "Are these
>> two pages contiguous?" What you want to know is "Is this page on the
>> given node?"
>
> There will be more checks in the future (see the next patch). I created a
> helper because it reduces the size of the loop in init_heap_pages(). I
> would be OK to fold it in if you strongly prefer that.
I don't "strongly" prefer that; I'd also be okay with a suitably named
helper. Just that I can't seem to be able to come up with any good name.
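For illustration, the shape I'd be fine with (name still to be settled) is
merely the const-qualified variant without the extra parentheses:

    static bool is_contig_page(const struct page_info *pg, unsigned int nid)
    {
        /* Check whether this page lives on the given NUMA node. */
        return nid == phys_to_nid(page_to_maddr(pg));
    }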
>>> +/*
>>> + * This function should only be called with valid pages from the same NUMA
>>> + * node.
>>> + *
>>> + * Callers should use is_contig_page() first to check if all the pages
>>> + * in a range are contiguous.
>>> + */
>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>> +                                   bool need_scrub)
>>> +{
>>> + unsigned long s, e;
>>> + unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +
>>> + s = mfn_x(page_to_mfn(pg));
>>> + e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> + if ( unlikely(!avail[nid]) )
>>> + {
>>> + bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
>>
>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?
>
> This is existing code and it is quite complex. So I would prefer to avoid
> simplifying and moving the code in the same patch. I would be happy to
> write a follow-up patch to switch to IS_ALIGNED().
I do realize it's code you move, but I can accept your desire to merely
move the code without any cleanup. Personally I think that rather than a
follow-up patch (which doesn't help the reviewing of this one) such an
adjustment would better be a prereq one.
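For reference, the two forms are equivalent for a power-of-two alignment;
IS_ALIGNED() merely spells out the intent (a sketch, not part of the patch):

    /* Open-coded check, as moved by this patch ... */
    bool aligned_open  = !(s & ((1UL << MAX_ORDER) - 1));
    /* ... and the equivalent spelling via the existing IS_ALIGNED() macro: */
    bool aligned_macro = IS_ALIGNED(s, 1UL << MAX_ORDER);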
>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>> spin_unlock(&heap_lock);
>>>
>>> if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>>> - idle_scrub = true;
>>> + need_scrub = true;
>>>
>>> - for ( i = 0; i < nr_pages; i++ )
>>> + for ( i = 0; i < nr_pages; )
>>> {
>>> - unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>>> + unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> + unsigned long left = nr_pages - i;
>>> + unsigned long contig_pages;
>>>
>>> - if ( unlikely(!avail[nid]) )
>>> + for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>> {
>>> - unsigned long s = mfn_x(page_to_mfn(pg + i));
>>> - unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> - bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>>> - !(s & ((1UL << MAX_ORDER) - 1)) &&
>>> - (find_first_set_bit(e) <= find_first_set_bit(s));
>>> - unsigned long n;
>>> -
>>> - n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>>> - &use_tail);
>>> - BUG_ON(i + n > nr_pages);
>>> - if ( n && !use_tail )
>>> - {
>>> - i += n - 1;
>>> - continue;
>>> - }
>>> - if ( i + n == nr_pages )
>>> + if ( !is_contig_page(pg + contig_pages, nid) )
>>> break;
>>> - nr_pages -= n;
>>> }
>>
>> Isn't doing this page by page in a loop quite inefficient? Can't you
>> simply obtain the end of the node's range covering the first page, and
>> then adjust "left" accordingly?
>
> The page-by-page approach is necessary because we may need to exclude
> some pages (see [1]) or the range may cross multiple zones (see [2]).
If you want/need to do this for "future" reasons (aiui [1] was never
committed, and you forgot to supply [2], but yes, splitting at zone
boundaries is of course necessary), then I think this wants making quite
clear in the description.
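For completeness, the shortcut I was suggesting would have looked roughly
like the below (assuming non-interleaved nodes, and using node_start_pfn() /
node_spanned_pages() purely for illustration), although, as you point out,
it wouldn't cope with per-page exclusions or zone boundaries on its own:

    /* Clamp the chunk to the end of the first page's node, rather than
     * probing page by page. */
    unsigned long node_end = node_start_pfn(nid) + node_spanned_pages(nid);
    unsigned long contig_pages = min(left, node_end - mfn_x(page_to_mfn(pg)));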
Jan
>> I even wonder whether the admittedly
>> lax original check's assumption couldn't be leveraged alternatively,
>> by effectively bisecting to the end address on the node of interest
>> (where the assumption is that nodes aren't interleaved - see Wei's
>> NUMA series dealing with precisely that situation).
> See above. We went this way because there are some pages to be excluded.
> The immediate use case is broken pages, but in the future we may need to
> also exclude pages that contain guest content after Live-Update.
>
> I also plan to get rid of the loop in free_heap_pages() that marks each
> page free. This would mean that pages would only be accessed once in
> init_heap_pages() (I am still cleaning up that patch, and it is a much
> more controversial change).
>
> Cheers,
>
> [1]
> https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@xxxxxxxxxxxxx/
>
>>
>> Jan
>