
Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two


  • To: Julien Grall <julien@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 9 Jun 2022 15:12:43 +0200
  • Cc: bertrand.marquis@xxxxxxx, Julien Grall <jgrall@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 09 Jun 2022 13:12:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.06.2022 14:33, Julien Grall wrote:
> On 09/06/2022 13:09, Jan Beulich wrote:
>> On 09.06.2022 10:30, Julien Grall wrote:
>>> From: Julien Grall <jgrall@xxxxxxxxxx>
>>>
>>> At the moment, init_heap_pages() will call free_heap_pages() page
>>> by page. To reduce the time to initialize the heap, we will want
>>> to provide multiple pages at the same time.
>>>
>>> init_heap_pages() is now split into two parts:
>>>      - init_heap_pages(): will break down the range into multiple sets
>>>        of contiguous pages. For now, the criterion is that the pages
>>>        should belong to the same NUMA node.
>>>      - init_contig_pages(): will initialize a set of contiguous pages.
>>>        For now, the pages are still passed one by one to free_heap_pages().
>>
>> Hmm, the common use of "contiguous" is to describe address correlation.
>> Therefore I'm afraid I'd like to see "contig" avoided when you mean
>> "same node". Perhaps init_node_pages()?
> 
> After the next patch, it will not only be the same node, it will also be 
> the same zone at least. Also, in the future, I would like to re-submit 
> David Woodhouse's patch to exclude broken pages (see [1]).
> 
> Therefore, I think the name init_node_pages() would not be suitable. 
> Please suggest a different name.

_init_heap_pages() then, as a helper of init_heap_pages()?
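
To make that concrete, a rough (and untested) sketch of the resulting
shape, with the avail[] / init_node_heap() handling and the scrub logic
elided, might be:

    static void _init_heap_pages(struct page_info *pg, unsigned long nr_pages,
                                 bool need_scrub)
    {
        unsigned long i;

        /* Still handing pages to free_heap_pages() one at a time for now. */
        for ( i = 0; i < nr_pages; i++ )
            free_heap_pages(pg + i, 0, need_scrub);
    }

    static void init_heap_pages(struct page_info *pg, unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages; )
        {
            unsigned int nid = phys_to_nid(page_to_maddr(pg + i));
            unsigned long contig = 1;

            /* Grow the run while pages stay on the same NUMA node. */
            while ( i + contig < nr_pages &&
                    phys_to_nid(page_to_maddr(pg + i + contig)) == nid )
                contig++;

            _init_heap_pages(pg + i, contig, false /* scrubbing elided */);
            i += contig;
        }
    }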

>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>>   }
>>>   
>>>   /*
>>> - * Hand the specified arbitrary page range to the specified heap zone
>>> - * checking the node_id of the previous page.  If they differ and the
>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>>> - * not freeing it to the buddy allocator.
>>> + * init_contig_heap_pages() is intended to only take pages from the same
>>> + * NUMA node.
>>>    */
>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>>> +{
>>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>>> +}
>>
>> If such a helper is indeed needed, then I think it absolutely wants
>> pg to be pointer-to-const. And imo it would also help readability if
>> the extra pair of parentheses around the nested function calls was
>> omitted. Given the naming concern, though, I wonder whether this
>> wouldn't better be open-coded in the one place it is used. Actually
>> naming is quite odd here beyond what I'd said further up: "Is this
>> page contiguous?" Such a question requires two pages, i.e. "Are these
>> two pages contiguous?" What you want to know is "Is this page on the
>> given node?"
> 
> There will be more checks in the future (see the next patch). I created a 
> helper because it reduces the size of the loop in init_heap_pages(). I 
> would be OK with folding it in if you strongly prefer that.

I don't "strongly" prefer that; I'd also be okay with a suitably named
helper. Just that I can't seem to come up with any good name.
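
Purely as illustration of the const / parentheses points (the name below
is only a placeholder, not a proposal):

    static bool page_is_on_node(const struct page_info *pg, unsigned int nid)
    {
        return phys_to_nid(page_to_maddr(pg)) == nid;
    }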

>>> +/*
>>> + * This function should only be called with valid pages from the same NUMA
>>> + * node.
>>> + *
>>> + * Callers should use is_contig_page() first to check if all the pages
>>> + * in a range are contiguous.
>>> + */
>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>> +                                   bool need_scrub)
>>> +{
>>> +    unsigned long s, e;
>>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +
>>> +    s = mfn_x(page_to_mfn(pg));
>>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> +    if ( unlikely(!avail[nid]) )
>>> +    {
>>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
>>
>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?
> 
> This is existing code and it is quite complex. So I would prefer that we 
> avoid simplifying and moving the code in the same patch. I would be happy 
> to write a follow-up patch to switch to IS_ALIGNED().

I do realize it's code you move, but I can accept your desire to merely
move the code without any cleanup. Personally I think that rather than a
follow-up patch (which doesn't help the reviewing of this one) such an
adjustment would better be a prereq one.
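
For reference, the two spellings test exactly the same thing; the macro
merely states the intent:

    /* Open-coded mask test, as in the moved code ... */
    !(s & ((1UL << MAX_ORDER) - 1))
    /* ... is the same predicate as ... */
    IS_ALIGNED(s, 1UL << MAX_ORDER)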

>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>>       spin_unlock(&heap_lock);
>>>   
>>>       if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>>> -        idle_scrub = true;
>>> +        need_scrub = true;
>>>   
>>> -    for ( i = 0; i < nr_pages; i++ )
>>> +    for ( i = 0; i < nr_pages; )
>>>       {
>>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +        unsigned long left = nr_pages - i;
>>> +        unsigned long contig_pages;
>>>   
>>> -        if ( unlikely(!avail[nid]) )
>>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>>           {
>>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>>> -            unsigned long n;
>>> -
>>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>>> -                               &use_tail);
>>> -            BUG_ON(i + n > nr_pages);
>>> -            if ( n && !use_tail )
>>> -            {
>>> -                i += n - 1;
>>> -                continue;
>>> -            }
>>> -            if ( i + n == nr_pages )
>>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>>                   break;
>>> -            nr_pages -= n;
>>>           }
>>
>> Isn't doing this page by page in a loop quite inefficient? Can't you
>> simply obtain the end of the node's range covering the first page, and
>> then adjust "left" accordingly?
> 
> The page-by-page check is necessary because we may need to exclude some 
> pages (see [1]) or the range may cross multiple zones (see [2]).

If you want/need to do this for "future" reasons (aiui [1] was never
committed, and you forgot to supply [2], but yes, splitting at zone
boundaries is of course necessary), then I think this wants making quite
clear in the description.
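
A sketch of that alternative (untested; node_start_pfn() /
node_spanned_pages() are used here merely as stand-ins for whatever
lookup ends up being appropriate, and the zone / broken-page splitting
you mention would still need handling on top):

        unsigned long node_end = node_start_pfn(nid) +
                                 node_spanned_pages(nid);
        /* Bound the run by the node boundary instead of probing
         * each page individually. */
        unsigned long contig_pages = min(left,
                                         node_end - mfn_x(page_to_mfn(pg)));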

Jan

>> I even wonder whether the admittedly
>> lax original check's assumption couldn't be leveraged alternatively,
>> by effectively bisecting to the end address on the node of interest
>> (where the assumption is that nodes aren't interleaved - see Wei's
>> NUMA series dealing with precisely that situation).
> See above. We went this way because there are some pages to be excluded. 
> The immediate use case is broken pages, but in the future we may need to 
> also exclude pages that contain guest content after Live-Update.
> 
> I also plan to get rid of the loop in free_heap_pages() to mark each 
> page free. This would mean that pages would only be accessed once in 
> init_heap_pages() (I am still cleaning up that patch and it is a much 
> more controversial change).
> 
> Cheers,
> 
> [1] 
> https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@xxxxxxxxxxxxx/
> 
>>
>> Jan
> 




 

