Xen project Mailing List

Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()

To: David Hildenbrand <david@xxxxxxxxxx>

From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>

Date: Fri, 18 Jul 2025 13:35:34 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=PGPRNwN4pu3PXQ5swLS0gVK8l1QEymD2y97Ry48GJDU=; b=D4uxWMFFi5qWFuF1SF2+XvSdfvC5t8xDustCg3enyeI1FmPbY+pD9V53sLGP1vUTaD5hajlMU1okW32MXWDK8sMISDN2QtnfOHZCdeOQAa9C/cZdMjYsqWImTTeaMvFL9MJTnKnRoDUUoTh5mAr9QqnjdWWmjmFRqoug+eSz3mgSa8qEVUEAQMrdR/FGI9czkVXXCfTtrL+iZM6/oh6H4k30iOKygivaolM6hGnYmNotJYYM70fRfKkum8LTE2x/kSANjBLApA7ZwCENp6XGBtAW2XeXZtMIZT2O/LsGlQRC1tijblK726H9gxxe3lGVGyDZlDBAtrqD7xViP8I9eg==

Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=y8d1l6St6MzD8+jGkECAVVihM+oEIma/hZRSPwbQPNtlJ40CmtSSSQEKCEyBAiOlQQxmWVfMOvv2+BGILSFXs6GGdv4S37kv4iMfdJQ/eQUPN6GehAXamVet1TohwOJj4IlpKPcki5mrXI5uQ4G9XzURaOJeILGybF5iqtrC9NTMc/bp+MzNUYBGQkSh2SOywYMuUcywDcMzfa67LTvtjDzF0UYZ+YUop6NLsNZfCmu5K9eH+suwO/0z8BXPO5mrVmQjynt3WzOGqAGJR77tc4d19RTf/o5oSR+T38IQ8TwIQA9JsO55FUmxEQyZfM9KTvAKUzytgPFmza5Wg3gGBA==

Cc: linux-kernel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, nvdimm@xxxxxxxxxxxxxxx, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, Matthew Wilcox <willy@xxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Christian Brauner <brauner@xxxxxxxxxx>, "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>, Vlastimil Babka <vbabka@xxxxxxx>, Mike Rapoport <rppt@xxxxxxxxxx>, Suren Baghdasaryan <surenb@xxxxxxxxxx>, Michal Hocko <mhocko@xxxxxxxx>, Zi Yan <ziy@xxxxxxxxxx>, Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>, Nico Pache <npache@xxxxxxxxxx>, Ryan Roberts <ryan.roberts@xxxxxxx>, Dev Jain <dev.jain@xxxxxxx>, Barry Song <baohua@xxxxxxxxxx>, Jann Horn <jannh@xxxxxxxxxx>, Pedro Falcato <pfalcato@xxxxxxx>, Hugh Dickins <hughd@xxxxxxxxxx>, Oscar Salvador <osalvador@xxxxxxx>, Lance Yang <lance.yang@xxxxxxxxx>

Delivery-date: Fri, 18 Jul 2025 12:36:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jul 17, 2025 at 10:12:37PM +0200, David Hildenbrand wrote: > > > > > > -/* > > > - * vm_normal_page -- This function gets the "struct page" associated > > > with a pte. > > > +/** > > > + * vm_normal_page_pfn() - Get the "struct page" associated with a PFN in > > > a > > > + * non-special page table entry. > > > > This is a bit nebulous/confusing, I mean you'll get PTE entries with PTE > > special > > bit that'll have a PFN but just no struct page/folio to look at, or should > > not > > be touched. > > > > So the _pfn() bit doesn't really properly describe what it does. > > > > I wonder if it'd be better to just separate out the special handler, have > > that return a boolean indicating special of either form, and then separate > > other shared code separately from that? > > Let me think about that; I played with various approaches and this was the > best I was come up with before running in circles. Thanks > > > > > > + * @vma: The VMA mapping the @pfn. > > > + * @addr: The address where the @pfn is mapped. > > > + * @pfn: The PFN. > > > + * @entry: The page table entry value for error reporting purposes. > > > * > > > * "Special" mappings do not wish to be associated with a "struct page" > > > (either > > > * it doesn't exist, or it exists but they don't want to touch it). In > > > this > > > @@ -603,10 +608,10 @@ static void print_bad_page_map(struct > > > vm_area_struct *vma, > > > * (such as GUP) can still identify these mappings and work with the > > > * underlying "struct page". > > > * > > > - * There are 2 broad cases. Firstly, an architecture may define a > > > pte_special() > > > - * pte bit, in which case this function is trivial. Secondly, an > > > architecture > > > - * may not have a spare pte bit, which requires a more complicated > > > scheme, > > > - * described below. > > > + * There are 2 broad cases. Firstly, an architecture may define a > > > "special" > > > + * page table entry bit (e.g., pte_special()), in which case this > > > function is > > > + * trivial. Secondly, an architecture may not have a spare page table > > > + * entry bit, which requires a more complicated scheme, described below. > > > > Strikes me this bit of the comment should be with vm_normal_page(). As this > > implies the 2 broad cases are handled here and this isn't the case. > > Well, pragmatism. Splitting up the doc doesn't make sense. Having it at > vm_normal_page() doesn't make sense. > > I'm sure the educated reader will be able to make sense of it :P > > But I'm happy to hear suggestions on how to do it differently :) Right yeah. I feel like having separate 'special' handling for each case as separate functions, each with their own specific explanation would work. But I don't want to hold up the series _too_ much on this, generally I just find the _pfn thing confusing. I mean the implementation is a total pain anyway... I feel like we could even have separate special handling functions like #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL /* * < description of pte special special page > * * If returns true, then pagep set to NULL or, if a page can be found, that * page. * */ static struct bool is_special_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, struct page **pagep) { unsigned long pfn = pte_pfn(pte); if (likely(!pte_special(pte))) { if (pfn <= highest_memmap_pfn) return false; goto bad; } if (vma->vm_ops && vma->vm_ops->find_special_page) { *pagep = vma->vm_ops->find_special_page(vma, addr); return true; } else if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) { goto special; } if (is_zero_pfn(pfn)) goto special; /* If we reach here something's gone wrong. */ bad: print_bad_pte(vma, addr, pte, NULL); special: *pagep = NULL; return true; } #else /* * < description for not-pte special special page > */ static struct bool is_special_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, struct page **pagep) { unsigned long pfn = pte_pfn(pte); if (is_zero_pfn(pfn)) goto special; if (vma->vm_flags & VM_MIXEDMAP) { if (!pfn_valid(pfn) || is_zero_pfn(pfn)) goto special; } else if (vma->vm_flags & VM_PFNMAP) { unsigned long off; off = (addr - vma->vm_start) >> PAGE_SHIFT; if (pfn == vma->vm_pgoff + off) goto special; /* Hell's bells we allow CoW !arch_has_pte_special of PFN pages! help! */ if (!is_cow_mapping(vma->vm_flags)) goto special; } if (pfn > highest_memmap_pfn) { print_bad_pte(vma, addr, pte, NULL); goto special; } return false; special: *pagep = NULL; return true; } #endif And then obviously invoke as makes sense... This is rough and untested, just to give a sense :>) > > > > > > * > > > * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always > > > considered a > > > * special mapping (even if there are underlying and valid "struct > > > pages"). > > > @@ -639,15 +644,72 @@ static void print_bad_page_map(struct > > > vm_area_struct *vma, > > > * don't have to follow the strict linearity rule of PFNMAP mappings in > > > * order to support COWable mappings. > > > * > > > + * This function is not expected to be called for obviously special > > > mappings: > > > + * when the page table entry has the "special" bit set. > > > > Hmm this is is a bit weird though, saying "obviously" special, because > > you're > > handling "special" mappings here, but only for architectures that don't > > specify > > the PTE special bit. > > > > So it makes it quite nebulous what constitutes 'obviously' here, really you > > mean > > pte_special(). > > Yes, I can clarify that. Thanks! > > > > > > + * > > > + * Return: Returns the "struct page" if this is a "normal" mapping. > > > Returns > > > + * NULL if this is a "special" mapping. > > > + */ > > > +static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma, > > > + unsigned long addr, unsigned long pfn, unsigned long long entry) > > > +{ > > > + /* > > > + * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table mappings > > > + * (incl. shared zero folios) are marked accordingly and are handled > > > + * by the caller. > > > + */ > > > + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { > > > + if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) { > > > + if (vma->vm_flags & VM_MIXEDMAP) { > > > + /* If it has a "struct page", it's "normal". */ > > > + if (!pfn_valid(pfn)) > > > + return NULL; > > > + } else { > > > + unsigned long off = (addr - vma->vm_start) >> > > > PAGE_SHIFT; > > > + > > > + /* Only CoW'ed anon folios are "normal". */ > > > + if (pfn == vma->vm_pgoff + off) > > > + return NULL; > > > + if (!is_cow_mapping(vma->vm_flags)) > > > + return NULL; > > > + } > > > + } > > > + > > > + if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn)) > > > > This handles zero/zero huge page handling for non-pte_special() case > > only. I wonder if we even need to bother having these marked special > > generally since you can just check the PFN every time anyway. > > Well, that makes (a) pte_special() a bit weird -- not set for some special > pages and (b) requires additional runtime checks for the case we all really > care about -- pte_special(). > > So I don't think we should change that. OK, best to be consistent in setting for special pages. > > [...] > > > > > > > +/** > > > + * vm_normal_folio() - Get the "struct folio" associated with a PTE > > > + * @vma: The VMA mapping the @pte. > > > + * @addr: The address where the @pte is mapped. > > > + * @pte: The PTE. > > > + * > > > + * Get the "struct folio" associated with a PTE. See vm_normal_page_pfn() > > > + * for details. > > > + * > > > + * Return: Returns the "struct folio" if this is a "normal" mapping. > > > Returns > > > + * NULL if this is a "special" mapping. > > > + */ > > > > Nice to add a comment, but again feels weird to have the whole explanation > > in > > vm_normal_page_pfn() but then to invoke vm_normal_page().. > > You want people to do pointer chasing to find what they are looking for? :) Yes. Only joking :P Hopefully the ideas mentioned above clarify things... a bit maybe... :>) > > -- > Cheers, > > David / dhildenb >

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.