[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()
On Thu, Jul 17, 2025 at 10:12:37PM +0200, David Hildenbrand wrote: > > > > > > -/* > > > - * vm_normal_page -- This function gets the "struct page" associated > > > with a pte. > > > +/** > > > + * vm_normal_page_pfn() - Get the "struct page" associated with a PFN in > > > a > > > + * non-special page table entry. > > > > This is a bit nebulous/confusing, I mean you'll get PTE entries with PTE > > special > > bit that'll have a PFN but just no struct page/folio to look at, or should > > not > > be touched. > > > > So the _pfn() bit doesn't really properly describe what it does. > > > > I wonder if it'd be better to just separate out the special handler, have > > that return a boolean indicating special of either form, and then separate > > other shared code separately from that? > > Let me think about that; I played with various approaches and this was the > best I was come up with before running in circles. Thanks > > > > > > + * @vma: The VMA mapping the @pfn. > > > + * @addr: The address where the @pfn is mapped. > > > + * @pfn: The PFN. > > > + * @entry: The page table entry value for error reporting purposes. > > > * > > > * "Special" mappings do not wish to be associated with a "struct page" > > > (either > > > * it doesn't exist, or it exists but they don't want to touch it). In > > > this > > > @@ -603,10 +608,10 @@ static void print_bad_page_map(struct > > > vm_area_struct *vma, > > > * (such as GUP) can still identify these mappings and work with the > > > * underlying "struct page". > > > * > > > - * There are 2 broad cases. Firstly, an architecture may define a > > > pte_special() > > > - * pte bit, in which case this function is trivial. Secondly, an > > > architecture > > > - * may not have a spare pte bit, which requires a more complicated > > > scheme, > > > - * described below. > > > + * There are 2 broad cases. Firstly, an architecture may define a > > > "special" > > > + * page table entry bit (e.g., pte_special()), in which case this > > > function is > > > + * trivial. Secondly, an architecture may not have a spare page table > > > + * entry bit, which requires a more complicated scheme, described below. > > > > Strikes me this bit of the comment should be with vm_normal_page(). As this > > implies the 2 broad cases are handled here and this isn't the case. > > Well, pragmatism. Splitting up the doc doesn't make sense. Having it at > vm_normal_page() doesn't make sense. > > I'm sure the educated reader will be able to make sense of it :P > > But I'm happy to hear suggestions on how to do it differently :) Right yeah. I feel like having separate 'special' handling for each case as separate functions, each with their own specific explanation would work. But I don't want to hold up the series _too_ much on this, generally I just find the _pfn thing confusing. I mean the implementation is a total pain anyway... I feel like we could even have separate special handling functions like #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL /* * < description of pte special special page > * * If returns true, then pagep set to NULL or, if a page can be found, that * page. * */ static struct bool is_special_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, struct page **pagep) { unsigned long pfn = pte_pfn(pte); if (likely(!pte_special(pte))) { if (pfn <= highest_memmap_pfn) return false; goto bad; } if (vma->vm_ops && vma->vm_ops->find_special_page) { *pagep = vma->vm_ops->find_special_page(vma, addr); return true; } else if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) { goto special; } if (is_zero_pfn(pfn)) goto special; /* If we reach here something's gone wrong. */ bad: print_bad_pte(vma, addr, pte, NULL); special: *pagep = NULL; return true; } #else /* * < description for not-pte special special page > */ static struct bool is_special_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, struct page **pagep) { unsigned long pfn = pte_pfn(pte); if (is_zero_pfn(pfn)) goto special; if (vma->vm_flags & VM_MIXEDMAP) { if (!pfn_valid(pfn) || is_zero_pfn(pfn)) goto special; } else if (vma->vm_flags & VM_PFNMAP) { unsigned long off; off = (addr - vma->vm_start) >> PAGE_SHIFT; if (pfn == vma->vm_pgoff + off) goto special; /* Hell's bells we allow CoW !arch_has_pte_special of PFN pages! help! */ if (!is_cow_mapping(vma->vm_flags)) goto special; } if (pfn > highest_memmap_pfn) { print_bad_pte(vma, addr, pte, NULL); goto special; } return false; special: *pagep = NULL; return true; } #endif And then obviously invoke as makes sense... This is rough and untested, just to give a sense :>) > > > > > > * > > > * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always > > > considered a > > > * special mapping (even if there are underlying and valid "struct > > > pages"). > > > @@ -639,15 +644,72 @@ static void print_bad_page_map(struct > > > vm_area_struct *vma, > > > * don't have to follow the strict linearity rule of PFNMAP mappings in > > > * order to support COWable mappings. > > > * > > > + * This function is not expected to be called for obviously special > > > mappings: > > > + * when the page table entry has the "special" bit set. > > > > Hmm this is is a bit weird though, saying "obviously" special, because > > you're > > handling "special" mappings here, but only for architectures that don't > > specify > > the PTE special bit. > > > > So it makes it quite nebulous what constitutes 'obviously' here, really you > > mean > > pte_special(). > > Yes, I can clarify that. Thanks! > > > > > > + * > > > + * Return: Returns the "struct page" if this is a "normal" mapping. > > > Returns > > > + * NULL if this is a "special" mapping. > > > + */ > > > +static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma, > > > + unsigned long addr, unsigned long pfn, unsigned long long entry) > > > +{ > > > + /* > > > + * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table mappings > > > + * (incl. shared zero folios) are marked accordingly and are handled > > > + * by the caller. > > > + */ > > > + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { > > > + if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) { > > > + if (vma->vm_flags & VM_MIXEDMAP) { > > > + /* If it has a "struct page", it's "normal". */ > > > + if (!pfn_valid(pfn)) > > > + return NULL; > > > + } else { > > > + unsigned long off = (addr - vma->vm_start) >> > > > PAGE_SHIFT; > > > + > > > + /* Only CoW'ed anon folios are "normal". */ > > > + if (pfn == vma->vm_pgoff + off) > > > + return NULL; > > > + if (!is_cow_mapping(vma->vm_flags)) > > > + return NULL; > > > + } > > > + } > > > + > > > + if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn)) > > > > This handles zero/zero huge page handling for non-pte_special() case > > only. I wonder if we even need to bother having these marked special > > generally since you can just check the PFN every time anyway. > > Well, that makes (a) pte_special() a bit weird -- not set for some special > pages and (b) requires additional runtime checks for the case we all really > care about -- pte_special(). > > So I don't think we should change that. OK, best to be consistent in setting for special pages. > > [...] > > > > > > > +/** > > > + * vm_normal_folio() - Get the "struct folio" associated with a PTE > > > + * @vma: The VMA mapping the @pte. > > > + * @addr: The address where the @pte is mapped. > > > + * @pte: The PTE. > > > + * > > > + * Get the "struct folio" associated with a PTE. See vm_normal_page_pfn() > > > + * for details. > > > + * > > > + * Return: Returns the "struct folio" if this is a "normal" mapping. > > > Returns > > > + * NULL if this is a "special" mapping. > > > + */ > > > > Nice to add a comment, but again feels weird to have the whole explanation > > in > > vm_normal_page_pfn() but then to invoke vm_normal_page().. > > You want people to do pointer chasing to find what they are looking for? :) Yes. Only joking :P Hopefully the ideas mentioned above clarify things... a bit maybe... :>) > > -- > Cheers, > > David / dhildenb >
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |