
Re: [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*()



On Thu, Jul 17, 2025 at 01:52:10PM +0200, David Hildenbrand wrote:
> Let's reduce the code duplication and factor out the non-pte/pmd related
> magic into vm_normal_page_pfn().
>
> To keep it simpler, check the pfn against both zero folios. We could
> optimize this, but as it's only for the !CONFIG_ARCH_HAS_PTE_SPECIAL
> case, it's not a compelling micro-optimization.
>
> With CONFIG_ARCH_HAS_PTE_SPECIAL we don't have to check anything else,
> really.
>
> It's a good question if we can even hit the !CONFIG_ARCH_HAS_PTE_SPECIAL
> scenario in the PMD case in practice: but doesn't really matter, as
> it's now all unified in vm_normal_page_pfn().
>
> Add kerneldoc for all involved functions.
>
> No functional change intended.
>
> Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
>  mm/memory.c | 183 +++++++++++++++++++++++++++++++---------------------
>  1 file changed, 109 insertions(+), 74 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 08d16ed7b4cc7..c43ae5e4d7644 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -590,8 +590,13 @@ static void print_bad_page_map(struct vm_area_struct *vma,
>       add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
>
> -/*
> - * vm_normal_page -- This function gets the "struct page" associated with a pte.
> +/**
> + * vm_normal_page_pfn() - Get the "struct page" associated with a PFN in a
> + *                     non-special page table entry.

This is a bit nebulous/confusing - you'll get PTE entries with the PTE special
bit set that have a PFN but no struct page/folio to look at, or that should
not be touched.

So the _pfn() bit doesn't properly describe what the function does.

I wonder if it'd be better to just separate out the special handler, have
that return a boolean indicating special of either form, and then factor out
the other shared code separately from that?
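
Something like the below is the shape I have in mind - a completely untested
sketch, and the vm_normal_page_pfn_special() name is made up, just to
illustrate:

static bool vm_normal_page_pfn_special(struct vm_area_struct *vma,
		unsigned long addr, unsigned long pfn)
{
	/*
	 * With CONFIG_ARCH_HAS_PTE_SPECIAL, the caller already filtered
	 * out entries with the special bit set, so nothing to do here.
	 */
	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
		return false;

	if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) {
		if (vma->vm_flags & VM_MIXEDMAP) {
			/* If it has a "struct page", it's "normal". */
			if (!pfn_valid(pfn))
				return true;
		} else {
			unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;

			/* Only CoW'ed anon folios are "normal". */
			if (pfn == vma->vm_pgoff + off)
				return true;
			if (!is_cow_mapping(vma->vm_flags))
				return true;
		}
	}

	/* The shared zero folios are "special" too. */
	return is_zero_pfn(pfn) || is_huge_zero_pfn(pfn);
}

Then the shared helper would be left with just the highest_memmap_pfn sanity
check + pfn_to_page(), and wouldn't need to know about special mappings at
all.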

> + * @vma: The VMA mapping the @pfn.
> + * @addr: The address where the @pfn is mapped.
> + * @pfn: The PFN.
> + * @entry: The page table entry value for error reporting purposes.
>   *
>   * "Special" mappings do not wish to be associated with a "struct page" 
> (either
>   * it doesn't exist, or it exists but they don't want to touch it). In this
> @@ -603,10 +608,10 @@ static void print_bad_page_map(struct vm_area_struct *vma,
>   * (such as GUP) can still identify these mappings and work with the
>   * underlying "struct page".
>   *
> - * There are 2 broad cases. Firstly, an architecture may define a pte_special()
> - * pte bit, in which case this function is trivial. Secondly, an architecture
> - * may not have a spare pte bit, which requires a more complicated scheme,
> - * described below.
> + * There are 2 broad cases. Firstly, an architecture may define a "special"
> + * page table entry bit (e.g., pte_special()), in which case this function is
> + * trivial. Secondly, an architecture may not have a spare page table
> + * entry bit, which requires a more complicated scheme, described below.

Strikes me this bit of the comment should be with vm_normal_page(), as
placing it here implies the 2 broad cases are handled here, which isn't the
case.

>   *
>   * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a
>   * special mapping (even if there are underlying and valid "struct pages").
> @@ -639,15 +644,72 @@ static void print_bad_page_map(struct vm_area_struct *vma,
>   * don't have to follow the strict linearity rule of PFNMAP mappings in
>   * order to support COWable mappings.
>   *
> + * This function is not expected to be called for obviously special mappings:
> + * when the page table entry has the "special" bit set.

Hmm, this is a bit weird though, saying "obviously" special, because you're
handling "special" mappings here, but only for architectures that don't
specify the PTE special bit.

So it makes it quite nebulous what constitutes 'obviously' here - really you
mean pte_special().

> + *
> + * Return: Returns the "struct page" if this is a "normal" mapping. Returns
> + *      NULL if this is a "special" mapping.
> + */
> +static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma,
> +             unsigned long addr, unsigned long pfn, unsigned long long entry)
> +{
> +     /*
> +      * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table mappings
> +      * (incl. shared zero folios) are marked accordingly and are handled
> +      * by the caller.
> +      */
> +     if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
> +             if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) {
> +                     if (vma->vm_flags & VM_MIXEDMAP) {
> +                             /* If it has a "struct page", it's "normal". */
> +                             if (!pfn_valid(pfn))
> +                                     return NULL;
> +                     } else {
> +                             unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
> +
> +                             /* Only CoW'ed anon folios are "normal". */
> +                             if (pfn == vma->vm_pgoff + off)
> +                                     return NULL;
> +                             if (!is_cow_mapping(vma->vm_flags))
> +                                     return NULL;
> +                     }
> +             }
> +
> +             if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))

This handles the zero/huge zero page for the non-pte_special() case only. I
wonder if we even need to bother marking these special generally, since you
can just check the PFN every time anyway.
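
i.e. (untested) just check it unconditionally, outside the
IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) block:

	/*
	 * The shared zero folios never have a "normal" page to hand out,
	 * regardless of whether the arch marks them special.
	 */
	if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))
		return NULL;

Which would also let you drop the VM_WARN_ON_ONCE() further down.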

> +                     return NULL;
> +     }
> +
> +     /* Cheap check for corrupted page table entries. */
> +     if (pfn > highest_memmap_pfn) {
> +             print_bad_page_map(vma, addr, entry, NULL);
> +             return NULL;
> +     }
> +     /*
> +      * NOTE! We still have PageReserved() pages in the page tables.
> +      * For example, VDSO mappings can cause them to exist.
> +      */
> +     VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));
> +     return pfn_to_page(pfn);
> +}
> +
> +/**
> + * vm_normal_page() - Get the "struct page" associated with a PTE
> + * @vma: The VMA mapping the @pte.
> + * @addr: The address where the @pte is mapped.
> + * @pte: The PTE.
> + *
> + * Get the "struct page" associated with a PTE. See vm_normal_page_pfn()
> + * for details.
> + *
> + * Return: Returns the "struct page" if this is a "normal" mapping. Returns
> + *      NULL if this is a "special" mapping.
>   */
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>                           pte_t pte)
>  {
>       unsigned long pfn = pte_pfn(pte);
>
> -     if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
> -             if (likely(!pte_special(pte)))
> -                     goto check_pfn;
> +     if (unlikely(pte_special(pte))) {
>               if (vma->vm_ops && vma->vm_ops->find_special_page)
>                       return vma->vm_ops->find_special_page(vma, addr);
>               if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> @@ -658,44 +720,21 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>               print_bad_page_map(vma, addr, pte_val(pte), NULL);
>               return NULL;
>       }
> -
> -     /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */
> -
> -     if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
> -             if (vma->vm_flags & VM_MIXEDMAP) {
> -                     if (!pfn_valid(pfn))
> -                             return NULL;
> -                     if (is_zero_pfn(pfn))
> -                             return NULL;
> -                     goto out;
> -             } else {
> -                     unsigned long off;
> -                     off = (addr - vma->vm_start) >> PAGE_SHIFT;
> -                     if (pfn == vma->vm_pgoff + off)
> -                             return NULL;
> -                     if (!is_cow_mapping(vma->vm_flags))
> -                             return NULL;
> -             }
> -     }
> -
> -     if (is_zero_pfn(pfn))
> -             return NULL;
> -
> -check_pfn:
> -     if (unlikely(pfn > highest_memmap_pfn)) {
> -             print_bad_page_map(vma, addr, pte_val(pte), NULL);
> -             return NULL;
> -     }
> -
> -     /*
> -      * NOTE! We still have PageReserved() pages in the page tables.
> -      * eg. VDSO mappings can cause them to exist.
> -      */
> -out:
> -     VM_WARN_ON_ONCE(is_zero_pfn(pfn));
> -     return pfn_to_page(pfn);
> +     return vm_normal_page_pfn(vma, addr, pfn, pte_val(pte));
>  }
>
> +/**
> + * vm_normal_folio() - Get the "struct folio" associated with a PTE
> + * @vma: The VMA mapping the @pte.
> + * @addr: The address where the @pte is mapped.
> + * @pte: The PTE.
> + *
> + * Get the "struct folio" associated with a PTE. See vm_normal_page_pfn()
> + * for details.
> + *
> + * Return: Returns the "struct folio" if this is a "normal" mapping. Returns
> + *      NULL if this is a "special" mapping.
> + */

Nice to add a comment, but again it feels weird to have the whole explanation
in vm_normal_page_pfn() when this function actually invokes vm_normal_page()...

>  struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
>                           pte_t pte)
>  {
> @@ -707,6 +746,18 @@ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
>  }
>
>  #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
> +/**
> + * vm_normal_page_pmd() - Get the "struct page" associated with a PMD
> + * @vma: The VMA mapping the @pmd.
> + * @addr: The address where the @pmd is mapped.
> + * @pmd: The PMD.
> + *
> + * Get the "struct page" associated with a PMD. See vm_normal_page_pfn()
> + * for details.
> + *
> + * Return: Returns the "struct page" if this is a "normal" mapping. Returns
> + *      NULL if this is a "special" mapping.
> + */
>  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>                               pmd_t pmd)
>  {
> @@ -721,37 +772,21 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>               print_bad_page_map(vma, addr, pmd_val(pmd), NULL);
>               return NULL;
>       }
> -
> -     if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
> -             if (vma->vm_flags & VM_MIXEDMAP) {
> -                     if (!pfn_valid(pfn))
> -                             return NULL;
> -                     goto out;
> -             } else {
> -                     unsigned long off;
> -                     off = (addr - vma->vm_start) >> PAGE_SHIFT;
> -                     if (pfn == vma->vm_pgoff + off)
> -                             return NULL;
> -                     if (!is_cow_mapping(vma->vm_flags))
> -                             return NULL;
> -             }
> -     }
> -
> -     if (is_huge_zero_pfn(pfn))
> -             return NULL;
> -     if (unlikely(pfn > highest_memmap_pfn)) {
> -             print_bad_page_map(vma, addr, pmd_val(pmd), NULL);
> -             return NULL;
> -     }
> -
> -     /*
> -      * NOTE! We still have PageReserved() pages in the page tables.
> -      * eg. VDSO mappings can cause them to exist.
> -      */
> -out:
> -     return pfn_to_page(pfn);
> +     return vm_normal_page_pfn(vma, addr, pfn, pmd_val(pmd));

Hmm, this seems broken, because you're now making these special on arches
with pte_special(), right? But then you're invoking the not-special function?

Also, for non-pte_special() arches you're kind of implying they _maybe_ could
be special.
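
With the split I suggested above, this call site could make the intent
explicit - again untested, using the made-up helper name from before:

	/* pmd_special() was already handled above. */
	if (vm_normal_page_pfn_special(vma, addr, pfn))
		return NULL;
	return vm_normal_page_pfn(vma, addr, pfn, pmd_val(pmd));

so the shared helper only ever does the sanity check + pfn_to_page().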


>  }
>
> +/**
> + * vm_normal_folio_pmd() - Get the "struct folio" associated with a PMD
> + * @vma: The VMA mapping the @pmd.
> + * @addr: The address where the @pmd is mapped.
> + * @pmd: The PMD.
> + *
> + * Get the "struct folio" associated with a PMD. See vm_normal_page_pfn()
> + * for details.
> + *
> + * Return: Returns the "struct folio" if this is a "normal" mapping. Returns
> + *      NULL if this is a "special" mapping.
> + */
>  struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
>                                 unsigned long addr, pmd_t pmd)
>  {
> --
> 2.50.1
>