[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v1 6/9] mm/memory: convert print_bad_pte() to print_bad_page_map()
On 15.07.25 15:23, David Hildenbrand wrote: print_bad_pte() looks like something that should actually be a WARN or similar, but historically it apparently has proven to be useful to detect corruption of page tables even on production systems -- report the issue and keep the system running to make it easier to actually detect what is going wrong (e.g., multiple such messages might shed a light). As we want to unify vm_normal_page_*() handling for PTE/PMD/PUD, we'll have to take care of print_bad_pte() as well. Let's prepare for using print_bad_pte() also for non-PTEs by adjusting the implementation and renaming the function -- we'll rename it to what we actually print: bad (page) mappings. Maybe it should be called "print_bad_table_entry()"? We'll just call it "print_bad_page_map()" because the assumption is that we are dealing with some (previously) present page table entry that got corrupted in weird ways. Whether it is a PTE or something else will usually become obvious from the page table dump or from the dumped stack. If ever required in the future, we could pass the entry level type similar to "enum rmap_level". For now, let's keep it simple. To make the function a bit more readable, factor out the ratelimit check into is_bad_page_map_ratelimited() and place the dumping of page table content into __dump_bad_page_map_pgtable(). We'll now dump information from each level in a single line, and just stop the table walk once we hit something that is not a present page table. Use print_bad_page_map() in vm_normal_page_pmd() similar to how we do it for vm_normal_page(), now that we have a function that can handle it. The report will now look something like (dumping pgd to pmd values): [ 77.943408] BUG: Bad page map in process XXX entry:80000001233f5867 [ 77.944077] addr:00007fd84bb1c000 vm_flags:08100071 anon_vma: ... [ 77.945186] pgd:10a89f067 p4d:10a89f067 pud:10e5a2067 pmd:105327067 Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> --- mm/memory.c | 120 ++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 94 insertions(+), 26 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index a4f62923b961c..00ee0df020503 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -479,22 +479,8 @@ static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss) add_mm_counter(mm, i, rss[i]); }-/*- * This function is called to print an error when a bad pte - * is found. For example, we might have a PFN-mapped pte in - * a region that doesn't allow it. - * - * The calling function must still handle the error. - */ -static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, - pte_t pte, struct page *page) +static bool is_bad_page_map_ratelimited(void) { - pgd_t *pgd = pgd_offset(vma->vm_mm, addr); - p4d_t *p4d = p4d_offset(pgd, addr); - pud_t *pud = pud_offset(p4d, addr); - pmd_t *pmd = pmd_offset(pud, addr); - struct address_space *mapping; - pgoff_t index; static unsigned long resume; static unsigned long nr_shown; static unsigned long nr_unshown; @@ -506,7 +492,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, if (nr_shown == 60) { if (time_before(jiffies, resume)) { nr_unshown++; - return; + return true; } if (nr_unshown) { pr_alert("BUG: Bad page map: %lu messages suppressed\n", @@ -517,15 +503,87 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, } if (nr_shown++ == 0) resume = jiffies + 60 * HZ; + return false; +} + +static void __dump_bad_page_map_pgtable(struct mm_struct *mm, unsigned long addr) +{ + unsigned long long pgdv, p4dv, pudv, pmdv; + pgd_t pgd, *pgdp; + p4d_t p4d, *p4dp; + pud_t pud, *pudp; + pmd_t *pmdp; + + /* + * This looks like a fully lockless walk, however, the caller is + * expected to hold the leaf page table lock in addition to other + * rmap/mm/vma locks. So this is just a re-walk to dump page table + * content while any concurrent modifications should be completely + * prevented. + */ + pgdp = pgd_offset(mm, addr); + pgd = pgdp_get(pgdp); + pgdv = pgd_val(pgd); Apparently there is something weird here on arm-bcm2835_defconfig: All errors (new ones prefixed by >>):>> mm/memory.c:525:6: error: array type 'pgd_t' (aka 'unsigned int[2]') is not assignable 525 | pgd = pgdp_get(pgdp); | ~~~ ^ 1 error generated. ... apparently because we have this questionable ... arch/arm/include/asm/pgtable-2level-types.h:typedef pmdval_t pgd_t[2];I mean, the whole concept of pgdp_get() doesn't make too much sense if it wants to return an array. I don't quite understand the "#undef STRICT_MM_TYPECHECKS #ifdef STRICT_MM_TYPECHECKS" stuff. Why do we want to make it easier on the compiler while doing something fairly weird? CCing arm maintainers: what's going on here? :)An easy fix would be to not dump the pgd value, but having a non-functional pgdp_get() really is weird. -- Cheers, David / dhildenb
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |