Xen project Mailing List

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

To: "Christopher S. Aker" <caker@xxxxxxxxxxxx>, Shaun R <mailinglists@xxxxxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>

From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Date: Thu, 22 Sep 2011 14:32:32 -0400

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>

Delivery-date: Thu, 22 Sep 2011 11:38:35 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> >I'd bet the dereference corresponds to the "*map" in that same place but
> >Peter can you convert that address to a line of code please?
>
> root@build:/build/xen/domU/i386/3.0.0-linode35-debug# gdb vmlinux
> GNU gdb (GDB) 7.1-ubuntu
> (...snip...)
> Reading symbols from
> /build/xen/domU/i386/3.0.0-linode35-debug/vmlinux...done.
> (gdb) list *0xc01ab854
> 0xc01ab854 is in swap_count_continued (mm/swapfile.c:2493).
> 2488
> 2489            if (count == (SWAP_MAP_MAX | COUNT_CONTINUED)) { /* incrementing */
> 2490                    /*
> 2491                     * Think of how you add 1 to 999
> 2492                     */
> 2493                    while (*map == (SWAP_CONT_MAX | COUNT_CONTINUED)) {
> 2494                            kunmap_atomic(map, KM_USER0);
> 2495                            page = list_entry(page->lru.next, struct page, lru);
> 2496                            BUG_ON(page == head);
> 2497                            map = kmap_atomic(page, KM_USER0) + offset;
> (gdb)
>
> >map came from a kmap_atomic() not far before this point so it appears
> >that it is mapping the wrong page (so *map != 0) and/or mapping a
> >non-existent page (leading to the fault).

First off, thanks to Jeremy for his help on this one, and to Shaun R for lending me one of his boxes with an environment in which to easily test it.

The problem looks to be that in copy_page_range we turn lazy MMU mode on, and then in swap_entry_free we call swap_count_continued, which ends up in:

        map = kmap_atomic(page, KM_USER0) + offset;

and then later touches *map.

Basically, we are forking a process and copying pages that are also "swap" pages. We don't need to access the user pages immediately, but we do for the swap pages, as we need proper reference counting.

Well, since we are running in batched (lazy) mode, the PTE write done by kmap_atomic is only queued and not applied synchronously, so we end up dereferencing an address whose mapping has not been set up yet.

Looking at kmap_atomic_prot_pfn, it calls arch_flush_lazy_mmu_mode(), and sprinkling that into kmap_atomic_prot and __kunmap_atomic seems to make the problem go away.

This is the patch that looks to be doing the trick.
Please double check that it fixes the problem in your setups.

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index b499626..f4f29b1 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -45,6 +45,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
+	arch_flush_lazy_mmu_mode();
 
 	return (void *)vaddr;
 }
@@ -88,6 +89,7 @@ void __kunmap_atomic(void *kvaddr)
 		 */
 		kpte_clear_flush(kmap_pte-idx, vaddr);
 		kmap_atomic_idx_pop();
+		arch_flush_lazy_mmu_mode();
 	}
 #ifdef CONFIG_DEBUG_HIGHMEM
 	else {

Attachment: flush.patch
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
