WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH] x86/paravirt: Partially revert "remove lazy mode

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH] x86/paravirt: Partially revert "remove lazy mode in interrupts"
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Mon, 26 Sep 2011 09:22:21 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>, Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>, x86@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, Ingo Molnar <mingo@xxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, stable@xxxxxxxxxx
Delivery-date: Mon, 26 Sep 2011 11:24:29 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1317042797-19975-1-git-send-email-konrad.wilk@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1317042797-19975-1-git-send-email-konrad.wilk@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20110906 Thunderbird/6.0.2
On 09/26/2011 06:13 AM, Konrad Rzeszutek Wilk wrote:
> which has git commit b8bcfe997e46150fedcc3f5b26b846400122fdd9.
>
> The unintended consequence of removing the flushing of MMU
> updates when doing kmap_atomic (or kunmap_atomic) is that we can
> hit a dereference bug when processing a "fork()" under a heavy loaded
> machine. Specifically we can hit:

The patch is all OK, but I wouldn't have headlined it as a "partial
revert" - the important point is that the pte updates in k(un)map_atomic
need to be synchronous, regardless of whether we're in lazy_mmu mode.

The fact that b8bcfe997e4 introduced the problem is interesting to note,
but only somewhat relevant to the analysis of what's being fixed here.

    J

>
> BUG: unable to handle kernel paging request at f573fc8c
> IP: [<c01abc54>] swap_count_continued+0x104/0x180
> *pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000
> Oops: 0000 [#1] SMP
> Modules linked in:
> Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
> EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3
> EIP is at swap_count_continued+0x104/0x180
> .. snip..
> Call Trace:
>  [<c01ac222>] ? __swap_duplicate+0xc2/0x160
>  [<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0
>  [<c01ac2e4>] ? swap_duplicate+0x14/0x40
>  [<c01a0a6b>] ? copy_pte_range+0x45b/0x500
>  [<c01a0ca5>] ? copy_page_range+0x195/0x200
>  [<c01328c6>] ? dup_mmap+0x1c6/0x2c0
>  [<c0132cf8>] ? dup_mm+0xa8/0x130
>  [<c013376a>] ? copy_process+0x98a/0xb30
>  [<c013395f>] ? do_fork+0x4f/0x280
>  [<c01573b3>] ? getnstimeofday+0x43/0x100
>  [<c010f770>] ? sys_clone+0x30/0x40
>  [<c06c048d>] ? ptregs_clone+0x15/0x48
>  [<c06bfb71>] ? syscall_call+0x7/0xb
>
> The problem looks that in copy_page_range we turn lazy mode on, and then
> in swap_entry_free we call swap_count_continued which ends up in:
>
>          map = kmap_atomic(page, KM_USER0) + offset;
>
> and then later touches *map.
>
> Since we are running in batched mode (lazy) we don't actually set up the
> PTE mappings and the kmap_atomic is not done synchronously and ends up
> trying to dereference a page that has not been set.
>
> Looking at kmap_atomic_prot_pfn, it uses 'arch_flush_lazy_mmu_mode' and
> sprinkling that in kmap_atomic_prot and __kunmap_atomic makes the problem
> go away.
>
> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxxxxx>
> CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
> CC: x86@xxxxxxxxxx
> CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> CC: stable@xxxxxxxxxx
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
>  arch/x86/mm/highmem_32.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
> index b499626..f4f29b1 100644
> --- a/arch/x86/mm/highmem_32.c
> +++ b/arch/x86/mm/highmem_32.c
> @@ -45,6 +45,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
>       vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
>       BUG_ON(!pte_none(*(kmap_pte-idx)));
>       set_pte(kmap_pte-idx, mk_pte(page, prot));
> +     arch_flush_lazy_mmu_mode();
>  
>       return (void *)vaddr;
>  }
> @@ -88,6 +89,7 @@ void __kunmap_atomic(void *kvaddr)
>                */
>               kpte_clear_flush(kmap_pte-idx, vaddr);
>               kmap_atomic_idx_pop();
> +             arch_flush_lazy_mmu_mode();
>       }
>  #ifdef CONFIG_DEBUG_HIGHMEM
>       else {


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel