WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] [PATCH v2] xen mmu: fix a race window causing leave_mm BUG()

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Ian Campbell (Ian.Campbell@xxxxxxxxxxxxx)" <Ian.Campbell@xxxxxxxxxxxxx>, "jeremy@xxxxxxxx" <jeremy@xxxxxxxx>
Subject: [Xen-devel] [PATCH v2] xen mmu: fix a race window causing leave_mm BUG()
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Thu, 12 May 2011 10:56:08 +0800
Accept-language: en-US
Acceptlanguage: en-US
Cc:
Delivery-date: Wed, 11 May 2011 19:57:53 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcwQT/H4/67Sdgl4Ta271b2c0NWN7A==
Thread-topic: [PATCH v2] xen mmu: fix a race window causing leave_mm BUG()
commit ba1707117fee7e712290c7bb3cd67085a55e01ce
Author: Kevin Tian <kevin.tian@xxxxxxxxx>
Date:   Thu May 12 10:01:19 2011 +0800

    xen mmu: fix a race window causing leave_mm BUG()
    
    there's a race window in xen_drop_mm_ref, where remote cpu may exit
    dirty bitmap between the check on this cpu and the point where remote
    cpu handles drop request. So in drop_other_mm_ref we need check
    whether TLB state is still lazy before calling into leave_mm. This
    bug is rarely observed in earlier kernel, but exaggerated by the
    commit 831d52bc153971b70e64eccfbed2b232394f22f8 which clears bitmap
    after changing the TLB state. the call trace is as below:
    
    ---------------------------------
    kernel BUG at arch/x86/mm/tlb.c:61!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
    CPU 1
    Modules linked in: 8021q garp xen_netback xen_blkback blktap 
blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler 
lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc 
lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer 
iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs 
pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last 
unloaded: freq_table]
    Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
    RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
    RSP: e02b:ffff88002805be48  EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
    RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
    R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
    R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
    FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
    CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000 00
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process khelper (pid: 25581, threadinfo ffff88007691e000, task 
ffff88009b92db40)
    Stack:
     ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
    <0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
    <0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
    Call Trace:
     <IRQ>
     [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
     [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
     [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
     [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
     [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
     [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
     [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
     [<ffffffff81013efe>] xen_do_hyper visor_callback+0x1e/0x30
     <EOI>
     [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
     [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
     [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
     [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
     [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
     [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
     [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
     [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
     [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
     [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
     [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
     [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
     [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
     [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
     [<ffffffff 8106fc45>] ? __call_usermodehelper+0x0/0x6f
     [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
     [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
     [<ffffffff81013daa>] ? child_rip+0xa/0x20
     [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
     [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
     [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
     [<ffffffff81013da0>] ? child_rip+0x0/0x20
    Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 
48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 
8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
    RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
     RSP <ffff88002805be48>
    ---[ end trace ce9cee6832a9c503 ]---
    
    thanks for Maoxiaoyun<tinnycloud@xxxxxxxxxxx> to verify it.
    
    Signed-off-by: Kevin Tian <kevin.tian@xxxxxxxxx>

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 55c965b..8b745fb 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1187,7 +1187,7 @@ static void drop_other_mm_ref(void *info)
 
        active_mm = percpu_read(cpu_tlbstate.active_mm);
 
-       if (active_mm == mm)
+       if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
                leave_mm(smp_processor_id());
 
        /* If this cpu still has a stale cr3 reference, then make sure

Attachment: 20100512_fix_leave_mm_bug_v2.patch
Description: 20100512_fix_leave_mm_bug_v2.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
<Prev in Thread] Current Thread [Next in Thread>
  • [Xen-devel] [PATCH v2] xen mmu: fix a race window causing leave_mm BUG(), Tian, Kevin <=