
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.


  • To: Andreas Olsowski <andreas.olsowski@xxxxxxxxxxx>
  • From: Teck Choon Giam <giamteckchoon@xxxxxxxxx>
  • Date: Mon, 28 Mar 2011 20:29:22 +0800
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • Delivery-date: Mon, 28 Mar 2011 05:29:57 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Mon, Mar 28, 2011 at 7:37 PM, Andreas Olsowski
<andreas.olsowski@xxxxxxxxxxx> wrote:
>
>>  - turn on CONFIG_DEBUG_PAGEALLOC
>>  - turn on CONFIG_DEBUG_LIST
>>  - turn on CONFIG_DEBUG_KMEMLEAK
>>  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>>  - turn on CONFIG_SLUB_DEBUG_ON
>
> After I enabled those options (I don't use SLUB, I use SLAB) I no longer
> encounter any errors.
>
> I completed 1000 loops of snapshot/mount/umount/removesnapshot.

Did you try with just CONFIG_DEBUG_PAGEALLOC=y, leaving the rest of
your config unchanged?  All my testing narrows it down to
CONFIG_DEBUG_PAGEALLOC=y being the option that prevents this BUG.
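
To compare two builds, it may help to list which of the debug options
quoted above are actually set in each one. A minimal sketch in Python,
assuming the kernel .config text is at hand (for example read from
/boot/config-<version>, which is an assumption about the install
layout); the helper name enabled_options is hypothetical:

```python
# Debug options mentioned earlier in this thread.
DEBUG_OPTIONS = [
    "CONFIG_DEBUG_PAGEALLOC",
    "CONFIG_DEBUG_LIST",
    "CONFIG_DEBUG_KMEMLEAK",
    "CONFIG_JBD_DEBUG",
    "CONFIG_JBD2_DEBUG",
    "CONFIG_SLUB_DEBUG_ON",
]


def enabled_options(config_text, options=DEBUG_OPTIONS):
    """Return the options from `options` that are set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in options if opt in enabled]


if __name__ == "__main__":
    sample = "CONFIG_DEBUG_PAGEALLOC=y\n# CONFIG_DEBUG_LIST is not set\n"
    print(enabled_options(sample))
```

Running it against the .config of a kernel that crashes and one that
does not would show whether CONFIG_DEBUG_PAGEALLOC is the only
difference among these options.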

>
>
> Without those options in 2.6.32.35 I hit a different bug earlier today:
>
> But you really have to be patient to see some output, because lvremove will
> hang quite a while:
> (a "while" being roughly the time it takes to: wait 5 min for
> error, leave office, get coffee, smoke cigarette, go to restroom, return to
> office, finally see error)
>
> kernel: BUG: unable to handle kernel paging request
> ...
> kernel: RIP  [<ffffffff8100f2bf>] xen_set_pmd+0x2f/0xb0
> syslog/dmesg output is attached as crash.2.6.32.35-xen_01 or available at:
> http://pastebin.com/Ad8MhUzD

I hit this before:

# grep 'xen_set_pmd' /var/log/messages*
/var/log/messages:Mar 27 09:31:14 xen05 kernel: IP:
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP:
e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:06:10 xen05 kernel: IP:
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP:
e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 15:18:57 xen05 kernel: IP:
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP:
e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages.1:Mar 23 11:00:16 xen05 kernel: IP:
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages.1:Mar 23 11:00:16 xen05 kernel: RIP:
e030:[<ffffffff8100e2d4>]  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages.1:Mar 23 11:00:17 xen05 kernel: RIP
[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b

But I am unable to reproduce it when CONFIG_DEBUG_PAGEALLOC=y.
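
For longer logs, counting oopses per faulting function is easier than
reading raw grep output. A rough sketch in Python, assuming the log
text uses the same "kernel: RIP  [<addr>] symbol+0xoff/0xlen" summary
line as above (possibly wrapped by the mailer); the function name
count_rip_symbols is hypothetical:

```python
import re
from collections import Counter

# Matches only the final oops summary line, e.g.
#   kernel: RIP  [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
# "\s+" also spans a line break, so mailer-wrapped lines still match.
# The "IP:" and "RIP: e030:..." lines are skipped because of the colon.
RIP_RE = re.compile(
    r"kernel: RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def count_rip_symbols(log_text):
    """Count oopses per faulting symbol in syslog text."""
    return Counter(RIP_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "Mar 27 09:31:14 xen05 kernel: IP:\n"
        "[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b\n"
        "Mar 27 09:31:14 xen05 kernel: RIP\n"
        "[<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b\n"
    )
    for symbol, hits in count_rip_symbols(sample).items():
        print(symbol, hits)
```

Each oops contributes one summary line, so the count per symbol is the
number of crashes attributed to that function.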

>
> After that happened I did a kernel recompile without rebooting the machine
> first and encountered system_call_fastpath as the last call once more, as shown
> in crash.2.6.32.35-xen_02 or http://pastebin.com/kB38W5mp

I hit this at least once, but I am unable to when CONFIG_DEBUG_PAGEALLOC=y:

/var/log/messages-Mar 27 17:04:39 xen05 kernel: ------------[ cut here
]------------
/var/log/messages-Mar 27 17:04:39 xen05 kernel: kernel BUG at
arch/x86/xen/mmu.c:1872!
/var/log/messages-Mar 27 17:04:39 xen05 kernel: invalid opcode: 0000 [#1] SMP
/var/log/messages-Mar 27 17:04:39 xen05 kernel: last sysfs file:
/sys/block/sdd/dev
/var/log/messages-Mar 27 17:04:39 xen05 kernel: CPU 2
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Modules linked in:
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter
ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6
cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi
dm_multipath scsi_dh video backlight output sbs sbshc power_meter
hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp
parport tg3 libphy sg ide_cd_mod cdrom serio_raw button tpm_tis tpm
tpm_bios i2c_i801 i2c_core shpchp iTCO_wdt pcspkr dm_snapshot dm_zero
dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod
raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Pid: 5874, comm:
lvcreate Not tainted 2.6.32.35-4.xen.pvops.choon.centos5 #1 PowerEdge
860
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP:
e030:[<ffffffff8100cb5b>]  [<ffffffff8100cb5b>]
pin_pagetable_pfn+0x53/0x59
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RSP:
e02b:ffff8800303d1c28  EFLAGS: 00010282
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RAX: 00000000ffffffea
RBX: 000000000003032d RCX: 0000000000000181
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RDX: 00000000deadbeef
RSI: 00000000deadbeef RDI: 00000000deadbeef
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RBP: ffff8800303d1c48
R08: 0000000000000968 R09: ffff880000000000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: R10: 00000000deadbeef
R11: ffff8800303d1d08 R12: 0000000000000003
/var/log/messages-Mar 27 17:04:39 xen05 kernel: R13: 000000000003032d
R14: ffff880030360000 R15: 00007fd324a00000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: FS:
00007fd327d2e710(0000) GS:ffff880028089000(0000)
knlGS:0000000000000000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: CS:  e033 DS: 0000 ES:
0000 CR0: 000000008005003b
/var/log/messages-Mar 27 17:04:39 xen05 kernel: CR2: 00000000004612f0
CR3: 000000003a025000 CR4: 0000000000002660
/var/log/messages-Mar 27 17:04:39 xen05 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Process lvcreate (pid:
5874, threadinfo ffff8800303d0000, task ffff880030360000)
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Stack:
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  0000000000000000
00000000002027a9 000000013eb43318 000000000003032d
/var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c68
ffffffff8100e07c ffff880032be05c0 ffff880032aa9928
/var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c78
ffffffff8100e0af ffff8800303d1cb8 ffffffff810a4433
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Call Trace:
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8100e07c>]
xen_alloc_ptpage+0x64/0x69
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8100e0af>]
xen_alloc_pte+0xe/0x10
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a4433>]
__pte_alloc+0x70/0xce
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a45d1>]
handle_mm_fault+0x140/0x8b9
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a50c9>]
__get_user_pages+0x37f/0x479
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a76ca>]
__mlock_vma_pages_range+0xc0/0x16f
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff8131c03f>]
? _spin_unlock_irqrestore+0x11/0x13
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a78db>]
mlock_fixup+0x162/0x199
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a7989>]
do_mlockall+0x77/0x8d
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff81139016>]
? security_capable+0x27/0x29
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  [<ffffffff810a7ce2>]
sys_mlockall+0x8f/0xb9
/var/log/messages:Mar 27 17:04:39 xen05 kernel:  [<ffffffff81012ac2>]
system_call_fastpath+0x16/0x1b
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Code: 48 b8 ff ff ff
ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2
41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40
f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP
[<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages-Mar 27 17:04:39 xen05 kernel:  RSP <ffff8800303d1c28>
/var/log/messages-Mar 27 17:04:39 xen05 kernel: ---[ end trace
bf36c55d2ecd52e5 ]---
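
One possible reading of the register dump, offered as an assumption
rather than something established in this thread: the Code bytes just
before the trap (85 c0 74 04 <0f> 0b) look like a test of EAX followed
by ud2, so RAX may hold the return value of the preceding call.
Interpreting its low 32 bits as a signed errno gives the following
(the helper name decode_eax is hypothetical):

```python
import errno


def decode_eax(rax_hex):
    """Interpret the low 32 bits of RAX as a signed int and name the errno."""
    value = int(rax_hex, 16) & 0xFFFFFFFF
    if value >= 0x80000000:
        value -= 1 << 32  # two's complement: convert to a negative number
    name = errno.errorcode.get(-value) if value < 0 else None
    return value, name


if __name__ == "__main__":
    # RAX value taken from the oops above.
    print(decode_eax("00000000ffffffea"))
```

If that reading is right, the call returned -22 (EINVAL) just before
the BUG in pin_pagetable_pfn; whether that is the pin hypercall
rejecting the page is not confirmed here.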

>
>
> Maybe this helps, but I think, if anything, this makes it worse, as the debug
> options actually suppressed the problem that needs to be debugged.

True.  At least now we have narrowed it down to something related to
CONFIG_DEBUG_PAGEALLOC.  Maybe Konrad or Jeremy can have a closer look
at the related code.

Thanks.

Kindest regards,
Giam Teck Choon

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

