[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC



On Mon, Oct 7, 2013 at 7:09 PM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
>
> Thanks for the hints! I run a kernel with pr_alert() for several times
> and here is the screen log. Note that this kernel is compiled with gcc
> 4.6.3 and the decoded code looks different than gcc 4.8.1

Ok, I think we have something.

> NULL pointer dereference at 0000000000000008
> [    3.901119] IP: [<ffffffff81098f60>] run_timer_softirq+0x126/0x1da
> [    3.901119] PGD 0
> [    3.901119] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [    3.901119] CPU: 1 PID: 22 Comm: kworker/1:1 Not tainted 
> 3.11.0-rc2-00010-gc817a67-dirty #5
> [    3.901119] Workqueue: events power_supply_changed_work
> [    3.901119] task: ffff8800002dc5c0 ti: ffff8800002de000 task.ti: 
> ffff8800002de000
> [    3.901119] RIP: 0010:[<ffffffff81098f60>]  [<ffffffff81098f60>] 
> run_timer_softirq+0x126/0x1da
> [    3.901119] RSP: 0000:ffff88000dd03ea8  EFLAGS: 00010002
> [    3.901119] RAX: ffff880006dc28a0 RBX: ffff88000015c000 RCX: 
> ffff88000dd03ec8
> [    3.901119] RDX: ffff880006dc2860 RSI: ffffffff810a1b7b RDI: 
> 0000000000000000
> [    3.901119] RBP: ffff88000dd03f08 R08: 0000000000000002 R09: 
> dead000000200200
> [    3.901119] R10: ffff88000015c018 R11: ffff8800002dc5c0 R12: 
> ffff88000dd03ec8
> [    3.901119] R13: ffff88000015d858 R14: ffff88000015d458 R15: 
> ffff88000015d058
> [    3.901119] FS:  0000000000000000(0000) GS:ffff88000dd00000(0000) 
> knlGS:0000000000000000
> [    3.901119] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    3.901119] CR2: 0000000000000008 CR3: 0000000001c47000 CR4: 
> 00000000000006a0
> [    3.901119] Stack:
> [    3.901119]  ffff88000dd0e1b0 ffff88000dd0e170 ffffffff810a1b7b 
> 000000c000000000
> [    3.901119]  ffff880006dc28a0 ffff88000dd0e300 ffff88000dd03ef8 
> ffff8800002dffd8
> [    3.901119]  ffffffff81c43088 0000000010000100 0000000000000141 
> 0000000000000001
> [    3.901119] Call Trace:
> [    3.901119]  <IRQ>
> [    3.901119]  [<ffffffff810a1b7b>] ? __queue_work+0x1ee/0x1ee
> [    3.901119]  [<ffffffff8109555f>] __do_softirq+0xc6/0x18f
> [    3.901119]  [<ffffffff81095745>] irq_exit+0x58/0x9e
> [    3.901119]  [<ffffffff81044e69>] smp_apic_timer_interrupt+0x31/0x3e
> [    3.901119]  [<ffffffff8185a772>] apic_timer_interrupt+0x72/0x80
> [    3.901119]  <EOI>
> [    3.901119]  [<ffffffff81859277>] ? retint_restore_args+0x13/0x13
> [    3.901119]  [<ffffffff810af994>] ? arch_local_irq_enable+0xb/0xd
> [    3.901119]  [<ffffffff810c707f>] ? trace_hardirqs_on+0xd/0xf
> [    3.901119]  [<ffffffff81857904>] preempt_schedule_irq+0x36/0x5a
> [    3.901119]  [<ffffffff818593a6>] retint_kernel+0x26/0x30
> [    3.901119]  [<ffffffff8108f47c>] ? arch_local_irq_restore+0x6/0xd
> [    3.901119]  [<ffffffff81090db8>] vprintk_emit+0x3f9/0x422
> [    3.901119]  [<ffffffff8121a640>] ? snprintf+0x34/0x36
> [    3.901119]  [<ffffffff81483ca2>] dev_vprintk_emit+0x1a3/0x1c9
> [    3.901119]  [<ffffffff81219385>] ? string.isra.3+0x3d/0xa2
> [    3.901119]  [<ffffffff81483cfc>] dev_printk_emit+0x34/0x36
> [    3.901119]  [<ffffffff81483d6f>] __dev_printk+0x71/0x78
> [    3.901119]  [<ffffffff81483db6>] dev_printk+0x40/0x42
> [    3.901119]  [<ffffffff8111b1b4>] ? __kmalloc+0x93/0xb8
> [    3.901119]  [<ffffffff815dd5a4>] power_supply_uevent+0x183/0x1ec
> [    3.901119]  [<ffffffff81483451>] dev_uevent+0x179/0x1a2
> [    3.901119]  [<ffffffff812130dc>] ? kzalloc+0xf/0x11
> [    3.901119]  [<ffffffff812142ce>] kobject_uevent_env+0x195/0x414
> [    3.901119]  [<ffffffff81214558>] kobject_uevent+0xb/0xd
> [    3.901119]  [<ffffffff815dcc14>] power_supply_changed_work+0x60/0x65
> [    3.901119]  [<ffffffff810a2733>] process_one_work+0x1de/0x2fd
> [    3.901119]  [<ffffffff810a26c9>] ? process_one_work+0x174/0x2fd
> [    3.901119]  [<ffffffff810a2c54>] worker_thread+0x159/0x1ee
> [    3.901119]  [<ffffffff810a2afb>] ? rescuer_thread+0x27f/0x27f
> [    3.901119]  [<ffffffff810a859a>] kthread+0xac/0xb4
> [    3.901119]  [<ffffffff810a84ee>] ? __kthread_parkme+0x68/0x68
> [    3.901119]  [<ffffffff81859a3c>] ret_from_fork+0x7c/0xb0
> [    3.901119]  [<ffffffff810a84ee>] ? __kthread_parkme+0x68/0x68
> [    3.901119] Code: 08 e9 99 00 00 00 4c 8b 40 18 48 8b 70 20 49 b9 00 02 20 
> 00 00 00 ad de 48 8b 50 28 48 89 43 38 48 8b 38 48 8b 48 08 41 83 e0 02 <48> 
> 89 4f 08 48 89 39 f6 40 18 01 48 c7 00 00 00 00 00 4c 89 48
> [    3.901119] RIP  [<ffffffff81098f60>] run_timer_softirq+0x126/0x1da

So this is the same bug, now the code decodes to

   5: 4c 8b 40 18           mov    0x18(%rax),%r8
   9: 48 8b 70 20           mov    0x20(%rax),%rsi
   d: 49 b9 00 02 20 00 00 movabs $0xdead000000200200,%r9
  14: 00 ad de
  17: 48 8b 50 28           mov    0x28(%rax),%rdx
  1b: 48 89 43 38           mov    %rax,0x38(%rbx)
  1f: 48 8b 38             mov    (%rax),%rdi
  22: 48 8b 48 08           mov    0x8(%rax),%rcx
  26: 41 83 e0 02           and    $0x2,%r8d
  2a:* 48 89 4f 08           mov    %rcx,0x8(%rdi) <-- trapping instruction
  2e: 48 89 39             mov    %rdi,(%rcx)
  31: f6 40 18 01           testb  $0x1,0x18(%rax)
  35: 48 c7 00 00 00 00 00 movq   $0x0,(%rax)

so the NULL pointer in %rdi (which _should_ be a list pointer) got
loaded from %rax. So the prime suspect is an allocation near

  RAX: ffff880006dc28a0.

and looking backwards in your dump, we have

[    2.807742] kobject: 'parport_pc.888' (ffff880006dc2020):
kobject_release, parent           (null) (delayed)
[    2.810494] kobject: 'parport_pc.632' (ffff880006dca820):
kobject_release, parent           (null) (delayed)

so it looks like that "parparport_pc.632" thing.

Another one, same code:

> [    3.992074] BUG: unable to handle kernel NULL pointer dereference at 
> 0000000000000008
> [    3.994126] IP: [<ffffffff81098f60>] run_timer_softirq+0x126/0x1da
> [    3.995677] PGD 0
> [    3.996037] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [    3.996037] CPU: 1 PID: 1 Comm: init Not tainted 
> 3.11.0-rc2-00010-gc817a67-dirty #5
> [    3.996037] task: ffff88000010a040 ti: ffff88000010c000 task.ti: 
> ffff88000010c000
> [    3.996037] RIP: 0010:[<ffffffff81098f60>]  [<ffffffff81098f60>] 
> run_timer_softirq+0x126/0x1da
> [    3.996037] RSP: 0000:ffff88000dd03ea8  EFLAGS: 00010002
> [    3.996037] RAX: ffff880006da20a0 RBX: ffff88000015c000 RCX: 
> ffff88000dd03ec8

and for that boot we had

[    2.990338] kobject: 'parport_pc.956' (ffff880006da2020):
kobject_release, parent           (null) (delayed)
[    2.992654] kobject: 'parport_pc.888' (ffff880006db5820):
kobject_release, parent           (null) (delayed)
[    2.994905] kobject: 'parport_pc.632' (ffff880006db5020):
kobject_release, parent           (null) (delayed)

so now it's 'parport_pc.956'.

Anyway, that does look like a strong pattern.

Does the problem go away if you disable the parport_pc driver?

              Linus

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.