
Re: [Xen-devel] Dom0 crash with old style AMD NUMA detection



On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> Hi,
> 
> we see Dom0 crashes due to the kernel detecting the NUMA topology not by 
> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
> 
> This detects the actual NUMA configuration of the physical machine, but 
> then crashes because of the mismatch with Dom0's virtual memory. A 
> variation on the theme: Dom0 sees what it is not supposed to see.
> 
> This happens with that config option enabled, on a machine where 
> this scanning is still enabled (K8 and Fam10h, not Bulldozer class).
> 
> We then get this dump:
> [    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
> [    0.000000] Scanning NUMA topology in Northbridge 24
> [    0.000000] Number of physical nodes 4
> [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
> [    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
> [    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
> [    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
> [    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
> [    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
> [    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
> [    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
> [    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
> [    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
> [    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
> [    0.000000] Cannot find 159744 bytes in node 3
> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
> [    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> [    0.000000] PGD 0
> [    0.000000] Oops: 0000 [#1] SMP
> [    0.000000] CPU 0
> [    0.000000] Modules linked in:
> [    0.000000]
> [    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
> [    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> [    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
> [    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
> [    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
> [    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
> [    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
> [    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
> [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
> [    0.000000] Stack:
> [    0.000000]  00000000000000c0 0000000000000003 0000000000000000 000000000000003f
> [    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
> [    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
> [    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
> [    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
> [    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
> [    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
> [    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
> [    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
> [    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
> [    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
> [    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
> [    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> [    0.000000]  RSP <ffffffff81c01de8>
> [    0.000000] CR2: 0000000000000000
> [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> 
> 
> 
> The obvious solution would be to explicitly deny northbridge scanning 
> when running as Dom0, though I am not sure how to implement this without 
> upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Heh.
Is there a numa=0 option that could be used to override it and turn it
off?
> 
> Could someone propose a fix for this? (I am out of office for the next two weeks.)
> 
> Regards,
> Andre.
> 
> -- 
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel