Hi,
Machines with multiple NUMA nodes may panic at boot.
The attached patch (for C/S 15145), which changes
the initialization order of the buddy allocator, fixes
this problem.
I tested booting dom0/domVTi and running a kernel make on the guests.
Any comments and feedback would be appreciated.
I describe the issue and its cause below, but first
I have a few questions:
1. I moved acpi_table_init(), acpi_numa_init(), and
smp_build_cpu_map() from late_setup_arch() to early_setup_arch().
It works and, as far as I can tell from reading the source, there
are no bad side effects.
What do you think?
2. Must the xenheap area (from xen_pstart to xenheap_phys_end) reside
in node0 by design?
(As far as I know, if xenheap is not in node0, initializing
xenheap itself requires xenheap memory, which is a circular dependency.)
[Issue detail]
I have been testing Xen/IA64 on NEC's IPF server (AsAmA2).
CPU: Itanium2 (Montecito), 16cpus/32cores
Memory: 128GB (16GB/node)
OS: SLES10
It worked fine up to at least C/S 14077, but after I
upgraded to C/S 14828, dom0 panicked at boot time with
the messages attached at the end of this mail.
I traced the problem and found that the panic was caused by
an access to avail[4][23] while avail[4]
was 0 (that is, it had not been allocated).
After reading the source and inserting debug code, I found
that the root cause is the wrong initialization order
of the buddy allocator.
With the current order, node_memblk[] and xenheap are not yet
initialized when end_boot_allocator() is called.
But init_heap_pages() (called by end_boot_allocator() and other
functions) calls phys_to_nid(), which needs node_memblk[], and
xmalloc_array(), which needs xenheap.
So node_memblk[] and xenheap must be initialized before
end_boot_allocator().
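
To illustrate the ordering dependency, here is a toy model in C. It is
only a sketch, not the Xen code: every toy_* name, the numa_ready and
xenheap_ready flags, the static node0_meta table, and the "everything
looks like node 0 before NUMA data exists" fallback are assumptions made
for the example. Only the 16GB/node layout and the idea that per-node
metadata (avail[node]) is allocated on demand come from this mail.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NODES   8
#define TOY_ORDERS  24
#define NODE_SPAN   (16ULL << 30)   /* 16GB per node, as on the test machine */

typedef unsigned long long paddr_t;

static int       numa_ready;              /* stand-in for "node_memblk[] is filled in" */
static int       xenheap_ready;           /* stand-in for "init_xenheap_pages() has run" */
static unsigned  node0_meta[TOY_ORDERS];  /* assumed static bootstrap metadata for node 0 */
static unsigned *avail[TOY_NODES];        /* per-node metadata, NULL until allocated */

/* Toy phys_to_nid(): assumption: without NUMA data every page maps to node 0. */
static int toy_phys_to_nid(paddr_t pa)
{
    return numa_ready ? (int)((pa / NODE_SPAN) % TOY_NODES) : 0;
}

/* Toy xmalloc_array(): cannot hand out memory before the heap exists. */
static void *toy_xmalloc_array(size_t n, size_t size)
{
    return xenheap_ready ? calloc(n, size) : NULL;
}

/* Toy init_heap_pages() for one page: allocate the node's metadata on first use. */
static void toy_init_heap_page(paddr_t pa)
{
    int nid = toy_phys_to_nid(pa);

    assert(nid >= 0 && nid < TOY_NODES);
    if (avail[nid] == NULL)
        avail[nid] = toy_xmalloc_array(TOY_ORDERS, sizeof(unsigned));
    if (avail[nid] != NULL)
        avail[nid][0]++;
}

/* Toy end_boot_allocator(): hand one page from every node to the heap. */
static void toy_end_boot_allocator(void)
{
    for (int n = 0; n < TOY_NODES; n++)
        toy_init_heap_page((paddr_t)n * NODE_SPAN);
}

/* Toy free_heap_pages(): returns -1 where real code would dereference NULL. */
static int toy_free_heap_page(paddr_t pa)
{
    int nid = toy_phys_to_nid(pa);

    if (avail[nid] == NULL)
        return -1;
    avail[nid][0]++;
    return 0;
}

/* Run one simulated boot, then free a page that lives on node 4. */
int toy_boot(int patched_order)
{
    for (int n = 1; n < TOY_NODES; n++) {   /* reset state between runs */
        free(avail[n]);
        avail[n] = NULL;
    }
    avail[0] = node0_meta;
    numa_ready = xenheap_ready = 0;

    if (patched_order) {
        numa_ready = 1;                     /* NUMA data first */
        xenheap_ready = 1;                  /* then xenheap */
        toy_end_boot_allocator();           /* then the boot allocator hand-over */
    } else {
        toy_end_boot_allocator();           /* hand-over runs too early */
        xenheap_ready = 1;
        numa_ready = 1;
    }
    return toy_free_heap_page(4 * NODE_SPAN);
}
```

In this model toy_boot(0) fails for the node-4 page because avail[4]
was never allocated, while toy_boot(1) succeeds. That mirrors the
symptom described above, under the stated assumptions.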
I haven't confirmed it, but it seems that C/S 14106 (xen
memory allocator: Dynamically allocate per-numa-node
metadata) exposed this latent bug.
[panic message]
:
netconsole: not configured, aborting
Linux video capture interface: v2.00
Xen virtual console successfully installed as ttyS0
Event-channel device installed.
(XEN) *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) $$$$$ PANIC in domain 0 (k6=0xf000000007b00000): *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) d 0xf000000007b28080 domid 0
(XEN) vcpu 0xf000000007b00000 vcpu 1
(XEN)
(XEN) CPU 1
(XEN) psr : 0000101008226018 ifs : 800000000000060e ip : [<f000000004032e10>]
(XEN) ip is at free_heap_pages+0x2d0/0x6c0
(XEN) unat: 0000000000000000 pfs : 000000000000060e rsc : 0000000000000003
(XEN) rnat: 0000000000000538 bsps: 0000000000000000 pr : 000000000002a599
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0 : f000000004032dd0 b6 : f0000000040abe30 b7 : f000000004002e20
(XEN) f6 : 000000000000000000000 f7 : 1003e0000000000000000
(XEN) f8 : 1003e0000000000002000 f9 : 100058000000000000000
(XEN) f10 : 1003e0000000000002000 f11 : 1003e0000000000000001
(XEN) r1 : f00000000438d720 r2 : 0000000000000000 r3 : f00000201ea6bde9
(XEN) r8 : 0000000000000004 r9 : ffffffffffffffff r10 : 0000000000000000
(XEN) r11 : 0000000000020959 r12 : f000000007b07d30 r13 : f000000007b00000
(XEN) r14 : f00000000419af70 r15 : 00000000000000b0 r16 : 0000000000000001
(XEN) r17 : f000000004251578 r18 : 0000000000000022 r19 : 0000000000000023
(XEN) r20 : 0000000000000001 r21 : f000000004128218 r22 : f000000004190c58
(XEN) r23 : f30000000c4adb64 r24 : f000000004128208 r25 : 0000000006497b93
(XEN) r26 : 0000000000000016 r27 : 0000000000000000 r28 : 0000000000000000
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004196bc8
(XEN)
(XEN) Call Trace:
(XEN) [<f0000000040b2a70>] show_stack+0x80/0xa0
(XEN) sp=f000000007b077e0
bsp=f000000007b015e8
(XEN) [<f000000004089500>] panic_domain+0x120/0x170
(XEN) sp=f000000007b079b0
bsp=f000000007b01580
(XEN) [<f00000000407e1f0>] ia64_do_page_fault+0x640/0x650
(XEN) sp=f000000007b07af0
bsp=f000000007b014f0
(XEN) [<f0000000040ab880>] ia64_leave_kernel+0x0/0x300
(XEN) sp=f000000007b07b30
bsp=f000000007b014f0
(XEN) [<f000000004032e10>] free_heap_pages+0x2d0/0x6c0
(XEN) sp=f000000007b07d30
bsp=f000000007b01480
(XEN) [<f000000004034180>] free_domheap_pages+0x430/0x880
(XEN) sp=f000000007b07d30
bsp=f000000007b01440
(XEN) [<f00000000402f220>] guest_remove_page+0x390/0x580
(XEN) sp=f000000007b07d30
bsp=f000000007b013e0
Thanks,
Daisuke Nishimura.
diff -r 2b14a1f22eec xen/arch/ia64/linux-xen/setup.c
--- a/xen/arch/ia64/linux-xen/setup.c Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/linux-xen/setup.c Mon May 28 13:26:25 2007 +0900
@@ -506,13 +506,6 @@ setup_arch (char **cmdline_p)
if (early_console_setup(*cmdline_p) == 0)
mark_bsp_online();
-#ifdef XEN
-}
-
-void __init
-late_setup_arch (char **cmdline_p)
-{
-#endif
#ifdef CONFIG_ACPI_BOOT
/* Initialize the ACPI boot-time table parser */
acpi_table_init();
@@ -525,6 +518,13 @@ late_setup_arch (char **cmdline_p)
# endif
#endif /* CONFIG_APCI_BOOT */
+#ifdef XEN
+}
+
+void __init
+late_setup_arch (char **cmdline_p)
+{
+#endif
#ifndef XEN
find_memory();
#endif
diff -r 2b14a1f22eec xen/arch/ia64/xen/xensetup.c
--- a/xen/arch/ia64/xen/xensetup.c Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/xen/xensetup.c Mon May 28 13:26:25 2007 +0900
@@ -433,12 +433,12 @@ void __init start_kernel(void)
alloc_dom0();
- end_boot_allocator();
-
init_xenheap_pages(__pa(xen_heap_start), xenheap_phys_end);
printk("Xen heap: %luMB (%lukB)\n",
(xenheap_phys_end-__pa(xen_heap_start)) >> 20,
(xenheap_phys_end-__pa(xen_heap_start)) >> 10);
+
+ end_boot_allocator();
late_setup_arch(&cmdline);