xen-ia64-devel

[Xen-ia64-devel] [PATCH]fix initialization order of buddy allocator

To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-ia64-devel] [PATCH]fix initialization order of buddy allocator
From: Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx>
Date: Mon, 28 May 2007 19:48:51 +0900
Hi,

Machines with multiple NUMA nodes may panic during boot.
The attached patch (for C/S15145) fixes this problem by
changing the initialization order of the buddy allocator.
I tested it by booting dom0 and domVTi and by running a kernel make on the guests.
Any comments or feedback would be appreciated.

I describe the issue and its cause below, but first
I have two questions:

1. I moved acpi_table_init(), acpi_numa_init(), and
  smp_build_cpu_map() from late_setup_arch() to early_setup_arch().
  It works and, as far as I can tell from reading the source code,
  it has no bad side effects.
  What do you think?
2. Must the xenheap area (from xen_pstart to xenheap_phys_end) reside
  in node0 by design?
  (As far as I know, if xenheap is not in node0, initializing
  xenheap itself recursively requires xenheap memory.)


[Issue detail]
I have been testing Xen/IA64 on NEC's IPF server (AsAmA2).

CPU: Itanium2(Montecito) 16cpus/32cores
Memory: 128GB(16GB/node)
OS: SLES10

It worked fine up to at least C/S14077, but after I
upgraded to C/S14828, dom0 panicked at boot time with
messages like those attached at the end of this mail.

I traced the problem down and found that the panic was
caused by an access to avail[4][23] while avail[4]
was 0 (that is, it had not been allocated).
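
The failure mode can be illustrated with a minimal sketch, assuming the per-node metadata is held in an array of pointers that stay NULL until a node is initialized. The names avail_sketch, MAX_NODES_SKETCH, NR_ZONES_SKETCH, and read_avail_sketch() are hypothetical and do not reproduce the actual Xen definitions; the real code performs no such check, which is why the access faults.

```c
#include <stddef.h>

#define MAX_NODES_SKETCH 8    /* hypothetical node limit */
#define NR_ZONES_SKETCH  40   /* hypothetical zone count */

/* One pointer per node; NULL until that node's metadata is allocated. */
static unsigned long *avail_sketch[MAX_NODES_SKETCH];

/*
 * Guarded read of a per-node, per-zone counter.
 * Returns 0 and stores the value in *out on success,
 * or -1 if the indices are out of range or the node's
 * metadata has never been allocated.
 */
static int read_avail_sketch(unsigned int node, unsigned int zone,
                             unsigned long *out)
{
    if (node >= MAX_NODES_SKETCH || zone >= NR_ZONES_SKETCH)
        return -1;
    if (avail_sketch[node] == NULL)
        return -1;              /* the unguarded access would fault here */
    *out = avail_sketch[node][zone];
    return 0;
}
```

With only node 0 populated, a read for node 4 returns -1 here, where an unguarded avail[4][23] would dereference a null pointer.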

By reading the source code and inserting debug code, I found
that the root cause of this problem is the incorrect initialization
order of the buddy allocator.

In the current order, node_memblk[] and xenheap are not yet
initialized when end_boot_allocator() is called.
But init_heap_pages() (called by end_boot_allocator() and other
functions) calls phys_to_nid(), which needs node_memblk[], and
xmalloc_array(), which needs xenheap.
So node_memblk[] and xenheap must be initialized before
end_boot_allocator() is called.
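
The ordering dependency can be modeled with a small sketch. This is a toy under stated assumptions, not the Xen code: struct boot_state_sketch and every sketch_* function are hypothetical stand-ins, and the explicit -1 return only models the missing prerequisite (the real code has no such check).

```c
#include <stdbool.h>

/* Hypothetical stand-in for boot-time state; not Xen's real variables. */
struct boot_state_sketch {
    bool node_table_ready;  /* models node_memblk[] being populated */
    bool xenheap_ready;     /* models init_xenheap_pages() having run */
};

/* Models the ACPI/NUMA setup that fills node_memblk[]. */
static void sketch_numa_init(struct boot_state_sketch *s)
{
    s->node_table_ready = true;
}

/* Models init_xenheap_pages(). */
static void sketch_xenheap_init(struct boot_state_sketch *s)
{
    s->xenheap_ready = true;
}

/*
 * Models end_boot_allocator() -> init_heap_pages().
 * Returns 0 if both prerequisites are met, -1 otherwise.
 */
static int sketch_end_boot_allocator(const struct boot_state_sketch *s)
{
    if (!s->node_table_ready)
        return -1;  /* phys_to_nid() would have no table to consult */
    if (!s->xenheap_ready)
        return -1;  /* xmalloc_array() would have no heap to draw from */
    return 0;
}

/* Old order: the boot allocator is ended before its prerequisites exist. */
static int sketch_boot_old_order(void)
{
    struct boot_state_sketch s = { false, false };
    int rc = sketch_end_boot_allocator(&s);
    sketch_xenheap_init(&s);
    sketch_numa_init(&s);
    return rc;
}

/* Patched order: NUMA table and xenheap first, then end the boot allocator. */
static int sketch_boot_patched_order(void)
{
    struct boot_state_sketch s = { false, false };
    sketch_numa_init(&s);
    sketch_xenheap_init(&s);
    return sketch_end_boot_allocator(&s);
}
```

In this model the old order reports a missing prerequisite, while the patched order succeeds.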

I haven't confirmed it, but it seems that C/S14106 (xen
memory allocator: Dynamically allocate per-numa-node
metadata) exposed this latent bug.


[panic message]
  :
netconsole: not configured, aborting
Linux video capture interface: v2.00
Xen virtual console successfully installed as ttyS0
Event-channel device installed.
(XEN) *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) $$$$$ PANIC in domain 0 (k6=0xf000000007b00000): *** xen_handle_domain_access: exception table lookup failed, iip=0xf000000004032e10, addr=0xb0, spinning...
(XEN) d 0xf000000007b28080 domid 0
(XEN) vcpu 0xf000000007b00000 vcpu 1
(XEN)
(XEN) CPU 1
(XEN) psr : 0000101008226018 ifs : 800000000000060e ip  : [<f000000004032e10>]
(XEN) ip is at free_heap_pages+0x2d0/0x6c0
(XEN) unat: 0000000000000000 pfs : 000000000000060e rsc : 0000000000000003
(XEN) rnat: 0000000000000538 bsps: 0000000000000000 pr  : 000000000002a599
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f000000004032dd0 b6  : f0000000040abe30 b7  : f000000004002e20
(XEN) f6  : 000000000000000000000 f7  : 1003e0000000000000000
(XEN) f8  : 1003e0000000000002000 f9  : 100058000000000000000
(XEN) f10 : 1003e0000000000002000 f11 : 1003e0000000000000001
(XEN) r1  : f00000000438d720 r2  : 0000000000000000 r3  : f00000201ea6bde9
(XEN) r8  : 0000000000000004 r9  : ffffffffffffffff r10 : 0000000000000000
(XEN) r11 : 0000000000020959 r12 : f000000007b07d30 r13 : f000000007b00000
(XEN) r14 : f00000000419af70 r15 : 00000000000000b0 r16 : 0000000000000001
(XEN) r17 : f000000004251578 r18 : 0000000000000022 r19 : 0000000000000023
(XEN) r20 : 0000000000000001 r21 : f000000004128218 r22 : f000000004190c58
(XEN) r23 : f30000000c4adb64 r24 : f000000004128208 r25 : 0000000006497b93
(XEN) r26 : 0000000000000016 r27 : 0000000000000000 r28 : 0000000000000000
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004196bc8
(XEN)
(XEN) Call Trace:
(XEN)  [<f0000000040b2a70>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000007b077e0 bsp=f000000007b015e8
(XEN)  [<f000000004089500>] panic_domain+0x120/0x170
(XEN)                                 sp=f000000007b079b0 bsp=f000000007b01580
(XEN)  [<f00000000407e1f0>] ia64_do_page_fault+0x640/0x650
(XEN)                                 sp=f000000007b07af0 bsp=f000000007b014f0
(XEN)  [<f0000000040ab880>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000007b07b30 bsp=f000000007b014f0
(XEN)  [<f000000004032e10>] free_heap_pages+0x2d0/0x6c0
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b01480
(XEN)  [<f000000004034180>] free_domheap_pages+0x430/0x880
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b01440
(XEN)  [<f00000000402f220>] guest_remove_page+0x390/0x580
(XEN)                                 sp=f000000007b07d30 bsp=f000000007b013e0


Thanks,
Daisuke Nishimura.


diff -r 2b14a1f22eec xen/arch/ia64/linux-xen/setup.c
--- a/xen/arch/ia64/linux-xen/setup.c   Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/linux-xen/setup.c   Mon May 28 13:26:25 2007 +0900
@@ -506,13 +506,6 @@ setup_arch (char **cmdline_p)
        if (early_console_setup(*cmdline_p) == 0)
                mark_bsp_online();
 
-#ifdef XEN
-}
-
-void __init
-late_setup_arch (char **cmdline_p)
-{
-#endif
 #ifdef CONFIG_ACPI_BOOT
        /* Initialize the ACPI boot-time table parser */
        acpi_table_init();
@@ -525,6 +518,13 @@ late_setup_arch (char **cmdline_p)
 # endif
 #endif /* CONFIG_APCI_BOOT */
 
+#ifdef XEN
+}
+
+void __init
+late_setup_arch (char **cmdline_p)
+{
+#endif
 #ifndef XEN
        find_memory();
 #endif
diff -r 2b14a1f22eec xen/arch/ia64/xen/xensetup.c
--- a/xen/arch/ia64/xen/xensetup.c      Fri May 25 09:43:21 2007 -0600
+++ b/xen/arch/ia64/xen/xensetup.c      Mon May 28 13:26:25 2007 +0900
@@ -433,12 +433,12 @@ void __init start_kernel(void)
 
     alloc_dom0();
 
-    end_boot_allocator();
-
     init_xenheap_pages(__pa(xen_heap_start), xenheap_phys_end);
     printk("Xen heap: %luMB (%lukB)\n",
        (xenheap_phys_end-__pa(xen_heap_start)) >> 20,
        (xenheap_phys_end-__pa(xen_heap_start)) >> 10);
+
+    end_boot_allocator();
 
     late_setup_arch(&cmdline);
 