
Re: [Xen-devel] Dom0 crash with old style AMD NUMA detection



On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.



The obvious solution would be to explicitly deny northbridge scanning
when running as Dom0, though I am not sure how to implement this without
upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Heh.
Is there a numa=0 option that could be used to override it to turn it
off?

Not compile tested, but I was thinking of something like this:

ping?

That looks good to me - at least for the time being.

OK, can I have your Tested-by/Acked-by on it, please?

I just want to check how this interacts with upcoming Dom0 NUMA
support. It wouldn't be too clever if we deliberately disable NUMA

We can always revert this patch in future versions of Linux.

I don't like this idea. Then we would have Linux kernels up to 3.5 working, and again from, say, 3.8 on, but 3.6 and 3.7 unable to use NUMA. That would be pretty unfortunate.

I haven't checked back with Dario, but I suspect that we will use ACPI to inject the NUMA topology into Dom0. Even if not, a general "numa=off" for Dom0 is too much of a sledgehammer for me.

and future Xen versions will allow us to use it. So let me check if I
can confine this turn-off to the fallback K8 northbridge reading.

This potentially could work, but I would prefer to not do it for 3.6.

Mmh, I don't get the idea of your patch below. One can always read the NUMA topology from the AMD northbridge, but this is deprecated in favor of ACPI. The amdtopology.c code was only there to enable NUMA on very early Opterons, whose BIOSes didn't provide (sane) SRAT tables. Although we disallow ACPI for NUMA in Dom0, this northbridge scanning unfortunately "shines through" the virtualization, revealing the host system's NUMA topology, which usually differs considerably from Dom0's.

So instead I would rather do something like this:

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index bfacd2c..7811c0d 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -20,6 +20,8 @@

 extern int numa_off;

+extern bool deny_amd_nb_numa_scan;
+
 /*
  * __apicid_to_node[] stores the raw mapping between physical apicid and
  * node and is used to initialize cpu_to_node mapping.
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..f223a67 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -29,6 +29,8 @@

 static unsigned char __initdata nodeids[8];

+bool deny_amd_nb_numa_scan = 0;
+
 static __init int find_northbridge(void)
 {
        int num;
@@ -78,6 +80,9 @@ int __init amd_numa_init(void)
        u32 nodeid, reg;
        unsigned int bits, cores, apicid_base;

+       if (deny_amd_nb_numa_scan)
+               return -ENOENT;
+
        if (!early_pci_allowed())
                return -EINVAL;

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d11ca11..6db63c0 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
        }
 #endif

+       deny_amd_nb_numa_scan = 1;
+
        memcpy(boot_command_line, xen_start_info->cmd_line,
               MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
               COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

This would just turn off this one kind of NUMA discovery for Dom0.
The patch is admittedly a bit rough (I'm not sure about the proper placement inside #ifdefs, for instance) and not well tested yet. One could also consider a more general variable name, so that it covers other hardware features Dom0 shouldn't use in the future.
So this isn't something for 3.6 anymore, probably not even for 3.7.
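The difference between this flag and a blanket "numa=off" comes down to the order in which x86 Linux tries NUMA discovery: ACPI (SRAT) first, then the AMD northbridge scan, then a single dummy node. Below is a minimal userspace model of that order, not kernel code; `struct numa_env`, `model_numa_init` and the other names are hypothetical, and only the ordering is meant to mirror the kernel. It shows that denying only the northbridge scan still lets a Dom0 that is later allowed to use ACPI NUMA pick it up, whereas `numa_off` short-circuits every method.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of x86 NUMA discovery: ACPI SRAT first, then the
 * AMD K8-style northbridge scan, then a single dummy node.
 */
struct numa_env {
	bool numa_off;          /* "numa=off" on the kernel command line */
	bool acpi_numa_allowed; /* currently false for Dom0 */
	bool srat_present;      /* firmware provides an SRAT */
	bool old_amd_nb;        /* K8/Fam10h/Fam11h northbridge visible on PCI */
	bool deny_nb_scan;      /* models deny_amd_nb_numa_scan from the patch */
};

enum numa_source { NUMA_FROM_ACPI, NUMA_FROM_AMD_NB, NUMA_DUMMY };

static int model_acpi_numa_init(const struct numa_env *e)
{
	return (e->acpi_numa_allowed && e->srat_present) ? 0 : -ENOENT;
}

static int model_amd_numa_init(const struct numa_env *e)
{
	if (e->deny_nb_scan)    /* the proposed early bail-out */
		return -ENOENT;
	return e->old_amd_nb ? 0 : -ENOENT;
}

/* Returns which discovery method would end up defining the topology. */
static enum numa_source model_numa_init(const struct numa_env *e)
{
	if (!e->numa_off) {
		if (model_acpi_numa_init(e) == 0)
			return NUMA_FROM_ACPI;
		if (model_amd_numa_init(e) == 0)
			return NUMA_FROM_AMD_NB;
	}
	return NUMA_DUMMY;
}
```

With `.old_amd_nb = true` and ACPI NUMA disallowed, the model returns `NUMA_FROM_AMD_NB` (the host topology "shining through"); setting `deny_nb_scan` yields `NUMA_DUMMY`, and additionally allowing ACPI NUMA yields `NUMA_FROM_ACPI`, while `numa_off` always ends in `NUMA_DUMMY`.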

What if we drop the patch for this problem entirely for 3.6 and recommend "numa=off" as a workaround? This is much less sticky than a kernel patch and could appear in the Xen wiki, for instance. After all, this isn't strictly a regression (it appears with every 3.x kernel, AFAICT). Most of the time the northbridge scanning yields bogus results, so the kernel eventually discards them, but sometimes they seem to slip through and cause trouble. Also, it does not trigger on newer (Bulldozer-class) CPUs, since we deliberately avoided adding the new northbridge PCI ID to this routine.
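Newer CPUs are unaffected because detection is an exact PCI ID match. The sketch below models those checks as they appear in the quoted patch (which seems to copy them from amdtopology.c). It assumes that the dword at config offset 0x00, as returned by `read_pci_config(bus, slot, func, 0x00)`, holds the vendor ID in bits 15:0 and the device ID in bits 31:16. Raw dwords are passed in instead of reading real PCI config space. The helpers `pci_id`, `id_in_list` and `is_old_amd_nb` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022
#define ARRAY_LEN(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Config dword at offset 0x00: vendor ID in bits 15:0, device ID in 31:16. */
static uint32_t pci_id(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* Function-0 and function-1 device IDs checked by the quoted patch. */
static const uint16_t nb_f0_ids[] = { 0x1100, 0x1200, 0x1300 };
static const uint16_t nb_f1_ids[] = { 0x1101, 0x1201, 0x1301 };

static bool id_in_list(uint32_t header, const uint16_t *ids, int n)
{
	for (int i = 0; i < n; i++)
		if (header == pci_id(PCI_VENDOR_ID_AMD, ids[i]))
			return true;
	return false;
}

/* A slot counts as an old-style northbridge only if both F0 and F1 match. */
static bool is_old_amd_nb(uint32_t f0_header, uint32_t f1_header)
{
	return id_in_list(f0_header, nb_f0_ids, ARRAY_LEN(nb_f0_ids)) &&
	       id_in_list(f1_header, nb_f1_ids, ARRAY_LEN(nb_f1_ids));
}
```

Any device ID outside the two lists, such as one belonging to a newer CPU family, is rejected, which is why the fallback path never fires there.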

Regards,
Andre.


diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a4790bf..b4edce4 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>

@@ -483,6 +484,34 @@ void __cpuinit xen_enable_sysenter(void)
        if(ret != 0)
                setup_clear_cpu_cap(sysenter_feature);
 }
+#ifdef CONFIG_AMD_NUMA
+static int __init xen_amd_k8(void)
+{
+       int num;
+
+       if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+               return -ENOENT;
+
+       for (num = 0; num < 32; num++) {
+               u32 header;
+
+               header = read_pci_config(0, num, 0, 0x00);
+               if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
+                       header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
+                       header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
+                       continue;
+
+               header = read_pci_config(0, num, 1, 0x00);
+               if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
+                       header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
+                       header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
+                       continue;
+               return num;
+       }
+       return -ENOENT;
+}
+#endif
+
 void __cpuinit xen_enable_syscall(void)
 {
 #ifdef CONFIG_X86_64
@@ -542,4 +571,8 @@ void __init xen_arch_setup(void)
        disable_cpufreq();
        WARN_ON(set_pm_idle_to_default());
        fiddle_vdso();
+#ifdef CONFIG_AMD_NUMA
+       if (xen_amd_k8() >= 0)
+               numa_off = 1;
+#endif
 }




--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel



