
[Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure



This is a heads-up as I have observed that the following commit (backported 
onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
========
commit c719519a4183d0630121f6abeba420f49dbc3229
Author: Jan Beulich <jbeulich@xxxxxxxx>
AuthorDate: Fri Jul 5 10:32:41 2019 +0200
Commit: Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Fri Jul 5 10:32:41 2019 +0200

x86/SMP: don't try to stop already stopped CPUs
    
    In particular with an enabled IOMMU (but not really limited to this
    case), trying to invoke fixup_irqs() after having already done
    disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
========

The test was to run "echo c > /proc/sysrq-trigger" in dom0 with a crash kernel 
loaded. The crash kernel shows no sign of starting. This is the end of the Xen 
console output:
========
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
<machine hangs here then reboots via the BIOS after 5 seconds>
========
The expected behaviour is that the kdump kernel loads immediately and then 
performs the crash dump.

I'm sorry that I have not yet had time to check if this affects vanilla 
stable-4.11 or master. I just wanted to be certain that you don't have the same 
issue.


Reverting one hunk via the following change fixes things for me (this is an 
experiment and not at all a proposed fix):
========
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
 void smp_send_stop(void)
 {
     unsigned int cpu = smp_processor_id();
+    
+    local_irq_disable();
+    fixup_irqs(cpumask_of(cpu), 0);
+    local_irq_enable();
 
     if ( num_online_cpus() > 1 )
     {
         int timeout = 10;
 
-        local_irq_disable();
-        fixup_irqs(cpumask_of(cpu), 0);
-        local_irq_enable();
-
         smp_call_function(stop_this_cpu, NULL, 0);
 
         /* Wait 10ms for all other CPUs to go offline. */
========
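
To make the effect of that hunk explicit, here is a rough standalone model of 
the two orderings. It is only a sketch and not Xen code: fixup_irqs_model(), 
send_stop_after_c719519(), send_stop_with_revert() and count_fixups() are 
made-up names, and the model only counts whether the fixup step runs for a 
given number of online CPUs. It does not show that this is the reason kexec 
fails.

```c
#include <stdio.h>

/* Hypothetical stand-in for fixup_irqs(): it only counts invocations. */
static unsigned int fixup_calls;

static void fixup_irqs_model(void)
{
    fixup_calls++;
}

/*
 * Ordering with c719519 applied: the fixup sits inside the
 * "more than one CPU online" branch, so it is skipped when the
 * calling CPU is the only one still online.
 */
static void send_stop_after_c719519(unsigned int online_cpus)
{
    if ( online_cpus > 1 )
    {
        fixup_irqs_model();
        /* The real code would now ask the other CPUs to stop and wait. */
    }
}

/*
 * Ordering with the hunk reverted (the experiment above): the fixup
 * runs unconditionally, before the online CPU count is checked.
 */
static void send_stop_with_revert(unsigned int online_cpus)
{
    fixup_irqs_model();

    if ( online_cpus > 1 )
    {
        /* The real code would now ask the other CPUs to stop and wait. */
    }
}

/* Run one variant and report how many times the fixup step executed. */
static unsigned int count_fixups(void (*variant)(unsigned int),
                                 unsigned int online_cpus)
{
    fixup_calls = 0;
    variant(online_cpus);
    return fixup_calls;
}
```

In this model count_fixups(send_stop_after_c719519, 1) is 0 while 
count_fixups(send_stop_with_revert, 1) is 1. With more than one CPU online 
both variants run the fixup once, so the two orderings differ only in the 
single-CPU case.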

Regards
Rob
