[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()



During automated testing of the Xen crash path via fatal NMIs, we discovered
the following deadlock:

[22:47:04 UTC] (XEN) ****************************************
[22:47:04 UTC] (XEN) Panic on CPU 0:
[22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi)
[22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
[22:47:04 UTC] (XEN) ****************************************
[22:47:04 UTC] (XEN)
[22:47:04 UTC] (XEN) Reboot in five seconds...
[22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002
[22:47:05 UTC] (XEN) DMAR_IQH_REG = 0
[22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0
[22:47:05 UTC] (XEN)
[22:47:05 UTC] (XEN) ****************************************
[22:47:05 UTC] (XEN) Panic on CPU 0:
[22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed
[22:47:05 UTC] (XEN) ****************************************
[22:47:05 UTC] (XEN)
[22:47:05 UTC] (XEN) Reboot in five seconds...
[23:09:54 UTC]  The server is not powered on.  The Virtual Serial Port is not 
available.

The problem is quite difficult to reproduce (seen twice, unable to reproduce
manually), but inspection of kexec_crash() path shows a certain deadlock
because of one_cpu_only().


This patch replaces the use of one_cpu_only() with a recursive spinlock which
provides exclusion between different CPUs racing into kexec_crash(), but
prevents deadlock if the same CPU reenters kexec_crash() again.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6
     }
 }
 
-static void one_cpu_only(void)
-{
-    /* Only allow the first cpu to continue - force other cpus to spin */
-    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
-        for ( ; ; ) ;
-}
-
 /* Save the registers in the per-cpu crash note buffer. */
 void kexec_crash_save_cpu(void)
 {
@@ -294,19 +287,25 @@ static void kexec_common_shutdown(void)
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
-    one_cpu_only();
     acpi_dmar_reinstate();
 }
 
 void kexec_crash(void)
 {
+    static DEFINE_SPINLOCK(crash_lock);
     int pos;
 
     pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0);
     if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
         return;
 
+    /* In a cascade failure, it is possible to re-enter kexec_crash on the same
+     * CPU.  This spinlock is deliberately never unlocked, to allow reentrance
+     * on the same CPU, but provides excusion from different CPUs racing. */
+    spin_lock_recursive(&crash_lock);
+
     kexecing = TRUE;
+    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
 
     kexec_common_shutdown();
     kexec_crash_save_cpu();

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.