[Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
During automated testing of the Xen crash path via fatal NMIs, we
discovered the following deadlock:

[22:47:04 UTC] (XEN) ****************************************
[22:47:04 UTC] (XEN) Panic on CPU 0:
[22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi)
[22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
[22:47:04 UTC] (XEN) ****************************************
[22:47:04 UTC] (XEN)
[22:47:04 UTC] (XEN) Reboot in five seconds...
[22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002
[22:47:05 UTC] (XEN) DMAR_IQH_REG = 0
[22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0
[22:47:05 UTC] (XEN)
[22:47:05 UTC] (XEN) ****************************************
[22:47:05 UTC] (XEN) Panic on CPU 0:
[22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed
[22:47:05 UTC] (XEN) ****************************************
[22:47:05 UTC] (XEN)
[22:47:05 UTC] (XEN) Reboot in five seconds...
[23:09:54 UTC] The server is not powered on. The Virtual Serial Port is not available.

The problem is quite difficult to reproduce (seen twice, unable to
reproduce manually), but inspection of the kexec_crash() path shows a
certain deadlock because of one_cpu_only(): a CPU which re-enters
kexec_crash() finds KEXEC_FLAG_IN_PROGRESS already set and spins forever.

This patch replaces the use of one_cpu_only() with a recursive spinlock,
which provides exclusion between different CPUs racing into kexec_crash()
but prevents deadlock if the same CPU re-enters kexec_crash().

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6
     }
 }
 
-static void one_cpu_only(void)
-{
-    /* Only allow the first cpu to continue - force other cpus to spin */
-    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
-        for ( ; ; ) ;
-}
-
 /* Save the registers in the per-cpu crash note buffer. */
 void kexec_crash_save_cpu(void)
 {
@@ -294,19 +287,25 @@ static void kexec_common_shutdown(void)
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
-    one_cpu_only();
     acpi_dmar_reinstate();
 }
 
 void kexec_crash(void)
 {
+    static DEFINE_SPINLOCK(crash_lock);
     int pos;
 
     pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0);
     if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
         return;
 
+    /* In a cascade failure, it is possible to re-enter kexec_crash on the same
+     * CPU. This spinlock is deliberately never unlocked, to allow reentrance
+     * on the same CPU, but provides exclusion from different CPUs racing. */
+    spin_lock_recursive(&crash_lock);
+
     kexecing = TRUE;
 
+    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
     kexec_common_shutdown();
     kexec_crash_save_cpu();

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
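
For readers unfamiliar with the idea, the behaviour the patch relies on
can be sketched in standalone C11. This is only an illustration under
stated assumptions, not Xen's spin_lock_recursive() implementation: the
names struct rec_lock, rec_trylock(), rec_lock_acquire() and
crash_path_enter() are hypothetical, CPU ids are passed in by hand, and a
non-blocking trylock is used so that the contended case can be observed
from a single thread instead of spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical, simplified recursive lock (not Xen's spinlock_t). */
struct rec_lock {
    atomic_int owner;    /* id of the CPU holding the lock, or NO_OWNER */
    unsigned int depth;  /* nesting depth; only written by the owner */
};

/* Try to take the lock for 'cpu'. Succeeds on first acquisition and on
 * re-entry by the current owner; fails if a different CPU holds it. */
static bool rec_trylock(struct rec_lock *l, int cpu)
{
    int expected = NO_OWNER;

    if ( atomic_load(&l->owner) == cpu )
    {
        l->depth++;      /* same CPU re-entering: no spinning needed */
        return true;
    }

    if ( !atomic_compare_exchange_strong(&l->owner, &expected, cpu) )
        return false;    /* held by another CPU */

    l->depth = 1;
    return true;
}

/* Blocking form: spin until the lock is acquired. If the owner never
 * unlocks, as in the crash path, other CPUs stay here forever. */
static void rec_lock_acquire(struct rec_lock *l, int cpu)
{
    while ( !rec_trylock(l, cpu) )
        ;
}

static struct rec_lock crash_lock = { NO_OWNER, 0 };

/* Stand-in for the top of a crash path: returns true if 'cpu' may
 * continue. The lock is deliberately never released. */
static bool crash_path_enter(int cpu)
{
    return rec_trylock(&crash_lock, cpu);
}
```

With this sketch, crash_path_enter(0) succeeds both on the first call and
on a second, nested call, while crash_path_enter(1) fails for as long as
CPU 0 owns the lock. A plain test-and-set flag such as the removed
one_cpu_only() cannot tell these two cases apart, which is why the
re-entering CPU spun forever.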