Xen project Mailing List

Re: [Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>

From: Keir Fraser <keir.xen@xxxxxxxxx>

Date: Fri, 05 Apr 2013 22:21:36 +0100

Cc: Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Fri, 05 Apr 2013 21:22:01 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: Ac4yQ42WiFAB9vj0EkGJmIUSBnchOg==

Thread-topic: [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()

On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote: > During automated testing of the Xen crash path via fatal NMIs, we discovered > the following deadlock: > > [22:47:04 UTC] (XEN) **************************************** > [22:47:04 UTC] (XEN) Panic on CPU 0: > [22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi) > [22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT > [22:47:04 UTC] (XEN) **************************************** > [22:47:04 UTC] (XEN) > [22:47:04 UTC] (XEN) Reboot in five seconds... > [22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002 > [22:47:05 UTC] (XEN) DMAR_IQH_REG = 0 > [22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0 > [22:47:05 UTC] (XEN) > [22:47:05 UTC] (XEN) **************************************** > [22:47:05 UTC] (XEN) Panic on CPU 0: > [22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed > [22:47:05 UTC] (XEN) **************************************** > [22:47:05 UTC] (XEN) > [22:47:05 UTC] (XEN) Reboot in five seconds... > [23:09:54 UTC] The server is not powered on. The Virtual Serial Port is not > available. > > The problem is quite difficult to reproduce (seen twice, unable to reproduce > manually), but inspection of kexec_crash() path shows a certain deadlock > because of one_cpu_only(). > > > This patch replaces the use of one_cpu_only() with a recursive spinlock which > provides exclusion between different CPUs racing into kexec_crash(), but > prevents deadlock if the same CPU reenters kexec_crash() again. Does kexec_crash() not mind being reentered in this way? -- Keir > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c > --- a/xen/common/kexec.c > +++ b/xen/common/kexec.c > @@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6 > } > } > > -static void one_cpu_only(void) > -{ > - /* Only allow the first cpu to continue - force other cpus to spin */ > - if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) ) > - for ( ; ; ) ; > -} > - > /* Save the registers in the per-cpu crash note buffer. */ > void kexec_crash_save_cpu(void) > { > @@ -294,19 +287,25 @@ static void kexec_common_shutdown(void) > watchdog_disable(); > console_start_sync(); > spin_debug_disable(); > - one_cpu_only(); > acpi_dmar_reinstate(); > } > > void kexec_crash(void) > { > + static DEFINE_SPINLOCK(crash_lock); > int pos; > > pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0); > if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) ) > return; > > + /* In a cascade failure, it is possible to re-enter kexec_crash on the > same > + * CPU. This spinlock is deliberately never unlocked, to allow > reentrance > + * on the same CPU, but provides excusion from different CPUs racing. */ > + spin_lock_recursive(&crash_lock); > + > kexecing = TRUE; > + set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags); > > kexec_common_shutdown(); > kexec_crash_save_cpu(); _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.