Xen project Mailing List

Re: [Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Sat, 6 Apr 2013 16:03:43 +0100

Cc: Jan Beulich <JBeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Sat, 06 Apr 2013 15:04:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 05/04/2013 22:21, Keir Fraser wrote: > On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote: > >> During automated testing of the Xen crash path via fatal NMIs, we discovered >> the following deadlock: >> >> [22:47:04 UTC] (XEN) **************************************** >> [22:47:04 UTC] (XEN) Panic on CPU 0: >> [22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi) >> [22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT >> [22:47:04 UTC] (XEN) **************************************** >> [22:47:04 UTC] (XEN) >> [22:47:04 UTC] (XEN) Reboot in five seconds... >> [22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002 >> [22:47:05 UTC] (XEN) DMAR_IQH_REG = 0 >> [22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0 >> [22:47:05 UTC] (XEN) >> [22:47:05 UTC] (XEN) **************************************** >> [22:47:05 UTC] (XEN) Panic on CPU 0: >> [22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed >> [22:47:05 UTC] (XEN) **************************************** >> [22:47:05 UTC] (XEN) >> [22:47:05 UTC] (XEN) Reboot in five seconds... >> [23:09:54 UTC] The server is not powered on. The Virtual Serial Port is not >> available. >> >> The problem is quite difficult to reproduce (seen twice, unable to reproduce >> manually), but inspection of kexec_crash() path shows a certain deadlock >> because of one_cpu_only(). >> >> >> This patch replaces the use of one_cpu_only() with a recursive spinlock which >> provides exclusion between different CPUs racing into kexec_crash(), but >> prevents deadlock if the same CPU reenters kexec_crash() again. > Does kexec_crash() not mind being reentered in this way? > > -- Keir Not really. I first experimented with seeing whether it was possible to detect reentrance and back out, but there certainly cases with the reentrant NMI/MCE where you can loose the state required to get back to the outer instance of kexec_crash. Furthermore, in the case of a crash its possible memory corruption has occurred, meaning faults are more likely, leading to reentrance back onto the kexec_crash() path. Also, there are BUG()s and ASSERTS()s in the current kexec_crash() path. Therefore, the sensible approach is to make kexec_crash() safe to be reentered, even if we do make all attempts to minimise the chances of reentrance. Looking through the entire kexec_crash() path, console_start_sync() could deadlock on the serial tx_lock if a fault occurs midway through the function, and acpi_dmar_reinstate() will leave the DMAR table in a state which fails the checksum. kexec_crash_save_cpu() might not completely write the crash notes, but there are no other obvious problems. I can probably fix the acpi_dmar_reinstate() and kexec_crash_save_cpu() issues without too much problem and will look into those in due course, but my next task is to see whether there was a programatic reason as to why the wait descriptor was not executed successfully. Given that on the crash path something has already gone wrong, I would argue this patch on the merit of removing a deadlock without making the situation any worse. I think it is impossible to have a kexec_crash() path which is safely reentrant and guarantee to work without issue; after all, if we reenter too many times because of faults we will likely triple fault. ~Andrew > >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >> >> diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c >> --- a/xen/common/kexec.c >> +++ b/xen/common/kexec.c >> @@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6 >> } >> } >> >> -static void one_cpu_only(void) >> -{ >> - /* Only allow the first cpu to continue - force other cpus to spin */ >> - if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) ) >> - for ( ; ; ) ; >> -} >> - >> /* Save the registers in the per-cpu crash note buffer. */ >> void kexec_crash_save_cpu(void) >> { >> @@ -294,19 +287,25 @@ static void kexec_common_shutdown(void) >> watchdog_disable(); >> console_start_sync(); >> spin_debug_disable(); >> - one_cpu_only(); >> acpi_dmar_reinstate(); >> } >> >> void kexec_crash(void) >> { >> + static DEFINE_SPINLOCK(crash_lock); >> int pos; >> >> pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0); >> if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) ) >> return; >> >> + /* In a cascade failure, it is possible to re-enter kexec_crash on the >> same >> + * CPU. This spinlock is deliberately never unlocked, to allow >> reentrance >> + * on the same CPU, but provides excusion from different CPUs racing. */ >> + spin_lock_recursive(&crash_lock); >> + >> kexecing = TRUE; >> + set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags); >> >> kexec_common_shutdown(); >> kexec_crash_save_cpu(); > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.