Xen project Mailing List

Re: [Xen-devel] [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Fri, 22 Nov 2013 14:55:35 +0000

Cc: Tim Deegan <tim@xxxxxxx>, Keir Fraser <keir@xxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Fri, 22 Nov 2013 14:55:51 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Ping?

On 15/11/13 20:32, Andrew Cooper wrote:
> In some cases, such as suffering a queued-invalidation timeout while
> performing an iommu_crash_shutdown(), Xen can end up reentering the crash
> path. Previously, this would result in a deadlock in one_cpu_only(), as the
> test_and_set_bit() would fail.
>
> The crash path is not reentrant, and even if it could be made to be so, it is
> almost certain that we would fall over the same reentry condition again.
>
> The new code can distinguish a reentry case from multiple cpus racing down the
> crash path. In the case that a reentry is detected, return back out to the
> nested panic() call, which will maybe_reboot() on our behalf. This requires a
> bit of return plumbing back up to kexec_crash().
>
> While fixing this deadlock, also fix up a minor niggle seen recently from a
> XenServer crash report. The report was from a Bank 8 MCE, which had managed
> to crash on all cpus at once. The result was a lot of stack traces with cpus
> in kexec_common_shutdown(), which was in fact the inlined version of
> one_cpu_only(). The kexec crash path is not a hotpath, so we can easily
> afford to prevent inlining for the sake of clarity in the stack traces.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Keir Fraser <keir@xxxxxxx>
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Tim Deegan <tim@xxxxxxx>
> CC: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/xen/common/kexec.c b/xen/common/kexec.c
> index 17f3ed7..481b0c2 100644
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram)
>      }
>  }
>
> -static void one_cpu_only(void)
> +/*
> + * Only allow one cpu to continue on the crash path, forcing others to spin.
> + * Racing on the crash path from here will end in misery. If we reenter,
> + * something has gone very wrong and retrying will (almost certainly) be
> + * futile. Return up to our nested panic() to try and reboot.
> + *
> + * This is noinline to make it obvious in stack traces which cpus have lost
> + * the race (as opposed to being somewhere in kexec_common_shutdown())
> + */
> +static int noinline one_cpu_only(void)
>  {
> -    /* Only allow the first cpu to continue - force other cpus to spin */
> -    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
> -        for ( ; ; ) ;
> +    static unsigned int crashing_cpu = -1;
> +    unsigned int cpu = smp_processor_id();
> +
> +    if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 )
> +    {
> +        /* Not the first entry into one_cpu_only(). */
> +        if ( crashing_cpu == cpu )
> +        {
> +            printk("Reentered the crash path. Something is very broken\n");
> +            return -EBUSY;
> +        }
> +
> +        /*
> +         * Another cpu has beaten us to this point. Wait here patiently for
> +         * it to kill us.
> +         */
> +        for ( ; ; )
> +            halt();
> +    }
> +
> +    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
> +    return 0;
>  }
>
>  /* Save the registers in the per-cpu crash note buffer. */
> @@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void)
>      return out;
>  }
>
> -static void kexec_common_shutdown(void)
> +static int kexec_common_shutdown(void)
>  {
> +    int ret;
> +
> +    ret = one_cpu_only();
> +    if ( ret )
> +        return ret;
> +
>      watchdog_disable();
>      console_start_sync();
>      spin_debug_disable();
> -    one_cpu_only();
>      acpi_dmar_reinstate();
> +
> +    return 0;
>  }
>
>  void kexec_crash(void)
> @@ -309,7 +344,9 @@ void kexec_crash(void)
>
>      kexecing = TRUE;
>
> -    kexec_common_shutdown();
> +    if ( kexec_common_shutdown() != 0 )
> +        return;
> +
>      kexec_crash_save_cpu();
>      machine_crash_shutdown();
>      machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
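
For readers who want to experiment with the reentry check outside the hypervisor, the following is a minimal userspace sketch of the idea in the quoted patch, assuming C11 atomics in place of Xen's cmpxchg(). The names claim_crash_path() and NO_CPU are hypothetical and do not appear in the patch, and the "lost the race" case returns 1 here rather than halting forever, purely so the behaviour can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical sentinel meaning "no cpu has claimed the crash path yet". */
#define NO_CPU ((unsigned int)-1)

/* Stand-in for the patch's function-local static crashing_cpu. */
static atomic_uint crashing_cpu = NO_CPU;

/*
 * Sketch of the check made by one_cpu_only() in the quoted patch.
 *
 * Returns 0 if this cpu is the first to enter the crash path,
 * -EBUSY if the same cpu has reentered it, and 1 if a different cpu
 * got there first (the real code would loop on halt() instead).
 */
static int claim_crash_path(unsigned int cpu)
{
    unsigned int expected = NO_CPU;

    assert(cpu != NO_CPU);

    /* Atomically record ourselves as the owner if nobody has yet. */
    if (atomic_compare_exchange_strong(&crashing_cpu, &expected, cpu))
        return 0;

    /* On failure, 'expected' has been updated to the current owner. */
    if (expected == cpu)
        return -EBUSY;  /* Same cpu again: this is a reentry. */

    return 1;           /* A different cpu won the race. */
}
```

A first call such as claim_crash_path(2) claims the path, a second call with the same cpu number reports the reentry, and a call with any other cpu number reports a lost race. A plain test-and-set flag, as in the old code, cannot tell the last two cases apart, which is why recording the owning cpu is what makes the reentry detectable.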