[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-changelog] [xen master] kexec: prevent deadlock on reentry to the crash path



commit 470f58c159410b280627c2ea7798ea12ad93bd7c
Author:     Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
AuthorDate: Wed Nov 27 15:13:48 2013 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Wed Nov 27 15:13:48 2013 +0100

    kexec: prevent deadlock on reentry to the crash path
    
    In some cases, such as suffering a queued-invalidation timeout while
    performing an iommu_crash_shutdown(), Xen can end up reentering the crash
    path. Previously, this would result in a deadlock in one_cpu_only(), as the
    test_and_set_bit() would fail.
    
    The crash path is not reentrant, and even if it could be made to be so, it 
is
    almost certain that we would fall over the same reentry condition again.
    
    The new code can distinguish a reentry case from multiple cpus racing down 
the
    crash path.  In the case that a reentry is detected, return back out to the
    nested panic() call, which will maybe_reboot() on our behalf.  This 
requires a
    bit of return plumbing back up to kexec_crash().
    
    While fixing this deadlock, also fix up an minor niggle seen recently from a
    XenServer crash report.  The report was from a Bank 8 MCE, which had managed
    to crash on all cpus at once.  The result was a lot of stack traces with 
cpus
    in kexec_common_shutdown(), which was infact the inlined version of
    one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
    afford to prevent inlining for the sake of clarity in the stack traces.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Reviewed-by: David Vrabel <david.vrabel@xxxxxxxxxx>
---
 xen/common/kexec.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 17f3ed7..481b0c2 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram)
     }
 }
 
-static void one_cpu_only(void)
+/*
+ * Only allow one cpu to continue on the crash path, forcing others to spin.
+ * Racing on the crash path from here will end in misery.  If we reenter,
+ * something has very gone wrong and retrying will (almost certainly) be
+ * futile.  Return up to our nested panic() to try and reboot.
+ *
+ * This is noinline to make it obvious in stack traces which cpus have lost
+ * the race (as opposed to being somewhere in kexec_common_shutdown())
+ */
+static int noinline one_cpu_only(void)
 {
-    /* Only allow the first cpu to continue - force other cpus to spin */
-    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
-        for ( ; ; ) ;
+    static unsigned int crashing_cpu = -1;
+    unsigned int cpu = smp_processor_id();
+
+    if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 )
+    {
+        /* Not the first entry into one_cpu_only(). */
+        if ( crashing_cpu == cpu )
+        {
+            printk("Reentered the crash path.  Something is very broken\n");
+            return -EBUSY;
+        }
+
+        /*
+         * Another cpu has beaten us to this point.  Wait here patiently for
+         * it to kill us.
+         */
+        for ( ; ; )
+            halt();
+    }
+
+    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
+    return 0;
 }
 
 /* Save the registers in the per-cpu crash note buffer. */
@@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void)
     return out;
 }
 
-static void kexec_common_shutdown(void)
+static int kexec_common_shutdown(void)
 {
+    int ret;
+
+    ret = one_cpu_only();
+    if ( ret )
+        return ret;
+
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
-    one_cpu_only();
     acpi_dmar_reinstate();
+
+    return 0;
 }
 
 void kexec_crash(void)
@@ -309,7 +344,9 @@ void kexec_crash(void)
 
     kexecing = TRUE;
 
-    kexec_common_shutdown();
+    if ( kexec_common_shutdown() != 0 )
+        return;
+
     kexec_crash_save_cpu();
     machine_crash_shutdown();
     machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
--
generated by git-patchbot for /home/xen/git/xen.git#master

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxx
http://lists.xensource.com/xen-changelog


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.