[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>
  • From: Keir Fraser <keir.xen@xxxxxxxxx>
  • Date: Fri, 05 Apr 2013 22:21:36 +0100
  • Cc: Jan Beulich <JBeulich@xxxxxxxx>
  • Delivery-date: Fri, 05 Apr 2013 21:22:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: Ac4yQ42WiFAB9vj0EkGJmIUSBnchOg==
  • Thread-topic: [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()

On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:

> During automated testing of the Xen crash path via fatal NMIs, we discovered
> the following deadlock:
> 
> [22:47:04 UTC] (XEN) ****************************************
> [22:47:04 UTC] (XEN) Panic on CPU 0:
> [22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi)
> [22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
> [22:47:04 UTC] (XEN) ****************************************
> [22:47:04 UTC] (XEN)
> [22:47:04 UTC] (XEN) Reboot in five seconds...
> [22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002
> [22:47:05 UTC] (XEN) DMAR_IQH_REG = 0
> [22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0
> [22:47:05 UTC] (XEN)
> [22:47:05 UTC] (XEN) ****************************************
> [22:47:05 UTC] (XEN) Panic on CPU 0:
> [22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed
> [22:47:05 UTC] (XEN) ****************************************
> [22:47:05 UTC] (XEN)
> [22:47:05 UTC] (XEN) Reboot in five seconds...
> [23:09:54 UTC]  The server is not powered on.  The Virtual Serial Port is not
> available.
> 
> The problem is quite difficult to reproduce (seen twice, unable to reproduce
> manually), but inspection of kexec_crash() path shows a certain deadlock
> because of one_cpu_only().
> 
> 
> This patch replaces the use of one_cpu_only() with a recursive spinlock which
> provides exclusion between different CPUs racing into kexec_crash(), but
> prevents deadlock if the same CPU reenters kexec_crash() again.

Does kexec_crash() not mind being reentered in this way?

 -- Keir

> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> 
> diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6
>      }
>  }
>  
> -static void one_cpu_only(void)
> -{
> -    /* Only allow the first cpu to continue - force other cpus to spin */
> -    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
> -        for ( ; ; ) ;
> -}
> -
>  /* Save the registers in the per-cpu crash note buffer. */
>  void kexec_crash_save_cpu(void)
>  {
> @@ -294,19 +287,25 @@ static void kexec_common_shutdown(void)
>      watchdog_disable();
>      console_start_sync();
>      spin_debug_disable();
> -    one_cpu_only();
>      acpi_dmar_reinstate();
>  }
>  
>  void kexec_crash(void)
>  {
> +    static DEFINE_SPINLOCK(crash_lock);
>      int pos;
>  
>      pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0);
>      if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
>          return;
>  
> +    /* In a cascade failure, it is possible to re-enter kexec_crash on the
> same
> +     * CPU.  This spinlock is deliberately never unlocked, to allow
> reentrance
> +     * on the same CPU, but provides excusion from different CPUs racing. */
> +    spin_lock_recursive(&crash_lock);
> +
>      kexecing = TRUE;
> +    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
>  
>      kexec_common_shutdown();
>      kexec_crash_save_cpu();



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.