[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()



On 05/04/2013 22:21, Keir Fraser wrote:
> On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> During automated testing of the Xen crash path via fatal NMIs, we discovered
>> the following deadlock:
>>
>> [22:47:04 UTC] (XEN) ****************************************
>> [22:47:04 UTC] (XEN) Panic on CPU 0:
>> [22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi)
>> [22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>> [22:47:04 UTC] (XEN) ****************************************
>> [22:47:04 UTC] (XEN)
>> [22:47:04 UTC] (XEN) Reboot in five seconds...
>> [22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002
>> [22:47:05 UTC] (XEN) DMAR_IQH_REG = 0
>> [22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0
>> [22:47:05 UTC] (XEN)
>> [22:47:05 UTC] (XEN) ****************************************
>> [22:47:05 UTC] (XEN) Panic on CPU 0:
>> [22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed
>> [22:47:05 UTC] (XEN) ****************************************
>> [22:47:05 UTC] (XEN)
>> [22:47:05 UTC] (XEN) Reboot in five seconds...
>> [23:09:54 UTC]  The server is not powered on.  The Virtual Serial Port is not
>> available.
>>
>> The problem is quite difficult to reproduce (seen twice, unable to reproduce
>> manually), but inspection of kexec_crash() path shows a certain deadlock
>> because of one_cpu_only().
>>
>>
>> This patch replaces the use of one_cpu_only() with a recursive spinlock which
>> provides exclusion between different CPUs racing into kexec_crash(), but
>> prevents deadlock if the same CPU reenters kexec_crash() again.
> Does kexec_crash() not mind being reentered in this way?
>
>  -- Keir

Not really.

I first experimented with seeing whether it was possible to detect
reentrance and back out, but there certainly cases with the reentrant
NMI/MCE where you can loose the state required to get back to the outer
instance of kexec_crash.  Furthermore, in the case of a crash its
possible memory corruption has occurred, meaning faults are more likely,
leading to reentrance back onto the kexec_crash() path.  Also, there are
BUG()s and ASSERTS()s in the current kexec_crash() path.

Therefore, the sensible approach is to make kexec_crash() safe to be
reentered, even if we do make all attempts to minimise the chances of
reentrance.

Looking through the entire kexec_crash() path, console_start_sync()
could deadlock on the serial tx_lock if a fault occurs midway through
the function, and acpi_dmar_reinstate() will leave the DMAR table in a
state which fails the checksum.  kexec_crash_save_cpu() might not
completely write the crash notes, but there are no other obvious problems.

I can probably fix the acpi_dmar_reinstate() and kexec_crash_save_cpu()
issues without too much problem and will look into those in due course,
but my next task is to see whether there was a programatic reason as to
why the wait descriptor was not executed successfully.

Given that on the crash path something has already gone wrong, I would
argue this patch on the merit of removing a deadlock without making the
situation any worse.  I think it is impossible to have a kexec_crash()
path which is safely reentrant and guarantee to work without issue;
after all, if we reenter too many times because of faults we will likely
triple fault.

~Andrew

>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>
>> diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c
>> --- a/xen/common/kexec.c
>> +++ b/xen/common/kexec.c
>> @@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6
>>      }
>>  }
>>  
>> -static void one_cpu_only(void)
>> -{
>> -    /* Only allow the first cpu to continue - force other cpus to spin */
>> -    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
>> -        for ( ; ; ) ;
>> -}
>> -
>>  /* Save the registers in the per-cpu crash note buffer. */
>>  void kexec_crash_save_cpu(void)
>>  {
>> @@ -294,19 +287,25 @@ static void kexec_common_shutdown(void)
>>      watchdog_disable();
>>      console_start_sync();
>>      spin_debug_disable();
>> -    one_cpu_only();
>>      acpi_dmar_reinstate();
>>  }
>>  
>>  void kexec_crash(void)
>>  {
>> +    static DEFINE_SPINLOCK(crash_lock);
>>      int pos;
>>  
>>      pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0);
>>      if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
>>          return;
>>  
>> +    /* In a cascade failure, it is possible to re-enter kexec_crash on the
>> same
>> +     * CPU.  This spinlock is deliberately never unlocked, to allow
>> reentrance
>> +     * on the same CPU, but provides excusion from different CPUs racing. */
>> +    spin_lock_recursive(&crash_lock);
>> +
>>      kexecing = TRUE;
>> +    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
>>  
>>      kexec_common_shutdown();
>>      kexec_crash_save_cpu();
>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.