[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

>>> On 18.11.13 at 11:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 18/11/13 09:26, Jan Beulich wrote:
>>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> --- a/xen/arch/x86/crash.c
>>> +++ b/xen/arch/x86/crash.c
>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>>      unsigned long msecs;
>>>      int i, cpu = smp_processor_id();
>>> +    disable_lapic_nmi_watchdog();
>>>      local_irq_disable();
>>>      crashing_cpu = cpu;
>> _If_ you do this here, I wonder why it's being done before
>> disabling interrupts.
>> But then again I wonder whether it wouldn't be better to do
>> this even earlier (i.e. by passing a flag to watchdog_disable()),
>> as the NMI watchdog becomes useless with that call being done
>> from kexec_common_shutdown().
> Disabling interrupts here is more defensive coding than anything else. 
> It is not expected to be able to get here with interrupts enabled, but
> in a crash
> Putting this in watchdog_disable() would result in a race condition. 
> disable_lapic_nmi_watchdog() mutates global state, meaning that it can
> only possibly run correctly on a single cpu. In an ideal world with
> plenty of time, the lapic watchdog code could be improved.  However,
> restricting its use until after one_cpu_only() is the easiest fix.

Just looked at disable_lapic_nmi_watchdog() another time, and I
don't see the race. What I do see though is that you'd need to
run this on each CPU for it to have the full intended effect...


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.