[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On 18/11/13 11:04, Jan Beulich wrote:
>>>> On 18.11.13 at 11:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 18/11/13 09:26, Jan Beulich wrote:
>>>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> --- a/xen/arch/x86/crash.c
>>>> +++ b/xen/arch/x86/crash.c
>>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>>>      unsigned long msecs;
>>>>      int i, cpu = smp_processor_id();
>>>> +    disable_lapic_nmi_watchdog();
>>>>      local_irq_disable();
>>>>      crashing_cpu = cpu;
>>> _If_ you do this here, I wonder why it's being done before
>>> disabling interrupts.
>>> But then again I wonder whether it wouldn't be better to do
>>> this even earlier (i.e. by passing a flag to watchdog_disable()),
>>> as the NMI watchdog becomes useless with that call being done
>>> from kexec_common_shutdown().
>> Disabling interrupts here is more defensive coding than anything else. 
>> It is not expected to be able to get here with interrupts enabled, but
>> in a crash
>> Putting this in watchdog_disable() would result in a race condition. 
>> disable_lapic_nmi_watchdog() mutates global state, meaning that it can
>> only possibly run correctly on a single cpu. In an ideal world with
>> plenty of time, the lapic watchdog code could be improved.  However,
>> restricting its use until after one_cpu_only() is the easiest fix.
> Just looked at disable_lapic_nmi_watchdog() another time, and I
> don't see the race. What I do see though is that you'd need to
> run this on each CPU for it to have the full intended effect...
> Jan

disable_lapic_nmi_watchdog() sets nmi_active to -1, which casues
subsequent calls to exit early.

There are possible crash paths where a first cpu will execute
watchdog_disable() but a subsequent cpu will complete the kexec path. 
Hooking disable_lapic_nmi_watchdog() off watchdog_disable() will result
in certain paths where the wrong cpu ends up with its MSRs properly



Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.