[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On 18/11/13 11:04, Jan Beulich wrote: >>>> On 18.11.13 at 11:33, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: >> On 18/11/13 09:26, Jan Beulich wrote: >>>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: >>>> --- a/xen/arch/x86/crash.c >>>> +++ b/xen/arch/x86/crash.c >>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) >>>> unsigned long msecs; >>>> int i, cpu = smp_processor_id(); >>>> >>>> + disable_lapic_nmi_watchdog(); >>>> local_irq_disable(); >>>> >>>> crashing_cpu = cpu; >>> _If_ you do this here, I wonder why it's being done before >>> disabling interrupts. >>> >>> But then again I wonder whether it wouldn't be better to do >>> this even earlier (i.e. by passing a flag to watchdog_disable()), >>> as the NMI watchdog becomes useless with that call being done >>> from kexec_common_shutdown(). >> Disabling interrupts here is more defensive coding than anything else. >> It is not expected to be able to get here with interrupts enabled, but >> in a crash >> >> Putting this in watchdog_disable() would result in a race condition. >> disable_lapic_nmi_watchdog() mutates global state, meaning that it can >> only possibly run correctly on a single cpu. In an ideal world with >> plenty of time, the lapic watchdog code could be improved. However, >> restricting its use until after one_cpu_only() is the easiest fix. > Just looked at disable_lapic_nmi_watchdog() another time, and I > don't see the race. What I do see though is that you'd need to > run this on each CPU for it to have the full intended effect... > > Jan > disable_lapic_nmi_watchdog() sets nmi_active to -1, which casues subsequent calls to exit early. There are possible crash paths where a first cpu will execute watchdog_disable() but a subsequent cpu will complete the kexec path. Hooking disable_lapic_nmi_watchdog() off watchdog_disable() will result in certain paths where the wrong cpu ends up with its MSRs properly disabled. ~Andrew disable_lapic_nmi_watchdog() _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |