Re: [Xen-devel] [PATCH] x86/nmi: Make external NMI injection reliably crash the host

On 26/08/2014 22:51, Don Slutz wrote:
> On 08/26/14 12:51, Andrew Cooper wrote:
>> On 26/08/14 17:06, Don Slutz wrote:
>>> On 08/26/14 06:10, Ross Lagerwall wrote:
>>>> Change the watchdog handler to only "tick" if the corresponding perf
>>>> counter has overflowed; otherwise, return false from the NMI
>>>> handler to
>>>> indicate that the NMI is not a watchdog tick and let the other
>>>> handlers
>>>> handle it.  This allows externally injected NMIs to reliably crash the
>>>> host rather than be swallowed by the watchdog handler.
>>> If a crash kernel has been set up via kexec, does this change to
>>> "crash host" end up jumping into the crash kernel?
>>>      -Don Slutz
>> No - this has no change of behaviour as to how Xen proceeds after it has
>> decided to panic().
>> It does however change whether Xen decided to panic, depending on
>> whether the NMI was a result of the watchdog, or some otherwise
>> unidentified NMI.
>> Basically, without this change, the "inject fatal NMI" option in most
>> IPMI controllers doesn't work in combination with running the Xen
>> watchdog.  Only certain HP systems appear to set the IOCK bit in the
>> system control port B when injecting an NMI.  All other systems just
>> send an NMI with no change to the control ports, which gets eaten by
>> the watchdog logic.
>> This patch changes the watchdog logic to only consider an NMI as a
>> watchdog tick if the perf counter confirms that it injected the NMI.
> Well, that is useful information.  Looks like I was not clear.  I am
> reading
>> as to how Xen proceeds after it has
>> decided to panic().
> as a yes, but you start with a no.  And I am getting "crash host" to
> mean "calls panic()".
>    -Don Slutz
>> ~Andrew

Allow me to try again.

This patch will alter how NMIs are classified.  It does not alter the
actions of a particular classification of NMI.

Before this patch, with the watchdog active, any NMI which did not
explicitly set the IOCK or SERR bit in system control port B would be
considered a watchdog NMI and ignored.  The vast majority of "inject
NMI" options from IPMI controllers do not set either bit.

After this patch is applied, NMIs which are received but not generated
by the watchdog performance counters will be considered as external NMIs
*even if* the IOCK/SERR bits are not set.

The action taken upon discovery of these NMIs is still controlled by the
nmi=fatal/dom0/ignore command line option, and in the case of
nmi=fatal, panic() is still called as before.
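
As a sketch of that second, unchanged stage: the mapping from the
command line option to an action could be pictured roughly as below.
The enum and function names are hypothetical and only the three option
strings come from the real command line; the point is merely that this
step is independent of how the NMI was classified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; only the option strings are real. */
enum nmi_action {
    NMI_ACT_PANIC,         /* nmi=fatal:  call panic() */
    NMI_ACT_FORWARD_DOM0,  /* nmi=dom0:   hand the NMI to dom0 */
    NMI_ACT_IGNORE,        /* nmi=ignore: drop it */
    NMI_ACT_INVALID        /* unrecognised option string */
};

/* Map the value of the nmi= option to the action for an external NMI. */
enum nmi_action nmi_action_from_opt(const char *opt)
{
    if (opt == NULL)
        return NMI_ACT_INVALID;
    if (strcmp(opt, "fatal") == 0)
        return NMI_ACT_PANIC;
    if (strcmp(opt, "dom0") == 0)
        return NMI_ACT_FORWARD_DOM0;
    if (strcmp(opt, "ignore") == 0)
        return NMI_ACT_IGNORE;
    return NMI_ACT_INVALID;
}
```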

Realistically, it means that, with the NMI watchdog enabled, an NMI sent
using the "inject NMI" button on your Dell/SuperMicro/IBM/Quanta/Intel
IPMI interface will be classified as an external NMI rather than a
watchdog NMI and, in the case of nmi=fatal, will call panic().  (Certain
HP servers are the only ones we have encountered which reliably set the
IOCK bit when injecting an NMI from the iLO interface.)


