[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/nmi: Make external NMI injection reliably crash the host



>>> On 27.08.14 at 13:14, <ross.lagerwall@xxxxxxxxxx> wrote:
> On 08/26/2014 04:38 PM, Jan Beulich wrote:
>>>>> On 26.08.14 at 17:26, <ross.lagerwall@xxxxxxxxxx> wrote:
>>> On 08/26/2014 01:59 PM, Jan Beulich wrote:
>>>>>>> On 26.08.14 at 12:10, <ross.lagerwall@xxxxxxxxxx> wrote:
>>>>> @@ -3323,7 +3323,7 @@ void do_nmi(const struct cpu_user_regs *regs)
>>>>>                pci_serr_error(regs);
>>>>>            if ( reason & 0x40 )
>>>>>                io_check_error(regs);
>>>>> -        if ( !(reason & 0xc0) && !nmi_watchdog )
>>>>> +        if ( !(reason & 0xc0) )
>>>>>                unknown_nmi_error(regs, reason);
>>>>
>>>> As much as I like the original idea, I'm afraid this won't fly: I do
>>>> know of systems where bad motherboard design leads to neither
>>>> of these two bits ever getting set. I.e. at the very minimum we'd
>>>> need a command line option to restore old behavior. Personally I
>>>> think it should in fact remain default behavior, and new behavior
>>>> should only be enabled via command line option.
>>>
>>> Well the old behavior was different depending on whether the watchdog
>>> was enabled or not. Since the watchdog was disabled by default, that's
>>> no different from the behavior here.
>>>
>>> So are you thinking something like an ignore_unknown_nmi boolean
>>> parameter that defaults to true?
>>
>> More like a "watchdog=force" one, but right, since the watchdog
>> isn't being enabled by default, maybe making it an opt-out instead
>> of opt-in would indeed be acceptable.
>>
> 
> If bad motherboard design leads to neither of these bits being set (thus 
> always giving an unknown nmi error), can't the user set nmi=ignore on 
> the xen command-line to get the previous behavior?

As opt-out that could be acceptable, but I'm still not certain we
wouldn't better go with an opt-in, in which case things ought to
continue to work the way they are currently namely without
any "nmi=" option specified (and you clearly don't want Dom0 to
see NMIs it previously didn't get to see).

Actually part of my apparent confusion about the intended new
behavior stems from me not having spotted that you bail from
do_nmi() on a positively identified watchdog NMI. That, however,
is wrong in any event: You absolutely have to look at the two
reason bits, as an multiple sources may have triggered at (almost)
the same time. Linux over the last few years went through quite
some hoops to deal with such situations, making me assume they
aren't purely theoretical. So I think setting a flag instead, and
changing to

        if ( !(reason & 0xc0) && !watchdog )
            unknown_nmi_error(regs, reason);

would already eliminate some of the concerns. (Then, quite
obviously, an NMI ought to be unknown too if seen on a CPU
other than the BSP, yet identified as not being a watchdog
one. Implying that you may want nmi_watchdog_tick() to
return a tristate allowing to distinguish positive/negative/
don't-know.)

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.