
Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs



On 06/02/18 16:23, Jan Beulich wrote:
>>>> On 06.02.18 at 17:14, <igor.druzhinin@xxxxxxxxxx> wrote:
>> On 06/02/18 16:07, Jan Beulich wrote:
>>>>>> On 05.02.18 at 22:18, <igor.druzhinin@xxxxxxxxxx> wrote:
>>>> --- a/xen/arch/x86/nmi.c
>>>> +++ b/xen/arch/x86/nmi.c
>>>> @@ -34,7 +34,8 @@
>>>>  #include <asm/apic.h>
>>>>  
>>>>  unsigned int nmi_watchdog = NMI_NONE;
>>>> -static unsigned int nmi_hz = HZ;
>>>> +/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */
>>>> +static unsigned int nmi_hz = HZ / 10;
>>>
>>> For one - the comment should explain what "too high" means.
>>> Further - what if on another system 10Hz is still too high? I also hope
>>> you realize that you slow down boot a little for everyone just
>>> because of this one machine model. Can the lower frequency perhaps
>>> be set via DMI quirk, or otherwise obtain a command line override
>>> (perhaps something like "watchdog=probe:10Hz")?
>>>
>>
>> I can improve the comment message.
>> Why does this change slow down anything while I'm lowering the frequency
>> - not making it higher?
> 
> We wait for two occurrences of the NMI in wait_for_nmis().
> 
>> The alternative approach would be to reshuffle
>> the code and take the reason before programming the next interrupt as
>> suggested before. In that case the actual frequency would be adjusted
>> naturally I think.
> 
> Thinking about this, reading the reason early seems like a good idea
> to me irrespective of the issue here.
>

I ran a couple of experiments with different layouts of the NMI handler.
It doesn't really help: merely having the reason read inside the handler
and running it at 100Hz breaks a number of timeouts in the SMP bootstrap
code and makes boot unstable. So we are back to lowering the frequency,
as I'm now out of ideas.

The problem with a quirk or command line parameter is that the issue is
reported on a wide variety of systems and appears to depend on the
default BIOS setup, which means it's hard to identify particular
machines. We should obviously sort this out with Intel, but until then
lowering the initial frequency is our only option.
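
For reference, here is a rough sketch of the boot-time cost Jan is
pointing at. It assumes wait_for_nmis() simply blocks until two NMIs
have arrived, and that HZ is 100 (which matches the 100Hz figure above).
nmi_wait_ms() is a made-up helper for illustration only, not something
that exists in nmi.c:

```c
/*
 * Hypothetical helper, not part of xen/arch/x86/nmi.c.
 *
 * Approximate upper bound, in milliseconds, on the time needed to
 * observe `count` watchdog NMIs when they fire at `nmi_hz` per second.
 * Assumes the NMIs arrive at a steady rate and nothing else delays them.
 */
static unsigned int nmi_wait_ms(unsigned int nmi_hz, unsigned int count)
{
    /* One NMI period is 1000 / nmi_hz ms; multiply first to keep precision. */
    return count * 1000u / nmi_hz;
}
```

With those assumptions nmi_wait_ms(100, 2) is 20 and nmi_wait_ms(10, 2)
is 200, so the patch would add something on the order of 180ms to the
watchdog check. I haven't measured this on real hardware.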

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel