[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap



On Mon, 19 Feb 2018 14:48:37 +0000
Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:

>On 19/02/18 14:23, Igor Druzhinin wrote:
>> We're noticing a reproducible system boot hang on certain
>> post-Skylake platforms where the BIOS is configured in  
>
>These are Skylake, not post-Skylake.

Well, strictly speaking, any platforms which made obsolete REF_TOGGLE
bit in NMI_SC (I/O port 61h) and provide sw emulation for it via the
SMI I/O trap instead.

>> legacy boot mode with x2APIC disabled. The system stalls
>> immediately after writing the first SMP initialization
>> sequence into APIC ICR.
>>
>> The cause of the problem is watchdog NMI handler execution -
>> somewhere near the end of NMI handling (after it's already
>> rescheduled the next NMI) it tries to access IO port 0x61
>> to get the actual NMI reason on CPU0. Unfortunately, this
>> port is emulated by BIOS using SMIs and this emulation for
>> some reason takes more time than we expect during INIT-SIPI-SIPI
>> sequence. As the result, the system is constantly moving between
>> NMI and SMI handler and not making any progress.
>>
>> To avoid this, initialize the watchdog after SMP bootstrap on
>> CPU0 and, additionally, protect the NMI handler by moving
>> IO port access before NMI re-scheduling. The latter should help
>> in case of post boot CPU onlining. Although we're running
>> watchdog at much lower frequency it's neveretheless possible
>> we may trigger the issue anyway.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>> ---
>> v3: corrected comments and coommit meesage.
>> ---
>>  xen/arch/x86/apic.c    |  2 +-
>>  xen/arch/x86/smpboot.c |  3 +++
>>  xen/arch/x86/traps.c   | 12 ++++++++++--
>>  3 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
>> index 5039173..ffa5a69 100644
>> --- a/xen/arch/x86/apic.c
>> +++ b/xen/arch/x86/apic.c
>> @@ -684,7 +684,7 @@ void setup_local_APIC(void)
>>          printk("Leaving ESR disabled.\n");
>>      }
>>  
>> -    if (nmi_watchdog == NMI_LOCAL_APIC)
>> +    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
>>          setup_apic_nmi_watchdog();
>>      apic_pm_activate();
>>  }
>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>> index 2ebef03..1844116 100644
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
>>  void __init smp_cpus_done(void)
>>  {
>>      if ( nmi_watchdog == NMI_LOCAL_APIC )
>> +    {
>> +        setup_apic_nmi_watchdog();
>>          check_nmi_watchdog();
>> +    }
>>  
>>      setup_ioapic_dest();
>>  
>> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>> index 2e022b0..e6c7487 100644
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback =
>> dummy_nmi_callback; void do_nmi(const struct cpu_user_regs *regs)
>>  {
>>      unsigned int cpu = smp_processor_id();
>> -    unsigned char reason;
>> +    unsigned char reason = 0;
>>      bool handle_unknown = false;
>>  
>>      ++nmi_count(cpu);
>> @@ -1714,6 +1714,15 @@ void do_nmi(const struct cpu_user_regs *regs)
>>      if ( nmi_callback(regs, cpu) )
>>          return;
>>  
>> +    /*
>> +     * There is a chance that this IO port access will produce SMI
>> which,
>> +     * in turn, may take enough time for the next NMI tick to
>> happen.
>> +     * To avoid having nested NMIs as the result let's do it before
>> +     * watchdog re-scheduling.  
>
>This isn't strictly accurate.  How about:
>
>/* Reads of 0x61 may trap to SMM, and on production SKX servers, have
>been observed to take up to 200ms to complete.  By reading this port
>before we re-arm the NMI watchdog, we reduce the chance of having an
>NMI watchdog expire while in the SMI handler. */
>
>In particular, if we are servicing a non-watchdog NMI, the watchdog
>will still be counting down while the SMI executes.
>
>~Andrew
>
>> +     */
>> +    if ( cpu == 0 )
>> +        reason = inb(0x61);
>> +
>>      if ( (nmi_watchdog == NMI_NONE) ||
>>           (!nmi_watchdog_tick(regs) && watchdog_force) )
>>          handle_unknown = true;
>> @@ -1721,7 +1730,6 @@ void do_nmi(const struct cpu_user_regs *regs)
>>      /* Only the BSP gets external NMIs from the system. */
>>      if ( cpu == 0 )
>>      {
>> -        reason = inb(0x61);
>>          if ( reason & 0x80 )
>>              pci_serr_error(regs);
>>          if ( reason & 0x40 )  
>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxxxxxxxxx
>https://lists.xenproject.org/mailman/listinfo/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.