Re: [Xen-devel] [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap

On 19/02/18 14:23, Igor Druzhinin wrote:
> We're noticing a reproducible system boot hang on certain
> post-Skylake platforms where the BIOS is configured in

These are Skylake, not post-Skylake.

> legacy boot mode with x2APIC disabled. The system stalls
> immediately after writing the first SMP initialization
> sequence into APIC ICR.
>
> The cause of the problem is watchdog NMI handler execution -
> somewhere near the end of NMI handling (after it's already
> rescheduled the next NMI) it tries to access IO port 0x61
> to get the actual NMI reason on CPU0. Unfortunately, this
> port is emulated by BIOS using SMIs and this emulation for
> some reason takes more time than we expect during INIT-SIPI-SIPI
> sequence. As the result, the system is constantly moving between
> NMI and SMI handler and not making any progress.
>
> To avoid this, initialize the watchdog after SMP bootstrap on
> CPU0 and, additionally, protect the NMI handler by moving
> IO port access before NMI re-scheduling. The latter should help
> in case of post boot CPU onlining. Although we're running
> watchdog at much lower frequency it's nevertheless possible
> we may trigger the issue anyway.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> ---
> v3: corrected comments and commit message.
> ---
> xen/arch/x86/apic.c | 2 +-
> xen/arch/x86/smpboot.c | 3 +++
> xen/arch/x86/traps.c | 12 ++++++++++--
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
> index 5039173..ffa5a69 100644
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -684,7 +684,7 @@ void setup_local_APIC(void)
> printk("Leaving ESR disabled.\n");
> }
>
> - if (nmi_watchdog == NMI_LOCAL_APIC)
> + if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
> setup_apic_nmi_watchdog();
> apic_pm_activate();
> }
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 2ebef03..1844116 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
> void __init smp_cpus_done(void)
> {
> if ( nmi_watchdog == NMI_LOCAL_APIC )
> + {
> + setup_apic_nmi_watchdog();
> check_nmi_watchdog();
> + }
>
> setup_ioapic_dest();
>
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 2e022b0..e6c7487 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback = dummy_nmi_callback;
> void do_nmi(const struct cpu_user_regs *regs)
> {
> unsigned int cpu = smp_processor_id();
> - unsigned char reason;
> + unsigned char reason = 0;
> bool handle_unknown = false;
>
> ++nmi_count(cpu);
> @@ -1714,6 +1714,15 @@ void do_nmi(const struct cpu_user_regs *regs)
> if ( nmi_callback(regs, cpu) )
> return;
>
> + /*
> + * There is a chance that this IO port access will produce SMI which,
> + * in turn, may take enough time for the next NMI tick to happen.
> + * To avoid having nested NMIs as the result let's do it before
> + * watchdog re-scheduling.

This isn't strictly accurate.  How about:

/*
 * Reads of 0x61 may trap to SMM, and on production SKX servers, have
 * been observed to take up to 200ms to complete.  By reading this port
 * before we re-arm the NMI watchdog, we reduce the chance of having an
 * NMI watchdog expire while in the SMI handler.
 */

In particular, if we are servicing a non-watchdog NMI, the watchdog will
still be counting down while the SMI executes.

~Andrew

> + */
> + if ( cpu == 0 )
> + reason = inb(0x61);
> +
> if ( (nmi_watchdog == NMI_NONE) ||
> (!nmi_watchdog_tick(regs) && watchdog_force) )
> handle_unknown = true;
> @@ -1721,7 +1730,6 @@ void do_nmi(const struct cpu_user_regs *regs)
> /* Only the BSP gets external NMIs from the system. */
> if ( cpu == 0 )
> {
> - reason = inb(0x61);
> if ( reason & 0x80 )
> pci_serr_error(regs);
> if ( reason & 0x40 )
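
For illustration only, the ordering argument can be put as a toy model.
This is not Xen code: progress_ms(), the enum and the example periods are
hypothetical, and the model assumes the NMI being serviced is a watchdog
NMI and that the SMI for the 0x61 read runs for the whole smi_ms. It
returns how long the CPU gets to run outside the NMI/SMI handlers before
the next watchdog NMI fires.

```c
#include <assert.h>

/* Toy model, not Xen code.  All times are in milliseconds. */
enum nmi_order {
    REARM_THEN_READ,   /* old order: re-arm watchdog, then inb(0x61) */
    READ_THEN_REARM,   /* patched order: inb(0x61), then re-arm      */
};

/*
 * Time available for forward progress between two watchdog NMIs.
 * period_ms: watchdog period (an assumed value, chosen by the caller).
 * smi_ms:    time the SMI handler spends emulating the port read.
 */
unsigned int progress_ms(enum nmi_order order, unsigned int period_ms,
                         unsigned int smi_ms)
{
    assert(period_ms > 0);

    /*
     * Port read happens before the re-arm, so the full period is
     * still ahead of us once the handler returns.
     */
    if ( order == READ_THEN_REARM )
        return period_ms;

    /*
     * Watchdog is already counting while the SMI runs.  If the SMI
     * outlasts the period, the next NMI is pending as soon as we
     * return and no progress is made.
     */
    return period_ms > smi_ms ? period_ms - smi_ms : 0;
}
```

With an assumed 100ms period and the 200ms SMI latency mentioned above,
the old order leaves 0ms for forward progress, which matches the observed
hang, while the patched order leaves the full 100ms. The model does not
cover the non-watchdog NMI case, where the watchdog keeps counting down
during the SMI with either order.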
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel