[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
On Fri, 16 Feb 2018 17:46:48 +0000 Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote: >We're noticing a reproducible system boot hang on certain >post-Skylake platforms where the BIOS is configured in >legacy boot mode with x2APIC disabled. The system stalls >immediately after writing the first SMP initialization >sequence into APIC ICR. > >The cause of the problem is watchdog NMI handler execution - >somewhere near the end of NMI handling (after it's already >rescheduled the next NMI) it tries to access IO port 0x61 >to get the actual NMI reason on CPU0. Unfortunately, this >port is emulated by BIOS using SMIs and this emulation for >some reason takes more time than we expect during INIT-SIPI-SIPI >sequence. As the result, the system is constantly moving between >NMI and SMI handler and not making any progress. > >To avoid this, initialize the watchdog after SMP bootstrap on >CPU0 and, additionally, protect the NMI handler by moving >IO port access before NMI re-scheduling. > >Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> This is far better than merely reducing the NMI rate. The reason why SMI execution takes so long in the MP initialization sequence particularly is (very likely) a necessity to wait for other CPUs in the SMI handler. Sending INIT causes target CPU to become numb for a relatively long time (SDM propose 10ms delay in the MP init seq after it). If, right after sending the INIT IPI, we immediately cause SMI I/O trap by reading port 61h -- this means that at the beginning of the SMI handler the primary CPU will have to wait (via a spinlock) for other CPUs including the one which only started to come out from INIT coma. AFAIR it doesn't even begin execution of SMI processing until all CPUs sync their execution in SMM via that spinlock. Thus overall long SMI processing time, enough for a next NMI watchdog tick to arrive. Basically, it's a race between sending INIT to APs, triggering SMI by reading the port 61h, spin-waiting for other CPUs at the start of the SMI handler and the NMI watchdog timer (which generates flow of SMIs). If this assumptions are correct, delaying triggering SMIs due to a NMI watchdog until MP initialization is complete is the right solution. >--- > xen/arch/x86/apic.c | 2 +- > xen/arch/x86/smpboot.c | 3 +++ > xen/arch/x86/traps.c | 9 +++++++-- > 3 files changed, 11 insertions(+), 3 deletions(-) > >diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c >index 5039173..ffa5a69 100644 >--- a/xen/arch/x86/apic.c >+++ b/xen/arch/x86/apic.c >@@ -684,7 +684,7 @@ void setup_local_APIC(void) > printk("Leaving ESR disabled.\n"); > } > >- if (nmi_watchdog == NMI_LOCAL_APIC) >+ if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id()) > setup_apic_nmi_watchdog(); > apic_pm_activate(); > } >diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c >index 2ebef03..1844116 100644 >--- a/xen/arch/x86/smpboot.c >+++ b/xen/arch/x86/smpboot.c >@@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu) > void __init smp_cpus_done(void) > { > if ( nmi_watchdog == NMI_LOCAL_APIC ) >+ { >+ setup_apic_nmi_watchdog(); > check_nmi_watchdog(); >+ } > > setup_ioapic_dest(); > >diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c >index 2e022b0..c16f146 100644 >--- a/xen/arch/x86/traps.c >+++ b/xen/arch/x86/traps.c >@@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback = >dummy_nmi_callback; > void do_nmi(const struct cpu_user_regs *regs) > { > unsigned int cpu = smp_processor_id(); >- unsigned char reason; >+ unsigned char reason = 0; > bool handle_unknown = false; > > ++nmi_count(cpu); >@@ -1714,6 +1714,12 @@ void do_nmi(const struct cpu_user_regs *regs) > if ( nmi_callback(regs, cpu) ) > return; > >+ /* This IO port access is likely to produce SMI which, in turn, >+ * may take enough time for the next NMI tick to happen. To avoid >having >+ * nested NMIs as the result let's call it before watchdog >re-scheduling */ >+ if ( cpu == 0 ) >+ reason = inb(0x61); >+ > if ( (nmi_watchdog == NMI_NONE) || > (!nmi_watchdog_tick(regs) && watchdog_force) ) > handle_unknown = true; >@@ -1721,7 +1727,6 @@ void do_nmi(const struct cpu_user_regs *regs) > /* Only the BSP gets external NMIs from the system. */ > if ( cpu == 0 ) > { >- reason = inb(0x61); > if ( reason & 0x80 ) > pci_serr_error(regs); > if ( reason & 0x40 ) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |