Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs
On Mon, 5 Feb 2018 21:18:42 +0000
Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:

>We're noticing a reproducible system boot hang on certain
>post-Skylake platforms where the BIOS is configured in
>legacy boot mode with x2APIC disabled. The system stalls
>immediately after writing the first SMP initialization
>sequence into APIC ICR.
>
>The cause of the problem is watchdog NMI handler execution -
>somewhere near the end of NMI handling (after it's already
>rescheduled the next NMI) it tries to access IO port 0x61
>to get the actual NMI reason on CPU0. Unfortunately, this
>port is emulated by BIOS using SMIs and this emulation
>apparently might take more than we expect under certain
>conditions. As the result, the system is constantly moving
>between NMI and SMI handler and not making any progress.
>
>Just lower the initial frequency for now as we lower it later
>even more anyway.

I/O port 61h is normally not emulated by the BIOS's SMI legacy keyboard
handling code; only ports like 60h, 64h, etc. are. Unlike USB legacy
emulation, intercepting port 61h requires a different approach, a generic
SMI I/O trap, which BIOSes do not commonly use (or at least did not in the
past), although it is possible, since an EFI interface and code for it are
available.

The assumption that port 61h is trapped by the SMI handler must be
confirmed explicitly by checking the I/O Trap control registers in the
RCBA region. If those registers do not show an active I/O trap on port
61h, the root cause might be different (it might even be related to
something like the NMI2SMI logic).

If the problem really is that the NMI handler is preempted by another NMI
which occurs after a (long) execution of the triggered SMI handler, it
might be better to do all the sensitive work before NMIs are re-enabled
by the IRET in the NMI handler.
>Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>---
> xen/arch/x86/nmi.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
>index d7fce28..1eb2a32 100644
>--- a/xen/arch/x86/nmi.c
>+++ b/xen/arch/x86/nmi.c
>@@ -34,7 +34,8 @@
> #include <asm/apic.h>
> 
> unsigned int nmi_watchdog = NMI_NONE;
>-static unsigned int nmi_hz = HZ;
>+/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */
>+static unsigned int nmi_hz = HZ / 10;
> static unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */
> static unsigned int nmi_p4_cccr_val;
> static DEFINE_PER_CPU(struct timer, nmi_timer);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel