[Xen-users] Random Hangs At Boot Around setup_local_APIC(void): ExtINT on CPU#
A few days ago I posted to this list about a problem I am having with an Intel Atom-based computer with UEFI and Xen, under the subject "Boot Sometimes Hangs At "masked EXTINT" (Varies)". I am experiencing random hangs at start-up after the message "(XEN) [...] HVM: HAP page sizes: 4kB, 2MB" and before the message "(XEN) [...] Brought up 8 CPUs".

Here is a log of a successful start-up (which I can obtain at random if I keep rebooting), covering the section where the hang occurs:

(XEN) [2019-03-04 22:43:05] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-04 22:43:05] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#1
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#2
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#3
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#4
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#5
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#6
(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#7
(XEN) [2019-03-04 22:43:06] Brought up 8 CPUs

Without altering anything, e.g. without touching my grub.cfg, I can repeatedly try to launch the Xen hypervisor and obtain inconsistent results. Sometimes booting hangs after printing "masked ExtINT on CPU#1", or #2, or #3...6, and sometimes it makes it through the "masked..." output to a successful start-up. This randomness leads me to believe there is something hardware-related that the software has not accounted for.

I could not find an official web page for the code that I could link to, as I might were it on GitHub, so I'll just have to provide references. The "masked ExtINT on CPU#..." message is printed in apic.c, in the function setup_local_APIC(void), around line 646 (version 11.1?). Oh, and I have tried Gentoo's Xen versions 10.2 and 11.1 and see no difference in the errant behavior.
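To compare boot attempts, captured console output could be summarized with a small script along these lines. This is only a sketch: it assumes the message formats shown in the log above, and the function name summarize_boot is made up for illustration. The "hung" sample is just a truncated copy of the successful log, not a real capture.

```python
import re

# Patterns taken from the log excerpt above, e.g.
# "(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#3"
MASKED_RE = re.compile(r"masked ExtINT on CPU#(\d+)")
BROUGHT_UP_RE = re.compile(r"Brought up (\d+) CPUs")


def summarize_boot(log_text):
    """Return (last_cpu, completed) for one captured boot log.

    last_cpu  -- CPU number from the last "masked ExtINT" line seen, or None
    completed -- True if a "Brought up N CPUs" line was reached
    """
    last_cpu = None
    completed = False
    for line in log_text.splitlines():
        match = MASKED_RE.search(line)
        if match:
            last_cpu = int(match.group(1))
        elif BROUGHT_UP_RE.search(line):
            completed = True
    return last_cpu, completed


if __name__ == "__main__":
    # Illustrative only: the successful log cut short after CPU#2.
    hung = "\n".join([
        "(XEN) [2019-03-04 22:43:05] HVM: HAP page sizes: 4kB, 2MB",
        "(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#1",
        "(XEN) [2019-03-04 22:43:01] masked ExtINT on CPU#2",
    ])
    print(summarize_boot(hung))  # (2, False)
```

Running this over several attempts would show whether the hang favors a particular CPU or is spread evenly.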
I'm wondering if the following posting, which I found on a forum, bears on the issue:

======= start posting =====
I remember seeing something like this in the past and it turned out to be a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259 interrupt controller when only the BSP should. During POST the APs were exposed to ExtINT/INTR events as a result of the mis-configuration (probably due to a UEFI timer-tick using the 8259) and this left a pending ExtINT/INTR interrupt latched on the APs.

When the APs were started by the OS, the latched ExtINT/INTR interrupt is processed shortly after the OS enables interrupts. The AP then queries the 8259 to identify the vector number (which is the value of the 8259's ICW2 register + the IRQ level). The master 8259's ICW2 was set to 0x30 and, since no interrupts are actually pending, the 8259 will respond with IRQ7 (spurious interrupt) yielding a vector of 0x37 or 55. The OS was not expecting vector 55 and printed the message.

From the Intel Developer's Manual: Vol 3a, Section 10.5.1: "Only one processor in the system should have an LVT entry configured to use the ExtINT delivery mode."
======= end posting =====

From https://lkml.org/lkml/2019/3/5/538

If someone wants to provide a patch that I could apply to Gentoo's package, I can run it to see whether something is afoot that has not been considered. Otherwise, are there any suggestions on how to work around this problem?

John
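For what it's worth, the vector arithmetic in the quoted posting can be checked with a few lines of Python. This is only a sketch: the name pic_vector is made up, and it assumes the 8259 is in its usual x86 mode, where ICW2 supplies the upper five bits of the vector and the IRQ level the lower three (which matches "ICW2 + IRQ level" whenever the base is a multiple of 8).

```python
SPURIOUS_IRQ = 7  # the master 8259 reports IRQ7 when nothing is actually pending


def pic_vector(icw2_base, irq):
    """Vector an 8259 hands to the CPU for IRQ line `irq` (0-7).

    Assumption: the upper five bits come from ICW2 and the lower
    three bits are the IRQ level.
    """
    if not 0 <= irq <= 7:
        raise ValueError("an 8259 has IRQ lines 0 through 7")
    return (icw2_base & 0xF8) | irq


if __name__ == "__main__":
    vector = pic_vector(0x30, SPURIOUS_IRQ)
    print(hex(vector), vector)  # 0x37 55
```

With the ICW2 value of 0x30 from the posting, the spurious IRQ7 comes out as vector 0x37 (55), the same number given there.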