Re: [Xen-devel] Xen 4.12.0-rc Hangs Around masked ExtINT on CPU#
On 3/23/2019 11:50 PM, Roger Pau Monné wrote:
> On Fri, Mar 22, 2019 at 05:46:26PM -0700, John L. Poole wrote:
>> On 3/22/2019 7:40 AM, Andrew Cooper wrote:
>>> On 22/03/2019 09:53, John L. Poole wrote:
>>>> 3) Xen Source - here is the log of an attempt adding "cpuinfor maxcpus=1 watchdog" as an option in my man_xen.cfg:
>>>> https://pastebin.com/b682FWmC (6 months)
>>>>
>>>> The last 13 lines:
>>>>
>>>> (XEN) [2019-03-22 09:37:49] Booting processor 2/4 eip 3e000
>>>> (XEN) [2019-03-22 09:35:28] Initializing CPU#2
>>>> (XEN) [2019-03-22 09:35:28] masked ExtINT on CPU#2
>>>> (XEN) [2019-03-22 09:35:28] CPU: Physical Processor ID: 0
>>>> (XEN) [2019-03-22 09:35:28] CPU: Processor Core ID: 2
>>>> (XEN) [2019-03-22 09:35:28] CPU: L1 I cache: 32K, L1 D cache: 24K
>>>> (XEN) [2019-03-22 09:35:28] CPU: L2 cache: 1024K
>>>> (XEN) [2019-03-22 09:35:28] CMCI: CPU2 has no CMCI support
>>>> (XEN) [2019-03-22 09:35:28] CPU2: Thermal monitoring enabled (TM1)
>>>> (XEN) [2019-03-22 09:37:49] CPU2: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz stepping 08
>>>> (XEN) [2019-03-22 09:37:49] Adding cpu 2 to runqueue 0
>>>> (XEN) [2019-03-22 09:37:49] Removing cpu 2 from runqueue 0
>>>> (XEN) [2019-03-22 09:37:49] Booting processor 3/6 eip 3e000
>>>>
>>>> Result: hangs around the same place
>>>
>>> Ok. Something is clearly stalling while we are trying to start secondary processors.
>>>
>>> Can you apply this patch and rebuild please?
>>>
>>> andrewcoop@andrewcoop:/local/xen.git$ git d
>>> diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
>>> index 9d7ec93..14ac0b1 100644
>>> --- a/xen/include/asm-x86/apic.h
>>> +++ b/xen/include/asm-x86/apic.h
>>> @@ -5,7 +5,7 @@
>>>  #include <asm/fixmap.h>
>>>  #include <asm/msr.h>
>>>  
>>> -#define Dprintk(x...) do {} while (0)
>>> +#define Dprintk printk
>>>  
>>>  /*
>>>   * Debugging macros
>>>
>>> This should give us some better diagnostics of the INIT-SIPI-SIPI mechanism.
>>>
>>> Do you have any options such as TXT or SMX enabled in firmware? They can interfere with AP bringup, so it would be useful to disable them for now.
>>>
>>> ~Andrew
>>
>> Done. I tried patching and then running make, but ran into an error.
>> So I performed:
>>
>>   git pull
>>   make clean
>>
>> then verified the patch was still in effect, and then ran:
>>
>>   make
>>
>> There was some problem with the install, so I moved by hand:
>>
>>   ...
>>   -rw-r--r-- 1 root root 2991647 Mar 22 11:01 xen-4.13-unstable.efi
>>   ...
>>
>> from /usr/local/src/xen/dist/install/usr/lib64/efi/ to /boot/efi/gentoo and renamed it man_xen.efi. Likewise, I found a Xen kernel under /usr/local/src/xen/xen/dist/install/boot/:
>>
>>   ...
>>   -rw-r--r-- 1 root root 1181850 Mar 22 11:01 xen-4.13-unstable.gz
>>   ...
>>
>> and moved it to /boot/efi/gentoo without renaming it, making sure /boot/efi/gentoo/man_xen.cfg defines the kernel as "xen-4.13-unstable.gz".
>>
>> Result: same failure, but with more debugging information. Here are the last eleven lines (starting at line 287):
>>
>> (XEN) [2019-03-23 00:36:06] HVM: ASIDs enabled.
>> (XEN) [2019-03-23 00:36:06] HVM: VMX enabled
>> (XEN) [2019-03-23 00:36:06] HVM: Hardware Assisted Paging (HAP) detected
>> (XEN) [2019-03-23 00:36:06] HVM: HAP page sizes: 4kB, 2MB
>> (XEN) [2019-03-23 00:36:06] Booting processor 1/2 eip 3e000
>> (XEN) [2019-03-23 00:36:06] Setting warm reset code and vector.
>> (XEN) [2019-03-23 00:36:06] 1.
>> (XEN) [2019-03-23 00:36:06] 2.
>> (XEN) [2019-03-23 00:36:06] 3.
>> (XEN) [2019-03-23 00:36:06] Asserting INIT.
>> (XEN) [2019-03-23 00:36:06] Waiting for send to finish...
>>
>> Here is the full boot log: https://pastebin.com/0LgrJH25
>
> I'm currently away from home and cannot really help much at the moment. Also, I don't have access to a system with a CPU that exhibits this behavior, which makes debugging it harder.
>
> I've taken a look at the differences in AP startup code between Linux and Xen at or before the point where you get the hang, and I'm not able to spot anything obvious that could make Linux work and not Xen.
>
> I've noticed, however, that Linux disables interrupts when writing to the local APIC ICR register. It does so for other reasons, but maybe this somehow affects bring-up on this CPU, hence the patch below. Could you please give it a spin together with the patch provided by Andrew?
> There are other minor differences between Linux and Xen AP bring-up, so I guess there are further changes to test if the patch below doesn't make things better.
>
> Thanks, Roger.
>
> ---8<---
> diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
> index 9d7ec93042..f28e922e2e 100644
> --- a/xen/include/asm-x86/apic.h
> +++ b/xen/include/asm-x86/apic.h
> @@ -138,8 +138,12 @@ static __inline void apic_icr_write(u32 low, u32 dest)
>          apic_wrmsr(APIC_ICR, low | ((uint64_t)dest << 32));
>      else
>      {
> +        unsigned long flags;
> +
> +        local_irq_save(flags);
>          apic_mem_write(APIC_ICR2, dest << 24);
>          apic_mem_write(APIC_ICR, low);
> +        local_irq_restore(flags);
>      }
>  }

The patch made one line of progress (it got to "Deasserting INIT"):

(XEN) [2019-03-24 16:17:26] HVM: VMX enabled
(XEN) [2019-03-24 16:17:26] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-24 16:17:27] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-24 16:17:27] Booting processor 1/2 eip 3e000
(XEN) [2019-03-24 16:17:27] Setting warm reset code and vector.
(XEN) [2019-03-24 16:17:27] 1.
(XEN) [2019-03-24 16:17:27] 2.
(XEN) [2019-03-24 16:17:27] 3.
(XEN) [2019-03-24 16:17:27] Asserting INIT.
(XEN) [2019-03-24 16:17:27] Waiting for send to finish...
(XEN) [2019-03-24 16:17:27] +Deasserting INIT.

Full log (Xen source 4.13-unstable with 201903231150_pau.patch): https://pastebin.com/eewfy91P

For posterity, here's my patch log:

zeta /usr/local/src/xen # patch <201903231150_pau.patch
...
zeta /usr/local/src/xen # cat xen/include/asm-x86/apic.h |grep -n "restore(flags)"
146:        local_irq_restore(flags);
zeta /usr/local/src/xen # cat xen/include/asm-x86/apic.h |grep -n "Dprintk printk"
8:#define Dprintk printk
zeta /usr/local/src/xen # make
cp dist/install/usr/lib64/efi/xen-4.13-unstable.efi /boot/efi/gentoo/man_xen.efi
cp dist/install/boot/xen-4.13-unstable.gz /boot/efi/gentoo
reboot

I booted a second time; the output ended with these two lines (no "+" and no "Deasserting"):

(XEN) [2019-03-24 16:23:51] Asserting INIT.
(XEN) [2019-03-24 16:23:51] Waiting for send to finish...

I booted a third time; the output on my serial console ended with:

(XEN) [2019-03-24 16:25:53] Waiting for send to finish...
(XEN) [2019-03-24 16:25:53] +Deassertin

Note: the console attached to the server ("server console") showed only the "+" and no "Deassertin" [sic, missing "g"], so there seems to be an inconsistency between the server console's output and the serial console's. This is probably not relevant to this inquiry, but I note it so that in the future I will always compare the last entries on the server's console with those on the serial PuTTY console.

For what it is worth, I've posted UEFI vars, dmidecode, and hwinfo output at: https://pastebin.com/d6zjv7x0

I am very impressed with the dedication you two have demonstrated. Thank you.

-John

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel