
Re: [Xen-devel] Xen 4.12.0-rc Hangs Around masked ExtINT on CPU#




On 3/23/2019 11:50 PM, Roger Pau Monné wrote:
On Fri, Mar 22, 2019 at 05:46:26PM -0700, John L. Poole wrote:
On 3/22/2019 7:40 AM, Andrew Cooper wrote:
On 22/03/2019 09:53, John L. Poole wrote:
3) Xen Source: here is the log of an attempt adding
"cpuinfor maxcpus=1 watchdog"
as an option in my man_xen.cfg:
https://pastebin.com/b682FWmC (6 months)

The last 12 lines:
(XEN) [2019-03-22 09:37:49] Booting processor 2/4 eip 3e000
(XEN) [2019-03-22 09:35:28] Initializing CPU#2
(XEN) [2019-03-22 09:35:28] masked ExtINT on CPU#2
(XEN) [2019-03-22 09:35:28] CPU: Physical Processor ID: 0
(XEN) [2019-03-22 09:35:28] CPU: Processor Core ID: 2
(XEN) [2019-03-22 09:35:28] CPU: L1 I cache: 32K, L1 D cache: 24K
(XEN) [2019-03-22 09:35:28] CPU: L2 cache: 1024K
(XEN) [2019-03-22 09:35:28] CMCI: CPU2 has no CMCI support
(XEN) [2019-03-22 09:35:28] CPU2: Thermal monitoring enabled (TM1)
(XEN) [2019-03-22 09:37:49] CPU2: Intel(R) Atom(TM) CPU  C2750 @ 2.40GHz stepping 08
(XEN) [2019-03-22 09:37:49] Adding cpu 2 to runqueue 0
(XEN) [2019-03-22 09:37:49] Removing cpu 2 from runqueue 0
(XEN) [2019-03-22 09:37:49] Booting processor 3/6 eip 3e000

Result: hangs around the same place
Ok.  Something is clearly stalling while we are trying to start
secondary processors.

Can you apply this patch and rebuild please?

andrewcoop@andrewcoop:/local/xen.git$ git d
diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
index 9d7ec93..14ac0b1 100644
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -5,7 +5,7 @@
 #include <asm/fixmap.h>
 #include <asm/msr.h>
 
-#define Dprintk(x...) do {} while (0)
+#define Dprintk printk
 
 /*
  * Debugging macros

which should give us some better diagnostics of the INIT-SIPI-SIPI
mechanism.

Do you have any options such as TXT or SMX enabled in firmware?  They
can interfere with AP bringup, so it would be useful to disable them for
now.

~Andrew
Done.

I tried patching and then running make, but ran into an error, so I ran:

git pull
make clean

then verified the patch was still in effect, and then:

make

There was a problem with the install step, so I manually moved:
...
-rw-r--r-- 1 root root 2991647 Mar 22 11:01 xen-4.13-unstable.efi
...
under /usr/local/src/xen/dist/install/usr/lib64/efi/
to /boot/efi/gentoo and renamed it man_xen.efi.

Likewise, I found a Xen kernel under
/usr/local/src/xen/xen/dist/install/boot/
...
-rw-r--r-- 1 root root 1181850 Mar 22 11:01 xen-4.13-unstable.gz
...
and moved it to /boot/efi/gentoo -- not renaming it and
making sure /boot/efi/gentoo/man_xen.cfg defines the kernel as
"xen-4.13-unstable.gz"

Result: same failure, but with more debugging information.

Here are the last eleven lines (starting at line 287):

(XEN) [2019-03-23 00:36:06] HVM: ASIDs enabled.
(XEN) [2019-03-23 00:36:06] HVM: VMX enabled
(XEN) [2019-03-23 00:36:06] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-23 00:36:06] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-23 00:36:06] Booting processor 1/2 eip 3e000
(XEN) [2019-03-23 00:36:06] Setting warm reset code and vector.
(XEN) [2019-03-23 00:36:06] 1.
(XEN) [2019-03-23 00:36:06] 2.
(XEN) [2019-03-23 00:36:06] 3.
(XEN) [2019-03-23 00:36:06] Asserting INIT.
(XEN) [2019-03-23 00:36:06] Waiting for send to finish...

Here is the full boot log:
https://pastebin.com/0LgrJH25
I'm currently away from home and cannot really help much ATM. Also, I
don't have access to a system with a CPU that exhibits this behavior,
which makes debugging it harder.

I've taken a look at the difference in AP startup code between Linux
and Xen at or before the point you get the hang, and I'm not able to
spot anything obvious that could make Linux work and not Xen.

I've realized, however, that Linux disables interrupts when writing to
the local APIC ICR register. It does so for other reasons, but maybe
this somehow affects bring-up on this CPU, hence the patch below. Could
you please give it a spin together with the patch provided by Andrew?

There are other minor differences between Linux and Xen AP bring-up,
so I guess there are further changes to test if the patch below
doesn't make things better.

Thanks, Roger.
---8<---
diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
index 9d7ec93042..f28e922e2e 100644
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -138,8 +138,12 @@ static __inline void apic_icr_write(u32 low, u32 dest)
         apic_wrmsr(APIC_ICR, low | ((uint64_t)dest << 32));
     else
     {
+        unsigned long flags;
+
+        local_irq_save(flags);
         apic_mem_write(APIC_ICR2, dest << 24);
         apic_mem_write(APIC_ICR, low);
+        local_irq_restore(flags);
     }
 }
The patch made one more line of progress (it got as far as "Deasserting INIT"):

(XEN) [2019-03-24 16:17:26] HVM: VMX enabled
(XEN) [2019-03-24 16:17:26] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-24 16:17:27] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-24 16:17:27] Booting processor 1/2 eip 3e000
(XEN) [2019-03-24 16:17:27] Setting warm reset code and vector.
(XEN) [2019-03-24 16:17:27] 1.
(XEN) [2019-03-24 16:17:27] 2.
(XEN) [2019-03-24 16:17:27] 3.
(XEN) [2019-03-24 16:17:27] Asserting INIT.
(XEN) [2019-03-24 16:17:27] Waiting for send to finish...
(XEN) [2019-03-24 16:17:27] +Deasserting INIT.

Full log at:
Xen Source 4.13 unstable w/201903231150_pau.patch
at: https://pastebin.com/eewfy91P

For posterity, here's my patch log:

   zeta /usr/local/src/xen # patch <201903231150_pau.patch
   ...
   zeta /usr/local/src/xen # cat xen/include/asm-x86/apic.h |grep -n "restore(flags)"
   146:        local_irq_restore(flags);
   zeta /usr/local/src/xen # cat xen/include/asm-x86/apic.h |grep -n "Dprintk printk"
   8:#define Dprintk printk
   zeta /usr/local/src/xen #

   make
   cp dist/install/usr/lib64/efi/xen-4.13-unstable.efi /boot/efi/gentoo/man_xen.efi
   cp dist/install/boot/xen-4.13-unstable.gz /boot/efi/gentoo
   reboot

I booted a second time; the output ended with these two lines
(no "+"s and no "Deasserting"):

(XEN) [2019-03-24 16:23:51] Asserting INIT.
(XEN) [2019-03-24 16:23:51] Waiting for send to finish...

I booted a third time; the output on my serial console ended with:

(XEN) [2019-03-24 16:25:53] Waiting for send to finish...
(XEN) [2019-03-24 16:25:53] +Deassertin

Note: the console attached to the server (the "server console")
showed only the "+" and no "Deassertin" [sic, missing "g"], so there
seems to be an inconsistency between the server console's output and
the serial port console. It is probably not relevant to this inquiry,
but I note it so that in the future I will always compare the last
entries on the server's console with those on the PuTTY serial console.

For what it is worth, I've posted the UEFI vars, dmidecode, and hwinfo output at:
https://pastebin.com/d6zjv7x0

I am very impressed with the dedication you two have demonstrated.

Thank you.

-John


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

