
Re: [Xen-devel] Xen 4.12.0-rc Hangs Around masked ExtINT on CPU#



On Fri, Mar 22, 2019 at 05:46:26PM -0700, John L. Poole wrote:
> 
> On 3/22/2019 7:40 AM, Andrew Cooper wrote:
> > On 22/03/2019 09:53, John L. Poole wrote:
> > > 3)Xen Source - here is the log of an attempt adding
> > > "cpuinfor maxcpus=1 watchdog"
> > > as an option in my man_xen.cfg:
> > > https://pastebin.com/b682FWmC (6 months)
> > > 
> > > The last 12 lines:
> > > (XEN) [2019-03-22 09:37:49] Booting processor 2/4 eip 3e000
> > > (XEN) [2019-03-22 09:35:28] Initializing CPU#2
> > > (XEN) [2019-03-22 09:35:28] masked ExtINT on CPU#2
> > > (XEN) [2019-03-22 09:35:28] CPU: Physical Processor ID: 0
> > > (XEN) [2019-03-22 09:35:28] CPU: Processor Core ID: 2
> > > (XEN) [2019-03-22 09:35:28] CPU: L1 I cache: 32K, L1 D cache: 24K
> > > (XEN) [2019-03-22 09:35:28] CPU: L2 cache: 1024K
> > > (XEN) [2019-03-22 09:35:28] CMCI: CPU2 has no CMCI support
> > > (XEN) [2019-03-22 09:35:28] CPU2: Thermal monitoring enabled (TM1)
> > > (XEN) [2019-03-22 09:37:49] CPU2: Intel(R) Atom(TM) CPU  C2750 @
> > > 2.40GHz stepping 08
> > > (XEN) [2019-03-22 09:37:49] Adding cpu 2 to runqueue 0
> > > (XEN) [2019-03-22 09:37:49] Removing cpu 2 from runqueue 0
> > > (XEN) [2019-03-22 09:37:49] Booting processor 3/6 eip 3e000
> > > 
> > > Result: hangs around the same place
> > Ok.  Something is clearly stalling while we are trying to start
> > secondary processors.
> > 
> > Can you apply this patch and rebuild please?
> > 
> > andrewcoop@andrewcoop:/local/xen.git$ git d
> > diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
> > index 9d7ec93..14ac0b1 100644
> > --- a/xen/include/asm-x86/apic.h
> > +++ b/xen/include/asm-x86/apic.h
> > @@ -5,7 +5,7 @@
> >   #include <asm/fixmap.h>
> >   #include <asm/msr.h>
> > -#define Dprintk(x...) do {} while (0)
> > +#define Dprintk printk
> >   /*
> >    * Debugging macros
> > 
> > which should give us some better diagnostics of the INIT-SIPI-SIPI
> > mechanism.
> > 
> > Do you have any options such as TXT or SMX enabled in firmware?  They
> > can interfere with AP bringup, so it would be useful to disable them for
> > now.
> > 
> > ~Andrew
> 
> done.
> 
> I tried patching and then make, but ran into an error.  So I performed:
> 
> git pull
> make clean
> 
> then verified the patch was still in effect, and then:
> 
> make
> 
> There was some problem in the install so I hand moved:
> ...
> -rw-r--r-- 1 root root 2991647 Mar 22 11:01 xen-4.13-unstable.efi
> ...
> under /usr/local/src/xen/dist/install/usr/lib64/efi/
> to /boot/efi/gentoo and renamed it man_xen.efi.
> 
> Likewise, I found a xen kernel under
> /usr/local/src/xen/xen/dist/install/boot/
> ...
> -rw-r--r-- 1 root root 1181850 Mar 22 11:01 xen-4.13-unstable.gz
> ...
> and moved it to /boot/efi/gentoo -- not renaming it and
> making sure /boot/efi/gentoo/man_xen.cfg defines the kernel as
> "xen-4.13-unstable.gz"
> 
> Result: same failure, but with more debugging information.
> 
> Here are the last ten lines (starting at line 287):
> 
> (XEN) [2019-03-23 00:36:06] HVM: ASIDs enabled.
> (XEN) [2019-03-23 00:36:06] HVM: VMX enabled
> (XEN) [2019-03-23 00:36:06] HVM: Hardware Assisted Paging (HAP) detected
> (XEN) [2019-03-23 00:36:06] HVM: HAP page sizes: 4kB, 2MB
> (XEN) [2019-03-23 00:36:06] Booting processor 1/2 eip 3e000
> (XEN) [2019-03-23 00:36:06] Setting warm reset code and vector.
> (XEN) [2019-03-23 00:36:06] 1.
> (XEN) [2019-03-23 00:36:06] 2.
> (XEN) [2019-03-23 00:36:06] 3.
> (XEN) [2019-03-23 00:36:06] Asserting INIT.
> (XEN) [2019-03-23 00:36:06] Waiting for send to finish...
> 
> Here is the full boot log:
> https://pastebin.com/0LgrJH25

I'm currently away from home and cannot really help much ATM. I also
don't have access to a system with a CPU that exhibits this behavior,
which makes debugging it harder.

I've taken a look at the difference in AP startup code between Linux
and Xen at or before the point you get the hang, and I'm not able to
spot anything obvious that could make Linux work and not Xen.
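
For context, the last line in the log, "Waiting for send to finish...",
is printed just before the code polls the delivery-status (busy) bit of
the ICR. The following is only a user-space sketch of a bounded poll of
that kind, not the Xen code itself: wait_for_send(), fake_icr_read() and
the poll limit are hypothetical names and values chosen for illustration.
Only the bit position (bit 12 of the xAPIC ICR) comes from the hardware
definition.

```c
#include <assert.h>   /* only needed by the usage example below */
#include <stdbool.h>
#include <stdint.h>

#define ICR_BUSY (1u << 12)   /* xAPIC ICR delivery-status bit */

/* Hypothetical accessor type standing in for a real APIC register read. */
typedef uint32_t (*icr_read_fn)(void *ctx);

/*
 * Poll the ICR until the busy bit clears or max_polls reads have been made.
 * Returns true if the send completed, false if the poll limit was reached.
 */
static bool wait_for_send(icr_read_fn read_icr, void *ctx,
                          unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (!(read_icr(ctx) & ICR_BUSY))
            return true;
    }
    return false;
}

/* Fake register: reports busy for the first *remaining reads, then idle. */
static uint32_t fake_icr_read(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return ICR_BUSY;
    }
    return 0;
}
```

With the fake register set to stay busy for 3 reads and a limit of 10
polls, wait_for_send() returns true. If it stays busy for 100 reads, the
same call returns false, so a bounded loop like this cannot spin forever
on the busy bit alone.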

I've realized, however, that Linux disables interrupts when writing to
the local APIC ICR register. It does so for other reasons, but maybe
this somehow affects bring-up on this CPU, hence the patch below. Could
you please give it a spin together with the patch provided by Andrew?

There are other minor differences between Linux and Xen AP bring up,
so I guess there are further changes to test if the patch below
doesn't make things better.
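
As a general illustration of why a two-register write can be sensitive
to interrupts (I'm not claiming this is what happens on this CPU), here
is a small user-space model. In xAPIC mode the destination goes into
ICR2 first and the write to ICR then sends the IPI, so anything that
rewrites ICR2 in between changes the target. Everything here (struct
fake_apic, fake_irq_handler(), icr_write(), the irqs_enabled flag, the
destination value 7) is hypothetical and only mimics the shape of the
real code.

```c
#include <assert.h>   /* only needed by the usage example below */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the xAPIC ICR register pair. */
struct fake_apic {
    uint32_t icr2;       /* destination lives in bits 31:24 */
    uint32_t icr;        /* writing the low half "sends" the IPI */
    uint32_t last_dest;  /* destination latched when the IPI was sent */
};

static bool irqs_enabled = true;

/* Stand-in for an interrupt handler that sends its own IPI to CPU 7. */
static void fake_irq_handler(struct fake_apic *apic)
{
    apic->icr2 = 7u << 24;
}

/* An interrupt is only delivered while irqs_enabled is true. */
static void maybe_interrupt(struct fake_apic *apic)
{
    if (irqs_enabled)
        fake_irq_handler(apic);
}

static void icr_write(struct fake_apic *apic, uint32_t low, uint32_t dest,
                      bool protect)
{
    bool saved = irqs_enabled;          /* models local_irq_save() */

    if (protect)
        irqs_enabled = false;

    apic->icr2 = dest << 24;
    maybe_interrupt(apic);              /* worst-case arrival point */
    apic->icr = low;
    apic->last_dest = apic->icr2 >> 24; /* what the "hardware" would use */

    irqs_enabled = saved;               /* models local_irq_restore() */
}
```

Calling icr_write() with dest 1 and protect set to false leaves
last_dest at 7 in this model, because the fake handler overwrote ICR2
between the two writes. With protect set to true, last_dest is 1.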

Thanks, Roger.
---8<---
diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
index 9d7ec93042..f28e922e2e 100644
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -138,8 +138,12 @@ static __inline void apic_icr_write(u32 low, u32 dest)
         apic_wrmsr(APIC_ICR, low | ((uint64_t)dest << 32));
     else
     {
+        unsigned long flags;
+
+        local_irq_save(flags);
         apic_mem_write(APIC_ICR2, dest << 24);
         apic_mem_write(APIC_ICR, low);
+        local_irq_restore(flags);
     }
 }
 
