
Re: [Xen-devel] [BUG] XEN crash and double fault when doing cpu online/offline



On 08.01.20 09:32, Tao Xu wrote:

On 1/8/20 3:50 PM, Jürgen Groß wrote:
On 08.01.20 06:50, Tao Xu wrote:
Hi,

I use xen-hptool cpu-offline/cpu-online to take the CPUs of one socket offline and then back online, using the following script:

for((j=48;j<=95;j++));
do
   xen-hptool cpu-offline $j
done

for((j=48;j<=95;j++));
do
   xen-hptool cpu-online $j
done

Xen crashes when the CPUs are brought back online. I am using upstream Xen (0dd92688) and have tried for many days; it still crashes. But if I only offline/online CPUs 48~59, Xen does not crash. The bug can be reproduced when we offline/online most of the CPUs in a socket. Interestingly, when we use the following script instead:

for((j=48;j<=95;j++));
do
   xen-hptool cpu-offline $j
   xen-hptool cpu-online $j
done

Xen does not crash either. Is there a bug in sched_credit2?
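
For reference, the two sequences above can be folded into one script so that the CPU range is easy to vary. This is only a sketch: HPTOOL, batch_cycle and interleaved_cycle are names made up here for illustration, and only the xen-hptool cpu-offline/cpu-online subcommands come from the report.

```bash
#!/bin/bash
# Sketch only: HPTOOL, batch_cycle and interleaved_cycle are illustrative
# names, not part of Xen. Set HPTOOL="echo xen-hptool" for a dry run.
HPTOOL=${HPTOOL:-xen-hptool}

# Offline every CPU in [first, last], then online them all again.
# This is the sequence reported to crash for CPUs 48..95.
batch_cycle() {
    local first=$1 last=$2 j
    for ((j = first; j <= last; j++)); do
        $HPTOOL cpu-offline "$j"
    done
    for ((j = first; j <= last; j++)); do
        $HPTOOL cpu-online "$j"
    done
}

# Offline and immediately re-online one CPU at a time.
# This is the sequence reported not to crash.
interleaved_cycle() {
    local first=$1 last=$2 j
    for ((j = first; j <= last; j++)); do
        $HPTOOL cpu-offline "$j"
        $HPTOOL cpu-online "$j"
    done
}
```

With the definitions sourced, batch_cycle 48 95 corresponds to the first script and interleaved_cycle 48 95 to the second; a narrower range such as batch_cycle 48 59 matches the case that did not crash.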

The crash message is as follows:

(XEN) Adding cpu 77 to runqueue 1
(XEN) Adding cpu 78 to runqueue 1
(XEN) Adding cpu 79 to runqueue 1
(XEN) Adding cpu 80 to runqueue 1
(X(ENXE) N) *** DOUBLE FAULT ***
(XEN) Assertion 'debug->cpu == smp_processor_id()' failed at spinlock.c:88
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) Debugging connection not set up.
(XEN) CPU:    48
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080240bfc>] _spin_unlock+0x40/0x42

So the original problem causes a double fault, but spinlock debugging
causes a subsequent panic.

Can you please retry the tests with the attached patch? It should
result in diagnostic data related to the real problem.


Juergen

Hi Juergen,

After applying your patch, spin_lock still asserts. And the address ffff82d0bffce880 is not in xen-syms.

Yes, I had a bug in my modified ASSERT(), but this time the data is
better.


(XEN) Adding cpu 78 to runqueue 1
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    49
(XEN) RIP:    e008:[<ffff82d0bffce880>] ffff82d0bffce880

This seems to be a crash in the stub page of cpu 48.

I don't think this is related to the scheduler, but to stub page
handling.

Can you please try the attached patch?


Juergen

Attachment: 0001-xen-x86-clear-per-cpu-stub-page-information-in-cpu_s.patch
Description: Text Data


 

