
Re: [Xen-devel] Core scheduling and cpu offlining



On 02.03.20 22:45, Igor Druzhinin wrote:
On 02/03/2020 14:05, Jürgen Groß wrote:
On 02.03.20 14:51, Igor Druzhinin wrote:
On 02/03/2020 08:39, Jürgen Groß wrote:
Hi Igor,

could you please test whether the attached patch fixes your problem
with cpu offlining?

It's certainly better and doesn't hit the watchdog as before, but I ran
the following script to verify:

while true
do
      for i in `seq 1 63`; do xen-hptool cpu-offline $i; done
      for i in `seq 1 63`; do xen-hptool cpu-online $i; done
done

... and got this a little bit later (note the same script works fine in thread mode):

(XEN) [  282.199134] Assertion '!preempt_count()' failed at preempt.c:36
(XEN) [  282.199142] ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
(XEN) [  282.199147] CPU:    0
(XEN) [  282.199150] RIP:    e008:[<ffff82d080228817>] ASSERT_NOT_IN_ATOMIC+0x1f/0x58
(XEN) [  282.199159] RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) [  282.199165] rax: ffff82d0805c7024   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) [  282.199170] rdx: 0000000000000000   rsi: 00000000000026cd   rdi: ffff82d0804b3aac
(XEN) [  282.199175] rbp: ffff8300920bfe90   rsp: ffff8300920bfe90   r8:  ffff83042f21ffe0
(XEN) [  282.199180] r9:  0000000000000001   r10: 3333333333333333   r11: 0000000000000001
(XEN) [  282.199185] r12: ffff82d0805cdb00   r13: 0000000000000000   r14: ffff82d0805c7250
(XEN) [  282.199192] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000003506e0
(XEN) [  282.199252] cr3: 00000000920b0000   cr2: 00007f0fff967000
(XEN) [  282.199256] fsb: 00007f0fff957740   gsb: ffff88821e000000   gss: 0000000000000000
(XEN) [  282.199261] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [  282.199268] Xen code around <ffff82d080228817> (ASSERT_NOT_IN_ATOMIC+0x1f/0x58):
(XEN) [  282.199272]  52 d1 83 3c 10 00 74 02 <0f> 0b 48 89 e0 48 0d ff 7f 00 00 8b 40 c1 48 c1
(XEN) [  282.199287] Xen stack trace from rsp=ffff8300920bfe90:
(XEN) [  282.199290]    ffff8300920bfea0 ffff82d080242680 ffff8300920bfef0 ffff82d08027a171
(XEN) [  282.199297]    ffff82d080242635 000000002b3bf000 ffff83042bb1f000 ffff83042bb1f000
(XEN) [  282.199304]    ffff83042bb1f000 0000000000000000 ffff82d0805ec620 0000000000000000
(XEN) [  282.199311]    ffff8300920bfd60 0000000000000000 00007ffc633001b0 0000000000305000
(XEN) [  282.199317]    ffff888212bd28a8 00007ffc633001b0 fffffffffffffff2 0000000000000286
(XEN) [  282.199324]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [  282.199329]    ffffffff8100146a 0000000000000000 0000000000000000 deadbeefdeadf00d
(XEN) [  282.199335]    0000010000000000 ffffffff8100146a 000000000000e033 0000000000000286
(XEN) [  282.199342]    ffffc90042977d70 000000000000e02b 0000000000000000 0000000000000000
(XEN) [  282.199347]    0000000000000000 0000000000000000 0000e01000000000 ffff83042bb1f000
(XEN) [  282.199353]    0000000000000000 00000000003506e0 0000000000000000 0000000000000000
(XEN) [  282.199359]    0000040000000000 0000000000000000
(XEN) [  282.199364] Xen call trace:
(XEN) [  282.199368]    [<ffff82d080228817>] R ASSERT_NOT_IN_ATOMIC+0x1f/0x58
(XEN) [  282.199375]    [<ffff82d080242680>] F do_softirq+0x9/0x15
(XEN) [  282.199381]    [<ffff82d08027a171>] F arch/x86/domain.c#idle_loop+0xb4/0xcb
(XEN) [  282.199384]
(XEN) [  282.438998]
(XEN) [  282.440991] ****************************************
(XEN) [  282.446459] Panic on CPU 0:
(XEN) [  282.449745] Assertion '!preempt_count()' failed at preempt.c:36
(XEN) [  282.456156] ****************************************
(XEN) [  282.461621]

Oh, indeed, there are rcu_read_unlock() calls missing (so far this
was relevant for ARM only).

Is this one better?

I think we're back to square one. For some reason it now throws watchdog
timeouts again. Note: I'm testing without any rcu_barrier related patches
applied. Do you see the same issues running the script above on your machine?

Yes. This is due to your script trying to remove the siblings of _all_
cores, leaving no cpu to work on. There is a bug in cpupool.c: it fails
to call rcu_read_unlock() on the error path. Will send a patch.
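
To illustrate the class of bug: the following is a minimal, self-contained
sketch with mock names, not the actual cpupool.c code. It assumes that the
read-side lock raises a preemption counter which the unlock lowers again, so
an error path that returns without unlocking leaves the counter raised. That
is the condition the '!preempt_count()' assertion checks for. All identifiers
below (mock_rcu_read_lock, lookup_buggy, lookup_fixed, in_atomic) are
hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the hypervisor's preemption counter. */
int preempt_count;

/* Mock read-side primitives: lock raises the counter, unlock lowers it. */
void mock_rcu_read_lock(void)
{
    preempt_count++;
}

void mock_rcu_read_unlock(void)
{
    assert(preempt_count > 0);  /* an unlock must pair with an earlier lock */
    preempt_count--;
}

/* Buggy shape: the early return on the error path skips the unlock. */
int lookup_buggy(int id)
{
    mock_rcu_read_lock();
    if (id < 0)
        return -1;              /* BUG: counter stays raised */
    mock_rcu_read_unlock();
    return 0;
}

/* Fixed shape: every exit path passes through the unlock. */
int lookup_fixed(int id)
{
    int ret = 0;

    mock_rcu_read_lock();
    if (id < 0)
        ret = -1;               /* record the error, fall through to unlock */
    mock_rcu_read_unlock();
    return ret;
}

/* Nonzero while a read-side section is still (wrongly or rightly) open. */
int in_atomic(void)
{
    return preempt_count != 0;
}
```

After lookup_fixed(-1) the counter is back at zero, whereas after
lookup_buggy(-1) it stays at one, so a later "not in atomic context" check
would fail even though the failing call returned long ago.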


Juergen
