
Re: [Xen-devel] [PATCH v2 0/4] xen/rcu: let rcu work better with core scheduling



On 22.02.20 17:42, Igor Druzhinin wrote:
On 22/02/2020 06:05, Jürgen Groß wrote:
On 22.02.20 03:29, Igor Druzhinin wrote:
On 18/02/2020 12:21, Juergen Gross wrote:
Today the RCU handling in Xen is affecting scheduling in several ways.
It is raising sched softirqs without any real need and it requires
tasklets for rcu_barrier(), which interacts badly with core scheduling.

This small series repairs those issues.

Additionally some ASSERT()s are added for verification of sane rcu
handling. In order to avoid those triggering right away the obvious
violations are fixed.

I've done more testing of this with [1] and, unfortunately, it quite easily
deadlocks while without this series it doesn't.

Steps to repro:
- apply [1]
- take a host with considerable CPU count (~64)
- run a loop: xen-hptool smt-disable; xen-hptool smt-enable

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html

Yeah, the reason for that is that rcu_barrier() is a nop in this
situation without my patch, as the then called stop_machine_run() in
rcu_barrier() will just return -EBUSY.
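
A minimal sketch of that failure mode, as a toy model in plain C rather than the hypervisor code. Every toy_* name below is hypothetical. The only behaviour taken from the text above is that stop_machine_run() returns -EBUSY and that rcu_barrier() then hands the error back without having done any barrier work. The failing trylock is an assumption about why -EBUSY is returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the CPU maps lock: true while someone else holds it. */
bool toy_maps_locked;

/* Toy get_cpu_maps(): a trylock that fails immediately instead of waiting. */
bool toy_get_cpu_maps(void)
{
    if (toy_maps_locked)
        return false;
    toy_maps_locked = true;
    return true;
}

/* Toy put_cpu_maps(): release the lock taken by toy_get_cpu_maps(). */
void toy_put_cpu_maps(void)
{
    toy_maps_locked = false;
}

/* Toy stop_machine_run(): gives up with -EBUSY if the maps are unavailable. */
int toy_stop_machine_run(int (*fn)(void *), void *data)
{
    int ret;

    if (!toy_get_cpu_maps())
        return -EBUSY;          /* fn is never called on this path */

    ret = fn(data);
    toy_put_cpu_maps();
    return ret;
}

/* Toy barrier work: counts how often it really ran. */
int toy_barrier_action(void *data)
{
    ++*(int *)data;
    return 0;
}

/* Toy rcu_barrier(): only as strong as the toy_stop_machine_run() result. */
int toy_rcu_barrier(int *runs)
{
    return toy_stop_machine_run(toy_barrier_action, runs);
}
```

With toy_maps_locked set, toy_rcu_barrier() returns -EBUSY and the counter stays at 0, so a caller that ignores the return value sees a barrier that did nothing. With the lock free it returns 0 and the counter is incremented once.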

Are you sure that's the reason? I always have the following stack on CPU0:

(XEN) [  120.891143] *** Dumping CPU0 host state: ***
(XEN) [  120.895909] ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
(XEN) [  120.902487] CPU:    0
(XEN) [  120.905269] RIP:    e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43
(XEN) [  120.913415] RFLAGS: 0000000000000286   CONTEXT: hypervisor
(XEN) [  120.919389] rax: 0000000000000000   rbx: ffff82d0805ddb78   rcx: 0000000000000001
(XEN) [  120.927362] rdx: ffff82d0805cdb00   rsi: ffff82d0805c7cd8   rdi: 0000000000000007
(XEN) [  120.935341] rbp: ffff8300920bfbc0   rsp: ffff8300920bfbb8   r8:  000000000000003b
(XEN) [  120.943310] r9:  0444444444444432   r10: 3333333333333333   r11: 0000000000000001
(XEN) [  120.951282] r12: ffff82d0805ddb78   r13: 0000000000000001   r14: ffff8300920bfc18
(XEN) [  120.959251] r15: ffff82d0802af646   cr0: 000000008005003b   cr4: 00000000003506e0
(XEN) [  120.967223] cr3: 00000000920b0000   cr2: ffff88820dffe7f8
(XEN) [  120.973125] fsb: 0000000000000000   gsb: ffff88821e3c0000   gss: 0000000000000000
(XEN) [  120.981094] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [  120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43):
(XEN) [  120.997037]  85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5
(XEN) [  121.005442] Xen stack trace from rsp=ffff8300920bfbb8:
(XEN) [  121.011080]    ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845
(XEN) [  121.019145]    ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8
(XEN) [  121.027200]    0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf
(XEN) [  121.035255]    fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000
(XEN) [  121.043320]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [  121.051375]    000000000000003b 0000001c25e54bf1 0000000000000000 ffff8300920bfc80
(XEN) [  121.059443]    ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d ffff82d0802af4a2
(XEN) [  121.067498]    ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00
(XEN) [  121.075553]    ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040
(XEN) [  121.083621]    ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff
(XEN) [  121.091674]    0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040
(XEN) [  121.099739]    ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40
(XEN) [  121.107797]    ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000
(XEN) [  121.115852]    ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00
(XEN) [  121.123917]    fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003
(XEN) [  121.131975]    0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0
(XEN) [  121.140037]    ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0
(XEN) [  121.148093]    ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30
(XEN) [  121.156150]    ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000
(XEN) [  121.164216]    0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50
(XEN) [  121.172271] Xen call trace:
(XEN) [  121.175573]    [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43
(XEN) [  121.183024]    [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde
(XEN) [  121.189520]    [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89
(XEN) [  121.197403]    [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64
(XEN) [  121.204951]    [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e
(XEN) [  121.213271]    [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [  121.220890]    [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [  121.228086]    [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d
(XEN) [  121.236758]    [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [  121.244378]    [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [  121.251568]    [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e
(XEN) [  121.257639]    [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32
(XEN) [  121.264051]    [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe
(XEN) [  121.272454]    [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8
(XEN) [  121.281900]    [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4
(XEN) [  121.289786]    [<ffff82d080245803>] F do_tasklet+0x58/0x85
(XEN) [  121.295771]    [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb

So it's not in the get_cpu_maps() loop. It seems to me it's not entering
time sync for some reason.

Interesting. Looking further into that.

At least time_calibration() is missing a call to get_cpu_maps().


Juergen

