[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 0/4] xen/rcu: let rcu work better with core scheduling
On 02.03.20 15:23, Igor Druzhinin wrote: On 02/03/2020 14:03, Jürgen Groß wrote:On 02.03.20 14:25, Igor Druzhinin wrote:On 28/02/2020 07:10, Jürgen Groß wrote:I think you are just narrowing the window of the race: It is still possible to have two cpus entering rcu_barrier() and to make it into the if ( !initial ) clause. Instead of introducing another atomic I believe the following patch instead of yours should do it: diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c index e6add0b120..0d5469a326 100644 --- a/xen/common/rcupdate.c +++ b/xen/common/rcupdate.c @@ -180,23 +180,17 @@ static void rcu_barrier_action(void) void rcu_barrier(void) { - int initial = atomic_read(&cpu_count); - while ( !get_cpu_maps() ) { process_pending_softirqs(); - if ( initial && !atomic_read(&cpu_count) ) + if ( !atomic_read(&cpu_count) ) return; cpu_relax(); - initial = atomic_read(&cpu_count); } - if ( !initial ) - { - atomic_set(&cpu_count, num_online_cpus()); + if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 ) cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ); - } while ( atomic_read(&cpu_count) ) { Could you give that a try, please?With this patch I cannot disable SMT at all. The problem that my diff solved was a race between 2 consecutive rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off operation) where some CPUs didn't exit the cpu_count checking loop completely but cpu_count is already reinitialized on CPU0 - this results in some CPUs being stuck in the loop.Ah, okay, then I believe a combination of the two patches is needed. Something like the attached version?I apologies - my previous test result was from machine booted in core mode. I'm now testing it properly and the original patch seems to do the trick but I still don't understand how you can avoid the race with only 1 counter - it's always possible that CPU1 is still in cpu_count checking loop (even if cpu_count is currently 0) when cpu_count is reinitialized. I guess this is very very unlikely. I'm looking at your current version now. Was the removal of get_cpu_maps() and recursion protection intentional? I suspect it would only work on the latest master so I need to keep those for 4.13 testing. Oh, sorry, this seems to be an old version. Here comes the correct one. Juergen Attachment:
v3-0002-xen-rcu-don-t-use-stop_machine_run-for-rcu_barrie.patch _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |