Re: [Xen-devel] [PATCH v4 2/6] xen/rcu: don't use stop_machine_run() for rcu_barrier()
On 10.03.20 17:29, Jan Beulich wrote:
> On 10.03.2020 08:28, Juergen Gross wrote:
>> @@ -143,51 +143,75 @@ static int qhimark = 10000;
>>  static int qlowmark = 100;
>>  static int rsinterval = 1000;
>>
>> -struct rcu_barrier_data {
>> -    struct rcu_head head;
>> -    atomic_t *cpu_count;
>> -};
>> +/*
>> + * rcu_barrier() handling:
>> + * cpu_count holds the number of cpu required to finish barrier handling.
>> + * Cpus are synchronized via softirq mechanism. rcu_barrier() is regarded to
>> + * be active if cpu_count is not zero. In case rcu_barrier() is called on
>> + * multiple cpus it is enough to check for cpu_count being not zero on entry
>> + * and to call process_pending_softirqs() in a loop until cpu_count drops to
>> + * zero, as syncing has been requested already and we don't need to sync
>> + * multiple times.
>> + * In order to avoid hangs when rcu_barrier() is called mutiple times on the
>> + * same cpu in fast sequence and a slave cpu couldn't drop out of the
>> + * barrier handling fast enough a second counter done_count is needed.
>> + */
>> +static atomic_t cpu_count = ATOMIC_INIT(0);
>> +static atomic_t done_count = ATOMIC_INIT(0);
>
> From its use below this looks more like "todo_count" or
> "pending_count".
>
>> +void rcu_barrier(void)
>>  {
>> -    atomic_t cpu_count = ATOMIC_INIT(0);
>> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
>> +    unsigned int n_cpus;
>> +
>> +    while ( !get_cpu_maps() )
>> +    {
>> +        process_pending_softirqs();
>> +        if ( !atomic_read(&cpu_count) )
>> +            return;
>> +
>> +        cpu_relax();
>> +    }
>> +
>> +    n_cpus = num_online_cpus();
>> +
>> +    if ( atomic_cmpxchg(&cpu_count, 0, n_cpus) == 0 )
>> +    {
>> +        atomic_add(n_cpus, &done_count);
>> +        cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>> +    }
>> +
>> +    while ( atomic_read(&done_count) )
>
> Don't you leave a window for races here, in that done_count
> gets set to non-zero only after setting cpu_count? A CPU
> losing the cmpxchg attempt above may observe done_count
> still being zero, and hence exit without waiting for the
> count to actually _drop_ to zero.

This can only be a cpu not having joined the barrier handling, so it
will do that later.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
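
For readers following the exchange, the interleaving Jan asks about can be
replayed deterministically in user space. This is only a sketch, not Xen code:
it assumes C11 <stdatomic.h> in place of Xen's atomic_t helpers, and the
function names (barrier_claim, barrier_publish, barrier_must_wait,
loser_skips_wait) are hypothetical, introduced purely for illustration. It
shows that a caller losing the cmpxchg can sample done_count before the winner
has added to it. Whether that matters in practice is what Juergen's reply
addresses; the sketch does not model the softirq side at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-ins for the two counters in the patch (assumption:
 * plain C11 atomics instead of Xen's atomic_t). */
static atomic_int cpu_count;
static atomic_int done_count;

/* First half of the initiator path: try to claim the barrier.
 * Returns true only for the caller that moved cpu_count from 0 to n_cpus. */
static bool barrier_claim(int n_cpus)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&cpu_count, &expected, n_cpus);
}

/* Second half of the initiator path: publish the outstanding count. */
static void barrier_publish(int n_cpus)
{
    atomic_fetch_add(&done_count, n_cpus);
}

/* Condition of the final wait loop: keep waiting while done_count != 0. */
static bool barrier_must_wait(void)
{
    return atomic_load(&done_count) != 0;
}

/* Replay: "CPU A" wins the claim, "CPU B" loses it and samples the wait
 * condition before A publishes done_count.
 * Returns true if B would have left the wait loop immediately. */
static bool loser_skips_wait(int n_cpus)
{
    atomic_store(&cpu_count, 0);
    atomic_store(&done_count, 0);

    bool a_won = barrier_claim(n_cpus);   /* A: cmpxchg succeeds */
    bool b_won = barrier_claim(n_cpus);   /* B: cmpxchg fails    */
    assert(a_won && !b_won);

    bool b_waits = barrier_must_wait();   /* B reads done_count == 0 */
    barrier_publish(n_cpus);              /* A: atomic_add happens only now */

    return !b_waits;
}
```

Calling loser_skips_wait(4) reports that the losing caller saw done_count at
zero, while a later call to barrier_must_wait() sees the published count.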