Xen project Mailing List

Re: [Xen-devel] [PATCH v7 2/5] xen/rcu: don't use stop_machine_run() for rcu_barrier()

Date: Thu, 26 Mar 2020 07:58:39 +0100

Cc: Juergen Gross <jgross@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Thu, 26 Mar 2020 06:59:04 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.03.2020 17:13, Julien Grall wrote:
> On 25/03/2020 10:55, Juergen Gross wrote:
>> @@ -143,51 +143,90 @@ static int qhimark = 10000;
>>  static int qlowmark = 100;
>>  static int rsinterval = 1000;
>>
>> -struct rcu_barrier_data {
>> -    struct rcu_head head;
>> -    atomic_t *cpu_count;
>> -};
>> +/*
>> + * rcu_barrier() handling:
>> + * Two counters are used to synchronize rcu_barrier() work:
>> + * - cpu_count holds the number of cpus required to finish barrier handling.
>> + *   It is decremented by each cpu when it has performed all pending rcu calls.
>> + * - pending_count shows whether any rcu_barrier() activity is running and
>> + *   it is used to synchronize leaving rcu_barrier() only after all cpus
>> + *   have finished their processing. pending_count is initialized to nr_cpus + 1
>> + *   and it is decremented by each cpu when it has seen that cpu_count has
>> + *   reached 0. The cpu where rcu_barrier() has been called will wait until
>> + *   pending_count has been decremented to 1 (so all cpus have seen cpu_count
>> + *   reaching 0) and will then set pending_count to 0 indicating there is no
>> + *   rcu_barrier() running.
>> + * Cpus are synchronized via softirq mechanism. rcu_barrier() is regarded to
>> + * be active if pending_count is not zero. In case rcu_barrier() is called on
>> + * multiple cpus it is enough to check for pending_count being not zero on entry
>> + * and to call process_pending_softirqs() in a loop until pending_count drops to
>> + * zero, before starting the new rcu_barrier() processing.
>> + */
>> +static atomic_t cpu_count = ATOMIC_INIT(0);
>> +static atomic_t pending_count = ATOMIC_INIT(0);
>>
>>  static void rcu_barrier_callback(struct rcu_head *head)
>>  {
>> -    struct rcu_barrier_data *data = container_of(
>> -        head, struct rcu_barrier_data, head);
>> -    atomic_inc(data->cpu_count);
>> +    smp_mb__before_atomic();     /* Make all writes visible to other cpus. */
>
> smp_mb__before_atomic() will order both reads and writes. However, the
> comment suggests only the writes are required to be ordered.
>
> So either the barrier is too strong or the comment is incorrect. Can
> you clarify it?

Neither is the case, I guess: there simply is no smp_wmb__before_atomic()
in Linux, and if we want to follow their model we shouldn't have one
either. I'd rather take the comment to indicate that if one appeared,
it could be used here.

>> +    atomic_dec(&cpu_count);
>>  }
>>
>> -static int rcu_barrier_action(void *_cpu_count)
>> +static void rcu_barrier_action(void)
>>  {
>> -    struct rcu_barrier_data data = { .cpu_count = _cpu_count };
>> -
>> -    ASSERT(!local_irq_is_enabled());
>> -    local_irq_enable();
>> +    struct rcu_head head;
>>
>>      /*
>>       * When callback is executed, all previously-queued RCU work on this CPU
>> -     * is completed. When all CPUs have executed their callback, data.cpu_count
>> -     * will have been incremented to include every online CPU.
>> +     * is completed. When all CPUs have executed their callback, cpu_count
>> +     * will have been decremented to 0.
>>       */
>> -    call_rcu(&data.head, rcu_barrier_callback);
>> +    call_rcu(&head, rcu_barrier_callback);
>>
>> -    while ( atomic_read(data.cpu_count) != num_online_cpus() )
>> +    while ( atomic_read(&cpu_count) )
>>      {
>>          process_pending_softirqs();
>>          cpu_relax();
>>      }
>>
>> -    local_irq_disable();
>> -
>> -    return 0;
>> +    smp_mb__before_atomic();
>> +    atomic_dec(&pending_count);
>>  }
>>
>> -/*
>> - * As rcu_barrier() is using stop_machine_run() it is allowed to be used in
>> - * idle context only (see comment for stop_machine_run()).
>> - */
>> -int rcu_barrier(void)
>> +void rcu_barrier(void)
>>  {
>> -    atomic_t cpu_count = ATOMIC_INIT(0);
>> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
>> +    unsigned int n_cpus;
>> +
>> +    ASSERT(!in_irq() && local_irq_is_enabled());
>> +
>> +    for ( ; ; )
>> +    {
>> +        if ( !atomic_read(&pending_count) && get_cpu_maps() )
>> +        {
>> +            n_cpus = num_online_cpus();
>> +
>> +            if ( atomic_cmpxchg(&pending_count, 0, n_cpus + 1) == 0 )
>> +                break;
>> +
>> +            put_cpu_maps();
>> +        }
>> +
>> +        process_pending_softirqs();
>> +        cpu_relax();
>> +    }
>> +
>> +    smp_mb__before_atomic();
>
> Our semantics of atomic_cmpxchg() are exactly the same as Linux's, i.e.
> it will contain a full barrier when the cmpxchg succeeds. So why do you
> need this barrier?

It was me, I think, who (wrongly) suggested a barrier was missing here,
sorry.

Jan

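Julien's last point rests on the atomic_cmpxchg() contract that, per his reply, Xen shares with Linux: full ordering when the exchange succeeds, no ordering guarantee when it fails. A rough C11 analogue makes that split explicit through the two memory-order arguments of atomic_compare_exchange_strong_explicit(). This is an illustration only: C11 orderings do not map one-to-one onto Xen's barrier primitives, and try_claim_barrier() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper modelled on the entry loop of rcu_barrier():
 * try to move *pending from 0 to n_cpus + 1.
 *
 * On success, memory_order_seq_cst keeps this thread's later accesses
 * from being reordered before the claim, the role an explicit barrier
 * right after the cmpxchg would otherwise play.
 * On failure, memory_order_relaxed suffices: nothing was published and
 * the caller simply retries.
 */
static bool try_claim_barrier(atomic_int *pending, int n_cpus)
{
    int expected = 0;

    assert(n_cpus > 0);
    return atomic_compare_exchange_strong_explicit(pending, &expected,
                                                   n_cpus + 1,
                                                   memory_order_seq_cst,
                                                   memory_order_relaxed);
}
```

In a retry loop like the one in the patch, the relaxed failure path is harmless because a failed claim changes nothing; only the successful claim has to order the writes that follow it.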