
Re: [Xen-devel] [PATCH v6 1/4] xen/rcu: don't use stop_machine_run() for rcu_barrier()



On 16/03/2020 16:01, Jürgen Groß wrote:
> On 16.03.20 16:24, Igor Druzhinin wrote:
>> On 13/03/2020 13:06, Juergen Gross wrote:
>>> Today rcu_barrier() is calling stop_machine_run() to synchronize all
>>> physical cpus in order to ensure all pending rcu calls have finished
>>> when returning.
>>>
>>> As stop_machine_run() is using tasklets this requires scheduling of
>>> idle vcpus on all cpus imposing the need to call rcu_barrier() on idle
>>> cpus only in case of core scheduling being active, as otherwise a
>>> scheduling deadlock would occur.
>>>
>>> There is no need at all to do the syncing of the cpus in tasklets, as
>>> rcu activity is started in __do_softirq() called whenever softirq
>>> activity is allowed. So rcu_barrier() can easily be modified to use
>>> softirq for synchronization of the cpus no longer requiring any
>>> scheduling activity.
>>>
>>> As there already is a rcu softirq reuse that for the synchronization.
>>>
>>> Remove the barrier element from struct rcu_data as it isn't used.
>>>
>>> Finally switch rcu_barrier() to return void as it now can never fail.
>>>
>>> Partially-based-on-patch-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>>> ---
>>> V2:
>>> - add recursion detection
>>>
>>> V3:
>>> - fix races (Igor Druzhinin)
>>>
>>> V5:
>>> - rename done_count to pending_count (Jan Beulich)
>>> - fix race (Jan Beulich)
>>>
>>> V6:
>>> - add barrier (Julien Grall)
>>> - add ASSERT() (Julien Grall)
>>> - hold cpu_map lock until end of rcu_barrier() (Julien Grall)
>>> ---
>>>    xen/common/rcupdate.c      | 95 +++++++++++++++++++++++++++++++++-------------
>>>    xen/include/xen/rcupdate.h |  2 +-
>>>    2 files changed, 69 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>> index 03d84764d2..ed9083d2b2 100644
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -83,7 +83,6 @@ struct rcu_data {
>>>        struct rcu_head **donetail;
>>>        long            blimit;           /* Upper limit on a processed batch */
>>>        int cpu;
>>> -    struct rcu_head barrier;
>>>        long            last_rs_qlen;     /* qlen during the last resched */
>>>        /* 3) idle CPUs handling */
>>> @@ -91,6 +90,7 @@ struct rcu_data {
>>>        bool idle_timer_active;
>>>        bool            process_callbacks;
>>> +    bool            barrier_active;
>>>    };
>>>    /*
>>> @@ -143,51 +143,85 @@ static int qhimark = 10000;
>>>    static int qlowmark = 100;
>>>    static int rsinterval = 1000;
>>> -struct rcu_barrier_data {
>>> -    struct rcu_head head;
>>> -    atomic_t *cpu_count;
>>> -};
>>> +/*
>>> + * rcu_barrier() handling:
>>> + * cpu_count holds the number of cpus required to finish barrier handling.
>>> + * pending_count is initialized to nr_cpus + 1.
>>> + * Cpus are synchronized via softirq mechanism. rcu_barrier() is regarded to
>>> + * be active if pending_count is not zero. In case rcu_barrier() is called on
>>> + * multiple cpus it is enough to check for pending_count being not zero on
>>> + * entry and to call process_pending_softirqs() in a loop until pending_count
>>> + * drops to zero, before starting the new rcu_barrier() processing.
>>> + * In order to avoid hangs when rcu_barrier() is called multiple times on the
>>> + * same cpu in fast sequence and a slave cpu couldn't drop out of the barrier
>>> + * handling fast enough a second counter pending_count is needed.
>>> + * The rcu_barrier() invoking cpu will wait until pending_count reaches 1
>>> + * (meaning that all cpus have finished processing the barrier) and then will
>>> + * reset pending_count to 0 to enable entering rcu_barrier() again.
>>> + */
>>> +static atomic_t cpu_count = ATOMIC_INIT(0);
>>> +static atomic_t pending_count = ATOMIC_INIT(0);
>>>    static void rcu_barrier_callback(struct rcu_head *head)
>>>    {
>>> -    struct rcu_barrier_data *data = container_of(
>>> -        head, struct rcu_barrier_data, head);
>>> -    atomic_inc(data->cpu_count);
>>> +    smp_wmb();     /* Make all previous writes visible to other cpus. */
>>> +    atomic_dec(&cpu_count);
>>>    }
>>> -static int rcu_barrier_action(void *_cpu_count)
>>> +static void rcu_barrier_action(void)
>>>    {
>>> -    struct rcu_barrier_data data = { .cpu_count = _cpu_count };
>>> -
>>> -    ASSERT(!local_irq_is_enabled());
>>> -    local_irq_enable();
>>> +    struct rcu_head head;
>>>        /*
>>>         * When callback is executed, all previously-queued RCU work on this CPU
>>> -     * is completed. When all CPUs have executed their callback, data.cpu_count
>>> -     * will have been incremented to include every online CPU.
>>> +     * is completed. When all CPUs have executed their callback, cpu_count
>>> +     * will have been decremented to 0.
>>>         */
>>> -    call_rcu(&data.head, rcu_barrier_callback);
>>> +    call_rcu(&head, rcu_barrier_callback);
>>> -    while ( atomic_read(data.cpu_count) != num_online_cpus() )
>>> +    while ( atomic_read(&cpu_count) )
>>>        {
>>>            process_pending_softirqs();
>>>            cpu_relax();
>>>        }
>>> -    local_irq_disable();
>>> -
>>> -    return 0;
>>> +    atomic_dec(&pending_count);
>>>    }
>>> -/*
>>> - * As rcu_barrier() is using stop_machine_run() it is allowed to be used in
>>> - * idle context only (see comment for stop_machine_run()).
>>> - */
>>> -int rcu_barrier(void)
>>> +void rcu_barrier(void)
>>>    {
>>> -    atomic_t cpu_count = ATOMIC_INIT(0);
>>> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
>>> +    unsigned int n_cpus;
>>> +
>>> +    ASSERT(!in_irq() && local_irq_is_enabled());
>>> +
>>> +    for ( ;; )
>>> +    {
>>> +        if ( !atomic_read(&pending_count) && get_cpu_maps() )
>>> +        {
>>
>> If the whole action is happening while the cpu_maps lock is taken, why do
>> you need to check pending_count first? I think the logic of this loop
>> could be simplified if this were taken into account.
> 
> get_cpu_maps() can succeed on multiple cpus at once (it's a read_lock()).
> Testing pending_count first avoids hammering on the cache lines.

I see - the logic was changed recently. I'm currently testing this 
version of the patch.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

