[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 0/4] xen/rcu: let rcu work better with core scheduling



On 02/03/2020 14:32, Jürgen Groß wrote:
> On 02.03.20 15:23, Igor Druzhinin wrote:
>> On 02/03/2020 14:03, Jürgen Groß wrote:
>>> On 02.03.20 14:25, Igor Druzhinin wrote:
>>>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>>>
>>>>> I think you are just narrowing the window of the race:
>>>>>
>>>>> It is still possible to have two cpus entering rcu_barrier() and to
>>>>> make it into the if ( !initial ) clause.
>>>>>
>>>>> Instead of introducing another atomic I believe the following patch
>>>>> instead of yours should do it:
>>>>>
>>>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>>>> index e6add0b120..0d5469a326 100644
>>>>> --- a/xen/common/rcupdate.c
>>>>> +++ b/xen/common/rcupdate.c
>>>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>>>
>>>>>    void rcu_barrier(void)
>>>>>    {
>>>>> -    int initial = atomic_read(&cpu_count);
>>>>> -
>>>>>        while ( !get_cpu_maps() )
>>>>>        {
>>>>>            process_pending_softirqs();
>>>>> -        if ( initial && !atomic_read(&cpu_count) )
>>>>> +        if ( !atomic_read(&cpu_count) )
>>>>>                return;
>>>>>
>>>>>            cpu_relax();
>>>>> -        initial = atomic_read(&cpu_count);
>>>>>        }
>>>>>
>>>>> -    if ( !initial )
>>>>> -    {
>>>>> -        atomic_set(&cpu_count, num_online_cpus());
>>>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>>>            cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>>>> -    }
>>>>>
>>>>>        while ( atomic_read(&cpu_count) )
>>>>>        {
>>>>>
>>>>> Could you give that a try, please?
>>>>
>>>> With this patch I cannot disable SMT at all.
>>>>
>>>> The problem that my diff solved was a race between 2 consecutive
>>>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>>>> operation) where some CPUs didn't exit the cpu_count checking loop
>>>> completely but cpu_count is already reinitialized on CPU0 - this
>>>> results in some CPUs being stuck in the loop.
>>>
>>> Ah, okay, then I believe a combination of the two patches is needed.
>>>
>>> Something like the attached version?
>>
>> I apologies - my previous test result was from machine booted in core mode.
>> I'm now testing it properly and the original patch seems to do the trick but
>> I still don't understand how you can avoid the race with only 1 counter -
>> it's always possible that CPU1 is still in cpu_count checking loop (even if
>> cpu_count is currently 0) when cpu_count is reinitialized.
> 
> I guess this is very very unlikely.
> 
>> I'm looking at your current version now. Was the removal of get_cpu_maps()
>> and recursion protection intentional? I suspect it would only work on the
>> latest master so I need to keep those for 4.13 testing.
> 
> Oh, sorry, this seems to be an old version.
> 
> Here comes the correct one.

I checked this version and it's supposed to be fine for v3 I guess. However,
I wasn't able to check how well it would work in core mode though as CPU hot off
is generally broken in it now (at least it boots in core mode with rcu_barrier
called on CPU bring-up).

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.