[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1.1 2/3] xen/sched: fix theoretical races accessing vcpu->dirty_cpu



On 04.05.2020 14:41, Jürgen Groß wrote:
> On 04.05.20 13:51, Jan Beulich wrote:
>> On 30.04.2020 17:28, Juergen Gross wrote:
>>> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>>>     void sync_vcpu_execstate(struct vcpu *v)
>>>   {
>>> -    if ( v->dirty_cpu == smp_processor_id() )
>>> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
>>> +
>>> +    if ( dirty_cpu == smp_processor_id() )
>>>           sync_local_execstate();
>>> -    else if ( vcpu_cpu_dirty(v) )
>>> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>       {
>>>           /* Remote CPU calls __sync_local_execstate() from flush IPI 
>>> handler. */
>>> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
>>> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>>>       }
>>> +    ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu ||
>>> +           dirty_cpu == VCPU_CPU_CLEAN);
>>
>> !is_vcpu_dirty_cpu(dirty_cpu) again? Also perhaps flip both
>> sides of the || (to have the cheaper one first), and maybe
>>
>>      if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>          ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu);
>>
>> as the longer assertion string literal isn't really of that
>> much extra value.
> 
> I can do that, in case we can be sure the compiler will drop the
> test in case of a non-debug build.

Modern gcc will afaik; no idea about clang though.

>> However, having stared at it for a while now - is this race
>> free? I can see this being fine in the (initial) case of
>> dirty_cpu == smp_processor_id(), but if this is for a foreign
>> CPU, can't the vCPU have gone back to that same CPU again in
>> the meantime?
> 
> This should never happen. Either the vcpu in question is paused,
> or it has been forced off the cpu due to not being allowed to run
> there (e.g. affinity has been changed, or cpu is about to be
> removed from cpupool). I can add a comment explaining it.

There is a time window from late in flush_mask() to the assertion
you add. All sorts of things can happen during this window on
other CPUs. IOW what guarantees the vCPU not getting unpaused or
its affinity getting changed yet another time?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.