
Re: [PATCH v1.1 2/3] xen/sched: fix theoretical races accessing vcpu->dirty_cpu



On 04.05.20 14:48, Jan Beulich wrote:
> On 04.05.2020 14:41, Jürgen Groß wrote:
>> On 04.05.20 13:51, Jan Beulich wrote:
>>> On 30.04.2020 17:28, Juergen Gross wrote:
>>>> @@ -1956,13 +1958,17 @@ void sync_local_execstate(void)
>>>>  void sync_vcpu_execstate(struct vcpu *v)
>>>>  {
>>>> -    if ( v->dirty_cpu == smp_processor_id() )
>>>> +    unsigned int dirty_cpu = read_atomic(&v->dirty_cpu);
>>>> +
>>>> +    if ( dirty_cpu == smp_processor_id() )
>>>>          sync_local_execstate();
>>>> -    else if ( vcpu_cpu_dirty(v) )
>>>> +    else if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>>      {
>>>>          /* Remote CPU calls __sync_local_execstate() from flush IPI handler. */
>>>> -        flush_mask(cpumask_of(v->dirty_cpu), FLUSH_VCPU_STATE);
>>>> +        flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>>>>      }
>>>> +    ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu ||
>>>> +           dirty_cpu == VCPU_CPU_CLEAN);

>>> !is_vcpu_dirty_cpu(dirty_cpu) again? Also perhaps flip both
>>> sides of the || (to have the cheaper one first), and maybe
>>>
>>>      if ( is_vcpu_dirty_cpu(dirty_cpu) )
>>>          ASSERT(read_atomic(&v->dirty_cpu) != dirty_cpu);
>>>
>>> as the longer assertion string literal isn't really of that
>>> much extra value.

>> I can do that, in case we can be sure the compiler will drop the
>> test in case of a non-debug build.

> Modern gcc will afaik; no idea about clang though.

I'll go with both tests inside the ASSERT(), as this will hurt the
debug build only.
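
For illustration, the combined variant would look roughly like this (a
sketch only, not necessarily the final committed wording):

     /* Cheaper test on the local variable first. */
     ASSERT(!is_vcpu_dirty_cpu(dirty_cpu) ||
            read_atomic(&v->dirty_cpu) != dirty_cpu);

In a non-debug build ASSERT() expands to a never-executed test (along
the lines of "do { if ( 0 && (p) ) {} } while (0)"), so the predicate
is still parsed but the compiler can discard both the
is_vcpu_dirty_cpu() check and the read_atomic().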


>>> However, having stared at it for a while now - is this race
>>> free? I can see this being fine in the (initial) case of
>>> dirty_cpu == smp_processor_id(), but if this is for a foreign
>>> CPU, can't the vCPU have gone back to that same CPU again in
>>> the meantime?

>> This should never happen. Either the vcpu in question is paused,
>> or it has been forced off the cpu due to not being allowed to run
>> there (e.g. its affinity has been changed, or the cpu is about to
>> be removed from its cpupool). I can add a comment explaining it.

> There is a time window from late in flush_mask() to the assertion
> you add. All sorts of things can happen during this window on
> other CPUs. IOW what guarantees the vCPU not getting unpaused or
> its affinity getting changed yet another time?

Hmm, very unlikely, but not impossible (especially in the nested virt
case).
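
To illustrate, a hypothetical interleaving hitting that window (cpu
labels and ordering are made up for the example):

     CPU A: sync_vcpu_execstate(v)      elsewhere
     dirty_cpu = read_atomic(...);      /* observes cpu B */
     flush_mask(cpumask_of(B), ...);    flush IPI on B syncs state,
                                        v->dirty_cpu = VCPU_CPU_CLEAN
                                        vcpu unpaused / affinity changed
                                        again, v runs on B once more:
                                        v->dirty_cpu = B
     ASSERT(read_atomic(&v->dirty_cpu)
            != dirty_cpu || ...);       /* reads B == dirty_cpu: fires */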

This makes me wonder whether adding the call to sync_vcpu_execstate()
in sched_unit_migrate_finish() was really a good idea. There might be
cases where it is not necessary (e.g. the case you are mentioning, but
with a much larger time window), or where it is more expensive than
doing it the lazy way.

Dropping this call to sync_vcpu_execstate() would remove the race you
are mentioning completely.
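
The removal would be along these lines (a sketch; the surrounding
context of sched_unit_migrate_finish() is reproduced from memory and
may differ from the actual tree):

     @@ sched_unit_migrate_finish():
         if ( old_cpu != new_cpu )
         {
     -       /* Vcpus are moved to other pcpus, commit their states to memory. */
     -       for_each_sched_unit_vcpu ( unit, v )
     -           sync_vcpu_execstate(v);
             sched_move_irqs(unit);
         }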


Juergen