commit baccc50fcb7cec6f5ec84473dad28847b65b65e8
Author: Dario Faggioli <dfaggioli@xxxxxxxx>
Date:   Fri May 28 17:12:48 2021 +0200

    credit2: make sure we pick a runnable unit from the runq if there is one
    
    A non-runnable unit (temporarily) present in the runq may cause us to
    stop scanning the runq itself too early. Of course, we don't run any
    non-runnable vCPUs, but we end the scan and fall back to picking
    the idle unit. In other words, this prevents us from finding and
    picking the actual unit that we're meant to start running (which
    might be further ahead in the runq).
    
    Depending on the vCPU pinning configuration, this may leave such a
    unit stuck in the runq for a long time, causing malfunctions
    inside the guest.
    
    Fix this by checking runnable/non-runnable status up-front, in the runq
    scanning function.
    
    Reported-by: Michał Leszczyński <michal.leszczynski@xxxxxxx>
    Reported-by: Dion Kant <g.w.kant@xxxxxxxxxx>
    Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>
    Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxx>

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index d6ebd126de..d89e340905 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3361,6 +3361,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                         (unsigned char *)&d);
         }
 
+        /* Skip non runnable vcpus that we (temporarily) have in the runq */
+        if ( unlikely(!vcpu_runnable(svc->vcpu)) )
+            continue;
+
         /* Only consider vcpus that are allowed to run on this processor. */
         if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
             continue;
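To illustrate the failure mode described in the commit message, here is a minimal, self-contained model of the runq scan. It is a sketch under stated assumptions, not the actual Xen code: the real runq_candidate() walks rqd->runq with list_for_each_safe(), calls vcpu_runnable() and cpumask_test_cpu(), and performs further checks (tickled CPUs, migration resistance, caps) that are omitted here. The names struct unit, IDLE, pick_before_fix(), pick_after_fix(), hard_aff and snext_credit are hypothetical. The model assumes, as the message describes, that the runq is ordered by credit and that the scan stops at the first unit passing the affinity check; before the fix, runnability was only considered at that final step, so a blocked unit ended the scan and the idle unit was chosen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a credit2 runq entry. */
struct unit {
    bool runnable;          /* plays the role of vcpu_runnable() */
    unsigned int hard_aff;  /* bit i set => allowed to run on pCPU i */
    int credit;
};

#define IDLE (-1)           /* no suitable unit: fall back to idle */

/* Pre-fix shape: runnability is tested only at the end, and the scan
 * breaks on the first unit that passes the affinity check. */
static int pick_before_fix(const struct unit *rq, size_t n,
                           unsigned int cpu, int snext_credit)
{
    for (size_t i = 0; i < n; i++) {
        if (!(rq[i].hard_aff & (1u << cpu)))
            continue;       /* not allowed on this pCPU, keep scanning */
        if (rq[i].credit > snext_credit && rq[i].runnable)
            return (int)i;
        break;              /* a blocked unit here ends the scan early */
    }
    return IDLE;
}

/* Post-fix shape: non-runnable units are skipped up-front, so the
 * scan can reach a runnable unit further ahead in the runq. */
static int pick_after_fix(const struct unit *rq, size_t n,
                          unsigned int cpu, int snext_credit)
{
    for (size_t i = 0; i < n; i++) {
        if (!rq[i].runnable)
            continue;       /* skip units temporarily left in the runq */
        if (!(rq[i].hard_aff & (1u << cpu)))
            continue;
        if (rq[i].credit > snext_credit)
            return (int)i;
        break;              /* runq is credit-ordered: nothing better follows */
    }
    return IDLE;
}
```

With a runq of { blocked unit with credit 100, runnable unit with credit 50 }, both pinned to pCPU 0 and compared against a very low idle credit, pick_before_fix() returns IDLE while pick_after_fix() returns index 1, which is the "stuck in the runq" scenario the patch addresses.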