
Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer

On Tue, Sep 20, 2022 at 11:23 AM Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:

I have two (non-exclusive) ideas here:
1. If old_cpu is actually still available, do not move it at all.
2. Use sched_migrate() instead of sched_set_res().
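
(For context on 2: as far as I can see, the relevant difference -- paraphrased
from memory of xen/common/sched/private.h, so double-check against the tree --
is that sched_migrate() gives the scheduler's migrate hook a chance to fix up
its private state, while sched_set_res() only repoints the unit at the new
resource:

static inline void sched_migrate(const struct scheduler *s,
                                 struct sched_unit *unit, unsigned int cpu)
{
    if ( s->migrate )
        s->migrate(s, unit, cpu);
    else
        sched_set_res(unit, get_sched_res(cpu));
}

credit2 does register a migrate hook, so with 2. it would get the chance to
update svc->rqd itself.)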

Other possibilities:

3. Make sure that svc->rqd is set to NULL when the affinity is broken.

Currently, on vcpu creation, sched_init_vcpu() is expected to set the pcpu; and it looks like for credit2, svc->rqd may not be set until the first time the vcpu is woken up (that's the 'if' part of the 'if/else' clause whose 'else' contains the ASSERT() you're hitting).  If, when we broke the CPU affinity on suspend, we also set svc->rqd to NULL, then on wake the vcpu would "take" the runqueue assigned by restore_vcpu_affinity().
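
For reference, the shape of that wake path (paraphrased from credit2.c from
memory, not the exact source):

    /* In csched2_unit_wake(), roughly: */
    /* Add into the new runqueue if necessary */
    if ( svc->rqd == NULL )
        runq_assign(unit);   /* first wake: adopt the runqueue of the
                              * resource the unit is currently on */
    else
        ASSERT(c2rqd(sched_unit_master(unit)) == svc->rqd);
                             /* <- the ASSERT() that fires on resume */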

4. Make csched2_unit_wake() tolerant of pcpus changing under its feet.

#3 would potentially make things more robust, but would require adding some sort of callback to notify schedulers that affinity had been broken (sketched after #4 below).  ATM this might only be used by credit2.

#4 would potentially be dangerous: some other bit of credit2 code might assume that svc->rqd is valid.
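
For #3, the callback itself could be quite small.  A purely hypothetical
sketch (the hook name and its wiring are invented here; csched2_unit() and
runq_deassign() are credit2's existing helpers, assuming runq_deassign()
still clears svc->rqd):

/* Hypothetical new optional hook in struct scheduler, invoked when
 * affinity is broken -- not in the tree: */
    void (*unit_affinity_broken)(const struct scheduler *ops,
                                 struct sched_unit *unit);

/*
 * credit2's implementation would just forget the stale runqueue, so that
 * the next csched2_unit_wake() takes the runq_assign() branch and adopts
 * the resource chosen by restore_vcpu_affinity().
 */
static void
csched2_unit_affinity_broken(const struct scheduler *ops,
                             struct sched_unit *unit)
{
    struct csched2_unit *svc = csched2_unit(unit);

    if ( svc->rqd != NULL )
        runq_deassign(unit);    /* dequeues and sets svc->rqd = NULL */
}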
Here is the patch that fixes it for me:
---8<---
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 83455fbde1c8..dcf202d8b307 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -1246,19 +1246,29 @@ void restore_vcpu_affinity(struct domain *d)
             }
         }

-        res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
+        /* Prefer old cpu if available. */
+        if ( cpumask_test_cpu(old_cpu, cpumask_scratch_cpu(cpu)) )
+            res = get_sched_res(old_cpu);
+        else
+            res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
         sched_set_res(unit, res);

         spin_unlock_irq(lock);

-        /* v->processor might have changed, so reacquire the lock. */
-        lock = unit_schedule_lock_irq(unit);
-        res = sched_pick_resource(unit_scheduler(unit), unit);
-        sched_set_res(unit, res);
-        spin_unlock_irq(lock);
-
+        /*
+         * If a different cpu was chosen, it was chosen at random, so let
+         * the scheduler make a proper decision.
+         */
         if ( old_cpu != sched_unit_master(unit) )
+        {
+            /* v->processor might have changed, so reacquire the lock. */
+            lock = unit_schedule_lock_irq(unit);
+            res = sched_pick_resource(unit_scheduler(unit), unit);
+            sched_migrate(unit_scheduler(unit), unit, res->master_cpu);
+            spin_unlock_irq(lock);
+
             sched_move_irqs(unit);
+        }
     }

     rcu_read_unlock(&sched_res_rculock);
---8<---

I have several doubts here:

1. If old_cpu is available, is sched_set_res() needed at all?
2. Should both calls be changed to sched_migrate()? Currently I changed
   only the second one, in case the scheduler could be confused by
   old_cpu not being available anymore.
3. Are there any extra locking requirements for sched_migrate() at this
   stage? The long comment above sched_unit_migrate_start() suggests
   there might be, but I'm not sure if that's really the case during
   resume.
4. Related to the above - should thaw_domains() be modified to call
   restore_vcpu_affinity() for all domains first, and unpause them only
   later (see the sketch below)? That could reduce locking requirements,
   I guess.
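
Re 4., for concreteness: the reordering would be something like this
untested sketch against xen/arch/x86/acpi/power.c, splitting the existing
single loop in two so that nothing runs while affinities are still being
rebuilt:

static void thaw_domains(void)
{
    struct domain *d;

    rcu_read_lock(&domlist_read_lock);

    /* Pass 1: move every vcpu back onto an online pcpu... */
    for_each_domain ( d )
        restore_vcpu_affinity(d);

    /* ...pass 2: only then let the domains run again. */
    for_each_domain ( d )
        domain_unpause(d);

    rcu_read_unlock(&domlist_read_lock);
}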

Unfortunately this code has had a lot of churn since the last time I really engaged with it; I'm going to have to come back to this on Monday.

Jürgen / Dario, any thoughts?

 -George 
