[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/sched: fix sched_move_domain()


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Mon, 23 Oct 2023 08:31:25 +0200
  • Authentication-results: smtp-out2.suse.de; none
  • Autocrypt: addr=jgross@xxxxxxxx; keydata= xsBNBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOB ycWxw3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJve dYm8Of8Zd621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJ NwQpd369y9bfIhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvx XP3FAp2pkW0xqG7/377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEB AAHNH0p1ZXJnZW4gR3Jvc3MgPGpncm9zc0BzdXNlLmNvbT7CwHkEEwECACMFAlOMcK8CGwMH CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCw3p3WKL8TL8eZB/9G0juS/kDY9LhEXseh mE9U+iA1VsLhgDqVbsOtZ/S14LRFHczNd/Lqkn7souCSoyWsBs3/wO+OjPvxf7m+Ef+sMtr0 G5lCWEWa9wa0IXx5HRPW/ScL+e4AVUbL7rurYMfwCzco+7TfjhMEOkC+va5gzi1KrErgNRHH kg3PhlnRY0Udyqx++UYkAsN4TQuEhNN32MvN0Np3WlBJOgKcuXpIElmMM5f1BBzJSKBkW0Jc Wy3h2Wy912vHKpPV/Xv7ZwVJ27v7KcuZcErtptDevAljxJtE7aJG6WiBzm+v9EswyWxwMCIO RoVBYuiocc51872tRGywc03xaQydB+9R7BHPzsBNBFOMcBYBCADLMfoA44MwGOB9YT1V4KCy vAfd7E0BTfaAurbG+Olacciz3yd09QOmejFZC6AnoykydyvTFLAWYcSCdISMr88COmmCbJzn sHAogjexXiif6ANUUlHpjxlHCCcELmZUzomNDnEOTxZFeWMTFF9Rf2k2F0Tl4E5kmsNGgtSa aMO0rNZoOEiD/7UfPP3dfh8JCQ1VtUUsQtT1sxos8Eb/HmriJhnaTZ7Hp3jtgTVkV0ybpgFg w6WMaRkrBh17mV0z2ajjmabB7SJxcouSkR0hcpNl4oM74d2/VqoW4BxxxOD1FcNCObCELfIS auZx+XT6s+CE7Qi/c44ibBMR7hyjdzWbABEBAAHCwF8EGAECAAkFAlOMcBYCGwwACgkQsN6d 1ii/Ey9D+Af/WFr3q+bg/8v5tCknCtn92d5lyYTBNt7xgWzDZX8G6/pngzKyWfedArllp0Pn fgIXtMNV+3t8Li1Tg843EXkP7+2+CQ98MB8XvvPLYAfW8nNDV85TyVgWlldNcgdv7nn1Sq8g HwB2BHdIAkYce3hEoDQXt/mKlgEGsLpzJcnLKimtPXQQy9TxUaLBe9PInPd+Ohix0XOlY+Uk QFEx50Ki3rSDl2Zt2tnkNYKUCvTJq7jvOlaPd6d/W0tZqpyy7KVay+K4aMobDsodB3dvEAs6 ScCnh03dDAFgIq5nsB11j3KPKdVoPlfucX2c7kGNH+LUMbzqV6beIENfNexkOfxHfw==
  • Cc: George Dunlap <george.dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>
  • Delivery-date: Mon, 23 Oct 2023 06:31:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.10.23 20:03, Andrew Cooper wrote:
On 19/10/2023 12:23 pm, Juergen Gross wrote:
When moving a domain out of a cpupool running with the credit2
scheduler and having multiple run-queues, the following ASSERT() can
be observed:

(XEN) Xen call trace:
(XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
(XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
(XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
(XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
(XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
(XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
(XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
(XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
(XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at 
common/sched/credit2.c:1159
(XEN) ****************************************

This is happening as sched_move_domain() is setting a different cpu
for a scheduling unit without telling the scheduler. When this unit is
removed from the scheduler, the ASSERT() will trigger.

In non-debug builds the result is usually a clobbered pointer, leading
to another crash a short time later.

Fix that by swapping the two involved actions (setting another cpu and
removing the unit from the scheduler).

Cc: Henry Wang <Henry.Wang@xxxxxxx>
Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with 
different granularity")

Link: https://github.com/Dasharo/dasharo-issues/issues/488

And a Reported/Tested-by if the discoverer wishes.

Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
---
This fixes a regression introduced in Xen 4.15. The fix is very simple
and it will affect only configurations with multiple cpupools. I think
whether to include it in 4.18 should be decided by the release manager
based on the current state of the release (I think I wouldn't have
added it that late in the release while being the release manager).
---
  xen/common/sched/core.c | 7 ++++---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 12deefa745..e9f7486197 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -738,12 +738,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
      new_p = cpumask_first(d->cpupool->cpu_valid);
      for_each_sched_unit ( d, unit )
      {
-        spinlock_t *lock = unit_schedule_lock_irq(unit);
+        spinlock_t *lock;
+
+        sched_remove_unit(old_ops, unit);
+ lock = unit_schedule_lock_irq(unit);
          sched_set_res(unit, get_sched_res(new_p));
          spin_unlock_irq(lock);
-
-        sched_remove_unit(old_ops, unit);

I'm happy to see the T-by, and that you know what's going on here, but I
don't understand the explanation.

The change here is the ordering of sched_remove_unit() with respect to
the lock/get&set/unlock block.

Yes.

remove unit is moderately complicated, but the get&set is just an RCU
pointer assignment.  But as the assertion fires in the latter action, it
must be the get&set causing problems.

Yes. As I wrote, it is because the cpu assignment of the unit is being
changed without telling the specific scheduler (credit2) about it.

And that's because new_p is actually a cpu number, which has the
consequence of causing Credit2's c2rqd() to materialise the wrong
csched2_runqueue_data pointer, and then we're operating on someone
else's data without a suitable lock held.

I haven't looked into the exact details what went wrong further down the
road, but I think it is not only the missing lock, but probably the
fact that depending on the use case different run queues will be addressed
for the same unit, causing inconsistencies.

And it's only by luck that none of the other schedulers tie something to
per-cpu data like this?

I didn't look into the other schedulers.

Other observations.

I think sched_move_domain() would be easier to follow with
s/new_p/new_cpu/ (and similar for unit_p) as "p" is not obviously
processor unless you notice the cpumask_first() calls.

Maybe. I don't have that problem, OTOH I've written that code. :-D

Why do we move a domain back to cpupool0 on domain destroy?  There's
nothing magic about cpupool0 that I'm aware of here.  It seems like
unnecessary complexity.

See commit bac6334b51d9.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.