[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/sched: fix sched_move_domain()
On 20.10.23 20:03, Andrew Cooper wrote: On 19/10/2023 12:23 pm, Juergen Gross wrote:When moving a domain out of a cpupool running with the credit2 scheduler and having multiple run-queues, the following ASSERT() can be observed: (XEN) Xen call trace: (XEN) [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7 (XEN) [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1 (XEN) [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b (XEN) [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35 (XEN) [<ffff82d040206513>] S domain_kill+0xa5/0x116 (XEN) [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951 (XEN) [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143 (XEN) [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9 (XEN) [<ffff82d0402012b7>] S lstar_enter+0x137/0x140 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159 (XEN) **************************************** This is happening as sched_move_domain() is setting a different cpu for a scheduling unit without telling the scheduler. When this unit is removed from the scheduler, the ASSERT() will trigger. In non-debug builds the result is usually a clobbered pointer, leading to another crash a short time later. Fix that by swapping the two involved actions (setting another cpu and removing the unit from the scheduler). Cc: Henry Wang <Henry.Wang@xxxxxxx> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")Link: https://github.com/Dasharo/dasharo-issues/issues/488 And a Reported/Tested-by if the discoverer wishes.Signed-off-by: Juergen Gross <jgross@xxxxxxxx> --- This fixes a regression introduced in Xen 4.15. The fix is very simple and it will affect only configurations with multiple cpupools. I think whether to include it in 4.18 should be decided by the release manager based on the current state of the release (I think I wouldn't have added it that late in the release while being the release manager). --- xen/common/sched/core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index 12deefa745..e9f7486197 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -738,12 +738,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c) new_p = cpumask_first(d->cpupool->cpu_valid); for_each_sched_unit ( d, unit ) { - spinlock_t *lock = unit_schedule_lock_irq(unit); + spinlock_t *lock; + + sched_remove_unit(old_ops, unit);+ lock = unit_schedule_lock_irq(unit);sched_set_res(unit, get_sched_res(new_p)); spin_unlock_irq(lock); - - sched_remove_unit(old_ops, unit);I'm happy to see the T-by, and that you know what's going on here, but I don't understand the explanation. The change here is the ordering of sched_remove_unit() with respect to the lock/get&set/unlock block. Yes. remove unit is moderately complicated, but the get&set is just an RCU pointer assignment. But as the assertion fires in the latter action, it must be the get&set causing problems. Yes. As I wrote, it is because the cpu assignment of the unit is being changed without telling the specific scheduler (credit2) about it. And that's because new_p is actually a cpu number, which has the consequence of causing Credit2's c2rqd() to materialise the wrong csched2_runqueue_data pointer, and then we're operating on someone else's data without a suitable lock held. I haven't looked into the exact details what went wrong further down the road, but I think it is not only the missing lock, but probably the fact that depending on the use case different run queues will be addressed for the same unit, causing inconsistencies. And it's only by luck that none of the other schedulers tie something to per-cpu data like this? I didn't look into the other schedulers. Other observations. I think sched_move_domain() would be easier to follow with s/new_p/new_cpu/ (and similar for unit_p) as "p" is not obviously processor unless you notice the cpumask_first() calls. Maybe. I don't have that problem, OTOH I've written that code. :-D Why do we move a domain back to cpupool0 on domain destroy? There's nothing magic about cpupool0 that I'm aware of here. It seems like unnecessary complexity. See commit bac6334b51d9. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature.asc
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |