[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] xen/sched: fix sched_move_domain()
On 16.11.23 17:48, George Dunlap wrote: On Tue, Nov 14, 2023 at 1:30 PM Juergen Gross <jgross@xxxxxxxx> wrote:When moving a domain out of a cpupool running with the credit2 scheduler and having multiple run-queues, the following ASSERT() can be observed: (XEN) Xen call trace: (XEN) [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7 (XEN) [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1 (XEN) [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b (XEN) [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35 (XEN) [<ffff82d040206513>] S domain_kill+0xa5/0x116 (XEN) [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951 (XEN) [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143 (XEN) [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9 (XEN) [<ffff82d0402012b7>] S lstar_enter+0x137/0x140 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159 (XEN) **************************************** This is happening as sched_move_domain() is setting a different cpu for a scheduling unit without telling the scheduler. When this unit is removed from the scheduler, the ASSERT() will trigger. In non-debug builds the result is usually a clobbered pointer, leading to another crash a short time later. Fix that by swapping the two involved actions (setting another cpu and removing the unit from the scheduler). Cc: Henry Wang <Henry.Wang@xxxxxxx> Link: https://github.com/Dasharo/dasharo-issues/issues/488 Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity") Signed-off-by: Juergen Gross <jgross@xxxxxxxx> --- V2: - added Link: (reporter didn't want to be added by name) --- xen/common/sched/core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index 12deefa745..e9f7486197 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -738,12 +738,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c) new_p = cpumask_first(d->cpupool->cpu_valid);There's a comment just above here which should probably be updated; something like "Remove all units from the old scheduler, and temporarily move them to the same processor to make locking easier when moving the new units to nwe processors." With that change: Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxx> I could change that on check-if you'd like. Yes, please (with s/nwe/new/). I take it at this point this is just for the 4.19 branch, and this will be a candidate for backport to 4.18.1? Correct. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature.asc
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |