[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH-for-4.17 v2] xen/sched: migrate timers to correct cpus after suspend
On 02.11.22 15:07, Jan Beulich wrote: On 28.10.2022 13:12, Juergen Gross wrote:Today all timers are migrated to cpu 0 when the system is being suspended. They are not migrated back after resuming the system again. This results (at least) to problems with the credit scheduler, as the timer isn't handled on the cpu it was expected to occur.While you say "at least", this doesn't really make clear in how far all four timers for which you change their handling are actually problematic, or whether for some you make the adjustment "just in case". Looking at core.c's s_timer I'm inclined to say that with s_timer_fn() raising the schedule softirq things can't go well when this doesn't happen on the correct CPU. Just that it won't be an obvious problem like the crash in credit1 which had prompted the creation of this patch.Add migrating the scheduling related timers of a specific cpu from cpu 0 back to its original cpu when that cpu has gone up when resuming the system. Signed-off-by: Juergen Gross <jgross@xxxxxxxx>Is there any Fixes: tag possible? If not, could you add a respective note in the post-commit-message area, to increase the chance of recalling that this will want queuing for backport? (Or maybe the lack of a reasonable Fixes: tag could actually justify the use of the Backport: one.) I'm not sure whether it is really correct, but I assume the most probable candidate for a Fixes: tag would be commit 0763cd268789. --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -1284,6 +1284,32 @@ static int cpu_disable_scheduler_check(unsigned int cpu) return 0; }+/*+ * Called after a cpu has come up again in a suspend/resume cycle. + * Note that on a system with smt=0 this will be called for the sibling cpus, + * too, so the case for no scheduling resource being available must be + * considered.I think this part of the comment would better live ...+ * Migrate all timers for this cpu (they have been migrated to cpu 0 when the + * cpu was going down). + * Note that only timers related to a physical cpu are migrated, not the ones + * related to a vcpu or domain. + */ +void sched_migrate_timers(unsigned int cpu) +{ + struct sched_resource *sr; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(cpu);... inbetween here, increasing the chance that there won't be someone trying to remove the extra check ... Okay. + if ( sr && sr->master_cpu == cpu )... from here. I further think that explicitly mentioning "smt=0" isn't very helpful. Aiui a system with some CPUs otherwise soft-offlined would suffer the same problem. And I further assume no problem would occur even with "smt=0" on AMD hardware or Arm (where we don't park CPUs). At the very least I'd therefore like to ask for "e.g." to be inserted; better would be a more generic statement like "with some CPUs parked". I'll go with the latter. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |