Xen project Mailing List

Re: [PATCH-for-4.17 v2] xen/sched: migrate timers to correct cpus after suspend

Date: Wed, 2 Nov 2022 15:07:55 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=MNImGd2VfP1X7a4ep1HdEPAIBmU3R5jAKZYwcxXLpUY=; b=UYlCt+EDBoJhMYlJCyPcymyvtDV7maBdLMRQeCNox7oPr2Trdg54JWSamH0wZOSYVX/KcBKYMP5CwxDbHn7IOPij1sDc692/9YTiABDGkHaEyJkT/MOeEgng64iUMVHx4f4leGW7Kxr2HKy+p7Lf/93LIevpC2Nq1NhTU/sgFBDc++Siau6ZK7iNA0xJh4DA2akk7BS0dUrV//LQ1SbZrMuZ/5OQUt+e90BXjeYxWXQe+uc0kFlL4wYECBgBTn6mXGhk/gex0KFnTPYLEyl8OmVoiZ1EiQ8u+7XbrwcssaGDZsnbJ2mT51QRA3c9YHXX4YuvyTJqTlZhiWSM4DA//g==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=dzhIE70kjY+y+cuY46maKKB5pnsYXCbwcXN0hZ6Mz84WiezyubOx3FMIZgRsUm2/x31mBYNYQUSO5VptWrsZWQae8/7JR3ov7eFf4Q57EXIJHhlxbeyLMNnXQLMaLYv9xfMqbdF3pFnw0Ubo1DsvG1FoBntY3AaX0vgdGeZTMxyDfkpAQZ3gS4+ofObDgF9wQ10kPAiYGwydHTHa6ZsCv2j+gJt1HkDVR9bEFsmSmRq+fhc6GL3PxZAVHQuA4CkVlLgOEc07axYsJ3lhn/qGKGTOW8YTsVKWpX5CUeYrk+4OJPvGRFuRhr3EfBz482ZzX2oNZ1tQ0MJ10+mGUwt4vw==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;

Cc: George Dunlap <george.dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Meng Xu <mengxu@xxxxxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Wed, 02 Nov 2022 14:08:17 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.10.2022 13:12, Juergen Gross wrote: > Today all timers are migrated to cpu 0 when the system is being > suspended. They are not migrated back after resuming the system again. > > This results (at least) to problems with the credit scheduler, as the > timer isn't handled on the cpu it was expected to occur. While you say "at least", this doesn't really make clear in how far all four timers for which you change their handling are actually problematic, or whether for some you make the adjustment "just in case". Looking at core.c's s_timer I'm inclined to say that with s_timer_fn() raising the schedule softirq things can't go well when this doesn't happen on the correct CPU. Just that it won't be an obvious problem like the crash in credit1 which had prompted the creation of this patch. > Add migrating the scheduling related timers of a specific cpu from cpu > 0 back to its original cpu when that cpu has gone up when resuming the > system. > > Signed-off-by: Juergen Gross <jgross@xxxxxxxx> Is there any Fixes: tag possible? If not, could you add a respective note in the post-commit-message area, to increase the chance of recalling that this will want queuing for backport? (Or maybe the lack of a reasonable Fixes: tag could actually justify the use of the Backport: one.) > --- a/xen/common/sched/core.c > +++ b/xen/common/sched/core.c > @@ -1284,6 +1284,32 @@ static int cpu_disable_scheduler_check(unsigned int > cpu) > return 0; > } > > +/* > + * Called after a cpu has come up again in a suspend/resume cycle. > + * Note that on a system with smt=0 this will be called for the sibling cpus, > + * too, so the case for no scheduling resource being available must be > + * considered. I think this part of the comment would better live ... > + * Migrate all timers for this cpu (they have been migrated to cpu 0 when the > + * cpu was going down). > + * Note that only timers related to a physical cpu are migrated, not the ones > + * related to a vcpu or domain. > + */ > +void sched_migrate_timers(unsigned int cpu) > +{ > + struct sched_resource *sr; > + > + rcu_read_lock(&sched_res_rculock); > + > + sr = get_sched_res(cpu); ... inbetween here, increasing the chance that there won't be someone trying to remove the extra check ... > + if ( sr && sr->master_cpu == cpu ) ... from here. I further think that explicitly mentioning "smt=0" isn't very helpful. Aiui a system with some CPUs otherwise soft-offlined would suffer the same problem. And I further assume no problem would occur even with "smt=0" on AMD hardware or Arm (where we don't park CPUs). At the very least I'd therefore like to ask for "e.g." to be inserted; better would be a more generic statement like "with some CPUs parked". Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.