
Re: [PATCH] xen/timer: don't migrate timers away from cpus during suspend


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 6 Sep 2022 17:50:15 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 06 Sep 2022 21:39:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.09.2022 14:41, Juergen Gross wrote:
> During a suspend/resume cycle timers on all cpus but cpu 0 will be
> migrated to cpu 0, as the other cpus are taken down.
> 
> This is problematic in case such a timer is related to a specific vcpu,
> as the vcpus are not migrated to another cpu during suspend (migrating
> them would break cpupools and core scheduling).
> 
> In order to avoid the problems just try to keep the timers on their
> cpus. Only migrate them away in case resume failed. Doing so isn't
> problematic, as any vcpu on a cpu not coming back to life would be
> migrated away, too.

The description fails to make clear what the problem is with a timer
which "is related to a specific vcpu". In principle there's no issue
with such a timer running on an arbitrary CPU. An example of a case
where a problem exists would help. It might also clarify whether it
wouldn't be better to remove such assumptions from the (few?) places
where they are made, and why this appears to be a credit1-specific
issue.

Also, to me "just try to keep" reads like "best effort", which isn't
what the patch does. I'd suggest dropping "just try to" and perhaps
also inserting "CPU" before "resume".

As to this not being a problem - if there are assumptions on the CPU
a timer runs on, why would this not be the case after resume? Timers
are migrated to random CPUs, and hence it's not very likely that the
vCPU would end up on the same CPU the timer was migrated to. IOW to
me it looks as if this would work only if _all_ APs failed to come
back up, and the system would continue with just the BSP.
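
To illustrate the point with a toy model (this is not Xen code; the
function names and the assumption that a timer and its vCPU are each
moved independently to any online CPU are purely hypothetical), one
can count the placements in which the two end up together:

```c
#include <assert.h>

/*
 * Toy model only, not Xen code. Assumption: after a failed CPU resume,
 * a timer and the vCPU it belongs to are each moved, independently, to
 * any one of n_online CPUs. Returns how many of the n_online * n_online
 * possible placements put both on the same CPU.
 */
static unsigned int colocated_placements(unsigned int n_online)
{
    unsigned int timer_cpu, vcpu_cpu, matches = 0;

    assert(n_online > 0);

    for (timer_cpu = 0; timer_cpu < n_online; timer_cpu++)
        for (vcpu_cpu = 0; vcpu_cpu < n_online; vcpu_cpu++)
            if (timer_cpu == vcpu_cpu)
                matches++;

    return matches;
}

/* Non-zero only if every possible placement keeps timer and vCPU together. */
static int colocation_guaranteed(unsigned int n_online)
{
    return colocated_placements(n_online) == n_online * n_online;
}
```

Under these assumptions only n_online of the n_online * n_online
placements match, so co-location is guaranteed solely for
n_online == 1, i.e. when just the BSP is left.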

Jan
