Xen project Mailing List

Re: [PATCH] xen: arm: Don't use stop_cpu() in halt_this_cpu()

From: Stefano Stabellini <sstabellini@xxxxxxxxxx>

Date: Fri, 24 Jun 2022 14:31:52 -0700 (PDT)

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, dmitry.semenets@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Dmytro Semenets <dmytro_semenets@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>

Delivery-date: Fri, 24 Jun 2022 21:32:14 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, 24 Jun 2022, Julien Grall wrote: > On 23/06/2022 23:07, Stefano Stabellini wrote: > > On Thu, 23 Jun 2022, dmitry.semenets@xxxxxxxxx wrote: > > > From: Dmytro Semenets <dmytro_semenets@xxxxxxxx> > > So wouldn't it be better to remove the panic from the implementation of > > call_psci_cpu_off? > > I have asked to keep the panic() in call_psci_cpu_off(). If you remove the > panic() then we will hide the fact that the CPU was not properly turned off > and will consume more energy than expected. > > The WFI loop is fine when shutting down or rebooting because we know this will > only happen for a short period of time. Yeah, I don't think we should hide that CPU_OFF failed. I think we should print a warning. However, given that we know CPU_OFF can reasonably fail in normal conditions returning DENIED when a Trusted OS is present, then I think we should not panic. If there was a way to distinguish a failure because a Trusted OS is present (the "normal" failure) from other failures, I would suggest to: - print a warning if failed due to a Trusted OS being present - panic in other cases Unfortunately it looks like in all cases the return code is DENIED :-( Given that, I would not panic and only print a warning in all cases. Or we could ASSERT which at least goes away in !DEBUG builds. > > The reason I am saying this is that stop_cpu is called in a number of > > places beyond halt_this_cpu and as far as I can tell any of them could > > trigger the panic. (I admit they are unlikely places but still.) > > This is one of the example where the CPU will not be stopped for a short > period of time. We should deal with them differently (i.e. migrating the > trusted OS or else) so we give all the chance for the CPU to be fully powered. > > IMHO, this is a different issue and hence why I didn't ask Dmitry to solve it. I see your point now. I was seeing the two things as one. I think it is true that the WFI loop is likely to work. Also it is true that from a power perspective it makes no different on power down or reboot. From that point of view this patch is OK. But even on shut-down/reboot, why not do that as a fallback in case CPU_OFF didn't work? It is going to work most of the times anyway, why change the default for the few cases that doesn't work? Given that this patch would work, I don't want to insist on this and let you decide. But even if we don't want to remove the panic as part of this patch, I think we should remove the panic in a separate patch anyway, at least until someone investigates and thinks of a strategy how to migrate the TrustedOS as you suggested. > > Also the PSCI spec explicitely mention CPU_OFF as a way to place CPUs in > > a "known state" and doesn't offer any other examples. So although what > > you wrote in the commit message is correct, using CPU_OFF seems to be > > the less error prone way (in the sense of triggering firmware errors) to > > place CPUs in a known state. > > The section you are referring to is starting with "One way". This imply there > are others methods. > > FWIW, the spin loop above seems to be how Linux is dealing with the > shutdown/reboot. > > > > > So I would simply remove the panic from call_psci_cpu_off, so that we > > try CPU_OFF first, and if it doesn't work, we use the WFI loop as > > backup. Also we get to fix all the other callers of stop_cpu this way. > This reads strange. In the previous paragraph you suggested the CPU off is a > less error prone way to place CPUs in a known state. But here, you are > softening the stance and suggesting to fallback to the WFI loop. > > So to me this indicates that WFI loop is fine. Otherwise, wouldn't the user > risk to see firmware errors (which BTW, I don't understand what sort of > firmware errors you are worried)? If yes, why would it be acceptable? > > For instance, Dmitry situation, the CPU0 would always WFI loop...

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.