Xen project Mailing List

Re: [PATCH] xen: Don't call panic if ARM TF cpu off returns DENIED

From: Dmytro Semenets <dmitry.semenets@xxxxxxxxx>

Date: Sat, 18 Jun 2022 20:43:47 +0300

Cc: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Dmytro Semenets <Dmytro_Semenets@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Sat, 18 Jun 2022 17:44:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

пт, 17 июн. 2022 г. в 14:12, Julien Grall <julien@xxxxxxx>: Hi Julien, > > Hi, > > On 17/06/2022 10:10, Volodymyr Babchuk wrote: > > Julien Grall <julien@xxxxxxx> writes: > > > >> On 16/06/2022 19:40, Volodymyr Babchuk wrote: > >>> Hi Julien, > >> > >> Hi Volodymyr, > >> > >>> Julien Grall <julien@xxxxxxx> writes: > >>> > >>>> Hi, > >>>> > >>>> On 16/06/2022 14:55, dmitry.semenets@xxxxxxxxx wrote: > >>>>> From: Dmytro Semenets <dmytro_semenets@xxxxxxxx> > >>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. > >>>> > >>>> I am confused. The spec is talking about Trusted OS and not > >>>> firmware. The docummentation is also not specific to ARM Trusted > >>>> Firmware. So did you mean "Trusted OS"? > >>> It should be "firmware", I believe. > >> > >> Hmmm... I couldn't find a reference in the spec suggesting that > >> CPU_OFF could return DENIED because of the firmware. Do you have a > >> pointer to the spec? > > > > Ah, looks like we are talking about different things. Indeed, CPU_OFF > > can return DENIED only because of Trusted OS. But entity that *returns* > > the error to a caller is a firmware. > > Right, the interesting part is *why* DENIED is returned not *who* > returns it. ARM TF returns DENIED *only* for the platform I have. We have a dissonance between spec and xen implementation because DENIED returned by ARM TF or Thrusted OS or whatever is not a reason for panic. And we have issues with this. If machine_restart() behaviour is more or less correct (sometimes reports about panic but restarts the machine) but machine_halt() doesn't work at all. Transit execution to CPU0 for my understanding is a workaround and this approach will fix machine_restart() function but will not fix machine_halt(). Approach you suggested (spinning all cpus) will work but will save less energy. > >>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" > >>>>> section 5.5.2 > >>>> > >>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when > >>>> the trusted OS can only run on one core. > >>>> > >>>> Some of the trusted OS are migratable. So I think we should first > >>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent > >>>> the CPU to go offline. > >>>> > >>>> That said, upstream doesn't support cpu offlining (I don't know for > >>>> your use case). In case of shutdown, it is not necessary to offline > >>>> the CPU, so we could avoid to call CPU_OFF on all CPUs but > >>>> one. Something like: > >>>> > >>> This is even better approach yes. But you mentioned CPU_OFF. Did you > >>> mean SYSTEM_RESET? > >> > >> By CPU_OFF I was referring to the fact that Xen will issue the call > >> all CPUs but one. The remaining CPU will issue the command to > >> reset/shutdown the system. > >> > > > > I just want to clarify: change that you suggested removes call to > > stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all. > > I was describing the existing behavior. > > > > > All CPUs except one will spin in > > > > while ( 1 ) > > wfi(); > > > > while last cpu will issue SYSTEM_OFF or SYSTEM_RESET. > > > > Is this correct? > > Yes. > > > > >>>> void machine_halt(void) > >>>> @@ -21,10 +23,6 @@ void machine_halt(void) > >>>> smp_call_function(halt_this_cpu, NULL, 0); > >>>> local_irq_disable(); > >>>> > >>>> - /* Wait at most another 10ms for all other CPUs to go offline. */ > >>>> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) > >>>> - mdelay(1); > >>>> - > >>>> /* This is mainly for PSCI-0.2, which does not return if success. > >>>> */ > >>>> call_psci_system_off(); > >>>> > >>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@xxxxxxxx> > >>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx> > >>>> > >>>> I don't recall to see patch on the ML recently for this. So is this an > >>>> internal review? > >>> Yeah, sorry about that. Dmytro is a new member in our team and he is > >>> not > >>> yet familiar with differences in internal reviews and reviews in ML. > >> > >> No worries. I usually classify internal review anything that was done > >> privately. This looks to be a public review, althought not on > >> xen-devel. > >> > >> I understand that not all some of the patches are still in PoC stage > >> and doing the review on your github is a good idea. But for those are > >> meant to be for upstream (e.g. bug fixes, small patches), I would > >> suggest to do the review on xen-devel directly. > > > > It not always clear if patch is eligible for upstream. At first we > > thought that problem is platform-specific and we weren't sure that we > > will find a proper upstreamable fix. > > You can guess but not be sure until you send it to upstrema :). In fact,... > > > Probably you saw that PR's name > > quite differs from final patch. This is because initial solution was > > completely different from final one. > > ... even before looking at your PR, this was the first solution I had in > mind. I am still pondering whether this could be the best approach > because I have the suspicion there might be some platform out relying on > receiving the shutdown request on CPU0. > > Anyway, this is so far just theorical, my proposal should solve your > problem. > > On a separate topic, the community is aiming to support a wide range of > platforms out-of-the-box. I think platform-specific patches are > acceptable so long they are self-contained (to some extend. I.e if you > ask to support Xen on RPI3 then I would still probably argue against :)) > or have a limited impact to the rest of the users (this is why we have > alternative in Xen). > > My point here is your initial solution may have been the preferred > approach for upstream. So if you involve the community early, you are > reducing the risk to have to backtrack and/or spend extra time in the > wrong directions. > > Cheers, > > -- > Julien Grall

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.