[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/7] xen/arm/psci: Implement CPU_OFF PSCI call (physical interface)



Hi Mirela,

On 16/04/18 11:02, Mirela Simonovic wrote:
On Thu, Apr 12, 2018 at 3:31 PM, Julien Grall <julien.grall@xxxxxxx> wrote:


On 12/04/18 12:33, Mirela Simonovic wrote:

On Wed, Apr 11, 2018 at 4:46 PM, Julien Grall <julien.grall@xxxxxxx>
wrote:

On 11/04/18 14:19, Mirela Simonovic wrote:

        local_irq_disable();
        cpu_is_dead = true;
        /* Make sure the write happens before we sleep forever */
        dsb(sy);
        isb();
+    /* PSCI cpu off call will return only in case of an error */
+    errno = call_psci_cpu_off();
+    printk(XENLOG_DEBUG "PSCI cpu off call failed for CPU#%d err=%d\n",
+           get_processor_id(), errno);
+    isb();



What are you trying to achieve with the isb() here?


I use to have a problem that the wfi below gets executed before the
call_psci_cpu_off(). Adding isb() fixed the issue. However, I tried
now to reproduce the problem and it doesn't show up. I still believe
isb() should be here, please let me know if you disagree (I obviously
can't prove the claim now).


The problem you describe can't be possible with the code you have because
call_psci_cpu_off() is issuing a SMC. SMC will lead to change exception
level and therefore have a context-synchronization barrier.

This is obviously based on the assumption you don't have an errata on your
CPU exposing the behavior you describe. For that you would need to check
errata notice for your CPU and/or try to reproduce.

However, what you would need is a dsb(sy); isb(); to drain the write buffer
if you print a message.

Furthermore, now on platform without CPU off support (e.g non-PSCI platform
and PSCI 0.1) you will log an error message that may worry people. In
reality, PSCI cpu_off will unlikely fail, so you probably want to add a
panic in call_psci_cpu_off instead.


Even if PSCI cpu_off call fails, what is unlikely to happen, the
system is still functional.

I disagree here, if you are unable to turn off a CPU via PSCI then something is definitely wrong. This means that CPU will forever spin in Xen code with no way to exit. This could bring interesting issue with anything potentially modifying Xen code (i.e livepatching).

IHMO, the forever sleep in stop_cpu() is just a temporary solution to cater shutdown of the platform. The state of secondary CPU does not much matter at that time. In case of suspend/resume you want really want to be able to turn off those CPUs correctly otherwise they are not going to come up again.

Enabling that pCPU later will fail, but
Xen can handle this error and continue running properly on the boot
pCPU (I've tested this in 2 pCPUs config).

I don't consider that as xen running properly. You lost a pCPU so your workload is completely different. Imagine you are using the NULL scheduler (e.g only one vCPU is pinned to a specific pCPU), what are you going to do with the vCPU?

Therefore, I believe panic may not be necessary in this case. I
suggest that we dump the error message and continue to run. Please let
me know if you disagree.

This is a bad idea, a failure should at least be logged to show something gone wrong.

PSCI CPU off will, as you said, unlikely failed. Looking at the spec, the only possible reason is your are trying to turn off a CPU where Trusted OS is resident. This means something far more wrong is happening in Xen code and I don't think it would be safe to continue to run.

Hence why I suggested a BUG_ON/panic because this is something that is not meant to happen.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.