[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/7] xen/arm/psci: Implement CPU_OFF PSCI call (physical interface)



Hi Julien,

On Mon, Apr 16, 2018 at 4:26 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi Mirela,
>
>
> On 16/04/18 11:02, Mirela Simonovic wrote:
>>
>> On Thu, Apr 12, 2018 at 3:31 PM, Julien Grall <julien.grall@xxxxxxx>
>> wrote:
>>>
>>>
>>>
>>> On 12/04/18 12:33, Mirela Simonovic wrote:
>>>>
>>>>
>>>> On Wed, Apr 11, 2018 at 4:46 PM, Julien Grall <julien.grall@xxxxxxx>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 11/04/18 14:19, Mirela Simonovic wrote:
>>>>>
>>>>>>         local_irq_disable();
>>>>>>         cpu_is_dead = true;
>>>>>>         /* Make sure the write happens before we sleep forever */
>>>>>>         dsb(sy);
>>>>>>         isb();
>>>>>> +    /* PSCI cpu off call will return only in case of an error */
>>>>>> +    errno = call_psci_cpu_off();
>>>>>> +    printk(XENLOG_DEBUG "PSCI cpu off call failed for CPU#%d
>>>>>> err=%d\n",
>>>>>> +           get_processor_id(), errno);
>>>>>> +    isb();
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> What are you trying to achieve with the isb() here?
>>>>>
>>>>
>>>> I use to have a problem that the wfi below gets executed before the
>>>> call_psci_cpu_off(). Adding isb() fixed the issue. However, I tried
>>>> now to reproduce the problem and it doesn't show up. I still believe
>>>> isb() should be here, please let me know if you disagree (I obviously
>>>> can't prove the claim now).
>>>
>>>
>>>
>>> The problem you describe can't be possible with the code you have because
>>> call_psci_cpu_off() is issuing a SMC. SMC will lead to change exception
>>> level and therefore have a context-synchronization barrier.
>>>
>>> This is obviously based on the assumption you don't have an errata on
>>> your
>>> CPU exposing the behavior you describe. For that you would need to check
>>> errata notice for your CPU and/or try to reproduce.
>>>
>>> However, what you would need is a dsb(sy); isb(); to drain the write
>>> buffer
>>> if you print a message.
>>>
>>> Furthermore, now on platform without CPU off support (e.g non-PSCI
>>> platform
>>> and PSCI 0.1) you will log an error message that may worry people. In
>>> reality, PSCI cpu_off will unlikely fail, so you probably want to add a
>>> panic in call_psci_cpu_off instead.
>>>
>>
>> Even if PSCI cpu_off call fails, what is unlikely to happen, the
>> system is still functional.
>
>
> I disagree here, if you are unable to turn off a CPU via PSCI then something
> is definitely wrong. This means that CPU will forever spin in Xen code with
> no way to exit. This could bring interesting issue with anything potentially
> modifying Xen code (i.e livepatching).
>
> IHMO, the forever sleep in stop_cpu() is just a temporary solution to cater
> shutdown of the platform. The state of secondary CPU does not much matter at
> that time. In case of suspend/resume you want really want to be able to turn
> off those CPUs correctly otherwise they are not going to come up again.
>

If we follow that logic the CPU will not be able to exit WFI state
either. So we should raise panic in that case as well and in cases
where the system is suspending + the CPU is stopped + cpu off doesn't
work so the CPU cannot be enabled again.
However, raising panic makes no sense for shutdown scenario.
How about we do it something like this:

void stop_cpu(void)
{
    ...
    if ( system_state == SYS_STATE_suspend )
        call_psci_cpu_off();

    while ( 1 )
        wfi();
}

void call_psci_cpu_off(void)
{
    int errno;

    /* If successfull the cpu_off call doesn't return */
    errno = call_smc(PSCI_0_2_FN32_CPU_OFF, 0, 0, 0);
    if ( errno )
        panic("PSCI cpu off failed for CPU%d err=%d\n", get_processor_id(),
                errno);
}

call_psci_cpu_off should not check for PSCI version because we need to
panic regardless.

Thanks,
Mirela


>> Enabling that pCPU later will fail, but
>> Xen can handle this error and continue running properly on the boot
>> pCPU (I've tested this in 2 pCPUs config).
>
>
> I don't consider that as xen running properly. You lost a pCPU so your
> workload is completely different. Imagine you are using the NULL scheduler
> (e.g only one vCPU is pinned to a specific pCPU), what are you going to do
> with the vCPU?
>
>> Therefore, I believe panic may not be necessary in this case. I
>> suggest that we dump the error message and continue to run. Please let
>> me know if you disagree.
>
>
> This is a bad idea, a failure should at least be logged to show something
> gone wrong.
>
> PSCI CPU off will, as you said, unlikely failed. Looking at the spec, the
> only possible reason is your are trying to turn off a CPU where Trusted OS
> is resident. This means something far more wrong is happening in Xen code
> and I don't think it would be safe to continue to run.
>
> Hence why I suggested a BUG_ON/panic because this is something that is not
> meant to happen.
>
> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.