[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 2/2] xen: move per-cpu area management into common code



Hi Julien,

On Sun, 2024-09-22 at 10:23 +0200, Julien Grall wrote:
> > +}
>  > +#endif> +
> > +#ifndef ARCH_CPU_PERCPU_CALLBACK
> > +inline int arch_cpu_percpu_callback(struct notifier_block *nfb,
> > +                                    unsigned long action, void
> > *hcpu)
> 
> I am not entirely sure we should introduce arch_cpu_percpu_callback.
> It 
> seems there are some redundant work. Looking at the x86
> implementation 
> the differences are:
>    * The additional checks
>    * Extra actions (e.g CPU_RESUME_FAILED/CPU_REMOVE).
> 
> I think the extra actions also make sense for other architectures.
> For 
> the additional checks, the parking feature is implemented in
> common/*.
> 
> So is there any way we could avoid introduce
> arch_cpu_percpu_callback()?

Initially, I did not want to introduce arch_cpu_percpu_callback(), and
if it were only the park_offline_cpus check, it would be safe enough
since other architectures would continue to function as before ( and
CPU parking is a common feature so it is to be expected to work on
different architectures ), with park_offline_cpus = false for them.

I then decided to investigate why the check system_state !=
SYS_STATE_suspend is necessary, and according to the commit message:
```
    xen: don't free percpu areas during suspend
    
    Instead of freeing percpu areas during suspend and allocating them
    again when resuming keep them. Only free an area in case a cpu
didn't
    come up again when resuming.
    
    It should be noted that there is a potential change in behaviour as
    the percpu areas are no longer zeroed out during suspend/resume.
While
    I have checked the called cpu notifier hooks to cope with that
there
    might be some well hidden dependency on the previous behaviour.
OTOH
    a component not registering itself for cpu down/up and expecting to
    see a zeroed percpu variable after suspend/resume is kind of broken
    already. And the opposite case, where a component is not registered
    to be called for cpu down/up and is not expecting a percpu variable
    suddenly to be zero due to suspend/resume is much more probable,
    especially as the suspend/resume functionality seems not to be
tested
    that often.
```
It is mentioned that "there is a potential change in behavior as the
percpu areas are no longer zeroed out during suspend/resume." I was
unsure if this change might break something for Arm.

I can attempt to move everything to the common percpu.c and at least
verify that there are no issues in CI. If that suffices to confirm that
everything is okay, I can create a new patch series version.

Does this make sense?

~ Oleksii



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.