
Re: [PATCH v3 0/5] Implement CPU hotplug on Arm



Hi,

On 20/10/2025 19:00, Mykola Kvach wrote:
Thank you for your answers.

On Mon, Oct 20, 2025 at 5:15 PM Mykyta Poturai <Mykyta_Poturai@xxxxxxxx> wrote:

On 15.10.25 20:30, Mykola Kvach wrote:
Hi Mykyta,

Thanks for the series.

It seems there might be issues here -- please take a look and let me
know if my concerns are valid:

1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
configuration may be lost.

OP-TEE and FF-A are marked as unsupported.

Understood, thanks. Would it be worth documenting this?

This must be documented. OP-TEE, FF-A and ITS will eventually be supported, so we need to document the gap.

I think it would also be worth having a Kconfig option, alongside the documentation, indicating whether CPU hotplug (and soon suspend/resume) can be used. That way CPU hotplug would fail gracefully rather than putting the system in an undefined state.
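
Roughly something like this (a sketch only; the CONFIG_CPU_HOTPLUG symbol and the entry point name are made up for illustration):

    /* Sketch: guard the CPU-offlining entry point so hotplug fails
     * cleanly when the feature is not enabled. Both the Kconfig
     * symbol and the function name below are assumptions. */
    long arch_cpu_down(unsigned int cpu)
    {
        if ( !IS_ENABLED(CONFIG_CPU_HOTPLUG) )
            return -EOPNOTSUPP; /* graceful failure, no undefined state */

        /* ... existing offlining path ... */
        return 0;
    }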


2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
table has been allocated by bring-up time. See
gicv3_lpi_allocate_pendtable() and its call chain.
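
To illustrate the dependency (a sketch only; every name except gicv3_lpi_allocate_pendtable() is made up):

    /* Sketch: a CPU coming back online needs its LPI pending table
     * in place before the redistributor is (re)programmed. The
     * helpers below are hypothetical; the real logic sits around
     * gicv3_lpi_allocate_pendtable() and its callers. */
    static int lpi_cpu_up_prepare(unsigned int cpu)
    {
        if ( !lpi_pendtable_present(cpu) )   /* hypothetical check */
            return lpi_pendtable_alloc(cpu); /* hypothetical alloc */

        return 0;
    }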

ITS is marked as unsupported. I have a plan to deal with this, but it is
out of scope of this series.

Thanks for the clarification. Should we document this somewhere?


3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
its affinity should be moved to an online CPU before completing the
offlining.

All guest-tied IRQ migration is handled by the scheduler. Regarding the
IRQs used by Xen, I didn't find any with affinity to a CPU other than
CPU 0, which cannot be offlined. I think it is theoretically possible
for them to have a different affinity, but it seems unlikely,
considering that the x86 hotplug code also doesn't seem to do any Xen
IRQ migration AFAIU.

What about arm_smmu_init_domain_context() and its related call chains?
As far as I can see, some of these paths are reached from XEN_DOMCTL_*
hypercalls, and my understanding is that those can be issued on any CPU.

You are right. The SMMU can be configured from any pCPU. When request_irq() is called, it will route the IRQ to the current pCPU.

Those IRQs are not guest interrupts, so from my understanding, they would not be migrated.

Should we add a
check that no enabled (e)SPIs owned by Xen are pinned to the offlining
CPU?

This would be good. But I also think we should aim to migrate those interrupts.
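
Something along these lines, perhaps (a completely untested sketch; the affinity field, the locking details and iterating all of nr_irqs are from memory and may not match the tree):

    /* Sketch: before a CPU finishes going offline, re-target any
     * Xen-owned IRQ still pinned to it onto an online CPU. Using the
     * handler's set_affinity hook directly is an assumption. */
    static void migrate_xen_irqs_away(unsigned int dying_cpu)
    {
        unsigned int irq;

        for ( irq = 0; irq < nr_irqs; irq++ )
        {
            struct irq_desc *desc = irq_to_desc(irq);
            unsigned long flags;

            if ( !desc || !desc->action )
                continue;

            spin_lock_irqsave(&desc->lock, flags);
            if ( cpumask_test_cpu(dying_cpu, desc->affinity) )
                desc->handler->set_affinity(
                    desc, cpumask_of(cpumask_first(&cpu_online_map)));
            spin_unlock_irqrestore(&desc->lock, flags);
        }
    }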


4. Race between the new hypercalls and disable/enable_nonboot_cpus():
disable_nonboot_cpus() is called, enable_nonboot_cpus() reads
frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
cpu_up() then fails with "already online", but the CPU_RESUME_FAILED
path may still run for an already-online CPU, risking a use-after-free
of per-CPU state (e.g. via free_percpu_area()) and other issues
related to the CPU_RESUME_FAILED notification.
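
If that window is reachable, a defensive fix could look like this (a sketch against the failure mode above, not against the actual code; the notifier call is a placeholder):

    /* Sketch: in enable_nonboot_cpus(), skip CPUs that were onlined
     * behind our back, so the CPU_RESUME_FAILED path (and
     * free_percpu_area() behind it) never runs for an online CPU. */
    for_each_cpu ( cpu, &frozen_cpus )
    {
        if ( cpu_online(cpu) )
            continue; /* onlined via hypercall: nothing to resume */

        if ( (error = cpu_up(cpu)) != 0 )
            /* only now is it safe to signal a resume failure */
            cpu_resume_failed_notify(cpu); /* placeholder */
    }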


There don't seem to be any calls to disable/enable_nonboot_cpus() on
Arm.

Not yet. There is a patch series that will use these functions as part of suspend/resume. In fact, this series is a prerequisite for the suspend/resume series.

If we take x86 as an example, then they are called with all domains
already paused, and I don't see how paused domains can issue hypercalls.

The Arm version will also freeze all the domains before calling disable_nonboot_cpus(), so there should be no race on Arm either.
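
For context, the intended ordering is roughly (a sketch of the sequence, not code from the series; freeze_domains()/thaw_domains() are the x86 names and may differ on Arm):

    /* Sketch: domains are paused before frozen_cpus is populated, so
     * no vCPU can issue the hotplug hypercall in between. */
    freeze_domains();                 /* pause all domains           */
    if ( (rc = disable_nonboot_cpus()) == 0 )
        rc = machine_suspend();       /* placeholder platform hook   */
    enable_nonboot_cpus();            /* bring frozen CPUs back up   */
    thaw_domains();                   /* unpause; hypercalls resume  */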

Cheers,

--
Julien Grall




 

