[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 04/12] xen/arm: Prevent crash during disable_nonboot_cpus on suspend


  • To: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Date: Sat, 23 Aug 2025 00:36:52 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=AzOYsrfC8+EllYNwzzxcH1YW8G0O1MA7x46OJNns/no=; b=gm+9BVnHgNd32rjVSzuy3mh/KnAJBrME2ZYwzsQ/OTsRTcTSBXqFsmdMfLo5y7ai7dW2PpCEOw3Ns870TakU1Ufo9k4mGAb37fSydwYDGKpRvu1mYtnIisi+K0dsewsqXKfcr89tG2tfVOsC627GqG3fKyYU5/ZaHoTi2psWQw9T0KUO6q/riw3VstVAQizn77EPvKoen/UYc2qIo3RQblh6frbCpUzjlBJp5Ql04NmFJb8fR7R3VyKLrSetkizfSlKA19HE73CaopxX7Mee0WTy1UTeMDOEWzDgaX1F9MFhSxou2d+xurLvh/KVPfnfqwPwrn+HvYGqUW+kJnzSeA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=F3vLtHpM3Flzry/4+33du2B2EzxDC4fyu/P/uKcjabrXKETVW9pdXwNjmB4W/d8m47vTHXxElUzGkx0Aqtw8afxfZKc/d/mtjxIdsRdYqWWlR1GbdcdBC+Zkgo09RVAS0GPc71nE1FDt9hBeaYyuW9HGP+NaJr8eTk6kwJmJfBY/gM1+A3zI3q4Oar/nmBut0LGNVOrbd7H8SFJquDRHIPELr0xFIRoIZz9vijaXw+p2PpHGbXBWaoKWKX5BAQjYFXsNyOplcg3W6mco9iWKgvayddRqhmIiut5a1SdU40UbtojX7dg3N0O8pcn6NYHDjd5ctUi1+UhRpY8R6ICFyg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=epam.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Mykola Kvach <Mykola_Kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>
  • Delivery-date: Sat, 23 Aug 2025 00:36:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHcCwFbWICkeXmBzU6Bv3/6IpnNyw==
  • Thread-topic: [PATCH v5 04/12] xen/arm: Prevent crash during disable_nonboot_cpus on suspend

Hi Mykola,

Mykola Kvach <xakep.amatop@xxxxxxxxx> writes:

While I approve the change, the commit message is somewhat
unclear. Maybe "Don't release IRQs on suspend" will be better?

> From: Mykola Kvach <mykola_kvach@xxxxxxxx>
>
> If we call disable_nonboot_cpus on ARM64 with system_state set
> to SYS_STATE_suspend, the following assertion will be triggered:
>
> ```
> (XEN) [   25.582712] Disabling non-boot CPUs ...
> (XEN) [   25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || 
> num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
> [...]
> (XEN) [   25.975069] Xen call trace:
> (XEN) [   25.978353]    [<00000a000022e098>] xfree+0x130/0x1a4 (PC)
> (XEN) [   25.984314]    [<00000a000022e08c>] xfree+0x124/0x1a4 (LR)
> (XEN) [   25.990276]    [<00000a00002747d4>] release_irq+0xe4/0xe8
> (XEN) [   25.996152]    [<00000a0000278588>] 
> time.c#cpu_time_callback+0x44/0x60
> (XEN) [   26.003150]    [<00000a000021d678>] notifier_call_chain+0x7c/0xa0
> (XEN) [   26.009717]    [<00000a00002018e0>] 
> cpu.c#cpu_notifier_call_chain+0x24/0x48
> (XEN) [   26.017148]    [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34
> (XEN) [   26.023801]    [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18
> (XEN) [   26.030281]    [<00000a0000225c5c>] 
> stop_machine.c#stopmachine_action+0xbc/0xe4
> (XEN) [   26.038057]    [<00000a00002264bc>] 
> tasklet.c#do_tasklet_work+0xb8/0x100
> (XEN) [   26.045229]    [<00000a00002268a4>] do_tasklet+0x68/0xb0
> (XEN) [   26.051018]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> (XEN) [   26.057585]    [<00000a0000277e30>] start_secondary+0x21c/0x220
> (XEN) [   26.063978]    [<00000a0000361258>] 00000a0000361258
> ```
>
> This happens because before invoking take_cpu_down via the stop_machine_run
> function on the target CPU, stop_machine_run requests
> the STOPMACHINE_DISABLE_IRQ state on that CPU. Releasing memory in
> the release_irq function then triggers the assertion:
>
> /*
>  * Heap allocations may need TLB flushes which may require IRQs to be
>  * enabled (except when only 1 PCPU is online).
>  */
>
> This patch adds system state checks to guard calls to request_irq
> and release_irq. These calls are now skipped when system_state is
> SYS_STATE_{resume,suspend}, preventing unsafe operations during
> suspend/resume handling.

If any call to release_irq() during suspend will trigger ASSERT, and it
is fine to leave IRQs as is during suspend, maybe it will be easier to
put

+        if ( system_state == SYS_STATE_suspend )
+            return;

straight into release_irq() code? This will be easier than playing
whack-a-mole when some other patch will add another release_irq() call
somewhere.


>
> Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> ---
> Changes in V4:
>   - removed the prior tasklet-based workaround in favor of a more
>     straightforward and safer solution
>   - reworked the approach by adding explicit system state checks around
>     request_irq and release_irq calls, skips these calls during suspend
>     and resume states to avoid unsafe memory operations when IRQs are
>     disabled
> ---
>  xen/arch/arm/gic.c           |  6 ++++++
>  xen/arch/arm/tee/ffa_notif.c |  2 +-
>  xen/arch/arm/time.c          | 18 ++++++++++++------
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index a018bd7715..9856cb1592 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v)
>  
>  void init_maintenance_interrupt(void)
>  {
> +    if ( system_state == SYS_STATE_resume )
> +        return;
> +
>      request_irq(gic_hw_ops->info->maintenance_irq, 0, maintenance_interrupt,
>                  "irq-maintenance", NULL);
>  }
> @@ -461,6 +464,9 @@ static int cpu_gic_callback(struct notifier_block *nfb,
>      switch ( action )
>      {
>      case CPU_DYING:
> +        if ( system_state == SYS_STATE_suspend )
> +            break;
> +
>          /* This is reverting the work done in init_maintenance_interrupt */
>          release_irq(gic_hw_ops->info->maintenance_irq, NULL);
>          break;
> diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> index 00efaf8f73..06f715a82b 100644
> --- a/xen/arch/arm/tee/ffa_notif.c
> +++ b/xen/arch/arm/tee/ffa_notif.c
> @@ -347,7 +347,7 @@ void ffa_notif_init_interrupt(void)
>  {
>      int ret;
>  
> -    if ( notif_enabled && notif_sri_irq < NR_GIC_SGI )
> +    if ( notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != 
> SYS_STATE_resume )
>      {
>          /*
>           * An error here is unlikely since the primary CPU has already
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index ad984fdfdd..b2e07ade43 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -320,10 +320,13 @@ void init_timer_interrupt(void)
>      WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
>      disable_physical_timers();
>  
> -    request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> -                "hyptimer", NULL);
> -    request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> -                   "virtimer", NULL);
> +    if ( system_state != SYS_STATE_resume )
> +    {
> +        request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt,
> +                    "hyptimer", NULL);
> +        request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> +                    "virtimer", NULL);
> +    }
>  
>      check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor");
>      check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual");
> @@ -338,8 +341,11 @@ static void deinit_timer_interrupt(void)
>  {
>      disable_physical_timers();
>  
> -    release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> -    release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> +    if ( system_state != SYS_STATE_suspend )
> +    {
> +        release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> +        release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> +    }
>  }
>  
>  /* Wait a set number of microseconds */

-- 
WBR, Volodymyr


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.