[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] xen/arm: propagate vGIC vCPU init failures


  • To: Mykola Kvach <xakep.amatop@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Orzel, Michal" <michal.orzel@xxxxxxx>
  • Date: Mon, 25 May 2026 11:46:09 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=gmail.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ahRO5UdBbWodjZgE60oN0b0QE3VVyY2NqX5GA8TthSI=; b=WH5h+y2nfBuBPE9pMARw2u6OPO5Hc1EXE+rVe0iRnVHfN+O22I1oGY55tnCRGtuASPhNoxE4RE4DTdd/cv+/YehLQjU4yCumhD1irDJDrVC/RADvQNTjl+EO9dP4cDeckTMGYxC2CkOgCIhteLfE5pKuIzSeYKoRpaVRHvsPd5cDNcbLoFDEAwpTHL1XibeV4SKFFdBQIJd4riLh+KBZZz/jeMgjAjjqLQx3Z84ERzqFEV/+ESAzNitJylnGa/xjb/4QC7M0SpKujHxSj1ySZP/NM3iBhEZKuFS7iNPENyTfiVL/1C3/JG5u+cxlOOb56qAFLG6Y609KU3qg6mF2ZA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=LarfoQTigKAKL8w9WZmnV1KU/CADGZgq45JGiqsrBOStp/e+qu8qg/Cmv3YXfT6kh8IOp44oMWb2dSmoyT87lyVjYaUrbRvK7HAPlPNI2jnT2IBpZ6k1j3v2aZGdlZQJ1mBe6TuC6zsUiyBaVal3HdY+4R2CSYIDigLKg551rM+gtaSD2PzD7pqh5zAmQT3Wri81c0QBFE+Rq/xDL50GPJl6tJ1mPxEx8eQ4lF1qZXozLofFXWaF8rElAyokaXB+ZtS/MrFSjwG3gGc8UKaSUOHjYfdxHKIjZKM+n1Y1PIQ/eRk32JIlFlmddNWW2qwtZrIXRKgkh04E2Cf+RL2CNw==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=amd.com header.i="@amd.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck"
  • Cc: Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Delivery-date: Mon, 25 May 2026 09:46:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 22-May-26 08:18, Mykola Kvach wrote:
> From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> 
> The vGIC per-vCPU init callback can fail. In particular, GICv3 rejects
> a vCPU when the domain's redistributor layout has no MMIO slot covering
> that vCPU. However, the generic vGIC init path ignored the callback
> return value, so vcpu_create() could succeed with an invalid vGICv3
> per-vCPU state.
> 
> This can be reproduced on FVP Base RevC by exposing a 2 MiB GICv3
> redistributor region to Dom0 and booting Xen with:
> 
>     maxcpus=1 dom0_max_vcpus=64
> 
> The host GICv3 redistributor range is:
> 
>     region 0: 0x0000002f100000 - 0x0000002f300000
> 
> With Xen's guest redistributor frame size of 128 KiB, that range covers
> 16 guest redistributors. vCPU16 and above have no corresponding
> redistributor slot.
> 
> Before this fix, Xen detected the missing redistributors:
> 
>     d0: Unable to find a re-distributor for VCPU 16
>     ...
>     d0: Unable to find a re-distributor for VCPU 63
> 
> but ignored the error and continued as if the secondary vCPUs had been
> created correctly. Dom0 then saw 64 possible CPUs and could hang during
> secondary CPU bring-up:
> 
>     smp: Bringing up secondary CPUs ...
>     d0v15: vGICR: SGI: unhandled word write ... to ICACTIVER0
> 
> Propagate the vGIC vcpu_init() error so the caller can stop creating
> secondary vCPUs. With this fix, Dom0 construction reports:
> 
>     d0: Unable to find a re-distributor for VCPU 16
>     Failed to allocate d0v16
> 
> and the guest continues booting with the vCPUs created before the
> failure:
> 
>     smp: Brought up 1 node, 16 CPUs
> 
> Free the private IRQ rank allocated by vcpu_vgic_init() on this error
> path. The caller will still run the generic vCPU creation cleanup, but
> XFREE() clears the pointer so that cleanup remains idempotent.
> 
> Also fix the host-layout redistributor region count for the case where
> the requested vCPU count is larger than the capacity of all host
> redistributor regions. The old code always stored i + 1 after the loop.
> That is correct when the loop stops inside a valid region because the
> requested vCPU count is covered. If the loop exits after consuming all
> hardware regions, i is already equal to the number of allocated regions,
> so i + 1 records one region too many.
> 
> In the same FVP setup, that off-by-one made Xen describe host-layout
> GICR state beyond the populated redistributor region list. Dom0 then
> accessed the GICR MMIO window described in its device tree, but Xen could
> not match the access to a valid emulated redistributor frame. During
> debugging this was seen as an unexpected vGICR access followed by a
> guest panic:
> 
>     d0v0: vGICR: unknown gpa read address 000000002f10ffe8
>     pc : gic_iterate_rdists+0x4c/0x104
>     Kernel panic - not syncing: Attempted to kill the idle task!
> 
> Keep the existing best-effort Dom0 policy: a failure to create a
> secondary vCPU stops the secondary vCPU creation loop, but does not fail
> the whole Dom0 boot.
This is the construct_domain() policy that is used for both dom0 and dom0less
domUs. I must say I don't really like this policy and it's against our *generic*
Arm policy to fail the domain creation on any error especially if this belongs
to unsatisfied user requests. This is not really related to your series, but I
would like to ask other Arm maintainers about their opinion.

> 
> Fixes: ea37fd21110b ("xen/arm: split vgic driver into generic and vgic-v2 
> driver")
> Fixes: 54ec59f6b0b3 ("xen/arm: vgic-v3: Don't create empty re-distributor 
> regions")
> Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> ---
>  xen/arch/arm/vgic-v3.c |  3 ++-
>  xen/arch/arm/vgic.c    | 10 +++++++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index 77517c3030..360778eb32 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -1834,7 +1834,8 @@ static int vgic_v3_domain_init(struct domain *d)
>           * not match the number of pCPUs). Update the number of regions to
>           * avoid exposing unused region as they will not get emulated.
>           */
> -        d->arch.vgic.nr_regions = i + 1;
> +        d->arch.vgic.nr_regions = (i == vgic_v3_hw.nr_rdist_regions) ?
> +                                   i : i + 1;
Wouldn't min(i+1,vgic_v3_hw.nr_rdist_regions) be cleaner?

>  
>          d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
>      }
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 6647071ad4..e55e484493 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -368,7 +368,7 @@ void domain_vgic_free(struct domain *d)
>  
>  int vcpu_vgic_init(struct vcpu *v)
>  {
> -    int i;
> +    int i, ret;
>  
>      v->arch.vgic.private_irqs = xzalloc(struct vgic_irq_rank);
>      if ( v->arch.vgic.private_irqs == NULL )
> @@ -377,7 +377,12 @@ int vcpu_vgic_init(struct vcpu *v)
>      /* SGIs/PPIs are always routed to this VCPU */
>      vgic_rank_init(v->arch.vgic.private_irqs, 0, v->vcpu_id);
>  
> -    v->domain->arch.vgic.handler->vcpu_init(v);
> +    ret = v->domain->arch.vgic.handler->vcpu_init(v);
> +    if ( ret )
> +    {
> +        XFREE(v->arch.vgic.private_irqs);
> +        return ret;
> +    }
>  
>      memset(&v->arch.vgic.pending_irqs, 0, sizeof(v->arch.vgic.pending_irqs));
>      for (i = 0; i < 32; i++)
> @@ -944,4 +949,3 @@ void vgic_check_inflight_irqs_pending(struct vcpu *v, 
> unsigned int rank, uint32_
>   * indent-tabs-mode: nil
>   * End:
>   */
> -
Stray change, please drop.

Reviewed-by: Michal Orzel <michal.orzel@xxxxxxx>

~Michal





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.