|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 1/2] xen/arm: propagate vGIC vCPU init failures
Hi Michal,
Thank you for the feedback.
On Mon, May 25, 2026 at 12:46 PM Orzel, Michal <michal.orzel@xxxxxxx> wrote:
>
>
>
> On 22-May-26 08:18, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@xxxxxxxx>
> >
> > The vGIC per-vCPU init callback can fail. In particular, GICv3 rejects
> > a vCPU when the domain's redistributor layout has no MMIO slot covering
> > that vCPU. However, the generic vGIC init path ignored the callback
> > return value, so vcpu_create() could succeed with an invalid vGICv3
> > per-vCPU state.
> >
> > This can be reproduced on FVP Base RevC by exposing a 2 MiB GICv3
> > redistributor region to Dom0 and booting Xen with:
> >
> > maxcpus=1 dom0_max_vcpus=64
> >
> > The host GICv3 redistributor range is:
> >
> > region 0: 0x0000002f100000 - 0x0000002f300000
> >
> > With Xen's guest redistributor frame size of 128 KiB, that range covers
> > 16 guest redistributors. vCPU16 and above have no corresponding
> > redistributor slot.
> >
> > Before this fix, Xen detected the missing redistributors:
> >
> > d0: Unable to find a re-distributor for VCPU 16
> > ...
> > d0: Unable to find a re-distributor for VCPU 63
> >
> > but ignored the error and continued as if the secondary vCPUs had been
> > created correctly. Dom0 then saw 64 possible CPUs and could hang during
> > secondary CPU bring-up:
> >
> > smp: Bringing up secondary CPUs ...
> > d0v15: vGICR: SGI: unhandled word write ... to ICACTIVER0
> >
> > Propagate the vGIC vcpu_init() error so the caller can stop creating
> > secondary vCPUs. With this fix, Dom0 construction reports:
> >
> > d0: Unable to find a re-distributor for VCPU 16
> > Failed to allocate d0v16
> >
> > and the guest continues booting with the vCPUs created before the
> > failure:
> >
> > smp: Brought up 1 node, 16 CPUs
> >
> > Free the private IRQ rank allocated by vcpu_vgic_init() on this error
> > path. The caller will still run the generic vCPU creation cleanup, but
> > XFREE() clears the pointer so that cleanup remains idempotent.
> >
> > Also fix the host-layout redistributor region count for the case where
> > the requested vCPU count is larger than the capacity of all host
> > redistributor regions. The old code always stored i + 1 after the loop.
> > That is correct when the loop stops inside a valid region because the
> > requested vCPU count is covered. If the loop exits after consuming all
> > hardware regions, i is already equal to the number of allocated regions,
> > so i + 1 records one region too many.
> >
> > In the same FVP setup, that off-by-one made Xen describe host-layout
> > GICR state beyond the populated redistributor region list. Dom0 then
> > accessed the GICR MMIO window described in its device tree, but Xen could
> > not match the access to a valid emulated redistributor frame. During
> > debugging this was seen as an unexpected vGICR access followed by a
> > guest panic:
> >
> > d0v0: vGICR: unknown gpa read address 000000002f10ffe8
> > pc : gic_iterate_rdists+0x4c/0x104
> > Kernel panic - not syncing: Attempted to kill the idle task!
> >
> > Keep the existing best-effort Dom0 policy: a failure to create a
> > secondary vCPU stops the secondary vCPU creation loop, but does not fail
> > the whole Dom0 boot.
> This is the construct_domain() policy that is used for both dom0 and dom0less
> domUs. I must say I don't really like this policy and it's against our
> *generic*
> Arm policy to fail the domain creation on any error especially if this belongs
> to unsatisfied user requests. This is not really related to your series, but I
> would like to ask other Arm maintainers about their opinion.
Yes, I agree this is a broader construct_domain() policy question. I kept
the existing behaviour in this patch, but I am fine with adjusting it if
the maintainers decide that domain creation should fail in this case.
In any case, I will reword the commit message so it does not describe
this as a Dom0-only policy, since the same construct_domain() path is
also used for dom0less domains.
>
> >
> > Fixes: ea37fd21110b ("xen/arm: split vgic driver into generic and vgic-v2
> > driver")
> > Fixes: 54ec59f6b0b3 ("xen/arm: vgic-v3: Don't create empty re-distributor
> > regions")
> > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>
> > ---
> > xen/arch/arm/vgic-v3.c | 3 ++-
> > xen/arch/arm/vgic.c | 10 +++++++---
> > 2 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > index 77517c3030..360778eb32 100644
> > --- a/xen/arch/arm/vgic-v3.c
> > +++ b/xen/arch/arm/vgic-v3.c
> > @@ -1834,7 +1834,8 @@ static int vgic_v3_domain_init(struct domain *d)
> > * not match the number of pCPUs). Update the number of regions to
> > * avoid exposing unused region as they will not get emulated.
> > */
> > - d->arch.vgic.nr_regions = i + 1;
> > + d->arch.vgic.nr_regions = (i == vgic_v3_hw.nr_rdist_regions) ?
> > + i : i + 1;
> Wouldn't min(i+1,vgic_v3_hw.nr_rdist_regions) be cleaner?
Ack, I will change this in v2.
>
> >
> > d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
> > }
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 6647071ad4..e55e484493 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -368,7 +368,7 @@ void domain_vgic_free(struct domain *d)
> >
> > int vcpu_vgic_init(struct vcpu *v)
> > {
> > - int i;
> > + int i, ret;
> >
> > v->arch.vgic.private_irqs = xzalloc(struct vgic_irq_rank);
> > if ( v->arch.vgic.private_irqs == NULL )
> > @@ -377,7 +377,12 @@ int vcpu_vgic_init(struct vcpu *v)
> > /* SGIs/PPIs are always routed to this VCPU */
> > vgic_rank_init(v->arch.vgic.private_irqs, 0, v->vcpu_id);
> >
> > - v->domain->arch.vgic.handler->vcpu_init(v);
> > + ret = v->domain->arch.vgic.handler->vcpu_init(v);
> > + if ( ret )
> > + {
> > + XFREE(v->arch.vgic.private_irqs);
> > + return ret;
> > + }
> >
> > memset(&v->arch.vgic.pending_irqs, 0,
> > sizeof(v->arch.vgic.pending_irqs));
> > for (i = 0; i < 32; i++)
> > @@ -944,4 +949,3 @@ void vgic_check_inflight_irqs_pending(struct vcpu *v,
> > unsigned int rank, uint32_
> > * indent-tabs-mode: nil
> > * End:
> > */
> > -
> Stray change, please drop.
Ack.
Best regards,
Mykola
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |