Re: [Xen-devel] [PATCH v3] xen/arm: fix rank/vgic lock inversion bug
On Fri, 27 Jan 2017, Julien Grall wrote:
> (CC Artem as ARM coverity admin)
>
> Hi Stefano,
>
> On 03/01/17 23:29, Stefano Stabellini wrote:
> > Always set the new physical irq affinity at the beginning of
> > vgic_migrate_irq, in all cases.
> >
> > No need to set physical irq affinity in gic_update_one_lr anymore,
> > solving the lock inversion problem.
> >
> > After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
> > possible to receive a physical interrupt on pcpu 1, which Xen is
> > supposed to inject into vcpu 1, before the LR on pcpu 0 has been
> > cleared. In this case the irq is still marked as
> > GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
> > vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
> > simultaneously, drop the interrupt.
>
> I am not sure I understand how this is related to the fix of the rank/vgic
> lock inversion bug. Am I right to think that this bug was already present?
Yes, it is related: without this patch this situation cannot happen,
because irq_set_affinity is called from gic_update_one_lr, i.e. the
physical affinity only changes after the old LR has been cleared.
> > Coverity-ID: 1381855
> > Coverity-ID: 1381853
>
> I am a bit confused... somehow those numbers disappeared from the main Coverity
> page, which means Coverity thinks they have been fixed. However, this patch is
> not yet upstreamed. Artem, are you testing Xen upstream?
>
> >
> > Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > ---
> > xen/arch/arm/gic.c | 6 +-----
> > xen/arch/arm/vgic.c | 19 +++++++++++--------
> > 2 files changed, 12 insertions(+), 13 deletions(-)
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index a5348f2..767fc9e 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -504,11 +504,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> > gic_raise_guest_irq(v, irq, p->priority);
> > else {
> > list_del_init(&p->inflight);
> > - if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > - {
> > - struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
> > - irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> > - }
> > + clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
> > }
> > }
> > }
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 364d5f0..11ffb9b 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -264,20 +264,17 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
> > if ( p->desc == NULL )
> > return;
> >
> > + irq_set_affinity(p->desc, cpumask_of(new->processor));
> > +
> > /* migration already in progress, no need to do anything */
> > if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > return;
> > + if ( list_empty(&p->inflight) )
> > + return;
> >
> > perfc_incr(vgic_irq_migrates);
> >
> > spin_lock_irqsave(&old->arch.vgic.lock, flags);
> > -
> > - if ( list_empty(&p->inflight) )
> > - {
> > - irq_set_affinity(p->desc, cpumask_of(new->processor));
> > - spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> > - return;
> > - }
> > /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
> > if ( !list_empty(&p->lr_queue) )
> > {
> > @@ -286,7 +283,6 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
> > list_del_init(&p->inflight);
> > irq_set_affinity(p->desc, cpumask_of(new->processor));
>
> This call is not necessary anymore.
>
> > spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>
> Now all the paths finish with spin_unlock; return. Can you send a patch to do
> the cleanup?
Sure
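For what it's worth, the cleanup could end up looking roughly like the
following. This is only a sketch of the shape, with the context lines
approximated from the hunk above rather than taken from a tested patch:

    /* Sketch only: every branch now ends in unlock + return, so the tail
     * of vgic_migrate_irq can fall through to a single unlock. */
    spin_lock_irqsave(&old->arch.vgic.lock, flags);

    /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
    if ( !list_empty(&p->lr_queue) )
    {
        /* removal from lr_queue/inflight as in the existing code ... */
        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
        vgic_vcpu_inject_irq(new, irq);
        return;
    }

    /* Otherwise the IRQ is in a GICH_LR register: mark it MIGRATING and
     * let gic_update_one_lr finish once the guest EOIs it. */
    set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
    spin_unlock_irqrestore(&old->arch.vgic.lock, flags);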
> > - vgic_vcpu_inject_irq(new, irq);
>
> The comment on top of the if says: "If the IRQ is still lr_pending, re-inject
> it to the new vCPU". However, you remove the interrupt from the list but never
> re-inject it.
>
> Looking at the context, I think we should keep the call to
> vgic_vcpu_inject_irq, otherwise we lose the interrupt forever.
It looks like I made a mistake while generating the patch: I meant to
remove the redundant irq_set_affinity call, not the necessary
vgic_vcpu_inject_irq call. I don't know how it happened, sorry. Thanks
for catching the error.
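So, reconstructed from the discussion (to be confirmed in the resend),
the hunk was presumably meant to be:

         list_del_init(&p->inflight);
    -    irq_set_affinity(p->desc, cpumask_of(new->processor));
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         vgic_vcpu_inject_irq(new, irq);
         return;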
> > return;
> > }
> > /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
> > @@ -495,6 +491,13 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
> > return;
> > }
> >
> > + if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
> > + {
> > + /* Drop the interrupt, because it is still inflight on another vcpu */
> > + spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
>
> If I understand correctly, the problem arises because LRs are cleared lazily
> and the other vCPU is still running. It is a short window and I don't think it
> should happen often.
>
> I would suggest kicking the other vCPU with an SGI to get the LRs cleared.
> Any opinions?
I am fine with sending an SGI, but we still need
http://marc.info/?l=xen-devel&m=148237295020488 to know the destination
of the SGI.
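To make the idea concrete, the kick could look something like this in
vgic_vcpu_inject_irq. This is only a sketch, and "old_cpu" is a
placeholder for the pcpu the interrupt is still inflight on, i.e. exactly
the information the patch above is meant to provide:

    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
    {
        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
        /* Poke the pcpu still holding the stale LR so it exits the guest
         * and clears it in gic_update_one_lr.  old_cpu is hypothetical:
         * the pcpu the interrupt is still inflight on. */
        smp_send_event_check_cpu(old_cpu);
        return;
    }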
The problem is what happens next: is it safe to busy-wait until the LR
is cleared? It is safe only if we cannot receive a second interrupt
notification while the first one has not been deactivated yet by the
guest:
- guest vcpu0 reads GICC_IAR for virq0
- guest vcpu1 changes the target for virq0 to vcpu1
- Xen traps the write and changes the physical affinity of virq0 (pirq0)
to vcpu1 (pcpu1)
- Xen receives a second virq0 (pirq0) interrupt notification in pcpu1,
but virq0 is still MIGRATING and guest vcpu0 has not EOIed it yet
- Xen sends an SGI to vcpu0 (pcpu0), but the interrupt has not been
EOIed yet, so the LR doesn't get cleared
Can this happen? I have read sections 3.2.3 and 3.2.4 a few times but I am
still unsure; it looks like it can. If it can happen, we cannot busy-wait.
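For reference, the busy-wait under discussion would be something along
these lines (illustrative only):

    /* Spin until the old pcpu clears the stale LR and drops
     * GIC_IRQ_GUEST_MIGRATING in gic_update_one_lr.  If the guest never
     * EOIs the first interrupt, the LR is never cleared and this loop
     * never terminates, which is why it is only safe if a second
     * notification cannot arrive before the first EOI. */
    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
        cpu_relax();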
> > + return;
> > + }
> > +
> > set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
> >
> > if ( !list_empty(&n->inflight) )
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel