
Re: [Xen-devel] [PATCH v3] xen/arm: fix rank/vgic lock inversion bug



(CC Artem as ARM coverity admin)

Hi Stefano,

On 03/01/17 23:29, Stefano Stabellini wrote:
Always set the new physical irq affinity at the beginning of
vgic_migrate_irq, in all cases.

No need to set physical irq affinity in gic_update_one_lr anymore,
solving the lock inversion problem.

After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
possible to receive a physical interrupt on pcpu 1, which Xen is
supposed to inject into vcpu 1, before the LR on pcpu 0 has been
cleared. In this case the irq is still marked as
GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
simultaneously, drop the interrupt.

I am not sure I understand how this is related to the fix for the rank/vgic lock inversion bug. Am I right to think that this bug was already present?


Coverity-ID: 1381855
Coverity-ID: 1381853

I am a bit confused... somehow those numbers have disappeared from the main Coverity page, which means Coverity thinks they have been fixed. However, this patch has not been upstreamed yet. Artem, are you testing upstream Xen?


Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
---
 xen/arch/arm/gic.c  |  6 +-----
 xen/arch/arm/vgic.c | 19 +++++++++++--------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a5348f2..767fc9e 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -504,11 +504,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
             gic_raise_guest_irq(v, irq, p->priority);
         else {
             list_del_init(&p->inflight);
-            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
-            {
-                struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
-                irq_set_affinity(p->desc, cpumask_of(v_target->processor));
-            }
+            clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
         }
     }
 }
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 364d5f0..11ffb9b 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -264,20 +264,17 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     if ( p->desc == NULL )
         return;

+    irq_set_affinity(p->desc, cpumask_of(new->processor));
+
     /* migration already in progress, no need to do anything */
     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
         return;
+    if ( list_empty(&p->inflight) )
+        return;

     perfc_incr(vgic_irq_migrates);

     spin_lock_irqsave(&old->arch.vgic.lock, flags);
-
-    if ( list_empty(&p->inflight) )
-    {
-        irq_set_affinity(p->desc, cpumask_of(new->processor));
-        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
-        return;
-    }
     /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
     if ( !list_empty(&p->lr_queue) )
     {
@@ -286,7 +283,6 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
         list_del_init(&p->inflight);
         irq_set_affinity(p->desc, cpumask_of(new->processor));

This call is not necessary anymore.

         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);

Now all the paths finish with spin_unlock; return;. Can you send a follow-up patch to do the cleanup?

-        vgic_vcpu_inject_irq(new, irq);

The comment on top of the if says: "If the IRQ is still lr_pending, re-inject it to the new vCPU". However, you remove the interrupt from the list but never re-inject it.

Looking at the context, I think we should keep the call to vgic_vcpu_inject_irq, otherwise we lose the interrupt forever.

         return;
     }
     /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
@@ -495,6 +491,13 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
         return;
     }

+    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
+    {
+        /* Drop the interrupt, because it is still inflight on another vcpu */
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

If I understand correctly, the problem arises because LRs are cleared lazily and the other vCPU is still running. It is a short window and I don't think it should happen often.

I would suggest kicking the other vCPU with an SGI to get the LRs cleared. Any opinions?

+        return;
+    }
+
     set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);

     if ( !list_empty(&n->inflight) )


Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

