
Re: [Xen-devel] [PATCH v4 2/2] arm: proper ordering for correct execution of gic_update_one_lr and vgic_store_itargetsr





On 16/02/2017 22:10, Stefano Stabellini wrote:
On Thu, 16 Feb 2017, Julien Grall wrote:
Hi Stefano,

On 11/02/17 02:05, Stefano Stabellini wrote:
Concurrent execution of gic_update_one_lr and vgic_store_itargetsr can
result in the wrong pcpu being set as irq target, see
http://marc.info/?l=xen-devel&m=148218667104072.

To solve the issue, add barriers and remove an irq from the inflight
queue only after the affinity has been set. On the other end, write the
new vcpu target before checking GIC_IRQ_GUEST_MIGRATING and inflight.

Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
---
 xen/arch/arm/gic.c     | 3 ++-
 xen/arch/arm/vgic-v2.c | 4 ++--
 xen/arch/arm/vgic-v3.c | 4 +++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a5348f2..bb52959 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -503,12 +503,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
              !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             gic_raise_guest_irq(v, irq, p->priority);
         else {
-            list_del_init(&p->inflight);
             if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             {
                 struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
                 irq_set_affinity(p->desc, cpumask_of(v_target->processor));
             }
+            smp_mb();
+            list_del_init(&p->inflight);

I don't understand why you remove from the inflight list afterwards. If you do
that, you introduce the same problem as discussed in
<7a78c859-fa6f-ba10-b574-d8edd46ea934@xxxxxxx>

As long as the interrupt is routed to the pCPU running gic_update_one_lr, the
interrupt cannot fire because interrupts are masked.

This is not accurate: it is possible to receive a second interrupt
notification while the first one is still active.

Can you give more details here? Because if you look at how gic_update_one_lr is called from gic_clear_lrs, interrupts are masked.

So it cannot be received by Xen while you are in gic_update_one_lr and before irq_set_affinity is called.
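
For reference, the call path being discussed looks roughly like this (a
simplified sketch of gic_clear_lrs() around the time of this series, with
unrelated details omitted; not the exact source):

    void gic_clear_lrs(struct vcpu *v)
    {
        unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
        unsigned long flags;
        int i = 0;

        /* The vgic lock is taken with interrupts disabled on this pCPU,
         * so the physical interrupt cannot be delivered locally while
         * gic_update_one_lr() runs. */
        spin_lock_irqsave(&v->arch.vgic.lock, flags);

        while ( (i = find_next_bit((const unsigned long *)&this_cpu(lr_mask),
                                   nr_lrs, i)) < nr_lrs )
        {
            gic_update_one_lr(v, i);
            i++;
        }

        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
    }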



However, as soon as irq_set_affinity is called, the interrupt may fire
on the other pCPU.

This is true.


However, list_del_init is not atomic and not protected by any lock. So
vgic_vcpu_inject_irq may see a corrupted version of {p,n}->inflight.

Did I miss anything?

Moving list_del_init later ensures that there are no conflicts between
gic_update_one_lr and vgic_store_itargetsr (more specifically,
vgic_migrate_irq). If you look at the implementation of
vgic_migrate_irq, all checks depend on list_empty(&p->inflight). If we
don't move list_del_init later, given that vgic_migrate_irq can be
called with a different vgic lock held than gic_update_one_lr, the
following scenario can happen:


  <GIC_IRQ_GUEST_MIGRATING is set>      <GIC_IRQ_GUEST_MIGRATING is set>
  CPU0: gic_update_one_lr               CPU1: vgic_store_itargetsr
  ----------------------------------------------------------------------
  remove from inflight
  clear GIC_IRQ_GUEST_MIGRATING
  read rank->vcpu (intermediate)

That is only true if vgic_store_itargetsr tests GIC_IRQ_GUEST_MIGRATING at this point and finds it already clear.

                                        set rank->vcpu (final)
                                        vgic_migrate_irq
                                          if (!inflight) irq_set_affinity (final)
  irq_set_affinity (intermediate)


As a result, the irq affinity is set to the wrong cpu. With this patch,
this problem doesn't occur.
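
For reference, the checks in vgic_migrate_irq referred to above look roughly
like this (a simplified sketch; locking and the lr_pending case are omitted,
and it is not the exact source):

    void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
    {
        struct pending_irq *p = irq_to_pending(old, irq);

        /* Migration already in progress, nothing to do. */
        if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
            return;

        /* Not inflight: the affinity can be changed right away. */
        if ( list_empty(&p->inflight) )
        {
            irq_set_affinity(p->desc, cpumask_of(new->processor));
            return;
        }

        /* Still inflight: defer the affinity change to gic_update_one_lr(),
         * which acts on GIC_IRQ_GUEST_MIGRATING when the guest EOIs. */
        set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
    }

Every branch is keyed off the inflight list, so the outcome depends on whether
list_del_init() in gic_update_one_lr() has already run.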

However, you are right that both in the case of gic_update_one_lr and
vgic_migrate_irq, as well as the case of gic_update_one_lr and
vgic_vcpu_inject_irq that you mentioned, list_del_init (from
gic_update_one_lr) is potentially run at the same time as list_empty
(from vgic_migrate_irq or from vgic_vcpu_inject_irq), and they are not
atomic.
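
To see why those two operations can conflict, recall the Linux-style list
primitives Xen uses (paraphrased from xen/include/xen/list.h):

    static inline int list_empty(const struct list_head *head)
    {
        /* A single load: is the list head pointing at itself? */
        return head->next == head;
    }

    static inline void list_del_init(struct list_head *entry)
    {
        /* Several plain stores, with no ordering guarantees for a
         * concurrent reader on another pCPU that does not hold the
         * same lock. */
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        entry->next = entry;    /* INIT_LIST_HEAD(entry) */
        entry->prev = entry;
    }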

Also see this other potential issue: 
http://marc.info/?l=xen-devel&m=148703220714075

All these concurrent accesses are difficult to understand and to deal
with. This is why my original suggestion was to use the old vcpu vgic
lock, rather than trying to ensure safe concurrent accesses everywhere.
That option is still open and would solve both problems.
We only need to:

- store the vcpu to which an irq is currently injected
http://marc.info/?l=xen-devel&m=148237295020488
- check the new irq->vcpu field, and take the right vgic lock;
something like http://marc.info/?l=xen-devel&m=148237295920492&w=2, but
it would need improvements

Much simpler, right?
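
Roughly, the pattern those two steps add up to would look like the sketch
below. The field name inflight_vcpu and the helper are illustrative only; see
the patches linked above for the actual proposal.

    /* Hypothetical field recording which vcpu the irq is currently
     * injected to (NULL handling omitted for brevity). */
    struct pending_irq {
        /* ... existing fields ... */
        struct vcpu *inflight_vcpu;
    };

    /* Take the vgic lock of whichever vcpu the irq is injected to,
     * re-checking after acquiring it in case the irq moved meanwhile. */
    static struct vcpu *vgic_irq_lock(struct pending_irq *p,
                                      unsigned long *flags)
    {
        for ( ; ; )
        {
            struct vcpu *v = p->inflight_vcpu;

            spin_lock_irqsave(&v->arch.vgic.lock, *flags);
            if ( v == p->inflight_vcpu )
                return v;   /* still the right vcpu, lock is valid */
            spin_unlock_irqrestore(&v->arch.vgic.lock, *flags);
        }
    }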

Wouldn't it be easier to just take the desc->lock to protect the concurrent accesses?

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

