
Re: [Xen-devel] [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function



On Tue, 28 Jan 2014, Ian Campbell wrote:
> On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > Alternatively, as Ian suggested, we could increase the priority of SGIs
> > > > but I am a bit wary of making that change at RC2.
> > > 
> > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > primitives to work around this issue on a case by case basis. It's just
> > > too subtle IMHO.
> > > 
> > > The IPI and cross CPU calling primitives are basically predicated on
> > > those IPIs interrupting normal interrupt handlers.
> > 
> > The problem is that we don't know whether we can properly context
> > switch nested interrupts.
> 
> What do you mean? We don't have to context switch an IPI.

Sorry, I meant save/restore registers, stack pointer, processor mode,
etc, for nested interrupts.


> > Also I would need to think harder about whether everything would
> > work correctly when multiple SGIs happen simultaneously (with more
> > than two cpus involved).
> 
> Since all IPIs would be at the same higher priority only one will be
> active on each CPU at a time. If you are worried about multiple CPUs
> then that is already an issue today, just at a lower priority.

That is correct, but if we moved the on_selected_cpus call out of the
interrupt handler I don't think the problem could occur.


> I have hacked the IPI priority to be higher in the past and it worked
> fine, I just never got round to cleaning it up for submission (I hadn't
> thought of the locking thing and my use case was low priority).
> 
> The interrupt entry and exit paths were written with nested interrupt in
> mind and they have to be so in order to handle interrupts which occur
> from both guest and hypervisor context.
>
> > On the other hand we know that both Oleksandr's and my solution should
> > work OK with no surprises if implemented correctly.
> 
> That's a big if in my mind; any use of trylock is very subtle IMHO.

I wouldn't accept that patch for 4.4 either (or for 4.5 given that we
can certainly come up with something better by that time).


> AIUI this issue only occurs with "proto device assignment" patches added
> to 4.4, in which case I think the solution can wait until 4.5 and can be
> done properly via the IPI priority fix.
 
I think this is a pretty significant problem. Even if we don't commit a
fix, we should post a proper patch that we deem acceptable and link it
from the wiki, so that anybody who needs it can find it and be sure that
it works correctly.
In my opinion, if we go to this length then we might as well commit it
(provided it doesn't touch common code, of course), but I'll leave the
decision up to you and George.

Given the constraints, the solution I would feel more comfortable with at
this time is something like the following patch (lightly tested):

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..b00ca73 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static unsigned nr_lrs;
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
+static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
+static void eoi_action(unsigned long unused)
+{
+    struct pending_irq *p = this_cpu(eoi_irq);
+    ASSERT(p != NULL);
+
+    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
+            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
+
+    this_cpu(eoi_irq) = NULL;
+}
+static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
+
 /* The GIC mapping of CPU interfaces does not necessarily match the
  * logical CPU numbering. Let's use mapping as returned by the GIC
  * itself
@@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
 
 }
 
-static void gic_irq_eoi(void *info)
-{
-    int virq = (uintptr_t) info;
-    GICC[GICC_DIR] = virq;
-}
-
 static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
     int i = 0, virq, pirq = -1;
@@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
             if ( cpu == smp_processor_id() )
                 gic_irq_eoi((void*)(uintptr_t)pirq);
             else
-                on_selected_cpus(cpumask_of(cpu),
-                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
+            {
+                ASSERT(this_cpu(eoi_irq) == NULL);
+                this_cpu(eoi_irq) = p;
+                tasklet_schedule(&eoi_tasklet);
+            }
         }
 
         i++;

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

