[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.

There is race when we clear the STATE_SCHED in the softirq
- which allows the 'raise_softirq_for' (on another CPU or
on the one running the softirq) to schedule the dpci.

Specifically this can happen when the other CPU receives
an interrupt, calls 'raise_softirq_for', and puts the dpci
on its per-cpu list (same dpci structure). Note that
this could also happen on the same physical CPU, however
the explanation for simplicity will assume two CPUs actors.

There would be two 'dpci_softirq' running at the same time
(on different CPUs) where on one CPU it would be executing
hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN)
and on the other CPU it is trying to call:

   if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )

Since STATE_RUN is already set it would end badly.

The reason we can get his with this is when an interrupt
affinity is set over multiple CPUs.

Potential solutions:

a) Instead of the BUG() we can put the dpci back on the per-cpu
list to deal with later (when the softirq are activated again).
This putting the 'dpci' back on the per-cpu list is an spin
until the bad condition clears.

b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
to detect for 'STATE_RUN' bit being set and schedule the dpci.
The BUG() check in dpci_softirq would be replace with a spin until
'STATE_RUN' has been cleared. The dpci would still not
be scheduled when STATE_SCHED bit was set.

c) Only schedule the dpci when the state is cleared
(no STATE_SCHED and no STATE_RUN).  It would spin if STATE_RUN is set
(as it is in progress and will finish). If the STATE_SCHED is set
(so hasn't run yet) we won't try to spin and just exit.

Down-sides of the solutions:

a). Live-lock of the CPU. We could be finishing an dpci, then adding
the dpci, exiting, and the processing the dpci once more. And so on.
We would eventually stop as the TIMER_SOFTIRQ would be set, which will
cause SCHEDULER_SOFTIRQ to be set as well and we would exit this loop.

Interestingly the old ('tasklet') code used this mechanism.
If the function assigned to the tasklet was running  - the softirq
that ran said function (hvm_dirq_assist) would be responsible for
putting the tasklet back on the per-cpu list. This would allow
to have an running tasklet and an 'to-be-scheduled' tasklet
at the same time.

b). is similar to a) - instead of re-entering the dpci_softirq
we are looping in the softirq waiting for the correct condition to
arrive. As it does not allow unwedging ourselves because the other
softirqs are not called - it is less preferable.

c) can cause an dead-lock if the interrupt comes in when we are
processing the dpci in the softirq - iff this happens on the same CPU.
We would be looping in on raise_softirq waiting for STATE_RUN
to be cleared, while the softirq that was to clear it - is preempted
by our interrupt handler.

As such, this patch - which implements a) is the best candidate
for this quagmire.

Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
 xen/drivers/passthrough/io.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ae050df..9b77334 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -804,7 +804,17 @@ static void dpci_softirq(void)
         d = pirq_dpci->dom;
         smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
         if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
-            BUG();
+        {
+            unsigned long flags;
+            /* Put back on the list and retry. */
+            local_irq_save(flags);
+            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
+            local_irq_restore(flags);
+            raise_softirq(HVM_DPCI_SOFTIRQ);
+            continue;
+        }
          * The one who clears STATE_SCHED MUST refcount the domain.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.