[Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
There is a race when we clear STATE_SCHED in the softirq - which allows
'raise_softirq_for' (on another CPU or on the one running the softirq) to
schedule the dpci again.

Specifically this can happen when the other CPU receives an interrupt,
calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same
dpci structure). Note that this could also happen on the same physical
CPU, but for simplicity the explanation assumes two CPU actors.

There would then be two 'dpci_softirq' instances running at the same time
(on different CPUs): one CPU would be executing hvm_dirq_assist (having
cleared STATE_SCHED and set STATE_RUN) while the other is trying to call:

   if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
        BUG();

Since STATE_RUN is already set it would end badly. The reason we can hit
this is when an interrupt's affinity is set over multiple CPUs.

Potential solutions:

a) Instead of the BUG() we can put the dpci back on the per-cpu list to
   deal with later (when the softirqs are activated again). Putting the
   'dpci' back on the per-cpu list effectively spins until the bad
   condition clears.

b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
   to check for the 'STATE_RUN' bit being set and schedule the dpci. The
   BUG() check in dpci_softirq would be replaced with a spin until
   'STATE_RUN' has been cleared. The dpci would still not be scheduled
   when the STATE_SCHED bit was set.

c) Only schedule the dpci when the state is cleared (no STATE_SCHED and no
   STATE_RUN). It would spin if STATE_RUN is set (as it is in progress and
   will finish). If STATE_SCHED is set (so it hasn't run yet) we won't try
   to spin and just exit.

Down-sides of the solutions:

a) Live-lock of the CPU. We could be finishing a dpci, then adding the
   dpci, exiting, and then processing the dpci once more. And so on. We
   would eventually stop as TIMER_SOFTIRQ would be set, which will cause
   SCHEDULER_SOFTIRQ to be set as well and we would exit this loop.

   Interestingly, the old ('tasklet') code used this mechanism. If the
   function assigned to the tasklet was running - the softirq that ran
   said function (hvm_dirq_assist) would be responsible for putting the
   tasklet back on the per-cpu list. This allowed a running tasklet and a
   'to-be-scheduled' tasklet to exist at the same time.

b) is similar to a) - instead of re-entering dpci_softirq we are looping
   in the softirq waiting for the correct condition to arrive. As it does
   not allow us to unwedge ourselves - the other softirqs are not called -
   it is less preferable.

c) can cause a dead-lock if the interrupt comes in while we are processing
   the dpci in the softirq - iff this happens on the same CPU. We would be
   looping in raise_softirq_for waiting for STATE_RUN to be cleared, while
   the softirq that was to clear it is preempted by our interrupt handler.

As such, this patch - which implements a) - is the best candidate for this
quagmire.

Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 xen/drivers/passthrough/io.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ae050df..9b77334 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -804,7 +804,17 @@ static void dpci_softirq(void)
         d = pirq_dpci->dom;
         smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
         if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
-            BUG();
+        {
+            unsigned long flags;
+
+            /* Put back on the list and retry. */
+            local_irq_save(flags);
+            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
+            local_irq_restore(flags);
+
+            raise_softirq(HVM_DPCI_SOFTIRQ);
+            continue;
+        }
         /*
          * The one who clears STATE_SCHED MUST refcount the domain.
          */
-- 
2.1.0
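[Editor's note: for readers without the Xen tree at hand, below is a minimal,
self-contained C11 sketch of the STATE_SCHED/STATE_RUN handshake described in
the commit message and of the retry behaviour that option a) introduces. It is
an illustration only, not Xen code: the model_* helper names, the 'queued' flag
standing in for the per-cpu list, and the use of <stdatomic.h> in place of
Xen's test_and_set_bit()/test_and_clear_bit() are all assumptions made for
this sketch.]

/* Simplified, single-file model of the dpci scheduling handshake. */
#include <stdatomic.h>
#include <stdio.h>

#define STATE_SCHED  (1u << 0)  /* dpci is queued on a per-cpu list           */
#define STATE_RUN    (1u << 1)  /* a dpci_softirq instance is processing it   */

static atomic_uint state;       /* stands in for pirq_dpci->state             */
static atomic_int  queued;      /* stands in for membership on a per-cpu list */

/* Model of raise_softirq_for(): only the caller that sets STATE_SCHED queues. */
static void model_raise_softirq_for(void)
{
    if ( atomic_fetch_or(&state, STATE_SCHED) & STATE_SCHED )
        return;                 /* already scheduled - nothing to do          */
    atomic_store(&queued, 1);   /* models "list_add_tail + raise_softirq"     */
}

/* Model of the (patched) dpci_softirq() body for one dequeued dpci. */
static void model_dpci_softirq(void)
{
    if ( !atomic_exchange(&queued, 0) )
        return;                 /* per-cpu list was empty                     */

    /*
     * Try to claim STATE_RUN.  With option a), finding it already set (the
     * other CPU is still inside hvm_dirq_assist) is no longer a BUG(): the
     * dpci simply goes back on the list and is retried on a later pass.
     */
    if ( atomic_fetch_or(&state, STATE_RUN) & STATE_RUN )
    {
        atomic_store(&queued, 1);   /* put it back; retry later               */
        return;
    }

    /* The one who clears STATE_SCHED gets to process the dpci. */
    if ( atomic_fetch_and(&state, ~STATE_SCHED) & STATE_SCHED )
        puts("hvm_dirq_assist() would run here");

    atomic_fetch_and(&state, ~STATE_RUN);
}

int main(void)
{
    model_raise_softirq_for();  /* an "interrupt" schedules the dpci          */
    model_dpci_softirq();       /* the softirq handler processes it           */
    return 0;
}

[main() only exercises the uncontended path; reproducing the race in the
commit message needs a second actor calling model_raise_softirq_for() while
STATE_RUN is still set, which in the pre-patch code would hit the BUG().]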