
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.



>>> On 17.03.15 at 16:38, <konrad.wilk@xxxxxxxxxx> wrote:
> There is a race when we clear STATE_SCHED in the softirq
> - which allows 'raise_softirq_for' (on another CPU, or on
> the one running the softirq) to schedule the dpci again.
> 
> Specifically this can happen when the other CPU receives
> an interrupt, calls 'raise_softirq_for', and puts the dpci
> on its per-cpu list (the same dpci structure). Note that
> this could also happen on the same physical CPU; however,
> for simplicity the explanation will assume two CPU actors.
> 
> There would then be two instances of 'dpci_softirq' running
> at the same time (on different CPUs): one CPU would be executing
> hvm_dirq_assist (so it had cleared STATE_SCHED and set STATE_RUN),
> while the other CPU is trying to call:
> 
>    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
>       BUG();
> 
> Since STATE_RUN is already set it would end badly.
> 
> The reason we can hit this is that the interrupt
> affinity can be set over multiple CPUs.
> 
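The interleaving above can be modelled with a small standalone C11
program (a sketch with illustrative names only, not the hypervisor
code; the bit layout is an assumption made for the example):

    #include <stdatomic.h>
    #include <stdio.h>

    enum { STATE_SCHED = 1 << 0, STATE_RUN = 1 << 1 };

    static _Atomic unsigned int state;

    /* Models raise_softirq_for(): queue only if not already queued. */
    static int try_schedule(void)
    {
        return !(atomic_fetch_or(&state, STATE_SCHED) & STATE_SCHED);
    }

    /* Models the softirq claiming the dpci for execution. */
    static int try_run(void)
    {
        return !(atomic_fetch_or(&state, STATE_RUN) & STATE_RUN);
    }

    int main(void)
    {
        /* CPU0: an earlier interrupt queued the dpci; the softirq now
         * claims it and starts hvm_dirq_assist. */
        try_schedule();
        try_run();                                /* STATE_RUN set     */
        atomic_fetch_and(&state, ~STATE_SCHED);   /* STATE_SCHED clear */

        /* CPU1: a new interrupt arrives while hvm_dirq_assist is still
         * running on CPU0, so the dpci is queued again. */
        if ( try_schedule() )
            printf("dpci queued on the other CPU\n");

        /* CPU1's softirq then finds STATE_RUN already set - this is
         * where the old code hit BUG(). */
        if ( !try_run() )
            printf("STATE_RUN already set: old code would BUG()\n");

        return 0;
    }
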
> Potential solutions:
> 
> a) Instead of the BUG() we can put the dpci back on the per-cpu
> list to deal with it later (when the softirq is activated again).
> Putting the 'dpci' back on the per-cpu list effectively spins
> until the bad condition clears (see the sketch after this list).
> 
> b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
> to detect the 'STATE_RUN' bit being set and still schedule the dpci.
> The BUG() check in dpci_softirq would be replaced with a spin until
> 'STATE_RUN' has been cleared. The dpci would still not
> be scheduled when the STATE_SCHED bit was set.
> 
> c) Only schedule the dpci when the state is clear
> (neither STATE_SCHED nor STATE_RUN set). It would spin if STATE_RUN is
> set (as the dpci is in progress and will finish). If STATE_SCHED is set
> (so it hasn't run yet) we would not spin and would just exit.
> 
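A rough standalone sketch of what option a) boils down to, again with
C11 atomics; the per-cpu list is reduced to a single flag and nothing
here is the actual patch:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    enum { STATE_SCHED = 1 << 0, STATE_RUN = 1 << 1 };

    struct dpci {
        _Atomic unsigned int state;
        bool queued;              /* stands in for the per-cpu list */
    };

    /* One softirq pass over a queued dpci. */
    static void softirq_once(struct dpci *d)
    {
        d->queued = false;

        if ( atomic_fetch_or(&d->state, STATE_RUN) & STATE_RUN )
        {
            /* Still running on another CPU: instead of BUG(), put it
             * back on the list and let a later pass retry. */
            d->queued = true;
            return;
        }

        if ( atomic_fetch_and(&d->state, ~STATE_SCHED) & STATE_SCHED )
            printf("hvm_dirq_assist() would run here\n");

        atomic_fetch_and(&d->state, ~STATE_RUN);
    }

    int main(void)
    {
        /* The racy window: re-scheduled while still running elsewhere. */
        struct dpci d = { .state = STATE_SCHED | STATE_RUN, .queued = true };

        softirq_once(&d);
        printf("first pass re-queued: %s\n", d.queued ? "yes" : "no");

        atomic_fetch_and(&d.state, ~STATE_RUN);   /* other CPU finishes */

        softirq_once(&d);
        printf("second pass re-queued: %s\n", d.queued ? "yes" : "no");
        return 0;
    }
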
> Down-sides of the solutions:
> 
> a) Live-lock of the CPU. We could be finishing a dpci, then adding
> the dpci back, exiting, and then processing the dpci once more. And so
> on. We would eventually stop, as the TIMER_SOFTIRQ would be set, which
> will cause SCHEDULER_SOFTIRQ to be set as well, and we would exit this
> loop.
> 
> Interestingly, the old ('tasklet') code used this mechanism.
> If the function assigned to the tasklet was running, the softirq
> that ran said function (hvm_dirq_assist) would be responsible for
> putting the tasklet back on the per-cpu list. This allowed having
> a running tasklet and a 'to-be-scheduled' tasklet
> at the same time.
> 
> b) is similar to a) - but instead of re-entering dpci_softirq
> we are looping in the softirq waiting for the correct condition to
> arrive. As it does not allow us to unwedge ourselves, because the other
> softirqs are not called, it is less preferable.
> 
> c) can cause a dead-lock if the interrupt comes in while we are
> processing the dpci in the softirq - iff this happens on the same CPU.
> We would be looping in raise_softirq waiting for STATE_RUN
> to be cleared, while the softirq that was to clear it is preempted
> by our interrupt handler.
> 
> As such, this patch - which implements a) - is the best candidate
> for this quagmire.
> 
> Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

So I now agree that, in the state we're in, this is the most
reasonable fix. My reservations against the extra logic introduced
earlier (and being fixed here) stand though: From an abstract
perspective the IRQ and softirq logic alone should be sufficient to
deal with the needs we have. The complications really result from the
desire to use a per-CPU list of hvm_pirq_dpci-s, for which I still
think a simpler alternative should be found (after all, real hardware
doesn't use such lists).

A first thought would be to put them all on a per-domain list and
have a cpumask tracking which CPUs they need servicing on. The
downside of this would be (apart from again not being a proper
equivalent of how actual hardware handles this) that - the softirq
handler not having any other context - domains needing servicing
would also need to be tracked in some form (in order to avoid
having to iterate over all of them), and a per-CPU list would be
undesirable for the exact same reasons. Yet a per-CPU
domain-needs-service bitmap doesn't seem very attractive either,
i.e. this would need further thought (also to make sure such an
alternative model doesn't become even more involved than what
we have now).
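
A very rough standalone sketch of that layout, purely to make the idea
concrete (every name below is invented for illustration and nothing
here is a proposal for the actual code):

    #include <stdio.h>

    struct list_head { struct list_head *next, *prev; };

    /* Each hvm_pirq_dpci would live on its domain's list and carry a
     * mask of the CPUs on which it still needs servicing. */
    struct pirq_dpci_sketch {
        struct list_head dom_list;
        unsigned long need_service_cpus;   /* toy cpumask */
    };

    /* The per-domain anchor; the open question above is how the softirq
     * handler - which has no domain context - finds the domains that
     * need servicing without iterating over all of them. */
    struct domain_sketch {
        struct list_head dpci_list;
    };

    int main(void)
    {
        struct pirq_dpci_sketch p = { .need_service_cpus = 0 };

        p.need_service_cpus |= 1ul << 3;   /* interrupt landed on CPU 3 */
        printf("needs service on CPUs: %#lx\n", p.need_service_cpus);
        return 0;
    }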

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel