
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.



>>> On 17.03.15 at 16:38, <konrad.wilk@xxxxxxxxxx> wrote:
> There is a race when we clear STATE_SCHED in the softirq
> - which allows 'raise_softirq_for' (on another CPU, or on
> the one running the softirq) to schedule the dpci again.
> 
> Specifically this can happen when the other CPU receives
> an interrupt, calls 'raise_softirq_for', and puts the dpci
> on its per-cpu list (the same dpci structure). Note that
> this could also happen on the same physical CPU; however,
> for simplicity the explanation will assume two CPU actors.
> 
> There would then be two instances of 'dpci_softirq' running
> at the same time (on different CPUs): one CPU would be executing
> hvm_dirq_assist (so it had cleared STATE_SCHED and set STATE_RUN),
> while the other CPU is trying to call:
> 
>    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
>       BUG();
> 
> Since STATE_RUN is already set it would end badly.
> 
> The reason we can hit this is that the interrupt
> affinity can be set over multiple CPUs.
> 
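The interleaving above can be modelled with a small standalone C11
program (a sketch with illustrative names only, not the hypervisor
code; the bit layout is an assumption made for the example):

    #include <stdatomic.h>
    #include <stdio.h>

    enum { STATE_SCHED = 1 << 0, STATE_RUN = 1 << 1 };

    static _Atomic unsigned int state;

    /* Models raise_softirq_for(): queue only if not already queued. */
    static int try_schedule(void)
    {
        return !(atomic_fetch_or(&state, STATE_SCHED) & STATE_SCHED);
    }

    /* Models the softirq claiming the dpci for execution. */
    static int try_run(void)
    {
        return !(atomic_fetch_or(&state, STATE_RUN) & STATE_RUN);
    }

    int main(void)
    {
        /* CPU0: an earlier interrupt queued the dpci; the softirq now
         * claims it and starts hvm_dirq_assist. */
        try_schedule();
        try_run();                                /* STATE_RUN set     */
        atomic_fetch_and(&state, ~STATE_SCHED);   /* STATE_SCHED clear */

        /* CPU1: a new interrupt arrives while hvm_dirq_assist is still
         * running on CPU0, so the dpci is queued again. */
        if ( try_schedule() )
            printf("dpci queued on the other CPU\n");

        /* CPU1's softirq then finds STATE_RUN already set - this is
         * where the old code hit BUG(). */
        if ( !try_run() )
            printf("STATE_RUN already set: old code would BUG()\n");

        return 0;
    }
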
> Potential solutions:
> 
> a) Instead of the BUG() we can put the dpci back on the per-cpu
> list to deal with it later (when the softirq is activated again).
> Putting the 'dpci' back on the per-cpu list effectively spins
> until the bad condition clears (see the sketch after this list).
> 
> b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
> to detect the 'STATE_RUN' bit being set and still schedule the dpci.
> The BUG() check in dpci_softirq would be replaced with a spin until
> 'STATE_RUN' has been cleared. The dpci would still not
> be scheduled when the STATE_SCHED bit was set.
> 
> c) Only schedule the dpci when the state is clear
> (neither STATE_SCHED nor STATE_RUN set). It would spin if STATE_RUN is
> set (as the dpci is in progress and will finish). If STATE_SCHED is set
> (so it hasn't run yet) we would not spin and would just exit.
> 
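A rough standalone sketch of what option a) boils down to, again with
C11 atomics; the per-cpu list is reduced to a single flag and nothing
here is the actual patch:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    enum { STATE_SCHED = 1 << 0, STATE_RUN = 1 << 1 };

    struct dpci {
        _Atomic unsigned int state;
        bool queued;              /* stands in for the per-cpu list */
    };

    /* One softirq pass over a queued dpci. */
    static void softirq_once(struct dpci *d)
    {
        d->queued = false;

        if ( atomic_fetch_or(&d->state, STATE_RUN) & STATE_RUN )
        {
            /* Still running on another CPU: instead of BUG(), put it
             * back on the list and let a later pass retry. */
            d->queued = true;
            return;
        }

        if ( atomic_fetch_and(&d->state, ~STATE_SCHED) & STATE_SCHED )
            printf("hvm_dirq_assist() would run here\n");

        atomic_fetch_and(&d->state, ~STATE_RUN);
    }

    int main(void)
    {
        /* The racy window: re-scheduled while still running elsewhere. */
        struct dpci d = { .state = STATE_SCHED | STATE_RUN, .queued = true };

        softirq_once(&d);
        printf("first pass re-queued: %s\n", d.queued ? "yes" : "no");

        atomic_fetch_and(&d.state, ~STATE_RUN);   /* other CPU finishes */

        softirq_once(&d);
        printf("second pass re-queued: %s\n", d.queued ? "yes" : "no");
        return 0;
    }
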
> Down-sides of the solutions:
> 
> a) Live-lock of the CPU. We could be finishing a dpci, then adding
> the dpci back, exiting, and then processing the dpci once more. And so
> on. We would eventually stop, as the TIMER_SOFTIRQ would be set, which
> will cause SCHEDULER_SOFTIRQ to be set as well, and we would exit this
> loop.
> 
> Interestingly, the old ('tasklet') code used this mechanism.
> If the function assigned to the tasklet was running, the softirq
> that ran said function (hvm_dirq_assist) would be responsible for
> putting the tasklet back on the per-cpu list. This allowed having
> a running tasklet and a 'to-be-scheduled' tasklet
> at the same time.
> 
> b) is similar to a) - but instead of re-entering dpci_softirq
> we are looping in the softirq waiting for the correct condition to
> arrive. As it does not allow us to unwedge ourselves, because the other
> softirqs are not called, it is less preferable.
> 
> c) can cause a dead-lock if the interrupt comes in while we are
> processing the dpci in the softirq - iff this happens on the same CPU.
> We would be looping in raise_softirq waiting for STATE_RUN
> to be cleared, while the softirq that was to clear it is preempted
> by our interrupt handler.
> 
> As such, this patch - which implements a) - is the best candidate
> for this quagmire.
> 
> Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

So I now agree that, in the state we're in, this is the most
reasonable fix. My reservations against the extra logic introduced
earlier (and being fixed here) stand though: From an abstract
perspective the IRQ and softirq logic alone should be sufficient to
deal with the needs we have. The complications really result from the
desire to use a per-CPU list of hvm_pirq_dpci-s, for which I still
think a simpler alternative should be found (after all, real hardware
doesn't use such lists).

A first thought would be to put them all on a per-domain list and
have a cpumask tracking which CPUs they need servicing on. The
downside of this would be (apart from again not being a proper
equivalent of how actual hardware handles this) that - the softirq
handler not having any other context - domains needing servicing
would also need to be tracked in some form (in order to avoid
having to iterate over all of them), and a per-CPU list would be
undesirable for the exact same reasons. Yet a per-CPU
domain-needs-service bitmap doesn't seem very attractive either,
i.e. this would need further thought (also to make sure such an
alternative model doesn't become even more involved than what
we have now).
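
A very rough standalone sketch of that layout, purely to make the idea
concrete (every name below is invented for illustration and nothing
here is a proposal for the actual code):

    #include <stdio.h>

    struct list_head { struct list_head *next, *prev; };

    /* Each hvm_pirq_dpci would live on its domain's list and carry a
     * mask of the CPUs on which it still needs servicing. */
    struct pirq_dpci_sketch {
        struct list_head dom_list;
        unsigned long need_service_cpus;   /* toy cpumask */
    };

    /* The per-domain anchor; the open question above is how the softirq
     * handler - which has no domain context - finds the domains that
     * need servicing without iterating over all of them. */
    struct domain_sketch {
        struct list_head dpci_list;
    };

    int main(void)
    {
        struct pirq_dpci_sketch p = { .need_service_cpus = 0 };

        p.need_service_cpus |= 1ul << 3;   /* interrupt landed on CPU 3 */
        printf("needs service on CPUs: %#lx\n", p.need_service_cpus);
        return 0;
    }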

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel