[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/IRQ: prevent vector sharing within IO-APICs



On 14/11/11 16:29, Jan Beulich wrote:
>>>> On 14.11.11 at 16:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 14/11/11 14:20, Jan Beulich wrote:
>>> I'm having some more fundamental problem with this original change
>>> of yours: The assignment of the new vector happens in the context
>>> of move_native_irq(), which gets called *after* doing the manual
>>> EOI (and hence also *after* the vector != desc->arch.vector check).
>>> How does that do what it is supposed to?
>>>
>>> (Below the [untested] patch that I would propose to do what I described
>>> above, pending your clarification regarding the original change.)
>>>
>>> Jan
>> Are you sure?  I was under the impression that for safety, you have to
>> move the IRQ before ack'ing it at the local APIC, to prevent another IRQ
>> coming in while you are updating the structures.
> Just take another look at end_level_ioapic_irq().

If it makes a difference, I was debugging the issue on boxes without
x2APIC, so there was no direct EOI support.  Given the number of times
the code conditionally returns depending on direct_eoi_enabled, I would
not be surprised if there are two separate codepaths.

>> The steps (as far as I remember from debugging this issue) are:
>>
>> 1) Scheduler decides move a vcpu to a different pcpu.  This implies
>> moving all bound IRQs.  It sets irq_desc->arch->pending_mask to the
>> pcpu(s) we wish to move to, and sets irq_desc->static |= IRQ_MOVE_PENDING
>> 2) When an irq appears (on the original pcpu) which has IRQ_MOVE_PENDING
>> set, the ack handler rewrites the RTE to point to the new pcpu:vector,
>> and updates the irq_desc a bit.  This rewriting necessarily happens
>> before sending an EOI.
> One would think so, but according to my reading of the code that's
> not the case. And it really can't, as otherwise
> io_apic_level_ack_pending() would always return true (and hence
> move_native_irq() would never get called).

In mask_and_ack_level_ioapic_irq which is the ack handler for level
triggered interrupts, there seems to be a codepath which masks the irq,
then EOIs the local apic which should clear the Remote IRR in the RTE at
which point the call to move_masked_irq() should go ahead.

I have to admit that I am perplexed as to how the original bug occurred,
but I guarantee that at the time, I was seeing the RTE being updated to
the new vector while the LAPIC still held the vector for the old irq. 
This is why the submitted patch fixed the bug.

>> 3) When the next irq appears (on the new pcpu), the code cleans up the
>> old vector_irq reference for the old pcpu, and tweaks some of irq_desc.
> But the crucial thing is that desc->arch.vector gets written during step 2,
> and hence the condition around the EOI can never be true unless an
> interrupt surfaces in the window between the EOI in step 2 and the
> disabling of it in move_native_irq(). Was it just this special case that
> your original change addressed (the change description doesn't hint in
> that direction).

Given that the bug was certainly a timing issue, and most of the time
the IRQ migrated, everything worked fine, It is quite possible that it
was a new IRQ appearing in that window.  Having said that, even if a new
IRQ arrived, its remote IRR should not be set until the LAPIC has acked
it, at which point we should be in the IRQ handler for it which seems
rather hard if we are still in the handler for the previous IRQ. 
Perhaps there is a weird interaction whereby acking a masked level
interrupt doesn't reset the IRR.

I have to admit that the logic seems a bit broken, and it certainly
seems to be more complicated than it needs to be.

>> An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR
>> IPI which looks through all pending cleanups on the current pcpu and
>> cleans them up.
>>
>> Anyway, it was certainly the case that I saw the RTE being updated
>> before the EOI was sent, which is why I had to change the
>> hw_irq_controler.end interface to include the vector from the pending
>> EOI stack, so end_level_ioapic_irq could EOI the new vector if it was
>> different from the vector the lapic saw.
> That's another thing I would want to revert: There really shouldn't be
> a need to pass in the old vector, as it is getting stored in
> __assign_irq_vector(). If the value wasn't valid anymore at the point
> we'd need it, we'd have a more severe problem - the vector gets
> marked as not in use by smp_irq_move_cleanup_interrupt(), and
> hence it wouldn't be valid anymore to play with it (read: EOI it) as
> it may have got re-used already. As vector_irq[] gets updated there
> too, that should result in "No irq handler for vector ...", but it would
> not make it into end_level_ioapic_irq() in such a case.
>
> Jan

That is fair enough.  When I submitted this patch, irq_cfg did not have
old_vector in it - I added old_vector in a later patch.  I really should
have remembered that I could then revert the change to the interface.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.