
Re: [Xen-devel] [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization



Hi Quan and Dario,

On Fri, Mar 11, 2016 at 5:35 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
>> On March 11, 2016 11:25am, <mengxu@xxxxxxxxxxxxx> wrote:
>> >
>> > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@xxxxxxxxx> wrote:
>> > >
>> > > pcidevs_lock should be held with interrupt enabled. However there
>> > > remains an exception in AMD IOMMU code, where the lock is
>> > > acquired
>> > > with interrupt disabled. This inconsistency might lead to
>> > > deadlock.
>> > Why will this inconsistency lead to deadlock?
>> > I understand the difference between spin_lock_irqsave(), which
>> > disable interrupt,
>> > and spin_lock(), which allows interrupt, however, I didn't get why
>> > misuse the
>> > spin_lock_irqsave() at the place of spin_lock() could potentially
>> > lead to a
>> > deadlock?
>>
>>  1). As Jan mentioned, the implication from disabling interrupts while
>> acquiring a lock is that the lock is also being acquired by some
>> interrupt handler.
>>  If you mix acquire types, the one not disabling interrupts is prone
>> to being interrupted, with the interrupt then trying to get hold of
>> the lock the same CPU already owns.
>>
> The key issue is whether or not a lock can be acquired from IRQ context
> (i.e., in an interrupt handler, or a function called by that, as a
> result of an interrupt occurring).
>
> For lock that can, IRQ disabling variants must be used *everywhere* the
> lock is taken (so, e.g., not only when it is actually taken from IRQ
> context, just *always*!).
>
> Even if that rule is not honored, we are still fine if the lock is
> taken on CPUA and the interrupt is handled on CPUB:
>
>   CPUA              CPUB
>   .                 .
>   .                 .
>   spin_lock(l)      .
>   .                 .
>   .                 . <-- Interrupt!
>   .                        .
>   .                       spin_lock(l); //spins on l
>   .                       x             //spins on l
>   .                       x             //spins on l
>   spin_unlock(l)          .             //takes l
>   .                       .
>   .                       spin_unlock(l);
>   .                 . <-- .
>   .                 .
>
> but the following can happen, if the interrupt is delivered to the CPU
> that has already taken the lock:
>
>      CPU
>      .
>      .
> [1]  spin_lock(l);       //takes l
>      .
>      . <-- Interrupt!
>            .
>            spin_lock(l) // CPU deadlocks
>
> If, in the latter case, spin_lock_irq(l) were used at [1], things would
> have been fine, as the interrupt wouldn't have been delivered until l
> was released:
>
>   CPU
>   .
>   .
>   spin_lock_irq(l)                        //takes l
>   .
>   . <---------------- Interrupt!
>   .                   -                   //IRQs are disabled
>   .                   -                   //IRQs are disabled
>   .                   -                   //IRQs are disabled
>   spin_unlock_irq(l)  .                   //IRQs re-enabled
>                       spin_lock_irq(l);   //takes l
>                       .
>                       .
>                       spin_unlock_irq(l);
>  . <----------------- .
>  .
>
> Note that whether spin_lock_irq() or spin_lock_irqsave() should be
> used, is completely independent from this, and it must be decided
> according to whether IRQs are disabled already or not when taking the
> lock. And (quite obviously, but since we're here) it is fine to mix
> spin_lock_irq() and spin_lock_irqsave(), as they both disable
> interrupts, which is what matters.
>
> So, now, if we know for sure that a lock is _never_ever_ever_ taken
> from interrupt context, can we mix spin_lock() and spin_lock_irq() on
> it (for whatever reason)? Well, as far as the above reasoning is
> concerned, yes.
>
> In fact, the deadlock arises because IRQs interrupt asynchronously what
> the CPU is doing, and that can happen when the CPU has taken the lock
> already. But if the 'asynchronous' part goes away, we really don't care
> whether a lock is taken at time t1 with IRQs enabled, and at time t2 with
> IRQ disabled, don't you think?
>
> Well, here it is where what the comment inside check_lock() describes
> comes into play. As quoted by Quan already:
>
>>  2). Comment inside check_lock(),
>> we partition locks into IRQ-safe (always held with IRQs disabled) and
>> IRQ-unsafe (always held with IRQs enabled) types. The convention for
>> every lock must be consistently observed else we can deadlock in
>> IRQ-context rendezvous functions (__a rendezvous which gets every CPU
>> into IRQ context before any CPU is released from the rendezvous__).
>> If we can mix IRQ-disabled and IRQ-enabled callers, the following can
>> happen:
>>  * Lock is held by CPU A, with IRQs enabled
>>  * CPU B is spinning on same lock, with IRQs disabled
>>  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous
>> spin
>>  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never
>> exit
>>                the rendezvous, and will hence never release the lock.
>>
>> To guard against this subtle bug we latch the IRQ safety of every
>> spinlock in the system, on first use.
>>
> This is a very good safety measure, but it can be quite a restriction
> in some (luckily limited) cases. And that's why it is possible --
> although absolutely discouraged-- to relax it. See, for an example of
> this, the comment in start_secondary(), in xen/arch/x86/smpboot.c:
>
>     ...
>     /*
>      * Just as during early bootstrap, it is convenient here to disable
>      * spinlock checking while we have IRQs disabled. This allows us to
>      * acquire IRQ-unsafe locks when it would otherwise be disallowed.
>      *
>      * It is safe because the race we are usually trying to avoid involves
>      * a group of CPUs rendezvousing in an IPI handler, where one cannot
>      * join because it is spinning with IRQs disabled waiting to acquire a
>      * lock held by another in the rendezvous group (the lock must be an
>      * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
>      * hence had IRQs enabled). This is a deadlock scenario.
>      *
>      * However, no CPU can be involved in rendezvous until it is online,
>      * hence no such group can be waiting for this CPU until it is
>      * visible in cpu_online_map. Hence such a deadlock is not possible.
>      */
>     spin_debug_disable();
>     ...
>
> Just FTR, at least as far as I can remember, Linux does not enforce
> anything like that.
>
> Hope this clarifies things.

Thank you very much for your explanation and education! They are
really helpful! :-D

Let me summarize: ;-)

                                      | spin_lock | spin_lock_irq | spin_lock_irqsave
  Can run in IRQ context?             | No        | Yes           | Yes
  Can run in IRQ-disabled context?    | No        | No            | Yes

Why may deadlock occur if we mix spin_lock() and spin_lock_irq(save)?
If we mix spin_lock() and spin_lock_irq(save) on the same lock, and a
group of CPUs rendezvous in an IPI handler, we can deadlock. CPU A,
which took the lock with spin_lock(), takes the IPI and waits inside
the rendezvous for the other CPUs, while CPU B, which is spinning on
the lock with spin_lock_irq(save), never enters the rendezvous, since
it has interrupts disabled. Then:
CPU A waits for CPU B to enter the rendezvous (a kind of
synchronization point), which cannot happen until B acquires the lock;
CPU B waits for CPU A to release the lock, which cannot happen until A
gets out of the rendezvous.

Are my understanding and summary correct?

Thanks and Best Regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

