[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization



On March 11, 2016 11:25am, <mengxu@xxxxxxxxxxxxx> wrote:
> On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@xxxxxxxxx> wrote:
> > pcidevs_lock should be held with interrupt enabled. However there
> > remains an exception in AMD IOMMU code, where the lock is acquired
> > with interrupt disabled. This inconsistency might lead to deadlock.
> 
> Why will this inconsistency lead to deadlock?
> I understand the difference between spin_lock_irqsave(), which disable 
> interrupt,
> and spin_lock(), which allows interrupt, however, I didn't get why misuse the
> spin_lock_irqsave() at the place of spin_lock() could potentially lead to a
> deadlock?

For above 2 questions:
Mix is just one of _possible_ scenarios(IMO, not 100% leading to a deadlock):

-where an interrupt tries to lock an already locked variable. This is ok if
the other interrupt happens on another CPU, but it is _not_ ok if the
interrupt happens on the same CPU that already holds the lock

I got it from the bellow 2 points:
 1).As Jan mentioned, The implication from disabling interrupts while acquiring 
a lock is that the lock is also being acquired by some interrupt handler.
  If you mix acquire types, the one not disabling interrupts is prone to be 
interrupted, and the interrupt trying to get hold of the lock the same CPU 
already owns.

 2). Comment inside check_lock(), 
we partition locks into IRQ-safe (always held with IRQs disabled) and
IRQ-unsafe (always held with IRQs enabled) types. The convention for
every lock must be consistently observed else we can deadlock in
IRQ-context rendezvous functions (__a rendezvous which gets every CPU
into IRQ context before any CPU is released from the rendezvous__).
If we can mix IRQ-disabled and IRQ-enabled callers, the following can
happen:
 * Lock is held by CPU A, with IRQs enabled
 * CPU B is spinning on same lock, with IRQs disabled
 * Rendezvous starts -- CPU A takes interrupt and enters rendezbous spin
 * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never exit
               the rendezvous, and will hence never release the lock.

To guard against this subtle bug we latch the IRQ safety of every
spinlock in the system, on first use.

> Would you minding pointing me to somewhere I can find the reason or
> enlighten me?
> 

Some reference material:

   * (look at "Lesson 3: spinlocks revisited.")
     https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
   * comment inside check_lock(), in xen/common/spinlock.c, in Xen's codebase.
   * the "Linux Device Drivers, 3rd Edition" book , 
http://www.makelinux.net/ldd3/chp-5-sect-5

Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.