[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: [PATCH] fix pgd_lock deadlock



On Tue, 15 Feb 2011, Thomas Gleixner wrote:

> On Tue, 15 Feb 2011, Andrea Arcangeli wrote:
> > On Tue, Feb 15, 2011 at 08:26:51PM +0100, Thomas Gleixner wrote:
> > 
> > With NR_CPUs < 4, or with THP enabled, rmap.c will do
> > spin_lock(&mm->page_table_lock) (or pte_offset_map_lock where the lock
> > is still mm->page_table_lock and not the PT lock). Then it will send
> > IPIs to flush the tlb of the other CPUs.
> > 
> > But the other CPU is running the vmalloc_sync_all, and it is trying to
> > take the page_table_lock with irq disabled. It will never take the
> > lock because the CPU waiting the IPI delivery holds it. And it will
> > never run the IPI because it has irqs disabled.
> 
> Ok, that makes sense :)
>  
> > Now the big question is if anything is taking the pgd_lock from
> > irqs. Normal testing could never reveal it as even if it happens it
> > has a slim chance to happen while the pgd_lock is already hold by
> > normal kernel context. But the VM_BUG_ON(in_interrupt()) should
> > hopefully have revealed it already if it ever happened, I hope.
> > 
> > Clearly we could try to fix it in other ways, but still if there's no
> > reason to do the _irqsave this sounds a good idea to apply my fix
> > anyway.
> 
> Did you try with DEBUG_PAGEALLOC, which is calling into cpa quite a
> lot?

Another thing. You check for in_interrupt(), but what makes sure that
the code which takes pgd_lock is never taken with interrupts disabled
except during early boot ?

Thanks,

        tglx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.