
Re: [Xen-devel] Xen Linux deadlock



On 07/06/17 17:05, Andre Przywara wrote:
> Hi,
> 
> when booting Linux 4.12-rc4 as Dom0 under a recent Xen HV I saw the
> following lockdep splat after running xencommons start:
> 
> root@junor1:~# bash /etc/init.d/xencommons start
> Setting domain 0 name, domid and JSON config...
> [  247.979498] ======================================================
> [  247.985688] WARNING: possible circular locking dependency detected
> [  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
> [  247.997040] ------------------------------------------------------
> [  248.003232] xenbus/91 is trying to acquire lock:
> [  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>] xenbus_dev_queue_reply+0x3c/0x230
> [  248.017163]
> [  248.017163] but task is already holding lock:
> [  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>] xenbus_thread+0x5f0/0x798
> [  248.031267]
> [  248.031267] which lock already depends on the new lock.
> [  248.031267]
> [  248.039615]
> [  248.039615] the existing dependency chain (in reverse order) is:
> [  248.047176]
> [  248.047176] -> #1 (xb_write_mutex){+.+...}:
> [  248.052943]        __lock_acquire+0x1728/0x1778
> [  248.057498]        lock_acquire+0xc4/0x288
> [  248.061630]        __mutex_lock+0x84/0x868
> [  248.065755]        mutex_lock_nested+0x3c/0x50
> [  248.070227]        xs_send+0x164/0x1f8
> [  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
> [  248.079427]        xenbus_file_write+0x260/0x420
> [  248.084073]        __vfs_write+0x48/0x138
> [  248.088113]        vfs_write+0xa8/0x1b8
> [  248.091983]        SyS_write+0x54/0xb0
> [  248.095768]        el0_svc_naked+0x24/0x28
> [  248.099897]
> [  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
> [  248.106088]        print_circular_bug+0x80/0x2e0
> [  248.110730]        __lock_acquire+0x1768/0x1778
> [  248.115288]        lock_acquire+0xc4/0x288
> [  248.119417]        __mutex_lock+0x84/0x868
> [  248.123545]        mutex_lock_nested+0x3c/0x50
> [  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
> [  248.133005]        xenbus_thread+0x788/0x798
> [  248.137306]        kthread+0x110/0x140
> [  248.141087]        ret_from_fork+0x10/0x40
> [  248.145214]
> [  248.145214] other info that might help us debug this:
> [  248.145214]
> [  248.153383]  Possible unsafe locking scenario:
> [  248.153383]
> [  248.159403]        CPU0                    CPU1
> [  248.163960]        ----                    ----
> [  248.168518]   lock(xb_write_mutex);
> [  248.172045]                                lock(&u->msgbuffer_mutex);
> [  248.178500]                                lock(xb_write_mutex);
> [  248.184514]   lock(&u->msgbuffer_mutex);
> [  248.188470]
> [  248.188470]  *** DEADLOCK ***
> [  248.188470]
> [  248.194578] 2 locks held by xenbus/91:
> [  248.198360]  #0:  (xs_response_mutex){+.+...}, at: [<ffff00000863a7b0>] xenbus_thread+0x460/0x798
> [  248.207218]  #1:  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>] xenbus_thread+0x5f0/0x798
> [  248.215818]
> [  248.215818] stack backtrace:
> [  248.220293] CPU: 0 PID: 91 Comm: xenbus Not tainted 4.12.0-rc4-00022-gc4b25c0 #575
> [  248.227858] Hardware name: ARM Juno development board (r1) (DT)
> [  248.233792] Call trace:
> [  248.236289] [<ffff00000808a748>] dump_backtrace+0x0/0x270
> [  248.241707] [<ffff00000808aa94>] show_stack+0x24/0x30
> [  248.246782] [<ffff0000084caa98>] dump_stack+0xb8/0xf0
> [  248.251859] [<ffff000008139068>] print_circular_bug+0x1f8/0x2e0
> [  248.257787] [<ffff00000813c090>] __lock_acquire+0x1768/0x1778
> [  248.263548] [<ffff00000813c90c>] lock_acquire+0xc4/0x288
> [  248.268882] [<ffff000008bdb28c>] __mutex_lock+0x84/0x868
> [  248.274219] [<ffff000008bdbaac>] mutex_lock_nested+0x3c/0x50
> [  248.279889] [<ffff00000863e904>] xenbus_dev_queue_reply+0x3c/0x230
> [  248.286081] [<ffff00000863aad8>] xenbus_thread+0x788/0x798
> [  248.291585] [<ffff000008108070>] kthread+0x110/0x140
> [  248.296572] [<ffff000008083710>] ret_from_fork+0x10/0x40
> 
> Apparently it's not easily reproducible, but Julien confirmed that the
> deadlock condition reported above is indeed in the Linux code.
> 
> Does anyone have an idea of how to fix this?

Shouldn't be too hard. The xb_write_mutex can be dropped earlier in the
critical path. I'll send a patch.
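
To illustrate the idea only (this is not the actual patch, and not kernel
code), here is a minimal Python sketch of the two lock orders from the splat.
The TrackedLock class, has_inversion() and run() are hypothetical names
invented for this example; only the mutex names come from the report. The
write path takes msgbuffer_mutex and then xb_write_mutex, as in the CPU1
column above. The reply path either calls into the reply queueing while still
holding xb_write_mutex (the reported order) or drops it first (the assumed
shape of the fix).

```python
import threading


class TrackedLock:
    """A lock that records which locks were already held when it was taken.

    Hypothetical helper for this sketch; it mimics lockdep's order tracking
    in a very reduced form and is only meant for single-threaded use here.
    """

    def __init__(self, name, edges, held):
        self.name = name
        self._lock = threading.Lock()
        self._edges = edges   # shared set of (held_name, acquired_name) pairs
        self._held = held     # shared list of currently held lock names

    def __enter__(self):
        # Record an ordering edge from every lock already held to this one.
        for outer in self._held:
            self._edges.add((outer, self.name))
        self._lock.acquire()
        self._held.append(self.name)
        return self

    def __exit__(self, *exc):
        self._held.remove(self.name)
        self._lock.release()
        return False


def has_inversion(edges):
    """Return True if some pair of locks was taken in both orders."""
    return any((b, a) in edges for (a, b) in edges)


def run(drop_write_lock_early):
    """Replay both code paths once and report whether the orders conflict."""
    edges, held = set(), []
    xb_write = TrackedLock("xb_write_mutex", edges, held)
    msgbuffer = TrackedLock("msgbuffer_mutex", edges, held)

    # Write path (chain #1): msgbuffer_mutex first, then xb_write_mutex.
    with msgbuffer:
        with xb_write:
            pass  # send the request

    # Reply path (chain #0).
    if drop_write_lock_early:
        with xb_write:
            pass  # finish the work that needs xb_write_mutex
        with msgbuffer:
            pass  # queue the reply with xb_write_mutex already released
    else:
        with xb_write:
            with msgbuffer:
                pass  # queue the reply while still holding xb_write_mutex

    return has_inversion(edges)


if __name__ == "__main__":
    print("reported order conflicts:", run(drop_write_lock_early=False))
    print("early drop conflicts:   ", run(drop_write_lock_early=True))
```

With the early drop, the reply path never holds both mutexes at once, so only
the msgbuffer_mutex -> xb_write_mutex order remains and the cycle disappears.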


Juergen

