Re: [PATCH] xenbus: Use kref to track req lifetime
On 2025-05-07 05:27, Jürgen Groß wrote:
> On 06.05.25 23:09, Jason Andryuk wrote:
>> Marek reported seeing a NULL pointer fault in the xenbus_thread
>> callstack:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>> RIP: e030:__wake_up_common+0x4c/0x180
>> Call Trace:
>>  <TASK>
>>  __wake_up_common_lock+0x82/0xd0
>>  process_msg+0x18e/0x2f0
>>  xenbus_thread+0x165/0x1c0
>>
>> process_msg+0x18e is req->cb(req). req->cb is set to xs_wake_up(), a
>> thin wrapper around wake_up(), or xenbus_dev_queue_reply(). It seems
>> like it was xs_wake_up() in this case.
>>
>> It seems like req may have woken up the xs_wait_for_reply(), which
>> kfree()ed the req. When xenbus_thread resumes, it faults on the zero-ed
>> data.
>>
>> Linux Device Drivers 2nd edition states:
>> "Normally, a wake_up call can cause an immediate reschedule to happen,
>> meaning that other processes might run before wake_up returns."
>> ... which would match the behaviour observed.
>>
>> Change to keeping two krefs on each request. One for the caller, and
>> one for xenbus_thread. Each will kref_put() when finished, and the last
>> will free it.
>>
>> This use of kref matches the description in
>> Documentation/core-api/kref.rst
>>
>> Link: https://lore.kernel.org/xen-devel/ZO0WrR5J0xuwDIxW@mail-itl/
>> Reported-by: "Marek Marczykowski-Górecki" <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>> Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
>
> Reviewed-by: Juergen Gross <jgross@xxxxxxxx>

Thanks

>> ---
>> Kinda RFC-ish as I don't know if it fixes Marek's issue. This does seem
>> like the correct approach if we are seeing req free()ed out from under
>> xenbus_thread.
>
> I think your analysis is correct. When writing this code I didn't think
> of wake_up() needing to access req->wq _after_ having woken up the waiter.

Yes, this was tricky.

One other thing that makes me think this is correct. If this is the same
underlying issue:
https://lore.kernel.org/xen-devel/Z_lJTyVipJJEpWg2@mail-itl/

The failure is in the unlock:

pvqspinlock: lock 0xffff8881029af110 has corrupted value 0x0!
WARNING: CPU: 1 PID: 118 at kernel/locking/qspinlock_paravirt.h:504 __pv_queued_spin_unlock_slowpath+0xdc/0x120

Which makes me think the req was fine entering wake_up(), and it's only
found to be corrupt on the way out.

Regards,
Jason
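
For readers following the thread, the two-reference idea from the quoted commit message can be illustrated with a small user-space sketch. This is not the patch itself: struct demo_req, demo_req_alloc(), demo_req_put() and demo_frees are hypothetical names made up for illustration, and a C11 atomic counter stands in for the kernel's struct kref (where kref_init(), kref_get() and kref_put() would be used). The point it tries to show is that whichever side drops its reference last does the free, so neither side can free the object while the other may still touch it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request; NOT the real xenbus request struct. */
struct demo_req {
    atomic_int refs;   /* plays the role that struct kref plays in the kernel */
    int payload;
};

/* Counts how many times a request was actually freed (illustration only). */
static atomic_int demo_frees;

/*
 * Allocate a request that starts with two references:
 * one for the caller that waits for the reply, one for the thread
 * that delivers the reply and wakes the caller.
 */
static struct demo_req *demo_req_alloc(void)
{
    struct demo_req *req = calloc(1, sizeof(*req));

    if (!req)
        return NULL;
    atomic_init(&req->refs, 2);
    return req;
}

/*
 * Drop one reference. Only the holder that drops the last reference
 * frees the request. Returns true if this call freed it.
 */
static bool demo_req_put(struct demo_req *req)
{
    int old = atomic_fetch_sub(&req->refs, 1);

    assert(old > 0);            /* catches a put without a matching reference */
    if (old == 1) {
        atomic_fetch_add(&demo_frees, 1);
        free(req);
        return true;
    }
    return false;
}
```

In this sketch the waiter would call demo_req_put() once it has consumed the reply, and the delivering thread would call it only after its wake-up call has returned. The order of the two puts then no longer matters, which is the property the commit message describes.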