Re: NULL pointer dereference in xenbus_thread->...
On 30.04.25 16:29, Jason Andryuk wrote:
> On 2025-04-30 06:56, Marek Marczykowski-Górecki wrote:
>> On Tue, Apr 29, 2025 at 08:59:45PM -0400, Jason Andryuk wrote:
>>> Hi Marek,
>>>
>>> On Wed, Apr 23, 2025 at 8:42 AM Marek Marczykowski-Górecki
>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> I've got some more reports confirming it's still happening on Linux
>>>> 6.12.18. Is there anything I can do to help fix this? Maybe ask users
>>>> to enable some extra logging?
>>>
>>> Have you been able to capture a crash with debug symbols and run it
>>> through scripts/decode_stacktrace.sh?
>>
>> Not really, as I don't have debug symbols for this kernel. And I can't
>> reliably reproduce it myself (for me it happens about once a month...).
>> I can try reproducing the debug symbols; theoretically I should have
>> all the ingredients for it.
>>
>>> I'm curious what process_msg+0x18e/0x2f0 is. process_writes() has a
>>> direct call to wake_up(), but process_msg() calling req->cb(req) may
>>> be xs_wake_up(), which is a thin wrapper over wake_up().
>>
>> There is a code dump in the crash message, does it help?
>
> That's a little deeper in the call chain. If you have a vmlinux or
> bzImage with a matching stacktrace, that would work to look up the
> address in the disassembly. So if you don't have a matching pair, maybe
> try to catch it the next time.
>
>>> They make me wonder if req has been free()ed and at least partially
>>> zero-ed, but it still has wake_up() called. The call stack here is
>>> reminiscent of the one here
>>> https://lore.kernel.org/xen-devel/Z_lJTyVipJJEpWg2@mail-itl/
>>> and the unexpected value there is 0.
>>
>> That's an interesting idea; the one above I've seen only on 6.15-rc1
>> (and no later rc). But maybe?
>
> I am guessing, so I could be wrong. NULL pointer and unexpected zero
> value are both 0 at least. Also Whonix looks like it may use
> init_on_free=1 to zero memory at free time.

I have looked at this issue multiple times now. Just some remarks on what
IMO could go wrong (I didn't find any proof that this really happened,
though), in case someone wants to double check:

The most probable candidate for something going wrong is a use-after-free
of a struct xb_req_data element (normally named "req" in the related
code).

Some words about the not really obvious locking scheme used for those
elements: a "req" is owned by a thread as long as it isn't in any of the
lists it can live in (xs_reply_list or xb_write_list). Putting it into
one of the lists or removing it again requires holding the
xb_write_mutex. A "req" needs to be in a certain state when it is either
in one of the lists or owned by a worker thread.

I'm wondering whether it could happen that a thread waiting for a "req"
is woken up and the "req" is freed before the waiting thread can react.
Normally this shouldn't be possible, but "never say never".

What caught my eye today is the test of
req->state == xb_req_state_wait_reply in process_msg() just after
dropping the xb_write_mutex. This looks a little bit fishy, but OTOH the
request has just been removed from the xs_reply_list, so no mutex should
be needed for that test.

Possible candidates for such an "impossible" scenario include a wrap of
xs_request_id (not very probable, though, as having 4 billion Xenstore
requests "in flight" is rather unlikely IMHO).


Juergen
Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature.asc
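As an aid to following the ownership rules described in the mail above,
here is a minimal, hypothetical userspace sketch of the "req" life cycle.
It is not the actual drivers/xen/xenbus code: only the general flow
discussed above (unlink the req from the reply list under the write
mutex, check its state after dropping the mutex, wake the waiter, let the
waiter free the req) is modelled, and all names and the pthread-based
implementation are invented for illustration.

/*
 * Hypothetical model of the "req" life cycle -- NOT the real
 * drivers/xen/xenbus code.  "write_mutex" and "reply_list" stand in for
 * the xb_write_mutex and xs_reply_list mentioned in the mail; the rest
 * is made up so the ownership hand-over is easy to follow.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

enum req_state { REQ_WAIT_REPLY, REQ_GOT_REPLY };

struct req {
    struct req *next;        /* stand-in for the kernel list linkage   */
    unsigned int id;         /* stand-in for the xs_request_id value   */
    _Atomic int state;
    const char *body;
    pthread_mutex_t lock;    /* models the waiter's wait queue         */
    pthread_cond_t done;
};

static pthread_mutex_t write_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct req *reply_list;

/* Models the reply path: unlink the matching req, then complete it. */
static void deliver_reply(unsigned int id, const char *body)
{
    struct req **pp, *req = NULL;

    pthread_mutex_lock(&write_mutex);
    for (pp = &reply_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            req = *pp;
            *pp = req->next; /* off the list: now owned by this thread */
            break;
        }
    }
    pthread_mutex_unlock(&write_mutex);
    if (!req)
        return;

    /*
     * Mirrors the check the mail calls "a little bit fishy": the state
     * is read after the write mutex has been dropped.  It is still fine
     * here because the req is no longer reachable through the list, so
     * only this thread can touch it until the waiter is woken below.
     */
    if (atomic_load(&req->state) != REQ_WAIT_REPLY)
        return;              /* nothing is waiting for this reply      */

    pthread_mutex_lock(&req->lock);
    req->body = body;
    atomic_store(&req->state, REQ_GOT_REPLY);
    pthread_cond_signal(&req->done);  /* models req->cb() -> wake_up() */
    pthread_mutex_unlock(&req->lock);
    /* From here on req must not be touched: the waiter may free it.   */
}

/* Models the requester: sleep until woken, consume the reply, free.  */
static void *waiter(void *arg)
{
    struct req *req = arg;

    pthread_mutex_lock(&req->lock);
    while (atomic_load(&req->state) == REQ_WAIT_REPLY)
        pthread_cond_wait(&req->done, &req->lock);
    pthread_mutex_unlock(&req->lock);

    printf("reply for id %u: %s\n", req->id, req->body);
    free(req);   /* any later access by the reply side would be the
                    suspected use-after-free                           */
    return NULL;
}

int main(void)
{
    struct req *req = calloc(1, sizeof(*req));
    pthread_t t;

    req->id = 1;
    atomic_store(&req->state, REQ_WAIT_REPLY);
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->done, NULL);

    /* Queue the request: ownership passes to the reply list. */
    pthread_mutex_lock(&write_mutex);
    req->next = reply_list;
    reply_list = req;
    pthread_mutex_unlock(&write_mutex);

    pthread_create(&t, NULL, waiter, req);
    deliver_reply(1, "OK");
    pthread_join(t, NULL);
    return 0;
}

The two comments in deliver_reply() and waiter() mark the spots the
thread is concerned about: the state check performed after the write
mutex is dropped, and the point after which any further access to the
req by the reply side would be the suspected use-after-free.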