Re: kernel BUG around vmap/vfree - xen_enter_lazy_mmu()/xen_leave_lazy_mmu() - Linux 7.0-rc1
On 08/04/2026 04:47, Marek Marczykowski-Górecki wrote:
>> That may well be the case - it seems that xen_enter_lazy_mmu() is called
>> while already in lazy MMU mode (first splat), and xen_leave_lazy_mmu()
>> is called without being in lazy MMU mode (second splat). I expect this
>> is something specific to Xen, which I didn't get the chance to test.
>>
>> Looking at the series again I don't see anything obviously wrong, but I
>> think the riskiest change is commit 291b3abed657 ("x86/xen: use
>> lazy_mmu_state when context-switching") - worth trying to revert it.
> With that reverted (on top of 7.0-rc6, didn't update to rc7 yet), I
> still got a panic, although it might be a slightly different one:
>
> [ 8.099973] BUG: unable to handle page fault for address: ffff888008000670
> [ 8.100004] #PF: supervisor write access in kernel mode
> [ 8.100021] #PF: error_code(0x0003) - permissions violation
> [ 8.100037] PGD 3a00067 P4D 3a00067 PUD 3a01067 PMD 7cd7063 PTE
> 8000000008000021
> [ 8.100063] Oops: Oops: 0003 [#1] SMP PTI
> [ 8.100079] CPU: 0 UID: 0 PID: 226 Comm: kworker/0:2 Not tainted
> 7.0.0-0.rc6.1.qubes.1001.fc41.x86_64 #1 PREEMPT(full)
> [ 8.100110] Workqueue: events do_free_init
> [ 8.100126] RIP: 0010:native_set_pte+0x4/0x10
> [ 8.100145] Code: 00 03 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f
> 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 89
> 37 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90
> [ 8.100195] RSP: 0018:ffffc90000c97c48 EFLAGS: 00010287
> [ 8.100212] RAX: e00c4f3d8b48c03e RBX: ffff888008000670 RCX:
> e00000000000003e
> [ 8.100234] RDX: e00c4f3d8b48c13e RSI: e00c4f3d8b48c03e RDI:
> ffff888008000670
> [ 8.100260] RBP: e00c4f3d8b48c13e R08: 0000000000000000 R09:
> 0000000000000001
> [ 8.100282] R10: 0000003b0c274b73 R11: e00000000000013e R12:
> ffffc90000c97cf0
> [ 8.100304] R13: ffffffffc04ce000 R14: fffc4f3d8b48cfff R15:
> e00000000000013e
> [ 8.100327] FS: 0000000000000000(0000) GS:ffff888094e81000(0000)
> knlGS:0000000000000000
> [ 8.100350] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8.100369] CR2: ffff888008000670 CR3: 000000000242e003 CR4:
> 00000000001706f0
> [ 8.100394] Call Trace:
> [ 8.100404] <TASK>
> [ 8.100413] __change_page_attr+0x24f/0x350
> [ 8.100429] __change_page_attr_set_clr+0x61/0xd0
> [ 8.100446] change_page_attr_set_clr+0x103/0x1a0
> [ 8.100467] set_memory_nx+0x39/0x50
> [ 8.100481] __execmem_cache_free+0x35/0xb0
> [ 8.100496] execmem_free+0x9f/0x180
> [ 8.100510] ? nft_chain_nat_exit+0xe70/0xe70 [nft_chain_nat]
> [ 8.100531] do_free_init+0x2e/0x60
> [ 8.100545] process_one_work+0x198/0x390
> [ 8.100559] worker_thread+0x1af/0x320
> [ 8.100573] ? __pfx_worker_thread+0x10/0x10
> [ 8.103338] kthread+0xe3/0x120
> [ 8.103355] ? __pfx_kthread+0x10/0x10
> [ 8.103369] ret_from_fork+0x19e/0x260
> [ 8.103384] ? __pfx_kthread+0x10/0x10
> [ 8.103397] ret_from_fork_asm+0x1a/0x30
> [ 8.103412] </TASK>
> [ 8.103421] Modules linked in: xenfs nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_redir nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables binfmt_misc intel_rapl_msr
> intel_rapl_common ghash_clmulni_intel xen_netfront xen_privcmd xen_gntdev
> xen_gntalloc xen_blkback xen_evtchn fuse loop nfnetlink ip_tables overlay
> xen_blkfront
> [ 8.103529] CR2: ffff888008000670
> [ 8.103542] ---[ end trace 0000000000000000 ]---
> [ 8.103558] RIP: 0010:native_set_pte+0x4/0x10
> [ 8.103576] Code: 00 03 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f
> 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 89
> 37 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90
> [ 8.103625] RSP: 0018:ffffc90000c97c48 EFLAGS: 00010287
> [ 8.103641] RAX: e00c4f3d8b48c03e RBX: ffff888008000670 RCX:
> e00000000000003e
> [ 8.103664] RDX: e00c4f3d8b48c13e RSI: e00c4f3d8b48c03e RDI:
> ffff888008000670
> [ 8.103686] RBP: e00c4f3d8b48c13e R08: 0000000000000000 R09:
> 0000000000000001
> [ 8.103708] R10: 0000003b0c274b73 R11: e00000000000013e R12:
> ffffc90000c97cf0
> [ 8.103730] R13: ffffffffc04ce000 R14: fffc4f3d8b48cfff R15:
> e00000000000013e
> [ 8.103753] FS: 0000000000000000(0000) GS:ffff888094e81000(0000)
> knlGS:0000000000000000
> [ 8.103775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8.103794] CR2: ffff888008000670 CR3: 000000000242e003 CR4:
> 00000000001706f0
> [ 8.103820] Kernel panic - not syncing: Fatal exception
> [ 8.103929] Kernel Offset: disabled
That is probably the same root cause indeed (lazy MMU appearing disabled
in __xen_set_pte() while it should be enabled).
>> If
>> that doesn't help, I'd suggest bisecting the following range:
>> 58852f24f956..291b3abed657
> It will take some time, as the issue doesn't happen every time.
Understood. Here are the commits that are expected to have a functional
effect on x86 (in reverse chronological order):
- 291b3abed657 ("x86/xen: use lazy_mmu_state when context-switching")
- 5ab246749569 ("mm: enable lazy_mmu sections to nest")
- 9273dfaeaca8 ("mm: bail out of lazy_mmu_mode_* in interrupt context")
- 66bdd779d344 ("x86/xen: simplify flush_lazy_mmu()")
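
Since the issue does not reproduce on every boot, a single clean boot is
not enough to call a commit good during the bisect. One possible way to
handle that with `git bisect run` is sketched below. `repeat_test`,
`repeat-test.sh` and `./boot-test.sh` are hypothetical names used for
illustration (nothing like them exists in the kernel tree), and the count
of 10 runs is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper for bisecting an intermittent failure.
# repeat_test N CMD [ARGS...]: run CMD up to N times, stopping at the
# first failure. Returns 0 only if all N runs succeed, 1 otherwise.
repeat_test() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}

# Possible usage, assuming this file is saved as repeat-test.sh and that
# ./boot-test.sh (a placeholder) boots the guest and exits non-zero when
# the panic shows up:
#
#   git bisect start 291b3abed657 58852f24f956   # <bad> <good>
#   git bisect run sh -c '. ./repeat-test.sh; repeat_test 10 ./boot-test.sh'
```

An exit status of 1 tells `git bisect run` the commit is bad and 0 that it
is good, so a commit is only marked good after every repeated run passed.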
Hope that helps, let me know if you have any further information. It
would be worth enabling CONFIG_DEBUG_VM and then checking if any WARN()
splat appears in the log.
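
For the CONFIG_DEBUG_VM check, a minimal sketch of one way to do it,
assuming the guest console output is saved to a file. `find_warn_splats`
and `console.log` are hypothetical names; it relies only on WARN() output
containing the string "WARNING:":

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a saved kernel log that contain
# "WARNING:" (the header of a WARN() splat), prefixed with line numbers.
# Returns non-zero if no such line is found.
find_warn_splats() {
    grep -n 'WARNING:' "$1"
}

# Possible usage, from the top of a kernel source tree:
#
#   scripts/config --enable DEBUG_VM && make olddefconfig
#   # ...rebuild, boot the guest, save its console output to console.log...
#   find_warn_splats console.log
```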
- Kevin