Re: kernel BUG around vmap/vfree - xen_enter_lazy_mmu()/xen_leave_lazy_mmu() - Linux 7.0-rc1
On 26.02.26 14:27, Andrew Cooper wrote:
> On 26/02/2026 1:17 pm, Marek Marczykowski-Górecki wrote:
>> Hi,
>>
>> When testing Linux 7.0-rc1 in PV dom0, I hit the following panic sometimes:
>>
>> [ 436.849614] ------------[ cut here ]------------
>> [ 436.849669] kernel BUG at arch/x86/include/asm/xen/hypervisor.h:78!
>> [ 436.849693] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>> [ 436.849710] CPU: 3 UID: 0 PID: 4021 Comm: kworker/u25:1 Not tainted 7.0.0-0.rc1.1.qubes.1001.fc41.x86_64 #1 PREEMPT(full)
>> [ 436.849729] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
>> [ 436.849743] Workqueue: i915_flip intel_atomic_commit_work [i915]
>> [ 436.850226] RIP: e030:xen_enter_lazy_mmu+0x24/0x30
>> [ 436.850245] Code: 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 8b 05 b8 e5 02 03 85 c0 75 10 65 c7 05 a9 e5 02 03 01 00 00 00 c3 cc cc cc cc <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
>> [ 436.850270] RSP: e02b:ffffc90045727a68 EFLAGS: 00010202
>> [ 436.850283] RAX: 0000000000000001 RBX: ffff8881042fa6d0 RCX: 000fffffffe00000
>> [ 436.850296] RDX: 0000000000000001 RSI: ffff88810a5a2980 RDI: 0000000000000000
>> [ 436.850308] RBP: ffffc90049eda000 R08: ffffc90049edc000 R09: ffffc90049edc000
>> [ 436.850320] R10: ffffc90049edc000 R11: ffffc90049edbfff R12: ffffc90049edc000
>> [ 436.850332] R13: ffffc90045727bb0 R14: ffffc90045727b28 R15: 800000000000006b
>> [ 436.850356] FS: 0000000000000000(0000) GS:ffff888201e6e000(0000) knlGS:0000000000000000
>> [ 436.850371] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 436.850383] CR2: 00006543dbade250 CR3: 0000000115ef1000 CR4: 0000000000050660
>> [ 436.850401] Call Trace:
>> [ 436.850410] <TASK>
>> [ 436.850420] vmap_pages_pud_range+0x47c/0x530
>> [ 436.850439] vmap_small_pages_range_noflush+0x1f1/0x2b0
>> [ 436.850451] ? __get_vm_area_node+0x10a/0x170
>> [ 436.850465] vmap+0x79/0xd0
>> [ 436.850476] i915_gem_object_map_page+0x13b/0x210 [i915]
>> [ 436.850812] i915_gem_object_pin_map+0x1e2/0x210 [i915]
>> [ 436.851123] i915_gem_object_pin_map_unlocked+0x2d/0xa0 [i915]
>> [ 436.851424] intel_dsb_buffer_create+0xed/0x1a0 [i915]
>> [ 436.851778] intel_dsb_prepare+0xca/0x1a0 [i915]
>> [ 436.852110] intel_atomic_dsb_finish+0x92/0x350 [i915]
>> [ 436.852456] intel_atomic_commit_tail+0x326/0xd40 [i915]
>> [ 436.852769] process_one_work+0x18d/0x380
>> [ 436.852779] worker_thread+0x196/0x300
>> [ 436.852787] ? __pfx_worker_thread+0x10/0x10
>> [ 436.852796] kthread+0xe3/0x120
>> [ 436.852805] ? __pfx_kthread+0x10/0x10
>> [ 436.852815] ret_from_fork+0x19e/0x260
>> [ 436.852824] ? __pfx_kthread+0x10/0x10
>> [ 436.852832] ret_from_fork_asm+0x1a/0x30
>> [ 436.852842] </TASK>
>> [ 436.852847] Modules linked in: snd_seq_dummy snd_hrtimer snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc269 snd_hda_codec_realtek_lib snd_hda_scodec_component snd_hda_codec_generic snd_hda_intel snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_sdw_utils snd_soc_acpi crc8 intel_rapl_msr soundwire_bus intel_rapl_common snd_soc_sdca snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec vfat intel_uncore_frequency_common fat snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi snd_hwdep intel_powerclamp snd_soc_core iwlwifi snd_compress spi_nor iTCO_wdt ac97_bus intel_pmc_bxt ee1004 mtd snd_pcm_dmaengine snd_seq cfg80211 snd_seq_device pcspkr spi_intel_pci snd_pcm rfkill spi_intel snd_timer snd
>> [ 436.852939] i2c_i801 soundcore i2c_smbus idma64 intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_hid intel_pmc_ssram_telemetry intel_scu_pltdrv sparse_keymap joydev loop fuse xenfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock zram vmw_vmci lz4hc_compress lz4_compress dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm_helper i915 i2c_algo_bit drm_buddy hid_multitouch i2c_hid_acpi ghash_clmulni_intel video nvme wmi ttm i2c_hid nvme_core nvme_keyring drm_display_helper nvme_auth xhci_pci pinctrl_tigerlake thunderbolt hkdf cec xhci_hcd intel_vsec serio_raw xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput i2c_dev
>> [ 436.853183] ---[ end trace 0000000000000000 ]---
>>
>> or this:
>>
>> [ 548.736884] ------------[ cut here ]------------
>> [ 548.736907] kernel BUG at arch/x86/include/asm/xen/hypervisor.h:85!
>> [ 548.736923] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>> [ 548.736935] CPU: 0 UID: 0 PID: 206 Comm: kworker/0:2 Not tainted 7.0.0-0.rc1.1.qubes.1001.fc41.x86_64 #1 PREEMPT(full)
>> [ 548.736949] Hardware name: LENOVO 2347A45/2347A45, BIOS CBET4000 Nitrokey-v0.2.0-2608-ga649597 01/01/1970
>> [ 548.736962] Workqueue: events delayed_vfree_work
>> [ 548.736976] RIP: e030:xen_leave_lazy_mmu+0x44/0x50
>> [ 548.736989] Code: 02 03 83 f8 01 75 23 65 c7 05 6c e4 02 03 00 00 00 00 65 ff 0d 7d b8 02 03 74 05 c3 cc cc cc cc e8 61 5d fd ff c3 cc cc cc cc <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
>> [ 548.737010] RSP: e02b:ffffc90040607cf0 EFLAGS: 00010297
>> [ 548.737018] RAX: 0000000000000000 RBX: ffff888164a70408 RCX: 0000000000000000
>> [ 548.737029] RDX: 0000000000000000 RSI: 000ffffffffff000 RDI: ffff8881069c0000
>> [ 548.737039] RBP: ffffc90049681000 R08: ffffc90049681000 R09: 0000000000000027
>> [ 548.737050] R10: 0000000000000027 R11: fefefefefefefeff R12: ffffc90049681000
>> [ 548.737060] R13: ffff8881002fd258 R14: 0000000000000000 R15: ffffc90040607dac
>> [ 548.737079] FS: 0000000000000000(0000) GS:ffff8881f88ee000(0000) knlGS:0000000000000000
>> [ 548.737090] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 548.737099] CR2: 000055576c2e6058 CR3: 000000010d47b000 CR4: 0000000000050660
>> [ 548.737115] Call Trace:
>> [ 548.737123] <TASK>
>> [ 548.737128] vunmap_pmd_range.isra.0+0x1f1/0x2e0
>> [ 548.737142] vunmap_p4d_range+0x17d/0x290
>> [ 548.737151] __vunmap_range_noflush+0x182/0x1d0
>> [ 548.737161] ? _raw_spin_unlock+0xe/0x30
>> [ 548.737171] remove_vm_area+0x40/0x70
>> [ 548.737180] vfree.part.0+0x1b/0x290
>> [ 548.737189] delayed_vfree_work+0x35/0x50
>> [ 548.737198] process_one_work+0x18d/0x380
>> [ 548.737207] worker_thread+0x196/0x300
>> [ 548.737215] ? __pfx_worker_thread+0x10/0x10
>> [ 548.737224] kthread+0xe3/0x120
>> [ 548.737233] ? __pfx_kthread+0x10/0x10
>> [ 548.737242] ret_from_fork+0x19e/0x260
>> [ 548.737250] ? __pfx_kthread+0x10/0x10
>> [ 548.737258] ret_from_fork_asm+0x1a/0x30
>> [ 548.737269] </TASK>
>> [ 548.737274] Modules linked in: vfat fat snd_seq_dummy snd_hrtimer ath9k ath9k_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi ath9k_hw snd_hda_codec_alc269 snd_hda_codec_realtek_lib snd_hda_scodec_component snd_hda_codec_generic snd_hda_intel snd_hda_codec mac80211 snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi snd_hwdep ath snd_seq snd_seq_device snd_ctl_led cfg80211 snd_pcm at24 thinkpad_acpi intel_rapl_msr i2c_i801 snd_timer sparse_keymap iTCO_wdt intel_rapl_common platform_profile intel_powerclamp intel_pmc_bxt pcspkr i2c_smbus rfkill libarc4 snd soundcore mei_me e1000e mei joydev lpc_ich loop fuse xenfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock zram vmw_vmci lz4hc_compress lz4_compress dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt i915 i2c_algo_bit drm_buddy ghash_clmulni_intel ttm sdhci_pci drm_display_helper sdhci_uhs2 sdhci video xhci_pci cqhci wmi cec xhci_hcd ehci_pci mmc_core ehci_hcd serio_raw xen_acpi_processor xen_privcmd xen_pciback
>> [ 548.737348] xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput i2c_dev
>> [ 548.737469] ---[ end trace 0000000000000000 ]---
>>
>> I don't have clear pattern when this happens, one was during host suspend, but the other was during "normal" test run (starting/stopping domUs and running stuff around them). Note also one of those is Intel and the other AMD, so it isn't really hardware specific.
>>
>> Slightly more details with links (especially serial0.txt in the logs tab) at https://github.com/QubesOS/qubes-linux-kernel/pull/662#issuecomment-3963326188
>>
>> Any idea?
>
> That looks like the issue Juergen fixed with: https://lore.kernel.org/xen-devel/20260220123715.834848-1-jgross@xxxxxxxx/

No, it doesn't. The fix is already in rc1, and the crash was quite early during boot (before any secondary CPUs were brought up).

I guess this problem is related to the lazy_mmu_state series [1].


Juergen

[1]: https://lore.kernel.org/lkml/20251215150323.2218608-1-kevin.brodsky@xxxxxxx/