[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Lockdep show 6.6-rc regression in Xen HVM CPU hotplug
On Tue, 2023-10-24 at 08:43 +0100, David Woodhouse wrote: > On Tue, 2023-10-24 at 08:53 +0200, Juergen Gross wrote: > > > > I'm puzzled. This path doesn't contain any of the RCU usage I've added in > > commit 87797fad6cce. > > > > Are you sure that with just reverting commit 87797fad6cce the issue doesn't > > manifest anymore? I'd rather expect commit 721255b9826b having caused this > > behavior, just telling from the messages above. > > Retesting in the cold light of day, yes. Using v6.6-rc5 which is the > parent commit of the offending 87797fad6cce. I don't know if this is relevant or just noise, but this only happens under true Xen (I've only tested 4.17.1, as noted). Under Qemu I get the same warning on offlining, but it does come back. However, the reason I was looking at this in the first place was because when I enable XEN_HVM_CPUID_UPCALL_VECTOR under Qemu¹ to make the guest use the per-vCPU local APIC upcall, I get a different failure mode on onlining. ¹ https://lore.kernel.org/qemu-devel/20231019154020.99080-10-dwmw2@xxxxxxxxxxxxx/ I don't know if it's related to interrupts being enabled too early, which is what some of the other warnings at *online* time in my first email seemed to be complaining about? This one seems to be a timer IRQ storm (or just never finishing processing the event bitmap?). Not sure if this is just because Qemu manages to deliver the first event channel a bit sooner than Xen, perhaps because it should have cleared something in the vCPU state when offlining? Maybe it's an additional clue... maybe just noise and my fault in Qemu or KVM. [root@localhost cpu1]# echo 1 > online kvm_xen_set_vcpu_callback callback vcpu 1 vector 243 [ 33.451969] installing Xen timer for CPU 1 [ 33.453547] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 60.000948] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [khugepaged:37] [ 60.000985] [ 60.000987] ============================= [ 60.000988] [ BUG: Invalid wait context ] [ 60.000989] 6.6.0-rc5+ #1376 Not tainted [ 60.000991] ----------------------------- [ 60.000992] khugepaged/37 is trying to lock: [ 60.000993] ffffffff9841abd8 (&port_lock_key){-.-.}-{3:3}, at: serial8250_console_write+0x460/0x4e0 [ 60.001016] other info that might help us debug this: [ 60.001016] context-{2:2} [ 60.001017] 6 locks held by khugepaged/37: [ 60.001019] #0: ffff88f301c93620 (&mm->mmap_lock){++++}-{4:4}, at: collapse_huge_page+0x2d1/0x890 [ 60.001038] #1: ffff88f3075aa8e8 (&anon_vma->rwsem){+.+.}-{4:4}, at: collapse_huge_page+0x40a/0x890 [ 60.001048] #2: ffffffff96c03820 (rcu_read_lock){....}-{1:3}, at: xen_evtchn_do_upcall+0x61/0x1b0 [ 60.001062] #3: ffffffff96c007e0 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12e/0x1c0 [ 60.001068] #4: ffffffff96c00830 (console_srcu){....}-{0:0}, at: console_flush_all+0x21/0x260 [ 60.001072] #5: ffffffff96880420 (console_owner){-.-.}-{0:0}, at: console_emit_next_record+0x113/0x330 [ 60.001078] stack backtrace: [ 60.001079] CPU: 0 PID: 37 Comm: khugepaged Not tainted 6.6.0-rc5+ #1376 [ 60.001082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014 [ 60.001084] Call Trace: [ 60.001086] <IRQ> [ 60.001093] dump_stack_lvl+0x57/0x90 [ 60.001098] __lock_acquire+0x7bb/0xbb0 [ 60.001105] lock_acquire.part.0+0xad/0x240 [ 60.001108] ? serial8250_console_write+0x460/0x4e0 [ 60.001114] ? rcu_is_watching+0xd/0x40 [ 60.001123] ? lock_acquire+0xf2/0x110 [ 60.001128] __raw_spin_lock_irqsave+0x42/0x60 [ 60.001131] ? serial8250_console_write+0x460/0x4e0 [ 60.001135] serial8250_console_write+0x460/0x4e0 [ 60.001140] ? lock_acquire.part.0+0xbd/0x240 [ 60.001145] ? rcu_is_watching+0xd/0x40 [ 60.001148] ? lock_acquire+0xf2/0x110 [ 60.001155] ? console_emit_next_record+0x113/0x330 [ 60.001157] console_emit_next_record+0x13c/0x330 [ 60.001159] ? console_emit_next_record+0x113/0x330 [ 60.001165] console_flush_all+0xed/0x260 [ 60.001167] ? console_flush_all+0x21/0x260 [ 60.001170] console_unlock+0x52/0xf0 [ 60.001178] vprintk_emit+0x173/0x1c0 [ 60.001182] _printk+0x60/0x80 [ 60.001191] watchdog_timer_fn+0x216/0x2a0 [ 60.001205] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 60.001209] __hrtimer_run_queues+0x1b2/0x380 [ 60.001217] hrtimer_interrupt+0xfe/0x240 [ 60.001222] xen_timer_interrupt+0x29/0x40 [ 60.001233] __handle_irq_event_percpu+0x87/0x240 [ 60.001238] handle_irq_event_percpu+0xf/0x40 [ 60.001242] handle_percpu_irq+0x4f/0x70 [ 60.001247] handle_irq_desc+0x3a/0x50 [ 60.001250] evtchn_2l_handle_events+0x254/0x2a0 [ 60.001254] ? rcu_is_watching+0xd/0x40 [ 60.001257] ? lock_acquire+0xf2/0x110 [ 60.001262] xen_evtchn_do_upcall+0xa1/0x1b0 [ 60.001266] __sysvec_xen_hvm_callback+0x47/0x80 [ 60.001271] sysvec_xen_hvm_callback+0x99/0xc0 [ 60.001283] </IRQ> [ 60.001284] <TASK> [ 60.001288] asm_sysvec_xen_hvm_callback+0x16/0x20 [ 60.001308] RIP: 0010:smp_call_function_many_cond+0x119/0x660 [ 60.001312] Code: 7e 08 48 63 d0 e8 17 90 44 00 3b 05 e1 bf c5 01 73 25 48 63 d0 49 8b 36 48 03 34 d5 60 dc 55 96 8b 56 08 83 e2 01 74 0a f3 90 <8b> 4e 08 83 e1 01 75 f6 83 c0 01 eb c1 48 83 c4 38 5b 5d 41 5c 41 [ 60.001315] RSP: 0018:ffffae458014fc00 EFLAGS: 00000202 [ 60.001317] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001 [ 60.001319] RDX: 0000000000000001 RSI: ffff88f33ecb3e00 RDI: ffff88f301112790 [ 60.001320] RBP: 0000000000000000 R08: ffffffff96015000 R09: fffffffffffffffc [ 60.001322] R10: ffffffff95cacbac R11: ffff88f301bd0d48 R12: 0000000000000000 [ 60.001323] R13: 0000000000000001 R14: ffff88f33ec2f8c0 R15: 0000000000000001 [ 60.001326] ? mtree_load+0x8c/0x590 [ 60.001333] ? __pfx_tlb_remove_table_smp_sync+0x10/0x10 [ 60.001346] ? __pfx_tlb_remove_table_smp_sync+0x10/0x10 [ 60.001350] smp_call_function+0x39/0x70 [ 60.001354] collapse_huge_page+0x509/0x890 [ 60.001361] hpage_collapse_scan_pmd+0x376/0x8b0 [ 60.001369] khugepaged_scan_mm_slot.constprop.0+0x28e/0x500 [ 60.001377] khugepaged+0xd5/0x240 [ 60.001380] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 60.001388] ? lockdep_hardirqs_on+0x7d/0x100 [ 60.001396] ? __pfx_khugepaged+0x10/0x10 [ 60.001400] kthread+0xf6/0x130 [ 60.001405] ? __pfx_kthread+0x10/0x10 [ 60.001408] ret_from_fork+0x30/0x50 [ 60.001417] ? __pfx_kthread+0x10/0x10 [ 60.001419] ret_from_fork_asm+0x1b/0x30 [ 60.001427] </TASK> [ 60.062651] Modules linked in: [ 60.063126] irq event stamp: 58300 [ 60.063680] hardirqs last enabled at (58299): [<ffffffff95e01046>] asm_sysvec_xen_hvm_callback+0x16/0x20 [ 60.065189] hardirqs last disabled at (58300): [<ffffffff95ccc1ea>] sysvec_xen_hvm_callback+0xa/0xc0 [ 60.066633] softirqs last enabled at (58098): [<ffffffff95cec89c>] __do_softirq+0x31c/0x423 [ 60.067787] softirqs last disabled at (58089): [<ffffffff950bab91>] __irq_exit_rcu+0x91/0x100 [ 60.068928] CPU: 0 PID: 37 Comm: khugepaged Not tainted 6.6.0-rc5+ #1376 [ 60.069786] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014 [ 60.071340] RIP: 0010:smp_call_function_many_cond+0x119/0x660 [ 60.072100] Code: 7e 08 48 63 d0 e8 17 90 44 00 3b 05 e1 bf c5 01 73 25 48 63 d0 49 8b 36 48 03 34 d5 60 dc 55 96 8b 56 08 83 e2 01 74 0a f3 90 <8b> 4e 08 83 e1 01 75 f6 83 c0 01 eb c1 48 83 c4 38 5b 5d 41 5c 41 [ 60.074795] RSP: 0018:ffffae458014fc00 EFLAGS: 00000202 [ 60.075557] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001 [ 60.076509] RDX: 0000000000000001 RSI: ffff88f33ecb3e00 RDI: ffff88f301112790 [ 60.077465] RBP: 0000000000000000 R08: ffffffff96015000 R09: fffffffffffffffc [ 60.078476] R10: ffffffff95cacbac R11: ffff88f301bd0d48 R12: 0000000000000000 [ 60.079434] R13: 0000000000000001 R14: ffff88f33ec2f8c0 R15: 0000000000000001 [ 60.080589] FS: 0000000000000000(0000) GS:ffff88f33ec00000(0000) knlGS:0000000000000000 [ 60.081626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 60.082459] CR2: 00007f4cceba6818 CR3: 00000000021ec000 CR4: 00000000000006f0 [ 60.083672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 60.085047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 60.085994] Call Trace: [ 60.086283] <IRQ> [ 60.086530] ? watchdog_timer_fn+0x230/0x2a0 [ 60.087084] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 60.087603] ? __hrtimer_run_queues+0x1b2/0x380 [ 60.088307] ? hrtimer_interrupt+0xfe/0x240 [ 60.088931] ? xen_timer_interrupt+0x29/0x40 [ 60.089578] ? __handle_irq_event_percpu+0x87/0x240 [ 60.090420] ? handle_irq_event_percpu+0xf/0x40 [ 60.091083] ? handle_percpu_irq+0x4f/0x70 [ 60.091646] ? handle_irq_desc+0x3a/0x50 [ 60.092166] ? evtchn_2l_handle_events+0x254/0x2a0 [ 60.092716] ? rcu_is_watching+0xd/0x40 [ 60.093248] ? lock_acquire+0xf2/0x110 [ 60.093814] ? xen_evtchn_do_upcall+0xa1/0x1b0 [ 60.094482] ? __sysvec_xen_hvm_callback+0x47/0x80 [ 60.095262] ? sysvec_xen_hvm_callback+0x99/0xc0 [ 60.095978] </IRQ> [ 60.096295] <TASK> [ 60.096610] ? asm_sysvec_xen_hvm_callback+0x16/0x20 [ 60.097323] ? mtree_load+0x8c/0x590 [ 60.097872] ? smp_call_function_many_cond+0x119/0x660 [ 60.098796] ? __pfx_tlb_remove_table_smp_sync+0x10/0x10 [ 60.099599] ? __pfx_tlb_remove_table_smp_sync+0x10/0x10 [ 60.100515] smp_call_function+0x39/0x70 [ 60.101119] collapse_huge_page+0x509/0x890 [ 60.101920] hpage_collapse_scan_pmd+0x376/0x8b0 [ 60.102642] khugepaged_scan_mm_slot.constprop.0+0x28e/0x500 [ 60.103378] khugepaged+0xd5/0x240 [ 60.103884] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 60.104659] ? lockdep_hardirqs_on+0x7d/0x100 [ 60.105364] ? __pfx_khugepaged+0x10/0x10 [ 60.105945] kthread+0xf6/0x130 [ 60.106363] ? __pfx_kthread+0x10/0x10 [ 60.106808] ret_from_fork+0x30/0x50 [ 60.107239] ? __pfx_kthread+0x10/0x10 [ 60.107769] ret_from_fork_asm+0x1b/0x30 [ 60.108430] </TASK> Attachment:
smime.p7s
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |