[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Lockdep show 6.6-rc regression in Xen HVM CPU hotplug



On Tue, 2023-10-24 at 08:43 +0100, David Woodhouse wrote:
> On Tue, 2023-10-24 at 08:53 +0200, Juergen Gross wrote:
> > 
> > I'm puzzled. This path doesn't contain any of the RCU usage I've added in
> > commit 87797fad6cce.
> > 
> > Are you sure that with just reverting commit 87797fad6cce the issue doesn't
> > manifest anymore? I'd rather expect commit 721255b9826b having caused this
> > behavior, just telling from the messages above.
> 
> Retesting in the cold light of day, yes. Using v6.6-rc5 which is the
> parent commit of the offending 87797fad6cce.

I don't know if this is relevant or just noise, but this only happens
under true Xen (I've only tested 4.17.1, as noted).

Under Qemu I get the same warning on offlining, but it does come back.

However, the reason I was looking at this in the first place was
because when I enable XEN_HVM_CPUID_UPCALL_VECTOR under Qemu¹ to make
the guest use the per-vCPU local APIC upcall, I get a different failure
mode on onlining.

¹ 
https://lore.kernel.org/qemu-devel/20231019154020.99080-10-dwmw2@xxxxxxxxxxxxx/

I don't know if it's related to interrupts being enabled too early,
which is what some of the other warnings at *online* time in my first
email seemed to be complaining about? This one seems to be a timer IRQ
storm (or just never finishing processing the event bitmap?). 

Not sure if this is just because Qemu manages to deliver the first
event channel a bit sooner than Xen, perhaps because it should have
cleared something in the vCPU state when offlining?

Maybe it's an additional clue... maybe just noise and my fault in Qemu or KVM.

[root@localhost cpu1]# echo 1 > online 
kvm_xen_set_vcpu_callback callback vcpu 1 vector 243
[   33.451969] installing Xen timer for CPU 1
[   33.453547] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   60.000948] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [khugepaged:37]
[   60.000985] 
[   60.000987] =============================
[   60.000988] [ BUG: Invalid wait context ]
[   60.000989] 6.6.0-rc5+ #1376 Not tainted
[   60.000991] -----------------------------
[   60.000992] khugepaged/37 is trying to lock:
[   60.000993] ffffffff9841abd8 (&port_lock_key){-.-.}-{3:3}, at: 
serial8250_console_write+0x460/0x4e0
[   60.001016] other info that might help us debug this:
[   60.001016] context-{2:2}
[   60.001017] 6 locks held by khugepaged/37:
[   60.001019]  #0: ffff88f301c93620 (&mm->mmap_lock){++++}-{4:4}, at: 
collapse_huge_page+0x2d1/0x890
[   60.001038]  #1: ffff88f3075aa8e8 (&anon_vma->rwsem){+.+.}-{4:4}, at: 
collapse_huge_page+0x40a/0x890
[   60.001048]  #2: ffffffff96c03820 (rcu_read_lock){....}-{1:3}, at: 
xen_evtchn_do_upcall+0x61/0x1b0
[   60.001062]  #3: ffffffff96c007e0 (console_lock){+.+.}-{0:0}, at: 
vprintk_emit+0x12e/0x1c0
[   60.001068]  #4: ffffffff96c00830 (console_srcu){....}-{0:0}, at: 
console_flush_all+0x21/0x260
[   60.001072]  #5: ffffffff96880420 (console_owner){-.-.}-{0:0}, at: 
console_emit_next_record+0x113/0x330
[   60.001078] stack backtrace:
[   60.001079] CPU: 0 PID: 37 Comm: khugepaged Not tainted 6.6.0-rc5+ #1376
[   60.001082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014
[   60.001084] Call Trace:
[   60.001086]  <IRQ>
[   60.001093]  dump_stack_lvl+0x57/0x90
[   60.001098]  __lock_acquire+0x7bb/0xbb0
[   60.001105]  lock_acquire.part.0+0xad/0x240
[   60.001108]  ? serial8250_console_write+0x460/0x4e0
[   60.001114]  ? rcu_is_watching+0xd/0x40
[   60.001123]  ? lock_acquire+0xf2/0x110
[   60.001128]  __raw_spin_lock_irqsave+0x42/0x60
[   60.001131]  ? serial8250_console_write+0x460/0x4e0
[   60.001135]  serial8250_console_write+0x460/0x4e0
[   60.001140]  ? lock_acquire.part.0+0xbd/0x240
[   60.001145]  ? rcu_is_watching+0xd/0x40
[   60.001148]  ? lock_acquire+0xf2/0x110
[   60.001155]  ? console_emit_next_record+0x113/0x330
[   60.001157]  console_emit_next_record+0x13c/0x330
[   60.001159]  ? console_emit_next_record+0x113/0x330
[   60.001165]  console_flush_all+0xed/0x260
[   60.001167]  ? console_flush_all+0x21/0x260
[   60.001170]  console_unlock+0x52/0xf0
[   60.001178]  vprintk_emit+0x173/0x1c0
[   60.001182]  _printk+0x60/0x80
[   60.001191]  watchdog_timer_fn+0x216/0x2a0
[   60.001205]  ? __pfx_watchdog_timer_fn+0x10/0x10
[   60.001209]  __hrtimer_run_queues+0x1b2/0x380
[   60.001217]  hrtimer_interrupt+0xfe/0x240
[   60.001222]  xen_timer_interrupt+0x29/0x40
[   60.001233]  __handle_irq_event_percpu+0x87/0x240
[   60.001238]  handle_irq_event_percpu+0xf/0x40
[   60.001242]  handle_percpu_irq+0x4f/0x70
[   60.001247]  handle_irq_desc+0x3a/0x50
[   60.001250]  evtchn_2l_handle_events+0x254/0x2a0
[   60.001254]  ? rcu_is_watching+0xd/0x40
[   60.001257]  ? lock_acquire+0xf2/0x110
[   60.001262]  xen_evtchn_do_upcall+0xa1/0x1b0
[   60.001266]  __sysvec_xen_hvm_callback+0x47/0x80
[   60.001271]  sysvec_xen_hvm_callback+0x99/0xc0
[   60.001283]  </IRQ>
[   60.001284]  <TASK>
[   60.001288]  asm_sysvec_xen_hvm_callback+0x16/0x20
[   60.001308] RIP: 0010:smp_call_function_many_cond+0x119/0x660
[   60.001312] Code: 7e 08 48 63 d0 e8 17 90 44 00 3b 05 e1 bf c5 01 73 25 48 
63 d0 49 8b 36 48 03 34 d5 60 dc 55 96 8b 56 08 83 e2 01 74 0a f3 90 <8b> 4e 08 
83 e1 01 75 f6 83 c0 01 eb c1 48 83 c4 38 5b 5d 41 5c 41
[   60.001315] RSP: 0018:ffffae458014fc00 EFLAGS: 00000202
[   60.001317] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
[   60.001319] RDX: 0000000000000001 RSI: ffff88f33ecb3e00 RDI: ffff88f301112790
[   60.001320] RBP: 0000000000000000 R08: ffffffff96015000 R09: fffffffffffffffc
[   60.001322] R10: ffffffff95cacbac R11: ffff88f301bd0d48 R12: 0000000000000000
[   60.001323] R13: 0000000000000001 R14: ffff88f33ec2f8c0 R15: 0000000000000001
[   60.001326]  ? mtree_load+0x8c/0x590
[   60.001333]  ? __pfx_tlb_remove_table_smp_sync+0x10/0x10
[   60.001346]  ? __pfx_tlb_remove_table_smp_sync+0x10/0x10
[   60.001350]  smp_call_function+0x39/0x70
[   60.001354]  collapse_huge_page+0x509/0x890
[   60.001361]  hpage_collapse_scan_pmd+0x376/0x8b0
[   60.001369]  khugepaged_scan_mm_slot.constprop.0+0x28e/0x500
[   60.001377]  khugepaged+0xd5/0x240
[   60.001380]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   60.001388]  ? lockdep_hardirqs_on+0x7d/0x100
[   60.001396]  ? __pfx_khugepaged+0x10/0x10
[   60.001400]  kthread+0xf6/0x130
[   60.001405]  ? __pfx_kthread+0x10/0x10
[   60.001408]  ret_from_fork+0x30/0x50
[   60.001417]  ? __pfx_kthread+0x10/0x10
[   60.001419]  ret_from_fork_asm+0x1b/0x30
[   60.001427]  </TASK>
[   60.062651] Modules linked in:
[   60.063126] irq event stamp: 58300
[   60.063680] hardirqs last  enabled at (58299): [<ffffffff95e01046>] 
asm_sysvec_xen_hvm_callback+0x16/0x20
[   60.065189] hardirqs last disabled at (58300): [<ffffffff95ccc1ea>] 
sysvec_xen_hvm_callback+0xa/0xc0
[   60.066633] softirqs last  enabled at (58098): [<ffffffff95cec89c>] 
__do_softirq+0x31c/0x423
[   60.067787] softirqs last disabled at (58089): [<ffffffff950bab91>] 
__irq_exit_rcu+0x91/0x100
[   60.068928] CPU: 0 PID: 37 Comm: khugepaged Not tainted 6.6.0-rc5+ #1376
[   60.069786] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.16.2-14-g1e1da7a96300-prebuilt.qemu.org 04/01/2014
[   60.071340] RIP: 0010:smp_call_function_many_cond+0x119/0x660
[   60.072100] Code: 7e 08 48 63 d0 e8 17 90 44 00 3b 05 e1 bf c5 01 73 25 48 
63 d0 49 8b 36 48 03 34 d5 60 dc 55 96 8b 56 08 83 e2 01 74 0a f3 90 <8b> 4e 08 
83 e1 01 75 f6 83 c0 01 eb c1 48 83 c4 38 5b 5d 41 5c 41
[   60.074795] RSP: 0018:ffffae458014fc00 EFLAGS: 00000202
[   60.075557] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
[   60.076509] RDX: 0000000000000001 RSI: ffff88f33ecb3e00 RDI: ffff88f301112790
[   60.077465] RBP: 0000000000000000 R08: ffffffff96015000 R09: fffffffffffffffc
[   60.078476] R10: ffffffff95cacbac R11: ffff88f301bd0d48 R12: 0000000000000000
[   60.079434] R13: 0000000000000001 R14: ffff88f33ec2f8c0 R15: 0000000000000001
[   60.080589] FS:  0000000000000000(0000) GS:ffff88f33ec00000(0000) 
knlGS:0000000000000000
[   60.081626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.082459] CR2: 00007f4cceba6818 CR3: 00000000021ec000 CR4: 00000000000006f0
[   60.083672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   60.085047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   60.085994] Call Trace:
[   60.086283]  <IRQ>
[   60.086530]  ? watchdog_timer_fn+0x230/0x2a0
[   60.087084]  ? __pfx_watchdog_timer_fn+0x10/0x10
[   60.087603]  ? __hrtimer_run_queues+0x1b2/0x380
[   60.088307]  ? hrtimer_interrupt+0xfe/0x240
[   60.088931]  ? xen_timer_interrupt+0x29/0x40
[   60.089578]  ? __handle_irq_event_percpu+0x87/0x240
[   60.090420]  ? handle_irq_event_percpu+0xf/0x40
[   60.091083]  ? handle_percpu_irq+0x4f/0x70
[   60.091646]  ? handle_irq_desc+0x3a/0x50
[   60.092166]  ? evtchn_2l_handle_events+0x254/0x2a0
[   60.092716]  ? rcu_is_watching+0xd/0x40
[   60.093248]  ? lock_acquire+0xf2/0x110
[   60.093814]  ? xen_evtchn_do_upcall+0xa1/0x1b0
[   60.094482]  ? __sysvec_xen_hvm_callback+0x47/0x80
[   60.095262]  ? sysvec_xen_hvm_callback+0x99/0xc0
[   60.095978]  </IRQ>
[   60.096295]  <TASK>
[   60.096610]  ? asm_sysvec_xen_hvm_callback+0x16/0x20
[   60.097323]  ? mtree_load+0x8c/0x590
[   60.097872]  ? smp_call_function_many_cond+0x119/0x660
[   60.098796]  ? __pfx_tlb_remove_table_smp_sync+0x10/0x10
[   60.099599]  ? __pfx_tlb_remove_table_smp_sync+0x10/0x10
[   60.100515]  smp_call_function+0x39/0x70
[   60.101119]  collapse_huge_page+0x509/0x890
[   60.101920]  hpage_collapse_scan_pmd+0x376/0x8b0
[   60.102642]  khugepaged_scan_mm_slot.constprop.0+0x28e/0x500
[   60.103378]  khugepaged+0xd5/0x240
[   60.103884]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   60.104659]  ? lockdep_hardirqs_on+0x7d/0x100
[   60.105364]  ? __pfx_khugepaged+0x10/0x10
[   60.105945]  kthread+0xf6/0x130
[   60.106363]  ? __pfx_kthread+0x10/0x10
[   60.106808]  ret_from_fork+0x30/0x50
[   60.107239]  ? __pfx_kthread+0x10/0x10
[   60.107769]  ret_from_fork_asm+0x1b/0x30
[   60.108430]  </TASK>

Attachment: smime.p7s
Description: S/MIME cryptographic signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.