
Re: [RFC] Xen crashes on ASSERT on suspend/resume, suggested fix



On Fri, 26 May 2023, Jan Beulich wrote:
> On 25.05.2023 21:54, Stefano Stabellini wrote:
> > On Thu, 25 May 2023, Jan Beulich wrote:
> >> On 25.05.2023 01:51, Stefano Stabellini wrote:
> >>> xen/irq: fix races between send_cleanup_vector and _clear_irq_vector
> >>
> >> This title is, I'm afraid, already misleading. No such race can occur
> >> afaict, as both callers of _clear_irq_vector() acquire the IRQ
> >> descriptor lock first, and irq_complete_move() (the sole caller of
> >> send_cleanup_vector()) is only ever invoked as or by an ->ack()
> >> hook, which in turn is only invoked with, again, the descriptor lock
> >> held.
> > 
> > Yes I see that you are right about the locking, and thank you for taking
> > the time to look into it.
> > 
> > One last question: could a second interrupt arrive while ->ack() is
> > being handled? Is do_IRQ() running with interrupts disabled?
> 
> It is, at least as far as the invocation of ->ack() is concerned. Else
> the locking scheme would be broken. You may note that around ->handler()
> invocation we enable interrupts.
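Jan's argument is that mutual exclusion here comes entirely from the IRQ descriptor lock: every path that touches the vector state takes that lock first, so the paths cannot interleave. The following is a toy userspace model of that reasoning, not Xen code. All names (model_irq_desc, model_start_move, model_ack, model_clear_vector, run_model) are hypothetical, and a POSIX mutex stands in for the descriptor lock. It assumes the only invariant of interest is "a pending move always has a valid old vector".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, NOT Xen code: one lock guards all vector state. */
struct model_irq_desc {
    pthread_mutex_t lock;    /* stands in for the IRQ descriptor lock */
    int vector;              /* current vector, -1 if none */
    int old_vector;          /* vector awaiting cleanup, -1 if none */
    bool move_in_progress;   /* a move is pending completion */
    int violations;          /* inconsistent states observed */
    int iterations;          /* loop count for each worker thread */
};

/* Hypothetical: begin moving the IRQ to a new vector. */
static void model_start_move(struct model_irq_desc *d, int new_vec)
{
    pthread_mutex_lock(&d->lock);
    if (d->vector < 0) {
        d->vector = new_vec;          /* nothing to move away from */
    } else {
        d->old_vector = d->vector;    /* remember vector to clean up */
        d->vector = new_vec;
        d->move_in_progress = true;
    }
    pthread_mutex_unlock(&d->lock);
}

/* Stand-in for the ->ack() path (irq_complete_move -> cleanup). */
static void model_ack(struct model_irq_desc *d)
{
    pthread_mutex_lock(&d->lock);
    if (d->move_in_progress) {
        if (d->old_vector < 0)        /* only reachable without the lock */
            d->violations++;
        d->old_vector = -1;           /* "send cleanup" for old vector */
        d->move_in_progress = false;
    }
    pthread_mutex_unlock(&d->lock);
}

/* Stand-in for _clear_irq_vector(): tears down all vector state. */
static void model_clear_vector(struct model_irq_desc *d)
{
    pthread_mutex_lock(&d->lock);
    d->vector = -1;
    d->old_vector = -1;
    d->move_in_progress = false;
    pthread_mutex_unlock(&d->lock);
}

static void *mover(void *arg)
{
    struct model_irq_desc *d = arg;
    for (int i = 0; i < d->iterations; i++) {
        model_start_move(d, 32 + (i % 200));
        model_ack(d);
    }
    return NULL;
}

static void *clearer(void *arg)
{
    struct model_irq_desc *d = arg;
    for (int i = 0; i < d->iterations; i++)
        model_clear_vector(d);
    return NULL;
}

/* Runs both paths concurrently; returns the number of violations seen. */
static int run_model(int iterations)
{
    struct model_irq_desc d = { .vector = 32, .old_vector = -1,
                                .move_in_progress = false,
                                .violations = 0, .iterations = iterations };
    pthread_t a, b;

    pthread_mutex_init(&d.lock, NULL);
    pthread_create(&a, NULL, mover, &d);
    pthread_create(&b, NULL, clearer, &d);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&d.lock);
    return d.violations;
}
```

With every update made under the same lock, run_model() should always report zero violations, which mirrors why the race in the patch title cannot occur. The model says nothing about the interrupt-enable window around ->handler(), which is a separate question.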

OK. FYI, we were able to reproduce a problem after 250+ Dom0
suspend/resume cycles with my patch applied. Unfortunately that gives
no extra information, because my patch removes the ASSERTs.

So clearly my patch makes the problem harder to reproduce but doesn't
fix it. The symptom is below; I am not sure whether it tells you
anything, but FYI.


May 23 22:47:31 amd-saravana-crater kernel: [17881.744986] INFO: task kworker/u8:1:45 blocked for more than 120 seconds.
May 23 22:47:31 amd-saravana-crater kernel: [17881.745048]       Not tainted 6.1.0-rtc-s3 #1
May 23 22:47:31 amd-saravana-crater kernel: [17881.745089] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 22:47:31 amd-saravana-crater kernel: [17881.745144] task:kworker/u8:1    state:D stack:0     pid:45    ppid:2      flags:0x00004000
May 23 22:47:31 amd-saravana-crater kernel: [17881.745154] Workqueue: writeback wb_workfn (flush-259:0)
May 23 22:47:31 amd-saravana-crater kernel: [17881.745170] Call Trace:
May 23 22:47:31 amd-saravana-crater kernel: [17881.745174]  <TASK>
May 23 22:47:31 amd-saravana-crater kernel: [17881.745182]  __schedule+0x2d5/0x920
May 23 22:47:31 amd-saravana-crater kernel: [17881.745192]  ? preempt_count_add+0x7c/0xc0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745202]  schedule+0x63/0xd0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745208]  __bio_queue_enter+0xeb/0x230
May 23 22:47:31 amd-saravana-crater kernel: [17881.745217]  ? prepare_to_wait_event+0x130/0x130
May 23 22:47:31 amd-saravana-crater kernel: [17881.745226]  blk_mq_submit_bio+0x358/0x570
May 23 22:47:31 amd-saravana-crater kernel: [17881.745237]  __submit_bio+0xfa/0x170
May 23 22:47:31 amd-saravana-crater kernel: [17881.745243]  submit_bio_noacct_nocheck+0x229/0x2b0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745250]  ? ktime_get+0x47/0xb0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745256]  submit_bio_noacct+0x1e4/0x5a0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745261]  ? submit_bio_noacct+0x1e4/0x5a0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745268]  submit_bio+0x47/0x80
May 23 22:47:31 amd-saravana-crater kernel: [17881.745273]  ext4_io_submit+0x24/0x40
May 23 22:47:31 amd-saravana-crater kernel: [17881.745282]  ext4_writepages+0x57f/0xdd0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745288]  ? _raw_read_lock_bh+0x20/0x40
May 23 22:47:31 amd-saravana-crater kernel: [17881.745296]  ? update_sd_lb_stats.constprop.148+0x11e/0x960
May 23 22:47:31 amd-saravana-crater kernel: [17881.745308]  do_writepages+0xbf/0x1a0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745314]  ? __enqueue_entity+0x6c/0x80
May 23 22:47:31 amd-saravana-crater kernel: [17881.745321]  ? enqueue_entity+0x1a9/0x370
May 23 22:47:31 amd-saravana-crater kernel: [17881.745327]  __writeback_single_inode+0x44/0x360
May 23 22:47:31 amd-saravana-crater kernel: [17881.745332]  ? _raw_spin_unlock+0x19/0x40
May 23 22:47:31 amd-saravana-crater kernel: [17881.745339]  writeback_sb_inodes+0x203/0x4e0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745350]  __writeback_inodes_wb+0x66/0xd0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745358]  wb_writeback+0x23d/0x2d0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745366]  wb_workfn+0x20b/0x490
May 23 22:47:31 amd-saravana-crater kernel: [17881.745372]  ? _raw_spin_unlock+0x19/0x40
May 23 22:47:31 amd-saravana-crater kernel: [17881.745381]  process_one_work+0x227/0x440
May 23 22:47:31 amd-saravana-crater kernel: [17881.745389]  worker_thread+0x31/0x3e0
May 23 22:47:31 amd-saravana-crater kernel: [17881.745395]  ? process_one_work+0x440/0x440
May 23 22:47:31 amd-saravana-crater kernel: [17881.745400]  kthread+0xfe/0x130
May 23 22:47:31 amd-saravana-crater kernel: [17881.745406]  ? kthread_complete_and_exit+0x20/0x20
May 23 22:47:31 amd-saravana-crater kernel: [17881.745413]  ret_from_fork+0x22/0x30
May 23 22:47:31 amd-saravana-crater kernel: [17881.745425]  </TASK>


