
Re: domU suspend issue - freeze processes failed - Linux 6.16



On Fri, Aug 22, 2025 at 08:42:30PM +0200, Marek Marczykowski-Górecki wrote:
> On Fri, Aug 22, 2025 at 05:27:20PM +0200, Jürgen Groß wrote:
> > On 22.08.25 16:42, Marek Marczykowski-Górecki wrote:
> > > On Fri, Aug 22, 2025 at 04:39:33PM +0200, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > When suspending domU I get the following issue:
> > > > 
> > > >      Freezing user space processes
> > > >      Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
> > > >      task:xl              state:D stack:0     pid:466   tgid:466   ppid:1      task_flags:0x400040 flags:0x00004006
> > > >      Call Trace:
> > > >       <TASK>
> > > >       __schedule+0x2f3/0x780
> > > >       schedule+0x27/0x80
> > > >       schedule_preempt_disabled+0x15/0x30
> > > >       __mutex_lock.constprop.0+0x49f/0x880
> > > >       unregister_xenbus_watch+0x216/0x230
> > > >       xenbus_write_watch+0xb9/0x220
> > > >       xenbus_file_write+0x131/0x1b0
> > > >       vfs_writev+0x26c/0x3d0
> > > >       ? do_writev+0xeb/0x110
> > > >       do_writev+0xeb/0x110
> > > >       do_syscall_64+0x84/0x2c0
> > > >       ? do_syscall_64+0x200/0x2c0
> > > >       ? generic_handle_irq+0x3f/0x60
> > > >       ? syscall_exit_work+0x108/0x140
> > > >       ? do_syscall_64+0x200/0x2c0
> > > >       ? __irq_exit_rcu+0x4c/0xe0
> > > >       entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > >      RIP: 0033:0x79b618138642
> > > >      RSP: 002b:00007fff9a192fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> > > >      RAX: ffffffffffffffda RBX: 00000000024fd490 RCX: 000079b618138642
> > > >      RDX: 0000000000000003 RSI: 00007fff9a193120 RDI: 0000000000000014
> > > >      RBP: 00007fff9a193000 R08: 0000000000000000 R09: 0000000000000000
> > > >      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
> > > >      R13: 00007fff9a193120 R14: 0000000000000003 R15: 0000000000000000
> > > >       </TASK>
> > > >      OOM killer enabled.
> > > >      Restarting tasks: Starting
> > > >      Restarting tasks: Done
> > > >      xen:manage: do_suspend: freeze processes failed -16
> > > > 
> > > > The process in question is the `xl devd` daemon. It's a domU serving a
> > > > xenvif backend.
> > > > 
> > > > I noticed it on 6.16.1, but earlier test logs show it already with
> > > > 6.16-rc6 (interestingly, not yet with 6.16-rc2, which feels odd given
> > > > that there seem to be no relevant changes between rc2 and rc6).
> > > 
> > > I forgot to include a link with (a little) more details:
> > > https://github.com/QubesOS/qubes-linux-kernel/pull/1157
> > > 
> > > In particular, there is another call trace with panic_on_warn enabled;
> > > it is slightly different, but looks related.
> > > 
> > 
> > I'm pretty sure the PV variant for suspending is just wrong: it is calling
> > dpm_suspend_start() from do_suspend() without taking the required
> > system_transition_mutex, resulting in the WARN() in pm_restrict_gfp_mask().
> > 
> > It might be as easy as just taking the mutex in do_suspend(), but I'm
> > really not sure that would be a proper fix.
> 
> Hm, this might explain the second call trace, but not the freeze failure
> quoted above, I think?

While the patch I sent appears to fix this particular issue, it made me
wonder: is there any fundamental reason why do_suspend() does not use
pm_suspend() and register the Xen-specific actions via platform_suspend_ops
(and maybe syscore_ops)? From a brief look at the code, this should be
possible in theory, and it would avoid issues like this one.

I made a quick-and-dirty attempt at that [1], and it failed (panic). I
surely made several mistakes there (and also left a ton of TODO
comments). But before spending any more time on it, I'd like to ask
whether this is a viable option at all.

[1] https://github.com/marmarek/linux/commit/47cfdb991c85566c9c333570511e67bf477a5da6
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
