
Re: [BUG] Linux pvh vm not getting destroyed on shutdown



On Saturday, 13 February 2021 19:21:56 CET Elliott Mitchell wrote:
> On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> > after a recent upgrade of one of our test systems to Debian Bullseye we
> > noticed an issue where on shutdown of a pvh vm the vm was not destroyed by
> > xen automatically. It could still be destroyed by manually issuing a 'xl
> > destroy $vm' command.
> 
> Usually I would expect such an issue to show on the Debian bug database
> before xen-devel.  In particular as this is a behavior change with
> security updates, there is a good chance this isn't attributable to the
> Xen Project.  Additionally the Xen Project's support window is rather
> narrow.  I've been observing the same (or similar) issue for a bit too.

I posted this bug report to the xen-devel list because I was told to do so on 
the upstream #xen IRC channel.
Before writing my mail, I also checked the Debian kernel packaging for 
anything that might be related to our issue, but could not find anything.
Please note we didn't observe any behavior change in Debian buster on our 
systems and also didn't notice the shutdown issue there. For us the issue 
only started with kernel version 5.8.3+1~exp1.

> > Here are some things I noticed while trying to debug this issue:
> > 
> > * It happens on a Debian buster dom0 as well as on a bullseye dom0
> 
> I stick with stable on non-development machines, so I can't say anything
> to this.
> 
> > * It seems to only affect pvh vms.
> 
> I've observed it with pv and hvm VMs as well.
> 
> > * shutdown from the pvgrub menu ("c" -> "halt") does work
> 
> Woah!  That is quite the observation.  Since I had a handy opportunity
> I tried this and this reproduces for me.
> 
> > * the vm seems to shut down normal, the last lines in the console are:
> I agree with this.  Everything appears typical until the last moment.
> 
> > * issuing a reboot instead of a shutdown does work fine.
> 
> I disagree with this.  I'm seeing the issue occur with restart attempts
> too.
> 
> > * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> > Debian kernel 5.7.17-1 does not show the issue.
> 
> I think the first kernel update during which I saw the issue was around
> linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64.  I think
> the last security update to the Xen packages was in a similar timeframe
> though.  Rate this portion as unreliable though.  I can definitely state
> this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built
> from corresponding source, this may have shown earlier.

We don't see any issues with the current Debian buster (Debian stable) kernel 
(4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux) and 
also did not notice any issues with the older kernel packages in buster. Also 
the security update of xen in buster did not cause any behavior change for us. 
In our case everything in buster is working as we expect it to work (using 
latest updates and security updates).

> > * setting vcpus equal to maxvcpus does *not* show the hang.
> 
> I haven't tried things related to this, so I can't comment on this
> part.
> 
> 
> Fresh observation.  During a similar timeframe I started noticing VM
> creation leaving a `xl create` process behind.  I had discovered this
> process could be freely killed without appearing to affect the VM and had
> thus been doing so (memory in a lean Dom0 is precious).
> 
> While typing this I realized there was another scenario I needed to try.
> Turns out if I boot PV GRUB and get to its command-line (press 'c'), then
> get away from the VM console, kill the `xl create` process, return to
> the console and type "halt".  This results in a hung VM.
> 
> Are you perhaps either killing the `xl create` process for affected VMs,
> or migrating the VM and thus splitting the `xl create` process from the
> affected VMs?
> 
> This seems more a Debian issue than a Xen Project issue right now.

We don't migrate the vms, we don't kill any processes running on the dom0 and 
I don't see anything in our logs indicating something gets killed on the dom0. 
On our systems the running 'xl create' processes only use very little memory.

Have you checked whether you still observe your hangs if you don't kill the xl 
processes?
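
For anyone comparing results, here is a minimal sketch of how leftover 
domains could be spotted from dom0. It assumes the usual `xl list` column 
layout (Name, ID, Mem, VCPUs, State, Time(s)) and that a domain which has 
shut down but was not destroyed shows an 's' in its state flags. The helper 
names `stuck_domains` and `query_xl` are made up for illustration and are 
not part of any Xen tooling.

```python
import subprocess
from typing import List


def stuck_domains(xl_list_output: str) -> List[str]:
    """Return names of domains whose state flags contain 's' (shutdown)."""
    stuck = []
    for line in xl_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or unexpected line, ignore it
        # Count columns from the right so a name containing spaces still works:
        # ... ID, Mem, VCPUs, State, Time(s)
        state = fields[-2]
        name = " ".join(fields[:-5])
        if "s" in state:
            stuck.append(name)
    return stuck


def query_xl() -> str:
    """Run `xl list` on dom0 and return its output (needs root and the xl tool)."""
    result = subprocess.run(
        ["xl", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a dom0, `stuck_domains(query_xl())` would then list the domains that 
still need a manual 'xl destroy'. Whether the matching `xl create` process 
is still alive has to be checked separately, e.g. in the process list.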
