
[BUG] Linux pvh vm not getting destroyed on shutdown



Hi,

after a recent upgrade of one of our test systems to Debian Bullseye, we
noticed an issue where, on shutdown of a PVH VM, the VM is not destroyed by
Xen automatically. It can still be destroyed manually by issuing an
'xl destroy $vm' command.

We can reproduce the hang reliably with the following VM configuration:

type = 'pvh'
memory = '512'
kernel = '/usr/lib/grub-xen/grub-i386-xen_pvh.bin'
[... disk/name/vif ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
vcpus = '1'
maxvcpus = '2'

And then issuing a shutdown command in the VM (e.g. by calling 'poweroff').


Here are some things I noticed while trying to debug this issue:

* It happens on a Debian buster dom0 as well as on a bullseye dom0.

* It seems to affect only PVH VMs.

* shutdown from the pvgrub menu ("c" -> "halt") does work

* the VM seems to shut down normally; the last lines on the console are:

[  228.461167] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD 
devices and DM devices detached.
[  228.476794] systemd-shutdown[1]: Syncing filesystems and block devices.
[  228.477878] systemd-shutdown[1]: Powering off.
[  233.709498] xenbus_probe_frontend: xenbus_frontend_dev_shutdown: device/
vif/0 timeout closing device
[  233.745642] reboot: System halted

* issuing a reboot instead of a shutdown does work fine.

* The issue started with Debian kernel 5.8.3+1~exp1 running in the vm, Debian 
kernel 5.7.17-1 does not show the issue.

* setting vcpus equal to maxvcpus does *not* show the hang.
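For completeness, the workaround from the last point amounts to making
vcpus match maxvcpus in the VM configuration, e.g. (with the values from
the config above):

    vcpus    = '2'
    maxvcpus = '2'

With that change the hang on shutdown does not occur.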


Below is the output of "xl debug-keys q; xl dmesg" for the affected VM in the
'hang' state, which andyhhp on #xen suggested attaching to this bug report:

(XEN) General information for domain 55:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=131088 xenheap_pages=4 shared_pages=0 paged_pages=0 
dirty_cpus={} max_pages=131328
(XEN)     handle=275e3a73-247f-4649-af86-6d5c0c72e8e4 vm_assist=00000020
(XEN)     paging assistance: hap refcounts translate external 
(XEN) Rangesets belonging to domain 55:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN)     log-dirty  { }
(XEN) Memory pages belonging to domain 55:
(XEN)     DomPage list too long to display
(XEN)     PoD entries=0 cachesize=0
(XEN)     XenPage 0000000000080125: caf=c000000000000001, taf=e400000000000001
(XEN)     XenPage 00000000001412c9: caf=c000000000000001, taf=e400000000000001
(XEN)     XenPage 0000000000140da0: caf=c000000000000001, taf=e400000000000001
(XEN)     XenPage 0000000000140d9a: caf=c000000000000001, taf=e400000000000001
(XEN)     ExtraPage 00000000001412d3: caf=8040000000000002, 
taf=e400000000000001
(XEN) NODE affinity for domain 55: [0]
(XEN) VCPU information and callbacks for domain 55:
(XEN)   UNIT0 affinities: hard={0-7} soft={0-3}
(XEN)     VCPU0: CPU2 [has=F] poll=0 upcall_pend=01 upcall_mask=00 
(XEN)     pause_count=0 pause_flags=2
(XEN)     paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN)   UNIT1 affinities: hard={0-7} soft={0-3}
(XEN)     VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=00 
(XEN)     pause_count=0 pause_flags=1
(XEN)     paging assistance: hap, 4 levels
(XEN) No periodic timer


Please let me know if more information is necessary.

Thanks,
Maxi
