
Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom



On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> During system shutdown I quite often hit an infinite stream of errors
> like this:
> 
>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>     (XEN) domain_crash called from io.c:178
> 
> This is all running on Xen 4.13.0 (I think I've got this with 4.13.1
> too), nested within KVM. The KVM part means everything is very slow, so
> various race conditions are much more likely to happen.
> 
> It started happening not long ago, and I'm pretty sure it's related to
> updating to qemu 4.2.0 (in the Linux stubdom); the previous version was 3.0.0.
> 
> Thanks to Andrew and Roger, I've managed to collect more info.
> 
> Context:
>     dom0: pv
>     dom1: hvm
>     dom2: stubdom for dom1
>     dom3: hvm
>     dom4: stubdom for dom3
>     dom5: pvh
>     dom6: pvh
> 
> It starts out OK, I think:
> 
>     (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>     (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>     (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>     (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>     (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>     (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>     (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> 
> But then (after a second or so) when the toolstack tries to clean it up,
> things go sideways:
> 
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 got domain_lock
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 ret -85
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 got domain_lock
>     (XEN) d0v0 XEN_DOMCTL_destroydomain domain 6 ret -85
>     (... long stream of domain destroy that can't really finish ...)
>     
> And then, similarly for dom1:
> 
>     (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1
>     (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1 got domain_lock
>     (XEN) d0v1 XEN_DOMCTL_destroydomain domain 1 ret -85
>     (... now a stream of this for dom1 and dom6 interleaved ...)
> 
> At some point, domain 2 (stubdom for domain 1) and domain 5 join too. 

What makes you think this is an indication of things going sideways?
-85 is -ERESTART, which is quite normal to see for a period of time
while cleaning up a domain.
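
For reference, the destroy path is restarted by the hypervisor itself;
from memory (a paraphrase of the XEN_DOMCTL_destroydomain handling in
common/domctl.c, not the exact 4.13 code) it looks roughly like:

    case XEN_DOMCTL_destroydomain:
        ret = domain_kill(d);
        if ( ret == -ERESTART )
            /*
             * Not an error: teardown of a domain can take a long
             * time, so it is done in preemptible chunks, with the
             * hypercall being re-issued until domain_kill() finally
             * returns 0.
             */
            ret = hypercall_create_continuation(
                __HYPERVISOR_domctl, "h", u_domctl);
        break;

So a stream of "ret -85" lines by itself only means the teardown is
still making (possibly slow) progress.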

> Then, we get the main issue:
> 
>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>     (XEN) domain_crash called from io.c:178
> 
> Note: there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> yet, but XEN_DMOP_remote_shutdown for domain 3 had already been called.

I'd guess an issue with the shutdown deferral logic. Did you / can
you check whether XEN_DMOP_remote_shutdown managed to pause all of
the domain's vCPUs (I assume it didn't, since once they're paused
there shouldn't be any I/O there anymore, and hence no I/O emulation)?
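
For reference, by "shutdown deferral logic" I mean the interplay of
domain_shutdown() and vcpu_end_shutdown_deferral(); paraphrasing from
memory (field names may differ slightly in 4.13):

    /*
     * domain_shutdown(): pause every vCPU, except those which still
     * have I/O in flight to a device model.  Those only mark
     * themselves as deferred and get paused later, from
     * vcpu_end_shutdown_deferral(), once the outstanding I/O
     * completes.
     */
    for_each_vcpu ( d, v )
    {
        if ( v->defer_shutdown )
            continue;
        vcpu_pause_nosync(v);
        v->paused_for_shutdown = 1;
    }

If one of d3's vCPUs kept its shutdown deferred, it could still be
executing, and hence still be taking handle_pio() exits after the
remote shutdown request.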

Another question though: In 4.13 the log message next to the
domain_crash() I assume you're hitting is "Weird HVM ioemulation
status", not "Weird PIO status", and the debugging patch you
referenced doesn't have any change there. Andrew's recent
change to master, otoh, doesn't use the word "weird" anymore. I
can therefore only guess that the value logged is still
hvmemul_do_pio_buffer()'s return value, i.e. X86EMUL_UNHANDLEABLE.
Please confirm.
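
For reference, the failure path I have in mind is the tail end of
handle_pio() in arch/x86/hvm/io.c; roughly (paraphrased, with the
exact message text differing between 4.13, your debugging patch, and
current master):

    rc = hvmemul_do_pio_buffer(port, size, dir, &data);

    switch ( rc )
    {
    case X86EMUL_OKAY:
        if ( dir == IOREQ_READ )
            /* Copy the emulated result back into the guest's %rax. */
            memcpy(&curr->arch.user_regs.rax, &data, size);
        break;

    case X86EMUL_RETRY:
        /*
         * The access was forwarded to the device model; the
         * instruction gets completed once the ioreq response
         * arrives, so nothing more to do here.
         */
        break;

    default:
        /*
         * Anything else (X86EMUL_UNHANDLEABLE is 1) is fatal for
         * the guest; this is where the message and the
         * domain_crash() come from.
         */
        gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d\n", rc);
        domain_crash(curr->domain);
        return false;
    }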

Jan



 

