
Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom



On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> On 04.06.2020 13:13, Andrew Cooper wrote:
> > On 04/06/2020 08:08, Jan Beulich wrote:
> >> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>> Then, we get the main issue:
> >>>
> >>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>     (XEN) domain_crash called from io.c:178
> >>>
> >>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 or its stubdom
> >>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >> I'd guess an issue with the shutdown deferral logic. Did you / can
> >> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >> CPUs (I assume it didn't, since once they're paused there shouldn't
> >> be any I/O there anymore, and hence no I/O emulation)?
> > 
> > The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> > intermittently set, and skip the pause in domain_shutdown()
> > 
> > I presume this lack of pause is to allow the vcpu in question to still
> > be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
> > 0 clarifying details).
> > 
> > What *should* happen is that, after consuming the reply, the vcpu should
> > notice and pause itself, at which point it would yield to the
> > scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> > 
> > Evidently, this is not happening.
> 
> We can't tell yet, until ...
> 
> > Marek: can you add a BUG() after the weird PIO printing?  That should
> > confirm whether we're getting into handle_pio() via the
> > handle_hvm_io_completion() path, or via the vmexit path (in which case,
> > we're fully re-entering the guest).
> 
> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> handle_pio() (etc) at all anymore in this state. IOW perhaps
> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> instead of plain "true"?
> 
> Adding Paul to Cc, as being the maintainer here.
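
For context, the shutdown-deferral machinery discussed above lives in
xen/common/domain.c. Simplified (quoted from memory, so details may
differ between versions), it looks like this:

    int domain_shutdown(struct domain *d, u8 reason)
    {
        struct vcpu *v;
        ...
        d->is_shutting_down = 1;

        smp_mb(); /* set shutdown status, then check per-vcpu deferrals */

        for_each_vcpu ( d, v )
        {
            if ( reason == SHUTDOWN_crash )
                v->defer_shutdown = 0;
            else if ( v->defer_shutdown )
                continue;               /* I/O in flight: skip the pause */
            vcpu_pause_nosync(v);
            v->paused_for_shutdown = 1;
        }
        ...
    }

    void vcpu_end_shutdown_deferral(struct vcpu *v)
    {
        v->defer_shutdown = 0;
        /* If a shutdown raced with the deferred I/O, pause ourselves now. */
        if ( unlikely(v->domain->is_shutting_down) )
            vcpu_check_shutdown(v);
    }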

Got it, by sticking a BUG() just before that domain_crash() in
handle_pio(). And indeed, vcpu 0 of both HVM domains does have
v->defer_shutdown set.
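
The instrumentation itself is trivial, in the error path of handle_pio()
in xen/arch/x86/hvm/io.c (surrounding context abbreviated):

    switch ( rc )
    {
    ...
    default:
        gprintk(XENLOG_ERR, "Weird PIO status %d, port %#x read %#04x\n",
                rc, port, data);
        BUG();                          /* added, to get a backtrace */
        domain_crash(curr->domain);
    }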

(XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
(XEN) d3v0 handle_pio port 0xb004 read 0x0000
(XEN) d3v0 handle_pio port 0xb004 read 0x0000
(XEN) d3v0 handle_pio port 0xb004 write 0x0001
(XEN) d3v0 handle_pio port 0xb004 write 0x2001
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
(XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
(XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
(XEN) d1v0 handle_pio port 0xb004 read 0x0000
(XEN) d1v0 handle_pio port 0xb004 read 0x0000
(XEN) d1v0 handle_pio port 0xb004 write 0x0001
(XEN) d1v0 handle_pio port 0xb004 write 0x2001
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
(XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
(XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
(XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
(XEN) d3v0 handle_pio port 0xb004 read 0x0000
(XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
(XEN) Xen BUG at io.c:178
(XEN) ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d0802fcb0f>] handle_pio+0x1e4/0x1e6
(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d3v0)
(XEN) rax: ffff8301ba6fffff   rbx: 0000000000000002   rcx: 0000000000000000
(XEN) rdx: 0000000000000001   rsi: 000000000000000a   rdi: ffff82d080438698
(XEN) rbp: ffff8301ba6ffe90   rsp: ffff8301ba6ffe58   r8:  0000000000000001
(XEN) r9:  ffff8301ba6ffdc0   r10: 0000000000000001   r11: 000000000000000f
(XEN) r12: 000000000000b004   r13: ffff8300bfcf1000   r14: 0000000000000001
(XEN) r15: ffff8300bfcf4000   cr0: 000000008005003b   cr4: 00000000000006e0
(XEN) cr3: 00000000bebb8000   cr2: 00007d081d9b82a0
(XEN) fsb: 00007d081cafcb80   gsb: ffff9ae510c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0802fcb0f> (handle_pio+0x1e4/0x1e6):
(XEN)  03 09 00 e8 5b 83 f4 ff <0f> 0b 55 48 89 e5 e8 b2 f5 ff ff 48 85 c0 74 0f
(XEN) Xen stack trace from rsp=ffff8301ba6ffe58:
(XEN)    000000000000ffff ffff8300bfcfffff 000000000000007b ffff8301ba6ffef8
(XEN)    ffff8300bfcf1000 ffff8300bfcf4000 0000000000000000 ffff8301ba6ffee8
(XEN)    ffff82d0803128f1 00ff8301ba6ffec0 ffff82d08030c257 ffff8301ba6ffef8
(XEN)    ffff8300bfcf1000 ffff8300bfcf4000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007cfe459000e7 ffff82d08031470b
(XEN)    0000000000000010 0000000000000010 0000000000000010 ffffa92ec001fd0c
(XEN)    000000000000b004 0000000000000010 0000000000000001 0000000000000000
(XEN)    0000000000000002 000000000000b004 ffffa92ec001fca4 0000000000000002
(XEN)    000000000000b004 ffffa92ec001fd0c 000000000000b004 0000beef0000beef
(XEN)    ffffffffaa5d43bf 000000bf0000beef 0000000000000046 ffffa92ec001fca0
(XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN)    000000000000beef 0000e01000000001 ffff8300bfcf4000 000000313a1d6000
(XEN)    00000000000006e0 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0802fcb0f>] R handle_pio+0x1e4/0x1e6
(XEN)    [<ffff82d0803128f1>] F svm_vmexit_handler+0x97a/0x165b
(XEN)    [<ffff82d08031470b>] F svm_stgi_label+0x8/0x18
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Xen BUG at io.c:178
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
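
For completeness, my reading of Jan's suggestion above is a change along
these lines in hvm_wait_for_io() (xen/arch/x86/hvm/ioreq.c), untested:

    static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
    {
        while ( sv->pending )
        {
            ...
        }

        /*
         * hvm_io_assist() has already run vcpu_end_shutdown_deferral()
         * by this point; if the domain is now shutting down, stop
         * reporting success so that handle_hvm_io_completion() skips
         * the handle_pio()/handle_mmio() completion path.
         */
        return !sv->vcpu->domain->is_shutting_down;
    }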


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
