Xen project Mailing List

Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom

Date: Fri, 5 Jun 2020 15:36:59 +0200

Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>, 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 05 Jun 2020 13:37:06 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.06.2020 13:25, Paul Durrant wrote: >> -----Original Message----- >> From: Paul Durrant <xadimgnik@xxxxxxxxx> >> Sent: 05 June 2020 12:06 >> To: 'Jan Beulich' <jbeulich@xxxxxxxx>; 'Marek Marczykowski-Górecki' >> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> >> Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' >> <xen-devel@xxxxxxxxxxxxxxxxxxxx> >> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in >> stubdom >> >> Sorry, only just catching up with this... >> >>> -----Original Message----- >>> From: Jan Beulich <jbeulich@xxxxxxxx> >>> Sent: 05 June 2020 10:09 >>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx> >>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel >>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul >>> Durrant <paul@xxxxxxx> >>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in >>> stubdom >>> >>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote: >>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote: >>>>> On 04.06.2020 13:13, Andrew Cooper wrote: >>>>>> On 04/06/2020 08:08, Jan Beulich wrote: >>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote: >>>>>>>> Then, we get the main issue: >>>>>>>> >>>>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 >>>>>>>> (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff >>>>>>>> (XEN) domain_crash called from io.c:178 >>>>>>>> >>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its >>>>>>>> stubdom >>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already. >>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can >>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all >>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't >>>>>>> be any I/O there anymore, and hence no I/O emulation)? >>>>>> >>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown >>>>>> intermittently set, and skip the pause in domain_shutdown() >>>>>> >>>>>> I presume this lack of pause is to allow the vcpu in question to still >>>>>> be scheduled to consume the IOREQ reply? (Its fairly opaque logic with >>>>>> 0 clarifying details). >>>>>> >>>>>> What *should* happen is that, after consuming the reply, the vcpu should >>>>>> notice and pause itself, at which point it would yield to the >>>>>> scheduler. This is the purpose of vcpu_{start,end}_shutdown_deferral(). >>>>>> >>>>>> Evidentially, this is not happening. >>>>> >>>>> We can't tell yet, until ... >>>>> >>>>>> Marek: can you add a BUG() after the weird PIO printing? That should >>>>>> confirm whether we're getting into handle_pio() via the >>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case, >>>>>> we're fully re-entering the guest). >>>>> >>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion() >>>>> after having called hvm_wait_for_io() -> hvm_io_assist() -> >>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call >>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps >>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down" >>>>> instead of plain "true"? >>>>> >>>>> Adding Paul to Cc, as being the maintainer here. >>>> >>>> Got it, by sticking BUG() just before that domain_crash() in >>>> handle_pio(). And also vcpu 0 of both HVM domains do have >>>> v->defer_shutdown. >>> >>> As per the log they did get it set. I'd be curious of the flag's >>> value (as well as v->paused_for_shutdown's) at the point of the >>> problematic handle_pio() invocation (see below). It may be >>> worthwhile to instrument vcpu_check_shutdown() (inside its if()) >>> - before exiting to guest context (in order to then come back >>> and call handle_pio()) the vCPU ought to be getting through >>> there. No indication of it doing so would be a sign that there's >>> a code path bypassing the call to vcpu_end_shutdown_deferral(). >>> >>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off. >>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 >>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 >>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 >>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 >>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0 >>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1 >>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done >>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off. >>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 >>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 >>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 >>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 >>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0 >>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1 >>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done >>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6 >>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6 >>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 >>> >>> Perhaps in this message could you also log >>> v->domain->is_shutting_down, v->defer_shutdown, and >>> v->paused_for_shutdown? (Would be nice if, after having made >>> changes to your debugging patch, you could point again at the >>> precise version you've used for the log provided.) >>> >>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff >>>> (XEN) Xen BUG at io.c:178 >>> >>> Btw, instead of BUG(), WARN() or dump_execution_state() would >>> likely also do, keeping Xen alive. >>> >> >> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it? That >> would mean we wouldn't be >> seeing the "Unexpected PIO" message. From that message this clearly >> X86EMUL_UNHANDLEABLE which >> suggests a race with ioreq server teardown, possibly due to selecting a >> server but then not finding a >> vcpu match in ioreq_vcpu_list. > > Actually looking at remote_shutdown... the test of ( reason == SHUTDOWN_crash > ) and then clearing defer_shutdown looks a bit odd... Just because the domain > shutdown code has been set that way doesn't mean that a vcpu is not deferred > in emulation; SCHEDOP_shutdown_code could easily be called from one vcpu > whilst another has emulation pending. I'm confused: The deferral is of shutting down the domain, not of a specific instance of emulation. When a guest crashed I understand this code is intended to make sure shutting down because of the crash won't get deferred because of in-progress emulation anywhere. Marek didn't provide any hints so far that the guest may be crashing, so I think if there is an issue here, it's likely only a tangential one anyway. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.