
Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom



On 05.06.2020 15:37, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: 05 June 2020 14:32
>> To: paul@xxxxxxx
>> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Andrew 
>> Cooper'
>> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in 
>> stubdom
>>
>> On 05.06.2020 13:05, Paul Durrant wrote:
>>> Sorry, only just catching up with this...
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Sent: 05 June 2020 10:09
>>>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel 
>>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul
>>>> Durrant <paul@xxxxxxx>
>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in 
>>>> stubdom
>>>>
>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>>>>>> Then, we get the main issue:
>>>>>>>>>
>>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>>>>>     (XEN) domain_crash called from io.c:178
>>>>>>>>>
>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its 
>>>>>>>>> stubdom
>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>>>>>
>>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>>>>>> intermittently set, and skip the pause in domain_shutdown()
>>>>>>>
>>>>>>> I presume this lack of pause is to allow the vcpu in question to still
>>>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
>>>>>>> 0 clarifying details).
>>>>>>>
>>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
>>>>>>> notice and pause itself, at which point it would yield to the
>>>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
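
(For reference, the deferral flow being described is roughly the
following - a simplified sketch paraphrased from memory of
xen/common/domain.c, so the exact shape should be checked against the
tree:)

    /* domain_shutdown(): vCPUs with a deferral in place don't get paused. */
    for_each_vcpu ( d, v )
    {
        if ( reason == SHUTDOWN_crash )
            v->defer_shutdown = 0;
        else if ( v->defer_shutdown )
            continue;                /* still waiting on an IOREQ reply */
        vcpu_pause_nosync(v);
        v->paused_for_shutdown = 1;
    }

    /* Later, once the IOREQ reply has been consumed: */
    void vcpu_end_shutdown_deferral(struct vcpu *v)
    {
        v->defer_shutdown = 0;
        smp_mb();
        if ( v->domain->is_shutting_down )
            vcpu_check_shutdown(v);  /* pauses the vCPU, now that it's safe */
    }
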
>>>>>>>
>>>>>>> Evidently, this is not happening.
>>>>>>
>>>>>> We can't tell yet, until ...
>>>>>>
>>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>>>>>> confirm whether we're getting into handle_pio() via the
>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
>>>>>>> we're fully re-entering the guest).
>>>>>>
>>>>>> ... we know this. handle_pio() gets called from 
>>>>>> handle_hvm_io_completion()
>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>>>>>> instead of plain "true"?
>>>>>>
>>>>>> Adding Paul to Cc, as being the maintainer here.
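
A minimal sketch of that change (untested, against a 4.13-era
xen/arch/x86/hvm/ioreq.c):

    static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
    {
        ...
    -    return true;
    +    /*
    +     * Don't carry on with I/O completion (and hence handle_pio())
    +     * once the domain has started shutting down.
    +     */
    +    return !sv->vcpu->domain->is_shutting_down;
    }
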
>>>>>
>>>>> Got it, by sticking BUG() just before that domain_crash() in
>>>>> handle_pio(). And vcpu 0 of both HVM domains has
>>>>> v->defer_shutdown set.
>>>>
>>>> As per the log they did get it set. I'd be curious of the flag's
>>>> value (as well as v->paused_for_shutdown's) at the point of the
>>>> problematic handle_pio() invocation (see below). It may be
>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
>>>> - before exiting to guest context (in order to then come back
>>>> and call handle_pio()) the vCPU ought to be getting through
>>>> there. No indication of it doing so would be a sign that there's
>>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
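
(I.e. something along these lines - the function body is paraphrased
from memory of xen/common/domain.c, so take it as a sketch only:)

    static void vcpu_check_shutdown(struct vcpu *v)
    {
        struct domain *d = v->domain;

        spin_lock(&d->shutdown_lock);

        if ( d->is_shutting_down )
        {
    +        printk(XENLOG_G_INFO "%pv check_shutdown: defer %d paused %d\n",
    +               v, v->defer_shutdown, v->paused_for_shutdown);
            if ( !v->defer_shutdown )
            {
                vcpu_pause_nosync(v);
                v->paused_for_shutdown = 1;
            }
            ...
        }

        spin_unlock(&d->shutdown_lock);
    }
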
>>>>
>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>
>>>> Perhaps in this message could you also log
>>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>>> v->paused_for_shutdown? (Would be nice if, after having made
>>>> changes to your debugging patch, you could point again at the
>>>> precise version you've used for the log provided.)
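
(E.g. extending the existing debug message along these lines - a sketch
only, with curr / port / dir assumed to be the local variables the
debugging patch already uses in handle_pio():)

    gprintk(XENLOG_INFO,
            "handle_pio port %#x %s (shutting_down %d defer %d paused %d)\n",
            port, dir == IOREQ_READ ? "read" : "write",
            curr->domain->is_shutting_down, curr->defer_shutdown,
            curr->paused_for_shutdown);
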
>>>>
>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>>>>> (XEN) Xen BUG at io.c:178
>>>>
>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
>>>> likely also do, keeping Xen alive.
>>>>
>>>
>>> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it?
>>
>> Where would this originate?
> 
> I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )' 
> at the top of hvm_send_ioreq().

Ah yes. But this is just one way things could possibly go wrong. Plus
the function will return true when ->defer_shutdown is already set, or
is (wrongly) still set.
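
For reference, vcpu_start_shutdown_deferral() is roughly the below
(paraphrased from xen/common/domain.c, so treat the exact shape as
approximate):

    int vcpu_start_shutdown_deferral(struct vcpu *v)
    {
        if ( v->defer_shutdown )
            return 1;               /* already (or still) set: "success" */

        v->defer_shutdown = 1;
        smp_mb(); /* set deferral status /then/ check for shutdown */
        if ( unlikely(v->domain->is_shutting_down) )
            vcpu_check_shutdown(v);

        return v->defer_shutdown;
    }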

>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From
>>> that message this is clearly X86EMUL_UNHANDLEABLE, which suggests a race
>>> with ioreq server teardown, possibly due to selecting a server but then
>>> not finding a vcpu match in ioreq_vcpu_list.
>>
>> I was suspecting such, but at least the tearing down of all servers
>> happens only from relinquish-resources, which gets started only
>> after ->is_shut_down got set (unless the tool stack invoked
>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
>> set for the domain).
>>
>> For individually unregistered servers - yes, if qemu did so, this
>> would be a problem. They need to remain registered until all vCPU-s
>> in the domain got paused.
> 
> It shouldn't be a problem, should it? Destroying an individual server is only 
> done with the domain paused, so no vcpus can be running at the time.

Consider the case of one getting destroyed after it has already
returned data, but the originating vCPU hasn't consumed that data
yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
won't find the matching server anymore, and hence the chain
hvm_wait_for_io() -> hvm_io_assist() ->
vcpu_end_shutdown_deferral() would be skipped. handle_pio()
would then still correctly consume the result.
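
(The relevant piece of handle_hvm_io_completion(), abbreviated and
paraphrased from a 4.13-era xen/arch/x86/hvm/ioreq.c:)

    FOR_EACH_IOREQ_SERVER(d, id, s)
    {
        struct hvm_ioreq_vcpu *sv;

        list_for_each_entry ( sv, &s->ioreq_vcpu_list, list_entry )
        {
            if ( sv->vcpu == v && sv->pending )
            {
                /*
                 * Only reached while the server still exists; this is
                 * the sole path to hvm_wait_for_io() -> hvm_io_assist()
                 * -> vcpu_end_shutdown_deferral().
                 */
                if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) )
                    return false;
                break;
            }
        }
    }
    /*
     * With the server already gone we fall straight through to the
     * completion handling (i.e. handle_pio()), with ->defer_shutdown
     * still set.
     */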

Marek - to verify this doesn't happen (sorry, my qemu knowledge
is rather limited, and hence I don't know whether this can
happen at all), could you also log hvm_destroy_ioreq_server()
invocations?

Jan
