[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom



On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@xxxxxxxx>
> > Sent: 05 June 2020 14:57
> > To: paul@xxxxxxx; 'Marek Marczykowski-Górecki' 
> > <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' 
> > <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in 
> > stubdom
> > 
> > On 05.06.2020 15:37, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >> Sent: 05 June 2020 14:32
> > >> To: paul@xxxxxxx
> > >> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 
> > >> 'Andrew Cooper'
> > >> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 
> > >> in stubdom
> > >>
> > >> On 05.06.2020 13:05, Paul Durrant wrote:
> > >>> Sorry, only just catching up with this...
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >>>> Sent: 05 June 2020 10:09
> > >>>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > >>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel 
> > >>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul
> > >>>> Durrant <paul@xxxxxxx>
> > >>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 
> > >>>> 4.2.0 in stubdom
> > >>>>
> > >>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > >>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> > >>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> > >>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> > >>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > >>>>>>>>> Then, we get the main issue:
> > >>>>>>>>>
> > >>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> > >>>>>>>>>     (XEN) domain_crash called from io.c:178
> > >>>>>>>>>
> > >>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its 
> > >>>>>>>>> stubdom
> > >>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> > >>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> > >>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> > >>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> > >>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> > >>>>>>>
> > >>>>>>> The vcpu in question is talking to Qemu, so will have 
> > >>>>>>> v->defer_shutdown
> > >>>>>>> intermittently set, and skip the pause in domain_shutdown()
> > >>>>>>>
> > >>>>>>> I presume this lack of pause is to allow the vcpu in question to 
> > >>>>>>> still
> > >>>>>>> be scheduled to consume the IOREQ reply?  (Its fairly opaque logic 
> > >>>>>>> with
> > >>>>>>> 0 clarifying details).
> > >>>>>>>
> > >>>>>>> What *should* happen is that, after consuming the reply, the vcpu 
> > >>>>>>> should
> > >>>>>>> notice and pause itself, at which point it would yield to the
> > >>>>>>> scheduler.  This is the purpose of 
> > >>>>>>> vcpu_{start,end}_shutdown_deferral().
> > >>>>>>>
> > >>>>>>> Evidentially, this is not happening.
> > >>>>>>
> > >>>>>> We can't tell yet, until ...
> > >>>>>>
> > >>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That 
> > >>>>>>> should
> > >>>>>>> confirm whether we're getting into handle_pio() via the
> > >>>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which 
> > >>>>>>> case,
> > >>>>>>> we're fully re-entering the guest).
> > >>>>>>
> > >>>>>> ... we know this. handle_pio() gets called from 
> > >>>>>> handle_hvm_io_completion()
> > >>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> > >>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't 
> > >>>>>> call
> > >>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> > >>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> > >>>>>> instead of plain "true"?
> > >>>>>>
> > >>>>>> Adding Paul to Cc, as being the maintainer here.
> > >>>>>
> > >>>>> Got it, by sticking BUG() just before that domain_crash() in
> > >>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> > >>>>> v->defer_shutdown.
> > >>>>
> > >>>> As per the log they did get it set. I'd be curious of the flag's
> > >>>> value (as well as v->paused_for_shutdown's) at the point of the
> > >>>> problematic handle_pio() invocation (see below). It may be
> > >>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> > >>>> - before exiting to guest context (in order to then come back
> > >>>> and call handle_pio()) the vCPU ought to be getting through
> > >>>> there. No indication of it doing so would be a sign that there's
> > >>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> > >>>>
> > >>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > >>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > >>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > >>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>
> > >>>> Perhaps in this message could you also log
> > >>>> v->domain->is_shutting_down, v->defer_shutdown, and
> > >>>> v->paused_for_shutdown? (Would be nice if, after having made
> > >>>> changes to your debugging patch, you could point again at the
> > >>>> precise version you've used for the log provided.)
> > >>>>
> > >>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > >>>>> (XEN) Xen BUG at io.c:178
> > >>>>
> > >>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> > >>>> likely also do, keeping Xen alive.
> > >>>>
> > >>>
> > >>> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it?
> > >>
> > >> Where would this originate?
> > >
> > > I was referring to the 'if ( 
> > > unlikely(!vcpu_start_shutdown_deferral(curr)) )' at the top of
> > hvm_send_ioreq().
> > 
> > Ah yes. But this is just one way of things possibly going wrong. Plus
> > the function will return true when ->defer_shutdown is already or
> > (wrongly) still set.
> > 
> > >>> That would mean we wouldn't be seeing the "Unexpected PIO" message. 
> > >>> From that message this clearly
> > >> X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, 
> > >> possibly due to selecting a
> > >> server but then not finding a vcpu match in ioreq_vcpu_list.
> > >>
> > >> I was suspecting such, but at least the tearing down of all servers
> > >> happens only from relinquish-resources, which gets started only
> > >> after ->is_shut_down got set (unless the tool stack invoked
> > >> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> > >> set for the domain).
> > >>
> > >> For individually unregistered servers - yes, if qemu did so, this
> > >> would be a problem. They need to remain registered until all vCPU-s
> > >> in the domain got paused.
> > >
> > > It shouldn't be a problem should it? Destroying an individual server is 
> > > only done with the domain
> > paused, so no vcpus can be running at the time.
> > 
> > Consider the case of one getting destroyed after it has already
> > returned data, but the originating vCPU didn't consume that data
> > yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
> > won't find the matching server anymore, and hence the chain
> > hvm_wait_for_io() -> hvm_io_assist() ->
> > vcpu_end_shutdown_deferral() would be skipped. handle_pio()
> > would then still correctly consume the result.
> 
> True, and skipping hvm_io_assist() means the vcpu internal ioreq state will 
> be left set to IOREQ_READY and *that* explains why we would then exit 
> hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from the first switch).

I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
hvmemul_do_io(). And it happens shortly after ioreq server is destroyed:

(XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
(XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
(XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
(XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
(XEN) d11v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1 
defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0
(XEN) emulate.c:180:d11v0 hvmemul_do_io X86EMUL_UNHANDLEABLE: io_req.state 1
(XEN) d11v0 Unexpected PIO status 1, port 0xb004 read 0xffff

I've also made handle_pio message printed on v->defer_shutdown=1
regardless of the port, but didn't hit any other case than read from
0xb004.

Now, I'm going to try your patch.

> > Marek - to verify this doesn't happen (sorry, my qemu knowledge
> > is rather limited, and hence I don't know whether this can
> > happen at all), could you also log hvm_destroy_ioreq_server()
> > invocations?
> > 
> > Jan
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.