
Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 4 Jun 2020 12:13:10 +0100
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 04 Jun 2020 11:13:26 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04/06/2020 08:08, Jan Beulich wrote:
> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>> Then, we get the main issue:
>>
>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>     (XEN) domain_crash called from io.c:178
>>
>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>
> I'd guess an issue with the shutdown deferral logic. Did you / can
> you check whether XEN_DMOP_remote_shutdown managed to pause all
> CPUs (I assume it didn't, since once they're paused there shouldn't
> be any I/O there anymore, and hence no I/O emulation)?

The vcpu in question is talking to Qemu, so it will have v->defer_shutdown intermittently set, and will skip the pause in domain_shutdown().

I presume this lack of pause is to allow the vcpu in question to still be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with no clarifying details.)

What *should* happen is that, after consuming the reply, the vcpu should notice and pause itself, at which point it would yield to the scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
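To make the intended handshake concrete, here is a minimal sketch. It is a hypothetical, single-threaded model that borrows the names used in this thread (defer_shutdown, domain_shutdown(), vcpu_{start,end}_shutdown_deferral()); the struct layouts, the paused_for_shutdown flag and the run_scenario() helper are assumptions for illustration, not Xen's actual code, which also needs locking, memory barriers and real vcpu pausing.

```c
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL, simplified model of the shutdown-deferral handshake.
 * Names mirror the discussion, but this is not Xen's code: no locking,
 * no memory barriers, no scheduler. "Paused" is just a flag here. */

#define NR_VCPUS 2

struct vcpu {
    int id;
    bool defer_shutdown;      /* set while an ioreq to the device model is in flight */
    bool paused_for_shutdown; /* set once the vcpu has stopped for shutdown */
};

struct domain {
    bool is_shutting_down;
    struct vcpu vcpu[NR_VCPUS];
};

/* If a shutdown is pending, stop deferring and pause this vcpu. */
static void vcpu_check_shutdown(struct domain *d, struct vcpu *v)
{
    if (d->is_shutting_down) {
        v->defer_shutdown = false;
        v->paused_for_shutdown = true; /* real code would pause and yield */
    }
}

/* Shutdown request: pause every vcpu that is not mid-ioreq. */
static void domain_shutdown(struct domain *d)
{
    d->is_shutting_down = true;
    for (int i = 0; i < NR_VCPUS; i++) {
        struct vcpu *v = &d->vcpu[i];
        if (v->defer_shutdown)
            continue; /* left running so it can consume the ioreq reply */
        v->paused_for_shutdown = true;
    }
}

/* Before issuing an ioreq. Returns false if the vcpu must not issue it. */
static bool vcpu_start_shutdown_deferral(struct domain *d, struct vcpu *v)
{
    if (v->defer_shutdown)
        return true;
    v->defer_shutdown = true;
    if (d->is_shutting_down)
        vcpu_check_shutdown(d, v);
    return v->defer_shutdown;
}

/* After consuming the ioreq reply: the point where the vcpu should
 * notice the pending shutdown and pause itself. */
static void vcpu_end_shutdown_deferral(struct domain *d, struct vcpu *v)
{
    v->defer_shutdown = false;
    if (d->is_shutting_down)
        vcpu_check_shutdown(d, v);
}

/* Scenario: vcpu1 is waiting on Qemu when the shutdown request arrives. */
static void run_scenario(struct domain *d)
{
    struct vcpu *v1 = &d->vcpu[1];

    vcpu_start_shutdown_deferral(d, v1); /* vcpu1 issues an ioreq */
    domain_shutdown(d);                  /* vcpu0 paused, vcpu1 skipped */
    vcpu_end_shutdown_deferral(d, v1);   /* reply consumed: vcpu1 pauses */

    for (int i = 0; i < NR_VCPUS; i++)
        printf("vcpu%d paused_for_shutdown=%d\n", d->vcpu[i].id,
               d->vcpu[i].paused_for_shutdown);
}
```

In this model vcpu1 escapes domain_shutdown() only because it is mid-ioreq, and it is paused as soon as vcpu_end_shutdown_deferral() runs. If that step were never reached, vcpu1 would stay unpaused and keep re-entering emulation, which is the kind of state being chased in this thread.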

Evidently, this is not happening.

Marek: can you add a BUG() after the weird PIO printing?  That should confirm whether we're getting into handle_pio() via the handle_hvm_io_completion() path, or via the vmexit path (in which case we're fully re-entering the guest).

I suspect you can drop the debugging of XEN_DOMCTL_destroydomain; I think it's just noise at the moment.

However, it would be very helpful to see the vcpus which fall into domain_shutdown()'s "else if ( v->defer_shutdown ) continue;" path.
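One way to get that information is to log from inside the skip path. The sketch below is hypothetical: a stripped-down stand-in for the domain_shutdown() loop, with simplified structures and printf() in place of Xen's printk(); the domain_shutdown_logged() name is invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL debugging sketch: a simplified domain_shutdown() loop
 * that logs every vcpu taking the "defer_shutdown ... continue" path.
 * These structures are stand-ins, not Xen's definitions. */

struct vcpu {
    int id;
    bool defer_shutdown;
    bool paused_for_shutdown;
};

struct domain {
    int domain_id;
    int nr_vcpus;
    struct vcpu *vcpu;
};

/* Pauses non-deferring vcpus; returns how many were left unpaused. */
static int domain_shutdown_logged(struct domain *d)
{
    int skipped = 0;

    for (int i = 0; i < d->nr_vcpus; i++) {
        struct vcpu *v = &d->vcpu[i];

        if (v->defer_shutdown) {
            /* In the hypervisor this would be a printk(). */
            printf("d%dv%d: defer_shutdown set, not pausing\n",
                   d->domain_id, v->id);
            skipped++;
            continue;
        }
        v->paused_for_shutdown = true;
    }
    return skipped;
}
```

For a domain 3 whose vcpu1 is mid-ioreq, this prints `d3v1: defer_shutdown set, not pausing` and returns 1, using the same dNvM naming as the existing log lines.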

> Another question though: In 4.13 the log message next to the
> domain_crash() I assume you're hitting is "Weird HVM ioemulation
> status", not "Weird PIO status", and the debugging patch you
> referenced doesn't have any change there. Andrew's recent
> change to master, otoh, doesn't use the word "weird" anymore. I
> can therefore only guess that the value logged is still
> hvmemul_do_pio_buffer()'s return value, i.e. X86EMUL_UNHANDLEABLE.
> Please confirm.

It's the first draft of the patch which I did, before submitting it to xen-devel.  We do have X86EMUL_UNHANDLEABLE at this point, but it's not terribly helpful: there are loads of paths which fail silently with this error.
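Since X86EMUL_UNHANDLEABLE on its own doesn't say which path produced it, one debugging approach is to record the origin at each return site. The sketch below is hypothetical throughout: the enum values are stand-ins, and UNHANDLEABLE_HERE(), check_port(), send_to_device_model() and do_pio() are invented names, not Xen functions.

```c
#include <stdio.h>

/* HYPOTHETICAL debugging aid: remember which function/line returned
 * X86EMUL_UNHANDLEABLE, so silent failures become attributable.
 * The enum values and all helpers below are illustrative stand-ins. */

enum { X86EMUL_OKAY = 0, X86EMUL_UNHANDLEABLE = 1 };

static const char *last_fail_func;
static int last_fail_line;

/* Evaluates to X86EMUL_UNHANDLEABLE after recording the call site. */
#define UNHANDLEABLE_HERE() \
    (last_fail_func = __func__, last_fail_line = __LINE__, X86EMUL_UNHANDLEABLE)

/* Two hypothetical failure paths that would otherwise look identical. */
static int check_port(unsigned int port)
{
    if (port > 0xffff)
        return UNHANDLEABLE_HERE();
    return X86EMUL_OKAY;
}

static int send_to_device_model(int ioreq_server_present)
{
    if (!ioreq_server_present)
        return UNHANDLEABLE_HERE();
    return X86EMUL_OKAY;
}

/* Caller: on failure, report where the error actually came from. */
static int do_pio(unsigned int port, int ioreq_server_present)
{
    int rc = check_port(port);

    if (rc == X86EMUL_OKAY)
        rc = send_to_device_model(ioreq_server_present);
    if (rc == X86EMUL_UNHANDLEABLE)
        printf("port %#x: UNHANDLEABLE from %s:%d\n",
               port, last_fail_func, last_fail_line);
    return rc;
}
```

The design keeps the existing return-code convention intact; only the failure sites change, so callers still see a plain X86EMUL_UNHANDLEABLE.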

~Andrew
