Re: [Xen-devel] [Xen-users] substantial shutdown delay for PV guests with PCI-passthrough

On 19.03.14 14:00, Konrad Rzeszutek Wilk wrote:
On Wed, Mar 19, 2014 at 11:26:24AM +0000, Ian Campbell wrote:
On Wed, 2014-03-19 at 01:25 +0100, Atom2 wrote:
So it seems that pretty much at the start of the 10s delay the state
changed from 4 to 6 and stays at that value even after the first 10s
delay is over - whatever that means.

4 == Connected
6 == Closed

I think what is happening is that the domain is shutting down, which
causes pciback to transition to the closed state (because the f.e. went
away, so this is a reasonable thing for it to do).

The bug appears to be that libxl is trying to "hot unplug" the devices
on shutdown when they have already been effectively "cold unplugged" by
the domain going down.
I might be wrong, but this behaviour is reminiscent of (although not identical to) the bug in the vif-bridge script that I reported some time ago (see http://xen.markmail.org/thread/auroivzr4vje3bzn ; btw, the discussion there seems to have stalled): the vif-bridge script also tried to do something (i.e. deleting an i/f from the bridge and bringing the i/f down) that had evidently already been done by shutting down the guest domain.

Perhaps libxl__device_pci_remove_xenstore should observe that the state
is > 4 (hence closing/closed) and not bother doing anything, i.e. only
waiting iff the state is <4 (init, connecting etc)? Or unconditionally
removing the nodes if state > 4. (perhaps state 7, reconfiguring needs
handling here too)
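
For illustration only, the check suggested above could be sketched roughly as below. The state values are the XenbusState numbers from Xen's public io/xenbus.h; the action enum and pci_remove_decide() are hypothetical names made up for this sketch and are not existing libxl code. How states 7/8 (reconfiguring/reconfigured) should be treated is left open, as it is in the paragraph above.

```c
/* Xenbus device states, numbered as in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown       = 0,
    XenbusStateInitialising  = 1,
    XenbusStateInitWait      = 2,
    XenbusStateInitialised   = 3,
    XenbusStateConnected     = 4,
    XenbusStateClosing       = 5,
    XenbusStateClosed        = 6,
    XenbusStateReconfiguring = 7,
    XenbusStateReconfigured  = 8
};

/* Hypothetical result type: what the removal path should do next. */
enum pci_remove_action {
    PCI_REMOVE_WAIT,      /* state < 4: backend still coming up, wait */
    PCI_REMOVE_PROCEED,   /* state == 4: connected, normal hot unplug */
    PCI_REMOVE_NO_WAIT,   /* state 5 or 6: already closing/closed,
                             skip the wait and just remove the nodes */
    PCI_REMOVE_UNDECIDED  /* state 7 or 8 (or anything else): needs a
                             policy decision, not settled in this thread */
};

/* Hypothetical helper: map the pciback state to an action. */
enum pci_remove_action pci_remove_decide(int state)
{
    if (state < XenbusStateConnected)
        return PCI_REMOVE_WAIT;
    if (state == XenbusStateConnected)
        return PCI_REMOVE_PROCEED;
    if (state == XenbusStateClosing || state == XenbusStateClosed)
        return PCI_REMOVE_NO_WAIT;
    return PCI_REMOVE_UNDECIDED;
}
```

With the state 6 reported above, such a helper would return PCI_REMOVE_NO_WAIT, i.e. the wait that appears to cause the 10s delay would be skipped.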

Or perhaps the force parameter passed to remove_common (which indicates
destroy rather than unplug) ought to be propagated down to this code and
$something done with it.

Roger, Ian, any thoughts on that?

This reminds me of this bug:

commit 098b1aeaf4d6149953b8f1f8d55c21d85536fbff
Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date:   Mon Jun 10 16:48:09 2013 -0400

     xen/pcifront: Deal with toolstack missing 'XenbusStateClosing' state.

... snip..

     In other words, this 4(Connected)->5(Closing)->4(Connected) state
     was expected, while 4(Connected)->.... anything but 
     was not. This patch removes that aggressive check and allows
     Xen pcifront to work with the 'xl' toolstack (for one or more
     PCI devices) and with 'xm' toolstack (for more than two PCI

But this seems to be a different state issue?

Ariel/Atom2, do you see this behavior with 'xend'? And which kernel
version are you running in the guest?
Hi Konrad,
no, I am using xl; neither xend nor xm is installed on the machine or involved in any way (I assume that by xend you meant the xm toolstack as opposed to xl).

The Xen (and xen-tools) version is 4.3.1-r5 and the Linux kernel is 3.11.7-r1 from Gentoo hardened-sources (for both the guest and dom0, although with different kernel configs). Both the kernel and Xen/xen-tools are the latest stable versions available as Gentoo ebuilds.


