[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [Xen-users] substantial shutdown delay for PV guests with PCI -passthrough
Am 18.03.14 11:15, schrieb Ian Campbell: Adding xen-devel. Full thread starts at http://lists.xen.org/archives/html/xen-users/2014-03/msg00102.html On Mon, 2014-03-17 at 19:13 +0100, Atom2 wrote:Any chance you could try 4.3.2, or even 4.4.0?Unfortunately neither of these versions are currently available as stable ebuilds for my distribution, but I assume it shouldn't be long before there's some movement.Looking at the diff to tools/libxl/libxl_pci.c I don't see any pertinent looking fixes so it seems probably this issue still exists.The system is capable of vt-d and uses a Xeon E3-1260L processor. Do these observations ring a bell with anybody or is this even expected behaviour. If this is not normal - which I would expect as I have not been able to find any information relating to substantial delays during shutdown - how would I go about getting to the grounds of this?My guess would be that xl process which is managing the domain destroy is waiting for something (perhaps pciback) to confirm shutdown for each device and this is timing out in series, leading to the delays. You might find something in the logs /var/log/xen pointing to something like this. If not then if you start the guest with "xl -vvv create -F <cfg>" then the xl process which is monitoring the domain will stay in the foreground and be logging to stdout (I think). If you then issue the shutdown from another shell perhaps there will be some obvious gaps in the logs as things shutdown which might help.That worked and there also was some output - please find the log from start to finnish attached to this mail. I have marked various points in the log: First the point where the startup was done and the domU was live and secondly those 4 points in time (or rather output) where the 10s delay occured.Quoting the relevant bit for -devel, full log is at http://lists.xen.org/archives/html/xen-users/2014-03/txtl6VscE4NMf.txt: Domain 3 has shut down, reason code 0 0x0 Action for shutdown reason code 0 is destroy Domain 3 needs to be cleaned up: destroying the domain libxl: debug: libxl.c:1252:libxl_domain_destroy: ao 0x7f1dddf2b850: create: how=(nil) callback=(nil) poller=0x7f1dddf2cd70 libxl: error: libxl_pci.c:1248:do_pci_remove: xc_domain_irq_permission irq=17 <NOTE: at this point a 10s pause happens> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready libxl: debug: libxl_pci.c:173:libxl__device_pci_remove_xenstore: pci backend at /local/domain/0/backend/pci/3/0 is not ready libxl: error: libxl_pci.c:1248:do_pci_remove: xc_domain_irq_permission irq=16 <NOTE: at this point a 10s pause happens> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready libxl: debug: libxl_pci.c:173:libxl__device_pci_remove_xenstore: pci backend at /local/domain/0/backend/pci/3/0 is not ready [repeat for more devices] Do you get anything in "xl dmesg" or dom0's "dmesg" corresponding to these events? xl dmesg:There is no additional output in dom0's 'xl dmesg' after creation of the domain with xl create -c There are however two additional lines in dom0's 'xl dmesg' output after I have executed 'shutdown -h now' from the domU's command line following a login as follows: ============== (XEN) irq.c:2028: dom3: forcing unbind of pirq 23 (XEN) tmem: flushing tmem pools for domid=3 ============== dmesg:Additional output of dmesg in dom0 after creating the the domain with xl create -c is as follows: ============== [ 322.225345] device vif3.0 entered promiscuous mode [ 322.828304] xen_pciback: vpci: 0000:03:00.0: assign to virtual slot 0 [ 322.829486] xen_pciback: vpci: 0000:06:00.0: assign to virtual slot 1 [ 322.840174] xen_pciback: vpci: 0000:09:02.0: assign to virtual slot 2 [ 322.841134] xen_pciback: vpci: 0000:00:1d.0: assign to virtual slot 3[ 322.937946] xen-blkback:ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi) [ 322.947106] xen-blkback:ring-ref 2046, event-channel 5, protocol 1 (x86_64-abi) [ 322.955840] xen-blkback:ring-ref 2045, event-channel 6, protocol 1 (x86_64-abi) [ 327.171287] xen-blkback:backend/vbd/3/51713: prepare for reconnect [ 327.194441] xen-blkback:backend/vbd/3/51714: prepare for reconnect [ 327.198687] xen-blkback:backend/vbd/3/51715: prepare for reconnect [ 327.925926] pciback 0000:00:1d.0: enabling device (0000 -> 0002) [ 327.926053] xen: registering gsi 23 triggering 0 polarity 1 [ 327.926059] Already setup the GSI :23[ 327.926219] pciback 0000:00:1d.0: Driver tried to write to a read-only configuration space field at offset 0x6c, size 4. This may be harmless, but if you have problems with your device: 1) see permissive attribute in sysfs2) report problems to the xen-devel mailing list along with details of your device obtained from lspci. [ 327.932217] xen-blkback:ring-ref 9, event-channel 18, protocol 1 (x86_64-abi) persistent grants [ 327.938142] xen-blkback:ring-ref 10, event-channel 19, protocol 1 (x86_64-abi) persistent grants [ 327.943674] xen-blkback:ring-ref 11, event-channel 20, protocol 1 (x86_64-abi) persistent grants [ 328.965061] xenbr0: port 4(vif3.0) entered forwarding state [ 328.965067] xenbr0: port 4(vif3.0) entered forwarding state [ 330.716771] pciback 0000:00:1d.0: setting latency timer to 64 [ 335.728218] pciback 0000:06:00.0: enabling device (0000 -> 0002) [ 335.728286] xen: registering gsi 16 triggering 0 polarity 1 [ 335.728290] Already setup the GSI :16 [ 343.992293] xenbr0: port 4(vif3.0) entered forwarding state ==============FYI: The 0000:00:1d.0 device is the USB host controller being passed through (Intel Sandybridge, C206 chipset); output of lspci: 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) Howvere currently there seems to be no issue with the USB host controller in domU: There are two USB bluetooth dongles attached and they do work (they connect to mobile phones). I'm more than happy to provide more info should this be required. Further output of dmesg in dom0 after I have executed 'shutdown -h now' from the domU's command line following a login is follows: ============== [ 556.864398] xenbr0: port 4(vif3.0) entered disabled state [ 597.584588] xenbr0: port 4(vif3.0) entered disabled state [ 597.584663] device vif3.0 left promiscuous mode [ 597.584668] xenbr0: port 4(vif3.0) entered disabled state ============== So I guess probably not too much of value ... Full output of xenstore-ls -fp (from dom0) is attached as it is 627 lines long and I am not quiet sure what you are actually after): There's nothing in it that reads pciback; there are however a few entries namedLooking at do_pci_remove after the call to xc_domain_irq_permission (which fails, but I don't think that relates to the delay) we then call (conditionally) libxl__device_pci_reset, xc_deassign_device, libxl__device_pci_remove_common and libxl__device_pci_remove_xenstore, with no logging to indicate which we are calling (not helpful!). The "is not ready" message comes from libxl__device_pci_remove_xenstore which calls libxl__wait_for_backend. The latter has been rewritten a bit since 4.3.1 but not in a way which I think would affect this case. libxl__wait_for_backend does have a usleep(10000) in it -- which is certainly the source of the delay, but I'd like to explain how we got to waiting like that anyway (IanJ: do you have PCI on your hitlist for asyncing up?) This thing about pciback not being ready rings a bell. I've cc'd a few folks who I think might remember more. While the domain is happily running can you provide the output of "xenstore-ls -fp" -- I'm curious what state pciback is in. It should be 4, if not then that would be the problem. /local/domain/0/backend/pci and for one of the subentries named 3/0/state (3 is the domain-id) the value seems to be 4: /local/domain/0/backend/pci/3/0/state = "4" (n0,r3) But then there is also another entry further down named /local/domain/3/device/pci/0/state = "4" (n3,r0)But I am speculating here and I guess it's better to leave the interpretation to your expertise ... BTW: I don't know whether it makes any difference, but I am only using xen-pciback.hide=(bb:dd.f)(...) on the grub command line for a number of devices including those that I pass through to this domU - there's nothing else happening in the dom0 with those devices priot to starting the domU and there are also no driver modules available for any of the hidden hardware (except for one of the hidden USB Controllers of the motherboard which is also passed through) in dom0.I don't think that should matter here. Ian. Attachment:
xenstore _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |