Re: [Xen-users] substantial shutdown delay for PV guests with PCI-passthrough
Adding xen-devel. Full thread starts at http://lists.xen.org/archives/html/xen-users/2014-03/msg00102.html

On Mon, 2014-03-17 at 19:13 +0100, Atom2 wrote:
> > Any chance you could try 4.3.2, or even 4.4.0?
> Unfortunately neither of these versions is currently available as
> stable ebuilds for my distribution, but I assume it shouldn't be long
> before there's some movement.

Looking at the diff to tools/libxl/libxl_pci.c I don't see any pertinent-looking fixes, so it seems this issue probably still exists.

> >
> >> The system is capable of VT-d and uses a Xeon E3-1260L processor.
> >>
> >> Do these observations ring a bell with anybody, or is this even expected
> >> behaviour? If this is not normal (which I would expect, as I have not
> >> been able to find any information relating to substantial delays during
> >> shutdown), how would I go about getting to the bottom of this?
> >
> > My guess would be that the xl process which is managing the domain destroy
> > is waiting for something (perhaps pciback) to confirm shutdown for each
> > device, and this is timing out in series, leading to the delays. You
> > might find something in the logs in /var/log/xen pointing to something
> > like this.
> >
> > If not, then if you start the guest with "xl -vvv create -F <cfg>", the
> > xl process which is monitoring the domain will stay in the
> > foreground and log to stdout (I think). If you then issue the
> > shutdown from another shell, perhaps there will be some obvious gaps in
> > the logs as things shut down, which might help.
> That worked and there was also some output; please find the log from
> start to finish attached to this mail. I have marked various points in
> the log: first, the point where startup was done and the domU was
> live, and secondly the 4 points in time (or rather output) where the
> 10s delay occurred.
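The "timing out in series" guess can be made concrete with a toy model. The C sketch below is hypothetical (the function names are invented for illustration and are not libxl code): it assumes each passed-through device's removal can block for a fixed timeout, and compares the worst case when devices are handled one after another with the worst case if those waits were overlapped asynchronously.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: timeout_ms[i] is the longest device i's removal can
 * block if its backend never confirms. Not taken from libxl. */

/* Devices handled one after another: the worst-case pauses add up. */
static unsigned long serial_worst_case_ms(const unsigned long *timeout_ms, size_t n)
{
    assert(timeout_ms != NULL || n == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += timeout_ms[i];
    return total;
}

/* Devices waited on concurrently: the worst case is the longest single wait. */
static unsigned long parallel_worst_case_ms(const unsigned long *timeout_ms, size_t n)
{
    assert(timeout_ms != NULL || n == 0);
    unsigned long longest = 0;
    for (size_t i = 0; i < n; i++)
        if (timeout_ms[i] > longest)
            longest = timeout_ms[i];
    return longest;
}
```

With the four 10s pauses reported in the log, the serial model gives 40000 ms in total, while overlapping the waits would cap the delay at the single longest one, 10000 ms.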
Quoting the relevant bit for -devel; the full log is at http://lists.xen.org/archives/html/xen-users/2014-03/txtl6VscE4NMf.txt:

    Domain 3 has shut down, reason code 0 0x0
    Action for shutdown reason code 0 is destroy
    Domain 3 needs to be cleaned up: destroying the domain
    libxl: debug: libxl.c:1252:libxl_domain_destroy: ao 0x7f1dddf2b850: create: how=(nil) callback=(nil) poller=0x7f1dddf2cd70
    libxl: error: libxl_pci.c:1248:do_pci_remove: xc_domain_irq_permission irq=17
    <NOTE: at this point a 10s pause happens>
    libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready
    libxl: debug: libxl_pci.c:173:libxl__device_pci_remove_xenstore: pci backend at /local/domain/0/backend/pci/3/0 is not ready
    libxl: error: libxl_pci.c:1248:do_pci_remove: xc_domain_irq_permission irq=16
    <NOTE: at this point a 10s pause happens>
    libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/3/0 not ready
    libxl: debug: libxl_pci.c:173:libxl__device_pci_remove_xenstore: pci backend at /local/domain/0/backend/pci/3/0 is not ready
    [repeat for more devices]

Do you get anything in "xl dmesg" or dom0's "dmesg" corresponding to these events?

Looking at do_pci_remove after the call to xc_domain_irq_permission (which fails, but I don't think that relates to the delay), we then call (conditionally) libxl__device_pci_reset, xc_deassign_device, libxl__device_pci_remove_common and libxl__device_pci_remove_xenstore, with no logging to indicate which we are calling (not helpful!).

The "is not ready" message comes from libxl__device_pci_remove_xenstore, which calls libxl__wait_for_backend. The latter has been rewritten a bit since 4.3.1, but not in a way which I think would affect this case. libxl__wait_for_backend does have a usleep(10000) in it -- which is certainly the source of the delay, but I'd like to explain how we got to waiting like that anyway. (IanJ: do you have PCI on your hitlist for asyncing up?)
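For readers unfamiliar with the pattern, here is a minimal sketch of a bounded polling wait of the kind described above. It is not the libxl source: wait_for_state, read_state_fn and the fake backend are hypothetical, and the real function reads the backend's state from xenstore rather than through a callback. It shows how a backend that never reaches the expected state turns into a fixed pause of roughly max_polls * interval_ms per device before a "not ready" error can be reported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback standing in for a xenstore read of the backend's
 * "state" node; returns NULL if the node is missing. */
typedef const char *(*read_state_fn)(void *ctx);

/* Poll until the backend reports `want`, sleeping `interval_ms` between
 * reads and giving up after `max_polls` reads. Returns the number of reads
 * needed on success, or -1 on timeout. A backend that never reaches `want`
 * costs roughly max_polls * interval_ms before the caller sees the error. */
static int wait_for_state(read_state_fn read_state, void *ctx,
                          const char *want, int max_polls, long interval_ms)
{
    assert(read_state != NULL && want != NULL && interval_ms >= 0);
    struct timespec gap = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    for (int i = 1; i <= max_polls; i++) {
        const char *cur = read_state(ctx);
        if (cur != NULL && strcmp(cur, want) == 0)
            return i;
        if (i < max_polls)
            nanosleep(&gap, NULL); /* no sleep after the final failed read */
    }
    return -1;
}

/* Hypothetical fake backend for trying the loop out: reports "1"
 * (Initialising) until `reads_until_ready` reads have happened, then "4"
 * (Connected). */
struct fake_backend { int reads; int reads_until_ready; };

static const char *fake_read(void *ctx)
{
    struct fake_backend *be = ctx;
    be->reads++;
    return be->reads >= be->reads_until_ready ? "4" : "1";
}
```

A fake backend that becomes ready on its third read makes wait_for_state return 3; one that never becomes ready makes it return -1 after exactly max_polls reads, having slept between each of them.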
This thing about pciback not being ready rings a bell. I've cc'd a few folks who I think might remember more.

While the domain is happily running, can you provide the output of "xenstore-ls -fp"? I'm curious what state pciback is in. It should be 4; if not, then that would be the problem.

> BTW: I don't know whether it makes any difference, but I am only using
> xen-pciback.hide=(bb:dd.f)(...) on the grub command line for a number of
> devices, including those that I pass through to this domU. There's
> nothing else happening in dom0 with those devices prior to starting
> the domU, and there are also no driver modules available in dom0 for any
> of the hidden hardware (except for one of the hidden USB controllers on
> the motherboard, which is also passed through).

I don't think that should matter here.

Ian.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
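To help interpret the state value that "xenstore-ls -fp" shows for the pciback backend: the number corresponds to enum xenbus_state in Xen's public io/xenbus.h header, where 4 is Connected. The helper below, describe_xenbus_state, is hypothetical and only decodes a state string; it does not talk to xenstore.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Index == numeric value of enum xenbus_state in Xen's public io/xenbus.h. */
static const char *const xenbus_state_names[] = {
    "Unknown",       /* 0 */
    "Initialising",  /* 1 */
    "InitWait",      /* 2 */
    "Initialised",   /* 3 */
    "Connected",     /* 4: the state pciback should report while running */
    "Closing",       /* 5 */
    "Closed",        /* 6 */
    "Reconfiguring", /* 7 */
    "Reconfigured"   /* 8 */
};

/* Hypothetical helper: turn the text of a backend "state" node into a
 * readable name. Returns NULL for non-numeric or out-of-range input. */
static const char *describe_xenbus_state(const char *value)
{
    assert(value != NULL);
    char *end;
    errno = 0;
    long v = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return NULL;
    if (v < 0 || v >= (long)(sizeof xenbus_state_names / sizeof xenbus_state_names[0]))
        return NULL;
    return xenbus_state_names[v];
}
```

For example, describe_xenbus_state("4") returns "Connected", while a backend stuck at "1" would decode as "Initialising".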