[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen-unstable: problems with qmp socket error in libxl_pci teardown path for HVM guest with PCI passthrough
On Fri, 28 Nov 2014, Sander Eikelenboom wrote: > Friday, November 28, 2014, 6:09:52 PM, you wrote: > > > On Fri, 28 Nov 2014, Wei Liu wrote: > >> On Fri, Nov 28, 2014 at 04:54:19PM +0000, Stefano Stabellini wrote: > >> > On Fri, 28 Nov 2014, Wei Liu wrote: > >> > > On Fri, Nov 28, 2014 at 03:08:51PM +0000, Stefano Stabellini wrote: > >> > > > create ^ > >> > > > title it QMP connection problems prevent libxl from calling > >> > > > libxl__device_pci_reset on domain shutdown > >> > > > thanks > >> > > > > >> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote: > >> > > > > Hi, > >> > > > > > >> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c: > >> > > > > libxl__device_pci_reset()" > >> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen) > >> > > > > with pci passthrough. > >> > > > > xl didn't show any problems on the commandline so i never drawed > >> > > > > much attention > >> > > > > to it, but /var/log/xen/xl-guest.log shows: > >> > > > > > >> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450] > >> > > > > Domain 19 has shut down, reason code 0 0x0 > >> > > > > Action for shutdown reason code 0 is destroy > >> > > > > Domain 19 needs to be cleaned up: destroying the domain > >> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error: > >> > > > > Connection reset by peer > >> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to > >> > > > > connect to QMP > >> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove: > >> > > > > libxl__qmp_pci_del: Connection reset by peer > >> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model > >> > > > > already exited > >> > > > > Done. Exiting now > >> > > > > > >> > > > > So it doesn't even get to calling "libxl_pci.c: > >> > > > > libxl__device_pci_reset()". > >> > > > > > >> > > > > -- > >> > > > >> > > It's worth checking whether the device model exits too early, i.e., did > >> > > it crash? What's in the DM log? > >> > > >> > Firstly I was thinking that if force=1 it makes sense to continue even > >> > when QEMU returns error, see: > >> > > >> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com > >> > >> In normal case DM is not destroyed until PCI device is removed. So I > >> think the real issue is DM crashed. > > > I don't think it crashed: QEMU detects that domain shutdown has been > > called and exits. > > See: > > > static void main_loop(void) > > { > > bool nonblocking; > > int last_io = 0; > > #ifdef CONFIG_PROFILER > > int64_t ti; > > #endif > > do { > > > [...] > > > } while (!main_loop_should_exit()); > > } > > > > main_loop_should_exit returns true when qemu_shutdown_requested(). > > > >> libxl.c: > >> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0) > >> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for > >> domid %d", domid); > >> 1573 rc = xc_domain_pause(ctx->xch, domid); > >> 1574 if (rc < 0) { > >> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, > >> "xc_domain_pause failed for %d", domid); > >> 1576 } > >> 1577 if (dm_present) { > >> 1578 if (libxl__destroy_device_model(gc, domid) < 0) > >> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, > >> "libxl__destroy_device_model failed for %d", domid); > >> 1580 > >> 1581 libxl__qmp_cleanup(gc, domid); > >> 1582 } > >> > >> The patch you posted covers up the real issue. > > > That is true, but I think the patch is correct regardless of the issue. > > Hmm how about qemu's "-no-shutdown" option ? > >From the manpage: > -no-shutdown > Don't exit QEMU on guest shutdown, but instead only stop the emulation. > This allows for instance switching to monitor to commit changes to the disk > image. > > with: > device_model_args_hvm = [ '-no-shutdown' ] > > I get: > > /var/log/xen/xl-tv.log: > > Waiting for domain tv (domid 18) to die [pid 18587] > Domain 18 has shut down, reason code 0 0x0 > Action for shutdown reason code 0 is destroy > Domain 18 needs to be cleaned up: destroying the domain > libxl: error: libxl_pci.c:1299:do_pci_remove: xc_domain_irq_permission > irq=40: Invalid argument Thanks for testing! The suggestion might be a good idea, but I am not sure whether it is suitable given the stage of the release cycle. I would wait until 4.6 before making that change. Regarding the xc_domain_irq_permission error, it is just a matter of calling xc_physdev_unmap_pirq after xc_domain_irq_permission: diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 316643c..904d255 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1288,14 +1290,14 @@ skip1: goto out; } if ((fscanf(f, "%u", &irq) == 1) && irq) { - rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq); - if (rc < 0) { - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_physdev_unmap_pirq irq=%d", irq); - } rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0); if (rc < 0) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_domain_irq_permission irq=%d", irq); } + rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq); + if (rc < 0) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_physdev_unmap_pirq irq=%d", irq); + } } fclose(f); } _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |