
Re: [Xen-devel] [PATCH 2/2] libxl_pci: Fix guest shutdown with PCI PT attached



On 01/10/2019 12:35, Anthony PERARD wrote:
> Rewrite of the commit message:
> 
> Before the problematic commit, libxl used to ignore errors when
> destroying (force == true) a passthrough device, especially errors that
> happen when dealing with the DM.
> 
> Since fae4880c45fe, if the DM fails to detach the PCI device within
> the allowed time, the resulting timeout error skips part of
> pci_remove_*, and is also raised up to the caller of
> libxl__device_pci_destroy_all, libxl__destroy_domid, so the
> destruction of the domain fails.
> 
> In this patch, if the DM doesn't confirm that the device is removed,
> we print a warning and keep going if force=true.  The patch
> reorders the functions so that pci_remove_timeout() calls
> pci_remove_detatched(), as is done when DM calls are successful.
> 
> We also clean the QMP states and associated timeouts earlier, as soon
> as they are not needed anymore.
> 
> Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> Fixes: fae4880c45fe015e567afa223f78bf17a6d98e1b
> Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> 
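The control flow described in the commit message (timeout path reuses the "detached" path, warns and continues when force is set, and releases QMP state and the timer early) can be modelled as follows. This is a hypothetical C sketch, not libxl code: `struct pci_remove_state`, `remove_timeout()` and `remove_detached()` are illustrative stand-ins for the real `pci_remove_timeout()` and `pci_remove_detatched()`, and the field names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes; NOT the real libxl ERROR_* values. */
enum rc { RC_OK = 0, RC_TIMEDOUT = -1 };

/* Hypothetical per-removal state; NOT the real libxl struct. */
struct pci_remove_state {
    bool force;            /* true when the whole domain is being destroyed */
    bool qmp_active;       /* QMP connection/state still held */
    bool timer_armed;      /* removal timeout still registered */
    bool device_released;  /* rest of the removal (e.g. reassign) was done */
};

/* Common final step, reached both on DM success and on timeout. */
static enum rc remove_detached(struct pci_remove_state *st, enum rc rc)
{
    /* Drop QMP state and the timeout as soon as they are not needed. */
    st->qmp_active = false;
    st->timer_armed = false;

    /* Without force, a DM error still aborts the removal. */
    if (rc != RC_OK && !st->force)
        return rc;

    /* With force (or on success), carry on with the remaining steps. */
    st->device_released = true;
    return RC_OK;
}

/* Called when the DM did not confirm the removal in time. */
static enum rc remove_timeout(struct pci_remove_state *st)
{
    if (st->force)
        fprintf(stderr, "warning: DM did not confirm PCI device removal, continuing\n");
    return remove_detached(st, RC_TIMEDOUT);
}
```

With `force` set, a timeout now ends in the same place as a successful removal, so the error no longer propagates to the domain-destroy caller; without `force`, the timeout error is still returned.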

Hi Anthony / Chao,

I have to come back to this, partly because there may be an underlying issue.
I had noticed earlier that the VM to which I pass through most PCI devices
(8 to be exact) had become very slow to shut down, but I didn't investigate it
further.

After your commit message for this patch it kept nagging me, so today I did
some testing and bisecting.

The difference in tear-down time, at least from what the IOMMU code logs, is
quite large:

xen-4.12.0
        Setup:      7.452 s
        Tear-down:  7.626 s

xen-unstable-ee7170822f1fc209f33feb47b268bab35541351d
        Setup:      7.468 s
        Tear-down: 50.239 s
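These figures match the spread between the first and last AMD-Vi timestamps in the corresponding log excerpts, e.g. 10:24:43.406 minus 10:23:53.167 gives 50.239 s. A small hypothetical helper (not part of Xen) that reproduces the numbers from the time part of the console timestamps, assuming both stamps fall on the same day and carry exactly three fractional digits:

```c
#include <assert.h>
#include <stdio.h>

/* Convert "HH:MM:SS.mmm" to milliseconds; returns -1 on a parse failure.
 * Assumes exactly three fractional digits, as in the Xen console stamps. */
static long ts_to_ms(const char *ts)
{
    int h, m, s, ms;
    if (sscanf(ts, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((long)h * 3600 + (long)m * 60 + s) * 1000 + ms;
}

/* Elapsed milliseconds between two same-day timestamps, or -1 on error. */
static long span_ms(const char *first, const char *last)
{
    long a = ts_to_ms(first);
    long b = ts_to_ms(last);
    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

Applied to the first and last AMD-Vi lines of each excerpt, this gives 7452, 7626, 7468 and 50239 ms, i.e. the setup and tear-down times listed above.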

Bisection turned up:
        commit c4b1ef0f89aa6a74faa4618ce3efed1de246ec40
        Author: Chao Gao <chao.gao@xxxxxxxxx>
        Date:   Fri Jul 19 10:24:08 2019 +0100
        libxl_qmp: wait for completion of device removal

This makes me wonder whether something is going wrong in QEMU.

--
Sander



xen-4.12.0 setup:
        (XEN) [2019-10-10 09:54:14.846] AMD-Vi: Disable: device id = 0x900, domain = 0, paging mode = 3
        (XEN) [2019-10-10 09:54:14.846] AMD-Vi: Setup I/O page table: device id = 0x900, type = 0x1, root table = 0x4aa847000, domain = 1, paging mode = 3
        (XEN) [2019-10-10 09:54:14.846] AMD-Vi: Re-assign 0000:09:00.0 from dom0 to dom1
        ...
        (XEN) [2019-10-10 09:54:22.298] AMD-Vi: Disable: device id = 0x907, domain = 0, paging mode = 3
        (XEN) [2019-10-10 09:54:22.298] AMD-Vi: Setup I/O page table: device id = 0x907, type = 0x1, root table = 0x4aa847000, domain = 1, paging mode = 3
        (XEN) [2019-10-10 09:54:22.298] AMD-Vi: Re-assign 0000:09:00.7 from dom0 to dom1


xen-4.12.0 tear-down:
        (XEN) [2019-10-10 10:01:11.971] AMD-Vi: Disable: device id = 0x900, domain = 1, paging mode = 3
        (XEN) [2019-10-10 10:01:11.971] AMD-Vi: Setup I/O page table: device id = 0x900, type = 0x1, root table = 0x53572c000, domain = 0, paging mode = 3
        (XEN) [2019-10-10 10:01:11.971] AMD-Vi: Re-assign 0000:09:00.0 from dom1 to dom0
        ...
        (XEN) [2019-10-10 10:01:19.597] AMD-Vi: Disable: device id = 0x907, domain = 1, paging mode = 3
        (XEN) [2019-10-10 10:01:19.597] AMD-Vi: Setup I/O page table: device id = 0x907, type = 0x1, root table = 0x53572c000, domain = 0, paging mode = 3
        (XEN) [2019-10-10 10:01:19.597] AMD-Vi: Re-assign 0000:09:00.7 from dom1 to dom0

xen-unstable-ee7170822f1fc209f33feb47b268bab35541351d setup:
        (XEN) [2019-10-10 10:21:38.549] d1: bind: m_gsi=47 g_gsi=36 dev=00.00.5 intx=0
        (XEN) [2019-10-10 10:21:38.621] AMD-Vi: Disable: device id = 0x900, domain = 0, paging mode = 3
        (XEN) [2019-10-10 10:21:38.621] AMD-Vi: Setup I/O page table: device id = 0x900, type = 0x1, root table = 0x4aa83b000, domain = 1, paging mode = 3
        (XEN) [2019-10-10 10:21:38.621] AMD-Vi: Re-assign 0000:09:00.0 from dom0 to dom1
        ...
        (XEN) [2019-10-10 10:21:46.069] d1: bind: m_gsi=46 g_gsi=36 dev=00.01.4 intx=3
        (XEN) [2019-10-10 10:21:46.089] AMD-Vi: Disable: device id = 0x907, domain = 0, paging mode = 3
        (XEN) [2019-10-10 10:21:46.089] AMD-Vi: Setup I/O page table: device id = 0x907, type = 0x1, root table = 0x4aa83b000, domain = 1, paging mode = 3
        (XEN) [2019-10-10 10:21:46.089] AMD-Vi: Re-assign 0000:09:00.7 from dom0 to dom1


xen-unstable-ee7170822f1fc209f33feb47b268bab35541351d tear-down:
        (XEN) [2019-10-10 10:23:53.167] AMD-Vi: Disable: device id = 0x900, domain = 1, paging mode = 3
        (XEN) [2019-10-10 10:23:53.167] AMD-Vi: Setup I/O page table: device id = 0x900, type = 0x1, root table = 0x5240f8000, domain = 0, paging mode = 3
        (XEN) [2019-10-10 10:23:53.167] AMD-Vi: Re-assign 0000:09:00.0 from dom1 to dom0
        ...
        (XEN) [2019-10-10 10:24:43.406] AMD-Vi: Disable: device id = 0x907, domain = 1, paging mode = 3
        (XEN) [2019-10-10 10:24:43.406] AMD-Vi: Setup I/O page table: device id = 0x907, type = 0x1, root table = 0x5240f8000, domain = 0, paging mode = 3
        (XEN) [2019-10-10 10:24:43.406] AMD-Vi: Re-assign 0000:09:00.7 from dom1 to dom0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
