[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] pci: cleanup MSI interrupts before removing device from IOMMU


  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Date: Wed, 21 Oct 2020 10:19:45 +0200
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Wed, 21 Oct 2020 08:20:52 +0000
  • Ironport-sdr: 5C733hWXykDkPs3r9anZHWXJn489rFYLVVlMW3rYOIs2sYhLzzkEalWK+2QkxYlnDGUGGP1yKw JS9016T5ZQcD/Y6d/ikMtintAYeGauk5cMhIMlmMsHUK6lutG9Bd3vBECyYzQFWjy6TtiXm73n Kri8kUEoJFPYwGMDVlLbQuKX/MVORFdxOaROemC1SaryRRpyqOUHUBfYkAeEQvQH8oCTljfLjI 47ZSZZpMfgqJwY2ttMyP5UKpc+yycR/BFA6ceKAXjbq2jgqOIQ0uxSgKn6k7kzqAFFuryUMFR/ TEw=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Doing the MSI cleanup after removing the device from the IOMMU leads
to the following panic on AMD hardware:

Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' 
failed at iommu_intr.c:172
----[ Xen-4.13.1-10.0.3-d  x86_64  debug=y   Not tainted ]----
CPU:    3
RIP:    e008:[<ffff82d08026ae3c>] 
drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
[...]
Xen call trace:
   [<ffff82d08026ae3c>] R 
drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
   [<ffff82d08026af25>] F 
drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
   [<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
   [<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
   [<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
   [<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
   [<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
   [<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
   [<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
   [<ffff82d08038a432>] F lstar_enter+0x112/0x120

That's because the call to iommu_remove_device on AMD hardware will
remove the per-device interrupt remapping table, and hence the call to
pci_cleanup_msi done afterwards will find a null intremap table and
crash.

Reorder the calls so that MSI interrupts are torn down before removing
the device from the IOMMU.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
I've discussed the issue with Andrew and maybe we should try to avoid
removing the interrupt remapping table on device removal, but then the
tables would have to be sized to support the maximum amount of
interrupts instead of the maximum supported by the device currently
plugged in.

I think the currently proposed fix can be easily backported. We can
see about improving the resilience going ahead.
---
 xen/drivers/passthrough/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index b035067975..64b8a77ce0 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -834,10 +834,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
+            /*
+             * Cleanup MSI interrupts before removing the device from the
+             * IOMMU, or else the internal IOMMU data used to track the device
+             * interrupts might be already gone.
+             */
+            pci_cleanup_msi(pdev);
             ret = iommu_remove_device(pdev);
             if ( pdev->domain )
                 list_del(&pdev->domain_list);
-            pci_cleanup_msi(pdev);
             printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
             free_pdev(pseg, pdev);
             break;
-- 
2.29.0




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.