[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH v5 1/3] x86/msi: harden stale pdev handling


  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Fri, 11 Oct 2024 11:27:06 -0400
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Bae18Nq7AOaW1aZ8YIQKtu/Xs/0xVUuY6JnFn3meqtw=; b=PTP82t8gf/ZrirNSf48Hfa8MFV17jamp7YngmCIlwzNbcIQ4L5FavPPRIyhSNYkGIyL5mlnzEvBg/ClAM2toHAmLBR6tI4UwUDpHcQ2m4tckU31hcpOUwjJrVLP+4ESyDS0Gz0jnyC5ANZbzKirMRlWeUXo08t9513mjW/+ZhOBlQ/MkimcPLmeBH05LT59+lb0N5NbUqGJ8U3BAvaHv6q8WM/k7O3iIKzMZvTEX0VE0/PN81YyHDmSxANRjFVTJFPN8kubavth4a9lM21RjRHE/gOEcvre0oaDDa4PmypliyA7HICJvq1cb151dE40XjZcirTl0Ri/602CbIuFNcA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=PVuecqEyK9QUPIM/OucNgwHJLHQqk9B2HELaqoNmi/PrRqgkHLFflWRpNamcapp+o7YKCasjXPEn1FPGlO4ymN5ZPuz8zRkA8/5cdzuDWzzNKMBBZxfFE+rXAZ5QEIbsRMmD7bugkOuV3MJuMQ60/MvyzeZ3WWLwZMux4L9KiTg1uE3i9b0Q7KHus5VslEdfDlOa561d8t73iXq1zkkfFvgM23+ERobLWRwBQbVrFhGAw5trMacoZMQq1gn+d2kBpb3be8fJ9vb1UrT1uyCUvT796VDfusUIkIgnQLpSl64Ij8qF59GrYFhxiBqF2gec2DzYUCW7r2qb8Fo05tbaew==
  • Cc: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 11 Oct 2024 15:27:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Dom0 normally informs Xen of PCI device removal via
PHYSDEVOP_pci_device_remove, e.g. in response to SR-IOV disable or
hot-unplug. We might find ourselves with stale pdevs if a buggy dom0
fails to report removal via PHYSDEVOP_pci_device_remove. In this case,
attempts to access the config space of the stale pdevs would be invalid
and return all 1s.

Some possible conditions leading to this are:

1. Dom0 disables SR-IOV without reporting VF removal to Xen.

The Linux SR-IOV subsystem normally reports VF removal when a PF driver
disables SR-IOV. In case of a buggy dom0 SR-IOV subsystem, SR-IOV could
become disabled with stale dangling VF pdevs in both dom0 Linux and Xen.

2. Dom0 reporting PF removal without reporting VF removal.

During SR-IOV PF removal (hot-unplug), a buggy PF driver may fail to
disable SR-IOV, thus failing to remove the VFs, leaving stale dangling
VFs behind in both Xen and Linux. At least Linux warns in this case:

[  100.000000]  0000:01:00.0: driver left SR-IOV enabled after remove

In either case, Xen is left with stale VF pdevs, risking invalid PCI
config space accesses.

When Xen is built with CONFIG_DEBUG=y, the following Xen crashes were
observed when dom0 attempted to access the config space of a stale VF:

(XEN) Assertion 'pos' failed at arch/x86/msi.c:1274
(XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d040346834>] R pci_msi_conf_write_intercept+0xa2/0x1de
(XEN)    [<ffff82d04035d6b4>] F pci_conf_write_intercept+0x68/0x78
(XEN)    [<ffff82d0403264e5>] F arch/x86/pv/emul-priv-op.c#pci_cfg_ok+0xa0/0x114
(XEN)    [<ffff82d04032660e>] F 
arch/x86/pv/emul-priv-op.c#guest_io_write+0xb5/0x1c8
(XEN)    [<ffff82d0403267bb>] F arch/x86/pv/emul-priv-op.c#write_io+0x9a/0xe0
(XEN)    [<ffff82d04037c77a>] F x86_emulate+0x100e5/0x25f1e
(XEN)    [<ffff82d0403941a8>] F x86_emulate_wrapper+0x29/0x64
(XEN)    [<ffff82d04032802b>] F pv_emulate_privileged_op+0x12e/0x217
(XEN)    [<ffff82d040369f12>] F do_general_protection+0xc2/0x1b8
(XEN)    [<ffff82d040201aa7>] F x86_64/entry.S#handle_exception_saved+0x2b/0x8c

(XEN) Assertion 'pos' failed at arch/x86/msi.c:1246
(XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d040346b0a>] R pci_reset_msix_state+0x47/0x50
(XEN)    [<ffff82d040287eec>] F pdev_msix_assign+0x19/0x35
(XEN)    [<ffff82d040286184>] F 
drivers/passthrough/pci.c#assign_device+0x181/0x471
(XEN)    [<ffff82d040287c36>] F iommu_do_pci_domctl+0x248/0x2ec
(XEN)    [<ffff82d040284e1f>] F iommu_do_domctl+0x26/0x44
(XEN)    [<ffff82d0402483b8>] F do_domctl+0x8c1/0x1660
(XEN)    [<ffff82d04032977e>] F pv_hypercall+0x5ce/0x6af
(XEN)    [<ffff82d0402012d3>] F lstar_enter+0x143/0x150

Replace the ASSERT(s) with an error, and mark the device broken to
disallow passthrough to domUs.

Fixes: 484d7c852e4f ("x86/MSI-X: track host and guest mask-all requests 
separately")
Fixes: 575e18d54d19 ("pci: clear {host/guest}_maskall field on assign")
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
---
v4->v5:
* new patch, independent of the rest of the series
* new approach to fixing the issue: don't rely on dom0 to report any
  sort of device removal; rather, fix the condition directly

---
Instructions to reproduce
Requires Xen with CONFIG_DEBUG=y
Tested with Linux 6.11

1. Dom0 disables SR-IOV without reporting VF removal to Xen.

* Hack the Linux SR-IOV subsystem to remove the call to
  pci_stop_and_remove_bus_device() in
  drivers/pci/iov.c:pci_iov_remove_virtfn().

* Enable SR-IOV, then disable SR-IOV
  echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
  echo 0 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs

* Now we have a stale VF. We can trigger the ASSERT either by unbinding
  the VF driver and issuing a reset...

  echo 0000\:01\:10.0 > /sys/bus/pci/devices/0000\:01\:10.0/driver/unbind
  echo 1 > /sys/bus/pci/devices/0000\:01\:10.0/reset

  ... or by doing xl pci-assignable-add

  xl pci-assignable-add 01:10.0

2. Dom0 reporting PF removal without reporting VF removal.

* Hack your PF driver to leave SR-IOV enabled when removing the device

* Enable SR-IOV

  echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs

* Unplug the PCI device

  (qemu) device_del mydev

* Now we have a stale VF. We can trigger the ASSERT either by re-adding
  the PF device with SR-IOV disabled...

  echo 0000\:01\:10.0 > /sys/bus/pci/devices/0000\:01\:10.0/driver/unbind
  (qemu) device_add igb,id=mydev,bus=pcie.1,netdev=net1

  ... or by reset / xl pci-assignable-add as above.
---
 xen/arch/x86/msi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index ff2e3d86878d..fbb07fe821b5 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1243,7 +1243,12 @@ int pci_reset_msix_state(struct pci_dev *pdev)
 {
     unsigned int pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_MSIX);
 
-    ASSERT(pos);
+    if ( !pos )
+    {
+        pdev->broken = true;
+        return -EFAULT;
+    }
+
     /*
      * Xen expects the device state to be the after reset one, and hence
      * host_maskall = guest_maskall = false and all entries should have the
@@ -1271,7 +1276,12 @@ int pci_msi_conf_write_intercept(struct pci_dev *pdev, 
unsigned int reg,
         entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX);
         pos = entry ? entry->msi_attrib.pos
                     : pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_MSIX);
-        ASSERT(pos);
+
+        if ( !pos )
+        {
+            pdev->broken = true;
+            return -EFAULT;
+        }
 
         if ( reg >= pos && reg < msix_pba_offset_reg(pos) + 4 )
         {
-- 
2.47.0




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.