
Re: RMRRs and Phantom Functions


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Wed, 27 Apr 2022 10:05:54 +0000
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Edwin Torok <edvin.torok@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 27 Apr 2022 10:06:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27/04/2022 07:59, Jan Beulich wrote:
> On 26.04.2022 19:51, Andrew Cooper wrote:
>> Hello,
>>
>> Edvin has found a machine with some very weird properties.  It is an HP
>> ProLiant BL460c Gen8 with:
>>
>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>              +-01.0-[11]--
>>              +-01.1-[02]--
>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC (be3)
>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC (be3)
>>              |            +-00.2  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
>>              |            \-00.3  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
>>
>> yet all 4 other functions on the device periodically hit IOMMU faults
>> (~once every 5 mins, so definitely stats).
>>
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr bdf80000
>>
>> There are several RMRRs covering these devices, with:
>>
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:03:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.0
>> (XEN) [VT-D] endpoint: 0000:04:00.1
>> (XEN) [VT-D] endpoint: 0000:04:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.3
>> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
>>
>> being the one relevant to these faults.  I've not manually decoded the
>> DMAR table because device paths are horrible to follow but there are at
>> least the correct number of endpoints.  The functions all have SR-IOV
>> (disabled) and ARI (enabled).  None have any Phantom functions described.
>>
>> Specifying pci-phantom=04:00,1 does appear to work around the faults,
>> but it's not right, because functions 1 thru 3 aren't actually phantom.
> Indeed, and I think you really mean "pci-phantom=04:00,4".

As a quick tangent, the cmdline docs for pci-phantom= are in desperate
need of an example and a description of how stride works.  I've got some
ideas and notes jotted down.
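
To make the stride discussion concrete, here is a minimal sketch of one
possible reading of it. It assumes that a function f with stride s is
expected to issue requests as f+s, f+2s, ... within the same 8-function
slot, and that only functions numbered below the stride own phantoms.
Both points are assumptions made for illustration and would need checking
against the Xen source; phantom_functions() is a hypothetical helper, not
anything in the tree.

```python
FUNCS_PER_SLOT = 8  # a PCI slot has function numbers 0..7


def phantom_functions(func, stride):
    """Return the function numbers assumed to alias `func` for `stride`.

    Assumption (not taken from the Xen source): only functions numbered
    below the stride own phantoms, and phantoms never leave the slot.
    """
    if stride not in (1, 2, 4):
        raise ValueError("stride must be 1, 2 or 4")
    if not 0 <= func < FUNCS_PER_SLOT:
        raise ValueError("function number must be in 0..7")
    if func >= stride:
        return []
    return list(range(func + stride, FUNCS_PER_SLOT, stride))


if __name__ == "__main__":
    for stride in (1, 4):
        for func in range(4):  # the four real functions 04:00.0-3
            print(f"stride={stride} func={func} ->",
                  phantom_functions(func, stride))
```

Under these assumptions, ",1" would treat functions 1 thru 7 as phantoms
of function 0 and give function 1 no phantoms of its own, while ",4" would
pair 0 with 4, 1 with 5, 2 with 6 and 3 with 7, which lines up with the
faulting 04:00.4 to 04:00.7.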

Do we really mean ,4 here?  What happens for function 1?

> I guess we
> should actually refuse "pci-phantom=04:00,1" in a case like this one.
> The problem is that at the point we set pdev->phantom_stride we may
> not know of the other devices, yet. But I guess we could attempt a
> config space read of the supposed phantom function's device/vendor
> and do <whatever> if these aren't both 0xffff.

At a minimum, we ought to warn when it looks like something is wonky,
but I wouldn't go as far as rejecting.
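
Roughly, the "warn but don't reject" check could look like the sketch
below. It is only an illustration of the idea: read_ids() is a
hypothetical callback standing in for a config space read of the
vendor/device IDs, and the IDs in the example are made-up placeholders,
not the real ones for this card.

```python
NO_DEVICE = 0xFFFF  # value read back when nothing answers the config read


def responding_phantoms(candidates, read_ids):
    """Return the candidate phantom functions that look like real functions.

    `read_ids(func)` is a hypothetical callback returning a
    (vendor, device) tuple read from config space for that function.
    A candidate is suspicious unless both IDs read back as 0xffff.
    """
    suspicious = []
    for func in candidates:
        vendor, device = read_ids(func)
        if not (vendor == NO_DEVICE and device == NO_DEVICE):
            suspicious.append(func)
    return suspicious


if __name__ == "__main__":
    # Placeholder config space: functions 0-3 exist, 4-7 do not respond.
    present = {f: (0x1234, 0x5678) for f in range(4)}

    def read_ids(func):
        return present.get(func, (NO_DEVICE, NO_DEVICE))

    for func in responding_phantoms(range(1, 8), read_ids):
        print(f"warning: supposed phantom function {func} responds "
              "to config reads")
```

The caller gets back a list and decides what to do with it, so a warning
can be printed without refusing the option outright.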

All of these options to work around firmware/system screwups are applied
to an already-non-working system, and there is absolutely no guarantee
that necessary fixes make any kind of logical sense.

~Andrew

 

