Xen project Mailing List

Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Wed, 20 Oct 2021 08:58:34 +0200

Cc: Paul Durrant <paul@xxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 20 Oct 2021 06:58:58 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.10.2021 01:34, Andrew Cooper wrote:
> On 22/09/2021 15:36, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In
>> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
>> we have other uses of this originally ACPI-private function elsewhere in
>> the tree.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> I'm afraid this breaks booting on Skylake Server. Yes, really - I
> didn't believe the bisection at first either.
>
> From a bit of debugging, I've found:
>
> (XEN) *** acpi_dmar_init() => -19
> (XEN) *** amd_iommu_get_supported_ivhd_type() => -19
>
> So VT-d is disabled in firmware. Oops, but something we should cope with.

I wanted to say that I definitely did test this (for a long, long time)
on Intel systems, but clearly not on one like this. I'm sure though that
I did test on IOMMU-less Intel systems, so I'm still a bit puzzled.

> Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
> exit with -ENOENT too.
>
> It turns out ...
>
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>>
>>  int __init acpi_ivrs_init(void)
>>  {
>> +    int rc;
>> +
>>      if ( !iommu_enable && !iommu_intremap )
>>          return 0;
>>
>> +    rc = amd_iommu_get_supported_ivhd_type();
>> +    if ( rc < 0 )
>> +        return rc;
>> +    BUG_ON(!rc);
>> +    ivhd_type = rc;
>> +
>>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>>      {
>>          iommu_intremap = iommu_intremap_off;
>>
>
> ... we're relying on this path (now skipped) to set iommu_intremap away
> from iommu_intremap_full in the "no IOMMU anywhere to be found" case.
>
> This explains why I occasionally during failure get spew about:
>
> (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
> [ 17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
> [ 17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI
> host supports is 64.
> [ 17.125198] usb usb1-port2: couldn't allocate usb_device
> [ 248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.
>
> and eventually (gone 400s) get dumped in a dracut shell.
>
> Booting with an explicit iommu=no-intremap, which clobbers
> iommu_intremap during cmdline parsing, recovers the system.
>
> This variable controls a whole lot of magic with interrupt handling. It
> should default to 0, not 2, and only become nonzero when an IOMMU is
> properly established. It also shouldn't be serving double duty as "what
> the user wants" ahead of determining the system capabilities.

This would probably be too large a change at this point in time; I'll
see whether I can find something less intrusive. Unless of course there's
a patch already on xen-devel, which I didn't get to read yet.

> And not to open another can of worms, but our entire way of working
> explodes if there are devices on the system not covered by an IOMMU.

I wouldn't be surprised, but is this something we have to expect on
non-broken systems? (I do know of broken systems giving the appearance
of uncovered devices by lacking suitable include-all DRHD entries.)

Jan
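
For readers following the thread, the failure mode described above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions, not Xen code: the helpers get_ivhd_type() and detect_acpi(), the enum, and the -ENODEV return values are hypothetical stand-ins, and the iommu_enable check and iommu_found() are left out for brevity. The only details taken from the thread are that "off" is 0, the default "full" is 2, and that the new early return sits ahead of the path that resets the variable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values follow the thread: "off" is 0 and the default "full" is 2. */
enum intremap_mode { intremap_off = 0, intremap_full = 2 };

/*
 * Hypothetical stand-in for amd_iommu_get_supported_ivhd_type():
 * a negative errno when no IVRS table exists, otherwise a nonzero type
 * (0x11 here is just an arbitrary nonzero placeholder).
 */
static int get_ivhd_type(bool have_ivrs)
{
    return have_ivrs ? 0x11 : -ENODEV;
}

/* Hypothetical stand-in for amd_iommu_detect_acpi(): nonzero on failure. */
static int detect_acpi(bool have_ivrs)
{
    return have_ivrs ? 0 : -ENODEV;
}

/* Shape of the init path before the patch: failure always resets the mode. */
static int ivrs_init_before(bool have_ivrs, enum intremap_mode *mode)
{
    if (detect_acpi(have_ivrs) != 0) {
        *mode = intremap_off;
        return -ENODEV;
    }
    return 0;
}

/* Shape after the patch: the early return skips the reset below. */
static int ivrs_init_after(bool have_ivrs, enum intremap_mode *mode)
{
    int rc = get_ivhd_type(have_ivrs);

    if (rc < 0)
        return rc;        /* *mode is left at whatever it was, e.g. "full" */
    assert(rc != 0);      /* plays the role of BUG_ON(!rc) */

    if (detect_acpi(have_ivrs) != 0) {
        *mode = intremap_off;
        return -ENODEV;
    }
    return 0;
}
```

Calling both variants with have_ivrs set to false and a mode initialised to intremap_full shows the difference: the "before" shape leaves the mode at intremap_off, while the "after" shape returns early and leaves it at intremap_full, which matches the behaviour Andrew describes.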
