[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.18 v2] iommu/vt-d: fix SAGAW capability parsing


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 19 Oct 2023 11:46:14 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=QirGQsxmnxCJFOtgMr79yOeTAgQqECVNZ8Q3BTeM8Es=; b=mdNeuyN260dLh/oOtmz7lqCahiHFMsz2fk/5XY6opPcywj6DLPNVYR9WET5/tbQPKSSMCaBYKm0OAqp+Vj/UXaEGxHOEvVJ8wFf4H8W9nFMZ8nfVbkwyv4InKdEdVUdfDzDcvyxyMsThsMmIUICbeXkvkI5ehM6E4y4fwaEQm4+GtTQJcMqTIZvcSoQFucRkgs/nwYSjBZbyIRoWKrioYYTkVgkCskiZ9xIzCOB7Ilef7ABjFwbSagJ0XgUCSNz6JfRa2ETCpzoVgNRM+HWOaTDqoImOOZXwe/+5z2+7/mKmfjHk0tyddNDbhIjtE34zZgu2S0aijMV1trR8aqr8yA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=b+f0d43+7qcvV7Yobubm1In5JXNPXg4f4P8VfvB+6JjiQvz4mXNQotaH7w65/7ecCTtFMIOaqGOEPDhZVy9lIjmsy0kz7fYUKfYARgIiIyaWxA1puv10CYYIrJYrdYKqnM5n6P3xedClqD2qDrxFtAVzfE49+ykM4+U8vV+035htoGP2xvqHMByIx7fPPnzTgZ/hMCsUmFk96xya+MPzfrSYlqiPukEuwTJuL/DG55yn3K3RFbToEMc20Pn5LSNTjmYC8GDAq8k1klk9f1xFVYjjSflIYNhblEerbGqb+q160sKU4loC6rDC96qsB8FUFuunXfwoVkBuZehW/9qPdg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Henry Wang <Henry.Wang@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 19 Oct 2023 09:46:33 +0000
  • Ironport-data: A9a23:tnPhy6N0VeWTZv/vrR1hlsFynXyQoLVcMsEvi/4bfWQNrUoj0jFWy jNJCG2Hb/qPZmKgKNwib4W09UNT7JDUz9EyGwto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CQ6jefQAOOkVIYoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGjxSs/nrRC9H5qyo42pA5wxmPpingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0v9uMVthz 6U6EzAiRBGk3dzpyo+pc+Y506zPLOGzVG8ekldJ6GiDSNoDH9XESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PtxujaCpOBy+OGF3N79YNuFSN8Thk+Fj mnH4374ElcRM9n3JT+tqyjy2rCUzXKkMG4UPJ6H2K5WnlqL/zMCDx1OTVKLu9uEtnfrDrqzL GRRoELCt5Ma9kamU938VB2Qu2Ofs1gXXN84O8037hucjJXd5QmxD3IBCDVGbbQOt8IoRDpsy l6AmfvoAyBitPueTnf13qeZq3a+NDYYKUcGZDQYVk0V7t/7uoYxgxnTCNF5H8aIYsbdHDjxx 3WGqXY4jrBK18oTjfzlrBbAni6moYXPQkgt/ALLU2m57wR/Iom4e4iv7lud5vFFRGqEcmS8U LE/s5D2xIgz4VulzkRhnM1l8GmV2su4
  • Ironport-hdrordr: A9a23:mXiD/qFD95aHHpOypLqFrZHXdLJyesId70hD6qkRc20hTiX8ra vBoB1173/JYUkqKQ0dcLy7WZVoIkmshqKdn7NhX4tKNTOO0AGVxepZnOjfKlPbakjDHuU079 YeT0AXYuedMbAQ5/yU3OF2eexM/PC3tJmNwcPi5zNVSwduApsQnTuQyGygYzNLrM0tP+tIKH JYjPA31gZIAk5nCviTNz0+Ru3eoN+OvIv+CCR2fiIP2U21lDa177y/OASZ2xp2aUIz/Z4StV LdlhD/5OGFu/W2oyWssFP73tBtgd78zdkGItKKhtN9EESLti+YIL55XqGEvnQOgMzH0idTrP D85y04Oth16TfqcnqrrQDL0w3tuQxekEPK+BujmH7+ps68ez4gEcpGgutiA2Hk13Y=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Oct 19, 2023 at 10:49:29AM +0200, Jan Beulich wrote:
> On 19.10.2023 10:15, Roger Pau Monné wrote:
> > On Thu, Oct 19, 2023 at 09:41:41AM +0200, Jan Beulich wrote:
> >> On 18.10.2023 18:07, Roger Pau Monne wrote:
> >>> SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 
> >>> and 5
> >>> level page tables respectively.  According to the Intel VT-d 
> >>> specification, an
> >>> IOMMU can report multiple SAGAW bits being set.
> >>>
> >>> Commit 859d11b27912 claims to replace the open-coded 
> >>> find_first_set_bit(), but
> >>> it's actually replacing an open coded implementation to find the last set 
> >>> bit.
> >>> The change forces the used AGAW to the lowest supported by the IOMMU 
> >>> instead of
> >>> the highest one between 1 and 2.
> >>>
> >>> Restore the previous SAGAW parsing by using fls() instead of
> >>> find_first_set_bit(), in order to get the highest (supported) AGAW to be 
> >>> used.
> >>>
> >>> However there's a caveat related to the value the AW context entry field 
> >>> must
> >>> be set to when using passthrough mode:
> >>>
> >>> "When the Translation-type (TT) field indicates pass-through processing 
> >>> (10b),
> >>> this field must be programmed to indicate the largest AGAW value 
> >>> supported by
> >>> hardware." [0]
> >>>
> >>> Newer Intel IOMMU implementations support 5 level page tables for the 
> >>> IOMMU,
> >>> and signal such support in SAGAW bit 3.
> >>>
> >>> Enabling 5 level paging support (AGAW 3) at this point in the release is 
> >>> too
> >>> risky, so instead put a bodge to unconditionally disable passthough mode 
> >>> if
> >>> SAGAW has any bits greater than 2 set.  Ignore bit 0, it's reserved in the
> >>> specification but unlikely to have any meaning in the future.
> >>
> >> May be worth mentioning that in earlier versions this indicated 2-level
> >> paging support.
> > 
> > Oh, that's not even present in my copy of the spec from 2016.  I guess
> > it was removed very, very long time ago?
> 
> Indeed, as mentioned in the commit you're fixing. Version 1.3 of the
> spec still has it. Judging by the document numbers 2.2 may have been
> its direct successor (i.e. no further 1.x and no 2.0 or 2.1).
> 
> >>> --- a/xen/drivers/passthrough/vtd/iommu.c
> >>> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >>> @@ -1327,15 +1327,24 @@ int __init iommu_alloc(struct acpi_drhd_unit 
> >>> *drhd)
> >>>  
> >>>      /* Calculate number of pagetable levels: 3 or 4. */
> >>>      sagaw = cap_sagaw(iommu->cap);
> >>> -    if ( sagaw & 6 )
> >>> -        agaw = find_first_set_bit(sagaw & 6);
> >>> -    if ( !agaw )
> >>> +    agaw = fls(sagaw & 6) - 1;
> >>> +    if ( agaw == -1 )
> >>
> >> Would you mind making this "< 0" or even "<= 0"? The latter in particular
> >> would already cover the likely future change of dropping the masking by 6.
> > 
> > My plan wasn't to drop the masking, but use 0xe if we support AGAW 3.
> 
> But we will also need to deal with bit 4 (at which point applying a mask
> is going to be useless code). We can either guess that it's going to mean
> 6-level paging, or we need to treat it as having unknown meaning when set
> (implying that we'd then still need to either fail initialization or
> disable pass-through mode).

I wouldn't enable support for AGAW 4 unless we have a way to test it.
It's safer to just disable passthrough mode if SAGAW bit 4 is set.

> > I'm fine with using < or <= if you think it's more robust.
> 
> Good, will do so then.
> 
> >>>      {
> >>>          printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", 
> >>> sagaw);
> >>>          print_iommu_regs(drhd);
> >>>          rc = -ENODEV;
> >>>          goto free;
> >>>      }
> >>> +    if ( sagaw >> 3 )
> >>> +    {
> >>> +        printk_once(XENLOG_WARNING VTDPREFIX
> >>> +                    "IOMMU: unhandled bits set in sagaw (%#x)%s\n",
> >>
> >> I think IOMMU: is redundant with VTDPREFIX (or alternatively iommu->index
> >> would also want logging). Also note that VTDPREFIX (bogusly) has no
> >> trailing space. (I realize both apply to the other log message in context
> >> as well, but still. I'd be inclined to adjust that at the same time,
> >> including switching to %#x as you have it in the new log message.)
> > 
> > Oh, I didn't realize VTDPREFIX had no trailing space.
> > 
> > Since it's a printk_once(), not sure iommu->index is really useful
> > here, as we would report just one IOMMU has having an unhandled SAGAW.
> > IMO if we switch to printing iommu->index we must also use a plain
> > printk.  But I don't see a lot of benefit in printing this for likely
> > each IOMMU on the system, and hence I would rather use printk_once()
> > and not print the index.
> 
> Well, logging the index in printk_once() has the benefit of identifying
> the first IOMMU with the issue, which may help further analysis if not
> all of them have bits beyond 2 set. But I'm not going to insist on this
> aspect.
> 
> > Feel free to drop the IOMMU prefix, but I'm not sure what to do with
> > VTDPREFIX and the missing trialing space, as some users of VTDPREFIX
> > already account for such missing space.
> 
> I'd simply insert a leading space in the string literal.

Ack.

> >>> +                    sagaw,
> >>> +                    iommu_hwdom_passthrough ? " disabling passthrough" : 
> >>> "" );
> >>
> >> May want a leading comma (or some other separator) in the string.
> >>
> >>> +        if ( iommu_hwdom_passthrough )
> >>> +            iommu_hwdom_passthrough = false;
> >>
> >> No real need for if() here.
> > 
> > Not really, but also no need for a write to iommu_hwdom_passthrough
> > every time an IOMMU is initialized if the condition is removed.
> 
> This is init-time code, and hence the excess writes aren't going to be
> noticable.

I don't have a strong opinion, TBH I used to have it without the
conditional and added it later.  If you prefer to drop the conditional
I won't oppose.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.