[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.18] iommu/vt-d: use max supported AGAW


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 18 Oct 2023 09:54:15 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=uyg6YN3Zh1a9d4JC5xIfee4oWWmRWWvhCuc2nePrjQA=; b=XQMHxAK3oAkx0RuaureRHqwjV5E//RUPobR6n1B+5bQrC9eoIpAlITpQYabedgldjTdmHTZTjB00nSKUGHjg4bxMU9nkzOLDGe7U8vLOeS+0l4nsaIUyCJbIW/s8bnFR8xLT/Z3V6cbAyWFxesVFoNOlckfUkA7fLeQMx52SUSqd0kx/LMRufK/uwGVEe9vUSbPDJF32s560/tZBjU2bLUsXUhXSwfF06IPrI/3845sefsYDNPDMLj+G/bWux/glDW5qXr7Z9lzrlj3MGNvLWmaf9nirJWt2sainE1f/NqsOr91CRqny/4KKtHjJHy+yAXZSIYJmZurwxNFIHt6v5Q==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=bmPY7jfNsV6F4feHfrFKQJ8AcY592CshrT+3AD0Ngr9/Xzp6A6kTfSLVP2fZ9fBDqYY53H8hD7NR0k35RV0aeZU6xkVDQvEt2jZm8UBIkSWNnZ3WYhTuA2+7H2rTUWo9xmPu0x2Vx58rHgfwnamlwfZKqZtubjNVJ30GQPYw5jzjsOZ49A0XCih6ar7eD92N0xCwSqbGPXJQoqWjyrbFcSeKMI5QHWLh/Uq3vpNZzilfA34g8c97cgyXbnJofnC3k0H1GW3QBBuO8si9et14XADSVKjO9DguX5nhFvenNPa0RW4X89GF8kFh+0JJ0izBsIRBQLNeaWIVw4T+4ByjYA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Henry Wang <Henry.Wang@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 18 Oct 2023 07:54:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.10.2023 15:09, Roger Pau Monne wrote:
> SAGAW is a bitmap field, with bits 1 and 2 signaling support for AGAW 1 and
> AGAW 2 respectively.  According to the Intel VT-d specification, an IOMMU 
> might
> support multiple AGAW values.
> 
> The AGAW value for each device is set in the device context entry, however
> there's a caveat related to the value the field supports depending on the
> translation type:
> 
> "When the Translation-type (T) field indicates pass-through (010b) or
> guest-mode (100b or 101b), this field must be programmed to indicate the
> largest AGAW value supported by hardware."

Considering SAGAW=3 was reserved in earlier versions, and considering SAGAW=4
and higher continue to be reserved, how is one to write forward-compatible
code? (In retrospect I think this is what mislead me to wrongly use
find_first_set_bit().)

Furthermore, which version of the spec are you looking at? The newest public
one I found is 4.1 (-016), which only mentions pass-through, and only as a
2-bit quantity. (Doesn't matter much for the purposes of the actual code
change, but still.)

> Of the translation types listed above Xen only uses pass-through (010b), and
> hence we need to make sure the context entry AGAW field is set appropriately,
> or else the IOMMU will report invalid context entry errors.
> 
> To do so calculate the IOMMU supported page table levels based on the last bit
> set in the SAGAW field, instead of the first one.  This also allows making use
> of the widest address width supported by the IOMMU, in case multiple AGAWs are
> supported.

To truly achieve that (with the 5-level spec), ...

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1328,7 +1328,7 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>      /* Calculate number of pagetable levels: 3 or 4. */
>      sagaw = cap_sagaw(iommu->cap);
>      if ( sagaw & 6 )
> -        agaw = find_first_set_bit(sagaw & 6);
> +        agaw = fls(sagaw & 6) - 1;

... the mask here needs widening to 0xe. But see my forward-compatibility
remark above: It may need widening even further. Yet I'm not sure our code
is uniformly ready to handle levels > 4. As a result I think we need to
further alter the use of context_set_address_width(): We don't necessarily
want to use the maximum value with CONTEXT_TT_{DEV_IOTLB,MULTI_LEVEL}.
Specifically I don't think we want to use levels=5 (aw=3) there, until
such time that we support 5-level page tables (which as it looks right now
may well end up being never).

Furthermore just out of context we have

    iommu->nr_pt_levels = agaw_to_level(agaw);
    if ( min_pt_levels > iommu->nr_pt_levels )
        min_pt_levels = iommu->nr_pt_levels;

With fls() instead of find_first_set_bit() this won't be correct anymore.
Yet looking at the sole use (and depending on the resolution of the other
issue) it may be a mere matter of renaming the variable to properly
reflect its purpose.

Taking together perhaps:
- nr_pt_levels needs setting to the larger of 3 and 4, depending on what
  hardware supports, for use in non-pass-through entries,
- a new max_pt_levels field needs setting to what would result from your
  code change above, for use in pass-through entries,
- min_pt_levels could then remain as is,
- for the moment we ignore the forward-compatibility aspect, until the
  underlying principle has been clarified by Intel.

A possible further complication then is that we will end up switching
context entries between different AW values. That's not an issue when
we use CMPXCHG16B or transiently clear the present bit, but our best
effort fallback would likely be of security concern then.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.