
Re: [PATCH 1/3] amd-vi: use the same IOMMU page table levels for PV and HVM


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 20 Nov 2023 12:34:45 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 20 Nov 2023 11:34:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.11.2023 11:50, Roger Pau Monné wrote:
> On Mon, Nov 20, 2023 at 11:37:43AM +0100, Jan Beulich wrote:
>> On 20.11.2023 11:27, Roger Pau Monné wrote:
>>> On Mon, Nov 20, 2023 at 10:45:29AM +0100, Jan Beulich wrote:
>>>> On 17.11.2023 12:55, Andrew Cooper wrote:
>>>>> On 17/11/2023 9:47 am, Roger Pau Monne wrote:
>>>>>>      /*
>>>>>> -     * Choose the number of levels for the IOMMU page tables.
>>>>>> -     * - PV needs 3 or 4, depending on whether there is RAM (including hotplug
>>>>>> -     *   RAM) above the 512G boundary.
>>>>>> -     * - HVM could in principle use 3 or 4 depending on how much guest
>>>>>> -     *   physical address space we give it, but this isn't known yet so use 4
>>>>>> -     *   unilaterally.
>>>>>> -     * - Unity maps may require an even higher number.
>>>>>> +     * Choose the number of levels for the IOMMU page tables, taking into
>>>>>> +     * account unity maps.
>>>>>>       */
>>>>>> -    hd->arch.amd.paging_mode = max(amd_iommu_get_paging_mode(
>>>>>> -            is_hvm_domain(d)
>>>>>> -            ? 1UL << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT)
>>>>>> -            : get_upper_mfn_bound() + 1),
>>>>>> -        amd_iommu_min_paging_mode);
>>>>>> +    hd->arch.amd.paging_mode = max(pgmode, amd_iommu_min_paging_mode);
>>>>>
>>>>> I think these min/max variables can be dropped now we're not doing
>>>>> variable height IOMMU pagetables, which further simplifies this 
>>>>> expression.
>>>>
>>>> Did you take unity maps into account? At least $subject and comment look
>>>> to not be consistent in this regard: Either unity maps need considering
>>>> specially (and then we don't uniformly use the same depth), or they don't
>>>> need mentioning in the comment (anymore).
>>>
>>> Unity maps that require an address width > DEFAULT_DOMAIN_ADDRESS_WIDTH
>>> will currently only work on PV at best, as HVM p2m code is limited to
>>> 4 level page tables, so even if the IOMMU page tables support a
>>> greater address width the call to map such regions will trigger an
>>> error in the p2m code way before attempting to create any IOMMU
>>> mappings.
>>>
>>> We could do:
>>>
>>> hd->arch.amd.paging_mode =
>>>     is_hvm_domain(d) ? pgmode : max(pgmode, amd_iommu_min_paging_mode);
>>>
>>> Placing IVMD/RMRR regions at addresses that require the use of 5-level page
>>> tables would be a very short-sighted move by vendors IMO.
>>>
>>> And that would put us back in a situation where PV vs HVM can get different
>>> IOMMU page table levels, which is undesirable.  It might be better to
>>> just assume all domains use DEFAULT_DOMAIN_ADDRESS_WIDTH and hide
>>> devices that have IVMD/RMRR regions above that limit.
>>
>> That's a possible approach, yes. To be honest, I was actually hoping we'd
>> move in a different direction: Do away with the entirely arbitrary
>> DEFAULT_DOMAIN_ADDRESS_WIDTH, and use actual system properties instead.
> 
> Hm, yes, that might be a sensible approach, but right now I don't want
> to block this series on such a (likely big) piece of work.  I think we
> should aim for HVM and PV to have the same IOMMU page table levels,
> and that's currently limited by the p2m code only supporting 4 levels.

No, I certainly don't mean to introduce a dependency there. Yet what
you do here goes actively against that possible movement in the other
direction: What "actual system properties" are differs between PV and
HVM (host properties vs guest properties), and hence there would
continue to be a (possible) difference in depth between the two.
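
To illustrate that with a toy (deliberately not Xen code: the two
*_max_frame() helpers below are invented stand-ins for whatever would
supply the host resp. guest bound, and paging_mode_for() only roughly
approximates what amd_iommu_get_paging_mode() derives from a frame
count, 9 bits per level):

#include <stdio.h>

/* Invented stand-in for a host property, e.g. the highest RAM frame (1T here). */
static unsigned long long host_max_frame(void) { return (1ULL << 28) - 1; }

/* Invented stand-in for a guest property, e.g. the p2m size (64G here). */
static unsigned long long guest_max_frame(void) { return (1ULL << 24) - 1; }

/* Rough approximation of the real helper: 9 bits resolved per table level. */
static unsigned int paging_mode_for(unsigned long long frames)
{
    unsigned int levels = 1;

    while ( (frames - 1) >> (9 * levels) )
        levels++;

    return levels;
}

int main(void)
{
    /* A property-based choice naturally yields different depths here: */
    printf("PV : %u levels\n", paging_mode_for(host_max_frame() + 1));  /* 4 */
    printf("HVM: %u levels\n", paging_mode_for(guest_max_frame() + 1)); /* 3 */

    return 0;
}

I.e. a bound above 512G on the host combined with a modest guest size
would leave PV at 4 levels and HVM at 3, which is exactly the kind of
divergence being discussed.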

>> Whether having PV and HVM have uniform depth is indeed desirable is also
>> not entirely obvious to me. Having looked over patch 3 now, it also
>> hasn't become clear to me why the change here is actually a (necessary)
>> prereq.
> 
> Oh, it's a prereq because I've found AMD systems that have reserved
> regions > 512GB, but no RAM past that region.  arch_iommu_hwdom_init()
> would fail on those systems when patch 3/3 was applied, as then
> reserved regions past the last RAM address are also mapped in
> arch_iommu_hwdom_init().

Hmm, interesting. I can't bring together "would fail" and "are also
mapped" though, unless the latter was meant to say "are attempted to
also be mapped", in which case I could at least see room for failure.
Yet still this would then feel like an issue with the last patch alone,
which the change here is merely avoiding (without this being a strict
prereq). Instead I'd expect us to use 4 levels whenever there are any
kind of region (reserved or not) above 512G, without disallowing use
of 3 levels on other (smaller) systems.
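
As a rough sketch of that rule (not a patch, just the threshold
arithmetic: each level resolves 9 address bits on top of the 12-bit
page offset, so 3 levels reach 512G and 4 levels reach the 48-bit
DEFAULT_DOMAIN_ADDRESS_WIDTH):

/* Depth needed to cover max_addr: 3 below 512G, 4 from 512G up to 48 bits. */
static unsigned int levels_for_max_address(unsigned long long max_addr)
{
    unsigned int levels = 1;

    while ( max_addr >> (12 + 9 * levels) )
        levels++;

    return levels;
}

/*
 * levels_for_max_address(0x7fffffffffULL) == 3   (just below 512G)
 * levels_for_max_address(0x8000000000ULL) == 4   (first address at 512G)
 */

Feeding it the highest address of any region, RAM or reserved alike,
would give the behaviour described above.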

Jan



 

