
Re: [PATCH v6 01/11] lib/x86: Relax checks about policy compatibility


  • To: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 10 Oct 2024 09:37:15 +0200
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Thu, 10 Oct 2024 07:37:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.10.2024 17:57, Alejandro Vallejo wrote:
> On Wed Oct 9, 2024 at 10:40 AM BST, Jan Beulich wrote:
>> On 01.10.2024 14:37, Alejandro Vallejo wrote:
>>> --- a/xen/lib/x86/policy.c
>>> +++ b/xen/lib/x86/policy.c
>>> @@ -15,7 +15,16 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>>>  #define FAIL_MSR(m) \
>>>      do { e.msr = (m); goto out; } while ( 0 )
>>>  
>>> -    if ( guest->basic.max_leaf > host->basic.max_leaf )
>>> +    /*
>>> +     * Old AMD hardware doesn't expose topology information in leaf 0xb. We
>>> +     * want to emulate that leaf with credible information because it must be
>>> +     * present on systems in which we emulate the x2APIC.
>>> +     *
>>> +     * For that reason, allow the max basic guest leaf to be larger than the
>>> +     * hosts' up until 0xb.
>>> +     */
>>> +    if ( guest->basic.max_leaf > 0xb &&
>>> +         guest->basic.max_leaf > host->basic.max_leaf )
>>>          FAIL_CPUID(0, NA);
>>>  
>>>      if ( guest->feat.max_subleaf > host->feat.max_subleaf )
>>
>> I'm concerned by this in multiple ways:
>>
>> 1) It's pretty ad hoc, and hence doesn't make clear how to deal with similar
>> situations in the future.
> 
> I agree. I don't have a principled suggestion for how to deal with other cases
> where we might have to bump the max leaf. It may be safe (as it is here because
> everything below it is either used or unimplemented), but AFAIU some leaves
> might be problematic to expose, even as zeroes. I suspect that's the problem
> you hint at later on about AMX and AVX10?

Not exactly, but perhaps somewhat related (see below).

>> 2) Why would we permit going up to leaf 0xb when x2APIC is off in the
>> respective leaf?
> 
> I assume you mean when the x2APIC is not emulated? One reason is to avoid a
> migration barrier, as otherwise we can't migrate VMs created in "leaf
> 0xb"-capable hardware to non-"leaf 0xb"-capable even though the migration is
> perfectly safe.

Leaf 0xb ought to be synthesized anyway (to match the guest's topology);
hardware capabilities hence don't matter here.

> Also, it's benign and simplifies everything. Otherwise we have to find out
> during early creation not only whether the host has leaf 0xb, but also whether
> we're emulating an x2APIC or not.

The policy passed by the tool stack will tell you what the choice there was.

> Furthermore, not doing this would actively prevent emulating an x2APIC on AMD
> Lisbon-like silicon even though it's fine to do so.

I'm afraid I don't understand this. If the tool stack cleared the x2APIC bit,
x2APIC ought to not be emulated. If it sets it (as permitted by the max
policy), x2APIC would be emulated.
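
To illustrate the question in 2), here is a minimal sketch of a check that only
tolerates a guest max leaf above the host's when the guest policy has the
x2APIC bit set. This is not Xen code: struct sketch_policy and
sketch_max_leaf_ok() are hypothetical, heavily simplified stand-ins for
struct cpu_policy and the check in x86_cpu_policies_are_compatible().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for Xen's struct cpu_policy. */
struct sketch_policy {
    uint32_t max_leaf; /* CPUID leaf 0, EAX: highest basic leaf */
    bool x2apic;       /* CPUID leaf 1, ECX bit 21 */
};

/* Returns true when the guest's max basic leaf is acceptable on this host. */
bool sketch_max_leaf_ok(const struct sketch_policy *host,
                        const struct sketch_policy *guest)
{
    /* The usual rule: the guest may not exceed what the host offers. */
    if ( guest->max_leaf <= host->max_leaf )
        return true;

    /*
     * Assumed relaxation: exceed the host only up to leaf 0xb, and only
     * when the tool stack left the x2APIC bit set in the guest policy.
     */
    return guest->x2apic && guest->max_leaf <= 0xb;
}
```

With a host whose max basic leaf is below 0xb, a guest asking for 0xb would
pass only with x2apic set; the patch as posted would accept it either way.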

> Note that we have a broken
> invariant in existing code where the x2APIC is emulated and leaf 0xb is not
> exposed at all; not even to show the x2APIC IDs.

Well, fixing this is what this series is about, isn't it?

>> 3) We similarly force a higher extended leaf in order to accommodate the
>> LFENCE-is-dispatch-serializing bit. Yet there's no similar extra logic there
>> in the function here.
> 
> That's done on the host policy though, so there's no clash.

There's no clash, sure, but ...

> In calculate_host_policy()...
> 
> ```
>       /*
>        * For AMD/Hygon hardware before Zen3, we unilaterally modify LFENCE to be
>        * dispatch serialising for Spectre mitigations.  Extend max_extd_leaf
>        * beyond what hardware supports, to include the feature leaf containing
>        * this information.
>        */
>       if ( cpu_has_lfence_dispatch )
>           max_extd_leaf = max(max_extd_leaf, 0x80000021U);
> ```
> 
> One could imagine doing the same for leaf 0xb and dropping this patch, but
> then we'd have to synthesise something on that leaf for hardware that doesn't
> have it, which is a lot more annoying.

... we're doing things one way there and another way here. Which is generally
undesirable imo.
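
To make the asymmetry concrete, a minimal sketch of the two models side by
side. The function names are hypothetical and the policy structures are
reduced to bare integers; only the constants and the comparisons are taken
from the code quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model 1 (extended leaves today): raise the host's advertised maximum up
 * front, so that a plain guest <= host comparison keeps working later.
 */
uint32_t sketch_host_max_extd_leaf(uint32_t hw_max_extd_leaf,
                                   bool lfence_dispatch)
{
    if ( lfence_dispatch && hw_max_extd_leaf < 0x80000021U )
        return 0x80000021U;

    return hw_max_extd_leaf;
}

/*
 * Model 2 (this patch): leave the host policy untouched and special-case
 * the comparison instead. Returns true when the guest must be rejected.
 */
bool sketch_basic_leaf_rejected(uint32_t guest_max_leaf,
                                uint32_t host_max_leaf)
{
    return guest_max_leaf > 0xb && guest_max_leaf > host_max_leaf;
}
```

In model 1 the knowledge about the synthesized leaf lives where the host
policy is calculated; in model 2 it lives in the compatibility check.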

>> 4) While there the guest vs host check won't matter, the situation with AMX
>> and AVX10 leaves imo still wants considering here right away. IOW (taken together
>> with at least 3) above) I think we need to first settle on a model for
>> collectively all max (sub)leaf handling. That in particular needs to properly
>> spell out who's responsible for what (tool stack vs Xen).
> 
> I'm not sure I follow. What's the situation with AMX and AVX10 that you refer
> to?

See the prereq series to both, most recently posted at
https://lists.xen.org/archives/html/xen-devel/2024-08/msg00591.html

That's hacky; Andrew has indicated that he'd like to take care of this (mostly)
in the tool stack instead. Yet so far nothing has surfaced, hence I'm keeping
this dependency in place for both series.

Jan

> I'd assume that making ad-hoc decisions on this is pretty much unavoidable,
> but maybe the solution to the problem you mention would highlight a more
> general approach.
> 
> Cheers,
> Alejandro




 

