
Re: [PATCH 2/5] x86/HVM: allocate emulation cache entries dynamically


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 9 Sep 2024 13:19:35 +0200
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 09 Sep 2024 11:19:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.09.2024 21:20, Andrew Cooper wrote:
> On 04/09/2024 2:29 pm, Jan Beulich wrote:
>> Both caches may need higher capacity, and the upper bound will need to
>> be determined dynamically based on CPUID policy (for AMX at least).
> 
> Is this to cope with TILE{LOAD,STORE}, or something else?

These two, yes.

> It's not exactly clear, even when looking at prior AMX series.

I've added mention of them.

>> While touching the check in hvmemul_phys_mmio_access() anyway, also
>> tighten it: To avoid overrunning the internal buffer we need to take the
>> offset into the buffer into account.
> 
> Does this really want to be mixed with a prep patch ?

Taking the offset into account becomes more important with subsequent
patches. If needed, I can certainly split this out into its own prereq
patch.
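
For illustration, the tightened check amounts to roughly the following
(names per the patch context; a sketch, not the exact hunk):

    /* Accesses must not overflow the cache's buffer. */
    if ( offset + size > cache->space )
    {
        ASSERT_UNREACHABLE();
        return X86EMUL_UNHANDLEABLE;
    }

i.e. what matters is the capacity remaining past the current offset
into the buffer, not the overall capacity alone.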

>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -26,6 +26,18 @@
>>  #include <asm/iocap.h>
>>  #include <asm/vm_event.h>
>>  
>> +/*
>> + * We may read or write up to m512 or up to a tile row as a number of
>> + * device-model transactions.
>> + */
>> +struct hvm_mmio_cache {
>> +    unsigned long gla;
>> +    unsigned int size;
>> +    unsigned int space:31;
>> +    unsigned int dir:1;
>> +    uint8_t buffer[] __aligned(sizeof(long));
> 
> I know this is a minor tangent, but you are turning a regular struct
> into a flexible one.
> 
> Could we introduce __counted_by() and start using it here?
> 
> At the toolchain level, it lets the compiler understand the real size of
> the object, so e.g. the sanitisers can spot out-of-bounds accesses
> through the flexible member.
> 
> But, even in the short term, having
> 
>     /* TODO */
>     # define __counted_by(member)
> 
> in compiler.h still leaves us with better code, because
> 
>     struct hvm_mmio_cache {
>         unsigned long gla;
>         unsigned int size;
>         unsigned int space:31;
>         unsigned int dir:1;
>         uint8_t buffer[] __aligned(sizeof(long)) __counted_by(size);
>     };
> 
> is explicitly clear in a case where the "space" field creates some
> ambiguity.

Which raises a question here: Is it really "size" that you mean, not
"space"? It's the latter that describes the capacity after all.

As to introducing __counted_by() (or counted_by()): While it seems
pretty orthogonal, I could of course add a prereq patch to introduce it.
Yet even if I had it expand to nothing for now, what do you expect it
to expand to going forward? I've just gone through the docs trying to
find something matching, but without success.
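
If what you're after is the compilers' counted_by attribute on flexible
array members (Clang has it in recent versions; GCC support is still
being worked on), then, as a guess only, the wrapper might end up as:

    #if defined(__has_attribute)
    # if __has_attribute(__counted_by__)
    #  define __counted_by(member) __attribute__((__counted_by__(member)))
    # endif
    #endif
    #ifndef __counted_by
    # define __counted_by(member) /* no toolchain support (yet) */
    #endif

But I'd like to be sure this is indeed what you have in mind.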

>> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
>>  int hvmemul_cache_init(struct vcpu *v)
>>  {
>>      /*
>> -     * No insn can access more than 16 independent linear addresses (AVX512F
>> -     * scatters/gathers being the worst). Each such linear range can span a
>> -     * page boundary, i.e. may require two page walks. Account for each insn
>> -     * byte individually, for simplicity.
>> +     * AVX512F scatter/gather insns can access up to 16 independent linear
>> +     * addresses, up to 8 bytes size. Each such linear range can span a page
>> +     * boundary, i.e. may require two page walks.
>> +     */
>> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
>> +    unsigned int i, max_bytes = 64;
>> +    struct hvmemul_cache *cache;
>> +
>> +    /*
>> +     * Account for each insn byte individually, both for simplicity and to
>> +     * leave some slack space.
>>       */
> 
> Hang on.  Do we seriously use a separate cache entry for each
> instruction byte ?

We don't _use_ one per insn byte, but we do set one up, and that was
already the case prior to this patch; the comment merely moves (and is
extended to mention the slack space aspect). Also note that this is
relevant only for the other (read) cache, not the (MMIO) cache this
series as a whole is about.
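
To put rough numbers on it (assuming 4-level paging, i.e.
CONFIG_PAGING_LEVELS == 4, and x86's 15-byte insn length limit; the
second line is a guess at how the per-byte accounting continues beyond
the quoted context):

    nents  = 16 * 2 * (4 + 1);   /* = 160, scatter/gather worst case */
    nents += 15 * (4 + 1);       /* = 75, one entry per insn byte */

i.e. a little over 200 read-cache entries per vCPU under these
assumptions.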

Jan



 

