Re: [PATCH 2/5] x86/HVM: allocate emulation cache entries dynamically
On 04/09/2024 2:29 pm, Jan Beulich wrote:
> Both caches may need higher capacity, and the upper bound will need to
> be determined dynamically based on CPUID policy (for AMX at least).

Is this to cope with TILE{LOAD,STORE}, or something else?  It's not
exactly clear, even when looking at the prior AMX series.

> While touching the check in hvmemul_phys_mmio_access() anyway, also
> tighten it: To avoid overrunning the internal buffer we need to take the
> offset into the buffer into account.

Does this really want to be mixed with a prep patch?

> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> This is a patch taken from the AMX series, which was part of the v3
> submission. All I did is strip out the actual AMX bits (from
> hvmemul_cache_init()), plus of course change the description. As a
> result some local variables there may look unnecessary, but this way
> it's going to be less churn when the AMX bits are added. The next patch
> pretty strongly depends on the changed approach (contextually, not so
> much functionally), and I'd really like to avoid rebasing that one ahead
> of this one, and then this one on top of that.

Fine by me.

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -26,6 +26,18 @@
>  #include <asm/iocap.h>
>  #include <asm/vm_event.h>
>
> +/*
> + * We may read or write up to m512 or up to a tile row as a number of
> + * device-model transactions.
> + */
> +struct hvm_mmio_cache {
> +    unsigned long gla;
> +    unsigned int size;
> +    unsigned int space:31;
> +    unsigned int dir:1;
> +    uint8_t buffer[] __aligned(sizeof(long));

I know this is a minor tangent, but you are turning a regular struct
into a flexible one.  Could we introduce __counted_by() and start using
it here?

At the toolchain level, it lets the compiler understand the real size of
the object, so e.g. the sanitisers can spot out-of-bounds accesses
through the flexible member.

But, even in the short term, having

    /* TODO */
    # define __counted_by(member)

in compiler.h still leaves us with better code, because

    struct hvm_mmio_cache {
        unsigned long gla;
        unsigned int size;
        unsigned int space:31;
        unsigned int dir:1;
        uint8_t buffer[] __aligned(sizeof(long)) __counted_by(size);
    };

is explicitly clear in a case where the "space" field creates some
ambiguity.  (A standalone sketch of this fallback idea follows at the
end of the mail.)

> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
>  int hvmemul_cache_init(struct vcpu *v)
>  {
>      /*
> -     * No insn can access more than 16 independent linear addresses (AVX512F
> -     * scatters/gathers being the worst). Each such linear range can span a
> -     * page boundary, i.e. may require two page walks. Account for each insn
> -     * byte individually, for simplicity.
> +     * AVX512F scatter/gather insns can access up to 16 independent linear
> +     * addresses, up to 8 bytes size. Each such linear range can span a page
> +     * boundary, i.e. may require two page walks.
> +     */
> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
> +    unsigned int i, max_bytes = 64;
> +    struct hvmemul_cache *cache;
> +
> +    /*
> +     * Account for each insn byte individually, both for simplicity and to
> +     * leave some slack space.
> +     */

Hang on.  Do we seriously use a separate cache entry for each
instruction byte?

~Andrew
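
For illustration, a minimal self-contained sketch of the fallback scheme
discussed above.  The struct and function names (mmio_cache, cache_alloc)
are made up for the example rather than taken from Xen, and the toolchain
detection reflects my understanding that recent Clang (18+) and GCC (15+)
accept the counted_by attribute, with the no-op definition covering
everything older:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Detect counted_by support; fall back to the no-op stub suggested above. */
    #if defined(__has_attribute)
    # if __has_attribute(counted_by)
    #  define __counted_by(member) __attribute__((counted_by(member)))
    # endif
    #endif
    #ifndef __counted_by
    # define __counted_by(member) /* TODO: no-op until toolchains catch up */
    #endif

    /* Illustrative stand-in for hvm_mmio_cache; not the Xen structure. */
    struct mmio_cache {
        unsigned int size;                   /* bytes allocated for buffer[] */
        uint8_t buffer[] __counted_by(size); /* bound tied to "size", not "space" */
    };

    static struct mmio_cache *cache_alloc(unsigned int bytes)
    {
        struct mmio_cache *c = malloc(sizeof(*c) + bytes);

        if ( !c )
            return NULL;

        /* Set the count first: accesses to buffer[] are checked against it. */
        c->size = bytes;
        memset(c->buffer, 0, c->size);

        return c;
    }

    int main(void)
    {
        struct mmio_cache *c = cache_alloc(64);

        if ( c )
        {
            c->buffer[0] = 0xff;   /* in bounds: 0 < c->size */
            /* c->buffer[64] = 0;     would be flagged under -fsanitize=bounds
             * by a compiler that understands counted_by */
            free(c);
        }

        return 0;
    }

With the no-op fallback the example still compiles everywhere; the
attribute only adds checking where the toolchain understands it, which is
what makes the compiler.h stub a reasonable short-term option.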