[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 46/50] x86emul: support GFNI insns



>>> On 21.06.19 at 16:20, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 21/06/2019 15:00, Jan Beulich wrote:
>>> On 15/03/2019 11:06, Jan Beulich wrote:
>>>> @@ -138,6 +141,26 @@ static bool simd_check_avx512vbmi_vl(voi
>>>>      return cpu_has_avx512_vbmi && cpu_has_avx512vl;
>>>>  }
>>>>  
>>>> +static bool simd_check_sse2_gf(void)
>>>> +{
>>>> +    return cpu_has_gfni && cpu_has_sse2;
>>> This dependency doesn't match the manual.  The legacy encoding needs
>>> GFNI alone.
>>>
>>> gen-cpuid.py is trying to reduce the ability to create totally
>>> implausible configurations via levelling, but for software checks, we
>>> should follow the manual to the letter.
>> This is test harness code - I'd rather be a little more strict here than
>> having to needlessly spend time fixing an issue in there. Furthermore
>> this matches how gcc enforces dependencies.
>>>> +}
>>>> +
>>>> +static bool simd_check_avx2_gf(void)
>>>> +{
>>>> +    return cpu_has_gfni && cpu_has_avx2;
>>> Here, the dependency is only on AVX, which I think is probably trying to
>>> express a dependency on xcr0.ymm
>> Mostly as per above, except that here gcc (imo wrongly) enables just
>> AVX.
>>
>>>> +}
>>>> +
>>>> +static bool simd_check_avx512bw_gf(void)
>>>> +{
>>>> +    return cpu_has_gfni && cpu_has_avx512bw;
>>> I don't see any BW interaction anywhere (in the manual), despite the
>>> fact it operates on a datatype of int8.
>> But by operating on vectors of bytes, it requires 64 bits wide mask
>> registers, which is the connection to BW. Again gcc also does so.
> 
> To be honest, it doesn't matter what GCC does.

Coming back to this - it very much matters _here_ (i.e. in test harness
code): For one, the checks above need to be in line with the -m<isa>
options we pass to the compiler. I.e. if anything the question might
be on the Makefile additions why I enable SSE2, AVX2, and AVX512BW
respectively.

While I could simply claim that this is my choice as to producing
sensible test case binary blobs, there's a compiler aspect _there_:
gcc's intrinsics header enables SSE2, AVX, and AVX512F / AVX512BW
around the definitions on the inline wrappers around the builtins.
This in turn is because the availability of the respective builtins
depends on these ISAs to be enabled alongside GFNI itself. We
therefore may not use any ISA level _lower_ than what the
builtins require. As mentioned before, seeing them use SSE2 as
prereq for the legacy encoded insns, I question their use of AVX
instead of AVX2, and hence I'd prefer to stick with AVX2; if nothing
else then simply because of what I've said in the first half sentence
of this paragraph. (My guess is that it's the availability of
{,,V}MOVDQ{A,U}* that did determine their choice, rather than the
general availability of vector operations on the given vector and
element types and sizes. I'm sure this would lead to some rather
"funny" generated code when inputs come from other
transformations rather than straight from memory.)

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.