[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 01/34] x86emul: support AVX512 opmask insns



>>> On 26.10.18 at 14:19, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 26/10/18 12:59, Jan Beulich wrote:
>>>>> On 26.10.18 at 13:29, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 26/10/18 10:03, Jan Beulich wrote:
>>>>>>> On 25.10.18 at 20:32, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> On 18/09/18 12:53, Jan Beulich wrote:
>>>>>> @@ -1187,6 +1188,11 @@ static int _get_fpu(
>>>>>>              return X86EMUL_UNHANDLEABLE;
>>>>>>          break;
>>>>>>  
>>>>>> +    case X86EMUL_FPU_opmask:
>>>>>> +        if ( !(xcr0 & X86_XCR0_SSE) || !(xcr0 & X86_XCR0_OPMASK) )
>>>>>> +            return X86EMUL_UNHANDLEABLE;
>>>>>> +        break;
>>>>> I see this follows the pattern from X86EMUL_FPU_ymm, but by the SSE
>>>>> check?  It is not relevant at this point - if xcr0.opmask is set, the
>>>>> opmask instructions should be usable.
>>>> I would agree with you from a functional POV, but please see the
>>>> last row of the table named "OS XSAVE Enabling Requirements of
>>>> Instruction Categories" in SDM Vol 2.
>>> Hmm.  That table is in contradiction to Vol 3 Figure 2-8 and associated
>>> description, which enumerates the #GP conditions of XSETBV.
>>>
>>> In particular, it suggests that OPMASK must be set in union with the ZMM
>>> bits, and cannot be set without YMM.
>> Hmm, true. However ...
>>
>>> IMO, we can require/expect get_xcr0() to pass an architecturally valid
>>> result.  I don't think we should be checking the transitive closure of
>>> feature dependences here.
>> ... I'd prefer if the code remained consistent, and hence the
>> analogy with X86EMUL_FPU_ymm should remain, the more that
>> this way it's also more consistent with Vol 2's #UD condition table.
>>
>> Furthermore I consider some of the restrictions in table 2-8 quite
>> a bit too restrictive, and hence I wouldn't be surprised if they
>> got relaxed at some point. It has puzzled me from the very
>> beginning that e.g. Hi16_ZMM needs to be set by 32-bit OSes
>> despite there being no means to access the upper 16 ZMM
>> registers.
> 
> It almost certainly simplifies the circuitry.  The high parts will be 0
> (due to the zero extending property of VEX/EVEX) and unallocated in the
> physical register file.

High parts? We're talking of entirely distinct registers (zmm16 -
zmm31) here. But I agree it's likely a simplification for them, just
perhaps elsewhere.

>>  By keeping the code as is we'd have no issue with
>> any such relaxation.
> 
> Right.  For now, lets leave this consistent with the surrounding code. 
> Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Thanks.

> I think we need need Intel to fix their docs, and based on that, I think
> we should do some uniform changes to handling to remove unnecessary
> dependency links.

I'll fire off a request.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.