
Re: [Xen-devel] [PATCH v2 12/30] xen/x86: Generate deep dependencies of features

On 15/02/16 16:27, Jan Beulich wrote:
>>>> On 15.02.16 at 17:09, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>> +    deps = {
>>>>>> +        XSAVE:
>>>>>> +
>>>>>> +        AVX:
>>>>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>>>>> I continue to question whether namely XOP, but perhaps also the
>>>>> others here except maybe AVX2, really is depending on AVX, and
>>>>> not just on XSAVE.
>>>> I am sure we have had this argument before.
>>> Indeed, hence the "I continue to ...".
>>>> All VEX encoded SIMD instructions (including XOP which is listed in the
>>>> same category by AMD) are specified to act on 256bit AVX state, and
>>>> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
>>>> instructions encoding %xmm registers, which explicitly zero the upper
>>>> 128bits of the associated %ymm register.
>>>> This is very clearly a dependency on AVX, even if it isn't written in
>>>> one clear concise statement in the instruction manuals.
>>> The question is what AVX actually means: To me it's an instruction set
>>> extension, not one of machine state. The machine state extension to
>>> me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
>>> intentionally say "to me", because I can certainly see why this may be
>>> viewed differently.)
>> The AVX feature bit is also the indicator that the AVX bit may be set in
>> XCR0, which links it to machine state and not just instruction sets.
> No, it's not (and again - there's no bit named AVX in XCR0):

(and again - Intel disagree) The Intel manual uniformly refers to
XCR0.AVX (bit 2).  AMD uses XCR0.YMM.
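
For reference, here is a small illustrative lookup of the low XCR0 bits with both vendors' names for bit 2. The table and the `decode_xcr0` helper are hypothetical, not taken from Xen, and only bits 0 to 2 are covered:

```python
# XCR0 state-component bits (bit number -> name). Bit 2 is "AVX" in the
# Intel SDM and "YMM" in the AMD APM; both name the upper halves of %ymm.
XCR0_BITS = {
    0: "x87",
    1: "SSE",
    2: "AVX/YMM",
}


def decode_xcr0(value):
    """Return the names of the known state components set in an XCR0 value."""
    return [name for bit, name in sorted(XCR0_BITS.items()) if value & (1 << bit)]


print(decode_xcr0(0x7))  # ['x87', 'SSE', 'AVX/YMM']
print(decode_xcr0(0x3))  # ['x87', 'SSE']
```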

>  Which
> bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX,
> which is - surprise, surprise - the so called XSTATE leaf (i.e. related
> to XSAVE, and not to AVX).

In hardware, all these bits are almost certainly hardwired on or off. 
Part of the issue here is that with virtualisation, there are a whole
lot more combinations than exist on real hardware.

Whether right or wrong, the guest values of CPUID[0xd].EDX:EAX are now
generated from the guest featureset.  This is based on my assumption that
this is how real hardware actually works, and it prevents the two from
getting out of sync.
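
A minimal sketch of what "generated from the guest featureset" means, assuming the featureset is a set of feature-name strings. `xstate_leaf_from_featureset` is a hypothetical name and this is not the actual Xen code; only a few state components are covered:

```python
# Hypothetical sketch: derive CPUID[0xd] sub-leaf 0 EDX:EAX (the XCR0 bits
# a guest may set) purely from the guest featureset, so they cannot diverge.
X87, SSE, YMM = 1 << 0, 1 << 1, 1 << 2
BNDREGS, BNDCSR = 1 << 3, 1 << 4                      # MPX state
OPMASK, ZMM_HI256, HI16_ZMM = 1 << 5, 1 << 6, 1 << 7  # AVX-512 state


def xstate_leaf_from_featureset(features):
    """Return (eax, edx) for CPUID[0xd].0 given a set of feature names."""
    if "XSAVE" not in features:
        return 0, 0  # no XSAVE: nothing may be set in XCR0
    xcr0 = X87 | SSE  # assumed always available whenever XSAVE is offered
    if "AVX" in features:
        xcr0 |= YMM
    if "MPX" in features:
        xcr0 |= BNDREGS | BNDCSR
    if "AVX512F" in features:
        xcr0 |= OPMASK | ZMM_HI256 | HI16_ZMM
    return xcr0 & 0xFFFFFFFF, xcr0 >> 32


print(xstate_leaf_from_featureset({"XSAVE", "AVX"}))  # (7, 0)
print(xstate_leaf_from_featureset({"XSAVE"}))         # (3, 0)
```

With this approach, hiding AVX from a guest automatically removes the YMM bit from the XSTATE leaf as well.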

>>>  Note how you yourself have recourse to XCR0,
>>> which is very clearly tied to XSAVE and not AVX, above (and note also
>>> that there's nothing called AVX to be enabled in XCR0, it's YMM that
>>> you talk about).
>> The key point is this.  If I choose to enable XSAVE and disable AVX for
>> a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
>> therefore shouldn't see the features.
> Are you sure? Did you try?


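void test_main(void)
{
    printk("AVX Testing\n");

    /* Enable SSE and XSAVE support (0x40600 in the disassembly below). */
    write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT |
              X86_CR4_OSXSAVE);

    /* XCR0 = x87 | SSE | AVX: the FMA instruction executes normally. */
    asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    /* XCR0 = x87 | SSE, AVX state disabled: the FMA instruction #UDs. */
    asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    xtf_success();
}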


with disassembly:

00104000 <test_main>:
  104000:       48 83 ec 08             sub    $0x8,%rsp
  104004:       bf b0 4c 10 00          mov    $0x104cb0,%edi
  104009:       31 c0                   xor    %eax,%eax
  10400b:       e8 b0 c2 ff ff          callq  1002c0 <printk>
  104010:       0f 20 e0                mov    %cr4,%rax
  104013:       48 0d 00 06 04 00       or     $0x40600,%rax
  104019:       0f 22 e0                mov    %rax,%cr4
  10401c:       31 c9                   xor    %ecx,%ecx
  10401e:       31 d2                   xor    %edx,%edx
  104020:       b8 07 00 00 00          mov    $0x7,%eax
  104025:       0f 01 d1                xsetbv
  104028:       c4 e2 f1 98 d0          vfmadd132pd %xmm0,%xmm1,%xmm2
  10402d:       b0 03                   mov    $0x3,%al
  10402f:       0f 01 d1                xsetbv
  104032:       c4 e2 f1 98 d0          vfmadd132pd %xmm0,%xmm1,%xmm2
  104037:       48 83 c4 08             add    $0x8,%rsp
  10403b:       e9 60 d6 ff ff          jmpq   1016a0 <xtf_success>

causes a #UD exception on the second FMA instruction only:

(d3) [  357.071427] --- Xen Test Framework ---
(d3) [  357.094556] Environment: HVM 64bit (Long mode 4 levels)
(d3) [  357.094709] AVX Testing
(d3) [  357.094867] ******************************
(d3) [  357.095050] PANIC: Unhandled exception: vec 6 at
(d3) [  357.095160] ******************************
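
This matches the #UD conditions the SDM lists for VEX-encoded instructions: CR4.OSXSAVE must be set and XCR0[2:1] must be 11b.  A small Python model of just those two checks reproduces both outcomes of the test.  It also confirms that the `or $0x40600` immediate is OSFXSR | OSXMMEXCPT | OSXSAVE.  This is a sketch under an assumption: CPUID feature checks and every other fault condition are ignored.

```python
# Model only the XSAVE-related #UD conditions for VEX-encoded instructions;
# CPUID feature checks and other fault conditions are deliberately ignored.
CR4_OSFXSR = 1 << 9
CR4_OSXMMEXCPT = 1 << 10
CR4_OSXSAVE = 1 << 18

XCR0_SSE = 1 << 1
XCR0_YMM = 1 << 2


def vex_insn_raises_ud(cr4, xcr0):
    """True if a VEX-encoded instruction would #UD for these CR4/XCR0 values."""
    if not (cr4 & CR4_OSXSAVE):
        return True
    # Both SSE and YMM state must be enabled, i.e. XCR0[2:1] == 0b11.
    need = XCR0_SSE | XCR0_YMM
    return (xcr0 & need) != need


cr4 = CR4_OSFXSR | CR4_OSXMMEXCPT | CR4_OSXSAVE
print(hex(cr4))                      # 0x40600
print(vex_insn_raises_ud(cr4, 0x7))  # False: first vfmadd132pd executes
print(vex_insn_raises_ud(cr4, 0x3))  # True: second vfmadd132pd faults
```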

>  Those instructions may not be very
> useful without other AVX instructions, but I don't think there's
> any coupling. And if I, as an example, look at one of the
> 3-operand vfmadd instructions, I also don't see any #UD
> resulting from the AVX bit being clear (as opposed to various of
> the AVX-512 extensions, which clearly document that AVX512F
> needs to always be checked). It's only in the textual description
> of e.g. FMA or AVX2 detection where such a connection is being
> made.
> In any event, please don't misunderstand my bringing up of this
> as objection to the way you handle things. I merely wanted to
> point out again that this is not the only way the (often self-
> contradictory) SDM can be understood.

The fact that there is ambiguity means that we must be even more careful
when making changes like this.  After all, if there are multiple ways to
interpret the text, you can probably bet that different software takes
contrary interpretations.
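
For context, this patch flattens the `deps` table into deep (transitive) dependencies, so that disabling AVX for a domain also hides FMA, FMA4, F16C, AVX2 and XOP.  Below is a minimal Python sketch of that flattening.  It assumes, as in the quoted hunk, that each key maps to the features that depend on it.  The functions `deep_deps` and `clear_feature` are hypothetical and not taken from gen-cpuid.py.  The quoted hunk elides the full XSAVE entry, so only AVX is listed under it here for illustration:

```python
# Illustrative table: feature -> features that directly depend on it.
DEPS = {
    "XSAVE": ("AVX",),  # real entry elided in the quoted hunk
    "AVX": ("FMA", "FMA4", "F16C", "AVX2", "XOP"),
}


def deep_deps(deps):
    """Map each feature to every feature that depends on it, directly or not."""
    result = {}
    for feature in deps:
        seen = set()
        stack = list(deps[feature])
        while stack:  # depth-first walk; 'seen' guards against cycles
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            stack.extend(deps.get(dep, ()))
        result[feature] = seen
    return result


def clear_feature(featureset, feature, deep):
    """Return featureset with feature and all of its deep dependents removed."""
    return featureset - {feature} - deep.get(feature, set())


deep = deep_deps(DEPS)
print(sorted(deep["AVX"]))  # ['AVX2', 'F16C', 'FMA', 'FMA4', 'XOP']
print(sorted(clear_feature({"XSAVE", "AVX", "FMA", "SSE2"}, "AVX", deep)))
# ['SSE2', 'XSAVE']
```

Precomputing the closure once means that clearing a feature is a single set subtraction, rather than a walk of the table every time a featureset is derived.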


Xen-devel mailing list