
Re: [Xen-devel] [PATCH] x86/PV: hide features dependent on XSAVE when booted with "no-xsave"

On 30/11/15 16:00, Jan Beulich wrote:
>>>> On 30.11.15 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 30/11/15 15:22, Jan Beulich wrote:
>>>>>> On 30.11.15 at 14:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 30/11/15 11:30, Jan Beulich wrote:
>>>>> It's not well defined whether YMM register presence
>>>>> correlates to AVX, or is simply flagged by the respective XSTATE
>>>>> CPUID bit (or a mixture of both).
>>>> It is indeed not well defined, which is what makes this area of
>>>> functionality so hard to level safely.
>>>>> The minimal (and imo more natural) dependency is just the XSTATE bit.
>>>> But it is wrong.
>>>> Any VEX encoded SIMD operation unconditionally works on YMM state.  In
>>>> the case that XMM registers are encoded with a VEX prefix, the upper 128
>>>> bits of the YMM register are zeroed (SDM Vol 2, 2.3.10).  This is
>>>> contrary to legacy SSE instructions which preserve the upper 128 bits.
>>>> Therefore, FMA, FMA4 and XOP do have a strict dependency on AVX.
>>> No, if you really want to express it that way, you'll need feature
>>> flags derived from the XSTATE bits.
>> What? That is absurd.
> Sorry, but no, this is not absurd, this is what you can derive from the
> SDM without much guessing. There's nowhere the SDM makes any
> connection between FMA and AVX.

Intel Vol 1 14.5.3 "Detection of FMA" states:

Hardware support for FMA is indicated by CPUID.1:ECX.FMA[bit 12]=1.
Application Software must identify that hardware supports AVX, after
that it must also detect support for FMA by CPUID.1:ECX.FMA[bit 12].
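
As a rough sketch of that detection order (the helper names are
hypothetical, and the raw register values are passed in rather than read
with CPUID/XGETBV so that the logic stands alone; this is an illustration,
not code from Xen or from the SDM):

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX feature bits. */
#define CPUID1_ECX_FMA      (1u << 12)
#define CPUID1_ECX_OSXSAVE  (1u << 27)
#define CPUID1_ECX_AVX      (1u << 28)

/* XCR0 state-component bits: bit 1 = SSE (XMM), bit 2 = AVX (upper YMM). */
#define XCR0_SSE  (1ull << 1)
#define XCR0_AVX  (1ull << 2)

/*
 * Hypothetical helper.  XGETBV is only usable when OSXSAVE is set, so a
 * caller would pass xcr0 = 0 when it could not read XCR0.
 */
static bool avx_usable(uint32_t leaf1_ecx, uint64_t xcr0)
{
    if (!(leaf1_ecx & CPUID1_ECX_OSXSAVE))
        return false;
    if ((xcr0 & (XCR0_SSE | XCR0_AVX)) != (XCR0_SSE | XCR0_AVX))
        return false;
    return (leaf1_ecx & CPUID1_ECX_AVX) != 0;
}

/* FMA is only reported usable once AVX has been established first. */
static bool fma_usable(uint32_t leaf1_ecx, uint64_t xcr0)
{
    return avx_usable(leaf1_ecx, xcr0) && (leaf1_ecx & CPUID1_ECX_FMA);
}
```

With the FMA bit set but the AVX bit clear, fma_usable() returns false,
which is the ordering the quoted paragraph asks application software to
follow.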

> The only connections it makes are OSXSAVE and XCR0[2:1], neither of which is
> formally tied to AVX.

Actually, on further reading,

Intel SDM Vol 3, 2.6, Figure 2-8 states:

XCR0.AVX (bit 2): If 1, AVX instructions can be executed and the XSAVE
feature set can be used to manage the upper halves of the YMM registers
(YMM0-YMM15 in 64-bit mode; otherwise

This means that bit 2 has a dual meaning, and is not just YMM state.  This
does IMO provide a formal tie between AVX and XCR0[2].

I admit that the AMD manuals are far less prescriptive than the Intel ones.
However, AMD Vol 3 1.9 "Encoding using the VEX and XOP Prefixes" draws
several conclusions, including:

VEX opcode maps 1–3 are also used to encode the FMA4 and FMA instructions

while the FMA/FMA4 instruction description states:

The destination is either an XMM register or a YMM register, as
determined by VEX.L. When the destination is an XMM register (L = 0),
bits [255:128] of the corresponding YMM register are
and also states that a #UD will occur if XCR0[2:1] != '11b', which is
sufficient indication of FMA/FMA4 having a direct link to AVX.
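
To make the distinction concrete, here is a toy model of the behaviour
described in this thread: a legacy SSE write preserves the upper 128 bits,
a VEX-encoded 128-bit write zeroes them, and a VEX-encoded SIMD instruction
raises #UD unless XCR0[2:1] == '11b'.  The struct and function names are
made up for illustration and model the rules only, not real hardware
state:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one YMM register: lo is the XMM half, hi is bits [255:128]. */
struct ymm_model {
    uint8_t lo[16];
    uint8_t hi[16];
};

/* Legacy SSE write: only the low 128 bits change. */
static void legacy_sse_write(struct ymm_model *r, const uint8_t val[16])
{
    memcpy(r->lo, val, 16);
}

/* VEX-encoded 128-bit write (VEX.L = 0): low half written, upper half zeroed. */
static void vex128_write(struct ymm_model *r, const uint8_t val[16])
{
    memcpy(r->lo, val, 16);
    memset(r->hi, 0, 16);
}

/* #UD condition as quoted from the instruction descriptions. */
static bool vex_simd_raises_ud(uint64_t xcr0)
{
    return ((xcr0 >> 1) & 0x3) != 0x3;
}
```

Because vex128_write() always touches the hi half, even an XMM-only FMA or
FMA4 operation modifies YMM state, which is why the instruction cannot be
separated from the AVX state component.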

As for XOP, AMD Vol 4, "XMM Register Destinations" again states that
either the whole YMM register is specified, or the upper 128 bits are
cleared if an XMM register is encoded, and each instruction description
specifies a #UD if XCR0[2:1] != '11b'.  This logically follows from the
history, where XOP ended up being all the SSE5 instructions which didn't
overlap with AVX.
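
For illustration only, the dependency I am arguing for could be expressed
along these lines when levelling a guest's view of CPUID.  This is a
sketch, not the patch under discussion; struct cpuid_view and
hide_xsave_dependents() are hypothetical names, while the bit positions
are the documented ones for CPUID.1:ECX and CPUID.0x80000001:ECX:

```c
#include <stdint.h>

/* CPUID.1:ECX */
#define L1C_FMA      (1u << 12)
#define L1C_XSAVE    (1u << 26)
#define L1C_OSXSAVE  (1u << 27)
#define L1C_AVX      (1u << 28)

/* CPUID.0x80000001:ECX (AMD) */
#define E1C_XOP      (1u << 11)
#define E1C_FMA4     (1u << 16)

/* Hypothetical container for the two feature words a guest would see. */
struct cpuid_view {
    uint32_t leaf1_ecx;
    uint32_t ext1_ecx;
};

static void hide_xsave_dependents(struct cpuid_view *v, int xsave_available)
{
    /* Without XSAVE there is no way to manage YMM state, so AVX goes too. */
    if (!xsave_available)
        v->leaf1_ecx &= ~(L1C_XSAVE | L1C_OSXSAVE | L1C_AVX);

    /* FMA, FMA4 and XOP are treated as strictly dependent on AVX. */
    if (!(v->leaf1_ecx & L1C_AVX)) {
        v->leaf1_ecx &= ~L1C_FMA;
        v->ext1_ecx &= ~(E1C_XOP | E1C_FMA4);
    }
}
```

Keying the second step on the AVX bit rather than on XSAVE directly means
the same masking applies if AVX is hidden for any other reason.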

