
Re: [Xen-devel] [PATCH v3 02/18] x86emul: support most memory accessing MMX/SSE{, 2, 3} insns



>>> On 20.02.17 at 19:40, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 20/02/17 14:52, Jan Beulich wrote:
>>>>> On 20.02.17 at 14:45, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 15/02/17 11:07, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>>>> @@ -45,6 +45,8 @@
>>>>  #define ModRM       (1<<6)
>>>>  /* Destination is only written; never read. */
>>>>  #define Mov         (1<<7)
>>>> +/* VEX/EVEX (SIMD only): 2nd source operand unused (must be all ones) */
>>>> +#define TwoOp       Mov
>>> Is this safe?  It looks overloaded to me.  The Mov behaviour is still
>>> applicable even with TwoOp VEX/EVEX encodings.
>> It is safe. Mov only really matters for instructions writing to memory,
>> and there's no read-modify-write instruction in the entire SIMD set
>> afaict.
> 
> On the meaning of Mov alone, can we please state the intended meaning
> more clearly, and pro-actively remove misuses?

I'm having difficulty with both parts - could you give an example for
each?

> We currently have 21 masks; you are adding a 22nd here, and a 23rd in a
> later patch.  For 8 bits worth of space, this shows how overloaded the
> encoding is.

There aren't that many _masks_; quite a few of the values you refer to
are actually values to compare against after masking. Overloading prior
to this patch is - afaict - limited to SrcEax/DstEax (aliasing *Reg)
and SrcImplicit/DstImplicit (aliasing *None).
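
To illustrate the distinction (a rough sketch only - the exact bit
positions are quoted from memory here, so treat them as approximate):
DstReg, DstMem, SrcReg etc. are field values, to be extracted via
DstMask/SrcMask before comparing, rather than independent flag bits:

    /* Rough sketch, not verbatim from the file: */
    #define DstMask     (3<<1)   /* 2-bit destination-type field */
    #define DstReg      (2<<1)   /*   one of its possible values */
    #define DstMem      (3<<1)   /*   another one                */
    #define SrcMask     (7<<3)   /* 3-bit source-type field      */
    #define SrcReg      (1<<3)
    #define SrcMem      (2<<3)

    static int dst_is_memory(unsigned int d)
    {
        /* Consumers mask first and only then compare: */
        return (d & DstMask) == DstMem;
    }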

> As for TwoOp, it still collides for DstMem instructions.

Are there enough DstMem instructions with more than 2 operands for it
to become unwieldy to special-case them right after reading the table?
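
(For concreteness, the kind of special treatment I have in mind is
merely something along these lines, with three_operand_dstmem() of
course being a made-up placeholder:

    /* Hypothetical fixup after reading the table entry into d: */
    if ( (d & DstMask) == DstMem && three_operand_dstmem(b) )
        d &= ~TwoOp;    /* note: this also clears the aliased Mov bit */

i.e. it only becomes a problem if there are many such opcodes.)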

>>>> @@ -196,22 +234,41 @@ static const struct {
>>>>      [0x0d] = { ImplicitOps|ModRM },
>>>>      [0x0e] = { ImplicitOps },
>>>>      [0x0f] = { ModRM|SrcImmByte },
>>>> -    [0x10 ... 0x1f] = { ImplicitOps|ModRM },
>>>> +    [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp },
>>>> +    [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>>>> +    [0x12 ... 0x13] = { ImplicitOps|ModRM },
>>>> +    [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp },
>>>> +    [0x16 ... 0x1f] = { ImplicitOps|ModRM },
>>>>      [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM },
>>>>      [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM },
>>>> -    [0x28 ... 0x2f] = { ImplicitOps|ModRM },
>>>> +    [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp },
>>>> +    [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp },
>>>> +    [0x2a] = { ImplicitOps|ModRM },
>>>> +    [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>>>> +    [0x2c ... 0x2f] = { ImplicitOps|ModRM },
>>>>      [0x30 ... 0x35] = { ImplicitOps },
>>>>      [0x37] = { ImplicitOps },
>>>>      [0x38] = { DstReg|SrcMem|ModRM },
>>>>      [0x3a] = { DstReg|SrcImmByte|ModRM },
>>>>      [0x40 ... 0x4f] = { DstReg|SrcMem|ModRM|Mov },
>>>> -    [0x50 ... 0x6e] = { ModRM },
>>>> -    [0x6f] = { ImplicitOps|ModRM },
>>>> -    [0x70 ... 0x73] = { SrcImmByte|ModRM },
>>>> -    [0x74 ... 0x76] = { ModRM },
>>>> -    [0x77] = { ImplicitOps },
>>>> +    [0x50] = { ModRM },
>>>> +    [0x51] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_any_fp },
>>>> +    [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp },
>>> RCPPS/RCPSS all have 3 operands.  Why is TwoOp used here?
>> Not exactly: The packed ones have two operands, while the
>> scalar ones have three. See the adjustment to the end of
>> x86_decode_twobyte() for how this is being dealt with.
> 
> In which case this shouldn't really be TwoOp here, and TwoOp should be
> inserted in the later operand adjustment logic when we distinguish the
> F3 prefix.

I disagree, as I think the table would best represent the base
instruction form when alternative forms vary. The base form is the one
without any (implied) prefix.

> In fact, only VRCPSS is a 3-operand instruction.  The legacy encoding is
> indeed just two operands.

Hence wanting it to be marked 2-op here.
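
To make the intended split concrete (purely illustrative, not the
patch's actual adjustment code): with the table entry describing the
base, two-operand form, the prefix-dependent tweak would then be

    case 0x51 ... 0x53:
        /* The scalar (F3/F2-prefixed) forms gain a register source in
         * their VEX encodings, so drop the two-operand marking there. */
        if ( vex.pfx & VEX_PREFIX_SCALAR_MASK )
            d &= ~TwoOp;
        break;

rather than the table entry carrying the scalar form.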

>>>> +    case simd_single_fp:
>>>> +        if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>>> This logic would be far easier to follow by using vex.pfx == vex_66 ||
>>> vex.pfx == vex_f2.
>> That would be two comparisons, and I think the constant's name
>> is sufficiently descriptive to understand what's going on.
> 
> As someone who isn't the author of the code, I am still not clear on
> what the constant actually means.  I understand what it mechanically
> equates to, but it appears to be context dependent as to whether the
> terms double and/or scalar are relevant.

I don't understand. The generic meaning of no prefix is vector single,
that of prefix 66 is vector double, and so on. Going over the full file
after the entire series is applied, the only edge cases of
VEX_PREFIX_DOUBLE_MASK use are in the handling of
- {,v}pmovmskb
- {,v}cvtdq2pd
and there are none at all for VEX_PREFIX_SCALAR_MASK. I could be
talked into adjusting those edge cases, but I don't see any benefit in
adjusting all the other uses, where the generic meaning applies.
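
Spelling the encoding out (restating the definitions for clarity, in
case I'm misquoting the exact source):

    enum vex_pfx {
        vex_none, /* no prefix - packed single */
        vex_66,   /* 66        - packed double */
        vex_f3,   /* F3        - scalar single */
        vex_f2    /* F2        - scalar double */
    };
    #define VEX_PREFIX_DOUBLE_MASK 0x1 /* matches vex_66 and vex_f2 */
    #define VEX_PREFIX_SCALAR_MASK 0x2 /* matches vex_f3 and vex_f2 */

i.e. each mask tests a single bit of the 2-bit prefix field.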

>>>> +        {
>>>> +            op_bytes = 0;
>>>> +            break;
>>>> +    case simd_packed_fp:
>>>> +            if ( vex.pfx & VEX_PREFIX_SCALAR_MASK )
>>> Similarly here, vex_none || vex_f3
>> If at all, vex_f3 || vex_f2, but see above (and you having got it
>> wrong is a good indication to me that using the constants is
>> better).
>>
>>> Having said that, taking VSHUFPS (0xc6) as example of simd_packed_fp,
>>> this instruction is defined for vex_none and vex_66, both of which have
>>> op_bytes of 16 when not vex encoded.
>> Which is to tell me what? This matches the common pattern
>> (vex_none and vex_66 being packed - i.e.full width - operations,
>> while vex_f3 and vex_f2 are scalar ones).
> 
> The setting of op_bytes to 0 looks wrong.

Why? The instruction is marked simd_packed_fp, i.e. indicating that
scalar variants are invalid. Hence op_bytes gets set to 0, so that the
common stub preparation/invocation code after the big switch()
statement raises #UD unless op_bytes got altered by then.
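
In simplified outline (not the literal code), that tail does

    if ( state->simd_size )
    {
        /* No case above accepted this encoding: op_bytes still zero. */
        generate_exception_if(!op_bytes, EXC_UD);
        /* ... prepare and invoke the stub, then commit mmval ... */
    }

with op_bytes == 0 being the "this form is invalid" indicator there.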

>>>> +    CASE_SIMD_SCALAR_FP(, 0x0f, 0x2b):     /* movnts{s,d} xmm,mem */
>>>> +        host_and_vcpu_must_have(sse4a);
>>>> +        /* fall through */
>>>> +    CASE_SIMD_PACKED_FP(, 0x0f, 0x2b):     /* movntp{s,d} xmm,m128 */
>>>> +    CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2b): /* vmovntp{s,d} {x,y}mm,mem */
>>>> +        generate_exception_if(ea.type != OP_MEM, EXC_UD);
>>>> +        sfence = true;
>>> Why do we need to emit an sfence at this point?  The software hitting
>>> this emulation is the entity which should be making sfence decisions.
>> This is to be on the safe side: The instruction in the stub doesn't
>> write to the intended destination, but to mmval. We then copy
>> from there to the final destination. While the CPU _should_
>> consult its WC buffer for reads, I'd rather not rely on the absence
>> of errata here.
> 
> There is only one erratum (Pentium-M, erratum Y9) concerning cache
> snooping, and the reason I didn't port it to Xen at the same time as the
> SelfSnoop feature is because Pentium-M are 32bit processors only.
> 
> I know emulation isn't a fastpath, but fences have substantial
> perturbance to the current core, including associated hyperthreads.
> 
> If there were an unknown erratum in this area, the guest OS would still
> need to issue fence instructions itself, which are far less likely to
> trap for emulation.

Is there an explicit statement anywhere in the manuals telling us that
CPUs are required to consult the WC buffers on memory reads of data
written by earlier writes? The entire section "Buffering of Write
Combining Memory Locations" in the SDM doesn't contain a single hint in
that direction. I can't even find any reference to a "WC buffer" at all
in volume 1, which is what is supposed to be sufficient for application
programmers (to whom the behavior here is of concern).

Jan
