[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 02/18] x86emul: support most memory accessing MMX/SSE{, 2, 3} insns



On 20/02/17 14:52, Jan Beulich wrote:
>>>> On 20.02.17 at 14:45, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 15/02/17 11:07, Jan Beulich wrote:
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>>> @@ -45,6 +45,8 @@
>>>  #define ModRM       (1<<6)
>>>  /* Destination is only written; never read. */
>>>  #define Mov         (1<<7)
>>> +/* VEX/EVEX (SIMD only): 2nd source operand unused (must be all ones) */
>>> +#define TwoOp       Mov
>> Is this safe?  It looks overloaded to me.  The Mov behaviour is still
>> applicable even with TwoOp VEX/EVEX encodings.
> It is safe. Mov only really matters for instructions writing to memory,
> and there's no read-modify-write instruction in the entire SIMD set
> afaict.

On the meaning of Mov alone, can we please state the intended meaning
more clearly, and pro-actively remove misuses?

We currently have 21 masks; you are adding a 22nd here, and a 23rd in a
later patch.  For 8 bits worth of space, this shows how overloaded the
encoding is.


As for TwoOp, it still collides for DstMem instructions.

>
>>> @@ -196,22 +234,41 @@ static const struct {
>>>      [0x0d] = { ImplicitOps|ModRM },
>>>      [0x0e] = { ImplicitOps },
>>>      [0x0f] = { ModRM|SrcImmByte },
>>> -    [0x10 ... 0x1f] = { ImplicitOps|ModRM },
>>> +    [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp },
>>> +    [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>>> +    [0x12 ... 0x13] = { ImplicitOps|ModRM },
>>> +    [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp },
>>> +    [0x16 ... 0x1f] = { ImplicitOps|ModRM },
>>>      [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM },
>>>      [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM },
>>> -    [0x28 ... 0x2f] = { ImplicitOps|ModRM },
>>> +    [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp },
>>> +    [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp },
>>> +    [0x2a] = { ImplicitOps|ModRM },
>>> +    [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>>> +    [0x2c ... 0x2f] = { ImplicitOps|ModRM },
>>>      [0x30 ... 0x35] = { ImplicitOps },
>>>      [0x37] = { ImplicitOps },
>>>      [0x38] = { DstReg|SrcMem|ModRM },
>>>      [0x3a] = { DstReg|SrcImmByte|ModRM },
>>>      [0x40 ... 0x4f] = { DstReg|SrcMem|ModRM|Mov },
>>> -    [0x50 ... 0x6e] = { ModRM },
>>> -    [0x6f] = { ImplicitOps|ModRM },
>>> -    [0x70 ... 0x73] = { SrcImmByte|ModRM },
>>> -    [0x74 ... 0x76] = { ModRM },
>>> -    [0x77] = { ImplicitOps },
>>> +    [0x50] = { ModRM },
>>> +    [0x51] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_any_fp },
>>> +    [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp },
>> RCPPS/RCPSS all have 3 operands.  Why is TwoOp used here?
> Not exactly: The packed ones have two operands, while the
> scalar ones have three. See the adjustment to the end of
> x86_decode_twobyte() for how this is being dealt with.

In which case this shouldn't really be TwoOp here, and TwoOp should be
inserted in the later operand adjustment logic when we distinguish the
F3 prefix.

In fact, only VRCPSS is a 3-operand instruction.  The legacy encoding is
indeed just two operands.

>
>>> +    case simd_single_fp:
>>> +        if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>> This logic would be far easier to follow by using vex.pfx == vex_66 ||
>> vex.pfx == vex_f2.
> That would be two comparisons, and I think the constant's name
> is sufficiently descriptive to understand what's going on.

As someone who isn't the author of the code, I am still not clear on
what the constant actually means.  I understand what it mechanically
equates to, but it appears to be context dependent as to whether the
terms double and/or scalar are relevant.

> In fact I think it's easier to understand with that constant, than if I used
> vex_66 and vex_f2, which doesn't make immediately visible that
> we care about double variants here.
>
>>> +        {
>>> +            op_bytes = 0;
>>> +            break;
>>> +    case simd_packed_fp:
>>> +            if ( vex.pfx & VEX_PREFIX_SCALAR_MASK )
>> Similarly here, vex_none || vex_f3
> If at all, vex_f3 || vex_f2, but see above (and you having got it
> wrong is a good indication to me that using the constants is
> better).
>
>> Having said that, taking VSHUFPS (0xc6) as example of simd_packed_fp,
>> this instruction is defined for vex_none and vex_66, both of which have
>> op_bytes of 16 when not vex encoded.
> Which is to tell me what? This matches the common pattern
> (vex_none and vex_66 being packed - i.e.full width - operations,
> while vex_f3 and vex_f2 are scalar ones).

The setting of op_bytes to 0 looks wrong.

>
>>> +    CASE_SIMD_SCALAR_FP(, 0x0f, 0x2b):     /* movnts{s,d} xmm,mem */
>>> +        host_and_vcpu_must_have(sse4a);
>>> +        /* fall through */
>>> +    CASE_SIMD_PACKED_FP(, 0x0f, 0x2b):     /* movntp{s,d} xmm,m128 */
>>> +    CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2b): /* vmovntp{s,d} {x,y}mm,mem */
>>> +        generate_exception_if(ea.type != OP_MEM, EXC_UD);
>>> +        sfence = true;
>> Why do we need to emit an sfence at this point?  The software hitting
>> this emulation is the entity which should be making sfence decisions.
> This is to be on the safe side: The instruction in the stub doesn't
> write to the intended destination, but to mmval. We then copy
> from there to the final destination. While the CPU _should_
> consult its WC buffer for reads, I'd rather not rely on the absence
> of errata here.

There is only one erratum (Pentium-M, erratum Y9) concerning cache
snooping, and the reason I didn't port it to Xen at the same time as the
SelfSnoop feature is because Pentium-M are 32bit processors only.

I know emulation isn't a fastpath, but fences have substantial
perturbance to the current core, including associated hyperthreads.

If there were an unknown erratum in this area, the guest OS would still
need to issue fence instructions itself, which are far less likely to
trap for emulation.

>
>>> @@ -6457,22 +6917,6 @@ x86_insn_is_mem_write(const struct x86_e
>>>      case 0x6c: case 0x6d:                /* INS */
>>>      case 0xa4: case 0xa5:                /* MOVS */
>>>      case 0xaa: case 0xab:                /* STOS */
>>> -    case X86EMUL_OPC(0x0f, 0x11):        /* MOVUPS */
>>> -    case X86EMUL_OPC_VEX(0x0f, 0x11):    /* VMOVUPS */
>>> -    case X86EMUL_OPC_66(0x0f, 0x11):     /* MOVUPD */
>>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x11): /* VMOVUPD */
>>> -    case X86EMUL_OPC_F3(0x0f, 0x11):     /* MOVSS */
>>> -    case X86EMUL_OPC_VEX_F3(0x0f, 0x11): /* VMOVSS */
>>> -    case X86EMUL_OPC_F2(0x0f, 0x11):     /* MOVSD */
>>> -    case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* VMOVSD */
>>> -    case X86EMUL_OPC(0x0f, 0x29):        /* MOVAPS */
>>> -    case X86EMUL_OPC_VEX(0x0f, 0x29):    /* VMOVAPS */
>>> -    case X86EMUL_OPC_66(0x0f, 0x29):     /* MOVAPD */
>>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x29): /* VMOVAPD */
>>> -    case X86EMUL_OPC(0x0f, 0x2b):        /* MOVNTPS */
>>> -    case X86EMUL_OPC_VEX(0x0f, 0x2b):    /* VMOVNTPS */
>>> -    case X86EMUL_OPC_66(0x0f, 0x2b):     /* MOVNTPD */
>>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x2b): /* VMOVNTPD */
>> Where have these gone?
> Nowhere, they're not needed anymore now that the twobyte_table[]
> entries no longer use DstImplicit, but DstMem.

Oh, good.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.