|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 02/18] x86emul: support most memory accessing MMX/SSE{, 2, 3} insns
>>> On 20.02.17 at 14:45, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/02/17 11:07, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -45,6 +45,8 @@
>> #define ModRM (1<<6)
>> /* Destination is only written; never read. */
>> #define Mov (1<<7)
>> +/* VEX/EVEX (SIMD only): 2nd source operand unused (must be all ones) */
>> +#define TwoOp Mov
>
> Is this safe? It looks overloaded to me. The Mov behaviour is still
> applicable even with TwoOp VEX/EVEX encodings.
It is safe. Mov only really matters for instructions writing to memory,
and there's no read-modify-write instruction in the entire SIMD set
afaict.
>> @@ -180,8 +182,44 @@ static const opcode_desc_t opcode_table[
>> ImplicitOps, ImplicitOps, ByteOp|DstMem|SrcNone|ModRM,
> DstMem|SrcNone|ModRM
>> };
>>
>> +enum simd_opsize {
>> + simd_none,
>
> Please can we have newlines here,
>
>> + /*
>> + * Ordinary packed integers:
>> + * - 64 bits without prefix 66 (MMX)
>> + * - 128 bits with prefix 66 (SSEn)
>> + * - 128/256 bits depending on VEX.L (AVX)
>> + */
>
> and here, etc, to help identify which comment is attached to which enum.
Well, if you think it helps.
>> + simd_packed_int,
>> + /*
>> + * Ordinary packed/scalar floating point:
>> + * - 128 bits without prefix or with prefix 66 (SSEn)
>> + * - 128/256 bits depending on VEX.L (AVX)
>> + * - 32 bits with prefix F3 (scalar single)
>> + * - 64 bits with prefix F2 (scalar doubgle)
>> + */
>> + simd_any_fp,
>> + /*
>> + * Packed floating point:
>> + * - 128 bits without prefix or with prefix 66 (SSEn)
>> + * - 128/256 bits depending on VEX.L (AVX)
>> + */
>> + simd_packed_fp,
>> + /*
>> + * Single precision packed/scalar floating point:
>> + * - 128 bits without prefix (SSEn)
>> + * - 128/256 bits depending on VEX.L, no prefix (AVX)
>> + * - 32 bits with prefix F3 (scalar)
>> + */
>> + simd_single_fp,
>> + /* Operand size encoded in non-standard way. */
>> + simd_other
>
> ,
Specifically not in this case, as I mean this to remain the last entry
even if new enumerators get added.
>> @@ -196,22 +234,41 @@ static const struct {
>> [0x0d] = { ImplicitOps|ModRM },
>> [0x0e] = { ImplicitOps },
>> [0x0f] = { ModRM|SrcImmByte },
>> - [0x10 ... 0x1f] = { ImplicitOps|ModRM },
>> + [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp },
>> + [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>> + [0x12 ... 0x13] = { ImplicitOps|ModRM },
>> + [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp },
>> + [0x16 ... 0x1f] = { ImplicitOps|ModRM },
>> [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM },
>> [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM },
>> - [0x28 ... 0x2f] = { ImplicitOps|ModRM },
>> + [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp },
>> + [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp },
>> + [0x2a] = { ImplicitOps|ModRM },
>> + [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>> + [0x2c ... 0x2f] = { ImplicitOps|ModRM },
>> [0x30 ... 0x35] = { ImplicitOps },
>> [0x37] = { ImplicitOps },
>> [0x38] = { DstReg|SrcMem|ModRM },
>> [0x3a] = { DstReg|SrcImmByte|ModRM },
>> [0x40 ... 0x4f] = { DstReg|SrcMem|ModRM|Mov },
>> - [0x50 ... 0x6e] = { ModRM },
>> - [0x6f] = { ImplicitOps|ModRM },
>> - [0x70 ... 0x73] = { SrcImmByte|ModRM },
>> - [0x74 ... 0x76] = { ModRM },
>> - [0x77] = { ImplicitOps },
>> + [0x50] = { ModRM },
>> + [0x51] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_any_fp },
>> + [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp },
>
> RCPPS/RCPSS all have 3 operands. Why is TwoOp used here?
Not exactly: The packed ones have two operands, while the
scalar ones have three. See the adjustment to the end of
x86_decode_twobyte() for how this is being dealt with.
>> + case simd_single_fp:
>> + if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>
> This logic would be far easier to follow by using vex.pfx == vex_66 ||
> vex.pfx == vex_f2.
That would be two comparisons, and I think the constant's name
is sufficiently descriptive to understand what's going on. In fact
I think it's easier to understand with that constant, than if I used
vex_66 and vex_f2, which doesn't make immediately visible that
we care about double variants here.
>> + {
>> + op_bytes = 0;
>> + break;
>> + case simd_packed_fp:
>> + if ( vex.pfx & VEX_PREFIX_SCALAR_MASK )
>
> Similarly here, vex_none || vex_f3
If at all, vex_f3 || vex_f2, but see above (and you having got it
wrong is a good indication to me that using the constants is
better).
> Having said that, taking VSHUFPS (0xc6) as example of simd_packed_fp,
> this instruction is defined for vex_none and vex_66, both of which have
> op_bytes of 16 when not vex encoded.
Which is to tell me what? This matches the common pattern
(vex_none and vex_66 being packed - i.e.full width - operations,
while vex_f3 and vex_f2 are scalar ones).
>> + CASE_SIMD_SCALAR_FP(, 0x0f, 0x2b): /* movnts{s,d} xmm,mem */
>> + host_and_vcpu_must_have(sse4a);
>> + /* fall through */
>> + CASE_SIMD_PACKED_FP(, 0x0f, 0x2b): /* movntp{s,d} xmm,m128 */
>> + CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2b): /* vmovntp{s,d} {x,y}mm,mem */
>> + generate_exception_if(ea.type != OP_MEM, EXC_UD);
>> + sfence = true;
>
> Why do we need to emit an sfence at this point? The software hitting
> this emulation is the entity which should be making sfence decisions.
This is to be on the safe side: The instruction in the stub doesn't
write to the intended destination, but to mmval. We then copy
from there to the final destination. While the CPU _should_
consult its WC buffer for reads, I'd rather not rely on the absence
of errata here.
>> @@ -6457,22 +6917,6 @@ x86_insn_is_mem_write(const struct x86_e
>> case 0x6c: case 0x6d: /* INS */
>> case 0xa4: case 0xa5: /* MOVS */
>> case 0xaa: case 0xab: /* STOS */
>> - case X86EMUL_OPC(0x0f, 0x11): /* MOVUPS */
>> - case X86EMUL_OPC_VEX(0x0f, 0x11): /* VMOVUPS */
>> - case X86EMUL_OPC_66(0x0f, 0x11): /* MOVUPD */
>> - case X86EMUL_OPC_VEX_66(0x0f, 0x11): /* VMOVUPD */
>> - case X86EMUL_OPC_F3(0x0f, 0x11): /* MOVSS */
>> - case X86EMUL_OPC_VEX_F3(0x0f, 0x11): /* VMOVSS */
>> - case X86EMUL_OPC_F2(0x0f, 0x11): /* MOVSD */
>> - case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* VMOVSD */
>> - case X86EMUL_OPC(0x0f, 0x29): /* MOVAPS */
>> - case X86EMUL_OPC_VEX(0x0f, 0x29): /* VMOVAPS */
>> - case X86EMUL_OPC_66(0x0f, 0x29): /* MOVAPD */
>> - case X86EMUL_OPC_VEX_66(0x0f, 0x29): /* VMOVAPD */
>> - case X86EMUL_OPC(0x0f, 0x2b): /* MOVNTPS */
>> - case X86EMUL_OPC_VEX(0x0f, 0x2b): /* VMOVNTPS */
>> - case X86EMUL_OPC_66(0x0f, 0x2b): /* MOVNTPD */
>> - case X86EMUL_OPC_VEX_66(0x0f, 0x2b): /* VMOVNTPD */
>
> Where have these gone?
Nowhere, they're not needed anymore now that the twobyte_table[]
entries no longer use DstImplicit, but DstMem.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |