Re: [Xen-devel] [PATCH v3 02/18] x86emul: support most memory accessing MMX/SSE{,2,3} insns
>>> On 20.02.17 at 14:45, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/02/17 11:07, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -45,6 +45,8 @@
>>  #define ModRM (1<<6)
>>  /* Destination is only written; never read. */
>>  #define Mov (1<<7)
>> +/* VEX/EVEX (SIMD only): 2nd source operand unused (must be all ones) */
>> +#define TwoOp Mov
>
> Is this safe?  It looks overloaded to me.  The Mov behaviour is still
> applicable even with TwoOp VEX/EVEX encodings.

It is safe. Mov only really matters for instructions writing to memory,
and there's no read-modify-write instruction in the entire SIMD set
afaict.

>> @@ -180,8 +182,44 @@ static const opcode_desc_t opcode_table[
>>      ImplicitOps, ImplicitOps, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM
>>  };
>>
>> +enum simd_opsize {
>> +    simd_none,
>
> Please can we have newlines here,
>
>> +    /*
>> +     * Ordinary packed integers:
>> +     * - 64 bits without prefix 66 (MMX)
>> +     * - 128 bits with prefix 66 (SSEn)
>> +     * - 128/256 bits depending on VEX.L (AVX)
>> +     */
>
> and here, etc, to help identify which comment is attached to which enum.

Well, if you think it helps.

>> +    simd_packed_int,
>> +    /*
>> +     * Ordinary packed/scalar floating point:
>> +     * - 128 bits without prefix or with prefix 66 (SSEn)
>> +     * - 128/256 bits depending on VEX.L (AVX)
>> +     * - 32 bits with prefix F3 (scalar single)
>> +     * - 64 bits with prefix F2 (scalar double)
>> +     */
>> +    simd_any_fp,
>> +    /*
>> +     * Packed floating point:
>> +     * - 128 bits without prefix or with prefix 66 (SSEn)
>> +     * - 128/256 bits depending on VEX.L (AVX)
>> +     */
>> +    simd_packed_fp,
>> +    /*
>> +     * Single precision packed/scalar floating point:
>> +     * - 128 bits without prefix (SSEn)
>> +     * - 128/256 bits depending on VEX.L, no prefix (AVX)
>> +     * - 32 bits with prefix F3 (scalar)
>> +     */
>> +    simd_single_fp,
>> +    /* Operand size encoded in non-standard way. */
>> +    simd_other
>
> ,

Specifically not in this case, as I mean this to remain the last entry
even if new enumerators get added.

>> @@ -196,22 +234,41 @@ static const struct {
>>      [0x0d] = { ImplicitOps|ModRM },
>>      [0x0e] = { ImplicitOps },
>>      [0x0f] = { ModRM|SrcImmByte },
>> -    [0x10 ... 0x1f] = { ImplicitOps|ModRM },
>> +    [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp },
>> +    [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>> +    [0x12 ... 0x13] = { ImplicitOps|ModRM },
>> +    [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp },
>> +    [0x16 ... 0x1f] = { ImplicitOps|ModRM },
>>      [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM },
>>      [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM },
>> -    [0x28 ... 0x2f] = { ImplicitOps|ModRM },
>> +    [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp },
>> +    [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp },
>> +    [0x2a] = { ImplicitOps|ModRM },
>> +    [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>> +    [0x2c ... 0x2f] = { ImplicitOps|ModRM },
>>      [0x30 ... 0x35] = { ImplicitOps },
>>      [0x37] = { ImplicitOps },
>>      [0x38] = { DstReg|SrcMem|ModRM },
>>      [0x3a] = { DstReg|SrcImmByte|ModRM },
>>      [0x40 ... 0x4f] = { DstReg|SrcMem|ModRM|Mov },
>> -    [0x50 ... 0x6e] = { ModRM },
>> -    [0x6f] = { ImplicitOps|ModRM },
>> -    [0x70 ... 0x73] = { SrcImmByte|ModRM },
>> -    [0x74 ... 0x76] = { ModRM },
>> -    [0x77] = { ImplicitOps },
>> +    [0x50] = { ModRM },
>> +    [0x51] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_any_fp },
>> +    [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp },
>
> RCPPS/RCPSS all have 3 operands.  Why is TwoOp used here?

Not exactly: The packed ones have two operands, while the scalar ones
have three. See the adjustment to the end of x86_decode_twobyte() for
how this is being dealt with.

>> +    case simd_single_fp:
>> +        if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>
> This logic would be far easier to follow by using vex.pfx == vex_66 ||
> vex.pfx == vex_f2.

That would be two comparisons, and I think the constant's name is
sufficiently descriptive to understand what's going on. In fact I think
it's easier to understand with that constant than if I used vex_66 and
vex_f2, which doesn't make it immediately visible that we care about
double variants here.

>> +        {
>> +            op_bytes = 0;
>> +            break;
>> +    case simd_packed_fp:
>> +        if ( vex.pfx & VEX_PREFIX_SCALAR_MASK )
>
> Similarly here, vex_none || vex_f3

If at all, vex_f3 || vex_f2, but see above (and you having got it wrong
is a good indication to me that using the constants is better).

> Having said that, taking VSHUFPS (0xc6) as example of simd_packed_fp,
> this instruction is defined for vex_none and vex_66, both of which have
> op_bytes of 16 when not vex encoded.

Which is to tell me what? This matches the common pattern (vex_none and
vex_66 being packed - i.e. full width - operations, while vex_f3 and
vex_f2 are scalar ones).
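[For illustration only - not part of the patch: a minimal standalone
sketch of what the two mask tests discussed above select, assuming
vex.pfx holds the architectural VEX.pp encoding (0 = none, 1 = 66,
2 = F3, 3 = F2). The mask values below are inferred from the prefix
pairs named in the replies (66/F2 = double, F3/F2 = scalar); the real
definitions live in x86_emulate.c.

#include <stdio.h>

enum vex_pfx { vex_none = 0, vex_66 = 1, vex_f3 = 2, vex_f2 = 3 };

/* Bit 0 is set for the double precision forms (66, F2). */
#define VEX_PREFIX_DOUBLE_MASK 0x1
/* Bit 1 is set for the scalar forms (F3, F2). */
#define VEX_PREFIX_SCALAR_MASK 0x2

int main(void)
{
    static const char *const name[] = { "none", "66", "F3", "F2" };
    unsigned int pfx;

    /* Print, for each prefix, which of the two mask tests it satisfies. */
    for ( pfx = vex_none; pfx <= vex_f2; ++pfx )
        printf("%-4s: double=%d scalar=%d\n", name[pfx],
               !!(pfx & VEX_PREFIX_DOUBLE_MASK),
               !!(pfx & VEX_PREFIX_SCALAR_MASK));

    return 0;
}

Its output shows double set only for 66 and F2, and scalar set only for
F3 and F2, i.e. each single-AND test covers exactly the prefix pair in
question.]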
>> +    CASE_SIMD_SCALAR_FP(, 0x0f, 0x2b):     /* movnts{s,d} xmm,mem */
>> +        host_and_vcpu_must_have(sse4a);
>> +        /* fall through */
>> +    CASE_SIMD_PACKED_FP(, 0x0f, 0x2b):     /* movntp{s,d} xmm,m128 */
>> +    CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2b): /* vmovntp{s,d} {x,y}mm,mem */
>> +        generate_exception_if(ea.type != OP_MEM, EXC_UD);
>> +        sfence = true;
>
> Why do we need to emit an sfence at this point?  The software hitting
> this emulation is the entity which should be making sfence decisions.

This is to be on the safe side: The instruction in the stub doesn't
write to the intended destination, but to mmval. We then copy from
there to the final destination. While the CPU _should_ consult its WC
buffer for reads, I'd rather not rely on the absence of errata here.

>> @@ -6457,22 +6917,6 @@ x86_insn_is_mem_write(const struct x86_e
>>      case 0x6c: case 0x6d: /* INS */
>>      case 0xa4: case 0xa5: /* MOVS */
>>      case 0xaa: case 0xab: /* STOS */
>> -    case X86EMUL_OPC(0x0f, 0x11):        /* MOVUPS */
>> -    case X86EMUL_OPC_VEX(0x0f, 0x11):    /* VMOVUPS */
>> -    case X86EMUL_OPC_66(0x0f, 0x11):     /* MOVUPD */
>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x11): /* VMOVUPD */
>> -    case X86EMUL_OPC_F3(0x0f, 0x11):     /* MOVSS */
>> -    case X86EMUL_OPC_VEX_F3(0x0f, 0x11): /* VMOVSS */
>> -    case X86EMUL_OPC_F2(0x0f, 0x11):     /* MOVSD */
>> -    case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* VMOVSD */
>> -    case X86EMUL_OPC(0x0f, 0x29):        /* MOVAPS */
>> -    case X86EMUL_OPC_VEX(0x0f, 0x29):    /* VMOVAPS */
>> -    case X86EMUL_OPC_66(0x0f, 0x29):     /* MOVAPD */
>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x29): /* VMOVAPD */
>> -    case X86EMUL_OPC(0x0f, 0x2b):        /* MOVNTPS */
>> -    case X86EMUL_OPC_VEX(0x0f, 0x2b):    /* VMOVNTPS */
>> -    case X86EMUL_OPC_66(0x0f, 0x2b):     /* MOVNTPD */
>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x2b): /* VMOVNTPD */
>
> Where have these gone?

Nowhere, they're not needed anymore now that the twobyte_table[]
entries no longer use DstImplicit, but DstMem.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel