Xen project Mailing List

Re: [Xen-devel] [PATCH v4 02/17] x86emul: support MMX/SSE{, 2, 3} moves

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Thu, 02 Mar 2017 01:07:13 -0700

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 02 Mar 2017 08:07:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 01.03.17 at 20:56, <andrew.cooper3@xxxxxxxxxx> wrote: > On 01/03/17 14:19, Jan Beulich wrote: >>>>> On 01.03.17 at 14:59, <andrew.cooper3@xxxxxxxxxx> wrote: >>> On 28/02/17 12:50, Jan Beulich wrote: >> I also think avoiding two identical prefixes is (marginally) better >> architecture- >> wise. > > There is no specific advice in the AMD optimisation guide. > > The Intel guide warns against unnecessary use of 0x66 and 0x67 > (specifically in the Length-Changing Prefixes section), which > dynamically change the length of the instruction. This doesn't apply to > us in this situation. > > The only other reference to prefixes comes from the Other Decoding > Guidelines section, which state (obviously) that extra prefixes decrease > instruction bandwidth (as more bytes need consuming to decode the > instruction), and that any instruction with multiple prefixes at all > require decoding in the first decoder, which builds competition of resource. > > I can't see anything suggesting that a double %ds vs a single %ds and > rex prefix would make any difference. Well - performance isn't of interest here anyway, only correctness is. Since not emitting multiple identical prefixes is marginally better and since using REX here is slightly better overall code, I'd prefer for it to stay this way. >>>> @@ -5429,6 +5496,57 @@ x86_emulate( >>>> singlestep = _regs._eflags & X86_EFLAGS_TF; >>>> break; >>>> >>>> + CASE_SIMD_PACKED_FP(, 0x0f, 0x50): /* movmskp{s,d} xmm,reg */ >>>> + CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x50): /* vmovmskp{s,d} {x,y}mm,reg */ >>>> + CASE_SIMD_PACKED_INT(0x0f, 0xd7): /* pmovmskb {,x}mm,reg */ >>>> + case X86EMUL_OPC_VEX_66(0x0f, 0xd7): /* vpmovmskb {x,y}mm,reg */ >>>> + generate_exception_if(ea.type != OP_REG, EXC_UD); >>>> + >>>> + if ( vex.opcx == vex_none ) >>>> + { >>>> + if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK ) >>>> + vcpu_must_have(sse2); >>>> + else >>>> + { >>>> + if ( b != 0x50 ) >>>> + host_and_vcpu_must_have(mmx); >>>> + vcpu_must_have(sse); >>>> + } >>>> + if ( b == 0x50 || (vex.pfx & VEX_PREFIX_DOUBLE_MASK) ) >>>> + get_fpu(X86EMUL_FPU_xmm, &fic); >>>> + else >>>> + get_fpu(X86EMUL_FPU_mmx, &fic); >>>> + } >>>> + else >>>> + { >>>> + generate_exception_if(vex.reg != 0xf, EXC_UD); >>> Isn't this TwoOp? >> Yes, hence the #UD. Or is the question "Why is this being done >> here, instead of on the common code path?" If so - the common >> code path doing this isn't being reached, as we invoke the stub >> inside the case block. > > My question was actually "Why isn't this based on d & TwoByte"? Because the above variant is more explicit imo: Why depend on some derived info when we can use the original encoding directly? >>>> + if ( b == 0x50 || !vex.l ) >>>> + host_and_vcpu_must_have(avx); >>>> + else >>>> + host_and_vcpu_must_have(avx2); >>>> + get_fpu(X86EMUL_FPU_ymm, &fic); >>>> + } >>>> + >>>> + opc = init_prefixes(stub); >>>> + opc[0] = b; >>>> + /* Convert GPR destination to %rAX. */ >>>> + rex_prefix &= ~REX_R; >>>> + vex.r = 1; >>>> + if ( !mode_64bit() ) >>>> + vex.w = 0; >>>> + opc[1] = modrm & 0xc7; >>>> + fic.insn_bytes = PFX_BYTES + 2; >>>> + opc[2] = 0xc3; >>>> + >>>> + copy_REX_VEX(opc, rex_prefix, vex); >>>> + invoke_stub("", "", "=a" (dst.val) : [dummy] "i" (0)); >>>> + >>>> + put_stub(stub); >>>> + put_fpu(&fic); >>>> + >>>> + dst.bytes = 4; >>> Somewhere there should probably be an ASSERT() that state->simd_size is >>> 0, so we don't try to invoke the stub twice. >> I can do this, but it didn't seem natural to do so when putting this >> together, as - obviously - I did produce/check the table entries at >> basically the same time as I did write this code. > > It is obvious to you now, and I do trust that you checked the > correctness as it related to this patch, but it will not be obvious in 6 > months time with another dev-cycles worth of change on top. Right, that's what I was trying to hint at with my explanation of why I didn't think of adding ASSERT()s to this effect. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.