[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 02/17] x86emul: support MMX/SSE{, 2, 3} moves



>>> On 01.03.17 at 20:56, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 01/03/17 14:19, Jan Beulich wrote:
>>>>> On 01.03.17 at 14:59, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 28/02/17 12:50, Jan Beulich wrote:
>> I also think avoiding two identical prefixes is (marginally) better 
>> architecture-
>> wise.
> 
> There is no specific advice in the AMD optimisation guide.
> 
> The Intel guide warns against unnecessary use of 0x66 and 0x67
> (specifically in the Length-Changing Prefixes section), which
> dynamically change the length of the instruction.  This doesn't apply to
> us in this situation.
> 
> The only other reference to prefixes comes from the Other Decoding
> Guidelines section, which state (obviously) that extra prefixes decrease
> instruction bandwidth (as more bytes need consuming to decode the
> instruction), and that any instruction with multiple prefixes at all
> require decoding in the first decoder, which builds competition of resource.
> 
> I can't see anything suggesting that a double %ds vs a single %ds and
> rex prefix would make any difference.

Well - performance isn't of interest here anyway, only correctness
is. Since not emitting multiple identical prefixes is marginally better
and since using REX here is slightly better overall code, I'd prefer
for it to stay this way.

>>>> @@ -5429,6 +5496,57 @@ x86_emulate(
>>>>          singlestep = _regs._eflags & X86_EFLAGS_TF;
>>>>          break;
>>>>  
>>>> +    CASE_SIMD_PACKED_FP(, 0x0f, 0x50):     /* movmskp{s,d} xmm,reg */
>>>> +    CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x50): /* vmovmskp{s,d} {x,y}mm,reg */
>>>> +    CASE_SIMD_PACKED_INT(0x0f, 0xd7):      /* pmovmskb {,x}mm,reg */
>>>> +    case X86EMUL_OPC_VEX_66(0x0f, 0xd7):   /* vpmovmskb {x,y}mm,reg */
>>>> +        generate_exception_if(ea.type != OP_REG, EXC_UD);
>>>> +
>>>> +        if ( vex.opcx == vex_none )
>>>> +        {
>>>> +            if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>>>> +                vcpu_must_have(sse2);
>>>> +            else
>>>> +            {
>>>> +                if ( b != 0x50 )
>>>> +                    host_and_vcpu_must_have(mmx);
>>>> +                vcpu_must_have(sse);
>>>> +            }
>>>> +            if ( b == 0x50 || (vex.pfx & VEX_PREFIX_DOUBLE_MASK) )
>>>> +                get_fpu(X86EMUL_FPU_xmm, &fic);
>>>> +            else
>>>> +                get_fpu(X86EMUL_FPU_mmx, &fic);
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            generate_exception_if(vex.reg != 0xf, EXC_UD);
>>> Isn't this TwoOp?
>> Yes, hence the #UD. Or is the question "Why is this being done
>> here, instead of on the common code path?" If so - the common
>> code path doing this isn't being reached, as we invoke the stub
>> inside the case block.
> 
> My question was actually "Why isn't this based on d & TwoByte"?

Because the above variant is more explicit imo: Why depend on
some derived info when we can use the original encoding directly?

>>>> +            if ( b == 0x50 || !vex.l )
>>>> +                host_and_vcpu_must_have(avx);
>>>> +            else
>>>> +                host_and_vcpu_must_have(avx2);
>>>> +            get_fpu(X86EMUL_FPU_ymm, &fic);
>>>> +        }
>>>> +
>>>> +        opc = init_prefixes(stub);
>>>> +        opc[0] = b;
>>>> +        /* Convert GPR destination to %rAX. */
>>>> +        rex_prefix &= ~REX_R;
>>>> +        vex.r = 1;
>>>> +        if ( !mode_64bit() )
>>>> +            vex.w = 0;
>>>> +        opc[1] = modrm & 0xc7;
>>>> +        fic.insn_bytes = PFX_BYTES + 2;
>>>> +        opc[2] = 0xc3;
>>>> +
>>>> +        copy_REX_VEX(opc, rex_prefix, vex);
>>>> +        invoke_stub("", "", "=a" (dst.val) : [dummy] "i" (0));
>>>> +
>>>> +        put_stub(stub);
>>>> +        put_fpu(&fic);
>>>> +
>>>> +        dst.bytes = 4;
>>> Somewhere there should probably be an ASSERT() that state->simd_size is
>>> 0, so we don't try to invoke the stub twice.
>> I can do this, but it didn't seem natural to do so when putting this
>> together, as - obviously - I did produce/check the table entries at
>> basically the same time as I did write this code.
> 
> It is obvious to you now, and I do trust that you checked the
> correctness as it related to this patch, but it will not be obvious in 6
> months time with another dev-cycles worth of change on top.

Right, that's what I was trying to hint at with my explanation of
why I didn't think of adding ASSERT()s to this effect.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.