[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 02/17] x86emul: support MMX/SSE{, 2, 3} moves



>>> On 01.03.17 at 14:59, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 28/02/17 12:50, Jan Beulich wrote:
>> Previously supported insns are being converted to the new model, and
>> several new ones are being added.
>>
>> To keep the stub handling reasonably simple, integrate SET_SSE_PREFIX()
>> into copy_REX_VEX(), at once switching the stubs to use an empty REX
>> prefix instead of a double DS: one (no byte registers are being
>> accessed, so an empty REX prefix has no effect), except (of course) for
>> the 32-bit test harness build.
> 
> Why switch a %ds override to REX?  There doesn't appear to be any benefit.

It eliminates a mode_64bit() conditional from the non-VEX path in
the macro. And then, honestly, this is a question I would have
expected (if at all) the first time you came across this. I also think
avoiding two identical prefixes is (marginally) better architecture-
wise.

>> @@ -383,15 +383,35 @@ union vex {
>>      };
>>  };
>>  
>> +#ifdef __x86_64__
>> +# define PFX2 REX_PREFIX
>> +#else
>> +# define PFX2 0x3e
>> +#endif
>> +#define PFX_BYTES 3
>> +#define init_prefixes(stub) ({ \
>> +    uint8_t *buf_ = get_stub(stub); \
>> +    buf_[0] = 0x3e; \
>> +    buf_[1] = PFX2; \
>> +    buf_[2] = 0x0f; \
>> +    buf_ + 3; \
>> +})
>> +
>>  #define copy_REX_VEX(ptr, rex, vex) do { \
>>      if ( (vex).opcx != vex_none ) \
>>      { \
>>          if ( !mode_64bit() ) \
>>              vex.reg |= 8; \
>> -        ptr[0] = 0xc4, ptr[1] = (vex).raw[0], ptr[2] = (vex).raw[1]; \
>> +        (ptr)[0 - PFX_BYTES] = 0xc4; \
>> +        (ptr)[1 - PFX_BYTES] = (vex).raw[0]; \
>> +        (ptr)[2 - PFX_BYTES] = (vex).raw[1]; \
>> +    } \
>> +    else \
>> +    { \
>> +        if ( (vex).pfx ) \
>> +            (ptr)[0 - PFX_BYTES] = sse_prefix[(vex).pfx - 1]; \
>> +        (ptr)[1 - PFX_BYTES] |= rex; \
> 
> This is no longer guarded by mode_64bit().  Won't this result in %ds |
> rex in the 32bit test stubs?

Correct. But please realize that rex is zero at all times when
emulating other than 64-bit mode.

>> @@ -5429,6 +5496,57 @@ x86_emulate(
>>          singlestep = _regs._eflags & X86_EFLAGS_TF;
>>          break;
>>  
>> +    CASE_SIMD_PACKED_FP(, 0x0f, 0x50):     /* movmskp{s,d} xmm,reg */
>> +    CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x50): /* vmovmskp{s,d} {x,y}mm,reg */
>> +    CASE_SIMD_PACKED_INT(0x0f, 0xd7):      /* pmovmskb {,x}mm,reg */
>> +    case X86EMUL_OPC_VEX_66(0x0f, 0xd7):   /* vpmovmskb {x,y}mm,reg */
>> +        generate_exception_if(ea.type != OP_REG, EXC_UD);
>> +
>> +        if ( vex.opcx == vex_none )
>> +        {
>> +            if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>> +                vcpu_must_have(sse2);
>> +            else
>> +            {
>> +                if ( b != 0x50 )
>> +                    host_and_vcpu_must_have(mmx);
>> +                vcpu_must_have(sse);
>> +            }
>> +            if ( b == 0x50 || (vex.pfx & VEX_PREFIX_DOUBLE_MASK) )
>> +                get_fpu(X86EMUL_FPU_xmm, &fic);
>> +            else
>> +                get_fpu(X86EMUL_FPU_mmx, &fic);
>> +        }
>> +        else
>> +        {
>> +            generate_exception_if(vex.reg != 0xf, EXC_UD);
> 
> Isn't this TwoOp?

Yes, hence the #UD. Or is the question "Why is this being done
here, instead of on the common code path?" If so - the common
code path doing this isn't being reached, as we invoke the stub
inside the case block.

>> +            if ( b == 0x50 || !vex.l )
>> +                host_and_vcpu_must_have(avx);
>> +            else
>> +                host_and_vcpu_must_have(avx2);
>> +            get_fpu(X86EMUL_FPU_ymm, &fic);
>> +        }
>> +
>> +        opc = init_prefixes(stub);
>> +        opc[0] = b;
>> +        /* Convert GPR destination to %rAX. */
>> +        rex_prefix &= ~REX_R;
>> +        vex.r = 1;
>> +        if ( !mode_64bit() )
>> +            vex.w = 0;
>> +        opc[1] = modrm & 0xc7;
>> +        fic.insn_bytes = PFX_BYTES + 2;
>> +        opc[2] = 0xc3;
>> +
>> +        copy_REX_VEX(opc, rex_prefix, vex);
>> +        invoke_stub("", "", "=a" (dst.val) : [dummy] "i" (0));
>> +
>> +        put_stub(stub);
>> +        put_fpu(&fic);
>> +
>> +        dst.bytes = 4;
> 
> Somewhere there should probably be an ASSERT() that state->simd_size is
> 0, so we don't try to invoke the stub twice.

I can do this, but it didn't seem natural to do so when putting this
together, as - obviously - I did produce/check the table entries at
basically the same time as I did write this code.

>> @@ -6553,23 +6686,14 @@ x86_emulate(
>>  
>>      if ( state->simd_size )
>>      {
>> -#ifdef __XEN__
>> -        uint8_t *buf = stub.ptr;
>> -#else
>> -        uint8_t *buf = get_stub(stub);
>> -#endif

Note, btw, how the ugly #ifdef-ary goes away here.

>>          generate_exception_if(!op_bytes, EXC_UD);
>>          generate_exception_if(vex.opcx && (d & TwoOp) && vex.reg != 0xf,
>>                                EXC_UD);
>>  
>> -        if ( !buf )
>> +        if ( !opc )
>>              BUG();
>> -        if ( vex.opcx == vex_none )
>> -            SET_SSE_PREFIX(buf[0], vex.pfx);
>> -
>> -        buf[fic.insn_bytes] = 0xc3;
>> -        copy_REX_VEX(buf, rex_prefix, vex);
>> +        opc[fic.insn_bytes - PFX_BYTES] = 0xc3;
> 
> fic.insn_bytes - PFX_BYTES is in the middle of the opcode, isn't it?

No - note the difference between opc and buf: The former points
past the common prefix bytes.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.