[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 02/18] x86emul: support most memory accessing MMX/SSE{, 2, 3} insns



>>> On 20.02.17 at 14:45, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/02/17 11:07, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -45,6 +45,8 @@
>>  #define ModRM       (1<<6)
>>  /* Destination is only written; never read. */
>>  #define Mov         (1<<7)
>> +/* VEX/EVEX (SIMD only): 2nd source operand unused (must be all ones) */
>> +#define TwoOp       Mov
> 
> Is this safe?  It looks overloaded to me.  The Mov behaviour is still
> applicable even with TwoOp VEX/EVEX encodings.

It is safe. Mov only really matters for instructions writing to memory,
and there's no read-modify-write instruction in the entire SIMD set
afaict.

>> @@ -180,8 +182,44 @@ static const opcode_desc_t opcode_table[
>>      ImplicitOps, ImplicitOps, ByteOp|DstMem|SrcNone|ModRM, 
> DstMem|SrcNone|ModRM
>>  };
>>  
>> +enum simd_opsize {
>> +    simd_none,
> 
> Please can we have newlines here,
> 
>> +    /*
>> +     * Ordinary packed integers:
>> +     * - 64 bits without prefix 66 (MMX)
>> +     * - 128 bits with prefix 66 (SSEn)
>> +     * - 128/256 bits depending on VEX.L (AVX)
>> +     */
> 
> and here, etc, to help identify which comment is attached to which enum.

Well, if you think it helps.

>> +    simd_packed_int,
>> +    /*
>> +     * Ordinary packed/scalar floating point:
>> +     * - 128 bits without prefix or with prefix 66 (SSEn)
>> +     * - 128/256 bits depending on VEX.L (AVX)
>> +     * - 32 bits with prefix F3 (scalar single)
>> +     * - 64 bits with prefix F2 (scalar doubgle)
>> +     */
>> +    simd_any_fp,
>> +    /*
>> +     * Packed floating point:
>> +     * - 128 bits without prefix or with prefix 66 (SSEn)
>> +     * - 128/256 bits depending on VEX.L (AVX)
>> +     */
>> +    simd_packed_fp,
>> +    /*
>> +     * Single precision packed/scalar floating point:
>> +     * - 128 bits without prefix (SSEn)
>> +     * - 128/256 bits depending on VEX.L, no prefix (AVX)
>> +     * - 32 bits with prefix F3 (scalar)
>> +     */
>> +    simd_single_fp,
>> +    /* Operand size encoded in non-standard way. */
>> +    simd_other
> 
> ,

Specifically not in this case, as I mean this to remain the last entry
even if new enumerators get added.

>> @@ -196,22 +234,41 @@ static const struct {
>>      [0x0d] = { ImplicitOps|ModRM },
>>      [0x0e] = { ImplicitOps },
>>      [0x0f] = { ModRM|SrcImmByte },
>> -    [0x10 ... 0x1f] = { ImplicitOps|ModRM },
>> +    [0x10] = { DstImplicit|SrcMem|ModRM|Mov, simd_any_fp },
>> +    [0x11] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>> +    [0x12 ... 0x13] = { ImplicitOps|ModRM },
>> +    [0x14 ... 0x15] = { DstImplicit|SrcMem|ModRM, simd_packed_fp },
>> +    [0x16 ... 0x1f] = { ImplicitOps|ModRM },
>>      [0x20 ... 0x21] = { DstMem|SrcImplicit|ModRM },
>>      [0x22 ... 0x23] = { DstImplicit|SrcMem|ModRM },
>> -    [0x28 ... 0x2f] = { ImplicitOps|ModRM },
>> +    [0x28] = { DstImplicit|SrcMem|ModRM|Mov, simd_packed_fp },
>> +    [0x29] = { DstMem|SrcImplicit|ModRM|Mov, simd_packed_fp },
>> +    [0x2a] = { ImplicitOps|ModRM },
>> +    [0x2b] = { DstMem|SrcImplicit|ModRM|Mov, simd_any_fp },
>> +    [0x2c ... 0x2f] = { ImplicitOps|ModRM },
>>      [0x30 ... 0x35] = { ImplicitOps },
>>      [0x37] = { ImplicitOps },
>>      [0x38] = { DstReg|SrcMem|ModRM },
>>      [0x3a] = { DstReg|SrcImmByte|ModRM },
>>      [0x40 ... 0x4f] = { DstReg|SrcMem|ModRM|Mov },
>> -    [0x50 ... 0x6e] = { ModRM },
>> -    [0x6f] = { ImplicitOps|ModRM },
>> -    [0x70 ... 0x73] = { SrcImmByte|ModRM },
>> -    [0x74 ... 0x76] = { ModRM },
>> -    [0x77] = { ImplicitOps },
>> +    [0x50] = { ModRM },
>> +    [0x51] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_any_fp },
>> +    [0x52 ... 0x53] = { DstImplicit|SrcMem|ModRM|TwoOp, simd_single_fp },
> 
> RCPPS/RCPSS all have 3 operands.  Why is TwoOp used here?

Not exactly: The packed ones have two operands, while the
scalar ones have three. See the adjustment to the end of
x86_decode_twobyte() for how this is being dealt with.

>> +    case simd_single_fp:
>> +        if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
> 
> This logic would be far easier to follow by using vex.pfx == vex_66 ||
> vex.pfx == vex_f2.

That would be two comparisons, and I think the constant's name
is sufficiently descriptive to understand what's going on. In fact
I think it's easier to understand with that constant, than if I used
vex_66 and vex_f2, which doesn't make immediately visible that
we care about double variants here.

>> +        {
>> +            op_bytes = 0;
>> +            break;
>> +    case simd_packed_fp:
>> +            if ( vex.pfx & VEX_PREFIX_SCALAR_MASK )
> 
> Similarly here, vex_none || vex_f3

If at all, vex_f3 || vex_f2, but see above (and you having got it
wrong is a good indication to me that using the constants is
better).

> Having said that, taking VSHUFPS (0xc6) as example of simd_packed_fp,
> this instruction is defined for vex_none and vex_66, both of which have
> op_bytes of 16 when not vex encoded.

Which is to tell me what? This matches the common pattern
(vex_none and vex_66 being packed - i.e.full width - operations,
while vex_f3 and vex_f2 are scalar ones).

>> +    CASE_SIMD_SCALAR_FP(, 0x0f, 0x2b):     /* movnts{s,d} xmm,mem */
>> +        host_and_vcpu_must_have(sse4a);
>> +        /* fall through */
>> +    CASE_SIMD_PACKED_FP(, 0x0f, 0x2b):     /* movntp{s,d} xmm,m128 */
>> +    CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x2b): /* vmovntp{s,d} {x,y}mm,mem */
>> +        generate_exception_if(ea.type != OP_MEM, EXC_UD);
>> +        sfence = true;
> 
> Why do we need to emit an sfence at this point?  The software hitting
> this emulation is the entity which should be making sfence decisions.

This is to be on the safe side: The instruction in the stub doesn't
write to the intended destination, but to mmval. We then copy
from there to the final destination. While the CPU _should_
consult its WC buffer for reads, I'd rather not rely on the absence
of errata here.

>> @@ -6457,22 +6917,6 @@ x86_insn_is_mem_write(const struct x86_e
>>      case 0x6c: case 0x6d:                /* INS */
>>      case 0xa4: case 0xa5:                /* MOVS */
>>      case 0xaa: case 0xab:                /* STOS */
>> -    case X86EMUL_OPC(0x0f, 0x11):        /* MOVUPS */
>> -    case X86EMUL_OPC_VEX(0x0f, 0x11):    /* VMOVUPS */
>> -    case X86EMUL_OPC_66(0x0f, 0x11):     /* MOVUPD */
>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x11): /* VMOVUPD */
>> -    case X86EMUL_OPC_F3(0x0f, 0x11):     /* MOVSS */
>> -    case X86EMUL_OPC_VEX_F3(0x0f, 0x11): /* VMOVSS */
>> -    case X86EMUL_OPC_F2(0x0f, 0x11):     /* MOVSD */
>> -    case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* VMOVSD */
>> -    case X86EMUL_OPC(0x0f, 0x29):        /* MOVAPS */
>> -    case X86EMUL_OPC_VEX(0x0f, 0x29):    /* VMOVAPS */
>> -    case X86EMUL_OPC_66(0x0f, 0x29):     /* MOVAPD */
>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x29): /* VMOVAPD */
>> -    case X86EMUL_OPC(0x0f, 0x2b):        /* MOVNTPS */
>> -    case X86EMUL_OPC_VEX(0x0f, 0x2b):    /* VMOVNTPS */
>> -    case X86EMUL_OPC_66(0x0f, 0x2b):     /* MOVNTPD */
>> -    case X86EMUL_OPC_VEX_66(0x0f, 0x2b): /* VMOVNTPD */
> 
> Where have these gone?

Nowhere, they're not needed anymore now that the twobyte_table[]
entries no longer use DstImplicit, but DstMem.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.