
Re: [Xen-devel] [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64



>>> On 01.08.16 at 15:28, <mdontu@xxxxxxxxxxxxxxx> wrote:
> On Monday 01 August 2016 06:59:08 Jan Beulich wrote:
>> >>> On 01.08.16 at 14:53, <mdontu@xxxxxxxxxxxxxxx> wrote:  
>> > On Monday 01 August 2016 10:52:12 Andrew Cooper wrote:  
>> >> On 01/08/16 03:52, Mihai Donțu wrote:  
>> >> > Found that a Windows driver was using the SSE2 instruction MOVD.
>> >> >
>> >> > Signed-off-by: Zhi Wang <zhi.a.wang@xxxxxxxxx>
>> >> > Signed-off-by: Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>
>> >> > ---
>> >> > Picked from the XenServer 7 patch queue, as suggested by Andrew Cooper
>> >> >
>> >> > Changed since v2:
>> >> >  * handle the case where the destination is a GPR
>> >> > ---
>> >> >  xen/arch/x86/x86_emulate/x86_emulate.c | 38 +++++++++++++++++++++++++++++++---
>> >> >  1 file changed, 35 insertions(+), 3 deletions(-)
>> >> >
>> >> > diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> > index 44de3b6..9f89ada 100644
>> >> > --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> > @@ -204,7 +204,7 @@ static uint8_t twobyte_table[256] = {
>> >> >      /* 0x60 - 0x6F */
>> >> >      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> >> >      /* 0x70 - 0x7F */
>> >> > -    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> >> > +    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, ImplicitOps|ModRM,
>> >> >      /* 0x80 - 0x87 */
>> >> >      ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>> >> >      ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>> >> > @@ -4409,6 +4409,10 @@ x86_emulate(
>> >> >      case 0x6f: /* movq mm/m64,mm */
>> >> >                 /* {,v}movdq{a,u} xmm/m128,xmm */
>> >> >                 /* vmovdq{a,u} ymm/m256,ymm */
>> >> > +    case 0x7e: /* movd mm,r/m32 */
>> >> > +               /* movq mm,r/m64 */
>> >> > +               /* {,v}movd xmm,r/m32 */
>> >> > +               /* {,v}movq xmm,r/m64 */
>> >> >      case 0x7f: /* movq mm,mm/m64 */
>> >> >                 /* {,v}movdq{a,u} xmm,xmm/m128 */
>> >> >                 /* vmovdq{a,u} ymm,ymm/m256 */
>> >> > @@ -4432,7 +4436,17 @@ x86_emulate(
>> >> >                  host_and_vcpu_must_have(sse2);
>> >> >                  buf[0] = 0x66; /* SSE */
>> >> >                  get_fpu(X86EMUL_FPU_xmm, &fic);
>> >> > -                ea.bytes = (b == 0xd6 ? 8 : 16);
>> >> > +                switch ( b )
>> >> > +                {
>> >> > +                case 0x7e:
>> >> > +                    ea.bytes = 4;
>> >> > +                    break;
>> >> > +                case 0xd6:
>> >> > +                    ea.bytes = 8;
>> >> > +                    break;
>> >> > +                default:
>> >> > +                    ea.bytes = 16;
>> >> > +                }
>> >> >                  break;
>> >> >              case vex_none:
>> >> >                  if ( b != 0xe7 )
>> >> > @@ -4452,7 +4466,17 @@ x86_emulate(
>> >> >                      ((vex.pfx != vex_66) && (vex.pfx != vex_f3)));
>> >> >              host_and_vcpu_must_have(avx);
>> >> >              get_fpu(X86EMUL_FPU_ymm, &fic);
>> >> > -            ea.bytes = (b == 0xd6 ? 8 : (16 << vex.l));
>> >> > +            switch ( b )
>> >> > +            {
>> >> > +            case 0x7e:
>> >> > +                ea.bytes = 4;
>> >> > +                break;
>> >> > +            case 0xd6:
>> >> > +                ea.bytes = 8;
>> >> > +                break;
>> >> > +            default:
>> >> > +                ea.bytes = 16 << vex.l;
>> >> > +            }
>> >> >          }
>> >> >          if ( ea.type == OP_MEM )
>> >> >          {
>> >> > @@ -4468,6 +4492,14 @@ x86_emulate(
>> >> >              vex.b = 1;
>> >> >              buf[4] &= 0x38;
>> >> >          }
>> >> > +        else if ( b == 0x7e )
>> >> > +        {
>> >> > +            /* convert the GPR destination to (%rAX) */
>> >> > +            *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> >> > +            rex_prefix &= ~REX_B;
>> >> > +            vex.b = 1;
>> >> > +            buf[4] &= 0x38;
>> >> > +        }    
>> >> 
>> >> Thank you for doing this.  However, looking at it, it has some code in
>> >> common with the "ea.type == OP_MEM" clause.
>> >> 
>> >> Would this work?
>> >> 
>> >> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> index fe594ba..90db067 100644
>> >> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> @@ -4453,16 +4453,25 @@ x86_emulate(
>> >>              get_fpu(X86EMUL_FPU_ymm, &fic);
>> >>              ea.bytes = 16 << vex.l;
>> >>          }
>> >> -        if ( ea.type == OP_MEM )
>> >> +        if ( ea.type == OP_MEM || ea.type == OP_REG )
>> >>          {
>> >> -            /* XXX enable once there is ops->ea() or equivalent
>> >> -            generate_exception_if((vex.pfx == vex_66) &&
>> >> -                                  (ops->ea(ea.mem.seg, ea.mem.off)
>> >> -                                   & (ea.bytes - 1)), EXC_GP, 0); */
>> >> -            if ( b == 0x6f )
>> >> -                rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> >> -                               ea.bytes, ctxt);
>> >>              /* convert memory operand to (%rAX) */
>> >> +
>> >> +            if ( ea.type == OP_MEM )
>> >> +            {
>> >> +                /* XXX enable once there is ops->ea() or equivalent
>> >> +                   generate_exception_if((vex.pfx == vex_66) &&
>> >> +                   (ops->ea(ea.mem.seg, ea.mem.off)
>> >> +                   & (ea.bytes - 1)), EXC_GP, 0); */
>> >> +                if ( b == 0x6f )
>> >> +                    rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> >> +                                   ea.bytes, ctxt);
>> >> +            }
>> >> +            else if ( ea.type == OP_REG )
>> >> +            {
>> >> +                *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> >> +            }
>> >> +
>> >>              rex_prefix &= ~REX_B;
>> >>              vex.b = 1;
>> >>              buf[4] &= 0x38;
>> >> 
>> >> 
>> >> This is untested, but avoids duplicating this bit of state manipulation.
>> >>  
>> > 
>> > Your suggestion makes sense, but I'm starting to doubt my initial
>> > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
>> > GPR-handling route, and I can't easily prevent that with
>> > !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need to
>> > take a harder look at how that class of instructions is encoded.  
>> 
>> You obviously need to distinguish the two kinds of register sources/
>> destinations: GPRs need suitable re-writing of the instruction (without
>> having looked at the most recent version of the patch yet I btw doubt
>> converting register to memory operands is the most efficient model),
>> while MMs, XMMs, and YMMs can retain their register encoding.
> 
> Regarding efficiency, I'm not married with the approach I've proposed.
> If you can give me a few more hints, I can give it a try.

I'd rather pick a fixed register and update the regs->... field from it
after the stub has executed, e.g. using rAX and treating it just like the
return value of the "call". But maybe I'm imagining this to be easier than
it really is; as an alternative I'd then suggest really following what
Andrew said - use a pointer into regs->, not mmvalp. But (as said in the
review mail) you'd then have the problem of the missing zero-extension for
writes to 32-bit GPRs.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

