
Re: [Xen-devel] Test if on newer xen all SSE2 and SSE3 instructions are effectively working

On 21/11/2013 16:38, Andrew Cooper wrote:
On 21/11/13 15:32, George Dunlap wrote:
On 21/11/13 15:22, Andrew Cooper wrote:
On 21/11/13 15:12, George Dunlap wrote:
On Thu, Nov 21, 2013 at 10:52 AM, Fabio Fantoni
<fabio.fantoni@xxxxxxx> wrote:
I'm trying to test whether all SSE2 and SSE3 instructions actually work
on newer Xen.
I tried this simple program to test SSE2:
But it probably only uses instructions with short operands, because SSE2
in this
program also works on old Xen 4.0, where Jan Beulich's patches for
long operands are missing.
Is there a minimal program to test whether SSE instructions doing 8-byte
MMIO work?
I don't see the code there doing MMIO -- it's just doing operations on
normal RAM, which is not emulated by Xen at all, but executed natively
by the processor.

What you need is a program that will do this to an MMIO region -- that
will be a much trickier thing to set up, I think.
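
A minimal sketch of what such a test could look like from inside a Linux guest, assuming GCC or Clang on x86-64. The helper names map_region and sse2_roundtrip are made up for illustration, and the path handed to map_region is left to the caller: on Linux a PCI BAR can usually be mapped through its "resourceN" file in sysfs, which normally needs root. The inline asm forces exactly one movdqu store and one movdqu load, so the 16-byte access is not split or optimised away by the compiler.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <emmintrin.h>   /* __m128i */

/* Map `len` bytes of a file read/write (for MMIO, for example a PCI BAR
 * "resourceN" file in sysfs). Returns NULL on failure. */
static void *map_region(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Write 16 bytes to `target` with one movdqu store, read them back with
 * one movdqu load, and report whether the data survived the round trip. */
static int sse2_roundtrip(void *target, const uint8_t pattern[16])
{
    __m128i in, out;
    uint8_t back[16];

    assert(target != NULL);
    memcpy(&in, pattern, sizeof(in));
    __asm__ volatile("movdqu %1, %0" : "=m"(*(__m128i *)target) : "x"(in));
    __asm__ volatile("movdqu %1, %0" : "=x"(out) : "m"(*(__m128i *)target));
    memcpy(back, &out, sizeof(back));
    return memcmp(back, pattern, sizeof(back)) == 0;
}
```

Run against ordinary RAM this only exercises the native path. It says something about emulation only when the target is a mapping that Xen hands to qemu, and the read-back comparison is only meaningful for memory-like regions such as video RAM, not for device registers with side effects.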

The problem with SSE is only when the guest performs an SSE (or larger)
operation on a piece of memory which ends up being emulated and handed
to qemu.  The ioreq protocol doesn't have a way of signalling an operand
width greater than 64 bits.
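
To illustrate the limit, here is a sketch of how a 128-bit store would have to be broken up when a single request can carry at most 64 bits. The struct mmio_req and the function split_write are hypothetical and are not Xen's real ioreq structure or code; they only model the constraint described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REQ_BYTES 8u   /* widest single request in this sketch: 64 bits */

/* Hypothetical request record; NOT Xen's real ioreq layout. */
struct mmio_req {
    uint64_t addr;   /* guest-physical address of this piece */
    uint32_t size;   /* bytes in this piece, at most MAX_REQ_BYTES */
    uint64_t data;   /* payload, filled from the lowest address upwards */
};

/* Split a `len`-byte write into requests of at most 8 bytes each.
 * Returns the number of requests stored in `out`, or 0 if `out` is
 * too small to hold them all. */
static size_t split_write(uint64_t addr, const void *buf, size_t len,
                          struct mmio_req *out, size_t max_out)
{
    const uint8_t *src = buf;
    size_t n = 0;

    assert(buf != NULL && out != NULL);
    while (len > 0) {
        size_t chunk = len < MAX_REQ_BYTES ? len : MAX_REQ_BYTES;

        if (n == max_out)
            return 0;
        out[n].addr = addr;
        out[n].size = (uint32_t)chunk;
        out[n].data = 0;
        memcpy(&out[n].data, src, chunk);
        addr += chunk;
        src += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

A 16-byte movdqu store therefore becomes two 8-byte requests at consecutive addresses, which also means the device model no longer sees the access as one atomic operation.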
I'd like to emphasize the "and" in the first sentence.  You might be
able to trigger a Xen emulation in any number of ways (disabling HAP
and then doing an SSE instruction on an in-use PT might do it).  But
Xen allegedly already does the actual emulation correctly -- as Andy
said, it's only the path to qemu that wasn't working before.


Oops yes - I should have emphasised that a bit more.  I believe Jan
submitted a hacked-fix for the qemu path which fixes the immediate issue
(for 128bit emulation) but is in need of a redesign for wider emulation;
256bit is available with AVX, and 512bit is on its way with AVX-512.

As for testing individual instructions, there is
tools/tests/x86_emulator/test_x86_emulator.c which tests a token few
instructions against Xen's emulation code, but it is far from comprehensive.


Thanks for all the replies.
I tried the x86_emulator test on dom0, and the SSE2 instructions seem OK:
Testing addl %%ecx,(%%eax)...           okay
Testing addl %%ecx,%%eax...             okay
Testing xorl (%%eax),%%ecx...           okay
Testing movl (%%eax),%%ecx...           okay
Testing lock cmpxchgb %%cl,(%%ebx)...   okay
Testing lock cmpxchgb %%cl,(%%ebx)...   okay
Testing xchgl %%ecx,(%%eax)...          okay
Testing lock cmpxchgl %%ecx,(%%ebx)...  okay
Testing rep movsw...                    okay
Testing btrl $0x1,(%edi)...             okay
Testing btrl %eax,(%edi)...             okay
Testing cmpxchg8b (%edi) [succeeding]...okay
Testing cmpxchg8b (%edi) [failing]...   okay
Testing movsxbd (%%eax),%%ecx...        okay
Testing movzxwd (%%eax),%%ecx...        okay
Testing movsxd (%%rax),%%rcx...         okay
Testing xadd %%ax,(%%ecx)...            okay
Testing dec %%ax...                     okay
Testing lea 8(%%ebp),%%eax...           okay
Testing daa/das (all inputs)...         skipped
Testing movq %mm3,(%ecx)...             okay
Testing movq (%edx),%mm5...             okay
Testing movdqu %xmm2,(%ecx)...          okay
Testing movdqu (%edx),%xmm4...          okay
Testing vmovdqu %ymm2,(%ecx)...         skipped
Testing vmovdqu (%edx),%ymm4...         skipped
Testing movsd %xmm5,(%ecx)...           okay
Testing movaps (%edx),%xmm7...          okay
Testing vmovsd %xmm5,(%ecx)...          skipped
Testing vmovaps (%edx),%ymm7...         skipped
Testing blowfish 32-bit code sequence..................................okay
Testing blowfish 64-bit code sequence.................................okay
Testing blowfish native execution...    okay

Same result on Linux HVM domUs.

I'm trying to verify whether SSE2 is fully working because Anthony's earlier debugging of the qxl problem on Linux HVM domUs showed an error on SSE2 instructions.
After Jan Beulich's patches these errors went away.
Qxl on Windows 7 Pro 64-bit domUs with the qxl driver installed works, but it has a big performance problem on screen refresh, the same as before Jan Beulich's patches.
The Windows qxl driver code seems to use SSE2:
void CheckAndSetSSE2()
{
    __asm {
        mov eax, 0x0000001      ; CPUID leaf 1: feature flags
        cpuid
        and edx, 0x4000000      ; EDX bit 26 = SSE2
        mov have_sse2, edx
    }

    if (have_sse2) {
        have_sse2 = TRUE;
    }
}
Some time ago I tried to disable SSE through cpuid in the xl cfg, but Windows did not start, if I remember correctly. I have no knowledge of SSE, but a quick search of the qxl driver code shows other SSE instructions that are missing from the Xen x86_emulator test.
Here is a copy of two parts of the qxl driver code:
static _inline void fast_memcpy_unaligment(void *dest, const void *src, size_t len)
        mov ecx, len
        mov esi, src
        mov edi, dest

        cmp ecx, 128
        jb try_to_copy64

        prefetchnta [esi]
            prefetchnta [esi + 64]

            movdqu xmm0, [esi]
            movdqu xmm1, [esi + 16]
            movdqu xmm2, [esi + 32]
            movdqu xmm3, [esi + 48]

            prefetchnta [esi + 128]

            movntdq [edi], xmm0
            movntdq [edi + 16], xmm1
            movntdq [edi + 32], xmm2
            movntdq [edi + 48], xmm3

            movdqu xmm0, [esi + 64]
            movdqu xmm1, [esi + 80]
            movdqu xmm2, [esi + 96]
            movdqu xmm3, [esi + 112]

            movntdq [edi + 64], xmm0
            movntdq [edi + 80], xmm1
            movntdq [edi + 96], xmm2
            movntdq [edi + 112], xmm3

            add edi, 128
            add esi, 128
            sub ecx, 128
            cmp ecx, 128
            jae copy_128

            cmp ecx, 64
            jb try_to_copy32

             movdqu xmm0, [esi]
             movdqu xmm1, [esi + 16]
             movdqu xmm2, [esi + 32]
             movdqu xmm3, [esi + 48]

             movntdq [edi], xmm0
             movntdq [edi + 16], xmm1
             movntdq [edi + 32], xmm2
             movntdq [edi + 48], xmm3

             add edi, 64
             add esi, 64
             sub ecx, 64
             prefetchnta [esi]

             cmp ecx, 32
             jb try_to_copy16

             movdqu xmm0, [esi]
             movdqu xmm1, [esi + 16]
             movntdq [edi], xmm0
             movntdq [edi + 16], xmm1

             add edi, 32
             add esi, 32
             sub ecx, 32

             cmp ecx, 16
             jb try_to_copy4

             movdqu xmm0, [esi]
             movntdq [edi], xmm0

             add edi, 16
             add esi, 16
             sub ecx, 16

            cmp ecx, 4
            jb try_to_copy_1
            sub ecx, 4
            jmp try_to_copy4

            rep movsb


static _inline void SaveFPU(PDev *pdev, UINT8 FPUSave[])
    void *align_addr =  (void *)ALIGN((size_t)(FPUSave), SSE_ALIGN);

        mov edi, align_addr

        movdqa [edi], xmm0
        movdqa [edi + 16], xmm1
        movdqa [edi + 32], xmm2
        movdqa [edi + 48], xmm3

Is it possible to know whether these instructions currently work fully on HVM domUs, please?
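
One way to exercise the same instruction mix from a guest is a small copy routine built on the SSE2 intrinsics, sketched here under the assumption of GCC or Clang on x86-64. The name stream_copy16 is made up for illustration. _mm_loadu_si128 and _mm_stream_si128 normally compile to movdqu and movntdq, although the compiler is not strictly required to emit exactly those instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Copy `len` bytes using unaligned 16-byte loads and non-temporal
 * 16-byte stores. `dst` must be 16-byte aligned and `len` must be a
 * multiple of 16; `src` may have any alignment. */
static void stream_copy16(void *dst, const void *src, size_t len)
{
    __m128i *d = dst;
    const __m128i *s = src;

    assert(((uintptr_t)dst & 15) == 0 && (len & 15) == 0);
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
    _mm_sfence();   /* order the non-temporal stores before later accesses */
}
```

As with any such test, running it on ordinary RAM only shows that the processor executes the instructions natively. The emulated path is reached only when the destination is a region that Xen forwards to qemu.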

Details of the latest debugging of the Linux domU tests with qxl are here, if you want to see them as well:
It gave this error, but I don't know whether it is related:
ioremap error for 0xfc001000-0xfc002000, requested 0x10, got 0x0

Thanks for any reply.

Xen-devel mailing list


