Xen project Mailing List

Re: [Xen-devel] [PATCH v4 06/44] x86emul: test for correct EVEX Disp8 scaling

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 13 Nov 2018 15:45:56 +0000


Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Tue, 13 Nov 2018 15:46:46 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Openpgp: preference=signencrypt

On 13/11/2018 11:12, Jan Beulich wrote:
>>>> On 12.11.18 at 18:42, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 25/09/18 14:29, Jan Beulich wrote:
>>> Besides the already existing tests (which are going to be extended once
>>> respective ISA extension support is complete), let's also ensure for
>>> every individual insn that their Disp8 scaling (and memory access width)
>>> are correct.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> I can see what you're attempting to do, but you now have two
>> implementations of the EVEX disp8 logic written by yourself. AFAICT,
>> this doesn't actually check that the behaviour of the instruction in
>> hardware matches your model of the instruction - it checks that two of
>> your models are the same.
> Correct, but I've specifically tried to make the two models sufficiently
> different.
>
>> The only way I can think of testing the emulator model against hardware
>> is to start with two memory area poisoned with a non-repeating pattern,
>> and a src/dst register poisoned with a different non-repeating pattern.
>> Then, execute a real instruction stub, emulate the other and memcmp()
>> the two memory regions.
> That's what some of the tests added right in patch 5 do. Did you
> intentionally skip that patch while reviewing?

I intentionally wanted to understand this patch first.

>
>> That way, a systematic error in the two models won't cancel out to "all ok".
> Hence the two different models. I certainly realize the risk you
> name.
>
>>> --- /dev/null
>>> +++ b/tools/tests/x86_emulator/evex-disp8.c
>>> @@ -0,0 +1,452 @@
>>> +#include <stdarg.h>
>>> +#include <stdio.h>
>>> +
>>> +#include "x86-emulate.h"
>> This now needs rearranging to avoid:
>>
>> x86-emulate.h:30:3: error: #error "Must not include <stdio.h> before
>> x86-emulate.h"
>> # error "Must not include <stdio.h> before x86-emulate.h"
> Yes, I've already re-based over that other change.
>
>>> +enum vl {
>>> +    VL_128,
>>> +    VL_256,
>>> +    VL_512,
>>> +};
>>> +
>>> +enum scale {
>>> +    SC_vl,
>>> +    SC_el,
>>> +};
>>> +
>>> +enum vsz {
>>> +    VSZ_vl,
>>> +    VSZ_vl_2, /* VL / 2 */
>>> +    VSZ_vl_4, /* VL / 4 */
>>> +    VSZ_vl_8, /* VL / 8 */
>>> +    /* "no broadcast" implied from here on. */
>>> +    VSZ_el,
>>> +    VSZ_el_2, /* EL * 2 */
>>> +    VSZ_el_4, /* EL * 4 */
>>> +    VSZ_el_8, /* EL * 8 */
>>> +};
>>> +
>> These acronyms get increasingly difficult to follow. What is el in this
>> context?
> VL -> vector length
> EL -> element length

Can you at least leave trailing comments after the identifiers for the
benefit of people other than you reading the code?

>
>>> +static const struct test avx512f_all[] = {
>>> +    INSN_SFP(mov, 0f, 10),
>>> +    INSN_SFP(mov, 0f, 11),
>>> +    INSN_PFP_NB(mova, 0f, 28),
>>> +    INSN_PFP_NB(mova, 0f, 29),
>>> +    INSN(movdqa32, 66, 0f, 6f, vl, d_nb, vl),
>>> +    INSN(movdqa32, 66, 0f, 7f, vl, d_nb, vl),
>>> +    INSN(movdqa64, 66, 0f, 6f, vl, q_nb, vl),
>>> +    INSN(movdqa64, 66, 0f, 7f, vl, q_nb, vl),
>>> +    INSN(movdqu32, f3, 0f, 6f, vl, d_nb, vl),
>>> +    INSN(movdqu32, f3, 0f, 7f, vl, d_nb, vl),
>>> +    INSN(movdqu64, f3, 0f, 6f, vl, q_nb, vl),
>>> +    INSN(movdqu64, f3, 0f, 7f, vl, q_nb, vl),
>>> +    INSN(movntdq, 66, 0f, e7, vl, d_nb, vl),
>>> +    INSN(movntdqa, 66, 0f38, 2a, vl, d_nb, vl),
>>> +    INSN_PFP_NB(movnt, 0f, 2b),
>>> +    INSN_PFP_NB(movu, 0f, 10),
>>> +    INSN_PFP_NB(movu, 0f, 11),
>>> +};
>>> +
>>> +static const struct test avx512f_128[] = {
>>> +    INSN(mov, 66, 0f, 6e, el, dq64, el),
>>> +    INSN(mov, 66, 0f, 7e, el, dq64, el),
>>> +    INSN(movq, f3, 0f, 7e, el, q, el),
>>> +    INSN(movq, 66, 0f, d6, el, q, el),
>>> +};
>>> +
>>> +static const struct test avx512bw_all[] = {
>>> +    INSN(movdqu8, f2, 0f, 6f, vl, b, vl),
>>> +    INSN(movdqu8, f2, 0f, 7f, vl, b, vl),
>>> +    INSN(movdqu16, f2, 0f, 6f, vl, w, vl),
>>> +    INSN(movdqu16, f2, 0f, 7f, vl, w, vl),
>>> +};
>>> +
>>> +static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };
>>> +static const unsigned char vl_128[] = { VL_128 };
>> What are these for, and why is vl_all[]'s VL_128 out of order?
> The RUN() macro invocations (further down) reference one them
> each, to indicate what vector lengths to test. The first array
> entry does always get used, while subsequent entries (if any)
> require AVX512VL to be available. See the conditional at the top
> of the inner loop in test_group().

After re-reading the apparently relevant bits of Vol 1, 2 and 3, I'm
still actually none the wiser as to which AVX512 feature bits mean what.
Is there a chapter with an overview that I've overlooked, or if not, can
we see about putting one together?

>
>>> +
>>> +/*
>>> + * This table, indicating the presence of an immediate (byte) for an opcode
>>> + * space 0f major opcode, is indexed by high major opcode byte nibble, with
>>> + * each table element then bit-indexed by low major opcode byte nibble.
>>> + */
>>> +static const uint16_t imm0f[16] = {
>>> +    [0x7] = (1 << 0x0) /* vpshuf* */ |
>>> +            (1 << 0x1) /* vps{ll,ra,rl}w */ |
>>> +            (1 << 0x2) /* vps{l,r}ld, vp{rol,ror,sra}{d,q} */ |
>>> +            (1 << 0x3) /* vps{l,r}l{,d}q */,
>>> +    [0xc] = (1 << 0x2) /* vcmp{p,s}{d,s} */ |
>>> +            (1 << 0x4) /* vpinsrw */ |
>>> +            (1 << 0x5) /* vpextrw */ |
>>> +            (1 << 0x6) /* vshufp{d,s} */,
>>> +};
>>> +
>>> +static struct x86_emulate_ops emulops;
>>> +
>>> +static unsigned int accessed[3 * 64];
>> What are the expected properties? Why 3 * ?
> See record_access(): The instructions under test all get a Disp8 value
> of 1 encoded. In order to be able to sensibly see how exactly things
> go wrong (during debugging), it simply helps to cover the entire range
> from zero to 3 times the (maximum) vector length. All accesses farther
> out of bounds than by vector length will not be recorded here, and
> hence fail "silently".

Please can you put a short description in a comment somewhere around
about here.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
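
For readers following the quoted patch, here is a minimal sketch of how the imm0f[] table and the vl / vsz enumerations might be consumed. The enums and the table are copied from the quoted hunk; the helpers has_imm8_0f(), vl_bytes() and access_bytes() are hypothetical names that do not appear in the patch. Reading enum vsz as the memory access width in bytes is an assumption based only on the "VL / 2" and "EL * 2" comments in the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the quoted patch. */
enum vl { VL_128, VL_256, VL_512 };

enum vsz {
    VSZ_vl,
    VSZ_vl_2, /* VL / 2 */
    VSZ_vl_4, /* VL / 4 */
    VSZ_vl_8, /* VL / 8 */
    VSZ_el,
    VSZ_el_2, /* EL * 2 */
    VSZ_el_4, /* EL * 4 */
    VSZ_el_8, /* EL * 8 */
};

/* Copied from the quoted patch (mnemonic comments omitted). */
static const uint16_t imm0f[16] = {
    [0x7] = (1 << 0x0) | (1 << 0x1) | (1 << 0x2) | (1 << 0x3),
    [0xc] = (1 << 0x2) | (1 << 0x4) | (1 << 0x5) | (1 << 0x6),
};

/*
 * Hypothetical helper: the high nibble of the major opcode byte selects
 * the table element, the low nibble selects the bit within it.
 */
static bool has_imm8_0f(uint8_t opc)
{
    return (imm0f[opc >> 4] >> (opc & 0xf)) & 1;
}

/* Hypothetical helper: VL_128 -> 16, VL_256 -> 32, VL_512 -> 64 bytes. */
static unsigned int vl_bytes(enum vl vl)
{
    return 16u << vl;
}

/*
 * Hypothetical helper (assumed semantics): width in bytes described by a
 * vsz value, given the vector length and the element size in bytes.
 */
static unsigned int access_bytes(enum vl vl, enum vsz vsz,
                                 unsigned int el_bytes)
{
    assert(el_bytes != 0);

    if ( vsz <= VSZ_vl_8 )
        return vl_bytes(vl) >> (vsz - VSZ_vl); /* VL, VL/2, VL/4, VL/8 */

    return el_bytes << (vsz - VSZ_el);         /* EL, EL*2, EL*4, EL*8 */
}
```

With these definitions, has_imm8_0f(0x70) is true (the vpshuf* row of the table) while has_imm8_0f(0x6f) is false, and access_bytes(VL_256, VSZ_vl_2, 4) yields 16.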
