[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/x86: Work around Clang code generation bug with asm parameters
On 12.11.2020 09:14, Jan Beulich wrote: > On 11.11.2020 21:01, Andrew Cooper wrote: >> On 11/11/2020 15:11, Jan Beulich wrote: >>> On 11.11.2020 13:45, Andrew Cooper wrote: >>>> Clang 9 and later don't handle the clobber of %r10 correctly in >>>> _hypercall64_4(). See https://bugs.llvm.org/show_bug.cgi?id=48122 >>> Are you sure this is a bug? >> >> Yes. >> >>> With ... >>> >>>> #define _hypercall64_4(type, hcall, a1, a2, a3, a4) \ >>>> ({ \ >>>> - long res, tmp__; \ >>>> - register long _a4 asm ("r10") = ((long)(a4)); \ >>>> + long res, _a1 = (long)(a1), _a2 = (long)(a2), \ >>>> + _a3 = (long)(a3); \ >>>> + register long _a4 asm ("r10") = (long)(a4); \ >>>> asm volatile ( \ >>>> "call hypercall_page + %c[offset]" \ >>>> - : "=a" (res), "=D" (tmp__), "=S" (tmp__), "=d" (tmp__), \ >>>> - "=&r" (tmp__) ASM_CALL_CONSTRAINT \ >>> ... this we've requested "any register", while with ... >>> >>>> - : [offset] "i" (hcall * 32), \ >>>> - "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3)), \ >>>> - "4" (_a4) \ >>> ... this we've asked for that specific register to be initialized >>> from r10 (and without telling the compiler that r10 is going to >>> change). >> >> Consider applying that same reasoning to "1" instead of "4". In that >> case, a1 would no longer be bound to %rdi. > > That's different: "=D" specifies the register, and "1" says "use > the same register as input". Whereas, as said, "=&r" says "use > any register" with "1" saying "use the same register" and (_a4) > specifying where the value is to come from. > >> The use of "4" explicitly binds the input and the output, which includes >> requiring them to be the same register. >> >> Furthermore, LLVM tends to consider "not behaving in the same was as >> GCC" a bug. > > That's a fair statement, but then still the description wants > re-wording. Plus of course future gcc is free to change their > behavior to that currently observed with clang. > > Consider the following example (on an arch where "f" is a > floating point register and there are ways to copy directly > between GPR and floating point registers: > > int i; > register float f asm("f7") = <input>; > asm("..." : "=r" (i) : "0" (f)); > > In this case obviously f7 can't be used for i (as it doesn't > match "r"). It's merely that the initial value of i is to come > from f7. In fact for Arm64 this > > extern float flt; > > int test(void) { > int i; > register float f asm("s7") = flt; > asm("add %0,%0,5" : "=r" (i) : "0" (f)); > return i; > } > > behaves exactly as described: > > test: > adrp x0, flt > ldr s7, [x0, @lo12(flt)] > fmov w0, s7 > add x0, x0, #5 > ret > > (Whether fmov is a sensible choice here is a different question; > I'd have expected some fcvt*.) Meanwhile I've realized that I neither need to resort to Arm here, nor to floating point, e.g. int test2(int in) { int i; register int ri asm("ecx") = in; asm("nop %0" : "=r" (i) : "0" (ri)); return i; } You'll find that the resulting code (at -O2; gcc 10.2.0) doesn't use %ecx at all - %edi gets moved directly to %eax. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |