Re: [PATCH v6 08/20] xen/riscv: introduce cmpxchg.h
On Fri, 2024-03-22 at 11:42 +0100, Jan Beulich wrote:
> On 22.03.2024 11:32, Oleksii wrote:
> > On Thu, 2024-03-21 at 13:07 +0100, Jan Beulich wrote:
> > > On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > > > The header was taken from Linux kernel 6.4.0-rc1.
> > > >
> > > > Additionally, the following were updated:
> > > > * add emulation of {cmp}xchg for 1/2 byte types using 32-bit
> > > >   atomic access.
> > > > * replace tabs with spaces
> > > > * replace __* variable with *__
> > > > * introduce generic version of xchg_* and cmpxchg_*.
> > > > * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use
> > > >   them
> > >
> > > With this, ...
> > >
> > > > * drop barriers and use instruction suffixes instead ( .aq,
> > > >   .rl, .aqrl )
> > > >
> > > > Implementation of 4- and 8-byte cases were updated according to
> > > > the spec:
> > > > ```
> > > > ....
> > > > Linux Construct        RVWMO AMO Mapping
> > > > atomic <op> relaxed    amo<op>.{w|d}
> > > > atomic <op> acquire    amo<op>.{w|d}.aq
> > > > atomic <op> release    amo<op>.{w|d}.rl
> > > > atomic <op>            amo<op>.{w|d}.aqrl
> > > >
> > > > Linux Construct        RVWMO LR/SC Mapping
> > > > atomic <op> relaxed    loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> > > > atomic <op> acquire    loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> > > > atomic <op> release    loop: lr.{w|d}; <op>; sc.{w|d}.aqrl* ; bnez loop
> > > >                        OR
> > > >                        fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}* ; bnez loop
> > > > atomic <op>            loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> > > >
> > > > Table A.5: Mappings from Linux memory primitives to RISC-V
> > > > primitives
> > > > ```
> > >
> > > ... I consider quoting this table in full, without any further
> > > remarks, as confusing: Three of the lines each are inapplicable
> > > now, aiui.
> > I'll shorten the table then.
> >
> > > Further, what are the two * telling us? Quite likely they aren't
> > > there just accidentally.
> > >
> > > Finally, why sc.{w|d}.aqrl when in principle one would expect
> > > just sc.{w|d}.rl?
> > Because according to the last line of table A.5:
> >     atomic <op>    loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> > the sc.{w|d}.aqrl form is used there, so I decided to stick to what
> > is mentioned in the table.
> I understand it's mentioned that way in the table. But it being that
> way is not explained anywhere. Hence my "Why?"
>
> > > > +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> > > > +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
> > >
> > > The casts aren't very nice to have here; I take it they're needed
> > > for cmpxchg_ptr() to compile?
> > Not really, I have not faced a compilation issue.
> > The reason why they were added is that the lr instruction places the
> > sign-extended value in the destination register, but without the
> > casts the values for old and new were generated without sign
> > extension, so, for example:
> >     u32 = 0xbbbbbbbb;
> >     cmpxchg(&u32, 0xbbbbbbbb, 0xCCCCCCCC);
> > will fail because after:
> >     "0: lr" lr_sfx " %0, %2\n"
> > in %0 we will have 0xFFFFFFFFBBBBBBBB, but in %3 we will have
> > 0xBBBBBBBB, so in
> >     "   bne %0, %z3, 1f\n"
> > %0 and %3 are always unequal when the most significant bit of the
> > value read from %2 is 1.
> I'm afraid I don't follow: It's the type conversion you're after, but
> that would happen also with the casts omitted.
Yes, I agree it would happen also with the casts omitted, and it was pure
luck that with the casts the compiler used an immediate with the most
significant bit = 1:

    ffffffffc00397f0: bbbbc7b7  lui      a5,0xbbbbc
    ffffffffc00397f4: bbb78793  add      a5,a5,-1093  # ffffffffbbbbbbbb <start-0x4444445>
    ffffffffc00397f8: fef42623  sw       a5,-20(s0)
    ffffffffc00397fc: ccccd737  lui      a4,0xccccd
    ffffffffc0039800: ccc7071b  addw     a4,a4,-820   # ffffffffcccccccc <__bss_end+0xcc7ff44>
    ffffffffc0039804: 56fd      li       a3,-1
    ffffffffc0039806: 9281      srl      a3,a3,0x20
    ffffffffc0039808: fec40513  add      a0,s0,-20
    ffffffffc003980c: 140525af  lr.w.aq  a1,(a0)
    ffffffffc0039810: 00f59563  bne      a1,a5,ffffffffc003981a <start_xen+0x4e>
    ffffffffc0039814: 1ee5262f  sc.w.aqrl a2,a4,(a0)

I will update the code with the mask mentioned below to be sure that a5
always has the correct value.

~ Oleksii

> > But now I realized that it would be better just to use a mask and
> > not be dependent on compiler code generation, so it would be better
> > to do it in the following way ( of course, the macro should be
> > updated according to the remarks you gave me ):
> >
> > #define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> > ({ \
> >     register unsigned int rc; \
> >     unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0); \
> >     asm volatile( \
> >         "0: lr" lr_sfx " %0, %2\n" \
> >         "   and %0, %0, %z[mask]\n" \
> >         "   bne %0, %z3, 1f\n" \
> >         "   sc" sc_sfx " %1, %z4, %2\n" \
> >         "   bnez %1, 0b\n" \
> >         "1:\n" \
> >         : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> >         : "rJ" (old), "rJ" (new), [mask] "rJ" (mask) \
> >         : "memory"); \
> > })
> It'll be up to you whether to switch to such an alternative.
>
> Jan