Re: [PATCH v6 08/20] xen/riscv: introduce cmpxchg.h
On Thu, 2024-03-21 at 13:07 +0100, Jan Beulich wrote:
> On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > The header was taken from Linux kernel 6.4.0-rc1.
> >
> > Additionally, the following were updated:
> > * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
> >   access.
> > * replace tabs with spaces
> > * replace __* variables with *__
> > * introduce generic version of xchg_* and cmpxchg_*.
> > * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
>
> With this, ...
>
> > * drop barriers and use instruction suffixes instead ( .aq, .rl,
> >   .aqrl )
> >
> > The implementation of the 4- and 8-byte cases was updated according
> > to the spec:
> > ```
> >  ....
> >  Linux Construct       RVWMO AMO Mapping
> >  atomic <op> relaxed   amo<op>.{w|d}
> >  atomic <op> acquire   amo<op>.{w|d}.aq
> >  atomic <op> release   amo<op>.{w|d}.rl
> >  atomic <op>           amo<op>.{w|d}.aqrl
> >
> >  Linux Construct       RVWMO LR/SC Mapping
> >  atomic <op> relaxed   loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> >  atomic <op> acquire   loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> >  atomic <op> release   loop: lr.{w|d}; <op>; sc.{w|d}.aqrl* ; bnez loop
> >                        OR fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}* ; bnez loop
> >  atomic <op>           loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> >
> >  Table A.5: Mappings from Linux memory primitives to RISC-V primitives
> > ```
>
> ... I consider quoting this table in full, without any further remarks,
> as confusing: Three of the lines each are inapplicable now, aiui.
I'll shorten the table then.

> Further, what are the two * telling us? Quite likely they aren't there
> just accidentally.
>
> Finally, why sc.{w|d}.aqrl when in principle one would expect just
> sc.{w|d}.rl?
Because, according to the last line of table A.5:
    atomic <op>           loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
the sc.{w|d}.aqrl form is used there, so I decided to stick to what is
mentioned in the table.

> > +})
> > +
> > +static always_inline unsigned long __xchg(volatile void *ptr, unsigned long new, int size)
> > +{
> > +    unsigned long ret;
> > +
> > +    switch ( size )
> > +    {
> > +    case 1:
> > +        ret = emulate_xchg_1_2((volatile uint8_t *)ptr, new, ".aq", ".aqrl");
> > +        break;
> > +    case 2:
> > +        ret = emulate_xchg_1_2((volatile uint16_t *)ptr, new, ".aq", ".aqrl");
> > +        break;
> > +    case 4:
> > +        __amoswap_generic((volatile uint32_t *)ptr, new, ret, ".w.aqrl");
> > +        break;
> > +#ifndef CONFIG_32BIT
>
> There's no 32BIT Kconfig symbol; all we have is a 64BIT one.
I meant here CONFIG_RISCV_32.

> > +    case 8:
> > +        __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
> > +        break;
> > +#endif
> > +    default:
> > +        STATIC_ASSERT_UNREACHABLE();
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +#define xchg(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) n_ = (x); \
> > +    (__typeof__(*(ptr))) \
> > +    __xchg((ptr), (unsigned long)(n_), sizeof(*(ptr))); \
>
> Nit: While excess parentheses "only" harm readability, they would
> nevertheless better be omitted (here: the first argument passed).
>
> > +})
> > +
> > +#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> > + ({ \
> > +    register unsigned int rc; \
>
> Nit: We don't normally use "register", unless accompanied by asm() tying
> a variable to a specific one.
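For readers following the thread, here is a rough, hypothetical sketch of how
a size-dispatching cmpxchg wrapper analogous to the quoted __xchg() could tie
into __generic_cmpxchg(). The emulate_cmpxchg_1_2() helper and the exact
suffix strings below are assumptions for illustration only, not code taken
from the patch:

static always_inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
                                             unsigned long new, int size)
{
    unsigned long ret;

    switch ( size )
    {
    case 1:
        /* Hypothetical 1/2-byte emulation helper, mirroring emulate_xchg_1_2(). */
        ret = emulate_cmpxchg_1_2((volatile uint8_t *)ptr, old, new, ".aq", ".aqrl");
        break;
    case 2:
        ret = emulate_cmpxchg_1_2((volatile uint16_t *)ptr, old, new, ".aq", ".aqrl");
        break;
    case 4:
        __generic_cmpxchg((volatile uint32_t *)ptr, old, new, ret, ".w.aq", ".w.aqrl");
        break;
#ifndef CONFIG_RISCV_32
    case 8:
        __generic_cmpxchg((volatile uint64_t *)ptr, old, new, ret, ".d.aq", ".d.aqrl");
        break;
#endif
    default:
        STATIC_ASSERT_UNREACHABLE();
    }

    return ret;
}

A cmpxchg(ptr, o, n) macro could then convert through unsigned long and back
to __typeof__(*(ptr)), in the same way the quoted xchg() macro does.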
> > +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> > +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
>
> The casts aren't very nice to have here; I take they're needed for
> cmpxchg_ptr() to compile?
Not really, I have not faced a compilation issue. The reason the casts were
added is that the lr instruction places the sign-extended value in the
destination register, but without the casts the values for old and new were
generated without sign extension, so, for example:
    u32 = 0xbbbbbbbb;
    cmpxchg(&u32, 0xbbbbbbbb, 0xCCCCCCCC);
will fail, because after
    "0: lr" lr_sfx " %0, %2\n"
%0 contains 0xFFFFFFFFBBBBBBBB while %3 contains 0xBBBBBBBB, so in
    "   bne  %0, %z3, 1f\n"
%0 and %3 compare unequal whenever the most significant bit of the value
read from %2 is 1.

But now I realize that it would be better just to use a mask and not depend
on compiler code generation, so I would do it in the following way ( of
course, the macro should also be updated according to the remarks you gave
me ):

#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
 ({ \
    register unsigned int rc; \
    unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0); \
    asm volatile( \
        "0: lr" lr_sfx " %0, %2\n" \
        "   and  %0, %0, %z[mask]\n" \
        "   bne  %0, %z3, 1f\n" \
        "   sc" sc_sfx " %1, %z4, %2\n" \
        "   bnez %1, 0b\n" \
        "1:\n" \
        : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
        : "rJ" (old), "rJ" (new), [mask] "rJ" (mask) \
        : "memory"); \
 })

> > +        asm volatile( \
>
> Nit: Missing blank once again. Would be really nice if you could go
> through and sort this uniformly for the series.
>
> > +            "0: lr" lr_sfx " %0, %2\n" \
> > +            "   bne  %0, %z3, 1f\n" \
> > +            "   sc" sc_sfx " %1, %z4, %2\n" \
> > +            "   bnez %1, 0b\n" \
> > +            "1:\n" \
> > +            : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> > +            : "rJ" (old__), "rJ" (new__) \
>
> Please could I talk you into using named operands here, too?
Sure, I will add them.

~ Oleksii
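To make the failure mode described above concrete, here is a small,
self-contained user-space sketch (not part of the patch) that mimics the
effect of lr.w sign-extending the loaded 32-bit value, and shows how masking
to the operand width makes the comparison behave as intended:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t mem = 0xbbbbbbbb;

    /* What lr.w leaves in a 64-bit register: the sign-extended value. */
    uint64_t loaded = (uint64_t)(int64_t)(int32_t)mem;  /* 0xffffffffbbbbbbbb */

    /* The "old" comparand as a zero-extended 64-bit value. */
    uint64_t old = 0xbbbbbbbbULL;

    /* Naive comparison fails although the stored 32-bit values match. */
    printf("naive compare:  %s\n", loaded == old ? "equal" : "unequal");

    /* Masking to the operand width (GENMASK(31, 0)) fixes the comparison. */
    uint64_t mask = 0xffffffffULL;
    printf("masked compare: %s\n", (loaded & mask) == old ? "equal" : "unequal");

    return 0;
}

This is the same reason the proposed macro applies the GENMASK-derived mask
to %0 before the bne.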