[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h
On Tue, 2024-01-23 at 11:28 +0100, Jan Beulich wrote: > On 23.01.2024 11:15, Oleksii wrote: > > On Mon, 2024-01-22 at 17:27 +0100, Jan Beulich wrote: > > > On 22.12.2023 16:12, Oleksii Kurochko wrote: > > > > + : "=r" (ret__), "+A" (*ptr__) \ > > > > + : "r" (new__) \ > > > > + : "memory" ); \ > > > > + break; \ > > > > + case 8: \ > > > > + asm volatile( \ > > > > + " amoswap.d %0, %2, %1\n" \ > > > > + : "=r" (ret__), "+A" (*ptr__) \ > > > > + : "r" (new__) \ > > > > + : "memory" ); \ > > > > + break; \ > > > > + default: \ > > > > + ASSERT_UNREACHABLE(); \ > > > > > > If at all possible this wants to trigger a build failure, not a > > > runtime > > > one. > > I'll update that with BUILD_BUG_ON(size > 8); > > What about size accidentally being e.g. 5? I'm also not sure you'll > be > able to use BUILD_BUG_ON() here: That'll depend on what use sites > there > are. And if all present ones are okay in this regard, you'd still set > out a trap for someone else to fall into later. We have a common > approach for this, which currently is under re-work. See > https://lists.xen.org/archives/html/xen-devel/2024-01/msg01115.html. Thanks. I'll use that. > > > > > + } \ > > > > + ret__; \ > > > > +}) > > > > + > > > > +#define xchg_relaxed(ptr, x) \ > > > > +({ \ > > > > + __typeof__(*(ptr)) x_ = (x); \ > > > > + (__typeof__(*(ptr))) __xchg_relaxed((ptr), x_, > > > > sizeof(*(ptr))); \ > > > > > > Nit: Stray blank after cast. For readability I'd also suggest to > > > drop parentheses in cases like the first argument passed to > > > __xchg_relaxed() here. > > Thanks. I'll take that into account. > > > > > > > > > +}) > > > > > > For both: What does "relaxed" describe? I'm asking because it's > > > not > > > really clear whether the memory clobbers are actually needed. > > > > > > > +#define __xchg_acquire(ptr, new, size) \ > > > > +({ \ > > > > + __typeof__(ptr) ptr__ = (ptr); \ > > > > + __typeof__(new) new__ = (new); \ > > > > + __typeof__(*(ptr)) ret__; \ > > > > + switch (size) \ > > > > + { \ > > > > + case 4: \ > > > > + asm volatile( \ > > > > + " amoswap.w %0, %2, %1\n" \ > > > > + RISCV_ACQUIRE_BARRIER \ > > > > + : "=r" (ret__), "+A" (*ptr__) \ > > > > + : "r" (new__) \ > > > > + : "memory" ); \ > > > > + break; \ > > > > + case 8: \ > > > > + asm volatile( \ > > > > + " amoswap.d %0, %2, %1\n" \ > > > > + RISCV_ACQUIRE_BARRIER \ > > > > + : "=r" (ret__), "+A" (*ptr__) \ > > > > + : "r" (new__) \ > > > > + : "memory" ); \ > > > > + break; \ > > > > + default: \ > > > > + ASSERT_UNREACHABLE(); \ > > > > + } \ > > > > + ret__; \ > > > > +}) > > > > > > If I'm not mistaken this differs from __xchg_relaxed() only in > > > the > > > use > > > of RISCV_ACQUIRE_BARRIER, and ... > > > > > > > +#define xchg_acquire(ptr, x) \ > > > > +({ \ > > > > + __typeof__(*(ptr)) x_ = (x); \ > > > > + (__typeof__(*(ptr))) __xchg_acquire((ptr), x_, > > > > sizeof(*(ptr))); \ > > > > +}) > > > > + > > > > +#define __xchg_release(ptr, new, size) \ > > > > +({ \ > > > > + __typeof__(ptr) ptr__ = (ptr); \ > > > > + __typeof__(new) new__ = (new); \ > > > > + __typeof__(*(ptr)) ret__; \ > > > > + switch (size) \ > > > > + { \ > > > > + case 4: \ > > > > + asm volatile ( \ > > > > + RISCV_RELEASE_BARRIER \ > > > > + " amoswap.w %0, %2, %1\n" \ > > > > + : "=r" (ret__), "+A" (*ptr__) \ > > > > + : "r" (new__) \ > > > > + : "memory"); \ > > > > + break; \ > > > > + case 8: \ > > > > + asm volatile ( \ > > > > + RISCV_RELEASE_BARRIER \ > > > > + " amoswap.d %0, %2, %1\n" \ > > > > + : "=r" (ret__), "+A" (*ptr__) \ > > > > + : "r" (new__) \ > > > > + : "memory"); \ > > > > + break; \ > > > > + default: \ > > > > + ASSERT_UNREACHABLE(); \ > > > > + } \ > > > > + ret__; \ > > > > +}) > > > > > > this only in the use of RISCV_RELEASE_BARRIER. If so they likely > > > want > > > folding, to limit redundancy and make eventual updating easier. > > > (Same > > > for the cmpxchg helper further down, as it seems.) > > Yes, you are right. The difference is only in RISCV_RELEASE_BARRIER > > and > > it is an absence of RISCV_RELEASE_BARRIER is a reason why we have > > "relaxed" versions. > > > > I am not sure that I understand what you mean by folding here. Do > > you > > mean that there is no any sense to have to separate macros and it > > is > > needed only one with RISCV_RELEASE_BARRIER? > > No. You should parameterize the folded common macro for the derived > macros to simply pass in the barriers needed, with empty macro > arguments indicating "this barrier not needed". Now it makes sense to me. Thank you for explanation. > > > > > +#define xchg_release(ptr, x) \ > > > > +({ \ > > > > + __typeof__(*(ptr)) x_ = (x); \ > > > > + (__typeof__(*(ptr))) __xchg_release((ptr), x_, > > > > sizeof(*(ptr))); \ > > > > +}) > > > > + > > > > +static always_inline uint32_t __xchg_case_4(volatile uint32_t > > > > *ptr, > > > > + uint32_t new) > > > > +{ > > > > + __typeof__(*(ptr)) ret; > > > > + > > > > + asm volatile ( > > > > + " amoswap.w.aqrl %0, %2, %1\n" > > > > + : "=r" (ret), "+A" (*ptr) > > > > + : "r" (new) > > > > + : "memory" ); > > > > + > > > > + return ret; > > > > +} > > > > + > > > > +static always_inline uint64_t __xchg_case_8(volatile uint64_t > > > > *ptr, > > > > + uint64_t new) > > > > +{ > > > > + __typeof__(*(ptr)) ret; > > > > + > > > > + asm volatile( \ > > > > + " amoswap.d.aqrl %0, %2, %1\n" \ > > > > + : "=r" (ret), "+A" (*ptr) \ > > > > + : "r" (new) \ > > > > + : "memory" ); \ > > > > + > > > > + return ret; > > > > +} > > > > + > > > > +static always_inline unsigned short __cmpxchg_case_2(volatile > > > > uint32_t *ptr, > > > > + uint32_t > > > > old, > > > > + uint32_t > > > > new); > > > > > > Don't you consistently mean uint16_t here (incl the return type) > > > and > > > ... > > > > > > > +static always_inline unsigned short __cmpxchg_case_1(volatile > > > > uint32_t *ptr, > > > > + uint32_t > > > > old, > > > > + uint32_t > > > > new); > > > > > > ... uint8_t here? > > The idea was that we emulate __cmpxchg_case_1 and __cmpxchg_case_2 > > using 4 bytes cmpxchg and __cmpxchg_case_4 expects uint32_t. > > I consider this wrong. The functions would better be type-correct. > > > > > +static inline unsigned long __xchg(volatile void *ptr, > > > > unsigned > > > > long x, int size) > > > > +{ > > > > + switch (size) { > > > > + case 1: > > > > + return __cmpxchg_case_1(ptr, (uint32_t)-1, x); > > > > + case 2: > > > > + return __cmpxchg_case_2(ptr, (uint32_t)-1, x); > > > > > > How are these going to work? You'll compare against ~0, and if > > > the > > > value > > > in memory isn't ~0, memory won't be updated; you will only > > > (correctly) > > > return the value found in memory. > > > > > > Or wait - looking at __cmpxchg_case_{1,2}() far further down, you > > > ignore > > > "old" there. Which apparently means they'll work for the use > > > here, > > > but > > > not for the use in __cmpxchg(). > > Yes, the trick is that old is ignored and is read in > > __emulate_cmpxchg_case1_2() before __cmpxchg_case_4 is called: > > do > > { > > read_val = > > read_func(aligned_ptr); > > swapped_new = read_val & > > ~mask; > > swapped_new |= > > masked_new; > > ret = cmpxchg_func(aligned_ptr, read_val, > > swapped_new); > > } while ( ret != read_val > > ); > > read_val it is 'old'. > > > > But now I am not 100% sure that it is correct for __cmpxchg... > > It just can't be correct - you can't ignore "old" there. I think you > want simple cmpxchg primitives, which xchg then uses in a loop (while > cmpxchg uses them plainly). But xchg doesn't require 'old' value, so it should be ignored in some way by cmpxchg. > > > > > +static always_inline unsigned short __cmpxchg_case_2(volatile > > > > uint32_t *ptr, > > > > + uint32_t > > > > old, > > > > + uint32_t > > > > new) > > > > +{ > > > > + (void) old; > > > > + > > > > + if (((unsigned long)ptr & 3) == 3) > > > > + { > > > > +#ifdef CONFIG_64BIT > > > > + return __emulate_cmpxchg_case1_2((uint64_t *)ptr, new, > > > > + readq, > > > > __cmpxchg_case_8, > > > > 0xffffU); > > > > > > What if ((unsigned long)ptr & 7) == 7 (which is a sub-case of > > > what > > > the > > > if() above checks for? Isn't it more reasonable to require > > > aligned > > > 16-bit quantities here? Or if mis-aligned addresses are okay, you > > > could > > > as well emulate using __cmpxchg_case_4(). > > Yes, it will be more reasonable. I'll use IS_ALIGNED instead. > > Not sure I get your use of "instead" here correctly. There's more > to change here than just the if() condition. I meant something like: if ( IS_ALIGNED(ptr, 16) ) __emulate_cmpxchg_case1_2(...); else assert_failed("ptr isn't aligned\n"); I have to check if it is necessary to have an mis-aligned address for CAS intstuctions. If mis-aligned address is okay then it looks like we can use always __cmpxchng_case_4 without checking if a ptr is ALIGNED or not. Or I started to loose something... ~ Oleksii
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |