Re: [PATCH v6 08/20] xen/riscv: introduce cmpxchg.h
On 15.03.2024 19:06, Oleksii Kurochko wrote:
> The header was taken from Linux kernl 6.4.0-rc1.
> 
> Addionally, were updated:
> * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
>   access.
> * replace tabs with spaces
> * replace __* variale with *__
> * introduce generic version of xchg_* and cmpxchg_*.
> * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them

With this, ...

> * drop barries and use instruction suffixices instead ( .aq, .rl, .aqrl )
> 
> Implementation of 4- and 8-byte cases were updated according to the spec:
> ```
>  ....
>  Linux Construct       RVWMO AMO Mapping
>  atomic <op> relaxed   amo<op>.{w|d}
>  atomic <op> acquire   amo<op>.{w|d}.aq
>  atomic <op> release   amo<op>.{w|d}.rl
>  atomic <op>           amo<op>.{w|d}.aqrl
>  Linux Construct       RVWMO LR/SC Mapping
>  atomic <op> relaxed   loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
>  atomic <op> acquire   loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
>  atomic <op> release   loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗ ; bnez loop OR
>                        fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗ ; bnez loop
>  atomic <op>           loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> 
> Table A.5: Mappings from Linux memory primitives to RISC-V primitives
> 
> ```

... I consider quoting this table in full, without any further remarks, as
confusing: Three of the lines each are inapplicable now, aiui. Further what
are the two * telling us? Quite likely they aren't there just accidentally.
Finally, why sc.{w|d}.aqrl when in principle one would expect just
sc.{w|d}.rl?

> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/cmpxchg.h
> @@ -0,0 +1,209 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2014 Regents of the University of California */
> +
> +#ifndef _ASM_RISCV_CMPXCHG_H
> +#define _ASM_RISCV_CMPXCHG_H
> +
> +#include <xen/compiler.h>
> +#include <xen/lib.h>
> +
> +#include <asm/fence.h>
> +#include <asm/io.h>
> +#include <asm/system.h>
> +
> +#define __amoswap_generic(ptr, new, ret, sfx) \

As before / elsewhere: Is there a strong need for two leading underscores
here? Using just one would already be standard compliant afaict.

> +({ \
> +    asm volatile ( \
> +        " amoswap" sfx " %0, %2, %1" \
> +        : "=r" (ret), "+A" (*ptr) \
> +        : "r" (new) \
> +        : "memory" ); \
> +})

This doesn't need the ({ }) (anymore?):

#define __amoswap_generic(ptr, new, ret, sfx) \
    asm volatile ( \
        " amoswap" sfx " %0, %2, %1" \
        : "=r" (ret), "+A" (*(ptr)) \
        : "r" (new) \
        : "memory" )

(note also the added parentheses).

> +/*
> + * For LR and SC, the A extension requires that the address held in rs1 be
> + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> + * for 64-bit words and four-byte aligned for 32-bit words).
> + * If the address is not naturally aligned, an address-misaligned exception
> + * or an access-fault exception will be generated.
> + *
> + * Thereby:
> + * - for 1-byte xchg access the containing word by clearing low two bits
> + * - for 2-byte xchg ccess the containing word by clearing bit 1.

Nit: "access"

> + *   If resulting 4-byte access is still misalgined, it will fault just as
> + *   non-emulated 4-byte access would.
> + */
> +#define emulate_xchg_1_2(ptr, new, lr_sfx, sc_sfx) \
> +({ \
> +    uint32_t *aligned_ptr = (uint32_t *)((unsigned long)ptr & ~(0x4 - sizeof(*(ptr)))); \
> +    unsigned int new_val_pos = ((unsigned long)(ptr) & (0x4 - sizeof(*(ptr)))) * BITS_PER_BYTE; \

You parenthesize ptr here correctly, but not in the line above. Instead of
"_pos" in the name, maybe better "_bit"?

Finally, here and elsewhere, please limit line length to 80 chars. (Omitting
the 0x here would help a little, but not quite enough. Question is whether
these wouldn't better be sizeof(*aligned_ptr) anyway.)
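Just to illustrate the direction I'm thinking of (merely a sketch, untested,
and with the macro's trailing backslashes omitted), using sizeof(*aligned_ptr)
in place of the plain 0x4 would allow e.g.

    uint32_t *aligned_ptr =
        (uint32_t *)((unsigned long)(ptr) &
                     ~(sizeof(*aligned_ptr) - sizeof(*(ptr))));
    unsigned int new_val_bit =
        ((unsigned long)(ptr) & (sizeof(*aligned_ptr) - sizeof(*(ptr)))) *
        BITS_PER_BYTE;

which would then also take care of the parenthesization and naming remarks
above.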
> +    unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_pos; \
> +    unsigned int new_ = new << new_val_pos; \

Similarly new wants parenthesizing here.

> +    unsigned int old; \
> +    unsigned int scratch; \
> +    \
> +    asm volatile ( \
> +        "0: lr.w" lr_sfx " %[old], %[aligned_ptr]\n" \
> +        "   and %[scratch], %[old], %z[nmask]\n" \
> +        "   or %[scratch], %[scratch], %z[new_]\n" \
> +        "   sc.w" sc_sfx " %[scratch], %[scratch], %[aligned_ptr]\n" \
> +        "   bnez %[scratch], 0b\n" \
> +        : [old] "=&r" (old), [scratch] "=&r" (scratch), [aligned_ptr] "+A" (*aligned_ptr) \

While for the variable name aligned_ptr is likely helpful, for the operand
name just ptr would certainly do?

> +        : [new_] "rJ" (new_), [nmask] "rJ" (~mask) \

Neither mask nor ~mask can be 0. Hence J here and the z modifier above look
pointless. (new_, otoh, can be 0, so allowing x0 to be used in that case is
certainly desirable.)

As to using ~mask here: Now that we look to have settled on requiring Zbb,
you could use andn instead of and, thus allowing the same register to be
used in the asm() and ...

> +        : "memory" ); \
> +    \
> +    (__typeof__(*(ptr)))((old & mask) >> new_val_pos); \

... for this calculation.

> +})
> +
> +static always_inline unsigned long __xchg(volatile void *ptr, unsigned long new, int size)
> +{
> +    unsigned long ret;
> +
> +    switch ( size )
> +    {
> +    case 1:
> +        ret = emulate_xchg_1_2((volatile uint8_t *)ptr, new, ".aq", ".aqrl");
> +        break;
> +    case 2:
> +        ret = emulate_xchg_1_2((volatile uint16_t *)ptr, new, ".aq", ".aqrl");
> +        break;
> +    case 4:
> +        __amoswap_generic((volatile uint32_t *)ptr, new, ret, ".w.aqrl");
> +        break;
> +#ifndef CONFIG_32BIT

There's no 32BIT Kconfig symbol; all we have is a 64BIT one.

> +    case 8:
> +        __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
> +        break;
> +#endif
> +    default:
> +        STATIC_ASSERT_UNREACHABLE();
> +    }
> +
> +    return ret;
> +}
> +
> +#define xchg(ptr, x) \
> +({ \
> +    __typeof__(*(ptr)) n_ = (x); \
> +    (__typeof__(*(ptr))) \
> +        __xchg((ptr), (unsigned long)(n_), sizeof(*(ptr))); \

Nit: While excess parentheses "only" harm readability, they would
nevertheless better be omitted (here: the first argument passed).

> +})
> +
> +#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> + ({ \
> +    register unsigned int rc; \

Nit: We don't normally use "register", unless accompanied by asm() tying a
variable to a specific one.

> +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \

The casts aren't very nice to have here; I take they're needed for
cmpxchg_ptr() to compile?

> +    asm volatile( \

Nit: Missing blank once again. Would be really nice if you could go through
and sort this uniformly for the series.

> +        "0: lr" lr_sfx " %0, %2\n" \
> +        "   bne %0, %z3, 1f\n" \
> +        "   sc" sc_sfx " %1, %z4, %2\n" \
> +        "   bnez %1, 0b\n" \
> +        "1:\n" \
> +        : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> +        : "rJ" (old__), "rJ" (new__) \

Please could I talk you into using named operands here, too? Also ptr here
is lacking parentheses again.

> +        : "memory"); \

And yet another missing blank.

> + })

At the use site this construct having a normal return value (rather than
ret being passed in) would overall look more natural.
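I.e. something along these lines (just a sketch of the direction, untested,
with the ret parameter dropped and named operands used; the naming and
whether to keep the casts are of course separate questions):

#define __generic_cmpxchg(ptr, old, new, lr_sfx, sc_sfx) \
({ \
    unsigned int rc; \
    __typeof__(*(ptr)) ret__; \
    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
    \
    asm volatile ( \
        "0: lr" lr_sfx " %[ret], %[ptr]\n" \
        "   bne %[ret], %z[old], 1f\n" \
        "   sc" sc_sfx " %[rc], %z[new], %[ptr]\n" \
        "   bnez %[rc], 0b\n" \
        "1:\n" \
        : [ret] "=&r" (ret__), [rc] "=&r" (rc), [ptr] "+A" (*(ptr)) \
        : [old] "rJ" (old__), [new] "rJ" (new__) \
        : "memory" ); \
    \
    ret__; \
})

at which point the use sites would simply assign the macro's value to ret.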
> +/*
> + * For LR and SC, the A extension requires that the address held in rs1 be
> + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> + * for 64-bit words and four-byte aligned for 32-bit words).
> + * If the address is not naturally aligned, an address-misaligned exception
> + * or an access-fault exception will be generated.
> + *
> + * Thereby:
> + * - for 1-byte xchg access the containing word by clearing low two bits
> + * - for 2-byte xchg ccess the containing word by clearing first bit.
> + *
> + * If resulting 4-byte access is still misalgined, it will fault just as
> + * non-emulated 4-byte access would.
> + *
> + * old_val was casted to unsigned long for cmpxchgptr()
> + */
> +#define emulate_cmpxchg_1_2(ptr, old, new, lr_sfx, sc_sfx) \
> +({ \
> +    uint32_t *aligned_ptr = (uint32_t *)((unsigned long)ptr & ~(0x4 - sizeof(*(ptr)))); \
> +    uint8_t new_val_pos = ((unsigned long)(ptr) & (0x4 - sizeof(*(ptr)))) * BITS_PER_BYTE; \
> +    unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_pos; \
> +    unsigned int old_ = old << new_val_pos; \
> +    unsigned int new_ = new << new_val_pos; \
> +    unsigned int old_val; \
> +    unsigned int scratch; \
> +    \
> +    __asm__ __volatile__ ( \
> +        "0: lr.w" lr_sfx " %[scratch], %[aligned_ptr]\n" \
> +        "   and %[old_val], %[scratch], %z[mask]\n" \
> +        "   bne %[old_val], %z[old_], 1f\n" \
> +        "   xor %[scratch], %[old_val], %[scratch]\n" \

To be honest I was hoping this line would come with a brief comment.

> +        "   or %[scratch], %[scratch], %z[new_]\n" \
> +        "   sc.w" sc_sfx " %[scratch], %[scratch], %[aligned_ptr]\n" \
> +        "   bnez %[scratch], 0b\n" \
> +        "1:\n" \
> +        : [old_val] "=&r" (old_val), [scratch] "=&r" (scratch), [aligned_ptr] "+A" (*aligned_ptr) \
> +        : [old_] "rJ" (old_), [new_] "rJ" (new_), \
> +          [mask] "rJ" (mask) \
> +        : "memory" ); \
> +    \
> +    (__typeof__(*(ptr)))((unsigned long)old_val >> new_val_pos); \
> +})

A few of the comments for emulate_xchg_1_2() apply here as well.

> +/*
> + * Atomic compare and exchange. Compare OLD with MEM, if identical,
> + * store NEW in MEM. Return the initial value in MEM. Success is
> + * indicated by comparing RETURN with OLD.
> + */
> +static always_inline unsigned long __cmpxchg(volatile void *ptr,
> +                                                unsigned long old,
> +                                                unsigned long new,
> +                                                int size)

Nit: Inappropriate indentation.
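I take it what's wanted is the usual alignment with the opening parenthesis,
i.e. (whitespace only, no functional change):

static always_inline unsigned long __cmpxchg(volatile void *ptr,
                                             unsigned long old,
                                             unsigned long new,
                                             int size)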
Jan