[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v6 10/20] xen/riscv: introduce atomic.h
On Mon, 2024-03-25 at 09:18 +0100, Jan Beulich wrote: > On 22.03.2024 13:25, Oleksii wrote: > > On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote: > > > On 15.03.2024 19:06, Oleksii Kurochko wrote: > > > > + */ > > > > +static always_inline void read_atomic_size(const volatile void > > > > *p, > > > > + void *res, > > > > + unsigned int size) > > > > +{ > > > > + switch ( size ) > > > > + { > > > > + case 1: *(uint8_t *)res = readb(p); break; > > > > + case 2: *(uint16_t *)res = readw(p); break; > > > > + case 4: *(uint32_t *)res = readl(p); break; > > > > + case 8: *(uint32_t *)res = readq(p); break; > > > > > > Nit: Excess blank before =. > > > > > > Also - no #ifdef here to be RV32-ready? > > Because there is #ifdef RV32 in io.h for readq(). > > There you'd run into __raw_readq()'s BUILD_BUG_ON(), afaict even for > 1-, 2-, or 4-byte accesses. That's not quite what we want here. Do you mean that if someone will redefine readq() in another way and not wrap it by #ifdef RV32? Except this I am not sure that there is an issue as it will be still a compilation error, so anyway it will be needed to provide an implementation for __raw_readq(). One of the reason why I decided to wrap with #ifdef RV32 in io.h to not go over the source code and add wrapping. Also for some code it will be needed to rewrite it. For example, I am not sure that I can add #ifdef inside macros, f.e.: #define write_atomic(p, x) \ ({ \ typeof(*(p)) x__ = (x); \ switch ( sizeof(*(p)) ) \ { \ case 1: writeb(x__, p); break; \ case 2: writew(x__, p); break; \ case 4: writel(x__, p); break; \ case 8: writeq(x__, p); break; \ default: __bad_atomic_size(); break; \ } \ x__; \ }) > > > > > +/* > > > > + * First, the atomic ops that have no ordering constraints and > > > > therefor don't > > > > + * have the AQ or RL bits set. These don't return anything, > > > > so > > > > there's only > > > > + * one version to worry about. > > > > + */ > > > > +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix) \ > > > > +static inline \ > > > > +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \ > > > > +{ \ > > > > + asm volatile ( \ > > > > + " amo" #asm_op "." #asm_type " zero, %1, %0" \ > > > > + : "+A" (v->counter) \ > > > > + : "r" (I) \ > > > > > > Btw, I consider this pretty confusing. At the 1st and 2nd glance > > > this > > > looks like a mistake, i.e. as if i was meant. Imo ... > > > > > > > + : "memory" ); \ > > > > +} \ > > > > + > > > > +/* > > > > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is > > > > the > > > > reason why > > > > + * last argument for ATOMIC_OP isn't used. > > > > + */ > > > > +#define ATOMIC_OPS(op, asm_op, I) \ > > > > + ATOMIC_OP (op, asm_op, I, w, int, ) > > > > + > > > > +ATOMIC_OPS(add, add, i) > > > > +ATOMIC_OPS(sub, add, -i) > > > > +ATOMIC_OPS(and, and, i) > > > > +ATOMIC_OPS( or, or, i) > > > > +ATOMIC_OPS(xor, xor, i) > > > > > > ... here you want to only pass the (unary) operator (and leaving > > > that > > > blank > > > is as fine as using +). > > I agree that a game with 'i' and 'I' looks confusing, but I am not > > really understand what is wrong with using ' i' here. It seems that > > preprocessed macros looks fine: > > static inline void atomic_add(int i, atomic_t *v) { asm volatile > > ( " > > amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (i) > > : > > "memory" ); } > > > > static inline void atomic_sub(int i, atomic_t *v) { asm volatile > > ( " > > amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (- > > i) : > > "memory" ); } > > I didn't question the pre-processed result being correct. Instead I > said > that I consider the construct confusing to the reader, for looking as > if > there was a mistake (in the case of the letter i used). Note also in > particular how the macro invocations need to be in sync with the > macro > implementation, for lower case i being used both in the macro and in > its > invocations. Anything parameterized would better be fully so, at the > very least to avoid, as said, confusion. (Having macros depend on > context at their use sites _may_ be okay for local helper macros, but > here we're talking about a not even private header file.) I am not sure then I understand how mentioning '+i' will help significantly remove confusion. > > > > > +/* This is required to provide a full barrier on success. */ > > > > +static inline int atomic_add_unless(atomic_t *v, int a, int u) > > > > +{ > > > > + int prev, rc; > > > > + > > > > + asm volatile ( > > > > + "0: lr.w %[p], %[c]\n" > > > > + " beq %[p], %[u], 1f\n" > > > > + " add %[rc], %[p], %[a]\n" > > > > + " sc.w.rl %[rc], %[rc], %[c]\n" > > > > + " bnez %[rc], 0b\n" > > > > + RISCV_FULL_BARRIER > > > > > > With this and no .aq on the load, why the .rl on the store? > > It is something that LKMM requires [1]. > > > > This is not fully clear to me what is so specific in LKMM, but > > accoring > > to the spec: > > Ordering Annotation Fence-based Equivalent > > l{b|h|w|d|r}.aq l{b|h|w|d|r}; fence r,rw > > l{b|h|w|d|r}.aqrl fence rw,rw; l{b|h|w|d|r}; fence r,rw > > s{b|h|w|d|c}.rl fence rw,w; s{b|h|w|d|c} > > s{b|h|w|d|c}.aqrl fence rw,w; s{b|h|w|d|c} > > amo<op>.aq amo<op>; fence r,rw > > amo<op>.rl fence rw,w; amo<op> > > amo<op>.aqrl fence rw,rw; amo<op>; fence rw,rw > > Table 2.2: Mappings from .aq and/or .rl to fence-based > > equivalents. > > An alternative mapping places a fence rw,rw after the existing > > s{b|h|w|d|c} mapping rather than at the front of the > > l{b|h|w|d|r} mapping. > > I'm afraid I can't spot the specific case in this table. None of the > stores in the right column have a .rl suffix. Yes, it is expected. I am reading this table as (f.e.) amo<op>.rl is an equivalent of fence rw,w; amo<op>. (without .rl) > > > It is also safe to translate any .aq, .rl, or .aqrl annotation > > into > > the fence-based snippets of > > Table 2.2. These can also be used as a legal implementation of > > l{b|h|w|d} or s{b|h|w|d} pseu- > > doinstructions for as long as those instructions are not added > > to > > the ISA. > > > > So according to the spec, it should be: > > sc.w ... > > RISCV_FULL_BARRIER. > > > > Considering [1] and how this code looks before, it seems to me that > > it > > is safe to use lr.w.aq/sc.w.rl here or an fence equivalent. > > Here you say "or". Then why dos the code use sc.?.rl _and_ a fence? I confused this line with amo<op>.aqrl, so based on the table 2.2 above s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but Linux kernel decided to strengthen it with full barrier: - "0:\n\t" - "lr.w.aqrl %[p], %[c]\n\t" - "beq %[p], %[u], 1f\n\t" - "add %[rc], %[p], %[a]\n\t" - "sc.w.aqrl %[rc], %[rc], %[c]\n\t" - "bnez %[rc], 0b\n\t" - "1:" + "0: lr.w %[p], %[c]\n" + " beq %[p], %[u], 1f\n" + " add %[rc], %[p], %[a]\n" + " sc.w.rl %[rc], %[rc], %[c]\n" + " bnez %[rc], 0b\n" + " fence rw, rw\n" + "1:\n" As they have the following issue: implementations of atomics such as atomic_cmpxchg() and atomic_add_unless() rely on LR/SC pairs, which do not give full-ordering with .aqrl; for example, current implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test below to end up with the state indicated in the "exists" clause. > > > But in general it ( a combination of fence, .aq, .rl ) can be > > considered as the same things in this context, so it is possible to > > leave this function as is to be synced here with Linux kernel. > > In turn I also don't understand this. Yes, the excess .rl certainly > doesn't render things unsafe. But what's the purpose of the .rl? > That's > what my original question boiled down to. I don't know, either. As I mentioned before, it is enough ( in my opinion ) to have a FULL barrier or .aq,.rl or .aqrl/.aqrl ( if it needed to be strengthened) like it was done before in Linux. It seems to me it is LKMM specific that they need more to be more strengthened as it RISC-V Memory model requires because: "sc.w ; fence rw, rw" does not guarantee that all previous reads and writes finish before the sc itself is globally visible, which might matter if the sc is unlocking a lock or something. Despite of the fact, for compare-and-swap loops, RISC-V international recommends lr.w.aq/lr.d.aq followed by sc.w.rl/sc.d.rl ( as it was implemeted before in Linux kernel ) I am okay just for safety reasons and for the reason I mentioned at the last sentence of previous paragraph to strengthen implementations with fences. ~ Oleksii > > Jan > > > [1] > > https://lore.kernel.org/lkml/1520274276-21871-1-git-send-email-parri.andrea@xxxxxxxxx > > / > > > > ~ Oleksii >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |