Re: [PATCH v6 08/20] xen/riscv: introduce cmpxchg.h
On Thu, 2024-03-21 at 13:07 +0100, Jan Beulich wrote:
> On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > The header was taken from Linux kernel 6.4.0-rc1.
> >
> > Additionally, the following were updated:
> > * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
> >   access.
> > * replace tabs with spaces
> > * replace __* variable with *__
> > * introduce generic version of xchg_* and cmpxchg_*.
> > * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
>
> With this, ...
>
> > * drop barriers and use instruction suffixes instead ( .aq, .rl,
> >   .aqrl )
> >
> > Implementation of the 4- and 8-byte cases was updated according to the
> > spec:
> > ```
> > ....
> > Linux Construct       RVWMO AMO Mapping
> > atomic <op> relaxed   amo<op>.{w|d}
> > atomic <op> acquire   amo<op>.{w|d}.aq
> > atomic <op> release   amo<op>.{w|d}.rl
> > atomic <op>           amo<op>.{w|d}.aqrl
> >
> > Linux Construct       RVWMO LR/SC Mapping
> > atomic <op> relaxed   loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> > atomic <op> acquire   loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> > atomic <op> release   loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗ ; bnez loop
> >                       OR
> >                       fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗ ; bnez loop
> > atomic <op>           loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> >
> > Table A.5: Mappings from Linux memory primitives to RISC-V primitives
> >
> > ```
>
> ... I consider quoting this table in full, without any further remarks, as
> confusing: Three of the lines each are inapplicable now, aiui.
I'll shorten the table then.
>
> Further what are the two * telling us? Quite likely they aren't there just
> accidentally.
>
> Finally, why sc.{w|d}.aqrl when in principle one would expect just
> sc.{w|d}.rl?
Because according to the last line of table A.5:
    atomic <op>    loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
the sc.{w|d}.aqrl form is used there, so I decided to stick to what is
mentioned in the table.
>
> > +})
> > +
> > +static always_inline unsigned long __xchg(volatile void *ptr,
> > +                                          unsigned long new, int size)
> > +{
> > +    unsigned long ret;
> > +
> > +    switch ( size )
> > +    {
> > +    case 1:
> > +        ret = emulate_xchg_1_2((volatile uint8_t *)ptr, new,
> > +                               ".aq", ".aqrl");
> > +        break;
> > +    case 2:
> > +        ret = emulate_xchg_1_2((volatile uint16_t *)ptr, new,
> > +                               ".aq", ".aqrl");
> > +        break;
> > +    case 4:
> > +        __amoswap_generic((volatile uint32_t *)ptr, new, ret,
> > +                          ".w.aqrl");
> > +        break;
> > +#ifndef CONFIG_32BIT
>
> There's no 32BIT Kconfig symbol; all we have is a 64BIT one.
I meant here CONFIG_RISCV_32.
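I.e. the hunk would then read something like this (just a sketch using the
existing Kconfig symbol; everything else unchanged):

#ifndef CONFIG_RISCV_32
    case 8:
        __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
        break;
#endif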
>
> > +    case 8:
> > +        __amoswap_generic((volatile uint64_t *)ptr, new, ret,
> > +                          ".d.aqrl");
> > +        break;
> > +#endif
> > +    default:
> > +        STATIC_ASSERT_UNREACHABLE();
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +#define xchg(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) n_ = (x); \
> > +    (__typeof__(*(ptr))) \
> > +        __xchg((ptr), (unsigned long)(n_), sizeof(*(ptr))); \
>
> Nit: While excess parentheses "only" harm readability, they would
> nevertheless better be omitted (here: the first argument passed).
>
> > +})
> > +
> > +#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> > + ({ \
> > + register unsigned int rc; \
>
> Nit: We don't normally use "register", unless accompanied by asm()
> tying
> a variable to a specific one.
>
> > + __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> > + __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
>
> The casts aren't very nice to have here; I take they're needed for
> cmpxchg_ptr() to compile?
Not really, I have not faced a compilation issue.
The reason why it was added is that the lr instruction places the sign-
extended value in the destination register, but without the cast the values
for old and new are generated without sign extension, so, for example:
    u32 = 0xbbbbbbbb;
    cmpxchg(&u32, 0xbbbbbbbb, 0xCCCCCCCC);
will fail, because after:
    "0: lr" lr_sfx " %0, %2\n"
%0 will contain 0xFFFFFFFFBBBBBBBB, while %3 will contain 0xBBBBBBBB, so in
    "   bne %0, %z3, 1f\n"
%0 and %3 are always unequal whenever the most significant bit of the value
read from %2 is 1.
But now I realize that it would be better just to use a mask and not depend
on compiler code generation, so it would be better to do it in the following
way (of course, the macro should also be updated according to the remarks
you gave me):
#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
({ \
    register unsigned int rc; \
    unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0); \
    asm volatile( \
        "0: lr" lr_sfx " %0, %2\n" \
        "   and  %0, %0, %z[mask]\n" \
        "   bne  %0, %z3, 1f\n" \
        "   sc" sc_sfx " %1, %z4, %2\n" \
        "   bnez %1, 0b\n" \
        "1:\n" \
        : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
        : "rJ" (old), "rJ" (new), [mask] "rJ" (mask) \
        : "memory"); \
})
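To illustrate why the masking matters, here is a plain-C sketch of the same
reasoning (a standalone example for this mail only, not part of the patch;
it just mimics what lr.w and the and above do):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t mem = 0xbbbbbbbb;

    /* What lr.w leaves in a 64-bit destination register: sign-extended. */
    uint64_t loaded = (uint64_t)(int64_t)(int32_t)mem; /* 0xffffffffbbbbbbbb */

    /* The "old" operand as it may be materialized: zero-extended. */
    uint64_t old = mem;                                /* 0x00000000bbbbbbbb */

    /* Without the mask, bne compares two different 64-bit values. */
    printf("without mask: %s\n", loaded == old ? "equal" : "unequal");

    /* Masking the loaded value down to 32 bits restores the intended
       comparison: this is what the GENMASK/and pair achieves. */
    uint64_t mask = 0xffffffffULL;                     /* GENMASK(31, 0) */
    printf("with mask:    %s\n", (loaded & mask) == old ? "equal" : "unequal");

    return 0;
}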
>
> > + asm volatile( \
>
> Nit: Missing blank once again. Would be really nice if you could go
> through and sort this uniformly for the series.
>
> > + "0: lr" lr_sfx " %0, %2\n" \
> > + " bne %0, %z3, 1f\n" \
> > + " sc" sc_sfx " %1, %z4, %2\n" \
> > + " bnez %1, 0b\n" \
> > + "1:\n" \
> > + : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> > + : "rJ" (old__), "rJ" (new__) \
>
> Please could I talk you into using named operands here, too?
Sure, I will add them.
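Something along these lines (only a sketch of the operand naming; the rest
will be adjusted per your other remarks):

    asm volatile ( \
        "0: lr" lr_sfx " %[ret], %[ptr]\n" \
        "   bne  %[ret], %z[old], 1f\n" \
        "   sc" sc_sfx " %[rc], %z[new], %[ptr]\n" \
        "   bnez %[rc], 0b\n" \
        "1:\n" \
        : [ret] "=&r" (ret), [rc] "=&r" (rc), [ptr] "+A" (*ptr) \
        : [old] "rJ" (old__), [new] "rJ" (new__) \
        : "memory" ); \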
~ Oleksii