
Re: [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h



On Tue, 2024-01-23 at 11:28 +0100, Jan Beulich wrote:
> On 23.01.2024 11:15, Oleksii wrote:
> > On Mon, 2024-01-22 at 17:27 +0100, Jan Beulich wrote:
> > > On 22.12.2023 16:12, Oleksii Kurochko wrote:
> > > > +            : "=r" (ret__), "+A" (*ptr__) \
> > > > +            : "r" (new__) \
> > > > +            : "memory" ); \
> > > > +        break; \
> > > > +    case 8: \
> > > > +        asm volatile( \
> > > > +            "  amoswap.d %0, %2, %1\n" \
> > > > +            : "=r" (ret__), "+A" (*ptr__) \
> > > > +            : "r" (new__) \
> > > > +            : "memory" ); \
> > > > +        break; \
> > > > +    default: \
> > > > +        ASSERT_UNREACHABLE(); \
> > > 
> > > If at all possible this wants to trigger a build failure, not a
> > > runtime
> > > one.
> > I'll update that with BUILD_BUG_ON(size > 8);
> 
> What about size accidentally being e.g. 5? I'm also not sure you'll
> be
> able to use BUILD_BUG_ON() here: That'll depend on what use sites
> there
> are. And if all present ones are okay in this regard, you'd still set
> out a trap for someone else to fall into later. We have a common
> approach for this, which currently is under re-work. See
> https://lists.xen.org/archives/html/xen-devel/2024-01/msg01115.html.
Thanks. I'll use that.

> 
> > > > +    } \
> > > > +    ret__; \
> > > > +})
> > > > +
> > > > +#define xchg_relaxed(ptr, x) \
> > > > +({ \
> > > > +    __typeof__(*(ptr)) x_ = (x); \
> > > > +    (__typeof__(*(ptr))) __xchg_relaxed((ptr), x_,
> > > > sizeof(*(ptr))); \
> > > 
> > > Nit: Stray blank after cast. For readability I'd also suggest to
> > > drop parentheses in cases like the first argument passed to
> > > __xchg_relaxed() here.
> > Thanks. I'll take that into account.
> > 
> > > 
> > > > +})
> > > 
> > > For both: What does "relaxed" describe? I'm asking because it's
> > > not
> > > really clear whether the memory clobbers are actually needed.
> > > 
> > > > +#define __xchg_acquire(ptr, new, size) \
> > > > +({ \
> > > > +    __typeof__(ptr) ptr__ = (ptr); \
> > > > +    __typeof__(new) new__ = (new); \
> > > > +    __typeof__(*(ptr)) ret__; \
> > > > +    switch (size) \
> > > > +       { \
> > > > +    case 4: \
> > > > +        asm volatile( \
> > > > +            "  amoswap.w %0, %2, %1\n" \
> > > > +            RISCV_ACQUIRE_BARRIER \
> > > > +            : "=r" (ret__), "+A" (*ptr__) \
> > > > +            : "r" (new__) \
> > > > +            : "memory" ); \
> > > > +        break; \
> > > > +    case 8: \
> > > > +        asm volatile( \
> > > > +            "  amoswap.d %0, %2, %1\n" \
> > > > +            RISCV_ACQUIRE_BARRIER \
> > > > +            : "=r" (ret__), "+A" (*ptr__) \
> > > > +            : "r" (new__) \
> > > > +            : "memory" ); \
> > > > +        break; \
> > > > +    default: \
> > > > +        ASSERT_UNREACHABLE(); \
> > > > +    } \
> > > > +    ret__; \
> > > > +})
> > > 
> > > If I'm not mistaken this differs from __xchg_relaxed() only in
> > > the
> > > use
> > > of RISCV_ACQUIRE_BARRIER, and ...
> > > 
> > > > +#define xchg_acquire(ptr, x) \
> > > > +({ \
> > > > +    __typeof__(*(ptr)) x_ = (x); \
> > > > +    (__typeof__(*(ptr))) __xchg_acquire((ptr), x_,
> > > > sizeof(*(ptr))); \
> > > > +})
> > > > +
> > > > +#define __xchg_release(ptr, new, size) \
> > > > +({ \
> > > > +    __typeof__(ptr) ptr__ = (ptr); \
> > > > +    __typeof__(new) new__ = (new); \
> > > > +    __typeof__(*(ptr)) ret__; \
> > > > +    switch (size) \
> > > > +       { \
> > > > +    case 4: \
> > > > +        asm volatile ( \
> > > > +            RISCV_RELEASE_BARRIER \
> > > > +            "  amoswap.w %0, %2, %1\n" \
> > > > +            : "=r" (ret__), "+A" (*ptr__) \
> > > > +            : "r" (new__) \
> > > > +            : "memory"); \
> > > > +        break; \
> > > > +    case 8: \
> > > > +        asm volatile ( \
> > > > +            RISCV_RELEASE_BARRIER \
> > > > +            "  amoswap.d %0, %2, %1\n" \
> > > > +            : "=r" (ret__), "+A" (*ptr__) \
> > > > +            : "r" (new__) \
> > > > +            : "memory"); \
> > > > +        break; \
> > > > +    default: \
> > > > +        ASSERT_UNREACHABLE(); \
> > > > +    } \
> > > > +    ret__; \
> > > > +})
> > > 
> > > this only in the use of RISCV_RELEASE_BARRIER. If so they likely
> > > want
> > > folding, to limit redundancy and make eventual updating easier.
> > > (Same
> > > for the cmpxchg helper further down, as it seems.)
> > Yes, you are right. The only difference is RISCV_RELEASE_BARRIER,
> > and the absence of RISCV_RELEASE_BARRIER is the reason why we have
> > the "relaxed" versions.
> > 
> > I am not sure I understand what you mean by folding here. Do you
> > mean that there is no point in having two separate macros, and that
> > only the one with RISCV_RELEASE_BARRIER is needed?
> 
> No. You should parameterize the folded common macro for the derived
> macros to simply pass in the barriers needed, with empty macro
> arguments indicating "this barrier not needed".
Now it makes sense to me. Thank you for the explanation.
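To illustrate, a hypothetical sketch of the folding Jan describes: one common macro takes the acquire/release barrier fragments as parameters, and the derived macros pass empty arguments where no barrier is wanted. The real code would splice these strings into the asm volatile() template; here the sketch only builds the template string, so it compiles and runs on any host. The macro names and barrier values are assumptions for illustration, not the actual Xen definitions.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Assumed barrier fragments (the real values live in the RISC-V port). */
#define RISCV_ACQUIRE_BARRIER "\tfence r, rw\n"
#define RISCV_RELEASE_BARRIER "\tfence rw, w\n"

/* Common template: pre-barrier, the AMO instruction, post-barrier.
 * In the real macro this expands inside asm volatile ( ... );
 * string-literal concatenation splices the fragments together. */
#define XCHG_TEMPLATE(pre, post) \
    pre "\tamoswap.w %0, %2, %1\n" post

/* Derived variants: an empty string means "this barrier not needed". */
#define XCHG_RELAXED_TPL XCHG_TEMPLATE("", "")
#define XCHG_ACQUIRE_TPL XCHG_TEMPLATE("", RISCV_ACQUIRE_BARRIER)
#define XCHG_RELEASE_TPL XCHG_TEMPLATE(RISCV_RELEASE_BARRIER, "")
```

With this shape, updating the AMO sequence or a barrier touches one place instead of three.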

> 
> > > > +#define xchg_release(ptr, x) \
> > > > +({ \
> > > > +    __typeof__(*(ptr)) x_ = (x); \
> > > > +    (__typeof__(*(ptr))) __xchg_release((ptr), x_,
> > > > sizeof(*(ptr))); \
> > > > +})
> > > > +
> > > > +static always_inline uint32_t __xchg_case_4(volatile uint32_t
> > > > *ptr,
> > > > +                                            uint32_t new)
> > > > +{
> > > > +    __typeof__(*(ptr)) ret;
> > > > +
> > > > +    asm volatile (
> > > > +        "   amoswap.w.aqrl %0, %2, %1\n"
> > > > +        : "=r" (ret), "+A" (*ptr)
> > > > +        : "r" (new)
> > > > +        : "memory" );
> > > > +
> > > > +    return ret;
> > > > +}
> > > > +
> > > > +static always_inline uint64_t __xchg_case_8(volatile uint64_t
> > > > *ptr,
> > > > +                                            uint64_t new)
> > > > +{
> > > > +    __typeof__(*(ptr)) ret;
> > > > +
> > > > +    asm volatile( \
> > > > +        "   amoswap.d.aqrl %0, %2, %1\n" \
> > > > +        : "=r" (ret), "+A" (*ptr) \
> > > > +        : "r" (new) \
> > > > +        : "memory" ); \
> > > > +
> > > > +    return ret;
> > > > +}
> > > > +
> > > > +static always_inline unsigned short __cmpxchg_case_2(volatile
> > > > uint32_t *ptr,
> > > > +                                                     uint32_t
> > > > old,
> > > > +                                                     uint32_t
> > > > new);
> > > 
> > > Don't you consistently mean uint16_t here (incl the return type)
> > > and
> > > ...
> > > 
> > > > +static always_inline unsigned short __cmpxchg_case_1(volatile
> > > > uint32_t *ptr,
> > > > +                                                     uint32_t
> > > > old,
> > > > +                                                     uint32_t
> > > > new);
> > > 
> > > ... uint8_t here?
> > The idea was that we emulate __cmpxchg_case_1 and __cmpxchg_case_2
> > using 4 bytes cmpxchg and __cmpxchg_case_4 expects uint32_t.
> 
> I consider this wrong. The functions would better be type-correct.
> 
> > > > +static inline unsigned long __xchg(volatile void *ptr,
> > > > unsigned
> > > > long x, int size)
> > > > +{
> > > > +    switch (size) {
> > > > +    case 1:
> > > > +        return __cmpxchg_case_1(ptr, (uint32_t)-1, x);
> > > > +    case 2:
> > > > +        return __cmpxchg_case_2(ptr, (uint32_t)-1, x);
> > > 
> > > How are these going to work? You'll compare against ~0, and if
> > > the
> > > value
> > > in memory isn't ~0, memory won't be updated; you will only
> > > (correctly)
> > > return the value found in memory.
> > > 
> > > Or wait - looking at __cmpxchg_case_{1,2}() far further down, you
> > > ignore
> > > "old" there. Which apparently means they'll work for the use
> > > here,
> > > but
> > > not for the use in __cmpxchg().
> > Yes, the trick is that old is ignored and is read in
> > __emulate_cmpxchg_case1_2() before __cmpxchg_case_4 is called:
> >     do
> >     {
> >         read_val = read_func(aligned_ptr);
> >         swapped_new = read_val & ~mask;
> >         swapped_new |= masked_new;
> >         ret = cmpxchg_func(aligned_ptr, read_val, swapped_new);
> >     } while ( ret != read_val );
> > Here read_val is 'old'.
> > 
> > But now I am not 100% sure that it is correct for __cmpxchg...
> 
> It just can't be correct - you can't ignore "old" there. I think you
> want simple cmpxchg primitives, which xchg then uses in a loop (while
> cmpxchg uses them plainly).
But xchg doesn't take an 'old' value, so when building xchg on top of
cmpxchg, 'old' has to be handled (or ignored) in some way.

> 
> > > > +static always_inline unsigned short __cmpxchg_case_2(volatile
> > > > uint32_t *ptr,
> > > > +                                                     uint32_t
> > > > old,
> > > > +                                                     uint32_t
> > > > new)
> > > > +{
> > > > +    (void) old;
> > > > +
> > > > +    if (((unsigned long)ptr & 3) == 3)
> > > > +    {
> > > > +#ifdef CONFIG_64BIT
> > > > +        return __emulate_cmpxchg_case1_2((uint64_t *)ptr, new,
> > > > +                                         readq,
> > > > __cmpxchg_case_8,
> > > > 0xffffU);
> > > 
> > > What if ((unsigned long)ptr & 7) == 7 (which is a sub-case of
> > > what
> > > the
> > > if() above checks for? Isn't it more reasonable to require
> > > aligned
> > > 16-bit quantities here? Or if mis-aligned addresses are okay, you
> > > could
> > > as well emulate using __cmpxchg_case_4().
> > Yes, it will be more reasonable. I'll use IS_ALIGNED instead.
> 
> Not sure I get your use of "instead" here correctly. There's more
> to change here than just the if() condition.
I meant something like:

if ( IS_ALIGNED(ptr, 16) )
    __emulate_cmpxchg_case1_2(...);
else
    assert_failed("ptr isn't aligned\n");

I have to check whether mis-aligned addresses are allowed for CAS
instructions.

If a mis-aligned address is okay, then it looks like we can always use
__cmpxchg_case_4 without checking whether ptr is aligned. Or have I
started to lose something...

~ Oleksii

 

