[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h



On Mon, 2024-01-22 at 17:27 +0100, Jan Beulich wrote:
> On 22.12.2023 16:12, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/cmpxchg.h
> > @@ -0,0 +1,496 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (C) 2014 Regents of the University of California */
> > +
> > +#ifndef _ASM_RISCV_CMPXCHG_H
> > +#define _ASM_RISCV_CMPXCHG_H
> > +
> > +#include <xen/compiler.h>
> > +#include <xen/lib.h>
> > +
> > +#include <asm/fence.h>
> > +#include <asm/io.h>
> > +#include <asm/system.h>
> > +
> > +#define __xchg_relaxed(ptr, new, size) \
> > +({ \
> > +    __typeof__(ptr) ptr__ = (ptr); \
> > +    __typeof__(new) new__ = (new); \
> > +    __typeof__(*(ptr)) ret__; \
> 
> I expect the types of new and *ptr want to actually match. Which
> you then want to enforce, so that issues at use sites would either
> be reported by the compiler, or be permitted by a type conversion
> of new.
I am OK to make the same type for new and ret__, but it looks like
__xchg_relaxed() is used only inside xchg_relaxed() where 'new' is
declared with the same type as *ptr.

> 
> > +    switch (size) \
> > +   { \
> 
> Nit: Hard tab left here. (Also again you want to either stick to
> Linux style or fully switch to Xen style.)
Thanks. I'll update that.

> 
> > +    case 4: \
> > +        asm volatile( \
> > +            "      amoswap.w %0, %2, %1\n" \
> 
> I don't think a leading tab (or leading whitespace in general) is
> needed in single-line-output asm()s. The trailing \n also isn't
> needed if I'm not mistaken.
I just wanted to align it with "=r", but for sure a leading tab or
whitespace can be dropped.

> 
> > +            : "=r" (ret__), "+A" (*ptr__) \
> > +            : "r" (new__) \
> > +            : "memory" ); \
> > +        break; \
> > +    case 8: \
> > +        asm volatile( \
> > +            "      amoswap.d %0, %2, %1\n" \
> > +            : "=r" (ret__), "+A" (*ptr__) \
> > +            : "r" (new__) \
> > +            : "memory" ); \
> > +        break; \
> > +    default: \
> > +        ASSERT_UNREACHABLE(); \
> 
> If at all possible this wants to trigger a build failure, not a
> runtime
> one.
I'll update that with BUILD_BUG_ON(size > 8);
> 
> > +    } \
> > +    ret__; \
> > +})
> > +
> > +#define xchg_relaxed(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) x_ = (x); \
> > +    (__typeof__(*(ptr))) __xchg_relaxed((ptr), x_,
> > sizeof(*(ptr))); \
> 
> Nit: Stray blank after cast. For readability I'd also suggest to
> drop parentheses in cases like the first argument passed to
> __xchg_relaxed() here.
Thanks. I'll take that into account.

> 
> > +})
> 
> For both: What does "relaxed" describe? I'm asking because it's not
> really clear whether the memory clobbers are actually needed.
> 
> > +#define __xchg_acquire(ptr, new, size) \
> > +({ \
> > +    __typeof__(ptr) ptr__ = (ptr); \
> > +    __typeof__(new) new__ = (new); \
> > +    __typeof__(*(ptr)) ret__; \
> > +    switch (size) \
> > +   { \
> > +    case 4: \
> > +        asm volatile( \
> > +            "      amoswap.w %0, %2, %1\n" \
> > +            RISCV_ACQUIRE_BARRIER \
> > +            : "=r" (ret__), "+A" (*ptr__) \
> > +            : "r" (new__) \
> > +            : "memory" ); \
> > +        break; \
> > +    case 8: \
> > +        asm volatile( \
> > +            "      amoswap.d %0, %2, %1\n" \
> > +            RISCV_ACQUIRE_BARRIER \
> > +            : "=r" (ret__), "+A" (*ptr__) \
> > +            : "r" (new__) \
> > +            : "memory" ); \
> > +        break; \
> > +    default: \
> > +        ASSERT_UNREACHABLE(); \
> > +    } \
> > +    ret__; \
> > +})
> 
> If I'm not mistaken this differs from __xchg_relaxed() only in the
> use
> of RISCV_ACQUIRE_BARRIER, and ...
> 
> > +#define xchg_acquire(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) x_ = (x); \
> > +    (__typeof__(*(ptr))) __xchg_acquire((ptr), x_,
> > sizeof(*(ptr))); \
> > +})
> > +
> > +#define __xchg_release(ptr, new, size) \
> > +({ \
> > +    __typeof__(ptr) ptr__ = (ptr); \
> > +    __typeof__(new) new__ = (new); \
> > +    __typeof__(*(ptr)) ret__; \
> > +    switch (size) \
> > +   { \
> > +    case 4: \
> > +        asm volatile ( \
> > +            RISCV_RELEASE_BARRIER \
> > +            "      amoswap.w %0, %2, %1\n" \
> > +            : "=r" (ret__), "+A" (*ptr__) \
> > +            : "r" (new__) \
> > +            : "memory"); \
> > +        break; \
> > +    case 8: \
> > +        asm volatile ( \
> > +            RISCV_RELEASE_BARRIER \
> > +            "      amoswap.d %0, %2, %1\n" \
> > +            : "=r" (ret__), "+A" (*ptr__) \
> > +            : "r" (new__) \
> > +            : "memory"); \
> > +        break; \
> > +    default: \
> > +        ASSERT_UNREACHABLE(); \
> > +    } \
> > +    ret__; \
> > +})
> 
> this only in the use of RISCV_RELEASE_BARRIER. If so they likely want
> folding, to limit redundancy and make eventual updating easier. (Same
> for the cmpxchg helper further down, as it seems.)
Yes, you are right. The difference is only in RISCV_RELEASE_BARRIER and
it is an absence of RISCV_RELEASE_BARRIER is a reason why we have
"relaxed" versions.

I am not sure that I understand what you mean by folding here. Do you
mean that there is no any sense to have to separate macros and it is
needed only one with RISCV_RELEASE_BARRIER?

> 
> > +#define xchg_release(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) x_ = (x); \
> > +    (__typeof__(*(ptr))) __xchg_release((ptr), x_,
> > sizeof(*(ptr))); \
> > +})
> > +
> > +static always_inline uint32_t __xchg_case_4(volatile uint32_t
> > *ptr,
> > +                                            uint32_t new)
> > +{
> > +    __typeof__(*(ptr)) ret;
> > +
> > +    asm volatile (
> > +        "   amoswap.w.aqrl %0, %2, %1\n"
> > +        : "=r" (ret), "+A" (*ptr)
> > +        : "r" (new)
> > +        : "memory" );
> > +
> > +    return ret;
> > +}
> > +
> > +static always_inline uint64_t __xchg_case_8(volatile uint64_t
> > *ptr,
> > +                                            uint64_t new)
> > +{
> > +    __typeof__(*(ptr)) ret;
> > +
> > +    asm volatile( \
> > +        "   amoswap.d.aqrl %0, %2, %1\n" \
> > +        : "=r" (ret), "+A" (*ptr) \
> > +        : "r" (new) \
> > +        : "memory" ); \
> > +
> > +    return ret;
> > +}
> > +
> > +static always_inline unsigned short __cmpxchg_case_2(volatile
> > uint32_t *ptr,
> > +                                                     uint32_t old,
> > +                                                     uint32_t
> > new);
> 
> Don't you consistently mean uint16_t here (incl the return type) and
> ...
> 
> > +static always_inline unsigned short __cmpxchg_case_1(volatile
> > uint32_t *ptr,
> > +                                                     uint32_t old,
> > +                                                     uint32_t
> > new);
> 
> ... uint8_t here?
The idea was that we emulate __cmpxchg_case_1 and __cmpxchg_case_2
using 4 bytes cmpxchg and __cmpxchg_case_4 expects uint32_t.

> 
> > +static inline unsigned long __xchg(volatile void *ptr, unsigned
> > long x, int size)
> > +{
> > +    switch (size) {
> > +    case 1:
> > +        return __cmpxchg_case_1(ptr, (uint32_t)-1, x);
> > +    case 2:
> > +        return __cmpxchg_case_2(ptr, (uint32_t)-1, x);
> 
> How are these going to work? You'll compare against ~0, and if the
> value
> in memory isn't ~0, memory won't be updated; you will only
> (correctly)
> return the value found in memory.
> 
> Or wait - looking at __cmpxchg_case_{1,2}() far further down, you
> ignore
> "old" there. Which apparently means they'll work for the use here,
> but
> not for the use in __cmpxchg().
Yes, the trick is that old is ignored and is read in
__emulate_cmpxchg_case1_2() before __cmpxchg_case_4 is called:
    do {                                                              
        read_val = read_func(aligned_ptr);                            
        swapped_new = read_val & ~mask;                               
        swapped_new |= masked_new;                                    
        ret = cmpxchg_func(aligned_ptr, read_val, swapped_new);       
    } while ( ret != read_val );                                      
read_val it is 'old'.

But now I am not 100% sure that it is correct for __cmpxchg...

> 
> > +    case 4:
> > +        return __xchg_case_4(ptr, x);
> > +    case 8:
> > +        return __xchg_case_8(ptr, x);
> > +    default:
> > +        ASSERT_UNREACHABLE();
> > +    }
> > +
> > +    return -1;
> > +}
> > +
> > +#define xchg(ptr,x) \
> > +({ \
> > +    __typeof__(*(ptr)) ret__; \
> > +    ret__ = (__typeof__(*(ptr))) \
> > +            __xchg((ptr), (unsigned long)(x), sizeof(*(ptr))); \
> > +    ret__; \
> > +})
> > +
> > +#define xchg32(ptr, x) \
> > +({ \
> > +    BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
> > +    xchg((ptr), (x)); \
> > +})
> > +
> > +#define xchg64(ptr, x) \
> > +({ \
> > +    BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
> > +    xchg((ptr), (x)); \
> > +})
> 
> What are these two (and their cmpxchg counterparts) needed for?
It was copied from Linux, but it looks like they can be really dropped,
I can't find where it is used, so I'll drop them.

> 
> > +/*
> > + * Atomic compare and exchange.  Compare OLD with MEM, if
> > identical,
> > + * store NEW in MEM.  Return the initial value in MEM.  Success is
> > + * indicated by comparing RETURN with OLD.
> > + */
> > +#define __cmpxchg_relaxed(ptr, old, new, size) \
> > +({ \
> > +    __typeof__(ptr) ptr__ = (ptr); \
> > +    __typeof__(*(ptr)) __old = (old); \
> 
> Leftover leading underscores?
> 
> > +    __typeof__(*(ptr)) new__ = (new); \
> 
> Related to my earlier comment on types needing to be compatible - see
> how here you're using "ptr" throughout.
> 
> > +    __typeof__(*(ptr)) ret__; \
> > +    register unsigned int __rc; \
> 
> More leftover leading underscores?
Missed to drop underscores. Thanks. I'll drop them in next version.
> 
> > +static always_inline unsigned short __cmpxchg_case_2(volatile
> > uint32_t *ptr,
> > +                                                     uint32_t old,
> > +                                                     uint32_t new)
> > +{
> > +    (void) old;
> > +
> > +    if (((unsigned long)ptr & 3) == 3)
> > +    {
> > +#ifdef CONFIG_64BIT
> > +        return __emulate_cmpxchg_case1_2((uint64_t *)ptr, new,
> > +                                         readq, __cmpxchg_case_8,
> > 0xffffU);
> 
> What if ((unsigned long)ptr & 7) == 7 (which is a sub-case of what
> the
> if() above checks for? Isn't it more reasonable to require aligned
> 16-bit quantities here? Or if mis-aligned addresses are okay, you
> could
> as well emulate using __cmpxchg_case_4().
Yes, it will be more reasonable. I'll use IS_ALIGNED instead.

> 
> Also you shouldn't be casting away volatile (here and below).
> Avoiding
> the casts (by suitable using volatile void * parameter types) would
> likely be best.
Thanks. I'll update the function prototype.

~ Oleksii


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.