
Re: [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h


  • To: Oleksii <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 23 Jan 2024 11:28:00 +0100
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Bob Eshleman <bobbyeshleman@xxxxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 23 Jan 2024 10:28:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 23.01.2024 11:15, Oleksii wrote:
> On Mon, 2024-01-22 at 17:27 +0100, Jan Beulich wrote:
>> On 22.12.2023 16:12, Oleksii Kurochko wrote:
>>> +            : "=r" (ret__), "+A" (*ptr__) \
>>> +            : "r" (new__) \
>>> +            : "memory" ); \
>>> +        break; \
>>> +    case 8: \
>>> +        asm volatile( \
>>> +            "      amoswap.d %0, %2, %1\n" \
>>> +            : "=r" (ret__), "+A" (*ptr__) \
>>> +            : "r" (new__) \
>>> +            : "memory" ); \
>>> +        break; \
>>> +    default: \
>>> +        ASSERT_UNREACHABLE(); \
>>
>> If at all possible this wants to trigger a build failure, not a
>> runtime one.
> I'll update that with BUILD_BUG_ON(size > 8);

What about size accidentally being e.g. 5? I'm also not sure you'll be
able to use BUILD_BUG_ON() here: That'll depend on what use sites there
are. And if all present ones are okay in this regard, you'd still set
out a trap for someone else to fall into later. We have a common
approach for this, which currently is under re-work. See
https://lists.xen.org/archives/html/xen-devel/2024-01/msg01115.html.
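
E.g. one long-standing pattern (a sketch only - the reworked common
helper in the series linked above may end up looking different) is to
reference a deliberately undefined function in the default case:

    /* Never defined anywhere, so a reference to it can only be
     * satisfied if the compiler eliminates it as dead code. */
    extern unsigned long __bad_xchg(volatile void *ptr, int size);

    #define __xchg_relaxed(ptr, new, size) \
    ({ \
        __typeof__(ptr) ptr__ = (ptr); \
        __typeof__(new) new__ = (new); \
        __typeof__(*(ptr)) ret__; \
        switch ( size ) \
        { \
        case 4: \
            asm volatile ( "amoswap.w %0, %2, %1" \
                           : "=r" (ret__), "+A" (*ptr__) \
                           : "r" (new__) \
                           : "memory" ); \
            break; \
        case 8: \
            asm volatile ( "amoswap.d %0, %2, %1" \
                           : "=r" (ret__), "+A" (*ptr__) \
                           : "r" (new__) \
                           : "memory" ); \
            break; \
        default: \
            /* Any unhandled size (5 included) now fails at link \
             * time rather than merely asserting at run time. */ \
            ret__ = __bad_xchg(ptr__, size); \
            break; \
        } \
        ret__; \
    })

Since "size" is a compile-time constant at the use sites, the default
case is dead code for the supported sizes and the reference to
__bad_xchg() gets optimized away; only an unsupported size lets it
survive to the linker.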

>>> +    } \
>>> +    ret__; \
>>> +})
>>> +
>>> +#define xchg_relaxed(ptr, x) \
>>> +({ \
>>> +    __typeof__(*(ptr)) x_ = (x); \
>>> +    (__typeof__(*(ptr))) __xchg_relaxed((ptr), x_, sizeof(*(ptr))); \
>>
>> Nit: Stray blank after cast. For readability I'd also suggest
>> dropping the parentheses in cases like the first argument passed to
>> __xchg_relaxed() here.
> Thanks. I'll take that into account.
> 
>>
>>> +})
>>
>> For both: What does "relaxed" describe? I'm asking because it's not
>> really clear whether the memory clobbers are actually needed.
>>
>>> +#define __xchg_acquire(ptr, new, size) \
>>> +({ \
>>> +    __typeof__(ptr) ptr__ = (ptr); \
>>> +    __typeof__(new) new__ = (new); \
>>> +    __typeof__(*(ptr)) ret__; \
>>> +    switch (size) \
>>> +   { \
>>> +    case 4: \
>>> +        asm volatile( \
>>> +            "      amoswap.w %0, %2, %1\n" \
>>> +            RISCV_ACQUIRE_BARRIER \
>>> +            : "=r" (ret__), "+A" (*ptr__) \
>>> +            : "r" (new__) \
>>> +            : "memory" ); \
>>> +        break; \
>>> +    case 8: \
>>> +        asm volatile( \
>>> +            "      amoswap.d %0, %2, %1\n" \
>>> +            RISCV_ACQUIRE_BARRIER \
>>> +            : "=r" (ret__), "+A" (*ptr__) \
>>> +            : "r" (new__) \
>>> +            : "memory" ); \
>>> +        break; \
>>> +    default: \
>>> +        ASSERT_UNREACHABLE(); \
>>> +    } \
>>> +    ret__; \
>>> +})
>>
>> If I'm not mistaken this differs from __xchg_relaxed() only in the
>> use of RISCV_ACQUIRE_BARRIER, and ...
>>
>>> +#define xchg_acquire(ptr, x) \
>>> +({ \
>>> +    __typeof__(*(ptr)) x_ = (x); \
>>> +    (__typeof__(*(ptr))) __xchg_acquire((ptr), x_, sizeof(*(ptr))); \
>>> +})
>>> +
>>> +#define __xchg_release(ptr, new, size) \
>>> +({ \
>>> +    __typeof__(ptr) ptr__ = (ptr); \
>>> +    __typeof__(new) new__ = (new); \
>>> +    __typeof__(*(ptr)) ret__; \
>>> +    switch (size) \
>>> +   { \
>>> +    case 4: \
>>> +        asm volatile ( \
>>> +            RISCV_RELEASE_BARRIER \
>>> +            "      amoswap.w %0, %2, %1\n" \
>>> +            : "=r" (ret__), "+A" (*ptr__) \
>>> +            : "r" (new__) \
>>> +            : "memory"); \
>>> +        break; \
>>> +    case 8: \
>>> +        asm volatile ( \
>>> +            RISCV_RELEASE_BARRIER \
>>> +            "      amoswap.d %0, %2, %1\n" \
>>> +            : "=r" (ret__), "+A" (*ptr__) \
>>> +            : "r" (new__) \
>>> +            : "memory"); \
>>> +        break; \
>>> +    default: \
>>> +        ASSERT_UNREACHABLE(); \
>>> +    } \
>>> +    ret__; \
>>> +})
>>
>> this only in the use of RISCV_RELEASE_BARRIER. If so they likely want
>> folding, to limit redundancy and make eventual updating easier. (Same
>> for the cmpxchg helper further down, as it seems.)
> Yes, you are right. The difference is only in RISCV_RELEASE_BARRIER,
> and the absence of that barrier is the reason why we have the
> "relaxed" versions.
> 
> I am not sure I understand what you mean by folding here. Do you mean
> that there is no sense in having two separate macros, and that only
> one with RISCV_RELEASE_BARRIER is needed?

No. You should parameterize the folded common macro for the derived
macros to simply pass in the barriers needed, with empty macro
arguments indicating "this barrier not needed".
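
E.g. (a sketch; the name __xchg_common() and the exact barrier
plumbing are only illustrative):

    #define __xchg_common(ptr, new, size, barrier_before, barrier_after) \
    ({ \
        __typeof__(ptr) ptr__ = (ptr); \
        __typeof__(new) new__ = (new); \
        __typeof__(*(ptr)) ret__; \
        switch ( size ) \
        { \
        case 4: \
            asm volatile ( barrier_before \
                           "amoswap.w %0, %2, %1\n" \
                           barrier_after \
                           : "=r" (ret__), "+A" (*ptr__) \
                           : "r" (new__) \
                           : "memory" ); \
            break; \
        case 8: \
            asm volatile ( barrier_before \
                           "amoswap.d %0, %2, %1\n" \
                           barrier_after \
                           : "=r" (ret__), "+A" (*ptr__) \
                           : "r" (new__) \
                           : "memory" ); \
            break; \
        default: \
            ASSERT_UNREACHABLE(); \
        } \
        ret__; \
    })

    #define __xchg_relaxed(ptr, new, size) \
        __xchg_common(ptr, new, size, "", "")
    #define __xchg_acquire(ptr, new, size) \
        __xchg_common(ptr, new, size, "", RISCV_ACQUIRE_BARRIER)
    #define __xchg_release(ptr, new, size) \
        __xchg_common(ptr, new, size, RISCV_RELEASE_BARRIER, "")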

>>> +#define xchg_release(ptr, x) \
>>> +({ \
>>> +    __typeof__(*(ptr)) x_ = (x); \
>>> +    (__typeof__(*(ptr))) __xchg_release((ptr), x_, sizeof(*(ptr))); \
>>> +})
>>> +
>>> +static always_inline uint32_t __xchg_case_4(volatile uint32_t *ptr,
>>> +                                            uint32_t new)
>>> +{
>>> +    __typeof__(*(ptr)) ret;
>>> +
>>> +    asm volatile (
>>> +        "   amoswap.w.aqrl %0, %2, %1\n"
>>> +        : "=r" (ret), "+A" (*ptr)
>>> +        : "r" (new)
>>> +        : "memory" );
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static always_inline uint64_t __xchg_case_8(volatile uint64_t *ptr,
>>> +                                            uint64_t new)
>>> +{
>>> +    __typeof__(*(ptr)) ret;
>>> +
>>> +    asm volatile( \
>>> +        "   amoswap.d.aqrl %0, %2, %1\n" \
>>> +        : "=r" (ret), "+A" (*ptr) \
>>> +        : "r" (new) \
>>> +        : "memory" ); \
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static always_inline unsigned short __cmpxchg_case_2(volatile uint32_t *ptr,
>>> +                                                     uint32_t old,
>>> +                                                     uint32_t new);
>>
>> Don't you consistently mean uint16_t here (incl the return type) and
>> ...
>>
>>> +static always_inline unsigned short __cmpxchg_case_1(volatile uint32_t *ptr,
>>> +                                                     uint32_t old,
>>> +                                                     uint32_t new);
>>
>> ... uint8_t here?
> The idea was that we emulate __cmpxchg_case_1 and __cmpxchg_case_2
> using the 4-byte cmpxchg, and __cmpxchg_case_4 expects uint32_t.

I consider this wrong. The functions would better be type-correct.
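
I.e. something like (sketch):

    static always_inline uint8_t __cmpxchg_case_1(volatile uint8_t *ptr,
                                                  uint8_t old, uint8_t new);
    static always_inline uint16_t __cmpxchg_case_2(volatile uint16_t *ptr,
                                                   uint16_t old, uint16_t new);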

>>> +static inline unsigned long __xchg(volatile void *ptr, unsigned long x, int size)
>>> +{
>>> +    switch (size) {
>>> +    case 1:
>>> +        return __cmpxchg_case_1(ptr, (uint32_t)-1, x);
>>> +    case 2:
>>> +        return __cmpxchg_case_2(ptr, (uint32_t)-1, x);
>>
>> How are these going to work? You'll compare against ~0, and if the
>> value in memory isn't ~0, memory won't be updated; you will only
>> (correctly) return the value found in memory.
>>
>> Or wait - looking at __cmpxchg_case_{1,2}() far further down, you
>> ignore "old" there. Which apparently means they'll work for the use
>> here, but not for the use in __cmpxchg().
> Yes, the trick is that old is ignored and is read in
> __emulate_cmpxchg_case1_2() before __cmpxchg_case_4 is called:
>     do {
>         read_val = read_func(aligned_ptr);
>         swapped_new = read_val & ~mask;
>         swapped_new |= masked_new;
>         ret = cmpxchg_func(aligned_ptr, read_val, swapped_new);
>     } while ( ret != read_val );
> read_val is 'old'.
> 
> But now I am not 100% sure that it is correct for __cmpxchg...

It just can't be correct - you can't ignore "old" there. I think you
want simple cmpxchg primitives, which xchg then uses in a loop (while
cmpxchg uses them plainly).
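
Roughly along these lines (a sketch, assuming a type-correct
__cmpxchg_case_2() which actually honours "old"):

    static always_inline uint16_t __xchg_case_2(volatile uint16_t *ptr,
                                                uint16_t new)
    {
        uint16_t old;

        /* Re-read and retry until the cmpxchg succeeds, i.e. until the
         * value we read is still the one in memory at swap time. */
        do {
            old = *ptr;
        } while ( __cmpxchg_case_2(ptr, old, new) != old );

        return old;
    }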

>>> +static always_inline unsigned short __cmpxchg_case_2(volatile uint32_t *ptr,
>>> +                                                     uint32_t old,
>>> +                                                     uint32_t new)
>>> +{
>>> +    (void) old;
>>> +
>>> +    if (((unsigned long)ptr & 3) == 3)
>>> +    {
>>> +#ifdef CONFIG_64BIT
>>> +        return __emulate_cmpxchg_case1_2((uint64_t *)ptr, new,
>>> +                                         readq, __cmpxchg_case_8, 0xffffU);
>>
>> What if ((unsigned long)ptr & 7) == 7 (which is a sub-case of what
>> the if() above checks for)? Isn't it more reasonable to require
>> aligned 16-bit quantities here? Or if mis-aligned addresses are okay,
>> you could as well emulate using __cmpxchg_case_4().
> Yes, it will be more reasonable. I'll use IS_ALIGNED instead.

Not sure I get your use of "instead" here correctly. There's more
to change here than just the if() condition.
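
Just to sketch a possible shape (illustrative only; this assumes ptr
is 2-byte aligned, a type-correct __cmpxchg_case_4(), and RISC-V's
little-endian byte order):

    static always_inline uint16_t __cmpxchg_case_2(volatile uint16_t *ptr,
                                                   uint16_t old, uint16_t new)
    {
        /* The naturally aligned 32-bit word containing *ptr. */
        volatile uint32_t *aligned =
            (volatile uint32_t *)((unsigned long)ptr & ~3UL);
        unsigned int shift = ((unsigned long)ptr & 2) * 8;
        uint32_t mask = 0xffffU << shift;
        uint32_t read, replacement;

        do {
            read = *aligned;
            /* Fail (like any cmpxchg) once the 16-bit view differs from "old". */
            if ( (uint16_t)((read & mask) >> shift) != old )
                break;
            replacement = (read & ~mask) | ((uint32_t)new << shift);
        } while ( __cmpxchg_case_4(aligned, read, replacement) != read );

        return (uint16_t)((read & mask) >> shift);
    }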

Jan



 

