
Re: [PATCH v6 10/20] xen/riscv: introduce atomic.h



On Mon, 2024-03-25 at 09:18 +0100, Jan Beulich wrote:
> On 22.03.2024 13:25, Oleksii wrote:
> > On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote:
> > > On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > > > + */
> > > > +static always_inline void read_atomic_size(const volatile void
> > > > *p,
> > > > +                                           void *res,
> > > > +                                           unsigned int size)
> > > > +{
> > > > +    switch ( size )
> > > > +    {
> > > > +    case 1: *(uint8_t *)res = readb(p); break;
> > > > +    case 2: *(uint16_t *)res = readw(p); break;
> > > > +    case 4: *(uint32_t *)res = readl(p); break;
> > > > +    case 8: *(uint32_t *)res  = readq(p); break;
> > > 
> > > Nit: Excess blank before =.
> > > 
> > > Also - no #ifdef here to be RV32-ready?
> > Because there is #ifdef RV32 in io.h for readq().
> 
> There you'd run into __raw_readq()'s BUILD_BUG_ON(), afaict even for
> 1-, 2-, or 4-byte accesses. That's not quite what we want here.
Do you mean the case where someone redefines readq() differently and
doesn't wrap it in #ifdef RV32? Other than that, I am not sure there is
an issue: it would still be a compilation error, so an implementation
of __raw_readq() would have to be provided anyway.

One of the reasons I decided to do the wrapping with #ifdef RV32 in
io.h was to avoid going through the source code and adding wrapping
everywhere. Some code would also need to be rewritten; for example, I
am not sure that an #ifdef can be added inside a macro (though see the
workaround sketched after the macro), f.e.:
   #define write_atomic(p, x)                              \
   ({                                                      \
       typeof(*(p)) x__ = (x);                             \
       switch ( sizeof(*(p)) )                             \
       {                                                   \
       case 1: writeb(x__, p); break;                      \
       case 2: writew(x__, p); break;                      \
       case 4: writel(x__, p); break;                      \
       case 8: writeq(x__, p); break;                      \
       default: __bad_atomic_size(); break;                \
       }                                                   \
       x__;                                                \
   })
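
If the 8-byte case did have to become configuration-dependent, one
possible workaround would be to route that arm through a helper that is
defined per configuration, since the preprocessor doesn't allow #ifdef
inside a #define. An untested sketch (writeq_or_bug() is a made-up
name, not anything existing in Xen):

   /* Defined once, outside any macro body. */
   #ifdef CONFIG_RISCV_32
   #define writeq_or_bug(x, p) __bad_atomic_size()
   #else
   #define writeq_or_bug(x, p) writeq(x, p)
   #endif

and then "case 8: writeq_or_bug(x__, p); break;" in write_atomic(), so
the macro body itself stays free of #ifdef-ery.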

> 
> > > > +/*
> > > > + * First, the atomic ops that have no ordering constraints and
> > > > therefor don't
> > > > + * have the AQ or RL bits set.  These don't return anything,
> > > > so
> > > > there's only
> > > > + * one version to worry about.
> > > > + */
> > > > +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix)  \
> > > > +static inline                                               \
> > > > +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > > > +{                                                           \
> > > > +    asm volatile (                                          \
> > > > +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
> > > > +        : "+A" (v->counter)                                 \
> > > > +        : "r" (I)                                           \
> > > 
> > > Btw, I consider this pretty confusing. At the 1st and 2nd glance
> > > this
> > > looks like a mistake, i.e. as if i was meant. Imo ...
> > > 
> > > > +        : "memory" );                                       \
> > > > +}                                                           \
> > > > +
> > > > +/*
> > > > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is
> > > > the
> > > > reason why
> > > > + * last argument for ATOMIC_OP isn't used.
> > > > + */
> > > > +#define ATOMIC_OPS(op, asm_op, I)                           \
> > > > +        ATOMIC_OP (op, asm_op, I, w, int,   )
> > > > +
> > > > +ATOMIC_OPS(add, add,  i)
> > > > +ATOMIC_OPS(sub, add, -i)
> > > > +ATOMIC_OPS(and, and,  i)
> > > > +ATOMIC_OPS( or,  or,  i)
> > > > +ATOMIC_OPS(xor, xor,  i)
> > > 
> > > ... here you want to only pass the (unary) operator (and leaving
> > > that
> > > blank
> > > is as fine as using +).
> > I agree that the game with 'i' and 'I' looks confusing, but I do
> > not really understand what is wrong with using 'i' here. The
> > preprocessed macros look fine:
> >    static inline void atomic_add(int i, atomic_t *v)
> >    { asm volatile ( "   amo" "add" "." "w" " zero, %1, %0"
> >                     : "+A" (v->counter) : "r" (i) : "memory" ); }
> >
> >    static inline void atomic_sub(int i, atomic_t *v)
> >    { asm volatile ( "   amo" "add" "." "w" " zero, %1, %0"
> >                     : "+A" (v->counter) : "r" (-i) : "memory" ); }
> 
> I didn't question the pre-processed result being correct. Instead I
> said
> that I consider the construct confusing to the reader, for looking as
> if
> there was a mistake (in the case of the letter i used). Note also in
> particular how the macro invocations need to be in sync with the
> macro
> implementation, for lower case i being used both in the macro and in
> its
> invocations. Anything parameterized would better be fully so, at the
> very least to avoid, as said, confusion. (Having macros depend on
> context at their use sites _may_ be okay for local helper macros, but
> here we're talking about a not even private header file.)
Then I am not sure I understand how passing the unary operator (e.g.
'+' or '-', or leaving it blank) instead of 'i'/'-i' would
significantly reduce the confusion.
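
That said, if I read the suggestion correctly, it would look roughly
like this (an untested sketch; I renamed the parameter to unary_op for
clarity):

   #define ATOMIC_OP(op, asm_op, unary_op, asm_type, c_type, prefix) \
   static inline                                                     \
   void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v)       \
   {                                                                 \
       asm volatile (                                                \
           "   amo" #asm_op "." #asm_type " zero, %1, %0"            \
           : "+A" (v->counter)                                       \
           : "r" (unary_op(i))                                       \
           : "memory" );                                             \
   }

   #define ATOMIC_OPS(op, asm_op, unary_op)                          \
           ATOMIC_OP(op, asm_op, unary_op, w, int,   )

   ATOMIC_OPS(add, add,  )   /* expands to "r" ((i))  */
   ATOMIC_OPS(sub, add, -)   /* expands to "r" (-(i)) */

i.e. only the macro implementation would mention i, and the
invocations would pass just the operator.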

> 
> > > > +/* This is required to provide a full barrier on success. */
> > > > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > > > +{
> > > > +       int prev, rc;
> > > > +
> > > > +    asm volatile (
> > > > +        "0: lr.w     %[p],  %[c]\n"
> > > > +        "   beq      %[p],  %[u], 1f\n"
> > > > +        "   add      %[rc], %[p], %[a]\n"
> > > > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > > > +        "   bnez     %[rc], 0b\n"
> > > > +        RISCV_FULL_BARRIER
> > > 
> > > With this and no .aq on the load, why the .rl on the store?
> > It is something that LKMM requires [1].
> > 
> > It is not fully clear to me what is so specific about LKMM, but
> > according to the spec:
> >    Ordering Annotation Fence-based Equivalent
> >    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
> >    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
> >    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
> >    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
> >    amo<op>.aq          amo<op>; fence r,rw
> >    amo<op>.rl          fence rw,w; amo<op>
> >    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
> >    Table 2.2: Mappings from .aq and/or .rl to fence-based equivalents.
> >    An alternative mapping places a fence rw,rw after the existing
> >    s{b|h|w|d|c} mapping rather than at the front of the l{b|h|w|d|r}
> >    mapping.
> 
> I'm afraid I can't spot the specific case in this table. None of the
> stores in the right column have a .rl suffix.
Yes, that is expected: the right-hand column lists the fence-based
equivalents, so it intentionally carries no annotations.

I read the table as saying that (f.e.) amo<op>.rl is equivalent to
"fence rw,w; amo<op>", i.e. the same instruction without the .rl
annotation.
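
Applied to the store-conditional in the code above, that reading gives
(illustration only, using the same operands as in the patch):

   sc.w.rl  %[rc], %[rc], %[c]    # annotated form
   # which, per Table 2.2, is equivalent to:
   fence    rw, w
   sc.w     %[rc], %[rc], %[c]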

> 
> >    It is also safe to translate any .aq, .rl, or .aqrl annotation
> >    into the fence-based snippets of Table 2.2. These can also be
> >    used as a legal implementation of l{b|h|w|d} or s{b|h|w|d}
> >    pseudoinstructions for as long as those instructions are not
> >    added to the ISA.
> > 
> > So according to the spec, it should be:
> >  sc.w ...
> >  RISCV_FULL_BARRIER.
> > 
> > Considering [1] and how this code looked before, it seems to me
> > that it is safe to use lr.w.aq/sc.w.rl here, or a fence-based
> > equivalent.
> 
> Here you say "or". Then why does the code use sc.?.rl _and_ a fence?
I confused this line with amo<op>.aqrl. Based on Table 2.2 above,
s{b|h|w|d|c}.aqrl maps to "fence rw,w; s{b|h|w|d|c}", but the Linux
kernel decided to strengthen it with a full barrier:
   -              "0:\n\t"
   -              "lr.w.aqrl  %[p],  %[c]\n\t"
   -              "beq        %[p],  %[u], 1f\n\t"
   -              "add       %[rc],  %[p], %[a]\n\t"
   -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
   -              "bnez      %[rc], 0b\n\t"
   -              "1:"
   +               "0:     lr.w     %[p],  %[c]\n"
   +               "       beq      %[p],  %[u], 1f\n"
   +               "       add      %[rc], %[p], %[a]\n"
   +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
   +               "       bnez     %[rc], 0b\n"
   +               "       fence    rw, rw\n"
   +               "1:\n"
They did so because they ran into the following issue:
   implementations of atomics such as atomic_cmpxchg() and
   atomic_add_unless() rely on LR/SC pairs,
   which do not give full-ordering with .aqrl; for example, current
   implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
   below to end up with the state indicated in the "exists" clause.

> 
> > But in general it ( a combination of fence, .aq, .rl ) can be
> > considered as the same things in this context, so it is possible to
> > leave this function as is to be synced here with Linux kernel.
> 
> In turn I also don't understand this. Yes, the excess .rl certainly
> doesn't render things unsafe. But what's the purpose of the .rl?
> That's
> what my original question boiled down to.
I don't know either. As I mentioned before, in my opinion it is enough
to have either a full barrier, or .aq/.rl, or .aqrl/.aqrl (if it needs
to be strengthened), as was done before in Linux.

It seems to me it is LKMM-specific that they need to strengthen things
beyond what the RISC-V memory model requires, because:
"sc.w ; fence rw, rw" does not guarantee that all previous reads and
writes finish before the sc itself is globally visible, which might
matter if the sc is unlocking a lock or something.
In other words, the .rl on the sc orders the preceding accesses before
the sc itself, which the trailing fence alone does not provide.

Despite the fact that, for compare-and-swap loops, RISC-V
International recommends lr.w.aq/lr.d.aq followed by sc.w.rl/sc.d.rl
(as was implemented before in the Linux kernel), I am okay with
strengthening the implementations with fences, both for safety reasons
and for the reason mentioned in the last sentence of the previous
paragraph.
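
For reference, the RVI-recommended shape would look roughly like this
(an untested sketch; the operand constraints follow the usual Linux
pattern for such loops and are my assumption, since the hunk quoted
above doesn't show them):

   static inline int atomic_add_unless(atomic_t *v, int a, int u)
   {
       int prev, rc;

       asm volatile (
           "0: lr.w.aq  %[p],  %[c]\n"        /* acquire on the load */
           "   beq      %[p],  %[u], 1f\n"
           "   add      %[rc], %[p], %[a]\n"
           "   sc.w.rl  %[rc], %[rc], %[c]\n" /* release on the store */
           "   bnez     %[rc], 0b\n"
           "1:\n"
           : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
           : [a] "r" (a), [u] "r" (u)
           : "memory" );

       return prev;
   }

i.e. no trailing fence: the acquire/release annotations alone provide
the (weaker) ordering, which is exactly what the LKMM folks deemed
insufficient for a full barrier on success.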

~ Oleksii
> 
> Jan
> 
> > [1]
> > https://lore.kernel.org/lkml/1520274276-21871-1-git-send-email-parri.andrea@xxxxxxxxx/
> > 
> > ~ Oleksii
> 
