
Re: [PATCH v6 08/20] xen/riscv: introduce cmpxchg.h



On Fri, 2024-03-22 at 11:42 +0100, Jan Beulich wrote:
> On 22.03.2024 11:32, Oleksii wrote:
> > On Thu, 2024-03-21 at 13:07 +0100, Jan Beulich wrote:
> > > On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > > > The header was taken from Linux kernel 6.4.0-rc1.
> > > > 
> > > > Additionally, the following updates were made:
> > > > * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
> > > >   access.
> > > > * replace tabs with spaces
> > > > * replace __* variables with *__
> > > > * introduce generic versions of xchg_* and cmpxchg_*.
> > > > * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
> > > 
> > > With this, ...
> > > 
> > > > * drop barriers and use instruction suffixes instead ( .aq, .rl, .aqrl )
> > > > 
> > > > Implementation of the 4- and 8-byte cases was updated according to
> > > > the spec:
> > > > ```
> > > >               ....
> > > > Linux Construct         RVWMO AMO Mapping
> > > > atomic <op> relaxed     amo<op>.{w|d}
> > > > atomic <op> acquire     amo<op>.{w|d}.aq
> > > > atomic <op> release     amo<op>.{w|d}.rl
> > > > atomic <op>             amo<op>.{w|d}.aqrl
> > > > Linux Construct         RVWMO LR/SC Mapping
> > > > atomic <op> relaxed     loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> > > > atomic <op> acquire     loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> > > > atomic <op> release     loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗ ; bnez loop OR
> > > >                         fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗ ; bnez loop
> > > > atomic <op>             loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> > > > 
> > > > Table A.5: Mappings from Linux memory primitives to RISC-V primitives
> > > > 
> > > > ```
> > > 
> > > ... I consider quoting this table in full, without any further
> > > remarks, as confusing: Three of the lines each are inapplicable now,
> > > aiui.
> > I'll shorten the table then.
> > 
> > > 
> > > Further, what are the two * telling us? Quite likely they aren't
> > > there just accidentally.
> > > 
> > > Finally, why sc.{w|d}.aqrl when in principle one would expect just
> > > sc.{w|d}.rl?
> > Because according to the last line of table A.5:
> >     atomic <op>             loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl;
> > the sc.{w|d}.aqrl form is used there, so I decided to stick to what
> > is mentioned in the table.
> 
> I understand it's mentioned that way in the table. But it being that
> way is not explained anywhere. Hence my "Why?"
> 
> > > > +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> > > > +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
> > > 
> > > The casts aren't very nice to have here; I take they're needed for
> > > cmpxchg_ptr() to compile?
> > Not really, I have not faced a compilation issue.
> > The reason why they were added is that the lr instruction places a
> > sign-extended value in the destination register, but without the casts
> > the values for old and new were generated without sign extension, so,
> > for example:
> >    u32 = 0xbbbbbbbb;
> >    cmpxchg(&u32, 0xbbbbbbbb, 0xCCCCCCCC);
> > will fail because after:
> >        "0: lr" lr_sfx " %0, %2\n"
> > %0 will hold 0xFFFFFFFFBBBBBBBB while %3 will hold 0xBBBBBBBB, so in
> >        "   bne  %0, %z3, 1f\n"
> > %0 and %3 are always unequal whenever the most significant bit of the
> > value read from %2 is 1.
> 
> I'm afraid I don't follow: It's the type conversion you're after, but
> that would happen also with the casts omitted.
Yes, I agree it would happen also with the casts omitted; it was pure
luck that, with the casts, the compiler used an immediate with the most
significant bit set to 1:
ffffffffc00397f0:       bbbbc7b7                lui     a5,0xbbbbc
ffffffffc00397f4:       bbb78793                add     a5,a5,-1093 # ffffffffbbbbbbbb <start-0x4444445>
ffffffffc00397f8:       fef42623                sw      a5,-20(s0)
ffffffffc00397fc:       ccccd737                lui     a4,0xccccd
ffffffffc0039800:       ccc7071b                addw    a4,a4,-820 # ffffffffcccccccc <__bss_end+0xcc7ff44>
ffffffffc0039804:       56fd                    li      a3,-1
ffffffffc0039806:       9281                    srl     a3,a3,0x20
ffffffffc0039808:       fec40513                add     a0,s0,-20
ffffffffc003980c:       140525af                lr.w.aq a1,(a0)
ffffffffc0039810:       00f59563                bne     a1,a5,ffffffffc003981a <start_xen+0x4e>
ffffffffc0039814:       1ee5262f                sc.w.aqrl       a2,a4,(a0)
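
To make the failure mode concrete, here is a minimal stand-alone C
sketch (purely illustrative, not the Xen code) of why a sign-extended
lr.w result can never equal a zero-extended comparand unless one of
them is adjusted:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t mem = 0xbbbbbbbb;

    /* lr.w writes the 32-bit value sign-extended into the 64-bit register. */
    uint64_t loaded = (uint64_t)(int64_t)(int32_t)mem;  /* 0xffffffffbbbbbbbb */

    /* "old" kept only zero-extended, as the compiler may produce it
     * when the casts in the macro are dropped. */
    uint64_t old_zext = 0xbbbbbbbbULL;                  /* 0x00000000bbbbbbbb */

    printf("without mask: %s\n",
           loaded == old_zext ? "match" : "mismatch");  /* mismatch */

    /* Masking the loaded value down to the access width restores the
     * match, independently of how the comparand was extended. */
    uint64_t mask = 0xffffffffULL;
    printf("with mask:    %s\n",
           (loaded & mask) == old_zext ? "match" : "mismatch");  /* match */

    return 0;
}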

I will update the code with the mask mentioned below to be sure that a5
always has the correct value.
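
For reference (my own quick check, using local stand-ins for Xen's
GENMASK() and BITS_PER_BYTE on a 64-bit build), the mask in that
variant evaluates to 0xff, 0xffff, 0xffffffff and ~0UL for 1-, 2-, 4-
and 8-byte accesses respectively:

#include <stdio.h>

/* Local stand-ins, assumed equivalent to Xen's definitions on RV64. */
#define BITS_PER_BYTE 8
#define GENMASK(h, l) ((~0UL << (l)) & (~0UL >> (63 - (h))))

int main(void)
{
    /* Mask computed by the proposed __generic_cmpxchg() per access size. */
    for ( unsigned int size = 1; size <= 8; size <<= 1 )
        printf("sizeof == %u -> mask 0x%016lx\n",
               size, GENMASK(size * BITS_PER_BYTE - 1, 0));

    return 0;
}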

~ Oleksii

> 
> > But now I realized that it would be better just to use a mask and not
> > depend on compiler code generation, so it could be done in the
> > following way ( of course, the macros should be updated according to
> > the remarks you gave me ):
> >    #define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> >    ({ \
> >        register unsigned int rc; \
> >        unsigned long mask = \
> >            GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0); \
> >        asm volatile( \
> >            "0: lr" lr_sfx " %0, %2\n" \
> >            "   and  %0, %0, %z[mask]\n" \
> >            "   bne  %0, %z3, 1f\n" \
> >            "   sc" sc_sfx " %1, %z4, %2\n" \
> >            "   bnez %1, 0b\n" \
> >            "1:\n" \
> >            : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> >            : "rJ" (old), "rJ" (new), [mask] "rJ" (mask) \
> >            : "memory"); \
> >    })
> 
> It'll be up to you whether to switch to such an alternative.
> 
> Jan




 

