[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] x86/atomic: Improvements and simplifications to assembly constraints



>>> On 18.03.19 at 15:11, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 18/03/2019 13:20, Jan Beulich wrote:
>>  >>> On 18.03.19 at 12:27, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> * Some of the single-byte versions specify "=q" as the output.  This is a
>>>    remnent of the 32bit build and can be relaxed to "=r" in 64bit builds.
>> I have to admit that I don't understand the "relaxed" part of this:
>> "q" and "r" represent the exact same set of registers on 64-bit.
>> Unless the conversion allows further code folding, I think it wouldn't
>> be a bad idea to retain the distinction, just for cases like code
>> eventually getting shared via something like lib/x86/.
> 
> The change from =q to =r is specifically to allow the folding to +r

Hmm, I still don't understand. "+q" is as valid as "+r". And "q" does
not make any size implications, i.e. registers in that group aren't
implicitly byte ones (albeit this aspect doesn't even matter here).

>>> The reason the volatile cast in __cmpxchg_user() can't be dropped is because
>>> without it, the compiler uses a stack copy rather than the in-memory copy,
>>> which ends up tripping:
>>>
>>>   /* Allowed to change in Accessed/Dirty flags only. */
>>>   BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
>> Isn't this hinting at some other shortcoming or even flaw then?
>> If the compiler generally did such transformations, I'm afraid a
>> lot of other code would be at risk too, including some of what
>> you modify here.
> 
> I don't think there is any flaw or shortcoming.  Without the volatile,
> the compiler doesn't know that there are any side effects, so can
> legitimately operate on a local stack copy so long as it copies things
> back later.
> 
> In practice, this is an operation on shared memory which has to happen
> on the shared memory pointer.

Yes, of course. But volatile isn't guaranteed to have this effect.
Iirc there's even a statement to this effect in the gcc docs. It
just so happens that the compiler wouldn't normally make a
memory copy of a memory variable. But it wouldn't normally do
so for non-volatile memory either (because it would be an
anti-optimization), so there has to be something special here.

>> In any event I think it would be a good idea to have a code
>> comment for this as well.
> 
> I don't see how that would help.  The same applies to all atomic
> operations, even test_bit().

I don't think any other operation has casts to void in it - certainly
none of the others you touch here. In fact in (at least) __xchg()
and __cmpxchg() you cast away volatile, which I think you want
to avoid. test_bit() (as the example you give) doesn't cast to
volatile, it merely alters the type, but the incoming pointer is
already a volatile one.

Then again this is a macro, so I guess the cast can be counted to
be the equivalent of an implicit conversion due to a function
parameter being pointer-to-volatile.

>>> @@ -40,28 +37,24 @@ static always_inline unsigned long __xchg(
>>>      switch ( size )
>>>      {
>>>      case 1:
>>> -        asm volatile ( "xchgb %b0,%1"
>>> -                       : "=q" (x)
>>> -                       : "m" (*__xg(ptr)), "0" (x)
>>> -                       : "memory" );
>>> +        asm volatile ( "xchg %b[x], %[ptr]"
>>> +                       : [x] "+r" (x), [ptr] "+m" (*(uint8_t *)ptr)
>>> +                       :: "memory" );
>>>          break;
>>>      case 2:
>>> -        asm volatile ( "xchgw %w0,%1"
>>> -                       : "=r" (x)
>>> -                       : "m" (*__xg(ptr)), "0" (x)
>>> -                       : "memory" );
>>> +        asm volatile ( "xchg %w[x], %[ptr]"
>>> +                       : [x] "+r" (x), [ptr] "+m" (*(uint16_t *)ptr)
>>> +                       :: "memory" );
>>>          break;
>>>      case 4:
>>> -        asm volatile ( "xchgl %k0,%1"
>>> -                       : "=r" (x)
>>> -                       : "m" (*__xg(ptr)), "0" (x)
>>> -                       : "memory" );
>>> +        asm volatile ( "xchg %k[x], %[ptr]"
>>> +                       : [x] "+r" (x), [ptr] "+m" (*(uint32_t *)ptr)
>>> +                       :: "memory" );
>>>          break;
>>>      case 8:
>>> -        asm volatile ( "xchgq %0,%1"
>>> -                       : "=r" (x)
>>> -                       : "m" (*__xg(ptr)), "0" (x)
>>> -                       : "memory" );
>>> +        asm volatile ( "xchg %q[x], %[ptr]"
>>> +                       : [x] "+r" (x), [ptr] "+m" (*(uint64_t *)ptr)
>>> +                       :: "memory" );
>>>          break;
>> Is the q modifier really useful to have here (and elsewhere below)?
> 
> Yes - it is strictly necessary, because otherwise it gets derived from
> the type of (x) which is unsigned long even in the smaller size constructs.

What you say explains why you need the b, w, and k modifiers. It
doesn't explain the q one, since sizeof(unsigned long) is 8 and
hence exactly what would result without the modifier.

>>> @@ -63,36 +65,38 @@ static always_inline __uint128_t cmpxchg16b_local_(
>>>   * If no fault occurs then _o is updated to the value we saw at _p. If this
>>>   * is the same as the initial value of _o then _n is written to location 
>>> _p.
>>>   */
>>> -#define __cmpxchg_user(_p,_o,_n,_isuff,_oppre,_regtype)                 \
>>> +#define __cmpxchg_user(_p, _o, _n, _oppre)                              \
>>>      stac();                                                             \
>>>      asm volatile (                                                      \
>>> -        "1: lock; cmpxchg"_isuff" %"_oppre"2,%3\n"                      \
>>> +        "1: lock cmpxchg %"_oppre"[new], %[ptr]\n"                      \
>>>          "2:\n"                                                          \
>>>          ".section .fixup,\"ax\"\n"                                      \
>>> -        "3:     movl $1,%1\n"                                           \
>>> +        "3:     movl $1, %[rc]\n"                                       \
>>>          "       jmp 2b\n"                                               \
>>>          ".previous\n"                                                   \
>>>          _ASM_EXTABLE(1b, 3b)                                            \
>>> -        : "=a" (_o), "=r" (_rc)                                         \
>>> -        : _regtype (_n), "m" (*__xg((volatile void *)_p)), "0" (_o), "1" 
>>> (0) \
>>> +        : "+a" (_o), [rc] "=r" (_rc),                                   \
>>> +          [ptr] "+m" (*(volatile typeof(*(_p)) *)(_p))                  \
>>> +        : [new] "r" (_n), "[rc]" (0)                                    \
>> Wouldn't it further help readability a little if _rc was initialized to zero
>> right when getting declared, eliminating the last input arg here (the
>> output then would need to be "+r" of course)?
> 
> I can do.
> 
>> And since then you
>> actually touch all lines containing uses of _rc, it would be a good
>> opportunity to also rename the variable to get rid of the leading
>> underscore.
> 
> I'm not sure that is a sensible move.  Its a macro-scope variable from
> cmpxchg_user() which still needs disambiguating from potential names of
> parameters.

I don't understand. For one, the macro parameters are all (wrongly)
underscore-prefixed too, so there's no risk of collision there.
Disambiguation is needed against outer scope variables, yet this
should be done by an underscore suffix (as we commonly do) or
yet some other mechanism. A leading underscore here is not in line
with the C spec.

>> Anyway, with at least the "relaxed" part of the description changed
>> (e.g. to "converted") or explained verbally in a reply, with or without
>> the other items taken care of
>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> Hopefully my reply is sufficient?

Well, as per above not for now: I still don't see any relaxation you do.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.