[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] rwlock: remove unneeded subtraction


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 15 Feb 2022 15:16:18 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=kGV0L+/KnQNwueDBMZXeET0IfGjUwvWawy+o5cWbojI=; b=dUkho0VYIXMPgTnuRvHZOlRyEnPevx8o6Y8nugiSmrRUxQc6fqLlmcKllbPPC75tP0W+jIxppVkM3IazEcpvRJLvW6GPuPgyGjEv2roqPMEnuk+DVNiyCeEGO7d7IMNBQPLUqG5nwPje6PluI2mzUNidg3oiK0JpAmrI1I/seaVVB68XOeQ/KxASgCZULAhfgJh0zwwt2H7mENsONxWvbdS9TsAV1H6VL+r3wbQTFxvQHqHKVzfWY56d7U5vERWUesk7e8xASBE1ESwRrir/eJPUB34N4+LWf9UR/c1zdOQEtE61oCppDdZrWmudyFMdQHr790fTfSNsSNixviY/sg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=g1FFy+CGkZbdgQlTL27wKyXhvF79yxksgLI88XCgmkYYbeQupRoIDfcpDQKKfwwxa+tymAVEScdBTusFD6UfwDZI9t5BGmfkC0sJUFBCXpocorushHzM7ipJ+CzHR8wifC+TRnQIW88rd98jwFWAhNua3PuIznjguboq74gZP+2ZJFfO47BSY1JwnH42lqhDQSnj6IyYe7dpJKC7TSQxTSGeFp3PaGtjx3/TzARk0xtBab9z7Gnd/JSJMqS+0YSd8W2NJei+ALOv1EJxpylBt0PItYETOzWdkW9D5e4JcOAjAAlBeLkJcsBRmL2V4XC25IRAanPo9hjF8uBdolqfDQ==
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Julien Grall <julien@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 15 Feb 2022 14:16:37 +0000
  • Ironport-data: A9a23:IGsTbqr26iux4cjfYg/VwxyNwJdeBmLoYhIvgKrLsJaIsI4StFCzt garIBnUa/uDNGSgfdh0Od+y8kJTvMLcnYNhQQA4qCo9FykRpJuZCYyVIHmrMnLJJKUvbq7GA +byyDXkBJppJpMJjk71atANlZT4vE2xbuKU5NTsY0idfic5Dndx4f5fs7Rh2NQw24HkW1jlV e7a+KUzBnf0g1aYDUpMg06zgEsHUCPa4W5wUvQWPJinjXeG/5UnJMt3yZKZdhMUdrJ8DO+iL 9sv+Znilo/vE7XBPfv++lrzWhVirrc/pmFigFIOM0SpqkAqSiDfTs/XnRfTAKtao2zhojx/9 DlCnb3zVAp2fa70pOg6DBMfDQdlEZFhwqCSdBBTseTLp6HHW37lwvEoB0AqJ4wIvO1wBAmi9 9RBdmpLNErawbvrnvTrEYGAhex6RCXvFJkYtXx6iynQEN4tQIzZQrWM7thdtNs1rp4SRKqAO pFGAdZpRAbeWT5/KnU6Mc4FkeuIt3TAYxxIqGvA8MLb5ECMlVcsgdABKuH9ZdiiVchT2EGCq Qru72n/Rx0XKtGb4T6E6W63wP/CmzvhX4AfH6H+8eRl6HWI3XAaAhASUVq9oNG6h1S4VtYZL FYbkgIsp6Uv8E2gTvHmQga15nWDu3Y0WcdUEuA8wBGAzOzT+QnxO4QfZmcfMpp87pZwHGF0k A/S9z/0OdBxmJzWVH/CtbSskT+VZiJSM0kZSRMjdxRQtrEPv7oPph7IS99iFou8gdv0BSz8z li2kcQuu1kApZVVjvvmpDgrlxrp/8GUFVBtum07S0r4tlsRWWKzW2C/BbE3B95kJZ3RcFSOt WNsdyO2vLFXVsHleMBgrYww8FCVCxStbWe0bb1HRcBJG9GRF5mLJ9443d2GDB01WvvogBewC KMphStf5YVIIFyhZrJtboS6BqwClPa8SYu1Bq2JNoMVOPCdkTNrGwk0OyatM53FyhBwwcnTx 7/HGSpTMZrqIfs+l2fnLwvs+bQq2jo/1QvuqWPTlHyaPU6lTCfNE98taQLWBshgtf/siFiFo r53aprRoz0CAbKWX8Ui2dNKRbz8BSNgXs6eRg0+XrPrHzeK70l4UKGPmOt4IdQ690mX/8+Rl kyAtoZj4AOXrVXMKBmQa2Alb7XqXJ1lqmk8MzBqNlGts0XPq671hEvGX5doL7Qh6sJ5yvt4E 6sMd8maW6wdQTXb4TUNK5L6qdU6JhisgAuPOQujYSQ+IME8F1CYpIe8c1u97jQKAwq2qdA6/ ++q2DTETMdRXA9lFsvXNq6ilgvjoXgHletudELUOd0PKl70+Y1nJnWp3P86Ks0BMzvZwT6e2 1rECBsUv7CV8YQ07MPIleaPqILwS7lyGU9THm/667eqNHaFojr/kNEYCOvRJGLTTmL5/qmmd N559fCkPa1VhktOvqp9D61vkfA06ezwquII1Q9jBnjKMQimU+syPnmc0MBTnaRR3bsF6xCuU 0eC99QGa7WEPMTpTAwYKAY/N7nR0PgVnn/Z7OgvIVW87yhypeLVXUJXNhiKqSpcMLoqb991n bZ/4JYbu16llx4nEtealSQFpW2DI0sJX7gjqpxHUpTgjRAmyw0abJHRYsMsDEpjtzmY3pEWH wKp
  • Ironport-hdrordr: A9a23:I0PCXK8X5cIN9TiIcfxuk+DcI+orL9Y04lQ7vn2ZLiYlFfBw9v re+MjzsCWetN9/Yh0dcLy7V5VoIkm9yXcW2+cs1N6ZNWGN1VdAR7sC0aLShxHmBi3i5qp8+M 5bAs1D4QTLfDtHZBDBkWuFL+o=
  • Ironport-sdr: QgDxyYAXWi4CzPlDKkTZtmMlV6X3pAURUY6/aA1k1Dqm2jqnLzPaz0bYSkfXKch5JmuNmxJqmj EJO2L/sUSZwqxtAOcAUaC8THBXLYIWU32IglQyQOlaBcjDJ0doP16zF7XUZunBg04woLcYWQPe qODYo0GPE1yMKmd38Y7uNg/aaUufyGm2ZzWo0mJk91sF3YS6y5y634l+vZXhOadObfskyy1XRB hHfo+h5zVJ7bp81phwpdX2TUd9t+hVAuV/eTol3DwSUOtf/eC0nHzn/VgQyUPc2FWUeqAeIvzM LGfVwd57+wXhJ3RV1VrgbZrM
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
> On 15.02.2022 14:13, Julien Grall wrote:
> > On 15/02/2022 09:39, Roger Pau Monne wrote:
> >> There's no need to subtract _QR_BIAS from the lock value for storing
> >> in the local cnts variable in the read lock slow path: the users of
> >> the value in cnts only care about the writer-related bits and use a
> >> mask to get the value.
> >>
> >> Note that further setting of cnts in rspin_until_writer_unlock already
> >> do not subtract _QR_BIAS.
> > 
> > The rwlock is a copy of the Linux implementation. So I looked at the 
> > history to find out why _QR_BIAS was substracted.
> > 
> > It looks like this was done to get better assembly on x86:
> > 
> > commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Date:   Mon Apr 18 01:27:03 2016 +0200
> > 
> >      locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
> > 
> >      The only reason for the current code is to make GCC emit only the
> >      "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
> >      the result), do so nicer.
> > 
> >      Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> >      Acked-by: Waiman Long <waiman.long@xxxxxxx>
> >      Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >      Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> >      Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> >      Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >      Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >      Cc: linux-arch@xxxxxxxxxxxxxxx
> >      Cc: linux-kernel@xxxxxxxxxxxxxxx
> >      Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> > 
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index fec082338668..19248ddf37ce 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
> > u32 cnts)
> >           * that accesses can't leak upwards out of our subsequent critical
> >           * section in the case that the lock is currently held for write.
> >           */
> > -       cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> > +       cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
> >          rspin_until_writer_unlock(lock, cnts);
> > 
> >          /*
> > 
> > This is a slowpath, so probably not a concern. But I thought I would 
> > double check whether the x86 folks are still happy to proceed with that 
> > in mind.
> 
> Hmm, that's an interesting observation. Roger - did you inspect the
> generated code? At the very least the description may want amending.

It seems to always generate the same code for me when using gcc 8.3,
I even tried using arch_fetch_and_add directly, it always results
in:

ffff82d04022d983:       f0 0f c1 03             lock xadd %eax,(%rbx)
ffff82d04022d987:       25 00 30 00 00          and    $0x3000,%eax

Similarly clang 13.0.0 seem to always generate:

ffff82d0402085de:       f0 0f c1 03             lock xadd %eax,(%rbx)
ffff82d0402085e2:       89 c1                   mov    %eax,%ecx
ffff82d0402085e4:       81 e1 00 30 00 00       and    $0x3000,%ecx

Maybe I'm missing something, but I don't see a difference in the
generated code.

I could add to the commit message:

"Originally _QR_BIAS was subtracted from the result of
atomic_add_return_acquire in order to prevent GCC from emitting an
unneeded ADD instruction. This being in the lock slow path such
optimizations don't seem likely to make any relevant performance
difference. Also modern GCC and CLANG versions will already avoid
emitting the ADD instruction."

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.