[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] rwlock: remove unneeded subtraction


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 15 Feb 2022 15:45:00 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=/YUQRnu7FAeGZcxpUphTiHuUlTf3kZQccc3Fm3/StxY=; b=GkMXJcLpzucr/C4ptqEFF66MyDAwvliFeEiijxP1wP56q0AVdSd2Dd8scLqWWoGiSeR2Sg3w3Nbtzm00BY/VGdE7fQqC9Se7K0QkN1/Z5vlpvLhJDqYqFL6e6bzIrA+4bK0Oy7+JeiTDZ1i5mdOR3gKlgestOy4FMcvfauNdji6uosqTmDx3+9pmIsHDFo3KcZwSB2x6iUHjXkxfvszluNVHtFj9njqtkNdM72JJvcb3eh/hGrKL/UeyAys4NFoONUl4yjSKBW9fifxFU6Zlpl4MQb6l05QqiF7rBGHeER8dP6lh3yQAKfvC6gj9wbbFw7KtIKlttJaaE6RNiHV4OQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HGPa7WUX3UeWjpE1W0/XisSSiUlFYCmd52G9D986Ij4m//qSAyrYMHob+nD8l61M8LGx9USj5Rs3ZxTRFpFlywER++ZO1eTZMkIO11xxUycO+AZERU8fp7PDgNYob9C6Q2c/0VvixpTrUV9rYqV/2oTR95boWKdYut1t0II+YToOuC2ctDEZkanZt5TgVq6eXidJ9PqJo4zKd49AK4T18eMwqRC3RElEjrzXGcE6WX67hvsyrhJoUi9QF9Xq6p/ANkxLAut0MWvGLQzI57CEhaoFHaLvujHIux3QJUnh9Yd0g2/0xeWzYMwJiKohQRto3D/1ArKpkbYG+OkoSJcycA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Julien Grall <julien@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 15 Feb 2022 14:45:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 15.02.2022 15:16, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
>> On 15.02.2022 14:13, Julien Grall wrote:
>>> On 15/02/2022 09:39, Roger Pau Monne wrote:
>>>> There's no need to subtract _QR_BIAS from the lock value for storing
>>>> in the local cnts variable in the read lock slow path: the users of
>>>> the value in cnts only care about the writer-related bits and use a
>>>> mask to get the value.
>>>>
>>>> Note that further setting of cnts in rspin_until_writer_unlock already
>>>> do not subtract _QR_BIAS.
>>>
>>> The rwlock is a copy of the Linux implementation. So I looked at the 
>>> history to find out why _QR_BIAS was substracted.
>>>
>>> It looks like this was done to get better assembly on x86:
>>>
>>> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
>>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>> Date:   Mon Apr 18 01:27:03 2016 +0200
>>>
>>>      locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>>>
>>>      The only reason for the current code is to make GCC emit only the
>>>      "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>>>      the result), do so nicer.
>>>
>>>      Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>>      Acked-by: Waiman Long <waiman.long@xxxxxxx>
>>>      Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>      Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>      Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>>>      Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>      Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>      Cc: linux-arch@xxxxxxxxxxxxxxx
>>>      Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>      Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index fec082338668..19248ddf37ce 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
>>> u32 cnts)
>>>           * that accesses can't leak upwards out of our subsequent critical
>>>           * section in the case that the lock is currently held for write.
>>>           */
>>> -       cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> +       cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>>>          rspin_until_writer_unlock(lock, cnts);
>>>
>>>          /*
>>>
>>> This is a slowpath, so probably not a concern. But I thought I would 
>>> double check whether the x86 folks are still happy to proceed with that 
>>> in mind.
>>
>> Hmm, that's an interesting observation. Roger - did you inspect the
>> generated code? At the very least the description may want amending.
> 
> It seems to always generate the same code for me when using gcc 8.3,
> I even tried using arch_fetch_and_add directly, it always results
> in:
> 
> ffff82d04022d983:       f0 0f c1 03             lock xadd %eax,(%rbx)
> ffff82d04022d987:       25 00 30 00 00          and    $0x3000,%eax
> 
> Similarly clang 13.0.0 seem to always generate:
> 
> ffff82d0402085de:       f0 0f c1 03             lock xadd %eax,(%rbx)
> ffff82d0402085e2:       89 c1                   mov    %eax,%ecx
> ffff82d0402085e4:       81 e1 00 30 00 00       and    $0x3000,%ecx
> 
> Maybe I'm missing something, but I don't see a difference in the
> generated code.

I've looked myself in the meantime, and I can largely confirm this.
Clang 5 doesn't eliminate the "add" (or really "lea") though. But
nevertheless ...

> I could add to the commit message:
> 
> "Originally _QR_BIAS was subtracted from the result of
> atomic_add_return_acquire in order to prevent GCC from emitting an
> unneeded ADD instruction. This being in the lock slow path such
> optimizations don't seem likely to make any relevant performance
> difference. Also modern GCC and CLANG versions will already avoid
> emitting the ADD instruction."

... I'm fine with this as explanation; I'd also be fine adding
this to the description while committing.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.