Re: [PATCH for-4.21 v2] x86/AMD: avoid REP MOVSB for Zen3/4
- To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
- From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
- Date: Tue, 6 Jan 2026 21:07:13 +0000
- Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Andrew Cooper <andrew.cooper@xxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
- Delivery-date: Tue, 06 Jan 2026 21:07:44 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On 13/10/2025 2:06 pm, Jan Beulich wrote:
> Along with Zen2 (which doesn't expose ERMS), both families reportedly
> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
> can actually be carried out the accelerated way. Therefore we want to
> avoid its use in the common case of memcpy(); copy_page_hot() is fine, as
> its two pointers are always going to be having the same low 5 bits.
I think this could be a bit clearer. How about this:
---8<---
Zen2 (which doesn't expose ERMS) through Zen4 have sub-optimal aliasing
detection for REP MOVS, and fall back to a unit-at-a-time loop when the
two pointers have differing bottom 5 bits. While both forms are
affected, this makes REP MOVSB 8 times slower than REP MOVSQ.
memcpy() has a high likelihood of encountering this slowpath, so avoid
using REP MOVSB. This undoes the ERMS optimisation added in commit
d6397bd0e11c which turns out to be an anti-optimisation on these
microarchitectures.
However, retain the use of ERMS-based REP MOVSB in other cases such as
copy_page_hot() where the parameter alignment is known to avoid the
slowpath.
---8<---
?
This at least gets us back to the 4.20 behaviour.
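To make the condition concrete, here is a minimal sketch of the check as
described above. It is illustrative only, not Xen code: same_low5() and
slowpath_iterations() are hypothetical names, and the iteration count is
just the arithmetic behind the "8 times slower" figure, not a measurement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of this exists in Xen. */

/* Mask selecting the bottom 5 address bits. */
#define LOW5_MASK ((uintptr_t)0x1f)

/* True when both pointers share the same bottom 5 bits. */
static bool same_low5(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & LOW5_MASK) == 0;
}

/*
 * Loop iterations on the unit-at-a-time path: one per unit, so 1-byte
 * units (MOVSB) need 8 times as many as 8-byte units (MOVSQ).
 */
static size_t slowpath_iterations(size_t bytes, size_t unit_size)
{
    return bytes / unit_size;
}
```

Two page-aligned pointers, as in copy_page_hot(), always pass same_low5(),
whereas arbitrary memcpy() arguments frequently will not.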
~Andrew