Re: [PATCH v2 07/12] mm: allow page scrubbing routine(s) to be arch controlled

To: Jan Beulich <jbeulich@xxxxxxxx>
From: Julien Grall <julien@xxxxxxx>
Date: Thu, 3 Jun 2021 10:39:38 +0100
Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 03 Jun 2021 09:39:50 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Jan,

On 27/05/2021 14:58, Jan Beulich wrote:

On 27.05.2021 15:06, Julien Grall wrote:

On 27/05/2021 13:33, Jan Beulich wrote:

Especially when dealing with large amounts of memory, memset() may not
be very efficient; this can be bad enough that even for debug builds a
custom function is warranted. We additionally want to distinguish "hot"
and "cold" cases.


Do you have any benchmark showing the performance improvement?


This is based on the numbers provided at
https://lists.xen.org/archives/html/xen-devel/2021-04/msg00716.html (???)
with the thread with some of the prior discussion rooted at
https://lists.xen.org/archives/html/xen-devel/2021-04/msg00425.html


Thanks for the pointer!

I'm afraid I lack ideas on how to sensibly measure _all_ of the
effects (i.e. including the amount of disturbing of caches).

I think it is quite important to provide some benchmark (or at least a rationale) in the commit message.

We had a similar situation in the past (see the discussion [1]) where a commit message claimed it would improve performance, but in reality it also introduced a regression. Unfortunately, there is no easy way forward as the rationale is now forgotten...

---
The choice between hot and cold in scrub_one_page()'s callers is
certainly up for discussion / improvement.


To get the discussion started, can you explain how you made the decision
between hot/cold? This will also want to be written down in the commit
message.


Well, the initial trivial heuristic is "allocation for oneself" vs
"allocation for someone else, or freeing, or scrubbing", i.e. whether
it would be likely that the page will soon be accessed again (or for
the first time).
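
To make the heuristic above concrete, here is a minimal sketch in C. The enum and function names are hypothetical and do not come from the patch; the sketch only encodes the stated rule that an allocation for oneself is "hot" and everything else (allocation for someone else, freeing, scrubbing) is "cold".

```c
#include <stdbool.h>

/*
 * Hypothetical classification of the contexts in which a page gets scrubbed.
 * None of these names exist in the patch; they only label the cases listed
 * in the heuristic.
 */
enum scrub_ctx {
    SCRUB_ALLOC_FOR_SELF,   /* caller will likely touch the page soon */
    SCRUB_ALLOC_FOR_OTHER,  /* page is handed to a different domain */
    SCRUB_ON_FREE,          /* page is being returned to the heap */
    SCRUB_BACKGROUND,       /* page is cleaned by a dedicated scrub pass */
};

/* Only an allocation for oneself is expected to be accessed again soon. */
static bool scrub_ctx_is_cold(enum scrub_ctx ctx)
{
    return ctx != SCRUB_ALLOC_FOR_SELF;
}
```

A caller would then pick the hot or cold scrubbing routine based on the returned value.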

--- /dev/null
+++ b/xen/arch/x86/scrub_page.S
@@ -0,0 +1,41 @@
+        .file __FILE__
+
+#include <asm/asm_defns.h>
+#include <xen/page-size.h>
+#include <xen/scrub.h>
+
+ENTRY(scrub_page_cold)
+        mov     $PAGE_SIZE/32, %ecx
+        mov     $SCRUB_PATTERN, %rax
+
+0:      movnti  %rax,   (%rdi)
+        movnti  %rax,  8(%rdi)
+        movnti  %rax, 16(%rdi)
+        movnti  %rax, 24(%rdi)
+        add     $32, %rdi
+        sub     $1, %ecx
+        jnz     0b
+
+        sfence
+        ret
+        .type scrub_page_cold, @function
+        .size scrub_page_cold, . - scrub_page_cold
+
+        .macro scrub_page_stosb
+        mov     $PAGE_SIZE, %ecx
+        mov     $SCRUB_BYTE_PATTERN, %eax
+        rep stosb
+        ret
+        .endm
+
+        .macro scrub_page_stosq
+        mov     $PAGE_SIZE/8, %ecx
+        mov     $SCRUB_PATTERN, %rax
+        rep stosq
+        ret
+        .endm
+
+ENTRY(scrub_page_hot)
+        ALTERNATIVE scrub_page_stosq, scrub_page_stosb, X86_FEATURE_ERMS
+        .type scrub_page_hot, @function
+        .size scrub_page_hot, . - scrub_page_hot

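For readers less used to the assembly, the "cold" routine can be approximated in C with SSE2 intrinsics. This is only a sketch under stated assumptions: it requires x86-64, the function name is made up, PAGE_SIZE is assumed to be 4096, and the SCRUB_PATTERN value is an assumed debug-build pattern rather than something taken from the patch.

```c
#include <emmintrin.h>  /* _mm_stream_si64(), _mm_sfence(); x86-64 with SSE2 */

#define PAGE_SIZE     4096                   /* assumed page size */
#define SCRUB_PATTERN 0xc2c2c2c2c2c2c2c2ULL  /* assumed debug-build pattern */

/*
 * Fill one page with the scrub pattern using non-temporal stores, so the
 * written lines are not pulled into the caches. Mirrors the loop in the
 * patch: four 8-byte movnti stores per iteration, PAGE_SIZE/32 iterations.
 */
static void scrub_page_cold_sketch(void *page)
{
    long long *p = page;
    unsigned int i;

    for ( i = 0; i < PAGE_SIZE / 32; i++, p += 4 )
    {
        _mm_stream_si64(p + 0, (long long)SCRUB_PATTERN);
        _mm_stream_si64(p + 1, (long long)SCRUB_PATTERN);
        _mm_stream_si64(p + 2, (long long)SCRUB_PATTERN);
        _mm_stream_si64(p + 3, (long long)SCRUB_PATTERN);
    }

    /* Order the weakly-ordered streaming stores before returning. */
    _mm_sfence();
}
```

The "hot" variants in the patch instead use `rep stosq` or `rep stosb`, which write through the cache and so leave the scrubbed page resident for a caller that is about to use it.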

From the commit message, it is not clear how the implementation for
hot/cold was chosen. Can you outline in the commit message what the
assumptions are for each helper?


I've added 'The goal is for accesses of "cold" pages to not
disturb caches (albeit finding a good balance between this
and the higher latency looks to be difficult).'

@@ -1046,12 +1051,14 @@ static struct page_info *alloc_heap_page
       if ( first_dirty != INVALID_DIRTY_IDX ||
            (scrub_debug && !(memflags & MEMF_no_scrub)) )
       {
+        bool cold = d && d != current->domain;


So the assumption is if the domain is not running, then the content is
not in the cache. Is that correct?


Not exactly: For one, instead of "not running" it is "is not the current
domain", i.e. there may still be vCPU-s of the domain running elsewhere.
And for the cache the question isn't so much of "is in cache", but to
avoid needlessly bringing contents into the cache when the data is
unlikely to be used again soon.


Ok. Can this be clarified in the commit message?

As to the approach itself, I'd like an ack from one of the x86 maintainers to confirm that distinguishing cold vs hot pages is worth it.


Cheers,

[1]<de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@xxxxxxxxxx>


--
Julien Grall
