
Re: [Xen-devel] [PATCH v4 4/8] mm: Scrub memory from idle loop



>>> On 19.05.17 at 17:50, <boris.ostrovsky@xxxxxxxxxx> wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.

This is too brief for my taste. In particular the re-ordering ...

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -118,8 +118,9 @@ static void idle_loop(void)
>      {
>          if ( cpu_is_offline(smp_processor_id()) )
>              play_dead();
> -        (*pm_idle)();
>          do_tasklet();
> +        if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
> +            (*pm_idle)();
>          do_softirq();

... here (and its correctness / safety) needs an explanation. Not
processing tasklets right after idle wakeup is not an obviously
correct change, nor is it immediately clear why do_tasklet() needs to
be pulled ahead of (*pm_idle)() for your purposes.

> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -18,12 +18,14 @@
>  #include <asm/hardirq.h>
>  #include <asm/setup.h>
>  
> -static struct vcpu *__read_mostly override;
> +static DEFINE_PER_CPU(struct vcpu *, override);
>  
>  static inline struct vcpu *mapcache_current_vcpu(void)
>  {
> +    struct vcpu *v, *this_vcpu = this_cpu(override);
> +
>      /* In the common case we use the mapcache of the running VCPU. */
> -    struct vcpu *v = override ?: current;
> +    v = this_vcpu ?: current;

What's wrong with

    struct vcpu *v = this_cpu(override) ?: current;

? And this, btw, is another change which should have an explanation
in the commit message.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1010,15 +1010,79 @@ static int reserve_offlined_page(struct page_info *head)
>      return count;
>  }
>  
> -static void scrub_free_pages(unsigned int node)
> +static nodemask_t node_scrubbing;
> +
> +/*
> + * If get_node is true this will return closest node that needs to be scrubbed,
> + * with appropriate bit in node_scrubbing set.
> + * If get_node is not set, this will return *a* node that needs to be scrubbed.
> + * node_scrubbing bitmask will no be updated.
> + * If no node needs scrubbing then NUMA_NO_NODE is returned.
> + */
> +static unsigned int node_to_scrub(bool get_node)
>  {
> -    struct page_info *pg;
> -    unsigned int zone;
> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
> +    nodeid_t closest = NUMA_NO_NODE;
> +    u8 dist, shortest = 0xff;
>  
> -    ASSERT(spin_is_locked(&heap_lock));
> +    if ( node == NUMA_NO_NODE )
> +        node = 0;
>  
> -    if ( !node_need_scrub[node] )
> -        return;
> +    if ( node_need_scrub[node] &&
> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
> +        return node;
> +
> +    /*
> +     * See if there are memory-only nodes that need scrubbing and choose
> +     * the closest one.
> +     */
> +    local_node = node;
> +    while ( 1 )

As some compilers / compiler versions warn about such constructs
("condition is always true"), I generally recommend using "for ( ; ; )"
instead.
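I.e. simply

    local_node = node;
    for ( ; ; )

with the loop body left untouched.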

> +    {
> +        do {
> +            node = cycle_node(node, node_online_map);
> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
> +                  (node != local_node) );
> +
> +        if ( node == local_node )
> +            break;
> +
> +        if ( node_need_scrub[node] )
> +        {
> +            if ( !get_node )
> +                return node;
> +
> +            dist = __node_distance(local_node, node);
> +            if ( dist < shortest || closest == NUMA_NO_NODE )
> +            {
> +                if ( !node_test_and_set(node, node_scrubbing) )
> +                {
> +                    if ( closest != NUMA_NO_NODE )
> +                        node_clear(closest, node_scrubbing);

You call this function with no locks held, yet you temporarily set a
bit in node_scrubbing which you then may clear again here. That'll
prevent another idle CPU from doing scrubbing on this node afaict,
which, while not a bug, is not optimal. Therefore I think for this
second part of the function you actually want to acquire the heap
lock.
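
To be concrete (a sketch only - the !get_node exit path would of course
need the unlock as well):

    /* Hold the heap lock while looking for the closest memory-only node. */
    spin_lock(&heap_lock);

    for ( ; ; )
    {
        /* ... node search exactly as in your patch ... */
    }

    spin_unlock(&heap_lock);

That way the transient setting and clearing of node_scrubbing bits can't
needlessly keep another idle CPU away from a node.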

> +                    shortest = dist;
> +                    closest = node;
> +                }
> +            }

Also note that two if()s like the above, when the inner one also
has no else, can and should be combined into one.
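I.e.

    if ( (dist < shortest || closest == NUMA_NO_NODE) &&
         !node_test_and_set(node, node_scrubbing) )
    {
        if ( closest != NUMA_NO_NODE )
            node_clear(closest, node_scrubbing);
        shortest = dist;
        closest = node;
    }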

> @@ -1035,22 +1099,46 @@ static void scrub_free_pages(unsigned int node)
>  
>                  for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
>                  {
> +                    cnt++;
>                      if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>                      {
>                          scrub_one_page(&pg[i]);
>                          pg[i].count_info &= ~PGC_need_scrub;
>                          node_need_scrub[node]--;
> +                        cnt += 100; /* scrubbed pages add heavier weight. */
>                      }

While it doesn't matter much, I would guess that your intention
really was to either do the "cnt++" in an "else" to this "if()", or
use 99 instead of 100 above.
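I.e. either

    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
    {
        scrub_one_page(&pg[i]);
        pg[i].count_info &= ~PGC_need_scrub;
        node_need_scrub[node]--;
        cnt += 100; /* scrubbed pages add heavier weight. */
    }
    else
        cnt++;

or leave the unconditional cnt++ in place and add only 99 for a scrubbed
page.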

> -                }
>  
> -                page_list_del(pg, &heap(node, zone, order));
> -                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> +                    /*
> +                     * Scrub a few (8) pages before becoming eligible for
> +                     * preemtion. But also count non-scrubbing loop iteration

"preemption" and "iterations"

> +                     * so that we don't get stuck here with an almost clean
> +                     * heap.
> +                     */
> +                    if ( softirq_pending(cpu) && cnt > 800 )

Please switch both sides of the &&.
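
I.e.

    if ( cnt > 800 && softirq_pending(cpu) )

so that the cheap counter check is done first.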

> +                    {
> +                        preempt = true;
> +                        break;
> +                    }
> +                }
>  
> -                if ( node_need_scrub[node] == 0 )
> -                    return;
> +                if ( i == (1U << order) )
> +                {
> +                    page_list_del(pg, &heap(node, zone, order));
> +                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> +                }
> +                else
> +                    pg->u.free.first_dirty = i + 1;

This seems wrong if you set "preempt" after scrubbing the last page
of a buddy - in that case the increment of i in the loop header is being
skipped yet the entire buddy is now clean, so the if() condition here
wants to be "i >= (1U << order) - 1" afaict. In any event it needs to
be impossible for first_dirty to obtain a value of 1U << order here.
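I.e. something like

    if ( i >= (1U << order) - 1 )
    {
        page_list_del(pg, &heap(node, zone, order));
        page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
    }
    else
        pg->u.free.first_dirty = i + 1;

or whatever else guarantees first_dirty can never become 1U << order.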

> +                if ( preempt || (node_need_scrub[node] == 0) )
> +                    goto out;
>              }
>          } while ( order-- != 0 );
>      }
> +
> + out:
> +    spin_unlock(&heap_lock);
> +    node_clear(node, node_scrubbing);

In light of my comment above on the memory-only node handling (taking the
heap lock), it may be necessary to switch these two around.
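I.e.

 out:
    node_clear(node, node_scrubbing);
    spin_unlock(&heap_lock);

so that the node_scrubbing update is also covered by the heap lock.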

Jan
