[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 19/36] xen/arch: introduce cache-coloring allocator


  • To: Marco Solieri <marco.solieri@xxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 9 Mar 2022 15:35:52 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=/69ovZYZIr5K1Q61KTBcqWMzdtZcA0k2GbH+a2AQAus=; b=KePb0yY+ndppQ6lsGou0D1qEW46sfjrdNYsWdTFkm4Us7gqopE1z4Nr2aew0FgZ3Gh5Ch/XJ3MgUhScA/eZbO7BWu1OwARkWklcAl71JI//04e80oWHIeB/WHxBMMYZopHXmcUGiJd/WTVhAzvVbZgdUIz+EyBdP+Rkmhz7Rc7szV6EsV31pimPSvbyarcvjpTG9MHPv5VhVLikiok4RMiLGfIvCttzKZ20UX5FwKMk1S/Rw12JhMAKlO0/dMwhYSIkvReDpsUcueORlFuWxAouhpPNChBLdv1LUTyEr7+eQBMHyvkT1xojfrZkuxd3GGf8HJVmI6spBmCYl2AQnqA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=L7ALvTQre29P4uR6ya5uY190TIPWemUoYan0Cbf6uuvS8GPl5hkpA1qo8Y3p0pZYqWhmGAlCcTWt9pDaFSdZbOBrdVXszgXQY54ejw72ZRL5ROBtK6Rvw5WkhJl6yr2MyeJYF0N6UVN8aNa5BWqDLFTmGCl+IbS650sr5x61WbpKMKgAGPH3R0J0iitDfhAezTyHArqs6EQFydk0srUbHifVd25cU60dwKotH6Y+f+0Cbe69oubSLC/jiuqe4cSYccUCpVYBju2aYbJNxTBvW7EXoZZeL4pfryfq+uumeYtdAxonMa3PwgfY/hX2+/iiN1VZmmlZ0L53iGfmxRhS2A==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Marco Solieri <marco.solieri@xxxxxxxxxx>, Andrea Bastoni <andrea.bastoni@xxxxxxxxxxxxxxx>, Luca Miccio <lucmiccio@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 09 Mar 2022 14:36:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04.03.2022 18:46, Marco Solieri wrote:
> @@ -438,6 +441,263 @@ mfn_t __init alloc_boot_pages(unsigned long nr_pfns, 
> unsigned long pfn_align)
>  
>  
>  
> +static DEFINE_SPINLOCK(heap_lock);

Please take the opportunity and shrink the number of consecutive blank lines
above here.

> +
> +#ifdef CONFIG_COLORING
> +/*************************
> + * COLORED SIDE-ALLOCATOR
> + *
> + * Pages are stored by their color in separated lists. Each list defines a 
> color
> + * and it is initialized during end_boot_allocator, where each page's color
> + * is calculated and the page itself is put in the correct list.
> + * After initialization we have N list where N is the number of maximum
> + * available colors on the platform.
> + * All the lists' heads are stored as element in an array with size N-1 using
> + * the following schema:
> + * array[X] = head of color X, where X goes from 0 to N-1
> + */
> +typedef struct page_list_head color_list;

color_list_t, to make it easy to recognize this as a type (rather than
a variable name)?

> +static color_list *color_heap;

__ro_after_init or, if Arm still hasn't got support for  that, at least
__read_mostly.

> +static long total_avail_col_pages;

Can this go negative? If not, unsigned long please.

> +static u64 col_num_max;

No new uses of u<N> or s<N> please. They're being phased out, and C99
types should be used instead.

> +static bool color_init_state = true;

This variable looks to be local to init_col_heap_pages() - please move
it there (provided it's needed in the first place: it's no clear why
you don't use an initcall instead).

> +#define page_to_head(pg) (&color_heap[color_from_page(pg)])
> +#define color_to_head(col) (&color_heap[col])
> +
> +/* Add page in list in order depending on its physical address. */
> +static void page_list_add_order(struct page_info *pg, struct list_head *head)

I guess you mean struct page_list_head * here?

> +{
> +    struct page_info *pos;
> +
> +    /* Add first page after head */
> +    if ( page_list_empty(head) )
> +    {
> +        page_list_add(pg, head);
> +        return;
> +    }
> +
> +    /* Add non-first page in list in ascending order */
> +    page_list_for_each_reverse(pos, head)
> +    {
> +        /* Get pg position */
> +        if ( page_to_maddr(pos) <= page_to_maddr(pg) )

Wouldn't it be a bug if the two were equal? If so, perhaps better
ASSERT() or even BUG_ON() accordingly?

> +        {
> +            /* Insert pg between pos and pos->list.next */
> +            page_list_add(pg, &pos->list);

The 2nd parameter of page_list_add() is struct page_list_head *, not
struct page_list_entry *. I guess you won't get away without introducing
a new accessor.

> +            break;
> +        }
> +
> +        /*
> +         * If pos is the first element it means that pg <= pos so we have
> +         * to insert pg after head.
> +         */
> +        if ( page_list_first(head) == pos )
> +        {
> +            page_list_add(pg, head);
> +            break;
> +        }

The way it's written it's not immediately obvious that the passed in page
would actually be put anywhere on the list by the time the function
returns. Furthermore this if() doesn't look to be necessary to be
evaluated on every loop iteration. Instead it could apparently live
_after_ the loop, requiring use of "return" instead of "break" further up.

> +    }
> +}

This function dealing with a page at a time while linearly scanning the
list looks to be pretty inefficient, the more that it'll be a common
case that callers have to deal with multiple pages at a time. Sadly it is
not clear how many colors there may be (without hunting down the origin
of max_col_num, which get_max_colors() returns, in the earlier 18
patches), and hence how long these lists may grow.

> +/* Alloc one page based on domain color configuration */
> +static struct page_info *alloc_col_heap_page(
> +    unsigned int memflags, struct domain *d)
> +{
> +    struct page_info *pg, *tmp;
> +    bool need_tlbflush = false;
> +    uint32_t cur_color;
> +    uint32_t tlbflush_timestamp = 0;
> +    uint32_t *colors = 0;

Please consult ./CODING_STYLE for when it is appropriate to use fixed-
width types.

> +    int max_colors;
> +    int i;

I don't suppose either of these can go negative, so unsigned int please.
(Just like other remarks - please consider applicable to the entire
series.)

> +    colors = d->colors;
> +    max_colors = d->max_colors;

Please make these the initializers of the variables, which will also
avoid using 0 where NULL is meant above. It also looks as if these were
the only two uses of "d" in the function. If so, please consider making
colors and max_colors the function parameters instead. Or else d likely
wants to be pointer to const (and colors as well as it looks; generally
please use const on pointed-to types wherever possible).

> +    spin_lock(&heap_lock);
> +
> +    tmp = pg = NULL;
> +
> +    /* Check for the first pg on non-empty list */
> +    for ( i = 0; i < max_colors; i++ )
> +    {
> +        if ( !page_list_empty(color_to_head(colors[i])) )
> +        {
> +            tmp = pg = page_list_last(color_to_head(colors[i]));
> +            cur_color = d->colors[i];
> +            break;
> +        }
> +    }
> +
> +    /* If all lists are empty, no requests can be satisfied */
> +    if ( !pg )
> +    {
> +        spin_unlock(&heap_lock);
> +        return NULL;
> +    }

I'm not convinced this is a useful thing to have. The identical
construct below the subsequent loop will deal with this case quite fine
afaict.

> +    /* Get the highest page from the lists compliant to the domain color(s) 
> */
> +    for ( i += 1; i < max_colors; i++ )

Perhaps easier as

    while ( ++i < max_colors )

?

> +    {
> +        if ( page_list_empty(color_to_head(colors[i])) )
> +        {
> +            printk(XENLOG_INFO "List empty\n");

This is liable to be too noisy even if converted to dprintk(). Please
drop.

> +            continue;
> +        }
> +        tmp = page_list_last(color_to_head(colors[i]));
> +        if ( page_to_maddr(tmp) > page_to_maddr(pg) )
> +        {
> +            pg = tmp;
> +            cur_color = colors[i];

You only ever write this variable - please drop such, or introduce them
only once actually needed (if e.g. in a later patch).

> +        }
> +    }
> +
> +    if ( !pg )
> +    {
> +        spin_unlock(&heap_lock);
> +        return NULL;
> +    }
> +
> +    pg->count_info = PGC_state_inuse;
> +
> +    if ( !(memflags & MEMF_no_tlbflush) )
> +        accumulate_tlbflush(&need_tlbflush, pg,
> +                            &tlbflush_timestamp);
> +
> +    /* Initialise fields which have other uses for free pages. */
> +    pg->u.inuse.type_info = 0;
> +    page_set_owner(pg, NULL);

This would now become the 3rd instance - time to consider a small
helper function?

> +    flush_page_to_ram(mfn_x(page_to_mfn(pg)),
> +                      !(memflags & MEMF_no_icache_flush));
> +
> +    page_list_del(pg, page_to_head(pg));
> +    total_avail_col_pages--;
> +
> +    spin_unlock(&heap_lock);
> +
> +    if ( need_tlbflush )
> +        filtered_flush_tlb_mask(tlbflush_timestamp);
> +
> +    return pg;
> +}
> +
> +struct page_info *alloc_col_domheap_page(
> +    struct domain *d, unsigned int memflags)
> +{
> +    struct page_info *pg;
> +
> +    ASSERT(!in_irq());
> +
> +    /* Get page based on color selection */
> +    pg = alloc_col_heap_page(memflags, d);
> +
> +    if ( !pg )
> +    {
> +        printk(XENLOG_INFO "ERROR: Colored Page is null\n");
> +        return NULL;
> +    }
> +
> +    /* Assign page to domain */
> +    if ( d && !(memflags & MEMF_no_owner) &&
> +        assign_page(pg, 0, d, memflags) )
> +    {
> +        free_col_heap_page(pg);
> +        return NULL;
> +    }
> +
> +    return pg;
> +}

So this is really only providing a single order-0 page. From the cover
letter it didn't sound like you were aiming at such a limited use case.
It's also not listed under "Known limitations" there that large pages
don't even have provisions made for.

> +void free_col_heap_page(struct page_info *pg)
> +{
> +    /* This page is not a guest frame any more. */
> +    pg->count_info = PGC_state_free;
> +
> +    page_set_owner(pg, NULL);
> +    total_avail_col_pages++;
> +    page_list_add_order( pg, page_to_head(pg) );

Nit: Stray blanks immediately inside the parentheses.

> +}

How does this fit into the get_page() / put_page() machinery? You
don't alter free_domheap_pages(), after all.

> +static inline void init_col_heap_pages(struct page_info *pg, unsigned long 
> nr_pages)

Why inline? I guess this might be to silence the compiler warning
about the function being unused, but then this only means that you
want to introduce the function once it's needed. Then it would
also be possible to tell whether the function wants to be __init.

Additionally the line is too long.

> +{
> +    int i;
> +
> +    if ( color_init_state )
> +    {
> +        col_num_max = get_max_colors();
> +        color_heap = xmalloc_array(color_list, col_num_max);
> +        BUG_ON(!color_heap);
> +
> +        for ( i = 0; i < col_num_max; i++ )
> +        {
> +            printk(XENLOG_INFO "Init list for color: %u\n", i);

Again too noisy. Such may be okay in a RFC series, but should have been
dropped for a "normal" submission.

> +            INIT_PAGE_LIST_HEAD(&color_heap[i]);
> +        }
> +
> +        color_init_state = false;
> +    }
> +
> +    printk(XENLOG_INFO "Init color heap pages with %lu pages for a given 
> size of 0x%"PRIx64"\n",

While you shouldn't split the format string across lines, you should
take all other available measures to limit line length. Furthermore
please consider using the shorter %#x form here and elsewhere. Overall:

    printk(XENLOG_INFO
           "Init color heap with %lu pages for a given size of 0x%"PRIx64"\n",

And even then the two values logged are redundant with one another,
so things can further be shortened here.

> +            nr_pages, nr_pages * PAGE_SIZE);

Nit: Indentation.

> +    printk(XENLOG_INFO "Paging starting from: 0x%"PRIx64"\n", 
> page_to_maddr(pg));
> +    total_avail_col_pages += nr_pages;
> +
> +    for ( i = 0; i < nr_pages; i++ )
> +    {
> +        pg->colored = true;
> +        page_list_add_order(pg, page_to_head(pg));
> +        pg++;
> +    }
> +}
> +
> +static inline bool is_page_colored(struct page_info *pg)
> +{
> +        return pg->colored;

Nit: Indentation again (and more instance below).

> +}
> +
> +static void dump_col_heap(unsigned char key)
> +{
> +    struct page_info *pg;
> +    unsigned long size;
> +    unsigned int i;
> +
> +    printk("Colored heap info\n");
> +    for ( i = 0; i < col_num_max; i++ )
> +    {
> +        printk("Heap[%u]: ", i);
> +        size = 0;
> +        page_list_for_each( pg, color_to_head(i) )
> +        {
> +            BUG_ON(!(color_from_page(pg) == i));
> +            size++;
> +        }

How long is this going to take on a decently sized system? At the
very least you'll need to call process_pending_softirqs() every
once in a while. But this may be taking too long this way anyway.

> +        printk("%lu pages -> %lukB free\n", size, size << (PAGE_SHIFT - 10));

Again the same information is being logged twice.

> +    }
> +
> +    printk("Total number of pages: %lu\n", total_avail_col_pages);

Since the value logged isn't calculated locally, may I suggest to
merge this into the initial printk()?

> +}
> +#else /* !CONFIG_COLORING */
> +#define init_col_heap_pages(x, y) init_heap_pages(x, y)
> +
> +inline struct page_info *alloc_col_domheap_page(
> +     struct domain *d, unsigned int memflags)
> +{
> +     return NULL;
> +}
> +
> +inline void free_col_heap_page(struct page_info *pg)
> +{
> +     return;
> +}
> +
> +static inline bool is_page_colored(struct page_info *pg)
> +{
> +        return false;
> +}

Why are any of these needed? And if any are needed, please drop
inline once again.

> @@ -2600,6 +2859,9 @@ static void cf_check dump_heap(unsigned char key)
>  static __init int cf_check register_heap_trigger(void)
>  {
>      register_keyhandler('H', dump_heap, "dump heap info", 1);
> +#ifdef CONFIG_COLORING
> +    register_keyhandler('c', dump_col_heap, "dump coloring heap info", 1);

'c' already has a use on x86. Please avoid such collisions, even if
initially you're targeting Arm only. I don't see why a separate key
is needed anyway - can't you just extend dump_heap()?

> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -131,6 +131,11 @@ unsigned int online_page(mfn_t mfn, uint32_t *status);
>  int offline_page(mfn_t mfn, int broken, uint32_t *status);
>  int query_page_offline(mfn_t mfn, uint32_t *status);
>  
> +/* Colored suballocator. */
> +struct page_info *alloc_col_domheap_page(
> +    struct domain *d, unsigned int memflags);
> +void free_col_heap_page(struct page_info *pg);

These two should imo represent a pair, i.e. be named similarly.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.