Re: [PATCH 06/12] xen/common: add cache coloring allocator for domains


  • To: Carlo Nonato <carlo.nonato@xxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 15 Sep 2022 15:13:21 +0200
  • Cc: andrew.cooper3@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, julien@xxxxxxx, stefano.stabellini@xxxxxxx, wl@xxxxxxx, marco.solieri@xxxxxxxxxx, andrea.bastoni@xxxxxxxxxxxxxxx, lucmiccio@xxxxxxxxx, Marco Solieri <marco.solieri@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 15 Sep 2022 13:13:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 26.08.2022 14:51, Carlo Nonato wrote:
> This commit adds a new memory page allocator that implements the cache
> coloring mechanism. The allocation algorithm follows the given color
> configuration of the domain and maximizes contiguity in the page selection.
> 
> Pages are stored in a color-indexed data structure of lists, sorted by their
> machine addresses, that are collectively called the colored heap. A simple
> initialization function computes the color of any available page and inserts
> it in the corresponding list. When a domain requests a page, the allocator
> takes one from the subset of lists whose colors equal the domain
> configuration. It chooses the page with the highest machine address such that
> contiguous pages are sequentially allocated, if this is made possible by a
> color assignment which includes adjacent colors.
> 
> The allocator can handle only requests with order equal to 0 since the
> single color granularity is represented in memory by one page.

In the earlier paragraph you talk about contiguous pages using contiguous
colors. This paragraph then appears to say the opposite, i.e. that
contiguous multi-page allocations aren't possible. Which of the two is it?

> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -147,6 +147,18 @@ config MAX_CACHE_COLORS
>         colors at boot. Note that if, at any time, a color configuration with
>         more colors than the maximum will be employed an error will be produced.
>  
> +config BUDDY_ALLOCATOR_SIZE
> +     string "Buddy allocator reserved memory size" if CACHE_COLORING
> +     default "64M" if CACHE_COLORING
> +     default "0M" if !CACHE_COLORING

I don't understand the purpose of this last line, nor the two earlier
"if". Why not simply

config BUDDY_ALLOCATOR_SIZE
        string "Buddy allocator reserved memory size"
        depends on CACHE_COLORING
        default "64M"

? Also does this really need to be a string, rather than a number (which
then doesn't need parsing) with e.g. MiB granularity?
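
Something along these lines, perhaps (a sketch only - option type, prompt,
and default merely illustrative):

config BUDDY_ALLOCATOR_SIZE
        int "Buddy allocator reserved memory size (MiB)"
        depends on CACHE_COLORING
        default 64

with the consumer then needing no parsing at all:

    buddy_alloc_size = (unsigned long)CONFIG_BUDDY_ALLOCATOR_SIZE << 20;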

Finally - how much of this is really Arm-specific? Shouldn't this be a
common config option, perhaps merely restricted to Arm by the top level
option (CACHE_COLORING?) depending on a further HAS_CACHE_COLORING,
which only Arm would select?
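
I.e. something like (again just a sketch):

config HAS_CACHE_COLORING
        bool

config CACHE_COLORING
        bool "Cache coloring support"
        depends on HAS_CACHE_COLORING

in common code, with Arm's Kconfig then having "select HAS_CACHE_COLORING".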

> --- a/xen/arch/arm/coloring.c
> +++ b/xen/arch/arm/coloring.c
> @@ -300,6 +300,16 @@ void prepare_color_domain_config(struct xen_arch_domainconfig *config,
>      config->num_colors = (uint16_t)num;
>  }
>  
> +unsigned int page_to_color(struct page_info *pg)

The parameter will want to be pointer-to-const and I wonder whether ...

> +{
> +    return addr_to_color(page_to_maddr(pg));
> +}

... the function as a whole wouldn't be a good candidate for being an
inline one (requiring addr_to_color() to be available outside of
this file, of course).
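
E.g. (a sketch, assuming addr_to_color() gains a declaration in
asm/coloring.h):

static inline unsigned int page_to_color(const struct page_info *pg)
{
    return addr_to_color(page_to_maddr(pg));
}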

> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -143,6 +143,9 @@ struct page_info
>  #define PGC_count_width   PG_shift(10)
>  #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
>  
> +#define _PGC_colored      PG_shift(11)
> +#define PGC_colored       PG_mask(1, 11)

I don't think this can work - you can't use bits already covered by
the count. You'll need to shift up PGC_count_{width,mask} by one and
insert your bit between PGC_extra and that. Or you could use one of
the lower unused ones, between PGC_static and PGC_broken.
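
For the first of these options that would come out as (sketch)

#define _PGC_colored      PG_shift(11)
#define PGC_colored       PG_mask(1, 11)
#define PGC_count_width   PG_shift(11)
#define PGC_count_mask    ((1UL<<PGC_count_width)-1)

i.e. with the count giving up its top bit to the new flag.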

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -150,6 +150,9 @@
>  #define p2m_pod_offline_or_broken_hit(pg) 0
>  #define p2m_pod_offline_or_broken_replace(pg) BUG_ON(pg != NULL)
>  #endif
> +#ifdef CONFIG_CACHE_COLORING
> +#include <asm/coloring.h>
> +#endif
>  
>  #ifndef PGC_static
>  #define PGC_static 0
> @@ -231,6 +234,9 @@ static bool __read_mostly scrub_debug;
>  #define scrub_debug    false
>  #endif
>  
> +/* Memory required for buddy allocator to work with colored one */
> +static unsigned long __initdata buddy_alloc_size;
> +
>  /*
>   * Bit width of the DMA heap -- used to override NUMA-node-first.
>   * allocation strategy, which can otherwise exhaust low memory.
> @@ -440,7 +446,172 @@ mfn_t __init alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
>      BUG();
>  }
>  
> +static DEFINE_SPINLOCK(heap_lock);
>  
> +/* Initialise fields which have other uses for free pages. */
> +static void init_free_page_fields(struct page_info *pg)
> +{
> +    pg->u.inuse.type_info = PGT_TYPE_INFO_INITIALIZER;
> +    page_set_owner(pg, NULL);
> +}
> +
> +static bool is_page_colored(struct page_info *pg)
> +{
> +    return pg->count_info & PGC_colored;
> +}
> +
> +#ifdef CONFIG_CACHE_COLORING
> +/*************************
> + * COLORED SIDE-ALLOCATOR
> + *
> + * Pages are stored by their color in separate lists. Each list defines a
> + * color and it is initialized during end_boot_allocator, where each page's
> + * color is calculated and the page itself is put in the correct list.
> + * After initialization there will be N lists where N is the number of
> + * maximum available colors on the platform.
> + */

Nit: Perhaps either "maximum number of colors" or "number of available
colors"?

> +typedef struct page_list_head colored_pages_t;
> +static colored_pages_t __ro_after_init *color_heap;

Please place the attribute at its canonical slot:

static colored_pages_t *__ro_after_init color_heap;

It applies to the variable, not to the pointed-to type.

> +#define colored_pages(color) &color_heap[(color)]

The parentheses want to move:

#define colored_pages(color) (&color_heap[color])

> +static void color_heap_insert_page(struct page_info *pg)
> +{
> +    struct page_info *pos;
> +    struct page_list_head *head = colored_pages(page_to_color(pg));
> +
> +    pg->count_info |= PGC_colored;

The function isn't marked __init, so runtime correctness as to the
(non-atomic) update here wants clarifying.
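
If runtime insertions are indeed intended, an atomic update may be
needed, e.g. (merely a sketch):

    set_bit(_PGC_colored, &pg->count_info);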

> +    /* Insert page in list in ascending machine address order */

Isn't it descending order that you actually use, also seeing that
you take the first page of a list when allocating (further down)?

> +    page_list_for_each( pos, head )
> +    {
> +        if ( page_to_maddr(pos) < page_to_maddr(pg) )
> +        {
> +            head = &pos->list;
> +            break;
> +        }
> +    }

Wait - a linear search for every single page insertion? How well
is that going to perform on a multi-terabyte system?
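
If boot-time insertion really occurs in ascending address order (as the
loop in init_col_heap_pages() suggests), the common case could avoid the
walk entirely - a sketch:

    /* Each new page has the highest address seen so far, so an O(1)
     * head insertion keeps the list sorted in descending order. */
    if ( page_list_empty(head) ||
         page_to_maddr(page_list_first(head)) <= page_to_maddr(pg) )
        page_list_add(pg, head);

with the linear walk then needed only for out-of-order (runtime) frees.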

> +    page_list_add_tail(pg, head);

page_list_head and page_list_entry are generally different things,
so I'm afraid this isn't correct in the common case, and you're
getting away with it only because only Arm currently enables this
code.

> +}
> +
> +static void color_heap_remove_page(struct page_info *pg)
> +{
> +    page_list_del(pg, colored_pages(page_to_color(pg)));
> +}
> +
> +static void __init init_col_heap_pages(struct page_info *pg,
> +                                       unsigned long nr_pages)
> +{
> +    unsigned int i;
> +
> +    if ( !color_heap )
> +    {
> +        unsigned int max_colors = get_max_colors();
> +        color_heap = xmalloc_array(colored_pages_t, max_colors);

Nit: Please always have a blank line between declaration(s) and
statement(s).

> +        BUG_ON(!color_heap);
> +
> +        for ( i = 0; i < max_colors; i++ )
> +            INIT_PAGE_LIST_HEAD(colored_pages(i));
> +
> +        if ( !buddy_alloc_size )
> +            buddy_alloc_size = parse_size_and_unit(CONFIG_BUDDY_ALLOCATOR_SIZE,
> +                                                   NULL);
> +    }
> +
> +    printk(XENLOG_INFO "Init color heap with %lu pages\n", nr_pages);
> +    printk(XENLOG_INFO "Paging starting from: 0x%"PRIx64"\n",
> +           page_to_maddr(pg));

"Paging"? And please prefer %# over 0x% for hex formatting, as we do
elsewhere.
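
I.e. presumably something like (with whatever the first word was really
meant to be):

    printk(XENLOG_INFO "Pages starting from: %#"PRIx64"\n",
           page_to_maddr(pg));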

> +    for ( i = 0; i < nr_pages; i++ )
> +        color_heap_insert_page(pg++);
> +}
> +
> +/* Alloc one page based on domain color configuration */
> +static struct page_info *alloc_col_heap_page(unsigned int memflags,
> +                                             const unsigned int *colors,
> +                                             unsigned int num_colors)

Array instead of pointer notation would better reflect the purpose.

> +{
> +    struct page_info *pg = NULL;
> +    unsigned int i;
> +    bool need_tlbflush = false;
> +    uint32_t tlbflush_timestamp = 0;
> +
> +    spin_lock(&heap_lock);
> +
> +    for ( i = 0; i < num_colors; i++ )
> +    {
> +        struct page_info *tmp;
> +
> +        if ( page_list_empty(colored_pages(colors[i])) )
> +            continue;
> +
> +        tmp = page_list_first(colored_pages(colors[i]));
> +        if ( !pg || page_to_maddr(tmp) > page_to_maddr(pg) )
> +            pg = tmp;
> +    }
> +
> +    if ( !pg )
> +    {
> +        spin_unlock(&heap_lock);
> +        return NULL;
> +    }
> +
> +    pg->count_info = PGC_state_inuse;

Aren't you losing PGC_colored here?
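
I.e. presumably (sketch)

    pg->count_info = (pg->count_info & PGC_colored) | PGC_state_inuse;

wants using here.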

> +static struct page_info *alloc_col_domheap_page(struct domain *d,
> +                                                unsigned int memflags)
> +{
> +    struct page_info *pg;
> +
> +    ASSERT(!in_irq());

ASSERT_ALLOC_CONTEXT()? Albeit that's redundant then with the sole caller.

> +#else
> +static void free_col_domheap_page(struct page_info *pg)
> +{
> +    return;
> +}

No need for "return" here.

> @@ -1939,11 +2107,24 @@ void __init end_boot_allocator(void)
>              break;
>          }
>      }
> -    for ( i = nr_bootmem_regions; i-- > 0; )
> +
> +    for ( i = 0; i < nr_bootmem_regions; i++ )

I'm afraid you can't simply go and invert the direction this loop works
without any justification. It's not even clear why you need to work
forwards in the colored case.

Jan