
Re: [PATCH 06/12] xen/common: add cache coloring allocator for domains


  • To: Carlo Nonato <carlo.nonato@xxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 19 Sep 2022 08:26:16 +0200
  • Cc: andrew.cooper3@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, julien@xxxxxxx, stefano.stabellini@xxxxxxx, wl@xxxxxxx, marco.solieri@xxxxxxxxxx, andrea.bastoni@xxxxxxxxxxxxxxx, lucmiccio@xxxxxxxxx, Marco Solieri <marco.solieri@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 19 Sep 2022 06:26:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.09.2022 18:05, Carlo Nonato wrote:
> On Thu, Sep 15, 2022 at 3:13 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 26.08.2022 14:51, Carlo Nonato wrote:
>>> --- a/xen/arch/arm/Kconfig
>>> +++ b/xen/arch/arm/Kconfig
>>> @@ -147,6 +147,18 @@ config MAX_CACHE_COLORS
>>>         colors at boot. Note that if, at any time, a color configuration with more
>>>         colors than the maximum will be employed an error will be produced.
>>>
>>> +config BUDDY_ALLOCATOR_SIZE
>>> +     string "Buddy allocator reserved memory size" if CACHE_COLORING
>>> +     default "64M" if CACHE_COLORING
>>> +     default "0M" if !CACHE_COLORING
>>
>> I don't understand the purpose of this last line, nor the two earlier
>> "if". Why not simply
>>
>> config BUDDY_ALLOCATOR_SIZE
>>         string "Buddy allocator reserved memory size"
>>         depends on CACHE_COLORING
>>         default "64M"
> 
> This was just to have a value for the config option even with cache coloring
> disabled. All those ifs emulate the "depends on" keyword, but let
> CONFIG_BUDDY_ALLOCATOR_SIZE take "0M" when coloring is disabled. With just
> the "depends on" the macro isn't defined at all. I know that this can be
> handled with a simple #ifdef, but I found this way to be more elegant.
> Not an expert here, so if you prefer the other way or a whole different one
> (more readable / a better fit) please let me know.

As far as I saw, the sole use was already inside a suitable #ifdef. Hence
yes, I clearly would see "depends on" as the better fit. Please also don't
forget that if cache coloring were later enabled for another architecture,
that default of zero (pre-recorded in a .config) would get in the way; one
would need to manually change it (and remember to do so).
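
For illustration (not the actual patch code; the function and variable
names here are made up), the resulting pattern at a use site could look
like this. With a plain "depends on", CONFIG_BUDDY_ALLOCATOR_SIZE is
simply undefined whenever coloring is off, and the whole block compiles
away without needing a "0M" fallback:

#ifdef CONFIG_CACHE_COLORING
static unsigned long long __initdata buddy_alloc_size;

static void __init init_buddy_allocator_size(void)
{
    /* parse_size_and_unit() is the existing helper for "64M"-style
     * size strings. */
    buddy_alloc_size = parse_size_and_unit(CONFIG_BUDDY_ALLOCATOR_SIZE,
                                           NULL);
}
#endif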

>> Finally - how much of this is really Arm-specific? Shouldn't this be a
>> common config option, perhaps merely restricted to Arm by the top level
>> option (CACHE_COLORING?) depending on a further HAS_CACHE_COLORING,
>> which only Arm would select?
> 
> I'm sorry, but I don't understand your suggestion. BUDDY_ALLOCATOR_SIZE
> is Arm-specific because CACHE_COLORING is. In fact it depends only on that
> config option and not on the Arm config directly. Why should someone limit
> the buddy allocator when coloring isn't enabled?

My comment wasn't on this one setting alone, but on the coloring ones as a
set.

> I've lost you on the HAS_CACHE_COLORING. Why should the Arm config select
> this one? Cache coloring must remain optional. I'm probably missing something.

HAS_* settings only express what an arch is capable of; they don't replace
the dependent options which actually are user-selectable. (That said, we
have a number of options where no user selection is possible, but that's
not of interest here.)
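
For illustration, the arrangement described above might look like this (a
sketch; option names and placement are assumptions, not the actual patch):

# xen/common/Kconfig
config HAS_CACHE_COLORING
        bool

config CACHE_COLORING
        bool "Last level cache coloring support"
        depends on HAS_CACHE_COLORING

config BUDDY_ALLOCATOR_SIZE
        string "Buddy allocator reserved memory size"
        depends on CACHE_COLORING
        default "64M"

# xen/arch/arm/Kconfig would then "select HAS_CACHE_COLORING" under its
# top-level option, keeping CACHE_COLORING itself user-selectable.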

>>> --- a/xen/arch/arm/coloring.c
>>> +++ b/xen/arch/arm/coloring.c
>>> @@ -300,6 +300,16 @@ void prepare_color_domain_config(struct 
>>> xen_arch_domainconfig *config,
>>>      config->num_colors = (uint16_t)num;
>>>  }
>>>
>>> +unsigned int page_to_color(struct page_info *pg)
>>
>> The parameter will want to be pointer-to-const and I wonder whether ...
>>
>>> +{
>>> +    return addr_to_color(page_to_maddr(pg));
>>> +}
>>
>> ... the function as a whole wouldn't be a good candidate for being an
>> inline one (requiring addr_to_color() to be available outside of
>> this file, of course).
> 
> You mean defining it as static inline in the coloring.h header?

That would seem preferable for a simple function like this one.
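
A sketch of that shape (assuming addr_to_color() gains a declaration
visible outside coloring.c, per the comment above):

/* In coloring.h; illustrative only. */
unsigned int addr_to_color(paddr_t maddr);

static inline unsigned int page_to_color(const struct page_info *pg)
{
    return addr_to_color(page_to_maddr(pg));
}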

>>> +static void color_heap_insert_page(struct page_info *pg)
>>> +{
>>> +    struct page_info *pos;
>>> +    struct page_list_head *head = colored_pages(page_to_color(pg));
>>> +
>>> +    pg->count_info |= PGC_colored;
>>
>> The function isn't marked __init, so runtime correctness as to the
>> (non-atomic) update here wants clarifying.
> 
> Yes. I need to check and probably add a spin lock for the color heap.

I'm afraid a spin lock won't help. May I suggest you look at some of
the staticmem discussions that have happened, where a similar topic
came up. (Sorry, I don't have a link to the exact mail at hand.)

>>> +    page_list_for_each( pos, head )
>>> +    {
>>> +        if ( page_to_maddr(pos) < page_to_maddr(pg) )
>>> +        {
>>> +            head = &pos->list;
>>> +            break;
>>> +        }
>>> +    }
>>
>> Wait - a linear search for every single page insertion? How well
>> is that going to perform on a multi-terabyte system?
> 
> For our test cases (embedded systems) the linear search is good enough.
> I agree with you that in the general case this is bad (even though the main
> targets are indeed embedded systems).
> Are there any existing data structures that we can exploit to get
> better performance?

I'm afraid there aren't any that I would see as a good fit here.

>>> +    page_list_add_tail(pg, head);
>>
>> page_list_head and page_list_entry are generally different things,
>> so I'm afraid this isn't correct in the common case, and you're
>> getting away with it only because only Arm currently enables this
>> code.
> 
> So how to insert in the middle of the list?

That would likely require a new page_list_*() helper function. So far
we simply had no need to insert anywhere other than at the head or
tail, iirc.
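
For illustration, in the configuration where page lists are plain
list_head lists (as on Arm today), such a helper could be as simple as
the following. The name is made up, and the pdx-compressed
page_list_entry layout would need its own variant, which is exactly the
point above:

/* Hypothetical helper: insert 'pg' immediately before 'pos', which must
 * already be a member of the list.  Plain-list layout only. */
static inline void page_list_add_before(struct page_info *pg,
                                        struct page_info *pos)
{
    list_add_tail(&pg->list, &pos->list);
}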

>>> @@ -1939,11 +2107,24 @@ void __init end_boot_allocator(void)
>>>              break;
>>>          }
>>>      }
>>> -    for ( i = nr_bootmem_regions; i-- > 0; )
>>> +
>>> +    for ( i = 0; i < nr_bootmem_regions; i++ )
>>
>> I'm afraid you can't simply go and invert the direction this loop works
>> in without any justification. It's not even clear why you need to work
>> forwards in the colored case.
> 
> The order was inverted because I'm assuming bootmem regions are stored in
> ascending order (this should be the case looking at bootmem_region_add().
> Am I wrong?) and (second assumption) pages taken from each memory region
> are sorted in ascending order relative to their machine address.
> This means that color_heap_insert_page() is called (at least during
> end_boot_allocator()) with monotonically increasing machine addresses, so
> the linear search should be O(1).

Indeed that was my guess. Yet the present order of processing is there
for a reason, so you need to both retain it and supply justification
(perhaps by way of a brief comment) for why you need alternative code
for your allocator. Even better, of course, would be if insertion were
more efficient and didn't require the logic here to take special care.
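
For illustration only (not part of the patch; page_list_add_before() is
the hypothetical helper sketched above), one way to keep the original
reverse walk in end_boot_allocator() and still make the common insertion
O(1) would be to special-case descending arrival before falling back to
the linear search:

static void color_heap_insert_page(struct page_info *pg)
{
    struct page_info *pos;
    struct page_list_head *head = colored_pages(page_to_color(pg));

    pg->count_info |= PGC_colored;

    /* Fast path: when pages arrive in descending address order, as they
     * do from the reverse region walk, each one belongs at the tail. */
    if ( !page_list_empty(head) &&
         page_to_maddr(page_list_last(head)) > page_to_maddr(pg) )
    {
        page_list_add_tail(pg, head);
        return;
    }

    /* Slow path: keep the list sorted in descending address order. */
    page_list_for_each( pos, head )
    {
        if ( page_to_maddr(pos) < page_to_maddr(pg) )
        {
            page_list_add_before(pg, pos);
            return;
        }
    }

    page_list_add_tail(pg, head);
}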

Jan
