
Re: [PATCH v4] NUMA: Introduce NODE_DATA->node_present_pages(RAM pages)


  • To: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 31 Oct 2024 12:33:51 +0100
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 31 Oct 2024 11:33:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29.10.2024 16:53, Jan Beulich wrote:
> On 27.10.2024 15:43, Bernhard Kaindl wrote:
>> From: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxx>
>>
>> At the moment, Xen keeps track of the spans of PFNs of the NUMA nodes.
>> But the PFN span sometimes includes large MMIO holes, so these values
>> might not be an exact representation of the total usable RAM of nodes.
>>
>> Xen does not need it, but the size of the NUMA node's memory can be
>> helpful for management tools and HW information tools like hwloc/lstopo
>> with its Xen backend for Dom0: https://github.com/xenserver-next/hwloc/
>>
>> First, introduce NODE_DATA(nodeid)->node_present_pages to node_data[],
>> determine the sum of usable PFNs at boot and update them on memory_add().
>>
>> (The Linux kernel handles NODE_DATA->node_present_pages likewise)
>>
>> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxx>
>> ---
>> Changes in v3:
>> - Use PFN_UP/DOWN, refactored further to simplify the code while leaving
>>   compiler-level optimisations to the compiler's optimisation passes.
>> Changes in v4:
>> - Refactored code and doxygen documentation according to the review.
>> ---
>>  xen/arch/x86/numa.c      | 13 +++++++++++++
>>  xen/arch/x86/x86_64/mm.c |  3 +++
>>  xen/common/numa.c        | 36 +++++++++++++++++++++++++++++++++---
>>  xen/include/xen/numa.h   | 21 +++++++++++++++++++++
>>  4 files changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
>> index 4b0b297c7e..3c0574f773 100644
>> --- a/xen/arch/x86/numa.c
>> +++ b/xen/arch/x86/numa.c
>> @@ -100,6 +100,19 @@ unsigned int __init arch_get_dma_bitsize(void)
>>                   + PAGE_SHIFT, 32);
>>  }
>>  
>> +/**
>> + * @brief Retrieves the RAM range for a given index from the e820 memory map.
>> + *
>> + * This function fetches the start and end address (exclusive) of a RAM range
>> + * specified by the given index idx from the e820 memory map.
> 
> I think the use of (exclusive) here leaves room for ambiguity (as it may,
> unusually, apply to start as well then). Imo it would better be put ...
> 
>> + * @param idx The index of the RAM range in the e820 memory map to retrieve.
>> + * @param start Pointer to store the start address of the RAM range.
>> + * @param end Pointer to store the end address of the RAM range.
> 
> ... here, just like you have it ...
> 
>> + * @return 0 on success, -ENOENT if the index is out of bounds,
>> + *         or -ENODATA if the memory map at index idx is not of type E820_RAM.
>> + */
>>  int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
>>  {
>>      if ( idx >= e820.nr_map )
>> --- a/xen/common/numa.c
>> +++ b/xen/common/numa.c
>> @@ -4,6 +4,7 @@
>>   * Adapted for Xen: Ryan Harper <ryanh@xxxxxxxxxx>
>>   */
>>  
>> +#include "xen/pfn.h"
>>  #include <xen/init.h>
>>  #include <xen/keyhandler.h>
>>  #include <xen/mm.h>
>> @@ -499,15 +500,44 @@ int __init compute_hash_shift(const struct node *nodes,
>>      return shift;
>>  }
>>  
>> -/* Initialize NODE_DATA given nodeid and start/end */
>> +/**
>> + * @brief Initialize a NUMA node's node_data structure at boot.
>> + *
>> + * It is given the NUMA node's index in the node_data array as well
>> + * as the start and exclusive end address of the node's memory span
>> + * as arguments and initializes the node_data entry with this information.
>> + *
>> + * It then initializes the total number of usable memory pages within
>> + * the NUMA node's memory span using the arch_get_ram_range() function.
>> + *
>> + * @param nodeid The index into the node_data array for the node.
>> + * @param start The starting physical address of the node's memory range.
>> + * @param end The exclusive ending physical address of the node's memory range.
> 
> ... here.
> 
>> + */
>>  void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
>>  {
>>      unsigned long start_pfn = paddr_to_pfn(start);
>>      unsigned long end_pfn = paddr_to_pfn(end);
>> +    struct node_data *numa_node = NODE_DATA(nodeid);
>> +    paddr_t start_ram, end_ram;
>> +    unsigned int idx = 0;
>> +    unsigned long *pages = &numa_node->node_present_pages;
>>  
>> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
>> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
>> +    numa_node->node_start_pfn = start_pfn;
>> +    numa_node->node_spanned_pages = end_pfn - start_pfn;
>> +
>> +    /* Calculate the number of present RAM pages within the node: */
>> +    *pages = 0;
>> +    do {
>> +        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
>> +
>> +        if (err == -ENOENT)
>> +            break;
>> +        if ( err || start_ram >= end || end_ram <= start )
>> +            continue;  /* range is outside of the node, or not usable RAM */
>>  
>> +        *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram, start));
>> +    } while (1);
> 
> Nit: While we have ample bad examples, I think even in such while() uses style
> ought to be followed (i.e. "while ( 1 )"). Personally, since this looks a
> little odd to me, I generally prefer "for ( ; ; )" in such cases.
> 
> With respective adjustments (which I'm happy to make while committing, so long
> as you agree):

Ah, no, I take that back. Alejandro's comments also want addressing, one way or
another.
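
For illustration only, here is a standalone sketch of what the counting loop
might look like with the two points above applied: Xen-style spacing with
"for ( ; ; )", and "exclusive" attached to the end parameter only. It is not
the patch as it would be committed. demo_map[], demo_get_ram_range() and
count_present_pages() are hypothetical stand-ins so the sketch builds outside
of Xen, and the 4k page size and the PFN_UP()/PFN_DOWN() definitions are
assumptions of the sketch. The extra check for a clipped range that holds no
whole page is also an addition of the sketch, not something in v4.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Assumed for this sketch: 4k pages, simplified PFN helpers. */
#define PAGE_SHIFT  12
#define PAGE_SIZE   ((paddr_t)1 << PAGE_SHIFT)
#define PFN_DOWN(x) ((unsigned long)((x) >> PAGE_SHIFT))
#define PFN_UP(x)   ((unsigned long)(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT))

/* Hypothetical stand-in for the e820 map. */
struct demo_range {
    paddr_t start, end;   /* end is exclusive */
    int is_ram;
};

static const struct demo_range demo_map[] = {
    { 0x00000000, 0x0009f000, 1 },   /* RAM, 0x9f pages */
    { 0x0009f000, 0x00100000, 0 },   /* not RAM */
    { 0x00100000, 0x00200800, 1 },   /* RAM, ends in the middle of a page */
    { 0x00300000, 0x00400000, 1 },   /* RAM, 0x100 pages */
};

/*
 * Hypothetical equivalent of arch_get_ram_range().
 *
 * @param idx   Index of the range in demo_map[].
 * @param start Pointer to store the start address of the RAM range.
 * @param end   Pointer to store the exclusive end address of the RAM range.
 * @return 0 on success, -ENOENT if idx is out of bounds,
 *         or -ENODATA if the entry at idx is not RAM.
 */
static int demo_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
{
    if ( idx >= sizeof(demo_map) / sizeof(demo_map[0]) )
        return -ENOENT;

    if ( !demo_map[idx].is_ram )
        return -ENODATA;

    *start = demo_map[idx].start;
    *end = demo_map[idx].end;

    return 0;
}

/* Count whole RAM pages inside [start, end); end is exclusive. */
static unsigned long count_present_pages(paddr_t start, paddr_t end)
{
    unsigned long pages = 0;
    unsigned int idx = 0;

    assert(start <= end);

    for ( ; ; )
    {
        paddr_t start_ram, end_ram, lo, hi;
        int err = demo_get_ram_range(idx++, &start_ram, &end_ram);

        if ( err == -ENOENT )
            break;

        if ( err || start_ram >= end || end_ram <= start )
            continue;  /* range is outside of the node, or not usable RAM */

        lo = start_ram > start ? start_ram : start;
        hi = end_ram < end ? end_ram : end;

        /* Sketch-only guard: skip clipped ranges holding no whole page. */
        if ( PFN_UP(lo) < PFN_DOWN(hi) )
            pages += PFN_DOWN(hi) - PFN_UP(lo);
    }

    return pages;
}
```

With the sample map above, count_present_pages(0, 0x400000) skips the non-RAM
entry and rounds the partial page at 0x200800 down, so it counts only whole
pages of the three RAM entries.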

Jan



 

