[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v3 2/6] xen/x86: move generically usable NUMA code from x86 to common


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Wei Chen <Wei.Chen@xxxxxxx>
  • Date: Mon, 29 Aug 2022 09:49:01 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com])
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=I2tP5Pge+XN3MHTJzNxXJi3U0rB5hYo1z3hbP1gUjKY=; b=MU1cQ9sdOkzXDKKTqOzqRuWCsU/9Z7TUpzQozGYnsNYFUYtjDLG6dD7Tvx+IOaU3kAZaq2zkz9/I7Fe66hQEUufzYRmGDlB1QISD9M7vd485NL1KueH8MkRVsbkgGVz6Wf1bfQav7TukCKVbgVSittBSLgwEoNDdOLlyfTX5+I4PjGu4Ze0ETYg4Rth80XQl2oKVfxxoi0U7Vawl/rXr5Jfu6kf9aEUL2/8rmAqKWVtEHeDARs1u4TjptipXwr02J21fX5fGrlHU/v5/zpbHnO1qJd0IB+MQzp33PnYJRKdN96pl2Uwxl1uCVEbZQB5iv6nG6KOBW3yH8mHJKuALgw==
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=I2tP5Pge+XN3MHTJzNxXJi3U0rB5hYo1z3hbP1gUjKY=; b=D0ToUhg/ncVuBc/r69Qx6j4Ofrp1ibHt6E8jgElEZFBCZyk6Tm3f65JNBvWNCTMGAE6Q4kLJ6xh0q9YNeUPN8fKtdQZvSyV9tDnjYbf6ByrLMbdXitjo2FL52AwsjWLmbq2W0aW94V6woZKDKzusW2mbVvgGGJw6pLfe4GQZ0S++wO2oZpZVzOyPkMYa/BVxnVT3ybEm2Eqd1FNyy8N6yDhnOg1e0p9NbRYKixXCod82YSrMu9rVntTr9ZSt+8tjmHeBmg7qDczIjtCJtNZWcdIifa+Z2YMY5PegjkpCiNeVcOjBULrYFBM4WddHQM3kX9P4zInSbJWDM+SSFFlTVw==
  • Arc-seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=nWb5yT7/uYTDTtGutJJ4gHj94E8uikA8mQtfsruTXWSxEXMc/ktsY640elNVXzhVVvbhGckeaF6vOsl2mi5c2CSkkPcEnWe+dlIbLM7+aPr6DnYf9Y6gpdBH10e2NEdJgbzezpeHKyOy0K+1Cz1NawxoX1mDNSBZHE76nPBXR4DIijvt3+qUXaTdR59rxOCpVlAvZziFAyRNu2rdE9+F8GaRr53i3LjGESeP3dl1WMqfI6IYM2X/KNVgtX+JX6suFUhTd6EacuSAiuFrqeHKSesHQM9lne1xnYW8J7NxXL4ahzKKeXzmA94KJwZ6nUmcjmv+sKMkBTx2E7eppd0XaA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=J5FPmUetpwF1qGb447FdoAncHE4OYIFZWOggwJrV78GrWoqhopvAvbpgSzWSMjiE7Qdn4gObPVmulJvrz/MeYyjbyQ3iFmAh2oAvx/oSapb5y7vcBc4o+dBYCtgY0WaBstuUyXFiO+paDkf0Ya2t+0WXTXrKoDBhdYQ0xpTlusZWB6r9yoaV30621U7BUPdrYdGCqHTEN+DZ87TKjTzLt1qE3WH9PdHSAo4xig54oczbpaMaMLBrBlHUOjoGH+lBa5mggaLTHr/Lh1Fcxf7fAzlqJj5ViTLm0sEsF3KUy0yhp3EUX6mmlc28j0YIK/6hL7FluW4pL4jfi935E36SuQ==
  • Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Cc: nd <nd@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 29 Aug 2022 09:49:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHYtdMye4MHuNHq+UOzJ3rwmp69yK2/d3mAgAYmodA=
  • Thread-topic: [PATCH v3 2/6] xen/x86: move generically usable NUMA code from x86 to common

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 2022年8月25日 18:58
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: nd <nd@xxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Roger Pau
> Monné <roger.pau@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>; George Dunlap
> <george.dunlap@xxxxxxxxxx>; Julien Grall <julien@xxxxxxx>; Stefano
> Stabellini <sstabellini@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 2/6] xen/x86: move generically usable NUMA code
> from x86 to common
> 
> On 22.08.2022 04:58, Wei Chen wrote:
> > --- /dev/null
> > +++ b/xen/common/numa.c
> > @@ -0,0 +1,440 @@
> > +/*
> > + * Generic VM initialization for NUMA setups.
> > + * Copyright 2002,2003 Andi Kleen, SuSE Labs.
> > + * Adapted for Xen: Ryan Harper <ryanh@xxxxxxxxxx>
> > + */
> > +
> > +#include <xen/init.h>
> > +#include <xen/keyhandler.h>
> > +#include <xen/mm.h>
> > +#include <xen/nodemask.h>
> > +#include <xen/numa.h>
> > +#include <xen/param.h>
> > +#include <xen/sched.h>
> > +#include <xen/softirq.h>
> > +
> > +struct node_data __ro_after_init node_data[MAX_NUMNODES];
> > +
> > +/* Mapping from pdx to node id */
> > +unsigned int __ro_after_init memnode_shift;
> > +unsigned long __ro_after_init memnodemapsize;
> > +uint8_t *__ro_after_init memnodemap;
> > +static uint8_t __ro_after_init _memnodemap[64];
> > +
> > +nodeid_t __ro_after_init cpu_to_node[NR_CPUS] = {
> 
> I don't think this can be __ro_after_init, or you'll break CPU
> hotplug.
> 

Yes, this will cause problem in cpu_add. I will __read_mostly for it
in next version.

> > +    [0 ... NR_CPUS-1] = NUMA_NO_NODE
> > +};
> > +
> > +cpumask_t __ro_after_init node_to_cpumask[MAX_NUMNODES];
> 
> Same here.
> 

Ok.

> > +nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
> > +
> > +bool __read_mostly numa_off;
> 
> This, otoh, can be, or have I missed a place where it's written by a
> non-__init function?
> 

I think yes, it will be used in numa_disabled and numa_disabled will
be called in cpu_add.

> > +bool numa_disabled(void)
> > +{
> > +    return numa_off || arch_numa_disabled(false);
> > +}
> > +
> > +/*
> > + * Given a shift value, try to populate memnodemap[]
> > + * Returns :
> > + * 1 if OK
> > + * 0 if memnodmap[] too small (of shift too small)
> > + * -1 if node overlap or lost ram (shift too big)
> > + */
> > +static int __init populate_memnodemap(const struct node *nodes,
> > +                                      nodeid_t numnodes, unsigned int
> shift,
> 
> I don't think you can use nodeid_t for a variable holding a node count.
> Think of what would happen if there were 256 nodes, the IDs of which
> all fit in nodeid_t. (Same again further down.)
> 

If we use u8 as nodeid_t, why there will be 256 nodes to here?
And the MAX_NUMNODES has been limited to 64 (using NODES_SHIFT or
CONFIG_NR_NUMA_NODES). If we allow 256 nodes, we have to update MAX_NUMNODES
and nodeid_t first I think?

> > +                                      nodeid_t *nodeids)
> > +{
> > +    unsigned long spdx, epdx;
> > +    nodeid_t i;
> 
> This is likely inefficient for a loop counter variable. Note how you
> use "unsigned int" in e.g. extract_lsb_from_nodes().
> 

Did you mean u8 for "i" will cause something like unalignment, and will
cause loop inefficient. If yes, I will use unsigned int for "i" in next
version.

> > +unsigned int __init compute_hash_shift(const struct node *nodes,
> > +                                       nodeid_t numnodes, nodeid_t
> *nodeids)
> > +{
> > +    unsigned int shift;
> > +
> > +    shift = extract_lsb_from_nodes(nodes, numnodes);
> > +    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
> > +        memnodemap = _memnodemap;
> > +    else if ( allocate_cachealigned_memnodemap() )
> > +        return -1;
> 
> With this the function can't very well have "unsigned int" return type.
> 

Oh, yes, I had only thought a negative shift will be strange and hadn't
noticed this return value. I will restore the return value.

> > +void __init numa_init_array(void)
> > +{
> > +    int rr, i;
> 
> "unsigned int" for i and perhaps nodeid_t for rr?
> 

Yes, I will do it.

> > +static int __init numa_emulation(unsigned long start_pfn,
> > +                                 unsigned long end_pfn)
> > +{
> > +    unsigned int i;
> > +    struct node nodes[MAX_NUMNODES];
> > +    uint64_t sz = pfn_to_paddr(end_pfn - start_pfn) / numa_fake;
> > +
> > +    /* Kludge needed for the hash function */
> > +    if ( hweight64(sz) > 1 )
> > +    {
> > +        u64 x = 1;
> 
> uint64_t and a blank line between declaration(s) and statement(s)
> please.
> 

OK.

> > +        while ( (x << 1) < sz )
> > +            x <<= 1;
> > +        if ( x < sz / 2 )
> > +            printk(KERN_ERR "Numa emulation unbalanced. Complain to
> maintainer\n");
> > +        sz = x;
> > +    }
> > +
> > +    memset(&nodes, 0, sizeof(nodes));
> > +    for ( i = 0; i < numa_fake; i++ )
> > +    {
> > +        nodes[i].start = pfn_to_paddr(start_pfn) + i * sz;
> > +        if ( i == numa_fake - 1 )
> > +            sz = pfn_to_paddr(end_pfn) - nodes[i].start;
> > +        nodes[i].end = nodes[i].start + sz;
> > +        printk(KERN_INFO "Faking node %u at %"PRIx64"-%"PRIx64"
> (%"PRIu64"MB)\n",
> > +               i, nodes[i].start, nodes[i].end,
> > +               (nodes[i].end - nodes[i].start) >> 20);
> > +        node_set_online(i);
> > +    }
> > +    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
> > +    if ( memnode_shift < 0 )
> 
> Does the compiler not warn here, comparing an unsigned value for being
> negative?

It's strange, I haven't seen warnings for this kind of comparison.

> 
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -18,4 +18,70 @@
> >    (((d)->vcpu != NULL && (d)->vcpu[0] != NULL) \
> >     ? vcpu_to_node((d)->vcpu[0]) : NUMA_NO_NODE)
> >
> > +/* The following content can be used when NUMA feature is enabled */
> > +#ifdef CONFIG_NUMA
> > +
> > +extern nodeid_t      cpu_to_node[NR_CPUS];
> > +extern cpumask_t     node_to_cpumask[];
> > +
> > +#define cpu_to_node(cpu)        (cpu_to_node[cpu])
> > +#define parent_node(node)       (node)
> > +#define node_to_first_cpu(node) (__ffs(node_to_cpumask[node]))
> > +#define node_to_cpumask(node)   (node_to_cpumask[node])
> 
> Please could you take the opportunity and drop unnecessary parentheses
> from here? Afaict only parent_node() need them to be kept.
>

OK.

> > +struct node {
> > +    paddr_t start, end;
> > +};
> > +
> > +extern unsigned int compute_hash_shift(const struct node *nodes,
> > +                                       nodeid_t numnodes, nodeid_t
> *nodeids);
> > +
> > +#define VIRTUAL_BUG_ON(x)
> > +
> > +extern bool numa_off;
> > +extern void numa_add_cpu(unsigned int cpu);
> 
> Please can you have variable and function declarations visually separated,
> by adding a blank line between them?
>

Sure. I will do it.
 
> > +extern void numa_init_array(void);
> > +extern void numa_set_node(unsigned int cpu, nodeid_t node);
> > +extern void numa_initmem_init(unsigned long start_pfn, unsigned long
> end_pfn);
> > +extern int  numa_scan_nodes(paddr_t start, paddr_t end);
> > +
> > +extern int arch_numa_setup(const char *opt);
> > +extern bool arch_numa_disabled(bool init_as_disable);
> > +extern void setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t
> end);
> > +
> > +static inline void clear_node_cpumask(unsigned int cpu)
> > +{
> > +    cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> > +}
> > +
> > +/* Simple perfect hash to map pdx to node numbers */
> > +extern unsigned int memnode_shift;
> > +extern unsigned long memnodemapsize;
> > +extern uint8_t *memnodemap;
> > +
> > +struct node_data {
> > +    unsigned long node_start_pfn;
> > +    unsigned long node_spanned_pages;
> > +};
> > +
> > +extern struct node_data node_data[];
> > +
> > +static inline __attribute_pure__ nodeid_t phys_to_nid(paddr_t addr)
> 
> Nit: The conventional place for attributes is between return type
> and function (or object) name.
>

Ok.
 
> > +{
> > +    nodeid_t nid;
> > +    VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >=
> memnodemapsize);
> > +    nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> > +    VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]);
> > +    return nid;
> > +}
> > +
> > +#define NODE_DATA(nid)          (&(node_data[nid]))
> 
> Again please take the opportunity and drop the unnecessary inner
> parentheses.
> 

Ok.

> > +#define node_start_pfn(nid)     (NODE_DATA(nid)->node_start_pfn)
> > +#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
> > +#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> > +                                NODE_DATA(nid)->node_spanned_pages)
> 
> Pleae correct indentation here - it was correct originally (except
> for the fact that it was using hard tabs).
>

Ok.

Cheers,
Wei Chen
 
> Jan

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.