
RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node



On Wed, 8 Sep 2021, Wei Chen wrote:
> > > >>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > > >>>       return 0;
> > > >>>   }
> > > >>>
> > > >>> +/* Callback for parsing of the memory regions affinity */
> > > >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > > >>> +                                paddr_t start, paddr_t size)
> > > >>> +{
> > > >>> +    struct node *nd;
> > > >>> +    paddr_t end;
> > > >>> +    int i;
> > > >>> +
> > > >>> +    if ( srat_disabled() )
> > > >>> +        return -EINVAL;
> > > >>> +
> > > >>> +    end = start + size;
> > > >>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > > >>> +    {
> > > >>> +        dprintk(XENLOG_WARNING,
> > > >>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > > >>> +        bad_srat();
> > > >>> +        return -EINVAL;
> > > >>> +    }
> > > >>> +
> > > >>> +    /* It is fine to add this area to the nodes data it will be used later */
> > > >>> +    i = conflicting_memblks(start, end);
> > > >>> +    /* No conflicting memory block, we can save it for later usage */;
> > > >>> +    if ( i < 0 )
> > > >>> +        goto save_memblk;
> > > >>> +
> > > >>> +    if ( memblk_nodeid[i] == node ) {
> > > >>> +        /*
> > > >>> +         * Overlaps with other memblk in the same node, warning here.
> > > >>> +         * This memblk will be merged with conflicted memblk later.
> > > >>> +         */
> > > >>> +        printk(XENLOG_WARNING
> > > >>> +               "DT: NUMA NODE %u (%"PRIx64
> > > >>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
> > > >>> +               node, start, end,
> > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > >>> +    } else {
> > > >>> +        /*
> > > >>> +         * Conflict with memblk in other node, this is an error.
> > > >>> +         * The NUMA information is invalid, NUMA will be turn off.
> > > >>> +         */
> > > >>> +        printk(XENLOG_ERR
> > > >>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > > >>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> > > >>> +               node, start, end, memblk_nodeid[i],
> > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > >>> +        bad_srat();
> > > >>> +        return -EINVAL;
> > > >>> +    }
> > > >>> +
> > > >>> +save_memblk:
> > > >>> +    nd = &nodes[node];
> > > >>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > > >>> +        nd->start = start;
> > > >>> +        nd->end = end;
> > > >>> +    } else {
> > > >>> +        if ( start < nd->start )
> > > >>> +            nd->start = start;
> > > >>> +        if ( nd->end < end )
> > > >>> +            nd->end = end;
> > > >>> +    }
> > > >>> +
> > > >>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > > >>> +           node, start, end);
> > > >>> +
> > > >>> +    node_memblk_range[num_node_memblks].start = start;
> > > >>> +    node_memblk_range[num_node_memblks].end = end;
> > > >>> +    memblk_nodeid[num_node_memblks] = node;
> > > >>> +    num_node_memblks++;
> > > >>
> > > >>
> > > >> Is it possible to have non-contiguous ranges of memory for a single
> > > >> NUMA node?
> > > >>
> > > >> Looking at the DT bindings and Linux implementation, it seems
> > > >> possible. Here, it seems that node_memblk_range/memblk_nodeid could
> > > >> handle it, but nodes couldn't.
> > > >
> > > > Yes, you're right. I copied this code from x86 ACPI NUMA. Does ACPI
> > > > allow non-contiguous ranges of memory for a single NUMA node too?
> > >
> > > I couldn't find any restriction for ACPI, although I only briefly
> > > looked at the spec.
> > >
> > > > If yes, I think this will affect x86 ACPI NUMA too. In the next
> > > > version, we plan to merge dtb_numa_memory_affinity_init and
> > > > acpi_numa_memory_affinity_init into a neutral function, so we can
> > > > fix them at the same time.
> > > >
> > > > If not, maybe we have to keep separate implementations for dtb and
> > > > ACPI here.
> > >
> > > I am not entirely sure what you mean. Are you saying that if ACPI
> > > doesn't allow non-contiguous ranges of memory, then we should keep
> > > the implementations separate?
> > >
> > > If so, then I disagree with that. It is fine to have code that supports
> > > more than what a firmware table supports. The main benefit is less code
> > > and therefore less long-term maintenance (with the current solution we
> > > would need to check both the ACPI and DT implementations if there is a
> > > bug in one).
> > >
> > 
> > Yes, I agree.
> > 
> 
> I am looking for some methods to address this comment. The current
> "nodes" array does not consider the situation where memory addresses
> of different NUMA nodes can be interleaved.
> 
> This code exists in the x86 NUMA implementation. I think it may be
> based on an early version of the Linux x86 NUMA implementation. In
> recent Linux code, neither ACPI/numa/srat.c [1] nor the x86 NUMA
> code [2] uses "nodes" to record NUMA memory address boundaries; they
> don't depend on "nodes" to do sanity checks.
> 
> To fix it, we'd better upgrade the x86 NUMA driver. That would have a
> significant impact on Xen-x86, and I think it might be out of this
> series' scope. Can we create another thread to discuss it?
> 
> Or could you suggest some simpler way to fix it?
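
To make the interleaving problem concrete (addresses made up purely for
illustration): suppose node 0 owns 0x0000-0x4000 and 0x8000-0xc000, and
node 1 owns 0x4000-0x8000. The memblk arrays in the patch represent this
exactly, but folding node 0 into a single nodes[] span is lossy:

  node_memblk_range[]/memblk_nodeid[]:
      0x0000-0x4000 -> node 0
      0x4000-0x8000 -> node 1
      0x8000-0xc000 -> node 0

  nodes[] spans:
      nodes[0] = 0x0000-0xc000   <- now appears to contain node 1
      nodes[1] = 0x4000-0x8000

Any check based on the nodes[] spans would report a bogus conflict here,
even though the memblk arrays are unambiguous.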

It looks like we would have to replace all the node->start / node->end
checks with node_memblk_range checks. There are a few of them in
valid_numa_range, conflicting_memblks, cutoff_node, and
nodes_cover_memory. It wouldn't be trivial.
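
As a rough, untested sketch of the direction (it reuses the existing
node_memblk_range[], memblk_nodeid[] and num_node_memblks globals; the
function name is made up):

static bool __init node_covers_range(nodeid_t node, paddr_t start,
                                     paddr_t end)
{
    unsigned int i;

    for ( i = 0; i < num_node_memblks; i++ )
    {
        /* Only blocks belonging to the node of interest count. */
        if ( memblk_nodeid[i] != node )
            continue;

        /* Is [start, end) fully contained in one of the node's blocks? */
        if ( node_memblk_range[i].start <= start &&
             node_memblk_range[i].end >= end )
            return true;
    }

    return false;
}

A valid_numa_range()-style caller could then use this instead of
comparing against nodes[node].start/end.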

Although I do think that non-contiguous memory for NUMA nodes is
important to support, the patch series is already 40 patches. I don't
think it is a good idea to add other significant changes to it, and I
wouldn't upgrade the x86 NUMA driver now. If we can't find a better
way, we can proceed as you are doing in this version, with the known
gap that we can't deal with non-contiguous memory for NUMA nodes, and
fix it with a follow-up series later. In that case we would want to
have an explicit check for non-contiguous memory for NUMA nodes in
dtb_numa_memory_affinity_init and error out if found.
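
Something along these lines in the save_memblk path could implement that
explicit check (untested sketch, keeping the patch's end = start + size
convention):

    nd = &nodes[node];
    if ( !node_test_and_set(node, memory_nodes_parsed) )
    {
        /* First block for this node: start a fresh span. */
        nd->start = start;
        nd->end = end;
    }
    else if ( end < nd->start || start > nd->end )
    {
        /*
         * The new block neither overlaps nor touches the node's current
         * span, so merging the two would swallow a hole that may belong
         * to another node. Reject it until non-contiguous nodes are
         * properly supported.
         */
        printk(XENLOG_ERR
               "DT: NUMA node %u: non-contiguous memory not supported\n",
               node);
        bad_srat();
        return -EINVAL;
    }
    else
    {
        /* Extend the existing span to cover the new block. */
        if ( start < nd->start )
            nd->start = start;
        if ( end > nd->end )
            nd->end = end;
    }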


> Also, on Linux, the NUMA implementation for x86 is different from the
> Arm64 and RISC-V implementations. [3]
> 
> [1] https://github.com/torvalds/linux/blob/master/drivers/acpi/numa/srat.c
> [2] https://github.com/torvalds/linux/blob/master/arch/x86/mm/numa.c
> [3] https://github.com/torvalds/linux/blob/master/drivers/base/arch_numa.c


In general, I like the idea of sharing code as much as possible between
architectures (x86, ARM, etc.) and between DT/ACPI because it makes the
code maintenance easier and one might even gain certain features for
free.

However, the exercise of sharing code shouldn't take significant
additional effort. In fact, it should decrease the overall effort:
instead of writing new code, one just takes existing code and moves it
to common code.

In this instance, I think it would be good to be able to share the NUMA
initialization code between x86/ARM and ACPI/DT if it doesn't cause
extra effort.

Here, the extra effort that my previous comment might cause doesn't
derive from x86/ARM or DT/ACPI code sharing. It derives from the fact
that our existing code unfortunately doesn't deal with non-contiguous
memory for NUMA nodes. That is something we need to find a way to cope
with anyway.



 

