[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map



On Tue, 31 Aug 2021, Wei Chen wrote:
> Hi Stefano,
> 
> > -----Original Message-----
> > From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Sent: 2021年8月31日 8:48
> > To: Wei Chen <Wei.Chen@xxxxxxx>
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; sstabellini@xxxxxxxxxx; julien@xxxxxxx;
> > jbeulich@xxxxxxxx; Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
> > Subject: Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse
> > device tree NUMA distance map
> > 
> > On Wed, 11 Aug 2021, Wei Chen wrote:
> > > A NUMA aware device tree will provide a "distance-map" node to
> > > describe distance between any two nodes. This patch introduce a
> > > new helper to parse this distance map.
> > >
> > > Signed-off-by: Wei Chen <wei.chen@xxxxxxx>
> > > ---
> > >  xen/arch/arm/numa_device_tree.c | 67 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 67 insertions(+)
> > >
> > > diff --git a/xen/arch/arm/numa_device_tree.c
> > b/xen/arch/arm/numa_device_tree.c
> > > index bbe081dcd1..6e0d1d3d9f 100644
> > > --- a/xen/arch/arm/numa_device_tree.c
> > > +++ b/xen/arch/arm/numa_device_tree.c
> > > @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt,
> > int node,
> > >
> > >      return 0;
> > >  }
> > > +
> > > +/* Parse NUMA distance map v1 */
> > > +int __init
> > > +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> > > +{
> > > +    const struct fdt_property *prop;
> > > +    const __be32 *matrix;
> > > +    int entry_count, len, i;
> > > +
> > > +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> > > +
> > > +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> > > +    if ( !prop )
> > > +    {
> > > +        printk(XENLOG_WARNING
> > > +               "NUMA: No distance-matrix property in distance-map\n");
> > > +
> > > +        return -EINVAL;
> > > +    }
> > > +
> > > +    if ( len % sizeof(uint32_t) != 0 )
> > > +    {
> > > +        printk(XENLOG_WARNING
> > > +               "distance-matrix in node is not a multiple of u32\n");
> > > +        return -EINVAL;
> > > +    }
> > > +
> > > +    entry_count = len / sizeof(uint32_t);
> > > +    if ( entry_count <= 0 )
> > > +    {
> > > +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> > > +
> > > +        return -EINVAL;
> > > +    }
> > > +
> > > +    matrix = (const __be32 *)prop->data;
> > > +    for ( i = 0; i + 2 < entry_count; i += 3 )
> > > +    {
> > > +        uint32_t from, to, distance;
> > > +
> > > +        from = dt_read_number(matrix, 1);
> > > +        matrix++;
> > > +        to = dt_read_number(matrix, 1);
> > > +        matrix++;
> > > +        distance = dt_read_number(matrix, 1);
> > > +        matrix++;
> > > +
> > > +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > > +            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > > +        {
> > > +            printk(XENLOG_WARNING
> > > +                   "Invalid nodes' distance from node#%d to node#%d
> > = %d\n",
> > > +                   from, to, distance);
> > > +            return -EINVAL;
> > > +        }
> > > +
> > > +        printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d
> > = %d\n",
> > > +               from, to, distance);
> > > +        numa_set_distance(from, to, distance);
> > > +
> > > +        /* Set default distance of node B->A same as A->B */
> > > +        if (to > from)
> > > +             numa_set_distance(to, from, distance);
> > 
> > I am a bit unsure about this last 2 lines: why calling numa_set_distance
> > in the opposite direction only when to > from? Wouldn't it be OK to
> > always do both:
> > 
> > numa_set_distance(from, to, distance);
> > numa_set_distance(to, from, distance);
> > 
> > ?
> > 
> I borrowed this code from Linux, but here is my understanding:
> 
> First, I read some notes in Documentation/devicetree/bindings/numa.txt
> 1. Each entry represents distance from first node to second node.
> The distances are equal in either direction.
> 2. distance-matrix should have entries in lexicographical ascending
> order of nodes.
> 
> Here is an example of distance-map node in DTB:
> Sample#1, full list:
>               distance-map {
>                        compatible = "numa-distance-map-v1";
>                        distance-matrix = <0 0  10>,
>                                          <0 1  20>,
>                                          <0 2  40>,
>                                          <0 3  20>,
>                                          <1 0  20>,
>                                          <1 1  10>,
>                                          <1 2  20>,
>                                          <1 3  40>,
>                                          <2 0  40>,
>                                          <2 1  20>,
>                                          <2 2  10>,
>                                          <2 3  20>,
>                                          <3 0  20>,
>                                          <3 1  40>,
>                                          <3 2  20>,
>                                          <3 3  10>;
>               };
>
> Call numa_set_distance when "to > from" will prevent Xen to call
> numa_set_distance(0, 1, 20) again when it's setting distance for <1 0 20>.
> But, numa_set_distance(1, 0, 20) will be call twice.
> 
> Normally, distance-map node will be optimized in following sample#2,
> all redundant entries are removed:
> Sample#2, partial list:
>               distance-map {
>                        compatible = "numa-distance-map-v1";
>                        distance-matrix = <0 0  10>,
>                                          <0 1  20>,
>                                          <0 2  40>,
>                                          <0 3  20>,
>                                          <1 1  10>,
>                                          <1 2  20>,
>                                          <1 3  40>,
>                                          <2 2  10>,
>                                          <2 3  20>,
>                                          <3 3  10>;
>               };
> 
> There is not any "from > to" entry in the map. But using this partial map
> still can set all distances for all pairs. And numa_set_distance(1, 0, 20)
> will be only once.
 
I see. I can't find in Documentation/devicetree/bindings/numa.txt where
it says that "from > to" nodes can be omitted. If it is not written
down, then somebody could easily optimize it the opposite way:

                         distance-matrix = <0 0  10>,
                                           <1 0  20>,
                                           <2 0  40>,
                                           <3 0  20>,
                                           <1 1  10>,
                                           <2 1  20>,
                                           <3 1  40>,
                                           <2 2  10>,
                                           <3 2  20>,
                                           <3 3  10>;

I think the code in Xen should be resilient and able to cope with a
device tree like the one you wrote or the one I wrote. From a code
perspective, it should be very easy to do. If nothing else it would make
Xen more resilient against buggy firmware.

 
> > But in any case, I have a different suggestion. The binding states that
> > "distances are equal in either direction". Also it has an example where
> > only one direction is expressed unfortunately (at the end of the
> > document).
> > 
> 
> Oh, I should see this comment first, then I will not post above
> comment : )
> 
> > So my suggestion is to parse it as follows:
> > 
> > - call numa_set_distance just once from
> >   device_tree_parse_numa_distance_map_v1
> > 
> > - in numa_set_distance:
> >     - set node_distance_map[from][to] = distance;
> >     - check node_distance_map[to][from]
> >           - if unset, node_distance_map[to][from] = distance;
> >           - if already set to the same value, return success;
> >           - if already set to a different value, return error;
> 
> I don't really like this implementation. I want the behavior of
> numa_set_distance just like the function name, do not include
> implicit operations. Otherwise, except the user read this function
> implementation before he use it, he probably doesn't know this
> function has done so many things.

You can leave numa_set_distance as-is without any implicit operations.

In that case, just call numa_set_distance twice from numa_set_distance
for both from/to and to/from. numa_set_distance could return error is
the entry was already set to a different value or success otherwise
(also in the case it was already set to the same value). This would
allow Xen to cope with both scenarios above.

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.