[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Wei Chen <Wei.Chen@xxxxxxx>
  • Date: Thu, 2 Sep 2021 02:30:08 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=lHGjjLyUt7AJP/HqcDCD570L6Zw4mKyCG8YQ/5AkkyY=; b=VfFjACFhRe1a8A65VkL0AZVQ7Su8gBGFyz70RYHLwdlp/T2cIXcaYGipyG6X8UQD1yTf5Cja9WHIZc7MjazHjkXfNl6W9PPJQelIxFG1Yhw95fsQLKuR/agtqqeIP+4I1+kxrTK0TFdFlsT3QFaJlsgnFjNix6L/EDQJgjUgnZfkyq5kF6NTLtlmgBZ5E467DHkLgRGKJj5Ox+um5atVxo8j67zXs/G6xhqynFkK8JUkoURIHlJD5jKHn3844+TsLsAES8DK5oI4pQxvD4t7Wcd80uy/r6gPuKN3UXTGV/ZI3Jva+A0gaet2YcbLkF4392/WxXLzxzKQltkt83C1bw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ENhPZCgrIZ3NH+4nUwGrgFS4mqKizgTFjS5GrmfodRmtQmp/lhVrFEfkW/MMxQDuhZxZsriS1yuLqC4c0T0PESlLwqOdqJ1TZOelZt8C4xjKzLnXjhKcLZv+wZMa23Xx3nVyuGYixuYGpuP1zJuxTeXseE4PhV/p0YaZgoIBCxWlHQTsCPtukPRP8D+/vwekSCRvNAYFaN43IbAtQxgRxTHz0JtwllqKYDxuM0ytr6PP8eRIRgRPb9T+JXm6Iay/3fkFAHkxiS0AgsiuixF/GlE48o3TPDxMeiQn78QqDRJGHCvFwJPtAEU/C2jX0wPD5QNnxhQqXsHxdz1QN5B2+A==
  • Authentication-results-original: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=arm.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Delivery-date: Thu, 02 Sep 2021 02:30:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHXjps+iLbrOQu8n06b5z+EtCRt06uM5puAgABnI4CAAPWOAIAA3gOAgABcYACAAKNX0A==
  • Thread-topic: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Sent: 2021年9月2日 0:22
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; xen-
> devel@xxxxxxxxxxxxxxxxxxxx; julien@xxxxxxx; Bertrand Marquis
> <Bertrand.Marquis@xxxxxxx>
> Subject: RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse
> device tree NUMA distance map
> 
> On Wed, 1 Sep 2021, Wei Chen wrote:
> > Hi Stefano,
> >
> > > -----Original Message-----
> > > From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of
> > > Stefano Stabellini
> > > Sent: 2021年9月1日 5:36
> > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; xen-
> > > devel@xxxxxxxxxxxxxxxxxxxx; julien@xxxxxxx; Bertrand Marquis
> > > <Bertrand.Marquis@xxxxxxx>
> > > Subject: RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to
> parse
> > > device tree NUMA distance map
> > >
> > > On Tue, 31 Aug 2021, Wei Chen wrote:
> > > > Hi Stefano,
> > > >
> > > > > -----Original Message-----
> > > > > From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > > > > Sent: 2021年8月31日 8:48
> > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; sstabellini@xxxxxxxxxx;
> > > julien@xxxxxxx;
> > > > > jbeulich@xxxxxxxx; Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
> > > > > Subject: Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to
> > > parse
> > > > > device tree NUMA distance map
> > > > >
> > > > > On Wed, 11 Aug 2021, Wei Chen wrote:
> > > > > > A NUMA aware device tree will provide a "distance-map" node to
> > > > > > describe distance between any two nodes. This patch introduce a
> > > > > > new helper to parse this distance map.
> > > > > >
> > > > > > Signed-off-by: Wei Chen <wei.chen@xxxxxxx>
> > > > > > ---
> > > > > >  xen/arch/arm/numa_device_tree.c | 67
> > > +++++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 67 insertions(+)
> > > > > >
> > > > > > diff --git a/xen/arch/arm/numa_device_tree.c
> > > > > b/xen/arch/arm/numa_device_tree.c
> > > > > > index bbe081dcd1..6e0d1d3d9f 100644
> > > > > > --- a/xen/arch/arm/numa_device_tree.c
> > > > > > +++ b/xen/arch/arm/numa_device_tree.c
> > > > > > @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const
> void
> > > *fdt,
> > > > > int node,
> > > > > >
> > > > > >      return 0;
> > > > > >  }
> > > > > > +
> > > > > > +/* Parse NUMA distance map v1 */
> > > > > > +int __init
> > > > > > +device_tree_parse_numa_distance_map_v1(const void *fdt, int
> node)
> > > > > > +{
> > > > > > +    const struct fdt_property *prop;
> > > > > > +    const __be32 *matrix;
> > > > > > +    int entry_count, len, i;
> > > > > > +
> > > > > > +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> > > > > > +
> > > > > > +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> > > > > > +    if ( !prop )
> > > > > > +    {
> > > > > > +        printk(XENLOG_WARNING
> > > > > > +               "NUMA: No distance-matrix property in distance-
> > > map\n");
> > > > > > +
> > > > > > +        return -EINVAL;
> > > > > > +    }
> > > > > > +
> > > > > > +    if ( len % sizeof(uint32_t) != 0 )
> > > > > > +    {
> > > > > > +        printk(XENLOG_WARNING
> > > > > > +               "distance-matrix in node is not a multiple of
> > > u32\n");
> > > > > > +        return -EINVAL;
> > > > > > +    }
> > > > > > +
> > > > > > +    entry_count = len / sizeof(uint32_t);
> > > > > > +    if ( entry_count <= 0 )
> > > > > > +    {
> > > > > > +        printk(XENLOG_WARNING "NUMA: Invalid distance-
> matrix\n");
> > > > > > +
> > > > > > +        return -EINVAL;
> > > > > > +    }
> > > > > > +
> > > > > > +    matrix = (const __be32 *)prop->data;
> > > > > > +    for ( i = 0; i + 2 < entry_count; i += 3 )
> > > > > > +    {
> > > > > > +        uint32_t from, to, distance;
> > > > > > +
> > > > > > +        from = dt_read_number(matrix, 1);
> > > > > > +        matrix++;
> > > > > > +        to = dt_read_number(matrix, 1);
> > > > > > +        matrix++;
> > > > > > +        distance = dt_read_number(matrix, 1);
> > > > > > +        matrix++;
> > > > > > +
> > > > > > +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > > > > > +            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > > > > > +        {
> > > > > > +            printk(XENLOG_WARNING
> > > > > > +                   "Invalid nodes' distance from node#%d to
> node#%d
> > > > > = %d\n",
> > > > > > +                   from, to, distance);
> > > > > > +            return -EINVAL;
> > > > > > +        }
> > > > > > +
> > > > > > +        printk(XENLOG_INFO "NUMA: distance from node#%d to
> node#%d
> > > > > = %d\n",
> > > > > > +               from, to, distance);
> > > > > > +        numa_set_distance(from, to, distance);
> > > > > > +
> > > > > > +        /* Set default distance of node B->A same as A->B */
> > > > > > +        if (to > from)
> > > > > > +             numa_set_distance(to, from, distance);
> > > > >
> > > > > I am a bit unsure about this last 2 lines: why calling
> > > numa_set_distance
> > > > > in the opposite direction only when to > from? Wouldn't it be OK
> to
> > > > > always do both:
> > > > >
> > > > > numa_set_distance(from, to, distance);
> > > > > numa_set_distance(to, from, distance);
> > > > >
> > > > > ?
> > > > >
> > > > I borrowed this code from Linux, but here is my understanding:
> > > >
> > > > First, I read some notes in
> Documentation/devicetree/bindings/numa.txt
> > > > 1. Each entry represents distance from first node to second node.
> > > > The distances are equal in either direction.
> > > > 2. distance-matrix should have entries in lexicographical ascending
> > > > order of nodes.
> > > >
> > > > Here is an example of distance-map node in DTB:
> > > > Sample#1, full list:
> > > >                 distance-map {
> > > >                          compatible = "numa-distance-map-v1";
> > > >                          distance-matrix = <0 0  10>,
> > > >                                            <0 1  20>,
> > > >                                            <0 2  40>,
> > > >                                            <0 3  20>,
> > > >                                            <1 0  20>,
> > > >                                            <1 1  10>,
> > > >                                            <1 2  20>,
> > > >                                            <1 3  40>,
> > > >                                            <2 0  40>,
> > > >                                            <2 1  20>,
> > > >                                            <2 2  10>,
> > > >                                            <2 3  20>,
> > > >                                            <3 0  20>,
> > > >                                            <3 1  40>,
> > > >                                            <3 2  20>,
> > > >                                            <3 3  10>;
> > > >                 };
> > > >
> > > > Call numa_set_distance when "to > from" will prevent Xen to call
> > > > numa_set_distance(0, 1, 20) again when it's setting distance for <1
> 0
> > > 20>.
> > > > But, numa_set_distance(1, 0, 20) will be call twice.
> > > >
> > > > Normally, distance-map node will be optimized in following sample#2,
> > > > all redundant entries are removed:
> > > > Sample#2, partial list:
> > > >                 distance-map {
> > > >                          compatible = "numa-distance-map-v1";
> > > >                          distance-matrix = <0 0  10>,
> > > >                                            <0 1  20>,
> > > >                                            <0 2  40>,
> > > >                                            <0 3  20>,
> > > >                                            <1 1  10>,
> > > >                                            <1 2  20>,
> > > >                                            <1 3  40>,
> > > >                                            <2 2  10>,
> > > >                                            <2 3  20>,
> > > >                                            <3 3  10>;
> > > >                 };
> > > >
> > > > There is not any "from > to" entry in the map. But using this
> partial
> > > map
> > > > still can set all distances for all pairs. And numa_set_distance(1,
> 0,
> > > 20)
> > > > will be only once.
> > >
> > > I see. I can't find in Documentation/devicetree/bindings/numa.txt
> where
> > > it says that "from > to" nodes can be omitted. If it is not written
> > > down, then somebody could easily optimize it the opposite way:
> > >
> > >                    distance-matrix = <0 0  10>,
> > >                                      <1 0  20>,
> > >                                      <2 0  40>,
> > >                                      <3 0  20>,
> > >                                      <1 1  10>,
> > >                                      <2 1  20>,
> > >                                      <3 1  40>,
> > >                                      <2 2  10>,
> > >                                      <3 2  20>,
> > >                                      <3 3  10>;
> > >
> >
> > Yes, you're right. Spec doesn't say opposite way is unallowed.
> >
> > > I think the code in Xen should be resilient and able to cope with a
> > > device tree like the one you wrote or the one I wrote. From a code
> > > perspective, it should be very easy to do. If nothing else it would
> make
> > > Xen more resilient against buggy firmware.
> > >
> > >
> >
> > I don't disagree with that.
> >
> > > > > But in any case, I have a different suggestion. The binding states
> > > that
> > > > > "distances are equal in either direction". Also it has an example
> > > where
> > > > > only one direction is expressed unfortunately (at the end of the
> > > > > document).
> > > > >
> > > >
> > > > Oh, I should see this comment first, then I will not post above
> > > > comment : )
> > > >
> > > > > So my suggestion is to parse it as follows:
> > > > >
> > > > > - call numa_set_distance just once from
> > > > >   device_tree_parse_numa_distance_map_v1
> > > > >
> > > > > - in numa_set_distance:
> > > > >     - set node_distance_map[from][to] = distance;
> > > > >     - check node_distance_map[to][from]
> > > > >           - if unset, node_distance_map[to][from] = distance;
> > > > >           - if already set to the same value, return success;
> > > > >           - if already set to a different value, return error;
> > > >
> > > > I don't really like this implementation. I want the behavior of
> > > > numa_set_distance just like the function name, do not include
> > > > implicit operations. Otherwise, except the user read this function
> > > > implementation before he use it, he probably doesn't know this
> > > > function has done so many things.
> > >
> > > You can leave numa_set_distance as-is without any implicit operations.
> > >
> > > In that case, just call numa_set_distance twice from numa_set_distance
> > > for both from/to and to/from. numa_set_distance could return error is
> >
> > I am OK for the first sentence. But...
> >
> > > the entry was already set to a different value or success otherwise
> > > (also in the case it was already set to the same value). This would
> >
> > ... I prefer not to check the previous value. Subsequent
> numa_set_distance
> > call will override previous calls. Keep numa_set_distance as simple as
> > it can. And when you pass new data to numa_set_distance, it doesn't
> > know whether the previous data was correct or the new data is correct.
> > Only caller may have known.
> 
> That might be OK but if not numa_set_distance then somebody else needs
> to check against overwriting previous values. That is to be able to spot
> bad device tree cases like:
> 
>   0 1 20
>   1 0 40


How about we check it still in NUMA distance parse function?
Before setting the numa_set_distance for one pair nodes (e.g. a -> b),
we can get its opposite way distance first.

distance_b_a = __node_distance(b, a); ==> get opposite way distance.
if (distance_b_a == 0) ==> opposite way distance has not been set
{
    numa_set_distance(a, b, 20); ==> set both
    numa_set_distance(b, a, 20)
} else {
    if (distance_b_a == 20) ==> opposite way distance has been set
       numa_set_distance(a, b, 20); ==> set this way only
    else ===> opposite way distance has been set, but is unmatched
       // What can we do here?
       Panic the system? or Just warning users? Or choose the bigger
       distance for both ways?
       
       And distance_b_a == NUMA_NO_DISTANCE would be a special case
       here.
}



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.