
Re: [Xen-devel] [PATCH v10 9/9] libxl: vnuma topology configuration parser and doc



Hello Dario

Thank you for such a thorough review.
And you are right, it will change slightly, as we agreed on a
different approach to declaring vnuma in libxl :)


On Thu, Sep 11, 2014 at 1:13 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> So, as far as I understood it, v11 did not include a toolstack part, so
> this version is what I should be looking at, is this correct?
>
> I'm also looking at the commits in the github repo (for v11), and the
> code seems to be the same as here.
>
> Please, do correct me if I'm wrong. Hopefully, the comments I'm leaving
> about this version will be useful anyway.
>
> Also, as we agreed upon changing the libxl data structures for this
> feature (i.e., the vNUMA related field names and types), the review will
> focus on the behavior of the xl parser (and on the structure of the
> parsing code, of course), and neglect that part.
>
> So, first of all, about the subject: 'libxl: vnuma...'. This patch looks
> more about xl than libxl (so I'd expect something like 'xl: vnuma...').
>
> On mer, 2014-09-03 at 00:24 -0400, Elena Ufimtseva wrote:
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -264,6 +264,83 @@ if the values of B<memory=> and B<maxmem=> differ.
>>  A "pre-ballooned" HVM guest needs a balloon driver, without a balloon driver
>>  it will crash.
>>
> In general, about the config file interface, what I think we want a
> user to be able to specify is:
>
>  1. number of virtual NUMA nodes
>     (default: 1)
>  2. amount of memory on each virtual NUMA node
>     (default: total memory / nr_nodes)
>  3. the 'distances' from each node to each other one
>     (default: something sane :-) )
>  4. which vcpus go on each virtual NUMA node
>     (default: distribute evenly)
>  5. virtual node to physical node mapping
>     (default: driven by the existing automatic placement logic, or
>     by pinning)
>
> Which is pretty much what this patch allows one to do, so, good! :-D
>
> The syntax used for each of these parameters is also mostly fine, IMO,
> with the only exception of the mapping between vCPUs and vNODEs (more on
> that below).
>
> About the name of the parameters, in this patch, we have:
>
>  1. "vnuma_nodes"
>  2. "vnuma_mem"
>  3. "vdistance"
>  4. "vnuma_cpumap"
>  5. "vnuma_nodemap"
>
> I have to admit, I don't especially like these, so here's my proposal.
> I'm of course open to other views here, I'm actually just tossing names
> there (with a brief rationale and explanation)... :-)
>
>  1. "vnodes": for vCPUs, we have "vcpus", I think something along the
>               same line would make sense;
>  2. "vnodes_memory" : same reasoning as above (BTW, we may want to go
>                       for something like "vnodes_maxmem", as that is
>                       probably what we're actually specifying, isn't it?
>                       I'm not sure how important it is to stress that,
>                       though);
>  3. "vnodes_distance[s]": or just "vnodes_dist[s]"
>  4. "vcpus_to_vnodes"
>  5. "vnodes_to_pnodes": not particularly proud of this, but can't think
>                         of anything better right now.
>
> Thoughts?

Yep, totally fine with these names.
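
Just to double check I read the proposal right, a config fragment using
the new names would look something like the below (the values are only
an illustration, and the exact syntax of the vcpu map is discussed
further down):

    vnodes = 2
    vnodes_memory = [2048, 2048]
    vnodes_distances = [10, 20]
    vcpus_to_vnodes = [0, 0, 1, 1]
    vnodes_to_pnodes = [2, 3]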

>
>> +=item B<vnuma_nodes=N>
>> +
>> +Number of vNUMA nodes the guest will be initialized with on boot.
>> +PV guest by default will have one vnuma node.
>> +
> It's not only PV guests that will have, by default, just one node...
> This is true for every guest (the reason being vNUMA is not even
> implemented for such other guest types!! :-) ).
>
> As it stands, it looks to me like other kinds of guests than PV can have
> a different number of NUMA nodes by default.
>
>> +=item B<vnuma_mem=[vmem1, vmem2, ...]>
>> +
>> +List of memory sizes for each node, defined in MBytes. Number of items 
>> listed must
>> +match nr_vnodes.
>>
> "must match the number of virtual NUMA nodes specified by
> B<vnuma_nodes=>"
>
>> If the sum of all vnode memories does not match the domain memory
>>
> ", specified by B<memory=> and/or B<maxmem=>,"
>
>> +or there are missing nodes, it will fail.
>> +If not specified, memory will be equally split between vnodes. Current 
>> minimum
>> +memory size for one node is limited by 32MB.
>> +
>> +Example: vnuma_mem=[1024, 1024, 2048, 2048]
>> +Total amount of memory in guest: 6GB
>> +
>
>> +=item B<vnuma_vcpumap=[node_nr, node_nr, ...]>
>> +
>> +Defines vcpu to vnode mapping as a list of integers. The position in the 
>> list
>> +is a vcpu number, and the value is the vnode number to which the vcpu will 
>> be
>> +assigned to.
>> +Current limitations:
>> +- vNUMA node must have at least one vcpu, otherwise default vcpu_to_vnode 
>> will be used.
>> +- Total number of vnodes cannot be bigger then number of vcpus.
>> +
>> +Example:
>> +Map of 4 vcpus to 2 vnodes:
>> +0,1 vcpu -> vnode0
>> +2,3 vcpu -> vnode1:
>> +
>> +vnuma_vcpumap = [0, 0, 1, 1]
>> + 4 vcpus here -  0  1  2  3
>> +
>>
> I wonder whether it would be better to turn this into something like
> the below:
>
> ["0,1,2,3", "4-7", "8,10,12,14", "9,11,13,15"]

Yep, I think it's good to have a more unified approach.
>
> With the following meaning: there are 4 nodes, and vCPUs 0,1,2,3 belong
> to node 0, vCPUs 4,5,6,7 belong to node 1, etc.
>
> It looks a lot easier to use to me. With the current syntax, the same
> configuration would be expressed like this:
>
> [0,0,0,0,1,1,1,1,2,3,2,3,2,3,2,3]
>
> Quite hard to get right when both writing and reading it, isn't it?
>
> Elena's example looks like this:
>
> ["0,1", "2,3"] (with my notation)
> [0, 0, 1, 1]   (with this patch notation)
>
> From an implementation point of view, the code for parsing strings like
> these (and also for parsing lists of strings like this) is already there
> in xl, so it shouldn't even be too big of a piece of work...
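
Right. Just to make sure I understand the expansion step, below is a
minimal standalone sketch of turning one such string (e.g. "0,2-3")
into entries of the vcpu-to-vnode array; it is only an illustration,
not the helper that already exists in xl:

    #include <stdlib.h>

    /*
     * Illustrative only: mark every vcpu listed in 'spec' (comma
     * separated values and ranges, e.g. "0,2-3") as belonging to
     * 'vnode' in vcpu_to_vnode[].  Returns 0 on success, -1 on a
     * malformed string or an out of range vcpu.
     */
    static int assign_vcpus_to_vnode(const char *spec, unsigned int vnode,
                                     unsigned int *vcpu_to_vnode,
                                     unsigned int max_vcpus)
    {
        const char *p = spec;

        while (*p) {
            char *ep;
            unsigned long a, b;

            a = strtoul(p, &ep, 10);
            if (ep == p)
                return -1;
            b = a;
            if (*ep == '-') {              /* a range, e.g. "2-3" */
                p = ep + 1;
                b = strtoul(p, &ep, 10);
                if (ep == p || b < a)
                    return -1;
            }
            if (b >= max_vcpus)
                return -1;
            for (; a <= b; a++)
                vcpu_to_vnode[a] = vnode;
            if (*ep == ',')
                ep++;
            else if (*ep != '\0')
                return -1;
            p = ep;
        }
        return 0;
    }

Parsing the whole list would then be one call per list item, with the
item index used as the vnode number.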
>
>> +=item B<vnuma_vnodemap=[p1, p2, ..., pn]>
>> +
>> +List of physical node numbers, position in the list represents vnode number.
>> +Used for manual placement of vnuma nodes to physical NUMA nodes.
>> +Will not be used if automatic numa placement is active.
>> +
>> +Example:
>> +assume NUMA machine with 4 physical nodes. Placing vnuma node 0 to pnode 2,
>> +vnuma node 1 to pnode 3:
>> +vnode0 -> pnode2
>> +vnode1 -> pnode3
>> +
>> +vnuma_vnodemap=[2, 3]
>> +first vnode will be placed on node 2, second on node 3.
>> +
>>
> I'm fine with this.
>
>> +=item B<vnuma_autoplacement=[0|1]>
>> +
> To be killed. :-)
>
>>  =head3 Event Actions
>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index 409a795..1af2250 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>> @@ -40,6 +40,7 @@
>>  #include "libxl_json.h"
>>  #include "libxlutil.h"
>>  #include "xl.h"
>> +#include "libxl_vnuma.h"
>>
>>  /* For calls which return an errno on failure */
>>  #define CHK_ERRNOVAL( call ) ({                                         \
>> @@ -797,6 +798,432 @@ static void parse_vcpu_affinity(libxl_domain_build_info *b_info,
>>      }
>>  }
>>
>> +static unsigned int get_list_item_uint(XLU_ConfigList *list, unsigned int i)
>> +{
>> +    const char *buf;
>> +    char *ep;
>> +    unsigned long ul;
>> +    int rc = -EINVAL;
>> +
>> +    buf = xlu_cfg_get_listitem(list, i);
>> +    if (!buf)
>> +        return rc;
>> +    ul = strtoul(buf, &ep, 10);
>> +    if (ep == buf)
>> +        return rc;
>> +    if (ul >= UINT16_MAX)
>> +        return rc;
>> +    return (unsigned int)ul;
>> +}
>> +
> I see what you want to achieve here, but I don't particularly like
> having this as a helper here. If we want something this generic,
> perhaps put this in xlu? Although, I'm not sure about this either...
>
> It's probably not such a great deal, and it's a tools maintainer's call,
> I think.
>
>> +static void vdistance_set(unsigned int *vdistance,
>> +                                unsigned int nr_vnodes,
>> +                                unsigned int samenode,
>> +                                unsigned int othernode)
>> +{
>> +    unsigned int idx, slot;
>> +    for (idx = 0; idx < nr_vnodes; idx++)
>> +        for (slot = 0; slot < nr_vnodes; slot++)
>> +            *(vdistance + slot * nr_vnodes + idx) =
>> +                idx == slot ? samenode : othernode;
>> +}
>> +
>> +static void vcputovnode_default(unsigned int *cpu_to_node,
>> +                                unsigned int nr_vnodes,
>> +                                unsigned int max_vcpus)
>> +{
>> +    unsigned int cpu;
>> +    for (cpu = 0; cpu < max_vcpus; cpu++)
>> +        cpu_to_node[cpu] = cpu % nr_vnodes;
>> +}
>> +
>> +/* Split domain memory between vNUMA nodes equally. */
>> +static int split_vnumamem(libxl_domain_build_info *b_info)
>> +{
>> +    unsigned long long vnodemem = 0;
>> +    unsigned long n;
>> +    unsigned int i;
>> +
>> +    if (b_info->vnodes == 0)
>> +        return -1;
>> +
> This is never called (and that's ok!) when b_info->vnodes==0, so I
> find the check more confusing than helpful.
>
> An assert(), maybe?
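
Sure, an assert() reads better. Something like the below at the top of
split_vnumamem(), with <assert.h> included (just a sketch):

    assert(b_info->vnodes != 0);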
>
>> +    vnodemem = (b_info->max_memkb >> 10) / b_info->vnodes;
>> +    if (vnodemem < MIN_VNODE_SIZE)
>> +        return -1;
>> +    /* reminder in MBytes. */
>> +    n = (b_info->max_memkb >> 10) % b_info->vnodes;
>> +    /* get final sizes in MBytes. */
>> +    for (i = 0; i < (b_info->vnodes - 1); i++)
>> +        b_info->vnuma_mem[i] = vnodemem;
>> +    /* add the reminder to the last node. */
>> +    b_info->vnuma_mem[i] = vnodemem + n;
>> +    return 0;
>> +}
>> +
>> +static void vnuma_vnodemap_default(unsigned int *vnuma_vnodemap,
>> +                                   unsigned int nr_vnodes)
>> +{
>> +    unsigned int i;
>> +    for (i = 0; i < nr_vnodes; i++)
>> +        vnuma_vnodemap[i] = VNUMA_NO_NODE;
>> +}
>> +
>> +/*
>> + * init vNUMA to "zero config" with one node and all other
>> + * topology parameters set to default.
>> + */
>> +static int vnuma_default_config(libxl_domain_build_info *b_info)
>> +{
>> +    b_info->vnodes = 1;
>> +    /* all memory goes to this one vnode, as well as vcpus. */
>> +    if (!(b_info->vnuma_mem = (uint64_t *)calloc(b_info->vnodes,
>> +                                sizeof(*b_info->vnuma_mem))))
>> +        goto bad_vnumazerocfg;
>> +
>> +    if (!(b_info->vnuma_vcpumap = (unsigned int *)calloc(b_info->max_vcpus,
>> +                                sizeof(*b_info->vnuma_vcpumap))))
>> +        goto bad_vnumazerocfg;
>> +
>> +    if (!(b_info->vdistance = (unsigned int *)calloc(b_info->vnodes *
>> +                                b_info->vnodes, 
>> sizeof(*b_info->vdistance))))
>> +        goto bad_vnumazerocfg;
>> +
>> +    if (!(b_info->vnuma_vnodemap = (unsigned int *)calloc(b_info->vnodes,
>> +                                sizeof(*b_info->vnuma_vnodemap))))
>> +        goto bad_vnumazerocfg;
>> +
>> +    b_info->vnuma_mem[0] = b_info->max_memkb >> 10;
>> +
>> +    /* all vcpus assigned to this vnode. */
>> +    vcputovnode_default(b_info->vnuma_vcpumap, b_info->vnodes,
>> +                        b_info->max_vcpus);
>> +
>> +    /* default vdistance is 10. */
>> +    vdistance_set(b_info->vdistance, b_info->vnodes, 10, 10);
>> +
>> +    /* VNUMA_NO_NODE for vnode_to_pnode. */
>> +    vnuma_vnodemap_default(b_info->vnuma_vnodemap, b_info->vnodes);
>> +
>> +    /*
>> +     * will be placed to some physical nodes defined by automatic
>> +     * numa placement or VNUMA_NO_NODE will not request exact node.
>> +     */
>> +    libxl_defbool_set(&b_info->vnuma_autoplacement, true);
>> +    return 0;
>> +
>> + bad_vnumazerocfg:
>> +    return -1;
>> +}
>> +
> Can't this, and with it most of the above helpers, be implemented in
> such a way that it fits inside libxl__domain_build_info_setdefault()
> (and, if yes, moved to a libxl related patch)?
>
> I'm not 100% sure, but it really looks similar to the kind of things we
> usually do there rather than in xl.
>
>> +static void free_vnuma_info(libxl_domain_build_info *b_info)
>> +{
>> +    free(b_info->vnuma_mem);
>> +    free(b_info->vdistance);
>> +    free(b_info->vnuma_vcpumap);
>> +    free(b_info->vnuma_vnodemap);
>> +
>> +    b_info->vnuma_mem = NULL;
>> +    b_info->vdistance = NULL;
>> +    b_info->vnuma_vcpumap = NULL;
>> +    b_info->vnuma_vnodemap = NULL;
>> +
>> +    b_info->vnodes = 0;
>> +    b_info->vmemranges = 0;
>> +}
>> +
> This appears to be called only once, right before an "exit(1)", in xl.
> Do we need it? For sure we don't need the =NULL and =0 thing, do we?
>
>> +static int parse_vnuma_mem(XLU_Config *config,
>> +                            libxl_domain_build_info **b_info)
>> +{
>> +    libxl_domain_build_info *dst;
>>
> Why the **b_info and the *dst thing?
>
>> +    XLU_ConfigList *vnumamemcfg;
>> +    int nr_vnuma_regions, i;
>> +    unsigned long long vnuma_memparsed = 0;
>> +    unsigned long ul;
>> +    const char *buf;
>> +    char *ep;
>> +
>> +    dst = *b_info;
>> +    if (!xlu_cfg_get_list(config, "vnuma_mem",
>> +                          &vnumamemcfg, &nr_vnuma_regions, 0)) {
>> +
>> +        if (nr_vnuma_regions != dst->vnodes) {
>> +            fprintf(stderr, "Number of numa regions (vnumamem = %d) is \
>> +                    incorrect (should be %d).\n", nr_vnuma_regions,
>> +                    dst->vnodes);
>>
> I find the term 'regions' rather confusing, both in the variable name
> and in the error message. I'd go for something like nr_vnuma_mem, for
> the variable. For the message, maybe something like this:
>
>  "Specified the amount of memory for %d virtual nodes,\n"
>  "while number of virtual node is %d!\n"
>
> But maybe it's just me. :-)
>
> Oh, the "(vnumamem = %d)" part, at least, makes very little sense to me, as
> I don't think the word 'vnumamem', written like that and associated with
> an integer, makes much sense to a user. At least that should change, I
> think.
>
>> +            goto bad_vnuma_mem;
>> +        }
>> +
>> +        dst->vnuma_mem = calloc(dst->vnodes,
>> +                                 sizeof(*dst->vnuma_mem));
>>
> If you go for the libxl__build_info_setdefault() route I suggested
> above, this may need to change a bit, as you'll probably find
> b_info->vnuma_mem allocated already.
>
> Anyway, I don't think it would be a big deal to, in that case, either
> use realloc() here, or use NULL as default value (and hence set it to
> that in libxl__build_info_setdefault() ).
>
>> +        if (dst->vnuma_mem == NULL) {
>> +            fprintf(stderr, "Unable to allocate memory for vnuma 
>> ranges.\n");
>>
> Ditto, about 'ranges'.
>
>> +            goto bad_vnuma_mem;
>> +        }
>> +
>> +        /*
>> +         * Will parse only nr_vnodes times, even if we have more/less 
>> regions.
>>
> How can it be that you have fewer or more entries than b_info->vnodes?
>
>> +         * Take care of it later if less or discard if too many regions.
>>
> You took care of that above already...
>
>> +         */
>> +        for (i = 0; i < dst->vnodes; i++) {
>> +            buf = xlu_cfg_get_listitem(vnumamemcfg, i);
>> +            if (!buf) {
>> +                fprintf(stderr,
>> +                        "xl: Unable to get element %d in vnuma memory 
>> list.\n", i);
>> +                goto bad_vnuma_mem;
>>
> The common pattern in xl seems to be to exit() right away, if this
> happens (whether to exit with 1 or -1, that seems a bit inconsistent to
> me, at a quick glance).
>
>> +            }
>> +
>> +            ul = strtoul(buf, &ep, 10);
>> +            if (ep == buf) {
>> +                fprintf(stderr, "xl: Invalid argument parsing vnumamem: 
>> %s.\n", buf);
>> +                goto bad_vnuma_mem;
>>
> And the same (i.e., just exit() ) applies here... And pretty much
> everywhere, actually! :-)
>
>> +            }
>> +
>> +            /* 32Mb is a min size for a node, taken from Linux */
>> +            if (ul >= UINT32_MAX || ul < MIN_VNODE_SIZE) {
>>
> What's the idea behind UINT32_MAX here?
>
> I guess what I mean is, if there is, for any reason, a maximum virtual
> node size, then it may be useful to #define it, e.g., MAX_VNODE_SIZE
> (just as you're doing for MIN_VNODE_SIZE). If there is not, then we can
> just live with what the type can handle, and it will be lower layers
> that will return appropriate errors if values are wrong/unrealistic.
>
> Like this, you're saying that such maximum virtual node size is 4096 TB
> (if I'm not doing the math wrong), which I don't think makes much
> sense... :-P
>
>> +                fprintf(stderr, "xl: vnuma memory %lu is not within %u - %u 
>> range.\n",
>> +                        ul, MIN_VNODE_SIZE, UINT32_MAX);
>>
> "xl: memory for virtual node %lu is outside of [%u, %u] MB.\n"
>
>> +                goto bad_vnuma_mem;
>> +            }
>> +
>> +            /* memory in MBytes */
>> +            dst->vnuma_mem[i] = ul;
>> +        }
>> +
>> +        /* Total memory for vNUMA parsed to verify */
>> +        for (i = 0; i < nr_vnuma_regions; i++)
>> +            vnuma_memparsed = vnuma_memparsed + (dst->vnuma_mem[i]);
>> +
> Not a big deal, but can't you accumulate while doing the parsing?
>
>> +        /* Amount of memory for vnodes same as total? */
>> +        if ((vnuma_memparsed << 10) != (dst->max_memkb)) {
>> +            fprintf(stderr, "xl: vnuma memory is not the same as domain \
>> +                    memory size.\n");
>> +            goto bad_vnuma_mem;
>> +        }
>> +    } else {
>> +        dst->vnuma_mem = calloc(dst->vnodes,
>> +                                      sizeof(*dst->vnuma_mem));
>> +        if (dst->vnuma_mem == NULL) {
>> +            fprintf(stderr, "Unable to allocate memory for vnuma 
>> ranges.\n");
>> +            goto bad_vnuma_mem;
>> +        }
>> +
> The allocation is the same as above. It can live outside of the if,
> can't it? Especially considering that, as said before, you can just
> exit() on failure.
>
>> +        fprintf(stderr, "WARNING: vNUMA memory ranges were not 
>> specified.\n");
>> +        fprintf(stderr, "Using default equal vnode memory size %lu Kbytes \
>> +                to cover %lu Kbytes.\n",
>> +                dst->max_memkb / dst->vnodes, dst->max_memkb);
>> +
> Does this deserve a warning? In the man page, we say it's legal not to
> specify this, so I may well have read that, and may be doing this on
> purpose...
>
>> +        if (split_vnumamem(dst) < 0) {
>> +            fprintf(stderr, "Could not split vnuma memory into equal 
>> chunks.\n");
>> +            goto bad_vnuma_mem;
>> +        }
>> +    }
>> +    return 0;
>> +
>> + bad_vnuma_mem:
>> +    return -1;
>> +}
>> +
>> +static int parse_vnuma_distance(XLU_Config *config,
>> +                                libxl_domain_build_info **b_info)
>> +{
>> +    libxl_domain_build_info *dst;
>>
> Same as above.
>
>> +    XLU_ConfigList *vdistancecfg;
>> +    int nr_vdist;
>> +
>> +    dst = *b_info;
>> +    dst->vdistance = calloc(dst->vnodes * dst->vnodes,
>> +                               sizeof(*dst->vdistance));
>> +    if (dst->vdistance == NULL)
>> +        goto bad_distance;
>> +
>> +    if (!xlu_cfg_get_list(config, "vdistance", &vdistancecfg, &nr_vdist, 
>> 0)) {
>> +        int d1, d2, i;
>> +        /*
>> +         * First value is the same node distance, the second as the
>> +         * rest of distances. The following is required right now to
>>
> I'm not a native English speaker, but 'The latter' sounds more correct
> than 'The following' (if that is what you mean; if it's not, then I
> didn't understand what you mean! :-) ).
>
>> +         * avoid non-symmetrical distance table as it may break latest 
>> kernel.
>> +         * TODO: Better way to analyze extended distance table, possibly
>> +         * OS specific.
>> +         */
>> +
> Ok. So, AFAIUI, for now, you want the list to be two elements long, is
> that right? If yes, should we check that nr_vdist==2 ?
>
> What happens if I have b_info->vnodes > 1, and I pass a list only one
> element long ?
>
>> +        for (i = 0; i < nr_vdist; i++) {
>> +            d1 = get_list_item_uint(vdistancecfg, i);
>> +        }
>> +
> Mmm... What's this?
>
>> +        d1 = get_list_item_uint(vdistancecfg, 0);
>> +        if (dst->vnodes > 1)
>> +           d2 = get_list_item_uint(vdistancecfg, 1);
>> +        else
>> +           d2 = d1;
>> +
>> +        if (d1 >= 0 && d2 >= 0) {
>> +            if (d1 < d2)
>> +                fprintf(stderr, "WARNING: vnuma distance d1 < d2, %u < 
>> %u\n", d1, d2);
>>
> I agree this looks weird, and deserves a warning.
>
> I'd mention why that is the case, in the warning message, i.e., because
> the distance from a node to _itself_ is being set to a bigger value than
> the distance between two different nodes.
>
>> +            vdistance_set(dst->vdistance, dst->vnodes, d1, d2);
>> +        } else {
>> +            fprintf(stderr, "WARNING: vnuma distance values are 
>> incorrect.\n");
>> +            goto bad_distance;
>>
> After printing this "WARNING", you goto bad_distance, which means this
> returns -1, which means we then exit(1) below.
>
> So, this really looks like an error, rather than a warning to me.
>
>> +        }
>> +    } else {
>> +        fprintf(stderr, "Could not parse vnuma distances.\n");
>> +        vdistance_set(dst->vdistance, dst->vnodes, 10, 20);
>> +    }
>> +    return 0;
>> +
>> + bad_distance:
>> +    return -1;
>>
> As said before, xl often just exit on failures like these.
>
>> +}
>> +
>> +static int parse_vnuma_vcpumap(XLU_Config *config,
>> +                                libxl_domain_build_info **b_info)
>> +{
>> +    libxl_domain_build_info *dst;
>>
> ...
>
>> +    XLU_ConfigList *vcpumap;
>> +    int nr_vcpumap, i;
>> +
>> +    dst = *b_info;
>> +    dst->vnuma_vcpumap = (unsigned int *)calloc(dst->max_vcpus,
>> +                                     sizeof(*dst->vnuma_vcpumap));
>>
> You don't need the cast.
>
>> +    if (dst->vnuma_vcpumap == NULL)
>> +        goto bad_vcpumap;
>> +
> exit(1);
>
>> +    if (!xlu_cfg_get_list(config, "vnuma_vcpumap",
>> +                          &vcpumap, &nr_vcpumap, 0)) {
>> +        if (nr_vcpumap == dst->max_vcpus) {
>>
>     if (nr_vcpumap != b_info->max_vcpus) {
>         fprintf(stderr, "xl: mapping of vCPUs to vNODEs specified for "
>                         "%d vCPUs, while the domain has %d\n",
>                 nr_vcpumap, b_info->max_vcpus);
>         exit(1);
>     }
>
> and then continue with one less level of indentation. :-)
>
>> +            unsigned int  vnode, vcpumask = 0, vmask;
>> +
>> +            vmask = ~(~0 << nr_vcpumap);
>> +            for (i = 0; i < nr_vcpumap; i++) {
>> +                vnode = get_list_item_uint(vcpumap, i);
>> +                if (vnode >= 0 && vnode < dst->vnodes) {
>> +                    vcpumask |= (1 << i);
>> +                    dst->vnuma_vcpumap[i] = vnode;
>> +                }
>> +            }
>> +
>> +            /* Did it covered all vnodes in the vcpu mask? */
>> +            if ( !(((vmask & vcpumask) + 1) == (1 << nr_vcpumap)) ) {
>> +                fprintf(stderr, "WARNING: Not all vnodes were covered \
>> +                        in numa_cpumask.\n");
>> +                goto bad_vcpumap;
>> +            }
>>
> As we settled on using libxl_bitmap for this, I expect this to change
> quite a bit (e.g., we'd have to build a bitmap per each node, the
> libxl_bitmap manipulating functions should be used, etc.).
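
Yes, I will switch to that. For my own understanding, a rough sketch of
building one bitmap per vnode out of the current vcpu-to-vnode array
could look like the below, using the libxl_bitmap helpers from
libxl_utils.h (the per-node bitmap field in build_info is still to be
defined, so the caller-provided array here is just an assumption):

    #include <libxl.h>
    #include <libxl_utils.h>

    /*
     * Sketch only: fill one libxl_bitmap per virtual node with the
     * vcpus assigned to it.  'vnode_vcpus' is a caller provided array
     * of nr_vnodes bitmaps (not an existing build_info field).
     */
    static int build_vnode_vcpu_maps(libxl_ctx *ctx,
                                     libxl_bitmap *vnode_vcpus,
                                     unsigned int nr_vnodes,
                                     const unsigned int *vcpu_to_vnode,
                                     unsigned int max_vcpus)
    {
        unsigned int i;

        for (i = 0; i < nr_vnodes; i++)
            if (libxl_bitmap_alloc(ctx, &vnode_vcpus[i], max_vcpus))
                return -1;

        for (i = 0; i < max_vcpus; i++)
            libxl_bitmap_set(&vnode_vcpus[vcpu_to_vnode[i]], i);

        /* every vnode needs at least one vcpu */
        for (i = 0; i < nr_vnodes; i++)
            if (libxl_bitmap_is_empty(&vnode_vcpus[i]))
                return -1;

        return 0;
    }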
>
>> +        } else {
>> +            fprintf(stderr, "WARNING:  Bad vnuma_vcpumap.\n");
>> +            goto bad_vcpumap;
>> +        }
>> +    }
>> +    else
>> +        vcputovnode_default(dst->vnuma_vcpumap,
>> +                            dst->vnodes,
>> +                            dst->max_vcpus);
>> +    return 0;
>> +
>> + bad_vcpumap:
>> +    return -1;
>> +}
>> +
>> +static int parse_vnuma_vnodemap(XLU_Config *config,
>> +                                libxl_domain_build_info **b_info)
>> +{
>> +    libxl_domain_build_info *dst;
> No need?
>
>> +    XLU_ConfigList *vnodemap;
>> +    int nr_vnodemap, i;
>> +
>> +    dst = *b_info;
>> +
>> +    /* There is mapping to NUMA physical nodes? */
>> +    dst->vnuma_vnodemap = (unsigned int *)calloc(dst->vnodes,
>> +                           sizeof(*dst->vnuma_vnodemap));
>>
> no cast.
>
>> +    if (dst->vnuma_vnodemap == NULL)
>> +        goto bad_vnodemap;
>> +
>> +    if (!xlu_cfg_get_list(config, "vnuma_vnodemap",
>> +                          &vnodemap, &nr_vnodemap, 0)) {
>> +        /*
>> +         * If not specified or incorrect, will be defined
>> +         * later based on the machine architecture, configuration
>> +         * and memory availble when creating domain.
>> +         */
>> +        libxl_defbool_set(&dst->vnuma_autoplacement, false);
>>
> As agreed in the other part of this thread, there is already an
> automatic placement switch to act on, in this case.
>
> Note that the way that switch is being used right now in libxl code may
> need to change a bit, but that is, nevertheless, the right thing to do.
> So have a look at that, please. :-)
>
>> +        if (nr_vnodemap == dst->vnodes) {
>> +            unsigned int vnodemask = 0, pnode, smask;
>> +            smask = ~(~0 << dst->vnodes);
>> +            for (i = 0; i < dst->vnodes; i++) {
>> +                pnode = get_list_item_uint(vnodemap, i);
>> +                if (pnode >= 0) {
>> +                    vnodemask |= (1 << i);
>> +                    dst->vnuma_vnodemap[i] = pnode;
>> +                }
>> +            }
>> +
>> +            /* Did it covered all vnodes in the mask? */
>> +            if ( !(((vnodemask & smask) + 1) == (1 << nr_vnodemap)) ) {
>> +                fprintf(stderr, "WARNING: Not all vnodes were covered \
>> +                        vnuma_vnodemap.\n");
>> +                fprintf(stderr, "Automatic placement will be used for 
>> vnodes.\n");
>> +                libxl_defbool_set(&dst->vnuma_autoplacement, true);
>> +                vnuma_vnodemap_default(dst->vnuma_vnodemap, dst->vnodes);
>> +            }
>> +        }
>> +        else {
>> +            fprintf(stderr, "WARNING: Incorrect vnuma_vnodemap.\n");
>> +            fprintf(stderr, "Automatic placement will be used for 
>> vnodes.\n");
>> +            libxl_defbool_set(&dst->vnuma_autoplacement, true);
>> +            vnuma_vnodemap_default(dst->vnuma_vnodemap, dst->vnodes);
>> +        }
>> +    }
>>
> This is also about to change quite substantially, so I'm avoiding
> commenting on it...
>
>> +static void parse_vnuma_config(XLU_Config *config,
>> +                               libxl_domain_build_info *b_info)
>> +{
>> +    long l;
>> +
>> +    if (!xlu_cfg_get_long (config, "vnodes", &l, 0)) {
>> +        if (l > MAX_VNUMA_NODES) {
>> +            fprintf(stderr, "Too many vnuma nodes, max %d is allowed.\n",
>> +                    MAX_VNUMA_NODES);
>> +            goto bad_vnuma_config;
>> +        }
>> +        b_info->vnodes = l;
>> +
>> +        if (!xlu_cfg_get_defbool(config, "vnuma_autoplacement",
>> +                    &b_info->vnuma_autoplacement, 0))
>> +            libxl_defbool_set(&b_info->vnuma_autoplacement, false);
>> +
> "vnuma_autoplacement" is, AFAIUI, going away, so no comments. ;-P
>
>
> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

The majority of the comments make sense to me.
Now I wanted to ask about the memory ranges :)
I am not sure, at this point, whether I should have them parsed in the
config file (for the case where a vNUMA node has multiple ranges), and,
as Ian has mentioned, I will be moving these ranges away from the libxl
build info.
I guess for now it will be better to leave these ranges out of the scope
of the parsing code.
What do you think?

Thank you!

-- 
Elena



 

