[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes



Dario Faggioli writes ("[PATCH 09 of 11] libxl, xl: enable automatic placement 
of guests on NUMA nodes"):
> If a domain does not have a VCPU affinity, try to pin it automatically
> to some PCPUs. This is done taking into account the NUMA characteristics
> of the host: we look for a combination of host's NUMA nodes that has enough
> free memoy for the new domain, and pin it to the VCPUs of those nodes.
> Smaller combinations are considered first, to avoid spreading the
> domain's memory among too many nodes.

Thanks for this.  Here are my comments:

> +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap)
> +{
> +    int i;
> +    nodemap->size = rand() % 16;
> +    nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map));
> +    libxl_for_each_node(i, *nodemap) {
> +        if (rand() % 2)
> +            libxl_nodemap_set(nodemap, i);
> +        else
> +            libxl_nodemap_reset(nodemap, i);
> +    }
> +}

For your random number generation, please use nrand48, with a seed in
the libxl ctx.  (This means you'll need to take out the ctx lock.)
And provide a way to set the seed.

> +/* Automatic NUMA placement */
> +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx,
> +                                        libxl_domain_build_info *b_info,
> +                                        int min_nodes, int *nr_cndts);
> +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx,
> +                                  int min_cpus, int max_nodes,
> +                                  libxl_numa_candidate *candidate);
> +void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr);

This interface documentation is deficient, I'm afraid.  Please explain
how these functions are supposed to be used.

> diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c
> --- a/tools/libxl/libxl_json.c
> +++ b/tools/libxl/libxl_json.c
> @@ -119,6 +119,26 @@ out:
>      return s;
>  }
> 
> +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand,
> +                                       libxl_nodemap *nodemap)
> +{
> +    yajl_gen_status s;
> +    int i;
> +
> +    s = yajl_gen_array_open(hand);
> +    if (s != yajl_gen_status_ok) goto out;
> +
> +    libxl_for_each_node(i, *nodemap) {
> +        if (libxl_nodemap_test(nodemap, i)) {
> +            s = yajl_gen_integer(hand, i);
> +            if (s != yajl_gen_status_ok) goto out;
> +        }
> +    }
> +    s = yajl_gen_array_close(hand);
> +out:
> +    return s;
> +}

This is a copy of _cpumap_gen_json.  Bad enough to have one of these,
let along two.

> +/*
> + * The following two uility functions compute the node maps
> + * coresponding to the [ n! / (k! * (n - k)!) ] combinations
> + * of @size nodes within a set that is @nr_nodes big, without
> + * repetition. The algorithm used for generating them uses
> + * a vector containing the indexes in the set of the elements
> + * of the i-eth combination to generate the (i+1)-eth, and
> + * ensures they come in lexicographic order.

Can you please avoid this `@' ?  We aren't using doxygen (and never
will I hope) and it's just noise.  Also I think the "..." here (and
later) are wrong since these aren't strings.  If you want to talk
about them as arrays and want something to indicate the grouping you
could use { }.

> +static int numa_node_set_next(int nr_nodes, int size, int *idxs,
> +                              libxl_nodemap *nodemap)

You should at least write     int idxs[]    and explain that the
caller is expected to provide an array of size `size' which is private
to the implementation of numa_node_set_...

> +{
> +    int i;
> +
> +    /* Check whether we just need to increase the rightmost index */
> +    if (idxs[size - 1] < nr_nodes - 1) {
> +        idxs[size - 1]++;
> +        goto build_nodemap;

Is there something wrong with `else' ?  Or maybe you mean `goto out' ?

But, I think in fact this special case is unnecessary, isn't it ?

> +    /* Find the rightmost element from where to start the increasing */
> +    for (i = size - 1; idxs[i] == nr_nodes - size + i; i--) {

Since if  idxs[size-1] == nr_nodes-1  this loop's first test of the
condition with  i == size-1  becomes   idxs[size-1]==nr_nodes-1
and we therefore execute this

> +    for (idxs[i]++, i += 1; i < size; i++)
> +        idxs[i] = idxs[i - 1] + 1;

with i==size-1 and thus we increment idxs[size-1] and quit the loop
right away.

Furthermore, that last loop is really rather confusing.  The function
would benefit from a comment describing the algorith.   `i += 1' is
un-idiomatic, too.


> +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx,
> +                                        libxl_domain_build_info *b_info,
> +                                        int min_nodes, int *nr_cndts)

I think it would be better if this function returned a libxl error
code.  That way in would be possible to distinguish the various error
cases with different error codes.  You should define new error codes
if you need them.

For the libxl internal subcalls, do this:
    rc = libxl_get_some_information(CTX, ...);
    if (rc) goto out;

> +    suitable_nodes_idx = malloc(sizeof(int) * nr_nodes);

Use GCNEW_ARRAY and ditch the error handling.  You will need to
GC_INIT and GC_FREE.  This will give you a gc and then you should
write `CTX' rather than `ctx' so that your code can easily be moved
into an inner function.

> +        /* Generate the combinations and check their cumulative free memory 
> */
> +        numa_node_set_init(nr_nodes, min_nodes, suitable_nodes_idx,
> +                           &suitable_nodes);
> +        do {
> +            nodes_memkb = 0;
> +            libxl_for_each_set_node(i, suitable_nodes)
> +                nodes_memkb += ninfo[i].free / 1024;

This should be a for loop.  If necessary, numa_node_set_* should be
changed so that they can easily be used in a for loop.

This would be acceptable, for example:

       for (rc = numa_node_set_init(nr_nodes, min_nodes, suitable_nodes_idx,
                                    &suitable_nodes);
            !rc;
            rc = numa_node_set_next(nr_nodes, min_nodes, suitable_nodes_idx,
                                    &suitable_nodes)) {

Perhaps numa_node_set_ should take a struct for all these arguments.
Then you could write:

        for (rc = numa_node_set_init(&numa_node_set_iter, nr_nodes, min_nodes);
             !rc;
             rc = numa_node_set_next(&numa_node_set_iter) {

Or even better if you changed the interface a bit:

        for (numa_node_set_init(gc, &numa_node_set_iter, nr_nodes, min_nodes);
             numa_node_set_get(&numa_node_set_iter, &suitable_nodes);
             numa_node_set_next(&numa_node_set_iter) {

which would allow you to make numa_node_set_iter entirely opaque and
move the allocation of the idx array into the numa_node_set_*
functions.

> +                cndts = realloc(cndts, sizeof(cndts[0]) * (*nr_cndts+1));
> +                if (cndts == NULL) {

Firstly, use libxl__zrealloc(0, ...) and ditch the error handling.
Secondly, this reallocs the array every time we add a node.  It would
be better to keep a separate note of the array size and reallocate in
bigger chunks.

> + out_nodemap_idx:
> +    free(suitable_nodes_idx);
> + out_nodemap:
> +    libxl_nodemap_dispose(&suitable_nodes);
> + out_numainfo:
> +    libxl_numainfo_list_free(ninfo, nr_nodes);
> + out:
> +    return cndts;
> +}

Please don't use this error handling style.  Instead, initialise all
the resource variables to 0 right at the top of the function, and
unconditionally free them.

For the output variable cndts, you can unconditionally free it at the
end if you do this on the success exit path:

> +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx,
> +                                  int min_cpus, int max_nodes,
> +                                  libxl_numa_candidate *candidate)
> +{
> +    int dom_nodes, nodes_cpus;
> +    libxl_cputopology *tinfo;
> +    libxl_numainfo *ninfo;
> +    int nr_nodes, nr_cpus;
> +    int i, rc;
> +
> +    tinfo = libxl_get_cpu_topology(ctx, &nr_cpus);
> +    if (tinfo == NULL) {
> +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_topologyinfo failed\n");
> +        rc = ERROR_NOMEM;

We aren't supposed to be returning ERROR_NOMEM any more because all
our memory allocation is now assumed to be infallible.  Shouldn't this
be ERROR_FAIL ?  (And in the next stanza.)

> +    if (max_nodes == -1 || max_nodes > nr_nodes)
> +        max_nodes = nr_nodes;

I'm afraid I can't really make much sense of this algorithm without
thinking very hard.  Maybe if the interface doc comment had explained
what it was supposed to do it would be obvious...

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -490,6 +490,122 @@ static void split_string_into_string_lis
...
> +/* How many PCPUs are there on each node? */
> +static int cpus_per_node(libxl_cputopology *tinfo, int nr_cpus)
> +{
> +    libxl_numainfo *ninfo;
> +    int nr_nodes, j, i;
> +    int cpus_nodes = 0;
> +
> +    ninfo = libxl_get_numainfo(ctx, &nr_nodes);
> +    if (ninfo == NULL)
> +        return 0;
> +
> +    /* This makes sense iff # of PCPUs is the same for all nodes */

And what if it isn't ?  Later I see:

> +    if (cpus_nodes != 0)
> +        dom_min_nodes = (dom_needs_cpus + cpus_nodes - 1) / cpus_nodes;
> +    else
> +        dom_min_nodes = 1;

which seems to suggest we'll pile the whole thing onto one pcpu.

> +/* Try to achieve "optimal" NUMA placement */
> +static int place_domain(libxl_domain_build_info *b_info)
> +{
...
 +    if (nr_pools > 1) {
> +        fprintf(stderr, "skip numa placement as cpupools are being used\n");
> +        err = 0;
> +        goto out_poolinfo;
> +    }

Perhaps in the case of cpupools the right answer is to consider only
those pcpus in the intended pool ?  Or is that likely to do more harm
than good, with people who are currently doing their numa placement
with cpupools ?

> +    tinfo = libxl_get_cpu_topology(ctx, &nr_cpus);
> +    if (tinfo == NULL) {
> +        fprintf(stderr, "libxl_get_topologyinfo failed.\n");
> +        err = ENOMEM;

libxl functions should return libxl error codes, not errno values.
You will probably need to introduce some.

> +    if (err == ERROR_FAIL) {
> +        /* Report back we didn't find a candidate with enough cpus */
> +        err = ESRCH;

I don't think we should ever turn ERROR_FAIL into something else.
ERROR_FAIL means `something went wrong in a way that shouldn't
happen'.


> +out_topologyinfo:
> +    libxl_cputopology_list_free(tinfo, nr_cpus);
> +out_poolinfo:
> +    for (i = 0; i < nr_pools; i++)
> +        libxl_cpupoolinfo_dispose(&pinfo[i]);
> +out:
> +    return err;
> +}

Again, please use an idempotent error handling style.


Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.