
Re: [Xen-devel] [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes



On Thu, 2012-06-21 at 17:34 +0100, Dario Faggioli wrote:
> On Thu, 2012-06-21 at 12:40 +0100, Ian Campbell wrote:

> > > +/*
> > > + * This function tries to figure out if the host has a consistent number
> > > + * of cpus along all its NUMA nodes. In fact, if that is the case, we can
> > > + * calculate the minimum number of nodes needed for a domain by just
> > > + * dividing its total number of vcpus by this value computed here.
> > > + * However, we are not allowed to assume that all the nodes have the
> > > + * same number of cpus. Therefore, in case discrepancies among different
> > > + * nodes are found, this function just returns 0 and the caller needs
> > > + * to sort out things in some other way.
> > 
> > If the caller has to deal with this anyway why bother with this check
> > and/or the other algorithm?
> > 
> Mmm... I'm not sure I get what you mean here. Perhaps my comments about
> the whole thing were not so clear in the first place.
> 
> The (George's :-)) idea is, if we know each node has 8 CPUs, and our
> domain wants more than that, it does not make sense to start looking for
> placement solutions with just one node. Actually, if the domain wants
> fewer than 16 CPUs, starting looking from solutions with 2 nodes in them
> should be the right thing.
> 
> That being said, I think it's important we do not assume we won't ever
> find some architecture with different numbers of CPUs in different nodes,
> and that is what a zero return from that function means.
> 
> Which part did you think was not worthwhile?

The comment said "this function just returns 0 and the caller needs to
sort out things in some other way", so I was wondering: if the caller has
an algorithm which can cope with this anyway, why not always use it?

I later observed that when this function returns zero the caller just
falls back to something more basic, so I see how this makes sense now.
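For the record, the heuristic being discussed can be sketched roughly like
this (function names here are illustrative, not the actual libxl code):
with a uniform CPU count per node, the minimum number of nodes for a domain
is just a ceiling division; a non-uniform topology makes the helper return
0 and the caller starts the search from 1 node instead.

```c
/* Hypothetical sketch of the heuristic; not the real libxl functions.
 * Returns the per-node cpu count if it is the same on every node,
 * or 0 if the nodes are not uniform (the caller must then cope). */
static int cpus_per_node_if_uniform(const int *node_cpus, int nr_nodes)
{
    int i;

    for (i = 1; i < nr_nodes; i++)
        if (node_cpus[i] != node_cpus[0])
            return 0;               /* discrepancy found */
    return nr_nodes ? node_cpus[0] : 0;
}

/* Minimum number of nodes worth considering for a domain with the
 * given number of vcpus: a ceiling division when the topology is
 * uniform, a conservative 1 otherwise. */
static int min_nodes_for_domain(int vcpus, const int *node_cpus,
                                int nr_nodes)
{
    int per_node = cpus_per_node_if_uniform(node_cpus, nr_nodes);

    if (per_node == 0)
        return 1;                   /* no shortcut available */
    return (vcpus + per_node - 1) / per_node;   /* ceiling division */
}
```

So a 12-vcpu domain on a host with 8 CPUs per node starts looking at
2-node candidates straight away, as in the example above.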

> 
> > > +/* Get all the placement candidates satisfying some specific conditions */
> > > +int libxl__get_numa_candidates(libxl__gc *gc,
> > > +                               uint32_t min_free_memkb, int min_cpus,
> > > +                               int min_nodes, int max_nodes,
> > > +                               libxl__numa_candidate *cndts[], int *nr_cndts)
> > > +{
> > > +    libxl__numa_candidate *new_cndts = NULL;
> > > +    libxl_cputopology *tinfo = NULL;
> > > +    libxl_numainfo *ninfo = NULL;
> > > +    libxl_bitmap nodemap;
> > > +    int nr_nodes, nr_cpus;
> > > +    int array_size, rc;
> > > +
> > > +    /* Get platform info and prepare the map for testing the combinations */
> > > +    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
> > > +    if (ninfo == NULL)
> > > +        return ERROR_FAIL;
> > > +    /* If we don't have at least 2 nodes, it is useless to proceed */
> > 
> > You don't want to return whichever of those 2 nodes meets the
> > constraints?
> > 
> I actually was meaning something like "hey, there's only _1_ node, go
> for it!" :-P

I misread, sorry!

> 
> > ...
> > > +    *cndts = new_cndts;
> > > + out:
> > > +    libxl_bitmap_dispose(&nodemap);
> > 
> > nodemap might not have been initialised?
> >
> Mmm... Right. So I really think I need an init function. :-(

In general all libxl datastructures are supposed to have one, most of
them are just memset(.., 0)
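That convention can be illustrated with a stand-in type (this is not the
real libxl_bitmap, just a sketch of the pattern): init is a memset to
zero, which in turn makes dispose safe to call on a structure that was
never actually used.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for a libxl datastructure, following the
 * init/dispose convention; not the real libxl_bitmap definition. */
typedef struct {
    uint32_t size;
    uint8_t *map;
} demo_bitmap;

/* Like most libxl init functions: just zero everything. */
static void demo_bitmap_init(demo_bitmap *b)
{
    memset(b, 0, sizeof(*b));
}

/* Dispose is then safe even on a freshly-initialised structure,
 * because free(NULL) is defined to be a no-op. */
static void demo_bitmap_dispose(demo_bitmap *b)
{
    free(b->map);
    b->map = NULL;
    b->size = 0;
}
```

With this pattern the error path in libxl__get_numa_candidates() can call
dispose unconditionally, which is exactly what the "out:" label wants.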

> 
> > > +    libxl_cputopology_list_free(tinfo, nr_cpus);
> > > +    libxl_numainfo_list_free(ninfo, nr_nodes);
> > 
> > It looks like the nr_* may also not have been initialised, but maybe
> > list_free is safe in that case since the associated array must
> > necessarily be NULL?
> > 
> Well, probably, but as they both do a "for (i=0;i<nr_*;i++)" I think
> it's safer if I "=0" those two, is that ok?

I think it would be a good idea.

The alternative is to require that anything which returns a list also
sets nr to 0 on failure.
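To make the point concrete, here is a minimal sketch of the cleanup
pattern being agreed on (the names are made up for illustration):
initialising the counters to 0 makes the shared exit path safe even when
an early error happens before the corresponding array was ever filled in.

```c
#include <stdlib.h>

/* Stands in for libxl_cputopology_list_free() and friends: it iterates
 * nr times, so nr must be 0 whenever list is still NULL. */
static void demo_list_free(int *list, int nr)
{
    int i;

    for (i = 0; i < nr; i++)
        list[i] = 0;             /* per-element dispose would go here */
    free(list);                  /* free(NULL) is a no-op */
}

/* Sketch of the function shape under discussion: one exit path that
 * frees everything, reached from both the error and success cases. */
static int demo_get_candidates(int fail_early)
{
    int *tinfo = NULL;
    int nr_cpus = 0;             /* the "=0" agreed on above */
    int rc;

    if (fail_early) {
        rc = -1;                 /* e.g. libxl_get_numainfo() failed */
        goto out;
    }

    tinfo = calloc(4, sizeof(*tinfo));
    nr_cpus = 4;
    rc = 0;
 out:
    demo_list_free(tinfo, nr_cpus);   /* safe on both paths */
    return rc;
}
```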

> > > +    /* Bring the best candidate in front of the list --> candidates[0] */
> > > +    if (nr_candidates > 1)
> > > +        libxl__sort_numa_candidates(candidates, nr_candidates, numa_cmpf);
> > 
> > Is the start up cost of libxl__sort_numa_candidates significant enough
> > to make that if worthwhile?
> > 
> I'm not sure what you mean here, I'm afraid ... :-(

Is the cost of sorting a 1-element list so high that it is worth avoiding?
Since the sort itself would be trivial, the startup costs are what would
dominate.

> 
> > > +
> > > +    /*
> > > +     * Check if the domain has any CPU affinity. If not, try to build up one.
> > > +     * In case numa_place_domain() finds at least a suitable candidate, it will
> > > +     * affect info->cpumap accordingly; if it does not, it just leaves it
> > > +     * as it is. This means (unless some weird error manifests) the subsequent
> > > +     * call to libxl_set_vcpuaffinity_all() will do the actual placement,
> > > +     * whatever that turns out to be.
> > > +     */
> > > +    if (libxl_bitmap_is_full(&info->cpumap)) {
> > > +        int rc = numa_place_domain(gc, info);
> > > +        if (rc)
> > > +            return rc;
> > 
> > I'm not sure about this, if numa_place_domain fails for any reason would
> > we be not better off logging and continuing without placement? We
> > already do that explicitly in a couple of cases.
> > 
> Well, actually, if it "fails" in the sense that it finds no candidates or
> does not manage to place the domain, it returns 0, so we do right what
> you're suggesting. I'm sure this is stated in some comment but I guess I
> can repeat it here. If rc != 0, it means we stepped into some ERROR_FAIL or
> something, which I think has to be handled like this, hasn't it?

That makes sense.

> Perhaps I can also add some logging about "successfully failing", i.e.,
> not getting into any error but not being able to place the domain. If
> yes, that will happen _inside_ numa_place_domain() rather than here.

Logging in that case seems wise, since failing to place the domain will
have performance implications, I think.
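The return-code convention we just converged on can be summed up in a
small sketch (illustrative only, not the real numa_place_domain()):
return 0 both on success and on "successfully failing" (no candidate,
cpumap untouched, with a log message), and nonzero only on a real error.

```c
/* Outcome of the placement attempt; PLACED means cpumap was updated. */
enum { PLACED, NOT_PLACED };

/* Hypothetical sketch of the agreed semantics: a real internal error
 * (e.g. ERROR_FAIL) propagates as nonzero and aborts domain creation;
 * merely finding no candidate is logged and reported as success with
 * the cpumap left as it was. */
static int demo_numa_place_domain(int have_candidate, int hit_error,
                                  int *outcome)
{
    if (hit_error)
        return -1;               /* caller aborts domain creation */
    if (!have_candidate) {
        /* log here: no placement found, performance may suffer */
        *outcome = NOT_PLACED;   /* cpumap is left untouched */
        return 0;
    }
    *outcome = PLACED;
    return 0;
}
```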

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
