
Re: [Xen-devel] [PATCH v2 16/16] libxl: automatic NUMA placement affects soft affinity



On 13/11/13 19:13, Dario Faggioli wrote:
vCPU soft affinity and NUMA-aware scheduling do not have
to be related. However, soft affinity is how NUMA-aware
scheduling is actually implemented, and therefore, by default,
the results of automatic NUMA placement (at VM creation time)
are also used to set the soft affinity of all the vCPUs of
the domain.

Of course, this only happens if automatic NUMA placement is
enabled and actually takes place (for instance, if the user
does not specify any hard or soft affinity in the xl config
file).

This also takes care of the converse, i.e., automatic placement
is not triggered if the config file specifies either a hard (the
check for which was already there) or a soft (the check for which
is introduced by this commit) affinity.

It looks like with this patch you set *both* hard and soft affinities when doing auto-numa placement. Would it make more sense to change it to setting only the soft affinity, and leaving the hard affinity to "any"?

(My brain is running low, so forgive me if I've mis-read it...)
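
Something along these lines is what I mean; just a rough, untested
sketch, reusing the names and calls from the hunk quoted below (the
'placed' flag is mine, purely for illustration):

    bool placed = false; /* purely illustrative */

    if (libxl_defbool_val(info->numa_placement)) {
        /* automatic placement still requires that no affinity was given */
        if (!libxl_bitmap_is_full(&info->cpumap) ||
            !libxl_bitmap_is_full(&info->cpumap_soft)) {
            LOG(ERROR, "Can run NUMA placement only if no vcpu "
                       "(hard or soft) affinity is specified");
            return ERROR_INVAL;
        }

        rc = numa_place_domain(gc, domid, info);
        if (rc)
            return rc;

        /* placement result -> soft affinity only */
        libxl_nodemap_to_cpumap(ctx, &info->nodemap, &info->cpumap_soft);
        placed = true;
    }

    libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
    if (!placed)
        /* apply user provided pinning (cpus=), if any */
        libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus,
                                   &info->cpumap);
    /* hard affinity is never derived from the placement result, so after
     * automatic placement it simply stays "any" */
    libxl_set_vcpuaffinity_all_soft(ctx, domid, info->max_vcpus,
                                    &info->cpumap_soft);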

 -George


Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
---
  docs/man/xl.cfg.pod.5                |   21 +++++++++++----------
  docs/misc/xl-numa-placement.markdown |   16 ++++++++++++++--
  tools/libxl/libxl_dom.c              |   22 ++++++++++++++++++++--
  3 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 733c74e..d4a0a6f 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -150,16 +150,6 @@ here, and the soft affinity mask, provided via B<cpus\_soft=> (if any),
  is utilized to compute the domain node-affinity, for driving memory
  allocations.
-If we are on a NUMA machine (i.e., if the host has more than one NUMA
-node) and this option is not specified, libxl automatically tries to
-place the guest on the least possible number of nodes. That, however,
-will not affect vcpu pinning, so the guest will still be able to run on
-all the cpus. A heuristic approach is used for choosing the best node (or
-set of nodes), with the goals of maximizing performance for the guest
-and, at the same time, achieving efficient utilization of host cpus
-and memory. See F<docs/misc/xl-numa-placement.markdown> for more
-details.
-
  =item B<cpus_soft="CPU-LIST">
Exactly as B<cpus=>, but specifies soft affinity, rather than pinning
@@ -178,6 +168,17 @@ the intersection of the soft affinity mask, provided here, and the vcpu
  pinning, provided via B<cpus=> (if any), is utilized to compute the
  domain node-affinity, for driving memory allocations.
+If this option is not specified (and B<cpus=> is not specified either),
+libxl automatically tries to place the guest on the least possible
+number of nodes. A heuristic approach is used for choosing the best
+node (or set of nodes), with the goal of maximizing performance for
+the guest and, at the same time, achieving efficient utilization of
+host cpus and memory. In that case, the soft affinity of all the vcpus
+of the domain will be set to the pcpus belonging to the NUMA nodes
+chosen during placement.
+
+For more details, see F<docs/misc/xl-numa-placement.markdown>.
+
  =back
=head3 CPU Scheduling
diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
index b1ed361..f644758 100644
--- a/docs/misc/xl-numa-placement.markdown
+++ b/docs/misc/xl-numa-placement.markdown
@@ -126,10 +126,22 @@ or Xen won't be able to guarantee the locality for their memory accesses.
  That, of course, also mean the vCPUs of the domain will only be able to
  execute on those same pCPUs.
+Starting from 4.4, it is also possible to specify a "cpus\_soft=" option
+in the xl config file. This, independently of whether or not "cpus=" is
+specified too, affects the NUMA placement in a way very similar to what
+is described above. In fact, the hypervisor will build up the
+node-affinity of the VM based on it or, if both pinning (via "cpus=")
+and soft affinity (via "cpus\_soft=") are present, on their intersection.
+
+Besides that, "cpus\_soft=" also means, of course, that the vCPUs of the
+domain will prefer to execute on, among the pCPUs where they can run,
+those particular pCPUs.
+
+
  ### Placing the guest automatically ###
-If no "cpus=" option is specified in the config file, libxl tries
-to figure out on its own on which node(s) the domain could fit best.
+If neither "cpus=" nor "cpus\_soft=" is present in the config file, libxl
+tries to figure out on its own on which node(s) the domain could fit best.
  If it finds one (some), the domain's node affinity get set to there,
  and both memory allocations and NUMA aware scheduling (for the credit
  scheduler and starting from Xen 4.3) will comply with it. Starting from
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a1c16b0..d241efc 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -222,21 +222,39 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
       * some weird error manifests) the subsequent call to
       * libxl_domain_set_nodeaffinity() will do the actual placement,
       * whatever that turns out to be.
+     *
+     * As far as scheduling is concerned, we achieve NUMA-aware scheduling
+     * by having the results of placement affect the soft affinity of all
+     * the vcpus of the domain. Of course, we want that iff placement is
+     * enabled and actually happens, so we only change info->cpumap_soft to
+     * reflect the placement result if that is the case.
       */
      if (libxl_defbool_val(info->numa_placement)) {
-        if (!libxl_bitmap_is_full(&info->cpumap)) {
+        /* We require both hard and soft affinity not to be set */
+        if (!libxl_bitmap_is_full(&info->cpumap) ||
+            !libxl_bitmap_is_full(&info->cpumap_soft)) {
              LOG(ERROR, "Can run NUMA placement only if no vcpu "
-                       "affinity is specified");
+                       "(hard or soft) affinity is specified");
              return ERROR_INVAL;
          }
         rc = numa_place_domain(gc, domid, info);
          if (rc)
              return rc;
+
+        /*
+         * We change the soft affinity in domain_build_info here, of course
+         * after converting the result of placement from nodes to cpus. The
+         * following call to libxl_set_vcpuaffinity_all_soft() will do the
+         * actual updating of the domain's vcpus' soft affinity.
+         */
+        libxl_nodemap_to_cpumap(ctx, &info->nodemap, &info->cpumap_soft);
      }
     libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
     libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
+    libxl_set_vcpuaffinity_all_soft(ctx, domid, info->max_vcpus,
+                                    &info->cpumap_soft);
     xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
      xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
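
From the config file side, the behaviour documented above would then
look something like this; purely illustrative fragments with made-up
values, not taken from the patch:

    # 1) neither cpus= nor cpus_soft= given: automatic NUMA placement
    #    runs, and its result also becomes the vcpus' soft affinity
    vcpus  = 4
    memory = 2048

    # 2) a soft affinity given: no automatic placement; the vcpus prefer
    #    pcpus 0-3, but can still run anywhere
    cpus_soft = "0-3"

    # 3) a hard affinity (pinning) given: no automatic placement; the
    #    vcpus can only run on pcpus 0-3
    cpus = "0-3"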


