
[Xen-changelog] [xen master] libxl: automatic NUMA placement affects soft affinity



commit 705fad1227a3313f05e0f64da46d19a5c52dacde
Author:     Dario Faggioli <dario.faggioli@xxxxxxxxxx>
AuthorDate: Tue Jul 29 18:07:09 2014 +0200
Commit:     Ian Campbell <ian.campbell@xxxxxxxxxx>
CommitDate: Wed Jul 30 12:45:25 2014 +0100

    libxl: automatic NUMA placement affects soft affinity
    
    vCPU soft affinity and NUMA-aware scheduling do not have
    to be related. However, soft affinity is how NUMA-aware
    scheduling is actually implemented, and therefore, by default,
    the results of automatic NUMA placement (at VM creation time)
    are also used to set the soft affinity of all the vCPUs of
    the domain.
    
    Of course, this only happens if automatic NUMA placement is
    enabled and actually takes place (for instance, if the user
    does not specify any hard or soft affinity in the xl config
    file).
    
    This also takes care of the converse, i.e., automatic placement
    is not triggered if the config file specifies either a hard
    affinity (the check for which was already there) or a soft
    affinity (the check for which is introduced by this commit).
    
    Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
    Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
    Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
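
    As an illustration of the behaviour described above, consider a
    hypothetical guest (name, vcpu count and memory size invented here)
    whose xl config file specifies no affinity at all:

        # no cpus= and no cpus_soft= given
        name   = "numa-auto-guest"
        vcpus  = 4
        memory = 2048
        # On a host with more than one NUMA node, libxl picks the node(s)
        # automatically and, with this change, also sets the soft affinity
        # of all 4 vCPUs to the pCPUs of the chosen node(s).

    Adding either cpus= or cpus_soft= disables automatic placement; a
    fragment sketching that case follows the patch below.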
---
 docs/man/xl.cfg.pod.5                |   21 ++++++++++---------
 docs/misc/xl-numa-placement.markdown |   14 +++++++++++-
 tools/libxl/libxl_dom.c              |   36 +++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 5833054..1e04eed 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -162,16 +162,6 @@ here, and the soft affinity mask, provided via B<cpus\_soft=> (if any),
 is utilized to compute the domain node-affinity, for driving memory
 allocations.
 
-If we are on a NUMA machine (i.e., if the host has more than one NUMA
-node) and this option is not specified, libxl automatically tries to
-place the guest on the least possible number of nodes. That, however,
-will not affect vcpu pinning, so the guest will still be able to run on
-all the cpus. A heuristic approach is used for choosing the best node (or
-set of nodes), with the goals of maximizing performance for the guest
-and, at the same time, achieving efficient utilization of host cpus
-and memory. See F<docs/misc/xl-numa-placement.markdown> for more
-details.
-
 =item B<cpus_soft="CPU-LIST">
 
 Exactly as B<cpus=>, but specifies soft affinity, rather than pinning
@@ -186,6 +176,17 @@ the intersection of the soft affinity mask, provided here, and the vcpu
 pinning, provided via B<cpus=> (if any), is utilized to compute the
 domain node-affinity, for driving memory allocations.
 
+If this option is not specified (and B<cpus=> is not specified either),
+libxl automatically tries to place the guest on the least possible
+number of nodes. A heuristic approach is used for choosing the best
+node (or set of nodes), with the goal of maximizing performance for
+the guest and, at the same time, achieving efficient utilization of
+host cpus and memory. In that case, the soft affinity of all the vcpus
+of the domain will be set to the pcpus belonging to the NUMA nodes
+chosen during placement.
+
+For more details, see F<docs/misc/xl-numa-placement.markdown>.
+
 =back
 
 =head3 CPU Scheduling
diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
index 9d64eae..f863492 100644
--- a/docs/misc/xl-numa-placement.markdown
+++ b/docs/misc/xl-numa-placement.markdown
@@ -126,10 +126,20 @@ or Xen won't be able to guarantee the locality for their memory accesses.
 That, of course, also means the vCPUs of the domain will only be able to
 execute on those same pCPUs.
 
+It is also possible to have a "cpus\_soft=" option in the xl config file,
+to specify the soft affinity for all the vCPUs of the domain. This affects
+the NUMA placement in the following way:
+
+ * if only "cpus\_soft=" is present, the VM's node-affinity will be equal
+   to the nodes to which the pCPUs in the soft affinity mask belong;
+ * if both "cpus\_soft=" and "cpus=" are present, the VM's node-affinity
+   will be equal to the nodes to which the pCPUs present in both the hard
+   and the soft affinity masks belong.
+
 ### Placing the guest automatically ###
 
-If no "cpus=" option is specified in the config file, libxl tries
-to figure out on its own on which node(s) the domain could fit best.
+If neither "cpus=" nor "cpus\_soft=" is present in the config file, libxl
+tries to figure out on its own on which node(s) the domain could fit best.
 If it finds one (some), the domain's node affinity gets set there,
 and both memory allocations and NUMA aware scheduling (for the credit
 scheduler and starting from Xen 4.3) will comply with it. Starting from
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index cfbd13d..c944804 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -247,11 +247,20 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
      * updated accordingly; if it does not manage, info->nodemap is just left
      * alone. It is then the subsequent call to
      * libxl_domain_set_nodeaffinity() that enacts the actual placement.
+     *
+     * As far as scheduling is concerned, we achieve NUMA-aware scheduling
+     * by having the results of placement affect the soft affinity of all
+     * the vcpus of the domain. Of course, we want that iff placement is
+     * enabled and actually happens, so we only set the vcpus' soft
+     * affinity to reflect the placement result if that is the case.
      */
     if (libxl_defbool_val(info->numa_placement)) {
-        if (info->cpumap.size || info->num_vcpu_hard_affinity) {
+        libxl_bitmap cpumap_soft;
+
+        if (info->cpumap.size ||
+            info->num_vcpu_hard_affinity || info->num_vcpu_soft_affinity) {
             LOG(ERROR, "Can run NUMA placement only if no vcpu "
-                       "affinity is specified explicitly");
+                       "(hard or soft) affinity is specified explicitly");
             return ERROR_INVAL;
         }
         if (info->nodemap.size) {
@@ -265,9 +274,30 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
             return rc;
         libxl_bitmap_set_any(&info->nodemap);
 
-        rc = numa_place_domain(gc, domid, info);
+        rc = libxl_cpu_bitmap_alloc(ctx, &cpumap_soft, 0);
         if (rc)
             return rc;
+
+        rc = numa_place_domain(gc, domid, info);
+        if (rc) {
+            libxl_bitmap_dispose(&cpumap_soft);
+            return rc;
+        }
+
+        /*
+         * All we need to do now is convert the result of automatic
+         * placement from a nodemap to a cpumap, and then use that cpumap
+         * as the soft affinity for all the vcpus of the domain.
+         *
+         * When calling libxl_set_vcpuaffinity_all(), it is ok to use NULL
+         * as the hard affinity, since we know we don't have one (or we
+         * would not be here).
+         */
+        libxl_nodemap_to_cpumap(ctx, &info->nodemap, &cpumap_soft);
+        libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus,
+                                   NULL, &cpumap_soft);
+
+        libxl_bitmap_dispose(&cpumap_soft);
     }
     if (info->nodemap.size)
         libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
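
To make the documented interaction concrete, here is a hypothetical xl
configuration fragment (guest name and CPU numbers invented) in which
affinity is given explicitly, so the check added above skips automatic
placement:

    name      = "numa-pinned-guest"
    vcpus     = 4
    memory    = 2048
    cpus      = "0-7"     # hard affinity: vCPUs may only run on pCPUs 0-7
    cpus_soft = "0-3"     # soft affinity: the scheduler prefers pCPUs 0-3
    # With both options present, the domain's node-affinity is derived from
    # the nodes containing the pCPUs in the intersection of the two masks
    # (pCPUs 0-3 here); with only cpus_soft=, the soft mask alone is used.
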
--
generated by git-patchbot for /home/xen/git/xen.git#master
