
[Xen-changelog] [xen master] docs: add pod variant of xl-numa-placement



commit aa4eb460bcf77ea87b9209bb136efc8142a1a512
Author:     Olaf Hering <olaf@xxxxxxxxx>
AuthorDate: Wed Jul 26 16:39:50 2017 +0200
Commit:     Wei Liu <wei.liu2@xxxxxxxxxx>
CommitDate: Fri Jul 28 17:49:47 2017 +0100

    docs: add pod variant of xl-numa-placement
    
    Convert source for xl-numa-placement.7 from markdown to pod.
    This removes the buildtime requirement for pandoc, and subsequently the
    need for ghc, in the chain for BuildRequires of xen.rpm.
    
    Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
    Reviewed-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
    Acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 docs/man/xl-numa-placement.markdown.7 | 239 ---------------------------
 docs/man/xl-numa-placement.pod.7      | 293 ++++++++++++++++++++++++++++++++++
 2 files changed, 293 insertions(+), 239 deletions(-)

diff --git a/docs/man/xl-numa-placement.markdown.7 b/docs/man/xl-numa-placement.markdown.7
deleted file mode 100644
index f863492..0000000
--- a/docs/man/xl-numa-placement.markdown.7
+++ /dev/null
@@ -1,239 +0,0 @@
-# Guest Automatic NUMA Placement in libxl and xl #
-
-## Rationale ##
-
-NUMA (which stands for Non-Uniform Memory Access) means that the memory
-access times of a program running on a CPU depend on the relative
-distance between that CPU and that memory. In fact, most NUMA
-systems are built in such a way that each processor has its own local
-memory, on which it can operate very fast. On the other hand, getting and
-storing data from and to remote memory (that is, memory local to some other
-processor) is considerably more complex and slow. On these machines, a NUMA
-node is usually defined as a set of processor cores (typically a physical
-CPU package) and the memory directly attached to that set of cores.
-
-NUMA awareness becomes very important as soon as many domains start
-running memory-intensive workloads on a shared host. In fact, the cost
-of accessing non node-local memory locations is very high, and the
-performance degradation is likely to be noticeable.
-
-For more information, have a look at the [Xen NUMA Introduction][numa_intro]
-page on the Wiki.
-
-## Xen and NUMA machines: the concept of _node-affinity_ ##
-
-The Xen hypervisor deals with NUMA machines through the concept of
-_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
-of the host where the memory for the domain is being allocated (mostly,
-at domain creation time). This is, at least in principle, different from,
-and unrelated to, the vCPU (hard and soft, see below) scheduling affinity,
-which instead is the set of pCPUs where the vCPU is allowed (or prefers)
-to run.
-
-Of course, despite the fact that they belong to and affect different
-subsystems, the domain node-affinity and the vCPU affinities are not
-completely independent.
-In fact, if the domain node-affinity is not explicitly specified by the
-user, via the proper libxl calls or xl config item, it will be computed
-based on the vCPUs' scheduling affinity.
-
-Notice that, even though the node-affinity of a domain may change on-line,
-it is very important to "place" the domain correctly when it is first
-created, as most of its memory is allocated at that time and cannot
-(for now) be moved easily.
-
-### Placing via pinning and cpupools ###
-
-The simplest way of placing a domain on a NUMA node is setting the hard
-scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
-also goes under the name of vCPU pinning, and can be done through the
-"cpus=" option in the config file (more about this below). Another option
-is to pool together the pCPUs spanning the node and put the domain in
-such a _cpupool_ with the "pool=" config option (as documented in our
-[Wiki][cpupools_howto]).
-
-In both the above cases, the domain will not be able to execute outside
-the specified set of pCPUs for any reason, even if all those pCPUs are
-busy doing something else while other pCPUs are idle.
-
-So, when doing this, local memory accesses are 100% guaranteed, but that
-may come at the cost of some load imbalances.
-
-### NUMA aware scheduling ###
-
-If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
-itself always tries to run the domain's vCPUs on one of the nodes in
-its node-affinity. Only if that turns out to be impossible will it
-pick any free pCPU. Locality of access is less guaranteed than in the
-pinning case, but that comes along with better chances to exploit all
-the host resources (e.g., the pCPUs).
-
-Starting from Xen 4.5, credit1 supports two forms of affinity: hard and
-soft, both on a per-vCPU basis. This means each vCPU can have its own
-soft affinity, stating where that vCPU prefers to execute. This is
-less strict than what (also starting from 4.5) is called hard affinity,
-as the vCPU can potentially run anywhere; it just prefers some pCPUs
-over others.
-In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the
-soft affinity of the vCPUs of a domain with its node-affinity.
-
-In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity
-are busy, it is possible for the domain to run outside of it. The
-idea is that slower execution (due to remote memory accesses) is still
-better than no execution at all (as would happen with pinning). For
-this reason, NUMA aware scheduling has the potential of bringing
-substantial performance benefits, although this will depend on the
-workload.
-
-Notice that, for each vCPU, the following three scenarios are possible:
-
-  * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity.
-    In this case, the vCPU is always scheduled on one of the pCPUs to which
-    it is pinned, without any specific preference among them.
-  * a vCPU *has* its own soft affinity and *is not* pinned to any particular
-    pCPU. In this case, the vCPU can run on any pCPU. Nevertheless, the
-    scheduler will try to have it running on one of the pCPUs in its soft
-    affinity;
-  * a vCPU *has* its own soft affinity and *is also* pinned to some
-    pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs
-    onto which it is pinned, with, among them, a preference for the ones
-    that also form its soft affinity. In case pinning and soft affinity
-    form two disjoint sets of pCPUs, pinning "wins", and the soft affinity
-    is just ignored.
-
-## Guest placement in xl ##
-
-If using xl for creating and managing guests, it is very easy to ask for
-either manual or automatic placement of them across the host's NUMA nodes.
-
-Note that xm/xend does a very similar thing, the only differences being
-the details of the heuristics adopted for automatic placement (see below),
-and the lack of support (in both xm/xend and the Xen versions where that
-was the default toolstack) for NUMA aware scheduling.
-
-### Placing the guest manually ###
-
-Thanks to the "cpus=" option, it is possible to specify, directly in the
-domain's config file, where it should be created and scheduled. This
-affects NUMA placement and memory accesses as, in this case, the
-hypervisor constructs the node-affinity of a VM based directly on its
-vCPU pinning when it is created.
-
-This is very simple and effective, but requires the user/system
-administrator to explicitly specify the pinning for each and every domain,
-or Xen won't be able to guarantee the locality for their memory accesses.
-
-That, of course, also means that the vCPUs of the domain will only be able
-to execute on those same pCPUs.
-
-It is also possible to use the "cpus\_soft=" option in the xl config file,
-to specify the soft affinity for all the vCPUs of the domain. This affects
-the NUMA placement in the following way:
-
- * if only "cpus\_soft=" is present, the VM's node-affinity will be equal
-   to the nodes to which the pCPUs in the soft affinity mask belong;
- * if both "cpus\_soft=" and "cpus=" are present, the VM's node-affinity
-   will be equal to the nodes to which the pCPUs present both in hard and
-   soft affinity belong.
-
-### Placing the guest automatically ###
-
-If neither "cpus=" nor "cpus\_soft=" are present in the config file, libxl
-tries to figure out on its own on which node(s) the domain could fit best.
-If it finds one (or more), the domain's node-affinity is set to those
-node(s), and both memory allocation and NUMA aware scheduling (for the
-credit scheduler, starting from Xen 4.3) will comply with it. Starting from
-Xen 4.5, this also means that the mask resulting from this "fitting"
-procedure will become the soft affinity of all the vCPUs of the domain.
-
-It is worthwhile noting that optimally fitting a set of VMs on the NUMA
-nodes of a host is an incarnation of the Bin Packing Problem. In fact,
-the various VMs with different memory sizes are the items to be packed,
-and the host nodes are the bins. As this problem is known to be NP-hard,
-heuristics are used.
-
-The first thing to do is find the nodes or the sets of nodes (from now
-on referred to as 'candidates') that have enough free memory and enough
-physical CPUs for accommodating the new domain. The idea is to find a
-spot for the domain with at least as much free memory as it is configured
-to have, and as many pCPUs as it has vCPUs.  After that, the actual
-decision on which candidate to pick happens according to the following
-heuristics:
-
-  *  candidates involving fewer nodes are considered better. In case
-     two (or more) candidates span the same number of nodes,
-  *  candidates with a smaller number of vCPUs runnable on them (due
-     to previous placement and/or plain vCPU pinning) are considered
-     better. In case the same number of vCPUs can run on two (or more)
-     candidates,
-  *  the candidate with the greatest amount of free memory is
-     considered to be the best one.
-
-Giving preference to candidates with fewer nodes ensures better
-performance for the guest, as it avoids spreading its memory among
-different nodes. Favoring candidates with fewer vCPUs already runnable
-there ensures a good balance of the overall host load. Finally, if more
-candidates fulfil these criteria, prioritizing the nodes that have the
-largest amounts of free memory helps keep memory fragmentation
-low, and maximizes the probability of being able to put more domains
-there.
-
-## Guest placement in libxl ##
-
-xl achieves automatic NUMA placement because that is what libxl does
-by default. No API is provided (yet) for modifying the behaviour of
-the placement algorithm. However, if your program is calling libxl,
-it is possible to set the `numa_placement` build info key to `false`
-(it is `true` by default) with something like the below, to prevent
-any placement from happening:
-
-    libxl_defbool_set(&domain_build_info->numa_placement, false);
-
-Also, if `numa_placement` is set to `true`, the domain's vCPUs must
-not be pinned (i.e., `domain_build_info->cpumap` must have all its
-bits set, as it is by default), or domain creation will fail with
-`ERROR_INVAL`.
-
-Starting from Xen 4.3, in case automatic placement happens (and is
-successful), it will affect the domain's node-affinity and _not_ its
-vCPU pinning. Namely, the domain's vCPUs will not be pinned to any
-pCPU on the host, but the memory from the domain will come from the
-selected node(s) and the NUMA aware scheduling (if the credit scheduler
-is in use) will try to keep the domain's vCPUs there as much as possible.
-
-Besides that, for looking at and/or tweaking the placement algorithm,
-search for "Automatic NUMA placement" in libxl\_internal.h.
-
-Note this may change in future versions of Xen/libxl.
-
-## Xen < 4.5 ##
-
-The concept of vCPU soft affinity was introduced for the first time
-in Xen 4.5. In 4.3, it is the domain's node-affinity that drives the
-NUMA-aware scheduler. The main difference is that soft affinity is per-vCPU,
-and so each vCPU can have its own mask of pCPUs, while node-affinity is
-per-domain, which is the equivalent of having all the vCPUs with the same
-soft affinity.
-
-## Xen < 4.3 ##
-
-As NUMA aware scheduling is a new feature of Xen 4.3, things are a little
-bit different for earlier versions of Xen. If no "cpus=" option is specified
-and Xen 4.2 is in use, the automatic placement algorithm still runs, but
-the result is used to _pin_ the vCPUs of the domain to the output node(s).
-This is consistent with what was happening with xm/xend.
-
-On a version of Xen earlier than 4.2, there is no automatic placement at
-all in xl or libxl, and hence no node-affinity, vCPU affinity or pinning
-is introduced/modified.
-
-## Limitations ##
-
-Analyzing various possible placement solutions is what makes the
-algorithm flexible and quite effective. However, that also means
-it won't scale well to systems with an arbitrary number of nodes.
-For this reason, automatic placement is disabled (with a warning)
-if it is requested on a host with more than 16 NUMA nodes.
-
-[numa_intro]: http://wiki.xen.org/wiki/Xen_NUMA_Introduction
-[cpupools_howto]: http://wiki.xen.org/wiki/Cpupools_Howto
diff --git a/docs/man/xl-numa-placement.pod.7 b/docs/man/xl-numa-placement.pod.7
new file mode 100644
index 0000000..54a4441
--- /dev/null
+++ b/docs/man/xl-numa-placement.pod.7
@@ -0,0 +1,293 @@
+=encoding utf8
+
+=head1 NAME
+
+Guest Automatic NUMA Placement in libxl and xl
+
+=head1 DESCRIPTION
+
+=head2 Rationale
+
+NUMA (which stands for Non-Uniform Memory Access) means that the memory
+access times of a program running on a CPU depend on the relative
+distance between that CPU and that memory. In fact, most NUMA
+systems are built in such a way that each processor has its own local
+memory, on which it can operate very fast. On the other hand, getting and
+storing data from and to remote memory (that is, memory local to some other
+processor) is considerably more complex and slow. On these machines, a NUMA
+node is usually defined as a set of processor cores (typically a physical
+CPU package) and the memory directly attached to that set of cores.
+
+NUMA awareness becomes very important as soon as many domains start
+running memory-intensive workloads on a shared host. In fact, the cost
+of accessing non node-local memory locations is very high, and the
+performance degradation is likely to be noticeable.
+
+For more information, have a look at the
+L<Xen NUMA Introduction|http://wiki.xen.org/wiki/Xen_NUMA_Introduction>
+page on the Wiki.
+
+
+=head2 Xen and NUMA machines: the concept of I<node-affinity>
+
+The Xen hypervisor deals with NUMA machines through the concept of
+I<node-affinity>. The node-affinity of a domain is the set of NUMA nodes
+of the host where the memory for the domain is being allocated (mostly,
+at domain creation time). This is, at least in principle, different from,
+and unrelated to, the vCPU (hard and soft, see below) scheduling affinity,
+which instead is the set of pCPUs where the vCPU is allowed (or prefers)
+to run.
+
+Of course, despite the fact that they belong to and affect different
+subsystems, the domain node-affinity and the vCPU affinities are not
+completely independent.
+In fact, if the domain node-affinity is not explicitly specified by the
+user, via the proper libxl calls or xl config item, it will be computed
+based on the vCPUs' scheduling affinity.
+
+Notice that, even though the node-affinity of a domain may change on-line,
+it is very important to "place" the domain correctly when it is first
+created, as most of its memory is allocated at that time and cannot
+(for now) be moved easily.
+
+
+=head2 Placing via pinning and cpupools
+
+The simplest way of placing a domain on a NUMA node is setting the hard
+scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
+also goes under the name of vCPU pinning, and can be done through the
+"cpus=" option in the config file (more about this below). Another option
+is to pool together the pCPUs spanning the node and put the domain in
+such a I<cpupool> with the "pool=" config option (as documented in our
+L<Wiki|http://wiki.xen.org/wiki/Cpupools_Howto>).
+
+In both the above cases, the domain will not be able to execute outside
+the specified set of pCPUs for any reason, even if all those pCPUs are
+busy doing something else while other pCPUs are idle.
+
+So, when doing this, local memory accesses are 100% guaranteed, but that
+may come at the cost of some load imbalances.
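+
+As a purely illustrative sketch (the pCPU numbers are made up, and
+C<pool-node1> is a hypothetical cpupool created beforehand as per the
+Cpupools Howto), the two approaches could look like this in the domain
+config file:
+
+    # Hard-pin the domain's vCPUs to pCPUs 4-7, assumed here to be the
+    # pCPUs of NUMA node 1:
+    cpus = "4-7"
+
+    # Or, alternatively, put the domain in a cpupool spanning node 1:
+    pool = "pool-node1"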
+
+
+=head2 NUMA aware scheduling
+
+If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
+itself always tries to run the domain's vCPUs on one of the nodes in
+its node-affinity. Only if that turns out to be impossible will it
+pick any free pCPU. Locality of access is less guaranteed than in the
+pinning case, but that comes along with better chances to exploit all
+the host resources (e.g., the pCPUs).
+
+Starting from Xen 4.5, credit1 supports two forms of affinity: hard and
+soft, both on a per-vCPU basis. This means each vCPU can have its own
+soft affinity, stating where that vCPU prefers to execute. This is
+less strict than what (also starting from 4.5) is called hard affinity,
+as the vCPU can potentially run anywhere; it just prefers some pCPUs
+over others.
+In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the
+soft affinity of the vCPUs of a domain with its node-affinity.
+
+In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity
+are busy, it is possible for the domain to run outside of it. The
+idea is that slower execution (due to remote memory accesses) is still
+better than no execution at all (as would happen with pinning). For
+this reason, NUMA aware scheduling has the potential of bringing
+substantial performance benefits, although this will depend on the
+workload.
+
+Notice that, for each vCPU, the following three scenarios are possible:
+
+=over
+
+=item *
+
+a vCPU I<is pinned> to some pCPUs and I<does not have> any soft affinity.
+In this case, the vCPU is always scheduled on one of the pCPUs to which
+it is pinned, without any specific preference among them.
+
+
+=item *
+
+a vCPU I<has> its own soft affinity and I<is not> pinned to any particular
+pCPU. In this case, the vCPU can run on any pCPU. Nevertheless, the
+scheduler will try to have it running on one of the pCPUs in its soft
+affinity;
+
+
+=item *
+
+a vCPU I<has> its own soft affinity and I<is also> pinned to some
+pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs
+onto which it is pinned, with, among them, a preference for the ones
+that also form its soft affinity. In case pinning and soft affinity
+form two disjoint sets of pCPUs, pinning "wins", and the soft affinity
+is just ignored.
+
+
+=back
+
+
+=head2 Guest placement in xl
+
+If using xl for creating and managing guests, it is very easy to ask for
+either manual or automatic placement of them across the host's NUMA nodes.
+
+Note that xm/xend does a very similar thing, the only differences being
+the details of the heuristics adopted for automatic placement (see below),
+and the lack of support (in both xm/xend and the Xen versions where that
+was the default toolstack) for NUMA aware scheduling.
+
+
+=head2 Placing the guest manually
+
+Thanks to the "cpus=" option, it is possible to specify, directly in the
+domain's config file, where it should be created and scheduled. This
+affects NUMA placement and memory accesses as, in this case, the
+hypervisor constructs the node-affinity of a VM based directly on its
+vCPU pinning when it is created.
+
+This is very simple and effective, but requires the user/system
+administrator to explicitly specify the pinning for each and every domain,
+or Xen won't be able to guarantee the locality for their memory accesses.
+
+That, of course, also means that the vCPUs of the domain will only be able
+to execute on those same pCPUs.
+
+It is also possible to use the "cpus_soft=" option in the xl config file,
+to specify the soft affinity for all the vCPUs of the domain. This affects
+the NUMA placement in the following way (see the sketch after this list):
+
+=over
+
+=item *
+
+if only "cpus_soft=" is present, the VM's node-affinity will be equal
+to the nodes to which the pCPUs in the soft affinity mask belong;
+
+
+=item *
+
+if both "cpus_soft=" and "cpus=" are present, the VM's node-affinity
+will be equal to the nodes to which the pCPUs present both in hard and
+soft affinity belong.
+
+
+=back
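+
+For instance, assuming (purely for illustration) a host where pCPUs 0-3
+belong to node 0 and pCPUs 4-7 to node 1, a config fragment like the
+following would result in a node-affinity of just node 1, since the only
+pCPUs present in both the hard and the soft affinity (4 and 5) live there:
+
+    # Hard affinity: the vCPUs may only run on pCPUs 2-5
+    cpus      = "2-5"
+    # Soft affinity: among those, running on pCPUs 4-7 is preferred
+    cpus_soft = "4-7"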
+
+
+=head2 Placing the guest automatically
+
+If neither "cpus=" nor "cpus_soft=" are present in the config file, libxl
+tries to figure out on its own on which node(s) the domain could fit best.
+If it finds one (or more), the domain's node-affinity is set to those
+node(s), and both memory allocation and NUMA aware scheduling (for the
+credit scheduler, starting from Xen 4.3) will comply with it. Starting from
+Xen 4.5, this also means that the mask resulting from this "fitting"
+procedure will become the soft affinity of all the vCPUs of the domain.
+
+It is worthwhile noting that optimally fitting a set of VMs on the NUMA
+nodes of a host is an incarnation of the Bin Packing Problem. In fact,
+the various VMs with different memory sizes are the items to be packed,
+and the host nodes are the bins. As this problem is known to be NP-hard,
+heuristics are used.
+
+The first thing to do is find the nodes or the sets of nodes (from now
+on referred to as 'candidates') that have enough free memory and enough
+physical CPUs for accommodating the new domain. The idea is to find a
+spot for the domain with at least as much free memory as it is configured
+to have, and as many pCPUs as it has vCPUs.  After that, the actual
+decision on which candidate to pick happens according to the following
+heuristics:
+
+=over
+
+=item *
+
+candidates involving fewer nodes are considered better. In case
+two (or more) candidates span the same number of nodes,
+
+
+=item *
+
+candidates with a smaller number of vCPUs runnable on them (due
+to previous placement and/or plain vCPU pinning) are considered
+better. In case the same number of vCPUs can run on two (or more)
+candidates,
+
+
+=item *
+
+the candidate with the greatest amount of free memory is
+considered to be the best one.
+
+
+=back
+
+Giving preference to candidates with fewer nodes ensures better
+performance for the guest, as it avoids spreading its memory among
+different nodes. Favoring candidates with fewer vCPUs already runnable
+there ensures a good balance of the overall host load. Finally, if more
+candidates fulfil these criteria, prioritizing the nodes that have the
+largest amounts of free memory helps keep memory fragmentation
+low, and maximizes the probability of being able to put more domains
+there.
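+
+As a concrete (and entirely made up) example, consider a guest with 4
+vCPUs and 4GB of RAM on a two node host, where node 0 has 6GB free and
+10 vCPUs of other domains already runnable on its pCPUs, while node 1
+has 5GB free and only 2 vCPUs runnable on its pCPUs. All of node 0,
+node 1 and {node 0, node 1} are valid candidates; the two single node
+candidates are preferred to the two node one, and node 1 then wins over
+node 0 because fewer vCPUs are runnable there, despite node 0 having
+more free memory.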
+
+
+=head2 Guest placement in libxl
+
+xl achieves automatic NUMA placement because that is what libxl does
+by default. No API is provided (yet) for modifying the behaviour of
+the placement algorithm. However, if your program is calling libxl,
+it is possible to set the C<numa_placement> build info key to C<false>
+(it is C<true> by default) with something like the below, to prevent
+any placement from happening:
+
+    libxl_defbool_set(&domain_build_info->numa_placement, false);
+
+Also, if C<numa_placement> is set to C<true>, the domain's vCPUs must
+not be pinned (i.e., C<<< domain_build_info->cpumap >>> must have all its
+bits set, as it is by default), or domain creation will fail with
+C<ERROR_INVAL>.
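+
+By way of illustration only (the helper function name is made up, and
+error handling is omitted), this is roughly how a caller could flip that
+flag while filling in a domain configuration:
+
+    #include <stdbool.h>
+    #include <libxl.h>
+
+    /* Hypothetical helper: initialise a domain config with automatic
+     * NUMA placement disabled. */
+    static void init_config_without_placement(libxl_domain_config *d_config)
+    {
+        libxl_domain_config_init(d_config);
+
+        /* numa_placement defaults to true; set it to false so that
+         * libxl does not compute a node-affinity for the new domain. */
+        libxl_defbool_set(&d_config->b_info.numa_placement, false);
+
+        /* Note: were numa_placement left true, b_info.cpumap would have
+         * to keep all its bits set (i.e. no pinning), or domain creation
+         * would fail with ERROR_INVAL. */
+    }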
+
+Starting from Xen 4.3, in case automatic placement happens (and is
+successful), it will affect the domain's node-affinity and I<not> its
+vCPU pinning. Namely, the domain's vCPUs will not be pinned to any
+pCPU on the host, but the memory from the domain will come from the
+selected node(s) and the NUMA aware scheduling (if the credit scheduler
+is in use) will try to keep the domain's vCPUs there as much as possible.
+
+Besides that, for looking at and/or tweaking the placement algorithm,
+search for "Automatic NUMA placement" in libxl_internal.h.
+
+Note this may change in future versions of Xen/libxl.
+
+
+=head2 Xen < 4.5
+
+The concept of vCPU soft affinity was introduced for the first time
+in Xen 4.5. In 4.3, it is the domain's node-affinity that drives the
+NUMA-aware scheduler. The main difference is that soft affinity is per-vCPU,
+and so each vCPU can have its own mask of pCPUs, while node-affinity is
+per-domain, which is the equivalent of having all the vCPUs with the same
+soft affinity.
+
+
+=head2 Xen < 4.3
+
+As NUMA aware scheduling is a new feature of Xen 4.3, things are a little
+bit different for earlier versions of Xen. If no "cpus=" option is specified
+and Xen 4.2 is in use, the automatic placement algorithm still runs, but
+the result is used to I<pin> the vCPUs of the domain to the output node(s).
+This is consistent with what was happening with xm/xend.
+
+On a version of Xen earlier than 4.2, there is no automatic placement at
+all in xl or libxl, and hence no node-affinity, vCPU affinity or pinning
+is introduced/modified.
+
+
+=head2 Limitations
+
+Analyzing various possible placement solutions is what makes the
+algorithm flexible and quite effective. However, that also means
+it won't scale well to systems with an arbitrary number of nodes.
+For this reason, automatic placement is disabled (with a warning)
+if it is requested on a host with more than 16 NUMA nodes.
--
generated by git-patchbot for /home/xen/git/xen.git#master

