Re: [Xen-devel] [PATCH 8/9] xen: sched: allow for choosing credit2 runqueues configuration at boot
On 09/29/2015 06:56 PM, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange its
> internal runqueues. Before this change, only 'one runqueue per
> socket' was allowed. However, experiments have shown that, for
> instance, having one runqueue per physical core improves performance,
> especially in case hyperthreading is available.
>
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better assess
>    and improve performance;
>  - one can select the best configuration for his specific use case
>    and/or hardware.
>
> This patch enables the above.
>
> Note that, for correctly arranging runqueues to be per-core, just
> checking cpu_to_core() on the host CPUs is not enough. In fact, cores
> (and hyperthreads) on different sockets can have the same core (and
> thread) IDs! We, therefore, need to check whether the full topology
> of two CPUs matches, for them to be put in the same runqueue.
>
> Note also that the default (although not functional) for credit2 has,
> up to now, been per-socket runqueues. This patch leaves things that
> way, to avoid mixing policy and technical changes.

I think you should think about a way to make this parameter a
per-cpupool one instead of a system-global one. As this will require
some extra work regarding the tools interface, I'd be absolutely fine
with adding this at a later time, but you should have that in mind
when setting this up now.
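For illustration, the "full topology" check described in the changelog
above (comparing socket and core IDs together, rather than core IDs
alone) could look roughly like the sketch below. This is not the code
from the patch: it assumes Xen's cpu_to_socket()/cpu_to_core()
accessors, and the helper name is made up.

    /*
     * Illustrative sketch only: two CPUs may be put in the same
     * per-core runqueue only if their whole topology matches.
     * Equal core IDs on different sockets must not be treated
     * as the same core.
     */
    static bool_t same_full_topology(unsigned int cpua, unsigned int cpub)
    {
        return cpu_to_socket(cpua) == cpu_to_socket(cpub) &&
               cpu_to_core(cpua) == cpu_to_core(cpub);
    }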
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Signed-off-by: Uma Sharma <uma.sharma523@xxxxxxxxx>
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Uma Sharma <uma.sharma523@xxxxxxxxx>
> ---
>  docs/misc/xen-command-line.markdown |   11 +++++++
>  xen/common/sched_credit2.c          |   57 ++++++++++++++++++++++++++++++++---
>  2 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index a2e427c..71315b8 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -467,6 +467,17 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>
> +### credit2\_runqueue
> +> `= socket | core`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 38f382e..025626f 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -82,10 +82,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero. At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue. At the moment, there is one global runqueue for all
> - * cores.
>   */
>
>  /*
> @@ -194,6 +190,41 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>
>  /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host; This will happen if
> + *               the opt_runqueue parameter is set to 'socket';

Wouldn't it be a nice idea to add "per-numa-node" as well? This would
make a difference for systems with:
- multiple sockets per numa-node
- multiple numa-nodes per socket

It might even be a good idea to be able to have only one runqueue in
small cpupools (again, this will apply only in case you have a
per-cpupool setting instead of a global one).
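For illustration again, here is a sketch of how the boot parameter's
parsing could accommodate a "node" option alongside "socket" and
"core", in the style of Xen's custom_param() handlers. The names, and
the "node" handling itself, are hypothetical and not taken from the
posted patch. With the patch as posted, one would boot with, e.g.,
credit2_runqueue=core on the Xen command line to get per-core
runqueues.

    /*
     * Illustrative sketch only: the two arrangements from the patch,
     * plus a hypothetical per-NUMA-node one.
     */
    #define OPT_RUNQUEUE_CORE   1
    #define OPT_RUNQUEUE_SOCKET 2
    #define OPT_RUNQUEUE_NODE   3 /* hypothetical extension */
    static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;

    static void __init parse_credit2_runqueue(const char *s)
    {
        if ( !strcmp(s, "core") )
            opt_runqueue = OPT_RUNQUEUE_CORE;
        else if ( !strcmp(s, "socket") )
            opt_runqueue = OPT_RUNQUEUE_SOCKET;
        else if ( !strcmp(s, "node") ) /* hypothetical extension */
            opt_runqueue = OPT_RUNQUEUE_NODE;
        else
            printk("WARNING: unrecognized credit2_runqueue option '%s'\n", s);
    }
    custom_param("credit2_runqueue", parse_credit2_runqueue);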
Juergen