Xen project Mailing List

Re: [Xen-devel] [PATCH v3 08/11] xen: sched: allow for choosing credit2 runqueues configuration at boot

To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

From: Juergen Gross <jgross@xxxxxxxx>

Date: Fri, 8 Apr 2016 06:18:48 +0200

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Uma Sharma <uma.sharma523@xxxxxxxxx>

Delivery-date: Fri, 08 Apr 2016 04:18:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 08/04/16 03:24, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially in case hyperthreading is available.
>
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better
>    assess and improve performance;
>  - one can select the best configuration for his specific
>    use case and/or hardware.
>
> This patch enables the above.
>
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets, can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
>
> Note also that the default (although not functional) for
> credit2, since now, has been per-socket runqueue. This patch
> leaves things that way, to avoid mixing policy and technical
> changes.
>
> Finally, it would be a nice feature to be able to select
> a particular runqueue arrangement, even when creating a
> Credit2 cpupool. This is left as future work.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Signed-off-by: Uma Sharma <uma.sharma523@xxxxxxxxx>

Some nits below.

> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Uma Sharma <uma.sharma523@xxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> ---
> Changes from v2:
>  * valid strings are now in an array, that we scan during
>    parameter parsing, as suggested during review.
>
> Changes from v1:
>  * fix bug in parameter parsing, and start using strcmp()
>    for that, as requested during review.
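The point in the commit message about core IDs repeating across sockets can be illustrated with a minimal standalone sketch. It assumes a made-up host of 2 sockets x 2 cores x 2 threads, described by a hypothetical `topo[]` table; Xen itself queries cpu_to_socket() and cpu_to_core() rather than a table like this, and `same_core_id()` is a name invented here for the naive check.

```c
#include <stdbool.h>

/* Hypothetical per-CPU topology record, for illustration only. */
struct cpu_topo {
    unsigned int socket;
    unsigned int core;
};

/*
 * Assumed example host: 2 sockets x 2 cores x 2 threads.
 * Note that core IDs restart from 0 on each socket.
 */
static const struct cpu_topo topo[] = {
    {0, 0}, {0, 0}, {0, 1}, {0, 1},   /* cpus 0-3 on socket 0 */
    {1, 0}, {1, 0}, {1, 1}, {1, 1},   /* cpus 4-7 on socket 1 */
};

static bool same_socket(unsigned int a, unsigned int b)
{
    return topo[a].socket == topo[b].socket;
}

/* Naive check: compares core IDs only, ignoring the socket. */
static bool same_core_id(unsigned int a, unsigned int b)
{
    return topo[a].core == topo[b].core;
}

/* Full check: the socket must match as well as the core ID. */
static bool same_core(unsigned int a, unsigned int b)
{
    return same_socket(a, b) && topo[a].core == topo[b].core;
}
```

With this table, cpu 0 and cpu 4 share core ID 0 but sit on different sockets, so the naive check would wrongly group them while the full check keeps them apart.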
> ---
>  docs/misc/xen-command-line.markdown | 19 ++++++++
>  xen/common/sched_credit2.c          | 83 +++++++++++++++++++++++++++++++++--
>  2 files changed, 97 insertions(+), 5 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index ca77e3b..0047f94 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>
> +### credit2\_runqueue
> +> `= core | socket | node | all`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
> +Available alternatives, with their meaning, are:
> +* `core`: one runqueue per each physical core of the host;
> +* `socket`: one runqueue per each physical socket (which often,
> +            but not always, matches a NUMA node) of the host;
> +* `node`: one runqueue per each NUMA node of the host;
> +* `all`: just one runqueue shared by all the logical pCPUs of
> +         the host
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index a61a45a..eeb3f54 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -81,10 +81,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero. At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue. At the moment, there is one global runqueue for all
> - * cores.
>   */
>
>  /*
> @@ -193,6 +189,63 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>
>  /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-node: meaning that there will be one runqueue per each physical
> + *             NUMA node of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'node';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host; This will happen if
> + *               the opt_runqueue parameter is set to 'socket';
> + *
> + * - global: meaning that there will be only one runqueue to which all the
> + *           (logical) processors of the host belongs. This will happen if

s/belongs/belong/

> + *           the opt_runqueue parameter is set to 'all'.
> + *
> + * Depending on the value of opt_runqueue, therefore, cpus that are part of
> + * either the same physical core, or of the same physical socket, will be
> + * put together to form runqueues.

numa? all?

> + */
> +#define OPT_RUNQUEUE_CORE   0
> +#define OPT_RUNQUEUE_SOCKET 1
> +#define OPT_RUNQUEUE_NODE   2
> +#define OPT_RUNQUEUE_ALL    3
> +static const char *const opt_runqueue_str[] = {
> +    [OPT_RUNQUEUE_CORE] = "core",
> +    [OPT_RUNQUEUE_SOCKET] = "socket",
> +    [OPT_RUNQUEUE_NODE] = "node",
> +    [OPT_RUNQUEUE_ALL] = "all"
> +};
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +
> +static void parse_credit2_runqueue(const char *s)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i <= OPT_RUNQUEUE_ALL; i++ )

I'd prefer:

       for ( i = 0; i < ARRAY_SIZE(opt_runqueue_str); i++ )


Juergen

> +    {
> +        if ( !strcmp(s, opt_runqueue_str[i]) )
> +        {
> +            opt_runqueue = i;
> +            return;
> +        }
> +    }
> +
> +    printk("WARNING, unrecognized value of credit2_runqueue option!\n");
> +}
> +custom_param("credit2_runqueue", parse_credit2_runqueue);
> +
> +/*
>   * Per-runqueue data
>   */
>  struct csched2_runqueue_data {
> @@ -1974,6 +2027,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>
> +static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_node(cpua) == cpu_to_node(cpub);
> +}
> +
> +static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
> +}
> +
> +static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
> +{
> +    return same_socket(cpua, cpub) &&
> +           cpu_to_core(cpua) == cpu_to_core(cpub);
> +}
> +
>  static unsigned int
>  cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>  {
> @@ -2006,7 +2075,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>
> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>              break;
>      }
>
> @@ -2170,6 +2242,7 @@ csched2_init(struct scheduler *ops)
>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> +    printk(" runqueues arrangement: %s\n", opt_runqueue_str[opt_runqueue]);
>
>      if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
>      {
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
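For reference, the loop bound suggested in the review can be tried outside the hypervisor with a minimal standalone sketch. It makes several assumptions: ARRAY_SIZE is defined locally here (Xen provides its own), the `__read_mostly` annotation and the printk() warning are dropped, and `parse_runqueue()` is a hypothetical variant of the patch's parser that returns 0 or -1 so the result can be checked.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for Xen's ARRAY_SIZE macro (assumption for this sketch). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

#define OPT_RUNQUEUE_CORE   0
#define OPT_RUNQUEUE_SOCKET 1
#define OPT_RUNQUEUE_NODE   2
#define OPT_RUNQUEUE_ALL    3

static const char *const opt_runqueue_str[] = {
    [OPT_RUNQUEUE_CORE]   = "core",
    [OPT_RUNQUEUE_SOCKET] = "socket",
    [OPT_RUNQUEUE_NODE]   = "node",
    [OPT_RUNQUEUE_ALL]    = "all",
};

/* Default stays per-socket, as in the patch. */
static int opt_runqueue = OPT_RUNQUEUE_SOCKET;

/*
 * Hypothetical parser: returns 0 and updates opt_runqueue on a match,
 * returns -1 and leaves opt_runqueue untouched otherwise.
 */
static int parse_runqueue(const char *s)
{
    size_t i;

    /* The bound follows the array, so a new entry needs no loop change. */
    for ( i = 0; i < ARRAY_SIZE(opt_runqueue_str); i++ )
    {
        if ( !strcmp(s, opt_runqueue_str[i]) )
        {
            opt_runqueue = (int)i;
            return 0;
        }
    }

    return -1;
}
```

Tying the bound to the array rather than to OPT_RUNQUEUE_ALL means the loop keeps covering every valid string even if another arrangement is appended after 'all'.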
