
Re: [Xen-devel] [PATCH v3 08/11] xen: sched: allow for choosing credit2 runqueues configuration at boot



On 08/04/16 08:35, Dario Faggioli wrote:
> On Fri, 2016-04-08 at 06:18 +0200, Juergen Gross wrote:
>> On 08/04/16 03:24, Dario Faggioli wrote:
>>
>> Some nits below.
>>
> Thanks for the quick review!
> 
> A revised version of this patch is provided here (both inlined and
> attached). A branch with the remaining not-yet-committed patches of
> this series, with this patch changed as you suggested, is available
> at:
> 
>  git://xenbits.xen.org/people/dariof/xen.git 
> rel/sched/credit2/fix-runq-and-haff-v4
>  
> http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/credit2/fix-runq-and-haff-v4
> 
> Regards,
> Dario
> ---
> commit 7f491488bbff1cc3af021cd29fca7e0fba321e02
> Author: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Date:   Tue Sep 29 14:05:09 2015 +0200
> 
>     xen: sched: allow for choosing credit2 runqueues configuration at boot
>     
>     Credit2 uses CPU topology to decide how to arrange
>     its internal runqueues. Before this change, only 'one runqueue
>     per socket' was allowed. However, experiments have shown that,
>     for instance, having one runqueue per physical core improves
>     performance, especially when hyperthreading is available.
>     
>     In general, it makes sense to allow users to pick a runqueue
>     arrangement at boot time, so that:
>      - more experiments can easily be performed to better
>        assess and improve performance;
>      - one can select the best configuration for their specific
>        use case and/or hardware.
>     
>     This patch enables the above.
>     
>     Note that, to correctly arrange runqueues per core, just
>     checking cpu_to_core() on the host CPUs is not enough.
>     In fact, cores (and hyperthreads) on different sockets can
>     have the same core (and thread) IDs! We therefore need to
>     check whether the full topology of two CPUs matches before
>     putting them in the same runqueue.
>     
>     Note also that the default for credit2 so far (although not
>     functional) has been one runqueue per socket. This patch
>     leaves things that way, to avoid mixing policy and technical
>     changes.
>     
>     Finally, it would be nice to be able to select a particular
>     runqueue arrangement also when creating a Credit2 cpupool.
>     This is left as future work.
>     
>     Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
>     Signed-off-by: Uma Sharma <uma.sharma523@xxxxxxxxx>

Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxx>

>     ---
>     Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
>     Cc: Uma Sharma <uma.sharma523@xxxxxxxxx>
>     Cc: Juergen Gross <jgross@xxxxxxxx>
>     ---
>     Changes from v3:
>      * fix typo and other issues in comments;
>        use ARRAY_SIZE when iterating the parameter string array.
>     
>     Changes from v2:
>      * valid strings are now in an array, that we scan during
>        parameter parsing, as suggested during review.
>     
>     Changes from v1:
>      * fix bug in parameter parsing, and start using strcmp()
>        for that, as requested during review.
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index ca77e3b..0047f94 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>  
> +### credit2\_runqueue
> +> `= core | socket | node | all`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as with `core`) mean more accurate load
> +balancing (for instance, better handling of hyperthreading),
> +but also more overhead.
> +
> +Available alternatives, with their meaning, are:
> +* `core`: one runqueue per physical core of the host;
> +* `socket`: one runqueue per physical socket (which often,
> +            but not always, matches a NUMA node) of the host;
> +* `node`: one runqueue per NUMA node of the host;
> +* `all`: just one runqueue shared by all the logical pCPUs of
> +         the host.
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>  
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index a61a45a..d43f67a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -81,10 +81,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero.  At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue.  At the moment, there is one global runqueue for all
> - * cores.
>   */
>  
>  /*
> @@ -193,6 +189,63 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Runqueue organization.
> + *
> + * Each of the various cpus is to be assigned to a runqueue, and we
> + * want that to happen based on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-socket: meaning that there will be one runqueue per physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host. This will happen if
> + *               the opt_runqueue parameter is set to 'socket';
> + *
> + * - per-node: meaning that there will be one runqueue per NUMA node
> + *             of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'node';
> + *
> + * - global: meaning that there will be only one runqueue to which all the
> + *           (logical) processors of the host belong. This will happen if
> + *           the opt_runqueue parameter is set to 'all'.
> + *
> + * Depending on the value of opt_runqueue, therefore, cpus that are part of
> + * either the same physical core, the same physical socket, the same NUMA
> + * node, or just all of them, will be put together to form runqueues.
> + */
> +#define OPT_RUNQUEUE_CORE   0
> +#define OPT_RUNQUEUE_SOCKET 1
> +#define OPT_RUNQUEUE_NODE   2
> +#define OPT_RUNQUEUE_ALL    3
> +static const char *const opt_runqueue_str[] = {
> +    [OPT_RUNQUEUE_CORE] = "core",
> +    [OPT_RUNQUEUE_SOCKET] = "socket",
> +    [OPT_RUNQUEUE_NODE] = "node",
> +    [OPT_RUNQUEUE_ALL] = "all"
> +};
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +
> +static void parse_credit2_runqueue(const char *s)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(opt_runqueue_str); i++ )
> +    {
> +        if ( !strcmp(s, opt_runqueue_str[i]) )
> +        {
> +            opt_runqueue = i;
> +            return;
> +        }
> +    }
> +
> +    printk("WARNING, unrecognized value of credit2_runqueue option!\n");
> +}
> +custom_param("credit2_runqueue", parse_credit2_runqueue);
> +
> +/*
>   * Per-runqueue data
>   */
>  struct csched2_runqueue_data {
> @@ -1974,6 +2027,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>  
> +static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_node(cpua) == cpu_to_node(cpub);
> +}
> +
> +static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
> +}
> +
> +static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
> +{
> +    return same_socket(cpua, cpub) &&
> +           cpu_to_core(cpua) == cpu_to_core(cpub);
> +}
> +
>  static unsigned int
>  cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>  {
> @@ -2006,7 +2075,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>  
> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>              break;
>      }
>  
> @@ -2170,6 +2242,7 @@ csched2_init(struct scheduler *ops)
>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> +    printk(" runqueues arrangement: %s\n", opt_runqueue_str[opt_runqueue]);
>  
>      if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
>      {
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

