Xen project Mailing List

Re: [PATCH v2 1/2] credit: Limit load balancing to once per millisecond

From: George Dunlap <george.dunlap@xxxxxxxxx>

Date: Fri, 22 Sep 2023 13:20:57 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Dario Faggioli <dfaggioli@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>

Delivery-date: Fri, 22 Sep 2023 12:21:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Sep 21, 2023 at 2:12 PM Juergen Gross <jgross@xxxxxxxx> wrote: > > On 21.09.23 14:23, George Dunlap wrote: > > The credit scheduler tries as hard as it can to ensure that it always > > runs scheduling units with positive credit (PRI_TS_UNDER) before > > running those with negative credit (PRI_TS_OVER). If the next > > runnable scheduling unit is of priority OVER, it will always run the > > load balancer, which will scour the system looking for another > > scheduling unit of the UNDER priority. > > > > Unfortunately, as the number of cores on a system has grown, the cost > > of the work-stealing algorithm has dramatically increased; a recent > > trace on a system with 128 cores showed this taking over 50 > > microseconds. > > > > Add a parameter, load_balance_ratelimit, to limit the frequency of > > load balance operations on a given pcpu. Default this to 1 > > millisecond. > > > > Invert the load balancing conditional to make it more clear, and line > > up more closely with the comment above it. > > > > Overall it might be cleaner to have the last_load_balance checking > > happen inside csched_load_balance(), but that would require either > > passing both now and spc into the function, or looking them up again; > > both of which seemed to be worse than simply checking and setting the > > values before calling it. > > > > On a system with a vcpu:pcpu ratio of 2:1, running Windows guests > > (which will end up calling YIELD during spinlock contention), this > > patch increased performance significantly. > > > > Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxx> > > --- > > Changes since v1: > > - Fix editing mistake in commit message > > - Improve documentation > > - global var is __ro_after_init > > - Remove sysctl, as it's not used. Define max value in credit.c. > > - Fix some style issues > > - Move comment tweak to the right patch > > - In the event that the commandline-parameter value is too high, clip > > to the maximum value rather than setting to the default. > > > > CC: Dario Faggioli <dfaggioli@xxxxxxxx> > > CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > CC: George Dunlap <george.dunlap@xxxxxxxxxx> > > CC: Jan Beulich <jbeulich@xxxxxxxx> > > CC: Julien Grall <julien@xxxxxxx> > > CC: Stefano Stabellini <sstabellini@xxxxxxxxxx> > > CC: Wei Liu <wl@xxxxxxx> > > --- > > docs/misc/xen-command-line.pandoc | 8 ++++++ > > xen/common/sched/credit.c | 47 +++++++++++++++++++++++++------ > > 2 files changed, 46 insertions(+), 9 deletions(-) > > > > diff --git a/docs/misc/xen-command-line.pandoc > > b/docs/misc/xen-command-line.pandoc > > index f88e6a70ae..9c3c72a7f9 100644 > > --- a/docs/misc/xen-command-line.pandoc > > +++ b/docs/misc/xen-command-line.pandoc > > @@ -1884,6 +1884,14 @@ By default, Xen will use the INVPCID instruction for > > TLB management if > > it is available. This option can be used to cause Xen to fall back to > > older mechanisms, which are generally slower. > > > > +### load-balance-ratelimit > > +> `= <integer>` > > + > > +The minimum interval between load balancing events on a given pcpu, in > > +microseconds. A value of '0' will disable rate limiting. Maximum > > +value 1 second. At the moment only credit honors this parameter. > > +Default 1ms. > > + > > ### noirqbalance (x86) > > > `= <boolean>` > > > > diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c > > index f2cd3d9da3..5c06f596d2 100644 > > --- a/xen/common/sched/credit.c > > +++ b/xen/common/sched/credit.c > > @@ -50,6 +50,10 @@ > > #define CSCHED_TICKS_PER_TSLICE 3 > > /* Default timeslice: 30ms */ > > #define CSCHED_DEFAULT_TSLICE_MS 30 > > +/* Default load balancing ratelimit: 1ms */ > > +#define CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US 1000 > > +/* Max load balancing ratelimit: 1s */ > > +#define CSCHED_MAX_LOAD_BALANCE_RATELIMIT_US 1000000 > > #define CSCHED_CREDITS_PER_MSEC 10 > > /* Never set a timer shorter than this value. */ > > #define CSCHED_MIN_TIMER XEN_SYSCTL_SCHED_RATELIMIT_MIN > > @@ -153,6 +157,7 @@ struct csched_pcpu { > > > > unsigned int idle_bias; > > unsigned int nr_runnable; > > + s_time_t last_load_balance; > > > > unsigned int tick; > > struct timer ticker; > > @@ -218,7 +223,7 @@ struct csched_private { > > > > /* Period of master and tick in milliseconds */ > > unsigned int tick_period_us, ticks_per_tslice; > > - s_time_t ratelimit, tslice, unit_migr_delay; > > + s_time_t ratelimit, tslice, unit_migr_delay, load_balance_ratelimit; > > > > struct list_head active_sdom; > > uint32_t weight; > > @@ -612,6 +617,8 @@ init_pdata(struct csched_private *prv, struct > > csched_pcpu *spc, int cpu) > > BUG_ON(!is_idle_unit(curr_on_cpu(cpu))); > > cpumask_set_cpu(cpu, prv->idlers); > > spc->nr_runnable = 0; > > + > > + spc->last_load_balance = NOW(); > > } > > > > static void cf_check > > @@ -1676,9 +1683,17 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, > > int balance_step) > > return NULL; > > } > > > > +/* > > + * Minimum delay, in microseconds, between load balance operations. > > + * This prevents spending too much time doing load balancing, particularly > > + * when the system has a high number of YIELDs due to spinlock priority > > inversion. > > + */ > > +static unsigned int __ro_after_init load_balance_ratelimit_us = > > CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US; > > +integer_param("load-balance-ratelimit", load_balance_ratelimit_us); > > + > > static struct csched_unit * > > csched_load_balance(struct csched_private *prv, int cpu, > > - struct csched_unit *snext, bool *stolen) > > + struct csched_unit *snext, bool *stolen) > > { > > const struct cpupool *c = get_sched_res(cpu)->cpupool; > > struct csched_unit *speer; > > @@ -1958,15 +1973,19 @@ static void cf_check csched_schedule( > > /* > > * SMP Load balance: > > * > > - * If the next highest priority local runnable UNIT has already > > eaten > > - * through its credits, look on other PCPUs to see if we have more > > - * urgent work... If not, csched_load_balance() will return snext, > > but > > - * already removed from the runq. > > + * If the next highest priority local runnable UNIT has > > + * already eaten through its credits (and we're below the > > + * balancing ratelimit), look on other PCPUs to see if we have > > + * more urgent work... If we don't, csched_load_balance() will > > + * return snext, but already removed from the runq. > > */ > > - if ( snext->pri > CSCHED_PRI_TS_OVER ) > > - __runq_remove(snext); > > - else > > + if ( snext->pri <= CSCHED_PRI_TS_OVER > > + && now - spc->last_load_balance > > > prv->load_balance_ratelimit) { > > ^ Just found a style issue (after Andrew pointing out the ones in patch 2). Just checking, you mean the space before the closing `)`? -George

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.