|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/2] credit: Limit load balancing to once per millisecond
On Thu, Sep 21, 2023 at 2:12 PM Juergen Gross <jgross@xxxxxxxx> wrote:
>
> On 21.09.23 14:23, George Dunlap wrote:
> > The credit scheduler tries as hard as it can to ensure that it always
> > runs scheduling units with positive credit (PRI_TS_UNDER) before
> > running those with negative credit (PRI_TS_OVER). If the next
> > runnable scheduling unit is of priority OVER, it will always run the
> > load balancer, which will scour the system looking for another
> > scheduling unit of the UNDER priority.
> >
> > Unfortunately, as the number of cores on a system has grown, the cost
> > of the work-stealing algorithm has dramatically increased; a recent
> > trace on a system with 128 cores showed this taking over 50
> > microseconds.
> >
> > Add a parameter, load_balance_ratelimit, to limit the frequency of
> > load balance operations on a given pcpu. Default this to 1
> > millisecond.
> >
> > Invert the load balancing conditional to make it more clear, and line
> > up more closely with the comment above it.
> >
> > Overall it might be cleaner to have the last_load_balance checking
> > happen inside csched_load_balance(), but that would require either
> > passing both now and spc into the function, or looking them up again;
> > both of which seemed to be worse than simply checking and setting the
> > values before calling it.
> >
> > On a system with a vcpu:pcpu ratio of 2:1, running Windows guests
> > (which will end up calling YIELD during spinlock contention), this
> > patch increased performance significantly.
> >
> > Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxx>
> > ---
> > Changes since v1:
> > - Fix editing mistake in commit message
> > - Improve documentation
> > - global var is __ro_after_init
> > - Remove sysctl, as it's not used. Define max value in credit.c.
> > - Fix some style issues
> > - Move comment tweak to the right patch
> > - In the event that the commandline-parameter value is too high, clip
> > to the maximum value rather than setting to the default.
> >
> > CC: Dario Faggioli <dfaggioli@xxxxxxxx>
> > CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > CC: George Dunlap <george.dunlap@xxxxxxxxxx>
> > CC: Jan Beulich <jbeulich@xxxxxxxx>
> > CC: Julien Grall <julien@xxxxxxx>
> > CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > CC: Wei Liu <wl@xxxxxxx>
> > ---
> > docs/misc/xen-command-line.pandoc | 8 ++++++
> > xen/common/sched/credit.c | 47 +++++++++++++++++++++++++------
> > 2 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/docs/misc/xen-command-line.pandoc
> > b/docs/misc/xen-command-line.pandoc
> > index f88e6a70ae..9c3c72a7f9 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -1884,6 +1884,14 @@ By default, Xen will use the INVPCID instruction for
> > TLB management if
> > it is available. This option can be used to cause Xen to fall back to
> > older mechanisms, which are generally slower.
> >
> > +### load-balance-ratelimit
> > +> `= <integer>`
> > +
> > +The minimum interval between load balancing events on a given pcpu, in
> > +microseconds. A value of '0' will disable rate limiting. Maximum
> > +value 1 second. At the moment only credit honors this parameter.
> > +Default 1ms.
> > +
> > ### noirqbalance (x86)
> > > `= <boolean>`
> >
> > diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
> > index f2cd3d9da3..5c06f596d2 100644
> > --- a/xen/common/sched/credit.c
> > +++ b/xen/common/sched/credit.c
> > @@ -50,6 +50,10 @@
> > #define CSCHED_TICKS_PER_TSLICE 3
> > /* Default timeslice: 30ms */
> > #define CSCHED_DEFAULT_TSLICE_MS 30
> > +/* Default load balancing ratelimit: 1ms */
> > +#define CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US 1000
> > +/* Max load balancing ratelimit: 1s */
> > +#define CSCHED_MAX_LOAD_BALANCE_RATELIMIT_US 1000000
> > #define CSCHED_CREDITS_PER_MSEC 10
> > /* Never set a timer shorter than this value. */
> > #define CSCHED_MIN_TIMER XEN_SYSCTL_SCHED_RATELIMIT_MIN
> > @@ -153,6 +157,7 @@ struct csched_pcpu {
> >
> > unsigned int idle_bias;
> > unsigned int nr_runnable;
> > + s_time_t last_load_balance;
> >
> > unsigned int tick;
> > struct timer ticker;
> > @@ -218,7 +223,7 @@ struct csched_private {
> >
> > /* Period of master and tick in milliseconds */
> > unsigned int tick_period_us, ticks_per_tslice;
> > - s_time_t ratelimit, tslice, unit_migr_delay;
> > + s_time_t ratelimit, tslice, unit_migr_delay, load_balance_ratelimit;
> >
> > struct list_head active_sdom;
> > uint32_t weight;
> > @@ -612,6 +617,8 @@ init_pdata(struct csched_private *prv, struct
> > csched_pcpu *spc, int cpu)
> > BUG_ON(!is_idle_unit(curr_on_cpu(cpu)));
> > cpumask_set_cpu(cpu, prv->idlers);
> > spc->nr_runnable = 0;
> > +
> > + spc->last_load_balance = NOW();
> > }
> >
> > static void cf_check
> > @@ -1676,9 +1683,17 @@ csched_runq_steal(int peer_cpu, int cpu, int pri,
> > int balance_step)
> > return NULL;
> > }
> >
> > +/*
> > + * Minimum delay, in microseconds, between load balance operations.
> > + * This prevents spending too much time doing load balancing, particularly
> > + * when the system has a high number of YIELDs due to spinlock priority
> > inversion.
> > + */
> > +static unsigned int __ro_after_init load_balance_ratelimit_us =
> > CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US;
> > +integer_param("load-balance-ratelimit", load_balance_ratelimit_us);
> > +
> > static struct csched_unit *
> > csched_load_balance(struct csched_private *prv, int cpu,
> > - struct csched_unit *snext, bool *stolen)
> > + struct csched_unit *snext, bool *stolen)
> > {
> > const struct cpupool *c = get_sched_res(cpu)->cpupool;
> > struct csched_unit *speer;
> > @@ -1958,15 +1973,19 @@ static void cf_check csched_schedule(
> > /*
> > * SMP Load balance:
> > *
> > - * If the next highest priority local runnable UNIT has already
> > eaten
> > - * through its credits, look on other PCPUs to see if we have more
> > - * urgent work... If not, csched_load_balance() will return snext,
> > but
> > - * already removed from the runq.
> > + * If the next highest priority local runnable UNIT has
> > + * already eaten through its credits (and we're below the
> > + * balancing ratelimit), look on other PCPUs to see if we have
> > + * more urgent work... If we don't, csched_load_balance() will
> > + * return snext, but already removed from the runq.
> > */
> > - if ( snext->pri > CSCHED_PRI_TS_OVER )
> > - __runq_remove(snext);
> > - else
> > + if ( snext->pri <= CSCHED_PRI_TS_OVER
> > + && now - spc->last_load_balance >
> > prv->load_balance_ratelimit) {
>
> ^ Just found a style issue (after Andrew pointing out the ones in patch 2).
Just checking, you mean the space before the closing `)`?
-George
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |