
Re: [PATCH v2 1/2] credit: Limit load balancing to once per millisecond


  • To: George Dunlap <george.dunlap@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 22 Sep 2023 15:49:48 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Dario Faggioli <dfaggioli@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Juergen Gross <jgross@xxxxxxxx>
  • Delivery-date: Fri, 22 Sep 2023 13:50:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22.09.2023 14:20, George Dunlap wrote:
> On Thu, Sep 21, 2023 at 2:12 PM Juergen Gross <jgross@xxxxxxxx> wrote:
>>
>> On 21.09.23 14:23, George Dunlap wrote:
>>> The credit scheduler tries as hard as it can to ensure that it always
>>> runs scheduling units with positive credit (PRI_TS_UNDER) before
>>> running those with negative credit (PRI_TS_OVER).  If the next
>>> runnable scheduling unit is of priority OVER, it will always run the
>>> load balancer, which will scour the system looking for another
>>> scheduling unit of the UNDER priority.
>>>
>>> Unfortunately, as the number of cores on a system has grown, the cost
>>> of the work-stealing algorithm has dramatically increased; a recent
>>> trace on a system with 128 cores showed this taking over 50
>>> microseconds.
>>>
>>> Add a parameter, load_balance_ratelimit, to limit the frequency of
>>> load balance operations on a given pcpu.  Default this to 1
>>> millisecond.
>>>
>>> Invert the load balancing conditional to make it more clear, and line
>>> up more closely with the comment above it.
>>>
>>> Overall it might be cleaner to have the last_load_balance checking
>>> happen inside csched_load_balance(), but that would require either
>>> passing both now and spc into the function, or looking them up again;
>>> both of which seemed to be worse than simply checking and setting the
>>> values before calling it.
>>>
>>> On a system with a vcpu:pcpu ratio of 2:1, running Windows guests
>>> (which will end up calling YIELD during spinlock contention), this
>>> patch increased performance significantly.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxx>
>>> ---
>>> Changes since v1:
>>> - Fix editing mistake in commit message
>>> - Improve documentation
>>> - global var is __ro_after_init
>>> - Remove sysctl, as it's not used.  Define max value in credit.c.
>>> - Fix some style issues
>>> - Move comment tweak to the right patch
>>> - In the event that the commandline-parameter value is too high, clip
>>>    to the maximum value rather than setting to the default.
>>>
>>> CC: Dario Faggioli <dfaggioli@xxxxxxxx>
>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>> CC: George Dunlap <george.dunlap@xxxxxxxxxx>
>>> CC: Jan Beulich <jbeulich@xxxxxxxx>
>>> CC: Julien Grall <julien@xxxxxxx>
>>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>> CC: Wei Liu <wl@xxxxxxx>
>>> ---
>>>   docs/misc/xen-command-line.pandoc |  8 ++++++
>>>   xen/common/sched/credit.c         | 47 +++++++++++++++++++++++++------
>>>   2 files changed, 46 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>>> index f88e6a70ae..9c3c72a7f9 100644
>>> --- a/docs/misc/xen-command-line.pandoc
>>> +++ b/docs/misc/xen-command-line.pandoc
>>> @@ -1884,6 +1884,14 @@ By default, Xen will use the INVPCID instruction for TLB management if
>>>   it is available.  This option can be used to cause Xen to fall back to
>>>   older mechanisms, which are generally slower.
>>>
>>> +### load-balance-ratelimit
>>> +> `= <integer>`
>>> +
>>> +The minimum interval between load balancing events on a given pcpu,
>>> +in microseconds.  A value of '0' disables rate limiting.  The
>>> +maximum value is 1 second; the default is 1ms.  At the moment only
>>> +the credit scheduler honors this parameter.
>>> +
>>>   ### noirqbalance (x86)
>>>   > `= <boolean>`
>>>
>>> diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
>>> index f2cd3d9da3..5c06f596d2 100644
>>> --- a/xen/common/sched/credit.c
>>> +++ b/xen/common/sched/credit.c
>>> @@ -50,6 +50,10 @@
>>>   #define CSCHED_TICKS_PER_TSLICE     3
>>>   /* Default timeslice: 30ms */
>>>   #define CSCHED_DEFAULT_TSLICE_MS    30
>>> +/* Default load balancing ratelimit: 1ms */
>>> +#define CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US 1000
>>> +/* Max load balancing ratelimit: 1s */
>>> +#define CSCHED_MAX_LOAD_BALANCE_RATELIMIT_US     1000000
>>>   #define CSCHED_CREDITS_PER_MSEC     10
>>>   /* Never set a timer shorter than this value. */
>>>   #define CSCHED_MIN_TIMER            XEN_SYSCTL_SCHED_RATELIMIT_MIN
>>> @@ -153,6 +157,7 @@ struct csched_pcpu {
>>>
>>>       unsigned int idle_bias;
>>>       unsigned int nr_runnable;
>>> +    s_time_t last_load_balance;
>>>
>>>       unsigned int tick;
>>>       struct timer ticker;
>>> @@ -218,7 +223,7 @@ struct csched_private {
>>>
>>>       /* Period of master and tick in milliseconds */
>>>       unsigned int tick_period_us, ticks_per_tslice;
>>> -    s_time_t ratelimit, tslice, unit_migr_delay;
>>> +    s_time_t ratelimit, tslice, unit_migr_delay, load_balance_ratelimit;
>>>
>>>       struct list_head active_sdom;
>>>       uint32_t weight;
>>> @@ -612,6 +617,8 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
>>>       BUG_ON(!is_idle_unit(curr_on_cpu(cpu)));
>>>       cpumask_set_cpu(cpu, prv->idlers);
>>>       spc->nr_runnable = 0;
>>> +
>>> +    spc->last_load_balance = NOW();
>>>   }
>>>
>>>   static void cf_check
>>> @@ -1676,9 +1683,17 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
>>>       return NULL;
>>>   }
>>>
>>> +/*
>>> + * Minimum delay, in microseconds, between load balance operations.
>>> + * This prevents spending too much time doing load balancing, particularly
>>> + * when the system has a high number of YIELDs due to spinlock priority inversion.
>>> + */
>>> +static unsigned int __ro_after_init load_balance_ratelimit_us = CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US;
>>> +integer_param("load-balance-ratelimit", load_balance_ratelimit_us);
>>> +
>>>   static struct csched_unit *
>>>   csched_load_balance(struct csched_private *prv, int cpu,
>>> -    struct csched_unit *snext, bool *stolen)
>>> +                    struct csched_unit *snext, bool *stolen)
>>>   {
>>>       const struct cpupool *c = get_sched_res(cpu)->cpupool;
>>>       struct csched_unit *speer;
>>> @@ -1958,15 +1973,19 @@ static void cf_check csched_schedule(
>>>           /*
>>>            * SMP Load balance:
>>>            *
>>> -         * If the next highest priority local runnable UNIT has already eaten
>>> -         * through its credits, look on other PCPUs to see if we have more
>>> -         * urgent work... If not, csched_load_balance() will return snext, but
>>> -         * already removed from the runq.
>>> +         * If the next highest priority local runnable UNIT has
>>> +         * already eaten through its credits (and we're below the
>>> +         * balancing ratelimit), look on other PCPUs to see if we have
>>> +         * more urgent work... If we don't, csched_load_balance() will
>>> +         * return snext, but already removed from the runq.
>>>            */
>>> -        if ( snext->pri > CSCHED_PRI_TS_OVER )
>>> -            __runq_remove(snext);
>>> -        else
>>> +        if ( snext->pri <= CSCHED_PRI_TS_OVER
>>> +             && now - spc->last_load_balance > prv->load_balance_ratelimit) {
>>
>> ^ Just found a style issue (after Andrew pointing out the ones in patch 2).
> 
> Just checking, you mean the space before the closing `)`?

And the placement of the opening {.

Jan
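
Taken together, the patch amounts to a cheap elapsed-time gate in front of the expensive work-stealing pass, plus clipping of over-large command-line values. A minimal, self-contained sketch of that logic follows; the helper names `should_load_balance` and `clamp_ratelimit_us` are illustrative only, not functions in the Xen tree:

```c
#include <stdint.h>

/* Stand-in for Xen's s_time_t: signed 64-bit time in nanoseconds. */
typedef int64_t s_time_t;

#define MICROSECS(us) ((s_time_t)(us) * 1000)
/* 1s, as CSCHED_MAX_LOAD_BALANCE_RATELIMIT_US in the patch. */
#define MAX_RATELIMIT_US 1000000

/*
 * Illustrative clamping, per the v2 changelog: a command-line value
 * above the maximum is clipped to the maximum rather than reset to
 * the default.
 */
static s_time_t clamp_ratelimit_us(unsigned int us)
{
    if ( us > MAX_RATELIMIT_US )
        us = MAX_RATELIMIT_US;
    return MICROSECS(us);
}

/*
 * The gate added in csched_schedule(): only fall into the
 * work-stealing path when more than the configured minimum interval
 * has elapsed on this pcpu (strict '>', as in the patch).
 */
static int should_load_balance(s_time_t now, s_time_t last_load_balance,
                               s_time_t ratelimit)
{
    return now - last_load_balance > ratelimit;
}
```

With the default of 1000us, a pcpu that last balanced at t=0 skips stealing at t=0.5ms and tries again at t=1.5ms, with `spc->last_load_balance` updated each time the balancing path is entered.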
