[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] credit: Limit load balancing to once per millisecond


  • To: George Dunlap <george.dunlap@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 4 Jul 2023 18:34:39 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ewONuJ2//BWQUIeHAtHVinAG9ujjLQ7OFAniG1+xqgU=; b=g0Nw46L+ikvldbCX1efqKucQ/hYGYFFw5sPAwtBZwrUSIzUY+raVREj7jnqt7k9MtOXRbUkOQPVPtCf8yNkZrglmpITeWPcbMfwdvfMrEwncVwPE5vCrqymelOw8c1A+Qfd2HeAmGTErZFEEbbPVTQoHcicdlauNBJas4qpe96UOuwrtT4E9IaMO45R7PwP0KT+WTIDRwvLGmFEG25xqcMODg4tP7tKcAirwJroxgoj4nyA15vCAhyfyL2yIp9PLLHqooF7FxfdeM7XVafs2XbemfPDS9ezGxocZ4y3K+wRWe+ZcQUwsVMeZcaT5Xik5pmP7PlMf4r0VlyyIHDOTRA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NsvUS0JPFzQ/OvkSsj0b56AfmVp05fpFsxPy/Z9tM90WVQjOBW18ygGOM/C42D0eFRNIZ5vaLPEH4usgtDt8Z59o9V5HwqH3it8z6WgJRxnCjFEsxOMqEgFhKqUXowV4okir1kB1su+xVEP2Drf4Z25lF3aAGzxh7q+FYR3sJQdV+pVS5nS94E9vJg/ZgavgpVQXoMA8PBL1PRJbVqtYt1jU4ifLpBqkH4euRWeG8zPBBFUm580wFDorAMzDqBav7Jjg9XfA2dKNjU+YMarqT+B2OjDfZlrgUUmaak4BoF3TYUV/jIVXUl4Jck6w8ls7+o9e9/Y3wjuVTwVcvyg11w==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Dario Faggioli <dfaggioli@xxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 04 Jul 2023 17:35:14 +0000
  • Ironport-data: A9a23:7q2HAahxLMgyuWn3qWXI1tzWX161mBEKZh0ujC45NGQN5FlHY01je htvWWmEPqqDNzTyfNx+YN+xoU1SvpfVz4U1SwVp+XoyHnsb9cadCdqndUqhZCn6wu8v7q5Ex 55HNoSfdpBcolv0/ErF3m3J9CEkvU2wbuOgTrWCYmYpHlUMpB4J0XpLg/Q+jpNjne+3CgaMv cKai8DEMRqu1iUc3lg8sspvkzsx+qyr0N8klgZmP6sT4weCzyJ94K83fsldEVOpGuG4IcbiL wrz5OnR1n/U+R4rFuSknt7TGqHdauePVeQmoiM+t5mK2nCulARrukoIHKN0hXNsoyeIh7hMJ OBl7vRcf+uL0prkw4zxWzEAe8130DYvFLXveRBTuuTLp6HKnueFL1yDwyjaMKVBktubD12i+ tRbbyoSME2+utmQybOEVqoxisQ8BZHSadZ3VnFIlVk1DN4AaLWaGeDgw48d2z09wMdTAfzZe swVLyJ1awjNaAFOPVFRD48imOCvhT/0dDgwRFC9/PJrpTSMilEsluGya7I5efTTLSlRtm+eq njL4CLSBRYCOcbE4TGE7mitlqnEmiaTtIc6TeTkqqI32QbIroAVIDMYSUeE+8mTtlSjZskAe 1wM1XcFlrdnoSRHSfG4BXVUukWssRMbQdVdVeEn7gWE0oLf5wGECi4PSTspQMwrsoo6SCIn0 neNnsj1Hnp/vbuNU3Wf+7yI6zSoNkA9IWELaiAFSg8Ey8L+u4x1hRXKJv5hGqOoitz+GRnr3 iuH6iM5gt07j9MP1qi91UDKhXSrvJehZhUu+gzdU2ah7wV4TI2ofYql7R7c9/koBJmdZkmMu j4Dgcf20QwVJZSElSjISuNdGrisvqyBKGeF3gUpGIQ9/TOw/XLlZZpX/Dx1OEZuNIADZCPtZ 0jQ/whW4fe/IUeXUEO+WKrpY+xC8EQqPY2Nuiz8BjaWXqVMSQ==
  • Ironport-hdrordr: A9a23:WgYICqkdaHsN1DSzdQRUxVF4sb7pDfKD3DAbv31ZSRFFG/Fw9v rDoB1/73TJYVkqN03I9ervBED4ewK4yXcW2+ks1MaZPTUO41HYSL2KhLGKq1eMJ8SZzJ8+6U 4KSdkcNDSfNykHse/KpCGzH9MkzJ2s+KSwjefRyDNMQGhRGsZdxjY8IgyWF0h7ACNbH4c+EJ aGoupLzgDQAEg/X4CSDHUBWuSGg9HQjprpbVonCnccmXGzpALtwLT3Fh2Vmi0TVD5Oxrlny2 TfjADjooWv2svLsSP05iv65ZhSndek8MdJHsaWisMYMXHNh2+TFeJccozHkDcpoPy+rHYG+e O83CvI5v4DkU85qFvF3CfQ5w==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30/06/2023 12:37 pm, George Dunlap wrote:
> The credit scheduler tries as hard as it can to ensure that it always
> runs scheduling units with positive credit (PRI_TS_UNDER) before
> running those with negative credit (PRI_TS_OVER).  If the next
> runnable scheduling unit is of priority OVER, it will always run the
> load balancer, which will scour the system looking for another
> scheduling unit of the UNDER priority.
>
> Unfortunately, as the number of cores on a system has grown, the cost
> of the work-stealing algorithm has dramatically increased; a recent
> trace on a system with 128 cores showed this taking over 50
> microseconds.
>
> Add a parameter, load_balance_ratelimit, to limit the frequency of
> load balance operations on a given pcpu.  Default this to 1
> millisecond.
>
> Invert the load balancing conditional to make it more clear, and line
> up more closely with the comment above it.
>
> Overall it might be cleaner to have the last_load_balance checking
> happen inside csched_load_balance(), but that would require either
> passing both now and spc into the function, or looking them up again;
> both of which seemed to be worse than simply checking and setting the
> values before calling it.
>
> Without this patch, on a system with a vcpu:pcpu ratio of 2:1, running
> Windows guests (which will end up calling YIELD during spinlock
> contention), this patch increased performance significantly.
>
> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxx>
> ---
> CC: Dario Faggioli <dfaggioli@xxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: George Dunlap <george.dunlap@xxxxxxxxxx>
> CC: Jan Beulich <jbeulich@xxxxxxxx>
> CC: Julien Grall <julien@xxxxxxx>
> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
> ---
>  docs/misc/xen-command-line.pandoc |  6 +++++
>  xen/common/sched/credit.c         | 40 ++++++++++++++++++++++++++-----
>  xen/include/public/sysctl.h       |  6 +++++

Given this filelist, why the sysctl change?

There's no logic to drive this parameter in the xc/libxl param get/set.

The only two in-tree users I can see are xenpm, along with an
unconditional print to stderr saying it's deprecated and to use xl, and xl.

> diff --git a/docs/misc/xen-command-line.pandoc 
> b/docs/misc/xen-command-line.pandoc
> index 4060ebdc5d..369557020f 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1856,6 +1856,12 @@ By default, Xen will use the INVPCID instruction for 
> TLB management if
>  it is available.  This option can be used to cause Xen to fall back to
>  older mechanisms, which are generally slower.
>  
> +### load-balance-ratelimit
> +> `= <integer>`
> +
> +The minimum interval between load balancing events on a given pcpu.
> +At the moment only credit honors this parameter.

So this is intended to be a global scheduler parameter?

> diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
> index f2cd3d9da3..b8bdfd5f6a 100644
> --- a/xen/common/sched/credit.c
> +++ b/xen/common/sched/credit.c
> @@ -1267,7 +1272,8 @@ csched_sys_cntl(const struct scheduler *ops,
>                   && (params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX
>                       || params->ratelimit_us < 
> XEN_SYSCTL_SCHED_RATELIMIT_MIN))
>               || MICROSECS(params->ratelimit_us) > 
> MILLISECS(params->tslice_ms)
> -             || params->vcpu_migr_delay_us > 
> XEN_SYSCTL_CSCHED_MGR_DLY_MAX_US )
> +             || params->vcpu_migr_delay_us > XEN_SYSCTL_CSCHED_MGR_DLY_MAX_US
> +             || params->load_balance_ratelimit_us > 
> XEN_SYSCTL_CSCHED_LB_RATE_MAX_US)

Style (give or take this hunk being with some logic to drive the new
sysctl).

> @@ -1963,10 +1979,12 @@ static void cf_check csched_schedule(
>           * urgent work... If not, csched_load_balance() will return snext, 
> but
>           * already removed from the runq.
>           */
> -        if ( snext->pri > CSCHED_PRI_TS_OVER )
> -            __runq_remove(snext);
> -        else
> +        if ( snext->pri <= CSCHED_PRI_TS_OVER
> +             && now - spc->last_load_balance > prv->load_balance_ratelimit) {
> +            spc->last_load_balance = now;
>              snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
> +        } else
> +            __runq_remove(snext);

Style.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.