[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 06/15] cpufreq: Add Hardware P-State (HWP) driver


  • To: Jason Andryuk <jandryuk@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 10 Jul 2023 15:13:45 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=PxdJdiCEPgMDtV3+5TKaW6K4PuBqE49IsjTzpPtdLpY=; b=jTrQUaek+gAkOCWW/TcX/O43rbwoOdjpGHE3RFKIGogUVaJihUdbCN3CgVRznL5so0GP7P4QaGTaqkcVGscf/PxXzmWuL9woEvTQMZ5E4g6NsM6/dlgraqQOhY43spTcCEh5MnmUh/0vDNourMro7hjXygwjC7GHazDMWSguXS5PVE35qZm1TShIlIwEOR0LChwqYWRve7n7lWwlARcN5OX7ZUyj/Fp5ffMmydCNOnn34Dw+pE5Fr6Fz+U3DLamZULKRPVxhe7/Jc3ooe7EZTY5o86d7dakE+qITmqJRvG8PxysugqqWe46G1ZVOWmZub9i4Jxs9vY/LVkk+y33OsA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=n94vPFM8ZcQKXdops7kfEeep60W9yqzcn1gsWw2NSdDFqH+R2E9BSBOA8xSdSrpzRDgvHmSEiYTrt0vdKiUjHTDbqlL0mFhQ6l91TVD/1bml7fXrwe8eABTJF8YWJzLfHpQAtlJH/7BtHNIloIzp7BUkAEfUUL74enjLpVDUaBevwB4BF2/QPkyS7PNJ0kqurn8KODUEnqhI2//c/+oVbIEy1iwM6KVjTY6CH7t0NzLP8FgfKuZOJOMF5H8epNKytlrfse2O0Ein2301VFSKTXShNDj4UQE4LCloPeNZBU/TEAXprYymrWatJmg8VEmK1Lpx8W5/o5Jb4EANvj8EOA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 10 Jul 2023 13:14:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.07.2023 20:54, Jason Andryuk wrote:
> @@ -510,6 +510,22 @@ choice of `dom0-kernel` is deprecated and not supported 
> by all Dom0 kernels.
>  * `<maxfreq>` and `<minfreq>` are integers which represent max and min 
> processor frequencies
>    respectively.
>  * `verbose` option can be included as a string or also as `verbose=<integer>`
> +  for `xen`.  It is a boolean for `hwp`.
> +* `hwp` selects Hardware-Controlled Performance States (HWP) on supported 
> Intel
> +  hardware.  HWP is a Skylake+ feature which provides better CPU power
> +  management.  The default is disabled.  If `hwp` is selected, but hardware
> +  support is not available, Xen will fallback to cpufreq=xen.
> +* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC).  HDC enables 
> the
> +  processor to autonomously force physical package components into idle 
> state.
> +  The default is enabled, but the option only applies when `hwp` is enabled.
> +
> +There is also support for `;`-separated fallback options:
> +`cpufreq=hwp,verbose;xen`.  This first tries `hwp` and falls back to `xen`
> +if unavailable.

In the given example, does "verbose" also apply to the fallback case? If so,
perhaps better "cpufreq=hwp;xen,verbose", to eliminate that ambiguity?

> --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> @@ -642,7 +642,24 @@ static int __init cf_check cpufreq_driver_init(void)
>          switch ( boot_cpu_data.x86_vendor )
>          {
>          case X86_VENDOR_INTEL:
> -            ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> +            unsigned int i;

At the moment we still don't mix declarations and statements, i.e. all
declarations have to be at the top of a block/scope. What iirc we do use
in a couple of places (and what hence you may want to do here as well) is
...

> +            ret = -ENOENT;
> +
> +            for ( i = 0; i < cpufreq_xen_cnt; i++ )

... declare the induction variable inside the loop header.

> +            {
> +                switch ( cpufreq_xen_opts[i] )
> +                {
> +                case CPUFREQ_xen:
> +                    ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> +                    break;
> +                case CPUFREQ_hwp:
> +                    ret = hwp_register_driver();
> +                    break;
> +                }
> +
> +                if ( ret == 0 )
> +                    break;
> +            }
>              break;

In this model any kind of failure results in the fallback to be tried
(and the fallback's error to be returned to the caller rather than
the primary one). This may or may not be what we actually want;
personally I would have expected

                if ( ret != -ENODEV )
                    break;

or some such instead.

> +static bool hwp_handle_option(const char *s, const char *end)
> +{
> +    int ret;
> +
> +    ret = parse_boolean("verbose", s, end);
> +    if ( ret >= 0 ) {

Nit: Style (brace placement).

> +        cpufreq_verbose = ret;
> +        return true;
> +    }
> +
> +    ret = parse_boolean("hdc", s, end);
> +    if ( ret >= 0 ) {

Same here.

> +        opt_cpufreq_hdc = ret;
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int __init hwp_cmdline_parse(const char *s, const char *e)
> +{
> +    do
> +    {
> +        const char *end = strpbrk(s, ",;");
> +
> +        if ( s && !hwp_handle_option(s, end) )

This check of s not being NULL comes too late, as strpbrk() would have
de-referenced it already. Considering ...

> +        {
> +            printk(XENLOG_WARNING "cpufreq/hwp: option '%s' not 
> recognized\n",
> +                   s);
> +
> +            return -1;
> +        }
> +
> +        s = end ? ++end : end;
> +    } while ( s && s < e );

... this it probably wants to move even ahead of the loop.

> +static int hdc_set_pkg_hdc_ctl(unsigned int cpu, bool val)
> +{
> +    uint64_t msr;
> +
> +    if ( rdmsr_safe(MSR_PKG_HDC_CTL, msr) )
> +    {
> +        hwp_err(cpu, "rdmsr_safe(MSR_PKG_HDC_CTL)\n");
> +        return -1;
> +    }
> +
> +    if ( val )
> +        msr |= PKG_HDC_CTL_HDC_PKG_ENABLE;
> +    else
> +        msr &= ~PKG_HDC_CTL_HDC_PKG_ENABLE;
> +
> +    if ( wrmsr_safe(MSR_PKG_HDC_CTL, msr) )
> +    {
> +        hwp_err(cpu, "wrmsr_safe(MSR_PKG_HDC_CTL): %016lx\n", msr);
> +        return -1;
> +    }
> +
> +    return 0;
> +}

Please can you use either boolean return values or proper 0 / -errno
ones? (Same again then in the subsequent function.)

> +static void cf_check hwp_init_msrs(void *info)
> +{
> +    struct cpufreq_policy *policy = info;
> +    struct hwp_drv_data *data = this_cpu(hwp_drv_data);
> +    uint64_t val;
> +
> +    /*
> +     * Package level MSR, but we don't have a good idea of packages here, so
> +     * just do it everytime.
> +     */
> +    if ( rdmsr_safe(MSR_PM_ENABLE, val) )
> +    {
> +        hwp_err(policy->cpu, "rdmsr_safe(MSR_PM_ENABLE)\n");
> +        data->curr_req.raw = -1;
> +        return;
> +    }
> +
> +    /* Ensure we don't generate interrupts */
> +    if ( feature_hwp_notification )
> +        wrmsr_safe(MSR_HWP_INTERRUPT, 0);
> +
> +    hwp_verbose("CPU%u: MSR_PM_ENABLE: %016lx\n", policy->cpu, val);
> +    if ( !(val & PM_ENABLE_HWP_ENABLE) )
> +    {
> +        val |= PM_ENABLE_HWP_ENABLE;
> +        if ( wrmsr_safe(MSR_PM_ENABLE, val) )
> +        {
> +            hwp_err(policy->cpu, "wrmsr_safe(MSR_PM_ENABLE, %lx)\n", val);
> +            data->curr_req.raw = -1;
> +            return;
> +        }
> +    }
> +
> +    if ( rdmsr_safe(MSR_HWP_CAPABILITIES, data->hwp_caps) )
> +    {
> +        hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_CAPABILITIES)\n");
> +        goto error;
> +    }
> +
> +    if ( rdmsr_safe(MSR_HWP_REQUEST, data->curr_req.raw) )
> +    {
> +        hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_REQUEST)\n");
> +        goto error;
> +    }
> +
> +    /*
> +     * Check for APERF/MPERF support in hardware
> +     * also check for boost/turbo support
> +     */
> +    intel_feature_detect(policy);
> +
> +    if ( feature_hdc )
> +    {
> +        if ( hdc_set_pkg_hdc_ctl(policy->cpu, opt_cpufreq_hdc) ||
> +             hdc_set_pm_ctl1(policy->cpu, opt_cpufreq_hdc) ) {

Please can these two if()s be joined and the well-placed brace be
retained?

> +            hwp_err(policy->cpu, "Disabling HDC support\n");
> +            feature_hdc = false;
> +            goto error;

Why? Can't you continue just with HDC turned off?

> +static void cf_check hwp_write_request(void *info)
> +{
> +    const struct cpufreq_policy *policy = info;
> +    struct hwp_drv_data *data = this_cpu(hwp_drv_data);
> +    union hwp_request hwp_req = data->curr_req;
> +
> +    data->ret = 0;
> +
> +    BUILD_BUG_ON(sizeof(union hwp_request) != sizeof(hwp_req.raw));

You changed only the right side to not be sizeof(<type>).

> +static int cf_check hwp_cpufreq_cpu_init(struct cpufreq_policy *policy)
> +{
> +    unsigned int cpu = policy->cpu;
> +    struct hwp_drv_data *data;
> +
> +    data = xzalloc(struct hwp_drv_data);
> +    if ( !data )
> +        return -ENOMEM;
> +
> +    policy->governor = &cpufreq_gov_hwp;
> +
> +    per_cpu(hwp_drv_data, cpu) = data;
> +
> +    on_selected_cpus(cpumask_of(cpu), hwp_init_msrs, policy, 1);

Could I talk you into moving the helper function immediately ahead of
this (sole) one using it, much like you have it for hwp_cpufreq_target()
and hwp_write_request()?

> +    if ( data->curr_req.raw == -1 )
> +    {
> +        hwp_err(cpu, "Could not initialize HWP properly\n");
> +        per_cpu(hwp_drv_data, cpu) = NULL;
> +        xfree(data);
> +        return -ENODEV;
> +    }
> +
> +    data->minimum = data->curr_req.min_perf;
> +    data->maximum = data->curr_req.max_perf;
> +    data->desired = data->curr_req.desired;
> +    data->energy_perf = data->curr_req.energy_perf;
> +    data->activity_window = data->curr_req.activity_window;
> +
> +    if ( cpu == 0 )
> +        hwp_verbose("CPU%u: HWP_CAPABILITIES: %016lx\n", cpu, 
> data->hwp_caps);

While I'm fine with this (perhaps apart from you using "cpu == 0",
which is an idiom we're trying to get rid of), ...

> +    hwp_verbose("CPU%u: rdmsr HWP_REQUEST %016lx\n", cpu, 
> data->curr_req.raw);

... this once-per-CPU message still looks to verbose to me. Perhaps
for both:
- print for the BSP,
- print when AP value differs from BSP (albeit I don't know how
[un]likely that is)?

> +static int cf_check hwp_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> +{
> +    struct hwp_drv_data *data = per_cpu(hwp_drv_data, policy->cpu);
> +    per_cpu(hwp_drv_data, policy->cpu) = NULL;
> +    xfree(data);

Nit: Style (blank line between declaration(s) and statement(s) please.
(Also at least once again below.)

> --- a/xen/drivers/cpufreq/cpufreq.c
> +++ b/xen/drivers/cpufreq/cpufreq.c
> @@ -63,12 +63,18 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
>  /* set xen as default cpufreq */
>  enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
>  
> -static int __init cpufreq_cmdline_parse(const char *s);
> +enum cpufreq_xen_opt cpufreq_xen_opts[2] = { CPUFREQ_xen, };
> +unsigned int cpufreq_xen_cnt = 1;

Looks like both can be __initdata?

As to the array initializer: For one Misra won't like the 2nd slot not
initialized. Plus the implicit 0 there is nothing else than CPUFREQ_xen,
which also ends up a little fragile. Perhaps 0 wants to stand for
CPUFREQ_none (or whatever name you deem appropriate)?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.