[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 06/15] cpufreq: Add Hardware P-State (HWP) driver



On Mon, Jul 10, 2023 at 9:13 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 06.07.2023 20:54, Jason Andryuk wrote:
> > @@ -510,6 +510,22 @@ choice of `dom0-kernel` is deprecated and not 
> > supported by all Dom0 kernels.
> >  * `<maxfreq>` and `<minfreq>` are integers which represent max and min 
> > processor frequencies
> >    respectively.
> >  * `verbose` option can be included as a string or also as 
> > `verbose=<integer>`
> > +  for `xen`.  It is a boolean for `hwp`.
> > +* `hwp` selects Hardware-Controlled Performance States (HWP) on supported 
> > Intel
> > +  hardware.  HWP is a Skylake+ feature which provides better CPU power
> > +  management.  The default is disabled.  If `hwp` is selected, but hardware
> > +  support is not available, Xen will fallback to cpufreq=xen.
> > +* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC).  HDC enables 
> > the
> > +  processor to autonomously force physical package components into idle 
> > state.
> > +  The default is enabled, but the option only applies when `hwp` is 
> > enabled.
> > +
> > +There is also support for `;`-separated fallback options:
> > +`cpufreq=hwp,verbose;xen`.  This first tries `hwp` and falls back to `xen`
> > +if unavailable.
>
> In the given example, does "verbose" also apply to the fallback case? If so,
> perhaps better "cpufreq=hwp;xen,verbose", to eliminate that ambiguity?

Yes, "verbose" is applied to both.  I can make the change.  I
mentioned it in the commit message, but I'll mention it here as well.

> > --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
> > @@ -642,7 +642,24 @@ static int __init cf_check cpufreq_driver_init(void)
> >          switch ( boot_cpu_data.x86_vendor )
> >          {
> >          case X86_VENDOR_INTEL:
> > -            ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> > +            unsigned int i;
>
> At the moment we still don't mix declarations and statements, i.e. all
> declarations have to be at the top of a block/scope. What iirc we do use
> in a couple of places (and what hence you may want to do here as well) is
> ...
>
> > +            ret = -ENOENT;
> > +
> > +            for ( i = 0; i < cpufreq_xen_cnt; i++ )
>
> ... declare the induction variable inside the loop header.

Sounds good, thanks.

> > +            {
> > +                switch ( cpufreq_xen_opts[i] )
> > +                {
> > +                case CPUFREQ_xen:
> > +                    ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> > +                    break;
> > +                case CPUFREQ_hwp:
> > +                    ret = hwp_register_driver();
> > +                    break;
> > +                }
> > +
> > +                if ( ret == 0 )
> > +                    break;
> > +            }
> >              break;
>
> In this model any kind of failure results in the fallback to be tried
> (and the fallback's error to be returned to the caller rather than
> the primary one). This may or may not be what we actually want;
> personally I would have expected
>
>                 if ( ret != -ENODEV )
>                     break;
>
> or some such instead.

I guess this comes back to our fruit preferences. :)

I can switch it around like that, and make hwp_register_driver()
return -ENODEV for hwp_available() returning false.

> > +static bool hwp_handle_option(const char *s, const char *end)
> > +{
> > +    int ret;
> > +
> > +    ret = parse_boolean("verbose", s, end);
> > +    if ( ret >= 0 ) {
>
> Nit: Style (brace placement).
>
> > +        cpufreq_verbose = ret;
> > +        return true;
> > +    }
> > +
> > +    ret = parse_boolean("hdc", s, end);
> > +    if ( ret >= 0 ) {
>
> Same here.

Thanks.  Sorry about those.

> > +        opt_cpufreq_hdc = ret;
> > +        return true;
> > +    }
> > +
> > +    return false;
> > +}
> > +
> > +int __init hwp_cmdline_parse(const char *s, const char *e)
> > +{
> > +    do
> > +    {
> > +        const char *end = strpbrk(s, ",;");
> > +
> > +        if ( s && !hwp_handle_option(s, end) )
>
> This check of s not being NULL comes too late, as strpbrk() would have
> de-referenced it already. Considering ...
>
> > +        {
> > +            printk(XENLOG_WARNING "cpufreq/hwp: option '%s' not 
> > recognized\n",
> > +                   s);
> > +
> > +            return -1;
> > +        }
> > +
> > +        s = end ? ++end : end;
> > +    } while ( s && s < e );
>
> ... this it probably wants to move even ahead of the loop.

I'll switch from do/while to just while and then the NULL check will
be covered.  In practice, this function is never called with s ==
NULL.

> > +static int hdc_set_pkg_hdc_ctl(unsigned int cpu, bool val)
> > +{
> > +    uint64_t msr;
> > +
> > +    if ( rdmsr_safe(MSR_PKG_HDC_CTL, msr) )
> > +    {
> > +        hwp_err(cpu, "rdmsr_safe(MSR_PKG_HDC_CTL)\n");
> > +        return -1;
> > +    }
> > +
> > +    if ( val )
> > +        msr |= PKG_HDC_CTL_HDC_PKG_ENABLE;
> > +    else
> > +        msr &= ~PKG_HDC_CTL_HDC_PKG_ENABLE;
> > +
> > +    if ( wrmsr_safe(MSR_PKG_HDC_CTL, msr) )
> > +    {
> > +        hwp_err(cpu, "wrmsr_safe(MSR_PKG_HDC_CTL): %016lx\n", msr);
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
>
> Please can you use either boolean return values or proper 0 / -errno
> ones? (Same again then in the subsequent function.)

Sure, I'll use booleans.

> > +static void cf_check hwp_init_msrs(void *info)
> > +{
> > +    struct cpufreq_policy *policy = info;
> > +    struct hwp_drv_data *data = this_cpu(hwp_drv_data);
> > +    uint64_t val;
> > +
> > +    /*
> > +     * Package level MSR, but we don't have a good idea of packages here, 
> > so
> > +     * just do it everytime.
> > +     */
> > +    if ( rdmsr_safe(MSR_PM_ENABLE, val) )
> > +    {
> > +        hwp_err(policy->cpu, "rdmsr_safe(MSR_PM_ENABLE)\n");
> > +        data->curr_req.raw = -1;
> > +        return;
> > +    }
> > +
> > +    /* Ensure we don't generate interrupts */
> > +    if ( feature_hwp_notification )
> > +        wrmsr_safe(MSR_HWP_INTERRUPT, 0);
> > +
> > +    hwp_verbose("CPU%u: MSR_PM_ENABLE: %016lx\n", policy->cpu, val);
> > +    if ( !(val & PM_ENABLE_HWP_ENABLE) )
> > +    {
> > +        val |= PM_ENABLE_HWP_ENABLE;
> > +        if ( wrmsr_safe(MSR_PM_ENABLE, val) )
> > +        {
> > +            hwp_err(policy->cpu, "wrmsr_safe(MSR_PM_ENABLE, %lx)\n", val);
> > +            data->curr_req.raw = -1;
> > +            return;
> > +        }
> > +    }
> > +
> > +    if ( rdmsr_safe(MSR_HWP_CAPABILITIES, data->hwp_caps) )
> > +    {
> > +        hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_CAPABILITIES)\n");
> > +        goto error;
> > +    }
> > +
> > +    if ( rdmsr_safe(MSR_HWP_REQUEST, data->curr_req.raw) )
> > +    {
> > +        hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_REQUEST)\n");
> > +        goto error;
> > +    }
> > +
> > +    /*
> > +     * Check for APERF/MPERF support in hardware
> > +     * also check for boost/turbo support
> > +     */
> > +    intel_feature_detect(policy);
> > +
> > +    if ( feature_hdc )
> > +    {
> > +        if ( hdc_set_pkg_hdc_ctl(policy->cpu, opt_cpufreq_hdc) ||
> > +             hdc_set_pm_ctl1(policy->cpu, opt_cpufreq_hdc) ) {
>
> Please can these two if()s be joined and the well-placed brace be
> retained?

Sure.

> > +            hwp_err(policy->cpu, "Disabling HDC support\n");
> > +            feature_hdc = false;
> > +            goto error;
>
> Why? Can't you continue just with HDC turned off?

Yes, that is what I intended to implement after your earlier review,
but I failed to actually delete the goto.

> > +static void cf_check hwp_write_request(void *info)
> > +{
> > +    const struct cpufreq_policy *policy = info;
> > +    struct hwp_drv_data *data = this_cpu(hwp_drv_data);
> > +    union hwp_request hwp_req = data->curr_req;
> > +
> > +    data->ret = 0;
> > +
> > +    BUILD_BUG_ON(sizeof(union hwp_request) != sizeof(hwp_req.raw));
>
> You changed only the right side to not be sizeof(<type>).

Updated.  I was just focused on removing the uint64_t from your earlier comment.

> > +static int cf_check hwp_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > +    unsigned int cpu = policy->cpu;
> > +    struct hwp_drv_data *data;
> > +
> > +    data = xzalloc(struct hwp_drv_data);
> > +    if ( !data )
> > +        return -ENOMEM;
> > +
> > +    policy->governor = &cpufreq_gov_hwp;
> > +
> > +    per_cpu(hwp_drv_data, cpu) = data;
> > +
> > +    on_selected_cpus(cpumask_of(cpu), hwp_init_msrs, policy, 1);
>
> Could I talk you into moving the helper function immediately ahead of
> this (sole) one using it, much like you have it for hwp_cpufreq_target()
> and hwp_write_request()?

Yes. sounds good.  I'll move hdc_set_pkg_hdc_ctl(), hdc_set_pm_ctl1(),
hwp_get_cpu_speeds() as well since they are all called by
hwp_init_msrs().

> > +    if ( data->curr_req.raw == -1 )
> > +    {
> > +        hwp_err(cpu, "Could not initialize HWP properly\n");
> > +        per_cpu(hwp_drv_data, cpu) = NULL;
> > +        xfree(data);
> > +        return -ENODEV;
> > +    }
> > +
> > +    data->minimum = data->curr_req.min_perf;
> > +    data->maximum = data->curr_req.max_perf;
> > +    data->desired = data->curr_req.desired;
> > +    data->energy_perf = data->curr_req.energy_perf;
> > +    data->activity_window = data->curr_req.activity_window;
> > +
> > +    if ( cpu == 0 )
> > +        hwp_verbose("CPU%u: HWP_CAPABILITIES: %016lx\n", cpu, 
> > data->hwp_caps);
>
> While I'm fine with this (perhaps apart from you using "cpu == 0",
> which is an idiom we're trying to get rid of), ...

Oh, I didn't know that.  What is the preferred way to identify the
BSP?  This doesn't necessarily run on the BSP, so "cpu"/"policy->cpu"
is all we have to make a determination.

> > +    hwp_verbose("CPU%u: rdmsr HWP_REQUEST %016lx\n", cpu, 
> > data->curr_req.raw);
>
> ... this once-per-CPU message still looks to verbose to me. Perhaps
> for both:
> - print for the BSP,
> - print when AP value differs from BSP (albeit I don't know how
> [un]likely that is)?

On my test systems, the values have all been identical.  But your
differing values idea seems good.

> > +static int cf_check hwp_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> > +{
> > +    struct hwp_drv_data *data = per_cpu(hwp_drv_data, policy->cpu);
> > +    per_cpu(hwp_drv_data, policy->cpu) = NULL;
> > +    xfree(data);
>
> Nit: Style (blank line between declaration(s) and statement(s) please.
> (Also at least once again below.)
>
> > --- a/xen/drivers/cpufreq/cpufreq.c
> > +++ b/xen/drivers/cpufreq/cpufreq.c
> > @@ -63,12 +63,18 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
> >  /* set xen as default cpufreq */
> >  enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
> >
> > -static int __init cpufreq_cmdline_parse(const char *s);
> > +enum cpufreq_xen_opt cpufreq_xen_opts[2] = { CPUFREQ_xen, };
> > +unsigned int cpufreq_xen_cnt = 1;
>
> Looks like both can be __initdata?

Yes, thanks.

> As to the array initializer: For one Misra won't like the 2nd slot not
> initialized. Plus the implicit 0 there is nothing else than CPUFREQ_xen,
> which also ends up a little fragile. Perhaps 0 wants to stand for
> CPUFREQ_none (or whatever name you deem appropriate)?

:) I had a CPUFREQ_none originally, but dropped it as there was no
need for one with cpufreq_xen_cnt controlling the iteration.  I'll add
it back.  (gcc 12 at least complains that the switch in
cpufreq_driver_init() needs to handle CPUFREQ_none, so I'll just have
it return 0 in that case.)

Thanks for the review.

Regards,
Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.