[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 06/15] cpufreq: Add Hardware P-State (HWP) driver



On Tue, Jul 11, 2023 at 4:18 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 10.07.2023 17:22, Jason Andryuk wrote:
> > On Mon, Jul 10, 2023 at 9:13 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 06.07.2023 20:54, Jason Andryuk wrote:
> >>> @@ -510,6 +510,22 @@ choice of `dom0-kernel` is deprecated and not 
> >>> supported by all Dom0 kernels.
> >>>  * `<maxfreq>` and `<minfreq>` are integers which represent max and min 
> >>> processor frequencies
> >>>    respectively.
> >>>  * `verbose` option can be included as a string or also as 
> >>> `verbose=<integer>`
> >>> +  for `xen`.  It is a boolean for `hwp`.
> >>> +* `hwp` selects Hardware-Controlled Performance States (HWP) on 
> >>> supported Intel
> >>> +  hardware.  HWP is a Skylake+ feature which provides better CPU power
> >>> +  management.  The default is disabled.  If `hwp` is selected, but 
> >>> hardware
> >>> +  support is not available, Xen will fallback to cpufreq=xen.
> >>> +* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC).  HDC 
> >>> enables the
> >>> +  processor to autonomously force physical package components into idle 
> >>> state.
> >>> +  The default is enabled, but the option only applies when `hwp` is 
> >>> enabled.
> >>> +
> >>> +There is also support for `;`-separated fallback options:
> >>> +`cpufreq=hwp,verbose;xen`.  This first tries `hwp` and falls back to 
> >>> `xen`
> >>> +if unavailable.
> >>
> >> In the given example, does "verbose" also apply to the fallback case? If 
> >> so,
> >> perhaps better "cpufreq=hwp;xen,verbose", to eliminate that ambiguity?
> >
> > Yes, "verbose" is applied to both.  I can make the change.  I
> > mentioned it in the commit message, but I'll mention it here as well.
>
> FTAOD my earlier comment implied that the spelling form you use above
> should not even be accepted when parsing. I.e. it was not just about
> the doc aspect.

Oh.  So what exactly do you want then?

There is a single cpufreq_verbose variable today that is set by either
cpufreq=hwp,verbose or cpufreq=xen,verbose.  Is that okay, or should
the "xen" and "hwp" each get a separate variable?

Do you only want to allow a single trailing "verbose" to apply to all
of cpufreq (cpufreq=$foo,verbose)?  Or do you want "verbose" to be
only valid for "xen"?  Both cpufreq_cmdline_parse() and
hwp_cmdline_parse() just loop over their options and don't care about
order, even though the documentation lists verbose last.  Would you
want "cpufreq=hwp,verbose,hdc" to fail to parse?

All parsing is done upfront before knowing whether "xen" or "hwp" will
be used as the cpufreq driver, so there is a trickiness for
implementing "verbose" only for one option.  Similarly,
"cpufreq=hwp,invalid;xen" will try "hwp" (but not "xen")  since the
live variables are updated.  Even without this patch, cpufreq will be
configured up to an invalid parameter.

FYI, cpufreq=xen;hwp will be accepted.  "xen" shouldn't fail, so it
doesn't make sense to specify that.  But it didn't seem necessary to
prevent it.

> >>> +            {
> >>> +                switch ( cpufreq_xen_opts[i] )
> >>> +                {
> >>> +                case CPUFREQ_xen:
> >>> +                    ret = cpufreq_register_driver(&acpi_cpufreq_driver);
> >>> +                    break;
> >>> +                case CPUFREQ_hwp:
> >>> +                    ret = hwp_register_driver();
> >>> +                    break;
> >>> +                }
> >>> +
> >>> +                if ( ret == 0 )
> >>> +                    break;
> >>> +            }
> >>>              break;
> >>
> >> In this model any kind of failure results in the fallback to be tried
> >> (and the fallback's error to be returned to the caller rather than
> >> the primary one). This may or may not be what we actually want;
> >> personally I would have expected
> >>
> >>                 if ( ret != -ENODEV )
> >>                     break;
> >>
> >> or some such instead.
> >
> > I guess this comes back to our fruit preferences. :)
>
> Does it? It's not just a style question here, but one of when / whether
> to use the fallback.

Indeed.  I was trying to allude back to the earlier conversation.  Do
we try the pears only when there are no apples or also when the apples
are "bad"?  Only falling back for no apples is fine.

> > I can switch it around like that, and make hwp_register_driver()
> > return -ENODEV for hwp_available() returning false.
>
> Thanks.
>
> >>> +int __init hwp_cmdline_parse(const char *s, const char *e)
> >>> +{
> >>> +    do
> >>> +    {
> >>> +        const char *end = strpbrk(s, ",;");
> >>> +
> >>> +        if ( s && !hwp_handle_option(s, end) )
> >>
> >> This check of s not being NULL comes too late, as strpbrk() would have
> >> de-referenced it already. Considering ...
> >>
> >>> +        {
> >>> +            printk(XENLOG_WARNING "cpufreq/hwp: option '%s' not 
> >>> recognized\n",
> >>> +                   s);
> >>> +
> >>> +            return -1;

I should change this to -EINVAL.

> >>> +        }
> >>> +
> >>> +        s = end ? ++end : end;
> >>> +    } while ( s && s < e );
> >>
> >> ... this it probably wants to move even ahead of the loop.
> >
> > I'll switch from do/while to just while and then the NULL check will
> > be covered.  In practice, this function is never called with s ==
> > NULL.
>
> In which case - why not leave things largely as they are, simply dropping
> the odd check of s?

Sure.

> >>> +    if ( data->curr_req.raw == -1 )
> >>> +    {
> >>> +        hwp_err(cpu, "Could not initialize HWP properly\n");
> >>> +        per_cpu(hwp_drv_data, cpu) = NULL;
> >>> +        xfree(data);
> >>> +        return -ENODEV;
> >>> +    }
> >>> +
> >>> +    data->minimum = data->curr_req.min_perf;
> >>> +    data->maximum = data->curr_req.max_perf;
> >>> +    data->desired = data->curr_req.desired;
> >>> +    data->energy_perf = data->curr_req.energy_perf;
> >>> +    data->activity_window = data->curr_req.activity_window;
> >>> +
> >>> +    if ( cpu == 0 )
> >>> +        hwp_verbose("CPU%u: HWP_CAPABILITIES: %016lx\n", cpu, 
> >>> data->hwp_caps);
> >>
> >> While I'm fine with this (perhaps apart from you using "cpu == 0",
> >> which is an idiom we're trying to get rid of), ...
> >
> > Oh, I didn't know that.  What is the preferred way to identify the
> > BSP?
>
> Sometimes we pass a separate boolean to functions, in other cases we
> check whether a struct cpuinfo_x86 * equals &boot_cpu_info. The
> latter clearly can't be used here, and the former doesn't look to be
> a good fit either. However, ...
>
> >  This doesn't necessarily run on the BSP, so "cpu"/"policy->cpu"
> > is all we have to make a determination.
>
> ... isn't it, conversely, the case that the function only ever runs
> on "cpu" when it is the BSP? In which case "cpu == smp_processor_id()"
> ought to do the trick.

The calls do not necessarily run from the BSP.  The cpufreq init
callbacks run later when dom0 uploads the ACPI processor data.  If you
don't want "cpu == 0", maybe just print for the first CPU regardless
of number, and then print differences from that?

Regards,
Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.