[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 06/15] cpufreq: Add Hardware P-State (HWP) driver


  • To: Jason Andryuk <jandryuk@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 21 Jun 2023 09:58:39 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=DAfFwxHXDSjBDLqPWIAXU5ZPqOr60cP91v0ziC8JE8A=; b=Wd7hO93skmqVZ3iS6oNJ5DFz7j95rZj/8kKYmm2Qhb7OPq2OP4JxD//XOpJX9b8KnoK7RzLN0ZsLgTuaWvi8WhBb6ow53pFnzH1nfuDx0gmWx89i1BunODK8G4zoW7MXIyGBpuBGMUeurMqy/fvMW+nYOCRi4zlXbAQAS/+wZ6GGVpcXgUhbrLgu/wBCzPuxDUqpaHoqkH8nmByq9Jwjy35BR/0lpTp3lunjHFSUIaCTF/UGZ2lOlh6dCm9aBsx4Cec9+3VaHhQqXayzR+EB6/hJ5AwZA61xNs+f1TanvyPOpPQIjSWI663/pa3ffQiQ2jwySZXIAI5+9IOAQxI4XQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Q1bNX1X6YgGGLMS19Dbm83EdrEd/IKGUdns+7WRSX1DOyyHQ8xIKQ6cqYKhZ/EqUWWa6qbUn9ZlNu9j/Oxp4AtLFNN4EFlSKZqcaHoQlAYHBXISyXB2p0TPDmUkkq/TRitXJvWCcuhy9a55fuiIMGNEs9fWLu9hrZ4l21c+ZI3CK4Jc06aDJ32fw4QnSLBFVRMJ5ngcOgi+0HV95iPXyjXTmay4kfUyuggt0lN2jWLpxsVH16HjxgDTLtNJqg6dtPqVLa+HqXsx5X8MmswWwJgvVR2AZOLOYGG4hBHCeFkerT7EDxAlrsgXSPnrGHRXpoO/0QzzOu06Gj2qUk+6x5Q==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 21 Jun 2023 07:59:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.06.2023 20:14, Jason Andryuk wrote:
> On Mon, Jun 19, 2023 at 7:38 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 14.06.2023 20:02, Jason Andryuk wrote:
>>> Falling back from cpufreq=hwp to cpufreq=xen is a more user-friendly
>>> choice than disabling cpufreq when HWP is not available.  Specifying
>>> cpufreq=hwp indicates the user wants cpufreq, so, if HWP isn't
>>> available, it makes sense to give them the cpufreq that can be
>>> supported.  i.e. I can't see a user only wanting cpufreq=hwp or
>>> cpufreq=none, but not cpufreq=xen.
>>
>> I continue to disagree with this, and I'd like to ask for another
>> maintainer's opinion. To me the described behavior is like getting
>> pears when having asked for apples. I nevertheless agree that
>> having said fallback as an option, but I'd see that done by giving
>> meaning to something like "cpufreq=hwp,xen" (without having checked
>> whether that doesn't have meaning already, i.e. just to give you an
>> idea).
> 
> I know you disagree with the approach.  I implemented what would be
> useful to me and Marek.  It felt counterproductive to implement a hard
> failure mode that is unsuitable for my use case.  Also there was the
> possibility you wouldn't mind when you saw how it was implemented.
> 
> Yeah, asking for an apple and getting a pear can be the wrong thing if
> your recipe calls for apples.  But getting *some* fruit can be better
> than no fruit if you are hungry.

;-)

> As implemented here, with HWP default disabled,
> no command line -> default cpufreq=xen
> cpufreq=xen -> only cpufreq=xen
> cpufreq=hwp -> try hwp and fallback to cpufreq=xen
> 
> If/when HWP is default enabled:
> no command line -> try hwp and fallback to cpufreq=xen
> cpufreq=hwp -> try hwp and fallback to cpufreq=xen
> cpufreq=xen -> cpufreq=xen

At which point the question would be why to have such an option in
the first place. Plus how to specify that fallback to "xen" is not
wanted.

> Unless some other idea comes to me, I guess I'll look at implementing
> "cpufreq=hwp,xen".

Thanks.

>>> +static bool hwp_handle_option(const char *s, const char *end)
>>> +{
>>> +    int ret;
>>> +
>>> +    if ( strncmp(s, "verbose", 7) == 0 )
>>> +    {
>>> +        s += 7;
>>> +        if ( *s == '=' )
>>> +        {
>>> +            s++;
>>> +            cpufreq_verbose = !!simple_strtoul(s, NULL, 0);
>>> +
>>> +            return true;
>>> +        }
>>> +
>>> +        if ( end == NULL ||
>>> +             (end == s && (*end == '\0' || *end == ',')) )
>>> +        {
>>> +            cpufreq_verbose = true;
>>> +
>>> +            return true;
>>> +        }
>>> +
>>> +        return false;
>>> +    }
>>
>> Would be nice if the handling of "verbose" didn't need duplicating here.
>> However, if that's unavoidable, did you consider handling this as a
>> proper boolean instead of the custom way cpufreq_handle_common_option()
>> also uses?
> 
> It seemed more complicated to try to re-use the "verbose" handling
> from cpufreq_handle_common_option(), especially since minfreq and
> maxfreq are also in there.
> 
> I didn't use proper boolean parsing to remain consistent with
> cpufreq_handle_common_option() and the command line option
> documentation.  I'm fine with switching it to a proper boolean if
> that's what you want.

It would be nice if you could do that here, but I won't insist.

>>> +        goto error;
>>> +    }
>>> +
>>> +    /*
>>> +     * Check for APERF/MPERF support in hardware
>>> +     * also check for boost/turbo support
>>> +     */
>>> +    intel_feature_detect(policy);
>>> +
>>> +    if ( feature_hdc )
>>> +    {
>>> +        if ( hdc_set_pkg_hdc_ctl(policy->cpu, opt_cpufreq_hdc) )
>>> +            goto error;
>>> +        if ( hdc_set_pm_ctl1(policy->cpu, opt_cpufreq_hdc) )
>>> +            goto error;
>>
>> If either of these fails the very first time through (presumably for the
>> BSP), wouldn't a better course of action be to clear feature_hdc?
> 
> So if HWP is working, but the HDC part fails, just clear feature_hdc
> but keep using HWP?  I don't know how likely that is to happen, but
> that seems reasonable.

Just to answer the question - yes, that's what I was thinking of. Maybe
accompanied by a log message.

>>> +static int cf_check hwp_cpufreq_target(struct cpufreq_policy *policy,
>>> +                                       unsigned int target_freq,
>>> +                                       unsigned int relation)
>>> +{
>>> +    unsigned int cpu = policy->cpu;
>>> +    struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
>>> +    /* Zero everything to ensure reserved bits are zero... */
>>> +    union hwp_request hwp_req = { .raw = 0 };
>>> +
>>> +    /* .. and update from there */
>>> +    hwp_req.min_perf = data->minimum;
>>> +    hwp_req.max_perf = data->maximum;
>>> +    hwp_req.desired = data->desired;
>>> +    hwp_req.energy_perf = data->energy_perf;
>>> +    if ( feature_hwp_activity_window )
>>> +        hwp_req.activity_window = data->activity_window;
>>> +
>>> +    if ( hwp_req.raw == data->curr_req.raw )
>>> +        return 0;
>>> +
>>> +    data->curr_req = hwp_req;
>>> +
>>> +    hwp_verbose("CPU%u: wrmsr HWP_REQUEST %016lx\n", cpu, hwp_req.raw);
>>> +    on_selected_cpus(cpumask_of(cpu), hwp_write_request, policy, 1);
>>
>> Is this at risk of being too verbose when the verbose option as given?
> 
> For my client use case, it seems fine since there aren't too many
> CPUs.  But I think you are correct that for a server use case it would
> be too much.  Hmmm.  Do you think I should drop it or make it
> ratelimited?

I think it may have been useful for you during development, but I'd
rather see it dropped. Anyone needing to really fiddle with the
driver can add back whatever logging they deem necessary for what
they do.

>>> +    return data->ret;
>>> +}
>>> +
>>> +static int cf_check hwp_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>> +{
>>> +    unsigned int cpu = policy->cpu;
>>> +    struct hwp_drv_data *data;
>>> +
>>> +    data = xzalloc(struct hwp_drv_data);
>>> +    if ( !data )
>>> +        return -ENOMEM;
>>> +
>>> +    if ( cpufreq_opt_governor && strcmp(cpufreq_opt_governor->name,
>>> +                                        cpufreq_gov_hwp.name) )
>>
>> Nit: I think this would better be
>>
>>     if ( cpufreq_opt_governor &&
>>          strcmp(cpufreq_opt_governor->name, cpufreq_gov_hwp.name) )
> 
> Sounds good.  Actually, with the top level cpufreq=hwp,
> cpufreq_opt_governor shouldn't be set anymore.  I left it since it
> would point out something being unexpected.  policy->governor is set
> unilaterally, so maybe this check and message should just be dropped.
> Thoughts?

I didn't realize this could be dropped. If it can be, please do.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.