[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 06/15] cpufreq: Add Hardware P-State (HWP) driver
On Mon, Jul 10, 2023 at 9:13 AM Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 06.07.2023 20:54, Jason Andryuk wrote: > > @@ -510,6 +510,22 @@ choice of `dom0-kernel` is deprecated and not > > supported by all Dom0 kernels. > > * `<maxfreq>` and `<minfreq>` are integers which represent max and min > > processor frequencies > > respectively. > > * `verbose` option can be included as a string or also as > > `verbose=<integer>` > > + for `xen`. It is a boolean for `hwp`. > > +* `hwp` selects Hardware-Controlled Performance States (HWP) on supported > > Intel > > + hardware. HWP is a Skylake+ feature which provides better CPU power > > + management. The default is disabled. If `hwp` is selected, but hardware > > + support is not available, Xen will fallback to cpufreq=xen. > > +* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC). HDC enables > > the > > + processor to autonomously force physical package components into idle > > state. > > + The default is enabled, but the option only applies when `hwp` is > > enabled. > > + > > +There is also support for `;`-separated fallback options: > > +`cpufreq=hwp,verbose;xen`. This first tries `hwp` and falls back to `xen` > > +if unavailable. > > In the given example, does "verbose" also apply to the fallback case? If so, > perhaps better "cpufreq=hwp;xen,verbose", to eliminate that ambiguity? Yes, "verbose" is applied to both. I can make the change. I mentioned it in the commit message, but I'll mention it here as well. > > --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c > > +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c > > @@ -642,7 +642,24 @@ static int __init cf_check cpufreq_driver_init(void) > > switch ( boot_cpu_data.x86_vendor ) > > { > > case X86_VENDOR_INTEL: > > - ret = cpufreq_register_driver(&acpi_cpufreq_driver); > > + unsigned int i; > > At the moment we still don't mix declarations and statements, i.e. all > declarations have to be at the top of a block/scope. What iirc we do use > in a couple of places (and what hence you may want to do here as well) is > ... > > > + ret = -ENOENT; > > + > > + for ( i = 0; i < cpufreq_xen_cnt; i++ ) > > ... declare the induction variable inside the loop header. Sounds good, thanks. > > + { > > + switch ( cpufreq_xen_opts[i] ) > > + { > > + case CPUFREQ_xen: > > + ret = cpufreq_register_driver(&acpi_cpufreq_driver); > > + break; > > + case CPUFREQ_hwp: > > + ret = hwp_register_driver(); > > + break; > > + } > > + > > + if ( ret == 0 ) > > + break; > > + } > > break; > > In this model any kind of failure results in the fallback to be tried > (and the fallback's error to be returned to the caller rather than > the primary one). This may or may not be what we actually want; > personally I would have expected > > if ( ret != -ENODEV ) > break; > > or some such instead. I guess this comes back to our fruit preferences. :) I can switch it around like that, and make hwp_register_driver() return -ENODEV for hwp_available() returning false. > > +static bool hwp_handle_option(const char *s, const char *end) > > +{ > > + int ret; > > + > > + ret = parse_boolean("verbose", s, end); > > + if ( ret >= 0 ) { > > Nit: Style (brace placement). > > > + cpufreq_verbose = ret; > > + return true; > > + } > > + > > + ret = parse_boolean("hdc", s, end); > > + if ( ret >= 0 ) { > > Same here. Thanks. Sorry about those. > > + opt_cpufreq_hdc = ret; > > + return true; > > + } > > + > > + return false; > > +} > > + > > +int __init hwp_cmdline_parse(const char *s, const char *e) > > +{ > > + do > > + { > > + const char *end = strpbrk(s, ",;"); > > + > > + if ( s && !hwp_handle_option(s, end) ) > > This check of s not being NULL comes too late, as strpbrk() would have > de-referenced it already. Considering ... > > > + { > > + printk(XENLOG_WARNING "cpufreq/hwp: option '%s' not > > recognized\n", > > + s); > > + > > + return -1; > > + } > > + > > + s = end ? ++end : end; > > + } while ( s && s < e ); > > ... this it probably wants to move even ahead of the loop. I'll switch from do/while to just while and then the NULL check will be covered. In practice, this function is never called with s == NULL. > > +static int hdc_set_pkg_hdc_ctl(unsigned int cpu, bool val) > > +{ > > + uint64_t msr; > > + > > + if ( rdmsr_safe(MSR_PKG_HDC_CTL, msr) ) > > + { > > + hwp_err(cpu, "rdmsr_safe(MSR_PKG_HDC_CTL)\n"); > > + return -1; > > + } > > + > > + if ( val ) > > + msr |= PKG_HDC_CTL_HDC_PKG_ENABLE; > > + else > > + msr &= ~PKG_HDC_CTL_HDC_PKG_ENABLE; > > + > > + if ( wrmsr_safe(MSR_PKG_HDC_CTL, msr) ) > > + { > > + hwp_err(cpu, "wrmsr_safe(MSR_PKG_HDC_CTL): %016lx\n", msr); > > + return -1; > > + } > > + > > + return 0; > > +} > > Please can you use either boolean return values or proper 0 / -errno > ones? (Same again then in the subsequent function.) Sure, I'll use booleans. > > +static void cf_check hwp_init_msrs(void *info) > > +{ > > + struct cpufreq_policy *policy = info; > > + struct hwp_drv_data *data = this_cpu(hwp_drv_data); > > + uint64_t val; > > + > > + /* > > + * Package level MSR, but we don't have a good idea of packages here, > > so > > + * just do it everytime. > > + */ > > + if ( rdmsr_safe(MSR_PM_ENABLE, val) ) > > + { > > + hwp_err(policy->cpu, "rdmsr_safe(MSR_PM_ENABLE)\n"); > > + data->curr_req.raw = -1; > > + return; > > + } > > + > > + /* Ensure we don't generate interrupts */ > > + if ( feature_hwp_notification ) > > + wrmsr_safe(MSR_HWP_INTERRUPT, 0); > > + > > + hwp_verbose("CPU%u: MSR_PM_ENABLE: %016lx\n", policy->cpu, val); > > + if ( !(val & PM_ENABLE_HWP_ENABLE) ) > > + { > > + val |= PM_ENABLE_HWP_ENABLE; > > + if ( wrmsr_safe(MSR_PM_ENABLE, val) ) > > + { > > + hwp_err(policy->cpu, "wrmsr_safe(MSR_PM_ENABLE, %lx)\n", val); > > + data->curr_req.raw = -1; > > + return; > > + } > > + } > > + > > + if ( rdmsr_safe(MSR_HWP_CAPABILITIES, data->hwp_caps) ) > > + { > > + hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_CAPABILITIES)\n"); > > + goto error; > > + } > > + > > + if ( rdmsr_safe(MSR_HWP_REQUEST, data->curr_req.raw) ) > > + { > > + hwp_err(policy->cpu, "rdmsr_safe(MSR_HWP_REQUEST)\n"); > > + goto error; > > + } > > + > > + /* > > + * Check for APERF/MPERF support in hardware > > + * also check for boost/turbo support > > + */ > > + intel_feature_detect(policy); > > + > > + if ( feature_hdc ) > > + { > > + if ( hdc_set_pkg_hdc_ctl(policy->cpu, opt_cpufreq_hdc) || > > + hdc_set_pm_ctl1(policy->cpu, opt_cpufreq_hdc) ) { > > Please can these two if()s be joined and the well-placed brace be > retained? Sure. > > + hwp_err(policy->cpu, "Disabling HDC support\n"); > > + feature_hdc = false; > > + goto error; > > Why? Can't you continue just with HDC turned off? Yes, that is what I intended to implement after your earlier review, but I failed to actually delete the goto. > > +static void cf_check hwp_write_request(void *info) > > +{ > > + const struct cpufreq_policy *policy = info; > > + struct hwp_drv_data *data = this_cpu(hwp_drv_data); > > + union hwp_request hwp_req = data->curr_req; > > + > > + data->ret = 0; > > + > > + BUILD_BUG_ON(sizeof(union hwp_request) != sizeof(hwp_req.raw)); > > You changed only the right side to not be sizeof(<type>). Updated. I was just focused on removing the uint64_t from your earlier comment. > > +static int cf_check hwp_cpufreq_cpu_init(struct cpufreq_policy *policy) > > +{ > > + unsigned int cpu = policy->cpu; > > + struct hwp_drv_data *data; > > + > > + data = xzalloc(struct hwp_drv_data); > > + if ( !data ) > > + return -ENOMEM; > > + > > + policy->governor = &cpufreq_gov_hwp; > > + > > + per_cpu(hwp_drv_data, cpu) = data; > > + > > + on_selected_cpus(cpumask_of(cpu), hwp_init_msrs, policy, 1); > > Could I talk you into moving the helper function immediately ahead of > this (sole) one using it, much like you have it for hwp_cpufreq_target() > and hwp_write_request()? Yes. sounds good. I'll move hdc_set_pkg_hdc_ctl(), hdc_set_pm_ctl1(), hwp_get_cpu_speeds() as well since they are all called by hwp_init_msrs(). > > + if ( data->curr_req.raw == -1 ) > > + { > > + hwp_err(cpu, "Could not initialize HWP properly\n"); > > + per_cpu(hwp_drv_data, cpu) = NULL; > > + xfree(data); > > + return -ENODEV; > > + } > > + > > + data->minimum = data->curr_req.min_perf; > > + data->maximum = data->curr_req.max_perf; > > + data->desired = data->curr_req.desired; > > + data->energy_perf = data->curr_req.energy_perf; > > + data->activity_window = data->curr_req.activity_window; > > + > > + if ( cpu == 0 ) > > + hwp_verbose("CPU%u: HWP_CAPABILITIES: %016lx\n", cpu, > > data->hwp_caps); > > While I'm fine with this (perhaps apart from you using "cpu == 0", > which is an idiom we're trying to get rid of), ... Oh, I didn't know that. What is the preferred way to identify the BSP? This doesn't necessarily run on the BSP, so "cpu"/"policy->cpu" is all we have to make a determination. > > + hwp_verbose("CPU%u: rdmsr HWP_REQUEST %016lx\n", cpu, > > data->curr_req.raw); > > ... this once-per-CPU message still looks to verbose to me. Perhaps > for both: > - print for the BSP, > - print when AP value differs from BSP (albeit I don't know how > [un]likely that is)? On my test systems, the values have all been identical. But your differing values idea seems good. > > +static int cf_check hwp_cpufreq_cpu_exit(struct cpufreq_policy *policy) > > +{ > > + struct hwp_drv_data *data = per_cpu(hwp_drv_data, policy->cpu); > > + per_cpu(hwp_drv_data, policy->cpu) = NULL; > > + xfree(data); > > Nit: Style (blank line between declaration(s) and statement(s) please. > (Also at least once again below.) > > > --- a/xen/drivers/cpufreq/cpufreq.c > > +++ b/xen/drivers/cpufreq/cpufreq.c > > @@ -63,12 +63,18 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list); > > /* set xen as default cpufreq */ > > enum cpufreq_controller cpufreq_controller = FREQCTL_xen; > > > > -static int __init cpufreq_cmdline_parse(const char *s); > > +enum cpufreq_xen_opt cpufreq_xen_opts[2] = { CPUFREQ_xen, }; > > +unsigned int cpufreq_xen_cnt = 1; > > Looks like both can be __initdata? Yes, thanks. > As to the array initializer: For one Misra won't like the 2nd slot not > initialized. Plus the implicit 0 there is nothing else than CPUFREQ_xen, > which also ends up a little fragile. Perhaps 0 wants to stand for > CPUFREQ_none (or whatever name you deem appropriate)? :) I had a CPUFREQ_none originally, but dropped it as there was no need for one with cpufreq_xen_cnt controlling the iteration. I'll add it back. (gcc 12 at least complains that the switch in cpufreq_driver_init() needs to handle CPUFREQ_none, so I'll just have it return 0 in that case.) Thanks for the review. Regards, Jason
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |