Re: [Xen-devel] [PATCH 3/3] x86/smt: Support for enabling/disabling SMT at runtime
>>> On 03.04.19 at 12:17, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 03/04/2019 10:33, Jan Beulich wrote:
>>>>> On 02.04.19 at 21:57, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> Slightly RFC.  I'm not very happy with the continuation situation, but
>>> -EBUSY is the preexisting style and it seems like it is the only option
>>> from tasklet context.
>> Well, offloading the re-invocation to the caller isn't really nice.
>> Looking at the code, is there any reason why we couldn't use
>> the usual -ERESTART / hypercall_create_continuation()?  This
>> would require a little bit of re-work, in particular to allow
>> passing the vCPU into hypercall_create_continuation(), but
>> beyond that I can't see any immediate obstacles.  Though
>> clearly I wouldn't make this a prerequisite for the work here.
>
> The problem isn't really the ERESTART.  We could do some plumbing and
> make it work, but the real problem is that I can't stash the current cpu
> index in the sysctl data block across the continuation point.
>
> At the moment, the loop depends on, once all CPUs are in the correct
> state, getting through the for_each_present_cpu() loop without taking a
> further continuation.

But these are two orthogonal things: One is how to invoke the
continuation, and the other is where the continuation is to resume from.
I think the former is more important to address, as it affects what the
tools-side code needs to look like.

>>> Is it intentional that we can actually online and offline processors
>>> beyond maxcpus?  This is a consequence of the cpu parking logic.
>> I think so, yes.  That's meant to be a boot-time limit only, imo.
>> The runtime limit is nr_cpu_ids.
>>
>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -60,7 +60,7 @@ static bool __initdata opt_nosmp;
>>>  boolean_param("nosmp", opt_nosmp);
>>>  
>>>  /* maxcpus: maximum number of CPUs to activate. */
>>> -static unsigned int __initdata max_cpus;
>>> +unsigned int max_cpus;
>>>  integer_param("maxcpus", max_cpus);
>> As per above I don't think this change should be needed or
>> wanted, but if so for whatever reason, wouldn't the variable
>> better be __read_mostly?
>
> __read_mostly, yes, but as to whether the change is needed, that
> entirely depends on whether the change in semantics to maxcpus= was
> accidental or intentional.

Well, as said, I did consider this while putting together the parking
series, and I therefore consider it intentional.

>>> +        opt_smt = true;
>> Perhaps also bail early when the variable already has the
>> designated value?  And again perhaps right in the sysctl
>> handler?
>
> That is not safe across continuations.
>
> While it would be a very silly thing to do, there could be two callers
> which are fighting over whether SMT is disabled or enabled.

Oh, and actually not just that: The continuation then wouldn't do
anything anymore (unless you first reverted the setting, which in turn
wouldn't be right if any other CPU activity occurred in parallel, while
the continuation is still pending).

>>> +    for_each_present_cpu ( cpu )
>>> +    {
>>> +        if ( cpu == 0 )
>>> +            continue;
>> Is this special case really needed?  If so, perhaps worth a brief
>> comment?
>
> Trying to down cpu 0 is a hard -EINVAL.

But here we're on the CPU-up path.  Plus, for eventually supporting the
offlining of CPU 0, it would feel slightly better if you used
smp_processor_id() here.
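For reference, the usual -ERESTART idiom referred to near the top would
look roughly like the sketch below in other hypercall handlers.  This is
a sketch of the general idiom only, with made-up function names, not code
from this patch; its whole premise is that it executes in the calling
vCPU's context, which the tasklet-based helper here does not have:

    /* Inner, long-running operation: bail out when preemption is due. */
    static int some_long_running_op(void)
    {
        /* ... per-item work ... */
        if ( general_preempt_check() )
            return -ERESTART;
        return 0;
    }

    /* Top level of the hypercall: turn -ERESTART into a continuation,
     * so the guest transparently re-issues the same hypercall. */
    long do_some_hypercall(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
    {
        long ret = some_long_running_op();

        if ( ret == -ERESTART )
            ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
                                                "h", u_sysctl);
        return ret;
    }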
>>> +        if ( cpu >= max_cpus )
>>> +            break;
>>> +
>>> +        if ( x86_cpu_to_apicid[cpu] & sibling_mask )
>>> +            ret = cpu_up_helper(_p(cpu));
>> Shouldn't this be restricted to CPUs a sibling of which is already
>> online?  And widened at the same time, to also online thread 0
>> if one of the other threads is already online?
>
> Unfortunately, that turns into a rat's nest very very quickly, which is
> why I gave up and simplified the semantics to strictly "this shall
> {on,off}line the non-zero sibling threads".

Okay, if that's the intention, then I can certainly live with this.  But
it needs to be called out at the very least in the public header.  (It
might be worthwhile setting up a flag right away for "full" behavior,
but leave acting upon it unimplemented.)  It also wouldn't hurt if the
patch description already set expectations accordingly.

Then again, considering your "maxcpus=" related question, it would
certainly be odd for people to see non-zero threads come online here
when they've intentionally left entire cores or nodes offline for
whatever reason.  Arguably that's not something to expect people would
commonly do, and hence it may not be worth wasting meaningful extra
effort on.  But, as above, such "oddities" should be spelled out, such
that it can be recognized that they're not oversights.

>> I also notice that the two functions are extremely similar, and
>> hence it might be worthwhile to consider folding them, with the
>> caller controlling the behavior via the so far unused function
>> parameter (at which point the related remark of mine on patch
>> 2 would become inapplicable).
>
> By passing the plug boolean in via data?

Yes.

>>> --- a/xen/include/public/sysctl.h
>>> +++ b/xen/include/public/sysctl.h
>>> @@ -246,8 +246,17 @@ struct xen_sysctl_get_pmstat {
>>>  struct xen_sysctl_cpu_hotplug {
>>>      /* IN variables */
>>>      uint32_t cpu;   /* Physical cpu. */
>>> +
>>> +    /* Single CPU enable/disable. */
>>>  #define XEN_SYSCTL_CPU_HOTPLUG_ONLINE  0
>>>  #define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1
>>> +
>>> +    /*
>>> +     * SMT enable/disable.  Caller must zero the 'cpu' field to begin,
>>> +     * and ignore it on completion.
>>> +     */
>>> +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE  2
>>> +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE 3
>> Is the "cpu" field constraint mentioned in the comment just a
>> precaution?  I can't see you encode anything into that field, or
>> use it upon getting re-invoked.  I assume that's because of the
>> expectation that only actual onlining/offlining would potentially
>> take long, while iterating over all present CPUs without further
>> action ought to be fast enough.
>
> Ah - that was stale from before I encountered the "fun" of continuations
> from tasklet context.
>
> I would prefer to find a better way, but short of doing a full vcpu
> context switch, I don't see an option.

And I don't think there's a strong need.  It should just be made clear
(again in the description) that the remark here is just a precaution at
this time, unless you want to drop it altogether.

One thing you may want to do though:

        /* Tolerate already-online siblings. */
        if ( ret == -EEXIST )
        {
            ret = 0;
            continue;
        }

to bypass the general_preempt_check() in that case, such that you can
guarantee making forward progress.

Jan
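As a rough illustration of the folding suggestion and the -EEXIST
handling discussed above, a combined helper might look something like the
sketch below.  The name smt_updown_helper, the sibling_mask computation,
and the exact skip conditions are assumptions made for illustration, not
code taken from the patch:

    /* Direction is passed via the otherwise unused 'data' parameter. */
    static long smt_updown_helper(void *data)
    {
        bool up = (bool)(uintptr_t)data;
        unsigned int cpu;
        int ret = 0;

        opt_smt = up;

        for_each_present_cpu ( cpu )
        {
            /* Never touch the CPU this is running on. */
            if ( cpu == smp_processor_id() )
                continue;
            if ( cpu >= max_cpus )
                break;

            /* Only non-zero sibling threads are acted upon. */
            if ( !(x86_cpu_to_apicid[cpu] & sibling_mask) )
                continue;

            ret = up ? cpu_up_helper(_p(cpu)) : cpu_down_helper(_p(cpu));

            /* Tolerate an already-online sibling: nothing was done, so
             * skip the preemption check and keep making progress. */
            if ( up && ret == -EEXIST )
            {
                ret = 0;
                continue;
            }
            if ( ret )
                break;

            /* Tasklet context: -EBUSY asks the caller to retry the sysctl. */
            if ( general_preempt_check() )
            {
                ret = -EBUSY;
                break;
            }
        }

        return ret;
    }

Both sysctl sub-ops would then presumably schedule this one helper
through the existing continue_hypercall_on_cpu() path, with the
plug/unplug decision encoded in the data pointer, keeping the enable and
disable flows from drifting apart.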