[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.17 v4 2/2] amd: remove VIRT_SC_MSR_HVM synthetic feature
On Tue, Nov 15, 2022 at 04:21:07PM +0000, Andrew Cooper wrote: > On 15/11/2022 13:26, Roger Pau Monne wrote: > > Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched > > on vm{entry,exit} there's no need to use a synthetic feature bit for > > it anymore. > > > > Remove the bit and instead use a global variable. > > > > No functional change intended. > > > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> > > Release-acked-by: Henry Wang <Henry.Wang@xxxxxxx> > > This is definitely not appropriate for 4.17, but it's a performance > regression in general, hence my firm and repeated objection to this > style of patch. While I don't have any objections in deferring this past 4.17, none of the modified paths are performance sensitive AFAICT. > General synthetic bits have existed for several decades longer than > alternatives. It has never ever been a rule, or even a recommendation, > to aggressively purge the non-alternative bits, because it's a provably > bad thing to do. > > > You are attempting a micro-optimisation, that won't produce any > improvement at all in centuries, while... Oh, I wasn't attempting any micro-optimizations TBH, just didn't see the need to keep this as a synthetic feature, and generally consider better to use a global variable because it's IMO easier to follow. > > diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c > > index a332087604..9e3b9094d3 100644 > > --- a/xen/arch/x86/cpu/amd.c > > +++ b/xen/arch/x86/cpu/amd.c > > @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe); > > /* Signal whether the ACPI C1E quirk is required. */ > > bool __read_mostly amd_acpi_c1e_quirk; > > bool __ro_after_init amd_legacy_ssbd; > > +bool __ro_after_init amd_virt_spec_ctrl; > > ... actually expending .rodata with something 8 times less efficiently > packed, and ... > > > > > static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo, > > unsigned int *hi) > > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c > > index 822f9ace10..acc2f606ce 100644 > > --- a/xen/arch/x86/cpuid.c > > +++ b/xen/arch/x86/cpuid.c > > @@ -3,6 +3,7 @@ > > #include <xen/param.h> > > #include <xen/sched.h> > > #include <xen/nospec.h> > > +#include <asm/amd.h> > > ... (Specific to this instance) making life harder for the people trying > to make CONFIG_AMD work, and ... That's indeed a point, albeit I think adding a `#define amd_virt_spec_ctrl false` won't be the bigger of the problems when dealing with CONFIG_AMD, and will need to be done for other AMD specific variables anyway. > > diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c > > index 4e53056624..0b94af6b86 100644 > > --- a/xen/arch/x86/spec_ctrl.c > > +++ b/xen/arch/x86/spec_ctrl.c > > @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk > > thunk, uint64_t caps) > > (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) || > > boot_cpu_has(X86_FEATURE_SC_RSB_HVM) || > > boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) || > > - boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) || > > + amd_virt_spec_ctrl || > > ... breaking apart a single TEST instruction, which not only adds an > extra conditional merge, but now hits an cold-ish cache line everywhere > it's used. Why does performance matter here? It's an init function that prints the speculation related settings to the screen, so that's likely to be many times slower that accessing a cold cache line. > Count how many synthetic feature bits it will actually take to change > the per-cpu data size, and realise that, when it will take more than 200 > years at the current rate of accumulation, any believe that this is an > improvement to be had disappears. > > Yes, it is only a micro regression, but you need a far better > justification than "there is a gain of 64 bytes per CPU which will be > non-theoretical in more than 200 years" when traded off vs the actual > 512 bytes, plus extra code bloat bloat, plus reduced locality of data > that this "improvement" genuinely costs today. I wasn't considering any of the above when proposing the change, my only motivation was that global variables are clearer to use than synthetic features, and I didn't see a need for a synthetic feature in this case. If we agree the above possible performance regressions are worth it I'm fine keeping it as-is. Now that I realize it amd_virt_spec_ctrl could even be plain __init. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |