Re: [RFC] Avoid dom0/HVM performance penalty from MSR access tightening



Hi Roger,

See my other reply, which is more detailed. While enabling reads of
MSR_IA32_ENERGY_PERF_BIAS had no effect in my case, it is one of a handful of
MSRs that are writable but not readable. I believe this may lead to unexpected
behavior in read-check/modify-write sequences.

As such, I now see that my original patch, which made these MSRs globally
readable, is too lenient. Reads should likely be allowed only under the same
conditions as writes.

In my particular case, it looks like all my troubles resulted from the BIOS
setting MSR_IA32_THERM_CONTROL to an invalid value and the recent code change
prevented dom0 from seeing (and correcting) it...

Regards,

-Alex


On Fri, 2022-02-11 at 09:28 +0100, Roger Pau Monné wrote:
> On Thu, Feb 10, 2022 at 11:27:15AM -0600, Alex Olson wrote:
> > I'm seeing strange performance issues under Xen on a Supermicro server with
> > a Xeon D-1541 CPU caused by an MSR-related commit.
> > 
> > Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to
> > unknown MSRs'
> > surprisingly introduces a severe performance penalty where dom0 has about
> > 1/8th the normal CPU performance. Even when 'xenpm' is used to select the
> > performance governor and operate the CPU at maximum frequency (or when
> > booting with "cpufreq=xen,performance"), actual CPU performance is still
> > 1/2 of normal.
> > 
> > The patch below fixes it but I don't fully understand why.
> > 
> > Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> > guests (pinned to other CPUs) see the performance issues.
> 
> You only mention MSR_IA32_THERM_CONTROL here...
> 
> > For benchmarking purposes, I built a small C program that runs a "for loop"
> > for 4 billion iterations and timed its execution. In dom0, the performance
> > issues also cause HVM guest startup time to go from 9-10 seconds to almost
> > 80 seconds.
> > 
> > I assumed Xen was managing CPU frequency and thus blocking related MSR
> > access by dom0 (or any other domain). However, clearly something else
> > is happening and I don't understand why.
> > 
> > I initially attempted to copy the same logic as the write MSR case. This
> > was effective at fixing the dom0 performance issue, but still left other
> > domains running at 1/2 speed. Hence, the change below has no access control.
> > 
> > 
> > If anyone has any insight as to what is really happening, I would be all
> > ears, as I am unsure if the change below is a proper solution.
> > 
> > Thanks
> > 
> > -Alex
> > 
> > ---
> > ---
> >  xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> > index 7f4279a051..f254479bda 100644
> > --- a/xen/arch/x86/pv/emul-priv-op.c
> > +++ b/xen/arch/x86/pv/emul-priv-op.c
> > @@ -970,6 +970,18 @@ static int read_msr(unsigned int reg, uint64_t *val,
> >          *val = 0;
> >          return X86EMUL_OKAY;
> >  
> > +    /* being unable to read MSR_IA32_THERM_CONTROL seems to significantly affect
> > +     * dom0 and thus HVM guest startup performance, as well as PVH VMs.
> > +     */
> > +    case MSR_IA32_THERM_CONTROL:
> > +    case MSR_IA32_ENERGY_PERF_BIAS:
> 
> ...yet in the patch you also allow access to
> MSR_IA32_ENERGY_PERF_BIAS, which makes me wonder whether
> MSR_IA32_THERM_CONTROL is the only required one.
> 
> It could help to post full logs (Xen + Linux dmesg).
> 
> Is this reproducible with different Linux versions?
> 
> Thanks, Roger.




 

