[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] Subject: x86/PAT: Report PAT on CPUs that support PAT without MTRR



On 7/13/2022 3:22 PM, Chuck Zmudzinski wrote:
> On 7/13/2022 3:07 PM, Chuck Zmudzinski wrote:
> > On 7/13/2022 9:45 AM, Juergen Gross wrote:
> > > >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> > > >> And in addition, if we are going to backport this patch to
> > > >> all current stable branches, we better have a really, really,
> > > >> good reason for changing the behavior of "nopat" on Xen.
> > > >>
> > > >> Does such a reason exist?
> > > > 
> > > > Well, the simple reason is: It doesn't work the same way under Xen
> > > > and non-Xen (in turn because, before my patch or whatever equivalent
> > > > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > > > ought to behave the same everywhere, imo.
> > >
> > > There is Documentation/x86/pat.rst which rather clearly states, how
> > > "nopat" is meant to work. It should not change the contents of the
> > > PAT MSR and keep it just as it was set at boot time (the doc talks
> > > about the "BIOS" setting of the MSR, and I guess in the Xen case
> > > the hypervisor is kind of acting as the BIOS).
> > >
> > > The question is, whether "nopat" needs to be translated to
> > > pat_enabled() returning "false".
> >
> > When I started working on a re-factoring effort of the logic
> > surrounding pat_enabled(), I noticed there are five different
> > reasons in the current code for setting pat_disabled to true,
> > which IMO is what should be a redundant variable that should
> > always be equal !pat_enabled() and !pat_bp_enabled, but that
> > unfortunately is not the case. The five reasons for setting
> > pat_disabled to true are given as message strings:
> >
> > 1. "MTRRs disabled, skipping PAT initialization too."
> > 2. "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
> > 3. "PAT support disabled via boot option."
> > 4. "PAT not supported by the CPU."
> > 5. "PAT support disabled by the firmware."
> >
> > The only effect of setting pat_disabled to true is to inhibit
> > the execution of pat_init(), but it does not inhibit the execution
> > of init_cache_modes(), which is for handling all these cases
> > when pat_init() was skipped. The Xen case is one of those
> > cases, so in the Xen case, pat_disabled will be true yet the
> > only way to fix the current regression and the five-year-old
> > commit is by setting pat_bp_enabled to true so pat_enabled()
> > will return true. So to fix the five-year-old commit, we must have
> >
> > pat_enabled() != pat_disabled
> >
> > Something is wrong with this logic, that is why I wanted to precede
> > my fix with some re-factoring that will change some variable
> > and function names and modify some comments before trying
> > to fix the five-year-old commit, so that we will never have a situation
> > when pat_enabled() != pat_disabled.
> >
> > Chuck
> Sorry, I meant to say,
>
> To fix the five-year-old commit, we must have
>
> pat_enabled() != !pat_disabled or pat_enabled() == pat_disabled,
>
> and there is something wrong with that logic.
>
> Chuck

So to summarize, I think this means that to be comfortable
fixing the five-year-old commit and the current regression
by artificially setting pat_bp_enabled and pat_enabled() to
true, something which both my patch and Jan's patch does,
we need to come to a new understanding of what the
static boolean variable pat_disabled in
arch/x86/mm/pat/memtype.c in the code really means.

The fact is, we have a regression and the only fix we
can find is to try to make pat_enabled() == pat_disabled

I need to stop thinking about this for a while. It is time
for those who have authority to fix this regression to
make some comments about how they think this should
be fixed.

Chuck



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.