Re: [PATCH 0/3] x86: make pat and mtrr independent from each other



On 7/18/2022 7:32 AM, Chuck Zmudzinski wrote:
> On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
> > Hi Juergen!
> >
> > On 15.07.22 16:25, Juergen Gross wrote:
> > > Today PAT can't be used without MTRR being available, unless MTRR is at
> > > least configured via CONFIG_MTRR and the system is running as Xen PV
> > > guest. In this case PAT is automatically available via the hypervisor,
> > > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> > > 
> > > As an additional complexity the availability of PAT can't be queried
> > > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > > to be disabled. This leads to some drivers believing that not all cache
> > > modes are available, resulting in failures or degraded functionality.
> > > 
> > > The same applies to a kernel built with no MTRR support: it won't
> > > allow to use the PAT MSR, even if there is no technical reason for
> > > that, other than setting up PAT on all cpus the same way (which is a
> > > requirement of the processor's cache management) is relying on some
> > > MTRR specific code.
> > > 
> > > Fix all of that by:
> > > 
> > > - moving the function needed by PAT from MTRR specific code one level
> > >   up
> > > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> > >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > > - removing the dependency of PAT on MTRR
> >
> > Thx for working on this. If you need to respin these patches for one
> > reason or another, could you do me a favor and add proper 'Link:' tags
> > pointing to all reports about this issue? e.g. like this:
> >
> >  Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> >
> > These tags are considered important by Linus[1] and others, as they
> > allow anyone to look into the backstory weeks or years from now. That is
> > why they should be placed in cases like this, as
> > Documentation/process/submitting-patches.rst and
> > Documentation/process/5.Posting.rst explain in more detail. I care
> > personally, because these tags make my regression tracking efforts a
> > whole lot easier, as they allow my tracking bot 'regzbot' to
> > automatically connect reports with patches posted or committed to fix
> > tracked regressions.
> >
> > [1] see for example:
> > https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@xxxxxxxxxxxxxx/
> > https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@xxxxxxxxxxxxxx/
> > https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@xxxxxxxxxxxxxx/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
>
> I echo Thorsten's thx for starting on this now instead of waiting until
> September which I think is when Juergen said he could start working
> on this last week. I agree with Thorsten that Link tags are needed.
> Since multiple patches have been proposed to fix this regression,
> perhaps a Link to each proposed patch, and a note that
> the original report identified a specific commit which when reverted
> also fixes it. IMO, this is all part of the backstory Thorsten refers to.
>
> It looks like with this approach, a fix will not be coming real soon,
> and Borislav Petkov also discouraged me from testing this
> patch set until I receive a ping telling me it is ready for testing,
> which seems to confirm that this regression will not be fixed
> very soon. Please correct me if I am wrong about how long
> it will take to fix it with this approach.
>
> Also, is there any guarantee this approach is endorsed by
> all the maintainers who will need to sign-off, especially
> Linus? I say this because some of the discussion on the
> earlier proposed patches makes me doubt this. I am especially
> referring to this discussion:
>
> https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@xxxxxxxxxxxxx/
>
> and also, here:
>
> https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@xxxxxxx/
>
> where Borislav Petkov argues that Linux should not be
> patched at all to fix this regression but instead the fix
> should come by patching the Xen hypervisor.
>
> So I have several questions, presuming at least the fix is going
> to be delayed for some time, and also presuming this approach
> is not yet an approach that has the blessing of the maintainers
> who will need to sign-off:
>
> 1. Can you estimate when the patch series will be ready for
> testing and suitable for a prepatch or RC release?
>
> 2. Can you estimate when the patch series will be ready to be
> merged into the mainline release? Is there any hope it will be
> fixed before the next longterm release hosted on kernel.org?
>
> 3. Since a fix is likely not coming soon, can you explain
> why the commit that was mentioned in the original
> report cannot be reverted as a temporary solution while
> we wait for the full fix to come later? I can say that
> reverting that commit (It was a commit affecting
> drm/i915) does fix the issue on my system with no
> negative side effects at all. In such a case, it seems
> contrary to Linus' regression rule to not revert the
> offending commit, even if reverting the offending
> commit is not going to be the final solution. IOW,
> I am trying to argue that an important corollary to
> the Linus regression rule is that we revert commits
> that introduce regressions, especially when there
> are no negative effects when reverting the offending
> commit. Why are we not doing that in this case?
>
> 4. Can you explain why this patch series is superior
> to the other proposed patches that are much more
> simple and have been reported to fix the regression?
>
> 5. This approach seems way too aggressive for backporting
> to the stable releases. Is that correct? Or, will the patches
> be backported to the stable releases? I was told that
> backports to the stable releases are needed to keep things
> consistent across all the supported versions when I submitted
> a patch to fix this regression that identified a specific five year
> old commit that my proposed patch would fix.
>
> Remember, this is a regression that is really bothering
> people now. For example, I am now in a position where
> I cannot install the updates of the Linux kernel that Debian
> pushes out to me without patching the kernel with my
> own private build that has one of the known fixes that
> have already been identified as ways to work around this
> regression while we wait for the full solution that will
> hopefully come later.
>
> Chuck
>
> > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > reports and sometimes miss something important when writing mails like
> > this. If that's the case here, don't hesitate to tell me in a public
> > reply, it's in everyone's interest to set the public record straight.
> >
> > BTW, let me tell regzbot to monitor this thread:
> >
> > #regzbot ^backmonitor:
> > https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>

OK, the comments Boris made on the individual patches of
this patch set answer most of my questions. Thx, Boris.

Chuck



 

