[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen, with corrected patch
On 7/5/22 12:14 PM, Borislav Petkov wrote: > On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote: > > Re-using pat_disabled like you do in your suggestion below won't > > work, because mtrr_bp_init() calls pat_disable() when MTRRs > > appear to be disabled (from the kernel's view). The goal is to > > honor "nopat" without honoring any other calls to pat_disable(). I think Jan is speaking of the narrow goal of the patch that is causing the regression at hand: Commit bdd8b6c98239cad3a976d6f197afc2c794d3cef8 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()") The author of that commit expressed the desire to more readily handle the nopat option in Linux in an architecture-independent way. in the commit message. The patch was not intended to cause a functional change, but it did - it caused a regression in Xen Dom0s running certain Intel graphics devices. > > Actually, the current goal is to adjust Xen dom0 because: > > 1. it uses the PAT code > > 2. but then it does something special and hides the MTRRs > > which is not something real hardware does. > > So this one-off thing should be prominent, visible and not get in the > way. Actually, this is probably the most insightful comment in all the words that have been written about this regression. This is a one-off thing, peculiar to Xen PV, but it is not visible enough and gets overlooked when changes are made. I guess I agree it should not get in the way either, and it doesn't, except when the lack of visibility of this one-off thing causes the author of a patch to overlook it and cause unforeseen problems for users of Xen on Linux. That is precisely what happened five years ago with commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Have you looked at that patch? That is the one that introduced the regression that causes pat_enabled() to return a false negative on Xen Dom0 today, and it hid in the code for four and a half years and was only exposed as a regression with commit bdd8b6c98239cad3a9 last December, which again was written in a way that caused the regression on Xen because this one-off thing that Xen does was not visible enough to the author of the recent patch from last December that exposed this problem as a regression with Xen PV domains and certain Intel graphics adpapters. That patch did not take into account this not-visible-enough Xen case when MTRR is disabled but PAT is still supported by the CPU in Xen PV domains. So the proper way to fix this regression is by fixing that commit from five years ago. It is very simple. After some code re-factoring and other changes, today that commit can be fixed by something like: --- a/arch/x86/mm/pat/memtype.c 2022-06-16 07:32:05.000000000 -0400 +++ b/arch/x86/mm/pat/memtype.c 2022-07-09 17:51:56.783050765 -0400 @@ -315,6 +315,19 @@ PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); } + else if (!pat_bp_enabled) { + /* + * In some environments, specifically Xen PV, PAT + * initialization is skipped because MTRRs are + * disabled even though PAT is available. In such + * environments, set PAT to enabled to correctly + * indicate to callers of pat_enabled() that CPU + * support for PAT is available. + */ + pat_bp_enabled = true; + pr_info("x86/PAT: PAT enabled by init_cache_modes\n"); + } + __init_cache_modes(pat); } [N.B. I am re-sending this message because this patch in the earlier message would be malformed because I deleted an inserted line without editing the line in the patch that defines how many new lines are inserted into the patched file. The change from the proposed patch in the previous message is: @@ -315,6 +315,20 @@ -> @@ -315,6 +315,19 @@ Sorry for the confusion.] Like Jan's approach, this patch implements the fix in init_cache_modes(), but unlike Jan's approach, it only sets pat_bp_enabled and pat_bp_initialized to true if boot_cpu_has(X86_FEATURE_PAT) is true and rdmsrl(MSR_IA32_CR_PAT, pat) returns a valid value. No need to check for a hypervisor, just check if the CPU supports PAT here and that PAT MSR returns something valid here. If both are true, then set pat_bp_enabled to true. Regression solved. And that leaves the bigger goal of dealing with this one-off thing that Xen does in a more sane way for another day. I am working on a patch series that attempts to start the process by first re-factoring the currently confusing pat_disable functions and variables that will hopefully make this one-off Xen thing more visible and easier to understand so when someone wants to play around with the current way of deciding whether or not PAT is enabled on X86, no regression will occur on Xen or in any other environment. Best Regards, Chuck
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |