[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
On 21/03/2023 3:59 pm, Roger Pau Monné wrote: > On Tue, Mar 21, 2023 at 02:51:30PM +0000, Andrew Cooper wrote: >> On 20/03/2023 9:24 am, Jan Beulich wrote: >>> On 17.03.2023 12:39, Roger Pau Monné wrote: >>>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote: >>>>> On 26.05.2020 17:01, Andrew Cooper wrote: >>>>>> On 26/05/2020 14:35, Jan Beulich wrote: >>>>>>> On 26.05.2020 13:17, Andrew Cooper wrote: >>>>>>>> On 26/05/2020 07:49, Jan Beulich wrote: >>>>>>>>> Respective Core Gen10 processor lines are affected, too. >>>>>>>>> >>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >>>>>>>>> >>>>>>>>> --- a/xen/arch/x86/mm.c >>>>>>>>> +++ b/xen/arch/x86/mm.c >>>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g >>>>>>>>> case 0x000506e0: /* errata SKL167 / SKW159 */ >>>>>>>>> case 0x000806e0: /* erratum KBL??? */ >>>>>>>>> case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */ >>>>>>>>> + case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */ >>>>>>>>> + case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */ >>>>>>>> This is marred in complexity. >>>>>>>> >>>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural >>>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that >>>>>>>> HLE >>>>>>>> no longer exists/works. This applies to IceLake systems, but possibly >>>>>>>> not their initial release configuration (hence, via a later microcode >>>>>>>> update). >>>>>>>> >>>>>>>> HLE is also disabled in microcode on all older parts for errata >>>>>>>> reasons, >>>>>>>> so in practice it doesn't exist anywhere now. >>>>>>>> >>>>>>>> I think it is safe to drop this workaround, and this does seem a more >>>>>>>> simple option than encoding which microcode turned HLE off (which sadly >>>>>>>> isn't covered by the spec updates, as even when turned off, HLE is >>>>>>>> still >>>>>>>> functioning according to its spec of "may speed things up, may do >>>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of >>>>>>>> MSR_TSX_CTRL. >>>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is >>>>>>> no longer enumerated in CPUID? >>>>>> No - sadly not. For reasons of "not repeating the Haswell/Broadwell >>>>>> microcode fiasco", the HLE bit will continue to exist and be set. >>>>>> (Although on CascadeLake and later, you can turn it off with >>>>>> MSR_TSX_CTRL.) >>>>>> >>>>>> It was always a weird CPUID bit. You were supposed to put >>>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop >>>>>> on old hardware and go faster on newer hardware. >>>>>> >>>>>> There is nothing runtime code needs to look at the HLE bit for, except >>>>>> perhaps for UI reporting purposes. >>>>> Do you know of some public Intel doc I could reference for all of this, >>>>> which I would kind of need in the description of a patch ... >>>>> >>>>>>> But then this >>>>>>> erratum does not have the usual text effectively meaning that an ucode >>>>>>> update is or will be available to address the issue; instead it says >>>>>>> that BIOS or VMM can reserve the respective address range. >>>>>> This is not surprising at all. Turning off HLE was an unrelated >>>>>> activity, and I bet the link went unnoticed. >>>>>> >>>>>>> This - assuming the alternative you describe is indeed viable - then is >>>>>>> surely >>>>>>> a much more intrusive workaround than needed. Which I wouldn't assume >>>>>>> they would suggest in such a case. >>>>>> My suggestion was to drop the workaround, not to complicated it with a >>>>>> microcode revision matrix. >>>>> ... doing this? I don't think I've seen any of this in writing so far, >>>>> except by you. (I don't understand how this reply of yours relates to >>>>> what I was saying about the spec update. I understand what you are >>>>> suggesting. I merely tried to express that I'd have expected Intel to >>>>> point out the much easier workaround, rather than just a pretty involved >>>>> one.) Otherwise, may I suggest you make such a patch, to make sure it >>>>> has an adequate description? >>>> Seeing as there seems to be some data missing to justify the commit - >>>> was has Linux done with those erratas? >>> While they deal with the SNB erratum in a similar way, I'm afraid I'm >>> unaware of Linux having or having had a workaround for the errata here. >>> Which, granted, is a little surprising when we did actually even issue >>> an XSA for this. >>> >>> In fact I find Andrew's request even more surprising with that fact (us >>> having issued XSA-282 for it) in mind, which originally I don't think I >>> had paid attention to (nor recalled). >> No - I'm aware of it. It probably was the right move at the time. >> >> But, Intel have subsequently killed HLE in microcode updates update in >> all CPUs it ever existed in (to fix a memory ordering erratum), and >> removed it from the architecture moving forwards (the enumeration of >> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated). > Should we then check for TSX_CTRL in order to check whether to engage > the workaround? By the looks of the current model list, TSX_CTRL doesn't exist on any of those CPUs. https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#tsx It was the March 2019 ucode which turned off HLE everywhere, which was only shortly after we released XSA-282. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |