
Re: [PATCH] x86: extend coverage of HLE "bad page" workaround



On 26.05.2020 17:01, Andrew Cooper wrote:
> On 26/05/2020 14:35, Jan Beulich wrote:
>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>> Respective Core Gen10 processor lines are affected, too.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>> This is mired in complexity.
>>>
>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>> not their initial release configuration (hence, via a later microcode
>>> update).
>>>
>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>> so in practice it doesn't exist anywhere now.
>>>
>>> I think it is safe to drop this workaround, and this does seem a more
>>> simple option than encoding which microcode turned HLE off (which sadly
>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>> functioning according to its spec of "may speed things up, may do
>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>> MSR_TSX_CTRL.
>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>> no longer enumerated in CPUID?
> 
> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
> microcode fiasco", the HLE bit will continue to exist and be set. 
> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
> 
> It was always a weird CPUID bit.  You were supposed to put
> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
> on old hardware and go faster on newer hardware.
> 
> There is nothing runtime code needs to look at the HLE bit for, except
> perhaps for UI reporting purposes.
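For reference, the HLE bit discussed above is enumerated in CPUID leaf 7, subleaf 0, EBX bit 4, with RTM in bit 11 of the same register. A minimal sketch of testing those bits on an already-fetched EBX value follows. The macro and function names are hypothetical, and, as noted above, a set bit only says the feature is enumerated, not that elision actually happens.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits. */
#define CPUID7_EBX_HLE (1u << 4)
#define CPUID7_EBX_RTM (1u << 11)

/* True if the HLE bit is set in the supplied leaf 7 EBX value. */
static bool hle_enumerated(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & CPUID7_EBX_HLE) != 0;
}

/* True if the RTM bit is set in the supplied leaf 7 EBX value. */
static bool rtm_enumerated(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & CPUID7_EBX_RTM) != 0;
}
```

On GCC or Clang the EBX value could be obtained with __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) from <cpuid.h>. The helpers take the raw value so they can be used for reporting without executing CPUID themselves.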

Do you know of some public Intel doc I could reference for all of this,
which I would kind of need in the description of a patch ...

>> But then this
>> erratum does not have the usual text effectively meaning that a ucode
>> update is or will be available to address the issue; instead it says
>> that BIOS or VMM can reserve the respective address range.
> 
> This is not surprising at all.  Turning off HLE was an unrelated
> activity, and I bet the link went unnoticed.
> 
>> This - assuming the alternative you describe is indeed viable - then is
>> surely a much more intrusive workaround than needed. Which I wouldn't
>> assume they would suggest in such a case.
> 
> My suggestion was to drop the workaround, not to complicate it with a
> microcode revision matrix.

... doing this? I don't think I've seen any of this in writing so far,
except by you. (I don't understand how this reply of yours relates to
what I was saying about the spec update. I understand what you are
suggesting. I merely tried to express that I'd have expected Intel to
point out the much easier workaround, rather than just a pretty involved
one.) Otherwise, may I suggest you make such a patch, to make sure it
has an adequate description?

Jan



 

