[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86: extend coverage of HLE "bad page" workaround


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 21 Mar 2023 17:31:20 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9q7lY/UVgGiuACc9uA82VVKQjT+iIHV20ir0gmJVEoc=; b=ne7/4YD9SyhFBdk95Mz4n+eugXRxQhOQPsoHNPv1ZaBV4nrPgqtyFsK6buyUeuULqFL4ub1OnPizOHQ/zAeoN41ZQv3ZOiBhgxNijQ+Yb6Hmo3Ewn43onuHCuukN1L0nqlvg6zSvd8T32t8xDcAY/FIngOSUVmCC+zlEX/OrG4+ZAyFtQ+Gtfh25JT50p15Y6+6uclR4is3o/4AqRixWKfteMSlKVKUK+XYR9kcAL4K1ItXVSMnnHvhrcrJRZ6p+G+7zGt6ddkECckcCwkf6XuIMq37z6rRiSLQuElVHQzWQqTet/jYpW9oE2MWegWozQbTMiBiTNE4o2Ws/6extKw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=dXr3F0rr3uSDiyRQBEzYCn5NwXnur0fMIaXGkruYbqJATtuDXFR3xFUlZ/E2m98AKjf83AmypHwBH10bIy2/InE5sCZXBhanrmnzJFx6WgwnK8Tkl4Ih+BXYnNi48F5Bo7ttyghFrvSqIIcSCyjyISLobmuDemyC4zCpIES5owC7XBLB7+Y7Vu3HEFLEmAO7FXwjD9Wo2FqvqLScvHHqMB//TvUc2wBt78eXfHhXms0LuTEM8UvivwiE6ux/H3ynmlEJYxEjkOstLZVr5eXbO7yKuNnVUBxN6sftY08V3oSwaojRJcoHdejr6/ScJ8VvBTk9TGxq3bGLxTb2jeouLg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 21 Mar 2023 16:31:42 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.03.2023 15:51, Andrew Cooper wrote:
> On 20/03/2023 9:24 am, Jan Beulich wrote:
>> On 17.03.2023 12:39, Roger Pau Monné wrote:
>>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>>>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>>>>
>>>>>>>> --- a/xen/arch/x86/mm.c
>>>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>>>> This is marred in complexity.
>>>>>>>
>>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>>>>>> not their initial release configuration (hence, via a later microcode
>>>>>>> update).
>>>>>>>
>>>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>>>>>> so in practice it doesn't exist anywhere now.
>>>>>>>
>>>>>>> I think it is safe to drop this workaround, and this does seem a more
>>>>>>> simple option than encoding which microcode turned HLE off (which sadly
>>>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>>>>>> functioning according to its spec of "may speed things up, may do
>>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>>>> MSR_TSX_CTRL.
>>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>>>> no longer enumerated in CPUID?
>>>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>>>> microcode fiasco", the HLE bit will continue to exist and be set. 
>>>>> (Although on CascadeLake and later, you can turn it off with 
>>>>> MSR_TSX_CTRL.)
>>>>>
>>>>> It was always a weird CPUID bit.  You were supposed to put
>>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>>>> on old hardware and go faster on newer hardware.
>>>>>
>>>>> There is nothing runtime code needs to look at the HLE bit for, except
>>>>> perhaps for UI reporting purposes.
>>>> Do you know of some public Intel doc I could reference for all of this,
>>>> which I would kind of need in the description of a patch ...
>>>>
>>>>>> But then this
>>>>>> erratum does not have the usual text effectively meaning that an ucode
>>>>>> update is or will be available to address the issue; instead it says
>>>>>> that BIOS or VMM can reserve the respective address range.
>>>>> This is not surprising at all.  Turning off HLE was an unrelated
>>>>> activity, and I bet the link went unnoticed.
>>>>>
>>>>>> This - assuming the alternative you describe is indeed viable - then is 
>>>>>> surely
>>>>>> a much more intrusive workaround than needed. Which I wouldn't assume
>>>>>> they would suggest in such a case.
>>>>> My suggestion was to drop the workaround, not to complicated it with a
>>>>> microcode revision matrix.
>>>> ... doing this? I don't think I've seen any of this in writing so far,
>>>> except by you. (I don't understand how this reply of yours relates to
>>>> what I was saying about the spec update. I understand what you are
>>>> suggesting. I merely tried to express that I'd have expected Intel to
>>>> point out the much easier workaround, rather than just a pretty involved
>>>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>>>> has an adequate description?
>>> Seeing as there seems to be some data missing to justify the commit -
>>> was has Linux done with those erratas?
>> While they deal with the SNB erratum in a similar way, I'm afraid I'm
>> unaware of Linux having or having had a workaround for the errata here.
>> Which, granted, is a little surprising when we did actually even issue
>> an XSA for this.
>>
>> In fact I find Andrew's request even more surprising with that fact (us
>> having issued XSA-282 for it) in mind, which originally I don't think I
>> had paid attention to (nor recalled).
> 
> No - I'm aware of it.  It probably was the right move at the time.
> 
> But, Intel have subsequently killed HLE in microcode updates update in
> all CPUs it ever existed in (to fix a memory ordering erratum), and
> removed it from the architecture moving forwards (the enumeration of
> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).
> 
> These errata no longer exist when you've got up to date microcode, and
> if you've not got up to date microcode on these CPUs, you've got far
> worse security problems.

Are you sure such ucode was issued for _all_ affected CPUs? I think I
agree for the models being added by the patch here, so I guess I can
accept the argument for this change to not really be needed. But for
removing the workaround I'd expect that suitable microcode exists for
the respective models. Yet at the very least 06-55-05 - consisting of
just a single blob - has a date of 2018-Nov-16, i.e. older than March
2019. That's Xeon D-2100 and perhaps at least one other product (I
didn't go hunt further to see where SKZ is used as the acronym).

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.