[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86: extend coverage of HLE "bad page" workaround


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 21 Mar 2023 14:51:30 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=t9BJe+0y/uYdfgD0pS0o33L7+hpL4Z1mB0nGrbYNQWQ=; b=IXslA+c1X4UGd/NFz37nXJD2vyWhYStp9U8+w31YtrVYH6c/IvyxdSVWWfS6i88H3xsRDVGQlRxNSDRcozwRq2DkwSwDg6GNy+IKSSV/N+KwBPZ7MeP1sn8WM9Sg2yQFaCFaBkgmbNXbbOz0HxQxBGdMkC9jnIXbA8rYNuTDKESlO6jw1TnxQEIJv8wYPOVexopXV3XiRLRvht0rJy1c2g4qLMrhRM69vigB2lvxDBQyeDl5ABZIq4OWsCJEpmYt1BthlUStv1ca1GkYopFxrM56/WmLo0yuvBB4+jyr8v5+SEaUXkTcXhVcq9ORvlMPDxf1fVeCVsyMklnT5BAOAg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Vr07UTXqOs/gq141VfJZHkwlV7MP7WjVTK0JZ0DOPFx8fJUP9+jwml3VU0gYXYJCt+2OdLebxZAsa0W15Rtm0NgKjdwpAaKcZqpZtuIAyw0sQhwMaa/8x1W9QalUfRu+jjbVBqTH62R6Y2Y2MlAxYeitzruQ5h8IoWVKslIX8mIeYRGpVdUPLvsojLwgFpQOWcTKPMU/4RXjy+HR0KVLfTnOvtNc3kNzXQD7JR2ntJtPuUMOooiwNmdGJNmWCfTPbvZnw3/e/wEZAjiClg0vcu/k+hvVb4UurSd0kEtUG/cwYIfwk/UB3A7tg7yjnSCR26iq0V1B9jzA6F4Yk47Czg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 21 Mar 2023 14:52:02 +0000
  • Ironport-data: A9a23:mQWa+aAzbaFUYBVW/wTiw5YqxClBgxIJ4kV8jS/XYbTApDoi0TdVx mVKXz/UMviDa2r0e4hzYYm0pE9S75PQx9JrQQY4rX1jcSlH+JHPbTi7wuUcHAvJd5GeExg3h yk6QoOdRCzhZiaE/n9BCpC48T8nk/nOHuGmYAL9EngZbRd+Tys8gg5Ulec8g4p56fC0GArIs t7pyyHlEAbNNwVcbyRFu8pvlDs15K6p4GhC5wRkDRx2lAS2e0c9Xcp3yZ6ZdxMUcqEMdsamS uDKyq2O/2+x13/B3fv8z94X2mVTKlLjFVDmZkh+AsBOsTAbzsAG6Y4pNeJ0VKtio27hc+ada jl6ncfYpQ8BZsUgkQmGOvVSO3kW0aZuoNcrLZUj2CA6IoKvn3bEmp1T4E8K0YIw+OJTXVBl5 8cjMjUhdzmHtviQ+pu5Vbw57igjBJGD0II3nFhFlGicJ9B2BJfJTuPN+MNS2yo2ioZWB/HCa sEFaD1pKhPdfxlIPVRRA5U79AuqriCnL3sE9xTI9exuvTm7IA9ZidABNPL8fNCQSNoTtUGfv m/cpEzyAw0ANczZwj2Amp6prraXxHurBt1JRdVU8NZYoE+W6GYfEiY3C0SnvPn6iX+cXNhAf hl8Fi0G6PJaGFaQZtv3UgC8oXWElgUBQNcWGOo/gCmdx6yR7wuHC2wsSj9adMdgpMIwXSYt1 FKCg5XuHzMHmKKRYWKQ8PGTtzzaBMQOBWoLZCtBRw1V5dDm+dg3lkiWEIclF7OphNroHz222 yqNsCU1m7QUi4gMyrm/+lfExTmro/AlUzII2+keZUr9hisRWWJvT9XABYTzhRqYELukcw==
  • Ironport-hdrordr: A9a23:gfbIpK01RDm/0Wxh9gekDQqjBLUkLtp133Aq2lEZdPU1SL3/qy nKpp536faaslcssR0b+exoQZPvfZq+z+8T3WByB8bGYOCOggLBR72Ki7GSoAEIcxeTygc379 YFT0ERMqyTMbCk5fyU3OHee+xQueW6zA==
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20/03/2023 9:24 am, Jan Beulich wrote:
> On 17.03.2023 12:39, Roger Pau Monné wrote:
>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>>>
>>>>>>> --- a/xen/arch/x86/mm.c
>>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>>> This is marred in complexity.
>>>>>>
>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>>>>> not their initial release configuration (hence, via a later microcode
>>>>>> update).
>>>>>>
>>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>>>>> so in practice it doesn't exist anywhere now.
>>>>>>
>>>>>> I think it is safe to drop this workaround, and this does seem a more
>>>>>> simple option than encoding which microcode turned HLE off (which sadly
>>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>>>>> functioning according to its spec of "may speed things up, may do
>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>>> MSR_TSX_CTRL.
>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>>> no longer enumerated in CPUID?
>>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>>> microcode fiasco", the HLE bit will continue to exist and be set. 
>>>> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
>>>>
>>>> It was always a weird CPUID bit.  You were supposed to put
>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>>> on old hardware and go faster on newer hardware.
>>>>
>>>> There is nothing runtime code needs to look at the HLE bit for, except
>>>> perhaps for UI reporting purposes.
>>> Do you know of some public Intel doc I could reference for all of this,
>>> which I would kind of need in the description of a patch ...
>>>
>>>>> But then this
>>>>> erratum does not have the usual text effectively meaning that an ucode
>>>>> update is or will be available to address the issue; instead it says
>>>>> that BIOS or VMM can reserve the respective address range.
>>>> This is not surprising at all.  Turning off HLE was an unrelated
>>>> activity, and I bet the link went unnoticed.
>>>>
>>>>> This - assuming the alternative you describe is indeed viable - then is 
>>>>> surely
>>>>> a much more intrusive workaround than needed. Which I wouldn't assume
>>>>> they would suggest in such a case.
>>>> My suggestion was to drop the workaround, not to complicated it with a
>>>> microcode revision matrix.
>>> ... doing this? I don't think I've seen any of this in writing so far,
>>> except by you. (I don't understand how this reply of yours relates to
>>> what I was saying about the spec update. I understand what you are
>>> suggesting. I merely tried to express that I'd have expected Intel to
>>> point out the much easier workaround, rather than just a pretty involved
>>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>>> has an adequate description?
>> Seeing as there seems to be some data missing to justify the commit -
>> was has Linux done with those erratas?
> While they deal with the SNB erratum in a similar way, I'm afraid I'm
> unaware of Linux having or having had a workaround for the errata here.
> Which, granted, is a little surprising when we did actually even issue
> an XSA for this.
>
> In fact I find Andrew's request even more surprising with that fact (us
> having issued XSA-282 for it) in mind, which originally I don't think I
> had paid attention to (nor recalled).

No - I'm aware of it.  It probably was the right move at the time.

But, Intel have subsequently killed HLE in microcode updates update in
all CPUs it ever existed in (to fix a memory ordering erratum), and
removed it from the architecture moving forwards (the enumeration of
TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).

These errata no longer exist when you've got up to date microcode, and
if you've not got up to date microcode on these CPUs, you've got far
worse security problems.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.