
Re: [PATCH] x86: extend coverage of HLE "bad page" workaround


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 21 Mar 2023 16:14:48 +0000
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 21 Mar 2023 16:15:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21/03/2023 3:59 pm, Roger Pau Monné wrote:
> On Tue, Mar 21, 2023 at 02:51:30PM +0000, Andrew Cooper wrote:
>> On 20/03/2023 9:24 am, Jan Beulich wrote:
>>> On 17.03.2023 12:39, Roger Pau Monné wrote:
>>>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>>>>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>>>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>>>>>
>>>>>>>>> --- a/xen/arch/x86/mm.c
>>>>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>>>>> This is mired in complexity.
>>>>>>>>
>>>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that
>>>>>>>> HLE no longer exists/works.  This applies to IceLake systems, but possibly
>>>>>>>> not their initial release configuration (hence, via a later microcode
>>>>>>>> update).
>>>>>>>>
>>>>>>>> HLE is also disabled in microcode on all older parts for errata
>>>>>>>> reasons, so in practice it doesn't exist anywhere now.
>>>>>>>>
>>>>>>>> I think it is safe to drop this workaround, and this does seem a
>>>>>>>> simpler option than encoding which microcode turned HLE off (which sadly
>>>>>>>> isn't covered by the spec updates, as even when turned off, HLE is
>>>>>>>> still functioning according to its spec of "may speed things up, may do
>>>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>>>>> MSR_TSX_CTRL.
>>>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>>>>> no longer enumerated in CPUID?
>>>>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>>>>> microcode fiasco", the HLE bit will continue to exist and be set.
>>>>>> (Although on CascadeLake and later, you can turn it off with
>>>>>> MSR_TSX_CTRL.)
>>>>>>
>>>>>> It was always a weird CPUID bit.  You were supposed to put
>>>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>>>>> on old hardware and go faster on newer hardware.
>>>>>>
>>>>>> There is nothing runtime code needs to look at the HLE bit for, except
>>>>>> perhaps for UI reporting purposes.
>>>>> Do you know of some public Intel doc I could reference for all of this,
>>>>> which I would kind of need in the description of a patch ...
>>>>>
>>>>>>> But then this
>>>>>>> erratum does not have the usual text effectively meaning that an ucode
>>>>>>> update is or will be available to address the issue; instead it says
>>>>>>> that BIOS or VMM can reserve the respective address range.
>>>>>> This is not surprising at all.  Turning off HLE was an unrelated
>>>>>> activity, and I bet the link went unnoticed.
>>>>>>
>>>>>>> This - assuming the alternative you describe is indeed viable - then is
>>>>>>> surely a much more intrusive workaround than needed. Which I wouldn't
>>>>>>> assume they would suggest in such a case.
>>>>>> My suggestion was to drop the workaround, not to complicate it with a
>>>>>> microcode revision matrix.
>>>>> ... doing this? I don't think I've seen any of this in writing so far,
>>>>> except by you. (I don't understand how this reply of yours relates to
>>>>> what I was saying about the spec update. I understand what you are
>>>>> suggesting. I merely tried to express that I'd have expected Intel to
>>>>> point out the much easier workaround, rather than just a pretty involved
>>>>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>>>>> has an adequate description?
>>>> Seeing as there seems to be some data missing to justify the commit -
>>>> what has Linux done with those errata?
>>> While they deal with the SNB erratum in a similar way, I'm afraid I'm
>>> unaware of Linux having or having had a workaround for the errata here.
>>> Which, granted, is a little surprising when we did actually even issue
>>> an XSA for this.
>>>
>>> In fact I find Andrew's request even more surprising with that fact (us
>>> having issued XSA-282 for it) in mind, which originally I don't think I
>>> had paid attention to (nor recalled).
>> No - I'm aware of it.  It probably was the right move at the time.
>>
>> But, Intel have subsequently killed HLE in microcode updates in
>> all CPUs it ever existed in (to fix a memory ordering erratum), and
>> removed it from the architecture moving forwards (the enumeration of
>> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).
> Should we then check for TSX_CTRL in order to check whether to engage
> the workaround?

By the looks of the current model list, TSX_CTRL doesn't exist on any of
those CPUs.

https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#tsx

It was the March 2019 ucode which turned off HLE everywhere, which was
only shortly after we released XSA-282.

~Andrew
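
Editorial sketch, not part of the original message: the check Roger asks
about (only engage the workaround when TSX_CTRL is not enumerated) could
look roughly like the following. This is a minimal illustration under
stated assumptions, not Xen's actual code. The function names and the
signature mask are hypothetical, and the model list contains only the
three pre-existing entries visible in the quoted diff context. The bit
positions for HLE (CPUID leaf 7, EBX bit 4) and TSX_CTRL
(IA32_ARCH_CAPABILITIES bit 7) are the architectural ones. The caller is
assumed to pass arch_caps as 0 when the MSR is not enumerated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions. */
#define CPUID7_EBX_HLE      (1u << 4)    /* CPUID.(EAX=7,ECX=0).EBX[4] */
#define ARCH_CAPS_TSX_CTRL  (1ull << 7)  /* IA32_ARCH_CAPABILITIES[7]  */

/* Assumption: keep family/model fields, drop stepping and processor type. */
#define SIG_MASK 0x0fff0ff0u

/* Hypothetical helper: is this CPUID.1.EAX signature on the model list? */
static bool sig_in_hle_bad_page_list(uint32_t cpuid1_eax)
{
    switch ( cpuid1_eax & SIG_MASK )
    {
    case 0x000506e0: /* errata SKL167 / SKW159 */
    case 0x000806e0: /* erratum KBL??? */
    case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
        return true;
    }

    return false;
}

/*
 * Hypothetical policy: engage the workaround only on a listed model that
 * enumerates HLE and does not enumerate TSX_CTRL.  Pass arch_caps as 0 if
 * IA32_ARCH_CAPABILITIES is not available.
 */
static bool want_hle_bad_page_workaround(uint32_t cpuid1_eax,
                                         uint32_t cpuid7_ebx,
                                         uint64_t arch_caps)
{
    if ( !sig_in_hle_bad_page_list(cpuid1_eax) )
        return false;

    if ( !(cpuid7_ebx & CPUID7_EBX_HLE) )
        return false;

    /* Per the thread, TSX_CTRL enumeration means HLE no longer works. */
    if ( arch_caps & ARCH_CAPS_TSX_CTRL )
        return false;

    return true;
}
```

As Andrew notes above, TSX_CTRL does not appear to exist on any of the
currently listed models, so the extra check would only start to matter if
newer models were added to the list. It also cannot detect HLE having
been turned off by a microcode update, since that is not enumerated.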



 

