
Re: [PATCH] x86: extend coverage of HLE "bad page" workaround


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 21 Mar 2023 16:59:47 +0100
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 21 Mar 2023 16:00:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Mar 21, 2023 at 02:51:30PM +0000, Andrew Cooper wrote:
> On 20/03/2023 9:24 am, Jan Beulich wrote:
> > On 17.03.2023 12:39, Roger Pau Monné wrote:
> >> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
> >>> On 26.05.2020 17:01, Andrew Cooper wrote:
> >>>> On 26/05/2020 14:35, Jan Beulich wrote:
> >>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
> >>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
> >>>>>>> Respective Core Gen10 processor lines are affected, too.
> >>>>>>>
> >>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>>>>
> >>>>>>> --- a/xen/arch/x86/mm.c
> >>>>>>> +++ b/xen/arch/x86/mm.c
> >>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
> >>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
> >>>>>>>      case 0x000806e0: /* erratum KBL??? */
> >>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
> >>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
> >>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> >>>>>> This is mired in complexity.
> >>>>>>
> >>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
> >>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that
> >>>>>> HLE no longer exists/works.  This applies to IceLake systems, but possibly
> >>>>>> not their initial release configuration (hence, via a later microcode
> >>>>>> update).
> >>>>>>
> >>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
> >>>>>> so in practice it doesn't exist anywhere now.
> >>>>>>
> >>>>>> I think it is safe to drop this workaround, and this does seem a
> >>>>>> simpler option than encoding which microcode turned HLE off (which sadly
> >>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
> >>>>>> functioning according to its spec of "may speed things up, may do
> >>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
> >>>>>> MSR_TSX_CTRL.
> >>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
> >>>>> no longer enumerated in CPUID?
> >>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
> >>>> microcode fiasco", the HLE bit will continue to exist and be set. 
> >>>> (Although on CascadeLake and later, you can turn it off with 
> >>>> MSR_TSX_CTRL.)
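
FWIW, for reference, a rough sketch of the control in question, with the
bit names roughly as in the TAA documentation:

    /* IA32_TSX_CTRL, enumerated via IA32_ARCH_CAPABILITIES bit 7 (TSX_CTRL). */
    #define MSR_TSX_CTRL             0x00000122
    #define  TSX_CTRL_RTM_DISABLE    (1u << 0)  /* force all XBEGINs to abort */
    #define  TSX_CTRL_CPUID_CLEAR    (1u << 1)  /* hide the HLE/RTM CPUID bits */
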
> >>>>
> >>>> It was always a weird CPUID bit.  You were supposed to put
> >>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
> >>>> on old hardware and go faster on newer hardware.
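
FWIW, a minimal sketch of what that usage looked like, using GCC's
HLE-flavoured atomic builtins (built with -mhle; on hardware without HLE
the prefixes are simply ignored and this degrades to a plain spinlock):

    #include <immintrin.h>          /* _mm_pause() */

    /*
     * Lock acquire: XACQUIRE LOCK XCHG.  With HLE the write is elided and
     * the critical section runs transactionally; without HLE this is just
     * an ordinary locked exchange.
     */
    static void hle_lock(volatile int *lock)
    {
        while ( __atomic_exchange_n(lock, 1,
                                    __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE) )
            while ( *lock )         /* fallback: wait for the lock to drop */
                _mm_pause();
    }

    /* Lock release: XRELEASE MOV, committing the elided section if any. */
    static void hle_unlock(volatile int *lock)
    {
        __atomic_store_n(lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
    }
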
> >>>>
> >>>> There is nothing runtime code needs to look at the HLE bit for, except
> >>>> perhaps for UI reporting purposes.
> >>> Do you know of some public Intel doc I could reference for all of this,
> >>> which I would kind of need in the description of a patch ...
> >>>
> >>>>> But then this
> >>>>> erratum does not have the usual text effectively meaning that an ucode
> >>>>> update is or will be available to address the issue; instead it says
> >>>>> that BIOS or VMM can reserve the respective address range.
> >>>> This is not surprising at all.  Turning off HLE was an unrelated
> >>>> activity, and I bet the link went unnoticed.
> >>>>
> >>>>> This - assuming the alternative you describe is indeed viable - then is surely
> >>>>> a much more intrusive workaround than needed. Which I wouldn't assume
> >>>>> they would suggest in such a case.
> >>>> My suggestion was to drop the workaround, not to complicate it with a
> >>>> microcode revision matrix.
> >>> ... doing this? I don't think I've seen any of this in writing so far,
> >>> except by you. (I don't understand how this reply of yours relates to
> >>> what I was saying about the spec update. I understand what you are
> >>> suggesting. I merely tried to express that I'd have expected Intel to
> >>> point out the much easier workaround, rather than just a pretty involved
> >>> one.) Otherwise, may I suggest you make such a patch, to make sure it
> >>> has an adequate description?
> >> Seeing as there seems to be some data missing to justify the commit -
> >> what has Linux done with those errata?
> > While they deal with the SNB erratum in a similar way, I'm afraid I'm
> > unaware of Linux having or having had a workaround for the errata here.
> > Which, granted, is a little surprising when we did actually even issue
> > an XSA for this.
> >
> > In fact I find Andrew's request even more surprising with that fact (us
> > having issued XSA-282 for it) in mind, which originally I don't think I
> > had paid attention to (nor recalled).
> 
> No - I'm aware of it.  It probably was the right move at the time.
> 
> But, Intel have subsequently killed HLE via microcode updates in
> all CPUs it ever existed in (to fix a memory ordering erratum), and
> removed it from the architecture moving forwards (the enumeration of
> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).

Should we then check for TSX_CTRL in order to decide whether to engage
the workaround?
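
Something along these lines, maybe - a standalone sketch of just the
decision (the helper name is made up and this is untested; bit 7 of
ARCH_CAPS is the TSX_CTRL enumeration):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define ARCH_CAPS_TSX_CTRL  (1u << 7)   /* MSR_TSX_CTRL is implemented */

    /*
     * Illustrative only: is the XSA-282 bad-page reservation still needed,
     * given the CPU signature (CPUID.1.EAX with the stepping nibble masked
     * off) and the raw IA32_ARCH_CAPABILITIES value?
     */
    static bool hle_bad_page_needed(uint32_t cpu_sig, uint64_t arch_caps)
    {
        /* MSR_TSX_CTRL enumerated => HLE is architecturally gone. */
        if ( arch_caps & ARCH_CAPS_TSX_CTRL )
            return false;

        switch ( cpu_sig & ~0xfu )
        {
        case 0x000506e0: /* errata SKL167 / SKW159 */
        case 0x000806e0: /* erratum KBL??? */
        case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
        case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
        case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
            return true;
        }

        return false;
    }

    int main(void)
    {
        /* Example: a Comet Lake signature with no TSX_CTRL enumerated. */
        printf("reserve bad page: %d\n", hle_bad_page_needed(0x000a0655, 0));
        return 0;
    }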

Thanks, Roger.



 

