
Re: [BUG] x2apic broken with current AMD hardware


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 20 Mar 2023 17:50:00 +0000
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 20 Mar 2023 17:50:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20/03/2023 8:28 am, Jan Beulich wrote:
> On 20.03.2023 09:14, Jan Beulich wrote:
>> On 17.03.2023 18:26, Elliott Mitchell wrote:
>>> On Fri, Mar 17, 2023 at 09:22:09AM +0100, Jan Beulich wrote:
>>>> On 16.03.2023 23:03, Elliott Mitchell wrote:
>>>>> On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote:
>>>>>> On 11.03.2023 01:09, Elliott Mitchell wrote:
>>>>>>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote:
>>>>>>>> In any event you will want to collect a serial log at maximum verbosity.
>>>>>>>> It would also be of interest to know whether turning off the IOMMU avoids
>>>>>>>> the issue as well (on the assumption that your system has less than 255
>>>>>>>> CPUs).
>>>>>>> I think I might have figured out the situation in a different fashion.
>>>>>>>
>>>>>>> I was taking a look at the BIOS manual for this motherboard and noticed
>>>>>>> a mention of a "Local APIC Mode" setting.  Four values are listed
>>>>>>> "Compatibility", "xAPIC", "x2APIC", and "Auto".
>>>>>>>
>>>>>>> That is the sort of setting I likely left at "Auto" and that may well
>>>>>>> result in x2 functionality being disabled.  Perhaps the x2APIC
>>>>>>> functionality on AMD is detecting whether the hardware is present, and
>>>>>>> failing to test whether it has been enabled?  (could be useful to output
>>>>>>> a message suggesting enabling the hardware feature)
>>>>>> Can we please move to a little more technical terms here? What is "present"
>>>>>> and "enabled" in your view? I don't suppose you mean the CPUID bit (which
>>>>>> we check) and the x2APIC-mode-enable one (which we drive as needed). It's
>>>>>> also left unclear what the four modes of BIOS operation evaluate to. Even
>>>>>> if we knew that, overriding e.g. "Compatibility" (which likely means some
>>>>>> form of "disabled" / "hidden") isn't normally an appropriate thing to do.
>>>>>> In "Auto" mode Xen likely should work - the only way I could interpret the
>>>>>> other modes is "xAPIC" meaning no x2APIC ACPI table entries (and
>>>>>> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode
>>>>>> pre-enabled by firmware, and "Auto" leaving it to the OS to select. Yet
>>>>>> that's speculation on my part ...
>>>>> I provided the information I had discovered.  There is a setting for this
>>>>> motherboard (likely present on some similar motherboards) which /may/
>>>>> affect the issue.  I doubt I've tried "compatibility", but none of the
>>>>> values I've tried have gotten the system to boot without "x2apic=false"
>>>>> on Xen's command-line.
>>>>>
>>>>> When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended Features:"
>>>>> I see the line "(XEN) - x2APIC".  Later is the line
>>>>> "(XEN) x2APIC mode is already enabled by BIOS."  I'll guess "Auto"
>>>>> leaves the x2APIC turned off since neither line is present.
>>>> When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC
>>>> mode. Are you sure that's the case when using "Auto"?
>>> grep -eAPIC\ driver -e-\ x2APIC:
>>>
>>> "Auto":
>>> (XEN) Using APIC driver default
>>> (XEN) Overriding APIC driver with bigsmp
>>> (XEN) Switched to APIC driver x2apic_cluster
>>>
>>> "x2APIC":
>>> (XEN) Using APIC driver x2apic_cluster
>>> (XEN) - x2APIC
>>>
>>> Yes, I'm sure.
>> Okay, this then means we're running in a mode we don't mean to run
>> in: When the IOMMU claims to not support x2APIC mode (which is odd in
>> the first place when at the same time the CPU reports x2APIC mode as
>> supported), amd_iommu_prepare() is intended to switch interrupt
>> remapping mode to "restricted" (which in turn would force x2APIC mode
>> to "physical", not "clustered"). I notice though that there are a
>> number of error paths in the function which bypass this setting. Could
>> you add a couple of printk()s to understand which path is taken (each
>> time; the function can be called more than once)?
> I think I've spotted at least one issue. Could you give the patch below
> a try please? (Patch is fine for master and 4.17 but would need context
> adjustment for 4.16.)
>
> Jan
>
> AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
>
> An earlier change with the same title (commit 1ba66a870eba) altered only
> the path where x2apic_phys was already set to false (perhaps from the
> command line). The same of course needs applying when the variable
> wasn't modified yet from its initial value.
>
> Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

I think it's worth saying that, for diagnostic purposes, if
x2apic_phys=true also resolves the issue, then it is likely related to this.

~Andrew

>
> --- unstable.orig/xen/arch/x86/genapic/x2apic.c
> +++ unstable/xen/arch/x86/genapic/x2apic.c
> @@ -236,11 +236,11 @@ const struct genapic *__init apic_x2apic
>      if ( x2apic_phys < 0 )
>      {
>          /*
> -         * Force physical mode if there's no interrupt remapping support: The
> -         * ID in clustered mode requires a 32 bit destination field due to
> +         * Force physical mode if there's no (full) interrupt remapping support:
> +         * The ID in clustered mode requires a 32 bit destination field due to
>           * the usage of the high 16 bits to hold the cluster ID.
>           */
> -        x2apic_phys = !iommu_intremap ||
> +        x2apic_phys = iommu_intremap != iommu_intremap_full ||
>                        (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
>                        (IS_ENABLED(CONFIG_X2APIC_PHYSICAL) &&
>                         !(acpi_gbl_FADT.flags & ACPI_FADT_APIC_CLUSTER));
>
>
>
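
For illustration only, the effect of the one-line change can be modelled
outside of Xen. The sketch below is not the hypervisor code: the enum and
function names are hypothetical stand-ins, and the two FADT/Kconfig clauses
of the real expression are collapsed into a single assumed boolean
(firmware_wants_physical). It shows why the old test let a "restricted"
interrupt remapping setup fall through to cluster mode, while the new test
forces physical mode unless remapping is "full".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the three interrupt remapping states
 * discussed in the thread (off / restricted / full).
 */
enum intremap_mode {
    INTREMAP_OFF,
    INTREMAP_RESTRICTED,
    INTREMAP_FULL,
};

/*
 * Model of the pre-patch test: physical mode is forced only when
 * remapping is entirely off, so "restricted" is treated like "full".
 */
bool force_phys_old(enum intremap_mode mode, bool firmware_wants_physical)
{
    return !mode || firmware_wants_physical;
}

/*
 * Model of the post-patch test: anything short of full remapping
 * forces physical mode.
 */
bool force_phys_new(enum intremap_mode mode, bool firmware_wants_physical)
{
    return mode != INTREMAP_FULL || firmware_wants_physical;
}

/* Small self-check of the model; aborts if an expectation fails. */
void model_self_check(void)
{
    /* The case from this report: restricted remapping, no firmware hint. */
    assert(!force_phys_old(INTREMAP_RESTRICTED, false)); /* cluster mode */
    assert(force_phys_new(INTREMAP_RESTRICTED, false));  /* physical mode */

    /* The two tests agree when remapping is off or full. */
    assert(force_phys_old(INTREMAP_OFF, false) ==
           force_phys_new(INTREMAP_OFF, false));
    assert(force_phys_old(INTREMAP_FULL, false) ==
           force_phys_new(INTREMAP_FULL, false));
}
```

In this model the two tests differ only for the restricted case, which
matches the "Auto" BIOS setting logs quoted above (no "- x2APIC" IOMMU
feature line, yet the x2apic_cluster driver selected).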