[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [BUG] x2apic broken with current AMD hardware



On Mon, Mar 20, 2023 at 09:28:20AM +0100, Jan Beulich wrote:
> On 20.03.2023 09:14, Jan Beulich wrote:
> > On 17.03.2023 18:26, Elliott Mitchell wrote:
> >> On Fri, Mar 17, 2023 at 09:22:09AM +0100, Jan Beulich wrote:
> >>> On 16.03.2023 23:03, Elliott Mitchell wrote:
> >>>> On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote:
> >>>>> On 11.03.2023 01:09, Elliott Mitchell wrote:
> >>>>>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote:
> >>>>>>>
> >>>>>>> In any event you will want to collect a serial log at maximum 
> >>>>>>> verbosity.
> >>>>>>> It would also be of interest to know whether turning off the IOMMU 
> >>>>>>> avoids
> >>>>>>> the issue as well (on the assumption that your system has less than 
> >>>>>>> 255
> >>>>>>> CPUs).
> >>>>>>
> >>>>>> I think I might have figured out the situation in a different fashion.
> >>>>>>
> >>>>>> I was taking a look at the BIOS manual for this motherboard and noticed
> >>>>>> a mention of a "Local APIC Mode" setting.  Four values are listed
> >>>>>> "Compatibility", "xAPIC", "x2APIC", and "Auto".
> >>>>>>
> >>>>>> That is the sort of setting I likely left at "Auto" and that may well
> >>>>>> result in x2 functionality being disabled.  Perhaps the x2APIC
> >>>>>> functionality on AMD is detecting whether the hardware is present, and
> >>>>>> failing to test whether it has been enabled?  (could be useful to 
> >>>>>> output
> >>>>>> a message suggesting enabling the hardware feature)
> >>>>>
> >>>>> Can we please move to a little more technical terms here? What is 
> >>>>> "present"
> >>>>> and "enabled" in your view? I don't suppose you mean the CPUID bit 
> >>>>> (which
> >>>>> we check) and the x2APIC-mode-enable one (which we drive as needed). 
> >>>>> It's
> >>>>> also left unclear what the four modes of BIOS operation evaluate to. 
> >>>>> Even
> >>>>> if we knew that, overriding e.g. "Compatibility" (which likely means 
> >>>>> some
> >>>>> form of "disabled" / "hidden") isn't normally an appropriate thing to 
> >>>>> do.
> >>>>> In "Auto" mode Xen likely should work - the only way I could interpret 
> >>>>> the
> >>>>> the other modes are "xAPIC" meaning no x2APIC ACPI tables entries (and
> >>>>> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode pre-
> >>>>> enabled by firmware, and "Auto" leaving it to the OS to select. Yet 
> >>>>> that's
> >>>>> speculation on my part ...
> >>>>
> >>>> I provided the information I had discovered.  There is a setting for this
> >>>> motherboard (likely present on some similar motherboards) which /may/
> >>>> effect the issue.  I doubt I've tried "compatibility", but none of the
> >>>> values I've tried have gotten the system to boot without "x2apic=false"
> >>>> on Xen's command-line.
> >>>>
> >>>> When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended 
> >>>> Features:"
> >>>> I see the line "(XEN) - x2APIC".  Later is the line
> >>>> "(XEN) x2APIC mode is already enabled by BIOS."  I'll guess "Auto"
> >>>> leaves the x2APIC turned off since neither line is present.
> >>>
> >>> When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC
> >>> mode. Are you sure that's the case when using "Auto"?
> >>
> >> grep -eAPIC\ driver -e-\ x2APIC:
> >>
> >> "Auto":
> >> (XEN) Using APIC driver default
> >> (XEN) Overriding APIC driver with bigsmp
> >> (XEN) Switched to APIC driver x2apic_cluster
> >>
> >> "x2APIC":
> >> (XEN) Using APIC driver x2apic_cluster
> >> (XEN) - x2APIC
> >>
> >> Yes, I'm sure.
> > 
> > Okay, this then means we're running in a mode we don't mean to run
> > in: When the IOMMU claims to not support x2APIC mode (which is odd in
> > the first place when at the same time the CPU reports x2APIC mode as
> > supported), amd_iommu_prepare() is intended to switch interrupt
> > remapping mode to "restricted" (which in turn would force x2APIC mode
> > to "physical", not "clustered"). I notice though that there are a
> > number of error paths in the function which bypass this setting. Could
> > you add a couple of printk()s to understand which path is taken (each
> > time; the function can be called more than once)?
> 
> I think I've spotted at least one issue. Could you give the patch below
> a try please? (Patch is fine for master and 4.17 but would need context
> adjustment for 4.16.)

A college has similar issue, where an AMD system hangs during PV dom0
boot, seems like (at least) nvme's interrupts are not delivered. Setting
x2apic_phys=true solves the issue, which hopefully means the patch below
will help too (unfortunately can't test it right now). It's Xen 4.14.

> Jan
> 
> AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
> 
> An earlier change with the same title (commit 1ba66a870eba) altered only
> the path where x2apic_phys was already set to false (perhaps from the
> command line). The same of course needs applying when the variable
> wasn't modified yet from its initial value.
> 
> Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> --- unstable.orig/xen/arch/x86/genapic/x2apic.c
> +++ unstable/xen/arch/x86/genapic/x2apic.c
> @@ -236,11 +236,11 @@ const struct genapic *__init apic_x2apic
>      if ( x2apic_phys < 0 )
>      {
>          /*
> -         * Force physical mode if there's no interrupt remapping support: The
> -         * ID in clustered mode requires a 32 bit destination field due to
> +         * Force physical mode if there's no (full) interrupt remapping 
> support:
> +         * The ID in clustered mode requires a 32 bit destination field due 
> to
>           * the usage of the high 16 bits to hold the cluster ID.
>           */
> -        x2apic_phys = !iommu_intremap ||
> +        x2apic_phys = iommu_intremap != iommu_intremap_full ||
>                        (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
>                        (IS_ENABLED(CONFIG_X2APIC_PHYSICAL) &&
>                         !(acpi_gbl_FADT.flags & ACPI_FADT_APIC_CLUSTER));
> 
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.