[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [BUG] x2apic broken with current AMD hardware
On 20/03/2023 8:28 am, Jan Beulich wrote: > On 20.03.2023 09:14, Jan Beulich wrote: >> On 17.03.2023 18:26, Elliott Mitchell wrote: >>> On Fri, Mar 17, 2023 at 09:22:09AM +0100, Jan Beulich wrote: >>>> On 16.03.2023 23:03, Elliott Mitchell wrote: >>>>> On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote: >>>>>> On 11.03.2023 01:09, Elliott Mitchell wrote: >>>>>>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote: >>>>>>>> In any event you will want to collect a serial log at maximum >>>>>>>> verbosity. >>>>>>>> It would also be of interest to know whether turning off the IOMMU >>>>>>>> avoids >>>>>>>> the issue as well (on the assumption that your system has less than 255 >>>>>>>> CPUs). >>>>>>> I think I might have figured out the situation in a different fashion. >>>>>>> >>>>>>> I was taking a look at the BIOS manual for this motherboard and noticed >>>>>>> a mention of a "Local APIC Mode" setting. Four values are listed >>>>>>> "Compatibility", "xAPIC", "x2APIC", and "Auto". >>>>>>> >>>>>>> That is the sort of setting I likely left at "Auto" and that may well >>>>>>> result in x2 functionality being disabled. Perhaps the x2APIC >>>>>>> functionality on AMD is detecting whether the hardware is present, and >>>>>>> failing to test whether it has been enabled? (could be useful to output >>>>>>> a message suggesting enabling the hardware feature) >>>>>> Can we please move to a little more technical terms here? What is >>>>>> "present" >>>>>> and "enabled" in your view? I don't suppose you mean the CPUID bit (which >>>>>> we check) and the x2APIC-mode-enable one (which we drive as needed). It's >>>>>> also left unclear what the four modes of BIOS operation evaluate to. Even >>>>>> if we knew that, overriding e.g. "Compatibility" (which likely means some >>>>>> form of "disabled" / "hidden") isn't normally an appropriate thing to do. >>>>>> In "Auto" mode Xen likely should work - the only way I could interpret >>>>>> the >>>>>> the other modes are "xAPIC" meaning no x2APIC ACPI tables entries (and >>>>>> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode pre- >>>>>> enabled by firmware, and "Auto" leaving it to the OS to select. Yet >>>>>> that's >>>>>> speculation on my part ... >>>>> I provided the information I had discovered. There is a setting for this >>>>> motherboard (likely present on some similar motherboards) which /may/ >>>>> effect the issue. I doubt I've tried "compatibility", but none of the >>>>> values I've tried have gotten the system to boot without "x2apic=false" >>>>> on Xen's command-line. >>>>> >>>>> When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended >>>>> Features:" >>>>> I see the line "(XEN) - x2APIC". Later is the line >>>>> "(XEN) x2APIC mode is already enabled by BIOS." I'll guess "Auto" >>>>> leaves the x2APIC turned off since neither line is present. >>>> When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC >>>> mode. Are you sure that's the case when using "Auto"? >>> grep -eAPIC\ driver -e-\ x2APIC: >>> >>> "Auto": >>> (XEN) Using APIC driver default >>> (XEN) Overriding APIC driver with bigsmp >>> (XEN) Switched to APIC driver x2apic_cluster >>> >>> "x2APIC": >>> (XEN) Using APIC driver x2apic_cluster >>> (XEN) - x2APIC >>> >>> Yes, I'm sure. >> Okay, this then means we're running in a mode we don't mean to run >> in: When the IOMMU claims to not support x2APIC mode (which is odd in >> the first place when at the same time the CPU reports x2APIC mode as >> supported), amd_iommu_prepare() is intended to switch interrupt >> remapping mode to "restricted" (which in turn would force x2APIC mode >> to "physical", not "clustered"). I notice though that there are a >> number of error paths in the function which bypass this setting. Could >> you add a couple of printk()s to understand which path is taken (each >> time; the function can be called more than once)? > I think I've spotted at least one issue. Could you give the patch below > a try please? (Patch is fine for master and 4.17 but would need context > adjustment for 4.16.) > > Jan > > AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode > > An earlier change with the same title (commit 1ba66a870eba) altered only > the path where x2apic_phys was already set to false (perhaps from the > command line). The same of course needs applying when the variable > wasn't modified yet from its initial value. > > Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx> > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> I think it's worth saying that for diagnosing purposes, if x2apic_phys=true also resolves the issue, then it is likely related to this. ~Andrew > > --- unstable.orig/xen/arch/x86/genapic/x2apic.c > +++ unstable/xen/arch/x86/genapic/x2apic.c > @@ -236,11 +236,11 @@ const struct genapic *__init apic_x2apic > if ( x2apic_phys < 0 ) > { > /* > - * Force physical mode if there's no interrupt remapping support: The > - * ID in clustered mode requires a 32 bit destination field due to > + * Force physical mode if there's no (full) interrupt remapping > support: > + * The ID in clustered mode requires a 32 bit destination field due > to > * the usage of the high 16 bits to hold the cluster ID. > */ > - x2apic_phys = !iommu_intremap || > + x2apic_phys = iommu_intremap != iommu_intremap_full || > (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) || > (IS_ENABLED(CONFIG_X2APIC_PHYSICAL) && > !(acpi_gbl_FADT.flags & ACPI_FADT_APIC_CLUSTER)); > > >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |