Re: [BUG] x2apic broken with current AMD hardware
On 21.03.2023 05:19, Elliott Mitchell wrote:
> On Mon, Mar 20, 2023 at 09:28:20AM +0100, Jan Beulich wrote:
>> AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
>>
>> An earlier change with the same title (commit 1ba66a870eba) altered only
>> the path where x2apic_phys was already set to false (perhaps from the
>> command line). The same of course needs applying when the variable
>> wasn't modified yet from its initial value.
>>
>> Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> This does appear to be an improvement. With this the system boots if
> the "Local APIC Mode" setting is "auto". As you may have guessed,
> "(XEN) Switched to APIC driver x2apic_phys".
>
> When I tried setting "Local APIC Mode" to "x2APIC" though things didn't
> go so well. Sometime >15 seconds after Domain 0 boots, first:
>
> "(XEN) APIC error on CPU#: 00(08), Receive accept error" (looks to be
> for every core)
>
> Then:
> "(XEN) APIC error on CPU#: 08(08), Receive accept error" (again for
> every core, but *after* the above has appeared for all cores)

Receive accept errors generally mean a bad vector was received, yet the
sending side deemed it fine. That could be a bad I/O APIC RTE, a bad MSI
message data value, or a bad translation thereof into an IRTE (albeit
iirc we never alter the vector).

> The above appears about twice for each core, then I start seeing
> "(XEN) CPU#: No irq handler for vector ?? (IRQ -2147483648, LAPIC)"
>
> The core doesn't vary too much with this, but the vector varies some.
>
> Upon looking "(XEN) Using APIC driver x2apic_cluster". Unfortunately
> I didn't try booting with x2apic_phys forced with this setting.

My guess is that this would also help. But the system should still work
correctly in clustered mode. As a first step I guess debug key 'i', 'z',
and 'M' output may provide some insight. But the request for a full log
at maximum verbosity also remains (ideally with a debug hypervisor).

> So x2apic_cluster is looking like a <ahem> on recent AMD processors.
>
>
> I'm unsure this qualifies as "Tested-by". Certainly it IS an
> improvement, but the problem certainly isn't 100% solved.

There simply are multiple problems; one looks to be solved now.

Jan
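
For readers following the "APIC error on CPU#: 00(08)" lines quoted above, here is a minimal sketch of how such a value pair could be decoded. It assumes the two hex numbers are the local APIC Error Status Register (ESR) read before and after the register is rewritten, and that the bit names follow the architectural ESR layout. The helper names `parse_apic_error` and `decode_esr` are hypothetical and not part of Xen.

```python
import re

# Architectural local APIC ESR bits 0-7 (names as commonly printed in logs).
ESR_BITS = {
    0: "Send CS error",
    1: "Receive CS error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Redirectable IPI",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}


def decode_esr(value):
    """Return the names of all error bits set in an ESR value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def parse_apic_error(text):
    """Extract the 'xx(yy)' hex pair from a log fragment.

    Assumption: xx is the ESR before the rewrite, yy the ESR after it.
    Returns a tuple (before, after) of integers, or None if no pair is found.
    """
    match = re.search(r"\b([0-9a-fA-F]{2})\(([0-9a-fA-F]{2})\)", text)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    line = "(XEN) APIC error on CPU#: 00(08), Receive accept error"
    before, after = parse_apic_error(line)
    print(decode_esr(before), decode_esr(after))
```

Under these assumptions, both reported lines carry 0x08 in the second value, i.e. only the "Receive accept error" bit, which matches the text printed in the log.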