
Re: [BUG] x2apic broken with current AMD hardware



On Tue, Mar 21, 2023 at 08:13:15AM +0100, Jan Beulich wrote:
> On 21.03.2023 05:19, Elliott Mitchell wrote:
> 
> > The above appears about twice for each core, then I start seeing
> > "(XEN) CPU#: No irq handler for vector ?? (IRQ -2147483648, LAPIC)"
> > 
> > The core doesn't vary too much with this, but the vector varies some.
> > 
> > Upon looking "(XEN) Using APIC driver x2apic_cluster".  Unfortunately
> > I didn't try booting with x2apic_phys forced with this setting.
> 
> My guess is that this would also help. But the system should still work
> correctly in clustered mode. As a first step I guess debug key 'i', 'z',
> and 'M' output may provide some insight. But the request for a full log
> at maximum verbosity also remains (ideally with a debug hypervisor).

Sending everything needs a secure channel (most likely PGP), since the
full log contains too much sensitive information.  Serial numbers and MAC
addresses are a potential avenue for attack (or for faking returns).
Smaller segments can be had more readily:

(XEN) SMBIOS 3.5 present.
(XEN) x2APIC mode is already enabled by BIOS.
(XEN) Using APIC driver x2apic_cluster
(XEN) ACPI: PM-Timer IO Port: 0x808 (32 bits)
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:804,1:0], pm1x_evt[1:800,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 785a3000/0000000000000000, using 32
(XEN) ACPI:             wakeup_vec[785a300c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: IOAPIC (id[0x20] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 32, version 33, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x21] address[0xfec01000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 33, version 33, address 0xfec01000, GSI 24-55
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
(XEN) ACPI: HPET id: 0x######## base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) IRQ limits: 56 GSI, 6600 MSI/MSI-X
(XEN) AMD-Vi: IOMMU Extended Features:
(XEN) - Peripheral Page Service Request
(XEN) - x2APIC
(XEN) - NX bit
(XEN) - Guest APIC Physical Processor Interrupt
(XEN) - Invalidate All Command
(XEN) - Guest APIC
(XEN) - Performance Counters
(XEN) - Host Address Translation Size: 0x2
(XEN) - Guest Address Translation Size: 0
(XEN) - Guest CR3 Root Table Level: 0x1
(XEN) - Maximum PASID: 0xf
(XEN) - SMI Filter Register: 0x1
(XEN) - SMI Filter Register Count: 0x1
(XEN) - Guest Virtual APIC Modes: 0x1
(XEN) - Dual PPR Log: 0x2
(XEN) - Dual Event Log: 0x2
(XEN) - Secure ATS
(XEN) - User / Supervisor Page Protection
(XEN) - Device Table Segmentation: 0x3
(XEN) - PPR Log Overflow Early Warning
(XEN) - PPR Automatic Response
(XEN) - Memory Access Routing and Control: 0x1
(XEN) - Block StopMark Message
(XEN) - Performance Optimization
(XEN) - MSI Capability MMIO Access
(XEN) - Guest I/O Protection
(XEN) - Enhanced PPR Handling
(XEN) - Invalidate IOTLB Type
(XEN) - VM Table Size: 0x2
(XEN) - Guest Access Bit Update Disable
(XEN) AMD-Vi: Disabled HAP memory map sharing with IOMMU
(XEN) AMD-Vi: IOMMU 0 Enabled.
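
For larger excerpts, masking the sensitive fields before posting may be
an alternative to a secure channel.  The following is only a rough
sketch, not something used for the log above: the function name
redact_macs and the sample line are made up for illustration, and it
only catches MAC addresses written as six colon- or dash-separated hex
octets.  Serial numbers have no fixed format and would still need to be
checked by hand.

```python
import re

# Six hex octets separated by ':' or '-', e.g. 00:1a:2b:3c:4d:5e.
# Assumption: MAC addresses in the log use this textual form.
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5}\b")


def redact_macs(text: str) -> str:
    """Replace anything that looks like a MAC address with a fixed mask."""
    return MAC_RE.sub("##:##:##:##:##:##", text)


# Made-up sample line, not taken from a real log.
sample = "(XEN) example: nic0 MAC 00:1a:2b:3c:4d:5e"
print(redact_macs(sample))  # (XEN) example: nic0 MAC ##:##:##:##:##:##
```

Lines without such a pattern, including everything in the excerpt
above, pass through unchanged.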

I'm a bit concerned that all the reports so far involve ASUS motherboards.
This could mean people who get the latest and greatest hardware and use
Xen tend towards ASUS motherboards.  This could also mean ASUS's
engineering team did something to their latest round.  Both are possible.

It could well be that the latest round from AMD includes more pieces from
their server processors, which trigger x2apic_cluster as the default, yet
does not bring some extra portion(s) which are required by x2apic_cluster.


> > So x2apic_cluster is looking like a <ahem> on recent AMD processors.
> > 
> > 
> > I'm unsure this qualifies as "Tested-by".  Certainly it IS an
> > improvement, but the problem certainly isn't 100% solved.
> 
> There simply are multiple problems; one looks to be solved now.

I agree with that assessment.  I'm just unsure whether this step is
enough to warrant a "Tested-by".  I'm concerned there could be a single
deeper problem whose fix would resolve everything at once.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





 

