[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.20 1/2] x86/shutdown: quiesce devices ahead of AP shutdown
On 29.01.2025 11:13, Jan Beulich wrote: > On 28.01.2025 17:27, Roger Pau Monne wrote: >> The current shutdown logic in smp_send_stop() will first disable the APs, >> and then attempt to disable (some) of the interrupt sources. >> >> There are two issues with this approach; the first one being that MSI >> interrupt sources are not disabled, the second one is the APs are stopped >> before interrupts are disabled. On AMD systems this can lead to the >> triggering of local APIC errors: >> >> APIC error on CPU0: 00(08), Receive accept error >> >> Such error message can be printed in a loop, thus blocking the system from >> rebooting. I assume this loop is created by the error being triggered by >> the console interrupt, which is further triggered by the ESR reporting >> write to the console. >> >> Intel SDM states: >> >> "Receive Accept Error. >> >> Set when the local APIC detects that the message it received was not >> accepted by any APIC on the APIC bus, including itself. Used only on P6 >> family and Pentium processors." >> >> So the error shouldn't trigger on any Intel CPU supported by Xen. >> >> However AMD doesn't make such claims, and indeed the error is broadcasted >> to all local APIC when for example an interrupt targets a CPU that's >> offline. >> >> To prevent the error from triggering, move the masking of IO-APIC pins >> ahead of stopping the APs. Also introduce a new function that disables >> MSI and MSI-X on all PCI devices. Remove the call to fixup_irqs() since >> there's no point in attempting to move interrupts: all sources will be >> either masked or disabled. >> >> For the NMI crash path also call the newly introduced function, with the >> hope that disabling MSI and MSI-X will make it easier for the (possible) >> crash kernel to boot, as it could otherwise receive the same "Receive >> accept error" upon re-enabling interrupts. >> >> Note that this will have the side-effect of preventing further IOMMU >> interrupts from being delivered, that's expected and at that point in the >> shutdown process no further interaction with the IOMMU should be relevant. > > This is at most for AMD only. Shouldn't we similarly disable VT-d's > interrupt(s)? (It's only one right now, as we still don't use the QI > completion one.) Even for AMD I'm uncertain: It has separate > hw_irq_controller instances, and its set_iommu_interrupt_handler() is > custom as well. Will pci_disable_msi_all() really affect it? (Hmm, > yes, from amd_iommu_msi_enable() it looks like it will.) Oh, no - not for the x2APIC case. There it's solely iommu->ctrl.int_cap_xt_en which controls whether the interrupt is enabled, iirc. Btw, one of the two calls from enable_iommu() to amd_iommu_msi_enable() is actually wrong (commit d9e49d1afe2e should have moved it instead of adding another call); I'll make a patch for that. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |