[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/ioapic: sanitize IO-APIC pins before enabling the local APIC
On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote: > On 12.07.2023 15:53, Roger Pau Monné wrote: > > On Wed, Jul 12, 2023 at 12:07:43PM +0200, Jan Beulich wrote: > >> On 11.07.2023 15:02, Roger Pau Monné wrote: > >>> On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote: > >>>> On 10.07.2023 16:43, Roger Pau Monné wrote: > >>>>> On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote: > >>>>>> On 07.07.2023 11:53, Roger Pau Monne wrote: > >>>>>>> The current logic to init the local APIC and the IO-APIC does init the > >>>>>>> former first before doing any kind of sanitation on the IO-APIC pin > >>>>>>> configuration. It's already noted on enable_IO_APIC() that Xen > >>>>>>> shouldn't trust the IO-APIC being empty at bootup. > >>>>>>> > >>>>>>> At XenServer we have a system where the IO-APIC 0 is handed to Xen > >>>>>>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and > >>>>>>> with a vector of 0 (all fields of the RTE are zeroed). Once the local > >>>>>>> APIC is enabled periodic injections from such pin cause a storm of > >>>>>>> errors: > >>>>>>> > >>>>>>> APIC error on CPU0: 00(40), Received illegal vector > >>>>>>> APIC error on CPU0: 40(40), Received illegal vector > >>>>>>> APIC error on CPU0: 40(40), Received illegal vector > >>>>>>> APIC error on CPU0: 40(40), Received illegal vector > >>>>>>> APIC error on CPU0: 40(40), Received illegal vector > >>>>>>> APIC error on CPU0: 40(40), Received illegal vector > >>>>>>> > >>>>>>> That prevents Xen from booting. > >>>>>> > >>>>>> And I expect no RTE is in ExtInt mode, so one might conclude that > >>>>>> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit > >>>>>> of course there's then the question whether to change the RTE > >>>>>> rather than masking it. What do ACPI tables say? > >>>>> > >>>>> There's one relevant override: > >>>>> > >>>>> [668h 1640 1] Subtable Type : 02 [Interrupt Source > >>>>> Override] > >>>>> [669h 1641 1] Length : 0A > >>>>> [66Ah 1642 1] Bus : 00 > >>>>> [66Bh 1643 1] Source : 00 > >>>>> [66Ch 1644 4] Interrupt : 00000002 > >>>>> [670h 1648 2] Flags (decoded below) : 0000 > >>>>> Polarity : 0 > >>>>> Trigger Mode : 0 > >>>>> > >>>>> So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is > >>>>> connected to. > >>>> > >>>> Then wouldn't we be better off converting that RTE to ExtInt? That > >>>> would allow PIC IRQs (not likely to exist, but still) to work > >>>> (without undue other side effects afaics), whereas masking this RTE > >>>> would not. > >>> > >>> I'm kind of worry of trying to automate this logic. Should we always > >>> convert pin 0 to ExtInt if it's found unmasked, with and invalid > >>> vector and there's a source override entry for the IRQ? > >>> > >>> It seems weird to infer this just from the fact that pin 0 is all > >>> zeroed out. > >> > >> As you say in the earlier paragraph, it wouldn't be just that. It's > >> unmasked + bad vector + present source override (indicating that > >> nothing except perhaps a PIC is connected to the pin). > > > > I can do this as a separate patch, but not really here IMO. The > > purpose of this patch is strictly to perform the IO-APIC sanitation > > ahead of enabling the local APIC. I will add a followup patch to the > > series, albeit I'm unconvinced we want to infer IO-APIC pin > > configuration based on firmware miss configurations. > > Hmm. The question really is which of the changes we want to backport. > That one should be first. While it's largely guesswork, I'd be more > inclined to take the change that has less of an effect overall. My views would be the other way around I think. Current code already has a comment to notice that IO-APIC pins might be wrongly unmasked, and there's also logic for masking them when the IO-APICs are initialized. The fact that such logic is placed after the local APIC has been initialized is IMO a bug, as having unmasked unconfigured IO-APIC pins when the local APIC is enabled should be avoided. > That said I can see that Linux have their enable_IO_APIC() calls > also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and > setup_local_APIC()). IOW with your change we'd do the masking yet > earlier than them. This may of course even be advantageous, but > there may also be good reasons for the sequence they're using. Linux calls enable_IO_APIC() before setting up the local APIC LVT Error vector (that's done in end_local_APIC_setup()). That logic seems to be introduced by commit: 1c69524c2e5b x86: clear IO_APIC before enabing apic error vector. Might it be less controversial to do like Linux does and call enable_IO_APIC() before the local APIC ESR is setup? Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |