[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/ioapic: sanitize IO-APIC pins before enabling the local APIC
On Tue, Jul 11, 2023 at 12:53:31PM +0200, Jan Beulich wrote: > On 10.07.2023 16:43, Roger Pau Monné wrote: > > On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote: > >> On 07.07.2023 11:53, Roger Pau Monne wrote: > >>> The current logic to init the local APIC and the IO-APIC does init the > >>> former first before doing any kind of sanitation on the IO-APIC pin > >>> configuration. It's already noted on enable_IO_APIC() that Xen > >>> shouldn't trust the IO-APIC being empty at bootup. > >>> > >>> At XenServer we have a system where the IO-APIC 0 is handed to Xen > >>> with pin 0 unmasked, set to Fixed delivery mode, edge triggered and > >>> with a vector of 0 (all fields of the RTE are zeroed). Once the local > >>> APIC is enabled periodic injections from such pin cause a storm of > >>> errors: > >>> > >>> APIC error on CPU0: 00(40), Received illegal vector > >>> APIC error on CPU0: 40(40), Received illegal vector > >>> APIC error on CPU0: 40(40), Received illegal vector > >>> APIC error on CPU0: 40(40), Received illegal vector > >>> APIC error on CPU0: 40(40), Received illegal vector > >>> APIC error on CPU0: 40(40), Received illegal vector > >>> > >>> That prevents Xen from booting. > >> > >> And I expect no RTE is in ExtInt mode, so one might conclude that > >> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit > >> of course there's then the question whether to change the RTE > >> rather than masking it. What do ACPI tables say? > > > > There's one relevant override: > > > > [668h 1640 1] Subtable Type : 02 [Interrupt Source > > Override] > > [669h 1641 1] Length : 0A > > [66Ah 1642 1] Bus : 00 > > [66Bh 1643 1] Source : 00 > > [66Ch 1644 4] Interrupt : 00000002 > > [670h 1648 2] Flags (decoded below) : 0000 > > Polarity : 0 > > Trigger Mode : 0 > > > > So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is > > connected to. > > Then wouldn't we be better off converting that RTE to ExtInt? That > would allow PIC IRQs (not likely to exist, but still) to work > (without undue other side effects afaics), whereas masking this RTE > would not. I'm kind of worry of trying to automate this logic. Should we always convert pin 0 to ExtInt if it's found unmasked, with and invalid vector and there's a source override entry for the IRQ? It seems weird to infer this just from the fact that pin 0 is all zeroed out. > >>> --- a/xen/arch/x86/apic.c > >>> +++ b/xen/arch/x86/apic.c > >>> @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void) > >>> return -1; > >>> } > >>> > >>> + if ( smp_found_config && !skip_ioapic_setup && nr_ioapics ) > >>> + /* Sanitize the IO-APIC pins before enabling the local APIC. */ > >>> + sanitize_IO_APIC(); > >> > >> I'm a little puzzled by the smp_found_config part of the check here, > >> but not in smp_prepare_cpus(). What's the reason for (a) the check > >> and (b) the difference? > > > > This just mimics what gates the call to setup_IO_APIC() in that same > > function. It makes no sense to call sanitize_IO_APIC() if > > setup_IO_APIC() is not called, and I wasn't planning on changing the > > logic that gates the call setup_IO_APIC() in this patch. > > > > I did note the difference with smp_prepare_cpus(), and I think we > > should look at unifying those paths, but didn't want to do it as part > > of this fix. > > Well, consistency is one valid goal. But masking RTEs may need to be > done more aggressively than setting up the IO-APICs. In particular > if we're not to use them, we still want to mask all RTEs. Otherwise > we're likely to observe each IRQ to arrive via two separate routes > (once through the 8259 and once from an unmasked IO-APIC pin). So avoid the smp_found_config check here and use the same condition as in smp_prepare_cpus(), I will adjust the patch to that effect. I do wonder why we don't simply mandate IO-APIC usage and get rid of the code to handle the 8259. Is it only to cope with systems that have a broken IO-APIC configuration? Because I don't think there are x86 64bit systems without an IO-APIC? > >> Isn't checking nr_ioapics sufficient in this > >> regard? (b) probably could be addressed by moving the code to the > >> beginning of verify_local_APIC(), immediately ahead of which you > >> insert both call sites. (At which point the function may want naming > >> verify_IO_APIC() for consistency, but that's surely minor.) > > > > I wanted the call to sanitize_IO_APIC() to be done at the same level > > where the call to setup_IO_APIC() is done, because it makes the logic > > clearer. > > Hmm, I see. > > >> As to also checking skip_ioapic_setup - wouldn't the unmasked pin be > >> similarly troublesome in that case? > > > > skip_ioapic_setup is set when the IO-APIC address in the MADT is > > invalid, so in that case attempting to access IO-APIC registers in > > that case won't lead to anything good. > > Of course, as I did say as well. Still if for some IO-APICs we know > their addresses, we would still be able to deal with those (if we > were to stick to this mask-all-RTEs-early model). The issue is that ioapic_init_mappings() will refuse to process further IO-APIC entries once an entry with address 0 is found. We could change that, but I would likely do it as an improvement rather than tie it to this change. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |