[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/ioapic: sanitize IO-APIC pins before enabling the local APIC
On 12.07.2023 17:41, Roger Pau Monné wrote: > On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote: >> Hmm. The question really is which of the changes we want to backport. >> That one should be first. While it's largely guesswork, I'd be more >> inclined to take the change that has less of an effect overall. > > My views would be the other way around I think. Current code already > has a comment to notice that IO-APIC pins might be wrongly unmasked, > and there's also logic for masking them when the IO-APICs are > initialized. The fact that such logic is placed after the local APIC > has been initialized is IMO a bug, as having unmasked unconfigured > IO-APIC pins when the local APIC is enabled should be avoided. Hmm, if you really meant this (and not setting up / unmasking of LVTERR), then your change would still be insufficient. We may enable the APIC for the BSP in init_bsp_APIC() (which is called quite a bit earlier), and the APIC may also have been enabled by firmware already, so I don't view this argument as fully convincing. (That said, init_bsp_APIC() calls clear_local_APIC(), so while the LAPIC may be enabled, errors would be reported only in ESR, not by delivering interrupts.) Which gets me back to another aspect of your scenario that I haven't fully understood: In the description you say that booting fails. Since we handle the errors, the implication is that the pin remains constantly active (thus triggering errors over and over again). Yet how would this not already cause problems ahead of smp_prepare_cpus() if the LAPIC was already enabled? Wouldn't we need to do part of clear_local_APIC() from init_bsp_APIC() before bailing from there when smp_found_config is set? ("Part of" because as per the comment at the top of init_bsp_APIC() we apparently would need to leave LVT0 [and then perhaps also LVT1] unmasked.) >> That said I can see that Linux have their enable_IO_APIC() calls >> also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and >> setup_local_APIC()). IOW with your change we'd do the masking yet >> earlier than them. This may of course even be advantageous, but >> there may also be good reasons for the sequence they're using. > > Linux calls enable_IO_APIC() before setting up the local > APIC LVT Error vector (that's done in end_local_APIC_setup()). That > logic seems to be introduced by commit: > > 1c69524c2e5b x86: clear IO_APIC before enabing apic error vector. > > Might it be less controversial to do like Linux does and call > enable_IO_APIC() before the local APIC ESR is setup? You already do so, just that you do it yet earlier. I'm not convinced it needs doing from the middle of setup_local_APIC() (or, like nowaday's Linux has it, with ESR / LVTERR setup split to a separate function, and enable_IO_APIC() called between those two LAPIC related calls). You also disliked putting the call at the beginning of setup_local_APIC(), so putting it in the middle of it might be yet worse when taking that perspective. (Another downside of calling it from anywhere in setup_local_APIC() is that this would be another __init function called from a non-__init one. We have examples of such, and keying the call to "bsp" would leave it safe, but avoiding such calls when we easily can is probably a worthwhile secondary goal.) Question is whether it can sensibly be moved at least a little later: verify_local_APIC() isn't of much use anyway, first and foremost because we ignore its return value. And connect_bsp_APIC() largely is concerned about leaving PIC mode. So maybe put the call immediately ahead of the setup_local_APIC(true) ones? A further question is whether, considering that Linux continues to use that name, we wouldn't be better off not renaming the function. One other thing I finally figured was confusing me in the description of the patch; re-quoting that paragraph here: "Fix this by moving the masking of IO-APIC pins ahead of the enabling of the local APIC. Note that before doing such masking Xen attempts to detect the pin where the legacy i8259 is connected, and that logic relies on the pin being unmasked, hence the logic is also moved ahead of enabling the local APIC." The last sentence saying "also" is kind of odd with the first one already stating this very movement. To me it's therefore unclear what exactly the second sentence is intended to be telling me. I guess you want to express that together with the making the detection logic is also moved (i.e. the entire function is called earlier), but I'm afraid this isn't the only way to read that second sentence. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |