[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/ioapic: sanitize IO-APIC pins before enabling the local APIC


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 13 Jul 2023 13:29:34 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hSS7ygLTprxq6xSYbXjCyWXIMAU+wteSYsFZ4WKt2Bc=; b=PGKygkJaefJJCNeALiHYtjGempr/5ZFbpfJIXRIaJo+Pra7MUXfXmXeWGfzPzyPu9ZFjEbb681Wz9aBHaiWqjoFtSN9wLedGhgfx6w0Uqc6WOFtqbonr0G5YCmXQ19ya31mOFkxDor0TgMszEg3Gv8kSuWaxvFReB+j2W7WdNHdSRacDpDjrVdrAvEsRrq52SRR0ly7pUhLkz+Q6Dg6RxanaXmip36naNjkRZ5TqHNwqxMDQhHFVZIdpJYAV3drwnaN7F1YGFuOmkpEu1hVauwrAj646QlzPX32VnNoZ0PrMIrnTSshhtzlqTjrcjEj90rO0+IO45uZsoLLekMEGeA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IGKGT0CPbEEgcKrtSR5HXCZRyYy/yMlmngXUTs+lPq55MN728DBHBp0yKkX9/k7Hi83/0LlF4NtdJRwlyCn1fqr2YoVt1Zd/QCEBig6BYFNPU2Z5s4/am+ksT70r40YMgBRZHds2kVJVKbAyk/y7ETvtmWWbcTK6zp0AleGrP8BCXAmfLGGTpkMsZIdazyCaB0+1yhYyWPVkW11SQlEKrdG30XQXQpnAM6VOInOtbMrisP9EJOG5fzwRKVAyb/HPGccDCP7GK7IbMXiDQ0y0jv1aldCEmOxZhpg37ZbhVGL/idHhXNJ08+6ETyoccGyWgwhFPx31NUBc4UdlVHyIsQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 13 Jul 2023 11:30:07 +0000
  • Ironport-data: A9a23:oHCuEaNBP0DQqffvrR2DlsFynXyQoLVcMsEvi/4bfWQNrUoi3zMPz 2AdWTvXM6zcajakc9B1PoTk8BgGvcDRx4JrSwto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CU6jufQAOKnUoYoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGjxSs/vrRC9H5qyo42tH5wdmPJingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0t0qOT1F9 /EaETAqQAyOhe2MnKmxSNA506zPLOGzVG8ekldJ6GiASNwAEdXESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PpxujaDpOBy+OGF3N79YNuFSN8Thk+Fj mnH4374ElcRM9n3JT+tqyvy37eexXOnMG4UPLS889opnE+z/DQsLAZPe3ibrqKjg2frDrqzL GRRoELCt5Ma9kamU938VB2Qu2Ofs1gXXN84O/I+wBGAzOzT+QnxLngJSHtNZcIrsOcyRCc2z RmZktXxHzttvbaJD3WH+d+pQSiaPCEUKSoIY38CRA5cut37+tht3lTIU8ppF7OzgpvtAzbsz juWrS84wbIOkcoM0Kb99lfC696xmqX0oscOzl2/dgqYAslRPeZJu6TABYDn0Mt9
  • Ironport-hdrordr: A9a23:vIRWUa03YiDPvhC0nOgN8AqjBCskLtp133Aq2lEZdPUMSL3hqy ncpoVh6faUskdhZJhEo7q90ca7MBXhHPJOkOos1NSZLXnbUQmTXfhfBOLZqlWKdkGQmI886U 4KSdkaNDSENykcsS+M2njdLz9P+qjkzEniv5al854kd3AWV4hQqz5jDACVC0t3QxQDK6YYOf Onl7h6jgvlQG8QaMujAHkDQqzknP3k0LzbQTNuPW9V1OGp5QnYnIIS1yLoqyv3eFt0sMsfGX 6sqX2H2kyMiYDE9iPh
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jul 13, 2023 at 09:49:14AM +0200, Jan Beulich wrote:
> On 12.07.2023 17:41, Roger Pau Monné wrote:
> > On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote:
> >> Hmm. The question really is which of the changes we want to backport.
> >> That one should be first. While it's largely guesswork, I'd be more
> >> inclined to take the change that has less of an effect overall.
> > 
> > My views would be the other way around I think.  Current code already
> > has a comment to notice that IO-APIC pins might be wrongly unmasked,
> > and there's also logic for masking them when the IO-APICs are
> > initialized.  The fact that such logic is placed after the local APIC
> > has been initialized is IMO a bug, as having unmasked unconfigured
> > IO-APIC pins when the local APIC is enabled should be avoided.
> 
> Hmm, if you really meant this (and not setting up / unmasking of
> LVTERR), then your change would still be insufficient. We may enable
> the APIC for the BSP in init_bsp_APIC() (which is called quite a bit
> earlier), and the APIC may also have been enabled by firmware already,
> so I don't view this argument as fully convincing. (That said,
> init_bsp_APIC() calls clear_local_APIC(), so while the LAPIC may be
> enabled, errors would be reported only in ESR, not by delivering
> interrupts.)

Indeed.  I should mention the error vector explicitly, as it's not the
local APIC being enabled, but the error vector being setup which
causes the loop here.

> Which gets me back to another aspect of your scenario that I haven't
> fully understood: In the description you say that booting fails.
> Since we handle the errors, the implication is that the pin remains
> constantly active (thus triggering errors over and over again). Yet
> how would this not already cause problems ahead of smp_prepare_cpus()

Before smp_prepare_cpus() (and explicitly before setup_local_APIC())
the error vector is not setup (or that would be my expectation), so
errors are just reported on the ESR, but there's no vector injected.

> if the LAPIC was already enabled? Wouldn't we need to do part of
> clear_local_APIC() from init_bsp_APIC() before bailing from there
> when smp_found_config is set? ("Part of" because as per the comment
> at the top of init_bsp_APIC() we apparently would need to leave LVT0
> [and then perhaps also LVT1] unmasked.)

Right, if the APIC is already enabled by the firmware we do currently
rely on the error vector being masked.  OI think this is a reasonable
expectation, as handing being done with an unknown error vector
unmasked in the local APIC would be bad.  Even with the CPU running
with interrupts disabled we would get error vectors set in IRR.

> 
> >> That said I can see that Linux have their enable_IO_APIC() calls
> >> also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and
> >> setup_local_APIC()). IOW with your change we'd do the masking yet
> >> earlier than them. This may of course even be advantageous, but
> >> there may also be good reasons for the sequence they're using.
> > 
> > Linux calls enable_IO_APIC() before setting up the local
> > APIC LVT Error vector (that's done in end_local_APIC_setup()).  That
> > logic seems to be introduced by commit:
> > 
> > 1c69524c2e5b x86: clear IO_APIC before enabing apic error vector.
> > 
> > Might it be less controversial to do like Linux does and call
> > enable_IO_APIC() before the local APIC ESR is setup?
> 
> You already do so, just that you do it yet earlier. I'm not
> convinced it needs doing from the middle of setup_local_APIC() (or,
> like nowaday's Linux has it, with ESR / LVTERR setup split to a
> separate function, and enable_IO_APIC() called between those two
> LAPIC related calls). You also disliked putting the call at the
> beginning of setup_local_APIC(), so putting it in the middle of it
> might be yet worse when taking that perspective. (Another downside
> of calling it from anywhere in setup_local_APIC() is that this
> would be another __init function called from a non-__init one. We
> have examples of such, and keying the call to "bsp" would leave it
> safe, but avoiding such calls when we easily can is probably a
> worthwhile secondary goal.)

Yes, it's not my preference to call it from setup_local_APIC(),
neither splitting the setup of LVTERR/ESR into a separate function.

> 
> Question is whether it can sensibly be moved at least a little
> later: verify_local_APIC() isn't of much use anyway, first and
> foremost because we ignore its return value. And connect_bsp_APIC()
> largely is concerned about leaving PIC mode. So maybe put the call
> immediately ahead of the setup_local_APIC(true) ones?

Strictly speaking for the issue I have at hand it needs to be done
ahead of setting up LVTERR/ESR.

> A further question is whether, considering that Linux continues to
> use that name, we wouldn't be better off not renaming the function.

Yes, I will leave the name alone, albeit it's IMO a bit misleading,
it's not actually enabling anything, but rather masking the pins and
figuring out the pin where the i8259 might be connected.

> One other thing I finally figured was confusing me in the
> description of the patch; re-quoting that paragraph here:
> 
> "Fix this by moving the masking of IO-APIC pins ahead of the enabling
>  of the local APIC.  Note that before doing such masking Xen attempts
>  to detect the pin where the legacy i8259 is connected, and that logic
>  relies on the pin being unmasked, hence the logic is also moved ahead
>  of enabling the local APIC."
> 
> The last sentence saying "also" is kind of odd with the first one
> already stating this very movement. To me it's therefore unclear what
> exactly the second sentence is intended to be telling me. I guess you
> want to express that together with the making the detection logic is
> also moved (i.e. the entire function is called earlier), but I'm
> afraid this isn't the only way to read that second sentence.

Yes, you are reading it as I was intending.  It's merely a logic that
the i8259 detection is also done slightly earlier, as it relies on the
RTE masking not being tampered with.

What about:

"Fix this by moving the masking of IO-APIC pins ahead of the enabling
 of the local APIC.  Note that before doing such masking Xen attempts
 to detect the pin where the legacy i8259 is connected, and that logic
 relies on the pin being unmasked, hence the logic is also moved ahead
 of enabling the local APIC."

"Move the masking of the IO-APIC pins ahead of the setup of the local
 APIC.  This has the side effect of also moving the detection of the
 pin where the i8259 is connected, as it must be done before masking
 any pins."

Is the aboce any better?  Would you rather prefer me to drop any
mention of the i8259 detection being moved? (as it's just a side
effect of moving the masking).

So to recap, I think we are in agreement that calling enable_IO_APIC()
just ahead of the call to setup_local_APIC() is the preferred
solution?

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.