[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/ioapic: sanitize IO-APIC pins before enabling the local APIC


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 13 Jul 2023 09:49:14 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=DjO+xW2IuCqUIrUfCx5Fb7+6nhCjDXBnnfII4gE1vTE=; b=jOZhQBB00dv4eNZ3Hprh+/l1Q8hX3f44TW8OUqNxF08jZKvUg2vTy+J5fmqAJrb9lXtQEO8uyylIz7Br62spu9dgFPpd3YRyeomKSoleIGtV1oh4bOaHO+rui636iUtj9GMetCez21KlGNOv3P7NdS50iieD+z//dmH1RyMN38H2J8lba5Gg8shqPvSfoMDrhebo6dcFmKP07yLK4H+qWeOGl4heWuGVCOjRhWuW4eJZ8m00wmAiSgwA+6kRkMXDovOUr9eSj+wrK7L0is72UHqzprawmlbY8DtYcpKQpr+ZC3b7zEQkM1ajrgCsiNqpbRCVG/dkWUOdswZF/ZgxBQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NsmTe9e9CqaqAZcFBtgOTyh5dchxgcew9QBvkFa2x34+QxJfnaNMAXT+x8lD90KFXMXkhxmfRffYdZaEaW72FEIg6rrMa259VigcfEo+A+iB1nGGR7VaHgGx3AS5D53sX8MLKQfekErWvX8U2ojlw9pyv2kkL3P67StEKHykvFh+J2ORSV3uQeib0Amk1ewZMuUwYG5EUQM2G/3quSllfejZTpzfNciGb5/ysAPO4irV4O4AD7u4JFPQyftTLyyEtsw54k50HPNwhJRARL5+7h+wRCOvB1Izxx/vNY531709AFdzM98kQSbiyjfQZYi/5O89nwXX81c1bZKiWSmSrg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 13 Jul 2023 07:49:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 12.07.2023 17:41, Roger Pau Monné wrote:
> On Wed, Jul 12, 2023 at 04:50:34PM +0200, Jan Beulich wrote:
>> Hmm. The question really is which of the changes we want to backport.
>> That one should be first. While it's largely guesswork, I'd be more
>> inclined to take the change that has less of an effect overall.
> 
> My views would be the other way around I think.  Current code already
> has a comment to notice that IO-APIC pins might be wrongly unmasked,
> and there's also logic for masking them when the IO-APICs are
> initialized.  The fact that such logic is placed after the local APIC
> has been initialized is IMO a bug, as having unmasked unconfigured
> IO-APIC pins when the local APIC is enabled should be avoided.

Hmm, if you really meant this (and not setting up / unmasking of
LVTERR), then your change would still be insufficient. We may enable
the APIC for the BSP in init_bsp_APIC() (which is called quite a bit
earlier), and the APIC may also have been enabled by firmware already,
so I don't view this argument as fully convincing. (That said,
init_bsp_APIC() calls clear_local_APIC(), so while the LAPIC may be
enabled, errors would be reported only in ESR, not by delivering
interrupts.)

Which gets me back to another aspect of your scenario that I haven't
fully understood: In the description you say that booting fails.
Since we handle the errors, the implication is that the pin remains
constantly active (thus triggering errors over and over again). Yet
how would this not already cause problems ahead of smp_prepare_cpus()
if the LAPIC was already enabled? Wouldn't we need to do part of
clear_local_APIC() from init_bsp_APIC() before bailing from there
when smp_found_config is set? ("Part of" because as per the comment
at the top of init_bsp_APIC() we apparently would need to leave LVT0
[and then perhaps also LVT1] unmasked.)

>> That said I can see that Linux have their enable_IO_APIC() calls
>> also ahead of setup_IO_APIC() (but after connect_bsp_APIC() and
>> setup_local_APIC()). IOW with your change we'd do the masking yet
>> earlier than them. This may of course even be advantageous, but
>> there may also be good reasons for the sequence they're using.
> 
> Linux calls enable_IO_APIC() before setting up the local
> APIC LVT Error vector (that's done in end_local_APIC_setup()).  That
> logic seems to be introduced by commit:
> 
> 1c69524c2e5b x86: clear IO_APIC before enabing apic error vector.
> 
> Might it be less controversial to do like Linux does and call
> enable_IO_APIC() before the local APIC ESR is setup?

You already do so, just that you do it yet earlier. I'm not
convinced it needs doing from the middle of setup_local_APIC() (or,
like nowaday's Linux has it, with ESR / LVTERR setup split to a
separate function, and enable_IO_APIC() called between those two
LAPIC related calls). You also disliked putting the call at the
beginning of setup_local_APIC(), so putting it in the middle of it
might be yet worse when taking that perspective. (Another downside
of calling it from anywhere in setup_local_APIC() is that this
would be another __init function called from a non-__init one. We
have examples of such, and keying the call to "bsp" would leave it
safe, but avoiding such calls when we easily can is probably a
worthwhile secondary goal.)

Question is whether it can sensibly be moved at least a little
later: verify_local_APIC() isn't of much use anyway, first and
foremost because we ignore its return value. And connect_bsp_APIC()
largely is concerned about leaving PIC mode. So maybe put the call
immediately ahead of the setup_local_APIC(true) ones?

A further question is whether, considering that Linux continues to
use that name, we wouldn't be better off not renaming the function.

One other thing I finally figured was confusing me in the
description of the patch; re-quoting that paragraph here:

"Fix this by moving the masking of IO-APIC pins ahead of the enabling
 of the local APIC.  Note that before doing such masking Xen attempts
 to detect the pin where the legacy i8259 is connected, and that logic
 relies on the pin being unmasked, hence the logic is also moved ahead
 of enabling the local APIC."

The last sentence saying "also" is kind of odd with the first one
already stating this very movement. To me it's therefore unclear what
exactly the second sentence is intended to be telling me. I guess you
want to express that together with the making the detection logic is
also moved (i.e. the entire function is called earlier), but I'm
afraid this isn't the only way to read that second sentence.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.