
Re: [XEN PATCH] x86/ACPI: Ignore entries with invalid APIC IDs when parsing MADT



Jan!

On Wed, Aug 30 2023 at 09:20, Jan Beulich wrote:
> On 30.08.2023 00:54, Thomas Gleixner wrote:
>> On Tue, Aug 29 2023 at 16:25, Roger Pau Monné wrote:
>> 
>> Correct. These IDs are invalid independent of any flag value. 
>
> What we apparently agree on is these special UID values to be invalid,
> and that UIDs can't change. But that's not the same for the APIC IDs;
> see below. (As a side note, Xen range-checks UID against its
> implementation limit MAX_MADT_ENTRIES, so it's more than just the
> all-ones values which we'd reject. Not really correct, I know. Looks
> like Linux has done away with the simple x86_acpiid_to_apicid[]
> translation mechanism. This being a statically sized array requires
> this restriction in Xen, for the time being.)

Linux, too, ignores entries once the maximum number of CPUs is reached.
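To illustrate the point about invalid IDs: per the ACPI spec the all-ones value is invalid for both the 8-bit Local APIC ID and the 32-bit x2APIC ID, independent of any flag value. A check along these lines (function and macro names are illustrative, not the actual Linux or Xen code) rejects them up front:

```c
#include <stdint.h>
#include <stdbool.h>

/* Per ACPI, 0xFF is invalid for an 8-bit Local APIC ID and
 * 0xFFFFFFFF is invalid for a 32-bit x2APIC ID, regardless of
 * the entry's flags. Names below are illustrative only. */
#define INVALID_APIC_ID   0xFFu
#define INVALID_X2APIC_ID 0xFFFFFFFFu

static bool lapic_id_valid(uint8_t id)
{
        return id != INVALID_APIC_ID;
}

static bool x2apic_id_valid(uint32_t id)
{
        return id != INVALID_X2APIC_ID;
}
```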

>>> I think Jan's point (if I understood correctly) is that Processor or
>>> Device objects can have a _MAT method that returns updated MADT
>>> entries, and some ACPI implementations might modify the original
>>> entries on the MADT and return them from that method when CPU
>>> hotplug takes place.
>
> Just to mention it: It's not just "might". I've seen DSDT code doing
> so on more than one occasion.

That does not make it more correct or better.

> As stated before, unless putting in place extra restrictions, I can't
> even see how firmware would be able to up front determine APIC IDs for
> unpopulated sockets: It simply can't know the topology of a package
> that's not there yet. Requiring all packages to have identical
> topology might be a restriction OSes put in place, but I'd be inclined
> to call firmware buggy if it did (short of me being aware of there
> being anything in the spec putting in place such a restriction).

The ACPI specification does not care about restrictions which are in the
realm of hardware. You simply cannot mix random CPUs in a system just as
you see fit.

But that aside. ACPI-based hotplug is used purely by virtualization. The
efforts to support real physical hotplug never advanced beyond the
proof-of-concept stage, as the required complexity turned out not to be
worth the potential benefit.

Of course virtualization people might think that everything which is
imaginable and not explicitly forbidden by some specification is
something which should be supported. That's just a recipe for disaster
as it needlessly expands the complexity space for absolutely zero value.

In reality, most OSes require that all possible APIC IDs be enumerated
during initialization in order to size data structures and make
system-wide decisions correctly during boot. Otherwise they will either
fail to accept hot-added CPUs later on, or end up in a situation where,
after accepting a hot-added CPU, it turns out that system-wide boot-time
decisions were wrong or data was sized incorrectly, which leaves them
pretty much up a creek without a paddle.

So in order to avoid these hard to handle, hard to debug and diagnose
failure cases, it's sensible when OSes mandate enumeration requirements
which have been omitted from the specification for whatever reason.

Linux expects this today, and when the expectation is not met it suffers
from hard-to-diagnose malfunctions due to the lack of enforcement. Those
issues might be fixable by some definition of fixable, but the value of
doing so is close to zero. In fact it would be a net negative, because
the increased complexity would put a completely unjustifiable
maintainability burden on the code base.

Coming back to the specification issues. As of v6.3 the Online Capable
flag was added with the following rationale (paraphrased):

     Operating systems need to size resources at boot time and therefore
     they count the APIC IDs which have the enabled flag cleared to size
     correctly for potential hotplug operations.

     But that has adverse effects on bare metal because the OS is not
     able to distinguish between hot-pluggable APIC IDs and truly
     disabled entries. That results in overallocation or suboptimal
     distribution of multi-queue devices.

The benefit (verbatim):

     The proposed “Online Capable” flag will allow OSPM to unequivocally
     discern the platform’s intention regarding processors that are not
     enabled at OS boot-time.

Now look at the outcome:

Enabled:

        If this bit is set the processor is ready for use. If this bit
        is clear and the Online Capable bit is set, system hardware
        supports enabling this processor during OS runtime. If this bit
        is clear and the Online Capable bit is also clear, this
        processor is unusable, and OSPM shall ignore the contents of the
        Processor Local APIC Structure.
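Decoding that spec paragraph into code, something like the following
classification falls out (the flag bits match the ACPI 6.3 Local APIC
flags layout, bit 0 Enabled and bit 1 Online Capable; the macro and enum
names are my own, not taken from any implementation):

```c
#include <stdint.h>

/* ACPI 6.3 Local APIC flags: bit 0 = Enabled, bit 1 = Online Capable.
 * Macro/enum names here are illustrative, not from Linux or Xen. */
#define MADT_ENABLED        (1u << 0)
#define MADT_ONLINE_CAPABLE (1u << 1)

enum cpu_state { CPU_USABLE, CPU_HOTPLUGGABLE, CPU_IGNORE };

static enum cpu_state classify_lapic(uint32_t flags)
{
        if (flags & MADT_ENABLED)
                return CPU_USABLE;       /* ready for use */
        if (flags & MADT_ONLINE_CAPABLE)
                return CPU_HOTPLUGGABLE; /* can be enabled at runtime */
        return CPU_IGNORE;               /* spec: ignore the entry */
}
```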

So while I concede that this does not expressis verbis mandate that
hot-pluggable CPUs must be enumerated, it's not far-fetched to interpret
it this way. The key is 'OSPM shall ignore the contents...'

As the MADT is the only source of information from which an OS can
deduce sizing and topology information during early boot, ignoring an
entry's contents boils down to not allocating resources for it. If such
an APIC ID magically surfaces later on via hot-add, the OS can
rightfully refuse to add it.
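A minimal sketch of that refusal, assuming the OS records the set of
APIC IDs enumerated at boot (the fixed-size array, limit, and function
names are illustrative, not the actual implementation):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch: remember the APIC IDs enumerated in the MADT
 * during early boot, then reject hot-add requests for any APIC ID
 * that was not part of that boot-time set. */
#define MAX_POSSIBLE_CPUS 64    /* illustrative limit */

static uint32_t possible_apic_ids[MAX_POSSIBLE_CPUS];
static unsigned int nr_possible;

static void record_boot_apic_id(uint32_t apic_id)
{
        if (nr_possible < MAX_POSSIBLE_CPUS)
                possible_apic_ids[nr_possible++] = apic_id;
}

static bool may_hot_add(uint32_t apic_id)
{
        for (unsigned int i = 0; i < nr_possible; i++)
                if (possible_apic_ids[i] == apic_id)
                        return true;
        return false;   /* never enumerated at boot: refuse */
}
```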

Having that information is simply a correctness requirement, and there
is absolutely no justification to support made-up ivory-tower
configurations which try to exploit the gaps and blurriness of the ACPI
specification.

Of course we can't rely on that flag yet if the table does not implement
v6.3+, but from my testing there is no fallout from refusing to hot-add
CPUs which have not been enumerated in the MADT during early boot. The
overallocation issue addressed by the Online Capable bit is obviously
still there when the enumerated "disabled" APIC IDs can never be
hot-added.

Feel free to disagree, but Linux will enforce that _all_ possible APIC
IDs are enumerated in MADT during early boot in the near future. This
makes a lot of things correct, simpler and more robust.

Thanks,

        tglx
