
Re: [XEN PATCH] x86/ACPI: Ignore entries with invalid APIC IDs when parsing MADT



Jan!

On Wed, Aug 30 2023 at 09:20, Jan Beulich wrote:
> On 30.08.2023 00:54, Thomas Gleixner wrote:
>> On Tue, Aug 29 2023 at 16:25, Roger Pau Monné wrote:
>> 
>> Correct. These IDs are invalid independent of any flag value. 
>
> What we apparently agree on is these special UID values to be invalid,
> and that UIDs can't change. But that's not the same for the APIC IDs;
> see below. (As a side note, Xen range-checks UID against its
> implementation limit MAX_MADT_ENTRIES, so it's more than just the
> all-ones values which we'd reject. Not really correct, I know. Looks
> like Linux has done away with the simple x86_acpiid_to_apicid[]
> translation mechanism. This being a statically sized array requires
> this restriction in Xen, for the time being.)

Linux, too, ignores entries once the maximum number of CPUs is reached.
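To illustrate the point about invalid IDs: per the ACPI spec the all-ones value is invalid for both the 8-bit Local APIC ID and the 32-bit x2APIC ID, independent of any flag value. A check along these lines (function and macro names are illustrative, not the actual Linux or Xen code) rejects them up front:

```c
#include <stdint.h>
#include <stdbool.h>

/* Per ACPI, 0xFF is invalid for an 8-bit Local APIC ID and
 * 0xFFFFFFFF is invalid for a 32-bit x2APIC ID, regardless of
 * the entry's flags. Names below are illustrative only. */
#define INVALID_APIC_ID   0xFFu
#define INVALID_X2APIC_ID 0xFFFFFFFFu

static bool lapic_id_valid(uint8_t id)
{
        return id != INVALID_APIC_ID;
}

static bool x2apic_id_valid(uint32_t id)
{
        return id != INVALID_X2APIC_ID;
}
```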

>>> I think Jan's point (if I understood correctly) is that Processor or
>>> Device objects can have a _MAT method that returns updated MADT
>>> entries, and some ACPI implementations might modify the original
>>> entries on the MADT and return them from that method when CPU
>>> hotplug takes place.
>
> Just to mention it: It's not just "might". I've seen DSDT code doing
> so on more than one occasion.

That does not make it more correct or better.

> As stated before, unless putting in place extra restrictions, I can't
> even see how firmware would be able to up front determine APIC IDs for
> unpopulated sockets: It simply can't know the topology of a package
> that's not there yet. Requiring all packages to have identical
> topology might be a restriction OSes put in place, but I'd be inclined
> to call firmware buggy if it did (short of me being aware of there
> being anything in the spec putting in place such a restriction).

The ACPI specification does not care about restrictions which are in the
realm of hardware. You simply cannot mix random CPUs in a system just as
you see fit.

But that aside. ACPI-based hotplug is used purely by virtualization. The
efforts to support real physical hotplug never advanced beyond the
proof-of-concept stage, as the required complexity turned out not to be
worth the potential benefit.

Of course virtualization people might think that everything which is
imaginable and not explicitly forbidden by some specification is
something which should be supported. That's just a recipe for disaster
as it needlessly expands the complexity space for absolutely zero value.

In reality, most OSes require that all possible APIC IDs be enumerated
during initialization in order to size data structures and make
system-wide decisions correctly during boot. Otherwise they will either
fail to accept hot-added CPUs later on, or end up in a situation where,
after accepting a hot-added CPU, it turns out that system-wide boot-time
decisions were wrong or data was sized incorrectly, which leaves them
pretty much up a creek without a paddle.

So in order to avoid these hard to handle, hard to debug and diagnose
failure cases, it's sensible when OSes mandate enumeration requirements
which have been omitted from the specification for whatever reason.

Linux expects this today, and when the expectation is not met it suffers
from hard-to-diagnose malfunctions due to the lack of enforcement. Those
issues might be fixable by some definition of fixable, but the value of
doing so is close to zero. In fact it would be a net negative, because
the increased complexity would put a completely unjustifiable
maintainability burden on the code base.

Coming back to the specification issues. As of v6.3 the Online Capable
flag was added with the following rationale (paraphrased):

     Operating systems need to size resources at boot time and therefore
     they count the APIC IDs which have the enabled flag cleared to size
     correctly for potential hotplug operations.

     But that has adverse effects on bare metal because the OS is not
     able to distinguish between hot-pluggable APIC IDs and truly
     disabled entries. That results in overallocation or suboptimal
     distribution of multi-queue devices.

The benefit (verbatim):

     The proposed “Online Capable” flag will allow OSPM to unequivocally
     discern the platform’s intention regarding processors that are not
     enabled at OS boot-time.

Now look at the outcome:

Enabled:

        If this bit is set the processor is ready for use. If this bit
        is clear and the Online Capable bit is set, system hardware
        supports enabling this processor during OS runtime. If this bit
        is clear and the Online Capable bit is also clear, this
        processor is unusable, and OSPM shall ignore the contents of the
        Processor Local APIC Structure.
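Decoding that spec paragraph into code, something like the following
classification falls out (the flag bits match the ACPI 6.3 Local APIC
flags layout, bit 0 Enabled and bit 1 Online Capable; the macro and enum
names are my own, not taken from any implementation):

```c
#include <stdint.h>

/* ACPI 6.3 Local APIC flags: bit 0 = Enabled, bit 1 = Online Capable.
 * Macro/enum names here are illustrative, not from Linux or Xen. */
#define MADT_ENABLED        (1u << 0)
#define MADT_ONLINE_CAPABLE (1u << 1)

enum cpu_state { CPU_USABLE, CPU_HOTPLUGGABLE, CPU_IGNORE };

static enum cpu_state classify_lapic(uint32_t flags)
{
        if (flags & MADT_ENABLED)
                return CPU_USABLE;       /* ready for use */
        if (flags & MADT_ONLINE_CAPABLE)
                return CPU_HOTPLUGGABLE; /* can be enabled at runtime */
        return CPU_IGNORE;               /* spec: ignore the entry */
}
```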

So while I concede that this does not expressis verbis mandate that
hot-pluggable CPUs must be enumerated, it's not far-fetched to interpret
it this way. The key is 'OSPM shall ignore the contents...'

As the MADT is the only source of information from which an OS can
deduce sizing and topology information during early boot, ignoring an
entry's contents boils down to not allocating resources for it. If such
an APIC ID magically surfaces later on via hot-add, the OS can
rightfully refuse to add it.
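A minimal sketch of that refusal, assuming the OS records the set of
APIC IDs enumerated at boot (the fixed-size array, limit, and function
names are illustrative, not the actual implementation):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch: remember the APIC IDs enumerated in the MADT
 * during early boot, then reject hot-add requests for any APIC ID
 * that was not part of that boot-time set. */
#define MAX_POSSIBLE_CPUS 64    /* illustrative limit */

static uint32_t possible_apic_ids[MAX_POSSIBLE_CPUS];
static unsigned int nr_possible;

static void record_boot_apic_id(uint32_t apic_id)
{
        if (nr_possible < MAX_POSSIBLE_CPUS)
                possible_apic_ids[nr_possible++] = apic_id;
}

static bool may_hot_add(uint32_t apic_id)
{
        for (unsigned int i = 0; i < nr_possible; i++)
                if (possible_apic_ids[i] == apic_id)
                        return true;
        return false;   /* never enumerated at boot: refuse */
}
```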

Having that information is simply a correctness requirement, and there
is absolutely no justification to support made-up ivory-tower
configurations which try to exploit the gaps and blurriness of the ACPI
specification.

Of course we can't rely on that flag yet if the table does not implement
v6.3+, but from my testing there is no fallout from refusing to hot-add
CPUs which have not been enumerated in the MADT during early boot. The
overallocation issue addressed by the Online Capable bit is obviously
still there when the enumerated "disabled" APIC IDs can never be
hot-added.

Feel free to disagree, but Linux will enforce that _all_ possible APIC
IDs are enumerated in MADT during early boot in the near future. This
makes a lot of things correct, simpler and more robust.

Thanks,

        tglx
