
Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup



Dear Thomas,


On 19.04.23 at 14:38, Thomas Gleixner wrote:
On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
On Tue, Apr 18 2023 at 22:10, Paul Menzel wrote:
On 18.04.23 at 10:40, Thomas Gleixner wrote:
Can you please provide the output of cpuid?

Of course. Here is the top, and the whole output is attached.

Thanks for the data. Can you please apply the debug patch below and
provide the dmesg output? Just the line which is added by the patch is
enough. You can boot with cpuhp.parallel=off so you don't have to wait for
10 seconds.

Borislav found a machine which also refuses to boot. It turns out
the debug patch was spot on:

[    0.462724] .... node  #0, CPUs:      #1
[    0.462731] smpboot: Kicking AP alive: 17
[    0.465723]  #2
[    0.465732] smpboot: Kicking AP alive: 18
[    0.467641]  #3
[    0.467641] smpboot: Kicking AP alive: 19

So the kernel gets APICID 17, 18, 19 from ACPI but CPUID leaf 0x1
ebx[31:24], which is the initial APICID, has:

CPU1            0x01
CPU2            0x02
CPU3            0x03

Which means the APICID to Linux CPU number lookup based on CPUID 0x01
fails for all of them and stops them dead in the low level startup code.
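The mismatch described above can be illustrated with a small userspace sketch in C. This is not the kernel's startup code: the function names and the table layout are made up for illustration, and the boot CPU's APIC ID of 0 is an assumption, since only the IDs of CPU1 to CPU3 appear in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 0x1 reports the initial APIC ID in EBX bits 31:24. */
uint32_t initial_apicid_from_ebx(uint32_t ebx)
{
	return ebx >> 24;
}

/*
 * Hypothetical lookup: return the Linux CPU number whose ACPI-provided
 * APIC ID equals apicid, or -1 when no entry matches.
 */
int cpu_from_apicid(const uint32_t *acpi_apicid, size_t ncpus, uint32_t apicid)
{
	assert(acpi_apicid != NULL);

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (acpi_apicid[cpu] == apicid)
			return (int)cpu;
	}
	return -1;
}

/*
 * Count the APs (CPU 1 and up) whose CPUID-derived initial APIC ID does
 * not map back to their own CPU number in the ACPI-derived table.
 */
size_t count_failed_lookups(const uint32_t *acpi_apicid,
			    const uint32_t *cpuid_ebx, size_t ncpus)
{
	size_t failed = 0;

	for (size_t cpu = 1; cpu < ncpus; cpu++) {
		uint32_t id = initial_apicid_from_ebx(cpuid_ebx[cpu]);

		if (cpu_from_apicid(acpi_apicid, ncpus, id) != (int)cpu)
			failed++;
	}
	return failed;
}
```

With an ACPI table of { 0, 17, 18, 19 } and CPUID EBX values whose top byte is 0x00, 0x01, 0x02, 0x03, every AP lookup misses, which matches the behaviour seen in the log.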

I am attaching the logs for completeness. Linux is built from your branch with the debug print on top. The firmware, coreboot based, is built from [1], but it also happened with non-parallel MP init. The code has better debug prints (attached) though, as far as I can see. As Borislav is able to reproduce this too with some non-coreboot firmware, I assume it’s unrelated to coreboot.

```
[    0.259247] smp: Bringing up secondary CPUs ...
[    0.259446] x86: Booting SMP configuration:
[    0.259448] .... node  #0, CPUs:      #1
[    0.259453] smpboot: Kicking AP alive: 17
[   10.260918] CPU1 failed to report alive state
[   10.260998] smp: Brought up 1 node, 1 CPU
[   10.261000] smpboot: Max logical packages: 2
[   10.261001] smpboot: Total of 1 processors activated (7801.09 BogoMIPS)
```

IOW, the BIOS assigns random numbers to the AP APICs for whatever
raisins, which leaves the parallel startup low level code up a creek
without a paddle, except for actually reading the APICID back from the
APIC. *SHUDDER*

I'm leaning towards disabling the CPUID leaf 0x01 based discovery and being
done with it.


Kind regards,

Paul


[1]: https://review.coreboot.org/68169

Attachment: kodi-linux-6.3-rc3-smp-tglx.txt
Description: Text document

Attachment: 20230419-coreboot-cbmem-log-cb-68169.txt
Description: Text document


 

