Re: Problems with APIC on versions 4.9 and later (4.8 works)
On 20.01.21 09:50, Jan Beulich wrote:
> On 19.01.2021 20:36, Claudemir Todo Bom wrote:
>> I do not have serial output on this setup, so I recorded a video with
>> boot_delay=50 in order to be able to get all the kernel messages:
>> https://youtu.be/y95h6vqoF7Y
>
> This doesn't show any badness afaics.
>
>> This is running 4.14 from debian bullseye (testing). I'm also attaching
>> the dmesg output when booting xen 4.8 with the same kernel version and
>> same parameters.
>>
>> I visually compared all the messages, and the only thing I noticed was
>> that 4.14 used tsc as clocksource and 4.8 used xen. I tried to boot the
>> kernel with "clocksource=xen" and the problem is happening with that
>> also.
>
> There's some confusion here I suppose: The clock source you talk about
> is the kernel's, not Xen's. I didn't think this would change for the
> same kernel version with different Xen underneath, but the Linux
> maintainers of the Xen code there may know better. Cc-ing them.

This might depend on CPUID bits given to dom0 by Xen, e.g. regarding
TSC stability.

>> The "start" of the problem is that when the kernel gets to the
>> "Freeing unused kernel image (initmem) memory: 2380K" it hangs and
>> stays there for a while. After a few minutes it shows that a process
>> (swapper) is blocked for some time (image attached)
>
> Now that's pretty unusual - the call trace seen in the screen shot you
> had attached indicates the kernel didn't even make it past its own
> initialization just yet. Just to have explored that possibility - could
> you enable Xen's NMI watchdog (simply "watchdog" on the Xen command
> line)? Among the boot messages there ought to be one indicating whether
> it actually works on your system. Without a serial console you wouldn't
> see anything if it triggers, but the system would then never make it to
> the kernel side issue.
>
> As far as making sure we at least see all kernel messages - are you
> having "ignore_loglevel" in place? I don't think I've been able to spot
> the kernel command line anywhere in the video.
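
The questions above (is "ignore_loglevel" in place, which clocksource is
the kernel actually using) can be answered from inside dom0 on a boot
that completes, for example the Xen 4.8 one. The following is only a
minimal sketch, not something taken from this thread: the helper names
parse_cmdline, read_text and report are hypothetical, and it assumes the
usual Linux locations /proc/cmdline and
/sys/devices/system/clocksource/clocksource0/current_clocksource exist
and are readable.

```python
from pathlib import Path

# Assumed standard Linux locations; adjust if your system differs.
CMDLINE_PATH = Path("/proc/cmdline")
CLOCKSOURCE_PATH = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as ignore_loglevel map to None. Quoted values
    containing spaces are not handled in this sketch.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_text(path):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def report():
    """Print the dom0 kernel settings discussed in this thread."""
    raw = read_text(CMDLINE_PATH)
    params = parse_cmdline(raw or "")
    print("kernel command line:", raw)
    print("ignore_loglevel set:", "ignore_loglevel" in params)
    print("clocksource= override:", params.get("clocksource"))
    print("xen.fifo_events:", params.get("xen.fifo_events"))
    print("active clocksource:", read_text(CLOCKSOURCE_PATH))


if __name__ == "__main__":
    report()
```

This only reports the dom0 kernel's view. It says nothing about Xen's
own command line or about what happens on a boot that hangs.
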
> I'm afraid there's no real way around seeing the full Xen messages,
> i.e. including possible ones while Dom0 already boots (and allowing
> some debug keys to be issued, as the rcu_barrier on the stack may
> suggest there's an issue with one of the secondary CPUs). You could try
> whether "vga=keep" on the Xen command line allows you to see more, but
> this option may have quite severe an effect on the timing of Dom0's
> booting, which may make an already bad situation worse. Alternatively
> the kernel may need instrumenting to figure out what exactly it is that
> prevents forward progress.
>
> There's one other wild guess you may want to try: "cpuidle=no" on the
> Xen command line.

Other wild guesses are:

- add "sched=credit" to the Xen command line or
- add "xen.fifo_events=0" to the dom0 command line

Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature