
Re: Problems with APIC on versions 4.9 and later (4.8 works)

On 25.01.2021 20:37, Claudemir Todo Bom wrote:
> I've managed to get the debug messages on the screen using
> vga=text-80x50,keep and disabling all messages from the kernel. Two
> images are attached with the output running the debug patch.

And the 1st of them (161303) was taken at the time of the hang of
the kernel (or entire system), not any earlier? I ask because one
part of the reason for the patch was to understand whether the
rendezvousing itself would fail in some way (like one of the CPUs
not calling in).

Were new log messages (from the debugging patch) still issued at
this point, showing Xen itself was still alive?

The 2nd of the pictures (162313) at least clarifies that indeed
the commit in question had a functional effect on this system,
because of

(XEN) TSC warp detected, disabling TSC_RELIABLE

I still can't figure out, though, why the change in rendezvous handling
(from "std" to "tsc") would have broken your system.
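
For reference, the kind of check behind a "TSC warp detected" message
can be sketched as below. This is a simplified illustration only, not
the code in the tree: the function name max_tsc_warp() and the flat
array of samples are made up, and the sketch assumes the samples were
recorded in the order the CPUs took them under a common lock. A
non-zero result means some CPU read a TSC value smaller than one read
just before it, i.e. the TSCs are not in sync.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * samples[] holds TSC values in the order they were observed across
 * all CPUs. Returns the largest backwards step between two consecutive
 * observations, or 0 if the values never went backwards.
 */
uint64_t max_tsc_warp(const uint64_t *samples, size_t n)
{
    uint64_t last = 0, max_warp = 0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < last) {
            /* This reading is older than the previous one: a warp. */
            uint64_t warp = last - samples[i];

            if (warp > max_warp)
                max_warp = warp;
        }
        last = samples[i];
    }

    return max_warp;
}
```

With monotonically increasing samples the function returns 0; any
backwards step makes it return the size of the largest such step.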

> About the version I've used to test: since the 4.14 shows that other
> bug with the detection of cpu features I mentioned on the other
> subthread, I chose to work on 4.11 that doesn't show that behaviour.
> Calling with clocksource on the xen command line changed nothing.

Oh, right, because the specific feature that causes the change
of rendezvous functions for you also is a prereq for that mode
of operation.

> I don't know if this part of code is intended to execute a lot of
> times, but when starting with dom0_max_vcpus=1, the system boots up
> and keeps showing the messages.

When there's just one CPU, there's no CPU to rendezvous with.
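
To make that concrete, here is a minimal model of the "calling in"
bookkeeping, assuming a plain 64-bit mask where bit N stands for CPU N.
Both function names are hypothetical and this is not how the real
rendezvous code is structured; it only shows why a single-CPU
rendezvous is trivially complete and how a CPU failing to call in
would show up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bitmask model, for illustration only.
 * expected: CPUs that are supposed to take part in the rendezvous.
 * arrived:  CPUs that have called in so far.
 * Returns the mask of CPUs still being waited for.
 */
uint64_t cpus_not_called_in(uint64_t expected, uint64_t arrived)
{
    return expected & ~arrived;
}

/* The rendezvous is complete once nobody is being waited for. */
bool rendezvous_complete(uint64_t expected, uint64_t arrived)
{
    return cpus_not_called_in(expected, arrived) == 0;
}
```

With expected = 0x1 and arrived = 0x1 (one CPU, which is the caller
itself) there is nothing to wait for. With expected = 0x3 and
arrived = 0x1, CPU 1 has not called in and the caller would keep
waiting.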

IIRC you did say that you observe the hang even with as few
as 2 CPUs? The problem the above quoted message is supposed to
address normally comes into play only on multi-socket
systems. Yet from your initial report I deduce this is a
single socket system. So in the end I suppose there are two
problems - one is the hang, and the other is that your system
gets diagnosed as having an unreliable TSC (at least I didn't
think Xeon E5 v2 should have a problem there).

I will want to extend the debugging patch, but I'd like to
have clarification on some of the points above first.



