
Re: Recent upgrade of 4.13 -> 4.14 issue



On 15.12.2020 20:08, Liwei wrote:
> Hi list,
>     This is a reply to the thread of the same title (linked here:
> https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxxxxxxxxx/msg84916.html
> ) which I could not reply to because I receive this list by digest.
> 
>     I'm unclear if this is exactly the reason, but I experienced the
> same symptoms when upgrading to 4.14. The issue does not occur if I
> downgrade to 4.11 (the previous version that was provided by Debian).
> The kernel is 5.9.11 and unchanged between Xen versions.
> 
>     One thing I noticed is that if I disable the monitor/mwait
> instructions on my CPU (Intel Xeon E5-2699 v4 ES), the stalls seem to
> occur later into the boot. With the instructions enabled, the system
> usually stalls less than a few minutes after boot; disabled, it can
> last for tens of minutes.
> 
>     Further disabling the HPET or forcing the kernel to use PIT causes
> it to be somewhat usable. The stalls still occur tens of minutes in
> but somehow everything seems to continue chugging along fine?

By "the kernel" do you really mean the kernel, or Xen?

>     I've also verified that the stalls do not occur in all the above
> cases if I just boot into the kernel without Xen.
> 
>     When the stalls happen, I get the "rcu: INFO: rcu_sched detected
> stalls on CPUs/tasks" backtraces printed on the console periodically,
> but keystrokes don't do anything on the console, and I can't spawn new
> SSH sessions even though pinging the system produces a reply. The last
> item in the call trace is usually "xen_safe_halt", but I've seen it
> occur for other functions related to btrfs and the network adapter as
> well.

The kernel log may not be the only relevant thing here - the hypervisor
log may also need looking at (with full verbosity enabled and
preferably a debug build in use).
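
As a minimal sketch of what "full verbosity" could look like, assuming a
Debian-style GRUB setup where Xen's command line is taken from
GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub (the file location and the
update-grub step are assumptions about your install, not something stated
in this thread):

```shell
# /etc/default/grub (assumed location on Debian)
# loglvl / guest_loglvl are Xen command-line options; "all" requests
# every message level for hypervisor and guest-related messages.
GRUB_CMDLINE_XEN_DEFAULT="loglvl=all guest_loglvl=all"

# After editing: run update-grub, reboot into Xen, then collect the
# hypervisor log from dom0 with "xl dmesg" (or via a serial console,
# which is likely more useful if dom0 itself stalls).
```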

>     Do let me know if there's anything I can provide to help
> troubleshoot this. At the moment I've reverted to 4.11, but I can
> temporarily switch over to 4.14 to collect any necessary information.

In that earlier thread a number of things to try were suggested, iirc
(switching scheduler or disabling use of deep C states come to mind).
Did you experiment with those? If so, can you let us know of the
results, so we can see whether there's a pattern?
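
For reference, a hedged sketch of how those two experiments might be
expressed on the Xen command line, again assuming a Debian-style
/etc/default/grub. The specific values are illustrative assumptions:
"sched=credit" picks the older credit scheduler instead of the credit2
default, and "max_cstate=1" limits the C states Xen will enter.

```shell
# /etc/default/grub (assumed location on Debian)
# Options are space-separated; ideally try one change at a time so the
# effect of each can be told apart.
#   sched=credit   - use the credit scheduler rather than credit2
#   max_cstate=1   - do not enter C states deeper than C1
GRUB_CMDLINE_XEN_DEFAULT="sched=credit max_cstate=1"

# Run update-grub and reboot into Xen for the change to take effect.
```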

Jan



 

