Hi,
I have an old ProLiant DL380 server running Gentoo Linux as Dom0 on Xen,
with several DomUs also running Gentoo Linux. After upgrading Xen to
4.15 I noticed that in some of the DomUs (the ones used as Kubernetes
nodes) the load slowly keeps climbing until the DomU becomes
unresponsive and needs to be restarted. This issue is not present on
Xen 4.14 and went away once I downgraded back to 4.14. The same issue
appeared again after upgrading to 4.16.
According to some Munin graphs the load increases by 2-4 per day, but as
far as I can tell nothing else really changes (CPU usage, number of
processes, etc.), so I have no idea what is causing the issue.
Both the Dom0 and the DomUs are running a hardened-gentoo kernel,
version 5.10.156 (see the attached .config).
If anyone has any pointers regarding where to look or what can be
tweaked, I would be grateful for the information.
Regards,
Gabor
Hello Gabor,
I remember having similar problems:
- with the credit2 scheduler
- with kernel 5.15 at some point (around 5.15.32); current versions are OK.
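To narrow this down, it may help to check which scheduler is active (the xen_scheduler line in the output of `xl info` on the Dom0) and, inside an affected DomU, whether the extra load comes from tasks stuck in uninterruptible sleep. The Linux load average counts both runnable (R) and uninterruptible (D) tasks, so the load can climb while CPU usage stays flat. The script below is only a rough sketch for that second check, not a known fix: it reads /proc directly, and the function names are made up for illustration.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so take what follows the LAST ')'.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_task_states(proc_root="/proc"):
    """Count tasks (threads) by state across all processes."""
    counts = Counter()
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return counts  # no procfs here (for example, not running on Linux)
    for pid in pids:
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    counts[parse_state(f.read())] += 1
            except (OSError, IndexError):
                continue  # thread exited or unexpected line format
    return counts


def read_loadavg(path="/proc/loadavg"):
    """Return the raw contents of /proc/loadavg, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("loadavg:", read_loadavg())
    states = count_task_states()
    print("runnable (R):", states["R"])
    print("uninterruptible (D):", states["D"])
```

If the D count grows along with the load from day to day, the contents of /proc/<pid>/stack for those tasks (readable as root) may show what they are waiting on.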
Tomas