
Re: Performance degradation in 4.15 and above



On 2023-05-23 10:16, Tomas Mozes wrote:
Another thing that came to mind: the lockups occurred when the grant table was full.

domU config:
max_grant_frames = 256

grub config:
GRUB_CMDLINE_XEN="gnttab_max_frames=256 sched=credit ..."

You can check it with:
xen-diag gnttab_query_size [domid]
For Dom0, nr_frames is 1; for the DomUs it is between 15 and 30, while max_nr_frames is 64 for all of them.
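
To compare nr_frames against max_nr_frames across several domains without reading each line by hand, something like the following could help. It is only a sketch: it assumes xen-diag prints one line of the form "domid=3: nr_frames=20, max_nr_frames=64", which should be checked against the output on the actual host, and the helper names and the 0.9 threshold are made up for illustration.

```python
import re
import subprocess

# ASSUMPTION: `xen-diag gnttab_query_size <domid>` prints a line such as
#   domid=3: nr_frames=20, max_nr_frames=64
# Adjust the pattern if the output on your Xen version differs.
LINE_RE = re.compile(r"domid=(\d+):\s*nr_frames=(\d+),\s*max_nr_frames=(\d+)")


def parse_gnttab_line(line):
    """Extract domid, nr_frames and max_nr_frames from one output line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("unexpected xen-diag output: %r" % line)
    domid, nr_frames, max_nr_frames = (int(g) for g in match.groups())
    return {"domid": domid, "nr_frames": nr_frames, "max_nr_frames": max_nr_frames}


def usage_ratio(info):
    """Fraction of the allowed grant frames currently in use."""
    return info["nr_frames"] / info["max_nr_frames"]


def is_near_limit(info, threshold=0.9):
    """True if usage is at or above the (arbitrary) threshold."""
    return usage_ratio(info) >= threshold


def query_domain(domid):
    """Run xen-diag for one domain (needs root on the Dom0) and parse the result."""
    result = subprocess.run(
        ["xen-diag", "gnttab_query_size", str(domid)],
        check=True, capture_output=True, text=True,
    )
    return parse_gnttab_line(result.stdout)
```

With the numbers above (15-30 frames used out of 64), usage_ratio() stays below 0.5, so the grant table would not be close to full on this box.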

On Fri, May 19, 2023 at 1:04 PM Gabor Hudiczius <ghudiczius@xxxxxxxxx> wrote:

    On 2023-05-19 11:48, Tomas Mozes wrote:


    On Fri, May 19, 2023 at 11:19 AM Gabor Hudiczius
    <ghudiczius@xxxxxxxxx> wrote:

        Hi,

        I have an old Proliant DL380 server running Gentoo Linux as
        Dom0 on Xen
        with several DomUs also running Gentoo Linux. After upgrading
        to 4.15 I
        have noticed that in some of the DomUs (that are used as
        Kubernetes
        nodes) the load slowly keeps climbing until it reaches a
        level that the
        DomU becomes unresponsive and needs to be restarted. This
        issue is not
        present when running on Xen 4.14 and went away once I
        downgraded back to
        4.14. The same issue presented itself again after upgrading
        to 4.16.

        According to some Munin graphs the load increases by 2-4 per
        day, but as
        far as I can tell nothing else really changes (CPU usage,
        number of
        processes), so I don't really have an idea what is causing
        the issue.

        Both the Dom0 and DomUs are running on a hardened-gentoo
        kernel version
        5.10.156 (see the attached .config).

I tried kernel version 5.15.110, but that did not help. I will give 6.1.28 a try as well.


        If anyone has any pointers regarding where to look or what
        can be
        tweaked, I would be grateful for the information.

        Regards,
        Gabor



    Hello Gabor,
    I remember having these problems:
    - with credit2 scheduler
    I am using the credit scheduler since after upgrading to 4.12 my
    box stalled several times and I followed the recommendation from
    the Gentoo wiki
    (https://wiki.gentoo.org/wiki/Xen#Xen_domU_hanging_with_Xen_4.12.2B)
    which seemed to solve the issue.
    - kernel 5.15 at some point (around kernel 5.15.32), but it
    is OK with current versions.

    Tomas


I also noticed that restarting the DomUs has little to no effect on the load; only restarting the Dom0 brings the load back to normal levels.




 

