Re: xen-balloon thread using 100% of CPU, regression in 5.4.150

On 04.10.21 11:14, Marek Marczykowski-Górecki wrote:
> On Mon, Oct 04, 2021 at 07:31:40AM +0200, Juergen Gross wrote:
>> On 03.10.21 06:47, Marek Marczykowski-Górecki wrote:
>>>
>>> After updating a PVH domU to 5.4.150, I see the xen-balloon thread using
>>> 100% CPU (one thread).
>>> This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
>>> inside, I see:
>>>
>>> # cat /sys/devices/system/xen_memory/xen_memory0/target_kb
>>> # cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
>>>
>>> Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
>>> something is wrong: on an earlier kernel (5.4.143 to be precise), it
>>> wasn't spinning, with exactly the same values reported in sysfs. It
>>> shouldn't run in circles if it can't get as much memory as it wants. I
>>> strongly suspect "xen/balloon: use a kernel thread instead a workqueue"
>>> or a related commit is responsible, but I haven't verified it.
>>
>> I think you are right. I need to handle the BP_ECANCELED case similarly to
>> BP_EAGAIN in the kernel thread (wait until the target size changes again).
>>
>> One further question: do you see any kernel message in the guest related
>> to the looping balloon thread?
>
> Nothing, only the usual "xen:balloon: Initialising balloon driver", and
> nothing related to the balloon after that.

Could you try the attached patch, please? I've tested it briefly with
PV and PVH guests.


Attachment: 0001-xen-balloon-fix-cancelled-balloon-action.patch
Description: Text Data
