[ARM][xencons] PV Console hangs due to illegal ring buffer accesses
Hello,

We are running Xen 4.17 (with platform-related support added) on NXP S32G SoCs (ARMv8), with a custom Linux distribution built through Yocto. While setting up some Xen-based demos, we encountered an issue which we think might not be related to our hardware. For additional context, the Linux kernel version we're running is 5.15.96-rt (with platform-related support added as well).

The setup to reproduce the problem is fairly simple: after booting a Dom0 (we can provide configuration details if needed), we boot a normal PV DomU with PV networking. Additionally, the VMs have k3s (Lightweight Kubernetes, version v1.25.8+k3s1: https://github.com/k3s-io/k3s/releases/tag/v1.25.8%2Bk3s1) installed in their root filesystems.

The problem is that the DomU console hangs (no new output is shown, no input can be sent) some time after we run the `k3s server` command. The delay is non-deterministic: sometimes 5 seconds, other times 15-20 seconds. We have this command running as part of a sysvinit service, and the same behavior can be observed in that case as well. The k3s version we use is the one mentioned above, but the problem can be reproduced with other versions as well (e.g., v1.21.11, v1.22.6).

If the `k3s server` command is run in Dom0, everything works fine. Using DomU as an agent node also works fine; the console problem occurs only when DomU runs as a server.

Immediately after the serial console hangs, we can still log in to DomU using SSH, and we can observe the following messages in its dmesg:

[ 57.905806] xencons: Illegal ring page indices
[ 59.399620] xenbus: error -5 while reading message
[ 59.399649] xenbus: error -5 while writing message
[ 67.353608] xencons: Illegal ring page indices
[ 78.027813] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[ 78.027865] IPVS: Connection hash table configured (size=4096, memory=32Kbytes)
[ 78.028038] IPVS: ipvs loaded.
[ 78.065479] IPVS: [rr] scheduler registered.
[ 78.071249] IPVS: [wrr] scheduler registered.
[ 78.084190] IPVS: [sh] scheduler registered.

Sometimes, after the DomU console becomes unresponsive, Xen also prints messages about expanding the grant tables:

(XEN) common/grant_table.c:1882:d2v1: Expanding d2 grant table from 5 to 6 frames
(XEN) common/grant_table.c:1882:d2v1: Expanding d2 grant table from 6 to 7 frames
(XEN) common/grant_table.c:1882:d2v1: Expanding d2 grant table from 7 to 8 frames

It seems that when the k3s server process is spawned, the console ring buffers and the indices used for accessing them somehow become corrupt (maybe due to intensive usage). PV networking still works fine, though: the domain is reachable via SSH and can continue to process the workload.

So far we have not been able to figure out why this happens, so any help would be appreciated. If you need other domain configuration details or any input from our side, let us know.

Thank you,
Andrei Cherechesu
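
On the "Illegal ring page indices" message above: as far as we can tell, it comes from a sanity check in the Linux PV console driver, which refuses to touch the ring when the distance between the producer and consumer indices exceeds the ring size. The following is only a minimal userspace sketch of that kind of check, not the driver code itself. The struct layout is modeled on `struct xencons_interface` from Xen's public `io/console.h`; the names `console_ring`, `out_indices_valid` and `ring_write` are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Free-running 32-bit ring index, as used by the Xen console protocol. */
typedef uint32_t ring_idx_t;

/* Layout modeled on struct xencons_interface (assumption: 1024-byte input
 * ring, 2048-byte output ring, followed by the four indices). */
struct console_ring {
    char in[1024];
    char out[2048];
    ring_idx_t in_cons, in_prod;
    ring_idx_t out_cons, out_prod;
};

/* Returns 1 if the output indices are consistent, 0 otherwise.
 * The indices are never reduced modulo the ring size, so unsigned
 * subtraction gives the number of bytes in flight even across wraparound. */
static int out_indices_valid(const struct console_ring *r)
{
    ring_idx_t used = r->out_prod - r->out_cons;
    return used <= sizeof(r->out);
}

/* Hypothetical writer: copies as many bytes as currently fit into the
 * output ring. Returns the number of bytes queued, or -1 if the indices
 * are inconsistent (the situation the dmesg message reports). */
static int ring_write(struct console_ring *r, const char *data, int len)
{
    ring_idx_t cons = r->out_cons;
    ring_idx_t prod = r->out_prod;
    int sent = 0;

    if (!out_indices_valid(r))
        return -1;

    while (sent < len && (ring_idx_t)(prod - cons) < sizeof(r->out))
        r->out[prod++ & (sizeof(r->out) - 1)] = data[sent++];

    r->out_prod = prod;
    return sent;
}
```

In this sketch, once either index is overwritten with a value that puts more than 2048 bytes "in flight", every later write is rejected, which would match a console that stops producing output while the rest of the domain keeps running.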