
[ARM][xencons] PV Console hangs due to illegal ring buffer accesses


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: "Andrei Cherechesu (OSS)" <andrei.cherechesu@xxxxxxxxxxx>
  • Date: Wed, 19 Jul 2023 19:13:22 +0300
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, george.mocanu@xxxxxxx
  • Delivery-date: Wed, 19 Jul 2023 16:14:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hello,

We are running Xen 4.17 (with platform-related support added) on NXP S32G
SoCs (ARMv8), with a custom Linux distribution built through Yocto. While
setting up some Xen-based demos, we encountered some issues which we think
might not be related to our hardware. For additional context, the Linux kernel
version we're running is 5.15.96-rt (also with platform-related support
added).

The setup to reproduce the problem is fairly simple: after booting Dom0 (we
can provide configuration details if needed), we boot a normal PV DomU with
PV networking. Additionally, the VMs have k3s (Lightweight Kubernetes, version
v1.25.8+k3s1: https://github.com/k3s-io/k3s/releases/tag/v1.25.8%2Bk3s1)
installed in their root filesystems.

The problem is that the DomU console hangs (no new output is shown and no
input can be sent) a short, non-deterministic time after we run the
`k3s server` command: sometimes after 5 seconds, other times after 15-20
seconds. We also run this command as part of a sysvinit service, and the same
behavior can be observed in that case. The k3s version we use is the one
mentioned above, but the problem can be reproduced with other versions as well
(e.g., v1.21.11, v1.22.6). If the `k3s server` command is run in Dom0,
everything works fine. Using DomU as an agent node also works fine; the
console problem occurs only when DomU runs as a server.

Immediately after the serial console hangs, we can still log in to DomU using
SSH, and we can observe the following messages in its dmesg:
[   57.905806] xencons: Illegal ring page indices
[   59.399620] xenbus: error -5 while reading message
[   59.399649] xenbus: error -5 while writing message
[   67.353608] xencons: Illegal ring page indices
[   78.027813] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[   78.027865] IPVS: Connection hash table configured (size=4096, memory=32Kbytes)
[   78.028038] IPVS: ipvs loaded.
[   78.065479] IPVS: [rr] scheduler registered.
[   78.071249] IPVS: [wrr] scheduler registered.
[   78.084190] IPVS: [sh] scheduler registered.

Sometimes, Xen also dumps some info about expanding the grant tables, after the 
DomU console becomes unresponsive:
(XEN) common/grant_table.c:1882:d2v1: Expanding d2 grant table from 5 to 6 frames
(XEN) common/grant_table.c:1882:d2v1: Expanding d2 grant table from 6 to 7 frames
(XEN) common/grant_table.c:1882:d2v1: Expanding d2 grant table from 7 to 8 frames


It seems that when spawning the k3s server process, somehow (maybe due to 
intensive usage) the console ring buffers and the indices used for accessing 
them become corrupt. But the PV networking still works fine, and the domain is 
reachable via SSH and can continue to process the workload.
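
To make the suspected failure mode concrete, here is a minimal sketch of the
kind of index sanity check we assume is behind the "Illegal ring page indices"
message. It is not the actual driver code: the struct, the function names and
the 2048-byte ring size are illustrative assumptions, and the memory barriers
that a real shared ring needs are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the guest-to-backend ring; illustrative only. */
#define OUT_RING_SIZE 2048u

typedef uint32_t ring_idx_t;

/* Hypothetical, simplified stand-in for the shared console page. */
struct console_ring {
    char out[OUT_RING_SIZE];
    ring_idx_t out_cons; /* advanced by the backend as it consumes bytes */
    ring_idx_t out_prod; /* advanced by the guest as it produces bytes */
};

/*
 * The indices are free-running counters, so the unsigned difference is the
 * number of bytes in flight, even after the counters wrap around. A
 * difference larger than the ring means the indices are inconsistent.
 */
static bool ring_indices_valid(ring_idx_t prod, ring_idx_t cons)
{
    return (ring_idx_t)(prod - cons) <= OUT_RING_SIZE;
}

/*
 * Queue up to len bytes. Returns the number of bytes queued, or -1 if the
 * indices are inconsistent. Barriers and locking are omitted in this sketch.
 */
static int ring_write(struct console_ring *ring, const char *data, size_t len)
{
    ring_idx_t cons = ring->out_cons;
    ring_idx_t prod = ring->out_prod;
    size_t sent = 0;

    if (!ring_indices_valid(prod, cons))
        return -1;

    /* Copy until the data runs out or the ring is full. */
    while (sent < len && (ring_idx_t)(prod - cons) < OUT_RING_SIZE)
        ring->out[prod++ & (OUT_RING_SIZE - 1)] = data[sent++];

    ring->out_prod = prod;
    return (int)sent;
}
```

In this sketch, a stale or torn read of either index (for example, a consumer
index that appears to be ahead of the producer index) makes the difference
exceed the ring size, and every later write is rejected. That would be
consistent with a console that stays hung while the rest of the domain keeps
working.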

So far we have not been able to figure out why this happens, so any help would
be appreciated. If you need other domain configuration details or any input
from our side, let us know.

Thank you,
Andrei Cherechesu



 

