Re: [ARM][xencons] PV Console hangs due to illegal ring buffer accesses

To: "Andrei Cherechesu (OSS)" <andrei.cherechesu@xxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
From: Julien Grall <julien@xxxxxxx>
Date: Thu, 20 Jul 2023 11:33:59 +0100
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, george.mocanu@xxxxxxx, Juergen Gross <jgross@xxxxxxxx>
Delivery-date: Thu, 20 Jul 2023 10:34:10 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

(+ Juergen)

On 19/07/2023 17:13, Andrei Cherechesu (OSS) wrote:

Hello,


Hi Andrei,

As we're running Xen 4.17 (with platform-related support added) on NXP S32G 
SoCs (ARMv8), with a custom Linux distribution built through Yocto, and we've 
set some Xen-based demos up, we encountered some issues which we think might 
not be related to our hardware. For additional context, the Linux kernel 
version we're running is 5.15.96-rt (with platform-related support added as 
well).

The setup to reproduce the problem is fairly simple: after booting a Dom0 (can 
provide configuration details if needed), we're booting a normal PV DomU with 
PV Networking. Additionally, the VMs have k3s (Lightweight Kubernetes - version 
v1.25.8+k3s1: https://github.com/k3s-io/k3s/releases/tag/v1.25.8%2Bk3s1) 
installed in their rootfs'es.

The problem is that the DomU console hangs (no new output is shown, no input 
can be sent) some time (non-deterministic, sometimes 5 seconds, other times 
like 15-20 seconds) after we run the `k3s server` command. We have this command 
running as part of a sysvinit service, and the same behavior can be observed in 
that case as well. The k3s version we use is the one mentioned in the paragraph 
above, but this can be reproduced with other versions as well (i.e., v1.21.11, 
v1.22.6). If the `k3s server` command is run in the Dom0 VM, everything works 
fine. Using DomU as an agent node also works fine; the console problem occurs 
only when it is run as a server.

Immediately after the serial console hangs, we can still log in on DomU using 
SSH, and we can observe the following messages in its dmesg:
[   57.905806] xencons: Illegal ring page indices

Looking at the Linux code, this message is printed in a couple of places in the xenconsole driver.

I would assume that this is printed when reading from the buffer (otherwise you would not see any message). Can you confirm it?


Also, can you provide the indices that Linux considers buggy?
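
To make the request concrete, here is a minimal, self-contained sketch of the kind of consistency check that produces such a message, extended to print the offending indices. It is not the driver's code: RING_SIZE, ring_indices_ok() and check_and_report() are hypothetical names, and the ring size is an assumption chosen for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed ring size in bytes, for illustration only. */
#define RING_SIZE 1024u

typedef uint32_t ring_idx_t;

/*
 * The indices are free-running counters, so the amount of data in the ring
 * is (prod - cons) in unsigned arithmetic, which stays correct across
 * wrap-around. Anything larger than the ring itself is inconsistent.
 */
static int ring_indices_ok(ring_idx_t cons, ring_idx_t prod)
{
    return (ring_idx_t)(prod - cons) <= RING_SIZE;
}

/* Returns 0 when the pair is consistent, -1 after reporting a bad pair. */
static int check_and_report(const char *where, ring_idx_t cons, ring_idx_t prod)
{
    if (ring_indices_ok(cons, prod))
        return 0;

    fprintf(stderr,
            "%s: illegal ring indices cons=%" PRIu32 " prod=%" PRIu32
            " (diff=%" PRIu32 ", size=%u)\n",
            where, cons, prod, (ring_idx_t)(prod - cons), RING_SIZE);
    return -1;
}
```

Logging both values, rather than only the fact that the check failed, would show whether the consumer overtook the producer or whether one of the indices is plain garbage.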

Lastly, it seems like the barriers used are incorrect. They should be the virt_*() versions rather than a plain mb()/wmb(). I don't think it matters for arm64 though (I am assuming you are not running 32-bit).

[   59.399620] xenbus: error -5 while reading message

So this message is coming from the xenbus driver (used to read the xenstore ring). This is -EIO, and AFAICT it is returned when the indices are also incorrect.

For this driver, I think there is also a TOCTOU because a compiler is free to reload intf->rsp_cons after the check. Moving virt_mb() is probably not sufficient. You would also want to use ACCESS_ONCE().
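
As a rough illustration of the TOCTOU concern, the sketch below snapshots each index exactly once into a local variable, so that the value that was checked is also the value that is used. The structure layout, RING_SIZE, READ_ONCE_U32() and ring_avail() are all hypothetical stand-ins; in current kernels the equivalent helper is READ_ONCE(), which replaced ACCESS_ONCE().

```c
#include <stdint.h>

/* Assumed ring size in bytes, for illustration only. */
#define RING_SIZE 1024u

/* Hypothetical, reduced view of a shared ring header. */
struct ring_intf {
    uint32_t rsp_cons;
    uint32_t rsp_prod;
};

/* Stand-in for the kernel's READ_ONCE(): forces exactly one load. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/*
 * Returns the number of bytes available, or -1 if the indices are
 * inconsistent. The snapshot of rsp_cons is handed back to the caller so
 * that later code never re-reads the shared field.
 */
static int ring_avail(struct ring_intf *intf, uint32_t *cons_out)
{
    uint32_t cons = READ_ONCE_U32(intf->rsp_cons);
    uint32_t prod = READ_ONCE_U32(intf->rsp_prod);

    /* A real driver would also need a read barrier (e.g. virt_rmb()) here. */

    if ((uint32_t)(prod - cons) > RING_SIZE)
        return -1;

    *cons_out = cons;
    return (int)(prod - cons);
}
```

Without the single-load snapshot, the compiler may legally read intf->rsp_cons once for the check and again for the copy, and the other end can change it in between.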

What I find odd is that you have two distinct rings (xenconsole and xenbus) with similar issues. Above, you said you are using Linux RT. I wonder if this plays into the issue because, if I am not mistaken, the two functions would now be fully preemptible.

This could expose some races. For instance, there are some missing ACCESS_ONCE() (as mentioned above).

In particular, Xenstored (I haven't checked xenconsoled) is using += to update intf->rsp_cons. There is no guarantee that the update will be atomic.
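
One way to avoid relying on how the compiler lowers +=, sketched under assumptions: compute the new index in a local variable and publish it with a single store. The names below (struct demo_ring, READ_ONCE_U32(), WRITE_ONCE_U32(), ring_consume()) are hypothetical, and the volatile store is only a single access in practice for an aligned 32-bit field; the C standard itself does not promise it.

```c
#include <stdint.h>

/* Hypothetical, reduced view of a shared ring header. */
struct demo_ring {
    uint32_t rsp_cons;
};

/* Stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() helpers. */
#define READ_ONCE_U32(x)     (*(const volatile uint32_t *)&(x))
#define WRITE_ONCE_U32(x, v) (*(volatile uint32_t *)&(x) = (v))

/*
 * Mark len bytes as consumed. The shared index is loaded once, the new
 * value is computed locally, and it is published with one store, instead
 * of a read-modify-write directly on the shared field.
 */
static void ring_consume(struct demo_ring *ring, uint32_t len)
{
    uint32_t cons = READ_ONCE_U32(ring->rsp_cons);

    /* ... the payload copy out of the ring would happen here ... */

    /* A real implementation needs a barrier before publishing the index. */
    WRITE_ONCE_U32(ring->rsp_cons, cons + len);
}
```

The index is free-running, so the addition is allowed to wrap around; only the difference between producer and consumer is meaningful.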

Overall, I am not 100% sure what I wrote is related. But that's probably a good starting list of things that can be exacerbated with Linux RT.

[   59.399649] xenbus: error -5 while writing message

This is in xenbus as well. But this time in the write part. The analysis I wrote above for the read part can be applied here.


Cheers,

--
Julien Grall

