Re: dom0 Linux 5.8-rc5 kernel failing to initialize cooling maps for Allwinner H6 SoC



(+ Andre and Stefano)

On 20/07/2020 15:53, Alejandro wrote:
Hello all.

Hello,


I'm new to this community, and firstly I'd like to thank you all for
your efforts on supporting Xen in ARM devices.

Welcome to the community!


I'm trying Xen 4.13.1 on an Allwinner H6 SoC (more precisely a Pine H64
model B, with an ARM Cortex-A53 CPU).
I managed to get a dom0 Linux 5.8-rc5 kernel running fine, unpatched,
and I'm using the upstream device tree for
my board. However, the dom0 kernel has trouble when reading some DT
nodes that are related to the CPUs, and
it can't initialize the thermal subsystem properly, which is a kind of
showstopper for me, because I'm concerned
that letting the CPU run at the maximum frequency without watching out
its temperature may cause overheating.

I understand the concern. I am aware of some efforts to get CPUFreq working on Xen, but I am not sure whether anything is available yet. I have CCed a couple more people who may be able to help here.

The relevant kernel messages are:

[  +0.001959] sun50i-cpufreq-nvmem: probe of sun50i-cpufreq-nvmem failed with error -2
...
[  +0.003053] hw perfevents: failed to parse interrupt-affinity[0] for pmu
[  +0.000043] hw perfevents: /pmu: failed to register PMU devices!
[  +0.000037] armv8-pmu: probe of pmu failed with error -22

I am not sure the PMU failure is related to the thermal failure below.

...
[  +0.000163] OF: /thermal-zones/cpu-thermal/cooling-maps/map0: could not find phandle
[  +0.000063] thermal_sys: failed to build thermal zone cpu-thermal: -22

Would it be possible to paste the device-tree node for /thermal-zones/cpu-thermal/cooling-maps? I suspect the issue is that we recreate /cpus from scratch.

I don't know much about how the thermal subsystem works, but I suspect this will not be enough to get it working properly on Xen. As a workaround, you would need to create a dom0 with the same number of vCPUs as there are pCPUs. The vCPUs would also need to be pinned.
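
As an untested sketch of that workaround: Xen has the command-line options `dom0_max_vcpus` (number of dom0 vCPUs) and `dom0_vcpus_pin` (pin each dom0 vCPU to the matching pCPU). The helper below is hypothetical and only assembles the argument string. The count of four pCPUs is an assumption based on the H6 being a quad-core part, and how the string reaches Xen (U-Boot, the device tree /chosen node, ...) depends on your boot setup.

```python
def dom0_cpu_bootargs(num_pcpus: int) -> str:
    """Build Xen boot arguments giving dom0 one pinned vCPU per pCPU.

    Hypothetical helper for illustration; only the two option names
    come from Xen's command-line documentation.
    """
    if num_pcpus < 1:
        raise ValueError("num_pcpus must be at least 1")
    # dom0_max_vcpus=N : give dom0 N vCPUs
    # dom0_vcpus_pin   : pin dom0 vCPU i to pCPU i
    return f"dom0_max_vcpus={num_pcpus} dom0_vcpus_pin"


# Assumption: four pCPUs (quad-core Cortex-A53).
print(dom0_cpu_bootargs(4))  # prints: dom0_max_vcpus=4 dom0_vcpus_pin
```

Even with matching and pinned vCPUs, this alone may not make the thermal driver work, as noted above.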

I will leave the others to fill in more details.


I've searched for issues, code or commits that may be related for this
issue. The most relevant things I found are:

- A patch that blacklists the A53 PMU:
https://patchwork.kernel.org/patch/10899881/
- The handle_node function in xen/arch/arm/domain_build.c:
https://github.com/xen-project/xen/blob/master/xen/arch/arm/domain_build.c#L1427

I remember this discussion. The problem was that the PMU uses per-CPU interrupts (PPIs). Xen is not yet able to handle PPIs, as they often require more context to be saved/restored (in this case the PMU context).

There was a proposal to check whether a device is using PPIs and just remove them from the Device-Tree. Unfortunately, I haven't seen any official submission of this patch.

Did you have to apply the patch to boot up? If not, then the error above shouldn't be a concern. However, if you need PMU support for using the thermal devices, then it is going to require some work.


I've thought about removing "/cpus" from the skip_matches array in the
handle_node function, but I'm not sure
that would be a good fix.

The node "/cpus" and its sub-nodes are recreated by Xen for Dom0. This is because Dom0 may have a different number of vCPUs and it doesn't see the pCPUs.

If you don't skip "/cpus" from the host DT, you would end up with two "/cpus" paths in your dom0 DT. Most likely, Linux will not be happy with that.
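
To illustrate the suspicion about the recreated "/cpus" node, here is a toy model in Python. It is not Xen code: the dictionary layout, the phandle values and the function names are all made up, and it assumes the recreated CPU nodes carry no phandles from the host DT. Under that assumption, a cooling map that refers to the host CPU nodes by phandle ends up pointing at nothing in the dom0 DT, which would match the "could not find phandle" message.

```python
# Toy device tree: {path: {"phandle": int or None, "refs": [phandles it points to]}}
SKIP_MATCHES = {"/cpus"}  # host paths not copied into the dom0 DT


def build_dom0_dt(host_dt, num_vcpus):
    """Copy the host DT except skipped paths, then recreate /cpus."""
    dom0 = {}
    for path, node in host_dt.items():
        # Skip "/cpus" itself and everything below it.
        if any(path == s or path.startswith(s + "/") for s in SKIP_MATCHES):
            continue
        dom0[path] = node
    # Recreate /cpus from scratch (assumption: no phandles carried over).
    dom0["/cpus"] = {"phandle": None, "refs": []}
    for i in range(num_vcpus):
        dom0[f"/cpus/cpu@{i}"] = {"phandle": None, "refs": []}
    return dom0


def dangling_refs(dt):
    """Return {path: [phandles]} for references with no matching node."""
    known = {n["phandle"] for n in dt.values() if n["phandle"] is not None}
    result = {}
    for path, node in dt.items():
        missing = [r for r in node["refs"] if r not in known]
        if missing:
            result[path] = missing
    return result


host = {
    "/cpus": {"phandle": None, "refs": []},
    "/cpus/cpu@0": {"phandle": 1, "refs": []},
    "/cpus/cpu@1": {"phandle": 2, "refs": []},
    "/thermal-zones/cpu-thermal/cooling-maps/map0": {"phandle": None, "refs": [1, 2]},
}

dom0 = build_dom0_dt(host, num_vcpus=2)
print(dangling_refs(host))  # no dangling references in the host DT
print(dangling_refs(dom0))  # map0 now points at phandles 1 and 2, which are gone
```

Copying "/cpus" from the host instead would avoid the dangling references in this model, but it would run into the duplicate-path problem described above.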

I vaguely remember some discussions on how to deal with CPUFreq in Xen. IIRC we agreed that Dom0 should be part of the equation because it already contains all the drivers. However, I can't remember if we agreed how the dom0 would be made aware of the pCPUs.

@Volodymyr, I think you were looking at CPUFreq. Maybe you can help?

Best regards,

--
Julien Grall



 

