[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue



Hi Julien,

On 3/7/24 00:07, Julien Grall wrote:
> Hi John,
> 
>  > Ping on the watchdog discussion bits.
> 
> Sorry for the late reply.
> 
> On 06/03/2024 13:13, John Ernberg wrote:
>> On 2/9/24 14:14, John Ernberg wrote:
>>>
>>>>     * IMX_SIP_TIMER_*:  This seems to be related to the watchdog.
>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So those
>>>> call will be used by Xen.
>>>
>>> That is indeed a watchdog SIP, and also for setting the SoC internal RTC
>>> if it is being used.
>>>
>>> I looked around if there was previous discussion and only really 
>>> found [3].
>>> Is the xen/xen/include/watchdog.h header meant to be for this kind of
>>> watchdog support or is that more for the VM watchdog? Looking at the x86
>>> ACPI NMI watchdog it seems like the former, but I have never worked with
>>> x86 nor ACPI.
> 
> include/watchdog.h contains helper to configure the watchdog for Xen. We 
> also have per-VM watchdog and this is configured by the hypercall 
> SCHEDOP_watchdog.
> 
>>>
>>> Currently forwarding it to Dom0 has not caused any watchdog resets with
>>> our watchdog timeout settings, our specific Dom0 setup and VM count.
> 
> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI 
> watchdog. So I think it would make more sense if this is not exposed to 
> dom0 (even if Xen is doing nothing with it).
> 
> Can you try to hide the SMCs and check if dom0 still behave properly?
> 
> Cheers,
> 

This SMC manages a hardware watchdog, if it's not pinged within a 
specific interval the entire board resets.

If I block the SMCs the watchdog driver in Dom0 will fail to ping the 
watchdog, triggering a board reset because the system looks to have 
become unresponsive. The reason this patch set started is because we 
couldn't ping the watchdog when running with Xen.

In our specific system the bootloader enables the watchdog as early as 
possible so that we can get watchdog protection for as much of the boot 
as we possibly can.

So, if we are to block the SMC from Dom0, then Xen needs to take over 
the pinging. It could be implemented similarly to the NMI watchdog, 
except that the system will reset if the ping is missed rather than 
backtrace.
It would also mean that Xen will get a whole watchdog driver-category 
due to the watchdog being vendor and sometimes even SoC specific when it 
comes to Arm.

My understanding of the domain watchdog code is that today the domain 
needs to call SCHEDOP_watchdog at least once to start the watchdog 
timer. Since watchdog protection through the whole boot process is 
desirable we'd need some core changes, such as an option to start the 
domain watchdog on init.

It's quite a big change to make, while I am not against doing it if it 
makes sense, I now wonder if Xen should manage hardware watchdogs.
Looking at the domain watchdog code it looks like if a domain does not 
get enough execution time, the watchdog will not be pinged enough and 
the guest will be reset. So either watchdog approach requires Dom0 to 
get execution time. Dom0 also needs to service all the PV backends it's 
responsible for. I'm not sure it's valuable to add another layer of 
watchdog for this scenario as the end result (checking that the entire 
system works) is achieved without it as well.

So, before I try to find the time to make a proposal for moving the 
hardware watchdog bit to Xen, do we really want it?

Thanks! // John Ernberg

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.