Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue
Hi Bertrand,

On 3/13/24 11:07, Bertrand Marquis wrote:
> Hi,
>
>> On 8 Mar 2024, at 15:04, Julien Grall <julien@xxxxxxx> wrote:
>>
>> Hi John,
>>
>> Thank you for the reply.
>>
>> On 08/03/2024 13:40, John Ernberg wrote:
>>> On 3/7/24 00:07, Julien Grall wrote:
>>>> > Ping on the watchdog discussion bits.
>>>>
>>>> Sorry for the late reply.
>>>>
>>>> On 06/03/2024 13:13, John Ernberg wrote:
>>>>> On 2/9/24 14:14, John Ernberg wrote:
>>>>>>
>>>>>>> * IMX_SIP_TIMER_*: This seems to be related to the watchdog.
>>>>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So those
>>>>>>> call will be used by Xen.
>>>>>>
>>>>>> That is indeed a watchdog SIP, and also for setting the SoC internal RTC
>>>>>> if it is being used.
>>>>>>
>>>>>> I looked around if there was previous discussion and only really
>>>>>> found [3].
>>>>>> Is the xen/xen/include/watchdog.h header meant to be for this kind of
>>>>>> watchdog support or is that more for the VM watchdog? Looking at the x86
>>>>>> ACPI NMI watchdog it seems like the former, but I have never worked with
>>>>>> x86 nor ACPI.
>>>>
>>>> include/watchdog.h contains helper to configure the watchdog for Xen. We
>>>> also have per-VM watchdog and this is configured by the hypercall
>>>> SCHEDOP_watchdog.
>>>>
>>>>>>
>>>>>> Currently forwarding it to Dom0 has not caused any watchdog resets with
>>>>>> our watchdog timeout settings, our specific Dom0 setup and VM count.
>>>>
>>>> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI
>>>> watchdog. So I think it would make more sense if this is not exposed to
>>>> dom0 (even if Xen is doing nothing with it).
>>>>
>>>> Can you try to hide the SMCs and check if dom0 still behave properly?
>>>>
>>>> Cheers,
>>>>
>>> This SMC manages a hardware watchdog, if it's not pinged within a
>>> specific interval the entire board resets.
>>
>> Do you know what's the default interval? Is it large enough so Xen + dom0
>> can boot (at least up to when the watchdog driver is initialized)?
>>
>>> If I block the SMCs the watchdog driver in Dom0 will fail to ping the
>>> watchdog, triggering a board reset because the system looks to have
>>> become unresponsive. The reason this patch set started is because we
>>> couldn't ping the watchdog when running with Xen.
>>> In our specific system the bootloader enables the watchdog as early as
>>> possible so that we can get watchdog protection for as much of the boot
>>> as we possibly can.
>>> So, if we are to block the SMC from Dom0, then Xen needs to take over
>>> the pinging. It could be implemented similarly to the NMI watchdog,
>>> except that the system will reset if the ping is missed rather than
>>> backtrace.
>>> It would also mean that Xen will get a whole watchdog driver-category
>>> due to the watchdog being vendor and sometimes even SoC specific when it
>>> comes to Arm.
>>> My understanding of the domain watchdog code is that today the domain
>>> needs to call SCHEDOP_watchdog at least once to start the watchdog
>>> timer. Since watchdog protection through the whole boot process is
>>> desirable we'd need some core changes, such as an option to start the
>>> domain watchdog on init.
>>>
>>> It's quite a big change to make
>>
>> For clarification, above you seem to mention two changes:
>>
>> 1) Allow Xen to use the HW watchdog
>> 2) Allow the domain to use the watchdog early
>>
>> I am assuming by big change, you are referring to 2?
>>
>>> , while I am not against doing it if it
>>> makes sense, I now wonder if Xen should manage hardware watchdogs.
>>> Looking at the domain watchdog code it looks like if a domain does not
>>> get enough execution time, the watchdog will not be pinged enough and
>>> the guest will be reset. So either watchdog approach requires Dom0 to
>>> get execution time. Dom0 also needs to service all the PV backends it's
>>> responsible for. I'm not sure it's valuable to add another layer of
>>> watchdog for this scenario as the end result (checking that the entire
>>> system works) is achieved without it as well.
>>> So, before I try to find the time to make a proposal for moving the
>>> hardware watchdog bit to Xen, do we really want it?
>>
>> Thanks for the details. Given that the watchdog is enabled by the
>> bootloader, I think we want Xen to drive the watchdog for two reasons:
>> 1) In true dom0less environment, dom0 would not exist
>> 2) You are relying on Xen + Dom0 to boot (or at least enough to get the
>> watchdog working) within the watchdog interval.
>
> Definitely we need to consider the case where there is no Dom0.
>
> I think there are in fact 3 different use cases here:
> - watchdog fully driven in a domain (dom0 or another): would work if it is
>   expected that Xen + Domain boot time is under the watchdog initial refresh
>   rate. I think this could make sense on some applications where your system
>   depends on a specific domain to be properly booted to work. This would
>   require an initial refresh time configurable in the boot loader probably.

This is our use-case. ^
Our dom0 is monitoring and managing the other domains in our system.
Without dom0 working the system isn't really working as a whole.

@Julien: Would you be ok with the patch set continuing in the direction of
the original proposal, letting another party (or me at a later time)
implement the fully driven by Xen option?

> - watchdog fully driven by Xen. One might want to just make sure the
>   hypervisor is alive.
> - hybrid model where the watchdog is driven by Xen until a domain comes up to
>   drive it. This could make sense to relax the stress on boot time but would
>   raise the question of what should be done if the domain dies. This is also
>   kind of complex as Xen should stop refreshing the watchdog when a domain
>   starts doing it (might require a trap and emulate initially that is then
>   mapped directly to a domain). I am not completely sure this makes sense.
>
> Regards
> Bertrand
>

Thanks and best regards // John Ernberg
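
For illustration only, the hybrid model in the quoted list might be sketched as a small ownership state machine. This is not Xen code and not part of the patch set: every name here (struct hw_wdt, wdt_xen_tick, wdt_domain_ping, and so on) is hypothetical, time is an abstract tick counter, and the trap-and-emulate path for the SMC is reduced to a plain function call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in Xen. */
enum wdt_owner { WDT_OWNER_XEN, WDT_OWNER_DOMAIN };

struct hw_wdt {
    uint32_t timeout;     /* ticks allowed between two pings */
    uint32_t last_ping;   /* tick at which the last ping happened */
    enum wdt_owner owner; /* who is currently responsible for pinging */
};

/* The bootloader has already armed the watchdog; Xen owns it at first. */
void wdt_init(struct hw_wdt *w, uint32_t timeout, uint32_t now)
{
    w->timeout = timeout;
    w->last_ping = now;
    w->owner = WDT_OWNER_XEN;
}

/* Xen's periodic timer: refresh only while no domain has taken over. */
void wdt_xen_tick(struct hw_wdt *w, uint32_t now)
{
    if (w->owner == WDT_OWNER_XEN)
        w->last_ping = now;
}

/* The first ping from the domain hands ownership over to it. */
void wdt_domain_ping(struct hw_wdt *w, uint32_t now)
{
    w->owner = WDT_OWNER_DOMAIN;
    w->last_ping = now;
}

/* True once the hardware would have reset the board. */
bool wdt_expired(const struct hw_wdt *w, uint32_t now)
{
    return now - w->last_ping > w->timeout;
}
```

In this sketch a domain that dies after taking over simply stops pinging and the board resets, which is one possible answer to the open question of what should happen in that case, not a settled one.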