[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue
Hi John, > On 25 Mar 2024, at 13:18, John Ernberg <john.ernberg@xxxxxxxx> wrote: > > Hi Bertrand, > > On 3/21/24 17:15, Bertrand Marquis wrote: >> Hi John, >> >>> On 21 Mar 2024, at 17:05, John Ernberg <john.ernberg@xxxxxxxx> wrote: >>> >>> Hi Bertrand, >>> >>> On 3/21/24 08:41, Bertrand Marquis wrote: >>>> Hi, >>>> >>>>> On 20 Mar 2024, at 18:40, Julien Grall <julien@xxxxxxx> wrote: >>>>> >>>>> Hi John, >>>>> >>>>> On 20/03/2024 16:24, John Ernberg wrote: >>>>>> Hi Bertrand, >>>>>> On 3/13/24 11:07, Bertrand Marquis wrote: >>>>>>> Hi, >>>>>>> >>>>>>>> On 8 Mar 2024, at 15:04, Julien Grall <julien@xxxxxxx> wrote: >>>>>>>> >>>>>>>> Hi John, >>>>>>>> >>>>>>>> Thank you for the reply. >>>>>>>> >>>>>>>> On 08/03/2024 13:40, John Ernberg wrote: >>>>>>>>> On 3/7/24 00:07, Julien Grall wrote: >>>>>>>>>>> Ping on the watchdog discussion bits. >>>>>>>>>> >>>>>>>>>> Sorry for the late reply. >>>>>>>>>> >>>>>>>>>> On 06/03/2024 13:13, John Ernberg wrote: >>>>>>>>>>> On 2/9/24 14:14, John Ernberg wrote: >>>>>>>>>>>> >>>>>>>>>>>>> * IMX_SIP_TIMER_*: This seems to be related to the watchdog. >>>>>>>>>>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So >>>>>>>>>>>>> those >>>>>>>>>>>>> call will be used by Xen. >>>>>>>>>>>> >>>>>>>>>>>> That is indeed a watchdog SIP, and also for setting the SoC >>>>>>>>>>>> internal RTC >>>>>>>>>>>> if it is being used. >>>>>>>>>>>> >>>>>>>>>>>> I looked around if there was previous discussion and only really >>>>>>>>>>>> found [3]. >>>>>>>>>>>> Is the xen/xen/include/watchdog.h header meant to be for this kind >>>>>>>>>>>> of >>>>>>>>>>>> watchdog support or is that more for the VM watchdog? Looking at >>>>>>>>>>>> the x86 >>>>>>>>>>>> ACPI NMI watchdog it seems like the former, but I have never >>>>>>>>>>>> worked with >>>>>>>>>>>> x86 nor ACPI. >>>>>>>>>> >>>>>>>>>> include/watchdog.h contains helper to configure the watchdog for >>>>>>>>>> Xen. We >>>>>>>>>> also have per-VM watchdog and this is configured by the hypercall >>>>>>>>>> SCHEDOP_watchdog. >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Currently forwarding it to Dom0 has not caused any watchdog resets >>>>>>>>>>>> with >>>>>>>>>>>> our watchdog timeout settings, our specific Dom0 setup and VM >>>>>>>>>>>> count. >>>>>>>>>> >>>>>>>>>> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI >>>>>>>>>> watchdog. So I think it would make more sense if this is not exposed >>>>>>>>>> to >>>>>>>>>> dom0 (even if Xen is doing nothing with it). >>>>>>>>>> >>>>>>>>>> Can you try to hide the SMCs and check if dom0 still behave properly? >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>> This SMC manages a hardware watchdog, if it's not pinged within a >>>>>>>>> specific interval the entire board resets. >>>>>>>> >>>>>>>> Do you know what's the default interval? Is it large enough so Xen + >>>>>>>> dom0 can boot (at least up to when the watchdog driver is initialized)? >>>>>>>> >>>>>>>>> If I block the SMCs the watchdog driver in Dom0 will fail to ping the >>>>>>>>> watchdog, triggering a board reset because the system looks to have >>>>>>>>> become unresponsive. The reason this patch set started is because we >>>>>>>>> couldn't ping the watchdog when running with Xen. >>>>>>>>> In our specific system the bootloader enables the watchdog as early as >>>>>>>>> possible so that we can get watchdog protection for as much of the >>>>>>>>> boot >>>>>>>>> as we possibly can. >>>>>>>>> So, if we are to block the SMC from Dom0, then Xen needs to take over >>>>>>>>> the pinging. It could be implemented similarly to the NMI watchdog, >>>>>>>>> except that the system will reset if the ping is missed rather than >>>>>>>>> backtrace. >>>>>>>>> It would also mean that Xen will get a whole watchdog driver-category >>>>>>>>> due to the watchdog being vendor and sometimes even SoC specific when >>>>>>>>> it >>>>>>>>> comes to Arm. >>>>>>>>> My understanding of the domain watchdog code is that today the domain >>>>>>>>> needs to call SCHEDOP_watchdog at least once to start the watchdog >>>>>>>>> timer. Since watchdog protection through the whole boot process is >>>>>>>>> desirable we'd need some core changes, such as an option to start the >>>>>>>>> domain watchdog on init. > >>>>>>>>> It's quite a big change to make >>>>>>>> >>>>>>>> For clarification, above you seem to mention two changes: >>>>>>>> >>>>>>>> 1) Allow Xen to use the HW watchdog >>>>>>>> 2) Allow the domain to use the watchdog early >>>>>>>> >>>>>>>> I am assuming by big change, you are referring to 2? >>>>>>>> >>>>>>>> , while I am not against doing it if it >>>>>>>>> makes sense, I now wonder if Xen should manage hardware watchdogs. >>>>>>>>> Looking at the domain watchdog code it looks like if a domain does not >>>>>>>>> get enough execution time, the watchdog will not be pinged enough and >>>>>>>>> the guest will be reset. So either watchdog approach requires Dom0 to >>>>>>>>> get execution time. Dom0 also needs to service all the PV backends >>>>>>>>> it's >>>>>>>>> responsible for. I'm not sure it's valuable to add another layer of >>>>>>>>> watchdog for this scenario as the end result (checking that the entire >>>>>>>>> system works) is achieved without it as well. >>>>>>>>> So, before I try to find the time to make a proposal for moving the >>>>>>>>> hardware watchdog bit to Xen, do we really want it? >>>>>>>> >>>>>>>> Thanks for the details. Given that the watchdog is enabled by the >>>>>>>> bootloader, I think we want Xen to drive the watchdog for two reasons: >>>>>>>> 1) In true dom0less environment, dom0 would not exist >>>>>>>> 2) You are relying on Xen + Dom0 to boot (or at least enough to get >>>>>>>> the watchdog working) within the watchdog interval. >>>>>>> >>>>>>> Definitely we need to consider the case where there is no Dom0. >>>>>>> >>>>>>> I think there are in fact 3 different use cases here: >>>>>>> - watchdog fully driven in a domain (dom0 or another): would work if it >>>>>>> is expected >>>>>>> that Xen + Domain boot time is under the watchdog initial refresh >>>>>>> rate. I think this >>>>>>> could make sense on some applications where your system depends on a >>>>>>> specific >>>>>>> domain to be properly booted to work. This would require an initial >>>>>>> refresh time >>>>>>> configurable in the boot loader probably. >>>>>> This is our use-case. ^ >>>>>> Our dom0 is monitoring and managing the other domains in our system. >>>>>> Without dom0 working the system isn't really working as a whole. >>>>>> @Julien: Would you be ok with the patch set continuing in the direction >>>>>> of the >>>>>> original proposal, letting another party (or me at a later time) >>>>>> implement >>>>>> the fully driven by Xen option? >>>>> I am concerned about this particular point from Bertrand: >>>>> >>>>> "would work if it is expected that Xen + Domain boot time is under the >>>>> watchdog initial refresh rate." >>>>> >>>>> How will a user be able to figure out how to initially configure the >>>>> watchdog? Is this something you can easily configure in the bootloader at >>>>> runtime? >>> >>> If you go through the trouble of enabling the watchdog in the bootloader you >>> usually know what you're doing and set the timeout yourself. >>> >>> Since our systems can be mounted in really annoying locations (both in >>> installations and world-wise) we need as much auto-recovery as possible to >>> avoid people having to travel to collect a unit that just needed a reset due >>> to some unexpected obscure issue at startup. >> >> I definitely get the need do not get me wrong. >> I am just concerned by potential users having target restarting when using >> Xen because of that and not knowing why. >> >>>> >>>> Definitely here it would be better to have the watchdog turned off by >>>> default and document how to enable it in the firmware. >>>> >>>> Even if a long timeout is configured by default, a user could run into >>>> trouble if using a linux without watchdog or not running linux or using >>>> dom0less without a linux having access to it. >>>> I agree with Julien here that the concern could be that users would come >>>> to us instead of NXP if there is system is doing a reset without reasons >>>> after some seconds or minutes. >>> >>> I could add myself as a reviewer for the iMX parts if that helps routing >>> such >>> questions (and future patches) also to me. We have experience with the QXP, >>> and the QM (for the supported parts by this patch set) is identical. >>> >>> Would that help with the concerns? >> >> Definitely it could help. > > I'll figure out how to include myself in the MAINTAINERS file for the > next spin. > Thanks. >> >>> >>>> >>>>> >>>>> >>>>> Overall, I am not for this approach at least in the current status. I >>>>> would be more inclined if there are some documentations explaining how >>>>> this is supposed to be configured on NXP, so others can use the code. >>>>> >>>>> Anyway, this is why we have multiple Arm maintainers for this kind of >>>>> situation. If they other agrees with you, then they can ack the patch and >>>>> this can be merged. >>>> >>>> I agree with Stefano that it would be good to have those board supported. >>>> >>>> One thing i could suggest until there is a watchdog driver inside Xen is >>>> to have a clear Warning at Xen boot on those boards in the console so that >>>> we could at least identify the reason easily if a reset occurs for someone. >>> >>> How do other SoCs deal with this currently? The iMX SoCs aren't the only >>> ones >>> with a watchdog, just the first one added to Xen that pings the watchdog >>> over >>> an SMC. What about the OMAPs, the R-Cars, Xilinx's, Exynos' and so on? >> >> As far as I know the watchdog is usually not active until a driver activates >> it. >> Which means that by default it will not fire. >> Having it activated by the bootloader by default is not usual. >> Now definitely on a lot of use cases the users are activating it in the >> booloader >> but their systems are design for it. >> >> Do I understand that the default boot loader configuration is not activating >> it on your side ? > > We are using a bootloader called Punchboot [1] developed by one of our > employees. With Punchboot you have to explicitly write the code to > enable the watchdog yourself. In u-boot you need to enable the watchdog > drivers and configure the watchdog first before it is started. I don't > know how the situation is in other bootloaders such as BareBox. Then a user not knowing will not have it activated and will not encounter issues. So all good from my point of view :-) Regards Bertrand > > Best regards // John Ernberg > > [1]: https://github.com/jonasblixt/punchboot
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |