Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue



Hi Julien,

On 3/8/24 15:04, Julien Grall wrote:
> Hi John,
> 
> Thank you for the reply.
> 
> On 08/03/2024 13:40, John Ernberg wrote:
>> On 3/7/24 00:07, Julien Grall wrote:
>>>   > Ping on the watchdog discussion bits.
>>>
>>> Sorry for the late reply.
>>>
>>> On 06/03/2024 13:13, John Ernberg wrote:
>>>> On 2/9/24 14:14, John Ernberg wrote:
>>>>>
>>>>>>      * IMX_SIP_TIMER_*:  This seems to be related to the watchdog.
>>>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So those
>>>>>> calls would be used by Xen.
>>>>>
>>>>> That is indeed a watchdog SIP, and also for setting the SoC internal RTC
>>>>> if it is being used.
>>>>>
>>>>> I looked around if there was previous discussion and only really
>>>>> found [3].
>>>>> Is the xen/xen/include/watchdog.h header meant to be for this kind of
>>>>> watchdog support or is that more for the VM watchdog? Looking at the x86
>>>>> ACPI NMI watchdog it seems like the former, but I have never worked with
>>>>> x86 nor ACPI.
>>>
>>> include/watchdog.h contains helpers to configure the watchdog for Xen. We
>>> also have a per-VM watchdog, which is configured by the hypercall
>>> SCHEDOP_watchdog.
>>>
>>>>>
>>>>> Currently forwarding it to Dom0 has not caused any watchdog resets with
>>>>> our watchdog timeout settings, our specific Dom0 setup and VM count.
>>>
>>> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI
>>> watchdog. So I think it would make more sense if this is not exposed to
>>> dom0 (even if Xen is doing nothing with it).
>>>
>>> Can you try to hide the SMCs and check if dom0 still behaves properly?
>>>
>>> Cheers,
>>>
>>
>> This SMC manages a hardware watchdog, if it's not pinged within a
>> specific interval the entire board resets.
> 
> Do you know what's the default interval? Is it large enough so Xen + 
> dom0 can boot (at least up to when the watchdog driver is initialized)?
> 
>>
>> If I block the SMCs the watchdog driver in Dom0 will fail to ping the
>> watchdog, triggering a board reset because the system looks to have
>> become unresponsive. The reason this patch set started is because we
>> couldn't ping the watchdog when running with Xen.
>>
>> In our specific system the bootloader enables the watchdog as early as
>> possible so that we can get watchdog protection for as much of the boot
>> as we possibly can.
>>
>> So, if we are to block the SMC from Dom0, then Xen needs to take over
>> the pinging. It could be implemented similarly to the NMI watchdog,
>> except that the system will reset if the ping is missed rather than
>> backtrace.
>> It would also mean that Xen will get a whole watchdog driver-category
>> due to the watchdog being vendor and sometimes even SoC specific when it
>> comes to Arm.
>>
>> My understanding of the domain watchdog code is that today the domain
>> needs to call SCHEDOP_watchdog at least once to start the watchdog
>> timer. Since watchdog protection through the whole boot process is
>> desirable we'd need some core changes, such as an option to start the
>> domain watchdog on init.
>> It's quite a big change to make
> 
> For clarification, above you seem to mention two changes:
> 
>   1) Allow Xen to use the HW watchdog
>   2) Allow the domain to use the watchdog early
> 
> I am assuming by big change, you are referring to 2?

Both of them. I'm expecting the addition of a new driver category 
(hardware watchdog) to be a decent amount of work as well.
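
For illustration only, here is a rough sketch of what such a driver category might boil down to, assuming a per-SoC hook table and a periodic pinger. None of these names exist in Xen today: `hw_watchdog_ops`, `fake_wdt` and `run_with_pinger` are made up, and the fake device merely stands in for the firmware behind the SMC. It only models the behaviour discussed above: with a pinger the board stays up, without one (e.g. SMCs blocked and nobody else pinging) it resets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-SoC hook table; not an existing Xen interface. */
struct hw_watchdog_ops {
    const char *name;
    int (*ping)(void *priv);   /* a real driver would issue the vendor SMC here */
};

/* Simulated watchdog hardware, standing in for the firmware-managed timer. */
struct fake_wdt {
    uint32_t timeout_ms;   /* interval after which the board resets */
    uint32_t elapsed_ms;   /* time since the last successful ping */
    bool board_reset;      /* latched once the timeout has expired */
};

/* Ping: restart the countdown. */
static int fake_wdt_ping(void *priv)
{
    struct fake_wdt *w = priv;

    w->elapsed_ms = 0;
    return 0;
}

static const struct hw_watchdog_ops fake_wdt_ops = {
    .name = "fake-wdt",
    .ping = fake_wdt_ping,
};

/* Advance simulated time and latch a reset once the timeout is reached. */
static void fake_wdt_tick(struct fake_wdt *w, uint32_t ms)
{
    w->elapsed_ms += ms;
    if (w->elapsed_ms >= w->timeout_ms)
        w->board_reset = true;
}

/*
 * Run for total_ms, pinging every period_ms when ops is non-NULL.
 * Assumes period_ms > 0. Returns true if the board survived.
 */
static bool run_with_pinger(const struct hw_watchdog_ops *ops,
                            struct fake_wdt *w,
                            uint32_t period_ms, uint32_t total_ms)
{
    for (uint32_t t = 0; t < total_ms; t += period_ms) {
        fake_wdt_tick(w, period_ms);
        if (w->board_reset)
            return false;
        if (ops)
            ops->ping(w);
    }
    return true;
}
```

Passing `NULL` for `ops` models the case where nothing pings the hardware watchdog. The real work would be in the per-SoC `ping` implementations and in wiring the pinger into Xen's timers, which this sketch does not attempt.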
> 
>> , while I am not against doing it if it
>> makes sense, I now wonder if Xen should manage hardware watchdogs.
>> Looking at the domain watchdog code it looks like if a domain does not
>> get enough execution time, the watchdog will not be pinged enough and
>> the guest will be reset. So either watchdog approach requires Dom0 to
>> get execution time. Dom0 also needs to service all the PV backends it's
>> responsible for. I'm not sure it's valuable to add another layer of
>> watchdog for this scenario as the end result (checking that the entire
>> system works) is achieved without it as well.
>>
>> So, before I try to find the time to make a proposal for moving the
>> hardware watchdog bit to Xen, do we really want it?
> 
> Thanks for the details. Given that the watchdog is enabled by the
> bootloader, I think we want Xen to drive the watchdog for two reasons:
>   1) In a true dom0less environment, dom0 would not exist
>   2) You are relying on Xen + Dom0 to boot (or at least enough to get
>      the watchdog working) within the watchdog interval.
> 
> Let's see what the other Arm maintainer thinks.
> 

Regards // John Ernberg

 

