[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue


  • To: John Ernberg <john.ernberg@xxxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Thu, 21 Mar 2024 16:15:33 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com])
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=69aJvN153GSKxFDX3cPN7v7+CUGmUGaFIwQt6g7ryyo=; b=NfbOW9GYJwlMq50GPlJwmGkcR5Nlei53d2nUmUFpHiji9K52g6i7a4EcJByseww34ok3cK+NUnzNFtRFU8sLlqtwUqlMRXSmfGP07Ae6wlcK+PcnPOp5iA35Kt+4c6dzKAYJpMePmQHrGRbue8qzGun4Y5uD3tXDMtCMVqOq8TY+oroCe6XARRRHaNMExJ+bRwKZEd55AuzDRIo/dPVO8S33O1sAy0x4I+MTjzNzr9QkWoEE5c2CvuZrBeW3o0K1zhKTHBwfs2puj7cqLXp0Bu/GMj4XZzJOlqiKWOTe7SV7bJP+BkwEcBnNN7A9e3XUUOoOnjliH1JtVfOw4yARLQ==
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=69aJvN153GSKxFDX3cPN7v7+CUGmUGaFIwQt6g7ryyo=; b=IyZkTQuUgzzaUUz7gKy9sYHzhhLQitPyIef1gZYktHf08CDgqEibE8bRVfuPKfxPqswJOm9R8r/vApwme9oWhtOKhPhPbfmHE12lSB/KSG3Pi7QonHZ7iZ5FIqC6Iec7ve0BctgRZUnYRMMCq07nxmY1o8tQDzeRXvthPFAhmIglLk8o7kuX6ZJ3BeqcpXnqpZ2us7gclYuylcJ5vnwL09mfrB1XGkvK8EkLRaukqrBrEzNy4s6TpiglMimaiD3oe/lFih+8K8ysDskLECAKQaOn52LbPIHGi+XrM2jggRYBA4qL2+92Glt1u5g/ykb2BeJtTbmTW7wF0YpozTglUA==
  • Arc-seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=eYwBdgNdt6szFJnEV4pChF3MPDUP2rB9KtZV49kOGbjCQEnPP18i/qBovB1E4EoBJmXIGEo6eBBJ2s9B6zc4JU8HqrZleB/KED9+x4XvOIlbak6whA0w8Gh7D3l4Ff47sLxf2U5j4fHWGW4Lc+pVLWAoLggy7C/n4RxoN34+ZJXc920rZkwOeoWWFAePeCmc30X4Uu4beDcbQbBwjZtyPqc6ITMuTbbVLVLXqvVUCwMFYB5zCPhqcf0bHW4pmYV/p9MexibAFt5UX8yb/dTKD87jen4SolNslLZF8Z/TZcebEWYMKj/TMFU6oS7RPdLItAOLFJR9clyhZYabA+7mqQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nnAELmX9XjTJBWFUzbENONjoQjDewai5H/ViaZ5WceEeptEwti6WD2WKOU+9GcMaFYsRRmaECNziKXcnAKHXyNxjTk7sEQqBb1gfv0/h24m9oyC6omcfSjiKQGqw+yp7jJkIWsJxcnCE3esHv/i3a0gO4dDzLLtq1rXafybl1YxbWJXHIx5DCHXc7GD3Ps6jQjr1UrDTOuwbyjdG/y+SX3crEcjm4HymqpB0UhwGCaQS3gWzJvAuY0ViWY0beHaxtw+mq2LW8CRLNbLJ+EndntM2nsMzZ2Fx2LOMJ4A2/8JAtj0JmVLH9WJor/okXhMDTMdq/n2K3sBxpWKee2RatQ==
  • Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Cc: Julien Grall <julien@xxxxxxx>, Peng Fan <peng.fan@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Jonas Blixt <jonas.blixt@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 21 Mar 2024 16:15:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHaVDuqVQ/kgxLZgUK/m8/hE0ez5bDz198AgAA1LICAAVx4AIAAQjaAgAEdtQCAAynKwIABnCuAgAZ8E4CAKNxCgIAApf0AgAKGQwCAAAbHgIAHmWEAgAtpgACAABVOgIAA6w+AgACMxYCAAAK5AA==
  • Thread-topic: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue

Hi John,

> On 21 Mar 2024, at 17:05, John Ernberg <john.ernberg@xxxxxxxx> wrote:
> 
> Hi Bertrand,
> 
> On 3/21/24 08:41, Bertrand Marquis wrote:
>> Hi,
>> 
>>> On 20 Mar 2024, at 18:40, Julien Grall <julien@xxxxxxx> wrote:
>>> 
>>> Hi John,
>>> 
>>> On 20/03/2024 16:24, John Ernberg wrote:
>>>> Hi Bertrand,
>>>> On 3/13/24 11:07, Bertrand Marquis wrote:
>>>>> Hi,
>>>>> 
>>>>>> On 8 Mar 2024, at 15:04, Julien Grall <julien@xxxxxxx> wrote:
>>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> Thank you for the reply.
>>>>>> 
>>>>>> On 08/03/2024 13:40, John Ernberg wrote:
>>>>>>> On 3/7/24 00:07, Julien Grall wrote:
>>>>>>>>> Ping on the watchdog discussion bits.
>>>>>>>> 
>>>>>>>> Sorry for the late reply.
>>>>>>>> 
>>>>>>>> On 06/03/2024 13:13, John Ernberg wrote:
>>>>>>>>> On 2/9/24 14:14, John Ernberg wrote:
>>>>>>>>>> 
>>>>>>>>>>>      * IMX_SIP_TIMER_*:  This seems to be related to the watchdog.
>>>>>>>>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So 
>>>>>>>>>>> those
>>>>>>>>>>> call will be used by Xen.
>>>>>>>>>> 
>>>>>>>>>> That is indeed a watchdog SIP, and also for setting the SoC internal 
>>>>>>>>>> RTC
>>>>>>>>>> if it is being used.
>>>>>>>>>> 
>>>>>>>>>> I looked around if there was previous discussion and only really
>>>>>>>>>> found [3].
>>>>>>>>>> Is the xen/xen/include/watchdog.h header meant to be for this kind of
>>>>>>>>>> watchdog support or is that more for the VM watchdog? Looking at the 
>>>>>>>>>> x86
>>>>>>>>>> ACPI NMI watchdog it seems like the former, but I have never worked 
>>>>>>>>>> with
>>>>>>>>>> x86 nor ACPI.
>>>>>>>> 
>>>>>>>> include/watchdog.h contains helper to configure the watchdog for Xen. 
>>>>>>>> We
>>>>>>>> also have per-VM watchdog and this is configured by the hypercall
>>>>>>>> SCHEDOP_watchdog.
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Currently forwarding it to Dom0 has not caused any watchdog resets 
>>>>>>>>>> with
>>>>>>>>>> our watchdog timeout settings, our specific Dom0 setup and VM count.
>>>>>>>> 
>>>>>>>> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI
>>>>>>>> watchdog. So I think it would make more sense if this is not exposed to
>>>>>>>> dom0 (even if Xen is doing nothing with it).
>>>>>>>> 
>>>>>>>> Can you try to hide the SMCs and check if dom0 still behave properly?
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> 
>>>>>>> This SMC manages a hardware watchdog, if it's not pinged within a
>>>>>>> specific interval the entire board resets.
>>>>>> 
>>>>>> Do you know what's the default interval? Is it large enough so Xen + 
>>>>>> dom0 can boot (at least up to when the watchdog driver is initialized)?
>>>>>> 
>>>>>>> If I block the SMCs the watchdog driver in Dom0 will fail to ping the
>>>>>>> watchdog, triggering a board reset because the system looks to have
>>>>>>> become unresponsive. The reason this patch set started is because we
>>>>>>> couldn't ping the watchdog when running with Xen.
>>>>>>> In our specific system the bootloader enables the watchdog as early as
>>>>>>> possible so that we can get watchdog protection for as much of the boot
>>>>>>> as we possibly can.
>>>>>>> So, if we are to block the SMC from Dom0, then Xen needs to take over
>>>>>>> the pinging. It could be implemented similarly to the NMI watchdog,
>>>>>>> except that the system will reset if the ping is missed rather than
>>>>>>> backtrace.
>>>>>>> It would also mean that Xen will get a whole watchdog driver-category
>>>>>>> due to the watchdog being vendor and sometimes even SoC specific when it
>>>>>>> comes to Arm.
>>>>>>> My understanding of the domain watchdog code is that today the domain
>>>>>>> needs to call SCHEDOP_watchdog at least once to start the watchdog
>>>>>>> timer. Since watchdog protection through the whole boot process is
>>>>>>> desirable we'd need some core changes, such as an option to start the
>>>>>>> domain watchdog on init. >
>>>>>>> It's quite a big change to make
>>>>>> 
>>>>>> For clarification, above you seem to mention two changes:
>>>>>> 
>>>>>> 1) Allow Xen to use the HW watchdog
>>>>>> 2) Allow the domain to use the watchdog early
>>>>>> 
>>>>>> I am assuming by big change, you are referring to 2?
>>>>>> 
>>>>>> , while I am not against doing it if it
>>>>>>> makes sense, I now wonder if Xen should manage hardware watchdogs.
>>>>>>> Looking at the domain watchdog code it looks like if a domain does not
>>>>>>> get enough execution time, the watchdog will not be pinged enough and
>>>>>>> the guest will be reset. So either watchdog approach requires Dom0 to
>>>>>>> get execution time. Dom0 also needs to service all the PV backends it's
>>>>>>> responsible for. I'm not sure it's valuable to add another layer of
>>>>>>> watchdog for this scenario as the end result (checking that the entire
>>>>>>> system works) is achieved without it as well.
>>>>>>> So, before I try to find the time to make a proposal for moving the
>>>>>>> hardware watchdog bit to Xen, do we really want it?
>>>>>> 
>>>>>> Thanks for the details. Given that the watchdog is enabled by the 
>>>>>> bootloader, I think we want Xen to drive the watchdog for two reasons:
>>>>>> 1) In true dom0less environment, dom0 would not exist
>>>>>> 2) You are relying on Xen + Dom0 to boot (or at least enough to get the 
>>>>>> watchdog working) within the watchdog interval.
>>>>> 
>>>>> Definitely we need to consider the case where there is no Dom0.
>>>>> 
>>>>> I think there are in fact 3 different use cases here:
>>>>> - watchdog fully driven in a domain (dom0 or another): would work if it 
>>>>> is expected
>>>>>    that Xen + Domain boot time is under the watchdog initial refresh 
>>>>> rate. I think this
>>>>>    could make sense on some applications where your system depends on a 
>>>>> specific
>>>>>    domain to be properly booted to work. This would require an initial 
>>>>> refresh time
>>>>>    configurable in the boot loader probably.
>>>> This is our use-case. ^
>>>> Our dom0 is monitoring and managing the other domains in our system.
>>>> Without dom0 working the system isn't really working as a whole.
>>>> @Julien: Would you be ok with the patch set continuing in the direction
>>>> of the
>>>> original proposal, letting another party (or me at a later time) implement
>>>> the fully driven by Xen option?
>>> I am concerned about this particular point from Bertrand:
>>> 
>>> "would work if it is expected that Xen + Domain boot time is under the 
>>> watchdog initial refresh rate."
>>> 
>>> How will a user be able to figure out how to initially configure the 
>>> watchdog? Is this something you can easily configure in the bootloader at 
>>> runtime?
> 
> If you go through the trouble of enabling the watchdog in the bootloader you
> usually know what you're doing and set the timeout yourself.
> 
> Since our systems can be mounted in really annoying locations (both in
> installations and world-wise) we need as much auto-recovery as possible to
> avoid people having to travel to collect a unit that just needed a reset due
> to some unexpected obscure issue at startup.

I definitely get the need do not get me wrong.
I am just concerned by potential users having target restarting when using Xen 
because of that and not knowing why.

>> 
>> Definitely here it would be better to have the watchdog turned off by 
>> default and document how to enable it in the firmware.
>> 
>> Even if a long timeout is configured by default, a user could run into 
>> trouble if using a linux without watchdog or not running linux or using 
>> dom0less without a linux having access to it.
>> I agree with Julien here that the concern could be that users would come to 
>> us instead of NXP if there is system is doing a reset without reasons after 
>> some seconds or minutes.
> 
> I could add myself as a reviewer for the iMX parts if that helps routing 
> such
> questions (and future patches) also to me. We have experience with the QXP,
> and the QM (for the supported parts by this patch set) is identical.
> 
> Would that help with the concerns?

Definitely it could help.

> 
>> 
>>> 
>>> 
>>> Overall, I am not for this approach at least in the current status. I would 
>>> be more inclined if there are some documentations explaining how this is 
>>> supposed to be configured on NXP, so others can use the code.
>>> 
>>> Anyway, this is why we have multiple Arm maintainers for this kind of 
>>> situation. If they other agrees with you, then they can ack the patch and 
>>> this can be merged.
>> 
>> I agree with Stefano that it would be good to have those board supported.
>> 
>> One thing i could suggest until there is a watchdog driver inside Xen is to 
>> have a clear Warning at Xen boot on those boards in the console so that we 
>> could at least identify the reason easily if a reset occurs for someone.
> 
> How do other SoCs deal with this currently? The iMX SoCs aren't the only 
> ones
> with a watchdog, just the first one added to Xen that pings the watchdog 
> over
> an SMC. What about the OMAPs, the R-Cars, Xilinx's, Exynos' and so on?

As far as I know the watchdog is usually not active until a driver activates it.
Which means that by default it will not fire.
Having it activated by the bootloader by default is not usual.
Now definitely on a lot of use cases the users are activating it in the 
booloader
but their systems are design for it.

Do I understand that the default boot loader configuration is not activating it 
on your side ?

Regards
Bertrand

> 
> Thanks! // John Ernberg





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.