[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug] Bring up Dom0 on Arm board



On Sat, 28 Jan 2023, 蔡力刚 wrote:
> Hi, 
> I was on vacation, sorry for the late reply.
> 
> >>>>>> About the mali and sdmmc drivers problem, I compare the log between 
> >>>>>>boot with xen and boot without xen.
> >>>>>> And found an error log as below:
> >>>>>> [ 65.517345] arm-scmi firmware:scmi: SCMI Notifications - Core Enabled.
> >>>>>> (XEN) d0v2 Unhandled SMC/HVC: 0x82000010
> >>>>>> [ 66.559382] arm-scmi firmware:scmi: unable to communicate with SCMI
> >>>>>> [ 66.559516] arm-scmi: probe of firmware:scmi failed with error -95
> >>>>>> It seems SCMI driver probe failed.
> >>>>>> So I did an experiment, disable SCMI driver and rebuild the Linux 
> >>>>>>kernel,
> >>>>>> boot up in normal way without xen, and reproduces the problem that 
> >>>>>>mali and sdmmc did not bring up.
> >>>>>> It looks like a high probability SCMI cause the problem.
> >>>>>> I read the Linux code and targeting located the error -95,
> >>>>>> It seems SCMI probe failed cause by SMCCC not supported, code as below:
> >>>>>> static int smc_send_message(struct scmi_chan_info *cinfo,
> >>>>>> struct scmi_xfer *xfer)
> >>>>>> {
> >>>>>> struct scmi_smc *scmi_info = cinfo->transport_info;
> >>>>>> struct arm_smccc_res res;
> >>>>>> mutex_lock(&scmi_info->shmem_lock);
> >>>>>> shmem_tx_prepare(scmi_info->shmem, xfer);
> >>>>>> if (scmi_info->irq)
> >>>>>> reinit_completion(&scmi_info->tx_complete);
> >>>>>> arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res);
> >>>>>> if (scmi_info->irq)
> >>>>>> wait_for_completion(&scmi_info->tx_complete);
> >>>>>> scmi_rx_callback(scmi_info->cinfo, 
> >>>>>>shmem_read_header(scmi_info->shmem));
> >>>>>> mutex_unlock(&scmi_info->shmem_lock);
> >>>>>> /* Only SMCCC_RET_NOT_SUPPORTED is valid error code */
> >>>>>> if (res.a0)
> >>>>>> return -EOPNOTSUPP;
> >>>>>> return 0;
> >>>>>> }
> >>>>>> #define EOPNOTSUPP 95 /* Operation not supported on transport endpoint 
> >>>>>>*/
> >>>>>> I also check the code where Unhandled SMC/HVC print in xen,
> >>>>>> and found the log cause by unhandled SMCCC call in function 
> >>>>>>vsmccc_handle_call().
> >>>>>> Could it be xen unhandle SMCCC call cause SCMI driver probe failed ?
> >>>>>
> >>>>> Yes. The domain would need to talk to the host SCMI. This is not yet
> >>>>> supported because Xen doesn't provide a mediator (this is necessary to
> >>>>> ensure the safety of the call).
> >>>>>
> >>>>> If you are *only* looking to use the Mali driver in dom0. So you could
> >>>>> add some code in Xen to forward simply forward the request to the host
> >>>>> and check if it helps you.
> >>>>
> >>>> How can I pass the requset to the host? I'm not familiar with xen code,
> >>>> is there any
> >>>> reference code in xen?
> >>>
> >>> There are a couple of solution:
> >>> 1) You request the hypervisor to avoid trapping SVC. This would also
> >>> need some changes in Linux to force the Mali driver to use SVC rather
> >>> than HVC. There is a patch on xen-devel, to avoid trapping (see [1]).
> >>> 2) Add an allow list of the SMCCC operations. There are some examples
> >>> how to "emulate" SMC call in Xen (see vsmccc_handle_call()).
> >>>
> >>> Cheers,
> >>>
> >>> [1]
> >>> 
> >>>https://lore.kernel.org/xen-devel/alpine.DEB.2.21.2106241749310.24906@sstabellini-ThinkPad-T480s/
> >> I tried both solutions but it didn't work.
> >> First way:
> >> I add the code as the patch, add forward_smc=true to the Xen command line.
> >> Then boot, but still print the log 'Unhandled SMC/HVC ...',
> 
> Julien Grall wrote:
> > Can you confirm whether you ask the mali driver to use SMC call?
> 
> I add log to print hsr.ec value in do_trap_guest_sync(), 
> and shows call are HSR_EC_HVC64 and HSR_EC_SMC64.
> But HCR_TSC seems only handle HSR_EC_SMC64 call.
> 
> >> meanwhile xen
> >> throw an exception, log as below:
> >> (XEN) traps.c:1987:d0v3 HSR=0x92000007 pc=0xffffffc0106a3be8 
> >>gva=0xffffffc0127ad0a0 gpa=0x000000001000a0
> >> [ 5.966596] Unhandled fault at 0xffffffc0127ad0a0
> >> [ 5.966619] Mem abort info:
> >> [ 5.966633] ESR = 0x96000000
> >> [ 5.966649] EC = 0x25: DABT (current EL), IL = 32 bits
> >> [ 5.966666] SET = 0, FnV = 0
> >> [ 5.966680] EA = 0, S1PTW = 0
> >> [ 5.966694] Data abort info:
> >> [ 5.966708] ISV = 0, ISS = 0x00000000
> >> [ 5.966722] CM = 0, WnR = 0
> >> [ 5.966738] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000194b6000
> >> It seems forward_smc=true did not work, but cause an exception.
> 
> Julien Grall wrote:
> > This is indicating that the dom0 is trying to access a region that is 
> > not mapped.
> 
> > Can you check what the address 0x1000a0 is used for the host layout?
> 
> Stefano Stabellini wrote:
> > This is not "Unhandled SMC/HVC". The guest is trying to access address
> > 0x1000a0 which doesn't seem to be valid in the guest?
>
> > Do you know where 0x1000a0 is coming from? Could it be that one of the
> > SMC calls returns 0x1000a0 to the guest and the guest tries to access
> > it, but actually 0x1000a0 is not present in any valid device tree ranges
> > so it is not accessible from the guest?
> 
> > If 0x1000a0 is a "special" address returned by the firmware, it needs to
> > belong to a range described in one of the device tree nodes for it to be
> > accessible by dom0.
> 
> I searched 0x1000a0 in kernel code and in dtb files, but found nothing.
> So far I haven't figured out where 0x1000a0 is coming from.
> 
> >> Second way:
> >> I used zynqmp_eemi() in xilinx-zynqmp-eemi.c to handle smc call.
> >> The smc call seems succeed according to xen log.
> >> But after run smc call, kernel throw an exception, log as below:
> >> [ 8.771446] rockchip-pm-domain fd8d8000.power-management:power-controller:
> >> Looking up pcie-supply from device tree
> >> [ 8.771485] rockchip-pm-domain fd8d8000.power-management:power-controller:
> >> Looking up pcie-supply property in node 
> >>/power-management@fd8d8000/power-controller failed
> >> [ 66.037851] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> >> [ 66.037874] rcu: 3-...0: (0 ticks this GP) idle=196/1/0x4000000000000000 
> >>softirq=30/30 fqs=6000
> >> [ 66.037892] (detected by 5, t=18002 jiffies, g=-1147, q=81)
> >> [ 66.037905] Task dump for CPU 3:
> >> [ 66.037916] task:swapper/0 state:R running task stack: 0 pid: 1 ppid: 0 
> >>flags:0x0000002a
> >> [ 66.037939] Call trace:
> >> [ 66.037952] __switch_to+0x130/0x1a0
> >> [ 66.037968] 0xffffffc0112bc173
> >> It seems I need to modify zynqmp_eemi() code to adapt the kerenl,
> >> But based on what to modify zynqmp_eemi()?
> 
> Julien Grall wrote:
> > I am not sure I understand what are the changes you made in Xen. Can you 
> > post the diff?
> 
> The changes made in xen as below:
> diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c
> index b633ff2fe8..83bb063016 100644
> --- a/xen/arch/arm/vsmc.c
> +++ b/xen/arch/arm/vsmc.c
> @@ -27,6 +27,7 @@
>  #include <asm/traps.h>
>  #include <asm/vpsci.h>
>  #include <asm/platform.h>
> +#include <asm/platforms/xilinx-zynqmp-eemi.h>
>  
>  /* Number of functions currently supported by Hypervisor Service. */
>  #define XEN_SMCCC_FUNCTION_COUNT 3
> @@ -280,7 +281,8 @@ static bool vsmccc_handle_call(struct cpu_user_regs *regs)
>              handled = handle_sssc(regs);
>              break;
>          case ARM_SMCCC_OWNER_SIP:
> -            handled = platform_smc(regs);
> +            // handled = platform_smc(regs);
> +            handled = zynqmp_eemi(regs);
>              break;
>          case ARM_SMCCC_OWNER_TRUSTED_APP ... ARM_SMCCC_OWNER_TRUSTED_APP_END:
>          case ARM_SMCCC_OWNER_TRUSTED_OS ... ARM_SMCCC_OWNER_TRUSTED_OS_END:
> 
> diff --git a/xen/include/asm-arm/platforms/xilinx-zynqmp-eemi.h 
> b/xen/include/asm-arm/platforms/xilinx-zynqmp-eemi.h
> index cf25a9014d..55e20e99ca 100644
> --- a/xen/include/asm-arm/platforms/xilinx-zynqmp-eemi.h
> +++ b/xen/include/asm-arm/platforms/xilinx-zynqmp-eemi.h
> @@ -18,7 +18,7 @@
>  #include <asm/smccc.h>
>  
>  #define EEMI_FID(fid) ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> -                                         ARM_SMCCC_CONV_64,   \
> +                                         ARM_SMCCC_CONV_32,   \
>                                           ARM_SMCCC_OWNER_SIP, \
>                                           fid)
> 
> 
> Stefano Stabellini wrote:
> > In theory, there is no reason why forward_smc=true would not work but
> > xilinx-zynqmp-eemi.c would work. But anyway, I am appending the changes
> > to make sure xilinx-zynqmp-eemi.c forwards everything to the firmware.
>
> > It looks like the kernel is throwing an exception because it is blocked
> > for too long. Maybe the SMC call didn't actually succeed after all.
>
> > diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c 
> >b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > index 2053ed7ac5..20aa6afb47 100644
> > --- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > +++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
> > @@ -57,6 +57,8 @@ bool zynqmp_eemi(struct cpu_user_regs *regs)
> >      unsigned int pm_fn = fid & 0xFFFF;
> >      enum pm_ret_status ret;
> >  
> > +    goto forward_to_fw;
> > +
> >      switch ( fid )
> >      {
> >      /* Mandatory SMC32 functions. */
> 
> I tried this change in xen, still has error in kernel, log as below:
> 
> [    8.985102] rockchip-pm-domain fd8d8000.power-management:power-controller: 
> Looking up pcie-supply from device tree
> [    8.985185] rockchip-pm-domain fd8d8000.power-management:power-controller: 
> Looking up pcie-supply property in node 
> /power-management@fd8d8000/power-controller failed
> [   65.911230] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   65.911274] rcu:     0-...0: (1 ticks this GP) 
> idle=1ba/1/0x4000000000000000 softirq=29/29 fqs=6001 
> [   65.911311]  (detected by 2, t=18003 jiffies, g=-1147, q=82)
> [   65.911337] Task dump for CPU 0:
> [   65.911360] task:swapper/0       state:R  running task     stack:    0 
> pid:    1 ppid:     0 flags:0x0000002a
> [   65.911406] Call trace:
> [   65.911429]  __switch_to+0x130/0x1a0
> [   65.911453]  die+0x58/0x20c
> [   65.911473]  arm64_notify_die+0x88/0x8c
> [   65.911497]  do_mem_abort+0xa0/0xb8
> [   65.911520]  el1_abort+0x3c/0x5c
> [   65.911542]  el1_sync_handler+0x94/0xd0
> [   65.911563]  el1_sync+0x88/0x140
> [   65.911587]  copy_from_kernel_nofault+0xc8/0x128
> [   65.911612]  aarch64_insn_read+0x34/0x6c
> [   65.911635]  show_data.constprop.0+0xac/0x128
> [   65.911660]  show_regs+0xd8/0x118
> [   65.911681]  die+0xcc/0x20c
> [   65.911700]  arm64_notify_die+0x88/0x8c
> [   65.911722]  do_mem_abort+0xa0/0xb8
> [   65.911743]  el1_abort+0x3c/0x5c
> [   65.911764]  el1_sync_handler+0x94/0xd0
> [   65.911785]  el1_sync+0x88/0x140
> [   65.911808]  rockchip_gem_get_ddr_info+0x28/0x40
> [   65.911833]  rockchip_drm_init+0x110/0x114
> [   65.911855]  do_one_initcall+0xa0/0x1e8
> [   65.911878]  kernel_init_freeable+0x2a4/0x2ac
> [   65.911902]  kernel_init+0x20/0x11c
> [   65.911924]  ret_from_fork+0x10/0x30
> 
> I continue to try the second way, and find that during initialization,
> SMC/HVC call come from different case, funcid 0x82000010 handle in 
> HSR_EC_HVC64 case,
> funcid 0x82000009 handle in HSR_EC_SMC64 case.
> 
> log as below: 
> (XEN) d0v2 do_trap_hvc_smccc trap....
> (XEN) d0v2 Unhandled SMC/HVC: 0x82000010
> 
> (XEN) d0v3 do_trap_smc trap....
> (XEN) d0v3 Unhandled SMC/HVC: 0x82000009
> 
> So I change the code in vsmc.c as below:
> (The changes "goto forward_to_fw" remain)
> 
> diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c
> index b633ff2fe8..09636aab17 100644
> --- a/xen/arch/arm/vsmc.c
> +++ b/xen/arch/arm/vsmc.c
> @@ -27,6 +27,7 @@
>  #include <asm/traps.h>
>  #include <asm/vpsci.h>
>  #include <asm/platform.h>
> +#include <asm/platforms/xilinx-zynqmp-eemi.h>
>  
>  /* Number of functions currently supported by Hypervisor Service. */
>  #define XEN_SMCCC_FUNCTION_COUNT 3
> @@ -280,7 +281,12 @@ static bool vsmccc_handle_call(struct cpu_user_regs 
> *regs)
>              handled = handle_sssc(regs);
>              break;
>          case ARM_SMCCC_OWNER_SIP:
> -            handled = platform_smc(regs);
> +            if(hsr.ec == HSR_EC_HVC64)
> +            {
> +                handled = zynqmp_eemi(regs);
> +            }
> +            else
> +                handled = platform_smc(regs);
>              break;
>          case ARM_SMCCC_OWNER_TRUSTED_APP ... ARM_SMCCC_OWNER_TRUSTED_APP_END:
>          case ARM_SMCCC_OWNER_TRUSTED_OS ... ARM_SMCCC_OWNER_TRUSTED_OS_END:
> 
> After this change, mali and sdmmc driver bring up succeed.
> But I'm confused, is this the xen problem or kernel work SMC/HVC call in a 
> quirks way?

I think we have two problems here.

One problem, which is a known problem, is that sometimes the kernel can
make firmware calls (as SMC calls) to initialize a device driver. This
is what the "forward_smc" suggestion was meant to solve.

"forward_smc" is an effective workaround, but the proper solution would
be to write a platform driver like zynqmp_eemi which check which SMC
calls need to be forwarded and forward only those.

The second problem is that the kernel is making firmware calls using HVC
as transport instead of SMC. This is uncommon. That is the reason why
"forward_smc" didn't work. "forward_smc" only forward SMC calls, not HVC
calls.

But if you had a platform driver like zynqmp_eemi, in theory the
platform_smc() call in vsmccc_handle_call should have worked correctly
for both SMC and HVC coming from Linux.

Just to see if we understood the problem correctly, the appended patch
alone (no need for other changes) should work, if you pass
forward_firmware=true to the Xen command line.


diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c
index 7335276f3f..1d11634bff 100644
--- a/xen/arch/arm/vsmc.c
+++ b/xen/arch/arm/vsmc.c
@@ -8,6 +8,7 @@
 
 
 #include <xen/lib.h>
+#include <xen/param.h>
 #include <xen/types.h>
 #include <public/arch-arm/smccc.h>
 #include <asm/cpuerrata.h>
@@ -26,6 +27,9 @@
 /* Number of functions currently supported by Standard Service Service Calls. 
*/
 #define SSSC_SMCCC_FUNCTION_COUNT (3 + VPSCI_NR_FUNCS)
 
+static bool __read_mostly forward_fw = false;
+boolean_param("forward_firmware", forward_fw);
+
 static bool fill_uid(struct cpu_user_regs *regs, xen_uuid_t uuid)
 {
     int n;
@@ -224,6 +228,27 @@ static bool vsmccc_handle_call(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };
     uint32_t funcid = get_user_reg(regs, 0);
 
+    if ( forward_fw )
+    {
+        struct arm_smccc_res res;
+
+        arm_smccc_1_1_smc(get_user_reg(regs, 0),
+                          get_user_reg(regs, 1),
+                          get_user_reg(regs, 2),
+                          get_user_reg(regs, 3),
+                          get_user_reg(regs, 4),
+                          get_user_reg(regs, 5),
+                          get_user_reg(regs, 6),
+                          get_user_reg(regs, 7),
+                          &res);
+
+        set_user_reg(regs, 0, res.a0);
+        set_user_reg(regs, 1, res.a1);
+        set_user_reg(regs, 2, res.a2);
+        set_user_reg(regs, 3, res.a3);
+        return true;
+    }
+
     /*
      * Check immediate value for HVC32, HVC64 and SMC64.
      * It is not so easy to check immediate value for SMC32,

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.