
Re: [Xen-devel] Problems building and running Xen on Hikey960



Hi Julien,

On Sat, 10 Nov 2018 at 00:22, Julien Grall <julien.grall@xxxxxxx> wrote:
> > Firstly, Xen fails to bring up any other CPUs but the one it is booting on:
> >
> > (XEN) Bringing up CPU1
> > (XEN) Failed to bring up CPU1
> > (XEN) Failed to bring up CPU 1 (error -9)
> > (XEN) Bringing up CPU2
> > (XEN) Failed to bring up CPU2
> > (XEN) Failed to bring up CPU 2 (error -9)
> > (XEN) Bringing up CPU3
> > (XEN) Failed to bring up CPU3
> > (XEN) Failed to bring up CPU 3 (error -9)
> > (XEN) Bringing up CPU4
> > (XEN) Failed to bring up CPU4
> > (XEN) Failed to bring up CPU 4 (error -9)
> > (XEN) Bringing up CPU5
> > (XEN) Failed to bring up CPU5
> > (XEN) Failed to bring up CPU 5 (error -9)
> > (XEN) Bringing up CPU6
> > (XEN) Failed to bring up CPU6
> > (XEN) Failed to bring up CPU 6 (error -9)
> > (XEN) Bringing up CPU7
> > (XEN) Failed to bring up CPU7
> > (XEN) Failed to bring up CPU 7 (error -9)
> > (XEN) Brought up 1 CPUs
> >
> > I have traced this error code -9 being returned by call_psci_cpu_on.
>
> A similar error was reported a couple of months ago on the mailing list. From the
> report, a regression was introduced between Xen 4.8 and unstable.
>
> Unfortunately, I don't have a HiKey board to bisect it. May I ask if you can
> bisect it? If you can point to the offending commit, I should be able to provide
> ideas why it breaks.

I managed to bisect this to commit
9f954a5e90414d10632e6c2fef5a33ea8a4a1e4e. Reverting this revert (!) on
top of current master leads to the CPUs (at least the Cortex-A53 cores
that match the boot CPU's MIDR, as expected) being brought online
correctly:

(XEN) Bringing up CPU1
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
(XEN) CPU 2 booted.
(XEN) Bringing up CPU3
(XEN) CPU 3 booted.
(XEN) Bringing up CPU4
(XEN) CPU4 MIDR (0x410fd091) does not match boot CPU MIDR (0x410fd034),
(XEN) disable cpu (see big.LITTLE.txt under docs/).
(XEN) CPU4 never came online
(XEN) Failed to bring up CPU 4 (error -5)
(XEN) Bringing up CPU5
(XEN) CPU5 MIDR (0x410fd091) does not match boot CPU MIDR (0x410fd034),
(XEN) disable cpu (see big.LITTLE.txt under docs/).
(XEN) CPU5 never came online
(XEN) Failed to bring up CPU 5 (error -5)
(XEN) Bringing up CPU6
(XEN) CPU6 MIDR (0x410fd091) does not match boot CPU MIDR (0x410fd034),
(XEN) disable cpu (see big.LITTLE.txt under docs/).
(XEN) CPU6 never came online
(XEN) Failed to bring up CPU 6 (error -5)
(XEN) Bringing up CPU7
(XEN) CPU7 MIDR (0x410fd091) does not match boot CPU MIDR (0x410fd034),
(XEN) disable cpu (see big.LITTLE.txt under docs/).
(XEN) CPU7 never came online
(XEN) Failed to bring up CPU 7 (error -5)
(XEN) Brought up 4 CPUs
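
For reference, the two MIDR values in the log can be decoded by hand. Below is a minimal sketch, assuming the standard ARMv8 MIDR_EL1 field layout; the helper names are made up for illustration, and the part-number table deliberately lists only the two parts seen here:

```python
# Part numbers for the two core types seen in the log above. This table is
# intentionally incomplete; it is only meant for this board.
PART_NAMES = {0xD03: "Cortex-A53", 0xD09: "Cortex-A73"}


def decode_midr(midr):
    """Split a 32-bit MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # 0x41 is Arm Ltd.
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "part": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }


def core_name(midr):
    """Return a readable core name, or the raw part number if unknown."""
    part = decode_midr(midr)["part"]
    return PART_NAMES.get(part, "unknown part 0x%03x" % part)


if __name__ == "__main__":
    for label, midr in (("boot CPU", 0x410FD034), ("CPU4-7", 0x410FD091)):
        print("%s: %s" % (label, core_name(midr)))
```

So the boot CPU and CPU1-3 report the same part number, while CPU4-7 report a different one, which is why Xen refuses to keep them online.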

> > Secondly, Linux fails when it tries to initialise AMBA devices:
> >
> > [    0.941352] Synchronous External Abort: synchronous external abort (0x96000210) at 0xffff0000093fdfe0
> > [    0.950601] Internal error: : 96000210 [#1] PREEMPT SMP
> > [    0.955866] Modules linked in:
> > [    0.958990] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc7-linaro-hikey960+ #8
> > [    0.966791] Hardware name: HiKey960 (DT)
> > [    0.970777] task: ffff80001d900000 task.stack: ffff000008058000
> > [    0.976778] PC is at amba_device_try_add+0x108/0x260
> > [    0.981791] LR is at amba_device_try_add+0xf0/0x260
> > [    0.986735] pc : [<ffff0000084eada8>] lr : [<ffff0000084ead90>] pstate: 60000045
> > [    0.994192] sp : ffff00000805bbf0
> > [    0.997572] x29: ffff00000805bbf0 x28: 0000000000000000
> > [    1.002953] x27: ffff0000090203a8 x26: 0000000000000000
> > [    1.008326] x25: ffff80001dbf9810 x24: 0000000000000000
> > [    1.013702] x23: ffff0000093fd000 x22: 0000000000001000
> > [    1.019079] x21: ffff80001cc1b6f8 x20: 0000000000000000
> > [    1.024455] x19: ffff80001cc1b400 x18: 0000000000000010
> > [    1.029832] x17: 0000000000000001 x16: 00000000deadbeef
> > [    1.035209] x15: 0000000000000006 x14: ffffffffffffffff
> > [    1.040585] x13: 0000000000000020 x12: 0101010101010101
> > [    1.045962] x11: 0000000000000020 x10: 0101010101010101
> > [    1.051338] x9 : 0000000000000000 x8 : ffff80001cc0cf00
> > [    1.056717] x7 : 0000000000000000 x6 : 000000000000003f
> > [    1.062092] x5 : 0000000000000000 x4 : 0000000000000000
> > [    1.067468] x3 : 0000000000000000 x2 : 0000000000000000
> > [    1.072845] x1 : ffff80001d900000 x0 : ffff0000093fdfe0
> > [    1.078223] Process swapper/0 (pid: 1, stack limit = 0xffff000008058000)
> > [    1.084989] Call trace:
> > [    1.087504] Exception stack(0xffff00000805bab0 to 0xffff00000805bbf0)
> > [    1.094008] baa0:                                   ffff0000093fdfe0 ffff80001d900000
> > [    1.101902] bac0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    1.109790] bae0: 000000000000003f 0000000000000000 ffff80001cc0cf00 0000000000000000
> > [    1.117684] bb00: 0101010101010101 0000000000000020 0101010101010101 0000000000000020
> > [    1.125576] bb20: ffffffffffffffff 0000000000000006 00000000deadbeef 0000000000000001
> > [    1.133468] bb40: 0000000000000010 ffff80001cc1b400 0000000000000000 ffff80001cc1b6f8
> > [    1.141356] bb60: 0000000000001000 ffff0000093fd000 0000000000000000 ffff80001dbf9810
> > [    1.149248] bb80: 0000000000000000 ffff0000090203a8 0000000000000000 ffff00000805bbf0
> > [    1.157139] bba0: ffff0000084ead90 ffff00000805bbf0 ffff0000084eada8 0000000060000045
> > [    1.165034] bbc0: ffff00000805bbf0 ffff0000084ead90 ffffffffffffffff 00000000fffffffe
> > [    1.172921] bbe0: ffff00000805bbf0 ffff0000084eada8
> > [    1.177865] [<ffff0000084eada8>] amba_device_try_add+0x108/0x260
> > [    1.183935] [<ffff0000084eafec>] amba_device_add+0x1c/0xd8
> > [    1.189493] [<ffff00000890fbbc>] of_platform_bus_create+0x26c/0x300
> > [    1.195814] [<ffff00000890fa74>] of_platform_bus_create+0x124/0x300
> > [    1.202145] [<ffff00000890fd7c>] of_platform_populate+0x4c/0xb0
> > [    1.208135] [<ffff000008f6e2ac>] of_platform_default_populate_init+0x64/0x78
> > [    1.215247] [<ffff000008083978>] do_one_initcall+0x38/0x120
> > [    1.220882] [<ffff000008f20d18>] kernel_init_freeable+0x184/0x224
> > [    1.227038] [<ffff000008a6a360>] kernel_init+0x10/0x100
> > [    1.232323] [<ffff000008084b60>] ret_from_fork+0x10/0x18
> > [    1.237703] Code: d10082c0 52800002 8b0002e0 52800018 (b9400001)
> > [    1.243880] ---[ end trace dcbf70aa30c979a8 ]---
> > [    1.248573] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> >
> > This does not occur when booting the same Linux kernel without Xen.
> >
> > I have traced this to amba_device_try_add trying to access the pid and
> > cid through ioremap'd addresses of this node in the device tree (in
> > arch/arm64/boot/dts/hisilicon/hi3660-coresight.dtsi):
>
> Thank you for confirming it works on Linux baremetal and tracking down the
> problem.
>
> It is not entirely clear why you receive an external abort here. This may be
> due to misconfiguration of the stage-2 page-tables.
>
> Could you try Xen with this small change? The patch should print a message if
> the virtual abort was received by Xen but forwarded to the guest.
>
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 51d2e42c77..f95135d030 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -660,6 +660,9 @@ static void inject_vabt_exception(struct cpu_user_regs *regs)
>   {
>       const union hsr hsr = { .bits = regs->hsr };
>
> +    printk("%pv: Inject Virtual Abort\n", current);
> +    dump_execution_state();
> +
>       /*
>        * SVC/HVC/SMC already have an adjusted PC (See ARM ARM DDI 0487A.j
>        * D1.10.1 for more details), which we need to correct in order to

I did not try your patch, but...

> > /* A73 cluster internal coresight */
> > etm@4,ed440000 {
> >          compatible = "arm,coresight-etm4x","arm,primecell";
> >          reg = <0 0xed440000 0 0x1000>;
> >          clocks = <&pclk>;
> >          clock-names = "apb_pclk";
> >          cpu = <&cpu4>;
> >          port {
> >                  etm4_out_port: endpoint {
> >                          remote-endpoint = <&funnel1_in_port0>;
> >                  };
> >          };
> > };
> >
> > ARM is still relatively new to me and I'm stuck on what I should be
> > attempting next. I would simply not compile Linux AMBA support (by not
> > setting CONFIG_AMBA) but it appears that this is selected as a reverse
> > dependency by CONFIG_ARM on Linux, so I am unsure if this is wise or
> > even possible.
>
> I would just drop the node you copied above from the Device-Tree and see if
> you can go further in the boot.

...turns out that these nodes appear to belong to the A73 cores
(which were not brought online previously and still aren't with the
reverted revert), so munging the DT to remove these nodes fixes
this problem too.
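
In case it helps anyone else, here is a rough sketch of the kind of DT munging I mean. It is not the exact procedure I used: it assumes you are editing the .dts source as text, the function name is hypothetical, and the brace matching is naive (it would be confused by braces inside strings or comments). Any other node that references a label defined inside a removed node would also need adjusting before dtc accepts the result.

```python
def remove_node(dts, header):
    """Return `dts` with the first node starting at `header` removed.

    `header` is the text that begins the node, e.g. its name and unit
    address. The node body is found by counting braces, so this is only a
    sketch for simple, well-formed input.
    """
    start = dts.find(header)
    if start < 0:
        return dts  # nothing to remove

    depth = 0
    for pos in range(dts.index("{", start), len(dts)):
        if dts[pos] == "{":
            depth += 1
        elif dts[pos] == "}":
            depth -= 1
            if depth == 0:
                end = pos + 1
                # Also drop the ';' that terminates the node.
                if dts[end:end + 1] == ";":
                    end += 1
                return dts[:start] + dts[end:]
    raise ValueError("unbalanced braces after %r" % header)


if __name__ == "__main__":
    sample = (
        "keep@0 {\n\treg = <0>;\n};\n"
        "etm@4,ed440000 {\n\tport {\n\t\tep: endpoint {\n\t\t};\n\t};\n};\n"
    )
    print(remove_node(sample, "etm@4,ed440000"))
```

Calling it once per coresight node that points at a CPU Xen does not bring up should be enough to get past the abort.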

Thanks for your help!

- Matthew
