
Re: [Xen-devel] [ARM] Bash often segfaults in Dom0 with the latest Xen



On 4 June 2013 15:45, Julien Grall <julien.grall@xxxxxxxxxx> wrote:
> Hi all,
>
> For a couple of weeks, I've been tracking an issue with Xen on ARM, with no luck.
>
> I've run out of ideas, so I'm sending this email to ask the community for advice.
>
> Most of the time bash aborts with random errors in dom0:
>   - page fault (data and prefetch abort)
>   - memory corruption (malloc corruption and invalid pointer)
>
> It's easy to reproduce by running ./configure on the xen tree.
>
> My environment is an arndale board:
>   - linux linaro 13.05 (using arndale_xen_dom0_defconfig and exynos5250_arndale.dts)
>   - opensuse 12.03 (http://en.opensuse.org/HCL:Arndale)
>   - xen upstream
>
> The linux tree can be retrieved from
> git://xenbits.xen.org/people/julieng/linux-arm.git
> using the branch linaro-3.10.
> That branch is based on the linaro tree with some patches for the dts
> and xen.
>
> The issue also occurs on the Versatile Express, but it's harder to reproduce there.
> Here the environment is:
>   - linux linaro 13.05 (using vexpress_xen_dom0_defconfig and vexpress_v2p_ca15_a7.dtb)
>   - ubuntu linaro 13.05
>   - xen upstream
>
> I have tried different distributions and Linux versions; the issue was the
> same.
> I did some testing to narrow down the bug and came up with the following test
> case:
>
> Only dom0 is running and each VCPU is pinned to a specific CPU
> (vcpu0 -> cpu0 and vcpu1 -> cpu1).
>
> The patch below removes the WFI trap and consequently prevents a VCPU from
> moving to another physical CPU.
> =========================================
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 6cfba1a..e89ca15 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
>      WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
>
>      /* Setup hypervisor traps */
> -    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
> +    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TSC, HCR_EL2);
>      isb();
>  }
>
> =========================================
>
> If a bash process is pinned to a specific CPU with taskset, the process
> seems to always run without any issue.
>
> taskset -c 0 ./configure
>
> I guess it's a caching issue, but each time I tried to change the caching
> policy, Linux did not boot.
>
> Thanks in advance for any advice.
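
For reference, the effect of the quoted patch on the register value can be
sketched as below. This is a standalone illustration, not Xen code: the HCR_*
names mirror the patch, the bit positions are my assumption based on the ARMv7
virtualization extensions' HCR layout, and demo_hcr_value() is a hypothetical
helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (ARMv7 virtualization extensions HCR layout). */
#define HCR_VM        (UINT32_C(1) << 0)   /* enable stage-2 translation */
#define HCR_PTW       (UINT32_C(1) << 2)   /* protected table walk */
#define HCR_IMO       (UINT32_C(1) << 4)   /* route IRQs to Hyp mode */
#define HCR_AMO       (UINT32_C(1) << 5)   /* route async aborts to Hyp mode */
#define HCR_BSU_OUTER (UINT32_C(2) << 10)  /* barrier upgrade: outer shareable */
#define HCR_TWI       (UINT32_C(1) << 13)  /* trap WFI to Hyp mode */
#define HCR_TSC       (UINT32_C(1) << 19)  /* trap SMC to Hyp mode */

/* Hypothetical helper: builds the value written to the HCR,
 * with or without the WFI trap that the patch removes. */
static uint32_t demo_hcr_value(bool trap_wfi)
{
    uint32_t hcr = HCR_PTW | HCR_BSU_OUTER | HCR_AMO | HCR_IMO | HCR_VM | HCR_TSC;

    if (trap_wfi)
        hcr |= HCR_TWI;
    return hcr;
}
```

Under these assumptions the two values differ only in bit 13, so the patch
leaves HCR_VM (stage-2 translation) and the other traps untouched.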

Some thoughts:

 - Does dom0 run with Stage-2 translation? If so, you should be able
to disable caches both in Hyp mode and for dom0 by manipulating the
hyp registers, to rule out the caches as the cause. If Linux doesn't
boot under such a configuration, something else is completely broken,
as this must be transparent to your dom0.

 - Are you doing any swapping and/or page reclaiming? I wouldn't
assume so for dom0, but if you are, you need to maintain the icache
properly, since it can be aliasing, see
http://lxr.linux.no/linux+v3.9.4/arch/arm/kvm/mmu.c#L495 (I doubt this
is the case though)

 - All other caches should be coherent across cores and are physically
indexed/physically tagged, so I don't see how they could be your
issue.

 - Are you managing the VMID properly across physical CPU migration?
(ensure that dom0 always uses the same VMID regardless of the physical
CPU)

 - Do you always see the crash in user space or kernel space in dom0, or
is it all over the map?
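
To illustrate the VMID point, here is a minimal sketch in which the VTTBR
value is derived only from per-domain state, so it comes out the same on
whichever physical CPU the VCPU is scheduled. The struct and function names
are hypothetical (not Xen's), and I am assuming the VMID sits in VTTBR bits
[55:48].

```c
#include <stdint.h>

/* Assumption: the VMID occupies VTTBR bits [55:48]. */
#define DEMO_VTTBR_VMID_SHIFT 48
#define DEMO_VTTBR_VMID_MASK  (UINT64_C(0xff) << DEMO_VTTBR_VMID_SHIFT)

/* Hypothetical per-domain state; not Xen's actual structure. */
struct demo_domain {
    uint64_t p2m_root; /* physical address of the stage-2 table root */
    uint8_t  vmid;     /* allocated once per domain, never per CPU */
};

/* Value to load into VTTBR when switching to any VCPU of this domain.
 * Nothing here depends on the current physical CPU. */
static uint64_t demo_vttbr(const struct demo_domain *d)
{
    return (d->p2m_root & ~DEMO_VTTBR_VMID_MASK) |
           ((uint64_t)d->vmid << DEMO_VTTBR_VMID_SHIFT);
}

/* Extract the VMID field back out of a VTTBR value. */
static uint8_t demo_vttbr_vmid(uint64_t vttbr)
{
    return (uint8_t)((vttbr & DEMO_VTTBR_VMID_MASK) >> DEMO_VTTBR_VMID_SHIFT);
}
```

If the real code instead keeps the VMID (or the whole VTTBR value) in
per-CPU state, a migrated VCPU could end up running with TLB entries tagged
for a different VMID, which is the situation to check for.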

-Christoffer
