Re: [Xen-devel] [ARM] Bash often segfaults in Dom0 with the latest Xen
On Wed, 5 Jun 2013, Julien Grall wrote:
> On 06/05/2013 03:30 PM, Christoffer Dall wrote:
>
> > On 5 June 2013 04:48, Julien Grall <julien.grall@xxxxxxxxxx> wrote:
> >> On 06/05/2013 02:38 AM, Christoffer Dall wrote:
> >>
> >>> On 4 June 2013 15:45, Julien Grall <julien.grall@xxxxxxxxxx> wrote:
> >>>> Hi all,
> >>>>
> >>>> For a couple of weeks, I have been tracking an issue with Xen on
> >>>> ARM with no luck.
> >>>>
> >>>> I have run out of ideas, so I am sending this email to ask the
> >>>> community for advice.
> >>>>
> >>>> Most of the time bash will abort with a random error in dom0:
> >>>> - page fault (data and prefetch abort)
> >>>> - memory corruption (malloc corruption and invalid pointer)
> >>>>
> >>>> It's easy to reproduce by running ./configure in the xen tree.
> >>>>
> >>>> My environment is an arndale board:
> >>>> - linux linaro 13.05 (using arndale_xen_dom0_defconfig and
> >>>> exynos5250_arndale.dts)
> >>>> - opensuse 12.03 (http://en.opensuse.org/HCL:Arndale)
> >>>> - xen upstream
> >>>>
> >>>> The linux tree can be retrieved from
> >>>> git://xenbits.xen.org/people/julieng/linux-arm.git
> >>>> using the branch linaro-3.10.
> >>>> The previous branch is based on the linaro tree with some patches for
> >>>> the dts and xen.
> >>>>
> >>>> The issue also occurs on the versatile express, but it's harder to
> >>>> reproduce.
> >>>> Here the environment is:
> >>>> - linux linaro 13.05 (using vexpress_xen_dom0_defconfig and
> >>>> vexpress_v2p_ca15_a7.dtb)
> >>>> - ubuntu linaro 13.05
> >>>> - xen upstream
> >>>>
> >>>> I have tried different distributions and linux versions; the issue
> >>>> was the same.
> >>>> I did some testing to narrow down the bug and came to the following
> >>>> test case:
> >>>>
> >>>> Only dom0 is running and each VCPU is pinned to a specific cpu
> >>>> (vcpu0 -> cpu0 and vcpu1 -> cpu1).
> >>>>
> >>>> The patch below removes the WFI trap and consequently prevents a
> >>>> VCPU from moving to another physical CPU.
> >>>> =========================================
> >>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> >>>> index 6cfba1a..e89ca15 100644
> >>>> --- a/xen/arch/arm/traps.c
> >>>> +++ b/xen/arch/arm/traps.c
> >>>> @@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
> >>>>      WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
> >>>>
> >>>>      /* Setup hypervisor traps */
> >>>> -    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
> >>>> +    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TSC, HCR_EL2);
> >>>>      isb();
> >>>>  }
> >>>>
> >>>> =========================================
> >>>>
> >>>> If a bash process is assigned to a specific cpu with taskset, the
> >>>> process seems to always run without any issue.
> >>>>
> >>>> taskset -c 0 ./configure
> >>>>
> >>>> I guess it's a caching issue, but each time I've tried to play with
> >>>> the caching policy Linux was not booting.
> >>>>
> >>>> Thanks in advance for any advice.
> >>>
> >>> Some thoughts:
> >>>
> >>> - Does dom0 run with Stage-2 translation? If so, you should be able
> >>> to disable caches in both Hyp mode and for dom0 by manipulating the
> >>> hyp registers to try and exclude caches. If Linux doesn't boot under
> >>> such configuration, something else is completely broken, as it must be
> >>> transparent to your dom0.
> >>>
> >>> - Are you doing any swapping and/or page reclaiming? I wouldn't
> >>> assume so for dom0, but if you are, you need to maintain the icache
> >>> properly, since it can be aliasing, see
> >>> http://lxr.linux.no/linux+v3.9.4/arch/arm/kvm/mmu.c#L495 (I doubt this
> >>> is the case though)
> >>>
> >>> - All other cache accesses should be coherent across cores and are
> >>> physically indexed/physically tagged so I don't see how this could be
> >>> your issue.
> >>
> >> It was only an idea because I have noticed the memory was often corrupted.
> >>
> >>> - Do you always see the crash in user space or kernel space in dom0 or
> >>> is it all over the map?
> >>
> >>
> >> Only in user space in dom0.
> >>
> > Hmm, which kernel version is dom0 based on? Can you bisect the dom0
> > source to make sure it's not something introduced during development?
>
> I'm using Linaro's branch ll_20130528.0; I have only a few patches for
> the dts and some patches that are not yet in the linaro tree.
>
> I have the same issue with linux 3.9-rc4 with multiple CPUs and I can't
> really go back further without carrying many xen patches to try it.
>
> I have tried different configurations of the number of CPUs in Xen
> (pCPU) and linux (vCPU):
> - 2 pCPU 2 vCPU : segfaulting
> - 2 pCPU 1 vCPU : working
> - 1 pCPU 1 vCPU : working
> - 1 pCPU 2 vCPU : very slow but working

If you put it like that, it would seem to me that the most likely
candidate would be a bug in SMP support in Xen.
What happens if you have 2 pCPUs and 1 vCPU, but you keep moving the vCPU
between the two pCPUs?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
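
The experiment suggested in the reply above (2 pCPUs, 1 vCPU, with the
vCPU repeatedly moved between the two pCPUs) could be scripted roughly as
below. This is only a sketch under assumptions that are not stated in the
thread: that the xl toolstack is usable in dom0, that the domain is named
Domain-0, and that the pCPUs are numbered 0 and 1. The names DOMAIN, VCPU,
ITERATIONS, DRY_RUN and bounce_vcpu are hypothetical. By default the
script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: alternate the pinning of one vCPU between pCPU 0 and pCPU 1.
# Assumptions (not from the thread): xl is available, the domain is called
# Domain-0, and physical CPUs are numbered 0 and 1.
# With DRY_RUN=1 (the default) the commands are printed, not executed.

DOMAIN=${DOMAIN:-Domain-0}
VCPU=${VCPU:-0}
ITERATIONS=${ITERATIONS:-10}
DRY_RUN=${DRY_RUN:-1}

bounce_vcpu() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        pcpu=$((i % 2))      # even iterations use pCPU 0, odd ones pCPU 1
        if [ "$DRY_RUN" = "1" ]; then
            echo "xl vcpu-pin $DOMAIN $VCPU $pcpu"
        else
            xl vcpu-pin "$DOMAIN" "$VCPU" "$pcpu"
            sleep 1          # let the workload run on this pCPU for a while
        fi
        i=$((i + 1))
    done
}

bounce_vcpu
```

Running it with DRY_RUN=0 in one terminal while ./configure runs in
another would show whether migrating a single vCPU is enough to trigger
the crashes. Restricting dom0 to one vCPU is assumed to be done
separately, for example with the dom0_max_vcpus=1 Xen boot option.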