Re: [Xen-users] Debugging DomU




On 28/05/2015 02:02, Chris (Christopher) Brand wrote:
> Hi Julien,

Hi Chris,
 
>> Hmmm... I remember a similar issue on Xen which I thought was fixed in 3.13.
> 
> I hunted around quite a bit, and didn't find anything. Nothing leaps out in 
> the list of upstream kernel patches to mmu.c (there's a migration from 
> meminfo to memblock, which I tried backporting with no effect on behaviour). 
> In most of the reports of similar panics that I found, the recommendation 
> was to ensure that u-boot was disabling the L2 cache before jumping to the 
> kernel, which is presumably not helpful here.

Even though the bug showed up in mmu.c, it was caused by a
miscalculation in kernel/head.S.

I remember a thread explaining the problem, but I can't find
it anymore :/.

>> There was an issue with CONFIG_ARM_PATCH_PHYS_VIRT, which is required in 
>> order to boot Linux anywhere in memory. The final result was 
>> miscomputed depending on the CONFIG_VMSPLIT_* setting. Can you try a 
>> different one? FWIW, I'm using CONFIG_VMSPLIT_3G.
> 
> The result I reported was with CONFIG_VMSPLIT_3G. With _1G and _2G, I don't 
> see that same panic, and I see successful CMA memory allocation. I don't see 
> any more boot messages after that, though, and xenctx reports a PC of 
> 0xffff000c. Hard to say whether that's better or worse :-/
> 
> Throwing some printk() calls into sanity_check_meminfo() shows that it 
> decides that all the memory is highmem, and so passes 0 to 
> memblock_set_current_limit(). That then seems to lead to the failure to find 
> suitable blocks of memory to allocate, and hence the panic.

That's exactly the problem I had with some CONFIG_VMSPLIT_* settings.
It was related to Linux computing a wrong offset between virtual and
physical addresses.
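
To make the failure mode concrete, here is a minimal sketch (not the
kernel's actual code) of how a wrong virtual-to-physical offset makes a
memory bank look like highmem. The function names, the 0xEF800000
vmalloc boundary and the bank sizes are assumptions for illustration;
only the 0x80000000 start address and CONFIG_VMSPLIT_3G come from this
thread.

```c
#include <stdint.h>

#define PAGE_OFFSET  0xC0000000ULL  /* CONFIG_VMSPLIT_3G */
#define VMALLOC_MIN  0xEF800000ULL  /* assumed start of the vmalloc area */

/* Linear-map translation: phys = virt - PAGE_OFFSET + phys_offset. */
static uint64_t virt_to_phys(uint64_t virt, uint64_t phys_offset)
{
    return virt - PAGE_OFFSET + phys_offset;
}

/*
 * Hypothetical helper: returns the lowmem limit for one bank, or 0 if
 * the whole bank sits above the directly mappable range (all highmem).
 */
static uint64_t lowmem_limit(uint64_t bank_start, uint64_t bank_size,
                             uint64_t phys_offset)
{
    uint64_t vmalloc_limit = virt_to_phys(VMALLOC_MIN, phys_offset);
    uint64_t bank_end = bank_start + bank_size;

    if (bank_start >= vmalloc_limit)
        return 0;                   /* nothing can be mapped as lowmem */
    return bank_end < vmalloc_limit ? bank_end : vmalloc_limit;
}
```

With a bank at 0x80000000 and the matching offset 0x80000000, the
boundary translates to 0xAF800000 and the bank is lowmem. If the offset
is miscomputed as 0, the boundary becomes 0x2F800000, the bank start is
above it, and the limit ends up as 0, which matches the value you saw
passed to memblock_set_current_limit().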
 
> As an experiment, I tried changing the start of memory in the DTS from 
> 0x80000000. With that change, I can get the same result with 
> CONFIG_VMSPLIT_3G as I got with the other configs above (PC=0xffff000c). That 
> seems to indicate that this is the problem you recalled, but that there's yet 
> another problem I'm hitting afterwards. I *think* I saw it go from 
> __arm_ioremap_pfn() into do_DataAbort(), but I'm far from certain.

How did you choose 0x80000000?

As the guest shouldn't have many devices (only the GIC and the timer),
I think that __arm_ioremap_pfn shouldn't be called often.

In a previous mail you said that you are using a custom kernel
based on 3.14, right? I'm wondering if the kernel is trying to map
a device that it should not.

Can you try applying the patch below to Xen? It will print any guest
data abort not handled by Xen before injecting it into the guest.

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 47d6cef..8b33b3e 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2413,6 +2413,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
     }

 bad_data_abort:
+    gdprintk(XENLOG_DEBUG, "HSR=0x%x pc=%#"PRIregister" gva=%#"PRIvaddr"\n",
+             hsr.bits, regs->pc, info.gva);
     inject_dabt_exception(regs, info.gva, hsr.len);
 }
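
To read the HSR value that this debug line prints, a small decoder
along these lines may help. It is only a sketch based on the ARMv7
data-abort syndrome layout (EC in bits 31:26, ISV in bit 24, S1PTW in
bit 7, WnR in bit 6, fault status in bits 5:0); the struct and function
names are made up for illustration and are not part of Xen.

```c
#include <stdint.h>

/* Hypothetical container for the fields of interest in a data abort HSR. */
struct dabt_fields {
    unsigned ec;     /* HSR[31:26]: exception class, 0x24 = guest data abort */
    unsigned isv;    /* HSR[24]: 1 if the instruction syndrome is valid */
    unsigned s1ptw;  /* HSR[7]: fault on a stage 1 page table walk */
    unsigned wnr;    /* HSR[6]: 1 = write, 0 = read */
    unsigned dfsc;   /* HSR[5:0]: data fault status code */
};

static struct dabt_fields decode_hsr(uint32_t hsr)
{
    struct dabt_fields d;

    d.ec    = (hsr >> 26) & 0x3f;
    d.isv   = (hsr >> 24) & 0x1;
    d.s1ptw = (hsr >> 7) & 0x1;
    d.wnr   = (hsr >> 6) & 0x1;
    d.dfsc  = hsr & 0x3f;
    return d;
}
```

For example, a made-up value of 0x92000046 (not taken from this thread)
decodes to EC=0x24, a write access, and DFSC=0x06, i.e. a level 2
translation fault. Together with the printed gva, that should tell us
which address the guest touched and whether it was a read or a write.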

Regards,

-- 
Julien Grall
