
Re: repeated Kernel oops need help to debug



Hi Jürgen,

I think commit 021a17ed796b is the main cause of the issue, as you explained.
I tried to backport the fixes from 5.4 and 5.5 to 4.19, but that was beyond my level: the code changed between those versions and I can no longer clearly see where to apply the fixes.

So, as a workaround, I reverted 021a17ed796b in 4.19 and compiled a new kernel.
The new kernel is much more stable. The issue still occurs, but I would say at roughly 10% of the frequency it had before reverting 021a17ed796b.

Maybe if someone can backport the fixes from 5.4 and 5.5 to 4.19, it will fix the issue completely. (I still get the oops, but less often than before.)
 

Thanks

On Thu, Aug 6, 2020 at 11:22 AM Jürgen Groß <jgross@xxxxxxxx> wrote:
On 06.08.20 17:16, moftah moftah wrote:
> This is a bit weird.
> I decided to debug the issue before patching the kernel any further,
> so I went to one server and changed the qdisc on all network interfaces
> (not guest interfaces) to fifo instead of pfifo_fast,
> and to my shock I got the same panic in pfifo_fast_dequeue!
>
> Another thing I noticed: whenever the issue occurs on any server, I
> check the stack of the CPU that had the panic,
> and there are always 2 stacks, one for dom0 and the other for the hypervisor.
>
> The hypervisor stack always has this on the panicking CPU:
>    ffff832027be7d20: ffff82d08021831d kexec_crash+0x4d/0x50
>    ffff832027be7d28: 00000000fffffffe
>    ffff832027be7d30: ffff82d080218c9d do_kexec_op_internal+0x44d/0x710
>    ffff832027be7d38: 0000000000040660
>    ffff832027be7d40: 0000000000000000
>    ffff832027be7d48: 0000000000000000
>    ffff832027be7d50: 000000000000000c
>    ffff832027be7d58: 000000000000000c
>    ffff832027be7d60: ffff83202780f000
>    ffff832027be7d68: ffff832027be7da8 .+64
>    ffff832027be7d70: 000000000000000c
>    ffff832027be7d78: ffff82d080266750 vga_noop_puts+0/0x10
>    ffff832027be7d80: ffff82d080249f3a do_console_io+0x41a/0x460
>    ffff832027be7d88: ffffc90000000001
>    ffff832027be7d90: ffff83202780fa24
>    ffff832027be7d98: 000000000000e033
>
> Could the issue be on the hypervisor side, and the dom0 kernel panic
> message is just misleading?

No, a dom0 panic will end up in the hypervisor which will then try to
trigger kexec for taking a dump (if configured).
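
When reading a dump like the one quoted above, it can help to list only the stack words that resolve to a symbol. The following is a minimal Python sketch, assuming the `slot: value symbol+offset/size` line layout seen in the quote. The helper name `symbolized_frames` and the sample text are illustrative and not part of any Xen tool. These are simply stack words that resolve to symbols, not necessarily a clean call chain.

```python
import re

# "slot: value [annotation]", optionally prefixed by a mail quote marker ">".
SLOT = re.compile(r"^\s*>?\s*([0-9a-f]{16}):\s+([0-9a-f]{16})\s*(.*)$")
# Annotation of the form "symbol+offset/size", e.g. "kexec_crash+0x4d/0x50".
SYMBOL = re.compile(r"^([A-Za-z_]\w*)\+(0x[0-9a-f]+|\d+)/(0x[0-9a-f]+)$")


def symbolized_frames(dump):
    """Return (symbol, offset) pairs for stack words that carry a symbol."""
    frames = []
    for line in dump.splitlines():
        slot = SLOT.match(line)
        if not slot:
            continue  # not a stack line at all
        sym = SYMBOL.match(slot.group(3).strip())
        if sym:
            frames.append((sym.group(1), sym.group(2)))
    return frames


# A few lines copied from the quoted hypervisor stack.
SAMPLE = """\
>    ffff832027be7d20: ffff82d08021831d kexec_crash+0x4d/0x50
>    ffff832027be7d28: 00000000fffffffe
>    ffff832027be7d30: ffff82d080218c9d do_kexec_op_internal+0x44d/0x710
>    ffff832027be7d68: ffff832027be7da8 .+64
>    ffff832027be7d78: ffff82d080266750 vga_noop_puts+0/0x10
>    ffff832027be7d80: ffff82d080249f3a do_console_io+0x41a/0x460
"""

for name, offset in symbolized_frames(SAMPLE):
    print(name, offset)
```

On the sample, the first two symbols are `kexec_crash` and `do_kexec_op_internal`, which fits the explanation that the hypervisor is only reacting to the dom0 panic by starting the crash kexec.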


Juergen

 

