
Re: [Xen-devel] Dom0 kernel 4.14 with SMP randomly crashing





On Mon, Nov 5, 2018 at 6:32 PM Rishi <2rushikeshj@xxxxxxxxx> wrote:


On Mon, Nov 5, 2018 at 6:29 PM Rishi <2rushikeshj@xxxxxxxxx> wrote:
Yes, I'm extracting patches from the 4.4 queue and do have a working 4.9 kernel along with blktap. I've tested networking and disk I/O on it.

There are roughly 415 patches against 4.4, of which ~210+ are already applied in 4.9 and ~220+ are already applied in 4.14. I don't have numbers for 4.19 yet.

Essentially I'm down to a single-digit number of patches at the moment for a working kernel 4.9 setup. I know there will be mishaps since I'm not applying all the patches, but my experiment is to see how close we can stay to the mainline kernel, and which of the patches kernel.org could accept.



On Mon, Nov 5, 2018 at 6:19 PM Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
I forgot to say: please don't top-post.

On Mon, Nov 05, 2018 at 06:00:10PM +0530, Rishi wrote:
> I'm using a XenServer host with XCP-NG running on it as an HVM guest. I
> used xencons=tty console=ttyS0 on the XCP-NG dom0 kernel line to obtain
> a serial console.
> I'm working on building a more recent dom0 kernel for improved support
> of Ceph in XenServer/XCP-NG.

This is an interesting setup. I don't think you can expect to just drop
a new kernel into XenServer/XCP-NG and have it work flawlessly. What
did you do with the patch queue XenServer carries for 4.4?

Also, have you got a working baseline? I.e. did the stock 4.4 kernel
work?

Wei.

>
>
>
> On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
>
> > On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
> > > Yes, I'm running it in an HVM domU for development purposes.
> >
> > What is your exact setup?
> >
> > Wei.
> >
> > >
> > > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> > >
> > > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> > > > > Alright, I got the serial console and following is the crash log.
> > > > > Thank you for pointing that out.
> > > > >
> > > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ksoftirqd/2:22]
> > > > > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> > > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G             L    4.19.1 #1
> > > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
> > > >
> > > > Is this serial log from the host? It says it is running as an HVM domU.
> > > > Maybe you have mistaken the guest serial log for the host serial log?
> > > >
> > > > This indicates your machine runs XenServer, which has its own patch
> > > > queues on top of upstream Xen. You may also want to report to the
> > > > xs-devel mailing list.
> > > >
> > > > Wei.
> > > >
> >


Sorry, I'll take care not to top-post from now on.

So with the stack trace in hand, it appears that the CPU was getting stuck in xen_hypercall_xen_version:

[30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
[30569.588186] Kernel panic - not syncing: softlockup: hung tasks
[30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L    4.19.1 #1
[30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
[30569.598356] Call Trace:
[30569.599597]  <IRQ>
[30569.600920]  dump_stack+0x5a/0x73
[30569.602998]  panic+0xe8/0x249
[30569.604806]  watchdog_timer_fn+0x200/0x230
[30569.607029]  ? softlockup_fn+0x40/0x40
[30569.609246]  __hrtimer_run_queues+0x133/0x270
[30569.611712]  hrtimer_interrupt+0xfb/0x260
[30569.613800]  xen_timer_interrupt+0x1b/0x30
[30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
[30569.619831]  handle_irq_event_percpu+0x30/0x70
[30569.622382]  handle_percpu_irq+0x34/0x50
[30569.625048]  generic_handle_irq+0x1e/0x30
[30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
[30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
[30569.632612]  xen_evtchn_do_upcall+0x27/0x50
[30569.635136]  xen_do_hypervisor_callback+0x29/0x40
[30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
[30569.641302] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[30569.651998] RSP: e02b:ffff8800b6203e10 EFLAGS: 00000246
[30569.655077] RAX: 0000000000040007 RBX: ffff8800ae41a898 RCX: ffffffff8100122a
[30569.659226] RDX: ffffc900400080ff RSI: 0000000000000000 RDI: 0000000000000000
[30569.663480] RBP: ffff8800ae41a890 R08: 0000000000000000 R09: 0000000000000000
[30569.667943] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000080000600
[30569.672057] R13: 000000000000001d R14: 00000000000001d0 R15: 000000000000001d
[30569.675911]  ? xen_hypercall_xen_version+0xa/0x20
[30569.678470]  ? xen_force_evtchn_callback+0x9/0x10
[30569.681495]  ? check_events+0x12/0x20
[30569.683738]  ? xen_restore_fl_direct+0x1f/0x20
[30569.686632]  ? _raw_spin_unlock_irqrestore+0x14/0x20
[30569.689166]  ? cp_rx_poll+0x427/0x4d0 [8139cp]
[30569.691519]  ? net_rx_action+0x171/0x3a0
[30569.694219]  ? __do_softirq+0x11e/0x295
[30569.696442]  ? irq_exit+0x62/0xb0
[30569.698251]  ? xen_evtchn_do_upcall+0x2c/0x50
[30569.701037]  ? xen_do_hypervisor_callback+0x29/0x40
[30569.704439]  </IRQ>
[30569.705731]  ? xen_hypercall_sched_op+0xa/0x20
[30569.708766]  ? xen_hypercall_sched_op+0xa/0x20
[30569.711344]  ? xen_safe_halt+0xc/0x20
[30569.713353]  ? default_idle+0x80/0x140
[30569.715345]  ? do_idle+0x13a/0x250
[30569.717216]  ? cpu_startup_entry+0x6f/0x80
[30569.719511]  ? start_kernel+0x4f6/0x516
[30569.721681]  ? set_init_arg+0x57/0x57
[30569.723985]  ? xen_start_kernel+0x575/0x57f
[30569.726453] Kernel Offset: disabled



So I wrote a kernel module to call xen_hypercall_xen_version directly; it ran successfully and returned the version.
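
For reference, the test boils down to something like the following (a minimal sketch, assuming an x86 kernel built with CONFIG_XEN; the xenver_test_* names are made up for illustration, not my exact code):

/*
 * Sketch of a module that issues the same xen_version hypercall the
 * stuck vCPU was sitting in above. Illustrative only.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <xen/xen.h>                /* xen_domain() */
#include <xen/interface/version.h>  /* XENVER_version */
#include <asm/xen/hypercall.h>      /* HYPERVISOR_xen_version() */

static int __init xenver_test_init(void)
{
        int ver;

        /* Bail out if we are not running as a Xen guest. */
        if (!xen_domain())
                return -ENODEV;

        /* XENVER_version returns (major << 16) | minor. */
        ver = HYPERVISOR_xen_version(XENVER_version, NULL);
        pr_info("xenver_test: Xen version %d.%d\n", ver >> 16, ver & 0xffff);

        return 0;
}

static void __exit xenver_test_exit(void)
{
}

module_init(xenver_test_init);
module_exit(xenver_test_exit);
MODULE_LICENSE("GPL");

insmod'ing the module prints the hypervisor version to dmesg, i.e. the hypercall itself completes normally when issued from process context.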


What else should I be checking for?

 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

