
Re: repeated Kernel oops need help to debug



On 26.07.20 17:47, moftah moftah wrote:
Hi All,
We have had an ongoing problem for more than a month.

We have several servers running xcp-ng, and we are facing kernel oopses that crash the servers.

My skills are not enough to debug the issue, so I need someone to point me in the right direction.
The issue is not hardware related:
it has occurred on servers with different processors, NICs, and even kernel versions (all under 4.19).

The stack trace looks like this:

[2399526.430672]  ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[2399526.430695]   INFO: PGD 447268067 P4D 447268067 PUD 44775f067 PMD 0
[2399526.430710]   WARN: Oops: 0000 [#1] SMP NOPTI
[2399526.430720]   WARN: CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.19.108 #1
[2399526.430728]   WARN: Hardware name: HP ProLiant SL230s Gen8   /, BIOS P75 05/24/2019
[2399526.430745]   WARN: RIP: e030:pfifo_fast_dequeue+0xc9/0x140
[2399526.430753]   WARN: Code: 50 28 48 8b 4f 58 f7 da 65 01 51 04 48 8b 57 50 65 48 03 15 11 64 99 7e 8b 88 cc 00 00 00 be 01 00 00 00 48 03 88 d0 00 00 00 <66> 83 79 04 00 74 04 0f b7 71 06 8b 48 28 01 72 08 48 01 0a f0 ff
[2399526.430773]   WARN: RSP: e02b:ffffc900400c3de0 EFLAGS: 00010246
[2399526.430780]   WARN: RAX: ffff88842087b900 RBX: 0000000000000001 RCX: 0000000000000000
[2399526.430789]   WARN: RDX: ffffe8fffee60a1c RSI: 0000000000000001 RDI: ffff8883de0b9c00
[2399526.430801]   WARN: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[2399526.430811]   WARN: R10: 0000000000000000 R11: ffff8883de0b9d40 R12: 0000000000000001
[2399526.430823]   WARN: R13: ffff8883db210a00 R14: 0000000000000002 R15: ffff8883de0b9c00
[2399526.430852]   WARN: FS:  00007ffac43fe700(0000) GS:ffff888451240000(0000) knlGS:0000000000000000
[2399526.430868]   WARN: CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[2399526.430879]   WARN: CR2: 0000000000000004 CR3: 000000044ad58000 CR4: 0000000000040660
[2399526.430899]   WARN: Call Trace:
[2399526.430914]   WARN:  __qdisc_run+0xa2/0x4f0
[2399526.430928]   WARN:  ? __switch_to_asm+0x41/0x70
[2399526.430940]   WARN:  net_tx_action+0x148/0x230
[2399526.430949]   WARN:  __do_softirq+0xd1/0x28c
[2399526.430966]   WARN:  run_ksoftirqd+0x26/0x40
[2399526.430980]   WARN:  smpboot_thread_fn+0x10e/0x160
[2399526.430993]   WARN:  kthread+0xf8/0x130
[2399526.431004]   WARN:  ? sort_range+0x20/0x20
[2399526.431010]   WARN:  ? kthread_bind+0x10/0x10
[2399526.431017]   WARN:  ret_from_fork+0x35/0x40
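
When the same oops shows up on several hosts, it can help to reduce each trace to the faulting function, the fault address, and the reliable call-trace frames, so that reports are easy to compare. The following is only a minimal sketch: the function name `summarize_oops` and the regular expressions are illustrative assumptions that fit the log layout shown above and may need adjusting for other formats.

```python
import re

# A few lines copied from the trace above; a real script would read the full log.
OOPS = """\
[2399526.430745]   WARN: RIP: e030:pfifo_fast_dequeue+0xc9/0x140
[2399526.430879]   WARN: CR2: 0000000000000004 CR3: 000000044ad58000 CR4: 0000000000040660
[2399526.430899]   WARN: Call Trace:
[2399526.430914]   WARN:  __qdisc_run+0xa2/0x4f0
[2399526.430928]   WARN:  ? __switch_to_asm+0x41/0x70
[2399526.430940]   WARN:  net_tx_action+0x148/0x230
"""

RIP_RE = re.compile(r"RIP: [0-9a-f]+:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
CR2_RE = re.compile(r"CR2: ([0-9a-f]+)")
FRAME_RE = re.compile(r"WARN:\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def summarize_oops(text):
    """Extract faulting function, offset, fault address, and call-trace frames."""
    rip = RIP_RE.search(text)
    cr2 = CR2_RE.search(text)
    frames = []
    in_trace = False
    for line in text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.search(line)
            # Entries prefixed with "?" are unreliable stack leftovers; skip them.
            if m and not m.group(1):
                frames.append(m.group(2))
    return {
        "function": rip.group(1) if rip else None,
        "offset": int(rip.group(2), 16) if rip else None,
        "fault_address": int(cr2.group(1), 16) if cr2 else None,
        "frames": frames,
    }


summary = summarize_oops(OOPS)
print(summary)
```

For the excerpt above, the fault address in CR2 is 0x4 and RCX is zero in the register dump, which is consistent with a small-offset access through a NULL pointer inside `pfifo_fast_dequeue`.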

I wonder whether you are missing the fixes for commit 021a17ed796b,
which went into kernel 4.18. It needs the following fixes on top:

d518d2ed8640 (went into 5.4), 90b2be27bb0e (went into 5.5).
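
One possible way to check for these fixes, assuming the kernel source is available as a local git tree, is to search the commit log for the upstream hashes. This is a hedged sketch, not a definitive method: the helper names `referenced` and `git_log` are made up for illustration, the repository path and revision in the usage note are placeholders, and the approach relies on backported commits quoting the upstream hash in their message, which a vendor kernel built from a patch queue may not do.

```python
import subprocess

# Short hashes named in the reply: the original change and its two follow-up fixes.
BASE = "021a17ed796b"
FIXES = ["d518d2ed8640", "90b2be27bb0e"]


def referenced(log_text, short_hashes):
    """Map each short hash to True if it appears anywhere in log_text.

    Printing the full commit hash (%H) next to the message body means a
    direct inclusion and a backport that quotes the upstream hash both match.
    """
    text = log_text.lower()
    return {h: h.lower() in text for h in short_hashes}


def git_log(repo, rev="HEAD"):
    """Return full hashes and commit messages reachable from rev (a revision or range)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H%n%B", rev],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A call such as `referenced(git_log("/path/to/kernel-tree", "v4.18..HEAD"), [BASE] + FIXES)` (path and range are hypothetical) would report which of the three hashes are mentioned. A `False` result only means the hash was not found in the log text, so a fix applied under a different commit message would still need to be checked by reading the source.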

From the backtrace I really doubt this is a Xen problem, BTW. Maybe
running under Xen makes the problem more likely due to different
timing.


Juergen



 

