Hi All,
I have had several kernel oopses in the past week and am trying to find out what the likely cause is and, more importantly, how to resolve it.
I have searched Google, but not found anything useful yet. Most results describe issues during boot (not the case here, as the server runs successfully for nearly a week) or relate to 2.6 kernel versions.
I have attached the dmesg output from the first time I noticed the oops (actually, several in a row).
I was unable to capture last night's output, as the server was frozen by the time I got to it, so I cannot fully confirm that the pattern was the same. The last message I could still read was nearly identical to the ones I saw the first time.
I also attached a normal dmesg output, taken after boot, once all VMs had finished starting.
I noticed the following section which looks interesting:
===
[317321.524229] BUG: unable to handle page fault for address: ffff888510ebd0e0
[317321.524307] #PF: supervisor write access in kernel mode
[317321.524368] #PF: error_code(0x0003) - permissions violation
===
However, I have no idea whether this is a cause or a consequence of the earlier trace messages in the output.
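For reference, here is how I understand the error_code in that message. This is a minimal Python sketch, assuming the standard x86 page-fault error code bit layout; the function name decode_pf_error_code is just something I made up for illustration, and only the lowest five bits are covered.

```python
# Partial table of x86 page-fault error code bits.
# Each entry: (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel (supervisor) mode"),
    (0x08, "reserved bit set in page table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable meanings for a page-fault error code."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


# The value from the dmesg excerpt above.
for line in decode_pf_error_code(0x0003):
    print(line)
```

If I read it correctly, 0x0003 means the kernel tried to write to a page that is mapped but not writable, which matches the "supervisor write access" and "permissions violation" lines. It does not tell me why the page was read-only.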
I found that a new BIOS and firmware version is available for the mainboard, which I plan to apply this week.
The kernel is "tainted" because of the use of ZFS. No other out-of-tree modules are installed.
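The taint state can be read as a bitmask from /proc/sys/kernel/tainted. Below is a minimal Python sketch for decoding a few of the bits. The table is deliberately partial, the function name decode_taint is hypothetical, and the sample values are illustrative assumptions rather than values read from my server.

```python
# Partial table of kernel taint bits (bit number -> (letter, meaning)).
# Only a few bits are listed here; the kernel defines more.
TAINT_BITS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module was loaded"),
}


def decode_taint(value):
    """Return the (letter, meaning) pairs for the known bits set in value."""
    return [TAINT_BITS[bit] for bit in sorted(TAINT_BITS) if value & (1 << bit)]


# Illustrative values only: bits 0 and 12, then the same with bit 7 added.
for sample in (4097, 4225):
    letters = "".join(letter for letter, _ in decode_taint(sample))
    print(sample, letters)
```

On a live system the value could be obtained with something like int(open("/proc/sys/kernel/tainted").read()) and passed to the function.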
My distro: Gentoo
Kernel version: 5.4.38
ZFS version: 0.8.3
Xen version: 4.12.2
If more info is needed to analyse this, please let me know.
Additionally, if anyone has/knows good resources (online preferred, but hardcopy will be fine as well) I can use to analyse/understand these kernel messages I would definitely appreciate it.
Many thanks in advance,
Joost Roeleveld