
[Xen-users] Segfaults on a migrated guest



Hi *,

I run a few Xen hosts (Supermicro X8D*, i.e. Intel Tylersburg) with
a few guests. From time to time strange things happen; this one
seems easy to describe:

I migrated a PV Linux (Slackware 14.1, kernel 4.12.2) guest from one
host (Xen 4.8.1, kernel 4.11.0) to another (Xen 4.9.0, kernel 4.12.4).
Shortly after migration some daemons on the guest machine crashed; this
is what dmesg shows:

[18440203865.237646] Suspended for 2.605 seconds
[18440203865.267996] PM: noirq restore of devices complete after 0.212 msecs
[18440203865.268170] PM: early restore of devices complete after 0.115 msecs
[18440203865.282097] PM: restore of devices complete after 12.602 msecs
[18440203865.282167] OOM killer enabled.
[18440203865.282168] Restarting tasks ... done.
[18440203865.283838] xen:manage: Unable to read sysrq code in control/sysrq
[18440203865.379604] dbus-daemon[1233]: segfault at 0 ip (null) sp 00007ffce31aaf10 error 14 in dbus-daemon[400000+61000]
[18440203865.381385] ntpd[15191]: segfault at 8 ip 00007f81f24c3dc9 sp 00007ffdee851c90 error 4 in ld-2.17.so[7f81f24b5000+23000]
[18440204017.834883] bash[6056]: segfault at 0 ip 00007fe5bd185c2d sp 00007fff675a6b78 error 4 in libc-2.17.so[7fe5bd0f9000+1bf000]
[18440204017.865750] sshd[16597]: segfault at 7fa09372afa8 ip 00007fa093517429 sp 00007ffc4b605838 error 7 in ld-2.17.so[7fa093507000+23000]
[18440204228.000316] automount[1199]: segfault at 8 ip 00007f46a9a03153 sp 00007f46a975f990 error 4 in libc-2.17.so[7f46a9983000+1bf000]
[18440204729.291952] fail2ban-server[4209]: segfault at 0 ip 00007ff8339f7c2c sp 00007ff82f7861c0 error 4 in libpython2.7.so.1.0[7ff833955000+1bf000]
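
For reference, a minimal sketch of how the segfault lines above can be
decoded. It assumes the usual x86 page-fault error-code bit layout and
that the kernel prints the code in hex; the helper names (decode, parse)
are made up for illustration and are not part of any tool.

```python
import re

# x86 page-fault error-code bits (assumed standard layout):
# (mask, meaning when set, meaning when clear)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
]
PF_RSVD = 0x08   # reserved bit set in a page-table entry
PF_INSTR = 0x10  # fault happened on an instruction fetch

# Matches the "comm[pid]: segfault at ADDR ip IP sp SP error CODE" part.
LINE_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\(null\)|[0-9a-f]+) sp [0-9a-f]+ error (?P<err>[0-9a-f]+)"
)

def decode(err):
    """Turn a page-fault error code into a readable description."""
    parts = [on if err & bit else off for bit, on, off in PF_BITS]
    if err & PF_RSVD:
        parts.append("reserved bit set")
    if err & PF_INSTR:
        parts.append("instruction fetch")
    return ", ".join(parts)

def parse(line):
    """Return (command, faulting address, decoded error) or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Address and error code are both printed in hex.
    return m["comm"], int(m["addr"], 16), decode(int(m["err"], 16))

sample = ("[18440203865.379604] dbus-daemon[1233]: segfault at 0 ip (null) "
          "sp 00007ffce31aaf10 error 14 in dbus-daemon[400000+61000]")
print(parse(sample))
```

Read this way, most of the faults are user-mode accesses to addresses at
or near zero, while the sshd one (error 7) is a write to a mapped page.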

What seems suspicious to me are the timestamps: I'm quite sure that none
of the machines has been up for more than 500 years.
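
A quick arithmetic check of those timestamps, as a sketch only. The idea
that the clock is an unsigned 64-bit nanosecond counter that went
slightly negative is an assumption, not something the logs confirm.

```python
NS_PER_SEC = 10**9
SECONDS_PER_DAY = 86400
WRAP_NS = 2**64  # assumption: unsigned 64-bit nanosecond counter

def stamp_to_ns(stamp):
    """Convert a dmesg 'seconds.fraction' stamp to integer nanoseconds."""
    secs, _, frac = stamp.partition(".")
    return int(secs) * NS_PER_SEC + int(frac.ljust(9, "0")[:9])

def describe(stamp):
    """Return (apparent uptime in years, days below the 2**64 ns wrap)."""
    ns = stamp_to_ns(stamp)
    years = ns / NS_PER_SEC / (365.25 * SECONDS_PER_DAY)
    days_below_wrap = (WRAP_NS - ns) / NS_PER_SEC / SECONDS_PER_DAY
    return years, days_below_wrap

# "Suspended for ..." stamps copied from the two guests' dmesg output.
for stamp in ("18440203865.237646", "18439429870.442886"):
    years, days = describe(stamp)
    print(f"{stamp}: {years:.1f} years, {days:.1f} days below 2**64 ns")
```

Both stamps come out at roughly 584 years, and both sit a few months
(about 76 and 85 days) below 2**64 ns, which would be consistent with a
wrapped negative clock offset after the migration.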

The only other thing I found is that xl dmesg is full of
"(XEN) tmem: operation requested on uncreated pool"

A different guest (kernel 4.12.3) seems fine so far; dmesg says

[18439429870.442886] Suspended for 2.513 seconds
[18439429870.443112] PM: noirq restore of devices complete after 0.157 msecs
[18439429870.443249] PM: early restore of devices complete after 0.116 msecs
[18439429870.464423] PM: restore of devices complete after 19.453 msecs
[18439429870.464498] OOM killer enabled.
[18439429870.464498] Restarting tasks ... done.
[18439429870.466351] xen:manage: Unable to read sysrq code in control/sysrq

Respective configurations are at http://camelot.lf2.cuni.cz/vejvalka/temp/reports/20170803/ .

Where else should I look, and what else should I provide so that the
case is worth looking into?

Thanks,

Jan
