
Re: [Xen-users] Dom0 crashes without logging lately on Debian Stretch with Xen 4.8


  • To: "xen-users@xxxxxxxxxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxxxxxxxxx>
  • From: Volker Janzen <volker@xxxxxxxxxx>
  • Date: Fri, 2 Nov 2018 18:53:38 +0000
  • Delivery-date: Fri, 02 Nov 2018 18:54:58 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>

Hi John,

The problem is that I cannot provide any metrics or log files that show an error. All I can say is that dom0 reboots for a reason that is not logged. I have no physical access to the server. I have received one other report of this kind of issue.

My assumption that the backported patches are the cause is based on the current uptime of 16 days. Before that, the server rebooted every 3-5 days. From my point of view, that would not make a useful bug report.

The other thing is that both of my servers now run upstream Xen and an upstream kernel, and I may not go back to the older versions in Debian stretch. One of the servers had always run upstream versions and never had a problem, which is why I updated the other one, too.


Best regards
    Volker


On 02.11.2018 at 17:23, John Naggets <hostingnuggets@xxxxxxxxx> wrote:

I was wondering if any of you have reported this bug/issue/problem back to the Debian community, for example on their bugs.debian.org website?

On Thu, Nov 1, 2018 at 1:47 PM Volker Janzen <volker@xxxxxxxxxx> wrote:
Hi,

I had these crash problems with the Xen version in Debian stretch, too. After 3 to 7 days the Xen server rebooted without a log entry or anything else to observe. The problems started when Debian applied the first patches. Some updates made it better, the last one made it worse again. I checked the hard drives and RAM and closely monitored metrics for a possible cause.

My solution, once I no longer suspected a hardware fault: build upstream Xen 4.11 for Debian stretch. I am currently running this setup with my own build of kernel 4.19. The machines are now stable again.


    Volker


On 29.10.2018 at 13:13, Roalt Zijlstra | webpower <roalt.zijlstra@xxxxxxxxxxx> wrote:

Hi there,

Ever since all the Meltdown and Spectre kernel updates, and possibly also the Xen 4.8 updates, we have been experiencing crashes of the Dom0 out of the blue. Sometimes after 1 day, sometimes after a few days or even 14 days; completely random.

We have two Dell P730 servers and two Dell P720 servers with this behaviour. One thing to note is that we updated these machines to the latest available firmware, because that is the most secure way. Then we installed Debian Stretch with Xen 4.8 support.

We have done several installs; 4 servers seem to crash pretty fast and others don't. In the end we think that we can trace it back to the xen-4.8.4-pre version being stable and the xen-4.8.5-pre version being unstable. This was more or less independent of the kernel we were using, 4.14 or 4.9.0-8-amd64. These are of course all Debian package version numbers.

As a last resort, on one server we updated the kernels of all Jessie DomUs on that Dom0 from the 3.16 kernel to 4.9.0 from backports. For now that seems to work, but the crashes are random, so it could happen again at any time. The idea is that the 3.16 kernels are completely unaware of Spectre and Meltdown and might cause trouble in Xen's kernel support. I am not sure if this is true at all, but we are pretty lost as to what the actual cause is.

We also tested with CentOS, and we had these crashes there too with certain combinations of kernel and Xen. The most recent updates seem to be more stable, though. The most frustrating part is that there are absolutely no logs to be found. No kernel oops or anything like that; the server just resets and boots again.

Are there others experiencing problems like this? Do you see more frequent server/kernel crashes on production servers?  

Best regards,
 
Roalt Zijlstra

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users