
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian


  • To: xen-devel@xxxxxxxxxxxxx
  • From: Ingo Jürgensmann <ij@xxxxxxxxxxxxxxxxxx>
  • Date: Sun, 10 Jul 2016 15:18:37 +0200
  • Delivery-date: Sun, 10 Jul 2016 13:19:02 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 10.07.2016 at 00:29, Andreas Ziegler <ml@xxxxxxxxxxxxxxxxxx> wrote:

> In May, Ingo Jürgensmann also started experiencing this problem and
> blogged about it:
> https://blog.windfluechter.net/content/blog/2016/03/23/1721-xen-randomly-crashing-server
> https://blog.windfluechter.net/content/blog/2016/05/12/1723-xen-randomly-crashing-server-part-2

Actually, I have been suffering from this problem since April 2013. Here’s my story… ;)

Everything was working smoothly while I was still using a root server at Hetzner. 
The setup there was somewhat non-standard, as I needed eth0 as the outgoing 
interface without it being part of the Xen bridge. So I used a mixture of 
bridged and routed networking in xend-config.sxp. This setup worked for years 
without problems.
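For illustration only, such a mixed setup in xend-config.sxp might have looked roughly like this. This is a sketch, not my original file; the wrapper-script name is made up, and xend only runs a single network-script, which is why mixed setups usually go through a custom script:

```
# /etc/xen/xend-config.sxp -- illustrative sketch, not the original config.
# xend runs exactly one network-script, so a mixed bridged/routed setup
# typically points it at a small custom wrapper (hypothetical name here):
(network-script my-network-mixed)

# Routed vifs for the external traffic, so eth0 stays outside the bridge:
(vif-script vif-route)
```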

However, as Hetzner started to bill for every single IPv4 address, I moved to 
a new provider where I could get the same address space (a /26) without being 
forced to pay per IPv4 address. The server back then was a Cisco C200 M2.

Since I got my own VLAN at the new location, I was able to drop the mixed 
routing/bridging setup and use bridging only, with eth0 now part of the Xen 
bridge. The whole setup consists of two bridges: one for the external IP 
addresses (xenbr0) and one for internal traffic (xenbr1). That part was 
already the same at Hetzner.
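On Debian, a bridging-only setup with those two bridges could be sketched in /etc/network/interfaces like this. All addresses below are placeholders, not my real ones:

```
# /etc/network/interfaces -- illustrative sketch with placeholder addresses
auto xenbr0
iface xenbr0 inet static
    bridge_ports eth0            # external NIC is now part of the bridge
    address 192.0.2.10           # placeholder for an address from the /26
    netmask 255.255.255.192      # /26
    gateway 192.0.2.1

auto xenbr1
iface xenbr1 inet static
    bridge_ports none            # internal-only bridge, no physical port
    address 10.0.0.1             # placeholder internal address
    netmask 255.255.255.0
```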

However, shortly after I moved to the new provider, the issues started: random 
crashes of the host. Together with the new provider, who was and still is very 
helpful, we exchanged the memory, for example. The provider also reported that 
other Cisco C200 servers running Ubuntu LTS didn’t show this issue.

Over time a pattern emerged that might point to the cause of the frequent 
crashes (sometimes several in a row, say 2-10 times a day!):

My setup is this:

Debian stable with packaged Xen hypervisor and these VMs:
1) Mail, Database, Nameserver, OpenVPN
2) Webserver, Squid3
3) Login server
4) … some more servers (10 in total), e.g. Tor Relay…

IPv4 /26 network, IPv6 /48 network

From my workplace I need to log in to 3) and have a tunnel to the Squid on 2) 
via the internal addresses on xenbr1. Of course Squid queries the nameserver on 
1), so there is some internal traffic going back and forth on the internal 
bridge, plus traffic originating from the external bridge (xenbr0). Through 
Squid I access my Roundcube on a small homebrew server at home that is 
connected to 1) via OpenVPN. Of course the webserver on 2) also queries the 
database on 1).

So, most crashes happen while I’m using the SSH tunnel from my workplace. If a 
crash happens, it’s likely that at least two in a row will occur within a short 
time frame (1-2 hours), sometimes even within 10 minutes after the server came 
back up. At times my impression was that the server crashes a second time the 
instant I try to access my Roundcube at home.

Furthermore, I switched from the Cisco C200 to my own server with a Supermicro 
X9SRi-F mainboard and a Xeon E5-2630L v2, still at the same provider, and the 
issue remained: the new server crashes the same way the Cisco server did. With 
the new server we replaced the memory as well, going from 32G to 128G. So over 
time we have swapped the memory twice and the hardware once. Since then I no 
longer assume that this is hardware related.

In the meantime I switched from Squid on 2) to tinyproxy, running both on 2) 
and on a third-party VPS. The crashes still happen, regardless of whether 
Squid on 2) is in use or not.

In May the server was again crashing several times a week, sometimes several 
times a day. Really, really annoying! So together with my provider I set up a 
netconsole to catch more information about the crashes than just the few lines 
from the IPMI console.
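For reference, a netconsole setup along those lines might look like the sketch below. The IPs, ports and MAC address are placeholders, not the values we actually used:

```
# On the crashing host: load the netconsole module with
# netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac]
modprobe netconsole netconsole=6665@192.0.2.10/eth0,6666@192.0.2.99/00:11:22:33:44:55

# On the receiving host: capture the kernel messages arriving via UDP
nc -l -u 6666 | tee netconsole.log
```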

Trying linux-image 4.4 from backports didn’t help either. I had also switched 
from PV to PVHVM some months ago.

> He is pretty sure, that the problem went away after disabling IPv6.

Indeed. Since I disabled IPv6 for all of my VMs (it’s still active on dom0, 
but no longer routed to the domUs), not a single crash has happened.
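I’m not spelling out here exactly how IPv6 gets disabled; one common way inside each domU is via sysctl, sketched below (an assumption, not necessarily the method used on every VM):

```
# /etc/sysctl.d/90-disable-ipv6.conf inside a domU -- one possible way
# to disable IPv6, not necessarily the exact method used here
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```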

> But: we can't say for sure, because on our server it sometimes happened
> often in a short period of time, but then it didn't for months.
> and: disabling IPv6 is no option for me at all.

I won’t claim to have an exact way of reproducing the crashes, but they happen 
fairly often when I do what I described above.

What I can offer is:
- activate IPv6 again
- install a kernel with debugging symbols (*-dbg)
- try to provoke another crash
- send the netconsole output if a crash happens

What I cannot do:
- interpret the debug symbols
- access IPMI console from workplace (firewalled)

I’m with Andreas that disabling IPv6 cannot be an option.

--
Ciao...          //        http://blog.windfluechter.net
      Ingo     \X/     XMPP: ij@xxxxxxxxxxxxxxxxxxxxxxxx

gpg pubkey:  http://www.juergensmann.de/ij_public_key.asc




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

