Xen project Mailing List

Re: [Xen-users] Network and SATA Instability on Xen 4.6/4.8

From: "J. Roeleveld" <joost@xxxxxxxxxxxx>

Date: Wed, 20 Dec 2017 10:55:29 +0100

Delivery-date: Wed, 20 Dec 2017 09:56:40 +0000

List-id: Xen user discussion <xen-users.lists.xenproject.org>

On Friday, December 8, 2017 10:17:30 PM CET Kevin Stange wrote:
> Hi,
>
> I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
> on CentOS 6 and have been trying to finally move my environment up to
> Xen 4.6 or 4.8 using CentOS 7. Since I've built out my test server with
> Xen 4.6, I've been having issues where the Intel NICs begin flapping
> repeatedly and the SATA disk interfaces go down and will not come back
> up until I reboot the server. Even sending the bus rescan command
> doesn't bring the drives back. The issue seems to trigger based on
> activity, so during something like an mdraid resync is more likely to
> cause the issue, but it's not reproducible in a consistent amount of
> time, which makes it hard to tell if a particular change has definitely
> fixed it.
>
> This is reminiscent of a problem I had been experiencing while running
> kernel 3.18 and Xen 4.4 on CentOS 6, but the problem resolved itself
> upon upgrading to kernel 4.4 and later 4.9, so I chalked that up to
> something bad with PCIe management in kernel 3.18 and thought nothing
> more of it until now.
>
> The initial test environment where the issue occurred was kernel 4.9.58
> and Xen 4.6.6-7 (with security patches from CentOS). I then tried
> upgrading to kernel 4.9.63 and Xen 4.8.2-5, which didn't result in any
> improvements.
>
> I tried pcie_aspm=off on the kernel line, which has helped in the past
> with similar issues, but that didn't help here.
>
> I tried booting without Xen (just kernel 4.9.63) and it seems like that
> made the issue go away, which lead me to believe the issue only happens
> with hardware accessed from dom0. I dug through Xen command line
> options and tried booting with msi=off and that now seems to have
> resulted in the problem going away, or at least, the system hasn't
> exhibited the issue since last week. Previously, the issue would tend
> to manifest after less than 24 hours.
>
> My hardware is Supermicro X8DT3-F with Dual Intel Xeon E5620 CPUs.
>
> Disk issues begin with a kernel message like this followed by continuous
> ATA command failures:
>
> ata2.00: exception emask 0x0 sact 0x7c01ffff serr 0x50000 action 0x6 frozen
>
> NIC issues begin with a message like:
>
> igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
>
> NICs do recover almost immediately but continue to flap periodically
> until reboot.
>
> I don't know if this is a bug in Xen or something else at play, but I
> could really use some help figuring out what's going on, why msi=off
> seems to fix it, and if there are any better ways to resolve this.
>
> Thanks.

I have not seen anything like this on any server I am currently using and
it's a mix of Tyan boards and Supermicro. (Switching away from Tyan for
unrelated reasons)

# xl info | grep command
xen_commandline : dom0_mem=24GB,max:24GB console=vga dom0_max_vcpus=4 dom0_vcpus_pin gnttab_max_frames=256

# cat /proc/cmdline
root=zhost/host/root by=id elevator=noop logo.nologo triggers=zfs quiet refresh softlevel=prexen

FYI: I use ZFS and some of the VMs are using 2 SSDs that are maintained by
the host. The majority of the storage is handled by a storage domain which
has the HBA assigned to it directly.

I have 4 10Gbe ports that are bonded and VLAN tagged to provide connectivity
to other hosts.

Mainboard: Supermicro X10DRI-T4i

The hardware is occasionally stressed both on the SSDs (connected via SATA)
and the network.

I am running a 4.9.49 kernel with Xen 4.8.2 and ZoL 0.7.3.

--
Joost

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users
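
Because the failure is intermittent, it can be hard to tell whether a change such as msi=off really helped. One rough way to compare runs is to count the two symptom messages quoted in the thread in a saved kernel log. The following is only a sketch under assumptions: the function name and pattern labels are made up for illustration, and the regular expressions are derived solely from the two example messages above, so other kernel versions or drivers may word them differently.

```python
import re
from collections import Counter

# Patterns derived from the two example messages quoted in the thread.
# Matching is case-insensitive because kernels may print "Emask" or "emask".
PATTERNS = {
    "ata_exception": re.compile(r"ata\d+\.\d+: exception emask", re.IGNORECASE),
    "igb_reset": re.compile(
        r"igb [0-9a-f:.]+: \S+: Reset adapter unexpectedly", re.IGNORECASE
    ),
}


def count_symptoms(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

The function accepts any iterable of log lines, for example an open file holding saved dmesg output. Comparing the counts over similar uptime and load, with and without a given boot option, gives a crude indication of whether the option made a difference.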
