
Re: [Xen-devel] [Xen-users] Network and SATA Instability on Xen 4.6/4.8

On Fri, Dec 8, 2017 at 9:17 PM, Kevin Stange <kevin@xxxxxxxxxxxxx> wrote:
> Hi,
> I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
> on CentOS 6 and have been trying to finally move my environment up to
> Xen 4.6 or 4.8 using CentOS 7.  Since I've built out my test server with
> Xen 4.6, I've been having issues where the Intel NICs begin flapping
> repeatedly and the SATA disk interfaces go down and will not come back
> up until I reboot the server.  Even sending the bus rescan command
> doesn't bring the drives back.  The issue seems to be triggered by
> activity, so something like an mdraid resync is more likely to cause
> it, but it doesn't reproduce within a consistent amount of time, which
> makes it hard to tell if a particular change has definitely fixed it.
> This is reminiscent of a problem I had been experiencing while running
> kernel 3.18 and Xen 4.4 on CentOS 6, but the problem resolved itself
> upon upgrading to kernel 4.4 and later 4.9, so I chalked that up to
> something bad with PCIe management in kernel 3.18 and thought nothing
> more of it until now.
> The initial test environment where the issue occurred was kernel 4.9.58
> and Xen 4.6.6-7 (with security patches from CentOS).  I then tried
> upgrading to kernel 4.9.63 and Xen 4.8.2-5, which didn't result in any
> improvements.
> I tried pcie_aspm=off on the kernel line, which has helped in the past
> with similar issues, but that didn't help here.
> I tried booting without Xen (just kernel 4.9.63) and it seems like that
> made the issue go away, which led me to believe the issue only happens
> with hardware accessed from dom0.  I dug through Xen command line
> options and tried booting with msi=off and that now seems to have
> resulted in the problem going away, or at least, the system hasn't
> exhibited the issue since last week.  Previously, the issue would tend
> to manifest after less than 24 hours.
> My hardware is Supermicro X8DT3-F with Dual Intel Xeon E5620 CPUs.
> Disk issues begin with a kernel message like this followed by continuous
> ATA command failures:
> ata2.00: exception emask 0x0 sact 0x7c01ffff serr 0x50000 action 0x6 frozen
> NIC issues begin with a message like:
> igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
> NICs do recover almost immediately but continue to flap periodically
> until reboot.
> I don't know if this is a bug in Xen or something else at play, but I
> could really use some help figuring out what's going on, why msi=off
> seems to fix it, and if there are any better ways to resolve this.

Jan / Andy,

Any idea why Kevin might be seeing stability issues under 4.6 / 4.8
that are solved by adding 'msi=off'?
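
Since the failures don't show up on a fixed schedule, it may help to
count the two signatures Kevin quoted across boots with and without
'msi=off'. The following is only a rough sketch: the function name and
the regular expressions are my own assumptions modelled on the two
messages quoted above, and they may need adjusting for other devices or
kernel versions.

```python
import re

# Hypothetical patterns, modelled on the two kernel messages quoted above.
SIGNATURES = {
    # e.g. "ata2.00: exception emask 0x0 sact ... frozen"
    "ata_exception": re.compile(r"ata\d+\.\d+: exception emask", re.IGNORECASE),
    # e.g. "igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly"
    "igb_reset": re.compile(r"igb [0-9a-f:.]+: \S+: Reset adapter unexpectedly"),
}


def count_signatures(lines):
    """Return how many log lines match each failure signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the lines of the kernel log from each test boot (for example
the output of dmesg split into lines) gives a per-boot count, which is
easier to compare than watching for the next hang.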

