[Xen-users] Networking of domUs stops working after a few minutes


A few minutes after starting a domU, network access to and from it is no longer possible.

This does not always happen and is not easily reproducible, but from some point in time onwards it seems to occur in all newly started instances of the same domU. Even restarting the dom0 does not necessarily make the problem go away.

At the moment the network in the domU stops working completely, the error message
[2178752.854380] vif vif-33-0 vif33.0: Guest Rx stalled
appears in dmesg on the dom0. Connectivity can already be degraded shortly before that.
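
To catch the moment of the stall without watching dmesg by hand, something like the following could be run on the dom0. This is only a minimal sketch: the pattern is modelled on the single message quoted above, other kernel versions may word it differently, and the function name find_rx_stalls is made up for illustration.

```python
import re

# Pattern modelled on the one dmesg line quoted in this report
# (assumption: other netback versions may format the message differently).
STALL_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+vif\s+vif-\d+-\d+\s+"
    r"(?P<iface>vif\d+\.\d+):\s+Guest Rx stalled"
)


def find_rx_stalls(dmesg_text):
    """Return a list of (timestamp_seconds, interface) for each stall line."""
    stalls = []
    for line in dmesg_text.splitlines():
        match = STALL_RE.match(line.strip())
        if match:
            stalls.append((float(match.group("ts")), match.group("iface")))
    return stalls
```

The text to scan can be obtained, for example, with subprocess.run(["dmesg"], capture_output=True, text=True).stdout on the dom0.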

It is sometimes possible, for example, to ping the domU for longer than it is possible to ping any host from the domU. Pings may also still work for a few minutes while SSH sessions no longer do. All of this occurs before the "Guest Rx stalled" error.

Both dom0 and domU are Debian jessie installations. We tested kernel versions 3.16.7-ckt11-1+deb8u3 and 4.1.3-1~bpo8+1 on both the dom0 and the domU. The problem happens with domUs newly created via 'xen-create-image' as well as with older domUs that were migrated from a Debian wheezy dom0. It happens with both the vif-route and the vif-bridge script in the domU configuration.
The Xen hypervisor version is 4.4.1-9+deb8u1.

When the network stops working, the ARP tables are no longer filled on dom0 and domU, for example:
Address    HWtype    HWaddress       Flags Mask    Iface
                     (incomplete)                  vif33.0
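
The same check can be scripted against /proc/net/arp, where an unresolved entry has the ATF_COM flag (0x2) cleared. The following is a minimal sketch under that assumption; the function name incomplete_arp_entries is hypothetical.

```python
ATF_COM = 0x2  # "completed entry" flag from the Linux ARP flag definitions


def incomplete_arp_entries(proc_net_arp_text):
    """Return (ip, device) pairs for ARP entries that never resolved.

    Expects the contents of /proc/net/arp: one header line followed by
    rows of: IP address, HW type, Flags, HW address, Mask, Device.
    """
    entries = []
    for line in proc_net_arp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        ip, _hw_type, flags, _hw_addr, _mask, device = fields[:6]
        if int(flags, 16) & ATF_COM == 0:
            entries.append((ip, device))
    return entries
```

On the dom0 it could be called as incomplete_arp_entries(open("/proc/net/arp").read()) and the result filtered for the vif of the affected domU.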

Debugging with tcpdump in the minutes before the "Guest Rx stalled" error shows that packets sent from the domU reach the dom0, but the replies from the dom0 do not arrive at the domU. For example, pings or ARP requests show up as sent on the dom0, but the corresponding packet never shows up on the domU.
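
The comparison of the two captures can be automated roughly as follows. This is a sketch only: it assumes tcpdump's default text output for ICMP echo packets ("ICMP echo reply, id N, seq M"), and the helper names icmp_packets and lost_between are made up for illustration.

```python
import re

# Matches the ICMP echo part of tcpdump's default one-line text output.
ECHO_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def icmp_packets(tcpdump_text, kind):
    """Collect the (id, seq) pairs of all echo packets of the given kind."""
    seen = set()
    for line in tcpdump_text.splitlines():
        match = ECHO_RE.search(line)
        if match and match.group(1) == kind:
            seen.add((int(match.group(2)), int(match.group(3))))
    return seen


def lost_between(sender_capture, receiver_capture, kind="reply"):
    """Return the sorted (id, seq) pairs seen by the sender but not the receiver."""
    sent = icmp_packets(sender_capture, kind)
    received = icmp_packets(receiver_capture, kind)
    return sorted(sent - received)
```

With the capture taken on the vif in the dom0 as the sender side and the capture from inside the domU as the receiver side, a non-empty result lists the replies that were lost between the two.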

Thank you and best regards,
Arne Klein

