
Re: [Xen-devel] (repeatable) cross-domain networking failure




Nivedita Singhvi <niv@xxxxxxxxxx> wrote:

I don't have boxes at the moment and can't reproduce till
Monday, but can you show us the output of netstat -uan and
netstat -s on both domains? Is there stuff in the receive
or send queues?

The detailed output of netstat follows. But there is neither anything in the send queue on domU, nor anything in the receive queue on dom0. (The UDP server in question is running on port 2000.)

On dom0:

$ netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 0.0.0.0:1024            0.0.0.0:*
udp        0      0 0.0.0.0:2049            0.0.0.0:*
udp        0      0 0.0.0.0:514             0.0.0.0:*
udp        0      0 0.0.0.0:1027            0.0.0.0:*
udp        0      0 155.98.36.34:1028       155.98.32.70:8509       ESTABLISHED
udp        0      0 0.0.0.0:775             0.0.0.0:*
udp        0      0 0.0.0.0:653             0.0.0.0:*
udp        0      0 192.168.0.1:2000        192.168.1.1:1024        ESTABLISHED
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 0.0.0.0:111             0.0.0.0:*
udp        0      0 0.0.0.0:759             0.0.0.0:*

On domU:

# netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 192.168.1.1:1024        192.168.0.1:2000        ESTABLISHED
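
A quick way to confirm that no UDP socket has queued data is to filter the netstat -uan output for non-zero Recv-Q or Send-Q values. This is only a minimal sketch, assuming the column layout shown above; the function name nonempty_udp_queues is made up for illustration.

```shell
# Print UDP sockets whose Recv-Q or Send-Q is non-zero.
# Reads `netstat -uan` output on stdin; prints nothing if all queues are empty.
# Assumes columns: Proto Recv-Q Send-Q Local-Address ...
nonempty_udp_queues() {
    awk '$1 == "udp" && ($2 + 0 > 0 || $3 + 0 > 0) {
        print $4, "recvq=" $2, "sendq=" $3
    }'
}
```

Usage would be `netstat -uan | nonempty_udp_queues` on each domain; empty output matches the all-zero queues in the listings above.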

The netstat -s outputs are a bit long, so I've attached them instead of including them inline.

And was all the udp traffic going to the same port? i.e. any successful udp traffic to another endpoint?

All the traffic was going to port 2000. Trying to send UDP traffic from domU to a different port in dom0 (after the networking failure) does not succeed. (If you're asking whether traffic could be sent to multiple ports while the networking is functional, I believe the answer is yes, but I would need to double-check.)

What does ifconfig on dom0 show?
Are there any error messages in /var/log/messages?

$ ifconfig vif1.0
vif1.0    Link encap:Ethernet  HWaddr AA:00:01:7B:92:C2
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:134 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5884 (5.7 Kb)  TX bytes:676 (676.0 b)

$ sudo tail /var/log/messages
Jan 16 19:34:09 node1 ntpd[993]: kernel time sync disabled 0041
Jan 16 19:35:15 node1 ntpd[993]: kernel time sync enabled 0001
Jan 16 19:39:29 node1 ntpd[993]: synchronized to 155.98.33.74, stratum=2
Jan 16 19:49:07 node1 ntpd[993]: time correction of -18001 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Jan 16 19:59:15 node1 sshd(pam_unix)[1457]: session opened for user mukesh by (uid=30245)
Jan 16 19:59:18 node1 sshd(pam_unix)[1486]: session opened for user mukesh by (uid=30245)
Jan 16 19:59:30 node1 sshd(pam_unix)[1517]: session opened for user mukesh by (uid=30245)
Jan 16 20:09:29 node1 modprobe: modprobe: Can't open dependencies file /lib/modules/2.4.27-xen0/modules.dep (No such file or directory)
Jan 16 20:09:44 node1 last message repeated 2 times
Jan 16 20:16:02 node1 kernel: device vif1.0 entered promiscuous mode

Looking at the interrupt counts in /proc/interrupts, I see that D0 no
longer receives packets sent by D1. D1, however, does receive packets
sent by D0. (To be clear, D0->D1 traffic is ICMP ping requests,
unrelated to the UDP traffic. There is not UDP traffic sent from D0 to D1.)

Is there any other successful traffic from D0 -> D1 (tcp?)

All traffic from D0->D1 is successful, even after the network stops working. This includes ICMP, UDP, and TCP. (Sorry if my comment about "There is not UDP traffic sent from D0 to D1" was confusing. What I meant was that I wasn't sending any UDP traffic from D0 to D1, not that such traffic fails.)

This is subject to the limitation mentioned in my first message. Namely, that dom0's ARP cache entry for domU eventually times out. At that point, dom0 attempts to ARP for domU's MAC. domU sees this, and replies (as seen by tcpdump on domU). But dom0 never gets the ARP replies, so eventually D0->D1 traffic fails as well. (E.g. "telnet 192.168.1.1" returns "No route to host".)

Also, let me add some more detail to my original report:

1. The networking fails after the 128th UDP packet received in dom0, even if I restart domU. Specifically:

        - If I send one UDP packet from domU to dom0, shut down domU, and
          start a fresh domU, then I can only send 127 (rather than
          128) UDP packets from the new domU before networking will fail.

        - If I shut down domU after the networking failure, and start a
          new domU, networking between the new domU and dom0 does not
          work.

2. The server run in dom0 is
        nc -l -u -p 2000

3. The traffic generator run in domU is

        i=0; while true; do
                ((++i)); echo $i
                echo $i | nc -u -w 1 192.168.0.1 2000
        done &
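
To pin down the exact packet count at which the failure occurs, a bounded variant of the loop above may be easier to work with than the endless one. The following is only a sketch: the function name send_numbered and the NC override variable are hypothetical, and the nc flags are the same ones used in the generator above.

```shell
# Send COUNT numbered UDP datagrams, one per nc invocation, then stop.
# NC is a hypothetical override for the netcat command (defaults to nc).
send_numbered() {
    count=$1 host=$2 port=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        echo "$i"                                   # progress on stdout
        echo "$i" | "${NC:-nc}" -u -w 1 "$host" "$port"
    done
}
```

For example, `send_numbered 127 192.168.0.1 2000` followed by single sends would show whether the 128th datagram is the last one dom0 receives.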

thanks,
mukesh

Attachment: netstat-dom0.txt
Description: netstat -s for domain0

Attachment: netstat-domU.txt
Description: netstat -s for domain1


 

