RE: [Xen-devel] arp during live migration
> > In my case, I NEVER see the gratuitous ARP being sent (confirmed using
> > tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is
> > sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification
> > [NET_XMIT_CN]).
>
> I am seeing the same error, indeed it looks like it is NET_XMIT_CN. I
> also see 100% percent loss, the ARP never makes it to the wire in any of
> my tests.
>

So, I have a little more info now -- it seems that the ARP is being assembled and passed to the backend driver, BUT the backend is ignoring it because the VIF link state is down (netif_carrier_ok() is returning FALSE). The link goes up shortly after, but the packet has been dropped by this time.

The actual sequence of events is also a little strange (but *very* reproducible):

. In the DomU, I see the following at the end of migration:
    . First, netfront sees the backend state change to InitWait -- this causes it to attempt to connect the rings and send the ARP (even though the current state is actually Connected).
    . Next, the resume processing runs in netfront (I think this is expected to run first, but it does not).
    . Now it sees the backend state change to InitWait a second time and attempts to send the ARP a second time.
. In Dom0:
    . The first attempt to send the ARP is completely ignored since the backend is not connected yet (specifically, it hasn't set up the softirq handler).
    . The first thing we see is the frontend state changing to Connected -- this causes the backend to initialize the connection and set up the irq handler.
    . Now we see an irq signaled, BUT it is ignored by the backend because netif_carrier_ok() returns FALSE.
    . The very next thing is the link becomes ready and the backend completes its state change to the Connected state.

It seems to me that the problem lies in the fact that the backend sees the ARP packet before it has finished setting up the vif, and so ignores it.
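As an illustration only, the ordering described above can be replayed in a small Python model. This is a toy sketch, not Xen code: ToyBackend, its fields, and both replay functions are hypothetical names invented here, and the model assumes only what the list above states (a packet is ignored before the handler exists, dropped while the carrier is down, and sent otherwise).

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Hypothetical stand-in for the backend vif; not the real netback code."""
    handler_ready: bool = False  # models the irq/softirq handler being set up
    carrier_ok: bool = False     # models what netif_carrier_ok() would report
    wire: list = field(default_factory=list)  # packets that reached the wire
    log: list = field(default_factory=list)   # fate of every packet seen

    def frontend_connected(self):
        # Backend reacts to the frontend entering Connected: handler is set up.
        self.handler_ready = True

    def link_up(self):
        # The link becomes ready and the backend completes its state change.
        self.carrier_ok = True

    def receive(self, packet):
        if not self.handler_ready:
            self.log.append((packet, "ignored: no handler yet"))
        elif not self.carrier_ok:
            self.log.append((packet, "dropped: carrier down"))
        else:
            self.wire.append(packet)
            self.log.append((packet, "sent"))


def replay_observed_order():
    """Replay the event order reported in the message."""
    be = ToyBackend()
    be.receive("arp-1")      # first InitWait: ARP sent before backend is ready
    be.frontend_connected()  # backend sets up its handler
    be.receive("arp-2")      # second ARP: handler exists, link still down
    be.link_up()             # link comes up only after both ARPs are gone
    return be


def replay_with_arp_after_link_up():
    """Same events, but the ARP is sent only once the link is up."""
    be = ToyBackend()
    be.frontend_connected()
    be.link_up()
    be.receive("arp")
    return be


if __name__ == "__main__":
    for packet, fate in replay_observed_order().log:
        print(packet, "->", fate)
```

In this model the observed ordering leaves the wire empty, while sending the ARP after the link comes up delivers it. That matches the reported symptom, but it says nothing about how the real drivers should be changed.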

I don't know if this is relevant, but Dom0 is running with 2 VCPUs in this configuration, so it's possible that the timing window here was not seen when Dom0 is run as a uni-processor...

/simgr

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel