
Re: [Xen-devel] domU to domU networking issues in v3.7? (netserver/netperf failing to communicate)



On Mon, Nov 12, 2012 at 06:20:55PM +0100, Sander Eikelenboom wrote:
> 
> Monday, November 12, 2012, 5:32:04 PM, you wrote:
> 
> > On Mon, Nov 12, 2012 at 03:50:24PM +0100, Sander Eikelenboom wrote:
> >> 
> >> Monday, November 12, 2012, 3:28:35 PM, you wrote:
> >> 
> >> > On Mon, Nov 12, 2012 at 09:54:44AM +0000, Ian Campbell wrote:
> >> >> On Sat, 2012-11-10 at 13:59 +0000, Konrad Rzeszutek Wilk wrote:
> >> >> > Hey Ian, Xen-devel mailing list,
> >> >> > 
> >> >> > I think the issue of 70% traffic lost was actually introduced in v3.6
> >> >> > or perhaps v3.5. Annie and Marcos (CC-ed here) are looking to see
> >> >> > which of the releases introduced this. The issue we are seeing is that
> >> >> > domU to domU communication breaks - this is with netperf/netserver
> >> >> > talking to each other.
> >> >> > 
> >> >> > Anyhow, I think the 3.7 compound page exacerbated the problem and also
> >> >> > (at least on some of my test hardware) exposed existing issues with
> >> >> > drivers. The issue I have is that the 'skge' driver has a bug that has
> >> >> > been there for ages (I tested way back to 3.0 and still saw it) where
> >> >> > it cannot work with SWIOTLB. It is probably missing a pci_dma_sync
> >> >> > somewhere.
> >> >> > 
> >> >> > Anyhow the compound page got me to look at Xen-SWIOTLB and that looks
> >> >> > OK. Even with a synthetic driver (the fake one I posted somewhere) it
> >> >> > dealt with compound pages properly (with debug or non-debug Xen
> >> >> > hypervisor).
> >> >> 
> >> >> The debug build is probably most interesting since it deliberately
> >> >> allocates a non 1-1 p-to-m mapping so as to catch exactly these sorts of
> >> >> issues.
> >> 
> >> > Right. My test env runs with that. And so far it only has issues
> >> > with the skge one.
> >> >> 
> >> >> > So I was wondering if you had looked at this in more detail? Any
> >> >> > ideas? Or would it be more prudent to ask that once we know for sure
> >> >> > which Linux release introduced the communication failures between
> >> >> > guests?
> >> >> 
> >> >> I've not looked at it any further I'm afraid.
> >> >> 
> >> >> If these changes (be they in 3.5 or later, or earlier) are exposing
> >> >> driver bugs then I suspect the netdev chaps would want to know about it.
> >> 
> >> > Right. Annie (CC-ed here) mentioned to me that v3.5 looks to work ok.
> >> > And is off checking v3.6. v3.7 is definitely a no go.
> >> >> 
> >> >> FWIW I see the issue with tg3.
> >> 
> >> After the issues with netback were fixed, I'm seeing the issues with
> >> net_front. Reverting the single commit
> >> 5640f7685831e088fe6c2e1f863a6805962f8e81 (that was pointed out for
> >> netback) also makes these disappear.
> 
> > Were you ever able to trigger the BUG_ON in the patch that Ian posted?
> 
> Which exact patch (or any other patch that can help you)?
> (so I can try again to be sure)

This one:
http://lists.xen.org/archives/html/xen-devel/2012-10/msg00893.html



 

