Re: [Xen-devel] [RFC PATCH] page_alloc: use first half of higher order chunks when halving
On Wed, Mar 26, 2014 at 03:15:42PM -0700, Matthew Rushton wrote:
> On 03/26/14 10:56, Konrad Rzeszutek Wilk wrote:
> >On Wed, Mar 26, 2014 at 10:47:44AM -0700, Matthew Rushton wrote:
> >>On 03/26/14 09:36, Konrad Rzeszutek Wilk wrote:
> >>>On Wed, Mar 26, 2014 at 08:59:04AM -0700, Matthew Rushton wrote:
> >>>>On 03/26/14 08:15, Matt Wilson wrote:
> >>>>>On Wed, Mar 26, 2014 at 11:08:01AM -0400, Konrad Rzeszutek Wilk wrote:
> >>>>>>Could you elaborate a bit more on the use-case please?
> >>>>>>My understanding is that most drivers use a scatter-gather list, in
> >>>>>>which case it does not matter if the underlying MFNs in the PFN
> >>>>>>space are not contiguous.
> >>>>>>
> >>>>>>But I presume the issue you are hitting is with drivers doing
> >>>>>>dma_map_page and the page is not 4KB but rather large (a compound
> >>>>>>page). Is that the problem you have observed?
> >>>>>Drivers are using very large size arguments to dma_alloc_coherent()
> >>>>>for things like RX and TX descriptor rings.
> >>>Large size, like larger than 512kB? That would also cause problems
> >>>on bare metal when swiotlb is activated, I believe.
> >>I was looking at network IO performance, so the buffers would not
> >>have been that large. I think large in this context is relative to
> >>the 4k page size and the odds of the buffer spanning a page
> >>boundary. For context, I saw a ~5-10% increase in guest network
> >>throughput by avoiding bounce buffers, and also saw dom0 TCP
> >>streaming performance go from ~6Gb/s to over 9Gb/s on my test setup
> >>with a 10Gb NIC.
> >OK, but those would not be the dma_alloc_coherent() ones then? That
> >sounds more like the generic TCP mechanism allocating 64KB pages
> >instead of 4KB and using those.
> >
> >Did you try looking at this hack that Ian proposed a long time ago
> >to verify that it is said problem?
> >
> >https://lkml.org/lkml/2013/9/4/540
>
> Yes, I had seen that and initially had the same reaction, but the
> change was relatively recent and not relevant. I *think* all the
> coherent allocations are OK, since the swiotlb makes them contiguous.
> The problem comes with the use of the streaming API. As one example,
> with jumbo frames enabled a driver might use larger RX buffers, which
> triggers the problem.
>
> I think the right thing to do is to make the DMA streaming API work
> better with larger buffers on dom0. That way it works across all
> drivers and device types regardless of how they were designed.

OK.

Can you point me to an example of the DMA streaming API? I am not sure
whether by 'streaming API' you mean scatter-gather operations using the
DMA API?

Is there a particularly easy way for me to reproduce this? I have to
say I hadn't enabled jumbo frames on my box, since I am not even sure
the switch I have can do it. Is there an idiots-punch-list of how to
reproduce this?

Thanks!

> >>>>>--msw
> >>>>It's the DMA streaming API I've noticed the problem with, so
> >>>>dma_map_single(). Applicable swiotlb code would be
> >>>>xen_swiotlb_map_page() and range_straddles_page_boundary(). So yes,
> >>>>for larger buffers it can cause bouncing.
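[Editor's note: for readers following along, here is a minimal sketch of
the streaming DMA API pattern the thread is discussing. The driver
fragment is hypothetical: the names map_rx_buffer/unmap_rx_buffer and
the 9216-byte jumbo-frame buffer size are illustrative, not from the
thread. The point is that a single kmalloc'd buffer larger than 4KB
spans page boundaries, and under Xen the MFNs backing those pages need
not be machine-contiguous, so xen_swiotlb_map_page() may have to bounce
the data.]

    #include <linux/dma-mapping.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    #define RX_BUF_SIZE 9216	/* jumbo-frame sized: spans three 4k pages */

    /* Streaming API: map once per I/O, unmap on completion. */
    static int map_rx_buffer(struct device *dev, void **bufp, dma_addr_t *dmap)
    {
    	void *buf = kmalloc(RX_BUF_SIZE, GFP_KERNEL);

    	if (!buf)
    		return -ENOMEM;

    	/*
    	 * On Xen dom0 this ends up in xen_swiotlb_map_page(). If the
    	 * buffer straddles a page boundary and the underlying MFNs are
    	 * not contiguous, the I/O goes through a bounce buffer.
    	 */
    	*dmap = dma_map_single(dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE);
    	if (dma_mapping_error(dev, *dmap)) {
    		kfree(buf);
    		return -EIO;
    	}

    	*bufp = buf;
    	return 0;
    }

    static void unmap_rx_buffer(struct device *dev, void *buf, dma_addr_t dma)
    {
    	dma_unmap_single(dev, dma, RX_BUF_SIZE, DMA_FROM_DEVICE);
    	kfree(buf);
    }

By contrast, dma_alloc_coherent() memory is exchanged for
machine-contiguous memory up front, which is why the coherent path is
fine in this thread.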
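[Editor's note: and a hedged reconstruction of the check Matthew names,
range_straddles_page_boundary(). This captures the logic in spirit; it
is not the verbatim swiotlb-xen source, and the helper names
mfns_contiguous/needs_bounce are illustrative. A mapping can go
straight to the device only if the buffer fits within one 4k page or
its backing machine frames happen to be contiguous; otherwise it
bounces.]

    #include <linux/mm.h>
    #include <linux/pfn.h>
    #include <asm/xen/page.h>	/* pfn_to_mfn() on x86 Xen guests */

    /* Are the MFNs backing [paddr, paddr + len) machine-contiguous? */
    static bool mfns_contiguous(unsigned long pfn, unsigned int offset,
    			    size_t len)
    {
    	unsigned long nr = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
    	unsigned long mfn = pfn_to_mfn(pfn);
    	unsigned long i;

    	for (i = 1; i < nr; i++)
    		if (pfn_to_mfn(pfn + i) != mfn + i)
    			return false;
    	return true;
    }

    /* Mirrors the decision range_straddles_page_boundary() makes. */
    static bool needs_bounce(phys_addr_t paddr, size_t size)
    {
    	unsigned int offset = paddr & ~PAGE_MASK;

    	/* Within a single page there is nothing to straddle. */
    	if (offset + size <= PAGE_SIZE)
    		return false;

    	return !mfns_contiguous(PFN_DOWN(paddr), offset, size);
    }

This is presumably also why the RFC patch in the subject line helps:
handing out the first half of a split higher-order chunk keeps dom0's
PFN-to-MFN mapping contiguous more often, so the contiguity test above
succeeds and the bounce is avoided.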