
Re: [Xen-devel] [RFC PATCH] page_alloc: use first half of higher order chunks when halving

On Wed, Mar 26, 2014 at 03:15:42PM -0700, Matthew Rushton wrote:
> On 03/26/14 10:56, Konrad Rzeszutek Wilk wrote:
> >On Wed, Mar 26, 2014 at 10:47:44AM -0700, Matthew Rushton wrote:
> >>On 03/26/14 09:36, Konrad Rzeszutek Wilk wrote:
> >>>On Wed, Mar 26, 2014 at 08:59:04AM -0700, Matthew Rushton wrote:
> >>>>On 03/26/14 08:15, Matt Wilson wrote:
> >>>>>On Wed, Mar 26, 2014 at 11:08:01AM -0400, Konrad Rzeszutek Wilk wrote:
> >>>>>>Could you elaborate a bit more on the use-case please?
> >>>>>>My understanding is that most drivers use a scatter gather list - in
> >>>>>>which case it does not matter if the underlying MFNs in the PFN space
> >>>>>>are not contiguous.
> >>>>>>
> >>>>>>But I presume the issue you are hitting is with drivers doing
> >>>>>>dma_map_page and the page is not 4KB but rather large (compound
> >>>>>>page). Is that the problem you have observed?
> >>>>>Drivers are using very large size arguments to dma_alloc_coherent()
> >>>>>for things like RX and TX descriptor rings.
> >>>Large size like larger than 512kB? That would also cause problems
> >>>on baremetal then when swiotlb is activated I believe.
> >>I was looking at network IO performance so the buffers would not
> >>have been that large. I think large in this context is relative to
> >>the 4k page size and the odds of the buffer spanning a page
> >>boundary. For context I saw ~5-10% performance increase with guest
> >>network throughput by avoiding bounce buffers and also saw dom0 tcp
> >>streaming performance go from ~6Gb/s to over 9Gb/s on my test setup
> >>with a 10Gb NIC.
> >OK, but that would not be the dma_alloc_coherent ones then? That sounds
> >more like the generic TCP mechanism allocated 64KB pages instead of 4KB
> >and used those.
> >
> >Did you try looking at this hack that Ian proposed a long time ago
> >to verify that it is said problem?
> >
> >https://lkml.org/lkml/2013/9/4/540
> >
> Yes I had seen that and initially had the same reaction but the
> change was relatively recent and not relevant. I *think* all the
> coherent allocations are ok since the swiotlb makes them contiguous.
> The problem comes with the use of the streaming api. As one example
> with jumbo frames enabled a driver might use larger rx buffers which
> triggers the problem.
> I think the right thing to do is to make the dma streaming api work
> better with larger buffers on dom0. That way it works across all
> drivers and device types regardless of how they were designed.

Can you point me to an example of the DMA streaming API?

I am not sure if you mean 'streaming API' as scatter gather operations
using DMA API?

Is there a particularly easy way for me to reproduce this? I have
to say I haven't enabled jumbo frames on my box since I am not even
sure the switch I have can do it. Is there an idiot's punch list
for reproducing this?

> >>>>>--msw
> >>>>It's the dma streaming api I've noticed the problem with, so
> >>>>dma_map_single(). Applicable swiotlb code would be
> >>>>xen_swiotlb_map_page() and range_straddles_page_boundary(). So yes
> >>>>for larger buffers it can cause bouncing.
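
To make the bouncing condition above concrete, here is a rough userspace
sketch of the kind of check being discussed. It is not the kernel's
range_straddles_page_boundary(); the toy_p2m table and the toy_* helper
names are made up for illustration, and a 4KB page size is assumed. The
idea is that a buffer can be mapped directly only if it fits in one page
or if every page it touches is backed by consecutive MFNs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4KB pages */

/* Hypothetical toy P2M table: index is the guest PFN, value is the MFN. */
#define TOY_NR_PFNS 8
static const uint64_t toy_p2m[TOY_NR_PFNS] = {
    100, 101, 102, 200, 201, 50, 51, 52
};

static uint64_t toy_pfn_to_mfn(uint64_t pfn)
{
    assert(pfn < TOY_NR_PFNS);
    return toy_p2m[pfn];
}

/*
 * Returns true when [paddr, paddr + size) touches more than one page and
 * the machine frames behind those pages are not consecutive, i.e. the
 * case where a bounce buffer would be needed in this simplified model.
 */
static bool toy_needs_bounce(uint64_t paddr, size_t size)
{
    uint64_t first_pfn, last_pfn, pfn, expected_mfn;

    assert(size > 0);
    first_pfn = paddr >> PAGE_SHIFT;
    last_pfn = (paddr + size - 1) >> PAGE_SHIFT;

    /* Walk the pages; each one must follow the previous MFN by exactly 1. */
    expected_mfn = toy_pfn_to_mfn(first_pfn);
    for (pfn = first_pfn + 1; pfn <= last_pfn; pfn++) {
        expected_mfn++;
        if (toy_pfn_to_mfn(pfn) != expected_mfn)
            return true;
    }
    return false;
}
```

With this table, a 200-byte buffer starting 4000 bytes into PFN 2 spills
into PFN 3, whose MFN (200) does not follow 102, so it would bounce even
though it is far smaller than a page. An 8KB buffer starting at PFN 3 sits
on MFNs 200 and 201 and would not.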
