
Re: [Xen-devel] [RFC PATCH] page_alloc: use first half of higher order chunks when halving

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
From: Matthew Rushton <mvrushton@xxxxxxxxx>
Date: Wed, 26 Mar 2014 15:15:42 -0700
Cc: Keir Fraser <keir@xxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>, Matt Wilson <msw@xxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Wed, 26 Mar 2014 22:16:03 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 03/26/14 10:56, Konrad Rzeszutek Wilk wrote:

On Wed, Mar 26, 2014 at 10:47:44AM -0700, Matthew Rushton wrote:

On 03/26/14 09:36, Konrad Rzeszutek Wilk wrote:

On Wed, Mar 26, 2014 at 08:59:04AM -0700, Matthew Rushton wrote:

On 03/26/14 08:15, Matt Wilson wrote:

On Wed, Mar 26, 2014 at 11:08:01AM -0400, Konrad Rzeszutek Wilk wrote:

Could you elaborate a bit more on the use-case please?
My understanding is that most drivers use a scatter gather list - in which
case it does not matter if the underlying MFNs in the PFN space are
not contiguous.

But I presume the issue you are hitting is with drivers doing dma_map_page
and the page is not 4KB but rather large (compound page). Is that the
problem you have observed?

Drivers are using very large size arguments to dma_alloc_coherent()
for things like RX and TX descriptor rings.

Large size like larger than 512kB? That would also cause problems
on baremetal then when swiotlb is activated I believe.

I was looking at network IO performance so the buffers would not
have been that large. I think large in this context is relative to
the 4k page size and the odds of the buffer spanning a page
boundary. For context I saw ~5-10% performance increase with guest
network throughput by avoiding bounce buffers and also saw dom0 tcp
streaming performance go from ~6Gb/s to over 9Gb/s on my test setup
with a 10Gb NIC.

OK, but that would not be the dma_alloc_coherent ones then? That sounds
more like the generic TCP mechanism allocated 64KB pages instead of 4KB
and used those.

Did you try looking at this hack that Ian proposed a long time ago
to verify that it is said problem?

https://lkml.org/lkml/2013/9/4/540

Yes I had seen that and initially had the same reaction but the change
was relatively recent and not relevant. I *think* all the coherent
allocations are ok since the swiotlb makes them contiguous. The problem
comes with the use of the streaming api. As one example with jumbo
frames enabled a driver might use larger rx buffers which triggers the
problem.

I think the right thing to do is to make the dma streaming api work
better with larger buffers on dom0. That way it works across all drivers
and device types regardless of how they were designed.

--msw

It's the dma streaming api I've noticed the problem with, so
dma_map_single(). Applicable swiotlb code would be
xen_swiotlb_map_page() and range_straddles_page_boundary(). So yes
for larger buffers it can cause bouncing.
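
To make the bouncing condition concrete, here is a minimal user-space
sketch of the kind of check described above. It is a simplified model,
not the kernel code: the sketch_ names, the tiny PFN-to-MFN table, and
the 4k page size are assumptions made for illustration. A buffer needs
a bounce buffer only if it crosses a page boundary and the machine
frames behind it are not consecutive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumes 4k pages */

/* Hypothetical PFN -> MFN table standing in for dom0's p2m lookup. */
static const uint64_t sketch_p2m[] = { 100, 101, 205, 206, 207, 50 };
#define SKETCH_NR_PFNS (sizeof(sketch_p2m) / sizeof(sketch_p2m[0]))

/*
 * Returns true when the buffer [paddr, paddr + size) spans more than
 * one page and the machine frames backing those pages are not
 * consecutive, so a device could not DMA to it in one piece.
 */
static bool sketch_needs_bounce(uint64_t paddr, size_t size)
{
    uint64_t first_pfn, last_pfn, pfn;

    if (size == 0)
        return false;

    first_pfn = paddr >> SKETCH_PAGE_SHIFT;
    last_pfn = (paddr + size - 1) >> SKETCH_PAGE_SHIFT;

    /* Outside the toy table: be conservative and bounce. */
    if (last_pfn >= SKETCH_NR_PFNS)
        return true;

    /* Each following page must sit on the next machine frame. */
    for (pfn = first_pfn + 1; pfn <= last_pfn; pfn++) {
        if (sketch_p2m[pfn] != sketch_p2m[pfn - 1] + 1)
            return true;
    }
    return false;
}
```

In this model a 9000 byte buffer (roughly a jumbo frame) starting in
PFN 2 covers PFNs 2 to 4 and is fine because MFNs 205 to 207 are
consecutive, while an 8k buffer starting at PFN 1 would bounce because
MFN 101 is followed by MFN 205.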



