[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] TSQ accounting skb->truesize degrades throughput for large packets

On Fri, 2013-09-06 at 17:36 +0100, Zoltan Kiss wrote:
> On 06/09/13 13:57, Eric Dumazet wrote:
> > Well, I have no problem to get line rate on 20Gb with a single flow, so
> > other drivers have no problem.
> I've made some tests on bare metal:
> Dell PE R815, Intel 82599EB 10Gb, 3.11-rc4 32 bit kernel with 3.17.3 
> ixgbe (TSO, GSO on), iperf 2.0.5
> Transmitting packets toward the remote end (so running iperf -c on this 
> host) can make 8.3 Gbps with the default 128k tcp_limit_output_bytes. 
> When I increased this to 131.506 (128k + 434 bytes) suddenly it jumped 
> to 9.4 Gbps. Iperf CPU usage also jumped a few percent from ~36 to ~40% 
> (softint percentage in top also increased from ~3 to ~5%)

Typical tradeoff between latency and throughput

If you favor throughput, then you can increase tcp_limit_output_bytes

The default is quite reasonable IMHO.

> So I guess it would be good to revisit the default value of this 
> setting. What hw you used Eric for your 20Gb results?

Mellanox CX-3

Make sure your NIC doesn't hold TX packets in TX ring too long before
signaling an interrupt for TX completion.

For example I had to patch mellanox :

commit ecfd2ce1a9d5e6376ff5c00b366345160abdbbb7
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date:   Mon Nov 5 16:20:42 2012 +0000

    mlx4: change TX coalescing defaults
    mlx4 currently uses a too high tx coalescing setting, deferring
    TX completion interrupts by up to 128 us.
    With the recent skb_orphan() removal in commit 8112ec3b872,
    performance of a single TCP flow is capped to ~4 Gbps, unless
    we increase tcp_limit_output_bytes.
    I suggest using 16 us instead of 128 us, allowing a finer control.
    Performance of a single TCP flow is restored to previous levels,
    while keeping TCP small queues fully enabled with default sysctl.
    This patch is also a BQL prereq.
    Reported-by: Vimalkumar <j.vimal@xxxxxxxxx>
    Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Cc: Yevgeny Petrilin <yevgenyp@xxxxxxxxxxxx>
    Cc: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>
    Acked-by: Amir Vadai <amirv@xxxxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.