Re: [Xen-devel] [PATCH RFC 0/4]: xen-net{back,front}: Multiple transmit and receive queues
> -----Original Message-----
> From: Andrew J. Bennieston [mailto:andrew.bennieston@xxxxxxxxxx]
> Sent: 15 January 2014 16:23
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Ian Campbell; Wei Liu; Paul Durrant
> Subject: [PATCH RFC 0/4]: xen-net{back,front}: Multiple transmit and receive
> queues
>
> This patch series implements multiple transmit and receive queues (i.e.
> multiple shared rings) for the xen virtual network interfaces.
>
> The series is split up as follows:
> - Patches 1 and 3 factor out the queue-specific data for netback and
>   netfront respectively, and modify the rest of the code to use these
>   as appropriate.
> - Patches 2 and 4 introduce new XenStore keys to negotiate and use
>   multiple shared rings and event channels, and code to connect these
>   as appropriate.
>
> All other transmit and receive processing remains unchanged, i.e. there
> is a kthread per queue and a NAPI context per queue.
>
> The performance of these patches has been analysed in detail, with
> results available at:
>
> http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-
> queue_performance_testing
>

Nice numbers!

> To summarise:
> * Using multiple queues allows a VM to transmit at line rate on a 10
>   Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
>   with a single queue.
> * For intra-host VM--VM traffic, eight queues provide 171% of the
>   throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
> * There is a corresponding increase in total CPU usage, i.e. this is a
>   scaling out over available resources, not an efficiency improvement.
> * Results depend on the availability of sufficient CPUs, as well as the
>   distribution of interrupts and the distribution of TCP streams across
>   the queues.
>
> One open issue is how to deal with the tx_credit data for rate limiting.
> This used to exist on a per-VIF basis, and these patches move it to
> per-queue to avoid contention on concurrent access to the tx_credit
> data from multiple threads. This has the side effect of breaking the
> tx_credit accounting across the VIF as a whole. I cannot see a situation
> in which people would want to use both rate limiting and a
> high-performance multi-queue mode, but if this is problematic then it
> can be brought back to the VIF level, with appropriate protection.
> Obviously, it continues to work identically in the case where there is
> only one queue.
>
> Queue selection is currently achieved via an L4 hash on the packet (i.e.
> TCP src/dst port, IP src/dst address) and is not negotiated between the
> frontend and backend, since only one option exists. Future patches to
> support other frontends (particularly Windows) will need to add some
> capability to negotiate not only the hash algorithm selection, but also
> allow the frontend to specify some parameters to this.
>
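Just to make the current scheme concrete (illustrative only, not code lifted
from the series; mix32() and select_queue() are made-up names standing in for
whatever hash the implementation actually uses), the flow-to-queue mapping
amounts to something like this:

#include <stdint.h>

/* Illustrative only: mix a 32-bit value; stands in for the real hash. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x7feb352d;
    h ^= h >> 15;
    h *= 0x846ca68b;
    h ^= h >> 16;
    return h;
}

/* Pick a queue for a TCP/IP flow: hash the src/dst addresses and ports,
 * then reduce modulo the number of queues. */
static unsigned int select_queue(uint32_t saddr, uint32_t daddr,
                                 uint16_t sport, uint16_t dport,
                                 unsigned int num_queues)
{
    uint32_t h = mix32(saddr ^ mix32(daddr ^ ((uint32_t)sport << 16 | dport)));
    return h % num_queues;
}

Any reasonably uniform hash over the 4-tuple gives the properties that matter
here: packets of one flow always land on the same queue, and distinct flows
spread across the available queues.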
Yes, Windows RSS stipulates a Toeplitz hash and specifies a hash key and
mapping table. There's further awkwardness in the need to pass the actual
hash value to the frontend too - but we could use an 'extra' seg for that,
analogous to passing the GSO mss value through.

Paul
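For reference, the Toeplitz calculation itself is small. This is an untested
sketch only (toeplitz_hash() and rss_select_queue() are illustrative names;
the key and indirection table would be supplied by the frontend, not
hard-coded as assumed here):

#include <stdint.h>
#include <stddef.h>

/* Toeplitz hash as used by RSS: for every set bit of the input, XOR in the
 * 32-bit window of the secret key starting at that bit position.
 * 'key' must be at least len + 4 bytes long (RSS keys are 40 bytes). */
static uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *input,
                              size_t len)
{
    uint32_t hash = 0;
    /* 32-bit sliding window over the key, advanced one bit at a time. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (input[i] & (1u << bit))
                hash ^= window;
            /* Shift in the next key bit. */
            window = (window << 1) | ((key[i + 4] >> bit) & 1);
        }
    }
    return hash;
}

/* The low bits of the hash index an indirection table that maps to a queue.
 * Assumes table_size is a power of two, as RSS indirection tables are. */
static unsigned int rss_select_queue(uint32_t hash, const uint8_t *table,
                                     unsigned int table_size)
{
    return table[hash & (table_size - 1)];
}

The indirection table is what lets the frontend re-steer flows between queues
without changing the hash itself.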
> Queue-specific XenStore entries for ring references and event channels
> are stored hierarchically, i.e. under .../queue-N/... where N varies
> from 0 to one less than the requested number of queues (inclusive). If
> only one queue is requested, it falls back to the flat structure where
> the ring references and event channels are written at the same level as
> other vif information.
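To illustrate the layout (the key names below are guesses based on the
existing single-queue entries, and the negotiation key name in particular is
illustrative, not necessarily what the patches actually write):

.../device/vif/<vif-id>/multi-queue-num-queues = "2"   (negotiation key, name illustrative)
.../device/vif/<vif-id>/queue-0/tx-ring-ref
.../device/vif/<vif-id>/queue-0/rx-ring-ref
.../device/vif/<vif-id>/queue-0/event-channel-tx
.../device/vif/<vif-id>/queue-0/event-channel-rx
.../device/vif/<vif-id>/queue-1/...                    (as above, up to queue-(N-1))

With a single queue, the same ring-ref and event-channel keys sit directly
under .../device/vif/<vif-id>/ as they do today.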
> --
> Andrew J. Bennieston

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel