
Re: [Xen-devel] Poor network performance between DomU with multiqueue support



On Thu, Dec 04, 2014 at 12:09:33PM +0000, Zhangleiqiang (Trump) wrote:
[...]
> > > However, I found another issue. Even when using 6 queues and making
> > > sure that all 6 netback processes run with high CPU usage (indeed,
> > > each of them runs at about 87% CPU usage), the overall VM receive
> > > throughput is not much higher than the results with 4 queues: it goes
> > > from 4.5 Gbps to 5.04 Gbps with 512-byte TCP payloads, and from
> > > 4.3 Gbps to 5.78 Gbps with 1460-byte TCP payloads.
> > >
> > 
> > I would like to ask whether you're still using the 4U4G (4 CPU, 4 GB?)
> > configuration. If so, please make sure there are at least as many vcpus
> > as queues.
> 

> Sorry for misleading you, 4U4G means 4 CPUs and 4 GB memory, :). I also
> found yesterday that the max_queue of netback is determined by
> min(online_cpu, module_param), so when using 6 queues in the previous
> testing, I used a VM with 6 CPUs and 6 GB memory.

> 
> > > According to the testing results from the wiki:
> > > http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_perf
> > > ormance_testing, the VM receive throughput is also much lower than the
> > > VM transmit throughput.
> > >
> > 
> > I think that's expected, because guest RX data path still uses grant_copy 
> > while
> > guest TX uses grant_map to do zero-copy transmit.
> 
> As I understand it, the RX process is as follows:
> 1. The physical NIC receives the packet
> 2. The Xen hypervisor delivers the interrupt to Dom0
> 3. Dom0's NIC driver does the "RX" work, and the packet is stored in an
> SKB which is also owned by/shared with netback
> 4. Netback notifies netfront through the event channel that a packet has
> arrived
> 5. Netfront grants a buffer for receiving and notifies netback of the GR
> (with the grant-reuse mechanism, netfront just passes the GR to netback)
> through the IO ring
> 6. Netback does a grant_copy to copy the packet from its SKB into the
> buffer referenced by the GR, and notifies netfront through the event
> channel
> 7. Netfront copies the data from that buffer into the user-level app's SKB
> 
> Am I right?

Step 4 is not correct: netback won't notify netfront at that point.

Step 5 is not correct: all grant refs are pre-allocated and
granted before that.

Other steps look correct.
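
To make step 6 a bit more concrete, here is a rough sketch (NOT the actual
xen-netback code) of what a single guest RX grant copy operation looks
like. The struct below mirrors the layout in Xen's public grant_table.h,
but the helper function, frame/grant numbers and domid are made-up
placeholders:

/* Rough illustration of the grant copy in step 6 -- NOT the actual
 * xen-netback code.  The struct mirrors Xen's public grant_table.h;
 * everything else is a made-up placeholder. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;

#define DOMID_SELF          ((domid_t)0x7FF0)
#define GNTCOPY_source_gref (1 << 0)
#define GNTCOPY_dest_gref   (1 << 1)

struct gnttab_copy {
    struct {
        union { grant_ref_t ref; uint64_t gmfn; } u;
        domid_t  domid;
        uint16_t offset;
    } source, dest;          /* IN */
    uint16_t len;            /* IN */
    uint16_t flags;          /* IN: GNTCOPY_* */
    int16_t  status;         /* OUT: filled in by the hypervisor */
};

/* Fill one op that copies 'len' bytes from a backend-local frame
 * (holding the SKB data) into the RX buffer the frontend granted. */
static void fill_rx_copy_op(struct gnttab_copy *op,
                            uint64_t src_gfn, uint16_t src_off,
                            grant_ref_t rx_gref, uint16_t dst_off,
                            domid_t frontend_domid, uint16_t len)
{
    memset(op, 0, sizeof(*op));
    op->flags         = GNTCOPY_dest_gref;  /* only the dest is a gref */
    op->source.u.gmfn = src_gfn;            /* backend frame with packet data */
    op->source.domid  = DOMID_SELF;         /* i.e. the backend's own memory */
    op->source.offset = src_off;
    op->dest.u.ref    = rx_gref;            /* gref pre-granted by netfront */
    op->dest.domid    = frontend_domid;
    op->dest.offset   = dst_off;
    op->len           = len;
}

int main(void)
{
    struct gnttab_copy op;

    /* Placeholder numbers: one MTU-sized chunk into gref 42 of domid 1. */
    fill_rx_copy_op(&op, 0x12345, 0, 42, 0, 1, 1500);

    printf("copy %u bytes -> gref %u of domid %u\n",
           (unsigned)op.len, (unsigned)op.dest.u.ref, (unsigned)op.dest.domid);

    /* The real backend batches many such ops, submits them with a single
     * GNTTABOP_copy hypercall, and then checks each op's status. */
    return 0;
}

The copy is performed by the hypervisor into a buffer the guest has
already granted, so no foreign mapping has to be kept around afterwards.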

> Why not use zero-copy in the guest RX data path too?
> 

A rogue or buggy guest might hold the mapping for an arbitrarily long
period of time.

> 
> > > I am wondering why the VM receive throughput cannot reach 8-10Gbps
> > > like VM transmit under multi-queue?  I also tried to send packets
> > > directly from the local Dom0 to the DomU; the DomU receive throughput
> > > can reach about 8-12Gbps, so I am also wondering why transmitting
> > > packets from Dom0 to a remote DomU can only reach about 4-5Gbps.
> > 
> > If the data goes from Dom0 to DomU then the SKB is probably not
> > fragmented by the network stack.  You can use tcpdump to check that.
> 
> In our testing, the MTU is set to 1600. However, even when testing with
> packets whose length is 1024 (smaller than 1600), the throughput
> between Dom0 and the local DomU is much higher than that between Dom0 and
> a remote DomU. So fragmentation is probably not the reason for it.
> 

Don't have much idea about this, sorry.

Wei.

> 
> > Wei.
> > 
> > >
> > > > Wei.
> > > >
> > > > > ----------
> > > > > zhangleiqiang (Trump)
> > > > >
> > > > > Best Regards
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > > > > > Sent: Tuesday, December 02, 2014 8:12 PM
> > > > > > To: Zhangleiqiang (Trump)
> > > > > > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao
> > > > > > (brian); Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > > > > Subject: Re: [Xen-devel] Poor network performance between DomU
> > > > > > with multiqueue support
> > > > > >
> > > > > > On Tue, Dec 02, 2014 at 11:50:59AM +0000, Zhangleiqiang (Trump)
> > wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > From: xen-devel-bounces@xxxxxxxxxxxxx
> > > > > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wei
> > > > > > > > Liu
> > > > > > > > Sent: Tuesday, December 02, 2014 7:02 PM
> > > > > > > > To: zhangleiqiang
> > > > > > > > Cc: wei.liu2@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
> > > > > > > > Subject: Re: [Xen-devel] Poor network performance between
> > > > > > > > DomU with multiqueue support
> > > > > > > >
> > > > > > > > On Tue, Dec 02, 2014 at 04:30:49PM +0800, zhangleiqiang wrote:
> > > > > > > > > Hi, all
> > > > > > > > >     I am testing the performance of the xen netfront-netback
> > > > > > > > > driver with
> > > > > > > > multi-queue support. The throughput from domU to remote
> > > > > > > > dom0 is 9.2Gb/s, but the throughput from domU to remote domU
> > > > > > > > is only 3.6Gb/s, so I think the bottleneck is the throughput
> > > > > > > > from dom0 to local domU. However, we have done some testing
> > > > > > > > and found the throughput from dom0 to local domU is 5.8Gb/s.
> > > > > > > > >     And if we send packets from one DomU to other 3 DomUs
> > > > > > > > > on different
> > > > > > > > hosts simultaneously, the sum of throughput can reach 9Gbps.
> > > > > > > > It seems like the bottleneck is the receiver?
> > > > > > > > >     After some analysis, I found that even when the max_queue
> > > > > > > > > of netfront/back
> > > > > > > > is set to 4, there are some strange results, as follows:
> > > > > > > > >     1. In domU, only one rx queue deals with softirqs
> > > > > > > >
> > > > > > > > Try to bind irq to different vcpus?
> > > > > > >
> > > > > > > Do you mean we should try to bind IRQs to different vcpus in
> > > > > > > DomU? I will try it
> > > > now.
> > > > > > >
> > > > > >
> > > > > > Yes. Given the fact that you have two backend threads running
> > > > > > while only one DomU vcpu is busy, it smells like misconfiguration in
> > DomU.
> > > > > >
> > > > > > If this phenomenon persists after correctly binding the irqs, you
> > > > > > might want to check that traffic is being steered correctly to
> > > > > > different queues.
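
For reference, the binding can be done from inside the DomU by writing a
CPU mask to /proc/irq/<irq>/smp_affinity for each netfront queue
interrupt. A minimal, untested sketch (the IRQ numbers below are
placeholders; take the real ones for eth0's queues from /proc/interrupts
in the DomU):

/* Untested sketch: pin each xen-netfront queue IRQ to its own vcpu by
 * writing a hex CPU mask to /proc/irq/<irq>/smp_affinity.  The IRQ
 * numbers are placeholders. */
#include <stdio.h>

static int set_irq_affinity(int irq, unsigned int cpu)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    /* smp_affinity takes a hex CPU bitmask; 1 << cpu selects one vcpu. */
    fprintf(f, "%x\n", 1u << cpu);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Placeholder IRQ numbers for the per-queue eth0 interrupts. */
    int queue_irqs[] = { 72, 73, 74, 75 };
    unsigned int i;

    for (i = 0; i < sizeof(queue_irqs) / sizeof(queue_irqs[0]); i++)
        set_irq_affinity(queue_irqs[i], i);  /* queue i -> vcpu i */

    return 0;
}

After that, /proc/interrupts should show the per-queue interrupt counts
growing on different vcpus.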
> > > > > >
> > > > > > > >
> > > > > > > > >     2. In dom0, only two netback queue processes are
> > > > > > > > > scheduled; the other two
> > > > > > > > processes aren't scheduled.
> > > > > > > >
> > > > > > > > How many Dom0 vcpus do you have? If it only has two then
> > > > > > > > there will only be two processes running at a time.
> > > > > > >
> > > > > > > Dom0 has 6 vcpus and 6G memory. There is only one DomU
> > > > > > > running in
> > > > > > Dom0, and so four netback processes are running in Dom0 (because
> > > > > > the max_queue param of the netback kernel module is set to 4).
> > > > > > > The phenomenon is that only 2 of these four netback processes
> > > > > > > were running
> > > > > > with about 70% cpu usage, while the other two use little CPU.
> > > > > > > Is there a hash algorithm that determines which netback process
> > > > > > > handles the
> > > > > > input packet?
> > > > > > >
> > > > > >
> > > > > > I think that's whatever default algorithm the Linux kernel is using.
> > > > > >
> > > > > > We don't currently support other algorithms.
> > > > > >
> > > > > > Wei.



 

