
Re: [Xen-devel] Poor network performance between DomU with multiqueue support



On Thu, Dec 04, 2014 at 12:09:33PM +0000, Zhangleiqiang (Trump) wrote:
[...]
> > > However, I found another issue. Even when using 6 queues and making
> > > sure that all 6 netback processes run with high CPU usage (indeed,
> > > each of them runs at about 87% CPU usage), the overall VM receive
> > > throughput is not much higher than the results with 4 queues: it goes
> > > from 4.5 Gbps to 5.04 Gbps with 512-byte TCP payloads, and from
> > > 4.3 Gbps to 5.78 Gbps with 1460-byte TCP payloads.
> > >
> > 
> > I would like to ask whether you're still using the 4U4G (4 CPU, 4 GB?)
> > configuration. If so, please make sure there are at least as many vcpus
> > as queues.
> 

> Sorry for misleading you, 4U4G means 4 CPUs and 4 GB memory, :). I also
> found yesterday that the max_queue of netback is determined by
> min(online_cpu, module_param), so when using 6 queues in the previous
> testing, I used a VM with 6 CPUs and 6 GB memory.

> 
> > > According to the testing results from the wiki:
> > > http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_perf
> > > ormance_testing, the VM receive throughput is also much lower than the
> > > VM transmit throughput.
> > >
> > 
> > I think that's expected, because guest RX data path still uses grant_copy 
> > while
> > guest TX uses grant_map to do zero-copy transmit.
> 
> As I understand it, the RX process is as follows:
> 1. The physical NIC receives the packet
> 2. The Xen hypervisor delivers the interrupt to Dom0
> 3. Dom0's NIC driver does the "RX" work, and the packet is stored in an
> SKB which is also owned by/shared with netback
> 4. Netback notifies netfront through the event channel that a packet has
> arrived
> 5. Netfront grants a buffer for receiving and notifies netback of the GR
> (with the grant-reuse mechanism, netfront just passes the GR to netback)
> through the IO ring
> 6. Netback does a grant_copy to copy the packet from its SKB into the
> buffer referenced by the GR, and notifies netfront through the event
> channel
> 7. Netfront copies the data from that buffer into the user-level app's SKB
> 
> Am I right?

Step 4 is not correct: netback won't notify netfront at that point.

Step 5 is not correct: all grant refs are pre-allocated and
granted before that.

Other steps look correct.
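
To make step 6 a bit more concrete, here is a rough sketch (NOT the actual
xen-netback code) of what a single guest RX grant copy operation looks
like. The struct below mirrors the layout in Xen's public grant_table.h,
but the helper function, frame/grant numbers and domid are made-up
placeholders:

/* Rough illustration of the grant copy in step 6 -- NOT the actual
 * xen-netback code.  The struct mirrors Xen's public grant_table.h;
 * everything else is a made-up placeholder. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;

#define DOMID_SELF          ((domid_t)0x7FF0)
#define GNTCOPY_source_gref (1 << 0)
#define GNTCOPY_dest_gref   (1 << 1)

struct gnttab_copy {
    struct {
        union { grant_ref_t ref; uint64_t gmfn; } u;
        domid_t  domid;
        uint16_t offset;
    } source, dest;          /* IN */
    uint16_t len;            /* IN */
    uint16_t flags;          /* IN: GNTCOPY_* */
    int16_t  status;         /* OUT: filled in by the hypervisor */
};

/* Fill one op that copies 'len' bytes from a backend-local frame
 * (holding the SKB data) into the RX buffer the frontend granted. */
static void fill_rx_copy_op(struct gnttab_copy *op,
                            uint64_t src_gfn, uint16_t src_off,
                            grant_ref_t rx_gref, uint16_t dst_off,
                            domid_t frontend_domid, uint16_t len)
{
    memset(op, 0, sizeof(*op));
    op->flags         = GNTCOPY_dest_gref;  /* only the dest is a gref */
    op->source.u.gmfn = src_gfn;            /* backend frame with packet data */
    op->source.domid  = DOMID_SELF;         /* i.e. the backend's own memory */
    op->source.offset = src_off;
    op->dest.u.ref    = rx_gref;            /* gref pre-granted by netfront */
    op->dest.domid    = frontend_domid;
    op->dest.offset   = dst_off;
    op->len           = len;
}

int main(void)
{
    struct gnttab_copy op;

    /* Placeholder numbers: one MTU-sized chunk into gref 42 of domid 1. */
    fill_rx_copy_op(&op, 0x12345, 0, 42, 0, 1, 1500);

    printf("copy %u bytes -> gref %u of domid %u\n",
           (unsigned)op.len, (unsigned)op.dest.u.ref, (unsigned)op.dest.domid);

    /* The real backend batches many such ops, submits them with a single
     * GNTTABOP_copy hypercall, and then checks each op's status. */
    return 0;
}

The copy is performed by the hypervisor into a buffer the guest has
already granted, so no foreign mapping has to be kept around afterwards.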

> Why not use zero-copy in the guest RX data path too?
> 

A rogue or buggy guest might hold the mapping for an arbitrarily long
period of time.

> 
> > > I am wondering why the VM receive throughput cannot reach 8-10Gbps
> > > like VM transmit under multi-queue?  I also tried to send packets
> > > directly from the local Dom0 to the DomU; the DomU receive throughput
> > > can reach about 8-12Gbps, so I am also wondering why transmitting
> > > packets from Dom0 to a remote DomU can only reach about 4-5Gbps.
> > 
> > If the data goes from Dom0 to DomU then the SKB is probably not
> > fragmented by the network stack.  You can use tcpdump to check that.
> 
> In our testing, the MTU is set to 1600. However, even when testing with
> packets whose length is 1024 (smaller than 1600), the throughput
> between Dom0 and the local DomU is much higher than that between Dom0 and
> a remote DomU. So fragmentation is probably not the reason for it.
> 

Don't have much idea about this, sorry.

Wei.

> 
> > Wei.
> > 
> > >
> > > > Wei.
> > > >
> > > > > ----------
> > > > > zhangleiqiang (Trump)
> > > > >
> > > > > Best Regards
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > > > > > Sent: Tuesday, December 02, 2014 8:12 PM
> > > > > > To: Zhangleiqiang (Trump)
> > > > > > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao
> > > > > > (brian); Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > > > > Subject: Re: [Xen-devel] Poor network performance between DomU
> > > > > > with multiqueue support
> > > > > >
> > > > > > On Tue, Dec 02, 2014 at 11:50:59AM +0000, Zhangleiqiang (Trump)
> > wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > From: xen-devel-bounces@xxxxxxxxxxxxx
> > > > > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wei
> > > > > > > > Liu
> > > > > > > > Sent: Tuesday, December 02, 2014 7:02 PM
> > > > > > > > To: zhangleiqiang
> > > > > > > > Cc: wei.liu2@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
> > > > > > > > Subject: Re: [Xen-devel] Poor network performance between
> > > > > > > > DomU with multiqueue support
> > > > > > > >
> > > > > > > > On Tue, Dec 02, 2014 at 04:30:49PM +0800, zhangleiqiang wrote:
> > > > > > > > > Hi, all
> > > > > > > > >     I am testing the performance of the xen netfront-netback
> > > > > > > > > driver with
> > > > > > > > multi-queue support. The throughput from domU to remote
> > > > > > > > dom0 is 9.2Gb/s, but the throughput from domU to remote domU
> > > > > > > > is only 3.6Gb/s, so I think the bottleneck is the throughput
> > > > > > > > from dom0 to local domU. However, we have done some testing
> > > > > > > > and found the throughput from dom0 to local domU is 5.8Gb/s.
> > > > > > > > >     And if we send packets from one DomU to other 3 DomUs
> > > > > > > > > on different
> > > > > > > > hosts simultaneously, the sum of throughput can reach 9Gbps.
> > > > > > > > It seems like the bottleneck is the receiver?
> > > > > > > > >     After some analysis, I found that even when the max_queue
> > > > > > > > > of netfront/back
> > > > > > > > is set to 4, there are some strange results, as follows:
> > > > > > > > >     1. In domU, only one rx queue deals with softirqs
> > > > > > > >
> > > > > > > > Try to bind irq to different vcpus?
> > > > > > >
> > > > > > > Do you mean we should try to bind IRQs to different vcpus in
> > > > > > > DomU? I will try it
> > > > now.
> > > > > > >
> > > > > >
> > > > > > Yes. Given the fact that you have two backend threads running
> > > > > > while only one DomU vcpu is busy, it smells like misconfiguration in
> > DomU.
> > > > > >
> > > > > > If this phenomenon persists after correctly binding the irqs, you
> > > > > > might want to check that traffic is being steered correctly to
> > > > > > different queues.
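
For reference, the binding can be done from inside the DomU by writing a
CPU mask to /proc/irq/<irq>/smp_affinity for each netfront queue
interrupt. A minimal, untested sketch (the IRQ numbers below are
placeholders; take the real ones for eth0's queues from /proc/interrupts
in the DomU):

/* Untested sketch: pin each xen-netfront queue IRQ to its own vcpu by
 * writing a hex CPU mask to /proc/irq/<irq>/smp_affinity.  The IRQ
 * numbers are placeholders. */
#include <stdio.h>

static int set_irq_affinity(int irq, unsigned int cpu)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    /* smp_affinity takes a hex CPU bitmask; 1 << cpu selects one vcpu. */
    fprintf(f, "%x\n", 1u << cpu);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Placeholder IRQ numbers for the per-queue eth0 interrupts. */
    int queue_irqs[] = { 72, 73, 74, 75 };
    unsigned int i;

    for (i = 0; i < sizeof(queue_irqs) / sizeof(queue_irqs[0]); i++)
        set_irq_affinity(queue_irqs[i], i);  /* queue i -> vcpu i */

    return 0;
}

After that, /proc/interrupts should show the per-queue interrupt counts
growing on different vcpus.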
> > > > > >
> > > > > > > >
> > > > > > > > >     2. In dom0, only two netback queue processes are
> > > > > > > > > scheduled; the other two
> > > > > > > > processes aren't scheduled.
> > > > > > > >
> > > > > > > > How many Dom0 vcpus do you have? If it only has two then
> > > > > > > > there will only be two processes running at a time.
> > > > > > >
> > > > > > > Dom0 has 6 vcpus and 6G memory. There is only one DomU
> > > > > > > running in
> > > > > > Dom0, and so four netback processes are running in Dom0 (because
> > > > > > the max_queue param of the netback kernel module is set to 4).
> > > > > > > The phenomenon is that only 2 of these four netback processes
> > > > > > > were running
> > > > > > with about 70% cpu usage, while the other two use little CPU.
> > > > > > > Is there a hash algorithm that determines which netback process
> > > > > > > handles the
> > > > > > input packet?
> > > > > > >
> > > > > >
> > > > > > I think that's whatever default algorithm the Linux kernel is using.
> > > > > >
> > > > > > We don't currently support other algorithms.
> > > > > >
> > > > > > Wei.



 

