
[Xen-devel] Good unidirectional TCP performance, weird asymmetric performance going bidirectional



Hi Xen-Developers!

After several days of tuning and testing I am stuck with a serious
performance degradation in bidirectional TCP networking between dom0
and domU as well as between two domUs. I sent this posting to xen-users
and was encouraged there to send it to xen-devel.

Overview:
=======
dom0 to domU:
-------------
unidirectional:
dom0 -> domU : 578 Mbits/sec
dom0 <- domU : 1.22 Gbits/sec
Cool, isn't it?

bidirectional:
dom0 <=> domU:
dom0 -> domU : 38.2 Mbits/sec
dom0 <- domU : 1.22 Gbits/sec

Oops! But things can become even worse...

domU1 to domU2:
---------------
unidirectional:
domU1 -> domU2 : 410.2 Mbits/sec
domU1 <- domU2 : 378.1 Mbits/sec
I can easily live with that.

bidirectional:
domU1 <=> domU2:
domU1 -> domU2 : 42.3 Mbits/sec
domU1 <- domU2 : 38.2 Mbits/sec
But what is this?

Problems that look similar to this have been discussed in postings on
the xen-users list and elsewhere, and I have read many of them. But none
of the mentioned solutions (TCP tuning, ethtool tweaking, scheduler
tuning, etc.) helped to get rid of this behavior. Maybe I missed
something.
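
To make that concrete, the ethtool and scheduler tweaks were of the
following kind (a minimal sketch only; interface and domain names are
placeholders from my setup and the values are not recommendations --
TCP sysctl examples follow further below):

  ethtool -K eth0 tx off                  (inside the domU, interface name assumed)
  xm sched-credit -d Domain-0 -w 512      (raise dom0's credit-scheduler weight)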

This behavior is reproducible with Xen 3.1 and Xen 3.2 on two different
machines.

* Can anybody confirm these findings?
* Does anybody have an idea?

Any help appreciated.

Best Regards,

Volker

The details follow below.

iperf yields the following reproducible numbers.

From dom0 to domU:
==============
zeus:/etc/xen# iperf -c 192.168.2.22 -p5555
------------------------------------------------------------
Client connecting to 192.168.2.22, TCP port 5555
TCP window size: 27.2 KByte (default)
------------------------------------------------------------
[  3] local 192.168.2.20 port 59804 connected with 192.168.2.22 port 5555
[  3]  0.0-10.0 sec    689 MBytes    578 Mbits/sec

From domU to dom0:
==============
apollo:~# iperf -c 192.168.2.20 -p6666 -t60
------------------------------------------------------------
Client connecting to 192.168.2.20, TCP port 6666
TCP window size: 23.3 KByte (default)
------------------------------------------------------------
[  3] local 192.168.2.22 port 36007 connected with 192.168.2.20 port 6666
[  3]  0.0-60.0 sec  8.55 GBytes  1.22 Gbits/sec

So far nothing to complain about. But

domU to dom0 full duplex (both directions simultaneously)
=================
apollo:~# iperf -c 192.168.2.20 -p6666 -t60 -d
------------------------------------------------------------
Client connecting to 192.168.2.20, TCP port 6666
TCP window size: 23.3 KByte (default)
------------------------------------------------------------
[  3] local 192.168.2.22 port 36007 connected with 192.168.2.20 port 6666
[  4] local 192.168.2.22 port 5555 connected with 192.168.2.20 port 34223
[  3]  0.0-60.0 sec  8.55 GBytes  1.22 Gbits/sec
[  4]  0.0-60.0 sec    274 MBytes  38.2 Mbits/sec

This is weird. The domU to dom0 performance is still there, but the
dom0 to domU performance drops to less than 10%.

The same holds for domU to domU communication: about 400 Mbits/sec in
each direction when tested separately, and about 40 Mbits/sec
bidirectionally.

This behavior is independent of the client/server roles of iperf. The
same happens if I start two simultaneous, separate iperf runs in
opposite directions by hand (see the sketch below), which rules out
iperf itself as the source of the problem.
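
For completeness, the two hand-started runs look roughly like this
(hostnames, IPs and ports as in my setup, durations arbitrary). One
iperf server per machine:

  apollo:~# iperf -s -p 5555
  zeus:~# iperf -s -p 6666

Then both clients are started at (nearly) the same time, one per direction:

  zeus:~# iperf -c 192.168.2.22 -p 5555 -t 60
  apollo:~# iperf -c 192.168.2.20 -p 6666 -t 60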

The test setup is as follows:
* AMD64 Opteron dual-core server.
* ASUS server mainboard.
* 4 GB RAM.
* Running Debian Etch, Debian kernel 2.6.18-6-xen-amd64 for dom0 and the domUs.
* Xen 3.2 hypervisor from the Debian package (same behavior with Xen 3.1).
* dom0 (zeus): 2 GB RAM, pinned to CPU 0.
* domU (apollo): 2 GB RAM, pinned to CPU 1 (see the pinning sketch after the bridge listing below).
* Network connected via bridge br0.

bridge name     bridge id               STP enabled     interfaces
br0             8000.001bfcdbd279       no              eth0
                                                        vif9.0
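
The pinning mentioned above is done roughly like this (a sketch; domain
names as in my setup, the cpus line lives in the domU config file):

  cpus = "1"                   (in the domU config file)
  xm vcpu-pin Domain-0 0 0     (or at runtime: pin vcpu 0 of dom0 to physical CPU 0)
  xm vcpu-pin apollo 0 1       (pin vcpu 0 of the domU to physical CPU 1)
  xm vcpu-list                 (verify the pinning)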

I see dropped packets on the vif2.0 interface, but only when packets go
from the domU to the dom0. In the dom0 to domU direction there are no
dropped packets.

zeus:~# ifconfig
br0       Link encap:Ethernet  HWaddr 00:1B:FC:DB:D2:79
          inet addr:192.168.2.20  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:fcff:fedb:d279/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7705950 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1406199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11028796571 (10.2 GiB)  TX bytes:1086554767 (1.0 GiB)

eth0      Link encap:Ethernet  HWaddr 00:1B:FC:DB:D2:79
          inet6 addr: fe80::21b:fcff:fedb:d279/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6776 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3001 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1442559 (1.3 MiB)  TX bytes:339793 (331.8 KiB)
          Interrupt:23

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2576 (2.5 KiB)  TX bytes:2576 (2.5 KiB)

vif2.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7699163 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1401631 errors:0 dropped:4941 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11027474868 (10.2 GiB)  TX bytes:1082823070 (1.0 GiB)
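
The drop counter above was simply watched during the test runs, e.g.
like this (interval arbitrary):

  watch -n 1 'ifconfig vif2.0'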


I have applied ethtool -K eth tx off on the domU interfaces to prevent
TCP checksum errors, and checked the stream with tcpdump: no obvious
errors, a few out-of-order packets, but nothing serious.
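
Concretely, that was done roughly like this (the interface name inside
the domU and the capture file are just examples from my setup):

  ethtool -K eth0 tx off                                        (inside the domU)
  tcpdump -i eth0 -s 0 -w /tmp/bidir.pcap host 192.168.2.20     (capture during a bidirectional run)
  tcpdump -nn -r /tmp/bidir.pcap | less                         (inspect offline)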

One fact is remarkable: the CPU utilisation is lower on the domU side,
and strikingly low in the domU to domU case.

dom0 to domU case:
dom0    domU
96%     80%

domU to domU case:
domU1   domU2
15%     15%
(Data from xentop)

I'm not sure whether
* this low CPU utilisation is merely a consequence of the low network
performance, or
* something limits the CPU bandwidth in the domUs and thereby causes the
degraded network performance.
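
To rule out an explicit limit, the credit scheduler settings of the
domains can be inspected; a cap of 0 means "no cap" (domain names as in
my setup):

  xm sched-credit -d Domain-0
  xm sched-credit -d apollo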

All machines (dom0 and the domUs) have exactly the same TCP tuning parameters.
zeus:/etc/xen# sysctl -a | grep net.core
net.core.netdev_budget = 300
net.core.somaxconn = 128
net.core.xfrm_aevent_rseqth = 2
net.core.xfrm_aevent_etime = 10
net.core.optmem_max = 20480
net.core.message_burst = 10
net.core.message_cost = 5
net.core.netdev_max_backlog = 1000
net.core.dev_weight = 64
net.core.rmem_default = 126976
net.core.wmem_default = 126976
net.core.rmem_max = 131071
net.core.wmem_max = 131071

zeus:/etc/xen# sysctl -a | grep net.ipv4.tcp
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_dma_copybreak = 4096
net.ipv4.tcp_workaround_signed_windows = 0
net.ipv4.tcp_base_mss = 512
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_abc = 0
net.ipv4.tcp_congestion_control = bic
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 4096        87380   4194304
net.ipv4.tcp_wmem = 4096        16384   4194304
net.ipv4.tcp_mem = 196608       262144  3932160
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_max_tw_buckets = 180000
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1

TCP tuning has no influence on this behavior. Send and receive buffer
sizes, backlog length, txqueuelen, etc. (examples below) do not shift
this behavior a bit.
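
For reference, the changes that had no effect were of this kind (the
values are only examples of what was tried, applied identically on dom0
and the domUs; interface name assumed):

  sysctl -w net.core.rmem_max=4194304
  sysctl -w net.core.wmem_max=4194304
  sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
  sysctl -w net.core.netdev_max_backlog=2500
  ifconfig eth0 txqueuelen 2000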

The bottleneck is neither the machine nor the bridge setup. The same
machine using the same bridge setup runs this fast in bidirectional
tests
------------------------------------------------------------
Client connecting to 192.168.2.202, TCP port 5555
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  5] local 192.168.2.200 port 50169 connected with 192.168.2.202 port 5555
[  4] local 192.168.2.200 port 5555 connected with 192.168.2.202 port 53744
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec    830 MBytes    696 Mbits/sec
[  4]  0.0-10.0 sec  1.02 GBytes    878 Mbits/sec

with two OpenVZ containers that have real veth interfaces. We tested
OpenVZ for Zope application hosting to get rid of the kernel RAM
overhead of Xen; we had expected OpenVZ's networking to be weak and
half-baked.

Please don't get me wrong: this is not an attempt to bash Xen.

We have had Xen in production for years and would like to keep it as
long as it stays open source and available in recent kernels :-).
We simply want to understand what our Xen machines do.

Best Regards

Volker

-- 
====================================================
   inqbus it-consulting      +49 ( 341 )  5643800
   Dr.  Volker Jaenisch      http://www.inqbus.de
   Herloßsohnstr.    12      0 4 1 5 5    Leipzig
   N  O  T -  F Ä L L E      +49 ( 170 )  3113748
====================================================



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

