
Re: Mirage networking benchmarks and TCP hang



Hi,

In the current Mirage release there is a ring corruption issue: the request and response counters are not stored and loaded with single asm instructions, so other cores can see corrupted values as the individual byte loads and stores are permuted. The next release will have this problem fixed. We're hoping to release it today -- Vincent is working on some fixes to his vchan code which will be released at the same time. We'll send an email to the list when this is done!
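For anyone curious what "corrupted" means here, a tiny illustrative sketch (hypothetical code, not the actual shared-ring implementation -- the helper names are made up) of why byte-granularity stores on a shared counter can expose torn values to another core:

```python
# Sketch: a 16-bit ring counter advancing 0x00FF -> 0x0100. If the
# producer stores it one byte at a time, a consumer on another core
# can read between the two byte stores and see a value that never
# logically existed. A single-word atomic store avoids this.

def bytes_le(value, width=2):
    """Little-endian byte list of `value` (hypothetical helper)."""
    return [(value >> (8 * i)) & 0xFF for i in range(width)]

def from_bytes_le(bs):
    """Reassemble a little-endian byte list into an integer."""
    return sum(b << (8 * i) for i, b in enumerate(bs))

old, new = 0x00FF, 0x0100

# Low byte of `new` already written, high byte of `old` still visible:
torn = from_bytes_le([bytes_le(new)[0], bytes_le(old)[1]])
print(hex(torn))  # 0x0 -- neither 0x00FF nor 0x0100
```

With the counter stored by a single instruction, only the two real values are ever observable, which is what the fix enforces.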

Cheers,
Dave


On Wed, Aug 7, 2013 at 9:09 AM, Balraj Singh <balrajsingh@xxxxxxxx> wrote:

I suspect the RX ring is getting clobbered the way the TX ring was earlier. I'll try to reproduce it and do some debugging (if I can't figure it out today, unfortunately I won't be able to look at it again until the week of Aug 26).

About the speeds: Mirage does not do hardware offloads yet; in particular, it doesn't do segmentation offloading. It looks like you have turned it off for the Linux domU to Linux domU case, but just to be sure, could you use ethtool to turn off all offloading on all the interfaces involved?

One last necessary tweak for this kind of domU to domU test is to increase the txqueuelen on all concerned interfaces including the VIFs. I just use ifconfig and set txqueuelen to 10000 (or more). 
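For reference, a quick hypothetical helper for applying both tweaks to a batch of interfaces (not part of Mirage or mirari; the exact ethtool feature names are driver- and version-dependent, so treat the list below as an example):

```python
# Disable common offloads with ethtool and raise txqueuelen with
# ifconfig, for each interface named. Needs root to actually run.
import subprocess

OFFLOADS = ["tso", "gso", "gro", "tx", "rx", "sg"]  # driver-dependent

def tuning_cmds(iface, qlen=10000):
    """Return the command lines to run for one interface."""
    cmds = [["ethtool", "-K", iface, off, "off"] for off in OFFLOADS]
    cmds.append(["ifconfig", iface, "txqueuelen", str(qlen)])
    return cmds

def apply_tuning(ifaces):
    for iface in ifaces:
        for cmd in tuning_cmds(iface):
            subprocess.call(cmd)  # ignore per-command failures

# e.g. apply_tuning(["eth0", "xenbr0", "vif1.0", "vif2.0"])
```

Remember to include the dom0-side VIFs as well as the in-guest interfaces.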

The congestion control algorithm implemented is TCP New Reno.
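In outline, New Reno's window management behaves roughly like the simplified sketch below (illustrative Python, not the Mirage implementation; a real stack tracks the window in bytes against sequence numbers and handles partial ACKs during recovery, which is omitted here):

```python
# Simplified New Reno congestion control: slow start, congestion
# avoidance, fast retransmit / fast recovery, and RTO backoff.

MSS = 1460  # bytes, illustrative segment size

class NewReno:
    def __init__(self):
        self.cwnd = 2 * MSS          # congestion window
        self.ssthresh = 64 * 1024    # slow-start threshold
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                     # slow start: exponential growth
        else:
            self.cwnd += MSS * MSS // self.cwnd  # congestion avoidance: ~1 MSS/RTT

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                   # fast retransmit
            self.ssthresh = max(self.cwnd // 2, 2 * MSS)
            self.cwnd = self.ssthresh + 3 * MSS  # fast recovery: inflate by dup acks

    def on_timeout(self):                        # RTO: back to slow start
        self.ssthresh = max(self.cwnd // 2, 2 * MSS)
        self.cwnd = MSS
        self.dup_acks = 0
```

The practical upshot for these benchmarks: every retransmission timeout ("TCP retransmission on timer seq = ...") collapses cwnd back to one segment, which alone can explain large throughput variability.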

Balraj



On Tue, Aug 6, 2013 at 7:59 PM, Dimosthenis Pediaditakis <dimosthenis.pediaditakis@xxxxxxxxxxxx> wrote:
Hi everyone,
I've been using Mirage under Xen for the past few days and I came across an intermittent issue which might have to do with the TCP stack implementation.

I've created an iperf client and an iperf server based on the sources I found here:
https://github.com/mirage/mirage-net/tree/master/direct/tests/iperf_self
I've decoupled the client from the server and minimized I/O and per-packet processing between the "Net.Flow.write" and "Net.Flow.read" calls, in order to saturate the network stack's throughput.
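In case it helps to see the shape of the benchmark, the pattern is essentially the one below, given here as a rough, scaled-down Python/socket analogue over loopback (hypothetical code for illustration; the real client/server use Mirage's Net.Flow API):

```python
# Minimal iperf-like pair: the server just drains bytes with no
# per-packet work, the client writes fixed-size buffers in a tight loop.
import socket, threading

PAYLOAD = b"x" * 1461          # matches the 1461-byte writes
TOTAL = 10_000 * len(PAYLOAD)  # scaled-down volume for the sketch

def server(lsock, result):
    conn, _ = lsock.accept()
    received = 0
    while True:
        data = conn.recv(65536)
        if not data:           # client closed: transfer done
            break
        received += len(data)  # no per-packet processing
    conn.close()
    result.append(received)

def run():
    lsock = socket.socket()
    lsock.bind(("127.0.0.1", 0))
    lsock.listen(1)
    result = []
    t = threading.Thread(target=server, args=(lsock, result))
    t.start()
    c = socket.create_connection(lsock.getsockname())
    sent = 0
    while sent < TOTAL:
        c.sendall(PAYLOAD)     # tight write loop, like Net.Flow.write
        sent += len(PAYLOAD)
    c.close()
    t.join()
    lsock.close()
    return result[0]

print(run())  # total bytes received
```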
The sources of the iperf server/client can be found here (I am using mirari):
 http://www.cl.cam.ac.uk/~dp463/files/iperf.tar.gz
I have also written custom fail-safe scripts which automatically create, configure, install and uninstall Mirage Xen kernels (for XenServer XAPI), minimizing the time required to experiment with different kernels (PM me if you'd like those as well).

The FIRST problem is that the iperf benchmark stalls most of the time and never completes, especially when I run the Mirage iperf client as a Xen guest (see EXPERIMENTS A, B).
When the iperf client is compiled against unix, the benchmark manages to complete (see EXPERIMENTS C, D), but the speed is not impressive.

The SECOND problem/observation is that the data rates I obtain with Mirage are quite low, especially compared to Linux guests (nearly an order of magnitude difference).

It might be the case that the network stack of mirage requires some manual tuning, or that my code is inefficient. If this is the case, feel free to provide me with your comments.
The TCP settings of all linux guests (including dom0) are here:
http://www.cl.cam.ac.uk/~dp463/files/linux.tcp.settings.txt
Also, does anyone know which congestion control algorithm is implemented in the Mirage TCP stack?
Ubuntu and Debian Linux use CUBIC by default.

At the bottom of this email you can find the details of my setup and the benchmark methodology.
I also include some benchmark results for various configurations.

Thank you,
Dimos




===============================
    ***    Configuration Setup     ***
===============================

*** System Details ***
- Machine specs:  i7 3770, 4-core (8-threads), 32GB DDR3 1333 RAM

*** Ocaml / Mirage ***
- Fetched the latest versions of mirage, mirage-fs, mirage-net, mirage-net-direct, mirage-unix, mirari (all 1.0.0), from "opam-repo-dev"
- Mirage iperf settings:
   Total data volume sent per benchmark: 8 gigabits (kilo = 1000, i.e. 1,000,000,000 bytes, ~684K packets)
    Packet size: 1461 bytes
    Default TCP settings
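(For anyone checking the arithmetic, in decimal units those settings imply:)

```python
# Sanity arithmetic for the iperf settings above (decimal units).
total_bytes = (8 * 10**9) // 8      # 8 gigabits -> 1,000,000,000 bytes
packet_size = 1461                  # bytes per write
packets = total_bytes // packet_size
print(packets)                      # roughly 684K packets
```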

*** Xen / Linux ***
- Xen Hypervisor 4.1 (Ubuntu 12.10, XCP, openvswitch, Xen API v1.3)
- Xen vCPU allocations:
    Dom0: pinned on cores 0 and 1 (VCPUs-params:mask=0,2)
    Dom1: pinned on core 2 (VCPUs-params:mask=4)
    Dom2: pinned on core 3 (VCPUs-params:mask=6)
- All Linux programs (e.g. mirari run --unix, iperf) run with "nice -n -15"

*** Networking configuration ***
- Using Open vSwitch v1.4.3 for bridging
- Default Linux TCP settings for Linux running on DomUs and Dom0 (see here: http://www.cl.cam.ac.uk/~dp463/files/linux.tcp.settings.txt)
- No Jumbo frames for Xen:
    xe network-param-get uuid=a0378bc6-a9e4-9054-50ff-59d5ab8d2ee8 param-name=MTU  --> 1500
- No Jumbo frames for Linux:
    All interfaces are set to a 1500-byte MTU
- For the Linux2Linux iperf experiments that run only on Dom0, the client and the server are forced to push traffic through xenbr0 (mtu1500) (not via loopback)
- For the Mirage instances compiled for the unix target, mirari creates a tap interface which I attach to bridge xenbr1. When I run two unix instances of Mirage, I force them to communicate over two distinct tap interfaces attached to the same bridge (xenbr1). I had to modify mirari a bit, as it had the "tap0" and "10.0.0.1" values hard-coded.




===============================
  *** Mirage2Mirage benchmarks ***
===============================

------------------------------------------------------
    Experiment A:
      Xen.DomU( Mirage xen  Iperf.Server )
      Xen.DomU( Mirage xen  Iperf.Client )
------------------------------------------------------
Most of the time (around 7 runs out of 10) the experiment hangs, and it seems to me that the network stack is block-waiting.
When the experiment hangs, vCPU utilisation is ~0% for the DomUs and ~100%+ for Dom0.
I also get a mixture of error messages, which I've attached right under the next paragraph.

When the benchmark manages to complete, I get a huge variability in the throughput.
Note that Dom0 and the DomUs are pinned to exclusive vCPUs / physical cores, so the variability in performance can't be due to noise from other processes.
Pushing 8 gigabits across the VMs takes anywhere from 6 seconds (~1.33 Gbps) to 20 seconds (~0.4 Gbps).

Server:
Iperf server: Received connection from 192.168.8.102:18836.
RX error 0
RX error 0
RX error 0
RX error 0
RX error 0

Client:
Iperf client: Made connection to server.
TCP retransmission on timer seq = 1435231548
TCP retransmission on timer seq = 1435231548
TCP retransmission on timer seq = 1435231548

Client:
RX: ack (id = 1523)
wakener not found valid ids = [ ]
RX: ack (id = 655)
wakener not found valid ids = [ ]
RX: ack (id = 1975)
wakener not found valid ids = [ ]


------------------------------------------------------
    Experiment B:
      Xen.Dom0( Mirage unix Iperf.Server )
      Xen.DomU( Mirage xen Iperf.client )
------------------------------------------------------
Most of the time the benchmark hangs, with the message "TCP retransmission on timer seq = ..." shown at the client, the server, or both.
When the benchmark completes, 8 gigabits are transferred in about 8-9 seconds (0.9-1 Gbps).


------------------------------------------------------
    Experiment C:
      Xen.DomU( Mirage xen  Iperf.Server )
      Xen.Dom0( Mirage unix Iperf.client )
------------------------------------------------------
The benchmark always completes without any issues.
The client pushes 8 gigabits to the server in around 10.2 seconds (~0.8 Gbps, quite consistent).


------------------------------------------------------
    Experiment D:
      Xen.Dom0( Mirage unix  Iperf.Server )
      Xen.Dom0( Mirage unix Iperf.client )
------------------------------------------------------
The benchmark always completes without hanging or errors.
It takes 8 to 10 seconds to push 8 gigabits from the client to the server (0.8-1 Gbps, quite consistent).





===============================
 ***     Linux2Linux  benchmarks      ***
===============================

------------------------------------------------------
    Experiment E:
      Xen.DomU( Debian XenGuest Iperf.Server )
      Xen.DomU( Debian XenGuest Iperf.client )
------------------------------------------------------
Reaches around 9-10 Gbps, with vCPU utilisations of
~45% for the client, ~60% for the server and ~90% for Dom0.
Dom0 is probably the bottleneck (memory bandwidth, processing horsepower, or both).

------------------------------------------------------
    Experiment F:
      Xen.Dom0( Ubuntu  Iperf.Server )
      Xen.DomU( Debian XenGuest Iperf.client )
------------------------------------------------------
Reaches around 20 Gbps, with vCPU utilisations:
 ~70% for the DomU (client), ~80-85% for the Dom0 (server)

------------------------------------------------------
    Experiment G:
      Xen.DomU( Debian XenGuest Iperf.Server )
      Xen.Dom0( Ubuntu Iperf.client )
------------------------------------------------------
Reaches around 22 Gbps, with vCPU utilisations:
 ~90% for the DomU (server), ~90% for the Dom0 (client)

------------------------------------------------------
    Experiment H:
      Xen.Dom0( Ubuntu  Iperf.Server )
      Xen.Dom0( Ubuntu Iperf.client )
------------------------------------------------------
Reaches around 24.5 Gbps, with vCPU utilisations:
 ~95% for the Dom0 (both server and client)








--
Dave Scott

 

