
Issue: Networking performance in Xen VM on Arm64



Hi there,

I tested the networking performance of Xen virtual machines on my Arm64
platform.  Below is a summary of the test results and some analysis; at
the end I have a couple of questions for the community and would
appreciate suggestions before I proceed.

First of all, if you want more details on the profiling, the slides are
available here:
https://docs.google.com/presentation/d/1iTQRx8-UYnm19eU6CnVUSaAodKZ0JuRiHYaXBomfu3E/edit?usp=sharing

## Testing summary

The TL;DR is that I used two tools, netperf and ddsperf, to measure
networking latency and throughput for Xen Dom0 and DomU.  The results
below show the performance when sending data from a Xen domain (Dom0 or
DomU) to my x86 PC; performance is poor when transmitting data from Xen
DomU.  (Note: I used the default network bridge configuration when
launching the Xen VM.)

  Throughput result:

    Profile     netperf (Mbits/sec)    ddsperf (Mbits/sec)
    Xen-Dom0    939.41                 > 620
    Xen-DomU    107.73                 4~12

  Latency result:

    Profile     ddsperf's max ping/pong latency (us)
    Xen-Dom0    200 ~ 1400
    Xen-DomU    > 60,000

## Analysis

The critical factor for performance is whether the low-level network
driver transfers skbs in synchronous or asynchronous mode.

When we transfer data from my x86 machine to Xen DomU, the data flow is:

  bridge -> xenif (Xen network backend driver)                => Dom0
              `> xennet driver (Xen net frontend driver)      => DomU

In this flow, the Xen network backend driver (in Dom0) copies the skb
into an intermediate buffer (gnttab_batch_copy()) and notifies the Xen
VM by sending an rx irq.  The key point is that the backend driver
doesn't wait for the Xen VM to process the skb; it returns to user
space directly.  Therefore, Xen Dom0 and DomU work in asynchronous mode
in this case (Dom0 doesn't need to wait for DomU), and the duration for
handling an skb is 30+ us.

Conversely, when transmitting data from Xen DomU, the flow is:

            DomU                    |               Dom0
  ----------------------------------+------------------------------------
  xennet driver receives skb        |
    `> send tx interrupt to Dom0    |
                                    |  xenif responds to tx interrupt
                                    |  Copy skb into intermediate buffer
                                    |  Notify DomU (send tx irq)
  xennet driver handles tx irq      |
  frees skb                         |

So we can see that when DomU sends out packets, it needs to wait for
Dom0 to process them; only after Dom0 notifies DomU that a packet has
been processed does the net frontend driver in DomU release the skb.

This means it's a long path to process skbs: Xen DomU and Dom0 work in
synchronous mode.  The frontend driver in DomU sends out the skb and
notifies Dom0, Dom0 handles the skb and notifies DomU back, and finally
DomU knows the skb has been processed and releases it.  The duration
between sending and releasing an skb is about 180+ us.
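For comparison, here is a rough C sketch of why the tx path is
effectively synchronous from the frontend's point of view.  Again this
is not the actual xen-netfront code; struct netfront_queue, tx_skbs and
TX_RING_MASK are simplified placeholders.  The important point is that
the skb can only be freed from the tx-completion irq handler, i.e.
after a full DomU -> Dom0 -> DomU round trip.

  /*
   * Hedged sketch of the frontend tx-completion handling (DomU side).
   * Structure and field names are simplified assumptions.
   */
  static irqreturn_t frontend_tx_irq(int irq, void *dev_id)
  {
          struct netfront_queue *queue = dev_id;
          RING_IDX cons, prod;

          /* Walk the tx responses that Dom0 has pushed back to us ... */
          prod = queue->tx.sring->rsp_prod;
          for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
                  struct sk_buff *skb = queue->tx_skbs[cons & TX_RING_MASK];

                  /*
                   * ... and only now release the skb: the frontend had
                   * to keep it alive until Dom0 confirmed the copy, so
                   * every packet pays the full round-trip latency.
                   */
                  dev_kfree_skb_irq(skb);
          }
          queue->tx.rsp_cons = prod;

          return IRQ_HANDLED;
  }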

## Questions

Given that the Xen network driver was merged into the Linux kernel in
2.6 (back in 2007), it's very unlikely that I am the first person to
observe this issue.

I think this is a common issue and not specific to the Arm64 arch: the
long latency is mainly caused by the Xen networking driver, and I
didn't see anything abnormal in Xen context switching on Arm64 (it
takes ~10us to context switch between Xen domains).

Could anyone confirm if this is a known issue?
 
The second question is how to mitigate the long latency when sending
data from DomU.  A possible solution is for the Xen network frontend
driver to copy the skb into an intermediate (bounce) buffer, just like
the Xen net backend driver does with gnttab_batch_copy(); this way the
frontend driver doesn't need to wait for the backend driver's response
and can return directly (see the sketch below).  But I am not clear on
the mechanism of the Xen grant table; in particular, if the grant table
pages are only writable from Dom0, it would be hard to optimize the
frontend driver in DomU by directly copying the skb into them.  Any
thoughts on this?
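To illustrate the idea, below is a very rough C sketch of the frontend
copying into a per-queue bounce buffer so the skb can be freed
immediately.  Everything here (bounce_buf, BOUNCE_SLOTS,
push_tx_request()) is hypothetical; whether the grant mechanism
actually permits DomU to do this safely is exactly the open question.

  /*
   * Hedged sketch of the proposed mitigation (DomU frontend side).
   * All names below are hypothetical; this is an idea, not an
   * existing driver interface.
   */
  static int frontend_xmit_bounce(struct netfront_queue *queue,
                                  struct sk_buff *skb)
  {
          void *bounce = queue->bounce_buf[queue->tx_prod % BOUNCE_SLOTS];

          /* Copy the payload out of the skb into the bounce page. */
          skb_copy_bits(skb, 0, bounce, skb->len);

          /* Queue a tx request that references the bounce page and
           * notify Dom0 (hypothetical helper).
           */
          push_tx_request(queue, bounce, skb->len);
          notify_remote_via_irq(queue->tx_irq);

          /*
           * The skb can be released right away: the frontend no longer
           * waits for Dom0, mirroring what the backend does on rx with
           * gnttab_batch_copy().
           */
          dev_kfree_skb_any(skb);
          return 0;
  }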

Any suggestions and comments are welcome.  Thanks!

Leo



 

