Hi all,
For the last few days I've been trying to pin down the performance
issues of the Mirage network stack when running over Xen.
When pushing net-direct to its limits, transmissions randomly stall
for anywhere between 0.1 and 4 seconds (especially at the sender).
After some experimentation, I believe that these timeouts occur
because netif is not (always) notified (via Activations) about freed
TX-ring slots.
It seems that these events (intermittently) don't reach the guest
domain's front-end driver.
AFAIK, Activations.wait() currently blocks waiting for an event on
the port belonging to the netif's event channel.
This event is delivered to Activations.run via Main.run.aux, which is
invoked via the callback in app_main() of runtime/kernel/main.c.
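In case it helps others reading along: the hand-off from C to the
OCaml main loop is just the standard OCaml callback API. A rough
sketch of the idea (the real app_main() in runtime/kernel/main.c is
more involved, and the loop structure here is my assumption, not the
exact code):

  /* Sketch: how a C main loop can drive an OCaml closure that was
   * registered with Callback.register "OS.Main.run" on the OCaml side. */
  #include <caml/mlvalues.h>
  #include <caml/callback.h>

  void app_main_sketch(void)
  {
      /* Look up the closure registered under "OS.Main.run". */
      const value *run = caml_named_value("OS.Main.run");
      if (run == NULL)
          return;                      /* nothing registered */

      for (;;) {
          /* Each call re-enters the OCaml scheduler (Main.run.aux),
           * which runs pending Lwt threads and Activations, and may
           * block the domain when there is nothing to do. */
          caml_callback(*run, Val_unit);
      }
  }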
The problem I observed is that, when "SCHEDOP_poll" is used without
masking the events being polled, the hypervisor doesn't wake up the
blocked domain when a new event becomes available.
The requirement to mask events when using "SCHEDOP_poll" is also
mentioned in the Xen documentation.
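For reference, the hypercall argument is the sched_poll struct from
Xen's public headers (xen/include/public/sched.h); quoting it here
from memory, so please double-check against your Xen tree:

  /* Argument to HYPERVISOR_sched_op(SCHEDOP_poll, ...): the caller
   * yields until an event is pending on one of the polled ports or
   * the timeout expires.  The docs also say it may only be used while
   * event delivery is disabled. */
  struct sched_poll {
      XEN_GUEST_HANDLE(evtchn_port_t) ports;  /* ports to poll            */
      unsigned int nr_ports;                  /* number of entries        */
      uint64_t timeout;                       /* absolute system time, ns */
  };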
I've produced a patch that seems to fix the above erratic behavior.
With it, I am able to consistently achieve higher speeds (up to
2.75Gbps DomU-to-DomU). Please have a look at my repo:
https://github.com/dimosped/mirage-platform
It helps to use a big enough txqueuelen for your VIFs, as the current
TCP implementation doesn't cope well with heavy losses at high data
rates. The default on my system was only 32.
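On my Dom0 this is just something like
"ip link set dev vif1.0 txqueuelen 1000" (or the ifconfig
equivalent); the VIF name here is only an example, use whatever your
guest's interface is actually called.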
I have also modified mirage-net-direct, adding per-flow TCP debug
logging. This helped me better understand and pin down the problem.
You can grab the modified sources here:
https://github.com/dimosped/mirage-net
Be aware that logging large volumes of data for a TCP flow requires a
correspondingly large amount of memory. Nevertheless, it barely
affects performance.
The iperf benchmark sources can be found here:
https://github.com/dimosped/iperf-mirage
I've included as much info as possible in the README file, which
should be sufficient to get you started and to replicate my
experiments.
In the iperf-mirage repo there is also a Python tool which you can
use to automatically generate plots from the collected TCP debug
info (a sample dataset is included in data/):
https://github.com/dimosped/iperf-mirage/tree/master/tools/MirageTcpVis
For really large datasets the script might be slow; I need to switch
to NumPy arrays at some point...
Please keep in mind that I am a newbie in Xen/Mirage, so your
comments/input are more than welcome.
Regards,
Dimos
------------------------------------------------
MORE TECHNICAL DETAILS
------------------------------------------------
-----------------------------------------------------------------
=== How (I think) Mirage and Xen scheduling works ===
-----------------------------------------------------------------
- When Netif receives a writev request, it checks whether the TX ring
has enough free space (for the producer) to hold the data.
- If there is not enough space, it block-waits (via
Activations.wait) for an event on the port mapped to the netif (and
bound to the backend driver).
- Otherwise, it pushes the request.
- Activations are notified (via run) from "aux ()" in Main.run.
Once notified, the waiting netif can proceed: it re-checks the ring
for free space, writes a new request, and sends an event to the
backend.
- Main.run.aux is registered as a callback (under the name
"OS.Main.run") and is invoked from xen/runtime/kernel/main.c (in the
app_main() loop). As long as the Mirage guest domain is scheduled,
this loop keeps running.
- However, in Main.run.aux, the Mirage guest domain is blocked via
"block_domain timeout" if the main thread has no task to perform.
- In turn, "block_domain" invokes caml_block_domain() found in
xen/runtime/kernel/main.c, which issues a
"HYPERVISOR_sched_op(SCHEDOP_poll, &sched_poll);" hypercall
-------------------------------------
=== Polling mode issue ===
-------------------------------------
In my opinion, and based on the debug information, the problem is
that Mirage uses "SCHEDOP_poll" without masking the event channels it
polls.
The Xen documentation states that with "SCHEDOP_poll" the domain
yields until either
a) an event is pending on one of the polled channels, or
b) the timeout (given in nanoseconds; it is an absolute system time,
not a duration) is reached.
It also states that SCHEDOP_poll can only be executed when the guest
has delivery of events disabled.
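The kind of masking that matters here is per event channel: each port
has a bit in the shared-info evtchn_mask bitmap, and a masked port
still gets its pending bit set by the hypervisor (which is exactly
what SCHEDOP_poll looks at), it just no longer triggers the event
upcall. In mini-OS style this is literally one bit flip; a sketch,
not the Mirage code (synch_set_bit/synch_clear_bit are the mini-OS
atomic bit helpers):

  #include <xen/xen.h>        /* shared_info_t, evtchn_port_t */

  extern shared_info_t *HYPERVISOR_shared_info;  /* mapped at boot */

  /* Mask a port: it stays pollable via SCHEDOP_poll but no longer
   * raises the event upcall. */
  static inline void mask_port(evtchn_port_t port)
  {
      synch_set_bit(port, &HYPERVISOR_shared_info->evtchn_mask[0]);
  }

  /* Unmasking is the reverse; a real unmask_evtchn() also checks for
   * already-pending events and re-triggers the upcall if needed. */
  static inline void unmask_port(evtchn_port_t port)
  {
      synch_clear_bit(port, &HYPERVISOR_shared_info->evtchn_mask[0]);
  }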
In Mirage, netif events are not masked and therefore they never
"wake up" the guest domain.
The guest only wakes up when a thread is scheduled to wake up in
Time.SleepQueue (e.g. a TCP timer).
Once the guest is scheduled again, it completes any outstanding tasks
and sends any pending packets; then, as soon as a) the TX ring fills
up again, or b) the hypervisor deschedules it, it ends up sleeping
again.
Further supporting the above: whenever I press keys in the Xen
console while the Mirage sender is running, the transfer completes
faster (presumably because the console events wake the domain up).
----------------
=== Fix ===
----------------
There are multiple ways to mask events (e.g. at the VCPU level, at
the event-channel level, etc.).
As a quick hack, I replaced "Eventchn.unmask h evtchn;" in
Netif.plug_inner with "Eventchn.mask h evtchn" (which I had to add,
both in Eventchn and as a stub in
xen/runtime/kernel/eventchn_stubs.c).
See:
https://github.com/dimosped/mirage-platform/commit/6d4d3f0403497f07fde4db6f4cb63665a8bf8e26
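For what it's worth, the C side of such a stub is tiny; something
along these lines (hypothetical names, in the style of
eventchn_stubs.c; the actual code is in the commit above):

  #include <stdint.h>
  #include <caml/mlvalues.h>
  #include <caml/memory.h>

  void mask_evtchn(uint32_t port);  /* provided by the kernel runtime */

  /* The OCaml side would declare a matching external for
   * Eventchn.mask and call it with the handle and the port. */
  CAMLprim value stub_eventchn_mask(value v_handle, value v_port)
  {
      CAMLparam2(v_handle, v_port);
      mask_evtchn(Int_val(v_port)); /* handle unused at this level */
      CAMLreturn(Val_unit);
  }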