Xen project Mailing List

Re: raw netif performance enhancement

To: Charalampos Rotsos <cr409@xxxxxxxxxxxx>

From: Anil Madhavapeddy <anil@xxxxxxxxxx>

Date: Mon, 26 Aug 2013 12:07:18 +0100

List-id: MirageOS development <cl-mirage.lists.cam.ac.uk>

On 26 Aug 2013, at 09:29, Charalampos Rotsos <cr409@xxxxxxxxxxxx> wrote:

> A first thing I observed with Mirage was that beyond 700Mbps the VM crashes
> because it runs out of memory. This behaviour I think is related to the
> logic of the listen method, which spawns a new thread for each packet
> received. If the thread creation rate is lower than the packet processing
> rate, then the VM cannot fulfil the memory requirements and dies. This is
> easily solvable using a capped Lwt_stream which functions as a NIC rx queue.
> Using this trick I managed to solve the memory shortage problem.

That's right -- the default streams are pretty harmful by not imposing flow
control, but a capped stream is fine. We should make that change in the
listen function and not allow the unbounded version at all.

> Now my next problem is that the VM cannot switch more than 700Mbps, while
> the CPU utilisation is maximum 60% during execution. I inserted some
> counters in the code and noticed that the tx ring tends to drop some packets
> (I compare the number of packets I inject to the vif with the number of
> packets observed on the end host of the iperf test). I have verified using
> tc and ifconfig that these packets are not lost in between the vif during
> the switching process (you need to do a bit of configuration in the
> txqueuelen in order to ensure that packets are not dropped). My guess so far
> is that the performance bottleneck is on the TX ring of the vif. Are there
> any hints on how I could improve the performance or any ideas on the
> bottleneck point?

There are a couple of low-hanging-fruit fixes to drop CPU usage by 50% at
least, in ascending order of difficulty:

- OCaml 4.1beta1 has compiler builtins that are automatically used by cstruct
  to eliminate all the intermediate allocation in accessing struct values.
  This works fine in the UNIX backend, but we need to patch mirage-platform
  to detect which version of OCaml it's being built for, and swap in the
  runtime libraries for that version. This isn't too difficult, but I've been
  putting it off until we get the core fully stable under 4.0 first.

- Investigate the netfront multiring/offload options in Xen. There are
  patches flying around to remove the need for so much granting in Xen, which
  is a serious bottleneck with small packets. If that's done, you relieve a
  lot of the CPU pressure from the interactions with Xen.

- As a slightly future-looking thing, Pierre is working on an amazing-looking
  inlining layer for OCaml that (when it does cross-module) should
  dramatically improve Mirage performance by working across all the libraries
  we use:
  http://www.ocamlpro.com/blog/2013/07/11/inlining-progress-report.html

Btw, is your test case possible to run as a single unikernel, as Balraj's tcp
loopback test is? If so, committing it to mirage-skeleton would be useful in
order to make us run it regularly.

-anil
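
For reference, the capped-stream idea discussed above can be sketched roughly as follows. This is not the actual Mirage listen code, just a minimal illustration of the flow control that Lwt_stream.create_bounded gives you: the producer blocks in push#push once the queue is full, so memory use stays bounded. The run_bounded name, the 64-byte dummy frames and the cap of 256 are all made up for the example, and it assumes the lwt and lwt.unix packages are available.

```ocaml
(* Sketch only: needs the lwt and lwt.unix packages (for Lwt_main). *)
open Lwt.Infix

(* Push [frames] dummy frames through a stream capped at [cap] entries and
   return how many the consumer processed. All names here are hypothetical. *)
let run_bounded ~cap ~frames =
  let stream, push = Lwt_stream.create_bounded cap in
  (* Producer: stands in for the rx path. push#push blocks while the stream
     already holds [cap] elements, which is the flow control we want. *)
  let rec produce i =
    if i >= frames then begin
      push#close;
      Lwt.return_unit
    end else
      (push#push (Bytes.create 64) >>= fun () -> produce (i + 1))
  in
  (* Consumer: stands in for per-packet processing. It yields after each
     frame so that the producer gets a chance to refill the queue. *)
  let processed = ref 0 in
  let consume =
    Lwt_stream.iter_s
      (fun (_frame : Bytes.t) -> incr processed; Lwt.pause ())
      stream
  in
  Lwt_main.run (Lwt.join [ produce 0; consume ] >|= fun () -> !processed)

let () =
  Printf.printf "processed %d frames\n" (run_bounded ~cap:256 ~frames:10_000)
```

Note that a bounded stream expects a single pusher: a second thread calling push#push while another is already blocked gets Lwt_stream.Full, so a real rx path would either funnel everything through one producer or decide to drop packets at that point.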
